October 16, 2019

Paper Group ANR 1046

Joint Vertebrae Identification and Localization in Spinal CT Images by Combining Short- and Long-Range Contextual Information. Precise Temporal Action Localization by Evolving Temporal Proposals. Neural machine translation framework based cross-lingual document vector with distance constraint training. Highly Automated Learning for Improved Active …

Joint Vertebrae Identification and Localization in Spinal CT Images by Combining Short- and Long-Range Contextual Information


Title	Joint Vertebrae Identification and Localization in Spinal CT Images by Combining Short- and Long-Range Contextual Information
Authors	Haofu Liao, Addisu Mesfin, Jiebo Luo
Abstract	Automatic vertebrae identification and localization from arbitrary CT images is challenging. Vertebrae usually share a similar morphological appearance. Because of pathology and the arbitrary field-of-view of CT scans, one can hardly rely on the existence of some anchor vertebrae or on parametric methods to model the appearance and shape. To solve the problem, we argue that one should make use of the short-range contextual information, such as the presence of some nearby organs (if any), to roughly estimate the target vertebrae. In addition, due to the unique anatomic structure of the spinal column, vertebrae have a fixed sequential order, which provides important long-range contextual information to further calibrate the results. We propose a robust and efficient vertebrae identification and localization system that can inherently learn to incorporate both the short-range and long-range contextual information in a supervised manner. To this end, we develop a multi-task 3D fully convolutional neural network (3D FCN) to effectively extract the short-range contextual information around the target vertebrae. For the long-range contextual information, we propose a multi-task bidirectional recurrent neural network (Bi-RNN) to encode the spatial and contextual information among the vertebrae of the visible spinal column. We demonstrate the effectiveness of the proposed approach on a challenging dataset, and the experimental results show that our approach outperforms the state-of-the-art methods by a significant margin.
Tasks	Joint Vertebrae Identification And Localization In Spinal CT Images
Published	2018-12-09
URL	http://arxiv.org/abs/1812.03500v1
PDF	http://arxiv.org/pdf/1812.03500v1.pdf
PWC	https://paperswithcode.com/paper/joint-vertebrae-identification-and
Repo
Framework

Precise Temporal Action Localization by Evolving Temporal Proposals


Title	Precise Temporal Action Localization by Evolving Temporal Proposals
Authors	Haonan Qiu, Yingbin Zheng, Hao Ye, Yao Lu, Feng Wang, Liang He
Abstract	Locating actions in long untrimmed videos has been a challenging problem in video content analysis. The performance of existing action localization approaches remains unsatisfactory in precisely determining the beginning and the end of an action. Imitating the human perception procedure with observations and refinements, we propose a novel three-phase action localization framework. Our framework is embedded with an Actionness Network to generate initial proposals through frame-wise similarity grouping, and then a Refinement Network to conduct boundary adjustment on these proposals. Finally, the refined proposals are sent to a Localization Network for further fine-grained location regression. The whole process can be deemed as multi-stage refinement using a novel non-local pyramid feature under various temporal granularities. We evaluate our framework on the THUMOS14 benchmark and obtain a significant improvement over the state-of-the-art approaches. Specifically, the performance gain is remarkable under precise localization with high IoU thresholds. Our proposed framework achieves mAP@IoU=0.5 of 34.2%.
Tasks	Action Localization, Temporal Action Localization
Published	2018-04-13
URL	http://arxiv.org/abs/1804.04803v1
PDF	http://arxiv.org/pdf/1804.04803v1.pdf
PWC	https://paperswithcode.com/paper/precise-temporal-action-localization-by
Repo
Framework
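
The mAP@IoU=0.5 figure in the abstract above rests on the temporal intersection over union between a predicted segment and a ground-truth segment. A minimal sketch of that standard metric follows; the function names and the `(start, end)` tuple layout are illustrative assumptions, not taken from the paper.

```python
def temporal_iou(pred, gt):
    """Intersection over union of two (start, end) segments, e.g. in seconds."""
    # Overlap is zero when the segments do not intersect.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, gt, threshold=0.5):
    """A prediction matches a ground-truth action if its IoU reaches the threshold."""
    return temporal_iou(pred, gt) >= threshold
```

At a threshold of 0.5, a prediction counts only when the shared span is at least half of the combined span, which is why small boundary errors become costly at high IoU thresholds.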

Neural machine translation framework based cross-lingual document vector with distance constraint training


Title	Neural machine translation framework based cross-lingual document vector with distance constraint training
Authors	Wei Li, Brian Mak
Abstract	A universal cross-lingual representation of documents is very important for many natural language processing tasks. In this paper, we present a document vectorization method which can effectively create document vectors via a self-attention mechanism using a neural machine translation (NMT) framework. The model used by our method can be trained with parallel corpora that are unrelated to the task at hand. During testing, our method will take a monolingual document and convert it into a “Neural machine Translation framework based crosslingual Document Vector with distance constraint training” (cNTDV). cNTDV is a follow-up study from our previous research on the neural machine translation framework based document vector. The cNTDV can quickly produce the document vector from a forward pass of the encoder. Moreover, it is trained with a distance constraint, so that the document vectors obtained from different language pairs are always consistent with each other. In a cross-lingual document classification task, our cNTDV embeddings surpass the published state-of-the-art performance in the English-to-German classification test, and, to the best of our knowledge, it also achieves the second best performance in the German-to-English classification test. Compared to our previous research, it does not need a translator in the testing process, which makes the model faster and more convenient.
Tasks	Cross-Lingual Document Classification, Document Classification, Machine Translation
Published	2018-07-29
URL	http://arxiv.org/abs/1807.11057v2
PDF	http://arxiv.org/pdf/1807.11057v2.pdf
PWC	https://paperswithcode.com/paper/neural-machine-translation-framework-based
Repo
Framework

Highly Automated Learning for Improved Active Safety of Vulnerable Road Users


Title	Highly Automated Learning for Improved Active Safety of Vulnerable Road Users
Authors	Maarten Bieshaar, Günther Reitberger, Viktor Kreß, Stefan Zernetsch, Konrad Doll, Erich Fuchs, Bernhard Sick
Abstract	Highly automated driving requires precise models of traffic participants. Many state-of-the-art models are currently based on machine learning techniques. Among other challenges, the required amount of labeled data is a major one. An autonomous learning process addressing this problem is proposed. The initial models are iteratively refined in three steps: (1) detection and context identification, (2) novelty detection and active learning, and (3) online model adaption.
Tasks	Active Learning
Published	2018-03-09
URL	http://arxiv.org/abs/1803.03479v1
PDF	http://arxiv.org/pdf/1803.03479v1.pdf
PWC	https://paperswithcode.com/paper/highly-automated-learning-for-improved-active
Repo
Framework

Elastic CRFs for Open-ontology Slot Filling


Title	Elastic CRFs for Open-ontology Slot Filling
Authors	Yinpei Dai, Yichi Zhang, Zhijian Ou, Yanmeng Wang, Junlan Feng
Abstract	Slot filling is a crucial component in task-oriented dialog systems, which is to parse (user) utterances into semantic concepts called slots. An ontology is defined by the collection of slots and the values that each slot can take. The widely-used practice of treating slot filling as a sequence labeling task suffers from two drawbacks. First, the ontology is usually pre-defined and fixed. Most current methods are unable to predict new labels for unseen slots. Second, the one-hot encoding of slot labels ignores the semantic meanings and relations for slots, which are implicit in their natural language descriptions. These observations motivate us to propose a novel model called elastic conditional random field (eCRF), for open-ontology slot filling. eCRFs can leverage the neural features of both the utterance and the slot descriptions, and are able to model the interactions between different slots. Experimental results show that eCRFs outperform existing models on both the in-domain and the cross-domain tasks, especially in predictions of unseen slots and values.
Tasks	Slot Filling
Published	2018-11-04
URL	http://arxiv.org/abs/1811.01331v1
PDF	http://arxiv.org/pdf/1811.01331v1.pdf
PWC	https://paperswithcode.com/paper/elastic-crfs-for-open-ontology-slot-filling
Repo
Framework

Variational learning across domains with triplet information


Title	Variational learning across domains with triplet information
Authors	Rita Kuznetsova, Oleg Bakhteev, Alexandr Ogaltsov
Abstract	The work investigates deep generative models, which allow us to use training data from one domain to build a model for another domain. We propose the Variational Bi-domain Triplet Autoencoder (VBTA) that learns a joint distribution of objects from different domains. We extend the VBTA's objective function with relative constraints, or triplets, that are sampled from the shared latent space across domains. In other words, we combine deep generative models with metric learning ideas in order to improve the final objective with the triplet information. The performance of the VBTA model is demonstrated on different tasks: image-to-image translation, bi-directional image generation and cross-lingual document classification.
Tasks	Cross-Lingual Document Classification, Document Classification, Image Generation, Image-to-Image Translation, Metric Learning
Published	2018-06-22
URL	http://arxiv.org/abs/1806.08672v2
PDF	http://arxiv.org/pdf/1806.08672v2.pdf
PWC	https://paperswithcode.com/paper/variational-learning-across-domains-with
Repo
Framework
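
The abstract above does not spell out the form of its triplet constraint. As a hedged illustration of the general metric-learning idea, the sketch below uses the common margin-based hinge on Euclidean distances; the choice of distance, the margin value, and the function names are all assumptions, and the VBTA objective itself may differ.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the positive is closer to the anchor than the negative by `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```

In a bi-domain setting, the anchor and the positive would typically be latent codes of matching objects from the two domains, and the negative a non-matching one.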

A Log-Euclidean and Total Variation based Variational Framework for Computational Sonography


Title	A Log-Euclidean and Total Variation based Variational Framework for Computational Sonography
Authors	Jyotirmoy Banerjee, Premal A. Patel, Fred Ushakov, Donald Peebles, Jan Deprest, Sebastien Ourselin, David Hawkes, Tom Vercauteren
Abstract	We propose a spatial compounding technique and variational framework to improve 3D ultrasound image quality by compositing multiple ultrasound volumes acquired from different probe orientations. In the composite volume, instead of intensity values, we estimate a tensor at every voxel. The resultant tensor image encapsulates the directional information of the underlying imaging data and can be used to generate ultrasound volumes from arbitrary, potentially unseen, probe positions. Extending the work of Hennersperger et al., we introduce a log-Euclidean framework to ensure that the tensors are positive-definite, eventually ensuring non-negative images. Additionally, we regularise the underpinning ill-posed variational problem while preserving edge information by relying on a total variation penalisation of the tensor field in the log domain. We present results on in vivo human data to show the efficacy of the approach.
Tasks
Published	2018-02-06
URL	http://arxiv.org/abs/1802.02088v1
PDF	http://arxiv.org/pdf/1802.02088v1.pdf
PWC	https://paperswithcode.com/paper/a-log-euclidean-and-total-variation-based
Repo
Framework

Adv-BNN: Improved Adversarial Defense through Robust Bayesian Neural Network


Title	Adv-BNN: Improved Adversarial Defense through Robust Bayesian Neural Network
Authors	Xuanqing Liu, Yao Li, Chongruo Wu, Cho-Jui Hsieh
Abstract	We present a new algorithm to train a robust neural network against adversarial attacks. Our algorithm is motivated by the following two ideas. First, although recent work has demonstrated that fusing randomness can improve the robustness of neural networks (Liu 2017), we noticed that adding noise blindly to all the layers is not the optimal way to incorporate randomness. Instead, we model randomness under the framework of Bayesian Neural Network (BNN) to formally learn the posterior distribution of models in a scalable way. Second, we formulate the mini-max problem in BNN to learn the best model distribution under adversarial attacks, leading to an adversarial-trained Bayesian neural net. Experimental results demonstrate that the proposed algorithm achieves state-of-the-art performance under strong attacks. On CIFAR-10 with a VGG network, our model leads to a 14% accuracy improvement compared with adversarial training (Madry 2017) and random self-ensemble (Liu 2017) under PGD attack with $0.035$ distortion, and the gap becomes even larger on a subset of ImageNet.
Tasks	Adversarial Defense
Published	2018-10-01
URL	https://arxiv.org/abs/1810.01279v2
PDF	https://arxiv.org/pdf/1810.01279v2.pdf
PWC	https://paperswithcode.com/paper/adv-bnn-improved-adversarial-defense-through
Repo
Framework

Online Model Distillation for Efficient Video Inference


Title	Online Model Distillation for Efficient Video Inference
Authors	Ravi Teja Mullapudi, Steven Chen, Keyi Zhang, Deva Ramanan, Kayvon Fatahalian
Abstract	High-quality computer vision models typically address the problem of understanding the general distribution of real-world images. However, most cameras observe only a very small fraction of this distribution. This offers the possibility of achieving more efficient inference by specializing compact, low-cost models to the specific distribution of frames observed by a single camera. In this paper, we employ the technique of model distillation (supervising a low-cost student model using the output of a high-cost teacher) to specialize accurate, low-cost semantic segmentation models to a target video stream. Rather than learn a specialized student model on offline data from the video stream, we train the student in an online fashion on the live video, intermittently running the teacher to provide a target for learning. Online model distillation yields semantic segmentation models that closely approximate their Mask R-CNN teacher with 7 to 17$\times$ lower inference runtime cost (11 to 26$\times$ in FLOPs), even when the target video’s distribution is non-stationary. Our method requires no offline pretraining on the target video stream, achieves higher accuracy and lower cost than solutions based on flow or video object segmentation, and can exhibit better temporal stability than the original teacher. We also provide a new video dataset for evaluating the efficiency of inference over long running video streams.
Tasks	Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation
Published	2018-12-06
URL	https://arxiv.org/abs/1812.02699v2
PDF	https://arxiv.org/pdf/1812.02699v2.pdf
PWC	https://paperswithcode.com/paper/online-model-distillation-for-efficient-video
Repo
Framework
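
The online training loop described in the abstract above can be illustrated with a deliberately tiny stand-in. In this sketch the "teacher" is a fixed linear function and the "student" is a two-parameter linear model updated by one gradient step whenever the teacher runs; the scalar frames, the fixed `teacher_every` schedule, and the learning rate are all assumptions made for illustration, whereas the paper works with segmentation networks on real video.

```python
def teacher(x):
    """Stand-in for the expensive model (Mask R-CNN in the abstract)."""
    return 2.0 * x + 1.0


def run_online_distillation(frames, teacher_every=4, lr=0.5):
    """Predict every frame with the student; consult the teacher only intermittently."""
    w, b = 0.0, 0.0  # student parameters, trained from scratch on the stream
    teacher_calls = 0
    predictions = []
    for i, x in enumerate(frames):
        if i % teacher_every == 0:
            # Teacher output becomes the training target for this frame.
            target = teacher(x)
            teacher_calls += 1
            error = (w * x + b) - target
            # One SGD step on the squared error.
            w -= lr * error * x
            b -= lr * error
        predictions.append(w * x + b)
    return predictions, teacher_calls, (w, b)
```

The cost saving comes from the ratio of student-only frames to teacher frames; a real system would presumably adapt how often the teacher runs rather than use a fixed interval.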

DNQ: Dynamic Network Quantization


Title	DNQ: Dynamic Network Quantization
Authors	Yuhui Xu, Shuai Zhang, Yingyong Qi, Jiaxian Guo, Weiyao Lin, Hongkai Xiong
Abstract	Network quantization is an effective method for the deployment of neural networks on memory and energy constrained mobile devices. In this paper, we propose a Dynamic Network Quantization (DNQ) framework which is composed of two modules: a bit-width controller and a quantizer. Unlike most existing quantization methods that use a universal quantization bit-width for the whole network, we utilize policy gradient to train an agent to learn the bit-width of each layer by the bit-width controller. This controller can make a trade-off between accuracy and compression ratio. Given the quantization bit-width sequence, the quantizer adopts the quantization distance as the criterion of the weights' importance during quantization. We extensively validate the proposed approach on various main-stream neural networks and obtain impressive results.
Tasks	Quantization
Published	2018-12-06
URL	http://arxiv.org/abs/1812.02375v1
PDF	http://arxiv.org/pdf/1812.02375v1.pdf
PWC	https://paperswithcode.com/paper/dnq-dynamic-network-quantization
Repo
Framework
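
To make the notion of a per-layer bit-width concrete, here is a minimal sketch of plain uniform quantization applied with a different number of bits to each layer. It is not the DNQ quantizer: the bit-widths are supplied by hand instead of being chosen by a policy-gradient controller, the quantization-distance criterion is not modeled, and all names are hypothetical.

```python
def quantize_uniform(weights, bits):
    """Snap each weight to one of 2**bits evenly spaced levels spanning [min, max]."""
    lo, hi = min(weights), max(weights)
    levels = 2 ** bits - 1
    if hi == lo or levels == 0:
        # Degenerate case: a single representable value.
        return [lo for _ in weights]
    step = (hi - lo) / levels
    return [lo + round((w - lo) / step) * step for w in weights]


def quantize_network(layers, layer_bits):
    """Quantize every layer with its own bit-width (given here, learned in DNQ)."""
    return {name: quantize_uniform(w, layer_bits[name]) for name, w in layers.items()}
```

Lower bit-widths shrink the set of distinct values a layer can store, which is the source of both the compression and the accuracy loss that the controller has to trade off.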

Meta Learning Deep Visual Words for Fast Video Object Segmentation


Title	Meta Learning Deep Visual Words for Fast Video Object Segmentation
Authors	Harkirat Singh Behl, Mohammad Najafi, Anurag Arnab, Philip H. S. Torr
Abstract	Accurate video object segmentation methods finetune a model using the first annotated frame, and/or use additional inputs such as optical flow and complex post-processing. In contrast, we develop a fast algorithm that requires no finetuning, auxiliary inputs or post-processing, and segments a variable number of objects in a single forward-pass. We represent an object with clusters, or “visual words”, in the embedding space, which correspond to object parts in the image space. This allows us to robustly match to the reference objects throughout the video, because although the global appearance of an object changes as it undergoes occlusions and deformations, the appearance of more local parts may stay consistent. We learn these visual words in an unsupervised manner, using meta-learning to ensure that our training objective matches our inference procedure. We achieve comparable accuracy to finetuning based methods, and state-of-the-art in terms of speed/accuracy trade-offs on four video segmentation datasets.
Tasks	Meta-Learning, Optical Flow Estimation, Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation
Published	2018-12-04
URL	http://arxiv.org/abs/1812.01397v2
PDF	http://arxiv.org/pdf/1812.01397v2.pdf
PWC	https://paperswithcode.com/paper/meta-learning-deep-visual-words-for-fast
Repo
Framework

Safe Reinforcement Learning via Probabilistic Shields


Title	Safe Reinforcement Learning via Probabilistic Shields
Authors	Nils Jansen, Bettina Könighofer, Sebastian Junges, Alexandru C. Serban, Roderick Bloem
Abstract	This paper targets the efficient construction of a safety shield for decision making in scenarios that incorporate uncertainty. Markov decision processes (MDPs) are prominent models to capture such planning problems. Reinforcement learning (RL) is a machine learning technique to determine near-optimal policies in MDPs that may be unknown prior to exploring the model. However, during exploration, RL is prone to induce behavior that is undesirable or not allowed in safety- or mission-critical contexts. We introduce the concept of a probabilistic shield that enables decision-making to adhere to safety constraints with high probability. In a separation of concerns, we employ formal verification to efficiently compute the probabilities of critical decisions within a safety-relevant fragment of the MDP. We use these results to realize a shield that is applied to an RL algorithm which then optimizes the actual performance objective. We discuss tradeoffs between sufficient progress in exploration of the environment and ensuring safety. In our experiments, we demonstrate on the arcade game PAC-MAN and on a case study involving service robots that the learning efficiency increases as the learning needs orders of magnitude fewer episodes.
Tasks	Decision Making, Safe Exploration
Published	2018-07-16
URL	https://arxiv.org/abs/1807.06096v2
PDF	https://arxiv.org/pdf/1807.06096v2.pdf
PWC	https://paperswithcode.com/paper/shielded-decision-making-in-mdps
Repo
Framework
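
The shielding idea in the abstract above can be sketched as a filter in front of an ordinary epsilon-greedy action choice. The per-action safety probabilities are hard-coded inputs here, whereas the paper computes such probabilities by formal verification, and the specific rule used below (keep actions within a factor `delta` of the safest one) is an assumption for illustration, not necessarily the paper's definition.

```python
import random


def shielded_actions(safety_prob, delta):
    """Keep actions whose safety probability is at least delta times the best one."""
    best = max(safety_prob.values())
    return [a for a, p in safety_prob.items() if p >= delta * best]


def choose_action(q_values, safety_prob, delta, epsilon, rng=random):
    """Epsilon-greedy selection restricted to the actions the shield allows."""
    allowed = shielded_actions(safety_prob, delta)
    if rng.random() < epsilon:
        return rng.choice(allowed)  # explore, but only among allowed actions
    return max(allowed, key=lambda a: q_values[a])  # exploit
```

With `delta` between 0 and 1 the safest action is always allowed, so the agent is never left without a choice; values near 1 restrict exploration more strongly, which reflects the trade-off between progress and safety mentioned in the abstract.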

Deep neural network ensemble by data augmentation and bagging for skin lesion classification


Title	Deep neural network ensemble by data augmentation and bagging for skin lesion classification
Authors	Manik Goyal, Jagath C. Rajapakse
Abstract	This work summarizes our submission for Task 3: Disease Classification of the ISIC 2018 challenge in Skin Lesion Analysis Towards Melanoma Detection. We use a novel deep neural network (DNN) ensemble architecture introduced by us that can effectively classify skin lesions by using data augmentation and bagging to address paucity of data and prevent over-fitting. The ensemble is composed of two DNN architectures: Inception-v4 and Inception-Resnet-v2. The DNN architectures are combined into an ensemble by using a $1\times1$ convolution for fusion in a meta-learning layer.
Tasks	Data Augmentation, Meta-Learning, Skin Lesion Classification
Published	2018-07-15
URL	http://arxiv.org/abs/1807.05496v2
PDF	http://arxiv.org/pdf/1807.05496v2.pdf
PWC	https://paperswithcode.com/paper/deep-neural-network-ensemble-by-data
Repo
Framework
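
One way to read the $1\times1$ convolution fusion mentioned above is as a learned weighted sum across models, applied independently to each class. The sketch below assumes that layout (models as channels, classes as positions) and uses fixed example weights; the paper's actual tensor layout and learned weights are not given in the abstract, so treat the names and numbers as hypothetical.

```python
def fuse_1x1(model_outputs, weights, bias=0.0):
    """Mix per-class scores across models with one weight per model (a 1x1 convolution)."""
    n_classes = len(model_outputs[0])
    return [
        sum(w * out[c] for w, out in zip(weights, model_outputs)) + bias
        for c in range(n_classes)
    ]
```

With equal weights this reduces to simple averaging of the two networks; learning the weights lets the ensemble favor whichever network is more reliable.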

Heterogeneity Aware Deep Embedding for Mobile Periocular Recognition


Title	Heterogeneity Aware Deep Embedding for Mobile Periocular Recognition
Authors	Rishabh Garg, Yashasvi Baweja, Soumyadeep Ghosh, Mayank Vatsa, Richa Singh, Nalini Ratha
Abstract	Mobile biometric approaches provide the convenience of secure authentication with an omnipresent technology. However, this brings an additional challenge of recognizing biometric patterns in unconstrained environments, including variations in mobile camera sensors, illumination conditions, and capture distance. To address the heterogeneous challenge, this research presents a novel heterogeneity aware loss function within a deep learning framework. The effectiveness of the proposed loss function is evaluated for periocular biometrics using the CSIP, IMP and VISOB mobile periocular databases. The results show that the proposed algorithm yields state-of-the-art results in a heterogeneous environment and improves generalizability for cross-database experiments.
Tasks	Mobile Periocular Recognition
Published	2018-11-02
URL	http://arxiv.org/abs/1811.00846v1
PDF	http://arxiv.org/pdf/1811.00846v1.pdf
PWC	https://paperswithcode.com/paper/heterogeneity-aware-deep-embedding-for-mobile
Repo
Framework

Forming IDEAS Interactive Data Exploration & Analysis System


Title	Forming IDEAS Interactive Data Exploration & Analysis System
Authors	Robert A. Bridges, Maria A. Vincent, Kelly M. T. Huffer, John R. Goodall, Jessie D. Jamieson, Zachary Burch
Abstract	Modern cyber security operations collect an enormous amount of logging and alerting data. While analysts have the ability to query and compute simple statistics and plots from their data, current analytical tools are too simple to admit deep understanding. To detect advanced and novel attacks, analysts turn to manual investigations. While commonplace, current investigations are time-consuming, intuition-based, and are proving insufficient. Our hypothesis is that arming analysts with easy-to-use data science tools will increase their work efficiency, provide them with the ability to resolve hypotheses with scientific inquiry of their data, and support their decisions with evidence over intuition. To this end, we present our work to build IDEAS (Interactive Data Exploration and Analysis System). We present three real-world use-cases that drive the system design from the algorithmic capabilities to the user interface. Finally, a modular and scalable software architecture is discussed along with plans for our pilot deployment with a security operation command.
Tasks
Published	2018-05-24
URL	http://arxiv.org/abs/1805.09676v2
PDF	http://arxiv.org/pdf/1805.09676v2.pdf
PWC	https://paperswithcode.com/paper/forming-ideas-interactive-data-exploration
Repo
Framework