Paper Group ANR 1046
Joint Vertebrae Identification and Localization in Spinal CT Images by Combining Short- and Long-Range Contextual Information
Title | Joint Vertebrae Identification and Localization in Spinal CT Images by Combining Short- and Long-Range Contextual Information |
Authors | Haofu Liao, Addisu Mesfin, Jiebo Luo |
Abstract | Automatic vertebrae identification and localization from arbitrary CT images is challenging. Vertebrae usually share similar morphological appearance. Because of pathology and the arbitrary field-of-view of CT scans, one can hardly rely on the existence of some anchor vertebrae or parametric methods to model the appearance and shape. To solve the problem, we argue that one should make use of the short-range contextual information, such as the presence of some nearby organs (if any), to roughly estimate the target vertebrae; due to the unique anatomic structure of the spine column, vertebrae have fixed sequential order which provides the important long-range contextual information to further calibrate the results. We propose a robust and efficient vertebrae identification and localization system that can inherently learn to incorporate both the short-range and long-range contextual information in a supervised manner. To this end, we develop a multi-task 3D fully convolutional neural network (3D FCN) to effectively extract the short-range contextual information around the target vertebrae. For the long-range contextual information, we propose a multi-task bidirectional recurrent neural network (Bi-RNN) to encode the spatial and contextual information among the vertebrae of the visible spine column. We demonstrate the effectiveness of the proposed approach on a challenging dataset and the experimental results show that our approach outperforms the state-of-the-art methods by a significant margin. |
Tasks | Joint Vertebrae Identification and Localization in Spinal CT Images |
Published | 2018-12-09 |
URL | http://arxiv.org/abs/1812.03500v1 |
http://arxiv.org/pdf/1812.03500v1.pdf | |
PWC | https://paperswithcode.com/paper/joint-vertebrae-identification-and |
Repo | |
Framework | |
Precise Temporal Action Localization by Evolving Temporal Proposals
Title | Precise Temporal Action Localization by Evolving Temporal Proposals |
Authors | Haonan Qiu, Yingbin Zheng, Hao Ye, Yao Lu, Feng Wang, Liang He |
Abstract | Locating actions in long untrimmed videos has been a challenging problem in video content analysis. The performances of existing action localization approaches remain unsatisfactory in precisely determining the beginning and the end of an action. Imitating the human perception procedure with observations and refinements, we propose a novel three-phase action localization framework. Our framework is embedded with an Actionness Network to generate initial proposals through frame-wise similarity grouping, and then a Refinement Network to conduct boundary adjustment on these proposals. Finally, the refined proposals are sent to a Localization Network for further fine-grained location regression. The whole process can be deemed as multi-stage refinement using a novel non-local pyramid feature under various temporal granularities. We evaluate our framework on the THUMOS14 benchmark and obtain a significant improvement over state-of-the-art approaches. Specifically, the performance gain is remarkable under precise localization with high IoU thresholds. Our proposed framework achieves mAP@IoU=0.5 of 34.2%. |
Tasks | Action Localization, Temporal Action Localization |
Published | 2018-04-13 |
URL | http://arxiv.org/abs/1804.04803v1 |
http://arxiv.org/pdf/1804.04803v1.pdf | |
PWC | https://paperswithcode.com/paper/precise-temporal-action-localization-by |
Repo | |
Framework | |
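The mAP@IoU figures above rest on temporal intersection-over-union between a predicted segment and a ground-truth segment; a minimal sketch of that metric (the segment endpoints below are illustrative):

```python
def temporal_iou(pred, gt):
    """IoU between two temporal segments given as (start, end) in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

# A proposal counts as correct at threshold 0.5 if temporal_iou >= 0.5.
iou = temporal_iou((10.0, 20.0), (12.0, 22.0))  # overlap 8s, union 12s
```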
Neural machine translation framework based cross-lingual document vector with distance constraint training
Title | Neural machine translation framework based cross-lingual document vector with distance constraint training |
Authors | Wei Li, Brian Mak |
Abstract | A universal cross-lingual representation of documents is very important for many natural language processing tasks. In this paper, we present a document vectorization method which can effectively create document vectors via a self-attention mechanism using a neural machine translation (NMT) framework. The model used by our method can be trained with parallel corpora that are unrelated to the task at hand. During testing, our method will take a monolingual document and convert it into a “Neural machine Translation framework based crosslingual Document Vector with distance constraint training” (cNTDV). cNTDV is a follow-up study from our previous research on the neural machine translation framework based document vector. The cNTDV can produce the document vector from a forward pass of the encoder with fast speed. Moreover, it is trained with a distance constraint, so that the document vectors obtained from different language pairs are always consistent with each other. In a cross-lingual document classification task, our cNTDV embeddings surpass the published state-of-the-art performance in the English-to-German classification test, and, to the best of our knowledge, also achieve the second-best performance in the German-to-English classification test. Compared to our previous research, it does not need a translator in the testing process, which makes the model faster and more convenient. |
Tasks | Cross-Lingual Document Classification, Document Classification, Machine Translation |
Published | 2018-07-29 |
URL | http://arxiv.org/abs/1807.11057v2 |
http://arxiv.org/pdf/1807.11057v2.pdf | |
PWC | https://paperswithcode.com/paper/neural-machine-translation-framework-based |
Repo | |
Framework | |
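The abstract does not give the exact form of the distance constraint; a plausible sketch, assuming a squared-L2 penalty that pulls together the document vectors the two encoders produce for a parallel pair (the vectors `v_en`, `v_de` and weight `lam` are illustrative):

```python
import numpy as np

def distance_constraint_loss(vec_src, vec_tgt):
    """Squared-L2 penalty keeping the two languages' document vectors consistent."""
    d = np.asarray(vec_src) - np.asarray(vec_tgt)
    return float(np.dot(d, d))

# During training such a term would be added to the translation objective:
#   total = translation_loss + lam * distance_constraint_loss(v_en, v_de)
v_en = np.array([0.2, 0.4, 0.1])
v_de = np.array([0.2, 0.1, 0.1])
penalty = distance_constraint_loss(v_en, v_de)  # 0.3**2 = 0.09
```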
Highly Automated Learning for Improved Active Safety of Vulnerable Road Users
Title | Highly Automated Learning for Improved Active Safety of Vulnerable Road Users |
Authors | Maarten Bieshaar, Günther Reitberger, Viktor Kreß, Stefan Zernetsch, Konrad Doll, Erich Fuchs, Bernhard Sick |
Abstract | Highly automated driving requires precise models of traffic participants. Many state-of-the-art models are currently based on machine learning techniques. Among others, the required amount of labeled data is one major challenge. An autonomous learning process addressing this problem is proposed. The initial models are iteratively refined in three steps: (1) detection and context identification, (2) novelty detection and active learning, and (3) online model adaption. |
Tasks | Active Learning |
Published | 2018-03-09 |
URL | http://arxiv.org/abs/1803.03479v1 |
http://arxiv.org/pdf/1803.03479v1.pdf | |
PWC | https://paperswithcode.com/paper/highly-automated-learning-for-improved-active |
Repo | |
Framework | |
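Step (2) of the abstract's loop combines novelty detection with active learning; one common realization, sketched here under the assumption of a distance-to-prototype novelty score (the prototypes, features, and threshold are illustrative, not from the paper):

```python
import numpy as np

def novelty_and_query(feats, prototypes, novel_thresh=1.0, budget=1):
    """Flag samples far from every known class prototype as novel, and spend
    the labeling budget on the most novel samples first."""
    d = np.linalg.norm(feats[:, None, :] - prototypes[None, :, :], axis=2)
    nearest = d.min(axis=1)                 # distance to closest known class
    novel = nearest > novel_thresh
    query = np.argsort(-nearest)[:budget]   # indices to send to the oracle
    return novel, query

protos = np.array([[0.0, 0.0], [3.0, 0.0]])     # known road-user classes
feats = np.array([[0.1, 0.2], [1.5, 1.5], [3.1, 0.0]])
novel, query = novelty_and_query(feats, protos)
```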
Elastic CRFs for Open-ontology Slot Filling
Title | Elastic CRFs for Open-ontology Slot Filling |
Authors | Yinpei Dai, Yichi Zhang, Zhijian Ou, Yanmeng Wang, Junlan Feng |
Abstract | Slot filling is a crucial component in task-oriented dialog systems, which is to parse (user) utterances into semantic concepts called slots. An ontology is defined by the collection of slots and the values that each slot can take. The widely-used practice of treating slot filling as a sequence labeling task suffers from two drawbacks. First, the ontology is usually pre-defined and fixed. Most current methods are unable to predict new labels for unseen slots. Second, the one-hot encoding of slot labels ignores the semantic meanings and relations for slots, which are implicit in their natural language descriptions. These observations motivate us to propose a novel model called elastic conditional random field (eCRF), for open-ontology slot filling. eCRFs can leverage the neural features of both the utterance and the slot descriptions, and are able to model the interactions between different slots. Experimental results show that eCRFs outperform existing models on both the in-domain and the cross-domain tasks, especially in predictions of unseen slots and values. |
Tasks | Slot Filling |
Published | 2018-11-04 |
URL | http://arxiv.org/abs/1811.01331v1 |
http://arxiv.org/pdf/1811.01331v1.pdf | |
PWC | https://paperswithcode.com/paper/elastic-crfs-for-open-ontology-slot-filling |
Repo | |
Framework | |
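The eCRF's defining trick per the abstract is to score tokens against slot *descriptions*, so an unseen slot only needs a description embedding rather than new parameters. A hedged sketch of that scoring step (all embeddings are illustrative; the actual eCRF also models slot interactions, omitted here):

```python
import numpy as np

def slot_scores(token_feats, slot_desc_embs):
    """Score each token against each slot via dot product with the embedding
    of the slot's natural-language description; a new slot just adds a row."""
    return token_feats @ slot_desc_embs.T

tokens = np.array([[1.0, 0.0], [0.0, 1.0]])   # 2 token features, dim 2
slots = np.array([[0.9, 0.1], [0.2, 0.8]])    # 2 slot-description embeddings
pred = slot_scores(tokens, slots).argmax(axis=1)  # best slot per token
```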
Variational learning across domains with triplet information
Title | Variational learning across domains with triplet information |
Authors | Rita Kuznetsova, Oleg Bakhteev, Alexandr Ogaltsov |
Abstract | The work investigates deep generative models that allow us to use training data from one domain to build a model for another domain. We propose the Variational Bi-domain Triplet Autoencoder (VBTA) that learns a joint distribution of objects from different domains. We extend the VBTA's objective function with relative constraints, or triplets, sampled from the shared latent space across domains. In other words, we combine deep generative models with metric learning ideas in order to improve the final objective with the triplet information. The performance of the VBTA model is demonstrated on different tasks: image-to-image translation, bi-directional image generation and cross-lingual document classification. |
Tasks | Cross-Lingual Document Classification, Document Classification, Image Generation, Image-to-Image Translation, Metric Learning |
Published | 2018-06-22 |
URL | http://arxiv.org/abs/1806.08672v2 |
http://arxiv.org/pdf/1806.08672v2.pdf | |
PWC | https://paperswithcode.com/paper/variational-learning-across-domains-with |
Repo | |
Framework | |
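The triplet constraint the abstract refers to is the standard hinge formulation from metric learning; a sketch with illustrative latent vectors (`anchor`/`positive` standing in for same-content latents from the two domains):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss pushing the anchor closer to the positive than the negative."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # same-content latent from the other domain
n = np.array([2.0, 0.0])   # unrelated latent
loss = triplet_loss(a, p, n)  # 0.1 - 2.0 + 1.0 < 0, so the hinge is inactive
```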
A Log-Euclidean and Total Variation based Variational Framework for Computational Sonography
Title | A Log-Euclidean and Total Variation based Variational Framework for Computational Sonography |
Authors | Jyotirmoy Banerjee, Premal A. Patel, Fred Ushakov, Donald Peebles, Jan Deprest, Sebastien Ourselin, David Hawkes, Tom Vercauteren |
Abstract | We propose a spatial compounding technique and variational framework to improve 3D ultrasound image quality by compositing multiple ultrasound volumes acquired from different probe orientations. In the composite volume, instead of intensity values, we estimate a tensor at every voxel. The resultant tensor image encapsulates the directional information of the underlying imaging data and can be used to generate ultrasound volumes from arbitrary, potentially unseen, probe positions. Extending the work of Hennersperger et al., we introduce a log-Euclidean framework to ensure that the tensors are positive-definite, eventually ensuring non-negative images. Additionally, we regularise the underpinning ill-posed variational problem while preserving edge information by relying on a total variation penalisation of the tensor field in the log domain. We present results on in vivo human data to show the efficacy of the approach. |
Tasks | |
Published | 2018-02-06 |
URL | http://arxiv.org/abs/1802.02088v1 |
http://arxiv.org/pdf/1802.02088v1.pdf | |
PWC | https://paperswithcode.com/paper/a-log-euclidean-and-total-variation-based |
Repo | |
Framework | |
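The positive-definiteness guarantee mentioned in the abstract follows from compounding tensors in the matrix-log domain; a sketch of log-Euclidean averaging for symmetric positive-definite tensors (the toy tensors are illustrative):

```python
import numpy as np

def logm_spd(m):
    """Matrix logarithm of a symmetric positive-definite matrix via eigh."""
    w, v = np.linalg.eigh(m)
    return (v * np.log(w)) @ v.T

def expm_sym(m):
    """Matrix exponential of a symmetric matrix via eigh."""
    w, v = np.linalg.eigh(m)
    return (v * np.exp(w)) @ v.T

def log_euclidean_mean(tensors):
    """Average SPD tensors in the log domain; the result stays SPD."""
    return expm_sym(np.mean([logm_spd(t) for t in tensors], axis=0))

t1 = np.diag([1.0, 4.0])
t2 = np.diag([4.0, 1.0])
mean = log_euclidean_mean([t1, t2])  # geometric mean per eigen-direction
```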
Adv-BNN: Improved Adversarial Defense through Robust Bayesian Neural Network
Title | Adv-BNN: Improved Adversarial Defense through Robust Bayesian Neural Network |
Authors | Xuanqing Liu, Yao Li, Chongruo Wu, Cho-Jui Hsieh |
Abstract | We present a new algorithm to train a robust neural network against adversarial attacks. Our algorithm is motivated by the following two ideas. First, although recent work has demonstrated that fusing randomness can improve the robustness of neural networks (Liu 2017), we noticed that adding noise blindly to all the layers is not the optimal way to incorporate randomness. Instead, we model randomness under the framework of Bayesian Neural Network (BNN) to formally learn the posterior distribution of models in a scalable way. Second, we formulate the mini-max problem in BNN to learn the best model distribution under adversarial attacks, leading to an adversarially trained Bayesian neural net. Experimental results demonstrate that the proposed algorithm achieves state-of-the-art performance under strong attacks. On CIFAR-10 with VGG network, our model leads to 14% accuracy improvement compared with adversarial training (Madry 2017) and random self-ensemble (Liu 2017) under PGD attack with $0.035$ distortion, and the gap becomes even larger on a subset of ImageNet. |
Tasks | Adversarial Defense |
Published | 2018-10-01 |
URL | https://arxiv.org/abs/1810.01279v2 |
https://arxiv.org/pdf/1810.01279v2.pdf | |
PWC | https://paperswithcode.com/paper/adv-bnn-improved-adversarial-defense-through |
Repo | |
Framework | |
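The $0.035$-distortion PGD evaluation above can be illustrated on a toy differentiable model; this sketch runs the L-infinity PGD inner attack on a logistic scorer (the model, step size, and inputs are illustrative, and the paper's Bayesian weight sampling is omitted):

```python
import numpy as np

def pgd_attack(x, y, w, eps=0.035, steps=10, alpha=0.01):
    """L-inf PGD on a logistic model p = sigmoid(w.x): ascend the cross-entropy
    loss by signed gradient steps, projected back into the eps-ball around x."""
    x0, x_adv = x.copy(), x.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-np.dot(w, x_adv)))
        grad = (p - y) * w                          # d(cross-entropy)/dx
        x_adv = x_adv + alpha * np.sign(grad)
        x_adv = np.clip(x_adv, x0 - eps, x0 + eps)  # stay within the budget
    return x_adv

w = np.array([1.0, -2.0])
x = np.array([0.5, 0.2])
x_adv = pgd_attack(x, y=1.0, w=w)  # perturbation respects the 0.035 budget
```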
Online Model Distillation for Efficient Video Inference
Title | Online Model Distillation for Efficient Video Inference |
Authors | Ravi Teja Mullapudi, Steven Chen, Keyi Zhang, Deva Ramanan, Kayvon Fatahalian |
Abstract | High-quality computer vision models typically address the problem of understanding the general distribution of real-world images. However, most cameras observe only a very small fraction of this distribution. This offers the possibility of achieving more efficient inference by specializing compact, low-cost models to the specific distribution of frames observed by a single camera. In this paper, we employ the technique of model distillation (supervising a low-cost student model using the output of a high-cost teacher) to specialize accurate, low-cost semantic segmentation models to a target video stream. Rather than learn a specialized student model on offline data from the video stream, we train the student in an online fashion on the live video, intermittently running the teacher to provide a target for learning. Online model distillation yields semantic segmentation models that closely approximate their Mask R-CNN teacher with 7 to 17$\times$ lower inference runtime cost (11 to 26$\times$ in FLOPs), even when the target video’s distribution is non-stationary. Our method requires no offline pretraining on the target video stream, achieves higher accuracy and lower cost than solutions based on flow or video object segmentation, and can exhibit better temporal stability than the original teacher. We also provide a new video dataset for evaluating the efficiency of inference over long running video streams. |
Tasks | Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation |
Published | 2018-12-06 |
URL | https://arxiv.org/abs/1812.02699v2 |
https://arxiv.org/pdf/1812.02699v2.pdf | |
PWC | https://paperswithcode.com/paper/online-model-distillation-for-efficient-video |
Repo | |
Framework | |
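The teacher-in-the-loop training described above can be sketched with a linear student (the toy teacher, learning rate, and frames are illustrative; the demo queries the teacher on every frame so it converges quickly, whereas the paper runs the teacher only intermittently):

```python
import numpy as np

def online_distill(frames, teacher, student_w, teach_every=8, lr=0.5):
    """Run the cheap student on every frame; every `teach_every` frames query
    the expensive teacher and step the student toward its output."""
    for i, x in enumerate(frames):
        pred = student_w @ x
        if i % teach_every == 0:                    # intermittent teacher call
            target = teacher(x)
            student_w = student_w - lr * np.outer(pred - target, x)
    return student_w

true_w = np.array([[2.0, 0.0], [0.0, 3.0]])
teacher = lambda x: true_w @ x
frames = [np.array([1.0, 0.0]), np.array([0.0, 1.0])] * 8
w_final = online_distill(frames, teacher, np.zeros((2, 2)), teach_every=1)
```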
DNQ: Dynamic Network Quantization
Title | DNQ: Dynamic Network Quantization |
Authors | Yuhui Xu, Shuai Zhang, Yingyong Qi, Jiaxian Guo, Weiyao Lin, Hongkai Xiong |
Abstract | Network quantization is an effective method for the deployment of neural networks on memory and energy constrained mobile devices. In this paper, we propose a Dynamic Network Quantization (DNQ) framework which is composed of two modules: a bit-width controller and a quantizer. Unlike most existing quantization methods that use a universal quantization bit-width for the whole network, we use policy gradient to train the bit-width controller as an agent that learns the bit-width of each layer. This controller can make a trade-off between accuracy and compression ratio. Given the quantization bit-width sequence, the quantizer adopts the quantization distance as the criterion of weight importance during quantization. We extensively validate the proposed approach on various mainstream neural networks and obtain impressive results. |
Tasks | Quantization |
Published | 2018-12-06 |
URL | http://arxiv.org/abs/1812.02375v1 |
http://arxiv.org/pdf/1812.02375v1.pdf | |
PWC | https://paperswithcode.com/paper/dnq-dynamic-network-quantization |
Repo | |
Framework | |
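Given the per-layer bit-width sequence chosen by the controller, the quantizer step can be sketched as uniform symmetric quantization (the weights and bit-widths below are illustrative, and this sketch uses a plain max-abs scale rather than the paper's quantization-distance criterion):

```python
import numpy as np

def quantize_layer(w, bits):
    """Uniform symmetric quantization of a weight tensor to 2**(bits-1)-1
    levels per sign; returns the dequantized weights and the scale."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    q = np.round(w / scale)
    return q * scale, scale

# Per-layer bit-widths as the controller might assign them (illustrative).
layers = {"conv1": np.array([0.5, -1.0, 0.25]), "fc": np.array([0.1, -0.05])}
bitwidths = {"conv1": 4, "fc": 2}
quantized = {name: quantize_layer(w, bitwidths[name])[0]
             for name, w in layers.items()}
```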
Meta Learning Deep Visual Words for Fast Video Object Segmentation
Title | Meta Learning Deep Visual Words for Fast Video Object Segmentation |
Authors | Harkirat Singh Behl, Mohammad Najafi, Anurag Arnab, Philip H. S. Torr |
Abstract | Accurate video object segmentation methods finetune a model using the first annotated frame, and/or use additional inputs such as optical flow and complex post-processing. In contrast, we develop a fast algorithm that requires no finetuning, auxiliary inputs or post-processing, and segments a variable number of objects in a single forward-pass. We represent an object with clusters, or “visual words”, in the embedding space, which correspond to object parts in the image space. This allows us to robustly match to the reference objects throughout the video, because although the global appearance of an object changes as it undergoes occlusions and deformations, the appearance of more local parts may stay consistent. We learn these visual words in an unsupervised manner, using meta-learning to ensure that our training objective matches our inference procedure. We achieve comparable accuracy to finetuning based methods, and state-of-the-art in terms of speed/accuracy trade-offs on four video segmentation datasets. |
Tasks | Meta-Learning, Optical Flow Estimation, Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation |
Published | 2018-12-04 |
URL | http://arxiv.org/abs/1812.01397v2 |
http://arxiv.org/pdf/1812.01397v2.pdf | |
PWC | https://paperswithcode.com/paper/meta-learning-deep-visual-words-for-fast |
Repo | |
Framework | |
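Matching pixels to the learned visual words is a nearest-centre assignment in the embedding space; a sketch with illustrative centres and pixel embeddings:

```python
import numpy as np

def assign_visual_words(pixel_embs, words):
    """Label each pixel embedding with its nearest visual word (cluster centre),
    i.e. the object part it most resembles."""
    d = np.linalg.norm(pixel_embs[:, None, :] - words[None, :, :], axis=2)
    return d.argmin(axis=1)

words = np.array([[0.0, 0.0], [1.0, 1.0]])                # 2 learned part centres
pixels = np.array([[0.1, 0.0], [0.9, 1.1], [1.0, 0.9]])   # query embeddings
labels = assign_visual_words(pixels, words)               # part id per pixel
```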
Safe Reinforcement Learning via Probabilistic Shields
Title | Safe Reinforcement Learning via Probabilistic Shields |
Authors | Nils Jansen, Bettina Könighofer, Sebastian Junges, Alexandru C. Serban, Roderick Bloem |
Abstract | This paper targets the efficient construction of a safety shield for decision making in scenarios that incorporate uncertainty. Markov decision processes (MDPs) are prominent models to capture such planning problems. Reinforcement learning (RL) is a machine learning technique to determine near-optimal policies in MDPs that may be unknown prior to exploring the model. However, during exploration, RL is prone to induce behavior that is undesirable or not allowed in safety- or mission-critical contexts. We introduce the concept of a probabilistic shield that enables decision-making to adhere to safety constraints with high probability. In a separation of concerns, we employ formal verification to efficiently compute the probabilities of critical decisions within a safety-relevant fragment of the MDP. We use these results to realize a shield that is applied to an RL algorithm which then optimizes the actual performance objective. We discuss tradeoffs between sufficient progress in exploration of the environment and ensuring safety. In our experiments, we demonstrate on the arcade game PAC-MAN and on a case study involving service robots that learning efficiency increases, as learning requires orders of magnitude fewer episodes. |
Tasks | Decision Making, Safe Exploration |
Published | 2018-07-16 |
URL | https://arxiv.org/abs/1807.06096v2 |
https://arxiv.org/pdf/1807.06096v2.pdf | |
PWC | https://paperswithcode.com/paper/shielded-decision-making-in-mdps |
Repo | |
Framework | |
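Once the model checker has computed per-action violation probabilities, the shield itself is a simple filter in front of the RL policy; a sketch (the Q-values, risk numbers, and threshold are illustrative):

```python
def shield(q_values, violation_prob, threshold=0.1):
    """Keep only actions whose precomputed probability of reaching an unsafe
    state stays below the threshold, then let RL pick among the survivors."""
    safe = [a for a, p in violation_prob.items() if p < threshold]
    if not safe:                      # fall back to the least risky action
        return min(violation_prob, key=violation_prob.get)
    return max(safe, key=lambda a: q_values[a])

q = {"left": 0.9, "right": 0.7, "stay": 0.2}
risk = {"left": 0.5, "right": 0.02, "stay": 0.0}  # from model checking
action = shield(q, risk)  # "left" is blocked despite its high value
```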
Deep neural network ensemble by data augmentation and bagging for skin lesion classification
Title | Deep neural network ensemble by data augmentation and bagging for skin lesion classification |
Authors | Manik Goyal, Jagath C. Rajapakse |
Abstract | This work summarizes our submission for the Task 3: Disease Classification of ISIC 2018 challenge in Skin Lesion Analysis Towards Melanoma Detection. We use a novel deep neural network (DNN) ensemble architecture introduced by us that can effectively classify skin lesions by using data augmentation and bagging to address the paucity of data and prevent over-fitting. The ensemble is composed of two DNN architectures: Inception-v4 and Inception-Resnet-v2. The DNN architectures are combined into an ensemble by using a $1\times1$ convolution for fusion in a meta-learning layer. |
Tasks | Data Augmentation, Meta-Learning, Skin Lesion Classification |
Published | 2018-07-15 |
URL | http://arxiv.org/abs/1807.05496v2 |
http://arxiv.org/pdf/1807.05496v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-neural-network-ensemble-by-data |
Repo | |
Framework | |
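A $1\times1$ convolution over the two networks' stacked outputs reduces, per class, to a learned weighted sum; a sketch with illustrative softmax outputs and fusion weights:

```python
import numpy as np

def fuse(p_a, p_b, w):
    """Fuse two networks' class scores: a 1x1 convolution over the stacked
    outputs is a learned weighted sum across models, applied per class."""
    stacked = np.stack([p_a, p_b])              # shape (2, n_classes)
    logits = (w[:, None] * stacked).sum(axis=0)
    return logits.argmax()

p_inc4 = np.array([0.6, 0.3, 0.1])     # Inception-v4 softmax (illustrative)
p_incres = np.array([0.2, 0.7, 0.1])   # Inception-ResNet-v2 softmax
pred = fuse(p_inc4, p_incres, w=np.array([0.4, 0.6]))  # fused class id
```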
Heterogeneity Aware Deep Embedding for Mobile Periocular Recognition
Title | Heterogeneity Aware Deep Embedding for Mobile Periocular Recognition |
Authors | Rishabh Garg, Yashasvi Baweja, Soumyadeep Ghosh, Mayank Vatsa, Richa Singh, Nalini Ratha |
Abstract | Mobile biometric approaches provide the convenience of secure authentication with an omnipresent technology. However, this brings an additional challenge of recognizing biometric patterns in unconstrained environments, including variations in mobile camera sensors, illumination conditions, and capture distance. To address this heterogeneity challenge, this research presents a novel heterogeneity-aware loss function within a deep learning framework. The effectiveness of the proposed loss function is evaluated for periocular biometrics using the CSIP, IMP and VISOB mobile periocular databases. The results show that the proposed algorithm yields state-of-the-art results in a heterogeneous environment and improves generalizability for cross-database experiments. |
Tasks | Mobile Periocular Recognition |
Published | 2018-11-02 |
URL | http://arxiv.org/abs/1811.00846v1 |
http://arxiv.org/pdf/1811.00846v1.pdf | |
PWC | https://paperswithcode.com/paper/heterogeneity-aware-deep-embedding-for-mobile |
Repo | |
Framework | |
Forming IDEAS Interactive Data Exploration & Analysis System
Title | Forming IDEAS Interactive Data Exploration & Analysis System |
Authors | Robert A. Bridges, Maria A. Vincent, Kelly M. T. Huffer, John R. Goodall, Jessie D. Jamieson, Zachary Burch |
Abstract | Modern cyber security operations collect an enormous amount of logging and alerting data. While analysts have the ability to query and compute simple statistics and plots from their data, current analytical tools are too simple to admit deep understanding. To detect advanced and novel attacks, analysts turn to manual investigations. While commonplace, current investigations are time-consuming, intuition-based, and proving insufficient. Our hypothesis is that arming the analyst with easy-to-use data science tools will increase their work efficiency, provide them with the ability to resolve hypotheses with scientific inquiry of their data, and support their decisions with evidence over intuition. To this end, we present our work to build IDEAS (Interactive Data Exploration and Analysis System). We present three real-world use-cases that drive the system design from the algorithmic capabilities to the user interface. Finally, a modular and scalable software architecture is discussed along with plans for our pilot deployment with a security operation command. |
Tasks | |
Published | 2018-05-24 |
URL | http://arxiv.org/abs/1805.09676v2 |
http://arxiv.org/pdf/1805.09676v2.pdf | |
PWC | https://paperswithcode.com/paper/forming-ideas-interactive-data-exploration |
Repo | |
Framework | |