Paper Group AWR 102
Multimodal Generative Models for Scalable Weakly-Supervised Learning
Title | Multimodal Generative Models for Scalable Weakly-Supervised Learning |
Authors | Mike Wu, Noah Goodman |
Abstract | Multiple modalities often co-occur when describing natural phenomena. Learning a joint representation of these modalities should yield deeper and more useful representations. Previous generative approaches to multi-modal input either do not learn a joint distribution or require additional computation to handle missing data. Here, we introduce a multimodal variational autoencoder (MVAE) that uses a product-of-experts inference network and a sub-sampled training paradigm to solve the multi-modal inference problem. Notably, our model shares parameters to efficiently learn under any combination of missing modalities. We apply the MVAE on four datasets and match state-of-the-art performance using many fewer parameters. In addition, we show that the MVAE is directly applicable to weakly-supervised learning, and is robust to incomplete supervision. We then consider two case studies, one of learning image transformations—edge detection, colorization, segmentation—as a set of modalities, followed by one of machine translation between two languages. We find appealing results across this range of tasks. |
Tasks | Colorization, Edge Detection, Machine Translation |
Published | 2018-02-14 |
URL | http://arxiv.org/abs/1802.05335v3 |
http://arxiv.org/pdf/1802.05335v3.pdf | |
PWC | https://paperswithcode.com/paper/multimodal-generative-models-for-scalable |
Repo | https://github.com/YugeTen/QMVAE-mmdgm |
Framework | pytorch |
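The abstract above hinges on a product-of-experts (PoE) inference network. As a rough illustration, the sketch below fuses per-modality Gaussian posteriors by precision-weighted averaging together with a unit-Gaussian prior expert; it is an assumed simplification in PyTorch, not the authors' released code, and the function name is hypothetical.

```python
# Hypothetical PoE fusion over Gaussian experts (illustrative only).
import torch

def poe_gaussian(mus, logvars):
    """Combine per-modality Gaussians q_i(z|x_i) = N(mu_i, var_i) into one
    joint Gaussian by precision-weighted averaging. `mus` and `logvars` are
    lists of (batch, latent_dim) tensors, one per *observed* modality, so
    missing modalities are handled by simply leaving them out."""
    prior_mu = torch.zeros_like(mus[0])            # unit-Gaussian prior expert
    prior_logvar = torch.zeros_like(logvars[0])
    mus, logvars = [prior_mu] + list(mus), [prior_logvar] + list(logvars)

    precisions = [torch.exp(-lv) for lv in logvars]        # 1 / var_i
    joint_var = 1.0 / sum(precisions)
    joint_mu = joint_var * sum(p * m for p, m in zip(precisions, mus))
    return joint_mu, torch.log(joint_var)
```

Because a product of Gaussians is itself Gaussian, this fusion stays closed-form for any subset of observed modalities, which is what makes the shared-parameter, missing-modality training described in the abstract tractable.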
Topology classification with deep learning to improve real-time event selection at the LHC
Title | Topology classification with deep learning to improve real-time event selection at the LHC |
Authors | Thong Q. Nguyen, Daniel Weitekamp III, Dustin Anderson, Roberto Castello, Olmo Cerri, Maurizio Pierini, Maria Spiropulu, Jean-Roch Vlimant |
Abstract | We show how event topology classification based on deep learning could be used to improve the purity of data samples selected in real time at the Large Hadron Collider. We consider different data representations, on which different kinds of multi-class classifiers are trained. Both raw data and high-level features are utilized. In the considered examples, a filter based on the classifier’s score can be trained to retain ~99% of the interesting events and reduce the false-positive rate by as much as one order of magnitude for certain background processes. By operating such a filter as part of the online event selection infrastructure of the LHC experiments, one could benefit from a more flexible and inclusive selection strategy while reducing the amount of downstream resources wasted in processing false positives. The saved resources could be translated into a reduction of the detector operation cost or into an effective increase of storage and processing capabilities, which could be reinvested to extend the physics reach of the LHC experiments. |
Tasks | |
Published | 2018-06-29 |
URL | https://arxiv.org/abs/1807.00083v3 |
https://arxiv.org/pdf/1807.00083v3.pdf | |
PWC | https://paperswithcode.com/paper/topology-classification-with-deep-learning-to |
Repo | https://github.com/Mmiglio/SparkML |
Framework | none |
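To make the filtering step concrete, the snippet below shows one assumed way to turn a trained classifier into an online filter: pick the score threshold that keeps ~99% of signal events on a held-out sample, then measure the surviving background fraction. It is an illustrative sketch, not the paper's selection code.

```python
# Illustrative threshold selection for a classifier-based trigger filter.
import numpy as np

def threshold_for_signal_efficiency(signal_scores, target_efficiency=0.99):
    """signal_scores: classifier scores on held-out signal events.
    Returns the cut value that retains `target_efficiency` of them."""
    return np.quantile(signal_scores, 1.0 - target_efficiency)

# Hypothetical usage (arrays of per-event scores are assumed to exist):
# thr = threshold_for_signal_efficiency(signal_scores)
# background_pass_rate = float(np.mean(background_scores >= thr))
```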
Interpretable and Compositional Relation Learning by Joint Training with an Autoencoder
Title | Interpretable and Compositional Relation Learning by Joint Training with an Autoencoder |
Authors | Ryo Takahashi, Ran Tian, Kentaro Inui |
Abstract | Embedding models for entities and relations are extremely useful for recovering missing facts in a knowledge base. Intuitively, a relation can be modeled by a matrix mapping entity vectors. However, relations reside on low-dimensional sub-manifolds in the parameter space of arbitrary matrices—for one reason, composition of two relations $\boldsymbol{M}_1,\boldsymbol{M}_2$ may match a third $\boldsymbol{M}_3$ (e.g. composition of relations currency_of_country and country_of_film usually matches currency_of_film_budget), which imposes compositional constraints to be satisfied by the parameters (i.e. $\boldsymbol{M}_1\cdot \boldsymbol{M}_2\approx \boldsymbol{M}_3$). In this paper we investigate a dimension reduction technique by training relations jointly with an autoencoder, which is expected to better capture compositional constraints. We achieve state-of-the-art on Knowledge Base Completion tasks with strongly improved Mean Rank, and show that joint training with an autoencoder leads to interpretable sparse codings of relations, helps discover compositional constraints and benefits from compositional training. Our source code is released at github.com/tianran/glimvec. |
Tasks | Dimensionality Reduction, Knowledge Base Completion |
Published | 2018-05-24 |
URL | http://arxiv.org/abs/1805.09547v1 |
http://arxiv.org/pdf/1805.09547v1.pdf | |
PWC | https://paperswithcode.com/paper/interpretable-and-compositional-relation |
Repo | https://github.com/tianran/glimvec |
Framework | none |
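The core idea of the abstract — jointly training relation parameters with an autoencoder so that their low-dimensional structure is captured — can be caricatured as below. This is a loose sketch under my own assumptions (flattened relation matrices, a ReLU code, a squared-error reconstruction term added to the link-prediction loss); the released glimvec model is substantially different.

```python
# Loose sketch: regularize relation matrices through an autoencoder bottleneck.
import torch
import torch.nn as nn

class RelationAutoencoder(nn.Module):
    def __init__(self, n_relations, dim, code_dim):
        super().__init__()
        # One dim x dim matrix per relation, stored flattened.
        self.rel = nn.Parameter(0.01 * torch.randn(n_relations, dim * dim))
        self.encoder = nn.Linear(dim * dim, code_dim)
        self.decoder = nn.Linear(code_dim, dim * dim)

    def reconstruction_loss(self):
        code = torch.relu(self.encoder(self.rel))   # (sparse-ish) relation codes
        recon = self.decoder(code)
        return ((recon - self.rel) ** 2).mean()

# Joint objective (hypothetical): loss = link_prediction_loss + lam * model.reconstruction_loss()
```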
emrQA: A Large Corpus for Question Answering on Electronic Medical Records
Title | emrQA: A Large Corpus for Question Answering on Electronic Medical Records |
Authors | Anusri Pampari, Preethi Raghavan, Jennifer Liang, Jian Peng |
Abstract | We propose a novel methodology to generate domain-specific large-scale question answering (QA) datasets by re-purposing existing annotations for other NLP tasks. We demonstrate an instance of this methodology in generating a large-scale QA dataset for electronic medical records by leveraging existing expert annotations on clinical notes for various NLP tasks from the community shared i2b2 datasets. The resulting corpus (emrQA) has 1 million question-logical form pairs and 400,000+ question-answer evidence pairs. We characterize the dataset and explore its learning potential by training baseline models for question to logical form and question to answer mapping. |
Tasks | Question Answering |
Published | 2018-09-03 |
URL | http://arxiv.org/abs/1809.00732v1 |
http://arxiv.org/pdf/1809.00732v1.pdf | |
PWC | https://paperswithcode.com/paper/emrqa-a-large-corpus-for-question-answering |
Repo | https://github.com/panushri25/emrQA |
Framework | none |
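The generation methodology pairs question templates with logical-form templates and instantiates them from existing annotations. The toy example below illustrates that mechanism only; the template strings, field names, and logical-form syntax are invented for illustration and are not taken from the emrQA corpus.

```python
# Toy illustration of template-based QA generation from existing annotations.
templates = [
    ("Why is the patient taking {medication}?",
     "Reason(MedicationEvent({medication}))"),
    ("When was the patient started on {medication}?",
     "StartDate(MedicationEvent({medication}))"),
]

def generate_qa_pairs(annotation):
    """annotation: dict holding an entity extracted from a clinical note."""
    return [(q.format(**annotation), lf.format(**annotation)) for q, lf in templates]

print(generate_qa_pairs({"medication": "metformin"}))
```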
Differentiable Fine-grained Quantization for Deep Neural Network Compression
Title | Differentiable Fine-grained Quantization for Deep Neural Network Compression |
Authors | Hsin-Pai Cheng, Yuanjun Huang, Xuyang Guo, Yifei Huang, Feng Yan, Hai Li, Yiran Chen |
Abstract | Neural networks have shown great performance in cognitive tasks. When deploying network models on mobile devices with limited resources, weight quantization has been widely adopted. Binary quantization obtains the highest compression but usually results in a large accuracy drop. In practice, 8-bit or 16-bit quantization is often used aiming at maintaining the same accuracy as the original 32-bit precision. We observe that different layers have different sensitivity to quantization. Thus, judiciously selecting different precision for different layers/structures can potentially produce more efficient models compared to traditional quantization methods by striking a better balance between accuracy and compression rate. In this work, we propose a fine-grained quantization approach for deep neural network compression by relaxing the search space of quantization bitwidth from discrete to a continuous domain. The proposed approach applies gradient-descent-based optimization to generate a mixed-precision quantization scheme that outperforms the accuracy of traditional quantization methods under the same compression rate. |
Tasks | Neural Network Compression, Quantization |
Published | 2018-10-20 |
URL | http://arxiv.org/abs/1810.10351v3 |
http://arxiv.org/pdf/1810.10351v3.pdf | |
PWC | https://paperswithcode.com/paper/differentiable-fine-grained-quantization-for |
Repo | https://github.com/newwhitecheng/compress-all-nn |
Framework | tf |
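The key move in the abstract is treating the per-layer bitwidth as a continuous variable that gradient descent can optimize. The sketch below is one assumed way to wire that up in PyTorch: the rounding uses a straight-through estimator for the weights, while the bitwidth stays in the graph through the continuous step size. It is not the paper's exact formulation.

```python
# Assumed sketch of differentiable, continuous-bitwidth quantization.
import torch

def quantize(w, bits):
    """Uniform quantization of w to roughly 2^bits levels.
    Gradients reach `w` via a straight-through estimator and reach `bits`
    via the continuous step size."""
    levels = 2.0 ** bits                               # continuous "number of levels"
    step = 2.0 * w.detach().abs().max() / levels       # step size depends on bits
    q_hard = torch.round(w / step) * step              # hard quantization
    return q_hard + (w - w.detach())                   # straight-through for w

# Hypothetical per-layer learnable bitwidth, starting from 8 bits:
# bits = torch.nn.Parameter(torch.tensor(8.0))
# w_q = quantize(layer.weight, bits)   # combine the task loss with a penalty on `bits`
```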
HandyNet: A One-stop Solution to Detect, Segment, Localize & Analyze Driver Hands
Title | HandyNet: A One-stop Solution to Detect, Segment, Localize & Analyze Driver Hands |
Authors | Akshay Rangesh, Mohan M. Trivedi |
Abstract | Tasks related to human hands have long been part of the computer vision community. Hands, being the primary actuators for humans, convey a lot about activities and intents, in addition to being an alternative form of communication/interaction with other humans and machines. In this study, we focus on training a single feedforward convolutional neural network (CNN) capable of executing many hand related tasks that may be of use in autonomous and semi-autonomous vehicles of the future. The resulting network, which we refer to as HandyNet, is capable of detecting, segmenting and localizing (in 3D) driver hands inside a vehicle cabin. The network is additionally trained to identify handheld objects that the driver may be interacting with. To meet the data requirements to train such a network, we propose a method for cheap annotation based on chroma-keying, thereby bypassing weeks of human effort required to label such data. This process can generate thousands of labeled training samples in an efficient manner, and may be replicated in new environments with relative ease. |
Tasks | Autonomous Vehicles |
Published | 2018-04-20 |
URL | http://arxiv.org/abs/1804.07834v2 |
http://arxiv.org/pdf/1804.07834v2.pdf | |
PWC | https://paperswithcode.com/paper/handynet-a-one-stop-solution-to-detect |
Repo | https://github.com/arangesh/HandyNet |
Framework | tf |
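The cheap-annotation step relies on chroma-keying: if the driver wears a uniformly colored glove, a simple color threshold yields a hand mask without manual labeling. The sketch below shows the general idea with OpenCV; the HSV bounds are illustrative guesses, not values from the paper, whose full annotation pipeline is more involved.

```python
# Illustrative chroma-key mask extraction (assumed thresholds).
import cv2
import numpy as np

def chroma_key_mask(bgr_image,
                    lower_hsv=(40, 60, 60), upper_hsv=(85, 255, 255)):
    """Return a binary mask of pixels falling inside the keyed (green) range."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv,
                       np.array(lower_hsv, dtype=np.uint8),
                       np.array(upper_hsv, dtype=np.uint8))
    kernel = np.ones((5, 5), np.uint8)                 # light morphological cleanup
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
```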
Naturalistic Driver Intention and Path Prediction using Recurrent Neural Networks
Title | Naturalistic Driver Intention and Path Prediction using Recurrent Neural Networks |
Authors | Alex Zyner, Stewart Worrall, Eduardo Nebot |
Abstract | Understanding the intentions of drivers at intersections is a critical component for autonomous vehicles. Urban intersections that do not have traffic signals are a common epicentre of highly variable vehicle movement and interactions. We present a method for predicting driver intent at urban intersections through multi-modal trajectory prediction with uncertainty. Our method is based on recurrent neural networks combined with a mixture density network output layer. To consolidate the multi-modal nature of the output probability distribution, we introduce a clustering algorithm that extracts the set of possible paths that exist in the prediction output, and ranks them according to likelihood. To verify the method’s performance and generalizability, we present a real-world dataset that consists of over 23,000 vehicles traversing five different intersections, collected using a vehicle-mounted Lidar-based tracking system. An array of metrics is used to demonstrate the performance of the model against several baselines. |
Tasks | Autonomous Vehicles, Trajectory Prediction |
Published | 2018-07-26 |
URL | http://arxiv.org/abs/1807.09995v1 |
http://arxiv.org/pdf/1807.09995v1.pdf | |
PWC | https://paperswithcode.com/paper/naturalistic-driver-intention-and-path |
Repo | https://github.com/azyner/radip |
Framework | tf |
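The multi-modal trajectory output comes from a mixture density network head on top of the recurrent encoder. Below is a minimal, assumed sketch of such a head in PyTorch (a 2D Gaussian mixture over future positions with diagonal covariance); it is not the released radip code.

```python
# Minimal mixture density network (MDN) output head (assumed sketch).
import torch
import torch.nn as nn

class MDNHead(nn.Module):
    def __init__(self, hidden_dim, n_mixtures):
        super().__init__()
        self.n = n_mixtures
        # Per mixture component: weight logit (1) + 2D mean (2) + 2D log-std (2).
        self.proj = nn.Linear(hidden_dim, n_mixtures * 5)

    def forward(self, h):
        out = self.proj(h).view(-1, self.n, 5)
        log_pi = torch.log_softmax(out[..., 0], dim=-1)   # mixture weights
        mu = out[..., 1:3]                                # component means
        sigma = torch.exp(out[..., 3:5])                  # positive std devs
        return log_pi, mu, sigma
```

Training would minimize the negative log-likelihood of the observed future position under this mixture; the clustering step described in the abstract then groups the predicted modes into candidate paths ranked by likelihood.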
FixaTons: A collection of Human Fixations Datasets and Metrics for Scanpath Similarity
Title | FixaTons: A collection of Human Fixations Datasets and Metrics for Scanpath Similarity |
Authors | Dario Zanca, Valeria Serchi, Pietro Piu, Francesca Rosini, Alessandra Rufa |
Abstract | In the last three decades, human visual attention has been a topic of great interest in various disciplines. In computer vision, many models have been proposed to predict the distribution of human fixations on a visual stimulus. Recently, thanks to the creation of large collections of data, machine learning algorithms have obtained state-of-the-art performance on the task of saliency map estimation. On the other hand, computational models of scanpath are much less studied. Works are often only descriptive or task specific. This is due to the fact that the scanpath is harder to model because it must include the description of a dynamic. General purpose computational models are present in the literature, but are then evaluated in tasks of saliency prediction, losing therefore information about the dynamics and the behaviour. In addition, two technical reasons have limited the research. The first reason is the lack of a robust and uniformly used set of metrics to compare the similarity between scanpaths. The second reason is the lack of sufficiently large and varied scanpath datasets. In this report we want to help in both directions. We present FixaTons, a large collection of datasets of human scanpaths (temporally ordered sequences of fixations) and saliency maps. It comes along with a software library for easy data usage, statistics calculation and implementation of metrics for scanpath and saliency prediction evaluation. |
Tasks | Saliency Prediction |
Published | 2018-02-07 |
URL | http://arxiv.org/abs/1802.02534v3 |
http://arxiv.org/pdf/1802.02534v3.pdf | |
PWC | https://paperswithcode.com/paper/fixatons-a-collection-of-human-fixations |
Repo | https://github.com/dariozanca/FixaTons |
Framework | none |
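As a flavour of the kind of scanpath metric the library implements, here is one simple, assumed example: the mean Euclidean distance between temporally aligned fixations, compared up to the shorter scanpath length. FixaTons itself ships several metrics; see the repo for the actual implementations.

```python
# Simple illustrative scanpath similarity metric (assumed example).
import numpy as np

def mean_euclidean_distance(scanpath_a, scanpath_b):
    """Each scanpath is an (n_fixations, 2) array of (x, y) positions in
    temporal order; the comparison is truncated to the shorter scanpath."""
    n = min(len(scanpath_a), len(scanpath_b))
    a = np.asarray(scanpath_a, dtype=float)[:n]
    b = np.asarray(scanpath_b, dtype=float)[:n]
    return float(np.mean(np.linalg.norm(a - b, axis=1)))
```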
PDNet: Prior-model Guided Depth-enhanced Network for Salient Object Detection
Title | PDNet: Prior-model Guided Depth-enhanced Network for Salient Object Detection |
Authors | Chunbiao Zhu, Xing Cai, Kan Huang, Thomas H Li, Ge Li |
Abstract | Fully convolutional neural networks (FCNs) have shown outstanding performance in many computer vision tasks including salient object detection. However, there still remain two issues to be addressed in deep learning based saliency detection. One is the lack of a tremendous amount of annotated data to train a network. The other is the lack of robustness for extracting salient objects in images containing complex scenes. In this paper, we present a new architecture, PDNet, a robust prior-model guided depth-enhanced network for RGB-D salient object detection. In contrast to existing works, in which RGB-D values of image pixels are fed directly to a network, the proposed architecture is composed of a master network for processing RGB values, and a sub-network that makes full use of depth cues and incorporates depth-based features into the master network. To overcome the limited size of the labeled RGB-D dataset for training, we employ a large conventional RGB dataset to pre-train the master network, which proves to contribute largely to the final accuracy. Extensive evaluations over five benchmark datasets demonstrate that our proposed method performs favorably against the state-of-the-art approaches. |
Tasks | Object Detection, Saliency Detection, Salient Object Detection |
Published | 2018-03-23 |
URL | http://arxiv.org/abs/1803.08636v2 |
http://arxiv.org/pdf/1803.08636v2.pdf | |
PWC | https://paperswithcode.com/paper/pdnet-prior-model-guided-depth-enhanced |
Repo | https://github.com/ChunbiaoZhu/PDNet |
Framework | none |
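The architecture in the abstract keeps RGB and depth in separate streams and injects depth features into the master network. The toy block below illustrates one plausible fusion point (a 1x1 convolution followed by addition); the channel handling is assumed and does not reproduce the released PDNet.

```python
# Toy depth-into-RGB fusion block (assumed, not the released architecture).
import torch
import torch.nn as nn

class DepthEnhancedBlock(nn.Module):
    def __init__(self, rgb_channels, depth_channels):
        super().__init__()
        self.depth_proj = nn.Conv2d(depth_channels, rgb_channels, kernel_size=1)

    def forward(self, rgb_feat, depth_feat):
        # Inject depth cues into the master (RGB) feature map of matching size.
        return rgb_feat + self.depth_proj(depth_feat)
```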
MDSSD: Multi-scale Deconvolutional Single Shot Detector for Small Objects
Title | MDSSD: Multi-scale Deconvolutional Single Shot Detector for Small Objects |
Authors | Lisha Cui, Rui Ma, Pei Lv, Xiaoheng Jiang, Zhimin Gao, Bing Zhou, Mingliang Xu |
Abstract | For most of the object detectors based on multi-scale feature maps, the shallow layers are rich in fine spatial information and thus mainly responsible for small object detection. The performance of small object detection, however, is still less than satisfactory because of the deficiency of semantic information on shallow feature maps. In this paper, we design a Multi-scale Deconvolutional Single Shot Detector (MDSSD), especially for small object detection. In MDSSD, multiple high-level feature maps at different scales are upsampled simultaneously to increase the spatial resolution. Afterwards, we implement the skip connections with low-level feature maps via a Fusion Block. The fused feature maps, named the Fusion Module, have strong representational power for small instances. It is noteworthy that these high-level feature maps utilized in the Fusion Block preserve both strong semantic information and some fine details of small instances, rather than the top-most layer where the representation of fine details for small objects is potentially wiped out. The proposed framework achieves 77.6% mAP for small object detection on the challenging dataset TT100K with 512 x 512 input, outperforming other detectors by a large margin. Moreover, it can also achieve state-of-the-art results for general object detection on PASCAL VOC2007 test and MS COCO test-dev2015, especially achieving 2 to 5 points improvement on small object categories. |
Tasks | Object Detection, Small Object Detection |
Published | 2018-05-18 |
URL | https://arxiv.org/abs/1805.07009v3 |
https://arxiv.org/pdf/1805.07009v3.pdf | |
PWC | https://paperswithcode.com/paper/mdssd-multi-scale-deconvolutional-single-shot |
Repo | https://github.com/siddhanthaldar/PyTorch_Object_Detection |
Framework | pytorch |
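To make the Fusion Block idea concrete, the sketch below upsamples a high-level feature map by deconvolution and merges it with a low-level map through a lateral skip connection; the layer choices are assumed for illustration and differ from the exact MDSSD block.

```python
# Assumed sketch of a deconvolution + skip-connection fusion block.
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    def __init__(self, high_ch, low_ch, out_ch):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(high_ch, out_ch, kernel_size=2, stride=2)
        self.lateral = nn.Conv2d(low_ch, out_ch, kernel_size=1)
        self.smooth = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, high_feat, low_feat):
        upsampled = self.deconv(high_feat)            # double the spatial resolution
        fused = upsampled + self.lateral(low_feat)    # skip connection from the shallow layer
        return torch.relu(self.smooth(fused))         # fusion map for small-object prediction
```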
Compositional Attention Networks for Machine Reasoning
Title | Compositional Attention Networks for Machine Reasoning |
Authors | Drew A. Hudson, Christopher D. Manning |
Abstract | We present the MAC network, a novel fully differentiable neural network architecture, designed to facilitate explicit and expressive reasoning. MAC moves away from monolithic black-box neural architectures towards a design that encourages both transparency and versatility. The model approaches problems by decomposing them into a series of attention-based reasoning steps, each performed by a novel recurrent Memory, Attention, and Composition (MAC) cell that maintains a separation between control and memory. By stringing the cells together and imposing structural constraints that regulate their interaction, MAC effectively learns to perform iterative reasoning processes that are directly inferred from the data in an end-to-end approach. We demonstrate the model’s strength, robustness and interpretability on the challenging CLEVR dataset for visual reasoning, achieving a new state-of-the-art 98.9% accuracy, halving the error rate of the previous best model. More importantly, we show that the model is computationally-efficient and data-efficient, in particular requiring 5x less data than existing models to achieve strong results. |
Tasks | Visual Reasoning |
Published | 2018-03-08 |
URL | http://arxiv.org/abs/1803.03067v2 |
http://arxiv.org/pdf/1803.03067v2.pdf | |
PWC | https://paperswithcode.com/paper/compositional-attention-networks-for-machine |
Repo | https://github.com/ronilp/mac-network-pytorch-gqa |
Framework | pytorch |
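The heart of the model is the recurrent MAC cell, which keeps control (what to attend to in the question) separate from memory (what has been retrieved so far). The sketch below is a heavily simplified, assumed rendering of one control/read/write step, with the dimensionality handling reduced to the bare minimum; the actual cell has additional projections and gating.

```python
# Heavily simplified MAC-style reasoning step (assumed sketch).
import torch
import torch.nn as nn

class SimpleMACCell(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.control_attn = nn.Linear(dim, 1)
        self.read_attn = nn.Linear(dim, 1)
        self.write = nn.Linear(2 * dim, dim)

    def forward(self, control, memory, question_words, knowledge):
        # Control: attend over question words, conditioned on the previous control state.
        c_logits = self.control_attn(question_words * control.unsqueeze(1))
        control = (torch.softmax(c_logits, dim=1) * question_words).sum(dim=1)
        # Read: attend over knowledge-base items, conditioned on memory and control.
        r_logits = self.read_attn(knowledge * (memory * control).unsqueeze(1))
        retrieved = (torch.softmax(r_logits, dim=1) * knowledge).sum(dim=1)
        # Write: integrate the retrieved information into the new memory state.
        memory = self.write(torch.cat([memory, retrieved], dim=-1))
        return control, memory
```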
IIIDYT at IEST 2018: Implicit Emotion Classification With Deep Contextualized Word Representations
Title | IIIDYT at IEST 2018: Implicit Emotion Classification With Deep Contextualized Word Representations |
Authors | Jorge A. Balazs, Edison Marrese-Taylor, Yutaka Matsuo |
Abstract | In this paper we describe our system designed for the WASSA 2018 Implicit Emotion Shared Task (IEST), which obtained 2$^{\text{nd}}$ place out of 26 teams with a test macro F1 score of $0.710$. The system is composed of a single pre-trained ELMo layer for encoding words, a Bidirectional Long Short-Term Memory network (BiLSTM) for enriching word representations with context, a max-pooling operation for creating sentence representations from said word vectors, and a Dense Layer for projecting the sentence representations into label space. Our official submission was obtained by ensembling 6 of these models initialized with different random seeds. The code for replicating this paper is available at https://github.com/jabalazs/implicit_emotion. |
Tasks | Emotion Classification |
Published | 2018-08-27 |
URL | http://arxiv.org/abs/1808.08672v2 |
http://arxiv.org/pdf/1808.08672v2.pdf | |
PWC | https://paperswithcode.com/paper/iiidyt-at-iest-2018-implicit-emotion |
Repo | https://github.com/jabalazs/implicit_emotion |
Framework | pytorch |
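The pipeline in the abstract is compact enough to caricature directly: contextual word vectors, a BiLSTM, max-pooling over time, and a dense projection into label space. In the assumed sketch below a plain embedding layer stands in for the pre-trained ELMo encoder; it is not the released implicit_emotion code.

```python
# Assumed sketch of the BiLSTM + max-pooling emotion classifier.
import torch
import torch.nn as nn

class EmotionClassifier(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden_dim, n_labels):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)     # stand-in for ELMo
        self.bilstm = nn.LSTM(emb_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, n_labels)

    def forward(self, token_ids):
        contextual, _ = self.bilstm(self.emb(token_ids))  # (batch, seq, 2*hidden)
        sentence = contextual.max(dim=1).values           # max-pool over time
        return self.out(sentence)                         # logits over emotion labels
```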
Using Deep Learning for Segmentation and Counting within Microscopy Data
Title | Using Deep Learning for Segmentation and Counting within Microscopy Data |
Authors | Carlos X. Hernández, Mohammad M. Sultan, Vijay S. Pande |
Abstract | Cell counting is a ubiquitous, yet tedious task that would greatly benefit from automation. From basic biological questions to clinical trials, cell counts provide key quantitative feedback that drive research. Unfortunately, cell counting is most commonly a manual task and can be time-intensive. The task is made even more difficult due to overlapping cells, existence of multiple focal planes, and poor imaging quality, among other factors. Here, we describe a convolutional neural network approach, using a recently described feature pyramid network combined with a VGG-style neural network, for segmenting and subsequent counting of cells in a given microscopy image. |
Tasks | |
Published | 2018-02-28 |
URL | http://arxiv.org/abs/1802.10548v1 |
http://arxiv.org/pdf/1802.10548v1.pdf | |
PWC | https://paperswithcode.com/paper/using-deep-learning-for-segmentation-and |
Repo | https://github.com/cxhernandez/cellcount |
Framework | pytorch |
CytonMT: an Efficient Neural Machine Translation Open-source Toolkit Implemented in C++
Title | CytonMT: an Efficient Neural Machine Translation Open-source Toolkit Implemented in C++ |
Authors | Xiaolin Wang, Masao Utiyama, Eiichiro Sumita |
Abstract | This paper presents an open-source neural machine translation toolkit named CytonMT (https://github.com/arthurxlw/cytonMt). The toolkit is built from scratch only using C++ and NVIDIA’s GPU-accelerated libraries. The toolkit features training efficiency, code simplicity and translation quality. Benchmarks show that CytonMT accelerates the training speed by 64.5% to 110.8% on neural networks of various sizes, and achieves competitive translation quality. |
Tasks | Machine Translation |
Published | 2018-02-17 |
URL | http://arxiv.org/abs/1802.07170v2 |
http://arxiv.org/pdf/1802.07170v2.pdf | |
PWC | https://paperswithcode.com/paper/cytonmt-an-efficient-neural-machine |
Repo | https://github.com/arthurxlw/cytonMt |
Framework | tf |
Angiodysplasia Detection and Localization Using Deep Convolutional Neural Networks
Title | Angiodysplasia Detection and Localization Using Deep Convolutional Neural Networks |
Authors | Alexey Shvets, Vladimir Iglovikov, Alexander Rakhlin, Alexandr A. Kalinin |
Abstract | Accurate detection and localization of angiodysplasia lesions is an important problem in early stage diagnostics of gastrointestinal bleeding and anemia. The gold standard for angiodysplasia detection and localization is wireless capsule endoscopy. This pill-like device is able to produce thousands of sufficiently high-resolution images during one passage through the gastrointestinal tract. In this paper we present our winning solution for the MICCAI 2017 Endoscopic Vision SubChallenge: Angiodysplasia Detection and Localization, and its further improvements over the state-of-the-art results using several novel deep neural network architectures. It addresses the binary segmentation problem, where every pixel in an image is labeled as an angiodysplasia lesion or background. Then, we analyze the connected components of each predicted mask. Based on this analysis we developed a classifier that predicts angiodysplasia lesions (binary variable) and a detector for their localization (center of a component). In this setting, our approach outperforms other methods in every task subcategory for angiodysplasia detection and localization, thereby providing state-of-the-art results for these problems. The source code for our solution is made publicly available at https://github.com/ternaus/angiodysplasia-segmentation |
Tasks | |
Published | 2018-04-21 |
URL | http://arxiv.org/abs/1804.08024v1 |
http://arxiv.org/pdf/1804.08024v1.pdf | |
PWC | https://paperswithcode.com/paper/angiodysplasia-detection-and-localization |
Repo | https://github.com/ternaus/angiodysplasia-segmentation |
Framework | pytorch |
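The post-processing described in the abstract — turning a predicted binary mask into a presence decision plus lesion centers — boils down to connected-component analysis. The sketch below shows one assumed way to do it with SciPy; the minimum-area threshold is illustrative and not taken from the paper.

```python
# Assumed connected-component post-processing of a predicted binary mask.
import numpy as np
from scipy import ndimage

def detect_lesions(mask, min_area=50):
    """mask: binary (H, W) array from the segmentation network.
    Returns (has_lesion, list of detected component centers and areas)."""
    labeled, n_components = ndimage.label(mask)
    detections = []
    for comp_id in range(1, n_components + 1):
        component = labeled == comp_id
        area = int(component.sum())
        if area >= min_area:                              # drop tiny spurious blobs
            cy, cx = ndimage.center_of_mass(component)
            detections.append({"center": (float(cx), float(cy)), "area": area})
    return len(detections) > 0, detections
```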