Paper Group AWR 102
Multimodal Generative Models for Scalable Weakly-Supervised Learning
Title | Multimodal Generative Models for Scalable Weakly-Supervised Learning |
Authors | Mike Wu, Noah Goodman |
Abstract | Multiple modalities often co-occur when describing natural phenomena. Learning a joint representation of these modalities should yield deeper and more useful representations. Previous generative approaches to multi-modal input either do not learn a joint distribution or require additional computation to handle missing data. Here, we introduce a multimodal variational autoencoder (MVAE) that uses a product-of-experts inference network and a sub-sampled training paradigm to solve the multi-modal inference problem. Notably, our model shares parameters to efficiently learn under any combination of missing modalities. We apply the MVAE on four datasets and match state-of-the-art performance using many fewer parameters. In addition, we show that the MVAE is directly applicable to weakly-supervised learning, and is robust to incomplete supervision. We then consider two case studies, one of learning image transformations—edge detection, colorization, segmentation—as a set of modalities, followed by one of machine translation between two languages. We find appealing results across this range of tasks. |
Tasks | Colorization, Edge Detection, Machine Translation |
Published | 2018-02-14 |
URL | http://arxiv.org/abs/1802.05335v3 |
http://arxiv.org/pdf/1802.05335v3.pdf | |
PWC | https://paperswithcode.com/paper/multimodal-generative-models-for-scalable |
Repo | https://github.com/YugeTen/QMVAE-mmdgm |
Framework | pytorch |
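The abstract above hinges on a product-of-experts (PoE) inference network. As a rough illustration, the sketch below fuses per-modality Gaussian posteriors by precision-weighted averaging together with a unit-Gaussian prior expert; it is an assumed simplification in PyTorch, not the authors' released code, and the function name is hypothetical.

```python
# Hypothetical PoE fusion over Gaussian experts (illustrative only).
import torch

def poe_gaussian(mus, logvars):
    """Combine per-modality Gaussians q_i(z|x_i) = N(mu_i, var_i) into one
    joint Gaussian by precision-weighted averaging. `mus` and `logvars` are
    lists of (batch, latent_dim) tensors, one per *observed* modality, so
    missing modalities are handled by simply leaving them out."""
    prior_mu = torch.zeros_like(mus[0])            # unit-Gaussian prior expert
    prior_logvar = torch.zeros_like(logvars[0])
    mus, logvars = [prior_mu] + list(mus), [prior_logvar] + list(logvars)

    precisions = [torch.exp(-lv) for lv in logvars]        # 1 / var_i
    joint_var = 1.0 / sum(precisions)
    joint_mu = joint_var * sum(p * m for p, m in zip(precisions, mus))
    return joint_mu, torch.log(joint_var)
```

Because a product of Gaussians is itself Gaussian, this fusion stays closed-form for any subset of observed modalities, which is what makes the shared-parameter, missing-modality training described in the abstract tractable.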
Topology classification with deep learning to improve real-time event selection at the LHC
Title | Topology classification with deep learning to improve real-time event selection at the LHC |
Authors | Thong Q. Nguyen, Daniel Weitekamp III, Dustin Anderson, Roberto Castello, Olmo Cerri, Maurizio Pierini, Maria Spiropulu, Jean-Roch Vlimant |
Abstract | We show how event topology classification based on deep learning could be used to improve the purity of data samples selected in real time at the Large Hadron Collider. We consider different data representations, on which different kinds of multi-class classifiers are trained. Both raw data and high-level features are utilized. In the considered examples, a filter based on the classifier’s score can be trained to retain ~99% of the interesting events and reduce the false-positive rate by as much as one order of magnitude for certain background processes. By operating such a filter as part of the online event selection infrastructure of the LHC experiments, one could benefit from a more flexible and inclusive selection strategy while reducing the amount of downstream resources wasted in processing false positives. The saved resources could be translated into a reduction of the detector operation cost or into an effective increase of storage and processing capabilities, which could be reinvested to extend the physics reach of the LHC experiments. |
Tasks | |
Published | 2018-06-29 |
URL | https://arxiv.org/abs/1807.00083v3 |
https://arxiv.org/pdf/1807.00083v3.pdf | |
PWC | https://paperswithcode.com/paper/topology-classification-with-deep-learning-to |
Repo | https://github.com/Mmiglio/SparkML |
Framework | none |
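To make the filtering step concrete, the snippet below shows one assumed way to turn a trained classifier into an online filter: pick the score threshold that keeps ~99% of signal events on a held-out sample, then measure the surviving background fraction. It is an illustrative sketch, not the paper's selection code.

```python
# Illustrative threshold selection for a classifier-based trigger filter.
import numpy as np

def threshold_for_signal_efficiency(signal_scores, target_efficiency=0.99):
    """signal_scores: classifier scores on held-out signal events.
    Returns the cut value that retains `target_efficiency` of them."""
    return np.quantile(signal_scores, 1.0 - target_efficiency)

# Hypothetical usage (arrays of per-event scores are assumed to exist):
# thr = threshold_for_signal_efficiency(signal_scores)
# background_pass_rate = float(np.mean(background_scores >= thr))
```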
Interpretable and Compositional Relation Learning by Joint Training with an Autoencoder
Title | Interpretable and Compositional Relation Learning by Joint Training with an Autoencoder |
Authors | Ryo Takahashi, Ran Tian, Kentaro Inui |
Abstract | Embedding models for entities and relations are extremely useful for recovering missing facts in a knowledge base. Intuitively, a relation can be modeled by a matrix mapping entity vectors. However, relations reside on low-dimensional sub-manifolds in the parameter space of arbitrary matrices—for one reason, composition of two relations $\boldsymbol{M}_1,\boldsymbol{M}_2$ may match a third $\boldsymbol{M}_3$ (e.g. composition of relations currency_of_country and country_of_film usually matches currency_of_film_budget), which imposes compositional constraints to be satisfied by the parameters (i.e. $\boldsymbol{M}_1\cdot \boldsymbol{M}_2\approx \boldsymbol{M}_3$). In this paper we investigate a dimension reduction technique by training relations jointly with an autoencoder, which is expected to better capture compositional constraints. We achieve state-of-the-art on Knowledge Base Completion tasks with strongly improved Mean Rank, and show that joint training with an autoencoder leads to interpretable sparse codings of relations, helps discover compositional constraints and benefits from compositional training. Our source code is released at github.com/tianran/glimvec. |
Tasks | Dimensionality Reduction, Knowledge Base Completion |
Published | 2018-05-24 |
URL | http://arxiv.org/abs/1805.09547v1 |
http://arxiv.org/pdf/1805.09547v1.pdf | |
PWC | https://paperswithcode.com/paper/interpretable-and-compositional-relation |
Repo | https://github.com/tianran/glimvec |
Framework | none |
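The core idea of the abstract — jointly training relation parameters with an autoencoder so that their low-dimensional structure is captured — can be caricatured as below. This is a loose sketch under my own assumptions (flattened relation matrices, a ReLU code, a squared-error reconstruction term added to the link-prediction loss); the released glimvec model is substantially different.

```python
# Loose sketch: regularize relation matrices through an autoencoder bottleneck.
import torch
import torch.nn as nn

class RelationAutoencoder(nn.Module):
    def __init__(self, n_relations, dim, code_dim):
        super().__init__()
        # One dim x dim matrix per relation, stored flattened.
        self.rel = nn.Parameter(0.01 * torch.randn(n_relations, dim * dim))
        self.encoder = nn.Linear(dim * dim, code_dim)
        self.decoder = nn.Linear(code_dim, dim * dim)

    def reconstruction_loss(self):
        code = torch.relu(self.encoder(self.rel))   # (sparse-ish) relation codes
        recon = self.decoder(code)
        return ((recon - self.rel) ** 2).mean()

# Joint objective (hypothetical): loss = link_prediction_loss + lam * model.reconstruction_loss()
```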
emrQA: A Large Corpus for Question Answering on Electronic Medical Records
Title | emrQA: A Large Corpus for Question Answering on Electronic Medical Records |
Authors | Anusri Pampari, Preethi Raghavan, Jennifer Liang, Jian Peng |
Abstract | We propose a novel methodology to generate domain-specific large-scale question answering (QA) datasets by re-purposing existing annotations for other NLP tasks. We demonstrate an instance of this methodology in generating a large-scale QA dataset for electronic medical records by leveraging existing expert annotations on clinical notes for various NLP tasks from the community shared i2b2 datasets. The resulting corpus (emrQA) has 1 million question-logical form pairs and 400,000+ question-answer evidence pairs. We characterize the dataset and explore its learning potential by training baseline models for question to logical form and question to answer mapping. |
Tasks | Question Answering |
Published | 2018-09-03 |
URL | http://arxiv.org/abs/1809.00732v1 |
http://arxiv.org/pdf/1809.00732v1.pdf | |
PWC | https://paperswithcode.com/paper/emrqa-a-large-corpus-for-question-answering |
Repo | https://github.com/panushri25/emrQA |
Framework | none |
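The generation methodology pairs question templates with logical-form templates and instantiates them from existing annotations. The toy example below illustrates that mechanism only; the template strings, field names, and logical-form syntax are invented for illustration and are not taken from the emrQA corpus.

```python
# Toy illustration of template-based QA generation from existing annotations.
templates = [
    ("Why is the patient taking {medication}?",
     "Reason(MedicationEvent({medication}))"),
    ("When was the patient started on {medication}?",
     "StartDate(MedicationEvent({medication}))"),
]

def generate_qa_pairs(annotation):
    """annotation: dict holding an entity extracted from a clinical note."""
    return [(q.format(**annotation), lf.format(**annotation)) for q, lf in templates]

print(generate_qa_pairs({"medication": "metformin"}))
```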
Differentiable Fine-grained Quantization for Deep Neural Network Compression
Title | Differentiable Fine-grained Quantization for Deep Neural Network Compression |
Authors | Hsin-Pai Cheng, Yuanjun Huang, Xuyang Guo, Yifei Huang, Feng Yan, Hai Li, Yiran Chen |
Abstract | Neural networks have shown great performance in cognitive tasks. When deploying network models on mobile devices with limited resources, weight quantization has been widely adopted. Binary quantization obtains the highest compression but usually results in a large accuracy drop. In practice, 8-bit or 16-bit quantization is often used aiming at maintaining the same accuracy as the original 32-bit precision. We observe that different layers have different sensitivity to quantization. Thus, judiciously selecting different precision for different layers/structures can potentially produce more efficient models compared to traditional quantization methods by striking a better balance between accuracy and compression rate. In this work, we propose a fine-grained quantization approach for deep neural network compression by relaxing the search space of quantization bitwidth from discrete to a continuous domain. The proposed approach applies gradient-descent-based optimization to generate a mixed-precision quantization scheme that outperforms the accuracy of traditional quantization methods under the same compression rate. |
Tasks | Neural Network Compression, Quantization |
Published | 2018-10-20 |
URL | http://arxiv.org/abs/1810.10351v3 |
http://arxiv.org/pdf/1810.10351v3.pdf | |
PWC | https://paperswithcode.com/paper/differentiable-fine-grained-quantization-for |
Repo | https://github.com/newwhitecheng/compress-all-nn |
Framework | tf |
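The key move in the abstract is treating the per-layer bitwidth as a continuous variable that gradient descent can optimize. The sketch below is one assumed way to wire that up in PyTorch: the rounding uses a straight-through estimator for the weights, while the bitwidth stays in the graph through the continuous step size. It is not the paper's exact formulation.

```python
# Assumed sketch of differentiable, continuous-bitwidth quantization.
import torch

def quantize(w, bits):
    """Uniform quantization of w to roughly 2^bits levels.
    Gradients reach `w` via a straight-through estimator and reach `bits`
    via the continuous step size."""
    levels = 2.0 ** bits                               # continuous "number of levels"
    step = 2.0 * w.detach().abs().max() / levels       # step size depends on bits
    q_hard = torch.round(w / step) * step              # hard quantization
    return q_hard + (w - w.detach())                   # straight-through for w

# Hypothetical per-layer learnable bitwidth, starting from 8 bits:
# bits = torch.nn.Parameter(torch.tensor(8.0))
# w_q = quantize(layer.weight, bits)   # combine the task loss with a penalty on `bits`
```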
HandyNet: A One-stop Solution to Detect, Segment, Localize & Analyze Driver Hands
Title | HandyNet: A One-stop Solution to Detect, Segment, Localize & Analyze Driver Hands |
Authors | Akshay Rangesh, Mohan M. Trivedi |
Abstract | Tasks related to human hands have long been part of the computer vision community. Hands, being the primary actuators for humans, convey a lot about activities and intents, in addition to being an alternative form of communication/interaction with other humans and machines. In this study, we focus on training a single feedforward convolutional neural network (CNN) capable of executing many hand related tasks that may be of use in autonomous and semi-autonomous vehicles of the future. The resulting network, which we refer to as HandyNet, is capable of detecting, segmenting and localizing (in 3D) driver hands inside a vehicle cabin. The network is additionally trained to identify handheld objects that the driver may be interacting with. To meet the data requirements to train such a network, we propose a method for cheap annotation based on chroma-keying, thereby bypassing weeks of human effort required to label such data. This process can generate thousands of labeled training samples in an efficient manner, and may be replicated in new environments with relative ease. |
Tasks | Autonomous Vehicles |
Published | 2018-04-20 |
URL | http://arxiv.org/abs/1804.07834v2 |
http://arxiv.org/pdf/1804.07834v2.pdf | |
PWC | https://paperswithcode.com/paper/handynet-a-one-stop-solution-to-detect |
Repo | https://github.com/arangesh/HandyNet |
Framework | tf |
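The cheap-annotation step relies on chroma-keying: if the driver wears a uniformly colored glove, a simple color threshold yields a hand mask without manual labeling. The sketch below shows the general idea with OpenCV; the HSV bounds are illustrative guesses, not values from the paper, whose full annotation pipeline is more involved.

```python
# Illustrative chroma-key mask extraction (assumed thresholds).
import cv2
import numpy as np

def chroma_key_mask(bgr_image,
                    lower_hsv=(40, 60, 60), upper_hsv=(85, 255, 255)):
    """Return a binary mask of pixels falling inside the keyed (green) range."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv,
                       np.array(lower_hsv, dtype=np.uint8),
                       np.array(upper_hsv, dtype=np.uint8))
    kernel = np.ones((5, 5), np.uint8)                 # light morphological cleanup
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
```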
Naturalistic Driver Intention and Path Prediction using Recurrent Neural Networks
Title | Naturalistic Driver Intention and Path Prediction using Recurrent Neural Networks |
Authors | Alex Zyner, Stewart Worrall, Eduardo Nebot |
Abstract | Understanding the intentions of drivers at intersections is a critical component for autonomous vehicles. Urban intersections that do not have traffic signals are a common epicentre of highly variable vehicle movement and interactions. We present a method for predicting driver intent at urban intersections through multi-modal trajectory prediction with uncertainty. Our method is based on recurrent neural networks combined with a mixture density network output layer. To consolidate the multi-modal nature of the output probability distribution, we introduce a clustering algorithm that extracts the set of possible paths that exist in the prediction output, and ranks them according to likelihood. To verify the method’s performance and generalizability, we present a real-world dataset that consists of over 23,000 vehicles traversing five different intersections, collected using a vehicle-mounted Lidar-based tracking system. An array of metrics is used to demonstrate the performance of the model against several baselines. |
Tasks | Autonomous Vehicles, Trajectory Prediction |
Published | 2018-07-26 |
URL | http://arxiv.org/abs/1807.09995v1 |
http://arxiv.org/pdf/1807.09995v1.pdf | |
PWC | https://paperswithcode.com/paper/naturalistic-driver-intention-and-path |
Repo | https://github.com/azyner/radip |
Framework | tf |
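The multi-modal trajectory output comes from a mixture density network head on top of the recurrent encoder. Below is a minimal, assumed sketch of such a head in PyTorch (a 2D Gaussian mixture over future positions with diagonal covariance); it is not the released radip code.

```python
# Minimal mixture density network (MDN) output head (assumed sketch).
import torch
import torch.nn as nn

class MDNHead(nn.Module):
    def __init__(self, hidden_dim, n_mixtures):
        super().__init__()
        self.n = n_mixtures
        # Per mixture component: weight logit (1) + 2D mean (2) + 2D log-std (2).
        self.proj = nn.Linear(hidden_dim, n_mixtures * 5)

    def forward(self, h):
        out = self.proj(h).view(-1, self.n, 5)
        log_pi = torch.log_softmax(out[..., 0], dim=-1)   # mixture weights
        mu = out[..., 1:3]                                # component means
        sigma = torch.exp(out[..., 3:5])                  # positive std devs
        return log_pi, mu, sigma
```

Training would minimize the negative log-likelihood of the observed future position under this mixture; the clustering step described in the abstract then groups the predicted modes into candidate paths ranked by likelihood.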
FixaTons: A collection of Human Fixations Datasets and Metrics for Scanpath Similarity
Title | FixaTons: A collection of Human Fixations Datasets and Metrics for Scanpath Similarity |
Authors | Dario Zanca, Valeria Serchi, Pietro Piu, Francesca Rosini, Alessandra Rufa |
Abstract | In the last three decades, human visual attention has been a topic of great interest in various disciplines. In computer vision, many models have been proposed to predict the distribution of human fixations on a visual stimulus. Recently, thanks to the creation of large collections of data, machine learning algorithms have obtained state-of-the-art performance on the task of saliency map estimation. On the other hand, computational models of scanpath are much less studied. Works are often only descriptive or task specific. This is due to the fact that the scanpath is harder to model because it must include the description of a dynamic. General purpose computational models are present in the literature, but are then evaluated in tasks of saliency prediction, losing therefore information about the dynamics and the behaviour. In addition, two technical reasons have limited the research. The first reason is the lack of a robust and uniformly used set of metrics to compare the similarity between scanpaths. The second reason is the lack of sufficiently large and varied scanpath datasets. In this report we want to help in both directions. We present FixaTons, a large collection of datasets of human scanpaths (temporally ordered sequences of fixations) and saliency maps. It comes along with a software library for easy data usage, statistics calculation and implementation of metrics for scanpath and saliency prediction evaluation. |
Tasks | Saliency Prediction |
Published | 2018-02-07 |
URL | http://arxiv.org/abs/1802.02534v3 |
http://arxiv.org/pdf/1802.02534v3.pdf | |
PWC | https://paperswithcode.com/paper/fixatons-a-collection-of-human-fixations |
Repo | https://github.com/dariozanca/FixaTons |
Framework | none |
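As a flavour of the kind of scanpath metric the library implements, here is one simple, assumed example: the mean Euclidean distance between temporally aligned fixations, compared up to the shorter scanpath length. FixaTons itself ships several metrics; see the repo for the actual implementations.

```python
# Simple illustrative scanpath similarity metric (assumed example).
import numpy as np

def mean_euclidean_distance(scanpath_a, scanpath_b):
    """Each scanpath is an (n_fixations, 2) array of (x, y) positions in
    temporal order; the comparison is truncated to the shorter scanpath."""
    n = min(len(scanpath_a), len(scanpath_b))
    a = np.asarray(scanpath_a, dtype=float)[:n]
    b = np.asarray(scanpath_b, dtype=float)[:n]
    return float(np.mean(np.linalg.norm(a - b, axis=1)))
```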
PDNet: Prior-model Guided Depth-enhanced Network for Salient Object Detection
Title | PDNet: Prior-model Guided Depth-enhanced Network for Salient Object Detection |
Authors | Chunbiao Zhu, Xing Cai, Kan Huang, Thomas H Li, Ge Li |
Abstract | Fully convolutional neural networks (FCNs) have shown outstanding performance in many computer vision tasks including salient object detection. However, there still remain two issues to be addressed in deep learning based saliency detection. One is the lack of a tremendous amount of annotated data to train a network. The other is the lack of robustness for extracting salient objects in images containing complex scenes. In this paper, we present a new architecture, PDNet, a robust prior-model guided depth-enhanced network for RGB-D salient object detection. In contrast to existing works, in which RGB-D values of image pixels are fed directly to a network, the proposed architecture is composed of a master network for processing RGB values, and a sub-network that makes full use of depth cues and incorporates depth-based features into the master network. To overcome the limited size of the labeled RGB-D dataset for training, we employ a large conventional RGB dataset to pre-train the master network, which proves to contribute largely to the final accuracy. Extensive evaluations over five benchmark datasets demonstrate that our proposed method performs favorably against the state-of-the-art approaches. |
Tasks | Object Detection, Saliency Detection, Salient Object Detection |
Published | 2018-03-23 |
URL | http://arxiv.org/abs/1803.08636v2 |
http://arxiv.org/pdf/1803.08636v2.pdf | |
PWC | https://paperswithcode.com/paper/pdnet-prior-model-guided-depth-enhanced |
Repo | https://github.com/ChunbiaoZhu/PDNet |
Framework | none |
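The architecture in the abstract keeps RGB and depth in separate streams and injects depth features into the master network. The toy block below illustrates one plausible fusion point (a 1x1 convolution followed by addition); the channel handling is assumed and does not reproduce the released PDNet.

```python
# Toy depth-into-RGB fusion block (assumed, not the released architecture).
import torch
import torch.nn as nn

class DepthEnhancedBlock(nn.Module):
    def __init__(self, rgb_channels, depth_channels):
        super().__init__()
        self.depth_proj = nn.Conv2d(depth_channels, rgb_channels, kernel_size=1)

    def forward(self, rgb_feat, depth_feat):
        # Inject depth cues into the master (RGB) feature map of matching size.
        return rgb_feat + self.depth_proj(depth_feat)
```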
MDSSD: Multi-scale Deconvolutional Single Shot Detector for Small Objects
Title | MDSSD: Multi-scale Deconvolutional Single Shot Detector for Small Objects |
Authors | Lisha Cui, Rui Ma, Pei Lv, Xiaoheng Jiang, Zhimin Gao, Bing Zhou, Mingliang Xu |
Abstract | For most of the object detectors based on multi-scale feature maps, the shallow layers are rich in fine spatial information and thus mainly responsible for small object detection. The performance of small object detection, however, is still less than satisfactory because of the deficiency of semantic information on shallow feature maps. In this paper, we design a Multi-scale Deconvolutional Single Shot Detector (MDSSD), especially for small object detection. In MDSSD, multiple high-level feature maps at different scales are upsampled simultaneously to increase the spatial resolution. Afterwards, we implement the skip connections with low-level feature maps via a Fusion Block. The fused feature maps, named the Fusion Module, have strong representational power for small instances. It is noteworthy that these high-level feature maps utilized in the Fusion Block preserve both strong semantic information and some fine details of small instances, rather than the top-most layer where the representation of fine details for small objects is potentially wiped out. The proposed framework achieves 77.6% mAP for small object detection on the challenging dataset TT100K with 512 x 512 input, outperforming other detectors by a large margin. Moreover, it can also achieve state-of-the-art results for general object detection on PASCAL VOC2007 test and MS COCO test-dev2015, especially achieving 2 to 5 points improvement on small object categories. |
Tasks | Object Detection, Small Object Detection |
Published | 2018-05-18 |
URL | https://arxiv.org/abs/1805.07009v3 |
https://arxiv.org/pdf/1805.07009v3.pdf | |
PWC | https://paperswithcode.com/paper/mdssd-multi-scale-deconvolutional-single-shot |
Repo | https://github.com/siddhanthaldar/PyTorch_Object_Detection |
Framework | pytorch |
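To make the Fusion Block idea concrete, the sketch below upsamples a high-level feature map by deconvolution and merges it with a low-level map through a lateral skip connection; the layer choices are assumed for illustration and differ from the exact MDSSD block.

```python
# Assumed sketch of a deconvolution + skip-connection fusion block.
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    def __init__(self, high_ch, low_ch, out_ch):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(high_ch, out_ch, kernel_size=2, stride=2)
        self.lateral = nn.Conv2d(low_ch, out_ch, kernel_size=1)
        self.smooth = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, high_feat, low_feat):
        upsampled = self.deconv(high_feat)            # double the spatial resolution
        fused = upsampled + self.lateral(low_feat)    # skip connection from the shallow layer
        return torch.relu(self.smooth(fused))         # fusion map for small-object prediction
```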
Compositional Attention Networks for Machine Reasoning
Title | Compositional Attention Networks for Machine Reasoning |
Authors | Drew A. Hudson, Christopher D. Manning |
Abstract | We present the MAC network, a novel fully differentiable neural network architecture, designed to facilitate explicit and expressive reasoning. MAC moves away from monolithic black-box neural architectures towards a design that encourages both transparency and versatility. The model approaches problems by decomposing them into a series of attention-based reasoning steps, each performed by a novel recurrent Memory, Attention, and Composition (MAC) cell that maintains a separation between control and memory. By stringing the cells together and imposing structural constraints that regulate their interaction, MAC effectively learns to perform iterative reasoning processes that are directly inferred from the data in an end-to-end approach. We demonstrate the model’s strength, robustness and interpretability on the challenging CLEVR dataset for visual reasoning, achieving a new state-of-the-art 98.9% accuracy, halving the error rate of the previous best model. More importantly, we show that the model is computationally-efficient and data-efficient, in particular requiring 5x less data than existing models to achieve strong results. |
Tasks | Visual Reasoning |
Published | 2018-03-08 |
URL | http://arxiv.org/abs/1803.03067v2 |
http://arxiv.org/pdf/1803.03067v2.pdf | |
PWC | https://paperswithcode.com/paper/compositional-attention-networks-for-machine |
Repo | https://github.com/ronilp/mac-network-pytorch-gqa |
Framework | pytorch |
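The heart of the model is the recurrent MAC cell, which keeps control (what to attend to in the question) separate from memory (what has been retrieved so far). The sketch below is a heavily simplified, assumed rendering of one control/read/write step, with the dimensionality handling reduced to the bare minimum; the actual cell has additional projections and gating.

```python
# Heavily simplified MAC-style reasoning step (assumed sketch).
import torch
import torch.nn as nn

class SimpleMACCell(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.control_attn = nn.Linear(dim, 1)
        self.read_attn = nn.Linear(dim, 1)
        self.write = nn.Linear(2 * dim, dim)

    def forward(self, control, memory, question_words, knowledge):
        # Control: attend over question words, conditioned on the previous control state.
        c_logits = self.control_attn(question_words * control.unsqueeze(1))
        control = (torch.softmax(c_logits, dim=1) * question_words).sum(dim=1)
        # Read: attend over knowledge-base items, conditioned on memory and control.
        r_logits = self.read_attn(knowledge * (memory * control).unsqueeze(1))
        retrieved = (torch.softmax(r_logits, dim=1) * knowledge).sum(dim=1)
        # Write: integrate the retrieved information into the new memory state.
        memory = self.write(torch.cat([memory, retrieved], dim=-1))
        return control, memory
```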
IIIDYT at IEST 2018: Implicit Emotion Classification With Deep Contextualized Word Representations
Title | IIIDYT at IEST 2018: Implicit Emotion Classification With Deep Contextualized Word Representations |
Authors | Jorge A. Balazs, Edison Marrese-Taylor, Yutaka Matsuo |
Abstract | In this paper we describe our system designed for the WASSA 2018 Implicit Emotion Shared Task (IEST), which obtained 2$^{\text{nd}}$ place out of 26 teams with a test macro F1 score of $0.710$. The system is composed of a single pre-trained ELMo layer for encoding words, a Bidirectional Long Short-Term Memory network (BiLSTM) for enriching word representations with context, a max-pooling operation for creating sentence representations from said word vectors, and a Dense Layer for projecting the sentence representations into label space. Our official submission was obtained by ensembling 6 of these models initialized with different random seeds. The code for replicating this paper is available at https://github.com/jabalazs/implicit_emotion. |
Tasks | Emotion Classification |
Published | 2018-08-27 |
URL | http://arxiv.org/abs/1808.08672v2 |
http://arxiv.org/pdf/1808.08672v2.pdf | |
PWC | https://paperswithcode.com/paper/iiidyt-at-iest-2018-implicit-emotion |
Repo | https://github.com/jabalazs/implicit_emotion |
Framework | pytorch |
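The pipeline in the abstract is compact enough to caricature directly: contextual word vectors, a BiLSTM, max-pooling over time, and a dense projection into label space. In the assumed sketch below a plain embedding layer stands in for the pre-trained ELMo encoder; it is not the released implicit_emotion code.

```python
# Assumed sketch of the BiLSTM + max-pooling emotion classifier.
import torch
import torch.nn as nn

class EmotionClassifier(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden_dim, n_labels):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)     # stand-in for ELMo
        self.bilstm = nn.LSTM(emb_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, n_labels)

    def forward(self, token_ids):
        contextual, _ = self.bilstm(self.emb(token_ids))  # (batch, seq, 2*hidden)
        sentence = contextual.max(dim=1).values           # max-pool over time
        return self.out(sentence)                         # logits over emotion labels
```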
Using Deep Learning for Segmentation and Counting within Microscopy Data
Title | Using Deep Learning for Segmentation and Counting within Microscopy Data |
Authors | Carlos X. Hernández, Mohammad M. Sultan, Vijay S. Pande |
Abstract | Cell counting is a ubiquitous, yet tedious task that would greatly benefit from automation. From basic biological questions to clinical trials, cell counts provide key quantitative feedback that drive research. Unfortunately, cell counting is most commonly a manual task and can be time-intensive. The task is made even more difficult due to overlapping cells, existence of multiple focal planes, and poor imaging quality, among other factors. Here, we describe a convolutional neural network approach, using a recently described feature pyramid network combined with a VGG-style neural network, for segmenting and subsequent counting of cells in a given microscopy image. |
Tasks | |
Published | 2018-02-28 |
URL | http://arxiv.org/abs/1802.10548v1 |
http://arxiv.org/pdf/1802.10548v1.pdf | |
PWC | https://paperswithcode.com/paper/using-deep-learning-for-segmentation-and |
Repo | https://github.com/cxhernandez/cellcount |
Framework | pytorch |
CytonMT: an Efficient Neural Machine Translation Open-source Toolkit Implemented in C++
Title | CytonMT: an Efficient Neural Machine Translation Open-source Toolkit Implemented in C++ |
Authors | Xiaolin Wang, Masao Utiyama, Eiichiro Sumita |
Abstract | This paper presents an open-source neural machine translation toolkit named CytonMT (https://github.com/arthurxlw/cytonMt). The toolkit is built from scratch only using C++ and NVIDIA’s GPU-accelerated libraries. The toolkit features training efficiency, code simplicity and translation quality. Benchmarks show that CytonMT accelerates the training speed by 64.5% to 110.8% on neural networks of various sizes, and achieves competitive translation quality. |
Tasks | Machine Translation |
Published | 2018-02-17 |
URL | http://arxiv.org/abs/1802.07170v2 |
http://arxiv.org/pdf/1802.07170v2.pdf | |
PWC | https://paperswithcode.com/paper/cytonmt-an-efficient-neural-machine |
Repo | https://github.com/arthurxlw/cytonMt |
Framework | tf |
Angiodysplasia Detection and Localization Using Deep Convolutional Neural Networks
Title | Angiodysplasia Detection and Localization Using Deep Convolutional Neural Networks |
Authors | Alexey Shvets, Vladimir Iglovikov, Alexander Rakhlin, Alexandr A. Kalinin |
Abstract | Accurate detection and localization of angiodysplasia lesions is an important problem in early stage diagnostics of gastrointestinal bleeding and anemia. The gold standard for angiodysplasia detection and localization is wireless capsule endoscopy. This pill-like device is able to produce thousands of sufficiently high-resolution images during one passage through the gastrointestinal tract. In this paper we present our winning solution for the MICCAI 2017 Endoscopic Vision SubChallenge: Angiodysplasia Detection and Localization, and its further improvements over the state-of-the-art results using several novel deep neural network architectures. It addresses the binary segmentation problem, where every pixel in an image is labeled as an angiodysplasia lesion or background. Then, we analyze the connected components of each predicted mask. Based on this analysis we developed a classifier that predicts angiodysplasia lesions (binary variable) and a detector for their localization (center of a component). In this setting, our approach outperforms other methods in every task subcategory for angiodysplasia detection and localization, thereby providing state-of-the-art results for these problems. The source code for our solution is made publicly available at https://github.com/ternaus/angiodysplasia-segmentation |
Tasks | |
Published | 2018-04-21 |
URL | http://arxiv.org/abs/1804.08024v1 |
http://arxiv.org/pdf/1804.08024v1.pdf | |
PWC | https://paperswithcode.com/paper/angiodysplasia-detection-and-localization |
Repo | https://github.com/ternaus/angiodysplasia-segmentation |
Framework | pytorch |
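The post-processing described in the abstract — turning a predicted binary mask into a presence decision plus lesion centers — boils down to connected-component analysis. The sketch below shows one assumed way to do it with SciPy; the minimum-area threshold is illustrative and not taken from the paper.

```python
# Assumed connected-component post-processing of a predicted binary mask.
import numpy as np
from scipy import ndimage

def detect_lesions(mask, min_area=50):
    """mask: binary (H, W) array from the segmentation network.
    Returns (has_lesion, list of detected component centers and areas)."""
    labeled, n_components = ndimage.label(mask)
    detections = []
    for comp_id in range(1, n_components + 1):
        component = labeled == comp_id
        area = int(component.sum())
        if area >= min_area:                              # drop tiny spurious blobs
            cy, cx = ndimage.center_of_mass(component)
            detections.append({"center": (float(cx), float(cy)), "area": area})
    return len(detections) > 0, detections
```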