October 21, 2019


Paper Group AWR 78


Understanding Convolutional Neural Networks for Text Classification


Title	Understanding Convolutional Neural Networks for Text Classification
Authors	Alon Jacovi, Oren Sar Shalom, Yoav Goldberg
Abstract	We present an analysis into the inner workings of Convolutional Neural Networks (CNNs) for processing text. CNNs used for computer vision can be interpreted by projecting filters into image space, but for discrete sequence inputs CNNs remain a mystery. We aim to understand the method by which the networks process and classify text. We examine common hypotheses to this problem: that filters, accompanied by global max-pooling, serve as ngram detectors. We show that filters may capture several different semantic classes of ngrams by using different activation patterns, and that global max-pooling induces behavior which separates important ngrams from the rest. Finally, we show practical use cases derived from our findings in the form of model interpretability (explaining a trained model by deriving a concrete identity for each filter, bridging the gap between visualization tools in vision tasks and NLP) and prediction interpretability (explaining predictions). Code implementation is available online at github.com/sayaendo/interpreting-cnn-for-text.
Tasks	Text Classification
Published	2018-09-21
URL	https://arxiv.org/abs/1809.08037v2
PDF	https://arxiv.org/pdf/1809.08037v2.pdf
PWC	https://paperswithcode.com/paper/understanding-convolutional-neural-networks
Repo	https://github.com/sayaendo/interpreting-cnn-for-text
Framework	pytorch
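
The n-gram detector hypothesis in the abstract above can be illustrated with a toy example. This is a minimal sketch, not the authors' code: the two-dimensional embeddings, the filter weights, and the helper names (`filter_scores`, `max_pooled_ngram`) are all made up for illustration. It slides one width-2 filter over a sentence and uses global max-pooling to keep the single best-scoring n-gram.

```python
# Toy illustration: one convolution filter followed by global max-pooling
# behaves like an n-gram detector. All numbers below are invented.

EMBEDDINGS = {
    "the":   [0.0, 0.1],
    "movie": [0.1, 0.0],
    "was":   [0.0, 0.0],
    "not":   [1.0, 0.0],
    "good":  [0.0, 1.0],
}

# A width-2 filter: one weight vector per position in the n-gram.
# These weights respond most strongly to "not" followed by "good".
FILTER = [[1.0, 0.0], [0.0, 1.0]]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def filter_scores(tokens, filt):
    """Slide the filter over the sentence and return (score, ngram) pairs."""
    width = len(filt)
    scores = []
    for start in range(len(tokens) - width + 1):
        ngram = tokens[start:start + width]
        score = sum(dot(EMBEDDINGS[w], filt[i]) for i, w in enumerate(ngram))
        scores.append((score, ngram))
    return scores


def max_pooled_ngram(tokens, filt):
    """Global max-pooling keeps only the highest-scoring n-gram."""
    return max(filter_scores(tokens, filt), key=lambda pair: pair[0])


sentence = "the movie was not good".split()
best_score, best_ngram = max_pooled_ngram(sentence, FILTER)
print(best_ngram, best_score)
```

Tracing the max-pooled value back to the n-gram that produced it is the basic idea behind explaining a prediction in this setting; a trained model would have many filters and learned weights.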

Linguistically-Informed Self-Attention for Semantic Role Labeling


Title	Linguistically-Informed Self-Attention for Semantic Role Labeling
Authors	Emma Strubell, Patrick Verga, Daniel Andor, David Weiss, Andrew McCallum
Abstract	Current state-of-the-art semantic role labeling (SRL) uses a deep neural network with no explicit linguistic features. However, prior work has shown that gold syntax trees can dramatically improve SRL decoding, suggesting the possibility of increased accuracy from explicit modeling of syntax. In this work, we present linguistically-informed self-attention (LISA): a neural network model that combines multi-head self-attention with multi-task learning across dependency parsing, part-of-speech tagging, predicate detection and SRL. Unlike previous models which require significant pre-processing to prepare linguistic features, LISA can incorporate syntax using merely raw tokens as input, encoding the sequence only once to simultaneously perform parsing, predicate detection and role labeling for all predicates. Syntax is incorporated by training one attention head to attend to syntactic parents for each token. Moreover, if a high-quality syntactic parse is already available, it can be beneficially injected at test time without re-training our SRL model. In experiments on CoNLL-2005 SRL, LISA achieves new state-of-the-art performance for a model using predicted predicates and standard word embeddings, attaining 2.5 F1 absolute higher than the previous state-of-the-art on newswire and more than 3.5 F1 on out-of-domain data, nearly 10% reduction in error. On CoNLL-2012 English SRL we also show an improvement of more than 2.5 F1. LISA also out-performs the state-of-the-art with contextually-encoded (ELMo) word representations, by nearly 1.0 F1 on news and more than 2.0 F1 on out-of-domain text.
Tasks	Dependency Parsing, Multi-Task Learning, Part-Of-Speech Tagging, Predicate Detection, Semantic Role Labeling, Semantic Role Labeling (predicted predicates), Word Embeddings
Published	2018-04-23
URL	http://arxiv.org/abs/1804.08199v3
PDF	http://arxiv.org/pdf/1804.08199v3.pdf
PWC	https://paperswithcode.com/paper/linguistically-informed-self-attention-for
Repo	https://github.com/strubell/LISA
Framework	tf
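
The idea of training one attention head to attend to each token's syntactic parent can be sketched as a cross-entropy loss over attention rows. This is an illustrative sketch under assumptions: the raw attention scores are taken as given (how LISA computes them is not covered here), and the function name `parent_attention_loss` is hypothetical rather than part of the LISA codebase.

```python
import math


def softmax(scores):
    """Numerically stable softmax over one row of attention scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def parent_attention_loss(score_matrix, parents):
    """Mean cross-entropy between each token's attention row and its parent.

    score_matrix[i][j] is the raw attention score from token i to token j;
    parents[i] is the index of the gold syntactic parent of token i.
    """
    loss = 0.0
    for row, parent in zip(score_matrix, parents):
        loss -= math.log(softmax(row)[parent])
    return loss / len(parents)


# Three tokens with uniform scores: the head has no preference yet.
uniform = [[0.0, 0.0, 0.0] for _ in range(3)]
# Scores sharply peaked on the gold parents.
peaked = [[0.0, 20.0, 0.0], [0.0, 20.0, 0.0], [0.0, 20.0, 0.0]]
gold_parents = [1, 1, 1]

uniform_loss = parent_attention_loss(uniform, gold_parents)
peaked_loss = parent_attention_loss(peaked, gold_parents)
print(uniform_loss, peaked_loss)
```

Minimizing a loss of this shape pushes the head's attention mass onto the parent token, which is also why a parse supplied at test time can stand in for the learned attention row.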

End-to-End Argument Mining for Discussion Threads Based on Parallel Constrained Pointer Architecture


Title	End-to-End Argument Mining for Discussion Threads Based on Parallel Constrained Pointer Architecture
Authors	Gaku Morio, Katsuhide Fujita
Abstract	Argument Mining (AM) is a relatively recent discipline, which concentrates on extracting claims or premises from discourses, and inferring their structures. However, many existing works do not consider micro-level AM studies on discussion threads sufficiently. In this paper, we tackle AM for discussion threads. Our main contributions are as follows: (1) A novel combination scheme focusing on micro-level inner- and inter-post schemes for a discussion thread. (2) Annotation of large-scale civic discussion threads with the scheme. (3) Parallel constrained pointer architecture (PCPA), a novel end-to-end technique to discriminate sentence types, inner-post relations, and inter-post interactions simultaneously. The experimental results demonstrate that our proposed model shows better accuracy in terms of relations extraction, in comparison to existing state-of-the-art models.
Tasks	Argument Mining
Published	2018-09-03
URL	http://arxiv.org/abs/1809.00563v1
PDF	http://arxiv.org/pdf/1809.00563v1.pdf
PWC	https://paperswithcode.com/paper/end-to-end-argument-mining-for-discussion
Repo	https://github.com/EdoFrank/EMNLP2018-ArgMining-Morio
Framework	none

Boosting Trust Region Policy Optimization by Normalizing Flows Policy


Title	Boosting Trust Region Policy Optimization by Normalizing Flows Policy
Authors	Yunhao Tang, Shipra Agrawal
Abstract	We propose to improve trust region policy search with normalizing flows policy. We illustrate that when the trust region is constructed by KL divergence constraints, normalizing flows policy generates samples far from the ‘center’ of the previous policy iterate, which potentially enables better exploration and helps avoid bad local optima. Through extensive comparisons, we show that the normalizing flows policy significantly improves upon baseline architectures especially on high-dimensional tasks with complex dynamics.
Tasks
Published	2018-09-27
URL	http://arxiv.org/abs/1809.10326v3
PDF	http://arxiv.org/pdf/1809.10326v3.pdf
PWC	https://paperswithcode.com/paper/boosting-trust-region-policy-optimization-by
Repo	https://github.com/robintyh1/onpolicybaselines
Framework	tf

Inferring Complementary Products from Baskets and Browsing Sessions


Title	Inferring Complementary Products from Baskets and Browsing Sessions
Authors	Ilya Trofimov
Abstract	Complementary products recommendation is an important problem in e-commerce. Such recommendations increase the average order price and the number of products in baskets. Complementary products are typically inferred from basket data. In this study, we propose the BB2vec model. The BB2vec model learns vector representations of products by analyzing jointly two types of data - Baskets and Browsing sessions (visiting web pages of products). These vector representations are used for making complementary products recommendation. The proposed model alleviates the cold start problem by delivering better recommendations for products having few or no purchases. We show that the BB2vec model has better performance than other models which use only basket data.
Tasks
Published	2018-09-25
URL	http://arxiv.org/abs/1809.09621v1
PDF	http://arxiv.org/pdf/1809.09621v1.pdf
PWC	https://paperswithcode.com/paper/inferring-complementary-products-from-baskets
Repo	https://github.com/IlyaTrofimov/bb2vec
Framework	none

Semantic Part Detection via Matching: Learning to Generalize to Novel Viewpoints from Limited Training Data


Title	Semantic Part Detection via Matching: Learning to Generalize to Novel Viewpoints from Limited Training Data
Authors	Yutong Bai, Qing Liu, Lingxi Xie, Weichao Qiu, Yan Zheng, Alan Yuille
Abstract	Detecting semantic parts of an object is a challenging task in computer vision, particularly because it is hard to construct large annotated datasets due to the difficulty of annotating semantic parts. In this paper we present an approach which learns from a small training dataset of annotated semantic parts, where the object is seen from a limited range of viewpoints, but generalizes to detect semantic parts from a much larger range of viewpoints. Our approach is based on a matching algorithm for finding accurate spatial correspondence between two images, which enables semantic parts annotated on one image to be transplanted to another. In particular, this enables images in the training dataset to be matched to a virtual 3D model of the object (for simplicity, we assume that the object viewpoint can be estimated by standard techniques). Then a clustering algorithm is used to annotate the semantic parts of the 3D virtual model. This virtual 3D model can be used to synthesize annotated images from a large range of viewpoints. These can be matched to images in the test set, using the same matching algorithm, to detect semantic parts in novel viewpoints of the object. Our algorithm is very simple, intuitive, and contains very few parameters. We evaluate our approach in the car subclass of the VehicleSemanticPart dataset. We show it outperforms standard deep network approaches and, in particular, performs much better on novel viewpoints. To facilitate future research, code is available: https://github.com/ytongbai/SemanticPartDetection
Tasks
Published	2018-11-28
URL	https://arxiv.org/abs/1811.11823v4
PDF	https://arxiv.org/pdf/1811.11823v4.pdf
PWC	https://paperswithcode.com/paper/semantic-part-detection-via-matching-learning
Repo	https://github.com/ytongbai/SemanticPartDetection
Framework	pytorch

Gated Path Planning Networks


Title	Gated Path Planning Networks
Authors	Lisa Lee, Emilio Parisotto, Devendra Singh Chaplot, Eric Xing, Ruslan Salakhutdinov
Abstract	Value Iteration Networks (VINs) are effective differentiable path planning modules that can be used by agents to perform navigation while still maintaining end-to-end differentiability of the entire architecture. Despite their effectiveness, they suffer from several disadvantages including training instability, random seed sensitivity, and other optimization problems. In this work, we reframe VINs as recurrent-convolutional networks which demonstrates that VINs couple recurrent convolutions with an unconventional max-pooling activation. From this perspective, we argue that standard gated recurrent update equations could potentially alleviate the optimization issues plaguing VIN. The resulting architecture, which we call the Gated Path Planning Network, is shown to empirically outperform VIN on a variety of metrics such as learning speed, hyperparameter sensitivity, iteration count, and even generalization. Furthermore, we show that this performance gap is consistent across different maze transition types, maze sizes and even show success on a challenging 3D environment, where the planner is only provided with first-person RGB images.
Tasks
Published	2018-06-17
URL	http://arxiv.org/abs/1806.06408v1
PDF	http://arxiv.org/pdf/1806.06408v1.pdf
PWC	https://paperswithcode.com/paper/gated-path-planning-networks
Repo	https://github.com/lileee/gated-path-planning-networks
Framework	pytorch
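
The recurrence that Value Iteration Networks build on can be written as a repeated "local update, then max over actions" step, which is the pattern the abstract above describes as recurrent convolution with a max-pooling activation. The sketch below is plain tabular value iteration on an open grid, not a VIN or the gated variant; the grid size, step reward, and function name `value_iteration` are illustrative assumptions.

```python
# Value iteration on an open grid: each sweep looks at the four neighbours
# of every cell (a local, convolution-like update) and takes the max over
# actions. Rewards and sizes are made up for illustration.

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
STEP_REWARD = -1.0


def value_iteration(rows, cols, goal, iterations):
    values = [[0.0] * cols for _ in range(rows)]
    for _ in range(iterations):
        new_values = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                if (r, c) == goal:
                    continue  # the goal is absorbing and keeps value 0
                q_values = []
                for dr, dc in ACTIONS:
                    nr, nc = r + dr, c + dc
                    if not (0 <= nr < rows and 0 <= nc < cols):
                        nr, nc = r, c  # moving off the grid leaves the agent in place
                    q_values.append(STEP_REWARD + values[nr][nc])
                new_values[r][c] = max(q_values)  # max over the action channel
        values = new_values
    return values


grid_values = value_iteration(rows=4, cols=4, goal=(3, 3), iterations=6)
for row in grid_values:
    print(row)
```

After enough sweeps each cell holds the negative of its shortest-path length to the goal. The paper's proposal, per the abstract, is to replace this fixed max update with standard gated recurrent update equations; that replacement is not shown here.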

Adaptive Path-Integral Autoencoder: Representation Learning and Planning for Dynamical Systems


Title	Adaptive Path-Integral Autoencoder: Representation Learning and Planning for Dynamical Systems
Authors	Jung-Su Ha, Young-Jin Park, Hyeok-Joo Chae, Soon-Seo Park, Han-Lim Choi
Abstract	We present a representation learning algorithm that learns a low-dimensional latent dynamical system from high-dimensional sequential raw data, e.g., video. The framework builds upon recent advances in amortized inference methods that use both an inference network and a refinement procedure to output samples from a variational distribution given an observation sequence, and takes advantage of the duality between control and inference to approximately solve the intractable inference problem using the path integral control approach. The learned dynamical model can be used to predict and plan the future states; we also present the efficient planning method that exploits the learned low-dimensional latent dynamics. Numerical experiments show that the proposed path-integral control based variational inference method leads to tighter lower bounds in statistical model learning of sequential data. The supplementary video: https://youtu.be/xCp35crUoLQ
Tasks	Representation Learning
Published	2018-07-05
URL	http://arxiv.org/abs/1807.02128v4
PDF	http://arxiv.org/pdf/1807.02128v4.pdf
PWC	https://paperswithcode.com/paper/adaptive-path-integral-autoencoder
Repo	https://github.com/yjparkLiCS/18-NIPS-APIAE
Framework	tf

Detecting Zones and Threat on 3D Body for Security in Airports using Deep Machine Learning


Title	Detecting Zones and Threat on 3D Body for Security in Airports using Deep Machine Learning
Authors	Abel Ag Rb Guimaraes, Ghassem Tofighi
Abstract	This research uses a segmentation and classification method to recognize threats in human body-scanner images from airport security. The Department of Homeland Security (DHS) in the USA has a high false alarm rate, produced by its algorithms using today’s scanners at airports. To address this problem, it started a competition on Kaggle asking the scientific community to improve its detection with new algorithms. The dataset used in this research comes from DHS at https://www.kaggle.com/c/passenger-screening-algorithm-challenge/data According to DHS: “This dataset contains a large number of body scans acquired by a new generation of millimeter wave scanner called the High Definition-Advanced Imaging Technology (HD-AIT) system. They are comprised of volunteers wearing different clothing types (from light summer clothes to heavy winter clothes), different body mass indices, different genders, different numbers of threats, and different types of threats”. Using Python as the principal language, preprocessing of the dataset images extracted features from 200 bodies using intensity, intensity differences, and local neighbourhood to produce segmentation regions and to label those regions for use as ground truth in training and test datasets. The regions are subsequently given to a CNN deep learning classifier to predict 17 classes (representing the body zones): zone1, zone2, … zone17, and zones with threat, for a total of 34 zones. The analysis showed that the classifier achieved an accuracy of 98.2863% and a loss of 0.091319, as well as an average of 100% for recall and precision.
Tasks
Published	2018-02-02
URL	http://arxiv.org/abs/1802.00565v2
PDF	http://arxiv.org/pdf/1802.00565v2.pdf
PWC	https://paperswithcode.com/paper/detecting-zones-and-threat-on-3d-body-for
Repo	https://github.com/abelguima/ryerson-capstone-CKME136
Framework	none

Unsupervised Adversarial Visual Level Domain Adaptation for Learning Video Object Detectors from Images


Title	Unsupervised Adversarial Visual Level Domain Adaptation for Learning Video Object Detectors from Images
Authors	Avisek Lahiri, Charan Reddy, Prabir Kumar Biswas
Abstract	Deep learning based object detectors require thousands of diversified bounding box and class annotated examples. Though image object detectors have shown rapid progress in recent years with the release of multiple large-scale static image datasets, object detection on videos still remains an open problem due to scarcity of annotated video frames. Having a robust video object detector is an essential component for video understanding and curating large-scale automated annotations in videos. Domain difference between images and videos makes the transferability of image object detectors to videos sub-optimal. The most common solution is to use weakly supervised annotations where a video frame has to be tagged for presence/absence of object categories. This still takes up manual effort. In this paper we take a step forward by adapting the concept of unsupervised adversarial image-to-image translation to perturb static high quality images to be visually indistinguishable from a set of video frames. We assume the presence of a fully annotated static image dataset and an unannotated video dataset. Object detector is trained on adversarially transformed image dataset using the annotations of the original dataset. Experiments on Youtube-Objects and Youtube-Objects-Subset datasets with two contemporary baseline object detectors reveal that such unsupervised pixel level domain adaptation boosts the generalization performance on video frames compared to direct application of original image object detector. Also, we achieve competitive performance compared to recent baselines of weakly supervised methods. This paper can be seen as an application of image translation for cross domain object detection.
Tasks	Domain Adaptation, Image-to-Image Translation, Object Detection, Video Understanding
Published	2018-10-04
URL	http://arxiv.org/abs/1810.02074v1
PDF	http://arxiv.org/pdf/1810.02074v1.pdf
PWC	https://paperswithcode.com/paper/unsupervised-adversarial-visual-level-domain
Repo	https://github.com/avisekiit/wacv_2019
Framework	pytorch

Vision-based Control of a Quadrotor in User Proximity: Mediated vs End-to-End Learning Approaches


Title	Vision-based Control of a Quadrotor in User Proximity: Mediated vs End-to-End Learning Approaches
Authors	Dario Mantegazza, Jérôme Guzzi, Luca M. Gambardella, Alessandro Giusti
Abstract	We consider the task of controlling a quadrotor to hover in front of a freely moving user, using input data from an onboard camera. On this specific task we compare two widespread learning paradigms: a mediated approach, which learns a high-level state from the input and then uses it for deriving control signals; and an end-to-end approach, which skips high-level state estimation altogether. We show that despite their fundamental difference, both approaches yield equivalent performance on this task. We finally qualitatively analyze the behavior of a quadrotor implementing such approaches.
Tasks
Published	2018-09-24
URL	http://arxiv.org/abs/1809.08881v2
PDF	http://arxiv.org/pdf/1809.08881v2.pdf
PWC	https://paperswithcode.com/paper/vision-based-control-of-a-quadrotor-in-user
Repo	https://github.com/idsia-robotics/proximity-quadrotor-learning
Framework	none

Group Equivariant Capsule Networks


Title	Group Equivariant Capsule Networks
Authors	Jan Eric Lenssen, Matthias Fey, Pascal Libuschewski
Abstract	We present group equivariant capsule networks, a framework to introduce guaranteed equivariance and invariance properties to the capsule network idea. Our work can be divided into two contributions. First, we present a generic routing by agreement algorithm defined on elements of a group and prove that equivariance of output pose vectors, as well as invariance of output activations, hold under certain conditions. Second, we connect the resulting equivariant capsule networks with work from the field of group convolutional networks. Through this connection, we provide intuitions of how both methods relate and are able to combine the strengths of both approaches in one deep neural network architecture. The resulting framework allows sparse evaluation of the group convolution operator, provides control over specific equivariance and invariance properties, and can use routing by agreement instead of pooling operations. In addition, it is able to provide interpretable and equivariant representation vectors as output capsules, which disentangle evidence of object existence from its pose.
Tasks
Published	2018-06-13
URL	http://arxiv.org/abs/1806.05086v2
PDF	http://arxiv.org/pdf/1806.05086v2.pdf
PWC	https://paperswithcode.com/paper/group-equivariant-capsule-networks
Repo	https://github.com/mrjel/group_equivariant_capsules_pytorch
Framework	pytorch

Natural Environment Benchmarks for Reinforcement Learning


Title	Natural Environment Benchmarks for Reinforcement Learning
Authors	Amy Zhang, Yuxin Wu, Joelle Pineau
Abstract	While current benchmark reinforcement learning (RL) tasks have been useful to drive progress in the field, they are in many ways poor substitutes for learning with real-world data. By testing increasingly complex RL algorithms on low-complexity simulation environments, we often end up with brittle RL policies that generalize poorly beyond the very specific domain. To combat this, we propose three new families of benchmark RL domains that contain some of the complexity of the natural world, while still supporting fast and extensive data acquisition. The proposed domains also permit a characterization of generalization through fair train/test separation, and easy comparison and replication of results. Through this work, we challenge the RL research community to develop more robust algorithms that meet high standards of evaluation.
Tasks
Published	2018-11-14
URL	http://arxiv.org/abs/1811.06032v1
PDF	http://arxiv.org/pdf/1811.06032v1.pdf
PWC	https://paperswithcode.com/paper/natural-environment-benchmarks-for
Repo	https://github.com/bchidamb/RL-Image-Classification
Framework	none

GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration


Title	GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration
Authors	Jacob R. Gardner, Geoff Pleiss, David Bindel, Kilian Q. Weinberger, Andrew Gordon Wilson
Abstract	Despite advances in scalable models, the inference tools used for Gaussian processes (GPs) have yet to fully capitalize on developments in computing hardware. We present an efficient and general approach to GP inference based on Blackbox Matrix-Matrix multiplication (BBMM). BBMM inference uses a modified batched version of the conjugate gradients algorithm to derive all terms for training and inference in a single call. BBMM reduces the asymptotic complexity of exact GP inference from $O(n^3)$ to $O(n^2)$. Adapting this algorithm to scalable approximations and complex GP models simply requires a routine for efficient matrix-matrix multiplication with the kernel and its derivative. In addition, BBMM uses a specialized preconditioner to substantially speed up convergence. In experiments we show that BBMM effectively uses GPU hardware to dramatically accelerate both exact GP inference and scalable approximations. Additionally, we provide GPyTorch, a software platform for scalable GP inference via BBMM, built on PyTorch.
Tasks	Gaussian Processes
Published	2018-09-28
URL	http://arxiv.org/abs/1809.11165v5
PDF	http://arxiv.org/pdf/1809.11165v5.pdf
PWC	https://paperswithcode.com/paper/gpytorch-blackbox-matrix-matrix-gaussian
Repo	https://github.com/3springs/np_vs_kriging
Framework	none
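
The core of the approach in the abstract above is a conjugate gradients solver that touches the kernel matrix only through matrix-matrix products, so several right-hand sides are solved in one pass. The following is a minimal pure-Python sketch of standard batched conjugate gradients, not the modified BBMM algorithm and not GPyTorch's API: it omits the preconditioner and the extra terms BBMM derives, and the names `batched_cg` and `kernel_matmul` are hypothetical.

```python
def matmul(a, b):
    """Dense matrix-matrix product on nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def batched_cg(kernel_matmul, rhs, iterations=50, tol=1e-20):
    """Solve K X = rhs for every column of rhs at once.

    K must be symmetric positive definite and is accessed only through
    kernel_matmul(P), which returns K @ P for an n-by-t matrix P.
    """
    n, t = len(rhs), len(rhs[0])
    x = [[0.0] * t for _ in range(n)]
    r = [row[:] for row in rhs]  # residual B - K X with X = 0
    p = [row[:] for row in rhs]  # search directions, one per column
    rs_old = [sum(r[i][j] ** 2 for i in range(n)) for j in range(t)]
    for _ in range(iterations):
        if max(rs_old) < tol:
            break
        kp = kernel_matmul(p)  # the only place the kernel is used
        alpha = []
        for j in range(t):
            denom = sum(p[i][j] * kp[i][j] for i in range(n))
            alpha.append(rs_old[j] / denom if denom > 0 else 0.0)
        for i in range(n):
            for j in range(t):
                x[i][j] += alpha[j] * p[i][j]
                r[i][j] -= alpha[j] * kp[i][j]
        rs_new = [sum(r[i][j] ** 2 for i in range(n)) for j in range(t)]
        for j in range(t):
            beta = rs_new[j] / rs_old[j] if rs_old[j] > 0 else 0.0
            for i in range(n):
                p[i][j] = r[i][j] + beta * p[i][j]
        rs_old = rs_new
    return x


K = [[4.0, 1.0], [1.0, 3.0]]   # small symmetric positive definite matrix
B = [[1.0, 0.0], [2.0, 1.0]]   # two right-hand sides stored as columns
X = batched_cg(lambda p: matmul(K, p), B)
print(X)
```

Because the solver only needs a function that multiplies by the kernel, a structured or approximate kernel can be swapped in by changing `kernel_matmul` alone, which is the flexibility the abstract points to.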

On the Interaction Effects Between Prediction and Clustering


Title	On the Interaction Effects Between Prediction and Clustering
Authors	Matt Barnes, Artur Dubrawski
Abstract	Machine learning systems increasingly depend on pipelines of multiple algorithms to provide high quality and well structured predictions. This paper argues interaction effects between clustering and prediction (e.g. classification, regression) algorithms can cause subtle adverse behaviors during cross-validation that may not be initially apparent. In particular, we focus on the problem of estimating the out-of-cluster (OOC) prediction loss given an approximate clustering with probabilistic error rate $p_0$. Traditional cross-validation techniques exhibit significant empirical bias in this setting, and the few attempts to estimate and correct for these effects are intractable on larger datasets. Further, no previous work has been able to characterize the conditions under which these empirical effects occur, and if they do, what properties they have. We precisely answer these questions by providing theoretical properties which hold in various settings, and prove that expected out-of-cluster loss behavior rapidly decays with even minor clustering errors. Fortunately, we are able to leverage these same properties to construct hypothesis tests and scalable estimators necessary for correcting the problem. Empirical results on benchmark datasets validate our theoretical results and demonstrate how scaling techniques provide solutions to new classes of problems.
Tasks
Published	2018-07-18
URL	http://arxiv.org/abs/1807.06713v2
PDF	http://arxiv.org/pdf/1807.06713v2.pdf
PWC	https://paperswithcode.com/paper/on-the-interaction-effects-between-prediction
Repo	https://github.com/mbarnes1/B3
Framework	none
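
Estimating out-of-cluster loss means evaluating on clusters that were never seen in training. A minimal sketch of the baseline this builds on, leave-one-cluster-out splitting, is shown below; the function name and the toy cluster labels are hypothetical, and this is not the estimator or correction from the paper. The abstract's point is that when the cluster labels themselves are noisy, estimates built on splits like these become biased.

```python
from collections import defaultdict


def leave_one_cluster_out(cluster_ids):
    """Yield (train_indices, test_indices), holding out each cluster once."""
    members = defaultdict(list)
    for index, cluster in enumerate(cluster_ids):
        members[cluster].append(index)
    for held_out in sorted(members):
        test = members[held_out]
        train = sorted(
            i for cluster, indices in members.items()
            if cluster != held_out
            for i in indices
        )
        yield train, test


# Toy cluster assignment for five samples.
clusters = ["a", "a", "b", "c", "b"]
splits = list(leave_one_cluster_out(clusters))
for train, test in splits:
    print("train:", train, "test:", test)
```

If a sample is assigned to the wrong cluster, a near-duplicate of a test point can leak into the training fold, which makes the measured loss look better than the true out-of-cluster loss.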