July 29, 2019


Paper Group AWR 114


Zero-Shot Learning - The Good, the Bad and the Ugly


Title	Zero-Shot Learning - The Good, the Bad and the Ugly
Authors	Yongqin Xian, Bernt Schiele, Zeynep Akata
Abstract	Due to the importance of zero-shot learning, the number of proposed approaches has increased steadily recently. We argue that it is time to take a step back and to analyze the status quo of the area. The purpose of this paper is three-fold. First, given the fact that there is no agreed upon zero-shot learning benchmark, we first define a new benchmark by unifying both the evaluation protocols and data splits. This is an important contribution as published results are often not comparable and sometimes even flawed due to, e.g. pre-training on zero-shot test classes. Second, we compare and analyze a significant number of the state-of-the-art methods in depth, both in the classic zero-shot setting but also in the more realistic generalized zero-shot setting. Finally, we discuss limitations of the current status of the area which can be taken as a basis for advancing it.
Tasks	Zero-Shot Learning
Published	2017-03-13
URL	http://arxiv.org/abs/1703.04394v1
PDF	http://arxiv.org/pdf/1703.04394v1.pdf
PWC	https://paperswithcode.com/paper/zero-shot-learning-the-good-the-bad-and-the
Repo	https://github.com/HadoopIt/paper-reading-notes
Framework	none

Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control


Title	Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control
Authors	Riashat Islam, Peter Henderson, Maziar Gomrokchi, Doina Precup
Abstract	Policy gradient methods in reinforcement learning have become increasingly prevalent for state-of-the-art performance in continuous control tasks. Novel methods typically benchmark against a few key algorithms such as deep deterministic policy gradients and trust region policy optimization. As such, it is important to present and use consistent baseline experiments. However, this can be difficult due to general variance in the algorithms, hyper-parameter tuning, and environment stochasticity. We investigate and discuss: the significance of hyper-parameters in policy gradients for continuous control, general variance in the algorithms, and reproducibility of reported results. We provide guidelines on reporting novel results as comparisons against baseline methods such that future researchers can make informed decisions when investigating novel methods.
Tasks	Continuous Control, Policy Gradient Methods
Published	2017-08-10
URL	http://arxiv.org/abs/1708.04133v1
PDF	http://arxiv.org/pdf/1708.04133v1.pdf
PWC	https://paperswithcode.com/paper/reproducibility-of-benchmarked-deep
Repo	https://github.com/mmajewsk/ml_research
Framework	none

SSPP-DAN: Deep Domain Adaptation Network for Face Recognition with Single Sample Per Person


Title	SSPP-DAN: Deep Domain Adaptation Network for Face Recognition with Single Sample Per Person
Authors	Sungeun Hong, Woobin Im, Jongbin Ryu, Hyun S. Yang
Abstract	Real-world face recognition using a single sample per person (SSPP) is a challenging task. The problem is exacerbated if the conditions under which the gallery image and the probe set are captured are completely different. To address these issues from the perspective of domain adaptation, we introduce an SSPP domain adaptation network (SSPP-DAN). In the proposed approach, domain adaptation, feature extraction, and classification are performed jointly using a deep architecture with domain-adversarial training. However, the SSPP characteristic of one training sample per class is insufficient to train the deep architecture. To overcome this shortage, we generate synthetic images with varying poses using a 3D face model. Experimental evaluations using a realistic SSPP dataset show that deep domain adaptation and image synthesis complement each other and dramatically improve accuracy. Experiments on a benchmark dataset using the proposed approach show state-of-the-art performance. All the dataset and the source code can be found in our online repository (https://github.com/csehong/SSPP-DAN).
Tasks	Domain Adaptation, Face Recognition, Image Generation
Published	2017-02-14
URL	http://arxiv.org/abs/1702.04069v4
PDF	http://arxiv.org/pdf/1702.04069v4.pdf
PWC	https://paperswithcode.com/paper/sspp-dan-deep-domain-adaptation-network-for
Repo	https://github.com/csehong/SSPP-DAN
Framework	tf

Label Embedding Network: Learning Label Representation for Soft Training of Deep Networks


Title	Label Embedding Network: Learning Label Representation for Soft Training of Deep Networks
Authors	Xu Sun, Bingzhen Wei, Xuancheng Ren, Shuming Ma
Abstract	We propose a method, called Label Embedding Network, which can learn label representation (label embedding) during the training process of deep networks. With the proposed method, the label embedding is adaptively and automatically learned through back propagation. The original one-hot represented loss function is converted into a new loss function with soft distributions, such that the originally unrelated labels have continuous interactions with each other during the training process. As a result, the trained model can achieve substantially higher accuracy and with faster convergence speed. Experimental results based on competitive tasks demonstrate the effectiveness of the proposed method, and the learned label embedding is reasonable and interpretable. The proposed method achieves comparable or even better results than the state-of-the-art systems. The source code is available at https://github.com/lancopku/LabelEmb.
Tasks
Published	2017-10-28
URL	http://arxiv.org/abs/1710.10393v1
PDF	http://arxiv.org/pdf/1710.10393v1.pdf
PWC	https://paperswithcode.com/paper/label-embedding-network-learning-label
Repo	https://github.com/lancopku/LabelEmb
Framework	pytorch
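
The abstract above describes replacing a one-hot target with a soft distribution over labels. The following is a minimal sketch of that general idea only, namely cross-entropy against a soft target, and not the authors' actual loss. The vector `label_similarity` is a hypothetical stand-in for one learned row of label-to-label scores, and all numbers are made up for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, logits):
    """Cross-entropy between a target distribution and softmax(logits)."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0.0)

# Hypothetical similarity scores of class 0 to classes 0, 1, 2 (assumption).
label_similarity = [2.0, 1.0, -1.0]
one_hot = [1.0, 0.0, 0.0]
soft_target = softmax(label_similarity)  # soft distribution replacing one-hot

logits = [1.5, 1.2, -0.5]  # made-up model outputs for one example
hard_loss = cross_entropy(one_hot, logits)
soft_loss = cross_entropy(soft_target, logits)
```

With the soft target, probability mass assigned to a related label (class 1 here) is penalized less than mass assigned to an unrelated one, which is the kind of label interaction the abstract refers to.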

Expecting the Unexpected: Training Detectors for Unusual Pedestrians with Adversarial Imposters


Title	Expecting the Unexpected: Training Detectors for Unusual Pedestrians with Adversarial Imposters
Authors	Shiyu Huang, Deva Ramanan
Abstract	As autonomous vehicles become an every-day reality, high-accuracy pedestrian detection is of paramount practical importance. Pedestrian detection is a highly researched topic with mature methods, but most datasets focus on common scenes of people engaged in typical walking poses on sidewalks. But performance is most crucial for dangerous scenarios, such as children playing in the street or people using bicycles/skateboards in unexpected ways. Such “in-the-tail” data is notoriously hard to observe, making both training and testing difficult. To analyze this problem, we have collected a novel annotated dataset of dangerous scenarios called the Precarious Pedestrian dataset. Even given a dedicated collection effort, it is relatively small by contemporary standards (around 1000 images). To allow for large-scale data-driven learning, we explore the use of synthetic data generated by a game engine. A significant challenge is selecting the right “priors” or parameters for synthesis: we would like realistic data with poses and object configurations that mimic true Precarious Pedestrians. Inspired by Generative Adversarial Networks (GANs), we generate a massive amount of synthetic data and train a discriminative classifier to select a realistic subset, which we deem the Adversarial Imposters. We demonstrate that this simple pipeline allows one to synthesize realistic training data by making use of rendering/animation engines within a GAN framework. Interestingly, we also demonstrate that such data can be used to rank algorithms, suggesting that Adversarial Imposters can also be used for “in-the-tail” validation at test-time, a notoriously difficult challenge for real-world deployment.
Tasks	Autonomous Vehicles, Pedestrian Detection
Published	2017-03-18
URL	http://arxiv.org/abs/1703.06283v2
PDF	http://arxiv.org/pdf/1703.06283v2.pdf
PWC	https://paperswithcode.com/paper/expecting-the-unexpected-training-detectors
Repo	https://github.com/huangshiyu13/RPNplus
Framework	tf

Deep Learning for Hate Speech Detection in Tweets


Title	Deep Learning for Hate Speech Detection in Tweets
Authors	Pinkesh Badjatiya, Shashank Gupta, Manish Gupta, Vasudeva Varma
Abstract	Hate speech detection on Twitter is critical for applications like controversial event extraction, building AI chatterbots, content recommendation, and sentiment analysis. We define this task as being able to classify a tweet as racist, sexist or neither. The complexity of the natural language constructs makes this task very challenging. We perform extensive experiments with multiple deep learning architectures to learn semantic word embeddings to handle this complexity. Our experiments on a benchmark dataset of 16K annotated tweets show that such deep learning methods outperform state-of-the-art char/word n-gram methods by ~18 F1 points.
Tasks	Hate Speech Detection, Sentiment Analysis, Word Embeddings
Published	2017-06-01
URL	http://arxiv.org/abs/1706.00188v1
PDF	http://arxiv.org/pdf/1706.00188v1.pdf
PWC	https://paperswithcode.com/paper/deep-learning-for-hate-speech-detection-in
Repo	https://github.com/pinkeshbadjatiya/twitter-hatespeech
Framework	tf

Real-Time Salient Closed Boundary Tracking via Line Segments Perceptual Grouping


Title	Real-Time Salient Closed Boundary Tracking via Line Segments Perceptual Grouping
Authors	Xuebin Qin, Shida He, Camilo Perez Quintero, Abhineet Singh, Masood Dehghan, Martin Jagersand
Abstract	This paper presents a novel real-time method for tracking salient closed boundaries from video image sequences. This method operates on a set of straight line segments that are produced by line detection. The tracking scheme is coherently integrated into a perceptual grouping framework in which the visual tracking problem is tackled by identifying a subset of these line segments and connecting them sequentially to form a closed boundary with the largest saliency and a certain similarity to the previous one. Specifically, we define a new tracking criterion which combines a grouping cost and an area similarity constraint. The proposed criterion makes the resulting boundary tracking more robust to local minima. To achieve real-time tracking performance, we use Delaunay Triangulation to build a graph model with the detected line segments and then reduce the tracking problem to finding the optimal cycle in this graph. This is solved by our newly proposed closed boundary candidates searching algorithm called “Bidirectional Shortest Path (BDSP)”. The efficiency and robustness of the proposed method are tested on real video sequences as well as during a robot arm pouring experiment.
Tasks	Visual Tracking
Published	2017-04-30
URL	http://arxiv.org/abs/1705.00360v2
PDF	http://arxiv.org/pdf/1705.00360v2.pdf
PWC	https://paperswithcode.com/paper/real-time-salient-closed-boundary-tracking
Repo	https://github.com/NathanUA/SalientClosedBoundaryTracking
Framework	none

Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention


Title	Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention
Authors	Hideyuki Tachibana, Katsuya Uenoyama, Shunsuke Aihara
Abstract	This paper describes a novel text-to-speech (TTS) technique based on deep convolutional neural networks (CNN), without any recurrent units. Recurrent neural networks (RNN) have recently been a standard technique for modeling sequential data, and they have been used in some cutting-edge neural TTS techniques. However, training an RNN component often requires a very powerful computer or a very long time, typically several days or weeks. Other recent studies, on the other hand, have shown that CNN-based sequence synthesis can be much faster than RNN-based techniques because of its high parallelizability. The objective of this paper is to show an alternative neural TTS system, based only on CNN, that can alleviate these economic costs of training. In our experiment, the proposed Deep Convolutional TTS could be sufficiently trained overnight (15 hours) using an ordinary gaming PC equipped with two GPUs, while the quality of the synthesized speech was almost acceptable.
Tasks	Text-To-Speech Synthesis
Published	2017-10-24
URL	http://arxiv.org/abs/1710.08969v1
PDF	http://arxiv.org/pdf/1710.08969v1.pdf
PWC	https://paperswithcode.com/paper/efficiently-trainable-text-to-speech-system
Repo	https://github.com/r9y9/deepvoice3_pytorch
Framework	pytorch

Scene Parsing with Global Context Embedding


Title	Scene Parsing with Global Context Embedding
Authors	Wei-Chih Hung, Yi-Hsuan Tsai, Xiaohui Shen, Zhe Lin, Kalyan Sunkavalli, Xin Lu, Ming-Hsuan Yang
Abstract	We present a scene parsing method that utilizes global context information based on both the parametric and non-parametric models. Compared to previous methods that only exploit the local relationship between objects, we train a context network based on scene similarities to generate feature representations for global contexts. In addition, these learned features are utilized to generate global and spatial priors for explicit classes inference. We then design modules to embed the feature representations and the priors into the segmentation network as additional global context cues. We show that the proposed method can eliminate false positives that are not compatible with the global context representations. Experiments on both the MIT ADE20K and PASCAL Context datasets show that the proposed method performs favorably against existing methods.
Tasks	Scene Parsing
Published	2017-10-17
URL	http://arxiv.org/abs/1710.06507v2
PDF	http://arxiv.org/pdf/1710.06507v2.pdf
PWC	https://paperswithcode.com/paper/scene-parsing-with-global-context-embedding
Repo	https://github.com/hfslyc/GCPNet
Framework	caffe2

Neural Multi-Step Reasoning for Question Answering on Semi-Structured Tables


Title	Neural Multi-Step Reasoning for Question Answering on Semi-Structured Tables
Authors	Till Haug, Octavian-Eugen Ganea, Paulina Grnarova
Abstract	Advances in natural language processing tasks have gained momentum in recent years due to the increasingly popular neural network methods. In this paper, we explore deep learning techniques for answering multi-step reasoning questions that operate on semi-structured tables. Challenges here arise from the level of logical compositionality expressed by questions, as well as the domain openness. Our approach is weakly supervised, trained on question-answer-table triples without requiring intermediate strong supervision. It performs two phases: first, machine understandable logical forms (programs) are generated from natural language questions following the work of [Pasupat and Liang, 2015]. Second, paraphrases of logical forms and questions are embedded in a jointly learned vector space using word and character convolutional neural networks. A neural scoring function is further used to rank and retrieve the most probable logical form (interpretation) of a question. Our best single model achieves 34.8% accuracy on the WikiTableQuestions dataset, while the best ensemble of our models pushes the state-of-the-art score on this task to 38.7%, thus slightly surpassing both the engineered feature scoring baseline, as well as the Neural Programmer model of [Neelakantan et al., 2016].
Tasks	Question Answering
Published	2017-02-21
URL	http://arxiv.org/abs/1702.06589v2
PDF	http://arxiv.org/pdf/1702.06589v2.pdf
PWC	https://paperswithcode.com/paper/neural-multi-step-reasoning-for-question
Repo	https://github.com/dalab/neural_qa
Framework	tf

Measuring Thematic Fit with Distributional Feature Overlap


Title	Measuring Thematic Fit with Distributional Feature Overlap
Authors	Enrico Santus, Emmanuele Chersoni, Alessandro Lenci, Philippe Blache
Abstract	In this paper, we introduce a new distributional method for modeling predicate-argument thematic fit judgments. We use a syntax-based DSM to build a prototypical representation of verb-specific roles: for every verb, we extract the most salient second order contexts for each of its roles (i.e. the most salient dimensions of typical role fillers), and then we compute thematic fit as a weighted overlap between the top features of candidate fillers and role prototypes. Our experiments show that our method consistently outperforms a baseline re-implementing a state-of-the-art system, and achieves better or comparable results to those reported in the literature for the other unsupervised systems. Moreover, it provides an explicit representation of the features characterizing verb-specific semantic roles.
Tasks
Published	2017-07-19
URL	http://arxiv.org/abs/1707.05967v2
PDF	http://arxiv.org/pdf/1707.05967v2.pdf
PWC	https://paperswithcode.com/paper/measuring-thematic-fit-with-distributional
Repo	https://github.com/esantus/Thematic_Fit
Framework	none
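
The abstract above computes thematic fit as a weighted overlap between the top features of a candidate filler and those of a role prototype. The sketch below illustrates one possible reading of that idea. The normalization (the prototype weight of shared features divided by the total prototype weight), the cutoff `k`, and all feature names and weights are assumptions for illustration, not the paper's definition.

```python
def top_features(weights, k):
    """Keep the k highest-weighted features of a {feature: weight} dict."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:k])

def weighted_overlap(filler, prototype, k=3):
    """Share of the prototype's top-k weight covered by the filler's top-k features."""
    filler_top = top_features(filler, k)
    proto_top = top_features(prototype, k)
    total = sum(proto_top.values())
    if total == 0:
        return 0.0
    shared = set(filler_top) & set(proto_top)
    return sum(proto_top[name] for name in shared) / total

# Hypothetical prototype for the patient role of "eat" and two candidate fillers.
eat_patient = {"edible": 3.0, "cooked": 2.0, "served": 1.0, "bought": 0.5}
pizza = {"edible": 5.0, "cooked": 4.0, "round": 1.0}
stone = {"hard": 5.0, "heavy": 3.0, "round": 1.0}

pizza_fit = weighted_overlap(pizza, eat_patient)
stone_fit = weighted_overlap(stone, eat_patient)
```

Because the score is built from named features, the shared set itself shows which dimensions drive a high or low fit, in line with the explicit representation the abstract mentions.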

Learned D-AMP: Principled Neural Network based Compressive Image Recovery


Title	Learned D-AMP: Principled Neural Network based Compressive Image Recovery
Authors	Christopher A. Metzler, Ali Mousavi, Richard G. Baraniuk
Abstract	Compressive image recovery is a challenging problem that requires fast and accurate algorithms. Recently, neural networks have been applied to this problem with promising results. By exploiting massively parallel GPU processing architectures and oodles of training data, they can run orders of magnitude faster than existing techniques. However, these methods are largely unprincipled black boxes that are difficult to train and often-times specific to a single measurement matrix. It was recently demonstrated that iterative sparse-signal-recovery algorithms can be “unrolled” to form interpretable deep networks. Taking inspiration from this work, we develop a novel neural network architecture that mimics the behavior of the denoising-based approximate message passing (D-AMP) algorithm. We call this new network Learned D-AMP (LDAMP). The LDAMP network is easy to train, can be applied to a variety of different measurement matrices, and comes with a state-evolution heuristic that accurately predicts its performance. Most importantly, it outperforms the state-of-the-art BM3D-AMP and NLR-CS algorithms in terms of both accuracy and run time. At high resolutions, and when used with sensing matrices that have fast implementations, LDAMP runs over 50× faster than BM3D-AMP and hundreds of times faster than NLR-CS.
Tasks	Denoising
Published	2017-04-21
URL	http://arxiv.org/abs/1704.06625v4
PDF	http://arxiv.org/pdf/1704.06625v4.pdf
PWC	https://paperswithcode.com/paper/learned-d-amp-principled-neural-network-based
Repo	https://github.com/ricedsp/D-AMP_Toolbox
Framework	tf

The Promise of Premise: Harnessing Question Premises in Visual Question Answering


Title	The Promise of Premise: Harnessing Question Premises in Visual Question Answering
Authors	Aroma Mahendru, Viraj Prabhu, Akrit Mohapatra, Dhruv Batra, Stefan Lee
Abstract	In this paper, we make a simple observation that questions about images often contain premises - objects and relationships implied by the question - and that reasoning about premises can help Visual Question Answering (VQA) models respond more intelligently to irrelevant or previously unseen questions. When presented with a question that is irrelevant to an image, state-of-the-art VQA models will still answer purely based on learned language biases, resulting in non-sensical or even misleading answers. We note that a visual question is irrelevant to an image if at least one of its premises is false (i.e. not depicted in the image). We leverage this observation to construct a dataset for Question Relevance Prediction and Explanation (QRPE) by searching for false premises. We train novel question relevance detection models and show that models that reason about premises consistently outperform models that do not. We also find that forcing standard VQA models to reason about premises during training can lead to improvements on tasks requiring compositional reasoning.
Tasks	Question Answering, Visual Question Answering
Published	2017-05-01
URL	http://arxiv.org/abs/1705.00601v2
PDF	http://arxiv.org/pdf/1705.00601v2.pdf
PWC	https://paperswithcode.com/paper/the-promise-of-premise-harnessing-question
Repo	https://github.com/virajprabhu/premise-emnlp17
Framework	none

A Sinkhorn-Newton method for entropic optimal transport


Title	A Sinkhorn-Newton method for entropic optimal transport
Authors	Christoph Brauer, Christian Clason, Dirk Lorenz, Benedikt Wirth
Abstract	We consider the entropic regularization of discretized optimal transport and propose to solve its optimality conditions via a logarithmic Newton iteration. We show a quadratic convergence rate and validate numerically that the method compares favorably with the more commonly used Sinkhorn–Knopp algorithm for small regularization strength. We further investigate numerically the robustness of the proposed method with respect to parameters such as the mesh size of the discretization.
Tasks
Published	2017-10-18
URL	http://arxiv.org/abs/1710.06635v2
PDF	http://arxiv.org/pdf/1710.06635v2.pdf
PWC	https://paperswithcode.com/paper/a-sinkhorn-newton-method-for-entropic-optimal
Repo	https://github.com/dirloren/sinkhornnewton
Framework	none
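
For context on the baseline named in the abstract above, here is a minimal pure-Python sketch of the standard Sinkhorn-Knopp scaling iteration for entropic optimal transport. It is not the paper's Newton method. The function name, the regularization parameter `eps`, the fixed iteration count, and the toy problem are all illustrative assumptions.

```python
import math

def sinkhorn_knopp(cost, mu, nu, eps=0.5, iters=500):
    """Approximate the entropic OT plan between marginals mu and nu."""
    n, m = len(mu), len(nu)
    # Gibbs kernel K_ij = exp(-C_ij / eps); small eps makes entries tiny.
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Rescale rows so the plan's row sums match mu.
        u = [mu[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so the plan's column sums match nu.
        v = [nu[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P = diag(u) K diag(v).
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

# Toy problem: two source points, two target points (made-up values).
cost = [[0.0, 1.0], [1.0, 0.0]]
mu = [0.3, 0.7]
nu = [0.6, 0.4]
plan = sinkhorn_knopp(cost, mu, nu)
```

This alternating scaling typically needs more iterations as `eps` shrinks, which is the small-regularization regime in which the abstract reports that the Newton iteration compares favorably.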

AdaGAN: Boosting Generative Models


Title	AdaGAN: Boosting Generative Models
Authors	Ilya Tolstikhin, Sylvain Gelly, Olivier Bousquet, Carl-Johann Simon-Gabriel, Bernhard Schölkopf
Abstract	Generative Adversarial Networks (GAN) (Goodfellow et al., 2014) are an effective method for training generative models of complex data such as natural images. However, they are notoriously hard to train and can suffer from the problem of missing modes where the model is not able to produce examples in certain regions of the space. We propose an iterative procedure, called AdaGAN, where at every step we add a new component into a mixture model by running a GAN algorithm on a reweighted sample. This is inspired by boosting algorithms, where many potentially weak individual predictors are greedily aggregated to form a strong composite predictor. We prove that such an incremental procedure leads to convergence to the true distribution in a finite number of steps if each step is optimal, and convergence at an exponential rate otherwise. We also illustrate experimentally that this procedure addresses the problem of missing modes.
Tasks
Published	2017-01-09
URL	http://arxiv.org/abs/1701.02386v2
PDF	http://arxiv.org/pdf/1701.02386v2.pdf
PWC	https://paperswithcode.com/paper/adagan-boosting-generative-models
Repo	https://github.com/tolstikhin/adagan
Framework	tf