Paper Group AWR 114
Zero-Shot Learning - The Good, the Bad and the Ugly
Title | Zero-Shot Learning - The Good, the Bad and the Ugly |
Authors | Yongqin Xian, Bernt Schiele, Zeynep Akata |
Abstract | Due to the importance of zero-shot learning, the number of proposed approaches has increased steadily recently. We argue that it is time to take a step back and to analyze the status quo of the area. The purpose of this paper is three-fold. First, given that there is no agreed-upon zero-shot learning benchmark, we define a new benchmark by unifying both the evaluation protocols and data splits. This is an important contribution as published results are often not comparable and sometimes even flawed due to, e.g., pre-training on zero-shot test classes. Second, we compare and analyze a significant number of the state-of-the-art methods in depth, both in the classic zero-shot setting and in the more realistic generalized zero-shot setting. Finally, we discuss limitations of the current status of the area which can be taken as a basis for advancing it. |
Tasks | Zero-Shot Learning |
Published | 2017-03-13 |
URL | http://arxiv.org/abs/1703.04394v1 |
PDF | http://arxiv.org/pdf/1703.04394v1.pdf |
PWC | https://paperswithcode.com/paper/zero-shot-learning-the-good-the-bad-and-the |
Repo | https://github.com/HadoopIt/paper-reading-notes |
Framework | none |
Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control
Title | Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control |
Authors | Riashat Islam, Peter Henderson, Maziar Gomrokchi, Doina Precup |
Abstract | Policy gradient methods in reinforcement learning have become increasingly prevalent for state-of-the-art performance in continuous control tasks. Novel methods typically benchmark against a few key algorithms such as deep deterministic policy gradients and trust region policy optimization. As such, it is important to present and use consistent baseline experiments. However, this can be difficult due to general variance in the algorithms, hyper-parameter tuning, and environment stochasticity. We investigate and discuss: the significance of hyper-parameters in policy gradients for continuous control, general variance in the algorithms, and reproducibility of reported results. We provide guidelines on reporting novel results as comparisons against baseline methods such that future researchers can make informed decisions when investigating novel methods. |
Tasks | Continuous Control, Policy Gradient Methods |
Published | 2017-08-10 |
URL | http://arxiv.org/abs/1708.04133v1 |
PDF | http://arxiv.org/pdf/1708.04133v1.pdf |
PWC | https://paperswithcode.com/paper/reproducibility-of-benchmarked-deep |
Repo | https://github.com/mmajewsk/ml_research |
Framework | none |
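The reproducibility abstract above stresses variance across runs and consistent reporting of baselines. As a minimal sketch of one way to report such results, the snippet below aggregates final returns over several random seeds into a mean, sample standard deviation, and standard error. The function name `summarize_runs` and all numbers are hypothetical illustrations, not taken from the paper.

```python
import statistics


def summarize_runs(returns_by_seed):
    """Summarize final average returns from independent random seeds.

    `returns_by_seed` maps a seed to the final average return of that run.
    """
    values = list(returns_by_seed.values())
    if len(values) < 2:
        raise ValueError("need at least two seeds to estimate variance")
    mean = statistics.mean(values)
    std = statistics.stdev(values)      # sample standard deviation (n - 1)
    stderr = std / len(values) ** 0.5   # standard error of the mean
    return {"n_seeds": len(values), "mean": mean, "std": std, "stderr": stderr}


# Hypothetical returns for one algorithm on one task, one entry per seed.
runs = {0: 100.0, 1: 110.0, 2: 90.0, 3: 120.0, 4: 80.0}
summary = summarize_runs(runs)
print(summary)
```

Reporting the number of seeds alongside the spread lets a reader judge whether a claimed improvement exceeds run-to-run variation.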
SSPP-DAN: Deep Domain Adaptation Network for Face Recognition with Single Sample Per Person
Title | SSPP-DAN: Deep Domain Adaptation Network for Face Recognition with Single Sample Per Person |
Authors | Sungeun Hong, Woobin Im, Jongbin Ryu, Hyun S. Yang |
Abstract | Real-world face recognition using a single sample per person (SSPP) is a challenging task. The problem is exacerbated if the conditions under which the gallery image and the probe set are captured are completely different. To address these issues from the perspective of domain adaptation, we introduce an SSPP domain adaptation network (SSPP-DAN). In the proposed approach, domain adaptation, feature extraction, and classification are performed jointly using a deep architecture with domain-adversarial training. However, the SSPP characteristic of one training sample per class is insufficient to train the deep architecture. To overcome this shortage, we generate synthetic images with varying poses using a 3D face model. Experimental evaluations using a realistic SSPP dataset show that deep domain adaptation and image synthesis complement each other and dramatically improve accuracy. Experiments on a benchmark dataset using the proposed approach show state-of-the-art performance. The dataset and the source code can be found in our online repository (https://github.com/csehong/SSPP-DAN). |
Tasks | Domain Adaptation, Face Recognition, Image Generation |
Published | 2017-02-14 |
URL | http://arxiv.org/abs/1702.04069v4 |
PDF | http://arxiv.org/pdf/1702.04069v4.pdf |
PWC | https://paperswithcode.com/paper/sspp-dan-deep-domain-adaptation-network-for |
Repo | https://github.com/csehong/SSPP-DAN |
Framework | tf |
Label Embedding Network: Learning Label Representation for Soft Training of Deep Networks
Title | Label Embedding Network: Learning Label Representation for Soft Training of Deep Networks |
Authors | Xu Sun, Bingzhen Wei, Xuancheng Ren, Shuming Ma |
Abstract | We propose a method, called Label Embedding Network, which can learn label representation (label embedding) during the training process of deep networks. With the proposed method, the label embedding is adaptively and automatically learned through back propagation. The original one-hot represented loss function is converted into a new loss function with soft distributions, such that the originally unrelated labels have continuous interactions with each other during the training process. As a result, the trained model can achieve substantially higher accuracy and with faster convergence speed. Experimental results based on competitive tasks demonstrate the effectiveness of the proposed method, and the learned label embedding is reasonable and interpretable. The proposed method achieves comparable or even better results than the state-of-the-art systems. The source code is available at https://github.com/lancopku/LabelEmb. |
Tasks | |
Published | 2017-10-28 |
URL | http://arxiv.org/abs/1710.10393v1 |
PDF | http://arxiv.org/pdf/1710.10393v1.pdf |
PWC | https://paperswithcode.com/paper/label-embedding-network-learning-label |
Repo | https://github.com/lancopku/LabelEmb |
Framework | pytorch |
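The Label Embedding Network abstract above describes replacing a one-hot loss with a loss over soft label distributions. The sketch below only illustrates that general idea and is not the paper's loss: it turns one row of a label-similarity matrix into a soft target with a softmax and compares the resulting cross-entropy with the one-hot case. The matrix values, the temperature, and the helper names are all assumptions made for illustration; in the paper the label representation is learned during training rather than fixed.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_dist, logits):
    """Cross-entropy between a target distribution and softmax(logits)."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target_dist, probs) if t > 0)


# Hypothetical label-similarity matrix: labels 0 and 1 are related, 2 is not.
label_similarity = [
    [4.0, 2.0, 0.0],
    [2.0, 4.0, 0.0],
    [0.0, 0.0, 4.0],
]

logits = [2.0, 1.0, -1.0]  # hypothetical model outputs for one example
gold = 0                   # index of the gold label

one_hot = [1.0 if i == gold else 0.0 for i in range(len(logits))]
soft = softmax(label_similarity[gold], temperature=2.0)

hard_loss = cross_entropy(one_hot, logits)
soft_loss = cross_entropy(soft, logits)
print(soft, hard_loss, soft_loss)
```

With the soft target, some probability mass sits on the related label, so the loss carries a training signal about label 1 that the one-hot target omits.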
Expecting the Unexpected: Training Detectors for Unusual Pedestrians with Adversarial Imposters
Title | Expecting the Unexpected: Training Detectors for Unusual Pedestrians with Adversarial Imposters |
Authors | Shiyu Huang, Deva Ramanan |
Abstract | As autonomous vehicles become an every-day reality, high-accuracy pedestrian detection is of paramount practical importance. Pedestrian detection is a highly researched topic with mature methods, but most datasets focus on common scenes of people engaged in typical walking poses on sidewalks. But performance is most crucial for dangerous scenarios, such as children playing in the street or people using bicycles/skateboards in unexpected ways. Such “in-the-tail” data is notoriously hard to observe, making both training and testing difficult. To analyze this problem, we have collected a novel annotated dataset of dangerous scenarios called the Precarious Pedestrian dataset. Even given a dedicated collection effort, it is relatively small by contemporary standards (around 1000 images). To allow for large-scale data-driven learning, we explore the use of synthetic data generated by a game engine. A significant challenge is selecting the right “priors” or parameters for synthesis: we would like realistic data with poses and object configurations that mimic true Precarious Pedestrians. Inspired by Generative Adversarial Networks (GANs), we generate a massive amount of synthetic data and train a discriminative classifier to select a realistic subset, which we deem the Adversarial Imposters. We demonstrate that this simple pipeline allows one to synthesize realistic training data by making use of rendering/animation engines within a GAN framework. Interestingly, we also demonstrate that such data can be used to rank algorithms, suggesting that Adversarial Imposters can also be used for “in-the-tail” validation at test-time, a notoriously difficult challenge for real-world deployment. |
Tasks | Autonomous Vehicles, Pedestrian Detection |
Published | 2017-03-18 |
URL | http://arxiv.org/abs/1703.06283v2 |
PDF | http://arxiv.org/pdf/1703.06283v2.pdf |
PWC | https://paperswithcode.com/paper/expecting-the-unexpected-training-detectors |
Repo | https://github.com/huangshiyu13/RPNplus |
Framework | tf |
Deep Learning for Hate Speech Detection in Tweets
Title | Deep Learning for Hate Speech Detection in Tweets |
Authors | Pinkesh Badjatiya, Shashank Gupta, Manish Gupta, Vasudeva Varma |
Abstract | Hate speech detection on Twitter is critical for applications like controversial event extraction, building AI chatterbots, content recommendation, and sentiment analysis. We define this task as being able to classify a tweet as racist, sexist or neither. The complexity of the natural language constructs makes this task very challenging. We perform extensive experiments with multiple deep learning architectures to learn semantic word embeddings to handle this complexity. Our experiments on a benchmark dataset of 16K annotated tweets show that such deep learning methods outperform state-of-the-art char/word n-gram methods by ~18 F1 points. |
Tasks | Hate Speech Detection, Sentiment Analysis, Word Embeddings |
Published | 2017-06-01 |
URL | http://arxiv.org/abs/1706.00188v1 |
PDF | http://arxiv.org/pdf/1706.00188v1.pdf |
PWC | https://paperswithcode.com/paper/deep-learning-for-hate-speech-detection-in |
Repo | https://github.com/pinkeshbadjatiya/twitter-hatespeech |
Framework | tf |
Real-Time Salient Closed Boundary Tracking via Line Segments Perceptual Grouping
Title | Real-Time Salient Closed Boundary Tracking via Line Segments Perceptual Grouping |
Authors | Xuebin Qin, Shida He, Camilo Perez Quintero, Abhineet Singh, Masood Dehghan, Martin Jagersand |
Abstract | This paper presents a novel real-time method for tracking salient closed boundaries from video image sequences. This method operates on a set of straight line segments that are produced by line detection. The tracking scheme is coherently integrated into a perceptual grouping framework in which the visual tracking problem is tackled by identifying a subset of these line segments and connecting them sequentially to form a closed boundary with the largest saliency and a certain similarity to the previous one. Specifically, we define a new tracking criterion which combines a grouping cost and an area similarity constraint. The proposed criterion makes the resulting boundary tracking more robust to local minima. To achieve real-time tracking performance, we use Delaunay Triangulation to build a graph model with the detected line segments and then reduce the tracking problem to finding the optimal cycle in this graph. This is solved by our newly proposed closed boundary candidates searching algorithm called “Bidirectional Shortest Path (BDSP)”. The efficiency and robustness of the proposed method are tested on real video sequences as well as during a robot arm pouring experiment. |
Tasks | Visual Tracking |
Published | 2017-04-30 |
URL | http://arxiv.org/abs/1705.00360v2 |
PDF | http://arxiv.org/pdf/1705.00360v2.pdf |
PWC | https://paperswithcode.com/paper/real-time-salient-closed-boundary-tracking |
Repo | https://github.com/NathanUA/SalientClosedBoundaryTracking |
Framework | none |
Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention
Title | Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention |
Authors | Hideyuki Tachibana, Katsuya Uenoyama, Shunsuke Aihara |
Abstract | This paper describes a novel text-to-speech (TTS) technique based on deep convolutional neural networks (CNN), without any recurrent units. Recurrent neural networks (RNN) have been a standard technique to model sequential data recently, and this technique has been used in some cutting-edge neural TTS techniques. However, training an RNN component often requires a very powerful computer or a very long time, typically several days or weeks. Other recent studies, on the other hand, have shown that CNN-based sequence synthesis can be much faster than RNN-based techniques, because of high parallelizability. The objective of this paper is to show an alternative neural TTS system, based only on CNN, that can alleviate these economic costs of training. In our experiment, the proposed Deep Convolutional TTS can be sufficiently trained in only a night (15 hours), using an ordinary gaming PC equipped with two GPUs, while the quality of the synthesized speech was almost acceptable. |
Tasks | Text-To-Speech Synthesis |
Published | 2017-10-24 |
URL | http://arxiv.org/abs/1710.08969v1 |
PDF | http://arxiv.org/pdf/1710.08969v1.pdf |
PWC | https://paperswithcode.com/paper/efficiently-trainable-text-to-speech-system |
Repo | https://github.com/r9y9/deepvoice3_pytorch |
Framework | pytorch |
Scene Parsing with Global Context Embedding
Title | Scene Parsing with Global Context Embedding |
Authors | Wei-Chih Hung, Yi-Hsuan Tsai, Xiaohui Shen, Zhe Lin, Kalyan Sunkavalli, Xin Lu, Ming-Hsuan Yang |
Abstract | We present a scene parsing method that utilizes global context information based on both the parametric and non-parametric models. Compared to previous methods that only exploit the local relationship between objects, we train a context network based on scene similarities to generate feature representations for global contexts. In addition, these learned features are utilized to generate global and spatial priors for explicit classes inference. We then design modules to embed the feature representations and the priors into the segmentation network as additional global context cues. We show that the proposed method can eliminate false positives that are not compatible with the global context representations. Experiments on both the MIT ADE20K and PASCAL Context datasets show that the proposed method performs favorably against existing methods. |
Tasks | Scene Parsing |
Published | 2017-10-17 |
URL | http://arxiv.org/abs/1710.06507v2 |
PDF | http://arxiv.org/pdf/1710.06507v2.pdf |
PWC | https://paperswithcode.com/paper/scene-parsing-with-global-context-embedding |
Repo | https://github.com/hfslyc/GCPNet |
Framework | caffe2 |
Neural Multi-Step Reasoning for Question Answering on Semi-Structured Tables
Title | Neural Multi-Step Reasoning for Question Answering on Semi-Structured Tables |
Authors | Till Haug, Octavian-Eugen Ganea, Paulina Grnarova |
Abstract | Advances in natural language processing tasks have gained momentum in recent years due to the increasingly popular neural network methods. In this paper, we explore deep learning techniques for answering multi-step reasoning questions that operate on semi-structured tables. Challenges here arise from the level of logical compositionality expressed by questions, as well as the domain openness. Our approach is weakly supervised, trained on question-answer-table triples without requiring intermediate strong supervision. It performs two phases: first, machine understandable logical forms (programs) are generated from natural language questions following the work of [Pasupat and Liang, 2015]. Second, paraphrases of logical forms and questions are embedded in a jointly learned vector space using word and character convolutional neural networks. A neural scoring function is further used to rank and retrieve the most probable logical form (interpretation) of a question. Our best single model achieves 34.8% accuracy on the WikiTableQuestions dataset, while the best ensemble of our models pushes the state-of-the-art score on this task to 38.7%, thus slightly surpassing both the engineered feature scoring baseline, as well as the Neural Programmer model of [Neelakantan et al., 2016]. |
Tasks | Question Answering |
Published | 2017-02-21 |
URL | http://arxiv.org/abs/1702.06589v2 |
PDF | http://arxiv.org/pdf/1702.06589v2.pdf |
PWC | https://paperswithcode.com/paper/neural-multi-step-reasoning-for-question |
Repo | https://github.com/dalab/neural_qa |
Framework | tf |
Measuring Thematic Fit with Distributional Feature Overlap
Title | Measuring Thematic Fit with Distributional Feature Overlap |
Authors | Enrico Santus, Emmanuele Chersoni, Alessandro Lenci, Philippe Blache |
Abstract | In this paper, we introduce a new distributional method for modeling predicate-argument thematic fit judgments. We use a syntax-based DSM to build a prototypical representation of verb-specific roles: for every verb, we extract the most salient second order contexts for each of its roles (i.e. the most salient dimensions of typical role fillers), and then we compute thematic fit as a weighted overlap between the top features of candidate fillers and role prototypes. Our experiments show that our method consistently outperforms a baseline re-implementing a state-of-the-art system, and achieves better or comparable results to those reported in the literature for the other unsupervised systems. Moreover, it provides an explicit representation of the features characterizing verb-specific semantic roles. |
Tasks | |
Published | 2017-07-19 |
URL | http://arxiv.org/abs/1707.05967v2 |
PDF | http://arxiv.org/pdf/1707.05967v2.pdf |
PWC | https://paperswithcode.com/paper/measuring-thematic-fit-with-distributional |
Repo | https://github.com/esantus/Thematic_Fit |
Framework | none |
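The thematic fit abstract above computes a weighted overlap between the top features of a candidate filler and those of a role prototype. The following is a rough sketch of that kind of score under stated assumptions: features are given as context-to-weight dictionaries, and the overlap is the share of the prototype's top-k weight that the filler's top-k features cover. The weighting scheme, feature names, and numbers are hypothetical and may differ from the measure actually used in the paper.

```python
def top_features(weights, k):
    """Keep the k highest-weighted features of a context -> weight mapping."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])


def weighted_overlap(filler, prototype, k=3):
    """Share of the prototype's top-k weight covered by the filler's top-k features."""
    f = top_features(filler, k)
    p = top_features(prototype, k)
    norm = sum(p.values())
    if norm == 0:
        return 0.0
    shared = set(f) & set(p)
    return sum(p[c] for c in shared) / norm


# Hypothetical prototype for the patient role of "eat" and two candidate fillers.
eat_patient = {"cook-obj": 3.0, "taste-subj": 2.0, "serve-obj": 1.0, "drive-obj": 0.1}
pizza = {"cook-obj": 2.5, "serve-obj": 2.0, "order-obj": 1.5}
car = {"drive-obj": 4.0, "park-obj": 3.0, "repair-obj": 2.0}

pizza_fit = weighted_overlap(pizza, eat_patient)
car_fit = weighted_overlap(car, eat_patient)
print(pizza_fit, car_fit)
```

Because the score is built from named features, the shared contexts themselves show why one filler fits a role better than another.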
Learned D-AMP: Principled Neural Network based Compressive Image Recovery
Title | Learned D-AMP: Principled Neural Network based Compressive Image Recovery |
Authors | Christopher A. Metzler, Ali Mousavi, Richard G. Baraniuk |
Abstract | Compressive image recovery is a challenging problem that requires fast and accurate algorithms. Recently, neural networks have been applied to this problem with promising results. By exploiting massively parallel GPU processing architectures and oodles of training data, they can run orders of magnitude faster than existing techniques. However, these methods are largely unprincipled black boxes that are difficult to train and often-times specific to a single measurement matrix. It was recently demonstrated that iterative sparse-signal-recovery algorithms can be “unrolled” to form interpretable deep networks. Taking inspiration from this work, we develop a novel neural network architecture that mimics the behavior of the denoising-based approximate message passing (D-AMP) algorithm. We call this new network Learned D-AMP (LDAMP). The LDAMP network is easy to train, can be applied to a variety of different measurement matrices, and comes with a state-evolution heuristic that accurately predicts its performance. Most importantly, it outperforms the state-of-the-art BM3D-AMP and NLR-CS algorithms in terms of both accuracy and run time. At high resolutions, and when used with sensing matrices that have fast implementations, LDAMP runs over $50\times$ faster than BM3D-AMP and hundreds of times faster than NLR-CS. |
Tasks | Denoising |
Published | 2017-04-21 |
URL | http://arxiv.org/abs/1704.06625v4 |
PDF | http://arxiv.org/pdf/1704.06625v4.pdf |
PWC | https://paperswithcode.com/paper/learned-d-amp-principled-neural-network-based |
Repo | https://github.com/ricedsp/D-AMP_Toolbox |
Framework | tf |
The Promise of Premise: Harnessing Question Premises in Visual Question Answering
Title | The Promise of Premise: Harnessing Question Premises in Visual Question Answering |
Authors | Aroma Mahendru, Viraj Prabhu, Akrit Mohapatra, Dhruv Batra, Stefan Lee |
Abstract | In this paper, we make a simple observation that questions about images often contain premises - objects and relationships implied by the question - and that reasoning about premises can help Visual Question Answering (VQA) models respond more intelligently to irrelevant or previously unseen questions. When presented with a question that is irrelevant to an image, state-of-the-art VQA models will still answer purely based on learned language biases, resulting in non-sensical or even misleading answers. We note that a visual question is irrelevant to an image if at least one of its premises is false (i.e. not depicted in the image). We leverage this observation to construct a dataset for Question Relevance Prediction and Explanation (QRPE) by searching for false premises. We train novel question relevance detection models and show that models that reason about premises consistently outperform models that do not. We also find that forcing standard VQA models to reason about premises during training can lead to improvements on tasks requiring compositional reasoning. |
Tasks | Question Answering, Visual Question Answering |
Published | 2017-05-01 |
URL | http://arxiv.org/abs/1705.00601v2 |
PDF | http://arxiv.org/pdf/1705.00601v2.pdf |
PWC | https://paperswithcode.com/paper/the-promise-of-premise-harnessing-question |
Repo | https://github.com/virajprabhu/premise-emnlp17 |
Framework | none |
A Sinkhorn-Newton method for entropic optimal transport
Title | A Sinkhorn-Newton method for entropic optimal transport |
Authors | Christoph Brauer, Christian Clason, Dirk Lorenz, Benedikt Wirth |
Abstract | We consider the entropic regularization of discretized optimal transport and propose to solve its optimality conditions via a logarithmic Newton iteration. We show a quadratic convergence rate and validate numerically that the method compares favorably with the more commonly used Sinkhorn–Knopp algorithm for small regularization strength. We further investigate numerically the robustness of the proposed method with respect to parameters such as the mesh size of the discretization. |
Tasks | |
Published | 2017-10-18 |
URL | http://arxiv.org/abs/1710.06635v2 |
PDF | http://arxiv.org/pdf/1710.06635v2.pdf |
PWC | https://paperswithcode.com/paper/a-sinkhorn-newton-method-for-entropic-optimal |
Repo | https://github.com/dirloren/sinkhornnewton |
Framework | none |
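The Sinkhorn-Newton abstract above compares against the more commonly used Sinkhorn–Knopp algorithm. For orientation, here is a minimal pure-Python sketch of that baseline (not the paper's Newton iteration) for entropically regularized discrete optimal transport. The cost matrix, marginals, regularization strength `eps`, and iteration count are hypothetical choices for illustration.

```python
import math


def sinkhorn_knopp(cost, a, b, eps=0.5, n_iter=500):
    """Alternately rescale rows and columns of the Gibbs kernel exp(-cost / eps).

    Returns a transport plan whose row sums approach `a` and whose
    column sums match `b`.
    """
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Row scaling: make the row sums of diag(u) K diag(v) equal a.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Column scaling: make the column sums equal b.
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Hypothetical 2x2 problem: moving mass between two points at unit distance.
cost = [[0.0, 1.0], [1.0, 0.0]]
a = [0.5, 0.5]  # source marginal
b = [0.3, 0.7]  # target marginal
plan = sinkhorn_knopp(cost, a, b)
print(plan)
```

As `eps` shrinks, the kernel entries approach zero and this fixed-point iteration typically needs many more sweeps, which is the small-regularization regime where the abstract reports the Newton approach compares favorably.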
AdaGAN: Boosting Generative Models
Title | AdaGAN: Boosting Generative Models |
Authors | Ilya Tolstikhin, Sylvain Gelly, Olivier Bousquet, Carl-Johann Simon-Gabriel, Bernhard Schölkopf |
Abstract | Generative Adversarial Networks (GAN) (Goodfellow et al., 2014) are an effective method for training generative models of complex data such as natural images. However, they are notoriously hard to train and can suffer from the problem of missing modes where the model is not able to produce examples in certain regions of the space. We propose an iterative procedure, called AdaGAN, where at every step we add a new component into a mixture model by running a GAN algorithm on a reweighted sample. This is inspired by boosting algorithms, where many potentially weak individual predictors are greedily aggregated to form a strong composite predictor. We prove that such an incremental procedure leads to convergence to the true distribution in a finite number of steps if each step is optimal, and convergence at an exponential rate otherwise. We also illustrate experimentally that this procedure addresses the problem of missing modes. |
Tasks | |
Published | 2017-01-09 |
URL | http://arxiv.org/abs/1701.02386v2 |
PDF | http://arxiv.org/pdf/1701.02386v2.pdf |
PWC | https://paperswithcode.com/paper/adagan-boosting-generative-models |
Repo | https://github.com/tolstikhin/adagan |
Framework | tf |