Paper Group AWR 198
Dirichlet belief networks for topic structure learning. Confidence from Invariance to Image Transformations. PyText: A Seamless Path from NLP research to production. Solving Jigsaw Puzzles By the Graph Connection Laplacian. Revisiting Video Saliency: A Large-scale Benchmark and a New Model. Attention Based Fully Convolutional Network for Speech Emotion Recognition. Multimodal Utterance-level Affect Analysis using Visual, Audio and Text Features. Fast Cylinder and Plane Extraction from Depth Cameras for Visual Odometry. Deep Affect Prediction in-the-wild: Aff-Wild Database and Challenge, Deep Architectures, and Beyond. Tracking by Animation: Unsupervised Learning of Multi-Object Attentive Trackers. Document Image Classification with Intra-Domain Transfer Learning and Stacked Generalization of Deep Convolutional Neural Networks. Diversity is All You Need: Learning Skills without a Reward Function. Bidirectional Learning for Robust Neural Networks. Stochastic Answer Networks for SQuAD 2.0. Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models.
Dirichlet belief networks for topic structure learning
Title | Dirichlet belief networks for topic structure learning |
Authors | He Zhao, Lan Du, Wray Buntine, Mingyuan Zhou |
Abstract | Recently, considerable research effort has been devoted to developing deep architectures for topic models to learn topic structures. Although several deep models have been proposed to learn better topic proportions of documents, how to leverage the benefits of deep structures for learning word distributions of topics has not yet been rigorously studied. Here we propose a new multi-layer generative process on word distributions of topics, where each layer consists of a set of topics and each topic is drawn from a mixture of the topics of the layer above. As the topics in all layers can be directly interpreted by words, the proposed model is able to discover interpretable topic hierarchies. As a self-contained module, our model can be flexibly adapted to different kinds of topic models to improve their modelling accuracy and interpretability. Extensive experiments on text corpora demonstrate the advantages of the proposed model. |
Tasks | Topic Models |
Published | 2018-11-02 |
URL | http://arxiv.org/abs/1811.00717v1 |
PDF | http://arxiv.org/pdf/1811.00717v1.pdf |
PWC | https://paperswithcode.com/paper/dirichlet-belief-networks-for-topic-structure |
Repo | https://github.com/ethanhezhao/DirBN |
Framework | none |
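The layer-wise construction in the abstract can be illustrated with a small NumPy sketch (not taken from the DirBN repo): each bottom-layer topic's word distribution is drawn from a Dirichlet whose base measure is a mixture of the topics in the layer above, so every topic in every layer remains a distribution over words. The vocabulary size, layer widths and hyperparameters below are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

V = 1000                   # vocabulary size (hypothetical)
K_top, K_bottom = 5, 20    # topics per layer (hypothetical)
eta, gamma = 0.05, 50.0    # Dirichlet smoothing / concentration (hypothetical)

# Top layer: topics drawn from a symmetric Dirichlet over the vocabulary.
phi_top = rng.dirichlet(np.full(V, eta), size=K_top)            # (K_top, V)

# Connection weights: how strongly each bottom topic mixes the top topics.
psi = rng.gamma(shape=1.0, scale=1.0, size=(K_bottom, K_top))
psi /= psi.sum(axis=1, keepdims=True)

# Bottom layer: each topic is a Dirichlet draw whose mean is a mixture of the
# top-layer topics, so it can still be read directly as words.
phi_bottom = np.vstack([
    rng.dirichlet(gamma * psi[k] @ phi_top + 1e-6)  # small floor for stability
    for k in range(K_bottom)
])                                                               # (K_bottom, V)

print(phi_bottom.shape, phi_bottom.sum(axis=1)[:3])  # rows sum to 1
```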
Confidence from Invariance to Image Transformations
Title | Confidence from Invariance to Image Transformations |
Authors | Yuval Bahat, Gregory Shakhnarovich |
Abstract | We develop a technique for automatically detecting the classification errors of a pre-trained visual classifier. Our method is agnostic to the form of the classifier, requiring access only to classifier responses to a set of inputs. We train a parametric binary classifier (error/correct) on a representation derived from a set of classifier responses generated from multiple copies of the same input, each subject to a different natural image transformation. Thus, we establish a measure of confidence in the classifier’s decision by analyzing the invariance of its decision under various transformations. In experiments with multiple data sets (STL-10, CIFAR-100, ImageNet) and classifiers, we demonstrate a new state of the art for the error detection task. In addition, we apply our technique to novelty detection scenarios, where we also demonstrate state-of-the-art results. |
Tasks | |
Published | 2018-04-02 |
URL | http://arxiv.org/abs/1804.00657v1 |
PDF | http://arxiv.org/pdf/1804.00657v1.pdf |
PWC | https://paperswithcode.com/paper/confidence-from-invariance-to-image |
Repo | https://github.com/YuvalBahat/Confidence_From_Invariance |
Framework | tf |
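A minimal sketch of the idea described above, with assumed ingredients: the transformation set (flip, small shifts), the toy stand-in classifier and the placeholder correctness labels are all illustrative, not the paper's exact setup. The representation is simply the stacked softmax responses of the same classifier on several transformed copies of the input, and a logistic-regression detector is trained on it.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def invariance_features(predict_proba, image):
    """Stack the classifier's softmax responses under several simple
    transformations of the same image into one representation vector."""
    views = [
        image,
        image[:, ::-1],                      # horizontal flip
        np.roll(image, 2, axis=0),           # shift down by 2 px
        np.roll(image, -2, axis=1),          # shift left by 2 px
    ]
    return np.concatenate([predict_proba(v) for v in views])

def toy_predict_proba(img, n_classes=10):
    """Stand-in classifier (assumption: replace with a real model's softmax)."""
    logits = np.array([img[::i + 1, ::i + 1].mean() for i in range(n_classes)])
    e = np.exp(logits - logits.max())
    return e / e.sum()

rng = np.random.default_rng(0)
images = rng.random((200, 32, 32))
correct = rng.integers(0, 2, size=200)       # placeholder error/correct labels

X = np.stack([invariance_features(toy_predict_proba, im) for im in images])
detector = LogisticRegression(max_iter=1000).fit(X, correct)
print("feature dim:", X.shape[1], "train acc:", detector.score(X, correct))
```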
PyText: A Seamless Path from NLP research to production
Title | PyText: A Seamless Path from NLP research to production |
Authors | Ahmed Aly, Kushal Lakhotia, Shicong Zhao, Mrinal Mohit, Barlas Oguz, Abhinav Arora, Sonal Gupta, Christopher Dewan, Stef Nelson-Lindall, Rushin Shah |
Abstract | We introduce PyText - a deep learning based NLP modeling framework built on PyTorch. PyText addresses the often-conflicting requirements of enabling rapid experimentation and of serving models at scale. It achieves this by providing simple and extensible interfaces for model components, and by using PyTorch’s capabilities of exporting models for inference via the optimized Caffe2 execution engine. We report our own experience of migrating experimentation and production workflows to PyText, which enabled us to iterate faster on novel modeling ideas and then seamlessly ship them at industrial scale. |
Tasks | |
Published | 2018-12-12 |
URL | http://arxiv.org/abs/1812.08729v1 |
PDF | http://arxiv.org/pdf/1812.08729v1.pdf |
PWC | https://paperswithcode.com/paper/pytext-a-seamless-path-from-nlp-research-to |
Repo | https://github.com/AMinerOpen/pytext_clf |
Framework | none |
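PyText has its own config-driven API that is not reproduced here; the sketch below only illustrates the underlying research-to-production path the abstract refers to, i.e. exporting a hypothetical, tiny PyTorch text classifier through torch.onnx so it can be served by an optimized runtime such as Caffe2.

```python
import torch
import torch.nn as nn

class TinyTextClassifier(nn.Module):
    """Hypothetical mean-of-embeddings classifier, not a PyText model."""
    def __init__(self, vocab_size=10000, embed_dim=64, num_classes=4):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.fc = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids):                  # (batch, seq_len)
        pooled = self.embedding(token_ids).mean(dim=1)
        return self.fc(pooled)

model = TinyTextClassifier().eval()
dummy = torch.randint(0, 10000, (2, 16))           # dummy batch fixes the traced graph

# Export to ONNX; at the time of the paper, such a graph would then be served
# through the Caffe2 execution engine.
torch.onnx.export(model, (dummy,), "tiny_text_classifier.onnx",
                  input_names=["token_ids"], output_names=["logits"],
                  dynamic_axes={"token_ids": {0: "batch"}, "logits": {0: "batch"}})
```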
Solving Jigsaw Puzzles By the Graph Connection Laplacian
Title | Solving Jigsaw Puzzles By the Graph Connection Laplacian |
Authors | Vahan Huroyan, Gilad Lerman, Hau-Tieng Wu |
Abstract | We propose a novel mathematical framework to address the problem of automatically solving large jigsaw puzzles. This problem assumes a large image, which is cut into equal square pieces that are arbitrarily rotated and shuffled, and asks to recover the original image given the transformed pieces. The main contribution of this work is a method for recovering the rotations of the pieces when both shuffles and rotations are unknown. A major challenge of this procedure is estimating the graph connection Laplacian without the knowledge of shuffles. We guarantee some robustness of the latter estimate to measurement errors. A careful combination of our proposed method for estimating rotations with any existing method for estimating shuffles results in a practical solution for the jigsaw puzzle problem. Numerical experiments demonstrate the competitive performance of this solution. |
Tasks | |
Published | 2018-11-07 |
URL | https://arxiv.org/abs/1811.03188v3 |
PDF | https://arxiv.org/pdf/1811.03188v3.pdf |
PWC | https://paperswithcode.com/paper/solving-jigsaw-puzzles-by-the-graph |
Repo | https://github.com/ctralie/DynamicsSynchronization |
Framework | none |
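A toy NumPy sketch of the rotation-synchronization step behind the method: given noisy pairwise rotation measurements between pieces, the top eigenvectors of the degree-normalized connection operator (equivalently, the bottom eigenvectors of the graph connection Laplacian) recover each piece's rotation up to a global rotation. The complete graph, noise model and sizes are assumptions, and the paper's key contribution of estimating this matrix when the shuffles are unknown is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30                                    # number of puzzle pieces (toy)

def rot(t):
    return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

# Ground-truth piece rotations and noisy pairwise measurements R_ij ~ R_i R_j^T.
true = [rot(rng.uniform(0, 2 * np.pi)) for _ in range(n)]
S = np.zeros((2 * n, 2 * n))
for i in range(n):
    S[2*i:2*i+2, 2*i:2*i+2] = np.eye(2)
    for j in range(i + 1, n):
        Rij = true[i] @ true[j].T @ rot(rng.normal(0, 0.1))   # noisy measurement
        S[2*i:2*i+2, 2*j:2*j+2] = Rij
        S[2*j:2*j+2, 2*i:2*i+2] = Rij.T

# Degree-normalized connection operator (every pair observed, so degree = n);
# its top eigenvectors encode the unknown rotations.
vals, vecs = np.linalg.eigh(S / n)
U = vecs[:, -2:]                          # top-2 eigenvectors, shape (2n, 2)

def project_so2(M):
    """Project a 2x2 matrix onto the nearest rotation via SVD."""
    u, _, vt = np.linalg.svd(M)
    R = u @ vt
    if np.linalg.det(R) < 0:
        R = u @ np.diag([1.0, -1.0]) @ vt
    return R

# Relative rotations R_i R_0^T can be read off from products of blocks of U,
# independently of the 2x2 ambiguity in the eigenvector basis.
B0 = U[0:2, :]
est_rel = [project_so2(U[2*i:2*i+2, :] @ B0.T) for i in range(n)]
true_rel = [true[i] @ true[0].T for i in range(n)]
errs = [np.linalg.norm(est_rel[i] - true_rel[i]) for i in range(n)]
print("mean error of recovered relative rotations:", np.mean(errs))
```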
Revisiting Video Saliency: A Large-scale Benchmark and a New Model
Title | Revisiting Video Saliency: A Large-scale Benchmark and a New Model |
Authors | Wenguan Wang, Jianbing Shen, Fang Guo, Ming-Ming Cheng, Ali Borji |
Abstract | In this work, we contribute to video saliency research in two ways. First, we introduce a new benchmark for predicting human eye movements during dynamic scene free-viewing, which has long been needed in this field. Our dataset, named DHF1K (Dynamic Human Fixation), consists of 1K high-quality, elaborately selected video sequences spanning a large range of scenes, motions, object types and background complexity. Existing video saliency datasets lack the variety and generality of common dynamic scenes and fall short in covering challenging situations in unconstrained environments. In contrast, DHF1K makes a significant leap in terms of scalability, diversity and difficulty, and is expected to boost video saliency modeling. Second, we propose a novel video saliency model that augments the CNN-LSTM network architecture with an attention mechanism to enable fast, end-to-end saliency learning. The attention mechanism explicitly encodes static saliency information, thus allowing the LSTM to focus on learning more flexible temporal saliency representations across successive frames. Such a design fully leverages existing large-scale static fixation datasets, avoids overfitting, and significantly improves training efficiency and testing performance. We thoroughly examine the performance of our model, with respect to state-of-the-art saliency models, on three large-scale datasets (i.e., DHF1K, Hollywood2, UCF sports). Experimental results over more than 1.2K testing videos containing 400K frames demonstrate that our model outperforms other competitors. |
Tasks | |
Published | 2018-01-23 |
URL | http://arxiv.org/abs/1801.07424v3 |
PDF | http://arxiv.org/pdf/1801.07424v3.pdf |
PWC | https://paperswithcode.com/paper/revisiting-video-saliency-a-large-scale |
Repo | https://github.com/wenguanwang/DHF1K |
Framework | tf |
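A much-simplified PyTorch sketch of the architecture pattern the abstract describes: per-frame CNN features, a static spatial attention map that gates them, and an LSTM for temporal modelling. It is not ACLNet itself, and all layer sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToySaliencyNet(nn.Module):
    """Simplified CNN + static attention + LSTM video saliency sketch."""
    def __init__(self, feat=32, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.attention = nn.Conv2d(feat, 1, 1)       # static spatial attention
        self.lstm = nn.LSTM(feat, hidden, batch_first=True)
        self.readout = nn.Linear(hidden, feat)

    def forward(self, clip):                          # clip: (B, T, 3, H, W)
        B, T, C, H, W = clip.shape
        feats = self.encoder(clip.reshape(B * T, C, H, W))    # (B*T, F, h, w)
        gated = feats * torch.sigmoid(self.attention(feats))  # attention gating
        pooled = gated.mean(dim=(2, 3)).reshape(B, T, -1)     # (B, T, F)
        temporal, _ = self.lstm(pooled)                       # (B, T, hidden)
        weights = self.readout(temporal).reshape(B * T, -1, 1, 1)
        sal = (gated * weights).sum(dim=1, keepdim=True)      # (B*T, 1, h, w)
        sal = F.interpolate(sal, size=(H, W), mode="bilinear",
                            align_corners=False)
        return torch.sigmoid(sal).reshape(B, T, 1, H, W)

model = ToySaliencyNet()
print(model(torch.rand(2, 5, 3, 64, 64)).shape)       # (2, 5, 1, 64, 64)
```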
Attention Based Fully Convolutional Network for Speech Emotion Recognition
Title | Attention Based Fully Convolutional Network for Speech Emotion Recognition |
Authors | Yuanyuan Zhang, Jun Du, Zirui Wang, Jianshu Zhang |
Abstract | Speech emotion recognition is a challenging task for three main reasons: 1) human emotion is abstract, which means it is hard to distinguish; 2) in general, human emotion can only be detected in some specific moments during a long utterance; 3) speech data with emotional labeling is usually limited. In this paper, we present a novel attention-based fully convolutional network for speech emotion recognition. We employ a fully convolutional network because it can handle variable-length speech without segmentation, so no critical information is lost. The proposed attention mechanism makes the model aware of which time-frequency regions of the speech spectrogram are more emotion-relevant. Given the limited data, transfer learning is also applied to improve accuracy. In particular, a clear improvement is observed when using a model pre-trained on natural scene images. Validated on the publicly available IEMOCAP corpus, the proposed model outperformed state-of-the-art methods with a weighted accuracy of 70.4% and an unweighted accuracy of 63.9%. |
Tasks | Emotion Recognition, Speech Emotion Recognition, Transfer Learning |
Published | 2018-06-05 |
URL | http://arxiv.org/abs/1806.01506v2 |
PDF | http://arxiv.org/pdf/1806.01506v2.pdf |
PWC | https://paperswithcode.com/paper/attention-based-fully-convolutional-network |
Repo | https://github.com/Speech-VINO/SER |
Framework | pytorch |
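A small PyTorch sketch of the two ingredients named in the abstract: a fully convolutional front end, which accepts spectrograms of any length, and attention pooling over time-frequency positions to form an utterance-level vector. Layer sizes and the number of emotion classes are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class AttentiveFCN(nn.Module):
    """Sketch: FCN over a spectrogram + attention pooling over time-frequency
    positions, then an utterance-level emotion classifier."""
    def __init__(self, n_emotions=4, feat=64):
        super().__init__()
        self.fcn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, feat, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.attn_score = nn.Conv2d(feat, 1, 1)      # one score per T-F position
        self.classifier = nn.Linear(feat, n_emotions)

    def forward(self, spec):                          # spec: (B, 1, freq, time)
        fmap = self.fcn(spec)                         # (B, F, f', t'), any length
        scores = self.attn_score(fmap).flatten(2)     # (B, 1, f'*t')
        weights = torch.softmax(scores, dim=-1)       # attention over positions
        pooled = (fmap.flatten(2) * weights).sum(-1)  # (B, F) utterance vector
        return self.classifier(pooled)

model = AttentiveFCN()
# Two utterances of different lengths are handled by the same network.
print(model(torch.rand(1, 1, 40, 300)).shape, model(torch.rand(1, 1, 40, 123)).shape)
```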
Multimodal Utterance-level Affect Analysis using Visual, Audio and Text Features
Title | Multimodal Utterance-level Affect Analysis using Visual, Audio and Text Features |
Authors | Didan Deng, Yuqian Zhou, Jimin Pi, Bertram E. Shi |
Abstract | The integration of information across multiple modalities and across time is a promising way to enhance the emotion recognition performance of affective systems. Much previous work has focused on instantaneous emotion recognition. The 2018 One-Minute Gradual-Emotion Recognition (OMG-Emotion) challenge, which was held in conjunction with the IEEE World Congress on Computational Intelligence, encouraged participants to address long-term emotion recognition by integrating cues from multiple modalities, including facial expression, audio and language. Intuitively, a multi-modal inference network should be able to leverage information from each modality and their correlations to improve recognition over that achievable by a single modality network. We describe here a multi-modal neural architecture that integrates visual information over time using an LSTM, and combines it with utterance-level audio and text cues to recognize human sentiment from multimodal clips. Our model outperforms the unimodal baseline, achieving concordance correlation coefficients (CCC) of 0.400 on the arousal task and 0.353 on the valence task. |
Tasks | Emotion Recognition |
Published | 2018-05-02 |
URL | http://arxiv.org/abs/1805.00625v2 |
PDF | http://arxiv.org/pdf/1805.00625v2.pdf |
PWC | https://paperswithcode.com/paper/multimodal-utterance-level-affect-analysis |
Repo | https://github.com/toxtli/AutomEditor |
Framework | none |
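A PyTorch sketch of the fusion pattern described above: an LSTM summarizes per-frame visual features over time, and its final state is concatenated with utterance-level audio and text vectors before regressing arousal and valence. Feature dimensions are hypothetical and no pretrained feature extractors are included.

```python
import torch
import torch.nn as nn

class MultimodalAffect(nn.Module):
    """Sketch of LSTM-over-visual-features fusion with utterance-level
    audio and text vectors; feature sizes are hypothetical."""
    def __init__(self, vis_dim=512, aud_dim=128, txt_dim=300, hidden=128):
        super().__init__()
        self.visual_lstm = nn.LSTM(vis_dim, hidden, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden + aud_dim + txt_dim, 128), nn.ReLU(),
            nn.Linear(128, 2),                      # [arousal, valence]
        )

    def forward(self, vis_seq, aud_vec, txt_vec):   # vis_seq: (B, T, vis_dim)
        _, (h_n, _) = self.visual_lstm(vis_seq)     # final hidden state (1, B, H)
        fused = torch.cat([h_n[-1], aud_vec, txt_vec], dim=-1)
        return torch.tanh(self.head(fused))         # bounded affect scores

model = MultimodalAffect()
out = model(torch.rand(4, 30, 512), torch.rand(4, 128), torch.rand(4, 300))
print(out.shape)                                    # (4, 2)
```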
Fast Cylinder and Plane Extraction from Depth Cameras for Visual Odometry
Title | Fast Cylinder and Plane Extraction from Depth Cameras for Visual Odometry |
Authors | Pedro F. Proença, Yang Gao |
Abstract | This paper presents CAPE, a method to extract planes and cylinder segments from organized point clouds, which processes 640x480 depth images on a single CPU core at an average of 300 Hz by operating on a grid of planar cells. Compared to state-of-the-art plane extraction, CAPE's latency is more consistent and 4-10 times lower, depending on the scene. We also demonstrate empirically that applying CAPE to visual odometry can improve trajectory estimation on scenes made of cylindrical surfaces (e.g. tunnels), whereas using a plane extraction approach that is not curve-aware deteriorates performance on these scenes. To use these geometric primitives in visual odometry, we propose extending a probabilistic RGB-D odometry framework based on points, lines and planes to cylinder primitives. Following this framework, CAPE runs on fused depth maps, and the parameters of cylinders are modelled probabilistically to account for uncertainty and to weight the pose optimization residuals accordingly. |
Tasks | Visual Odometry |
Published | 2018-03-06 |
URL | http://arxiv.org/abs/1803.02380v3 |
PDF | http://arxiv.org/pdf/1803.02380v3.pdf |
PWC | https://paperswithcode.com/paper/fast-cylinder-and-plane-extraction-from-depth |
Repo | https://github.com/pedropro/CAPE |
Framework | none |
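A NumPy sketch of the "grid of planar cells" idea only: fit a plane to each cell of an organized point cloud via PCA and keep cells whose residual is small. The cell size, threshold and toy cloud are assumptions, and CAPE's cell merging, cylinder fitting and real-time optimizations are not reproduced.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through a cell of 3D points via PCA: the normal is
    the eigenvector of the covariance with the smallest eigenvalue."""
    centroid = points.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov((points - centroid).T))
    return centroid, eigvecs[:, 0], eigvals[0]   # centroid, normal, residual var

def planar_cells(cloud, cell=20, mse_thresh=1e-4):
    """Grid pass over an organized point cloud (H, W, 3): fit a plane per cell
    and keep cells whose residual is small."""
    H, W, _ = cloud.shape
    kept = []
    for r in range(0, H - cell + 1, cell):
        for c in range(0, W - cell + 1, cell):
            pts = cloud[r:r + cell, c:c + cell].reshape(-1, 3)
            centroid, normal, mse = fit_plane(pts)
            if mse < mse_thresh:
                kept.append(((r, c), centroid, normal))
    return kept

# Toy organized cloud: a tilted plane plus mild noise.
H, W = 120, 160
u, v = np.meshgrid(np.arange(W), np.arange(H))
z = 0.002 * u + 0.001 * v + 1.0 + np.random.default_rng(0).normal(0, 0.002, (H, W))
cloud = np.stack([u * 0.01, v * 0.01, z], axis=-1)
print(len(planar_cells(cloud)), "planar cells found")
```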
Deep Affect Prediction in-the-wild: Aff-Wild Database and Challenge, Deep Architectures, and Beyond
Title | Deep Affect Prediction in-the-wild: Aff-Wild Database and Challenge, Deep Architectures, and Beyond |
Authors | Dimitrios Kollias, Panagiotis Tzirakis, Mihalis A. Nicolaou, Athanasios Papaioannou, Guoying Zhao, Björn Schuller, Irene Kotsia, Stefanos Zafeiriou |
Abstract | Automatic understanding of human affect using visual signals is of great importance in everyday human-machine interactions. Appraising human emotional states, behaviors and reactions displayed in real-world settings can be accomplished using latent continuous dimensions (e.g., the circumplex model of affect). Valence (i.e., how positive or negative an emotion is) and arousal (i.e., the power of the emotion's activation) constitute popular and effective affect representations. Nevertheless, the majority of datasets collected thus far, although containing naturalistic emotional states, have been captured in highly controlled recording conditions. In this paper, we introduce the Aff-Wild benchmark for training and evaluating affect recognition algorithms. We also report on the results of the First Affect-in-the-wild Challenge, which was organized in conjunction with CVPR 2017 on the Aff-Wild database and was the first ever challenge on the estimation of valence and arousal in-the-wild. Furthermore, we design and extensively train an end-to-end deep neural architecture which performs prediction of continuous emotion dimensions based on visual cues. The proposed deep learning architecture, AffWildNet, includes convolutional and recurrent neural network layers, exploiting the invariant properties of convolutional features while also modeling temporal dynamics that arise in human behavior via the recurrent layers. AffWildNet produced state-of-the-art results on the Aff-Wild Challenge. We then exploit the Aff-Wild database for learning features, which can be used as priors to achieve the best performance, compared to all other methods designed for the same goal, on both dimensional and categorical emotion recognition using the RECOLA, AFEW-VA and EmotiW datasets. The database and emotion recognition models are available at http://ibug.doc.ic.ac.uk/resources/first-affect-wild-challenge. |
Tasks | Emotion Recognition |
Published | 2018-04-29 |
URL | http://arxiv.org/abs/1804.10938v5 |
PDF | http://arxiv.org/pdf/1804.10938v5.pdf |
PWC | https://paperswithcode.com/paper/deep-affect-prediction-in-the-wild-aff-wild |
Repo | https://github.com/dkollias/Aff-Wild-models |
Framework | tf |
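Valence and arousal in this line of work are typically evaluated with the concordance correlation coefficient (CCC); below is a small PyTorch implementation of CCC and the corresponding 1 − CCC training loss, as a generic sketch rather than the AffWildNet training code.

```python
import torch

def concordance_cc(pred, target, eps=1e-8):
    """Concordance correlation coefficient between two 1-D tensors:
    CCC = 2*cov(x,y) / (var(x) + var(y) + (mean_x - mean_y)^2)."""
    pred_mean, target_mean = pred.mean(), target.mean()
    pred_var = pred.var(unbiased=False)
    target_var = target.var(unbiased=False)
    cov = ((pred - pred_mean) * (target - target_mean)).mean()
    return 2 * cov / (pred_var + target_var + (pred_mean - target_mean) ** 2 + eps)

def ccc_loss(pred, target):
    """Loss that directly optimizes the evaluation metric."""
    return 1.0 - concordance_cc(pred, target)

# Example: valence predictions for a small batch of frames.
pred = torch.tensor([0.1, 0.4, 0.3, -0.2])
target = torch.tensor([0.0, 0.5, 0.2, -0.1])
print(float(concordance_cc(pred, target)), float(ccc_loss(pred, target)))
```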
Tracking by Animation: Unsupervised Learning of Multi-Object Attentive Trackers
Title | Tracking by Animation: Unsupervised Learning of Multi-Object Attentive Trackers |
Authors | Zhen He, Jian Li, Daxue Liu, Hangen He, David Barber |
Abstract | Online Multi-Object Tracking (MOT) from videos is a challenging computer vision task which has been extensively studied for decades. Most of the existing MOT algorithms are based on the Tracking-by-Detection (TBD) paradigm combined with popular machine learning approaches which largely reduce the human effort to tune algorithm parameters. However, the commonly used supervised learning approaches require the labeled data (e.g., bounding boxes), which is expensive for videos. Also, the TBD framework is usually suboptimal since it is not end-to-end, i.e., it considers the task as detection and tracking, but not jointly. To achieve both label-free and end-to-end learning of MOT, we propose a Tracking-by-Animation framework, where a differentiable neural model first tracks objects from input frames and then animates these objects into reconstructed frames. Learning is then driven by the reconstruction error through backpropagation. We further propose a Reprioritized Attentive Tracking to improve the robustness of data association. Experiments conducted on both synthetic and real video datasets show the potential of the proposed model. Our project page is publicly available at: https://github.com/zhen-he/tracking-by-animation |
Tasks | Multi-Object Tracking, Object Tracking, Online Multi-Object Tracking |
Published | 2018-09-10 |
URL | http://arxiv.org/abs/1809.03137v3 |
PDF | http://arxiv.org/pdf/1809.03137v3.pdf |
PWC | https://paperswithcode.com/paper/tracking-by-animation-unsupervised-learning |
Repo | https://github.com/zhen-he/tracking-by-animation |
Framework | pytorch |
Document Image Classification with Intra-Domain Transfer Learning and Stacked Generalization of Deep Convolutional Neural Networks
Title | Document Image Classification with Intra-Domain Transfer Learning and Stacked Generalization of Deep Convolutional Neural Networks |
Authors | Arindam Das, Saikat Roy, Ujjwal Bhattacharya, Swapan Kumar Parui |
Abstract | In this work, a region-based Deep Convolutional Neural Network framework is proposed for document structure learning. The contribution of this work involves efficient training of region-based classifiers and effective ensembling for document image classification. A primary level of 'inter-domain' transfer learning is used by exporting weights from a pre-trained VGG16 architecture on the ImageNet dataset to train a document classifier on whole document images. Exploiting the nature of region-based influence modelling, a secondary level of 'intra-domain' transfer learning is used for rapid training of deep learning models for image segments. Finally, stacked generalization based ensembling is utilized for combining the predictions of the base deep neural network models. The proposed method achieves a state-of-the-art accuracy of 92.2% on the popular RVL-CDIP document image dataset, exceeding benchmarks set by existing algorithms. |
Tasks | Document Image Classification, Image Classification, Transfer Learning |
Published | 2018-01-29 |
URL | http://arxiv.org/abs/1801.09321v3 |
PDF | http://arxiv.org/pdf/1801.09321v3.pdf |
PWC | https://paperswithcode.com/paper/document-image-classification-with-intra |
Repo | https://github.com/microsoft/unilm/tree/master/layoutlm |
Framework | pytorch |
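A minimal scikit-learn sketch of the stacked-generalization step only: the base models' class-probability outputs (simulated here; in the paper they come from region-wise CNNs over the whole page, header, body, etc.) are concatenated and a simple meta-classifier is trained on top. The base CNNs and the inter/intra-domain transfer learning are not reproduced.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder "base model" outputs: simulated softmax vectors for a 16-class
# problem (RVL-CDIP has 16 document classes); purely illustrative data.
rng = np.random.default_rng(0)
n_docs, n_classes, n_base_models = 2000, 16, 4
labels = rng.integers(0, n_classes, n_docs)
base_preds = []
for _ in range(n_base_models):
    noisy = np.eye(n_classes)[labels] + rng.normal(0, 0.8, (n_docs, n_classes))
    e = np.exp(noisy - noisy.max(axis=1, keepdims=True))
    base_preds.append(e / e.sum(axis=1, keepdims=True))

# Stacked generalization: concatenate the base models' class probabilities
# and train a simple meta-classifier on top of them.
X = np.hstack(base_preds)
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25, random_state=0)
meta = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
print("meta-classifier accuracy:", meta.score(X_te, y_te))
```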
Diversity is All You Need: Learning Skills without a Reward Function
Title | Diversity is All You Need: Learning Skills without a Reward Function |
Authors | Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, Sergey Levine |
Abstract | Intelligent creatures can explore their environments and learn useful skills without supervision. In this paper, we propose DIAYN (‘Diversity is All You Need’), a method for learning useful skills without a reward function. Our proposed method learns skills by maximizing an information theoretic objective using a maximum entropy policy. On a variety of simulated robotic tasks, we show that this simple objective results in the unsupervised emergence of diverse skills, such as walking and jumping. In a number of reinforcement learning benchmark environments, our method is able to learn a skill that solves the benchmark task despite never receiving the true task reward. We show how pretrained skills can provide a good parameter initialization for downstream tasks, and can be composed hierarchically to solve complex, sparse reward tasks. Our results suggest that unsupervised discovery of skills can serve as an effective pretraining mechanism for overcoming challenges of exploration and data efficiency in reinforcement learning. |
Tasks | |
Published | 2018-02-16 |
URL | http://arxiv.org/abs/1802.06070v6 |
PDF | http://arxiv.org/pdf/1802.06070v6.pdf |
PWC | https://paperswithcode.com/paper/diversity-is-all-you-need-learning-skills |
Repo | https://github.com/navneet-nmk/Hierarchical-Meta-Reinforcement-Learning |
Framework | pytorch |
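A PyTorch sketch of DIAYN's central quantity as described in the abstract: a discriminator is trained to infer the skill from the state, and the skill-conditioned policy is rewarded with log q(z|s) − log p(z) instead of any environment reward. The rollout data and network sizes below are made up, and the max-entropy policy optimization (e.g. SAC) on top of this reward is not shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_skills, state_dim = 8, 4

# Discriminator q(z|s): tries to infer which skill produced a state.
discriminator = nn.Sequential(
    nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_skills))
opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

def diayn_reward(states, skills):
    """Skill-discovery pseudo-reward log q(z|s) - log p(z) with a uniform
    skill prior; the true environment reward is never used."""
    with torch.no_grad():
        log_q = F.log_softmax(discriminator(states), dim=-1)
        log_q_z = log_q.gather(1, skills.unsqueeze(1)).squeeze(1)
    log_p_z = -torch.log(torch.tensor(float(n_skills)))
    return log_q_z - log_p_z

# One illustrative update with made-up rollout data.
states = torch.randn(32, state_dim)
skills = torch.randint(0, n_skills, (32,))
rewards = diayn_reward(states, skills)
disc_loss = F.cross_entropy(discriminator(states), skills)
opt.zero_grad(); disc_loss.backward(); opt.step()
print(rewards.shape, float(disc_loss))
```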
Bidirectional Learning for Robust Neural Networks
Title | Bidirectional Learning for Robust Neural Networks |
Authors | Sidney Pontes-Filho, Marcus Liwicki |
Abstract | A multilayer perceptron can behave as a generative classifier by applying bidirectional learning (BL). It consists of training an undirected neural network to map input to output and vice-versa; therefore it can produce a classifier in one direction, and a generator in the opposite direction for the same data. The learning process of BL tries to reproduce the neuroplasticity stated in Hebbian theory using only backward propagation of errors. In this paper, two novel learning techniques are introduced which use BL for improving robustness to white-noise static and adversarial examples. The first method is bidirectional propagation of errors, in which error propagation occurs in both backward and forward directions. Motivated by the fact that its generative model receives as input a constant vector per class, we introduce as a second method the hybrid adversarial networks (HAN). Its generative model receives a random vector as input and its training is based on generative adversarial networks (GAN). To assess the performance of BL, we perform experiments using several architectures with fully connected and convolutional layers, with and without bias. Experimental results show that both methods improve robustness to white-noise static and adversarial examples, and even increase accuracy, but they behave differently depending on the architecture and task, so one method may be more beneficial than the other. Nevertheless, HAN using a convolutional architecture with batch normalization presents outstanding robustness, reaching state-of-the-art accuracy on adversarial examples of hand-written digits. |
Tasks | |
Published | 2018-05-21 |
URL | http://arxiv.org/abs/1805.08006v2 |
PDF | http://arxiv.org/pdf/1805.08006v2.pdf |
PWC | https://paperswithcode.com/paper/bidirectional-learning-for-robust-neural |
Repo | https://github.com/sidneyp/bidirectional |
Framework | tf |
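A loose PyTorch sketch of one reading of bidirectional learning: the same weight matrices are used forward for classification and transposed backward for generation, and both losses are minimized together. This is an assumption-laden simplification, not the authors' exact training procedure (it reproduces neither the bidirectional-propagation-of-errors variant in detail nor HAN).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Shared weights used in both directions: x -> logits (classification) and
# one-hot class vector -> reconstructed x (generation), the backward pass
# reusing the transposed matrices. Sizes assume flattened 28x28 inputs.
W1 = nn.Parameter(torch.randn(784, 256) * 0.01)
W2 = nn.Parameter(torch.randn(256, 10) * 0.01)
opt = torch.optim.Adam([W1, W2], lr=1e-3)

def forward_classify(x):             # (B, 784) -> (B, 10)
    return torch.relu(x @ W1) @ W2

def backward_generate(y_onehot):     # (B, 10) -> (B, 784)
    return torch.sigmoid(torch.relu(y_onehot @ W2.t()) @ W1.t())

x = torch.rand(32, 784)              # stand-in for images in [0, 1]
y = torch.randint(0, 10, (32,))
y_onehot = F.one_hot(y, 10).float()

# Joint objective: classification loss forward + reconstruction loss backward.
loss = F.cross_entropy(forward_classify(x), y) \
     + F.mse_loss(backward_generate(y_onehot), x)
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```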
Stochastic Answer Networks for SQuAD 2.0
Title | Stochastic Answer Networks for SQuAD 2.0 |
Authors | Xiaodong Liu, Wei Li, Yuwei Fang, Aerin Kim, Kevin Duh, Jianfeng Gao |
Abstract | This paper presents an extension of the Stochastic Answer Network (SAN), one of the state-of-the-art machine reading comprehension models, to be able to judge whether a question is unanswerable or not. The extended SAN contains two components: a span detector and a binary classifier for judging whether the question is unanswerable, and both components are jointly optimized. Experiments show that SAN achieves results competitive with the state of the art on the Stanford Question Answering Dataset (SQuAD) 2.0. To facilitate research in this field, we release our code: https://github.com/kevinduh/san_mrc. |
Tasks | Machine Reading Comprehension, Question Answering, Reading Comprehension |
Published | 2018-09-24 |
URL | http://arxiv.org/abs/1809.09194v1 |
PDF | http://arxiv.org/pdf/1809.09194v1.pdf |
PWC | https://paperswithcode.com/paper/stochastic-answer-networks-for-squad-20 |
Repo | https://github.com/kevinduh/san_mrc |
Framework | pytorch |
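A generic PyTorch sketch of the two jointly optimized components the abstract mentions, a span detector and a binary unanswerable classifier, sitting on top of a stand-in encoder output; it is not the SAN architecture itself, and the pooling, dimensions and loss weighting are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpanPlusAnswerable(nn.Module):
    """Span head (start/end over passage tokens) plus a binary head for
    "is this question unanswerable?", trained jointly."""
    def __init__(self, hidden=128):
        super().__init__()
        self.span_head = nn.Linear(hidden, 2)         # start and end logits
        self.answerable_head = nn.Linear(hidden, 1)   # from a pooled vector

    def forward(self, token_states):                  # (B, L, hidden)
        span_logits = self.span_head(token_states)    # (B, L, 2)
        start_logits, end_logits = span_logits.unbind(dim=-1)
        pooled = token_states.mean(dim=1)             # simple pooling stand-in
        unans_logit = self.answerable_head(pooled).squeeze(-1)
        return start_logits, end_logits, unans_logit

model = SpanPlusAnswerable()
states = torch.randn(4, 50, 128)                      # pretend encoder output
start, end, unans = model(states)
start_gold = torch.tensor([3, 7, 0, 12]); end_gold = torch.tensor([5, 9, 0, 14])
is_unanswerable = torch.tensor([0., 0., 1., 0.])

# Joint objective: span cross-entropy plus binary unanswerable loss.
loss = (F.cross_entropy(start, start_gold) + F.cross_entropy(end, end_gold)
        + F.binary_cross_entropy_with_logits(unans, is_unanswerable))
print(float(loss))
```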
Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models
Title | Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models |
Authors | Daniil Polykovskiy, Alexander Zhebrak, Benjamin Sanchez-Lengeling, Sergey Golovanov, Oktai Tatanov, Stanislav Belyaev, Rauf Kurbanov, Aleksey Artamonov, Vladimir Aladinskiy, Mark Veselov, Artur Kadurin, Simon Johansson, Hongming Chen, Sergey Nikolenko, Alan Aspuru-Guzik, Alex Zhavoronkov |
Abstract | Deep generative models such as generative adversarial networks, variational autoencoders, and autoregressive models are rapidly growing in popularity for the discovery of new molecules and materials. In this work, we introduce MOlecular SEtS (MOSES), a benchmarking platform to support research on machine learning for drug discovery. MOSES implements several popular molecular generation models and includes a set of metrics that evaluate the diversity and quality of generated molecules. MOSES is meant to standardize the research on molecular generation and facilitate the sharing and comparison of new models. Additionally, we provide a large-scale comparison of existing state of the art models and elaborate on current challenges for generative models that might prove fertile ground for new research. Our platform and source code are freely available at https://github.com/molecularsets/moses. |
Tasks | Drug Discovery |
Published | 2018-11-29 |
URL | https://arxiv.org/abs/1811.12823v3 |
PDF | https://arxiv.org/pdf/1811.12823v3.pdf |
PWC | https://paperswithcode.com/paper/molecular-sets-moses-a-benchmarking-platform |
Repo | https://github.com/aclyde11/RNNGenerator |
Framework | pytorch |
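A sketch of the kind of metrics such a platform standardizes, using RDKit to compute validity, uniqueness and novelty of generated SMILES strings; this is not the MOSES implementation, which covers many more metrics (e.g. distribution-level statistics).

```python
from rdkit import Chem

def basic_generation_metrics(generated_smiles, training_smiles):
    """Validity, uniqueness and novelty of generated SMILES, computed on
    canonical SMILES strings."""
    canonical = []
    for smi in generated_smiles:
        mol = Chem.MolFromSmiles(smi)         # None if the SMILES is invalid
        if mol is not None:
            canonical.append(Chem.MolToSmiles(mol))
    train_set = {Chem.MolToSmiles(Chem.MolFromSmiles(s)) for s in training_smiles}
    unique = set(canonical)
    return {
        "validity": len(canonical) / max(len(generated_smiles), 1),
        "uniqueness": len(unique) / max(len(canonical), 1),
        "novelty": len(unique - train_set) / max(len(unique), 1),
    }

# Toy generated set and toy "training set" for illustration only.
generated = ["CCO", "CCO", "c1ccccc1", "not_a_molecule", "CC(=O)O"]
training = ["CCO", "CCN"]
print(basic_generation_metrics(generated, training))
```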