October 20, 2019

3112 words 15 mins read

Paper Group AWR 198

Dirichlet belief networks for topic structure learning. Confidence from Invariance to Image Transformations. PyText: A Seamless Path from NLP research to production. Solving Jigsaw Puzzles By the Graph Connection Laplacian. Revisiting Video Saliency: A Large-scale Benchmark and a New Model. Attention Based Fully Convolutional Network for Speech Emo …

Dirichlet belief networks for topic structure learning

Title Dirichlet belief networks for topic structure learning
Authors He Zhao, Lan Du, Wray Buntine, Mingyuan Zhou
Abstract Recently, considerable research effort has been devoted to developing deep architectures for topic models to learn topic structures. Although several deep models have been proposed to learn better topic proportions of documents, how to leverage the benefits of deep structures for learning word distributions of topics has not yet been rigorously studied. Here we propose a new multi-layer generative process on word distributions of topics, where each layer consists of a set of topics and each topic is drawn from a mixture of the topics of the layer above. As the topics in all layers can be directly interpreted by words, the proposed model is able to discover interpretable topic hierarchies. As a self-contained module, our model can be flexibly adapted to different kinds of topic models to improve their modelling accuracy and interpretability. Extensive experiments on text corpora demonstrate the advantages of the proposed model.
Tasks Topic Models
Published 2018-11-02
URL http://arxiv.org/abs/1811.00717v1
PDF http://arxiv.org/pdf/1811.00717v1.pdf
PWC https://paperswithcode.com/paper/dirichlet-belief-networks-for-topic-structure
Repo https://github.com/ethanhezhao/DirBN
Framework none
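The layered generative process described in the abstract can be sketched numerically: topics at the top layer are plain Dirichlet draws over the vocabulary, and each lower-layer topic is drawn from a Dirichlet whose parameter is a positive mixture of the topics one layer above, so every topic remains a word distribution and stays interpretable. A minimal NumPy sketch under assumed layer sizes and hyperparameters (all names here are illustrative, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size = 1000
layer_sizes = [5, 20, 50]        # topics per layer, top to bottom (assumed)

# Top-layer topics: plain Dirichlet draws over the vocabulary.
topics = [rng.dirichlet(np.full(vocab_size, 0.1), size=layer_sizes[0])]

# Each lower-layer topic is drawn from a Dirichlet whose parameter is a
# positive mixture of the topics in the layer above.
for size in layer_sizes[1:]:
    above = topics[-1]                                           # (K_above, V)
    weights = rng.gamma(1.0, 1.0, size=(size, above.shape[0]))   # mixing weights
    mixed = weights @ above                                      # (K_below, V) Dirichlet parameter
    topics.append(np.vstack([rng.dirichlet(row) for row in mixed]))

for layer, t in enumerate(topics):
    print(f"layer {layer}: {t.shape[0]} topics over {t.shape[1]} words")
```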

Confidence from Invariance to Image Transformations

Title Confidence from Invariance to Image Transformations
Authors Yuval Bahat, Gregory Shakhnarovich
Abstract We develop a technique for automatically detecting the classification errors of a pre-trained visual classifier. Our method is agnostic to the form of the classifier, requiring access only to classifier responses to a set of inputs. We train a parametric binary classifier (error/correct) on a representation derived from a set of classifier responses generated from multiple copies of the same input, each subject to a different natural image transformation. Thus, we establish a measure of confidence in the classifier’s decision by analyzing the invariance of its decision under various transformations. In experiments with multiple data sets (STL-10, CIFAR-100, ImageNet) and classifiers, we demonstrate a new state of the art for the error detection task. In addition, we apply our technique to novelty detection scenarios, where we also demonstrate state-of-the-art results.
Tasks
Published 2018-04-02
URL http://arxiv.org/abs/1804.00657v1
PDF http://arxiv.org/pdf/1804.00657v1.pdf
PWC https://paperswithcode.com/paper/confidence-from-invariance-to-image
Repo https://github.com/YuvalBahat/Confidence_From_Invariance
Framework tf
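The detection recipe above, run the same input through several natural transformations, collect the classifier's responses, and train a binary error/correct classifier on top, can be sketched as follows. The transformations, feature layout, and meta-classifier are assumptions for illustration, not the paper's exact choices:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def transform_copies(x):
    """Return several naturally transformed copies of an image (H, W, C) in [0, 1]."""
    return [x, x[:, ::-1], np.roll(x, 4, axis=1), np.clip(x * 1.1, 0, 1)]

def invariance_features(classifier, x):
    """Stack the classifier's softmax responses over all transformed copies."""
    probs = np.stack([classifier(t) for t in transform_copies(x)])  # (T, num_classes)
    return probs.reshape(-1)

def fit_error_detector(classifier, images, labels):
    """Train a binary (error / correct) detector on invariance features."""
    feats = np.stack([invariance_features(classifier, x) for x in images])
    is_error = np.array([np.argmax(classifier(x)) != y for x, y in zip(images, labels)])
    return LogisticRegression(max_iter=1000).fit(feats, is_error)
```

At test time, the fitted detector's predicted probability of the "error" class serves as the confidence score for the base classifier's decision.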

PyText: A Seamless Path from NLP research to production

Title PyText: A Seamless Path from NLP research to production
Authors Ahmed Aly, Kushal Lakhotia, Shicong Zhao, Mrinal Mohit, Barlas Oguz, Abhinav Arora, Sonal Gupta, Christopher Dewan, Stef Nelson-Lindall, Rushin Shah
Abstract We introduce PyText - a deep learning based NLP modeling framework built on PyTorch. PyText addresses the often-conflicting requirements of enabling rapid experimentation and of serving models at scale. It achieves this by providing simple and extensible interfaces for model components, and by using PyTorch’s capabilities of exporting models for inference via the optimized Caffe2 execution engine. We report our own experience of migrating experimentation and production workflows to PyText, which enabled us to iterate faster on novel modeling ideas and then seamlessly ship them at industrial scale.
Tasks
Published 2018-12-12
URL http://arxiv.org/abs/1812.08729v1
PDF http://arxiv.org/pdf/1812.08729v1.pdf
PWC https://paperswithcode.com/paper/pytext-a-seamless-path-from-nlp-research-to
Repo https://github.com/AMinerOpen/pytext_clf
Framework none
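PyText itself is configuration-driven, but the export path the abstract mentions, train in PyTorch and then export for inference on the optimized Caffe2 runtime, can be illustrated with plain PyTorch and ONNX as the interchange format. The model and shapes below are placeholders, not PyText code:

```python
import torch
import torch.nn as nn

# A stand-in text classifier over bag-of-words counts (illustrative only).
class TinyTextClassifier(nn.Module):
    def __init__(self, vocab_size=10000, num_classes=4):
        super().__init__()
        self.fc = nn.Linear(vocab_size, num_classes)

    def forward(self, bow):                 # bow: (batch, vocab_size) term counts
        return self.fc(bow)

model = TinyTextClassifier().eval()
example = torch.zeros(1, 10000)

# Export the trained PyTorch model for optimized serving.
torch.onnx.export(model, example, "text_classifier.onnx",
                  input_names=["bow"], output_names=["logits"])
```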

Solving Jigsaw Puzzles By the Graph Connection Laplacian

Title Solving Jigsaw Puzzles By the Graph Connection Laplacian
Authors Vahan Huroyan, Gilad Lerman, Hau-Tieng Wu
Abstract We propose a novel mathematical framework to address the problem of automatically solving large jigsaw puzzles. This problem assumes a large image, which is cut into equal square pieces that are arbitrarily rotated and shuffled, and asks to recover the original image given the transformed pieces. The main contribution of this work is a method for recovering the rotations of the pieces when both shuffles and rotations are unknown. A major challenge of this procedure is estimating the graph connection Laplacian without the knowledge of shuffles. We guarantee some robustness of the latter estimate to measurement errors. A careful combination of our proposed method for estimating rotations with any existing method for estimating shuffles results in a practical solution for the jigsaw puzzle problem. Numerical experiments demonstrate the competitive performance of this solution.
Tasks
Published 2018-11-07
URL https://arxiv.org/abs/1811.03188v3
PDF https://arxiv.org/pdf/1811.03188v3.pdf
PWC https://paperswithcode.com/paper/solving-jigsaw-puzzles-by-the-graph
Repo https://github.com/ctralie/DynamicsSynchronization
Framework none
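The rotation-recovery step rests on a standard synchronization idea: build a graph over pieces, attach an estimated relative rotation to each edge, form the graph connection Laplacian, and read the piece rotations off its lowest eigenvectors. A small NumPy sketch of that spectral step for planar rotations, with the graph and relative rotations assumed given (the paper's contribution is estimating them without known shuffles):

```python
import numpy as np

def connection_laplacian(n, edges, rel_rot):
    """edges: list of (i, j); rel_rot[(i, j)]: 2x2 rotation taking j's frame to i's."""
    L = np.zeros((2 * n, 2 * n))
    deg = np.zeros(n)
    for i, j in edges:
        deg[i] += 1; deg[j] += 1
        R = rel_rot[(i, j)]
        L[2*i:2*i+2, 2*j:2*j+2] = -R
        L[2*j:2*j+2, 2*i:2*i+2] = -R.T
    for i in range(n):
        L[2*i:2*i+2, 2*i:2*i+2] = deg[i] * np.eye(2)
    return L

def recover_rotations(n, edges, rel_rot):
    """Spectral synchronization: piece rotations up to one global rotation."""
    L = connection_laplacian(n, edges, rel_rot)
    _, vecs = np.linalg.eigh(L)
    V = vecs[:, :2]                       # bottom two eigenvectors span the solution
    rotations = []
    for i in range(n):
        U, _, Vt = np.linalg.svd(V[2*i:2*i+2, :])   # project each 2x2 block onto O(2)
        rotations.append(U @ Vt)
    return rotations
```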

Revisiting Video Saliency: A Large-scale Benchmark and a New Model

Title Revisiting Video Saliency: A Large-scale Benchmark and a New Model
Authors Wenguan Wang, Jianbing Shen, Fang Guo, Ming-Ming Cheng, Ali Borji
Abstract In this work, we contribute to video saliency research in two ways. First, we introduce a new benchmark for predicting human eye movements during dynamic scene free-viewing, which has long been needed in this field. Our dataset, named DHF1K (Dynamic Human Fixation), consists of 1K high-quality, elaborately selected video sequences spanning a large range of scenes, motions, object types and background complexity. Existing video saliency datasets lack variety and generality of common dynamic scenes and fall short in covering challenging situations in unconstrained environments. In contrast, DHF1K makes a significant leap in terms of scalability, diversity and difficulty, and is expected to boost video saliency modeling. Second, we propose a novel video saliency model that augments the CNN-LSTM network architecture with an attention mechanism to enable fast, end-to-end saliency learning. The attention mechanism explicitly encodes static saliency information, thus allowing the LSTM to focus on learning a more flexible temporal saliency representation across successive frames. Such a design fully leverages existing large-scale static fixation datasets, avoids overfitting, and significantly improves training efficiency and testing performance. We thoroughly examine the performance of our model, with respect to state-of-the-art saliency models, on three large-scale datasets (i.e., DHF1K, Hollywood2, UCF Sports). Experimental results over more than 1.2K testing videos containing 400K frames demonstrate that our model outperforms other competitors.
Tasks
Published 2018-01-23
URL http://arxiv.org/abs/1801.07424v3
PDF http://arxiv.org/pdf/1801.07424v3.pdf
PWC https://paperswithcode.com/paper/revisiting-video-saliency-a-large-scale
Repo https://github.com/wenguanwang/DHF1K
Framework tf
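The architecture described above, a CNN feature extractor, a static-attention branch that re-weights frame features by spatial saliency, and a recurrence over time, can be outlined in PyTorch. This is a hedged structural sketch, not the released model in the linked repo; the layer sizes and the simplified convolutional recurrence are assumptions:

```python
import torch
import torch.nn as nn

class AttentiveSaliencyNet(nn.Module):
    """CNN features -> static attention map -> simple convolutional recurrence."""
    def __init__(self, feat_ch=32, hid_ch=32):
        super().__init__()
        self.backbone = nn.Sequential(                 # stand-in CNN encoder
            nn.Conv2d(3, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU())
        self.attention = nn.Conv2d(feat_ch, 1, 1)      # static saliency attention
        self.recurrent = nn.Conv2d(feat_ch + hid_ch, hid_ch, 3, padding=1)
        self.readout = nn.Conv2d(hid_ch, 1, 1)
        self.hid_ch = hid_ch

    def forward(self, clip):                           # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        hidden, maps = None, []
        for i in range(t):
            feat = self.backbone(clip[:, i])
            feat = feat * torch.sigmoid(self.attention(feat))  # re-weight by static saliency
            if hidden is None:
                hidden = torch.zeros(b, self.hid_ch, *feat.shape[-2:])
            hidden = torch.tanh(self.recurrent(torch.cat([feat, hidden], dim=1)))
            maps.append(torch.sigmoid(self.readout(hidden)))
        return torch.stack(maps, dim=1)                # per-frame saliency maps

out = AttentiveSaliencyNet()(torch.randn(2, 4, 3, 64, 64))
```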

Attention Based Fully Convolutional Network for Speech Emotion Recognition

Title Attention Based Fully Convolutional Network for Speech Emotion Recognition
Authors Yuanyuan Zhang, Jun Du, Zirui Wang, Jianshu Zhang
Abstract Speech emotion recognition is a challenging task for three main reasons: 1) human emotion is abstract, which means it is hard to distinguish; 2) in general, human emotion can only be detected in some specific moments during a long utterance; 3) speech data with emotional labeling is usually limited. In this paper, we present a novel attention-based fully convolutional network for speech emotion recognition. We employ a fully convolutional network because it can handle variable-length speech without the segmentation that risks discarding critical information. The proposed attention mechanism makes the model aware of which time-frequency regions of the speech spectrogram are most emotion-relevant. Given the limited data, transfer learning is also adopted to improve accuracy; notably, a clear improvement is obtained with a model pre-trained on natural scene images. Validated on the publicly available IEMOCAP corpus, the proposed model outperforms the state-of-the-art methods with a weighted accuracy of 70.4% and an unweighted accuracy of 63.9%.
Tasks Emotion Recognition, Speech Emotion Recognition, Transfer Learning
Published 2018-06-05
URL http://arxiv.org/abs/1806.01506v2
PDF http://arxiv.org/pdf/1806.01506v2.pdf
PWC https://paperswithcode.com/paper/attention-based-fully-convolutional-network
Repo https://github.com/Speech-VINO/SER
Framework pytorch
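The core idea, a fully convolutional front end over a variable-length spectrogram followed by attention pooling over time-frequency positions so the model focuses on emotion-relevant regions, can be sketched as below. Channel counts and the pooling layout are assumptions; the linked repo is the reference implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionFCN(nn.Module):
    def __init__(self, num_emotions=4, ch=32):
        super().__init__()
        self.fcn = nn.Sequential(                      # handles variable-length input
            nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.attn_score = nn.Conv2d(ch, 1, 1)          # score per time-frequency position
        self.classifier = nn.Linear(ch, num_emotions)

    def forward(self, spectrogram):                    # (B, 1, freq, time), time varies
        feat = self.fcn(spectrogram)                   # (B, C, F, T)
        scores = self.attn_score(feat).flatten(2)      # (B, 1, F*T)
        weights = F.softmax(scores, dim=-1)
        pooled = (feat.flatten(2) * weights).sum(-1)   # attention-weighted pooling -> (B, C)
        return self.classifier(pooled)

logits = AttentionFCN()(torch.randn(2, 1, 64, 173))
```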

Multimodal Utterance-level Affect Analysis using Visual, Audio and Text Features

Title Multimodal Utterance-level Affect Analysis using Visual, Audio and Text Features
Authors Didan Deng, Yuqian Zhou, Jimin Pi, Bertram E. Shi
Abstract The integration of information across multiple modalities and across time is a promising way to enhance the emotion recognition performance of affective systems. Much previous work has focused on instantaneous emotion recognition. The 2018 One-Minute Gradual-Emotion Recognition (OMG-Emotion) challenge, which was held in conjunction with the IEEE World Congress on Computational Intelligence, encouraged participants to address long-term emotion recognition by integrating cues from multiple modalities, including facial expression, audio and language. Intuitively, a multi-modal inference network should be able to leverage information from each modality and their correlations to improve recognition over that achievable by a single modality network. We describe here a multi-modal neural architecture that integrates visual information over time using an LSTM, and combines it with utterance-level audio and text cues to recognize human sentiment from multimodal clips. Our model outperforms the unimodal baseline, achieving concordance correlation coefficients (CCC) of 0.400 on the arousal task and 0.353 on the valence task.
Tasks Emotion Recognition
Published 2018-05-02
URL http://arxiv.org/abs/1805.00625v2
PDF http://arxiv.org/pdf/1805.00625v2.pdf
PWC https://paperswithcode.com/paper/multimodal-utterance-level-affect-analysis
Repo https://github.com/toxtli/AutomEditor
Framework none
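The fusion scheme above, frame-level visual features summarized over time by an LSTM, concatenated with utterance-level audio and text features, then regressed to arousal and valence, is sketched below. The feature dimensions and head are placeholders, not the authors' configuration:

```python
import torch
import torch.nn as nn

class MultimodalAffect(nn.Module):
    def __init__(self, vis_dim=512, aud_dim=128, txt_dim=300, hid=128):
        super().__init__()
        self.visual_lstm = nn.LSTM(vis_dim, hid, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hid + aud_dim + txt_dim, hid), nn.ReLU(),
            nn.Linear(hid, 2))                         # arousal and valence

    def forward(self, visual_seq, audio_feat, text_feat):
        # visual_seq: (B, T, vis_dim); audio/text: utterance-level vectors (B, dim)
        _, (h, _) = self.visual_lstm(visual_seq)
        fused = torch.cat([h[-1], audio_feat, text_feat], dim=-1)
        return self.head(fused)

pred = MultimodalAffect()(torch.randn(2, 20, 512), torch.randn(2, 128), torch.randn(2, 300))
```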

Fast Cylinder and Plane Extraction from Depth Cameras for Visual Odometry

Title Fast Cylinder and Plane Extraction from Depth Cameras for Visual Odometry
Authors Pedro F. Proença, Yang Gao
Abstract This paper presents CAPE, a method to extract planes and cylinder segments from organized point clouds, which processes 640x480 depth images on a single CPU core at an average of 300 Hz by operating on a grid of planar cells. Compared to state-of-the-art plane extraction, the latency of CAPE is more consistent and 4-10 times lower, depending on the scene; we also demonstrate empirically that applying CAPE to visual odometry can improve trajectory estimation on scenes made of cylindrical surfaces (e.g. tunnels), whereas using a plane extraction approach that is not curve-aware deteriorates performance on these scenes. To use these geometric primitives in visual odometry, we propose extending a probabilistic RGB-D odometry framework based on points, lines and planes to cylinder primitives. Following this framework, CAPE runs on fused depth maps, and the parameters of cylinders are modelled probabilistically to account for uncertainty and to weight the pose optimization residuals accordingly.
Tasks Visual Odometry
Published 2018-03-06
URL http://arxiv.org/abs/1803.02380v3
PDF http://arxiv.org/pdf/1803.02380v3.pdf
PWC https://paperswithcode.com/paper/fast-cylinder-and-plane-extraction-from-depth
Repo https://github.com/pedropro/CAPE
Framework none
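The grid-of-planar-cells idea can be illustrated with a per-cell plane fit: split the organized point cloud into fixed-size cells, fit a plane to each cell by PCA, and keep cells with small residuals as planar seeds. This is only a sketch of the first stage under assumed cell size and threshold; the real CAPE pipeline additionally grows regions and fits cylinder primitives:

```python
import numpy as np

def planar_cells(points, cell=20, max_rms=0.01):
    """points: organized (H, W, 3) point cloud. Returns per-cell (normal, d) or None."""
    h, w, _ = points.shape
    cells = {}
    for r in range(0, h - cell + 1, cell):
        for c in range(0, w - cell + 1, cell):
            p = points[r:r+cell, c:c+cell].reshape(-1, 3)
            p = p[np.isfinite(p).all(axis=1)]
            if len(p) < cell * cell // 2:
                cells[(r, c)] = None                  # too many invalid depth readings
                continue
            centroid = p.mean(axis=0)
            # Smallest principal direction of the centred cell is the plane normal.
            _, s, vt = np.linalg.svd(p - centroid, full_matrices=False)
            normal = vt[-1]
            rms = np.sqrt(s[-1] ** 2 / len(p))        # residual along the normal
            cells[(r, c)] = (normal, -normal @ centroid) if rms < max_rms else None
    return cells
```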

Deep Affect Prediction in-the-wild: Aff-Wild Database and Challenge, Deep Architectures, and Beyond

Title Deep Affect Prediction in-the-wild: Aff-Wild Database and Challenge, Deep Architectures, and Beyond
Authors Dimitrios Kollias, Panagiotis Tzirakis, Mihalis A. Nicolaou, Athanasios Papaioannou, Guoying Zhao, Björn Schuller, Irene Kotsia, Stefanos Zafeiriou
Abstract Automatic understanding of human affect using visual signals is of great importance in everyday human-machine interactions. Appraising human emotional states, behaviors and reactions displayed in real-world settings can be accomplished using latent continuous dimensions (e.g., the circumplex model of affect). Valence (i.e., how positive or negative an emotion is) and arousal (i.e., the power of the emotion’s activation) constitute popular and effective affect representations. Nevertheless, the majority of datasets collected thus far, although containing naturalistic emotional states, have been captured in highly controlled recording conditions. In this paper, we introduce the Aff-Wild benchmark for training and evaluating affect recognition algorithms. We also report on the results of the First Affect-in-the-wild Challenge, which was organized in conjunction with CVPR 2017 on the Aff-Wild database and was the first ever challenge on the estimation of valence and arousal in-the-wild. Furthermore, we design and extensively train an end-to-end deep neural architecture which performs prediction of continuous emotion dimensions based on visual cues. The proposed deep learning architecture, AffWildNet, includes convolutional and recurrent neural network layers, exploiting the invariant properties of convolutional features, while also modeling temporal dynamics that arise in human behavior via the recurrent layers. AffWildNet produced state-of-the-art results on the Aff-Wild Challenge. We then exploit the Aff-Wild database for learning features, which can be used as priors for achieving the best performance on both dimensional and categorical emotion recognition, using the RECOLA, AFEW-VA and EmotiW datasets, compared to all other methods designed for the same goal. The database and emotion recognition models are available at http://ibug.doc.ic.ac.uk/resources/first-affect-wild-challenge.
Tasks Emotion Recognition
Published 2018-04-29
URL http://arxiv.org/abs/1804.10938v5
PDF http://arxiv.org/pdf/1804.10938v5.pdf
PWC https://paperswithcode.com/paper/deep-affect-prediction-in-the-wild-aff-wild
Repo https://github.com/dkollias/Aff-Wild-models
Framework tf
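The AffWildNet design pairs a convolutional feature extractor with recurrent layers that model temporal dynamics, ending in continuous valence/arousal outputs per frame. A compact CNN-GRU sketch of that pattern follows; it is not the released TensorFlow model, and all sizes are assumptions:

```python
import torch
import torch.nn as nn

class CnnRnnAffect(nn.Module):
    def __init__(self, hid=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())      # per-frame 64-d feature
        self.rnn = nn.GRU(64, hid, batch_first=True)
        self.head = nn.Linear(hid, 2)                   # valence, arousal in [-1, 1]

    def forward(self, frames):                          # (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.rnn(feats)                        # recurrence over the frame sequence
        return torch.tanh(self.head(out))               # per-frame predictions (B, T, 2)

preds = CnnRnnAffect()(torch.randn(2, 8, 3, 96, 96))
```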

Tracking by Animation: Unsupervised Learning of Multi-Object Attentive Trackers

Title Tracking by Animation: Unsupervised Learning of Multi-Object Attentive Trackers
Authors Zhen He, Jian Li, Daxue Liu, Hangen He, David Barber
Abstract Online Multi-Object Tracking (MOT) from videos is a challenging computer vision task which has been extensively studied for decades. Most of the existing MOT algorithms are based on the Tracking-by-Detection (TBD) paradigm combined with popular machine learning approaches which largely reduce the human effort to tune algorithm parameters. However, the commonly used supervised learning approaches require the labeled data (e.g., bounding boxes), which is expensive for videos. Also, the TBD framework is usually suboptimal since it is not end-to-end, i.e., it considers the task as detection and tracking, but not jointly. To achieve both label-free and end-to-end learning of MOT, we propose a Tracking-by-Animation framework, where a differentiable neural model first tracks objects from input frames and then animates these objects into reconstructed frames. Learning is then driven by the reconstruction error through backpropagation. We further propose a Reprioritized Attentive Tracking to improve the robustness of data association. Experiments conducted on both synthetic and real video datasets show the potential of the proposed model. Our project page is publicly available at: https://github.com/zhen-he/tracking-by-animation
Tasks Multi-Object Tracking, Object Tracking, Online Multi-Object Tracking
Published 2018-09-10
URL http://arxiv.org/abs/1809.03137v3
PDF http://arxiv.org/pdf/1809.03137v3.pdf
PWC https://paperswithcode.com/paper/tracking-by-animation-unsupervised-learning
Repo https://github.com/zhen-he/tracking-by-animation
Framework pytorch
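The label-free training loop can be summarized as follows: a tracker network turns each frame into a small set of object states, a differentiable renderer "animates" those states back into an image, and the pixel reconstruction error is backpropagated through both. Below is a schematic PyTorch sketch of that loop; the renderer and state layout are toy assumptions, far simpler than the paper's attentive tracker:

```python
import torch
import torch.nn as nn

class TrackerByAnimation(nn.Module):
    def __init__(self, num_objects=3, state_dim=32, frame_hw=(32, 32)):
        super().__init__()
        h, w = frame_hw
        self.encode = nn.Sequential(nn.Flatten(), nn.Linear(h * w, num_objects * state_dim))
        self.render = nn.Linear(state_dim, h * w)       # toy differentiable "animation"
        self.num_objects, self.state_dim, self.frame_hw = num_objects, state_dim, frame_hw

    def forward(self, frame):                           # frame: (B, 1, H, W)
        b = frame.shape[0]
        states = self.encode(frame).view(b, self.num_objects, self.state_dim)
        layers = self.render(states).view(b, self.num_objects, *self.frame_hw)
        recon = layers.sum(dim=1, keepdim=True)         # composite the object layers
        return states, recon

model = TrackerByAnimation()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
frame = torch.rand(4, 1, 32, 32)
states, recon = model(frame)
loss = ((recon - frame) ** 2).mean()                    # reconstruction error drives learning
loss.backward(); opt.step()
```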

Document Image Classification with Intra-Domain Transfer Learning and Stacked Generalization of Deep Convolutional Neural Networks

Title Document Image Classification with Intra-Domain Transfer Learning and Stacked Generalization of Deep Convolutional Neural Networks
Authors Arindam Das, Saikat Roy, Ujjwal Bhattacharya, Swapan Kumar Parui
Abstract In this work, a region-based Deep Convolutional Neural Network framework is proposed for document structure learning. The contribution of this work involves efficient training of region-based classifiers and effective ensembling for document image classification. A primary level of ‘inter-domain’ transfer learning is used by exporting weights from a pre-trained VGG16 architecture on the ImageNet dataset to train a document classifier on whole document images. Exploiting the nature of region-based influence modelling, a secondary level of ‘intra-domain’ transfer learning is used for rapid training of deep learning models for image segments. Finally, stacked generalization based ensembling is utilized for combining the predictions of the base deep neural network models. The proposed method achieves state-of-the-art accuracy of 92.2% on the popular RVL-CDIP document image dataset, exceeding benchmarks set by existing algorithms.
Tasks Document Image Classification, Image Classification, Transfer Learning
Published 2018-01-29
URL http://arxiv.org/abs/1801.09321v3
PDF http://arxiv.org/pdf/1801.09321v3.pdf
PWC https://paperswithcode.com/paper/document-image-classification-with-intra
Repo https://github.com/microsoft/unilm/tree/master/layoutlm
Framework pytorch
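The pipeline described above, fine-tune an ImageNet-pretrained VGG16 on whole pages ('inter-domain' transfer), warm-start region classifiers from that document-tuned model ('intra-domain' transfer), then combine the base predictions with a stacked meta-classifier, can be sketched with torchvision and scikit-learn. The region split and meta-learner choice are illustrative assumptions:

```python
import numpy as np
import torch.nn as nn
from torchvision import models
from sklearn.linear_model import LogisticRegression

NUM_CLASSES = 16   # RVL-CDIP has 16 document classes

def make_vgg_classifier(weights_from=None):
    """VGG16 with ImageNet weights; optionally warm-start from a document-tuned model."""
    model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
    model.classifier[6] = nn.Linear(4096, NUM_CLASSES)
    if weights_from is not None:                        # intra-domain transfer
        model.load_state_dict(weights_from.state_dict())
    return model

# Level 1: whole-page model (inter-domain transfer from ImageNet), then region models
# warm-started from it (intra-domain transfer). Each is fine-tuned on its own crops.
whole_page = make_vgg_classifier()
regions = {name: make_vgg_classifier(weights_from=whole_page)
           for name in ["header", "footer", "left_body", "right_body"]}

# Level 2: stacked generalization -- a meta-classifier over the base softmax outputs.
def stack_features(base_probs):
    """base_probs: list of (N, NUM_CLASSES) arrays, one per base model."""
    return np.concatenate(base_probs, axis=1)

meta = LogisticRegression(max_iter=1000)  # fit on held-out stack_features(...) and labels
```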

Diversity is All You Need: Learning Skills without a Reward Function

Title Diversity is All You Need: Learning Skills without a Reward Function
Authors Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, Sergey Levine
Abstract Intelligent creatures can explore their environments and learn useful skills without supervision. In this paper, we propose DIAYN (‘Diversity is All You Need’), a method for learning useful skills without a reward function. Our proposed method learns skills by maximizing an information theoretic objective using a maximum entropy policy. On a variety of simulated robotic tasks, we show that this simple objective results in the unsupervised emergence of diverse skills, such as walking and jumping. In a number of reinforcement learning benchmark environments, our method is able to learn a skill that solves the benchmark task despite never receiving the true task reward. We show how pretrained skills can provide a good parameter initialization for downstream tasks, and can be composed hierarchically to solve complex, sparse reward tasks. Our results suggest that unsupervised discovery of skills can serve as an effective pretraining mechanism for overcoming challenges of exploration and data efficiency in reinforcement learning.
Tasks
Published 2018-02-16
URL http://arxiv.org/abs/1802.06070v6
PDF http://arxiv.org/pdf/1802.06070v6.pdf
PWC https://paperswithcode.com/paper/diversity-is-all-you-need-learning-skills
Repo https://github.com/navneet-nmk/Hierarchical-Meta-Reinforcement-Learning
Framework pytorch
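DIAYN's objective reduces to a simple recipe: sample a latent skill z per episode, condition the policy on it, and replace the environment reward with how well a learned discriminator can infer z from the visited states (the max-entropy policy adds an entropy bonus on top). The pseudo-reward computation can be sketched as follows; the network shapes are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_SKILLS, STATE_DIM = 10, 8

discriminator = nn.Sequential(                 # q(z | s): infers the skill from the state
    nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, NUM_SKILLS))

def diayn_reward(state, skill_id):
    """Intrinsic reward: log q(z|s) - log p(z), with p(z) uniform over skills."""
    log_q_z = F.log_softmax(discriminator(state), dim=-1)[..., skill_id]
    log_p_z = -torch.log(torch.tensor(float(NUM_SKILLS)))
    return (log_q_z - log_p_z).detach()        # treated as a fixed reward by the RL agent

# The discriminator itself is trained to classify which skill produced each state:
def discriminator_loss(states, skill_ids):
    return F.cross_entropy(discriminator(states), skill_ids)
```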

Bidirectional Learning for Robust Neural Networks

Title Bidirectional Learning for Robust Neural Networks
Authors Sidney Pontes-Filho, Marcus Liwicki
Abstract A multilayer perceptron can behave as a generative classifier by applying bidirectional learning (BL). It consists of training an undirected neural network to map input to output and vice versa; therefore it can produce a classifier in one direction and a generator in the opposite direction for the same data. The learning process of BL tries to reproduce the neuroplasticity stated in Hebbian theory using only backward propagation of errors. In this paper, two novel learning techniques are introduced which use BL for improving robustness to white noise static and adversarial examples. The first method is bidirectional propagation of errors, in which error propagation occurs in both backward and forward directions. Motivated by the fact that its generative model receives a constant vector per class as input, we introduce as a second method the hybrid adversarial networks (HAN). Its generative model receives a random vector as input and its training is based on generative adversarial networks (GAN). To assess the performance of BL, we perform experiments using several architectures with fully connected and convolutional layers, with and without bias. Experimental results show that both methods improve robustness to white noise static and adversarial examples, and even increase accuracy, but they behave differently depending on the architecture and task, making one or the other more beneficial. Nevertheless, HAN using a convolutional architecture with batch normalization presents outstanding robustness, reaching state-of-the-art accuracy on adversarial examples of hand-written digits.
Tasks
Published 2018-05-21
URL http://arxiv.org/abs/1805.08006v2
PDF http://arxiv.org/pdf/1805.08006v2.pdf
PWC https://paperswithcode.com/paper/bidirectional-learning-for-robust-neural
Repo https://github.com/sidneyp/bidirectional
Framework tf
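The bidirectional idea, one set of weights trained to classify in the forward direction and to generate inputs in the backward direction, can be illustrated with a single shared weight matrix. This is a minimal sketch of the shared-weight pattern, not the paper's training schedule:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BidirectionalMLP(nn.Module):
    """One weight matrix used in both directions: x -> logits and class code -> x."""
    def __init__(self, in_dim=784, num_classes=10):
        super().__init__()
        self.W = nn.Parameter(torch.randn(num_classes, in_dim) * 0.01)
        self.b_fwd = nn.Parameter(torch.zeros(num_classes))
        self.b_bwd = nn.Parameter(torch.zeros(in_dim))

    def forward(self, x):                      # classification direction
        return x @ self.W.t() + self.b_fwd

    def backward_generate(self, class_code):   # generative direction with shared weights
        return torch.sigmoid(class_code @ self.W + self.b_bwd)

model = BidirectionalMLP()
x = torch.rand(8, 784)
y = torch.randint(0, 10, (8,))
onehot = F.one_hot(y, 10).float()
loss = F.cross_entropy(model(x), y) + F.mse_loss(model.backward_generate(onehot), x)
loss.backward()                                # both directions update the shared weights
```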

Stochastic Answer Networks for SQuAD 2.0

Title Stochastic Answer Networks for SQuAD 2.0
Authors Xiaodong Liu, Wei Li, Yuwei Fang, Aerin Kim, Kevin Duh, Jianfeng Gao
Abstract This paper presents an extension of the Stochastic Answer Network (SAN), one of the state-of-the-art machine reading comprehension models, to judge whether a question is unanswerable. The extended SAN contains two components: a span detector and a binary classifier for judging whether the question is unanswerable, and both components are jointly optimized. Experiments show that SAN achieves results competitive with the state of the art on the Stanford Question Answering Dataset (SQuAD) 2.0. To facilitate research in this field, we release our code: https://github.com/kevinduh/san_mrc.
Tasks Machine Reading Comprehension, Question Answering, Reading Comprehension
Published 2018-09-24
URL http://arxiv.org/abs/1809.09194v1
PDF http://arxiv.org/pdf/1809.09194v1.pdf
PWC https://paperswithcode.com/paper/stochastic-answer-networks-for-squad-20
Repo https://github.com/kevinduh/san_mrc
Framework pytorch
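The extension boils down to a second head: alongside the span detector, a binary classifier decides whether the question is answerable, and the two losses are optimized jointly. A minimal sketch of that two-head arrangement over shared passage/question encodings follows; the encoder, pooling, and loss weighting are assumptions, not the SAN architecture itself:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpanPlusAnswerable(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.start_head = nn.Linear(hidden, 1)            # span start score per token
        self.end_head = nn.Linear(hidden, 1)              # span end score per token
        self.answerable_head = nn.Linear(hidden, 2)       # answerable vs. unanswerable

    def forward(self, passage_enc):                        # (B, T, hidden) from any encoder
        start_logits = self.start_head(passage_enc).squeeze(-1)
        end_logits = self.end_head(passage_enc).squeeze(-1)
        answerable_logits = self.answerable_head(passage_enc.mean(dim=1))
        return start_logits, end_logits, answerable_logits

def joint_loss(outputs, start_gold, end_gold, answerable_gold, lam=1.0):
    start_logits, end_logits, ans_logits = outputs
    span = F.cross_entropy(start_logits, start_gold) + F.cross_entropy(end_logits, end_gold)
    return span + lam * F.cross_entropy(ans_logits, answerable_gold)
```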

Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models

Title Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models
Authors Daniil Polykovskiy, Alexander Zhebrak, Benjamin Sanchez-Lengeling, Sergey Golovanov, Oktai Tatanov, Stanislav Belyaev, Rauf Kurbanov, Aleksey Artamonov, Vladimir Aladinskiy, Mark Veselov, Artur Kadurin, Simon Johansson, Hongming Chen, Sergey Nikolenko, Alan Aspuru-Guzik, Alex Zhavoronkov
Abstract Deep generative models such as generative adversarial networks, variational autoencoders, and autoregressive models are rapidly growing in popularity for the discovery of new molecules and materials. In this work, we introduce MOlecular SEtS (MOSES), a benchmarking platform to support research on machine learning for drug discovery. MOSES implements several popular molecular generation models and includes a set of metrics that evaluate the diversity and quality of generated molecules. MOSES is meant to standardize the research on molecular generation and facilitate the sharing and comparison of new models. Additionally, we provide a large-scale comparison of existing state of the art models and elaborate on current challenges for generative models that might prove fertile ground for new research. Our platform and source code are freely available at https://github.com/molecularsets/moses.
Tasks Drug Discovery
Published 2018-11-29
URL https://arxiv.org/abs/1811.12823v3
PDF https://arxiv.org/pdf/1811.12823v3.pdf
PWC https://paperswithcode.com/paper/molecular-sets-moses-a-benchmarking-platform
Repo https://github.com/aclyde11/RNNGenerator
Framework pytorch
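Metrics of the kind MOSES standardizes, such as validity, uniqueness, and novelty of generated SMILES, can be computed directly with RDKit. The sketch below only conveys the general idea; the MOSES package itself exposes a richer metric suite (e.g. FCD and scaffold similarity):

```python
from rdkit import Chem

def basic_generation_metrics(generated_smiles, training_smiles):
    """Validity, uniqueness, and novelty of generated SMILES strings."""
    canonical = []
    for smi in generated_smiles:
        mol = Chem.MolFromSmiles(smi)                    # None if the SMILES is invalid
        if mol is not None:
            canonical.append(Chem.MolToSmiles(mol))      # canonical form for deduplication
    validity = len(canonical) / len(generated_smiles)
    unique = set(canonical)
    uniqueness = len(unique) / max(len(canonical), 1)
    train_set = {Chem.MolToSmiles(m)
                 for m in map(Chem.MolFromSmiles, training_smiles) if m is not None}
    novelty = len(unique - train_set) / max(len(unique), 1)
    return {"validity": validity, "uniqueness": uniqueness, "novelty": novelty}
```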