Paper Group AWR 198
Dirichlet belief networks for topic structure learning. Confidence from Invariance to Image Transformations. PyText: A Seamless Path from NLP research to production. Solving Jigsaw Puzzles By the Graph Connection Laplacian. Revisiting Video Saliency: A Large-scale Benchmark and a New Model. Attention Based Fully Convolutional Network for Speech Emotion Recognition. Multimodal Utterance-level Affect Analysis using Visual, Audio and Text Features. Fast Cylinder and Plane Extraction from Depth Cameras for Visual Odometry. Deep Affect Prediction in-the-wild: Aff-Wild Database and Challenge, Deep Architectures, and Beyond. Tracking by Animation: Unsupervised Learning of Multi-Object Attentive Trackers. Document Image Classification with Intra-Domain Transfer Learning and Stacked Generalization of Deep Convolutional Neural Networks. Diversity is All You Need: Learning Skills without a Reward Function. Bidirectional Learning for Robust Neural Networks. Stochastic Answer Networks for SQuAD 2.0. Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models.
Dirichlet belief networks for topic structure learning
Title | Dirichlet belief networks for topic structure learning |
Authors | He Zhao, Lan Du, Wray Buntine, Mingyuan Zhou |
Abstract | Recently, considerable research effort has been devoted to developing deep architectures for topic models to learn topic structures. Although several deep models have been proposed to learn better topic proportions of documents, how to leverage the benefits of deep structures for learning word distributions of topics has not yet been rigorously studied. Here we propose a new multi-layer generative process on word distributions of topics, where each layer consists of a set of topics and each topic is drawn from a mixture of the topics of the layer above. As the topics in all layers can be directly interpreted by words, the proposed model is able to discover interpretable topic hierarchies. As a self-contained module, our model can be flexibly adapted to different kinds of topic models to improve their modelling accuracy and interpretability. Extensive experiments on text corpora demonstrate the advantages of the proposed model. |
Tasks | Topic Models |
Published | 2018-11-02 |
URL | http://arxiv.org/abs/1811.00717v1 |
PDF | http://arxiv.org/pdf/1811.00717v1.pdf |
PWC | https://paperswithcode.com/paper/dirichlet-belief-networks-for-topic-structure |
Repo | https://github.com/ethanhezhao/DirBN |
Framework | none |
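The layer-wise construction in the abstract can be illustrated with a small NumPy sketch (not taken from the DirBN repo): each bottom-layer topic's word distribution is drawn from a Dirichlet whose base measure is a mixture of the topics in the layer above, so every topic in every layer remains a distribution over words. The vocabulary size, layer widths and hyperparameters below are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

V = 1000                   # vocabulary size (hypothetical)
K_top, K_bottom = 5, 20    # topics per layer (hypothetical)
eta, gamma = 0.05, 50.0    # Dirichlet smoothing / concentration (hypothetical)

# Top layer: topics drawn from a symmetric Dirichlet over the vocabulary.
phi_top = rng.dirichlet(np.full(V, eta), size=K_top)            # (K_top, V)

# Connection weights: how strongly each bottom topic mixes the top topics.
psi = rng.gamma(shape=1.0, scale=1.0, size=(K_bottom, K_top))
psi /= psi.sum(axis=1, keepdims=True)

# Bottom layer: each topic is a Dirichlet draw whose mean is a mixture of the
# top-layer topics, so it can still be read directly as words.
phi_bottom = np.vstack([
    rng.dirichlet(gamma * psi[k] @ phi_top + 1e-6)  # small floor for stability
    for k in range(K_bottom)
])                                                               # (K_bottom, V)

print(phi_bottom.shape, phi_bottom.sum(axis=1)[:3])  # rows sum to 1
```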
Confidence from Invariance to Image Transformations
Title | Confidence from Invariance to Image Transformations |
Authors | Yuval Bahat, Gregory Shakhnarovich |
Abstract | We develop a technique for automatically detecting the classification errors of a pre-trained visual classifier. Our method is agnostic to the form of the classifier, requiring access only to classifier responses to a set of inputs. We train a parametric binary classifier (error/correct) on a representation derived from a set of classifier responses generated from multiple copies of the same input, each subject to a different natural image transformation. Thus, we establish a measure of confidence in the classifier’s decision by analyzing the invariance of its decision under various transformations. In experiments with multiple data sets (STL-10, CIFAR-100, ImageNet) and classifiers, we demonstrate a new state of the art for the error detection task. In addition, we apply our technique to novelty detection scenarios, where we also demonstrate state-of-the-art results. |
Tasks | |
Published | 2018-04-02 |
URL | http://arxiv.org/abs/1804.00657v1 |
PDF | http://arxiv.org/pdf/1804.00657v1.pdf |
PWC | https://paperswithcode.com/paper/confidence-from-invariance-to-image |
Repo | https://github.com/YuvalBahat/Confidence_From_Invariance |
Framework | tf |
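A minimal sketch of the idea described above, with assumed ingredients: the transformation set (flip, small shifts), the toy stand-in classifier and the placeholder correctness labels are all illustrative, not the paper's exact setup. The representation is simply the stacked softmax responses of the same classifier on several transformed copies of the input, and a logistic-regression detector is trained on it.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def invariance_features(predict_proba, image):
    """Stack the classifier's softmax responses under several simple
    transformations of the same image into one representation vector."""
    views = [
        image,
        image[:, ::-1],                      # horizontal flip
        np.roll(image, 2, axis=0),           # shift down by 2 px
        np.roll(image, -2, axis=1),          # shift left by 2 px
    ]
    return np.concatenate([predict_proba(v) for v in views])

def toy_predict_proba(img, n_classes=10):
    """Stand-in classifier (assumption: replace with a real model's softmax)."""
    logits = np.array([img[::i + 1, ::i + 1].mean() for i in range(n_classes)])
    e = np.exp(logits - logits.max())
    return e / e.sum()

rng = np.random.default_rng(0)
images = rng.random((200, 32, 32))
correct = rng.integers(0, 2, size=200)       # placeholder error/correct labels

X = np.stack([invariance_features(toy_predict_proba, im) for im in images])
detector = LogisticRegression(max_iter=1000).fit(X, correct)
print("feature dim:", X.shape[1], "train acc:", detector.score(X, correct))
```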
PyText: A Seamless Path from NLP research to production
Title | PyText: A Seamless Path from NLP research to production |
Authors | Ahmed Aly, Kushal Lakhotia, Shicong Zhao, Mrinal Mohit, Barlas Oguz, Abhinav Arora, Sonal Gupta, Christopher Dewan, Stef Nelson-Lindall, Rushin Shah |
Abstract | We introduce PyText - a deep learning based NLP modeling framework built on PyTorch. PyText addresses the often-conflicting requirements of enabling rapid experimentation and of serving models at scale. It achieves this by providing simple and extensible interfaces for model components, and by using PyTorch’s capabilities of exporting models for inference via the optimized Caffe2 execution engine. We report our own experience of migrating experimentation and production workflows to PyText, which enabled us to iterate faster on novel modeling ideas and then seamlessly ship them at industrial scale. |
Tasks | |
Published | 2018-12-12 |
URL | http://arxiv.org/abs/1812.08729v1 |
PDF | http://arxiv.org/pdf/1812.08729v1.pdf |
PWC | https://paperswithcode.com/paper/pytext-a-seamless-path-from-nlp-research-to |
Repo | https://github.com/AMinerOpen/pytext_clf |
Framework | none |
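PyText has its own config-driven API that is not reproduced here; the sketch below only illustrates the underlying research-to-production path the abstract refers to, i.e. exporting a hypothetical, tiny PyTorch text classifier through torch.onnx so it can be served by an optimized runtime such as Caffe2.

```python
import torch
import torch.nn as nn

class TinyTextClassifier(nn.Module):
    """Hypothetical mean-of-embeddings classifier, not a PyText model."""
    def __init__(self, vocab_size=10000, embed_dim=64, num_classes=4):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.fc = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids):                  # (batch, seq_len)
        pooled = self.embedding(token_ids).mean(dim=1)
        return self.fc(pooled)

model = TinyTextClassifier().eval()
dummy = torch.randint(0, 10000, (2, 16))           # dummy batch fixes the traced graph

# Export to ONNX; at the time of the paper, such a graph would then be served
# through the Caffe2 execution engine.
torch.onnx.export(model, (dummy,), "tiny_text_classifier.onnx",
                  input_names=["token_ids"], output_names=["logits"],
                  dynamic_axes={"token_ids": {0: "batch"}, "logits": {0: "batch"}})
```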
Solving Jigsaw Puzzles By the Graph Connection Laplacian
Title | Solving Jigsaw Puzzles By the Graph Connection Laplacian |
Authors | Vahan Huroyan, Gilad Lerman, Hau-Tieng Wu |
Abstract | We propose a novel mathematical framework to address the problem of automatically solving large jigsaw puzzles. This problem assumes a large image, which is cut into equal square pieces that are arbitrarily rotated and shuffled, and asks to recover the original image given the transformed pieces. The main contribution of this work is a method for recovering the rotations of the pieces when both shuffles and rotations are unknown. A major challenge of this procedure is estimating the graph connection Laplacian without the knowledge of shuffles. We guarantee some robustness of the latter estimate to measurement errors. A careful combination of our proposed method for estimating rotations with any existing method for estimating shuffles results in a practical solution for the jigsaw puzzle problem. Numerical experiments demonstrate the competitive performance of this solution. |
Tasks | |
Published | 2018-11-07 |
URL | https://arxiv.org/abs/1811.03188v3 |
PDF | https://arxiv.org/pdf/1811.03188v3.pdf |
PWC | https://paperswithcode.com/paper/solving-jigsaw-puzzles-by-the-graph |
Repo | https://github.com/ctralie/DynamicsSynchronization |
Framework | none |
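A toy NumPy sketch of the rotation-synchronization step behind the method: given noisy pairwise rotation measurements between pieces, the top eigenvectors of the degree-normalized connection operator (equivalently, the bottom eigenvectors of the graph connection Laplacian) recover each piece's rotation up to a global rotation. The complete graph, noise model and sizes are assumptions, and the paper's key contribution of estimating this matrix when the shuffles are unknown is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30                                    # number of puzzle pieces (toy)

def rot(t):
    return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

# Ground-truth piece rotations and noisy pairwise measurements R_ij ~ R_i R_j^T.
true = [rot(rng.uniform(0, 2 * np.pi)) for _ in range(n)]
S = np.zeros((2 * n, 2 * n))
for i in range(n):
    S[2*i:2*i+2, 2*i:2*i+2] = np.eye(2)
    for j in range(i + 1, n):
        Rij = true[i] @ true[j].T @ rot(rng.normal(0, 0.1))   # noisy measurement
        S[2*i:2*i+2, 2*j:2*j+2] = Rij
        S[2*j:2*j+2, 2*i:2*i+2] = Rij.T

# Degree-normalized connection operator (every pair observed, so degree = n);
# its top eigenvectors encode the unknown rotations.
vals, vecs = np.linalg.eigh(S / n)
U = vecs[:, -2:]                          # top-2 eigenvectors, shape (2n, 2)

def project_so2(M):
    """Project a 2x2 matrix onto the nearest rotation via SVD."""
    u, _, vt = np.linalg.svd(M)
    R = u @ vt
    if np.linalg.det(R) < 0:
        R = u @ np.diag([1.0, -1.0]) @ vt
    return R

# Relative rotations R_i R_0^T can be read off from products of blocks of U,
# independently of the 2x2 ambiguity in the eigenvector basis.
B0 = U[0:2, :]
est_rel = [project_so2(U[2*i:2*i+2, :] @ B0.T) for i in range(n)]
true_rel = [true[i] @ true[0].T for i in range(n)]
errs = [np.linalg.norm(est_rel[i] - true_rel[i]) for i in range(n)]
print("mean error of recovered relative rotations:", np.mean(errs))
```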
Revisiting Video Saliency: A Large-scale Benchmark and a New Model
Title | Revisiting Video Saliency: A Large-scale Benchmark and a New Model |
Authors | Wenguan Wang, Jianbing Shen, Fang Guo, Ming-Ming Cheng, Ali Borji |
Abstract | In this work, we contribute to video saliency research in two ways. First, we introduce a new benchmark for predicting human eye movements during dynamic scene free-viewing, which has long been needed in this field. Our dataset, named DHF1K (Dynamic Human Fixation), consists of 1K high-quality, elaborately selected video sequences spanning a large range of scenes, motions, object types and background complexity. Existing video saliency datasets lack the variety and generality of common dynamic scenes and fall short in covering challenging situations in unconstrained environments. In contrast, DHF1K makes a significant leap in terms of scalability, diversity and difficulty, and is expected to boost video saliency modeling. Second, we propose a novel video saliency model that augments the CNN-LSTM network architecture with an attention mechanism to enable fast, end-to-end saliency learning. The attention mechanism explicitly encodes static saliency information, thus allowing the LSTM to focus on learning more flexible temporal saliency representations across successive frames. Such a design fully leverages existing large-scale static fixation datasets, avoids overfitting, and significantly improves training efficiency and testing performance. We thoroughly examine the performance of our model, with respect to state-of-the-art saliency models, on three large-scale datasets (i.e., DHF1K, Hollywood2, UCF sports). Experimental results over more than 1.2K testing videos containing 400K frames demonstrate that our model outperforms other competitors. |
Tasks | |
Published | 2018-01-23 |
URL | http://arxiv.org/abs/1801.07424v3 |
PDF | http://arxiv.org/pdf/1801.07424v3.pdf |
PWC | https://paperswithcode.com/paper/revisiting-video-saliency-a-large-scale |
Repo | https://github.com/wenguanwang/DHF1K |
Framework | tf |
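A much-simplified PyTorch sketch of the architecture pattern the abstract describes: per-frame CNN features, a static spatial attention map that gates them, and an LSTM for temporal modelling. It is not ACLNet itself, and all layer sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToySaliencyNet(nn.Module):
    """Simplified CNN + static attention + LSTM video saliency sketch."""
    def __init__(self, feat=32, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.attention = nn.Conv2d(feat, 1, 1)       # static spatial attention
        self.lstm = nn.LSTM(feat, hidden, batch_first=True)
        self.readout = nn.Linear(hidden, feat)

    def forward(self, clip):                          # clip: (B, T, 3, H, W)
        B, T, C, H, W = clip.shape
        feats = self.encoder(clip.reshape(B * T, C, H, W))    # (B*T, F, h, w)
        gated = feats * torch.sigmoid(self.attention(feats))  # attention gating
        pooled = gated.mean(dim=(2, 3)).reshape(B, T, -1)     # (B, T, F)
        temporal, _ = self.lstm(pooled)                       # (B, T, hidden)
        weights = self.readout(temporal).reshape(B * T, -1, 1, 1)
        sal = (gated * weights).sum(dim=1, keepdim=True)      # (B*T, 1, h, w)
        sal = F.interpolate(sal, size=(H, W), mode="bilinear",
                            align_corners=False)
        return torch.sigmoid(sal).reshape(B, T, 1, H, W)

model = ToySaliencyNet()
print(model(torch.rand(2, 5, 3, 64, 64)).shape)       # (2, 5, 1, 64, 64)
```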
Attention Based Fully Convolutional Network for Speech Emotion Recognition
Title | Attention Based Fully Convolutional Network for Speech Emotion Recognition |
Authors | Yuanyuan Zhang, Jun Du, Zirui Wang, Jianshu Zhang |
Abstract | Speech emotion recognition is a challenging task for three main reasons: 1) human emotion is abstract, which means it is hard to distinguish; 2) in general, human emotion can only be detected in some specific moments during a long utterance; 3) speech data with emotional labeling is usually limited. In this paper, we present a novel attention-based fully convolutional network for speech emotion recognition. We employ a fully convolutional network because it can handle variable-length speech without segmentation, so no critical information is lost. The proposed attention mechanism makes the model aware of which time-frequency regions of the speech spectrogram are more emotion-relevant. Given the limited data, transfer learning is also applied to improve accuracy. In particular, a clear improvement is observed when using a model pre-trained on natural scene images. Validated on the publicly available IEMOCAP corpus, the proposed model outperformed state-of-the-art methods with a weighted accuracy of 70.4% and an unweighted accuracy of 63.9%. |
Tasks | Emotion Recognition, Speech Emotion Recognition, Transfer Learning |
Published | 2018-06-05 |
URL | http://arxiv.org/abs/1806.01506v2 |
PDF | http://arxiv.org/pdf/1806.01506v2.pdf |
PWC | https://paperswithcode.com/paper/attention-based-fully-convolutional-network |
Repo | https://github.com/Speech-VINO/SER |
Framework | pytorch |
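A small PyTorch sketch of the two ingredients named in the abstract: a fully convolutional front end, which accepts spectrograms of any length, and attention pooling over time-frequency positions to form an utterance-level vector. Layer sizes and the number of emotion classes are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class AttentiveFCN(nn.Module):
    """Sketch: FCN over a spectrogram + attention pooling over time-frequency
    positions, then an utterance-level emotion classifier."""
    def __init__(self, n_emotions=4, feat=64):
        super().__init__()
        self.fcn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, feat, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.attn_score = nn.Conv2d(feat, 1, 1)      # one score per T-F position
        self.classifier = nn.Linear(feat, n_emotions)

    def forward(self, spec):                          # spec: (B, 1, freq, time)
        fmap = self.fcn(spec)                         # (B, F, f', t'), any length
        scores = self.attn_score(fmap).flatten(2)     # (B, 1, f'*t')
        weights = torch.softmax(scores, dim=-1)       # attention over positions
        pooled = (fmap.flatten(2) * weights).sum(-1)  # (B, F) utterance vector
        return self.classifier(pooled)

model = AttentiveFCN()
# Two utterances of different lengths are handled by the same network.
print(model(torch.rand(1, 1, 40, 300)).shape, model(torch.rand(1, 1, 40, 123)).shape)
```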
Multimodal Utterance-level Affect Analysis using Visual, Audio and Text Features
Title | Multimodal Utterance-level Affect Analysis using Visual, Audio and Text Features |
Authors | Didan Deng, Yuqian Zhou, Jimin Pi, Bertram E. Shi |
Abstract | The integration of information across multiple modalities and across time is a promising way to enhance the emotion recognition performance of affective systems. Much previous work has focused on instantaneous emotion recognition. The 2018 One-Minute Gradual-Emotion Recognition (OMG-Emotion) challenge, which was held in conjunction with the IEEE World Congress on Computational Intelligence, encouraged participants to address long-term emotion recognition by integrating cues from multiple modalities, including facial expression, audio and language. Intuitively, a multi-modal inference network should be able to leverage information from each modality and their correlations to improve recognition over that achievable by a single modality network. We describe here a multi-modal neural architecture that integrates visual information over time using an LSTM, and combines it with utterance-level audio and text cues to recognize human sentiment from multimodal clips. Our model outperforms the unimodal baseline, achieving concordance correlation coefficients (CCC) of 0.400 on the arousal task and 0.353 on the valence task. |
Tasks | Emotion Recognition |
Published | 2018-05-02 |
URL | http://arxiv.org/abs/1805.00625v2 |
PDF | http://arxiv.org/pdf/1805.00625v2.pdf |
PWC | https://paperswithcode.com/paper/multimodal-utterance-level-affect-analysis |
Repo | https://github.com/toxtli/AutomEditor |
Framework | none |
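A PyTorch sketch of the fusion pattern described above: an LSTM summarizes per-frame visual features over time, and its final state is concatenated with utterance-level audio and text vectors before regressing arousal and valence. Feature dimensions are hypothetical and no pretrained feature extractors are included.

```python
import torch
import torch.nn as nn

class MultimodalAffect(nn.Module):
    """Sketch of LSTM-over-visual-features fusion with utterance-level
    audio and text vectors; feature sizes are hypothetical."""
    def __init__(self, vis_dim=512, aud_dim=128, txt_dim=300, hidden=128):
        super().__init__()
        self.visual_lstm = nn.LSTM(vis_dim, hidden, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden + aud_dim + txt_dim, 128), nn.ReLU(),
            nn.Linear(128, 2),                      # [arousal, valence]
        )

    def forward(self, vis_seq, aud_vec, txt_vec):   # vis_seq: (B, T, vis_dim)
        _, (h_n, _) = self.visual_lstm(vis_seq)     # final hidden state (1, B, H)
        fused = torch.cat([h_n[-1], aud_vec, txt_vec], dim=-1)
        return torch.tanh(self.head(fused))         # bounded affect scores

model = MultimodalAffect()
out = model(torch.rand(4, 30, 512), torch.rand(4, 128), torch.rand(4, 300))
print(out.shape)                                    # (4, 2)
```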
Fast Cylinder and Plane Extraction from Depth Cameras for Visual Odometry
Title | Fast Cylinder and Plane Extraction from Depth Cameras for Visual Odometry |
Authors | Pedro F. Proença, Yang Gao |
Abstract | This paper presents CAPE, a method to extract planes and cylinder segments from organized point clouds, which processes 640x480 depth images on a single CPU core at an average of 300 Hz by operating on a grid of planar cells. Compared to state-of-the-art plane extraction, CAPE's latency is more consistent and 4-10 times lower, depending on the scene. We also demonstrate empirically that applying CAPE to visual odometry can improve trajectory estimation on scenes made of cylindrical surfaces (e.g. tunnels), whereas using a plane extraction approach that is not curve-aware deteriorates performance on these scenes. To use these geometric primitives in visual odometry, we propose extending a probabilistic RGB-D odometry framework based on points, lines and planes to cylinder primitives. Following this framework, CAPE runs on fused depth maps, and the parameters of cylinders are modelled probabilistically to account for uncertainty and to weight the pose optimization residuals accordingly. |
Tasks | Visual Odometry |
Published | 2018-03-06 |
URL | http://arxiv.org/abs/1803.02380v3 |
PDF | http://arxiv.org/pdf/1803.02380v3.pdf |
PWC | https://paperswithcode.com/paper/fast-cylinder-and-plane-extraction-from-depth |
Repo | https://github.com/pedropro/CAPE |
Framework | none |
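A NumPy sketch of the "grid of planar cells" idea only: fit a plane to each cell of an organized point cloud via PCA and keep cells whose residual is small. The cell size, threshold and toy cloud are assumptions, and CAPE's cell merging, cylinder fitting and real-time optimizations are not reproduced.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through a cell of 3D points via PCA: the normal is
    the eigenvector of the covariance with the smallest eigenvalue."""
    centroid = points.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov((points - centroid).T))
    return centroid, eigvecs[:, 0], eigvals[0]   # centroid, normal, residual var

def planar_cells(cloud, cell=20, mse_thresh=1e-4):
    """Grid pass over an organized point cloud (H, W, 3): fit a plane per cell
    and keep cells whose residual is small."""
    H, W, _ = cloud.shape
    kept = []
    for r in range(0, H - cell + 1, cell):
        for c in range(0, W - cell + 1, cell):
            pts = cloud[r:r + cell, c:c + cell].reshape(-1, 3)
            centroid, normal, mse = fit_plane(pts)
            if mse < mse_thresh:
                kept.append(((r, c), centroid, normal))
    return kept

# Toy organized cloud: a tilted plane plus mild noise.
H, W = 120, 160
u, v = np.meshgrid(np.arange(W), np.arange(H))
z = 0.002 * u + 0.001 * v + 1.0 + np.random.default_rng(0).normal(0, 0.002, (H, W))
cloud = np.stack([u * 0.01, v * 0.01, z], axis=-1)
print(len(planar_cells(cloud)), "planar cells found")
```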
Deep Affect Prediction in-the-wild: Aff-Wild Database and Challenge, Deep Architectures, and Beyond
Title | Deep Affect Prediction in-the-wild: Aff-Wild Database and Challenge, Deep Architectures, and Beyond |
Authors | Dimitrios Kollias, Panagiotis Tzirakis, Mihalis A. Nicolaou, Athanasios Papaioannou, Guoying Zhao, Björn Schuller, Irene Kotsia, Stefanos Zafeiriou |
Abstract | Automatic understanding of human affect using visual signals is of great importance in everyday human-machine interactions. Appraising human emotional states, behaviors and reactions displayed in real-world settings can be accomplished using latent continuous dimensions (e.g., the circumplex model of affect). Valence (i.e., how positive or negative an emotion is) and arousal (i.e., the power of the emotion's activation) constitute popular and effective affect representations. Nevertheless, the majority of datasets collected thus far, although containing naturalistic emotional states, have been captured in highly controlled recording conditions. In this paper, we introduce the Aff-Wild benchmark for training and evaluating affect recognition algorithms. We also report on the results of the First Affect-in-the-wild Challenge, which was organized in conjunction with CVPR 2017 on the Aff-Wild database and was the first ever challenge on the estimation of valence and arousal in-the-wild. Furthermore, we design and extensively train an end-to-end deep neural architecture which performs prediction of continuous emotion dimensions based on visual cues. The proposed deep learning architecture, AffWildNet, includes convolutional and recurrent neural network layers, exploiting the invariant properties of convolutional features while also modeling temporal dynamics that arise in human behavior via the recurrent layers. AffWildNet produced state-of-the-art results on the Aff-Wild Challenge. We then exploit the Aff-Wild database for learning features, which can be used as priors to achieve the best performance, compared to all other methods designed for the same goal, on both dimensional and categorical emotion recognition using the RECOLA, AFEW-VA and EmotiW datasets. The database and emotion recognition models are available at http://ibug.doc.ic.ac.uk/resources/first-affect-wild-challenge. |
Tasks | Emotion Recognition |
Published | 2018-04-29 |
URL | http://arxiv.org/abs/1804.10938v5 |
PDF | http://arxiv.org/pdf/1804.10938v5.pdf |
PWC | https://paperswithcode.com/paper/deep-affect-prediction-in-the-wild-aff-wild |
Repo | https://github.com/dkollias/Aff-Wild-models |
Framework | tf |
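Valence and arousal in this line of work are typically evaluated with the concordance correlation coefficient (CCC); below is a small PyTorch implementation of CCC and the corresponding 1 − CCC training loss, as a generic sketch rather than the AffWildNet training code.

```python
import torch

def concordance_cc(pred, target, eps=1e-8):
    """Concordance correlation coefficient between two 1-D tensors:
    CCC = 2*cov(x,y) / (var(x) + var(y) + (mean_x - mean_y)^2)."""
    pred_mean, target_mean = pred.mean(), target.mean()
    pred_var = pred.var(unbiased=False)
    target_var = target.var(unbiased=False)
    cov = ((pred - pred_mean) * (target - target_mean)).mean()
    return 2 * cov / (pred_var + target_var + (pred_mean - target_mean) ** 2 + eps)

def ccc_loss(pred, target):
    """Loss that directly optimizes the evaluation metric."""
    return 1.0 - concordance_cc(pred, target)

# Example: valence predictions for a small batch of frames.
pred = torch.tensor([0.1, 0.4, 0.3, -0.2])
target = torch.tensor([0.0, 0.5, 0.2, -0.1])
print(float(concordance_cc(pred, target)), float(ccc_loss(pred, target)))
```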
Tracking by Animation: Unsupervised Learning of Multi-Object Attentive Trackers
Title | Tracking by Animation: Unsupervised Learning of Multi-Object Attentive Trackers |
Authors | Zhen He, Jian Li, Daxue Liu, Hangen He, David Barber |
Abstract | Online Multi-Object Tracking (MOT) from videos is a challenging computer vision task which has been extensively studied for decades. Most of the existing MOT algorithms are based on the Tracking-by-Detection (TBD) paradigm combined with popular machine learning approaches which largely reduce the human effort to tune algorithm parameters. However, the commonly used supervised learning approaches require the labeled data (e.g., bounding boxes), which is expensive for videos. Also, the TBD framework is usually suboptimal since it is not end-to-end, i.e., it considers the task as detection and tracking, but not jointly. To achieve both label-free and end-to-end learning of MOT, we propose a Tracking-by-Animation framework, where a differentiable neural model first tracks objects from input frames and then animates these objects into reconstructed frames. Learning is then driven by the reconstruction error through backpropagation. We further propose a Reprioritized Attentive Tracking to improve the robustness of data association. Experiments conducted on both synthetic and real video datasets show the potential of the proposed model. Our project page is publicly available at: https://github.com/zhen-he/tracking-by-animation |
Tasks | Multi-Object Tracking, Object Tracking, Online Multi-Object Tracking |
Published | 2018-09-10 |
URL | http://arxiv.org/abs/1809.03137v3 |
PDF | http://arxiv.org/pdf/1809.03137v3.pdf |
PWC | https://paperswithcode.com/paper/tracking-by-animation-unsupervised-learning |
Repo | https://github.com/zhen-he/tracking-by-animation |
Framework | pytorch |
Document Image Classification with Intra-Domain Transfer Learning and Stacked Generalization of Deep Convolutional Neural Networks
Title | Document Image Classification with Intra-Domain Transfer Learning and Stacked Generalization of Deep Convolutional Neural Networks |
Authors | Arindam Das, Saikat Roy, Ujjwal Bhattacharya, Swapan Kumar Parui |
Abstract | In this work, a region-based Deep Convolutional Neural Network framework is proposed for document structure learning. The contribution of this work involves efficient training of region-based classifiers and effective ensembling for document image classification. A primary level of 'inter-domain' transfer learning is used by exporting weights from a pre-trained VGG16 architecture on the ImageNet dataset to train a document classifier on whole document images. Exploiting the nature of region-based influence modelling, a secondary level of 'intra-domain' transfer learning is used for rapid training of deep learning models for image segments. Finally, stacked generalization based ensembling is utilized for combining the predictions of the base deep neural network models. The proposed method achieves a state-of-the-art accuracy of 92.2% on the popular RVL-CDIP document image dataset, exceeding benchmarks set by existing algorithms. |
Tasks | Document Image Classification, Image Classification, Transfer Learning |
Published | 2018-01-29 |
URL | http://arxiv.org/abs/1801.09321v3 |
PDF | http://arxiv.org/pdf/1801.09321v3.pdf |
PWC | https://paperswithcode.com/paper/document-image-classification-with-intra |
Repo | https://github.com/microsoft/unilm/tree/master/layoutlm |
Framework | pytorch |
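A minimal scikit-learn sketch of the stacked-generalization step only: the base models' class-probability outputs (simulated here; in the paper they come from region-wise CNNs over the whole page, header, body, etc.) are concatenated and a simple meta-classifier is trained on top. The base CNNs and the inter/intra-domain transfer learning are not reproduced.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder "base model" outputs: simulated softmax vectors for a 16-class
# problem (RVL-CDIP has 16 document classes); purely illustrative data.
rng = np.random.default_rng(0)
n_docs, n_classes, n_base_models = 2000, 16, 4
labels = rng.integers(0, n_classes, n_docs)
base_preds = []
for _ in range(n_base_models):
    noisy = np.eye(n_classes)[labels] + rng.normal(0, 0.8, (n_docs, n_classes))
    e = np.exp(noisy - noisy.max(axis=1, keepdims=True))
    base_preds.append(e / e.sum(axis=1, keepdims=True))

# Stacked generalization: concatenate the base models' class probabilities
# and train a simple meta-classifier on top of them.
X = np.hstack(base_preds)
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25, random_state=0)
meta = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
print("meta-classifier accuracy:", meta.score(X_te, y_te))
```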
Diversity is All You Need: Learning Skills without a Reward Function
Title | Diversity is All You Need: Learning Skills without a Reward Function |
Authors | Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, Sergey Levine |
Abstract | Intelligent creatures can explore their environments and learn useful skills without supervision. In this paper, we propose DIAYN (‘Diversity is All You Need’), a method for learning useful skills without a reward function. Our proposed method learns skills by maximizing an information theoretic objective using a maximum entropy policy. On a variety of simulated robotic tasks, we show that this simple objective results in the unsupervised emergence of diverse skills, such as walking and jumping. In a number of reinforcement learning benchmark environments, our method is able to learn a skill that solves the benchmark task despite never receiving the true task reward. We show how pretrained skills can provide a good parameter initialization for downstream tasks, and can be composed hierarchically to solve complex, sparse reward tasks. Our results suggest that unsupervised discovery of skills can serve as an effective pretraining mechanism for overcoming challenges of exploration and data efficiency in reinforcement learning. |
Tasks | |
Published | 2018-02-16 |
URL | http://arxiv.org/abs/1802.06070v6 |
PDF | http://arxiv.org/pdf/1802.06070v6.pdf |
PWC | https://paperswithcode.com/paper/diversity-is-all-you-need-learning-skills |
Repo | https://github.com/navneet-nmk/Hierarchical-Meta-Reinforcement-Learning |
Framework | pytorch |
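A PyTorch sketch of DIAYN's central quantity as described in the abstract: a discriminator is trained to infer the skill from the state, and the skill-conditioned policy is rewarded with log q(z|s) − log p(z) instead of any environment reward. The rollout data and network sizes below are made up, and the max-entropy policy optimization (e.g. SAC) on top of this reward is not shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_skills, state_dim = 8, 4

# Discriminator q(z|s): tries to infer which skill produced a state.
discriminator = nn.Sequential(
    nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_skills))
opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

def diayn_reward(states, skills):
    """Skill-discovery pseudo-reward log q(z|s) - log p(z) with a uniform
    skill prior; the true environment reward is never used."""
    with torch.no_grad():
        log_q = F.log_softmax(discriminator(states), dim=-1)
        log_q_z = log_q.gather(1, skills.unsqueeze(1)).squeeze(1)
    log_p_z = -torch.log(torch.tensor(float(n_skills)))
    return log_q_z - log_p_z

# One illustrative update with made-up rollout data.
states = torch.randn(32, state_dim)
skills = torch.randint(0, n_skills, (32,))
rewards = diayn_reward(states, skills)
disc_loss = F.cross_entropy(discriminator(states), skills)
opt.zero_grad(); disc_loss.backward(); opt.step()
print(rewards.shape, float(disc_loss))
```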
Bidirectional Learning for Robust Neural Networks
Title | Bidirectional Learning for Robust Neural Networks |
Authors | Sidney Pontes-Filho, Marcus Liwicki |
Abstract | A multilayer perceptron can behave as a generative classifier by applying bidirectional learning (BL). It consists of training an undirected neural network to map input to output and vice-versa; therefore it can produce a classifier in one direction, and a generator in the opposite direction for the same data. The learning process of BL tries to reproduce the neuroplasticity stated in Hebbian theory using only backward propagation of errors. In this paper, two novel learning techniques are introduced which use BL for improving robustness to white-noise static and adversarial examples. The first method is bidirectional propagation of errors, in which error propagation occurs in both backward and forward directions. Motivated by the fact that its generative model receives as input a constant vector per class, we introduce as a second method the hybrid adversarial networks (HAN). Its generative model receives a random vector as input and its training is based on generative adversarial networks (GAN). To assess the performance of BL, we perform experiments using several architectures with fully connected and convolutional layers, with and without bias. Experimental results show that both methods improve robustness to white-noise static and adversarial examples, and even increase accuracy, but they behave differently depending on the architecture and task, so one method may be more beneficial than the other. Nevertheless, HAN using a convolutional architecture with batch normalization presents outstanding robustness, reaching state-of-the-art accuracy on adversarial examples of hand-written digits. |
Tasks | |
Published | 2018-05-21 |
URL | http://arxiv.org/abs/1805.08006v2 |
PDF | http://arxiv.org/pdf/1805.08006v2.pdf |
PWC | https://paperswithcode.com/paper/bidirectional-learning-for-robust-neural |
Repo | https://github.com/sidneyp/bidirectional |
Framework | tf |
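A loose PyTorch sketch of one reading of bidirectional learning: the same weight matrices are used forward for classification and transposed backward for generation, and both losses are minimized together. This is an assumption-laden simplification, not the authors' exact training procedure (it reproduces neither the bidirectional-propagation-of-errors variant in detail nor HAN).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Shared weights used in both directions: x -> logits (classification) and
# one-hot class vector -> reconstructed x (generation), the backward pass
# reusing the transposed matrices. Sizes assume flattened 28x28 inputs.
W1 = nn.Parameter(torch.randn(784, 256) * 0.01)
W2 = nn.Parameter(torch.randn(256, 10) * 0.01)
opt = torch.optim.Adam([W1, W2], lr=1e-3)

def forward_classify(x):             # (B, 784) -> (B, 10)
    return torch.relu(x @ W1) @ W2

def backward_generate(y_onehot):     # (B, 10) -> (B, 784)
    return torch.sigmoid(torch.relu(y_onehot @ W2.t()) @ W1.t())

x = torch.rand(32, 784)              # stand-in for images in [0, 1]
y = torch.randint(0, 10, (32,))
y_onehot = F.one_hot(y, 10).float()

# Joint objective: classification loss forward + reconstruction loss backward.
loss = F.cross_entropy(forward_classify(x), y) \
     + F.mse_loss(backward_generate(y_onehot), x)
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```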
Stochastic Answer Networks for SQuAD 2.0
Title | Stochastic Answer Networks for SQuAD 2.0 |
Authors | Xiaodong Liu, Wei Li, Yuwei Fang, Aerin Kim, Kevin Duh, Jianfeng Gao |
Abstract | This paper presents an extension of the Stochastic Answer Network (SAN), one of the state-of-the-art machine reading comprehension models, to be able to judge whether a question is unanswerable or not. The extended SAN contains two components: a span detector and a binary classifier for judging whether the question is unanswerable, and both components are jointly optimized. Experiments show that SAN achieves results competitive with the state of the art on the Stanford Question Answering Dataset (SQuAD) 2.0. To facilitate research in this field, we release our code: https://github.com/kevinduh/san_mrc. |
Tasks | Machine Reading Comprehension, Question Answering, Reading Comprehension |
Published | 2018-09-24 |
URL | http://arxiv.org/abs/1809.09194v1 |
PDF | http://arxiv.org/pdf/1809.09194v1.pdf |
PWC | https://paperswithcode.com/paper/stochastic-answer-networks-for-squad-20 |
Repo | https://github.com/kevinduh/san_mrc |
Framework | pytorch |
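A generic PyTorch sketch of the two jointly optimized components the abstract mentions, a span detector and a binary unanswerable classifier, sitting on top of a stand-in encoder output; it is not the SAN architecture itself, and the pooling, dimensions and loss weighting are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpanPlusAnswerable(nn.Module):
    """Span head (start/end over passage tokens) plus a binary head for
    "is this question unanswerable?", trained jointly."""
    def __init__(self, hidden=128):
        super().__init__()
        self.span_head = nn.Linear(hidden, 2)         # start and end logits
        self.answerable_head = nn.Linear(hidden, 1)   # from a pooled vector

    def forward(self, token_states):                  # (B, L, hidden)
        span_logits = self.span_head(token_states)    # (B, L, 2)
        start_logits, end_logits = span_logits.unbind(dim=-1)
        pooled = token_states.mean(dim=1)             # simple pooling stand-in
        unans_logit = self.answerable_head(pooled).squeeze(-1)
        return start_logits, end_logits, unans_logit

model = SpanPlusAnswerable()
states = torch.randn(4, 50, 128)                      # pretend encoder output
start, end, unans = model(states)
start_gold = torch.tensor([3, 7, 0, 12]); end_gold = torch.tensor([5, 9, 0, 14])
is_unanswerable = torch.tensor([0., 0., 1., 0.])

# Joint objective: span cross-entropy plus binary unanswerable loss.
loss = (F.cross_entropy(start, start_gold) + F.cross_entropy(end, end_gold)
        + F.binary_cross_entropy_with_logits(unans, is_unanswerable))
print(float(loss))
```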
Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models
Title | Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models |
Authors | Daniil Polykovskiy, Alexander Zhebrak, Benjamin Sanchez-Lengeling, Sergey Golovanov, Oktai Tatanov, Stanislav Belyaev, Rauf Kurbanov, Aleksey Artamonov, Vladimir Aladinskiy, Mark Veselov, Artur Kadurin, Simon Johansson, Hongming Chen, Sergey Nikolenko, Alan Aspuru-Guzik, Alex Zhavoronkov |
Abstract | Deep generative models such as generative adversarial networks, variational autoencoders, and autoregressive models are rapidly growing in popularity for the discovery of new molecules and materials. In this work, we introduce MOlecular SEtS (MOSES), a benchmarking platform to support research on machine learning for drug discovery. MOSES implements several popular molecular generation models and includes a set of metrics that evaluate the diversity and quality of generated molecules. MOSES is meant to standardize the research on molecular generation and facilitate the sharing and comparison of new models. Additionally, we provide a large-scale comparison of existing state of the art models and elaborate on current challenges for generative models that might prove fertile ground for new research. Our platform and source code are freely available at https://github.com/molecularsets/moses. |
Tasks | Drug Discovery |
Published | 2018-11-29 |
URL | https://arxiv.org/abs/1811.12823v3 |
PDF | https://arxiv.org/pdf/1811.12823v3.pdf |
PWC | https://paperswithcode.com/paper/molecular-sets-moses-a-benchmarking-platform |
Repo | https://github.com/aclyde11/RNNGenerator |
Framework | pytorch |
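A sketch of the kind of metrics such a platform standardizes, using RDKit to compute validity, uniqueness and novelty of generated SMILES strings; this is not the MOSES implementation, which covers many more metrics (e.g. distribution-level statistics).

```python
from rdkit import Chem

def basic_generation_metrics(generated_smiles, training_smiles):
    """Validity, uniqueness and novelty of generated SMILES, computed on
    canonical SMILES strings."""
    canonical = []
    for smi in generated_smiles:
        mol = Chem.MolFromSmiles(smi)         # None if the SMILES is invalid
        if mol is not None:
            canonical.append(Chem.MolToSmiles(mol))
    train_set = {Chem.MolToSmiles(Chem.MolFromSmiles(s)) for s in training_smiles}
    unique = set(canonical)
    return {
        "validity": len(canonical) / max(len(generated_smiles), 1),
        "uniqueness": len(unique) / max(len(canonical), 1),
        "novelty": len(unique - train_set) / max(len(unique), 1),
    }

# Toy generated set and toy "training set" for illustration only.
generated = ["CCO", "CCO", "c1ccccc1", "not_a_molecule", "CC(=O)O"]
training = ["CCO", "CCN"]
print(basic_generation_metrics(generated, training))
```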