October 20, 2019


Paper Group AWR 335

Learning a Neural-network-based Representation for Open Set Recognition. MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations. A Neural Temporal Model for Human Motion Prediction. Neural Program Synthesis with Priority Queue Training. Statistical Analysis on E-Commerce Reviews, with Sentiment Classification using Bidirect …

Learning a Neural-network-based Representation for Open Set Recognition


Title	Learning a Neural-network-based Representation for Open Set Recognition
Authors	Mehadi Hassen, Philip K. Chan
Abstract	Open set recognition problems exist in many domains. For example in security, new malware classes emerge regularly; therefore malware classification systems need to identify instances from unknown classes in addition to discriminating between known classes. In this paper we present a neural network based representation for addressing the open set recognition problem. In this representation instances from the same class are close to each other while instances from different classes are further apart, resulting in statistically significant improvement when compared to other approaches on three datasets from two different domains.
Tasks	Malware Classification, Open Set Learning
Published	2018-02-12
URL	http://arxiv.org/abs/1802.04365v1
PDF	http://arxiv.org/pdf/1802.04365v1.pdf
PWC	https://paperswithcode.com/paper/learning-a-neural-network-based
Repo	https://github.com/shrtCKT/opennet
Framework	tf
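
The abstract above describes a representation in which instances of the same class lie close together and instances of different classes lie further apart. One common way to use such a space for open set recognition is to compare a new embedding against per-class means and reject it as unknown when it is far from all of them. The sketch below illustrates that general idea only; the nearest-mean rule, the Euclidean distance, the `threshold` value, and the function names are assumptions for illustration, not the authors' method or the code in the linked repository.

```python
import math


def class_means(embeddings, labels):
    """Average the embedding vectors of each known class."""
    sums, counts = {}, {}
    for z, y in zip(embeddings, labels):
        acc = sums.setdefault(y, [0.0] * len(z))
        for i, v in enumerate(z):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_open_set(z, means, threshold):
    """Return the nearest known class, or "unknown" if every class mean is too far.

    `threshold` is a hypothetical rejection radius; in practice it would be
    tuned on held-out data.
    """
    best_label, best_dist = None, math.inf
    for y, m in means.items():
        d = math.dist(z, m)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_label, best_dist = y, d
    return best_label if best_dist <= threshold else "unknown"


# Toy 2-D "embeddings" for two known classes.
means = class_means(
    [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]],
    ["a", "a", "b", "b"],
)
near_a = predict_open_set([0.5, 1.0], means, threshold=2.0)
far_from_all = predict_open_set([5.0, 6.0], means, threshold=2.0)
```

The tighter the clusters produced by the learned representation, the smaller the rejection radius can be without misclassifying known instances as unknown.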

MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations


Title	MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations
Authors	Soujanya Poria, Devamanyu Hazarika, Navonil Majumder, Gautam Naik, Erik Cambria, Rada Mihalcea
Abstract	Emotion recognition in conversations is a challenging task that has recently gained popularity due to its potential applications. Until now, however, a large-scale multimodal multi-party emotional conversational database containing more than two speakers per dialogue was missing. Thus, we propose the Multimodal EmotionLines Dataset (MELD), an extension and enhancement of EmotionLines. MELD contains about 13,000 utterances from 1,433 dialogues from the TV-series Friends. Each utterance is annotated with emotion and sentiment labels, and encompasses audio, visual and textual modalities. We propose several strong multimodal baselines and show the importance of contextual and multimodal information for emotion recognition in conversations. The full dataset is available for use at http://affective-meld.github.io.
Tasks	Dialogue Generation, Emotion Recognition
Published	2018-10-05
URL	https://arxiv.org/abs/1810.02508v6
PDF	https://arxiv.org/pdf/1810.02508v6.pdf
PWC	https://paperswithcode.com/paper/meld-a-multimodal-multi-party-dataset-for
Repo	https://github.com/SenticNet/MELD
Framework	tf

A Neural Temporal Model for Human Motion Prediction


Title	A Neural Temporal Model for Human Motion Prediction
Authors	Anand Gopalakrishnan, Ankur Mali, Dan Kifer, C. Lee Giles, Alexander G. Ororbia
Abstract	We propose novel neural temporal models for predicting and synthesizing human motion, achieving state-of-the-art in modeling long-term motion trajectories while being competitive with prior work in short-term prediction and requiring significantly less computation. Key aspects of our proposed system include: 1) a novel, two-level processing architecture that aids in generating planned trajectories, 2) a simple set of easily computable features that integrate derivative information, and 3) a novel multi-objective loss function that helps the model to slowly progress from simple next-step prediction to the harder task of multi-step, closed-loop prediction. Our results demonstrate that these innovations improve the modeling of long-term motion trajectories. Finally, we propose a novel metric, called Normalized Power Spectrum Similarity (NPSS), to evaluate the long-term predictive ability of motion synthesis models, complementing the popular mean-squared error (MSE) measure of Euler joint angles over time. We conduct a user study to determine if the proposed NPSS correlates with human evaluation of long-term motion more strongly than MSE and find that it indeed does. We release code and additional results (visualizations) for this paper at: https://github.com/cr7anand/neural_temporal_models
Tasks	motion prediction
Published	2018-09-09
URL	https://arxiv.org/abs/1809.03036v5
PDF	https://arxiv.org/pdf/1809.03036v5.pdf
PWC	https://paperswithcode.com/paper/a-neural-temporal-model-for-human-motion
Repo	https://github.com/cr7anand/neural_temporal_models
Framework	tf

Neural Program Synthesis with Priority Queue Training


Title	Neural Program Synthesis with Priority Queue Training
Authors	Daniel A. Abolafia, Mohammad Norouzi, Jonathan Shen, Rui Zhao, Quoc V. Le
Abstract	Models and examples built with TensorFlow
Tasks	Program Synthesis
Published	2018-01-10
URL	http://arxiv.org/abs/1801.03526v2
PDF	http://arxiv.org/pdf/1801.03526v2.pdf
PWC	https://paperswithcode.com/paper/neural-program-synthesis-with-priority-queue
Repo	https://github.com/tensorflow/models/tree/master/research/brain_coder
Framework	tf

Statistical Analysis on E-Commerce Reviews, with Sentiment Classification using Bidirectional Recurrent Neural Network (RNN)


Title	Statistical Analysis on E-Commerce Reviews, with Sentiment Classification using Bidirectional Recurrent Neural Network (RNN)
Authors	Abien Fred Agarap, Paul Grafilon
Abstract	Understanding customer sentiments is of paramount importance in marketing strategies today. Not only will it give companies an insight as to how customers perceive their products and/or services, but it will also give them an idea on how to improve their offers. This paper attempts to understand the correlation of different variables in customer reviews on a women clothing e-commerce, and to classify each review whether it recommends the reviewed product or not and whether it consists of positive, negative, or neutral sentiment. To achieve these goals, we employed univariate and multivariate analyses on dataset features except for review titles and review texts, and we implemented a bidirectional recurrent neural network (RNN) with long-short term memory unit (LSTM) for recommendation and sentiment classification. Results have shown that a recommendation is a strong indicator of a positive sentiment score, and vice-versa. On the other hand, ratings in product reviews are fuzzy indicators of sentiment scores. We also found out that the bidirectional LSTM was able to reach an F1-score of 0.88 for recommendation classification, and 0.93 for sentiment classification.
Tasks	Sentiment Analysis
Published	2018-05-08
URL	http://arxiv.org/abs/1805.03687v1
PDF	http://arxiv.org/pdf/1805.03687v1.pdf
PWC	https://paperswithcode.com/paper/statistical-analysis-on-e-commerce-reviews
Repo	https://github.com/arjit3004/Recommended-System
Framework	none

VulDeePecker: A Deep Learning-Based System for Vulnerability Detection


Title	VulDeePecker: A Deep Learning-Based System for Vulnerability Detection
Authors	Zhen Li, Deqing Zou, Shouhuai Xu, Xinyu Ou, Hai Jin, Sujuan Wang, Zhijun Deng, Yuyi Zhong
Abstract	The automatic detection of software vulnerabilities is an important research problem. However, existing solutions to this problem rely on human experts to define features and often miss many vulnerabilities (i.e., incurring high false negative rate). In this paper, we initiate the study of using deep learning-based vulnerability detection to relieve human experts from the tedious and subjective task of manually defining features. Since deep learning is motivated to deal with problems that are very different from the problem of vulnerability detection, we need some guiding principles for applying deep learning to vulnerability detection. In particular, we need to find representations of software programs that are suitable for deep learning. For this purpose, we propose using code gadgets to represent programs and then transform them into vectors, where a code gadget is a number of (not necessarily consecutive) lines of code that are semantically related to each other. This leads to the design and implementation of a deep learning-based vulnerability detection system, called Vulnerability Deep Pecker (VulDeePecker). In order to evaluate VulDeePecker, we present the first vulnerability dataset for deep learning approaches. Experimental results show that VulDeePecker can achieve much fewer false negatives (with reasonable false positives) than other approaches. We further apply VulDeePecker to 3 software products (namely Xen, Seamonkey, and Libav) and detect 4 vulnerabilities, which are not reported in the National Vulnerability Database but were “silently” patched by the vendors when releasing later versions of these products; in contrast, these vulnerabilities are almost entirely missed by the other vulnerability detection systems we experimented with.
Tasks	Vulnerability Detection
Published	2018-01-05
URL	http://arxiv.org/abs/1801.01681v1
PDF	http://arxiv.org/pdf/1801.01681v1.pdf
PWC	https://paperswithcode.com/paper/vuldeepecker-a-deep-learning-based-system-for
Repo	https://github.com/dascimal-org/MDSeqVAE/blob/master/VulDeePeck.py
Framework	tf

Deep Multi-Center Learning for Face Alignment


Title	Deep Multi-Center Learning for Face Alignment
Authors	Zhiwen Shao, Hengliang Zhu, Xin Tan, Yangyang Hao, Lizhuang Ma
Abstract	Facial landmarks are highly correlated with each other since a certain landmark can be estimated by its neighboring landmarks. Most of the existing deep learning methods only use one fully-connected layer called shape prediction layer to estimate the locations of facial landmarks. In this paper, we propose a novel deep learning framework named Multi-Center Learning with multiple shape prediction layers for face alignment. In particular, each shape prediction layer emphasizes on the detection of a certain cluster of semantically relevant landmarks respectively. Challenging landmarks are focused firstly, and each cluster of landmarks is further optimized respectively. Moreover, to reduce the model complexity, we propose a model assembling method to integrate multiple shape prediction layers into one shape prediction layer. Extensive experiments demonstrate that our method is effective for handling complex occlusions and appearance variations with real-time performance. The code for our method is available at https://github.com/ZhiwenShao/MCNet-Extension.
Tasks	Face Alignment
Published	2018-08-05
URL	http://arxiv.org/abs/1808.01558v2
PDF	http://arxiv.org/pdf/1808.01558v2.pdf
PWC	https://paperswithcode.com/paper/deep-multi-center-learning-for-face-alignment
Repo	https://github.com/ZhiwenShao/MCNet-Extension
Framework	none

Approximate Eigenvalue Decompositions of Linear Transformations with a Few Householder Reflectors


Title	Approximate Eigenvalue Decompositions of Linear Transformations with a Few Householder Reflectors
Authors	Cristian Rusu
Abstract	The ability to decompose a signal in an orthonormal basis (a set of orthogonal components, each normalized to have unit length) using a fast numerical procedure rests at the heart of many signal processing methods and applications. The classic examples are the Fourier and wavelet transforms that enjoy numerically efficient implementations (FFT and FWT, respectively). Unfortunately, orthonormal transformations are in general unstructured, and therefore they do not enjoy low computational complexity properties. In this paper, based on Householder reflectors, we introduce a class of orthonormal matrices that are numerically efficient to manipulate: we control the complexity of matrix-vector multiplications with these matrices using a given parameter. We provide numerical algorithms that approximate any orthonormal or symmetric transform with a new orthonormal or symmetric structure made up of products of a given number of Householder reflectors. We show analyses and numerical evidence to highlight the accuracy of the proposed approximations and provide an application to the case of learning fast Mahalanobis distance metric transformations.
Tasks
Published	2018-11-19
URL	https://arxiv.org/abs/1811.07624v2
PDF	https://arxiv.org/pdf/1811.07624v2.pdf
PWC	https://paperswithcode.com/paper/approximate-eigenvalue-decompositions-of
Repo	https://github.com/cristian-rusu-research/approximate-householder-decomposition
Framework	none
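
To make the complexity claim in the abstract above concrete: a Householder reflector H = I - 2uu^T/(u^T u) can be applied to a length-n vector in O(n) operations without ever forming the n-by-n matrix, so a product of h reflectors costs O(nh) instead of the O(n^2) of a dense orthonormal matrix. The sketch below shows only this standard matrix-free application; the function names are illustrative, and it does not implement the paper's algorithm for choosing the reflectors.

```python
import math


def reflect(u, x):
    """Apply H = I - 2 u u^T / (u^T u) to x in O(n), without forming H."""
    uu = sum(a * a for a in u)
    scale = 2.0 * sum(a * b for a, b in zip(u, x)) / uu
    return [b - scale * a for a, b in zip(u, x)]


def apply_reflectors(us, x):
    """Compute H_1 H_2 ... H_h x; the rightmost reflector acts first."""
    for u in reversed(us):
        x = reflect(u, x)
    return x


def norm(x):
    return math.sqrt(sum(v * v for v in x))


x = [3.0, 4.0]
y = apply_reflectors([[1.0, 1.0], [1.0, -2.0]], x)
# Reflectors are orthonormal, so the vector's length is preserved
# up to floating-point rounding.
length_change = abs(norm(y) - norm(x))
```

The number of reflectors h plays the role of the "given parameter" mentioned in the abstract: fewer reflectors mean cheaper multiplication but a coarser approximation of the target transform.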

Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation


Title	Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation
Authors	Jiaxuan You, Bowen Liu, Rex Ying, Vijay Pande, Jure Leskovec
Abstract	Generating novel graph structures that optimize given objectives while obeying some given underlying rules is fundamental for chemistry, biology and social science research. This is especially important in the task of molecular graph generation, whose goal is to discover novel molecules with desired properties such as drug-likeness and synthetic accessibility, while obeying physical laws such as chemical valency. However, designing models to find molecules that optimize desired properties while incorporating highly complex and non-differentiable rules remains to be a challenging task. Here we propose Graph Convolutional Policy Network (GCPN), a general graph convolutional network based model for goal-directed graph generation through reinforcement learning. The model is trained to optimize domain-specific rewards and adversarial loss through policy gradient, and acts in an environment that incorporates domain-specific rules. Experimental results show that GCPN can achieve 61% improvement on chemical property optimization over state-of-the-art baselines while resembling known molecules, and achieve 184% improvement on the constrained property optimization task.
Tasks	Graph Generation
Published	2018-06-07
URL	http://arxiv.org/abs/1806.02473v3
PDF	http://arxiv.org/pdf/1806.02473v3.pdf
PWC	https://paperswithcode.com/paper/graph-convolutional-policy-network-for-goal
Repo	https://github.com/LeeJunHyun/The-Databases-for-Drug-Discovery
Framework	tf

Bounding Box Regression with Uncertainty for Accurate Object Detection


Title	Bounding Box Regression with Uncertainty for Accurate Object Detection
Authors	Yihui He, Chenchen Zhu, Jianren Wang, Marios Savvides, Xiangyu Zhang
Abstract	Large-scale object detection datasets (e.g., MS-COCO) try to define the ground truth bounding boxes as clear as possible. However, we observe that ambiguities are still introduced when labeling the bounding boxes. In this paper, we propose a novel bounding box regression loss for learning bounding box transformation and localization variance together. Our loss greatly improves the localization accuracies of various architectures with nearly no additional computation. The learned localization variance allows us to merge neighboring bounding boxes during non-maximum suppression (NMS), which further improves the localization performance. On MS-COCO, we boost the Average Precision (AP) of VGG-16 Faster R-CNN from 23.6% to 29.1%. More importantly, for ResNet-50-FPN Mask R-CNN, our method improves the AP and AP90 by 1.8% and 6.2% respectively, which significantly outperforms previous state-of-the-art bounding box refinement methods. Our code and models are available at: github.com/yihui-he/KL-Loss
Tasks	Object Detection, Object Localization
Published	2018-09-23
URL	http://arxiv.org/abs/1809.08545v3
PDF	http://arxiv.org/pdf/1809.08545v3.pdf
PWC	https://paperswithcode.com/paper/softer-nms-rethinking-bounding-box-regression
Repo	https://github.com/yihui-he/softer-NMS
Framework	none

Monge-Ampère Flow for Generative Modeling


Title	Monge-Ampère Flow for Generative Modeling
Authors	Linfeng Zhang, Weinan E, Lei Wang
Abstract	We present a deep generative model, named Monge-Ampère flow, which builds on continuous-time gradient flow arising from the Monge-Ampère equation in optimal transport theory. The generative map from the latent space to the data space follows a dynamical system, where a learnable potential function guides a compressible fluid to flow towards the target density distribution. Training of the model amounts to solving an optimal control problem. The Monge-Ampère flow has tractable likelihoods and supports efficient sampling and inference. One can easily impose symmetry constraints in the generative model by designing suitable scalar potential functions. We apply the approach to unsupervised density estimation of the MNIST dataset and variational calculation of the two-dimensional Ising model at the critical point. This approach brings insights and techniques from Monge-Ampère equation, optimal transport, and fluid dynamics into reversible flow-based generative models.
Tasks	Density Estimation
Published	2018-09-26
URL	http://arxiv.org/abs/1809.10188v1
PDF	http://arxiv.org/pdf/1809.10188v1.pdf
PWC	https://paperswithcode.com/paper/monge-ampere-flow-for-generative-modeling
Repo	https://github.com/wangleiphy/MongeAmpereFlow
Framework	pytorch

An Automated System for Epilepsy Detection using EEG Brain Signals based on Deep Learning Approach


Title	An Automated System for Epilepsy Detection using EEG Brain Signals based on Deep Learning Approach
Authors	Ihsan Ullah, Muhammad Hussain, Emad-ul-Haq Qazi, Hatim Aboalsamh
Abstract	Epilepsy is a neurological disorder and for its detection, electroencephalography (EEG) is a commonly used clinical approach. Manual inspection of EEG brain signals is a time-consuming and laborious process, which puts heavy burden on neurologists and affects their performance. Several automatic techniques have been proposed using traditional approaches to assist neurologists in detecting binary epilepsy scenarios, e.g. seizure vs. non-seizure or normal vs. ictal. These methods do not perform well when classifying the ternary case, e.g. ictal vs. normal vs. inter-ictal; the maximum accuracy for this case by the state-of-the-art methods is 97±1%. To overcome this problem, we propose a system based on deep learning, which is an ensemble of pyramidal one-dimensional convolutional neural network (P-1D-CNN) models. In a CNN model, the bottleneck is the large number of learnable parameters. P-1D-CNN works on the concept of refinement approach and it results in 60% fewer parameters compared to traditional CNN models. Further to overcome the limitations of small amount of data, we proposed augmentation schemes for learning P-1D-CNN model. In almost all the cases concerning epilepsy detection, the proposed system gives an accuracy of 99.1±0.9% on the University of Bonn dataset.
Tasks	EEG
Published	2018-01-16
URL	http://arxiv.org/abs/1801.05412v1
PDF	http://arxiv.org/pdf/1801.05412v1.pdf
PWC	https://paperswithcode.com/paper/an-automated-system-for-epilepsy-detection
Repo	https://github.com/majorash/eeg_epilepsy_conv1d
Framework	none

Devil in the Details: Towards Accurate Single and Multiple Human Parsing


Title	Devil in the Details: Towards Accurate Single and Multiple Human Parsing
Authors	Tao Ruan, Ting Liu, Zilong Huang, Yunchao Wei, Shikui Wei, Yao Zhao, Thomas Huang
Abstract	Human parsing has received considerable interest due to its wide application potentials. Nevertheless, it is still unclear how to develop an accurate human parsing system in an efficient and elegant way. In this paper, we identify several useful properties, including feature resolution, global context information and edge details, and perform rigorous analyses to reveal how to leverage them to benefit the human parsing task. The advantages of these useful properties finally result in a simple yet effective Context Embedding with Edge Perceiving (CE2P) framework for single human parsing. Our CE2P is end-to-end trainable and can be easily adopted for conducting multiple human parsing. Benefiting from the superiority of CE2P, we achieved the 1st places on all three human parsing benchmarks. Without any bells and whistles, we achieved 56.50% (mIoU), 45.31% (mean $AP^r$) and 33.34% ($AP^p_{0.5}$) in LIP, CIHP and MHP v2.0, which outperform the state-of-the-arts more than 2.06%, 3.81% and 1.87%, respectively. We hope our CE2P will serve as a solid baseline and help ease future research in single/multiple human parsing. Code has been made available at https://github.com/liutinglt/CE2P.
Tasks	Human Parsing, Semantic Segmentation
Published	2018-09-17
URL	http://arxiv.org/abs/1809.05996v3
PDF	http://arxiv.org/pdf/1809.05996v3.pdf
PWC	https://paperswithcode.com/paper/devil-in-the-details-towards-accurate-single
Repo	https://github.com/liutinglt/CE2P
Framework	pytorch

Interpretable Convolutional Filters with SincNet


Title	Interpretable Convolutional Filters with SincNet
Authors	Mirco Ravanelli, Yoshua Bengio
Abstract	Deep learning is currently playing a crucial role toward higher levels of artificial intelligence. This paradigm allows neural networks to learn complex and abstract representations, that are progressively obtained by combining simpler ones. Nevertheless, the internal “black-box” representations automatically discovered by current neural architectures often suffer from a lack of interpretability, making of primary interest the study of explainable machine learning techniques. This paper summarizes our recent efforts to develop a more interpretable neural model for directly processing speech from the raw waveform. In particular, we propose SincNet, a novel Convolutional Neural Network (CNN) that encourages the first layer to discover more meaningful filters by exploiting parametrized sinc functions. In contrast to standard CNNs, which learn all the elements of each filter, only low and high cutoff frequencies of band-pass filters are directly learned from data. This inductive bias offers a very compact way to derive a customized filter-bank front-end, that only depends on some parameters with a clear physical meaning. Our experiments, conducted on both speaker and speech recognition, show that the proposed architecture converges faster, performs better, and is more interpretable than standard CNNs.
Tasks	Distant Speech Recognition, Speech Recognition
Published	2018-11-23
URL	https://arxiv.org/abs/1811.09725v2
PDF	https://arxiv.org/pdf/1811.09725v2.pdf
PWC	https://paperswithcode.com/paper/interpretable-convolutional-filters-with
Repo	https://github.com/mravanelli/pytorch-kaldi
Framework	pytorch
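
The abstract above says that each first-layer filter is a band-pass filter defined only by its low and high cutoff frequencies. A minimal sketch of how such a kernel can be built from two numbers follows: the ideal band-pass response is written as the difference of two sinc low-pass filters and then tapered with a Hamming window. The function names, the normalized-frequency convention (cycles per sample, between 0 and 0.5), and the fixed example cutoffs are assumptions for illustration; in SincNet the cutoffs are trainable parameters, which this plain-Python sketch does not model.

```python
import math


def sinc(x):
    """Unnormalized sinc: sin(x) / x, with the limit value 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def bandpass_kernel(f_low, f_high, length):
    """FIR taps of a band-pass filter given two cutoffs (cycles per sample).

    `length` should be odd so the kernel has a single center tap.
    """
    half = (length - 1) / 2
    taps = []
    for i in range(length):
        n = i - half
        # Difference of two ideal low-pass responses gives a band-pass.
        ideal = (2 * f_high * sinc(2 * math.pi * f_high * n)
                 - 2 * f_low * sinc(2 * math.pi * f_low * n))
        # Hamming window to reduce ripple from truncating the sinc.
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))
        taps.append(ideal * window)
    return taps


def gain(taps, freq):
    """Magnitude of the filter's frequency response at `freq`."""
    re = sum(t * math.cos(2 * math.pi * freq * i) for i, t in enumerate(taps))
    im = sum(t * math.sin(2 * math.pi * freq * i) for i, t in enumerate(taps))
    return math.hypot(re, im)


taps = bandpass_kernel(f_low=0.1, f_high=0.2, length=101)
in_band = gain(taps, 0.15)      # inside the pass band
out_of_band = gain(taps, 0.40)  # well outside the pass band
```

All 101 taps here are determined by just two parameters, which is the compactness the abstract refers to when contrasting SincNet with standard CNN filters whose every element is learned.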

MixUp as Locally Linear Out-Of-Manifold Regularization


Title	MixUp as Locally Linear Out-Of-Manifold Regularization
Authors	Hongyu Guo, Yongyi Mao, Richong Zhang
Abstract	MixUp is a recently proposed data-augmentation scheme, which linearly interpolates a random pair of training examples and correspondingly the one-hot representations of their labels. Training deep neural networks with such additional data is shown capable of significantly improving the predictive accuracy of the current art. The power of MixUp, however, is primarily established empirically and its working and effectiveness have not been explained in any depth. In this paper, we develop an understanding for MixUp as a form of “out-of-manifold regularization”, which imposes certain “local linearity” constraints on the model’s input space beyond the data manifold. This analysis enables us to identify a limitation of MixUp, which we call “manifold intrusion”. In a nutshell, manifold intrusion in MixUp is a form of under-fitting resulting from conflicts between the synthetic labels of the mixed-up examples and the labels of original training data. Such a phenomenon usually happens when the parameters controlling the generation of mixing policies are not sufficiently fine-tuned on the training data. To address this issue, we propose a novel adaptive version of MixUp, where the mixing policies are automatically learned from the data using an additional network and objective function designed to avoid manifold intrusion. The proposed regularizer, AdaMixUp, is empirically evaluated on several benchmark datasets. Extensive experiments demonstrate that AdaMixUp improves upon MixUp when applied to the current art of deep classification models.
Tasks	Data Augmentation
Published	2018-09-07
URL	http://arxiv.org/abs/1809.02499v3
PDF	http://arxiv.org/pdf/1809.02499v3.pdf
PWC	https://paperswithcode.com/paper/mixup-as-locally-linear-out-of-manifold
Repo	https://github.com/SITE5039/AdaMixUp
Framework	tf
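
The baseline MixUp operation described in the abstract above, linear interpolation of a pair of training examples and of their one-hot labels, can be sketched in a few lines. The function names are illustrative. Drawing the mixing weight from a Beta(alpha, alpha) distribution follows the original MixUp formulation and is not stated in this abstract; the sketch does not implement AdaMixUp's learned mixing policies.

```python
import random


def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot list of floats."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def mixup_pair(x1, y1, x2, y2, lam):
    """Interpolate two inputs and their one-hot labels with weight lam."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y


def sample_lambda(alpha, rng):
    """Mixing weight in [0, 1], drawn from Beta(alpha, alpha)."""
    return rng.betavariate(alpha, alpha)


rng = random.Random(0)
lam = sample_lambda(alpha=1.0, rng=rng)

# Fixed weight for a reproducible toy example with 3 classes.
mixed_x, mixed_y = mixup_pair(
    [0.0, 2.0], one_hot(0, 3),
    [4.0, 6.0], one_hot(2, 3),
    lam=0.25,
)
```

The mixed label is a soft label whose entries still sum to one. Manifold intrusion, as the abstract describes it, arises when such a mixed input collides with a real training example whose label disagrees with this soft label.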