Paper Group ANR 343
High Frequency Component Helps Explain the Generalization of Convolutional Neural Networks. Hierarchic Neighbors Embedding. Modeling the Biological Pathology Continuum with HSIC-regularized Wasserstein Auto-encoders. Incremental Learning from Scratch for Task-Oriented Dialogue Systems. Scalable and Order-robust Continual Learning with Additive Parameter Decomposition. …
High Frequency Component Helps Explain the Generalization of Convolutional Neural Networks
Title | High Frequency Component Helps Explain the Generalization of Convolutional Neural Networks |
Authors | Haohan Wang, Xindi Wu, Zeyi Huang, Eric P. Xing |
Abstract | We investigate the relationship between the frequency spectrum of image data and the generalization behavior of convolutional neural networks (CNNs). We first notice CNNs' ability to capture the high-frequency components of images, components that are almost imperceptible to a human. This observation leads to multiple hypotheses related to the generalization behavior of CNNs, including a potential explanation for adversarial examples, a discussion of CNNs' trade-off between robustness and accuracy, and some evidence in understanding training heuristics. |
Tasks | Adversarial Attack |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.13545v3 |
https://arxiv.org/pdf/1905.13545v3.pdf | |
PWC | https://paperswithcode.com/paper/190513545 |
Repo | |
Framework | |
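The frequency decomposition the abstract relies on can be reproduced with a plain FFT. Below is a minimal sketch (not the authors' code; the cutoff radius of 12 and the hard radial mask are assumptions) that splits an image into the low- and high-frequency components whose roles the paper studies:

```python
import numpy as np

def frequency_split(image, radius=12):
    """Split a grayscale image into low- and high-frequency parts
    using a hard radial mask in (shifted) Fourier space."""
    f = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    yy, xx = np.ogrid[:h, :w]
    low_mask = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2) <= radius
    low = np.fft.ifft2(np.fft.ifftshift(f * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(f * ~low_mask)).real
    return low, high

img = np.random.rand(32, 32)          # stand-in for a real image
low, high = frequency_split(img)
assert np.allclose(low + high, img)   # the two parts sum back to the image
```

The high-frequency part is nearly invisible to a human, which is exactly what makes the paper's observation that CNN predictions can depend on it interesting.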
Hierarchic Neighbors Embedding
Title | Hierarchic Neighbors Embedding |
Authors | Shenglan Liu, Yang Yu, Yang Liu, Hong Qiao, Lin Feng, Jiashi Feng |
Abstract | Manifold learning now plays a very important role in machine learning and many relevant applications. Despite its superior performance in dealing with nonlinear data distributions, data sparsity remains a thorny knot, and few studies in manifold learning handle it well. In this paper, we propose Hierarchic Neighbors Embedding (HNE), which enhances local connections through a hierarchic combination of neighbors. After further analyzing topological connection and reconstruction performance, we present three different versions of HNE. The experimental results show that our methods work well on both synthetic data and high-dimensional real-world tasks. HNE shows outstanding advantages in dealing with general data. Furthermore, compared with other popular manifold learning methods, HNE performs better on sparse samples and weakly connected manifolds. |
Tasks | |
Published | 2019-09-16 |
URL | https://arxiv.org/abs/1909.07142v1 |
https://arxiv.org/pdf/1909.07142v1.pdf | |
PWC | https://paperswithcode.com/paper/hierarchic-neighbors-embedding |
Repo | |
Framework | |
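The abstract gives few algorithmic details, but the core idea of enlarging each point's neighborhood hierarchically can be sketched. The two-level construction below (each point's neighbors plus the neighbors of those neighbors) is a hypothetical reading of "hierarchic combination of neighbors", not the paper's exact procedure:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def hierarchic_neighbors(X, k=5):
    """Two-level neighborhoods: direct k-NN plus neighbors-of-neighbors,
    which strengthens local connectivity on sparse samples."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)        # idx[:, 0] is the point itself
    first = idx[:, 1:]
    hier = []
    for i in range(len(X)):
        second = np.unique(first[first[i]].ravel())
        hier.append(np.setdiff1d(np.union1d(first[i], second), [i]))
    return hier

X = np.random.rand(100, 3)
print(len(hierarchic_neighbors(X)[0]))  # enlarged neighbor set for point 0
```

An LLE-style reconstruction over this enlarged set would then give one plausible embedding variant; the paper's three versions presumably differ in how the levels are combined.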
Modeling the Biological Pathology Continuum with HSIC-regularized Wasserstein Auto-encoders
Title | Modeling the Biological Pathology Continuum with HSIC-regularized Wasserstein Auto-encoders |
Authors | Denny Wu, Hirofumi Kobayashi, Charles Ding, Lei Cheng, Keisuke Goda, Marzyeh Ghassemi |
Abstract | A crucial challenge in image-based modeling of biomedical data is to identify trends and features that separate normality and pathology. In many cases, the morphology of the imaged object exhibits continuous change as it deviates from normality, and thus a generative model can be trained to model this morphological continuum. Moreover, given side information that correlates with a certain trend in morphological change, a latent variable model can be regularized such that its latent representation reflects this side information. In this work, we use the Wasserstein Auto-encoder to model this pathology continuum, and apply the Hilbert-Schmidt Independence Criterion (HSIC) to enforce dependency between certain latent features and the provided side information. We experimentally show that the model can provide disentangled and interpretable latent representations and also generate a continuum of morphological changes that corresponds to changes in the side information. |
Tasks | |
Published | 2019-01-20 |
URL | http://arxiv.org/abs/1901.06618v1 |
http://arxiv.org/pdf/1901.06618v1.pdf | |
PWC | https://paperswithcode.com/paper/modeling-the-biological-pathology-continuum |
Repo | |
Framework | |
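The HSIC regularizer mentioned in the abstract has a simple biased empirical estimator, tr(KHLH)/(n-1)^2 with centered kernel matrices. A minimal sketch (RBF kernels and the bandwidth of 1.0 are assumptions; the paper's exact estimator and weighting may differ):

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-d2 / (2 * sigma**2))

def hsic(X, Y, sigma=1.0):
    """Biased empirical HSIC between paired samples X and Y (rows are
    samples). Used as a penalty that encourages dependence between
    chosen latent dimensions and the side information."""
    n = X.shape[0]
    K, L = rbf_kernel(X, sigma), rbf_kernel(Y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

z_subset = np.random.randn(64, 2)   # latent dims tied to side information
side = np.random.randn(64, 1)       # e.g. a severity score
print(hsic(z_subset, side))
```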
Incremental Learning from Scratch for Task-Oriented Dialogue Systems
Title | Incremental Learning from Scratch for Task-Oriented Dialogue Systems |
Authors | Weikang Wang, Jiajun Zhang, Qian Li, Mei-Yuh Hwang, Chengqing Zong, Zhifei Li |
Abstract | Clarifying user needs is essential for existing task-oriented dialogue systems. However, in real-world applications, developers can never guarantee that all possible user demands are taken into account in the design phase. Consequently, existing systems will break down when encountering unconsidered user needs. To address this problem, we propose a novel incremental learning framework for designing task-oriented dialogue systems, called the Incremental Dialogue System (IDS) for short, without pre-defining an exhaustive list of user needs. Specifically, we introduce an uncertainty estimation module to evaluate the confidence of giving correct responses. When confidence is high, IDS provides responses to users directly. Otherwise, humans are brought into the dialogue process, and IDS learns from the human intervention through an online learning module. To evaluate our method, we propose a new dataset which simulates unanticipated user needs in the deployment stage. Experiments show that IDS is robust to unconsidered user actions and can update itself online by smartly selecting only the most effective training data, and hence attains better performance with less annotation cost. |
Tasks | Task-Oriented Dialogue Systems |
Published | 2019-06-12 |
URL | https://arxiv.org/abs/1906.04991v1 |
https://arxiv.org/pdf/1906.04991v1.pdf | |
PWC | https://paperswithcode.com/paper/incremental-learning-from-scratch-for-task |
Repo | |
Framework | |
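The uncertainty gate described in the abstract amounts to a threshold test with a human fallback. A toy sketch, where `ToyModel`, `ask_human`, the buffer, and the threshold of 0.9 are all illustrative stand-ins rather than the paper's components:

```python
import random

online_buffer = []  # (state, response) pairs harvested from human takeovers

def ask_human(state):
    """Stand-in for routing the turn to a human agent."""
    return f"[human reply to: {state}]"

class ToyModel:
    def predict(self, state):
        # a real IDS would estimate uncertainty more carefully
        return "some response", random.random()

def respond(state, model, threshold=0.9):
    """Confidence gate (a sketch, not the paper's exact module):
    answer automatically when confident, otherwise defer to a human
    and keep the pair as online training data."""
    response, confidence = model.predict(state)
    if confidence >= threshold:
        return response
    human_response = ask_human(state)
    online_buffer.append((state, human_response))
    return human_response

print(respond("book a table for two", ToyModel()))
```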
Scalable and Order-robust Continual Learning with Additive Parameter Decomposition
Title | Scalable and Order-robust Continual Learning with Additive Parameter Decomposition |
Authors | Jaehong Yoon, Saehoon Kim, Eunho Yang, Sung Ju Hwang |
Abstract | While recent continual learning methods largely alleviate the catastrophic forgetting problem on toy-sized datasets, some issues remain to be tackled before they can be applied to real-world problem domains. First, a continual learning model should effectively handle catastrophic forgetting and be efficient to train even with a large number of tasks. Second, it needs to tackle the problem of order-sensitivity, where task performance varies largely with the order of the task arrival sequence; this may cause serious problems in applications where fairness plays a critical role (e.g. medical diagnosis). To tackle these practical challenges, we propose a novel continual learning method that is scalable as well as order-robust: instead of learning a completely shared set of weights, it represents the parameters for each task as a sum of task-shared and sparse task-adaptive parameters. With our Additive Parameter Decomposition (APD), the task-adaptive parameters for earlier tasks remain mostly unaffected, and we update them only to reflect the changes made to the task-shared parameters. This decomposition of parameters effectively prevents catastrophic forgetting and order-sensitivity, while being computation- and memory-efficient. Further, we can achieve even better scalability with APD using hierarchical knowledge consolidation, which clusters the task-adaptive parameters to obtain hierarchically shared parameters. We validate our network with APD, APD-Net, on multiple benchmark datasets against state-of-the-art continual learning methods, which it largely outperforms in accuracy, scalability, and order-robustness. |
Tasks | Continual Learning, Medical Diagnosis |
Published | 2019-02-25 |
URL | https://arxiv.org/abs/1902.09432v3 |
https://arxiv.org/pdf/1902.09432v3.pdf | |
PWC | https://paperswithcode.com/paper/oracle-order-robust-adaptive-continual |
Repo | |
Framework | |
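The additive decomposition itself is easy to picture: the effective weight for task t is a shared matrix plus a sparse per-task delta kept small by an L1 penalty. A PyTorch sketch of that idea (layer sizes, initialization, and the penalty weight are assumptions; APD-Net adds mechanisms such as hierarchical knowledge consolidation that are not shown here):

```python
import torch
import torch.nn as nn

class APDLinear(nn.Module):
    """Linear layer whose effective weight for task t is
    shared + task_adaptive[t], with an L1 penalty keeping the
    task-adaptive part sparse."""
    def __init__(self, d_in, d_out, n_tasks):
        super().__init__()
        self.shared = nn.Parameter(torch.randn(d_out, d_in) * 0.01)
        self.task_adaptive = nn.ParameterList(
            [nn.Parameter(torch.zeros(d_out, d_in)) for _ in range(n_tasks)]
        )

    def forward(self, x, task_id):
        weight = self.shared + self.task_adaptive[task_id]
        return x @ weight.T

    def sparsity_penalty(self, task_id, lam=1e-4):
        return lam * self.task_adaptive[task_id].abs().sum()

layer = APDLinear(16, 8, n_tasks=3)
y = layer(torch.randn(4, 16), task_id=1)
loss = y.pow(2).mean() + layer.sparsity_penalty(1)
```

Because earlier tasks' deltas are stored separately, a new task's training touches them only through the shared component, which is what limits forgetting and order-sensitivity.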
Toward Dimensional Emotion Detection from Categorical Emotion Annotations
Title | Toward Dimensional Emotion Detection from Categorical Emotion Annotations |
Authors | Sungjoon Park, Jiseon Kim, Jaeyeol Jeon, Heeyoung Park, Alice Oh |
Abstract | We propose a framework which makes a model predict fine-grained dimensional emotions (valence-arousal-dominance, VAD) while trained on a corpus annotated with coarse-grained categorical emotions. We train the model by minimizing the EMD (Earth Mover's Distance) between the predicted VAD score distribution and categorical emotion distributions sorted in terms of VAD, used as a proxy for the target VAD score distributions. With our model, we can simultaneously classify a given sentence into categorical emotions and predict VAD scores. We use pre-trained BERT-Large, fine-tune it on the SemEval dataset (11 categorical emotions), and evaluate on EmoBank (VAD dimensional emotions), showing that our approach reaches performance comparable to that of state-of-the-art classifiers on the categorical emotion classification task and attains significant positive correlations with ground-truth VAD scores. Also, if one continues training our model with supervision from VAD labels, it outperforms state-of-the-art VAD regression models. We further present examples showing that our model can annotate emotional words suitable for a given text even when those words are not seen as categorical labels during training. |
Tasks | Emotion Classification |
Published | 2019-11-06 |
URL | https://arxiv.org/abs/1911.02499v1 |
https://arxiv.org/pdf/1911.02499v1.pdf | |
PWC | https://paperswithcode.com/paper/toward-dimensional-emotion-detection-from |
Repo | |
Framework | |
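On an ordered support, which is what sorting the categories by a VAD dimension provides, the EMD between two discrete distributions reduces to the L1 distance between their CDFs, which is cheap and differentiable. A sketch (the four-category example is made up):

```python
import torch

def emd_1d(p, q):
    """Earth Mover's Distance between two distributions over the same
    sorted support (here, emotion categories ordered by a VAD
    dimension): the L1 distance between their CDFs."""
    return (torch.cumsum(p, dim=-1) - torch.cumsum(q, dim=-1)).abs().sum(dim=-1)

# Hypothetical: 4 emotion categories sorted by valence.
pred   = torch.tensor([0.1, 0.2, 0.3, 0.4])
target = torch.tensor([0.0, 0.1, 0.4, 0.5])
print(emd_1d(pred, target))   # differentiable, so usable as a training loss
```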
Active Multi-Label Crowd Consensus
Title | Active Multi-Label Crowd Consensus |
Authors | Jinzheng Tu, Guoxian Yu, Carlotta Domeniconi, Jun Wang, Xiangliang Zhang |
Abstract | Crowdsourcing is an economical and efficient strategy for collecting annotations of data through an online platform. Crowd workers with different expertise are paid for their service, and the task requester usually has a limited budget. How to collect reliable annotations for multi-label data and how to compute the consensus within budget is an interesting and challenging, but rarely studied, problem. In this paper, we propose a novel approach to accomplish Active Multi-label Crowd Consensus (AMCC). AMCC accounts for the commonality and individuality of workers, and assumes that workers can be organized into different groups, each including a set of workers who share a similar annotation behavior and label correlations. To achieve an effective multi-label consensus, AMCC models workers' annotations via a linear combination of commonality and individuality, and reduces the impact of unreliable workers by assigning smaller weights to their groups. To collect reliable annotations at reduced cost, AMCC introduces an active crowdsourcing learning strategy that selects sample-label-worker triplets: in a triplet, the selected sample and label are the most informative for the consensus model, and the selected worker can reliably annotate the sample at low cost. Our experimental results on multi-label datasets demonstrate the advantages of AMCC over state-of-the-art solutions in computing crowd consensus and in reducing the budget by choosing cost-effective triplets. |
Tasks | |
Published | 2019-11-07 |
URL | https://arxiv.org/abs/1911.02789v1 |
https://arxiv.org/pdf/1911.02789v1.pdf | |
PWC | https://paperswithcode.com/paper/active-multi-label-crowd-consensus |
Repo | |
Framework | |
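The active part of AMCC scores (sample, label, worker) triplets and picks the most cost-effective one. The scoring rule below (informativeness times worker reliability divided by cost) is an assumption for illustration, not the paper's exact criterion:

```python
import numpy as np

def select_triplet(uncertainty, reliability, cost):
    """Pick the (sample, label, worker) triplet with the best
    informativeness-per-cost.
    uncertainty: (n_samples, n_labels) model uncertainty per sample-label
    reliability: (n_workers,) estimated worker reliability
    cost:        (n_workers,) price per annotation"""
    gain = uncertainty[:, :, None] * reliability[None, None, :] / cost
    return np.unravel_index(np.argmax(gain), gain.shape)

unc = np.random.rand(50, 10)                 # 50 samples, 10 labels
rel = np.random.rand(4)                      # 4 workers
c   = np.array([1.0, 2.0, 1.5, 0.5])
sample, label, worker = select_triplet(unc, rel, c)
```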
Dynamics of Deep Neural Networks and Neural Tangent Hierarchy
Title | Dynamics of Deep Neural Networks and Neural Tangent Hierarchy |
Authors | Jiaoyang Huang, Horng-Tzer Yau |
Abstract | The evolution of a deep neural network trained by gradient descent can be described by its neural tangent kernel (NTK), as introduced in [20], where it was proven that in the infinite-width limit the NTK converges to an explicit limiting kernel and stays constant during training. The NTK was also implicit in some other recent papers [6,13,14]. In the overparametrization regime, a fully trained deep neural network is indeed equivalent to the kernel regression predictor using the limiting NTK, and gradient descent achieves zero training loss for a deep overparameterized neural network. However, it was observed in [5] that there is a performance gap between kernel regression using the limiting NTK and deep neural networks. This performance gap is likely to originate from the change of the NTK during training due to finite-width effects, and this change is central to describing the generalization features of deep neural networks. In the current paper, we study the dynamics of the NTK for finite-width deep fully-connected neural networks. We derive an infinite hierarchy of ordinary differential equations, the neural tangent hierarchy (NTH), which captures the gradient descent dynamics of the deep neural network. Moreover, under certain conditions on the neural network width and the dataset dimension, we prove that the truncated hierarchy of the NTH approximates the dynamics of the NTK up to arbitrary precision. This description makes it possible to directly study the change of the NTK for deep neural networks, and sheds light on the observation that deep neural networks outperform kernel regressions using the corresponding limiting NTK. |
Tasks | |
Published | 2019-09-18 |
URL | https://arxiv.org/abs/1909.08156v1 |
https://arxiv.org/pdf/1909.08156v1.pdf | |
PWC | https://paperswithcode.com/paper/dynamics-of-deep-neural-networks-and-neural |
Repo | |
Framework | |
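The object whose training-time drift the NTH describes is the empirical (finite-width) NTK: the inner product of parameter gradients at two inputs. A PyTorch sketch for a scalar-output network (the architecture is arbitrary; recomputing the kernel before and after a gradient step exhibits the finite-width change the paper studies):

```python
import torch

def empirical_ntk(model, x1, x2):
    """Finite-width empirical NTK: K(x1, x2) = <grad_theta f(x1), grad_theta f(x2)>.
    Unlike the infinite-width limit, this kernel changes during training."""
    def grad_vec(x):
        out = model(x).squeeze()   # scalar output assumed
        grads = torch.autograd.grad(out, list(model.parameters()))
        return torch.cat([g.reshape(-1) for g in grads])
    return grad_vec(x1) @ grad_vec(x2)

net = torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.ReLU(),
                          torch.nn.Linear(64, 1))
x1, x2 = torch.randn(1, 8), torch.randn(1, 8)
print(empirical_ntk(net, x1, x2))  # recompute after a training step to see it move
```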
Speaker-invariant Affective Representation Learning via Adversarial Training
Title | Speaker-invariant Affective Representation Learning via Adversarial Training |
Authors | Haoqi Li, Ming Tu, Jing Huang, Shrikanth Narayanan, Panayiotis Georgiou |
Abstract | Representation learning for speech emotion recognition is challenging due to the sparsity of labeled data and the lack of gold-standard references. In addition, there is much variability in the input speech signals, in humans' subjective perception of those signals, and in the ambiguity of emotion labels. In this paper, we propose a machine learning framework to obtain speech emotion representations by limiting the effect of speaker variability in the speech signals. Specifically, we propose to disentangle the speaker characteristics from emotion through an adversarial training network in order to better represent emotion. Our method combines the gradient reversal technique with an entropy loss function to remove such speaker information. Our approach is evaluated on both the IEMOCAP and CMU-MOSEI datasets. We show that our method improves speech emotion classification and increases generalization to unseen speakers. |
Tasks | Emotion Classification, Emotion Recognition, Representation Learning, Speech Emotion Recognition |
Published | 2019-11-04 |
URL | https://arxiv.org/abs/1911.01533v2 |
https://arxiv.org/pdf/1911.01533v2.pdf | |
PWC | https://paperswithcode.com/paper/speaker-invariant-affective-representation |
Repo | |
Framework | |
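The gradient reversal technique the abstract mentions is standard: the layer is the identity in the forward pass and negates (and scales) the gradient in the backward pass, so the encoder is trained to hurt the speaker classifier. A minimal PyTorch sketch (the scale lam = 0.5 is an arbitrary choice):

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity forward, negated (scaled) gradient backward: the
    gradient reversal trick that pushes features to be uninformative
    about speaker identity."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Features flow normally to the emotion classifier, while the speaker
# classifier's gradient is reversed before reaching the encoder.
feats = torch.randn(4, 128, requires_grad=True)
speaker_branch_input = grad_reverse(feats, 0.5)
```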
Reconsidering Analytical Variational Bounds for Output Layers of Deep Networks
Title | Reconsidering Analytical Variational Bounds for Output Layers of Deep Networks |
Authors | Otmane Sakhi, Stephen Bonner, David Rohde, Flavian Vasile |
Abstract | The combination of the re-parameterization trick with the use of variational auto-encoders has caused a sensation in Bayesian deep learning, allowing the training of realistic generative models of images and considerably increasing our ability to use scalable latent variable models. The re-parameterization trick is necessary for models in which no analytical variational bound is available and allows noisy gradients to be computed for arbitrary models. However, for certain standard output layers of a neural network, analytical bounds are available and the variational auto-encoder may be used without either the re-parameterization trick or any Monte Carlo approximation. In this work, we show that using the Jaakkola and Jordan bound, we can produce a binary classification layer that allows a Bayesian output layer to be trained using the standard stochastic gradient descent algorithm. We further demonstrate that a latent variable model utilizing the Bouchard bound for multi-class classification allows for fast training of a fully probabilistic latent factor model, even when the number of classes is very large. |
Tasks | Latent Variable Models |
Published | 2019-10-02 |
URL | https://arxiv.org/abs/1910.00877v2 |
https://arxiv.org/pdf/1910.00877v2.pdf | |
PWC | https://paperswithcode.com/paper/reconsidering-analytical-variational-bounds |
Repo | |
Framework | |
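For reference, the Jaakkola and Jordan bound lower-bounds the logistic sigmoid by an expression quadratic in x (tight at x = ±ξ), which is why expectations under a Gaussian variational posterior become analytic and no Monte Carlo sampling is needed:

```latex
\sigma(x) \;\ge\; \sigma(\xi)\,
  \exp\!\Big(\frac{x-\xi}{2} - \lambda(\xi)\,\big(x^{2}-\xi^{2}\big)\Big),
\qquad
\lambda(\xi) = \frac{1}{4\xi}\tanh\!\Big(\frac{\xi}{2}\Big).
```

Maximizing over the variational parameter ξ tightens the bound; how the paper wires this into a Bayesian output layer is not reproduced here.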
Jointly Learning to Detect Emotions and Predict Facebook Reactions
Title | Jointly Learning to Detect Emotions and Predict Facebook Reactions |
Authors | Lisa Graziani, Stefano Melacci, Marco Gori |
Abstract | The growing ubiquity of Social Media data offers an attractive perspective for improving the quality of machine learning-based models in several fields, ranging from Computer Vision to Natural Language Processing. In this paper we focus on Facebook posts paired with reactions of multiple users, and we investigate their relationships with classes of emotions that are typically considered in the task of emotion detection. We are inspired by the idea of introducing a connection between reactions and emotions by means of First-Order Logic formulas, and we propose an end-to-end neural model that is able to jointly learn to detect emotions and predict Facebook reactions in a multi-task environment, where the logic formulas are converted into polynomial constraints. Our model is trained using a large collection of unsupervised texts together with data labeled with emotion classes and Facebook posts that include reactions. An extended experimental analysis that leverages a large collection of Facebook posts shows that the tasks of emotion classification and reaction prediction can both benefit from their interaction. |
Tasks | Emotion Classification |
Published | 2019-09-24 |
URL | https://arxiv.org/abs/1909.10779v1 |
https://arxiv.org/pdf/1909.10779v1.pdf | |
PWC | https://paperswithcode.com/paper/jointly-learning-to-detect-emotions-and |
Repo | |
Framework | |
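Under the product t-norm, a formula such as "reaction implies emotion" relaxes to a polynomial penalty that grows when the premise is predicted true but the conclusion false. A sketch (the Love-reaction-implies-joy pairing is a made-up example, not one of the paper's formulas):

```python
import torch

def implication_penalty(p_a, p_b):
    """Polynomial relaxation of the formula A => B under the product
    t-norm: violated in proportion to p(A) * (1 - p(B))."""
    return (p_a * (1.0 - p_b)).mean()

p_love_reaction = torch.sigmoid(torch.randn(8))  # predicted "Love" reaction
p_joy_emotion   = torch.sigmoid(torch.randn(8))  # predicted "joy" emotion
loss_constraint = implication_penalty(p_love_reaction, p_joy_emotion)
```

Added to the task losses, such a term couples the reaction and emotion heads during multi-task training.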
Neurogeometry of perception: isotropic and anisotropic aspects
Title | Neurogeometry of perception: isotropic and anisotropic aspects |
Authors | Giovanna Citti, Alessandro Sarti |
Abstract | In this paper we first recall the definition of the geometrical model of the visual cortex, focusing in particular on the geometrical properties of horizontal cortical connectivity. We then observe that histograms of edge co-occurrences are not isotropically distributed, but strongly biased toward the horizontal and vertical directions of the stimulus. Finally, we introduce a new model of non-isotropic cortical connectivity modeled on the histogram of edge co-occurrences. Using this kernel we are able to justify oblique phenomena comparable with experimental findings. |
Tasks | |
Published | 2019-06-08 |
URL | https://arxiv.org/abs/1906.03495v1 |
https://arxiv.org/pdf/1906.03495v1.pdf | |
PWC | https://paperswithcode.com/paper/neurogeometry-of-perception-isotropic-and |
Repo | |
Framework | |
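The statistic underlying the new kernel is a histogram of edge co-occurrences: how often an edge of orientation θ1 appears next to an edge of orientation θ2. A numpy sketch (binning, neighbor offsets, and the magnitude threshold are assumptions) of the quantity that would exhibit the horizontal/vertical bias the paper reports on natural images:

```python
import numpy as np

def orientation_cooccurrence(theta, magnitude, n_bins=16, thresh=0.1):
    """Histogram of co-occurring edge orientations at adjacent pixels."""
    hist = np.zeros((n_bins, n_bins))
    bins = np.clip(((theta % np.pi) / np.pi * n_bins).astype(int), 0, n_bins - 1)
    strong = magnitude > thresh
    h, w = strong.shape
    for dy, dx in [(0, 1), (1, 0)]:      # right and down neighbors
        a = (slice(0, h - dy), slice(0, w - dx))
        b = (slice(dy, h), slice(dx, w))
        both = strong[a] & strong[b]
        np.add.at(hist, (bins[a][both], bins[b][both]), 1)
    return hist / max(hist.sum(), 1)

img = np.random.rand(64, 64)             # natural images would show the bias
gy, gx = np.gradient(img)
hist = orientation_cooccurrence(np.arctan2(gy, gx), np.hypot(gx, gy))
```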
Estimate Sequences for Stochastic Composite Optimization: Variance Reduction, Acceleration, and Robustness to Noise
Title | Estimate Sequences for Stochastic Composite Optimization: Variance Reduction, Acceleration, and Robustness to Noise |
Authors | Andrei Kulunchakov, Julien Mairal |
Abstract | In this paper, we propose a unified view of gradient-based algorithms for stochastic convex composite optimization. By extending the concept of estimate sequence introduced by Nesterov, we interpret a large class of stochastic optimization methods as procedures that iteratively minimize a surrogate of the objective. This point of view covers stochastic gradient descent (SGD), the variance-reduction approaches SAGA, SVRG, and MISO, and their proximal variants, and it has several advantages: (i) we provide a simple generic proof of convergence for all of the aforementioned methods; (ii) we naturally obtain new algorithms with the same guarantees; (iii) we derive generic strategies to make these algorithms robust to stochastic noise, which is useful when data is corrupted by small random perturbations. Finally, we show that this viewpoint is useful to obtain accelerated algorithms. |
Tasks | Stochastic Optimization |
Published | 2019-01-25 |
URL | http://arxiv.org/abs/1901.08788v1 |
http://arxiv.org/pdf/1901.08788v1.pdf | |
PWC | https://paperswithcode.com/paper/estimate-sequences-for-stochastic-composite |
Repo | |
Framework | |
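One concrete member of the family the paper unifies is SVRG, whose update corrects the stochastic gradient with a full gradient computed at a periodic snapshot. A plain-numpy sketch on a toy least-squares problem (step size and epoch counts are arbitrary):

```python
import numpy as np

def svrg(grad_i, x0, n, step=0.1, epochs=10, m=None):
    """Plain SVRG, one of the variance-reduced methods covered by the
    estimate-sequence view. grad_i(x, i) is the gradient of term i."""
    m = m or n
    x = x0.copy()
    for _ in range(epochs):
        snapshot = x.copy()
        full_grad = np.mean([grad_i(snapshot, i) for i in range(n)], axis=0)
        for _ in range(m):
            i = np.random.randint(n)
            # variance-reduced direction: unbiased, shrinking variance
            x -= step * (grad_i(x, i) - grad_i(snapshot, i) + full_grad)
    return x

# Toy least squares: f(x) = (1/n) * sum_i (a_i @ x - b_i)^2
A, b = np.random.randn(50, 5), np.random.randn(50)
g = lambda x, i: 2 * A[i] * (A[i] @ x - b[i])
x_star = svrg(g, np.zeros(5), n=50)
```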
Enhancing Recurrent Neural Networks with Sememes
Title | Enhancing Recurrent Neural Networks with Sememes |
Authors | Yujia Qin, Fanchao Qi, Sicong Ouyang, Zhiyuan Liu, Cheng Yang, Yasheng Wang, Qun Liu, Maosong Sun |
Abstract | Sememes, the minimum semantic units of human languages, have been successfully utilized in various natural language processing applications. However, most existing studies exploit sememes in specific tasks, and few efforts are made to utilize sememes more fundamentally. In this paper, we propose to incorporate sememes into recurrent neural networks (RNNs) to improve their sequence modeling ability, which is beneficial to all kinds of downstream tasks. We design three different sememe incorporation methods and employ them in typical RNNs, including LSTM, GRU, and their bidirectional variants. For evaluation, we use several benchmark datasets: PTB and WikiText-2 for language modeling, and SNLI for natural language inference. Experimental results show evident and consistent improvement of our sememe-incorporated models over vanilla RNNs, which proves the effectiveness of our sememe incorporation methods. Moreover, we find the sememe-incorporated models have greater robustness and outperform adversarial training in defending against adversarial attacks. All the code and data of this work will be made available to the public. |
Tasks | Adversarial Attack, Language Modelling, Natural Language Inference |
Published | 2019-10-20 |
URL | https://arxiv.org/abs/1910.08910v1 |
https://arxiv.org/pdf/1910.08910v1.pdf | |
PWC | https://paperswithcode.com/paper/enhancing-recurrent-neural-networks-with |
Repo | |
Framework | |
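The simplest conceivable incorporation method, offered here only as a guess at the flavor of the paper's three, is to add the average of a word's sememe embeddings to its word embedding before the RNN. A PyTorch sketch (vocabulary sizes, padding convention, and the additive combination are assumptions):

```python
import torch
import torch.nn as nn

class SememeWordEmbedding(nn.Module):
    """Feed the RNN the word embedding plus the masked average of the
    word's sememe embeddings."""
    def __init__(self, n_words, n_sememes, dim):
        super().__init__()
        self.word = nn.Embedding(n_words, dim)
        self.sememe = nn.Embedding(n_sememes, dim)

    def forward(self, word_ids, sememe_ids, sememe_mask):
        # sememe_ids: (batch, seq, max_sememes); mask marks real sememes
        s = self.sememe(sememe_ids) * sememe_mask.unsqueeze(-1)
        s = s.sum(dim=2) / sememe_mask.sum(dim=2, keepdim=True).clamp(min=1)
        return self.word(word_ids) + s   # goes into an LSTM/GRU as usual

emb = SememeWordEmbedding(10000, 2000, 128)
w  = torch.randint(0, 10000, (2, 7))      # batch of word ids
si = torch.randint(0, 2000, (2, 7, 4))    # up to 4 sememes per word
sm = torch.ones(2, 7, 4)                  # all sememe slots valid here
x = emb(w, si, sm)                        # (2, 7, 128), ready for nn.LSTM
```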
Security of Deep Learning Methodologies: Challenges and Opportunities
Title | Security of Deep Learning Methodologies: Challenges and Opportunities |
Authors | Shahbaz Rezaei, Xin Liu |
Abstract | Despite the plethora of studies about security vulnerabilities and defenses of deep learning models, security aspects of deep learning methodologies, such as transfer learning, have been rarely studied. In this article, we highlight the security challenges and research opportunities of these methodologies, focusing on vulnerabilities and attacks unique to them. |
Tasks | Transfer Learning |
Published | 2019-12-08 |
URL | https://arxiv.org/abs/1912.03735v1 |
https://arxiv.org/pdf/1912.03735v1.pdf | |
PWC | https://paperswithcode.com/paper/security-of-deep-learning-methodologies |
Repo | |
Framework | |