Paper Group ANR 307
Guiding attention in Sequence-to-sequence models for Dialogue Act prediction
Title | Guiding attention in Sequence-to-sequence models for Dialogue Act prediction |
Authors | Pierre Colombo, Emile Chapuis, Matteo Manica, Emmanuel Vignon, Giovanna Varni, Chloe Clavel |
Abstract | The task of predicting dialog acts (DA) based on conversational dialog is a key component in the development of conversational agents. Accurately predicting DAs requires a precise modeling of both the conversation and the global tag dependencies. We leverage seq2seq approaches widely adopted in Neural Machine Translation (NMT) to improve the modeling of tag sequentiality. Seq2seq models are known to learn complex global dependencies, while currently proposed approaches using linear conditional random fields (CRF) only model local tag dependencies. In this work, we introduce a seq2seq model tailored for DA classification using: a hierarchical encoder, a novel guided attention mechanism, and beam search applied to both training and inference. Compared to the state of the art, our model does not require handcrafted features and is trained end-to-end. Furthermore, the proposed approach achieves an unmatched accuracy score of 85% on SwDA and a state-of-the-art accuracy score of 91.6% on MRDA. |
Tasks | Machine Translation |
Published | 2020-02-20 |
URL | https://arxiv.org/abs/2002.08801v2 |
https://arxiv.org/pdf/2002.08801v2.pdf | |
PWC | https://paperswithcode.com/paper/guiding-attention-in-sequence-to-sequence |
Repo | |
Framework | |
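The paper's guided-attention mechanism and training-time beam search are not reproduced here, but a minimal PyTorch sketch of the overall shape (a hierarchical encoder over utterances plus an attentive tag decoder) may help fix ideas. All layer sizes and names are our own assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    """Word-level GRU per utterance, then an utterance-level GRU over
    the conversation (sizes are illustrative assumptions)."""
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.word_gru = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.utt_gru = nn.GRU(hid_dim, hid_dim, batch_first=True)

    def forward(self, conv):              # conv: (batch, n_utts, n_words)
        b, u, w = conv.shape
        words = self.emb(conv.view(b * u, w))        # (b*u, w, emb)
        _, h = self.word_gru(words)                  # (1, b*u, hid)
        utt_reprs = h.squeeze(0).view(b, u, -1)      # (b, u, hid)
        states, _ = self.utt_gru(utt_reprs)          # (b, u, hid)
        return states

class AttentiveTagDecoder(nn.Module):
    """Decodes one dialogue-act tag per utterance, attending over the
    encoder states (plain dot-product attention, not the paper's
    guided attention)."""
    def __init__(self, n_tags, hid_dim=256):
        super().__init__()
        self.tag_emb = nn.Embedding(n_tags, hid_dim)
        self.cell = nn.GRUCell(hid_dim, hid_dim)
        self.out = nn.Linear(2 * hid_dim, n_tags)

    def forward(self, states, prev_tags):  # teacher forcing at train time
        h = states.new_zeros(states.size(0), states.size(2))
        logits = []
        for t in range(states.size(1)):
            h = self.cell(self.tag_emb(prev_tags[:, t]), h)
            scores = torch.bmm(states, h.unsqueeze(2)).squeeze(2)
            ctx = (torch.softmax(scores, 1).unsqueeze(2) * states).sum(1)
            logits.append(self.out(torch.cat([h, ctx], dim=1)))
        return torch.stack(logits, dim=1)  # (batch, n_utts, n_tags)
```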
Nonparametric Regression Quantum Neural Networks
Title | Nonparametric Regression Quantum Neural Networks |
Authors | Do Ngoc Diep, Koji Nagata, Tadao Nakamura |
Abstract | In two previous papers \cite{dndiep3}, \cite{dndiep4}, the first author constructed the least square quantum neural networks (LS-QNN), polynomial interpolation quantum neural networks (PI-QNN), and parametric-statistical QNNs such as linear regression quantum neural networks (LR-QNN), polynomial regression quantum neural networks (PR-QNN), and chi-squared quantum neural networks ($\chi^2$-QNN). We observed that the method also works in cases using nonparametric statistics. In this paper we analyze and implement nonparametric tests on QNNs, namely linear nonparametric regression quantum neural networks (LNR-QNN) and polynomial nonparametric regression quantum neural networks (PNR-QNN). The implementation is constructed through Gauss-Jordan Elimination quantum neural networks (GJE-QNN). The training rule is to use the high-probability confidence regions or intervals. |
Tasks | |
Published | 2020-02-07 |
URL | https://arxiv.org/abs/2002.02818v1 |
https://arxiv.org/pdf/2002.02818v1.pdf | |
PWC | https://paperswithcode.com/paper/nonparametric-regression-quantum-neural |
Repo | |
Framework | |
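The quantum circuits themselves are beyond a short sketch, but the classical computation that, on our reading of the abstract, the GJE-QNN implements (solving a regression system by Gauss-Jordan elimination) can be illustrated in a few lines of NumPy. This is a classical stand-in, not the paper's quantum construction.

```python
import numpy as np

def gauss_jordan_solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting
    (classical stand-in for the GJE-QNN subroutine)."""
    M = np.hstack([A.astype(float), b.reshape(-1, 1).astype(float)])
    n = len(M)
    for col in range(n):
        pivot = np.argmax(np.abs(M[col:, col])) + col  # partial pivoting
        M[[col, pivot]] = M[[pivot, col]]
        M[col] /= M[col, col]
        for row in range(n):
            if row != col:
                M[row] -= M[row, col] * M[col]
    return M[:, -1]

# Polynomial regression y ~ poly(x, deg 2) via the normal equations.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 50)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0, 0.1, 50)
X = np.vander(x, 3, increasing=True)       # columns: [1, x, x^2]
coef = gauss_jordan_solve(X.T @ X, X.T @ y)
print(coef)                                # approximately [1, 2, -3]
```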
Label-guided Learning for Text Classification
Title | Label-guided Learning for Text Classification |
Authors | Xien Liu, Song Wang, Xiao Zhang, Xinxin You, Ji Wu, Dejing Dou |
Abstract | Text classification is one of the most important and fundamental tasks in natural language processing. Performance on this task mainly depends on text representation learning. Currently, most existing learning frameworks focus on encoding local contextual information between words. These methods always neglect to exploit global clues, such as label information, for encoding text information. In this study, we propose a label-guided learning framework, LguidedLearn, for text representation and classification. Our method is novel but simple: we only insert a label-guided encoding layer into the commonly used text representation learning schemas. That label-guided layer performs label-based attentive encoding to map the universal text embedding (encoded by a contextual information learner) into different label spaces, resulting in label-wise embeddings. In our proposed framework, the label-guided layer can be easily and directly applied with a contextual encoding method to perform joint learning. Text information is encoded based on both the local contextual information and the global label clues. Therefore, the obtained text embeddings are more robust and discriminative for text classification. Extensive experiments are conducted on benchmark datasets to illustrate the effectiveness of our proposed method. |
Tasks | Representation Learning, Text Classification |
Published | 2020-02-25 |
URL | https://arxiv.org/abs/2002.10772v1 |
https://arxiv.org/pdf/2002.10772v1.pdf | |
PWC | https://paperswithcode.com/paper/label-guided-learning-for-text-classification |
Repo | |
Framework | |
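A minimal PyTorch sketch of what a label-guided encoding layer could look like, assuming learned label embeddings that attend over contextual token embeddings; names and dimensions are hypothetical, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LabelGuidedLayer(nn.Module):
    """Label-based attentive encoding: each label embedding attends over
    the contextual token embeddings, yielding one label-wise text
    embedding per class."""
    def __init__(self, n_labels, dim):
        super().__init__()
        self.label_emb = nn.Parameter(torch.randn(n_labels, dim))

    def forward(self, tokens):                 # tokens: (batch, seq, dim)
        # attention scores between every label and every token
        scores = torch.einsum('ld,bsd->bls', self.label_emb, tokens)
        attn = torch.softmax(scores, dim=-1)   # (batch, n_labels, seq)
        return torch.bmm(attn, tokens)         # (batch, n_labels, dim)

class LguidedClassifier(nn.Module):
    """Scores each label against its own label-wise embedding."""
    def __init__(self, n_labels, dim):
        super().__init__()
        self.guide = LabelGuidedLayer(n_labels, dim)
        self.score = nn.Linear(dim, 1)

    def forward(self, tokens):
        label_wise = self.guide(tokens)             # (b, n_labels, dim)
        return self.score(label_wise).squeeze(-1)   # (b, n_labels) logits
```

Stacking this layer on top of any contextual encoder (e.g., a BiLSTM or BERT) gives the joint local-context/global-label encoding the abstract describes.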
Neural Autopoiesis: Organizing Self-Boundary by Stimulus Avoidance in Biological and Artificial Neural Networks
Title | Neural Autopoiesis: Organizing Self-Boundary by Stimulus Avoidance in Biological and Artificial Neural Networks |
Authors | Atsushi Masumori, Lana Sinapayen, Norihiro Maruyama, Takeshi Mita, Douglas Bakkum, Urs Frey, Hirokazu Takahashi, Takashi Ikegami |
Abstract | Living organisms must actively maintain themselves in order to continue existing. Autopoiesis is a key concept in the study of living organisms, where the boundary of the organism is not static but dynamically regulated by the system itself. To study the autonomous regulation of a self-boundary, we focus on neural homeodynamic responses to environmental changes using both biological and artificial neural networks. Previous studies showed that embodied cultured neural networks and spiking neural networks with spike-timing dependent plasticity (STDP) learn an action as they avoid stimulation from outside. In this paper, as a result of our experiments using embodied cultured neurons, we find that there is also a second property allowing the network to avoid stimulation: if the agent cannot learn an action to avoid the external stimuli, it tends to decrease the stimulus-evoked spikes, as if to ignore the uncontrollable input. We also show that such behavior is reproduced by spiking neural networks with asymmetric STDP. We consider that these properties can be regarded as autonomous regulation of self and non-self for the network, in which a controllable neuron is regarded as self and an uncontrollable neuron is regarded as non-self. Finally, we introduce neural autopoiesis by proposing the principle of stimulus avoidance. |
Tasks | |
Published | 2020-01-27 |
URL | https://arxiv.org/abs/2001.09641v1 |
https://arxiv.org/pdf/2001.09641v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-autopoiesis-organizing-self-boundary |
Repo | |
Framework | |
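For reference, a pair-based asymmetric STDP update of the kind the artificial-network experiments rely on might look like the following; the constants are conventional illustrative values, not those used in the paper.

```python
import numpy as np

def asymmetric_stdp(w, t_pre, t_post, a_plus=0.01, a_minus=0.012,
                    tau_plus=20.0, tau_minus=20.0):
    """Pair-based asymmetric STDP: potentiate when the presynaptic
    spike precedes the postsynaptic one (dt > 0), depress otherwise.
    Times in ms; the weight is clipped to [0, 1]."""
    dt = t_post - t_pre
    if dt > 0:
        w += a_plus * np.exp(-dt / tau_plus)    # pre before post
    else:
        w -= a_minus * np.exp(dt / tau_minus)   # post before pre
    return float(np.clip(w, 0.0, 1.0))

print(asymmetric_stdp(0.5, t_pre=10.0, t_post=15.0))  # strengthened
print(asymmetric_stdp(0.5, t_pre=15.0, t_post=10.0))  # weakened
```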
Randomization matters. How to defend against strong adversarial attacks
Title | Randomization matters. How to defend against strong adversarial attacks |
Authors | Rafael Pinot, Raphael Ettedgui, Geovani Rizk, Yann Chevaleyre, Jamal Atif |
Abstract | Is there a classifier that ensures optimal robustness against all adversarial attacks? This paper answers this question by adopting a game-theoretic point of view. We show that adversarial attacks and defenses form an infinite zero-sum game where classical results (e.g., Sion's theorem) do not apply. We demonstrate the non-existence of a Nash equilibrium in our game when the classifier and the adversary are both deterministic, hence giving a negative answer to the above question in the deterministic regime. Nonetheless, the question remains open in the randomized regime. We tackle this problem by showing that, under mild conditions on the dataset distribution, any deterministic classifier can be outperformed by a randomized one. This gives arguments for using randomization, and leads us to a new algorithm for building randomized classifiers that are robust to strong adversarial attacks. Empirical results validate our theoretical analysis, and show that our defense method considerably outperforms Adversarial Training against state-of-the-art attacks. |
Tasks | |
Published | 2020-02-26 |
URL | https://arxiv.org/abs/2002.11565v1 |
https://arxiv.org/pdf/2002.11565v1.pdf | |
PWC | https://paperswithcode.com/paper/randomization-matters-how-to-defend-against |
Repo | |
Framework | |
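The core object the paper argues for is a mixed strategy: a distribution over base classifiers sampled at prediction time. A toy sketch of that object follows; the paper's actual algorithm for constructing the mixture via adversarial training is not reproduced.

```python
import numpy as np

class RandomizedClassifier:
    """Mixed strategy over trained base classifiers: each call to
    predict() samples one classifier according to the mixture weights."""
    def __init__(self, classifiers, weights, seed=0):
        self.classifiers = classifiers
        self.weights = np.asarray(weights, dtype=float)
        self.weights /= self.weights.sum()
        self.rng = np.random.default_rng(seed)

    def predict(self, X):
        idx = self.rng.choice(len(self.classifiers), p=self.weights)
        return self.classifiers[idx].predict(X)
```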
Key Phrase Classification in Complex Assignments
Title | Key Phrase Classification in Complex Assignments |
Authors | Manikandan Ravikiran |
Abstract | Complex assignments typically consist of open-ended questions with large and diverse content in the context of both classroom and online graduate programs. With the sheer scale of these programs comes a variety of problems in peer and expert feedback, including rogue reviews. As such, with the hope of identifying the content important for review, in this work we present a first study of key phrase classification, with a detailed empirical comparison of traditional and recent language modeling approaches. From this study, we find that the task of classifying key phrases is ambiguous even at a human level, producing a Cohen's kappa of 0.77 on a new dataset. Pretrained language models and simple TF-IDF SVM classifiers produce similar results, with the former scoring on average 0.6 F1 points higher than the latter. We finally derive practical advice from our extensive empirical and model-interpretability results for those interested in key phrase classification from educational reports in the future. |
Tasks | Language Modelling |
Published | 2020-03-16 |
URL | https://arxiv.org/abs/2003.07019v1 |
https://arxiv.org/pdf/2003.07019v1.pdf | |
PWC | https://paperswithcode.com/paper/key-phrase-classification-in-complex |
Repo | |
Framework | |
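The TF-IDF + SVM baseline mentioned in the abstract is easy to reproduce in spirit with scikit-learn; the toy phrases and labels below are placeholders, since the paper's dataset is not included here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in data: 1 = key phrase, 0 = not.
phrases = ["reinforcement learning agent", "in this section we",
           "convolutional neural network", "as mentioned above"]
labels = [1, 0, 1, 0]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(phrases, labels)
print(clf.predict(["graph neural network"]))
```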
Improved prediction of soil properties with Multi-target Stacked Generalisation on EDXRF spectra
Title | Improved prediction of soil properties with Multi-target Stacked Generalisation on EDXRF spectra |
Authors | Everton Jose Santana, Felipe Rodrigues dos Santos, Saulo Martiello Mastelini, Fabio Luiz Melquiades, Sylvio Barbon Jr |
Abstract | Machine Learning (ML) algorithms have been used for assessing soil quality parameters along with non-destructive methodologies. Among spectroscopic analytical methodologies, energy dispersive X-ray fluorescence (EDXRF) is one of the quicker, more environmentally friendly and less expensive compared to conventional methods. However, some challenges in EDXRF spectral data analysis still demand more efficient methods capable of providing accurate outcomes. Using Multi-target Regression (MTR) methods, multiple parameters can be predicted, and by taking advantage of inter-correlated parameters the overall predictive performance can be improved. In this study, we propose Multi-target Stacked Generalisation (MTSG), a novel MTR method relying on learning from different regressors arranged in a stacking structure for a boosted outcome. We compared MTSG with 5 MTR methods for predicting 10 parameters of soil fertility. Random Forest and Support Vector Machine (with linear and radial kernels) were used as learning algorithms embedded into each MTR method. Results showed the superiority of MTR methods over Single-target Regression (the traditional ML approach), reducing the predictive error for 5 parameters. In particular, MTSG obtained the lowest error for phosphorus, total organic carbon and cation exchange capacity. For Support Vector Machine with a radial kernel, the prediction of base saturation percentage was improved by 19%. Finally, the proposed method reduced the average error over all targets from 0.67 (single-target) to 0.64, a global improvement of 4.48%. |
Tasks | |
Published | 2020-02-11 |
URL | https://arxiv.org/abs/2002.04312v1 |
https://arxiv.org/pdf/2002.04312v1.pdf | |
PWC | https://paperswithcode.com/paper/improved-prediction-of-soil-properties-with |
Repo | |
Framework | |
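A simplified sketch of the stacking idea behind MTSG: level-0 regressors make one prediction per target, then level-1 regressors refit each target on the original features augmented with all level-0 predictions. Real implementations should use out-of-fold level-0 predictions to avoid leakage; this condensed version, and the choice of base learner, are our assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

class MultiTargetStacking:
    """Two-level stacking for multi-target regression (MTSG-like)."""
    def __init__(self, base=RandomForestRegressor, **kw):
        self.base, self.kw = base, kw

    def fit(self, X, Y):                       # Y: (n_samples, n_targets)
        self.level0 = [self.base(**self.kw).fit(X, Y[:, j])
                       for j in range(Y.shape[1])]
        Z = np.column_stack([m.predict(X) for m in self.level0])
        Xa = np.hstack([X, Z])                 # features + level-0 preds
        self.level1 = [self.base(**self.kw).fit(Xa, Y[:, j])
                       for j in range(Y.shape[1])]
        return self

    def predict(self, X):
        Z = np.column_stack([m.predict(X) for m in self.level0])
        Xa = np.hstack([X, Z])
        return np.column_stack([m.predict(Xa) for m in self.level1])
```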
Fast Reinforcement Learning for Anti-jamming Communications
Title | Fast Reinforcement Learning for Anti-jamming Communications |
Authors | Pei-Gen Ye, Yuan-Gen Wang, Jin Li, Liang Xiao |
Abstract | This letter presents a fast reinforcement learning algorithm for anti-jamming communications which chooses the previous action with probability $\tau$ and applies $\epsilon$-greedy with probability $(1-\tau)$. A dynamic threshold based on the average value of several previous actions is designed, and probability $\tau$ is formulated as a Gaussian-like function to guide the wireless devices. As a concrete example, the proposed algorithm is implemented in a wireless communication system against multiple jammers. Experimental results demonstrate that the proposed algorithm outperforms Q-learning, deep Q-networks (DQN), double DQN (DDQN), and prioritized experience replay based DDQN (PDDQN) in terms of signal-to-interference-plus-noise ratio and convergence rate. |
Tasks | |
Published | 2020-02-13 |
URL | https://arxiv.org/abs/2002.05364v1 |
https://arxiv.org/pdf/2002.05364v1.pdf | |
PWC | https://paperswithcode.com/paper/fast-reinforcement-learning-for-anti-jamming |
Repo | |
Framework | |
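Our reading of the selection rule, as a sketch: repeat the previous action with probability $\tau$, where $\tau$ is a Gaussian-like function of how far the previous action's value falls below a dynamic threshold (here, the mean of recent action values); otherwise fall back to $\epsilon$-greedy. The exact functional form and all parameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_action(Q, state, prev_action, recent_values,
                  eps=0.1, sigma=1.0):
    """Tau-greedy action selection (our reading of the letter)."""
    threshold = float(np.mean(recent_values))        # dynamic threshold
    gap = max(0.0, threshold - Q[state, prev_action])
    tau = np.exp(-gap**2 / (2 * sigma**2))           # ~1 above threshold
    if rng.random() < tau:                           # keep previous action
        return prev_action
    if rng.random() < eps:                           # eps-greedy fallback
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[state]))
```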
Query-Efficient Physical Hard-Label Attacks on Deep Learning Visual Classification
Title | Query-Efficient Physical Hard-Label Attacks on Deep Learning Visual Classification |
Authors | Ryan Feng, Jiefeng Chen, Nelson Manohar, Earlence Fernandes, Somesh Jha, Atul Prakash |
Abstract | We present Survival-OPT, a physical adversarial example algorithm in the black-box hard-label setting, where the attacker only has access to the model's predicted class label. This limited-access assumption is more relevant for settings such as proprietary cyber-physical and cloud systems than the white-box setting assumed by prior work. By leveraging the properties of physical attacks, we create a novel approach based on the survivability of perturbations corresponding to physical transformations. By simply querying the model for hard-label predictions, we optimize perturbations to survive in many different physical conditions and show that adversarial examples remain a security risk to cyber-physical systems (CPSs) even in the hard-label threat model. We show that Survival-OPT is query-efficient and robust: using fewer than 200K queries, we successfully attack a stop sign so that it is misclassified as a speed limit 30 km/hr sign in 98.5% of video frames in a drive-by setting. Survival-OPT also outperforms our baseline combination of existing hard-label and physical approaches, which required over 10x more queries for less robust results. |
Tasks | |
Published | 2020-02-17 |
URL | https://arxiv.org/abs/2002.07088v1 |
https://arxiv.org/pdf/2002.07088v1.pdf | |
PWC | https://paperswithcode.com/paper/query-efficient-physical-hard-label-attacks |
Repo | |
Framework | |
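The survivability objective at the heart of Survival-OPT can be sketched as a Monte-Carlo estimate over sampled physical transformations, using only hard-label queries. `model_label` and `transforms` below are assumed callables, and the optimization loop around this estimate is omitted.

```python
import numpy as np

def survivability(model_label, x, delta, transforms, target, n=50,
                  rng=np.random.default_rng(0)):
    """Fraction of sampled physical transformations under which the
    perturbed input still yields the target label (hard-label queries
    only). transforms: list of callables simulating physical conditions."""
    hits = 0
    for _ in range(n):
        t = transforms[rng.integers(len(transforms))]
        if model_label(t(np.clip(x + delta, 0, 1))) == target:
            hits += 1
    return hits / n
```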
The role of regularization in classification of high-dimensional noisy Gaussian mixture
Title | The role of regularization in classification of high-dimensional noisy Gaussian mixture |
Authors | Francesca Mignacco, Florent Krzakala, Yue M. Lu, Lenka Zdeborová |
Abstract | We consider a high-dimensional mixture of two Gaussians in the noisy regime where even an oracle knowing the centers of the clusters misclassifies a small but finite fraction of the points. We provide a rigorous analysis of the generalization error of regularized convex classifiers, including ridge, hinge and logistic regression, in the high-dimensional limit where the number $n$ of samples and their dimension $d$ go to infinity while their ratio is fixed to $\alpha = n/d$. We discuss surprising effects of the regularization that in some cases allow reaching Bayes-optimal performance. We also illustrate the interpolation peak at low regularization, and analyze the role of the respective sizes of the two clusters. |
Tasks | |
Published | 2020-02-26 |
URL | https://arxiv.org/abs/2002.11544v1 |
https://arxiv.org/pdf/2002.11544v1.pdf | |
PWC | https://paperswithcode.com/paper/the-role-of-regularization-in-classification |
Repo | |
Framework | |
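The paper's analysis is exact and analytic; as a quick numerical illustration of the regularization effect, one can sweep the regularization strength of a logistic classifier on a noisy two-Gaussian mixture. The toy setup below is our own, with scikit-learn's logistic regression as the convex classifier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d, n = 200, 400                          # alpha = n/d = 2
mu = rng.normal(size=d) / np.sqrt(d)     # cluster center, ||mu|| ~ 1

def sample(m):
    y = rng.choice([-1, 1], size=m)      # labels +-1, x = y*mu + noise
    return y[:, None] * mu + rng.normal(size=(m, d)), y

X, y = sample(n)
X_test, y_test = sample(10_000)

for C in [1e-3, 1e-1, 1e1]:              # C = inverse regularization
    clf = LogisticRegression(C=C, max_iter=2000).fit(X, y)
    err = (clf.predict(X_test) != y_test).mean()
    print(f"C={C:g}  test error={err:.3f}")
```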
Towards new cross-validation-based estimators for Gaussian process regression: efficient adjoint computation of gradients
Title | Towards new cross-validation-based estimators for Gaussian process regression: efficient adjoint computation of gradients |
Authors | Sébastien Petit, Julien Bect, Sébastien da Veiga, Paul Feliot, Emmanuel Vazquez |
Abstract | We consider the problem of estimating the parameters of the covariance function of a Gaussian process by cross-validation. We suggest using new cross-validation criteria derived from the literature of scoring rules. We also provide an efficient method for computing the gradient of a cross-validation criterion. To the best of our knowledge, our method is more efficient than what has been proposed in the literature so far. It makes it possible to lower the complexity of jointly evaluating leave-one-out criteria and their gradients. |
Tasks | |
Published | 2020-02-26 |
URL | https://arxiv.org/abs/2002.11543v1 |
https://arxiv.org/pdf/2002.11543v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-new-cross-validation-based-estimators |
Repo | |
Framework | |
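For context, leave-one-out residuals for GP regression already admit a closed form from a single factorization of the covariance matrix (Rasmussen & Williams, eqs. 5.10-5.12); the paper's contribution concerns cheaper gradients of such criteria. A plain NumPy version of the closed form, using an explicit inverse for clarity (production code would use a Cholesky factorization):

```python
import numpy as np

def loo_predictions(K, y, noise=1e-6):
    """All n leave-one-out GP means and variances from one inverse of K:
    sigma_i^2 = 1/[K^-1]_ii,  mu_i = y_i - [K^-1 y]_i / [K^-1]_ii."""
    Kn = K + noise * np.eye(len(y))
    Kinv = np.linalg.inv(Kn)
    alpha = Kinv @ y
    var = 1.0 / np.diag(Kinv)
    mean = y - alpha * var
    return mean, var
```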
A general framework for ensemble distribution distillation
Title | A general framework for ensemble distribution distillation |
Authors | Jakob Lindqvist, Amanda Olmin, Fredrik Lindsten, Lennart Svensson |
Abstract | Ensembles of neural networks have been shown to give better performance than single networks, both in terms of predictions and uncertainty estimation. Additionally, ensembles allow the uncertainty to be decomposed into aleatoric (data) and epistemic (model) components, giving a more complete picture of the predictive uncertainty. Ensemble distillation is the process of compressing an ensemble into a single model, often resulting in a leaner model that still outperforms the individual ensemble members. Unfortunately, standard distillation erases the natural uncertainty decomposition of the ensemble. We present a general framework for distilling both regression and classification ensembles in a way that preserves the decomposition. We demonstrate the desired behaviour of our framework and show that its predictive performance is on par with standard distillation. |
Tasks | |
Published | 2020-02-26 |
URL | https://arxiv.org/abs/2002.11531v1 |
https://arxiv.org/pdf/2002.11531v1.pdf | |
PWC | https://paperswithcode.com/paper/a-general-framework-for-ensemble-distribution |
Repo | |
Framework | |
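For classification ensembles, the decomposition the abstract refers to is commonly computed as: total uncertainty (entropy of the mean prediction) = aleatoric (mean member entropy) + epistemic (mutual information). A small NumPy helper, as one standard instance of that decomposition:

```python
import numpy as np

def uncertainty_decomposition(member_probs):
    """member_probs: (M, K) class probabilities from M ensemble members.
    Returns (total, aleatoric, epistemic) uncertainty in nats."""
    p_mean = member_probs.mean(axis=0)
    entropy = lambda p: -np.sum(p * np.log(p + 1e-12))
    total = entropy(p_mean)                           # entropy of mean
    aleatoric = np.mean([entropy(p) for p in member_probs])
    epistemic = total - aleatoric                     # mutual information
    return total, aleatoric, epistemic
```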
Decidability of Sample Complexity of PAC Learning in finite setting
Title | Decidability of Sample Complexity of PAC Learning in finite setting |
Authors | Alberto Gandolfi |
Abstract | In this short note we observe that the sample complexity of PAC machine learning of various concepts, including learning the maximum (EMX), can be exactly determined when the support of the probability measures considered as models satisfies an a priori bound. This result contrasts with the recently discovered undecidability of EMX within ZFC for finitely supported probabilities (with no a priori bound). Unfortunately, the decision procedure is, at present, at least doubly exponential in the number of points times the uniform bound on the support size. |
Tasks | |
Published | 2020-02-26 |
URL | https://arxiv.org/abs/2002.11519v1 |
https://arxiv.org/pdf/2002.11519v1.pdf | |
PWC | https://paperswithcode.com/paper/decidability-of-sample-complexity-of-pac |
Repo | |
Framework | |
Low-Rank Bottleneck in Multi-head Attention Models
Title | Low-Rank Bottleneck in Multi-head Attention Models |
Authors | Srinadh Bhojanapalli, Chulhee Yun, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar |
Abstract | The attention-based Transformer architecture has enabled significant advances in the field of natural language processing. In addition to new pre-training techniques, recent improvements crucially rely on working with a relatively larger embedding dimension for tokens. Unfortunately, this leads to models that are prohibitively large to be employed in downstream tasks. In this paper we identify one of the important factors contributing to the large embedding size requirement. In particular, our analysis highlights that the scaling between the number of heads and the size of each head in the current architecture gives rise to a low-rank bottleneck in attention heads, causing this limitation. We further validate this in our experiments. As a solution we propose setting the head size of an attention unit to the input sequence length, independent of the number of heads, resulting in multi-head attention layers with provably more expressive power. We empirically show that this allows us to train models with a relatively smaller embedding dimension and with better performance scaling. |
Tasks | |
Published | 2020-02-17 |
URL | https://arxiv.org/abs/2002.07028v1 |
https://arxiv.org/pdf/2002.07028v1.pdf | |
PWC | https://paperswithcode.com/paper/low-rank-bottleneck-in-multi-head-attention |
Repo | |
Framework | |
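A minimal PyTorch sketch of multi-head attention with the per-head size decoupled from d_model/heads; setting `head_size` to the sequence length corresponds to the paper's proposal. Layer names are ours.

```python
import torch
import torch.nn as nn

class FixedHeadSizeAttention(nn.Module):
    """Multi-head attention in which the per-head size dp is a free
    parameter rather than d_model // n_heads."""
    def __init__(self, d_model, n_heads, head_size):
        super().__init__()
        self.h, self.dp = n_heads, head_size
        self.q = nn.Linear(d_model, n_heads * head_size)
        self.k = nn.Linear(d_model, n_heads * head_size)
        self.v = nn.Linear(d_model, n_heads * head_size)
        self.o = nn.Linear(n_heads * head_size, d_model)

    def forward(self, x):                           # x: (b, n, d_model)
        b, n, _ = x.shape
        shape = (b, n, self.h, self.dp)
        q = self.q(x).view(shape).transpose(1, 2)   # (b, h, n, dp)
        k = self.k(x).view(shape).transpose(1, 2)
        v = self.v(x).view(shape).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.dp**0.5, -1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, self.h * self.dp)
        return self.o(out)
```

Standard Transformers hard-wire `head_size = d_model // n_heads`; decoupling it removes the rank constraint each head's attention logits inherit from a small d_model/h.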
On the Global Convergence of Training Deep Linear ResNets
Title | On the Global Convergence of Training Deep Linear ResNets |
Authors | Difan Zou, Philip M. Long, Quanquan Gu |
Abstract | We study the convergence of gradient descent (GD) and stochastic gradient descent (SGD) for training $L$-hidden-layer linear residual networks (ResNets). We prove that for training deep residual networks with certain linear transformations at input and output layers, which are fixed throughout training, both GD and SGD with zero initialization on all hidden weights can converge to the global minimum of the training loss. Moreover, when specializing to appropriate Gaussian random linear transformations, GD and SGD provably optimize wide enough deep linear ResNets. Compared with the global convergence result of GD for training standard deep linear networks (Du & Hu 2019), our condition on the neural network width is sharper by a factor of $O(\kappa L)$, where $\kappa$ denotes the condition number of the covariance matrix of the training data. We further propose modified identity input and output transformations, and show that a $(d+k)$-wide neural network is sufficient to guarantee the global convergence of GD/SGD, where $d, k$ are the input and output dimensions respectively. |
Tasks | |
Published | 2020-03-02 |
URL | https://arxiv.org/abs/2003.01094v1 |
https://arxiv.org/pdf/2003.01094v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-global-convergence-of-training-deep-1 |
Repo | |
Framework | |
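The analyzed architecture is easy to write down; a sketch with fixed Gaussian input/output maps and zero-initialized residual blocks, matching the setting described in the abstract (widths are placeholders):

```python
import torch
import torch.nn as nn

class DeepLinearResNet(nn.Module):
    """L-hidden-layer linear ResNet with fixed Gaussian input/output
    transformations and zero-initialized residual weights."""
    def __init__(self, d_in, d_out, width, depth):
        super().__init__()
        self.A = nn.Linear(d_in, width, bias=False)    # fixed input map
        self.B = nn.Linear(width, d_out, bias=False)   # fixed output map
        nn.init.normal_(self.A.weight, std=1.0 / d_in ** 0.5)
        nn.init.normal_(self.B.weight, std=1.0 / width ** 0.5)
        for p in (*self.A.parameters(), *self.B.parameters()):
            p.requires_grad_(False)                    # frozen throughout
        self.blocks = nn.ModuleList(
            nn.Linear(width, width, bias=False) for _ in range(depth))
        for blk in self.blocks:
            nn.init.zeros_(blk.weight)                 # zero initialization

    def forward(self, x):
        h = self.A(x)
        for blk in self.blocks:
            h = h + blk(h)                             # linear residual block
        return self.B(h)
```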