Paper Group ANR 307
Guiding attention in Sequence-to-sequence models for Dialogue Act prediction
Title | Guiding attention in Sequence-to-sequence models for Dialogue Act prediction |
Authors | Pierre Colombo, Emile Chapuis, Matteo Manica, Emmanuel Vignon, Giovanna Varni, Chloe Clavel |
Abstract | The task of predicting dialog acts (DA) based on conversational dialog is a key component in the development of conversational agents. Accurately predicting DAs requires a precise modeling of both the conversation and the global tag dependencies. We leverage seq2seq approaches widely adopted in Neural Machine Translation (NMT) to improve the modeling of tag sequentiality. Seq2seq models are known to learn complex global dependencies, while currently proposed approaches using linear conditional random fields (CRF) only model local tag dependencies. In this work, we introduce a seq2seq model tailored for DA classification using: a hierarchical encoder, a novel guided attention mechanism, and beam search applied to both training and inference. Compared to the state of the art, our model does not require handcrafted features and is trained end-to-end. Furthermore, the proposed approach achieves an unmatched accuracy score of 85% on SwDA and a state-of-the-art accuracy score of 91.6% on MRDA. |
Tasks | Machine Translation |
Published | 2020-02-20 |
URL | https://arxiv.org/abs/2002.08801v2 |
https://arxiv.org/pdf/2002.08801v2.pdf | |
PWC | https://paperswithcode.com/paper/guiding-attention-in-sequence-to-sequence |
Repo | |
Framework | |
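The paper's guided-attention mechanism and training-time beam search are not reproduced here, but a minimal PyTorch sketch of the overall shape (a hierarchical encoder over utterances plus an attentive tag decoder) may help fix ideas. All layer sizes and names are our own assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    """Word-level GRU per utterance, then an utterance-level GRU over
    the conversation (sizes are illustrative assumptions)."""
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.word_gru = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.utt_gru = nn.GRU(hid_dim, hid_dim, batch_first=True)

    def forward(self, conv):              # conv: (batch, n_utts, n_words)
        b, u, w = conv.shape
        words = self.emb(conv.view(b * u, w))        # (b*u, w, emb)
        _, h = self.word_gru(words)                  # (1, b*u, hid)
        utt_reprs = h.squeeze(0).view(b, u, -1)      # (b, u, hid)
        states, _ = self.utt_gru(utt_reprs)          # (b, u, hid)
        return states

class AttentiveTagDecoder(nn.Module):
    """Decodes one dialogue-act tag per utterance, attending over the
    encoder states (plain dot-product attention, not the paper's
    guided attention)."""
    def __init__(self, n_tags, hid_dim=256):
        super().__init__()
        self.tag_emb = nn.Embedding(n_tags, hid_dim)
        self.cell = nn.GRUCell(hid_dim, hid_dim)
        self.out = nn.Linear(2 * hid_dim, n_tags)

    def forward(self, states, prev_tags):  # teacher forcing at train time
        h = states.new_zeros(states.size(0), states.size(2))
        logits = []
        for t in range(states.size(1)):
            h = self.cell(self.tag_emb(prev_tags[:, t]), h)
            scores = torch.bmm(states, h.unsqueeze(2)).squeeze(2)
            ctx = (torch.softmax(scores, 1).unsqueeze(2) * states).sum(1)
            logits.append(self.out(torch.cat([h, ctx], dim=1)))
        return torch.stack(logits, dim=1)  # (batch, n_utts, n_tags)
```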
Nonparametric Regression Quantum Neural Networks
Title | Nonparametric Regression Quantum Neural Networks |
Authors | Do Ngoc Diep, Koji Nagata, Tadao Nakamura |
Abstract | In two previous papers \cite{dndiep3}, \cite{dndiep4}, the first author constructed the least square quantum neural networks (LS-QNN), polynomial interpolation quantum neural networks (PI-QNN), and parametric-statistical QNNs such as linear regression quantum neural networks (LR-QNN), polynomial regression quantum neural networks (PR-QNN), and chi-squared quantum neural networks ($\chi^2$-QNN). We observed that the method also works in cases using nonparametric statistics. In this paper we analyze and implement nonparametric tests on QNNs, namely linear nonparametric regression quantum neural networks (LNR-QNN) and polynomial nonparametric regression quantum neural networks (PNR-QNN). The implementation is constructed through Gauss-Jordan Elimination quantum neural networks (GJE-QNN). The training rule is to use the high-probability confidence regions or intervals. |
Tasks | |
Published | 2020-02-07 |
URL | https://arxiv.org/abs/2002.02818v1 |
https://arxiv.org/pdf/2002.02818v1.pdf | |
PWC | https://paperswithcode.com/paper/nonparametric-regression-quantum-neural |
Repo | |
Framework | |
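The quantum circuits themselves are beyond a short sketch, but the classical computation that, on our reading of the abstract, the GJE-QNN implements (solving a regression system by Gauss-Jordan elimination) can be illustrated in a few lines of NumPy. This is a classical stand-in, not the paper's quantum construction.

```python
import numpy as np

def gauss_jordan_solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting
    (classical stand-in for the GJE-QNN subroutine)."""
    M = np.hstack([A.astype(float), b.reshape(-1, 1).astype(float)])
    n = len(M)
    for col in range(n):
        pivot = np.argmax(np.abs(M[col:, col])) + col  # partial pivoting
        M[[col, pivot]] = M[[pivot, col]]
        M[col] /= M[col, col]
        for row in range(n):
            if row != col:
                M[row] -= M[row, col] * M[col]
    return M[:, -1]

# Polynomial regression y ~ poly(x, deg 2) via the normal equations.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 50)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0, 0.1, 50)
X = np.vander(x, 3, increasing=True)       # columns: [1, x, x^2]
coef = gauss_jordan_solve(X.T @ X, X.T @ y)
print(coef)                                # approximately [1, 2, -3]
```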
Label-guided Learning for Text Classification
Title | Label-guided Learning for Text Classification |
Authors | Xien Liu, Song Wang, Xiao Zhang, Xinxin You, Ji Wu, Dejing Dou |
Abstract | Text classification is one of the most important and fundamental tasks in natural language processing. Performance on this task mainly depends on text representation learning. Currently, most existing learning frameworks focus on encoding local contextual information between words. These methods always neglect to exploit global clues, such as label information, for encoding text information. In this study, we propose a label-guided learning framework, LguidedLearn, for text representation and classification. Our method is novel but simple: we only insert a label-guided encoding layer into the commonly used text representation learning schemas. That label-guided layer performs label-based attentive encoding to map the universal text embedding (encoded by a contextual information learner) into different label spaces, resulting in label-wise embeddings. In our proposed framework, the label-guided layer can be easily and directly applied with a contextual encoding method to perform joint learning. Text information is encoded based on both the local contextual information and the global label clues. Therefore, the obtained text embeddings are more robust and discriminative for text classification. Extensive experiments are conducted on benchmark datasets to illustrate the effectiveness of our proposed method. |
Tasks | Representation Learning, Text Classification |
Published | 2020-02-25 |
URL | https://arxiv.org/abs/2002.10772v1 |
https://arxiv.org/pdf/2002.10772v1.pdf | |
PWC | https://paperswithcode.com/paper/label-guided-learning-for-text-classification |
Repo | |
Framework | |
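A minimal PyTorch sketch of what a label-guided encoding layer could look like, assuming learned label embeddings that attend over contextual token embeddings; names and dimensions are hypothetical, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LabelGuidedLayer(nn.Module):
    """Label-based attentive encoding: each label embedding attends over
    the contextual token embeddings, yielding one label-wise text
    embedding per class."""
    def __init__(self, n_labels, dim):
        super().__init__()
        self.label_emb = nn.Parameter(torch.randn(n_labels, dim))

    def forward(self, tokens):                 # tokens: (batch, seq, dim)
        # attention scores between every label and every token
        scores = torch.einsum('ld,bsd->bls', self.label_emb, tokens)
        attn = torch.softmax(scores, dim=-1)   # (batch, n_labels, seq)
        return torch.bmm(attn, tokens)         # (batch, n_labels, dim)

class LguidedClassifier(nn.Module):
    """Scores each label against its own label-wise embedding."""
    def __init__(self, n_labels, dim):
        super().__init__()
        self.guide = LabelGuidedLayer(n_labels, dim)
        self.score = nn.Linear(dim, 1)

    def forward(self, tokens):
        label_wise = self.guide(tokens)             # (b, n_labels, dim)
        return self.score(label_wise).squeeze(-1)   # (b, n_labels) logits
```

Stacking this layer on top of any contextual encoder (e.g., a BiLSTM or BERT) gives the joint local-context/global-label encoding the abstract describes.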
Neural Autopoiesis: Organizing Self-Boundary by Stimulus Avoidance in Biological and Artificial Neural Networks
Title | Neural Autopoiesis: Organizing Self-Boundary by Stimulus Avoidance in Biological and Artificial Neural Networks |
Authors | Atsushi Masumori, Lana Sinapayen, Norihiro Maruyama, Takeshi Mita, Douglas Bakkum, Urs Frey, Hirokazu Takahashi, Takashi Ikegami |
Abstract | Living organisms must actively maintain themselves in order to continue existing. Autopoiesis is a key concept in the study of living organisms, where the boundary of the organism is not static but dynamically regulated by the system itself. To study the autonomous regulation of a self-boundary, we focus on neural homeodynamic responses to environmental changes using both biological and artificial neural networks. Previous studies showed that embodied cultured neural networks and spiking neural networks with spike-timing dependent plasticity (STDP) learn an action as they avoid stimulation from outside. In this paper, as a result of our experiments using embodied cultured neurons, we find that there is also a second property allowing the network to avoid stimulation: if the agent cannot learn an action to avoid the external stimuli, it tends to decrease the stimulus-evoked spikes, as if to ignore the uncontrollable input. We also show that such behavior is reproduced by spiking neural networks with asymmetric STDP. We consider that these properties can be regarded as autonomous regulation of self and non-self for the network, in which a controllable neuron is regarded as self and an uncontrollable neuron is regarded as non-self. Finally, we introduce neural autopoiesis by proposing the principle of stimulus avoidance. |
Tasks | |
Published | 2020-01-27 |
URL | https://arxiv.org/abs/2001.09641v1 |
https://arxiv.org/pdf/2001.09641v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-autopoiesis-organizing-self-boundary |
Repo | |
Framework | |
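For reference, a pair-based asymmetric STDP update of the kind the artificial-network experiments rely on might look like the following; the constants are conventional illustrative values, not those used in the paper.

```python
import numpy as np

def asymmetric_stdp(w, t_pre, t_post, a_plus=0.01, a_minus=0.012,
                    tau_plus=20.0, tau_minus=20.0):
    """Pair-based asymmetric STDP: potentiate when the presynaptic
    spike precedes the postsynaptic one (dt > 0), depress otherwise.
    Times in ms; the weight is clipped to [0, 1]."""
    dt = t_post - t_pre
    if dt > 0:
        w += a_plus * np.exp(-dt / tau_plus)    # pre before post
    else:
        w -= a_minus * np.exp(dt / tau_minus)   # post before pre
    return float(np.clip(w, 0.0, 1.0))

print(asymmetric_stdp(0.5, t_pre=10.0, t_post=15.0))  # strengthened
print(asymmetric_stdp(0.5, t_pre=15.0, t_post=10.0))  # weakened
```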
Randomization matters. How to defend against strong adversarial attacks
Title | Randomization matters. How to defend against strong adversarial attacks |
Authors | Rafael Pinot, Raphael Ettedgui, Geovani Rizk, Yann Chevaleyre, Jamal Atif |
Abstract | Is there a classifier that ensures optimal robustness against all adversarial attacks? This paper answers this question by adopting a game-theoretic point of view. We show that adversarial attacks and defenses form an infinite zero-sum game where classical results (e.g., Sion's theorem) do not apply. We demonstrate the non-existence of a Nash equilibrium in our game when the classifier and the adversary are both deterministic, hence giving a negative answer to the above question in the deterministic regime. Nonetheless, the question remains open in the randomized regime. We tackle this problem by showing that, under mild conditions on the dataset distribution, any deterministic classifier can be outperformed by a randomized one. This gives arguments for using randomization, and leads us to a new algorithm for building randomized classifiers that are robust to strong adversarial attacks. Empirical results validate our theoretical analysis, and show that our defense method considerably outperforms Adversarial Training against state-of-the-art attacks. |
Tasks | |
Published | 2020-02-26 |
URL | https://arxiv.org/abs/2002.11565v1 |
https://arxiv.org/pdf/2002.11565v1.pdf | |
PWC | https://paperswithcode.com/paper/randomization-matters-how-to-defend-against |
Repo | |
Framework | |
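The core object the paper argues for is a mixed strategy: a distribution over base classifiers sampled at prediction time. A toy sketch of that object follows; the paper's actual algorithm for constructing the mixture via adversarial training is not reproduced.

```python
import numpy as np

class RandomizedClassifier:
    """Mixed strategy over trained base classifiers: each call to
    predict() samples one classifier according to the mixture weights."""
    def __init__(self, classifiers, weights, seed=0):
        self.classifiers = classifiers
        self.weights = np.asarray(weights, dtype=float)
        self.weights /= self.weights.sum()
        self.rng = np.random.default_rng(seed)

    def predict(self, X):
        idx = self.rng.choice(len(self.classifiers), p=self.weights)
        return self.classifiers[idx].predict(X)
```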
Key Phrase Classification in Complex Assignments
Title | Key Phrase Classification in Complex Assignments |
Authors | Manikandan Ravikiran |
Abstract | Complex assignments typically consist of open-ended questions with large and diverse content in the context of both classroom and online graduate programs. With the sheer scale of these programs comes a variety of problems in peer and expert feedback, including rogue reviews. As such, with the hope of identifying the content important for review, in this work we present a first study of key phrase classification, with a detailed empirical comparison of traditional and recent language modeling approaches. From this study, we find that the task of classifying key phrases is ambiguous even at a human level, producing a Cohen's kappa of 0.77 on a new dataset. Pretrained language models and simple TF-IDF SVM classifiers produce similar results, with the former scoring on average 0.6 F1 points higher than the latter. We finally derive practical advice from our extensive empirical and model-interpretability results for those interested in key phrase classification from educational reports in the future. |
Tasks | Language Modelling |
Published | 2020-03-16 |
URL | https://arxiv.org/abs/2003.07019v1 |
https://arxiv.org/pdf/2003.07019v1.pdf | |
PWC | https://paperswithcode.com/paper/key-phrase-classification-in-complex |
Repo | |
Framework | |
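The TF-IDF + SVM baseline mentioned in the abstract is easy to reproduce in spirit with scikit-learn; the toy phrases and labels below are placeholders, since the paper's dataset is not included here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in data: 1 = key phrase, 0 = not.
phrases = ["reinforcement learning agent", "in this section we",
           "convolutional neural network", "as mentioned above"]
labels = [1, 0, 1, 0]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(phrases, labels)
print(clf.predict(["graph neural network"]))
```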
Improved prediction of soil properties with Multi-target Stacked Generalisation on EDXRF spectra
Title | Improved prediction of soil properties with Multi-target Stacked Generalisation on EDXRF spectra |
Authors | Everton Jose Santana, Felipe Rodrigues dos Santos, Saulo Martiello Mastelini, Fabio Luiz Melquiades, Sylvio Barbon Jr |
Abstract | Machine Learning (ML) algorithms have been used for assessing soil quality parameters along with non-destructive methodologies. Among spectroscopic analytical methodologies, energy dispersive X-ray fluorescence (EDXRF) is one of the quicker, more environmentally friendly and less expensive compared to conventional methods. However, some challenges in EDXRF spectral data analysis still demand more efficient methods capable of providing accurate outcomes. Using Multi-target Regression (MTR) methods, multiple parameters can be predicted, and by taking advantage of inter-correlated parameters the overall predictive performance can be improved. In this study, we propose Multi-target Stacked Generalisation (MTSG), a novel MTR method relying on learning from different regressors arranged in a stacking structure for a boosted outcome. We compared MTSG with 5 MTR methods for predicting 10 parameters of soil fertility. Random Forest and Support Vector Machine (with linear and radial kernels) were used as learning algorithms embedded into each MTR method. Results showed the superiority of MTR methods over Single-target Regression (the traditional ML approach), reducing the predictive error for 5 parameters. In particular, MTSG obtained the lowest error for phosphorus, total organic carbon and cation exchange capacity. For Support Vector Machine with a radial kernel, the prediction of base saturation percentage was improved by 19%. Finally, the proposed method reduced the average error over all targets from 0.67 (single-target) to 0.64, a global improvement of 4.48%. |
Tasks | |
Published | 2020-02-11 |
URL | https://arxiv.org/abs/2002.04312v1 |
https://arxiv.org/pdf/2002.04312v1.pdf | |
PWC | https://paperswithcode.com/paper/improved-prediction-of-soil-properties-with |
Repo | |
Framework | |
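A simplified sketch of the stacking idea behind MTSG: level-0 regressors make one prediction per target, then level-1 regressors refit each target on the original features augmented with all level-0 predictions. Real implementations should use out-of-fold level-0 predictions to avoid leakage; this condensed version, and the choice of base learner, are our assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

class MultiTargetStacking:
    """Two-level stacking for multi-target regression (MTSG-like)."""
    def __init__(self, base=RandomForestRegressor, **kw):
        self.base, self.kw = base, kw

    def fit(self, X, Y):                       # Y: (n_samples, n_targets)
        self.level0 = [self.base(**self.kw).fit(X, Y[:, j])
                       for j in range(Y.shape[1])]
        Z = np.column_stack([m.predict(X) for m in self.level0])
        Xa = np.hstack([X, Z])                 # features + level-0 preds
        self.level1 = [self.base(**self.kw).fit(Xa, Y[:, j])
                       for j in range(Y.shape[1])]
        return self

    def predict(self, X):
        Z = np.column_stack([m.predict(X) for m in self.level0])
        Xa = np.hstack([X, Z])
        return np.column_stack([m.predict(Xa) for m in self.level1])
```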
Fast Reinforcement Learning for Anti-jamming Communications
Title | Fast Reinforcement Learning for Anti-jamming Communications |
Authors | Pei-Gen Ye, Yuan-Gen Wang, Jin Li, Liang Xiao |
Abstract | This letter presents a fast reinforcement learning algorithm for anti-jamming communications which chooses the previous action with probability $\tau$ and applies $\epsilon$-greedy with probability $(1-\tau)$. A dynamic threshold based on the average value of several previous actions is designed, and probability $\tau$ is formulated as a Gaussian-like function to guide the wireless devices. As a concrete example, the proposed algorithm is implemented in a wireless communication system against multiple jammers. Experimental results demonstrate that the proposed algorithm outperforms Q-learning, deep Q-networks (DQN), double DQN (DDQN), and prioritized experience replay based DDQN (PDDQN) in terms of signal-to-interference-plus-noise ratio and convergence rate. |
Tasks | |
Published | 2020-02-13 |
URL | https://arxiv.org/abs/2002.05364v1 |
https://arxiv.org/pdf/2002.05364v1.pdf | |
PWC | https://paperswithcode.com/paper/fast-reinforcement-learning-for-anti-jamming |
Repo | |
Framework | |
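Our reading of the selection rule, as a sketch: repeat the previous action with probability $\tau$, where $\tau$ is a Gaussian-like function of how far the previous action's value falls below a dynamic threshold (here, the mean of recent action values); otherwise fall back to $\epsilon$-greedy. The exact functional form and all parameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_action(Q, state, prev_action, recent_values,
                  eps=0.1, sigma=1.0):
    """Tau-greedy action selection (our reading of the letter)."""
    threshold = float(np.mean(recent_values))        # dynamic threshold
    gap = max(0.0, threshold - Q[state, prev_action])
    tau = np.exp(-gap**2 / (2 * sigma**2))           # ~1 above threshold
    if rng.random() < tau:                           # keep previous action
        return prev_action
    if rng.random() < eps:                           # eps-greedy fallback
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[state]))
```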
Query-Efficient Physical Hard-Label Attacks on Deep Learning Visual Classification
Title | Query-Efficient Physical Hard-Label Attacks on Deep Learning Visual Classification |
Authors | Ryan Feng, Jiefeng Chen, Nelson Manohar, Earlence Fernandes, Somesh Jha, Atul Prakash |
Abstract | We present Survival-OPT, a physical adversarial example algorithm in the black-box hard-label setting, where the attacker only has access to the model's predicted class label. This limited-access assumption is more relevant for settings such as proprietary cyber-physical and cloud systems than the white-box setting assumed by prior work. By leveraging the properties of physical attacks, we create a novel approach based on the survivability of perturbations corresponding to physical transformations. By simply querying the model for hard-label predictions, we optimize perturbations to survive in many different physical conditions and show that adversarial examples remain a security risk to cyber-physical systems (CPSs) even in the hard-label threat model. We show that Survival-OPT is query-efficient and robust: using fewer than 200K queries, we successfully attack a stop sign so that it is misclassified as a speed limit 30 km/hr sign in 98.5% of video frames in a drive-by setting. Survival-OPT also outperforms our baseline combination of existing hard-label and physical approaches, which required over 10x more queries for less robust results. |
Tasks | |
Published | 2020-02-17 |
URL | https://arxiv.org/abs/2002.07088v1 |
https://arxiv.org/pdf/2002.07088v1.pdf | |
PWC | https://paperswithcode.com/paper/query-efficient-physical-hard-label-attacks |
Repo | |
Framework | |
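The survivability objective at the heart of Survival-OPT can be sketched as a Monte-Carlo estimate over sampled physical transformations, using only hard-label queries. `model_label` and `transforms` below are assumed callables, and the optimization loop around this estimate is omitted.

```python
import numpy as np

def survivability(model_label, x, delta, transforms, target, n=50,
                  rng=np.random.default_rng(0)):
    """Fraction of sampled physical transformations under which the
    perturbed input still yields the target label (hard-label queries
    only). transforms: list of callables simulating physical conditions."""
    hits = 0
    for _ in range(n):
        t = transforms[rng.integers(len(transforms))]
        if model_label(t(np.clip(x + delta, 0, 1))) == target:
            hits += 1
    return hits / n
```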
The role of regularization in classification of high-dimensional noisy Gaussian mixture
Title | The role of regularization in classification of high-dimensional noisy Gaussian mixture |
Authors | Francesca Mignacco, Florent Krzakala, Yue M. Lu, Lenka Zdeborová |
Abstract | We consider a high-dimensional mixture of two Gaussians in the noisy regime where even an oracle knowing the centers of the clusters misclassifies a small but finite fraction of the points. We provide a rigorous analysis of the generalization error of regularized convex classifiers, including ridge, hinge and logistic regression, in the high-dimensional limit where the number $n$ of samples and their dimension $d$ go to infinity while their ratio is fixed to $\alpha = n/d$. We discuss surprising effects of the regularization that in some cases allow reaching Bayes-optimal performance. We also illustrate the interpolation peak at low regularization, and analyze the role of the respective sizes of the two clusters. |
Tasks | |
Published | 2020-02-26 |
URL | https://arxiv.org/abs/2002.11544v1 |
https://arxiv.org/pdf/2002.11544v1.pdf | |
PWC | https://paperswithcode.com/paper/the-role-of-regularization-in-classification |
Repo | |
Framework | |
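The paper's analysis is exact and analytic; as a quick numerical illustration of the regularization effect, one can sweep the regularization strength of a logistic classifier on a noisy two-Gaussian mixture. The toy setup below is our own, with scikit-learn's logistic regression as the convex classifier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d, n = 200, 400                          # alpha = n/d = 2
mu = rng.normal(size=d) / np.sqrt(d)     # cluster center, ||mu|| ~ 1

def sample(m):
    y = rng.choice([-1, 1], size=m)      # labels +-1, x = y*mu + noise
    return y[:, None] * mu + rng.normal(size=(m, d)), y

X, y = sample(n)
X_test, y_test = sample(10_000)

for C in [1e-3, 1e-1, 1e1]:              # C = inverse regularization
    clf = LogisticRegression(C=C, max_iter=2000).fit(X, y)
    err = (clf.predict(X_test) != y_test).mean()
    print(f"C={C:g}  test error={err:.3f}")
```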
Towards new cross-validation-based estimators for Gaussian process regression: efficient adjoint computation of gradients
Title | Towards new cross-validation-based estimators for Gaussian process regression: efficient adjoint computation of gradients |
Authors | Sébastien Petit, Julien Bect, Sébastien da Veiga, Paul Feliot, Emmanuel Vazquez |
Abstract | We consider the problem of estimating the parameters of the covariance function of a Gaussian process by cross-validation. We suggest using new cross-validation criteria derived from the literature of scoring rules. We also provide an efficient method for computing the gradient of a cross-validation criterion. To the best of our knowledge, our method is more efficient than what has been proposed in the literature so far. It makes it possible to lower the complexity of jointly evaluating leave-one-out criteria and their gradients. |
Tasks | |
Published | 2020-02-26 |
URL | https://arxiv.org/abs/2002.11543v1 |
https://arxiv.org/pdf/2002.11543v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-new-cross-validation-based-estimators |
Repo | |
Framework | |
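For context, leave-one-out residuals for GP regression already admit a closed form from a single factorization of the covariance matrix (Rasmussen & Williams, eqs. 5.10-5.12); the paper's contribution concerns cheaper gradients of such criteria. A plain NumPy version of the closed form, using an explicit inverse for clarity (production code would use a Cholesky factorization):

```python
import numpy as np

def loo_predictions(K, y, noise=1e-6):
    """All n leave-one-out GP means and variances from one inverse of K:
    sigma_i^2 = 1/[K^-1]_ii,  mu_i = y_i - [K^-1 y]_i / [K^-1]_ii."""
    Kn = K + noise * np.eye(len(y))
    Kinv = np.linalg.inv(Kn)
    alpha = Kinv @ y
    var = 1.0 / np.diag(Kinv)
    mean = y - alpha * var
    return mean, var
```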
A general framework for ensemble distribution distillation
Title | A general framework for ensemble distribution distillation |
Authors | Jakob Lindqvist, Amanda Olmin, Fredrik Lindsten, Lennart Svensson |
Abstract | Ensembles of neural networks have been shown to give better performance than single networks, both in terms of predictions and uncertainty estimation. Additionally, ensembles allow the uncertainty to be decomposed into aleatoric (data) and epistemic (model) components, giving a more complete picture of the predictive uncertainty. Ensemble distillation is the process of compressing an ensemble into a single model, often resulting in a leaner model that still outperforms the individual ensemble members. Unfortunately, standard distillation erases the natural uncertainty decomposition of the ensemble. We present a general framework for distilling both regression and classification ensembles in a way that preserves the decomposition. We demonstrate the desired behaviour of our framework and show that its predictive performance is on par with standard distillation. |
Tasks | |
Published | 2020-02-26 |
URL | https://arxiv.org/abs/2002.11531v1 |
https://arxiv.org/pdf/2002.11531v1.pdf | |
PWC | https://paperswithcode.com/paper/a-general-framework-for-ensemble-distribution |
Repo | |
Framework | |
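For classification ensembles, the decomposition the abstract refers to is commonly computed as: total uncertainty (entropy of the mean prediction) = aleatoric (mean member entropy) + epistemic (mutual information). A small NumPy helper, as one standard instance of that decomposition:

```python
import numpy as np

def uncertainty_decomposition(member_probs):
    """member_probs: (M, K) class probabilities from M ensemble members.
    Returns (total, aleatoric, epistemic) uncertainty in nats."""
    p_mean = member_probs.mean(axis=0)
    entropy = lambda p: -np.sum(p * np.log(p + 1e-12))
    total = entropy(p_mean)                           # entropy of mean
    aleatoric = np.mean([entropy(p) for p in member_probs])
    epistemic = total - aleatoric                     # mutual information
    return total, aleatoric, epistemic
```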
Decidability of Sample Complexity of PAC Learning in finite setting
Title | Decidability of Sample Complexity of PAC Learning in finite setting |
Authors | Alberto Gandolfi |
Abstract | In this short note we observe that the sample complexity of PAC machine learning of various concepts, including learning the maximum (EMX), can be exactly determined when the support of the probability measures considered as models satisfies an a priori bound. This result contrasts with the recently discovered undecidability of EMX within ZFC for finitely supported probabilities (with no a priori bound). Unfortunately, the decision procedure is, at present, at least doubly exponential in the number of points times the uniform bound on the support size. |
Tasks | |
Published | 2020-02-26 |
URL | https://arxiv.org/abs/2002.11519v1 |
https://arxiv.org/pdf/2002.11519v1.pdf | |
PWC | https://paperswithcode.com/paper/decidability-of-sample-complexity-of-pac |
Repo | |
Framework | |
Low-Rank Bottleneck in Multi-head Attention Models
Title | Low-Rank Bottleneck in Multi-head Attention Models |
Authors | Srinadh Bhojanapalli, Chulhee Yun, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar |
Abstract | The attention-based Transformer architecture has enabled significant advances in the field of natural language processing. In addition to new pre-training techniques, recent improvements crucially rely on working with a relatively larger embedding dimension for tokens. Unfortunately, this leads to models that are prohibitively large to be employed in downstream tasks. In this paper we identify one of the important factors contributing to the large embedding size requirement. In particular, our analysis highlights that the scaling between the number of heads and the size of each head in the current architecture gives rise to a low-rank bottleneck in attention heads, causing this limitation. We further validate this in our experiments. As a solution we propose setting the head size of an attention unit to the input sequence length, independent of the number of heads, resulting in multi-head attention layers with provably more expressive power. We empirically show that this allows us to train models with a relatively smaller embedding dimension and with better performance scaling. |
Tasks | |
Published | 2020-02-17 |
URL | https://arxiv.org/abs/2002.07028v1 |
https://arxiv.org/pdf/2002.07028v1.pdf | |
PWC | https://paperswithcode.com/paper/low-rank-bottleneck-in-multi-head-attention |
Repo | |
Framework | |
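A minimal PyTorch sketch of multi-head attention with the per-head size decoupled from d_model/heads; setting `head_size` to the sequence length corresponds to the paper's proposal. Layer names are ours.

```python
import torch
import torch.nn as nn

class FixedHeadSizeAttention(nn.Module):
    """Multi-head attention in which the per-head size dp is a free
    parameter rather than d_model // n_heads."""
    def __init__(self, d_model, n_heads, head_size):
        super().__init__()
        self.h, self.dp = n_heads, head_size
        self.q = nn.Linear(d_model, n_heads * head_size)
        self.k = nn.Linear(d_model, n_heads * head_size)
        self.v = nn.Linear(d_model, n_heads * head_size)
        self.o = nn.Linear(n_heads * head_size, d_model)

    def forward(self, x):                           # x: (b, n, d_model)
        b, n, _ = x.shape
        shape = (b, n, self.h, self.dp)
        q = self.q(x).view(shape).transpose(1, 2)   # (b, h, n, dp)
        k = self.k(x).view(shape).transpose(1, 2)
        v = self.v(x).view(shape).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.dp**0.5, -1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, self.h * self.dp)
        return self.o(out)
```

Standard Transformers hard-wire `head_size = d_model // n_heads`; decoupling it removes the rank constraint each head's attention logits inherit from a small d_model/h.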
On the Global Convergence of Training Deep Linear ResNets
Title | On the Global Convergence of Training Deep Linear ResNets |
Authors | Difan Zou, Philip M. Long, Quanquan Gu |
Abstract | We study the convergence of gradient descent (GD) and stochastic gradient descent (SGD) for training $L$-hidden-layer linear residual networks (ResNets). We prove that for training deep residual networks with certain linear transformations at input and output layers, which are fixed throughout training, both GD and SGD with zero initialization on all hidden weights can converge to the global minimum of the training loss. Moreover, when specializing to appropriate Gaussian random linear transformations, GD and SGD provably optimize wide enough deep linear ResNets. Compared with the global convergence result of GD for training standard deep linear networks (Du & Hu 2019), our condition on the neural network width is sharper by a factor of $O(\kappa L)$, where $\kappa$ denotes the condition number of the covariance matrix of the training data. We further propose modified identity input and output transformations, and show that a $(d+k)$-wide neural network is sufficient to guarantee the global convergence of GD/SGD, where $d, k$ are the input and output dimensions respectively. |
Tasks | |
Published | 2020-03-02 |
URL | https://arxiv.org/abs/2003.01094v1 |
https://arxiv.org/pdf/2003.01094v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-global-convergence-of-training-deep-1 |
Repo | |
Framework | |
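The analyzed architecture is easy to write down; a sketch with fixed Gaussian input/output maps and zero-initialized residual blocks, matching the setting described in the abstract (widths are placeholders):

```python
import torch
import torch.nn as nn

class DeepLinearResNet(nn.Module):
    """L-hidden-layer linear ResNet with fixed Gaussian input/output
    transformations and zero-initialized residual weights."""
    def __init__(self, d_in, d_out, width, depth):
        super().__init__()
        self.A = nn.Linear(d_in, width, bias=False)    # fixed input map
        self.B = nn.Linear(width, d_out, bias=False)   # fixed output map
        nn.init.normal_(self.A.weight, std=1.0 / d_in ** 0.5)
        nn.init.normal_(self.B.weight, std=1.0 / width ** 0.5)
        for p in (*self.A.parameters(), *self.B.parameters()):
            p.requires_grad_(False)                    # frozen throughout
        self.blocks = nn.ModuleList(
            nn.Linear(width, width, bias=False) for _ in range(depth))
        for blk in self.blocks:
            nn.init.zeros_(blk.weight)                 # zero initialization

    def forward(self, x):
        h = self.A(x)
        for blk in self.blocks:
            h = h + blk(h)                             # linear residual block
        return self.B(h)
```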