Paper Group ANR 215
Quantum Statistics-Inspired Neural Attention. Dimensionally Tight Bounds for Second-Order Hamiltonian Monte Carlo. On Numerical Estimation of Joint Probability Distribution from Lebesgue Integral Quadratures. Guaranteed Deterministic Bounds on the Total Variation Distance between Univariate Mixtures. Causal Interventions for Fairness. Segment-based …
Quantum Statistics-Inspired Neural Attention
Title | Quantum Statistics-Inspired Neural Attention |
Authors | Aristotelis Charalampous, Sotirios Chatzis |
Abstract | Sequence-to-sequence (encoder-decoder) models with attention constitute a cornerstone of deep learning research, as they have enabled unprecedented sequential data modeling capabilities. This effectiveness largely stems from the capacity of these models to infer salient temporal dynamics over long horizons; these are encoded into the obtained neural attention (NA) distributions. However, existing NA formulations essentially constitute point-wise selection mechanisms over the observed source sequences; that is, attention weight computation relies on the assumption that each source sequence element is independent of the rest. Unfortunately, although convenient, this assumption fails to account for higher-order dependencies which might be prevalent in real-world data. This paper addresses these limitations by leveraging Quantum-Statistical modeling arguments. Specifically, our work broadens the notion of NA by accounting for the case where the NA model is inherently incapable of discerning between individual source elements; this is assumed to be the case due to higher-order temporal dynamics. Instead, we postulate that in some cases selection may be feasible only at the level of pairs of source sequence elements. To this end, we cast NA into inference of an attention density matrix (ADM) approximation. We derive effective training and inference algorithms, and evaluate our approach in the context of a machine translation (MT) application. We perform experiments with challenging benchmark datasets. As we show, our approach yields favorable outcomes in terms of several evaluation metrics. |
Tasks | Machine Translation |
Published | 2018-09-17 |
URL | http://arxiv.org/abs/1809.06205v2 |
http://arxiv.org/pdf/1809.06205v2.pdf | |
PWC | https://paperswithcode.com/paper/quantum-statistics-inspired-neural-attention |
Repo | |
Framework | |
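To make the attention density matrix (ADM) idea concrete, here is a minimal NumPy sketch that forms a unit-trace pairwise matrix over source positions from point-wise scores and reads per-position weights off its diagonal; the amplitude construction and normalization are illustrative assumptions, not the parameterization derived in the paper.

```python
import numpy as np

def adm_attention(query, keys, values):
    """Toy attention-density-matrix (ADM) style attention.

    Rather than a purely point-wise softmax over source positions, build a
    unit-trace positive semi-definite matrix over *pairs* of positions and
    read per-position weights off its diagonal.  Illustrative only; not the
    parameterization derived in the paper.
    """
    scores = keys @ query                  # (T,) point-wise relevance scores
    amps = np.exp(scores / 2.0)            # per-position "amplitudes"
    rho = np.outer(amps, amps)             # (T, T) pairwise attention matrix
    rho /= np.trace(rho)                   # normalize to unit trace
    weights = np.diag(rho)                 # marginal per-position weights
    context = weights @ values             # (d,) attended context vector
    return context, rho

rng = np.random.default_rng(0)
T, d = 5, 8
context, rho = adm_attention(rng.normal(size=d),
                             rng.normal(size=(T, d)),
                             rng.normal(size=(T, d)))
print(context.shape, round(np.trace(rho), 6))   # (8,) 1.0
```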
Dimensionally Tight Bounds for Second-Order Hamiltonian Monte Carlo
Title | Dimensionally Tight Bounds for Second-Order Hamiltonian Monte Carlo |
Authors | Oren Mangoubi, Nisheeth K. Vishnoi |
Abstract | Hamiltonian Monte Carlo (HMC) is a widely deployed method to sample from high-dimensional distributions in Statistics and Machine learning. HMC is known to run very efficiently in practice and its popular second-order “leapfrog” implementation has long been conjectured to run in $d^{1/4}$ gradient evaluations. Here we show that this conjecture is true when sampling from strongly log-concave target distributions that satisfy a weak third-order regularity property associated with the input data. Our regularity condition is weaker than the Lipschitz Hessian property and allows us to show faster convergence bounds for a much larger class of distributions than would be possible with the usual Lipschitz Hessian constant alone. Important distributions that satisfy our regularity condition include posterior distributions used in Bayesian logistic regression for which the data satisfies an “incoherence” property. Our result compares favorably with the best available bounds for the class of strongly log-concave distributions, which grow like $d^{{1}/{2}}$ gradient evaluations with the dimension. Moreover, our simulations on synthetic data suggest that, when our regularity condition is satisfied, leapfrog HMC performs better than its competitors – both in terms of accuracy and in terms of the number of gradient evaluations it requires. |
Tasks | |
Published | 2018-02-24 |
URL | http://arxiv.org/abs/1802.08898v5 |
http://arxiv.org/pdf/1802.08898v5.pdf | |
PWC | https://paperswithcode.com/paper/dimensionally-tight-bounds-for-second-order |
Repo | |
Framework | |
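The second-order leapfrog integrator analyzed in the paper is the standard one; a minimal HMC transition for a target with known log-density and gradient might look like the sketch below (step size, trajectory length, and the Gaussian example target are arbitrary illustrative choices).

```python
import numpy as np

def leapfrog_hmc_step(x, logp, grad_logp, step=0.1, n_steps=20, rng=np.random):
    """One HMC transition using the second-order leapfrog integrator."""
    p = rng.normal(size=x.shape)                    # resample momentum
    x_new, p_new = x.copy(), p.copy()
    p_new += 0.5 * step * grad_logp(x_new)          # initial half momentum step
    for _ in range(n_steps):
        x_new += step * p_new                       # full position step
        p_new += step * grad_logp(x_new)            # full momentum step
    p_new -= 0.5 * step * grad_logp(x_new)          # undo the extra half step
    # Metropolis correction for the discretization error of the integrator
    log_accept = (logp(x_new) - 0.5 * p_new @ p_new) - (logp(x) - 0.5 * p @ p)
    return x_new if np.log(rng.uniform()) < log_accept else x

# Example: a strongly log-concave (standard Gaussian) target in 10 dimensions
target_logp = lambda z: -0.5 * z @ z
target_grad = lambda z: -z
x = np.zeros(10)
for _ in range(100):
    x = leapfrog_hmc_step(x, target_logp, target_grad)
```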
On Numerical Estimation of Joint Probability Distribution from Lebesgue Integral Quadratures
Title | On Numerical Estimation of Joint Probability Distribution from Lebesgue Integral Quadratures |
Authors | Vladislav Gennadievich Malyshkin |
Abstract | An important application of the Lebesgue integral quadrature [1] is developed. Given two random processes, $f(x)$ and $g(x)$, two generalized eigenvalue problems can be formulated and solved. In addition to obtaining two Lebesgue quadratures (for $f$ and $g$) from the two eigenproblems, the projections of the $f$- and $g$-eigenvectors on each other allow one to build a joint distribution estimator, the most general form of which is a density-matrix correlation. Examples of such density-matrix correlations are the value-correlation $V_{f_i;g_j}$, similar to the regular correlation concept, and a new one, the probability-correlation $P_{f_i;g_j}$. The theory is implemented numerically; the software is available under the GPLv3 license. |
Tasks | |
Published | 2018-07-21 |
URL | http://arxiv.org/abs/1807.08197v3 |
http://arxiv.org/pdf/1807.08197v3.pdf | |
PWC | https://paperswithcode.com/paper/on-numerical-estimation-of-joint-probability |
Repo | |
Framework | |
Guaranteed Deterministic Bounds on the Total Variation Distance between Univariate Mixtures
Title | Guaranteed Deterministic Bounds on the Total Variation Distance between Univariate Mixtures |
Authors | Frank Nielsen, Ke Sun |
Abstract | The total variation distance is a core statistical distance between probability measures that satisfies the metric axioms, with value always falling in $[0,1]$. This distance plays a fundamental role in machine learning and signal processing: It is a member of the broader class of $f$-divergences, and it is related to the probability of error in Bayesian hypothesis testing. Since the total variation distance does not admit closed-form expressions for statistical mixtures (like Gaussian mixture models), one often has to rely in practice on costly numerical integrations or on fast Monte Carlo approximations that however do not guarantee deterministic lower and upper bounds. In this work, we consider two methods for bounding the total variation of univariate mixture models: The first method is based on the information monotonicity property of the total variation to design guaranteed nested deterministic lower bounds. The second method relies on computing the geometric lower and upper envelopes of weighted mixture components to derive deterministic bounds based on density ratio. We demonstrate the tightness of our bounds in a series of experiments on Gaussian, Gamma and Rayleigh mixture models. |
Tasks | |
Published | 2018-06-29 |
URL | http://arxiv.org/abs/1806.11311v1 |
http://arxiv.org/pdf/1806.11311v1.pdf | |
PWC | https://paperswithcode.com/paper/guaranteed-deterministic-bounds-on-the-total |
Repo | |
Framework | |
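For context, the quantity being bounded is the total variation distance $TV(p, q) = \frac{1}{2}\int |p(x) - q(x)|\,dx$. The costly numerical-integration baseline that the paper's deterministic bounds are compared against can be sketched as follows for two univariate Gaussian mixtures (the mixtures themselves are arbitrary examples, and this is not the paper's bounding algorithm).

```python
from scipy.integrate import quad
from scipy.stats import norm

def mixture_pdf(x, weights, means, stds):
    return sum(w * norm.pdf(x, loc=m, scale=s)
               for w, m, s in zip(weights, means, stds))

def tv_numerical(mix_p, mix_q, lo=-20.0, hi=20.0):
    """Total variation via brute-force numerical integration of 0.5 * |p - q|."""
    integrand = lambda x: abs(mixture_pdf(x, *mix_p) - mixture_pdf(x, *mix_q))
    value, _ = quad(integrand, lo, hi, limit=200)
    return 0.5 * value

# Two example univariate Gaussian mixtures: (weights, means, standard deviations)
p = ([0.5, 0.5], [-1.0, 2.0], [1.0, 0.5])
q = ([0.3, 0.7], [0.0, 2.5], [1.0, 1.0])
print(tv_numerical(p, q))   # a value in [0, 1]
```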
Causal Interventions for Fairness
Title | Causal Interventions for Fairness |
Authors | Matt J. Kusner, Chris Russell, Joshua R. Loftus, Ricardo Silva |
Abstract | Most approaches in algorithmic fairness constrain machine learning methods so the resulting predictions satisfy one of several intuitive notions of fairness. While this may help private companies comply with non-discrimination laws or avoid negative publicity, we believe it is often too little, too late. By the time the training data is collected, individuals in disadvantaged groups have already suffered from discrimination and lost opportunities due to factors out of their control. In the present work we focus instead on interventions such as a new public policy, and in particular, how to maximize their positive effects while improving the fairness of the overall system. We use causal methods to model the effects of interventions, allowing for potential interference–each individual’s outcome may depend on who else receives the intervention. We demonstrate this with an example of allocating a budget of teaching resources using a dataset of schools in New York City. |
Tasks | |
Published | 2018-06-06 |
URL | http://arxiv.org/abs/1806.02380v1 |
http://arxiv.org/pdf/1806.02380v1.pdf | |
PWC | https://paperswithcode.com/paper/causal-interventions-for-fairness |
Repo | |
Framework | |
Segment-based Methods for Facial Attribute Detection from Partial Faces
Title | Segment-based Methods for Facial Attribute Detection from Partial Faces |
Authors | Upal Mahbub, Sayantan Sarkar, Rama Chellappa |
Abstract | State-of-the-art methods of attribute detection from faces almost always assume the presence of a full, unoccluded face. Hence, their performance degrades for partially visible and occluded faces. In this paper, we introduce SPLITFACE, a deep convolutional neural network-based method that is explicitly designed to perform attribute detection in partially occluded faces. Taking several facial segments and the full face as input, the proposed method takes a data driven approach to determine which attributes are localized in which facial segments. The unique architecture of the network allows each attribute to be predicted by multiple segments, which permits the implementation of committee machine techniques for combining local and global decisions to boost performance. With access to segment-based predictions, SPLITFACE can predict well those attributes which are localized in the visible parts of the face, without having to rely on the presence of the whole face. We use the CelebA and LFWA facial attribute datasets for standard evaluations. We also modify both datasets, to occlude the faces, so that we can evaluate the performance of attribute detection algorithms on partial faces. Our evaluation shows that SPLITFACE significantly outperforms other recent methods especially for partial faces. |
Tasks | |
Published | 2018-01-10 |
URL | http://arxiv.org/abs/1801.03546v1 |
http://arxiv.org/pdf/1801.03546v1.pdf | |
PWC | https://paperswithcode.com/paper/segment-based-methods-for-facial-attribute |
Repo | |
Framework | |
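The committee idea, each attribute predicted from several facial segments and the full face, with only visible segments contributing to the final decision, can be illustrated with a toy fusion rule; the segment set and simple averaging below are assumptions for illustration, not the SPLITFACE architecture itself.

```python
import numpy as np

def committee_predict(segment_scores, visibility, threshold=0.5):
    """Fuse per-segment attribute scores, ignoring occluded segments.

    segment_scores: (n_segments, n_attributes) sigmoid outputs, one row per
                    facial segment (e.g. upper face, lower face, full face).
    visibility:     (n_segments,) boolean mask of segments visible in the image.
    A plain average over visible segments stands in for the committee-machine
    fusion; SPLITFACE's actual decision rule is learned.
    """
    visible = segment_scores[visibility]
    return visible.mean(axis=0) > threshold     # binary attribute decisions

scores = np.array([[0.9, 0.2, 0.7],             # upper-face segment
                   [0.6, 0.1, 0.8],             # lower-face segment
                   [0.8, 0.3, 0.4]])            # full face
mask = np.array([True, False, True])            # lower face occluded
print(committee_predict(scores, mask))          # [ True False  True]
```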
Review of Deep Learning
Title | Review of Deep Learning |
Authors | Rong Zhang, Weiping Li, Tong Mo |
Abstract | In recent years, countries such as China and the United States, as well as high-tech companies such as Google, have increased their investment in artificial intelligence. Deep learning is one of the key areas of current artificial intelligence research. This paper analyzes and summarizes the latest progress and future research directions of deep learning. Firstly, three basic models of deep learning are outlined, including multilayer perceptrons, convolutional neural networks, and recurrent neural networks. On this basis, we further analyze the emerging new models of convolutional neural networks and recurrent neural networks. This paper then summarizes deep learning's applications in many areas of artificial intelligence, including speech processing, computer vision, and natural language processing. Finally, this paper discusses the existing problems of deep learning and suggests possible solutions. |
Tasks | |
Published | 2018-04-05 |
URL | http://arxiv.org/abs/1804.01653v2 |
http://arxiv.org/pdf/1804.01653v2.pdf | |
PWC | https://paperswithcode.com/paper/review-of-deep-learning |
Repo | |
Framework | |
Online Improper Learning with an Approximation Oracle
Title | Online Improper Learning with an Approximation Oracle |
Authors | Elad Hazan, Wei Hu, Yuanzhi Li, Zhiyuan Li |
Abstract | We revisit the question of reducing online learning to approximate optimization of the offline problem. In this setting, we give two algorithms with near-optimal performance in the full information setting: they guarantee optimal regret and require only poly-logarithmically many calls to the approximation oracle per iteration. Furthermore, these algorithms apply to the more general improper learning problems. In the bandit setting, our algorithm also significantly improves the best previously known oracle complexity while maintaining the same regret. |
Tasks | |
Published | 2018-04-20 |
URL | http://arxiv.org/abs/1804.07837v1 |
http://arxiv.org/pdf/1804.07837v1.pdf | |
PWC | https://paperswithcode.com/paper/online-improper-learning-with-an |
Repo | |
Framework | |
A Local Information Criterion for Dynamical Systems
Title | A Local Information Criterion for Dynamical Systems |
Authors | Arash Mehrjou, Friedrich Solowjow, Sebastian Trimpe, Bernhard Schölkopf |
Abstract | Encoding a sequence of observations is an essential task with many applications. The encoding can become highly efficient when the observations are generated by a dynamical system. A dynamical system imposes regularities on the observations that can be leveraged to achieve a more efficient code. We propose a method to encode a given or learned dynamical system. Apart from its application for encoding a sequence of observations, we propose to use the compression achieved by this encoding as a criterion for model selection. Given a dataset, different learning algorithms result in different models. But not all learned models are equally good. We show that the proposed encoding approach can be used to choose the learned model which is closer to the true underlying dynamics. We provide experiments for both encoding and model selection, and theoretical results that shed light on why the approach works. |
Tasks | Model Selection |
Published | 2018-05-27 |
URL | http://arxiv.org/abs/1805.10615v1 |
http://arxiv.org/pdf/1805.10615v1.pdf | |
PWC | https://paperswithcode.com/paper/a-local-information-criterion-for-dynamical |
Repo | |
Framework | |
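The model-selection criterion, prefer the learned dynamics model under which the observation sequence compresses best, can be illustrated by coding one-step prediction residuals; the Gaussian residual code and the linear toy models below are assumptions for illustration, not the encoding scheme of the paper.

```python
import numpy as np

def coding_cost_bits(observations, dynamics, noise_std=0.1):
    """Cost (in bits) of encoding a trajectory given a one-step dynamics model.

    Each observation is coded by its residual under the model's prediction,
    using a fixed Gaussian residual code; this coding scheme is an
    illustrative assumption, not the encoding derived in the paper.
    """
    x = np.asarray(observations, dtype=float)
    predictions = np.array([dynamics(s) for s in x[:-1]])
    residuals = x[1:] - predictions
    nll_nats = (0.5 * np.sum(residuals ** 2) / noise_std ** 2
                + residuals.size * np.log(noise_std * np.sqrt(2.0 * np.pi)))
    return nll_nats / np.log(2.0)

# Two candidate models of the same scalar linear system; the model closer to
# the true dynamics should yield the shorter code.
rng = np.random.default_rng(0)
traj = [1.0]
for _ in range(200):
    traj.append(0.9 * traj[-1] + 0.05 * rng.normal())
good_model = lambda s: 0.9 * s
poor_model = lambda s: 0.5 * s
print(coding_cost_bits(traj, good_model), coding_cost_bits(traj, poor_model))
```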
Data Dropout: Optimizing Training Data for Convolutional Neural Networks
Title | Data Dropout: Optimizing Training Data for Convolutional Neural Networks |
Authors | Tianyang Wang, Jun Huan, Bo Li |
Abstract | Deep learning models learn to fit training data while they are highly expected to generalize well to testing data. Most works aim at finding such models by creatively designing architectures and fine-tuning parameters. To adapt to particular tasks, hand-crafted information such as image prior has also been incorporated into end-to-end learning. However, very little progress has been made on investigating how an individual training sample will influence the generalization ability of a model. In other words, to achieve high generalization accuracy, do we really need all the samples in a training dataset? In this paper, we demonstrate that deep learning models such as convolutional neural networks may not favor all training samples, and generalization accuracy can be further improved by dropping those unfavorable samples. Specifically, the influence of removing a training sample is quantifiable, and we propose a Two-Round Training approach, aiming to achieve higher generalization accuracy. We locate unfavorable samples after the first round of training, and then retrain the model from scratch with the reduced training dataset in the second round. Since our approach is essentially different from fine-tuning or further training, the computational cost should not be a concern. Our extensive experimental results indicate that, with identical settings, the proposed approach can boost performance of the well-known networks on both high-level computer vision problems such as image classification, and low-level vision problems such as image denoising. |
Tasks | Denoising, Image Classification, Image Denoising |
Published | 2018-09-01 |
URL | http://arxiv.org/abs/1809.00193v2 |
http://arxiv.org/pdf/1809.00193v2.pdf | |
PWC | https://paperswithcode.com/paper/data-dropout-optimizing-training-data-for |
Repo | |
Framework | |
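The Two-Round Training procedure reads naturally as pseudocode: train once, estimate each sample's influence on validation loss, drop the unfavorable samples, and retrain from scratch. In the sketch below, train_from_scratch and influence_on_val_loss are hypothetical helpers standing in for the training loop and the influence estimate; the sign convention for "unfavorable" is an assumption.

```python
def two_round_training(train_set, val_set, train_from_scratch, influence_on_val_loss):
    """Data Dropout via Two-Round Training.

    `train_from_scratch(dataset)` and `influence_on_val_loss(model, sample, val_set)`
    are hypothetical helpers: the former trains a fresh model on a dataset, the
    latter estimates how much removing one training sample would change the
    validation loss (an influence-function-style score).  The sign convention
    below (positive score == removal helps) is an assumption.
    """
    # Round 1: train on the full training set.
    model = train_from_scratch(train_set)

    # Score every training sample's estimated effect on generalization.
    scores = [influence_on_val_loss(model, sample, val_set) for sample in train_set]

    # Round 2: drop the unfavorable samples and retrain from scratch.
    reduced = [s for s, score in zip(train_set, scores) if score <= 0]
    return train_from_scratch(reduced)
```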
Non-Negative Networks Against Adversarial Attacks
Title | Non-Negative Networks Against Adversarial Attacks |
Authors | William Fleshman, Edward Raff, Jared Sylvester, Steven Forsyth, Mark McLean |
Abstract | Adversarial attacks against neural networks are a problem of considerable importance, for which effective defenses are not yet readily available. We make progress toward this problem by showing that non-negative weight constraints can be used to improve resistance in specific scenarios. In particular, we show that they can provide an effective defense for binary classification problems with asymmetric cost, such as malware or spam detection. We also show the potential for non-negativity to be helpful to non-binary problems by applying it to image classification. |
Tasks | Image Classification |
Published | 2018-06-15 |
URL | http://arxiv.org/abs/1806.06108v2 |
http://arxiv.org/pdf/1806.06108v2.pdf | |
PWC | https://paperswithcode.com/paper/non-negative-networks-against-adversarial |
Repo | |
Framework | |
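A non-negative weight constraint is easy to emulate in a standard framework by projecting weights onto the non-negative orthant after every optimizer step; the PyTorch-style snippet below is a generic sketch of such a constraint, with an arbitrary architecture, not the authors' exact training recipe.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

def non_negative_train_step(x, y):
    """One SGD step followed by projection onto non-negative weights."""
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        for name, param in model.named_parameters():
            if "weight" in name:              # constrain weights, leave biases free
                param.clamp_(min=0.0)
    return loss.item()

# Example step on a random batch (binary task, e.g. malware vs. benign)
x = torch.randn(32, 784)
y = torch.randint(0, 2, (32,))
print(non_negative_train_step(x, y))
```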
CNNs-based Acoustic Scene Classification using Multi-Spectrogram Fusion and Label Expansions
Title | CNNs-based Acoustic Scene Classification using Multi-Spectrogram Fusion and Label Expansions |
Authors | Weiping Zheng, Zhenyao Mo, Xiaotao Xing, Gansen Zhao |
Abstract | Spectrograms, such as the STFT spectrogram and the MFCC spectrogram, have been widely used in Convolutional Neural Network based schemes for acoustic scene classification. They have different time-frequency characteristics, contributing to their own advantages and disadvantages in recognizing acoustic scenes. In this letter, a novel multi-spectrogram fusion framework is proposed, making the spectrograms complement each other. In the framework, a single CNN architecture is applied to multiple spectrograms for feature extraction. The deep features extracted from the multiple spectrograms are then fused to discriminate the acoustic scenes. Moreover, motivated by the inter-class similarities in acoustic scene datasets, a label expansion method is further proposed in which super-class labels are constructed upon the original classes. With the help of the expanded labels, the CNN models are transformed into the multitask learning form to improve acoustic scene classification by appending the auxiliary task of super-class classification. To verify the effectiveness of the proposed methods, extensive experiments have been performed on the DCASE2017 and the LITIS Rouen datasets. Experimental results show that the proposed method can achieve promising accuracies on both datasets. Specifically, accuracies of 0.9744, 0.8865 and 0.7778 are obtained for the LITIS Rouen dataset, the DCASE Development set and the Evaluation set, respectively. |
Tasks | Acoustic Scene Classification, Scene Classification |
Published | 2018-09-05 |
URL | http://arxiv.org/abs/1809.01543v1 |
http://arxiv.org/pdf/1809.01543v1.pdf | |
PWC | https://paperswithcode.com/paper/cnns-based-acoustic-scene-classification |
Repo | |
Framework | |
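The two ingredients, fusing deep features extracted by a shared CNN from several spectrogram types and adding an auxiliary super-class head, can be sketched as a small multitask model; the backbone, fusion by concatenation, and loss weighting below are illustrative assumptions rather than the architecture used in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiSpectrogramNet(nn.Module):
    """A shared CNN applied to each spectrogram; fused features feed two heads."""

    def __init__(self, n_classes, n_super_classes, feat_dim=16):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, feat_dim, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.class_head = nn.Linear(2 * feat_dim, n_classes)        # scene classes
        self.super_head = nn.Linear(2 * feat_dim, n_super_classes)  # expanded labels

    def forward(self, stft_spec, mfcc_spec):
        fused = torch.cat([self.backbone(stft_spec),
                           self.backbone(mfcc_spec)], dim=1)
        return self.class_head(fused), self.super_head(fused)

def multitask_loss(class_logits, super_logits, y, y_super, aux_weight=0.3):
    """Main scene-classification loss plus the auxiliary super-class loss."""
    return (F.cross_entropy(class_logits, y)
            + aux_weight * F.cross_entropy(super_logits, y_super))
```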
Do RNNs learn human-like abstract word order preferences?
Title | Do RNNs learn human-like abstract word order preferences? |
Authors | Richard Futrell, Roger P. Levy |
Abstract | RNN language models have achieved state-of-the-art results on various tasks, but what exactly they are representing about syntax is as yet unclear. Here we investigate whether RNN language models learn humanlike word order preferences in syntactic alternations. We collect language model surprisal scores for controlled sentence stimuli exhibiting major syntactic alternations in English: heavy NP shift, particle shift, the dative alternation, and the genitive alternation. We show that RNN language models reproduce human preferences in these alternations based on NP length, animacy, and definiteness. We collect human acceptability ratings for our stimuli, in the first acceptability judgment experiment directly manipulating the predictors of syntactic alternations. We show that the RNNs’ performance is similar to the human acceptability ratings and is not matched by an n-gram baseline model. Our results show that RNNs learn the abstract features of weight, animacy, and definiteness which underlie soft constraints on syntactic alternations. |
Tasks | Language Modelling |
Published | 2018-11-05 |
URL | http://arxiv.org/abs/1811.01866v1 |
http://arxiv.org/pdf/1811.01866v1.pdf | |
PWC | https://paperswithcode.com/paper/do-rnns-learn-human-like-abstract-word-order |
Repo | |
Framework | |
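The probe quantity is per-word surprisal under the language model, $-\log_2 p(w_t \mid w_{<t})$, which can be read off the output distribution of any autoregressive LM; the interface assumed below (a model returning next-token logits for a tensor of token ids) is a placeholder, not the authors' specific RNN.

```python
import math
import torch

def surprisals(model, token_ids):
    """Per-token surprisal, -log2 p(w_t | w_<t), from an autoregressive LM.

    `model` is assumed to map a (1, T) tensor of token ids to (1, T, vocab)
    next-token logits; this interface is a placeholder, not the paper's RNN.
    """
    with torch.no_grad():
        logits = model(token_ids)                        # (1, T, vocab)
    log_probs = torch.log_softmax(logits, dim=-1)
    bits = []
    for t in range(1, token_ids.size(1)):                # token t predicted from its prefix
        lp = log_probs[0, t - 1, token_ids[0, t]]
        bits.append(-lp.item() / math.log(2.0))          # nats -> bits
    return bits
```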
Noisy Computations during Inference: Harmful or Helpful?
Title | Noisy Computations during Inference: Harmful or Helpful? |
Authors | Minghai Qin, Dejan Vucinic |
Abstract | We study two aspects of noisy computations during inference. The first aspect is how to mitigate their side effects for naturally trained deep learning systems. One of the motivations for looking into this problem is to reduce the high power cost of conventional computing of neural networks through the use of analog neuromorphic circuits. Traditional GPU/CPU-centered deep learning architectures exhibit bottlenecks in power-restricted applications (e.g., embedded systems). The use of specialized neuromorphic circuits, where analog signals passed through memory-cell arrays are sensed to accomplish matrix-vector multiplications, promises large power savings and speed gains but brings with it the problems of limited precision of computations and unavoidable analog noise. We manage to improve inference accuracy from 21.1% to 99.5% for MNIST images, from 29.9% to 89.1% for CIFAR10, and from 15.5% to 89.6% for MNIST stroke sequences with the presence of strong noise (with signal-to-noise power ratio being 0 dB) by noise-injected training and a voting method. This observation promises neural networks that are insensitive to inference noise, which reduces the quality requirements on neuromorphic circuits and is crucial for their practical usage. The second aspect is how to utilize the noisy inference as a defensive architecture against black-box adversarial attacks. During inference, by injecting proper noise to signals in the neural networks, the robustness of adversarially-trained neural networks against black-box attacks has been further enhanced by 0.5% and 1.13% for two adversarially trained models for MNIST and CIFAR10, respectively. |
Tasks | |
Published | 2018-11-26 |
URL | http://arxiv.org/abs/1811.10649v1 |
http://arxiv.org/pdf/1811.10649v1.pdf | |
PWC | https://paperswithcode.com/paper/noisy-computations-during-inference-harmful |
Repo | |
Framework | |
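Both ideas, injecting noise during inference and voting over several noisy forward passes, can be sketched generically; in the snippet below the noise is added to the input for simplicity, whereas the paper studies noise on internal analog computations, and the noise scale and vote count are arbitrary illustrative choices.

```python
import torch

def noisy_vote_predict(model, x, noise_std=0.5, n_votes=10):
    """Average softmax outputs over several noisy forward passes, then vote.

    Noise is injected at the input here for simplicity; the paper studies
    noise on internal (analog) computations, and the scale is illustrative.
    """
    with torch.no_grad():
        probs = sum(torch.softmax(model(x + noise_std * torch.randn_like(x)), dim=-1)
                    for _ in range(n_votes))
    return probs.argmax(dim=-1)                 # predicted class per example
```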
Decoding of Non-Binary LDPC Codes Using the Information Bottleneck Method
Title | Decoding of Non-Binary LDPC Codes Using the Information Bottleneck Method |
Authors | Maximilian Stark, Jan Lewandowsky, Souradip Saha, Gerhard Bauch |
Abstract | Recently, a novel lookup-table-based decoding method for binary low-density parity-check codes has attracted considerable attention. In this approach, mutual-information-maximizing lookup tables replace the conventional operations of the variable nodes and the check nodes in message passing decoding. Moreover, the exchanged messages are represented by integers with very small bit width. A machine learning framework termed the information bottleneck method is used to design the corresponding lookup tables. In this paper, we extend this decoding principle from binary to non-binary codes. This is not a straightforward extension, but requires a more sophisticated lookup table design to cope with the arithmetic in higher-order Galois fields. The provided bit error rate simulations show that our proposed scheme outperforms the log-max decoding algorithm and operates close to sum-product decoding. |
Tasks | |
Published | 2018-10-21 |
URL | https://arxiv.org/abs/1810.08921v3 |
https://arxiv.org/pdf/1810.08921v3.pdf | |
PWC | https://paperswithcode.com/paper/decoding-of-non-binary-ldpc-codes-using-the |
Repo | |
Framework | |