Paper Group ANR 41
Uncertainty relations and fluctuation theorems for Bayes nets. Paradox in Deep Neural Networks: Similar yet Different while Different yet Similar. Incremental Sense Weight Training for the Interpretation of Contextualized Word Embeddings. A Survey on Face Data Augmentation. NGO-GM: Natural Gradient Optimization for Graphical Models. More Data Can Hurt for Linear Regression: Sample-wise Double Descent. Scaling Limit of Neural Networks with the Xavier Initialization and Convergence to a Global Minimum. On Learning Nominal Automata with Binders. OCT Fingerprints: Resilience to Presentation Attacks. Multi-lingual Dialogue Act Recognition with Deep Learning Methods. Getting Gender Right in Neural Machine Translation. Multi-Granularity Self-Attention for Neural Machine Translation. Interest-Related Item Similarity Model Based on Multimodal Data for Top-N Recommendation. Online Multiclass Classification Based on Prediction Margin for Partial Feedback. A Regression Approach to Certain Information Transmission Problems.
Uncertainty relations and fluctuation theorems for Bayes nets
Title | Uncertainty relations and fluctuation theorems for Bayes nets |
Authors | David H. Wolpert |
Abstract | Many physical scenarios are naturally modeled as a set of multiple co-evolving systems. Recently, research in stochastic thermodynamics has considered such scenarios, e.g., by modeling the co-evolution of the systems as a Bayes net. In particular, we now have a fluctuation theorem relating the entropy production of one of the systems in a Bayes net to the overall structure of the Bayes net. Here I extend this recent research in three ways. First, I derive fluctuation theorems concerning arbitrary subsets of the systems in the Bayes net. Second, I derive “conditional” fluctuation theorems, governing the probability distribution of entropy production in an arbitrary subset of the systems, conditioned on the entropy production in a different subset of the systems. Third, I derive thermodynamic uncertainty relations relating the total entropy production of all the systems in the Bayes net to the set of all the precisions of probability currents within the individual systems. I end with an example. |
Tasks | |
Published | 2019-11-07 |
URL | https://arxiv.org/abs/1911.02700v4 |
https://arxiv.org/pdf/1911.02700v4.pdf | |
PWC | https://paperswithcode.com/paper/uncertainty-relations-and-fluctuation |
Repo | |
Framework | |
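For reference, the bounds this abstract refers to generalize the standard single-system thermodynamic uncertainty relation. A minimal statement of that single-current form, in units where $k_B = 1$ (the paper's Bayes-net, multi-system version is strictly more general):

```latex
% Standard (single-system) thermodynamic uncertainty relation, k_B = 1:
% the precision of any time-integrated current J is bounded by the
% total entropy production <sigma>.
\[
  \frac{\mathrm{Var}(J)}{\langle J \rangle^{2}} \;\ge\; \frac{2}{\langle \sigma \rangle}
\]
```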
Paradox in Deep Neural Networks: Similar yet Different while Different yet Similar
Title | Paradox in Deep Neural Networks: Similar yet Different while Different yet Similar |
Authors | Arash Akbarinia, Karl R. Gegenfurtner |
Abstract | Machine learning is advancing towards a data-science approach, implying the need for a line of investigation that divulges the knowledge learnt by deep neural networks. Limiting the comparison among networks merely to a predefined intelligent ability, according to ground truth, does not suffice; it should be paired with the innate similarity of these artificial entities. Here, we analysed multiple instances of an identical architecture trained to classify objects in static images (CIFAR and ImageNet data sets). We evaluated the performance of the networks under various distortions and compared it to the intrinsic similarity between their constituent kernels. While we expected a close correspondence between these two measures, we observed a puzzling phenomenon. Pairs of networks whose kernels’ weights are over 99.9% correlated can exhibit significantly different performances, yet other pairs with no correlation can reach quite comparable levels of performance. We show implications of this for transfer learning, and argue its importance in our general understanding of what intelligence is, whether natural or artificial. |
Tasks | Transfer Learning |
Published | 2019-03-12 |
URL | http://arxiv.org/abs/1903.04772v1 |
http://arxiv.org/pdf/1903.04772v1.pdf | |
PWC | https://paperswithcode.com/paper/paradox-in-deep-neural-networks-similar-yet |
Repo | |
Framework | |
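A minimal sketch of the kind of kernel-level comparison described above, assuming two trained networks whose convolutional weights have been exported as NumPy arrays (the layer shapes and the perturbation used here are illustrative stand-ins, not the paper's setup):

```python
import numpy as np

def kernel_correlation(weights_a, weights_b):
    """Pearson correlation between the flattened kernels of two networks.

    weights_a, weights_b: lists of np.ndarray, one per layer, with matching
    shapes (e.g. conv kernels of two instances of the same architecture).
    """
    flat_a = np.concatenate([w.ravel() for w in weights_a])
    flat_b = np.concatenate([w.ravel() for w in weights_b])
    return np.corrcoef(flat_a, flat_b)[0, 1]

# Toy illustration: a network and a slightly perturbed copy of it.
rng = np.random.default_rng(0)
net_a = [rng.normal(size=(3, 3, 64, 64)) for _ in range(4)]
net_b = [w + 0.01 * rng.normal(size=w.shape) for w in net_a]
print(kernel_correlation(net_a, net_b))   # close to 1.0, yet the paper shows
                                          # performance can still diverge
```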
Incremental Sense Weight Training for the Interpretation of Contextualized Word Embeddings
Title | Incremental Sense Weight Training for the Interpretation of Contextualized Word Embeddings |
Authors | Xinyi Jiang, Zhengzhe Yang, Jinho D. Choi |
Abstract | We present a novel online algorithm that learns the essence of each dimension in word embeddings by minimizing the within-group distance of contextualized embedding groups. Three state-of-the-art neural-based language models, Flair, ELMo, and BERT, are used to generate contextualized word embeddings such that different embeddings are generated for the same word type, which are grouped by their senses manually annotated in the SemCor dataset. We hypothesize that not all dimensions are equally important for downstream tasks, so that our algorithm can detect unessential dimensions and discard them without hurting performance. To verify this hypothesis, we first mask dimensions determined unessential by our algorithm, apply the masked word embeddings to a word sense disambiguation (WSD) task, and compare their performance against that achieved by the original embeddings. Several KNN approaches are experimented with to establish strong baselines for WSD. Our results show that the masked word embeddings do not hurt the performance and can improve it by 3%. Our work can be used to conduct future research on the interpretability of contextualized embeddings. |
Tasks | Word Embeddings, Word Sense Disambiguation |
Published | 2019-11-05 |
URL | https://arxiv.org/abs/1911.01623v2 |
https://arxiv.org/pdf/1911.01623v2.pdf | |
PWC | https://paperswithcode.com/paper/incremental-sense-weight-training-for-the |
Repo | |
Framework | |
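A hedged sketch of the masking-and-evaluation step: dimensions whose learned sense weights fall below a threshold are zeroed out before a KNN sense classifier is applied. The weights, embeddings, and threshold below are synthetic stand-ins, not the paper's learned values:

```python
import numpy as np

def mask_dimensions(embeddings, weights, threshold):
    """Zero out embedding dimensions whose learned sense weight is below threshold."""
    return embeddings * (weights >= threshold)

def knn_predict(query, train_vecs, train_labels, k=5):
    """Majority vote among the k nearest training embeddings (cosine similarity)."""
    sims = train_vecs @ query / (
        np.linalg.norm(train_vecs, axis=1) * np.linalg.norm(query) + 1e-12)
    top = np.argsort(-sims)[:k]
    labels, counts = np.unique(train_labels[top], return_counts=True)
    return labels[np.argmax(counts)]

# Synthetic stand-in: 100 contextualized embeddings of one word type, 2 senses.
rng = np.random.default_rng(0)
d = 16
vecs = rng.normal(size=(100, d))
senses = rng.integers(0, 2, size=100)
sense_weights = rng.uniform(size=d)          # stand-in for the learned weights
masked = mask_dimensions(vecs, sense_weights, threshold=0.3)
print(knn_predict(masked[0], masked[1:], senses[1:]))
```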
A Survey on Face Data Augmentation
Title | A Survey on Face Data Augmentation |
Authors | Xiang Wang, Kai Wang, Shiguo Lian |
Abstract | The quality and size of the training set have a great impact on the results of deep learning-based face-related tasks. However, collecting and labeling adequate samples with high quality and balanced distributions remains laborious and expensive, and various data augmentation techniques have thus been widely used to enrich the training dataset. In this paper, we systematically review the existing works on face data augmentation from the perspectives of the transformation types and methods, including the state-of-the-art approaches. Among all these approaches, we put the emphasis on deep learning-based works, especially generative adversarial networks, which have been recognized as more powerful and effective tools in recent years. We present their principles, discuss the results and show their applications as well as limitations. Different metrics for evaluating these approaches are also introduced. We point out the challenges and opportunities in the field of face data augmentation, and provide brief yet insightful discussions. |
Tasks | Data Augmentation |
Published | 2019-04-26 |
URL | http://arxiv.org/abs/1904.11685v1 |
http://arxiv.org/pdf/1904.11685v1.pdf | |
PWC | https://paperswithcode.com/paper/a-survey-on-face-data-augmentation |
Repo | |
Framework | |
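For the classical, non-generative end of the spectrum the survey covers, a minimal sketch of geometric and photometric augmentation on a face image represented as a NumPy array (GAN-based methods, the survey's main focus, are far more involved; the image here is a random stand-in):

```python
import numpy as np

def horizontal_flip(img):
    """Mirror the image left-right (faces are roughly symmetric)."""
    return img[:, ::-1]

def random_crop(img, out_h, out_w, rng):
    """Crop a random out_h x out_w window, a cheap translation augmentation."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - out_h + 1)
    left = rng.integers(0, w - out_w + 1)
    return img[top:top + out_h, left:left + out_w]

def brightness_jitter(img, rng, max_delta=0.2):
    """Shift pixel intensities, simulating illumination changes."""
    return np.clip(img + rng.uniform(-max_delta, max_delta), 0.0, 1.0)

rng = np.random.default_rng(0)
face = rng.uniform(size=(128, 128, 3))       # stand-in for a real face image
augmented = brightness_jitter(random_crop(horizontal_flip(face), 112, 112, rng), rng)
print(augmented.shape)
```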
NGO-GM: Natural Gradient Optimization for Graphical Models
Title | NGO-GM: Natural Gradient Optimization for Graphical Models |
Authors | Eric Benhamou, Jamal Atif, Rida Laraki, David Saltiel |
Abstract | This paper deals with estimating model parameters in graphical models. We reformulate this task as an information-geometric optimization problem and introduce a natural gradient descent strategy that incorporates additional meta parameters. We show that our approach is a strong alternative to the celebrated EM approach for learning in graphical models. In fact, our natural-gradient-based strategy learns optimal parameters for the final objective function without artificially trying to fit a distribution that may not correspond to the real one. We support our theoretical findings on the question of trend detection in financial markets and show that the learned model performs better than traditional practitioner methods and is less prone to overfitting. |
Tasks | |
Published | 2019-05-14 |
URL | https://arxiv.org/abs/1905.05444v1 |
https://arxiv.org/pdf/1905.05444v1.pdf | |
PWC | https://paperswithcode.com/paper/ngo-gm-natural-gradient-optimization-for |
Repo | |
Framework | |
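To make the natural-gradient idea concrete in the simplest possible model, here is a sketch of natural-gradient ascent on the log-likelihood of a univariate Gaussian, where the Fisher information matrix is diagonal and known in closed form. This illustrates the general principle only; it is not the paper's NGO-GM algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=1000)

mu, sigma = 0.0, 1.0
lr = 0.1
for _ in range(200):
    # Gradients of the average log-likelihood of N(mu, sigma^2).
    g_mu = np.mean(data - mu) / sigma**2
    g_sigma = np.mean((data - mu) ** 2) / sigma**3 - 1.0 / sigma
    # Fisher matrix for (mu, sigma) is diag(1/sigma^2, 2/sigma^2),
    # so the natural gradient is F^{-1} g.
    mu += lr * sigma**2 * g_mu
    sigma += lr * (sigma**2 / 2.0) * g_sigma
print(mu, sigma)   # approaches the sample mean and standard deviation
```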
More Data Can Hurt for Linear Regression: Sample-wise Double Descent
Title | More Data Can Hurt for Linear Regression: Sample-wise Double Descent |
Authors | Preetum Nakkiran |
Abstract | In this expository note we describe a surprising phenomenon in overparameterized linear regression, where the dimension exceeds the number of samples: there is a regime where the test risk of the estimator found by gradient descent increases with additional samples. In other words, more data actually hurts the estimator. This behavior is implicit in a recent line of theoretical works analyzing the “double-descent” phenomenon in linear models. In this note, we isolate and understand this behavior in an extremely simple setting: linear regression with isotropic Gaussian covariates. In particular, it occurs due to an unconventional type of bias-variance tradeoff in the overparameterized regime: the bias decreases with more samples, but the variance increases. |
Tasks | |
Published | 2019-12-16 |
URL | https://arxiv.org/abs/1912.07242v1 |
https://arxiv.org/pdf/1912.07242v1.pdf | |
PWC | https://paperswithcode.com/paper/more-data-can-hurt-for-linear-regression |
Repo | |
Framework | |
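The phenomenon in the abstract can be reproduced in a few lines; a sketch of the isotropic-Gaussian setting, using the minimum-norm interpolant (the limit of gradient descent started at zero) via the pseudoinverse. The dimension, noise level, and sample sizes below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
d, sigma = 50, 0.5
beta = rng.normal(size=d)
beta /= np.linalg.norm(beta)

def test_risk(n, trials=50):
    """Average parameter error of the min-norm least-squares estimator."""
    errs = []
    for _ in range(trials):
        X = rng.normal(size=(n, d))
        y = X @ beta + sigma * rng.normal(size=n)
        beta_hat = np.linalg.pinv(X) @ y   # min-norm solution = GD-from-zero limit
        errs.append(np.sum((beta_hat - beta) ** 2))
    return np.mean(errs)

for n in [10, 25, 40, 48, 50, 52, 60, 100]:
    print(n, round(test_risk(n), 3))       # risk spikes near n = d: more data hurts
```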
Scaling Limit of Neural Networks with the Xavier Initialization and Convergence to a Global Minimum
Title | Scaling Limit of Neural Networks with the Xavier Initialization and Convergence to a Global Minimum |
Authors | Justin Sirignano, Konstantinos Spiliopoulos |
Abstract | We analyze single-layer neural networks with the Xavier initialization in the asymptotic regime of large numbers of hidden units and large numbers of stochastic gradient descent training steps. The evolution of the neural network during training can be viewed as a stochastic system and, using techniques from stochastic analysis, we prove the neural network converges in distribution to a random ODE with a Gaussian distribution. The limit is completely different than in the typical mean-field results for neural networks due to the $\frac{1}{\sqrt{N}}$ normalization factor in the Xavier initialization (versus the $\frac{1}{N}$ factor in the typical mean-field framework). Although the pre-limit problem of optimizing a neural network is non-convex (and therefore the neural network may converge to a local minimum), the limit equation minimizes a (quadratic) convex objective function and therefore converges to a global minimum. Furthermore, under reasonable assumptions, the matrix in the limiting quadratic objective function is positive definite and thus the neural network (in the limit) will converge to a global minimum with zero loss on the training set. |
Tasks | |
Published | 2019-07-09 |
URL | https://arxiv.org/abs/1907.04108v2 |
https://arxiv.org/pdf/1907.04108v2.pdf | |
PWC | https://paperswithcode.com/paper/scaling-limit-of-neural-networks-with-the |
Repo | |
Framework | |
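A numerical sketch of the scaling difference the abstract highlights: with the Xavier-style 1/sqrt(N) factor, the output of a random single-layer network stays O(1) and fluctuates across initializations (a CLT-type Gaussian limit), while with the mean-field 1/N factor it concentrates toward zero (a law-of-large-numbers limit). Assumed setup: i.i.d. standard-normal weights and a tanh nonlinearity:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10)                      # a fixed input
for N in [100, 1000, 10000]:
    outs_xavier, outs_mf = [], []
    for _ in range(500):                     # 500 random initializations
        W = rng.normal(size=(N, x.size))     # hidden weights
        c = rng.normal(size=N)               # output weights
        h = np.tanh(W @ x)
        outs_xavier.append(c @ h / np.sqrt(N))   # 1/sqrt(N): CLT regime
        outs_mf.append(c @ h / N)                # 1/N: LLN regime
    print(N, np.std(outs_xavier), np.std(outs_mf))
# The std under 1/sqrt(N) stays O(1); under 1/N it shrinks like 1/sqrt(N).
```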
On Learning Nominal Automata with Binders
Title | On Learning Nominal Automata with Binders |
Authors | Yi Xiao, Emilio Tuosto |
Abstract | We investigate a learning algorithm in the context of nominal automata, an extension of classical automata to alphabets featuring names. This class of automata captures nominal regular languages; analogously to classical language theory, nominal automata have been shown to characterise nominal regular expressions with binders. These formalisms are amenable to the abstract modelling of resource-aware computations. We propose a learning algorithm for nominal regular languages with binders. Our algorithm generalises Angluin’s L* algorithm to nominal regular languages with binders. We show the correctness and study the theoretical complexity of our algorithm. |
Tasks | |
Published | 2019-09-12 |
URL | https://arxiv.org/abs/1909.05974v1 |
https://arxiv.org/pdf/1909.05974v1.pdf | |
PWC | https://paperswithcode.com/paper/on-learning-nominal-automata-with-binders |
Repo | |
Framework | |
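For orientation, a compact sketch of the classical L* loop over a finite alphabet that the paper generalises. This is plain Angluin-style DFA learning with the Maler–Pnueli counterexample rule and a brute-force equivalence check on a toy language; the nominal, binder-aware version in the paper is substantially richer:

```python
from itertools import product

ALPHABET = "ab"

def member(w):
    """Membership oracle for a toy target: words with an even number of 'a's."""
    return w.count("a") % 2 == 0

def words(max_len):
    for n in range(max_len + 1):
        for t in product(ALPHABET, repeat=n):
            yield "".join(t)

def lstar(max_check_len=8):
    S, E = [""], [""]                        # access prefixes, distinguishing suffixes
    row = lambda s: tuple(member(s + e) for e in E)
    while True:
        # Close the table: every one-letter extension must match some row of S.
        while True:
            rows_S = {row(s) for s in S}
            missing = [s + a for s in S for a in ALPHABET if row(s + a) not in rows_S]
            if not missing:
                break
            S.append(missing[0])
        rep = {}
        for s in S:
            rep.setdefault(row(s), s)        # one representative prefix per state
        def accepts(w):
            cur = ""
            for a in w:
                cur = rep[row(cur + a)]
            return member(cur)
        # Brute-force equivalence query up to max_check_len.
        cex = next((w for w in words(max_check_len) if accepts(w) != member(w)), None)
        if cex is None:
            return rep                        # hypothesis agrees with the oracle
        for i in range(len(cex) + 1):         # Maler-Pnueli: add all suffixes to E
            if cex[i:] not in E:
                E.append(cex[i:])

states = lstar()
print(len(states), "states")                  # 2 for the even-'a' language
```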
OCT Fingerprints: Resilience to Presentation Attacks
Title | OCT Fingerprints: Resilience to Presentation Attacks |
Authors | Tarang Chugh, Anil K. Jain |
Abstract | Optical coherence tomography (OCT) fingerprint technology provides rich depth information, including the internal fingerprint (papillary junction) and sweat (eccrine) glands, in addition to imaging any fake layers (presentation attacks) placed over the finger skin. Unlike 2D surface fingerprint scans, the additional depth information provided by cross-sectional OCT depth profile scans is purported to thwart fingerprint presentation attacks. We develop and evaluate a presentation attack detector (PAD) based on a deep convolutional neural network (CNN). Input data to the CNN are local patches extracted from cross-sectional OCT depth profile scans captured using a THORLabs Telesto series spectral-domain fingerprint reader. The proposed approach achieves a TDR of 99.73% @ FDR of 0.2% on a database of 3,413 bonafide and 357 PA OCT scans, fabricated using 8 different PA materials. By employing a visualization technique known as CNN-Fixations, we are able to identify the regions in the OCT scan patches that are crucial for fingerprint PAD. |
Tasks | |
Published | 2019-07-31 |
URL | https://arxiv.org/abs/1908.00102v1 |
https://arxiv.org/pdf/1908.00102v1.pdf | |
PWC | https://paperswithcode.com/paper/oct-fingerprints-resilience-to-presentation |
Repo | |
Framework | |
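A sketch of the overall pipeline shape, assuming a depth-profile scan stored as a 2-D array: extract fixed-size local patches, score each with a small binary CNN, and average into a scan-level score. The architecture, patch size, and stride here are illustrative guesses, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

def extract_patches(scan, size=64, stride=32):
    """Slice a 2-D depth-profile scan into overlapping local patches."""
    h, w = scan.shape
    patches = [scan[i:i + size, j:j + size]
               for i in range(0, h - size + 1, stride)
               for j in range(0, w - size + 1, stride)]
    return torch.stack(patches).unsqueeze(1)   # (num_patches, 1, size, size)

# A small binary patch classifier (bonafide vs presentation attack).
pad_cnn = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 1),                # 64 -> 32 -> 16 after two pools
)

scan = torch.randn(500, 400)                   # stand-in for a real OCT B-scan
scores = torch.sigmoid(pad_cnn(extract_patches(scan)))
print(scores.mean().item())                    # scan-level PA score = mean patch score
```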
Multi-lingual Dialogue Act Recognition with Deep Learning Methods
Title | Multi-lingual Dialogue Act Recognition with Deep Learning Methods |
Authors | Jiří Martínek, Pavel Král, Ladislav Lenc, Christophe Cerisara |
Abstract | This paper deals with multi-lingual dialogue act (DA) recognition. The proposed approaches are based on deep neural networks and use word2vec embeddings for word representation. Two multi-lingual models are proposed for this task. The first approach uses one general model trained on the embeddings from all available languages. The second method trains the model on a single pivot language, and a linear transformation method is used to project other languages onto the pivot language. The popular convolutional neural network and LSTM architectures with different set-ups are used as classifiers. To the best of our knowledge, this is the first attempt at multi-lingual DA recognition using neural networks. The multi-lingual models are validated experimentally on two languages from the Verbmobil corpus. |
Tasks | |
Published | 2019-04-11 |
URL | http://arxiv.org/abs/1904.05606v1 |
http://arxiv.org/pdf/1904.05606v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-lingual-dialogue-act-recognition-with |
Repo | |
Framework | |
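The second method's linear transformation can be sketched directly: given paired embeddings X in a source language and Y in the pivot language, fit the projection W minimising ||XW - Y||^2 by least squares. This is the standard translation-matrix construction; the paper's exact estimation details may differ, and the data below is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 100                                    # word2vec dimensionality
n_pairs = 2000                             # bilingual dictionary size

# Synthetic stand-in: pivot embeddings are a hidden linear map of source ones.
X = rng.normal(size=(n_pairs, d))          # source-language embeddings
W_true = rng.normal(size=(d, d)) / np.sqrt(d)
Y = X @ W_true + 0.01 * rng.normal(size=(n_pairs, d))

# Least-squares fit of the projection: argmin_W ||X W - Y||^2.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# A source-language sentence matrix can now be projected onto the pivot
# space and fed to the pivot-language DA classifier.
sentence = rng.normal(size=(12, d))        # 12 tokens
projected = sentence @ W
print(np.linalg.norm(W - W_true) / np.linalg.norm(W_true))  # small residual
```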
Getting Gender Right in Neural Machine Translation
Title | Getting Gender Right in Neural Machine Translation |
Authors | Eva Vanmassenhove, Christian Hardmeier, Andy Way |
Abstract | Speakers of different languages must attend to and encode strikingly different aspects of the world in order to use their language correctly (Sapir, 1921; Slobin, 1996). One such difference is related to the way gender is expressed in a language. Saying “I am happy” in English does not encode any additional knowledge about the speaker who uttered the sentence. However, many other languages do have grammatical gender systems, and so such knowledge would be encoded. In order to correctly translate such a sentence into, say, French, the inherent gender information needs to be retained/recovered. The same sentence would become either “Je suis heureux” for a male speaker or “Je suis heureuse” for a female one. Apart from morphological agreement, demographic factors (gender, age, etc.) also influence our use of language in terms of word choices or even on the level of syntactic constructions (Tannen, 1991; Pennebaker et al., 2003). We integrate gender information into NMT systems. Our contribution is two-fold: (1) the compilation of large datasets with speaker information for 20 language pairs, and (2) a simple set of experiments that incorporate gender information into NMT for multiple language pairs. Our experiments show that adding a gender feature to an NMT system significantly improves the translation quality for some language pairs. |
Tasks | Machine Translation |
Published | 2019-09-11 |
URL | https://arxiv.org/abs/1909.05088v1 |
https://arxiv.org/pdf/1909.05088v1.pdf | |
PWC | https://paperswithcode.com/paper/getting-gender-right-in-neural-machine-1 |
Repo | |
Framework | |
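The simplest way to realise "(2) incorporating gender information into NMT" is to expose the speaker's gender to the encoder as an extra source-side token; a sketch of that tagging step (the exact feature encoding used in the paper may differ from this token scheme):

```python
def tag_source(sentence: str, speaker_gender: str) -> str:
    """Prepend a speaker-gender token so the encoder can condition on it."""
    assert speaker_gender in ("M", "F")
    return f"<{speaker_gender}> {sentence}"

print(tag_source("I am happy", "F"))   # -> "<F> I am happy"
# The NMT system can then learn to prefer "Je suis heureuse" for <F> inputs.
```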
Multi-Granularity Self-Attention for Neural Machine Translation
Title | Multi-Granularity Self-Attention for Neural Machine Translation |
Authors | Jie Hao, Xing Wang, Shuming Shi, Jinfeng Zhang, Zhaopeng Tu |
Abstract | Current state-of-the-art neural machine translation (NMT) uses a deep multi-head self-attention network with no explicit phrase information. However, prior work on statistical machine translation has shown that extending the basic translation unit from words to phrases has produced substantial improvements, suggesting the possibility of improving NMT performance by explicitly modeling phrases. In this work, we present multi-granularity self-attention (Mg-Sa): a neural network that combines multi-head self-attention and phrase modeling. Specifically, we train several attention heads to attend to phrases in either an n-gram or a syntactic formalism. Moreover, we exploit interactions among phrases to enhance the strength of structure modeling, a commonly cited weakness of self-attention. Experimental results on WMT14 English-to-German and NIST Chinese-to-English translation tasks show that the proposed approach consistently improves performance. Targeted linguistic analysis reveals that Mg-Sa indeed captures useful phrase information at various levels of granularity. |
Tasks | Machine Translation |
Published | 2019-09-05 |
URL | https://arxiv.org/abs/1909.02222v1 |
https://arxiv.org/pdf/1909.02222v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-granularity-self-attention-for-neural |
Repo | |
Framework | |
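A sketch of one head of phrase-level attention in the spirit of Mg-Sa: keys and values are mean-pooled n-gram (phrase) representations rather than single tokens, while queries remain token-level. This is a simplified illustration; the paper also covers syntactic phrases and phrase-interaction modelling:

```python
import numpy as np

def ngram_pool(X, n):
    """Mean-pool every contiguous n-gram of the token sequence X (shape (T, d))."""
    T = X.shape[0]
    return np.stack([X[i:i + n].mean(axis=0) for i in range(T - n + 1)])

def attention(Q, K, V):
    """Standard scaled dot-product attention."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
T, d, n = 10, 32, 3
X = rng.normal(size=(T, d))               # token representations
phrases = ngram_pool(X, n)                # (T - n + 1, d) phrase representations
out = attention(X, phrases, phrases)      # token queries attend over phrases
print(out.shape)                          # (10, 32)
```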
Interest-Related Item Similarity Model Based on Multimodal Data for Top-N Recommendation
Title | Interest-Related Item Similarity Model Based on Multimodal Data for Top-N Recommendation |
Authors | Junmei Lv, Bin Song, Jie Guo, Xiaojiang Du, Mohsen Guizani |
Abstract | Nowadays, recommendation systems are applied in the fields of e-commerce, video websites, social networking sites, etc., bringing great convenience to people’s daily lives. The types of information in recommendation systems are diversified and abundant, and the proportion of unstructured multimodal data such as text, images and video is therefore increasing. However, due to the representation gap between different modalities, it is intractable to effectively use unstructured multimodal data to improve the efficiency of recommendation systems. In this paper, we propose an end-to-end Multimodal Interest-Related Item Similarity model (Multimodal IRIS) to provide recommendations based on multimodal data sources. Specifically, the Multimodal IRIS model consists of three modules, i.e., the multimodal feature learning module, the Interest-Related Network (IRN) module and the item similarity recommendation module. The multimodal feature learning module adds a knowledge-sharing unit among the different modalities. The IRN then learns the interest relevance between the target item and each historical item. Finally, the multimodal feature learning, IRN and item similarity recommendation modules are unified into an integrated system to achieve performance enhancements and to accommodate the addition or absence of different modal data. Extensive experiments on real-world datasets show that, by dealing with the multimodal data which people may pay more attention to when selecting items, the proposed Multimodal IRIS significantly improves accuracy and interpretability on the top-N recommendation task over state-of-the-art methods. |
Tasks | Recommendation Systems |
Published | 2019-02-13 |
URL | http://arxiv.org/abs/1902.05566v1 |
http://arxiv.org/pdf/1902.05566v1.pdf | |
PWC | https://paperswithcode.com/paper/interest-related-item-similarity-model-based |
Repo | |
Framework | |
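A much-reduced sketch of the scoring idea: fuse per-modality item features into one vector, weight each historical item by its relevance to the candidate (a softmax stand-in for the learned IRN), and rank candidates by the weighted similarity sum. Everything here, features included, is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, d_txt, d_img = 50, 16, 16
text_feat = rng.normal(size=(n_items, d_txt))   # e.g. description embeddings
img_feat = rng.normal(size=(n_items, d_img))    # e.g. CNN image embeddings
items = np.concatenate([text_feat, img_feat], axis=1)
items /= np.linalg.norm(items, axis=1, keepdims=True)

def score(candidate, history):
    """Interest-weighted item similarity (softmax weights stand in for the IRN)."""
    sims = items[history] @ items[candidate]     # cosine, unit-normalised
    w = np.exp(sims) / np.exp(sims).sum()        # interest-relevance weights
    return float(w @ sims)

history = [1, 5, 9]                              # user's interacted items
candidates = [c for c in range(n_items) if c not in history]
top_n = sorted(candidates, key=lambda c: -score(c, history))[:10]
print(top_n)
```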
Online Multiclass Classification Based on Prediction Margin for Partial Feedback
Title | Online Multiclass Classification Based on Prediction Margin for Partial Feedback |
Authors | Takuo Kaneko, Issei Sato, Masashi Sugiyama |
Abstract | We consider the problem of online multiclass classification with partial feedback, where an algorithm predicts a class for a new instance in each round and only receives its correctness. Although several methods have been developed for this problem, recent challenging real-world applications require further performance improvement. In this paper, we propose a novel online learning algorithm inspired by recent work on learning from complementary labels, where a complementary label indicates a class to which an instance does not belong. This allows us to handle partial feedback deterministically in a margin-based way, where the prediction margin has been recognized as a key to superior empirical performance. We provide a theoretical guarantee based on a cumulative loss bound and experimentally demonstrate that our method outperforms existing methods which are non-margin-based and stochastic. |
Tasks | |
Published | 2019-02-04 |
URL | http://arxiv.org/abs/1902.01056v1 |
http://arxiv.org/pdf/1902.01056v1.pdf | |
PWC | https://paperswithcode.com/paper/online-multiclass-classification-based-on |
Repo | |
Framework | |
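To make the complementary-label idea concrete: when the only feedback is "your prediction was wrong", the predicted class acts as a complementary label, and a linear learner can push its score down while spreading an upward update over the other classes. A hedged perceptron-style sketch; the paper's actual algorithm is margin-based with a cumulative loss guarantee, and this only illustrates the feedback handling:

```python
import numpy as np

K, d = 5, 20                       # classes, feature dimension
W = np.zeros((K, d))
eta = 0.1

def predict(x):
    return int(np.argmax(W @ x))

def update(x, y_hat, correct):
    """Partial feedback: we only learn whether the prediction was right."""
    if correct:
        # The true label is revealed implicitly (it equals y_hat): reinforce it.
        W[y_hat] += eta * x
    else:
        # y_hat is a complementary label: the instance is NOT in class y_hat.
        W[y_hat] -= eta * x
        W[np.arange(K) != y_hat] += eta * x / (K - 1)

rng = np.random.default_rng(0)
centers = rng.normal(size=(K, d))
errors = 0
for t in range(5000):
    y = rng.integers(K)
    x = centers[y] + 0.5 * rng.normal(size=d)   # noisy Gaussian clusters
    y_hat = predict(x)
    errors += y_hat != y
    update(x, y_hat, y_hat == y)
print("online error rate:", errors / 5000)
```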
A Regression Approach to Certain Information Transmission Problems
Title | A Regression Approach to Certain Information Transmission Problems |
Authors | Wenyi Zhang, Yizhu Wang, Cong Shen, Ning Liang |
Abstract | A general information transmission model, under an independent and identically distributed Gaussian codebook and a nearest neighbor decoding rule with processed channel output, is investigated using the performance metric of generalized mutual information. When the encoder and the decoder know the statistical channel model, it is found that the optimal channel output processing function is the conditional expectation operator, thus hinting at a potential role for regression, a classical topic in machine learning, in this model. Without utilizing the statistical channel model, a problem formulation inspired by machine learning principles is established, with suitable performance metrics introduced. A data-driven inference algorithm is proposed to solve the problem, and the effectiveness of the algorithm is validated via numerical experiments. Extensions to more general information transmission models are also discussed. |
Tasks | |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.03777v2 |
https://arxiv.org/pdf/1906.03777v2.pdf | |
PWC | https://paperswithcode.com/paper/a-regression-approach-to-certain-information |
Repo | |
Framework | |
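A data-driven sketch of the idea: learn a processing function g(y) approximating E[X|Y=y] by regression on simulated channel pairs, then run nearest-neighbour decoding on the processed outputs and compare against decoding the raw outputs. The clipping channel, polynomial regressor, and all parameters are illustrative assumptions, not the paper's experiments:

```python
import numpy as np

rng = np.random.default_rng(1)
n, M = 64, 256                        # blocklength, codebook size
power = 4.0                           # per-symbol signal power

def channel(x):
    """Toy nonlinear channel: additive noise followed by clipping."""
    return np.clip(x + rng.normal(scale=1.0, size=x.shape), -1.5, 1.5)

# Fit g(y) ~ E[X|Y=y] by polynomial regression on simulated (x, y) pairs.
xs = rng.normal(scale=np.sqrt(power), size=100000)
coeff = np.polyfit(channel(xs), xs, deg=5)   # data-driven conditional-mean stand-in
g = lambda y: np.polyval(coeff, y)

def decode(y, codebook, proc):
    """Nearest-neighbour decoding on the processed channel output."""
    z = proc(y)
    return int(np.argmin(np.sum((codebook - z) ** 2, axis=1)))

trials, errs_raw, errs_reg = 500, 0, 0
for _ in range(trials):
    C = rng.normal(scale=np.sqrt(power), size=(M, n))   # i.i.d. Gaussian codebook
    m = rng.integers(M)
    y = channel(C[m])
    errs_raw += decode(y, C, lambda t: t) != m
    errs_reg += decode(y, C, g) != m
print("raw NN error:", errs_raw / trials, " regression-processed:", errs_reg / trials)
```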