April 2, 2020

3207 words · 16 min read

Paper Group ANR 346

Missing-Class-Robust Domain Adaptation by Unilateral Alignment for Fault Diagnosis. Tropical Support Vector Machine and its Applications to Phylogenomics. Leverage the Average: an Analysis of Regularization in RL. Improve SGD Training via Aligning Mini-batches. Understanding Cross-Lingual Syntactic Transfer in Multilingual Recurrent Neural Networks …

Missing-Class-Robust Domain Adaptation by Unilateral Alignment for Fault Diagnosis

Title Missing-Class-Robust Domain Adaptation by Unilateral Alignment for Fault Diagnosis
Authors Qin Wang, Gabriel Michau, Olga Fink
Abstract Domain adaptation aims at improving model performance by leveraging the learned knowledge in the source domain and transferring it to the target domain. Recently, domain adversarial methods have been particularly successful in alleviating the distribution shift between the source and the target domains. However, these methods assume an identical label space between the two domains. This assumption imposes a significant limitation for real applications since the target training set may not contain the complete set of classes. We demonstrate in this paper that the performance of domain adversarial methods can be vulnerable to an incomplete target label space during training. To overcome this issue, we propose a two-stage unilateral alignment approach. The proposed methodology makes use of the inter-class relationships of the source domain and aligns unilaterally the target to the source domain. The benefits of the proposed methodology are first evaluated on the MNIST→MNIST-M adaptation task. The proposed methodology is also evaluated on a fault diagnosis task, where the problem of missing fault types in the target training dataset is common in practice. Both experiments demonstrate the effectiveness of the proposed methodology.
Tasks Domain Adaptation
Published 2020-01-07
URL https://arxiv.org/abs/2001.02015v1
PDF https://arxiv.org/pdf/2001.02015v1.pdf
PWC https://paperswithcode.com/paper/missing-class-robust-domain-adaptation-by

Tropical Support Vector Machine and its Applications to Phylogenomics

Title Tropical Support Vector Machine and its Applications to Phylogenomics
Authors Xiaoxian Tang, Houjie Wang, Ruriko Yoshida
Abstract Most data in genome-wide phylogenetic analysis (phylogenomics) is essentially multidimensional, posing a major challenge to human comprehension and computational analysis. Also, we can not directly apply statistical learning models in data science to a set of phylogenetic trees since the space of phylogenetic trees is not Euclidean. In fact, the space of phylogenetic trees is a tropical Grassmannian in terms of max-plus algebra. Therefore, to classify multi-locus data sets for phylogenetic analysis, we propose tropical support vector machines (SVMs). Like classical SVMs, a tropical SVM is a discriminative classifier defined by the tropical hyperplane which maximizes the minimum tropical distance from data points to itself in order to separate these data points into sectors (half-spaces) in the tropical projective torus. Both hard margin tropical SVMs and soft margin tropical SVMs can be formulated as linear programming problems. We focus on classifying two categories of data, and we study a simpler case by assuming the data points from the same category ideally stay in the same sector of a tropical separating hyperplane. For hard margin tropical SVMs, we prove the necessary and sufficient conditions for two categories of data points to be separated, and we show an explicit formula for the optimal value of the feasible linear programming problem. For soft margin tropical SVMs, we develop novel methods to compute an optimal tropical separating hyperplane. Computational experiments show our methods work well. We end this paper with open problems.
Published 2020-03-02
URL https://arxiv.org/abs/2003.00677v2
PDF https://arxiv.org/pdf/2003.00677v2.pdf
PWC https://paperswithcode.com/paper/tropical-support-vector-machine-and-its
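The tropical geometry in this abstract is concrete enough to sketch: on the tropical projective torus, the distance between two points is the range (max minus min) of their coordinatewise difference, and in the max-plus setting the distance from a point x to the hyperplane with normal vector ω is the gap between the largest and second-largest entries of ω + x. A minimal NumPy illustration of these two quantities (our sketch, not the authors' code):

```python
import numpy as np

def tropical_distance(x, y):
    """Tropical metric on the tropical projective torus:
    d_tr(x, y) = max_i (x_i - y_i) - min_i (x_i - y_i)."""
    diff = np.asarray(x, float) - np.asarray(y, float)
    return float(diff.max() - diff.min())

def distance_to_tropical_hyperplane(x, omega):
    """Distance from x to the max-plus hyperplane with normal omega:
    the gap between the largest and second-largest entries of omega + x
    (zero exactly when the maximum is attained at least twice, i.e. when
    x lies on the hyperplane)."""
    vals = np.sort(np.asarray(omega, float) + np.asarray(x, float))
    return float(vals[-1] - vals[-2])
```

A hard margin tropical SVM then seeks the ω maximizing the minimum of `distance_to_tropical_hyperplane` over the training points, which the paper shows is a linear program.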

Leverage the Average: an Analysis of Regularization in RL

Title Leverage the Average: an Analysis of Regularization in RL
Authors Nino Vieillard, Tadashi Kozuno, Bruno Scherrer, Olivier Pietquin, Rémi Munos, Matthieu Geist
Abstract Building upon the formalism of regularized Markov decision processes, we study the effect of Kullback-Leibler (KL) and entropy regularization in reinforcement learning. Through an equivalent formulation of the related approximate dynamic programming (ADP) scheme, we show that a KL penalty amounts to averaging q-values. This equivalence allows drawing connections between a priori disconnected methods from the literature, and proving that a KL regularization indeed leads to averaging errors made at each iteration of value function update. With the proposed theoretical analysis, we also study the interplay between KL and entropy regularization. When the considered ADP scheme is combined with neural-network-based stochastic approximations, the equivalence is lost, which suggests a number of different ways to do regularization. Because this goes beyond what we can analyse theoretically, we extensively study this aspect empirically.
Published 2020-03-31
URL https://arxiv.org/abs/2003.14089v1
PDF https://arxiv.org/pdf/2003.14089v1.pdf
PWC https://paperswithcode.com/paper/leverage-the-average-an-analysis-of
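The core equivalence the abstract states, that a KL penalty toward the previous policy amounts to averaging q-values, can be checked numerically in the exact tabular case: each KL-regularized greedy step multiplies the previous policy by exp(q/β), so unrolling the recursion gives a softmax of the summed (equivalently, scaled-average) q-values. A tiny illustration of that unrolling (a sketch of the idea, not the paper's code):

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
beta = 0.5                   # KL regularization strength
qs = [rng.normal(size=4) for _ in range(6)]   # q-values at each iteration

# KL-regularized greedy step: pi_{k+1}(a) ∝ pi_k(a) * exp(q_k(a) / beta)
pi = np.ones(4) / 4          # uniform initial policy
for q in qs:
    pi = pi * np.exp(q / beta)
    pi = pi / pi.sum()

# Unrolled form: the same policy is a softmax of the summed q-values
pi_avg = softmax(sum(qs) / beta)
```

Here `pi` and `pi_avg` coincide, which is the averaging property the paper exploits to connect otherwise disconnected methods.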

Improve SGD Training via Aligning Mini-batches

Title Improve SGD Training via Aligning Mini-batches
Authors Xiangrui Li, Deng Pan, Xin Li, Dongxiao Zhu
Abstract Deep neural networks (DNNs) for supervised learning can be viewed as a pipeline of a feature extractor (i.e. last hidden layer) and a linear classifier (i.e. output layer) that is trained jointly with stochastic gradient descent (SGD). In each iteration of SGD, a mini-batch from the training data is sampled and the true gradient of the loss function is estimated as the noisy gradient calculated on this mini-batch. From the feature learning perspective, the feature extractor should be updated to learn meaningful features with respect to the entire data, and reduce the accommodation to noise in the mini-batch. With this motivation, we propose In-Training Distribution Matching (ITDM) to improve DNN training and reduce overfitting. Specifically, along with the loss function, ITDM regularizes the feature extractor by matching the moments of distributions of different mini-batches in each iteration of SGD, which is fulfilled by minimizing the maximum mean discrepancy. As such, ITDM does not assume any explicit parametric form of data distribution in the latent feature space. Extensive experiments are conducted to demonstrate the effectiveness of our proposed strategy.
Published 2020-02-23
URL https://arxiv.org/abs/2002.09917v2
PDF https://arxiv.org/pdf/2002.09917v2.pdf
PWC https://paperswithcode.com/paper/improve-sgd-training-via-aligning-min-batches
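The regularizer at the heart of ITDM, the maximum mean discrepancy between mini-batch feature distributions, is compact enough to sketch. A biased squared-MMD estimator with a Gaussian RBF kernel might look like the following (illustrative only; the authors' implementation and kernel choice may differ):

```python
import numpy as np

def gaussian_mmd2(x, y, sigma=1.0):
    """Biased estimator of the squared maximum mean discrepancy between
    two mini-batches x, y (arrays of shape [batch, features]) under a
    Gaussian RBF kernel with bandwidth sigma."""
    def k(a, b):
        # Pairwise squared distances via broadcasting, then RBF kernel.
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return float(k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean())
```

During training, ITDM would add a term of this form (computed on the feature extractor's outputs for two mini-batches) to the task loss, pulling the mini-batch feature distributions toward each other without assuming any parametric form.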

Understanding Cross-Lingual Syntactic Transfer in Multilingual Recurrent Neural Networks

Title Understanding Cross-Lingual Syntactic Transfer in Multilingual Recurrent Neural Networks
Authors Prajit Dhar, Arianna Bisazza
Abstract It is now established that modern neural language models can be successfully trained on multiple languages simultaneously without changes to the underlying architecture, providing an easy way to adapt a variety of NLP models to low-resource languages. But what kind of knowledge is really shared among languages within these models? Does multilingual training mostly lead to an alignment of the lexical representation spaces or does it also enable the sharing of purely grammatical knowledge? In this paper we dissect different forms of cross-lingual transfer and look for its most determining factors, using a variety of models and probing tasks. We find that exposing our language models to a related language does not always increase grammatical knowledge in the target language, and that optimal conditions for lexical-semantic transfer may not be optimal for syntactic transfer.
Tasks Cross-Lingual Transfer
Published 2020-03-31
URL https://arxiv.org/abs/2003.14056v1
PDF https://arxiv.org/pdf/2003.14056v1.pdf
PWC https://paperswithcode.com/paper/understanding-cross-lingual-syntactic

Turing analogues of Gödel statements and computability of intelligence

Title Turing analogues of Gödel statements and computability of intelligence
Authors Yasha Savelyev
Abstract We show that there is a mathematical obstruction to complete Turing computability of intelligence. This obstruction can be circumvented only if human reasoning is fundamentally unsound. The most compelling original argument for existence of such an obstruction was proposed by Penrose, however Gödel, Turing and Lucas have also proposed such arguments. We first partially reformulate the argument of Penrose. In this formulation we argue that his argument works up to possibility of construction of a certain Gödel statement. We then completely re-frame the argument in the language of Turing machines, and by partially defining our subject just enough, we show that a certain analogue of a Gödel statement, or a Gödel string as we call it in the language of Turing machines, can be readily constructed directly, without appeal to the Gödel incompleteness theorem, and thus removing the final objection.
Published 2020-01-21
URL https://arxiv.org/abs/2001.07592v2
PDF https://arxiv.org/pdf/2001.07592v2.pdf
PWC https://paperswithcode.com/paper/turing-analogues-of-godel-statements-and

Artificial chemistry experiments with chemlambda, lambda calculus, interaction combinators

Title Artificial chemistry experiments with chemlambda, lambda calculus, interaction combinators
Authors Marius Buliga
Abstract Given a graph rewrite system, a graph G is a quine graph if it has a non-void maximal collection of non-conflicting matches of left patterns of graphs rewrites, such that after the parallel application of the rewrites we obtain a graph isomorphic with G. Such graphs exhibit a metabolism, they can multiply or they can die, when reduced by a random rewriting algorithm. These are introductory notes to the pages of artificial chemistry experiments with chemlambda, lambda calculus or interaction combinators, available from the entry page https://chemlambda.github.io/index.html . The experiments are bundled into pages, all of them based on a library of programs, on a database which contains hundreds of graphs and on a database of about 150 pages of text comments and a collection of more than 200 animations, most of them which can be re-done live, via the programs. There are links to public repositories of other contributors to these experiments, with versions of these programs in python, haskell, awk or javascript.
Published 2020-03-31
URL https://arxiv.org/abs/2003.14332v1
PDF https://arxiv.org/pdf/2003.14332v1.pdf
PWC https://paperswithcode.com/paper/artificial-chemistry-experiments-with

Multi-Issue Bargaining With Deep Reinforcement Learning

Title Multi-Issue Bargaining With Deep Reinforcement Learning
Authors Ho-Chun Herbert Chang
Abstract Negotiation is a process where agents aim to work through disputes and maximize their surplus. As the use of deep reinforcement learning in bargaining games is unexplored, this paper evaluates its ability to exploit, adapt, and cooperate to produce fair outcomes. Two actor-critic networks were trained for the bidding and acceptance strategy, against time-based agents, behavior-based agents, and through self-play. Gameplay against these agents reveals three key findings. 1) Neural agents learn to exploit time-based agents, achieving clear transitions in decision preference values. The Cauchy distribution emerges as suitable for sampling offers, due to its peaky center and heavy tails. The kurtosis and variance sensitivity of the probability distributions used for continuous control produce trade-offs in exploration and exploitation. 2) Neural agents demonstrate adaptive behavior against different combinations of concession, discount factors, and behavior-based strategies. 3) Most importantly, neural agents learn to cooperate with other behavior-based agents, in certain cases utilizing non-credible threats to force fairer results. This bears similarities with reputation-based strategies in the evolutionary dynamics, and departs from equilibria in classical game theory.
Tasks Continuous Control
Published 2020-02-18
URL https://arxiv.org/abs/2002.07788v1
PDF https://arxiv.org/pdf/2002.07788v1.pdf
PWC https://paperswithcode.com/paper/multi-issue-bargaining-with-deep
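The paper's first finding, that the Cauchy distribution suits offer sampling because of its peaky centre and heavy tails, is easy to illustrate: most draws land near the policy's preferred value, while the tails occasionally propose exploratory offers far from it. A hedged sketch of sampling a clipped-Cauchy bid (the function name and parameters are ours, not the paper's):

```python
import numpy as np

def sample_offer(center, scale=0.05, rng=None):
    """Draw a bid utility from a Cauchy distribution centred on the
    policy's preferred value, clipped to the valid utility range [0, 1].
    Small `scale` keeps the bulk of offers near `center`; the heavy
    tails still yield occasional far-out exploratory offers."""
    rng = np.random.default_rng() if rng is None else rng
    offer = center + scale * rng.standard_cauchy()
    return float(np.clip(offer, 0.0, 1.0))
```

Swapping the Cauchy for a Gaussian of the same scale would concentrate essentially all offers near the centre, which is the exploration/exploitation trade-off in kurtosis the abstract refers to.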

Basic concepts, definitions, and methods in D number theory

Title Basic concepts, definitions, and methods in D number theory
Authors Xinyang Deng
Abstract As a generalization of Dempster-Shafer theory, D number theory (DNT) aims to provide a framework to deal with uncertain information with non-exclusiveness and incompleteness. Although previous studies have made some advances on DNT, they lack systematicness, and many important issues have not yet been solved. In this paper, several crucial aspects in constructing a complete and systematic framework of DNT are considered. First, the non-exclusiveness in DNT is formally defined and discussed. Second, a method to combine multiple D numbers is proposed by extending the previous exclusive conflict redistribution (ECR) rule. Third, a new pair of belief and plausibility measures for D numbers is defined, and many desirable properties are satisfied by the proposed measures. Fourth, the combination of information-incomplete D numbers is studied specially to show how to deal with the incompleteness of information in DNT. In this paper we mainly give the relevant mathematical definitions, properties, and theorems; concrete examples and applications will be considered in future study.
Published 2020-03-21
URL https://arxiv.org/abs/2003.09661v1
PDF https://arxiv.org/pdf/2003.09661v1.pdf
PWC https://paperswithcode.com/paper/basic-concepts-definitions-and-methods-in-d
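For orientation, the belief and plausibility measures that the paper's new pair generalizes come from classical Dempster-Shafer theory: given a mass function m over focal sets, Bel(A) sums the mass of sets contained in A and Pl(A) sums the mass of sets intersecting A. A minimal sketch of the classical definitions (not the paper's D-number versions, which additionally handle non-exclusiveness):

```python
def belief(mass, A):
    """Bel(A): total mass of focal sets contained in A."""
    A = set(A)
    return sum(m for B, m in mass.items() if set(B) <= A)

def plausibility(mass, A):
    """Pl(A): total mass of focal sets that intersect A."""
    A = set(A)
    return sum(m for B, m in mass.items() if set(B) & A)

# Example mass function over the frame {a, b}
mass = {frozenset('a'): 0.4, frozenset('ab'): 0.6}
```

For any A, Bel(A) ≤ Pl(A), and the interval [Bel(A), Pl(A)] brackets the unknown probability of A; DNT's measures are designed to retain analogous properties when focal elements need not be exclusive.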

TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting

Title TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting
Authors Zhuoqian Yang, Wentao Zhu, Wayne Wu, Chen Qian, Qiang Zhou, Bolei Zhou, Chen Change Loy
Abstract We present a lightweight video motion retargeting approach TransMoMo that is capable of transferring motion of a person in a source video realistically to another video of a target person. Without using any paired data for supervision, the proposed method can be trained in an unsupervised manner by exploiting invariance properties of three orthogonal factors of variation including motion, structure, and view-angle. Specifically, with loss functions carefully derived based on invariance, we train an auto-encoder to disentangle the latent representations of such factors given the source and target video clips. This allows us to selectively transfer motion extracted from the source video seamlessly to the target video in spite of structural and view-angle disparities between the source and the target. The relaxed assumption of paired data allows our method to be trained on a vast amount of videos needless of manual annotation of source-target pairing, leading to improved robustness against large structural variations and extreme motion in videos. We demonstrate the effectiveness of our method over the state-of-the-art methods. Code, model and data are publicly available on our project page (https://yzhq97.github.io/transmomo).
Published 2020-03-31
URL https://arxiv.org/abs/2003.14401v2
PDF https://arxiv.org/pdf/2003.14401v2.pdf
PWC https://paperswithcode.com/paper/transmomo-invariance-driven-unsupervised

State-of-Art-Reviewing: A Radical Proposal to Improve Scientific Publication

Title State-of-Art-Reviewing: A Radical Proposal to Improve Scientific Publication
Authors Samuel Albanie, Jaime Thewmore, Robert McCraith, Joao F. Henriques
Abstract Peer review forms the backbone of modern scientific manuscript evaluation. But after two hundred and eighty-nine years of egalitarian service to the scientific community, does this protocol remain fit for purpose in 2020? In this work, we answer this question in the negative (strong reject, high confidence) and propose instead State-Of-the-Art Review (SOAR), a neoteric reviewing pipeline that serves as a ‘plug-and-play’ replacement for peer review. At the heart of our approach is an interpretation of the review process as a multi-objective, massively distributed and extremely-high-latency optimisation, which we scalarise and solve efficiently for PAC and CMT-optimal solutions. We make the following contributions: (1) We propose a highly scalable, fully automatic methodology for review, drawing inspiration from best-practices from premier computer vision and machine learning conferences; (2) We explore several instantiations of our approach and demonstrate that SOAR can be used to both review prints and pre-review pre-prints; (3) We wander listlessly in vain search of catharsis from our latest rounds of savage CVPR rejections.
Published 2020-03-31
URL https://arxiv.org/abs/2003.14415v1
PDF https://arxiv.org/pdf/2003.14415v1.pdf
PWC https://paperswithcode.com/paper/state-of-art-reviewing-a-radical-proposal-to

Brainstorming Generative Adversarial Networks (BGANs): Towards Multi-Agent Generative Models with Distributed Private Datasets

Title Brainstorming Generative Adversarial Networks (BGANs): Towards Multi-Agent Generative Models with Distributed Private Datasets
Authors Aidin Ferdowsi, Walid Saad
Abstract To achieve a high learning accuracy, generative adversarial networks (GANs) must be fed by large datasets that adequately represent the data space. However, in many scenarios, the available datasets may be limited and distributed across multiple agents, each of which is seeking to learn the distribution of the data on its own. In such scenarios, the local datasets are inherently private and agents often do not wish to share them. In this paper, to address this multi-agent GAN problem, a novel brainstorming GAN (BGAN) architecture is proposed using which multiple agents can generate real-like data samples while operating in a fully distributed manner and preserving their data privacy. BGAN allows the agents to gain information from other agents without sharing their real datasets but by “brainstorming” via the sharing of their generated data samples. Therefore, the proposed BGAN yields a higher accuracy compared with a standalone GAN model and its architecture is fully distributed and does not need any centralized controller. Moreover, BGANs are shown to be scalable and not dependent on the hyperparameters of the agents’ deep neural networks (DNNs) thus enabling the agents to have different DNN architectures. Theoretically, the interactions between BGAN agents are analyzed as a game whose unique Nash equilibrium is derived. Experimental results show that BGAN can generate real-like data samples with higher quality compared to other distributed GAN architectures.
Published 2020-02-02
URL https://arxiv.org/abs/2002.00306v1
PDF https://arxiv.org/pdf/2002.00306v1.pdf
PWC https://paperswithcode.com/paper/brainstorming-generative-adversarial-networks

Parallel Intent and Slot Prediction using MLB Fusion

Title Parallel Intent and Slot Prediction using MLB Fusion
Authors Anmol Bhasin, Bharatram Natarajan, Gaurav Mathur, Himanshu Mangla
Abstract Intent and Slot Identification are two important tasks in Spoken Language Understanding (SLU). For a natural language utterance, there is a high correlation between these two tasks. A lot of work has been done on each of these using Recurrent Neural Networks (RNN), Convolutional Neural Networks (CNN) and attention-based models. Most of the past work used two separate models for intent and slot prediction. Some of them also used sequence-to-sequence type models where slots are predicted after evaluating the utterance-level intent. In this work, we propose a parallel Intent and Slot Prediction technique where separate Bidirectional Gated Recurrent Units (GRU) are used for each task. We posit the usage of MLB (Multimodal Low-rank Bilinear Attention Network) fusion for improvement in performance of intent and slot learning. To the best of our knowledge, this is the first attempt at using such a technique on text-based problems. Also, our proposed methods outperform the existing state-of-the-art results for both intent and slot prediction on two benchmark datasets.
Tasks Spoken Language Understanding
Published 2020-03-20
URL https://arxiv.org/abs/2003.09211v1
PDF https://arxiv.org/pdf/2003.09211v1.pdf
PWC https://paperswithcode.com/paper/parallel-intent-and-slot-prediction-using-mlb
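The MLB fusion the authors borrow from multimodal learning has a compact form: project each view's feature vector into a shared low-rank space, combine them with an element-wise (Hadamard) product, and project to the output space. A dimension-level sketch (sizes and nonlinearity chosen for illustration; in the paper the two inputs would come from the intent and slot GRUs):

```python
import numpy as np

rng = np.random.default_rng(0)
d_intent, d_slot, rank, d_out = 16, 16, 8, 4

# Learned projections in a real model; random here for illustration.
U = rng.normal(size=(d_intent, rank))
V = rng.normal(size=(d_slot, rank))
P = rng.normal(size=(rank, d_out))

def mlb_fuse(x, y):
    """Low-rank bilinear pooling: z = P^T (tanh(U^T x) ∘ tanh(V^T y)).
    The Hadamard product in the rank-r space approximates a full
    bilinear interaction at a fraction of the parameter count."""
    return (np.tanh(x @ U) * np.tanh(y @ V)) @ P

z = mlb_fuse(rng.normal(size=d_intent), rng.normal(size=d_slot))
```

A full bilinear interaction would need a d_intent × d_slot × d_out tensor; the low-rank factorization reduces that to three small matrices, which is what makes bilinear fusion practical inside an SLU model.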

Transfer Learning for Context-Aware Spoken Language Understanding

Title Transfer Learning for Context-Aware Spoken Language Understanding
Authors Qian Chen, Zhu Zhuo, Wen Wang, Qiuyun Xu
Abstract Spoken language understanding (SLU) is a key component of task-oriented dialogue systems. SLU parses natural language user utterances into semantic frames. Previous work has shown that incorporating context information significantly improves SLU performance for multi-turn dialogues. However, collecting a large-scale human-labeled multi-turn dialogue corpus for the target domains is complex and costly. To reduce dependency on the collection and annotation effort, we propose a Context Encoding Language Transformer (CELT) model facilitating exploiting various context information for SLU. We explore different transfer learning approaches to reduce dependency on data collection and annotation. In addition to unsupervised pre-training using large-scale general purpose unlabeled corpora, such as Wikipedia, we explore unsupervised and supervised adaptive training approaches for transfer learning to benefit from other in-domain and out-of-domain dialogue corpora. Experimental results demonstrate that the proposed model with the proposed transfer learning approaches achieves significant improvement on the SLU performance over state-of-the-art models on two large-scale single-turn dialogue benchmarks and one large-scale multi-turn dialogue benchmark.
Tasks Spoken Language Understanding, Task-Oriented Dialogue Systems, Transfer Learning
Published 2020-03-03
URL https://arxiv.org/abs/2003.01305v1
PDF https://arxiv.org/pdf/2003.01305v1.pdf
PWC https://paperswithcode.com/paper/transfer-learning-for-context-aware-spoken

Pre-Training for Query Rewriting in A Spoken Language Understanding System

Title Pre-Training for Query Rewriting in A Spoken Language Understanding System
Authors Zheng Chen, Xing Fan, Yuan Ling, Lambert Mathias, Chenlei Guo
Abstract Query rewriting (QR) is an increasingly important technique to reduce customer friction caused by errors in a spoken language understanding pipeline, where the errors originate from various sources such as speech recognition errors, language understanding errors or entity resolution errors. In this work, we first propose a neural-retrieval based approach for query rewriting. Then, inspired by the wide success of pre-trained contextual language embeddings, and also as a way to compensate for insufficient QR training data, we propose a language-modeling (LM) based approach to pre-train query embeddings on historical user conversation data with a voice assistant. In addition, we propose to use the NLU hypotheses generated by the language understanding system to augment the pre-training. Our experiments show pre-training provides rich prior information and helps the QR task achieve strong performance. We also show joint pre-training with NLU hypotheses has further benefit. Finally, after pre-training, we find a small set of rewrite pairs is enough to fine-tune the QR model to outperform a strong baseline trained on all QR training data.
Tasks Entity Resolution, Language Modelling, Speech Recognition, Spoken Language Understanding
Published 2020-02-13
URL https://arxiv.org/abs/2002.05607v1
PDF https://arxiv.org/pdf/2002.05607v1.pdf
PWC https://paperswithcode.com/paper/pre-training-for-query-rewriting-in-a-spoken