October 18, 2019

Paper Group ANR 521

Regularizing Neural Machine Translation by Target-bidirectional Agreement

Title Regularizing Neural Machine Translation by Target-bidirectional Agreement
Authors Zhirui Zhang, Shuangzhi Wu, Shujie Liu, Mu Li, Ming Zhou, Tong Xu
Abstract Although Neural Machine Translation (NMT) has achieved remarkable progress in the past several years, most NMT systems still suffer from a fundamental shortcoming shared with other sequence generation tasks: errors made early in the generation process are fed back as inputs to the model and can be quickly amplified, harming subsequent sequence generation. To address this issue, we propose a novel model regularization method for NMT training, which aims to improve the agreement between translations generated by left-to-right (L2R) and right-to-left (R2L) NMT decoders. This goal is achieved by introducing two Kullback-Leibler divergence regularization terms into the NMT training objective to reduce the mismatch between the output probabilities of the L2R and R2L models. In addition, we employ a joint training strategy that allows the L2R and R2L models to improve each other in an interactive update process. Experimental results show that our proposed method significantly outperforms state-of-the-art baselines on Chinese-English and English-German translation tasks.
Tasks Machine Translation
Published 2018-08-13
URL http://arxiv.org/abs/1808.04064v2
PDF http://arxiv.org/pdf/1808.04064v2.pdf
PWC https://paperswithcode.com/paper/regularizing-neural-machine-translation-by
Repo
Framework
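
As a rough illustration of the core idea, here is a minimal numpy sketch of the agreement regularizer: symmetric KL terms between the per-token output distributions of the L2R and R2L decoders, added to the usual cross-entropy objective. The token-position alignment, the weight lam, and the stand-in loss value are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL(p || q) between two categorical distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def agreement_regularizer(p_l2r, p_r2l):
    """Symmetric KL penalty between per-token output distributions of the
    L2R and R2L decoders, assumed aligned on the same target positions."""
    return sum(kl(p, q) + kl(q, p) for p, q in zip(p_l2r, p_r2l))

# Hypothetical 3-token target over a 4-word vocabulary.
p_l2r = [np.array([0.7, 0.1, 0.1, 0.1])] * 3
p_r2l = [np.array([0.6, 0.2, 0.1, 0.1])] * 3

ce_loss = 1.9   # stand-in value for the usual NMT cross-entropy loss
lam = 0.1       # assumed regularization weight
total = ce_loss + lam * agreement_regularizer(p_l2r, p_r2l)
print(total)
```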

Learning Acoustic Word Embeddings with Temporal Context for Query-by-Example Speech Search

Title Learning Acoustic Word Embeddings with Temporal Context for Query-by-Example Speech Search
Authors Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, Bin Ma, Haizhou Li
Abstract We propose to learn acoustic word embeddings with temporal context for query-by-example (QbE) speech search. The temporal context includes the leading and trailing word sequences of a word. We assume that there exist spoken word pairs in the training database. We pad the word pairs with their original temporal context to form fixed-length speech segment pairs. We obtain the acoustic word embeddings through a deep convolutional neural network (CNN) which is trained on the speech segment pairs with a triplet loss. Shifting a fixed-length analysis window through the search content, we obtain a running sequence of embeddings. In this way, searching for the spoken query is equivalent to the matching of acoustic word embeddings. The experiments show that our proposed acoustic word embeddings learned with temporal context are effective in QbE speech search. They outperform the state-of-the-art frame-level feature representations and reduce run-time computation since no dynamic time warping is required in QbE speech search. We also find that it is important to have sufficient speech segment pairs to train the deep CNN for effective acoustic word embeddings.
Tasks Word Embeddings
Published 2018-06-10
URL http://arxiv.org/abs/1806.03621v2
PDF http://arxiv.org/pdf/1806.03621v2.pdf
PWC https://paperswithcode.com/paper/learning-acoustic-word-embeddings-with-1
Repo
Framework
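
A minimal sketch of the search step, with a stand-in pooling function in place of the trained CNN: embeddings from a sliding fixed-length window over the search content are matched to the query embedding by cosine similarity, so no dynamic time warping is needed.

```python
import numpy as np

def embed(segment):
    """Stand-in for the trained CNN: maps a fixed-length speech segment
    (frames x features) to an L2-normalized embedding. Mean pooling here
    is a hypothetical placeholder, not the paper's network."""
    v = segment.mean(axis=0)
    return v / (np.linalg.norm(v) + 1e-9)

def qbe_search(query_seg, content, win, hop=1):
    """Slide a fixed-length window over the search content and score each
    position by cosine similarity to the query embedding."""
    q = embed(query_seg)
    scores = []
    for start in range(0, len(content) - win + 1, hop):
        e = embed(content[start:start + win])
        scores.append((start, float(q @ e)))
    return max(scores, key=lambda s: s[1])  # best-matching window position

rng = np.random.default_rng(0)
content = rng.normal(size=(200, 39))   # e.g. 200 frames of MFCC features
query = content[50:90]                 # a 40-frame "spoken word"
print(qbe_search(query, content, win=40))
```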

American Sign Language fingerspelling recognition in the wild

Title American Sign Language fingerspelling recognition in the wild
Authors Bowen Shi, Aurora Martinez Del Rio, Jonathan Keane, Jonathan Michaux, Diane Brentari, Greg Shakhnarovich, Karen Livescu
Abstract We address the problem of American Sign Language fingerspelling recognition in the wild, using videos collected from websites. We introduce the largest data set available so far for the problem of fingerspelling recognition, and the first using naturally occurring video data. Using this data set, we present the first attempt to recognize fingerspelling sequences in this challenging setting. Unlike prior work, our video data is extremely challenging due to low frame rates and visual variability. To tackle the visual challenges, we train a special-purpose signing hand detector using a small subset of our data. Given the hand detector output, a sequence model decodes the hypothesized fingerspelled letter sequence. For the sequence model, we explore attention-based recurrent encoder-decoders and CTC-based approaches. As the first attempt at fingerspelling recognition in the wild, this work is intended to serve as a baseline for future work on sign language recognition in realistic conditions. We find that, as expected, letter error rates are much higher than in previous work on more controlled data, and we analyze the sources of error and effects of model variants.
Tasks Sign Language Recognition
Published 2018-10-26
URL http://arxiv.org/abs/1810.11438v3
PDF http://arxiv.org/pdf/1810.11438v3.pdf
PWC https://paperswithcode.com/paper/american-sign-language-fingerspelling
Repo
Framework
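
The paper explores CTC-based approaches among others; as an illustration of that component, here is standard CTC best-path (greedy) decoding over hypothetical frame-level letter scores. The vocabulary layout (blank plus 26 letters) is an assumption for the example.

```python
import numpy as np

def ctc_greedy_decode(logits, blank=0):
    """Best-path CTC decoding: take the argmax label per frame,
    collapse consecutive repeats, then drop blanks."""
    path = logits.argmax(axis=1)
    out, prev = [], blank
    for p in path:
        if p != prev and p != blank:
            out.append(int(p))
        prev = p
    return out

# Hypothetical frame-level scores over {blank} + 26 letters.
rng = np.random.default_rng(1)
logits = rng.normal(size=(30, 27))
letters = ctc_greedy_decode(logits)
print("".join(chr(ord('a') + l - 1) for l in letters))
```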

Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes

Title Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes
Authors Xianyan Jia, Shutao Song, Wei He, Yangzihao Wang, Haidong Rong, Feihu Zhou, Liqiang Xie, Zhenyu Guo, Yuanzhou Yang, Liwei Yu, Tiegang Chen, Guangxiao Hu, Shaohuai Shi, Xiaowen Chu
Abstract Synchronized stochastic gradient descent (SGD) optimizers with data parallelism are widely used in training large-scale deep neural networks. Although using larger mini-batch sizes can improve the system scalability by reducing the communication-to-computation ratio, it may hurt the generalization ability of the models. To this end, we build a highly scalable deep learning training system for dense GPU clusters with three main contributions: (1) We propose a mixed-precision training method that significantly improves the training throughput of a single GPU without losing accuracy. (2) We propose an optimization approach for extremely large mini-batch sizes (up to 64k) that can train CNN models on the ImageNet dataset without losing accuracy. (3) We propose highly optimized all-reduce algorithms that achieve up to 3x and 11x speedups on AlexNet and ResNet-50, respectively, over NCCL-based training on a cluster with 1024 Tesla P40 GPUs. When training ResNet-50 for 90 epochs, the state-of-the-art GPU-based system with 1024 Tesla P100 GPUs took 15 minutes and achieved 74.9% top-1 test accuracy, and a KNL-based system with 2048 Intel KNLs took 20 minutes and achieved 75.4% accuracy. Our training system achieves 75.8% top-1 test accuracy in only 6.6 minutes using 2048 Tesla P40 GPUs. When training AlexNet for 95 epochs, our system achieves 58.7% top-1 test accuracy within 4 minutes, which also outperforms all other existing systems.
Tasks
Published 2018-07-30
URL http://arxiv.org/abs/1807.11205v1
PDF http://arxiv.org/pdf/1807.11205v1.pdf
PWC https://paperswithcode.com/paper/highly-scalable-deep-learning-training-system
Repo
Framework
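
One standard ingredient of mixed-precision training (not necessarily the authors' exact recipe) is loss scaling, which keeps small gradient components from underflowing in fp16. A minimal numpy simulation of the effect:

```python
import numpy as np

def fp16_grad(grad_fp32):
    """Simulate storing a gradient in half precision: values below the
    fp16 subnormal range underflow to zero."""
    return grad_fp32.astype(np.float16).astype(np.float32)

grad = np.array([1e-8, 1e-4, 0.5], dtype=np.float32)

# Without scaling, the 1e-8 component underflows to 0 in fp16.
print(fp16_grad(grad))

# Loss scaling: multiply the loss (hence the gradients) by S before the
# backward pass, then divide by S in fp32 before the weight update.
S = 1024.0
print(fp16_grad(grad * S) / S)   # the small component survives
```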

Unsupervised Learning with Self-Organizing Spiking Neural Networks

Title Unsupervised Learning with Self-Organizing Spiking Neural Networks
Authors Hananel Hazan, Daniel J. Saunders, Darpan T. Sanghavi, Hava T. Siegelmann, Robert Kozma
Abstract We present a system that hybridizes self-organizing map (SOM) properties with spiking neural networks (SNNs), retaining many of the features of SOMs. Networks are trained in an unsupervised manner to learn a self-organized lattice of filters via excitatory-inhibitory interactions among populations of neurons. We develop and test various inhibition strategies, such as inhibition that grows with inter-neuron distance and two distinct levels of inhibition. The quality of the unsupervised learning algorithm is evaluated using examples with known labels. Several biologically inspired classification tools are proposed and compared, including a population-level confidence rating and n-grams using a spike-motif algorithm. Using the optimal choice of parameters, our approach produces improvements over state-of-the-art spiking neural networks.
Tasks
Published 2018-07-24
URL http://arxiv.org/abs/1807.09374v1
PDF http://arxiv.org/pdf/1807.09374v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-learning-with-self-organizing
Repo
Framework
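
As a hedged sketch of one of the inhibition strategies mentioned, inhibition growing with inter-neuron distance, here is a toy construction of an inhibitory weight matrix over a 2D lattice; the linear growth and the constant c are assumptions for illustration.

```python
import numpy as np

def lattice_inhibition(n, c=0.5):
    """Inhibitory weights between neurons on an n x n lattice whose
    magnitude grows with inter-neuron Euclidean distance."""
    coords = np.array([(i, j) for i in range(n) for j in range(n)], float)
    d = np.linalg.norm(coords[:, None] - coords[None, :], axis=2)
    W = -c * d                 # stronger inhibition for more distant pairs
    np.fill_diagonal(W, 0.0)   # no self-inhibition
    return W

W = lattice_inhibition(5)
print(W.shape, W.min())   # (25, 25) and the strongest inhibitory weight
```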

Syntax and Semantics of Italian Poetry in the First Half of the 20th Century

Title Syntax and Semantics of Italian Poetry in the First Half of the 20th Century
Authors Rodolfo Delmonte
Abstract In this paper we study, analyse and comment on the rhetorical figures present in some of the most interesting poetry of the first half of the twentieth century. These figures are first traced back to famous poets of the past and then compared to classical Latin prose. Linguistic theory is then called in to show how they can be represented in syntactic structures and classified as noncanonical structures, by positioning discontinuous or displaced linguistic elements in Spec XP projections at various levels of constituency. We then introduce LFG (Lexical Functional Grammar) as the theory that allows us to connect syntactic noncanonical structures with information structure and with psycholinguistic theories for complexity evaluation. We end with two computational linguistics experiments and evaluate the results. The first uses the best online parsers of Italian to parse poetic structures; the second uses Getarun, the system created at the Ca' Foscari Computational Linguistics Laboratory. As will be shown, the first approach is unable to cope with these structures because it relies only on statistical, probabilistic information. The second, being a symbolic rule-based system, is by far superior and also allows both semantic and pragmatic analysis to be completed.
Tasks
Published 2018-02-11
URL http://arxiv.org/abs/1802.03712v2
PDF http://arxiv.org/pdf/1802.03712v2.pdf
PWC https://paperswithcode.com/paper/syntax-and-semantics-of-italian-poetry-in-the
Repo
Framework

Causal programming: inference with structural causal models as finding instances of a relation

Title Causal programming: inference with structural causal models as finding instances of a relation
Authors Joshua Brulé
Abstract This paper proposes a causal inference relation and causal programming as general frameworks for causal inference with structural causal models. A tuple, $\langle M, I, Q, F \rangle$, is an instance of the relation if a formula, $F$, computes a causal query, $Q$, as a function of known population probabilities, $I$, in every model entailed by a set of model assumptions, $M$. Many problems in causal inference can be viewed as the problem of enumerating instances of the relation that satisfy given criteria. This unifies a number of previously studied problems, including causal effect identification, causal discovery and recovery from selection bias. In addition, the relation supports formalizing new problems in causal inference with structural causal models, such as the problem of research design. Causal programming is proposed as a further generalization of causal inference as the problem of finding optimal instances of the relation, with respect to a cost function.
Tasks Causal Discovery, Causal Inference
Published 2018-05-04
URL http://arxiv.org/abs/1805.01960v1
PDF http://arxiv.org/pdf/1805.01960v1.pdf
PWC https://paperswithcode.com/paper/causal-programming-inference-with-structural
Repo
Framework
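
A minimal sketch of the relation's tuples as a data structure, using a well-known back-door adjustment instance as a hypothetical example (the paper defines the relation abstractly; the concrete fields here are illustrative):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CausalInstance:
    """An instance <M, I, Q, F> of the causal inference relation:
    formula F computes causal query Q from known population
    probabilities I in every model entailed by assumptions M."""
    M: frozenset   # model assumptions
    I: frozenset   # known population probabilities
    Q: str         # the causal query
    F: str         # the estimating formula

# A hypothetical back-door identification instance.
inst = CausalInstance(
    M=frozenset({"Z blocks all back-door paths from X to Y"}),
    I=frozenset({"P(x, y, z)"}),
    Q="P(y | do(x))",
    F="sum_z P(y | x, z) P(z)",
)
print(inst.Q, "=", inst.F)
```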

Efficient Sampling and Structure Learning of Bayesian Networks

Title Efficient Sampling and Structure Learning of Bayesian Networks
Authors Jack Kuipers, Polina Suter, Giusi Moffa
Abstract Bayesian networks are probabilistic graphical models widely employed to understand dependencies in high-dimensional data, and even to facilitate causal discovery. Learning the underlying network structure, which is encoded as a directed acyclic graph (DAG), is highly challenging mainly due to the vast number of possible networks. Efforts have focussed on two fronts: constraint-based methods, which perform conditional independence tests to exclude edges, and score-and-search approaches, which explore the DAG space with greedy or MCMC schemes. Here we synthesise these two fields in a novel hybrid method which reduces the complexity of MCMC approaches to that of a constraint-based method. Individual steps in the MCMC scheme only require simple table lookups, so that very long chains can be obtained efficiently. Furthermore, the scheme includes an iterative procedure to correct for errors from the conditional independence tests. The algorithm offers markedly superior performance to alternatives, particularly because DAGs can also be sampled from the posterior distribution, enabling full Bayesian model averaging for much larger Bayesian networks.
Tasks Causal Discovery
Published 2018-03-21
URL https://arxiv.org/abs/1803.07859v3
PDF https://arxiv.org/pdf/1803.07859v3.pdf
PWC https://paperswithcode.com/paper/efficient-structure-learning-and-sampling-of
Repo
Framework
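
A hedged sketch of the table-lookup idea: local scores for every allowed parent set of a node are tabulated once (over candidates surviving the constraint-based pruning), so each MCMC move costs a dictionary lookup. The toy score function is a placeholder for a real local score such as BDeu or BGe.

```python
from itertools import combinations

def precompute_scores(node, candidate_parents, score_fn):
    """Tabulate the local score of every allowed parent set once, so each
    MCMC move is a table lookup rather than a fresh score evaluation."""
    table = {}
    for k in range(len(candidate_parents) + 1):
        for ps in combinations(candidate_parents, k):
            table[frozenset(ps)] = score_fn(node, ps)
    return table

# Toy score: a pure complexity penalty standing in for a real local score.
score_fn = lambda node, ps: -0.5 * len(ps)
table = precompute_scores("X3", ["X1", "X2"], score_fn)
print(table[frozenset({"X1"})])   # O(1) lookup inside a long chain
```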

Uncertainty Estimation via Stochastic Batch Normalization

Title Uncertainty Estimation via Stochastic Batch Normalization
Authors Andrei Atanov, Arsenii Ashukha, Dmitry Molchanov, Kirill Neklyudov, Dmitry Vetrov
Abstract In this work, we investigate the Batch Normalization technique and propose its probabilistic interpretation. We propose a probabilistic model and show that Batch Normalization maximizes the lower bound of its marginalized log-likelihood. Then, following the new probabilistic model, we design an algorithm which acts consistently during training and testing. However, inference becomes computationally inefficient. To reduce memory and computational cost, we propose Stochastic Batch Normalization, an efficient approximation of the proper inference procedure. This method provides us with a scalable uncertainty estimation technique. We demonstrate the performance of Stochastic Batch Normalization on popular architectures (including deep convolutional architectures: VGG-like and ResNets) on the MNIST and CIFAR-10 datasets.
Tasks
Published 2018-02-13
URL http://arxiv.org/abs/1802.04893v2
PDF http://arxiv.org/pdf/1802.04893v2.pdf
PWC https://paperswithcode.com/paper/uncertainty-estimation-via-stochastic-batch
Repo
Framework
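
A minimal sketch of the test-time idea, assuming batch statistics are resampled from those observed during training and predictions are averaged over the samples; this approximates marginalization over batch statistics but is not the authors' exact procedure.

```python
import numpy as np

rng = np.random.default_rng(2)

# Means/variances of an activation observed over training mini-batches.
batch_means = rng.normal(0.0, 0.1, size=500)
batch_vars = np.abs(rng.normal(1.0, 0.05, size=500))

def stochastic_bn(x, n_samples=32, eps=1e-5):
    """Test-time BN that samples batch statistics (here drawn from the
    empirical training-batch statistics) and averages the normalized
    outputs over the samples."""
    outs = []
    for _ in range(n_samples):
        i = rng.integers(len(batch_means))
        outs.append((x - batch_means[i]) / np.sqrt(batch_vars[i] + eps))
    return np.mean(outs, axis=0)

print(stochastic_bn(np.array([0.3, -1.2, 2.0])))
```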

Zero-shot Domain Adaptation without Domain Semantic Descriptors

Title Zero-shot Domain Adaptation without Domain Semantic Descriptors
Authors Atsutoshi Kumagai, Tomoharu Iwata
Abstract We propose a method to infer domain-specific models such as classifiers for unseen domains, from which no data are given in the training phase, without domain semantic descriptors. When training and test distributions are different, standard supervised learning methods perform poorly. Zero-shot domain adaptation attempts to alleviate this problem by inferring models that generalize well to unseen domains by using training data in multiple source domains. Existing methods use observed semantic descriptors characterizing domains such as time information to infer the domain-specific models for the unseen domains. However, it cannot always be assumed that such metadata can be used in real-world applications. The proposed method can infer appropriate domain-specific models without any semantic descriptors by introducing the concept of latent domain vectors, which are latent representations for the domains and are used for inferring the models. The latent domain vector for the unseen domain is inferred from the set of the feature vectors in the corresponding domain, which is given in the testing phase. The domain-specific models consist of two components: the first is for extracting a representation of a feature vector to be predicted, and the second is for inferring model parameters given the latent domain vector. The posterior distributions of the latent domain vectors and the domain-specific models are parametrized by neural networks, and are optimized by maximizing the variational lower bound using stochastic gradient descent. The effectiveness of the proposed method was demonstrated through experiments using one regression and two classification tasks.
Tasks Domain Adaptation
Published 2018-07-09
URL http://arxiv.org/abs/1807.02927v1
PDF http://arxiv.org/pdf/1807.02927v1.pdf
PWC https://paperswithcode.com/paper/zero-shot-domain-adaptation-without-domain
Repo
Framework
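
A rough sketch of the two components, with random linear maps standing in for the inference network and the parameter generator: the latent domain vector is inferred from the set of unlabeled feature vectors of the unseen domain (mean pooling keeps this permutation-invariant over the set), and the domain-specific model parameters are generated from it.

```python
import numpy as np

rng = np.random.default_rng(3)
D, Z = 10, 4                      # feature dim, latent domain dim
enc = rng.normal(size=(D, Z))     # stand-in for the inference network
hyper = rng.normal(size=(Z, D))   # stand-in for the parameter generator

def latent_domain_vector(X):
    """Infer a domain vector from the set of feature vectors in a domain;
    mean pooling makes the estimate permutation-invariant."""
    return X.mean(axis=0) @ enc

def domain_specific_weights(z):
    """Generate classifier parameters conditioned on the domain vector."""
    return z @ hyper

X_unseen = rng.normal(size=(50, D))   # unlabeled test-domain features
w = domain_specific_weights(latent_domain_vector(X_unseen))
print(w.shape)
```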

MCA-based Rule Mining Enables Interpretable Inference in Clinical Psychiatry

Title MCA-based Rule Mining Enables Interpretable Inference in Clinical Psychiatry
Authors Qingzhu Gao, Humberto Gonzalez, Parvez Ahammad
Abstract Development of interpretable machine learning models for clinical healthcare applications has the potential to change the way we understand, treat, and ultimately cure diseases and disorders in many areas of medicine. These models can serve not only as sources of predictions and estimates, but also as discovery tools for clinicians and researchers to reveal new knowledge from the data. High dimensionality of patient information (e.g., phenotype, genotype, and medical history), lack of objective measurements, and heterogeneity in patient populations often create significant challenges in developing interpretable machine learning models for clinical psychiatry in practice. In this paper we take a step towards the development of such interpretable models: first, by developing a novel categorical rule mining method based on Multivariate Correspondence Analysis (MCA) capable of handling datasets with large numbers of features, and second, by applying this method to build transdiagnostic Bayesian Rule List models to screen for psychiatric disorders using the Consortium for Neuropsychiatric Phenomics dataset. We show that our method is not only at least 100 times faster than state-of-the-art rule mining techniques on datasets with 50 features, but also provides interpretability and comparable prediction accuracy across several benchmark datasets.
Tasks Interpretable Machine Learning
Published 2018-10-26
URL http://arxiv.org/abs/1810.11558v2
PDF http://arxiv.org/pdf/1810.11558v2.pdf
PWC https://paperswithcode.com/paper/mca-based-rule-mining-enables-interpretable
Repo
Framework

A Way to Facilitate Decision Making in a Mixed Group of Manned and Unmanned Aerial Vehicles

Title A Way to Facilitate Decision Making in a Mixed Group of Manned and Unmanned Aerial Vehicles
Authors Dmitry Maximov, Yury Legovich, Vladimir Goncharenko
Abstract A mixed group of manned and unmanned aerial vehicles is considered as a distributed system. A lattice of the tasks that the system may fulfil is associated with it. An external multiplication operation is defined on the lattice, which in turn defines the corresponding linear logic operations. Linear implication and tensor product are used to choose a system reconfiguration variant, i.e., to determine a new choice of task executor. The structure of the task lattice (i.e., the system's purpose) and the operation definitions largely determine this choice; thus, the choice is mainly a consequence of the system's purpose. Such a method of choosing the behavior variant facilitates decision making by the pilot controlling the group. The suggested approach is illustrated with an example of controlling a mixed group during forest fire suppression.
Tasks Decision Making
Published 2018-09-27
URL http://arxiv.org/abs/1809.10441v2
PDF http://arxiv.org/pdf/1809.10441v2.pdf
PWC https://paperswithcode.com/paper/a-way-to-facilitate-decision-making-in-a
Repo
Framework

Fixed-length Bit-string Representation of Fingerprint by Normalized Local Structures

Title Fixed-length Bit-string Representation of Fingerprint by Normalized Local Structures
Authors Jun Beom Kho, Andrew B. J. Teoh, Wonjune Lee, Jaihie Kim
Abstract In this paper, we propose a method to represent a fingerprint image by an ordered, fixed-length bit-string, providing improved accuracy, faster matching and compressibility. First, we devise a novel minutia-based local structure modeled by a mixture of 2D elliptical Gaussian functions in pixel space. Each local structure is mapped to Euclidean space by normalizing it with the number of minutiae associated with it. This simple yet crucial step enables fast dissimilarity computation between two local structures with the Euclidean distance, without distortion. A texture-based local structure complementary to the minutia-based one is also introduced; both can be compressed via principal component analysis and fused easily in Euclidean space. The fused local structure is then converted to a K-bit ordered string via a K-means clustering algorithm. This chain of computation, relying solely on the Euclidean distance, is vital for fast and discriminative bit-string conversion. Accuracy can be further improved by a finger-specific bit-training algorithm in which two criteria are leveraged to select useful bit positions for matching. Experiments are performed on Fingerprint Verification Competition (FVC) databases for comparison with existing techniques, showing the superiority of the proposed method.
Tasks
Published 2018-11-28
URL http://arxiv.org/abs/1811.11489v1
PDF http://arxiv.org/pdf/1811.11489v1.pdf
PWC https://paperswithcode.com/paper/fixed-length-bit-string-representation-of
Repo
Framework
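
One plausible reading of the K-bit conversion (the abstract does not spell out the mapping, so treat this as an assumption) is a nearest-centroid one-hot code over K K-means clusters of the fused local-structure vectors:

```python
import numpy as np

def kmeans(X, K, iters=20, seed=0):
    """Plain Lloyd's algorithm: returns K centroids of the rows of X."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), K, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - C[None]) ** 2).sum(2), axis=1)
        for k in range(K):
            if np.any(labels == k):
                C[k] = X[labels == k].mean(axis=0)
    return C

def to_bitstring(v, C):
    """Hypothetical bit mapping: bit k fires when vector v falls in
    cluster k (nearest centroid), giving an ordered K-bit code."""
    bits = np.zeros(len(C), dtype=int)
    bits[np.argmin(((C - v) ** 2).sum(1))] = 1
    return bits

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 8))      # fused local-structure vectors
C = kmeans(X, K=16)
print(to_bitstring(X[0], C))       # ordered, fixed-length 16-bit code
```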

Regularized Singular Value Decomposition and Application to Recommender System

Title Regularized Singular Value Decomposition and Application to Recommender System
Authors Shuai Zheng, Chris Ding, Feiping Nie
Abstract Singular value decomposition (SVD) is the mathematical basis of principal component analysis (PCA). Together, SVD and PCA are among the most widely used mathematical formalisms/decompositions in machine learning, data mining, pattern recognition, artificial intelligence, computer vision, signal processing, etc. In recent applications, regularization has become an increasing trend. In this paper, we present a regularized SVD (RSVD), give an efficient computational algorithm, and provide several theoretical analyses. We show that although RSVD is non-convex, it has a closed-form global optimal solution. Finally, we apply RSVD to recommender systems, and experimental results show that RSVD outperforms SVD significantly.
Tasks Recommendation Systems
Published 2018-04-13
URL http://arxiv.org/abs/1804.05090v1
PDF http://arxiv.org/pdf/1804.05090v1.pdf
PWC https://paperswithcode.com/paper/regularized-singular-value-decomposition-and
Repo
Framework
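
For the common objective min ||X - UV^T||_F^2 + lam(||U||_F^2 + ||V||_F^2), the closed-form global optimum is obtained by soft-thresholding the singular values of X by lam; the paper's exact objective may differ, so treat this sketch as an assumption rather than the authors' algorithm.

```python
import numpy as np

def rsvd_shrinkage(X, lam):
    """Closed-form solution under the assumed objective: compute the SVD
    of X and soft-threshold the singular values by lam (equivalent to
    nuclear-norm-regularized low-rank approximation)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)
    return (U * s_shrunk) @ Vt   # reassemble U diag(s') V^T

# Toy rating matrix: rank-1 structure plus noise.
rng = np.random.default_rng(5)
R = np.outer([5.0, 3.0, 1.0], [1.0, 0.5, 0.2]) + 0.1 * rng.normal(size=(3, 3))
print(rsvd_shrinkage(R, lam=0.5))
```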

VIENA2: A Driving Anticipation Dataset

Title VIENA2: A Driving Anticipation Dataset
Authors Mohammad Sadegh Aliakbarian, Fatemeh Sadat Saleh, Mathieu Salzmann, Basura Fernando, Lars Petersson, Lars Andersson
Abstract Action anticipation is critical in scenarios where one needs to react before the action is finalized. This is, for instance, the case in automated driving, where a car needs to, e.g., avoid hitting pedestrians and respect traffic lights. While solutions have been proposed to tackle subsets of the driving anticipation tasks, by making use of diverse, task-specific sensors, there is no single dataset or framework that addresses them all in a consistent manner. In this paper, we therefore introduce a new, large-scale dataset, called VIENA2, covering 5 generic driving scenarios with a total of 25 distinct action classes. It contains more than 15K full-HD, 5s-long videos acquired in various driving conditions, weather, times of day and environments, complemented with a common and realistic set of sensor measurements. This amounts to more than 2.25M frames, each annotated with an action label, corresponding to 600 samples per action class. We discuss our data acquisition strategy and the statistics of our dataset, and benchmark state-of-the-art action anticipation techniques, including a new multi-modal LSTM architecture with an effective loss function for action anticipation in driving scenarios.
Tasks
Published 2018-10-22
URL http://arxiv.org/abs/1810.09044v2
PDF http://arxiv.org/pdf/1810.09044v2.pdf
PWC https://paperswithcode.com/paper/viena2-a-driving-anticipation-dataset
Repo
Framework