May 7, 2019

3049 words 15 mins read

Paper Group AWR 60

Learning Deep Embeddings with Histogram Loss. Formal Definitions of Unbounded Evolution and Innovation Reveal Universal Mechanisms for Open-Ended Evolution in Dynamical Systems. Pose-Selective Max Pooling for Measuring Similarity. Learning to learn by gradient descent by gradient descent. Recurrent Memory Array Structures. Minimum Regret Search for …

Learning Deep Embeddings with Histogram Loss

Title Learning Deep Embeddings with Histogram Loss
Authors Evgeniya Ustinova, Victor Lempitsky
Abstract We suggest a loss for learning deep embeddings. The new loss does not introduce parameters that need to be tuned and results in very good embeddings across a range of datasets and problems. The loss is computed by estimating two distributions of similarities, for positive (matching) and negative (non-matching) sample pairs, and then computing the probability that a positive pair has a lower similarity score than a negative pair, based on the estimated similarity distributions. We show that such operations can be performed in a simple and piecewise-differentiable manner using 1D histograms with soft assignment operations. This makes the proposed loss suitable for learning deep embeddings using stochastic optimization. In the experiments, the new loss performs favourably compared to recently proposed alternatives.
Tasks Stochastic Optimization
Published 2016-11-02
URL http://arxiv.org/abs/1611.00822v1
PDF http://arxiv.org/pdf/1611.00822v1.pdf
PWC https://paperswithcode.com/paper/learning-deep-embeddings-with-histogram-loss
Repo https://github.com/madkn/HistogramLoss
Framework pytorch
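
A minimal PyTorch sketch of the idea above (illustrative only; the linked repo holds the authors' implementation). It assumes precomputed cosine similarities in [-1, 1] for positive and negative pairs; `soft_hist` and the bin count are hypothetical choices:

```python
import torch

def histogram_loss(sim_pos, sim_neg, num_bins=100):
    """Probability that a positive pair scores below a negative pair,
    estimated from soft 1D histograms of the two similarity sets."""
    t = torch.linspace(-1, 1, num_bins)  # histogram nodes
    delta = t[1] - t[0]

    def soft_hist(sims):
        # Triangular (piecewise-linear) soft assignment keeps the loss differentiable.
        w = torch.clamp(1 - (sims.unsqueeze(1) - t.unsqueeze(0)).abs() / delta, min=0)
        h = w.sum(dim=0)
        return h / h.sum()  # normalize to a distribution

    h_pos, h_neg = soft_hist(sim_pos), soft_hist(sim_neg)
    cdf_pos = torch.cumsum(h_pos, dim=0)  # P(sim_pos <= t_k)
    return (cdf_pos * h_neg).sum()        # approx. P(sim_pos < sim_neg)
```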

Formal Definitions of Unbounded Evolution and Innovation Reveal Universal Mechanisms for Open-Ended Evolution in Dynamical Systems

Title Formal Definitions of Unbounded Evolution and Innovation Reveal Universal Mechanisms for Open-Ended Evolution in Dynamical Systems
Authors Alyssa M Adams, Hector Zenil, Paul CW Davies, Sara I Walker
Abstract Open-ended evolution (OEE) is relevant to a variety of biological, artificial and technological systems, but has been challenging to reproduce in silico. Most theoretical efforts focus on key aspects of open-ended evolution as it appears in biology. We recast the problem as a more general one in dynamical systems theory, providing simple criteria for open-ended evolution based on two hallmark features: unbounded evolution and innovation. We define unbounded evolution as patterns that are non-repeating within the expected Poincaré recurrence time of an equivalent isolated system, and innovation as trajectories not observed in isolated systems. As a case study, we implement novel variants of cellular automata (CA) in which the update rules are allowed to vary with time in three alternative ways. Each is capable of generating conditions for open-ended evolution, but they vary in their ability to do so. We find that state-dependent dynamics, widely regarded as a hallmark of life, statistically outperforms the other candidate mechanisms, and is the only mechanism to produce open-ended evolution in a scalable manner, essential to the notion of ongoing evolution. This analysis suggests a new framework for unifying mechanisms for generating OEE with features distinctive to life and its artifacts, with broad applicability to biological and artificial systems.
Tasks
Published 2016-07-06
URL http://arxiv.org/abs/1607.01750v2
PDF http://arxiv.org/pdf/1607.01750v2.pdf
PWC https://paperswithcode.com/paper/formal-definitions-of-unbounded-evolution-and
Repo https://github.com/alyssa-adams/OEE_Project
Framework none
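
To make the CA setup concrete, here is a small NumPy sketch of one flavour of time-varying update rule; the state-to-rule mapping is a toy stand-in, not the paper's actual mechanism:

```python
import numpy as np

def step(state, rule):
    """One update of an elementary CA (Wolfram rule number, periodic boundary)."""
    left, right = np.roll(state, 1), np.roll(state, -1)
    idx = 4 * left + 2 * state + right    # 3-cell neighbourhood as a 3-bit index
    table = (rule >> np.arange(8)) & 1    # the rule's lookup table
    return table[idx]

def run_state_dependent(state, steps=100):
    """State-dependent dynamics: the rule applied at each step is itself a
    function of the current global state (toy hash, for illustration only)."""
    history = [state.copy()]
    for _ in range(steps):
        rule = int(state.sum() * 37 + 1) % 256
        state = step(state, rule)
        history.append(state.copy())
    return np.array(history)

trajectory = run_state_dependent(np.random.default_rng(0).integers(0, 2, 32))
```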

Pose-Selective Max Pooling for Measuring Similarity

Title Pose-Selective Max Pooling for Measuring Similarity
Authors Xiang Xiang, Trac D. Tran
Abstract In this paper, we deal with two challenges for measuring the similarity of subject identities in practical video-based face recognition: the variation of head pose in uncontrolled environments and the computational expense of processing videos. Since the frame-wise feature mean is unable to characterize the pose diversity among frames, we define and preserve the overall pose diversity and closeness in a video. Then, identity will be the only source of variation across videos, since the pose varies even within a single video. Instead of simply using all the frames, we select those faces whose pose point is closest to the centroid of the K-means cluster containing that pose point. We then represent a video as a bag of frame-wise deep face features, with the number of features reduced from hundreds to K. Since this video representation captures the identity well, we measure the subject similarity between two videos as the max correlation among all possible pairs in the two bags of features. On the official 5,000 video pairs of the YouTube Faces dataset for face verification, our algorithm achieves performance comparable to VGG-face, which averages over the deep features of all frames. Other vision tasks can also benefit from the generic idea of employing geometric cues to improve the descriptiveness of deep features.
Tasks Face Recognition, Face Verification, Video Similarity
Published 2016-09-22
URL http://arxiv.org/abs/1609.07042v4
PDF http://arxiv.org/pdf/1609.07042v4.pdf
PWC https://paperswithcode.com/paper/pose-selective-max-pooling-for-measuring
Repo https://github.com/eglxiang/vgg_face
Framework none
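
A sketch of the two key steps, assuming per-frame pose estimates (e.g. yaw/pitch/roll) and precomputed deep face features; the function names are hypothetical:

```python
import numpy as np
from sklearn.cluster import KMeans

def select_frames(pose_points, k=9):
    """Cluster per-frame pose points with K-means and keep, for each cluster,
    the frame whose pose is closest to the centroid."""
    km = KMeans(n_clusters=k, n_init=10).fit(pose_points)
    selected = []
    for c in range(k):
        members = np.where(km.labels_ == c)[0]
        d = np.linalg.norm(pose_points[members] - km.cluster_centers_[c], axis=1)
        selected.append(members[np.argmin(d)])
    return selected  # K representative frame indices

def video_similarity(feats_a, feats_b):
    """Max correlation over all pairs of L2-normalized deep features from the
    two bags (one bag of K features per video)."""
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    return (a @ b.T).max()
```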

Learning to learn by gradient descent by gradient descent

Title Learning to learn by gradient descent by gradient descent
Authors Marcin Andrychowicz, Misha Denil, Sergio Gomez, Matthew W. Hoffman, David Pfau, Tom Schaul, Brendan Shillingford, Nando de Freitas
Abstract The move from hand-designed features to learned features in machine learning has been wildly successful. In spite of this, optimization algorithms are still designed by hand. In this paper we show how the design of an optimization algorithm can be cast as a learning problem, allowing the algorithm to learn to exploit structure in the problems of interest in an automatic way. Our learned algorithms, implemented by LSTMs, outperform generic, hand-designed competitors on the tasks for which they are trained, and also generalize well to new tasks with similar structure. We demonstrate this on a number of tasks, including simple convex problems, training neural networks, and styling images with neural art.
Tasks Meta-Learning
Published 2016-06-14
URL http://arxiv.org/abs/1606.04474v2
PDF http://arxiv.org/pdf/1606.04474v2.pdf
PWC https://paperswithcode.com/paper/learning-to-learn-by-gradient-descent-by
Repo https://github.com/guillaume-chevalier/LSTM-Human-Activity-Recognition
Framework tf
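
The core of the method is an LSTM that plays the role of the optimizer: it ingests gradients coordinate-wise and emits parameter updates, and is itself trained by gradient descent through the unrolled inner optimization. A minimal PyTorch sketch (hidden size and output scaling are illustrative, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class LSTMOptimizer(nn.Module):
    """Learned optimizer: maps each coordinate's gradient to an update,
    sharing the same small LSTM across all coordinates."""
    def __init__(self, hidden_size=20):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size)
        self.out = nn.Linear(hidden_size, 1)

    def forward(self, grad, state=None):
        # Treat the flattened gradient as a batch of independent coordinates.
        g = grad.view(1, -1, 1)                   # (seq=1, batch=n_params, feat=1)
        h, state = self.lstm(g, state)
        return 0.1 * self.out(h).view(-1), state  # one update per coordinate

# Inner-loop usage sketch: theta = theta + update, where
# update, state = opt(theta.grad.detach(), state)
```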

Recurrent Memory Array Structures

Title Recurrent Memory Array Structures
Authors Kamil Rocki
Abstract The following report introduces ideas augmenting the standard Long Short-Term Memory (LSTM) architecture with multiple memory cells per hidden unit in order to improve its generalization capabilities. It considers both deterministic and stochastic variants of memory operation. It is shown that the nondeterministic Array-LSTM approach improves state-of-the-art performance on character-level text prediction, achieving 1.402 BPC on the enwik8 dataset. Furthermore, this report establishes baseline neural-based results of 1.12 BPC and 1.19 BPC for the enwik9 and enwik10 datasets, respectively.
Tasks
Published 2016-07-11
URL http://arxiv.org/abs/1607.03085v3
PDF http://arxiv.org/pdf/1607.03085v3.pdf
PWC https://paperswithcode.com/paper/recurrent-memory-array-structures
Repo https://github.com/krocki/ArrayLSTM
Framework none
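
A sketch of a deterministic Array-LSTM cell under the description above: each hidden unit is backed by k memory cells with their own gates, and their outputs are summed into a single hidden state (the stochastic variants select among cells instead; sizes and wiring are illustrative):

```python
import torch
import torch.nn as nn

class ArrayLSTMCell(nn.Module):
    def __init__(self, input_size, hidden_size, k=2):
        super().__init__()
        self.k, self.hidden_size = k, hidden_size
        # i, f, o, g gates for each of the k memory arrays.
        self.gates = nn.Linear(input_size + hidden_size, 4 * hidden_size * k)

    def forward(self, x, h, c):               # c: (batch, k, hidden)
        z = self.gates(torch.cat([x, h], dim=1))
        z = z.view(-1, self.k, 4, self.hidden_size)
        i, f, o, g = z.unbind(dim=2)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = (torch.sigmoid(o) * torch.tanh(c)).sum(dim=1)  # merge the k cells
        return h, c
```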

Minimum Regret Search for Single- and Multi-Task Optimization

Title Minimum Regret Search for Single- and Multi-Task Optimization
Authors Jan Hendrik Metzen
Abstract We propose minimum regret search (MRS), a novel acquisition function for Bayesian optimization. MRS bears similarities to information-theoretic approaches such as entropy search (ES). However, while ES aims in each query at maximizing the information gain with respect to the global maximum, MRS aims at minimizing the expected simple regret of its ultimate recommendation for the optimum. While empirically ES and MRS perform similarly in most cases, MRS produces fewer outliers with high simple regret than ES. We provide empirical results both for a synthetic single-task optimization problem and for a simulated multi-task robotic control problem.
Tasks
Published 2016-02-02
URL http://arxiv.org/abs/1602.01064v3
PDF http://arxiv.org/pdf/1602.01064v3.pdf
PWC https://paperswithcode.com/paper/minimum-regret-search-for-single-and-multi
Repo https://github.com/jmetzen/bayesian_optimization
Framework none
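
A rough Monte Carlo sketch of the MRS idea, assuming a fitted scikit-learn GaussianProcessRegressor over a candidate grid; this is a simplified stand-in for the paper's formulation, not its exact estimator:

```python
import numpy as np
from sklearn.base import clone

def expected_simple_regret(gp, X_cand, n_samples=200):
    """Expected simple regret of recommending the posterior-mean maximizer,
    estimated from joint posterior samples at the candidate points."""
    samples = gp.sample_y(X_cand, n_samples=n_samples, random_state=0)
    rec = gp.predict(X_cand).argmax()
    return (samples.max(axis=0) - samples[rec]).mean()

def mrs_acquisition(gp, X_cand, x_query, n_fantasies=10):
    """Score a query point by the expected reduction in simple regret:
    fantasize outcomes at x_query, refit, and average the regret drop."""
    base = expected_simple_regret(gp, X_cand)
    xq = np.atleast_2d(x_query)
    fantasies = gp.sample_y(xq, n_samples=n_fantasies, random_state=1).ravel()
    regrets = []
    for y_f in fantasies:
        gp_f = clone(gp).fit(np.vstack([gp.X_train_, xq]),
                             np.append(gp.y_train_, y_f))
        regrets.append(expected_simple_regret(gp_f, X_cand))
    return base - np.mean(regrets)  # maximize this over x_query
```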

Understanding Convolutional Neural Networks with A Mathematical Model

Title Understanding Convolutional Neural Networks with A Mathematical Model
Authors C. -C. Jay Kuo
Abstract This work attempts to address two fundamental questions about the structure of convolutional neural networks (CNNs): 1) why is a non-linear activation function essential at the filter output of every convolutional layer? 2) what is the advantage of a two-layer cascade system over a one-layer system? A mathematical model called the “REctified-COrrelations on a Sphere” (RECOS) model is proposed to answer these two questions. After the CNN training process, the converged filter weights define a set of anchor vectors in the RECOS model. Anchor vectors represent the frequently occurring patterns (or spectral components). The necessity of rectification is explained using the RECOS model. Then, the behavior of a two-layer RECOS system is analyzed and compared with its one-layer counterpart. LeNet-5 and the MNIST dataset are used to illustrate the discussion points. Finally, the RECOS model is generalized to a multi-layer system, with AlexNet as an example. Keywords: Convolutional Neural Network (CNN), Nonlinear Activation, RECOS Model, Rectified Linear Unit (ReLU), MNIST Dataset.
Tasks
Published 2016-09-14
URL http://arxiv.org/abs/1609.04112v2
PDF http://arxiv.org/pdf/1609.04112v2.pdf
PWC https://paperswithcode.com/paper/understanding-convolutional-neural-networks-1
Repo https://github.com/caffeine110/Cnn_Classifier
Framework none
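
A tiny NumPy sketch of one RECOS unit as described above: inputs and anchor vectors live on the unit sphere, and rectification discards negative correlations, which would otherwise let dissimilar patterns cancel across cascaded layers:

```python
import numpy as np

def recos_response(x, anchors):
    """Rectified correlations of a normalized input with a set of anchor
    vectors (rows of `anchors`); a sketch, not the paper's full model."""
    x = x / np.linalg.norm(x)
    anchors = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    correlations = anchors @ x
    return np.maximum(correlations, 0)  # ReLU keeps positively correlated anchors
```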

Output Constraint Transfer for Kernelized Correlation Filter in Tracking

Title Output Constraint Transfer for Kernelized Correlation Filter in Tracking
Authors Baochang Zhang, Zhigang Li, Xianbin Cao, Qixiang Ye, Chen Chen, Linlin Shen, Alessandro Perina, Rongrong Ji
Abstract The Kernelized Correlation Filter (KCF) is one of the state-of-the-art object trackers. However, it does not reasonably model the distribution of the correlation response during the tracking process, which can cause the drifting problem, especially when targets undergo significant appearance changes due to occlusion, camera shaking, and/or deformation. In this paper, we propose an Output Constraint Transfer (OCT) method that mitigates the drifting problem by modeling the distribution of the correlation response in a Bayesian optimization framework. OCT builds upon the reasonable assumption that the correlation response to the target image follows a Gaussian distribution, which we exploit to select training samples and reduce model uncertainty. OCT is rooted in a new theory which transfers the data distribution to a constraint on the optimized variable, leading to an efficient framework for calculating correlation filters. Extensive experiments on a commonly used tracking benchmark show that the proposed method significantly improves KCF and achieves better performance than other state-of-the-art trackers. To encourage further developments, the source code is made available at https://github.com/bczhangbczhang/OCT-KCF.
Tasks
Published 2016-12-16
URL http://arxiv.org/abs/1612.05365v1
PDF http://arxiv.org/pdf/1612.05365v1.pdf
PWC https://paperswithcode.com/paper/output-constraint-transfer-for-kernelized
Repo https://github.com/bczhangbczhang/OCT-KCF
Framework none
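
The Gaussian assumption on the correlation response suggests a simple sample-selection rule; the sketch below is only a loose illustration of that idea (the paper's actual contribution is the constrained filter solver, which is omitted here):

```python
import numpy as np

def accept_training_sample(peak_history, new_peak, z_thresh=2.0):
    """Accept a frame for filter updating only if its peak correlation
    response is consistent with a Gaussian fit to past responses."""
    mu = np.mean(peak_history)
    sigma = np.std(peak_history) + 1e-8
    return abs(new_peak - mu) / sigma < z_thresh
```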

A Distance for HMMs based on Aggregated Wasserstein Metric and State Registration

Title A Distance for HMMs based on Aggregated Wasserstein Metric and State Registration
Authors Yukun Chen, Jianbo Ye, Jia Li
Abstract We propose a framework, named Aggregated Wasserstein, for computing a dissimilarity measure or distance between two Hidden Markov Models with state conditional distributions being Gaussian. For such HMMs, the marginal distribution at any time spot follows a Gaussian mixture distribution, a fact exploited to softly match, aka register, the states in two HMMs. We refer to such HMMs as Gaussian mixture model-HMM (GMM-HMM). The registration of states is inspired by the intrinsic relationship of optimal transport and the Wasserstein metric between distributions. Specifically, the components of the marginal GMMs are matched by solving an optimal transport problem where the cost between components is the Wasserstein metric for Gaussian distributions. The solution of the optimization problem is a fast approximation to the Wasserstein metric between two GMMs. The new Aggregated Wasserstein distance is a semi-metric and can be computed without generating Monte Carlo samples. It is invariant to relabeling or permutation of the states. This distance quantifies the dissimilarity of GMM-HMMs by measuring both the difference between the two marginal GMMs and the difference between the two transition matrices. Our new distance is tested on the tasks of retrieval and classification of time series. Experiments on both synthetic data and real data have demonstrated its advantages in terms of accuracy as well as efficiency in comparison with existing distances based on the Kullback-Leibler divergence.
Tasks Time Series
Published 2016-08-05
URL http://arxiv.org/abs/1608.01747v1
PDF http://arxiv.org/pdf/1608.01747v1.pdf
PWC https://paperswithcode.com/paper/a-distance-for-hmms-based-on-aggregated
Repo https://github.com/cykustcc/aggregated_wasserstein_hmm
Framework none
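
A sketch of the two building blocks, assuming the GMM components are given as (mean, covariance) pairs with mixture weights; a generic LP solver stands in for a dedicated optimal-transport solver:

```python
import numpy as np
from scipy.linalg import sqrtm
from scipy.optimize import linprog

def gaussian_w2(m1, S1, m2, S2):
    """Closed-form squared 2-Wasserstein distance between two Gaussians."""
    r = sqrtm(S1)
    cross = sqrtm(r @ S2 @ r)
    return float(np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2 * cross.real))

def register_states(w1, gaussians1, w2, gaussians2):
    """Soft state matching: optimal transport between the two marginal GMMs
    with Gaussian W2 as the ground cost; returns the transport plan."""
    n, m = len(w1), len(w2)
    C = np.array([[gaussian_w2(*g1, *g2) for g2 in gaussians2]
                  for g1 in gaussians1])
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1    # row marginals = w1
    for j in range(m):
        A_eq[n + j, j::m] = 1             # column marginals = w2
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([w1, w2]))
    return res.x.reshape(n, m)
```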

Latent Tree Models for Hierarchical Topic Detection

Title Latent Tree Models for Hierarchical Topic Detection
Authors Peixian Chen, Nevin L. Zhang, Tengfei Liu, Leonard K. M. Poon, Zhourong Chen, Farhan Khawar
Abstract We present a novel method for hierarchical topic detection where topics are obtained by clustering documents in multiple ways. Specifically, we model document collections using a class of graphical models called hierarchical latent tree models (HLTMs). The variables at the bottom level of an HLTM are observed binary variables that represent the presence/absence of words in a document. The variables at other levels are binary latent variables, with those at the lowest latent level representing word co-occurrence patterns and those at higher levels representing co-occurrence of patterns at the level below. Each latent variable gives a soft partition of the documents, and document clusters in the partitions are interpreted as topics. Latent variables at high levels of the hierarchy capture long-range word co-occurrence patterns and hence give thematically more general topics, while those at low levels of the hierarchy capture short-range word co-occurrence patterns and give thematically more specific topics. Unlike LDA-based topic models, HLTMs do not refer to a document generation process and use word variables instead of token variables. They use a tree structure to model the relationships between topics and words, which is conducive to the discovery of meaningful topics and topic hierarchies.
Tasks Topic Models
Published 2016-05-21
URL http://arxiv.org/abs/1605.06650v2
PDF http://arxiv.org/pdf/1605.06650v2.pdf
PWC https://paperswithcode.com/paper/latent-tree-models-for-hierarchical-topic
Repo https://github.com/kmpoon/hlta
Framework none
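
The model structure itself is simple to state; here is a minimal illustration of the tree layout (hand-built and purely illustrative; inference and structure learning, which do the real work, are in the linked repo):

```python
from dataclasses import dataclass, field

@dataclass
class LatentNode:
    """A binary latent variable whose children are either lower-level latent
    variables (co-occurrence patterns) or observed word variables."""
    name: str
    children: list = field(default_factory=list)  # LatentNode or word strings

# Higher levels capture long-range co-occurrence (general topics); lower
# levels capture short-range co-occurrence (specific topics).
tree = LatentNode("sports", [
    LatentNode("ball-games", ["football", "goal", "league"]),
    LatentNode("athletics", ["sprint", "marathon", "track"]),
])
```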

Recurrent Batch Normalization

Title Recurrent Batch Normalization
Authors Tim Cooijmans, Nicolas Ballas, César Laurent, Çağlar Gülçehre, Aaron Courville
Abstract We propose a reparameterization of LSTM that brings the benefits of batch normalization to recurrent neural networks. Whereas previous works only apply batch normalization to the input-to-hidden transformation of RNNs, we demonstrate that it is both possible and beneficial to batch-normalize the hidden-to-hidden transition, thereby reducing internal covariate shift between time steps. We evaluate our proposal on various sequential problems such as sequence classification, language modeling and question answering. Our empirical results show that our batch-normalized LSTM consistently leads to faster convergence and improved generalization.
Tasks Language Modelling, Question Answering, Reading Comprehension, Sequential Image Classification
Published 2016-03-30
URL http://arxiv.org/abs/1603.09025v5
PDF http://arxiv.org/pdf/1603.09025v5.pdf
PWC https://paperswithcode.com/paper/recurrent-batch-normalization
Repo https://github.com/cooijmanstim/recurrent-batch-normalization
Framework torch
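
A PyTorch sketch of the reparameterized cell: batch normalization is applied separately to the input-to-hidden and hidden-to-hidden pre-activations. (The paper additionally uses per-time-step BN statistics and batch-normalizes the cell state inside the output nonlinearity; both are omitted here for brevity.)

```python
import torch
import torch.nn as nn

class BNLSTMCell(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.wx = nn.Linear(input_size, 4 * hidden_size, bias=False)
        self.wh = nn.Linear(hidden_size, 4 * hidden_size, bias=False)
        self.bn_x = nn.BatchNorm1d(4 * hidden_size)  # normalizes W_x x
        self.bn_h = nn.BatchNorm1d(4 * hidden_size)  # normalizes W_h h
        self.bias = nn.Parameter(torch.zeros(4 * hidden_size))

    def forward(self, x, h, c):
        z = self.bn_x(self.wx(x)) + self.bn_h(self.wh(h)) + self.bias
        i, f, g, o = z.chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c
```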

MS MARCO: A Human Generated MAchine Reading COmprehension Dataset

Title MS MARCO: A Human Generated MAchine Reading COmprehension Dataset
Authors Payal Bajaj, Daniel Campos, Nick Craswell, Li Deng, Jianfeng Gao, Xiaodong Liu, Rangan Majumder, Andrew McNamara, Bhaskar Mitra, Tri Nguyen, Mir Rosenberg, Xia Song, Alina Stoica, Saurabh Tiwary, Tong Wang
Abstract We introduce a large-scale MAchine Reading COmprehension dataset, which we name MS MARCO. The dataset comprises 1,010,916 anonymized questions—sampled from Bing’s search query logs—each with a human-generated answer, and 182,669 answers that were completely rewritten by humans. In addition, the dataset contains 8,841,823 passages—extracted from 3,563,535 web documents retrieved by Bing—that provide the information necessary for curating the natural language answers. A question in the MS MARCO dataset may have multiple answers or no answers at all. Using this dataset, we propose three different tasks with varying levels of difficulty: (i) predict whether a question is answerable given a set of context passages, and extract and synthesize the answer as a human would; (ii) generate a well-formed answer (if possible) based on the context passages that can be understood with the question and passage context; and finally (iii) rank a set of retrieved passages given a question. The size of the dataset and the fact that the questions are derived from real user search queries distinguish MS MARCO from other well-known publicly available datasets for machine reading comprehension and question answering. We believe that the scale and the real-world nature of this dataset make it attractive for benchmarking machine reading comprehension and question-answering models.
Tasks Machine Reading Comprehension, Question Answering, Reading Comprehension
Published 2016-11-28
URL http://arxiv.org/abs/1611.09268v3
PDF http://arxiv.org/pdf/1611.09268v3.pdf
PWC https://paperswithcode.com/paper/ms-marco-a-human-generated-machine-reading
Repo https://github.com/microsoft/MSMARCO-Question-Answering
Framework none

Full-Capacity Unitary Recurrent Neural Networks

Title Full-Capacity Unitary Recurrent Neural Networks
Authors Scott Wisdom, Thomas Powers, John R. Hershey, Jonathan Le Roux, Les Atlas
Abstract Recurrent neural networks are powerful models for processing sequential data, but they are generally plagued by vanishing and exploding gradient problems. Unitary recurrent neural networks (uRNNs), which use unitary recurrence matrices, have recently been proposed as a means to avoid these issues. However, in previous experiments, the recurrence matrices were restricted to be a product of parameterized unitary matrices, and an open question remains: when does such a parameterization fail to represent all unitary matrices, and how does this restricted representational capacity limit what can be learned? To address this question, we propose full-capacity uRNNs that optimize their recurrence matrix over all unitary matrices, leading to significantly improved performance over uRNNs that use a restricted-capacity recurrence matrix. Our contribution consists of two main components. First, we provide a theoretical argument to determine if a unitary parameterization has restricted capacity. Using this argument, we show that a recently proposed unitary parameterization has restricted capacity for hidden state dimension greater than 7. Second, we show how a complete, full-capacity unitary recurrence matrix can be optimized over the differentiable manifold of unitary matrices. The resulting multiplicative gradient step is very simple and does not require gradient clipping or learning rate adaptation. We confirm the utility of our claims by empirically evaluating our new full-capacity uRNNs on both synthetic and natural data, achieving superior performance compared to both LSTMs and the original restricted-capacity uRNNs.
Tasks Sequential Image Classification
Published 2016-10-31
URL http://arxiv.org/abs/1611.00035v1
PDF http://arxiv.org/pdf/1611.00035v1.pdf
PWC https://paperswithcode.com/paper/full-capacity-unitary-recurrent-neural
Repo https://github.com/stwisdom/urnn
Framework none
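
The multiplicative step can be sketched directly in NumPy: form a skew-Hermitian matrix from the Euclidean gradient and apply a Cayley transform, so the updated recurrence matrix stays exactly unitary (a sketch of the manifold step described in the abstract; the training machinery is omitted):

```python
import numpy as np

def cayley_update(W, grad, lr):
    """One multiplicative gradient step on the manifold of unitary matrices."""
    A = grad @ W.conj().T - W @ grad.conj().T    # skew-Hermitian: A^H = -A
    I = np.eye(W.shape[0], dtype=W.dtype)
    # (I + lr/2 A)^{-1} (I - lr/2 A) is unitary, so the product stays unitary.
    return np.linalg.solve(I + (lr / 2) * A, (I - (lr / 2) * A) @ W)

# Quick check of unitarity preservation:
rng = np.random.default_rng(0)
W = np.linalg.qr(rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8)))[0]
W = cayley_update(W, rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8)), 0.01)
assert np.allclose(W.conj().T @ W, np.eye(8))
```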

Character-level and Multi-channel Convolutional Neural Networks for Large-scale Authorship Attribution

Title Character-level and Multi-channel Convolutional Neural Networks for Large-scale Authorship Attribution
Authors Sebastian Ruder, Parsa Ghaffari, John G. Breslin
Abstract Convolutional neural networks (CNNs) have demonstrated a superior capability for extracting information from raw signals in computer vision. Recently, character-level and multi-channel CNNs have exhibited excellent performance on sentence classification tasks. We apply CNNs to large-scale authorship attribution, which aims to determine an unknown text’s author among many candidate authors, motivated by their ability to process character-level signals and to differentiate between a large number of classes, while making fast predictions in comparison to state-of-the-art approaches. We extensively evaluate CNN-based approaches that leverage word and character channels and compare them against state-of-the-art methods for a large range of author numbers, shedding new light on traditional approaches. We show that character-level CNNs outperform the state-of-the-art on four out of five datasets in different domains. Additionally, we present the first application of authorship attribution to Reddit.
Tasks Sentence Classification
Published 2016-09-21
URL http://arxiv.org/abs/1609.06686v1
PDF http://arxiv.org/pdf/1609.06686v1.pdf
PWC https://paperswithcode.com/paper/character-level-and-multi-channel
Repo https://github.com/asad1996172/Authorship-attribution-using-CNN
Framework none
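
A minimal PyTorch sketch of a character-level CNN in the spirit of the paper: embed characters, run parallel convolutions with several filter widths, max-pool over time, and classify among the candidate authors (all sizes are illustrative):

```python
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    def __init__(self, vocab_size=128, emb_dim=64, n_filters=100,
                 widths=(3, 4, 5), n_authors=50):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, n_filters, w) for w in widths)
        self.fc = nn.Linear(n_filters * len(widths), n_authors)

    def forward(self, chars):                      # chars: (batch, seq_len)
        x = self.emb(chars).transpose(1, 2)        # (batch, emb_dim, seq_len)
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))   # author logits
```

A multi-channel variant would run a parallel word-level branch and concatenate the pooled features before the classifier.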

BASS Net: Band-Adaptive Spectral-Spatial Feature Learning Neural Network for Hyperspectral Image Classification

Title BASS Net: Band-Adaptive Spectral-Spatial Feature Learning Neural Network for Hyperspectral Image Classification
Authors Anirban Santara, Kaustubh Mani, Pranoot Hatwar, Ankit Singh, Ankur Garg, Kirti Padia, Pabitra Mitra
Abstract Deep learning based landcover classification algorithms have recently been proposed in the literature. In hyperspectral images (HSI) they face the challenges of large dimensionality, spatial variability of spectral signatures, and scarcity of labeled data. In this article we propose an end-to-end deep learning architecture that extracts band-specific spectral-spatial features and performs landcover classification. The architecture has fewer independent connection weights and thus requires less training data. The method is found to outperform the highest reported accuracies on popular hyperspectral image datasets.
Tasks Hyperspectral Image Classification, Image Classification
Published 2016-12-01
URL http://arxiv.org/abs/1612.00144v2
PDF http://arxiv.org/pdf/1612.00144v2.pdf
PWC https://paperswithcode.com/paper/bass-net-band-adaptive-spectral-spatial
Repo https://github.com/kaustubh0mani/BASS-Net
Framework torch
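
A sketch of the band-adaptive idea: split the spectral bands into groups and learn group-specific 1D filters in parallel, then concatenate the features (layer sizes are illustrative, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class BandAdaptiveBlock(nn.Module):
    def __init__(self, n_bands=220, n_groups=10, n_filters=20):
        super().__init__()
        self.group_size = n_bands // n_groups
        # One small spectral convolution per band group; sharing weights only
        # within a group keeps the number of independent parameters low.
        self.branches = nn.ModuleList(
            nn.Conv1d(1, n_filters, kernel_size=3, padding=1)
            for _ in range(n_groups))

    def forward(self, spectra):                  # (batch, n_bands) per pixel
        groups = spectra.split(self.group_size, dim=1)
        feats = [branch(g.unsqueeze(1)).flatten(1)
                 for branch, g in zip(self.branches, groups)]
        return torch.cat(feats, dim=1)           # band-specific features
```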