Paper Group ANR 415
Multimodal Semantic Simulations of Linguistically Underspecified Motion Events. On Optimality Conditions for Auto-Encoder Signal Recovery. Bayesian Hyperparameter Optimization for Ensemble Learning. Competitive analysis of the top-K ranking problem. End-to-End Neural Sentence Ordering Using Pointer Network. A Novel Representation of Neural Networks …
Multimodal Semantic Simulations of Linguistically Underspecified Motion Events
Title | Multimodal Semantic Simulations of Linguistically Underspecified Motion Events |
Authors | Nikhil Krishnaswamy, James Pustejovsky |
Abstract | In this paper, we describe a system for generating three-dimensional visual simulations of natural language motion expressions. We use a rich formal model of events and their participants to generate simulations that satisfy the minimal constraints entailed by the associated utterance, relying on semantic knowledge of physical objects and motion events. This paper outlines technical considerations and discusses the implementation of the aforementioned semantic models in such a system. |
Tasks | |
Published | 2016-10-03 |
URL | http://arxiv.org/abs/1610.00602v1 |
http://arxiv.org/pdf/1610.00602v1.pdf | |
PWC | https://paperswithcode.com/paper/multimodal-semantic-simulations-of |
Repo | |
Framework | |
On Optimality Conditions for Auto-Encoder Signal Recovery
Title | On Optimality Conditions for Auto-Encoder Signal Recovery |
Authors | Devansh Arpit, Yingbo Zhou, Hung Q. Ngo, Nils Napp, Venu Govindaraju |
Abstract | Auto-encoders are unsupervised models that aim to learn patterns from observed data by minimizing a reconstruction cost. The useful representations they learn are often found to be sparse and distributed. Compressed sensing and sparse coding, on the other hand, assume a data-generating process in which the observed data are generated from some true latent signal source, and they try to recover the corresponding signal from measurements. Looking at auto-encoders from this signal recovery perspective enables a more coherent view of these techniques. In particular, we show in this paper that the true hidden representation can be approximately recovered if the weight matrices are highly incoherent with unit $\ell^{2}$ row length and the bias vectors take values (approximately) equal to the negative of the data mean. The recovery also becomes increasingly accurate as the sparsity of the hidden signals increases. Additionally, we empirically demonstrate that auto-encoders can recover the data-generating dictionary when only data samples are given. |
Tasks | |
Published | 2016-05-23 |
URL | http://arxiv.org/abs/1605.07145v2 |
http://arxiv.org/pdf/1605.07145v2.pdf | |
PWC | https://paperswithcode.com/paper/on-optimality-conditions-for-auto-encoder |
Repo | |
Framework | |
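The recovery condition above is easy to probe numerically. Below is a minimal numpy sketch, not the authors' code: it assumes zero-mean data (so the bias term drops out) and a sparse, non-negative hidden signal, and checks that an incoherent weight matrix with unit $\ell^{2}$ rows approximately recovers the hidden signal in a single encoder pass.

```python
# Minimal sketch of the signal-recovery view of auto-encoders (assumptions:
# zero-mean data so the bias vanishes; sparse, non-negative hidden signal).
import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_visible = 200, 5000

# Incoherent dictionary: random Gaussian rows normalized to unit l2 length.
W = rng.standard_normal((n_hidden, n_visible))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# Sparse non-negative "true" hidden signal with 5 active units.
h_true = np.zeros(n_hidden)
h_true[rng.choice(n_hidden, size=5, replace=False)] = rng.uniform(1, 2, 5)

# Observed data generated from the latent signal through the dictionary.
x = W.T @ h_true

# Encoder pass: rows of W are nearly orthogonal in high dimension, so
# W @ x = (W @ W.T) @ h_true ~ h_true; ReLU suppresses the negative half
# of the residual cross-talk.
h_rec = np.maximum(W @ x, 0.0)
print("relative error:", np.linalg.norm(h_rec - h_true) / np.linalg.norm(h_true))
```

Increasing the sparsity of `h_true` (or the data dimension) shrinks the error, matching the abstract's claim that recovery improves with sparsity.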
Bayesian Hyperparameter Optimization for Ensemble Learning
Title | Bayesian Hyperparameter Optimization for Ensemble Learning |
Authors | Julien-Charles Lévesque, Christian Gagné, Robert Sabourin |
Abstract | In this paper, we bridge the gap between hyperparameter optimization and ensemble learning by performing Bayesian optimization of an ensemble with regard to its hyperparameters. Our method consists of building a fixed-size ensemble, optimizing the configuration of one classifier of the ensemble at each iteration of the hyperparameter optimization algorithm, and taking the interaction with the other models into consideration when evaluating potential performances. We also consider the case where the ensemble is to be reconstructed at the end of the hyperparameter optimization phase, through a greedy selection over the pool of models generated during the optimization. We study the performance of our proposed method on three different hyperparameter spaces, showing that our approach outperforms both the best single model and a greedy ensemble construction over the models produced by a standard Bayesian optimization. |
Tasks | Hyperparameter Optimization |
Published | 2016-05-20 |
URL | http://arxiv.org/abs/1605.06394v1 |
http://arxiv.org/pdf/1605.06394v1.pdf | |
PWC | https://paperswithcode.com/paper/bayesian-hyperparameter-optimization-for |
Repo | |
Framework | |
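The post-hoc reconstruction step the abstract mentions is essentially greedy forward selection over a model pool. The sketch below shows that generic step under simplifying assumptions (hard majority voting; the `pool_preds` and `y_val` arrays are illustrative stand-ins, not the authors' API).

```python
# Hedged sketch of greedy ensemble selection over a pool of models produced
# during Bayesian optimization, scored by majority-vote validation accuracy.
import numpy as np

def greedy_ensemble(pool_preds, y_val, ensemble_size):
    """pool_preds: (n_models, n_val) hard class predictions on validation data."""
    chosen = []
    for _ in range(ensemble_size):
        best_m, best_acc = None, -1.0
        for m in range(len(pool_preds)):
            member_preds = pool_preds[chosen + [m]]          # (k, n_val)
            # Majority vote of the candidate ensemble, column by column.
            votes = np.apply_along_axis(
                lambda col: np.bincount(col).argmax(), 0, member_preds)
            acc = (votes == y_val).mean()
            if acc > best_acc:
                best_m, best_acc = m, acc
        chosen.append(best_m)    # selection with replacement, as in greedy forward selection
    return chosen

# Toy usage: 6 pooled models, 10 validation points, binary labels.
rng = np.random.default_rng(1)
y_val = rng.integers(0, 2, 10)
pool_preds = rng.integers(0, 2, (6, 10))
print(greedy_ensemble(pool_preds, y_val, ensemble_size=3))
```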
Competitive analysis of the top-K ranking problem
Title | Competitive analysis of the top-K ranking problem |
Authors | Xi Chen, Sivakanth Gopi, Jieming Mao, Jon Schneider |
Abstract | Motivated by applications in recommender systems, web search, social choice and crowdsourcing, we consider the problem of identifying the set of top $K$ items from noisy pairwise comparisons. In our setting, we are non-actively given $r$ pairwise comparisons between each pair of $n$ items, where each comparison has noise constrained by a very general noise model called the strong stochastic transitivity (SST) model. We analyze the competitive ratio of algorithms for the top-$K$ problem. In particular, we present a linear-time algorithm for the top-$K$ problem which has a competitive ratio of $\tilde{O}(\sqrt{n})$; i.e., to solve any instance of top-$K$, our algorithm needs at most $\tilde{O}(\sqrt{n})$ times as many samples as the best possible algorithm for that instance (in contrast, all previously known algorithms for the top-$K$ problem have competitive ratios of $\tilde{\Omega}(n)$ or worse). We further show that this is tight: any algorithm for the top-$K$ problem has competitive ratio at least $\tilde{\Omega}(\sqrt{n})$. |
Tasks | Recommendation Systems |
Published | 2016-05-12 |
URL | http://arxiv.org/abs/1605.03933v1 |
http://arxiv.org/pdf/1605.03933v1.pdf | |
PWC | https://paperswithcode.com/paper/competitive-analysis-of-the-top-k-ranking |
Repo | |
Framework | |
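For intuition, the simplest estimator in this setting is a counting (Borda-style) baseline: rank items by how many comparisons they win. The sketch below implements that baseline on a toy BTL-style instance (BTL noise is a special case of SST); the paper's linear-time algorithm achieving the $\tilde{O}(\sqrt{n})$ competitive ratio is more involved.

```python
# Counting baseline for top-K from noisy pairwise comparisons -- illustrative
# only, not the paper's algorithm.
import numpy as np

def top_k_by_wins(comparisons, n, k):
    """comparisons: iterable of (i, j) pairs meaning item i beat item j."""
    wins = np.zeros(n)
    for i, j in comparisons:
        wins[i] += 1
    # Items with the most wins are declared the top-K set.
    return set(np.argsort(-wins)[:k])

# Toy instance: 5 items with latent strengths; r = 20 noisy comparisons per pair.
rng = np.random.default_rng(2)
strength = np.array([0.9, 0.7, 0.5, 0.3, 0.1])
comps = []
for i in range(5):
    for j in range(i + 1, 5):
        p_i_beats_j = strength[i] / (strength[i] + strength[j])  # BTL-style noise
        for _ in range(20):
            comps.append((i, j) if rng.random() < p_i_beats_j else (j, i))
print(top_k_by_wins(comps, n=5, k=2))   # expected: {0, 1} with high probability
```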
End-to-End Neural Sentence Ordering Using Pointer Network
Title | End-to-End Neural Sentence Ordering Using Pointer Network |
Authors | Jingjing Gong, Xinchi Chen, Xipeng Qiu, Xuanjing Huang |
Abstract | Sentence ordering is one of the important tasks in NLP. Previous work mainly focused on improving performance through pair-wise strategies. However, it is nontrivial for pair-wise models to incorporate contextual sentence information, and the pipeline strategy used in such models can introduce error propagation. In this paper, we propose an end-to-end neural approach to the sentence ordering problem, which uses a pointer network (Ptr-Net) to alleviate the error propagation problem and to utilize the whole contextual information. Experimental results show the effectiveness of the proposed model. The source code and dataset for this paper are available. |
Tasks | Sentence Ordering |
Published | 2016-11-15 |
URL | http://arxiv.org/abs/1611.04953v2 |
http://arxiv.org/pdf/1611.04953v2.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-neural-sentence-ordering-using |
Repo | |
Framework | |
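A pointer network orders sentences by repeatedly attending over the encodings of the not-yet-placed sentences and selecting the next one. The numpy sketch below runs one greedy decoding loop with additive attention; all dimensions and weights are illustrative stand-ins, not the paper's trained model.

```python
# Rough sketch of Ptr-Net greedy decoding for sentence ordering: additive
# attention over sentence encodings, with already-placed sentences masked out.
import numpy as np

rng = np.random.default_rng(3)
d = 8                              # hidden size
E = rng.standard_normal((5, d))    # encodings of 5 shuffled sentences
W1, W2 = rng.standard_normal((d, d)), rng.standard_normal((d, d))
v = rng.standard_normal(d)

def pointer_step(dec_state, placed):
    scores = np.tanh(E @ W1.T + dec_state @ W2.T) @ v   # (5,) attention logits
    scores[list(placed)] = -np.inf                       # cannot re-select
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return probs.argmax(), probs

placed, dec_state = set(), np.zeros(d)
for t in range(5):                 # greedy decoding of the full order
    idx, _ = pointer_step(dec_state, placed)
    placed.add(idx)
    dec_state = E[idx]             # feed the chosen sentence back in (simplified)
    print("step", t, "-> sentence", idx)
```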
A Novel Representation of Neural Networks
Title | A Novel Representation of Neural Networks |
Authors | Anthony Caterini, Dong Eui Chang |
Abstract | Deep Neural Networks (DNNs) have become very popular for prediction in many areas. Their strength lies in representations with a large number of parameters, which are commonly learned via gradient descent or similar optimization methods. However, the representation is non-standardized, and the gradient calculations are often performed using component-based approaches that break parameters down into scalar units instead of considering the parameters as whole entities. In this work, these problems are addressed. Standard notation is used to represent DNNs in a compact framework, and gradients of DNN loss functions are calculated directly over the inner product space on which the parameters are defined. This framework is general and is applied to two common network types: the Multilayer Perceptron and the Deep Autoencoder. |
Tasks | |
Published | 2016-10-05 |
URL | http://arxiv.org/abs/1610.01549v2 |
http://arxiv.org/pdf/1610.01549v2.pdf | |
PWC | https://paperswithcode.com/paper/a-novel-representation-of-neural-networks |
Repo | |
Framework | |
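As a small illustration of treating parameters as whole entities, the sketch below computes the gradient of a squared loss with respect to an entire weight matrix as one outer product, then checks it against a finite difference on a single entry. This is a generic example, not the paper's framework.

```python
# For a layer y = W @ x with loss 0.5*||W@x - t||^2, the gradient with
# respect to the full matrix W is the outer product (y - t) x^T, computed
# directly in the matrix inner product space rather than entry by entry.
import numpy as np

rng = np.random.default_rng(9)
W = rng.standard_normal((3, 4))
x, t = rng.standard_normal(4), rng.standard_normal(3)

y = W @ x
grad_W = np.outer(y - t, x)        # whole-matrix gradient

# Finite-difference check on one entry confirms the matrix-level formula.
eps = 1e-6
W2 = W.copy(); W2[1, 2] += eps
fd = (0.5*np.sum((W2 @ x - t)**2) - 0.5*np.sum((y - t)**2)) / eps
print(grad_W[1, 2], fd)
```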
Normalizing the Normalizers: Comparing and Extending Network Normalization Schemes
Title | Normalizing the Normalizers: Comparing and Extending Network Normalization Schemes |
Authors | Mengye Ren, Renjie Liao, Raquel Urtasun, Fabian H. Sinz, Richard S. Zemel |
Abstract | Normalization techniques have only recently begun to be exploited in supervised learning tasks. Batch normalization exploits mini-batch statistics to normalize the activations. This was shown to speed up training and result in better models. However, its success has been very limited when dealing with recurrent neural networks. Layer normalization, on the other hand, normalizes the activations across all activities within a layer. This was shown to work well in the recurrent setting. In this paper we propose a unified view of normalization techniques, as forms of divisive normalization, which includes layer and batch normalization as special cases. Our second contribution is the finding that a small modification to these normalization schemes, in conjunction with a sparse regularizer on the activations, leads to significant benefits over standard normalization techniques. We demonstrate the effectiveness of our unified divisive normalization framework in the context of convolutional neural nets and recurrent neural networks, showing improvements over baselines in image classification, language modeling, and super-resolution. |
Tasks | Image Classification, Language Modelling, Super-Resolution |
Published | 2016-11-14 |
URL | http://arxiv.org/abs/1611.04520v2 |
http://arxiv.org/pdf/1611.04520v2.pdf | |
PWC | https://paperswithcode.com/paper/normalizing-the-normalizers-comparing-and |
Repo | |
Framework | |
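The unified view can be sketched in a few lines: subtract a mean and divide by a smoothed norm, where the axis over which statistics are pooled determines the special case. Below is a simplified 2-D numpy sketch of this idea, not the paper's full formulation with learnable parameters and summation windows.

```python
# Divisive normalization: pooling statistics over the batch axis recovers
# batch-norm-like behavior; pooling over features recovers layer norm.
import numpy as np

def divisive_norm(x, axis, sigma=1e-3):
    """x: (batch, features). axis=0 ~ batch-norm statistics, axis=1 ~ layer norm."""
    mu = x.mean(axis=axis, keepdims=True)
    centered = x - mu
    denom = np.sqrt(sigma**2 + (centered**2).mean(axis=axis, keepdims=True))
    return centered / denom

x = np.random.default_rng(4).standard_normal((16, 10))
print(divisive_norm(x, axis=0).std(axis=0)[:3])   # ~1 per feature (batch-style)
print(divisive_norm(x, axis=1).std(axis=1)[:3])   # ~1 per example (layer-style)
```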
Herding as a Learning System with Edge-of-Chaos Dynamics
Title | Herding as a Learning System with Edge-of-Chaos Dynamics |
Authors | Yutian Chen, Max Welling |
Abstract | Herding defines a deterministic dynamical system at the edge of chaos. It generates a sequence of model states and parameters by alternating parameter perturbations with state maximizations, where the sequence of states can be interpreted as “samples” from an associated MRF model. Herding differs from maximum likelihood estimation in that the sequence of parameters does not converge to a fixed point and differs from an MCMC posterior sampling approach in that the sequence of states is generated deterministically. Herding may be interpreted as a “perturb and map” method where the parameter perturbations are generated using a deterministic nonlinear dynamical system rather than randomly from a Gumbel distribution. This chapter studies the distinct statistical characteristics of the herding algorithm and shows that the fast convergence rate of the controlled moments may be attributed to edge of chaos dynamics. The herding algorithm can also be generalized to models with latent variables and to a discriminative learning setting. The perceptron cycling theorem ensures that the fast moment matching property is preserved in the more general framework. |
Tasks | |
Published | 2016-02-09 |
URL | http://arxiv.org/abs/1602.03014v2 |
http://arxiv.org/pdf/1602.03014v2.pdf | |
PWC | https://paperswithcode.com/paper/herding-as-a-learning-system-with-edge-of |
Repo | |
Framework | |
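The core herding loop is short: a state maximization followed by a parameter perturbation, with no step size and no randomness. A toy sketch on a four-state space with an identity feature map (the target moments below are made up for illustration):

```python
# Minimal herding loop: empirical moments of the deterministically generated
# "samples" track the target moments at a fast O(1/T) rate.
import numpy as np

states = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # phi(s) = s
target_moments = np.array([0.7, 0.3])   # desired E[phi(s)]

w = target_moments.copy()
samples = []
for t in range(2000):
    s = states[np.argmax(states @ w)]   # state maximization: argmax_s <w, phi(s)>
    w += target_moments - s             # parameter perturbation (no step size)
    samples.append(s)

print("empirical moments:", np.mean(samples, axis=0))  # ~[0.7, 0.3]
```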
Optimistic and Pessimistic Neural Networks for Scene and Object Recognition
Title | Optimistic and Pessimistic Neural Networks for Scene and Object Recognition |
Authors | Rene Grzeszick, Sebastian Sudholt, Gernot A. Fink |
Abstract | In this paper the application of uncertainty modeling to convolutional neural networks is evaluated. A novel method for adjusting the network’s predictions based on uncertainty information is introduced. This allows the network to be either optimistic or pessimistic in its prediction scores. The proposed method builds on the idea of applying dropout at test time and sampling a predictive mean and variance from the network’s output. Besides the methodological aspects, implementation details allowing for a fast evaluation are presented. Furthermore, a multilabel network architecture is introduced that strongly benefits from the presented approach. In the evaluation it will be shown that modeling uncertainty allows for improving the performance of a given model purely at test time without any further training steps. The evaluation considers several applications in the field of computer vision, including object classification and detection as well as scene attribute recognition. |
Tasks | Object Classification, Object Recognition |
Published | 2016-09-26 |
URL | http://arxiv.org/abs/1609.07982v2 |
http://arxiv.org/pdf/1609.07982v2.pdf | |
PWC | https://paperswithcode.com/paper/optimistic-and-pessimistic-neural-networks |
Repo | |
Framework | |
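The optimistic/pessimistic adjustment can be sketched as follows: keep dropout active at test time, draw T stochastic forward passes, and shift the predictive mean by a multiple of the predictive standard deviation. The `fake_net` below is a stand-in for a real dropout network; the sign of `kappa` selects the optimistic or pessimistic variant.

```python
# Hedged sketch of test-time uncertainty adjustment via dropout sampling.
import numpy as np

def adjusted_scores(stochastic_forward, x, T=50, kappa=1.0):
    """stochastic_forward(x) -> class scores, with dropout resampled each call.
    kappa > 0 gives optimistic scores, kappa < 0 pessimistic ones."""
    draws = np.stack([stochastic_forward(x) for _ in range(T)])  # (T, n_classes)
    mean, std = draws.mean(axis=0), draws.std(axis=0)
    return mean + kappa * std

# Toy stand-in: a "network" whose dropout noise we fake with Gaussian jitter.
rng = np.random.default_rng(5)
fake_net = lambda x: np.array([0.6, 0.4]) + 0.05 * rng.standard_normal(2)
print("optimistic :", adjusted_scores(fake_net, None, kappa=+1.0))
print("pessimistic:", adjusted_scores(fake_net, None, kappa=-1.0))
```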
Identity-sensitive Word Embedding through Heterogeneous Networks
Title | Identity-sensitive Word Embedding through Heterogeneous Networks |
Authors | Jian Tang, Meng Qu, Qiaozhu Mei |
Abstract | Most existing word embedding approaches do not distinguish the same word in different contexts, thereby ignoring its contextual meanings. As a result, the learned embeddings of such words are usually a mixture of multiple meanings. In this paper, we acknowledge multiple identities of the same word in different contexts and learn identity-sensitive word embeddings. Based on an identity-labeled text corpus, a heterogeneous network of words and word identities is constructed to model different levels of word co-occurrence. The heterogeneous network is further embedded into a low-dimensional space through a principled network embedding approach, through which we obtain the embeddings of words and the embeddings of word identities. We study three different types of word identities: topics, sentiments, and categories. Experimental results on real-world data sets show that the identity-sensitive word embeddings learned by our approach indeed capture different meanings of words and outperform competitive methods on tasks including text classification and word similarity computation. |
Tasks | Network Embedding, Text Classification, Word Embeddings |
Published | 2016-11-29 |
URL | http://arxiv.org/abs/1611.09878v1 |
http://arxiv.org/pdf/1611.09878v1.pdf | |
PWC | https://paperswithcode.com/paper/identity-sensitive-word-embedding-through |
Repo | |
Framework | |
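As a toy illustration of the construction, the sketch below counts word-word and word-to-identity co-occurrences from a tiny identity-labeled corpus; the corpus, labels, and edge weighting are all made up, and the subsequent network embedding step is omitted.

```python
# Toy heterogeneous network: word-word edges plus edges from each word to its
# (word, topic) identity node, so "apple" gets distinct identities per topic.
from collections import Counter
from itertools import combinations

corpus = [("apple banana fruit", "food"),
          ("apple iphone device", "tech")]

ww, wi = Counter(), Counter()
for text, topic in corpus:
    words = text.split()
    for a, b in combinations(words, 2):
        ww[(a, b)] += 1                      # word-word edges
    for w in words:
        wi[(w, (w, topic))] += 1             # word-to-identity edges
print(wi[("apple", ("apple", "food"))], wi[("apple", ("apple", "tech"))])
```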
A Dynamic Window Neural Network for CCG Supertagging
Title | A Dynamic Window Neural Network for CCG Supertagging |
Authors | Huijia Wu, Jiajun Zhang, Chengqing Zong |
Abstract | Combinatory Categorial Grammar (CCG) supertagging is the task of assigning lexical categories to each word in a sentence. Almost all previous methods use fixed context window sizes as input features; however, different tags usually rely on different context window sizes. This motivates us to build a supertagger with a dynamic window approach, which can be treated as an attention mechanism over the local contexts. Applying dropout on the dynamic filters can be seen as dropping words directly, which is superior to regular dropout on word embeddings. We use this approach to achieve state-of-the-art CCG supertagging performance on the standard test set. |
Tasks | CCG Supertagging, Word Embeddings |
Published | 2016-10-10 |
URL | http://arxiv.org/abs/1610.02749v1 |
http://arxiv.org/pdf/1610.02749v1.pdf | |
PWC | https://paperswithcode.com/paper/a-dynamic-window-neural-network-for-ccg |
Repo | |
Framework | |
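The dynamic window can be read as soft selection over a maximal local context: attention weights decide how much each neighboring word contributes, instead of a hard window cutoff. A toy numpy sketch with made-up dimensions and parameters:

```python
# Attention over a maximal local context lets the model choose its effective
# window per word; dropping a weight is akin to dropping the word itself.
import numpy as np

rng = np.random.default_rng(10)
d, max_win = 6, 3
context = rng.standard_normal((2 * max_win + 1, d))   # embeddings around one word
u = rng.standard_normal(d)                            # attention parameter

scores = context @ u
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()
pooled = alpha @ context     # attention-pooled context feature for the tagger
print(np.round(alpha, 3))    # the "dynamic window": per-position weights
```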
Manifolds of Projective Shapes
Title | Manifolds of Projective Shapes |
Authors | Thomas Hotz, Florian Kelma, John T. Kent |
Abstract | The projective shape of a configuration of k points or “landmarks” in RP(d) consists of the information that is invariant under projective transformations and hence is reconstructable from uncalibrated camera views. Mathematically, the space of projective shapes for these k landmarks can be described as the quotient space of k copies of RP(d) modulo the action of the projective linear group PGL(d). Using homogeneous coordinates, such configurations can be described as real $k \times (d+1)$ matrices given up to left-multiplication by non-singular diagonal matrices, while the group PGL(d) acts as GL(d+1) from the right. The main purpose of this paper is to give a detailed examination of the topology of projective shape space, and, using matrix notation, it is shown how to derive subsets that are, in a certain sense, maximal differentiable Hausdorff manifolds which can be provided with a Riemannian metric. A special subclass of the projective shapes consists of the Tyler regular shapes, for which geometrically motivated pre-shapes can be defined, thus allowing for the construction of a natural Riemannian metric. |
Tasks | |
Published | 2016-02-13 |
URL | http://arxiv.org/abs/1602.04330v4 |
http://arxiv.org/pdf/1602.04330v4.pdf | |
PWC | https://paperswithcode.com/paper/manifolds-of-projective-shapes |
Repo | |
Framework | |
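A quick numerical way to see projective invariance in the simplest case, RP(1): the cross-ratio of four collinear points is unchanged when a GL(2) element acts on their homogeneous coordinates, even though the points themselves move. (This is an illustrative aside, not a construction from the paper.)

```python
# Cross-ratio invariance under a projective transformation of the line.
import numpy as np

def cross_ratio(p):                      # p: four points on the real line
    return ((p[0]-p[2])*(p[1]-p[3])) / ((p[1]-p[2])*(p[0]-p[3]))

pts = np.array([0.0, 1.0, 2.0, 5.0])
H = np.array([[2.0, 1.0], [1.0, 3.0]])   # an invertible element of GL(2)
hom = np.stack([pts, np.ones(4)])        # homogeneous coordinates, shape (2, 4)
moved = H @ hom
moved = moved[0] / moved[1]              # back to affine coordinates on the line
print(cross_ratio(pts), cross_ratio(moved))   # both 1.6, up to rounding
```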
Convergence rate of stochastic k-means
Title | Convergence rate of stochastic k-means |
Authors | Cheng Tang, Claire Monteleoni |
Abstract | We analyze the online (Bottou and Bengio) and mini-batch (Sculley) $k$-means variants. Both scale up the widely used $k$-means algorithm via stochastic approximation, and both have become popular for large-scale clustering and unsupervised feature learning. We show, for the first time, that starting from any initial solution, they converge to a “local optimum” at rate $O(\frac{1}{t})$ (in terms of the $k$-means objective) under general conditions. In addition, we show that if the dataset is clusterable, then, when initialized with a simple and scalable seeding algorithm, mini-batch $k$-means converges to an optimal $k$-means solution at rate $O(\frac{1}{t})$ with high probability. The $k$-means objective is non-convex and non-differentiable; we exploit ideas from recent work on stochastic gradient descent for non-convex problems (Ge et al.; Balsubramani et al.) by providing a novel characterization of the trajectory of the $k$-means algorithm on its solution space, and we circumvent the non-differentiability problem via geometric insights about the $k$-means update. |
Tasks | |
Published | 2016-11-16 |
URL | http://arxiv.org/abs/1611.05132v1 |
http://arxiv.org/pdf/1611.05132v1.pdf | |
PWC | https://paperswithcode.com/paper/convergence-rate-of-stochastic-k-means |
Repo | |
Framework | |
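The mini-batch variant analyzed here uses a per-center learning rate of 1/count, which is what drives the $O(\frac{1}{t})$ behavior. A compact sketch follows (with plain random seeding; the clusterability result assumes a more careful seeding scheme):

```python
# Mini-batch k-means in the style of Sculley's variant: assign a mini-batch,
# then move each center toward its assigned points with step size 1/count.
import numpy as np

def minibatch_kmeans(X, k, batch_size=32, iters=500, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    counts = np.zeros(k)
    for _ in range(iters):
        batch = X[rng.choice(len(X), batch_size, replace=False)]
        # Assign each batch point to its nearest center.
        d2 = ((batch[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(axis=1)
        for x, c in zip(batch, assign):
            counts[c] += 1
            centers[c] += (x - centers[c]) / counts[c]   # O(1/t)-style step size
    return centers

# Toy data: two well-separated Gaussian blobs.
X = np.random.default_rng(6).standard_normal((500, 2)) + \
    np.repeat(np.array([[0, 0], [6, 6]]), 250, axis=0)
print(minibatch_kmeans(X, k=2))
```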
Tensor Sparse and Low-Rank based Submodule Clustering Method for Multi-way Data
Title | Tensor Sparse and Low-Rank based Submodule Clustering Method for Multi-way Data |
Authors | Xinglin Piao, Yongli Hu, Junbin Gao, Yanfeng Sun, Zhouchen Lin, Baocai Yin |
Abstract | A new submodule clustering method via sparse and low-rank representation for multi-way data is proposed in this paper. Instead of reshaping multi-way data into vectors, this method maintains their natural orders to preserve intrinsic data structures, e.g., image data are kept as matrices. To implement clustering, the multi-way data, viewed as tensors, are represented by the proposed tensor sparse and low-rank model to obtain a submodule representation, called a free module, which is finally used for spectral clustering. The proposed method extends the conventional subspace clustering method based on sparse and low-rank representation to multi-way data submodule clustering by incorporating the t-product operator. The new method is tested on several public datasets, including synthetic data, video sequences and toy images. The experiments show that the new method outperforms state-of-the-art methods such as Sparse Subspace Clustering (SSC), Low-Rank Representation (LRR), Ordered Subspace Clustering (OSC), Robust Latent Low Rank Representation (RobustLatLRR) and Sparse Submodule Clustering (SSmC). |
Tasks | |
Published | 2016-01-02 |
URL | http://arxiv.org/abs/1601.00149v7 |
http://arxiv.org/pdf/1601.00149v7.pdf | |
PWC | https://paperswithcode.com/paper/tensor-sparse-and-low-rank-based-submodule |
Repo | |
Framework | |
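The t-product the method builds on has a convenient implementation: take the DFT along the third mode, multiply the frontal slices, and invert the transform. A minimal numpy sketch:

```python
# t-product of third-order tensors via the FFT along the third mode.
import numpy as np

def t_product(A, B):
    """A: (n1, n2, n3), B: (n2, n4, n3) -> (n1, n4, n3)."""
    Af = np.fft.fft(A, axis=2)
    Bf = np.fft.fft(B, axis=2)
    Cf = np.einsum('ijk,jlk->ilk', Af, Bf)   # slice-wise matrix products
    return np.real(np.fft.ifft(Cf, axis=2))

A = np.random.default_rng(7).standard_normal((3, 4, 5))
B = np.random.default_rng(8).standard_normal((4, 2, 5))
print(t_product(A, B).shape)   # (3, 2, 5)
```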
Learning an Optimization Algorithm through Human Design Iterations
Title | Learning an Optimization Algorithm through Human Design Iterations |
Authors | Thurston Sexton, Max Yi Ren |
Abstract | Solving optimal design problems through crowdsourcing faces a dilemma: on one hand, human beings have been shown to be more effective than algorithms at searching for good solutions to certain real-world problems with high-dimensional or discrete solution spaces; on the other hand, the cost of setting up crowdsourcing environments, the uncertainty in the crowd’s domain-specific competence, and the lack of commitment of the crowd all contribute to the lack of real-world applications of design crowdsourcing. We are thus motivated to investigate a solution-searching mechanism in which an optimization algorithm is tuned based on human demonstrations of solution searching, so that the search can be continued after human participants abandon the problem. To do so, we model the iterative search process as a Bayesian Optimization (BO) algorithm and propose an inverse BO (IBO) algorithm to find the maximum likelihood estimators of the BO parameters based on human solutions. We show through a vehicle design and control problem that the search performance of BO can be improved by recovering its parameters from an effective human search. IBO thus has the potential to improve the success rate of design crowdsourcing activities by requiring only good search strategies, rather than good solutions, from the crowd. |
Tasks | |
Published | 2016-08-24 |
URL | http://arxiv.org/abs/1608.06984v4 |
http://arxiv.org/pdf/1608.06984v4.pdf | |
PWC | https://paperswithcode.com/paper/learning-an-optimization-algorithm-through |
Repo | |
Framework | |
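A stripped-down sketch of the IBO idea under strong simplifications: given a human's sequence of tried designs on a 1-D toy problem, score candidate GP length-scales by how often an EI-driven BO step reproduces the human's next query, and keep the best. The kernel, acquisition, and matching criterion here are all illustrative stand-ins for the paper's maximum-likelihood estimator.

```python
# Toy "inverse BO": fit a GP length-scale to a human search trace by checking
# how well expected-improvement steps predict the human's next query.
import numpy as np
from scipy.stats import norm

def gp_posterior(Xs, ys, Xq, ls, noise=1e-6):
    k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ls**2)
    K = k(Xs, Xs) + noise * np.eye(len(Xs))
    Ks = k(Xq, Xs)
    mu = Ks @ np.linalg.solve(K, ys)
    var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    return mu, np.maximum(var, 1e-12)

def ei_argmax(Xs, ys, grid, ls):
    mu, var = gp_posterior(Xs, ys, grid, ls)
    s = np.sqrt(var)
    z = (ys.min() - mu) / s                     # EI for minimization
    return grid[(s * (z * norm.cdf(z) + norm.pdf(z))).argmax()]

# Human demonstration: queries on f(x) = (x - 0.3)^2, recorded in order.
f = lambda x: (x - 0.3)**2
human_x = np.array([0.0, 1.0, 0.4, 0.31])
grid = np.linspace(0, 1, 101)
for ls in [0.05, 0.2, 0.5]:
    hits = sum(abs(ei_argmax(human_x[:t], f(human_x[:t]), grid, ls) - human_x[t]) < 0.05
               for t in range(2, len(human_x)))
    print(f"length-scale {ls}: matched {hits} of {len(human_x) - 2} human moves")
```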