Paper Group ANR 415
Multimodal Semantic Simulations of Linguistically Underspecified Motion Events. On Optimality Conditions for Auto-Encoder Signal Recovery. Bayesian Hyperparameter Optimization for Ensemble Learning. Competitive analysis of the top-K ranking problem. End-to-End Neural Sentence Ordering Using Pointer Network. A Novel Representation of Neural Networks …
Multimodal Semantic Simulations of Linguistically Underspecified Motion Events
Title | Multimodal Semantic Simulations of Linguistically Underspecified Motion Events |
Authors | Nikhil Krishnaswamy, James Pustejovsky |
Abstract | In this paper, we describe a system for generating three-dimensional visual simulations of natural language motion expressions. We use a rich formal model of events and their participants to generate simulations that satisfy the minimal constraints entailed by the associated utterance, relying on semantic knowledge of physical objects and motion events. This paper outlines technical considerations and discusses the implementation of the aforementioned semantic models in such a system. |
Tasks | |
Published | 2016-10-03 |
URL | http://arxiv.org/abs/1610.00602v1 |
http://arxiv.org/pdf/1610.00602v1.pdf | |
PWC | https://paperswithcode.com/paper/multimodal-semantic-simulations-of |
Repo | |
Framework | |
On Optimality Conditions for Auto-Encoder Signal Recovery
Title | On Optimality Conditions for Auto-Encoder Signal Recovery |
Authors | Devansh Arpit, Yingbo Zhou, Hung Q. Ngo, Nils Napp, Venu Govindaraju |
Abstract | Auto-encoders are unsupervised models that aim to learn patterns from observed data by minimizing a reconstruction cost. The useful representations they learn are often found to be sparse and distributed. Compressed sensing and sparse coding, on the other hand, assume a data-generating process in which the observed data are generated from some true latent signal source, and they try to recover the corresponding signal from measurements. Looking at auto-encoders from this signal recovery perspective enables a more coherent view of these techniques. In particular, we show in this paper that the true hidden representation can be approximately recovered if the weight matrices are highly incoherent with unit $\ell^{2}$ row length and the bias vectors take values (approximately) equal to the negative of the data mean. The recovery also becomes increasingly accurate as the sparsity of the hidden signals increases. Additionally, we empirically demonstrate that auto-encoders can recover the data-generating dictionary when only data samples are given. |
Tasks | |
Published | 2016-05-23 |
URL | http://arxiv.org/abs/1605.07145v2 |
http://arxiv.org/pdf/1605.07145v2.pdf | |
PWC | https://paperswithcode.com/paper/on-optimality-conditions-for-auto-encoder |
Repo | |
Framework | |
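The recovery condition above is easy to probe numerically. Below is a minimal numpy sketch, not the authors' code: it assumes zero-mean data (so the bias term drops out) and a sparse, non-negative hidden signal, and checks that an incoherent weight matrix with unit $\ell^{2}$ rows approximately recovers the hidden signal in a single encoder pass.

```python
# Minimal sketch of the signal-recovery view of auto-encoders (assumptions:
# zero-mean data so the bias vanishes; sparse, non-negative hidden signal).
import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_visible = 200, 5000

# Incoherent dictionary: random Gaussian rows normalized to unit l2 length.
W = rng.standard_normal((n_hidden, n_visible))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# Sparse non-negative "true" hidden signal with 5 active units.
h_true = np.zeros(n_hidden)
h_true[rng.choice(n_hidden, size=5, replace=False)] = rng.uniform(1, 2, 5)

# Observed data generated from the latent signal through the dictionary.
x = W.T @ h_true

# Encoder pass: rows of W are nearly orthogonal in high dimension, so
# W @ x = (W @ W.T) @ h_true ~ h_true; ReLU suppresses the negative half
# of the residual cross-talk.
h_rec = np.maximum(W @ x, 0.0)
print("relative error:", np.linalg.norm(h_rec - h_true) / np.linalg.norm(h_true))
```

Increasing the sparsity of `h_true` (or the data dimension) shrinks the error, matching the abstract's claim that recovery improves with sparsity.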
Bayesian Hyperparameter Optimization for Ensemble Learning
Title | Bayesian Hyperparameter Optimization for Ensemble Learning |
Authors | Julien-Charles Lévesque, Christian Gagné, Robert Sabourin |
Abstract | In this paper, we bridge the gap between hyperparameter optimization and ensemble learning by performing Bayesian optimization of an ensemble with regard to its hyperparameters. Our method consists of building a fixed-size ensemble, optimizing the configuration of one classifier of the ensemble at each iteration of the hyperparameter optimization algorithm, and taking the interaction with the other models into consideration when evaluating potential performances. We also consider the case where the ensemble is to be reconstructed at the end of the hyperparameter optimization phase, through a greedy selection over the pool of models generated during the optimization. We study the performance of our proposed method on three different hyperparameter spaces, showing that our approach outperforms both the best single model and a greedy ensemble construction over the models produced by a standard Bayesian optimization. |
Tasks | Hyperparameter Optimization |
Published | 2016-05-20 |
URL | http://arxiv.org/abs/1605.06394v1 |
http://arxiv.org/pdf/1605.06394v1.pdf | |
PWC | https://paperswithcode.com/paper/bayesian-hyperparameter-optimization-for |
Repo | |
Framework | |
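The post-hoc reconstruction step the abstract mentions is essentially greedy forward selection over a model pool. The sketch below shows that generic step under simplifying assumptions (hard majority voting; the `pool_preds` and `y_val` arrays are illustrative stand-ins, not the authors' API).

```python
# Hedged sketch of greedy ensemble selection over a pool of models produced
# during Bayesian optimization, scored by majority-vote validation accuracy.
import numpy as np

def greedy_ensemble(pool_preds, y_val, ensemble_size):
    """pool_preds: (n_models, n_val) hard class predictions on validation data."""
    chosen = []
    for _ in range(ensemble_size):
        best_m, best_acc = None, -1.0
        for m in range(len(pool_preds)):
            member_preds = pool_preds[chosen + [m]]          # (k, n_val)
            # Majority vote of the candidate ensemble, column by column.
            votes = np.apply_along_axis(
                lambda col: np.bincount(col).argmax(), 0, member_preds)
            acc = (votes == y_val).mean()
            if acc > best_acc:
                best_m, best_acc = m, acc
        chosen.append(best_m)    # selection with replacement, as in greedy forward selection
    return chosen

# Toy usage: 6 pooled models, 10 validation points, binary labels.
rng = np.random.default_rng(1)
y_val = rng.integers(0, 2, 10)
pool_preds = rng.integers(0, 2, (6, 10))
print(greedy_ensemble(pool_preds, y_val, ensemble_size=3))
```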
Competitive analysis of the top-K ranking problem
Title | Competitive analysis of the top-K ranking problem |
Authors | Xi Chen, Sivakanth Gopi, Jieming Mao, Jon Schneider |
Abstract | Motivated by applications in recommender systems, web search, social choice and crowdsourcing, we consider the problem of identifying the set of top $K$ items from noisy pairwise comparisons. In our setting, we are non-actively given $r$ pairwise comparisons between each pair of $n$ items, where each comparison has noise constrained by a very general noise model called the strong stochastic transitivity (SST) model. We analyze the competitive ratio of algorithms for the top-$K$ problem. In particular, we present a linear-time algorithm for the top-$K$ problem which has a competitive ratio of $\tilde{O}(\sqrt{n})$; i.e., to solve any instance of top-$K$, our algorithm needs at most $\tilde{O}(\sqrt{n})$ times as many samples as the best possible algorithm for that instance (in contrast, all previously known algorithms for the top-$K$ problem have competitive ratios of $\tilde{\Omega}(n)$ or worse). We further show that this is tight: any algorithm for the top-$K$ problem has competitive ratio at least $\tilde{\Omega}(\sqrt{n})$. |
Tasks | Recommendation Systems |
Published | 2016-05-12 |
URL | http://arxiv.org/abs/1605.03933v1 |
http://arxiv.org/pdf/1605.03933v1.pdf | |
PWC | https://paperswithcode.com/paper/competitive-analysis-of-the-top-k-ranking |
Repo | |
Framework | |
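For intuition, the simplest estimator in this setting is a counting (Borda-style) baseline: rank items by how many comparisons they win. The sketch below implements that baseline on a toy BTL-style instance (BTL noise is a special case of SST); the paper's linear-time algorithm achieving the $\tilde{O}(\sqrt{n})$ competitive ratio is more involved.

```python
# Counting baseline for top-K from noisy pairwise comparisons -- illustrative
# only, not the paper's algorithm.
import numpy as np

def top_k_by_wins(comparisons, n, k):
    """comparisons: iterable of (i, j) pairs meaning item i beat item j."""
    wins = np.zeros(n)
    for i, j in comparisons:
        wins[i] += 1
    # Items with the most wins are declared the top-K set.
    return set(np.argsort(-wins)[:k])

# Toy instance: 5 items with latent strengths; r = 20 noisy comparisons per pair.
rng = np.random.default_rng(2)
strength = np.array([0.9, 0.7, 0.5, 0.3, 0.1])
comps = []
for i in range(5):
    for j in range(i + 1, 5):
        p_i_beats_j = strength[i] / (strength[i] + strength[j])  # BTL-style noise
        for _ in range(20):
            comps.append((i, j) if rng.random() < p_i_beats_j else (j, i))
print(top_k_by_wins(comps, n=5, k=2))   # expected: {0, 1} with high probability
```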
End-to-End Neural Sentence Ordering Using Pointer Network
Title | End-to-End Neural Sentence Ordering Using Pointer Network |
Authors | Jingjing Gong, Xinchi Chen, Xipeng Qiu, Xuanjing Huang |
Abstract | Sentence ordering is one of the important tasks in NLP. Previous work mainly focused on improving performance through pair-wise strategies. However, it is nontrivial for pair-wise models to incorporate contextual sentence information, and the pipeline strategy used in such models can introduce error propagation. In this paper, we propose an end-to-end neural approach to the sentence ordering problem, which uses a pointer network (Ptr-Net) to alleviate the error propagation problem and to utilize the whole contextual information. Experimental results show the effectiveness of the proposed model. The source code and dataset for this paper are available. |
Tasks | Sentence Ordering |
Published | 2016-11-15 |
URL | http://arxiv.org/abs/1611.04953v2 |
http://arxiv.org/pdf/1611.04953v2.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-neural-sentence-ordering-using |
Repo | |
Framework | |
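A pointer network orders sentences by repeatedly attending over the encodings of the not-yet-placed sentences and selecting the next one. The numpy sketch below runs one greedy decoding loop with additive attention; all dimensions and weights are illustrative stand-ins, not the paper's trained model.

```python
# Rough sketch of Ptr-Net greedy decoding for sentence ordering: additive
# attention over sentence encodings, with already-placed sentences masked out.
import numpy as np

rng = np.random.default_rng(3)
d = 8                              # hidden size
E = rng.standard_normal((5, d))    # encodings of 5 shuffled sentences
W1, W2 = rng.standard_normal((d, d)), rng.standard_normal((d, d))
v = rng.standard_normal(d)

def pointer_step(dec_state, placed):
    scores = np.tanh(E @ W1.T + dec_state @ W2.T) @ v   # (5,) attention logits
    scores[list(placed)] = -np.inf                       # cannot re-select
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return probs.argmax(), probs

placed, dec_state = set(), np.zeros(d)
for t in range(5):                 # greedy decoding of the full order
    idx, _ = pointer_step(dec_state, placed)
    placed.add(idx)
    dec_state = E[idx]             # feed the chosen sentence back in (simplified)
    print("step", t, "-> sentence", idx)
```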
A Novel Representation of Neural Networks
Title | A Novel Representation of Neural Networks |
Authors | Anthony Caterini, Dong Eui Chang |
Abstract | Deep Neural Networks (DNNs) have become very popular for prediction in many areas. Their strength lies in representations with a large number of parameters, which are commonly learned via gradient descent or similar optimization methods. However, the representation is non-standardized, and the gradient calculations are often performed using component-based approaches that break parameters down into scalar units instead of considering the parameters as whole entities. In this work, these problems are addressed. Standard notation is used to represent DNNs in a compact framework, and gradients of DNN loss functions are calculated directly over the inner product space on which the parameters are defined. This framework is general and is applied to two common network types: the Multilayer Perceptron and the Deep Autoencoder. |
Tasks | |
Published | 2016-10-05 |
URL | http://arxiv.org/abs/1610.01549v2 |
http://arxiv.org/pdf/1610.01549v2.pdf | |
PWC | https://paperswithcode.com/paper/a-novel-representation-of-neural-networks |
Repo | |
Framework | |
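As a small illustration of treating parameters as whole entities, the sketch below computes the gradient of a squared loss with respect to an entire weight matrix as one outer product, then checks it against a finite difference on a single entry. This is a generic example, not the paper's framework.

```python
# For a layer y = W @ x with loss 0.5*||W@x - t||^2, the gradient with
# respect to the full matrix W is the outer product (y - t) x^T, computed
# directly in the matrix inner product space rather than entry by entry.
import numpy as np

rng = np.random.default_rng(9)
W = rng.standard_normal((3, 4))
x, t = rng.standard_normal(4), rng.standard_normal(3)

y = W @ x
grad_W = np.outer(y - t, x)        # whole-matrix gradient

# Finite-difference check on one entry confirms the matrix-level formula.
eps = 1e-6
W2 = W.copy(); W2[1, 2] += eps
fd = (0.5*np.sum((W2 @ x - t)**2) - 0.5*np.sum((y - t)**2)) / eps
print(grad_W[1, 2], fd)
```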
Normalizing the Normalizers: Comparing and Extending Network Normalization Schemes
Title | Normalizing the Normalizers: Comparing and Extending Network Normalization Schemes |
Authors | Mengye Ren, Renjie Liao, Raquel Urtasun, Fabian H. Sinz, Richard S. Zemel |
Abstract | Normalization techniques have only recently begun to be exploited in supervised learning tasks. Batch normalization exploits mini-batch statistics to normalize the activations. This was shown to speed up training and result in better models. However, its success has been very limited when dealing with recurrent neural networks. Layer normalization, on the other hand, normalizes the activations across all activities within a layer. This was shown to work well in the recurrent setting. In this paper we propose a unified view of normalization techniques, as forms of divisive normalization, which includes layer and batch normalization as special cases. Our second contribution is the finding that a small modification to these normalization schemes, in conjunction with a sparse regularizer on the activations, leads to significant benefits over standard normalization techniques. We demonstrate the effectiveness of our unified divisive normalization framework in the context of convolutional neural nets and recurrent neural networks, showing improvements over baselines in image classification, language modeling, and super-resolution. |
Tasks | Image Classification, Language Modelling, Super-Resolution |
Published | 2016-11-14 |
URL | http://arxiv.org/abs/1611.04520v2 |
http://arxiv.org/pdf/1611.04520v2.pdf | |
PWC | https://paperswithcode.com/paper/normalizing-the-normalizers-comparing-and |
Repo | |
Framework | |
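The unified view can be sketched in a few lines: subtract a mean and divide by a smoothed norm, where the axis over which statistics are pooled determines the special case. Below is a simplified 2-D numpy sketch of this idea, not the paper's full formulation with learnable parameters and summation windows.

```python
# Divisive normalization: pooling statistics over the batch axis recovers
# batch-norm-like behavior; pooling over features recovers layer norm.
import numpy as np

def divisive_norm(x, axis, sigma=1e-3):
    """x: (batch, features). axis=0 ~ batch-norm statistics, axis=1 ~ layer norm."""
    mu = x.mean(axis=axis, keepdims=True)
    centered = x - mu
    denom = np.sqrt(sigma**2 + (centered**2).mean(axis=axis, keepdims=True))
    return centered / denom

x = np.random.default_rng(4).standard_normal((16, 10))
print(divisive_norm(x, axis=0).std(axis=0)[:3])   # ~1 per feature (batch-style)
print(divisive_norm(x, axis=1).std(axis=1)[:3])   # ~1 per example (layer-style)
```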
Herding as a Learning System with Edge-of-Chaos Dynamics
Title | Herding as a Learning System with Edge-of-Chaos Dynamics |
Authors | Yutian Chen, Max Welling |
Abstract | Herding defines a deterministic dynamical system at the edge of chaos. It generates a sequence of model states and parameters by alternating parameter perturbations with state maximizations, where the sequence of states can be interpreted as “samples” from an associated MRF model. Herding differs from maximum likelihood estimation in that the sequence of parameters does not converge to a fixed point and differs from an MCMC posterior sampling approach in that the sequence of states is generated deterministically. Herding may be interpreted as a “perturb and map” method where the parameter perturbations are generated using a deterministic nonlinear dynamical system rather than randomly from a Gumbel distribution. This chapter studies the distinct statistical characteristics of the herding algorithm and shows that the fast convergence rate of the controlled moments may be attributed to edge of chaos dynamics. The herding algorithm can also be generalized to models with latent variables and to a discriminative learning setting. The perceptron cycling theorem ensures that the fast moment matching property is preserved in the more general framework. |
Tasks | |
Published | 2016-02-09 |
URL | http://arxiv.org/abs/1602.03014v2 |
http://arxiv.org/pdf/1602.03014v2.pdf | |
PWC | https://paperswithcode.com/paper/herding-as-a-learning-system-with-edge-of |
Repo | |
Framework | |
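The core herding loop is short: a state maximization followed by a parameter perturbation, with no step size and no randomness. A toy sketch on a four-state space with an identity feature map (the target moments below are made up for illustration):

```python
# Minimal herding loop: empirical moments of the deterministically generated
# "samples" track the target moments at a fast O(1/T) rate.
import numpy as np

states = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # phi(s) = s
target_moments = np.array([0.7, 0.3])   # desired E[phi(s)]

w = target_moments.copy()
samples = []
for t in range(2000):
    s = states[np.argmax(states @ w)]   # state maximization: argmax_s <w, phi(s)>
    w += target_moments - s             # parameter perturbation (no step size)
    samples.append(s)

print("empirical moments:", np.mean(samples, axis=0))  # ~[0.7, 0.3]
```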
Optimistic and Pessimistic Neural Networks for Scene and Object Recognition
Title | Optimistic and Pessimistic Neural Networks for Scene and Object Recognition |
Authors | Rene Grzeszick, Sebastian Sudholt, Gernot A. Fink |
Abstract | In this paper the application of uncertainty modeling to convolutional neural networks is evaluated. A novel method for adjusting the network’s predictions based on uncertainty information is introduced. This allows the network to be either optimistic or pessimistic in its prediction scores. The proposed method builds on the idea of applying dropout at test time and sampling a predictive mean and variance from the network’s output. Besides the methodological aspects, implementation details allowing for a fast evaluation are presented. Furthermore, a multilabel network architecture is introduced that strongly benefits from the presented approach. In the evaluation it will be shown that modeling uncertainty allows for improving the performance of a given model purely at test time without any further training steps. The evaluation considers several applications in the field of computer vision, including object classification and detection as well as scene attribute recognition. |
Tasks | Object Classification, Object Recognition |
Published | 2016-09-26 |
URL | http://arxiv.org/abs/1609.07982v2 |
http://arxiv.org/pdf/1609.07982v2.pdf | |
PWC | https://paperswithcode.com/paper/optimistic-and-pessimistic-neural-networks |
Repo | |
Framework | |
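The optimistic/pessimistic adjustment can be sketched as follows: keep dropout active at test time, draw T stochastic forward passes, and shift the predictive mean by a multiple of the predictive standard deviation. The `fake_net` below is a stand-in for a real dropout network; the sign of `kappa` selects the optimistic or pessimistic variant.

```python
# Hedged sketch of test-time uncertainty adjustment via dropout sampling.
import numpy as np

def adjusted_scores(stochastic_forward, x, T=50, kappa=1.0):
    """stochastic_forward(x) -> class scores, with dropout resampled each call.
    kappa > 0 gives optimistic scores, kappa < 0 pessimistic ones."""
    draws = np.stack([stochastic_forward(x) for _ in range(T)])  # (T, n_classes)
    mean, std = draws.mean(axis=0), draws.std(axis=0)
    return mean + kappa * std

# Toy stand-in: a "network" whose dropout noise we fake with Gaussian jitter.
rng = np.random.default_rng(5)
fake_net = lambda x: np.array([0.6, 0.4]) + 0.05 * rng.standard_normal(2)
print("optimistic :", adjusted_scores(fake_net, None, kappa=+1.0))
print("pessimistic:", adjusted_scores(fake_net, None, kappa=-1.0))
```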
Identity-sensitive Word Embedding through Heterogeneous Networks
Title | Identity-sensitive Word Embedding through Heterogeneous Networks |
Authors | Jian Tang, Meng Qu, Qiaozhu Mei |
Abstract | Most existing word embedding approaches do not distinguish the same word in different contexts, thereby ignoring its contextual meanings. As a result, the learned embeddings of such words are usually a mixture of multiple meanings. In this paper, we acknowledge multiple identities of the same word in different contexts and learn identity-sensitive word embeddings. Based on an identity-labeled text corpus, a heterogeneous network of words and word identities is constructed to model different levels of word co-occurrence. The heterogeneous network is further embedded into a low-dimensional space through a principled network embedding approach, through which we obtain the embeddings of words and the embeddings of word identities. We study three different types of word identities: topics, sentiments, and categories. Experimental results on real-world data sets show that the identity-sensitive word embeddings learned by our approach indeed capture different meanings of words and outperform competitive methods on tasks including text classification and word similarity computation. |
Tasks | Network Embedding, Text Classification, Word Embeddings |
Published | 2016-11-29 |
URL | http://arxiv.org/abs/1611.09878v1 |
http://arxiv.org/pdf/1611.09878v1.pdf | |
PWC | https://paperswithcode.com/paper/identity-sensitive-word-embedding-through |
Repo | |
Framework | |
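As a toy illustration of the construction, the sketch below counts word-word and word-to-identity co-occurrences from a tiny identity-labeled corpus; the corpus, labels, and edge weighting are all made up, and the subsequent network embedding step is omitted.

```python
# Toy heterogeneous network: word-word edges plus edges from each word to its
# (word, topic) identity node, so "apple" gets distinct identities per topic.
from collections import Counter
from itertools import combinations

corpus = [("apple banana fruit", "food"),
          ("apple iphone device", "tech")]

ww, wi = Counter(), Counter()
for text, topic in corpus:
    words = text.split()
    for a, b in combinations(words, 2):
        ww[(a, b)] += 1                      # word-word edges
    for w in words:
        wi[(w, (w, topic))] += 1             # word-to-identity edges
print(wi[("apple", ("apple", "food"))], wi[("apple", ("apple", "tech"))])
```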
A Dynamic Window Neural Network for CCG Supertagging
Title | A Dynamic Window Neural Network for CCG Supertagging |
Authors | Huijia Wu, Jiajun Zhang, Chengqing Zong |
Abstract | Combinatory Categorial Grammar (CCG) supertagging is the task of assigning lexical categories to each word in a sentence. Almost all previous methods use fixed context window sizes as input features; however, different tags usually rely on different context window sizes. This motivates us to build a supertagger with a dynamic window approach, which can be treated as an attention mechanism over the local contexts. Applying dropout on the dynamic filters can be seen as dropping words directly, which is superior to regular dropout on word embeddings. We use this approach to achieve state-of-the-art CCG supertagging performance on the standard test set. |
Tasks | CCG Supertagging, Word Embeddings |
Published | 2016-10-10 |
URL | http://arxiv.org/abs/1610.02749v1 |
http://arxiv.org/pdf/1610.02749v1.pdf | |
PWC | https://paperswithcode.com/paper/a-dynamic-window-neural-network-for-ccg |
Repo | |
Framework | |
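The dynamic window can be read as soft selection over a maximal local context: attention weights decide how much each neighboring word contributes, instead of a hard window cutoff. A toy numpy sketch with made-up dimensions and parameters:

```python
# Attention over a maximal local context lets the model choose its effective
# window per word; dropping a weight is akin to dropping the word itself.
import numpy as np

rng = np.random.default_rng(10)
d, max_win = 6, 3
context = rng.standard_normal((2 * max_win + 1, d))   # embeddings around one word
u = rng.standard_normal(d)                            # attention parameter

scores = context @ u
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()
pooled = alpha @ context     # attention-pooled context feature for the tagger
print(np.round(alpha, 3))    # the "dynamic window": per-position weights
```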
Manifolds of Projective Shapes
Title | Manifolds of Projective Shapes |
Authors | Thomas Hotz, Florian Kelma, John T. Kent |
Abstract | The projective shape of a configuration of k points or “landmarks” in RP(d) consists of the information that is invariant under projective transformations and hence is reconstructable from uncalibrated camera views. Mathematically, the space of projective shapes for these k landmarks can be described as the quotient space of k copies of RP(d) modulo the action of the projective linear group PGL(d). Using homogeneous coordinates, such configurations can be described as real $k \times (d+1)$ matrices given up to left-multiplication by non-singular diagonal matrices, while the group PGL(d) acts as GL(d+1) from the right. The main purpose of this paper is to give a detailed examination of the topology of projective shape space, and, using matrix notation, it is shown how to derive subsets that are, in a certain sense, maximal differentiable Hausdorff manifolds which can be provided with a Riemannian metric. A special subclass of the projective shapes consists of the Tyler regular shapes, for which geometrically motivated pre-shapes can be defined, thus allowing for the construction of a natural Riemannian metric. |
Tasks | |
Published | 2016-02-13 |
URL | http://arxiv.org/abs/1602.04330v4 |
http://arxiv.org/pdf/1602.04330v4.pdf | |
PWC | https://paperswithcode.com/paper/manifolds-of-projective-shapes |
Repo | |
Framework | |
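A quick numerical way to see projective invariance in the simplest case, RP(1): the cross-ratio of four collinear points is unchanged when a GL(2) element acts on their homogeneous coordinates, even though the points themselves move. (This is an illustrative aside, not a construction from the paper.)

```python
# Cross-ratio invariance under a projective transformation of the line.
import numpy as np

def cross_ratio(p):                      # p: four points on the real line
    return ((p[0]-p[2])*(p[1]-p[3])) / ((p[1]-p[2])*(p[0]-p[3]))

pts = np.array([0.0, 1.0, 2.0, 5.0])
H = np.array([[2.0, 1.0], [1.0, 3.0]])   # an invertible element of GL(2)
hom = np.stack([pts, np.ones(4)])        # homogeneous coordinates, shape (2, 4)
moved = H @ hom
moved = moved[0] / moved[1]              # back to affine coordinates on the line
print(cross_ratio(pts), cross_ratio(moved))   # both 1.6, up to rounding
```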
Convergence rate of stochastic k-means
Title | Convergence rate of stochastic k-means |
Authors | Cheng Tang, Claire Monteleoni |
Abstract | We analyze the online (Bottou and Bengio) and mini-batch (Sculley) $k$-means variants. Both scale up the widely used $k$-means algorithm via stochastic approximation, and both have become popular for large-scale clustering and unsupervised feature learning. We show, for the first time, that starting from any initial solution, they converge to a “local optimum” at rate $O(\frac{1}{t})$ (in terms of the $k$-means objective) under general conditions. In addition, we show that if the dataset is clusterable, then, when initialized with a simple and scalable seeding algorithm, mini-batch $k$-means converges to an optimal $k$-means solution at rate $O(\frac{1}{t})$ with high probability. The $k$-means objective is non-convex and non-differentiable; we exploit ideas from recent work on stochastic gradient descent for non-convex problems (Ge et al.; Balsubramani et al.) by providing a novel characterization of the trajectory of the $k$-means algorithm on its solution space, and we circumvent the non-differentiability problem via geometric insights about the $k$-means update. |
Tasks | |
Published | 2016-11-16 |
URL | http://arxiv.org/abs/1611.05132v1 |
http://arxiv.org/pdf/1611.05132v1.pdf | |
PWC | https://paperswithcode.com/paper/convergence-rate-of-stochastic-k-means |
Repo | |
Framework | |
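The mini-batch variant analyzed here uses a per-center learning rate of 1/count, which is what drives the $O(\frac{1}{t})$ behavior. A compact sketch follows (with plain random seeding; the clusterability result assumes a more careful seeding scheme):

```python
# Mini-batch k-means in the style of Sculley's variant: assign a mini-batch,
# then move each center toward its assigned points with step size 1/count.
import numpy as np

def minibatch_kmeans(X, k, batch_size=32, iters=500, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    counts = np.zeros(k)
    for _ in range(iters):
        batch = X[rng.choice(len(X), batch_size, replace=False)]
        # Assign each batch point to its nearest center.
        d2 = ((batch[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(axis=1)
        for x, c in zip(batch, assign):
            counts[c] += 1
            centers[c] += (x - centers[c]) / counts[c]   # O(1/t)-style step size
    return centers

# Toy data: two well-separated Gaussian blobs.
X = np.random.default_rng(6).standard_normal((500, 2)) + \
    np.repeat(np.array([[0, 0], [6, 6]]), 250, axis=0)
print(minibatch_kmeans(X, k=2))
```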
Tensor Sparse and Low-Rank based Submodule Clustering Method for Multi-way Data
Title | Tensor Sparse and Low-Rank based Submodule Clustering Method for Multi-way Data |
Authors | Xinglin Piao, Yongli Hu, Junbin Gao, Yanfeng Sun, Zhouchen Lin, Baocai Yin |
Abstract | A new submodule clustering method via sparse and low-rank representation for multi-way data is proposed in this paper. Instead of reshaping multi-way data into vectors, this method maintains their natural orders to preserve intrinsic data structures, e.g., image data are kept as matrices. To implement clustering, the multi-way data, viewed as tensors, are represented by the proposed tensor sparse and low-rank model to obtain a submodule representation, called a free module, which is finally used for spectral clustering. The proposed method extends the conventional subspace clustering method based on sparse and low-rank representation to multi-way data submodule clustering by incorporating the t-product operator. The new method is tested on several public datasets, including synthetic data, video sequences and toy images. The experiments show that the new method outperforms state-of-the-art methods such as Sparse Subspace Clustering (SSC), Low-Rank Representation (LRR), Ordered Subspace Clustering (OSC), Robust Latent Low Rank Representation (RobustLatLRR) and Sparse Submodule Clustering (SSmC). |
Tasks | |
Published | 2016-01-02 |
URL | http://arxiv.org/abs/1601.00149v7 |
http://arxiv.org/pdf/1601.00149v7.pdf | |
PWC | https://paperswithcode.com/paper/tensor-sparse-and-low-rank-based-submodule |
Repo | |
Framework | |
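The t-product the method builds on has a convenient implementation: take the DFT along the third mode, multiply the frontal slices, and invert the transform. A minimal numpy sketch:

```python
# t-product of third-order tensors via the FFT along the third mode.
import numpy as np

def t_product(A, B):
    """A: (n1, n2, n3), B: (n2, n4, n3) -> (n1, n4, n3)."""
    Af = np.fft.fft(A, axis=2)
    Bf = np.fft.fft(B, axis=2)
    Cf = np.einsum('ijk,jlk->ilk', Af, Bf)   # slice-wise matrix products
    return np.real(np.fft.ifft(Cf, axis=2))

A = np.random.default_rng(7).standard_normal((3, 4, 5))
B = np.random.default_rng(8).standard_normal((4, 2, 5))
print(t_product(A, B).shape)   # (3, 2, 5)
```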
Learning an Optimization Algorithm through Human Design Iterations
Title | Learning an Optimization Algorithm through Human Design Iterations |
Authors | Thurston Sexton, Max Yi Ren |
Abstract | Solving optimal design problems through crowdsourcing faces a dilemma: on one hand, human beings have been shown to be more effective than algorithms at searching for good solutions to certain real-world problems with high-dimensional or discrete solution spaces; on the other hand, the cost of setting up crowdsourcing environments, the uncertainty in the crowd’s domain-specific competence, and the lack of commitment of the crowd all contribute to the lack of real-world applications of design crowdsourcing. We are thus motivated to investigate a solution-searching mechanism in which an optimization algorithm is tuned based on human demonstrations of solution searching, so that the search can be continued after human participants abandon the problem. To do so, we model the iterative search process as a Bayesian Optimization (BO) algorithm and propose an inverse BO (IBO) algorithm to find the maximum likelihood estimators of the BO parameters based on human solutions. We show through a vehicle design and control problem that the search performance of BO can be improved by recovering its parameters from an effective human search. IBO thus has the potential to improve the success rate of design crowdsourcing activities by requiring only good search strategies, rather than good solutions, from the crowd. |
Tasks | |
Published | 2016-08-24 |
URL | http://arxiv.org/abs/1608.06984v4 |
http://arxiv.org/pdf/1608.06984v4.pdf | |
PWC | https://paperswithcode.com/paper/learning-an-optimization-algorithm-through |
Repo | |
Framework | |
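A stripped-down sketch of the IBO idea under strong simplifications: given a human's sequence of tried designs on a 1-D toy problem, score candidate GP length-scales by how often an EI-driven BO step reproduces the human's next query, and keep the best. The kernel, acquisition, and matching criterion here are all illustrative stand-ins for the paper's maximum-likelihood estimator.

```python
# Toy "inverse BO": fit a GP length-scale to a human search trace by checking
# how well expected-improvement steps predict the human's next query.
import numpy as np
from scipy.stats import norm

def gp_posterior(Xs, ys, Xq, ls, noise=1e-6):
    k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ls**2)
    K = k(Xs, Xs) + noise * np.eye(len(Xs))
    Ks = k(Xq, Xs)
    mu = Ks @ np.linalg.solve(K, ys)
    var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    return mu, np.maximum(var, 1e-12)

def ei_argmax(Xs, ys, grid, ls):
    mu, var = gp_posterior(Xs, ys, grid, ls)
    s = np.sqrt(var)
    z = (ys.min() - mu) / s                     # EI for minimization
    return grid[(s * (z * norm.cdf(z) + norm.pdf(z))).argmax()]

# Human demonstration: queries on f(x) = (x - 0.3)^2, recorded in order.
f = lambda x: (x - 0.3)**2
human_x = np.array([0.0, 1.0, 0.4, 0.31])
grid = np.linspace(0, 1, 101)
for ls in [0.05, 0.2, 0.5]:
    hits = sum(abs(ei_argmax(human_x[:t], f(human_x[:t]), grid, ls) - human_x[t]) < 0.05
               for t in range(2, len(human_x)))
    print(f"length-scale {ls}: matched {hits} of {len(human_x) - 2} human moves")
```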