Paper Group ANR 304
Exploiting the Short-term to Long-term Plasticity Transition in Memristive Nanodevice Learning Architectures
Title | Exploiting the Short-term to Long-term Plasticity Transition in Memristive Nanodevice Learning Architectures |
Authors | Christopher H. Bennett, Selina La Barbera, Adrien F. Vincent, Fabien Alibart, Damien Querlioz |
Abstract | Memristive nanodevices offer new frontiers for computing systems that unite arithmetic and memory operations on-chip. Here, we explore the integration of electrochemical metallization cell (ECM) nanodevices with tunable filamentary switching in nanoscale learning systems. Such devices offer a natural transition between short-term plasticity (STP) and long-term plasticity (LTP). In this work, we show that this property can be exploited to efficiently solve noisy classification tasks. A single crossbar learning scheme is first introduced and evaluated. Perfect classification is possible only for simple input patterns, within critical timing parameters, and when device variability is weak. To overcome these limitations, a dual-crossbar learning system partly inspired by the extreme learning machine (ELM) approach is then introduced. This approach outperforms a conventional ELM-inspired system when the first layer is imprinted before training and testing, and especially so when variability in device timing evolution is considered: variability is therefore transformed from an issue to a feature. In attempting to classify the MNIST database under the same conditions, conventional ELM obtains 84% classification, the imprinted, uniform device system obtains 88% classification, and the imprinted, variable device system reaches 92% classification. We discuss benefits and drawbacks of both systems in terms of energy, complexity, area imprint, and speed. All these results highlight that tuning and exploiting intrinsic device timing parameters may be of central interest to future bio-inspired approximate computing systems. |
Tasks | |
Published | 2016-06-27 |
URL | http://arxiv.org/abs/1606.08366v1 |
http://arxiv.org/pdf/1606.08366v1.pdf | |
PWC | https://paperswithcode.com/paper/exploiting-the-short-term-to-long-term |
Repo | |
Framework | |
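Since the dual-crossbar system above is benchmarked against a conventional extreme learning machine, a minimal ELM sketch may help: a fixed random hidden layer followed by a least-squares-trained readout. This is plain numpy/scikit-learn on the small digits set as a stand-in for MNIST; the hidden width and tanh nonlinearity are illustrative assumptions, not the paper's device model.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Extreme learning machine: the first layer is random and fixed
# ("imprinted"); only the linear readout is trained, by least squares.
X, y = load_digits(return_X_y=True)
X = X / X.max()  # scale pixels to [0, 1]
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

rng = np.random.default_rng(0)
n_hidden = 256  # illustrative width
W = rng.normal(size=(X.shape[1], n_hidden))
b = rng.normal(size=n_hidden)

H = np.tanh(Xtr @ W + b)         # fixed random projection
T = np.eye(10)[ytr]              # one-hot targets
beta = np.linalg.pinv(H) @ T     # least-squares readout

pred = np.argmax(np.tanh(Xte @ W + b) @ beta, axis=1)
print("test accuracy:", (pred == yte).mean())
```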
Globally Variance-Constrained Sparse Representation and Its Application in Image Set Coding
Title | Globally Variance-Constrained Sparse Representation and Its Application in Image Set Coding |
Authors | Xiang Zhang, Jiarui Sun, Siwei Ma, Zhouchen Lin, Jian Zhang, Shiqi Wang, Wen Gao |
Abstract | Sparse representation provides an efficient way to approximately recover a signal as a linear combination of a few bases from a learnt dictionary, and has enabled a variety of successful applications. In data compression, however, its efficiency and popularity are hindered, because encoding sparsely distributed coefficients may consume many bits just to represent the indices of the nonzero coefficients. Introducing an accurate rate constraint into sparse coding and dictionary learning therefore becomes meaningful, and it has not been fully exploited in the context of sparse representation. By the Shannon entropy inequality, the variance of Gaussian-distributed data bounds its entropy, indicating that the actual bitrate can be well estimated by the variance. Hence, a Globally Variance-Constrained Sparse Representation (GVCSR) model is proposed in this work, where a variance-constrained rate term is introduced into the optimization process. Specifically, we employ the Alternating Direction Method of Multipliers (ADMM) to solve the non-convex optimization problems for sparse coding and dictionary learning, both of which achieve state-of-the-art rate-distortion performance for image representation. Furthermore, we investigate the potential of applying the GVCSR algorithm to practical image set compression, where the optimized dictionary is trained to efficiently represent images captured in similar scenarios by implicitly exploiting inter-image correlations. Experimental results demonstrate superior rate-distortion performance against state-of-the-art methods. |
Tasks | Dictionary Learning |
Published | 2016-08-17 |
URL | http://arxiv.org/abs/1608.04902v2 |
http://arxiv.org/pdf/1608.04902v2.pdf | |
PWC | https://paperswithcode.com/paper/globally-variance-constrained-sparse |
Repo | |
Framework | |
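The paper solves its model with ADMM; as a rough illustration of how a variance-based rate term can enter sparse coding at all, here is a proximal-gradient (ISTA-style) toy. It is a simplification, not the authors' solver, and the penalty weights are arbitrary.

```python
import numpy as np

def soft(a, t):
    """Soft-thresholding: the proximal operator of the l1 norm."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def gvcsr_sparse_code(x, D, lam=0.1, gamma=0.05, n_iter=500):
    """ISTA on 0.5*||x - Da||^2 + gamma*Var(a) + lam*||a||_1.
    The variance term is a crude stand-in for the entropy/rate bound;
    grad of Var(a) w.r.t. a is (2/n)*(a - mean(a))."""
    n = D.shape[1]
    a = np.zeros(n)
    L = np.linalg.norm(D, 2) ** 2 + 2.0 * gamma / n  # Lipschitz bound
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x) + gamma * 2.0 / n * (a - a.mean())
        a = soft(a - grad / L, lam / L)
    return a

rng = np.random.default_rng(0)
D = rng.normal(size=(32, 64))
D /= np.linalg.norm(D, axis=0)
x = D[:, :3] @ np.array([1.0, -0.5, 0.8])   # 3-sparse ground truth
a = gvcsr_sparse_code(x, D)
print("nonzeros:", np.count_nonzero(np.abs(a) > 1e-6))
```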
Attentive Explanations: Justifying Decisions and Pointing to the Evidence
Title | Attentive Explanations: Justifying Decisions and Pointing to the Evidence |
Authors | Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Bernt Schiele, Trevor Darrell, Marcus Rohrbach |
Abstract | Deep models are the de facto standard for visual decision models due to their impressive performance on a wide array of visual tasks. However, they are frequently seen as opaque and unable to explain their decisions. In contrast, humans can justify their decisions with natural language and point to the evidence in the visual world that led to them. We postulate that deep models can do this as well, and propose our Pointing and Justification (PJ-X) model, which can justify its decision with a sentence and point to the evidence by introspecting its decision and explanation process using an attention mechanism. Unfortunately, there is no dataset available with reference explanations for visual decision making. We thus collect two datasets in two domains where it is interesting and challenging to explain decisions. First, we extend the visual question answering task to provide not only an answer but also a natural language explanation for it. Second, we focus on explaining human activities, which is traditionally more challenging than object classification. We extensively evaluate our PJ-X model on both the justification and pointing tasks, comparing it to prior models and ablations using both automatic and human evaluations. |
Tasks | Decision Making, Object Classification, Question Answering, Visual Question Answering |
Published | 2016-12-14 |
URL | http://arxiv.org/abs/1612.04757v2 |
http://arxiv.org/pdf/1612.04757v2.pdf | |
PWC | https://paperswithcode.com/paper/attentive-explanations-justifying-decisions |
Repo | |
Framework | |
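The "pointing" half of PJ-X rests on attention over visual features conditioned on the question. The PyTorch layer below is a generic sketch of that mechanism; the dimensions and the single-layer scoring MLP are our assumptions, not the published architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionPointer(nn.Module):
    """Attend over spatial image features conditioned on a query vector.
    Returns the pooled feature (for answering/justifying) and the
    attention map itself (the 'pointing' evidence)."""
    def __init__(self, feat_dim, query_dim, hidden=512):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(feat_dim + query_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, feats, query):
        # feats: (B, HW, feat_dim) spatial grid; query: (B, query_dim)
        B, HW, _ = feats.shape
        q = query.unsqueeze(1).expand(B, HW, query.shape[-1])
        logits = self.score(torch.cat([feats, q], dim=-1)).squeeze(-1)
        attn = F.softmax(logits, dim=-1)              # (B, HW) pointing map
        pooled = (attn.unsqueeze(-1) * feats).sum(1)  # (B, feat_dim)
        return pooled, attn

feats = torch.randn(2, 49, 256)   # e.g. a 7x7 CNN grid, flattened
query = torch.randn(2, 128)       # e.g. an encoded question
pooled, attn = AttentionPointer(256, 128)(feats, query)
print(pooled.shape, attn.shape)   # (2, 256) and (2, 49)
```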
Training a Hidden Markov Model with a Bayesian Spiking Neural Network
Title | Training a Hidden Markov Model with a Bayesian Spiking Neural Network |
Authors | Amirhossein Tavanaei, Anthony S Maida |
Abstract | It is of some interest to understand how statistically based mechanisms for signal processing might be integrated with biologically motivated mechanisms such as neural networks. This paper explores a novel hybrid approach for classifying segments of sequential data, such as individual spoken words. The approach combines a hidden Markov model (HMM) with a spiking neural network (SNN). The HMM, consisting of states and transitions, forms a fixed backbone with nonadaptive transition probabilities. The SNN, however, implements a biologically based Bayesian computation that derives from the spike timing-dependent plasticity (STDP) learning rule. The emission (observation) probabilities of the HMM are represented in the SNN and trained with the STDP rule. A separate SNN, each with the same architecture, is associated with each state of the HMM. Because of the STDP training, each SNN implements an expectation maximization algorithm to learn the emission probabilities for one HMM state. The model was studied on synthesized spike-train data and also on spoken word data. Preliminary results suggest its performance compares favorably with other biologically motivated approaches. Because of the model's uniqueness and initial promise, it warrants further study. It provides some new ideas on how the brain might implement the equivalent of an HMM in a neural circuit. |
Tasks | |
Published | 2016-06-02 |
URL | http://arxiv.org/abs/1606.00825v2 |
http://arxiv.org/pdf/1606.00825v2.pdf | |
PWC | https://paperswithcode.com/paper/training-a-hidden-markov-model-with-a |
Repo | |
Framework | |
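Stripped of the spiking machinery, learning only the emission probabilities while the transitions stay fixed corresponds to Baum-Welch with the M-step restricted to the emission matrix. A minimal numpy version of that restricted EM is sketched below (unscaled forward-backward, so suitable only for short toy sequences).

```python
import numpy as np

def learn_emissions(obs, A, pi, n_symbols, n_iter=50, seed=0):
    """EM for an HMM whose transition matrix A and initial distribution
    pi are fixed; only the emission matrix B is re-estimated (cf. the
    paper, where one SNN per state learns that state's emissions)."""
    N = A.shape[0]
    rng = np.random.default_rng(seed)
    B = rng.random((N, n_symbols))
    B /= B.sum(1, keepdims=True)
    T = len(obs)
    for _ in range(n_iter):
        # E-step: unscaled forward-backward (fine for short sequences).
        alpha = np.zeros((T, N)); beta = np.ones((T, N))
        alpha[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        gamma = alpha * beta
        gamma /= gamma.sum(1, keepdims=True)
        # M-step: re-estimate emissions only; transitions stay fixed.
        for k in range(n_symbols):
            B[:, k] = gamma[obs == k].sum(0)
        B /= B.sum(1, keepdims=True)
    return B

A = np.array([[0.9, 0.1], [0.2, 0.8]])   # fixed backbone
pi = np.array([0.5, 0.5])
obs = np.array([0, 0, 1, 0, 2, 2, 1, 2, 2, 2])
print(learn_emissions(obs, A, pi, n_symbols=3))
```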
Scalable Solution for Approximate Nearest Subspace Search
Title | Scalable Solution for Approximate Nearest Subspace Search |
Authors | Masakazu Iwamura, Masataka Konishi, Koichi Kise |
Abstract | Finding the nearest subspace is a fundamental problem that underlies many applications; in particular, a scalable solution that remains fast and accurate on large problems has great impact. Existing methods, however, break down on large-scale problems with many subspaces and a high-dimensional feature space. One cause is that they follow the traditional idea of representing a subspace by a single point. In this paper, we propose a scalable solution for the approximate nearest subspace search (ANSS) problem. Intuitively, the proposed method represents a subspace by multiple points, unlike existing methods. This makes large-scale ANSS problems tractable. In an experiment with 3036 subspaces in a 1024-dimensional space, we confirmed that the proposed method was 7.3 times faster than the previous state-of-the-art without loss of accuracy. |
Tasks | |
Published | 2016-03-29 |
URL | http://arxiv.org/abs/1603.08810v1 |
http://arxiv.org/pdf/1603.08810v1.pdf | |
PWC | https://paperswithcode.com/paper/scalable-solution-for-approximate-nearest |
Repo | |
Framework | |
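For reference, the exact computation being accelerated: the squared distance from a query q to a subspace with orthonormal basis U is ||q||^2 - ||U^T q||^2, and naive search minimizes this over all subspaces. The numpy baseline below does that linear scan; the paper's contribution is avoiding it by representing each subspace with multiple points, so that standard approximate nearest-neighbor machinery applies.

```python
import numpy as np

def nearest_subspace(q, bases):
    """Exact nearest subspace by projection residual.
    bases: list of (d, k_i) matrices with orthonormal columns."""
    # squared distance to span(U) is ||q||^2 - ||U^T q||^2
    d2 = [q @ q - np.sum((U.T @ q) ** 2) for U in bases]
    return int(np.argmin(d2)), min(d2)

rng = np.random.default_rng(0)
d = 64
# 100 random 3-dimensional subspaces via QR orthonormalization
bases = [np.linalg.qr(rng.normal(size=(d, 3)))[0] for _ in range(100)]
# query lying near subspace 7, plus a little noise
q = bases[7] @ rng.normal(size=3) + 0.01 * rng.normal(size=d)
print(nearest_subspace(q, bases))   # expect index 7
```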
Gov2Vec: Learning Distributed Representations of Institutions and Their Legal Text
Title | Gov2Vec: Learning Distributed Representations of Institutions and Their Legal Text |
Authors | John J. Nay |
Abstract | We compare policy differences across institutions by embedding representations of the entire legal corpus of each institution, together with the vocabulary shared across all corpora, into a continuous vector space. We apply our method, Gov2Vec, to Supreme Court opinions, Presidential actions, and official summaries of Congressional bills. The model discerns meaningful differences between government branches. We also learn representations for more fine-grained word sources: individual Presidents and (2-year) Congresses. The similarities between learned representations of Congresses over time and sitting Presidents are negatively correlated with the bill veto rate, and the temporal ordering of Presidents and Congresses was implicitly learned from text alone. With the resulting vectors we answer questions such as: how do Obama and the 113th House differ in addressing climate change, and how does this vary between environmental and economic perspectives? Our work illustrates vector-arithmetic-based investigations of complex relationships between word sources based on their texts. We are extending this to create a more comprehensive legal semantic map. |
Tasks | |
Published | 2016-09-21 |
URL | http://arxiv.org/abs/1609.06616v2 |
http://arxiv.org/pdf/1609.06616v2.pdf | |
PWC | https://paperswithcode.com/paper/gov2vec-learning-distributed-representations |
Repo | |
Framework | |
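Gov2Vec embeds word sources (institutions) and words in a shared space, then answers queries by vector arithmetic. Below is a hedged sketch of that recipe using gensim's Doc2Vec; the toy corpus, tags, and hyperparameters are placeholders, and gensim is our choice of tooling, not necessarily the authors'.

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Toy corpora tagged by "institution" (placeholders for real legal text).
corpus = [
    TaggedDocument("carbon emissions climate regulation".split(), ["agency_a"]),
    TaggedDocument("tax growth jobs economy budget".split(), ["agency_b"]),
    TaggedDocument("climate energy environment policy".split(), ["agency_a"]),
]
model = Doc2Vec(corpus, vector_size=32, min_count=1, epochs=200, seed=0)

# Institution vectors live in model.dv, word vectors in model.wv.
# Gov2Vec-style queries combine them arithmetically, e.g. the words
# closest to (institution + topic):
query = model.dv["agency_a"] + model.wv["climate"]
print(model.wv.similar_by_vector(query, topn=3))
```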
Tuning parameter selection in high dimensional penalized likelihood
Title | Tuning parameter selection in high dimensional penalized likelihood |
Authors | Yingying Fan, Cheng Yong Tang |
Abstract | Determining how to appropriately select the tuning parameter is essential in penalized likelihood methods for high-dimensional data analysis. We examine this problem in the setting of penalized likelihood methods for generalized linear models, where the dimensionality of covariates p is allowed to increase exponentially with the sample size n. We propose to select the tuning parameter by optimizing the generalized information criterion (GIC) with an appropriate model complexity penalty. To ensure that we consistently identify the true model, a range for the model complexity penalty is identified in GIC. We find that this model complexity penalty should diverge at the rate of some power of $\log p$ depending on the tail probability behavior of the response variables. This reveals that using the AIC or BIC to select the tuning parameter may not be adequate for consistently identifying the true model. Based on our theoretical study, we propose a uniform choice of the model complexity penalty and show that the proposed approach consistently identifies the true model among candidate models with asymptotic probability one. We justify the performance of the proposed procedure by numerical simulations and a gene expression data analysis. |
Tasks | |
Published | 2016-05-11 |
URL | http://arxiv.org/abs/1605.03321v1 |
http://arxiv.org/pdf/1605.03321v1.pdf | |
PWC | https://paperswithcode.com/paper/tuning-parameter-selection-in-high |
Repo | |
Framework | |
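For a Gaussian linear model, the criterion can be read as GIC(lambda) = n log(RSS_lambda / n) + a_n * df_lambda, with the paper's uniform complexity penalty a_n = log(log n) * log p. The sketch below applies this reading to a lasso path with scikit-learn; treat the exact form as our interpretation rather than a verbatim transcription of the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 200, 500                        # p >> n regime
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:5] = 2.0     # 5 true signals
y = X @ beta + rng.normal(size=n)

a_n = np.log(np.log(n)) * np.log(p)    # uniform complexity penalty
best = None
for lam in np.geomspace(1e-2, 1.0, 30):
    fit = Lasso(alpha=lam, max_iter=10000).fit(X, y)
    rss = np.sum((y - fit.predict(X)) ** 2)
    df = np.count_nonzero(fit.coef_)   # model size as degrees of freedom
    gic = n * np.log(rss / n) + a_n * df
    if best is None or gic < best[0]:
        best = (gic, lam, df)
print("selected lambda=%.4f with %d nonzeros" % (best[1], best[2]))
```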
Real-time Joint Tracking of a Hand Manipulating an Object from RGB-D Input
Title | Real-time Joint Tracking of a Hand Manipulating an Object from RGB-D Input |
Authors | Srinath Sridhar, Franziska Mueller, Michael Zollhöfer, Dan Casas, Antti Oulasvirta, Christian Theobalt |
Abstract | Real-time simultaneous tracking of hands manipulating and interacting with external objects has many potential applications in augmented reality, tangible computing, and wearable computing. However, due to difficult occlusions, fast motions, and uniform hand appearance, jointly tracking hand and object pose is more challenging than tracking either of the two separately. Many previous approaches resort to complex multi-camera setups to remedy the occlusion problem and often employ expensive segmentation and optimization steps, which make real-time tracking impossible. In this paper, we propose a real-time solution that uses a single commodity RGB-D camera. The core of our approach is a 3D articulated Gaussian mixture alignment strategy tailored to hand-object tracking that allows fast pose optimization. The alignment energy uses novel regularizers to address occlusions and hand-object contacts. For added robustness, we guide the optimization with discriminative part classification of the hand and segmentation of the object. We conduct extensive experiments on several existing datasets and introduce a new annotated hand-object dataset. Quantitative and qualitative results show the key advantages of our method: speed, accuracy, and robustness. |
Tasks | Object Tracking |
Published | 2016-10-16 |
URL | http://arxiv.org/abs/1610.04889v1 |
http://arxiv.org/pdf/1610.04889v1.pdf | |
PWC | https://paperswithcode.com/paper/real-time-joint-tracking-of-a-hand |
Repo | |
Framework | |
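The toy below strips Gaussian mixture alignment down to rigid 2D translation: both point sets are blurred into isotropic Gaussian mixtures and their L2 overlap is maximized, exploiting the closed-form Gaussian-Gaussian integral that makes this energy cheap to evaluate. It is a cartoon of the alignment idea, not the paper's articulated hand-object tracker; the bandwidth sigma and the optimizer are assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def neg_overlap(t, A, B, sigma=1.0):
    """Negative L2 overlap of two isotropic Gaussian mixtures centered
    on point sets A (translated by t) and B. The product of two
    Gaussians integrates to a Gaussian in the distance between means."""
    diff = (A + t)[:, None, :] - B[None, :, :]   # (n, m, 2) pair gaps
    d2 = np.sum(diff ** 2, axis=-1)
    return -np.sum(np.exp(-d2 / (4.0 * sigma ** 2)))

rng = np.random.default_rng(0)
B = rng.normal(size=(40, 2))                      # "model" points
true_t = np.array([1.5, -0.7])
A = B - true_t + 0.05 * rng.normal(size=B.shape)  # displaced noisy copy

res = minimize(neg_overlap, x0=np.zeros(2), args=(A, B))
print("recovered translation:", res.x)            # close to true_t
```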
An Empirical Exploration of Skip Connections for Sequential Tagging
Title | An Empirical Exploration of Skip Connections for Sequential Tagging |
Authors | Huijia Wu, Jiajun Zhang, Chengqing Zong |
Abstract | In this paper, we empirically explore the effects of various kinds of skip connections in stacked bidirectional LSTMs for sequential tagging. We investigate three kinds of skip connections connecting to LSTM cells: (a) skip connections to the gates, (b) skip connections to the internal states, and (c) skip connections to the cell outputs. We present comprehensive experiments showing that skip connections to the cell outputs outperform the other two. Furthermore, we observe that using gated identity functions as skip mappings works well. Based on these novel skip connections, we successfully train deep stacked bidirectional LSTM models and obtain state-of-the-art results on CCG supertagging and comparable results on POS tagging. |
Tasks | CCG Supertagging |
Published | 2016-10-11 |
URL | http://arxiv.org/abs/1610.03167v1 |
http://arxiv.org/pdf/1610.03167v1.pdf | |
PWC | https://paperswithcode.com/paper/an-empirical-exploration-of-skip-connections |
Repo | |
Framework | |
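One reading of the winning configuration, gated identity skips feeding the cell outputs, sketched in PyTorch: each layer's output is the LSTM output plus a sigmoid-gated copy of the output from the layer below. The gating parameterization (and the unidirectional stack, for brevity) is our assumption.

```python
import torch
import torch.nn as nn

class GatedSkipStackedLSTM(nn.Module):
    """Stacked LSTM where each upper layer's output receives a gated
    identity skip from the previous layer's output."""
    def __init__(self, input_size, hidden_size, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.LSTM(input_size if i == 0 else hidden_size,
                    hidden_size, batch_first=True)
            for i in range(num_layers)
        ])
        self.gates = nn.ModuleList([
            nn.Linear(2 * hidden_size, hidden_size)
            for _ in range(num_layers - 1)
        ])

    def forward(self, x):
        out, _ = self.layers[0](x)
        for lstm, gate in zip(self.layers[1:], self.gates):
            h, _ = lstm(out)
            g = torch.sigmoid(gate(torch.cat([h, out], dim=-1)))
            out = h + g * out        # gated identity skip to cell outputs
        return out

seq = torch.randn(4, 20, 50)         # (batch, time, features)
model = GatedSkipStackedLSTM(50, 64)
print(model(seq).shape)              # torch.Size([4, 20, 64])
```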
A Sufficient Statistics Construction of Bayesian Nonparametric Exponential Family Conjugate Models
Title | A Sufficient Statistics Construction of Bayesian Nonparametric Exponential Family Conjugate Models |
Authors | Robert Finn, Brian Kulis |
Abstract | Conjugate pairs of distributions over infinite dimensional spaces are prominent in statistical learning theory, particularly due to the widespread adoption of Bayesian nonparametric methodologies for a host of models and applications. Much of the existing literature in the learning community focuses on processes possessing some form of computationally tractable conjugacy, as is the case for the beta and gamma processes (and, via normalization, the Dirichlet process). For these processes, proofs of conjugacy and the requisite derivation of explicit computational formulae for posterior density parameters are idiosyncratic to the stochastic process in question. As such, Bayesian nonparametric models are currently available only for a limited number of conjugate pairs, e.g. the Dirichlet-multinomial and beta-Bernoulli process pairs. In each of the above cases the likelihood process belongs to the class of discrete exponential family distributions. The exclusion of continuous likelihood distributions from the known cases of Bayesian nonparametric conjugate models stands as a gap in the researcher's toolbox. In this paper we first address the problem of obtaining a general construction of prior distributions over infinite dimensional spaces possessing distributional properties amenable to conjugacy. Second, we bridge the divide between the discrete and continuous likelihoods by illustrating a canonical construction for stochastic processes whose Lévy measure densities are from positive exponential families, and then demonstrate that these processes in fact form the prior, likelihood, and posterior in a conjugate family. Our canonical construction subsumes known computational formulae for posterior density parameters in the cases where the likelihood is from a discrete distribution belonging to an exponential family. |
Tasks | |
Published | 2016-01-10 |
URL | http://arxiv.org/abs/1601.02257v1 |
http://arxiv.org/pdf/1601.02257v1.pdf | |
PWC | https://paperswithcode.com/paper/a-sufficient-statistics-construction-of |
Repo | |
Framework | |
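The abstract's starting point is finite-dimensional exponential-family conjugacy, where the posterior update is just a sufficient-statistics shift of the prior's parameters. The beta-Bernoulli case it cites looks as follows; the paper's construction generalizes exactly this pattern to Lévy measure densities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Beta(a, b) prior on a Bernoulli success rate theta.
a, b = 2.0, 2.0
theta_true = 0.7
x = rng.binomial(1, theta_true, size=100)

# Conjugacy: the posterior stays in the beta family, with the prior's
# parameters shifted by the sufficient statistics (sum and count).
a_post = a + x.sum()
b_post = b + len(x) - x.sum()
print("posterior mean:", a_post / (a_post + b_post))   # approx. 0.7
```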
Stochastic Optimization for Large-scale Optimal Transport
Title | Stochastic Optimization for Large-scale Optimal Transport |
Authors | Genevay Aude, Marco Cuturi, Gabriel Peyré, Francis Bach |
Abstract | Optimal transport (OT) defines a powerful framework to compare probability distributions in a geometrically faithful way. However, the practical impact of OT is still limited because of its computational burden. We propose a new class of stochastic optimization algorithms to cope with large-scale problems routinely encountered in machine learning applications. These methods can handle arbitrary distributions (either discrete or continuous), requiring only the ability to draw samples from them, which is the typical setup in high-dimensional learning problems. This alleviates the need to discretize these densities, while giving access to provably convergent methods that output the correct distance without discretization error. These algorithms rely on two main ideas: (a) the dual OT problem can be re-cast as the maximization of an expectation; (b) entropic regularization of the primal OT problem results in a smooth dual optimization problem which can be addressed with algorithms that have a provably faster convergence. We instantiate these ideas in three different setups: (i) when comparing a discrete distribution to another, we show that incremental stochastic optimization schemes can beat Sinkhorn's algorithm, the current state-of-the-art finite-dimensional OT solver; (ii) when comparing a discrete distribution to a continuous density, a semi-discrete reformulation of the dual program is amenable to averaged stochastic gradient descent, leading to better performance than approximately solving the problem by discretization; (iii) when dealing with two continuous densities, we propose a stochastic gradient descent over a reproducing kernel Hilbert space (RKHS). This is currently the only known method to solve this problem, apart from computing OT on finite samples. We back up these claims on a set of discrete, semi-discrete and continuous benchmark problems. |
Tasks | Stochastic Optimization |
Published | 2016-05-27 |
URL | http://arxiv.org/abs/1605.08527v1 |
http://arxiv.org/pdf/1605.08527v1.pdf | |
PWC | https://paperswithcode.com/paper/stochastic-optimization-for-large-scale |
Repo | |
Framework | |
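Idea (b) made concrete: with entropic regularization, the semi-dual of OT between a sampled measure mu and a discrete measure nu is an expectation over x ~ mu, so averaged stochastic gradient ascent applies. The numpy sketch below implements that semi-discrete case; the step-size schedule and squared-distance cost are illustrative choices.

```python
import numpy as np
from scipy.special import logsumexp

def semidual_sgd(sample_mu, Y, nu, eps=0.1, n_steps=20000, lr=1.0, seed=0):
    """Averaged SGD on the entropic semi-dual of OT(mu, nu).
    sample_mu(rng) draws one x ~ mu; Y: (m, d) support of nu; nu: weights.
    The objective is E_x[<v, nu> - eps*log sum_j nu_j exp((v_j - c(x,y_j))/eps)],
    whose gradient in v is nu minus the softmax weights."""
    m = len(nu)
    rng = np.random.default_rng(seed)
    v = np.zeros(m)
    v_avg = np.zeros(m)
    for t in range(1, n_steps + 1):
        x = sample_mu(rng)
        c = np.sum((Y - x) ** 2, axis=1)        # squared-distance cost
        logw = np.log(nu) + (v - c) / eps
        w = np.exp(logw - logsumexp(logw))      # stable softmax weights
        v += (lr / np.sqrt(t)) * (nu - w)       # stochastic ascent step
        v_avg += (v - v_avg) / t                # Polyak averaging
    return v_avg

Y = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
nu = np.array([0.5, 0.25, 0.25])
sample_mu = lambda rng: rng.normal(size=2) * 0.3 + 0.5
print("dual potentials:", semidual_sgd(sample_mu, Y, nu))
```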
Gaussian Process Morphable Models
Title | Gaussian Process Morphable Models |
Authors | Marcel Lüthi, Christoph Jud, Thomas Gerig, Thomas Vetter |
Abstract | Statistical shape models (SSMs) represent a class of shapes as a normal distribution of point variations, whose parameters are estimated from example shapes. Principal component analysis (PCA) is applied to obtain a low-dimensional representation of the shape variation in terms of the leading principal components. In this paper, we propose a generalization of SSMs, called Gaussian Process Morphable Models (GPMMs). We model the shape variations with a Gaussian process, which we represent using the leading components of its Karhunen-Loève expansion. To compute the expansion, we make use of an approximation scheme based on the Nyström method. The resulting model can be seen as a continuous analogue of an SSM. However, while for SSMs the shape variation is restricted to the span of the example data, with GPMMs we can define the shape variation using any Gaussian process. For example, we can build shape models that correspond to classical spline models, and thus do not require any example data. Furthermore, Gaussian processes make it possible to combine different models. For example, an SSM can be extended with a spline model, to obtain a model that incorporates learned shape characteristics but is flexible enough to explain shapes that cannot be represented by the SSM. We introduce a simple algorithm for fitting a GPMM to a surface or image. This results in a non-rigid registration approach whose regularization properties are defined by a GPMM. We show how we can obtain different registration schemes, including methods for multi-scale, spatially-varying or hybrid registration, by constructing an appropriate GPMM. As our approach strictly separates modelling from the fitting process, this is all achieved without changes to the fitting algorithm. We show the applicability and versatility of GPMMs on a clinical use case, where the goal is the model-based segmentation of 3D forearm images. |
Tasks | Gaussian Processes |
Published | 2016-03-23 |
URL | http://arxiv.org/abs/1603.07254v1 |
http://arxiv.org/pdf/1603.07254v1.pdf | |
PWC | https://paperswithcode.com/paper/gaussian-process-morphable-models |
Repo | |
Framework | |
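The construction in miniature: choose a kernel, truncate its Karhunen-Loève expansion over the points of a reference shape, and shape variation becomes a low-dimensional coefficient vector, with no example data needed (unlike a PCA-based SSM). The numpy sketch below uses a direct eigendecomposition; the paper uses the Nyström method so the same idea scales to dense surfaces.

```python
import numpy as np

# Reference "shape": points on a circle; deformations are modeled as a
# zero-mean Gaussian process over these points.
n = 100
angles = np.linspace(0, 2 * np.pi, n, endpoint=False)
pts = np.stack([np.cos(angles), np.sin(angles)], axis=1)

# Squared-exponential kernel: smooth, data-free shape variation
# (the spline-like models the abstract mentions).
d2 = np.sum((pts[:, None, :] - pts[None, :, :]) ** 2, axis=-1)
K = 0.05 * np.exp(-d2 / (2 * 0.5 ** 2))

# Truncated Karhunen-Loeve expansion: keep the r leading eigenpairs.
r = 10
lam, phi = np.linalg.eigh(K)
lam, phi = lam[::-1][:r], phi[:, ::-1][:, :r]

# A shape sample = reference + sum_i alpha_i sqrt(lam_i) phi_i, per axis.
rng = np.random.default_rng(0)
alpha = rng.normal(size=(r, 2))        # low-dimensional shape coefficients
deform = phi @ (np.sqrt(np.maximum(lam, 0))[:, None] * alpha)
sample = pts + deform
print(sample.shape)                    # (100, 2)
```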
Some Simulation Results for Emphatic Temporal-Difference Learning Algorithms
Title | Some Simulation Results for Emphatic Temporal-Difference Learning Algorithms |
Authors | Huizhen Yu |
Abstract | This is a companion note to our recent study of the weak convergence properties of constrained emphatic temporal-difference learning (ETD) algorithms from a theoretical perspective. It supplements that analysis with simulation results and illustrates the behavior of some of the ETD algorithms using three example problems. |
Tasks | |
Published | 2016-05-06 |
URL | http://arxiv.org/abs/1605.02099v1 |
http://arxiv.org/pdf/1605.02099v1.pdf | |
PWC | https://paperswithcode.com/paper/some-simulation-results-for-emphatic-temporal |
Repo | |
Framework | |
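For readers new to the algorithms being simulated, here is a compact ETD(lambda) update loop on an on-policy random-walk chain (importance ratios and interest both 1), following the standard emphatic recursions with followon trace F and emphasis M. Treat it as a textbook sketch, not the note's exact experimental setup.

```python
import numpy as np

def etd_lambda(n_states=5, lam=0.5, gamma=0.9, alpha=0.01,
               n_steps=50000, seed=0):
    """Emphatic TD(lambda) with tabular features on a reflecting
    random-walk chain; reward 1 for landing on the rightmost state.
    On-policy, so importance ratios rho and interest i are both 1."""
    rng = np.random.default_rng(seed)
    phi = np.eye(n_states)                 # tabular features
    theta = np.zeros(n_states)
    e = np.zeros(n_states)
    F = 0.0
    s = n_states // 2
    for _ in range(n_steps):
        s2 = min(max(s + rng.choice([-1, 1]), 0), n_states - 1)
        r = 1.0 if s2 == n_states - 1 else 0.0
        F = gamma * F + 1.0                # followon trace (rho = i = 1)
        M = lam + (1.0 - lam) * F          # emphasis
        e = gamma * lam * e + M * phi[s]   # emphatic eligibility trace
        delta = r + gamma * theta @ phi[s2] - theta @ phi[s]
        theta += alpha * delta * e
        s = s2
    return theta

print(np.round(etd_lambda(), 3))
```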
Machine learning meets network science: dimensionality reduction for fast and efficient embedding of networks in the hyperbolic space
Title | Machine learning meets network science: dimensionality reduction for fast and efficient embedding of networks in the hyperbolic space |
Authors | Josephine Maria Thomas, Alessandro Muscoloni, Sara Ciucci, Ginestra Bianconi, Carlo Vittorio Cannistraci |
Abstract | Complex network topologies and hyperbolic geometry seem specularly connected, and one of the most fascinating and challenging problems of recent complex network theory is to map a given network to its hyperbolic space. The Popularity Similarity Optimization (PSO) model represents - at the moment - the climax of this theory. It suggests that the trade-off between node popularity and similarity is a mechanism to explain how complex network topologies emerge - as discrete samples - from the continuous world of hyperbolic geometry. The hyperbolic space seems appropriate to represent real complex networks. In fact, it preserves many of their fundamental topological properties, and can be exploited for real applications such as, among others, link prediction and community detection. Here, we observe for the first time that a topology-based class of machine learning algorithms - for nonlinear unsupervised dimensionality reduction - can directly approximate the angular coordinates of the network's nodes in the hyperbolic model by embedding the network into a two-dimensional space, following a similar topological organization that we name angular coalescence. On the basis of this phenomenon, we propose a new class of algorithms that offers fast and accurate coalescent embedding of networks in the hyperbolic space, even for graphs with thousands of nodes. |
Tasks | Community Detection, Dimensionality Reduction, Link Prediction |
Published | 2016-02-21 |
URL | http://arxiv.org/abs/1602.06522v1 |
http://arxiv.org/pdf/1602.06522v1.pdf | |
PWC | https://paperswithcode.com/paper/machine-learning-meets-network-science |
Repo | |
Framework | |
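A bare-bones version of the recipe: embed the network's shortest-path distances into two dimensions (classical MDS on the graph metric, i.e. an Isomap-style reduction), read angular coordinates off the embedding, and assign radii by degree rank. The radial rule r_i = 2 ln(rank_i) is our PSO-style assumption; real coalescent embedding offers several reduction and adjustment choices.

```python
import numpy as np
import networkx as nx

# Coalescent-embedding sketch: dimensionality reduction of shortest-path
# distances gives angular coordinates; degree rank gives radial ones.
G = nx.barabasi_albert_graph(300, 2, seed=0)
D = np.asarray(nx.floyd_warshall_numpy(G))

# Classical MDS on the distance matrix (Isomap on the graph metric).
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
Bm = -0.5 * J @ (D ** 2) @ J
lam, V = np.linalg.eigh(Bm)
xy = V[:, -2:] * np.sqrt(np.maximum(lam[-2:], 0))   # top-2 components

theta = np.arctan2(xy[:, 1], xy[:, 0])              # angular coalescence
order = np.argsort([-G.degree(i) for i in G.nodes]) # hubs first
rank = np.empty(n)
rank[order] = np.arange(1, n + 1)
r = 2.0 * np.log(rank)                              # PSO-style radii

hyperbolic_coords = np.stack([r, theta], axis=1)
print(hyperbolic_coords[:5])
```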
Complexity of Shift Bribery in Committee Elections
Title | Complexity of Shift Bribery in Committee Elections |
Authors | Robert Bredereck, Piotr Faliszewski, Rolf Niedermeier, Nimrod Talmon |
Abstract | Given an election, a preferred candidate p, and a budget, the SHIFT BRIBERY problem asks whether p can win the election after shifting p higher in some voters' preference orders. Of course, shifting comes at a price (depending on the voter and on the extent of the shift) and one must not exceed the given budget. We study the (parameterized) computational complexity of SHIFT BRIBERY for multiwinner voting rules where winning the election means to be part of some winning committee. We focus on the well-established SNTV, Bloc, k-Borda, and Chamberlin-Courant rules, as well as on approximate variants of the Chamberlin-Courant rule, since the original rule is NP-hard to compute. We show that SHIFT BRIBERY tends to be harder in the multiwinner setting than in the single-winner one by showing settings where SHIFT BRIBERY is easy in the single-winner cases, but is hard (and hard to approximate) in the multiwinner ones. Moreover, we show that the non-monotonicity of those rules which are based on approximation algorithms for the Chamberlin-Courant rule sometimes affects the complexity of SHIFT BRIBERY. |
Tasks | |
Published | 2016-01-07 |
URL | http://arxiv.org/abs/1601.01492v2 |
http://arxiv.org/pdf/1601.01492v2.pdf | |
PWC | https://paperswithcode.com/paper/complexity-of-shift-bribery-in-committee |
Repo | |
Framework | |
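To make the problem concrete, here is a brute-force SHIFT BRIBERY checker for the SNTV multiwinner rule (each voter's top choice scores a point; p makes the committee if fewer than k candidates score strictly more, i.e. favorable tie-breaking). The unit prices and tiny instance are placeholders; it returns the first affordable success rather than the cheapest one, and the exponential enumeration is exactly what the hardness results lead one to expect in the worst case.

```python
from itertools import product

def shift(order, p, s):
    """Move candidate p up by s positions in one preference order."""
    lst = list(order)
    i = lst.index(p)
    lst.pop(i)
    lst.insert(max(i - s, 0), p)
    return lst

def p_in_committee(votes, cands, p, k):
    """SNTV: each voter's top choice scores one point; p is seated if
    fewer than k candidates score strictly more (favorable ties)."""
    scores = {c: 0 for c in cands}
    for order in votes:
        scores[order[0]] += 1
    return sum(1 for c in cands if scores[c] > scores[p]) < k

def shift_bribery_sntv(votes, cands, p, k, prices, budget):
    """Exhaustive search over all shift vectors within budget."""
    max_shift = [v.index(p) for v in votes]
    for shifts in product(*(range(m + 1) for m in max_shift)):
        cost = sum(s * prices[i] for i, s in enumerate(shifts))
        if cost > budget:
            continue
        new_votes = [shift(v, p, s) for v, s in zip(votes, shifts)]
        if p_in_committee(new_votes, cands, p, k):
            return shifts, cost
    return None

cands = ["a", "b", "c", "p"]
votes = [["a", "p", "b", "c"], ["b", "a", "p", "c"], ["a", "c", "p", "b"]]
print(shift_bribery_sntv(votes, cands, "p", k=2, prices=[1, 2, 1], budget=3))
```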