July 27, 2019

Paper Group ANR 573

From Deterministic to Generative: Multi-Modal Stochastic RNNs for Video Captioning. Variational Walkback: Learning a Transition Operator as a Stochastic Recurrent Net. The Galactic Dependencies Treebanks: Getting More Data by Synthesizing New Languages. Reconstruction of Word Embeddings from Sub-Word Parameters. Disentangled Representations via Syn …

From Deterministic to Generative: Multi-Modal Stochastic RNNs for Video Captioning

Title From Deterministic to Generative: Multi-Modal Stochastic RNNs for Video Captioning
Authors Jingkuan Song, Yuyu Guo, Lianli Gao, Xuelong Li, Alan Hanjalic, Heng Tao Shen
Abstract Video captioning is in essence a complex natural process, which is affected by various uncertainties stemming from video content, subjective judgment, etc. In this paper we build on the recent progress in using the encoder-decoder framework for video captioning and address what we find to be a critical deficiency of the existing methods: most of the decoders propagate deterministic hidden states. Such complex uncertainty cannot be modeled efficiently by deterministic models. We propose a generative approach, referred to as multi-modal stochastic RNNs (MS-RNN), which models the uncertainty observed in the data using latent stochastic variables. As a result, MS-RNN can improve the performance of video captioning and generate multiple sentences to describe a video considering different random factors. Specifically, a multi-modal LSTM (M-LSTM) is first proposed to interact with both visual and textual features to capture a high-level representation. Then, a backward stochastic LSTM (S-LSTM) is proposed to support uncertainty propagation by introducing latent variables. Experimental results on the challenging datasets MSVD and MSR-VTT show that our proposed MS-RNN approach outperforms the state-of-the-art video captioning benchmarks.
Tasks Video Captioning
Published 2017-08-08
URL http://arxiv.org/abs/1708.02478v2
PDF http://arxiv.org/pdf/1708.02478v2.pdf
PWC https://paperswithcode.com/paper/from-deterministic-to-generative-multi-modal
Repo
Framework

Variational Walkback: Learning a Transition Operator as a Stochastic Recurrent Net

Title Variational Walkback: Learning a Transition Operator as a Stochastic Recurrent Net
Authors Anirudh Goyal, Nan Rosemary Ke, Surya Ganguli, Yoshua Bengio
Abstract We propose a novel method to directly learn a stochastic transition operator whose repeated application provides generated samples. Traditional undirected graphical models approach this problem indirectly by learning a Markov chain model whose stationary distribution obeys detailed balance with respect to a parameterized energy function. The energy function is then modified so the model and data distributions match, with no guarantee on the number of steps required for the Markov chain to converge. Moreover, the detailed balance condition is highly restrictive: energy based models corresponding to neural networks must have symmetric weights, unlike biological neural circuits. In contrast, we develop a method for directly learning arbitrarily parameterized transition operators capable of expressing non-equilibrium stationary distributions that violate detailed balance, thereby enabling us to learn more biologically plausible asymmetric neural networks and more general non-energy based dynamical systems. The proposed training objective, which we derive via principled variational methods, encourages the transition operator to “walk back” in multi-step trajectories that start at data-points, as quickly as possible back to the original data points. We present a series of experimental results illustrating the soundness of the proposed approach, Variational Walkback (VW), on the MNIST, CIFAR-10, SVHN and CelebA datasets, demonstrating superior samples compared to earlier attempts to learn a transition operator. We also show that although each rapid training trajectory is limited to a finite but variable number of steps, our transition operator continues to generate good samples well past the length of such trajectories, thereby demonstrating the match of its non-equilibrium stationary distribution to the data distribution. Source Code: http://github.com/anirudh9119/walkback_nips17
Tasks
Published 2017-11-07
URL http://arxiv.org/abs/1711.02282v1
PDF http://arxiv.org/pdf/1711.02282v1.pdf
PWC https://paperswithcode.com/paper/variational-walkback-learning-a-transition
Repo
Framework

The Galactic Dependencies Treebanks: Getting More Data by Synthesizing New Languages

Title The Galactic Dependencies Treebanks: Getting More Data by Synthesizing New Languages
Authors Dingquan Wang, Jason Eisner
Abstract We release Galactic Dependencies 1.0—a large set of synthetic languages not found on Earth, but annotated in Universal Dependencies format. This new resource aims to provide training and development data for NLP methods that aim to adapt to unfamiliar languages. Each synthetic treebank is produced from a real treebank by stochastically permuting the dependents of nouns and/or verbs to match the word order of other real languages. We discuss the usefulness, realism, parsability, perplexity, and diversity of the synthetic languages. As a simple demonstration of the use of Galactic Dependencies, we consider single-source transfer, which attempts to parse a real target language using a parser trained on a “nearby” source language. We find that including synthetic source languages somewhat increases the diversity of the source pool, which significantly improves results for most target languages.
Tasks
Published 2017-10-10
URL http://arxiv.org/abs/1710.03838v1
PDF http://arxiv.org/pdf/1710.03838v1.pdf
PWC https://paperswithcode.com/paper/the-galactic-dependencies-treebanks-getting
Repo
Framework

Reconstruction of Word Embeddings from Sub-Word Parameters

Title Reconstruction of Word Embeddings from Sub-Word Parameters
Authors Karl Stratos
Abstract Pre-trained word embeddings improve the performance of a neural model at the cost of increasing the model size. We propose to benefit from this resource without paying the cost by operating strictly at the sub-lexical level. Our approach is quite simple: before task-specific training, we first optimize sub-word parameters to reconstruct pre-trained word embeddings using various distance measures. We report interesting results on a variety of tasks: word similarity, word analogy, and part-of-speech tagging.
Tasks Part-Of-Speech Tagging, Word Embeddings
Published 2017-07-21
URL http://arxiv.org/abs/1707.06957v1
PDF http://arxiv.org/pdf/1707.06957v1.pdf
PWC https://paperswithcode.com/paper/reconstruction-of-word-embeddings-from-sub
Repo
Framework
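
The reconstruction step described in the abstract above (fit sub-word parameters so that they reproduce pre-trained word embeddings) can be illustrated with a small sketch. The abstract does not say how sub-word units are composed or which distance is used, so everything specific here is an assumption made for illustration: character trigrams as the sub-word units, a mean of trigram vectors as the composition, squared Euclidean distance as the loss, and the hypothetical helper names `char_ngrams`, `compose`, `reconstruction_loss`, and `fit`.

```python
import random


def char_ngrams(word, n=3):
    """Split a word into character n-grams, with < and > as boundary markers."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def compose(word, table, dim):
    """Assumed composition: the word vector is the mean of its n-gram vectors."""
    grams = char_ngrams(word)
    vec = [0.0] * dim
    for gram in grams:
        for k, value in enumerate(table[gram]):
            vec[k] += value
    return [value / len(grams) for value in vec]


def reconstruction_loss(targets, table, dim):
    """Mean squared Euclidean distance between composed and target vectors."""
    total = 0.0
    for word, target in targets.items():
        pred = compose(word, table, dim)
        total += sum((p - t) ** 2 for p, t in zip(pred, target))
    return total / len(targets)


def fit(targets, dim, steps=200, lr=0.5, seed=0):
    """Fit n-gram vectors by plain gradient descent on the squared error."""
    rng = random.Random(seed)
    table = {
        gram: [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        for word in targets
        for gram in char_ngrams(word)
    }
    for _ in range(steps):
        for word, target in targets.items():
            grams = char_ngrams(word)
            pred = compose(word, table, dim)
            for gram in grams:
                for k in range(dim):
                    # Gradient of (pred_k - target_k)^2 w.r.t. one n-gram entry.
                    grad = 2.0 * (pred[k] - target[k]) / len(grams)
                    table[gram][k] -= lr * grad
    return table


# Toy "pre-trained" embeddings (made up for this example).
toy_targets = {"cat": [1.0, 0.0], "car": [0.0, 1.0]}
toy_table = fit(toy_targets, dim=2)
final_loss = reconstruction_loss(toy_targets, toy_table, dim=2)
```

After fitting, only the n-gram table needs to be stored; a word vector is rebuilt on demand with `compose`, which is the size saving the abstract alludes to.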

Disentangled Representations via Synergy Minimization

Title Disentangled Representations via Synergy Minimization
Authors Greg Ver Steeg, Rob Brekelmans, Hrayr Harutyunyan, Aram Galstyan
Abstract Scientists often seek simplified representations of complex systems to facilitate prediction and understanding. If the factors comprising a representation allow us to make accurate predictions about our system, but obscuring any subset of the factors destroys our ability to make predictions, we say that the representation exhibits informational synergy. We argue that synergy is an undesirable feature in learned representations and that explicitly minimizing synergy can help disentangle the true factors of variation underlying data. We explore different ways of quantifying synergy, deriving new closed-form expressions in some cases, and then show how to modify learning to produce representations that are minimally synergistic. We introduce a benchmark task to disentangle separate characters from images of words. We demonstrate that Minimally Synergistic (MinSyn) representations correctly disentangle characters while methods relying on statistical independence fail.
Tasks
Published 2017-10-10
URL http://arxiv.org/abs/1710.03839v1
PDF http://arxiv.org/pdf/1710.03839v1.pdf
PWC https://paperswithcode.com/paper/disentangled-representations-via-synergy
Repo
Framework

Depth from Monocular Images using a Semi-Parallel Deep Neural Network (SPDNN) Hybrid Architecture

Title Depth from Monocular Images using a Semi-Parallel Deep Neural Network (SPDNN) Hybrid Architecture
Authors S. Bazrafkan, H. Javidnia, J. Lemley, P. Corcoran
Abstract Deep neural networks have been applied to a wide range of problems in recent years. In this work, a Convolutional Neural Network (CNN) is applied to the problem of determining the depth from a single camera image (monocular depth). Eight different networks are designed to perform depth estimation, each of them suitable for a feature level. Networks with different pooling sizes determine different feature levels. After designing a set of networks, these models may be combined into a single network topology using graph optimization techniques. This “Semi Parallel Deep Neural Network (SPDNN)” eliminates duplicated common network layers, and can be further optimized by retraining to achieve an improved model compared to the individual topologies. In this study, four SPDNN models are trained and evaluated in two stages on the KITTI dataset. The ground truth images in the first part of the experiment are provided by the benchmark, and for the second part, the ground truth images are the depth map results from applying a state-of-the-art stereo matching method. The results of this evaluation demonstrate that using post-processing techniques to refine the target of the network increases the accuracy of depth estimation on individual mono images. The second evaluation shows that using segmentation data alongside the original data as the input can improve the depth estimation results to a point where performance is comparable with stereo depth estimation. The computational time is also discussed in this study.
Tasks Depth Estimation, Stereo Depth Estimation, Stereo Matching, Stereo Matching Hand
Published 2017-03-10
URL http://arxiv.org/abs/1703.03867v3
PDF http://arxiv.org/pdf/1703.03867v3.pdf
PWC https://paperswithcode.com/paper/depth-from-monocular-images-using-a-semi
Repo
Framework

Learning with Latent Language

Title Learning with Latent Language
Authors Jacob Andreas, Dan Klein, Sergey Levine
Abstract The named concepts and compositional operators present in natural language provide a rich source of information about the kinds of abstractions humans use to navigate the world. Can this linguistic background knowledge improve the generality and efficiency of learned classifiers and control policies? This paper aims to show that using the space of natural language strings as a parameter space is an effective way to capture natural task structure. In a pretraining phase, we learn a language interpretation model that transforms inputs (e.g. images) into outputs (e.g. labels) given natural language descriptions. To learn a new concept (e.g. a classifier), we search directly in the space of descriptions to minimize the interpreter’s loss on training examples. Crucially, our models do not require language data to learn these concepts: language is used only in pretraining to impose structure on subsequent learning. Results on image classification, text editing, and reinforcement learning show that, in all settings, models with a linguistic parameterization outperform those without.
Tasks Image Classification
Published 2017-11-01
URL http://arxiv.org/abs/1711.00482v1
PDF http://arxiv.org/pdf/1711.00482v1.pdf
PWC https://paperswithcode.com/paper/learning-with-latent-language
Repo
Framework

Going Further with Point Pair Features

Title Going Further with Point Pair Features
Authors Stefan Hinterstoisser, Vincent Lepetit, Naresh Rajkumar, Kurt Konolige
Abstract Point Pair Features (PPFs) are widely used to detect 3D objects in point clouds; however, they are prone to fail in the presence of sensor noise and background clutter. We introduce novel sampling and voting schemes that significantly reduce the influence of clutter and sensor noise. Our experiments show that with our improvements, PPFs become competitive against state-of-the-art methods, as they outperform them on several objects from challenging benchmarks, at a low computational cost.
Tasks 6D Pose Estimation using RGB
Published 2017-11-11
URL http://arxiv.org/abs/1711.04061v1
PDF http://arxiv.org/pdf/1711.04061v1.pdf
PWC https://paperswithcode.com/paper/going-further-with-point-pair-features
Repo
Framework

Beyond Evolutionary Algorithms for Search-based Software Engineering

Title Beyond Evolutionary Algorithms for Search-based Software Engineering
Authors Jianfeng Chen, Vivek Nair, Tim Menzies
Abstract Context: Evolutionary algorithms typically require a large number of evaluations (of solutions) to converge, which can be very slow and expensive to evaluate. Objective: To solve search-based software engineering (SE) problems, using fewer evaluations than evolutionary methods. Method: Instead of mutating a small population, we build a very large initial population which is then culled using a recursive bi-clustering chop approach. We evaluate this approach on multiple SE models, unconstrained as well as constrained, and compare its performance with standard evolutionary algorithms. Results: Using just a few evaluations (under 100), we can obtain comparable results to state-of-the-art evolutionary algorithms. Conclusion: Just because something works, and is in widespread use, does not necessarily mean that there is no value in seeking methods to improve that method. Before undertaking search-based SE optimization tasks using traditional EAs, it is recommended to try other techniques, like those explored here, to obtain the same results with fewer evaluations.
Tasks
Published 2017-01-27
URL http://arxiv.org/abs/1701.07950v3
PDF http://arxiv.org/pdf/1701.07950v3.pdf
PWC https://paperswithcode.com/paper/beyond-evolutionary-algorithms-for-search
Repo
Framework

Improving Opinion-Target Extraction with Character-Level Word Embeddings

Title Improving Opinion-Target Extraction with Character-Level Word Embeddings
Authors Soufian Jebbara, Philipp Cimiano
Abstract Fine-grained sentiment analysis has received increasing attention in recent years. Extracting opinion target expressions (OTE) in reviews is often an important step in fine-grained, aspect-based sentiment analysis. Retrieving this information from user-generated text, however, can be difficult. Customer reviews, for instance, are prone to contain misspelled words and are difficult to process due to their domain-specific language. In this work, we investigate whether character-level models can improve the performance for the identification of opinion target expressions. We integrate information about the character structure of a word into a sequence labeling system using character-level word embeddings and show their positive impact on the system’s performance. Specifically, we obtain an increase of 3.3 points in F1-score with respect to our baseline model. In further experiments, we reveal encoded character patterns of the learned embeddings and give a nuanced view of the performance differences of both models.
Tasks Aspect-Based Sentiment Analysis, Sentiment Analysis, Word Embeddings
Published 2017-09-19
URL http://arxiv.org/abs/1709.06317v1
PDF http://arxiv.org/pdf/1709.06317v1.pdf
PWC https://paperswithcode.com/paper/improving-opinion-target-extraction-with
Repo
Framework

On the benefits of output sparsity for multi-label classification

Title On the benefits of output sparsity for multi-label classification
Authors Evgenii Chzhen, Christophe Denis, Mohamed Hebiri, Joseph Salmon
Abstract The multi-label classification framework, where each observation can be associated with a set of labels, has generated a tremendous amount of attention over recent years. Modern multi-label problems are typically large-scale in terms of the number of observations, features and labels, and the number of labels can even be comparable with the number of observations. In this context, different remedies have been proposed to overcome the curse of dimensionality. In this work, we aim at exploiting the output sparsity by introducing a new loss, called the sparse weighted Hamming loss. This proposed loss can be seen as a weighted version of classical ones, where active and inactive labels are weighted separately. Leveraging the influence of sparsity in the loss function, we provide improved generalization bounds for the empirical risk minimizer, a suitable property for large-scale problems. For this new loss, we derive rates of convergence linear in the underlying output-sparsity rather than linear in the number of labels. In practice, minimizing the associated risk can be performed efficiently by using convex surrogates and modern convex optimization algorithms. We provide experiments on various real-world datasets demonstrating the pertinence of our approach when compared to non-weighted techniques.
Tasks Multi-Label Classification
Published 2017-03-14
URL http://arxiv.org/abs/1703.04697v1
PDF http://arxiv.org/pdf/1703.04697v1.pdf
PWC https://paperswithcode.com/paper/on-the-benefits-of-output-sparsity-for-multi
Repo
Framework
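
The abstract above describes a Hamming-style loss in which active and inactive labels are weighted separately. A minimal sketch of that idea follows. The abstract gives neither the actual weights nor any normalization, so the function name `weighted_hamming_loss`, the parameters `w_active` and `w_inactive`, and the unnormalized sum are all assumptions for illustration and not the paper's definition.

```python
def weighted_hamming_loss(y_true, y_pred, w_active=1.0, w_inactive=1.0):
    """Hamming-style loss with separate weights per label type.

    y_true, y_pred: equal-length sequences of 0/1 label indicators.
    w_active:   cost of missing a label that is active (true 1, predicted 0).
    w_inactive: cost of predicting a label that is inactive (true 0, predicted 1).
    Both weights are hypothetical knobs; with both set to 1.0 this reduces
    to the classical (unnormalized) Hamming distance.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    loss = 0.0
    for truth, guess in zip(y_true, y_pred):
        if truth == 1 and guess == 0:
            loss += w_active      # an active label was missed
        elif truth == 0 and guess == 1:
            loss += w_inactive    # an inactive label was wrongly predicted
    return loss


# Sparse ground truth: only 2 of 5 labels are active.
truth = [1, 0, 0, 0, 1]
guess = [0, 0, 1, 0, 1]
plain = weighted_hamming_loss(truth, guess)               # one miss + one false alarm
weighted = weighted_hamming_loss(truth, guess, 2.0, 0.5)  # misses cost more
```

When the label vector is sparse, raising `w_active` relative to `w_inactive` keeps the trivial all-zeros prediction from looking artificially good, which is one plausible motivation for weighting the two error types separately.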

Voltage-Driven Domain-Wall Motion based Neuro-Synaptic Devices for Dynamic On-line Learning

Title Voltage-Driven Domain-Wall Motion based Neuro-Synaptic Devices for Dynamic On-line Learning
Authors Akhilesh Jaiswal, Amogh Agrawal, Priyadarshini Panda, Kaushik Roy
Abstract Conventional von-Neumann computing models have achieved remarkable feats over the past few decades. However, they fail to deliver the required efficiency for certain basic tasks like image and speech recognition when compared to biological systems. As such, taking cues from biological systems, novel computing paradigms are being explored for efficient hardware implementations of recognition/classification tasks. The basic building blocks of such neuromorphic systems are neurons and synapses. Towards that end, we propose a leaky-integrate-fire (LIF) neuron and a programmable non-volatile synapse using domain wall motion induced by the magneto-electric effect. Due to a strong elastic pinning between the ferro-magnetic domain wall (FM-DW) and the underlying ferro-electric domain wall (FE-DW), the FM-DW gets dragged by the FE-DW on application of a voltage pulse. The fact that FE materials are insulators allows for pure voltage-driven FM-DW motion, which in turn can be used to mimic the behaviors of biological spiking neurons and synapses. The voltage-driven nature of the proposed devices allows energy-efficient operation. A detailed device-to-system-level simulation framework based on micromagnetic simulations has been developed to analyze the feasibility of the proposed neuro-synaptic devices. We also demonstrate that the energy-efficient voltage-controlled behavior of the proposed devices makes them suitable for dynamic on-line and lifelong learning in spiking neural networks (SNNs).
Tasks Speech Recognition
Published 2017-05-19
URL http://arxiv.org/abs/1705.06942v2
PDF http://arxiv.org/pdf/1705.06942v2.pdf
PWC https://paperswithcode.com/paper/voltage-driven-domain-wall-motion-based-neuro
Repo
Framework
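
For readers unfamiliar with the leaky-integrate-fire (LIF) neuron mentioned in the abstract above, the following is a textbook software model of its behavior, not a model of the domain-wall device the paper proposes. The function name `simulate_lif`, the Euler time step, and all parameter values are assumptions chosen for illustration.

```python
def simulate_lif(currents, dt=1.0, tau=10.0, v_rest=0.0, v_reset=0.0, v_th=1.0):
    """Simulate a leaky integrate-and-fire neuron with forward-Euler steps.

    currents: input drive at each time step.
    tau:      leak time constant; larger values leak more slowly.
    v_th:     firing threshold; on reaching it the neuron spikes and
              its membrane potential is set back to v_reset.
    Returns a list of 0/1 spike indicators, one per time step.
    """
    v = v_rest
    spikes = []
    for drive in currents:
        # Leak pulls v back toward v_rest; the input drive pushes it up.
        v += dt * (-(v - v_rest) / tau + drive)
        if v >= v_th:
            spikes.append(1)
            v = v_reset
        else:
            spikes.append(0)
    return spikes


strong = simulate_lif([0.3] * 20)   # drive large enough to reach threshold
weak = simulate_lif([0.05] * 200)   # leak balances the drive below threshold
```

With the weak input the potential settles near `tau * drive = 0.5`, below the threshold, so the neuron stays silent; the strong input charges it past the threshold every few steps, giving a regular spike train.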

High Dynamic Range Imaging Technology

Title High Dynamic Range Imaging Technology
Authors Alessandro Artusi, Thomas Richter, Touradj Ebrahimi, Rafal K. Mantiuk
Abstract In this lecture note, we describe high dynamic range (HDR) imaging systems; such systems are able to represent luminances of much larger brightness and, typically, also a larger range of colors than conventional standard dynamic range (SDR) imaging systems. The larger luminance range greatly improves the overall quality of visual content, making it appear much more realistic and appealing to observers. HDR is one of the key technologies of the future imaging pipeline, which will change the way digital visual content is represented and manipulated today.
Tasks
Published 2017-11-30
URL http://arxiv.org/abs/1711.11326v1
PDF http://arxiv.org/pdf/1711.11326v1.pdf
PWC https://paperswithcode.com/paper/high-dynamic-range-imaging-technology
Repo
Framework

Learning Robust Hash Codes for Multiple Instance Image Retrieval

Title Learning Robust Hash Codes for Multiple Instance Image Retrieval
Authors Sailesh Conjeti, Magdalini Paschali, Amin Katouzian, Nassir Navab
Abstract In this paper, for the first time, we introduce a multiple instance (MI) deep hashing technique for learning discriminative hash codes with weak bag-level supervision suited for large-scale retrieval. We learn such hash codes by aggregating deeply learnt hierarchical representations across bag members through a dedicated MI pool layer. For better trainability and retrieval quality, we propose a two-pronged approach that includes robust optimization and training with an auxiliary single instance hashing arm which is down-regulated gradually. We pose retrieval for tumor assessment as an MI problem because tumors often coexist with benign masses and could exhibit complementary signatures when scanned from different anatomical views. Experimental validations on benchmark mammography and histology datasets demonstrate improved retrieval performance over the state-of-the-art methods.
Tasks Image Retrieval
Published 2017-03-16
URL http://arxiv.org/abs/1703.05724v1
PDF http://arxiv.org/pdf/1703.05724v1.pdf
PWC https://paperswithcode.com/paper/learning-robust-hash-codes-for-multiple
Repo
Framework

Can Boltzmann Machines Discover Cluster Updates ?

Title Can Boltzmann Machines Discover Cluster Updates ?
Authors Lei Wang
Abstract Boltzmann machines are physics informed generative models with wide applications in machine learning. They can learn the probability distribution from an input dataset and generate new samples accordingly. Applying them back to physics, the Boltzmann machines are ideal recommender systems to accelerate Monte Carlo simulation of physical systems due to their flexibility and effectiveness. More intriguingly, we show that the generative sampling of the Boltzmann Machines can even discover unknown cluster Monte Carlo algorithms. The creative power comes from the latent representation of the Boltzmann machines, which learn to mediate complex interactions and identify clusters of the physical system. We demonstrate these findings with concrete examples of the classical Ising model with and without four spin plaquette interactions. Our results endorse a fresh research paradigm where intelligent machines are designed to create or inspire human discovery of innovative algorithms.
Tasks Recommendation Systems
Published 2017-02-28
URL http://arxiv.org/abs/1702.08586v1
PDF http://arxiv.org/pdf/1702.08586v1.pdf
PWC https://paperswithcode.com/paper/can-boltzmann-machines-discover-cluster
Repo
Framework