May 7, 2019

Paper Group AWR 59

Learning Residual Images for Face Attribute Manipulation


Title	Learning Residual Images for Face Attribute Manipulation
Authors	Wei Shen, Rujie Liu
Abstract	Face attributes are interesting due to their detailed description of human faces. Unlike prior research on attribute prediction, we address an inverse and more challenging problem called face attribute manipulation which aims at modifying a face image according to a given attribute value. Instead of manipulating the whole image, we propose to learn the corresponding residual image defined as the difference between images before and after the manipulation. In this way, the manipulation can be operated efficiently with modest pixel modification. The framework of our approach is based on the Generative Adversarial Network. It consists of two image transformation networks and a discriminative network. The transformation networks are responsible for the attribute manipulation and its dual operation and the discriminative network is used to distinguish the generated images from real images. We also apply dual learning to allow transformation networks to learn from each other. Experiments show that residual images can be effectively learned and used for attribute manipulations. The generated images retain most of the details in attribute-irrelevant areas.
Tasks
Published	2016-12-16
URL	http://arxiv.org/abs/1612.05363v2
PDF	http://arxiv.org/pdf/1612.05363v2.pdf
PWC	https://paperswithcode.com/paper/learning-residual-images-for-face-attribute
Repo	https://github.com/Juzov/FaceAttributeManipulation
Framework	tf
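
The residual-image idea in the abstract above can be illustrated with plain arithmetic: the residual is the pixel-wise difference between the image after and before manipulation, and adding it back to the input yields the edited image. The following is a minimal sketch, not the authors' code. The function names and the clipping to the 0..255 range are assumptions made for illustration, and in the paper the residual is produced by a trained transformation network rather than given by hand.

```python
def residual_of(before, after):
    """Pixel-wise difference between the image after and before manipulation."""
    return [[a - b for b, a in zip(b_row, a_row)]
            for b_row, a_row in zip(before, after)]


def apply_residual(image, residual, lo=0, hi=255):
    """Add a residual to an image, clipping to [lo, hi] (clipping is an assumption)."""
    return [[min(hi, max(lo, p + r)) for p, r in zip(img_row, res_row)]
            for img_row, res_row in zip(image, residual)]


# Toy 2x2 grayscale "face": only the bottom row is touched by the residual,
# so the attribute-irrelevant top row is left unchanged.
face = [[100, 120], [130, 250]]
residual = [[0, 0], [20, 10]]
edited = apply_residual(face, residual)
```

Because most entries of the residual are zero, pixels outside the attribute region pass through unmodified, which matches the abstract's claim about modest pixel modification.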

Parallelizing Word2Vec in Multi-Core and Many-Core Architectures


Title	Parallelizing Word2Vec in Multi-Core and Many-Core Architectures
Authors	Shihao Ji, Nadathur Satish, Sheng Li, Pradeep Dubey
Abstract	Word2vec is a widely used algorithm for extracting low-dimensional vector representations of words. State-of-the-art algorithms including those by Mikolov et al. have been parallelized for multi-core CPU architectures, but are based on vector-vector operations with “Hogwild” updates that are memory-bandwidth intensive and do not efficiently use computational resources. In this paper, we propose “HogBatch” by improving reuse of various data structures in the algorithm through the use of minibatching and negative sample sharing, hence allowing us to express the problem using matrix multiply operations. We also explore different techniques to distribute word2vec computation across nodes in a compute cluster, and demonstrate good strong scalability up to 32 nodes. The new algorithm is particularly suitable for modern multi-core/many-core architectures, especially Intel’s latest Knights Landing processors, and allows us to scale up the computation near linearly across cores and nodes, and process hundreds of millions of words per second, which is the fastest word2vec implementation to the best of our knowledge.
Tasks
Published	2016-11-18
URL	http://arxiv.org/abs/1611.06172v2
PDF	http://arxiv.org/pdf/1611.06172v2.pdf
PWC	https://paperswithcode.com/paper/parallelizing-word2vec-in-multi-core-and-many
Repo	https://github.com/IntelLabs/pWord2Vec
Framework	none
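
To make the abstract's point about matrix multiply operations concrete, here is a minimal sketch of scoring a minibatch of input word vectors against one shared set of output vectors (one target plus shared negative samples). It is not taken from the pWord2Vec code. All names are hypothetical, and the pure-Python `matmul` only stands in for the optimized matrix-multiply routine a real implementation would presumably call.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def scores_vector_vector(inputs, outputs):
    """One independent dot product per (input word, output word) pair."""
    return [[dot(x, o) for o in outputs] for x in inputs]


def matmul(a, b):
    """Plain matrix product; a stand-in for an optimized GEMM call."""
    cols = list(zip(*b))
    return [[dot(row, col) for col in cols] for row in a]


def scores_batched(inputs, outputs):
    """All scores at once: (B x D) times (D x (K+1)) gives a B x (K+1) matrix."""
    outputs_t = [list(col) for col in zip(*outputs)]
    return matmul(inputs, outputs_t)


# B = 3 input words in the minibatch, D = 2 dimensions.
inputs = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
# One target word followed by K = 3 negative samples shared by the whole batch.
outputs = [[1.0, 1.0], [2.0, 0.0], [0.0, -1.0], [1.0, -1.0]]

batched = scores_batched(inputs, outputs)
```

Both functions compute the same numbers. The difference is that sharing the negatives across the batch lets the work be expressed as a single matrix product, which reuses each vector many times instead of streaming it from memory for every pair.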

Style Imitation and Chord Invention in Polyphonic Music with Exponential Families


Title	Style Imitation and Chord Invention in Polyphonic Music with Exponential Families
Authors	Gaëtan Hadjeres, Jason Sakellariou, François Pachet
Abstract	Modeling polyphonic music is a particularly challenging task because of the intricate interplay between melody and harmony. A good model should satisfy three requirements: statistical accuracy (capturing faithfully the statistics of correlations at various ranges, horizontally and vertically), flexibility (coping with arbitrary user constraints), and generalization capacity (inventing new material, while staying in the style of the training corpus). Models proposed so far fail on at least one of these requirements. We propose a statistical model of polyphonic music, based on the maximum entropy principle. This model is able to learn and reproduce pairwise statistics between neighboring note events in a given corpus. The model is also able to invent new chords and to harmonize unknown melodies. We evaluate the invention capacity of the model by assessing the amount of cited, re-discovered, and invented chords on a corpus of Bach chorales. We discuss how the model enables the user to specify and enforce user-defined constraints, which makes it useful for style-based, interactive music generation.
Tasks	Music Generation
Published	2016-09-16
URL	http://arxiv.org/abs/1609.05152v1
PDF	http://arxiv.org/pdf/1609.05152v1.pdf
PWC	https://paperswithcode.com/paper/style-imitation-and-chord-invention-in
Repo	https://github.com/kastnerkyle/pachet_experiments
Framework	none

Chains of Reasoning over Entities, Relations, and Text using Recurrent Neural Networks


Title	Chains of Reasoning over Entities, Relations, and Text using Recurrent Neural Networks
Authors	Rajarshi Das, Arvind Neelakantan, David Belanger, Andrew McCallum
Abstract	Our goal is to combine the rich multistep inference of symbolic logical reasoning with the generalization capabilities of neural networks. We are particularly interested in complex reasoning about entities and relations in text and large-scale knowledge bases (KBs). Neelakantan et al. (2015) use RNNs to compose the distributed semantics of multi-hop paths in KBs; however, for multiple reasons, the approach lacks accuracy and practicality. This paper proposes three significant modeling advances: (1) we learn to jointly reason about relations, entities, and entity-types; (2) we use neural attention modeling to incorporate multiple paths; (3) we learn to share strength in a single RNN that represents logical composition across all relations. On a large-scale Freebase+ClueWeb prediction task, we achieve 25% error reduction, and a 53% error reduction on sparse relations due to shared strength. On chains of reasoning in WordNet we reduce error in mean quantile by 84% versus previous state-of-the-art. The code and data are available at https://rajarshd.github.io/ChainsofReasoning
Tasks
Published	2016-07-05
URL	http://arxiv.org/abs/1607.01426v3
PDF	http://arxiv.org/pdf/1607.01426v3.pdf
PWC	https://paperswithcode.com/paper/chains-of-reasoning-over-entities-relations
Repo	https://github.com/rajarshd/ChainsofReasoning
Framework	torch

Compressed Learning: A Deep Neural Network Approach


Title	Compressed Learning: A Deep Neural Network Approach
Authors	Amir Adler, Michael Elad, Michael Zibulevsky
Abstract	Compressed Learning (CL) is a joint signal processing and machine learning framework for inference from a signal, using a small number of measurements obtained by linear projections of the signal. In this paper we present an end-to-end deep learning approach for CL, in which a network composed of fully-connected layers followed by convolutional layers performs the linear sensing and non-linear inference stages. During the training phase, the sensing matrix and the non-linear inference operator are jointly optimized, and the proposed approach outperforms state-of-the-art for the task of image classification. For example, at a sensing rate of 1% (only 8 measurements of 28 × 28 pixel images), the classification error for the MNIST handwritten digits dataset is 6.46% compared to 41.06% with state-of-the-art.
Tasks	Image Classification
Published	2016-10-30
URL	http://arxiv.org/abs/1610.09615v1
PDF	http://arxiv.org/pdf/1610.09615v1.pdf
PWC	https://paperswithcode.com/paper/compressed-learning-a-deep-neural-network
Repo	https://github.com/viebboy/MultilinearCompressiveLearningFramework
Framework	tf
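
The arithmetic behind "a sensing rate of 1% (only 8 measurements of 28 × 28 pixel images)" and the linear sensing step can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: the rounding rule is assumed, the function names are hypothetical, and the sensing matrix here is random, whereas the abstract says it is learned jointly with the inference network.

```python
import random


def num_measurements(height, width, sensing_rate):
    """Measurements kept at a given sensing rate (rounding rule is an assumption)."""
    return round(height * width * sensing_rate)


def sense(phi, x):
    """Linear sensing y = Phi x: one inner product per row of Phi."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


n = 28 * 28                           # 784 pixels per MNIST image
m = num_measurements(28, 28, 0.01)    # 7.84 rounds to 8

rng = random.Random(0)
# Random Gaussian matrix, for illustration only; the paper optimizes Phi in training.
phi = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
x = [rng.random() for _ in range(n)]  # stand-in for a flattened image
y = sense(phi, x)                     # the 8 numbers the classifier would see
```

The classifier therefore operates on a vector of length 8 instead of 784, which is what makes the reported error rates at this sensing rate notable.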

Enhanced LSTM for Natural Language Inference


Title	Enhanced LSTM for Natural Language Inference
Authors	Qian Chen, Xiaodan Zhu, Zhenhua Ling, Si Wei, Hui Jiang, Diana Inkpen
Abstract	Reasoning and inference are central to human and artificial intelligence. Modeling inference in human language is very challenging. With the availability of large annotated data (Bowman et al., 2015), it has recently become feasible to train neural network based inference models, which have been shown to be very effective. In this paper, we present a new state-of-the-art result, achieving an accuracy of 88.6% on the Stanford Natural Language Inference Dataset. Unlike the previous top models that use very complicated network architectures, we first demonstrate that carefully designing sequential inference models based on chain LSTMs can outperform all previous models. Based on this, we further show that by explicitly considering recursive architectures in both local inference modeling and inference composition, we achieve additional improvement. Particularly, incorporating syntactic parsing information contributes to our best result—it further improves the performance even when added to the already very strong model.
Tasks	Natural Language Inference
Published	2016-09-20
URL	http://arxiv.org/abs/1609.06038v3
PDF	http://arxiv.org/pdf/1609.06038v3.pdf
PWC	https://paperswithcode.com/paper/enhanced-lstm-for-natural-language-inference
Repo	https://github.com/blcunlp/CNLI
Framework	tf

DeeperCut: A Deeper, Stronger, and Faster Multi-Person Pose Estimation Model


Title	DeeperCut: A Deeper, Stronger, and Faster Multi-Person Pose Estimation Model
Authors	Eldar Insafutdinov, Leonid Pishchulin, Bjoern Andres, Mykhaylo Andriluka, Bernt Schiele
Abstract	The goal of this paper is to advance the state-of-the-art of articulated pose estimation in scenes with multiple people. To that end we contribute on three fronts. We propose (1) improved body part detectors that generate effective bottom-up proposals for body parts; (2) novel image-conditioned pairwise terms that allow the proposals to be assembled into a variable number of consistent body part configurations; and (3) an incremental optimization strategy that explores the search space more efficiently, thus leading both to better performance and significant speed-up factors. Evaluation is done on two single-person and two multi-person pose estimation benchmarks. The proposed approach significantly outperforms best known multi-person pose estimation results while demonstrating competitive performance on the task of single person pose estimation. Models and code available at http://pose.mpi-inf.mpg.de
Tasks	Multi-Person Pose Estimation, Pose Estimation
Published	2016-05-10
URL	http://arxiv.org/abs/1605.03170v3
PDF	http://arxiv.org/pdf/1605.03170v3.pdf
PWC	https://paperswithcode.com/paper/deepercut-a-deeper-stronger-and-faster-multi
Repo	https://github.com/toomanymat/pose_estimation
Framework	tf

Recurrent Neural Networks With Limited Numerical Precision


Title	Recurrent Neural Networks With Limited Numerical Precision
Authors	Joachim Ott, Zhouhan Lin, Ying Zhang, Shih-Chii Liu, Yoshua Bengio
Abstract	Recurrent Neural Networks (RNNs) produce state-of-the-art performance on many machine learning tasks but their demand on resources in terms of memory and computational power is often high. Therefore, there is a great interest in optimizing the computations performed with these models, especially when considering development of specialized low-power hardware for deep networks. One way of reducing the computational needs is to limit the numerical precision of the network weights and biases. This has led to different proposed rounding methods which have been applied so far to only Convolutional Neural Networks and Fully-Connected Networks. This paper addresses the question of how to best reduce weight precision during training in the case of RNNs. We present results from the use of different stochastic and deterministic reduced precision training methods applied to three major RNN types which are then tested on several datasets. The results show that the weight binarization methods do not work with the RNNs. However, the stochastic and deterministic ternarization, and pow2-ternarization methods gave rise to low-precision RNNs that produce similar and even higher accuracy on certain datasets, therefore providing a path towards training more efficient implementations of RNNs in specialized hardware.
Tasks
Published	2016-08-24
URL	http://arxiv.org/abs/1608.06902v2
PDF	http://arxiv.org/pdf/1608.06902v2.pdf
PWC	https://paperswithcode.com/paper/recurrent-neural-networks-with-limited-1
Repo	https://github.com/ottj/QuantizedRNN
Framework	none
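
As a rough illustration of what deterministic and stochastic ternarization of a weight can look like, consider the sketch below. It is a generic example, not the exact rules used in the paper or in the QuantizedRNN repository: the 0.5 threshold, the clipping to [-1, 1], and the function names are all assumptions.

```python
import random


def ternarize_deterministic(w, threshold=0.5):
    """Map a weight to -1, 0 or +1 by thresholding (threshold is an assumption)."""
    if w > threshold:
        return 1
    if w < -threshold:
        return -1
    return 0


def ternarize_stochastic(w, rng):
    """Return sign(w) with probability |w|, else 0, after clipping w to [-1, 1].

    Under this rule the expected value of the ternary weight equals the clipped w.
    """
    w = max(-1.0, min(1.0, w))
    if rng.random() < abs(w):
        return 1 if w > 0 else -1
    return 0


weights = [0.9, -0.7, 0.2, -0.1]
deterministic = [ternarize_deterministic(w) for w in weights]

rng = random.Random(0)
samples = [ternarize_stochastic(0.3, rng) for _ in range(10000)]
mean_sample = sum(samples) / len(samples)  # close to 0.3 on average
```

The stochastic variant is noisy for any single draw but unbiased on average, which is one common argument for using it during training.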

Unsupervised Learning for Physical Interaction through Video Prediction


Title	Unsupervised Learning for Physical Interaction through Video Prediction
Authors	Chelsea Finn, Ian Goodfellow, Sergey Levine
Abstract	A core challenge for an agent learning to interact with the world is to predict how its actions affect objects in its environment. Many existing methods for learning the dynamics of physical interactions require labeled object information. However, to scale real-world interaction learning to a variety of scenes and objects, acquiring labeled data becomes increasingly impractical. To learn about physical object motion without labels, we develop an action-conditioned video prediction model that explicitly models pixel motion, by predicting a distribution over pixel motion from previous frames. Because our model explicitly predicts motion, it is partially invariant to object appearance, enabling it to generalize to previously unseen objects. To explore video prediction for real-world interactive agents, we also introduce a dataset of 59,000 robot interactions involving pushing motions, including a test set with novel objects. In this dataset, accurate prediction of videos conditioned on the robot’s future actions amounts to learning a “visual imagination” of different futures based on different courses of action. Our experiments show that our proposed method produces more accurate video predictions both quantitatively and qualitatively, when compared to prior methods.
Tasks	Video Prediction
Published	2016-05-23
URL	http://arxiv.org/abs/1605.07157v4
PDF	http://arxiv.org/pdf/1605.07157v4.pdf
PWC	https://paperswithcode.com/paper/unsupervised-learning-for-physical
Repo	https://github.com/tensorflow/models/tree/master/research/video_prediction
Framework	tf

TextProposals: a Text-specific Selective Search Algorithm for Word Spotting in the Wild


Title	TextProposals: a Text-specific Selective Search Algorithm for Word Spotting in the Wild
Authors	Lluis Gomez-Bigorda, Dimosthenis Karatzas
Abstract	Motivated by the success of powerful but expensive techniques to recognize words in a holistic way, object proposals techniques emerge as an alternative to the traditional text detectors. In this paper we introduce a novel object proposals method that is specifically designed for text. We rely on a similarity based region grouping algorithm that generates a hierarchy of word hypotheses. Over the nodes of this hierarchy it is possible to apply a holistic word recognition method in an efficient way. Our experiments demonstrate that the presented method is superior in its ability to produce good quality word proposals when compared with class-independent algorithms. We show impressive recall rates with a few thousand proposals in different standard benchmarks, including focused or incidental text datasets, and multi-language scenarios. Moreover, the combination of our object proposals with existing whole-word recognizers shows competitive performance in end-to-end word spotting, and, in some benchmarks, outperforms previously published results. Concretely, in the challenging ICDAR2015 Incidental Text dataset, we outperform by more than 10 percent in F-score the best-performing method in the last ICDAR Robust Reading Competition. Source code of the complete end-to-end system is available at https://github.com/lluisgomez/TextProposals
Tasks
Published	2016-04-10
URL	http://arxiv.org/abs/1604.02619v3
PDF	http://arxiv.org/pdf/1604.02619v3.pdf
PWC	https://paperswithcode.com/paper/textproposals-a-text-specific-selective
Repo	https://github.com/lluisgomez/TextProposals
Framework	none

Deep Watershed Transform for Instance Segmentation


Title	Deep Watershed Transform for Instance Segmentation
Authors	Min Bai, Raquel Urtasun
Abstract	Most contemporary approaches to instance segmentation use complex pipelines involving conditional random fields, recurrent neural networks, object proposals, or template matching schemes. In our paper, we present a simple yet powerful end-to-end convolutional neural network to tackle this task. Our approach combines intuitions from the classical watershed transform and modern deep learning to produce an energy map of the image where object instances are unambiguously represented as basins in the energy map. We then perform a cut at a single energy level to directly yield connected components corresponding to object instances. Our model more than doubles the performance of the state-of-the-art on the challenging Cityscapes Instance Level Segmentation task.
Tasks	Instance Segmentation, Semantic Segmentation
Published	2016-11-24
URL	http://arxiv.org/abs/1611.08303v2
PDF	http://arxiv.org/pdf/1611.08303v2.pdf
PWC	https://paperswithcode.com/paper/deep-watershed-transform-for-instance
Repo	https://github.com/timothyn617/watershed-transform
Framework	pytorch
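
The final step described in the abstract, cutting the energy map at a single level and reading off connected components, can be sketched without any learning involved. The example below is illustrative only: the energy map is hand-written rather than predicted by a network, the function name is hypothetical, 4-connectivity is assumed, and it is assumed that higher values lie deeper inside an instance (flip the comparison for the opposite convention).

```python
from collections import deque


def cut_and_label(energy, level):
    """Threshold the energy map at `level`, then label 4-connected components."""
    h, w = len(energy), len(energy[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for i in range(h):
        for j in range(w):
            if energy[i][j] < level or labels[i][j]:
                continue
            count += 1                      # start a new instance
            labels[i][j] = count
            queue = deque([(i, j)])
            while queue:                    # breadth-first flood fill
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and energy[ny][nx] >= level
                            and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Two objects touching through a low-energy ridge (the column of 1s).
energy = [
    [0, 0, 0, 0, 0, 0],
    [0, 2, 3, 1, 2, 0],
    [0, 3, 3, 1, 3, 0],
    [0, 0, 0, 0, 0, 0],
]
labels, n_instances = cut_and_label(energy, level=2)
```

Cutting at level 2 separates the two objects, while cutting at level 1 would merge them into one component, which shows why a well-shaped energy map matters more than the cut itself.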

A Taxonomy and Library for Visualizing Learned Features in Convolutional Neural Networks


Title	A Taxonomy and Library for Visualizing Learned Features in Convolutional Neural Networks
Authors	Felix Grün, Christian Rupprecht, Nassir Navab, Federico Tombari
Abstract	Over the last decade, Convolutional Neural Networks (CNN) saw a tremendous surge in performance. However, understanding what a network has learned still proves to be a challenging task. To remedy this unsatisfactory situation, a number of groups have recently proposed different methods to visualize the learned models. In this work we suggest a general taxonomy to classify and compare these methods, subdividing the literature into three main categories and providing researchers with a terminology to base their works on. Furthermore, we introduce the FeatureVis library for MatConvNet: an extendable, easy to use open source library for visualizing CNNs. It contains implementations from each of the three main classes of visualization methods and serves as a useful tool for an enhanced understanding of the features learned by intermediate layers, as well as for the analysis of why a network might fail for certain examples.
Tasks
Published	2016-06-24
URL	http://arxiv.org/abs/1606.07757v1
PDF	http://arxiv.org/pdf/1606.07757v1.pdf
PWC	https://paperswithcode.com/paper/a-taxonomy-and-library-for-visualizing
Repo	https://github.com/FelixGruen/featurevis
Framework	none

A Unifying Framework for Gaussian Process Pseudo-Point Approximations using Power Expectation Propagation


Title	A Unifying Framework for Gaussian Process Pseudo-Point Approximations using Power Expectation Propagation
Authors	Thang D. Bui, Josiah Yan, Richard E. Turner
Abstract	Gaussian processes (GPs) are flexible distributions over functions that enable high-level assumptions about unknown functions to be encoded in a parsimonious, flexible and general way. Although elegant, the application of GPs is limited by computational and analytical intractabilities that arise when data are sufficiently numerous or when employing non-Gaussian models. Consequently, a wealth of GP approximation schemes have been developed over the last 15 years to address these key limitations. Many of these schemes employ a small set of pseudo data points to summarise the actual data. In this paper, we develop a new pseudo-point approximation framework using Power Expectation Propagation (Power EP) that unifies a large number of these pseudo-point approximations. Unlike much of the previous venerable work in this area, the new framework is built on standard methods for approximate inference (variational free-energy, EP and Power EP methods) rather than employing approximations to the probabilistic generative model itself. In this way, all of the approximation is performed at 'inference time' rather than at 'modelling time', resolving awkward philosophical and empirical questions that trouble previous approaches. Crucially, we demonstrate that the new framework includes new pseudo-point approximation methods that outperform current approaches on regression and classification tasks.
Tasks	Gaussian Processes
Published	2016-05-23
URL	http://arxiv.org/abs/1605.07066v3
PDF	http://arxiv.org/pdf/1605.07066v3.pdf
PWC	https://paperswithcode.com/paper/a-unifying-framework-for-gaussian-process
Repo	https://github.com/thangbui/sparseGP_powerEP
Framework	none

How Document Pre-processing affects Keyphrase Extraction Performance


Title	How Document Pre-processing affects Keyphrase Extraction Performance
Authors	Florian Boudin, Hugo Mougard, Damien Cram
Abstract	The SemEval-2010 benchmark dataset has brought renewed attention to the task of automatic keyphrase extraction. This dataset is made up of scientific articles that were automatically converted from PDF format to plain text and thus require careful preprocessing so that irrelevant spans of text do not negatively affect keyphrase extraction performance. In previous work, a wide range of document preprocessing techniques were described but their impact on the overall performance of keyphrase extraction models is still unexplored. Here, we re-assess the performance of several keyphrase extraction models and measure their robustness against increasingly sophisticated levels of document preprocessing.
Tasks
Published	2016-10-25
URL	http://arxiv.org/abs/1610.07809v1
PDF	http://arxiv.org/pdf/1610.07809v1.pdf
PWC	https://paperswithcode.com/paper/how-document-pre-processing-affects-keyphrase
Repo	https://github.com/boudinfl/semeval-2010-pre
Framework	none

Times series averaging and denoising from a probabilistic perspective on time-elastic kernels


Title	Times series averaging and denoising from a probabilistic perspective on time-elastic kernels
Authors	Pierre-François Marteau
Abstract	In the light of regularized dynamic time warping kernels, this paper re-considers the concept of time elastic centroid for a set of time series. We derive a new algorithm based on a probabilistic interpretation of kernel alignment matrices. This algorithm expresses the averaging process in terms of a stochastic alignment automaton. It uses an iterative agglomerative heuristic method for averaging the aligned samples, while also averaging the times of occurrence of the aligned samples. By comparing classification accuracies for 45 heterogeneous time series datasets obtained by first nearest centroid/medoid classifiers we show that: i) centroid-based approaches significantly outperform medoid-based approaches, ii) for the considered datasets, our algorithm that combines averaging in the sample space and along the time axes, emerges as the most significantly robust model for time-elastic averaging with a promising noise reduction capability. We also demonstrate its benefit in an isolated gesture recognition experiment and its ability to significantly reduce the size of training instance sets. Finally we highlight its denoising capability using demonstrative synthetic data: we show that it is possible to retrieve, from few noisy instances, a signal whose components are scattered in a wide spectral band.
Tasks	Denoising, Gesture Recognition, Time Series, Time Series Denoising
Published	2016-11-28
URL	http://arxiv.org/abs/1611.09194v4
PDF	http://arxiv.org/pdf/1611.09194v4.pdf
PWC	https://paperswithcode.com/paper/times-series-averaging-and-denoising-from-a
Repo	https://github.com/pfmarteau/eKATS
Framework	none