Paper Group AWR 59
Learning Residual Images for Face Attribute Manipulation
Title | Learning Residual Images for Face Attribute Manipulation |
Authors | Wei Shen, Rujie Liu |
Abstract | Face attributes are interesting due to their detailed description of human faces. Unlike prior research on attribute prediction, we address an inverse and more challenging problem called face attribute manipulation, which aims at modifying a face image according to a given attribute value. Instead of manipulating the whole image, we propose to learn the corresponding residual image, defined as the difference between images before and after the manipulation. In this way, the manipulation can be operated efficiently with modest pixel modification. The framework of our approach is based on the Generative Adversarial Network. It consists of two image transformation networks and a discriminative network. The transformation networks are responsible for the attribute manipulation and its dual operation, and the discriminative network is used to distinguish the generated images from real images. We also apply dual learning to allow transformation networks to learn from each other. Experiments show that residual images can be effectively learned and used for attribute manipulations. The generated images retain most of the details in attribute-irrelevant areas. |
Tasks | |
Published | 2016-12-16 |
URL | http://arxiv.org/abs/1612.05363v2 |
PDF | http://arxiv.org/pdf/1612.05363v2.pdf |
PWC | https://paperswithcode.com/paper/learning-residual-images-for-face-attribute |
Repo | https://github.com/Juzov/FaceAttributeManipulation |
Framework | tf |
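The abstract above defines a residual image as the difference between the images before and after manipulation. The following is a minimal sketch of that definition on a toy grayscale image, not the authors' GAN: the helper names `residual_between` and `apply_residual`, the 0 to 255 pixel range, and the clipping step are all assumptions made for illustration.

```python
def residual_between(before, after):
    """Pixel-wise difference (after - before) between two equally sized images."""
    return [
        [a - b for b, a in zip(before_row, after_row)]
        for before_row, after_row in zip(before, after)
    ]


def apply_residual(image, residual, lo=0, hi=255):
    """Add a residual to an image pixel-wise and clip to the valid range."""
    return [
        [min(hi, max(lo, p + r)) for p, r in zip(image_row, residual_row)]
        for image_row, residual_row in zip(image, residual)
    ]


# Toy 2x3 grayscale images: only one pixel changes during the "manipulation".
before = [[10, 20, 30], [40, 50, 60]]
after = [[10, 20, 30], [40, 90, 60]]

residual = residual_between(before, after)
manipulated = apply_residual(before, residual)
```

In this toy case the residual is zero everywhere except at the changed pixel, which mirrors the abstract's point that only a modest number of pixels needs to be modified.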
Parallelizing Word2Vec in Multi-Core and Many-Core Architectures
Title | Parallelizing Word2Vec in Multi-Core and Many-Core Architectures |
Authors | Shihao Ji, Nadathur Satish, Sheng Li, Pradeep Dubey |
Abstract | Word2vec is a widely used algorithm for extracting low-dimensional vector representations of words. State-of-the-art algorithms including those by Mikolov et al. have been parallelized for multi-core CPU architectures, but are based on vector-vector operations with “Hogwild” updates that are memory-bandwidth intensive and do not efficiently use computational resources. In this paper, we propose “HogBatch” by improving reuse of various data structures in the algorithm through the use of minibatching and negative sample sharing, hence allowing us to express the problem using matrix multiply operations. We also explore different techniques to distribute word2vec computation across nodes in a compute cluster, and demonstrate good strong scalability up to 32 nodes. The new algorithm is particularly suitable for modern multi-core/many-core architectures, especially Intel’s latest Knights Landing processors, and allows us to scale up the computation near linearly across cores and nodes, and process hundreds of millions of words per second, which is the fastest word2vec implementation to the best of our knowledge. |
Tasks | |
Published | 2016-11-18 |
URL | http://arxiv.org/abs/1611.06172v2 |
PDF | http://arxiv.org/pdf/1611.06172v2.pdf |
PWC | https://paperswithcode.com/paper/parallelizing-word2vec-in-multi-core-and-many |
Repo | https://github.com/IntelLabs/pWord2Vec |
Framework | none |
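The abstract above says that minibatching and negative sample sharing let the problem be expressed with matrix multiply operations. The sketch below illustrates only that score computation in plain Python, under the assumption that a minibatch of input word vectors shares one target word and its negative samples. The toy vectors and the helper names `matmul` and `transpose` are hypothetical, and no gradient update is shown.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Hypothetical toy embeddings: a minibatch of 3 input words, dimension 2.
input_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# One target word plus 2 negative samples, shared by the whole minibatch.
output_vecs = [[1.0, 2.0], [0.5, -1.0], [-1.0, 0.0]]

# Per-pair approach: one vector-vector dot product at a time.
pairwise = [
    [sum(i * o for i, o in zip(iv, ov)) for ov in output_vecs]
    for iv in input_vecs
]

# Batched approach: the same scores from a single matrix-matrix product.
batched = matmul(input_vecs, transpose(output_vecs))
```

Both approaches produce the same 3 x 3 score table. The batched form reuses each vector across a whole row or column of that table, which is the kind of data reuse the abstract attributes to HogBatch.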
Style Imitation and Chord Invention in Polyphonic Music with Exponential Families
Title | Style Imitation and Chord Invention in Polyphonic Music with Exponential Families |
Authors | Gaëtan Hadjeres, Jason Sakellariou, François Pachet |
Abstract | Modeling polyphonic music is a particularly challenging task because of the intricate interplay between melody and harmony. A good model should satisfy three requirements: statistical accuracy (capturing faithfully the statistics of correlations at various ranges, horizontally and vertically), flexibility (coping with arbitrary user constraints), and generalization capacity (inventing new material, while staying in the style of the training corpus). Models proposed so far fail on at least one of these requirements. We propose a statistical model of polyphonic music, based on the maximum entropy principle. This model is able to learn and reproduce pairwise statistics between neighboring note events in a given corpus. The model is also able to invent new chords and to harmonize unknown melodies. We evaluate the invention capacity of the model by assessing the amount of cited, re-discovered, and invented chords on a corpus of Bach chorales. We discuss how the model enables the user to specify and enforce user-defined constraints, which makes it useful for style-based, interactive music generation. |
Tasks | Music Generation |
Published | 2016-09-16 |
URL | http://arxiv.org/abs/1609.05152v1 |
PDF | http://arxiv.org/pdf/1609.05152v1.pdf |
PWC | https://paperswithcode.com/paper/style-imitation-and-chord-invention-in |
Repo | https://github.com/kastnerkyle/pachet_experiments |
Framework | none |
Chains of Reasoning over Entities, Relations, and Text using Recurrent Neural Networks
Title | Chains of Reasoning over Entities, Relations, and Text using Recurrent Neural Networks |
Authors | Rajarshi Das, Arvind Neelakantan, David Belanger, Andrew McCallum |
Abstract | Our goal is to combine the rich multistep inference of symbolic logical reasoning with the generalization capabilities of neural networks. We are particularly interested in complex reasoning about entities and relations in text and large-scale knowledge bases (KBs). Neelakantan et al. (2015) use RNNs to compose the distributed semantics of multi-hop paths in KBs; however, for multiple reasons, the approach lacks accuracy and practicality. This paper proposes three significant modeling advances: (1) we learn to jointly reason about relations, entities, and entity-types; (2) we use neural attention modeling to incorporate multiple paths; (3) we learn to share strength in a single RNN that represents logical composition across all relations. On a large-scale Freebase+ClueWeb prediction task, we achieve 25% error reduction, and a 53% error reduction on sparse relations due to shared strength. On chains of reasoning in WordNet we reduce error in mean quantile by 84% versus previous state-of-the-art. The code and data are available at https://rajarshd.github.io/ChainsofReasoning |
Tasks | |
Published | 2016-07-05 |
URL | http://arxiv.org/abs/1607.01426v3 |
PDF | http://arxiv.org/pdf/1607.01426v3.pdf |
PWC | https://paperswithcode.com/paper/chains-of-reasoning-over-entities-relations |
Repo | https://github.com/rajarshd/ChainsofReasoning |
Framework | torch |
Compressed Learning: A Deep Neural Network Approach
Title | Compressed Learning: A Deep Neural Network Approach |
Authors | Amir Adler, Michael Elad, Michael Zibulevsky |
Abstract | Compressed Learning (CL) is a joint signal processing and machine learning framework for inference from a signal, using a small number of measurements obtained by linear projections of the signal. In this paper we present an end-to-end deep learning approach for CL, in which a network composed of fully-connected layers followed by convolutional layers performs the linear sensing and non-linear inference stages. During the training phase, the sensing matrix and the non-linear inference operator are jointly optimized, and the proposed approach outperforms state-of-the-art for the task of image classification. For example, at a sensing rate of 1% (only 8 measurements of 28 × 28 pixel images), the classification error for the MNIST handwritten digits dataset is 6.46% compared to 41.06% with state-of-the-art. |
Tasks | Image Classification |
Published | 2016-10-30 |
URL | http://arxiv.org/abs/1610.09615v1 |
PDF | http://arxiv.org/pdf/1610.09615v1.pdf |
PWC | https://paperswithcode.com/paper/compressed-learning-a-deep-neural-network |
Repo | https://github.com/viebboy/MultilinearCompressiveLearningFramework |
Framework | tf |
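The sensing-rate figure in the abstract above can be checked with a little arithmetic: a 28 × 28 image has 784 pixels, and 1% of 784 is 7.84, which rounds to the 8 measurements quoted. The sketch below works through that number and applies a linear projection y = Φx. Note the assumptions: rounding to the nearest integer is inferred from the quoted figure, the helper names are hypothetical, and the sensing matrix here is random, whereas the abstract states that the sensing matrix is learned jointly with the inference network.

```python
import random


def num_measurements(height, width, sensing_rate):
    """Number of linear measurements for an image at a given sensing rate."""
    return round(height * width * sensing_rate)


def sense(phi, signal):
    """Linear sensing y = phi @ signal, with phi stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, signal)) for row in phi]


rng = random.Random(0)
n_pixels = 28 * 28
m = num_measurements(28, 28, 0.01)

# Placeholder random sensing matrix (m x 784) and a random "image" vector.
phi = [[rng.gauss(0.0, 1.0) for _ in range(n_pixels)] for _ in range(m)]
image = [rng.random() for _ in range(n_pixels)]

measurements = sense(phi, image)
```

The classifier then has to work from the 8-element `measurements` vector instead of the 784 raw pixels.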
Enhanced LSTM for Natural Language Inference
Title | Enhanced LSTM for Natural Language Inference |
Authors | Qian Chen, Xiaodan Zhu, Zhenhua Ling, Si Wei, Hui Jiang, Diana Inkpen |
Abstract | Reasoning and inference are central to human and artificial intelligence. Modeling inference in human language is very challenging. With the availability of large annotated data (Bowman et al., 2015), it has recently become feasible to train neural network based inference models, which have been shown to be very effective. In this paper, we present a new state-of-the-art result, achieving the accuracy of 88.6% on the Stanford Natural Language Inference Dataset. Unlike the previous top models that use very complicated network architectures, we first demonstrate that carefully designing sequential inference models based on chain LSTMs can outperform all previous models. Based on this, we further show that by explicitly considering recursive architectures in both local inference modeling and inference composition, we achieve additional improvement. Particularly, incorporating syntactic parsing information contributes to our best result; it further improves the performance even when added to the already very strong model. |
Tasks | Natural Language Inference |
Published | 2016-09-20 |
URL | http://arxiv.org/abs/1609.06038v3 |
PDF | http://arxiv.org/pdf/1609.06038v3.pdf |
PWC | https://paperswithcode.com/paper/enhanced-lstm-for-natural-language-inference |
Repo | https://github.com/blcunlp/CNLI |
Framework | tf |
DeeperCut: A Deeper, Stronger, and Faster Multi-Person Pose Estimation Model
Title | DeeperCut: A Deeper, Stronger, and Faster Multi-Person Pose Estimation Model |
Authors | Eldar Insafutdinov, Leonid Pishchulin, Bjoern Andres, Mykhaylo Andriluka, Bernt Schiele |
Abstract | The goal of this paper is to advance the state-of-the-art of articulated pose estimation in scenes with multiple people. To that end we contribute on three fronts. We propose (1) improved body part detectors that generate effective bottom-up proposals for body parts; (2) novel image-conditioned pairwise terms that allow to assemble the proposals into a variable number of consistent body part configurations; and (3) an incremental optimization strategy that explores the search space more efficiently thus leading both to better performance and significant speed-up factors. Evaluation is done on two single-person and two multi-person pose estimation benchmarks. The proposed approach significantly outperforms best known multi-person pose estimation results while demonstrating competitive performance on the task of single person pose estimation. Models and code available at http://pose.mpi-inf.mpg.de |
Tasks | Multi-Person Pose Estimation, Pose Estimation |
Published | 2016-05-10 |
URL | http://arxiv.org/abs/1605.03170v3 |
PDF | http://arxiv.org/pdf/1605.03170v3.pdf |
PWC | https://paperswithcode.com/paper/deepercut-a-deeper-stronger-and-faster-multi |
Repo | https://github.com/toomanymat/pose_estimation |
Framework | tf |
Recurrent Neural Networks With Limited Numerical Precision
Title | Recurrent Neural Networks With Limited Numerical Precision |
Authors | Joachim Ott, Zhouhan Lin, Ying Zhang, Shih-Chii Liu, Yoshua Bengio |
Abstract | Recurrent Neural Networks (RNNs) produce state-of-the-art performance on many machine learning tasks, but their demand on resources in terms of memory and computational power is often high. Therefore, there is a great interest in optimizing the computations performed with these models, especially when considering development of specialized low-power hardware for deep networks. One way of reducing the computational needs is to limit the numerical precision of the network weights and biases. This has led to different proposed rounding methods which have been applied so far to only Convolutional Neural Networks and Fully-Connected Networks. This paper addresses the question of how to best reduce weight precision during training in the case of RNNs. We present results from the use of different stochastic and deterministic reduced precision training methods applied to three major RNN types which are then tested on several datasets. The results show that the weight binarization methods do not work with the RNNs. However, the stochastic and deterministic ternarization, and pow2-ternarization methods gave rise to low-precision RNNs that produce similar and even higher accuracy on certain datasets, therefore providing a path towards training more efficient implementations of RNNs in specialized hardware. |
Tasks | |
Published | 2016-08-24 |
URL | http://arxiv.org/abs/1608.06902v2 |
PDF | http://arxiv.org/pdf/1608.06902v2.pdf |
PWC | https://paperswithcode.com/paper/recurrent-neural-networks-with-limited-1 |
Repo | https://github.com/ottj/QuantizedRNN |
Framework | none |
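The abstract above mentions deterministic and stochastic ternarization without spelling out the rules. The sketch below shows one common formulation, offered as an assumption and not as the paper's exact method: weights are clipped to [-1, 1], the deterministic variant thresholds at 0.5, and the stochastic variant keeps the sign with probability |w| so that the expected value equals the clipped weight. All function names and the threshold are hypothetical.

```python
import random


def clip(w, lo=-1.0, hi=1.0):
    """Clamp a weight to the interval [lo, hi]."""
    return max(lo, min(hi, w))


def ternarize_deterministic(w, threshold=0.5):
    """Map a weight to -1, 0, or +1 using a fixed threshold (assumed 0.5)."""
    w = clip(w)
    if w > threshold:
        return 1
    if w < -threshold:
        return -1
    return 0


def ternarize_stochastic(w, rng):
    """Return sign(w) with probability |w|, else 0, so E[result] equals clip(w)."""
    w = clip(w)
    if rng.random() < abs(w):
        return 1 if w > 0 else -1
    return 0


rng = random.Random(0)
weights = [-0.9, -0.2, 0.0, 0.3, 0.8]

deterministic = [ternarize_deterministic(w) for w in weights]
stochastic = [ternarize_stochastic(w, rng) for w in weights]
```

With ternary weights, each multiplication in a matrix-vector product reduces to an addition, a subtraction, or a skip, which is the motivation for low-power hardware given in the abstract.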
Unsupervised Learning for Physical Interaction through Video Prediction
Title | Unsupervised Learning for Physical Interaction through Video Prediction |
Authors | Chelsea Finn, Ian Goodfellow, Sergey Levine |
Abstract | A core challenge for an agent learning to interact with the world is to predict how its actions affect objects in its environment. Many existing methods for learning the dynamics of physical interactions require labeled object information. However, to scale real-world interaction learning to a variety of scenes and objects, acquiring labeled data becomes increasingly impractical. To learn about physical object motion without labels, we develop an action-conditioned video prediction model that explicitly models pixel motion, by predicting a distribution over pixel motion from previous frames. Because our model explicitly predicts motion, it is partially invariant to object appearance, enabling it to generalize to previously unseen objects. To explore video prediction for real-world interactive agents, we also introduce a dataset of 59,000 robot interactions involving pushing motions, including a test set with novel objects. In this dataset, accurate prediction of videos conditioned on the robot’s future actions amounts to learning a “visual imagination” of different futures based on different courses of action. Our experiments show that our proposed method produces more accurate video predictions both quantitatively and qualitatively, when compared to prior methods. |
Tasks | Video Prediction |
Published | 2016-05-23 |
URL | http://arxiv.org/abs/1605.07157v4 |
PDF | http://arxiv.org/pdf/1605.07157v4.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-learning-for-physical |
Repo | https://github.com/tensorflow/models/tree/master/research/video_prediction |
Framework | tf |
TextProposals: a Text-specific Selective Search Algorithm for Word Spotting in the Wild
Title | TextProposals: a Text-specific Selective Search Algorithm for Word Spotting in the Wild |
Authors | Lluis Gomez-Bigorda, Dimosthenis Karatzas |
Abstract | Motivated by the success of powerful yet expensive techniques to recognize words in a holistic way, object proposals techniques emerge as an alternative to the traditional text detectors. In this paper we introduce a novel object proposals method that is specifically designed for text. We rely on a similarity based region grouping algorithm that generates a hierarchy of word hypotheses. Over the nodes of this hierarchy it is possible to apply a holistic word recognition method in an efficient way. Our experiments demonstrate that the presented method is superior in its ability to produce good quality word proposals when compared with class-independent algorithms. We show impressive recall rates with a few thousand proposals in different standard benchmarks, including focused or incidental text datasets, and multi-language scenarios. Moreover, the combination of our object proposals with existing whole-word recognizers shows competitive performance in end-to-end word spotting, and, in some benchmarks, outperforms previously published results. Concretely, in the challenging ICDAR2015 Incidental Text dataset, we outperform the best-performing method in the last ICDAR Robust Reading Competition by more than 10 percent in f-score. Source code of the complete end-to-end system is available at https://github.com/lluisgomez/TextProposals |
Tasks | |
Published | 2016-04-10 |
URL | http://arxiv.org/abs/1604.02619v3 |
PDF | http://arxiv.org/pdf/1604.02619v3.pdf |
PWC | https://paperswithcode.com/paper/textproposals-a-text-specific-selective |
Repo | https://github.com/lluisgomez/TextProposals |
Framework | none |
Deep Watershed Transform for Instance Segmentation
Title | Deep Watershed Transform for Instance Segmentation |
Authors | Min Bai, Raquel Urtasun |
Abstract | Most contemporary approaches to instance segmentation use complex pipelines involving conditional random fields, recurrent neural networks, object proposals, or template matching schemes. In our paper, we present a simple yet powerful end-to-end convolutional neural network to tackle this task. Our approach combines intuitions from the classical watershed transform and modern deep learning to produce an energy map of the image where object instances are unambiguously represented as basins in the energy map. We then perform a cut at a single energy level to directly yield connected components corresponding to object instances. Our model more than doubles the performance of the state-of-the-art on the challenging Cityscapes Instance Level Segmentation task. |
Tasks | Instance Segmentation, Semantic Segmentation |
Published | 2016-11-24 |
URL | http://arxiv.org/abs/1611.08303v2 |
PDF | http://arxiv.org/pdf/1611.08303v2.pdf |
PWC | https://paperswithcode.com/paper/deep-watershed-transform-for-instance |
Repo | https://github.com/timothyn617/watershed-transform |
Framework | pytorch |
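The final step described in the abstract above, cutting the energy map at a single level and reading off connected components, can be sketched without any deep learning. The code below assumes the energy map is already given as a small integer grid in which larger values lie deeper inside an object (flip the comparison for the opposite sign convention). It also assumes 4-connectivity. The function name `instances_from_energy` and the toy grid are hypothetical.

```python
from collections import deque


def instances_from_energy(energy, level):
    """Label 4-connected components of pixels whose energy is >= level.

    Returns (labels, count); label 0 marks pixels below the cut.
    """
    h, w = len(energy), len(energy[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if energy[r][c] < level or labels[r][c]:
                continue
            # Start a new instance and flood-fill it with breadth-first search.
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and energy[ny][nx] >= level and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Two touching objects: energy drops to 1 along their shared boundary column.
energy = [
    [0, 0, 0, 0, 0],
    [2, 2, 1, 2, 2],
    [2, 2, 1, 2, 2],
    [0, 0, 0, 0, 0],
]

labels_high, count_high = instances_from_energy(energy, level=2)
labels_low, count_low = instances_from_energy(energy, level=1)
```

Cutting at level 2 separates the two touching objects, while cutting at level 1 merges them into one component. This is why the choice of cut level matters when instances share a boundary.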
A Taxonomy and Library for Visualizing Learned Features in Convolutional Neural Networks
Title | A Taxonomy and Library for Visualizing Learned Features in Convolutional Neural Networks |
Authors | Felix Grün, Christian Rupprecht, Nassir Navab, Federico Tombari |
Abstract | Over the last decade, Convolutional Neural Networks (CNN) saw a tremendous surge in performance. However, understanding what a network has learned still proves to be a challenging task. To remedy this unsatisfactory situation, a number of groups have recently proposed different methods to visualize the learned models. In this work we suggest a general taxonomy to classify and compare these methods, subdividing the literature into three main categories and providing researchers with a terminology to base their works on. Furthermore, we introduce the FeatureVis library for MatConvNet: an extendable, easy to use open source library for visualizing CNNs. It contains implementations from each of the three main classes of visualization methods and serves as a useful tool for an enhanced understanding of the features learned by intermediate layers, as well as for the analysis of why a network might fail for certain examples. |
Tasks | |
Published | 2016-06-24 |
URL | http://arxiv.org/abs/1606.07757v1 |
PDF | http://arxiv.org/pdf/1606.07757v1.pdf |
PWC | https://paperswithcode.com/paper/a-taxonomy-and-library-for-visualizing |
Repo | https://github.com/FelixGruen/featurevis |
Framework | none |
A Unifying Framework for Gaussian Process Pseudo-Point Approximations using Power Expectation Propagation
Title | A Unifying Framework for Gaussian Process Pseudo-Point Approximations using Power Expectation Propagation |
Authors | Thang D. Bui, Josiah Yan, Richard E. Turner |
Abstract | Gaussian processes (GPs) are flexible distributions over functions that enable high-level assumptions about unknown functions to be encoded in a parsimonious, flexible and general way. Although elegant, the application of GPs is limited by computational and analytical intractabilities that arise when data are sufficiently numerous or when employing non-Gaussian models. Consequently, a wealth of GP approximation schemes have been developed over the last 15 years to address these key limitations. Many of these schemes employ a small set of pseudo data points to summarise the actual data. In this paper, we develop a new pseudo-point approximation framework using Power Expectation Propagation (Power EP) that unifies a large number of these pseudo-point approximations. Unlike much of the previous venerable work in this area, the new framework is built on standard methods for approximate inference (variational free-energy, EP and Power EP methods) rather than employing approximations to the probabilistic generative model itself. In this way, all of the approximation is performed at 'inference time' rather than at 'modelling time', resolving awkward philosophical and empirical questions that trouble previous approaches. Crucially, we demonstrate that the new framework includes new pseudo-point approximation methods that outperform current approaches on regression and classification tasks. |
Tasks | Gaussian Processes |
Published | 2016-05-23 |
URL | http://arxiv.org/abs/1605.07066v3 |
PDF | http://arxiv.org/pdf/1605.07066v3.pdf |
PWC | https://paperswithcode.com/paper/a-unifying-framework-for-gaussian-process |
Repo | https://github.com/thangbui/sparseGP_powerEP |
Framework | none |
How Document Pre-processing affects Keyphrase Extraction Performance
Title | How Document Pre-processing affects Keyphrase Extraction Performance |
Authors | Florian Boudin, Hugo Mougard, Damien Cram |
Abstract | The SemEval-2010 benchmark dataset has brought renewed attention to the task of automatic keyphrase extraction. This dataset is made up of scientific articles that were automatically converted from PDF format to plain text and thus require careful preprocessing so that irrelevant spans of text do not negatively affect keyphrase extraction performance. In previous work, a wide range of document preprocessing techniques were described but their impact on the overall performance of keyphrase extraction models is still unexplored. Here, we re-assess the performance of several keyphrase extraction models and measure their robustness against increasingly sophisticated levels of document preprocessing. |
Tasks | |
Published | 2016-10-25 |
URL | http://arxiv.org/abs/1610.07809v1 |
PDF | http://arxiv.org/pdf/1610.07809v1.pdf |
PWC | https://paperswithcode.com/paper/how-document-pre-processing-affects-keyphrase |
Repo | https://github.com/boudinfl/semeval-2010-pre |
Framework | none |
Times series averaging and denoising from a probabilistic perspective on time-elastic kernels
Title | Times series averaging and denoising from a probabilistic perspective on time-elastic kernels |
Authors | Pierre-François Marteau |
Abstract | In the light of regularized dynamic time warping kernels, this paper re-considers the concept of time elastic centroid for a set of time series. We derive a new algorithm based on a probabilistic interpretation of kernel alignment matrices. This algorithm expresses the averaging process in terms of a stochastic alignment automata. It uses an iterative agglomerative heuristic method for averaging the aligned samples, while also averaging the times of occurrence of the aligned samples. By comparing classification accuracies for 45 heterogeneous time series datasets obtained by first nearest centroid/medoid classifiers we show that: i) centroid-based approaches significantly outperform medoid-based approaches, ii) for the considered datasets, our algorithm that combines averaging in the sample space and along the time axes, emerges as the most significantly robust model for time-elastic averaging with a promising noise reduction capability. We also demonstrate its benefit in an isolated gesture recognition experiment and its ability to significantly reduce the size of training instance sets. Finally we highlight its denoising capability using demonstrative synthetic data: we show that it is possible to retrieve, from few noisy instances, a signal whose components are scattered in a wide spectral band. |
Tasks | Denoising, Gesture Recognition, Time Series, Time Series Denoising |
Published | 2016-11-28 |
URL | http://arxiv.org/abs/1611.09194v4 |
PDF | http://arxiv.org/pdf/1611.09194v4.pdf |
PWC | https://paperswithcode.com/paper/times-series-averaging-and-denoising-from-a |
Repo | https://github.com/pfmarteau/eKATS |
Framework | none |