May 7, 2019


Paper Group AWR 25

Tensor Switching Networks. Learning Representations for Automatic Colorization. Robust and Low-Rank Representation for Fast Face Identification with Occlusions. SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. Boosting Joint Models for Longitudinal and Time-to-Event Data. Interpreting Neural Networks to Improve Politeness Comprehe …

Tensor Switching Networks

Title Tensor Switching Networks
Authors Chuan-Yung Tsai, Andrew Saxe, David Cox
Abstract We present a novel neural network algorithm, the Tensor Switching (TS) network, which generalizes the Rectified Linear Unit (ReLU) nonlinearity to tensor-valued hidden units. The TS network copies its entire input vector to different locations in an expanded representation, with the location determined by its hidden unit activity. In this way, even a simple linear readout from the TS representation can implement a highly expressive deep-network-like function. The TS network hence avoids the vanishing gradient problem by construction, at the cost of larger representation size. We develop several methods to train the TS network, including equivalent kernels for infinitely wide and deep TS networks, a one-pass linear learning algorithm, and two backpropagation-inspired representation learning algorithms. Our experimental results demonstrate that the TS network is indeed more expressive and consistently learns faster than standard ReLU networks.
Tasks Representation Learning
Published 2016-10-31
URL http://arxiv.org/abs/1610.10087v1
PDF http://arxiv.org/pdf/1610.10087v1.pdf
PWC https://paperswithcode.com/paper/tensor-switching-networks
Repo https://github.com/coxlab/tsnet
Framework none
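
To make the expansion concrete, here is a minimal NumPy sketch of the single-layer Tensor Switching idea described in the abstract: the ReLU firing pattern decides where the whole input vector is copied in an expanded tensor, and a linear readout acts on the flattened result. The dimensions and random weights are illustrative placeholders, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def ts_expand(x, W):
    """Tensor Switching expansion (single hidden layer sketch).

    A standard ReLU layer keeps s * (W @ x), where s is the 0/1 firing
    pattern. The TS layer instead copies the *whole* input vector x into
    the slot of every active hidden unit, giving a (hidden x input) tensor.
    """
    s = (W @ x > 0).astype(x.dtype)      # 0/1 switching pattern
    return np.outer(s, x)                # each active row holds a copy of x

# toy dimensions: 8 hidden units, 5 input features
W = rng.standard_normal((8, 5))
x = rng.standard_normal(5)

Z = ts_expand(x, W)                      # expanded representation, shape (8, 5)

# even a simple linear readout on the flattened expansion can realise a
# deep-network-like function of the input, as the abstract describes
readout = rng.standard_normal(Z.size)
y = readout @ Z.ravel()
print(Z.shape, y)
```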

Learning Representations for Automatic Colorization

Title Learning Representations for Automatic Colorization
Authors Gustav Larsson, Michael Maire, Gregory Shakhnarovich
Abstract We develop a fully automatic image colorization system. Our approach leverages recent advances in deep networks, exploiting both low-level and semantic representations. As many scene elements naturally appear according to multimodal color distributions, we train our model to predict per-pixel color histograms. This intermediate output can be used to automatically generate a color image, or further manipulated prior to image formation. On both fully and partially automatic colorization tasks, we outperform existing methods. We also explore colorization as a vehicle for self-supervised visual representation learning.
Tasks Colorization, Representation Learning
Published 2016-03-22
URL http://arxiv.org/abs/1603.06668v3
PDF http://arxiv.org/pdf/1603.06668v3.pdf
PWC https://paperswithcode.com/paper/learning-representations-for-automatic
Repo https://github.com/gustavla/autocolorize
Framework tf
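
The abstract's intermediate output is a per-pixel color histogram; one simple way to turn such histograms into an image is to take the per-pixel expectation over bin centers, as in the hedged NumPy sketch below. The hue binning, the softmax stand-in for the network output, and the expectation decoding are illustrative assumptions, not necessarily the authors' exact decoding.

```python
import numpy as np

rng = np.random.default_rng(1)

H, W_, K = 4, 4, 32                       # image size and number of hue bins
bin_centers = (np.arange(K) + 0.5) / K    # hue bin centres in [0, 1]

# stand-in for the network output: per-pixel histograms over hue bins
logits = rng.standard_normal((H, W_, K))
hist = np.exp(logits - logits.max(-1, keepdims=True))
hist /= hist.sum(-1, keepdims=True)       # softmax -> per-pixel color histogram

# decode each histogram to a single hue value; the expectation is one simple
# choice (for a circular quantity like hue, a circular mean would be better),
# and the histogram also supports sampling or further manipulation
expected_hue = hist @ bin_centers         # shape (H, W_)
print(expected_hue.shape, expected_hue.min(), expected_hue.max())
```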

Robust and Low-Rank Representation for Fast Face Identification with Occlusions

Title Robust and Low-Rank Representation for Fast Face Identification with Occlusions
Authors Michael Iliadis, Haohong Wang, Rafael Molina, Aggelos K. Katsaggelos
Abstract In this paper we propose an iterative method to address the face identification problem with block occlusions. Our approach utilizes a robust representation based on two characteristics in order to model contiguous errors (e.g., block occlusion) effectively. The first fits a distribution described by a tailored loss function to the errors. The second describes the error image as having a specific structure (resulting in low rank in comparison to the image size). We show that this joint characterization is effective for describing errors with spatial continuity. Our approach is computationally efficient due to the use of the Alternating Direction Method of Multipliers (ADMM). A special case of our fast iterative algorithm leads to the robust representation method which is normally used to handle non-contiguous errors (e.g., pixel corruption). Extensive results on representative face databases (in constrained and unconstrained environments) document the effectiveness of our method over existing robust representation methods with respect to both identification rates and computational time. Code is available on GitHub, where you can find implementations of F-LR-IRNNLS and F-IRNNLS (a fast version of the RRC): https://github.com/miliadis/FIRC
Tasks Face Identification
Published 2016-05-08
URL http://arxiv.org/abs/1605.02266v2
PDF http://arxiv.org/pdf/1605.02266v2.pdf
PWC https://paperswithcode.com/paper/robust-and-low-rank-representation-for-fast
Repo https://github.com/miliadis/FIRC
Framework none
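
A key ingredient implied by the low-rank error characterization is the proximal operator of the nuclear norm (singular-value thresholding), which an ADMM solver would apply to the reshaped error image. The sketch below shows only that building block on a toy rank-1 "occlusion"; it is not the full F-LR-IRNNLS algorithm from the repository.

```python
import numpy as np

def svt(E, tau):
    """Singular-value thresholding: proximal operator of tau * nuclear norm.

    In ADMM-style solvers, a low-rank error term is updated by shrinking
    the singular values of the (reshaped) error image.
    """
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt

rng = np.random.default_rng(2)
# toy "error image": a rank-1 block occlusion plus small pixel noise
occlusion = np.outer(rng.random(48), rng.random(42))
E = occlusion + 0.01 * rng.standard_normal((48, 42))

E_lowrank = svt(E, tau=0.5)
print(np.linalg.matrix_rank(E_lowrank))   # far smaller than min(48, 42)
```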

SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient

Title SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient
Authors Lantao Yu, Weinan Zhang, Jun Wang, Yong Yu
Abstract As a new way of training generative models, Generative Adversarial Nets (GAN), which use a discriminative model to guide the training of the generative model, have enjoyed considerable success in generating real-valued data. However, they have limitations when the goal is to generate sequences of discrete tokens. A major reason is that the discrete outputs from the generative model make it difficult to pass the gradient update from the discriminative model to the generative model. Also, the discriminative model can only assess a complete sequence, while for a partially generated sequence it is non-trivial to balance its current score and the future one once the entire sequence has been generated. In this paper, we propose a sequence generation framework, called SeqGAN, to solve these problems. Modeling the data generator as a stochastic policy in reinforcement learning (RL), SeqGAN bypasses the generator differentiation problem by directly performing policy gradient updates. The RL reward signal comes from the GAN discriminator judging a complete sequence, and is passed back to the intermediate state-action steps using Monte Carlo search. Extensive experiments on synthetic data and real-world tasks demonstrate significant improvements over strong baselines.
Tasks Text Generation
Published 2016-09-18
URL http://arxiv.org/abs/1609.05473v6
PDF http://arxiv.org/pdf/1609.05473v6.pdf
PWC https://paperswithcode.com/paper/seqgan-sequence-generative-adversarial-nets
Repo https://github.com/L0SG/seqgan-music
Framework tf
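
The training signal described in the abstract can be illustrated with a toy REINFORCE loop: the discriminator scores only complete sequences, so the reward for each intermediate token is estimated by Monte Carlo rollouts that finish the sequence. Everything below (the shared-logit generator, the hand-written discriminator, the step size) is a stand-in for illustration, not the SeqGAN implementation.

```python
import numpy as np

rng = np.random.default_rng(3)
V, T, N_ROLL = 5, 6, 16                     # vocab size, sequence length, rollouts

logits = np.zeros(V)                        # toy per-step generator (shared logits)

def sample_token():
    p = np.exp(logits - logits.max()); p /= p.sum()
    return rng.choice(V, p=p), p

def discriminator(seq):
    # stand-in reward: fraction of tokens equal to 0 (the "real-looking" token)
    return float(np.mean(np.array(seq) == 0))

def rollout_value(prefix):
    # Monte Carlo search: complete the prefix N_ROLL times, average D's score
    scores = []
    for _ in range(N_ROLL):
        seq = list(prefix)
        while len(seq) < T:
            tok, _ = sample_token()
            seq.append(tok)
        scores.append(discriminator(seq))
    return np.mean(scores)

# one REINFORCE step: generate a sequence, credit each token with its rollout value
grad = np.zeros(V)
seq = []
for t in range(T):
    tok, p = sample_token()
    seq.append(tok)
    q = rollout_value(seq)                  # reward passed back to this step
    onehot = np.eye(V)[tok]
    grad += q * (onehot - p)                # d log pi(tok) / d logits, scaled by reward
logits += 0.5 * grad                        # gradient ascent on expected reward
print(seq, logits.round(3))
```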

Boosting Joint Models for Longitudinal and Time-to-Event Data

Title Boosting Joint Models for Longitudinal and Time-to-Event Data
Authors Elisabeth Waldmann, David Taylor-Robinson, Nadja Klein, Thomas Kneib, Tania Pressler, Matthias Schmid, Andreas Mayr
Abstract Joint models for longitudinal and time-to-event data have gained a lot of attention in the last few years, as they are a helpful technique for approaching a data structure common in clinical studies, where longitudinal outcomes are recorded alongside event times. The two processes are often linked, and the two outcomes should thus be modeled jointly in order to prevent the potential bias introduced by independent modelling. Commonly, joint models are estimated with likelihood-based expectation maximization or Bayesian approaches, using frameworks in which variable selection is problematic and which do not immediately work for high-dimensional data. In this paper, we propose a boosting algorithm that tackles these challenges by simultaneously estimating predictors for joint models and automatically selecting the most influential variables, even in high-dimensional data situations. We analyse the performance of the new algorithm in a simulation study and apply it to the Danish cystic fibrosis registry, which collects longitudinal lung function data on patients with cystic fibrosis together with data on the onset of pulmonary infections. This is the first approach to combine state-of-the-art algorithms from the field of machine learning with the model class of joint models, providing a fully data-driven mechanism to select variables and predictor effects in a unified framework of boosting joint models.
Tasks
Published 2016-09-09
URL http://arxiv.org/abs/1609.02686v2
PDF http://arxiv.org/pdf/1609.02686v2.pdf
PWC https://paperswithcode.com/paper/boosting-joint-models-for-longitudinal-and
Repo https://github.com/mayrandy/JMboost
Framework none
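
The reference implementation (JMboost) is an R package; as a rough illustration of the component-wise boosting principle the abstract relies on for variable selection, the Python sketch below fits every candidate base learner to the current residuals at each iteration and updates only the best one. It uses plain squared-error loss on a single outcome, so it is a simplification of, not a substitute for, the joint longitudinal/time-to-event model.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 200, 10
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.1 * rng.standard_normal(n)  # 2 informative covariates

coef = np.zeros(p)
nu = 0.1                                     # step length
for m in range(250):                         # boosting iterations (early stopping in practice)
    resid = y - X @ coef                     # negative gradient of squared-error loss
    # candidate base learners: simple least-squares fits, one per covariate
    betas = (X * resid[:, None]).sum(0) / (X ** 2).sum(0)
    losses = ((resid[:, None] - X * betas) ** 2).sum(0)
    j = int(np.argmin(losses))               # update only the best-fitting component
    coef[j] += nu * betas[j]

print(np.round(coef, 2))                     # mass concentrates on covariates 0 and 3
```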

Interpreting Neural Networks to Improve Politeness Comprehension

Title Interpreting Neural Networks to Improve Politeness Comprehension
Authors Malika Aubakirova, Mohit Bansal
Abstract We present an interpretable neural network approach to predicting and understanding politeness in natural language requests. Our models are based on simple convolutional neural networks directly on raw text, avoiding any manual identification of complex sentiment or syntactic features, while performing better than such feature-based models from previous work. More importantly, we use the challenging task of politeness prediction as a testbed to next present a much-needed understanding of what these successful networks are actually learning. For this, we present several network visualizations based on activation clusters, first derivative saliency, and embedding space transformations, helping us automatically identify several subtle linguistic markers of politeness theories. Further, this analysis reveals multiple novel, high-scoring politeness strategies which, when added back as new features, reduce the accuracy gap between the original featurized system and the neural model, thus providing a clear quantitative interpretation of the success of these neural networks.
Tasks
Published 2016-10-09
URL http://arxiv.org/abs/1610.02683v1
PDF http://arxiv.org/pdf/1610.02683v1.pdf
PWC https://paperswithcode.com/paper/interpreting-neural-networks-to-improve
Repo https://github.com/swkarlekar/summaries
Framework tf
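
One of the visualization tools named in the abstract, first-derivative saliency, scores each input word by the gradient of the prediction with respect to that word's embedding. The toy sketch below uses a tiny scorer so the gradient is analytic; in the paper the gradient is taken through the trained CNN, and the vocabulary and embeddings here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
d = 8
vocab = {"could": 0, "you": 1, "please": 2, "fix": 3, "this": 4}
E = rng.standard_normal((len(vocab), d))     # toy word embeddings
w = rng.standard_normal(d)                   # toy scorer parameters

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

sentence = ["could", "you", "please", "fix", "this"]
X = E[[vocab[t] for t in sentence]]          # (T, d) embedded request

z = X @ w                                    # per-word pre-activations
score = sigmoid(z).sum()                     # toy "politeness" score

# first-derivative saliency: magnitude of d score / d embedding per word;
# for this toy scorer the gradient at word t is sigmoid'(z_t) * w
saliency = sigmoid(z) * (1.0 - sigmoid(z)) * np.linalg.norm(w)
for tok, s in zip(sentence, saliency):
    print(f"{tok:>7s}  {s:.3f}")
print(f"score = {score:.3f}")
```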

Doubly Stochastic Neighbor Embedding on Spheres

Title Doubly Stochastic Neighbor Embedding on Spheres
Authors Yao Lu, Jukka Corander, Zhirong Yang
Abstract Stochastic Neighbor Embedding (SNE) methods minimize the divergence between the similarity matrix of a high-dimensional data set and its counterpart from a low-dimensional embedding, leading to widely applied tools for data visualization. Despite their popularity, the current SNE methods experience a crowding problem when the data include highly imbalanced similarities. This implies that the data points with higher total similarity tend to get crowded around the display center. To solve this problem, we introduce a fast normalization method and normalize the similarity matrix to be doubly stochastic such that all the data points have equal total similarities. Furthermore, we show empirically and theoretically that the double stochasticity constraint often leads to embeddings which are approximately spherical. This suggests replacing a flat space with spheres as the embedding space. The spherical embedding eliminates the discrepancy between the center and the periphery in visualization, which efficiently resolves the crowding problem. We compared the proposed method (DOSNES) with the state-of-the-art SNE method on three real-world datasets, and the results clearly indicate that our method is more favorable in terms of visualization quality.
Tasks
Published 2016-09-07
URL http://arxiv.org/abs/1609.01977v2
PDF http://arxiv.org/pdf/1609.01977v2.pdf
PWC https://paperswithcode.com/paper/doubly-stochastic-neighbor-embedding-on
Repo https://github.com/yaolubrain/DOSNES
Framework none
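
The central step in the abstract is normalizing the similarity matrix to be doubly stochastic. A standard way to approximate this is Sinkhorn-style alternating row and column normalization, sketched below on a toy Gaussian similarity matrix; the iteration count is arbitrary and the spherical embedding step of DOSNES is not shown.

```python
import numpy as np

def doubly_stochastic(S, n_iter=200):
    """Sinkhorn-Knopp style scaling: alternately normalize rows and columns
    until the matrix has (approximately) unit row and column sums."""
    P = S.copy()
    for _ in range(n_iter):
        P = P / P.sum(axis=1, keepdims=True)   # normalize rows
        P = P / P.sum(axis=0, keepdims=True)   # normalize columns
    return P

rng = np.random.default_rng(6)
X = rng.standard_normal((50, 3))
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
S = np.exp(-D2)                                # Gaussian similarities (imbalanced totals)

P = doubly_stochastic(S)
print(P.sum(axis=1).round(3)[:5], P.sum(axis=0).round(3)[:5])  # all ~1 after scaling
```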

Cross-lingual Models of Word Embeddings: An Empirical Comparison

Title Cross-lingual Models of Word Embeddings: An Empirical Comparison
Authors Shyam Upadhyay, Manaal Faruqui, Chris Dyer, Dan Roth
Abstract Despite interest in using cross-lingual knowledge to learn word embeddings for various tasks, a systematic comparison of the possible approaches is lacking in the literature. We perform an extensive evaluation of four popular approaches of inducing cross-lingual embeddings, each requiring a different form of supervision, on four typologically different language pairs. Our evaluation setup spans four different tasks, including intrinsic evaluation on monolingual and cross-lingual similarity, and extrinsic evaluation on downstream semantic and syntactic applications. We show that models which require expensive cross-lingual knowledge almost always perform better, but cheaply supervised models often prove competitive on certain tasks.
Tasks Word Embeddings
Published 2016-04-01
URL http://arxiv.org/abs/1604.00425v2
PDF http://arxiv.org/pdf/1604.00425v2.pdf
PWC https://paperswithcode.com/paper/cross-lingual-models-of-word-embeddings-an
Repo https://github.com/shyamupa/biling-survey
Framework none
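
As a hedged illustration of the intrinsic cross-lingual similarity evaluation mentioned in the abstract, the sketch below scores translation pairs by cosine similarity in a shared embedding space and correlates those scores with human ratings via Spearman's rho. The embeddings and "gold" ratings are random placeholders; real evaluations use benchmark datasets.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(7)
d, n_pairs = 50, 40

# toy cross-lingual embeddings living in a shared space
src = rng.standard_normal((n_pairs, d))
tgt = src + 0.3 * rng.standard_normal((n_pairs, d))   # noisy "translations"
gold = np.linspace(0, 10, n_pairs)                    # stand-in human similarity ratings

# cosine similarity per word pair, then rank correlation against the gold scores
cos = np.sum(src * tgt, 1) / (np.linalg.norm(src, axis=1) * np.linalg.norm(tgt, axis=1))
rho, _ = spearmanr(cos, gold)
print(f"Spearman correlation with gold ratings: {rho:.2f}")
```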

Deep Reinforcement Learning for Dialogue Generation

Title Deep Reinforcement Learning for Dialogue Generation
Authors Jiwei Li, Will Monroe, Alan Ritter, Michel Galley, Jianfeng Gao, Dan Jurafsky
Abstract Recent neural models of dialogue generation offer great promise for generating responses for conversational agents, but tend to be shortsighted, predicting utterances one at a time while ignoring their influence on future outcomes. Modeling the future direction of a dialogue is crucial to generating coherent, interesting dialogues, a need which led traditional NLP models of dialogue to draw on reinforcement learning. In this paper, we show how to integrate these goals, applying deep reinforcement learning to model future reward in chatbot dialogue. The model simulates dialogues between two virtual agents, using policy gradient methods to reward sequences that display three useful conversational properties: informativity (non-repetitive turns), coherence, and ease of answering (related to forward-looking function). We evaluate our model on diversity and length, as well as with human judges, showing that the proposed algorithm generates more interactive responses and manages to foster a more sustained conversation in dialogue simulation. This work marks a first step towards learning a neural conversational model based on the long-term success of dialogues.
Tasks Chatbot, Dialogue Generation, Policy Gradient Methods
Published 2016-06-05
URL http://arxiv.org/abs/1606.01541v4
PDF http://arxiv.org/pdf/1606.01541v4.pdf
PWC https://paperswithcode.com/paper/deep-reinforcement-learning-for-dialogue
Repo https://github.com/tfolkman/deep-learning-experiments
Framework none
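
The three conversational properties rewarded in the abstract (ease of answering, informativity/non-repetition, and coherence) can be sketched as simple scoring heuristics over sentence representations, as below. The cosine-based proxies, the stand-in log-likelihoods, and the mixing weights are illustrative assumptions; the paper defines these rewards via seq2seq likelihoods.

```python
import numpy as np

rng = np.random.default_rng(8)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def reward(prev_turn, response, dull_responses, log_p_forward, log_p_backward):
    """Combine the three conversational rewards named in the abstract.

    prev_turn, response, dull_responses: stand-in sentence embeddings.
    log_p_forward / log_p_backward: stand-in seq2seq log-likelihoods used
    as a coherence proxy.
    """
    # ease of answering: penalize similarity to a list of dull replies
    r_ease = -np.mean([cosine(response, d) for d in dull_responses])
    # information flow: penalize repeating the previous turn
    r_info = -cosine(response, prev_turn)
    # coherence: average of forward and backward log-likelihood
    r_coh = 0.5 * (log_p_forward + log_p_backward)
    return 0.25 * r_ease + 0.25 * r_info + 0.5 * r_coh   # illustrative weights

prev = rng.standard_normal(16)
resp = rng.standard_normal(16)
dull = [rng.standard_normal(16) for _ in range(3)]
print(round(reward(prev, resp, dull, log_p_forward=-2.1, log_p_backward=-2.4), 3))
```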

Reweighting with Boosted Decision Trees

Title Reweighting with Boosted Decision Trees
Authors A. Rogozhnikov
Abstract Machine learning tools are commonly used in modern high energy physics (HEP) experiments. Different models, such as boosted decision trees (BDT) and artificial neural networks (ANN), are widely used in analyses and even in the software triggers. In most cases, these are classification models used to select the “signal” events from data. Monte Carlo simulated events typically take part in the training of these models. While the results of the simulation are expected to be close to real data, in practical cases there is notable disagreement between simulated and observed data. In order to use the available simulation in training, corrections must be introduced to the generated data. One common approach is reweighting: assigning weights to the simulated events. We present a novel method of event reweighting based on boosted decision trees. The problem of checking the quality of the reweighting step in analyses is also discussed.
Tasks
Published 2016-08-20
URL http://arxiv.org/abs/1608.05806v1
PDF http://arxiv.org/pdf/1608.05806v1.pdf
PWC https://paperswithcode.com/paper/reweighting-with-boosted-decision-trees
Repo https://github.com/philippgadow/reweight_samples
Framework none
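
For context, the classical baseline that BDT reweighting improves on assigns each simulated event the ratio of target to simulated counts in its histogram bin; the sketch below shows that binned scheme on one toy feature. The paper's method effectively replaces the fixed bins with regions chosen by boosted decision trees (implemented in the author's hep_ml package, not reproduced here).

```python
import numpy as np

rng = np.random.default_rng(9)
sim = rng.normal(0.0, 1.2, 20000)            # Monte Carlo simulated feature
data = rng.normal(0.3, 1.0, 20000)           # observed ("real") feature

bins = np.linspace(-5, 5, 41)
sim_counts, _ = np.histogram(sim, bins)
data_counts, _ = np.histogram(data, bins)

# per-bin weight = target density / simulated density (guarding empty bins)
ratio = np.where(sim_counts > 0, data_counts / np.maximum(sim_counts, 1), 1.0)
weights = ratio[np.clip(np.digitize(sim, bins) - 1, 0, len(ratio) - 1)]

# the reweighted simulation matches the data mean much more closely
print(sim.mean(), np.average(sim, weights=weights), data.mean())
```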

Fuzzy Bayesian Learning

Title Fuzzy Bayesian Learning
Authors Indranil Pan, Dirk Bester
Abstract In this paper we propose a novel approach for learning from data using rule based fuzzy inference systems where the model parameters are estimated using Bayesian inference and Markov Chain Monte Carlo (MCMC) techniques. We show the applicability of the method for regression and classification tasks using synthetic data-sets and also a real world example in the financial services industry. Then we demonstrate how the method can be extended for knowledge extraction to select the individual rules in a Bayesian way which best explains the given data. Finally we discuss the advantages and pitfalls of using this method over state-of-the-art techniques and highlight the specific class of problems where this would be useful.
Tasks Bayesian Inference
Published 2016-10-28
URL http://arxiv.org/abs/1610.09156v2
PDF http://arxiv.org/pdf/1610.09156v2.pdf
PWC https://paperswithcode.com/paper/fuzzy-bayesian-learning
Repo https://github.com/SciemusGithub/FBL
Framework none
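
A minimal sketch of the idea in the abstract: treat the parameters of a small rule-based fuzzy inference system as random variables and sample their posterior with random-walk Metropolis-Hastings. The two-rule zero-order Takagi-Sugeno system, the priors, the fixed noise level, and the proposal scale below are all illustrative choices, not the configuration used in the paper.

```python
import numpy as np

rng = np.random.default_rng(10)

# toy regression data from a smooth step
x = np.linspace(-3, 3, 80)
y = np.tanh(x) + 0.1 * rng.standard_normal(x.size)

def fis(x, theta):
    """Two-rule zero-order Takagi-Sugeno system.
    theta = (c1, c2, sigma, o1, o2): Gaussian membership centres/width
    and the two rule outputs."""
    c1, c2, sigma, o1, o2 = theta
    m1 = np.exp(-0.5 * ((x - c1) / sigma) ** 2)
    m2 = np.exp(-0.5 * ((x - c2) / sigma) ** 2)
    return (m1 * o1 + m2 * o2) / (m1 + m2 + 1e-12)

def log_post(theta, noise=0.1):
    resid = y - fis(x, theta)
    loglik = -0.5 * np.sum((resid / noise) ** 2)
    logprior = -0.5 * np.sum(np.asarray(theta) ** 2 / 10.0)   # weak Gaussian prior
    return loglik + logprior

# random-walk Metropolis-Hastings over the rule parameters
theta = np.array([-1.0, 1.0, 1.0, -0.5, 0.5])
lp = log_post(theta)
samples = []
for _ in range(5000):
    prop = theta + 0.05 * rng.standard_normal(theta.size)
    lp_prop = log_post(prop)
    if np.log(rng.random()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    samples.append(theta.copy())

print(np.mean(samples[2000:], axis=0).round(2))   # posterior means of rule parameters
```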

Object Contour Detection with a Fully Convolutional Encoder-Decoder Network

Title Object Contour Detection with a Fully Convolutional Encoder-Decoder Network
Authors Jimei Yang, Brian Price, Scott Cohen, Honglak Lee, Ming-Hsuan Yang
Abstract We develop a deep learning algorithm for contour detection with a fully convolutional encoder-decoder network. Different from previous low-level edge detection, our algorithm focuses on detecting higher-level object contours. Our network is trained end-to-end on PASCAL VOC with refined ground truth from inaccurate polygon annotations, yielding much higher precision in object contour detection than previous methods. We find that the learned model generalizes well to unseen object classes from the same super-categories on MS COCO and can match state-of-the-art edge detection on BSDS500 with fine-tuning. By combining with the multiscale combinatorial grouping algorithm, our method can generate high-quality segmented object proposals, which significantly advance the state-of-the-art on PASCAL VOC (improving average recall from 0.62 to 0.67) with a relatively small amount of candidates ($\sim$1660 per image).
Tasks Contour Detection, Edge Detection
Published 2016-03-15
URL http://arxiv.org/abs/1603.04530v1
PDF http://arxiv.org/pdf/1603.04530v1.pdf
PWC https://paperswithcode.com/paper/object-contour-detection-with-a-fully
Repo https://github.com/Raj-08/tensorflow-object-contour-detection
Framework tf
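
The overall architecture is a fully convolutional encoder-decoder mapping an image to a per-pixel contour map. The PyTorch sketch below keeps only that shape with a drastically reduced network; the authors' encoder is initialized from a much larger pretrained classification network and trained with refined PASCAL VOC ground truth, none of which appears here.

```python
import torch
import torch.nn as nn

class TinyContourNet(nn.Module):
    """Drastically reduced encoder-decoder sketch for per-pixel contour logits."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(16, 1, 2, stride=2),   # one contour logit per pixel
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

net = TinyContourNet()
img = torch.randn(1, 3, 64, 64)
logits = net(img)                                    # (1, 1, 64, 64)
loss = nn.functional.binary_cross_entropy_with_logits(logits, torch.zeros_like(logits))
print(logits.shape, float(loss))
```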

Time for a change: a tutorial for comparing multiple classifiers through Bayesian analysis

Title Time for a change: a tutorial for comparing multiple classifiers through Bayesian analysis
Authors Alessio Benavoli, Giorgio Corani, Janez Demsar, Marco Zaffalon
Abstract The machine learning community adopted the use of null hypothesis significance testing (NHST) in order to ensure the statistical validity of results. Many scientific fields, however, realized the shortcomings of frequentist reasoning, and in the most radical cases even banned its use in publications. We should do the same: just as we have embraced the Bayesian paradigm in the development of new machine learning methods, so we should also use it in the analysis of our own results. We argue for abandonment of NHST by exposing its fallacies and, more importantly, offer better - more sound and useful - alternatives to it.
Tasks
Published 2016-06-14
URL http://arxiv.org/abs/1606.04316v3
PDF http://arxiv.org/pdf/1606.04316v3.pdf
PWC https://paperswithcode.com/paper/time-for-a-change-a-tutorial-for-comparing
Repo https://github.com/BayesianTestsML/tutorial
Framework none
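
One of the alternatives the authors advocate is a Bayesian counterpart of the correlated t-test for cross-validated scores, which yields posterior probabilities that one classifier is practically better, practically equivalent, or practically worse than another. The sketch below follows the commonly cited formulation with the correlation heuristic rho = test-set fraction and an arbitrary region of practical equivalence; consult the tutorial and repository for the authors' exact procedures.

```python
import numpy as np
from scipy import stats

def bayesian_correlated_ttest(diff, rho, rope=0.01):
    """Posterior over the mean score difference of two classifiers
    evaluated on the same cross-validation folds.

    diff: per-fold score differences (classifier A minus B)
    rho:  correlation heuristic, typically n_test / (n_test + n_train)
    rope: region of practical equivalence on the difference scale
    """
    diff = np.asarray(diff, dtype=float)
    n = diff.size
    loc = diff.mean()
    scale = np.sqrt((1.0 / n + rho / (1.0 - rho)) * diff.var(ddof=1))
    post = stats.t(df=n - 1, loc=loc, scale=scale)
    p_left = post.cdf(-rope)                   # P(B practically better)
    p_rope = post.cdf(rope) - post.cdf(-rope)  # P(practically equivalent)
    p_right = 1.0 - post.cdf(rope)             # P(A practically better)
    return p_left, p_rope, p_right

# toy example: 10-fold CV accuracy differences, rho = 1/10
diff = [0.02, 0.01, 0.03, 0.00, 0.02, 0.01, 0.04, 0.02, 0.01, 0.03]
print([round(p, 3) for p in bayesian_correlated_ttest(diff, rho=0.1)])
```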

Learning to Fuse 2D and 3D Image Cues for Monocular Body Pose Estimation

Title Learning to Fuse 2D and 3D Image Cues for Monocular Body Pose Estimation
Authors Bugra Tekin, Pablo Márquez-Neila, Mathieu Salzmann, Pascal Fua
Abstract Most recent approaches to monocular 3D human pose estimation rely on Deep Learning. They typically involve regressing from an image to either 3D joint coordinates directly or 2D joint locations from which 3D coordinates are inferred. Both approaches have their strengths and weaknesses and we therefore propose a novel architecture designed to deliver the best of both worlds by performing both simultaneously and fusing the information along the way. At the heart of our framework is a trainable fusion scheme that learns how to fuse the information optimally instead of being hand-designed. This yields significant improvements upon the state-of-the-art on standard 3D human pose estimation benchmarks.
Tasks 3D Human Pose Estimation, Pose Estimation
Published 2016-11-17
URL http://arxiv.org/abs/1611.05708v3
PDF http://arxiv.org/pdf/1611.05708v3.pdf
PWC https://paperswithcode.com/paper/learning-to-fuse-2d-and-3d-image-cues-for
Repo https://github.com/romanus/code_tekinetal_iccv17
Framework none
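
The abstract's main point is a trainable fusion scheme rather than a hand-designed one; the PyTorch sketch below illustrates the flavor of such a scheme with a learned per-channel convex combination of two feature streams before a 3D joint regressor. The stream dimensions, joint count, and fusion form are placeholders, not the architecture of the paper.

```python
import torch
import torch.nn as nn

class TrainableFusion(nn.Module):
    """Toy stand-in for a trainable fusion scheme: image features and
    2D-pose features are mixed with learned per-channel weights before
    regressing 3D joint coordinates."""
    def __init__(self, dim=128, n_joints=17):
        super().__init__()
        self.n_joints = n_joints
        self.alpha = nn.Parameter(torch.zeros(dim))       # learned fusion weights
        self.head = nn.Linear(dim, 3 * n_joints)          # 3D coordinate regressor

    def forward(self, feat_img, feat_2d):
        w = torch.sigmoid(self.alpha)                     # per-channel weight in (0, 1)
        fused = w * feat_img + (1.0 - w) * feat_2d        # learned convex combination
        return self.head(fused).view(-1, self.n_joints, 3)

net = TrainableFusion()
out = net(torch.randn(2, 128), torch.randn(2, 128))
print(out.shape)                                          # torch.Size([2, 17, 3])
```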

Cryptocurrency Portfolio Management with Deep Reinforcement Learning

Title Cryptocurrency Portfolio Management with Deep Reinforcement Learning
Authors Zhengyao Jiang, Jinjun Liang
Abstract Portfolio management is the decision-making process of allocating an amount of funds into different financial investment products. Cryptocurrencies are electronic and decentralized alternatives to government-issued money, with Bitcoin as the best-known example of a cryptocurrency. This paper presents a model-less convolutional neural network with historic prices of a set of financial assets as its input, outputting portfolio weights of the set. The network is trained with 0.7 years’ price data from a cryptocurrency exchange. The training is done in a reinforcement manner, maximizing the accumulative return, which is regarded as the reward function of the network. Back-test trading experiments with a trading period of 30 minutes are conducted in the same market, achieving 10-fold returns over a period of 1.8 months. Some recently published portfolio selection strategies are also used to perform the same back-tests, and their results are compared with the neural network’s. The network is not limited to cryptocurrencies, but can be applied to any other financial market.
Tasks Decision Making
Published 2016-12-05
URL http://arxiv.org/abs/1612.01277v5
PDF http://arxiv.org/pdf/1612.01277v5.pdf
PWC https://paperswithcode.com/paper/cryptocurrency-portfolio-management-with-deep
Repo https://github.com/edwardwardward/crypto_ml
Framework tf
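
The reward maximized during training is the accumulated return of the portfolio implied by the network's weight outputs; the NumPy sketch below computes that quantity from per-period price relatives and weight vectors. Both the price series and the weights are random placeholders standing in for exchange data and network outputs.

```python
import numpy as np

rng = np.random.default_rng(11)
T, n_assets = 100, 5                         # 30-minute periods and coins

# price relatives y_t = p_t / p_{t-1} for each asset (random-walk stand-in)
price_relatives = np.exp(0.001 * rng.standard_normal((T, n_assets)))

# portfolio weights the network would output each period (here: random, summing to 1)
w = rng.random((T, n_assets))
w /= w.sum(axis=1, keepdims=True)

# accumulated return used as the reward function:
# each period's portfolio return is the weighted sum of price relatives
period_returns = (w * price_relatives).sum(axis=1)
reward = np.sum(np.log(period_returns))      # log of the accumulative return
print(f"final portfolio value multiple: {np.exp(reward):.4f}")
```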