Paper Group AWR 25
Tensor Switching Networks. Learning Representations for Automatic Colorization. Robust and Low-Rank Representation for Fast Face Identification with Occlusions. SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. Boosting Joint Models for Longitudinal and Time-to-Event Data. Interpreting Neural Networks to Improve Politeness Comprehension. Doubly Stochastic Neighbor Embedding on Spheres. Cross-lingual Models of Word Embeddings: An Empirical Comparison. Deep Reinforcement Learning for Dialogue Generation. Reweighting with Boosted Decision Trees. Fuzzy Bayesian Learning. Object Contour Detection with a Fully Convolutional Encoder-Decoder Network. Time for a change: a tutorial for comparing multiple classifiers through Bayesian analysis. Learning to Fuse 2D and 3D Image Cues for Monocular Body Pose Estimation. Cryptocurrency Portfolio Management with Deep Reinforcement Learning.
Tensor Switching Networks
Title | Tensor Switching Networks |
Authors | Chuan-Yung Tsai, Andrew Saxe, David Cox |
Abstract | We present a novel neural network algorithm, the Tensor Switching (TS) network, which generalizes the Rectified Linear Unit (ReLU) nonlinearity to tensor-valued hidden units. The TS network copies its entire input vector to different locations in an expanded representation, with the location determined by its hidden unit activity. In this way, even a simple linear readout from the TS representation can implement a highly expressive deep-network-like function. The TS network hence avoids the vanishing gradient problem by construction, at the cost of larger representation size. We develop several methods to train the TS network, including equivalent kernels for infinitely wide and deep TS networks, a one-pass linear learning algorithm, and two backpropagation-inspired representation learning algorithms. Our experimental results demonstrate that the TS network is indeed more expressive and consistently learns faster than standard ReLU networks. |
Tasks | Representation Learning |
Published | 2016-10-31 |
URL | http://arxiv.org/abs/1610.10087v1 |
http://arxiv.org/pdf/1610.10087v1.pdf | |
PWC | https://paperswithcode.com/paper/tensor-switching-networks |
Repo | https://github.com/coxlab/tsnet |
Framework | none |
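The switching mechanism is easy to state in code. Below is a minimal numpy sketch of the TS expansion as the abstract describes it: each hidden unit either copies the entire input vector into its slot or emits zeros, depending on the sign of its pre-activation. All sizes and names are illustrative, not taken from the authors' tsnet code.

```python
import numpy as np

rng = np.random.default_rng(0)

def ts_expand(x, W):
    """Tensor Switching expansion (sketch): each hidden unit either
    copies the full input vector x into its slot or emits zeros,
    depending on the sign of its pre-activation w_i . x."""
    gates = (W @ x > 0).astype(x.dtype)   # (hidden,) 0/1 switching pattern
    return gates[:, None] * x[None, :]    # (hidden, input) tensor of copies

# Toy usage: a simple linear readout on the flattened TS representation
# can realize a deep-network-like function of x.
x = rng.standard_normal(8)
W = rng.standard_normal((16, 8))          # hidden weights only set the gates
Z = ts_expand(x, W)                       # (16, 8) expanded representation
readout = rng.standard_normal(Z.size)     # e.g. fit by one-pass linear learning
y = readout @ Z.ravel()
```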
Learning Representations for Automatic Colorization
Title | Learning Representations for Automatic Colorization |
Authors | Gustav Larsson, Michael Maire, Gregory Shakhnarovich |
Abstract | We develop a fully automatic image colorization system. Our approach leverages recent advances in deep networks, exploiting both low-level and semantic representations. As many scene elements naturally appear according to multimodal color distributions, we train our model to predict per-pixel color histograms. This intermediate output can be used to automatically generate a color image, or further manipulated prior to image formation. On both fully and partially automatic colorization tasks, we outperform existing methods. We also explore colorization as a vehicle for self-supervised visual representation learning. |
Tasks | Colorization, Representation Learning |
Published | 2016-03-22 |
URL | http://arxiv.org/abs/1603.06668v3 |
http://arxiv.org/pdf/1603.06668v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-representations-for-automatic |
Repo | https://github.com/gustavla/autocolorize |
Framework | tf |
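Since the model's intermediate output is a per-pixel color histogram rather than a single color, a decoding step has to turn each histogram into a point estimate before image formation. The sketch below shows two standard choices (expectation and mode), assuming a softmax output over K bins per channel; the bin layout is illustrative, not the paper's.

```python
import numpy as np

def decode_histogram(hist, bin_centers):
    """Decode predicted per-pixel color histograms into point estimates.
    hist: (H, W, K) softmax over K color bins for one channel;
    bin_centers: (K,) representative value of each bin.
    The expectation gives smooth colors; the mode keeps multimodality."""
    expectation = hist @ bin_centers              # (H, W) per-pixel mean color
    mode = bin_centers[hist.argmax(axis=-1)]      # (H, W) most likely bin
    return expectation, mode

# Toy usage with K = 32 hue bins (illustrative values only).
H, W, K = 4, 4, 32
logits = np.random.randn(H, W, K)
hist = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
centers = np.linspace(0.0, 1.0, K)
mean_hue, mode_hue = decode_histogram(hist, centers)
```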
Robust and Low-Rank Representation for Fast Face Identification with Occlusions
Title | Robust and Low-Rank Representation for Fast Face Identification with Occlusions |
Authors | Michael Iliadis, Haohong Wang, Rafael Molina, Aggelos K. Katsaggelos |
Abstract | In this paper we propose an iterative method to address the face identification problem with block occlusions. Our approach utilizes a robust representation based on two characteristics in order to model contiguous errors (e.g., block occlusion) effectively. The first fits a distribution described by a tailored loss function to the errors. The second describes the error image as having a specific structure (resulting in low rank relative to the image size). We show that this joint characterization is effective for describing errors with spatial continuity. Our approach is computationally efficient due to its use of the Alternating Direction Method of Multipliers (ADMM). A special case of our fast iterative algorithm reduces to the robust representation method normally used to handle non-contiguous errors (e.g., pixel corruption). Extensive results on representative face databases (in constrained and unconstrained environments) document the effectiveness of our method over existing robust representation methods with respect to both identification rates and computational time. Code is available on GitHub, including implementations of F-LR-IRNNLS and F-IRNNLS (a fast version of the RRC): https://github.com/miliadis/FIRC |
Tasks | Face Identification |
Published | 2016-05-08 |
URL | http://arxiv.org/abs/1605.02266v2 |
http://arxiv.org/pdf/1605.02266v2.pdf | |
PWC | https://paperswithcode.com/paper/robust-and-low-rank-representation-for-fast |
Repo | https://github.com/miliadis/FIRC |
Framework | none |
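The low-rank half of the error model is handled inside ADMM by a nuclear-norm proximal step. Below is a hedged numpy sketch of that single ingredient, singular value thresholding, applied to a toy block-occlusion error image; it is one subproblem, not the full F-LR-IRNNLS algorithm.

```python
import numpy as np

def svt(E, tau):
    """Singular value thresholding: the proximal operator of the nuclear
    norm, the ADMM subproblem that pushes the error image E toward low rank."""
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

# Toy usage: a contiguous block occlusion yields a (near) low-rank error image.
img = np.random.rand(32, 32)
occluded = img.copy()
occluded[8:20, 8:20] = 1.0      # block occlusion
E = occluded - img              # error image with spatial structure
E_lr = svt(E, tau=0.5)          # shrink toward its dominant structure
```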
SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient
Title | SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient |
Authors | Lantao Yu, Weinan Zhang, Jun Wang, Yong Yu |
Abstract | As a new way of training generative models, the Generative Adversarial Net (GAN), which uses a discriminative model to guide the training of the generative model, has enjoyed considerable success in generating real-valued data. However, it has limitations when the goal is to generate sequences of discrete tokens. A major reason is that the discrete outputs from the generative model make it difficult to pass the gradient update from the discriminative model to the generative model. Also, the discriminative model can only assess a complete sequence; for a partially generated sequence, it is non-trivial to balance its current score against its future score once the entire sequence has been generated. In this paper, we propose a sequence generation framework, called SeqGAN, to solve these problems. Modeling the data generator as a stochastic policy in reinforcement learning (RL), SeqGAN bypasses the generator differentiation problem by directly performing the policy gradient update. The RL reward signal comes from the GAN discriminator judging a complete sequence, and is passed back to the intermediate state-action steps using Monte Carlo search. Extensive experiments on synthetic data and real-world tasks demonstrate significant improvements over strong baselines. |
Tasks | Text Generation |
Published | 2016-09-18 |
URL | http://arxiv.org/abs/1609.05473v6 |
http://arxiv.org/pdf/1609.05473v6.pdf | |
PWC | https://paperswithcode.com/paper/seqgan-sequence-generative-adversarial-nets |
Repo | https://github.com/L0SG/seqgan-music |
Framework | tf |
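The Monte Carlo search that carries the discriminator's end-of-sequence reward back to intermediate steps can be sketched compactly: estimate the reward of a prefix by completing it several times under the current policy and averaging the discriminator scores. In the snippet below, `rollout_policy` and `discriminator` are stand-ins, not the paper's models.

```python
import numpy as np

def mc_reward(prefix, rollout_policy, discriminator, seq_len, n_rollouts=16):
    """Estimate the reward of a partial sequence (sketch): complete it
    n_rollouts times with the current policy and average the GAN
    discriminator's scores on the finished sequences."""
    scores = []
    for _ in range(n_rollouts):
        seq = list(prefix)
        while len(seq) < seq_len:
            seq.append(rollout_policy(seq))   # sample the next token
        scores.append(discriminator(seq))     # prob. the sequence is real
    return float(np.mean(scores))

# Toy usage with stand-in policy and discriminator.
vocab = 10
policy = lambda seq: np.random.randint(vocab)
disc = lambda seq: float(np.mean(seq)) / vocab   # placeholder scorer
r = mc_reward(prefix=[3, 1], rollout_policy=policy,
              discriminator=disc, seq_len=8)
```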
Boosting Joint Models for Longitudinal and Time-to-Event Data
Title | Boosting Joint Models for Longitudinal and Time-to-Event Data |
Authors | Elisabeth Waldmann, David Taylor-Robinson, Nadja Klein, Thomas Kneib, Tania Pressler, Matthias Schmid, Andreas Mayr |
Abstract | Joint models for longitudinal and time-to-event data have gained a lot of attention in recent years, as they are a helpful technique for approaching a common data structure in clinical studies where longitudinal outcomes are recorded alongside event times. The two processes are often linked, and the outcomes should thus be modeled jointly to prevent the potential bias introduced by independent modeling. Commonly, joint models are estimated with likelihood-based expectation maximization or Bayesian approaches using frameworks in which variable selection is problematic and which do not immediately work for high-dimensional data. In this paper, we propose a boosting algorithm that tackles these challenges: it simultaneously estimates predictors for joint models and automatically selects the most influential variables, even in high-dimensional data situations. We analyse the performance of the new algorithm in a simulation study and apply it to the Danish cystic fibrosis registry, which collects longitudinal lung function data on patients with cystic fibrosis together with data regarding the onset of pulmonary infections. This is the first approach to combine state-of-the-art algorithms from the field of machine learning with the model class of joint models, providing a fully data-driven mechanism to select variables and predictor effects in a unified framework of boosting joint models. |
Tasks | |
Published | 2016-09-09 |
URL | http://arxiv.org/abs/1609.02686v2 |
http://arxiv.org/pdf/1609.02686v2.pdf | |
PWC | https://paperswithcode.com/paper/boosting-joint-models-for-longitudinal-and |
Repo | https://github.com/mayrandy/JMboost |
Framework | none |
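The variable-selection behavior comes from the component-wise nature of the boosting updates: each iteration fits every candidate base learner to the current gradient but updates only the best one. The sketch below shows that mechanism for a plain L2 loss; the actual JMboost algorithm boosts the joint longitudinal/survival likelihood instead.

```python
import numpy as np

def componentwise_boost(X, y, n_iter=100, nu=0.1):
    """Component-wise L2 boosting (sketch): per iteration, fit every
    covariate's least-squares base learner to the current residuals but
    update only the single best one, giving implicit variable selection."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        resid = y - X @ beta                            # negative L2 gradient
        fits = (X * resid[:, None]).sum(0) / (X ** 2).sum(0)
        sse = ((resid[:, None] - X * fits) ** 2).sum(0)
        j = sse.argmin()                                # best-fitting component
        beta[j] += nu * fits[j]                         # small step on it alone
    return beta

# Toy usage: only features 0 and 5 of 20 are informative.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 20))
y = 2 * X[:, 0] - X[:, 5] + 0.1 * rng.standard_normal(200)
print(componentwise_boost(X, y).round(2))   # mass concentrates on 0 and 5
```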
Interpreting Neural Networks to Improve Politeness Comprehension
Title | Interpreting Neural Networks to Improve Politeness Comprehension |
Authors | Malika Aubakirova, Mohit Bansal |
Abstract | We present an interpretable neural network approach to predicting and understanding politeness in natural language requests. Our models are based on simple convolutional neural networks applied directly to raw text, avoiding any manual identification of complex sentiment or syntactic features, while performing better than such feature-based models from previous work. More importantly, we use the challenging task of politeness prediction as a testbed to present a much-needed understanding of what these successful networks are actually learning. For this, we present several network visualizations based on activation clusters, first-derivative saliency, and embedding space transformations, which help us automatically identify several subtle linguistic markers from politeness theories. Further, this analysis reveals multiple novel, high-scoring politeness strategies which, when added back as new features, reduce the accuracy gap between the original featurized system and the neural model, thus providing a clear quantitative interpretation of the success of these neural networks. |
Tasks | |
Published | 2016-10-09 |
URL | http://arxiv.org/abs/1610.02683v1 |
http://arxiv.org/pdf/1610.02683v1.pdf | |
PWC | https://paperswithcode.com/paper/interpreting-neural-networks-to-improve |
Repo | https://github.com/swkarlekar/summaries |
Framework | tf |
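First-derivative saliency, one of the visualizations used here, scores each word by the gradient of the prediction with respect to its embedding. The sketch below computes the analogous gradient-times-input quantity for a linear bag-of-embeddings scorer, where the gradient is available in closed form; for the paper's CNN it would come from a backward pass instead. All values are illustrative.

```python
import numpy as np

def saliency_scores(E, w):
    """Gradient-based word saliency (sketch) for a linear bag-of-embeddings
    scorer s = w . mean(E): the gradient w.r.t. word i is w / n, so the
    gradient-times-input score (w . e_i) / n measures how much each word
    pushes the prediction."""
    n = E.shape[0]
    return E @ (w / n)    # per-word contribution to the politeness score

# Toy usage: 5 words with 16-dim embeddings (illustrative values only).
rng = np.random.default_rng(2)
E = rng.standard_normal((5, 16))   # one embedding per word
w = rng.standard_normal(16)        # trained readout weights
print(saliency_scores(E, w))       # large |score| = salient word
```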
Doubly Stochastic Neighbor Embedding on Spheres
Title | Doubly Stochastic Neighbor Embedding on Spheres |
Authors | Yao Lu, Jukka Corander, Zhirong Yang |
Abstract | Stochastic Neighbor Embedding (SNE) methods minimize the divergence between the similarity matrix of a high-dimensional data set and its counterpart from a low-dimensional embedding, leading to widely applied tools for data visualization. Despite their popularity, the current SNE methods experience a crowding problem when the data include highly imbalanced similarities: data points with higher total similarity tend to get crowded around the display center. To solve this problem, we introduce a fast normalization method and normalize the similarity matrix to be doubly stochastic, such that all data points have equal total similarities. Furthermore, we show empirically and theoretically that the doubly stochastic constraint often leads to embeddings which are approximately spherical. This suggests replacing the flat space with a sphere as the embedding space. The spherical embedding eliminates the discrepancy between the center and the periphery of the visualization, which efficiently resolves the crowding problem. We compared the proposed method (DOSNES) with the state-of-the-art SNE method on three real-world datasets, and the results clearly indicate that our method is more favorable in terms of visualization quality. |
Tasks | |
Published | 2016-09-07 |
URL | http://arxiv.org/abs/1609.01977v2 |
http://arxiv.org/pdf/1609.01977v2.pdf | |
PWC | https://paperswithcode.com/paper/doubly-stochastic-neighbor-embedding-on |
Repo | https://github.com/yaolubrain/DOSNES |
Framework | none |
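The doubly stochastic normalization can be sketched as Sinkhorn-style alternating row/column rescaling (the paper proposes its own fast normalization; this is the generic variant), followed by projecting the embedding onto the unit sphere:

```python
import numpy as np

def doubly_stochastic(S, n_iter=100):
    """Sinkhorn-style normalization (sketch): alternately rescale rows and
    columns of a nonnegative similarity matrix until every data point has
    (approximately) equal total similarity."""
    P = S.copy()
    for _ in range(n_iter):
        P /= P.sum(axis=1, keepdims=True)   # rows sum to 1 ...
        P /= P.sum(axis=0, keepdims=True)   # ... then columns
    return (P + P.T) / 2                    # restore symmetry

def to_sphere(Y):
    """Project embedding points onto the unit sphere, the display space
    suggested by the doubly stochastic constraint."""
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)

# Toy usage on a random symmetric affinity matrix.
S = np.abs(np.random.rand(50, 50)); S = (S + S.T) / 2
P = doubly_stochastic(S)
print(P.sum(axis=1)[:3], P.sum(axis=0)[:3])   # near-equal total similarities
```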
Cross-lingual Models of Word Embeddings: An Empirical Comparison
Title | Cross-lingual Models of Word Embeddings: An Empirical Comparison |
Authors | Shyam Upadhyay, Manaal Faruqui, Chris Dyer, Dan Roth |
Abstract | Despite interest in using cross-lingual knowledge to learn word embeddings for various tasks, a systematic comparison of the possible approaches is lacking in the literature. We perform an extensive evaluation of four popular approaches for inducing cross-lingual embeddings, each requiring a different form of supervision, on four typologically different language pairs. Our evaluation setup spans four different tasks, including intrinsic evaluation on monolingual and cross-lingual similarity, and extrinsic evaluation on downstream semantic and syntactic applications. We show that models which require expensive cross-lingual knowledge almost always perform better, but cheaply supervised models often prove competitive on certain tasks. |
Tasks | Word Embeddings |
Published | 2016-04-01 |
URL | http://arxiv.org/abs/1604.00425v2 |
http://arxiv.org/pdf/1604.00425v2.pdf | |
PWC | https://paperswithcode.com/paper/cross-lingual-models-of-word-embeddings-an |
Repo | https://github.com/shyamupa/biling-survey |
Framework | none |
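As a rough illustration of the intrinsic cross-lingual evaluation, the sketch below scores embedded word pairs from two languages by cosine similarity and rank-correlates the scores with human judgments; the tiny vocabularies and similarity values are entirely made up.

```python
import numpy as np
from scipy.stats import spearmanr

def cross_lingual_similarity_eval(emb_src, emb_tgt, pairs, human_scores):
    """Intrinsic cross-lingual evaluation (sketch): cosine similarity of
    embedded word pairs drawn from two languages, rank-correlated with
    human similarity judgments. emb_* map word -> vector."""
    sims = []
    for ws, wt in pairs:
        u, v = emb_src[ws], emb_tgt[wt]
        sims.append(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    return spearmanr(sims, human_scores)[0]

# Toy usage with made-up two-word vocabularies.
emb_en = {"dog": np.array([1.0, 0.1]), "car": np.array([0.0, 1.0])}
emb_de = {"Hund": np.array([0.9, 0.2]), "Auto": np.array([0.1, 0.9])}
pairs = [("dog", "Hund"), ("dog", "Auto"), ("car", "Auto"), ("car", "Hund")]
rho = cross_lingual_similarity_eval(emb_en, emb_de, pairs,
                                    human_scores=[9.5, 1.0, 9.0, 0.5])
```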
Deep Reinforcement Learning for Dialogue Generation
Title | Deep Reinforcement Learning for Dialogue Generation |
Authors | Jiwei Li, Will Monroe, Alan Ritter, Michel Galley, Jianfeng Gao, Dan Jurafsky |
Abstract | Recent neural models of dialogue generation offer great promise for generating responses for conversational agents, but tend to be shortsighted, predicting utterances one at a time while ignoring their influence on future outcomes. Modeling the future direction of a dialogue is crucial to generating coherent, interesting dialogues, a need that led traditional NLP models of dialogue to draw on reinforcement learning. In this paper, we show how to integrate these goals, applying deep reinforcement learning to model future reward in chatbot dialogue. The model simulates dialogues between two virtual agents, using policy gradient methods to reward sequences that display three useful conversational properties: informativity (non-repetitive turns), coherence, and ease of answering (related to forward-looking function). We evaluate our model on diversity and length, as well as with human judges, showing that the proposed algorithm generates more interactive responses and manages to foster a more sustained conversation in dialogue simulation. This work marks a first step towards learning a neural conversational model based on the long-term success of dialogues. |
Tasks | Chatbot, Dialogue Generation, Policy Gradient Methods |
Published | 2016-06-05 |
URL | http://arxiv.org/abs/1606.01541v4 |
http://arxiv.org/pdf/1606.01541v4.pdf | |
PWC | https://paperswithcode.com/paper/deep-reinforcement-learning-for-dialogue |
Repo | https://github.com/tfolkman/deep-learning-experiments |
Framework | none |
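The three conversational properties enter as terms of a single scalar reward for the policy gradient. A hedged sketch of such a composite reward is below; the weighting, the clamping, and the placeholder log-probabilities are assumptions, since the exact reward shaping belongs to the paper.

```python
import numpy as np

def dialogue_reward(turn_vec, prev_vec, logp_dull, logp_fwd, logp_bwd,
                    lambdas=(0.25, 0.25, 0.5)):
    """Composite reward (sketch) mirroring the paper's three terms:
    ease of answering (avoid turns that invite dull replies), information
    flow (avoid echoing the previous turn), and coherence (forward plus
    backward likelihood as a mutual-information proxy)."""
    r_ease = -logp_dull                         # dull continuations are bad
    cos = turn_vec @ prev_vec / (np.linalg.norm(turn_vec)
                                 * np.linalg.norm(prev_vec))
    r_flow = -np.log(max(cos, 1e-8))            # penalize repetition
    r_coh = logp_fwd + logp_bwd
    l1, l2, l3 = lambdas
    return l1 * r_ease + l2 * r_flow + l3 * r_coh

# Toy usage with random turn encodings and placeholder log-probabilities.
rng = np.random.default_rng(3)
r = dialogue_reward(rng.standard_normal(8), rng.standard_normal(8),
                    logp_dull=-0.7, logp_fwd=-2.1, logp_bwd=-2.4)
```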
Reweighting with Boosted Decision Trees
Title | Reweighting with Boosted Decision Trees |
Authors | A. Rogozhnikov |
Abstract | Machine learning tools are commonly used in modern high energy physics (HEP) experiments. Different models, such as boosted decision trees (BDT) and artificial neural networks (ANN), are widely used in analyses and even in software triggers. In most cases, these are classification models used to select the “signal” events from data. Monte Carlo simulated events typically take part in the training of these models. While the results of the simulation are expected to be close to real data, in practice there is notable disagreement between simulated and observed data. In order to use the available simulation in training, corrections must be introduced to the generated data. One common approach is reweighting: assigning weights to the simulated events. We present a novel method of event reweighting based on boosted decision trees. The problem of checking the quality of the reweighting step in analyses is also discussed. |
Tasks | |
Published | 2016-08-20 |
URL | http://arxiv.org/abs/1608.05806v1 |
http://arxiv.org/pdf/1608.05806v1.pdf | |
PWC | https://paperswithcode.com/paper/reweighting-with-boosted-decision-trees |
Repo | https://github.com/philippgadow/reweight_samples |
Framework | none |
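The method is implemented in the author's hep_ml package as GBReweighter. A minimal usage sketch, assuming hep_ml is installed and using toy Gaussian stand-ins for the simulated and observed samples:

```python
import numpy as np
from hep_ml.reweight import GBReweighter   # the author's hep_ml package

# Toy stand-ins: "original" plays the Monte Carlo sample, "target" the data.
rng = np.random.default_rng(4)
original = rng.normal(0.0, 1.2, size=(10000, 2))   # simulated features
target = rng.normal(0.3, 1.0, size=(10000, 2))     # observed features

# Fit a BDT reweighter so the weighted simulation matches the data.
reweighter = GBReweighter(n_estimators=50, learning_rate=0.1, max_depth=3)
reweighter.fit(original, target)
weights = reweighter.predict_weights(original)      # per-event weights
```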
Fuzzy Bayesian Learning
Title | Fuzzy Bayesian Learning |
Authors | Indranil Pan, Dirk Bester |
Abstract | In this paper we propose a novel approach for learning from data using rule-based fuzzy inference systems, where the model parameters are estimated using Bayesian inference and Markov Chain Monte Carlo (MCMC) techniques. We show the applicability of the method for regression and classification tasks using synthetic datasets and also a real-world example from the financial services industry. We then demonstrate how the method can be extended for knowledge extraction, selecting in a Bayesian way the individual rules that best explain the given data. Finally, we discuss the advantages and pitfalls of using this method over state-of-the-art techniques and highlight the specific class of problems where it would be useful. |
Tasks | Bayesian Inference |
Published | 2016-10-28 |
URL | http://arxiv.org/abs/1610.09156v2 |
http://arxiv.org/pdf/1610.09156v2.pdf | |
PWC | https://paperswithcode.com/paper/fuzzy-bayesian-learning |
Repo | https://github.com/SciemusGithub/FBL |
Framework | none |
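A hedged sketch of the core loop: a small Takagi-Sugeno-style fuzzy system whose rule parameters are sampled with random-walk Metropolis, one simple member of the MCMC family the paper draws on. Rule counts, priors, and step sizes are all illustrative assumptions.

```python
import numpy as np

def fuzzy_predict(x, centers, widths, consequents):
    """Zero-order Takagi-Sugeno fuzzy system (sketch): Gaussian rule
    memberships; output = membership-weighted mean of rule consequents."""
    mu = np.exp(-0.5 * ((x[:, None] - centers) / widths) ** 2)  # (n, rules)
    return (mu * consequents).sum(1) / mu.sum(1)

def metropolis_step(theta, log_post, scale=0.05):
    """One random-walk Metropolis update of the stacked fuzzy parameters."""
    prop = theta + scale * np.random.randn(theta.size)
    if np.log(np.random.rand()) < log_post(prop) - log_post(theta):
        return prop
    return theta

# Toy usage: 3 rules on 1-D data; Gaussian likelihood, flat prior.
x = np.linspace(-2, 2, 50)
y = np.sin(x) + 0.1 * np.random.randn(50)
def log_post(t):
    c, w, q = t[:3], np.abs(t[3:6]) + 1e-3, t[6:]
    return -0.5 * ((y - fuzzy_predict(x, c, w, q)) ** 2).sum() / 0.1 ** 2
theta = np.array([-1.0, 0.0, 1.0, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0])
for _ in range(2000):
    theta = metropolis_step(theta, log_post)
```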
Object Contour Detection with a Fully Convolutional Encoder-Decoder Network
Title | Object Contour Detection with a Fully Convolutional Encoder-Decoder Network |
Authors | Jimei Yang, Brian Price, Scott Cohen, Honglak Lee, Ming-Hsuan Yang |
Abstract | We develop a deep learning algorithm for contour detection with a fully convolutional encoder-decoder network. Different from previous low-level edge detection, our algorithm focuses on detecting higher-level object contours. Our network is trained end-to-end on PASCAL VOC with refined ground truth from inaccurate polygon annotations, yielding much higher precision in object contour detection than previous methods. We find that the learned model generalizes well to unseen object classes from the same super-categories on MS COCO and can match state-of-the-art edge detection on BSDS500 with fine-tuning. By combining with the multiscale combinatorial grouping algorithm, our method can generate high-quality segmented object proposals, which significantly advance the state-of-the-art on PASCAL VOC (improving average recall from 0.62 to 0.67) with a relatively small amount of candidates ($\sim$1660 per image). |
Tasks | Contour Detection, Edge Detection |
Published | 2016-03-15 |
URL | http://arxiv.org/abs/1603.04530v1 |
http://arxiv.org/pdf/1603.04530v1.pdf | |
PWC | https://paperswithcode.com/paper/object-contour-detection-with-a-fully |
Repo | https://github.com/Raj-08/tensorflow-object-contour-detection |
Framework | tf |
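A minimal fully convolutional encoder-decoder with the same input/output contract (RGB image in, per-pixel contour probability out) can be written in a few lines of Keras. This sketch only mirrors the shape of the architecture; the paper's network is much deeper and is trained on refined PASCAL VOC annotations.

```python
import tensorflow as tf
from tensorflow.keras import layers

def contour_net(size=224):
    """Minimal fully convolutional encoder-decoder (sketch): strided convs
    downsample, transposed convs upsample, and a 1x1 conv with sigmoid
    emits a per-pixel contour probability map."""
    inp = tf.keras.Input((size, size, 3))
    x = inp
    for f in (32, 64, 128):                      # encoder
        x = layers.Conv2D(f, 3, strides=2, padding="same",
                          activation="relu")(x)
    for f in (64, 32, 16):                       # decoder
        x = layers.Conv2DTranspose(f, 3, strides=2, padding="same",
                                   activation="relu")(x)
    out = layers.Conv2D(1, 1, activation="sigmoid")(x)
    return tf.keras.Model(inp, out)

model = contour_net()
model.compile(optimizer="adam", loss="binary_crossentropy")
```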
Time for a change: a tutorial for comparing multiple classifiers through Bayesian analysis
Title | Time for a change: a tutorial for comparing multiple classifiers through Bayesian analysis |
Authors | Alessio Benavoli, Giorgio Corani, Janez Demsar, Marco Zaffalon |
Abstract | The machine learning community adopted the use of null hypothesis significance testing (NHST) in order to ensure the statistical validity of results. Many scientific fields, however, have realized the shortcomings of frequentist reasoning, and in the most radical cases have even banned its use in publications. We should do the same: just as we have embraced the Bayesian paradigm in the development of new machine learning methods, so we should also use it in the analysis of our own results. We argue for the abandonment of NHST by exposing its fallacies and, more importantly, offer better (more sound and useful) alternatives to it. |
Tasks | |
Published | 2016-06-14 |
URL | http://arxiv.org/abs/1606.04316v3 |
http://arxiv.org/pdf/1606.04316v3.pdf | |
PWC | https://paperswithcode.com/paper/time-for-a-change-a-tutorial-for-comparing |
Repo | https://github.com/BayesianTestsML/tutorial |
Framework | none |
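One of the tutorial's main tools is the Bayesian correlated t-test for cross-validation results, whose posterior is a Student-t with variance inflated by the correlation rho = n_test / (n_test + n_train) between overlapping folds. A sketch, with the ROPE width and the toy fold differences as placeholders:

```python
import numpy as np
from scipy.stats import t as student_t

def correlated_bayesian_ttest(diffs, rho, rope=0.01):
    """Bayesian correlated t-test (sketch): diffs are per-fold score
    differences from k-fold cross-validation; rho = n_test / (n_test +
    n_train) accounts for overlapping training sets. Returns the posterior
    probabilities P(left), P(rope), P(right) for the mean difference."""
    n = len(diffs)
    m, s2 = diffs.mean(), diffs.var(ddof=1)
    scale = np.sqrt((1.0 / n + rho / (1.0 - rho)) * s2)
    post = student_t(df=n - 1, loc=m, scale=scale)
    p_left = post.cdf(-rope)
    p_rope = post.cdf(rope) - p_left
    return p_left, p_rope, 1.0 - p_left - p_rope

# Toy usage: 10-fold CV accuracy differences, 90/10 split -> rho = 0.1.
d = np.array([0.02, 0.01, 0.03, 0.00, 0.02, 0.01, 0.02, 0.04, 0.01, 0.02])
print(correlated_bayesian_ttest(d, rho=0.1))
```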
Learning to Fuse 2D and 3D Image Cues for Monocular Body Pose Estimation
Title | Learning to Fuse 2D and 3D Image Cues for Monocular Body Pose Estimation |
Authors | Bugra Tekin, Pablo Márquez-Neila, Mathieu Salzmann, Pascal Fua |
Abstract | Most recent approaches to monocular 3D human pose estimation rely on Deep Learning. They typically involve regressing from an image to either 3D joint coordinates directly or 2D joint locations from which 3D coordinates are inferred. Both approaches have their strengths and weaknesses and we therefore propose a novel architecture designed to deliver the best of both worlds by performing both simultaneously and fusing the information along the way. At the heart of our framework is a trainable fusion scheme that learns how to fuse the information optimally instead of being hand-designed. This yields significant improvements upon the state-of-the-art on standard 3D human pose estimation benchmarks. |
Tasks | 3D Human Pose Estimation, Pose Estimation |
Published | 2016-11-17 |
URL | http://arxiv.org/abs/1611.05708v3 |
http://arxiv.org/pdf/1611.05708v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-fuse-2d-and-3d-image-cues-for |
Repo | https://github.com/romanus/code_tekinetal_iccv17 |
Framework | none |
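The trainable fusion scheme, reduced to its simplest possible form, is a learned blend of the two streams' features. The sketch below assumes per-channel sigmoid mixing weights; the paper learns the fusion jointly with both streams rather than as a fixed blend.

```python
import numpy as np

def trainable_fusion(f2d, f3d, alpha_logits):
    """Learned fusion (sketch): blend features from the 2D-landmark stream
    and the direct 3D stream with trainable per-channel weights instead of
    a hand-designed combination; sigmoid keeps each weight in (0, 1)."""
    alpha = 1.0 / (1.0 + np.exp(-alpha_logits))   # (C,) mixing weights
    return alpha * f2d + (1.0 - alpha) * f3d      # fused feature per channel

# Toy usage: 64-channel features from each stream; zero logits = even blend.
rng = np.random.default_rng(5)
fused = trainable_fusion(rng.standard_normal(64), rng.standard_normal(64),
                         alpha_logits=np.zeros(64))
```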
Cryptocurrency Portfolio Management with Deep Reinforcement Learning
Title | Cryptocurrency Portfolio Management with Deep Reinforcement Learning |
Authors | Zhengyao Jiang, Jinjun Liang |
Abstract | Portfolio management is the decision-making process of allocating an amount of funds across different financial investment products. Cryptocurrencies are electronic and decentralized alternatives to government-issued money, with Bitcoin as the best-known example. This paper presents a model-less convolutional neural network that takes the historic prices of a set of financial assets as its input and outputs portfolio weights for the set. The network is trained on 0.7 years of price data from a cryptocurrency exchange. The training is done in a reinforcement manner, maximizing the accumulative return, which is regarded as the reward function of the network. Back-test trading experiments with a trading period of 30 minutes are conducted in the same market, achieving 10-fold returns over 1.8 months. Some recently published portfolio selection strategies are used to perform the same back-tests, and their results are compared with those of the neural network. The network is not limited to cryptocurrencies and can be applied to any other financial market. |
Tasks | Decision Making |
Published | 2016-12-05 |
URL | http://arxiv.org/abs/1612.01277v5 |
http://arxiv.org/pdf/1612.01277v5.pdf | |
PWC | https://paperswithcode.com/paper/cryptocurrency-portfolio-management-with-deep |
Repo | https://github.com/edwardwardward/crypto_ml |
Framework | tf |
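The reward being maximized is the accumulative return of the portfolio over an episode. The sketch below computes it from softmax portfolio weights and per-period price relatives, ignoring transaction costs; shapes and values are illustrative.

```python
import numpy as np

def episode_return(weight_logits, price_relatives):
    """Accumulative-return reward (sketch): softmax turns the network's
    outputs into portfolio weights; the reward is the log of the product
    of per-period portfolio returns, ignoring transaction costs.
    price_relatives[t, i] = price_t / price_{t-1} for asset i."""
    w = np.exp(weight_logits)
    w /= w.sum(axis=1, keepdims=True)             # (T, assets) weights
    period_returns = (w * price_relatives).sum(axis=1)
    return np.log(period_returns).sum()           # quantity to maximize

# Toy usage: 48 thirty-minute periods, 4 assets (illustrative price moves).
rng = np.random.default_rng(6)
logits = rng.standard_normal((48, 4))
rel = 1.0 + 0.01 * rng.standard_normal((48, 4))
print(episode_return(logits, rel))
```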