Paper Group AWR 25
Tensor Switching Networks. Learning Representations for Automatic Colorization. Robust and Low-Rank Representation for Fast Face Identification with Occlusions. SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. Boosting Joint Models for Longitudinal and Time-to-Event Data. Interpreting Neural Networks to Improve Politeness Comprehension. Doubly Stochastic Neighbor Embedding on Spheres. Cross-lingual Models of Word Embeddings: An Empirical Comparison. Deep Reinforcement Learning for Dialogue Generation. Reweighting with Boosted Decision Trees. Fuzzy Bayesian Learning. Object Contour Detection with a Fully Convolutional Encoder-Decoder Network. Time for a change: a tutorial for comparing multiple classifiers through Bayesian analysis. Learning to Fuse 2D and 3D Image Cues for Monocular Body Pose Estimation. Cryptocurrency Portfolio Management with Deep Reinforcement Learning.
Tensor Switching Networks
Title | Tensor Switching Networks |
Authors | Chuan-Yung Tsai, Andrew Saxe, David Cox |
Abstract | We present a novel neural network algorithm, the Tensor Switching (TS) network, which generalizes the Rectified Linear Unit (ReLU) nonlinearity to tensor-valued hidden units. The TS network copies its entire input vector to different locations in an expanded representation, with the location determined by its hidden unit activity. In this way, even a simple linear readout from the TS representation can implement a highly expressive deep-network-like function. The TS network hence avoids the vanishing gradient problem by construction, at the cost of larger representation size. We develop several methods to train the TS network, including equivalent kernels for infinitely wide and deep TS networks, a one-pass linear learning algorithm, and two backpropagation-inspired representation learning algorithms. Our experimental results demonstrate that the TS network is indeed more expressive and consistently learns faster than standard ReLU networks. |
Tasks | Representation Learning |
Published | 2016-10-31 |
URL | http://arxiv.org/abs/1610.10087v1 |
http://arxiv.org/pdf/1610.10087v1.pdf | |
PWC | https://paperswithcode.com/paper/tensor-switching-networks |
Repo | https://github.com/coxlab/tsnet |
Framework | none |
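The switching mechanism is easy to state in code. Below is a minimal numpy sketch of the TS expansion as the abstract describes it: each hidden unit either copies the entire input vector into its slot or emits zeros, depending on the sign of its pre-activation. All sizes and names are illustrative, not taken from the authors' tsnet code.

```python
import numpy as np

rng = np.random.default_rng(0)

def ts_expand(x, W):
    """Tensor Switching expansion (sketch): each hidden unit either
    copies the full input vector x into its slot or emits zeros,
    depending on the sign of its pre-activation w_i . x."""
    gates = (W @ x > 0).astype(x.dtype)   # (hidden,) 0/1 switching pattern
    return gates[:, None] * x[None, :]    # (hidden, input) tensor of copies

# Toy usage: a simple linear readout on the flattened TS representation
# can realize a deep-network-like function of x.
x = rng.standard_normal(8)
W = rng.standard_normal((16, 8))          # hidden weights only set the gates
Z = ts_expand(x, W)                       # (16, 8) expanded representation
readout = rng.standard_normal(Z.size)     # e.g. fit by one-pass linear learning
y = readout @ Z.ravel()
```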
Learning Representations for Automatic Colorization
Title | Learning Representations for Automatic Colorization |
Authors | Gustav Larsson, Michael Maire, Gregory Shakhnarovich |
Abstract | We develop a fully automatic image colorization system. Our approach leverages recent advances in deep networks, exploiting both low-level and semantic representations. As many scene elements naturally appear according to multimodal color distributions, we train our model to predict per-pixel color histograms. This intermediate output can be used to automatically generate a color image, or further manipulated prior to image formation. On both fully and partially automatic colorization tasks, we outperform existing methods. We also explore colorization as a vehicle for self-supervised visual representation learning. |
Tasks | Colorization, Representation Learning |
Published | 2016-03-22 |
URL | http://arxiv.org/abs/1603.06668v3 |
http://arxiv.org/pdf/1603.06668v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-representations-for-automatic |
Repo | https://github.com/gustavla/autocolorize |
Framework | tf |
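Since the model's intermediate output is a per-pixel color histogram rather than a single color, a decoding step has to turn each histogram into a point estimate before image formation. The sketch below shows two standard choices (expectation and mode), assuming a softmax output over K bins per channel; the bin layout is illustrative, not the paper's.

```python
import numpy as np

def decode_histogram(hist, bin_centers):
    """Decode predicted per-pixel color histograms into point estimates.
    hist: (H, W, K) softmax over K color bins for one channel;
    bin_centers: (K,) representative value of each bin.
    The expectation gives smooth colors; the mode keeps multimodality."""
    expectation = hist @ bin_centers              # (H, W) per-pixel mean color
    mode = bin_centers[hist.argmax(axis=-1)]      # (H, W) most likely bin
    return expectation, mode

# Toy usage with K = 32 hue bins (illustrative values only).
H, W, K = 4, 4, 32
logits = np.random.randn(H, W, K)
hist = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
centers = np.linspace(0.0, 1.0, K)
mean_hue, mode_hue = decode_histogram(hist, centers)
```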
Robust and Low-Rank Representation for Fast Face Identification with Occlusions
Title | Robust and Low-Rank Representation for Fast Face Identification with Occlusions |
Authors | Michael Iliadis, Haohong Wang, Rafael Molina, Aggelos K. Katsaggelos |
Abstract | In this paper we propose an iterative method to address the face identification problem with block occlusions. Our approach utilizes a robust representation based on two characteristics in order to model contiguous errors (e.g., block occlusion) effectively. The first fits a distribution described by a tailored loss function to the errors. The second describes the error image as having a specific structure (resulting in low rank relative to the image size). We show that this joint characterization is effective for describing errors with spatial continuity. Our approach is computationally efficient due to its use of the Alternating Direction Method of Multipliers (ADMM). A special case of our fast iterative algorithm reduces to the robust representation method normally used to handle non-contiguous errors (e.g., pixel corruption). Extensive results on representative face databases (in constrained and unconstrained environments) document the effectiveness of our method over existing robust representation methods with respect to both identification rates and computational time. Code is available on GitHub, including implementations of F-LR-IRNNLS and F-IRNNLS (a fast version of the RRC): https://github.com/miliadis/FIRC |
Tasks | Face Identification |
Published | 2016-05-08 |
URL | http://arxiv.org/abs/1605.02266v2 |
http://arxiv.org/pdf/1605.02266v2.pdf | |
PWC | https://paperswithcode.com/paper/robust-and-low-rank-representation-for-fast |
Repo | https://github.com/miliadis/FIRC |
Framework | none |
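The low-rank half of the error model is handled inside ADMM by a nuclear-norm proximal step. Below is a hedged numpy sketch of that single ingredient, singular value thresholding, applied to a toy block-occlusion error image; it is one subproblem, not the full F-LR-IRNNLS algorithm.

```python
import numpy as np

def svt(E, tau):
    """Singular value thresholding: the proximal operator of the nuclear
    norm, the ADMM subproblem that pushes the error image E toward low rank."""
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

# Toy usage: a contiguous block occlusion yields a (near) low-rank error image.
img = np.random.rand(32, 32)
occluded = img.copy()
occluded[8:20, 8:20] = 1.0      # block occlusion
E = occluded - img              # error image with spatial structure
E_lr = svt(E, tau=0.5)          # shrink toward its dominant structure
```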
SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient
Title | SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient |
Authors | Lantao Yu, Weinan Zhang, Jun Wang, Yong Yu |
Abstract | As a new way of training generative models, the Generative Adversarial Net (GAN), which uses a discriminative model to guide the training of the generative model, has enjoyed considerable success in generating real-valued data. However, it has limitations when the goal is to generate sequences of discrete tokens. A major reason is that the discrete outputs from the generative model make it difficult to pass the gradient update from the discriminative model to the generative model. Also, the discriminative model can only assess a complete sequence; for a partially generated sequence, it is non-trivial to balance its current score against its future score once the entire sequence has been generated. In this paper, we propose a sequence generation framework, called SeqGAN, to solve these problems. Modeling the data generator as a stochastic policy in reinforcement learning (RL), SeqGAN bypasses the generator differentiation problem by directly performing the policy gradient update. The RL reward signal comes from the GAN discriminator judging a complete sequence, and is passed back to the intermediate state-action steps using Monte Carlo search. Extensive experiments on synthetic data and real-world tasks demonstrate significant improvements over strong baselines. |
Tasks | Text Generation |
Published | 2016-09-18 |
URL | http://arxiv.org/abs/1609.05473v6 |
http://arxiv.org/pdf/1609.05473v6.pdf | |
PWC | https://paperswithcode.com/paper/seqgan-sequence-generative-adversarial-nets |
Repo | https://github.com/L0SG/seqgan-music |
Framework | tf |
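The Monte Carlo search that carries the discriminator's end-of-sequence reward back to intermediate steps can be sketched compactly: estimate the reward of a prefix by completing it several times under the current policy and averaging the discriminator scores. In the snippet below, `rollout_policy` and `discriminator` are stand-ins, not the paper's models.

```python
import numpy as np

def mc_reward(prefix, rollout_policy, discriminator, seq_len, n_rollouts=16):
    """Estimate the reward of a partial sequence (sketch): complete it
    n_rollouts times with the current policy and average the GAN
    discriminator's scores on the finished sequences."""
    scores = []
    for _ in range(n_rollouts):
        seq = list(prefix)
        while len(seq) < seq_len:
            seq.append(rollout_policy(seq))   # sample the next token
        scores.append(discriminator(seq))     # prob. the sequence is real
    return float(np.mean(scores))

# Toy usage with stand-in policy and discriminator.
vocab = 10
policy = lambda seq: np.random.randint(vocab)
disc = lambda seq: float(np.mean(seq)) / vocab   # placeholder scorer
r = mc_reward(prefix=[3, 1], rollout_policy=policy,
              discriminator=disc, seq_len=8)
```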
Boosting Joint Models for Longitudinal and Time-to-Event Data
Title | Boosting Joint Models for Longitudinal and Time-to-Event Data |
Authors | Elisabeth Waldmann, David Taylor-Robinson, Nadja Klein, Thomas Kneib, Tania Pressler, Matthias Schmid, Andreas Mayr |
Abstract | Joint models for longitudinal and time-to-event data have gained a lot of attention in recent years, as they are a helpful technique for approaching a common data structure in clinical studies where longitudinal outcomes are recorded alongside event times. The two processes are often linked, and the outcomes should thus be modeled jointly to prevent the potential bias introduced by independent modeling. Commonly, joint models are estimated with likelihood-based expectation maximization or Bayesian approaches using frameworks in which variable selection is problematic and which do not immediately work for high-dimensional data. In this paper, we propose a boosting algorithm that tackles these challenges: it simultaneously estimates predictors for joint models and automatically selects the most influential variables, even in high-dimensional data situations. We analyse the performance of the new algorithm in a simulation study and apply it to the Danish cystic fibrosis registry, which collects longitudinal lung function data on patients with cystic fibrosis together with data regarding the onset of pulmonary infections. This is the first approach to combine state-of-the-art algorithms from the field of machine learning with the model class of joint models, providing a fully data-driven mechanism to select variables and predictor effects in a unified framework of boosting joint models. |
Tasks | |
Published | 2016-09-09 |
URL | http://arxiv.org/abs/1609.02686v2 |
http://arxiv.org/pdf/1609.02686v2.pdf | |
PWC | https://paperswithcode.com/paper/boosting-joint-models-for-longitudinal-and |
Repo | https://github.com/mayrandy/JMboost |
Framework | none |
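The variable-selection behavior comes from the component-wise nature of the boosting updates: each iteration fits every candidate base learner to the current gradient but updates only the best one. The sketch below shows that mechanism for a plain L2 loss; the actual JMboost algorithm boosts the joint longitudinal/survival likelihood instead.

```python
import numpy as np

def componentwise_boost(X, y, n_iter=100, nu=0.1):
    """Component-wise L2 boosting (sketch): per iteration, fit every
    covariate's least-squares base learner to the current residuals but
    update only the single best one, giving implicit variable selection."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        resid = y - X @ beta                            # negative L2 gradient
        fits = (X * resid[:, None]).sum(0) / (X ** 2).sum(0)
        sse = ((resid[:, None] - X * fits) ** 2).sum(0)
        j = sse.argmin()                                # best-fitting component
        beta[j] += nu * fits[j]                         # small step on it alone
    return beta

# Toy usage: only features 0 and 5 of 20 are informative.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 20))
y = 2 * X[:, 0] - X[:, 5] + 0.1 * rng.standard_normal(200)
print(componentwise_boost(X, y).round(2))   # mass concentrates on 0 and 5
```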
Interpreting Neural Networks to Improve Politeness Comprehension
Title | Interpreting Neural Networks to Improve Politeness Comprehension |
Authors | Malika Aubakirova, Mohit Bansal |
Abstract | We present an interpretable neural network approach to predicting and understanding politeness in natural language requests. Our models are based on simple convolutional neural networks applied directly to raw text, avoiding any manual identification of complex sentiment or syntactic features, while performing better than such feature-based models from previous work. More importantly, we use the challenging task of politeness prediction as a testbed to present a much-needed understanding of what these successful networks are actually learning. For this, we present several network visualizations based on activation clusters, first-derivative saliency, and embedding space transformations, which help us automatically identify several subtle linguistic markers from politeness theories. Further, this analysis reveals multiple novel, high-scoring politeness strategies which, when added back as new features, reduce the accuracy gap between the original featurized system and the neural model, thus providing a clear quantitative interpretation of the success of these neural networks. |
Tasks | |
Published | 2016-10-09 |
URL | http://arxiv.org/abs/1610.02683v1 |
http://arxiv.org/pdf/1610.02683v1.pdf | |
PWC | https://paperswithcode.com/paper/interpreting-neural-networks-to-improve |
Repo | https://github.com/swkarlekar/summaries |
Framework | tf |
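First-derivative saliency, one of the visualizations used here, scores each word by the gradient of the prediction with respect to its embedding. The sketch below computes the analogous gradient-times-input quantity for a linear bag-of-embeddings scorer, where the gradient is available in closed form; for the paper's CNN it would come from a backward pass instead. All values are illustrative.

```python
import numpy as np

def saliency_scores(E, w):
    """Gradient-based word saliency (sketch) for a linear bag-of-embeddings
    scorer s = w . mean(E): the gradient w.r.t. word i is w / n, so the
    gradient-times-input score (w . e_i) / n measures how much each word
    pushes the prediction."""
    n = E.shape[0]
    return E @ (w / n)    # per-word contribution to the politeness score

# Toy usage: 5 words with 16-dim embeddings (illustrative values only).
rng = np.random.default_rng(2)
E = rng.standard_normal((5, 16))   # one embedding per word
w = rng.standard_normal(16)        # trained readout weights
print(saliency_scores(E, w))       # large |score| = salient word
```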
Doubly Stochastic Neighbor Embedding on Spheres
Title | Doubly Stochastic Neighbor Embedding on Spheres |
Authors | Yao Lu, Jukka Corander, Zhirong Yang |
Abstract | Stochastic Neighbor Embedding (SNE) methods minimize the divergence between the similarity matrix of a high-dimensional data set and its counterpart from a low-dimensional embedding, leading to widely applied tools for data visualization. Despite their popularity, the current SNE methods experience a crowding problem when the data include highly imbalanced similarities: data points with higher total similarity tend to get crowded around the display center. To solve this problem, we introduce a fast normalization method and normalize the similarity matrix to be doubly stochastic, such that all data points have equal total similarities. Furthermore, we show empirically and theoretically that the doubly stochastic constraint often leads to embeddings which are approximately spherical. This suggests replacing the flat space with a sphere as the embedding space. The spherical embedding eliminates the discrepancy between the center and the periphery of the visualization, which efficiently resolves the crowding problem. We compared the proposed method (DOSNES) with the state-of-the-art SNE method on three real-world datasets, and the results clearly indicate that our method is more favorable in terms of visualization quality. |
Tasks | |
Published | 2016-09-07 |
URL | http://arxiv.org/abs/1609.01977v2 |
http://arxiv.org/pdf/1609.01977v2.pdf | |
PWC | https://paperswithcode.com/paper/doubly-stochastic-neighbor-embedding-on |
Repo | https://github.com/yaolubrain/DOSNES |
Framework | none |
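The doubly stochastic normalization can be sketched as Sinkhorn-style alternating row/column rescaling (the paper proposes its own fast normalization; this is the generic variant), followed by projecting the embedding onto the unit sphere:

```python
import numpy as np

def doubly_stochastic(S, n_iter=100):
    """Sinkhorn-style normalization (sketch): alternately rescale rows and
    columns of a nonnegative similarity matrix until every data point has
    (approximately) equal total similarity."""
    P = S.copy()
    for _ in range(n_iter):
        P /= P.sum(axis=1, keepdims=True)   # rows sum to 1 ...
        P /= P.sum(axis=0, keepdims=True)   # ... then columns
    return (P + P.T) / 2                    # restore symmetry

def to_sphere(Y):
    """Project embedding points onto the unit sphere, the display space
    suggested by the doubly stochastic constraint."""
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)

# Toy usage on a random symmetric affinity matrix.
S = np.abs(np.random.rand(50, 50)); S = (S + S.T) / 2
P = doubly_stochastic(S)
print(P.sum(axis=1)[:3], P.sum(axis=0)[:3])   # near-equal total similarities
```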
Cross-lingual Models of Word Embeddings: An Empirical Comparison
Title | Cross-lingual Models of Word Embeddings: An Empirical Comparison |
Authors | Shyam Upadhyay, Manaal Faruqui, Chris Dyer, Dan Roth |
Abstract | Despite interest in using cross-lingual knowledge to learn word embeddings for various tasks, a systematic comparison of the possible approaches is lacking in the literature. We perform an extensive evaluation of four popular approaches for inducing cross-lingual embeddings, each requiring a different form of supervision, on four typologically different language pairs. Our evaluation setup spans four different tasks, including intrinsic evaluation on monolingual and cross-lingual similarity, and extrinsic evaluation on downstream semantic and syntactic applications. We show that models which require expensive cross-lingual knowledge almost always perform better, but cheaply supervised models often prove competitive on certain tasks. |
Tasks | Word Embeddings |
Published | 2016-04-01 |
URL | http://arxiv.org/abs/1604.00425v2 |
http://arxiv.org/pdf/1604.00425v2.pdf | |
PWC | https://paperswithcode.com/paper/cross-lingual-models-of-word-embeddings-an |
Repo | https://github.com/shyamupa/biling-survey |
Framework | none |
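As a rough illustration of the intrinsic cross-lingual evaluation, the sketch below scores embedded word pairs from two languages by cosine similarity and rank-correlates the scores with human judgments; the tiny vocabularies and similarity values are entirely made up.

```python
import numpy as np
from scipy.stats import spearmanr

def cross_lingual_similarity_eval(emb_src, emb_tgt, pairs, human_scores):
    """Intrinsic cross-lingual evaluation (sketch): cosine similarity of
    embedded word pairs drawn from two languages, rank-correlated with
    human similarity judgments. emb_* map word -> vector."""
    sims = []
    for ws, wt in pairs:
        u, v = emb_src[ws], emb_tgt[wt]
        sims.append(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    return spearmanr(sims, human_scores)[0]

# Toy usage with made-up two-word vocabularies.
emb_en = {"dog": np.array([1.0, 0.1]), "car": np.array([0.0, 1.0])}
emb_de = {"Hund": np.array([0.9, 0.2]), "Auto": np.array([0.1, 0.9])}
pairs = [("dog", "Hund"), ("dog", "Auto"), ("car", "Auto"), ("car", "Hund")]
rho = cross_lingual_similarity_eval(emb_en, emb_de, pairs,
                                    human_scores=[9.5, 1.0, 9.0, 0.5])
```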
Deep Reinforcement Learning for Dialogue Generation
Title | Deep Reinforcement Learning for Dialogue Generation |
Authors | Jiwei Li, Will Monroe, Alan Ritter, Michel Galley, Jianfeng Gao, Dan Jurafsky |
Abstract | Recent neural models of dialogue generation offer great promise for generating responses for conversational agents, but tend to be shortsighted, predicting utterances one at a time while ignoring their influence on future outcomes. Modeling the future direction of a dialogue is crucial to generating coherent, interesting dialogues, a need that led traditional NLP models of dialogue to draw on reinforcement learning. In this paper, we show how to integrate these goals, applying deep reinforcement learning to model future reward in chatbot dialogue. The model simulates dialogues between two virtual agents, using policy gradient methods to reward sequences that display three useful conversational properties: informativity (non-repetitive turns), coherence, and ease of answering (related to forward-looking function). We evaluate our model on diversity and length, as well as with human judges, showing that the proposed algorithm generates more interactive responses and manages to foster a more sustained conversation in dialogue simulation. This work marks a first step towards learning a neural conversational model based on the long-term success of dialogues. |
Tasks | Chatbot, Dialogue Generation, Policy Gradient Methods |
Published | 2016-06-05 |
URL | http://arxiv.org/abs/1606.01541v4 |
http://arxiv.org/pdf/1606.01541v4.pdf | |
PWC | https://paperswithcode.com/paper/deep-reinforcement-learning-for-dialogue |
Repo | https://github.com/tfolkman/deep-learning-experiments |
Framework | none |
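The three conversational properties enter as terms of a single scalar reward for the policy gradient. A hedged sketch of such a composite reward is below; the weighting, the clamping, and the placeholder log-probabilities are assumptions, since the exact reward shaping belongs to the paper.

```python
import numpy as np

def dialogue_reward(turn_vec, prev_vec, logp_dull, logp_fwd, logp_bwd,
                    lambdas=(0.25, 0.25, 0.5)):
    """Composite reward (sketch) mirroring the paper's three terms:
    ease of answering (avoid turns that invite dull replies), information
    flow (avoid echoing the previous turn), and coherence (forward plus
    backward likelihood as a mutual-information proxy)."""
    r_ease = -logp_dull                         # dull continuations are bad
    cos = turn_vec @ prev_vec / (np.linalg.norm(turn_vec)
                                 * np.linalg.norm(prev_vec))
    r_flow = -np.log(max(cos, 1e-8))            # penalize repetition
    r_coh = logp_fwd + logp_bwd
    l1, l2, l3 = lambdas
    return l1 * r_ease + l2 * r_flow + l3 * r_coh

# Toy usage with random turn encodings and placeholder log-probabilities.
rng = np.random.default_rng(3)
r = dialogue_reward(rng.standard_normal(8), rng.standard_normal(8),
                    logp_dull=-0.7, logp_fwd=-2.1, logp_bwd=-2.4)
```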
Reweighting with Boosted Decision Trees
Title | Reweighting with Boosted Decision Trees |
Authors | A. Rogozhnikov |
Abstract | Machine learning tools are commonly used in modern high energy physics (HEP) experiments. Different models, such as boosted decision trees (BDT) and artificial neural networks (ANN), are widely used in analyses and even in software triggers. In most cases, these are classification models used to select the “signal” events from data. Monte Carlo simulated events typically take part in the training of these models. While the results of the simulation are expected to be close to real data, in practice there is notable disagreement between simulated and observed data. In order to use the available simulation in training, corrections must be introduced to the generated data. One common approach is reweighting: assigning weights to the simulated events. We present a novel method of event reweighting based on boosted decision trees. The problem of checking the quality of the reweighting step in analyses is also discussed. |
Tasks | |
Published | 2016-08-20 |
URL | http://arxiv.org/abs/1608.05806v1 |
http://arxiv.org/pdf/1608.05806v1.pdf | |
PWC | https://paperswithcode.com/paper/reweighting-with-boosted-decision-trees |
Repo | https://github.com/philippgadow/reweight_samples |
Framework | none |
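The method is implemented in the author's hep_ml package as GBReweighter. A minimal usage sketch, assuming hep_ml is installed and using toy Gaussian stand-ins for the simulated and observed samples:

```python
import numpy as np
from hep_ml.reweight import GBReweighter   # the author's hep_ml package

# Toy stand-ins: "original" plays the Monte Carlo sample, "target" the data.
rng = np.random.default_rng(4)
original = rng.normal(0.0, 1.2, size=(10000, 2))   # simulated features
target = rng.normal(0.3, 1.0, size=(10000, 2))     # observed features

# Fit a BDT reweighter so the weighted simulation matches the data.
reweighter = GBReweighter(n_estimators=50, learning_rate=0.1, max_depth=3)
reweighter.fit(original, target)
weights = reweighter.predict_weights(original)      # per-event weights
```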
Fuzzy Bayesian Learning
Title | Fuzzy Bayesian Learning |
Authors | Indranil Pan, Dirk Bester |
Abstract | In this paper we propose a novel approach for learning from data using rule-based fuzzy inference systems, where the model parameters are estimated using Bayesian inference and Markov Chain Monte Carlo (MCMC) techniques. We show the applicability of the method for regression and classification tasks using synthetic datasets and also a real-world example from the financial services industry. We then demonstrate how the method can be extended for knowledge extraction, selecting in a Bayesian way the individual rules that best explain the given data. Finally, we discuss the advantages and pitfalls of using this method over state-of-the-art techniques and highlight the specific class of problems where it would be useful. |
Tasks | Bayesian Inference |
Published | 2016-10-28 |
URL | http://arxiv.org/abs/1610.09156v2 |
http://arxiv.org/pdf/1610.09156v2.pdf | |
PWC | https://paperswithcode.com/paper/fuzzy-bayesian-learning |
Repo | https://github.com/SciemusGithub/FBL |
Framework | none |
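A hedged sketch of the core loop: a small Takagi-Sugeno-style fuzzy system whose rule parameters are sampled with random-walk Metropolis, one simple member of the MCMC family the paper draws on. Rule counts, priors, and step sizes are all illustrative assumptions.

```python
import numpy as np

def fuzzy_predict(x, centers, widths, consequents):
    """Zero-order Takagi-Sugeno fuzzy system (sketch): Gaussian rule
    memberships; output = membership-weighted mean of rule consequents."""
    mu = np.exp(-0.5 * ((x[:, None] - centers) / widths) ** 2)  # (n, rules)
    return (mu * consequents).sum(1) / mu.sum(1)

def metropolis_step(theta, log_post, scale=0.05):
    """One random-walk Metropolis update of the stacked fuzzy parameters."""
    prop = theta + scale * np.random.randn(theta.size)
    if np.log(np.random.rand()) < log_post(prop) - log_post(theta):
        return prop
    return theta

# Toy usage: 3 rules on 1-D data; Gaussian likelihood, flat prior.
x = np.linspace(-2, 2, 50)
y = np.sin(x) + 0.1 * np.random.randn(50)
def log_post(t):
    c, w, q = t[:3], np.abs(t[3:6]) + 1e-3, t[6:]
    return -0.5 * ((y - fuzzy_predict(x, c, w, q)) ** 2).sum() / 0.1 ** 2
theta = np.array([-1.0, 0.0, 1.0, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0])
for _ in range(2000):
    theta = metropolis_step(theta, log_post)
```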
Object Contour Detection with a Fully Convolutional Encoder-Decoder Network
Title | Object Contour Detection with a Fully Convolutional Encoder-Decoder Network |
Authors | Jimei Yang, Brian Price, Scott Cohen, Honglak Lee, Ming-Hsuan Yang |
Abstract | We develop a deep learning algorithm for contour detection with a fully convolutional encoder-decoder network. Different from previous low-level edge detection, our algorithm focuses on detecting higher-level object contours. Our network is trained end-to-end on PASCAL VOC with refined ground truth from inaccurate polygon annotations, yielding much higher precision in object contour detection than previous methods. We find that the learned model generalizes well to unseen object classes from the same super-categories on MS COCO and can match state-of-the-art edge detection on BSDS500 with fine-tuning. By combining with the multiscale combinatorial grouping algorithm, our method can generate high-quality segmented object proposals, which significantly advance the state-of-the-art on PASCAL VOC (improving average recall from 0.62 to 0.67) with a relatively small amount of candidates ($\sim$1660 per image). |
Tasks | Contour Detection, Edge Detection |
Published | 2016-03-15 |
URL | http://arxiv.org/abs/1603.04530v1 |
http://arxiv.org/pdf/1603.04530v1.pdf | |
PWC | https://paperswithcode.com/paper/object-contour-detection-with-a-fully |
Repo | https://github.com/Raj-08/tensorflow-object-contour-detection |
Framework | tf |
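A minimal fully convolutional encoder-decoder with the same input/output contract (RGB image in, per-pixel contour probability out) can be written in a few lines of Keras. This sketch only mirrors the shape of the architecture; the paper's network is much deeper and is trained on refined PASCAL VOC annotations.

```python
import tensorflow as tf
from tensorflow.keras import layers

def contour_net(size=224):
    """Minimal fully convolutional encoder-decoder (sketch): strided convs
    downsample, transposed convs upsample, and a 1x1 conv with sigmoid
    emits a per-pixel contour probability map."""
    inp = tf.keras.Input((size, size, 3))
    x = inp
    for f in (32, 64, 128):                      # encoder
        x = layers.Conv2D(f, 3, strides=2, padding="same",
                          activation="relu")(x)
    for f in (64, 32, 16):                       # decoder
        x = layers.Conv2DTranspose(f, 3, strides=2, padding="same",
                                   activation="relu")(x)
    out = layers.Conv2D(1, 1, activation="sigmoid")(x)
    return tf.keras.Model(inp, out)

model = contour_net()
model.compile(optimizer="adam", loss="binary_crossentropy")
```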
Time for a change: a tutorial for comparing multiple classifiers through Bayesian analysis
Title | Time for a change: a tutorial for comparing multiple classifiers through Bayesian analysis |
Authors | Alessio Benavoli, Giorgio Corani, Janez Demsar, Marco Zaffalon |
Abstract | The machine learning community adopted the use of null hypothesis significance testing (NHST) in order to ensure the statistical validity of results. Many scientific fields, however, have realized the shortcomings of frequentist reasoning, and in the most radical cases have even banned its use in publications. We should do the same: just as we have embraced the Bayesian paradigm in the development of new machine learning methods, so we should also use it in the analysis of our own results. We argue for the abandonment of NHST by exposing its fallacies and, more importantly, offer better (more sound and useful) alternatives to it. |
Tasks | |
Published | 2016-06-14 |
URL | http://arxiv.org/abs/1606.04316v3 |
http://arxiv.org/pdf/1606.04316v3.pdf | |
PWC | https://paperswithcode.com/paper/time-for-a-change-a-tutorial-for-comparing |
Repo | https://github.com/BayesianTestsML/tutorial |
Framework | none |
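One of the tutorial's main tools is the Bayesian correlated t-test for cross-validation results, whose posterior is a Student-t with variance inflated by the correlation rho = n_test / (n_test + n_train) between overlapping folds. A sketch, with the ROPE width and the toy fold differences as placeholders:

```python
import numpy as np
from scipy.stats import t as student_t

def correlated_bayesian_ttest(diffs, rho, rope=0.01):
    """Bayesian correlated t-test (sketch): diffs are per-fold score
    differences from k-fold cross-validation; rho = n_test / (n_test +
    n_train) accounts for overlapping training sets. Returns the posterior
    probabilities P(left), P(rope), P(right) for the mean difference."""
    n = len(diffs)
    m, s2 = diffs.mean(), diffs.var(ddof=1)
    scale = np.sqrt((1.0 / n + rho / (1.0 - rho)) * s2)
    post = student_t(df=n - 1, loc=m, scale=scale)
    p_left = post.cdf(-rope)
    p_rope = post.cdf(rope) - p_left
    return p_left, p_rope, 1.0 - p_left - p_rope

# Toy usage: 10-fold CV accuracy differences, 90/10 split -> rho = 0.1.
d = np.array([0.02, 0.01, 0.03, 0.00, 0.02, 0.01, 0.02, 0.04, 0.01, 0.02])
print(correlated_bayesian_ttest(d, rho=0.1))
```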
Learning to Fuse 2D and 3D Image Cues for Monocular Body Pose Estimation
Title | Learning to Fuse 2D and 3D Image Cues for Monocular Body Pose Estimation |
Authors | Bugra Tekin, Pablo Márquez-Neila, Mathieu Salzmann, Pascal Fua |
Abstract | Most recent approaches to monocular 3D human pose estimation rely on Deep Learning. They typically involve regressing from an image to either 3D joint coordinates directly or 2D joint locations from which 3D coordinates are inferred. Both approaches have their strengths and weaknesses and we therefore propose a novel architecture designed to deliver the best of both worlds by performing both simultaneously and fusing the information along the way. At the heart of our framework is a trainable fusion scheme that learns how to fuse the information optimally instead of being hand-designed. This yields significant improvements upon the state-of-the-art on standard 3D human pose estimation benchmarks. |
Tasks | 3D Human Pose Estimation, Pose Estimation |
Published | 2016-11-17 |
URL | http://arxiv.org/abs/1611.05708v3 |
http://arxiv.org/pdf/1611.05708v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-fuse-2d-and-3d-image-cues-for |
Repo | https://github.com/romanus/code_tekinetal_iccv17 |
Framework | none |
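The trainable fusion scheme, reduced to its simplest possible form, is a learned blend of the two streams' features. The sketch below assumes per-channel sigmoid mixing weights; the paper learns the fusion jointly with both streams rather than as a fixed blend.

```python
import numpy as np

def trainable_fusion(f2d, f3d, alpha_logits):
    """Learned fusion (sketch): blend features from the 2D-landmark stream
    and the direct 3D stream with trainable per-channel weights instead of
    a hand-designed combination; sigmoid keeps each weight in (0, 1)."""
    alpha = 1.0 / (1.0 + np.exp(-alpha_logits))   # (C,) mixing weights
    return alpha * f2d + (1.0 - alpha) * f3d      # fused feature per channel

# Toy usage: 64-channel features from each stream; zero logits = even blend.
rng = np.random.default_rng(5)
fused = trainable_fusion(rng.standard_normal(64), rng.standard_normal(64),
                         alpha_logits=np.zeros(64))
```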
Cryptocurrency Portfolio Management with Deep Reinforcement Learning
Title | Cryptocurrency Portfolio Management with Deep Reinforcement Learning |
Authors | Zhengyao Jiang, Jinjun Liang |
Abstract | Portfolio management is the decision-making process of allocating an amount of funds across different financial investment products. Cryptocurrencies are electronic and decentralized alternatives to government-issued money, with Bitcoin as the best-known example. This paper presents a model-less convolutional neural network that takes the historic prices of a set of financial assets as its input and outputs portfolio weights for the set. The network is trained on 0.7 years of price data from a cryptocurrency exchange. The training is done in a reinforcement manner, maximizing the accumulative return, which is regarded as the reward function of the network. Back-test trading experiments with a trading period of 30 minutes are conducted in the same market, achieving 10-fold returns over 1.8 months. Some recently published portfolio selection strategies are used to perform the same back-tests, and their results are compared with those of the neural network. The network is not limited to cryptocurrencies and can be applied to any other financial market. |
Tasks | Decision Making |
Published | 2016-12-05 |
URL | http://arxiv.org/abs/1612.01277v5 |
http://arxiv.org/pdf/1612.01277v5.pdf | |
PWC | https://paperswithcode.com/paper/cryptocurrency-portfolio-management-with-deep |
Repo | https://github.com/edwardwardward/crypto_ml |
Framework | tf |
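The reward being maximized is the accumulative return of the portfolio over an episode. The sketch below computes it from softmax portfolio weights and per-period price relatives, ignoring transaction costs; shapes and values are illustrative.

```python
import numpy as np

def episode_return(weight_logits, price_relatives):
    """Accumulative-return reward (sketch): softmax turns the network's
    outputs into portfolio weights; the reward is the log of the product
    of per-period portfolio returns, ignoring transaction costs.
    price_relatives[t, i] = price_t / price_{t-1} for asset i."""
    w = np.exp(weight_logits)
    w /= w.sum(axis=1, keepdims=True)             # (T, assets) weights
    period_returns = (w * price_relatives).sum(axis=1)
    return np.log(period_returns).sum()           # quantity to maximize

# Toy usage: 48 thirty-minute periods, 4 assets (illustrative price moves).
rng = np.random.default_rng(6)
logits = rng.standard_normal((48, 4))
rel = 1.0 + 0.01 * rng.standard_normal((48, 4))
print(episode_return(logits, rel))
```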