Paper Group AWR 52
On the Effectiveness of Discretizing Quantitative Attributes in Linear Classifiers
Title | On the Effectiveness of Discretizing Quantitative Attributes in Linear Classifiers |
Authors | Nayyar A. Zaidi, Yang Du, Geoffrey I. Webb |
Abstract | Learning algorithms that learn linear models often have high representation bias on real-world problems. In this paper, we show that this representation bias can be greatly reduced by discretization. Discretization is a common procedure in machine learning that is used to convert a quantitative attribute into a qualitative one. It is often motivated by the limitation of some learners to qualitative data. Discretization loses information, as fewer distinctions between instances are possible using discretized data relative to undiscretized data. In consequence, where discretization is not essential, it might appear desirable to avoid it. However, it has been shown that discretization often substantially reduces the error of the linear generative Bayesian classifier naive Bayes. This motivates a systematic study of the effectiveness of discretizing quantitative attributes for other linear classifiers. In this work, we study the effect of discretization on the performance of linear classifiers optimizing three distinct discriminative objective functions: logistic regression (optimizing negative log-likelihood), support vector classifiers (optimizing hinge loss) and a zero-hidden-layer artificial neural network (optimizing mean-square-error). We show that discretization can greatly increase the accuracy of these linear discriminative learners by reducing their representation bias, especially on big datasets. We substantiate our claims with an empirical study on 42 benchmark datasets. |
Tasks | |
Published | 2017-01-24 |
URL | http://arxiv.org/abs/1701.07114v1, http://arxiv.org/pdf/1701.07114v1.pdf |
PWC | https://paperswithcode.com/paper/on-the-effectiveness-of-discretizing |
Repo | https://github.com/vedic-partap/Discretization |
Framework | none |
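The abstract's central step, converting a quantitative attribute into a qualitative one before fitting a linear model, can be sketched as follows. This is a minimal illustration only: equal-frequency binning is an assumption (the abstract does not name a discretization method), and the function names and the `ages` data are hypothetical. With one indicator feature per bin, a linear classifier can fit a piecewise-constant function of the original attribute, which is one way to see how representation bias is reduced.

```python
from bisect import bisect_right


def equal_frequency_cut_points(values, n_bins):
    """Return n_bins - 1 cut points that split `values` into roughly equal-sized bins."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def discretize(value, cut_points):
    """Map a quantitative value to the index of the bin it falls into."""
    return bisect_right(cut_points, value)


def one_hot(bin_index, n_bins):
    """Encode a bin index as indicator features for a linear classifier."""
    return [1 if i == bin_index else 0 for i in range(n_bins)]


# Hypothetical quantitative attribute (assumption, not from the paper).
ages = [18, 22, 25, 31, 38, 44, 52, 67]
cuts = equal_frequency_cut_points(ages, n_bins=4)
encoded = [one_hot(discretize(a, cuts), 4) for a in ages]
```

Each row of `encoded` would replace the single numeric column in the classifier's input; the linear learner itself is left unchanged.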
FCN-rLSTM: Deep Spatio-Temporal Neural Networks for Vehicle Counting in City Cameras
Title | FCN-rLSTM: Deep Spatio-Temporal Neural Networks for Vehicle Counting in City Cameras |
Authors | Shanghang Zhang, Guanhang Wu, João P. Costeira, José M. F. Moura |
Abstract | In this paper, we develop deep spatio-temporal neural networks to sequentially count vehicles from low-quality videos captured by city cameras (citycams). Citycam videos have low resolution, low frame rate, high occlusion and large perspective, making most existing methods lose their efficacy. To overcome limitations of existing methods and incorporate the temporal information of traffic video, we design a novel FCN-rLSTM network to jointly estimate vehicle density and vehicle count by connecting fully convolutional neural networks (FCN) with long short term memory networks (LSTM) in a residual learning fashion. This design leverages the strengths of FCN for pixel-level prediction and the strengths of LSTM for learning complex temporal dynamics. The residual learning connection reformulates the vehicle count regression as learning residual functions with reference to the sum of densities in each frame, which significantly accelerates the training of networks. To preserve feature map resolution, we propose a Hyper-Atrous combination to integrate atrous convolution in FCN and combine feature maps of different convolution layers. FCN-rLSTM enables refined feature representation and a novel end-to-end trainable mapping from pixels to vehicle count. We extensively evaluated the proposed method on different counting tasks with three datasets, with experimental results demonstrating its effectiveness and robustness. In particular, FCN-rLSTM reduces the mean absolute error (MAE) from 5.31 to 4.21 on TRANCOS, and reduces the MAE from 2.74 to 1.53 on WebCamT. The training process is accelerated by a factor of 5 on average. |
Tasks | |
Published | 2017-07-29 |
URL | http://arxiv.org/abs/1707.09476v2, http://arxiv.org/pdf/1707.09476v2.pdf |
PWC | https://paperswithcode.com/paper/fcn-rlstm-deep-spatio-temporal-neural |
Repo | https://github.com/dpernes/FCN-rLSTM |
Framework | pytorch |
Indirect Supervision for Relation Extraction using Question-Answer Pairs
Title | Indirect Supervision for Relation Extraction using Question-Answer Pairs |
Authors | Zeqiu Wu, Xiang Ren, Frank F. Xu, Ji Li, Jiawei Han |
Abstract | Automatic relation extraction (RE) for types of interest is of great importance for interpreting massive text corpora in an efficient manner. Traditional RE models have relied heavily on human-annotated corpora for training, which are costly to label and become an obstacle when dealing with more relation types. Thus, more RE systems have shifted to training data automatically acquired by linking to knowledge bases (distant supervision). However, due to the incompleteness of knowledge bases and the context-agnostic labeling, the training data collected via distant supervision (DS) can be very noisy. In recent years, as increasing attention has been brought to tackling question-answering (QA) tasks, user feedback and datasets for such tasks have become more accessible. In this paper, we propose a novel framework, ReQuest, to leverage question-answer pairs as an indirect source of supervision for relation extraction, and study how to use such supervision to reduce noise induced from DS. Our model jointly embeds relation mentions, types, QA entity mention pairs and text features in two low-dimensional spaces (RE and QA), where objects with the same relation types or semantically similar question-answer pairs have similar representations. Shared features connect these two spaces, carrying clearer semantic knowledge from both sources. ReQuest then uses these learned embeddings to estimate the types of test relation mentions. We formulate a global objective function and adopt a novel margin-based QA loss to reduce noise in DS by exploiting semantic evidence from the QA dataset. Our experimental results achieve an average of 11% improvement in F1 score on two public RE datasets combined with the TREC QA dataset. |
Tasks | Question Answering, Relation Extraction |
Published | 2017-10-30 |
URL | http://arxiv.org/abs/1710.11169v2, http://arxiv.org/pdf/1710.11169v2.pdf |
PWC | https://paperswithcode.com/paper/indirect-supervision-for-relation-extraction |
Repo | https://github.com/ellenmellon/ReQuest |
Framework | none |
Faster independent component analysis by preconditioning with Hessian approximations
Title | Faster independent component analysis by preconditioning with Hessian approximations |
Authors | Pierre Ablin, Jean-François Cardoso, Alexandre Gramfort |
Abstract | Independent Component Analysis (ICA) is a technique for unsupervised exploration of multi-channel data that is widely used in observational sciences. In its classic form, ICA relies on modeling the data as linear mixtures of non-Gaussian independent sources. The maximization of the corresponding likelihood is a challenging problem if it has to be completed quickly and accurately on large sets of real data. We introduce the Preconditioned ICA for Real Data (Picard) algorithm, which is a relative L-BFGS algorithm preconditioned with sparse Hessian approximations. Extensive numerical comparisons to several algorithms of the same class demonstrate the superior performance of the proposed technique, especially on real data, for which the ICA model does not necessarily hold. |
Tasks | |
Published | 2017-06-25 |
URL | http://arxiv.org/abs/1706.08171v3, http://arxiv.org/pdf/1706.08171v3.pdf |
PWC | https://paperswithcode.com/paper/faster-independent-component-analysis-by |
Repo | https://github.com/pierreablin/picard |
Framework | none |
DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks
Title | DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks |
Authors | David Salinas, Valentin Flunkert, Jan Gasthaus |
Abstract | Probabilistic forecasting, i.e. estimating the probability distribution of a time series’ future given its past, is a key enabler for optimizing business processes. In retail businesses, for example, forecasting demand is crucial for having the right inventory available at the right time at the right place. In this paper we propose DeepAR, a methodology for producing accurate probabilistic forecasts, based on training an autoregressive recurrent network model on a large number of related time series. We demonstrate how, by applying deep learning techniques to forecasting, one can overcome many of the challenges faced by widely-used classical approaches to the problem. We show through extensive empirical evaluation on several real-world forecasting data sets accuracy improvements of around 15% compared to state-of-the-art methods. |
Tasks | Time Series |
Published | 2017-04-13 |
URL | http://arxiv.org/abs/1704.04110v3, http://arxiv.org/pdf/1704.04110v3.pdf |
PWC | https://paperswithcode.com/paper/deepar-probabilistic-forecasting-with |
Repo | https://github.com/husnejahan/DeepAR-pytorch |
Framework | pytorch |
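To make "estimating the probability distribution of a time series' future given its past" concrete, here is a minimal sketch of an autoregressive sampling forecast. It rests on loud assumptions: a fixed AR(1) rule with Gaussian noise stands in for the trained recurrent network, the parameters `phi` and `sigma` are hypothetical, and drawing sample paths to read off quantiles is assumed as the forecasting procedure rather than taken from the abstract.

```python
import random


def sample_paths(last_value, horizon, n_samples, phi=0.8, sigma=1.0, seed=0):
    """Draw future trajectories by feeding each sampled value back in as the next input."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_samples):
        value, path = last_value, []
        for _ in range(horizon):
            # Stand-in for the network: predict a distribution, sample from it,
            # then condition the next step on that sample.
            value = rng.gauss(phi * value, sigma)
            path.append(value)
        paths.append(path)
    return paths


def quantile_forecast(paths, q):
    """Empirical q-quantile of the sampled values at each future step."""
    horizon = len(paths[0])
    result = []
    for t in range(horizon):
        column = sorted(path[t] for path in paths)
        result.append(column[min(int(q * len(column)), len(column) - 1)])
    return result


paths = sample_paths(last_value=10.0, horizon=3, n_samples=500)
p10, p50, p90 = (quantile_forecast(paths, q) for q in (0.1, 0.5, 0.9))
```

The spread between `p10` and `p90` at each step is what a point forecast would discard; an inventory decision, for example, could be made against an upper quantile rather than the median.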
Deep Asymmetric Multi-task Feature Learning
Title | Deep Asymmetric Multi-task Feature Learning |
Authors | Hae Beom Lee, Eunho Yang, Sung Ju Hwang |
Abstract | We propose Deep Asymmetric Multitask Feature Learning (Deep-AMTFL) which can learn deep representations shared across multiple tasks while effectively preventing negative transfer that may happen in the feature sharing process. Specifically, we introduce an asymmetric autoencoder term that allows reliable predictors for the easy tasks to have high contribution to the feature learning while suppressing the influences of unreliable predictors for more difficult tasks. This allows the learning of less noisy representations, and enables unreliable predictors to exploit knowledge from the reliable predictors via the shared latent features. Such asymmetric knowledge transfer through shared features is also more scalable and efficient than inter-task asymmetric transfer. We validate our Deep-AMTFL model on multiple benchmark datasets for multitask learning and image classification, on which it significantly outperforms existing symmetric and asymmetric multitask learning models, by effectively preventing negative transfer in deep feature learning. |
Tasks | Image Classification, Transfer Learning |
Published | 2017-08-01 |
URL | http://arxiv.org/abs/1708.00260v3, http://arxiv.org/pdf/1708.00260v3.pdf |
PWC | https://paperswithcode.com/paper/deep-asymmetric-multi-task-feature-learning |
Repo | https://github.com/haebeom-lee/amtfl |
Framework | tf |
Variational Inference based on Robust Divergences
Title | Variational Inference based on Robust Divergences |
Authors | Futoshi Futami, Issei Sato, Masashi Sugiyama |
Abstract | Robustness to outliers is a central issue in real-world machine learning applications. While replacing a model with a heavy-tailed one (e.g., from Gaussian to Student-t) is a standard approach for robustification, it can only be applied to simple models. In this paper, based on Zellner’s optimization and variational formulation of Bayesian inference, we propose an outlier-robust pseudo-Bayesian variational method by replacing the Kullback-Leibler divergence used for data fitting with a robust divergence such as the beta- and gamma-divergences. An advantage of our approach is that superior but complex models such as deep networks can also be handled. We theoretically prove that, for deep networks with ReLU activation functions, the influence function in our proposed method is bounded, while it is unbounded in the ordinary variational inference. This implies that our proposed method is robust to both input and output outliers, while the ordinary variational method is not. We experimentally demonstrate that our robust variational method outperforms ordinary variational inference in regression and classification with deep networks. |
Tasks | Bayesian Inference |
Published | 2017-10-18 |
URL | http://arxiv.org/abs/1710.06595v2, http://arxiv.org/pdf/1710.06595v2.pdf |
PWC | https://paperswithcode.com/paper/variational-inference-based-on-robust |
Repo | https://github.com/futoshi-futami/Robust_VI |
Framework | tf |
Learning Structural Weight Uncertainty for Sequential Decision-Making
Title | Learning Structural Weight Uncertainty for Sequential Decision-Making |
Authors | Ruiyi Zhang, Chunyuan Li, Changyou Chen, Lawrence Carin |
Abstract | Learning probability distributions on the weights of neural networks (NNs) has recently proven beneficial in many applications. Bayesian methods, such as Stein variational gradient descent (SVGD), offer an elegant framework to reason about NN model uncertainty. However, by assuming independent Gaussian priors for the individual NN weights (as often applied), SVGD does not impose prior knowledge that there is often structural information (dependence) among weights. We propose efficient posterior learning of structural weight uncertainty, within an SVGD framework, by employing matrix variate Gaussian priors on NN parameters. We further investigate the learned structural uncertainty in sequential decision-making problems, including contextual bandits and reinforcement learning. Experiments on several synthetic and real datasets indicate the superiority of our model, compared with state-of-the-art methods. |
Tasks | Decision Making, Multi-Armed Bandits |
Published | 2017-12-30 |
URL | http://arxiv.org/abs/1801.00085v2, http://arxiv.org/pdf/1801.00085v2.pdf |
PWC | https://paperswithcode.com/paper/learning-structural-weight-uncertainty-for |
Repo | https://github.com/zhangry868/S2VGD |
Framework | none |
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
Title | Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments |
Authors | Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, Igor Mordatch |
Abstract | We explore deep reinforcement learning methods for multi-agent domains. We begin by analyzing the difficulty of traditional algorithms in the multi-agent case: Q-learning is challenged by an inherent non-stationarity of the environment, while policy gradient suffers from a variance that increases as the number of agents grows. We then present an adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination. Additionally, we introduce a training regimen utilizing an ensemble of policies for each agent that leads to more robust multi-agent policies. We show the strength of our approach compared to existing methods in cooperative as well as competitive scenarios, where agent populations are able to discover various physical and informational coordination strategies. |
Tasks | Multi-agent Reinforcement Learning, Q-Learning |
Published | 2017-06-07 |
URL | https://arxiv.org/abs/1706.02275v4, https://arxiv.org/pdf/1706.02275v4.pdf |
PWC | https://paperswithcode.com/paper/multi-agent-actor-critic-for-mixed |
Repo | https://github.com/google/maddpg-replication |
Framework | tf |
Online Deep Learning: Learning Deep Neural Networks on the Fly
Title | Online Deep Learning: Learning Deep Neural Networks on the Fly |
Authors | Doyen Sahoo, Quang Pham, Jing Lu, Steven C. H. Hoi |
Abstract | Deep Neural Networks (DNNs) are typically trained by backpropagation in a batch learning setting, which requires the entire training data to be made available prior to the learning task. This is not scalable for many real-world scenarios where new data arrives sequentially as a stream. We aim to address an open challenge of “Online Deep Learning” (ODL) for learning DNNs on the fly in an online setting. Unlike traditional online learning, which often optimizes some convex objective function with respect to a shallow model (e.g., a linear/kernel-based hypothesis), ODL is significantly more challenging since the optimization of the DNN objective function is non-convex, and regular backpropagation does not work well in practice, especially for online learning settings. In this paper, we present a new online deep learning framework that attempts to tackle the challenges by learning DNN models of adaptive depth from a sequence of training data in an online learning setting. In particular, we propose a novel Hedge Backpropagation (HBP) method for effectively updating the parameters of a DNN online, and validate the efficacy of our method on large-scale data sets, including both stationary and concept drifting scenarios. |
Tasks | |
Published | 2017-11-10 |
URL | http://arxiv.org/abs/1711.03705v1, http://arxiv.org/pdf/1711.03705v1.pdf |
PWC | https://paperswithcode.com/paper/online-deep-learning-learning-deep-neural |
Repo | https://github.com/LIBOL/ODL |
Framework | pytorch |
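The abstract names Hedge Backpropagation but does not spell out the update. The sketch below shows only the classic Hedge (multiplicative-weights) step that the name refers to, under the assumption that it is applied to classifiers attached at different depths of the network so that depth can adapt over the stream. The function names, the discount `beta`, and the loss values are all hypothetical, and the gradient step on the network parameters is omitted.

```python
def hedge_update(weights, losses, beta=0.99):
    """One Hedge step: scale each expert's weight by beta**loss, then renormalize."""
    scaled = [w * (beta ** loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


def combine(weights, predictions):
    """Weighted combination of the per-expert predictions."""
    return sum(w * p for w, p in zip(weights, predictions))


# Three hypothetical classifiers of increasing depth, starting with uniform weights.
alphas = [1 / 3, 1 / 3, 1 / 3]
# Hypothetical per-classifier losses on a stream of four examples.
loss_stream = [[0.9, 0.4, 0.1], [0.8, 0.5, 0.2], [0.7, 0.3, 0.1], [0.9, 0.2, 0.3]]
for losses in loss_stream:
    alphas = hedge_update(alphas, losses, beta=0.5)
```

After the loop, the classifier with the lowest cumulative loss carries the largest weight, so the combined prediction leans on whichever depth has performed best on the data seen so far.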
Machine Learning on Sequential Data Using a Recurrent Weighted Average
Title | Machine Learning on Sequential Data Using a Recurrent Weighted Average |
Authors | Jared Ostmeyer, Lindsay Cowell |
Abstract | Recurrent Neural Networks (RNN) are a type of statistical model designed to handle sequential data. The model reads a sequence one symbol at a time. Each symbol is processed based on information collected from the previous symbols. With existing RNN architectures, each symbol is processed using only information from the previous processing step. To overcome this limitation, we propose a new kind of RNN model that computes a recurrent weighted average (RWA) over every past processing step. Because the RWA can be computed as a running average, the computational overhead scales like that of any other RNN architecture. The approach essentially reformulates the attention mechanism into a stand-alone model. The performance of the RWA model is assessed on the variable copy problem, the adding problem, classification of artificial grammar, classification of sequences by length, and classification of the MNIST images (where the pixels are read sequentially one at a time). On almost every task, the RWA model is found to outperform a standard LSTM model. |
Tasks | |
Published | 2017-03-03 |
URL | http://arxiv.org/abs/1703.01253v5, http://arxiv.org/pdf/1703.01253v5.pdf |
PWC | https://paperswithcode.com/paper/machine-learning-on-sequential-data-using-a |
Repo | https://github.com/jostmey/rwa |
Framework | tf |
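The claim that a weighted average over every past step "can be computed as a running average" can be checked with a small sketch. It assumes exponential (softmax-style) weights, and the `values` and `scores` are fixed hypothetical numbers; in the model they would be computed from the current input and the previous state. Only the averaging mechanism is shown, not the full RWA cell.

```python
import math


def weighted_average_direct(values, scores):
    """Weighted average recomputed from scratch over the whole history."""
    weights = [math.exp(s) for s in scores]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def weighted_average_running(values, scores):
    """Same average, updated in O(1) per step from a running numerator and denominator."""
    numerator, denominator = 0.0, 0.0
    averages = []
    for v, s in zip(values, scores):
        w = math.exp(s)
        numerator += w * v
        denominator += w
        averages.append(numerator / denominator)
    return averages


values = [0.2, -1.0, 0.5, 0.9]   # hypothetical per-step features
scores = [0.1, 0.3, -0.2, 1.5]   # hypothetical per-step attention scores
running = weighted_average_running(values, scores)
```

Because only the numerator and denominator are carried forward, the per-step cost does not grow with sequence length, which is the sense in which the overhead matches other RNN architectures.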
Exploring the Landscape of Spatial Robustness
Title | Exploring the Landscape of Spatial Robustness |
Authors | Logan Engstrom, Brandon Tran, Dimitris Tsipras, Ludwig Schmidt, Aleksander Madry |
Abstract | The study of adversarial robustness has so far largely focused on perturbations bound in p-norms. However, state-of-the-art models turn out to be also vulnerable to other, more natural classes of perturbations such as translations and rotations. In this work, we thoroughly investigate the vulnerability of neural network–based classifiers to rotations and translations. While data augmentation offers relatively small robustness, we use ideas from robust optimization and test-time input aggregation to significantly improve robustness. Finally we find that, in contrast to the p-norm case, first-order methods cannot reliably find worst-case perturbations. This highlights spatial robustness as a fundamentally different setting requiring additional study. Code available at https://github.com/MadryLab/adversarial_spatial and https://github.com/MadryLab/spatial-pytorch. |
Tasks | Data Augmentation |
Published | 2017-12-07 |
URL | https://arxiv.org/abs/1712.02779v4, https://arxiv.org/pdf/1712.02779v4.pdf |
PWC | https://paperswithcode.com/paper/a-rotation-and-a-translation-suffice-fooling |
Repo | https://github.com/MadryLab/spatial-pytorch |
Framework | pytorch |
Dynamic Word Embeddings for Evolving Semantic Discovery
Title | Dynamic Word Embeddings for Evolving Semantic Discovery |
Authors | Zijun Yao, Yifan Sun, Weicong Ding, Nikhil Rao, Hui Xiong |
Abstract | Word evolution refers to the changing meanings and associations of words throughout time, as a byproduct of human language evolution. By studying word evolution, we can infer social trends and language constructs over different periods of human history. However, traditional techniques such as word representation learning do not adequately capture the evolving language structure and vocabulary. In this paper, we develop a dynamic statistical model to learn time-aware word vector representation. We propose a model that simultaneously learns time-aware embeddings and solves the resulting “alignment problem”. This model is trained on a crawled NYTimes dataset. Additionally, we develop multiple intuitive evaluation strategies of temporal word embeddings. Our qualitative and quantitative tests indicate that our method not only reliably captures this evolution over time, but also consistently outperforms state-of-the-art temporal embedding approaches on both semantic accuracy and alignment quality. |
Tasks | Representation Learning, Word Embeddings |
Published | 2017-03-02 |
URL | http://arxiv.org/abs/1703.00607v2, http://arxiv.org/pdf/1703.00607v2.pdf |
PWC | https://paperswithcode.com/paper/dynamic-word-embeddings-for-evolving-semantic |
Repo | https://github.com/yifan0sun/DynamicWord2Vec |
Framework | none |
Voice Conversion from Unaligned Corpora using Variational Autoencoding Wasserstein Generative Adversarial Networks
Title | Voice Conversion from Unaligned Corpora using Variational Autoencoding Wasserstein Generative Adversarial Networks |
Authors | Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, Hsin-Min Wang |
Abstract | Building a voice conversion (VC) system from non-parallel speech corpora is challenging but highly valuable in real application scenarios. In most situations, the source and the target speakers do not repeat the same texts or they may even speak different languages. In this case, one possible, although indirect, solution is to build a generative model for speech. Generative models focus on explaining the observations with latent variables instead of learning a pairwise transformation function, thereby bypassing the requirement of speech frame alignment. In this paper, we propose a non-parallel VC framework with a variational autoencoding Wasserstein generative adversarial network (VAW-GAN) that explicitly considers a VC objective when building the speech model. Experimental results corroborate the capability of our framework for building a VC system from unaligned data, and demonstrate improved conversion quality. |
Tasks | Voice Conversion |
Published | 2017-04-04 |
URL | http://arxiv.org/abs/1704.00849v3, http://arxiv.org/pdf/1704.00849v3.pdf |
PWC | https://paperswithcode.com/paper/voice-conversion-from-unaligned-corpora-using |
Repo | https://github.com/JeremyCCHsu/vae-npvc |
Framework | tf |
Near Optimal Behavior via Approximate State Abstraction
Title | Near Optimal Behavior via Approximate State Abstraction |
Authors | David Abel, D. Ellis Hershkowitz, Michael L. Littman |
Abstract | The combinatorial explosion that plagues planning and reinforcement learning (RL) algorithms can be moderated using state abstraction. Prohibitively large task representations can be condensed such that essential information is preserved, and consequently, solutions are tractably computable. However, exact abstractions, which treat only fully-identical situations as equivalent, fail to present opportunities for abstraction in environments where no two situations are exactly alike. In this work, we investigate approximate state abstractions, which treat nearly-identical situations as equivalent. We present theoretical guarantees of the quality of behaviors derived from four types of approximate abstractions. Additionally, we empirically demonstrate that approximate abstractions lead to reduction in task complexity and bounded loss of optimality of behavior in a variety of environments. |
Tasks | |
Published | 2017-01-15 |
URL | http://arxiv.org/abs/1701.04113v1, http://arxiv.org/pdf/1701.04113v1.pdf |
PWC | https://paperswithcode.com/paper/near-optimal-behavior-via-approximate-state |
Repo | https://github.com/david-abel/state_abstraction |
Framework | none |
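The difference between exact and approximate abstraction can be illustrated with a small sketch. It assumes that "nearly identical" is measured by action values differing by at most `epsilon` for every action, which is one possible criterion; the abstract does not list its four abstraction types. The greedy grouping, the function name, and the `q_values` table are hypothetical and not the authors' procedure.

```python
def approximate_abstraction(q_values, epsilon):
    """Greedily group states whose action values differ from a cluster's
    representative by at most epsilon for every action."""
    representatives = []   # action-value vector of the first state in each cluster
    assignment = {}
    for state, q in q_values.items():
        for cluster_id, rep in enumerate(representatives):
            if all(abs(a - b) <= epsilon for a, b in zip(q, rep)):
                assignment[state] = cluster_id
                break
        else:
            # No existing cluster is close enough: open a new one.
            assignment[state] = len(representatives)
            representatives.append(q)
    return assignment


# Hypothetical action values for four ground states and two actions.
q_values = {
    "s0": (1.00, 0.50),
    "s1": (1.02, 0.49),
    "s2": (0.20, 0.90),
    "s3": (0.21, 0.93),
}
exact = approximate_abstraction(q_values, epsilon=0.0)
approx = approximate_abstraction(q_values, epsilon=0.05)
```

With `epsilon=0.0` no two states merge, matching the abstract's point that exact abstraction offers nothing when no two situations are exactly alike; with `epsilon=0.05` the four ground states collapse to two abstract states.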