July 30, 2019

3022 words 15 mins read

Paper Group AWR 52

On the Effectiveness of Discretizing Quantitative Attributes in Linear Classifiers

Title On the Effectiveness of Discretizing Quantitative Attributes in Linear Classifiers
Authors Nayyar A. Zaidi, Yang Du, Geoffrey I. Webb
Abstract Learning algorithms that learn linear models often have high representation bias on real-world problems. In this paper, we show that this representation bias can be greatly reduced by discretization. Discretization is a common procedure in machine learning that is used to convert a quantitative attribute into a qualitative one. It is often motivated by the limitation of some learners to qualitative data. Discretization loses information, as fewer distinctions between instances are possible using discretized data relative to undiscretized data. In consequence, where discretization is not essential, it might appear desirable to avoid it. However, it has been shown that discretization often substantially reduces the error of the linear generative Bayesian classifier naive Bayes. This motivates a systematic study of the effectiveness of discretizing quantitative attributes for other linear classifiers. In this work, we study the effect of discretization on the performance of linear classifiers optimizing three distinct discriminative objective functions — logistic regression (optimizing negative log-likelihood), support vector classifiers (optimizing hinge loss) and a zero-hidden layer artificial neural network (optimizing mean-square-error). We show that discretization can greatly increase the accuracy of these linear discriminative learners by reducing their representation bias, especially on big datasets. We substantiate our claims with an empirical study on $42$ benchmark datasets.
Tasks
Published 2017-01-24
URL http://arxiv.org/abs/1701.07114v1
PDF http://arxiv.org/pdf/1701.07114v1.pdf
PWC https://paperswithcode.com/paper/on-the-effectiveness-of-discretizing
Repo https://github.com/vedic-partap/Discretization
Framework none
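
A minimal sketch of the idea, using scikit-learn rather than the linked repo: quantile-discretize the quantitative attributes, one-hot encode the bins, and feed them to a logistic regression, then compare against the same model on raw features.

```python
# Sketch only: discretizing quantitative attributes before a linear classifier,
# as studied in the paper. Uses scikit-learn, not the code in the linked repo.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

raw = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
disc = make_pipeline(
    KBinsDiscretizer(n_bins=10, encode="onehot", strategy="quantile"),
    LogisticRegression(max_iter=1000),
).fit(X_tr, y_tr)

print("raw features:        ", raw.score(X_te, y_te))
print("discretized features:", disc.score(X_te, y_te))
```

With one-hot bins, the linear model fits a piecewise-constant response per attribute instead of a single slope, which is the mechanism by which discretization reduces representation bias.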

FCN-rLSTM: Deep Spatio-Temporal Neural Networks for Vehicle Counting in City Cameras

Title FCN-rLSTM: Deep Spatio-Temporal Neural Networks for Vehicle Counting in City Cameras
Authors Shanghang Zhang, Guanhang Wu, João P. Costeira, José M. F. Moura
Abstract In this paper, we develop deep spatio-temporal neural networks to sequentially count vehicles from low quality videos captured by city cameras (citycams). Citycam videos have low resolution, low frame rate, high occlusion and large perspective, making most existing methods lose their efficacy. To overcome limitations of existing methods and incorporate the temporal information of traffic video, we design a novel FCN-rLSTM network to jointly estimate vehicle density and vehicle count by connecting fully convolutional neural networks (FCN) with long short-term memory networks (LSTM) in a residual learning fashion. Such design leverages the strengths of FCN for pixel-level prediction and the strengths of LSTM for learning complex temporal dynamics. The residual learning connection reformulates the vehicle count regression as learning residual functions with reference to the sum of densities in each frame, which significantly accelerates the training of networks. To preserve feature map resolution, we propose a Hyper-Atrous combination to integrate atrous convolution in FCN and combine feature maps of different convolution layers. FCN-rLSTM enables refined feature representation and a novel end-to-end trainable mapping from pixels to vehicle count. We extensively evaluated the proposed method on different counting tasks with three datasets, with experimental results demonstrating its effectiveness and robustness. In particular, FCN-rLSTM reduces the mean absolute error (MAE) from 5.31 to 4.21 on TRANCOS, and reduces the MAE from 2.74 to 1.53 on WebCamT. The training process is accelerated by a factor of 5 on average.
Tasks
Published 2017-07-29
URL http://arxiv.org/abs/1707.09476v2
PDF http://arxiv.org/pdf/1707.09476v2.pdf
PWC https://paperswithcode.com/paper/fcn-rlstm-deep-spatio-temporal-neural
Repo https://github.com/dpernes/FCN-rLSTM
Framework pytorch
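
A toy PyTorch sketch of the residual-learning connection described in the abstract. This is my own simplification, not the linked implementation: the FCN predicts a per-frame density map, and an LSTM over the per-frame density sums predicts a residual that is added back to give the count (the real model feeds richer hyper-atrous FCN features to the LSTM).

```python
import torch
import torch.nn as nn

class ToyFCNrLSTM(nn.Module):
    """Simplified sketch; the real FCN-rLSTM uses hyper-atrous features, not just sums."""
    def __init__(self, hidden=32):
        super().__init__()
        self.fcn = nn.Sequential(                 # tiny stand-in for the FCN
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1),                  # 1-channel density map
        )
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, frames):                    # frames: (B, T, 3, H, W)
        B, T = frames.shape[:2]
        density = self.fcn(frames.flatten(0, 1)).view(B, T, -1)
        base_count = density.sum(dim=-1)          # sum of densities per frame
        out, _ = self.lstm(base_count.unsqueeze(-1))
        residual = self.head(out).squeeze(-1)     # temporal residual per frame
        return base_count + residual              # residual learning connection

counts = ToyFCNrLSTM()(torch.rand(2, 5, 3, 64, 64))
print(counts.shape)  # torch.Size([2, 5])
```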

Indirect Supervision for Relation Extraction using Question-Answer Pairs

Title Indirect Supervision for Relation Extraction using Question-Answer Pairs
Authors Zeqiu Wu, Xiang Ren, Frank F. Xu, Ji Li, Jiawei Han
Abstract Automatic relation extraction (RE) for types of interest is of great importance for interpreting massive text corpora in an efficient manner. Traditional RE models have relied heavily on human-annotated corpora for training; generating such labeled data is costly and becomes an obstacle when dealing with more relation types. Thus, more RE systems have shifted to training data automatically acquired by linking to knowledge bases (distant supervision). However, due to the incompleteness of knowledge bases and the context-agnostic labeling, the training data collected via distant supervision (DS) can be very noisy. In recent years, as increasing attention has been paid to question-answering (QA) tasks, user feedback and datasets for such tasks have become more accessible. In this paper, we propose a novel framework, ReQuest, to leverage question-answer pairs as an indirect source of supervision for relation extraction, and study how to use such supervision to reduce the noise induced by DS. Our model jointly embeds relation mentions, types, QA entity mention pairs and text features in two low-dimensional spaces (RE and QA), where objects with the same relation types or semantically similar question-answer pairs have similar representations. Shared features connect these two spaces, carrying clearer semantic knowledge from both sources. ReQuest then uses these learned embeddings to estimate the types of test relation mentions. We formulate a global objective function and adopt a novel margin-based QA loss to reduce noise in DS by exploiting semantic evidence from the QA dataset. Our experimental results show an average improvement of 11% in F1 score on two public RE datasets combined with the TREC QA dataset.
Tasks Question Answering, Relation Extraction
Published 2017-10-30
URL http://arxiv.org/abs/1710.11169v2
PDF http://arxiv.org/pdf/1710.11169v2.pdf
PWC https://paperswithcode.com/paper/indirect-supervision-for-relation-extraction
Repo https://github.com/ellenmellon/ReQuest
Framework none

Faster independent component analysis by preconditioning with Hessian approximations

Title Faster independent component analysis by preconditioning with Hessian approximations
Authors Pierre Ablin, Jean-François Cardoso, Alexandre Gramfort
Abstract Independent Component Analysis (ICA) is a technique for unsupervised exploration of multi-channel data that is widely used in observational sciences. In its classic form, ICA relies on modeling the data as linear mixtures of non-Gaussian independent sources. The maximization of the corresponding likelihood is a challenging problem if it has to be completed quickly and accurately on large sets of real data. We introduce the Preconditioned ICA for Real Data (Picard) algorithm, which is a relative L-BFGS algorithm preconditioned with sparse Hessian approximations. Extensive numerical comparisons to several algorithms of the same class demonstrate the superior performance of the proposed technique, especially on real data, for which the ICA model does not necessarily hold.
Tasks
Published 2017-06-25
URL http://arxiv.org/abs/1706.08171v3
PDF http://arxiv.org/pdf/1706.08171v3.pdf
PWC https://paperswithcode.com/paper/faster-independent-component-analysis-by
Repo https://github.com/pierreablin/picard
Framework none
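
Usage is roughly as below, assuming the pip-installable python-picard package from the linked repo exposes a picard() function returning the whitening matrix, the unmixing matrix, and the estimated sources; check the repo's README for the exact signature and options.

```python
# Rough usage sketch for the python-picard package from the linked repo.
# Assumption: picard(X) returns (K, W, Y) = whitening matrix, unmixing matrix
# on whitened data, and estimated sources; verify against the README.
import numpy as np
from picard import picard

rng = np.random.RandomState(0)
S = rng.laplace(size=(3, 10000))        # non-Gaussian sources
A = rng.randn(3, 3)                     # mixing matrix
X = A @ S                               # observed mixtures (n_channels, n_samples)

K, W, Y = picard(X)
print(Y.shape)                          # estimated sources, same shape as S
```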

DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks

Title DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks
Authors David Salinas, Valentin Flunkert, Jan Gasthaus
Abstract Probabilistic forecasting, i.e. estimating the probability distribution of a time series’ future given its past, is a key enabler for optimizing business processes. In retail businesses, for example, forecasting demand is crucial for having the right inventory available at the right time at the right place. In this paper we propose DeepAR, a methodology for producing accurate probabilistic forecasts, based on training an autoregressive recurrent network model on a large number of related time series. We demonstrate how, by applying deep learning techniques to forecasting, one can overcome many of the challenges faced by widely-used classical approaches to the problem. Through extensive empirical evaluation on several real-world forecasting datasets, we show accuracy improvements of around 15% compared to state-of-the-art methods.
Tasks Time Series
Published 2017-04-13
URL http://arxiv.org/abs/1704.04110v3
PDF http://arxiv.org/pdf/1704.04110v3.pdf
PWC https://paperswithcode.com/paper/deepar-probabilistic-forecasting-with
Repo https://github.com/husnejahan/DeepAR-pytorch
Framework pytorch
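
A minimal PyTorch sketch of the core training idea as described in the abstract, not the linked implementation (which also handles covariates, scaling, and count likelihoods): an autoregressive RNN conditions on lagged values, emits a Gaussian mean and scale per step, and is trained by negative log-likelihood.

```python
import torch
import torch.nn as nn

class TinyDeepAR(nn.Module):
    """Sketch: RNN conditions on the previous value and emits (mu, sigma) per step."""
    def __init__(self, hidden=40):
        super().__init__()
        self.rnn = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.mu = nn.Linear(hidden, 1)
        self.presigma = nn.Linear(hidden, 1)

    def forward(self, prev_values):                    # (B, T, 1) lagged targets
        h, _ = self.rnn(prev_values)
        mu = self.mu(h).squeeze(-1)
        sigma = nn.functional.softplus(self.presigma(h)).squeeze(-1) + 1e-4
        return mu, sigma

model = TinyDeepAR()
series = torch.randn(8, 25, 1)                         # toy batch of related series
mu, sigma = model(series[:, :-1])                      # condition on past values
nll = -torch.distributions.Normal(mu, sigma).log_prob(series[:, 1:, 0]).mean()
nll.backward()                                         # gradient for one training step
```

Forecasting would proceed by ancestral sampling: feed each sampled value back in as the next step's input and repeat, which yields full predictive distributions rather than point forecasts.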

Deep Asymmetric Multi-task Feature Learning

Title Deep Asymmetric Multi-task Feature Learning
Authors Hae Beom Lee, Eunho Yang, Sung Ju Hwang
Abstract We propose Deep Asymmetric Multitask Feature Learning (Deep-AMTFL) which can learn deep representations shared across multiple tasks while effectively preventing negative transfer that may happen in the feature sharing process. Specifically, we introduce an asymmetric autoencoder term that allows reliable predictors for the easy tasks to have high contribution to the feature learning while suppressing the influences of unreliable predictors for more difficult tasks. This allows the learning of less noisy representations, and enables unreliable predictors to exploit knowledge from the reliable predictors via the shared latent features. Such asymmetric knowledge transfer through shared features is also more scalable and efficient than inter-task asymmetric transfer. We validate our Deep-AMTFL model on multiple benchmark datasets for multitask learning and image classification, on which it significantly outperforms existing symmetric and asymmetric multitask learning models, by effectively preventing negative transfer in deep feature learning.
Tasks Image Classification, Transfer Learning
Published 2017-08-01
URL http://arxiv.org/abs/1708.00260v3
PDF http://arxiv.org/pdf/1708.00260v3.pdf
PWC https://paperswithcode.com/paper/deep-asymmetric-multi-task-feature-learning
Repo https://github.com/haebeom-lee/amtfl
Framework tf

Variational Inference based on Robust Divergences

Title Variational Inference based on Robust Divergences
Authors Futoshi Futami, Issei Sato, Masashi Sugiyama
Abstract Robustness to outliers is a central issue in real-world machine learning applications. While replacing a model with a heavy-tailed one (e.g., from Gaussian to Student-t) is a standard approach for robustification, it can only be applied to simple models. In this paper, based on Zellner’s optimization and variational formulation of Bayesian inference, we propose an outlier-robust pseudo-Bayesian variational method by replacing the Kullback-Leibler divergence used for data fitting with a robust divergence such as the beta- and gamma-divergences. An advantage of our approach is that superior but complex models such as deep networks can also be handled. We theoretically prove that, for deep networks with ReLU activation functions, the influence function in our proposed method is bounded, while it is unbounded in ordinary variational inference. This implies that our proposed method is robust to both input and output outliers, while the ordinary variational method is not. We experimentally demonstrate that our robust variational method outperforms ordinary variational inference in regression and classification with deep networks.
Tasks Bayesian Inference
Published 2017-10-18
URL http://arxiv.org/abs/1710.06595v2
PDF http://arxiv.org/pdf/1710.06595v2.pdf
PWC https://paperswithcode.com/paper/variational-inference-based-on-robust
Repo https://github.com/futoshi-futami/Robust_VI
Framework tf

Learning Structural Weight Uncertainty for Sequential Decision-Making

Title Learning Structural Weight Uncertainty for Sequential Decision-Making
Authors Ruiyi Zhang, Chunyuan Li, Changyou Chen, Lawrence Carin
Abstract Learning probability distributions on the weights of neural networks (NNs) has recently proven beneficial in many applications. Bayesian methods, such as Stein variational gradient descent (SVGD), offer an elegant framework to reason about NN model uncertainty. However, by assuming independent Gaussian priors for the individual NN weights (as often applied), SVGD does not impose prior knowledge that there is often structural information (dependence) among weights. We propose efficient posterior learning of structural weight uncertainty, within an SVGD framework, by employing matrix variate Gaussian priors on NN parameters. We further investigate the learned structural uncertainty in sequential decision-making problems, including contextual bandits and reinforcement learning. Experiments on several synthetic and real datasets indicate the superiority of our model, compared with state-of-the-art methods.
Tasks Decision Making, Multi-Armed Bandits
Published 2017-12-30
URL http://arxiv.org/abs/1801.00085v2
PDF http://arxiv.org/pdf/1801.00085v2.pdf
PWC https://paperswithcode.com/paper/learning-structural-weight-uncertainty-for
Repo https://github.com/zhangry868/S2VGD
Framework none
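
A plain NumPy sketch of the SVGD update this paper builds on: standard SVGD with an RBF kernel on a toy 2-D Gaussian target. The paper's actual contribution, matrix variate Gaussian priors over NN weights inside the SVGD framework, is not shown here.

```python
import numpy as np

def svgd_step(X, grad_logp, h=1.0, eps=0.1):
    """One SVGD update: attraction along the score plus a repulsive kernel term."""
    n = X.shape[0]
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / h)                               # RBF kernel matrix
    attract = K @ grad_logp(X) / n                    # kernel-weighted scores
    repulse = (2.0 / h) * (X * K.sum(1, keepdims=True) - K @ X) / n
    return X + eps * (attract + repulse)

rng = np.random.RandomState(0)
particles = rng.randn(100, 2) * 3 + 5                 # start far from the target
score = lambda X: -X                                  # target: standard 2-D Gaussian
for _ in range(500):
    particles = svgd_step(particles, score)
print(particles.mean(0), particles.std(0))            # should approach 0 mean, unit std
```

The fixed bandwidth h=1.0 keeps the sketch short; practical implementations usually set it with the median heuristic.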

Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

Title Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
Authors Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, Igor Mordatch
Abstract We explore deep reinforcement learning methods for multi-agent domains. We begin by analyzing the difficulty of traditional algorithms in the multi-agent case: Q-learning is challenged by an inherent non-stationarity of the environment, while policy gradient suffers from a variance that increases as the number of agents grows. We then present an adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination. Additionally, we introduce a training regimen utilizing an ensemble of policies for each agent that leads to more robust multi-agent policies. We show the strength of our approach compared to existing methods in cooperative as well as competitive scenarios, where agent populations are able to discover various physical and informational coordination strategies.
Tasks Multi-agent Reinforcement Learning, Q-Learning
Published 2017-06-07
URL https://arxiv.org/abs/1706.02275v4
PDF https://arxiv.org/pdf/1706.02275v4.pdf
PWC https://paperswithcode.com/paper/multi-agent-actor-critic-for-mixed
Repo https://github.com/google/maddpg-replication
Framework tf
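
A small PyTorch sketch of the centralized-critic idea from the abstract, a simplification of MADDPG rather than the linked replication: each agent's critic sees the observations and actions of all agents, while each actor acts on its own observation only.

```python
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, ACT_DIM = 3, 8, 2

class Actor(nn.Module):                      # decentralized: own observation only
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, ACT_DIM), nn.Tanh())
    def forward(self, obs):
        return self.net(obs)

class CentralCritic(nn.Module):              # centralized: all observations and actions
    def __init__(self):
        super().__init__()
        joint = N_AGENTS * (OBS_DIM + ACT_DIM)
        self.net = nn.Sequential(nn.Linear(joint, 128), nn.ReLU(),
                                 nn.Linear(128, 1))
    def forward(self, all_obs, all_acts):    # (B, N, OBS_DIM), (B, N, ACT_DIM)
        x = torch.cat([all_obs.flatten(1), all_acts.flatten(1)], dim=-1)
        return self.net(x)                   # Q value for one agent

obs = torch.randn(16, N_AGENTS, OBS_DIM)
acts = torch.stack([Actor()(obs[:, i]) for i in range(N_AGENTS)], dim=1)
q = CentralCritic()(obs, acts)
print(q.shape)                               # torch.Size([16, 1])
```

Conditioning the critic on every agent's action is what removes the non-stationarity each agent would otherwise see in the environment during training; at execution time only the decentralized actors are needed.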

Online Deep Learning: Learning Deep Neural Networks on the Fly

Title Online Deep Learning: Learning Deep Neural Networks on the Fly
Authors Doyen Sahoo, Quang Pham, Jing Lu, Steven C. H. Hoi
Abstract Deep Neural Networks (DNNs) are typically trained by backpropagation in a batch learning setting, which requires the entire training data to be made available prior to the learning task. This is not scalable for many real-world scenarios where new data arrives sequentially in a stream. We aim to address the open challenge of “Online Deep Learning” (ODL): learning DNNs on the fly in an online setting. Unlike traditional online learning, which often optimizes some convex objective function with respect to a shallow model (e.g., a linear/kernel-based hypothesis), ODL is significantly more challenging because the optimization of the DNN objective function is non-convex, and regular backpropagation does not work well in practice, especially in online learning settings. In this paper, we present a new online deep learning framework that tackles these challenges by learning DNN models of adaptive depth from a sequence of training data in an online learning setting. In particular, we propose a novel Hedge Backpropagation (HBP) method for effectively updating the parameters of the DNN online, and validate the efficacy of our method on large-scale data sets, including both stationary and concept-drifting scenarios.
Tasks
Published 2017-11-10
URL http://arxiv.org/abs/1711.03705v1
PDF http://arxiv.org/pdf/1711.03705v1.pdf
PWC https://paperswithcode.com/paper/online-deep-learning-learning-deep-neural
Repo https://github.com/LIBOL/ODL
Framework pytorch
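
A rough PyTorch sketch of the hedge-weighting idea as I read the abstract, not the linked implementation: each depth has its own output head, the prediction is a weighted combination of the heads, and the head weights are updated multiplicatively based on each head's loss, so effective depth adapts online.

```python
import torch
import torch.nn as nn

class HedgedMLP(nn.Module):
    """Sketch: per-depth classifier heads combined by hedge weights."""
    def __init__(self, in_dim=20, hidden=64, n_layers=4, n_classes=2, beta=0.99):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(in_dim if i == 0 else hidden, hidden), nn.ReLU())
             for i in range(n_layers)])
        self.heads = nn.ModuleList([nn.Linear(hidden, n_classes) for _ in range(n_layers)])
        self.register_buffer("alpha", torch.full((n_layers,), 1.0 / n_layers))
        self.beta = beta

    def forward(self, x):
        outs = []
        for block, head in zip(self.blocks, self.heads):
            x = block(x)
            outs.append(head(x))
        logits = torch.stack(outs, dim=0)                 # (L, B, C): one head per depth
        return (self.alpha[:, None, None] * logits).sum(0), logits

    @torch.no_grad()
    def update_hedge(self, logits, y):
        # multiplicative hedge update: heads with larger loss lose weight
        losses = torch.stack([nn.functional.cross_entropy(l, y) for l in logits])
        self.alpha *= self.beta ** losses
        self.alpha /= self.alpha.sum()

model = HedgedMLP()
x, y = torch.randn(32, 20), torch.randint(0, 2, (32,))
pred, per_head = model(x)
model.update_hedge(per_head, y)
print(model.alpha)                                        # current per-depth weights
```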

Machine Learning on Sequential Data Using a Recurrent Weighted Average

Title Machine Learning on Sequential Data Using a Recurrent Weighted Average
Authors Jared Ostmeyer, Lindsay Cowell
Abstract Recurrent Neural Networks (RNN) are a type of statistical model designed to handle sequential data. The model reads a sequence one symbol at a time. Each symbol is processed based on information collected from the previous symbols. With existing RNN architectures, each symbol is processed using only information from the previous processing step. To overcome this limitation, we propose a new kind of RNN model that computes a recurrent weighted average (RWA) over every past processing step. Because the RWA can be computed as a running average, the computational overhead scales like that of any other RNN architecture. The approach essentially reformulates the attention mechanism into a stand-alone model. The performance of the RWA model is assessed on the variable copy problem, the adding problem, classification of artificial grammar, classification of sequences by length, and classification of the MNIST images (where the pixels are read sequentially one at a time). On almost every task, the RWA model is found to outperform a standard LSTM model.
Tasks
Published 2017-03-03
URL http://arxiv.org/abs/1703.01253v5
PDF http://arxiv.org/pdf/1703.01253v5.pdf
PWC https://paperswithcode.com/paper/machine-learning-on-sequential-data-using-a
Repo https://github.com/jostmey/rwa
Framework tf
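
A small NumPy sketch of the running-average mechanism described in the abstract, a simplification of the full RWA cell (which also conditions the scores and candidates on the previous hidden state and includes gating): each step produces a candidate z_t and an attention score a_t, and the hidden state is the tanh of a running weighted average maintained as a numerator/denominator pair.

```python
import numpy as np

def rwa_scan(x, W_z, W_a):
    """Sketch of a recurrent weighted average over all past steps.
    x: (T, d_in); returns hidden states (T, d_hid)."""
    T, d_hid = x.shape[0], W_z.shape[1]
    num = np.zeros(d_hid)          # running numerator
    den = np.zeros(d_hid)          # running denominator
    hs = []
    for t in range(T):
        z = np.tanh(x[t] @ W_z)    # candidate value for this step
        a = x[t] @ W_a             # unnormalized attention score
        w = np.exp(a)              # positive weight (a real implementation would
        num += w * z               #  subtract a running max of a for stability)
        den += w
        hs.append(np.tanh(num / (den + 1e-8)))
    return np.array(hs)

rng = np.random.RandomState(0)
h = rwa_scan(rng.randn(12, 5), rng.randn(5, 8), rng.randn(5, 8))
print(h.shape)                     # (12, 8)
```

The point of the running form is the O(1) cost per step: the model attends over the entire past without ever storing it.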

Exploring the Landscape of Spatial Robustness

Title Exploring the Landscape of Spatial Robustness
Authors Logan Engstrom, Brandon Tran, Dimitris Tsipras, Ludwig Schmidt, Aleksander Madry
Abstract The study of adversarial robustness has so far largely focused on perturbations bounded in p-norms. However, state-of-the-art models turn out to be also vulnerable to other, more natural classes of perturbations such as translations and rotations. In this work, we thoroughly investigate the vulnerability of neural network–based classifiers to rotations and translations. While data augmentation offers only limited robustness, we use ideas from robust optimization and test-time input aggregation to significantly improve it. Finally, we find that, in contrast to the p-norm case, first-order methods cannot reliably find worst-case perturbations. This highlights spatial robustness as a fundamentally different setting requiring additional study. Code available at https://github.com/MadryLab/adversarial_spatial and https://github.com/MadryLab/spatial-pytorch.
Tasks Data Augmentation
Published 2017-12-07
URL https://arxiv.org/abs/1712.02779v4
PDF https://arxiv.org/pdf/1712.02779v4.pdf
PWC https://paperswithcode.com/paper/a-rotation-and-a-translation-suffice-fooling
Repo https://github.com/MadryLab/spatial-pytorch
Framework pytorch
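
A PyTorch sketch of a worst-of-k spatial attack in the spirit of the abstract: sample k random rotation/translation pairs, evaluate the loss on each, and keep the worst one (the linked repos implement this more carefully, e.g. with grid search over the transform space and proper image resampling).

```python
import math
import torch
import torch.nn.functional as F

def rot_translate(x, angle_deg, tx, ty):
    """Rotate/translate a batch of images with an affine grid (tx, ty in [-1, 1])."""
    a = math.radians(angle_deg)
    theta = torch.tensor([[math.cos(a), -math.sin(a), tx],
                          [math.sin(a),  math.cos(a), ty]], dtype=x.dtype)
    grid = F.affine_grid(theta.expand(x.size(0), 2, 3), x.size(), align_corners=False)
    return F.grid_sample(x, grid, align_corners=False)

def worst_of_k(model, x, y, k=10, max_angle=30.0, max_shift=0.1):
    """Try k random spatial transforms and return the one with the highest loss."""
    worst_x, worst_loss = x, torch.tensor(-float("inf"))
    for _ in range(k):
        angle = (torch.rand(1).item() * 2 - 1) * max_angle
        tx, ty = [(torch.rand(1).item() * 2 - 1) * max_shift for _ in range(2)]
        xt = rot_translate(x, angle, tx, ty)
        loss = F.cross_entropy(model(xt), y)
        if loss > worst_loss:
            worst_x, worst_loss = xt, loss
    return worst_x, worst_loss

# toy usage with a throwaway linear model
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x, y = torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,))
adv_x, adv_loss = worst_of_k(model, x, y)
print(adv_loss.item())
```

Training on the returned worst-case batch instead of a random augmentation is the robust-optimization step the abstract alludes to.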

Dynamic Word Embeddings for Evolving Semantic Discovery

Title Dynamic Word Embeddings for Evolving Semantic Discovery
Authors Zijun Yao, Yifan Sun, Weicong Ding, Nikhil Rao, Hui Xiong
Abstract Word evolution refers to the changing meanings and associations of words throughout time, as a byproduct of human language evolution. By studying word evolution, we can infer social trends and language constructs over different periods of human history. However, traditional techniques such as word representation learning do not adequately capture the evolving language structure and vocabulary. In this paper, we develop a dynamic statistical model to learn time-aware word vector representation. We propose a model that simultaneously learns time-aware embeddings and solves the resulting “alignment problem”. This model is trained on a crawled NYTimes dataset. Additionally, we develop multiple intuitive evaluation strategies of temporal word embeddings. Our qualitative and quantitative tests indicate that our method not only reliably captures this evolution over time, but also consistently outperforms state-of-the-art temporal embedding approaches on both semantic accuracy and alignment quality.
Tasks Representation Learning, Word Embeddings
Published 2017-03-02
URL http://arxiv.org/abs/1703.00607v2
PDF http://arxiv.org/pdf/1703.00607v2.pdf
PWC https://paperswithcode.com/paper/dynamic-word-embeddings-for-evolving-semantic
Repo https://github.com/yifan0sun/DynamicWord2Vec
Framework none
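
A toy NumPy sketch of the joint learning-with-alignment idea from the abstract, as I read it: factorize a per-period co-occurrence statistic while penalizing drift between consecutive periods, so all time slices live in one shared space. The paper's actual objective and solver differ in the details.

```python
import numpy as np

rng = np.random.RandomState(0)
V, D, T = 50, 8, 4                       # vocab size, embedding dim, time periods
M = [rng.rand(V, V) for _ in range(T)]   # stand-ins for per-period co-occurrence stats
M = [(m + m.T) / 2 for m in M]
U = [rng.randn(V, D) * 0.1 for _ in range(T)]

lam, lr = 1.0, 1e-3
for _ in range(200):
    for t in range(T):
        # gradient of ||M_t - U_t U_t^T||_F^2 w.r.t. U_t
        grad = -4 * (M[t] - U[t] @ U[t].T) @ U[t]
        # temporal smoothing keeps consecutive embeddings aligned
        if t > 0:
            grad += 2 * lam * (U[t] - U[t - 1])
        if t < T - 1:
            grad += 2 * lam * (U[t] - U[t + 1])
        U[t] -= lr * grad

print(sum(np.linalg.norm(M[t] - U[t] @ U[t].T) for t in range(T)))
```

The smoothing term is what removes the need for a separate post-hoc alignment step between independently trained slices.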

Voice Conversion from Unaligned Corpora using Variational Autoencoding Wasserstein Generative Adversarial Networks

Title Voice Conversion from Unaligned Corpora using Variational Autoencoding Wasserstein Generative Adversarial Networks
Authors Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, Hsin-Min Wang
Abstract Building a voice conversion (VC) system from non-parallel speech corpora is challenging but highly valuable in real application scenarios. In most situations, the source and the target speakers do not repeat the same texts or they may even speak different languages. In this case, one possible, although indirect, solution is to build a generative model for speech. Generative models focus on explaining the observations with latent variables instead of learning a pairwise transformation function, thereby bypassing the requirement of speech frame alignment. In this paper, we propose a non-parallel VC framework with a variational autoencoding Wasserstein generative adversarial network (VAW-GAN) that explicitly considers a VC objective when building the speech model. Experimental results corroborate the capability of our framework for building a VC system from unaligned data, and demonstrate improved conversion quality.
Tasks Voice Conversion
Published 2017-04-04
URL http://arxiv.org/abs/1704.00849v3
PDF http://arxiv.org/pdf/1704.00849v3.pdf
PWC https://paperswithcode.com/paper/voice-conversion-from-unaligned-corpora-using
Repo https://github.com/JeremyCCHsu/vae-npvc
Framework tf

Near Optimal Behavior via Approximate State Abstraction

Title Near Optimal Behavior via Approximate State Abstraction
Authors David Abel, D. Ellis Hershkowitz, Michael L. Littman
Abstract The combinatorial explosion that plagues planning and reinforcement learning (RL) algorithms can be moderated using state abstraction. Prohibitively large task representations can be condensed such that essential information is preserved, and consequently, solutions are tractably computable. However, exact abstractions, which treat only fully-identical situations as equivalent, fail to present opportunities for abstraction in environments where no two situations are exactly alike. In this work, we investigate approximate state abstractions, which treat nearly-identical situations as equivalent. We present theoretical guarantees of the quality of behaviors derived from four types of approximate abstractions. Additionally, we empirically demonstrate that approximate abstractions lead to reduction in task complexity and bounded loss of optimality of behavior in a variety of environments.
Tasks
Published 2017-01-15
URL http://arxiv.org/abs/1701.04113v1
PDF http://arxiv.org/pdf/1701.04113v1.pdf
PWC https://paperswithcode.com/paper/near-optimal-behavior-via-approximate-state
Repo https://github.com/david-abel/state_abstraction
Framework none
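
A NumPy sketch of one flavor of approximate abstraction as I understand it from the abstract: aggregate states whose optimal Q-values agree within epsilon for every action. The authors' code in the linked repo covers the abstraction types studied in the paper together with the accompanying guarantees.

```python
import numpy as np

def approx_q_abstraction(Q, eps):
    """Greedy grouping: states whose Q-values differ by at most eps for every
    action share an abstract state. Q has shape (n_states, n_actions)."""
    clusters = []                      # each cluster keeps the Q-row of its first member
    labels = np.empty(len(Q), dtype=int)
    for s, q in enumerate(Q):
        for c, rep in enumerate(clusters):
            if np.all(np.abs(q - rep) <= eps):
                labels[s] = c
                break
        else:
            labels[s] = len(clusters)
            clusters.append(q)
    return labels

rng = np.random.RandomState(0)
Q = rng.rand(100, 4)                   # toy optimal Q-values
for eps in [0.0, 0.1, 0.3]:
    n = len(set(approx_q_abstraction(Q, eps)))
    print(f"eps={eps}: {n} abstract states")
```

Larger epsilon gives fewer abstract states (and hence an easier planning problem) at the cost of a larger, but bounded, loss in the optimality of the derived behavior.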