July 27, 2019

3262 words 16 mins read

Paper Group ANR 475

Learning Powers of Poisson Binomial Distributions

Title Learning Powers of Poisson Binomial Distributions
Authors Dimitris Fotakis, Vasilis Kontonis, Piotr Krysta, Paul Spirakis
Abstract We introduce the problem of simultaneously learning all powers of a Poisson Binomial Distribution (PBD). A PBD of order $n$ is the distribution of a sum of $n$ mutually independent Bernoulli random variables $X_i$, where $\mathbb{E}[X_i] = p_i$. The $k$-th power of this distribution, for $k$ in a range $[m]$, is the distribution of $P_k = \sum_{i=1}^n X_i^{(k)}$, where each Bernoulli random variable $X_i^{(k)}$ has $\mathbb{E}[X_i^{(k)}] = (p_i)^k$. The learning algorithm can query any power $P_k$ several times and succeeds in learning all powers in the range if, with probability at least $1-\delta$, for any $k \in [m]$ it returns a probability distribution $Q_k$ with total variation distance from $P_k$ at most $\epsilon$. We provide almost matching lower and upper bounds on the query complexity for this problem. We first show a lower bound on the query complexity for instances of PBD powers with many distinct, well-separated parameters $p_i$, and we almost match this lower bound by examining the query complexity of simultaneously learning all the powers of a special class of PBDs resembling the PBDs of our lower bound. We study the fundamental setting of a Binomial distribution and provide an optimal algorithm which uses $O(1/\epsilon^2)$ samples. Diakonikolas, Kane and Stewart [COLT’16] showed a lower bound of $\Omega(2^{1/\epsilon})$ samples to learn the $p_i$’s within error $\epsilon$. The question of whether sampling from powers of PBDs can reduce this sample complexity has a negative answer: we show that an exponential number of samples is inevitable. Given sampling access to the powers of a PBD, we then give a nearly optimal algorithm that learns its $p_i$’s. To prove our last two lower bounds, we extend the classical minimax risk definition from statistics to estimating functions of sequences of distributions.
Tasks
Published 2017-07-18
URL http://arxiv.org/abs/1707.05662v1
PDF http://arxiv.org/pdf/1707.05662v1.pdf
PWC https://paperswithcode.com/paper/learning-powers-of-poisson-binomial
Repo
Framework
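
The power construction in the abstract is easy to simulate when the parameters are known, which helps sanity-check a learner against the query model. A minimal numpy sketch, assuming direct access to the $p_i$ (which the learning algorithm of course does not have):

```python
import numpy as np

def sample_pbd_power(p, k, num_samples, seed=None):
    """Draw samples of P_k = sum_i X_i^(k), where X_i^(k) ~ Bernoulli(p_i^k)."""
    rng = np.random.default_rng(seed)
    pk = np.asarray(p) ** k                  # parameters of the k-th power
    return (rng.random((num_samples, len(pk))) < pk).sum(axis=1)

# Example: empirical total variation distance between P_1 and P_2.
p = np.array([0.9, 0.8, 0.5, 0.3])
s1 = sample_pbd_power(p, 1, 100_000, seed=0)
s2 = sample_pbd_power(p, 2, 100_000, seed=1)
h1 = np.bincount(s1, minlength=len(p) + 1) / len(s1)
h2 = np.bincount(s2, minlength=len(p) + 1) / len(s2)
print("empirical TV(P_1, P_2):", 0.5 * np.abs(h1 - h2).sum())
```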

Putting a Face to the Voice: Fusing Audio and Visual Signals Across a Video to Determine Speakers

Title Putting a Face to the Voice: Fusing Audio and Visual Signals Across a Video to Determine Speakers
Authors Ken Hoover, Sourish Chaudhuri, Caroline Pantofaru, Malcolm Slaney, Ian Sturdy
Abstract In this paper, we present a system that associates faces with voices in a video by fusing information from the audio and visual signals. The thesis underlying our work is that an extremely simple approach to generating (weak) speech clusters can be combined with visual signals to effectively associate faces and voices by aggregating statistics across a video. This approach does not need any training data specific to this task and leverages the natural coherence of information in the audio and visual streams. It is particularly applicable to tracking speakers in videos on the web where a priori information about the environment (e.g., number of speakers, spatial signals for beamforming) is not available. We performed experiments on a real-world dataset using this analysis framework to determine the speaker in a video. Given a ground truth labeling determined by human rater consensus, our approach had ~71% accuracy.
Tasks
Published 2017-05-31
URL http://arxiv.org/abs/1706.00079v1
PDF http://arxiv.org/pdf/1706.00079v1.pdf
PWC https://paperswithcode.com/paper/putting-a-face-to-the-voice-fusing-audio-and
Repo
Framework
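
The aggregation idea in the abstract above, counting across the whole video which faces are visible while each weak speech cluster is active, can be illustrated with a toy vote-counting sketch. The segment structure and names below are hypothetical, not the authors' pipeline:

```python
from collections import defaultdict

def associate_faces_to_voices(segments):
    """segments: (speech_cluster_id, ids_of_faces_on_screen) pairs, one per
    time slice with active speech. Returns speech_cluster_id -> face_id."""
    votes = defaultdict(lambda: defaultdict(int))
    for cluster, faces in segments:
        for face in faces:
            votes[cluster][face] += 1     # co-occurrence evidence
    # Aggregate over the whole video: for each weak speech cluster, pick the
    # face most often visible while that cluster is active.
    return {c: max(fv, key=fv.get) for c, fv in votes.items()}

segments = [("spk0", ["faceA", "faceB"]), ("spk0", ["faceA"]),
            ("spk1", ["faceB"]), ("spk0", ["faceA", "faceB"])]
print(associate_faces_to_voices(segments))  # {'spk0': 'faceA', 'spk1': 'faceB'}
```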

Deep Learning for Video Game Playing

Title Deep Learning for Video Game Playing
Authors Niels Justesen, Philip Bontrager, Julian Togelius, Sebastian Risi
Abstract In this article, we review recent Deep Learning advances in the context of how they have been applied to play different types of video games such as first-person shooters, arcade games, and real-time strategy games. We analyze the unique requirements that different game genres pose to a deep learning system and highlight important open challenges in the context of applying these machine learning methods to video games, such as general game playing, dealing with extremely large decision spaces and sparse rewards.
Tasks Real-Time Strategy Games
Published 2017-08-25
URL http://arxiv.org/abs/1708.07902v3
PDF http://arxiv.org/pdf/1708.07902v3.pdf
PWC https://paperswithcode.com/paper/deep-learning-for-video-game-playing
Repo
Framework

Limiting the Reconstruction Capability of Generative Neural Network using Negative Learning

Title Limiting the Reconstruction Capability of Generative Neural Network using Negative Learning
Authors Asim Munawar, Phongtharin Vinayavekhin, Giovanni De Magistris
Abstract Generative models are widely used for unsupervised learning, with applications including data compression and signal restoration. Training methods for such systems focus on the generality of the network given a limited amount of training data. A less researched class of techniques concerns the generation of only a single type of input, which is useful for applications such as constraint handling, noise reduction and anomaly detection. In this paper we present a technique to limit the generative capability of the network using negative learning. The proposed method searches for the solution in the gradient direction for the desired input and in the opposite direction for the undesired input. One application is anomaly detection, where the undesired inputs are the anomalous data. In the results section we demonstrate the features of the algorithm using the MNIST handwritten digit dataset and later apply the technique to a real-world obstacle detection problem. The results clearly show that the proposed learning technique can significantly improve the performance for anomaly detection.
Tasks Anomaly Detection
Published 2017-08-16
URL http://arxiv.org/abs/1708.08985v1
PDF http://arxiv.org/pdf/1708.08985v1.pdf
PWC https://paperswithcode.com/paper/limiting-the-reconstruction-capability-of
Repo
Framework
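
The negative-learning rule (descend the reconstruction loss for desired inputs, ascend it for undesired ones) can be sketched in a few lines of PyTorch. The autoencoder, loss and data below are generic placeholders, not the paper's architecture:

```python
import torch

def negative_learning_step(model, optimizer, x_pos, x_neg):
    """One update: descend the reconstruction loss for desired inputs and
    ascend it for undesired (e.g. anomalous) inputs."""
    optimizer.zero_grad()
    loss_pos = torch.nn.functional.mse_loss(model(x_pos), x_pos)
    loss_neg = torch.nn.functional.mse_loss(model(x_neg), x_neg)
    # Gradient direction for the desired input, opposite for the undesired.
    # (In practice the negative term would typically be weighted or clipped.)
    (loss_pos - loss_neg).backward()
    optimizer.step()

model = torch.nn.Sequential(torch.nn.Linear(784, 32), torch.nn.ReLU(),
                            torch.nn.Linear(32, 784))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
negative_learning_step(model, opt, torch.rand(64, 784), torch.rand(64, 784))
```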

Deep Episodic Value Iteration for Model-based Meta-Reinforcement Learning

Title Deep Episodic Value Iteration for Model-based Meta-Reinforcement Learning
Authors Steven Stenberg Hansen
Abstract We present a new deep meta-reinforcement learner, which we call Deep Episodic Value Iteration (DEVI). DEVI uses a deep neural network to learn a similarity metric for a non-parametric model-based reinforcement learning algorithm. Our model is trained end-to-end via back-propagation. Despite being trained using the model-free Q-learning objective, we show that DEVI’s model-based internal structure provides ‘one-shot’ transfer to changes in reward and transition structure, even for tasks with very high-dimensional state spaces.
Tasks Q-Learning
Published 2017-05-09
URL http://arxiv.org/abs/1705.03562v1
PDF http://arxiv.org/pdf/1705.03562v1.pdf
PWC https://paperswithcode.com/paper/deep-episodic-value-iteration-for-model-based
Repo
Framework
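
DEVI's non-parametric core, value iteration over an episodic memory of transitions weighted by a learned similarity metric, can be sketched with kernel-weighted Bellman backups. The embedding below is a stub standing in for the deep similarity network, and the action dimension is dropped for brevity:

```python
import numpy as np

def episodic_value_iteration(memory, embed, gamma=0.9, iters=100):
    """memory: list of (state, reward, next_state) transitions. The value of
    each transition's successor is approximated by a similarity-weighted
    average over transitions that start nearby in embedding space."""
    z_s  = np.stack([embed(s) for s, _, _ in memory])
    z_sn = np.stack([embed(sn) for _, _, sn in memory])
    r = np.array([rew for _, rew, _ in memory], dtype=float)
    d2 = ((z_sn[:, None, :] - z_s[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2)
    K /= K.sum(axis=1, keepdims=True)     # row-stochastic similarity kernel
    v = np.zeros(len(memory))
    for _ in range(iters):
        v = r + gamma * K @ v             # kernel-weighted Bellman backup
    return v

embed = lambda s: np.asarray(s, dtype=float)   # stand-in for the deep metric
memory = [([0.0], 0.0, [1.0]), ([1.0], 0.0, [2.0]), ([2.0], 1.0, [2.0])]
print(np.round(episodic_value_iteration(memory, embed), 3))
```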

Sketched Ridge Regression: Optimization Perspective, Statistical Perspective, and Model Averaging

Title Sketched Ridge Regression: Optimization Perspective, Statistical Perspective, and Model Averaging
Authors Shusen Wang, Alex Gittens, Michael W. Mahoney
Abstract We address the statistical and optimization impacts of the classical sketch and Hessian sketch used to approximately solve the Matrix Ridge Regression (MRR) problem. Prior research has quantified the effects of classical sketch on the strictly simpler least squares regression (LSR) problem. We establish that classical sketch has a similar effect upon the optimization properties of MRR as it does on those of LSR: namely, it recovers nearly optimal solutions. By contrast, Hessian sketch does not have this guarantee; instead, the approximation error is governed by a subtle interplay between the “mass” in the responses and the optimal objective value. For both types of approximation, the regularization in the sketched MRR problem results in significantly different statistical properties from those of the sketched LSR problem. In particular, there is a bias-variance trade-off in sketched MRR that is not present in sketched LSR. We provide upper and lower bounds on the bias and variance of sketched MRR; these bounds show that classical sketch significantly increases the variance, while Hessian sketch significantly increases the bias. Empirically, sketched MRR solutions can have risks that are an order of magnitude higher than those of the optimal MRR solutions. We establish theoretically and empirically that model averaging greatly decreases the gap between the risks of the true and sketched solutions to the MRR problem. Thus, in parallel or distributed settings, sketching combined with model averaging is a powerful technique that quickly obtains near-optimal solutions to the MRR problem while greatly mitigating the increased statistical risk incurred by sketching.
Tasks
Published 2017-02-16
URL http://arxiv.org/abs/1702.04837v4
PDF http://arxiv.org/pdf/1702.04837v4.pdf
PWC https://paperswithcode.com/paper/sketched-ridge-regression-optimization
Repo
Framework
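
The two sketches differ in where the sketching matrix $S$ enters: the classical sketch approximates both the data and the responses, while the Hessian sketch approximates only the quadratic term and keeps $A^\top b$ exact; model averaging then combines solutions from independent sketches. A toy numpy rendition with a Gaussian sketch and a single response column (MRR stacks several):

```python
import numpy as np

def sketched_ridge(A, b, gamma, m, mode="classical", seed=None):
    """One sketched ridge solution using a Gaussian sketch S of size m x n."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    S = rng.standard_normal((m, n)) / np.sqrt(m)
    SA = S @ A
    H = SA.T @ SA + gamma * np.eye(d)                       # sketched Hessian
    g = SA.T @ (S @ b) if mode == "classical" else A.T @ b  # Hessian sketch keeps A'b exact
    return np.linalg.solve(H, g)

def averaged_sketched_ridge(A, b, gamma, m, reps=10, mode="classical"):
    # Model averaging: mean of solutions from independent sketches.
    return np.mean([sketched_ridge(A, b, gamma, m, mode, seed=i)
                    for i in range(reps)], axis=0)

rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 20))
b = A @ rng.standard_normal(20) + 0.1 * rng.standard_normal(2000)
x_exact = np.linalg.solve(A.T @ A + np.eye(20), A.T @ b)
x_avg = averaged_sketched_ridge(A, b, gamma=1.0, m=200)
print("relative error vs exact:",
      np.linalg.norm(x_avg - x_exact) / np.linalg.norm(x_exact))
```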

Automatic Identification of AltLexes using Monolingual Parallel Corpora

Title Automatic Identification of AltLexes using Monolingual Parallel Corpora
Authors Elnaz Davoodi, Leila Kosseim
Abstract The automatic identification of discourse relations is still a challenging task in natural language processing. Discourse connectives, such as “since” or “but”, are the most informative cues for identifying explicit relations; however, discourse parsers typically use a closed inventory of such connectives. As a result, discourse relations signaled by markers outside these inventories (i.e., AltLexes) are not detected as effectively. In this paper, we propose a novel method that leverages parallel corpora in text simplification and lexical resources to automatically identify alternative lexicalizations that signal discourse relations. When applied to the Simple Wikipedia and Newsela corpora along with WordNet and the PPDB, the method allowed the automatic discovery of 91 AltLexes.
Tasks Text Simplification
Published 2017-08-11
URL http://arxiv.org/abs/1708.03541v1
PDF http://arxiv.org/pdf/1708.03541v1.pdf
PWC https://paperswithcode.com/paper/automatic-identification-of-altlexes-using
Repo
Framework
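
The substitution intuition, that a simple-side sentence using a known connective where its aligned complex-side counterpart does not suggests the complex side signals the relation through an alternative lexicalization, can be shown with a deliberately crude sketch (no WordNet/PPDB filtering; the connective list and data are hypothetical):

```python
KNOWN_CONNECTIVES = {"because", "since", "but", "however", "although"}

def candidate_altlexes(aligned_pairs):
    """aligned_pairs: (complex_sentence, simple_sentence) conveying the same
    content. If the simple side uses a known connective and the complex side
    does not, the complex side likely signals the relation with an AltLex."""
    candidates = set()
    for complex_s, simple_s in aligned_pairs:
        simple_tokens = set(simple_s.lower().rstrip(".").split())
        complex_tokens = complex_s.lower().rstrip(".").split()
        if simple_tokens & KNOWN_CONNECTIVES and \
                not set(complex_tokens) & KNOWN_CONNECTIVES:
            candidates.add(" ".join(complex_tokens[:4]))  # crude: leading phrase
    return candidates

pairs = [("As a result of the storm, schools closed.",
          "Schools closed because of the storm.")]
print(candidate_altlexes(pairs))  # {'as a result of'}
```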

Training L1-Regularized Models with Orthant-Wise Passive Descent Algorithms

Title Training L1-Regularized Models with Orthant-Wise Passive Descent Algorithms
Authors Jianqiao Wangni
Abstract The $L_1$-regularized models are widely used for sparse regression or classification tasks. In this paper, we propose the orthant-wise passive descent algorithm (OPDA) for optimizing $L_1$-regularized models, as an improved substitute for proximal algorithms, which are currently the standard tools for optimizing such models. OPDA uses a stochastic variance-reduced gradient (SVRG) to initialize the descent direction, then applies a novel alignment operator that encourages each element to keep the same sign after one update iteration, so that the parameter remains in the same orthant as before. It also explicitly suppresses the magnitude of each element to impose sparsity. A quasi-Newton update can be utilized to incorporate curvature information and accelerate convergence. We prove a linear convergence rate for OPDA on general smooth and strongly-convex loss functions. By conducting experiments on $L_1$-regularized logistic regression and convolutional neural networks, we show that OPDA outperforms state-of-the-art stochastic proximal algorithms, implying a wide range of applications in training sparse models.
Tasks
Published 2017-04-26
URL http://arxiv.org/abs/1704.07987v3
PDF http://arxiv.org/pdf/1704.07987v3.pdf
PWC https://paperswithcode.com/paper/training-l1-regularized-models-with-orthant
Repo
Framework
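
The orthant-wise mechanics, an OWL-QN-style pseudo-gradient for the $L_1$ term plus an alignment operator that zeroes any coordinate whose sign flips, can be sketched as follows. A plain full gradient stands in for the SVRG direction, and the quasi-Newton scaling is omitted:

```python
import numpy as np

def pseudo_gradient(x, g, lam):
    """Pseudo-gradient of f(x) + lam * ||x||_1 (OWL-QN style)."""
    pg = np.where(x > 0, g + lam, np.where(x < 0, g - lam, 0.0))
    at_zero = (x == 0)
    pg = np.where(at_zero & (g + lam < 0), g + lam, pg)   # can move positive
    pg = np.where(at_zero & (g - lam > 0), g - lam, pg)   # can move negative
    return pg

def opda_like_step(x, g, lam, lr):
    pg = pseudo_gradient(x, g, lam)
    orthant = np.where(x != 0, np.sign(x), -np.sign(pg))  # chosen orthant
    x_new = x - lr * pg
    # Alignment operator: zero any coordinate whose sign flipped, keeping the
    # iterate orthant-wise and explicitly imposing sparsity.
    return np.where(np.sign(x_new) == orthant, x_new, 0.0)

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20)); b = A[:, :3] @ np.ones(3)
x = np.zeros(20)
for _ in range(500):
    x = opda_like_step(x, A.T @ (A @ x - b) / len(b), lam=0.1, lr=0.1)
print("non-zero coordinates:", np.flatnonzero(x))
```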

Transductive Zero-Shot Learning with Adaptive Structural Embedding

Title Transductive Zero-Shot Learning with Adaptive Structural Embedding
Authors Yunlong Yu, Zhong Ji, Jichang Guo, Yanwei Pang
Abstract Zero-shot learning (ZSL) endows a computer vision system with the inferential capability to recognize instances of a new category that it has never seen before. Two fundamental challenges are visual-semantic embedding and domain adaptation, arising in the cross-modality learning and unseen-class prediction steps, respectively. To address both challenges, this paper presents two corresponding methods named Adaptive STructural Embedding (ASTE) and Self-PAced Selective Strategy (SPASS). Specifically, ASTE formulates the visual-semantic interactions in a latent structural SVM framework and adaptively adjusts the slack variables to embody the differing reliability among training instances: reliable instances receive small punishments, whereas less reliable instances receive more severe punishments, ensuring a more discriminative embedding. SPASS, on the other hand, offers a framework to alleviate the domain shift problem in ZSL by exploiting the unseen data in an easy-to-hard fashion. In particular, SPASS borrows the idea of self-paced learning, iteratively selecting unseen instances from reliable to less reliable to gradually adapt knowledge from the seen domain to the unseen domain. By combining SPASS and ASTE, we then present a self-paced Transductive ASTE (TASTE) method that progressively reinforces the classification capacity. Extensive experiments on three benchmark datasets (i.e., AwA, CUB, and aPY) demonstrate the superiority of ASTE and TASTE. Furthermore, we also propose a fast training (FT) strategy to improve the efficiency of most existing ZSL methods. The FT strategy is surprisingly simple yet general, and can speed up the training of most existing methods by a factor of 4 to 300 while maintaining their original performance.
Tasks Domain Adaptation, Zero-Shot Learning
Published 2017-03-27
URL http://arxiv.org/abs/1703.08897v1
PDF http://arxiv.org/pdf/1703.08897v1.pdf
PWC https://paperswithcode.com/paper/transductive-zero-shot-learning-with-adaptive
Repo
Framework
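
The self-paced selective strategy can be illustrated separately from the structural-SVM embedding: pseudo-label the unseen data, admit only the most confident instances, retrain, and widen the admitted set each round. This sketch collapses the ZSL specifics into ordinary self-training with a stand-in classifier:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_paced_transduction(X_seen, y_seen, X_unseen, rounds=5):
    """Easy-to-hard adaptation: each round, pseudo-label the unseen data and
    admit only the most confident instances into the training set."""
    clf = LogisticRegression(max_iter=1000).fit(X_seen, y_seen)
    for r in range(1, rounds + 1):
        proba = clf.predict_proba(X_unseen)
        conf = proba.max(axis=1)                      # reliability proxy
        k = int(len(X_unseen) * r / rounds)           # admit more each round
        idx = np.argsort(-conf)[:k]                   # most reliable first
        y_pseudo = clf.classes_[proba.argmax(axis=1)]
        clf = LogisticRegression(max_iter=1000).fit(
            np.vstack([X_seen, X_unseen[idx]]),
            np.concatenate([y_seen, y_pseudo[idx]]))
    return clf

rng = np.random.default_rng(0)
X_seen = rng.standard_normal((100, 5)); y_seen = (X_seen[:, 0] > 0).astype(int)
clf = self_paced_transduction(X_seen, y_seen, rng.standard_normal((200, 5)))
```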

Optimal Schemes for Discrete Distribution Estimation under Locally Differential Privacy

Title Optimal Schemes for Discrete Distribution Estimation under Locally Differential Privacy
Authors Min Ye, Alexander Barg
Abstract We consider the minimax estimation problem of a discrete distribution with support size $k$ under privacy constraints. A privatization scheme is applied to each raw sample independently, and we need to estimate the distribution of the raw samples from the privatized samples. A positive number $\epsilon$ measures the privacy level of a privatization scheme. For a given $\epsilon,$ we consider the problem of constructing optimal privatization schemes with $\epsilon$-privacy level, i.e., schemes that minimize the expected estimation loss for the worst-case distribution. Two schemes in the literature provide order optimal performance in the high privacy regime where $\epsilon$ is very close to $0,$ and in the low privacy regime where $e^{\epsilon}\approx k,$ respectively. In this paper, we propose a new family of schemes which substantially improve the performance of the existing schemes in the medium privacy regime where $1\ll e^{\epsilon} \ll k.$ More concretely, we prove that when $3.8 < \epsilon < \ln(k/9),$ our schemes reduce the expected estimation loss by 50% under the $\ell_2^2$ metric and by 30% under the $\ell_1$ metric over the existing schemes. We also prove a lower bound for the regime $e^{\epsilon} \ll k,$ which implies that our schemes are order optimal in this regime.
Tasks
Published 2017-02-02
URL http://arxiv.org/abs/1702.00610v1
PDF http://arxiv.org/pdf/1702.00610v1.pdf
PWC https://paperswithcode.com/paper/optimal-schemes-for-discrete-distribution
Repo
Framework
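
For context, one standard baseline in this line of work (order optimal in the low privacy regime) is $k$-ary randomized response: keep the true symbol with probability $e^\epsilon/(e^\epsilon+k-1)$, otherwise report a uniformly random other symbol, then debias the empirical frequencies. A sketch of that baseline, not of the paper's new schemes:

```python
import numpy as np

def k_rr_privatize(samples, k, eps, seed=None):
    """k-ary randomized response: an eps-locally-differentially-private channel."""
    rng = np.random.default_rng(seed)
    p_true = np.exp(eps) / (np.exp(eps) + k - 1)
    keep = rng.random(len(samples)) < p_true
    # Uniform over the other k-1 symbols via a random nonzero offset mod k.
    noise = (samples + rng.integers(1, k, len(samples))) % k
    return np.where(keep, samples, noise)

def k_rr_estimate(private, k, eps):
    """Unbiased estimate of the raw-sample distribution from privatized samples."""
    p_true = np.exp(eps) / (np.exp(eps) + k - 1)
    p_other = 1.0 / (np.exp(eps) + k - 1)
    emp = np.bincount(private, minlength=k) / len(private)
    return (emp - p_other) / (p_true - p_other)

rng = np.random.default_rng(0)
raw = rng.choice(4, size=200_000, p=[0.5, 0.3, 0.15, 0.05])
priv = k_rr_privatize(raw, k=4, eps=1.0)
print(np.round(k_rr_estimate(priv, k=4, eps=1.0), 3))
```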

Efficient Use of Limited-Memory Accelerators for Linear Learning on Heterogeneous Systems

Title Efficient Use of Limited-Memory Accelerators for Linear Learning on Heterogeneous Systems
Authors Celestine Dünner, Thomas Parnell, Martin Jaggi
Abstract We propose a generic algorithmic building block to accelerate the training of machine learning models on heterogeneous compute systems. Our scheme allows compute accelerators such as GPUs and FPGAs to be employed efficiently for the training of large-scale machine learning models when the training data exceeds their memory capacity. It also adapts to any system’s memory hierarchy in terms of size and processing speed. Our technique is built upon novel theoretical insights regarding primal-dual coordinate methods, and uses duality gap information to dynamically decide which part of the data should be made available for fast processing. To illustrate the power of our approach, we demonstrate its performance for training generalized linear models on a large-scale dataset exceeding the memory size of a modern GPU, showing an order-of-magnitude speedup over existing approaches.
Tasks
Published 2017-08-17
URL http://arxiv.org/abs/1708.05357v2
PDF http://arxiv.org/pdf/1708.05357v2.pdf
PWC https://paperswithcode.com/paper/efficient-use-of-limited-memory-accelerators
Repo
Framework
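
The duality-gap heuristic, staging on the accelerator the training examples that currently contribute most to the gap, can be illustrated on ridge regression, where the per-sample gap of the SDCA-style dual formulation has a closed form. This is a toy rendition of the selection rule, not the authors' implementation:

```python
import numpy as np

def per_sample_gaps(X, y, alpha, lam):
    """Per-sample duality-gap contributions for ridge regression in its
    SDCA-style dual: gap_i = 0.5 * (x_i.w - y_i + alpha_i)^2,
    where w = X' alpha / (lam * n)."""
    w = X.T @ alpha / (lam * len(y))
    return 0.5 * (X @ w - y + alpha) ** 2

def stage_on_accelerator(X, y, alpha, lam, budget):
    """Keep on the fast device the samples contributing most to the gap."""
    gaps = per_sample_gaps(X, y, alpha, lam)
    return np.argsort(-gaps)[:budget]

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 10)); y = X @ rng.standard_normal(10)
alpha = np.zeros(1000)                     # current dual iterate (cold start)
hot = stage_on_accelerator(X, y, alpha, lam=0.1, budget=128)
print("first few staged samples:", hot[:8])
```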

Toward Abstraction from Multi-modal Data: Empirical Studies on Multiple Time-scale Recurrent Models

Title Toward Abstraction from Multi-modal Data: Empirical Studies on Multiple Time-scale Recurrent Models
Authors Junpei Zhong, Angelo Cangelosi, Tetsuya Ogata
Abstract Abstraction tasks are challenging for multi-modal sequences as they require a deeper semantic understanding and novel text generation for the data. Although recurrent neural networks (RNNs) can be used to model the context of time-sequences, in most cases the long-term dependencies of multi-modal data make gradients vanish during back-propagation-through-time training of the RNN. Recently, inspired by the Multiple Time-scale Recurrent Neural Network (MTRNN), an extension of the Gated Recurrent Unit (GRU), called the Multiple Time-scale Gated Recurrent Unit (MTGRU), has been proposed to learn long-term dependencies in natural language processing. In particular, it is also able to accomplish the abstraction task for paragraphs, given that the time constants are well defined. In this paper, we compare the MTRNN and MTGRU in terms of their learning performance as well as their abstraction representations at higher levels (with slower neural activation). This was done by conducting two studies based on a smaller dataset (two-dimensional time sequences from non-linear functions) and a relatively large dataset (43-dimensional time sequences from iCub manipulation tasks with multi-modal data). We conclude that gated recurrent mechanisms may be necessary for learning long-term dependencies in large-dimension multi-modal datasets (e.g. learning of robot manipulation), even when natural language commands are not involved. But for smaller learning tasks with simple time sequences, generic recurrent models such as the MTRNN were sufficient to accomplish the abstraction task.
Tasks Text Generation
Published 2017-02-07
URL http://arxiv.org/abs/1702.05441v1
PDF http://arxiv.org/pdf/1702.05441v1.pdf
PWC https://paperswithcode.com/paper/toward-abstraction-from-multi-modal-data
Repo
Framework
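
The multiple-time-scale mechanism amounts to leaky integration of a recurrent update with a per-layer time constant $\tau$; larger $\tau$ gives slower, more abstract dynamics. A plausible PyTorch sketch of an MTGRU-style cell (the exact published formulation may differ):

```python
import torch

class MTGRUCell(torch.nn.Module):
    """GRU cell with time constant tau:
    h_t = (1 - 1/tau) * h_{t-1} + (1/tau) * gru(x_t, h_{t-1}).
    tau = 1 recovers a standard GRU; tau >> 1 yields slow, abstract dynamics."""
    def __init__(self, input_size, hidden_size, tau):
        super().__init__()
        self.cell = torch.nn.GRUCell(input_size, hidden_size)
        self.tau = float(tau)

    def forward(self, x, h):
        return (1 - 1 / self.tau) * h + (1 / self.tau) * self.cell(x, h)

# Two stacked cells on different time scales: fast perception, slow abstraction.
fast, slow = MTGRUCell(8, 32, tau=2.0), MTGRUCell(32, 16, tau=16.0)
h_f, h_s = torch.zeros(1, 32), torch.zeros(1, 16)
for t in range(100):
    h_f = fast(torch.randn(1, 8), h_f)
    h_s = slow(h_f, h_s)
```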

ShortFuse: Biomedical Time Series Representations in the Presence of Structured Information

Title ShortFuse: Biomedical Time Series Representations in the Presence of Structured Information
Authors Madalina Fiterau, Suvrat Bhooshan, Jason Fries, Charles Bournhonesque, Jennifer Hicks, Eni Halilaj, Christopher Ré, Scott Delp
Abstract In healthcare applications, temporal variables that encode movement, health status and longitudinal patient evolution are often accompanied by rich structured information such as demographics, diagnostics and medical exam data. However, current methods do not jointly optimize over structured covariates and time series in the feature extraction process. We present ShortFuse, a method that boosts the accuracy of deep learning models for time series by explicitly modeling temporal interactions and dependencies with structured covariates. ShortFuse introduces hybrid convolutional and LSTM cells that incorporate the covariates via weights that are shared across the temporal domain. ShortFuse outperforms competing models by 3% on two biomedical applications, forecasting osteoarthritis-related cartilage degeneration and predicting surgical outcomes for cerebral palsy patients, matching or exceeding the accuracy of models that use features engineered by domain experts.
Tasks Time Series
Published 2017-05-13
URL http://arxiv.org/abs/1705.04790v2
PDF http://arxiv.org/pdf/1705.04790v2.pdf
PWC https://paperswithcode.com/paper/shortfuse-biomedical-time-series
Repo
Framework
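
One way to read the hybrid cells, where every time step sees the raw time series plus a projection of the static covariates whose weights are shared across the temporal domain, is the following PyTorch sketch; it is an illustrative reading, not the published architecture:

```python
import torch

class CovariateLSTM(torch.nn.Module):
    """LSTM whose every time step also sees a projection of static covariates;
    the projection weights are shared across the temporal domain."""
    def __init__(self, ts_dim, cov_dim, hidden):
        super().__init__()
        self.cov_proj = torch.nn.Linear(cov_dim, hidden)   # shared over time
        self.lstm = torch.nn.LSTM(ts_dim + hidden, hidden, batch_first=True)
        self.head = torch.nn.Linear(hidden, 1)

    def forward(self, x, cov):
        # x: (batch, time, ts_dim); cov: (batch, cov_dim)
        c = self.cov_proj(cov).unsqueeze(1).expand(-1, x.size(1), -1)
        out, _ = self.lstm(torch.cat([x, c], dim=-1))
        return self.head(out[:, -1])                       # e.g. outcome score

model = CovariateLSTM(ts_dim=6, cov_dim=10, hidden=32)
pred = model(torch.randn(4, 50, 6), torch.randn(4, 10))
print(pred.shape)  # torch.Size([4, 1])
```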

Learning Predictive Leading Indicators for Forecasting Time Series Systems with Unknown Clusters of Forecast Tasks

Title Learning Predictive Leading Indicators for Forecasting Time Series Systems with Unknown Clusters of Forecast Tasks
Authors Magda Gregorova, Alexandros Kalousis, Stephane Marchand-Maillet
Abstract We present a new method for forecasting systems of multiple interrelated time series. The method learns the forecast models together with discovering, from within the system, leading indicators that serve as good predictors improving the forecast accuracy, along with a cluster structure of the predictive tasks around these indicators. The method is based on the classical linear vector autoregressive model (VAR) and links the discovery of the leading indicators to inferring sparse graphs of Granger causality. We formulate a new constrained optimisation problem to promote the desired sparse structures across the models and the sharing of information amongst the learning tasks in a multi-task manner. We propose an algorithm for solving the problem and demonstrate, on a battery of synthetic and real-data experiments, the advantages of our new method over baseline VAR models as well as state-of-the-art sparse VAR learning methods.
Tasks Time Series
Published 2017-10-02
URL http://arxiv.org/abs/1710.00569v1
PDF http://arxiv.org/pdf/1710.00569v1.pdf
PWC https://paperswithcode.com/paper/learning-predictive-leading-indicators-for
Repo
Framework
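
A simplified view of the sparsity structure: in a VAR(1) model $X_t = A X_{t-1} + \varepsilon_t$, having a few leading indicators means only a few columns of $A$ are non-zero, which a group (multi-task) lasso can recover. A toy sketch using scikit-learn's MultiTaskLasso in place of the paper's constrained optimisation:

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso

# Simulate a VAR(1) system in which only series 0 and 1 lead everything else.
rng = np.random.default_rng(0)
n_series, T = 6, 500
A = np.zeros((n_series, n_series))
A[:, 0], A[:, 1] = 0.5, -0.3               # columns of the leading indicators
X = np.zeros((T, n_series))
for t in range(1, T):
    X[t] = X[t - 1] @ A.T + rng.standard_normal(n_series)

# Multi-task (group) lasso: the coefficients of each lagged series are zeroed
# jointly across all forecast tasks, a Granger-style selection of indicators.
model = MultiTaskLasso(alpha=0.1).fit(X[:-1], X[1:])
norms = np.linalg.norm(model.coef_, axis=0)  # one group norm per lagged series
print("group norms (series 0 and 1 should dominate):", np.round(norms, 2))
```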

Colour Constancy: Biologically-inspired Contrast Variant Pooling Mechanism

Title Colour Constancy: Biologically-inspired Contrast Variant Pooling Mechanism
Authors Arash Akbarinia, Raquel Gil Rodríguez, C. Alejandro Parraga
Abstract Pooling is a ubiquitous operation in image processing algorithms that allows higher-level processes to collect relevant low-level features from a region of interest. Currently, max-pooling is one of the most commonly used operators in the computational literature. However, it can lack robustness to outliers because it relies merely on the peak of a function. Pooling mechanisms are also present in the primate visual cortex, where neurons of higher cortical areas pool signals from lower ones. The receptive fields of these neurons have been shown to vary with contrast, aggregating signals over a larger region in the presence of low-contrast stimuli. We hypothesise that this contrast-variant-pooling mechanism can address some of the shortcomings of max-pooling. We modelled this contrast variation through histogram clipping, in which the percentage of pooled signal is inversely proportional to the local contrast of an image. We tested our hypothesis by applying it to the phenomenon of colour constancy, where a number of popular algorithms utilise a max-pooling step (e.g. White-Patch, Grey-Edge and Double-Opponency). For each of these methods, we investigated the consequences of replacing the original max-pooling with the proposed contrast-variant-pooling. Our experiments on three colour constancy benchmark datasets suggest that previous results can be significantly improved by adopting a contrast-variant-pooling mechanism.
Tasks
Published 2017-11-29
URL http://arxiv.org/abs/1711.10968v1
PDF http://arxiv.org/pdf/1711.10968v1.pdf
PWC https://paperswithcode.com/paper/colour-constancy-biologically-inspired
Repo
Framework
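
The contrast-variant pooling rule itself is compact: instead of the single maximum, average the top $p\%$ of values in a region, with $p$ inversely tied to the region's local contrast (the histogram-clipping step). The contrast measure and the contrast-to-$p$ mapping below are illustrative choices, not the paper's calibration:

```python
import numpy as np

def contrast_variant_pool(region, p_min=1.0, p_max=20.0):
    """Average the top-p% of values in the region, where p shrinks as local
    contrast grows (high contrast behaves more like max-pooling)."""
    v = np.ravel(region).astype(float)
    contrast = v.std() / (v.mean() + 1e-8)        # illustrative contrast measure
    p = np.clip(p_max * (1.0 - contrast), p_min, p_max)
    k = max(1, int(round(v.size * p / 100.0)))
    return np.sort(v)[-k:].mean()                 # histogram-clipped pooling

rng = np.random.default_rng(0)
flat = 0.5 + 0.01 * rng.standard_normal((8, 8))   # low contrast: wide pooling
edgy = rng.random((8, 8))                         # high contrast: narrow pooling
print(contrast_variant_pool(flat), contrast_variant_pool(edgy))
```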