October 17, 2019

Paper Group ANR 819

Learning a Saliency Evaluation Metric Using Crowdsourced Perceptual Judgments. Video-based Person Re-identification via 3D Convolutional Networks and Non-local Attention. The challenge of realistic music generation: modelling raw audio at scale. Subspace Estimation from Incomplete Observations: A High-Dimensional Analysis. Learning non-Gaussian Tim …

Learning a Saliency Evaluation Metric Using Crowdsourced Perceptual Judgments


Title	Learning a Saliency Evaluation Metric Using Crowdsourced Perceptual Judgments
Authors	Changqun Xia, Jia Li, Jinming Su, Ali Borji
Abstract	In the area of human fixation prediction, dozens of computational saliency models are proposed to reveal certain saliency characteristics under different assumptions and definitions. As a result, saliency model benchmarking often requires several evaluation metrics to simultaneously assess saliency models from multiple perspectives. However, most computational metrics are not designed to directly measure the perceptual similarity of saliency maps, so the evaluation results may sometimes be inconsistent with the subjective impression. To address this problem, this paper first conducts extensive subjective tests to find out how the visual similarities between saliency maps are perceived by humans. Based on the crowdsourced data collected in these tests, we identify several key factors in assessing saliency maps and quantify the performance of existing metrics. Inspired by these factors, we propose to learn a saliency evaluation metric based on a two-stream convolutional neural network using crowdsourced perceptual judgements. Specifically, the relative saliency score of each pair from the crowdsourced data is utilized to regularize the network during the training process. By capturing the key factors shared by various subjects in comparing saliency maps, the learned metric better aligns with human perception of saliency maps, making it a good complement to the existing metrics. Experimental results validate that the learned metric can be generalized to the comparisons of saliency maps from new images, new datasets, new models and synthetic data. Due to the effectiveness of the learned metric, it can also be used to facilitate the development of new models for fixation prediction.
Tasks
Published	2018-06-27
URL	http://arxiv.org/abs/1806.10257v1
PDF	http://arxiv.org/pdf/1806.10257v1.pdf
PWC	https://paperswithcode.com/paper/learning-a-saliency-evaluation-metric-using
Repo
Framework

Video-based Person Re-identification via 3D Convolutional Networks and Non-local Attention


Title	Video-based Person Re-identification via 3D Convolutional Networks and Non-local Attention
Authors	Xingyu Liao, Lingxiao He, Zhouwang Yang, Chi Zhang
Abstract	Video-based person re-identification (ReID) is a challenging problem, where some video tracks of people across non-overlapping cameras are available for matching. Feature aggregation from a video track is a key step for video-based person ReID. Many existing methods tackle this problem by average/maximum temporal pooling or RNNs with attention. However, these methods cannot deal with temporal dependency and spatial misalignment problems at the same time. We are inspired by video action recognition that involves the identification of different actions from video tracks. Firstly, we use 3D convolutions on video volume, instead of using 2D convolutions across frames, to extract spatial and temporal features simultaneously. Secondly, we use a non-local block to tackle the misalignment problem and capture spatial-temporal long-range dependencies. As a result, the network can learn useful spatial-temporal information as a weighted sum of the features in all space and temporal positions in the input feature map. Experimental results on three datasets show that our framework outperforms state-of-the-art approaches by a large margin on multiple metrics.
Tasks	Person Re-Identification, Temporal Action Localization, Video-Based Person Re-Identification
Published	2018-07-12
URL	http://arxiv.org/abs/1807.05073v3
PDF	http://arxiv.org/pdf/1807.05073v3.pdf
PWC	https://paperswithcode.com/paper/video-based-person-re-identification-via-3d
Repo
Framework
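
The abstract above describes the non-local block as producing a weighted sum of the features at all spatial and temporal positions. A minimal sketch of that idea follows, under loud assumptions: it uses plain dot-product affinities with a softmax, omits the learned projections that a real non-local block would have, and treats the video feature map as an already-flattened list of position vectors. The function name `non_local` is hypothetical and is not taken from the paper.

```python
import math

def non_local(features):
    """Simplified non-local operation (assumption: no learned projections).

    features: list of N position vectors, each a list of C floats, where the
    N positions stand for all flattened space-time locations.
    Returns N vectors: each input plus a softmax-weighted sum over ALL positions.
    """
    out = []
    for xi in features:
        # Affinity of this position with every position (dot product).
        scores = [sum(a * b for a, b in zip(xi, xj)) for xj in features]
        # Numerically stable softmax over all positions.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted sum of the features at all positions.
        agg = [sum(w * xj[c] for w, xj in zip(weights, features))
               for c in range(len(xi))]
        # Residual connection: add the aggregate back to the input.
        out.append([a + b for a, b in zip(xi, agg)])
    return out
```

Because every output position attends to every input position, a feature that is spatially misaligned between frames can still contribute to the aggregate, which is the intuition the abstract gives for using the block.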

The challenge of realistic music generation: modelling raw audio at scale


Title	The challenge of realistic music generation: modelling raw audio at scale
Authors	Sander Dieleman, Aäron van den Oord, Karen Simonyan
Abstract	Realistic music generation is a challenging task. When building generative models of music that are learnt from data, typically high-level representations such as scores or MIDI are used that abstract away the idiosyncrasies of a particular performance. But these nuances are very important for our perception of musicality and realism, so in this work we embark on modelling music in the raw audio domain. It has been shown that autoregressive models excel at generating raw audio waveforms of speech, but when applied to music, we find them biased towards capturing local signal structure at the expense of modelling long-range correlations. This is problematic because music exhibits structure at many different timescales. In this work, we explore autoregressive discrete autoencoders (ADAs) as a means to enable autoregressive models to capture long-range correlations in waveforms. We find that they allow us to unconditionally generate piano music directly in the raw audio domain, which shows stylistic consistency across tens of seconds.
Tasks	Music Generation
Published	2018-06-26
URL	http://arxiv.org/abs/1806.10474v1
PDF	http://arxiv.org/pdf/1806.10474v1.pdf
PWC	https://paperswithcode.com/paper/the-challenge-of-realistic-music-generation
Repo
Framework

Subspace Estimation from Incomplete Observations: A High-Dimensional Analysis


Title	Subspace Estimation from Incomplete Observations: A High-Dimensional Analysis
Authors	Chuang Wang, Yonina C. Eldar, Yue M. Lu
Abstract	We present a high-dimensional analysis of three popular algorithms, namely, Oja’s method, GROUSE and PETRELS, for subspace estimation from streaming and highly incomplete observations. We show that, with proper time scaling, the time-varying principal angles between the true subspace and its estimates given by the algorithms converge weakly to deterministic processes when the ambient dimension $n$ tends to infinity. Moreover, the limiting processes can be exactly characterized as the unique solutions of certain ordinary differential equations (ODEs). A finite sample bound is also given, showing that the rate of convergence towards such limits is $\mathcal{O}(1/\sqrt{n})$. In addition to providing asymptotically exact predictions of the dynamic performance of the algorithms, our high-dimensional analysis yields several insights, including an asymptotic equivalence between Oja’s method and GROUSE, and a precise scaling relationship linking the amount of missing data to the signal-to-noise ratio. By analyzing the solutions of the limiting ODEs, we also establish phase transition phenomena associated with the steady-state performance of these techniques.
Tasks
Published	2018-05-17
URL	http://arxiv.org/abs/1805.06834v3
PDF	http://arxiv.org/pdf/1805.06834v3.pdf
PWC	https://paperswithcode.com/paper/subspace-estimation-from-incomplete
Repo
Framework

Learning non-Gaussian Time Series using the Box-Cox Gaussian Process


Title	Learning non-Gaussian Time Series using the Box-Cox Gaussian Process
Authors	Gonzalo Rios, Felipe Tobar
Abstract	Gaussian processes (GPs) are Bayesian nonparametric generative models that provide interpretability of hyperparameters, admit closed-form expressions for training and inference, and are able to accurately represent uncertainty. To model general non-Gaussian data with complex correlation structure, GPs can be paired with an expressive covariance kernel and then fed into a nonlinear transformation (or warping). However, overparametrising the kernel and the warping is known to, respectively, hinder gradient-based training and make the predictions computationally expensive. We remedy this issue by (i) training the model using derivative-free global-optimisation techniques so as to find meaningful maxima of the model likelihood, and (ii) proposing a warping function based on the celebrated Box-Cox transformation that requires minimal numerical approximations—unlike existing warped GP models. We validate the proposed approach by first showing that predictions can be computed analytically, and then on a learning, reconstruction and forecasting experiment using real-world datasets.
Tasks	Gaussian Processes, Time Series
Published	2018-03-19
URL	http://arxiv.org/abs/1803.07102v1
PDF	http://arxiv.org/pdf/1803.07102v1.pdf
PWC	https://paperswithcode.com/paper/learning-non-gaussian-time-series-using-the
Repo
Framework
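
For readers unfamiliar with the transformation named above, the following sketch shows the classical Box-Cox transform and its inverse for positive inputs. It is only the textbook transform, not the warped GP model from the paper; the function names `box_cox` and `inv_box_cox` are hypothetical.

```python
import math

def box_cox(x, lam):
    """Classical Box-Cox transform of a positive value x with parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox is defined for positive inputs only")
    if lam == 0:
        return math.log(x)          # limiting case as lam -> 0
    return (x ** lam - 1.0) / lam

def inv_box_cox(y, lam):
    """Inverse of box_cox; requires lam * y + 1 > 0 when lam != 0."""
    if lam == 0:
        return math.exp(y)
    base = lam * y + 1.0
    if base <= 0:
        raise ValueError("value is outside the range of the transform")
    return base ** (1.0 / lam)
```

Both directions are available in closed form, which is consistent with the abstract's point that a Box-Cox-based warping needs few numerical approximations.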

Training Augmentation with Adversarial Examples for Robust Speech Recognition


Title	Training Augmentation with Adversarial Examples for Robust Speech Recognition
Authors	Sining Sun, Ching-Feng Yeh, Mari Ostendorf, Mei-Yuh Hwang, Lei Xie
Abstract	This paper explores the use of adversarial examples in training speech recognition systems to increase robustness of deep neural network acoustic models. During training, the fast gradient sign method is used to generate adversarial examples augmenting the original training data. Different from conventional data augmentation based on data transformations, the examples are dynamically generated based on current acoustic model parameters. We assess the impact of adversarial data augmentation in experiments on the Aurora-4 and CHiME-4 single-channel tasks, showing improved robustness against noise and channel variation. Further improvement is obtained when combining adversarial examples with teacher/student training, leading to a 23% relative word error rate reduction on Aurora-4.
Tasks	Data Augmentation, Robust Speech Recognition, Speech Recognition
Published	2018-06-07
URL	http://arxiv.org/abs/1806.02782v2
PDF	http://arxiv.org/pdf/1806.02782v2.pdf
PWC	https://paperswithcode.com/paper/training-augmentation-with-adversarial
Repo
Framework
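
The fast gradient sign method mentioned above perturbs an input by a small step in the direction of the sign of the loss gradient with respect to that input. The sketch below illustrates this on a logistic-regression classifier, which is an assumption made to keep the example self-contained; the paper applies the method to deep neural network acoustic models. The names `fgsm`, `loss`, and `eps` are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    """Binary cross-entropy of a logistic-regression model on one example."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def fgsm(w, b, x, y, eps):
    """Return x + eps * sign(d loss / d x) for the CURRENT parameters (w, b)."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    # For logistic regression the input gradient is (p - y) * w.
    grad = [(p - y) * wi for wi in w]
    sign = [(g > 0) - (g < 0) for g in grad]
    return [xi + eps * s for xi, s in zip(x, sign)]
```

Since the perturbation depends on `w` and `b`, regenerating the adversarial examples after each parameter update gives the dynamic augmentation the abstract contrasts with fixed data transformations.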

Parallel Convolutional Networks for Image Recognition via a Discriminator


Title	Parallel Convolutional Networks for Image Recognition via a Discriminator
Authors	Shiqi Yang, Gang Peng
Abstract	In this paper, we introduce a simple but quite effective recognition framework dubbed D-PCN, aiming at enhancing the feature-extraction ability of CNNs. The framework consists of two parallel CNNs, a discriminator, and an extra classifier that takes integrated features from the parallel networks and gives the final prediction. The discriminator is the core component: it drives the parallel networks to focus on different regions and learn different representations. A corresponding training strategy is introduced to ensure that the discriminator is utilized. We validate D-PCN with several CNN models on the benchmark datasets CIFAR-100 and ImageNet; D-PCN enhances all models. In particular, it yields state-of-the-art performance on CIFAR-100 compared with related works. We also conduct a visualization experiment on the fine-grained Stanford Dogs dataset to verify our motivation. Additionally, we apply D-PCN to segmentation on PASCAL VOC 2012 and also find an improvement.
Tasks
Published	2018-07-06
URL	http://arxiv.org/abs/1807.02265v3
PDF	http://arxiv.org/pdf/1807.02265v3.pdf
PWC	https://paperswithcode.com/paper/parallel-convolutional-networks-for-image
Repo
Framework

Character-Aware Decoder for Translation into Morphologically Rich Languages


Title	Character-Aware Decoder for Translation into Morphologically Rich Languages
Authors	Adithya Renduchintala, Pamela Shapiro, Kevin Duh, Philipp Koehn
Abstract	Neural machine translation (NMT) systems operate primarily on words (or sub-words), ignoring lower-level patterns of morphology. We present a character-aware decoder designed to capture such patterns when translating into morphologically rich languages. We achieve character-awareness by augmenting both the softmax and embedding layers of an attention-based encoder-decoder model with convolutional neural networks that operate on the spelling of a word. To investigate performance on a wide variety of morphological phenomena, we translate English into 14 typologically diverse target languages using the TED multi-target dataset. In this low-resource setting, the character-aware decoder provides consistent improvements with BLEU score gains of up to $+3.05$. In addition, we analyze the relationship between the gains obtained and properties of the target language and find evidence that our model does indeed exploit morphological patterns.
Tasks	Machine Translation
Published	2018-09-06
URL	https://arxiv.org/abs/1809.02223v5
PDF	https://arxiv.org/pdf/1809.02223v5.pdf
PWC	https://paperswithcode.com/paper/character-aware-decoder-for-neural-machine
Repo
Framework

Classification of Epileptic EEG Signals by Wavelet based CFC


Title	Classification of Epileptic EEG Signals by Wavelet based CFC
Authors	Amirmasoud Ahmadi, Mahsa Behroozi, Vahid Shalchyan, Mohammad Reza Daliri
Abstract	The electroencephalogram (EEG), an influential tool for analyzing human activities and recognizing seizure attacks, can play a crucial role in designing accurate systems that distinguish ictal seizures from regular brain alertness, which is the first step towards a high-accuracy computer-aided diagnosis (CAD) system. In this article, a novel approach for classifying ictal signals with wavelet-based cross-frequency coupling (CFC) is suggested. After features are extracted by wavelet-based CFC, optimal features are selected by t-test, and quadratic discriminant analysis (QDA) completes the classification.
Tasks	EEG
Published	2018-05-04
URL	http://arxiv.org/abs/1805.01743v1
PDF	http://arxiv.org/pdf/1805.01743v1.pdf
PWC	https://paperswithcode.com/paper/classification-of-epileptic-eeg-signals-by
Repo
Framework

Learning ReLU Networks via Alternating Minimization


Title	Learning ReLU Networks via Alternating Minimization
Authors	Gauri Jagatap, Chinmay Hegde
Abstract	We propose and analyze a new family of algorithms for training neural networks with ReLU activations. Our algorithms are based on the technique of alternating minimization: estimating the activation patterns of each ReLU for all given samples, interleaved with weight updates via a least-squares step. The main focus of our paper is 1-hidden-layer networks with $k$ hidden neurons and ReLU activation. We show that under standard distributional assumptions on the $d$-dimensional input data, our algorithm provably recovers the true ‘ground truth’ parameters in a linearly convergent fashion. This holds as long as the weights are sufficiently well initialized; furthermore, our method requires only $n=\widetilde{O}(dk^2)$ samples. We also analyze the special case of 1-hidden-layer networks with skipped connections, commonly used in ResNet-type architectures, and propose a novel initialization strategy for the same. For ReLU-based ResNet-type networks, we provide the first linear convergence guarantee with an end-to-end algorithm. We also extend this framework to deeper networks and empirically demonstrate its convergence to a global minimum.
Tasks
Published	2018-06-20
URL	http://arxiv.org/abs/1806.07863v2
PDF	http://arxiv.org/pdf/1806.07863v2.pdf
PWC	https://paperswithcode.com/paper/learning-relu-networks-via-alternating
Repo
Framework
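
The alternation described in the abstract above (estimate which samples activate the ReLU, then update the weights by least squares) can be sketched for the simplest possible case. The code below assumes a single ReLU neuron with two-dimensional inputs and noiseless labels, so the least-squares step reduces to a 2x2 linear solve; the paper treats $k$ hidden neurons and general $d$. The function name `altmin_single_relu` is hypothetical.

```python
def relu(z):
    return z if z > 0 else 0.0

def altmin_single_relu(xs, ys, w, iters=5):
    """Alternating minimization for y = relu(w . x) with 2-D inputs.

    xs: list of (x1, x2) pairs, ys: list of targets, w: initial weights.
    """
    for _ in range(iters):
        # Step 1: estimate the activation pattern under the current weights.
        active = [1.0 if w[0] * x[0] + w[1] * x[1] > 0 else 0.0 for x in xs]
        # Step 2: least squares restricted to samples estimated as active,
        # solved through the 2x2 normal equations A w = b.
        a11 = sum(p * x[0] * x[0] for p, x in zip(active, xs))
        a12 = sum(p * x[0] * x[1] for p, x in zip(active, xs))
        a22 = sum(p * x[1] * x[1] for p, x in zip(active, xs))
        b1 = sum(p * y * x[0] for p, x, y in zip(active, xs, ys))
        b2 = sum(p * y * x[1] for p, x, y in zip(active, xs, ys))
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-12:
            break  # active samples do not span the plane; keep current w
        w = [(a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det]
    return w
```

Once the estimated pattern agrees with the true one on every sample with a nonzero label, the least-squares step recovers the generating weights exactly in this noiseless setting, which gives some intuition for why a good initialization matters.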


Sparse Gaussian Process Temporal Difference Learning for Marine Robot Navigation


Title	Sparse Gaussian Process Temporal Difference Learning for Marine Robot Navigation
Authors	John Martin, Jinkun Wang, Brendan Englot
Abstract	We present a method for Temporal Difference (TD) learning that addresses several challenges faced by robots learning to navigate in a marine environment. For improved data efficiency, our method reduces TD updates to Gaussian Process regression. To make predictions amenable to online settings, we introduce a sparse approximation with improved quality over current rejection-based sparse methods. We derive the predictive value function posterior and use the moments to obtain a new algorithm for model-free policy evaluation, SPGP-SARSA. With simple changes, we show SPGP-SARSA can be reduced to a model-based equivalent, SPGP-TD. We perform comprehensive simulation studies and also conduct physical learning trials with an underwater robot. Our results show SPGP-SARSA can outperform the state-of-the-art sparse method, replicate the prediction quality of its exact counterpart, and be applied to solve underwater navigation tasks.
Tasks	Marine Robot Navigation, Robot Navigation
Published	2018-10-02
URL	http://arxiv.org/abs/1810.01217v1
PDF	http://arxiv.org/pdf/1810.01217v1.pdf
PWC	https://paperswithcode.com/paper/sparse-gaussian-process-temporal-difference
Repo
Framework

Stable Prediction across Unknown Environments


Title	Stable Prediction across Unknown Environments
Authors	Kun Kuang, Ruoxuan Xiong, Peng Cui, Susan Athey, Bo Li
Abstract	In many important machine learning applications, the training distribution used to learn a probabilistic classifier differs from the testing distribution on which the classifier will be used to make predictions. Traditional methods correct the distribution shift by reweighting the training data with the ratio of the density between test and training data. In many applications, however, training takes place without prior knowledge of the testing distribution on which the algorithm will be applied in the future. Recently, methods have been proposed to address the shift by learning causal structure, but those methods rely on the diversity of multiple training data to achieve good performance, and have complexity limitations in high dimensions. In this paper, we propose a novel Deep Global Balancing Regression (DGBR) algorithm to jointly optimize a deep auto-encoder model for feature selection and a global balancing model for stable prediction across unknown environments. The global balancing model constructs balancing weights that facilitate the estimation of partial effects of features (holding fixed all other features), a problem that is challenging in high dimensions, and thus helps to identify stable, causal relationships between features and outcomes. The deep auto-encoder model is designed to reduce the dimensionality of the feature space, thus making global balancing easier. We show, both theoretically and with empirical experiments, that our algorithm can make stable predictions across unknown environments. Our experiments on both synthetic and real-world datasets demonstrate that our DGBR algorithm outperforms the state-of-the-art methods for stable prediction across unknown environments.
Tasks	Feature Selection
Published	2018-06-16
URL	http://arxiv.org/abs/1806.06270v2
PDF	http://arxiv.org/pdf/1806.06270v2.pdf
PWC	https://paperswithcode.com/paper/stable-prediction-across-unknown-environments
Repo
Framework

Predicting with Proxies: Transfer Learning in High Dimension


Title	Predicting with Proxies: Transfer Learning in High Dimension
Authors	Hamsa Bastani
Abstract	Predictive analytics is increasingly used to guide decision-making in many applications. However, in practice, we often have limited data on the true predictive task of interest, and must instead rely on more abundant data on a closely-related proxy predictive task. For example, e-commerce platforms use abundant customer click data (proxy) to make product recommendations rather than the relatively sparse customer purchase data (true outcome of interest); alternatively, hospitals often rely on medical risk scores trained on a different patient population (proxy) rather than their own patient population (true cohort of interest) to assign interventions. Yet, not accounting for the bias in the proxy can lead to sub-optimal decisions. Using real datasets, we find that this bias can often be captured by a sparse function of the features. Thus, we propose a novel two-step estimator that uses techniques from high-dimensional statistics to efficiently combine a large amount of proxy data and a small amount of true data. We prove upper bounds on the error of our proposed estimator and lower bounds on several heuristics used by data scientists; in particular, our proposed estimator can achieve the same accuracy with exponentially less true data (in the number of features). Our proof relies on a new LASSO tail inequality for approximately sparse vectors. Finally, we demonstrate the effectiveness of our approach on e-commerce and healthcare datasets; in both cases, we achieve significantly better predictive accuracy as well as managerial insights into the nature of the bias in the proxy data.
Tasks	Decision Making, Transfer Learning
Published	2018-12-28
URL	https://arxiv.org/abs/1812.11097v2
PDF	https://arxiv.org/pdf/1812.11097v2.pdf
PWC	https://paperswithcode.com/paper/predicting-with-proxies
Repo
Framework

A Dynamic Neural Network Approach to Generating Robot’s Novel Actions: A Simulation Experiment


Title	A Dynamic Neural Network Approach to Generating Robot’s Novel Actions: A Simulation Experiment
Authors	Jungsik Hwang, Jun Tani
Abstract	In this study, we investigate how a robot can generate novel and creative actions from its own experience of learning basic actions. Inspired by a machine learning approach to computational creativity, we propose a dynamic neural network model that can learn and generate a robot’s actions. We conducted a set of simulation experiments with a humanoid robot. The results showed that the proposed model was able to learn the basic actions and also to generate novel actions by modulating and combining those learned actions. Analysis of the neural activities showed that the ability to generate creative actions emerged from the model’s nonlinear memory structure, which self-organized during training. The results also showed that different ways of learning the basic actions induced the self-organization of memory structures with different characteristics, resulting in the generation of different levels of creative actions. Our approach can be utilized in human-robot interaction in which a user can interactively explore the robot’s memory to control its behavior and also discover other novel actions.
Tasks
Published	2018-05-15
URL	http://arxiv.org/abs/1805.05537v1
PDF	http://arxiv.org/pdf/1805.05537v1.pdf
PWC	https://paperswithcode.com/paper/a-dynamic-neural-network-approach-to
Repo
Framework

Robust Website Fingerprinting Through the Cache Occupancy Channel


Title	Robust Website Fingerprinting Through the Cache Occupancy Channel
Authors	Anatoly Shusterman, Lachlan Kang, Yarden Haskal, Yosef Meltser, Prateek Mittal, Yossi Oren, Yuval Yarom
Abstract	Website fingerprinting attacks, which use statistical analysis on network traffic to compromise user privacy, have been shown to be effective even if the traffic is sent over anonymity-preserving networks such as Tor. The classical attack model used to evaluate website fingerprinting attacks assumes an on-path adversary, who can observe all traffic traveling between the user’s computer and the Tor network. In this work we investigate these attacks under a different attack model, in which the adversary is capable of running a small amount of unprivileged code on the target user’s computer. Under this model, the attacker can mount cache side-channel attacks, which exploit the effects of contention on the CPU’s cache, to identify the website being browsed. In an important special case of this attack model, a JavaScript attack is launched when the target user visits a website controlled by the attacker. The effectiveness of this attack scenario has never been systematically analyzed, especially in the open-world model which assumes that the user is visiting a mix of both sensitive and non-sensitive sites. In this work we show that cache website fingerprinting attacks in JavaScript are highly feasible, even when they are run from highly restrictive environments, such as the Tor Browser. Specifically, we use machine learning techniques to classify traces of cache activity. Unlike prior works, which try to identify cache conflicts, our work measures the overall occupancy of the last-level cache. We show that our approach achieves high classification accuracy in both the open-world and the closed-world models. We further show that our techniques are resilient both to network-based defenses and to side-channel countermeasures introduced to modern browsers as a response to the Spectre attack.
Tasks
Published	2018-11-17
URL	http://arxiv.org/abs/1811.07153v3
PDF	http://arxiv.org/pdf/1811.07153v3.pdf
PWC	https://paperswithcode.com/paper/robust-website-fingerprinting-through-the
Repo
Framework