Paper Group ANR 929
Ray Interference: a Source of Plateaus in Deep Reinforcement Learning. Incorporating End-to-End Speech Recognition Models for Sentiment Analysis. Adaptive Activation Network and Functional Regularization for Efficient and Flexible Deep Multi-Task Learning. Deep Image Deraining Via Intrinsic Rainy Image Priors and Multi-scale Auxiliary Decoding. Sha …
Ray Interference: a Source of Plateaus in Deep Reinforcement Learning
Title | Ray Interference: a Source of Plateaus in Deep Reinforcement Learning |
Authors | Tom Schaul, Diana Borsa, Joseph Modayil, Razvan Pascanu |
Abstract | Rather than proposing a new method, this paper investigates an issue present in existing learning algorithms. We study the learning dynamics of reinforcement learning (RL), specifically a characteristic coupling between learning and data generation that arises because RL agents control their future data distribution. In the presence of function approximation, this coupling can lead to a problematic type of ‘ray interference’, characterized by learning dynamics that sequentially traverse a number of performance plateaus, effectively constraining the agent to learn one thing at a time even when learning in parallel is better. We establish the conditions under which ray interference occurs, show its relation to saddle points and obtain the exact learning dynamics in a restricted setting. We characterize a number of its properties and discuss possible remedies. |
Tasks | |
Published | 2019-04-25 |
URL | http://arxiv.org/abs/1904.11455v1 |
http://arxiv.org/pdf/1904.11455v1.pdf | |
PWC | https://paperswithcode.com/paper/ray-interference-a-source-of-plateaus-in-deep |
Repo | |
Framework | |
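A minimal sketch of the plateau effect the abstract describes, under the assumption that each objective component improves at a rate proportional to its own current value (a logistic-style stand-in for the coupling between performance and on-policy data generation, not the paper's exact derivation):

```python
# Toy simulation: two objective components share learning dynamics in which
# each component's improvement rate scales with its current value, mimicking
# the performance/data-generation coupling described in the abstract.
import numpy as np

def simulate(j0, lr=0.5, steps=400):
    j = np.array(j0, dtype=float)
    history = []
    for _ in range(steps):
        j += lr * j * (1.0 - j)   # assumed logistic-style coupled dynamics
        history.append(j.copy())
    return np.array(history)

hist = simulate([0.3, 1e-3])      # one component starts far behind
total = hist.sum(axis=1)
# The summed performance rises, plateaus while the lagging component is still
# near zero, then rises again -- the staircase pattern attributed to ray
# interference.
for t in range(0, 400, 50):
    print(f"step {t:3d}  J1={hist[t, 0]:.3f}  J2={hist[t, 1]:.3f}  total={total[t]:.3f}")
```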
Incorporating End-to-End Speech Recognition Models for Sentiment Analysis
Title | Incorporating End-to-End Speech Recognition Models for Sentiment Analysis |
Authors | Egor Lakomkin, Mohammad Ali Zamani, Cornelius Weber, Sven Magg, Stefan Wermter |
Abstract | Previous work on emotion recognition demonstrated a synergistic effect of combining several modalities such as auditory, visual, and transcribed text to estimate the affective state of a speaker. Among these, the linguistic modality is crucial for the evaluation of an expressed emotion. However, manually transcribed spoken text cannot practically be given as input to a system. We argue that using ground-truth transcriptions during training and evaluation phases leads to a significant discrepancy in performance compared to real-world conditions, as the spoken text has to be recognized on the fly and can contain speech recognition mistakes. In this paper, we propose a method of integrating an automatic speech recognition (ASR) output with a character-level recurrent neural network for sentiment recognition. In addition, we conduct several experiments investigating sentiment recognition for human-robot interaction in a noise-realistic scenario which is challenging for ASR systems. We quantify the improvement compared to using only the acoustic modality in sentiment recognition. We demonstrate the effectiveness of this approach on the Multimodal Corpus of Sentiment Intensity (MOSI) by achieving 73.6% accuracy in a binary sentiment classification task, exceeding previously reported results that use only acoustic input. In addition, we set a new state-of-the-art performance on the MOSI dataset (80.4% accuracy, 2% absolute improvement). |
Tasks | Emotion Recognition, End-To-End Speech Recognition, Sentiment Analysis, Speech Recognition |
Published | 2019-02-28 |
URL | http://arxiv.org/abs/1902.11245v1 |
http://arxiv.org/pdf/1902.11245v1.pdf | |
PWC | https://paperswithcode.com/paper/incorporating-end-to-end-speech-recognition |
Repo | |
Framework | |
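A minimal character-level GRU sentiment classifier over (possibly noisy) ASR transcripts, sketching the kind of model the abstract describes; the layer sizes, vocabulary encoding, and input text are assumptions, not the authors' architecture:

```python
import torch
import torch.nn as nn

class CharSentimentRNN(nn.Module):
    def __init__(self, vocab_size=128, embed_dim=32, hidden_dim=64, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_classes)

    def forward(self, char_ids):            # char_ids: (batch, seq_len) int64
        x = self.embed(char_ids)
        _, h = self.rnn(x)                  # h: (1, batch, hidden_dim)
        return self.head(h.squeeze(0))      # (batch, n_classes) logits

def encode(text, max_len=200):
    ids = [min(ord(c), 127) for c in text[:max_len]]   # crude ASCII encoding
    return torch.tensor(ids, dtype=torch.long).unsqueeze(0)

model = CharSentimentRNN()
logits = model(encode("i really enjoyed the movie"))   # a hypothesised ASR output
print(logits.softmax(dim=-1))
```

Feeding the model ASR hypotheses rather than ground-truth transcripts at both training and test time is what closes the train/deploy mismatch the abstract argues about.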
Adaptive Activation Network and Functional Regularization for Efficient and Flexible Deep Multi-Task Learning
Title | Adaptive Activation Network and Functional Regularization for Efficient and Flexible Deep Multi-Task Learning |
Authors | Yingru Liu, Xuewen Yang, Dongliang Xie, Xin Wang, Li Shen, Haozhi Huang, Niranjan Balasubramanian |
Abstract | Multi-task learning (MTL) is a common paradigm that seeks to improve the generalization performance of task learning by training related tasks simultaneously. However, it remains challenging to search for a flexible and accurate architecture that can be shared among multiple tasks. In this paper, we propose a novel deep learning model called Task Adaptive Activation Network (TAAN) that can automatically learn the optimal network architecture for MTL. The main principle of TAAN is to derive flexible activation functions for different tasks from the data, with the other parameters of the network fully shared. We further propose two functional regularization methods that improve the MTL performance of TAAN. The improved performance of both TAAN and the regularization methods is demonstrated by comprehensive experiments. |
Tasks | Multi-Task Learning |
Published | 2019-11-19 |
URL | https://arxiv.org/abs/1911.08065v1 |
https://arxiv.org/pdf/1911.08065v1.pdf | |
PWC | https://paperswithcode.com/paper/adaptive-activation-network-and-functional |
Repo | |
Framework | |
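A sketch of the core idea of task-adaptive activations: the non-linearity applied to a fully shared layer is a per-task combination of fixed basis activations. The basis set, the simple linear mixing, and the layer shapes are illustrative assumptions and may differ from TAAN's exact parameterization:

```python
import torch
import torch.nn as nn

class TaskAdaptiveActivation(nn.Module):
    def __init__(self, n_tasks):
        super().__init__()
        self.bases = [torch.relu, torch.tanh, torch.sigmoid, lambda x: x]
        # one coefficient vector per task, learned jointly with the shared weights
        self.coeff = nn.Parameter(torch.ones(n_tasks, len(self.bases)) / len(self.bases))

    def forward(self, x, task_id):
        w = self.coeff[task_id]
        return sum(w[i] * f(x) for i, f in enumerate(self.bases))

class SharedMTLNet(nn.Module):
    def __init__(self, in_dim, hidden, n_tasks):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)         # fully shared parameters
        self.act = TaskAdaptiveActivation(n_tasks)   # task-specific behaviour lives here
        self.heads = nn.ModuleList(nn.Linear(hidden, 1) for _ in range(n_tasks))

    def forward(self, x, task_id):
        return self.heads[task_id](self.act(self.fc1(x), task_id))

net = SharedMTLNet(in_dim=8, hidden=16, n_tasks=3)
print(net(torch.randn(4, 8), task_id=1).shape)   # torch.Size([4, 1])
```

A functional regularizer in the paper's spirit could then penalise how far the per-task coefficient vectors drift from one another.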
Deep Image Deraining Via Intrinsic Rainy Image Priors and Multi-scale Auxiliary Decoding
Title | Deep Image Deraining Via Intrinsic Rainy Image Priors and Multi-scale Auxiliary Decoding |
Authors | Yinglong Wang, Chao Ma, Bing Zeng |
Abstract | Different rain models and novel network structures have been proposed to remove rain streaks from single rainy images. In this work, we bring attention to the intrinsic priors and multi-scale features of rainy images, and develop several intrinsic loss functions to train a CNN deraining network. We first study the sparse priors of rainy images, which have been verified to preserve unbroken edges in image decomposition. However, since its mathematical formulation usually leads to an intractable solution, we propose quasi-sparsity priors to decrease complexity, so that our network can be trained under the supervision of sparse properties of rainy images. Quasi-sparsity supervises network training in a different gradient domain, which is still ill-posed for decomposing a rainy image into a rain layer and a background layer. We therefore develop another $L_1$ loss based on the intrinsic low-value property of the rain layer to restore image contents, together with the commonly-used $L_1$ similarity loss. Multi-scale features are further explored via a multi-scale auxiliary decoding structure to show which kinds of features contribute the most to the deraining task, and the corresponding multi-scale auxiliary loss improves the deraining performance further. In our network, more efficient group convolution and feature sharing are utilized to obtain a one-order-of-magnitude improvement in network running speed. The proposed deraining method performs favorably against state-of-the-art deraining approaches. |
Tasks | Rain Removal |
Published | 2019-11-25 |
URL | https://arxiv.org/abs/1911.10810v1 |
https://arxiv.org/pdf/1911.10810v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-image-deraining-via-intrinsic-rainy |
Repo | |
Framework | |
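A sketch of the loss composition the abstract names: an $L_1$ similarity loss on the recovered background, an $L_1$ "low-value" loss on the estimated rain layer, and auxiliary $L_1$ losses on multi-scale decoder outputs. The loss weights and the downsampling of the ground truth are assumptions, not the paper's values:

```python
import torch
import torch.nn.functional as F

def deraining_loss(pred_bg, rain_layer, gt_bg, aux_preds, w_low=0.1, w_aux=0.5):
    sim = F.l1_loss(pred_bg, gt_bg)            # L1 similarity on the background
    low_value = rain_layer.abs().mean()        # rain layer should be low-valued / sparse
    aux = 0.0
    for p in aux_preds:                        # auxiliary decoder outputs at coarser scales
        gt_s = F.interpolate(gt_bg, size=p.shape[-2:], mode="bilinear", align_corners=False)
        aux = aux + F.l1_loss(p, gt_s)
    return sim + w_low * low_value + w_aux * aux

B = torch.rand(2, 3, 64, 64)
loss = deraining_loss(B, torch.rand_like(B) * 0.1, B,
                      [torch.rand(2, 3, 32, 32), torch.rand(2, 3, 16, 16)])
print(float(loss))
```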
Shaping Belief States with Generative Environment Models for RL
Title | Shaping Belief States with Generative Environment Models for RL |
Authors | Karol Gregor, Danilo Jimenez Rezende, Frederic Besse, Yan Wu, Hamza Merzic, Aaron van den Oord |
Abstract | When agents interact with a complex environment, they must form and maintain beliefs about the relevant aspects of that environment. We propose a way to efficiently train expressive generative models in complex environments. We show that a predictive algorithm with an expressive generative model can form stable belief-states in visually rich and dynamic 3D environments. More precisely, we show that the learned representation captures the layout of the environment as well as the position and orientation of the agent. Our experiments show that the model substantially improves data-efficiency on a number of reinforcement learning (RL) tasks compared with strong model-free baseline agents. We find that predicting multiple steps into the future (overshooting), in combination with an expressive generative model, is critical for stable representations to emerge. In practice, using expressive generative models in RL is computationally expensive and we propose a scheme to reduce this computational burden, allowing us to build agents that are competitive with model-free baselines. |
Tasks | |
Published | 2019-06-21 |
URL | https://arxiv.org/abs/1906.09237v2 |
https://arxiv.org/pdf/1906.09237v2.pdf | |
PWC | https://paperswithcode.com/paper/shaping-belief-states-with-generative |
Repo | |
Framework | |
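A minimal sketch of the overshooting idea emphasised in the abstract: from the belief state at time t, roll an action-conditioned transition model forward several steps and reconstruct each future observation with a generative head. The module sizes, GRU cells, and squared-error reconstruction are placeholders, not the paper's architecture:

```python
import torch
import torch.nn as nn

class BeliefModel(nn.Module):
    def __init__(self, obs_dim=16, belief_dim=32, action_dim=4):
        super().__init__()
        self.belief_dim = belief_dim
        self.rnn = nn.GRUCell(obs_dim + action_dim, belief_dim)   # belief-state update
        self.transition = nn.GRUCell(action_dim, belief_dim)      # action-only rollout
        self.decoder = nn.Linear(belief_dim, obs_dim)              # generative head

    def overshoot_loss(self, obs, actions, K=3):
        """obs: (T, obs_dim), actions: (T, action_dim); predict up to K steps ahead."""
        T = obs.shape[0]
        belief = torch.zeros(1, self.belief_dim)
        loss = torch.tensor(0.0)
        for t in range(T):
            belief = self.rnn(torch.cat([obs[t:t + 1], actions[t:t + 1]], dim=-1), belief)
            rollout = belief
            for k in range(1, K + 1):          # overshooting: multi-step prediction
                if t + k >= T:
                    break
                rollout = self.transition(actions[t + k - 1:t + k], rollout)
                loss = loss + ((self.decoder(rollout) - obs[t + k:t + k + 1]) ** 2).mean()
        return loss

model = BeliefModel()
print(float(model.overshoot_loss(torch.randn(8, 16), torch.randn(8, 4))))
```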
Nemesyst: A Hybrid Parallelism Deep Learning-Based Framework Applied for Internet of Things Enabled Food Retailing Refrigeration Systems
Title | Nemesyst: A Hybrid Parallelism Deep Learning-Based Framework Applied for Internet of Things Enabled Food Retailing Refrigeration Systems |
Authors | George Onoufriou, Ronald Bickerton, Simon Pearson, Georgios Leontidis |
Abstract | Deep Learning has attracted considerable attention across multiple application domains, including computer vision, signal processing and natural language processing. Although quite a few single node deep learning frameworks exist, such as tensorflow, pytorch and keras, we still lack a complete processing structure that can accommodate large scale data processing, version control, and deployment, all while staying agnostic of any specific single node framework. To bridge this gap, this paper proposes a new, higher level framework, i.e. Nemesyst, which uses databases along with model sequentialisation to allow processes to be fed unique and transformed data at the point of need. This facilitates near real-time application and makes models available for further training or use at any node that has access to the database simultaneously. Nemesyst is well suited as an application framework for internet of things aggregated control systems, deploying deep learning techniques to optimise individual machines in massive networks. To demonstrate this framework, we adopted a case study in a novel domain: deploying deep learning to optimise the high speed control of electrical power consumed by a massive internet of things network of retail refrigeration systems in proportion to load available on the UK National Grid (a demand side response). The case study demonstrated for the first time in such a setting how deep learning models, such as Recurrent Neural Networks (vanilla and Long-Short-Term Memory) and Generative Adversarial Networks paired with Nemesyst, achieve compelling performance, whilst still being malleable to future adjustments as both the data and requirements inevitably change over time. |
Tasks | |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01600v2 |
https://arxiv.org/pdf/1906.01600v2.pdf | |
PWC | https://paperswithcode.com/paper/nemesyst-a-hybrid-parallelism-deep-learning |
Repo | |
Framework | |
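The abstract's central pattern is storing serialised models in a database so that any node with access can retrieve them for further training or inference. The sketch below illustrates that pattern only; it is not Nemesyst's API, and the MongoDB URI, database name, and pickle serialisation are all assumptions:

```python
import pickle
import gridfs
from pymongo import MongoClient

def save_model(model, name, mongo_uri="mongodb://localhost:27017", db_name="models"):
    db = MongoClient(mongo_uri)[db_name]
    fs = gridfs.GridFS(db)
    return fs.put(pickle.dumps(model), filename=name)   # returns the stored file id

def load_latest_model(name, mongo_uri="mongodb://localhost:27017", db_name="models"):
    db = MongoClient(mongo_uri)[db_name]
    fs = gridfs.GridFS(db)
    return pickle.loads(fs.get_last_version(filename=name).read())

# Any worker with database access could call load_latest_model("fridge_controller")
# and continue training on the data fed to it at the point of need.
```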
Detecting Behavioral Engagement of Students in the Wild Based on Contextual and Visual Data
Title | Detecting Behavioral Engagement of Students in the Wild Based on Contextual and Visual Data |
Authors | Eda Okur, Nese Alyuz, Sinem Aslan, Utku Genc, Cagri Tanriover, Asli Arslan Esme |
Abstract | To investigate the detection of students’ behavioral engagement (On-Task vs. Off-Task), we propose a two-phase approach in this study. In Phase 1, contextual logs (URLs) are utilized to assess active usage of the content platform. If there is active use, the appearance information is utilized in Phase 2 to infer behavioral engagement. Incorporating the contextual information improved the overall F1-scores from 0.77 to 0.82. Our cross-classroom and cross-platform experiments showed the proposed generic and multi-modal behavioral engagement models’ applicability to a different set of students or different subject areas. |
Tasks | |
Published | 2019-01-15 |
URL | http://arxiv.org/abs/1901.06291v1 |
http://arxiv.org/pdf/1901.06291v1.pdf | |
PWC | https://paperswithcode.com/paper/detecting-behavioral-engagement-of-students |
Repo | |
Framework | |
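A sketch of the two-phase decision flow described in the abstract: contextual URL logs first gate whether the student is actively using the content platform, and only then is an appearance-based model consulted. The platform domain and the appearance classifier are hypothetical placeholders:

```python
from urllib.parse import urlparse

PLATFORM_DOMAINS = {"content-platform.example.edu"}   # hypothetical platform domain

def is_active_usage(url_log):
    """Phase 1: treat the student as on-platform if any logged URL matches."""
    return any(urlparse(u).netloc in PLATFORM_DOMAINS for u in url_log)

def behavioral_engagement(url_log, appearance_features, appearance_model):
    if not is_active_usage(url_log):
        return "Off-Task"                     # contextual evidence alone decides
    # Phase 2: appearance-based model (e.g., trained on facial/posture features)
    return "On-Task" if appearance_model(appearance_features) > 0.5 else "Off-Task"

print(behavioral_engagement(
    ["https://content-platform.example.edu/lesson/3"],
    [0.2, 0.9],
    appearance_model=lambda feats: sum(feats) / len(feats),   # stand-in classifier
))
```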
$\ell_0$ Regularized Structured Sparsity Convolutional Neural Networks
Title | $\ell_0$ Regularized Structured Sparsity Convolutional Neural Networks |
Authors | Kevin Bui, Fredrick Park, Shuai Zhang, Yingyong Qi, Jack Xin |
Abstract | Deepening and widening convolutional neural networks (CNNs) significantly increases the number of trainable weight parameters by adding more convolutional layers and feature maps per layer, respectively. By imposing inter- and intra-group sparsity onto the weights of the layers during the training process, a compressed network can be obtained with accuracy comparable to a dense one. In this paper, we propose a new variant of sparse group lasso that blends the $\ell_0$ norm onto the individual weight parameters and the $\ell_{2,1}$ norm onto the output channels of a layer. To address the non-differentiability of the $\ell_0$ norm, we apply variable splitting resulting in an algorithm that consists of executing stochastic gradient descent followed by hard thresholding for each iteration. Numerical experiments are demonstrated on LeNet-5 and wide-residual-networks for MNIST and CIFAR 10/100, respectively. They showcase the effectiveness of our proposed method in attaining superior test accuracy with network sparsification on par with the current state of the art. |
Tasks | |
Published | 2019-12-17 |
URL | https://arxiv.org/abs/1912.07868v1 |
https://arxiv.org/pdf/1912.07868v1.pdf | |
PWC | https://paperswithcode.com/paper/ell_0-regularized-structured-sparsity |
Repo | |
Framework | |
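A sketch of the optimisation loop the abstract outlines: a standard stochastic gradient step followed by hard thresholding at every iteration. Thresholding the weights directly (rather than the split auxiliary variables) is a simplification of the paper's variable-splitting scheme, and the threshold value is an assumption:

```python
import torch
import torch.nn as nn

def hard_threshold_(model, tau=1e-2):
    with torch.no_grad():
        for p in model.parameters():
            p[p.abs() < tau] = 0.0            # promote l0-style sparsity

model = nn.Sequential(nn.Linear(20, 50), nn.ReLU(), nn.Linear(50, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(64, 20), torch.randint(0, 2, (64,))

for step in range(100):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    opt.step()
    hard_threshold_(model)                    # hard thresholding after each SGD step

zeros = sum(int((p == 0).sum()) for p in model.parameters())
total = sum(p.numel() for p in model.parameters())
print(f"sparsity: {zeros}/{total}")
```

A structured (group) variant would additionally zero entire output channels whose $\ell_{2,1}$ norms fall below a threshold, which is what yields hardware-friendly sparsity patterns.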
Leveraging Model Interpretability and Stability to increase Model Robustness
Title | Leveraging Model Interpretability and Stability to increase Model Robustness |
Authors | Fei Wu, Thomas Michel, Alexandre Briot |
Abstract | State-of-the-art Deep Neural Networks (DNN) can now achieve above human level accuracy on image classification tasks. However, their outstanding performance comes along with a complex inference mechanism that makes them arduous to interpret. In order to understand the underlying prediction rules of DNNs, Dhamdhere et al. propose an interpretability method to break down a DNN prediction score as the sum of its hidden unit contributions, in the form of a metric called conductance. Analyzing conductances of DNN hidden units, we find that there is a difference in how wrong and correct predictions are inferred. We identify distinguishable patterns of hidden unit activations for wrong and correct predictions. We then use an error detector in the form of a binary classifier on top of the DNN to automatically discriminate wrong and correct predictions of the DNN based on their hidden unit activations. Detected wrong predictions are discarded, increasing the model robustness. A different approach to distinguish wrong and correct predictions of DNNs is proposed by Wang et al., whose method is based on the premise that input samples leading a DNN into making wrong predictions are less stable to DNN weight changes than correctly classified input samples. In our study, we compare both methods and find that better detection of wrong predictions can be achieved by combining them. |
Tasks | Image Classification |
Published | 2019-10-01 |
URL | https://arxiv.org/abs/1910.00387v2 |
https://arxiv.org/pdf/1910.00387v2.pdf | |
PWC | https://paperswithcode.com/paper/leveraging-model-interpretability-and |
Repo | |
Framework | |
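A toy sketch of the error-detector idea: a binary classifier trained on a network's hidden-unit activations learns to separate inputs the base model classifies correctly from those it gets wrong, and predictions flagged as likely wrong are discarded. The base model, synthetic data, and raw activations (rather than conductances) are stand-ins for the paper's setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))
y = (X[:, :2].sum(axis=1) + 0.5 * rng.normal(size=2000) > 0).astype(int)

dnn = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300, random_state=0).fit(X[:1000], y[:1000])

def hidden_acts(model, X):
    # ReLU activations of the single hidden layer
    return np.maximum(X @ model.coefs_[0] + model.intercepts_[0], 0.0)

H_val = hidden_acts(dnn, X[1000:1500])
wrong = (dnn.predict(X[1000:1500]) != y[1000:1500]).astype(int)
detector = LogisticRegression(max_iter=1000).fit(H_val, wrong)   # error detector

H_test = hidden_acts(dnn, X[1500:])
keep = detector.predict(H_test) == 0          # discard predictions flagged as wrong
acc_all = (dnn.predict(X[1500:]) == y[1500:]).mean()
acc_kept = (dnn.predict(X[1500:])[keep] == y[1500:][keep]).mean()
print(f"accuracy on all: {acc_all:.3f}, on kept predictions: {acc_kept:.3f}")
```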
Dual-Reference Design for Holographic Coherent Diffraction Imaging
Title | Dual-Reference Design for Holographic Coherent Diffraction Imaging |
Authors | David A. Barmherzig, Ju Sun, Emmanuel J. Candès, T. J. Lane, Po-Nan Li |
Abstract | A new reference design is introduced for holographic coherent diffraction imaging. This consists of two references - “block” and “pinhole” shaped regions - placed adjacent to the imaging specimen. An efficient recovery algorithm is provided for the resulting holographic phase retrieval problem, which is based on solving a structured, overdetermined linear system. Analysis of the expected recovery error on noisy data, which is contaminated by Poisson shot noise, shows that this simple modification synergizes the individual references and hence leads to uniformly superior performance over single-reference schemes. Numerical experiments on simulated data confirm the theoretical prediction, and the proposed dual-reference scheme achieves a smaller recovery error than leading single-reference schemes. |
Tasks | |
Published | 2019-02-07 |
URL | https://arxiv.org/abs/1902.02492v2 |
https://arxiv.org/pdf/1902.02492v2.pdf | |
PWC | https://paperswithcode.com/paper/dual-reference-design-for-holographic |
Repo | |
Framework | |
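A highly simplified numerical sketch of the recovery step the abstract mentions: the specimen is estimated by solving an overdetermined linear system from measurements corrupted by Poisson shot noise. The random system below is a stand-in, not the actual cross-correlation structure induced by the block and pinhole references:

```python
import numpy as np

rng = np.random.default_rng(1)
n_unknowns, n_measurements = 64, 256              # overdetermined: more equations than unknowns
A = rng.uniform(0.0, 1.0, size=(n_measurements, n_unknowns))
x_true = rng.uniform(0.0, 1.0, size=n_unknowns)

photons = 1e4
b = rng.poisson(photons * (A @ x_true)) / photons  # Poisson shot noise on the measurements

x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)      # least-squares recovery
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```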
Collaborative Learning with Limited Interaction: Tight Bounds for Distributed Exploration in Multi-Armed Bandits
Title | Collaborative Learning with Limited Interaction: Tight Bounds for Distributed Exploration in Multi-Armed Bandits |
Authors | Chao Tao, Qin Zhang, Yuan Zhou |
Abstract | Best arm identification (or, pure exploration) in multi-armed bandits is a fundamental problem in machine learning. In this paper we study the distributed version of this problem where we have multiple agents, and they want to learn the best arm collaboratively. We want to quantify the power of collaboration under limited interaction (or, communication steps), as interaction is expensive in many settings. We measure the running time of a distributed algorithm as the speedup over the best centralized algorithm where there is only one agent. We give almost tight round-speedup tradeoffs for this problem, along which we develop several new techniques for proving lower bounds on the number of communication steps under time or confidence constraints. |
Tasks | Multi-Armed Bandits |
Published | 2019-04-05 |
URL | https://arxiv.org/abs/1904.03293v2 |
https://arxiv.org/pdf/1904.03293v2.pdf | |
PWC | https://paperswithcode.com/paper/collaborative-learning-with-limited |
Repo | |
Framework | |
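A sketch of round-limited collaborative exploration in the spirit of the abstract: in each communication round the agents split the surviving arms, pull them in parallel, pool the empirical means, and eliminate the worse half. This generic elimination scheme illustrates the setting and the round/speedup trade-off; it is not the paper's algorithm or its tight bounds:

```python
import numpy as np

rng = np.random.default_rng(0)

def collaborative_best_arm(means, n_agents=4, rounds=3, pulls_per_round=200):
    arms = list(range(len(means)))
    for _ in range(rounds):                          # limited number of communication steps
        pulls = {a: [] for a in arms}
        for agent in range(n_agents):                # agents explore their share in parallel
            for a in arms[agent::n_agents]:
                pulls[a].extend(rng.binomial(1, means[a], size=pulls_per_round))
        est = {a: np.mean(v) for a, v in pulls.items()}
        # agents communicate empirical means, then drop the worse half of the arms
        arms = sorted(est, key=est.get, reverse=True)[:max(1, len(arms) // 2)]
    return arms[0]

true_means = [0.1, 0.3, 0.5, 0.45, 0.7, 0.65, 0.2, 0.4]
print("identified arm:", collaborative_best_arm(true_means), "(best is arm 4)")
```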
IPG-Net: Image Pyramid Guidance Network for Object Detection
Title | IPG-Net: Image Pyramid Guidance Network for Object Detection |
Authors | Ziming Liu, Guangyu Gao, Lin Sun |
Abstract | For Convolutional Neural Network based object detection, there is a typical dilemma: the spatial information is well kept in the shallow layers, which unfortunately do not have enough semantic information, while the deep layers carry high-level semantics but lose a lot of spatial information, resulting in a serious information imbalance. To acquire enough semantic information for shallow layers, Feature Pyramid Networks (FPN) is used to build a top-down propagated path. In this paper, besides the top-down combination of information for shallow layers, we propose a novel network called Image Pyramid Guidance Network (IPG-Net) to make sure both the spatial information and semantic information are abundant for each layer. Our IPG-Net has three main parts: the image pyramid guidance sub-network, the ResNet-based backbone network and the fusing module. The image pyramid guidance sub-network supplies spatial information to each scale’s feature to solve the information imbalance problem. This sub-network ensures that even in the deepest stage of the ResNet there is enough spatial information for bounding box regression and classification. Furthermore, we design an effective fusing module to fuse the features from the image pyramid with features from the feature pyramid. We apply this novel network to both one-stage and two-stage models, and state-of-the-art results are obtained on the most popular benchmark datasets, i.e., MS COCO and Pascal VOC. |
Tasks | Object Detection |
Published | 2019-12-02 |
URL | https://arxiv.org/abs/1912.00632v2 |
https://arxiv.org/pdf/1912.00632v2.pdf | |
PWC | https://paperswithcode.com/paper/ipg-net-image-pyramid-guidance-network-for |
Repo | |
Framework | |
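A sketch of the image pyramid guidance idea: at every backbone stage the raw input image is resized to that stage's resolution, passed through a small guidance convolution, and fused with the backbone feature map, so even the deepest stage retains spatial detail. The layer sizes and the simple additive fusion are illustrative assumptions, not IPG-Net's exact fusing module:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IPGStage(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.backbone = nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1)
        self.guidance = nn.Conv2d(3, out_ch, 3, padding=1)   # image-pyramid branch
        self.fuse = nn.Conv2d(out_ch, out_ch, 1)

    def forward(self, feat, image):
        feat = F.relu(self.backbone(feat))
        img_small = F.interpolate(image, size=feat.shape[-2:], mode="bilinear",
                                  align_corners=False)
        return F.relu(self.fuse(feat + self.guidance(img_small)))   # inject spatial detail

image = torch.randn(1, 3, 256, 256)
feat = image
for stage in [IPGStage(3, 32), IPGStage(32, 64), IPGStage(64, 128)]:
    feat = stage(feat, image)     # every stage, even the deepest, sees the raw image
print(feat.shape)                 # torch.Size([1, 128, 32, 32])
```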
Online Pandora’s Boxes and Bandits
Title | Online Pandora’s Boxes and Bandits |
Authors | Hossein Esfandiari, MohammadTaghi Hajiaghayi, Brendan Lucier, Michael Mitzenmacher |
Abstract | We consider online variations of the Pandora’s box problem (Weitzman, 1979), a standard model for understanding issues related to the cost of acquiring information for decision-making. Our problem generalizes both the classic Pandora’s box problem and the prophet inequality framework. Boxes are presented online, each with a random value and cost drawn jointly from some known distribution. Pandora chooses online whether to open each box given its cost, and then chooses irrevocably whether to keep the revealed prize or pass on it. We aim for approximation algorithms against adversaries that can choose the largest prize over any opened box, and use optimal offline policies to decide which boxes to open (without knowledge of the value inside). We consider variations where Pandora can collect multiple prizes subject to feasibility constraints, such as cardinality, matroid, or knapsack constraints. We also consider variations related to classic multi-armed bandit problems from reinforcement learning. Our results use a reduction-based framework where we separate the issues of the cost of acquiring information from the online decision process of which prizes to keep. Our work shows that in many scenarios, Pandora can achieve a good approximation to the best possible performance. |
Tasks | Decision Making |
Published | 2019-01-30 |
URL | http://arxiv.org/abs/1901.10698v1 |
http://arxiv.org/pdf/1901.10698v1.pdf | |
PWC | https://paperswithcode.com/paper/online-pandoras-boxes-and-bandits |
Repo | |
Framework | |
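As background for the model the paper generalizes, the sketch below computes the classical Weitzman reservation value of a box (the threshold sigma solving E[(V - sigma)+] = cost) from samples, and runs a simple online open/keep rule. It illustrates the Pandora's box primitive only; the distributions and the stopping rule are assumptions, not the paper's online approximation algorithms:

```python
import numpy as np

def reservation_value(samples, cost):
    """Solve E[(V - sigma)_+] = cost by bisection on empirical samples."""
    lo, hi = samples.min(), samples.max()
    for _ in range(60):
        mid = (lo + hi) / 2
        if np.maximum(samples - mid, 0.0).mean() > cost:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

rng = np.random.default_rng(0)
boxes = [(0.05, rng.uniform(0, 1, 10_000)),   # (opening cost, samples of the prize value)
         (0.20, rng.uniform(0, 2, 10_000))]

best_so_far, total_cost = 0.0, 0.0
for cost, samples in boxes:                   # boxes arrive online
    sigma = reservation_value(samples, cost)
    if sigma > best_so_far:                   # open only if the box could beat what we hold
        total_cost += cost
        best_so_far = max(best_so_far, rng.choice(samples))
print(f"prize kept: {best_so_far:.3f}, opening cost paid: {total_cost:.3f}")
```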
PAC-Bayes with Backprop
Title | PAC-Bayes with Backprop |
Authors | Omar Rivasplata, Vikram M Tankasali, Csaba Szepesvari |
Abstract | We explore the family of methods “PAC-Bayes with Backprop” (PBB) to train probabilistic neural networks by minimizing PAC-Bayes bounds. We present two training objectives, one derived from a previously known PAC-Bayes bound, and a second one derived from a novel PAC-Bayes bound. Both training objectives are evaluated on MNIST and on various UCI data sets. Our experiments show two striking observations: we obtain competitive test set error estimates (~1.4% on MNIST) and at the same time we compute non-vacuous bounds with much tighter values (~2.3% on MNIST) than previous results. These observations suggest that neural nets trained by PBB may lead to self-bounding learning, where the available data can be used to simultaneously learn a predictor and certify its risk, with no need to follow a data-splitting protocol. |
Tasks | |
Published | 2019-08-19 |
URL | https://arxiv.org/abs/1908.07380v5 |
https://arxiv.org/pdf/1908.07380v5.pdf | |
PWC | https://paperswithcode.com/paper/pac-bayes-with-backprop |
Repo | |
Framework | |
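A minimal sketch of training a probabilistic linear classifier by minimizing a PAC-Bayes-style objective: the surrogate loss of a sampled weight vector plus a complexity term driven by KL(posterior || prior). The classical McAllester-style bound used below, and all hyperparameters, are assumptions rather than the paper's exact training objectives:

```python
import math
import torch

n, d = 500, 10
X = torch.randn(n, d)
y = (X[:, 0] > 0).float()
delta = 0.05
prior_sigma = 0.1

mu = torch.zeros(d, requires_grad=True)            # posterior mean
rho = torch.full((d,), -3.0, requires_grad=True)   # posterior log-scale, sigma = softplus(rho)
opt = torch.optim.Adam([mu, rho], lr=0.05)

for step in range(300):
    sigma = torch.nn.functional.softplus(rho)
    w = mu + sigma * torch.randn(d)                 # reparameterised weight sample
    emp = torch.nn.functional.binary_cross_entropy_with_logits(X @ w, y)
    kl = (torch.log(prior_sigma / sigma)
          + (sigma ** 2 + mu ** 2) / (2 * prior_sigma ** 2) - 0.5).sum()
    bound = emp + torch.sqrt((kl + math.log(2 * math.sqrt(n) / delta)) / (2 * n))
    opt.zero_grad()
    bound.backward()                                # "PAC-Bayes with backprop"
    opt.step()

print("final objective:", float(bound))
```

Because the objective itself is (a surrogate of) a generalization bound, the trained posterior comes with a risk certificate computed from the same training data, which is the self-bounding behaviour the abstract highlights.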
Robustness Analysis of Face Obscuration
Title | Robustness Analysis of Face Obscuration |
Authors | Hanxiang Hao, David Güera, János Horváth, Amy R. Reibman, Edward J. Delp |
Abstract | Face obscuration is needed by law enforcement and mass media outlets to guarantee privacy. Sharing sensitive content where obscuration or redaction techniques have failed to completely remove all identifiable traces can lead to many legal and social issues. Hence, we need to be able to systematically measure the face obscuration performance of a given technique. In this paper we propose to measure the effectiveness of eight obscuration techniques. We do so by attacking the redacted faces in three scenarios: obscured face identification, verification, and reconstruction. Threat modeling is also considered to provide a vulnerability analysis for each studied obscuration technique. Based on our evaluation, we show that the k-same based methods are the most effective. |
Tasks | Face Identification |
Published | 2019-05-13 |
URL | https://arxiv.org/abs/1905.05243v2 |
https://arxiv.org/pdf/1905.05243v2.pdf | |
PWC | https://paperswithcode.com/paper/robustness-analysis-of-face-obscuration |
Repo | |
Framework | |
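A sketch of the kind of identification attack used to score an obscuration technique: pixelate each probe face and check whether a nearest-neighbour matcher enrolled on clear gallery images still recovers the identity. The dataset (Olivetti faces, downloaded by scikit-learn), the raw-pixel matcher, and the block size are toy stand-ins for the paper's face recognition pipelines and obscuration methods:

```python
import numpy as np
from sklearn.datasets import fetch_olivetti_faces     # downloads a small face dataset
from sklearn.neighbors import KNeighborsClassifier

def pixelate(img, block=8):
    h, w = img.shape
    small = img[:h - h % block, :w - w % block].reshape(h // block, block, w // block, block)
    return np.repeat(np.repeat(small.mean(axis=(1, 3)), block, axis=0), block, axis=1)

faces = fetch_olivetti_faces()
X, y = faces.images, faces.target
gallery = X[::2].reshape(len(X[::2]), -1)              # clear enrolment images
probes = np.stack([pixelate(im) for im in X[1::2]])    # obscured probe images

matcher = KNeighborsClassifier(n_neighbors=1).fit(gallery, y[::2])
acc = (matcher.predict(probes.reshape(len(probes), -1)) == y[1::2]).mean()
print(f"identification rate after pixelation: {acc:.2%}")   # lower means better obscuration
```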