January 25, 2020

Paper Group ANR 1756

Modularization of End-to-End Learning: Case Study in Arcade Games


Title	Modularization of End-to-End Learning: Case Study in Arcade Games
Authors	Andrew Melnik, Sascha Fleer, Malte Schilling, Helge Ritter
Abstract	Complex environments and tasks pose a difficult problem for holistic end-to-end learning approaches. Decomposition of an environment into interacting controllable and non-controllable objects allows supervised learning for non-controllable objects and universal value function approximator learning for controllable objects. Such decomposition should lead to a shorter learning time and better generalisation capability. Here, we consider arcade-game environments as sets of interacting objects (controllable, non-controllable) and propose a set of functional modules that are specialized on mastering different types of interactions in a broad range of environments. The modules utilize regression, supervised learning, and reinforcement learning algorithms. Results of this case study in different Atari games suggest that human-level performance can be achieved by a learning agent within a human amount of game experience (10-15 minutes game time) when a proper decomposition of an environment or a task is provided. However, automatization of such decomposition remains a challenging problem. This case study shows how a model of a causal structure underlying an environment or a task can benefit learning time and generalization capability of the agent, and argues in favor of exploiting modular structure in contrast to using pure end-to-end learning approaches.
Tasks	Atari Games
Published	2019-01-27
URL	http://arxiv.org/abs/1901.09895v1
PDF	http://arxiv.org/pdf/1901.09895v1.pdf
PWC	https://paperswithcode.com/paper/modularization-of-end-to-end-learning-case
Repo
Framework

ASU at TextGraphs 2019 Shared Task: Explanation ReGeneration using Language Models and Iterative Re-Ranking


Title	ASU at TextGraphs 2019 Shared Task: Explanation ReGeneration using Language Models and Iterative Re-Ranking
Authors	Pratyay Banerjee
Abstract	In this work we describe the system from Natural Language Processing group at Arizona State University for the TextGraphs 2019 Shared Task. The task focuses on Explanation Regeneration, an intermediate step towards general multi-hop inference on large graphs. Our approach consists of modeling the explanation regeneration task as a learning to rank problem, for which we use state-of-the-art language models and explore dataset preparation techniques. We utilize an iterative re-ranking based approach to further improve the rankings. Our system secured 2nd rank in the task with a mean average precision (MAP) of 41.3% on the test set.
Tasks	Learning-To-Rank
Published	2019-09-19
URL	https://arxiv.org/abs/1909.08863v1
PDF	https://arxiv.org/pdf/1909.08863v1.pdf
PWC	https://paperswithcode.com/paper/asu-at-textgraphs-2019-shared-task
Repo
Framework
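
The MAP figure quoted in the abstract is the mean of per-question average precision. A minimal Python sketch of the standard definition follows; the function names and the toy inputs are illustrative assumptions and are not taken from the shared-task scorer. It assumes each ranked list contains no duplicate items.

```python
from typing import Hashable, List, Sequence, Set


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Average precision of one ranked list against a set of gold items."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant item's rank
    # Gold items that never appear in the ranking contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(
    rankings: List[Sequence[Hashable]], gold_sets: List[Set[Hashable]]
) -> float:
    """Mean of average precision over all queries (0.0 if there are none)."""
    scores = [average_precision(r, g) for r, g in zip(rankings, gold_sets)]
    return sum(scores) / len(scores) if scores else 0.0
```

A re-ranking step improves this score whenever it moves gold explanation facts closer to the top of a list.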

DR Loss: Improving Object Detection by Distributional Ranking


Title	DR Loss: Improving Object Detection by Distributional Ranking
Authors	Qi Qian, Lei Chen, Hao Li, Rong Jin
Abstract	Most of object detection algorithms can be categorized into two classes: two-stage detectors and one-stage detectors. Recently, many efforts have been devoted to one-stage detectors for the simple yet effective architecture. Different from two-stage detectors, one-stage detectors aim to identify foreground objects from all candidates in a single stage. This architecture is efficient but can suffer from the imbalance issue with respect to two aspects: the inter-class imbalance between the number of candidates from foreground and background classes and the intra-class imbalance in the hardness of background candidates, where only a few candidates are hard to be identified. In this work, we propose a novel distributional ranking (DR) loss to handle the challenge. For each image, we convert the classification problem to a ranking problem, which considers pairs of candidates within the image, to address the inter-class imbalance problem. Then, we push the distributions of confidence scores for foreground and background towards the decision boundary. After that, we optimize the rank of the expectations of derived distributions in lieu of original pairs. Our method not only mitigates the intra-class imbalance issue in background candidates but also improves the efficiency for the ranking algorithm. By merely replacing the focal loss in RetinaNet with the developed DR loss and applying ResNet-101 as the backbone, mAP of the single-scale test on COCO can be improved from 39.1% to 41.7% without bells and whistles, which demonstrates the effectiveness of the proposed loss function. Code will be available.
Tasks	Object Detection
Published	2019-07-23
URL	https://arxiv.org/abs/1907.10156v2
PDF	https://arxiv.org/pdf/1907.10156v2.pdf
PWC	https://paperswithcode.com/paper/dr-loss-improving-object-detection-by
Repo
Framework

An Introductory Guide to Fano’s Inequality with Applications in Statistical Estimation


Title	An Introductory Guide to Fano’s Inequality with Applications in Statistical Estimation
Authors	Jonathan Scarlett, Volkan Cevher
Abstract	Information theory plays an indispensable role in the development of algorithm-independent impossibility results, both for communication problems and for seemingly distinct areas such as statistics and machine learning. While numerous information-theoretic tools have been proposed for this purpose, the oldest one remains arguably the most versatile and widespread: Fano’s inequality. In this chapter, we provide a survey of Fano’s inequality and its variants in the context of statistical estimation, adopting a versatile framework that covers a wide range of specific problems. We present a variety of key tools and techniques used for establishing impossibility results via this approach, and provide representative examples covering group testing, graphical model selection, sparse linear regression, density estimation, and convex optimization.
Tasks	Density Estimation, Model Selection
Published	2019-01-02
URL	https://arxiv.org/abs/1901.00555v3
PDF	https://arxiv.org/pdf/1901.00555v3.pdf
PWC	https://paperswithcode.com/paper/an-introductory-guide-to-fanos-inequality
Repo
Framework
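
For orientation, the basic form of the inequality that the chapter surveys can be stated as follows. The notation ($V$, $Y$, $\hat{V}$, $M$) is chosen here for illustration and may differ from the chapter's own.

```latex
% Assumptions: V is uniform on a finite set of size M >= 2,
% Y is the observation, and \hat{V} = \hat{V}(Y) is any estimator of V.
P_e \;=\; \mathbb{P}\big[\hat{V} \neq V\big]
\;\ge\; 1 - \frac{I(V;Y) + \log 2}{\log M}
```

Read as an impossibility result: if the mutual information $I(V;Y)$ is small relative to $\log M$, no estimator can recover $V$ reliably.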

Understanding Multi-Head Attention in Abstractive Summarization


Title	Understanding Multi-Head Attention in Abstractive Summarization
Authors	Joris Baan, Maartje ter Hoeve, Marlies van der Wees, Anne Schuth, Maarten de Rijke
Abstract	Attention mechanisms in deep learning architectures have often been used as a means of transparency and, as such, to shed light on the inner workings of the architectures. Recently, there has been a growing interest in whether or not this assumption is correct. In this paper we investigate the interpretability of multi-head attention in abstractive summarization, a sequence-to-sequence task for which attention does not have an intuitive alignment role, such as in machine translation. We first introduce three metrics to gain insight in the focus of attention heads and observe that these heads specialize towards relative positions, specific part-of-speech tags, and named entities. However, we also find that ablating and pruning these heads does not lead to a significant drop in performance, indicating redundancy. By replacing the softmax activation functions with sparsemax activation functions, we find that attention heads behave seemingly more transparent: we can ablate fewer heads and heads score higher on our interpretability metrics. However, if we apply pruning to the sparsemax model we find that we can prune even more heads, raising the question whether enforced sparsity actually improves transparency. Finally, we find that relative positions heads seem integral to summarization performance and persistently remain after pruning.
Tasks	Abstractive Text Summarization, Machine Translation
Published	2019-11-10
URL	https://arxiv.org/abs/1911.03898v1
PDF	https://arxiv.org/pdf/1911.03898v1.pdf
PWC	https://paperswithcode.com/paper/understanding-multi-head-attention-in
Repo
Framework
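
The abstract contrasts softmax with sparsemax attention. As a hedged illustration, the sketch below implements the standard sparsemax transform (Euclidean projection of the scores onto the probability simplex) in plain Python. How it is wired into the summarization model's attention heads is not described in the abstract and is not shown here; the function name is an assumption, and the input is assumed to be non-empty.

```python
from typing import List, Sequence


def sparsemax(scores: Sequence[float]) -> List[float]:
    """Project `scores` onto the probability simplex; outputs can be exactly zero."""
    z_sorted = sorted(scores, reverse=True)
    cumulative = 0.0
    support_size = 0
    support_sum = 0.0
    for k, z_k in enumerate(z_sorted, start=1):
        cumulative += z_k
        # k stays in the support while 1 + k * z_(k) exceeds the running sum.
        if 1.0 + k * z_k > cumulative:
            support_size = k
            support_sum = cumulative
    tau = (support_sum - 1.0) / support_size  # threshold subtracted from every score
    return [max(z - tau, 0.0) for z in scores]
```

Unlike softmax, which assigns every position a strictly positive weight, this transform can zero out low-scoring positions entirely.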

Piracy Resistant Watermarks for Deep Neural Networks


Title	Piracy Resistant Watermarks for Deep Neural Networks
Authors	Huiying Li, Emily Wenger, Ben Y. Zhao, Haitao Zheng
Abstract	As companies continue to invest heavily in larger, more accurate and more robust deep learning models, they are exploring approaches to monetize their models while protecting their intellectual property. Model licensing is promising, but requires a robust tool for owners to claim ownership of models, i.e. a watermark. Unfortunately, current designs have not been able to address piracy attacks, where third parties falsely claim model ownership by embedding their own “pirate watermarks” into an already-watermarked model. We observe that resistance to piracy attacks is fundamentally at odds with the current use of incremental training to embed watermarks into models. In this work, we propose null embedding, a new way to build piracy-resistant watermarks into DNNs that can only take place at a model’s initial training. A null embedding takes a bit string (watermark value) as input, and builds strong dependencies between the model’s normal classification accuracy and the watermark. As a result, attackers cannot remove an embedded watermark via tuning or incremental training, and cannot add new pirate watermarks to already watermarked models. We empirically show that our proposed watermarks achieve piracy resistance and other watermark properties, over a wide range of tasks and models. Finally, we explore a number of adaptive counter-measures, and show our watermark remains robust against a variety of model modifications, including model fine-tuning, compression, and existing methods to detect/remove backdoors. Our watermarked models are also amenable to transfer learning without losing their watermark properties.
Tasks	Transfer Learning
Published	2019-10-02
URL	https://arxiv.org/abs/1910.01226v2
PDF	https://arxiv.org/pdf/1910.01226v2.pdf
PWC	https://paperswithcode.com/paper/persistent-and-unforgeable-watermarks-for
Repo
Framework

An Efficient Schmidt-EKF for 3D Visual-Inertial SLAM


Title	An Efficient Schmidt-EKF for 3D Visual-Inertial SLAM
Authors	Patrick Geneva, James Maley, Guoquan Huang
Abstract	It holds great implications for practical applications to enable centimeter-accuracy positioning for mobile and wearable sensor systems. In this paper, we propose a novel, high-precision, efficient visual-inertial (VI)-SLAM algorithm, termed Schmidt-EKF VI-SLAM (SEVIS), which optimally fuses IMU measurements and monocular images in a tightly-coupled manner to provide 3D motion tracking with bounded error. In particular, we adapt the Schmidt Kalman filter formulation to selectively include informative features in the state vector while treating them as nuisance parameters (or Schmidt states) once they become matured. This change in modeling allows for significant computational savings by no longer needing to constantly update the Schmidt states (or their covariance), while still allowing the EKF to correctly account for their cross-correlations with the active states. As a result, we achieve linear computational complexity in terms of map size, instead of quadratic as in the standard SLAM systems. In order to fully exploit the map information to bound navigation drifts, we advocate efficient keyframe-aided 2D-to-2D feature matching to find reliable correspondences between current 2D visual measurements and 3D map features. The proposed SEVIS is extensively validated in both simulations and experiments.
Tasks
Published	2019-03-20
URL	http://arxiv.org/abs/1903.08636v1
PDF	http://arxiv.org/pdf/1903.08636v1.pdf
PWC	https://paperswithcode.com/paper/an-efficient-schmidt-ekf-for-3d-visual
Repo
Framework

Should I Raise The Red Flag? A comprehensive survey of anomaly scoring methods toward mitigating false alarms


Title	Should I Raise The Red Flag? A comprehensive survey of anomaly scoring methods toward mitigating false alarms
Authors	Zahra Zohrevand, Uwe Glässer
Abstract	A general Intrusion Detection System (IDS) fundamentally acts based on an Anomaly Detection System (ADS) or a combination of anomaly detection and signature-based methods, gathering and analyzing observations and reporting possible suspicious cases to a system administrator or the other users for further investigation. One of the notorious challenges which even the state-of-the-art ADS and IDS have not overcome is the possibility of a very high false alarms rate. Especially in very large and complex system settings, the amount of low-level alarms easily overwhelms administrators and increases their tendency to ignore alerts. We can group the existing false alarm mitigation strategies into two main families: The first group covers the methods directly customized and applied toward higher quality anomaly scoring in ADS. The second group includes approaches utilized in the related contexts as a filtering method toward decreasing the possibility of false alarm rates. Given the lack of a comprehensive study regarding possible ways to mitigate the false alarm rates, in this paper, we review the existing techniques for false alarm mitigation in ADS and present the pros and cons of each technique. We also study a few promising techniques applied in the signature-based IDS and other related contexts like commercial Security Information and Event Management (SIEM) tools, which are applicable and promising in the ADS context. Finally, we conclude with some directions for future research.
Tasks	Anomaly Detection, Intrusion Detection
Published	2019-04-14
URL	http://arxiv.org/abs/1904.06646v1
PDF	http://arxiv.org/pdf/1904.06646v1.pdf
PWC	https://paperswithcode.com/paper/should-i-raise-the-red-flag-a-comprehensive
Repo
Framework

Rethinking deep active learning: Using unlabeled data at model training


Title	Rethinking deep active learning: Using unlabeled data at model training
Authors	Oriane Siméoni, Mateusz Budnik, Yannis Avrithis, Guillaume Gravier
Abstract	Active learning typically focuses on training a model on few labeled examples alone, while unlabeled ones are only used for acquisition. In this work we depart from this setting by using both labeled and unlabeled data during model training across active learning cycles. We do so by using unsupervised feature learning at the beginning of the active learning pipeline and semi-supervised learning at every active learning cycle, on all available data. The former has not been investigated before in active learning, while the study of latter in the context of deep learning is scarce and recent findings are not conclusive with respect to its benefit. Our idea is orthogonal to acquisition strategies by using more data, much like ensemble methods use more models. By systematically evaluating on a number of popular acquisition strategies and datasets, we find that the use of unlabeled data during model training brings a surprising accuracy improvement in image classification, compared to the differences between acquisition strategies. We thus explore smaller label budgets, even one label per class.
Tasks	Active Learning, Image Classification
Published	2019-11-19
URL	https://arxiv.org/abs/1911.08177v1
PDF	https://arxiv.org/pdf/1911.08177v1.pdf
PWC	https://paperswithcode.com/paper/rethinking-deep-active-learning-using-1
Repo
Framework

Reinforcement Learning Applications


Title	Reinforcement Learning Applications
Authors	Yuxi Li
Abstract	We start with a brief introduction to reinforcement learning (RL), about its successful stories, basics, an example, issues, the ICML 2019 Workshop on RL for Real Life, how to use it, study material and an outlook. Then we discuss a selection of RL applications, including recommender systems, computer systems, energy, finance, healthcare, robotics, and transportation.
Tasks	Recommendation Systems
Published	2019-08-19
URL	https://arxiv.org/abs/1908.06973v1
PDF	https://arxiv.org/pdf/1908.06973v1.pdf
PWC	https://paperswithcode.com/paper/reinforcement-learning-applications
Repo
Framework

Different Absorption from the Same Sharing: Sifted Multi-task Learning for Fake News Detection


Title	Different Absorption from the Same Sharing: Sifted Multi-task Learning for Fake News Detection
Authors	Lianwei Wu, Yuan Rao, Haolin Jin, Ambreen Nazir, Ling Sun
Abstract	Recently, neural networks based on multi-task learning have achieved promising performance on fake news detection, which focus on learning shared features among tasks as complementary features to serve different tasks. However, in most of the existing approaches, the shared features are completely assigned to different tasks without selection, which may lead to some useless and even adverse features integrated into specific tasks. In this paper, we design a sifted multi-task learning method with a selected sharing layer for fake news detection. The selected sharing layer adopts gate mechanism and attention mechanism to filter and select shared feature flows between tasks. Experiments on two public and widely used competition datasets, i.e. RumourEval and PHEME, demonstrate that our proposed method achieves the state-of-the-art performance and boosts the F1-score by more than 0.87%, 1.31%, respectively.
Tasks	Fake News Detection, Multi-Task Learning
Published	2019-09-04
URL	https://arxiv.org/abs/1909.01720v1
PDF	https://arxiv.org/pdf/1909.01720v1.pdf
PWC	https://paperswithcode.com/paper/different-absorption-from-the-same-sharing
Repo
Framework

Large-Scale Local Causal Inference of Gene Regulatory Relationships


Title	Large-Scale Local Causal Inference of Gene Regulatory Relationships
Authors	Ioan Gabriel Bucur, Tom Claassen, Tom Heskes
Abstract	Gene regulatory networks play a crucial role in controlling an organism’s biological processes, which is why there is significant interest in developing computational methods that are able to extract their structure from high-throughput genetic data. Many of these computational methods are designed to infer individual regulatory relationships among genes from data on gene expression. We propose a novel efficient Bayesian method for discovering local causal relationships among triplets of (normally distributed) variables. In our approach, we score covariance structures for each triplet in one go and incorporate available background knowledge in the form of priors to derive posterior probabilities over local causal structures. Our method is flexible in the sense that it allows for different types of causal structures and assumptions. We apply our approach to the task of learning causal regulatory relationships among genes. We show that the proposed algorithm produces stable and conservative posterior probability estimates over local causal structures that can be used to derive an honest ranking of the most meaningful regulatory relationships. We demonstrate the stability and efficacy of our method both on simulated data and on real-world data from an experiment on yeast.
Tasks	Causal Inference
Published	2019-09-03
URL	https://arxiv.org/abs/1909.03818v2
PDF	https://arxiv.org/pdf/1909.03818v2.pdf
PWC	https://paperswithcode.com/paper/large-scale-gene-network-causal-inference
Repo
Framework

Time Series Analysis of Electricity Price and Demand to Find Cyber-attacks using Stationary Analysis


Title	Time Series Analysis of Electricity Price and Demand to Find Cyber-attacks using Stationary Analysis
Authors	Mohsen Rakhshandehroo, Mohammad Rajabdorri
Abstract	With developing of computation tools in the last years, data analysis methods to find insightful information are becoming more common among industries and researchers. This paper is the first part of the times series analysis of New England electricity price and demand to find anomaly in the data. In this paper time-series stationary criteria to prepare data for further times-series related analysis is investigated. Three main analysis are conducted in this paper, including moving average, moving standard deviation and augmented Dickey-Fuller test. The data used in this paper is New England big data from 9 different operational zones. For each zone, 4 different variables including day-ahead (DA) electricity demand, price and real-time (RT) electricity demand price are considered.
Tasks	Time Series, Time Series Analysis
Published	2019-07-23
URL	https://arxiv.org/abs/1907.11651v3
PDF	https://arxiv.org/pdf/1907.11651v3.pdf
PWC	https://paperswithcode.com/paper/time-series-analysis-of-electricity-price-and
Repo
Framework
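
Two of the three analyses named in the abstract, the moving average and the moving standard deviation, are simple to sketch. The plain-Python example below is an illustration only: the function name, the window handling, and the use of the population standard deviation are assumptions, and the augmented Dickey-Fuller test is not reproduced here.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def rolling_stats(values: Sequence[float], window: int) -> Tuple[List[float], List[float]]:
    """Return (moving averages, moving standard deviations) over full windows."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    means: List[float] = []
    stds: List[float] = []
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]
        means.append(mean(chunk))
        stds.append(pstdev(chunk))  # population standard deviation of the window
    return means, stds
```

For a roughly stationary series both outputs stay near constant levels; a moving average that drifts, as it does for a trending series, is an informal sign of non-stationarity.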

A Meta Approach to Defend Noisy Labels by the Manifold Regularizer PSDR


Title	A Meta Approach to Defend Noisy Labels by the Manifold Regularizer PSDR
Authors	Pengfei Chen, Benben Liao, Guangyong Chen, Shengyu Zhang
Abstract	Noisy labels are ubiquitous in real-world datasets, which poses a challenge for robustly training deep neural networks (DNNs) since DNNs can easily overfit to the noisy labels. Most recent efforts have been devoted to defending noisy labels by discarding noisy samples from the training set or assigning weights to training samples, where the weight associated with a noisy sample is expected to be small. Thereby, these previous efforts result in a waste of samples, especially those assigned with small weights. The input $x$ is always useful regardless of whether its observed label $y$ is clean. To make full use of all samples, we introduce a manifold regularizer, named as Paired Softmax Divergence Regularization (PSDR), to penalize the Kullback-Leibler (KL) divergence between softmax outputs of similar inputs. In particular, similar inputs can be effectively generated by data augmentation. PSDR can be easily implemented on any type of DNNs to improve the robustness against noisy labels. As empirically demonstrated on benchmark datasets, our PSDR impressively improve state-of-the-art results by a significant margin.
Tasks	Data Augmentation
Published	2019-06-13
URL	https://arxiv.org/abs/1906.05509v1
PDF	https://arxiv.org/pdf/1906.05509v1.pdf
PWC	https://paperswithcode.com/paper/a-meta-approach-to-defend-noisy-labels-by-the
Repo
Framework
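
The regularizer described in the abstract penalizes the KL divergence between softmax outputs of similar inputs, such as two augmentations of one sample. A minimal plain-Python sketch of such a penalty for a single pair of logit vectors follows. The symmetric averaging of the two KL directions, the function names, and the absence of any weighting coefficient are assumptions made for illustration; the abstract does not specify them.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one logit vector."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; q must be strictly positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def paired_softmax_divergence(logits_a: Sequence[float], logits_b: Sequence[float]) -> float:
    """Symmetrized KL between the softmax outputs of two views (assumed form)."""
    p = softmax(logits_a)
    q = softmax(logits_b)
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
```

The penalty uses no labels, which is why every input can contribute to it regardless of whether its observed label is clean.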

Minimax Regret of Switching-Constrained Online Convex Optimization: No Phase Transition


Title	Minimax Regret of Switching-Constrained Online Convex Optimization: No Phase Transition
Authors	Lin Chen, Qian Yu, Hannah Lawrence, Amin Karbasi
Abstract	We study the problem of switching-constrained online convex optimization (OCO), where the player has a limited number of opportunities to change her action. While the discrete analog of this online learning task has been studied extensively, previous work in the continuous setting has neither established the minimax rate nor algorithmically achieved it. We here show that $ T $-round switching-constrained OCO with fewer than $ K $ switches has a minimax regret of $ \Theta(\frac{T}{\sqrt{K}}) $. In particular, it is at least $ \frac{T}{\sqrt{2K}} $ for one dimension and at least $ \frac{T}{\sqrt{K}} $ for higher dimensions. The lower bound in higher dimensions is attained by an orthogonal subspace argument. The minimax analysis in one dimension is more involved. To establish the one-dimensional result, we introduce the fugal game relaxation, whose minimax regret lower bounds that of switching-constrained OCO. We show that the minimax regret of the fugal game is at least $ \frac{T}{\sqrt{2K}} $ and thereby establish the minimax lower bound in one dimension. We next show that a mini-batching algorithm provides an $ O(\frac{T}{\sqrt{K}}) $ upper bound, and therefore we conclude that the minimax regret of switching-constrained OCO is $ \Theta(\frac{T}{\sqrt{K}}) $ for any $K$. This is in sharp contrast to its discrete counterpart, the switching-constrained prediction-from-experts problem, which exhibits a phase transition in minimax regret between the low-switching and high-switching regimes. In the case of bandit feedback, we first determine a novel linear (in $T$) minimax regret for bandit linear optimization against the strongly adaptive adversary of OCO, implying that a slightly weaker adversary is appropriate. We also establish the minimax regret of switching-constrained bandit convex optimization in dimension $n>2$ to be $\tilde{\Theta}(\frac{T}{\sqrt{K}})$.
Tasks
Published	2019-10-24
URL	https://arxiv.org/abs/1910.10873v2
PDF	https://arxiv.org/pdf/1910.10873v2.pdf
PWC	https://paperswithcode.com/paper/minimax-regret-of-switching-constrained
Repo
Framework
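
The mini-batching idea behind the upper bound can be illustrated with a small one-dimensional sketch: the horizon is split into blocks, the action is held fixed within each block, and a single projected gradient step is taken between blocks. This is a generic mini-batched online gradient descent written under stated assumptions (interval domain, fixed step size, hypothetical function and parameter names, `num_blocks` at least 1), not the paper's exact algorithm or tuning.

```python
from typing import Callable, List


def minibatched_ogd(
    gradients: List[Callable[[float], float]],
    num_blocks: int,
    step_size: float,
    radius: float = 1.0,
) -> List[float]:
    """Play one action per round on [-radius, radius], changing it only between blocks."""
    total_rounds = len(gradients)
    block_len = -(-total_rounds // num_blocks)  # ceiling division
    action = 0.0
    grad_sum = 0.0
    played: List[float] = []
    for t, grad in enumerate(gradients):
        if t > 0 and t % block_len == 0:
            # Block boundary: one projected step on the block's average gradient.
            action -= step_size * grad_sum / block_len
            action = max(-radius, min(radius, action))
            grad_sum = 0.0
        played.append(action)
        grad_sum += grad(action)  # gradient of this round's loss at the played action
    return played
```

With `num_blocks = K` there are at most `K - 1` block boundaries, so the action changes fewer than `K` times, which matches the switching constraint stated in the abstract.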