Paper Group ANR 1756
Modularization of End-to-End Learning: Case Study in Arcade Games. ASU at TextGraphs 2019 Shared Task: Explanation ReGeneration using Language Models and Iterative Re-Ranking. DR Loss: Improving Object Detection by Distributional Ranking. An Introductory Guide to Fano’s Inequality with Applications in Statistical Estimation. Understanding Multi-Head Attention in Abstractive Summarization. Piracy Resistant Watermarks for Deep Neural Networks. An Efficient Schmidt-EKF for 3D Visual-Inertial SLAM. Should I Raise The Red Flag? A comprehensive survey of anomaly scoring methods toward mitigating false alarms. Rethinking deep active learning: Using unlabeled data at model training. Reinforcement Learning Applications. Different Absorption from the Same Sharing: Sifted Multi-task Learning for Fake News Detection. Large-Scale Local Causal Inference of Gene Regulatory Relationships. Time Series Analysis of Electricity Price and Demand to Find Cyber-attacks using Stationary Analysis. A Meta Approach to Defend Noisy Labels by the Manifold Regularizer PSDR. Minimax Regret of Switching-Constrained Online Convex Optimization: No Phase Transition.
Modularization of End-to-End Learning: Case Study in Arcade Games
Title | Modularization of End-to-End Learning: Case Study in Arcade Games |
Authors | Andrew Melnik, Sascha Fleer, Malte Schilling, Helge Ritter |
Abstract | Complex environments and tasks pose a difficult problem for holistic end-to-end learning approaches. Decomposition of an environment into interacting controllable and non-controllable objects allows supervised learning for non-controllable objects and universal value function approximator learning for controllable objects. Such decomposition should lead to a shorter learning time and better generalization capability. Here, we consider arcade-game environments as sets of interacting objects (controllable, non-controllable) and propose a set of functional modules that are specialized in mastering different types of interactions in a broad range of environments. The modules utilize regression, supervised learning, and reinforcement learning algorithms. Results of this case study in different Atari games suggest that human-level performance can be achieved by a learning agent within a human-scale amount of game experience (10-15 minutes of game time) when a proper decomposition of an environment or a task is provided. However, automating such decomposition remains a challenging problem. This case study shows how a model of the causal structure underlying an environment or a task can benefit the learning time and generalization capability of the agent, and argues in favor of exploiting modular structure over pure end-to-end learning approaches. |
Tasks | Atari Games |
Published | 2019-01-27 |
URL | http://arxiv.org/abs/1901.09895v1 |
http://arxiv.org/pdf/1901.09895v1.pdf | |
PWC | https://paperswithcode.com/paper/modularization-of-end-to-end-learning-case |
Repo | |
Framework | |
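The decomposition described in the abstract maps naturally onto a per-object-type module layout. A minimal skeleton, assuming hypothetical `dynamics_models` and `controllable_policy` interfaces; the paper's actual module design is more elaborate:

```python
# Hedged skeleton of the decomposition idea: supervised predictors for
# non-controllable objects, a goal-conditioned (UVFA-style) policy for the
# controllable one. All names and interfaces here are hypothetical.
class ModularAgent:
    def __init__(self, dynamics_models, controllable_policy):
        self.dynamics_models = dynamics_models    # object type -> supervised predictor
        self.policy = controllable_policy         # universal value function approximator

    def act(self, objects, goal):
        # predict where each non-controllable object is heading
        predictions = {
            obj_id: self.dynamics_models[obj.type].predict(obj.state)
            for obj_id, obj in objects.items() if not obj.controllable
        }
        # condition the controllable object's policy on those predictions
        return self.policy.act(objects, predictions, goal)
```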
ASU at TextGraphs 2019 Shared Task: Explanation ReGeneration using Language Models and Iterative Re-Ranking
Title | ASU at TextGraphs 2019 Shared Task: Explanation ReGeneration using Language Models and Iterative Re-Ranking |
Authors | Pratyay Banerjee |
Abstract | In this work we describe the system from the Natural Language Processing group at Arizona State University for the TextGraphs 2019 Shared Task. The task focuses on Explanation Regeneration, an intermediate step towards general multi-hop inference on large graphs. Our approach models the explanation regeneration task as a *learning to rank* problem, for which we use state-of-the-art language models and explore dataset preparation techniques. We utilize an iterative re-ranking based approach to further improve the rankings. Our system secured 2nd place in the task with a mean average precision (MAP) of 41.3% on the test set. |
Tasks | Learning-To-Rank |
Published | 2019-09-19 |
URL | https://arxiv.org/abs/1909.08863v1 |
https://arxiv.org/pdf/1909.08863v1.pdf | |
PWC | https://paperswithcode.com/paper/asu-at-textgraphs-2019-shared-task |
Repo | |
Framework | |
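The iterative re-ranking step can be illustrated with a small sketch: after each pick, the chosen fact is folded into the query so the next pick favors facts that chain with earlier ones. Cosine similarity over precomputed embeddings stands in for the paper's language-model relevance scores, and `alpha` is an assumed interpolation weight:

```python
import numpy as np

def iterative_rerank(query_vec, fact_vecs, top_n=10, alpha=0.7):
    """Greedy iterative re-ranking over precomputed embeddings (ndarrays)."""
    def cos(m, v):
        return (m @ v) / (np.linalg.norm(m, axis=1) * np.linalg.norm(v) + 1e-9)

    q = query_vec.copy()
    remaining = list(range(len(fact_vecs)))
    ranking = []
    for _ in range(min(top_n, len(fact_vecs))):
        scores = cos(fact_vecs[remaining], q)          # score what's left
        best = remaining.pop(int(np.argmax(scores)))   # pick the top fact
        ranking.append(best)
        q = alpha * q + (1 - alpha) * fact_vecs[best]  # fold the pick into the query
    return ranking
```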
DR Loss: Improving Object Detection by Distributional Ranking
Title | DR Loss: Improving Object Detection by Distributional Ranking |
Authors | Qi Qian, Lei Chen, Hao Li, Rong Jin |
Abstract | Most object detection algorithms can be categorized into two classes: two-stage detectors and one-stage detectors. Recently, many efforts have been devoted to one-stage detectors for their simple yet effective architecture. Different from two-stage detectors, one-stage detectors aim to identify foreground objects from all candidates in a single stage. This architecture is efficient but can suffer from imbalance in two respects: the inter-class imbalance between the number of candidates from foreground and background classes, and the intra-class imbalance in the hardness of background candidates, where only a few candidates are hard to identify. In this work, we propose a novel distributional ranking (DR) loss to handle this challenge. For each image, we convert the classification problem to a ranking problem, which considers pairs of candidates within the image, to address the inter-class imbalance problem. Then, we push the distributions of confidence scores for foreground and background towards the decision boundary. After that, we optimize the rank of the expectations of the derived distributions in lieu of the original pairs. Our method not only mitigates the intra-class imbalance among background candidates but also improves the efficiency of the ranking algorithm. By merely replacing the focal loss in RetinaNet with the proposed DR loss and applying ResNet-101 as the backbone, the mAP of the single-scale test on COCO improves from 39.1% to 41.7% without bells and whistles, which demonstrates the effectiveness of the proposed loss function. Code will be available. |
Tasks | Object Detection |
Published | 2019-07-23 |
URL | https://arxiv.org/abs/1907.10156v2 |
https://arxiv.org/pdf/1907.10156v2.pdf | |
PWC | https://paperswithcode.com/paper/dr-loss-improving-object-detection-by |
Repo | |
Framework | |
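A condensed reading of the loss in PyTorch, under simplifying assumptions (per-image scores already split into foreground/background, softmax reweighting as the worst-case distribution, assumed `margin` and `temperature` hyperparameters); a sketch, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def dr_loss_sketch(fg_scores, bg_scores, margin=0.5, temperature=4.0):
    """Hedged per-image DR-style loss over 1-D tensors of candidate scores."""
    # worst-case (largest) expected background score via softmax reweighting
    bg_weights = F.softmax(bg_scores * temperature, dim=0)
    e_bg = (bg_weights * bg_scores).sum()
    # worst-case (smallest) expected foreground score
    fg_weights = F.softmax(-fg_scores * temperature, dim=0)
    e_fg = (fg_weights * fg_scores).sum()
    # logistic loss on the single ranked pair of expectations
    return F.softplus(temperature * (e_bg - e_fg + margin)) / temperature
```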
An Introductory Guide to Fano’s Inequality with Applications in Statistical Estimation
Title | An Introductory Guide to Fano’s Inequality with Applications in Statistical Estimation |
Authors | Jonathan Scarlett, Volkan Cevher |
Abstract | Information theory plays an indispensable role in the development of algorithm-independent impossibility results, both for communication problems and for seemingly distinct areas such as statistics and machine learning. While numerous information-theoretic tools have been proposed for this purpose, the oldest one remains arguably the most versatile and widespread: Fano’s inequality. In this chapter, we provide a survey of Fano’s inequality and its variants in the context of statistical estimation, adopting a versatile framework that covers a wide range of specific problems. We present a variety of key tools and techniques used for establishing impossibility results via this approach, and provide representative examples covering group testing, graphical model selection, sparse linear regression, density estimation, and convex optimization. |
Tasks | Density Estimation, Model Selection |
Published | 2019-01-02 |
URL | https://arxiv.org/abs/1901.00555v3 |
https://arxiv.org/pdf/1901.00555v3.pdf | |
PWC | https://paperswithcode.com/paper/an-introductory-guide-to-fanos-inequality |
Repo | |
Framework | |
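For reference, the basic form of the inequality the chapter builds on: for $M$-ary hypothesis testing with $V$ uniform on $\{1,\dots,M\}$ and any estimator $\hat{V}$ computed from the observation $Y$,

```latex
P_e \;=\; \Pr[\hat{V} \neq V] \;\ge\; 1 - \frac{I(V;Y) + \log 2}{\log M}
```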
Understanding Multi-Head Attention in Abstractive Summarization
Title | Understanding Multi-Head Attention in Abstractive Summarization |
Authors | Joris Baan, Maartje ter Hoeve, Marlies van der Wees, Anne Schuth, Maarten de Rijke |
Abstract | Attention mechanisms in deep learning architectures have often been used as a means of transparency and, as such, to shed light on the inner workings of the architectures. Recently, there has been a growing interest in whether or not this assumption is correct. In this paper we investigate the interpretability of multi-head attention in abstractive summarization, a sequence-to-sequence task for which attention does not have an intuitive alignment role, unlike in machine translation. We first introduce three metrics to gain insight into the focus of attention heads and observe that these heads specialize towards relative positions, specific part-of-speech tags, and named entities. However, we also find that ablating and pruning these heads does not lead to a significant drop in performance, indicating redundancy. By replacing the softmax activation functions with sparsemax activation functions, we find that attention heads behave in a seemingly more transparent manner: we can ablate fewer heads, and heads score higher on our interpretability metrics. However, if we apply pruning to the sparsemax model we find that we can prune even more heads, raising the question of whether enforced sparsity actually improves transparency. Finally, we find that relative position heads seem integral to summarization performance and persistently remain after pruning. |
Tasks | Abstractive Text Summarization, Machine Translation |
Published | 2019-11-10 |
URL | https://arxiv.org/abs/1911.03898v1 |
https://arxiv.org/pdf/1911.03898v1.pdf | |
PWC | https://paperswithcode.com/paper/understanding-multi-head-attention-in |
Repo | |
Framework | |
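The sparsemax activation the paper substitutes for softmax is the Euclidean projection onto the simplex of Martins & Astudillo (2016). A standalone NumPy sketch of that projection (not the authors' code):

```python
import numpy as np

def sparsemax(z):
    """Sparsemax: like softmax, but can assign exactly zero probability."""
    z_sorted = np.sort(z)[::-1]                 # sort scores descending
    cssv = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cssv           # which coordinates stay nonzero
    k_z = k[support][-1]                        # size of the support
    tau = (cssv[support][-1] - 1) / k_z         # threshold
    return np.maximum(z - tau, 0.0)             # sums to 1 by construction
```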
Piracy Resistant Watermarks for Deep Neural Networks
Title | Piracy Resistant Watermarks for Deep Neural Networks |
Authors | Huiying Li, Emily Wenger, Ben Y. Zhao, Haitao Zheng |
Abstract | As companies continue to invest heavily in larger, more accurate, and more robust deep learning models, they are exploring approaches to monetize their models while protecting their intellectual property. Model licensing is promising, but requires a robust tool for owners to claim ownership of models, i.e. a watermark. Unfortunately, current designs have not been able to address piracy attacks, where third parties falsely claim model ownership by embedding their own “pirate watermarks” into an already-watermarked model. We observe that resistance to piracy attacks is fundamentally at odds with the current use of incremental training to embed watermarks into models. In this work, we propose null embedding, a new way to build piracy-resistant watermarks into DNNs that can only take place at a model’s initial training. A null embedding takes a bit string (watermark value) as input, and builds strong dependencies between the model’s normal classification accuracy and the watermark. As a result, attackers cannot remove an embedded watermark via tuning or incremental training, and cannot add new pirate watermarks to already-watermarked models. We empirically show that our proposed watermarks achieve piracy resistance and other watermark properties over a wide range of tasks and models. Finally, we explore a number of adaptive countermeasures, and show our watermark remains robust against a variety of model modifications, including model fine-tuning, compression, and existing methods to detect/remove backdoors. Our watermarked models are also amenable to transfer learning without losing their watermark properties. |
Tasks | Transfer Learning |
Published | 2019-10-02 |
URL | https://arxiv.org/abs/1910.01226v2 |
https://arxiv.org/pdf/1910.01226v2.pdf | |
PWC | https://paperswithcode.com/paper/persistent-and-unforgeable-watermarks-for |
Repo | |
Framework | |
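The key Schmidt-Kalman step is zeroing the gain for the nuisance (Schmidt) states while keeping their cross-covariances consistent. A hedged NumPy sketch of a single update, using the Joseph form (which remains valid for any suboptimal gain); the state ordering and interfaces are assumptions, not SEVIS's implementation:

```python
import numpy as np

def schmidt_ekf_update(x, P, z, h, H, R, n_active):
    """One Schmidt-EKF update. x, P: state and covariance, active states first.
    z, h, H: measurement, predicted measurement h(x), measurement Jacobian.
    n_active: number of active states; the rest are Schmidt (nuisance) states."""
    S = H @ P @ H.T + R                    # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)         # standard Kalman gain
    K[n_active:, :] = 0.0                  # Schmidt: never correct nuisance states
    x = x + K @ (z - h)
    # Joseph form keeps P symmetric PSD for the zeroed (suboptimal) gain and
    # keeps active/Schmidt cross-correlations consistent
    I_KH = np.eye(len(x)) - K @ H
    P = I_KH @ P @ I_KH.T + K @ R @ K.T
    return x, P
```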
An Efficient Schmidt-EKF for 3D Visual-Inertial SLAM
Title | An Efficient Schmidt-EKF for 3D Visual-Inertial SLAM |
Authors | Patrick Geneva, James Maley, Guoquan Huang |
Abstract | Enabling centimeter-accuracy positioning for mobile and wearable sensor systems holds great implications for practical applications. In this paper, we propose a novel, high-precision, efficient visual-inertial (VI)-SLAM algorithm, termed Schmidt-EKF VI-SLAM (SEVIS), which optimally fuses IMU measurements and monocular images in a tightly-coupled manner to provide 3D motion tracking with bounded error. In particular, we adapt the Schmidt Kalman filter formulation to selectively include informative features in the state vector, treating them as nuisance parameters (or Schmidt states) once they mature. This change in modeling allows for significant computational savings by no longer needing to constantly update the Schmidt states (or their covariance), while still allowing the EKF to correctly account for their cross-correlations with the active states. As a result, we achieve linear computational complexity in terms of map size, instead of quadratic as in standard SLAM systems. In order to fully exploit the map information to bound navigation drift, we advocate efficient keyframe-aided 2D-to-2D feature matching to find reliable correspondences between current 2D visual measurements and 3D map features. The proposed SEVIS is extensively validated in both simulations and experiments. |
Tasks | |
Published | 2019-03-20 |
URL | http://arxiv.org/abs/1903.08636v1 |
http://arxiv.org/pdf/1903.08636v1.pdf | |
PWC | https://paperswithcode.com/paper/an-efficient-schmidt-ekf-for-3d-visual |
Repo | |
Framework | |
Should I Raise The Red Flag? A comprehensive survey of anomaly scoring methods toward mitigating false alarms
Title | Should I Raise The Red Flag? A comprehensive survey of anomaly scoring methods toward mitigating false alarms |
Authors | Zahra Zohrevand, Uwe Glässer |
Abstract | A general Intrusion Detection System (IDS) fundamentally acts based on an Anomaly Detection System (ADS) or a combination of anomaly detection and signature-based methods, gathering and analyzing observations and reporting possible suspicious cases to a system administrator or other users for further investigation. One of the notorious challenges that even state-of-the-art ADS and IDS have not overcome is the possibility of a very high false alarm rate. Especially in very large and complex system settings, the volume of low-level alarms easily overwhelms administrators and increases their tendency to ignore alerts. We can group the existing false alarm mitigation strategies into two main families: the first group covers methods directly customized and applied toward higher-quality anomaly scoring in ADS; the second group includes approaches utilized in related contexts as a filtering method toward decreasing the false alarm rate. Given the lack of a comprehensive study of possible ways to mitigate false alarm rates, in this paper we review the existing techniques for false alarm mitigation in ADS and present the pros and cons of each technique. We also study a few promising techniques applied in signature-based IDS and other related contexts, such as commercial Security Information and Event Management (SIEM) tools, which are applicable and promising in the ADS context. Finally, we conclude with some directions for future research. |
Tasks | Anomaly Detection, Intrusion Detection |
Published | 2019-04-14 |
URL | http://arxiv.org/abs/1904.06646v1 |
http://arxiv.org/pdf/1904.06646v1.pdf | |
PWC | https://paperswithcode.com/paper/should-i-raise-the-red-flag-a-comprehensive |
Repo | |
Framework | |
Rethinking deep active learning: Using unlabeled data at model training
Title | Rethinking deep active learning: Using unlabeled data at model training |
Authors | Oriane Siméoni, Mateusz Budnik, Yannis Avrithis, Guillaume Gravier |
Abstract | Active learning typically focuses on training a model on few labeled examples alone, while unlabeled ones are only used for acquisition. In this work we depart from this setting by using both labeled and unlabeled data during model training across active learning cycles. We do so by using unsupervised feature learning at the beginning of the active learning pipeline and semi-supervised learning at every active learning cycle, on all available data. The former has not been investigated before in active learning, while the study of the latter in the context of deep learning is scarce, and recent findings are not conclusive with respect to its benefit. Our idea is orthogonal to acquisition strategies by using more data, much like ensemble methods use more models. By systematically evaluating on a number of popular acquisition strategies and datasets, we find that the use of unlabeled data during model training brings a surprising accuracy improvement in image classification, compared to the differences between acquisition strategies. We thus explore smaller label budgets, even one label per class. |
Tasks | Active Learning, Image Classification |
Published | 2019-11-19 |
URL | https://arxiv.org/abs/1911.08177v1 |
https://arxiv.org/pdf/1911.08177v1.pdf | |
PWC | https://paperswithcode.com/paper/rethinking-deep-active-learning-using-1 |
Repo | |
Framework | |
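A compact sketch of the training loop described above, with stand-ins chosen for self-containment (PCA for the unsupervised step, scikit-learn's LabelSpreading for the semi-supervised learner, uncertainty sampling as one example acquisition strategy); the paper's deep models differ:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.semi_supervised import LabelSpreading

def active_learning(X, y_oracle, budget=100, per_cycle=10, seed=0):
    rng = np.random.default_rng(seed)
    Z = PCA(n_components=min(32, X.shape[1])).fit_transform(X)  # unsupervised step
    y = np.full(len(X), -1)                      # -1 = unlabeled (sklearn convention)
    first = rng.choice(len(X), per_cycle, replace=False)
    y[first] = y_oracle[first]                   # seed labels
    model = LabelSpreading(kernel="knn", n_neighbors=7)
    while (y != -1).sum() < budget:
        model.fit(Z, y)                          # semi-supervised: uses ALL points
        uncertainty = 1.0 - model.predict_proba(Z).max(axis=1)
        uncertainty[y != -1] = -np.inf           # never re-query labeled points
        queries = np.argsort(uncertainty)[-per_cycle:]
        y[queries] = y_oracle[queries]           # query the oracle
    model.fit(Z, y)
    return model
```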
Reinforcement Learning Applications
Title | Reinforcement Learning Applications |
Authors | Yuxi Li |
Abstract | We start with a brief introduction to reinforcement learning (RL), about its successful stories, basics, an example, issues, the ICML 2019 Workshop on RL for Real Life, how to use it, study material and an outlook. Then we discuss a selection of RL applications, including recommender systems, computer systems, energy, finance, healthcare, robotics, and transportation. |
Tasks | Recommendation Systems |
Published | 2019-08-19 |
URL | https://arxiv.org/abs/1908.06973v1 |
https://arxiv.org/pdf/1908.06973v1.pdf | |
PWC | https://paperswithcode.com/paper/reinforcement-learning-applications |
Repo | |
Framework | |
Different Absorption from the Same Sharing: Sifted Multi-task Learning for Fake News Detection
Title | Different Absorption from the Same Sharing: Sifted Multi-task Learning for Fake News Detection |
Authors | Lianwei Wu, Yuan Rao, Haolin Jin, Ambreen Nazir, Ling Sun |
Abstract | Recently, neural networks based on multi-task learning, which learn shared features among tasks as complementary features for different tasks, have achieved promising performance on fake news detection. However, in most existing approaches, the shared features are assigned to different tasks wholesale, without selection, which may introduce useless and even adverse features into specific tasks. In this paper, we design a sifted multi-task learning method with a selected sharing layer for fake news detection. The selected sharing layer adopts gate and attention mechanisms to filter and select the shared feature flows between tasks. Experiments on two public and widely used competition datasets, i.e., RumourEval and PHEME, demonstrate that our proposed method achieves state-of-the-art performance and boosts the F1-score by more than 0.87% and 1.31%, respectively. |
Tasks | Fake News Detection, Multi-Task Learning |
Published | 2019-09-04 |
URL | https://arxiv.org/abs/1909.01720v1 |
https://arxiv.org/pdf/1909.01720v1.pdf | |
PWC | https://paperswithcode.com/paper/different-absorption-from-the-same-sharing |
Repo | |
Framework | |
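A minimal sketch of what a gated selected-sharing layer could look like in PyTorch; the gating design here is illustrative, not the paper's exact architecture (which also uses attention over the shared flow):

```python
import torch
import torch.nn as nn

class SelectedSharingLayer(nn.Module):
    """Each task gates the shared features before absorbing them."""
    def __init__(self, dim, n_tasks):
        super().__init__()
        self.gates = nn.ModuleList(
            [nn.Linear(2 * dim, dim) for _ in range(n_tasks)])

    def forward(self, shared, task_specific, task_id):
        # per-dimension gate: how much shared signal this task lets through
        g = torch.sigmoid(self.gates[task_id](
            torch.cat([shared, task_specific], dim=-1)))
        return task_specific + g * shared
```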
Large-Scale Local Causal Inference of Gene Regulatory Relationships
Title | Large-Scale Local Causal Inference of Gene Regulatory Relationships |
Authors | Ioan Gabriel Bucur, Tom Claassen, Tom Heskes |
Abstract | Gene regulatory networks play a crucial role in controlling an organism’s biological processes, which is why there is significant interest in developing computational methods that are able to extract their structure from high-throughput genetic data. Many of these computational methods are designed to infer individual regulatory relationships among genes from data on gene expression. We propose a novel efficient Bayesian method for discovering local causal relationships among triplets of (normally distributed) variables. In our approach, we score covariance structures for each triplet in one go and incorporate available background knowledge in the form of priors to derive posterior probabilities over local causal structures. Our method is flexible in the sense that it allows for different types of causal structures and assumptions. We apply our approach to the task of learning causal regulatory relationships among genes. We show that the proposed algorithm produces stable and conservative posterior probability estimates over local causal structures that can be used to derive an honest ranking of the most meaningful regulatory relationships. We demonstrate the stability and efficacy of our method both on simulated data and on real-world data from an experiment on yeast. |
Tasks | Causal Inference |
Published | 2019-09-03 |
URL | https://arxiv.org/abs/1909.03818v2 |
https://arxiv.org/pdf/1909.03818v2.pdf | |
PWC | https://paperswithcode.com/paper/large-scale-gene-network-causal-inference |
Repo | |
Framework | |
Time Series Analysis of Electricity Price and Demand to Find Cyber-attacks using Stationary Analysis
Title | Time Series Analysis of Electricity Price and Demand to Find Cyber-attacks using Stationary Analysis |
Authors | Mohsen Rakhshandehroo, Mohammad Rajabdorri |
Abstract | With the development of computational tools in recent years, data analysis methods for finding insightful information have become more common among industries and researchers. This paper is the first part of a time-series analysis of New England electricity price and demand aimed at finding anomalies in the data. It investigates time-series stationarity criteria used to prepare the data for further time-series analysis. Three main analyses are conducted: the moving average, the moving standard deviation, and the augmented Dickey-Fuller test. The data used in this paper are New England data from 9 different operational zones. For each zone, 4 variables are considered: day-ahead (DA) electricity demand and price, and real-time (RT) electricity demand and price. |
Tasks | Time Series, Time Series Analysis |
Published | 2019-07-23 |
URL | https://arxiv.org/abs/1907.11651v3 |
https://arxiv.org/pdf/1907.11651v3.pdf | |
PWC | https://paperswithcode.com/paper/time-series-analysis-of-electricity-price-and |
Repo | |
Framework | |
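The three stationarity checks named in the abstract are straightforward to reproduce with pandas and statsmodels; a sketch, assuming an hourly series so that `window=24` spans one day:

```python
import pandas as pd
from statsmodels.tsa.stattools import adfuller

def stationarity_report(series: pd.Series, window: int = 24):
    """Rolling mean, rolling std, and the augmented Dickey-Fuller test."""
    rolling_mean = series.rolling(window).mean()   # ~flat if stationary
    rolling_std = series.rolling(window).std()     # ~flat if stationary
    adf_stat, p_value, *_ = adfuller(series.dropna())
    return {
        "rolling_mean": rolling_mean,
        "rolling_std": rolling_std,
        "adf_statistic": adf_stat,
        "adf_p_value": p_value,                    # small p -> reject unit root
        "stationary_at_5pct": p_value < 0.05,
    }
```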
A Meta Approach to Defend Noisy Labels by the Manifold Regularizer PSDR
Title | A Meta Approach to Defend Noisy Labels by the Manifold Regularizer PSDR |
Authors | Pengfei Chen, Benben Liao, Guangyong Chen, Shengyu Zhang |
Abstract | Noisy labels are ubiquitous in real-world datasets, which poses a challenge for robustly training deep neural networks (DNNs), since DNNs can easily overfit to noisy labels. Most recent efforts have been devoted to defending against noisy labels by discarding noisy samples from the training set or assigning weights to training samples, where the weight associated with a noisy sample is expected to be small. These efforts thereby waste samples, especially those assigned small weights. The input $x$ is always useful regardless of whether its observed label $y$ is clean. To make full use of all samples, we introduce a manifold regularizer, named Paired Softmax Divergence Regularization (PSDR), to penalize the Kullback-Leibler (KL) divergence between the softmax outputs of similar inputs. In particular, similar inputs can be effectively generated by data augmentation. PSDR can be easily implemented on any type of DNN to improve robustness against noisy labels. As empirically demonstrated on benchmark datasets, our PSDR improves state-of-the-art results by a significant margin. |
Tasks | Data Augmentation |
Published | 2019-06-13 |
URL | https://arxiv.org/abs/1906.05509v1 |
https://arxiv.org/pdf/1906.05509v1.pdf | |
PWC | https://paperswithcode.com/paper/a-meta-approach-to-defend-noisy-labels-by-the |
Repo | |
Framework | |
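A hedged sketch of the regularizer's core idea in PyTorch: a symmetrized KL penalty between the softmax outputs of two augmented views of the same batch. The exact pairing and weighting in the paper may differ:

```python
import torch
import torch.nn.functional as F

def psdr_regularizer(logits_a, logits_b):
    """Symmetric KL between softmax outputs of two views of the same inputs."""
    log_p = F.log_softmax(logits_a, dim=-1)
    log_q = F.log_softmax(logits_b, dim=-1)
    kl_pq = F.kl_div(log_q, log_p.exp(), reduction="batchmean")  # KL(p || q)
    kl_qp = F.kl_div(log_p, log_q.exp(), reduction="batchmean")  # KL(q || p)
    return 0.5 * (kl_pq + kl_qp)
```

Added to the usual cross-entropy term, this pulls the predictions for augmented copies of an input together regardless of whether its label is clean.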
Minimax Regret of Switching-Constrained Online Convex Optimization: No Phase Transition
Title | Minimax Regret of Switching-Constrained Online Convex Optimization: No Phase Transition |
Authors | Lin Chen, Qian Yu, Hannah Lawrence, Amin Karbasi |
Abstract | We study the problem of switching-constrained online convex optimization (OCO), where the player has a limited number of opportunities to change her action. While the discrete analog of this online learning task has been studied extensively, previous work in the continuous setting has neither established the minimax rate nor algorithmically achieved it. We here show that $ T $-round switching-constrained OCO with fewer than $ K $ switches has a minimax regret of $ \Theta(\frac{T}{\sqrt{K}}) $. In particular, it is at least $ \frac{T}{\sqrt{2K}} $ for one dimension and at least $ \frac{T}{\sqrt{K}} $ for higher dimensions. The lower bound in higher dimensions is attained by an orthogonal subspace argument. The minimax analysis in one dimension is more involved. To establish the one-dimensional result, we introduce the fugal game relaxation, whose minimax regret lower bounds that of switching-constrained OCO. We show that the minimax regret of the fugal game is at least $ \frac{T}{\sqrt{2K}} $ and thereby establish the minimax lower bound in one dimension. We next show that a mini-batching algorithm provides an $ O(\frac{T}{\sqrt{K}}) $ upper bound, and therefore we conclude that the minimax regret of switching-constrained OCO is $ \Theta(\frac{T}{\sqrt{K}}) $ for any $K$. This is in sharp contrast to its discrete counterpart, the switching-constrained prediction-from-experts problem, which exhibits a phase transition in minimax regret between the low-switching and high-switching regimes. In the case of bandit feedback, we first determine a novel linear (in $T$) minimax regret for bandit linear optimization against the strongly adaptive adversary of OCO, implying that a slightly weaker adversary is appropriate. We also establish the minimax regret of switching-constrained bandit convex optimization in dimension $n>2$ to be $\tilde{\Theta}(\frac{T}{\sqrt{K}})$. |
Tasks | |
Published | 2019-10-24 |
URL | https://arxiv.org/abs/1910.10873v2 |
https://arxiv.org/pdf/1910.10873v2.pdf | |
PWC | https://paperswithcode.com/paper/minimax-regret-of-switching-constrained |
Repo | |
Framework | |
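The mini-batching upper bound has a simple constructive form: split the $T$ rounds into at most $K$ blocks, hold the action fixed within each block (so at most $K-1$ switches), and take one projected gradient step per block. A sketch with an assumed step size and a user-supplied `grad(t, x)` oracle:

```python
import numpy as np

def minibatched_ogd(grad, T, K, dim, radius=1.0, eta=None):
    """Mini-batched online gradient descent on the Euclidean ball."""
    eta = eta or radius / np.sqrt(K)       # assumed step size; tune per setting
    x = np.zeros(dim)
    block = int(np.ceil(T / K))            # <= K blocks in total
    actions = []
    for start in range(0, T, block):
        g = np.zeros(dim)
        for t in range(start, min(start + block, T)):
            actions.append(x.copy())       # same action all block: no switch
            g += grad(t, x)                # accumulate the block's gradients
        x = x - eta * g / block            # one averaged OGD step per block
        norm = np.linalg.norm(x)
        if norm > radius:                  # project back onto the ball
            x *= radius / norm
    return actions
```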