Paper Group ANR 505
ReluDiff: Differential Verification of Deep Neural Networks. DeepFake Detection: Current Challenges and Next Steps. Taking a hint: How to leverage loss predictors in contextual bandits?. FDFtNet: Facing Off Fake Images using Fake Detection Fine-tuning Network. Decentralized Multi-player Multi-armed Bandits with No Collision Information. Near-linear …
ReluDiff: Differential Verification of Deep Neural Networks
Title | ReluDiff: Differential Verification of Deep Neural Networks |
Authors | Brandon Paulsen, Jingbo Wang, Chao Wang |
Abstract | As deep neural networks are increasingly being deployed in practice, their efficiency has become an important issue. While there are compression techniques for reducing the network’s size, energy consumption and computational requirement, they only demonstrate empirically that there is no loss of accuracy, but lack formal guarantees of the compressed network, e.g., in the presence of adversarial examples. Existing verification techniques such as Reluplex, ReluVal, and DeepPoly provide formal guarantees, but they are designed for analyzing a single network instead of the relationship between two networks. To fill the gap, we develop a new method for differential verification of two closely related networks. Our method consists of a fast but approximate forward interval analysis pass followed by a backward pass that iteratively refines the approximation until the desired property is verified. We have two main innovations. During the forward pass, we exploit structural and behavioral similarities of the two networks to more accurately bound the difference between the output neurons of the two networks. Then in the backward pass, we leverage the gradient differences to more accurately compute the most beneficial refinement. Our experiments show that, compared to state-of-the-art verification tools, our method can achieve orders-of-magnitude speedup and prove many more properties than existing tools. |
Tasks | |
Published | 2020-01-10 |
URL | https://arxiv.org/abs/2001.03662v2 |
https://arxiv.org/pdf/2001.03662v2.pdf | |
PWC | https://paperswithcode.com/paper/reludiff-differential-verification-of-deep |
Repo | |
Framework | |
DeepFake Detection: Current Challenges and Next Steps
Title | DeepFake Detection: Current Challenges and Next Steps |
Authors | Siwei Lyu |
Abstract | High quality fake videos and audios generated by AI-algorithms (the deep fakes) have started to challenge the status of videos and audios as definitive evidence of events. In this paper, we highlight a few of these challenges and discuss the research opportunities in this direction. |
Tasks | DeepFake Detection, Face Swapping |
Published | 2020-03-11 |
URL | https://arxiv.org/abs/2003.09234v1 |
https://arxiv.org/pdf/2003.09234v1.pdf | |
PWC | https://paperswithcode.com/paper/deepfake-detection-current-challenges-and |
Repo | |
Framework | |
Taking a hint: How to leverage loss predictors in contextual bandits?
Title | Taking a hint: How to leverage loss predictors in contextual bandits? |
Authors | Chen-Yu Wei, Haipeng Luo, Alekh Agarwal |
Abstract | We initiate the study of learning in contextual bandits with the help of loss predictors. The main question we address is whether one can improve over the minimax regret $\mathcal{O}(\sqrt{T})$ for learning over $T$ rounds, when the total error of the predictor $\mathcal{E} \leq T$ is relatively small. We provide a complete answer to this question, including upper and lower bounds for various settings: adversarial versus stochastic environments, known versus unknown $\mathcal{E}$, and single versus multiple predictors. We show several surprising results, such as 1) the optimal regret is $\mathcal{O}(\min{\sqrt{T}, \sqrt{\mathcal{E}}T^\frac{1}{4}})$ when $\mathcal{E}$ is known, a sharp contrast to the standard and better bound $\mathcal{O}(\sqrt{\mathcal{E}})$ for non-contextual problems (such as multi-armed bandits); 2) the same bound cannot be achieved if $\mathcal{E}$ is unknown, but as a remedy, $\mathcal{O}(\sqrt{\mathcal{E}}T^\frac{1}{3})$ is achievable; 3) with $M$ predictors, a linear dependence on $M$ is necessary, even if logarithmic dependence is possible for non-contextual problems. We also develop several novel algorithmic techniques to achieve matching upper bounds, including 1) a key action remapping technique for optimal regret with known $\mathcal{E}$, 2) implementing Catoni’s robust mean estimator efficiently via an ERM oracle leading to an efficient algorithm in the stochastic setting with optimal regret, 3) constructing an underestimator for $\mathcal{E}$ via estimating the histogram with bins of exponentially increasing size for the stochastic setting with unknown $\mathcal{E}$, and 4) a self-referential scheme for learning with multiple predictors, all of which might be of independent interest. |
Tasks | Multi-Armed Bandits |
Published | 2020-03-04 |
URL | https://arxiv.org/abs/2003.01922v1 |
https://arxiv.org/pdf/2003.01922v1.pdf | |
PWC | https://paperswithcode.com/paper/taking-a-hint-how-to-leverage-loss-predictors |
Repo | |
Framework | |
FDFtNet: Facing Off Fake Images using Fake Detection Fine-tuning Network
Title | FDFtNet: Facing Off Fake Images using Fake Detection Fine-tuning Network |
Authors | Hyeonseong Jeon, Youngoh Bang, Simon S. Woo |
Abstract | Creating fake images and videos such as “Deepfake” has become much easier these days due to the advancement in Generative Adversarial Networks (GANs). Moreover, recent research such as the few-shot learning can create highly realistic personalized fake images with only a few images. Therefore, the threat of Deepfake to be used for a variety of malicious intents such as propagating fake images and videos becomes prevalent. And detecting these machine-generated fake images has been quite challenging than ever. In this work, we propose a light-weight robust fine-tuning neural network-based classifier architecture called Fake Detection Fine-tuning Network (FDFtNet), which is capable of detecting many of the new fake face image generation models, and can be easily combined with existing image classification networks and finetuned on a few datasets. In contrast to many existing methods, our approach aims to reuse popular pre-trained models with only a few images for fine-tuning to effectively detect fake images. The core of our approach is to introduce an image-based self-attention module called Fine-Tune Transformer that uses only the attention module and the down-sampling layer. This module is added to the pre-trained model and fine-tuned on a few data to search for new sets of feature space to detect fake images. We experiment with our FDFtNet on the GANsbased dataset (Progressive Growing GAN) and Deepfake-based dataset (Deepfake and Face2Face) with a small input image resolution of 64x64 that complicates detection. Our FDFtNet achieves an overall accuracy of 90.29% in detecting fake images generated from the GANs-based dataset, outperforming the state-of-the-art. |
Tasks | Face Swapping, Few-Shot Learning, Image Classification, Image Generation |
Published | 2020-01-05 |
URL | https://arxiv.org/abs/2001.01265v1 |
https://arxiv.org/pdf/2001.01265v1.pdf | |
PWC | https://paperswithcode.com/paper/fdftnet-facing-off-fake-images-using-fake |
Repo | |
Framework | |
Decentralized Multi-player Multi-armed Bandits with No Collision Information
Title | Decentralized Multi-player Multi-armed Bandits with No Collision Information |
Authors | Chengshuai Shi, Wei Xiong, Cong Shen, Jing Yang |
Abstract | The decentralized stochastic multi-player multi-armed bandit (MP-MAB) problem, where the collision information is not available to the players, is studied in this paper. Building on the seminal work of Boursier and Perchet (2019), we propose error correction synchronization involving communication (EC-SIC), whose regret is shown to approach that of the centralized stochastic MP-MAB with collision information. By recognizing that the communication phase without collision information corresponds to the Z-channel model in information theory, the proposed EC-SIC algorithm applies optimal error correction coding for the communication of reward statistics. A fixed message length, as opposed to the logarithmically growing one in Boursier and Perchet (2019), also plays a crucial role in controlling the communication loss. Experiments with practical Z-channel codes, such as repetition code, flip code and modified Hamming code, demonstrate the superiority of EC-SIC in both synthetic and real-world datasets. |
Tasks | Multi-Armed Bandits |
Published | 2020-02-29 |
URL | https://arxiv.org/abs/2003.00162v1 |
https://arxiv.org/pdf/2003.00162v1.pdf | |
PWC | https://paperswithcode.com/paper/decentralized-multi-player-multi-armed |
Repo | |
Framework | |
Near-linear Time Gaussian Process Optimization with Adaptive Batching and Resparsification
Title | Near-linear Time Gaussian Process Optimization with Adaptive Batching and Resparsification |
Authors | Daniele Calandriello, Luigi Carratino, Alessandro Lazaric, Michal Valko, Lorenzo Rosasco |
Abstract | Gaussian processes (GP) are one of the most successful frameworks to model uncertainty. However, GP optimization (e.g., GP-UCB) suffers from major scalability issues. Experimental time grows linearly with the number of evaluations, unless candidates are selected in batches (e.g., using GP-BUCB) and evaluated in parallel. Furthermore, computational cost is often prohibitive since algorithms such as GP-BUCB require a time at least quadratic in the number of dimensions and iterations to select each batch. In this paper, we introduce BBKB (Batch Budgeted Kernel Bandits), the first no-regret GP optimization algorithm that provably runs in near-linear time and selects candidates in batches. This is obtained with a new guarantee for the tracking of the posterior variances that allows BBKB to choose increasingly larger batches, improving over GP-BUCB. Moreover, we show that the same bound can be used to adaptively delay costly updates to the sparse GP approximation used by BBKB, achieving a near-constant per-step amortized cost. These findings are then confirmed in several experiments, where BBKB is much faster than state-of-the-art methods. |
Tasks | Gaussian Processes |
Published | 2020-02-23 |
URL | https://arxiv.org/abs/2002.09954v2 |
https://arxiv.org/pdf/2002.09954v2.pdf | |
PWC | https://paperswithcode.com/paper/near-linear-time-gaussian-process |
Repo | |
Framework | |
Efficiently sampling functions from Gaussian process posteriors
Title | Efficiently sampling functions from Gaussian process posteriors |
Authors | James T. Wilson, Viacheslav Borovitskiy, Alexander Terenin, Peter Mostowsky, Marc Peter Deisenroth |
Abstract | Gaussian processes are the gold standard for many real-world modeling problems, especially in cases where a model’s success hinges upon its ability to faithfully represent predictive uncertainty. These problems typically exist as parts of larger frameworks, where quantities of interest are ultimately defined by integrating over posterior distributions. However, these algorithms’ inner workings rarely allow for closed-form integration, giving rise to a need for Monte Carlo methods. Despite substantial progress in scaling up Gaussian processes to large training sets, methods for accurately generating draws from their posterior distributions still scale cubically in the number of test locations. We identify a decomposition of Gaussian processes that naturally lends itself to scalable sampling by enabling us to efficiently generate functions that accurately represent their posteriors. Building off of this factorization, we propose decoupled sampling, an easy-to-use and general-purpose approach for fast posterior sampling. Decoupled sampling works as a drop-in strategy that seamlessly pairs with sparse approximations to Gaussian processes to afford scalability both during training and at test time. In a series of experiments designed to test competing sampling schemes’ statistical behaviors and practical ramifications, we empirically show that functions drawn using decoupled sampling faithfully represent Gaussian process posteriors at a fraction of the usual cost. |
Tasks | Gaussian Processes |
Published | 2020-02-21 |
URL | https://arxiv.org/abs/2002.09309v1 |
https://arxiv.org/pdf/2002.09309v1.pdf | |
PWC | https://paperswithcode.com/paper/efficiently-sampling-functions-from-gaussian |
Repo | |
Framework | |
Data-Dependence of Plateau Phenomenon in Learning with Neural Network — Statistical Mechanical Analysis
Title | Data-Dependence of Plateau Phenomenon in Learning with Neural Network — Statistical Mechanical Analysis |
Authors | Yuki Yoshida, Masato Okada |
Abstract | The plateau phenomenon, wherein the loss value stops decreasing during the process of learning, has been reported by various researchers. The phenomenon is actively inspected in the 1990s and found to be due to the fundamental hierarchical structure of neural network models. Then the phenomenon has been thought as inevitable. However, the phenomenon seldom occurs in the context of recent deep learning. There is a gap between theory and reality. In this paper, using statistical mechanical formulation, we clarified the relationship between the plateau phenomenon and the statistical property of the data learned. It is shown that the data whose covariance has small and dispersed eigenvalues tend to make the plateau phenomenon inconspicuous. |
Tasks | |
Published | 2020-01-10 |
URL | https://arxiv.org/abs/2001.03371v1 |
https://arxiv.org/pdf/2001.03371v1.pdf | |
PWC | https://paperswithcode.com/paper/data-dependence-of-plateau-phenomenon-in-1 |
Repo | |
Framework | |
Deep Reinforcement Learning for QoS-Constrained Resource Allocation in Multiservice Networks
Title | Deep Reinforcement Learning for QoS-Constrained Resource Allocation in Multiservice Networks |
Authors | Juno V. Saraiva, Iran M. Braga Jr., Victor F. Monteiro, F. Rafael M. Lima, Tarcisio F. Maciel, Walter C. Freitas Jr., F. Rodrigo P. Cavalcanti |
Abstract | In this article, we study a Radio Resource Allocation (RRA) that was formulated as a non-convex optimization problem whose main aim is to maximize the spectral efficiency subject to satisfaction guarantees in multiservice wireless systems. This problem has already been previously investigated in the literature and efficient heuristics have been proposed. However, in order to assess the performance of Machine Learning (ML) algorithms when solving optimization problems in the context of RRA, we revisit that problem and propose a solution based on a Reinforcement Learning (RL) framework. Specifically, a distributed optimization method based on multi-agent deep RL is developed, where each agent makes its decisions to find a policy by interacting with the local environment, until reaching convergence. Thus, this article focuses on an application of RL and our main proposal consists in a new deep RL based approach to jointly deal with RRA, satisfaction guarantees and Quality of Service (QoS) constraints in multiservice celular networks. Lastly, through computational simulations we compare the state-of-art solutions of the literature with our proposal and we show a near optimal performance of the latter in terms of throughput and outage rate. |
Tasks | Distributed Optimization |
Published | 2020-03-03 |
URL | https://arxiv.org/abs/2003.02643v1 |
https://arxiv.org/pdf/2003.02643v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-reinforcement-learning-for-qos |
Repo | |
Framework | |
Overview of the TREC 2019 Fair Ranking Track
Title | Overview of the TREC 2019 Fair Ranking Track |
Authors | Asia J. Biega, Fernando Diaz, Michael D. Ekstrand, Sebastian Kohlmeier |
Abstract | The goal of the TREC Fair Ranking track was to develop a benchmark for evaluating retrieval systems in terms of fairness to different content providers in addition to classic notions of relevance. As part of the benchmark, we defined standardized fairness metrics with evaluation protocols and released a dataset for the fair ranking problem. The 2019 task focused on reranking academic paper abstracts given a query. The objective was to fairly represent relevant authors from several groups that were unknown at the system submission time. Thus, the track emphasized the development of systems which have robust performance across a variety of group definitions. Participants were provided with querylog data (queries, documents, and relevance) from Semantic Scholar. This paper presents an overview of the track, including the task definition, descriptions of the data and the annotation process, as well as a comparison of the performance of submitted systems. |
Tasks | |
Published | 2020-03-25 |
URL | https://arxiv.org/abs/2003.11650v1 |
https://arxiv.org/pdf/2003.11650v1.pdf | |
PWC | https://paperswithcode.com/paper/overview-of-the-trec-2019-fair-ranking-track |
Repo | |
Framework | |
The Geometry of Sign Gradient Descent
Title | The Geometry of Sign Gradient Descent |
Authors | Lukas Balles, Fabian Pedregosa, Nicolas Le Roux |
Abstract | Sign-based optimization methods have become popular in machine learning due to their favorable communication cost in distributed optimization and their surprisingly good performance in neural network training. Furthermore, they are closely connected to so-called adaptive gradient methods like Adam. Recent works on signSGD have used a non-standard “separable smoothness” assumption, whereas some older works study sign gradient descent as steepest descent with respect to the $\ell_\infty$-norm. In this work, we unify these existing results by showing a close connection between separable smoothness and $\ell_\infty$-smoothness and argue that the latter is the weaker and more natural assumption. We then proceed to study the smoothness constant with respect to the $\ell_\infty$-norm and thereby isolate geometric properties of the objective function which affect the performance of sign-based methods. In short, we find sign-based methods to be preferable over gradient descent if (i) the Hessian is to some degree concentrated on its diagonal, and (ii) its maximal eigenvalue is much larger than the average eigenvalue. Both properties are common in deep networks. |
Tasks | Distributed Optimization |
Published | 2020-02-19 |
URL | https://arxiv.org/abs/2002.08056v1 |
https://arxiv.org/pdf/2002.08056v1.pdf | |
PWC | https://paperswithcode.com/paper/the-geometry-of-sign-gradient-descent-1 |
Repo | |
Framework | |
Depth Enables Long-Term Memory for Recurrent Neural Networks
Title | Depth Enables Long-Term Memory for Recurrent Neural Networks |
Authors | Alon Ziv |
Abstract | A key attribute that drives the unprecedented success of modern Recurrent Neural Networks (RNNs) on learning tasks which involve sequential data, is their ability to model intricate long-term temporal dependencies. However, a well established measure of RNNs long-term memory capacity is lacking, and thus formal understanding of the effect of depth on their ability to correlate data throughout time is limited. Specifically, existing depth efficiency results on convolutional networks do not suffice in order to account for the success of deep RNNs on data of varying lengths. In order to address this, we introduce a measure of the network’s ability to support information flow across time, referred to as the Start-End separation rank, which reflects the distance of the function realized by the recurrent network from modeling no dependency between the beginning and end of the input sequence. We prove that deep recurrent networks support Start-End separation ranks which are combinatorially higher than those supported by their shallow counterparts. Thus, we establish that depth brings forth an overwhelming advantage in the ability of recurrent networks to model long-term dependencies, and provide an exemplar of quantifying this key attribute. We empirically demonstrate the discussed phenomena on common RNNs through extensive experimental evaluation using the optimization technique of restricting the hidden-to-hidden matrix to being orthogonal. Finally, we employ the tool of quantum Tensor Networks to gain additional graphic insights regarding the complexity brought forth by depth in recurrent networks. |
Tasks | Tensor Networks |
Published | 2020-03-23 |
URL | https://arxiv.org/abs/2003.10163v1 |
https://arxiv.org/pdf/2003.10163v1.pdf | |
PWC | https://paperswithcode.com/paper/depth-enables-long-term-memory-for-recurrent |
Repo | |
Framework | |
Multi-AI competing and winning against humans in iterated Rock-Paper-Scissors game
Title | Multi-AI competing and winning against humans in iterated Rock-Paper-Scissors game |
Authors | Lei Wang, Wenbing Huang, Yuanpeng Li, Julian Evans, Sailing He |
Abstract | Predicting and modeling human behavior and finding trends within human decision-making process is a major social sciences’problem. Rock Paper Scissors (RPS) is the fundamental strategic question in many game theory problems and real-world competitions. Finding the right approach to beat a particular human opponent is challenging. Here we use Markov Chains with set chain lengths as the single AIs (artificial intelligences) to compete against humans in iterated RPS game. This is the first time that an AI algorithm is applied in RPS human competition behavior studies. We developed an architecture of multi-AI with changeable parameters to adapt to different competition strategies. We introduce a parameter “focus length” (an integer of e.g. 5 or 10) to control the speed and sensitivity for our multi-AI to adapt to the opponent’s strategy change. We experimented with 52 different people, each playing 300 rounds continuously against one specific multi-AI model, and demonstrated that our strategy could win over more than 95% of human opponents. |
Tasks | Decision Making |
Published | 2020-03-15 |
URL | https://arxiv.org/abs/2003.06769v1 |
https://arxiv.org/pdf/2003.06769v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-ai-competing-and-winning-against-humans |
Repo | |
Framework | |
The KEEN Universe: An Ecosystem for Knowledge Graph Embeddings with a Focus on Reproducibility and Transferability
Title | The KEEN Universe: An Ecosystem for Knowledge Graph Embeddings with a Focus on Reproducibility and Transferability |
Authors | Mehdi Ali, Hajira Jabeen, Charles Tapley Hoyt, Jens Lehman |
Abstract | There is an emerging trend of embedding knowledge graphs (KGs) in continuous vector spaces in order to use those for machine learning tasks. Recently, many knowledge graph embedding (KGE) models have been proposed that learn low dimensional representations while trying to maintain the structural properties of the KGs such as the similarity of nodes depending on their edges to other nodes. KGEs can be used to address tasks within KGs such as the prediction of novel links and the disambiguation of entities. They can also be used for downstream tasks like question answering and fact-checking. Overall, these tasks are relevant for the semantic web community. Despite their popularity, the reproducibility of KGE experiments and the transferability of proposed KGE models to research fields outside the machine learning community can be a major challenge. Therefore, we present the KEEN Universe, an ecosystem for knowledge graph embeddings that we have developed with a strong focus on reproducibility and transferability. The KEEN Universe currently consists of the Python packages PyKEEN (Python KnowlEdge EmbeddiNgs), BioKEEN (Biological KnowlEdge EmbeddiNgs), and the KEEN Model Zoo for sharing trained KGE models with the community. |
Tasks | Graph Embedding, Knowledge Graph Embedding, Knowledge Graph Embeddings, Knowledge Graphs, Question Answering |
Published | 2020-01-28 |
URL | https://arxiv.org/abs/2001.10560v1 |
https://arxiv.org/pdf/2001.10560v1.pdf | |
PWC | https://paperswithcode.com/paper/the-keen-universe-an-ecosystem-for-knowledge |
Repo | |
Framework | |
Accuracy vs. Complexity: A Trade-off in Visual Question Answering Models
Title | Accuracy vs. Complexity: A Trade-off in Visual Question Answering Models |
Authors | Moshiur R. Farazi, Salman H. Khan, Nick Barnes |
Abstract | Visual Question Answering (VQA) has emerged as a Visual Turing Test to validate the reasoning ability of AI agents. The pivot to existing VQA models is the joint embedding that is learned by combining the visual features from an image and the semantic features from a given question. Consequently, a large body of literature has focused on developing complex joint embedding strategies coupled with visual attention mechanisms to effectively capture the interplay between these two modalities. However, modelling the visual and semantic features in a high dimensional (joint embedding) space is computationally expensive, and more complex models often result in trivial improvements in the VQA accuracy. In this work, we systematically study the trade-off between the model complexity and the performance on the VQA task. VQA models have a diverse architecture comprising of pre-processing, feature extraction, multimodal fusion, attention and final classification stages. We specifically focus on the effect of “multi-modal fusion” in VQA models that is typically the most expensive step in a VQA pipeline. Our thorough experimental evaluation leads us to two proposals, one optimized for minimal complexity and the other one optimized for state-of-the-art VQA performance. |
Tasks | Question Answering, Visual Question Answering |
Published | 2020-01-20 |
URL | https://arxiv.org/abs/2001.07059v1 |
https://arxiv.org/pdf/2001.07059v1.pdf | |
PWC | https://paperswithcode.com/paper/accuracy-vs-complexity-a-trade-off-in-visual |
Repo | |
Framework | |