January 30, 2020

2910 words 14 mins read

Paper Group ANR 282

Fitting ReLUs via SGD and Quantized SGD. Concentration bounds for CVaR estimation: The cases of light-tailed and heavy-tailed distributions. Switchable Normalization for Learning-to-Normalize Deep Representation. Meta Learning for Task-Driven Video Summarization. Are deep ResNets provably better than linear predictors? Reference Network for Neural …

Fitting ReLUs via SGD and Quantized SGD

Title Fitting ReLUs via SGD and Quantized SGD
Authors Seyed Mohammadreza Mousavi Kalan, Mahdi Soltanolkotabi, A. Salman Avestimehr
Abstract In this paper we focus on the problem of finding the optimal weights of the shallowest of neural networks, consisting of a single Rectified Linear Unit (ReLU). These functions are of the form $\mathbf{x}\rightarrow \max(0,\langle\mathbf{w},\mathbf{x}\rangle)$ with $\mathbf{w}\in\mathbb{R}^d$ denoting the weight vector. We focus on a planted model where the inputs are chosen i.i.d. from a Gaussian distribution and the labels are generated according to a planted weight vector. We first show that mini-batch stochastic gradient descent, when suitably initialized, converges at a geometric rate to the planted model with a number of samples that is optimal up to numerical constants. Next, we focus on a parallel implementation where in each iteration the mini-batch gradient is calculated in a distributed manner across multiple processors and then broadcast to a master or all other processors. To reduce the communication cost in this setting we utilize a Quantized Stochastic Gradient Descent (QSGD) scheme in which the partial gradients are quantized. Perhaps unexpectedly, we show that QSGD maintains the fast convergence of SGD to a globally optimal model while significantly reducing the communication cost. We further corroborate our numerical findings via various experiments, including distributed implementations over Amazon EC2.
Tasks
Published 2019-01-19
URL http://arxiv.org/abs/1901.06587v2
PDF http://arxiv.org/pdf/1901.06587v2.pdf
PWC https://paperswithcode.com/paper/fitting-relus-via-sgd-and-quantized-sgd
Repo
Framework
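
The planted-model setup and the quantized-gradient idea are easy to make concrete. Below is a minimal numpy sketch, not the authors' exact algorithm: mini-batch SGD on a single planted ReLU with a QSGD-style stochastically quantized gradient. The dimensions, step size, and the `quantize` helper are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, batch, lr, steps = 20, 2000, 64, 0.1, 500

w_star = rng.normal(size=d)                    # planted weight vector
X = rng.normal(size=(n, d))                    # i.i.d. Gaussian inputs
y = np.maximum(0.0, X @ w_star)                # labels from the planted ReLU

def quantize(g, levels=4):
    """QSGD-style stochastic quantization of a gradient vector."""
    norm = np.linalg.norm(g)
    if norm == 0.0:
        return g
    scaled = np.abs(g) / norm * levels
    low = np.floor(scaled)
    q = low + (rng.random(g.shape) < scaled - low)   # randomized rounding
    return np.sign(g) * q * norm / levels

w = 0.1 * rng.normal(size=d)                   # stand-in for a suitable init
for _ in range(steps):
    idx = rng.choice(n, batch, replace=False)
    Xb, yb = X[idx], y[idx]
    resid = np.maximum(0.0, Xb @ w) - yb
    # subgradient of the mini-batch least-squares loss
    grad = Xb.T @ (resid * (Xb @ w > 0)) / batch
    w -= lr * quantize(grad)

print("distance to planted model:", np.linalg.norm(w - w_star))
```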

Concentration bounds for CVaR estimation: The cases of light-tailed and heavy-tailed distributions

Title Concentration bounds for CVaR estimation: The cases of light-tailed and heavy-tailed distributions
Authors Prashanth L. A., Krishna Jagannathan, Ravi Kumar Kolla
Abstract Conditional Value-at-Risk (CVaR) is a widely used risk metric in applications such as finance. We derive concentration bounds for CVaR estimates, considering separately the cases of light-tailed and heavy-tailed distributions. In the light-tailed case, we use a classical CVaR estimator based on the empirical distribution constructed from the samples. For heavy-tailed random variables, we assume a mild ‘bounded moment’ condition and derive a concentration bound for a truncation-based estimator. Notably, our concentration bounds enjoy an exponential decay in the sample size, for heavy-tailed as well as light-tailed distributions. To demonstrate the applicability of our concentration results, we consider a CVaR optimization problem in a multi-armed bandit setting. Specifically, we address the best CVaR-arm identification problem under a fixed budget. We modify the well-known successive rejects algorithm to incorporate a CVaR-based criterion. Using the CVaR concentration result, we derive an upper bound on the probability of incorrect identification by the proposed algorithm.
Tasks Multi-Armed Bandits
Published 2019-01-04
URL https://arxiv.org/abs/1901.00997v2
PDF https://arxiv.org/pdf/1901.00997v2.pdf
PWC https://paperswithcode.com/paper/risk-aware-multi-armed-bandits-using
Repo
Framework
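
A minimal sketch of the two estimators discussed, under the convention that larger sample values are worse losses; the tail level `alpha` and truncation level `tau` below are illustrative choices, not the paper's settings.

```python
import numpy as np

def empirical_cvar(samples, alpha=0.95):
    """Classical estimator: average of the worst (1 - alpha) tail."""
    x = np.sort(np.asarray(samples, dtype=float))
    var_idx = int(np.ceil(alpha * len(x))) - 1     # empirical VaR position
    return x[var_idx:].mean()

def truncated_cvar(samples, alpha=0.95, tau=10.0):
    """Heavy-tailed case: truncate at tau, then estimate as above."""
    return empirical_cvar(np.clip(samples, None, tau), alpha)

losses = np.random.default_rng(1).standard_t(df=3, size=10_000)  # heavy tail
print(empirical_cvar(losses), truncated_cvar(losses))
```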

Switchable Normalization for Learning-to-Normalize Deep Representation

Title Switchable Normalization for Learning-to-Normalize Deep Representation
Authors Ping Luo, Ruimao Zhang, Jiamin Ren, Zhanglin Peng, Jingyu Li
Abstract We address a learning-to-normalize problem by proposing Switchable Normalization (SN), which learns to select different normalizers for different normalization layers of a deep neural network. SN employs three distinct scopes to compute statistics (means and variances): a channel, a layer, and a minibatch. SN switches between them by learning their importance weights in an end-to-end manner. It has several good properties. First, it adapts to various network architectures and tasks. Second, it is robust to a wide range of batch sizes, maintaining high performance even when a small minibatch is presented (e.g., 2 images/GPU). Third, SN does not have a sensitive hyper-parameter, unlike group normalization, which searches for the number of groups as a hyper-parameter. Without bells and whistles, SN outperforms its counterparts on various challenging benchmarks, such as ImageNet, COCO, CityScapes, ADE20K, MegaFace, and Kinetics. Analyses of SN are also presented to answer the following three questions: (a) Is it useful to allow each normalization layer to select its own normalizer? (b) What impacts the choices of normalizers? (c) Do different tasks and datasets prefer different normalizers? We hope SN will help ease the use and understanding of normalization techniques in deep learning. The code of SN has been released at https://github.com/switchablenorms.
Tasks
Published 2019-07-22
URL https://arxiv.org/abs/1907.10473v1
PDF https://arxiv.org/pdf/1907.10473v1.pdf
PWC https://paperswithcode.com/paper/switchable-normalization-for-learning-to
Repo
Framework
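
A small numpy sketch of the switching mechanism, assuming the usual (N, C, H, W) layout: compute instance-, layer-, and batch-norm statistics, then blend them with softmax-normalized importance weights. The released code at https://github.com/switchablenorms is the reference; this sketch omits the learnable affine parameters and the training of the weights.

```python
import numpy as np

def switchable_norm(x, w_mean, w_var, eps=1e-5):
    """x: (N, C, H, W); w_mean, w_var: three logits for (IN, LN, BN)."""
    stats = []
    for axes in [(2, 3), (1, 2, 3), (0, 2, 3)]:    # IN, LN, BN scopes
        stats.append((x.mean(axis=axes, keepdims=True),
                      x.var(axis=axes, keepdims=True)))
    pm = np.exp(w_mean) / np.exp(w_mean).sum()     # importance weights
    pv = np.exp(w_var) / np.exp(w_var).sum()
    mu = sum(p * m for p, (m, _) in zip(pm, stats))
    var = sum(p * v for p, (_, v) in zip(pv, stats))
    return (x - mu) / np.sqrt(var + eps)

x = np.random.default_rng(0).normal(size=(4, 8, 16, 16))
y = switchable_norm(x, w_mean=np.zeros(3), w_var=np.zeros(3))
```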

Meta Learning for Task-Driven Video Summarization

Title Meta Learning for Task-Driven Video Summarization
Authors Xuelong Li, Hongli Li, Yongsheng Dong
Abstract Existing video summarization approaches mainly concentrate on the sequential or structural characteristics of video data. However, they do not pay enough attention to the video summarization task itself. In this paper, we propose a meta learning method for performing task-driven video summarization, denoted MetaL-TDVS, to explicitly explore the video summarization mechanism shared among the processes of summarizing different videos. In particular, MetaL-TDVS aims to excavate the latent mechanism for summarizing video by reformulating video summarization as a meta learning problem, promoting the generalization ability of the trained model. MetaL-TDVS regards summarizing each video as a single task, to make better use of the experience and knowledge learned from summarizing other videos when summarizing new ones. Furthermore, MetaL-TDVS updates models via a two-fold back propagation, which forces the model optimized on one video to obtain high accuracy on another video in every training step. Extensive experiments on benchmark datasets demonstrate the superiority and better generalization ability of MetaL-TDVS against several state-of-the-art methods.
Tasks Meta-Learning, Video Summarization
Published 2019-07-29
URL https://arxiv.org/abs/1907.12342v1
PDF https://arxiv.org/pdf/1907.12342v1.pdf
PWC https://paperswithcode.com/paper/meta-learning-for-task-driven-video
Repo
Framework
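
The "two-fold back propagation" reads like a bilevel, MAML-style update. The sketch below is a hedged interpretation using a first-order approximation, with toy least-squares tasks standing in for summarizing individual videos; it is not the paper's network or exact update rule.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_grad(theta, X, y):
    """Gradient of a toy least-squares 'summarization' loss."""
    return X.T @ (X @ theta - y) / len(y)

theta = rng.normal(size=5)                    # shared initialization
alpha, beta = 0.1, 0.05                       # inner / outer step sizes
w_true = np.ones(5)
for _ in range(200):
    Xa, Xb = rng.normal(size=(32, 5)), rng.normal(size=(32, 5))
    ya, yb = Xa @ w_true, Xb @ w_true         # two "videos" as two tasks
    adapted = theta - alpha * loss_grad(theta, Xa, ya)   # fold 1: video A
    theta = theta - beta * loss_grad(adapted, Xb, yb)    # fold 2: video B
```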

Are deep ResNets provably better than linear predictors?

Title Are deep ResNets provably better than linear predictors?
Authors Chulhee Yun, Suvrit Sra, Ali Jadbabaie
Abstract Recent results in the literature indicate that a residual network (ResNet) composed of a single residual block outperforms linear predictors, in the sense that all local minima in its optimization landscape are at least as good as the best linear predictor. However, these results are limited to a single residual block (i.e., shallow ResNets), rather than deep ResNets composed of multiple residual blocks. We take a step towards extending this result to deep ResNets. We start with two motivating examples. First, we show that there exist datasets for which all local minima of a fully-connected ReLU network are no better than the best linear predictor, whereas a ResNet has strictly better local minima. Second, we show that even at the global minimum, the representations obtained from the residual block outputs of a 2-block ResNet do not necessarily improve monotonically over subsequent blocks, which highlights a fundamental difficulty in analyzing deep ResNets. Our main theorem on deep ResNets shows that, under simple geometric conditions, any critical point in the optimization landscape is either (i) at least as good as the best linear predictor, or (ii) a point where the Hessian has a strictly negative eigenvalue. Notably, our theorem shows that a chain of multiple skip-connections can improve the optimization landscape, whereas existing results study direct skip-connections to the last hidden layer or output layer. Finally, we complement our results by showing benign properties of the “near-identity regions” of deep ResNets, establishing depth-independent upper bounds for the risk attained at critical points as well as the Rademacher complexity.
Tasks
Published 2019-07-09
URL https://arxiv.org/abs/1907.03922v2
PDF https://arxiv.org/pdf/1907.03922v2.pdf
PWC https://paperswithcode.com/paper/are-deep-resnets-provably-better-than-linear
Repo
Framework
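
The main theorem's dichotomy can be restated compactly. A LaTeX paraphrase (omitting the paper's geometric conditions), with $L$ the training loss of the deep ResNet, $L_{\mathrm{lin}}$ the loss of a linear predictor $W$, and $\theta$ any critical point:

```latex
\text{either}\quad L(\theta) \;\le\; \min_{W} L_{\mathrm{lin}}(W)
\qquad\text{or}\qquad \lambda_{\min}\!\left(\nabla^{2} L(\theta)\right) < 0 .
```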

Reference Network for Neural Machine Translation

Title Reference Network for Neural Machine Translation
Authors Han Fu, Chenghao Liu, Jianling Sun
Abstract Neural Machine Translation (NMT) has achieved notable success in recent years. Such a framework usually generates translations in isolation. In contrast, human translators often refer to reference data, either rephrasing intricate sentence fragments with common terms in the source language or accessing the golden translation directly. In this paper, we propose a Reference Network to incorporate the referring process into the translation decoding of NMT. To construct a “reference book”, an intuitive way is to store the detailed translation history in extra memory, which is computationally expensive. Instead, we employ Local Coordinates Coding (LCC) to obtain global context vectors containing monolingual and bilingual contextual information for NMT decoding. Experimental results on Chinese-English and English-German tasks demonstrate that our proposed model is effective in improving translation quality at a lightweight computational cost.
Tasks Machine Translation
Published 2019-08-23
URL https://arxiv.org/abs/1908.09920v1
PDF https://arxiv.org/pdf/1908.09920v1.pdf
PWC https://paperswithcode.com/paper/reference-network-for-neural-machine-1
Repo
Framework
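
Local Coordinates Coding is the concrete ingredient here. A minimal sketch of the coding step, assuming inverse-distance weights over the k nearest anchors; how the anchors are learned and how the codes enter the decoder are the paper's contribution and are omitted.

```python
import numpy as np

def lcc_code(x, anchors, k=4):
    """Sparse weights over the k nearest anchors, summing to one."""
    d = np.linalg.norm(anchors - x, axis=1)
    nn = np.argsort(d)[:k]
    w = 1.0 / (d[nn] + 1e-8)                  # inverse-distance weighting
    code = np.zeros(len(anchors))
    code[nn] = w / w.sum()
    return code

rng = np.random.default_rng(0)
anchors = rng.normal(size=(64, 16))           # e.g., learned context anchors
x = rng.normal(size=16)
x_hat = lcc_code(x, anchors) @ anchors        # reconstructed context vector
```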

On the Computational Complexity of Multi-Agent Pathfinding on Directed Graphs

Title On the Computational Complexity of Multi-Agent Pathfinding on Directed Graphs
Authors Bernhard Nebel
Abstract The determination of the computational complexity of multi-agent pathfinding on directed graphs has been an open problem for many years. For undirected graphs, solvability can be decided in polynomial time, as was already shown in the eighties. Furthermore, it has recently been shown that a special case on directed graphs is solvable in polynomial time. In this paper, we show that the problem is NP-hard in the general case. In addition, some upper bounds are proven.
Tasks
Published 2019-11-11
URL https://arxiv.org/abs/1911.04871v1
PDF https://arxiv.org/pdf/1911.04871v1.pdf
PWC https://paperswithcode.com/paper/on-the-computational-complexity-of-multi
Repo
Framework

A Performance Comparison of Loss Functions for Deep Face Recognition

Title A Performance Comparison of Loss Functions for Deep Face Recognition
Authors Yash Srivastava, Vaishnav Murali, Shiv Ram Dubey
Abstract Face recognition is one of the most widely publicized features in devices today and hence represents an important problem that should be studied with the utmost priority. As per recent trends, Convolutional Neural Network (CNN) based approaches are highly successful in many Computer Vision tasks, including face recognition. The loss function is used on top of the CNN to judge the goodness of any network. In this paper, we present a performance comparison of different loss functions such as Cross-Entropy, Angular Softmax, Additive-Margin Softmax, ArcFace, and Marginal Loss for face recognition. The experiments are conducted with two CNN architectures, namely ResNet and MobileNet. Two widely used face datasets, namely CASIA-Webface and MS-Celeb-1M, are used for training, and the benchmark Labeled Faces in the Wild (LFW) dataset is used for testing.
Tasks Face Recognition
Published 2019-01-01
URL https://arxiv.org/abs/1901.05903v2
PDF https://arxiv.org/pdf/1901.05903v2.pdf
PWC https://paperswithcode.com/paper/a-performance-comparison-of-loss-functions
Repo
Framework
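
One of the compared losses is easy to state in code. A minimal numpy sketch of Additive-Margin Softmax: logits are cosine similarities between L2-normalized features and class weights, with a margin subtracted from the target logit and a scale applied. The values s = 30 and m = 0.35 are common defaults, not necessarily the paper's settings.

```python
import numpy as np

def am_softmax_loss(feat, W, labels, s=30.0, m=0.35):
    f = feat / np.linalg.norm(feat, axis=1, keepdims=True)
    w = W / np.linalg.norm(W, axis=0, keepdims=True)
    cos = f @ w                                   # (batch, num_classes)
    cos[np.arange(len(labels)), labels] -= m      # additive margin on target
    logits = s * cos
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
loss = am_softmax_loss(rng.normal(size=(8, 128)),     # embeddings
                       rng.normal(size=(128, 10)),    # class weights
                       rng.integers(0, 10, size=8))   # identity labels
```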

Classification of Cognitive Load and Expertise for Adaptive Simulation using Deep Multitask Learning

Title Classification of Cognitive Load and Expertise for Adaptive Simulation using Deep Multitask Learning
Authors Pritam Sarkar, Kyle Ross, Aaron J. Ruberto, Dirk Rodenburg, Paul Hungler, Ali Etemad
Abstract Simulations are a pedagogical means of providing a risk-free way for healthcare practitioners to learn, maintain, or enhance their knowledge and skills. Such simulations should provide an optimum amount of cognitive load to the learner and be tailored to their levels of expertise. However, most current simulations are a one-type-fits-all tool used to train different learners regardless of their existing skills, expertise, and ability to handle cognitive load. To address this problem, we propose an end-to-end framework for a trauma simulation that actively classifies a participant’s level of cognitive load and expertise for the development of a dynamically adaptive simulation. To facilitate this solution, trauma simulations were developed for the collection of electrocardiogram (ECG) signals from both novice and expert practitioners. A multitask deep neural network was developed to utilize this data and classify high and low cognitive load, as well as expert and novice participants. A leave-one-subject-out (LOSO) validation was used to evaluate the effectiveness of our model, achieving an accuracy of 89.4% and 96.6% for the classification of cognitive load and expertise, respectively.
Tasks
Published 2019-07-31
URL https://arxiv.org/abs/1908.00385v1
PDF https://arxiv.org/pdf/1908.00385v1.pdf
PWC https://paperswithcode.com/paper/classification-of-cognitive-load-and
Repo
Framework
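
The multitask setup can be sketched generically: a shared encoder over ECG-derived features feeding two classification heads. A hedged PyTorch illustration; the layer sizes, input featurization, and equal loss weighting are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MultitaskECG(nn.Module):
    def __init__(self, in_dim=64, hidden=128):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, hidden), nn.ReLU())
        self.load_head = nn.Linear(hidden, 2)       # high vs. low load
        self.expertise_head = nn.Linear(hidden, 2)  # expert vs. novice

    def forward(self, x):
        h = self.shared(x)
        return self.load_head(h), self.expertise_head(h)

model = MultitaskECG()
x = torch.randn(16, 64)                             # ECG feature vectors
load_logits, exp_logits = model(x)
loss = (nn.functional.cross_entropy(load_logits, torch.randint(0, 2, (16,)))
        + nn.functional.cross_entropy(exp_logits, torch.randint(0, 2, (16,))))
```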

Categorization of Program Regions for Agile Compilation using Machine Learning and Hardware Support

Title Categorization of Program Regions for Agile Compilation using Machine Learning and Hardware Support
Authors Sanket Tavarageri
Abstract A compiler processes code written in a high-level language and produces machine-executable code. Compiler writers often face the challenge of keeping compilation times reasonable, because aggressive optimization passes that can yield high performance are often expensive in terms of running time and memory footprint. Consequently, compiler designers arrive at a compromise: they either simplify the optimization algorithm, which may decrease the performance of the produced code, or restrict the optimization to a subset of the overall input program, in which case large parts of the input application go un-optimized. The problem we address in this paper is keeping compilation times reasonable while optimizing the input program to the fullest extent possible, so that the performance of the produced code matches the performance obtained when all the aggressive optimization passes are applied over the entire input program.
Tasks
Published 2019-05-29
URL https://arxiv.org/abs/1905.12292v1
PDF https://arxiv.org/pdf/1905.12292v1.pdf
PWC https://paperswithcode.com/paper/categorization-of-program-regions-for-agile
Repo
Framework

Multi-type Resource Allocation with Partial Preferences

Title Multi-type Resource Allocation with Partial Preferences
Authors Haibin Wang, Sujoy Sikdar, Xiaoxi Guo, Lirong Xia, Yongzhi Cao, Hanpin Wang
Abstract We propose multi-type probabilistic serial (MPS) and multi-type random priority (MRP) as extensions of the well-known PS and RP mechanisms to the multi-type resource allocation problem (MTRA) with partial preferences. In our setting, there are multiple types of divisible items, and a group of agents who have partial order preferences over bundles consisting of one item of each type. We show that for the unrestricted domain of partial order preferences, no mechanism satisfies both sd-efficiency and sd-envy-freeness. Notwithstanding this impossibility result, our main message is positive: When agents’ preferences are represented by acyclic CP-nets, MPS satisfies sd-efficiency, sd-envy-freeness, ordinal fairness, and upper invariance, while MRP satisfies ex-post-efficiency, sd-strategy-proofness, and upper invariance, recovering the properties of PS and RP.
Tasks
Published 2019-06-13
URL https://arxiv.org/abs/1906.06836v2
PDF https://arxiv.org/pdf/1906.06836v2.pdf
PWC https://paperswithcode.com/paper/multi-type-resource-allocation-with-partial
Repo
Framework
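
For intuition, here is a minimal sketch of the classical single-type probabilistic serial mechanism that MPS extends: agents simultaneously "eat" their favorite remaining item at unit speed, and the fraction each agent eats of an item is its probability of receiving it. Preferences here are strict total orders; the multi-type, partial-order generalization is the paper's contribution and is not implemented.

```python
import numpy as np

def probabilistic_serial(prefs):
    """prefs[i] = agent i's ranking of the n items, best first."""
    n = len(prefs)
    supply = np.ones(n)                 # one unit of each item
    alloc = np.zeros((n, n))
    t = 0.0
    while t < 1.0 - 1e-12:
        # each agent eats its top-ranked item that still has supply
        eating = [next(j for j in p if supply[j] > 1e-12) for p in prefs]
        eaters = np.bincount(eating, minlength=n)
        # advance until the first currently-eaten item is exhausted
        dt = min(supply[j] / eaters[j] for j in set(eating))
        dt = min(dt, 1.0 - t)
        for i, j in enumerate(eating):
            alloc[i, j] += dt
            supply[j] -= dt
        supply = np.maximum(supply, 0.0)   # guard float round-off
        t += dt
    return alloc                           # rows: agents, columns: items

print(probabilistic_serial([[0, 1, 2], [0, 2, 1], [1, 0, 2]]))
```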

Response to “Visual Dialogue without Vision or Dialogue” (Massiceti et al., 2018)

Title Response to “Visual Dialogue without Vision or Dialogue” (Massiceti et al., 2018)
Authors Abhishek Das, Devi Parikh, Dhruv Batra
Abstract In a recent workshop paper, Massiceti et al. presented a baseline model and subsequent critique of Visual Dialog (Das et al., CVPR 2017) that raises what we believe to be unfounded concerns about the dataset and evaluation. This article intends to rebut the critique and clarify potential confusions for practitioners and future participants in the Visual Dialog challenge.
Tasks Visual Dialog
Published 2019-01-16
URL http://arxiv.org/abs/1901.05531v1
PDF http://arxiv.org/pdf/1901.05531v1.pdf
PWC https://paperswithcode.com/paper/response-to-visual-dialogue-without-vision-or
Repo
Framework

Word-level Lexical Normalisation using Context-Dependent Embeddings

Title Word-level Lexical Normalisation using Context-Dependent Embeddings
Authors Michael Stewart, Wei Liu, Rachel Cardell-Oliver
Abstract Lexical normalisation (LN) is the process of correcting each word in a dataset to its canonical form so that it may be more easily and more accurately analysed. Most lexical normalisation systems operate at the character level, while word-level models are seldom used. Recent language models offer solutions to the drawbacks of word-level LN models, yet, to the best of our knowledge, no research has investigated their effectiveness on LN. In this paper we introduce a word-level GRU-based LN model and investigate the effectiveness of recent embedding techniques on word-level LN. Our results show that our GRU-based word-level model produces better results than character-level models, and outperforms existing deep-learning-based LN techniques on Twitter data. We also find that randomly-initialised embeddings are capable of outperforming pre-trained embedding models in certain scenarios. Finally, we release a substantial lexical normalisation dataset to the community.
Tasks
Published 2019-11-13
URL https://arxiv.org/abs/1911.06172v1
PDF https://arxiv.org/pdf/1911.06172v1.pdf
PWC https://paperswithcode.com/paper/word-level-lexical-normalisation-using
Repo
Framework
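
A word-level model of the kind described can be sketched in a few lines of PyTorch: embed each (possibly noisy) token, run a bidirectional GRU over the sentence for context, and classify every position into a canonical-form vocabulary. Dimensions and vocabulary sizes are illustrative, not the authors' configuration.

```python
import torch
import torch.nn as nn

class WordLevelNormaliser(nn.Module):
    def __init__(self, noisy_vocab=20_000, canon_vocab=20_000,
                 emb_dim=300, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(noisy_vocab, emb_dim)
        self.gru = nn.GRU(emb_dim, hidden, batch_first=True,
                          bidirectional=True)
        self.out = nn.Linear(2 * hidden, canon_vocab)

    def forward(self, tokens):              # tokens: (batch, seq_len)
        h, _ = self.gru(self.emb(tokens))
        return self.out(h)                  # per-token canonical-form logits

model = WordLevelNormaliser()
logits = model(torch.randint(0, 20_000, (2, 12)))   # (2, 12, canon_vocab)
```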

Iterative Algorithm for Discrete Structure Recovery

Title Iterative Algorithm for Discrete Structure Recovery
Authors Chao Gao, Anderson Y. Zhang
Abstract We propose a general modeling and algorithmic framework for discrete structure recovery that can be applied to a wide range of problems. Under this framework, we are able to study the recovery of clustering labels, ranks of players, and signs of regression coefficients from a unified perspective. A simple iterative algorithm is proposed for discrete structure recovery, which generalizes methods including Lloyd’s algorithm and the iterative feature matching algorithm. A linear convergence result for the proposed algorithm is established under appropriate abstract conditions on the stochastic errors and the initialization. We illustrate our general theory by applying it to three representative problems: clustering in the Gaussian mixture model, approximate ranking, and sign recovery in compressed sensing, and show that the minimax rate is achieved in each case.
Tasks
Published 2019-11-04
URL https://arxiv.org/abs/1911.01018v1
PDF https://arxiv.org/pdf/1911.01018v1.pdf
PWC https://paperswithcode.com/paper/iterative-algorithm-for-discrete-structure
Repo
Framework
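
The canonical special case of the proposed iterate is Lloyd's algorithm for recovering clustering labels. A minimal numpy version of that special case, alternating a discrete step (re-estimating labels) with a continuous step (re-estimating centers):

```python
import numpy as np

def lloyd(X, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)               # discrete structure
        for j in range(k):                          # continuous parameters
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.3, size=(50, 2)) for m in (0.0, 3.0)])
labels, centers = lloyd(X, k=2)
```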

Using Bursty Announcements for Early Detection of BGP Routing Anomalies

Title Using Bursty Announcements for Early Detection of BGP Routing Anomalies
Authors Pablo Moriano, Raquel Hill, L. Jean Camp
Abstract Despite the robust structure of the Internet, it is still susceptible to disruptive routing updates that prevent network traffic from reaching its destination. In this work, we propose a method for the early detection of large-scale disruptions based on the analysis of bursty BGP announcements. We hypothesize that the occurrence of large-scale disruptions is preceded by bursty announcements. Our method is grounded in the analysis of changes in the inter-arrival times of announcements. BGP announcements that are associated with disruptive updates tend to occur in groups of relatively high frequency, followed by periods of infrequent activity. To test our hypothesis, we quantify the burstiness of inter-arrival times around the dates and times of three large-scale incidents: the Indosat hijacking event in April 2014, the Telecom Malaysia leak in June 2015, and the Bharti Airtel Ltd. hijack in November 2015. We show that we can detect these events several hours before they were originally detected. We propose an algorithm that leverages the burstiness of disruptive updates to provide early detection of large-scale malicious incidents using local collector data. We describe limitations, open challenges, and how this method can be used for large-scale routing anomaly detection.
Tasks Anomaly Detection
Published 2019-05-14
URL https://arxiv.org/abs/1905.05835v1
PDF https://arxiv.org/pdf/1905.05835v1.pdf
PWC https://paperswithcode.com/paper/using-bursty-announcements-for-early
Repo
Framework
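
The burstiness signal itself is simple to compute; a common choice is the Goh-Barabási coefficient B = (sigma - mu) / (sigma + mu) of the inter-arrival times, which is near -1 for regular traffic, near 0 for Poisson-like traffic, and approaches 1 for highly bursty streams. Whether this is the paper's exact statistic, and the window and threshold values below, are assumptions for illustration.

```python
import numpy as np

def burstiness(timestamps):
    """Goh-Barabasi burstiness of a stream of announcement times."""
    iat = np.diff(np.sort(np.asarray(timestamps, dtype=float)))
    mu, sigma = iat.mean(), iat.std()
    return (sigma - mu) / (sigma + mu)

def flag_windows(timestamps, window=200, threshold=0.5):
    """Flag fixed-size windows of announcements as suspiciously bursty."""
    ts = np.sort(np.asarray(timestamps, dtype=float))
    return [i for i in range(0, len(ts) - window, window)
            if burstiness(ts[i:i + window]) > threshold]
```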