July 30, 2019

3061 words 15 mins read

Paper Group AWR 1

Online Learning Rate Adaptation with Hypergradient Descent. Decoupled Weight Decay Regularization. Learning with Opponent-Learning Awareness. End-to-end Recurrent Neural Network Models for Vietnamese Named Entity Recognition: Word-level vs. Character-level. An Ensemble Deep Learning Based Approach for Red Lesion Detection in Fundus Images. One Model …

Online Learning Rate Adaptation with Hypergradient Descent

Title Online Learning Rate Adaptation with Hypergradient Descent
Authors Atilim Gunes Baydin, Robert Cornish, David Martinez Rubio, Mark Schmidt, Frank Wood
Abstract We introduce a general method for improving the convergence rate of gradient-based optimizers that is easy to implement and works well in practice. We demonstrate the effectiveness of the method in a range of optimization problems by applying it to stochastic gradient descent, stochastic gradient descent with Nesterov momentum, and Adam, showing that it significantly reduces the need for the manual tuning of the initial learning rate for these commonly used algorithms. Our method works by dynamically updating the learning rate during optimization using the gradient with respect to the learning rate of the update rule itself. Computing this “hypergradient” needs little additional computation, requires only one extra copy of the original gradient to be stored in memory, and relies upon nothing more than what is provided by reverse-mode automatic differentiation.
Tasks Stochastic Optimization
Published 2017-03-14
URL http://arxiv.org/abs/1703.04782v3
PDF http://arxiv.org/pdf/1703.04782v3.pdf
PWC https://paperswithcode.com/paper/online-learning-rate-adaptation-with
Repo https://github.com/awslabs/adatune
Framework pytorch
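
A minimal sketch of the hypergradient update described in the abstract, applied to plain SGD: the learning rate is nudged by the dot product of the current and previous gradients (the "hypergradient"). The helper name and `beta` are illustrative, not taken from the linked repo.

```python
import torch

def sgd_hd_step(params, grads, prev_grads, lr, beta=1e-4):
    # d(loss)/d(lr) = -g_t . g_{t-1}, so descending on lr adds +beta * g_t . g_{t-1}
    h = sum((g * pg).sum() for g, pg in zip(grads, prev_grads))
    lr = lr + beta * float(h)
    for p, g in zip(params, grads):
        p.data.add_(g, alpha=-lr)          # ordinary SGD step with the adapted rate
    return lr

# Inside a training loop (prev_grads starts as zeros shaped like params):
#   grads = torch.autograd.grad(loss, params)
#   lr = sgd_hd_step(params, grads, prev_grads, lr)
#   prev_grads = [g.detach() for g in grads]
```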

Decoupled Weight Decay Regularization

Title Decoupled Weight Decay Regularization
Authors Ilya Loshchilov, Frank Hutter
Abstract L$_2$ regularization and weight decay regularization are equivalent for standard stochastic gradient descent (when rescaled by the learning rate), but as we demonstrate this is \emph{not} the case for adaptive gradient algorithms, such as Adam. While common implementations of these algorithms employ L$_2$ regularization (often calling it “weight decay”, which may be misleading given the inequivalence we expose), we propose a simple modification to recover the original formulation of weight decay regularization by \emph{decoupling} the weight decay from the optimization steps taken w.r.t. the loss function. We provide empirical evidence that our proposed modification (i) decouples the optimal choice of weight decay factor from the setting of the learning rate for both standard SGD and Adam and (ii) substantially improves Adam’s generalization performance, allowing it to compete with SGD with momentum on image classification datasets (on which it was previously typically outperformed by the latter). Our proposed decoupled weight decay has already been adopted by many researchers, and the community has implemented it in TensorFlow and PyTorch; the complete source code for our experiments is available at https://github.com/loshchil/AdamW-and-SGDW
Tasks Image Classification
Published 2017-11-14
URL http://arxiv.org/abs/1711.05101v3
PDF http://arxiv.org/pdf/1711.05101v3.pdf
PWC https://paperswithcode.com/paper/decoupled-weight-decay-regularization
Repo https://github.com/MattSegal/fastai-notes
Framework pytorch
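
A sketch of the decoupled update: weight decay is applied directly to the weights and never enters the gradient that Adam's moment estimates see. PyTorch now ships this as torch.optim.AdamW; the explicit step below only makes the difference from L2-regularised Adam visible, with hyperparameter defaults chosen for illustration.

```python
import torch

def adamw_step(p, g, m, v, t, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, wd=1e-2):
    p.data.mul_(1 - lr * wd)                                # decoupled weight decay
    m.mul_(betas[0]).add_(g, alpha=1 - betas[0])            # biased first moment
    v.mul_(betas[1]).addcmul_(g, g, value=1 - betas[1])     # biased second moment
    m_hat = m / (1 - betas[0] ** t)                         # bias correction, t starts at 1
    v_hat = v / (1 - betas[1] ** t)
    p.data.addcdiv_(m_hat, v_hat.sqrt() + eps, value=-lr)   # plain Adam step on the loss gradient
```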

Learning with Opponent-Learning Awareness

Title Learning with Opponent-Learning Awareness
Authors Jakob N. Foerster, Richard Y. Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, Igor Mordatch
Abstract Multi-agent settings are quickly gathering importance in machine learning. This includes a plethora of recent work on deep multi-agent reinforcement learning, but also can be extended to hierarchical RL, generative adversarial networks and decentralised optimisation. In all these settings the presence of multiple learning agents renders the training problem non-stationary and often leads to unstable training or undesired final results. We present Learning with Opponent-Learning Awareness (LOLA), a method in which each agent shapes the anticipated learning of the other agents in the environment. The LOLA learning rule includes a term that accounts for the impact of one agent’s policy on the anticipated parameter update of the other agents. Results show that the encounter of two LOLA agents leads to the emergence of tit-for-tat and therefore cooperation in the iterated prisoners’ dilemma, while independent learning does not. In this domain, LOLA also receives higher payouts compared to a naive learner, and is robust against exploitation by higher order gradient-based methods. Applied to repeated matching pennies, LOLA agents converge to the Nash equilibrium. In a round robin tournament we show that LOLA agents successfully shape the learning of a range of multi-agent learning algorithms from literature, resulting in the highest average returns on the IPD. We also show that the LOLA update rule can be efficiently calculated using an extension of the policy gradient estimator, making the method suitable for model-free RL. The method thus scales to large parameter and input spaces and nonlinear function approximators. We apply LOLA to a grid world task with an embedded social dilemma using recurrent policies and opponent modelling. By explicitly considering the learning of the other agent, LOLA agents learn to cooperate out of self-interest. The code is at github.com/alshedivat/lola.
Tasks Multi-agent Reinforcement Learning
Published 2017-09-13
URL http://arxiv.org/abs/1709.04326v4
PDF http://arxiv.org/pdf/1709.04326v4.pdf
PWC https://paperswithcode.com/paper/learning-with-opponent-learning-awareness
Repo https://github.com/alexis-jacq/LOLA_DICE
Framework pytorch
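
A toy sketch of the LOLA correction for a two-player differentiable game with value functions V1(theta1, theta2) and V2(theta1, theta2): agent 1's gradient gains a second-order term from differentiating through the opponent's anticipated naive update. This is not the paper's model-free policy-gradient estimator; `V1`, `V2`, and `eta` are stand-ins, and V2 is assumed to depend on both parameter vectors.

```python
import torch

def lola_grad_agent1(theta1, theta2, V1, V2, eta=1.0):
    # Naive gradients of agent 1's own value.
    dV1_d1, dV1_d2 = torch.autograd.grad(V1(theta1, theta2), (theta1, theta2))
    # Opponent's naive update direction, kept in the graph so it can be
    # differentiated with respect to theta1.
    dV2_d2 = torch.autograd.grad(V2(theta1, theta2), theta2, create_graph=True)[0]
    # Shaping term: eta * (d^2 V2 / d theta1 d theta2)^T (d V1 / d theta2)
    shaping = torch.autograd.grad((dV1_d2.detach() * dV2_d2).sum(), theta1)[0]
    return dV1_d1 + eta * shaping

# theta1 = torch.zeros(2, requires_grad=True); theta2 = torch.ones(2, requires_grad=True)
# g1 = lola_grad_agent1(theta1, theta2, V1, V2)   # ascend with theta1 += lr * g1
```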

End-to-end Recurrent Neural Network Models for Vietnamese Named Entity Recognition: Word-level vs. Character-level

Title End-to-end Recurrent Neural Network Models for Vietnamese Named Entity Recognition: Word-level vs. Character-level
Authors Thai-Hoang Pham, Phuong Le-Hong
Abstract This paper demonstrates end-to-end neural network architectures for Vietnamese named entity recognition. Our best model is a combination of bidirectional Long Short-Term Memory (Bi-LSTM), Convolutional Neural Network (CNN), and Conditional Random Field (CRF), using pre-trained word embeddings as input, which achieves an F1 score of 88.59% on a standard test set. Our system achieves performance comparable to the first-rank system of the VLSP campaign without using any syntactic or hand-crafted features. We also present an extensive empirical study on using common deep learning models for Vietnamese NER, at both the word and character level.
Tasks Named Entity Recognition, Word Embeddings
Published 2017-05-11
URL http://arxiv.org/abs/1705.04044v3
PDF http://arxiv.org/pdf/1705.04044v3.pdf
PWC https://paperswithcode.com/paper/end-to-end-recurrent-neural-network-models
Repo https://github.com/pth1993/NNVLP
Framework none
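
A minimal sketch of the word plus character architecture described above: a character-level CNN builds a per-word feature that is concatenated with a pre-trained word embedding and fed to a Bi-LSTM. The CRF layer that would decode the emission scores is omitted for brevity; dimensions and names are illustrative, not the paper's settings.

```python
import torch
import torch.nn as nn

class BiLSTMCharCNN(nn.Module):
    def __init__(self, n_words, n_chars, n_tags, w_dim=300, c_dim=30, c_filters=30, hidden=200):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, w_dim)
        self.char_emb = nn.Embedding(n_chars, c_dim)
        self.char_cnn = nn.Conv1d(c_dim, c_filters, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(w_dim + c_filters, hidden, bidirectional=True, batch_first=True)
        self.emit = nn.Linear(2 * hidden, n_tags)   # emission scores; a CRF would decode these

    def forward(self, words, chars):
        # words: (batch, seq); chars: (batch, seq, max_word_len)
        b, s, l = chars.shape
        c = self.char_emb(chars.view(b * s, l)).transpose(1, 2)             # (b*s, c_dim, l)
        c = torch.relu(self.char_cnn(c)).max(dim=2).values.view(b, s, -1)   # max-pool over chars
        x = torch.cat([self.word_emb(words), c], dim=-1)
        h, _ = self.lstm(x)
        return self.emit(h)                                                 # (batch, seq, n_tags)
```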

An Ensemble Deep Learning Based Approach for Red Lesion Detection in Fundus Images

Title An Ensemble Deep Learning Based Approach for Red Lesion Detection in Fundus Images
Authors José Ignacio Orlando, Elena Prokofyeva, Mariana del Fresno, Matthew B. Blaschko
Abstract Diabetic retinopathy is one of the leading causes of preventable blindness in the world. Its earliest signs are red lesions, a general term that groups both microaneurysms and hemorrhages. In daily clinical practice, these lesions are manually detected by physicians using fundus photographs. However, this task is tedious and time-consuming, and requires an intensive effort due to the small size of the lesions and their lack of contrast. Computer-assisted diagnosis of DR based on red lesion detection is being actively explored because it improves both clinicians’ consistency and accuracy. Several methods for detecting red lesions have been proposed in the literature, most of them based on characterizing lesion candidates using hand-crafted features and classifying them as true or false positive detections. Deep learning based approaches, by contrast, are scarce in this domain due to the high cost of manually annotating the lesions. In this paper we propose a novel method for red lesion detection that combines deep-learned features with domain knowledge. Features learned by a CNN are augmented by incorporating hand-crafted features. This ensemble vector of descriptors is then used to identify true lesion candidates with a Random Forest classifier. We empirically observed that combining both sources of information significantly improves results with respect to using each approach separately. Furthermore, our method reported the highest per-lesion performance on DIARETDB1 and e-ophtha, and the highest performance for screening and need for referral on MESSIDOR compared to a second human expert. The results highlight that integrating manually engineered approaches with deep-learned features is relevant to improving results when the networks are trained from lesion-level annotated data. An open source implementation of our system is publicly available online.
Tasks
Published 2017-06-09
URL http://arxiv.org/abs/1706.03008v2
PDF http://arxiv.org/pdf/1706.03008v2.pdf
PWC https://paperswithcode.com/paper/an-ensemble-deep-learning-based-approach-for
Repo https://github.com/ignaciorlando/red-lesion-detection
Framework none
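
A sketch of the hybrid descriptor described above: CNN features for each lesion candidate are concatenated with hand-crafted features and classified with a Random Forest. Feature extraction is assumed to exist already; array names and the forest size are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# cnn_feats: (n_candidates, d_cnn), hand_feats: (n_candidates, d_hand), y: 0/1 lesion labels
def train_candidate_classifier(cnn_feats, hand_feats, y, n_trees=200):
    X = np.hstack([cnn_feats, hand_feats])       # ensemble vector of descriptors
    clf = RandomForestClassifier(n_estimators=n_trees, class_weight="balanced")
    return clf.fit(X, y)

# Probability that each test candidate is a true red lesion:
#   probs = clf.predict_proba(np.hstack([cnn_test, hand_test]))[:, 1]
```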

One Model To Learn Them All

Title One Model To Learn Them All
Authors Lukasz Kaiser, Aidan N. Gomez, Noam Shazeer, Ashish Vaswani, Niki Parmar, Llion Jones, Jakob Uszkoreit
Abstract Deep learning yields great results across many fields, from speech recognition and image classification to translation. But for each problem, getting a deep model to work well involves research into the architecture and a long period of tuning. We present a single model that yields good results on a number of problems spanning multiple domains. In particular, this single model is trained concurrently on ImageNet, multiple translation tasks, image captioning (COCO dataset), a speech recognition corpus, and an English parsing task. Our model architecture incorporates building blocks from multiple domains. It contains convolutional layers, an attention mechanism, and sparsely-gated layers. Each of these computational blocks is crucial for a subset of the tasks we train on. Interestingly, even if a block is not crucial for a task, we observe that adding it never hurts performance and in most cases improves it on all tasks. We also show that tasks with less data benefit greatly from joint training with other tasks, while performance on large tasks degrades only slightly if at all.
Tasks Image Captioning, Image Classification, Multi-Task Learning
Published 2017-06-16
URL http://arxiv.org/abs/1706.05137v1
PDF http://arxiv.org/pdf/1706.05137v1.pdf
PWC https://paperswithcode.com/paper/one-model-to-learn-them-all
Repo https://github.com/tensorflow/tensor2tensor
Framework tf
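
The abstract lists sparsely-gated layers among the shared building blocks. Below is a minimal top-k sparsely-gated mixture-of-experts layer as an illustration of that single component, not the MultiModel itself; the expert count, k, and the per-example loop (real implementations batch by expert) are illustrative choices.

```python
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    def __init__(self, d_model=512, n_experts=8, k=2):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))
        self.gate = nn.Linear(d_model, n_experts)
        self.k = k

    def forward(self, x):                               # x: (batch, d_model)
        topv, topi = self.gate(x).topk(self.k, dim=-1)  # pick k experts per example
        weights = torch.softmax(topv, dim=-1)           # renormalise over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            expert_out = torch.stack(
                [self.experts[int(e)](xi) for xi, e in zip(x, topi[:, slot])])
            out = out + weights[:, slot:slot + 1] * expert_out
        return out
```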

Learning to Customize Network Security Rules

Title Learning to Customize Network Security Rules
Authors Michael Bargury, Roy Levin, Royi Ronen
Abstract Security is a major concern for organizations who wish to leverage cloud computing. In order to reduce security vulnerabilities, public cloud providers offer firewall functionalities. When properly configured, a firewall protects cloud networks from cyber-attacks. However, proper firewall configuration requires intimate knowledge of the protected system, high expertise and on-going maintenance. As a result, many organizations do not use firewalls effectively, leaving their cloud resources vulnerable. In this paper, we present a novel supervised learning method, and prototype, which compute recommendations for firewall rules. Recommendations are based on sampled network traffic meta-data (NetFlow) collected from a public cloud provider. Labels are extracted from firewall configurations deemed to be authored by experts. NetFlow is collected from network routers, avoiding expensive collection from cloud VMs, as well as relieving privacy concerns. The proposed method captures network routines and dependencies between resources and firewall configuration. The method predicts IPs to be allowed by the firewall. A grouping algorithm is subsequently used to generate a manageable number of IP ranges. Each range is a parameter for a firewall rule. We present results of experiments on real data, showing ROC AUC of 0.92, compared to 0.58 for an unsupervised baseline. The results prove the hypothesis that firewall rules can be automatically generated based on router data, and that an automated method can be effective in blocking a high percentage of malicious traffic.
Tasks
Published 2017-12-28
URL http://arxiv.org/abs/1712.09795v1
PDF http://arxiv.org/pdf/1712.09795v1.pdf
PWC https://paperswithcode.com/paper/learning-to-customize-network-security-rules
Repo https://github.com/mibarg/IP-Grouping
Framework none
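
A sketch of the post-processing step described above: individual IPs predicted as "allow" are grouped into a manageable number of ranges that can become firewall rule parameters. The gap-based grouping heuristic is illustrative, not the paper's exact grouping algorithm, and a non-empty input list is assumed.

```python
import ipaddress

def group_allowed_ips(allowed_ips, max_gap=256):
    addrs = sorted(ipaddress.ip_address(ip) for ip in allowed_ips)
    ranges, start, prev = [], addrs[0], addrs[0]
    for a in addrs[1:]:
        if int(a) - int(prev) > max_gap:          # start a new range at large gaps
            ranges.append((start, prev))
            start = a
        prev = a
    ranges.append((start, prev))
    # Convert each contiguous range into a minimal list of CIDR blocks.
    return [list(ipaddress.summarize_address_range(lo, hi)) for lo, hi in ranges]

# Example: group_allowed_ips(["10.0.0.4", "10.0.0.7", "10.0.9.1"])
```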

Learning to Explain Non-Standard English Words and Phrases

Title Learning to Explain Non-Standard English Words and Phrases
Authors Ke Ni, William Yang Wang
Abstract We describe a data-driven approach for automatically explaining new, non-standard English expressions in a given sentence, building on a large dataset that includes 15 years of crowdsourced examples from UrbanDictionary.com. Unlike prior studies that focus on matching keywords from a slang dictionary, we investigate the possibility of learning a neural sequence-to-sequence model that generates explanations of unseen non-standard English expressions given context. We propose a dual encoder approach: a word-level encoder learns the representation of the context, and a character-level encoder learns the hidden representation of the target non-standard expression. Our model can produce reasonable definitions of new non-standard English expressions given their context with certain confidence.
Tasks
Published 2017-09-26
URL http://arxiv.org/abs/1709.09254v1
PDF http://arxiv.org/pdf/1709.09254v1.pdf
PWC https://paperswithcode.com/paper/learning-to-explain-non-standard-english
Repo https://github.com/yonguno/cs410explain
Framework pytorch
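
A minimal sketch of the dual-encoder idea described above: a word-level encoder reads the sentence context and a character-level encoder reads the target expression, and their final states are combined to initialise the explanation decoder. Dimensions, the GRU choice, and all names are illustrative assumptions, not the paper's exact model.

```python
import torch
import torch.nn as nn

class DualEncoder(nn.Module):
    def __init__(self, n_words, n_chars, d=256):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, d)
        self.char_emb = nn.Embedding(n_chars, d)
        self.ctx_enc = nn.GRU(d, d, batch_first=True)     # word-level context encoder
        self.tgt_enc = nn.GRU(d, d, batch_first=True)     # char-level target encoder
        self.decoder = nn.GRU(d, d, batch_first=True)
        self.merge = nn.Linear(2 * d, d)
        self.out = nn.Linear(d, n_words)

    def forward(self, context, target_chars, explanation_in):
        _, h_ctx = self.ctx_enc(self.word_emb(context))
        _, h_tgt = self.tgt_enc(self.char_emb(target_chars))
        h0 = torch.tanh(self.merge(torch.cat([h_ctx, h_tgt], dim=-1)))  # (1, batch, d)
        dec, _ = self.decoder(self.word_emb(explanation_in), h0)
        return self.out(dec)          # per-step logits over the explanation vocabulary
```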

Recurrent Soft Attention Model for Common Object Recognition

Title Recurrent Soft Attention Model for Common Object Recognition
Authors Liliang Ren
Abstract We propose the Recurrent Soft Attention Model, which integrates visual attention from the original image into an LSTM memory cell through a down-sample network. The model recurrently transmits visual attention to the memory cells for glimpse mask generation, which is a more natural way to integrate and exploit attention in general object detection and recognition problems. We evaluate our model using top-1 accuracy on the CIFAR-10 dataset. The experiments show that our down-sample network and feedback mechanism play an effective role within the overall network structure.
Tasks Object Detection, Object Recognition
Published 2017-05-04
URL http://arxiv.org/abs/1705.01921v2
PDF http://arxiv.org/pdf/1705.01921v2.pdf
PWC https://paperswithcode.com/paper/recurrent-soft-attention-model-for-common
Repo https://github.com/renll/RSAM
Framework pytorch

Annotating Object Instances with a Polygon-RNN

Title Annotating Object Instances with a Polygon-RNN
Authors Lluis Castrejon, Kaustav Kundu, Raquel Urtasun, Sanja Fidler
Abstract We propose an approach for semi-automatic annotation of object instances. While most current methods treat object segmentation as a pixel-labeling problem, we here cast it as a polygon prediction task, mimicking how most current datasets have been annotated. In particular, our approach takes as input an image crop and sequentially produces vertices of the polygon outlining the object. This allows a human annotator to intervene at any time and correct a vertex if needed, producing a segmentation as accurate as the annotator desires. We show that our approach speeds up the annotation process by a factor of 4.7 across all classes in Cityscapes, while achieving 78.4% agreement in IoU with the original ground truth, matching the typical agreement between human annotators. For cars, our speed-up factor is 7.3 for an agreement of 82.2%. We further show the generalization capabilities of our approach to unseen datasets.
Tasks Semantic Segmentation
Published 2017-04-18
URL http://arxiv.org/abs/1704.05548v1
PDF http://arxiv.org/pdf/1704.05548v1.pdf
PWC https://paperswithcode.com/paper/annotating-object-instances-with-a-polygon
Repo https://github.com/AidanRocke/vertex_prediction
Framework tf

Sentiment Polarity Detection for Software Development

Title Sentiment Polarity Detection for Software Development
Authors Fabio Calefato, Filippo Lanubile, Federico Maiorano, Nicole Novielli
Abstract The role of sentiment analysis is increasingly emerging to study software developers’ emotions by mining crowd-generated content within social software engineering tools. However, off-the-shelf sentiment analysis tools have been trained on non-technical domains and general-purpose social media, thus resulting in misclassifications of technical jargon and problem reports. Here, we present Senti4SD, a classifier specifically trained to support sentiment analysis in developers’ communication channels. Senti4SD is trained and validated using a gold standard of Stack Overflow questions, answers, and comments manually annotated for sentiment polarity. It exploits a suite of both lexicon- and keyword-based features, as well as semantic features based on word embedding. With respect to a mainstream off-the-shelf tool, which we use as a baseline, Senti4SD reduces the misclassifications of neutral and positive posts as emotionally negative. To encourage replications, we release a lab package including the classifier, the word embedding space, and the gold standard with annotation guidelines.
Tasks Sentiment Analysis
Published 2017-09-09
URL http://arxiv.org/abs/1709.02984v2
PDF http://arxiv.org/pdf/1709.02984v2.pdf
PWC https://paperswithcode.com/paper/sentiment-polarity-detection-for-software
Repo https://github.com/collab-uniba/Senti4SD
Framework none
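
A simplified sketch of the feature combination described above: keyword-based (bag-of-words), lexicon-based, and word-embedding-based features are concatenated and fed to a linear classifier. The exact Senti4SD feature suite and classifier live in the linked repo; `lexicon_score` and `embed` here are stand-in callables supplied by the user.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

def build_features(posts, lexicon_score, embed):
    bow = TfidfVectorizer(max_features=5000).fit_transform(posts).toarray()
    lex = np.array([[lexicon_score(p)] for p in posts])   # e.g. summed word polarities
    emb = np.array([embed(p) for p in posts])             # e.g. mean word vector of the post
    return np.hstack([bow, lex, emb])

# X = build_features(posts, lexicon_score, embed)
# clf = LinearSVC().fit(X, labels)   # labels: positive / negative / neutral
```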

Inductive Representation Learning on Large Graphs

Title Inductive Representation Learning on Large Graphs
Authors William L. Hamilton, Rex Ying, Jure Leskovec
Abstract Low-dimensional embeddings of nodes in large graphs have proved extremely useful in a variety of prediction tasks, from content recommendation to identifying protein functions. However, most existing approaches require that all nodes in the graph are present during training of the embeddings; these previous approaches are inherently transductive and do not naturally generalize to unseen nodes. Here we present GraphSAGE, a general, inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data. Instead of training individual embeddings for each node, we learn a function that generates embeddings by sampling and aggregating features from a node’s local neighborhood. Our algorithm outperforms strong baselines on three inductive node-classification benchmarks: we classify the category of unseen nodes in evolving information graphs based on citation and Reddit post data, and we show that our algorithm generalizes to completely unseen graphs using a multi-graph dataset of protein-protein interactions.
Tasks Link Prediction, Node Classification, Representation Learning
Published 2017-06-07
URL http://arxiv.org/abs/1706.02216v4
PDF http://arxiv.org/pdf/1706.02216v4.pdf
PWC https://paperswithcode.com/paper/inductive-representation-learning-on-large
Repo https://github.com/williamleif/GraphSAGE
Framework tf
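
A minimal sketch of one GraphSAGE layer with mean aggregation and neighbour sampling, as described above: each node's new embedding is computed from its own features and a fixed-size sample of its neighbours' features. Sizes and names are illustrative; the linked repo contains the full multi-layer, minibatched model.

```python
import random
import torch
import torch.nn as nn

class MeanSAGELayer(nn.Module):
    def __init__(self, in_dim, out_dim, n_samples=10):
        super().__init__()
        self.lin = nn.Linear(2 * in_dim, out_dim)
        self.n_samples = n_samples

    def forward(self, feats, adj_lists, nodes):
        # feats: (N, in_dim); adj_lists: dict node -> list of neighbour ids; nodes: list of ids
        agg = []
        for v in nodes:
            neigh = adj_lists[v]
            sample = random.choices(neigh, k=self.n_samples) if neigh else [v]
            agg.append(feats[sample].mean(dim=0))        # mean aggregator over sampled neighbours
        agg = torch.stack(agg)
        h = torch.cat([feats[nodes], agg], dim=-1)       # concatenate self and neighbourhood
        return torch.relu(self.lin(h))
```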

MDP environments for the OpenAI Gym

Title MDP environments for the OpenAI Gym
Authors Andreas Kirsch
Abstract The OpenAI Gym provides researchers and enthusiasts with simple-to-use environments for reinforcement learning. Even the simplest environments have a level of complexity that can obfuscate the inner workings of RL approaches and make debugging difficult. This whitepaper describes a Python framework that makes it very easy to create simple Markov Decision Process environments programmatically by specifying state transitions and rewards of deterministic and non-deterministic MDPs in a domain-specific language in Python. It then presents results and visualizations created with this MDP framework.
Tasks
Published 2017-09-26
URL http://arxiv.org/abs/1709.09069v1
PDF http://arxiv.org/pdf/1709.09069v1.pdf
PWC https://paperswithcode.com/paper/mdp-environments-for-the-openai-gym
Repo https://github.com/BlackHC/mdp
Framework none
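
The whitepaper is about specifying states, transitions, and rewards programmatically. The tiny tabular spec below illustrates the idea in plain Python; it is deliberately not the actual DSL of the linked blackhc/mdp package, whose API should be read from the repo.

```python
import random

# (state, action) -> list of (probability, next_state, reward)
TRANSITIONS = {
    ("start", "left"):  [(1.0, "end", 0.0)],
    ("start", "right"): [(0.5, "end", 1.0), (0.5, "start", 0.0)],
}

def step(state, action):
    outcomes = TRANSITIONS[(state, action)]
    r, acc = random.random(), 0.0
    for prob, nxt, reward in outcomes:
        acc += prob
        if r <= acc:
            return nxt, reward
    return outcomes[-1][1], outcomes[-1][2]   # guard against float rounding

# next_state, reward = step("start", "right")
```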

Contextual Regression: An Accurate and Conveniently Interpretable Nonlinear Model for Mining Discovery from Scientific Data

Title Contextual Regression: An Accurate and Conveniently Interpretable Nonlinear Model for Mining Discovery from Scientific Data
Authors Chengyu Liu, Wei Wang
Abstract Machine learning algorithms such as linear regression, SVMs, and neural networks have played an increasingly important role in the process of scientific discovery. However, none of them is both interpretable and accurate on nonlinear datasets. Here we present contextual regression, a method that joins these two desirable properties together using a hybrid architecture of a neural network embedding and a dot product layer. We demonstrate its high prediction accuracy and sensitivity through the task of predictive feature selection on a simulated dataset and the application of predicting open chromatin sites in the human genome. On the simulated data, our method achieved high-fidelity recovery of feature contributions under random noise levels up to 200%. On the open chromatin dataset, our method not only outperformed the state-of-the-art method in terms of accuracy, but also revealed two previously unreported open-chromatin-related histone marks. Our method fills the gap of accurate and interpretable nonlinear modeling in scientific data mining tasks.
Tasks Feature Selection, Network Embedding
Published 2017-10-30
URL http://arxiv.org/abs/1710.10728v1
PDF http://arxiv.org/pdf/1710.10728v1.pdf
PWC https://paperswithcode.com/paper/contextual-regression-an-accurate-and
Repo https://github.com/HomoSapienLCY/Contextual_Regression
Framework tf
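
A minimal sketch of the contextual regression architecture described above: an embedding network maps each input to a vector of context-dependent weights, and the prediction is the dot product of those weights with the input features, so each weight is readable as that feature's contribution. Layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class ContextualRegression(nn.Module):
    def __init__(self, n_features, hidden=128):
        super().__init__()
        self.context_net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, n_features),           # one weight per input feature
        )

    def forward(self, x):                             # x: (batch, n_features)
        w = self.context_net(x)                       # context-dependent weights
        y = (w * x).sum(dim=-1)                       # dot-product layer
        return y, w                                   # w * x gives per-feature contributions
```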

Deep Mutual Learning

Title Deep Mutual Learning
Authors Ying Zhang, Tao Xiang, Timothy M. Hospedales, Huchuan Lu
Abstract Model distillation is an effective and widely used technique to transfer knowledge from a teacher to a student network. The typical application is to transfer from a powerful large network or ensemble to a small network that is better suited to low-memory or fast-execution requirements. In this paper, we present a deep mutual learning (DML) strategy where, rather than one-way transfer between a static, pre-defined teacher and a student, an ensemble of students learn collaboratively and teach each other throughout the training process. Our experiments show that a variety of network architectures benefit from mutual learning and achieve compelling results on CIFAR-100 recognition and Market-1501 person re-identification benchmarks. Surprisingly, no powerful prior teacher network is necessary – mutual learning of a collection of simple student networks works, and moreover outperforms distillation from a more powerful yet static teacher.
Tasks Person Re-Identification
Published 2017-06-01
URL http://arxiv.org/abs/1706.00384v1
PDF http://arxiv.org/pdf/1706.00384v1.pdf
PWC https://paperswithcode.com/paper/deep-mutual-learning
Repo https://github.com/shubhamtyagii/Aligned_Reid
Framework pytorch
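
A minimal sketch of the mutual-learning objective described above for two students: each is trained with its own supervised cross-entropy plus a KL term that pulls its predictions towards the other student's. The two-student setting and the shared backward pass are illustrative simplifications.

```python
import torch
import torch.nn.functional as F

def dml_losses(logits1, logits2, labels):
    ce1 = F.cross_entropy(logits1, labels)
    ce2 = F.cross_entropy(logits2, labels)
    # KL(p2 || p1) for student 1 and KL(p1 || p2) for student 2; the peer's
    # distribution is detached so each KL term only updates its own student.
    kl1 = F.kl_div(F.log_softmax(logits1, dim=1),
                   F.softmax(logits2, dim=1).detach(), reduction="batchmean")
    kl2 = F.kl_div(F.log_softmax(logits2, dim=1),
                   F.softmax(logits1, dim=1).detach(), reduction="batchmean")
    return ce1 + kl1, ce2 + kl2

# loss1, loss2 = dml_losses(net1(x), net2(x), y)
# (loss1 + loss2).backward()   # or step each student's optimiser separately
```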