Paper Group AWR 1
Online Learning Rate Adaptation with Hypergradient Descent
Title | Online Learning Rate Adaptation with Hypergradient Descent |
Authors | Atilim Gunes Baydin, Robert Cornish, David Martinez Rubio, Mark Schmidt, Frank Wood |
Abstract | We introduce a general method for improving the convergence rate of gradient-based optimizers that is easy to implement and works well in practice. We demonstrate the effectiveness of the method in a range of optimization problems by applying it to stochastic gradient descent, stochastic gradient descent with Nesterov momentum, and Adam, showing that it significantly reduces the need for the manual tuning of the initial learning rate for these commonly used algorithms. Our method works by dynamically updating the learning rate during optimization using the gradient with respect to the learning rate of the update rule itself. Computing this “hypergradient” needs little additional computation, requires only one extra copy of the original gradient to be stored in memory, and relies upon nothing more than what is provided by reverse-mode automatic differentiation. |
Tasks | Stochastic Optimization |
Published | 2017-03-14 |
URL | http://arxiv.org/abs/1703.04782v3 |
PDF | http://arxiv.org/pdf/1703.04782v3.pdf |
PWC | https://paperswithcode.com/paper/online-learning-rate-adaptation-with |
Repo | https://github.com/awslabs/adatune |
Framework | pytorch |
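The update rule is simple enough to sketch directly. Below is a minimal, illustrative implementation of SGD with hypergradient descent (SGD-HD) on a toy one-dimensional quadratic; the objective, the initial learning rate, and the hypergradient step size `beta` are arbitrary choices for illustration, not values from the paper.

```python
import numpy as np

def grad_f(theta):
    return 2.0 * theta  # gradient of the toy objective f(theta) = theta^2

theta = np.array([5.0])
alpha = 0.01                    # learning rate, adapted online
beta = 0.001                    # hypergradient step size
g_prev = np.zeros_like(theta)

for t in range(100):
    g = grad_f(theta)
    # The hypergradient of the loss w.r.t. alpha is -g_t . g_{t-1}: grow
    # alpha while consecutive gradients agree, shrink it when they disagree,
    # using only one extra stored copy of the gradient.
    alpha += beta * float(g @ g_prev)
    theta -= alpha * g
    g_prev = g

print(theta, alpha)
```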
Decoupled Weight Decay Regularization
Title | Decoupled Weight Decay Regularization |
Authors | Ilya Loshchilov, Frank Hutter |
Abstract | L$_2$ regularization and weight decay regularization are equivalent for standard stochastic gradient descent (when rescaled by the learning rate), but as we demonstrate this is \emph{not} the case for adaptive gradient algorithms, such as Adam. While common implementations of these algorithms employ L$_2$ regularization (often calling it “weight decay”, which may be misleading due to the inequivalence we expose), we propose a simple modification to recover the original formulation of weight decay regularization by \emph{decoupling} the weight decay from the optimization steps taken w.r.t. the loss function. We provide empirical evidence that our proposed modification (i) decouples the optimal choice of weight decay factor from the setting of the learning rate for both standard SGD and Adam and (ii) substantially improves Adam’s generalization performance, allowing it to compete with SGD with momentum on image classification datasets (on which it was previously typically outperformed by the latter). Our proposed decoupled weight decay has already been adopted by many researchers, and the community has implemented it in TensorFlow and PyTorch; the complete source code for our experiments is available at https://github.com/loshchil/AdamW-and-SGDW |
Tasks | Image Classification |
Published | 2017-11-14 |
URL | http://arxiv.org/abs/1711.05101v3 |
PDF | http://arxiv.org/pdf/1711.05101v3.pdf |
PWC | https://paperswithcode.com/paper/decoupled-weight-decay-regularization |
Repo | https://github.com/MattSegal/fastai-notes |
Framework | pytorch |
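The decoupling itself is a one-line change, sketched below in a hand-rolled Adam step (bias correction omitted for brevity; hyperparameters are the usual Adam defaults, chosen here for illustration): with L$_2$ regularization the decay term enters the gradient and hence the adaptive moments, while with decoupled weight decay it is applied directly to the weights.

```python
import torch

def adam_step(p, grad, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
              weight_decay=1e-2, decoupled=True):
    if not decoupled:
        grad = grad + weight_decay * p          # L2: decay enters the moments
    state["m"] = betas[0] * state["m"] + (1 - betas[0]) * grad
    state["v"] = betas[1] * state["v"] + (1 - betas[1]) * grad ** 2
    step = lr * state["m"] / (state["v"].sqrt() + eps)
    if decoupled:
        step = step + lr * weight_decay * p     # AdamW: decay applied directly
    return p - step

p = torch.ones(3)
state = {"m": torch.zeros(3), "v": torch.zeros(3)}
p = adam_step(p, torch.tensor([0.1, -0.2, 0.3]), state)
print(p)
```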
Learning with Opponent-Learning Awareness
Title | Learning with Opponent-Learning Awareness |
Authors | Jakob N. Foerster, Richard Y. Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, Igor Mordatch |
Abstract | Multi-agent settings are quickly gathering importance in machine learning. This includes a plethora of recent work on deep multi-agent reinforcement learning, but can also be extended to hierarchical RL, generative adversarial networks and decentralised optimisation. In all these settings the presence of multiple learning agents renders the training problem non-stationary and often leads to unstable training or undesired final results. We present Learning with Opponent-Learning Awareness (LOLA), a method in which each agent shapes the anticipated learning of the other agents in the environment. The LOLA learning rule includes a term that accounts for the impact of one agent’s policy on the anticipated parameter update of the other agents. Results show that the encounter of two LOLA agents leads to the emergence of tit-for-tat and therefore cooperation in the iterated prisoners’ dilemma, while independent learning does not. In this domain, LOLA also receives higher payouts compared to a naive learner, and is robust against exploitation by higher-order gradient-based methods. Applied to repeated matching pennies, LOLA agents converge to the Nash equilibrium. In a round robin tournament we show that LOLA agents successfully shape the learning of a range of multi-agent learning algorithms from the literature, resulting in the highest average returns on the IPD. We also show that the LOLA update rule can be efficiently calculated using an extension of the policy gradient estimator, making the method suitable for model-free RL. The method thus scales to large parameter and input spaces and nonlinear function approximators. We apply LOLA to a grid world task with an embedded social dilemma using recurrent policies and opponent modelling. By explicitly considering the learning of the other agent, LOLA agents learn to cooperate out of self-interest. The code is at github.com/alshedivat/lola. |
Tasks | Multi-agent Reinforcement Learning |
Published | 2017-09-13 |
URL | http://arxiv.org/abs/1709.04326v4 |
PDF | http://arxiv.org/pdf/1709.04326v4.pdf |
PWC | https://paperswithcode.com/paper/learning-with-opponent-learning-awareness |
Repo | https://github.com/alexis-jacq/LOLA_DICE |
Framework | pytorch |
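A compact way to see the learning rule is its look-ahead form: differentiate your own value through the opponent's anticipated gradient step. The sketch below does this with autograd on a toy differentiable two-player game; the payoff functions and step sizes are illustrative stand-ins, not the iterated prisoners' dilemma experiments from the paper.

```python
import torch

def payoffs(a, b):
    v1 = a * b - 0.5 * a ** 2    # toy payoff for agent 1
    v2 = -a * b - 0.5 * b ** 2   # toy payoff for agent 2
    return v1, v2

th1 = torch.tensor(0.3, requires_grad=True)
th2 = torch.tensor(-0.2, requires_grad=True)
lr, eta = 0.1, 0.3               # own step size, assumed opponent step size

_, v2 = payoffs(th1, th2)
# Opponent's anticipated naive update, kept differentiable w.r.t. th1.
g2 = torch.autograd.grad(v2, th2, create_graph=True)[0]
v1_shaped, _ = payoffs(th1, th2 + eta * g2)
g1 = torch.autograd.grad(v1_shaped, th1)[0]
with torch.no_grad():
    th1 += lr * g1               # ascend agent 1's shaped value
print(th1)
```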
End-to-end Recurrent Neural Network Models for Vietnamese Named Entity Recognition: Word-level vs. Character-level
Title | End-to-end Recurrent Neural Network Models for Vietnamese Named Entity Recognition: Word-level vs. Character-level |
Authors | Thai-Hoang Pham, Phuong Le-Hong |
Abstract | This paper demonstrates end-to-end neural network architectures for Vietnamese named entity recognition. Our best model is a combination of bidirectional Long Short-Term Memory (Bi-LSTM), Convolutional Neural Network (CNN), and Conditional Random Field (CRF), using pre-trained word embeddings as input, which achieves an F1 score of 88.59% on a standard test set. Our system achieves performance comparable to the first-rank system of the VLSP campaign without using any syntactic or hand-crafted features. We also provide an extensive empirical study of common deep learning models for Vietnamese NER, at both the word and character level. |
Tasks | Named Entity Recognition, Word Embeddings |
Published | 2017-05-11 |
URL | http://arxiv.org/abs/1705.04044v3 |
PDF | http://arxiv.org/pdf/1705.04044v3.pdf |
PWC | https://paperswithcode.com/paper/end-to-end-recurrent-neural-network-models |
Repo | https://github.com/pth1993/NNVLP |
Framework | none |
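As a rough picture of the architecture, here is a compact skeleton of a Bi-LSTM tagger with a character-level CNN; all layer sizes are illustrative, and the CRF on top of the emission scores is only indicated by a comment, since a full forward/Viterbi implementation would dominate the sketch.

```python
import torch
import torch.nn as nn

class BiLSTMCNNTagger(nn.Module):
    def __init__(self, n_words=10000, n_chars=100, n_tags=9,
                 word_dim=100, char_dim=25, char_filters=30, hidden=200):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, word_dim)
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.char_cnn = nn.Conv1d(char_dim, char_filters, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(word_dim + char_filters, hidden // 2,
                            bidirectional=True, batch_first=True)
        self.emissions = nn.Linear(hidden, n_tags)  # would feed a CRF layer

    def forward(self, words, chars):
        # words: (batch, seq); chars: (batch, seq, max_word_len)
        b, s, w = chars.shape
        c = self.char_emb(chars).view(b * s, w, -1).transpose(1, 2)
        c = torch.relu(self.char_cnn(c)).max(dim=2).values.view(b, s, -1)
        x = torch.cat([self.word_emb(words), c], dim=-1)
        h, _ = self.lstm(x)
        return self.emissions(h)  # per-token tag scores (CRF input)

model = BiLSTMCNNTagger()
scores = model(torch.zeros(2, 7, dtype=torch.long),
               torch.zeros(2, 7, 12, dtype=torch.long))
print(scores.shape)  # torch.Size([2, 7, 9])
```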
An Ensemble Deep Learning Based Approach for Red Lesion Detection in Fundus Images
Title | An Ensemble Deep Learning Based Approach for Red Lesion Detection in Fundus Images |
Authors | José Ignacio Orlando, Elena Prokofyeva, Mariana del Fresno, Matthew B. Blaschko |
Abstract | Diabetic retinopathy (DR) is one of the leading causes of preventable blindness in the world. Its earliest signs are red lesions, a general term that groups both microaneurysms and hemorrhages. In daily clinical practice, these lesions are manually detected by physicians using fundus photographs. However, this task is tedious and time consuming, and requires an intensive effort due to the small size of the lesions and their lack of contrast. Computer-assisted diagnosis of DR based on red lesion detection is being actively explored due to its potential to improve both clinicians’ consistency and accuracy. Several methods for detecting red lesions have been proposed in the literature, most of them based on characterizing lesion candidates using hand-crafted features and classifying them into true or false positive detections. Deep learning based approaches, by contrast, are scarce in this domain due to the high expense of annotating the lesions manually. In this paper we propose a novel method for red lesion detection that combines deep learned features with domain knowledge. Features learned by a CNN are augmented by incorporating hand-crafted features. This ensemble vector of descriptors is then used to identify true lesion candidates using a Random Forest classifier. We empirically observed that combining both sources of information significantly improves results with respect to using each approach separately. Furthermore, our method reported the highest performance on a per-lesion basis on DIARETDB1 and e-ophtha, and for screening and need for referral on MESSIDOR compared to a second human expert. Results highlight the fact that integrating manually engineered approaches with deep learned features is relevant to improving results when the networks are trained from lesion-level annotated data. An open source implementation of our system is publicly available online. |
Tasks | |
Published | 2017-06-09 |
URL | http://arxiv.org/abs/1706.03008v2 |
PDF | http://arxiv.org/pdf/1706.03008v2.pdf |
PWC | https://paperswithcode.com/paper/an-ensemble-deep-learning-based-approach-for |
Repo | https://github.com/ignaciorlando/red-lesion-detection |
Framework | none |
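The hybrid descriptor idea reduces to concatenating two feature blocks per lesion candidate and training a Random Forest on the result. A minimal sketch follows, with random arrays standing in for real CNN descriptors, hand-crafted descriptors, and candidate labels.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
cnn_feats = rng.normal(size=(500, 64))    # learned descriptors per candidate
hand_feats = rng.normal(size=(500, 20))   # e.g. shape/intensity descriptors
labels = rng.integers(0, 2, size=500)     # true lesion vs. false positive

X = np.hstack([cnn_feats, hand_feats])    # ensemble vector of descriptors
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
print(clf.predict_proba(X[:3]))           # per-candidate lesion probabilities
```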
One Model To Learn Them All
Title | One Model To Learn Them All |
Authors | Lukasz Kaiser, Aidan N. Gomez, Noam Shazeer, Ashish Vaswani, Niki Parmar, Llion Jones, Jakob Uszkoreit |
Abstract | Deep learning yields great results across many fields, from speech recognition and image classification to translation. But for each problem, getting a deep model to work well involves research into the architecture and a long period of tuning. We present a single model that yields good results on a number of problems spanning multiple domains. In particular, this single model is trained concurrently on ImageNet, multiple translation tasks, image captioning (COCO dataset), a speech recognition corpus, and an English parsing task. Our model architecture incorporates building blocks from multiple domains. It contains convolutional layers, an attention mechanism, and sparsely-gated layers. Each of these computational blocks is crucial for a subset of the tasks we train on. Interestingly, even if a block is not crucial for a task, we observe that adding it never hurts performance and in most cases improves it on all tasks. We also show that tasks with less data benefit largely from joint training with other tasks, while performance on large tasks degrades only slightly if at all. |
Tasks | Image Captioning, Image Classification, Multi-Task Learning |
Published | 2017-06-16 |
URL | http://arxiv.org/abs/1706.05137v1 |
PDF | http://arxiv.org/pdf/1706.05137v1.pdf |
PWC | https://paperswithcode.com/paper/one-model-to-learn-them-all |
Repo | https://github.com/tensorflow/tensor2tensor |
Framework | tf |
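Of the building blocks named in the abstract, the sparsely-gated mixture-of-experts layer is the least standard, so here is a heavily simplified sketch of the idea: a gating network picks the top-k experts per input and mixes their outputs. Expert count, width, and k are illustrative assumptions, and the real MultiModel in tensor2tensor is considerably more involved.

```python
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    def __init__(self, dim=64, n_experts=8, k=2):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.gate = nn.Linear(dim, n_experts)
        self.k = k

    def forward(self, x):                        # x: (batch, dim)
        scores = self.gate(x)                    # (batch, n_experts)
        topv, topi = scores.topk(self.k, dim=-1)
        weights = torch.softmax(topv, dim=-1)    # renormalise over the top-k
        out = torch.zeros_like(x)
        for j in range(self.k):                  # mix only the selected experts
            for e, expert in enumerate(self.experts):
                mask = topi[:, j] == e
                if mask.any():
                    out[mask] += weights[mask, j:j+1] * expert(x[mask])
        return out

x = torch.randn(4, 64)
print(SparseMoE()(x).shape)  # torch.Size([4, 64])
```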
Learning to Customize Network Security Rules
Title | Learning to Customize Network Security Rules |
Authors | Michael Bargury, Roy Levin, Royi Ronen |
Abstract | Security is a major concern for organizations who wish to leverage cloud computing. In order to reduce security vulnerabilities, public cloud providers offer firewall functionalities. When properly configured, a firewall protects cloud networks from cyber-attacks. However, proper firewall configuration requires intimate knowledge of the protected system, high expertise and on-going maintenance. As a result, many organizations do not use firewalls effectively, leaving their cloud resources vulnerable. In this paper, we present a novel supervised learning method, and prototype, which compute recommendations for firewall rules. Recommendations are based on sampled network traffic meta-data (NetFlow) collected from a public cloud provider. Labels are extracted from firewall configurations deemed to be authored by experts. NetFlow is collected from network routers, avoiding expensive collection from cloud VMs, as well as relieving privacy concerns. The proposed method captures network routines and dependencies between resources and firewall configuration. The method predicts IPs to be allowed by the firewall. A grouping algorithm is subsequently used to generate a manageable number of IP ranges. Each range is a parameter for a firewall rule. We present results of experiments on real data, showing ROC AUC of 0.92, compared to 0.58 for an unsupervised baseline. The results prove the hypothesis that firewall rules can be automatically generated based on router data, and that an automated method can be effective in blocking a high percentage of malicious traffic. |
Tasks | |
Published | 2017-12-28 |
URL | http://arxiv.org/abs/1712.09795v1 |
PDF | http://arxiv.org/pdf/1712.09795v1.pdf |
PWC | https://paperswithcode.com/paper/learning-to-customize-network-security-rules |
Repo | https://github.com/mibarg/IP-Grouping |
Framework | none |
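The final grouping step, collapsing predicted "allow" IPs into a manageable number of ranges, can be sketched with a simple gap-based merge. The threshold and the merge rule here are assumptions for illustration; the paper's actual grouping algorithm (see the IP-Grouping repo) may differ in detail.

```python
import ipaddress

def group_ips(ips, max_gap=256):
    nums = sorted(int(ipaddress.ip_address(ip)) for ip in ips)
    ranges, start, prev = [], nums[0], nums[0]
    for n in nums[1:]:
        if n - prev > max_gap:           # gap too large: close the range
            ranges.append((start, prev))
            start = n
        prev = n
    ranges.append((start, prev))
    return [(str(ipaddress.ip_address(a)), str(ipaddress.ip_address(b)))
            for a, b in ranges]

# Each resulting range would become the parameter of one firewall rule.
print(group_ips(["10.0.0.1", "10.0.0.9", "10.0.4.7", "192.168.1.5"]))
```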
Learning to Explain Non-Standard English Words and Phrases
Title | Learning to Explain Non-Standard English Words and Phrases |
Authors | Ke Ni, William Yang Wang |
Abstract | We describe a data-driven approach for automatically explaining new, non-standard English expressions in a given sentence, building on a large dataset that includes 15 years of crowdsourced examples from UrbanDictionary.com. Unlike prior studies that focus on matching keywords from a slang dictionary, we investigate the possibility of learning a neural sequence-to-sequence model that generates explanations of unseen non-standard English expressions given context. We propose a dual encoder approach: a word-level encoder learns the representation of the context, and a second character-level encoder learns the hidden representation of the target non-standard expression. Our model can produce reasonable definitions of new non-standard English expressions given their context with certain confidence. |
Tasks | |
Published | 2017-09-26 |
URL | http://arxiv.org/abs/1709.09254v1 |
PDF | http://arxiv.org/pdf/1709.09254v1.pdf |
PWC | https://paperswithcode.com/paper/learning-to-explain-non-standard-english |
Repo | https://github.com/yonguno/cs410explain |
Framework | pytorch |
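A skeleton of the dual-encoder idea is below: a word-level RNN encodes the context sentence, a character-level RNN encodes the target expression, and the two final states are fused to seed a definition decoder (omitted here). Vocabulary sizes, dimensions, and the fusion layer are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DualEncoder(nn.Module):
    def __init__(self, n_words=20000, n_chars=128, dim=128):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, dim)
        self.char_emb = nn.Embedding(n_chars, dim)
        self.word_enc = nn.LSTM(dim, dim, batch_first=True)
        self.char_enc = nn.LSTM(dim, dim, batch_first=True)
        self.bridge = nn.Linear(2 * dim, dim)  # fuse the two encodings

    def forward(self, context_words, target_chars):
        _, (hw, _) = self.word_enc(self.word_emb(context_words))
        _, (hc, _) = self.char_enc(self.char_emb(target_chars))
        fused = torch.tanh(self.bridge(torch.cat([hw[-1], hc[-1]], dim=-1)))
        return fused  # would seed a seq2seq decoder emitting the definition

enc = DualEncoder()
print(enc(torch.zeros(2, 12, dtype=torch.long),
          torch.zeros(2, 15, dtype=torch.long)).shape)  # torch.Size([2, 128])
```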
Recurrent Soft Attention Model for Common Object Recognition
Title | Recurrent Soft Attention Model for Common Object Recognition |
Authors | Liliang Ren |
Abstract | We propose the Recurrent Soft Attention Model, which integrates visual attention from the original image into an LSTM memory cell through a down-sample network. The model recurrently transmits visual attention to the memory cells for glimpse mask generation, which is a more natural way to integrate and exploit attention in general object detection and recognition problems. We test our model using top-1 accuracy on the CIFAR-10 dataset. The experiments show that our down-sample network and feedback mechanism play an effective role in the overall network structure. |
Tasks | Object Detection, Object Recognition |
Published | 2017-05-04 |
URL | http://arxiv.org/abs/1705.01921v2 |
PDF | http://arxiv.org/pdf/1705.01921v2.pdf |
PWC | https://paperswithcode.com/paper/recurrent-soft-attention-model-for-common |
Repo | https://github.com/renll/RSAM |
Framework | pytorch |
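The recurrent attention loop can be sketched as follows, though this is a heavily simplified reading of the abstract: at each step the LSTM state is projected into a spatial glimpse mask that re-weights the image features before the next state update. All shapes and the mask head are assumptions, not the paper's exact down-sample network.

```python
import torch
import torch.nn as nn

class RecurrentSoftAttention(nn.Module):
    def __init__(self, feat_dim=64, grid=8, hidden=128):
        super().__init__()
        self.hidden = hidden
        self.cell = nn.LSTMCell(feat_dim, hidden)
        self.mask_head = nn.Linear(hidden, grid * grid)  # assumed mask generator

    def forward(self, feats, steps=4):   # feats: (batch, locations, feat_dim)
        b = feats.size(0)
        h = feats.new_zeros(b, self.hidden)
        c = feats.new_zeros(b, self.hidden)
        for _ in range(steps):
            mask = torch.softmax(self.mask_head(h), dim=-1)    # glimpse mask
            glimpse = (mask.unsqueeze(-1) * feats).sum(dim=1)  # weighted pooling
            h, c = self.cell(glimpse, (h, c))
        return h  # a classifier head on h would produce the class scores

m = RecurrentSoftAttention()
print(m(torch.randn(2, 64, 64)).shape)  # torch.Size([2, 128])
```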
Annotating Object Instances with a Polygon-RNN
Title | Annotating Object Instances with a Polygon-RNN |
Authors | Lluis Castrejon, Kaustav Kundu, Raquel Urtasun, Sanja Fidler |
Abstract | We propose an approach for semi-automatic annotation of object instances. While most current methods treat object segmentation as a pixel-labeling problem, we here cast it as a polygon prediction task, mimicking how most current datasets have been annotated. In particular, our approach takes as input an image crop and sequentially produces vertices of the polygon outlining the object. This allows a human annotator to intervene at any time and correct a vertex if needed, producing a segmentation as accurate as the annotator desires. We show that our approach speeds up the annotation process by a factor of 4.7 across all classes in Cityscapes, while achieving 78.4% agreement in IoU with original ground-truth, matching the typical agreement between human annotators. For cars, our speed-up factor is 7.3 for an agreement of 82.2%. We further show generalization capabilities of our approach to unseen datasets. |
Tasks | Semantic Segmentation |
Published | 2017-04-18 |
URL | http://arxiv.org/abs/1704.05548v1 |
PDF | http://arxiv.org/pdf/1704.05548v1.pdf |
PWC | https://paperswithcode.com/paper/annotating-object-instances-with-a-polygon |
Repo | https://github.com/AidanRocke/vertex_prediction |
Framework | tf |
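The polygon prediction task can be sketched as a decoder that, conditioned on CNN features of the image crop, emits one vertex (a cell in a D x D grid, plus an end-of-polygon token) per step. The sketch below uses a plain LSTM cell and greedy decoding for brevity; the paper uses a ConvLSTM over CNN feature maps, and all dimensions here are illustrative.

```python
import torch
import torch.nn as nn

class PolygonDecoder(nn.Module):
    def __init__(self, feat_dim=256, grid=28, hidden=256):
        super().__init__()
        self.hidden = hidden
        self.n_out = grid * grid + 1                  # grid cells + end token
        self.vertex_emb = nn.Embedding(self.n_out, feat_dim)
        self.cell = nn.LSTMCell(2 * feat_dim, hidden)
        self.out = nn.Linear(hidden, self.n_out)

    def forward(self, img_feat, max_vertices=30):     # img_feat: (batch, feat_dim)
        b = img_feat.size(0)
        h = img_feat.new_zeros(b, self.hidden)
        c = img_feat.new_zeros(b, self.hidden)
        prev = torch.full((b,), self.n_out - 1, dtype=torch.long)  # start token
        vertices = []
        for _ in range(max_vertices):
            x = torch.cat([img_feat, self.vertex_emb(prev)], dim=-1)
            h, c = self.cell(x, (h, c))
            prev = self.out(h).argmax(dim=-1)         # greedy vertex choice
            vertices.append(prev)
        return torch.stack(vertices, dim=1)           # (batch, max_vertices)

dec = PolygonDecoder()
print(dec(torch.randn(2, 256)).shape)  # torch.Size([2, 30])
```

An annotator-in-the-loop correction would simply overwrite `prev` at a given step before decoding continues, which is what makes the sequential formulation convenient for interactive annotation.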
Sentiment Polarity Detection for Software Development
Title | Sentiment Polarity Detection for Software Development |
Authors | Fabio Calefato, Filippo Lanubile, Federico Maiorano, Nicole Novielli |
Abstract | The role of sentiment analysis is increasingly emerging to study software developers’ emotions by mining crowd-generated content within social software engineering tools. However, off-the-shelf sentiment analysis tools have been trained on non-technical domains and general-purpose social media, thus resulting in misclassifications of technical jargon and problem reports. Here, we present Senti4SD, a classifier specifically trained to support sentiment analysis in developers’ communication channels. Senti4SD is trained and validated using a gold standard of Stack Overflow questions, answers, and comments manually annotated for sentiment polarity. It exploits a suite of both lexicon- and keyword-based features, as well as semantic features based on word embedding. With respect to a mainstream off-the-shelf tool, which we use as a baseline, Senti4SD reduces the misclassifications of neutral and positive posts as emotionally negative. To encourage replications, we release a lab package including the classifier, the word embedding space, and the gold standard with annotation guidelines. |
Tasks | Sentiment Analysis |
Published | 2017-09-09 |
URL | http://arxiv.org/abs/1709.02984v2 |
PDF | http://arxiv.org/pdf/1709.02984v2.pdf |
PWC | https://paperswithcode.com/paper/sentiment-polarity-detection-for-software |
Repo | https://github.com/collab-uniba/Senti4SD |
Framework | none |
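The feature design reduces to concatenating the three feature families per post and training a supervised classifier. A minimal sketch, with random arrays standing in for features computed from real Stack Overflow posts, and a linear SVM as an assumed classifier choice:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
lexicon_feats = rng.normal(size=(300, 10))     # e.g. sentiment lexicon scores
keyword_feats = rng.normal(size=(300, 50))     # e.g. n-gram indicators
embedding_feats = rng.normal(size=(300, 100))  # aggregated word vectors
polarity = rng.integers(0, 3, size=300)        # negative / neutral / positive

X = np.hstack([lexicon_feats, keyword_feats, embedding_feats])
clf = LinearSVC(max_iter=10000).fit(X, polarity)
print(clf.predict(X[:5]))
```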
Inductive Representation Learning on Large Graphs
Title | Inductive Representation Learning on Large Graphs |
Authors | William L. Hamilton, Rex Ying, Jure Leskovec |
Abstract | Low-dimensional embeddings of nodes in large graphs have proved extremely useful in a variety of prediction tasks, from content recommendation to identifying protein functions. However, most existing approaches require that all nodes in the graph are present during training of the embeddings; these previous approaches are inherently transductive and do not naturally generalize to unseen nodes. Here we present GraphSAGE, a general, inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data. Instead of training individual embeddings for each node, we learn a function that generates embeddings by sampling and aggregating features from a node’s local neighborhood. Our algorithm outperforms strong baselines on three inductive node-classification benchmarks: we classify the category of unseen nodes in evolving information graphs based on citation and Reddit post data, and we show that our algorithm generalizes to completely unseen graphs using a multi-graph dataset of protein-protein interactions. |
Tasks | Link Prediction, Node Classification, Representation Learning |
Published | 2017-06-07 |
URL | http://arxiv.org/abs/1706.02216v4 |
PDF | http://arxiv.org/pdf/1706.02216v4.pdf |
PWC | https://paperswithcode.com/paper/inductive-representation-learning-on-large |
Repo | https://github.com/williamleif/GraphSAGE |
Framework | tf |
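The sample-and-aggregate step is compact enough to sketch directly. Below is one GraphSAGE layer with the mean aggregator: each node's representation is updated from its own features concatenated with the mean over a fixed sample of neighbour features, followed by L2 normalisation. Sizes and the sampled-neighbour index layout are illustrative.

```python
import torch
import torch.nn as nn

class SageMeanLayer(nn.Module):
    def __init__(self, in_dim=32, out_dim=32):
        super().__init__()
        self.lin = nn.Linear(2 * in_dim, out_dim)

    def forward(self, h, neigh_idx):
        # h: (n_nodes, in_dim); neigh_idx: (n_nodes, n_samples) sampled neighbours
        neigh_mean = h[neigh_idx].mean(dim=1)             # aggregate step
        z = torch.relu(self.lin(torch.cat([h, neigh_mean], dim=-1)))  # combine
        return z / z.norm(dim=-1, keepdim=True).clamp(min=1e-12)

h = torch.randn(100, 32)
neigh_idx = torch.randint(0, 100, (100, 10))  # 10 sampled neighbours per node
print(SageMeanLayer()(h, neigh_idx).shape)    # torch.Size([100, 32])
```

Because the layer is a function of features rather than a per-node lookup table, it applies unchanged to nodes (or whole graphs) never seen during training, which is what makes the method inductive.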
MDP environments for the OpenAI Gym
Title | MDP environments for the OpenAI Gym |
Authors | Andreas Kirsch |
Abstract | The OpenAI Gym provides researchers and enthusiasts with simple to use environments for reinforcement learning. Even the simplest environments have a level of complexity that can obfuscate the inner workings of RL approaches and make debugging difficult. This whitepaper describes a Python framework that makes it very easy to create simple Markov decision process (MDP) environments programmatically, by specifying state transitions and rewards of deterministic and non-deterministic MDPs in a domain-specific language. It then presents results and visualizations created with this MDP framework. |
Tasks | |
Published | 2017-09-26 |
URL | http://arxiv.org/abs/1709.09069v1 |
PDF | http://arxiv.org/pdf/1709.09069v1.pdf |
PWC | https://paperswithcode.com/paper/mdp-environments-for-the-openai-gym |
Repo | https://github.com/BlackHC/mdp |
Framework | none |
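The core idea, specifying an MDP as explicit state transitions and rewards, fits in a few lines. The dict-based spec below illustrates that idea only; the actual BlackHC/mdp package exposes its own chaining DSL, which this does not reproduce.

```python
# A tiny deterministic MDP specified directly as data.
transitions = {
    # (state, action) -> (next_state, reward)
    ("start", "left"):  ("start", 0.0),
    ("start", "right"): ("goal", 1.0),
    ("goal",  "left"):  ("goal", 0.0),
    ("goal",  "right"): ("goal", 0.0),
}

def step(state, action):
    return transitions[(state, action)]

state, total = "start", 0.0
for action in ["left", "right", "right"]:
    state, reward = step(state, action)
    total += reward
print(state, total)  # goal 1.0
```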
Contextual Regression: An Accurate and Conveniently Interpretable Nonlinear Model for Mining Discovery from Scientific Data
Title | Contextual Regression: An Accurate and Conveniently Interpretable Nonlinear Model for Mining Discovery from Scientific Data |
Authors | Chengyu Liu, Wei Wang |
Abstract | Machine learning algorithms such as linear regression, SVM and neural networks have played an increasingly important role in the process of scientific discovery. However, none of them is both interpretable and accurate on nonlinear datasets. Here we present contextual regression, a method that joins these two desirable properties together using a hybrid architecture of a neural network embedding and a dot product layer. We demonstrate its high prediction accuracy and sensitivity through the task of predictive feature selection on a simulated dataset and the application of predicting open chromatin sites in the human genome. On the simulated data, our method achieved high-fidelity recovery of feature contributions under random noise levels up to 200%. On the open chromatin dataset, the application of our method not only outperformed the state-of-the-art method in terms of accuracy, but also unveiled two previously unreported open-chromatin-related histone marks. Our method fills the gap of accurate and interpretable nonlinear modeling in scientific data mining tasks. |
Tasks | Feature Selection, Network Embedding |
Published | 2017-10-30 |
URL | http://arxiv.org/abs/1710.10728v1 |
PDF | http://arxiv.org/pdf/1710.10728v1.pdf |
PWC | https://paperswithcode.com/paper/contextual-regression-an-accurate-and |
Repo | https://github.com/HomoSapienLCY/Contextual_Regression |
Framework | tf |
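The hybrid architecture is easy to sketch: an embedding network maps each input to a vector of context-dependent feature weights, and the prediction is the dot product of those weights with the input features, which makes the weights readable as per-sample feature contributions. Layer sizes below are illustrative.

```python
import torch
import torch.nn as nn

class ContextualRegression(nn.Module):
    def __init__(self, n_features=20, hidden=64):
        super().__init__()
        self.context_net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, n_features))  # one weight per input feature

    def forward(self, x):          # x: (batch, n_features)
        w = self.context_net(x)    # context-dependent feature weights
        y = (w * x).sum(dim=-1)    # dot product layer
        return y, w                # w * x gives per-sample contributions

model = ContextualRegression()
y, w = model(torch.randn(8, 20))
print(y.shape, w.shape)  # torch.Size([8]) torch.Size([8, 20])
```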
Deep Mutual Learning
Title | Deep Mutual Learning |
Authors | Ying Zhang, Tao Xiang, Timothy M. Hospedales, Huchuan Lu |
Abstract | Model distillation is an effective and widely used technique to transfer knowledge from a teacher to a student network. The typical application is to transfer from a powerful large network or ensemble to a small network that is better suited to low-memory or fast execution requirements. In this paper, we present a deep mutual learning (DML) strategy where, rather than one-way transfer between a static pre-defined teacher and a student, an ensemble of students learn collaboratively and teach each other throughout the training process. Our experiments show that a variety of network architectures benefit from mutual learning and achieve compelling results on the CIFAR-100 recognition and Market-1501 person re-identification benchmarks. Surprisingly, no prior powerful teacher network is necessary: mutual learning of a collection of simple student networks works, and moreover outperforms distillation from a more powerful yet static teacher. |
Tasks | Person Re-Identification |
Published | 2017-06-01 |
URL | http://arxiv.org/abs/1706.00384v1 |
PDF | http://arxiv.org/pdf/1706.00384v1.pdf |
PWC | https://paperswithcode.com/paper/deep-mutual-learning |
Repo | https://github.com/shubhamtyagii/Aligned_Reid |
Framework | pytorch |
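For a two-student cohort, the mutual learning objective is each network's supervised loss plus a KL term pulling its predictions toward its peer's. A minimal sketch, with random logits standing in for the two students' outputs on a shared batch:

```python
import torch
import torch.nn.functional as F

def dml_losses(logits1, logits2, targets):
    def kl(p_logits, q_logits):
        # KL(p || q), with the peer's posterior p treated as a fixed target
        return F.kl_div(F.log_softmax(q_logits, dim=1),
                        F.softmax(p_logits, dim=1).detach(),
                        reduction="batchmean")
    loss1 = F.cross_entropy(logits1, targets) + kl(logits2, logits1)
    loss2 = F.cross_entropy(logits2, targets) + kl(logits1, logits2)
    return loss1, loss2  # each student optimises its own loss in turn

logits1, logits2 = torch.randn(16, 10), torch.randn(16, 10)
targets = torch.randint(0, 10, (16,))
print(dml_losses(logits1, logits2, targets))
```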