February 1, 2020

2860 words 14 mins read

Paper Group AWR 219

Deep interpretable architecture for plant diseases classification. Spherical Text Embedding. Metric Learning for Adversarial Robustness. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. DINGO: Distributed Newton-Type Method for Gradient-Norm Optimization. Unsupervised Keypoint Learning for Guiding Class-Conditional …

Deep interpretable architecture for plant diseases classification

Title Deep interpretable architecture for plant diseases classification
Authors Mohammed Brahimi, Said Mahmoudi, Kamel Boukhalfa, Abdelouhab Moussaoui
Abstract Recently, many works have been inspired by the success of deep learning in computer vision for plant disease classification. Unfortunately, these end-to-end deep classifiers lack transparency, which can limit their adoption in practice. In this paper, we propose a new trainable visualization method for plant disease classification based on a Convolutional Neural Network (CNN) architecture composed of two deep classifiers, named Teacher and Student. This architecture leverages multitask learning to train the Teacher and the Student jointly. Then, the representation communicated between the Teacher and the Student is used as a proxy to visualize the image regions most important for classification. This new architecture produces sharper visualizations than existing methods in the plant disease context. All experiments are conducted on the PlantVillage dataset, which contains 54,306 plant images.
Tasks
Published 2019-05-31
URL https://arxiv.org/abs/1905.13523v2
PDF https://arxiv.org/pdf/1905.13523v2.pdf
PWC https://paperswithcode.com/paper/deep-interpretable-architecture-for-plant
Repo https://github.com/Tahedi1/Teacher_Student_Architecture
Framework tf
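
A rough PyTorch sketch of the teacher/student idea described in the abstract: two classifiers are trained jointly on the same label, and a single-channel map passed between them serves as the visualization proxy. The backbone, layer sizes, sigmoid masking, and equal loss weighting below are assumptions for illustration, not the paper's architecture; only the 38 classes of PlantVillage is taken from the dataset.

```python
import torch
import torch.nn as nn

class TeacherStudent(nn.Module):
    def __init__(self, n_classes=38):           # PlantVillage has 38 species/disease classes
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        self.mask_head = nn.Conv2d(32, 1, 1)     # the "communicated" single-channel map
        self.teacher_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, n_classes))
        self.student = nn.Sequential(            # second classifier sees only highlighted regions
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, n_classes))

    def forward(self, x):
        feats = self.backbone(x)
        mask = torch.sigmoid(self.mask_head(feats))   # visualization proxy in [0, 1]
        teacher_logits = self.teacher_head(feats)
        student_logits = self.student(x * mask)       # mask gates what the student can use
        return teacher_logits, student_logits, mask

def joint_loss(teacher_logits, student_logits, y):
    # Multitask objective: both classifiers are fit to the same ground-truth label.
    ce = nn.functional.cross_entropy
    return ce(teacher_logits, y) + ce(student_logits, y)
```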

Spherical Text Embedding

Title Spherical Text Embedding
Authors Yu Meng, Jiaxin Huang, Guangyuan Wang, Chao Zhang, Honglei Zhuang, Lance Kaplan, Jiawei Han
Abstract Unsupervised text embedding has shown great power in a wide range of NLP tasks. While text embeddings are typically learned in the Euclidean space, directional similarity is often more effective in tasks such as word similarity and document clustering, which creates a gap between the training stage and usage stage of text embedding. To close this gap, we propose a spherical generative model based on which unsupervised word and paragraph embeddings are jointly learned. To learn text embeddings in the spherical space, we develop an efficient optimization algorithm with convergence guarantee based on Riemannian optimization. Our model enjoys high efficiency and achieves state-of-the-art performances on various text embedding tasks including word similarity and document clustering.
Tasks
Published 2019-11-04
URL https://arxiv.org/abs/1911.01196v1
PDF https://arxiv.org/pdf/1911.01196v1.pdf
PWC https://paperswithcode.com/paper/spherical-text-embedding
Repo https://github.com/yumeng5/Spherical-Text-Embedding
Framework none
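
In its simplest form, the Riemannian optimization mentioned above amounts to projecting the Euclidean gradient onto the tangent space of the unit sphere and renormalizing after each step. A minimal NumPy sketch of such an update (a generic spherical SGD step, not the paper's exact optimizer, which also handles the joint word/paragraph objective):

```python
import numpy as np

def spherical_sgd_step(v, euclid_grad, lr=0.05):
    """One Riemannian SGD step on the unit sphere."""
    # Project the Euclidean gradient onto the tangent space at v.
    tangent = euclid_grad - np.dot(euclid_grad, v) * v
    # Take a step along the tangent direction, then retract back onto the sphere.
    v_new = v - lr * tangent
    return v_new / np.linalg.norm(v_new)

# Usage: v must start on the sphere, e.g. v = x / np.linalg.norm(x).
```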

Metric Learning for Adversarial Robustness

Title Metric Learning for Adversarial Robustness
Authors Chengzhi Mao, Ziyuan Zhong, Junfeng Yang, Carl Vondrick, Baishakhi Ray
Abstract Deep networks are well-known to be fragile to adversarial attacks. We conduct an empirical analysis of deep representations under the state-of-the-art attack method called PGD, and find that the attack causes the internal representation to shift closer to the “false” class. Motivated by this observation, we propose to regularize the representation space under attack with metric learning to produce more robust classifiers. By carefully sampling examples for metric learning, our learned representation not only increases robustness, but also detects previously unseen adversarial samples. Quantitative experiments show improvement of robustness accuracy by up to 4% and detection efficiency by up to 6% according to Area Under Curve score over prior work. The code of our work is available at https://github.com/columbia/Metric_Learning_Adversarial_Robustness.
Tasks Metric Learning
Published 2019-09-03
URL https://arxiv.org/abs/1909.00900v2
PDF https://arxiv.org/pdf/1909.00900v2.pdf
PWC https://paperswithcode.com/paper/metric-learning-for-adversarial-robustness
Repo https://github.com/columbia/Metric_Learning_Adversarial_Robustness
Framework tf
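
A hedged sketch of the training signal described above: craft a PGD adversarial example, then add a triplet term that pulls its representation toward a clean same-class example and away from a clean other-class example. Here `model` returns logits and `embed` returns representations; both, the attack hyper-parameters, and the positive/negative sampling are simplifications rather than the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Textbook L_inf PGD; hyper-parameters are assumptions."""
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        grad = torch.autograd.grad(F.cross_entropy(model(x_adv), y), x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)
    return x_adv.detach()

def robust_metric_loss(model, embed, x, x_pos, x_neg, y, margin=1.0):
    """Adversarial cross-entropy plus a triplet term on the representations."""
    x_adv = pgd_attack(model, x, y)
    triplet = F.triplet_margin_loss(embed(x_adv), embed(x_pos), embed(x_neg), margin=margin)
    return F.cross_entropy(model(x_adv), y) + triplet
```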

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Title Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Authors Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
Abstract Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.
Tasks Common Sense Reasoning, Coreference Resolution, Document Summarization, Linguistic Acceptability, Machine Translation, Natural Language Inference, Question Answering, Semantic Textual Similarity, Sentiment Analysis, Text Classification, Transfer Learning, Word Sense Disambiguation
Published 2019-10-23
URL https://arxiv.org/abs/1910.10683v2
PDF https://arxiv.org/pdf/1910.10683v2.pdf
PWC https://paperswithcode.com/paper/exploring-the-limits-of-transfer-learning
Repo https://github.com/huggingface/transformers
Framework pytorch
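
Since the linked repository is Hugging Face Transformers, the released T5 checkpoints can be exercised with its standard API. The snippet below only illustrates the text-to-text framing with a task prefix; the checkpoint name and prefix follow the released models, everything else is left at defaults.

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tok = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is cast as text in, text out: a task prefix plus the input string.
batch = tok("translate English to German: The house is wonderful.", return_tensors="pt")
out = model.generate(**batch, max_length=40)
print(tok.decode(out[0], skip_special_tokens=True))
```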

DINGO: Distributed Newton-Type Method for Gradient-Norm Optimization

Title DINGO: Distributed Newton-Type Method for Gradient-Norm Optimization
Authors Rixon Crane, Fred Roosta
Abstract For optimization of a sum of functions in a distributed computing environment, we present a novel communication efficient Newton-type algorithm that enjoys a variety of advantages over similar existing methods. Similar to Newton-MR, our algorithm, DINGO, is derived by optimization of the gradient’s norm as a surrogate function. DINGO does not impose any specific form on the underlying functions, and its application range extends far beyond convexity. In addition, the distribution of the data across the computing environment can be arbitrary. Further, the underlying sub-problems of DINGO are simple linear least-squares, for which a plethora of efficient algorithms exist. Lastly, DINGO involves a few hyper-parameters that are easy to tune. Moreover, we theoretically show that DINGO is not sensitive to the choice of its hyper-parameters in that a strict reduction in the gradient norm is guaranteed, regardless of the selected hyper-parameters. We demonstrate empirical evidence of the effectiveness, stability and versatility of our method compared to other relevant algorithms.
Tasks
Published 2019-01-16
URL https://arxiv.org/abs/1901.05134v2
PDF https://arxiv.org/pdf/1901.05134v2.pdf
PWC https://paperswithcode.com/paper/dingo-distributed-newton-type-method-for
Repo https://github.com/RixonC/DINGO
Framework pytorch
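
To make the gradient-norm surrogate concrete, a single-machine caricature of such a Newton-type step picks the update direction by solving the least-squares problem min_p ||Hp + g||^2, which targets a reduction in ||g||^2. The real DINGO aggregates per-worker least-squares solutions and applies a line search on the gradient norm, none of which is shown here.

```python
import numpy as np

def gradient_norm_newton_step(grad_fn, hess_fn, w, lr=1.0):
    """Single-machine sketch: choose p = argmin_p ||H p + g||^2 and step along it.
    grad_fn(w) returns the gradient, hess_fn(w) the (dense) Hessian."""
    g = grad_fn(w)
    H = hess_fn(w)
    p, *_ = np.linalg.lstsq(H, -g, rcond=None)   # simple linear least-squares sub-problem
    return w + lr * p
```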

Unsupervised Keypoint Learning for Guiding Class-Conditional Video Prediction

Title Unsupervised Keypoint Learning for Guiding Class-Conditional Video Prediction
Authors Yunji Kim, Seonghyeon Nam, In Cho, Seon Joo Kim
Abstract We propose a deep video prediction model conditioned on a single image and an action class. To generate future frames, we first detect keypoints of a moving object and predict future motion as a sequence of keypoints. The input image is then translated following the predicted keypoints sequence to compose future frames. Detecting the keypoints is central to our algorithm, and our method is trained to detect the keypoints of arbitrary objects in an unsupervised manner. Moreover, the detected keypoints of the original videos are used as pseudo-labels to learn the motion of objects. Experimental results show that our method is successfully applied to various datasets without the cost of labeling keypoints in videos. The detected keypoints are similar to human-annotated labels, and prediction results are more realistic compared to the previous methods.
Tasks Video Prediction
Published 2019-10-04
URL https://arxiv.org/abs/1910.02027v1
PDF https://arxiv.org/pdf/1910.02027v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-keypoint-learning-for-guiding
Repo https://github.com/YunjiKim/Unsupervised-Keypoint-Learning-for-Guiding-Class-conditional-Video-Prediction
Framework tf
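
A common building block for unsupervised keypoint detectors of this kind is a differentiable soft-argmax that converts predicted heatmaps into coordinates, so the whole pipeline can be trained end-to-end. The sketch below shows that operator only; whether the paper uses exactly this formulation is an assumption.

```python
import torch

def heatmaps_to_keypoints(heatmaps):
    """Soft-argmax: (B, K, H, W) heatmaps -> (B, K, 2) keypoints in [-1, 1]."""
    b, k, h, w = heatmaps.shape
    probs = torch.softmax(heatmaps.view(b, k, -1), dim=-1).view(b, k, h, w)
    ys = torch.linspace(-1, 1, h, device=heatmaps.device)
    xs = torch.linspace(-1, 1, w, device=heatmaps.device)
    y = (probs.sum(dim=3) * ys).sum(dim=2)   # expected row coordinate
    x = (probs.sum(dim=2) * xs).sum(dim=2)   # expected column coordinate
    return torch.stack([x, y], dim=-1)
```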

Fully Neural Network based Model for General Temporal Point Processes

Title Fully Neural Network based Model for General Temporal Point Processes
Authors Takahiro Omi, Naonori Ueda, Kazuyuki Aihara
Abstract A temporal point process is a mathematical model for a time series of discrete events, which covers various applications. Recently, recurrent neural network (RNN) based models have been developed for point processes and have been found effective. RNN based models usually assume a specific functional form for the time course of the intensity function of a point process (e.g., exponentially decreasing or increasing with the time since the most recent event). However, such an assumption can restrict the expressive power of the model. We herein propose a novel RNN based model in which the time course of the intensity function is represented in a general manner. In our approach, we first model the integral of the intensity function using a feedforward neural network and then obtain the intensity function as its derivative. This approach enables us to both obtain a flexible model of the intensity function and exactly evaluate the log-likelihood function, which contains the integral of the intensity function, without any numerical approximations. Our model achieves competitive or superior performances compared to the previous state-of-the-art methods for both synthetic and real datasets.
Tasks Point Processes, Time Series
Published 2019-05-23
URL https://arxiv.org/abs/1905.09690v3
PDF https://arxiv.org/pdf/1905.09690v3.pdf
PWC https://paperswithcode.com/paper/fully-neural-network-based-model-for-general
Repo https://github.com/omitakahiro/NeuralNetworkPointProcess
Framework none
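
The core trick in the abstract, modelling the integral of the intensity with a network and recovering the intensity by differentiation, can be sketched in a few lines of PyTorch. The monotonicity construction (absolute-value weights with tanh) and the single hidden layer below are simplifications of the paper's network; the history vector is assumed to come from an RNN over past events.

```python
import torch
import torch.nn as nn

class CumulativeIntensity(nn.Module):
    """Feedforward net for the cumulative intensity, monotone in the elapsed time tau."""
    def __init__(self, hidden=64, history_dim=32):
        super().__init__()
        self.w_tau = nn.Parameter(torch.rand(1, hidden))
        self.w_hist = nn.Linear(history_dim, hidden)
        self.w_out = nn.Parameter(torch.rand(hidden, 1))

    def forward(self, tau, hist):
        # tau: (B, 1) time since the last event; hist: (B, history_dim) summary of past events
        h = torch.tanh(tau @ self.w_tau.abs() + self.w_hist(hist))
        return h @ self.w_out.abs()

def event_log_likelihood(model, tau, hist):
    """Per-event log-likelihood log lambda(tau) - Phi(tau), with lambda = dPhi/dtau
    obtained by autograd, so no numerical integration is needed."""
    tau = tau.requires_grad_(True)
    Phi = model(tau, hist)
    lam = torch.autograd.grad(Phi.sum(), tau, create_graph=True)[0]
    return torch.log(lam + 1e-12) - Phi
```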

Multilateration of Random Networks with Community Structure

Title Multilateration of Random Networks with Community Structure
Authors Richard D. Tillquist, Manuel E. Lladser
Abstract The minimal number of nodes required to multilaterate a network endowed with geodesic distance (i.e., to uniquely identify all nodes based on shortest path distances to the selected nodes) is called its metric dimension. This quantity is related to a useful technique for embedding graphs in low-dimensional Euclidean spaces and representing the nodes of a graph numerically for downstream analyses such as vertex classification via machine learning. While metric dimension has been studied for many kinds of graphs, its behavior on the Stochastic Block Model (SBM) ensemble has not. The simple community structure of graphs in this ensemble makes them interesting in a variety of contexts. Here we derive probabilistic bounds for the metric dimension of random graphs generated according to the SBM, and describe algorithms of varying complexity to find—with high probability—subsets of nodes for multilateration. Our methods are tested on SBM ensembles with parameters extracted from real-world networks. We show that our methods scale well with increasing network size as compared to the state-of-the-art Information Content Heuristic algorithm for metric dimension approximation.
Tasks
Published 2019-11-04
URL https://arxiv.org/abs/1911.01521v1
PDF https://arxiv.org/pdf/1911.01521v1.pdf
PWC https://paperswithcode.com/paper/multilateration-of-random-networks-with
Repo https://github.com/riti4538/SBM-Metric-Dimension
Framework none
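
The definition of metric dimension is easy to make concrete: a node set resolves the graph if every vertex gets a unique vector of shortest-path distances to that set. The brute-force search below (using networkx) is exponential and only illustrates the definition; the paper's contribution is probabilistic bounds and scalable algorithms for SBM graphs, which this sketch does not reproduce.

```python
import itertools
import networkx as nx

def is_resolving(G, nodes):
    """True if distances to `nodes` give every vertex a unique coordinate vector."""
    dists = [nx.single_source_shortest_path_length(G, s) for s in nodes]
    coords = {tuple(d.get(v, float("inf")) for d in dists) for v in G}
    return len(coords) == G.number_of_nodes()

def metric_dimension_brute_force(G):
    """Exact metric dimension by exhaustive search; only feasible for tiny graphs."""
    for k in range(1, G.number_of_nodes() + 1):
        for subset in itertools.combinations(G.nodes, k):
            if is_resolving(G, subset):
                return k, subset
```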

Volume-preserving Neural Networks: A Solution to the Vanishing Gradient Problem

Title Volume-preserving Neural Networks: A Solution to the Vanishing Gradient Problem
Authors Gordon MacDonald, Andrew Godbout, Bryn Gillcash, Stephanie Cairns
Abstract We propose a novel approach to addressing the vanishing (or exploding) gradient problem in deep neural networks. We construct a new architecture for deep neural networks where all layers (except the output layer) of the network are a combination of rotation, permutation, diagonal, and activation sublayers which are all volume preserving. This control on the volume forces the gradient (on average) to maintain equilibrium and not explode or vanish. Volume-preserving neural networks train reliably, quickly and accurately and the learning rate is consistent across layers in deep volume-preserving neural networks. To demonstrate this we apply our volume-preserving neural network model to two standard datasets.
Tasks
Published 2019-11-21
URL https://arxiv.org/abs/1911.09576v2
PDF https://arxiv.org/pdf/1911.09576v2.pdf
PWC https://paperswithcode.com/paper/volume-preserving-neural-networks-a-solution
Repo https://github.com/andrewgodbout/VPNN_pytorch
Framework pytorch
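
One way to build a layer whose Jacobian has unit determinant, loosely following the rotation/permutation/diagonal decomposition described above, is sketched below in PyTorch. The orthogonal factor comes from the matrix exponential of a skew-symmetric parameter and the diagonal factor is constrained so its log-entries sum to zero; the paper's actual sublayer composition and volume-preserving activation are not reproduced here.

```python
import torch
import torch.nn as nn

class VolumePreservingLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.skew = nn.Parameter(torch.zeros(dim, dim))
        self.log_diag = nn.Parameter(torch.zeros(dim))
        self.register_buffer("perm", torch.randperm(dim))   # fixed permutation, |det| = 1

    def forward(self, x):
        A = self.skew - self.skew.t()          # skew-symmetric => matrix_exp is a rotation (det = 1)
        R = torch.matrix_exp(A)
        d = self.log_diag - self.log_diag.mean()
        scale = torch.exp(d)                   # log-entries sum to zero => prod(scale) = 1
        return (x @ R.t())[:, self.perm] * scale
```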

Learning Dual Retrieval Module for Semi-supervised Relation Extraction

Title Learning Dual Retrieval Module for Semi-supervised Relation Extraction
Authors Hongtao Lin, Jun Yan, Meng Qu, Xiang Ren
Abstract Relation extraction is an important task in structuring the content of text data, and becomes especially challenging when learning with weak supervision—where only a limited number of labeled sentences are given and a large number of unlabeled sentences are available. Most existing work exploits unlabeled data based on the ideas of self-training (i.e., bootstrapping a model) and multi-view learning (e.g., ensembling multiple model variants). However, these methods either suffer from the issue of semantic drift, or do not fully capture the problem characteristics of relation extraction. In this paper, we leverage a key insight that retrieving sentences expressing a relation is a dual task of predicting the relation label for a given sentence—the two tasks are complementary to each other and can be optimized jointly for mutual enhancement. To model this intuition, we propose DualRE, a principled framework that introduces a retrieval module which is jointly trained with the original relation prediction module. In this way, high-quality samples selected by the retrieval module from unlabeled data can be used to improve the prediction module, and vice versa. Experimental results on two public datasets as well as case studies demonstrate the effectiveness of the DualRE approach (code and data are available at https://github.com/INK-USC/DualRE).
Tasks Multi-View Learning, Relation Extraction
Published 2019-02-20
URL http://arxiv.org/abs/1902.07814v2
PDF http://arxiv.org/pdf/1902.07814v2.pdf
PWC https://paperswithcode.com/paper/learning-dual-retrieval-module-for-semi
Repo https://github.com/INK-USC/DualRE
Framework pytorch
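
Schematically, one round of the dual training loop described above could look like the following: the prediction module labels unlabeled sentences, the retrieval module scores how well each sentence expresses that label, and only sentences both modules rank highly are promoted to pseudo-labels for the next round. The objects and method names here are placeholders, not the released DualRE API.

```python
def dual_training_round(prediction, retrieval, labeled, unlabeled, k=500):
    """One self-training round with mutual selection (schematic sketch)."""
    scored = []
    for sent in unlabeled:
        label, p_score = prediction.predict(sent)    # relation prediction p(r | sentence)
        r_score = retrieval.score(sent, label)       # retrieval score for (sentence, r)
        scored.append((p_score * r_score, sent, label))
    scored.sort(key=lambda t: t[0], reverse=True)
    pseudo = [(sent, label) for _, sent, label in scored[:k]]
    prediction.fit(labeled + pseudo)                 # each module is retrained on the
    retrieval.fit(labeled + pseudo)                  # samples selected with the other's help
    return pseudo
```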

Reinforced Dynamic Reasoning for Conversational Question Generation

Title Reinforced Dynamic Reasoning for Conversational Question Generation
Authors Boyuan Pan, Hao Li, Ziyu Yao, Deng Cai, Huan Sun
Abstract This paper investigates a new task named Conversational Question Generation (CQG) which is to generate a question based on a passage and a conversation history (i.e., previous turns of question-answer pairs). CQG is a crucial task for developing intelligent agents that can drive question-answering style conversations or test user understanding of a given passage. Towards that end, we propose a new approach named Reinforced Dynamic Reasoning (ReDR) network, which is based on the general encoder-decoder framework but incorporates a reasoning procedure in a dynamic manner to better understand what has been asked and what to ask next about the passage. To encourage producing meaningful questions, we leverage a popular question answering (QA) model to provide feedback and fine-tune the question generator using a reinforcement learning mechanism. Empirical results on the recently released CoQA dataset demonstrate the effectiveness of our method in comparison with various baselines and model variants. Moreover, to show the applicability of our method, we also apply it to create multi-turn question-answering conversations for passages in SQuAD.
Tasks Question Answering, Question Generation
Published 2019-07-29
URL https://arxiv.org/abs/1907.12667v1
PDF https://arxiv.org/pdf/1907.12667v1.pdf
PWC https://paperswithcode.com/paper/reinforced-dynamic-reasoning-for-1
Repo https://github.com/ZJULearning/ReDR
Framework pytorch
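
The reinforcement-learning fine-tuning step amounts to a policy-gradient update in which sampled questions are rewarded by a QA model (for instance, the token-level F1 of its answer against the gold answer). A minimal REINFORCE-style loss is sketched below; the reward definition and the mean baseline are assumptions rather than the paper's exact setup.

```python
import torch

def reinforce_loss(log_probs, rewards):
    """log_probs: (B,) summed log-probability of each sampled question.
    rewards:   (B,) QA-model score for each sampled question."""
    baseline = rewards.mean()                        # simple variance-reduction baseline
    return -((rewards - baseline) * log_probs).mean()
```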

A Survey on Recent Advances in Named Entity Recognition from Deep Learning models

Title A Survey on Recent Advances in Named Entity Recognition from Deep Learning models
Authors Vikas Yadav, Steven Bethard
Abstract Named Entity Recognition (NER) is a key component in NLP systems for question answering, information retrieval, relation extraction, etc. NER systems have been studied and developed widely for decades, but accurate systems using deep neural networks (NN) have only been introduced in the last few years. We present a comprehensive survey of deep neural network architectures for NER, and contrast them with previous approaches to NER based on feature engineering and other supervised or semi-supervised learning algorithms. Our results highlight the improvements achieved by neural networks, and show how incorporating some of the lessons learned from past work on feature-based NER systems can yield further improvements.
Tasks Feature Engineering, Information Retrieval, Named Entity Recognition, Question Answering, Relation Extraction
Published 2019-10-25
URL https://arxiv.org/abs/1910.11470v1
PDF https://arxiv.org/pdf/1910.11470v1.pdf
PWC https://paperswithcode.com/paper/a-survey-on-recent-advances-in-named-entity-2
Repo https://github.com/vikas95/Pref_Suff_Span_NN
Framework none

Structured Minimally Supervised Learning for Neural Relation Extraction

Title Structured Minimally Supervised Learning for Neural Relation Extraction
Authors Fan Bai, Alan Ritter
Abstract We present an approach to minimally supervised relation extraction that combines the benefits of learned representations and structured learning, and accurately predicts sentence-level relation mentions given only proposition-level supervision from a KB. By explicitly reasoning about missing data during learning, our approach enables large-scale training of 1D convolutional neural networks while mitigating the issue of label noise inherent in distant supervision. Our approach achieves state-of-the-art results on minimally supervised sentential relation extraction, outperforming a number of baselines, including a competitive approach that uses the attention layer of a purely neural model.
Tasks Relation Extraction
Published 2019-03-29
URL https://arxiv.org/abs/1904.00118v5
PDF https://arxiv.org/pdf/1904.00118v5.pdf
PWC https://paperswithcode.com/paper/structured-minimally-supervised-learning-for
Repo https://github.com/bflashcp3f/PCNN-NMAR
Framework pytorch
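
The 1D convolutional networks referred to above are typically piecewise CNNs (PCNNs), whose distinguishing operation is piecewise max pooling: the convolution output is split into three segments around the two entity mentions and each segment is pooled separately. The sketch below shows only that pooling operator, with simplified boundary handling; the paper's structured learning over missing labels is not reproduced.

```python
import torch

def piecewise_max_pool(conv_out, head_pos, tail_pos):
    """conv_out: (seq_len, channels); head_pos/tail_pos: entity token indices.
    Returns a (3 * channels,) sentence representation."""
    a, b = sorted((head_pos, tail_pos))
    segments = [conv_out[: a + 1], conv_out[a + 1 : b + 1], conv_out[b + 1 :]]
    pooled = [s.max(dim=0).values if s.size(0) > 0 else conv_out.new_zeros(conv_out.size(1))
              for s in segments]
    return torch.cat(pooled)
```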

A tree-based radial basis function method for noisy parallel surrogate optimization

Title A tree-based radial basis function method for noisy parallel surrogate optimization
Authors Chenchao Shou, Matthew West
Abstract Parallel surrogate optimization algorithms have proven to be efficient methods for solving expensive noisy optimization problems. In this work we develop a new parallel surrogate optimization algorithm (ProSRS), using a novel tree-based “zoom strategy” to improve the efficiency of the algorithm. We prove that if ProSRS is run for sufficiently long, with probability converging to one there will be at least one point among all the evaluations that will be arbitrarily close to the global minimum. We compare our algorithm to several state-of-the-art Bayesian optimization algorithms on a suite of standard benchmark functions and two real machine learning hyperparameter-tuning problems. We find that our algorithm not only achieves significantly faster optimization convergence, but is also 1-4 orders of magnitude cheaper in computational cost.
Tasks
Published 2019-08-21
URL https://arxiv.org/abs/1908.07980v1
PDF https://arxiv.org/pdf/1908.07980v1.pdf
PWC https://paperswithcode.com/paper/190807980
Repo https://github.com/compdyn/ProSRS
Framework none
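
A generic parallel surrogate step of the kind this algorithm builds on: fit a radial basis function surrogate to the evaluations so far, perturb the incumbent to generate candidates, and hand the surrogate's best-ranked candidates to the workers for parallel evaluation. ProSRS's tree-based zoom strategy and its weighted value/distance selection criterion are omitted; the function below is an illustrative baseline, not the released implementation.

```python
import numpy as np
from scipy.interpolate import Rbf

def propose_batch(X, y, bounds, n_candidates=1000, batch_size=4, sigma=0.1):
    """X: (n, d) evaluated points, y: (n,) noisy objective values,
    bounds: list of (low, high) pairs, one per dimension."""
    surrogate = Rbf(*X.T, y)                              # radial basis function surrogate
    best = X[np.argmin(y)]
    lo, hi = np.asarray(bounds, dtype=float).T
    cand = np.clip(best + sigma * (hi - lo) * np.random.randn(n_candidates, X.shape[1]), lo, hi)
    scores = surrogate(*cand.T)
    return cand[np.argsort(scores)[:batch_size]]          # evaluate these in parallel
```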

Flow Models for Arbitrary Conditional Likelihoods

Title Flow Models for Arbitrary Conditional Likelihoods
Authors Yang Li, Shoaib Akbar, Junier B. Oliva
Abstract Understanding the dependencies among features of a dataset is at the core of most unsupervised learning tasks. However, a majority of generative modeling approaches are focused solely on the joint distribution $p(x)$ and utilize models where it is intractable to obtain the conditional distribution of some arbitrary subset of features $x_u$ given the rest of the observed covariates $x_o$: $p(x_u \mid x_o)$. Traditional conditional approaches provide a model for a fixed set of covariates conditioned on another fixed set of observed covariates. Instead, in this work we develop a model that is capable of yielding all conditional distributions $p(x_u \mid x_o)$ (for arbitrary $x_u$) via tractable conditional likelihoods. We propose a novel extension of (change of variables based) flow generative models, arbitrary conditioning flow models (AC-Flow), that can be conditioned on arbitrary subsets of observed covariates, which was previously infeasible. We apply AC-Flow to the imputation of features, and also develop a unified platform for both multiple and single imputation by introducing an auxiliary objective that provides a principled single “best guess” for flow models. Extensive empirical evaluations show that our models achieve state-of-the-art performance in both single and multiple imputation across image inpainting and feature imputation in synthetic and real-world datasets. Code is available at https://github.com/lupalab/ACFlow.
Tasks Image Inpainting, Imputation
Published 2019-09-13
URL https://arxiv.org/abs/1909.06319v1
PDF https://arxiv.org/pdf/1909.06319v1.pdf
PWC https://paperswithcode.com/paper/flow-models-for-arbitrary-conditional
Repo https://github.com/lupalab/ACFlow
Framework tf
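
To give a flavour of how a flow can be conditioned on an arbitrary observed subset, the sketch below implements an element-wise affine transform of the unobserved dimensions whose scale and shift are computed from the observed values and the observation mask. This is only a caricature of the idea: the actual AC-Flow transforms are richer (and a useful flow would also couple the unobserved dimensions with one another), so none of the code below should be read as the released model.

```python
import torch
import torch.nn as nn

class MaskedConditionalAffine(nn.Module):
    """Affine transform of x_u conditioned on (x_o, mask); invertible because the
    scale/shift do not depend on x_u itself."""
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * dim),
        )

    def forward(self, x, mask):
        # x: (B, dim) with observed entries filled in; mask: (B, dim), 1 = observed.
        params = self.net(torch.cat([x * mask, mask], dim=-1))
        log_s, t = params.chunk(2, dim=-1)
        log_s = torch.tanh(log_s)                            # keep scales bounded
        unobs = 1 - mask
        z = (x * torch.exp(log_s) + t) * unobs + x * mask    # transform only unobserved dims
        log_det = (log_s * unobs).sum(dim=-1)
        return z, log_det
```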