February 1, 2020


Paper Group AWR 219


Deep interpretable architecture for plant diseases classification


Title	Deep interpretable architecture for plant diseases classification
Authors	Mohammed Brahimi, Said Mahmoudi, Kamel Boukhalfa, Abdelouhab Moussaoui
Abstract	Recently, many works have been inspired by the success of deep learning in computer vision for plant diseases classification. Unfortunately, these end-to-end deep classifiers lack transparency which can limit their adoption in practice. In this paper, we propose a new trainable visualization method for plant diseases classification based on a Convolutional Neural Network (CNN) architecture composed of two deep classifiers. The first one is named Teacher and the second one Student. This architecture leverages the multitask learning to train the Teacher and the Student jointly. Then, the communicated representation between the Teacher and the Student is used as a proxy to visualize the most important image regions for classification. This new architecture produces sharper visualization than the existing methods in plant diseases context. All experiments are achieved on PlantVillage dataset that contains 54306 plant images.
Tasks
Published	2019-05-31
URL	https://arxiv.org/abs/1905.13523v2
PDF	https://arxiv.org/pdf/1905.13523v2.pdf
PWC	https://paperswithcode.com/paper/deep-interpretable-architecture-for-plant
Repo	https://github.com/Tahedi1/Teacher_Student_Architecture
Framework	tf

Spherical Text Embedding


Title	Spherical Text Embedding
Authors	Yu Meng, Jiaxin Huang, Guangyuan Wang, Chao Zhang, Honglei Zhuang, Lance Kaplan, Jiawei Han
Abstract	Unsupervised text embedding has shown great power in a wide range of NLP tasks. While text embeddings are typically learned in the Euclidean space, directional similarity is often more effective in tasks such as word similarity and document clustering, which creates a gap between the training stage and usage stage of text embedding. To close this gap, we propose a spherical generative model based on which unsupervised word and paragraph embeddings are jointly learned. To learn text embeddings in the spherical space, we develop an efficient optimization algorithm with convergence guarantee based on Riemannian optimization. Our model enjoys high efficiency and achieves state-of-the-art performances on various text embedding tasks including word similarity and document clustering.
Tasks
Published	2019-11-04
URL	https://arxiv.org/abs/1911.01196v1
PDF	https://arxiv.org/pdf/1911.01196v1.pdf
PWC	https://paperswithcode.com/paper/spherical-text-embedding
Repo	https://github.com/yumeng5/Spherical-Text-Embedding
Framework	none
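
Note: the abstract above refers to Riemannian optimization on the sphere. As a minimal sketch of that general idea (not the authors' algorithm, whose update rule is not given in the abstract), a gradient step can be kept on the unit sphere by projecting the Euclidean gradient onto the tangent space and renormalizing. The toy objective, learning rate, and function names are assumptions made for illustration.

```python
import math

def normalize(v):
    """Scale v to unit Euclidean norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def riemannian_sgd_step(x, euclid_grad, lr):
    """One descent step that keeps x on the unit sphere."""
    # Project the Euclidean gradient onto the tangent space at x: g - (x . g) x
    dot = sum(xi * gi for xi, gi in zip(x, euclid_grad))
    tangent = [gi - dot * xi for xi, gi in zip(x, euclid_grad)]
    # Move along the negative tangent direction, then retract by renormalizing
    return normalize([xi - lr * ti for xi, ti in zip(x, tangent)])

# Toy objective (assumed): minimize -x . target, i.e. maximize cosine similarity
target = normalize([1.0, 2.0, 2.0])
x = normalize([1.0, 0.0, 0.0])
for _ in range(200):
    grad = [-t for t in target]  # Euclidean gradient of -x . target
    x = riemannian_sgd_step(x, grad, lr=0.1)

cosine = sum(a * b for a, b in zip(x, target))
```

Renormalizing is the simplest retraction; it keeps every iterate a valid direction, which matches the way embeddings are later compared by cosine similarity.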

Metric Learning for Adversarial Robustness


Title	Metric Learning for Adversarial Robustness
Authors	Chengzhi Mao, Ziyuan Zhong, Junfeng Yang, Carl Vondrick, Baishakhi Ray
Abstract	Deep networks are well-known to be fragile to adversarial attacks. We conduct an empirical analysis of deep representations under the state-of-the-art attack method called PGD, and find that the attack causes the internal representation to shift closer to the “false” class. Motivated by this observation, we propose to regularize the representation space under attack with metric learning to produce more robust classifiers. By carefully sampling examples for metric learning, our learned representation not only increases robustness, but also detects previously unseen adversarial samples. Quantitative experiments show improvement of robustness accuracy by up to 4% and detection efficiency by up to 6% according to Area Under Curve score over prior work. The code of our work is available at https://github.com/columbia/Metric_Learning_Adversarial_Robustness.
Tasks	Metric Learning
Published	2019-09-03
URL	https://arxiv.org/abs/1909.00900v2
PDF	https://arxiv.org/pdf/1909.00900v2.pdf
PWC	https://paperswithcode.com/paper/metric-learning-for-adversarial-robustness
Repo	https://github.com/columbia/Metric_Learning_Adversarial_Robustness
Framework	tf
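
Note: the abstract above describes regularizing the representation space with metric learning. One common metric-learning objective is the triplet loss, sketched below; the abstract does not state the exact loss or sampling scheme, so the margin, the 2-D embeddings, and the choice of the adversarial example as anchor are assumptions for illustration only.

```python
import math

def l2(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge loss: zero once the negative is at least `margin` farther than the positive."""
    return max(0.0, l2(anchor, positive) - l2(anchor, negative) + margin)

# Hypothetical 2-D embeddings
adv = [0.9, 0.1]          # adversarial example (used as the anchor here)
clean_same = [1.0, 0.0]   # clean example of the true class
clean_other = [0.0, 1.0]  # clean example of the "false" class

# Anchor already sits near its true class: no penalty
loss_good = triplet_loss(adv, clean_same, clean_other)
# Roles swapped, as if the attack had pulled the anchor toward the false class: large penalty
loss_bad = triplet_loss(adv, clean_other, clean_same)
```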

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer


Title	Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Authors	Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
Abstract	Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.
Tasks	Common Sense Reasoning, Coreference Resolution, Document Summarization, Linguistic Acceptability, Machine Translation, Natural Language Inference, Question Answering, Semantic Textual Similarity, Sentiment Analysis, Text Classification, Transfer Learning, Word Sense Disambiguation
Published	2019-10-23
URL	https://arxiv.org/abs/1910.10683v2
PDF	https://arxiv.org/pdf/1910.10683v2.pdf
PWC	https://paperswithcode.com/paper/exploring-the-limits-of-transfer-learning
Repo	https://github.com/huggingface/transformers
Framework	pytorch
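
Note: the abstract above describes converting every language problem into a text-to-text format. A minimal sketch of what that casting can look like follows; the task names, prefixes, and example strings are illustrative assumptions, not the released preprocessing code, and no model library is called.

```python
def to_text_to_text(task, **fields):
    """Render one task instance as a single input string (prefixes are illustrative)."""
    if task == "translate_en_de":
        return "translate English to German: " + fields["text"]
    if task == "summarize":
        return "summarize: " + fields["text"]
    if task == "nli":
        return "nli premise: {premise} hypothesis: {hypothesis}".format(**fields)
    raise ValueError("unknown task: " + task)

# Every task becomes an (input string, target string) pair, so a single
# sequence-to-sequence model and loss can be shared across all of them.
examples = [
    (to_text_to_text("translate_en_de", text="That is good."), "Das ist gut."),
    (to_text_to_text("summarize", text="A long article body goes here."), "A short summary."),
    (to_text_to_text("nli", premise="A man is sleeping.", hypothesis="A man is awake."),
     "contradiction"),
]
```

Even classification labels such as "contradiction" are emitted as plain text, which is what lets one architecture cover all tasks.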

DINGO: Distributed Newton-Type Method for Gradient-Norm Optimization


Title	DINGO: Distributed Newton-Type Method for Gradient-Norm Optimization
Authors	Rixon Crane, Fred Roosta
Abstract	For optimization of a sum of functions in a distributed computing environment, we present a novel communication efficient Newton-type algorithm that enjoys a variety of advantages over similar existing methods. Similar to Newton-MR, our algorithm, DINGO, is derived by optimization of the gradient’s norm as a surrogate function. DINGO does not impose any specific form on the underlying functions, and its application range extends far beyond convexity. In addition, the distribution of the data across the computing environment can be arbitrary. Further, the underlying sub-problems of DINGO are simple linear least-squares, for which a plethora of efficient algorithms exist. Lastly, DINGO involves a few hyper-parameters that are easy to tune. Moreover, we theoretically show that DINGO is not sensitive to the choice of its hyper-parameters in that a strict reduction in the gradient norm is guaranteed, regardless of the selected hyper-parameters. We demonstrate empirical evidence of the effectiveness, stability and versatility of our method compared to other relevant algorithms.
Tasks
Published	2019-01-16
URL	https://arxiv.org/abs/1901.05134v2
PDF	https://arxiv.org/pdf/1901.05134v2.pdf
PWC	https://paperswithcode.com/paper/dingo-distributed-newton-type-method-for
Repo	https://github.com/RixonC/DINGO
Framework	pytorch

Unsupervised Keypoint Learning for Guiding Class-Conditional Video Prediction


Title	Unsupervised Keypoint Learning for Guiding Class-Conditional Video Prediction
Authors	Yunji Kim, Seonghyeon Nam, In Cho, Seon Joo Kim
Abstract	We propose a deep video prediction model conditioned on a single image and an action class. To generate future frames, we first detect keypoints of a moving object and predict future motion as a sequence of keypoints. The input image is then translated following the predicted keypoints sequence to compose future frames. Detecting the keypoints is central to our algorithm, and our method is trained to detect the keypoints of arbitrary objects in an unsupervised manner. Moreover, the detected keypoints of the original videos are used as pseudo-labels to learn the motion of objects. Experimental results show that our method is successfully applied to various datasets without the cost of labeling keypoints in videos. The detected keypoints are similar to human-annotated labels, and prediction results are more realistic compared to the previous methods.
Tasks	Video Prediction
Published	2019-10-04
URL	https://arxiv.org/abs/1910.02027v1
PDF	https://arxiv.org/pdf/1910.02027v1.pdf
PWC	https://paperswithcode.com/paper/unsupervised-keypoint-learning-for-guiding
Repo	https://github.com/YunjiKim/Unsupervised-Keypoint-Learning-for-Guiding-Class-conditional-Video-Prediction
Framework	tf

Fully Neural Network based Model for General Temporal Point Processes


Title	Fully Neural Network based Model for General Temporal Point Processes
Authors	Takahiro Omi, Naonori Ueda, Kazuyuki Aihara
Abstract	A temporal point process is a mathematical model for a time series of discrete events, which covers various applications. Recently, recurrent neural network (RNN) based models have been developed for point processes and have been found effective. RNN based models usually assume a specific functional form for the time course of the intensity function of a point process (e.g., exponentially decreasing or increasing with the time since the most recent event). However, such an assumption can restrict the expressive power of the model. We herein propose a novel RNN based model in which the time course of the intensity function is represented in a general manner. In our approach, we first model the integral of the intensity function using a feedforward neural network and then obtain the intensity function as its derivative. This approach enables us to both obtain a flexible model of the intensity function and exactly evaluate the log-likelihood function, which contains the integral of the intensity function, without any numerical approximations. Our model achieves competitive or superior performances compared to the previous state-of-the-art methods for both synthetic and real datasets.
Tasks	Point Processes, Time Series
Published	2019-05-23
URL	https://arxiv.org/abs/1905.09690v3
PDF	https://arxiv.org/pdf/1905.09690v3.pdf
PWC	https://paperswithcode.com/paper/fully-neural-network-based-model-for-general
Repo	https://github.com/omitakahiro/NeuralNetworkPointProcess
Framework	none
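
Note: the abstract above models the integral of the intensity function with a network and recovers the intensity as its derivative. The sketch below illustrates that idea under strong simplifying assumptions: a single hidden layer with hand-picked positive weights, no RNN history, and an analytic derivative in place of automatic differentiation. All parameter values and function names are hypothetical.

```python
import math

# Hypothetical parameters; weights are positive so the output increases with tau
A = 0.2               # linear term (> 0) keeps the integral unbounded
W = [0.5, 1.5, 0.8]   # input-to-hidden weights (> 0)
B = [0.0, -1.0, 0.3]  # hidden biases (unconstrained)
V = [1.0, 0.7, 0.4]   # hidden-to-output weights (> 0)

def phi(tau):
    """Monotonically increasing function of the time since the last event."""
    return A * tau + sum(v * math.tanh(w * tau + b) for w, b, v in zip(W, B, V))

def cumulative_intensity(tau):
    """Integral of the intensity from 0 to tau; zero at tau = 0 by construction."""
    return phi(tau) - phi(0.0)

def intensity(tau):
    """Derivative of cumulative_intensity with respect to tau."""
    return A + sum(v * w * (1.0 - math.tanh(w * tau + b) ** 2)
                   for w, b, v in zip(W, B, V))

def log_likelihood(inter_event_times):
    """Sum over events of log intensity minus integrated intensity (no quadrature)."""
    return sum(math.log(intensity(t)) - cumulative_intensity(t)
               for t in inter_event_times)

ll = log_likelihood([0.4, 1.1, 0.7])
```

Because the integral is the modeled quantity, the log-likelihood needs no numerical integration, and the positivity constraints guarantee a positive intensity.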

Multilateration of Random Networks with Community Structure


Title	Multilateration of Random Networks with Community Structure
Authors	Richard D. Tillquist, Manuel E. Lladser
Abstract	The minimal number of nodes required to multilaterate a network endowed with geodesic distance (i.e., to uniquely identify all nodes based on shortest path distances to the selected nodes) is called its metric dimension. This quantity is related to a useful technique for embedding graphs in low-dimensional Euclidean spaces and representing the nodes of a graph numerically for downstream analyses such as vertex classification via machine learning. While metric dimension has been studied for many kinds of graphs, its behavior on the Stochastic Block Model (SBM) ensemble has not. The simple community structure of graphs in this ensemble make them interesting in a variety of contexts. Here we derive probabilistic bounds for the metric dimension of random graphs generated according to the SBM, and describe algorithms of varying complexity to find—with high probability—subsets of nodes for multilateration. Our methods are tested on SBM ensembles with parameters extracted from real-world networks. We show that our methods scale well with increasing network size as compared to the state-of-the-art Information Content Heuristic algorithm for metric dimension approximation.
Tasks
Published	2019-11-04
URL	https://arxiv.org/abs/1911.01521v1
PDF	https://arxiv.org/pdf/1911.01521v1.pdf
PWC	https://paperswithcode.com/paper/multilateration-of-random-networks-with
Repo	https://github.com/riti4538/SBM-Metric-Dimension
Framework	none
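
Note: the definition in the abstract above (a node set multilaterates a network if shortest-path distances to it identify every node uniquely) can be checked directly. The sketch below is a brute-force verifier for small unweighted graphs, not one of the paper's algorithms; the adjacency-dict representation and function names are assumptions.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path (hop) distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def resolves(adj, landmarks):
    """True if every node has a unique vector of distances to the landmarks."""
    tables = [bfs_distances(adj, s) for s in landmarks]
    # Unreachable nodes get None in their signature
    signatures = {tuple(t.get(node) for t in tables) for node in adj}
    return len(signatures) == len(adj)

# 4-cycle: one landmark cannot tell its two neighbors apart, two adjacent landmarks can
cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
one_landmark = resolves(cycle4, [0])
two_landmarks = resolves(cycle4, [0, 1])
```

The metric dimension is the size of the smallest landmark set for which `resolves` returns True.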

Volume-preserving Neural Networks: A Solution to the Vanishing Gradient Problem


Title	Volume-preserving Neural Networks: A Solution to the Vanishing Gradient Problem
Authors	Gordon MacDonald, Andrew Godbout, Bryn Gillcash, Stephanie Cairns
Abstract	We propose a novel approach to addressing the vanishing (or exploding) gradient problem in deep neural networks. We construct a new architecture for deep neural networks where all layers (except the output layer) of the network are a combination of rotation, permutation, diagonal, and activation sublayers which are all volume preserving. This control on the volume forces the gradient (on average) to maintain equilibrium and not explode or vanish. Volume-preserving neural networks train reliably, quickly and accurately and the learning rate is consistent across layers in deep volume-preserving neural networks. To demonstrate this we apply our volume-preserving neural network model to two standard datasets.
Tasks
Published	2019-11-21
URL	https://arxiv.org/abs/1911.09576v2
PDF	https://arxiv.org/pdf/1911.09576v2.pdf
PWC	https://paperswithcode.com/paper/volume-preserving-neural-networks-a-solution
Repo	https://github.com/andrewgodbout/VPNN_pytorch
Framework	pytorch
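
Note: the abstract above builds layers from rotation, permutation, and diagonal sublayers that preserve volume. A small 2x2 check of that property is sketched below. The specific parametrization (a diagonal whose entries multiply to 1) is an assumption for illustration, and the activation sublayer mentioned in the abstract is omitted.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

theta, t = 0.7, 3.0  # arbitrary illustrative values
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]   # det = 1
permutation = [[0.0, 1.0], [1.0, 0.0]]            # det = -1
diagonal = [[t, 0.0], [0.0, 1.0 / t]]             # entries multiply to 1 (assumed constraint)

# Compose the linear part of one layer and measure how it scales volume
layer = matmul(diagonal, matmul(permutation, rotation))
volume_scale = abs(det2(layer))
```

Since determinants multiply under composition, stacking any number of such sublayers leaves the absolute determinant at 1.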

Learning Dual Retrieval Module for Semi-supervised Relation Extraction


Title	Learning Dual Retrieval Module for Semi-supervised Relation Extraction
Authors	Hongtao Lin, Jun Yan, Meng Qu, Xiang Ren
Abstract	Relation extraction is an important task in structuring content of text data, and becomes especially challenging when learning with weak supervision, where only a limited number of labeled sentences are given and a large number of unlabeled sentences are available. Most existing work exploits unlabeled data based on the ideas of self-training (i.e., bootstrapping a model) and multi-view learning (e.g., ensembling multiple model variants). However, these methods either suffer from the issue of semantic drift, or do not fully capture the problem characteristics of relation extraction. In this paper, we leverage a key insight that retrieving sentences expressing a relation is a dual task of predicting relation label for a given sentence: the two tasks are complementary to each other and can be optimized jointly for mutual enhancement. To model this intuition, we propose DualRE, a principled framework that introduces a retrieval module which is jointly trained with the original relation prediction module. In this way, high-quality samples selected by retrieval module from unlabeled data can be used to improve prediction module, and vice versa. Experimental results on two public datasets as well as case studies demonstrate the effectiveness of the DualRE approach. Code and data can be found at https://github.com/INK-USC/DualRE.
Tasks	Multi-View Learning, Relation Extraction
Published	2019-02-20
URL	http://arxiv.org/abs/1902.07814v2
PDF	http://arxiv.org/pdf/1902.07814v2.pdf
PWC	https://paperswithcode.com/paper/learning-dual-retrieval-module-for-semi
Repo	https://github.com/INK-USC/DualRE
Framework	pytorch

Reinforced Dynamic Reasoning for Conversational Question Generation


Title	Reinforced Dynamic Reasoning for Conversational Question Generation
Authors	Boyuan Pan, Hao Li, Ziyu Yao, Deng Cai, Huan Sun
Abstract	This paper investigates a new task named Conversational Question Generation (CQG) which is to generate a question based on a passage and a conversation history (i.e., previous turns of question-answer pairs). CQG is a crucial task for developing intelligent agents that can drive question-answering style conversations or test user understanding of a given passage. Towards that end, we propose a new approach named Reinforced Dynamic Reasoning (ReDR) network, which is based on the general encoder-decoder framework but incorporates a reasoning procedure in a dynamic manner to better understand what has been asked and what to ask next about the passage. To encourage producing meaningful questions, we leverage a popular question answering (QA) model to provide feedback and fine-tune the question generator using a reinforcement learning mechanism. Empirical results on the recently released CoQA dataset demonstrate the effectiveness of our method in comparison with various baselines and model variants. Moreover, to show the applicability of our method, we also apply it to create multi-turn question-answering conversations for passages in SQuAD.
Tasks	Question Answering, Question Generation
Published	2019-07-29
URL	https://arxiv.org/abs/1907.12667v1
PDF	https://arxiv.org/pdf/1907.12667v1.pdf
PWC	https://paperswithcode.com/paper/reinforced-dynamic-reasoning-for-1
Repo	https://github.com/ZJULearning/ReDR
Framework	pytorch

A Survey on Recent Advances in Named Entity Recognition from Deep Learning models


Title	A Survey on Recent Advances in Named Entity Recognition from Deep Learning models
Authors	Vikas Yadav, Steven Bethard
Abstract	Named Entity Recognition (NER) is a key component in NLP systems for question answering, information retrieval, relation extraction, etc. NER systems have been studied and developed widely for decades, but accurate systems using deep neural networks (NN) have only been introduced in the last few years. We present a comprehensive survey of deep neural network architectures for NER, and contrast them with previous approaches to NER based on feature engineering and other supervised or semi-supervised learning algorithms. Our results highlight the improvements achieved by neural networks, and show how incorporating some of the lessons learned from past work on feature-based NER systems can yield further improvements.
Tasks	Feature Engineering, Information Retrieval, Named Entity Recognition, Question Answering, Relation Extraction
Published	2019-10-25
URL	https://arxiv.org/abs/1910.11470v1
PDF	https://arxiv.org/pdf/1910.11470v1.pdf
PWC	https://paperswithcode.com/paper/a-survey-on-recent-advances-in-named-entity-2
Repo	https://github.com/vikas95/Pref_Suff_Span_NN
Framework	none

Structured Minimally Supervised Learning for Neural Relation Extraction


Title	Structured Minimally Supervised Learning for Neural Relation Extraction
Authors	Fan Bai, Alan Ritter
Abstract	We present an approach to minimally supervised relation extraction that combines the benefits of learned representations and structured learning, and accurately predicts sentence-level relation mentions given only proposition-level supervision from a KB. By explicitly reasoning about missing data during learning, our approach enables large-scale training of 1D convolutional neural networks while mitigating the issue of label noise inherent in distant supervision. Our approach achieves state-of-the-art results on minimally supervised sentential relation extraction, outperforming a number of baselines, including a competitive approach that uses the attention layer of a purely neural model.
Tasks	Relation Extraction
Published	2019-03-29
URL	https://arxiv.org/abs/1904.00118v5
PDF	https://arxiv.org/pdf/1904.00118v5.pdf
PWC	https://paperswithcode.com/paper/structured-minimally-supervised-learning-for
Repo	https://github.com/bflashcp3f/PCNN-NMAR
Framework	pytorch

A tree-based radial basis function method for noisy parallel surrogate optimization


Title	A tree-based radial basis function method for noisy parallel surrogate optimization
Authors	Chenchao Shou, Matthew West
Abstract	Parallel surrogate optimization algorithms have proven to be efficient methods for solving expensive noisy optimization problems. In this work we develop a new parallel surrogate optimization algorithm (ProSRS), using a novel tree-based “zoom strategy” to improve the efficiency of the algorithm. We prove that if ProSRS is run for sufficiently long, with probability converging to one there will be at least one point among all the evaluations that will be arbitrarily close to the global minimum. We compare our algorithm to several state-of-the-art Bayesian optimization algorithms on a suite of standard benchmark functions and two real machine learning hyperparameter-tuning problems. We find that our algorithm not only achieves significantly faster optimization convergence, but is also 1-4 orders of magnitude cheaper in computational cost.
Tasks
Published	2019-08-21
URL	https://arxiv.org/abs/1908.07980v1
PDF	https://arxiv.org/pdf/1908.07980v1.pdf
PWC	https://paperswithcode.com/paper/190807980
Repo	https://github.com/compdyn/ProSRS
Framework	none

Flow Models for Arbitrary Conditional Likelihoods


Title	Flow Models for Arbitrary Conditional Likelihoods
Authors	Yang Li, Shoaib Akbar, Junier B. Oliva
Abstract	Understanding the dependencies among features of a dataset is at the core of most unsupervised learning tasks. However, a majority of generative modeling approaches are focused solely on the joint distribution $p(x)$ and utilize models where it is intractable to obtain the conditional distribution of some arbitrary subset of features $x_u$ given the rest of the observed covariates $x_o$: $p(x_u \mid x_o)$. Traditional conditional approaches provide a model for a fixed set of covariates conditioned on another fixed set of observed covariates. Instead, in this work we develop a model that is capable of yielding all conditional distributions $p(x_u \mid x_o)$ (for arbitrary $x_u$) via tractable conditional likelihoods. We propose a novel extension of (change of variables based) flow generative models, arbitrary conditioning flow models (AC-Flow), that can be conditioned on arbitrary subsets of observed covariates, which was previously infeasible. We apply AC-Flow to the imputation of features, and also develop a unified platform for both multiple and single imputation by introducing an auxiliary objective that provides a principled single “best guess” for flow models. Extensive empirical evaluations show that our models achieve state-of-the-art performance in both single and multiple imputation across image inpainting and feature imputation in synthetic and real-world datasets. Code is available at https://github.com/lupalab/ACFlow.
Tasks	Image Inpainting, Imputation
Published	2019-09-13
URL	https://arxiv.org/abs/1909.06319v1
PDF	https://arxiv.org/pdf/1909.06319v1.pdf
PWC	https://paperswithcode.com/paper/flow-models-for-arbitrary-conditional
Repo	https://github.com/lupalab/ACFlow
Framework	tf
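
Note: the abstract above relies on change-of-variables flow models with tractable conditional likelihoods. The sketch below shows the change-of-variables formula on the simplest possible case, a 1-D affine flow whose shift and scale depend on an observed covariate. It is not AC-Flow; the conditioner function and all numeric values are hypothetical.

```python
import math

def std_normal_logpdf(z):
    """Log-density of the standard normal base distribution."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def affine_flow_logpdf(x, shift, scale):
    """log p(x) for the flow z = (x - shift) / scale with a standard-normal base."""
    z = (x - shift) / scale
    log_abs_det = -math.log(abs(scale))  # log |dz/dx|
    return std_normal_logpdf(z) + log_abs_det

def conditional_params(x_observed):
    """Hypothetical conditioner: maps the observed covariate to flow parameters."""
    return 2.0 * x_observed, 0.5

# Conditional log-likelihood of an unobserved value given an observed one
shift, scale = conditional_params(1.5)
lp = affine_flow_logpdf(3.2, shift, scale)
```

Here the result coincides with a Gaussian of mean 3.0 and standard deviation 0.5 evaluated at 3.2, which gives a closed form to check the formula against.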