Paper Group AWR 219
Deep interpretable architecture for plant diseases classification
Title | Deep interpretable architecture for plant diseases classification |
Authors | Mohammed Brahimi, Said Mahmoudi, Kamel Boukhalfa, Abdelouhab Moussaoui |
Abstract | Recently, many works have been inspired by the success of deep learning in computer vision for plant diseases classification. Unfortunately, these end-to-end deep classifiers lack transparency, which can limit their adoption in practice. In this paper, we propose a new trainable visualization method for plant diseases classification based on a Convolutional Neural Network (CNN) architecture composed of two deep classifiers. The first one is named Teacher and the second one Student. This architecture leverages multitask learning to train the Teacher and the Student jointly. Then, the representation communicated between the Teacher and the Student is used as a proxy to visualize the most important image regions for classification. This new architecture produces sharper visualizations than existing methods in the plant disease context. All experiments are conducted on the PlantVillage dataset, which contains 54306 plant images. |
Tasks | |
Published | 2019-05-31 |
URL | https://arxiv.org/abs/1905.13523v2, https://arxiv.org/pdf/1905.13523v2.pdf |
PWC | https://paperswithcode.com/paper/deep-interpretable-architecture-for-plant |
Repo | https://github.com/Tahedi1/Teacher_Student_Architecture |
Framework | tf |
Spherical Text Embedding
Title | Spherical Text Embedding |
Authors | Yu Meng, Jiaxin Huang, Guangyuan Wang, Chao Zhang, Honglei Zhuang, Lance Kaplan, Jiawei Han |
Abstract | Unsupervised text embedding has shown great power in a wide range of NLP tasks. While text embeddings are typically learned in the Euclidean space, directional similarity is often more effective in tasks such as word similarity and document clustering, which creates a gap between the training stage and usage stage of text embedding. To close this gap, we propose a spherical generative model based on which unsupervised word and paragraph embeddings are jointly learned. To learn text embeddings in the spherical space, we develop an efficient optimization algorithm with convergence guarantee based on Riemannian optimization. Our model enjoys high efficiency and achieves state-of-the-art performances on various text embedding tasks including word similarity and document clustering. |
Tasks | |
Published | 2019-11-04 |
URL | https://arxiv.org/abs/1911.01196v1, https://arxiv.org/pdf/1911.01196v1.pdf |
PWC | https://paperswithcode.com/paper/spherical-text-embedding |
Repo | https://github.com/yumeng5/Spherical-Text-Embedding |
Framework | none |
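The abstract contrasts Euclidean training with directional (cosine) usage and mentions Riemannian optimization on the sphere. The sketch below is not the authors' algorithm; it only illustrates one generic Riemannian gradient step on the unit sphere (project the Euclidean gradient onto the tangent space, step, renormalize). The function names, the learning rate, and the toy objective are assumptions made for illustration.

```python
import math

def normalize(v):
    """Scale v to unit length so it lies on the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(u, v):
    """Cosine similarity, the directional measure used at evaluation time."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))

def riemannian_step(x, euclid_grad, lr=0.1):
    """One generic gradient-descent step constrained to the unit sphere.

    1. Project the Euclidean gradient onto the tangent space at x.
    2. Move against the projected gradient.
    3. Retract back onto the sphere by renormalizing.
    """
    radial = sum(g * xi for g, xi in zip(euclid_grad, x))
    tangent = [g - radial * xi for g, xi in zip(euclid_grad, x)]
    return normalize([xi - lr * t for xi, t in zip(x, tangent)])

# Toy objective (an assumption): maximize cosine(x, target), i.e. minimize
# -x . target, whose Euclidean gradient with respect to x is -target.
target = normalize([1.0, 2.0, 2.0])
x = normalize([1.0, 0.0, 0.0])
for _ in range(200):
    x = riemannian_step(x, [-t for t in target])
```

After the loop, `x` stays on the unit sphere and points in nearly the same direction as `target`, so training and usage both operate on directions rather than on Euclidean positions.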
Metric Learning for Adversarial Robustness
Title | Metric Learning for Adversarial Robustness |
Authors | Chengzhi Mao, Ziyuan Zhong, Junfeng Yang, Carl Vondrick, Baishakhi Ray |
Abstract | Deep networks are well-known to be fragile to adversarial attacks. We conduct an empirical analysis of deep representations under the state-of-the-art attack method called PGD, and find that the attack causes the internal representation to shift closer to the “false” class. Motivated by this observation, we propose to regularize the representation space under attack with metric learning to produce more robust classifiers. By carefully sampling examples for metric learning, our learned representation not only increases robustness, but also detects previously unseen adversarial samples. Quantitative experiments show improvement of robustness accuracy by up to 4% and detection efficiency by up to 6% according to Area Under Curve score over prior work. The code of our work is available at https://github.com/columbia/Metric_Learning_Adversarial_Robustness. |
Tasks | Metric Learning |
Published | 2019-09-03 |
URL | https://arxiv.org/abs/1909.00900v2, https://arxiv.org/pdf/1909.00900v2.pdf |
PWC | https://paperswithcode.com/paper/metric-learning-for-adversarial-robustness |
Repo | https://github.com/columbia/Metric_Learning_Adversarial_Robustness |
Framework | tf |
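The abstract does not say which metric-learning loss is used. As a sketch only, assume a standard triplet margin loss in which the attacked example is the anchor, a clean example of the true class is the positive, and a clean example of the "false" class is the negative. These roles, the margin, and the 2-D numbers below are all illustrative assumptions.

```python
import math

def l2(u, v):
    """Euclidean distance between two representation vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    return max(0.0, l2(anchor, positive) - l2(anchor, negative) + margin)

# Hypothetical 2-D representations (made-up numbers for illustration).
true_clean = [0.0, 0.0]    # clean example of the true class
false_clean = [1.0, 0.0]   # clean example of the "false" class

# Attacked representation that drifted toward the false class: high loss.
loss_drifted = triplet_loss([0.9, 0.1], true_clean, false_clean)

# Attacked representation that stayed near the true class: low loss.
loss_robust = triplet_loss([0.1, 0.0], true_clean, false_clean)
```

Under these assumptions the loss penalizes exactly the shift the abstract describes: a representation that moves toward the false class incurs a larger penalty than one that stays near its true class.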
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Title | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer |
Authors | Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu |
Abstract | Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code. |
Tasks | Common Sense Reasoning, Coreference Resolution, Document Summarization, Linguistic Acceptability, Machine Translation, Natural Language Inference, Question Answering, Semantic Textual Similarity, Sentiment Analysis, Text Classification, Transfer Learning, Word Sense Disambiguation |
Published | 2019-10-23 |
URL | https://arxiv.org/abs/1910.10683v2, https://arxiv.org/pdf/1910.10683v2.pdf |
PWC | https://paperswithcode.com/paper/exploring-the-limits-of-transfer-learning |
Repo | https://github.com/huggingface/transformers |
Framework | pytorch |
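To illustrate what converting every language problem into a text-to-text format can look like, here is a minimal sketch. The helper name `to_text_to_text`, the prefix strings, and the label words are illustrative assumptions, not a specification of the released code or of the linked repository's API.

```python
def to_text_to_text(task, **fields):
    """Return an (input_text, target_text) pair for a few example tasks.

    Every task, whether generation or classification, is reduced to
    mapping one string to another string.
    """
    if task == "translation":
        return ("translate English to German: " + fields["source"],
                fields["target"])
    if task == "acceptability":
        # A classification label is emitted as a literal word.
        target = "acceptable" if fields["label"] else "unacceptable"
        return ("cola sentence: " + fields["sentence"], target)
    if task == "summarization":
        return ("summarize: " + fields["document"], fields["summary"])
    raise ValueError(f"unknown task: {task}")

examples = [
    to_text_to_text("translation",
                    source="That is good.", target="Das ist gut."),
    to_text_to_text("acceptability",
                    sentence="The course is jumping well.", label=False),
]
```

Because inputs and targets are both plain strings, the same model, loss, and decoding procedure can in principle be reused across all tasks listed in the Tasks row.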
DINGO: Distributed Newton-Type Method for Gradient-Norm Optimization
Title | DINGO: Distributed Newton-Type Method for Gradient-Norm Optimization |
Authors | Rixon Crane, Fred Roosta |
Abstract | For optimization of a sum of functions in a distributed computing environment, we present a novel communication efficient Newton-type algorithm that enjoys a variety of advantages over similar existing methods. Similar to Newton-MR, our algorithm, DINGO, is derived by optimization of the gradient’s norm as a surrogate function. DINGO does not impose any specific form on the underlying functions, and its application range extends far beyond convexity. In addition, the distribution of the data across the computing environment can be arbitrary. Further, the underlying sub-problems of DINGO are simple linear least-squares, for which a plethora of efficient algorithms exist. Lastly, DINGO involves a few hyper-parameters that are easy to tune. Moreover, we theoretically show that DINGO is not sensitive to the choice of its hyper-parameters in that a strict reduction in the gradient norm is guaranteed, regardless of the selected hyper-parameters. We demonstrate empirical evidence of the effectiveness, stability and versatility of our method compared to other relevant algorithms. |
Tasks | |
Published | 2019-01-16 |
URL | https://arxiv.org/abs/1901.05134v2, https://arxiv.org/pdf/1901.05134v2.pdf |
PWC | https://paperswithcode.com/paper/dingo-distributed-newton-type-method-for |
Repo | https://github.com/RixonC/DINGO |
Framework | pytorch |
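The surrogate mentioned in the abstract can be written out as follows. This is a sketch in our own notation, assuming a twice-differentiable objective $f(w) = \sum_i f_i(w)$ with a symmetric Hessian; it is not copied from the paper.

```latex
\min_{w} \; \tfrac{1}{2}\,\lVert \nabla f(w) \rVert^{2},
\qquad
\nabla\!\left( \tfrac{1}{2}\,\lVert \nabla f(w) \rVert^{2} \right)
  = \nabla^{2} f(w)\,\nabla f(w).
```

A Newton-MR-style direction $p$, for instance, solves $\min_{p} \lVert \nabla^{2} f(w)\,p + \nabla f(w) \rVert^{2}$, which is a linear least-squares problem. This is consistent with the abstract's remark that the sub-problems are simple linear least-squares, although the exact distributed sub-problems are not given there.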
Unsupervised Keypoint Learning for Guiding Class-Conditional Video Prediction
Title | Unsupervised Keypoint Learning for Guiding Class-Conditional Video Prediction |
Authors | Yunji Kim, Seonghyeon Nam, In Cho, Seon Joo Kim |
Abstract | We propose a deep video prediction model conditioned on a single image and an action class. To generate future frames, we first detect keypoints of a moving object and predict future motion as a sequence of keypoints. The input image is then translated following the predicted keypoints sequence to compose future frames. Detecting the keypoints is central to our algorithm, and our method is trained to detect the keypoints of arbitrary objects in an unsupervised manner. Moreover, the detected keypoints of the original videos are used as pseudo-labels to learn the motion of objects. Experimental results show that our method is successfully applied to various datasets without the cost of labeling keypoints in videos. The detected keypoints are similar to human-annotated labels, and prediction results are more realistic compared to the previous methods. |
Tasks | Video Prediction |
Published | 2019-10-04 |
URL | https://arxiv.org/abs/1910.02027v1, https://arxiv.org/pdf/1910.02027v1.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-keypoint-learning-for-guiding |
Repo | https://github.com/YunjiKim/Unsupervised-Keypoint-Learning-for-Guiding-Class-conditional-Video-Prediction |
Framework | tf |
Fully Neural Network based Model for General Temporal Point Processes
Title | Fully Neural Network based Model for General Temporal Point Processes |
Authors | Takahiro Omi, Naonori Ueda, Kazuyuki Aihara |
Abstract | A temporal point process is a mathematical model for a time series of discrete events, which covers various applications. Recently, recurrent neural network (RNN) based models have been developed for point processes and have been found effective. RNN based models usually assume a specific functional form for the time course of the intensity function of a point process (e.g., exponentially decreasing or increasing with the time since the most recent event). However, such an assumption can restrict the expressive power of the model. We herein propose a novel RNN based model in which the time course of the intensity function is represented in a general manner. In our approach, we first model the integral of the intensity function using a feedforward neural network and then obtain the intensity function as its derivative. This approach enables us to both obtain a flexible model of the intensity function and exactly evaluate the log-likelihood function, which contains the integral of the intensity function, without any numerical approximations. Our model achieves competitive or superior performances compared to the previous state-of-the-art methods for both synthetic and real datasets. |
Tasks | Point Processes, Time Series |
Published | 2019-05-23 |
URL | https://arxiv.org/abs/1905.09690v3, https://arxiv.org/pdf/1905.09690v3.pdf |
PWC | https://paperswithcode.com/paper/fully-neural-network-based-model-for-general |
Repo | https://github.com/omitakahiro/NeuralNetworkPointProcess |
Framework | none |
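The reason modeling the integral avoids numerical approximation can be seen from the standard point-process likelihood. In the sketch below the notation is ours: $\tau$ is the time since the most recent event, $\lambda^{*}$ is the intensity conditioned on the history, and $\Phi$ is the quantity that the abstract says is represented by a feedforward network.

```latex
\Phi(\tau) = \int_{0}^{\tau} \lambda^{*}(s)\,ds,
\qquad
\lambda^{*}(\tau) = \frac{\partial \Phi(\tau)}{\partial \tau},
\qquad
\log p(\tau) = \log \lambda^{*}(\tau) - \int_{0}^{\tau} \lambda^{*}(s)\,ds
             = \log \frac{\partial \Phi(\tau)}{\partial \tau} - \Phi(\tau).
```

Both terms of the log-likelihood are then available in closed form from the network: the integral is the network output itself, and the intensity is its derivative with respect to $\tau$.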
Multilateration of Random Networks with Community Structure
Title | Multilateration of Random Networks with Community Structure |
Authors | Richard D. Tillquist, Manuel E. Lladser |
Abstract | The minimal number of nodes required to multilaterate a network endowed with geodesic distance (i.e., to uniquely identify all nodes based on shortest path distances to the selected nodes) is called its metric dimension. This quantity is related to a useful technique for embedding graphs in low-dimensional Euclidean spaces and representing the nodes of a graph numerically for downstream analyses such as vertex classification via machine learning. While metric dimension has been studied for many kinds of graphs, its behavior on the Stochastic Block Model (SBM) ensemble has not. The simple community structure of graphs in this ensemble makes them interesting in a variety of contexts. Here we derive probabilistic bounds for the metric dimension of random graphs generated according to the SBM, and describe algorithms of varying complexity to find, with high probability, subsets of nodes for multilateration. Our methods are tested on SBM ensembles with parameters extracted from real-world networks. We show that our methods scale well with increasing network size as compared to the state-of-the-art Information Content Heuristic algorithm for metric dimension approximation. |
Tasks | |
Published | 2019-11-04 |
URL | https://arxiv.org/abs/1911.01521v1, https://arxiv.org/pdf/1911.01521v1.pdf |
PWC | https://paperswithcode.com/paper/multilateration-of-random-networks-with |
Repo | https://github.com/riti4538/SBM-Metric-Dimension |
Framework | none |
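The definition of multilateration in the abstract can be checked directly on small graphs. The sketch below is a brute-force verifier, not one of the paper's algorithms: it tests whether a chosen set of nodes gives every node a distinct vector of shortest-path distances. The function names and the two toy graphs are assumptions made for illustration.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path (hop) distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def resolves(adj, landmarks):
    """True if every node has a distinct distance vector to the landmarks."""
    tables = [bfs_distances(adj, s) for s in landmarks]
    seen = set()
    for node in adj:
        # Unreachable nodes get None in the corresponding position.
        signature = tuple(table.get(node) for table in tables)
        if signature in seen:
            return False
        seen.add(signature)
    return True

# Toy graphs as adjacency lists: a path 0-1-2-3 and a 4-cycle.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
```

On the path, one endpoint suffices (`resolves(path, [0])` holds), while an interior node does not. On the 4-cycle no single node works, but two adjacent nodes do. The metric dimension is the size of the smallest set for which `resolves` returns `True`.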
Volume-preserving Neural Networks: A Solution to the Vanishing Gradient Problem
Title | Volume-preserving Neural Networks: A Solution to the Vanishing Gradient Problem |
Authors | Gordon MacDonald, Andrew Godbout, Bryn Gillcash, Stephanie Cairns |
Abstract | We propose a novel approach to addressing the vanishing (or exploding) gradient problem in deep neural networks. We construct a new architecture for deep neural networks where all layers (except the output layer) of the network are a combination of rotation, permutation, diagonal, and activation sublayers which are all volume preserving. This control on the volume forces the gradient (on average) to maintain equilibrium and not explode or vanish. Volume-preserving neural networks train reliably, quickly and accurately and the learning rate is consistent across layers in deep volume-preserving neural networks. To demonstrate this we apply our volume-preserving neural network model to two standard datasets. |
Tasks | |
Published | 2019-11-21 |
URL | https://arxiv.org/abs/1911.09576v2, https://arxiv.org/pdf/1911.09576v2.pdf |
PWC | https://paperswithcode.com/paper/volume-preserving-neural-networks-a-solution |
Repo | https://github.com/andrewgodbout/VPNN_pytorch |
Framework | pytorch |
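Volume preservation for the linear sublayers means that each has a determinant of absolute value 1, so their product does too. The sketch below checks this on 2x2 matrices. It omits the activation sublayer, and the diagonal form `diag(t, 1/t)` is only one example of a determinant-1 diagonal matrix; the paper's actual parametrization is not given in the abstract, so every name and value here is an assumption.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

theta, t = 0.7, 3.0  # arbitrary illustrative parameters

rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]   # det = 1
permutation = [[0.0, 1.0],
               [1.0, 0.0]]                        # det = -1
diagonal = [[t, 0.0],
            [0.0, 1.0 / t]]                       # det = t * (1/t) = 1

# Compose the three linear sublayers into one layer matrix.
layer = matmul(diagonal, matmul(permutation, rotation))
volume_scale = abs(det2(layer))
```

Here `volume_scale` equals 1 up to floating-point error. By the chain rule, the Jacobian of a deep stack is a product of such factors, which is the intuition behind the abstract's claim that gradients neither explode nor vanish on average.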
Learning Dual Retrieval Module for Semi-supervised Relation Extraction
Title | Learning Dual Retrieval Module for Semi-supervised Relation Extraction |
Authors | Hongtao Lin, Jun Yan, Meng Qu, Xiang Ren |
Abstract | Relation extraction is an important task in structuring content of text data, and becomes especially challenging when learning with weak supervision, where only a limited number of labeled sentences are given and a large number of unlabeled sentences are available. Most existing work exploits unlabeled data based on the ideas of self-training (i.e., bootstrapping a model) and multi-view learning (e.g., ensembling multiple model variants). However, these methods either suffer from the issue of semantic drift, or do not fully capture the problem characteristics of relation extraction. In this paper, we leverage a key insight that retrieving sentences expressing a relation is a dual task of predicting the relation label for a given sentence: the two tasks are complementary to each other and can be optimized jointly for mutual enhancement. To model this intuition, we propose DualRE, a principled framework that introduces a retrieval module which is jointly trained with the original relation prediction module. In this way, high-quality samples selected by the retrieval module from unlabeled data can be used to improve the prediction module, and vice versa. Experimental results on two public datasets as well as case studies demonstrate the effectiveness of the DualRE approach. Code and data can be found at https://github.com/INK-USC/DualRE. |
Tasks | Multi-View Learning, Relation Extraction |
Published | 2019-02-20 |
URL | http://arxiv.org/abs/1902.07814v2, http://arxiv.org/pdf/1902.07814v2.pdf |
PWC | https://paperswithcode.com/paper/learning-dual-retrieval-module-for-semi |
Repo | https://github.com/INK-USC/DualRE |
Framework | pytorch |
Reinforced Dynamic Reasoning for Conversational Question Generation
Title | Reinforced Dynamic Reasoning for Conversational Question Generation |
Authors | Boyuan Pan, Hao Li, Ziyu Yao, Deng Cai, Huan Sun |
Abstract | This paper investigates a new task named Conversational Question Generation (CQG) which is to generate a question based on a passage and a conversation history (i.e., previous turns of question-answer pairs). CQG is a crucial task for developing intelligent agents that can drive question-answering style conversations or test user understanding of a given passage. Towards that end, we propose a new approach named Reinforced Dynamic Reasoning (ReDR) network, which is based on the general encoder-decoder framework but incorporates a reasoning procedure in a dynamic manner to better understand what has been asked and what to ask next about the passage. To encourage producing meaningful questions, we leverage a popular question answering (QA) model to provide feedback and fine-tune the question generator using a reinforcement learning mechanism. Empirical results on the recently released CoQA dataset demonstrate the effectiveness of our method in comparison with various baselines and model variants. Moreover, to show the applicability of our method, we also apply it to create multi-turn question-answering conversations for passages in SQuAD. |
Tasks | Question Answering, Question Generation |
Published | 2019-07-29 |
URL | https://arxiv.org/abs/1907.12667v1, https://arxiv.org/pdf/1907.12667v1.pdf |
PWC | https://paperswithcode.com/paper/reinforced-dynamic-reasoning-for-1 |
Repo | https://github.com/ZJULearning/ReDR |
Framework | pytorch |
A Survey on Recent Advances in Named Entity Recognition from Deep Learning models
Title | A Survey on Recent Advances in Named Entity Recognition from Deep Learning models |
Authors | Vikas Yadav, Steven Bethard |
Abstract | Named Entity Recognition (NER) is a key component in NLP systems for question answering, information retrieval, relation extraction, etc. NER systems have been studied and developed widely for decades, but accurate systems using deep neural networks (NN) have only been introduced in the last few years. We present a comprehensive survey of deep neural network architectures for NER, and contrast them with previous approaches to NER based on feature engineering and other supervised or semi-supervised learning algorithms. Our results highlight the improvements achieved by neural networks, and show how incorporating some of the lessons learned from past work on feature-based NER systems can yield further improvements. |
Tasks | Feature Engineering, Information Retrieval, Named Entity Recognition, Question Answering, Relation Extraction |
Published | 2019-10-25 |
URL | https://arxiv.org/abs/1910.11470v1, https://arxiv.org/pdf/1910.11470v1.pdf |
PWC | https://paperswithcode.com/paper/a-survey-on-recent-advances-in-named-entity-2 |
Repo | https://github.com/vikas95/Pref_Suff_Span_NN |
Framework | none |
Structured Minimally Supervised Learning for Neural Relation Extraction
Title | Structured Minimally Supervised Learning for Neural Relation Extraction |
Authors | Fan Bai, Alan Ritter |
Abstract | We present an approach to minimally supervised relation extraction that combines the benefits of learned representations and structured learning, and accurately predicts sentence-level relation mentions given only proposition-level supervision from a KB. By explicitly reasoning about missing data during learning, our approach enables large-scale training of 1D convolutional neural networks while mitigating the issue of label noise inherent in distant supervision. Our approach achieves state-of-the-art results on minimally supervised sentential relation extraction, outperforming a number of baselines, including a competitive approach that uses the attention layer of a purely neural model. |
Tasks | Relation Extraction |
Published | 2019-03-29 |
URL | https://arxiv.org/abs/1904.00118v5, https://arxiv.org/pdf/1904.00118v5.pdf |
PWC | https://paperswithcode.com/paper/structured-minimally-supervised-learning-for |
Repo | https://github.com/bflashcp3f/PCNN-NMAR |
Framework | pytorch |
A tree-based radial basis function method for noisy parallel surrogate optimization
Title | A tree-based radial basis function method for noisy parallel surrogate optimization |
Authors | Chenchao Shou, Matthew West |
Abstract | Parallel surrogate optimization algorithms have proven to be efficient methods for solving expensive noisy optimization problems. In this work we develop a new parallel surrogate optimization algorithm (ProSRS), using a novel tree-based “zoom strategy” to improve the efficiency of the algorithm. We prove that if ProSRS is run for sufficiently long, with probability converging to one there will be at least one point among all the evaluations that will be arbitrarily close to the global minimum. We compare our algorithm to several state-of-the-art Bayesian optimization algorithms on a suite of standard benchmark functions and two real machine learning hyperparameter-tuning problems. We find that our algorithm not only achieves significantly faster optimization convergence, but is also 1-4 orders of magnitude cheaper in computational cost. |
Tasks | |
Published | 2019-08-21 |
URL | https://arxiv.org/abs/1908.07980v1, https://arxiv.org/pdf/1908.07980v1.pdf |
PWC | https://paperswithcode.com/paper/190807980 |
Repo | https://github.com/compdyn/ProSRS |
Framework | none |
Flow Models for Arbitrary Conditional Likelihoods
Title | Flow Models for Arbitrary Conditional Likelihoods |
Authors | Yang Li, Shoaib Akbar, Junier B. Oliva |
Abstract | Understanding the dependencies among features of a dataset is at the core of most unsupervised learning tasks. However, a majority of generative modeling approaches are focused solely on the joint distribution $p(x)$ and utilize models where it is intractable to obtain the conditional distribution of some arbitrary subset of features $x_u$ given the rest of the observed covariates $x_o$: $p(x_u \mid x_o)$. Traditional conditional approaches provide a model for a fixed set of covariates conditioned on another fixed set of observed covariates. Instead, in this work we develop a model that is capable of yielding all conditional distributions $p(x_u \mid x_o)$ (for arbitrary $x_u$) via tractable conditional likelihoods. We propose a novel extension of (change of variables based) flow generative models, arbitrary conditioning flow models (AC-Flow), that can be conditioned on arbitrary subsets of observed covariates, which was previously infeasible. We apply AC-Flow to the imputation of features, and also develop a unified platform for both multiple and single imputation by introducing an auxiliary objective that provides a principled single “best guess” for flow models. Extensive empirical evaluations show that our models achieve state-of-the-art performance in both single and multiple imputation across image inpainting and feature imputation in synthetic and real-world datasets. Code is available at https://github.com/lupalab/ACFlow. |
Tasks | Image Inpainting, Imputation |
Published | 2019-09-13 |
URL | https://arxiv.org/abs/1909.06319v1, https://arxiv.org/pdf/1909.06319v1.pdf |
PWC | https://paperswithcode.com/paper/flow-models-for-arbitrary-conditional |
Repo | https://github.com/lupalab/ACFlow |
Framework | tf |