Paper Group ANR 492
Categorization in the Wild: Generalizing Cognitive Models to Naturalistic Data across Languages. What do you mean, BERT? Assessing BERT as a Distributional Semantics Model. Solving Inverse Wave Scattering with Deep Learning. A Survey on Neural Machine Reading Comprehension. OrderNet: Ordering by Example. Exact high-dimensional asymptotics for Support Vector Machine. BAM! Born-Again Multi-Task Networks for Natural Language Understanding. Modeling Heterogeneity in Mode-Switching Behavior Under a Mobility-on-Demand Transit System: An Interpretable Machine Learning Approach. Model Agnostic Defence against Backdoor Attacks in Machine Learning. Implicit Regularization of Normalization Methods. Deep Model Reference Adaptive Control. Learning Bilingual Sentence Embeddings via Autoencoding and Computing Similarities with a Multilayer Perceptron. Pairwise Feedback for Data Programming. A Free Lunch in Generating Datasets: Building a VQG and VQA System with Attention and Humans in the Loop. Feature Weighting and Boosting for Few-Shot Segmentation.
Categorization in the Wild: Generalizing Cognitive Models to Naturalistic Data across Languages
Title | Categorization in the Wild: Generalizing Cognitive Models to Naturalistic Data across Languages |
Authors | Lea Frermann, Mirella Lapata |
Abstract | Categories such as animal or furniture are acquired at an early age and play an important role in processing, organizing, and communicating world knowledge. Categories exist across cultures: they allow us to efficiently represent the complexity of the world, and members of a community strongly agree on their nature, revealing a shared mental representation. Models of category learning and representation, however, are typically tested on data from small-scale experiments involving small sets of concepts with artificially restricted features; and experiments predominantly involve participants from selected cultural and socio-economic groups (very often Western native speakers of English, such as U.S. college students). This work investigates whether models of categorization generalize (a) to rich and noisy data approximating the environment humans live in; and (b) across languages and cultures. We present a Bayesian cognitive model designed to jointly learn categories and their structured representation from natural language text, which allows us to (a) evaluate performance on a large scale, and (b) apply our model to a diverse set of languages. We show that meaningful categories comprising hundreds of concepts and richly structured featural representations emerge across languages. Our work illustrates the potential of recent advances in computational modeling and large-scale naturalistic datasets for cognitive science research. |
Tasks | |
Published | 2019-02-23 |
URL | http://arxiv.org/abs/1902.08830v1 |
http://arxiv.org/pdf/1902.08830v1.pdf | |
PWC | https://paperswithcode.com/paper/categorization-in-the-wild-generalizing |
Repo | |
Framework | |
What do you mean, BERT? Assessing BERT as a Distributional Semantics Model
Title | What do you mean, BERT? Assessing BERT as a Distributional Semantics Model |
Authors | Timothee Mickus, Denis Paperno, Mathieu Constant, Kees van Deemeter |
Abstract | Contextualized word embeddings, i.e. vector representations for words in context, are naturally seen as an extension of previous noncontextual distributional semantic models. In this work, we focus on BERT, a deep neural network that produces contextualized embeddings and has set the state-of-the-art in several semantic tasks, and study the semantic coherence of its embedding space. While showing a tendency towards coherence, BERT does not fully live up to the natural expectations for a semantic vector space. In particular, we find that the position of the sentence in which a word occurs, while carrying no meaning itself, leaves a noticeable trace on the word embeddings and disturbs similarity relationships. |
Tasks | Word Embeddings |
Published | 2019-11-13 |
URL | https://arxiv.org/abs/1911.05758v1 |
https://arxiv.org/pdf/1911.05758v1.pdf | |
PWC | https://paperswithcode.com/paper/what-do-you-mean-bert-assessing-bert-as-a |
Repo | |
Framework | |
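Not from the paper's code: a minimal sketch of the kind of probe the abstract motivates, embedding the same word in the same local context placed early vs. late in a passage and comparing the two contextual vectors. The model name and the helper function are our choices.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed_word(text, word):
    """Contextual embedding of the first occurrence of `word` in `text`."""
    enc = tok(text, return_tensors="pt")
    tokens = tok.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    idx = tokens.index(word)  # assumes `word` survives as a single wordpiece
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state  # (1, seq_len, 768)
    return hidden[0, idx]

early = embed_word("The river bank was muddy. It rained all week.", "bank")
late = embed_word("It rained all week. The river bank was muddy.", "bank")
# If sentence position left no trace, this similarity would be ~1.0.
print(torch.cosine_similarity(early, late, dim=0).item())
```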
Solving Inverse Wave Scattering with Deep Learning
Title | Solving Inverse Wave Scattering with Deep Learning |
Authors | Yuwei Fan, Lexing Ying |
Abstract | This paper proposes a neural network approach for solving two classical problems in two-dimensional inverse wave scattering: the far-field pattern problem and seismic imaging. The mathematical problem of inverse wave scattering is to recover the scatterer field of a medium based on the boundary measurement of the scattered wave from the medium, which is high-dimensional and nonlinear. For the far-field pattern problem under the circular experimental setup, a perturbative analysis shows that the forward map can be approximated by a vectorized convolution operator in the angular direction. Motivated by this and filtered back-projection, we propose an effective neural network architecture for the inverse map using the recently introduced BCR-Net along with standard convolution layers. Analogously, for the seismic imaging problem, we propose a similar neural network architecture under the rectangular domain setup with a depth-dependent background velocity. Numerical results demonstrate the efficiency of the proposed neural networks. |
Tasks | |
Published | 2019-11-27 |
URL | https://arxiv.org/abs/1911.13202v1 |
https://arxiv.org/pdf/1911.13202v1.pdf | |
PWC | https://paperswithcode.com/paper/solving-inverse-wave-scattering-with-deep |
Repo | |
Framework | |
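A hedged illustration of the structural ingredient the abstract highlights: under the circular experimental setup, the forward map is approximately a convolution in the angular direction, which suggests convolution layers with wrap-around (circular) padding on the angle axis. The paper's full architecture (BCR-Net plus standard convolution layers) is more involved; the channel counts and angle resolution below are arbitrary.

```python
import torch
import torch.nn as nn

# 1-D convolution along the receiver-angle axis with circular padding,
# so the filter wraps around the 0/360-degree boundary.
angular_conv = nn.Conv1d(in_channels=4, out_channels=8,
                         kernel_size=5, padding=2, padding_mode="circular")

x = torch.randn(1, 4, 180)    # 4 feature channels sampled at 180 angles
print(angular_conv(x).shape)  # torch.Size([1, 8, 180])
```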
A Survey on Neural Machine Reading Comprehension
Title | A Survey on Neural Machine Reading Comprehension |
Authors | Boyu Qiu, Xu Chen, Jungang Xu, Yingfei Sun |
Abstract | Enabling a machine to read and comprehend natural language documents so that it can answer questions remains an elusive challenge. In recent years, the popularity of deep learning and the establishment of large-scale datasets have both promoted the prosperity of Machine Reading Comprehension. This paper presents how to utilize neural networks to build a reader, introduces some classic models, and analyzes the improvements they make. Further, we point out the defects of existing models and discuss future research directions. |
Tasks | Machine Reading Comprehension, Reading Comprehension |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.03824v1 |
https://arxiv.org/pdf/1906.03824v1.pdf | |
PWC | https://paperswithcode.com/paper/a-survey-on-neural-machine-reading |
Repo | |
Framework | |
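Not from the survey itself: a minimal sketch of the generic span-extraction reader pattern such surveys cover (encode passage and question, attend from passage to question, predict answer start and end positions). All dimensions and the attention form are illustrative.

```python
import torch
import torch.nn as nn

class MinimalReader(nn.Module):
    def __init__(self, vocab=10000, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.p_enc = nn.GRU(dim, dim, batch_first=True, bidirectional=True)
        self.q_enc = nn.GRU(dim, dim, batch_first=True, bidirectional=True)
        self.start = nn.Linear(4 * dim, 1)
        self.end = nn.Linear(4 * dim, 1)

    def forward(self, passage, question):
        p, _ = self.p_enc(self.emb(passage))    # (B, Lp, 2*dim)
        q, _ = self.q_enc(self.emb(question))   # (B, Lq, 2*dim)
        att = torch.softmax(p @ q.transpose(1, 2), dim=-1)  # passage-to-question
        h = torch.cat([p, att @ q], dim=-1)     # question-aware passage states
        return self.start(h).squeeze(-1), self.end(h).squeeze(-1)

reader = MinimalReader()
p = torch.randint(0, 10000, (2, 50))
q = torch.randint(0, 10000, (2, 10))
s, e = reader(p, q)
print(s.shape, e.shape)  # per-token start/end scores: torch.Size([2, 50]) each
```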
OrderNet: Ordering by Example
Title | OrderNet: Ordering by Example |
Authors | Robert Porter |
Abstract | In this paper we introduce a new neural architecture for sorting unordered sequences where the correct sequence order is not easily defined but must rather be inferred from training data. We refer to this architecture as OrderNet and describe how it was constructed to be naturally permutation equivariant while still allowing for rich interactions among elements of the input set. We evaluate the capabilities of our architecture by training it to approximate solutions for the Traveling Salesman Problem and find that it outperforms previously studied supervised techniques in its ability to generalize to sequences longer than those it was trained on. We further demonstrate this capability by reconstructing the original word order of scrambled sentences. |
Tasks | |
Published | 2019-05-27 |
URL | https://arxiv.org/abs/1905.11536v1 |
https://arxiv.org/pdf/1905.11536v1.pdf | |
PWC | https://paperswithcode.com/paper/ordernet-ordering-by-example |
Repo | |
Framework | |
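A hedged sketch of the key property the abstract names: a permutation-equivariant set layer (here in the DeepSets style; the actual OrderNet layer may differ) updates each element from its own features plus a pooled summary of the whole set, so permuting the inputs permutes the outputs identically.

```python
import torch
import torch.nn as nn

class EquivariantLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.self_lin = nn.Linear(dim, dim)
        self.pool_lin = nn.Linear(dim, dim)

    def forward(self, x):                     # x: (batch, set_size, dim)
        pooled = x.mean(dim=1, keepdim=True)  # permutation-invariant summary
        return torch.relu(self.self_lin(x) + self.pool_lin(pooled))

layer = EquivariantLayer(16)
x = torch.randn(2, 5, 16)
perm = torch.randperm(5)
# Permuting inputs permutes outputs the same way: equivariance.
print(torch.allclose(layer(x)[:, perm], layer(x[:, perm]), atol=1e-6))  # True
```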
Exact high-dimensional asymptotics for Support Vector Machine
Title | Exact high-dimensional asymptotics for Support Vector Machine |
Authors | Haoyang Liu |
Abstract | The Support Vector Machine (SVM) is one of the most widely used classification methods. In this paper, we consider the soft-margin SVM used on data points with independent features, where the sample size $n$ and the feature dimension $p$ grow to $\infty$ at a fixed ratio $p/n\rightarrow \delta$. We propose a set of equations that exactly characterizes the asymptotic behavior of the support vector machine. In particular, we give exact formulas for (1) the variability of the optimal coefficients, (2) the proportion of data points lying on the margin boundary (i.e., the number of support vectors), (3) the final objective function value, and (4) the expected misclassification error on new data points, which in particular implies the exact formula for the optimal tuning parameter given a data generating mechanism. We first establish these formulas in the case where the label $y\in\{+1,-1\}$ is independent of the feature $x$. Then the results are generalized to the case where the label $y\in\{+1,-1\}$ is allowed to have a general dependence on the feature $x$ through a linear combination $a_0^Tx$. These formulas for the non-smooth hinge loss are analogous to the recent results in \citep{sur2018modern} for the smooth logistic loss. Our approach is based on heuristic leave-one-out calculations. |
Tasks | |
Published | 2019-05-13 |
URL | https://arxiv.org/abs/1905.05125v2 |
https://arxiv.org/pdf/1905.05125v2.pdf | |
PWC | https://paperswithcode.com/paper/exact-high-dimensional-asymptotics-for |
Repo | |
Framework | |
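For reference, a standard form of the soft-margin SVM objective in the proportional regime the abstract studies (our notation; the paper's exact parametrization of the regularizer may differ):

```latex
\min_{w \in \mathbb{R}^p} \; \frac{1}{n} \sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i x_i^{T} w\bigr) \;+\; \frac{\lambda}{2}\,\|w\|_2^2,
\qquad \frac{p}{n} \rightarrow \delta \in (0,\infty).
```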
BAM! Born-Again Multi-Task Networks for Natural Language Understanding
Title | BAM! Born-Again Multi-Task Networks for Natural Language Understanding |
Authors | Kevin Clark, Minh-Thang Luong, Urvashi Khandelwal, Christopher D. Manning, Quoc V. Le |
Abstract | It can be challenging to train multi-task neural networks that outperform or even match their single-task counterparts. To help address this, we propose using knowledge distillation where single-task models teach a multi-task model. We enhance this training with teacher annealing, a novel method that gradually transitions the model from distillation to supervised learning, helping the multi-task model surpass its single-task teachers. We evaluate our approach by multi-task fine-tuning BERT on the GLUE benchmark. Our method consistently improves over standard single-task and multi-task training. |
Tasks | |
Published | 2019-07-10 |
URL | https://arxiv.org/abs/1907.04829v1 |
https://arxiv.org/pdf/1907.04829v1.pdf | |
PWC | https://paperswithcode.com/paper/bam-born-again-multi-task-networks-for |
Repo | |
Framework | |
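A sketch of one plausible formulation of teacher annealing consistent with the abstract: the training signal interpolates linearly from pure distillation (teacher targets) to pure supervised learning (gold labels). Variable names are placeholders; interpolating the two losses is equivalent, up to a constant, to interpolating the target distributions.

```python
import torch.nn.functional as F

def teacher_annealing_loss(student_logits, teacher_logits, labels,
                           step, total_steps):
    lam = step / total_steps                      # anneals from 0 to 1
    ce = F.cross_entropy(student_logits, labels)  # supervised term (gold labels)
    kd = F.kl_div(F.log_softmax(student_logits, dim=-1),
                  F.softmax(teacher_logits, dim=-1),
                  reduction="batchmean")          # distillation term (teacher)
    return lam * ce + (1.0 - lam) * kd
```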
Modeling Heterogeneity in Mode-Switching Behavior Under a Mobility-on-Demand Transit System: An Interpretable Machine Learning Approach
Title | Modeling Heterogeneity in Mode-Switching Behavior Under a Mobility-on-Demand Transit System: An Interpretable Machine Learning Approach |
Authors | Xilei Zhao, Xiang Yan, Pascal Van Hentenryck |
Abstract | Recent years have witnessed an increased focus on interpretability and the use of machine learning to inform policy analysis and decision making. This paper applies machine learning to examine travel behavior, focusing in particular on modeling changes in travel modes when individuals are presented with a novel (on-demand) mobility option. It addresses the following question: Can machine learning be applied to model individual taste heterogeneity (preference heterogeneity for travel modes and response heterogeneity to travel attributes) in travel mode choice? This paper first develops a high-accuracy classifier to predict mode-switching behavior under a hypothetical Mobility-on-Demand Transit system (i.e., stated-preference data), which represents the case study underlying this research. We show that this classifier naturally captures individual heterogeneity available in the data. Moreover, the paper derives insights on heterogeneous switching behaviors through the generation of marginal effects and elasticities by current travel mode, partial dependence plots, and individual conditional expectation plots. The paper also proposes two new model-agnostic interpretation tools for machine learning, i.e., conditional partial dependence plots and conditional individual partial dependence plots, specifically designed to examine response heterogeneity. The results on the case study show that the machine-learning classifier, together with model-agnostic interpretation tools, provides valuable insights on travel mode switching behavior for different individuals and population segments. For example, existing drivers are more sensitive to additional pickups than people using other travel modes, and current transit users are generally willing to share rides but reluctant to take any additional transfers. |
Tasks | Decision Making, Interpretable Machine Learning |
Published | 2019-02-08 |
URL | http://arxiv.org/abs/1902.02904v1 |
http://arxiv.org/pdf/1902.02904v1.pdf | |
PWC | https://paperswithcode.com/paper/modeling-heterogeneity-in-mode-switching |
Repo | |
Framework | |
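A minimal sketch (our interface, not the authors' implementation) of a conditional partial dependence plot: compute the usual partial dependence curve separately within subgroups defined by a conditioning variable such as the current travel mode. A binary sklearn-style classifier with `predict_proba` is assumed.

```python
import numpy as np

def conditional_pdp(model, X, feature_idx, grid, groups):
    """One partial dependence curve per subgroup in `groups`."""
    curves = {}
    for g in np.unique(groups):
        Xg = X[groups == g].copy()
        curve = []
        for v in grid:
            Xg[:, feature_idx] = v  # force the feature to the grid value
            curve.append(model.predict_proba(Xg)[:, 1].mean())
        curves[g] = np.array(curve)
    return curves
```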
Model Agnostic Defence against Backdoor Attacks in Machine Learning
Title | Model Agnostic Defence against Backdoor Attacks in Machine Learning |
Authors | Sakshi Udeshi, Shanshan Peng, Gerald Woo, Lionell Loh, Louth Rawshan, Sudipta Chattopadhyay |
Abstract | Machine Learning (ML) has automated a multitude of our day-to-day decision making domains such as education, employment and driving automation. The continued success of ML largely depends on our ability to trust the model we are using. Recently, a new class of attacks called Backdoor Attacks have been developed. These attacks undermine the user’s trust in ML models. In this work, we present NEO, a model-agnostic framework to detect and mitigate such backdoor attacks in image classification ML models. For a given image classification model, our approach analyses the inputs it receives and determines if the model is backdoored. In addition to this feature, we also mitigate these attacks by determining the correct predictions of the poisoned images. An appealing feature of NEO is that it can, for the first time, isolate and reconstruct the backdoor trigger. NEO is also, to the best of our knowledge, the first defence methodology that is completely blackbox. We have implemented NEO and evaluated it against three state-of-the-art poisoned models. These models include highly critical applications such as traffic sign detection (USTS) and facial detection. In our evaluation, we show that NEO can detect $\approx$88% of the poisoned inputs on average and it is as fast as 4.4 ms per input image. We also reconstruct the poisoned input for the user to effectively test their systems. |
Tasks | Decision Making, Image Classification |
Published | 2019-08-06 |
URL | https://arxiv.org/abs/1908.02203v2 |
https://arxiv.org/pdf/1908.02203v2.pdf | |
PWC | https://paperswithcode.com/paper/model-agnostic-defence-against-backdoor |
Repo | |
Framework | |
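A loose sketch of the blackbox detection idea as we read the abstract: occlude image regions with the image's dominant color and flag inputs whose prediction flips, which also localizes a candidate trigger region. The `model` interface, patch size, and stride are our assumptions, not NEO's actual parameters.

```python
import numpy as np

def scan_for_trigger(model, img, patch=8, stride=8):
    """img: (H, W, 3) uint8; model(batch) returns an array of class ids."""
    dominant = np.median(img.reshape(-1, 3), axis=0).astype(img.dtype)
    base = model(img[None])[0]
    flips = []
    H, W, _ = img.shape
    for i in range(0, H - patch + 1, stride):
        for j in range(0, W - patch + 1, stride):
            probe = img.copy()
            probe[i:i + patch, j:j + patch] = dominant  # blocker patch
            if model(probe[None])[0] != base:
                flips.append((i, j))  # prediction flipped: trigger-like region
    return flips
```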
Implicit Regularization of Normalization Methods
Title | Implicit Regularization of Normalization Methods |
Authors | Xiaoxia Wu, Edgar Dobriban, Tongzheng Ren, Shanshan Wu, Zhiyuan Li, Suriya Gunasekar, Rachel Ward, Qiang Liu |
Abstract | Normalization methods such as batch normalization are commonly used in overparametrized models like neural networks. Here, we study the weight normalization (WN) method (Salimans & Kingma, 2016) and a variant called reparametrized projected gradient descent (rPGD) for overparametrized least squares regression and some more general loss functions. WN and rPGD reparametrize the weights with a scale $g$ and a unit vector such that the objective function becomes \emph{non-convex}. We show that this non-convex formulation has beneficial regularization effects compared to gradient descent on the original objective. We show that these methods adaptively regularize the weights and \emph{converge with exponential rate} to the minimum $\ell_2$ norm solution (or close to it) even for initializations \emph{far from zero}. This is different from the behavior of gradient descent, which only converges to the min norm solution when started at zero, and is more sensitive to initialization. Some of our proof techniques are different from many related works; for instance we find explicit invariants along the gradient flow paths. We verify our results experimentally and suggest that there may be a similar phenomenon for nonlinear problems such as matrix sensing. |
Tasks | |
Published | 2019-11-18 |
URL | https://arxiv.org/abs/1911.07956v2 |
https://arxiv.org/pdf/1911.07956v2.pdf | |
PWC | https://paperswithcode.com/paper/implicit-regularization-of-normalization |
Repo | |
Framework | |
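A small numerical sketch of the reparametrization the abstract describes: $w = g\,v/\|v\|$ with plain gradient descent on $(g, v)$ for overparametrized least squares, started far from zero. Step size, iteration count, and initialization are our choices; the printed distance to the minimum-$\ell_2$-norm interpolator should be small if the paper's claim holds in this instance.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 50  # overparametrized: p > n
X, y = rng.standard_normal((n, p)), rng.standard_normal(n)

g, v = 1.0, rng.standard_normal(p)  # initialization far from zero
lr = 0.01
for _ in range(20000):
    u = v / np.linalg.norm(v)
    grad_w = X.T @ (X @ (g * u) - y) / n        # gradient w.r.t. effective w
    grad_g = u @ grad_w                         # chain rule through w = g * u
    grad_v = (g / np.linalg.norm(v)) * (grad_w - grad_g * u)
    g -= lr * grad_g
    v -= lr * grad_v

w_min_norm = X.T @ np.linalg.solve(X @ X.T, y)  # minimum l2-norm interpolator
print(np.linalg.norm(g * v / np.linalg.norm(v) - w_min_norm))
```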
Deep Model Reference Adaptive Control
Title | Deep Model Reference Adaptive Control |
Authors | Girish Joshi, Girish Chowdhary |
Abstract | We present a new neuroadaptive architecture: Deep Neural Network based Model Reference Adaptive Control (DMRAC). Our architecture utilizes the power of deep neural network representations for modeling significant nonlinearities while marrying it with the boundedness guarantees that characterize MRAC based controllers. We demonstrate through simulations and analysis that DMRAC can subsume previously studied learning based MRAC methods, such as concurrent learning and GP-MRAC. This makes DMRAC a highly powerful architecture for high-performance control of nonlinear systems with long-term learning properties. |
Tasks | |
Published | 2019-09-18 |
URL | https://arxiv.org/abs/1909.08602v1 |
https://arxiv.org/pdf/1909.08602v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-model-reference-adaptive-control |
Repo | |
Framework | |
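For context, one standard MRAC structure that architectures like DMRAC build on (our notation and sign conventions; per the abstract, DMRAC replaces hand-designed features $\phi(x)$ with features learned by a deep network):

```latex
u = u_{\mathrm{pd}} - u_{\mathrm{ad}}, \qquad
u_{\mathrm{ad}} = \hat{W}^{T} \phi(x), \qquad
\dot{\hat{W}} = -\Gamma\, \phi(x)\, e^{T} P B,
```

where $e = x - x_{\mathrm{rm}}$ is the tracking error against the reference model, $\Gamma$ is the adaptation gain, and $P$ solves the associated Lyapunov equation.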
Learning Bilingual Sentence Embeddings via Autoencoding and Computing Similarities with a Multilayer Perceptron
Title | Learning Bilingual Sentence Embeddings via Autoencoding and Computing Similarities with a Multilayer Perceptron |
Authors | Yunsu Kim, Hendrik Rosendahl, Nick Rossenbach, Jan Rosendahl, Shahram Khadivi, Hermann Ney |
Abstract | We propose a novel model architecture and training algorithm to learn bilingual sentence embeddings from a combination of parallel and monolingual data. Our method connects autoencoding and neural machine translation to force the source and target sentence embeddings to share the same space without the help of a pivot language or an additional transformation. We train a multilayer perceptron on top of the sentence embeddings to extract good bilingual sentence pairs from nonparallel or noisy parallel data. Our approach shows promising performance on sentence alignment recovery and the WMT 2018 parallel corpus filtering tasks with only a single model. |
Tasks | Machine Translation, Sentence Embeddings |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1906.01942v1 |
https://arxiv.org/pdf/1906.01942v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-bilingual-sentence-embeddings-via |
Repo | |
Framework | |
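A hedged sketch of the second stage described in the abstract: a multilayer perceptron on top of (fixed) source and target sentence embeddings that scores whether a pair is a good bilingual sentence pair. The embedding dimension and the input features (concatenation plus elementwise product) are our assumptions.

```python
import torch
import torch.nn as nn

class PairScorer(nn.Module):
    def __init__(self, dim=512, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 * dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, src_emb, tgt_emb):
        feats = torch.cat([src_emb, tgt_emb, src_emb * tgt_emb], dim=-1)
        return torch.sigmoid(self.mlp(feats)).squeeze(-1)  # P(pair is parallel)

scorer = PairScorer()
src, tgt = torch.randn(8, 512), torch.randn(8, 512)
print(scorer(src, tgt).shape)  # one score per sentence pair: torch.Size([8])
```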
Pairwise Feedback for Data Programming
Title | Pairwise Feedback for Data Programming |
Authors | Benedikt Boecking, Artur Dubrawski |
Abstract | The scalability of the labeling process and the attainable quality of labels have become limiting factors for many applications of machine learning. The programmatic creation of labeled datasets via the synthesis of noisy heuristics provides a promising avenue to address this problem. We propose to improve modeling of latent class variables in the programmatic creation of labeled datasets by incorporating pairwise feedback into the process. We discuss the ease with which such pairwise feedback can be obtained or generated in many application domains. Our experiments show that even a small number of sources of pairwise feedback can substantially improve the quality of the posterior estimate of the latent class variable. |
Tasks | |
Published | 2019-12-16 |
URL | https://arxiv.org/abs/1912.07685v1 |
https://arxiv.org/pdf/1912.07685v1.pdf | |
PWC | https://paperswithcode.com/paper/pairwise-feedback-for-data-programming |
Repo | |
Framework | |
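A toy stand-in (simple posterior smoothing, not the authors' generative label model): start from noisy per-item soft labels and let pairwise "these two share a class" feedback pull the estimates of linked items toward each other. All shapes and the smoothing rule are our own.

```python
import numpy as np

def refine_with_pairs(votes, pairs, iters=10, strength=0.5):
    """votes: (n_items, n_classes) soft labels; pairs: same-class index pairs."""
    post = votes / votes.sum(axis=1, keepdims=True)
    for _ in range(iters):
        upd = post.copy()
        for i, j in pairs:
            avg = (post[i] + post[j]) / 2  # shared-class evidence
            upd[i] = (1 - strength) * post[i] + strength * avg
            upd[j] = (1 - strength) * post[j] + strength * avg
        post = upd / upd.sum(axis=1, keepdims=True)
    return post

votes = np.array([[3.0, 1.0], [1.0, 1.2], [0.5, 3.0]])
print(refine_with_pairs(votes, [(0, 1)]).round(2))  # item 1 drifts toward item 0
```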
A Free Lunch in Generating Datasets: Building a VQG and VQA System with Attention and Humans in the Loop
Title | A Free Lunch in Generating Datasets: Building a VQG and VQA System with Attention and Humans in the Loop |
Authors | Jihyeon Janel Lee, Sho Arora |
Abstract | Despite their importance in training artificial intelligence systems, large datasets remain challenging to acquire. For example, the ImageNet dataset required fourteen million labels of basic human knowledge, such as whether an image contains a chair. Unfortunately, this knowledge is so simple that labeling it is tedious for human annotators, yet tacit enough that human annotators remain necessary. However, human collaborative efforts for tasks like labeling massive amounts of data are costly, inconsistent, and prone to failure, and this method does not resolve the issue of the resulting dataset being static in nature. What if we asked people questions they want to answer and collected their responses as data? This would mean we could gather data at a much lower cost, and expanding a dataset would simply become a matter of asking more questions. We focus on the task of Visual Question Answering (VQA) and propose a system that uses Visual Question Generation (VQG) to produce questions, asks them to social media users, and collects their responses. We present two models that can then parse clean answers from the noisy human responses significantly better than our baselines, with the goal of eventually incorporating the answers into a Visual Question Answering (VQA) dataset. By demonstrating how our system can collect large amounts of data at little to no cost, we envision similar systems being used to improve performance on other tasks in the future. |
Tasks | Question Answering, Question Generation, Visual Question Answering |
Published | 2019-11-30 |
URL | https://arxiv.org/abs/1912.00124v1 |
https://arxiv.org/pdf/1912.00124v1.pdf | |
PWC | https://paperswithcode.com/paper/a-free-lunch-in-generating-datasets-building |
Repo | |
Framework | |
Feature Weighting and Boosting for Few-Shot Segmentation
Title | Feature Weighting and Boosting for Few-Shot Segmentation |
Authors | Khoi Nguyen, Sinisa Todorovic |
Abstract | This paper is about few-shot segmentation of foreground objects in images. We train a CNN on small subsets of training images, each mimicking the few-shot setting. In each subset, one image serves as the query and the other(s) as support image(s) with ground-truth segmentation. The CNN first extracts feature maps from the query and support images. Then, a class feature vector is computed as an average of the support’s feature maps over the known foreground. Finally, the target object is segmented in the query image by using a cosine similarity between the class feature vector and the query’s feature map. We make two contributions by: (1) Improving discriminativeness of features so their activations are high on the foreground and low elsewhere; and (2) Boosting inference with an ensemble of experts guided with the gradient of loss incurred when segmenting the support images in testing. Our evaluations on the PASCAL-$5^i$ and COCO-$20^i$ datasets demonstrate that we significantly outperform existing approaches. |
Tasks | |
Published | 2019-09-28 |
URL | https://arxiv.org/abs/1909.13140v1 |
https://arxiv.org/pdf/1909.13140v1.pdf | |
PWC | https://paperswithcode.com/paper/feature-weighting-and-boosting-for-few-shot |
Repo | |
Framework | |
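A minimal sketch of the baseline inference step the abstract describes (without the paper's feature weighting or the ensemble of boosted experts): masked average pooling of the support features over the known foreground yields a class vector, and the query is segmented by thresholded cosine similarity. Shapes and the threshold are our assumptions.

```python
import torch
import torch.nn.functional as F

def segment_query(support_feat, support_mask, query_feat, thresh=0.5):
    """support_feat, query_feat: (C, H, W); support_mask: (H, W) in {0, 1}."""
    fg = support_mask.unsqueeze(0)                              # (1, H, W)
    class_vec = (support_feat * fg).sum(dim=(1, 2)) / fg.sum()  # (C,)
    sim = F.cosine_similarity(query_feat,
                              class_vec.view(-1, 1, 1), dim=0)  # (H, W)
    return (sim > thresh).float()

C, H, W = 64, 32, 32
pred = segment_query(torch.randn(C, H, W),
                     (torch.rand(H, W) > 0.5).float(),
                     torch.randn(C, H, W))
print(pred.shape)  # binary query mask: torch.Size([32, 32])
```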