Paper Group ANR 492
Categorization in the Wild: Generalizing Cognitive Models to Naturalistic Data across Languages. What do you mean, BERT? Assessing BERT as a Distributional Semantics Model. Solving Inverse Wave Scattering with Deep Learning. A Survey on Neural Machine Reading Comprehension. OrderNet: Ordering by Example. Exact high-dimensional asymptotics for Support Vector Machine. BAM! Born-Again Multi-Task Networks for Natural Language Understanding. Modeling Heterogeneity in Mode-Switching Behavior Under a Mobility-on-Demand Transit System: An Interpretable Machine Learning Approach. Model Agnostic Defence against Backdoor Attacks in Machine Learning. Implicit Regularization of Normalization Methods. Deep Model Reference Adaptive Control. Learning Bilingual Sentence Embeddings via Autoencoding and Computing Similarities with a Multilayer Perceptron. Pairwise Feedback for Data Programming. A Free Lunch in Generating Datasets: Building a VQG and VQA System with Attention and Humans in the Loop. Feature Weighting and Boosting for Few-Shot Segmentation.
Categorization in the Wild: Generalizing Cognitive Models to Naturalistic Data across Languages
Title | Categorization in the Wild: Generalizing Cognitive Models to Naturalistic Data across Languages |
Authors | Lea Frermann, Mirella Lapata |
Abstract | Categories such as animal or furniture are acquired at an early age and play an important role in processing, organizing, and communicating world knowledge. Categories exist across cultures: they allow us to efficiently represent the complexity of the world, and members of a community strongly agree on their nature, revealing a shared mental representation. Models of category learning and representation, however, are typically tested on data from small-scale experiments involving small sets of concepts with artificially restricted features; and experiments predominantly involve participants from selected cultural and socio-economic groups (very often Western native speakers of English, such as U.S. college students). This work investigates whether models of categorization generalize (a) to rich and noisy data approximating the environment humans live in; and (b) across languages and cultures. We present a Bayesian cognitive model designed to jointly learn categories and their structured representation from natural language text, which allows us to (a) evaluate performance on a large scale, and (b) apply our model to a diverse set of languages. We show that meaningful categories comprising hundreds of concepts and richly structured featural representations emerge across languages. Our work illustrates the potential of recent advances in computational modeling and large-scale naturalistic datasets for cognitive science research. |
Tasks | |
Published | 2019-02-23 |
URL | http://arxiv.org/abs/1902.08830v1 |
http://arxiv.org/pdf/1902.08830v1.pdf | |
PWC | https://paperswithcode.com/paper/categorization-in-the-wild-generalizing |
Repo | |
Framework | |
What do you mean, BERT? Assessing BERT as a Distributional Semantics Model
Title | What do you mean, BERT? Assessing BERT as a Distributional Semantics Model |
Authors | Timothee Mickus, Denis Paperno, Mathieu Constant, Kees van Deemeter |
Abstract | Contextualized word embeddings, i.e. vector representations for words in context, are naturally seen as an extension of previous noncontextual distributional semantic models. In this work, we focus on BERT, a deep neural network that produces contextualized embeddings and has set the state-of-the-art in several semantic tasks, and study the semantic coherence of its embedding space. While showing a tendency towards coherence, BERT does not fully live up to the natural expectations for a semantic vector space. In particular, we find that the position of the sentence in which a word occurs, while carrying no meaning itself, leaves a noticeable trace on the word embeddings and disturbs similarity relationships. |
Tasks | Word Embeddings |
Published | 2019-11-13 |
URL | https://arxiv.org/abs/1911.05758v1 |
https://arxiv.org/pdf/1911.05758v1.pdf | |
PWC | https://paperswithcode.com/paper/what-do-you-mean-bert-assessing-bert-as-a |
Repo | |
Framework | |
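Not from the paper's code: a minimal sketch of the kind of probe the abstract motivates, embedding the same word in the same local context placed early vs. late in a passage and comparing the two contextual vectors. The model name and the helper function are our choices.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed_word(text, word):
    """Contextual embedding of the first occurrence of `word` in `text`."""
    enc = tok(text, return_tensors="pt")
    tokens = tok.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    idx = tokens.index(word)  # assumes `word` survives as a single wordpiece
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state  # (1, seq_len, 768)
    return hidden[0, idx]

early = embed_word("The river bank was muddy. It rained all week.", "bank")
late = embed_word("It rained all week. The river bank was muddy.", "bank")
# If sentence position left no trace, this similarity would be ~1.0.
print(torch.cosine_similarity(early, late, dim=0).item())
```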
Solving Inverse Wave Scattering with Deep Learning
Title | Solving Inverse Wave Scattering with Deep Learning |
Authors | Yuwei Fan, Lexing Ying |
Abstract | This paper proposes a neural network approach for solving two classical problems in two-dimensional inverse wave scattering: the far-field pattern problem and seismic imaging. The mathematical problem of inverse wave scattering is to recover the scatterer field of a medium based on the boundary measurement of the scattered wave from the medium, which is high-dimensional and nonlinear. For the far-field pattern problem under the circular experimental setup, a perturbative analysis shows that the forward map can be approximated by a vectorized convolution operator in the angular direction. Motivated by this and filtered back-projection, we propose an effective neural network architecture for the inverse map using the recently introduced BCR-Net along with standard convolution layers. Analogously, for the seismic imaging problem, we propose a similar neural network architecture under the rectangular domain setup with a depth-dependent background velocity. Numerical results demonstrate the efficiency of the proposed neural networks. |
Tasks | |
Published | 2019-11-27 |
URL | https://arxiv.org/abs/1911.13202v1 |
https://arxiv.org/pdf/1911.13202v1.pdf | |
PWC | https://paperswithcode.com/paper/solving-inverse-wave-scattering-with-deep |
Repo | |
Framework | |
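A hedged illustration of the structural ingredient the abstract highlights: under the circular experimental setup, the forward map is approximately a convolution in the angular direction, which suggests convolution layers with wrap-around (circular) padding on the angle axis. The paper's full architecture (BCR-Net plus standard convolution layers) is more involved; the channel counts and angle resolution below are arbitrary.

```python
import torch
import torch.nn as nn

# 1-D convolution along the receiver-angle axis with circular padding,
# so the filter wraps around the 0/360-degree boundary.
angular_conv = nn.Conv1d(in_channels=4, out_channels=8,
                         kernel_size=5, padding=2, padding_mode="circular")

x = torch.randn(1, 4, 180)    # 4 feature channels sampled at 180 angles
print(angular_conv(x).shape)  # torch.Size([1, 8, 180])
```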
A Survey on Neural Machine Reading Comprehension
Title | A Survey on Neural Machine Reading Comprehension |
Authors | Boyu Qiu, Xu Chen, Jungang Xu, Yingfei Sun |
Abstract | Enabling a machine to read and comprehend natural language documents so that it can answer questions remains an elusive challenge. In recent years, the popularity of deep learning and the establishment of large-scale datasets have both promoted the prosperity of Machine Reading Comprehension. This paper presents how to utilize neural networks to build a reader, introduces some classic models, and analyzes the improvements they make. Further, we point out the defects of existing models and discuss future research directions. |
Tasks | Machine Reading Comprehension, Reading Comprehension |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.03824v1 |
https://arxiv.org/pdf/1906.03824v1.pdf | |
PWC | https://paperswithcode.com/paper/a-survey-on-neural-machine-reading |
Repo | |
Framework | |
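Not from the survey itself: a minimal sketch of the generic span-extraction reader pattern such surveys cover (encode passage and question, attend from passage to question, predict answer start and end positions). All dimensions and the attention form are illustrative.

```python
import torch
import torch.nn as nn

class MinimalReader(nn.Module):
    def __init__(self, vocab=10000, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.p_enc = nn.GRU(dim, dim, batch_first=True, bidirectional=True)
        self.q_enc = nn.GRU(dim, dim, batch_first=True, bidirectional=True)
        self.start = nn.Linear(4 * dim, 1)
        self.end = nn.Linear(4 * dim, 1)

    def forward(self, passage, question):
        p, _ = self.p_enc(self.emb(passage))    # (B, Lp, 2*dim)
        q, _ = self.q_enc(self.emb(question))   # (B, Lq, 2*dim)
        att = torch.softmax(p @ q.transpose(1, 2), dim=-1)  # passage-to-question
        h = torch.cat([p, att @ q], dim=-1)     # question-aware passage states
        return self.start(h).squeeze(-1), self.end(h).squeeze(-1)

reader = MinimalReader()
p = torch.randint(0, 10000, (2, 50))
q = torch.randint(0, 10000, (2, 10))
s, e = reader(p, q)
print(s.shape, e.shape)  # per-token start/end scores: torch.Size([2, 50]) each
```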
OrderNet: Ordering by Example
Title | OrderNet: Ordering by Example |
Authors | Robert Porter |
Abstract | In this paper we introduce a new neural architecture for sorting unordered sequences where the correct sequence order is not easily defined but must rather be inferred from training data. We refer to this architecture as OrderNet and describe how it was constructed to be naturally permutation equivariant while still allowing for rich interactions among elements of the input set. We evaluate the capabilities of our architecture by training it to approximate solutions for the Traveling Salesman Problem and find that it outperforms previously studied supervised techniques in its ability to generalize to sequences longer than those it was trained on. We further demonstrate this capability by reconstructing the original word order of scrambled sentences. |
Tasks | |
Published | 2019-05-27 |
URL | https://arxiv.org/abs/1905.11536v1 |
https://arxiv.org/pdf/1905.11536v1.pdf | |
PWC | https://paperswithcode.com/paper/ordernet-ordering-by-example |
Repo | |
Framework | |
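A hedged sketch of the key property the abstract names: a permutation-equivariant set layer (here in the DeepSets style; the actual OrderNet layer may differ) updates each element from its own features plus a pooled summary of the whole set, so permuting the inputs permutes the outputs identically.

```python
import torch
import torch.nn as nn

class EquivariantLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.self_lin = nn.Linear(dim, dim)
        self.pool_lin = nn.Linear(dim, dim)

    def forward(self, x):                     # x: (batch, set_size, dim)
        pooled = x.mean(dim=1, keepdim=True)  # permutation-invariant summary
        return torch.relu(self.self_lin(x) + self.pool_lin(pooled))

layer = EquivariantLayer(16)
x = torch.randn(2, 5, 16)
perm = torch.randperm(5)
# Permuting inputs permutes outputs the same way: equivariance.
print(torch.allclose(layer(x)[:, perm], layer(x[:, perm]), atol=1e-6))  # True
```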
Exact high-dimensional asymptotics for Support Vector Machine
Title | Exact high-dimensional asymptotics for Support Vector Machine |
Authors | Haoyang Liu |
Abstract | The Support Vector Machine (SVM) is one of the most widely used classification methods. In this paper, we consider the soft-margin SVM used on data points with independent features, where the sample size $n$ and the feature dimension $p$ grow to $\infty$ at a fixed ratio $p/n\rightarrow \delta$. We propose a set of equations that exactly characterizes the asymptotic behavior of the support vector machine. In particular, we give exact formulas for (1) the variability of the optimal coefficients, (2) the proportion of data points lying on the margin boundary (i.e., the number of support vectors), (3) the final objective function value, and (4) the expected misclassification error on new data points, which in particular implies the exact formula for the optimal tuning parameter given a data generating mechanism. We first establish these formulas in the case where the label $y\in\{+1,-1\}$ is independent of the feature $x$. Then the results are generalized to the case where the label $y\in\{+1,-1\}$ is allowed to have a general dependence on the feature $x$ through a linear combination $a_0^Tx$. These formulas for the non-smooth hinge loss are analogous to the recent results in \citep{sur2018modern} for the smooth logistic loss. Our approach is based on heuristic leave-one-out calculations. |
Tasks | |
Published | 2019-05-13 |
URL | https://arxiv.org/abs/1905.05125v2 |
https://arxiv.org/pdf/1905.05125v2.pdf | |
PWC | https://paperswithcode.com/paper/exact-high-dimensional-asymptotics-for |
Repo | |
Framework | |
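For reference, a standard form of the soft-margin SVM objective in the proportional regime the abstract studies (our notation; the paper's exact parametrization of the regularizer may differ):

```latex
\min_{w \in \mathbb{R}^p} \; \frac{1}{n} \sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i x_i^{T} w\bigr) \;+\; \frac{\lambda}{2}\,\|w\|_2^2,
\qquad \frac{p}{n} \rightarrow \delta \in (0,\infty).
```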
BAM! Born-Again Multi-Task Networks for Natural Language Understanding
Title | BAM! Born-Again Multi-Task Networks for Natural Language Understanding |
Authors | Kevin Clark, Minh-Thang Luong, Urvashi Khandelwal, Christopher D. Manning, Quoc V. Le |
Abstract | It can be challenging to train multi-task neural networks that outperform or even match their single-task counterparts. To help address this, we propose using knowledge distillation where single-task models teach a multi-task model. We enhance this training with teacher annealing, a novel method that gradually transitions the model from distillation to supervised learning, helping the multi-task model surpass its single-task teachers. We evaluate our approach by multi-task fine-tuning BERT on the GLUE benchmark. Our method consistently improves over standard single-task and multi-task training. |
Tasks | |
Published | 2019-07-10 |
URL | https://arxiv.org/abs/1907.04829v1 |
https://arxiv.org/pdf/1907.04829v1.pdf | |
PWC | https://paperswithcode.com/paper/bam-born-again-multi-task-networks-for |
Repo | |
Framework | |
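A sketch of one plausible formulation of teacher annealing consistent with the abstract: the training signal interpolates linearly from pure distillation (teacher targets) to pure supervised learning (gold labels). Variable names are placeholders; interpolating the two losses is equivalent, up to a constant, to interpolating the target distributions.

```python
import torch.nn.functional as F

def teacher_annealing_loss(student_logits, teacher_logits, labels,
                           step, total_steps):
    lam = step / total_steps                      # anneals from 0 to 1
    ce = F.cross_entropy(student_logits, labels)  # supervised term (gold labels)
    kd = F.kl_div(F.log_softmax(student_logits, dim=-1),
                  F.softmax(teacher_logits, dim=-1),
                  reduction="batchmean")          # distillation term (teacher)
    return lam * ce + (1.0 - lam) * kd
```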
Modeling Heterogeneity in Mode-Switching Behavior Under a Mobility-on-Demand Transit System: An Interpretable Machine Learning Approach
Title | Modeling Heterogeneity in Mode-Switching Behavior Under a Mobility-on-Demand Transit System: An Interpretable Machine Learning Approach |
Authors | Xilei Zhao, Xiang Yan, Pascal Van Hentenryck |
Abstract | Recent years have witnessed an increased focus on interpretability and the use of machine learning to inform policy analysis and decision making. This paper applies machine learning to examine travel behavior, focusing in particular on modeling changes in travel modes when individuals are presented with a novel (on-demand) mobility option. It addresses the following question: Can machine learning be applied to model individual taste heterogeneity (preference heterogeneity for travel modes and response heterogeneity to travel attributes) in travel mode choice? This paper first develops a high-accuracy classifier to predict mode-switching behavior under a hypothetical Mobility-on-Demand Transit system (i.e., stated-preference data), which represents the case study underlying this research. We show that this classifier naturally captures individual heterogeneity available in the data. Moreover, the paper derives insights on heterogeneous switching behaviors through the generation of marginal effects and elasticities by current travel mode, partial dependence plots, and individual conditional expectation plots. The paper also proposes two new model-agnostic interpretation tools for machine learning, i.e., conditional partial dependence plots and conditional individual partial dependence plots, specifically designed to examine response heterogeneity. The results on the case study show that the machine-learning classifier, together with model-agnostic interpretation tools, provides valuable insights on travel mode switching behavior for different individuals and population segments. For example, existing drivers are more sensitive to additional pickups than people using other travel modes, and current transit users are generally willing to share rides but reluctant to take any additional transfers. |
Tasks | Decision Making, Interpretable Machine Learning |
Published | 2019-02-08 |
URL | http://arxiv.org/abs/1902.02904v1 |
http://arxiv.org/pdf/1902.02904v1.pdf | |
PWC | https://paperswithcode.com/paper/modeling-heterogeneity-in-mode-switching |
Repo | |
Framework | |
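A minimal sketch (our interface, not the authors' implementation) of a conditional partial dependence plot: compute the usual partial dependence curve separately within subgroups defined by a conditioning variable such as the current travel mode. A binary sklearn-style classifier with `predict_proba` is assumed.

```python
import numpy as np

def conditional_pdp(model, X, feature_idx, grid, groups):
    """One partial dependence curve per subgroup in `groups`."""
    curves = {}
    for g in np.unique(groups):
        Xg = X[groups == g].copy()
        curve = []
        for v in grid:
            Xg[:, feature_idx] = v  # force the feature to the grid value
            curve.append(model.predict_proba(Xg)[:, 1].mean())
        curves[g] = np.array(curve)
    return curves
```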
Model Agnostic Defence against Backdoor Attacks in Machine Learning
Title | Model Agnostic Defence against Backdoor Attacks in Machine Learning |
Authors | Sakshi Udeshi, Shanshan Peng, Gerald Woo, Lionell Loh, Louth Rawshan, Sudipta Chattopadhyay |
Abstract | Machine Learning (ML) has automated a multitude of our day-to-day decision making domains such as education, employment and driving automation. The continued success of ML largely depends on our ability to trust the model we are using. Recently, a new class of attacks called Backdoor Attacks have been developed. These attacks undermine the user’s trust in ML models. In this work, we present NEO, a model-agnostic framework to detect and mitigate such backdoor attacks in image classification ML models. For a given image classification model, our approach analyses the inputs it receives and determines if the model is backdoored. In addition to this feature, we also mitigate these attacks by determining the correct predictions of the poisoned images. An appealing feature of NEO is that it can, for the first time, isolate and reconstruct the backdoor trigger. NEO is also, to the best of our knowledge, the first defence methodology that is completely blackbox. We have implemented NEO and evaluated it against three state-of-the-art poisoned models. These models include highly critical applications such as traffic sign detection (USTS) and facial detection. In our evaluation, we show that NEO can detect $\approx$88% of the poisoned inputs on average and it is as fast as 4.4 ms per input image. We also reconstruct the poisoned input for the user to effectively test their systems. |
Tasks | Decision Making, Image Classification |
Published | 2019-08-06 |
URL | https://arxiv.org/abs/1908.02203v2 |
https://arxiv.org/pdf/1908.02203v2.pdf | |
PWC | https://paperswithcode.com/paper/model-agnostic-defence-against-backdoor |
Repo | |
Framework | |
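A loose sketch of the blackbox detection idea as we read the abstract: occlude image regions with the image's dominant color and flag inputs whose prediction flips, which also localizes a candidate trigger region. The `model` interface, patch size, and stride are our assumptions, not NEO's actual parameters.

```python
import numpy as np

def scan_for_trigger(model, img, patch=8, stride=8):
    """img: (H, W, 3) uint8; model(batch) returns an array of class ids."""
    dominant = np.median(img.reshape(-1, 3), axis=0).astype(img.dtype)
    base = model(img[None])[0]
    flips = []
    H, W, _ = img.shape
    for i in range(0, H - patch + 1, stride):
        for j in range(0, W - patch + 1, stride):
            probe = img.copy()
            probe[i:i + patch, j:j + patch] = dominant  # blocker patch
            if model(probe[None])[0] != base:
                flips.append((i, j))  # prediction flipped: trigger-like region
    return flips
```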
Implicit Regularization of Normalization Methods
Title | Implicit Regularization of Normalization Methods |
Authors | Xiaoxia Wu, Edgar Dobriban, Tongzheng Ren, Shanshan Wu, Zhiyuan Li, Suriya Gunasekar, Rachel Ward, Qiang Liu |
Abstract | Normalization methods such as batch normalization are commonly used in overparametrized models like neural networks. Here, we study the weight normalization (WN) method (Salimans & Kingma, 2016) and a variant called reparametrized projected gradient descent (rPGD) for overparametrized least squares regression and some more general loss functions. WN and rPGD reparametrize the weights with a scale $g$ and a unit vector such that the objective function becomes \emph{non-convex}. We show that this non-convex formulation has beneficial regularization effects compared to gradient descent on the original objective. We show that these methods adaptively regularize the weights and \emph{converge with exponential rate} to the minimum $\ell_2$ norm solution (or close to it) even for initializations \emph{far from zero}. This is different from the behavior of gradient descent, which only converges to the min norm solution when started at zero, and is more sensitive to initialization. Some of our proof techniques are different from many related works; for instance we find explicit invariants along the gradient flow paths. We verify our results experimentally and suggest that there may be a similar phenomenon for nonlinear problems such as matrix sensing. |
Tasks | |
Published | 2019-11-18 |
URL | https://arxiv.org/abs/1911.07956v2 |
https://arxiv.org/pdf/1911.07956v2.pdf | |
PWC | https://paperswithcode.com/paper/implicit-regularization-of-normalization |
Repo | |
Framework | |
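A small numerical sketch of the reparametrization the abstract describes: $w = g\,v/\|v\|$ with plain gradient descent on $(g, v)$ for overparametrized least squares, started far from zero. Step size, iteration count, and initialization are our choices; the printed distance to the minimum-$\ell_2$-norm interpolator should be small if the paper's claim holds in this instance.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 50  # overparametrized: p > n
X, y = rng.standard_normal((n, p)), rng.standard_normal(n)

g, v = 1.0, rng.standard_normal(p)  # initialization far from zero
lr = 0.01
for _ in range(20000):
    u = v / np.linalg.norm(v)
    grad_w = X.T @ (X @ (g * u) - y) / n        # gradient w.r.t. effective w
    grad_g = u @ grad_w                         # chain rule through w = g * u
    grad_v = (g / np.linalg.norm(v)) * (grad_w - grad_g * u)
    g -= lr * grad_g
    v -= lr * grad_v

w_min_norm = X.T @ np.linalg.solve(X @ X.T, y)  # minimum l2-norm interpolator
print(np.linalg.norm(g * v / np.linalg.norm(v) - w_min_norm))
```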
Deep Model Reference Adaptive Control
Title | Deep Model Reference Adaptive Control |
Authors | Girish Joshi, Girish Chowdhary |
Abstract | We present a new neuroadaptive architecture: Deep Neural Network based Model Reference Adaptive Control (DMRAC). Our architecture utilizes the power of deep neural network representations for modeling significant nonlinearities while marrying it with the boundedness guarantees that characterize MRAC based controllers. We demonstrate through simulations and analysis that DMRAC can subsume previously studied learning based MRAC methods, such as concurrent learning and GP-MRAC. This makes DMRAC a highly powerful architecture for high-performance control of nonlinear systems with long-term learning properties. |
Tasks | |
Published | 2019-09-18 |
URL | https://arxiv.org/abs/1909.08602v1 |
https://arxiv.org/pdf/1909.08602v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-model-reference-adaptive-control |
Repo | |
Framework | |
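For context, one standard MRAC structure that architectures like DMRAC build on (our notation and sign conventions; per the abstract, DMRAC replaces hand-designed features $\phi(x)$ with features learned by a deep network):

```latex
u = u_{\mathrm{pd}} - u_{\mathrm{ad}}, \qquad
u_{\mathrm{ad}} = \hat{W}^{T} \phi(x), \qquad
\dot{\hat{W}} = -\Gamma\, \phi(x)\, e^{T} P B,
```

where $e = x - x_{\mathrm{rm}}$ is the tracking error against the reference model, $\Gamma$ is the adaptation gain, and $P$ solves the associated Lyapunov equation.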
Learning Bilingual Sentence Embeddings via Autoencoding and Computing Similarities with a Multilayer Perceptron
Title | Learning Bilingual Sentence Embeddings via Autoencoding and Computing Similarities with a Multilayer Perceptron |
Authors | Yunsu Kim, Hendrik Rosendahl, Nick Rossenbach, Jan Rosendahl, Shahram Khadivi, Hermann Ney |
Abstract | We propose a novel model architecture and training algorithm to learn bilingual sentence embeddings from a combination of parallel and monolingual data. Our method connects autoencoding and neural machine translation to force the source and target sentence embeddings to share the same space without the help of a pivot language or an additional transformation. We train a multilayer perceptron on top of the sentence embeddings to extract good bilingual sentence pairs from nonparallel or noisy parallel data. Our approach shows promising performance on sentence alignment recovery and the WMT 2018 parallel corpus filtering tasks with only a single model. |
Tasks | Machine Translation, Sentence Embeddings |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1906.01942v1 |
https://arxiv.org/pdf/1906.01942v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-bilingual-sentence-embeddings-via |
Repo | |
Framework | |
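A hedged sketch of the second stage described in the abstract: a multilayer perceptron on top of (fixed) source and target sentence embeddings that scores whether a pair is a good bilingual sentence pair. The embedding dimension and the input features (concatenation plus elementwise product) are our assumptions.

```python
import torch
import torch.nn as nn

class PairScorer(nn.Module):
    def __init__(self, dim=512, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 * dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, src_emb, tgt_emb):
        feats = torch.cat([src_emb, tgt_emb, src_emb * tgt_emb], dim=-1)
        return torch.sigmoid(self.mlp(feats)).squeeze(-1)  # P(pair is parallel)

scorer = PairScorer()
src, tgt = torch.randn(8, 512), torch.randn(8, 512)
print(scorer(src, tgt).shape)  # one score per sentence pair: torch.Size([8])
```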
Pairwise Feedback for Data Programming
Title | Pairwise Feedback for Data Programming |
Authors | Benedikt Boecking, Artur Dubrawski |
Abstract | The scalability of the labeling process and the attainable quality of labels have become limiting factors for many applications of machine learning. The programmatic creation of labeled datasets via the synthesis of noisy heuristics provides a promising avenue to address this problem. We propose to improve modeling of latent class variables in the programmatic creation of labeled datasets by incorporating pairwise feedback into the process. We discuss the ease with which such pairwise feedback can be obtained or generated in many application domains. Our experiments show that even a small number of sources of pairwise feedback can substantially improve the quality of the posterior estimate of the latent class variable. |
Tasks | |
Published | 2019-12-16 |
URL | https://arxiv.org/abs/1912.07685v1 |
https://arxiv.org/pdf/1912.07685v1.pdf | |
PWC | https://paperswithcode.com/paper/pairwise-feedback-for-data-programming |
Repo | |
Framework | |
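A toy stand-in (simple posterior smoothing, not the authors' generative label model): start from noisy per-item soft labels and let pairwise "these two share a class" feedback pull the estimates of linked items toward each other. All shapes and the smoothing rule are our own.

```python
import numpy as np

def refine_with_pairs(votes, pairs, iters=10, strength=0.5):
    """votes: (n_items, n_classes) soft labels; pairs: same-class index pairs."""
    post = votes / votes.sum(axis=1, keepdims=True)
    for _ in range(iters):
        upd = post.copy()
        for i, j in pairs:
            avg = (post[i] + post[j]) / 2  # shared-class evidence
            upd[i] = (1 - strength) * post[i] + strength * avg
            upd[j] = (1 - strength) * post[j] + strength * avg
        post = upd / upd.sum(axis=1, keepdims=True)
    return post

votes = np.array([[3.0, 1.0], [1.0, 1.2], [0.5, 3.0]])
print(refine_with_pairs(votes, [(0, 1)]).round(2))  # item 1 drifts toward item 0
```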
A Free Lunch in Generating Datasets: Building a VQG and VQA System with Attention and Humans in the Loop
Title | A Free Lunch in Generating Datasets: Building a VQG and VQA System with Attention and Humans in the Loop |
Authors | Jihyeon Janel Lee, Sho Arora |
Abstract | Despite their importance in training artificial intelligence systems, large datasets remain challenging to acquire. For example, the ImageNet dataset required fourteen million labels of basic human knowledge, such as whether an image contains a chair. Unfortunately, this knowledge is so simple that labeling it is tedious for human annotators, yet tacit enough that human annotators remain necessary. However, human collaborative efforts for tasks like labeling massive amounts of data are costly, inconsistent, and prone to failure, and this method does not resolve the issue of the resulting dataset being static in nature. What if we asked people questions they want to answer and collected their responses as data? This would mean we could gather data at a much lower cost, and expanding a dataset would simply become a matter of asking more questions. We focus on the task of Visual Question Answering (VQA) and propose a system that uses Visual Question Generation (VQG) to produce questions, asks them to social media users, and collects their responses. We present two models that can then parse clean answers from the noisy human responses significantly better than our baselines, with the goal of eventually incorporating the answers into a Visual Question Answering (VQA) dataset. By demonstrating how our system can collect large amounts of data at little to no cost, we envision similar systems being used to improve performance on other tasks in the future. |
Tasks | Question Answering, Question Generation, Visual Question Answering |
Published | 2019-11-30 |
URL | https://arxiv.org/abs/1912.00124v1 |
https://arxiv.org/pdf/1912.00124v1.pdf | |
PWC | https://paperswithcode.com/paper/a-free-lunch-in-generating-datasets-building |
Repo | |
Framework | |
Feature Weighting and Boosting for Few-Shot Segmentation
Title | Feature Weighting and Boosting for Few-Shot Segmentation |
Authors | Khoi Nguyen, Sinisa Todorovic |
Abstract | This paper is about few-shot segmentation of foreground objects in images. We train a CNN on small subsets of training images, each mimicking the few-shot setting. In each subset, one image serves as the query and the other(s) as support image(s) with ground-truth segmentation. The CNN first extracts feature maps from the query and support images. Then, a class feature vector is computed as an average of the support’s feature maps over the known foreground. Finally, the target object is segmented in the query image by using a cosine similarity between the class feature vector and the query’s feature map. We make two contributions by: (1) Improving discriminativeness of features so their activations are high on the foreground and low elsewhere; and (2) Boosting inference with an ensemble of experts guided with the gradient of loss incurred when segmenting the support images in testing. Our evaluations on the PASCAL-$5^i$ and COCO-$20^i$ datasets demonstrate that we significantly outperform existing approaches. |
Tasks | |
Published | 2019-09-28 |
URL | https://arxiv.org/abs/1909.13140v1 |
https://arxiv.org/pdf/1909.13140v1.pdf | |
PWC | https://paperswithcode.com/paper/feature-weighting-and-boosting-for-few-shot |
Repo | |
Framework | |
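A minimal sketch of the baseline inference step the abstract describes (without the paper's feature weighting or the ensemble of boosted experts): masked average pooling of the support features over the known foreground yields a class vector, and the query is segmented by thresholded cosine similarity. Shapes and the threshold are our assumptions.

```python
import torch
import torch.nn.functional as F

def segment_query(support_feat, support_mask, query_feat, thresh=0.5):
    """support_feat, query_feat: (C, H, W); support_mask: (H, W) in {0, 1}."""
    fg = support_mask.unsqueeze(0)                              # (1, H, W)
    class_vec = (support_feat * fg).sum(dim=(1, 2)) / fg.sum()  # (C,)
    sim = F.cosine_similarity(query_feat,
                              class_vec.view(-1, 1, 1), dim=0)  # (H, W)
    return (sim > thresh).float()

C, H, W = 64, 32, 32
pred = segment_query(torch.randn(C, H, W),
                     (torch.rand(H, W) > 0.5).float(),
                     torch.randn(C, H, W))
print(pred.shape)  # binary query mask: torch.Size([32, 32])
```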