January 29, 2020


Paper Group ANR 647


Privately Learning Thresholds: Closing the Exponential Gap


Title	Privately Learning Thresholds: Closing the Exponential Gap
Authors	Haim Kaplan, Katrina Ligett, Yishay Mansour, Moni Naor, Uri Stemmer
Abstract	We study the sample complexity of learning threshold functions under the constraint of differential privacy. It is assumed that each labeled example in the training data is the information of one individual and we would like to come up with a generalizing hypothesis $h$ while guaranteeing differential privacy for the individuals. Intuitively, this means that any single labeled example in the training data should not have a significant effect on the choice of the hypothesis. This problem has received much attention recently; unlike the non-private case, where the sample complexity is independent of the domain size and just depends on the desired accuracy and confidence, for private learning the sample complexity must depend on the domain size $|X|$ (even for approximate differential privacy). Alon et al. (STOC 2019) showed a lower bound of $\Omega(\log^*|X|)$ on the sample complexity and Bun et al. (FOCS 2015) presented an approximate-private learner with sample complexity $\tilde{O}\left(2^{\log^*|X|}\right)$. In this work we reduce this gap significantly, almost settling the sample complexity. We first present a new upper bound (algorithm) of $\tilde{O}\left(\left(\log^*|X|\right)^2\right)$ on the sample complexity and then present an improved version with sample complexity $\tilde{O}\left(\left(\log^*|X|\right)^{1.5}\right)$. Our algorithm is constructed for the related interior point problem, where the goal is to find a point between the largest and smallest input elements. It is based on selecting an input-dependent hash function and using it to embed the database into a domain whose size is reduced logarithmically; this results in a new database, an interior point of which can be used to generate an interior point in the original database in a differentially private manner.
Tasks
Published	2019-11-22
URL	https://arxiv.org/abs/1911.10137v1
PDF	https://arxiv.org/pdf/1911.10137v1.pdf
PWC	https://paperswithcode.com/paper/privately-learning-thresholds-closing-the
Repo
Framework
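
To get a feel for the gap described in this abstract, the sketch below evaluates the iterated logarithm $\log^*|X|$ (the number of times $\log_2$ must be applied before the value drops to 1 or below) and plugs it into the three bounds. It is only an arithmetic illustration: constants and the polylogarithmic factors hidden by the tilde notation are ignored, and the names `log_star` and `bounds` are hypothetical, not taken from the paper.

```python
import math


def log_star(n):
    """Iterated logarithm (base 2): number of log2 applications
    needed before the value drops to 1 or below."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count


def bounds(domain_size):
    """Evaluate the bounds from the abstract with all constants and
    hidden polylog factors dropped (an assumption for illustration)."""
    k = log_star(domain_size)
    return {
        "log_star": k,
        "lower_bound": k,        # Omega(log* |X|), Alon et al.
        "bun_et_al": 2 ** k,     # 2^(log* |X|), Bun et al.
        "this_work": k ** 1.5,   # (log* |X|)^1.5, improved upper bound
    }


if __name__ == "__main__":
    # Pass integers so that very large domains do not overflow a float.
    for exponent in (16, 64, 1024):
        print(exponent, bounds(2 ** exponent))
```

Because $\log^*$ grows extremely slowly, all three quantities stay small for any realistic domain; the improvement is in how the upper bound scales relative to the lower bound.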

Wasserstein Smoothing: Certified Robustness against Wasserstein Adversarial Attacks


Title	Wasserstein Smoothing: Certified Robustness against Wasserstein Adversarial Attacks
Authors	Alexander Levine, Soheil Feizi
Abstract	In the last couple of years, several adversarial attack methods based on different threat models have been proposed for the image classification problem. Most existing defenses consider additive threat models in which sample perturbations have bounded L_p norms. These defenses, however, can be vulnerable against adversarial attacks under non-additive threat models. An example of an attack method based on a non-additive threat model is the Wasserstein adversarial attack proposed by Wong et al. (2019), where the distance between an image and its adversarial example is determined by the Wasserstein metric (“earth-mover distance”) between their normalized pixel intensities. Until now, there has been no certifiable defense against this type of attack. In this work, we propose the first defense with certified robustness against Wasserstein adversarial attacks using randomized smoothing. We develop this certificate by considering the space of possible flows between images, and representing this space such that the Wasserstein distance between images is upper-bounded by the L_1 distance in this flow-space. We can then apply existing randomized smoothing certificates for the L_1 metric. On the MNIST and CIFAR-10 datasets, we find that our proposed defense is also practically effective, demonstrating significantly improved accuracy under Wasserstein adversarial attack compared to unprotected models.
Tasks	Adversarial Attack, Image Classification
Published	2019-10-23
URL	https://arxiv.org/abs/1910.10783v1
PDF	https://arxiv.org/pdf/1910.10783v1.pdf
PWC	https://paperswithcode.com/paper/wasserstein-smoothing-certified-robustness
Repo
Framework
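
The randomized-smoothing idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the paper applies smoothing in a flow space, whereas this toy version adds Laplace noise directly to the input and returns the majority vote of a base classifier. The names `smoothed_predict` and `toy_classifier`, and all parameter values, are hypothetical.

```python
import numpy as np


def smoothed_predict(base_classifier, x, scale, num_samples=1000, seed=0):
    """Majority vote of base_classifier over Laplace-perturbed copies of x.

    Simplifying assumption: noise is added in input space, not in the
    flow space used in the paper, and no certificate is computed.
    """
    rng = np.random.default_rng(seed)
    votes = {}
    for _ in range(num_samples):
        noisy = x + rng.laplace(loc=0.0, scale=scale, size=x.shape)
        label = base_classifier(noisy)
        votes[label] = votes.get(label, 0) + 1
    # Return the label that received the most votes.
    return max(votes, key=votes.get)


def toy_classifier(x):
    """Hypothetical stand-in for a trained model: sign of the pixel sum."""
    return int(x.sum() > 0)


if __name__ == "__main__":
    image = np.array([1.0, 1.0, 1.0])
    print(smoothed_predict(toy_classifier, image, scale=0.1))
```

The smoothed prediction depends on the noise scale: a larger `scale` typically trades accuracy on clean inputs for stability under perturbation.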

Model Asset eXchange: Path to Ubiquitous Deep Learning Deployment


Title	Model Asset eXchange: Path to Ubiquitous Deep Learning Deployment
Authors	Alex Bozarth, Brendan Dwyer, Fei Hu, Daniel Jalova, Karthik Muthuraman, Nick Pentreath, Simon Plovyt, Gabriela de Queiroz, Saishruthi Swaminathan, Patrick Titzler, Xin Wu, Hong Xu, Frederick R Reiss, Vijay Bommireddipalli
Abstract	A recent trend observed in traditionally challenging fields such as computer vision and natural language processing has been the significant performance gains shown by deep learning (DL). In many different research fields, DL models have been evolving rapidly and become ubiquitous. Despite researchers’ excitement, unfortunately, most software developers are not DL experts and oftentimes have a difficult time following the booming DL research outputs. As a result, it usually takes a significant amount of time for the latest superior DL models to prevail in industry. This issue is further exacerbated by the common use of sundry incompatible DL programming frameworks, such as TensorFlow, PyTorch, and Theano. To address this issue, we propose a system, called Model Asset Exchange (MAX), that gives developers easy access to state-of-the-art DL models. Regardless of the underlying DL programming frameworks, it provides an open source Python library (called the MAX framework) that wraps DL models and unifies programming interfaces with our standardized RESTful APIs. These RESTful APIs enable developers to exploit the wrapped DL models for inference tasks without the need to fully understand different DL programming frameworks. Using MAX, we have wrapped and open-sourced more than 30 state-of-the-art DL models from various research fields, including computer vision, natural language processing and signal processing. In the end, we selectively demonstrate two web applications that are built on top of MAX, as well as the process of adding a DL model to MAX.
Tasks
Published	2019-09-04
URL	https://arxiv.org/abs/1909.01606v1
PDF	https://arxiv.org/pdf/1909.01606v1.pdf
PWC	https://paperswithcode.com/paper/model-asset-exchange-path-to-ubiquitous-deep
Repo
Framework

Learning Fair Naive Bayes Classifiers by Discovering and Eliminating Discrimination Patterns


Title	Learning Fair Naive Bayes Classifiers by Discovering and Eliminating Discrimination Patterns
Authors	YooJung Choi, Golnoosh Farnadi, Behrouz Babaki, Guy Van den Broeck
Abstract	As machine learning is increasingly used to make real-world decisions, recent research efforts aim to define and ensure fairness in algorithmic decision making. Existing methods often assume a fixed set of observable features to define individuals, but lack a discussion of certain features not being observed at test time. In this paper, we study fairness of naive Bayes classifiers, which allow partial observations. In particular, we introduce the notion of a discrimination pattern, which refers to an individual receiving different classifications depending on whether some sensitive attributes were observed. Then a model is considered fair if it has no such pattern. We propose an algorithm to discover and mine for discrimination patterns in a naive Bayes classifier, and show how to learn maximum-likelihood parameters subject to these fairness constraints. Our approach iteratively discovers and eliminates discrimination patterns until a fair model is learned. An empirical evaluation on three real-world datasets demonstrates that we can remove exponentially many discrimination patterns by only adding a small fraction of them as constraints.
Tasks	Decision Making
Published	2019-06-10
URL	https://arxiv.org/abs/1906.03843v1
PDF	https://arxiv.org/pdf/1906.03843v1.pdf
PWC	https://paperswithcode.com/paper/learning-fair-naive-bayes-classifiers-by
Repo
Framework
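
As a rough illustration of the discrimination-pattern idea in this abstract, the sketch below compares a naive Bayes posterior computed with and without a sensitive attribute. It assumes binary features and a binary decision, and it flags a pattern when the two posteriors differ by more than a threshold; the function names, the threshold `delta`, and the toy probabilities are hypothetical and not taken from the paper.

```python
def posterior(prior, likelihoods, evidence):
    """P(D=1 | evidence) for a binary naive Bayes classifier.

    prior:       P(D=1)
    likelihoods: {feature: (P(f=1 | D=1), P(f=1 | D=0))}
    evidence:    {feature: 0 or 1}; unobserved features are simply
                 left out, which marginalizes them in naive Bayes.
    """
    pos, neg = prior, 1.0 - prior
    for name, value in evidence.items():
        p1, p0 = likelihoods[name]
        pos *= p1 if value else 1.0 - p1
        neg *= p0 if value else 1.0 - p0
    return pos / (pos + neg)


def discrimination(prior, likelihoods, sensitive, other):
    """Change in the posterior caused by also observing the sensitive
    attributes, given the same non-sensitive observations."""
    with_sensitive = posterior(prior, likelihoods, {**sensitive, **other})
    without_sensitive = posterior(prior, likelihoods, other)
    return with_sensitive - without_sensitive


if __name__ == "__main__":
    toy = {"s": (0.8, 0.2), "x": (0.5, 0.5)}  # hypothetical parameters
    degree = discrimination(0.5, toy, {"s": 1}, {"x": 1})
    delta = 0.1  # hypothetical tolerance
    print(degree, abs(degree) > delta)
```

Since the number of possible (sensitive, non-sensitive) assignments grows exponentially with the number of features, checking each one this way does not scale, which is why a dedicated search procedure is needed.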

Partially-supervised Mention Detection


Title	Partially-supervised Mention Detection
Authors	Lesly Miculicich, James Henderson
Abstract	Learning to detect entity mentions without using syntactic information can be useful for integration and joint optimization with other tasks. However, it is common to have partially annotated data for this problem. Here, we investigate two approaches to deal with partial annotation of mentions: weighted loss and soft-target classification. We also propose two neural mention detection approaches: sequence tagging and exhaustive search. We evaluate our methods with coreference resolution as a downstream task, using multitask learning. The results show that the recall and F1 score improve for all methods.
Tasks	Coreference Resolution
Published	2019-08-26
URL	https://arxiv.org/abs/1908.09507v1
PDF	https://arxiv.org/pdf/1908.09507v1.pdf
PWC	https://paperswithcode.com/paper/partially-supervised-mention-detection
Repo
Framework

Meta-Learning with Warped Gradient Descent


Title	Meta-Learning with Warped Gradient Descent
Authors	Sebastian Flennerhag, Andrei A. Rusu, Razvan Pascanu, Francesco Visin, Hujun Yin, Raia Hadsell
Abstract	Learning an efficient update rule from data that promotes rapid learning of new tasks from the same distribution remains an open problem in meta-learning. Typically, previous works have approached this issue either by attempting to train a neural network that directly produces updates or by attempting to learn better initialisations or scaling factors for a gradient-based update rule. Both of these approaches pose challenges. On one hand, directly producing an update forgoes a useful inductive bias and can easily lead to non-converging behaviour. On the other hand, approaches that try to control a gradient-based update rule typically resort to computing gradients through the learning process to obtain their meta-gradients, leading to methods that cannot scale beyond few-shot task adaptation. In this work, we propose Warped Gradient Descent (WarpGrad), a method that intersects these approaches to mitigate their limitations. WarpGrad meta-learns an efficiently parameterised preconditioning matrix that facilitates gradient descent across the task distribution. Preconditioning arises by interleaving non-linear layers, referred to as warp-layers, between the layers of a task-learner. Warp-layers are meta-learned without backpropagating through the task training process in a manner similar to methods that learn to directly produce updates. WarpGrad is computationally efficient, easy to implement, and can scale to arbitrarily large meta-learning problems. We provide a geometrical interpretation of the approach and evaluate its effectiveness in a variety of settings, including few-shot, standard supervised, continual and reinforcement learning.
Tasks	Few-Shot Learning, Meta-Learning
Published	2019-08-30
URL	https://arxiv.org/abs/1909.00025v2
PDF	https://arxiv.org/pdf/1909.00025v2.pdf
PWC	https://paperswithcode.com/paper/meta-learning-with-warped-gradient-descent
Repo
Framework

Knowledge Transfer between Datasets for Learning-based Tissue Microstructure Estimation


Title	Knowledge Transfer between Datasets for Learning-based Tissue Microstructure Estimation
Authors	Yu Qin, Yuxing Li, Zhiwen Liu, Chuyang Ye
Abstract	Learning-based approaches, especially those based on deep networks, have enabled high-quality estimation of tissue microstructure from low-quality diffusion magnetic resonance imaging (dMRI) scans, which are acquired with a limited number of diffusion gradients and a relatively poor spatial resolution. These learning-based approaches to tissue microstructure estimation require acquisitions of training dMRI scans with high-quality diffusion signals, which are densely sampled in the q-space and have a high spatial resolution. However, the acquisition of training scans may not be available for all datasets. Therefore, we explore knowledge transfer between different dMRI datasets so that learning-based tissue microstructure estimation can be applied for datasets where training scans are not acquired. Specifically, for a target dataset of interest, where only low-quality diffusion signals are acquired without training scans, we exploit the information in a source dMRI dataset acquired with high-quality diffusion signals. We interpolate the diffusion signals in the source dataset in the q-space using a dictionary-based signal representation, so that the interpolated signals match the acquisition scheme of the target dataset. Then, the interpolated signals are used together with the high-quality tissue microstructure computed from the source dataset to train deep networks that perform tissue microstructure estimation for the target dataset. Experiments were performed on brain dMRI scans with low-quality diffusion signals, where the benefit of the proposed strategy is demonstrated.
Tasks	Transfer Learning
Published	2019-10-24
URL	https://arxiv.org/abs/1910.10930v1
PDF	https://arxiv.org/pdf/1910.10930v1.pdf
PWC	https://paperswithcode.com/paper/knowledge-transfer-between-datasets-for
Repo
Framework

Visual-Inertial Localization for Skid-Steering Robots with Kinematic Constraints


Title	Visual-Inertial Localization for Skid-Steering Robots with Kinematic Constraints
Authors	Xingxing Zuo, Mingming Zhang, Yiming Chen, Yong Liu, Guoquan Huang, Mingyang Li
Abstract	While visual localization or SLAM has witnessed great progress in past decades, when deploying it on a mobile robot in practice, few works have explicitly considered the kinematic (or dynamic) constraints of the real robotic system when designing state estimators. To promote the practical deployment of current state-of-the-art visual-inertial localization algorithms, in this work we propose a low-cost kinematics-constrained localization system particularly for a skid-steering mobile robot. In particular, we derive in a principled way the robot’s kinematic constraints based on the instantaneous centers of rotation (ICR) model and integrate them in a tightly-coupled manner into the sliding-window bundle adjustment (BA)-based visual-inertial estimator. Because the ICR model parameters are time-varying due to, for example, track-to-terrain interaction and terrain roughness, we estimate these kinematic parameters online along with the navigation state. To this end, we perform an in-depth observability analysis and identify motion conditions under which the state/parameter estimation is viable. The proposed kinematics-constrained visual-inertial localization system has been validated extensively in different terrain scenarios.
Tasks	Visual Localization
Published	2019-11-13
URL	https://arxiv.org/abs/1911.05787v1
PDF	https://arxiv.org/pdf/1911.05787v1.pdf
PWC	https://paperswithcode.com/paper/visual-inertial-localization-for-skid
Repo
Framework

Self-supervised Moving Vehicle Tracking with Stereo Sound


Title	Self-supervised Moving Vehicle Tracking with Stereo Sound
Authors	Chuang Gan, Hang Zhao, Peihao Chen, David Cox, Antonio Torralba
Abstract	Humans are able to localize objects in the environment using both visual and auditory cues, integrating information from multiple modalities into a common reference frame. We introduce a system that can leverage unlabeled audio-visual data to learn to localize objects (moving vehicles) in a visual reference frame, purely using stereo sound at inference time. Since it is labor-intensive to manually annotate the correspondences between audio and object bounding boxes, we achieve this goal by using the co-occurrence of visual and audio streams in unlabeled videos as a form of self-supervision, without resorting to the collection of ground-truth annotations. In particular, we propose a framework that consists of a vision “teacher” network and a stereo-sound “student” network. During training, knowledge embodied in a well-established visual vehicle detection model is transferred to the audio domain using unlabeled videos as a bridge. At test time, the stereo-sound student network can work independently to perform object localization using just stereo audio and camera meta-data, without any visual input. Experimental results on a newly collected Auditory Vehicle Tracking dataset verify that our proposed approach outperforms several baseline approaches. We also demonstrate that our cross-modal auditory localization approach can assist in the visual localization of moving vehicles under poor lighting conditions.
Tasks	Object Localization, Visual Localization
Published	2019-10-25
URL	https://arxiv.org/abs/1910.11760v1
PDF	https://arxiv.org/pdf/1910.11760v1.pdf
PWC	https://paperswithcode.com/paper/self-supervised-moving-vehicle-tracking-with-1
Repo
Framework

Adaptivity and Optimality: A Universal Algorithm for Online Convex Optimization


Title	Adaptivity and Optimality: A Universal Algorithm for Online Convex Optimization
Authors	Guanghui Wang, Shiyin Lu, Lijun Zhang
Abstract	In this paper, we study adaptive online convex optimization, and aim to design a universal algorithm that achieves optimal regret bounds for multiple common types of loss functions. Existing universal methods are limited in the sense that they are optimal for only a subclass of loss functions. To address this limitation, we propose a novel online method, namely Maler, which enjoys the optimal $O(\sqrt{T})$, $O(d\log T)$ and $O(\log T)$ regret bounds for general convex, exponentially concave, and strongly convex functions respectively. The essential idea is to run multiple types of learning algorithms with different learning rates in parallel, and utilize a meta algorithm to track the best one on the fly. Empirical results demonstrate the effectiveness of our method.
Tasks
Published	2019-05-15
URL	https://arxiv.org/abs/1905.05917v1
PDF	https://arxiv.org/pdf/1905.05917v1.pdf
PWC	https://paperswithcode.com/paper/adaptivity-and-optimality-a-universal
Repo
Framework
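
The "run several learners in parallel and track the best one" idea from this abstract can be sketched as follows. This is not Maler itself: it is a simplified stand-in that runs plain online gradient descent with several learning rates and combines them with exponential weights on their observed losses. The function `run_experts` and all parameter values are hypothetical.

```python
import numpy as np


def run_experts(grad_fn, loss_fn, dim, learning_rates, num_rounds, meta_lr=1.0):
    """Online gradient descent experts combined by exponential weights.

    grad_fn(x, t) and loss_fn(x, t) give the gradient and loss of the
    round-t function at x. Returns the last played point, the
    cumulative loss of the combined player, and the last weights.
    """
    experts = np.zeros((len(learning_rates), dim))
    log_weights = np.zeros(len(learning_rates))
    weights = np.full(len(learning_rates), 1.0 / len(learning_rates))
    x = np.zeros(dim)
    total_loss = 0.0
    for t in range(num_rounds):
        # Meta step: normalize weights (shifted for numerical stability).
        weights = np.exp(log_weights - log_weights.max())
        weights /= weights.sum()
        x = weights @ experts  # play the weighted average of the experts
        total_loss += loss_fn(x, t)
        for i, eta in enumerate(learning_rates):
            # Penalize each expert by its own loss, then let it take
            # one gradient step with its own learning rate.
            log_weights[i] -= meta_lr * loss_fn(experts[i], t)
            experts[i] = experts[i] - eta * grad_fn(experts[i], t)
    return x, total_loss, weights


if __name__ == "__main__":
    target = np.array([1.0, -2.0])
    loss = lambda x, t: float(np.sum((x - target) ** 2))
    grad = lambda x, t: 2.0 * (x - target)
    point, cumulative, w = run_experts(grad, loss, 2, [0.001, 0.01, 0.1], 200)
    print(point, cumulative, w)
```

On this toy quadratic, the weights concentrate on the expert whose learning rate suits the loss best, so the combined player does not need to know the right learning rate in advance.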

Emotional Embeddings: Refining Word Embeddings to Capture Emotional Content of Words


Title	Emotional Embeddings: Refining Word Embeddings to Capture Emotional Content of Words
Authors	Armin Seyeditabari, Narges Tabari, Shafie Gholizade, Wlodek Zadrozny
Abstract	Word embeddings are one of the most useful tools in any modern natural language processing expert’s toolkit. They contain various types of information about each word which makes them the best way to represent the terms in any NLP task. But there are some types of information that cannot be learned by these models. The emotional information of words is one of these. In this paper, we present an approach to incorporate emotional information of words into these models. We accomplish this by adding a secondary training stage which uses an emotional lexicon and a psychological model of basic emotions. We show that fitting an emotional model into pre-trained word vectors can increase the performance of these models in emotional similarity metrics. Retrained models perform better than their original counterparts, with improvements ranging from 13% for the Word2Vec model to 29% for GloVe vectors. This is the first such model presented in the literature, and although preliminary, these emotion-sensitive models can open the way to increased performance in a variety of emotion detection techniques.
Tasks	Word Embeddings
Published	2019-05-31
URL	https://arxiv.org/abs/1906.00112v2
PDF	https://arxiv.org/pdf/1906.00112v2.pdf
PWC	https://paperswithcode.com/paper/190600112
Repo
Framework

Annotating and normalizing biomedical NEs with limited knowledge


Title	Annotating and normalizing biomedical NEs with limited knowledge
Authors	Fernando Sánchez León, Ana González Ledesma
Abstract	Named entity recognition (NER) is the very first step in the linguistic processing of any new domain. It is currently a common process in BioNLP on English clinical text. However, it is still in its infancy in other major languages, as is the case for Spanish. Presented under the umbrella of the PharmaCoNER shared task, this paper describes a very simple method for the annotation and normalization of pharmacological, chemical and, ultimately, biomedical named entities in clinical cases. The system developed for the shared task is based on limited knowledge, collected, structured and munged in a way that clearly outperforms scores obtained by similar dictionary-based systems for English in the past. Along with this recovering of the knowledge-based methods for NER in subdomains, the paper also highlights the key contribution of resource-based systems in the validation and consolidation of both the annotation guidelines and the human annotation practices. In this sense, some of the authors’ findings on the overall quality of human annotated datasets question the above-mentioned ‘official’ results obtained by this system, which ranked second (0.91 F1-score) and first (0.916 F1-score), respectively, in the two PharmaCoNER subtasks.
Tasks	Named Entity Recognition
Published	2019-12-19
URL	https://arxiv.org/abs/1912.09152v1
PDF	https://arxiv.org/pdf/1912.09152v1.pdf
PWC	https://paperswithcode.com/paper/annotating-and-normalizing-biomedical-nes
Repo
Framework

Learning Canonical Representations for Scene Graph to Image Generation


Title	Learning Canonical Representations for Scene Graph to Image Generation
Authors	Roei Herzig, Amir Bar, Huijuan Xu, Gal Chechik, Trevor Darrell, Amir Globerson
Abstract	Generating realistic images of complex visual scenes becomes very challenging when one wishes to control the structure of the generated images. Previous approaches showed that scenes with few entities can be controlled using scene graphs, but this approach struggles as the complexity of the graph (the number of objects and edges) increases. In this work, we show that one limitation of current methods is their inability to capture semantic equivalence in graphs. We present a novel model to address this, by employing a canonical graph representation, which ensures that semantically similar graphs will result in similar images. We show improved performance of the model on three different benchmarks: Visual Genome, COCO and CLEVR.
Tasks	Image Generation
Published	2019-12-16
URL	https://arxiv.org/abs/1912.07414v2
PDF	https://arxiv.org/pdf/1912.07414v2.pdf
PWC	https://paperswithcode.com/paper/learning-canonical-representations-for-scene
Repo
Framework

Improved Zeroth-Order Variance Reduced Algorithms and Analysis for Nonconvex Optimization


Title	Improved Zeroth-Order Variance Reduced Algorithms and Analysis for Nonconvex Optimization
Authors	Kaiyi Ji, Zhe Wang, Yi Zhou, Yingbin Liang
Abstract	Two types of zeroth-order stochastic algorithms have recently been designed for nonconvex optimization respectively based on the first-order techniques SVRG and SARAH/SPIDER. This paper addresses several important issues that are still open in these methods. First, all existing SVRG-type zeroth-order algorithms suffer from worse function query complexities than either zeroth-order gradient descent (ZO-GD) or stochastic gradient descent (ZO-SGD). In this paper, we propose a new algorithm ZO-SVRG-Coord-Rand and develop a new analysis for an existing ZO-SVRG-Coord algorithm proposed in Liu et al. 2018b, and show that both ZO-SVRG-Coord-Rand and ZO-SVRG-Coord (under our new analysis) outperform other existing SVRG-type zeroth-order methods as well as ZO-GD and ZO-SGD. Second, the existing SPIDER-type algorithm SPIDER-SZO (Fang et al. 2018) has superior theoretical performance, but suffers from the generation of a large number of Gaussian random variables as well as a $\sqrt{\epsilon}$-level stepsize in practice. In this paper, we develop a new algorithm ZO-SPIDER-Coord, which is free from Gaussian variable generation and allows a large constant stepsize while maintaining the same convergence rate and query complexity, and we further show that ZO-SPIDER-Coord automatically achieves a linear convergence rate as the iterate enters into a local PL region without restart and algorithmic modification.
Tasks
Published	2019-10-27
URL	https://arxiv.org/abs/1910.12166v1
PDF	https://arxiv.org/pdf/1910.12166v1.pdf
PWC	https://paperswithcode.com/paper/improved-zeroth-order-variance-reduced
Repo
Framework
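
Zeroth-order methods like those in this abstract estimate gradients from function values alone. The sketch below shows a generic coordinate-wise central-difference estimator, the kind of building block that avoids Gaussian random directions. It is an illustration under assumptions, not the paper's ZO-SVRG-Coord or ZO-SPIDER-Coord algorithms; the function name and the smoothing parameter `delta` are hypothetical.

```python
import numpy as np


def coordinate_gradient_estimate(f, x, delta=1e-5):
    """Estimate the gradient of f at x using only function queries.

    Uses a central difference along each coordinate, so it costs
    2 * d queries for a d-dimensional (1-D array) input.
    """
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = delta
        grad[i] = (f(x + step) - f(x - step)) / (2.0 * delta)
    return grad


if __name__ == "__main__":
    # Toy objective with known gradient 2 * x + [3, 0, 0].
    objective = lambda x: float(np.sum(x ** 2) + 3.0 * x[0])
    print(coordinate_gradient_estimate(objective, [1.0, 2.0, 3.0]))
```

The estimator is deterministic given `x`, which is one reason coordinate-wise schemes can tolerate larger step sizes than estimators built from random directions; the price is a query cost that grows linearly with the dimension.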

Cross-Lingual Ability of Multilingual BERT: An Empirical Study


Title	Cross-Lingual Ability of Multilingual BERT: An Empirical Study
Authors	Karthikeyan K, Zihan Wang, Stephen Mayhew, Dan Roth
Abstract	Recent work has exhibited the surprising cross-lingual abilities of multilingual BERT (M-BERT) – surprising since it is trained without any cross-lingual objective and with no aligned data. In this work, we provide a comprehensive study of the contribution of different components in M-BERT to its cross-lingual ability. We study the impact of linguistic properties of the languages, the architecture of the model, and the learning objectives. The experimental study is done in the context of three typologically different languages – Spanish, Hindi, and Russian – and using two conceptually different NLP tasks, textual entailment and named entity recognition. Among our key conclusions is the fact that the lexical overlap between languages plays a negligible role in the cross-lingual success, while the depth of the network is an integral part of it. All our models and implementations can be found on our project page: http://cogcomp.org/page/publication_view/900 .
Tasks	Named Entity Recognition, Natural Language Inference
Published	2019-12-17
URL	https://arxiv.org/abs/1912.07840v2
PDF	https://arxiv.org/pdf/1912.07840v2.pdf
PWC	https://paperswithcode.com/paper/cross-lingual-ability-of-multilingual-bert-an-1
Repo
Framework