Paper Group ANR 647
Privately Learning Thresholds: Closing the Exponential Gap. Wasserstein Smoothing: Certified Robustness against Wasserstein Adversarial Attacks. Model Asset eXchange: Path to Ubiquitous Deep Learning Deployment. Learning Fair Naive Bayes Classifiers by Discovering and Eliminating Discrimination Patterns. Partially-supervised Mention Detection. Meta …
Privately Learning Thresholds: Closing the Exponential Gap
Title | Privately Learning Thresholds: Closing the Exponential Gap |
Authors | Haim Kaplan, Katrina Ligett, Yishay Mansour, Moni Naor, Uri Stemmer |
Abstract | We study the sample complexity of learning threshold functions under the constraint of differential privacy. It is assumed that each labeled example in the training data is the information of one individual and we would like to come up with a generalizing hypothesis $h$ while guaranteeing differential privacy for the individuals. Intuitively, this means that any single labeled example in the training data should not have a significant effect on the choice of the hypothesis. This problem has received much attention recently; unlike the non-private case, where the sample complexity is independent of the domain size and just depends on the desired accuracy and confidence, for private learning the sample complexity must depend on the domain size $|X|$ (even for approximate differential privacy). Alon et al. (STOC 2019) showed a lower bound of $\Omega(\log^*|X|)$ on the sample complexity and Bun et al. (FOCS 2015) presented an approximate-private learner with sample complexity $\tilde{O}\left(2^{\log^*|X|}\right)$. In this work we reduce this gap significantly, almost settling the sample complexity. We first present a new upper bound (algorithm) of $\tilde{O}\left(\left(\log^*|X|\right)^2\right)$ on the sample complexity and then present an improved version with sample complexity $\tilde{O}\left(\left(\log^*|X|\right)^{1.5}\right)$. Our algorithm is constructed for the related interior point problem, where the goal is to find a point between the largest and smallest input elements. It is based on selecting an input-dependent hash function and using it to embed the database into a domain whose size is reduced logarithmically; this results in a new database, an interior point of which can be used to generate an interior point in the original database in a differentially private manner. |
Tasks | |
Published | 2019-11-22 |
URL | https://arxiv.org/abs/1911.10137v1 |
https://arxiv.org/pdf/1911.10137v1.pdf | |
PWC | https://paperswithcode.com/paper/privately-learning-thresholds-closing-the |
Repo | |
Framework | |
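As a reading aid for the abstract above, here is a minimal sketch of two notions it relies on: the interior point condition and the iterated logarithm (log*) of the domain size, in which the sample complexity bounds for this problem are stated. The function names are illustrative and do not come from the paper. Nothing here is differentially private; the sketch only shows what an interior point is and how slowly log* grows.

```python
import math

def is_interior_point(database, candidate):
    """True if candidate lies between the smallest and largest input elements."""
    return min(database) <= candidate <= max(database)

def log_star(n):
    """Iterated logarithm: how many times log2 is applied until the value is <= 1."""
    count = 0
    value = float(n)
    while value > 1.0:
        value = math.log2(value)
        count += 1
    return count
```

For a domain of size 65536, `log_star` returns 4, which is why a bound polynomial in log* is nearly independent of the domain size in practice.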
Wasserstein Smoothing: Certified Robustness against Wasserstein Adversarial Attacks
Title | Wasserstein Smoothing: Certified Robustness against Wasserstein Adversarial Attacks |
Authors | Alexander Levine, Soheil Feizi |
Abstract | In the last couple of years, several adversarial attack methods based on different threat models have been proposed for the image classification problem. Most existing defenses consider additive threat models in which sample perturbations have bounded L_p norms. These defenses, however, can be vulnerable against adversarial attacks under non-additive threat models. An example of an attack method based on a non-additive threat model is the Wasserstein adversarial attack proposed by Wong et al. (2019), where the distance between an image and its adversarial example is determined by the Wasserstein metric (“earth-mover distance”) between their normalized pixel intensities. Until now, there has been no certifiable defense against this type of attack. In this work, we propose the first defense with certified robustness against Wasserstein Adversarial attacks using randomized smoothing. We develop this certificate by considering the space of possible flows between images, and representing this space such that Wasserstein distance between images is upper-bounded by L_1 distance in this flow-space. We can then apply existing randomized smoothing certificates for the L_1 metric. In MNIST and CIFAR-10 datasets, we find that our proposed defense is also practically effective, demonstrating significantly improved accuracy under Wasserstein adversarial attack compared to unprotected models. |
Tasks | Adversarial Attack, Image Classification |
Published | 2019-10-23 |
URL | https://arxiv.org/abs/1910.10783v1 |
https://arxiv.org/pdf/1910.10783v1.pdf | |
PWC | https://paperswithcode.com/paper/wasserstein-smoothing-certified-robustness |
Repo | |
Framework | |
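The link between flows and Wasserstein distance mentioned in the abstract above can be illustrated with a one-dimensional toy, sketched here under simplifying assumptions: histograms are normalized, bins are one unit apart, and mass moves only between adjacent bins. The paper works with two-dimensional images and its flow representation may differ; the function names below are hypothetical. In this 1-D case the L_1 norm of the flow equals the earth-mover distance.

```python
from itertools import accumulate

def flow_1d(p, q):
    """Net mass that must cross each boundary between adjacent bins to turn p into q."""
    cdf_p = list(accumulate(p))
    cdf_q = list(accumulate(q))
    # The final CDF entries are both 1 for normalized inputs, so they are dropped.
    return [a - b for a, b in zip(cdf_p[:-1], cdf_q[:-1])]

def wasserstein_1d(p, q):
    """Earth-mover distance between two normalized 1-D histograms (unit bin spacing)."""
    return sum(abs(f) for f in flow_1d(p, q))
```

Moving all mass from the first to the third of three bins, for example, gives a flow of one unit across each of the two boundaries and a distance of 2.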
Model Asset eXchange: Path to Ubiquitous Deep Learning Deployment
Title | Model Asset eXchange: Path to Ubiquitous Deep Learning Deployment |
Authors | Alex Bozarth, Brendan Dwyer, Fei Hu, Daniel Jalova, Karthik Muthuraman, Nick Pentreath, Simon Plovyt, Gabriela de Queiroz, Saishruthi Swaminathan, Patrick Titzler, Xin Wu, Hong Xu, Frederick R Reiss, Vijay Bommireddipalli |
Abstract | A recent trend observed in traditionally challenging fields such as computer vision and natural language processing has been the significant performance gains shown by deep learning (DL). In many different research fields, DL models have been evolving rapidly and become ubiquitous. Despite researchers’ excitement, unfortunately, most software developers are not DL experts and oftentimes have a difficult time following the booming DL research outputs. As a result, it usually takes a significant amount of time for the latest superior DL models to prevail in industry. This issue is further exacerbated by the common use of sundry incompatible DL programming frameworks, such as TensorFlow, PyTorch, Theano, etc. To address this issue, we propose a system, called Model Asset Exchange (MAX), that avails developers of easy access to state-of-the-art DL models. Regardless of the underlying DL programming frameworks, it provides an open source Python library (called the MAX framework) that wraps DL models and unifies programming interfaces with our standardized RESTful APIs. These RESTful APIs enable developers to exploit the wrapped DL models for inference tasks without the need to fully understand different DL programming frameworks. Using MAX, we have wrapped and open-sourced more than 30 state-of-the-art DL models from various research fields, including computer vision, natural language processing and signal processing, etc. In the end, we selectively demonstrate two web applications that are built on top of MAX, as well as the process of adding a DL model to MAX. |
Tasks | |
Published | 2019-09-04 |
URL | https://arxiv.org/abs/1909.01606v1 |
https://arxiv.org/pdf/1909.01606v1.pdf | |
PWC | https://paperswithcode.com/paper/model-asset-exchange-path-to-ubiquitous-deep |
Repo | |
Framework | |
Learning Fair Naive Bayes Classifiers by Discovering and Eliminating Discrimination Patterns
Title | Learning Fair Naive Bayes Classifiers by Discovering and Eliminating Discrimination Patterns |
Authors | YooJung Choi, Golnoosh Farnadi, Behrouz Babaki, Guy Van den Broeck |
Abstract | As machine learning is increasingly used to make real-world decisions, recent research efforts aim to define and ensure fairness in algorithmic decision making. Existing methods often assume a fixed set of observable features to define individuals, but lack a discussion of certain features not being observed at test time. In this paper, we study fairness of naive Bayes classifiers, which allow partial observations. In particular, we introduce the notion of a discrimination pattern, which refers to an individual receiving different classifications depending on whether some sensitive attributes were observed. Then a model is considered fair if it has no such pattern. We propose an algorithm to discover and mine for discrimination patterns in a naive Bayes classifier, and show how to learn maximum-likelihood parameters subject to these fairness constraints. Our approach iteratively discovers and eliminates discrimination patterns until a fair model is learned. An empirical evaluation on three real-world datasets demonstrates that we can remove exponentially many discrimination patterns by only adding a small fraction of them as constraints. |
Tasks | Decision Making |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.03843v1 |
https://arxiv.org/pdf/1906.03843v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-fair-naive-bayes-classifiers-by |
Repo | |
Framework | |
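To make the idea in the abstract above concrete, the following sketch computes a naive Bayes posterior from a partial observation and measures how much it shifts when a sensitive attribute is also observed. It assumes binary features and a binary decision; the feature names, probabilities, and function names are made up for illustration and are not taken from the paper, whose formal definition of a discrimination pattern may include thresholds not shown here.

```python
def posterior(prior, likelihoods, observed):
    """P(D=1 | observed features) in a binary naive Bayes model.

    prior:       P(D=1)
    likelihoods: {feature: (P(f=1 | D=1), P(f=1 | D=0))}
    observed:    {feature: 0 or 1}; unobserved features are simply left out
    """
    p1, p0 = prior, 1.0 - prior
    for name, value in observed.items():
        l1, l0 = likelihoods[name]
        p1 *= l1 if value == 1 else 1.0 - l1
        p0 *= l0 if value == 1 else 1.0 - l0
    return p1 / (p1 + p0)

def discrimination_degree(prior, likelihoods, sensitive, other):
    """Change in P(D=1) caused by additionally observing the sensitive attributes."""
    with_sensitive = posterior(prior, likelihoods, {**other, **sensitive})
    without_sensitive = posterior(prior, likelihoods, other)
    return with_sensitive - without_sensitive

# Hypothetical model: "s" is a sensitive attribute, "y" a non-sensitive one.
example_likelihoods = {"s": (0.8, 0.2), "y": (0.6, 0.4)}
example_degree = discrimination_degree(0.5, example_likelihoods, {"s": 1}, {"y": 1})
```

Because naive Bayes marginalizes missing features by dropping their factors, both posteriors come from the same model, which is what makes this comparison possible at test time.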
Partially-supervised Mention Detection
Title | Partially-supervised Mention Detection |
Authors | Lesly Miculicich, James Henderson |
Abstract | Learning to detect entity mentions without using syntactic information can be useful for integration and joint optimization with other tasks. However, it is common to have partially annotated data for this problem. Here, we investigate two approaches to deal with partial annotation of mentions: weighted loss and soft-target classification. We also propose two neural mention detection approaches: a sequence tagging, and an exhaustive search. We evaluate our methods with coreference resolution as a downstream task, using multitask learning. The results show that the recall and F1 score improve for all methods. |
Tasks | Coreference Resolution |
Published | 2019-08-26 |
URL | https://arxiv.org/abs/1908.09507v1 |
https://arxiv.org/pdf/1908.09507v1.pdf | |
PWC | https://paperswithcode.com/paper/partially-supervised-mention-detection |
Repo | |
Framework | |
Meta-Learning with Warped Gradient Descent
Title | Meta-Learning with Warped Gradient Descent |
Authors | Sebastian Flennerhag, Andrei A. Rusu, Razvan Pascanu, Francesco Visin, Hujun Yin, Raia Hadsell |
Abstract | Learning an efficient update rule from data that promotes rapid learning of new tasks from the same distribution remains an open problem in meta-learning. Typically, previous works have approached this issue either by attempting to train a neural network that directly produces updates or by attempting to learn better initialisations or scaling factors for a gradient-based update rule. Both of these approaches pose challenges. On one hand, directly producing an update forgoes a useful inductive bias and can easily lead to non-converging behaviour. On the other hand, approaches that try to control a gradient-based update rule typically resort to computing gradients through the learning process to obtain their meta-gradients, leading to methods that cannot scale beyond few-shot task adaptation. In this work, we propose Warped Gradient Descent (WarpGrad), a method that intersects these approaches to mitigate their limitations. WarpGrad meta-learns an efficiently parameterised preconditioning matrix that facilitates gradient descent across the task distribution. Preconditioning arises by interleaving non-linear layers, referred to as warp-layers, between the layers of a task-learner. Warp-layers are meta-learned without backpropagating through the task training process in a manner similar to methods that learn to directly produce updates. WarpGrad is computationally efficient, easy to implement, and can scale to arbitrarily large meta-learning problems. We provide a geometrical interpretation of the approach and evaluate its effectiveness in a variety of settings, including few-shot, standard supervised, continual and reinforcement learning. |
Tasks | Few-Shot Learning, Meta-Learning |
Published | 2019-08-30 |
URL | https://arxiv.org/abs/1909.00025v2 |
https://arxiv.org/pdf/1909.00025v2.pdf | |
PWC | https://paperswithcode.com/paper/meta-learning-with-warped-gradient-descent |
Repo | |
Framework | |
Knowledge Transfer between Datasets for Learning-based Tissue Microstructure Estimation
Title | Knowledge Transfer between Datasets for Learning-based Tissue Microstructure Estimation |
Authors | Yu Qin, Yuxing Li, Zhiwen Liu, Chuyang Ye |
Abstract | Learning-based approaches, especially those based on deep networks, have enabled high-quality estimation of tissue microstructure from low-quality diffusion magnetic resonance imaging (dMRI) scans, which are acquired with a limited number of diffusion gradients and a relatively poor spatial resolution. These learning-based approaches to tissue microstructure estimation require acquisitions of training dMRI scans with high-quality diffusion signals, which are densely sampled in the q-space and have a high spatial resolution. However, the acquisition of training scans may not be available for all datasets. Therefore, we explore knowledge transfer between different dMRI datasets so that learning-based tissue microstructure estimation can be applied for datasets where training scans are not acquired. Specifically, for a target dataset of interest, where only low-quality diffusion signals are acquired without training scans, we exploit the information in a source dMRI dataset acquired with high-quality diffusion signals. We interpolate the diffusion signals in the source dataset in the q-space using a dictionary-based signal representation, so that the interpolated signals match the acquisition scheme of the target dataset. Then, the interpolated signals are used together with the high-quality tissue microstructure computed from the source dataset to train deep networks that perform tissue microstructure estimation for the target dataset. Experiments were performed on brain dMRI scans with low-quality diffusion signals, where the benefit of the proposed strategy is demonstrated. |
Tasks | Transfer Learning |
Published | 2019-10-24 |
URL | https://arxiv.org/abs/1910.10930v1 |
https://arxiv.org/pdf/1910.10930v1.pdf | |
PWC | https://paperswithcode.com/paper/knowledge-transfer-between-datasets-for |
Repo | |
Framework | |
Visual-Inertial Localization for Skid-Steering Robots with Kinematic Constraints
Title | Visual-Inertial Localization for Skid-Steering Robots with Kinematic Constraints |
Authors | Xingxing Zuo, Mingming Zhang, Yiming Chen, Yong Liu, Guoquan Huang, Mingyang Li |
Abstract | While visual localization or SLAM has witnessed great progress in past decades, when deploying it on a mobile robot in practice, few works have explicitly considered the kinematic (or dynamic) constraints of the real robotic system when designing state estimators. To promote the practical deployment of current state-of-the-art visual-inertial localization algorithms, in this work we propose a low-cost kinematics-constrained localization system particularly for a skid-steering mobile robot. In particular, we derive in a principled way the robot’s kinematic constraints based on the instantaneous centers of rotation (ICR) model and integrate them in a tightly-coupled manner into the sliding-window bundle adjustment (BA)-based visual-inertial estimator. Because the ICR model parameters are time-varying due to, for example, track-to-terrain interaction and terrain roughness, we estimate these kinematic parameters online along with the navigation state. To this end, we perform an in-depth observability analysis and identify motion conditions under which the state/parameter estimation is viable. The proposed kinematics-constrained visual-inertial localization system has been validated extensively in different terrain scenarios. |
Tasks | Visual Localization |
Published | 2019-11-13 |
URL | https://arxiv.org/abs/1911.05787v1 |
https://arxiv.org/pdf/1911.05787v1.pdf | |
PWC | https://paperswithcode.com/paper/visual-inertial-localization-for-skid |
Repo | |
Framework | |
Self-supervised Moving Vehicle Tracking with Stereo Sound
Title | Self-supervised Moving Vehicle Tracking with Stereo Sound |
Authors | Chuang Gan, Hang Zhao, Peihao Chen, David Cox, Antonio Torralba |
Abstract | Humans are able to localize objects in the environment using both visual and auditory cues, integrating information from multiple modalities into a common reference frame. We introduce a system that can leverage unlabeled audio-visual data to learn to localize objects (moving vehicles) in a visual reference frame, purely using stereo sound at inference time. Since it is labor-intensive to manually annotate the correspondences between audio and object bounding boxes, we achieve this goal by using the co-occurrence of visual and audio streams in unlabeled videos as a form of self-supervision, without resorting to the collection of ground-truth annotations. In particular, we propose a framework that consists of a vision “teacher” network and a stereo-sound “student” network. During training, knowledge embodied in a well-established visual vehicle detection model is transferred to the audio domain using unlabeled videos as a bridge. At test time, the stereo-sound student network can work independently to perform object localization using just stereo audio and camera meta-data, without any visual input. Experimental results on a newly collected Auditory Vehicle Tracking dataset verify that our proposed approach outperforms several baseline approaches. We also demonstrate that our cross-modal auditory localization approach can assist in the visual localization of moving vehicles under poor lighting conditions. |
Tasks | Object Localization, Visual Localization |
Published | 2019-10-25 |
URL | https://arxiv.org/abs/1910.11760v1 |
https://arxiv.org/pdf/1910.11760v1.pdf | |
PWC | https://paperswithcode.com/paper/self-supervised-moving-vehicle-tracking-with-1 |
Repo | |
Framework | |
Adaptivity and Optimality: A Universal Algorithm for Online Convex Optimization
Title | Adaptivity and Optimality: A Universal Algorithm for Online Convex Optimization |
Authors | Guanghui Wang, Shiyin Lu, Lijun Zhang |
Abstract | In this paper, we study adaptive online convex optimization, and aim to design a universal algorithm that achieves optimal regret bounds for multiple common types of loss functions. Existing universal methods are limited in the sense that they are optimal for only a subclass of loss functions. To address this limitation, we propose a novel online method, namely Maler, which enjoys the optimal $O(\sqrt{T})$, $O(d\log T)$ and $O(\log T)$ regret bounds for general convex, exponentially concave, and strongly convex functions respectively. The essential idea is to run multiple types of learning algorithms with different learning rates in parallel, and utilize a meta algorithm to track the best one on the fly. Empirical results demonstrate the effectiveness of our method. |
Tasks | |
Published | 2019-05-15 |
URL | https://arxiv.org/abs/1905.05917v1 |
https://arxiv.org/pdf/1905.05917v1.pdf | |
PWC | https://paperswithcode.com/paper/adaptivity-and-optimality-a-universal |
Repo | |
Framework | |
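The "essential idea" described in the abstract above, running several learners with different learning rates in parallel and letting a meta algorithm track the best one, can be sketched generically as follows. This is not the Maler algorithm: it is a toy that assumes 1-D squared losses, plain online gradient descent experts, and an exponentially weighted (Hedge-style) meta learner. All names and parameter values are illustrative.

```python
import math

def run_meta_ogd(targets, learning_rates, meta_rate=1.0):
    """Combine 1-D online gradient descent experts with exponential weights."""
    experts = [0.0 for _ in learning_rates]  # each expert's current prediction
    weights = [1.0 / len(learning_rates)] * len(learning_rates)
    total_loss = 0.0
    for y in targets:
        # Meta prediction: weighted average of the experts' predictions.
        x = sum(w * e for w, e in zip(weights, experts))
        total_loss += (x - y) ** 2
        # Meta update: down-weight experts in proportion to their loss this round.
        losses = [(e - y) ** 2 for e in experts]
        weights = [w * math.exp(-meta_rate * l) for w, l in zip(weights, losses)]
        norm = sum(weights)
        weights = [w / norm for w in weights]
        # Expert update: one gradient step on the squared loss, per learning rate.
        experts = [e - lr * 2.0 * (e - y) for e, lr in zip(experts, learning_rates)]
    return total_loss, weights

loss, final_weights = run_meta_ogd([1.0] * 50, [0.001, 0.1, 0.4])
```

On this constant target sequence the expert with learning rate 0.4 converges fastest, so the meta learner ends up placing most of its weight on it without being told in advance which rate is best.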
Emotional Embeddings: Refining Word Embeddings to Capture Emotional Content of Words
Title | Emotional Embeddings: Refining Word Embeddings to Capture Emotional Content of Words |
Authors | Armin Seyeditabari, Narges Tabari, Shafie Gholizade, Wlodek Zadrozny |
Abstract | Word embeddings are one of the most useful tools in any modern natural language processing expert’s toolkit. They contain various types of information about each word which makes them the best way to represent the terms in any NLP task. But there are some types of information that cannot be learned by these models. Emotional information of words is one of those. In this paper, we present an approach to incorporate emotional information of words into these models. We accomplish this by adding a secondary training stage which uses an emotional lexicon and a psychological model of basic emotions. We show that fitting an emotional model into pre-trained word vectors can increase the performance of these models in emotional similarity metrics. Retrained models perform better than their original counterparts, with improvements ranging from 13% for the Word2Vec model to 29% for GloVe vectors. This is the first such model presented in the literature, and although preliminary, these emotion-sensitive models can open the way to increased performance in a variety of emotion detection techniques. |
Tasks | Word Embeddings |
Published | 2019-05-31 |
URL | https://arxiv.org/abs/1906.00112v2 |
https://arxiv.org/pdf/1906.00112v2.pdf | |
PWC | https://paperswithcode.com/paper/190600112 |
Repo | |
Framework | |
Annotating and normalizing biomedical NEs with limited knowledge
Title | Annotating and normalizing biomedical NEs with limited knowledge |
Authors | Fernando Sánchez León, Ana González Ledesma |
Abstract | Named entity recognition (NER) is the very first step in the linguistic processing of any new domain. It is currently a common process in BioNLP on English clinical text. However, it is still in its infancy in other major languages, as is the case for Spanish. Presented under the umbrella of the PharmaCoNER shared task, this paper describes a very simple method for the annotation and normalization of pharmacological, chemical and, ultimately, biomedical named entities in clinical cases. The system developed for the shared task is based on limited knowledge, collected, structured and munged in a way that clearly outperforms scores obtained by similar dictionary-based systems for English in the past. Along with this recovery of knowledge-based methods for NER in subdomains, the paper also highlights the key contribution of resource-based systems in the validation and consolidation of both the annotation guidelines and the human annotation practices. In this sense, some of the authors’ findings on the overall quality of human-annotated datasets question the above-mentioned ‘official’ results obtained by this system, which ranked second (0.91 F1-score) and first (0.916 F1-score), respectively, in the two PharmaCoNER subtasks. |
Tasks | Named Entity Recognition |
Published | 2019-12-19 |
URL | https://arxiv.org/abs/1912.09152v1 |
https://arxiv.org/pdf/1912.09152v1.pdf | |
PWC | https://paperswithcode.com/paper/annotating-and-normalizing-biomedical-nes |
Repo | |
Framework | |
Learning Canonical Representations for Scene Graph to Image Generation
Title | Learning Canonical Representations for Scene Graph to Image Generation |
Authors | Roei Herzig, Amir Bar, Huijuan Xu, Gal Chechik, Trevor Darrell, Amir Globerson |
Abstract | Generating realistic images of complex visual scenes becomes very challenging when one wishes to control the structure of the generated images. Previous approaches showed that scenes with few entities can be controlled using scene graphs, but this approach struggles as the complexity of the graph (the number of objects and edges) increases. In this work, we show that one limitation of current methods is their inability to capture semantic equivalence in graphs. We present a novel model to address this, by employing a canonical graph representation, which ensures that semantically similar graphs will result in similar images. We show improved performance of the model on three different benchmarks: Visual Genome, COCO and CLEVR. |
Tasks | Image Generation |
Published | 2019-12-16 |
URL | https://arxiv.org/abs/1912.07414v2 |
https://arxiv.org/pdf/1912.07414v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-canonical-representations-for-scene |
Repo | |
Framework | |
Improved Zeroth-Order Variance Reduced Algorithms and Analysis for Nonconvex Optimization
Title | Improved Zeroth-Order Variance Reduced Algorithms and Analysis for Nonconvex Optimization |
Authors | Kaiyi Ji, Zhe Wang, Yi Zhou, Yingbin Liang |
Abstract | Two types of zeroth-order stochastic algorithms have recently been designed for nonconvex optimization respectively based on the first-order techniques SVRG and SARAH/SPIDER. This paper addresses several important issues that are still open in these methods. First, all existing SVRG-type zeroth-order algorithms suffer from worse function query complexities than either zeroth-order gradient descent (ZO-GD) or stochastic gradient descent (ZO-SGD). In this paper, we propose a new algorithm ZO-SVRG-Coord-Rand and develop a new analysis for an existing ZO-SVRG-Coord algorithm proposed in Liu et al. 2018b, and show that both ZO-SVRG-Coord-Rand and ZO-SVRG-Coord (under our new analysis) outperform other existing SVRG-type zeroth-order methods as well as ZO-GD and ZO-SGD. Second, the existing SPIDER-type algorithm SPIDER-SZO (Fang et al. 2018) has superior theoretical performance, but suffers from the generation of a large number of Gaussian random variables as well as a $\sqrt{\epsilon}$-level stepsize in practice. In this paper, we develop a new algorithm ZO-SPIDER-Coord, which is free from Gaussian variable generation and allows a large constant stepsize while maintaining the same convergence rate and query complexity, and we further show that ZO-SPIDER-Coord automatically achieves a linear convergence rate as the iterate enters into a local PL region without restart and algorithmic modification. |
Tasks | |
Published | 2019-10-27 |
URL | https://arxiv.org/abs/1910.12166v1 |
https://arxiv.org/pdf/1910.12166v1.pdf | |
PWC | https://paperswithcode.com/paper/improved-zeroth-order-variance-reduced |
Repo | |
Framework | |
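For context on the "Coord" variants named in the abstract above, the sketch below shows a generic coordinate-wise zeroth-order gradient estimate, which uses only function values and needs no Gaussian random directions. It is a plain central-difference estimator written under that assumption, not the paper's ZO-SVRG-Coord or ZO-SPIDER-Coord algorithms; the function names and the smoothing parameter `delta` are illustrative.

```python
def coordinate_gradient_estimate(f, x, delta=1e-4):
    """Estimate the gradient of f at x from function values (central differences)."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += delta   # perturb only coordinate i
        backward[i] -= delta
        grad.append((f(forward) - f(backward)) / (2.0 * delta))
    return grad

def sphere(x):
    """Simple test objective whose true gradient is 2 * x."""
    return sum(v * v for v in x)

estimate = coordinate_gradient_estimate(sphere, [1.0, -2.0, 0.5])
```

Each estimate costs two function queries per coordinate, which is the quantity that query-complexity bounds for zeroth-order methods count.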
Cross-Lingual Ability of Multilingual BERT: An Empirical Study
Title | Cross-Lingual Ability of Multilingual BERT: An Empirical Study |
Authors | Karthikeyan K, Zihan Wang, Stephen Mayhew, Dan Roth |
Abstract | Recent work has exhibited the surprising cross-lingual abilities of multilingual BERT (M-BERT) – surprising since it is trained without any cross-lingual objective and with no aligned data. In this work, we provide a comprehensive study of the contribution of different components in M-BERT to its cross-lingual ability. We study the impact of linguistic properties of the languages, the architecture of the model, and the learning objectives. The experimental study is done in the context of three typologically different languages – Spanish, Hindi, and Russian – and using two conceptually different NLP tasks, textual entailment and named entity recognition. Among our key conclusions is the fact that the lexical overlap between languages plays a negligible role in the cross-lingual success, while the depth of the network is an integral part of it. All our models and implementations can be found on our project page: http://cogcomp.org/page/publication_view/900 . |
Tasks | Named Entity Recognition, Natural Language Inference |
Published | 2019-12-17 |
URL | https://arxiv.org/abs/1912.07840v2 |
https://arxiv.org/pdf/1912.07840v2.pdf | |
PWC | https://paperswithcode.com/paper/cross-lingual-ability-of-multilingual-bert-an-1 |
Repo | |
Framework | |