January 31, 2020

Paper Group AWR 410

Achieving Verified Robustness to Symbol Substitutions via Interval Bound Propagation

Title Achieving Verified Robustness to Symbol Substitutions via Interval Bound Propagation
Authors Po-Sen Huang, Robert Stanforth, Johannes Welbl, Chris Dyer, Dani Yogatama, Sven Gowal, Krishnamurthy Dvijotham, Pushmeet Kohli
Abstract Neural networks are part of many contemporary NLP systems, yet their empirical successes come at the price of vulnerability to adversarial attacks. Previous work has used adversarial training and data augmentation to partially mitigate such brittleness, but these are unlikely to find worst-case adversaries due to the complexity of the search space arising from discrete text perturbations. In this work, we approach the problem from the opposite direction: to formally verify a system’s robustness against a predefined class of adversarial attacks. We study text classification under synonym replacements or character flip perturbations. We propose modeling these input perturbations as a simplex and then using Interval Bound Propagation – a formal model verification method. We modify the conventional log-likelihood training objective to train models that can be efficiently verified, which would otherwise come with exponential search complexity. The resulting models show little difference in nominal accuracy, but have much improved verified accuracy under perturbations and come with an efficiently computable formal guarantee on worst-case adversaries.
Tasks Data Augmentation, Text Classification
Published 2019-09-03
URL https://arxiv.org/abs/1909.01492v2
PDF https://arxiv.org/pdf/1909.01492v2.pdf
PWC https://paperswithcode.com/paper/achieving-verified-robustness-to-symbol
Repo https://github.com/deepmind/interval-bound-propagation
Framework tf
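
The abstract names Interval Bound Propagation as the verification method. The following is a minimal sketch of the general idea, not the authors' implementation: it pushes an axis-aligned box of inputs through an affine layer and a ReLU, then checks a simple sufficient condition on the output logits. The function names, the two-layer toy network, and the use of a plain box (rather than the simplex construction described in the abstract) are assumptions made for illustration.

```python
def affine_bounds(lower, upper, weights, bias):
    """Propagate the box [lower, upper] through y = W x + b.

    For each output unit, a positive weight takes its minimum at the
    input's lower bound and a negative weight at the upper bound.
    """
    out_lower, out_upper = [], []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, l, u in zip(row, lower, upper):
            if w >= 0:
                lo += w * l
                hi += w * u
            else:
                lo += w * u
                hi += w * l
        out_lower.append(lo)
        out_upper.append(hi)
    return out_lower, out_upper


def relu_bounds(lower, upper):
    """ReLU is monotone, so it can be applied to each bound directly."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]


def is_verified(logit_lower, logit_upper, true_class):
    """Sufficient (not necessary) check: the true class's lower bound
    must exceed every other class's upper bound."""
    return all(
        logit_lower[true_class] > hi
        for k, hi in enumerate(logit_upper)
        if k != true_class
    )
```

Chaining `affine_bounds`, `relu_bounds`, and a second `affine_bounds` call gives logit bounds for a small network; if `is_verified` returns `True`, no input inside the box can change the predicted class. The bounds are sound but can be loose, which is why the abstract describes modifying the training objective so that the resulting models are easy to verify.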

Direct information transfer rate optimisation for SSVEP-based BCI

Title Direct information transfer rate optimisation for SSVEP-based BCI
Authors Anti Ingel, Ilya Kuzovkin, Raul Vicente
Abstract In this work, a classification method for SSVEP-based BCI is proposed. The classification method uses features extracted by traditional SSVEP-based BCI methods and finds optimal discrimination thresholds for each feature to classify the targets. Optimising the thresholds is formalised as a maximisation task of a performance measure of BCIs called information transfer rate (ITR). However, instead of the standard method of calculating ITR, which makes certain assumptions about the data, a more general formula is derived to avoid incorrect ITR calculation when the standard assumptions are not met. This allows the optimal discrimination thresholds to be automatically calculated and thus eliminates the need for manual parameter selection or performing computationally expensive grid searches. The proposed method shows good performance in classifying targets of a BCI, outperforming previously reported results on the same dataset by a factor of 2 in terms of ITR. The highest achieved ITR on the used dataset was 62 bit/min. The proposed method also provides a way to reduce false classifications, which is important in real-world applications.
Tasks
Published 2019-07-19
URL https://arxiv.org/abs/1907.10509v1
PDF https://arxiv.org/pdf/1907.10509v1.pdf
PWC https://paperswithcode.com/paper/direct-information-transfer-rate-optimisation
Repo https://github.com/antiingel/ITR-optimisation
Framework none
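
For context, the "standard method of calculating ITR" mentioned in the abstract is usually taken to be the Wolpaw formula, which assumes equally likely targets and errors spread uniformly over the wrong targets. The sketch below computes that standard quantity only; it is not the more general formula derived in the paper, and the function names are hypothetical.

```python
import math


def itr_bits_per_trial(n_targets, accuracy):
    """Standard (Wolpaw) ITR in bits per selection.

    Assumes equiprobable targets and uniformly distributed errors.
    """
    if n_targets < 2:
        raise ValueError("need at least two targets")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    p = accuracy
    bits = math.log2(n_targets)
    if p > 0.0:
        bits += p * math.log2(p)
    if p < 1.0:
        bits += (1.0 - p) * math.log2((1.0 - p) / (n_targets - 1))
    return bits


def itr_bits_per_minute(n_targets, accuracy, trial_seconds):
    """Scale the per-trial ITR by the number of selections per minute."""
    return itr_bits_per_trial(n_targets, accuracy) * 60.0 / trial_seconds
```

When the assumptions fail, for example when some targets are confused far more often than others, this formula can misstate the information actually transferred, which is the motivation the abstract gives for deriving a more general expression.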

Consistency by Agreement in Zero-shot Neural Machine Translation

Title Consistency by Agreement in Zero-shot Neural Machine Translation
Authors Maruan Al-Shedivat, Ankur P. Parikh
Abstract Generalization and reliability of multilingual translation often highly depend on the amount of available parallel data for each language pair of interest. In this paper, we focus on zero-shot generalization—a challenging setup that tests models on translation directions they have not been optimized for at training time. To solve the problem, we (i) reformulate multilingual translation as probabilistic inference, (ii) define the notion of zero-shot consistency and show why standard training often results in models unsuitable for zero-shot tasks, and (iii) introduce a consistent agreement-based training method that encourages the model to produce equivalent translations of parallel sentences in auxiliary languages. We test our multilingual NMT models on multiple public zero-shot translation benchmarks (IWSLT17, UN corpus, Europarl) and show that agreement-based learning often results in 2-3 BLEU zero-shot improvement over strong baselines without any loss in performance on supervised translation directions.
Tasks Machine Translation, Zero-Shot Machine Translation
Published 2019-04-04
URL http://arxiv.org/abs/1904.02338v2
PDF http://arxiv.org/pdf/1904.02338v2.pdf
PWC https://paperswithcode.com/paper/consistency-by-agreement-in-zero-shot-neural
Repo https://github.com/google-research/language/blob/master/language/labs/consistent_zero_shot_nmt
Framework tf

Accelerating Extreme Classification via Adaptive Feature Agglomeration

Title Accelerating Extreme Classification via Adaptive Feature Agglomeration
Authors Ankit Jalan, Purushottam Kar
Abstract Extreme classification seeks to assign each data point the most relevant labels from a universe of a million or more labels. This task faces the dual challenge of high precision and scalability, with millisecond-level prediction times being a benchmark. We propose DEFRAG, an adaptive feature agglomeration technique to accelerate extreme classification algorithms. Despite past works on feature clustering and selection, DEFRAG distinguishes itself in being able to scale to millions of features, and is especially beneficial when feature sets are sparse, which is typical of recommendation and multi-label datasets. The method comes with provable performance guarantees and performs efficient task-driven agglomeration to reduce feature dimensionalities by an order of magnitude or more. Experiments show that DEFRAG can not only reduce training and prediction times of several leading extreme classification algorithms by as much as 40%, but also be used for feature reconstruction to address the problem of missing features, as well as offer superior coverage on rare labels.
Tasks
Published 2019-05-28
URL https://arxiv.org/abs/1905.11769v1
PDF https://arxiv.org/pdf/1905.11769v1.pdf
PWC https://paperswithcode.com/paper/accelerating-extreme-classification-via
Repo https://github.com/purushottamkar/defrag
Framework none

Efficient Algorithms for Set-Valued Prediction in Multi-Class Classification

Title Efficient Algorithms for Set-Valued Prediction in Multi-Class Classification
Authors Thomas Mortier, Marek Wydmuch, Eyke Hüllermeier, Krzysztof Dembczyński, Willem Waegeman
Abstract In cases of uncertainty, a multi-class classifier preferably returns a set of candidate classes instead of predicting a single class label with little guarantee. More precisely, the classifier should strive for an optimal balance between the correctness (the true class is among the candidates) and the precision (the candidates are not too many) of its prediction. We formalize this problem within a general decision-theoretic framework that unifies most of the existing work in this area. In this framework, uncertainty is quantified in terms of conditional class probabilities, and the quality of a predicted set is measured in terms of a utility function. We then address the problem of finding the Bayes-optimal prediction, i.e., the subset of class labels with highest expected utility. For this problem, which is computationally challenging as there are exponentially (in the number of classes) many predictions to choose from, we propose efficient algorithms that can be applied to a broad family of utility scores. Two of these algorithms make use of structural information in the form of a class hierarchy, which is often available in prediction problems with many classes. Our theoretical results are complemented by experimental studies, in which we analyze the proposed algorithms in terms of predictive accuracy and runtime efficiency.
Tasks
Published 2019-06-19
URL https://arxiv.org/abs/1906.08129v1
PDF https://arxiv.org/pdf/1906.08129v1.pdf
PWC https://paperswithcode.com/paper/efficient-algorithms-for-set-valued
Repo https://github.com/tfmortie/setvaluedprediction
Framework none
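
To illustrate why the Bayes-optimal set need not require an exponential search, the sketch below assumes a utility of the form u(y, S) = g(|S|) if y is in S and 0 otherwise. Under that assumption the expected utility of S is g(|S|) times the probability mass of S, so for each size k the best set is the k most probable classes and only K candidate sets need scoring. Whether this matches the exact family of utilities treated in the paper is an assumption; the function name and the example `g` are hypothetical.

```python
def bayes_optimal_set(probs, g):
    """Return (best_set, expected_utility) for utility g(|S|) * [y in S].

    probs: dict mapping class label -> conditional probability.
    g:     function from set size k (>= 1) to a non-negative score.
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    best_set, best_utility = [], float("-inf")
    mass = 0.0
    for k, (label, p) in enumerate(ranked, start=1):
        mass += p  # probability that the true class is in the top-k set
        utility = g(k) * mass
        if utility > best_utility:
            best_utility = utility
            best_set = [name for name, _ in ranked[:k]]
    return best_set, best_utility
```

With `g = lambda k: 1.0 / k` the utility rewards small sets strongly and a single confident class usually wins; a gentler penalty such as `lambda k: 1.0 - 0.1 * (k - 1)` lets the predictor return several candidates when the probabilities are close.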

Deep Residual Neural Networks for Audio Spoofing Detection

Title Deep Residual Neural Networks for Audio Spoofing Detection
Authors Moustafa Alzantot, Ziqi Wang, Mani B. Srivastava
Abstract The state-of-the-art models for speech synthesis and voice conversion are capable of generating synthetic speech that is perceptually indistinguishable from bona fide human speech. These methods represent a threat to automatic speaker verification (ASV) systems. Additionally, replay attacks, where the attacker uses a loudspeaker to replay previously recorded genuine human speech, are also possible. We present our solution for the ASVSpoof2019 competition, which aims to develop countermeasure systems that distinguish between spoofing attacks and genuine speech. Our model is inspired by the success of residual convolutional networks in many classification tasks. We build three variants of a residual convolutional neural network that accept different feature representations (MFCC, log-magnitude STFT, and CQCC) of the input. We compare the performance achieved by our model variants and the competition baseline models. In the logical access scenario, the fusion of our models has zero t-DCF cost and zero equal error rate (EER), as evaluated on the development set. On the evaluation set, our model fusion improves the t-DCF and EER by 25% compared to the baseline algorithms. Against physical access replay attacks, our model fusion improves the baseline algorithms' t-DCF and EER scores by 71% and 75% on the evaluation set, respectively.
Tasks Speaker Verification, Speech Synthesis, Voice Conversion
Published 2019-06-30
URL https://arxiv.org/abs/1907.00501v1
PDF https://arxiv.org/pdf/1907.00501v1.pdf
PWC https://paperswithcode.com/paper/deep-residual-neural-networks-for-audio
Repo https://github.com/nesl/asvspoof2019
Framework none

DRUM: End-To-End Differentiable Rule Mining On Knowledge Graphs

Title DRUM: End-To-End Differentiable Rule Mining On Knowledge Graphs
Authors Ali Sadeghian, Mohammadreza Armandpour, Patrick Ding, Daisy Zhe Wang
Abstract In this paper, we study the problem of learning probabilistic logical rules for inductive and interpretable link prediction. Despite the importance of inductive link prediction, most previous works focused on transductive link prediction and cannot manage previously unseen entities. Moreover, they are black-box models that are not easily explainable for humans. We propose DRUM, a scalable and differentiable approach for mining first-order logical rules from knowledge graphs which resolves these problems. We motivate our method by making a connection between learning confidence scores for each rule and low-rank tensor approximation. DRUM uses bidirectional RNNs to share useful information across the tasks of learning rules for different relations. We also empirically demonstrate the efficiency of DRUM over existing rule mining methods for inductive link prediction on a variety of benchmark datasets.
Tasks Knowledge Graphs, Link Prediction
Published 2019-10-31
URL https://arxiv.org/abs/1911.00055v1
PDF https://arxiv.org/pdf/1911.00055v1.pdf
PWC https://paperswithcode.com/paper/drum-end-to-end-differentiable-rule-mining-on
Repo https://github.com/irokin/Experiments-Results-for-Link-Prediction
Framework none

Multi-Modal Adversarial Autoencoders for Recommendations of Citations and Subject Labels

Title Multi-Modal Adversarial Autoencoders for Recommendations of Citations and Subject Labels
Authors Lukas Galke, Florian Mai, Iacopo Vagliano, Ansgar Scherp
Abstract We present multi-modal adversarial autoencoders for recommendation and evaluate them on two different tasks: citation recommendation and subject label recommendation. We analyze the effects of adversarial regularization, sparsity, and different input modalities. By conducting 408 experiments, we show that adversarial regularization consistently improves the performance of autoencoders for recommendation. We demonstrate, however, that the two tasks differ in the semantics of item co-occurrence in the sense that item co-occurrence resembles relatedness in the case of citations, yet implies diversity in the case of subject labels. Our results reveal that supplying the partial item set as input is only helpful when item co-occurrence resembles relatedness. When facing a new recommendation task, it is therefore crucial to consider the semantics of item co-occurrence for the choice of an appropriate model.
Tasks
Published 2019-07-22
URL https://arxiv.org/abs/1907.12366v1
PDF https://arxiv.org/pdf/1907.12366v1.pdf
PWC https://paperswithcode.com/paper/multi-modal-adversarial-autoencoders-for
Repo https://github.com/lgalke/aae-recommender
Framework pytorch

Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party ASR

Title Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party ASR
Authors Naoyuki Kanda, Christoph Boeddeker, Jens Heitkaemper, Yusuke Fujita, Shota Horiguchi, Kenji Nagamatsu, Reinhold Haeb-Umbach
Abstract In this paper, we present Hitachi and Paderborn University’s joint effort for automatic speech recognition (ASR) in a dinner party scenario. The main challenges of ASR systems for dinner party recordings obtained by multiple microphone arrays are (1) heavy speech overlaps, (2) severe noise and reverberation, (3) very natural conversational content, and possibly (4) insufficient training data. As an example of a dinner party scenario, we have chosen the data presented during the CHiME-5 speech recognition challenge, where the baseline ASR had a 73.3% word error rate (WER), and even the best performing system at the CHiME-5 challenge had a 46.1% WER. We extensively investigated a combination of the guided source separation-based speech enhancement technique and an already proposed strong ASR backend and found that a tight combination of these techniques provided substantial accuracy improvements. Our final system achieved WERs of 39.94% and 41.64% for the development and evaluation data, respectively, both of which are the best published results for the dataset. We also investigated with additional training data on the official small data in the CHiME-5 corpus to assess the intrinsic difficulty of this ASR task.
Tasks Speech Enhancement, Speech Recognition
Published 2019-05-29
URL https://arxiv.org/abs/1905.12230v2
PDF https://arxiv.org/pdf/1905.12230v2.pdf
PWC https://paperswithcode.com/paper/guided-source-separation-meets-a-strong-asr
Repo https://github.com/fgnt/pb_chime5
Framework none

Human Mesh Recovery from Monocular Images via a Skeleton-disentangled Representation

Title Human Mesh Recovery from Monocular Images via a Skeleton-disentangled Representation
Authors Sun Yu, Ye Yun, Liu Wu, Gao Wenpeng, Fu YiLi, Mei Tao
Abstract We describe an end-to-end method for recovering 3D human body mesh from single images and monocular videos. Unlike existing methods, which try to obtain all the complex 3D pose, shape, and camera parameters from one coupled feature, we propose a skeleton-disentangling framework that divides this task into multi-level spatial and temporal granularity in a decoupled manner. In the spatial domain, we propose an effective and pluggable “disentangling the skeleton from the details” (DSD) module. It reduces the complexity and decouples the skeleton, which lays a good foundation for temporal modeling. In the temporal domain, a self-attention-based temporal convolution network is proposed to efficiently exploit short- and long-term temporal cues. Furthermore, an unsupervised adversarial training strategy, temporal shuffling and order recovery, is designed to promote the learning of motion dynamics. The proposed method outperforms the state-of-the-art 3D human mesh recovery methods by 15.4% MPJPE and 23.8% PA-MPJPE on Human3.6M. State-of-the-art results are also achieved on the 3D pose in the wild (3DPW) dataset without any fine-tuning. In particular, ablation studies demonstrate that the skeleton-disentangled representation is crucial for better temporal modeling and generalization.
Tasks
Published 2019-08-20
URL https://arxiv.org/abs/1908.07172v2
PDF https://arxiv.org/pdf/1908.07172v2.pdf
PWC https://paperswithcode.com/paper/human-mesh-recovery-from-monocular-images-via
Repo https://github.com/Arthur151/DSD-SATN
Framework pytorch

RAMBO: Repeated And Merged Bloom Filter for Multiple Set Membership Testing (MSMT) in Sub-linear time

Title RAMBO: Repeated And Merged Bloom Filter for Multiple Set Membership Testing (MSMT) in Sub-linear time
Authors Gaurav Gupta, Benjamin Coleman, Tharun Medini, Vijai Mohan, Anshumali Shrivastava
Abstract Approximate set membership is a common problem with wide applications in databases, networking, and search. Given a set S and a query q, the task is to determine whether q is in S. The Bloom Filter (BF) is a popular data structure for approximate membership testing due to its simplicity. In particular, a BF consists of a bit array that can be incrementally updated. A related problem, which this paper addresses, is the Multiple Set Membership Testing (MSMT) problem. Here we are given K different sets, and for any given query q the goal is to find all of the sets containing the query element. Trivially, a multiple set membership instance can be reduced to K membership testing instances, each with the same q, leading to O(K) query time. A simple array of Bloom Filters can achieve that. In this paper, we show the first non-trivial data structure for streaming keys, RAMBO (Repeated And Merged Bloom Filter), which achieves expected O(sqrt(K) log K) query time with an additional worst-case memory cost factor of O(log K) compared with the array of Bloom Filters. The proposed data structure is simply a count-min sketch arrangement of Bloom Filters and retains all its favorable properties. We replace the addition operation with a set union and the minimum operation with a set intersection during estimation.
Tasks
Published 2019-10-07
URL https://arxiv.org/abs/1910.02611v1
PDF https://arxiv.org/pdf/1910.02611v1.pdf
PWC https://paperswithcode.com/paper/rambo-repeated-and-merged-bloom-filter-for
Repo https://github.com/RUSH-LAB/RAMBO
Framework none
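
The "count-min sketch arrangement of Bloom Filters" described in the abstract can be sketched roughly as follows. This is an illustrative toy, not the code in the linked repository: the class names, the SHA-256 based hashing, and the filter sizes are all assumptions. Each repetition hashes every set into one of a few buckets; sets sharing a bucket share one merged Bloom filter (the union step), and a query intersects the candidate sets reported by each repetition.

```python
import hashlib


def _hash(value, seed):
    """Deterministic 64-bit hash of (seed, value) built on SHA-256."""
    digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def add(self, key):
        for i in range(self.num_hashes):
            self.bits[_hash(key, i) % self.num_bits] = 1

    def __contains__(self, key):
        return all(
            self.bits[_hash(key, i) % self.num_bits]
            for i in range(self.num_hashes)
        )


class RamboSketch:
    def __init__(self, num_buckets, num_repetitions):
        self.num_buckets = num_buckets
        self.num_repetitions = num_repetitions
        self.filters = [[BloomFilter() for _ in range(num_buckets)]
                        for _ in range(num_repetitions)]
        # Which set ids were merged into each bucket's filter.
        self.members = [[set() for _ in range(num_buckets)]
                        for _ in range(num_repetitions)]

    def _bucket(self, set_id, repetition):
        return _hash(set_id, f"rep{repetition}") % self.num_buckets

    def insert(self, set_id, key):
        """Add key to set_id: one merged filter per repetition is updated."""
        for r in range(self.num_repetitions):
            b = self._bucket(set_id, r)
            self.filters[r][b].add(key)
            self.members[r][b].add(set_id)

    def query(self, key):
        """Return candidate set ids: union within a repetition,
        intersection across repetitions."""
        candidates = None
        for r in range(self.num_repetitions):
            hits = set()
            for b in range(self.num_buckets):
                if key in self.filters[r][b]:
                    hits |= self.members[r][b]
            candidates = hits if candidates is None else candidates & hits
        return candidates if candidates is not None else set()
```

Because Bloom filters have no false negatives, every set that truly contains the key survives the intersection; false positives are possible and become less likely as repetitions are added. A query touches buckets times repetitions filters instead of K, which is the source of the sub-linear query time when the number of buckets grows more slowly than K.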

Dreem Open Datasets: Multi-Scored Sleep Datasets to compare Human and Automated sleep staging

Title Dreem Open Datasets: Multi-Scored Sleep Datasets to compare Human and Automated sleep staging
Authors Antoine Guillot, Fabien Sauvet, Emmanuel H During, Valentin Thorey
Abstract Sleep stage classification constitutes an important element of sleep disorder diagnosis. It relies on the visual inspection of polysomnography records by trained sleep technologists. Automated approaches have been designed to alleviate this resource-intensive task. However, such approaches are usually compared to a single human scorer's annotation, despite an inter-rater agreement of only about 85%. The present study introduces two publicly available datasets, DOD-H including 25 healthy volunteers and DOD-O including 55 patients suffering from obstructive sleep apnea (OSA). Both datasets have been scored by 5 sleep technologists from different sleep centers. We developed a framework to compare automated approaches to a consensus of multiple human scorers. Using this framework, we benchmarked and compared the main literature approaches. We also developed and benchmarked a new deep learning method, SimpleSleepNet, inspired by the current state of the art. We demonstrated that many methods can reach human-level performance on both datasets. SimpleSleepNet achieved an F1 of 89.9% vs 86.8% on average for human scorers on DOD-H, and an F1 of 88.3% vs 84.8% on DOD-O. Our study highlights that state-of-the-art automated sleep staging outperforms human scorers for both healthy volunteers and patients suffering from OSA. Consideration could be given to using automated approaches in the clinical setting.
Tasks
Published 2019-10-31
URL https://arxiv.org/abs/1911.03221v3
PDF https://arxiv.org/pdf/1911.03221v3.pdf
PWC https://paperswithcode.com/paper/dreem-open-datasets-multi-scored-sleep
Repo https://github.com/Dreem-Organization/dreem-learning-evaluation
Framework none

TedEval: A Fair Evaluation Metric for Scene Text Detectors

Title TedEval: A Fair Evaluation Metric for Scene Text Detectors
Authors Chae Young Lee, Youngmin Baek, Hwalsuk Lee
Abstract Despite the recent success of scene text detection methods, common evaluation metrics fail to provide a fair and reliable comparison among detectors. They have obvious drawbacks in reflecting the inherent characteristics of text detection tasks and are unable to address issues such as granularity, multiline, and character incompleteness. In this paper, we propose a novel evaluation protocol called TedEval (Text detector Evaluation), which evaluates text detections through instance-level matching and character-level scoring. Based on a firm standard rewarding behaviors that result in successful recognition, TedEval can act as a reliable standard for comparing and quantizing the detection quality throughout all difficulty levels. In this regard, we believe that TedEval can play a key role in developing state-of-the-art scene text detectors. The code is publicly available at https://github.com/clovaai/TedEval.
Tasks Scene Text Detection
Published 2019-07-02
URL https://arxiv.org/abs/1907.01227v1
PDF https://arxiv.org/pdf/1907.01227v1.pdf
PWC https://paperswithcode.com/paper/tedeval-a-fair-evaluation-metric-for-scene
Repo https://github.com/clovaai/TedEval
Framework none

Linearized Multi-Sampling for Differentiable Image Transformation

Title Linearized Multi-Sampling for Differentiable Image Transformation
Authors Wei Jiang, Weiwei Sun, Andrea Tagliasacchi, Eduard Trulls, Kwang Moo Yi
Abstract We propose a novel image sampling method for differentiable image transformation in deep neural networks. The sampling schemes currently used in deep learning, such as Spatial Transformer Networks, rely on bilinear interpolation, which performs poorly under severe scale changes, and more importantly, results in poor gradient propagation. This is due to their strict reliance on direct neighbors. Instead, we propose to generate random auxiliary samples in the vicinity of each pixel in the sampled image, and create a linear approximation with their intensity values. We then use this approximation as a differentiable formula for the transformed image. We demonstrate that our approach produces more representative gradients with a wider basin of convergence for image alignment, which leads to considerable performance improvements when training networks for classification tasks. This is not only true under large downsampling, but also when there are no scale changes. We compare our approach with multi-scale sampling and show that we outperform it. We then demonstrate that our improvements to the sampler are compatible with other tangential improvements to Spatial Transformer Networks and that it further improves their performance.
Tasks Image Registration
Published 2019-01-22
URL https://arxiv.org/abs/1901.07124v3
PDF https://arxiv.org/pdf/1901.07124v3.pdf
PWC https://paperswithcode.com/paper/linearized-multi-sampling-for-differentiable
Repo https://github.com/vcg-uvic/linearized_multisampling_release
Framework pytorch

MutualNet: Adaptive ConvNet via Mutual Learning from Network Width and Resolution

Title MutualNet: Adaptive ConvNet via Mutual Learning from Network Width and Resolution
Authors Taojiannan Yang, Sijie Zhu, Chen Chen, Shen Yan, Mi Zhang, Andrew Willis
Abstract We propose the width-resolution mutual learning method (MutualNet) to train a network that is executable under dynamic resource constraints to achieve adaptive accuracy-efficiency trade-offs at runtime. Our method trains a cohort of sub-networks with different widths using different input resolutions to mutually learn multi-scale representations for each sub-network. It achieves consistently better ImageNet top-1 accuracy over the state-of-the-art adaptive network US-Net under different computation constraints, and outperforms the best compound scaled MobileNet in EfficientNet by 1.5%. The superiority of our method is also validated on COCO object detection and instance segmentation as well as transfer learning. Surprisingly, the training strategy of MutualNet can also boost the performance of a single network, which substantially outperforms the powerful AutoAugmentation in both efficiency (GPU search hours: 15000 vs. 0) and accuracy (ImageNet: 77.6% vs. 78.6%). Code is available at https://github.com/taoyang1122/MutualNet.
Tasks Instance Segmentation, Object Detection, Semantic Segmentation, Transfer Learning
Published 2019-09-27
URL https://arxiv.org/abs/1909.12978v3
PDF https://arxiv.org/pdf/1909.12978v3.pdf
PWC https://paperswithcode.com/paper/a-closer-look-at-network-resolution-for
Repo https://github.com/taoyang1122/MutualNet
Framework pytorch