Paper Group AWR 410
Achieving Verified Robustness to Symbol Substitutions via Interval Bound Propagation
Title | Achieving Verified Robustness to Symbol Substitutions via Interval Bound Propagation |
Authors | Po-Sen Huang, Robert Stanforth, Johannes Welbl, Chris Dyer, Dani Yogatama, Sven Gowal, Krishnamurthy Dvijotham, Pushmeet Kohli |
Abstract | Neural networks are part of many contemporary NLP systems, yet their empirical successes come at the price of vulnerability to adversarial attacks. Previous work has used adversarial training and data augmentation to partially mitigate such brittleness, but these are unlikely to find worst-case adversaries due to the complexity of the search space arising from discrete text perturbations. In this work, we approach the problem from the opposite direction: to formally verify a system’s robustness against a predefined class of adversarial attacks. We study text classification under synonym replacements or character flip perturbations. We propose modeling these input perturbations as a simplex and then using Interval Bound Propagation – a formal model verification method. We modify the conventional log-likelihood training objective to train models that can be efficiently verified, which would otherwise come with exponential search complexity. The resulting models show little difference in nominal accuracy, but have much improved verified accuracy under perturbations and come with an efficiently computable formal guarantee on worst-case adversaries. |
Tasks | Data Augmentation, Text Classification |
Published | 2019-09-03 |
URL | https://arxiv.org/abs/1909.01492v2 |
https://arxiv.org/pdf/1909.01492v2.pdf | |
PWC | https://paperswithcode.com/paper/achieving-verified-robustness-to-symbol |
Repo | https://github.com/deepmind/interval-bound-propagation |
Framework | tf |
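The core IBP step is simple enough to sketch. Below is a minimal numpy illustration of pushing an axis-aligned input box through an affine layer and a monotonic activation; it shows only the generic mechanism, not the paper's simplex construction over discrete text perturbations, and all shapes and values are made up.

```python
import numpy as np

def ibp_affine(lower, upper, W, b):
    """Propagate the box [lower, upper] through an affine layer W x + b:
    the centre maps exactly, and |W| scales the radius."""
    centre = (lower + upper) / 2.0
    radius = (upper - lower) / 2.0
    new_centre = W @ centre + b
    new_radius = np.abs(W) @ radius
    return new_centre - new_radius, new_centre + new_radius

def ibp_relu(lower, upper):
    """Monotonic activations act on the bounds directly."""
    return np.maximum(lower, 0.0), np.maximum(upper, 0.0)

# Toy 2-layer network and an eps-ball around one input.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)

x, eps = np.array([0.5, -0.2, 0.1]), 0.1
l, u = x - eps, x + eps
l, u = ibp_relu(*ibp_affine(l, u, W1, b1))
l, u = ibp_affine(l, u, W2, b2)
print("verified output bounds:", list(zip(l, u)))
```

Because the radius is rescaled by |W| at every layer, the bounds loosen with depth; the modified training objective in the paper is what keeps this looseness small enough for verification to remain meaningful.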
Direct information transfer rate optimisation for SSVEP-based BCI
Title | Direct information transfer rate optimisation for SSVEP-based BCI |
Authors | Anti Ingel, Ilya Kuzovkin, Raul Vicente |
Abstract | In this work, a classification method for SSVEP-based BCI is proposed. The classification method uses features extracted by traditional SSVEP-based BCI methods and finds optimal discrimination thresholds for each feature to classify the targets. Optimising the thresholds is formalised as maximisation of a standard BCI performance measure, the information transfer rate (ITR). However, instead of the standard method of calculating ITR, which makes certain assumptions about the data, a more general formula is derived to avoid incorrect ITR calculation when the standard assumptions are not met. This allows the optimal discrimination thresholds to be calculated automatically and thus eliminates the need for manual parameter selection or computationally expensive grid searches. The proposed method shows good performance in classifying targets of a BCI, outperforming previously reported results on the same dataset by a factor of 2 in terms of ITR. The highest ITR achieved on this dataset was 62 bits/min. The proposed method also provides a way to reduce false classifications, which is important in real-world applications. |
Tasks | |
Published | 2019-07-19 |
URL | https://arxiv.org/abs/1907.10509v1 |
https://arxiv.org/pdf/1907.10509v1.pdf | |
PWC | https://paperswithcode.com/paper/direct-information-transfer-rate-optimisation |
Repo | https://github.com/antiingel/ITR-optimisation |
Framework | none |
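For context, here is a minimal sketch of the standard ITR formula whose assumptions the paper relaxes; the numbers in the example are made up.

```python
import math

def standard_itr_bits_per_min(n_targets, accuracy, trial_seconds):
    """Standard (Wolpaw) ITR in bits/min. Assumes equiprobable targets,
    accuracy independent of the target, and errors spread uniformly over
    the wrong targets: the assumptions the paper's general formula drops."""
    n, p = n_targets, accuracy
    bits = math.log2(n)
    if p == 0.0:
        bits += math.log2(1.0 / (n - 1))
    elif p < 1.0:
        bits += p * math.log2(p) + (1 - p) * math.log2((1 - p) / (n - 1))
    return bits * 60.0 / trial_seconds

# Hypothetical numbers: 12 targets, 85% accuracy, one decision every 2.5 s.
print(round(standard_itr_bits_per_min(12, 0.85, 2.5), 1), "bits/min")  # ~58.9
```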
Consistency by Agreement in Zero-shot Neural Machine Translation
Title | Consistency by Agreement in Zero-shot Neural Machine Translation |
Authors | Maruan Al-Shedivat, Ankur P. Parikh |
Abstract | Generalization and reliability of multilingual translation often highly depend on the amount of available parallel data for each language pair of interest. In this paper, we focus on zero-shot generalization—a challenging setup that tests models on translation directions they have not been optimized for at training time. To solve the problem, we (i) reformulate multilingual translation as probabilistic inference, (ii) define the notion of zero-shot consistency and show why standard training often results in models unsuitable for zero-shot tasks, and (iii) introduce a consistent agreement-based training method that encourages the model to produce equivalent translations of parallel sentences in auxiliary languages. We test our multilingual NMT models on multiple public zero-shot translation benchmarks (IWSLT17, UN corpus, Europarl) and show that agreement-based learning often results in 2-3 BLEU zero-shot improvement over strong baselines without any loss in performance on supervised translation directions. |
Tasks | Machine Translation, Zero-Shot Machine Translation |
Published | 2019-04-04 |
URL | http://arxiv.org/abs/1904.02338v2 |
http://arxiv.org/pdf/1904.02338v2.pdf | |
PWC | https://paperswithcode.com/paper/consistency-by-agreement-in-zero-shot-neural |
Repo | https://github.com/google-research/language/blob/master/language/labs/consistent_zero_shot_nmt |
Framework | tf |
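As a rough sketch of what an agreement term can look like: the model translates a parallel pair (say, English and German sentences) into an auxiliary language (say, French) and is penalised when the two predictive distributions disagree. The symmetric-KL form, teacher-forced logits, and the 0.1 weight below are illustrative assumptions only; the paper derives its objective from a probabilistic reformulation and uses generated translations.

```python
import torch
import torch.nn.functional as F

def agreement_loss(logits_via_src, logits_via_tgt):
    """Penalise disagreement between two distributions over the same
    auxiliary-language sentence, one produced from the source sentence and
    one from its parallel target. Symmetrised KL is one simple choice."""
    log_p = F.log_softmax(logits_via_src, dim=-1)
    log_q = F.log_softmax(logits_via_tgt, dim=-1)
    kl_pq = F.kl_div(log_q, log_p.exp(), reduction="batchmean")  # KL(p || q)
    kl_qp = F.kl_div(log_p, log_q.exp(), reduction="batchmean")  # KL(q || p)
    return 0.5 * (kl_pq + kl_qp)

# Toy tensors shaped (batch, sequence length, vocabulary size).
logits_en_to_fr = torch.randn(8, 20, 100, requires_grad=True)
logits_de_to_fr = torch.randn(8, 20, 100, requires_grad=True)

supervised_nll = torch.tensor(0.0)  # stand-in for the usual translation loss
total = supervised_nll + 0.1 * agreement_loss(logits_en_to_fr, logits_de_to_fr)
total.backward()
print(float(total))
```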
Accelerating Extreme Classification via Adaptive Feature Agglomeration
Title | Accelerating Extreme Classification via Adaptive Feature Agglomeration |
Authors | Ankit Jalan, Purushottam Kar |
Abstract | Extreme classification seeks to assign each data point the most relevant labels from a universe of a million or more labels. This task is faced with the dual challenge of high precision and scalability, with millisecond-level prediction times being the benchmark. We propose DEFRAG, an adaptive feature agglomeration technique to accelerate extreme classification algorithms. Unlike past work on feature clustering and selection, DEFRAG scales to millions of features, and is especially beneficial when feature sets are sparse, which is typical of recommendation and multi-label datasets. The method comes with provable performance guarantees and performs efficient task-driven agglomeration to reduce feature dimensionalities by an order of magnitude or more. Experiments show that DEFRAG can not only reduce training and prediction times of several leading extreme classification algorithms by as much as 40%, but also be used for feature reconstruction to address the problem of missing features, as well as offer superior coverage of rare labels. |
Tasks | |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11769v1 |
https://arxiv.org/pdf/1905.11769v1.pdf | |
PWC | https://paperswithcode.com/paper/accelerating-extreme-classification-via |
Repo | https://github.com/purushottamkar/defrag |
Framework | none |
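As a point of reference for the idea, the sketch below uses scikit-learn's off-the-shelf FeatureAgglomeration to merge a sparse, high-dimensional feature space down by an order of magnitude. This is a generic unsupervised stand-in, not DEFRAG's task-driven algorithm, and carries none of its guarantees.

```python
from scipy.sparse import random as sparse_random
from sklearn.cluster import FeatureAgglomeration

# Sparse, high-dimensional features typical of extreme classification.
X = sparse_random(1000, 5000, density=0.001, format="csr", random_state=0)

# Merge correlated features into 256 groups (mean-pooled by default).
agg = FeatureAgglomeration(n_clusters=256)
X_small = agg.fit_transform(X.toarray())
print(X.shape, "->", X_small.shape)   # (1000, 5000) -> (1000, 256)
```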
Efficient Algorithms for Set-Valued Prediction in Multi-Class Classification
Title | Efficient Algorithms for Set-Valued Prediction in Multi-Class Classification |
Authors | Thomas Mortier, Marek Wydmuch, Eyke Hüllermeier, Krzysztof Dembczyński, Willem Waegeman |
Abstract | In cases of uncertainty, a multi-class classifier preferably returns a set of candidate classes instead of predicting a single class label with little guarantee. More precisely, the classifier should strive for an optimal balance between the correctness (the true class is among the candidates) and the precision (the candidates are not too many) of its prediction. We formalize this problem within a general decision-theoretic framework that unifies most of the existing work in this area. In this framework, uncertainty is quantified in terms of conditional class probabilities, and the quality of a predicted set is measured in terms of a utility function. We then address the problem of finding the Bayes-optimal prediction, i.e., the subset of class labels with highest expected utility. For this problem, which is computationally challenging as there are exponentially (in the number of classes) many predictions to choose from, we propose efficient algorithms that can be applied to a broad family of utility scores. Two of these algorithms make use of structural information in the form of a class hierarchy, which is often available in prediction problems with many classes. Our theoretical results are complemented by experimental studies, in which we analyze the proposed algorithms in terms of predictive accuracy and runtime efficiency. |
Tasks | |
Published | 2019-06-19 |
URL | https://arxiv.org/abs/1906.08129v1 |
https://arxiv.org/pdf/1906.08129v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-algorithms-for-set-valued |
Repo | https://github.com/tfmortie/setvaluedprediction |
Framework | none |
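The key computational point is that for many utility scores the Bayes-optimal set is always a top-k set, so only the set size needs searching rather than all exponentially many subsets. A sketch for utilities that pay g(|Y|) when the true class is covered, with g chosen purely for illustration (the paper's family of utility scores and its hierarchy-based algorithms are broader):

```python
import numpy as np

def bayes_optimal_set(probs, g):
    """Return the set of classes with highest expected utility, assuming the
    utility of predicting Y is g(|Y|) if the true class is in Y, else 0.
    Then expected utility is g(k) times the sum of the k largest class
    probabilities, so the optimum is a top-k set and a linear scan suffices."""
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    k_best = int(np.argmax([g(k + 1) * cum[k] for k in range(len(probs))]))
    return order[: k_best + 1]

p = np.array([0.45, 0.30, 0.15, 0.06, 0.04])
# Precision-like utility: credit 1/|Y| when the true class is covered.
print(bayes_optimal_set(p, g=lambda k: 1.0 / k))             # [0], a singleton
# A flatter penalty on set size keeps more candidates.
print(bayes_optimal_set(p, g=lambda k: 1.0 / np.sqrt(k)))    # [0, 1]
```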
Deep Residual Neural Networks for Audio Spoofing Detection
Title | Deep Residual Neural Networks for Audio Spoofing Detection |
Authors | Moustafa Alzantot, Ziqi Wang, Mani B. Srivastava |
Abstract | The state-of-the-art models for speech synthesis and voice conversion are capable of generating synthetic speech that is perceptually indistinguishable from bona fide human speech. These methods represent a threat to automatic speaker verification (ASV) systems. Additionally, replay attacks, where the attacker uses a loudspeaker to replay previously recorded genuine human speech, are also possible. We present our solution for the ASVSpoof2019 competition, which aims to develop countermeasure systems that distinguish between spoofing attacks and genuine speech. Our model is inspired by the success of residual convolutional networks in many classification tasks. We build three variants of a residual convolutional neural network that accept different feature representations (MFCC, log-magnitude STFT, and CQCC) of the input. We compare the performance achieved by our model variants and the competition baseline models. In the logical access scenario, the fusion of our models has zero t-DCF cost and zero equal error rate (EER), as evaluated on the development set. On the evaluation set, our model fusion improves the t-DCF and EER by 25% compared to the baseline algorithms. Against physical access replay attacks, our model fusion improves the baseline algorithms’ t-DCF and EER scores by 71% and 75% on the evaluation set, respectively. |
Tasks | Speaker Verification, Speech Synthesis, Voice Conversion |
Published | 2019-06-30 |
URL | https://arxiv.org/abs/1907.00501v1 |
https://arxiv.org/pdf/1907.00501v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-residual-neural-networks-for-audio |
Repo | https://github.com/nesl/asvspoof2019 |
Framework | none |
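The backbone idea is conventional enough to sketch: residual blocks over a 2D time-frequency input, followed by a two-way spoof/bona-fide head. A minimal PyTorch sketch with assumed layer sizes, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic 2D residual block over time-frequency features (e.g. MFCC/CQCC)."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.body(x) + x)   # skip connection

# Hypothetical spoof/bona-fide classifier over a single-channel feature map.
net = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1),
    ResidualBlock(32),
    ResidualBlock(32),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 2),                       # spoof vs. bona fide
)
x = torch.randn(4, 1, 40, 200)              # batch of 40-dim MFCC-like frames
print(net(x).shape)                         # torch.Size([4, 2])
```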
DRUM: End-To-End Differentiable Rule Mining On Knowledge Graphs
Title | DRUM: End-To-End Differentiable Rule Mining On Knowledge Graphs |
Authors | Ali Sadeghian, Mohammadreza Armandpour, Patrick Ding, Daisy Zhe Wang |
Abstract | In this paper, we study the problem of learning probabilistic logical rules for inductive and interpretable link prediction. Despite the importance of inductive link prediction, most previous works focused on transductive link prediction and cannot manage previously unseen entities. Moreover, they are black-box models that are not easily explainable for humans. We propose DRUM, a scalable and differentiable approach for mining first-order logical rules from knowledge graphs which resolves these problems. We motivate our method by making a connection between learning confidence scores for each rule and low-rank tensor approximation. DRUM uses bidirectional RNNs to share useful information across the tasks of learning rules for different relations. We also empirically demonstrate the efficiency of DRUM over existing rule mining methods for inductive link prediction on a variety of benchmark datasets. |
Tasks | Knowledge Graphs, Link Prediction |
Published | 2019-10-31 |
URL | https://arxiv.org/abs/1911.00055v1 |
https://arxiv.org/pdf/1911.00055v1.pdf | |
PWC | https://paperswithcode.com/paper/drum-end-to-end-differentiable-rule-mining-on |
Repo | https://github.com/irokin/Experiments-Results-for-Link-Prediction |
Framework | none |
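The differentiable core that DRUM builds on can be sketched with plain adjacency-matrix algebra: applying a soft rule is a chain of attention-weighted relation matrices. The toy graph and fixed attention weights below are illustrative only; DRUM learns the attentions with bidirectional RNNs under a low-rank parameterisation.

```python
import numpy as np

# Toy KG with 3 entities and 2 relations, as adjacency matrices.
M = {
    "parent_of": np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=float),
    "spouse_of": np.zeros((3, 3)),
}

def rule_score(head_onehot, attention_per_step):
    """Apply a soft multi-hop rule: each step mixes all relation matrices
    with attention weights, and chaining steps scores every entity as a
    possible rule tail."""
    v = head_onehot
    for attn in attention_per_step:
        mixed = sum(a * M[r] for r, a in attn.items())
        v = v @ mixed
    return v

# Two-step rule ~ grandparent_of(X, Y) <- parent_of(X, Z), parent_of(Z, Y)
head = np.array([1.0, 0.0, 0.0])
steps = [{"parent_of": 0.9, "spouse_of": 0.1},
         {"parent_of": 0.9, "spouse_of": 0.1}]
print(rule_score(head, steps))   # highest mass on entity 2, the grandchild
```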
Multi-Modal Adversarial Autoencoders for Recommendations of Citations and Subject Labels
Title | Multi-Modal Adversarial Autoencoders for Recommendations of Citations and Subject Labels |
Authors | Lukas Galke, Florian Mai, Iacopo Vagliano, Ansgar Scherp |
Abstract | We present multi-modal adversarial autoencoders for recommendation and evaluate them on two different tasks: citation recommendation and subject label recommendation. We analyze the effects of adversarial regularization, sparsity, and different input modalities. By conducting 408 experiments, we show that adversarial regularization consistently improves the performance of autoencoders for recommendation. We demonstrate, however, that the two tasks differ in the semantics of item co-occurrence, in the sense that item co-occurrence resembles relatedness in the case of citations, yet implies diversity in the case of subject labels. Our results reveal that supplying the partial item set as input is only helpful when item co-occurrence resembles relatedness. When facing a new recommendation task, it is therefore crucial to consider the semantics of item co-occurrence for the choice of an appropriate model. |
Tasks | |
Published | 2019-07-22 |
URL | https://arxiv.org/abs/1907.12366v1 |
https://arxiv.org/pdf/1907.12366v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-modal-adversarial-autoencoders-for |
Repo | https://github.com/lgalke/aae-recommender |
Framework | pytorch |
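The adversarial regularization being ablated is the standard adversarial-autoencoder mechanism; a minimal PyTorch sketch follows. The dimensions and the Gaussian prior are assumptions, and the paper's actual models are multi-modal with more structure.

```python
import torch
import torch.nn as nn

# A discriminator learns to tell prior samples from encoder outputs, and the
# encoder learns to fool it, pushing latent codes toward the prior.
latent_dim = 16
encoder = nn.Sequential(nn.Linear(100, 64), nn.ReLU(), nn.Linear(64, latent_dim))
disc = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, 1))
bce = nn.BCEWithLogitsLoss()

x = torch.rand(32, 100)                     # e.g. sparse citation vectors
z_fake = encoder(x)
z_real = torch.randn(32, latent_dim)        # samples from the N(0, I) prior

# Discriminator step: prior samples -> 1, encoded samples -> 0.
d_loss = bce(disc(z_real), torch.ones(32, 1)) + \
         bce(disc(z_fake.detach()), torch.zeros(32, 1))

# Encoder (generator) step: make encoded samples look like prior samples.
g_loss = bce(disc(z_fake), torch.ones(32, 1))
print(float(d_loss), float(g_loss))
```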
Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party ASR
Title | Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party ASR |
Authors | Naoyuki Kanda, Christoph Boeddeker, Jens Heitkaemper, Yusuke Fujita, Shota Horiguchi, Kenji Nagamatsu, Reinhold Haeb-Umbach |
Abstract | In this paper, we present Hitachi and Paderborn University’s joint effort for automatic speech recognition (ASR) in a dinner party scenario. The main challenges of ASR systems for dinner party recordings obtained by multiple microphone arrays are (1) heavy speech overlaps, (2) severe noise and reverberation, (3) very natural conversational content, and possibly (4) insufficient training data. As an example of a dinner party scenario, we have chosen the data presented during the CHiME-5 speech recognition challenge, where the baseline ASR had a 73.3% word error rate (WER), and even the best performing system at the CHiME-5 challenge had a 46.1% WER. We extensively investigated a combination of the guided source separation-based speech enhancement technique and an already proposed strong ASR backend and found that a tight combination of these techniques provided substantial accuracy improvements. Our final system achieved WERs of 39.94% and 41.64% for the development and evaluation data, respectively, both of which are the best published results for the dataset. We also investigated training with additional data beyond the official small CHiME-5 corpus to assess the intrinsic difficulty of this ASR task. |
Tasks | Speech Enhancement, Speech Recognition |
Published | 2019-05-29 |
URL | https://arxiv.org/abs/1905.12230v2 |
https://arxiv.org/pdf/1905.12230v2.pdf | |
PWC | https://paperswithcode.com/paper/guided-source-separation-meets-a-strong-asr |
Repo | https://github.com/fgnt/pb_chime5 |
Framework | none |
Human Mesh Recovery from Monocular Images via a Skeleton-disentangled Representation
Title | Human Mesh Recovery from Monocular Images via a Skeleton-disentangled Representation |
Authors | Sun Yu, Ye Yun, Liu Wu, Gao Wenpeng, Fu YiLi, Mei Tao |
Abstract | We describe an end-to-end method for recovering a 3D human body mesh from single images and monocular videos. Unlike existing methods, which try to obtain the complex 3D pose, shape, and camera parameters from one coupled feature, we propose a skeleton-disentangling framework that divides this task into multiple levels of spatial and temporal granularity in a decoupled manner. Spatially, we propose an effective and pluggable “disentangling the skeleton from the details” (DSD) module. It reduces complexity and decouples the skeleton, which lays a good foundation for temporal modeling. Temporally, a self-attention-based temporal convolution network is proposed to efficiently exploit short- and long-term temporal cues. Furthermore, an unsupervised adversarial training strategy, temporal shuffle and order recovery, is designed to promote the learning of motion dynamics. The proposed method outperforms the state-of-the-art 3D human mesh recovery methods by 15.4% in MPJPE and 23.8% in PA-MPJPE on Human3.6M. State-of-the-art results are also achieved on the 3D Poses in the Wild (3DPW) dataset without any fine-tuning. In particular, ablation studies demonstrate that the skeleton-disentangled representation is crucial for better temporal modeling and generalization. |
Tasks | |
Published | 2019-08-20 |
URL | https://arxiv.org/abs/1908.07172v2 |
https://arxiv.org/pdf/1908.07172v2.pdf | |
PWC | https://paperswithcode.com/paper/human-mesh-recovery-from-monocular-images-via |
Repo | https://github.com/Arthur151/DSD-SATN |
Framework | pytorch |
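The "temporal shuffle and order recovery" strategy can be sketched as an ordered-vs-shuffled discriminator over frame features. This is a minimal guess at the mechanism, not the paper's exact adversarial setup:

```python
import torch
import torch.nn as nn

# A discriminator tries to tell correctly ordered feature sequences from
# shuffled ones, pressuring the per-frame features to encode motion dynamics.
feat_dim, seq_len = 64, 8
disc = nn.Sequential(nn.Flatten(), nn.Linear(feat_dim * seq_len, 128),
                     nn.ReLU(), nn.Linear(128, 1))
bce = nn.BCEWithLogitsLoss()

frames = torch.randn(16, seq_len, feat_dim)          # per-frame features
perm = torch.randperm(seq_len)
shuffled = frames[:, perm, :]                        # destroy temporal order

d_loss = bce(disc(frames), torch.ones(16, 1)) + \
         bce(disc(shuffled), torch.zeros(16, 1))
print(float(d_loss))
```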
RAMBO: Repeated And Merged Bloom Filter for Multiple Set Membership Testing (MSMT) in Sub-linear time
Title | RAMBO: Repeated And Merged Bloom Filter for Multiple Set Membership Testing (MSMT) in Sub-linear time |
Authors | Gaurav Gupta, Benjamin Coleman, Tharun Medini, Vijai Mohan, Anshumali Shrivastava |
Abstract | Approximate set membership is a common problem with wide applications in databases, networking, and search. Given a set S and a query q, the task is to determine whether q ∈ S. The Bloom Filter (BF) is a popular data structure for approximate membership testing due to its simplicity. In particular, a BF consists of a bit array that can be incrementally updated. A related problem, and the focus of this paper, is the Multiple Set Membership Testing (MSMT) problem. Here we are given K different sets, and for any given query q the goal is to find all of the sets containing the query element. Trivially, a multiple set membership instance can be reduced to K membership testing instances, each with the same q, leading to O(K) query time; a simple array of Bloom Filters achieves exactly that. In this paper, we present RAMBO (Repeated And Merged Bloom Filter), the first non-trivial data structure for streaming keys, which achieves expected O(√K log K) query time with an additional worst-case memory cost factor of O(log K) over the array of Bloom Filters. The proposed data structure is simply a count-min-sketch arrangement of Bloom Filters and retains all of the count-min sketch’s favorable properties; we replace the addition operation with set union during insertion and the minimum operation with set intersection during estimation. |
Tasks | |
Published | 2019-10-07 |
URL | https://arxiv.org/abs/1910.02611v1 |
https://arxiv.org/pdf/1910.02611v1.pdf | |
PWC | https://paperswithcode.com/paper/rambo-repeated-and-merged-bloom-filter-for |
Repo | https://github.com/RUSH-LAB/RAMBO |
Framework | none |
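The arrangement is concrete enough to sketch end to end: each repetition hashes every set id into one bucket, each bucket's Bloom filter stores the union of its sets' keys, and a query intersects the candidate sets across repetitions, so only B·R Bloom checks are needed rather than K. All sizes below are toy values and the hashing choices are illustrative.

```python
import hashlib

class BloomFilter:
    def __init__(self, m=1024, k=3):
        self.bits, self.m, self.k = bytearray(m), m, k

    def _indices(self, key):
        for i in range(self.k):
            digest = hashlib.sha1(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, key):
        for i in self._indices(key):
            self.bits[i] = 1

    def __contains__(self, key):
        return all(self.bits[i] for i in self._indices(key))

class Rambo:
    """Count-min-style grid of Bloom filters over K sets (toy sizes)."""
    def __init__(self, n_sets, n_buckets=4, n_reps=3):
        self.n_sets, self.B, self.R = n_sets, n_buckets, n_reps
        self.grid = [[BloomFilter() for _ in range(n_buckets)]
                     for _ in range(n_reps)]
        # Each repetition assigns every set id to one bucket.
        self.bucket_of = [[hash((r, s)) % n_buckets for s in range(n_sets)]
                          for r in range(n_reps)]

    def add(self, set_id, key):
        for r in range(self.R):
            self.grid[r][self.bucket_of[r][set_id]].add(key)

    def query(self, key):
        candidates = set(range(self.n_sets))
        for r in range(self.R):
            hits = {b for b in range(self.B) if key in self.grid[r][b]}
            candidates &= {s for s in range(self.n_sets)
                           if self.bucket_of[r][s] in hits}
        return candidates

rambo = Rambo(n_sets=10)
rambo.add(3, "kmer_ACGT")
rambo.add(7, "kmer_TTTA")
print(rambo.query("kmer_ACGT"))   # {3}, up to Bloom-filter false positives
```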
Dreem Open Datasets: Multi-Scored Sleep Datasets to compare Human and Automated sleep staging
Title | Dreem Open Datasets: Multi-Scored Sleep Datasets to compare Human and Automated sleep staging |
Authors | Antoine Guillot, Fabien Sauvet, Emmanuel H During, Valentin Thorey |
Abstract | Sleep stage classification constitutes an important element of sleep disorder diagnosis. It relies on the visual inspection of polysomnography records by trained sleep technologists. Automated approaches have been designed to alleviate this resource-intensive task. However, such approaches are usually compared to a single human scorer’s annotation despite an inter-rater agreement of only about 85%. The present study introduces two publicly available datasets, DOD-H, including 25 healthy volunteers, and DOD-O, including 55 patients suffering from obstructive sleep apnea (OSA). Both datasets have been scored by 5 sleep technologists from different sleep centers. We developed a framework to compare automated approaches to a consensus of multiple human scorers. Using this framework, we benchmarked and compared the main approaches from the literature. We also developed and benchmarked a new deep learning method, SimpleSleepNet, inspired by the current state of the art. We demonstrate that many methods can reach human-level performance on both datasets. SimpleSleepNet achieved an F1 of 89.9% vs. 86.8% on average for human scorers on DOD-H, and an F1 of 88.3% vs. 84.8% on DOD-O. Our study highlights that state-of-the-art automated sleep staging outperforms human scorers for both healthy volunteers and patients suffering from OSA. Consideration could therefore be given to using automated approaches in the clinical setting. |
Tasks | |
Published | 2019-10-31 |
URL | https://arxiv.org/abs/1911.03221v3 |
https://arxiv.org/pdf/1911.03221v3.pdf | |
PWC | https://paperswithcode.com/paper/dreem-open-datasets-multi-scored-sleep |
Repo | https://github.com/Dreem-Organization/dreem-learning-evaluation |
Framework | none |
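A minimal sketch of scoring against a multi-scorer consensus, with random stand-in hypnograms; the paper's evaluation framework is more careful than a plain majority vote, so treat this as the gist only.

```python
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
n_epochs, n_stages = 1000, 5                 # 30 s epochs; W/N1/N2/N3/REM
scorers = rng.integers(0, n_stages, size=(5, n_epochs))   # 5 human hypnograms
automated = rng.integers(0, n_stages, size=n_epochs)      # model hypnogram

# Majority-vote consensus across the 5 scorers, one epoch at a time.
consensus = np.array([np.bincount(col, minlength=n_stages).argmax()
                      for col in scorers.T])
print("macro F1 vs. consensus:", f1_score(consensus, automated, average="macro"))
```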
TedEval: A Fair Evaluation Metric for Scene Text Detectors
Title | TedEval: A Fair Evaluation Metric for Scene Text Detectors |
Authors | Chae Young Lee, Youngmin Baek, Hwalsuk Lee |
Abstract | Despite the recent success of scene text detection methods, common evaluation metrics fail to provide a fair and reliable comparison among detectors. They have obvious drawbacks in reflecting the inherent characteristics of text detection tasks, and are unable to address issues such as granularity, multi-line text, and character incompleteness. In this paper, we propose a novel evaluation protocol called TedEval (Text detector Evaluation), which evaluates text detections through instance-level matching and character-level scoring. Based on a firm standard rewarding behaviors that result in successful recognition, TedEval can act as a reliable standard for comparing and quantifying detection quality across all difficulty levels. In this regard, we believe that TedEval can play a key role in developing state-of-the-art scene text detectors. The code is publicly available at https://github.com/clovaai/TedEval. |
Tasks | Scene Text Detection |
Published | 2019-07-02 |
URL | https://arxiv.org/abs/1907.01227v1 |
https://arxiv.org/pdf/1907.01227v1.pdf | |
PWC | https://paperswithcode.com/paper/tedeval-a-fair-evaluation-metric-for-scene |
Repo | https://github.com/clovaai/TedEval |
Framework | none |
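A heavily simplified flavour of the character-level scoring, using axis-aligned boxes and pseudo-character centres; the real protocol (see the repo) adds instance-level matching and handles polygons, splits, and merges.

```python
import numpy as np

def pseudo_char_centers(box, n_chars):
    """Split an axis-aligned GT word box into n_chars pseudo-character centres."""
    x0, y0, x1, y1 = box
    xs = x0 + (np.arange(n_chars) + 0.5) * (x1 - x0) / n_chars
    return [(x, (y0 + y1) / 2) for x in xs]

def char_recall(gt_boxes, gt_lengths, det_boxes):
    """Crude TedEval-flavoured recall: fraction of pseudo-character centres
    covered by any detection."""
    covered = total = 0
    for box, n in zip(gt_boxes, gt_lengths):
        for (cx, cy) in pseudo_char_centers(box, n):
            total += 1
            covered += any(x0 <= cx <= x1 and y0 <= cy <= y1
                           for (x0, y0, x1, y1) in det_boxes)
    return covered / total

gt = [(0, 0, 100, 20)]            # one word box with 5 characters
det = [(0, 0, 58, 20)]            # detection covering ~60% of the word
print(char_recall(gt, [5], det))  # 0.6: 3 of 5 pseudo-characters covered
```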
Linearized Multi-Sampling for Differentiable Image Transformation
Title | Linearized Multi-Sampling for Differentiable Image Transformation |
Authors | Wei Jiang, Weiwei Sun, Andrea Tagliasacchi, Eduard Trulls, Kwang Moo Yi |
Abstract | We propose a novel image sampling method for differentiable image transformation in deep neural networks. The sampling schemes currently used in deep learning, such as Spatial Transformer Networks, rely on bilinear interpolation, which performs poorly under severe scale changes, and more importantly, results in poor gradient propagation. This is due to their strict reliance on direct neighbors. Instead, we propose to generate random auxiliary samples in the vicinity of each pixel in the sampled image, and create a linear approximation with their intensity values. We then use this approximation as a differentiable formula for the transformed image. We demonstrate that our approach produces more representative gradients with a wider basin of convergence for image alignment, which leads to considerable performance improvements when training networks for classification tasks. This is not only true under large downsampling, but also when there are no scale changes. We compare our approach with multi-scale sampling and show that we outperform it. We then demonstrate that our improvements to the sampler are compatible with other tangential improvements to Spatial Transformer Networks and that it further improves their performance. |
Tasks | Image Registration |
Published | 2019-01-22 |
URL | https://arxiv.org/abs/1901.07124v3 |
https://arxiv.org/pdf/1901.07124v3.pdf | |
PWC | https://paperswithcode.com/paper/linearized-multi-sampling-for-differentiable |
Repo | https://github.com/vcg-uvic/linearized_multisampling_release |
Framework | pytorch |
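The sampling idea itself is compact: read auxiliary points near the query location, fit a plane to their intensities, and evaluate the plane at the query point, so gradients come from the fit rather than from four direct neighbours. A numpy sketch for one pixel (sample count and noise scale are arbitrary choices):

```python
import numpy as np

def bilinear(img, x, y):
    """Plain bilinear lookup at a continuous location (x, y)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - x0, y - y0
    return ((1 - dx) * (1 - dy) * img[y0, x0] + dx * (1 - dy) * img[y0, x0 + 1]
            + (1 - dx) * dy * img[y0 + 1, x0] + dx * dy * img[y0 + 1, x0 + 1])

def linearized_sample(img, x, y, n_aux=8, sigma=1.0, seed=0):
    """Fit a plane v ~ a*x + b*y + c to auxiliary samples around (x, y) and
    evaluate it at (x, y); gradients w.r.t. the sampling location would then
    flow through (a, b) instead of only the four direct neighbours."""
    rng = np.random.default_rng(seed)
    offsets = sigma * rng.standard_normal((n_aux, 2))
    xs = np.clip(x + offsets[:, 0], 0, img.shape[1] - 2)
    ys = np.clip(y + offsets[:, 1], 0, img.shape[0] - 2)
    vals = np.array([bilinear(img, xi, yi) for xi, yi in zip(xs, ys)])
    A = np.column_stack([xs, ys, np.ones(n_aux)])
    coef, *_ = np.linalg.lstsq(A, vals, rcond=None)
    return coef @ np.array([x, y, 1.0])

img = np.arange(100, dtype=float).reshape(10, 10)  # ramp image: value = 10*y + x
print(bilinear(img, 4.3, 5.7), linearized_sample(img, 4.3, 5.7))  # both ~61.3
```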
MutualNet: Adaptive ConvNet via Mutual Learning from Network Width and Resolution
Title | MutualNet: Adaptive ConvNet via Mutual Learning from Network Width and Resolution |
Authors | Taojiannan Yang, Sijie Zhu, Chen Chen, Shen Yan, Mi Zhang, Andrew Willis |
Abstract | We propose the width-resolution mutual learning method (MutualNet) to train a network that is executable at dynamic resource constraints to achieve adaptive accuracy-efficiency trade-offs at runtime. Our method trains a cohort of sub-networks with different widths using different input resolutions to mutually learn multi-scale representations for each sub-network. It achieves consistently better ImageNet top-1 accuracy over the state-of-the-art adaptive network US-Net under different computation constraints, and outperforms the best compound scaled MobileNet in EfficientNet by 1.5%. The superiority of our method is also validated on COCO object detection and instance segmentation as well as transfer learning. Surprisingly, the training strategy of MutualNet can also boost the performance of a single network, which substantially outperforms the powerful AutoAugmentation in both efficiency (GPU search hours: 15000 vs. 0) and accuracy (ImageNet: 77.6% vs. 78.6%). Code is available at \url{https://github.com/taoyang1122/MutualNet}. |
Tasks | Instance Segmentation, Object Detection, Semantic Segmentation, Transfer Learning |
Published | 2019-09-27 |
URL | https://arxiv.org/abs/1909.12978v3 |
https://arxiv.org/pdf/1909.12978v3.pdf | |
PWC | https://paperswithcode.com/paper/a-closer-look-at-network-resolution-for |
Repo | https://github.com/taoyang1122/MutualNet |
Framework | pytorch |
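One mutual-learning step can be sketched as: run the full-width network at full resolution against the labels, then supervise narrower widths fed lower resolutions with the full network's detached predictions. The tiny "slimmable" model, the fixed width/resolution pairs, and the plain KL distillation below are stand-ins for the paper's slimmable CNNs and exact training recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySlimmable(nn.Module):
    """One-conv toy: width w uses the first w fraction of the channels."""
    def __init__(self, max_channels=64, n_classes=10):
        super().__init__()
        self.conv = nn.Conv2d(3, max_channels, 3, padding=1)
        self.head = nn.Linear(max_channels, n_classes)

    def forward(self, x, width=1.0):
        c = max(1, int(self.conv.out_channels * width))
        z = F.relu(F.conv2d(x, self.conv.weight[:c], self.conv.bias[:c],
                            padding=1))
        z = F.adaptive_avg_pool2d(z, 1).flatten(1)
        return F.linear(z, self.head.weight[:, :c], self.head.bias)

net = TinySlimmable()
x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))

full = net(x, width=1.0)                    # full width at full resolution
loss = F.cross_entropy(full, y)             # ground-truth supervision
soft = F.softmax(full.detach(), dim=-1)     # teacher for the sub-networks
for width, res in [(0.5, 24), (0.25, 16)]:  # sampled width/resolution pairs
    x_small = F.interpolate(x, size=res, mode="bilinear", align_corners=False)
    logits = net(x_small, width=width)
    loss = loss + F.kl_div(F.log_softmax(logits, dim=-1), soft,
                           reduction="batchmean")
loss.backward()
print(float(loss))
```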