Paper Group AWR 410
Achieving Verified Robustness to Symbol Substitutions via Interval Bound Propagation
Title | Achieving Verified Robustness to Symbol Substitutions via Interval Bound Propagation |
Authors | Po-Sen Huang, Robert Stanforth, Johannes Welbl, Chris Dyer, Dani Yogatama, Sven Gowal, Krishnamurthy Dvijotham, Pushmeet Kohli |
Abstract | Neural networks are part of many contemporary NLP systems, yet their empirical successes come at the price of vulnerability to adversarial attacks. Previous work has used adversarial training and data augmentation to partially mitigate such brittleness, but these are unlikely to find worst-case adversaries due to the complexity of the search space arising from discrete text perturbations. In this work, we approach the problem from the opposite direction: to formally verify a system’s robustness against a predefined class of adversarial attacks. We study text classification under synonym replacements or character flip perturbations. We propose modeling these input perturbations as a simplex and then using Interval Bound Propagation – a formal model verification method. We modify the conventional log-likelihood training objective to train models that can be efficiently verified, which would otherwise come with exponential search complexity. The resulting models show little difference in nominal accuracy, but have much improved verified accuracy under perturbations and come with an efficiently computable formal guarantee on worst-case adversaries. |
Tasks | Data Augmentation, Text Classification |
Published | 2019-09-03 |
URL | https://arxiv.org/abs/1909.01492v2 |
https://arxiv.org/pdf/1909.01492v2.pdf | |
PWC | https://paperswithcode.com/paper/achieving-verified-robustness-to-symbol |
Repo | https://github.com/deepmind/interval-bound-propagation |
Framework | tf |
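The core IBP step is simple enough to sketch. Below is a minimal numpy illustration of pushing an axis-aligned input box through an affine layer and a monotonic activation; it shows only the generic mechanism, not the paper's simplex construction over discrete text perturbations, and all shapes and values are made up.

```python
import numpy as np

def ibp_affine(lower, upper, W, b):
    """Propagate the box [lower, upper] through an affine layer W x + b:
    the centre maps exactly, and |W| scales the radius."""
    centre = (lower + upper) / 2.0
    radius = (upper - lower) / 2.0
    new_centre = W @ centre + b
    new_radius = np.abs(W) @ radius
    return new_centre - new_radius, new_centre + new_radius

def ibp_relu(lower, upper):
    """Monotonic activations act on the bounds directly."""
    return np.maximum(lower, 0.0), np.maximum(upper, 0.0)

# Toy 2-layer network and an eps-ball around one input.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)

x, eps = np.array([0.5, -0.2, 0.1]), 0.1
l, u = x - eps, x + eps
l, u = ibp_relu(*ibp_affine(l, u, W1, b1))
l, u = ibp_affine(l, u, W2, b2)
print("verified output bounds:", list(zip(l, u)))
```

Because the radius is rescaled by |W| at every layer, the bounds loosen with depth; the modified training objective in the paper is what keeps this looseness small enough for verification to remain meaningful.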
Direct information transfer rate optimisation for SSVEP-based BCI
Title | Direct information transfer rate optimisation for SSVEP-based BCI |
Authors | Anti Ingel, Ilya Kuzovkin, Raul Vicente |
Abstract | In this work, a classification method for SSVEP-based BCI is proposed. The classification method uses features extracted by traditional SSVEP-based BCI methods and finds optimal discrimination thresholds for each feature to classify the targets. Optimising the thresholds is formalised as maximisation of a standard BCI performance measure, the information transfer rate (ITR). However, instead of the standard method of calculating ITR, which makes certain assumptions about the data, a more general formula is derived to avoid incorrect ITR calculation when the standard assumptions are not met. This allows the optimal discrimination thresholds to be calculated automatically and thus eliminates the need for manual parameter selection or computationally expensive grid searches. The proposed method shows good performance in classifying targets of a BCI, outperforming previously reported results on the same dataset by a factor of 2 in terms of ITR. The highest ITR achieved on this dataset was 62 bits/min. The proposed method also provides a way to reduce false classifications, which is important in real-world applications. |
Tasks | |
Published | 2019-07-19 |
URL | https://arxiv.org/abs/1907.10509v1 |
https://arxiv.org/pdf/1907.10509v1.pdf | |
PWC | https://paperswithcode.com/paper/direct-information-transfer-rate-optimisation |
Repo | https://github.com/antiingel/ITR-optimisation |
Framework | none |
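For context, here is a minimal sketch of the standard ITR formula whose assumptions the paper relaxes; the numbers in the example are made up.

```python
import math

def standard_itr_bits_per_min(n_targets, accuracy, trial_seconds):
    """Standard (Wolpaw) ITR in bits/min. Assumes equiprobable targets,
    accuracy independent of the target, and errors spread uniformly over
    the wrong targets: the assumptions the paper's general formula drops."""
    n, p = n_targets, accuracy
    bits = math.log2(n)
    if p == 0.0:
        bits += math.log2(1.0 / (n - 1))
    elif p < 1.0:
        bits += p * math.log2(p) + (1 - p) * math.log2((1 - p) / (n - 1))
    return bits * 60.0 / trial_seconds

# Hypothetical numbers: 12 targets, 85% accuracy, one decision every 2.5 s.
print(round(standard_itr_bits_per_min(12, 0.85, 2.5), 1), "bits/min")  # ~58.9
```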
Consistency by Agreement in Zero-shot Neural Machine Translation
Title | Consistency by Agreement in Zero-shot Neural Machine Translation |
Authors | Maruan Al-Shedivat, Ankur P. Parikh |
Abstract | Generalization and reliability of multilingual translation often highly depend on the amount of available parallel data for each language pair of interest. In this paper, we focus on zero-shot generalization—a challenging setup that tests models on translation directions they have not been optimized for at training time. To solve the problem, we (i) reformulate multilingual translation as probabilistic inference, (ii) define the notion of zero-shot consistency and show why standard training often results in models unsuitable for zero-shot tasks, and (iii) introduce a consistent agreement-based training method that encourages the model to produce equivalent translations of parallel sentences in auxiliary languages. We test our multilingual NMT models on multiple public zero-shot translation benchmarks (IWSLT17, UN corpus, Europarl) and show that agreement-based learning often results in 2-3 BLEU zero-shot improvement over strong baselines without any loss in performance on supervised translation directions. |
Tasks | Machine Translation, Zero-Shot Machine Translation |
Published | 2019-04-04 |
URL | http://arxiv.org/abs/1904.02338v2 |
http://arxiv.org/pdf/1904.02338v2.pdf | |
PWC | https://paperswithcode.com/paper/consistency-by-agreement-in-zero-shot-neural |
Repo | https://github.com/google-research/language/blob/master/language/labs/consistent_zero_shot_nmt |
Framework | tf |
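As a rough sketch of what an agreement term can look like: the model translates a parallel pair (say, English and German sentences) into an auxiliary language (say, French) and is penalised when the two predictive distributions disagree. The symmetric-KL form, teacher-forced logits, and the 0.1 weight below are illustrative assumptions only; the paper derives its objective from a probabilistic reformulation and uses generated translations.

```python
import torch
import torch.nn.functional as F

def agreement_loss(logits_via_src, logits_via_tgt):
    """Penalise disagreement between two distributions over the same
    auxiliary-language sentence, one produced from the source sentence and
    one from its parallel target. Symmetrised KL is one simple choice."""
    log_p = F.log_softmax(logits_via_src, dim=-1)
    log_q = F.log_softmax(logits_via_tgt, dim=-1)
    kl_pq = F.kl_div(log_q, log_p.exp(), reduction="batchmean")  # KL(p || q)
    kl_qp = F.kl_div(log_p, log_q.exp(), reduction="batchmean")  # KL(q || p)
    return 0.5 * (kl_pq + kl_qp)

# Toy tensors shaped (batch, sequence length, vocabulary size).
logits_en_to_fr = torch.randn(8, 20, 100, requires_grad=True)
logits_de_to_fr = torch.randn(8, 20, 100, requires_grad=True)

supervised_nll = torch.tensor(0.0)  # stand-in for the usual translation loss
total = supervised_nll + 0.1 * agreement_loss(logits_en_to_fr, logits_de_to_fr)
total.backward()
print(float(total))
```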
Accelerating Extreme Classification via Adaptive Feature Agglomeration
Title | Accelerating Extreme Classification via Adaptive Feature Agglomeration |
Authors | Ankit Jalan, Purushottam Kar |
Abstract | Extreme classification seeks to assign each data point the most relevant labels from a universe of a million or more labels. This task is faced with the dual challenge of high precision and scalability, with millisecond-level prediction times being the benchmark. We propose DEFRAG, an adaptive feature agglomeration technique to accelerate extreme classification algorithms. Unlike past work on feature clustering and selection, DEFRAG scales to millions of features, and is especially beneficial when feature sets are sparse, which is typical of recommendation and multi-label datasets. The method comes with provable performance guarantees and performs efficient task-driven agglomeration to reduce feature dimensionalities by an order of magnitude or more. Experiments show that DEFRAG can not only reduce training and prediction times of several leading extreme classification algorithms by as much as 40%, but also be used for feature reconstruction to address the problem of missing features, as well as offer superior coverage of rare labels. |
Tasks | |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11769v1 |
https://arxiv.org/pdf/1905.11769v1.pdf | |
PWC | https://paperswithcode.com/paper/accelerating-extreme-classification-via |
Repo | https://github.com/purushottamkar/defrag |
Framework | none |
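As a point of reference for the idea, the sketch below uses scikit-learn's off-the-shelf FeatureAgglomeration to merge a sparse, high-dimensional feature space down by an order of magnitude. This is a generic unsupervised stand-in, not DEFRAG's task-driven algorithm, and carries none of its guarantees.

```python
from scipy.sparse import random as sparse_random
from sklearn.cluster import FeatureAgglomeration

# Sparse, high-dimensional features typical of extreme classification.
X = sparse_random(1000, 5000, density=0.001, format="csr", random_state=0)

# Merge correlated features into 256 groups (mean-pooled by default).
agg = FeatureAgglomeration(n_clusters=256)
X_small = agg.fit_transform(X.toarray())
print(X.shape, "->", X_small.shape)   # (1000, 5000) -> (1000, 256)
```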
Efficient Algorithms for Set-Valued Prediction in Multi-Class Classification
Title | Efficient Algorithms for Set-Valued Prediction in Multi-Class Classification |
Authors | Thomas Mortier, Marek Wydmuch, Eyke Hüllermeier, Krzysztof Dembczyński, Willem Waegeman |
Abstract | In cases of uncertainty, a multi-class classifier preferably returns a set of candidate classes instead of predicting a single class label with little guarantee. More precisely, the classifier should strive for an optimal balance between the correctness (the true class is among the candidates) and the precision (the candidates are not too many) of its prediction. We formalize this problem within a general decision-theoretic framework that unifies most of the existing work in this area. In this framework, uncertainty is quantified in terms of conditional class probabilities, and the quality of a predicted set is measured in terms of a utility function. We then address the problem of finding the Bayes-optimal prediction, i.e., the subset of class labels with highest expected utility. For this problem, which is computationally challenging as there are exponentially (in the number of classes) many predictions to choose from, we propose efficient algorithms that can be applied to a broad family of utility scores. Two of these algorithms make use of structural information in the form of a class hierarchy, which is often available in prediction problems with many classes. Our theoretical results are complemented by experimental studies, in which we analyze the proposed algorithms in terms of predictive accuracy and runtime efficiency. |
Tasks | |
Published | 2019-06-19 |
URL | https://arxiv.org/abs/1906.08129v1 |
https://arxiv.org/pdf/1906.08129v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-algorithms-for-set-valued |
Repo | https://github.com/tfmortie/setvaluedprediction |
Framework | none |
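The key computational point is that for many utility scores the Bayes-optimal set is always a top-k set, so only the set size needs searching rather than all exponentially many subsets. A sketch for utilities that pay g(|Y|) when the true class is covered, with g chosen purely for illustration (the paper's family of utility scores and its hierarchy-based algorithms are broader):

```python
import numpy as np

def bayes_optimal_set(probs, g):
    """Return the set of classes with highest expected utility, assuming the
    utility of predicting Y is g(|Y|) if the true class is in Y, else 0.
    Then expected utility is g(k) times the sum of the k largest class
    probabilities, so the optimum is a top-k set and a linear scan suffices."""
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    k_best = int(np.argmax([g(k + 1) * cum[k] for k in range(len(probs))]))
    return order[: k_best + 1]

p = np.array([0.45, 0.30, 0.15, 0.06, 0.04])
# Precision-like utility: credit 1/|Y| when the true class is covered.
print(bayes_optimal_set(p, g=lambda k: 1.0 / k))             # [0], a singleton
# A flatter penalty on set size keeps more candidates.
print(bayes_optimal_set(p, g=lambda k: 1.0 / np.sqrt(k)))    # [0, 1]
```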
Deep Residual Neural Networks for Audio Spoofing Detection
Title | Deep Residual Neural Networks for Audio Spoofing Detection |
Authors | Moustafa Alzantot, Ziqi Wang, Mani B. Srivastava |
Abstract | The state-of-the-art models for speech synthesis and voice conversion are capable of generating synthetic speech that is perceptually indistinguishable from bona fide human speech. These methods represent a threat to automatic speaker verification (ASV) systems. Additionally, replay attacks, where the attacker uses a loudspeaker to replay previously recorded genuine human speech, are also possible. We present our solution for the ASVSpoof2019 competition, which aims to develop countermeasure systems that distinguish between spoofing attacks and genuine speech. Our model is inspired by the success of residual convolutional networks in many classification tasks. We build three variants of a residual convolutional neural network that accept different feature representations (MFCC, log-magnitude STFT, and CQCC) of the input. We compare the performance achieved by our model variants and the competition baseline models. In the logical access scenario, the fusion of our models has zero t-DCF cost and zero equal error rate (EER), as evaluated on the development set. On the evaluation set, our model fusion improves the t-DCF and EER by 25% compared to the baseline algorithms. Against physical access replay attacks, our model fusion improves the baseline algorithms’ t-DCF and EER scores by 71% and 75% on the evaluation set, respectively. |
Tasks | Speaker Verification, Speech Synthesis, Voice Conversion |
Published | 2019-06-30 |
URL | https://arxiv.org/abs/1907.00501v1 |
https://arxiv.org/pdf/1907.00501v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-residual-neural-networks-for-audio |
Repo | https://github.com/nesl/asvspoof2019 |
Framework | none |
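The backbone idea is conventional enough to sketch: residual blocks over a 2D time-frequency input, followed by a two-way spoof/bona-fide head. A minimal PyTorch sketch with assumed layer sizes, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic 2D residual block over time-frequency features (e.g. MFCC/CQCC)."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.body(x) + x)   # skip connection

# Hypothetical spoof/bona-fide classifier over a single-channel feature map.
net = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1),
    ResidualBlock(32),
    ResidualBlock(32),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 2),                       # spoof vs. bona fide
)
x = torch.randn(4, 1, 40, 200)              # batch of 40-dim MFCC-like frames
print(net(x).shape)                         # torch.Size([4, 2])
```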
DRUM: End-To-End Differentiable Rule Mining On Knowledge Graphs
Title | DRUM: End-To-End Differentiable Rule Mining On Knowledge Graphs |
Authors | Ali Sadeghian, Mohammadreza Armandpour, Patrick Ding, Daisy Zhe Wang |
Abstract | In this paper, we study the problem of learning probabilistic logical rules for inductive and interpretable link prediction. Despite the importance of inductive link prediction, most previous works focused on transductive link prediction and cannot manage previously unseen entities. Moreover, they are black-box models that are not easily explainable for humans. We propose DRUM, a scalable and differentiable approach for mining first-order logical rules from knowledge graphs which resolves these problems. We motivate our method by making a connection between learning confidence scores for each rule and low-rank tensor approximation. DRUM uses bidirectional RNNs to share useful information across the tasks of learning rules for different relations. We also empirically demonstrate the efficiency of DRUM over existing rule mining methods for inductive link prediction on a variety of benchmark datasets. |
Tasks | Knowledge Graphs, Link Prediction |
Published | 2019-10-31 |
URL | https://arxiv.org/abs/1911.00055v1 |
https://arxiv.org/pdf/1911.00055v1.pdf | |
PWC | https://paperswithcode.com/paper/drum-end-to-end-differentiable-rule-mining-on |
Repo | https://github.com/irokin/Experiments-Results-for-Link-Prediction |
Framework | none |
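The differentiable core that DRUM builds on can be sketched with plain adjacency-matrix algebra: applying a soft rule is a chain of attention-weighted relation matrices. The toy graph and fixed attention weights below are illustrative only; DRUM learns the attentions with bidirectional RNNs under a low-rank parameterisation.

```python
import numpy as np

# Toy KG with 3 entities and 2 relations, as adjacency matrices.
M = {
    "parent_of": np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=float),
    "spouse_of": np.zeros((3, 3)),
}

def rule_score(head_onehot, attention_per_step):
    """Apply a soft multi-hop rule: each step mixes all relation matrices
    with attention weights, and chaining steps scores every entity as a
    possible rule tail."""
    v = head_onehot
    for attn in attention_per_step:
        mixed = sum(a * M[r] for r, a in attn.items())
        v = v @ mixed
    return v

# Two-step rule ~ grandparent_of(X, Y) <- parent_of(X, Z), parent_of(Z, Y)
head = np.array([1.0, 0.0, 0.0])
steps = [{"parent_of": 0.9, "spouse_of": 0.1},
         {"parent_of": 0.9, "spouse_of": 0.1}]
print(rule_score(head, steps))   # highest mass on entity 2, the grandchild
```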
Multi-Modal Adversarial Autoencoders for Recommendations of Citations and Subject Labels
Title | Multi-Modal Adversarial Autoencoders for Recommendations of Citations and Subject Labels |
Authors | Lukas Galke, Florian Mai, Iacopo Vagliano, Ansgar Scherp |
Abstract | We present multi-modal adversarial autoencoders for recommendation and evaluate them on two different tasks: citation recommendation and subject label recommendation. We analyze the effects of adversarial regularization, sparsity, and different input modalities. By conducting 408 experiments, we show that adversarial regularization consistently improves the performance of autoencoders for recommendation. We demonstrate, however, that the two tasks differ in the semantics of item co-occurrence, in the sense that item co-occurrence resembles relatedness in the case of citations, yet implies diversity in the case of subject labels. Our results reveal that supplying the partial item set as input is only helpful when item co-occurrence resembles relatedness. When facing a new recommendation task, it is therefore crucial to consider the semantics of item co-occurrence for the choice of an appropriate model. |
Tasks | |
Published | 2019-07-22 |
URL | https://arxiv.org/abs/1907.12366v1 |
https://arxiv.org/pdf/1907.12366v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-modal-adversarial-autoencoders-for |
Repo | https://github.com/lgalke/aae-recommender |
Framework | pytorch |
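The adversarial regularization being ablated is the standard adversarial-autoencoder mechanism; a minimal PyTorch sketch follows. The dimensions and the Gaussian prior are assumptions, and the paper's actual models are multi-modal with more structure.

```python
import torch
import torch.nn as nn

# A discriminator learns to tell prior samples from encoder outputs, and the
# encoder learns to fool it, pushing latent codes toward the prior.
latent_dim = 16
encoder = nn.Sequential(nn.Linear(100, 64), nn.ReLU(), nn.Linear(64, latent_dim))
disc = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, 1))
bce = nn.BCEWithLogitsLoss()

x = torch.rand(32, 100)                     # e.g. sparse citation vectors
z_fake = encoder(x)
z_real = torch.randn(32, latent_dim)        # samples from the N(0, I) prior

# Discriminator step: prior samples -> 1, encoded samples -> 0.
d_loss = bce(disc(z_real), torch.ones(32, 1)) + \
         bce(disc(z_fake.detach()), torch.zeros(32, 1))

# Encoder (generator) step: make encoded samples look like prior samples.
g_loss = bce(disc(z_fake), torch.ones(32, 1))
print(float(d_loss), float(g_loss))
```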
Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party ASR
Title | Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party ASR |
Authors | Naoyuki Kanda, Christoph Boeddeker, Jens Heitkaemper, Yusuke Fujita, Shota Horiguchi, Kenji Nagamatsu, Reinhold Haeb-Umbach |
Abstract | In this paper, we present Hitachi and Paderborn University’s joint effort for automatic speech recognition (ASR) in a dinner party scenario. The main challenges of ASR systems for dinner party recordings obtained by multiple microphone arrays are (1) heavy speech overlaps, (2) severe noise and reverberation, (3) very natural conversational content, and possibly (4) insufficient training data. As an example of a dinner party scenario, we have chosen the data presented during the CHiME-5 speech recognition challenge, where the baseline ASR had a 73.3% word error rate (WER), and even the best performing system at the CHiME-5 challenge had a 46.1% WER. We extensively investigated a combination of the guided source separation-based speech enhancement technique and an already proposed strong ASR backend and found that a tight combination of these techniques provided substantial accuracy improvements. Our final system achieved WERs of 39.94% and 41.64% for the development and evaluation data, respectively, both of which are the best published results for the dataset. We also investigated training with additional data beyond the official small CHiME-5 corpus to assess the intrinsic difficulty of this ASR task. |
Tasks | Speech Enhancement, Speech Recognition |
Published | 2019-05-29 |
URL | https://arxiv.org/abs/1905.12230v2 |
https://arxiv.org/pdf/1905.12230v2.pdf | |
PWC | https://paperswithcode.com/paper/guided-source-separation-meets-a-strong-asr |
Repo | https://github.com/fgnt/pb_chime5 |
Framework | none |
Human Mesh Recovery from Monocular Images via a Skeleton-disentangled Representation
Title | Human Mesh Recovery from Monocular Images via a Skeleton-disentangled Representation |
Authors | Sun Yu, Ye Yun, Liu Wu, Gao Wenpeng, Fu YiLi, Mei Tao |
Abstract | We describe an end-to-end method for recovering a 3D human body mesh from single images and monocular videos. Unlike existing methods, which try to obtain the complex 3D pose, shape, and camera parameters from one coupled feature, we propose a skeleton-disentangling framework that divides this task into multiple levels of spatial and temporal granularity in a decoupled manner. Spatially, we propose an effective and pluggable “disentangling the skeleton from the details” (DSD) module. It reduces complexity and decouples the skeleton, which lays a good foundation for temporal modeling. Temporally, a self-attention-based temporal convolution network is proposed to efficiently exploit short- and long-term temporal cues. Furthermore, an unsupervised adversarial training strategy, temporal shuffle and order recovery, is designed to promote the learning of motion dynamics. The proposed method outperforms the state-of-the-art 3D human mesh recovery methods by 15.4% in MPJPE and 23.8% in PA-MPJPE on Human3.6M. State-of-the-art results are also achieved on the 3D Poses in the Wild (3DPW) dataset without any fine-tuning. In particular, ablation studies demonstrate that the skeleton-disentangled representation is crucial for better temporal modeling and generalization. |
Tasks | |
Published | 2019-08-20 |
URL | https://arxiv.org/abs/1908.07172v2 |
https://arxiv.org/pdf/1908.07172v2.pdf | |
PWC | https://paperswithcode.com/paper/human-mesh-recovery-from-monocular-images-via |
Repo | https://github.com/Arthur151/DSD-SATN |
Framework | pytorch |
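The "temporal shuffle and order recovery" strategy can be sketched as an ordered-vs-shuffled discriminator over frame features. This is a minimal guess at the mechanism, not the paper's exact adversarial setup:

```python
import torch
import torch.nn as nn

# A discriminator tries to tell correctly ordered feature sequences from
# shuffled ones, pressuring the per-frame features to encode motion dynamics.
feat_dim, seq_len = 64, 8
disc = nn.Sequential(nn.Flatten(), nn.Linear(feat_dim * seq_len, 128),
                     nn.ReLU(), nn.Linear(128, 1))
bce = nn.BCEWithLogitsLoss()

frames = torch.randn(16, seq_len, feat_dim)          # per-frame features
perm = torch.randperm(seq_len)
shuffled = frames[:, perm, :]                        # destroy temporal order

d_loss = bce(disc(frames), torch.ones(16, 1)) + \
         bce(disc(shuffled), torch.zeros(16, 1))
print(float(d_loss))
```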
RAMBO: Repeated And Merged Bloom Filter for Multiple Set Membership Testing (MSMT) in Sub-linear time
Title | RAMBO: Repeated And Merged Bloom Filter for Multiple Set Membership Testing (MSMT) in Sub-linear time |
Authors | Gaurav Gupta, Benjamin Coleman, Tharun Medini, Vijai Mohan, Anshumali Shrivastava |
Abstract | Approximate set membership is a common problem with wide applications in databases, networking, and search. Given a set S and a query q, the task is to determine whether q ∈ S. The Bloom Filter (BF) is a popular data structure for approximate membership testing due to its simplicity. In particular, a BF consists of a bit array that can be incrementally updated. A related problem, and the focus of this paper, is the Multiple Set Membership Testing (MSMT) problem. Here we are given K different sets, and for any given query q the goal is to find all of the sets containing the query element. Trivially, a multiple set membership instance can be reduced to K membership testing instances, each with the same q, leading to O(K) query time; a simple array of Bloom Filters achieves exactly that. In this paper, we present RAMBO (Repeated And Merged Bloom Filter), the first non-trivial data structure for streaming keys, which achieves expected O(√K log K) query time with an additional worst-case memory cost factor of O(log K) over the array of Bloom Filters. The proposed data structure is simply a count-min-sketch arrangement of Bloom Filters and retains all of the count-min sketch’s favorable properties; we replace the addition operation with set union during insertion and the minimum operation with set intersection during estimation. |
Tasks | |
Published | 2019-10-07 |
URL | https://arxiv.org/abs/1910.02611v1 |
https://arxiv.org/pdf/1910.02611v1.pdf | |
PWC | https://paperswithcode.com/paper/rambo-repeated-and-merged-bloom-filter-for |
Repo | https://github.com/RUSH-LAB/RAMBO |
Framework | none |
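The arrangement is concrete enough to sketch end to end: each repetition hashes every set id into one bucket, each bucket's Bloom filter stores the union of its sets' keys, and a query intersects the candidate sets across repetitions, so only B·R Bloom checks are needed rather than K. All sizes below are toy values and the hashing choices are illustrative.

```python
import hashlib

class BloomFilter:
    def __init__(self, m=1024, k=3):
        self.bits, self.m, self.k = bytearray(m), m, k

    def _indices(self, key):
        for i in range(self.k):
            digest = hashlib.sha1(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, key):
        for i in self._indices(key):
            self.bits[i] = 1

    def __contains__(self, key):
        return all(self.bits[i] for i in self._indices(key))

class Rambo:
    """Count-min-style grid of Bloom filters over K sets (toy sizes)."""
    def __init__(self, n_sets, n_buckets=4, n_reps=3):
        self.n_sets, self.B, self.R = n_sets, n_buckets, n_reps
        self.grid = [[BloomFilter() for _ in range(n_buckets)]
                     for _ in range(n_reps)]
        # Each repetition assigns every set id to one bucket.
        self.bucket_of = [[hash((r, s)) % n_buckets for s in range(n_sets)]
                          for r in range(n_reps)]

    def add(self, set_id, key):
        for r in range(self.R):
            self.grid[r][self.bucket_of[r][set_id]].add(key)

    def query(self, key):
        candidates = set(range(self.n_sets))
        for r in range(self.R):
            hits = {b for b in range(self.B) if key in self.grid[r][b]}
            candidates &= {s for s in range(self.n_sets)
                           if self.bucket_of[r][s] in hits}
        return candidates

rambo = Rambo(n_sets=10)
rambo.add(3, "kmer_ACGT")
rambo.add(7, "kmer_TTTA")
print(rambo.query("kmer_ACGT"))   # {3}, up to Bloom-filter false positives
```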
Dreem Open Datasets: Multi-Scored Sleep Datasets to compare Human and Automated sleep staging
Title | Dreem Open Datasets: Multi-Scored Sleep Datasets to compare Human and Automated sleep staging |
Authors | Antoine Guillot, Fabien Sauvet, Emmanuel H During, Valentin Thorey |
Abstract | Sleep stage classification constitutes an important element of sleep disorder diagnosis. It relies on the visual inspection of polysomnography records by trained sleep technologists. Automated approaches have been designed to alleviate this resource-intensive task. However, such approaches are usually compared to a single human scorer’s annotation despite an inter-rater agreement of only about 85%. The present study introduces two publicly available datasets, DOD-H, including 25 healthy volunteers, and DOD-O, including 55 patients suffering from obstructive sleep apnea (OSA). Both datasets have been scored by 5 sleep technologists from different sleep centers. We developed a framework to compare automated approaches to a consensus of multiple human scorers. Using this framework, we benchmarked and compared the main approaches from the literature. We also developed and benchmarked a new deep learning method, SimpleSleepNet, inspired by the current state of the art. We demonstrate that many methods can reach human-level performance on both datasets. SimpleSleepNet achieved an F1 of 89.9% vs. 86.8% on average for human scorers on DOD-H, and an F1 of 88.3% vs. 84.8% on DOD-O. Our study highlights that state-of-the-art automated sleep staging outperforms human scorers for both healthy volunteers and patients suffering from OSA. Consideration could therefore be given to using automated approaches in the clinical setting. |
Tasks | |
Published | 2019-10-31 |
URL | https://arxiv.org/abs/1911.03221v3 |
https://arxiv.org/pdf/1911.03221v3.pdf | |
PWC | https://paperswithcode.com/paper/dreem-open-datasets-multi-scored-sleep |
Repo | https://github.com/Dreem-Organization/dreem-learning-evaluation |
Framework | none |
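A minimal sketch of scoring against a multi-scorer consensus, with random stand-in hypnograms; the paper's evaluation framework is more careful than a plain majority vote, so treat this as the gist only.

```python
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
n_epochs, n_stages = 1000, 5                 # 30 s epochs; W/N1/N2/N3/REM
scorers = rng.integers(0, n_stages, size=(5, n_epochs))   # 5 human hypnograms
automated = rng.integers(0, n_stages, size=n_epochs)      # model hypnogram

# Majority-vote consensus across the 5 scorers, one epoch at a time.
consensus = np.array([np.bincount(col, minlength=n_stages).argmax()
                      for col in scorers.T])
print("macro F1 vs. consensus:", f1_score(consensus, automated, average="macro"))
```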
TedEval: A Fair Evaluation Metric for Scene Text Detectors
Title | TedEval: A Fair Evaluation Metric for Scene Text Detectors |
Authors | Chae Young Lee, Youngmin Baek, Hwalsuk Lee |
Abstract | Despite the recent success of scene text detection methods, common evaluation metrics fail to provide a fair and reliable comparison among detectors. They have obvious drawbacks in reflecting the inherent characteristics of text detection tasks, and are unable to address issues such as granularity, multi-line text, and character incompleteness. In this paper, we propose a novel evaluation protocol called TedEval (Text detector Evaluation), which evaluates text detections through instance-level matching and character-level scoring. Based on a firm standard rewarding behaviors that result in successful recognition, TedEval can act as a reliable standard for comparing and quantifying detection quality across all difficulty levels. In this regard, we believe that TedEval can play a key role in developing state-of-the-art scene text detectors. The code is publicly available at https://github.com/clovaai/TedEval. |
Tasks | Scene Text Detection |
Published | 2019-07-02 |
URL | https://arxiv.org/abs/1907.01227v1 |
https://arxiv.org/pdf/1907.01227v1.pdf | |
PWC | https://paperswithcode.com/paper/tedeval-a-fair-evaluation-metric-for-scene |
Repo | https://github.com/clovaai/TedEval |
Framework | none |
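A heavily simplified flavour of the character-level scoring, using axis-aligned boxes and pseudo-character centres; the real protocol (see the repo) adds instance-level matching and handles polygons, splits, and merges.

```python
import numpy as np

def pseudo_char_centers(box, n_chars):
    """Split an axis-aligned GT word box into n_chars pseudo-character centres."""
    x0, y0, x1, y1 = box
    xs = x0 + (np.arange(n_chars) + 0.5) * (x1 - x0) / n_chars
    return [(x, (y0 + y1) / 2) for x in xs]

def char_recall(gt_boxes, gt_lengths, det_boxes):
    """Crude TedEval-flavoured recall: fraction of pseudo-character centres
    covered by any detection."""
    covered = total = 0
    for box, n in zip(gt_boxes, gt_lengths):
        for (cx, cy) in pseudo_char_centers(box, n):
            total += 1
            covered += any(x0 <= cx <= x1 and y0 <= cy <= y1
                           for (x0, y0, x1, y1) in det_boxes)
    return covered / total

gt = [(0, 0, 100, 20)]            # one word box with 5 characters
det = [(0, 0, 58, 20)]            # detection covering ~60% of the word
print(char_recall(gt, [5], det))  # 0.6: 3 of 5 pseudo-characters covered
```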
Linearized Multi-Sampling for Differentiable Image Transformation
Title | Linearized Multi-Sampling for Differentiable Image Transformation |
Authors | Wei Jiang, Weiwei Sun, Andrea Tagliasacchi, Eduard Trulls, Kwang Moo Yi |
Abstract | We propose a novel image sampling method for differentiable image transformation in deep neural networks. The sampling schemes currently used in deep learning, such as Spatial Transformer Networks, rely on bilinear interpolation, which performs poorly under severe scale changes, and more importantly, results in poor gradient propagation. This is due to their strict reliance on direct neighbors. Instead, we propose to generate random auxiliary samples in the vicinity of each pixel in the sampled image, and create a linear approximation with their intensity values. We then use this approximation as a differentiable formula for the transformed image. We demonstrate that our approach produces more representative gradients with a wider basin of convergence for image alignment, which leads to considerable performance improvements when training networks for classification tasks. This is not only true under large downsampling, but also when there are no scale changes. We compare our approach with multi-scale sampling and show that we outperform it. We then demonstrate that our improvements to the sampler are compatible with other tangential improvements to Spatial Transformer Networks and that it further improves their performance. |
Tasks | Image Registration |
Published | 2019-01-22 |
URL | https://arxiv.org/abs/1901.07124v3 |
https://arxiv.org/pdf/1901.07124v3.pdf | |
PWC | https://paperswithcode.com/paper/linearized-multi-sampling-for-differentiable |
Repo | https://github.com/vcg-uvic/linearized_multisampling_release |
Framework | pytorch |
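The sampling idea itself is compact: read auxiliary points near the query location, fit a plane to their intensities, and evaluate the plane at the query point, so gradients come from the fit rather than from four direct neighbours. A numpy sketch for one pixel (sample count and noise scale are arbitrary choices):

```python
import numpy as np

def bilinear(img, x, y):
    """Plain bilinear lookup at a continuous location (x, y)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - x0, y - y0
    return ((1 - dx) * (1 - dy) * img[y0, x0] + dx * (1 - dy) * img[y0, x0 + 1]
            + (1 - dx) * dy * img[y0 + 1, x0] + dx * dy * img[y0 + 1, x0 + 1])

def linearized_sample(img, x, y, n_aux=8, sigma=1.0, seed=0):
    """Fit a plane v ~ a*x + b*y + c to auxiliary samples around (x, y) and
    evaluate it at (x, y); gradients w.r.t. the sampling location would then
    flow through (a, b) instead of only the four direct neighbours."""
    rng = np.random.default_rng(seed)
    offsets = sigma * rng.standard_normal((n_aux, 2))
    xs = np.clip(x + offsets[:, 0], 0, img.shape[1] - 2)
    ys = np.clip(y + offsets[:, 1], 0, img.shape[0] - 2)
    vals = np.array([bilinear(img, xi, yi) for xi, yi in zip(xs, ys)])
    A = np.column_stack([xs, ys, np.ones(n_aux)])
    coef, *_ = np.linalg.lstsq(A, vals, rcond=None)
    return coef @ np.array([x, y, 1.0])

img = np.arange(100, dtype=float).reshape(10, 10)  # ramp image: value = 10*y + x
print(bilinear(img, 4.3, 5.7), linearized_sample(img, 4.3, 5.7))  # both ~61.3
```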
MutualNet: Adaptive ConvNet via Mutual Learning from Network Width and Resolution
Title | MutualNet: Adaptive ConvNet via Mutual Learning from Network Width and Resolution |
Authors | Taojiannan Yang, Sijie Zhu, Chen Chen, Shen Yan, Mi Zhang, Andrew Willis |
Abstract | We propose the width-resolution mutual learning method (MutualNet) to train a network that is executable at dynamic resource constraints to achieve adaptive accuracy-efficiency trade-offs at runtime. Our method trains a cohort of sub-networks with different widths using different input resolutions to mutually learn multi-scale representations for each sub-network. It achieves consistently better ImageNet top-1 accuracy over the state-of-the-art adaptive network US-Net under different computation constraints, and outperforms the best compound scaled MobileNet in EfficientNet by 1.5%. The superiority of our method is also validated on COCO object detection and instance segmentation as well as transfer learning. Surprisingly, the training strategy of MutualNet can also boost the performance of a single network, which substantially outperforms the powerful AutoAugmentation in both efficiency (GPU search hours: 15000 vs. 0) and accuracy (ImageNet: 77.6% vs. 78.6%). Code is available at \url{https://github.com/taoyang1122/MutualNet}. |
Tasks | Instance Segmentation, Object Detection, Semantic Segmentation, Transfer Learning |
Published | 2019-09-27 |
URL | https://arxiv.org/abs/1909.12978v3 |
https://arxiv.org/pdf/1909.12978v3.pdf | |
PWC | https://paperswithcode.com/paper/a-closer-look-at-network-resolution-for |
Repo | https://github.com/taoyang1122/MutualNet |
Framework | pytorch |
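One mutual-learning step can be sketched as: run the full-width network at full resolution against the labels, then supervise narrower widths fed lower resolutions with the full network's detached predictions. The tiny "slimmable" model, the fixed width/resolution pairs, and the plain KL distillation below are stand-ins for the paper's slimmable CNNs and exact training recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySlimmable(nn.Module):
    """One-conv toy: width w uses the first w fraction of the channels."""
    def __init__(self, max_channels=64, n_classes=10):
        super().__init__()
        self.conv = nn.Conv2d(3, max_channels, 3, padding=1)
        self.head = nn.Linear(max_channels, n_classes)

    def forward(self, x, width=1.0):
        c = max(1, int(self.conv.out_channels * width))
        z = F.relu(F.conv2d(x, self.conv.weight[:c], self.conv.bias[:c],
                            padding=1))
        z = F.adaptive_avg_pool2d(z, 1).flatten(1)
        return F.linear(z, self.head.weight[:, :c], self.head.bias)

net = TinySlimmable()
x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))

full = net(x, width=1.0)                    # full width at full resolution
loss = F.cross_entropy(full, y)             # ground-truth supervision
soft = F.softmax(full.detach(), dim=-1)     # teacher for the sub-networks
for width, res in [(0.5, 24), (0.25, 16)]:  # sampled width/resolution pairs
    x_small = F.interpolate(x, size=res, mode="bilinear", align_corners=False)
    logits = net(x_small, width=width)
    loss = loss + F.kl_div(F.log_softmax(logits, dim=-1), soft,
                           reduction="batchmean")
loss.backward()
print(float(loss))
```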