Paper Group NANR 101
ACE: Artificial Checkerboard Enhancer to Induce and Evade Adversarial Attacks. A Semi-Markov Structured Support Vector Machine Model for High-Precision Named Entity Recognition. Objects365: A Large-Scale, High-Quality Dataset for Object Detection. Entity resolution for noisy ASR transcripts. On Breiman’s Dilemma in Neural Networks: Success and Fail …
ACE: Artificial Checkerboard Enhancer to Induce and Evade Adversarial Attacks
Title | ACE: Artificial Checkerboard Enhancer to Induce and Evade Adversarial Attacks |
Authors | Jisung Hwang, Younghoon Kim, Sanghyuk Chun, Jaejun Yoo, Ji-Hoon Kim, Dongyoon Han, Jung-Woo Ha |
Abstract | The checkerboard phenomenon is one of the well-known visual artifacts in the computer vision field. The origins of and solutions to checkerboard artifacts in the pixel space have been studied for a long time, but their effects in the gradient space have rarely been investigated. In this paper, we revisit checkerboard artifacts in the gradient space, which turn out to be a weak point of a network architecture. We explore the image-agnostic property of gradient checkerboard artifacts and propose a simple yet effective defense method that exploits these artifacts. We introduce our defense module, dubbed the Artificial Checkerboard Enhancer (ACE), which induces adversarial attacks onto designated pixels. This enables the model to deflect attacks by shifting the image by only a single pixel, with a remarkable defense rate. We provide extensive experiments to support the effectiveness of our work for various attack scenarios using state-of-the-art attack methods. Furthermore, we show that ACE is applicable even to large-scale datasets, including the ImageNet dataset, and can easily be transferred to various pretrained networks. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=BJlc6iA5YX |
PDF | https://openreview.net/pdf?id=BJlc6iA5YX |
PWC | https://paperswithcode.com/paper/ace-artificial-checkerboard-enhancer-to |
Repo | |
Framework | |
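As a rough illustration of the single-pixel-shift defense described in the ACE abstract above: if the enhancer concentrates adversarial perturbations on a fixed pixel lattice, shifting the input by one pixel before inference misaligns the perturbation. The sketch below is ours, not the authors' code; the padding choice and function names are illustrative.

```python
# Hedged sketch (not the authors' code): deflect a lattice-aligned
# adversarial perturbation by shifting the input one pixel before inference.
import numpy as np

def shift_one_pixel(image: np.ndarray) -> np.ndarray:
    """Shift an HxWxC image one pixel down and right, edge-padding the border."""
    padded = np.pad(image, ((1, 0), (1, 0), (0, 0)), mode="edge")
    return padded[:-1, :-1, :]

def defended_predict(model_fn, image: np.ndarray):
    """model_fn is any classifier over HxWxC arrays; the shift is meant to
    misalign perturbations that ACE induces on designated pixels."""
    return model_fn(shift_one_pixel(image))
```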
A Semi-Markov Structured Support Vector Machine Model for High-Precision Named Entity Recognition
Title | A Semi-Markov Structured Support Vector Machine Model for High-Precision Named Entity Recognition |
Authors | Ravneet Arora, Chen-Tse Tsai, Ketevan Tsereteli, Prabhanjan Kambadur, Yi Yang |
Abstract | Named entity recognition (NER) is the backbone of many NLP solutions. F1 score, the harmonic mean of precision and recall, is often used to select and evaluate the best models. However, when precision needs to be prioritized over recall, a state-of-the-art model might not be the best choice. There is little in the literature that directly addresses training-time modifications for achieving higher-precision information extraction. In this paper, we propose a neural semi-Markov structured support vector machine model that controls the precision-recall trade-off by assigning weights to different types of errors in the loss-augmented inference during training. The semi-Markov property provides more accurate phrase-level predictions, thereby improving performance. We empirically demonstrate the advantage of our model when high precision is required by comparing against strong baselines based on CRFs. In our experiments with the CoNLL 2003 dataset, our model achieves a better precision-recall trade-off at various precision levels. |
Tasks | Named Entity Recognition |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-1587/ |
PDF | https://www.aclweb.org/anthology/P19-1587 |
PWC | https://paperswithcode.com/paper/a-semi-markov-structured-support-vector |
Repo | |
Framework | |
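The precision-recall control in this model comes from weighting error types inside the loss-augmented cost. A toy rendering of that idea follows; the span representation and the particular weights are illustrative, not the paper's exact scheme.

```python
# Toy sketch of a precision-biased segment cost for loss-augmented inference.
# Segments are (start, end, label) tuples; weights are illustrative.
def segment_error_cost(pred_segments, gold_segments,
                       fp_weight: float = 2.0, fn_weight: float = 1.0) -> float:
    """Penalising false-positive segments more than false negatives biases
    training-time inference, and hence the learned model, toward precision."""
    pred, gold = set(pred_segments), set(gold_segments)
    false_positives = len(pred - gold)   # predicted entity spans not in gold
    false_negatives = len(gold - pred)   # gold entity spans that were missed
    return fp_weight * false_positives + fn_weight * false_negatives
```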
Objects365: A Large-Scale, High-Quality Dataset for Object Detection
Title | Objects365: A Large-Scale, High-Quality Dataset for Object Detection |
Authors | Shuai Shao, Zeming Li, Tianyuan Zhang, Chao Peng, Gang Yu, Xiangyu Zhang, Jing Li, Jian Sun |
Abstract | In this paper, we introduce a new large-scale object detection dataset, Objects365, which has 365 object categories over 600K training images. More than 10 million high-quality bounding boxes were manually labeled through a carefully designed three-step annotation pipeline. It is the largest object detection dataset (with full annotation) so far and establishes a more challenging benchmark for the community. Objects365 can serve as a better feature-learning dataset for localization-sensitive tasks like object detection and semantic segmentation. Objects365 pre-trained models significantly outperform ImageNet pre-trained models, with a 5.6-point gain (42 vs. 36.4) under the standard 90K-iteration schedule on the COCO benchmark. Even compared with a much longer training schedule of 540K iterations, our Objects365 pretrained model at 90K iterations still shows a 2.7-point gain (42 vs. 39.3). Meanwhile, the fine-tuning time can be greatly reduced (by up to 10 times) when reaching the same accuracy. The better generalization ability of Objects365 has also been verified on CityPersons, VOC segmentation, and ADE tasks. The dataset as well as the pretrained models have been released at www.objects365.org. |
Tasks | Object Detection, Semantic Segmentation |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Shao_Objects365_A_Large-Scale_High-Quality_Dataset_for_Object_Detection_ICCV_2019_paper.html |
PDF | http://openaccess.thecvf.com/content_ICCV_2019/papers/Shao_Objects365_A_Large-Scale_High-Quality_Dataset_for_Object_Detection_ICCV_2019_paper.pdf |
PWC | https://paperswithcode.com/paper/objects365-a-large-scale-high-quality-dataset |
Repo | |
Framework | |
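A minimal sketch of the transfer setting the abstract quantifies: initialise a detector from an Objects365-pretrained checkpoint rather than ImageNet weights, then fine-tune on COCO. The API shown is torchvision's; the checkpoint filename is hypothetical (actual weights are distributed via www.objects365.org).

```python
# Hedged sketch: swap in an Objects365-pretrained checkpoint for fine-tuning.
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=None)
state = torch.load("objects365_fasterrcnn_resnet50_fpn.pth", map_location="cpu")
model.load_state_dict(state, strict=False)  # strict=False: head shapes may differ
# Fine-tune on COCO (or a custom dataset) with a standard 90K-iteration schedule.
```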
Entity resolution for noisy ASR transcripts
Title | Entity resolution for noisy ASR transcripts |
Authors | Arushi Raghuvanshi, Vijay Ramakrishnan, Varsha Embar, Lucien Carroll, Karthik Raghunathan |
Abstract | Large vocabulary domain-agnostic Automatic Speech Recognition (ASR) systems often mistranscribe domain-specific words and phrases. Since these generic ASR systems are the first component of most voice assistants in production, building Natural Language Understanding (NLU) systems that are robust to these errors can be a challenging task. In this paper, we focus on handling ASR errors in named entities, specifically person names, for a voice-based collaboration assistant. We demonstrate an effective method for resolving person names that are mistranscribed by black-box ASR systems, using character and phoneme-based information retrieval techniques and contextual information, which improves accuracy by 40.8% on our production system. We provide a live interactive demo to further illustrate the nuances of this problem and the effectiveness of our solution. |
Tasks | Entity Resolution, Information Retrieval, Speech Recognition |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-3011/ |
PDF | https://www.aclweb.org/anthology/D19-3011 |
PWC | https://paperswithcode.com/paper/entity-resolution-for-noisy-asr-transcripts |
Repo | |
Framework | |
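A self-contained sketch of the character- plus phoneme-based retrieval idea from the abstract above; the scoring mix and the toy phonetic key are our simplifications, not the production system's.

```python
# Illustrative fuzzy name resolution over a user directory, combining
# character-level and crude phonetic similarity (stdlib only).
from difflib import SequenceMatcher

def phonetic_key(name: str) -> str:
    """Very crude phonetic key: drop vowels after the first letter so that
    common ASR confusions like 'Jon'/'John' collide."""
    name = name.lower()
    return name[:1] + "".join(c for c in name[1:] if c not in "aeiou")

def resolve_name(asr_token: str, directory: list[str]) -> str:
    """Return the directory name with the best blended similarity score."""
    def score(candidate: str) -> float:
        char_sim = SequenceMatcher(None, asr_token.lower(), candidate.lower()).ratio()
        phon_sim = SequenceMatcher(None, phonetic_key(asr_token),
                                   phonetic_key(candidate)).ratio()
        return 0.5 * char_sim + 0.5 * phon_sim
    return max(directory, key=score)

# e.g. resolve_name("john smyth", ["Jon Smith", "Joan Smythe", "Jane Smart"])
```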
On Breiman’s Dilemma in Neural Networks: Success and Failure of Normalized Margins
Title | On Breiman’s Dilemma in Neural Networks: Success and Failure of Normalized Margins |
Authors | Yifei Huang, Yuan Yao, Weizhi Zhu |
Abstract | A belief has long persisted in machine learning that enlarging margins over the training data accounts for models’ resistance to overfitting by increasing robustness. Yet Breiman (1999) exhibits a dilemma: a uniform improvement in the margin distribution does not necessarily reduce the generalization error. In this paper, we revisit Breiman’s dilemma in deep neural networks using recently proposed normalized margins, where the Lipschitz constant is bounded by products of spectral norms. With both simplified theory and extensive experiments, Breiman’s dilemma is shown to depend on the dynamics of the normalized margin distributions, which reflect the trade-off between model expressive power and data complexity. When the complexity of the data is comparable to the model’s expressive power, in the sense that training and test data share similar phase transitions in normalized margin dynamics, two efficient ways are derived via classic margin-based generalization bounds to successfully predict the trend of the generalization error. On the other hand, over-expressive models that exhibit uniform improvements in training normalized margins may lose this predictive power and fail to prevent overfitting. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=Byl_ciRcY7 |
PDF | https://openreview.net/pdf?id=Byl_ciRcY7 |
PWC | https://paperswithcode.com/paper/on-breimans-dilemma-in-neural-networks |
Repo | |
Framework | |
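The normalized margin in the abstract above divides the raw classification margin by a Lipschitz bound formed from spectral-norm products. A sketch for a network of linear layers, assuming 1-Lipschitz activations such as ReLU (our simplification):

```python
# Sketch: normalized margin = raw margin / product of layer spectral norms.
import numpy as np

def spectral_norm(weight: np.ndarray) -> float:
    """Largest singular value of a weight matrix."""
    return float(np.linalg.svd(weight, compute_uv=False)[0])

def normalized_margin(logits: np.ndarray, label: int,
                      weights: list[np.ndarray]) -> float:
    """margin(x) = f_y(x) - max_{j != y} f_j(x), scaled by the Lipschitz bound."""
    others = np.delete(logits, label)
    raw_margin = logits[label] - others.max()
    lipschitz_bound = np.prod([spectral_norm(w) for w in weights])
    return raw_margin / lipschitz_bound
```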
Learning Information Propagation in the Dynamical Systems via Information Bottleneck Hierarchy
Title | Learning Information Propagation in the Dynamical Systems via Information Bottleneck Hierarchy |
Authors | Gaurav Gupta, Mohamed Ridha Znaidi, Paul Bogdan |
Abstract | Extracting relevant information, causally inferring and predicting the future states with high accuracy is a crucial task for modeling complex systems. The endeavor to address these tasks is made even more challenging when we have to deal with high-dimensional heterogeneous data streams. Such data streams often have higher-order inter-dependencies across spatial and temporal dimensions. We propose to perform a soft-clustering of the data and learn its dynamics to produce a compact dynamical model while still ensuring the original objectives of causal inference and accurate predictions. To efficiently and rigorously process the dynamics of soft-clustering, we advocate for an information theory inspired approach that incorporates stochastic calculus and seeks to determine a trade-off between the predictive accuracy and compactness of the mathematical representation. We cast the model construction as a maximization of the compression of the state variables such that the predictive ability and causal interdependence (relatedness) constraints between the original data streams and the compact model are closely bounded. We provide theoretical guarantees concerning the convergence of the proposed learning algorithm. To further test the proposed framework, we consider a high-dimensional Gaussian case study and describe an iterative scheme for updating the new model parameters. Using numerical experiments, we demonstrate the benefits on compression and prediction accuracy for a class of dynamical systems. Finally, we apply the proposed algorithm to the real-world dataset of multimodal sentiment intensity and show improvements in prediction with reduced dimensions. |
Tasks | Causal Inference |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=rJgTciR9tm |
PDF | https://openreview.net/pdf?id=rJgTciR9tm |
PWC | https://paperswithcode.com/paper/learning-information-propagation-in-the |
Repo | |
Framework | |
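The compactness-versus-prediction trade-off in the abstract above is the information-bottleneck objective; a standard single-level form is rendered below (our rendering — the paper's hierarchical version additionally imposes causal-relatedness constraints between the original data streams and the compact model).

```latex
% Single-level information-bottleneck trade-off: compress X into a compact
% representation T while preserving information about Y; \beta sets the trade-off.
\min_{p(t \mid x)} \; I(X;T) \;-\; \beta \, I(T;Y)
```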
Gendered Ambiguous Pronoun (GAP) Shared Task at the Gender Bias in NLP Workshop 2019
Title | Gendered Ambiguous Pronoun (GAP) Shared Task at the Gender Bias in NLP Workshop 2019 |
Authors | Kellie Webster, Marta R. Costa-jussà, Christian Hardmeier, Will Radford |
Abstract | The 1st ACL workshop on Gender Bias in Natural Language Processing included a shared task on gendered ambiguous pronoun (GAP) resolution. This task was based on the coreference challenge defined in Webster et al. (2018), designed to benchmark the ability of systems to resolve pronouns in real-world contexts in a gender-fair way. 263 teams competed via a Kaggle competition, with the winning system achieving logloss of 0.13667 and near gender parity. We review the approaches of eleven systems with accepted description papers, noting their effective use of BERT (Devlin et al., 2018), both via fine-tuning and for feature extraction, as well as ensembling. |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-3801/ |
PDF | https://www.aclweb.org/anthology/W19-3801 |
PWC | https://paperswithcode.com/paper/gendered-ambiguous-pronoun-gap-shared-task-at |
Repo | |
Framework | |
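Systems in this shared task were ranked by multiclass log loss over three outcomes per example (the pronoun refers to A, to B, or to neither). A minimal sketch of that metric; the clipping constant and per-row renormalisation are our choices.

```python
# Sketch of Kaggle-style multiclass log loss for three-way GAP predictions.
import numpy as np

def log_loss(probs: np.ndarray, labels: np.ndarray, eps: float = 1e-15) -> float:
    """probs: (n, 3) predicted probabilities; labels: (n,) indices in {0, 1, 2}."""
    clipped = np.clip(probs, eps, 1 - eps)
    clipped /= clipped.sum(axis=1, keepdims=True)  # renormalise after clipping
    return float(-np.mean(np.log(clipped[np.arange(len(labels)), labels])))
```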
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
Title | Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop |
Authors | |
Abstract | |
Tasks | |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-2000/ |
PDF | https://www.aclweb.org/anthology/P19-2000 |
PWC | https://paperswithcode.com/paper/proceedings-of-the-57th-conference-of-the-1 |
Repo | |
Framework | |
Self-Adaptation for Unsupervised Domain Adaptation
Title | Self-Adaptation for Unsupervised Domain Adaptation |
Authors | Xia Cui, Danushka Bollegala |
Abstract | Lack of labelled data in the target domain for training is a common problem in domain adaptation. To overcome this problem, we propose a novel unsupervised domain adaptation method that combines projection- and self-training-based approaches. Using the labelled data from the source domain, we first learn a projection that maximises the distance among the nearest neighbours with opposite labels in the source domain. Next, we project the source domain labelled data using the learnt projection and train a classifier for the target class prediction. We then use the trained classifier to predict pseudo labels for the target domain unlabelled data. Finally, we learn a projection for the target domain as we did for the source domain using the pseudo-labelled target domain data, where we maximise the distance between nearest neighbours having opposite pseudo labels. Experiments on a standard benchmark dataset for domain adaptation show that the proposed method consistently outperforms numerous baselines and returns results competitive with the SOTA, including self-training, tri-training, and neural adaptation approaches. |
Tasks | Domain Adaptation, Unsupervised Domain Adaptation |
Published | 2019-09-01 |
URL | https://www.aclweb.org/anthology/R19-1025/ |
PDF | https://www.aclweb.org/anthology/R19-1025 |
PWC | https://paperswithcode.com/paper/self-adaptation-for-unsupervised-domain |
Repo | |
Framework | |
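A compact sketch of the self-training half of the pipeline described in the abstract above; the discriminative projection step is omitted for brevity, and the classifier choice is ours.

```python
# Sketch: train on labelled source data, pseudo-label the target domain,
# then retrain a target-domain classifier on the pseudo labels.
from sklearn.linear_model import LogisticRegression

def self_train(X_source, y_source, X_target):
    clf = LogisticRegression(max_iter=1000).fit(X_source, y_source)
    pseudo_labels = clf.predict(X_target)  # labels for unlabelled target data
    clf_target = LogisticRegression(max_iter=1000).fit(X_target, pseudo_labels)
    return clf_target, pseudo_labels
```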
Dependency-Based Self-Attention for Transformer NMT
Title | Dependency-Based Self-Attention for Transformer NMT |
Authors | Hiroyuki Deguchi, Akihiro Tamura, Takashi Ninomiya |
Abstract | In this paper, we propose a new Transformer neural machine translation (NMT) model that incorporates dependency relations into self-attention on both the source and target sides: dependency-based self-attention. The dependency-based self-attention is trained to attend to the modifiee of each token under constraints based on the dependency relations, inspired by Linguistically-Informed Self-Attention (LISA). While LISA was originally proposed for the Transformer encoder for semantic role labeling, this paper extends it to Transformer NMT by masking future information on words in the decoder-side dependency-based self-attention. Additionally, our dependency-based self-attention operates on sub-word units created by byte pair encoding. The experiments show that our model improves by 1.0 BLEU point over the baseline model on the WAT'18 Asian Scientific Paper Excerpt Corpus Japanese-to-English translation task. |
Tasks | Machine Translation, Semantic Role Labeling |
Published | 2019-09-01 |
URL | https://www.aclweb.org/anthology/R19-1028/ |
PDF | https://www.aclweb.org/anthology/R19-1028 |
PWC | https://paperswithcode.com/paper/dependency-based-self-attention-for |
Repo | |
Framework | |
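One way to picture the decoder-side constraint from the abstract above: each token's attention is supervised to point at its dependency head, except that heads in the future are masked out. The sketch below builds such a target matrix; the fallback-to-self rule for masked heads is our illustrative choice.

```python
# Sketch: causal dependency-head attention targets for a decoder.
import numpy as np

def dependency_attention_target(heads: list[int]) -> np.ndarray:
    """heads[i] is the index of token i's modifiee; the root points to itself.
    Returns an (n, n) 0/1 target matrix respecting the causal constraint."""
    n = len(heads)
    target = np.zeros((n, n))
    for i, h in enumerate(heads):
        # A decoder cannot look ahead: if the head is a future token,
        # fall back to attending to the token itself.
        target[i, h if h <= i else i] = 1.0
    return target
```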
Neural Causal Discovery with Learnable Input Noise
Title | Neural Causal Discovery with Learnable Input Noise |
Authors | Tailin Wu, Thomas Breuel, Jan Kautz |
Abstract | Learning causal relations from observational time series with nonlinear interactions and complex causal structures is a key component of human intelligence, and has a wide range of applications. Although neural nets have demonstrated their effectiveness in a variety of fields, their application to learning causal relations has been scarce. This is due both to a lack of theoretical results connecting risk minimization and causality (which would enable function approximators like neural nets to apply) and to a lack of scalability in prior causal measures, which prevents expressive function approximators from being used. In this work, we propose a novel causal measure and algorithm that use risk minimization to infer causal relations from time series. We demonstrate the effectiveness and scalability of our algorithm in learning nonlinear causal models on synthetic datasets, compared to other methods, and its effectiveness in inferring causal relations in a video game environment and in real-world heart-rate vs. breath-rate and rat brain EEG datasets. |
Tasks | Causal Discovery, EEG, Time Series |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=B14ejsA5YQ |
PDF | https://openreview.net/pdf?id=B14ejsA5YQ |
PWC | https://paperswithcode.com/paper/neural-causal-discovery-with-learnable-input |
Repo | |
Framework | |
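A hedged PyTorch sketch of the "learnable input noise" idea from the title and abstract above: each input series gets a learnable noise amplitude, and amplitudes the predictor cannot raise without hurting risk mark causal parents. The objective below is illustrative, not the paper's exact causal measure.

```python
# Sketch: risk minimization with learnable per-input noise amplitudes.
import torch

n_inputs, hidden = 8, 32
predictor = torch.nn.Sequential(
    torch.nn.Linear(n_inputs, hidden), torch.nn.ReLU(), torch.nn.Linear(hidden, 1))
log_noise = torch.nn.Parameter(torch.zeros(n_inputs))  # learnable noise scales
optimizer = torch.optim.Adam(list(predictor.parameters()) + [log_noise], lr=1e-3)

def train_step(x, y, lam=0.1):
    """x: (batch, n_inputs) past values; y: (batch, 1) future value to predict."""
    optimizer.zero_grad()
    noisy_x = x + torch.randn_like(x) * log_noise.exp()  # corrupt each input
    risk = torch.nn.functional.mse_loss(predictor(noisy_x), y)
    # Reward large noise: inputs whose noise stays small are inferred as causal.
    loss = risk - lam * log_noise.sum()
    loss.backward()
    optimizer.step()
    return loss.item()
```

After training, the inputs with the smallest learned noise scales are the ones the predictor depends on, giving a ranking of candidate causal parents.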
Explaining Adversarial Examples with Knowledge Representation
Title | Explaining Adversarial Examples with Knowledge Representation |
Authors | Xingyu Zhou, Tengyu Ma, Huahong Zhang |
Abstract | Adversarial examples are modified samples that preserve the original image structure but mislead classifiers. Researchers have put effort into developing methods for generating adversarial examples and finding out their origins. Past research paid much attention to decision-boundary changes caused by these methods. This paper, in contrast, discusses the origin of adversarial examples from a more fundamental knowledge-representation point of view. Human beings can learn and classify prototypes as well as transformations of objects, whereas neural networks store learned knowledge in a more hybrid way, combining all prototypes and transformations into a whole distribution. Hybrid storage may lead to smaller distances between different classes, so that small modifications can mislead the classifier. A one-step distribution imitation method is designed to imitate the distribution of the nearest different-class neighbor. Experiments show that simply imitating distributions from a training set, without any knowledge of the classifier, can still have an obvious impact on classification results from deep networks. This also implies that adversarial examples can take more forms than small perturbations. Potential ways of alleviating adversarial examples are discussed from the representation point of view. The first path is to change the encoding of the data sent to the training step; training data that are more prototypical can help seize more robust and accurate structural knowledge. The second path requires constructing learning frameworks with improved representations. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=BylRVjC9K7 |
PDF | https://openreview.net/pdf?id=BylRVjC9K7 |
PWC | https://paperswithcode.com/paper/explaining-adversarial-examples-with |
Repo | |
Framework | |
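A toy version of the one-step imitation described in the abstract above: nudge a sample toward its nearest neighbour from a different class. The step size and Euclidean distance are our choices, not necessarily the paper's.

```python
# Sketch: one-step imitation of the nearest different-class neighbour.
import numpy as np

def one_step_imitation(x: np.ndarray, X: np.ndarray, y: np.ndarray,
                       label: int, step: float = 0.3) -> np.ndarray:
    """x: flattened sample; X, y: training set; label: x's class index."""
    other = X[y != label]                                   # other-class samples
    nearest = other[np.argmin(np.linalg.norm(other - x, axis=1))]
    return x + step * (nearest - x)                         # partial imitation
```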
BSC Participation in the WMT Translation of Biomedical Abstracts
Title | BSC Participation in the WMT Translation of Biomedical Abstracts |
Authors | Felipe Soares, Martin Krallinger |
Abstract | This paper describes the machine translation systems developed by the Barcelona Supercomputing Center (BSC) team for the biomedical translation shared task of WMT19. Our system is based on neural machine translation, using the OpenNMT-py toolkit and the Transformer architecture. We participated in four translation directions for the English/Spanish and English/Portuguese language pairs. To create our training data, we concatenated several parallel corpora, both from in-domain and out-of-domain sources, as well as terminological resources from UMLS. |
Tasks | Machine Translation |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-5422/ |
PDF | https://www.aclweb.org/anthology/W19-5422 |
PWC | https://paperswithcode.com/paper/bsc-participation-in-the-wmt-translation-of |
Repo | |
Framework | |
Comparison of the performance of machine learning algorithms in breast cancer screening and detection: A protocol
Title | Comparison of the performance of machine learning algorithms in breast cancer screening and detection: A protocol |
Authors | Zakia Salod, Yashik Singh |
Abstract | BACKGROUND: Breast Cancer (BC) is a known global crisis. The World Health Organization reports 2.09 million incidences and 627,000 deaths globally in 2018 relating to BC. The traditional BC screening method in developed countries is mammography, whilst developing countries employ breast self-examination and clinical breast examination. The prominent gold standard for BC detection is triple assessment: i) clinical examination, ii) mammography and/or ultrasonography; and iii) Fine Needle Aspirate Cytology. However, the introduction of cheaper, efficient and non-invasive methods of BC screening and detection would be beneficial. DESIGN AND METHODS: We propose the use of eight machine learning algorithms: i) Logistic Regression; ii) Support Vector Machine; iii) K-Nearest Neighbors; iv) Decision Tree; v) Random Forest; vi) Adaptive Boosting; vii) Gradient Boosting; and viii) eXtreme Gradient Boosting, together with blood test results from the BC Coimbra Dataset (BCCD) in the University of California Irvine online database, to create models for BC prediction. To ensure the models’ robustness, we will employ: i) stratified k-fold cross-validation; ii) Correlation-based Feature Selection (CFS); and iii) parameter tuning. The models will be validated on validation and test sets of BCCD with both full and reduced feature sets, since feature reduction has an impact on algorithm performance. Seven metrics will be used for model evaluation, including accuracy. EXPECTED IMPACT OF THE STUDY FOR PUBLIC HEALTH: CFS together with the highest-performing model(s) can serve to identify specific blood tests that point towards BC, which may act as important BC biomarkers. The highest-performing model(s) may eventually be used to create an artificial intelligence tool to assist clinicians in BC screening and detection. |
Tasks | Breast Cancer Detection, Feature Selection |
Published | 2019-12-04 |
URL | https://www.jphres.org/index.php/jphres/article/view/1677 |
PWC | https://paperswithcode.com/paper/comparison-of-the-performance-of-machine |
Repo | |
Framework | |
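A runnable sketch of the protocol's evaluation loop for three of the eight listed models, using stratified 10-fold cross-validation. `load_breast_cancer` (the WDBC data bundled with scikit-learn) stands in for BCCD here, and CFS plus parameter tuning are omitted.

```python
# Sketch: stratified k-fold accuracy comparison for a subset of the models.
from sklearn.datasets import load_breast_cancer  # stand-in; the study uses BCCD
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
models = {
    "LogisticRegression": LogisticRegression(max_iter=5000),
    "RandomForest": RandomForestClassifier(random_state=0),
    "GradientBoosting": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```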
The MLLP-UPV Spanish-Portuguese and Portuguese-Spanish Machine Translation Systems for WMT19 Similar Language Translation Task
Title | The MLLP-UPV Spanish-Portuguese and Portuguese-Spanish Machine Translation Systems for WMT19 Similar Language Translation Task |
Authors | Pau Baquero-Arnal, Javier Iranzo-Sánchez, Jorge Civera, Alfons Juan |
Abstract | This paper describes the participation of the MLLP research group of the Universitat Politècnica de València in the WMT 2019 Similar Language Translation Shared Task. We submitted systems for the Portuguese ↔ Spanish language pair in both directions. Our submissions are based on the Transformer architecture as well as on a novel architecture still under development, which we call the 2D alternating RNN. We carried out domain adaptation through fine-tuning. |
Tasks | Domain Adaptation, Machine Translation |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-5423/ |
PDF | https://www.aclweb.org/anthology/W19-5423 |
PWC | https://paperswithcode.com/paper/the-mllp-upv-spanish-portuguese-and |
Repo | |
Framework | |