Paper Group NANR 101
ACE: Artificial Checkerboard Enhancer to Induce and Evade Adversarial Attacks. A Semi-Markov Structured Support Vector Machine Model for High-Precision Named Entity Recognition. Objects365: A Large-Scale, High-Quality Dataset for Object Detection. Entity resolution for noisy ASR transcripts. On Breiman’s Dilemma in Neural Networks: Success and Fail …
ACE: Artificial Checkerboard Enhancer to Induce and Evade Adversarial Attacks
Title | ACE: Artificial Checkerboard Enhancer to Induce and Evade Adversarial Attacks |
Authors | Jisung Hwang, Younghoon Kim, Sanghyuk Chun, Jaejun Yoo, Ji-Hoon Kim, Dongyoon Han, Jung-Woo Ha |
Abstract | The checkerboard phenomenon is one of the well-known visual artifacts in the computer vision field. The origins of and solutions to checkerboard artifacts in the pixel space have been studied for a long time, but their effects in the gradient space have rarely been investigated. In this paper, we revisit checkerboard artifacts in the gradient space, which turn out to be a weak point of a network architecture. We explore the image-agnostic property of gradient checkerboard artifacts and propose a simple yet effective defense method that exploits these artifacts. We introduce our defense module, dubbed the Artificial Checkerboard Enhancer (ACE), which induces adversarial attacks onto designated pixels. This enables the model to deflect attacks by shifting the image by only a single pixel, with a remarkable defense rate. We provide extensive experiments to support the effectiveness of our work for various attack scenarios using state-of-the-art attack methods. Furthermore, we show that ACE is applicable even to large-scale datasets, including the ImageNet dataset, and can easily be transferred to various pretrained networks. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=BJlc6iA5YX |
PDF | https://openreview.net/pdf?id=BJlc6iA5YX |
PWC | https://paperswithcode.com/paper/ace-artificial-checkerboard-enhancer-to |
Repo | |
Framework | |
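As a rough illustration of the single-pixel-shift defense described in the ACE abstract above: if the enhancer concentrates adversarial perturbations on a fixed pixel lattice, shifting the input by one pixel before inference misaligns the perturbation. The sketch below is ours, not the authors' code; the padding choice and function names are illustrative.

```python
# Hedged sketch (not the authors' code): deflect a lattice-aligned
# adversarial perturbation by shifting the input one pixel before inference.
import numpy as np

def shift_one_pixel(image: np.ndarray) -> np.ndarray:
    """Shift an HxWxC image one pixel down and right, edge-padding the border."""
    padded = np.pad(image, ((1, 0), (1, 0), (0, 0)), mode="edge")
    return padded[:-1, :-1, :]

def defended_predict(model_fn, image: np.ndarray):
    """model_fn is any classifier over HxWxC arrays; the shift is meant to
    misalign perturbations that ACE induces on designated pixels."""
    return model_fn(shift_one_pixel(image))
```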
A Semi-Markov Structured Support Vector Machine Model for High-Precision Named Entity Recognition
Title | A Semi-Markov Structured Support Vector Machine Model for High-Precision Named Entity Recognition |
Authors | Ravneet Arora, Chen-Tse Tsai, Ketevan Tsereteli, Prabhanjan Kambadur, Yi Yang |
Abstract | Named entity recognition (NER) is the backbone of many NLP solutions. F1 score, the harmonic mean of precision and recall, is often used to select and evaluate the best models. However, when precision needs to be prioritized over recall, a state-of-the-art model might not be the best choice. There is little in the literature that directly addresses training-time modifications for achieving higher-precision information extraction. In this paper, we propose a neural semi-Markov structured support vector machine model that controls the precision-recall trade-off by assigning weights to different types of errors in the loss-augmented inference during training. The semi-Markov property provides more accurate phrase-level predictions, thereby improving performance. We empirically demonstrate the advantage of our model when high precision is required by comparing against strong baselines based on CRFs. In our experiments with the CoNLL 2003 dataset, our model achieves a better precision-recall trade-off at various precision levels. |
Tasks | Named Entity Recognition |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-1587/ |
PDF | https://www.aclweb.org/anthology/P19-1587 |
PWC | https://paperswithcode.com/paper/a-semi-markov-structured-support-vector |
Repo | |
Framework | |
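The precision-recall control in this model comes from weighting error types inside the loss-augmented cost. A toy rendering of that idea follows; the span representation and the particular weights are illustrative, not the paper's exact scheme.

```python
# Toy sketch of a precision-biased segment cost for loss-augmented inference.
# Segments are (start, end, label) tuples; weights are illustrative.
def segment_error_cost(pred_segments, gold_segments,
                       fp_weight: float = 2.0, fn_weight: float = 1.0) -> float:
    """Penalising false-positive segments more than false negatives biases
    training-time inference, and hence the learned model, toward precision."""
    pred, gold = set(pred_segments), set(gold_segments)
    false_positives = len(pred - gold)   # predicted entity spans not in gold
    false_negatives = len(gold - pred)   # gold entity spans that were missed
    return fp_weight * false_positives + fn_weight * false_negatives
```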
Objects365: A Large-Scale, High-Quality Dataset for Object Detection
Title | Objects365: A Large-Scale, High-Quality Dataset for Object Detection |
Authors | Shuai Shao, Zeming Li, Tianyuan Zhang, Chao Peng, Gang Yu, Xiangyu Zhang, Jing Li, Jian Sun |
Abstract | In this paper, we introduce a new large-scale object detection dataset, Objects365, which has 365 object categories over 600K training images. More than 10 million high-quality bounding boxes were manually labeled through a carefully designed three-step annotation pipeline. It is the largest object detection dataset (with full annotation) so far and establishes a more challenging benchmark for the community. Objects365 can serve as a better feature-learning dataset for localization-sensitive tasks like object detection and semantic segmentation. Objects365 pre-trained models significantly outperform ImageNet pre-trained models, with a 5.6-point gain (42 vs. 36.4) under the standard 90K-iteration schedule on the COCO benchmark. Even compared with a much longer training schedule of 540K iterations, our Objects365 pretrained model at 90K iterations still shows a 2.7-point gain (42 vs. 39.3). Meanwhile, the fine-tuning time can be greatly reduced (by up to 10 times) when reaching the same accuracy. The better generalization ability of Objects365 has also been verified on CityPersons, VOC segmentation, and ADE tasks. The dataset as well as the pretrained models have been released at www.objects365.org. |
Tasks | Object Detection, Semantic Segmentation |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Shao_Objects365_A_Large-Scale_High-Quality_Dataset_for_Object_Detection_ICCV_2019_paper.html |
PDF | http://openaccess.thecvf.com/content_ICCV_2019/papers/Shao_Objects365_A_Large-Scale_High-Quality_Dataset_for_Object_Detection_ICCV_2019_paper.pdf |
PWC | https://paperswithcode.com/paper/objects365-a-large-scale-high-quality-dataset |
Repo | |
Framework | |
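A minimal sketch of the transfer setting the abstract quantifies: initialise a detector from an Objects365-pretrained checkpoint rather than ImageNet weights, then fine-tune on COCO. The API shown is torchvision's; the checkpoint filename is hypothetical (actual weights are distributed via www.objects365.org).

```python
# Hedged sketch: swap in an Objects365-pretrained checkpoint for fine-tuning.
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=None)
state = torch.load("objects365_fasterrcnn_resnet50_fpn.pth", map_location="cpu")
model.load_state_dict(state, strict=False)  # strict=False: head shapes may differ
# Fine-tune on COCO (or a custom dataset) with a standard 90K-iteration schedule.
```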
Entity resolution for noisy ASR transcripts
Title | Entity resolution for noisy ASR transcripts |
Authors | Arushi Raghuvanshi, Vijay Ramakrishnan, Varsha Embar, Lucien Carroll, Karthik Raghunathan |
Abstract | Large vocabulary domain-agnostic Automatic Speech Recognition (ASR) systems often mistranscribe domain-specific words and phrases. Since these generic ASR systems are the first component of most voice assistants in production, building Natural Language Understanding (NLU) systems that are robust to these errors can be a challenging task. In this paper, we focus on handling ASR errors in named entities, specifically person names, for a voice-based collaboration assistant. We demonstrate an effective method for resolving person names that are mistranscribed by black-box ASR systems, using character and phoneme-based information retrieval techniques and contextual information, which improves accuracy by 40.8% on our production system. We provide a live interactive demo to further illustrate the nuances of this problem and the effectiveness of our solution. |
Tasks | Entity Resolution, Information Retrieval, Speech Recognition |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-3011/ |
PDF | https://www.aclweb.org/anthology/D19-3011 |
PWC | https://paperswithcode.com/paper/entity-resolution-for-noisy-asr-transcripts |
Repo | |
Framework | |
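A self-contained sketch of the character- plus phoneme-based retrieval idea from the abstract above; the scoring mix and the toy phonetic key are our simplifications, not the production system's.

```python
# Illustrative fuzzy name resolution over a user directory, combining
# character-level and crude phonetic similarity (stdlib only).
from difflib import SequenceMatcher

def phonetic_key(name: str) -> str:
    """Very crude phonetic key: drop vowels after the first letter so that
    common ASR confusions like 'Jon'/'John' collide."""
    name = name.lower()
    return name[:1] + "".join(c for c in name[1:] if c not in "aeiou")

def resolve_name(asr_token: str, directory: list[str]) -> str:
    """Return the directory name with the best blended similarity score."""
    def score(candidate: str) -> float:
        char_sim = SequenceMatcher(None, asr_token.lower(), candidate.lower()).ratio()
        phon_sim = SequenceMatcher(None, phonetic_key(asr_token),
                                   phonetic_key(candidate)).ratio()
        return 0.5 * char_sim + 0.5 * phon_sim
    return max(directory, key=score)

# e.g. resolve_name("john smyth", ["Jon Smith", "Joan Smythe", "Jane Smart"])
```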
On Breiman’s Dilemma in Neural Networks: Success and Failure of Normalized Margins
Title | On Breiman’s Dilemma in Neural Networks: Success and Failure of Normalized Margins |
Authors | Yifei Huang, Yuan Yao, Weizhi Zhu |
Abstract | A belief has long persisted in machine learning that enlarging margins over the training data accounts for models’ resistance to overfitting by increasing robustness. Yet Breiman (1999) exhibits a dilemma: a uniform improvement in the margin distribution does not necessarily reduce the generalization error. In this paper, we revisit Breiman’s dilemma in deep neural networks using recently proposed normalized margins, where the Lipschitz constant is bounded by products of spectral norms. With both simplified theory and extensive experiments, Breiman’s dilemma is shown to depend on the dynamics of the normalized margin distributions, which reflect the trade-off between model expressive power and data complexity. When the complexity of the data is comparable to the model’s expressive power, in the sense that training and test data share similar phase transitions in normalized margin dynamics, two efficient ways are derived via classic margin-based generalization bounds to successfully predict the trend of the generalization error. On the other hand, over-expressive models that exhibit uniform improvements in training normalized margins may lose this predictive power and fail to prevent overfitting. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=Byl_ciRcY7 |
PDF | https://openreview.net/pdf?id=Byl_ciRcY7 |
PWC | https://paperswithcode.com/paper/on-breimans-dilemma-in-neural-networks |
Repo | |
Framework | |
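The normalized margin in the abstract above divides the raw classification margin by a Lipschitz bound formed from spectral-norm products. A sketch for a network of linear layers, assuming 1-Lipschitz activations such as ReLU (our simplification):

```python
# Sketch: normalized margin = raw margin / product of layer spectral norms.
import numpy as np

def spectral_norm(weight: np.ndarray) -> float:
    """Largest singular value of a weight matrix."""
    return float(np.linalg.svd(weight, compute_uv=False)[0])

def normalized_margin(logits: np.ndarray, label: int,
                      weights: list[np.ndarray]) -> float:
    """margin(x) = f_y(x) - max_{j != y} f_j(x), scaled by the Lipschitz bound."""
    others = np.delete(logits, label)
    raw_margin = logits[label] - others.max()
    lipschitz_bound = np.prod([spectral_norm(w) for w in weights])
    return raw_margin / lipschitz_bound
```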
Learning Information Propagation in the Dynamical Systems via Information Bottleneck Hierarchy
Title | Learning Information Propagation in the Dynamical Systems via Information Bottleneck Hierarchy |
Authors | Gaurav Gupta, Mohamed Ridha Znaidi, Paul Bogdan |
Abstract | Extracting relevant information, causally inferring and predicting the future states with high accuracy is a crucial task for modeling complex systems. The endeavor to address these tasks is made even more challenging when we have to deal with high-dimensional heterogeneous data streams. Such data streams often have higher-order inter-dependencies across spatial and temporal dimensions. We propose to perform a soft-clustering of the data and learn its dynamics to produce a compact dynamical model while still ensuring the original objectives of causal inference and accurate predictions. To efficiently and rigorously process the dynamics of soft-clustering, we advocate for an information theory inspired approach that incorporates stochastic calculus and seeks to determine a trade-off between the predictive accuracy and compactness of the mathematical representation. We cast the model construction as a maximization of the compression of the state variables such that the predictive ability and causal interdependence (relatedness) constraints between the original data streams and the compact model are closely bounded. We provide theoretical guarantees concerning the convergence of the proposed learning algorithm. To further test the proposed framework, we consider a high-dimensional Gaussian case study and describe an iterative scheme for updating the new model parameters. Using numerical experiments, we demonstrate the benefits on compression and prediction accuracy for a class of dynamical systems. Finally, we apply the proposed algorithm to the real-world dataset of multimodal sentiment intensity and show improvements in prediction with reduced dimensions. |
Tasks | Causal Inference |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=rJgTciR9tm |
PDF | https://openreview.net/pdf?id=rJgTciR9tm |
PWC | https://paperswithcode.com/paper/learning-information-propagation-in-the |
Repo | |
Framework | |
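The compactness-versus-prediction trade-off in the abstract above is the information-bottleneck objective; a standard single-level form is rendered below (our rendering — the paper's hierarchical version additionally imposes causal-relatedness constraints between the original data streams and the compact model).

```latex
% Single-level information-bottleneck trade-off: compress X into a compact
% representation T while preserving information about Y; \beta sets the trade-off.
\min_{p(t \mid x)} \; I(X;T) \;-\; \beta \, I(T;Y)
```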
Gendered Ambiguous Pronoun (GAP) Shared Task at the Gender Bias in NLP Workshop 2019
Title | Gendered Ambiguous Pronoun (GAP) Shared Task at the Gender Bias in NLP Workshop 2019 |
Authors | Kellie Webster, Marta R. Costa-jussà, Christian Hardmeier, Will Radford |
Abstract | The 1st ACL workshop on Gender Bias in Natural Language Processing included a shared task on gendered ambiguous pronoun (GAP) resolution. This task was based on the coreference challenge defined in Webster et al. (2018), designed to benchmark the ability of systems to resolve pronouns in real-world contexts in a gender-fair way. 263 teams competed via a Kaggle competition, with the winning system achieving logloss of 0.13667 and near gender parity. We review the approaches of eleven systems with accepted description papers, noting their effective use of BERT (Devlin et al., 2018), both via fine-tuning and for feature extraction, as well as ensembling. |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-3801/ |
PDF | https://www.aclweb.org/anthology/W19-3801 |
PWC | https://paperswithcode.com/paper/gendered-ambiguous-pronoun-gap-shared-task-at |
Repo | |
Framework | |
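Systems in this shared task were ranked by multiclass log loss over three outcomes per example (the pronoun refers to A, to B, or to neither). A minimal sketch of that metric; the clipping constant and per-row renormalisation are our choices.

```python
# Sketch of Kaggle-style multiclass log loss for three-way GAP predictions.
import numpy as np

def log_loss(probs: np.ndarray, labels: np.ndarray, eps: float = 1e-15) -> float:
    """probs: (n, 3) predicted probabilities; labels: (n,) indices in {0, 1, 2}."""
    clipped = np.clip(probs, eps, 1 - eps)
    clipped /= clipped.sum(axis=1, keepdims=True)  # renormalise after clipping
    return float(-np.mean(np.log(clipped[np.arange(len(labels)), labels])))
```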
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
Title | Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop |
Authors | |
Abstract | |
Tasks | |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-2000/ |
PDF | https://www.aclweb.org/anthology/P19-2000 |
PWC | https://paperswithcode.com/paper/proceedings-of-the-57th-conference-of-the-1 |
Repo | |
Framework | |
Self-Adaptation for Unsupervised Domain Adaptation
Title | Self-Adaptation for Unsupervised Domain Adaptation |
Authors | Xia Cui, Danushka Bollegala |
Abstract | Lack of labelled data in the target domain for training is a common problem in domain adaptation. To overcome this problem, we propose a novel unsupervised domain adaptation method that combines projection- and self-training-based approaches. Using the labelled data from the source domain, we first learn a projection that maximises the distance among the nearest neighbours with opposite labels in the source domain. Next, we project the source domain labelled data using the learnt projection and train a classifier for the target class prediction. We then use the trained classifier to predict pseudo labels for the target domain unlabelled data. Finally, we learn a projection for the target domain as we did for the source domain using the pseudo-labelled target domain data, where we maximise the distance between nearest neighbours having opposite pseudo labels. Experiments on a standard benchmark dataset for domain adaptation show that the proposed method consistently outperforms numerous baselines and returns results competitive with the SOTA, including self-training, tri-training, and neural adaptation approaches. |
Tasks | Domain Adaptation, Unsupervised Domain Adaptation |
Published | 2019-09-01 |
URL | https://www.aclweb.org/anthology/R19-1025/ |
PDF | https://www.aclweb.org/anthology/R19-1025 |
PWC | https://paperswithcode.com/paper/self-adaptation-for-unsupervised-domain |
Repo | |
Framework | |
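A compact sketch of the self-training half of the pipeline described in the abstract above; the discriminative projection step is omitted for brevity, and the classifier choice is ours.

```python
# Sketch: train on labelled source data, pseudo-label the target domain,
# then retrain a target-domain classifier on the pseudo labels.
from sklearn.linear_model import LogisticRegression

def self_train(X_source, y_source, X_target):
    clf = LogisticRegression(max_iter=1000).fit(X_source, y_source)
    pseudo_labels = clf.predict(X_target)  # labels for unlabelled target data
    clf_target = LogisticRegression(max_iter=1000).fit(X_target, pseudo_labels)
    return clf_target, pseudo_labels
```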
Dependency-Based Self-Attention for Transformer NMT
Title | Dependency-Based Self-Attention for Transformer NMT |
Authors | Hiroyuki Deguchi, Akihiro Tamura, Takashi Ninomiya |
Abstract | In this paper, we propose a new Transformer neural machine translation (NMT) model that incorporates dependency relations into self-attention on both the source and target sides: dependency-based self-attention. The dependency-based self-attention is trained to attend to the modifiee of each token under constraints based on the dependency relations, inspired by Linguistically-Informed Self-Attention (LISA). While LISA was originally proposed for the Transformer encoder for semantic role labeling, this paper extends it to Transformer NMT by masking future information on words in the decoder-side dependency-based self-attention. Additionally, our dependency-based self-attention operates on sub-word units created by byte pair encoding. The experiments show that our model improves by 1.0 BLEU point over the baseline model on the WAT'18 Asian Scientific Paper Excerpt Corpus Japanese-to-English translation task. |
Tasks | Machine Translation, Semantic Role Labeling |
Published | 2019-09-01 |
URL | https://www.aclweb.org/anthology/R19-1028/ |
PDF | https://www.aclweb.org/anthology/R19-1028 |
PWC | https://paperswithcode.com/paper/dependency-based-self-attention-for |
Repo | |
Framework | |
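One way to picture the decoder-side constraint from the abstract above: each token's attention is supervised to point at its dependency head, except that heads in the future are masked out. The sketch below builds such a target matrix; the fallback-to-self rule for masked heads is our illustrative choice.

```python
# Sketch: causal dependency-head attention targets for a decoder.
import numpy as np

def dependency_attention_target(heads: list[int]) -> np.ndarray:
    """heads[i] is the index of token i's modifiee; the root points to itself.
    Returns an (n, n) 0/1 target matrix respecting the causal constraint."""
    n = len(heads)
    target = np.zeros((n, n))
    for i, h in enumerate(heads):
        # A decoder cannot look ahead: if the head is a future token,
        # fall back to attending to the token itself.
        target[i, h if h <= i else i] = 1.0
    return target
```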
Neural Causal Discovery with Learnable Input Noise
Title | Neural Causal Discovery with Learnable Input Noise |
Authors | Tailin Wu, Thomas Breuel, Jan Kautz |
Abstract | Learning causal relations from observational time series with nonlinear interactions and complex causal structures is a key component of human intelligence, and has a wide range of applications. Although neural nets have demonstrated their effectiveness in a variety of fields, their application to learning causal relations has been scarce. This is due both to a lack of theoretical results connecting risk minimization and causality (which would enable function approximators like neural nets to apply) and to a lack of scalability in prior causal measures, which prevents expressive function approximators from being used. In this work, we propose a novel causal measure and algorithm that use risk minimization to infer causal relations from time series. We demonstrate the effectiveness and scalability of our algorithm in learning nonlinear causal models on synthetic datasets, compared to other methods, and its effectiveness in inferring causal relations in a video game environment and in real-world heart-rate vs. breath-rate and rat brain EEG datasets. |
Tasks | Causal Discovery, EEG, Time Series |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=B14ejsA5YQ |
PDF | https://openreview.net/pdf?id=B14ejsA5YQ |
PWC | https://paperswithcode.com/paper/neural-causal-discovery-with-learnable-input |
Repo | |
Framework | |
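A hedged PyTorch sketch of the "learnable input noise" idea from the title and abstract above: each input series gets a learnable noise amplitude, and amplitudes the predictor cannot raise without hurting risk mark causal parents. The objective below is illustrative, not the paper's exact causal measure.

```python
# Sketch: risk minimization with learnable per-input noise amplitudes.
import torch

n_inputs, hidden = 8, 32
predictor = torch.nn.Sequential(
    torch.nn.Linear(n_inputs, hidden), torch.nn.ReLU(), torch.nn.Linear(hidden, 1))
log_noise = torch.nn.Parameter(torch.zeros(n_inputs))  # learnable noise scales
optimizer = torch.optim.Adam(list(predictor.parameters()) + [log_noise], lr=1e-3)

def train_step(x, y, lam=0.1):
    """x: (batch, n_inputs) past values; y: (batch, 1) future value to predict."""
    optimizer.zero_grad()
    noisy_x = x + torch.randn_like(x) * log_noise.exp()  # corrupt each input
    risk = torch.nn.functional.mse_loss(predictor(noisy_x), y)
    # Reward large noise: inputs whose noise stays small are inferred as causal.
    loss = risk - lam * log_noise.sum()
    loss.backward()
    optimizer.step()
    return loss.item()
```

After training, the inputs with the smallest learned noise scales are the ones the predictor depends on, giving a ranking of candidate causal parents.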
Explaining Adversarial Examples with Knowledge Representation
Title | Explaining Adversarial Examples with Knowledge Representation |
Authors | Xingyu Zhou, Tengyu Ma, Huahong Zhang |
Abstract | Adversarial examples are modified samples that preserve the original image structure but mislead classifiers. Researchers have put effort into developing methods for generating adversarial examples and finding out their origins. Past research paid much attention to decision-boundary changes caused by these methods. This paper, in contrast, discusses the origin of adversarial examples from a more fundamental knowledge-representation point of view. Human beings can learn and classify prototypes as well as transformations of objects, whereas neural networks store learned knowledge in a more hybrid way, combining all prototypes and transformations into a whole distribution. Hybrid storage may lead to smaller distances between different classes, so that small modifications can mislead the classifier. A one-step distribution imitation method is designed to imitate the distribution of the nearest different-class neighbor. Experiments show that simply imitating distributions from a training set, without any knowledge of the classifier, can still have an obvious impact on classification results from deep networks. This also implies that adversarial examples can take more forms than small perturbations. Potential ways of alleviating adversarial examples are discussed from the representation point of view. The first path is to change the encoding of the data sent to the training step; training data that are more prototypical can help seize more robust and accurate structural knowledge. The second path requires constructing learning frameworks with improved representations. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=BylRVjC9K7 |
PDF | https://openreview.net/pdf?id=BylRVjC9K7 |
PWC | https://paperswithcode.com/paper/explaining-adversarial-examples-with |
Repo | |
Framework | |
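A toy version of the one-step imitation described in the abstract above: nudge a sample toward its nearest neighbour from a different class. The step size and Euclidean distance are our choices, not necessarily the paper's.

```python
# Sketch: one-step imitation of the nearest different-class neighbour.
import numpy as np

def one_step_imitation(x: np.ndarray, X: np.ndarray, y: np.ndarray,
                       label: int, step: float = 0.3) -> np.ndarray:
    """x: flattened sample; X, y: training set; label: x's class index."""
    other = X[y != label]                                   # other-class samples
    nearest = other[np.argmin(np.linalg.norm(other - x, axis=1))]
    return x + step * (nearest - x)                         # partial imitation
```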
BSC Participation in the WMT Translation of Biomedical Abstracts
Title | BSC Participation in the WMT Translation of Biomedical Abstracts |
Authors | Felipe Soares, Martin Krallinger |
Abstract | This paper describes the machine translation systems developed by the Barcelona Supercomputing Center (BSC) team for the biomedical translation shared task of WMT19. Our system is based on neural machine translation, using the OpenNMT-py toolkit and the Transformer architecture. We participated in four translation directions for the English/Spanish and English/Portuguese language pairs. To create our training data, we concatenated several parallel corpora, both from in-domain and out-of-domain sources, as well as terminological resources from UMLS. |
Tasks | Machine Translation |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-5422/ |
PDF | https://www.aclweb.org/anthology/W19-5422 |
PWC | https://paperswithcode.com/paper/bsc-participation-in-the-wmt-translation-of |
Repo | |
Framework | |
Comparison of the performance of machine learning algorithms in breast cancer screening and detection: A protocol
Title | Comparison of the performance of machine learning algorithms in breast cancer screening and detection: A protocol |
Authors | Zakia Salod, Yashik Singh |
Abstract | BACKGROUND: Breast Cancer (BC) is a known global crisis. The World Health Organization reports 2.09 million incidences and 627,000 deaths globally in 2018 relating to BC. The traditional BC screening method in developed countries is mammography, whilst developing countries employ breast self-examination and clinical breast examination. The prominent gold standard for BC detection is triple assessment: i) clinical examination, ii) mammography and/or ultrasonography; and iii) Fine Needle Aspirate Cytology. However, the introduction of cheaper, efficient and non-invasive methods of BC screening and detection would be beneficial. DESIGN AND METHODS: We propose the use of eight machine learning algorithms: i) Logistic Regression; ii) Support Vector Machine; iii) K-Nearest Neighbors; iv) Decision Tree; v) Random Forest; vi) Adaptive Boosting; vii) Gradient Boosting; and viii) eXtreme Gradient Boosting, together with blood test results from the BC Coimbra Dataset (BCCD) in the University of California Irvine online database, to create models for BC prediction. To ensure the models’ robustness, we will employ: i) stratified k-fold cross-validation; ii) Correlation-based Feature Selection (CFS); and iii) parameter tuning. The models will be validated on validation and test sets of BCCD with both full and reduced feature sets, since feature reduction has an impact on algorithm performance. Seven metrics will be used for model evaluation, including accuracy. EXPECTED IMPACT OF THE STUDY FOR PUBLIC HEALTH: CFS together with the highest-performing model(s) can serve to identify specific blood tests that point towards BC, which may act as important BC biomarkers. The highest-performing model(s) may eventually be used to create an artificial intelligence tool to assist clinicians in BC screening and detection. |
Tasks | Breast Cancer Detection, Feature Selection |
Published | 2019-12-04 |
URL | https://www.jphres.org/index.php/jphres/article/view/1677 |
PWC | https://paperswithcode.com/paper/comparison-of-the-performance-of-machine |
Repo | |
Framework | |
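A runnable sketch of the protocol's evaluation loop for three of the eight listed models, using stratified 10-fold cross-validation. `load_breast_cancer` (the WDBC data bundled with scikit-learn) stands in for BCCD here, and CFS plus parameter tuning are omitted.

```python
# Sketch: stratified k-fold accuracy comparison for a subset of the models.
from sklearn.datasets import load_breast_cancer  # stand-in; the study uses BCCD
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
models = {
    "LogisticRegression": LogisticRegression(max_iter=5000),
    "RandomForest": RandomForestClassifier(random_state=0),
    "GradientBoosting": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```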
The MLLP-UPV Spanish-Portuguese and Portuguese-Spanish Machine Translation Systems for WMT19 Similar Language Translation Task
Title | The MLLP-UPV Spanish-Portuguese and Portuguese-Spanish Machine Translation Systems for WMT19 Similar Language Translation Task |
Authors | Pau Baquero-Arnal, Javier Iranzo-Sánchez, Jorge Civera, Alfons Juan |
Abstract | This paper describes the participation of the MLLP research group of the Universitat Politècnica de València in the WMT 2019 Similar Language Translation Shared Task. We submitted systems for the Portuguese ↔ Spanish language pair in both directions. Our submissions are based on the Transformer architecture as well as on a novel architecture still under development, which we call the 2D alternating RNN. We carried out domain adaptation through fine-tuning. |
Tasks | Domain Adaptation, Machine Translation |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-5423/ |
PDF | https://www.aclweb.org/anthology/W19-5423 |
PWC | https://paperswithcode.com/paper/the-mllp-upv-spanish-portuguese-and |
Repo | |
Framework | |