January 25, 2020

Paper Group NANR 78

AGRR 2019: Corpus for Gapping Resolution in Russian

Title AGRR 2019: Corpus for Gapping Resolution in Russian
Authors Maria Ponomareva, Kira Droganova, Ivan Smurov, Tatiana Shavrina
Abstract This paper provides a comprehensive overview of the gapping dataset for Russian that consists of 7.5k sentences with gapping (as well as 15k relevant negative sentences) and comprises data from various genres: news, fiction, social media and technical texts. The dataset was prepared for the Automatic Gapping Resolution Shared Task for Russian (AGRR-2019) - a competition aimed at stimulating the development of NLP tools and methods for processing of ellipsis. In this paper, we pay special attention to the gapping resolution methods that were introduced within the shared task as well as an alternative test set that illustrates that our corpus is a diverse and representative subset of Russian language gapping sufficient for effective utilization of machine learning techniques.
Tasks
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-3705/
PDF https://www.aclweb.org/anthology/W19-3705
PWC https://paperswithcode.com/paper/agrr-2019-corpus-for-gapping-resolution-in
Repo
Framework

Segmentation for Domain Adaptation in Arabic

Title Segmentation for Domain Adaptation in Arabic
Authors Mohammed Attia, Ali Elkahky
Abstract Segmentation serves as an integral part in many NLP applications including Machine Translation, Parsing, and Information Retrieval. When a model trained on the standard language is applied to dialects, the accuracy drops dramatically. However, there are more lexical items shared by the standard language and dialects than can be found by mere surface word matching. This shared lexicon is obscured by a lot of cliticization, gemination, and character repetition. In this paper, we prove that segmentation and base normalization of dialects can help in domain adaptation by reducing data sparseness. Segmentation will improve a system's performance by reducing the number of OOVs, help isolate the differences and allow better utilization of the commonalities. We show that adding a small amount of dialectal segmentation training data reduces OOVs by 5% and remarkably improves POS tagging for dialects by 7.37% f-score, even though no dialect-specific POS training data is included.
Tasks Domain Adaptation, Information Retrieval, Machine Translation
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-4613/
PDF https://www.aclweb.org/anthology/W19-4613
PWC https://paperswithcode.com/paper/segmentation-for-domain-adaptation-in-arabic
Repo
Framework
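
The OOV reduction described in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `oov_rate` and `segment` helpers, the single-prefix toy segmenter, and the transliterated tokens are hypothetical and are not taken from the paper, whose segmenter is a trained model.

```python
def oov_rate(tokens, vocabulary):
    """Fraction of tokens that do not appear in the training vocabulary."""
    if not tokens:
        return 0.0
    unseen = sum(1 for tok in tokens if tok not in vocabulary)
    return unseen / len(tokens)


def segment(token, prefixes):
    """Toy segmenter (hypothetical): split off one known prefix clitic."""
    for prefix in prefixes:
        if token.startswith(prefix) and len(token) > len(prefix):
            return [prefix + "+", token[len(prefix):]]
    return [token]


# Hypothetical transliterated examples, not taken from the paper.
VOCAB = {"ktab", "byt", "w+"}
PREFIXES = ["w"]

surface = ["wktab", "byt", "wbyt"]
segmented = [piece for tok in surface for piece in segment(tok, PREFIXES)]

print("OOV rate on surface forms:", oov_rate(surface, VOCAB))
print("OOV rate after segmentation:", oov_rate(segmented, VOCAB))
```

In this toy setup the cliticized forms are unseen as whole words, but their pieces are all in the vocabulary, which is the sparseness effect the abstract refers to. A naive prefix rule like this one would also mis-split words that merely begin with the same letter, so it should not be read as a usable segmenter.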

Distinguishability of Adversarial Examples

Title Distinguishability of Adversarial Examples
Authors Yi Qin, Ryan Hunt, Chuan Yue
Abstract Machine learning models including traditional models and neural networks can be easily fooled by adversarial examples which are generated from the natural examples with small perturbations. This poses a critical challenge to machine learning security, and impedes the wide application of machine learning in many important domains such as computer vision and malware detection. Unfortunately, even state-of-the-art defense approaches such as adversarial training and defensive distillation still suffer from major limitations and can be circumvented. From a unique angle, we propose to investigate two important research questions in this paper: Are adversarial examples distinguishable from natural examples? Are adversarial examples generated by different methods distinguishable from each other? These two questions concern the distinguishability of adversarial examples. Answering them will potentially lead to a simple yet effective approach, termed as defensive distinction in this paper under the formulation of multi-label classification, for protecting against adversarial examples. We design and perform experiments using the MNIST dataset to investigate these two questions, and obtain highly positive results demonstrating the strong distinguishability of adversarial examples. We recommend that this unique defensive distinction approach should be seriously considered to complement other defense approaches.
Tasks Malware Detection, Multi-Label Classification
Published 2019-05-01
URL https://openreview.net/forum?id=r1glehC5tQ
PDF https://openreview.net/pdf?id=r1glehC5tQ
PWC https://paperswithcode.com/paper/distinguishability-of-adversarial-examples
Repo
Framework

Studying Summarization Evaluation Metrics in the Appropriate Scoring Range

Title Studying Summarization Evaluation Metrics in the Appropriate Scoring Range
Authors Maxime Peyrard
Abstract In summarization, automatic evaluation metrics are usually compared based on their ability to correlate with human judgments. Unfortunately, the few existing human judgment datasets have been created as by-products of the manual evaluations performed during the DUC/TAC shared tasks. However, modern systems are typically better than the best systems submitted at the time of these shared tasks. We show that, surprisingly, evaluation metrics which behave similarly on these datasets (average-scoring range) strongly disagree in the higher-scoring range in which current systems now operate. It is problematic because metrics disagree yet we can't decide which one to trust. This is a call for collecting human judgments for high-scoring summaries as this would resolve the debate over which metrics to trust. This would also be greatly beneficial to further improve summarization systems and metrics alike.
Tasks
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-1502/
PDF https://www.aclweb.org/anthology/P19-1502
PWC https://paperswithcode.com/paper/studying-summarization-evaluation-metrics-in
Repo
Framework
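
One way to make the idea of metrics agreeing in one scoring range and disagreeing in another concrete is to compute a rank correlation between two metrics, first over all summaries and then only over the high-scoring ones. The sketch below is an assumption about how such a comparison could be set up, not the paper's protocol: it uses Kendall's tau-a implemented by hand, and `agreement_above` with its `threshold` parameter is a hypothetical helper.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    pairs = list(combinations(range(len(xs)), 2))
    if not pairs:
        return 0.0
    total = 0
    for i, j in pairs:
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if product > 0:
            total += 1  # both metrics order the pair the same way
        elif product < 0:
            total -= 1  # the metrics order the pair differently
    return total / len(pairs)


def agreement_above(metric_a, metric_b, threshold):
    """Tau restricted to summaries that metric_a scores at or above threshold."""
    kept = [(a, b) for a, b in zip(metric_a, metric_b) if a >= threshold]
    return kendall_tau([a for a, _ in kept], [b for _, b in kept])
```

Calling `kendall_tau` on the full score lists and `agreement_above` with a high threshold gives the two numbers to compare. Tied scores count as neither concordant nor discordant in this variant.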

Dataset Mention Extraction and Classification

Title Dataset Mention Extraction and Classification
Authors Animesh Prasad, Chenglei Si, Min-Yen Kan
Abstract Datasets are integral artifacts of empirical scientific research. However, due to natural language variation, their recognition can be difficult and even when identified, they can often be inconsistently referred to across and within publications. We report our approach to the Coleridge Initiative's Rich Context Competition, which tasks participants with identifying dataset surface forms (dataset mention extraction) and associating the extracted mention to its referred dataset (dataset classification). In this work, we propose various neural baselines and evaluate these models on one-plus and zero-shot classification scenarios. We further explore various joint learning approaches - exploring the synergy between the tasks - and report the issues with such techniques.
Tasks Zero-Shot Learning
Published 2019-06-01
URL https://www.aclweb.org/anthology/W19-2604/
PDF https://www.aclweb.org/anthology/W19-2604
PWC https://paperswithcode.com/paper/dataset-mention-extraction-and-classification
Repo
Framework

Translating Between Morphologically Rich Languages: An Arabic-to-Turkish Machine Translation System

Title Translating Between Morphologically Rich Languages: An Arabic-to-Turkish Machine Translation System
Authors İlknur Durgar El-Kahlout, Emre Bektaş, Naime Şeyma Erdem, Hamza Kaya
Abstract This paper introduces the work on building a machine translation system for Arabic-to-Turkish in the news domain. Our work includes collecting parallel datasets in several ways for a new and low-resourced language pair, building baseline systems with state-of-the-art architectures and developing language specific algorithms for better translation. Parallel datasets are mainly collected in three different ways: i) translating Arabic texts into Turkish by professional translators, ii) exploiting the web for open-source Arabic-Turkish parallel texts, iii) using back-translation. We performed preliminary experiments for Arabic-to-Turkish machine translation with neural (Marian) machine translation tools with a novel morphologically motivated vocabulary reduction method.
Tasks Machine Translation
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-4617/
PDF https://www.aclweb.org/anthology/W19-4617
PWC https://paperswithcode.com/paper/translating-between-morphologically-rich
Repo
Framework

Towards Unsupervised Text Classification Leveraging Experts and Word Embeddings

Title Towards Unsupervised Text Classification Leveraging Experts and Word Embeddings
Authors Zied Haj-Yahia, Adrien Sieg, Léa A. Deleris
Abstract Text classification aims at mapping documents into a set of predefined categories. Supervised machine learning models have shown great success in this area but they require a large number of labeled documents to reach adequate accuracy. This is particularly true when the number of target categories is in the tens or the hundreds. In this work, we explore an unsupervised approach to classify documents into categories simply described by a label. The proposed method is inspired by the way a human proceeds in this situation: It draws on textual similarity between the most relevant words in each document and a dictionary of keywords for each category reflecting its semantics and lexical field. The novelty of our method hinges on the enrichment of the category labels through a combination of human expertise and language models, both generic and domain specific. Our experiments on 5 standard corpora show that the proposed method increases F1-score over relying solely on human expertise and can also be on par with simple supervised approaches. It thus provides a practical alternative to situations where low cost text categorization is needed, as we illustrate with our application to operational risk incidents classification.
Tasks Text Categorization, Text Classification, Word Embeddings
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-1036/
PDF https://www.aclweb.org/anthology/P19-1036
PWC https://paperswithcode.com/paper/towards-unsupervised-text-classification
Repo
Framework
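
The core idea in the abstract above, scoring a document against a keyword dictionary per category, can be sketched as follows. This is a simplified stand-in under stated assumptions: it uses plain bag-of-words cosine similarity instead of word embeddings, skips the selection of the most relevant words, and the `classify` helper and `CATEGORY_KEYWORDS` lists are hypothetical examples rather than the paper's dictionaries.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two Counter bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(document, category_keywords):
    """Return the best-matching category and the score for every category."""
    doc_vec = Counter(document.lower().split())
    scores = {
        category: cosine(doc_vec, Counter(keywords))
        for category, keywords in category_keywords.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical keyword dictionaries. The paper enriches such lists with
# human expertise and language models, which this sketch does not do.
CATEGORY_KEYWORDS = {
    "fraud": ["fraud", "theft", "forged", "stolen"],
    "systems": ["outage", "server", "software", "failure"],
}

label, scores = classify(
    "The server outage was caused by a software failure", CATEGORY_KEYWORDS
)
print(label, scores)
```

No labeled training documents are involved: the only supervision is the keyword list attached to each category label, which is why the quality of those lists matters so much in this setting.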

RainFlow: Optical Flow Under Rain Streaks and Rain Veiling Effect

Title RainFlow: Optical Flow Under Rain Streaks and Rain Veiling Effect
Authors Ruoteng Li, Robby T. Tan, Loong-Fah Cheong, Angelica I. Aviles-Rivero, Qingnan Fan, Carola-Bibiane Schonlieb
Abstract Optical flow in heavy rainy scenes is challenging due to the presence of both rain streaks and rain veiling effect, which break the existing optical flow constraints. Concerning this, we propose a deep-learning based optical flow method designed to handle heavy rain. We introduce a feature multiplier in our network that transforms the features of an image affected by the rain veiling effect into features that are less affected by it, which we call veiling-invariant features. We establish a new mapping operation in the feature space to produce streak-invariant features. The operation is based on a feature pyramid structure of the input images, and the basic idea is to preserve the chromatic features of the background scenes while canceling the rain-streak patterns. Both the veiling-invariant and streak-invariant features are computed and optimized automatically based on the accuracy of our optical flow estimation. Our network is end-to-end, and handles both rain streaks and the veiling effect in an integrated framework. Extensive experiments show the effectiveness of our method, which outperforms the state-of-the-art method and other baseline methods. We also show that our network can robustly maintain good performance on clean (no rain) images even though it is trained under rain image data.
Tasks Optical Flow Estimation
Published 2019-10-01
URL http://openaccess.thecvf.com/content_ICCV_2019/html/Li_RainFlow_Optical_Flow_Under_Rain_Streaks_and_Rain_Veiling_Effect_ICCV_2019_paper.html
PDF http://openaccess.thecvf.com/content_ICCV_2019/papers/Li_RainFlow_Optical_Flow_Under_Rain_Streaks_and_Rain_Veiling_Effect_ICCV_2019_paper.pdf
PWC https://paperswithcode.com/paper/rainflow-optical-flow-under-rain-streaks-and
Repo
Framework

A multi-constraint structured hinge loss for named-entity recognition

Title A multi-constraint structured hinge loss for named-entity recognition
Authors Hanieh Poostchi, Massimo Piccardi
Abstract
Tasks Named Entity Recognition
Published 2019-04-01
URL https://www.aclweb.org/anthology/U19-1006/
PDF https://www.aclweb.org/anthology/U19-1006
PWC https://paperswithcode.com/paper/a-multi-constraint-structured-hinge-loss-for
Repo
Framework

AUTOHOME-ORCA at SemEval-2019 Task 8: Application of BERT for Fact-Checking in Community Forums

Title AUTOHOME-ORCA at SemEval-2019 Task 8: Application of BERT for Fact-Checking in Community Forums
Authors Zhengwei Lv, Duoxing Liu, Haifeng Sun, Xiao Liang, Tao Lei, Zhizhong Shi, Feng Zhu, Lei Yang
Abstract Fact checking is an important task for maintaining high quality posts and improving user experience in Community Question Answering forums. Therefore, the SemEval-2019 task 8 aims to identify factual questions (subtask A) and detect true factual information from corresponding answers (subtask B). In order to address this task, we propose a system based on the BERT model with meta information of questions. For subtask A, the outputs of a fine-tuned BERT classification model are combined with the feature of length of questions to boost the performance. For subtask B, the predictions of several variants of the BERT model encoding the meta information are combined to create an ensemble model. Our system achieved competitive results with an accuracy of 0.82 in subtask A and 0.83 in subtask B. The experimental results validate the effectiveness of our system.
Tasks Community Question Answering, Question Answering
Published 2019-06-01
URL https://www.aclweb.org/anthology/S19-2150/
PDF https://www.aclweb.org/anthology/S19-2150
PWC https://paperswithcode.com/paper/autohome-orca-at-semeval-2019-task-8
Repo
Framework

AutoAugment: Learning Augmentation Strategies From Data

Title AutoAugment: Learning Augmentation Strategies From Data
Authors Ekin D. Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, Quoc V. Le
Abstract Data augmentation is an effective technique for improving the accuracy of modern image classifiers. However, current data augmentation implementations are manually designed. In this paper, we describe a simple procedure called AutoAugment to automatically search for improved data augmentation policies. In our implementation, we have designed a search space where a policy consists of many sub-policies, one of which is randomly chosen for each image in each mini-batch. A sub-policy consists of two operations, each operation being an image processing function such as translation, rotation, or shearing, and the probabilities and magnitudes with which the functions are applied. We use a search algorithm to find the best policy such that the neural network yields the highest validation accuracy on a target dataset. Our method achieves state-of-the-art accuracy on CIFAR-10, CIFAR-100, SVHN, and ImageNet (without additional data). On ImageNet, we attain a Top-1 accuracy of 83.5% which is 0.4% better than the previous record of 83.1%. On CIFAR-10, we achieve an error rate of 1.5%, which is 0.6% better than the previous state-of-the-art. Augmentation policies we find are transferable between datasets. The policy learned on ImageNet transfers well to achieve significant improvements on other datasets, such as Oxford Flowers, Caltech-101, Oxford-IIIT Pets, FGVC Aircraft, and Stanford Cars.
Tasks Data Augmentation
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Cubuk_AutoAugment_Learning_Augmentation_Strategies_From_Data_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Cubuk_AutoAugment_Learning_Augmentation_Strategies_From_Data_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/autoaugment-learning-augmentation-strategies
Repo
Framework
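
The policy structure described in the abstract above (a policy made of sub-policies, each holding two operations with a probability and a magnitude, and one sub-policy drawn per image) can be sketched in a few lines. This is an illustrative sketch only: the toy list-of-lists images, the `translate_x` and `translate_y` operations, and the values in `POLICY` are assumptions, and the search procedure that finds a good policy is not shown.

```python
import random


# Hypothetical toy operations on a 2D list-of-lists "image". A real
# pipeline would use an image library and a richer set of transforms.
def translate_x(img, magnitude):
    """Shift every row right by `magnitude` pixels, padding with zeros."""
    if magnitude <= 0:
        return [row[:] for row in img]
    width = len(img[0])
    k = min(magnitude, width)
    return [[0] * k + row[:width - k] for row in img]


def translate_y(img, magnitude):
    """Shift rows down by `magnitude` pixels, padding with zero rows."""
    if magnitude <= 0:
        return [row[:] for row in img]
    height, width = len(img), len(img[0])
    k = min(magnitude, height)
    return [[0] * width for _ in range(k)] + [row[:] for row in img[:height - k]]


OPS = {"translate_x": translate_x, "translate_y": translate_y}


def apply_sub_policy(img, sub_policy, rng):
    """Apply each (op_name, probability, magnitude) triple in order."""
    for op_name, prob, magnitude in sub_policy:
        if rng.random() < prob:
            img = OPS[op_name](img, magnitude)
    return img


def augment_batch(batch, policy, rng):
    """Pick one sub-policy at random for each image in the mini-batch."""
    return [apply_sub_policy(img, rng.choice(policy), rng) for img in batch]


# Illustrative policy: the values are made up, not the result of a search.
POLICY = [
    [("translate_x", 1.0, 1), ("translate_y", 0.0, 1)],
    [("translate_y", 1.0, 1), ("translate_x", 0.0, 1)],
]
```

A call such as `augment_batch(images, POLICY, random.Random(0))` returns one augmented copy per input image. Passing the random generator explicitly keeps the augmentation reproducible, which is convenient when comparing candidate policies.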

BeautyGlow: On-Demand Makeup Transfer Framework With Reversible Generative Network

Title BeautyGlow: On-Demand Makeup Transfer Framework With Reversible Generative Network
Authors Hung-Jen Chen, Ka-Ming Hui, Szu-Yu Wang, Li-Wu Tsao, Hong-Han Shuai, Wen-Huang Cheng
Abstract As makeup has been widely adopted for beautification, finding suitable makeup with virtual makeup applications has become popular. Therefore, a recent line of studies proposes to transfer the makeup from a given reference makeup image to the source non-makeup one. However, it is still challenging due to the massive number of makeup combinations. To facilitate on-demand makeup transfer, in this work, we propose BeautyGlow, which decomposes the latent vectors of face images derived from the Glow model into makeup and non-makeup latent vectors. Since there is no paired dataset, we formulate a new loss function to guide the decomposition. Afterward, the non-makeup latent vector of a source image and the makeup latent vector of a reference image are effectively combined and reverted back to the image domain to derive the results. Experimental results show that the transfer quality of BeautyGlow is comparable to the state-of-the-art methods, while the unique ability to manipulate latent vectors allows BeautyGlow to realize on-demand makeup transfer.
Tasks
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Chen_BeautyGlow_On-Demand_Makeup_Transfer_Framework_With_Reversible_Generative_Network_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Chen_BeautyGlow_On-Demand_Makeup_Transfer_Framework_With_Reversible_Generative_Network_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/beautyglow-on-demand-makeup-transfer
Repo
Framework

Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection

Title Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection
Authors Tue Le, Tuan Nguyen, Trung Le, Dinh Phung, Paul Montague, Olivier De Vel, Lizhen Qu
Abstract Due to the sharp increase in the severity of the threat imposed by software vulnerabilities, the detection of vulnerabilities in binary code has become an important concern in the software industry, such as the embedded systems industry, and in the field of computer security. However, most of the work in binary code vulnerability detection has relied on handcrafted features which are manually chosen by a select few, knowledgeable domain experts. In this paper, we attempt to alleviate this severe binary vulnerability detection bottleneck by leveraging recent advances in deep learning representations and propose the Maximal Divergence Sequential Auto-Encoder. In particular, latent codes representing vulnerable and non-vulnerable binaries are encouraged to be maximally divergent, while still being able to maintain crucial information from the original binaries. We conducted extensive experiments to compare and contrast our proposed methods with the baselines, and the results show that our proposed methods outperform the baselines in all performance measures of interest.
Tasks Vulnerability Detection
Published 2019-05-01
URL https://openreview.net/forum?id=ByloIiCqYQ
PDF https://openreview.net/pdf?id=ByloIiCqYQ
PWC https://paperswithcode.com/paper/maximal-divergence-sequential-autoencoder-for
Repo
Framework

Latent Domain Transfer: Crossing modalities with Bridging Autoencoders

Title Latent Domain Transfer: Crossing modalities with Bridging Autoencoders
Authors Yingtao Tian, Jesse Engel
Abstract Domain transfer is an exciting and challenging branch of machine learning because models must learn to smoothly transfer between domains, preserving local variations and capturing many aspects of variation without labels. However, most successful applications to date require the two domains to be closely related (e.g. image-to-image, video-to-video), utilizing similar or shared networks to transform domain specific properties like texture, coloring, and line shapes. Here, we demonstrate that it is possible to transfer across modalities (e.g. image-to-audio) by first abstracting the data with latent generative models and then learning transformations between latent spaces. We find that a simple variational autoencoder is able to learn a shared latent space to bridge between two generative models in an unsupervised fashion, and even between different types of models (e.g. a variational autoencoder and a generative adversarial network). We can further impose desired semantic alignment of attributes with a linear classifier in the shared latent space. The proposed variational autoencoder enables preserving both locality and semantic alignment through the transfer process, as shown in the qualitative and quantitative evaluations. Finally, the hierarchical structure decouples the cost of training the base generative models and semantic alignments, enabling computationally efficient and data efficient retraining of personalized mapping functions.
Tasks
Published 2019-05-01
URL https://openreview.net/forum?id=r1xrb3CqtQ
PDF https://openreview.net/pdf?id=r1xrb3CqtQ
PWC https://paperswithcode.com/paper/latent-domain-transfer-crossing-modalities
Repo
Framework

Boosting Dialog Response Generation

Title Boosting Dialog Response Generation
Authors Wenchao Du, Alan W Black
Abstract Neural models have become one of the most important approaches to dialog response generation. However, they still tend to generate the most common and generic responses in the corpus all the time. To address this problem, we designed an iterative training process and ensemble method based on boosting. We combined our method with different training and decoding paradigms as the base model, including mutual-information-based decoding and reward-augmented maximum likelihood learning. Empirical results show that our approach can significantly improve the diversity and relevance of the responses generated by all base models, backed by objective measurements and human evaluation.
Tasks
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-1005/
PDF https://www.aclweb.org/anthology/P19-1005
PWC https://paperswithcode.com/paper/boosting-dialog-response-generation
Repo
Framework