October 15, 2019


Paper Group NANR 186


STACC, OOV Density and N-gram Saturation: Vicomtech’s Participation in the WMT 2018 Shared Task on Parallel Corpus Filtering


Title	STACC, OOV Density and N-gram Saturation: Vicomtech’s Participation in the WMT 2018 Shared Task on Parallel Corpus Filtering
Authors	Andoni Azpeitia, Thierry Etchegoyhen, Eva Martínez Garcia
Abstract	We describe Vicomtech’s participation in the WMT 2018 Shared Task on parallel corpus filtering. We aimed to evaluate a simple approach to the task, which can efficiently process large volumes of data and can be easily deployed for new datasets in different language pairs and domains. We based our approach on STACC, an efficient and portable method for parallel sentence identification in comparable corpora. To address the specifics of the corpus filtering task, which features significant volumes of noisy data, the core method was expanded with a penalty based on the amount of unknown words in sentence pairs. Additionally, we experimented with a complementary data saturation method based on source sentence n-grams, with the goal of demoting parallel sentence pairs that do not contribute significant amounts of yet unobserved n-grams. Our approach requires no prior training and is highly efficient on the type of large datasets featured in the corpus filtering task. We achieved competitive results with this simple and portable method, ranking in the top half among competing systems overall.
Tasks	Machine Translation, Outlier Detection
Published	2018-10-01
URL	https://www.aclweb.org/anthology/W18-6473/
PDF	https://www.aclweb.org/anthology/W18-6473
PWC	https://paperswithcode.com/paper/stacc-oov-density-and-n-gram-saturation
Repo
Framework
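
The n-gram saturation idea described in the abstract above can be illustrated with a minimal Python sketch. This is not Vicomtech's implementation: the function names `ngrams` and `saturation_filter`, the whitespace tokenization, the bigram order, and the `min_new_ratio` threshold are all assumptions chosen for illustration. The sketch assumes the pairs arrive already sorted by some alignment score (best first) and demotes a pair when too few of its source-side n-grams are still unobserved.

```python
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of n-grams of order n found in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def saturation_filter(
    pairs: Iterable[Pair], n: int = 2, min_new_ratio: float = 0.5
) -> Tuple[List[Pair], List[Pair]]:
    """Split pairs into (kept, demoted) by the share of unseen source n-grams.

    Hypothetical rule: a pair is kept only if at least `min_new_ratio` of its
    source n-grams have not been observed in previously kept pairs.
    """
    seen: Set[Tuple[str, ...]] = set()
    kept: List[Pair] = []
    demoted: List[Pair] = []
    for src, tgt in pairs:
        grams = ngrams(src.split(), n)  # whitespace tokenization (assumption)
        new = grams - seen
        ratio = len(new) / len(grams) if grams else 0.0
        if ratio >= min_new_ratio:
            kept.append((src, tgt))
            seen |= grams  # only kept pairs contribute to the seen set
        else:
            demoted.append((src, tgt))
    return kept, demoted


if __name__ == "__main__":
    # Toy data, invented for this example.
    toy_pairs = [
        ("the cat sat on the mat", "le chat est assis sur le tapis"),
        ("the cat sat on the mat today", "le chat est assis sur le tapis aujourd'hui"),
        ("a dog barked loudly", "un chien a aboyé fort"),
    ]
    kept, demoted = saturation_filter(toy_pairs)
    print("kept:", [src for src, _ in kept])
    print("demoted:", [src for src, _ in demoted])
```

In this toy run the second sentence adds only one new bigram out of six, so it falls below the assumed threshold and is demoted rather than discarded outright.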

W2F: A Weakly-Supervised to Fully-Supervised Framework for Object Detection


Title	W2F: A Weakly-Supervised to Fully-Supervised Framework for Object Detection
Authors	Yongqiang Zhang, Yancheng Bai, Mingli Ding, Yongqiang Li, Bernard Ghanem
Abstract	Weakly-supervised object detection has attracted much attention lately, since it does not require bounding box annotations for training. Although significant progress has been made, there is still a large gap in performance between weakly-supervised and fully-supervised object detection. Recently, some works have used pseudo ground-truths generated by a weakly-supervised detector to train a supervised detector. Such approaches tend to find only the most representative parts of objects, and seek only one ground-truth box per class even though many same-class instances exist. To overcome these issues, we propose a weakly-supervised to fully-supervised framework, where a weakly-supervised detector is implemented using multiple instance learning. Then, we propose a pseudo ground-truth excavation (PGE) algorithm to find the pseudo ground-truth of each instance in the image. Moreover, the pseudo ground-truth adaptation (PGA) algorithm is designed to further refine the pseudo ground-truths from PGE. Finally, we use these pseudo ground-truths to train a fully-supervised detector. Extensive experiments on the challenging PASCAL VOC 2007 and 2012 benchmarks strongly demonstrate the effectiveness of our framework. We obtain 52.4% and 47.8% mAP on VOC2007 and VOC2012 respectively, a significant improvement over previous state-of-the-art methods.
Tasks	Multiple Instance Learning, Object Detection, Weakly Supervised Object Detection
Published	2018-06-01
URL	http://openaccess.thecvf.com/content_cvpr_2018/html/Zhang_W2F_A_Weakly-Supervised_CVPR_2018_paper.html
PDF	http://openaccess.thecvf.com/content_cvpr_2018/papers/Zhang_W2F_A_Weakly-Supervised_CVPR_2018_paper.pdf
PWC	https://paperswithcode.com/paper/w2f-a-weakly-supervised-to-fully-supervised
Repo
Framework

Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing


Title	Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing
Authors
Abstract
Tasks
Published	2018-08-01
URL	https://www.aclweb.org/anthology/W18-3800/
PDF	https://www.aclweb.org/anthology/W18-3800
PWC	https://paperswithcode.com/paper/proceedings-of-the-first-workshop-on-1
Repo
Framework

Single Image Highlight Removal with a Sparse and Low-Rank Reflection Model


Title	Single Image Highlight Removal with a Sparse and Low-Rank Reflection Model
Authors	Jie Guo, Zuojian Zhou, Limin Wang
Abstract	We propose a sparse and low-rank reflection model for specular highlight detection and removal using a single input image. This model is motivated by the observation that the specular highlight of a natural image usually has large intensity but is rather sparsely distributed while the remaining diffuse reflection can be well approximated by a linear combination of several distinct colors with a sparse and low-rank weighting matrix. We further impose the non-negativity constraint on the weighting matrix as well as the highlight component to ensure that the model is purely additive. With this reflection model, we reformulate the task of highlight removal as a constrained nuclear norm and $l_1$-norm minimization problem which can be solved effectively by the augmented Lagrange multiplier method. Experimental results show that our method performs well on both synthetic images and many real-world examples and is competitive with previous methods, especially in some challenging scenarios featuring natural illumination, hue-saturation ambiguity and strong noises.
Tasks
Published	2018-09-01
URL	http://openaccess.thecvf.com/content_ECCV_2018/html/Jie_Guo_Single_Image_Highlight_ECCV_2018_paper.html
PDF	http://openaccess.thecvf.com/content_ECCV_2018/papers/Jie_Guo_Single_Image_Highlight_ECCV_2018_paper.pdf
PWC	https://paperswithcode.com/paper/single-image-highlight-removal-with-a-sparse
Repo
Framework

Proceedings of the 6th BioASQ Workshop A challenge on large-scale biomedical semantic indexing and question answering


Title	Proceedings of the 6th BioASQ Workshop A challenge on large-scale biomedical semantic indexing and question answering
Authors
Abstract
Tasks	Question Answering
Published	2018-11-01
URL	https://www.aclweb.org/anthology/W18-5300/
PDF	https://www.aclweb.org/anthology/W18-5300
PWC	https://paperswithcode.com/paper/proceedings-of-the-6th-bioasq-workshop-a
Repo
Framework

A Unified Neural Architecture for Joint Dialog Act Segmentation and Recognition in Spoken Dialog System


Title	A Unified Neural Architecture for Joint Dialog Act Segmentation and Recognition in Spoken Dialog System
Authors	Tianyu Zhao, Tatsuya Kawahara
Abstract	In spoken dialog systems (SDSs), dialog act (DA) segmentation and recognition provide essential information for response generation. A majority of previous works assumed ground-truth segmentation of DA units, which is not available from automatic speech recognition (ASR) in SDS. We propose a unified architecture based on neural networks, which consists of a sequence tagger for segmentation and a classifier for recognition. The DA recognition model is based on hierarchical neural networks to incorporate the context of preceding sentences. We investigate sharing some layers of the two components so that they can be trained jointly and learn generalized features from both tasks. An evaluation on the Switchboard Dialog Act (SwDA) corpus shows that the jointly-trained models outperform independently-trained models, single-step models, and other reported results in DA segmentation, recognition, and joint tasks.
Tasks	Language Modelling, Speech Recognition, Spoken Language Understanding
Published	2018-07-01
URL	https://www.aclweb.org/anthology/W18-5021/
PDF	https://www.aclweb.org/anthology/W18-5021
PWC	https://paperswithcode.com/paper/a-unified-neural-architecture-for-joint
Repo
Framework

Annotation of a Large Clinical Entity Corpus


Title	Annotation of a Large Clinical Entity Corpus
Authors	Pinal Patel, Disha Davey, Vishal Panchal, Parth Pathak
Abstract	Having an entity-annotated corpus of the clinical domain is one of the basic requirements for detection of clinical entities using machine learning (ML) approaches. Past research has shown the superiority of statistical/ML approaches over rule-based approaches. But in order to take full advantage of the ML approaches, an accurately annotated corpus becomes an essential requirement. Though there are a few annotated corpora available, either on a small data set or covering a narrower domain (like cancer patient records or lab reports), an annotated large data set representing the entire clinical domain has not been created yet. In this paper, we have described in detail the annotation guidelines, annotation process and our approaches in creating a CER (clinical entity recognition) corpus of 5,160 clinical documents from forty different clinical specialities. The clinical entities range across various types such as diseases, procedures, medications, medical devices and so on. We have classified them into eleven categories for annotation. Our annotation also reflects the relations among the group of entities that constitute larger concepts altogether.
Tasks	Machine Translation
Published	2018-10-01
URL	https://www.aclweb.org/anthology/D18-1228/
PDF	https://www.aclweb.org/anthology/D18-1228
PWC	https://paperswithcode.com/paper/annotation-of-a-large-clinical-entity-corpus
Repo
Framework

Cross-topic Argument Mining from Heterogeneous Sources


Title	Cross-topic Argument Mining from Heterogeneous Sources
Authors	Christian Stab, Tristan Miller, Benjamin Schiller, Pranav Rai, Iryna Gurevych
Abstract	Argument mining is a core technology for automating argument search in large document collections. Despite its usefulness for this task, most current approaches are designed for use only with specific text types and fall short when applied to heterogeneous texts. In this paper, we propose a new sentential annotation scheme that is reliably applicable by crowd workers to arbitrary Web texts. We source annotations for over 25,000 instances covering eight controversial topics. We show that integrating topic information into bidirectional long short-term memory networks outperforms vanilla BiLSTMs by more than 3 percentage points in F1 in two- and three-label cross-topic settings. We also show that these results can be further improved by leveraging additional data for topic relevance using multi-task learning.
Tasks	Argument Mining, Decision Making, Information Retrieval, Multi-Task Learning, Question Answering
Published	2018-10-01
URL	https://www.aclweb.org/anthology/D18-1402/
PDF	https://www.aclweb.org/anthology/D18-1402
PWC	https://paperswithcode.com/paper/cross-topic-argument-mining-from-1
Repo
Framework

A New Annotated Portuguese/Spanish Corpus for the Multi-Sentence Compression Task


Title	A New Annotated Portuguese/Spanish Corpus for the Multi-Sentence Compression Task
Authors	Elvys Linhares Pontes, Juan-Manuel Torres-Moreno, Stéphane Huet, Andréa Carneiro Linhares
Abstract
Tasks	Abstractive Text Summarization, Question Answering, Sentence Compression, Text Summarization
Published	2018-05-01
URL	https://www.aclweb.org/anthology/L18-1504/
PDF	https://www.aclweb.org/anthology/L18-1504
PWC	https://paperswithcode.com/paper/a-new-annotated-portuguesespanish-corpus-for
Repo
Framework

Anaphora Resolution for Improving Spatial Relation Extraction from Text


Title	Anaphora Resolution for Improving Spatial Relation Extraction from Text
Authors	Umar Manzoor, Parisa Kordjamshidi
Abstract	Spatial relation extraction from generic text is a challenging problem due to the ambiguity of the prepositions' spatial meaning as well as the nesting structure of the spatial descriptions. In this work, we highlight the difficulties that anaphora can create in the extraction of spatial relations. We use external multi-modal (here visual) resources to find the most probable candidates for resolving the anaphora that refer to the landmarks of the spatial relations. We then use global inference to decide jointly on resolving the anaphora and extracting the spatial relations. Our preliminary results show that resolving anaphora improves the state-of-the-art results on spatial relation extraction.
Tasks	Relation Extraction
Published	2018-06-01
URL	https://www.aclweb.org/anthology/W18-1407/
PDF	https://www.aclweb.org/anthology/W18-1407
PWC	https://paperswithcode.com/paper/anaphora-resolution-for-improving-spatial
Repo
Framework

Transforming Wikipedia into a Large-Scale Fine-Grained Entity Type Corpus


Title	Transforming Wikipedia into a Large-Scale Fine-Grained Entity Type Corpus
Authors	Abbas Ghaddar, Philippe Langlais
Abstract
Tasks	Entity Linking, Entity Typing, Named Entity Recognition, Question Answering, Relation Extraction
Published	2018-05-01
URL	https://www.aclweb.org/anthology/L18-1699/
PDF	https://www.aclweb.org/anthology/L18-1699
PWC	https://paperswithcode.com/paper/transforming-wikipedia-into-a-large-scale
Repo
Framework

A New Version of the Składnica Treebank of Polish Harmonised with the Walenty Valency Dictionary


Title	A New Version of the Składnica Treebank of Polish Harmonised with the Walenty Valency Dictionary
Authors	Marcin Woliński, Elżbieta Hajnicz, Tomasz Bartosiak
Abstract
Tasks	Constituency Parsing
Published	2018-05-01
URL	https://www.aclweb.org/anthology/L18-1289/
PDF	https://www.aclweb.org/anthology/L18-1289
PWC	https://paperswithcode.com/paper/a-new-version-of-the-skaadnica-treebank-of
Repo
Framework

Do GANs learn the distribution? Some Theory and Empirics


Title	Do GANs learn the distribution? Some Theory and Empirics
Authors	Sanjeev Arora, Andrej Risteski, Yi Zhang
Abstract	Do GANs (Generative Adversarial Nets) actually learn the target distribution? The foundational paper of Goodfellow et al. (2014) suggested they do, if they were given sufficiently large deep nets, sample size, and computation time. A recent theoretical analysis in Arora et al. (2017) raised doubts about whether the same holds when the discriminator has bounded size. It showed that the training objective can approach its optimum value even if the generated distribution has very low support. In other words, the training objective is unable to prevent mode collapse. The current paper makes two contributions. (1) It proposes a novel test for estimating support size using the birthday paradox of discrete probability. Using this test, evidence is presented that well-known GAN approaches do learn distributions of fairly low support. (2) It theoretically studies encoder-decoder GAN architectures (e.g., BiGAN/ALI), which were proposed to learn more meaningful features via GANs, and consequently to also solve the mode-collapse issue. Our result shows that such encoder-decoder training objectives also cannot guarantee learning of the full distribution because they cannot prevent serious mode collapse. More seriously, they cannot prevent learning meaningless codes for data, contrary to usual intuition.
Tasks
Published	2018-01-01
URL	https://openreview.net/forum?id=BJehNfW0-
PDF	https://openreview.net/pdf?id=BJehNfW0-
PWC	https://paperswithcode.com/paper/do-gans-learn-the-distribution-some-theory
Repo
Framework
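
The birthday-paradox test mentioned in the abstract above rests on a standard fact: if a batch of s samples from a roughly uniform discrete distribution contains a duplicate about half the time, the support size is on the order of s². The following minimal Python sketch shows that reasoning on a toy discrete sampler. It is only an illustration under stated assumptions, not the paper's procedure: the names `collision_rate` and `estimate_support` are hypothetical, and exact equality of hashable samples stands in for whatever near-duplicate check would be needed for generated images.

```python
import random
from typing import Callable, Hashable, Optional


def collision_rate(
    sample: Callable[[], Hashable], batch_size: int, trials: int
) -> float:
    """Fraction of batches of size `batch_size` that contain a duplicate."""
    hits = 0
    for _ in range(trials):
        batch = [sample() for _ in range(batch_size)]
        if len(set(batch)) < batch_size:  # at least one repeated sample
            hits += 1
    return hits / trials


def estimate_support(
    sample: Callable[[], Hashable], max_batch: int = 2000, trials: int = 200
) -> Optional[int]:
    """Rough support estimate: s*s for the first batch size s whose
    empirical collision rate reaches 0.5. Returns None if never reached."""
    for s in range(2, max_batch + 1):
        if collision_rate(sample, s, trials) >= 0.5:
            return s * s
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy "generator" with a true support of 1000 equally likely outputs.
    toy_generator = lambda: rng.randrange(1000)
    print("estimated support (order of magnitude):", estimate_support(toy_generator))
```

The estimate is only an order-of-magnitude figure: for an exactly uniform distribution over N items the 50% collision point is near 1.18·√N, so s² overshoots N by a constant factor.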


Structured Local Minima in Sparse Blind Deconvolution


Title	Structured Local Minima in Sparse Blind Deconvolution
Authors	Yuqian Zhang, Han-Wen Kuo, John Wright
Abstract	Blind deconvolution is a ubiquitous problem of recovering two unknown signals from their convolution. Unfortunately, this is an ill-posed problem in general. This paper focuses on the short and sparse blind deconvolution problem, where one unknown signal is short and the other is sparsely and randomly supported. This variant captures the structure of the unknown signals in several important applications. We assume the short signal to have unit $\ell^2$ norm and cast the blind deconvolution problem as a nonconvex optimization problem over the sphere. We demonstrate that (i) in a certain region of the sphere, every local optimum is close to some shift truncation of the ground truth, and (ii) for a generic short signal of length $k$, when the sparsity of the activation signal $\theta\lesssim k^{-2/3}$ and the number of measurements $m\gtrsim\mathrm{poly}(k)$, a simple initialization method together with a descent algorithm which escapes strict saddle points recovers a near shift truncation of the ground truth kernel.
Tasks
Published	2018-12-01
URL	http://papers.nips.cc/paper/7500-structured-local-minima-in-sparse-blind-deconvolution
PDF	http://papers.nips.cc/paper/7500-structured-local-minima-in-sparse-blind-deconvolution.pdf
PWC	https://paperswithcode.com/paper/structured-local-minima-in-sparse-blind
Repo
Framework
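
To make the short-and-sparse setting in the abstract above concrete, here is a minimal Python sketch of the observation model only: a short kernel with unit $\ell^2$ norm convolved with a sparse, randomly supported signal. It does not implement the paper's recovery algorithm. The Bernoulli-Gaussian sparsity pattern, the circular convolution, and the helper names `unit_norm_kernel`, `sparse_signal` and `circular_convolve` are assumptions made for illustration.

```python
import math
import random
from typing import List


def unit_norm_kernel(k: int, rng: random.Random) -> List[float]:
    """Random short kernel of length k, scaled to unit l2 norm."""
    a = [rng.gauss(0.0, 1.0) for _ in range(k)]
    norm = math.sqrt(sum(v * v for v in a))
    return [v / norm for v in a]


def sparse_signal(m: int, theta: float, rng: random.Random) -> List[float]:
    """Length-m signal; each entry is nonzero with probability theta
    (Bernoulli-Gaussian model, an assumption of this sketch)."""
    return [rng.gauss(0.0, 1.0) if rng.random() < theta else 0.0 for _ in range(m)]


def circular_convolve(a: List[float], x: List[float]) -> List[float]:
    """Circular convolution of a short kernel a with a length-m signal x."""
    m = len(x)
    y = [0.0] * m
    for i, xi in enumerate(x):
        if xi == 0.0:
            continue  # skip zeros: x is sparse
        for j, aj in enumerate(a):
            y[(i + j) % m] += aj * xi
    return y


if __name__ == "__main__":
    rng = random.Random(0)
    k, m, theta = 5, 200, 0.05  # illustrative sizes, not from the paper
    a = unit_norm_kernel(k, rng)
    x = sparse_signal(m, theta, rng)
    y = circular_convolve(a, x)  # only y is observed; a and x are unknown
    print("nonzeros in x:", sum(1 for v in x if v != 0.0))
    print("first observations:", [round(v, 3) for v in y[:5]])
```

Only `y` would be available to a recovery method; shifting `a` and shifting `x` in the opposite direction produces the same `y`, which is why recovery can only be expected up to a shift.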

OneStopEnglish corpus: A new corpus for automatic readability assessment and text simplification


Title	OneStopEnglish corpus: A new corpus for automatic readability assessment and text simplification
Authors	Sowmya Vajjala, Ivana Lučić
Abstract	This paper describes the collection and compilation of the OneStopEnglish corpus of texts written at three reading levels, and demonstrates its usefulness through two applications: automatic readability assessment and automatic text simplification. The corpus consists of 189 texts, each in three versions (567 in total). The corpus is now freely available under a CC BY-SA 4.0 license and we hope that it will foster further research on the topics of readability assessment and text simplification.
Tasks	Feature Engineering, Text Simplification
Published	2018-06-01
URL	https://www.aclweb.org/anthology/W18-0535/
PDF	https://www.aclweb.org/anthology/W18-0535
PWC	https://paperswithcode.com/paper/onestopenglish-corpus-a-new-corpus-for
Repo
Framework