Paper Group NANR 186
STACC, OOV Density and N-gram Saturation: Vicomtech’s Participation in the WMT 2018 Shared Task on Parallel Corpus Filtering
Title | STACC, OOV Density and N-gram Saturation: Vicomtech’s Participation in the WMT 2018 Shared Task on Parallel Corpus Filtering |
Authors | Andoni Azpeitia, Thierry Etchegoyhen, Eva Martínez Garcia |
Abstract | We describe Vicomtech's participation in the WMT 2018 Shared Task on parallel corpus filtering. We aimed to evaluate a simple approach to the task, which can efficiently process large volumes of data and can be easily deployed for new datasets in different language pairs and domains. We based our approach on STACC, an efficient and portable method for parallel sentence identification in comparable corpora. To address the specifics of the corpus filtering task, which features significant volumes of noisy data, the core method was expanded with a penalty based on the amount of unknown words in sentence pairs. Additionally, we experimented with a complementary data saturation method based on source sentence n-grams, with the goal of demoting parallel sentence pairs that do not contribute significant amounts of yet unobserved n-grams. Our approach requires no prior training and is highly efficient on the type of large datasets featured in the corpus filtering task. We achieved competitive results with this simple and portable method, ranking in the top half among competing systems overall. |
Tasks | Machine Translation, Outlier Detection |
Published | 2018-10-01 |
URL | https://www.aclweb.org/anthology/W18-6473/ |
https://www.aclweb.org/anthology/W18-6473 | |
PWC | https://paperswithcode.com/paper/stacc-oov-density-and-n-gram-saturation |
Repo | |
Framework | |
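The STACC abstract names two add-ons: a penalty based on the amount of unknown words in a sentence pair, and an n-gram saturation step that demotes pairs contributing few unseen source n-grams. The following is a minimal sketch of both ideas under loudly stated assumptions: the multiplicative penalty (base score times the share of known words on each side), the bigram order, the `min_new` threshold and all function names are hypothetical illustrations, not Vicomtech's actual formulas or code.

```python
from typing import Iterable, List, Set, Tuple


def oov_ratio(tokens: List[str], vocab: Set[str]) -> float:
    """Fraction of tokens missing from the vocabulary (0.0 for an empty sentence)."""
    if not tokens:
        return 0.0
    unknown = sum(1 for t in tokens if t not in vocab)
    return unknown / len(tokens)


def penalized_score(base_score: float, src: List[str], tgt: List[str],
                    src_vocab: Set[str], tgt_vocab: Set[str]) -> float:
    """Hypothetical OOV penalty: scale a base similarity score by the
    share of known words on the source and target sides."""
    known_src = 1.0 - oov_ratio(src, src_vocab)
    known_tgt = 1.0 - oov_ratio(tgt, tgt_vocab)
    return base_score * known_src * known_tgt


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Set of contiguous n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def saturation_filter(sources: Iterable[List[str]], n: int = 2,
                      min_new: float = 0.5) -> List[bool]:
    """Return one flag per source sentence: True keeps its rank, False demotes it.

    Sentences are assumed to arrive sorted best-first. A sentence is kept only
    if at least `min_new` of its n-grams are unseen; only kept sentences add
    their n-grams to the 'seen' set (an assumption of this sketch)."""
    seen: Set[Tuple[str, ...]] = set()
    keep: List[bool] = []
    for tokens in sources:
        grams = ngrams(tokens, n)
        if not grams:  # too short to contribute any n-gram
            keep.append(False)
            continue
        new_share = len(grams - seen) / len(grams)
        ok = new_share >= min_new
        keep.append(ok)
        if ok:
            seen |= grams
    return keep
```

For example, `saturation_filter([["the", "cat", "sat"], ["the", "cat", "sat"], ["a", "dog", "ran"]])` flags the verbatim repeat in second position for demotion, since it adds no unseen bigrams. Neither step needs training data, which is consistent with the abstract's claim that the approach requires no prior training.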
W2F: A Weakly-Supervised to Fully-Supervised Framework for Object Detection
Title | W2F: A Weakly-Supervised to Fully-Supervised Framework for Object Detection |
Authors | Yongqiang Zhang, Yancheng Bai, Mingli Ding, Yongqiang Li, Bernard Ghanem |
Abstract | Weakly-supervised object detection has attracted much attention lately, since it does not require bounding box annotations for training. Although significant progress has been made, there is still a large gap in performance between weakly-supervised and fully-supervised object detection. Recently, some works have used pseudo ground-truths generated by a weakly-supervised detector to train a supervised detector. Such approaches tend to focus on the most representative parts of objects, and seek only one ground-truth box per class even when many instances of the same class are present. To overcome these issues, we propose a weakly-supervised to fully-supervised framework, in which a weakly-supervised detector is implemented using multiple instance learning. We then propose a pseudo ground-truth excavation (PGE) algorithm to find the pseudo ground-truth of each instance in the image. Moreover, a pseudo ground-truth adaptation (PGA) algorithm is designed to further refine the pseudo ground-truths from PGE. Finally, we use these pseudo ground-truths to train a fully-supervised detector. Extensive experiments on the challenging PASCAL VOC 2007 and 2012 benchmarks demonstrate the effectiveness of our framework. We obtain 52.4% and 47.8% mAP on VOC2007 and VOC2012, respectively, a significant improvement over previous state-of-the-art methods. |
Tasks | Multiple Instance Learning, Object Detection, Weakly Supervised Object Detection |
Published | 2018-06-01 |
URL | http://openaccess.thecvf.com/content_cvpr_2018/html/Zhang_W2F_A_Weakly-Supervised_CVPR_2018_paper.html |
http://openaccess.thecvf.com/content_cvpr_2018/papers/Zhang_W2F_A_Weakly-Supervised_CVPR_2018_paper.pdf | |
PWC | https://paperswithcode.com/paper/w2f-a-weakly-supervised-to-fully-supervised |
Repo | |
Framework | |
Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing
Title | Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing |
Authors | |
Abstract | |
Tasks | |
Published | 2018-08-01 |
URL | https://www.aclweb.org/anthology/W18-3800/ |
https://www.aclweb.org/anthology/W18-3800 | |
PWC | https://paperswithcode.com/paper/proceedings-of-the-first-workshop-on-1 |
Repo | |
Framework | |
Single Image Highlight Removal with a Sparse and Low-Rank Reflection Model
Title | Single Image Highlight Removal with a Sparse and Low-Rank Reflection Model |
Authors | Jie Guo, Zuojian Zhou, Limin Wang |
Abstract | We propose a sparse and low-rank reflection model for specular highlight detection and removal using a single input image. This model is motivated by the observation that the specular highlight of a natural image usually has large intensity but is rather sparsely distributed, while the remaining diffuse reflection can be well approximated by a linear combination of several distinct colors with a sparse and low-rank weighting matrix. We further impose a non-negativity constraint on the weighting matrix as well as on the highlight component to ensure that the model is purely additive. With this reflection model, we reformulate the task of highlight removal as a constrained nuclear norm and $\ell_1$-norm minimization problem, which can be solved effectively by the augmented Lagrange multiplier method. Experimental results show that our method performs well on both synthetic images and many real-world examples and is competitive with previous methods, especially in challenging scenarios featuring natural illumination, hue-saturation ambiguity and strong noise. |
Tasks | |
Published | 2018-09-01 |
URL | http://openaccess.thecvf.com/content_ECCV_2018/html/Jie_Guo_Single_Image_Highlight_ECCV_2018_paper.html |
http://openaccess.thecvf.com/content_ECCV_2018/papers/Jie_Guo_Single_Image_Highlight_ECCV_2018_paper.pdf | |
PWC | https://paperswithcode.com/paper/single-image-highlight-removal-with-a-sparse |
Repo | |
Framework | |
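In augmented-Lagrangian solvers for objectives that combine an $\ell_1$ penalty with a non-negativity constraint, the update of the sparse variable commonly reduces to an elementwise non-negative soft-thresholding step. The sketch below shows only that generic proximal operator, assuming the highlight term is penalized by $\lambda\|S\|_1$ subject to $S \ge 0$; it is not the paper's solver, which also handles the nuclear-norm (low-rank) term and the weighting matrix, and the function names are illustrative.

```python
import math
from typing import List


def soft_threshold(values: List[float], lam: float) -> List[float]:
    """Unconstrained l1 proximal operator: sign(v) * max(|v| - lam, 0)."""
    if lam < 0:
        raise ValueError("lam must be non-negative")
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in values]


def nonneg_soft_threshold(values: List[float], lam: float) -> List[float]:
    """Proximal operator of lam*||s||_1 plus the constraint s >= 0.

    Elementwise it solves argmin_s 0.5*(s - v)^2 + lam*s subject to s >= 0,
    whose closed form is max(v - lam, 0): negative entries are clipped to zero
    instead of being shrunk toward zero from below."""
    if lam < 0:
        raise ValueError("lam must be non-negative")
    return [max(v - lam, 0.0) for v in values]
```

For instance, with `lam = 1.0` the input `[3.0, 0.5, -2.0]` becomes `[2.0, 0.0, -1.0]` under plain soft-thresholding but `[2.0, 0.0, 0.0]` under the non-negative version. This clipping is what keeps a highlight component purely additive.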
Proceedings of the 6th BioASQ Workshop A challenge on large-scale biomedical semantic indexing and question answering
Title | Proceedings of the 6th BioASQ Workshop A challenge on large-scale biomedical semantic indexing and question answering |
Authors | |
Abstract | |
Tasks | Question Answering |
Published | 2018-11-01 |
URL | https://www.aclweb.org/anthology/W18-5300/ |
https://www.aclweb.org/anthology/W18-5300 | |
PWC | https://paperswithcode.com/paper/proceedings-of-the-6th-bioasq-workshop-a |
Repo | |
Framework | |
A Unified Neural Architecture for Joint Dialog Act Segmentation and Recognition in Spoken Dialog System
Title | A Unified Neural Architecture for Joint Dialog Act Segmentation and Recognition in Spoken Dialog System |
Authors | Tianyu Zhao, Tatsuya Kawahara |
Abstract | In spoken dialog systems (SDSs), dialog act (DA) segmentation and recognition provide essential information for response generation. Most previous work assumed ground-truth segmentation of DA units, which is not available from automatic speech recognition (ASR) in an SDS. We propose a unified architecture based on neural networks, which consists of a sequence tagger for segmentation and a classifier for recognition. The DA recognition model is based on hierarchical neural networks to incorporate the context of preceding sentences. We investigate sharing some layers of the two components so that they can be trained jointly and learn generalized features from both tasks. An evaluation on the Switchboard Dialog Act (SwDA) corpus shows that the jointly-trained models outperform independently-trained models, single-step models, and other reported results in DA segmentation, recognition, and joint tasks. |
Tasks | Language Modelling, Speech Recognition, Spoken Language Understanding |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/W18-5021/ |
https://www.aclweb.org/anthology/W18-5021 | |
PWC | https://paperswithcode.com/paper/a-unified-neural-architecture-for-joint |
Repo | |
Framework | |
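The segmentation component of this architecture is a sequence tagger over the words of an utterance, whose output must be turned into DA units before the recognition classifier can label them. A minimal decoding sketch, assuming a hypothetical two-tag scheme in which `E` marks the last word of a unit and `I` marks any other word; the tag set actually used in the paper may differ.

```python
from typing import List


def decode_segments(words: List[str], tags: List[str]) -> List[List[str]]:
    """Group words into DA units, closing a unit at every 'E' tag.

    Assumed tag scheme (hypothetical): 'E' = last word of a unit, 'I' = other."""
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    segments: List[List[str]] = []
    current: List[str] = []
    for word, tag in zip(words, tags):
        if tag not in ("I", "E"):
            raise ValueError(f"unknown tag: {tag!r}")
        current.append(word)
        if tag == "E":
            segments.append(current)
            current = []
    if current:  # the tagger never closed the final unit; keep it as one segment
        segments.append(current)
    return segments
```

Usage: `decode_segments(["yeah", "i", "agree", "with", "you"], ["E", "I", "I", "I", "E"])` yields `[["yeah"], ["i", "agree", "with", "you"]]`, and each unit would then be passed to the DA classifier. In a jointly trained model, tagging errors propagate directly into which spans the classifier sees, which is one reason sharing layers between the two tasks can help.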
Annotation of a Large Clinical Entity Corpus
Title | Annotation of a Large Clinical Entity Corpus |
Authors | Pinal Patel, Disha Davey, Vishal Panchal, Parth Pathak |
Abstract | Having an entity-annotated corpus of the clinical domain is one of the basic requirements for detecting clinical entities using machine learning (ML) approaches. Past research has shown the superiority of statistical/ML approaches over rule-based approaches, but taking full advantage of ML approaches requires an accurately annotated corpus. Although a few annotated corpora are available, they are either small or cover a narrower domain (such as cancer patient records or lab reports); a large annotated data set representing the entire clinical domain has not yet been created. In this paper, we describe in detail the annotation guidelines, the annotation process and our approaches to creating a CER (clinical entity recognition) corpus of 5,160 clinical documents from forty different clinical specialities. The clinical entities span various types, such as diseases, procedures, medications and medical devices. We have classified them into eleven categories for annotation. Our annotation also reflects the relations among groups of entities that together constitute larger concepts. |
Tasks | Machine Translation |
Published | 2018-10-01 |
URL | https://www.aclweb.org/anthology/D18-1228/ |
https://www.aclweb.org/anthology/D18-1228 | |
PWC | https://paperswithcode.com/paper/annotation-of-a-large-clinical-entity-corpus |
Repo | |
Framework | |
Cross-topic Argument Mining from Heterogeneous Sources
Title | Cross-topic Argument Mining from Heterogeneous Sources |
Authors | Christian Stab, Tristan Miller, Benjamin Schiller, Pranav Rai, Iryna Gurevych |
Abstract | Argument mining is a core technology for automating argument search in large document collections. Despite its usefulness for this task, most current approaches are designed for use only with specific text types and fall short when applied to heterogeneous texts. In this paper, we propose a new sentential annotation scheme that is reliably applicable by crowd workers to arbitrary Web texts. We source annotations for over 25,000 instances covering eight controversial topics. We show that integrating topic information into bidirectional long short-term memory networks outperforms vanilla BiLSTMs by more than 3 percentage points in F1 in two- and three-label cross-topic settings. We also show that these results can be further improved by leveraging additional data for topic relevance using multi-task learning. |
Tasks | Argument Mining, Decision Making, Information Retrieval, Multi-Task Learning, Question Answering |
Published | 2018-10-01 |
URL | https://www.aclweb.org/anthology/D18-1402/ |
https://www.aclweb.org/anthology/D18-1402 | |
PWC | https://paperswithcode.com/paper/cross-topic-argument-mining-from-1 |
Repo | |
Framework | |
A New Annotated Portuguese/Spanish Corpus for the Multi-Sentence Compression Task
Title | A New Annotated Portuguese/Spanish Corpus for the Multi-Sentence Compression Task |
Authors | Elvys Linhares Pontes, Juan-Manuel Torres-Moreno, Stéphane Huet, Andréa Carneiro Linhares |
Abstract | |
Tasks | Abstractive Text Summarization, Question Answering, Sentence Compression, Text Summarization |
Published | 2018-05-01 |
URL | https://www.aclweb.org/anthology/L18-1504/ |
https://www.aclweb.org/anthology/L18-1504 | |
PWC | https://paperswithcode.com/paper/a-new-annotated-portuguesespanish-corpus-for |
Repo | |
Framework | |
Anaphora Resolution for Improving Spatial Relation Extraction from Text
Title | Anaphora Resolution for Improving Spatial Relation Extraction from Text |
Authors | Umar Manzoor, Parisa Kordjamshidi |
Abstract | Spatial relation extraction from generic text is a challenging problem due to the ambiguity of the spatial meaning of prepositions as well as the nested structure of spatial descriptions. In this work, we highlight the difficulties that anaphora can pose for the extraction of spatial relations. We use external multi-modal (here, visual) resources to find the most probable candidates for resolving anaphora that refer to the landmarks of spatial relations. We then use global inference to jointly resolve the anaphora and extract the spatial relations. Our preliminary results show that resolving anaphora improves on state-of-the-art results in spatial relation extraction. |
Tasks | Relation Extraction |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/W18-1407/ |
https://www.aclweb.org/anthology/W18-1407 | |
PWC | https://paperswithcode.com/paper/anaphora-resolution-for-improving-spatial |
Repo | |
Framework | |
Transforming Wikipedia into a Large-Scale Fine-Grained Entity Type Corpus
Title | Transforming Wikipedia into a Large-Scale Fine-Grained Entity Type Corpus |
Authors | Abbas Ghaddar, Philippe Langlais |
Abstract | |
Tasks | Entity Linking, Entity Typing, Named Entity Recognition, Question Answering, Relation Extraction |
Published | 2018-05-01 |
URL | https://www.aclweb.org/anthology/L18-1699/ |
https://www.aclweb.org/anthology/L18-1699 | |
PWC | https://paperswithcode.com/paper/transforming-wikipedia-into-a-large-scale |
Repo | |
Framework | |
A New Version of the Składnica Treebank of Polish Harmonised with the Walenty Valency Dictionary
Title | A New Version of the Składnica Treebank of Polish Harmonised with the Walenty Valency Dictionary |
Authors | Marcin Woliński, Elżbieta Hajnicz, Tomasz Bartosiak |
Abstract | |
Tasks | Constituency Parsing |
Published | 2018-05-01 |
URL | https://www.aclweb.org/anthology/L18-1289/ |
https://www.aclweb.org/anthology/L18-1289 | |
PWC | https://paperswithcode.com/paper/a-new-version-of-the-skaadnica-treebank-of |
Repo | |
Framework | |
Do GANs learn the distribution? Some Theory and Empirics
Title | Do GANs learn the distribution? Some Theory and Empirics |
Authors | Sanjeev Arora, Andrej Risteski, Yi Zhang |
Abstract | Do GANs (Generative Adversarial Nets) actually learn the target distribution? The foundational paper of Goodfellow et al. (2014) suggested they do, if given sufficiently large deep nets, sample size, and computation time. A recent theoretical analysis in Arora et al. (2017) raised doubts about whether the same holds when the discriminator has bounded size. It showed that the training objective can approach its optimum value even if the generated distribution has very low support. In other words, the training objective is unable to prevent mode collapse. The current paper makes two contributions. (1) It proposes a novel test for estimating support size using the birthday paradox of discrete probability. Using this test, evidence is presented that well-known GAN approaches do learn distributions of fairly low support. (2) It theoretically studies encoder-decoder GAN architectures (e.g., BiGAN/ALI), which were proposed to learn more meaningful features via GANs, and consequently to also solve the mode-collapse issue. Our result shows that such encoder-decoder training objectives also cannot guarantee learning of the full distribution because they cannot prevent serious mode collapse. More seriously, they cannot prevent learning meaningless codes for data, contrary to usual intuition. |
Tasks | |
Published | 2018-01-01 |
URL | https://openreview.net/forum?id=BJehNfW0- |
https://openreview.net/pdf?id=BJehNfW0- | |
PWC | https://paperswithcode.com/paper/do-gans-learn-the-distribution-some-theory |
Repo | |
Framework | |
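The birthday-paradox test rests on standard arithmetic: for a uniform distribution over $N$ outcomes, a batch of $s$ samples contains a duplicate with probability $1 - \prod_{i=1}^{s-1}(1 - i/N)$, which becomes substantial once $s$ is on the order of $\sqrt{N}$. Conversely, if duplicates show up reliably in batches of size $s$, the support size is roughly $s^2$. The sketch below illustrates only that arithmetic, assuming a uniform distribution and exact-match duplicates; applying it to GAN image samples requires some notion of near-duplicate, which this sketch does not address, and the function names are illustrative.

```python
import random
from typing import Hashable, Sequence


def collision_probability(support_size: int, batch_size: int) -> float:
    """P(at least one duplicate) for batch_size uniform draws from support_size items."""
    if batch_size > support_size:  # pigeonhole: a duplicate is certain
        return 1.0
    p_distinct = 1.0
    for i in range(1, batch_size):
        p_distinct *= 1.0 - i / support_size
    return 1.0 - p_distinct


def has_duplicate(batch: Sequence[Hashable]) -> bool:
    """True if any item appears more than once (exact matches only)."""
    return len(set(batch)) < len(batch)


def empirical_collision_rate(support_size: int, batch_size: int,
                             trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the collision probability by repeated sampling."""
    rng = random.Random(seed)
    hits = sum(
        has_duplicate([rng.randrange(support_size) for _ in range(batch_size)])
        for _ in range(trials)
    )
    return hits / trials


def estimate_support(batch_size: int) -> int:
    """Rough support-size estimate when duplicates appear reliably at this batch size."""
    return batch_size * batch_size
```

The classic case `collision_probability(365, 23)` is about 0.507, so a batch of 23 reliably containing duplicates points to a support of a few hundred items (`estimate_support(23)` gives 529). The value of the test is that it needs only samples from the generator, not its density.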
Structured Local Minima in Sparse Blind Deconvolution
Title | Structured Local Minima in Sparse Blind Deconvolution |
Authors | Yuqian Zhang, Han-Wen Kuo, John Wright |
Abstract | Blind deconvolution is a ubiquitous problem of recovering two unknown signals from their convolution. Unfortunately, this is an ill-posed problem in general. This paper focuses on the *short and sparse* blind deconvolution problem, where one unknown signal is short and the other is sparsely and randomly supported. This variant captures the structure of the unknown signals in several important applications. We assume the short signal to have unit $\ell^2$ norm and cast the blind deconvolution problem as a nonconvex optimization problem over the sphere. We demonstrate that (i) in a certain region of the sphere, every local optimum is close to some shift truncation of the ground truth, and (ii) for a generic short signal of length $k$, when the sparsity of the activation signal satisfies $\theta \lesssim k^{-2/3}$ and the number of measurements satisfies $m \gtrsim \mathrm{poly}(k)$, a simple initialization method together with a descent algorithm that escapes strict saddle points recovers a near shift truncation of the ground-truth kernel. |
Tasks | |
Published | 2018-12-01 |
URL | http://papers.nips.cc/paper/7500-structured-local-minima-in-sparse-blind-deconvolution |
http://papers.nips.cc/paper/7500-structured-local-minima-in-sparse-blind-deconvolution.pdf | |
PWC | https://paperswithcode.com/paper/structured-local-minima-in-sparse-blind |
Repo | |
Framework | |
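The short-and-sparse model observes $y = a \circledast x$, where the kernel $a$ has length $k$ and unit $\ell^2$ norm and the activation $x$ is sparse with random support. The sketch below only builds this forward model, assuming circular convolution of length $m$ and a Bernoulli-Gaussian activation with rate $\theta$ (a common modelling choice, but an assumption here); it does not implement the paper's nonconvex recovery over the sphere, and the function names are illustrative.

```python
import math
import random
from typing import List, Tuple


def normalize(v: List[float]) -> List[float]:
    """Scale v to unit l2 norm (the kernel is assumed to lie on the sphere)."""
    norm = math.sqrt(sum(t * t for t in v))
    if norm == 0:
        raise ValueError("cannot normalize the zero vector")
    return [t / norm for t in v]


def circular_convolve(a: List[float], x: List[float]) -> List[float]:
    """y[i] = sum_j a[j] * x[(i - j) mod m], with m = len(x) and len(a) <= m."""
    m = len(x)
    if len(a) > m:
        raise ValueError("kernel longer than signal")
    return [sum(a[j] * x[(i - j) % m] for j in range(len(a))) for i in range(m)]


def bernoulli_gaussian(m: int, theta: float, rng: random.Random) -> List[float]:
    """Each entry is nonzero with probability theta, drawn from N(0, 1)."""
    return [rng.gauss(0.0, 1.0) if rng.random() < theta else 0.0 for _ in range(m)]


def sample_observation(k: int, m: int, theta: float,
                       seed: int = 0) -> Tuple[List[float], List[float], List[float]]:
    """Draw a random unit-norm kernel a, a sparse activation x, and y = a (*) x."""
    rng = random.Random(seed)
    a = normalize([rng.gauss(0.0, 1.0) for _ in range(k)])
    x = bernoulli_gaussian(m, theta, rng)
    return a, x, circular_convolve(a, x)
```

A single spike in `x` produces a shifted copy of the kernel in `y`; for example, `circular_convolve([1.0, 2.0], [0.0, 1.0, 0.0, 0.0])` gives `[0.0, 1.0, 2.0, 0.0]`. This shift structure is why recovery is only possible up to a shift, and why the paper states its guarantees in terms of shift truncations of the ground truth.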
OneStopEnglish corpus: A new corpus for automatic readability assessment and text simplification
Title | OneStopEnglish corpus: A new corpus for automatic readability assessment and text simplification |
Authors | Sowmya Vajjala, Ivana Lučić |
Abstract | This paper describes the collection and compilation of the OneStopEnglish corpus of texts written at three reading levels, and demonstrates its usefulness through two applications: automatic readability assessment and automatic text simplification. The corpus consists of 189 texts, each in three versions (567 in total). The corpus is now freely available under a CC BY-SA 4.0 license, and we hope that it will foster further research on readability assessment and text simplification. |
Tasks | Feature Engineering, Text Simplification |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/W18-0535/ |
https://www.aclweb.org/anthology/W18-0535 | |
PWC | https://paperswithcode.com/paper/onestopenglish-corpus-a-new-corpus-for |
Repo | |
Framework | |