May 4, 2019

Paper Group NANR 229

Learned Region Sparsity and Diversity Also Predicts Visual Attention. A Proposal for combining "general" and specialized frames. Unsupervised Abbreviation Detection in Clinical Narratives. C2D2E2: Using Call Centers to Motivate the Use of Dialog and Diarization in Entity Extraction. A Large-scale Recipe and Meal Data Collection as Infrastructure …

Learned Region Sparsity and Diversity Also Predicts Visual Attention

Title Learned Region Sparsity and Diversity Also Predicts Visual Attention
Authors Zijun Wei, Hossein Adeli, Minh Hoai Nguyen, Greg Zelinsky, Dimitris Samaras
Abstract Learned region sparsity has achieved state-of-the-art performance in classification tasks by exploiting and integrating a sparse set of local information into global decisions. The underlying mechanism resembles how people sample information from an image with their eye movements when making similar decisions. In this paper we incorporate the biologically plausible mechanism of Inhibition of Return into the learned region sparsity model, thereby imposing diversity on the selected regions. We investigate how these mechanisms of sparsity and diversity relate to visual attention by testing our model on three different types of visual search tasks. We report state-of-the-art results in predicting the locations of human gaze fixations, even though our model is trained only on image-level labels without object location annotations. Notably, the classification performance of the extended model remains the same as the original. This work suggests a new computational perspective on visual attention mechanisms and shows how the inclusion of attention-based mechanisms can improve computer vision techniques.
Tasks
Published 2016-12-01
URL http://papers.nips.cc/paper/6451-learned-region-sparsity-and-diversity-also-predicts-visual-attention
PDF http://papers.nips.cc/paper/6451-learned-region-sparsity-and-diversity-also-predicts-visual-attention.pdf
PWC https://paperswithcode.com/paper/learned-region-sparsity-and-diversity-also
Repo
Framework
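
The abstract above describes imposing diversity on selected regions through Inhibition of Return. The sketch below illustrates only that idea on a toy score grid: pick the best-scoring cell, then suppress its neighbourhood so the next pick lands elsewhere. It is not the authors' model; the function name `select_regions`, the `radius` parameter, and the toy scores are all hypothetical.

```python
def select_regions(scores, k, radius):
    """Greedily pick up to k grid cells, suppressing the neighbourhood of each pick."""
    remaining = [row[:] for row in scores]  # copy, so the caller's map is untouched
    rows, cols = len(remaining), len(remaining[0])
    suppressed = float("-inf")
    picks = []
    for _ in range(k):
        # Highest-scoring cell among those not yet suppressed.
        best = max(
            ((r, c) for r in range(rows) for c in range(cols)),
            key=lambda rc: remaining[rc[0]][rc[1]],
        )
        if remaining[best[0]][best[1]] == suppressed:
            break  # every cell has been inhibited
        picks.append(best)
        # Inhibition of return: rule out all cells within `radius`
        # (Chebyshev distance) of the cell just selected.
        for r in range(rows):
            for c in range(cols):
                if max(abs(r - best[0]), abs(c - best[1])) <= radius:
                    remaining[r][c] = suppressed
    return picks


# Hypothetical region scores, for illustration only.
toy_scores = [
    [0.9, 0.8, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.6],
]
print(select_regions(toy_scores, k=2, radius=0))  # only the pick itself is inhibited
print(select_regions(toy_scores, k=2, radius=1))  # neighbours are inhibited too
```

With `radius=0` the two picks sit next to each other; with `radius=1` the second pick is pushed to a different part of the grid, which is the diversity effect the abstract refers to.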

A Proposal for combining "general" and specialized frames

Title A Proposal for combining "general" and specialized frames
Authors Marie-Claude L'Homme, Carlos Subirats, Benoît Robichaud
Abstract The objectives of the work described in this paper are: 1. To list the differences between a general language resource (namely FrameNet) and a domain-specific resource; 2. To devise solutions to merge their contents in order to increase the coverage of the general resource. Both resources are based on Frame Semantics (Fillmore 1985; Fillmore and Baker 2010) and this raises specific challenges since the theoretical framework and the methodology derived from it provide for both a lexical description and a conceptual representation. We propose a series of strategies that handle both lexical and conceptual (frame) differences and implemented them in the specialized resource. We also show that most differences can be handled in a straightforward manner. However, some more domain specific differences (such as frames defined exclusively for the specialized domain or relations between these frames) are likely to be much more difficult to take into account since some are domain-specific.
Tasks
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-5321/
PDF https://www.aclweb.org/anthology/W16-5321
PWC https://paperswithcode.com/paper/a-proposal-for-combining-general-and
Repo
Framework

Unsupervised Abbreviation Detection in Clinical Narratives

Title Unsupervised Abbreviation Detection in Clinical Narratives
Authors Markus Kreuzthaler, Michel Oleynik, Alexander Avian, Stefan Schulz
Abstract Clinical narratives in electronic health record systems are a rich resource of patient-based information. They constitute an ongoing challenge for natural language processing, due to their high compactness and abundance of short forms. German medical texts exhibit numerous ad-hoc abbreviations that terminate with a period character. The disambiguation of period characters is therefore an important task for sentence and abbreviation detection. This task is addressed by a combination of co-occurrence information of word types with trailing period characters, a large domain dictionary, and a simple rule engine, thus merging statistical and dictionary-based disambiguation strategies. An F-measure of 0.95 could be reached by using the unsupervised approach presented in this paper. The results are promising for a domain-independent abbreviation detection strategy, because our approach avoids retraining of models or use case specific feature engineering efforts required for supervised machine learning approaches.
Tasks Feature Engineering, Tokenization
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4213/
PDF https://www.aclweb.org/anthology/W16-4213
PWC https://paperswithcode.com/paper/unsupervised-abbreviation-detection-in
Repo
Framework
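
The abstract above combines co-occurrence statistics for word types with trailing periods, a domain dictionary, and simple rules. A minimal sketch of how such signals could be combined follows. It is an assumption-laden illustration, not the paper's system: the function names, the 0.8 threshold, the toy sentence, and the toy dictionary are all hypothetical.

```python
from collections import Counter


def period_ratio(tokens):
    """For each word type, the share of its occurrences that carry a trailing period."""
    with_period, total = Counter(), Counter()
    for tok in tokens:
        base = tok.rstrip(".").lower()
        if not base:
            continue  # a bare "." carries no word type
        total[base] += 1
        if tok.endswith("."):
            with_period[base] += 1
    return {word: with_period[word] / total[word] for word in total}


def is_abbreviation(token, ratios, dictionary, threshold=0.8):
    """Rule cascade: dictionary lookup first, then the period statistic."""
    if not token.endswith("."):
        return False
    base = token.rstrip(".").lower()
    if base in dictionary:
        return False  # a known full word, so the period likely ends a sentence
    # Types that (almost) always appear with a period are treated as short forms.
    return ratios.get(base, 0.0) >= threshold


# Hypothetical toy data, for illustration only.
text = "Pat. klagt über Schmerzen im li. Knie. Pat. ist stabil. Schmerzen im Knie"
tokens = text.split()
dictionary = {"klagt", "über", "schmerzen", "im", "knie", "ist", "stabil"}
ratios = period_ratio(tokens)
for tok in ("Pat.", "li.", "Knie.", "stabil."):
    print(tok, is_abbreviation(tok, ratios, dictionary))
```

The dictionary check matters for a word like "stabil.", which appears with a period every time in this tiny sample and would otherwise be misread as a short form.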

C2D2E2: Using Call Centers to Motivate the Use of Dialog and Diarization in Entity Extraction

Title C2D2E2: Using Call Centers to Motivate the Use of Dialog and Diarization in Entity Extraction
Authors Ken Church, Weizhong Zhu, Jason Pelecanos
Abstract
Tasks Entity Extraction
Published 2016-11-01
URL https://www.aclweb.org/anthology/W16-6008/
PDF https://www.aclweb.org/anthology/W16-6008
PWC https://paperswithcode.com/paper/c2d2e2-using-call-centers-to-motivate-the-use
Repo
Framework

A Large-scale Recipe and Meal Data Collection as Infrastructure for Food Research

Title A Large-scale Recipe and Meal Data Collection as Infrastructure for Food Research
Authors Jun Harashima, Michiaki Ariga, Kenta Murata, Masayuki Ioki
Abstract Everyday meals are an important part of our daily lives and, currently, there are many Internet sites that help us plan these meals. Allied to the growth in the amount of food data such as recipes available on the Internet is an increase in the number of studies on these data, such as recipe analysis and recipe search. However, there are few publicly available resources for food research; those that do exist do not include a wide range of food data or any meal data (that is, likely combinations of recipes). In this study, we construct a large-scale recipe and meal data collection as the underlying infrastructure to promote food research. Our corpus consists of approximately 1.7 million recipes and 36000 meals in cookpad, one of the largest recipe sites in the world. We made the corpus available to researchers in February 2015 and as of February 2016, 82 research groups at 56 universities have made use of it to enhance their studies.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1389/
PDF https://www.aclweb.org/anthology/L16-1389
PWC https://paperswithcode.com/paper/a-large-scale-recipe-and-meal-data-collection
Repo
Framework

Example-based Acquisition of Fine-grained Collocation Resources

Title Example-based Acquisition of Fine-grained Collocation Resources
Authors Sara Rodríguez-Fernández, Roberto Carlini, Luis Espinosa Anke, Leo Wanner
Abstract Collocations such as "heavy rain" or "make [a] decision", are combinations of two elements where one (the base) is freely chosen, while the choice of the other (collocate) is restricted, depending on the base. Collocations present difficulties even to advanced language learners, who usually struggle to find the right collocate to express a particular meaning, e.g., both "heavy" and "strong" express the meaning 'intense', but while "rain" selects "heavy", "wind" selects "strong". Lexical Functions (LFs) describe the meanings that hold between the elements of collocations, such as 'intense', 'perform', 'create', 'increase', etc. Language resources with semantically classified collocations would be of great help for students, however they are expensive to build, since they are manually constructed, and scarce. We present an unsupervised approach to the acquisition and semantic classification of collocations according to LFs, based on word embeddings in which, given an example of a collocation for each of the target LFs and a set of bases, the system retrieves a list of collocates for each base and LF.
Tasks Word Embeddings
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1367/
PDF https://www.aclweb.org/anthology/L16-1367
PWC https://paperswithcode.com/paper/example-based-acquisition-of-fine-grained
Repo
Framework
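
The abstract above says that, given one example collocation per Lexical Function and a new base, the system retrieves collocates using word embeddings, but it does not spell out the retrieval step. One plausible reading, assumed here and not confirmed by the source, is a vector-offset lookup: add the example's collocate-minus-base offset to the new base and rank candidates by cosine similarity. The vectors below are hand-made toys and `retrieve_collocates` is a hypothetical name.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_collocates(vectors, example, base, top_n=3):
    """Rank candidate collocates for `base` using one (base, collocate) example."""
    ex_base, ex_collocate = example
    # Assumed technique: offset of the example pair, applied to the new base.
    target = [
        c - b + n
        for c, b, n in zip(vectors[ex_collocate], vectors[ex_base], vectors[base])
    ]
    candidates = [w for w in vectors if w not in (base, ex_base)]
    ranked = sorted(candidates, key=lambda w: cosine(vectors[w], target), reverse=True)
    return ranked[:top_n]


# Hand-made toy vectors (hypothetical), arranged so that the 'intense'
# offset rain -> heavy matches wind -> strong.
toy_vectors = {
    "rain": [1.0, 0.0, 0.0],
    "wind": [0.0, 1.0, 0.0],
    "heavy": [1.0, 0.0, 1.0],
    "strong": [0.0, 1.0, 1.0],
    "light": [0.0, 0.2, -1.0],
}
print(retrieve_collocates(toy_vectors, ("rain", "heavy"), "wind"))
```

In practice the vectors would come from a trained embedding model, and the candidate list would be restricted to words of a suitable part of speech.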

Word Embeddings and Convolutional Neural Network for Arabic Sentiment Classification

Title Word Embeddings and Convolutional Neural Network for Arabic Sentiment Classification
Authors Abdelghani Dahou, Shengwu Xiong, Junwei Zhou, Mohamed Houcine Haddoud, Pengfei Duan
Abstract With the development and the advancement of social networks, forums, blogs and online sales, a growing number of Arabs are expressing their opinions on the web. In this paper, a scheme of Arabic sentiment classification, which evaluates and detects the sentiment polarity from Arabic reviews and Arabic social media, is studied. We investigated several architectures to build quality neural word embeddings using a 3.4 billion word corpus drawn from a collected 10 billion word web-crawled corpus. Moreover, a convolutional neural network trained on top of pre-trained Arabic word embeddings is used for sentiment classification to evaluate the quality of these word embeddings. The simulation results show that the proposed scheme outperforms the existing methods on 4 out of 5 balanced and unbalanced datasets.
Tasks Sentiment Analysis, Word Embeddings
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1228/
PDF https://www.aclweb.org/anthology/C16-1228
PWC https://paperswithcode.com/paper/word-embeddings-and-convolutional-neural
Repo
Framework

POS Tagging Experts via Topic Modeling

Title POS Tagging Experts via Topic Modeling
Authors Atreyee Mukherjee, Sandra Kübler, Matthias Scheutz
Abstract
Tasks Domain Adaptation, Part-Of-Speech Tagging, Topic Models
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-6316/
PDF https://www.aclweb.org/anthology/W16-6316
PWC https://paperswithcode.com/paper/pos-tagging-experts-via-topic-modeling
Repo
Framework

Revisiting the Evaluation for Cross Document Event Coreference

Title Revisiting the Evaluation for Cross Document Event Coreference
Authors Shyam Upadhyay, Nitish Gupta, Christos Christodoulopoulos, Dan Roth
Abstract Cross document event coreference (CDEC) is an important task that aims at aggregating event-related information across multiple documents. We revisit the evaluation for CDEC, and discover that past works have adopted different, often inconsistent, evaluation settings, which either overlook certain mistakes in coreference decisions, or make assumptions that simplify the coreference task considerably. We suggest a new evaluation methodology which overcomes these limitations, and allows for an accurate assessment of CDEC systems. Our new evaluation setting better reflects the corpus-wide information aggregation ability of CDEC systems by separating event-coreference decisions made across documents from those made within a document. In addition, we suggest a better baseline for the task and semi-automatically identify several inconsistent annotations in the evaluation dataset.
Tasks Document Summarization, Multi-Document Summarization, Question Answering
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1183/
PDF https://www.aclweb.org/anthology/C16-1183
PWC https://paperswithcode.com/paper/revisiting-the-evaluation-for-cross-document
Repo
Framework
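
The abstract above proposes separating coreference decisions made across documents from those made within a document. The sketch below shows one simple way to do such a split, using pairwise link precision, recall, and F1. The choice of a pairwise metric is an assumption made here for illustration, since the abstract does not name the scoring metric; the mention IDs and function names are hypothetical.

```python
from itertools import combinations


def links(clusters, doc_of, cross_document):
    """Coreference links implied by clusters, kept only if they are
    cross-document (cross_document=True) or within-document (False)."""
    out = set()
    for cluster in clusters:
        for a, b in combinations(sorted(cluster), 2):
            if (doc_of[a] != doc_of[b]) == cross_document:
                out.add((a, b))
    return out


def pairwise_prf(gold, predicted, doc_of, cross_document):
    """Pairwise precision, recall and F1 over one kind of link."""
    g = links(gold, doc_of, cross_document)
    p = links(predicted, doc_of, cross_document)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy example: four event mentions in two documents.
doc_of = {"m1": "d1", "m2": "d1", "m3": "d2", "m4": "d2"}
gold = [{"m1", "m2", "m3"}, {"m4"}]
predicted = [{"m1", "m2"}, {"m3", "m4"}]
print("cross-document :", pairwise_prf(gold, predicted, doc_of, True))
print("within-document:", pairwise_prf(gold, predicted, doc_of, False))
```

Here the system finds the within-document link but misses every cross-document link, a failure that a single pooled score would partly hide.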

NRC Russian-English Machine Translation System for WMT 2016

Title NRC Russian-English Machine Translation System for WMT 2016
Authors Chi-kiu Lo, Colin Cherry, George Foster, Darlene Stewart, Rabib Islam, Anna Kazantseva, Roland Kuhn
Abstract
Tasks Lemmatization, Machine Translation, Transliteration, Word Alignment
Published 2016-08-01
URL https://www.aclweb.org/anthology/W16-2317/
PDF https://www.aclweb.org/anthology/W16-2317
PWC https://paperswithcode.com/paper/nrc-russian-english-machine-translation
Repo
Framework

On the Robustness of Standalone Referring Expression Generation Algorithms Using RDF Data

Title On the Robustness of Standalone Referring Expression Generation Algorithms Using RDF Data
Authors Pablo Duboue, Martin Ariel Domínguez, Paula Estrella
Abstract
Tasks Text Generation
Published 2016-09-01
URL https://www.aclweb.org/anthology/W16-3504/
PDF https://www.aclweb.org/anthology/W16-3504
PWC https://paperswithcode.com/paper/on-the-robustness-of-standalone-referring
Repo
Framework

Word Embeddings as Metric Recovery in Semantic Spaces

Title Word Embeddings as Metric Recovery in Semantic Spaces
Authors Tatsunori B. Hashimoto, David Alvarez-Melis, Tommi S. Jaakkola
Abstract Continuous word representations have been remarkably useful across NLP tasks but remain poorly understood. We ground word embeddings in semantic spaces studied in the cognitive-psychometric literature, taking these spaces as the primary objects to recover. To this end, we relate log co-occurrences of words in large corpora to semantic similarity assessments and show that co-occurrences are indeed consistent with a Euclidean semantic space hypothesis. Framing word embedding as metric recovery of a semantic space unifies existing word embedding algorithms, ties them to manifold learning, and demonstrates that existing algorithms are consistent metric recovery methods given co-occurrence counts from random walks. Furthermore, we propose a simple, principled, direct metric recovery algorithm that performs on par with the state-of-the-art word embedding and manifold learning methods. Finally, we complement recent focus on analogies by constructing two new inductive reasoning datasets (series completion and classification) and demonstrate that word embeddings can be used to solve them as well.
Tasks Named Entity Recognition, Semantic Similarity, Semantic Textual Similarity, Word Embeddings
Published 2016-01-01
URL https://www.aclweb.org/anthology/Q16-1020/
PDF https://www.aclweb.org/anthology/Q16-1020
PWC https://paperswithcode.com/paper/word-embeddings-as-metric-recovery-in
Repo
Framework

Is all that Glitters in Machine Translation Quality Estimation really Gold?

Title Is all that Glitters in Machine Translation Quality Estimation really Gold?
Authors Yvette Graham, Timothy Baldwin, Meghan Dowling, Maria Eskevich, Teresa Lynn, Lamia Tounsi
Abstract Human-targeted metrics provide a compromise between human evaluation of machine translation, where high inter-annotator agreement is difficult to achieve, and fully automatic metrics, such as BLEU or TER, that lack the validity of human assessment. Human-targeted translation edit rate (HTER) is by far the most widely employed human-targeted metric in machine translation, commonly employed, for example, as a gold standard in evaluation of quality estimation. Original experiments justifying the design of HTER, as opposed to other possible formulations, were limited to a small sample of translations and a single language pair, however, and this motivates our re-evaluation of a range of human-targeted metrics on a substantially larger scale. Results show significantly stronger correlation with human judgment for HBLEU over HTER for two of the nine language pairs we include and no significant difference between correlations achieved by HTER and HBLEU for the remaining language pairs. Finally, we evaluate a range of quality estimation systems employing HTER and direct assessment (DA) of translation adequacy as gold labels, resulting in a divergence in system rankings, and propose employment of DA for future quality estimation evaluations.
Tasks Machine Translation
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1294/
PDF https://www.aclweb.org/anthology/C16-1294
PWC https://paperswithcode.com/paper/is-all-that-glitters-in-machine-translation
Repo
Framework
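
The abstract above centres on HTER, which scores a machine translation by the number of edits needed to turn it into its human post-edited version, divided by the length of that post-edited version. The sketch below approximates this with a plain word-level edit distance (insertions, deletions, substitutions). It is a simplification: standard TER also counts block shifts as single edits, which this sketch omits. The function names and the example sentences are hypothetical.

```python
def word_edit_distance(hyp, ref):
    """Levenshtein distance between two word lists (no block shifts)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i]
        for j, r in enumerate(ref, 1):
            cur.append(min(
                prev[j] + 1,             # delete a hypothesis word
                cur[j - 1] + 1,          # insert a reference word
                prev[j - 1] + (h != r),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def approx_hter(mt_output, post_edit):
    """Edits per post-edited word; 0.0 means the output needed no edits."""
    hyp, ref = mt_output.split(), post_edit.split()
    if not ref:
        raise ValueError("post-edited reference must not be empty")
    return word_edit_distance(hyp, ref) / len(ref)


# Hypothetical example: one word had to be inserted by the post-editor.
print(approx_hter("the cat sat on mat", "the cat sat on the mat"))
```

Because the post-edit is made from the system output itself, the score reflects the minimum correction effort for that output, which is why it is often used as a gold label in quality estimation.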

AMISCO: The Austrian German Multi-Sensor Corpus

Title AMISCO: The Austrian German Multi-Sensor Corpus
Authors Hannes Pessentheiner, Thomas Pichler, Martin Hagmüller
Abstract We introduce a unique, comprehensive Austrian German multi-sensor corpus with moving and non-moving speakers to facilitate the evaluation of estimators and detectors that jointly detect a speaker's spatial and temporal parameters. The corpus is suitable for various machine learning and signal processing tasks, linguistic studies, and studies related to a speaker's fundamental frequency (due to recorded glottograms). Available corpora are limited to (synthetically generated/spatialized) speech data or recordings of musical instruments that lack moving speakers, glottograms, and/or multi-channel distant speech recordings. That is why we recorded 24 spatially non-moving and moving speakers, balanced male and female, to set up a two-room and 43-channel Austrian German multi-sensor speech corpus. It contains 8.2 hours of read speech based on phonetically balanced sentences, commands, and digits. The orthographic transcriptions include around 53,000 word tokens and 2,070 word types. Special features of this corpus are the laryngograph recordings (representing glottograms required to detect a speaker's instantaneous fundamental frequency and pitch), corresponding clean-speech recordings, and spatial information and video data provided by four Kinects and a camera.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1121/
PDF https://www.aclweb.org/anthology/L16-1121
PWC https://paperswithcode.com/paper/amisco-the-austrian-german-multi-sensor
Repo
Framework

A Comparative Study of Post-editing Guidelines

Title A Comparative Study of Post-editing Guidelines
Authors Ke Hu, Patrick Cadwell
Abstract
Tasks Machine Translation
Published 2016-01-01
URL https://www.aclweb.org/anthology/W16-3420/
PDF https://www.aclweb.org/anthology/W16-3420
PWC https://paperswithcode.com/paper/a-comparative-study-of-post-editing
Repo
Framework