May 5, 2019

1732 words 9 mins read

Paper Group NAWR 5

Detecting and Characterizing Events. Database of Mandarin Neighborhood Statistics. Converting SynTagRus Dependency Treebank into Penn Treebank Style. AFET: Automatic Fine-Grained Entity Typing by Hierarchical Partial-Label Embedding. CharNER: Character-Level Named Entity Recognition. Quality Assessment of the Reuters Vol. 2 Multilingual Corpus. Uns …

Detecting and Characterizing Events

Title Detecting and Characterizing Events
Authors Allison Chaney, Hanna Wallach, Matthew Connelly, David Blei
Abstract
Tasks
Published 2016-11-01
URL https://www.aclweb.org/anthology/D16-1122/
PDF https://www.aclweb.org/anthology/D16-1122
PWC https://paperswithcode.com/paper/detecting-and-characterizing-events
Repo https://github.com/ajbc/capsule
Framework none

Database of Mandarin Neighborhood Statistics

Title Database of Mandarin Neighborhood Statistics
Authors Karl Neergaard, Hongzhi Xu, Chu-Ren Huang
Abstract In the design of controlled experiments with language stimuli, researchers from psycholinguistic, neurolinguistic, and related fields require language resources that isolate variables known to affect language processing. This article describes a freely available database that provides word-level statistics for words and nonwords of Mandarin Chinese. The featured lexical statistics include subtitle corpus frequency, phonological neighborhood density, neighborhood frequency, and homophone density. The accompanying word descriptors include pinyin, ASCII phonetic transcription (SAMPA), lexical tone, syllable structure, dominant PoS, and syllable, segment, and pinyin lengths for each phonological word. It is designed for researchers particularly concerned with language processing of isolated words and made to accommodate multiple existing hypotheses concerning the structure of the Mandarin syllable. The database is divided into multiple files according to the desired search criteria: 1) the syllable segmentation schema used to calculate density measures, and 2) whether the search is for words or nonwords. The database is open to the research community at https://github.com/karlneergaard/Mandarin-Neighborhood-Statistics.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1636/
PDF https://www.aclweb.org/anthology/L16-1636
PWC https://paperswithcode.com/paper/database-of-mandarin-neighborhood-statistics
Repo https://github.com/karlneergaard/Mandarin-Neighborhood-Statistics
Framework none
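The phonological neighborhood density statistic the database precomputes can be illustrated with a small sketch. This is an assumption about the definition (two words are neighbors if their segment sequences differ by exactly one substitution, insertion, or deletion); the database's own schema-dependent segmentations are not reproduced here.

```python
def one_edit_apart(a, b):
    """True if segment sequences a and b differ by exactly one
    substitution, insertion, or deletion (and are not identical)."""
    if abs(len(a) - len(b)) > 1 or a == b:
        return False
    if len(a) == len(b):
        # same length: exactly one substituted segment
        return sum(x != y for x, y in zip(a, b)) == 1
    if len(a) > len(b):
        a, b = b, a  # make a the shorter sequence
    # differ by one segment: deleting one segment from b must yield a
    return any(b[:i] + b[i + 1:] == a for i in range(len(b)))

def neighborhood_density(word, lexicon):
    """Number of lexicon entries one phonological edit away from word."""
    return sum(one_edit_apart(word, w) for w in lexicon)

# Toy lexicon of segment tuples (hypothetical, for illustration only)
lexicon = [("m", "i"), ("m", "a", "n"), ("p", "a"), ("m", "a")]
print(neighborhood_density(("m", "a"), lexicon))  # the word itself is not counted
```

Under a different syllable segmentation schema the same word yields different segment tuples, which is why the database ships one file per schema.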

Converting SynTagRus Dependency Treebank into Penn Treebank Style

Title Converting SynTagRus Dependency Treebank into Penn Treebank Style
Authors Alex Luu, Sophia A. Malamud, Nianwen Xue
Abstract
Tasks
Published 2016-08-01
URL https://www.aclweb.org/anthology/W16-1703/
PDF https://www.aclweb.org/anthology/W16-1703
PWC https://paperswithcode.com/paper/converting-syntagrus-dependency-treebank-into
Repo https://github.com/luutuntin/SynTagRus_DS2PS
Framework none

AFET: Automatic Fine-Grained Entity Typing by Hierarchical Partial-Label Embedding

Title AFET: Automatic Fine-Grained Entity Typing by Hierarchical Partial-Label Embedding
Authors Xiang Ren, Wenqi He, Meng Qu, Lifu Huang, Heng Ji, Jiawei Han
Abstract
Tasks Entity Typing, Named Entity Recognition, Question Answering, Relation Extraction
Published 2016-11-01
URL https://www.aclweb.org/anthology/D16-1144/
PDF https://www.aclweb.org/anthology/D16-1144
PWC https://paperswithcode.com/paper/afet-automatic-fine-grained-entity-typing-by
Repo https://github.com/shanzhenren/AFET
Framework none

CharNER: Character-Level Named Entity Recognition

Title CharNER: Character-Level Named Entity Recognition
Authors Onur Kuru, Ozan Arkan Can, Deniz Yuret
Abstract We describe and evaluate a character-level tagger for language-independent Named Entity Recognition (NER). Instead of words, a sentence is represented as a sequence of characters. The model consists of stacked bidirectional LSTMs that take characters as input and output tag probabilities for each character. These probabilities are then converted to consistent word-level named entity tags using a Viterbi decoder. We are able to achieve close to state-of-the-art NER performance in seven languages with the same basic model, using only labeled NER data and no hand-engineered features or other external resources such as syntactic taggers or gazetteers.
Tasks Feature Engineering, Named Entity Recognition, Word Embeddings
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1087/
PDF https://www.aclweb.org/anthology/C16-1087
PWC https://paperswithcode.com/paper/charner-character-level-named-entity
Repo https://github.com/ozanarkancan/char-ner
Framework none
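The decoding step in the abstract, turning per-character tag probabilities into consistent word-level tags, can be sketched as a constrained Viterbi pass. This is a minimal illustration, not the paper's implementation: transitions are free at word boundaries and forbidden inside a word, so every character of a word ends up with the same tag.

```python
import numpy as np

def decode_word_tags(logp, boundaries, tags):
    """Viterbi over characters where the tag may only change at a word
    boundary. logp: (n_chars, n_tags) log-probabilities from the char
    tagger. boundaries[i] is True if character i starts a new word.
    Returns one tag per word."""
    n, k = logp.shape
    score = logp[0].copy()
    back = np.zeros((n, k), dtype=int)
    for i in range(1, n):
        if boundaries[i]:
            # word boundary: free transition from the best previous tag
            back[i] = np.full(k, score.argmax())
            score = score.max() + logp[i]
        else:
            # inside a word: the tag must stay the same
            back[i] = np.arange(k)
            score = score + logp[i]
    # backtrace from the best final tag
    path = [int(score.argmax())]
    for i in range(n - 1, 0, -1):
        path.append(int(back[i][path[-1]]))
    path.reverse()
    # report the tag of each word-initial character
    return [tags[path[i]] for i in range(n) if boundaries[i]]
```

For example, four characters forming two words, with per-character probabilities favoring O for the first word and PER for the second, decode to one consistent tag per word.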

Quality Assessment of the Reuters Vol. 2 Multilingual Corpus

Title Quality Assessment of the Reuters Vol. 2 Multilingual Corpus
Authors Robin Eriksson
Abstract We introduce a framework for quality assurance of corpora, and apply it to the Reuters Multilingual Corpus (RCV2). The results of this quality assessment of this standard newsprint corpus reveal a significant duplication problem and, to a lesser extent, a problem with corrupted articles. From the raw collection of some 487,000 articles, almost one tenth are trivial duplicates. A smaller fraction of articles appear to be corrupted and should be excluded for that reason. The detailed results are being made available as on-line appendices to this article. This effort also demonstrates the beginnings of a constraint-based methodological framework for quality assessment and quality assurance for corpora. As a first implementation of this framework, we have investigated constraints to verify sample integrity, and to diagnose sample duplication, entropy aberrations, and tagging inconsistencies. To help identify near-duplicates in the corpus, we have employed both entropy measurements and a simple byte bigram incidence digest.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1286/
PDF https://www.aclweb.org/anthology/L16-1286
PWC https://paperswithcode.com/paper/quality-assessment-of-the-reuters-vol-2
Repo https://github.com/rcv2/rcv2r1
Framework none
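The two diagnostics named in the abstract, entropy measurement and a byte bigram incidence digest, can be sketched as follows. This is an illustrative reading of the method; the paper's exact digest construction and decision thresholds are not reproduced here.

```python
import math

def bigram_digest(data: bytes) -> set:
    """Incidence digest: the set of byte bigrams occurring in a document
    (presence only, not counts)."""
    return {data[i:i + 2] for i in range(len(data) - 1)}

def bigram_similarity(a: bytes, b: bytes) -> float:
    """Jaccard overlap of two digests; near-duplicates score close to 1."""
    da, db = bigram_digest(a), bigram_digest(b)
    if not da and not db:
        return 1.0
    return len(da & db) / len(da | db)

def byte_entropy(data: bytes) -> float:
    """Shannon entropy (bits per byte); aberrant values can flag
    corrupted or degenerate articles."""
    n = len(data)
    counts = {}
    for byte in data:
        counts[byte] = counts.get(byte, 0) + 1
    return -sum(c / n * math.log2(c / n) for c in counts.values())
```

A repeated-byte article has entropy 0, while normal newswire text sits well above it; pairs of articles with very high digest overlap are candidates for the near-duplicate check.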

Unsupervised Neural Dependency Parsing

Title Unsupervised Neural Dependency Parsing
Authors Yong Jiang, Wenjuan Han, Kewei Tu
Abstract
Tasks Dependency Grammar Induction, Structured Prediction
Published 2016-11-01
URL https://www.aclweb.org/anthology/D16-1073/
PDF https://www.aclweb.org/anthology/D16-1073
PWC https://paperswithcode.com/paper/unsupervised-neural-dependency-parsing
Repo https://github.com/ByronCHAO/neural_based_dmv
Framework pytorch

On the Compositionality and Semantic Interpretation of English Noun Compounds

Title On the Compositionality and Semantic Interpretation of English Noun Compounds
Authors Corina Dima
Abstract
Tasks Relation Classification, Representation Learning
Published 2016-08-01
URL https://www.aclweb.org/anthology/W16-1604/
PDF https://www.aclweb.org/anthology/W16-1604
PWC https://paperswithcode.com/paper/on-the-compositionality-and-semantic
Repo https://github.com/corinadima/gWordcomp
Framework torch

ccg2lambda: A Compositional Semantics System

Title ccg2lambda: A Compositional Semantics System
Authors Pascual Martínez-Gómez, Koji Mineshima, Yusuke Miyao, Daisuke Bekki
Abstract
Tasks Natural Language Inference, Semantic Parsing
Published 2016-08-01
URL https://www.aclweb.org/anthology/P16-4015/
PDF https://www.aclweb.org/anthology/P16-4015
PWC https://paperswithcode.com/paper/ccg2lambda-a-compositional-semantics-system
Repo https://github.com/mynlp/ccg2lambda
Framework none

From Euclidean to Riemannian Means: Information Geometry for SSVEP Classification

Title From Euclidean to Riemannian Means: Information Geometry for SSVEP Classification
Authors Emmanuel Kalunga, Sylvain Chevallier, Quentin Barthélemy, Karim Djouani, Yskandar Hamam, Eric Monacelli
Abstract Brain-Computer Interfaces (BCI) based on electroencephalography (EEG) rely on multichannel brain signal processing. Most of the state-of-the-art approaches deal with covariance matrices, and indeed Riemannian geometry has provided a substantial framework for developing new algorithms. Most notably, a straightforward algorithm such as Minimum Distance to Mean yields competitive results when applied with a Riemannian distance. This applicative contribution aims at assessing the impact of several distances on a real EEG dataset, as the invariances embedded in those distances have an influence on the classification accuracy. Euclidean and Riemannian distances and means are compared both in terms of quality of results and of computational load.
Tasks EEG
Published 2016-04-03
URL https://hal.archives-ouvertes.fr/hal-01351753
PDF https://hal.archives-ouvertes.fr/hal-01351753/document
PWC https://paperswithcode.com/paper/from-euclidean-to-riemannian-means
Repo https://github.com/emmanuelkalunga/Offline-Riemannian-SSVEP
Framework none
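The core quantities, the affine-invariant Riemannian distance between covariance (SPD) matrices and Minimum Distance to Mean classification, can be sketched in NumPy. This is a minimal illustration only: the paper compares several distances and means, and the Riemannian mean itself requires an iterative algorithm not shown here.

```python
import numpy as np

def riemannian_distance(A, B):
    """Affine-invariant Riemannian distance between SPD matrices:
    sqrt(sum_i log^2 lambda_i), where lambda_i are the generalized
    eigenvalues of (B, A)."""
    L = np.linalg.cholesky(A)
    Linv = np.linalg.inv(L)
    M = Linv @ B @ Linv.T  # congruence: same eigenvalues as A^{-1} B
    lam = np.linalg.eigvalsh(M)
    return np.sqrt(np.sum(np.log(lam) ** 2))

def mdm_predict(X, class_means):
    """Minimum Distance to Mean: assign trial covariance X to the class
    whose (pre-computed) mean covariance is closest."""
    return min(class_means, key=lambda c: riemannian_distance(class_means[c], X))
```

Unlike the Euclidean distance, this distance is invariant under congruence transformations X → C X Cᵀ, which is what makes it robust to the linear mixing inherent in multichannel EEG.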

Interpretese vs. Translationese: The Uniqueness of Human Strategies in Simultaneous Interpretation

Title Interpretese vs. Translationese: The Uniqueness of Human Strategies in Simultaneous Interpretation
Authors He He, Jordan Boyd-Graber, Hal Daumé III
Abstract
Tasks Feature Selection, Machine Translation
Published 2016-06-01
URL https://www.aclweb.org/anthology/N16-1111/
PDF https://www.aclweb.org/anthology/N16-1111
PWC https://paperswithcode.com/paper/interpretese-vs-translationese-the-uniqueness
Repo https://github.com/hhexiy/interpretese
Framework none

CNTK: Microsoft’s Open-Source Deep-Learning Toolkit

Title CNTK: Microsoft’s Open-Source Deep-Learning Toolkit
Authors Frank Seide, Amit Agarwal
Abstract This tutorial will introduce the Computational Network Toolkit, or CNTK, Microsoft’s cutting-edge open-source deep-learning toolkit for Windows and Linux. CNTK is a powerful computation-graph based deep-learning toolkit for training and evaluating deep neural networks. Microsoft product groups use CNTK, for example, to create the Cortana speech models and for web ranking. CNTK supports feed-forward, convolutional, and recurrent networks for speech, image, and text workloads, also in combination. Popular network types are supported either natively (convolution) or can be described as a CNTK configuration (LSTM, sequence-to-sequence). CNTK scales to multiple GPU servers and is designed around efficiency. The tutorial will give an overview of CNTK’s general architecture and describe the specific methods and algorithms used for automatic differentiation, recurrent-loop inference and execution, memory sharing, on-the-fly randomization of large corpora, and multi-server parallelization. We will then show what typical uses look like for relevant tasks such as image recognition, sequence-to-sequence modeling, and speech recognition.
Tasks Dimensionality Reduction
Published 2016-08-01
URL https://www.researchgate.net/publication/305997858_CNTK_Microsoft's_Open-Source_Deep-Learning_Toolkit
PDF https://www.researchgate.net/publication/305997858_CNTK_Microsoft's_Open-Source_Deep-Learning_Toolkit
PWC https://paperswithcode.com/paper/cntk-microsofts-open-source-deep-learning
Repo https://github.com/Microsoft/CNTK
Framework tf

Grammar induction from (lots of) words alone

Title Grammar induction from (lots of) words alone
Authors John K Pate, Mark Johnson
Abstract Grammar induction is the task of learning syntactic structure in a setting where that structure is hidden. Grammar induction from words alone is interesting because it is similar to the problem that a child learning a language faces. Previous work has typically assumed richer but cognitively implausible input, such as POS-tag annotated data, which makes that work less relevant to human language acquisition. We show that grammar induction from words alone is in fact feasible when the model is provided with sufficient training data, and present two new streaming or mini-batch algorithms for PCFG inference that can learn from millions of words of training data. We compare the performance of these algorithms to a batch algorithm that learns from less data. The mini-batch algorithms outperform the batch algorithm, showing that cheap inference with more data is better than intensive inference with less data. Additionally, we show that the harmonic initialiser, which previous work identified as essential when learning from small POS-tag annotated corpora (Klein and Manning, 2004), is not superior to a uniform initialisation.
Tasks Language Acquisition, Topic Models
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1003/
PDF https://www.aclweb.org/anthology/C16-1003
PWC https://paperswithcode.com/paper/grammar-induction-from-lots-of-words-alone
Repo https://github.com/jkpate/streamingDMV
Framework none
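The paper's streaming and mini-batch PCFG algorithms are not reproduced here, but the general stepwise-EM idea behind such mini-batch learners, interpolating running expected rule counts toward each mini-batch estimate with a decaying step size, can be sketched as follows. The step-size schedule and alpha value are illustrative assumptions, not the paper's settings.

```python
def stepwise_update(stats, batch_stats, k, alpha=0.7):
    """Stepwise (online) EM: move running expected rule counts toward
    the k-th mini-batch estimate with step size rho_k = (k + 2) ** -alpha.
    Rules seen in either dict are retained."""
    rho = (k + 2) ** -alpha
    return {r: (1 - rho) * stats.get(r, 0.0) + rho * batch_stats.get(r, 0.0)
            for r in set(stats) | set(batch_stats)}
```

With alpha in (0.5, 1], the schedule satisfies the usual stochastic-approximation conditions, so early mini-batches move the counts quickly while later ones refine them, which is what lets such learners scale to millions of words.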

Typed Entity and Relation Annotation on Computer Science Papers

Title Typed Entity and Relation Annotation on Computer Science Papers
Authors Yuka Tateisi, Tomoko Ohta, Sampo Pyysalo, Yusuke Miyao, Akiko Aizawa
Abstract We describe our ongoing effort to establish an annotation scheme for describing the semantic structures of research articles in the computer science domain, with the intended use of developing search systems that can refine their results by the roles of the entities denoted by the query keys. In our scheme, mentions of entities are annotated with ontology-based types, and the roles of the entities are annotated as relations with other entities described in the text. So far, we have annotated 400 abstracts from the ACL anthology and the ACM digital library. In this paper, the scheme and the annotated dataset are described, along with the problems found in the course of annotation. We also show the results of automatic annotation and evaluate the corpus in a practical setting in application to topic extraction.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1607/
PDF https://www.aclweb.org/anthology/L16-1607
PWC https://paperswithcode.com/paper/typed-entity-and-relation-annotation-on
Repo https://github.com/mynlp/ranis
Framework none

Tweet Sarcasm Detection Using Deep Neural Network

Title Tweet Sarcasm Detection Using Deep Neural Network
Authors Meishan Zhang, Yue Zhang, Guohong Fu
Abstract Sarcasm detection has been modeled as a binary document classification task, with rich features being defined manually over input documents. Traditional models employ discrete manual features to address the task, with much research effort being devoted to the design of effective feature templates. We investigate the use of neural networks for tweet sarcasm detection, and compare the effects of continuous automatic features with discrete manual features. In particular, we use a bi-directional gated recurrent neural network to capture syntactic and semantic information over tweets locally, and a pooling neural network to extract contextual features automatically from history tweets. Results show that neural features give improved accuracies for sarcasm detection, with different error distributions compared with discrete manual features.
Tasks Sarcasm Detection
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1231/
PDF https://www.aclweb.org/anthology/C16-1231
PWC https://paperswithcode.com/paper/tweet-sarcasm-detection-using-deep-neural
Repo https://github.com/zhangmeishan/SarcasmDetection
Framework none