Paper Group NAWR 5
Detecting and Characterizing Events. Database of Mandarin Neighborhood Statistics. Converting SynTagRus Dependency Treebank into Penn Treebank Style. AFET: Automatic Fine-Grained Entity Typing by Hierarchical Partial-Label Embedding. CharNER: Character-Level Named Entity Recognition. Quality Assessment of the Reuters Vol. 2 Multilingual Corpus. Uns …
Detecting and Characterizing Events
Title | Detecting and Characterizing Events |
Authors | Allison Chaney, Hanna Wallach, Matthew Connelly, David Blei |
Abstract | |
Tasks | |
Published | 2016-11-01 |
URL | https://www.aclweb.org/anthology/D16-1122/ |
PDF | https://www.aclweb.org/anthology/D16-1122 |
PWC | https://paperswithcode.com/paper/detecting-and-characterizing-events |
Repo | https://github.com/ajbc/capsule |
Framework | none |
Database of Mandarin Neighborhood Statistics
Title | Database of Mandarin Neighborhood Statistics |
Authors | Karl Neergaard, Hongzhi Xu, Chu-Ren Huang |
Abstract | In the design of controlled experiments with language stimuli, researchers in psycholinguistics, neurolinguistics, and related fields require language resources that isolate variables known to affect language processing. This article describes a freely available database that provides word-level statistics for words and nonwords of Mandarin Chinese. The featured lexical statistics include subtitle corpus frequency, phonological neighborhood density, neighborhood frequency, and homophone density. The accompanying word descriptors include pinyin, ASCII phonetic transcription (SAMPA), lexical tone, syllable structure, dominant PoS, and syllable, segment, and pinyin lengths for each phonological word. It is designed for researchers particularly concerned with the processing of isolated words, and it accommodates multiple existing hypotheses concerning the structure of the Mandarin syllable. The database is divided into multiple files according to the desired search criteria: 1) the syllable segmentation schema used to calculate density measures, and 2) whether the search is for words or nonwords. The database is open to the research community at https://github.com/karlneergaard/Mandarin-Neighborhood-Statistics. |
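The density measures above can be illustrated with a minimal sketch. It assumes the common one-edit definition (two words are phonological neighbors if their segment sequences differ by exactly one substitution, insertion, or deletion) and sums neighbor frequencies; the database's own schemas, tone handling, and frequency aggregation may differ. The lexicon, its segmentation, and its counts are invented toy values, and tone is ignored for brevity.

```python
from collections import Counter


def is_neighbor(a, b):
    """True if segment tuples a and b differ by exactly one substitution,
    insertion, or deletion of a segment."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    short, long_ = (a, b) if len(a) < len(b) else (b, a)
    # deleting one segment from the longer form must yield the shorter one
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


def neighborhood_stats(lexicon):
    """lexicon: word -> (segment tuple, corpus frequency).
    Returns word -> (neighborhood density, neighborhood frequency, homophone density)."""
    form_counts = Counter(segs for segs, _ in lexicon.values())
    stats = {}
    for word, (segs, _) in lexicon.items():
        neighbors = [w for w, (s, _) in lexicon.items() if is_neighbor(segs, s)]
        density = len(neighbors)
        # summed neighbor frequency (an assumption; some studies use the mean)
        neighbor_freq = sum(lexicon[w][1] for w in neighbors)
        # number of other words sharing exactly the same phonological form
        homophones = form_counts[segs] - 1
        stats[word] = (density, neighbor_freq, homophones)
    return stats


# Hypothetical toy lexicon: segmentation and frequencies are made up.
lexicon = {
    "妈": (("m", "a"), 100),
    "麻": (("m", "a"), 40),  # homophone of 妈 once tone is ignored
    "猫": (("m", "a", "o"), 30),
    "拿": (("n", "a"), 20),
    "啊": (("a",), 50),
}
stats = neighborhood_stats(lexicon)
print(stats["妈"])  # (3, 100, 1)
```

Homophones are excluded from the neighbor count here because they differ by zero edits, which keeps neighborhood density and homophone density separate measures.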
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1636/ |
PDF | https://www.aclweb.org/anthology/L16-1636 |
PWC | https://paperswithcode.com/paper/database-of-mandarin-neighborhood-statistics |
Repo | https://github.com/karlneergaard/Mandarin-Neighborhood-Statistics |
Framework | none |
Converting SynTagRus Dependency Treebank into Penn Treebank Style
Title | Converting SynTagRus Dependency Treebank into Penn Treebank Style |
Authors | Alex Luu, Sophia A. Malamud, Nianwen Xue |
Abstract | |
Tasks | |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-1703/ |
PDF | https://www.aclweb.org/anthology/W16-1703 |
PWC | https://paperswithcode.com/paper/converting-syntagrus-dependency-treebank-into |
Repo | https://github.com/luutuntin/SynTagRus_DS2PS |
Framework | none |
AFET: Automatic Fine-Grained Entity Typing by Hierarchical Partial-Label Embedding
Title | AFET: Automatic Fine-Grained Entity Typing by Hierarchical Partial-Label Embedding |
Authors | Xiang Ren, Wenqi He, Meng Qu, Lifu Huang, Heng Ji, Jiawei Han |
Abstract | |
Tasks | Entity Typing, Named Entity Recognition, Question Answering, Relation Extraction |
Published | 2016-11-01 |
URL | https://www.aclweb.org/anthology/D16-1144/ |
PDF | https://www.aclweb.org/anthology/D16-1144 |
PWC | https://paperswithcode.com/paper/afet-automatic-fine-grained-entity-typing-by |
Repo | https://github.com/shanzhenren/AFET |
Framework | none |
CharNER: Character-Level Named Entity Recognition
Title | CharNER: Character-Level Named Entity Recognition |
Authors | Onur Kuru, Ozan Arkan Can, Deniz Yuret |
Abstract | We describe and evaluate a character-level tagger for language-independent Named Entity Recognition (NER). Instead of words, a sentence is represented as a sequence of characters. The model consists of stacked bidirectional LSTMs that take characters as input and output tag probabilities for each character. These probabilities are then converted to consistent word-level named entity tags using a Viterbi decoder. We achieve close to state-of-the-art NER performance in seven languages with the same basic model, using only labeled NER data and no hand-engineered features or other external resources such as syntactic taggers or gazetteers. |
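The Viterbi step that turns per-character probabilities into consistent word-level tags can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' decoder: words are taken to be maximal runs of non-space characters, the only constraint is that every character of a word gets the same tag (no BIO scheme), tags may change freely across spaces, and the probabilities are invented toy values standing in for LSTM outputs.

```python
import math


def constrained_viterbi(chars, log_probs, tags):
    """Best tag sequence over characters such that every character of a word
    (a maximal run of non-space characters) receives the same tag.
    log_probs[i][t] is the log-probability of tag t at character i."""
    n = len(chars)
    best = [{t: log_probs[0][t] for t in tags}]
    back = [{}]
    for i in range(1, n):
        inside_word = chars[i] != " " and chars[i - 1] != " "
        # across a word boundary any tag may follow: take the best previous tag
        free_prev = max(tags, key=lambda p: best[i - 1][p])
        scores, pointers = {}, {}
        for t in tags:
            prev = t if inside_word else free_prev  # inside a word the tag is fixed
            scores[t] = best[i - 1][prev] + log_probs[i][t]
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)
    # trace back from the best final tag
    t = max(tags, key=lambda p: best[n - 1][p])
    path = [t]
    for i in range(n - 1, 0, -1):
        t = back[i][t]
        path.append(t)
    return path[::-1]


def word_tags(chars, char_tags):
    """Read off one tag per word (all its characters agree after decoding)."""
    result, i = [], 0
    for word in chars.split(" "):
        if word:
            result.append((word, char_tags[i]))
        i += len(word) + 1
    return result


sentence = "Ada ran"
# Hypothetical per-character probabilities; note that 'd' alone prefers O.
probs = [
    {"PER": 0.9, "O": 0.1}, {"PER": 0.4, "O": 0.6}, {"PER": 0.8, "O": 0.2},
    {"PER": 0.5, "O": 0.5},
    {"PER": 0.1, "O": 0.9}, {"PER": 0.1, "O": 0.9}, {"PER": 0.1, "O": 0.9},
]
log_probs = [{t: math.log(p) for t, p in d.items()} for d in probs]
path = constrained_viterbi(sentence, log_probs, ["PER", "O"])
print(word_tags(sentence, path))  # [('Ada', 'PER'), ('ran', 'O')]
```

A per-character argmax would tag "d" as O and split "Ada"; the constrained decoder instead picks the single tag with the best total score over the whole word.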
Tasks | Feature Engineering, Named Entity Recognition, Word Embeddings |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1087/ |
PDF | https://www.aclweb.org/anthology/C16-1087 |
PWC | https://paperswithcode.com/paper/charner-character-level-named-entity |
Repo | https://github.com/ozanarkancan/char-ner |
Framework | none |
Quality Assessment of the Reuters Vol. 2 Multilingual Corpus
Title | Quality Assessment of the Reuters Vol. 2 Multilingual Corpus |
Authors | Robin Eriksson |
Abstract | We introduce a framework for quality assurance of corpora and apply it to the Reuters Multilingual Corpus (RCV2). This quality assessment of a standard newsprint corpus reveals a significant duplication problem and, to a lesser extent, a problem with corrupted articles. From the raw collection of some 487,000 articles, almost one tenth are trivial duplicates. A smaller fraction of articles appear to be corrupted and should be excluded for that reason. The detailed results are being made available as online appendices to this article. This effort also demonstrates the beginnings of a constraint-based methodological framework for quality assessment and quality assurance of corpora. As a first implementation of this framework, we have investigated constraints to verify sample integrity and to diagnose sample duplication, entropy aberrations, and tagging inconsistencies. To help identify near-duplicates in the corpus, we have employed both entropy measurements and a simple byte bigram incidence digest. |
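The two near-duplicate signals named above can be sketched as below. This is a minimal sketch, assuming the digest is the set of byte bigrams occurring in an article (stored as a 65,536-bit integer) and that two digests are compared with Jaccard similarity; the abstract does not specify the paper's comparison measure or thresholds, and the example strings are invented.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def bigram_digest(data: bytes) -> int:
    """Incidence digest: bit (256 * a + b) is set iff byte bigram (a, b) occurs."""
    digest = 0
    for a, b in zip(data, data[1:]):
        digest |= 1 << (256 * a + b)
    return digest


def digest_similarity(d1: int, d2: int) -> float:
    """Jaccard similarity of two bigram incidence sets (an assumed measure)."""
    union = bin(d1 | d2).count("1")
    return 1.0 if union == 0 else bin(d1 & d2).count("1") / union


a = b"Shares rose 3% in early trading."
b = b"Shares rose 3% in early trading!"
c = b"Completely different article text."
print(digest_similarity(bigram_digest(a), bigram_digest(b)))  # close to 1
print(digest_similarity(bigram_digest(a), bigram_digest(c)))  # much lower
```

Because the digest records only which bigrams occur, not how often or where, it is cheap to compute and compare across hundreds of thousands of articles, at the cost of occasionally flagging unrelated texts with similar vocabulary; an entropy check can then help flag corrupted or degenerate articles.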
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1286/ |
PDF | https://www.aclweb.org/anthology/L16-1286 |
PWC | https://paperswithcode.com/paper/quality-assessment-of-the-reuters-vol-2 |
Repo | https://github.com/rcv2/rcv2r1 |
Framework | none |
Unsupervised Neural Dependency Parsing
Title | Unsupervised Neural Dependency Parsing |
Authors | Yong Jiang, Wenjuan Han, Kewei Tu |
Abstract | |
Tasks | Dependency Grammar Induction, Structured Prediction |
Published | 2016-11-01 |
URL | https://www.aclweb.org/anthology/D16-1073/ |
PDF | https://www.aclweb.org/anthology/D16-1073 |
PWC | https://paperswithcode.com/paper/unsupervised-neural-dependency-parsing |
Repo | https://github.com/ByronCHAO/neural_based_dmv |
Framework | pytorch |
On the Compositionality and Semantic Interpretation of English Noun Compounds
Title | On the Compositionality and Semantic Interpretation of English Noun Compounds |
Authors | Corina Dima |
Abstract | |
Tasks | Relation Classification, Representation Learning |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-1604/ |
PDF | https://www.aclweb.org/anthology/W16-1604 |
PWC | https://paperswithcode.com/paper/on-the-compositionality-and-semantic |
Repo | https://github.com/corinadima/gWordcomp |
Framework | torch |
ccg2lambda: A Compositional Semantics System
Title | ccg2lambda: A Compositional Semantics System |
Authors | Pascual Martínez-Gómez, Koji Mineshima, Yusuke Miyao, Daisuke Bekki |
Abstract | |
Tasks | Natural Language Inference, Semantic Parsing |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/P16-4015/ |
PDF | https://www.aclweb.org/anthology/P16-4015 |
PWC | https://paperswithcode.com/paper/ccg2lambda-a-compositional-semantics-system |
Repo | https://github.com/mynlp/ccg2lambda |
Framework | none |
From Euclidean to Riemannian Means: Information Geometry for SSVEP Classification
Title | From Euclidean to Riemannian Means: Information Geometry for SSVEP Classification |
Authors | Emmanuel Kalunga, Sylvain Chevallier, Quentin Barthélemy, Karim Djouani, Yskandar Hamam, Eric Monacelli |
Abstract | Brain-Computer Interfaces (BCI) based on electroencephalography (EEG) rely on multichannel brain signal processing. Most state-of-the-art approaches deal with covariance matrices, and Riemannian geometry has indeed provided a substantial framework for developing new algorithms. Most notably, a straightforward algorithm such as Minimum Distance to Mean yields competitive results when applied with a Riemannian distance. This applied contribution assesses the impact of several distances on a real EEG dataset, as the invariances embedded in those distances influence classification accuracy. Euclidean and Riemannian distances and means are compared in terms of both quality of results and computational load. |
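Minimum Distance to Mean with interchangeable distances can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' code: it uses the affine-invariant Riemannian distance, a fixed-point (Karcher) iteration for the Riemannian mean, and the Frobenius distance with the arithmetic mean as the Euclidean baseline. The toy diagonal matrices stand in for covariance matrices that a real SSVEP pipeline would estimate from filtered EEG trials.

```python
import numpy as np


def _eig_fn(C, fn):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, V = np.linalg.eigh(C)
    return (V * fn(w)) @ V.T


def riemann_distance(A, B):
    """Affine-invariant distance: sqrt(sum_i log^2 lambda_i), lambda_i = eig(A^-1 B)."""
    A_isqrt = _eig_fn(A, lambda w: 1.0 / np.sqrt(w))
    lam = np.linalg.eigvalsh(A_isqrt @ B @ A_isqrt)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))


def euclid_distance(A, B):
    return float(np.linalg.norm(A - B, "fro"))


def euclid_mean(covs):
    return np.mean(covs, axis=0)


def riemann_mean(covs, iters=50, tol=1e-10):
    """Fixed-point (Karcher) iteration for the Riemannian geometric mean."""
    M = np.mean(covs, axis=0)  # initialise at the arithmetic mean
    for _ in range(iters):
        M_sqrt = _eig_fn(M, np.sqrt)
        M_isqrt = _eig_fn(M, lambda w: 1.0 / np.sqrt(w))
        # average of the matrices mapped to the tangent space at M
        T = np.mean([_eig_fn(M_isqrt @ C @ M_isqrt, np.log) for C in covs], axis=0)
        M = M_sqrt @ _eig_fn(T, np.exp) @ M_sqrt
        if np.linalg.norm(T, "fro") < tol:
            break
    return M


def mdm_fit(covs, labels, mean_fn):
    """One mean covariance matrix per class."""
    return {c: mean_fn(np.array([C for C, y in zip(covs, labels) if y == c]))
            for c in sorted(set(labels))}


def mdm_predict(class_means, C, dist_fn):
    """Assign C to the class whose mean is nearest under dist_fn."""
    return min(class_means, key=lambda c: dist_fn(class_means[c], C))


# Hypothetical toy data: two well-separated classes of SPD matrices.
covs = [np.diag([1.0, 1.2]), np.diag([0.9, 1.0]), np.diag([10.0, 9.0]), np.diag([11.0, 10.0])]
labels = [0, 0, 1, 1]
riemann_means = mdm_fit(covs, labels, riemann_mean)
euclid_means = mdm_fit(covs, labels, euclid_mean)
print(mdm_predict(riemann_means, np.diag([1.1, 0.8]), riemann_distance))  # 0
```

Swapping the `(mean_fn, dist_fn)` pair is the comparison the abstract describes. The Riemannian version is invariant to affine transformations of the signals but needs an eigendecomposition per distance and an iterative mean, which is where its extra computational load comes from.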
Tasks | EEG |
Published | 2016-04-03 |
URL | https://hal.archives-ouvertes.fr/hal-01351753 |
PDF | https://hal.archives-ouvertes.fr/hal-01351753/document |
PWC | https://paperswithcode.com/paper/from-euclidean-to-riemannian-means |
Repo | https://github.com/emmanuelkalunga/Offline-Riemannian-SSVEP |
Framework | none |
Interpretese vs. Translationese: The Uniqueness of Human Strategies in Simultaneous Interpretation
Title | Interpretese vs. Translationese: The Uniqueness of Human Strategies in Simultaneous Interpretation |
Authors | He He, Jordan Boyd-Graber, Hal Daumé III |
Abstract | |
Tasks | Feature Selection, Machine Translation |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/N16-1111/ |
PDF | https://www.aclweb.org/anthology/N16-1111 |
PWC | https://paperswithcode.com/paper/interpretese-vs-translationese-the-uniqueness |
Repo | https://github.com/hhexiy/interpretese |
Framework | none |
CNTK: Microsoft’s Open-Source Deep-Learning Toolkit
Title | CNTK: Microsoft’s Open-Source Deep-Learning Toolkit |
Authors | Frank Seide, Amit Agarwal |
Abstract | This tutorial will introduce the Computational Network Toolkit, or CNTK, Microsoft’s cutting-edge open-source deep-learning toolkit for Windows and Linux. CNTK is a powerful computation-graph-based deep-learning toolkit for training and evaluating deep neural networks. Microsoft product groups use CNTK, for example, to create the Cortana speech models and web ranking. CNTK supports feed-forward, convolutional, and recurrent networks for speech, image, and text workloads, as well as combinations of them. Popular network types are either supported natively (convolution) or can be described as a CNTK configuration (LSTM, sequence-to-sequence). CNTK scales to multiple GPU servers and is designed around efficiency. The tutorial will give an overview of CNTK’s general architecture and describe the specific methods and algorithms used for automatic differentiation, recurrent-loop inference and execution, memory sharing, on-the-fly randomization of large corpora, and multi-server parallelization. We will then show what typical uses look like for relevant tasks such as image recognition, sequence-to-sequence modeling, and speech recognition. |
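The automatic differentiation mentioned above can be illustrated on a computation graph of scalars. This is a generic reverse-mode sketch, not CNTK's implementation or API: the `Node` class, `tanh`, and `backward` are hypothetical names. It records each operation's local derivative while evaluating forward, then propagates gradients from the output back through the graph in reverse topological order.

```python
import math


class Node:
    """A scalar node in a computation graph, with a reverse-mode gradient slot."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent node, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Node(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Node(self.value * other.value, ((self, other.value), (other, self.value)))


def tanh(x):
    t = math.tanh(x.value)
    return Node(t, ((x, 1.0 - t * t),))


def backward(output):
    """Accumulate d(output)/d(node) into every node's grad."""
    order, seen = [], set()

    def visit(n):  # post-order DFS yields a topological order
        if id(n) not in seen:
            seen.add(id(n))
            for p, _ in n.parents:
                visit(p)
            order.append(n)

    visit(output)
    output.grad = 1.0
    for n in reversed(order):  # each node's grad is complete before it is propagated
        for p, local in n.parents:
            p.grad += local * n.grad


w, x, b = Node(2.0), Node(3.0), Node(1.0)
y = w * x + b
backward(y)
print(y.value, w.grad, x.grad, b.grad)  # 7.0 3.0 2.0 1.0
```

A single backward pass yields the gradient with respect to every input, which is why computation-graph toolkits use the reverse mode for training networks with many parameters; real toolkits apply the same idea to tensor operations.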
Tasks | Dimensionality Reduction |
Published | 2016-08-01 |
URL | https://www.researchgate.net/publication/305997858_CNTK_Microsoft's_Open-Source_Deep-Learning_Toolkit |
PDF | https://www.researchgate.net/publication/305997858_CNTK_Microsoft's_Open-Source_Deep-Learning_Toolkit |
PWC | https://paperswithcode.com/paper/cntk-microsofts-open-source-deep-learning |
Repo | https://github.com/Microsoft/CNTK |
Framework | tf |
Grammar induction from (lots of) words alone
Title | Grammar induction from (lots of) words alone |
Authors | John K Pate, Mark Johnson |
Abstract | Grammar induction is the task of learning syntactic structure in a setting where that structure is hidden. Grammar induction from words alone is interesting because it is similar to the problem that a child learning a language faces. Previous work has typically assumed richer but cognitively implausible input, such as POS-tag-annotated data, which makes that work less relevant to human language acquisition. We show that grammar induction from words alone is in fact feasible when the model is provided with sufficient training data, and we present two new streaming or minibatch algorithms for PCFG inference that can learn from millions of words of training data. We compare the performance of these algorithms to a batch algorithm that learns from less data. The minibatch algorithms outperform the batch algorithm, showing that cheap inference with more data is better than intensive inference with less data. Additionally, we show that the harmonic initialiser, which previous work identified as essential when learning from small POS-tag-annotated corpora (Klein and Manning, 2004), is not superior to a uniform initialisation. |
Tasks | Language Acquisition, Topic Models |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1003/ |
PDF | https://www.aclweb.org/anthology/C16-1003 |
PWC | https://paperswithcode.com/paper/grammar-induction-from-lots-of-words-alone |
Repo | https://github.com/jkpate/streamingDMV |
Framework | none |
Typed Entity and Relation Annotation on Computer Science Papers
Title | Typed Entity and Relation Annotation on Computer Science Papers |
Authors | Yuka Tateisi, Tomoko Ohta, Sampo Pyysalo, Yusuke Miyao, Akiko Aizawa |
Abstract | We describe our ongoing effort to establish an annotation scheme for describing the semantic structures of research articles in the computer science domain, with the intended use of developing search systems that can refine their results by the roles of the entities denoted by the query keys. In our scheme, mentions of entities are annotated with ontology-based types, and the roles of the entities are annotated as relations with other entities described in the text. So far, we have annotated 400 abstracts from the ACL anthology and the ACM digital library. In this paper, the scheme and the annotated dataset are described, along with the problems found in the course of annotation. We also show the results of automatic annotation and evaluate the corpus in a practical setting in application to topic extraction. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1607/ |
PDF | https://www.aclweb.org/anthology/L16-1607 |
PWC | https://paperswithcode.com/paper/typed-entity-and-relation-annotation-on |
Repo | https://github.com/mynlp/ranis |
Framework | none |
Tweet Sarcasm Detection Using Deep Neural Network
Title | Tweet Sarcasm Detection Using Deep Neural Network |
Authors | Meishan Zhang, Yue Zhang, Guohong Fu |
Abstract | Sarcasm detection has been modeled as a binary document classification task, with rich features defined manually over input documents. Traditional models employ discrete manual features to address the task, with much research effort devoted to the design of effective feature templates. We investigate the use of neural networks for tweet sarcasm detection and compare the effects of continuous automatic features with those of discrete manual features. In particular, we use a bidirectional gated recurrent neural network to capture syntactic and semantic information over tweets locally, and a pooling neural network to extract contextual features automatically from historical tweets. Results show that neural features improve accuracy for sarcasm detection, with error distributions that differ from those of discrete manual features. |
Tasks | Sarcasm Detection |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1231/ |
PDF | https://www.aclweb.org/anthology/C16-1231 |
PWC | https://paperswithcode.com/paper/tweet-sarcasm-detection-using-deep-neural |
Repo | https://github.com/zhangmeishan/SarcasmDetection |
Framework | none |