July 26, 2019

Paper Group NANR 95

A Code-Switching Corpus of Turkish-German Conversations. Multi-view Matrix Factorization for Linear Dynamical System Estimation. Twitter Language Identification Of Similar Languages And Dialects Without Ground Truth. Evaluating HeLI with Non-Linear Mappings. A Perplexity-Based Method for Similar Languages Discrimination. Developing collection manag …

A Code-Switching Corpus of Turkish-German Conversations

Title A Code-Switching Corpus of Turkish-German Conversations
Authors Özlem Çetinoğlu
Abstract We present a code-switching corpus of Turkish-German collected by recording conversations of bilinguals. The recordings are then transcribed in two layers following speech and orthography conventions, and annotated with sentence boundaries and intersentential, intrasentential, and intra-word switch points. The total amount of data is 5 hours of speech, which corresponds to 3614 sentences. The corpus aims to serve as a resource for speech or text analysis, as well as a collection for linguistic inquiries.
Tasks Language Identification, Language Modelling, Part-Of-Speech Tagging, Sentiment Analysis, Speech Recognition
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-0804/
PDF https://www.aclweb.org/anthology/W17-0804
PWC https://paperswithcode.com/paper/a-code-switching-corpus-of-turkish-german
Repo
Framework
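The corpus abstract distinguishes three kinds of switch points. As a rough illustration of how such counts could be derived, the sketch below assumes a hypothetical token-level tagging (language codes plus a `MIXED` tag for intra-word switches); the corpus's real annotation format is not described here, and the example sentences are invented.

```python
from typing import Dict, List, Tuple

# Hypothetical annotation format (an assumption, not the corpus's own format):
# each sentence is a list of (token, tag) pairs, where tag is a language code
# such as "TR" or "DE", or "MIXED" for a token that switches language inside the word.
Sentence = List[Tuple[str, str]]


def count_switch_points(sentences: List[Sentence]) -> Dict[str, int]:
    counts = {"intersentential": 0, "intrasentential": 0, "intra-word": 0}
    prev_last_lang = None  # last single-language tag of the previous sentence
    for sentence in sentences:
        # Intra-word switches: one per token annotated as MIXED.
        counts["intra-word"] += sum(1 for _, tag in sentence if tag == "MIXED")
        # Language sequence of the sentence, skipping mixed tokens.
        langs = [tag for _, tag in sentence if tag != "MIXED"]
        if not langs:
            continue
        # Intersentential: the language changes across a sentence boundary.
        if prev_last_lang is not None and langs[0] != prev_last_lang:
            counts["intersentential"] += 1
        # Intrasentential: the language changes between neighbouring tokens.
        counts["intrasentential"] += sum(1 for a, b in zip(langs, langs[1:]) if a != b)
        prev_last_lang = langs[-1]
    return counts


example = [
    [("Ich", "DE"), ("bin", "DE"), ("çok", "TR"), ("müde", "DE")],
    [("Bugün", "TR"), ("Schuleye", "MIXED"), ("gittim", "TR")],
]
print(count_switch_points(example))
# {'intersentential': 1, 'intrasentential': 2, 'intra-word': 1}
```

Skipping `MIXED` tokens when looking for intrasentential switches keeps the three categories from double-counting the same switch; a real scheme might treat them differently.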

Multi-view Matrix Factorization for Linear Dynamical System Estimation

Title Multi-view Matrix Factorization for Linear Dynamical System Estimation
Authors Mahdi Karami, Martha White, Dale Schuurmans, Csaba Szepesvari
Abstract We consider maximum likelihood estimation of linear dynamical systems with generalized-linear observation models. Maximum likelihood is typically considered to be hard in this setting since latent states and transition parameters must be inferred jointly. Given that expectation-maximization does not scale and is prone to local minima, moment-matching approaches from the subspace identification literature have become standard, despite known statistical efficiency issues. In this paper, we instead reconsider likelihood maximization and develop an optimization-based strategy for recovering the latent states and transition parameters. Key to the approach is a two-view reformulation of maximum likelihood estimation for linear dynamical systems that enables the use of global optimization algorithms for matrix factorization. We show that the proposed estimation strategy outperforms widely-used identification algorithms such as subspace identification methods, both in terms of accuracy and runtime.
Tasks
Published 2017-12-01
URL http://papers.nips.cc/paper/7284-multi-view-matrix-factorization-for-linear-dynamical-system-estimation
PDF http://papers.nips.cc/paper/7284-multi-view-matrix-factorization-for-linear-dynamical-system-estimation.pdf
PWC https://paperswithcode.com/paper/multi-view-matrix-factorization-for-linear
Repo
Framework

Twitter Language Identification Of Similar Languages And Dialects Without Ground Truth

Title Twitter Language Identification Of Similar Languages And Dialects Without Ground Truth
Authors Jennifer Williams, Charlie Dagli
Abstract We present a new method to bootstrap filter Twitter language ID labels in our dataset for automatic language identification (LID). Our method combines geo-location, original Twitter LID labels, and Amazon Mechanical Turk to resolve missing and unreliable labels. We are the first to compare LID classification performance using the MIRA algorithm and langid.py. We show classifier performance on different versions of our dataset with high accuracy using only Twitter data, without ground truth, and very few training examples. We also show how Platt scaling can be used to calibrate MIRA classifier output values into a probability distribution over candidate classes, making the output more intuitive. Our method allows for fine-grained distinctions between similar languages and dialects and allows us to rediscover the language composition of our Twitter dataset.
Tasks Language Identification, Sentiment Analysis
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1209/
PDF https://www.aclweb.org/anthology/W17-1209
PWC https://paperswithcode.com/paper/twitter-language-identification-of-similar
Repo
Framework
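Platt scaling fits a sigmoid that maps raw classifier scores to probabilities. The abstract does not say how the multi-class MIRA scores were calibrated, so the sketch below assumes one common recipe: fit a binary sigmoid per class (one-vs-rest) on held-out scores, then renormalise across candidate classes. The scores, labels, and language codes are toy values.

```python
import math
from typing import Dict, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(scores: Sequence[float], labels: Sequence[int],
              lr: float = 0.1, steps: int = 5000) -> Tuple[float, float]:
    """Fit p(y=1 | s) = sigmoid(a * s + b) to raw classifier scores by
    gradient descent on the log loss (Platt's A and B correspond to -a and -b)."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of the log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrated_distribution(class_scores: Dict[str, float],
                            params: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    # One-vs-rest: map each class's raw score through its own fitted sigmoid,
    # then renormalise so the values form a distribution over candidate classes.
    probs = {c: sigmoid(params[c][0] * s + params[c][1]) for c, s in class_scores.items()}
    total = sum(probs.values())
    return {c: p / total for c, p in probs.items()}


# Toy held-out scores for one class (1 = correct label), deliberately overlapping.
scores = [-2.0, -1.0, -0.5, -0.2, 0.3, 0.5, 1.0, 2.0]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
a, b = fit_platt(scores, labels)
print(calibrated_distribution({"es": 1.2, "pt": -0.4}, {"es": (a, b), "pt": (a, b)}))
```

Reusing one (a, b) pair for both classes only keeps the example short; in practice each class would be fitted on its own held-out scores.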

Evaluating HeLI with Non-Linear Mappings

Title Evaluating HeLI with Non-Linear Mappings
Authors Tommi Jauhiainen, Krister Lindén, Heidi Jauhiainen
Abstract In this paper we describe the non-linear mappings we used with the Helsinki language identification method, HeLI, in the 4th edition of the Discriminating between Similar Languages (DSL) shared task, which was organized as part of the VarDial 2017 workshop. Our SUKI team participated in the closed track together with 10 other teams. Our system reached 7th position in the track. We describe the HeLI method and the non-linear mappings in mathematical notation. The HeLI method uses a probabilistic model with character n-grams and word-based backoff. We also describe our trials using the non-linear mappings instead of relative frequencies, and we present statistics about the back-off function of the HeLI method.
Tasks Language Identification
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1212/
PDF https://www.aclweb.org/anthology/W17-1212
PWC https://paperswithcode.com/paper/evaluating-heli-with-non-linear-mappings
Repo
Framework
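The abstract names HeLI's ingredients (character n-grams with a word-level model and backoff) without giving formulas. The sketch below is a simplified, assumption-laden version: words are scored with a word model when any language has seen them, otherwise with character n-grams from longest to shortest, using negative log10 relative frequencies and a hypothetical `PENALTY` for unseen features. It does not model the non-linear mappings the paper evaluates, and the training strings are toy examples.

```python
import math
from collections import Counter
from typing import Dict, List

PENALTY = 7.0  # hypothetical score for a feature unseen in a language (-log10 scale)


def char_ngrams(word: str, n: int) -> List[str]:
    padded = f" {word} "  # spaces mark word boundaries
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class HeLISketch:
    def __init__(self, training: Dict[str, str], max_n: int = 4):
        self.max_n = max_n
        self.langs = list(training)
        # freqs[lang][order][feature] = relative frequency;
        # order 0 = whole words, 1..max_n = character n-grams.
        self.freqs: Dict[str, Dict[int, Dict[str, float]]] = {}
        for lang, text in training.items():
            words = text.lower().split()
            tables = {0: Counter(words)}
            for n in range(1, max_n + 1):
                tables[n] = Counter(g for w in words for g in char_ngrams(w, n))
            self.freqs[lang] = {}
            for order, table in tables.items():
                total = sum(table.values())
                self.freqs[lang][order] = {f: c / total for f, c in table.items()}

    def word_scores(self, word: str) -> Dict[str, float]:
        # Back off from the word model to character n-grams, longest first, and
        # use the first order whose features at least one language has seen.
        for order in [0] + list(range(self.max_n, 0, -1)):
            feats = [word] if order == 0 else char_ngrams(word, order)
            if not feats:
                continue
            if any(f in self.freqs[l][order] for l in self.langs for f in feats):
                return {
                    l: sum(-math.log10(self.freqs[l][order][f]) if f in self.freqs[l][order]
                           else PENALTY for f in feats) / len(feats)
                    for l in self.langs
                }
        return {l: PENALTY for l in self.langs}

    def identify(self, text: str) -> str:
        # Sum per-word scores; the language with the lowest total wins.
        totals = {l: 0.0 for l in self.langs}
        for word in text.lower().split():
            for l, score in self.word_scores(word).items():
                totals[l] += score
        return min(totals, key=totals.get)


model = HeLISketch({
    "fi": "minä olen kotona ja hän on koulussa",
    "et": "mina olen kodus ja tema on koolis",
})
print(model.identify("hän on kotona"))  # fi
print(model.identify("kodu"))  # unseen word, resolved by character 4-grams: et
```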

A Perplexity-Based Method for Similar Languages Discrimination

Title A Perplexity-Based Method for Similar Languages Discrimination
Authors Pablo Gamallo, Jose Ramom Pichel, Iñaki Alegria
Abstract This article describes the system submitted by the Citius_Ixa_Imaxin team to VarDial 2017 (DSL and GDI tasks). The strategy underlying our system is based on a language distance computed by means of model perplexity. The best model configuration we tested is a voting system that uses several n-gram models of both words and characters, even though word unigrams turned out to be a very competitive model, with reasonable results in the tasks in which we participated. An error analysis was performed, in which we identified many test examples with no linguistic evidence to distinguish among the variants.
Tasks Language Identification
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1213/
PDF https://www.aclweb.org/anthology/W17-1213
PWC https://paperswithcode.com/paper/a-perplexity-based-method-for-similar
Repo
Framework
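A perplexity-based distance can be illustrated with a single character language model per variety. The system described above votes over several word and character n-gram models; the sketch below substitutes one character trigram model with add-one smoothing, an assumption made for brevity, and picks the variety whose model assigns the test text the lowest perplexity. The Portuguese and Galician training strings are toy examples.

```python
import math
from collections import Counter
from typing import Dict


class CharNgramLM:
    """Character n-gram language model with add-one (Laplace) smoothing."""

    def __init__(self, text: str, n: int = 3):
        self.n = n
        padded = " " * (n - 1) + text.lower()
        spans = range(len(padded) - n + 1)
        self.ngrams = Counter(padded[i:i + n] for i in spans)
        self.contexts = Counter(padded[i:i + n - 1] for i in spans)
        self.vocab_size = len(set(padded)) + 1  # +1 reserves mass for unseen characters

    def perplexity(self, text: str) -> float:
        padded = " " * (self.n - 1) + text.lower()
        log_prob, count = 0.0, 0
        for i in range(len(padded) - self.n + 1):
            gram = padded[i:i + self.n]
            # Smoothed conditional probability P(last char | preceding n-1 chars).
            p = (self.ngrams[gram] + 1) / (self.contexts[gram[:-1]] + self.vocab_size)
            log_prob += math.log(p)
            count += 1
        if count == 0:
            return float("inf")  # empty input: no evidence either way
        return math.exp(-log_prob / count)


def classify(text: str, models: Dict[str, CharNgramLM]) -> str:
    # Lower perplexity is read as a smaller distance between the text and the variety.
    return min(models, key=lambda variety: models[variety].perplexity(text))


models = {
    "pt": CharNgramLM("eu não sei onde está a minha casa"),
    "gl": CharNgramLM("eu non sei onde está a miña casa"),
}
print(classify("non sei", models))  # gl
print(classify("não sei", models))  # pt
```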

Developing collection management tools to create more robust and reliable linguistic data

Title Developing collection management tools to create more robust and reliable linguistic data
Authors Gary Holton, Kavon Hooshiar, Nick Thieberger
Abstract
Tasks
Published 2017-03-01
URL https://www.aclweb.org/anthology/W17-0105/
PDF https://www.aclweb.org/anthology/W17-0105
PWC https://paperswithcode.com/paper/developing-collection-management-tools-to
Repo
Framework

LABDA at SemEval-2017 Task 10: Relation Classification between keyphrases via Convolutional Neural Network

Title LABDA at SemEval-2017 Task 10: Relation Classification between keyphrases via Convolutional Neural Network
Authors Víctor Suárez-Paniagua, Isabel Segura-Bedmar, Paloma Martínez
Abstract In this paper, we describe our participation in the subtask of extracting relationships between two identified keyphrases. This task can be very helpful in improving search engines for scientific articles. Our approach is based on a convolutional neural network (CNN) trained on the training dataset. This deep learning model has already achieved successful results in extracting relationships between named entities. Thus, our hypothesis is that this model can also be applied to extract relations between keyphrases. The official results of the task show that our architecture obtained an F1-score of 0.38% for Keyphrases Relation Classification. This performance is lower than expected due to the generic preprocessing phase and the basic configuration of the CNN model; more complex architectures are proposed as future work to increase the classification rate.
Tasks Lemmatization, Relation Classification, Relation Extraction, Sentence Classification
Published 2017-08-01
URL https://www.aclweb.org/anthology/S17-2169/
PDF https://www.aclweb.org/anthology/S17-2169
PWC https://paperswithcode.com/paper/labda-at-semeval-2017-task-10-relation
Repo
Framework

Domain-Targeted, High Precision Knowledge Extraction

Title Domain-Targeted, High Precision Knowledge Extraction
Authors Bhavana Dalvi Mishra, Niket Tandon, Peter Clark
Abstract Our goal is to construct a domain-targeted, high precision knowledge base (KB), containing general (subject, predicate, object) statements about the world, in support of a downstream question-answering (QA) application. Despite recent advances in information extraction (IE) techniques, no suitable resource for our task already exists; existing resources are either too noisy, too named-entity centric, or too incomplete, and typically have not been constructed with a clear scope or purpose. To address these, we have created a domain-targeted, high precision knowledge extraction pipeline, leveraging Open IE, crowdsourcing, and a novel canonical schema learning algorithm (called CASI), that produces high precision knowledge targeted to a particular domain (in our case, elementary science). To measure the KB's coverage of the target domain's knowledge (its "comprehensiveness" with respect to science) we measure recall with respect to an independent corpus of domain text, and show that our pipeline produces output with over 80% precision and 23% recall with respect to that target, a substantially higher coverage of tuple-expressible science knowledge than other comparable resources. We have made the KB publicly available.
Tasks Open Information Extraction, Question Answering
Published 2017-01-01
URL https://www.aclweb.org/anthology/Q17-1017/
PDF https://www.aclweb.org/anthology/Q17-1017
PWC https://paperswithcode.com/paper/domain-targeted-high-precision-knowledge
Repo
Framework
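The precision and recall figures reported for the knowledge extraction pipeline can be read as two ratios over sets of triples. The sketch below assumes exact string matching between (subject, predicate, object) triples, which is a simplification; the paper's own matching against domain text is likely more lenient. All triples are invented.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def precision_recall(extracted: Set[Triple], judged_correct: Set[Triple],
                     reference: Set[Triple]) -> Tuple[float, float]:
    # Precision: share of extracted triples that annotators judged correct.
    precision = len(extracted & judged_correct) / len(extracted) if extracted else 0.0
    # Recall: share of reference triples (drawn from independent domain text)
    # that the KB contains, using exact matching for simplicity.
    recall = len(extracted & reference) / len(reference) if reference else 0.0
    return precision, recall


kb = {("plant", "need", "water"), ("animal", "eat", "food"), ("sun", "orbit", "earth")}
correct = {("plant", "need", "water"), ("animal", "eat", "food")}
reference = {("plant", "need", "water"), ("animal", "eat", "food"),
             ("ice", "melt into", "water"), ("moon", "orbit", "earth")}
print(precision_recall(kb, correct, reference))  # (0.6666666666666666, 0.5)
```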

Affordable On-line Dialogue Policy Learning

Title Affordable On-line Dialogue Policy Learning
Authors Cheng Chang, Runzhe Yang, Lu Chen, Xiang Zhou, Kai Yu
Abstract The key to building an evolvable dialogue system in real-world scenarios is to ensure affordable on-line dialogue policy learning, which requires the on-line learning process to be safe, efficient and economical. In reality, however, the scarcity of real interaction data means the dialogue system usually grows slowly. Moreover, a poor initial dialogue policy easily leads to a bad user experience and fails to attract users to contribute training data, so the learning process becomes unsustainable. To quantify this, two metrics are proposed to assess safety and efficiency. To solve the unsustainable learning problem, we propose a complete companion teaching framework incorporating guidance from a human teacher. Since human teaching is expensive, we compare various teaching schemes, answering the questions of how and when to teach, in order to use the teaching budget economically and make the on-line learning process affordable.
Tasks Dialogue Management
Published 2017-09-01
URL https://www.aclweb.org/anthology/D17-1234/
PDF https://www.aclweb.org/anthology/D17-1234
PWC https://paperswithcode.com/paper/affordable-on-line-dialogue-policy-learning
Repo
Framework

Identification of Character Adjectives from Mahabharata

Title Identification of Character Adjectives from Mahabharata
Authors Apurba Paul, Dipankar Das
Abstract The present paper describes the identification of prominent characters and their adjectives from English texts of the Indian mythological epic Mahabharata. However, in contrast to traditional approaches to named entity identification, the present system extracts hidden attributes associated with each of the characters (e.g., character adjectives). We observed distinct phrase-level linguistic patterns that hint at the presence of characters in different text spans. Six such patterns were used to extract the characters. In addition, a distinguishing set of novel features (e.g., multi-word expressions, nodes and paths of the parse tree, immediate ancestors, etc.) was employed. Further, the correlation of the features was measured in order to identify the important features. Finally, we applied various machine learning algorithms (e.g., Naive Bayes, KNN, Logistic Regression, Decision Tree, Random Forest, etc.) along with deep learning to classify the patterns as characters or non-characters in order to achieve decent accuracy. Evaluation shows that phrase-level linguistic patterns as well as the adopted features are highly effective in capturing characters and their adjectives.
Tasks
Published 2017-09-01
URL https://www.aclweb.org/anthology/R17-1074/
PDF https://doi.org/10.26615/978-954-452-049-6_074
PWC https://paperswithcode.com/paper/identification-of-character-adjectives-from
Repo
Framework

Harmonic Serialism and Finite-State Optimality Theory

Title Harmonic Serialism and Finite-State Optimality Theory
Authors Yiding Hao
Abstract
Tasks
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-4003/
PDF https://www.aclweb.org/anthology/W17-4003
PWC https://paperswithcode.com/paper/harmonic-serialism-and-finite-state
Repo
Framework

Computational Characterization of Mental States: A Natural Language Processing Approach

Title Computational Characterization of Mental States: A Natural Language Processing Approach
Authors Facundo Carrillo
Abstract
Tasks
Published 2017-07-01
URL https://www.aclweb.org/anthology/P17-3001/
PDF https://www.aclweb.org/anthology/P17-3001
PWC https://paperswithcode.com/paper/computational-characterization-of-mental
Repo
Framework

IJCNLP-2017 Task 2: Dimensional Sentiment Analysis for Chinese Phrases

Title IJCNLP-2017 Task 2: Dimensional Sentiment Analysis for Chinese Phrases
Authors Liang-Chih Yu, Lung-Hao Lee, Jin Wang, Kam-Fai Wong
Abstract This paper presents the IJCNLP 2017 shared task on Dimensional Sentiment Analysis for Chinese Phrases (DSAP), which seeks to identify a real-valued sentiment score for Chinese single words and multi-word phrases in both the valence and arousal dimensions. Valence represents the degree of pleasant and unpleasant (or positive and negative) feelings, and arousal represents the degree of excitement and calm. Of the 19 teams registered for this shared task on two-dimensional sentiment analysis, 13 submitted results. We expected that this evaluation campaign could produce more advanced dimensional sentiment analysis techniques, especially for Chinese affective computing. All data sets with gold standards and the scoring script are made publicly available to researchers.
Tasks Sentiment Analysis
Published 2017-12-01
URL https://www.aclweb.org/anthology/I17-4002/
PDF https://www.aclweb.org/anthology/I17-4002
PWC https://paperswithcode.com/paper/ijcnlp-2017-task-2-dimensional-sentiment
Repo
Framework
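Dimensional sentiment systems are usually scored per dimension. The sketch below assumes mean absolute error and Pearson correlation, computed separately for valence and arousal; the released scoring script remains the authority on the official metrics. The gold and predicted values are made up.

```python
import math
from typing import Dict, Sequence, Tuple


def mean_absolute_error(gold: Sequence[float], pred: Sequence[float]) -> float:
    return sum(abs(g - p) for g, p in zip(gold, pred)) / len(gold)


def pearson_r(gold: Sequence[float], pred: Sequence[float]) -> float:
    n = len(gold)
    mean_g, mean_p = sum(gold) / n, sum(pred) / n
    cov = sum((g - mean_g) * (p - mean_p) for g, p in zip(gold, pred))
    sd_g = math.sqrt(sum((g - mean_g) ** 2 for g in gold))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    if sd_g == 0 or sd_p == 0:
        raise ValueError("Pearson correlation is undefined for constant inputs")
    return cov / (sd_g * sd_p)


def evaluate(gold: Sequence[Tuple[float, float]],
             pred: Sequence[Tuple[float, float]]) -> Dict[str, Dict[str, float]]:
    # Each item is a (valence, arousal) pair; the two dimensions are scored separately.
    results = {}
    for index, dimension in enumerate(("valence", "arousal")):
        g = [item[index] for item in gold]
        p = [item[index] for item in pred]
        results[dimension] = {"mae": mean_absolute_error(g, p), "pearson": pearson_r(g, p)}
    return results


gold = [(6.0, 5.0), (3.0, 7.0), (5.0, 4.0)]
pred = [(5.5, 5.0), (3.5, 6.0), (5.0, 4.5)]
print(evaluate(gold, pred))
```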

Parsing with Traces: An O(n^4) Algorithm and a Structural Representation

Title Parsing with Traces: An O(n^4) Algorithm and a Structural Representation
Authors Jonathan K. Kummerfeld, Dan Klein
Abstract General treebank analyses are graph structured, but parsers are typically restricted to tree structures for efficiency and modeling reasons. We propose a new representation and algorithm for a class of graph structures that is flexible enough to cover almost all treebank structures, while still admitting efficient learning and inference. In particular, we consider directed, acyclic, one-endpoint-crossing graph structures, which cover most long-distance dislocation, shared argumentation, and similar tree-violating linguistic phenomena. We describe how to convert phrase structure parses, including traces, to our new representation in a reversible manner. Our dynamic program uniquely decomposes structures, is sound and complete, and covers 97.3% of the Penn English Treebank. We also implement a proof-of-concept parser that recovers a range of null elements and trace types.
Tasks Question Answering
Published 2017-01-01
URL https://www.aclweb.org/anthology/Q17-1031/
PDF https://www.aclweb.org/anthology/Q17-1031
PWC https://paperswithcode.com/paper/parsing-with-traces-an-on4-algorithm-and-a
Repo
Framework
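The one-endpoint-crossing property can be checked directly: two edges cross when their spans interleave, and a graph qualifies when, for every edge, all edges crossing it share a common endpoint. The sketch below tests only that condition on word-index pairs (edge direction and acyclicity are ignored); it is not the paper's representation or its O(n^4) dynamic program.

```python
from typing import Iterable, List, Tuple

Edge = Tuple[int, int]  # (head index, dependent index); direction is ignored below


def span(edge: Edge) -> Edge:
    a, b = edge
    return (a, b) if a < b else (b, a)


def crosses(e: Edge, f: Edge) -> bool:
    (i, j), (k, l) = span(e), span(f)
    # Strict inequalities: edges that share an endpoint never cross.
    return i < k < j < l or k < i < l < j


def is_one_endpoint_crossing(edges: Iterable[Edge]) -> bool:
    spans: List[Edge] = [span(e) for e in edges]
    for e in spans:
        crossing = [f for f in spans if crosses(e, f)]
        if not crossing:
            continue
        # All edges crossing e must share at least one endpoint.
        shared = set(crossing[0])
        for f in crossing[1:]:
            shared &= set(f)
        if not shared:
            return False
    return True


print(is_one_endpoint_crossing([(0, 3), (1, 5), (2, 5)]))  # True: both crossers share 5
print(is_one_endpoint_crossing([(0, 3), (1, 4), (2, 5)]))  # False: (1, 4) and (2, 5) share nothing
```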

Translation Divergences in Chinese–English Machine Translation: An Empirical Investigation

Title Translation Divergences in Chinese–English Machine Translation: An Empirical Investigation
Authors Dun Deng, Nianwen Xue
Abstract In this article, we conduct an empirical investigation of translation divergences between Chinese and English relying on a parallel treebank. To do this, we first devise a hierarchical alignment scheme where Chinese and English parse trees are aligned in a way that eliminates conflicts and redundancies between word alignments and syntactic parses to prevent the generation of spurious translation divergences. Using this Hierarchically Aligned Chinese–English Parallel Treebank (HACEPT), we are able to semi-automatically identify and categorize the translation divergences between the two languages and quantify each type of translation divergence. Our results show that the translation divergences are much broader than described in previous studies that are largely based on anecdotal evidence and linguistic knowledge. The distribution of the translation divergences also shows that some high-profile translation divergences that motivate previous research are actually very rare in our data, whereas other translation divergences that have previously received little attention actually exist in large quantities. We also show that HACEPT allows the extraction of syntax-based translation rules, most of which are expressive enough to capture the translation divergences, and point out that the syntactic annotation in existing treebanks is not optimal for extracting such translation rules. We also discuss the implications of our study for attempts to bridge translation divergences by devising shared semantic representations across languages. Our quantitative results lend further support to the observation that although it is possible to bridge some translation divergences with semantic representations, other translation divergences are open-ended, thus building a semantic representation that captures all possible translation divergences may be impractical.
Tasks Machine Translation
Published 2017-09-01
URL https://www.aclweb.org/anthology/J17-3002/
PDF https://www.aclweb.org/anthology/J17-3002
PWC https://paperswithcode.com/paper/translation-divergences-in-chineseaenglish
Repo
Framework