May 4, 2019

Paper Group NANR 209

Implicit readability ranking using the latent variable of a Bayesian Probit model. CTAP: A Web-Based Tool Supporting Automatic Complexity Analysis. Graph- and surface-level sentence chunking. Hit Songs’ Sentiments Harness Public Mood & Predict Stock Market. Combining fast_align with Hierarchical Sub-sentential Alignment for Better Word Alignments …

Implicit readability ranking using the latent variable of a Bayesian Probit model

Title Implicit readability ranking using the latent variable of a Bayesian Probit model
Authors Johan Falkenjack, Arne Jönsson
Abstract Data-driven approaches to readability analysis for languages other than English have been plagued by a scarcity of suitable corpora. Often, relevant corpora consist only of easy-to-read texts with no rank information or empirical readability scores, making only binary approaches, such as classification, applicable. We propose a Bayesian latent variable approach to get the most out of these kinds of corpora. In this paper we present results on using such a model for readability ranking. The model is evaluated on a preliminary corpus of ranked student texts with encouraging results. We also assess the model by showing that it performs readability classification on par with a state-of-the-art classifier while at the same time being transparent enough to allow more sophisticated interpretations.
Tasks
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4112/
PDF https://www.aclweb.org/anthology/W16-4112
PWC https://paperswithcode.com/paper/implicit-readability-ranking-using-the-latent
Repo
Framework
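
The abstract above describes ranking texts by the latent variable of a probit model trained only on binary (easy vs. not easy) labels. The following is a minimal sketch of that idea, not the authors' model: the feature names, weights, and bias are hypothetical, and a fixed weight vector stands in for what a Bayesian fit would estimate. It shows how a single linear predictor can yield both a class probability (through the standard normal CDF) and an ordering of texts.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF, which maps the latent score to a probability."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def latent_score(features, weights, bias=0.0):
    """Linear predictor that plays the role of the latent variable's mean."""
    return bias + sum(w * x for w, x in zip(weights, features))

def rank_by_latent(texts, weights, bias=0.0):
    """Sort text ids from highest to lowest latent score."""
    scored = {name: latent_score(f, weights, bias) for name, f in texts.items()}
    return sorted(scored, key=scored.get, reverse=True)

# Hypothetical features per text: (mean sentence length, mean word length).
texts = {"a": (8.0, 4.0), "b": (15.0, 5.0), "c": (22.0, 6.0)}
# Hypothetical weights: negative, so longer sentences and words lower P(easy).
weights = (-0.1, -0.5)
bias = 4.0

ranking = rank_by_latent(texts, weights, bias)
p_easy = {n: normal_cdf(latent_score(f, weights, bias)) for n, f in texts.items()}
```

The ranking needs no rank labels at training time: it falls out of the same latent score that the binary classifier thresholds.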

CTAP: A Web-Based Tool Supporting Automatic Complexity Analysis

Title CTAP: A Web-Based Tool Supporting Automatic Complexity Analysis
Authors Xiaobin Chen, Detmar Meurers
Abstract Informed by research on readability and language acquisition, computational linguists have developed sophisticated tools for the analysis of linguistic complexity. While some tools are starting to become accessible on the web, there still is a disconnect between the features that can in principle be identified based on state-of-the-art computational linguistic analysis, and the analyses a second language acquisition researcher, teacher, or textbook writer can readily obtain and visualize for their own collection of texts. This short paper presents a web-based tool development that aims to meet this challenge. The Common Text Analysis Platform (CTAP) is designed to support fully configurable linguistic feature extraction for a wide range of complexity analyses. It features a user-friendly interface, modularized and reusable analysis component integration, and flexible corpus and feature management. Building on the Unstructured Information Management framework (UIMA), CTAP readily supports integration of state-of-the-art NLP and complexity feature extraction maintaining modularization and reusability. CTAP thereby aims at providing a common platform for complexity analysis, encouraging research collaboration and sharing of feature extraction components, to jointly advance the state-of-the-art in complexity analysis in a form that readily supports real-life use by ordinary users.
Tasks Language Acquisition
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4113/
PDF https://www.aclweb.org/anthology/W16-4113
PWC https://paperswithcode.com/paper/ctap-a-web-based-tool-supporting-automatic
Repo
Framework

Graph- and surface-level sentence chunking

Title Graph- and surface-level sentence chunking
Authors Ewa Muszyńska
Abstract
Tasks Chunking
Published 2016-08-01
URL https://www.aclweb.org/anthology/P16-3014/
PDF https://www.aclweb.org/anthology/P16-3014
PWC https://paperswithcode.com/paper/graph-and-surface-level-sentence-chunking
Repo
Framework

Hit Songs’ Sentiments Harness Public Mood & Predict Stock Market

Title Hit Songs’ Sentiments Harness Public Mood & Predict Stock Market
Authors Rachel Harsley, Bhavesh Gupta, Barbara Di Eugenio, Huayi Li
Abstract
Tasks Sentiment Analysis
Published 2016-06-01
URL https://www.aclweb.org/anthology/W16-0406/
PDF https://www.aclweb.org/anthology/W16-0406
PWC https://paperswithcode.com/paper/hit-songsa-sentiments-harness-public-mood
Repo
Framework

Combining fast_align with Hierarchical Sub-sentential Alignment for Better Word Alignments

Title Combining fast_align with Hierarchical Sub-sentential Alignment for Better Word Alignments
Authors Hao Wang, Yves Lepage
Abstract fast_align is a simple and fast word alignment tool which is widely used in state-of-the-art machine translation systems. It yields comparable results in the end-to-end translation experiments of various language pairs. However, fast_align does not perform as well as GIZA++ when applied to language pairs with distinct word orders, like English and Japanese. In this paper, given the lexical translation table output by fast_align, we propose to realign words using the hierarchical sub-sentential alignment approach. Experimental results show that simple additional processing improves the performance of word alignment, which is measured by counting alignment matches in comparison with fast_align. We also report the result of final machine translation in both English-Japanese and Japanese-English. We show our best system provided significant improvements over the baseline as measured by BLEU and RIBES.
Tasks Machine Translation, Word Alignment
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4501/
PDF https://www.aclweb.org/anthology/W16-4501
PWC https://paperswithcode.com/paper/combining-fast_align-with-hierarchical-sub
Repo
Framework

Using Ambiguity Detection to Streamline Linguistic Annotation

Title Using Ambiguity Detection to Streamline Linguistic Annotation
Authors Wajdi Zaghouani, Abdelati Hawwari, Sawsan Alqahtani, Houda Bouamor, Mahmoud Ghoneim, Mona Diab, Kemal Oflazer
Abstract Arabic writing is typically underspecified for short vowels and other markups, referred to as diacritics. In addition to the lexical ambiguity exhibited in most languages, the lack of diacritics in written Arabic adds another layer of ambiguity which is an artifact of the orthography. In this paper, we present the details of three experimental annotation conditions designed to study the impact of automatic ambiguity detection on annotation speed and quality in a large-scale annotation project.
Tasks Machine Translation, Speech Recognition
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4115/
PDF https://www.aclweb.org/anthology/W16-4115
PWC https://paperswithcode.com/paper/using-ambiguity-detection-to-streamline
Repo
Framework

Mixed Linear Regression with Multiple Components

Title Mixed Linear Regression with Multiple Components
Authors Kai Zhong, Prateek Jain, Inderjit S. Dhillon
Abstract In this paper, we study the mixed linear regression (MLR) problem, where the goal is to recover multiple underlying linear models from their unlabeled linear measurements. We propose a non-convex objective function which we show is locally strongly convex in the neighborhood of the ground truth. We use a tensor method for initialization so that the initial models are in the local strong convexity region. We then employ general convex optimization algorithms to minimize the objective function. To the best of our knowledge, our approach provides the first exact recovery guarantees for the MLR problem with $K \geq 2$ components. Moreover, our method has near-optimal computational complexity $\tilde O (Nd)$ as well as near-optimal sample complexity $\tilde O (d)$ for constant $K$. Furthermore, we show that our non-convex formulation can be extended to solving the subspace clustering problem as well. In particular, when initialized within a small constant distance to the true subspaces, our method converges to the global optima (and recovers true subspaces) in time linear in the number of points. Furthermore, our empirical results indicate that even with random initialization, our approach converges to the global optima in linear time, providing speed-up of up to two orders of magnitude.
Tasks
Published 2016-12-01
URL http://papers.nips.cc/paper/6240-mixed-linear-regression-with-multiple-components
PDF http://papers.nips.cc/paper/6240-mixed-linear-regression-with-multiple-components.pdf
PWC https://paperswithcode.com/paper/mixed-linear-regression-with-multiple
Repo
Framework
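
The abstract above does not spell out its non-convex objective. As an illustration only, the sketch below assumes one objective of this kind: for each sample, the product over the K candidate models of the squared residual, averaged over samples. Under that assumption the objective is exactly zero when every measurement is explained by at least one model, which is what makes recovery from unlabeled measurements possible in the noise-free case. The ground-truth vectors, sample count, and perturbation are hypothetical.

```python
import random

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def mlr_objective(xs, ys, models):
    """Mean over samples of the product over components of squared residuals."""
    total = 0.0
    for x, y in zip(xs, ys):
        term = 1.0
        for w in models:
            term *= (y - dot(x, w)) ** 2
        total += term
    return total / len(xs)

rng = random.Random(0)
true_models = [[1.0, -2.0], [-3.0, 0.5]]  # hypothetical ground truth, K = 2
xs = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
# Each measurement comes from one component, but which one is not recorded.
ys = [dot(x, rng.choice(true_models)) for x in xs]

at_truth = mlr_objective(xs, ys, true_models)
perturbed = [[1.2, -2.0], [-3.0, 0.7]]
off_truth = mlr_objective(xs, ys, perturbed)
```

Here `at_truth` is zero because one factor of every product vanishes, while `off_truth` is strictly positive. The sketch covers only the objective; the tensor initialization and the optimizer mentioned in the abstract are not shown.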

A Comparison Between Morphological Complexity Measures: Typological Data vs. Language Corpora

Title A Comparison Between Morphological Complexity Measures: Typological Data vs. Language Corpora
Authors Christian Bentz, Tatyana Ruzsics, Alexander Koplenig, Tanja Samardžić
Abstract Language complexity is an intriguing phenomenon argued to play an important role in both language learning and processing. The need to compare languages with regard to their complexity resulted in a multitude of approaches and methods, ranging from accounts targeting specific structural features to global quantification of variation more generally. In this paper, we investigate the degree to which morphological complexity measures are mutually correlated in a sample of more than 500 languages of 101 language families. We use human expert judgements from the World Atlas of Language Structures (WALS), and compare them to four quantitative measures automatically calculated from language corpora. These consist of three previously defined corpus-derived measures, which are all monolingual, and one new measure based on automatic word-alignment across pairs of languages. We find strong correlations between all the measures, illustrating that both expert judgements and automated approaches converge to similar complexity ratings, and can be used interchangeably.
Tasks Machine Translation, Word Alignment
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4117/
PDF https://www.aclweb.org/anthology/W16-4117
PWC https://paperswithcode.com/paper/a-comparison-between-morphological-complexity
Repo
Framework

Enriching Phrase Tables for Statistical Machine Translation Using Mixed Embeddings

Title Enriching Phrase Tables for Statistical Machine Translation Using Mixed Embeddings
Authors Peyman Passban, Qun Liu, Andy Way
Abstract The phrase table is considered to be the main bilingual resource for the phrase-based statistical machine translation (PBSMT) model. During translation, a source sentence is decomposed into several phrases. The best match of each source phrase is selected among several target-side counterparts within the phrase table, and processed by the decoder to generate a sentence-level translation. The best match is chosen according to several factors, including a set of bilingual features. PBSMT engines by default provide four probability scores in phrase tables which are considered as the main set of bilingual features. Our goal is to enrich that set of features, as a better feature set should yield better translations. We propose new scores generated by a Convolutional Neural Network (CNN) which indicate the semantic relatedness of phrase pairs. We evaluate our model in different experimental settings with different language pairs. We observe significant improvements when the proposed features are incorporated into the PBSMT pipeline.
Tasks Document Classification, Machine Translation, Word Embeddings
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1243/
PDF https://www.aclweb.org/anthology/C16-1243
PWC https://paperswithcode.com/paper/enriching-phrase-tables-for-statistical
Repo
Framework

Learning Transducer Models for Morphological Analysis from Example Inflections

Title Learning Transducer Models for Morphological Analysis from Example Inflections
Authors Markus Forsberg, Mans Hulden
Abstract
Tasks Morphological Analysis, Morphological Inflection
Published 2016-08-01
URL https://www.aclweb.org/anthology/W16-2405/
PDF https://www.aclweb.org/anthology/W16-2405
PWC https://paperswithcode.com/paper/learning-transducer-models-for-morphological
Repo
Framework

Nomen Omen. Enhancing the Latin Morphological Analyser Lemlat with an Onomasticon

Title Nomen Omen. Enhancing the Latin Morphological Analyser Lemlat with an Onomasticon
Authors Marco Budassi, Marco Passarotti
Abstract
Tasks Morphological Analysis, Morphological Inflection, Named Entity Recognition
Published 2016-08-01
URL https://www.aclweb.org/anthology/W16-2110/
PDF https://www.aclweb.org/anthology/W16-2110
PWC https://paperswithcode.com/paper/nomen-omen-enhancing-the-latin-morphological
Repo
Framework

Testing the Processing Hypothesis of word order variation using a probabilistic language model

Title Testing the Processing Hypothesis of word order variation using a probabilistic language model
Authors Jelke Bloem
Abstract This work investigates the application of a measure of surprisal to modeling a grammatical variation phenomenon between near-synonymous constructions. We investigate a particular variation phenomenon, word order variation in Dutch two-verb clusters, where it has been established that word order choice is affected by processing cost. Several multifactorial corpus studies of Dutch verb clusters have used other measures of processing complexity to show that this factor affects word order choice. This previous work allows us to compare the surprisal measure, which is based on constraint satisfaction theories of language modeling, to those previously used measures, which are more directly linked to empirical observations of processing complexity. Our results show that surprisal does not predict the word order choice by itself, but is a significant predictor when used in a measure of uniform information density (UID). This lends support to the view that human language processing is facilitated not so much by predictable sequences of words but more by sequences of words in which information is spread evenly.
Tasks Language Modelling
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4120/
PDF https://www.aclweb.org/anthology/W16-4120
PWC https://paperswithcode.com/paper/testing-the-processing-hypothesis-of-word
Repo
Framework
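
The distinction the abstract above draws between raw surprisal and uniform information density can be made concrete with a small sketch. Surprisal is the negative log probability of a word given its context; the abstract does not say how its UID measure is computed, so the variance of per-word surprisal used here is an assumption (one common way to quantify evenness), and the word probabilities are hypothetical.

```python
import math

def surprisal(p: float) -> float:
    """Surprisal in bits of an event with probability p."""
    return -math.log2(p)

def uid_variance(probs):
    """Population variance of per-word surprisal; 0 means perfectly even."""
    s = [surprisal(p) for p in probs]
    mean = sum(s) / len(s)
    return sum((v - mean) ** 2 for v in s) / len(s)

# Hypothetical per-word probabilities from some language model.
even = [0.25, 0.25, 0.25, 0.25]
peaky = [0.5, 0.5, 0.5, 0.03125]

even_bits = sum(surprisal(p) for p in even)
peaky_bits = sum(surprisal(p) for p in peaky)
even_var = uid_variance(even)
peaky_var = uid_variance(peaky)
```

Both sequences carry the same total information, so total surprisal cannot tell them apart; only the variance-based measure separates the evenly spread sequence from the one with a single high-surprisal word.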

Crossmodal Network-Based Distributional Semantic Models

Title Crossmodal Network-Based Distributional Semantic Models
Authors Elias Iosif, Alexandros Potamianos
Abstract Despite the recent success of distributional semantic models (DSMs) in various semantic tasks, they remain disconnected from real-world perceptual cues since they typically rely on linguistic features. Text data constitute the dominant source of features for the majority of such models, although there is evidence from cognitive science that cues from other modalities contribute to the acquisition and representation of semantic knowledge. In this work, we propose the crossmodal extension of a two-tier text-based model, where semantic representations are encoded in the first layer, while the second layer is used for computing similarity between words. We exploit text- and image-derived features for performing computations at each layer, as well as various approaches for their crossmodal fusion. It is shown that the crossmodal model performs better (from 0.68 to 0.71 correlation coefficient) than the unimodal one for the task of similarity computation between words.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1627/
PDF https://www.aclweb.org/anthology/L16-1627
PWC https://paperswithcode.com/paper/crossmodal-network-based-distributional
Repo
Framework
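
The abstract above mentions several approaches to crossmodal fusion without detailing them. The sketch below shows one generic option, late fusion, in which a similarity is computed separately per modality and the two scores are combined with a mixing weight. It is an assumption made for illustration and not the paper's network-based model; the vectors and the weight `alpha` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den

def fused_similarity(text_pair, image_pair, alpha=0.5):
    """Late fusion: convex combination of per-modality cosine similarities."""
    return alpha * cosine(*text_pair) + (1.0 - alpha) * cosine(*image_pair)

# Hypothetical text-derived and image-derived vectors for a word pair.
text_pair = ([1.0, 0.0], [1.0, 0.0])   # identical in the text space
image_pair = ([1.0, 0.0], [0.0, 1.0])  # orthogonal in the image space

score = fused_similarity(text_pair, image_pair, alpha=0.5)
text_only = fused_similarity(text_pair, image_pair, alpha=1.0)
```

Setting `alpha` to 1.0 recovers the text-only (unimodal) score, which gives a direct baseline for judging whether the image features help.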

Coreference in Wikipedia: Main Concept Resolution

Title Coreference in Wikipedia: Main Concept Resolution
Authors Abbas Ghaddar, Philippe Langlais
Abstract
Tasks Coreference Resolution, Open Information Extraction
Published 2016-08-01
URL https://www.aclweb.org/anthology/K16-1023/
PDF https://www.aclweb.org/anthology/K16-1023
PWC https://paperswithcode.com/paper/coreference-in-wikipedia-main-concept
Repo
Framework

Temporal Lobes as Combinatory Engines for both Form and Meaning

Title Temporal Lobes as Combinatory Engines for both Form and Meaning
Authors Jixing Li, Jonathan Brennan, Adam Mahar, John Hale
Abstract The relative contributions of meaning and form to sentence processing remain an outstanding issue across the language sciences. We examine this issue by formalizing four incremental complexity metrics and comparing them against freely-available ROI timecourses. Syntax-related metrics based on top-down parsing and structural dependency-distance turn out to significantly improve a regression model, compared to a simpler model that formalizes only conceptual combination using a distributional vector-space model. This confirms the view of the anterior temporal lobes as combinatory engines that deal in both form (see e.g., Brennan et al., 2012; Mazoyer, 1993) and meaning (see e.g., Patterson et al., 2007). This same characterization applies to a posterior temporal region in roughly “Wernicke’s Area.”
Tasks
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4121/
PDF https://www.aclweb.org/anthology/W16-4121
PWC https://paperswithcode.com/paper/temporal-lobes-as-combinatory-engines-for
Repo
Framework