Paper Group NANR 209
Implicit readability ranking using the latent variable of a Bayesian Probit model. CTAP: A Web-Based Tool Supporting Automatic Complexity Analysis. Graph- and surface-level sentence chunking. Hit Songs’ Sentiments Harness Public Mood & Predict Stock Market. Combining fast_align with Hierarchical Sub-sentential Alignment for Better Word Alignments …
Implicit readability ranking using the latent variable of a Bayesian Probit model
Title | Implicit readability ranking using the latent variable of a Bayesian Probit model |
Authors | Johan Falkenjack, Arne Jönsson |
Abstract | Data-driven approaches to readability analysis for languages other than English have been plagued by a scarcity of suitable corpora. Often, relevant corpora consist only of easy-to-read texts with no rank information or empirical readability scores, making only binary approaches, such as classification, applicable. We propose a Bayesian latent-variable approach to get the most out of these kinds of corpora. In this paper we present results on using such a model for readability ranking. The model is evaluated on a preliminary corpus of ranked student texts with encouraging results. We also assess the model by showing that it performs readability classification on par with a state-of-the-art classifier while at the same time being transparent enough to allow more sophisticated interpretations. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4112/ |
https://www.aclweb.org/anthology/W16-4112 | |
PWC | https://paperswithcode.com/paper/implicit-readability-ranking-using-the-latent |
Repo | |
Framework | |
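The core idea of the paper is that a binary easy/hard label can be modeled as the thresholding of a latent readability score, and that this latent score can then be reused for ranking. A minimal sketch of that idea follows; it is not the authors' Bayesian model (here a plain maximum-likelihood probit via statsmodels stands in for it), and the three document features are hypothetical placeholders.

```python
# Minimal sketch (not the paper's exact Bayesian model): fit a probit classifier on binary
# easy-to-read labels, then reuse the latent linear predictor as an implicit readability rank.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Hypothetical document features, e.g. mean sentence length, mean word length, rare-word ratio.
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, 0.8, -0.5])
latent = X @ true_w + rng.normal(size=200)      # latent "difficulty" score
y = (latent > 0).astype(int)                    # observed binary label: 1 = hard, 0 = easy

probit = sm.Probit(y, sm.add_constant(X)).fit(disp=False)

# Implicit ranking: order unseen texts by the value of the latent linear predictor.
X_new = rng.normal(size=(5, 3))
scores = sm.add_constant(X_new, has_constant='add') @ probit.params
ranking = np.argsort(scores)                    # ascending: easiest first under this sketch
print(ranking, scores)
```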
CTAP: A Web-Based Tool Supporting Automatic Complexity Analysis
Title | CTAP: A Web-Based Tool Supporting Automatic Complexity Analysis |
Authors | Xiaobin Chen, Detmar Meurers |
Abstract | Informed by research on readability and language acquisition, computational linguists have developed sophisticated tools for the analysis of linguistic complexity. While some tools are starting to become accessible on the web, there still is a disconnect between the features that can in principle be identified based on state-of-the-art computational linguistic analysis, and the analyses a second language acquisition researcher, teacher, or textbook writer can readily obtain and visualize for their own collection of texts. This short paper presents a web-based tool development that aims to meet this challenge. The Common Text Analysis Platform (CTAP) is designed to support fully configurable linguistic feature extraction for a wide range of complexity analyses. It features a user-friendly interface, modularized and reusable analysis component integration, and flexible corpus and feature management. Building on the Unstructured Information Management framework (UIMA), CTAP readily supports integration of state-of-the-art NLP and complexity feature extraction maintaining modularization and reusability. CTAP thereby aims at providing a common platform for complexity analysis, encouraging research collaboration and sharing of feature extraction components, to jointly advance the state-of-the-art in complexity analysis in a form that readily supports real-life use by ordinary users. |
Tasks | Language Acquisition |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4113/ |
https://www.aclweb.org/anthology/W16-4113 | |
PWC | https://paperswithcode.com/paper/ctap-a-web-based-tool-supporting-automatic |
Repo | |
Framework | |
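To make the notion of "complexity feature extraction" concrete, here is an illustrative sketch of a few generic surface features of the kind such a platform exposes. This is not CTAP's API or its actual feature set, just a stand-alone example under that assumption.

```python
# Illustrative only: a few generic surface complexity features; not CTAP's components.
import re

def complexity_features(text: str) -> dict:
    sentences = [s for s in re.split(r'[.!?]+', text) if s.strip()]
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    types = set(tokens)
    return {
        'n_sentences': len(sentences),
        'n_tokens': len(tokens),
        'mean_sentence_length': len(tokens) / max(len(sentences), 1),
        'type_token_ratio': len(types) / max(len(tokens), 1),
        'mean_word_length': sum(len(t) for t in tokens) / max(len(tokens), 1),
    }

print(complexity_features("This is a short text. It has two sentences."))
```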
Graph- and surface-level sentence chunking
Title | Graph- and surface-level sentence chunking |
Authors | Ewa Muszyńska |
Abstract | |
Tasks | Chunking |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/P16-3014/ |
https://www.aclweb.org/anthology/P16-3014 | |
PWC | https://paperswithcode.com/paper/graph-and-surface-level-sentence-chunking |
Repo | |
Framework | |
Hit Songs’ Sentiments Harness Public Mood & Predict Stock Market
Title | Hit Songs’ Sentiments Harness Public Mood & Predict Stock Market |
Authors | Rachel Harsley, Bhavesh Gupta, Barbara Di Eugenio, Huayi Li |
Abstract | |
Tasks | Sentiment Analysis |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/W16-0406/ |
https://www.aclweb.org/anthology/W16-0406 | |
PWC | https://paperswithcode.com/paper/hit-songsa-sentiments-harness-public-mood |
Repo | |
Framework | |
Combining fast_align with Hierarchical Sub-sentential Alignment for Better Word Alignments
Title | Combining fast_align with Hierarchical Sub-sentential Alignment for Better Word Alignments |
Authors | Hao Wang, Yves Lepage |
Abstract | fast_align is a simple and fast word alignment tool which is widely used in state-of-the-art machine translation systems. It yields comparable results in end-to-end translation experiments for various language pairs. However, fast_align does not perform as well as GIZA++ when applied to language pairs with distinct word orders, like English and Japanese. In this paper, given the lexical translation table output by fast_align, we propose to realign words using the hierarchical sub-sentential alignment approach. Experimental results show that this simple additional processing improves the performance of word alignment, measured by counting alignment matches in comparison with fast_align. We also report final machine translation results for both English-Japanese and Japanese-English, and show that our best system provides significant improvements over the baseline as measured by BLEU and RIBES. |
Tasks | Machine Translation, Word Alignment |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4501/ |
https://www.aclweb.org/anthology/W16-4501 | |
PWC | https://paperswithcode.com/paper/combining-fast_align-with-hierarchical-sub |
Repo | |
Framework | |
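The realignment idea is recursive: split a source/target span pair into two sub-blocks, either in "straight" or "inverted" orientation, choosing the split that looks best under fast_align's lexical translation probabilities, and recurse into the sub-blocks. The sketch below illustrates that scheme; the block score (mean lexical probability) and the stopping rule are simplifying assumptions, not the paper's exact formulation.

```python
# Simplified hierarchical sub-sentential alignment over a fast_align-style lexical table.
from itertools import product

def block_score(src, tgt, fs, fe, es, ee, table, floor=1e-6):
    """Mean lexical probability t(f|e) over the block src[fs:fe] x tgt[es:ee]."""
    cells = list(product(range(fs, fe), range(es, ee)))
    return sum(table.get((src[i], tgt[j]), floor) for i, j in cells) / len(cells)

def hssa(src, tgt, fs, fe, es, ee, table, links):
    """Recursively bisect the block into straight or inverted sub-blocks."""
    if fe - fs <= 1 or ee - es <= 1:
        links.extend(product(range(fs, fe), range(es, ee)))   # minimal block: link all-to-all
        return
    best = None
    for i in range(fs + 1, fe):
        for j in range(es + 1, ee):
            straight = (block_score(src, tgt, fs, i, es, j, table) +
                        block_score(src, tgt, i, fe, j, ee, table))
            inverted = (block_score(src, tgt, fs, i, j, ee, table) +
                        block_score(src, tgt, i, fe, es, j, table))
            for score, orient in ((straight, 'S'), (inverted, 'I')):
                if best is None or score > best[0]:
                    best = (score, orient, i, j)
    _, orient, i, j = best
    if orient == 'S':
        hssa(src, tgt, fs, i, es, j, table, links)
        hssa(src, tgt, i, fe, j, ee, table, links)
    else:
        hssa(src, tgt, fs, i, j, ee, table, links)
        hssa(src, tgt, i, fe, es, j, table, links)

# Toy lexical table (in practice: read from fast_align's conditional probability output).
table = {('the', 'la'): 0.9, ('house', 'maison'): 0.8, ('small', 'petite'): 0.7}
links = []
hssa(['the', 'small', 'house'], ['la', 'petite', 'maison'], 0, 3, 0, 3, table, links)
print(sorted(links))   # expected here: [(0, 0), (1, 1), (2, 2)]
```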
Using Ambiguity Detection to Streamline Linguistic Annotation
Title | Using Ambiguity Detection to Streamline Linguistic Annotation |
Authors | Wajdi Zaghouani, Abdelati Hawwari, Sawsan Alqahtani, Houda Bouamor, Mahmoud Ghoneim, Mona Diab, Kemal Oflazer |
Abstract | Arabic writing is typically underspecified for short vowels and other markup, referred to as diacritics. In addition to the lexical ambiguity exhibited in most languages, the lack of diacritics in written Arabic adds another layer of ambiguity which is an artifact of the orthography. In this paper, we present the details of three experimental annotation conditions designed to study the impact of automatic ambiguity detection on annotation speed and quality in a large-scale annotation project. |
Tasks | Machine Translation, Speech Recognition |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4115/ |
https://www.aclweb.org/anthology/W16-4115 | |
PWC | https://paperswithcode.com/paper/using-ambiguity-detection-to-streamline |
Repo | |
Framework | |
Mixed Linear Regression with Multiple Components
Title | Mixed Linear Regression with Multiple Components |
Authors | Kai Zhong, Prateek Jain, Inderjit S. Dhillon |
Abstract | In this paper, we study the mixed linear regression (MLR) problem, where the goal is to recover multiple underlying linear models from their unlabeled linear measurements. We propose a non-convex objective function which we show is *locally strongly convex* in the neighborhood of the ground truth. We use a tensor method for initialization so that the initial models are in the local strong convexity region. We then employ general convex optimization algorithms to minimize the objective function. To the best of our knowledge, our approach provides the first exact recovery guarantees for the MLR problem with $K \geq 2$ components. Moreover, our method has near-optimal computational complexity $\tilde O (Nd)$ as well as near-optimal sample complexity $\tilde O (d)$ for *constant* $K$. Furthermore, we show that our non-convex formulation can be extended to solve the *subspace clustering* problem as well. In particular, when initialized within a small constant distance of the true subspaces, our method converges to the global optima (and recovers the true subspaces) in time *linear* in the number of points. Furthermore, our empirical results indicate that even with random initialization, our approach converges to the global optima in linear time, providing a speed-up of up to two orders of magnitude. |
Tasks | |
Published | 2016-12-01 |
URL | http://papers.nips.cc/paper/6240-mixed-linear-regression-with-multiple-components |
http://papers.nips.cc/paper/6240-mixed-linear-regression-with-multiple-components.pdf | |
PWC | https://paperswithcode.com/paper/mixed-linear-regression-with-multiple |
Repo | |
Framework | |
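To illustrate the problem setup only, here is a generic alternating-minimization baseline for mixed linear regression: assign each point to the component with the smallest residual, then refit each component by least squares. This is not the paper's tensor-initialized method and, with random initialization, carries no recovery guarantee.

```python
# Generic alternating minimization for mixed linear regression (illustrative baseline only).
import numpy as np

def mlr_altmin(X, y, K, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(size=(K, d))                 # random init (the paper uses a tensor method)
    for _ in range(n_iter):
        # Assignment step: each point goes to the component with the smallest squared residual.
        resid = (X @ W.T - y[:, None]) ** 2     # shape (n, K)
        z = resid.argmin(axis=1)
        # Refit step: ordinary least squares on each component's points.
        for k in range(K):
            idx = z == k
            if idx.sum() >= d:
                W[k] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
    return W

# Toy data from K = 2 ground-truth linear models.
rng = np.random.default_rng(1)
n, d = 1000, 5
X = rng.normal(size=(n, d))
W_true = rng.normal(size=(2, d))
z_true = rng.integers(0, 2, size=n)
y = np.einsum('nd,nd->n', X, W_true[z_true]) + 0.01 * rng.normal(size=n)
print(mlr_altmin(X, y, K=2))
```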
A Comparison Between Morphological Complexity Measures: Typological Data vs. Language Corpora
Title | A Comparison Between Morphological Complexity Measures: Typological Data vs. Language Corpora |
Authors | Christian Bentz, Tatyana Ruzsics, Alexander Koplenig, Tanja Samardžić |
Abstract | Language complexity is an intriguing phenomenon argued to play an important role in both language learning and processing. The need to compare languages with regard to their complexity resulted in a multitude of approaches and methods, ranging from accounts targeting specific structural features to global quantification of variation more generally. In this paper, we investigate the degree to which morphological complexity measures are mutually correlated in a sample of more than 500 languages of 101 language families. We use human expert judgements from the World Atlas of Language Structures (WALS), and compare them to four quantitative measures automatically calculated from language corpora. These consist of three previously defined corpus-derived measures, which are all monolingual, and one new measure based on automatic word-alignment across pairs of languages. We find strong correlations between all the measures, illustrating that both expert judgements and automated approaches converge to similar complexity ratings, and can be used interchangeably. |
Tasks | Machine Translation, Word Alignment |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4117/ |
https://www.aclweb.org/anthology/W16-4117 | |
PWC | https://paperswithcode.com/paper/a-comparison-between-morphological-complexity |
Repo | |
Framework | |
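One commonly used corpus-derived complexity measure in this line of work is unigram word entropy, sketched below. The paper's actual measures and entropy estimators differ in detail; this is only meant to show the general idea of quantifying complexity from raw text.

```python
# Illustrative corpus-derived complexity measure: unigram word entropy (in bits).
import math
from collections import Counter

def word_entropy(tokens):
    counts = Counter(tokens)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

english = "the dog sees the dogs and the dog sees the cat".split()
print(round(word_entropy(english), 3))
```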
Enriching Phrase Tables for Statistical Machine Translation Using Mixed Embeddings
Title | Enriching Phrase Tables for Statistical Machine Translation Using Mixed Embeddings |
Authors | Peyman Passban, Qun Liu, Andy Way |
Abstract | The phrase table is considered to be the main bilingual resource for the phrase-based statistical machine translation (PBSMT) model. During translation, a source sentence is decomposed into several phrases. The best match of each source phrase is selected among several target-side counterparts within the phrase table, and processed by the decoder to generate a sentence-level translation. The best match is chosen according to several factors, including a set of bilingual features. PBSMT engines by default provide four probability scores in phrase tables, which are considered the main set of bilingual features. Our goal is to enrich that set of features, as a better feature set should yield better translations. We propose new scores generated by a Convolutional Neural Network (CNN) which indicate the semantic relatedness of phrase pairs. We evaluate our model in different experimental settings with different language pairs. We observe significant improvements when the proposed features are incorporated into the PBSMT pipeline. |
Tasks | Document Classification, Machine Translation, Word Embeddings |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1243/ |
https://www.aclweb.org/anthology/C16-1243 | |
PWC | https://paperswithcode.com/paper/enriching-phrase-tables-for-statistical |
Repo | |
Framework | |
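The mechanics of "enriching" a phrase table amount to appending an extra score to each entry. In the sketch below, a cosine similarity between averaged phrase embeddings stands in for the paper's CNN-derived relatedness score; the phrase table line follows the common Moses layout, and the toy embedding dictionaries (assumed to live in a shared bilingual space) are hypothetical.

```python
# Stand-in for the paper's CNN score: averaged-embedding cosine similarity appended to a
# Moses-style phrase table line ("src ||| tgt ||| scores ||| ...").
import numpy as np

def phrase_vec(phrase, emb, dim=4):
    vecs = [emb[w] for w in phrase.split() if w in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def relatedness(src, tgt, src_emb, tgt_emb):
    u, v = phrase_vec(src, src_emb), phrase_vec(tgt, tgt_emb)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

def enrich_line(line, src_emb, tgt_emb):
    fields = [f.strip() for f in line.split('|||')]
    score = relatedness(fields[0], fields[1], src_emb, tgt_emb)
    fields[2] = f"{fields[2]} {score:.4f}"       # append the new feature to the score field
    return ' ||| '.join(fields)

src_emb = {'small': np.array([1., 0, 0, 1]), 'house': np.array([0, 1., 1, 0])}
tgt_emb = {'petite': np.array([1., 0, 0, 1]), 'maison': np.array([0, 1., 1, 0])}
print(enrich_line("small house ||| petite maison ||| 0.2 0.3 0.1 0.4", src_emb, tgt_emb))
```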
Learning Transducer Models for Morphological Analysis from Example Inflections
Title | Learning Transducer Models for Morphological Analysis from Example Inflections |
Authors | Markus Forsberg, Mans Hulden |
Abstract | |
Tasks | Morphological Analysis, Morphological Inflection |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-2405/ |
https://www.aclweb.org/anthology/W16-2405 | |
PWC | https://paperswithcode.com/paper/learning-transducer-models-for-morphological |
Repo | |
Framework | |
Nomen Omen. Enhancing the Latin Morphological Analyser Lemlat with an Onomasticon
Title | Nomen Omen. Enhancing the Latin Morphological Analyser Lemlat with an Onomasticon |
Authors | Marco Budassi, Marco Passarotti |
Abstract | |
Tasks | Morphological Analysis, Morphological Inflection, Named Entity Recognition |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-2110/ |
https://www.aclweb.org/anthology/W16-2110 | |
PWC | https://paperswithcode.com/paper/nomen-omen-enhancing-the-latin-morphological |
Repo | |
Framework | |
Testing the Processing Hypothesis of word order variation using a probabilistic language model
Title | Testing the Processing Hypothesis of word order variation using a probabilistic language model |
Authors | Jelke Bloem |
Abstract | This work investigates the application of a measure of surprisal to modeling a grammatical variation phenomenon between near-synonymous constructions. We investigate a particular variation phenomenon, word order variation in Dutch two-verb clusters, where it has been established that word order choice is affected by processing cost. Several multifactorial corpus studies of Dutch verb clusters have used other measures of processing complexity to show that this factor affects word order choice. This previous work allows us to compare the surprisal measure, which is based on constraint satisfaction theories of language modeling, to those previously used measures, which are more directly linked to empirical observations of processing complexity. Our results show that surprisal does not predict the word order choice by itself, but is a significant predictor when used in a measure of uniform information density (UID). This lends support to the view that human language processing is facilitated not so much by predictable sequences of words but more by sequences of words in which information is spread evenly. |
Tasks | Language Modelling |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4120/ |
https://www.aclweb.org/anthology/W16-4120 | |
PWC | https://paperswithcode.com/paper/testing-the-processing-hypothesis-of-word |
Repo | |
Framework | |
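The two quantities compared in the paper can be sketched concretely: per-word surprisal from a language model, and a uniform-information-density (UID) score, operationalized here as the variance of surprisal across the word sequence. The add-alpha bigram model and this particular UID formulation are simplifying assumptions, not the paper's exact setup.

```python
# Sketch of surprisal and a variance-based UID score over toy Dutch verb-cluster orders.
import math
from collections import Counter

def train_bigram(sentences, alpha=0.1):
    vocab = {w for s in sentences for w in s} | {'<s>'}
    bigrams, unigrams = Counter(), Counter()
    for s in sentences:
        prev = '<s>'
        for w in s:
            bigrams[(prev, w)] += 1
            unigrams[prev] += 1
            prev = w
    V = len(vocab)
    def prob(prev, w):  # add-alpha smoothed P(w | prev)
        return (bigrams[(prev, w)] + alpha) / (unigrams[prev] + alpha * V)
    return prob

def surprisals(sentence, prob):
    prev, out = '<s>', []
    for w in sentence:
        out.append(-math.log2(prob(prev, w)))   # surprisal s(w) = -log2 P(w | context)
        prev = w
    return out

def uid_score(sentence, prob):
    s = surprisals(sentence, prob)
    mean = sum(s) / len(s)
    return sum((x - mean) ** 2 for x in s) / len(s)   # lower variance = more uniform density

corpus = [['dat', 'hij', 'heeft', 'gezien'], ['dat', 'hij', 'gezien', 'heeft']]
prob = train_bigram(corpus)
for order in corpus:
    print(order, [round(x, 2) for x in surprisals(order, prob)], round(uid_score(order, prob), 3))
```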
Crossmodal Network-Based Distributional Semantic Models
Title | Crossmodal Network-Based Distributional Semantic Models |
Authors | Elias Iosif, Alexandros Potamianos |
Abstract | Despite the recent success of distributional semantic models (DSMs) in various semantic tasks, they remain disconnected from real-world perceptual cues since they typically rely on linguistic features. Text data constitute the dominant source of features for the majority of such models, although there is evidence from cognitive science that cues from other modalities contribute to the acquisition and representation of semantic knowledge. In this work, we propose the crossmodal extension of a two-tier text-based model, where semantic representations are encoded in the first layer, while the second layer is used for computing similarity between words. We exploit text- and image-derived features for performing computations at each layer, as well as various approaches for their crossmodal fusion. It is shown that the crossmodal model performs better (from 0.68 to 0.71 correlation coefficient) than the unimodal one for the task of similarity computation between words. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1627/ |
https://www.aclweb.org/anthology/L16-1627 | |
PWC | https://paperswithcode.com/paper/crossmodal-network-based-distributional |
Repo | |
Framework | |
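A minimal late-fusion sketch of the crossmodal idea: word similarity is computed separately from text-derived and image-derived vectors and then combined with a weighted average. The paper's two-tier network-based model and its fusion schemes are more elaborate; the vectors and the fusion weight here are toy assumptions.

```python
# Minimal late fusion of text- and image-based word similarities (toy data).
import numpy as np

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

def crossmodal_similarity(w1, w2, text_vecs, image_vecs, weight=0.5):
    sim_text = cosine(text_vecs[w1], text_vecs[w2])
    sim_image = cosine(image_vecs[w1], image_vecs[w2])
    return weight * sim_text + (1 - weight) * sim_image

text_vecs = {'cat': np.array([0.9, 0.1, 0.3]), 'dog': np.array([0.8, 0.2, 0.4])}
image_vecs = {'cat': np.array([0.2, 0.7]), 'dog': np.array([0.3, 0.6])}
print(round(crossmodal_similarity('cat', 'dog', text_vecs, image_vecs), 3))
```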
Coreference in Wikipedia: Main Concept Resolution
Title | Coreference in Wikipedia: Main Concept Resolution |
Authors | Abbas Ghaddar, Phillippe Langlais |
Abstract | |
Tasks | Coreference Resolution, Open Information Extraction |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/K16-1023/ |
https://www.aclweb.org/anthology/K16-1023 | |
PWC | https://paperswithcode.com/paper/coreference-in-wikipedia-main-concept |
Repo | |
Framework | |
Temporal Lobes as Combinatory Engines for both Form and Meaning
Title | Temporal Lobes as Combinatory Engines for both Form and Meaning |
Authors | Jixing Li, Jonathan Brennan, Adam Mahar, John Hale |
Abstract | The relative contributions of meaning and form to sentence processing remain an outstanding issue across the language sciences. We examine this issue by formalizing four incremental complexity metrics and comparing them against freely-available ROI timecourses. Syntax-related metrics based on top-down parsing and structural dependency-distance turn out to significantly improve a regression model, compared to a simpler model that formalizes only conceptual combination using a distributional vector-space model. This confirms the view of the anterior temporal lobes as combinatory engines that deal in both form (see e.g. Brennan et al., 2012; Mazoyer, 1993) and meaning (see e.g., Patterson et al., 2007). This same characterization applies to a posterior temporal region in roughly "Wernicke's Area." |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4121/ |
https://www.aclweb.org/anthology/W16-4121 | |
PWC | https://paperswithcode.com/paper/temporal-lobes-as-combinatory-engines-for |
Repo | |
Framework | |
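The statistical claim behind the abstract is a nested-model comparison: does adding syntax-related predictors significantly improve a regression fit of an ROI timecourse over a semantics-only baseline? The sketch below shows one standard way to run such a comparison (an F-test between nested OLS fits); the data and predictor names are synthetic placeholders, not the paper's data or exact analysis.

```python
# Sketch of a nested regression comparison on synthetic "ROI timecourse" data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 300
semantic = rng.normal(size=n)              # e.g. distributional "conceptual combination" metric
syntax = rng.normal(size=(n, 2))           # e.g. top-down parser steps, dependency distance
bold = 0.5 * semantic + 0.3 * syntax[:, 0] + rng.normal(size=n)   # simulated ROI signal

base = sm.OLS(bold, sm.add_constant(semantic)).fit()
full = sm.OLS(bold, sm.add_constant(np.column_stack([semantic, syntax]))).fit()

# F-test for the restriction "syntax coefficients are zero" (nested-model comparison).
f_stat, p_value, df_diff = full.compare_f_test(base)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}, df diff = {df_diff}")
```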