Paper Group NANR 209
Implicit readability ranking using the latent variable of a Bayesian Probit model. CTAP: A Web-Based Tool Supporting Automatic Complexity Analysis. Graph- and surface-level sentence chunking. Hit Songs’ Sentiments Harness Public Mood & Predict Stock Market. Combining fast_align with Hierarchical Sub-sentential Alignment for Better Word Alignments …
Implicit readability ranking using the latent variable of a Bayesian Probit model
Title | Implicit readability ranking using the latent variable of a Bayesian Probit model |
Authors | Johan Falkenjack, Arne Jönsson |
Abstract | Data-driven approaches to readability analysis for languages other than English have been plagued by a scarcity of suitable corpora. Often, relevant corpora consist only of easy-to-read texts with no rank information or empirical readability scores, making only binary approaches, such as classification, applicable. We propose a Bayesian latent-variable approach to get the most out of these kinds of corpora. In this paper we present results on using such a model for readability ranking. The model is evaluated on a preliminary corpus of ranked student texts with encouraging results. We also assess the model by showing that it performs readability classification on par with a state-of-the-art classifier while at the same time being transparent enough to allow more sophisticated interpretations. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4112/ |
https://www.aclweb.org/anthology/W16-4112 | |
PWC | https://paperswithcode.com/paper/implicit-readability-ranking-using-the-latent |
Repo | |
Framework | |
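The core idea of the paper is that a binary easy/hard label can be modeled as the thresholding of a latent readability score, and that this latent score can then be reused for ranking. A minimal sketch of that idea follows; it is not the authors' Bayesian model (here a plain maximum-likelihood probit via statsmodels stands in for it), and the three document features are hypothetical placeholders.

```python
# Minimal sketch (not the paper's exact Bayesian model): fit a probit classifier on binary
# easy-to-read labels, then reuse the latent linear predictor as an implicit readability rank.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Hypothetical document features, e.g. mean sentence length, mean word length, rare-word ratio.
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, 0.8, -0.5])
latent = X @ true_w + rng.normal(size=200)      # latent "difficulty" score
y = (latent > 0).astype(int)                    # observed binary label: 1 = hard, 0 = easy

probit = sm.Probit(y, sm.add_constant(X)).fit(disp=False)

# Implicit ranking: order unseen texts by the value of the latent linear predictor.
X_new = rng.normal(size=(5, 3))
scores = sm.add_constant(X_new, has_constant='add') @ probit.params
ranking = np.argsort(scores)                    # ascending: easiest first under this sketch
print(ranking, scores)
```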
CTAP: A Web-Based Tool Supporting Automatic Complexity Analysis
Title | CTAP: A Web-Based Tool Supporting Automatic Complexity Analysis |
Authors | Xiaobin Chen, Detmar Meurers |
Abstract | Informed by research on readability and language acquisition, computational linguists have developed sophisticated tools for the analysis of linguistic complexity. While some tools are starting to become accessible on the web, there still is a disconnect between the features that can in principle be identified based on state-of-the-art computational linguistic analysis, and the analyses a second language acquisition researcher, teacher, or textbook writer can readily obtain and visualize for their own collection of texts. This short paper presents a web-based tool development that aims to meet this challenge. The Common Text Analysis Platform (CTAP) is designed to support fully configurable linguistic feature extraction for a wide range of complexity analyses. It features a user-friendly interface, modularized and reusable analysis component integration, and flexible corpus and feature management. Building on the Unstructured Information Management framework (UIMA), CTAP readily supports integration of state-of-the-art NLP and complexity feature extraction maintaining modularization and reusability. CTAP thereby aims at providing a common platform for complexity analysis, encouraging research collaboration and sharing of feature extraction components, to jointly advance the state-of-the-art in complexity analysis in a form that readily supports real-life use by ordinary users. |
Tasks | Language Acquisition |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4113/ |
https://www.aclweb.org/anthology/W16-4113 | |
PWC | https://paperswithcode.com/paper/ctap-a-web-based-tool-supporting-automatic |
Repo | |
Framework | |
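To make the notion of "complexity feature extraction" concrete, here is an illustrative sketch of a few generic surface features of the kind such a platform exposes. This is not CTAP's API or its actual feature set, just a stand-alone example under that assumption.

```python
# Illustrative only: a few generic surface complexity features; not CTAP's components.
import re

def complexity_features(text: str) -> dict:
    sentences = [s for s in re.split(r'[.!?]+', text) if s.strip()]
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    types = set(tokens)
    return {
        'n_sentences': len(sentences),
        'n_tokens': len(tokens),
        'mean_sentence_length': len(tokens) / max(len(sentences), 1),
        'type_token_ratio': len(types) / max(len(tokens), 1),
        'mean_word_length': sum(len(t) for t in tokens) / max(len(tokens), 1),
    }

print(complexity_features("This is a short text. It has two sentences."))
```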
Graph- and surface-level sentence chunking
Title | Graph- and surface-level sentence chunking |
Authors | Ewa Muszyńska |
Abstract | |
Tasks | Chunking |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/P16-3014/ |
https://www.aclweb.org/anthology/P16-3014 | |
PWC | https://paperswithcode.com/paper/graph-and-surface-level-sentence-chunking |
Repo | |
Framework | |
Hit Songs’ Sentiments Harness Public Mood & Predict Stock Market
Title | Hit Songs’ Sentiments Harness Public Mood & Predict Stock Market |
Authors | Rachel Harsley, Bhavesh Gupta, Barbara Di Eugenio, Huayi Li |
Abstract | |
Tasks | Sentiment Analysis |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/W16-0406/ |
https://www.aclweb.org/anthology/W16-0406 | |
PWC | https://paperswithcode.com/paper/hit-songsa-sentiments-harness-public-mood |
Repo | |
Framework | |
Combining fast_align with Hierarchical Sub-sentential Alignment for Better Word Alignments
Title | Combining fast_align with Hierarchical Sub-sentential Alignment for Better Word Alignments |
Authors | Hao Wang, Yves Lepage |
Abstract | fast_align is a simple and fast word alignment tool which is widely used in state-of-the-art machine translation systems. It yields comparable results in end-to-end translation experiments for various language pairs. However, fast_align does not perform as well as GIZA++ when applied to language pairs with distinct word orders, like English and Japanese. In this paper, given the lexical translation table output by fast_align, we propose to realign words using the hierarchical sub-sentential alignment approach. Experimental results show that this simple additional processing improves the performance of word alignment, measured by counting alignment matches in comparison with fast_align. We also report final machine translation results for both English-Japanese and Japanese-English, and show that our best system provides significant improvements over the baseline as measured by BLEU and RIBES. |
Tasks | Machine Translation, Word Alignment |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4501/ |
https://www.aclweb.org/anthology/W16-4501 | |
PWC | https://paperswithcode.com/paper/combining-fast_align-with-hierarchical-sub |
Repo | |
Framework | |
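The realignment idea is recursive: split a source/target span pair into two sub-blocks, either in "straight" or "inverted" orientation, choosing the split that looks best under fast_align's lexical translation probabilities, and recurse into the sub-blocks. The sketch below illustrates that scheme; the block score (mean lexical probability) and the stopping rule are simplifying assumptions, not the paper's exact formulation.

```python
# Simplified hierarchical sub-sentential alignment over a fast_align-style lexical table.
from itertools import product

def block_score(src, tgt, fs, fe, es, ee, table, floor=1e-6):
    """Mean lexical probability t(f|e) over the block src[fs:fe] x tgt[es:ee]."""
    cells = list(product(range(fs, fe), range(es, ee)))
    return sum(table.get((src[i], tgt[j]), floor) for i, j in cells) / len(cells)

def hssa(src, tgt, fs, fe, es, ee, table, links):
    """Recursively bisect the block into straight or inverted sub-blocks."""
    if fe - fs <= 1 or ee - es <= 1:
        links.extend(product(range(fs, fe), range(es, ee)))   # minimal block: link all-to-all
        return
    best = None
    for i in range(fs + 1, fe):
        for j in range(es + 1, ee):
            straight = (block_score(src, tgt, fs, i, es, j, table) +
                        block_score(src, tgt, i, fe, j, ee, table))
            inverted = (block_score(src, tgt, fs, i, j, ee, table) +
                        block_score(src, tgt, i, fe, es, j, table))
            for score, orient in ((straight, 'S'), (inverted, 'I')):
                if best is None or score > best[0]:
                    best = (score, orient, i, j)
    _, orient, i, j = best
    if orient == 'S':
        hssa(src, tgt, fs, i, es, j, table, links)
        hssa(src, tgt, i, fe, j, ee, table, links)
    else:
        hssa(src, tgt, fs, i, j, ee, table, links)
        hssa(src, tgt, i, fe, es, j, table, links)

# Toy lexical table (in practice: read from fast_align's conditional probability output).
table = {('the', 'la'): 0.9, ('house', 'maison'): 0.8, ('small', 'petite'): 0.7}
links = []
hssa(['the', 'small', 'house'], ['la', 'petite', 'maison'], 0, 3, 0, 3, table, links)
print(sorted(links))   # expected here: [(0, 0), (1, 1), (2, 2)]
```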
Using Ambiguity Detection to Streamline Linguistic Annotation
Title | Using Ambiguity Detection to Streamline Linguistic Annotation |
Authors | Wajdi Zaghouani, Abdelati Hawwari, Sawsan Alqahtani, Houda Bouamor, Mahmoud Ghoneim, Mona Diab, Kemal Oflazer |
Abstract | Arabic writing is typically underspecified for short vowels and other markup, referred to as diacritics. In addition to the lexical ambiguity exhibited in most languages, the lack of diacritics in written Arabic adds another layer of ambiguity which is an artifact of the orthography. In this paper, we present the details of three experimental annotation conditions designed to study the impact of automatic ambiguity detection on annotation speed and quality in a large-scale annotation project. |
Tasks | Machine Translation, Speech Recognition |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4115/ |
https://www.aclweb.org/anthology/W16-4115 | |
PWC | https://paperswithcode.com/paper/using-ambiguity-detection-to-streamline |
Repo | |
Framework | |
Mixed Linear Regression with Multiple Components
Title | Mixed Linear Regression with Multiple Components |
Authors | Kai Zhong, Prateek Jain, Inderjit S. Dhillon |
Abstract | In this paper, we study the mixed linear regression (MLR) problem, where the goal is to recover multiple underlying linear models from their unlabeled linear measurements. We propose a non-convex objective function which we show is *locally strongly convex* in the neighborhood of the ground truth. We use a tensor method for initialization so that the initial models are in the local strong convexity region. We then employ general convex optimization algorithms to minimize the objective function. To the best of our knowledge, our approach provides the first exact recovery guarantees for the MLR problem with $K \geq 2$ components. Moreover, our method has near-optimal computational complexity $\tilde O (Nd)$ as well as near-optimal sample complexity $\tilde O (d)$ for *constant* $K$. Furthermore, we show that our non-convex formulation can be extended to solve the *subspace clustering* problem as well. In particular, when initialized within a small constant distance of the true subspaces, our method converges to the global optima (and recovers the true subspaces) in time *linear* in the number of points. Furthermore, our empirical results indicate that even with random initialization, our approach converges to the global optima in linear time, providing a speed-up of up to two orders of magnitude. |
Tasks | |
Published | 2016-12-01 |
URL | http://papers.nips.cc/paper/6240-mixed-linear-regression-with-multiple-components |
http://papers.nips.cc/paper/6240-mixed-linear-regression-with-multiple-components.pdf | |
PWC | https://paperswithcode.com/paper/mixed-linear-regression-with-multiple |
Repo | |
Framework | |
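To illustrate the problem setup only, here is a generic alternating-minimization baseline for mixed linear regression: assign each point to the component with the smallest residual, then refit each component by least squares. This is not the paper's tensor-initialized method and, with random initialization, carries no recovery guarantee.

```python
# Generic alternating minimization for mixed linear regression (illustrative baseline only).
import numpy as np

def mlr_altmin(X, y, K, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(size=(K, d))                 # random init (the paper uses a tensor method)
    for _ in range(n_iter):
        # Assignment step: each point goes to the component with the smallest squared residual.
        resid = (X @ W.T - y[:, None]) ** 2     # shape (n, K)
        z = resid.argmin(axis=1)
        # Refit step: ordinary least squares on each component's points.
        for k in range(K):
            idx = z == k
            if idx.sum() >= d:
                W[k] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
    return W

# Toy data from K = 2 ground-truth linear models.
rng = np.random.default_rng(1)
n, d = 1000, 5
X = rng.normal(size=(n, d))
W_true = rng.normal(size=(2, d))
z_true = rng.integers(0, 2, size=n)
y = np.einsum('nd,nd->n', X, W_true[z_true]) + 0.01 * rng.normal(size=n)
print(mlr_altmin(X, y, K=2))
```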
A Comparison Between Morphological Complexity Measures: Typological Data vs. Language Corpora
Title | A Comparison Between Morphological Complexity Measures: Typological Data vs. Language Corpora |
Authors | Christian Bentz, Tatyana Ruzsics, Alexander Koplenig, Tanja Samardžić |
Abstract | Language complexity is an intriguing phenomenon argued to play an important role in both language learning and processing. The need to compare languages with regard to their complexity resulted in a multitude of approaches and methods, ranging from accounts targeting specific structural features to global quantification of variation more generally. In this paper, we investigate the degree to which morphological complexity measures are mutually correlated in a sample of more than 500 languages of 101 language families. We use human expert judgements from the World Atlas of Language Structures (WALS), and compare them to four quantitative measures automatically calculated from language corpora. These consist of three previously defined corpus-derived measures, which are all monolingual, and one new measure based on automatic word-alignment across pairs of languages. We find strong correlations between all the measures, illustrating that both expert judgements and automated approaches converge to similar complexity ratings, and can be used interchangeably. |
Tasks | Machine Translation, Word Alignment |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4117/ |
https://www.aclweb.org/anthology/W16-4117 | |
PWC | https://paperswithcode.com/paper/a-comparison-between-morphological-complexity |
Repo | |
Framework | |
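One commonly used corpus-derived complexity measure in this line of work is unigram word entropy, sketched below. The paper's actual measures and entropy estimators differ in detail; this is only meant to show the general idea of quantifying complexity from raw text.

```python
# Illustrative corpus-derived complexity measure: unigram word entropy (in bits).
import math
from collections import Counter

def word_entropy(tokens):
    counts = Counter(tokens)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

english = "the dog sees the dogs and the dog sees the cat".split()
print(round(word_entropy(english), 3))
```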
Enriching Phrase Tables for Statistical Machine Translation Using Mixed Embeddings
Title | Enriching Phrase Tables for Statistical Machine Translation Using Mixed Embeddings |
Authors | Peyman Passban, Qun Liu, Andy Way |
Abstract | The phrase table is considered to be the main bilingual resource for the phrase-based statistical machine translation (PBSMT) model. During translation, a source sentence is decomposed into several phrases. The best match of each source phrase is selected among several target-side counterparts within the phrase table, and processed by the decoder to generate a sentence-level translation. The best match is chosen according to several factors, including a set of bilingual features. PBSMT engines by default provide four probability scores in phrase tables, which are considered the main set of bilingual features. Our goal is to enrich that set of features, as a better feature set should yield better translations. We propose new scores generated by a Convolutional Neural Network (CNN) which indicate the semantic relatedness of phrase pairs. We evaluate our model in different experimental settings with different language pairs. We observe significant improvements when the proposed features are incorporated into the PBSMT pipeline. |
Tasks | Document Classification, Machine Translation, Word Embeddings |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1243/ |
https://www.aclweb.org/anthology/C16-1243 | |
PWC | https://paperswithcode.com/paper/enriching-phrase-tables-for-statistical |
Repo | |
Framework | |
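The mechanics of "enriching" a phrase table amount to appending an extra score to each entry. In the sketch below, a cosine similarity between averaged phrase embeddings stands in for the paper's CNN-derived relatedness score; the phrase table line follows the common Moses layout, and the toy embedding dictionaries (assumed to live in a shared bilingual space) are hypothetical.

```python
# Stand-in for the paper's CNN score: averaged-embedding cosine similarity appended to a
# Moses-style phrase table line ("src ||| tgt ||| scores ||| ...").
import numpy as np

def phrase_vec(phrase, emb, dim=4):
    vecs = [emb[w] for w in phrase.split() if w in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def relatedness(src, tgt, src_emb, tgt_emb):
    u, v = phrase_vec(src, src_emb), phrase_vec(tgt, tgt_emb)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

def enrich_line(line, src_emb, tgt_emb):
    fields = [f.strip() for f in line.split('|||')]
    score = relatedness(fields[0], fields[1], src_emb, tgt_emb)
    fields[2] = f"{fields[2]} {score:.4f}"       # append the new feature to the score field
    return ' ||| '.join(fields)

src_emb = {'small': np.array([1., 0, 0, 1]), 'house': np.array([0, 1., 1, 0])}
tgt_emb = {'petite': np.array([1., 0, 0, 1]), 'maison': np.array([0, 1., 1, 0])}
print(enrich_line("small house ||| petite maison ||| 0.2 0.3 0.1 0.4", src_emb, tgt_emb))
```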
Learning Transducer Models for Morphological Analysis from Example Inflections
Title | Learning Transducer Models for Morphological Analysis from Example Inflections |
Authors | Markus Forsberg, Mans Hulden |
Abstract | |
Tasks | Morphological Analysis, Morphological Inflection |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-2405/ |
https://www.aclweb.org/anthology/W16-2405 | |
PWC | https://paperswithcode.com/paper/learning-transducer-models-for-morphological |
Repo | |
Framework | |
Nomen Omen. Enhancing the Latin Morphological Analyser Lemlat with an Onomasticon
Title | Nomen Omen. Enhancing the Latin Morphological Analyser Lemlat with an Onomasticon |
Authors | Marco Budassi, Marco Passarotti |
Abstract | |
Tasks | Morphological Analysis, Morphological Inflection, Named Entity Recognition |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-2110/ |
https://www.aclweb.org/anthology/W16-2110 | |
PWC | https://paperswithcode.com/paper/nomen-omen-enhancing-the-latin-morphological |
Repo | |
Framework | |
Testing the Processing Hypothesis of word order variation using a probabilistic language model
Title | Testing the Processing Hypothesis of word order variation using a probabilistic language model |
Authors | Jelke Bloem |
Abstract | This work investigates the application of a measure of surprisal to modeling a grammatical variation phenomenon between near-synonymous constructions. We investigate a particular variation phenomenon, word order variation in Dutch two-verb clusters, where it has been established that word order choice is affected by processing cost. Several multifactorial corpus studies of Dutch verb clusters have used other measures of processing complexity to show that this factor affects word order choice. This previous work allows us to compare the surprisal measure, which is based on constraint satisfaction theories of language modeling, to those previously used measures, which are more directly linked to empirical observations of processing complexity. Our results show that surprisal does not predict the word order choice by itself, but is a significant predictor when used in a measure of uniform information density (UID). This lends support to the view that human language processing is facilitated not so much by predictable sequences of words but more by sequences of words in which information is spread evenly. |
Tasks | Language Modelling |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4120/ |
https://www.aclweb.org/anthology/W16-4120 | |
PWC | https://paperswithcode.com/paper/testing-the-processing-hypothesis-of-word |
Repo | |
Framework | |
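The two quantities compared in the paper can be sketched concretely: per-word surprisal from a language model, and a uniform-information-density (UID) score, operationalized here as the variance of surprisal across the word sequence. The add-alpha bigram model and this particular UID formulation are simplifying assumptions, not the paper's exact setup.

```python
# Sketch of surprisal and a variance-based UID score over toy Dutch verb-cluster orders.
import math
from collections import Counter

def train_bigram(sentences, alpha=0.1):
    vocab = {w for s in sentences for w in s} | {'<s>'}
    bigrams, unigrams = Counter(), Counter()
    for s in sentences:
        prev = '<s>'
        for w in s:
            bigrams[(prev, w)] += 1
            unigrams[prev] += 1
            prev = w
    V = len(vocab)
    def prob(prev, w):  # add-alpha smoothed P(w | prev)
        return (bigrams[(prev, w)] + alpha) / (unigrams[prev] + alpha * V)
    return prob

def surprisals(sentence, prob):
    prev, out = '<s>', []
    for w in sentence:
        out.append(-math.log2(prob(prev, w)))   # surprisal s(w) = -log2 P(w | context)
        prev = w
    return out

def uid_score(sentence, prob):
    s = surprisals(sentence, prob)
    mean = sum(s) / len(s)
    return sum((x - mean) ** 2 for x in s) / len(s)   # lower variance = more uniform density

corpus = [['dat', 'hij', 'heeft', 'gezien'], ['dat', 'hij', 'gezien', 'heeft']]
prob = train_bigram(corpus)
for order in corpus:
    print(order, [round(x, 2) for x in surprisals(order, prob)], round(uid_score(order, prob), 3))
```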
Crossmodal Network-Based Distributional Semantic Models
Title | Crossmodal Network-Based Distributional Semantic Models |
Authors | Elias Iosif, Alexandros Potamianos |
Abstract | Despite the recent success of distributional semantic models (DSMs) in various semantic tasks, they remain disconnected from real-world perceptual cues since they typically rely on linguistic features. Text data constitute the dominant source of features for the majority of such models, although there is evidence from cognitive science that cues from other modalities contribute to the acquisition and representation of semantic knowledge. In this work, we propose the crossmodal extension of a two-tier text-based model, where semantic representations are encoded in the first layer, while the second layer is used for computing similarity between words. We exploit text- and image-derived features for performing computations at each layer, as well as various approaches for their crossmodal fusion. It is shown that the crossmodal model performs better (from 0.68 to 0.71 correlation coefficient) than the unimodal one for the task of similarity computation between words. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1627/ |
https://www.aclweb.org/anthology/L16-1627 | |
PWC | https://paperswithcode.com/paper/crossmodal-network-based-distributional |
Repo | |
Framework | |
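A minimal late-fusion sketch of the crossmodal idea: word similarity is computed separately from text-derived and image-derived vectors and then combined with a weighted average. The paper's two-tier network-based model and its fusion schemes are more elaborate; the vectors and the fusion weight here are toy assumptions.

```python
# Minimal late fusion of text- and image-based word similarities (toy data).
import numpy as np

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

def crossmodal_similarity(w1, w2, text_vecs, image_vecs, weight=0.5):
    sim_text = cosine(text_vecs[w1], text_vecs[w2])
    sim_image = cosine(image_vecs[w1], image_vecs[w2])
    return weight * sim_text + (1 - weight) * sim_image

text_vecs = {'cat': np.array([0.9, 0.1, 0.3]), 'dog': np.array([0.8, 0.2, 0.4])}
image_vecs = {'cat': np.array([0.2, 0.7]), 'dog': np.array([0.3, 0.6])}
print(round(crossmodal_similarity('cat', 'dog', text_vecs, image_vecs), 3))
```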
Coreference in Wikipedia: Main Concept Resolution
Title | Coreference in Wikipedia: Main Concept Resolution |
Authors | Abbas Ghaddar, Phillippe Langlais |
Abstract | |
Tasks | Coreference Resolution, Open Information Extraction |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/K16-1023/ |
https://www.aclweb.org/anthology/K16-1023 | |
PWC | https://paperswithcode.com/paper/coreference-in-wikipedia-main-concept |
Repo | |
Framework | |
Temporal Lobes as Combinatory Engines for both Form and Meaning
Title | Temporal Lobes as Combinatory Engines for both Form and Meaning |
Authors | Jixing Li, Jonathan Brennan, Adam Mahar, John Hale |
Abstract | The relative contributions of meaning and form to sentence processing remain an outstanding issue across the language sciences. We examine this issue by formalizing four incremental complexity metrics and comparing them against freely-available ROI timecourses. Syntax-related metrics based on top-down parsing and structural dependency-distance turn out to significantly improve a regression model, compared to a simpler model that formalizes only conceptual combination using a distributional vector-space model. This confirms the view of the anterior temporal lobes as combinatory engines that deal in both form (see e.g. Brennan et al., 2012; Mazoyer, 1993) and meaning (see e.g., Patterson et al., 2007). This same characterization applies to a posterior temporal region in roughly "Wernicke's Area." |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4121/ |
https://www.aclweb.org/anthology/W16-4121 | |
PWC | https://paperswithcode.com/paper/temporal-lobes-as-combinatory-engines-for |
Repo | |
Framework | |
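The statistical claim behind the abstract is a nested-model comparison: does adding syntax-related predictors significantly improve a regression fit of an ROI timecourse over a semantics-only baseline? The sketch below shows one standard way to run such a comparison (an F-test between nested OLS fits); the data and predictor names are synthetic placeholders, not the paper's data or exact analysis.

```python
# Sketch of a nested regression comparison on synthetic "ROI timecourse" data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 300
semantic = rng.normal(size=n)              # e.g. distributional "conceptual combination" metric
syntax = rng.normal(size=(n, 2))           # e.g. top-down parser steps, dependency distance
bold = 0.5 * semantic + 0.3 * syntax[:, 0] + rng.normal(size=n)   # simulated ROI signal

base = sm.OLS(bold, sm.add_constant(semantic)).fit()
full = sm.OLS(bold, sm.add_constant(np.column_stack([semantic, syntax]))).fit()

# F-test for the restriction "syntax coefficients are zero" (nested-model comparison).
f_stat, p_value, df_diff = full.compare_f_test(base)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}, df diff = {df_diff}")
```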