May 4, 2019

1618 words 8 mins read

Paper Group NANR 208



Tagging Ingush - Language Technology For Low-Resource Languages Using Resources From Linguistic Field Work

Title Tagging Ingush - Language Technology For Low-Resource Languages Using Resources From Linguistic Field Work
Authors Jörg Tiedemann, Johanna Nichols, Ronald Sprouse
Abstract This paper presents on-going work on creating NLP tools for under-resourced languages from very sparse training data coming from linguistic field work. In this work, we focus on Ingush, a Nakh-Daghestanian language spoken by about 300,000 people in the Russian republics Ingushetia and Chechnya. We present work on morphosyntactic taggers trained on transcribed and linguistically analyzed recordings and dependency parsers using English glosses to project annotation for creating synthetic treebanks. Our preliminary results are promising, supporting the goal of bootstrapping efficient NLP tools with limited or no task-specific annotated data resources available.
Tasks Cross-Lingual Transfer
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4020/
PDF https://www.aclweb.org/anthology/W16-4020
PWC https://paperswithcode.com/paper/tagging-ingush-language-technology-for-low
Repo
Framework
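The abstract above mentions projecting annotation from English glosses to build synthetic treebanks. As a minimal sketch of the general idea of annotation projection (not the paper's actual pipeline), the toy function below copies POS tags from aligned gloss tokens onto target-language tokens; the function name, the `(src_idx, tgt_idx)` alignment format, and the `"X"` fallback tag are all hypothetical choices for illustration:

```python
def project_tags(src_tags, alignment, tgt_len, default="X"):
    """Copy tags from source (gloss) tokens to aligned target tokens.

    alignment: list of (src_idx, tgt_idx) word-alignment pairs.
    Unaligned target tokens keep the default tag.
    """
    tgt_tags = [default] * tgt_len
    for s, t in alignment:
        tgt_tags[t] = src_tags[s]
    return tgt_tags
```

In practice, projected tags like these would serve as noisy training data for a tagger or parser in the low-resource language.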

Unified Methods for Exploiting Piecewise Linear Structure in Convex Optimization

Title Unified Methods for Exploiting Piecewise Linear Structure in Convex Optimization
Authors Tyler B. Johnson, Carlos Guestrin
Abstract We develop methods for rapidly identifying important components of a convex optimization problem for the purpose of achieving fast convergence times. By considering a novel problem formulation—the minimization of a sum of piecewise functions—we describe a principled and general mechanism for exploiting piecewise linear structure in convex optimization. This result leads to a theoretically justified working set algorithm and a novel screening test, which generalize and improve upon many prior results on exploiting structure in convex optimization. In empirical comparisons, we study the scalability of our methods. We find that screening scales surprisingly poorly with the size of the problem, while our working set algorithm convincingly outperforms alternative approaches.
Tasks
Published 2016-12-01
URL http://papers.nips.cc/paper/6043-unified-methods-for-exploiting-piecewise-linear-structure-in-convex-optimization
PDF http://papers.nips.cc/paper/6043-unified-methods-for-exploiting-piecewise-linear-structure-in-convex-optimization.pdf
PWC https://paperswithcode.com/paper/unified-methods-for-exploiting-piecewise
Repo
Framework
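The working-set idea in the abstract above can be illustrated on a toy 1-D problem. This is a hypothetical sketch, not the paper's algorithm: it minimizes a quadratic plus a sum of hinge (piecewise linear) terms by repeatedly solving a restricted problem over a working set and adding any hinge terms found active at the current solution, so that many inactive pieces are never touched. All names and the ternary-search solver are illustrative assumptions:

```python
def restricted_obj(x, c, hinges, W):
    """Quadratic anchor plus only the hinge terms in the working set W."""
    val = 0.5 * (x - c) ** 2
    for i in W:
        a, b = hinges[i]
        val += max(0.0, a * x - b)
    return val

def argmin_1d(f, lo=-100.0, hi=100.0, iters=200):
    """Ternary search; valid here because the objective is convex in x."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

def working_set_min(c, hinges, tol=1e-6):
    """Minimize 0.5*(x-c)^2 + sum_i max(0, a_i*x - b_i) via a working set."""
    W = set()
    while True:
        x = argmin_1d(lambda x: restricted_obj(x, c, hinges, W))
        # screening-style check: hinge terms active at x must join W
        violated = {i for i, (a, b) in enumerate(hinges)
                    if i not in W and a * x - b > tol}
        if not violated:
            return x, W
        W |= violated
```

For `c = 3` and hinges `(x - 2)` and `(x - 10)`, the loop converges with only the first hinge in the working set; the second piece is never activated, which is the source of the speedup the paper formalizes.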

Neural Enquirer: Learning to Query Tables in Natural Language

Title Neural Enquirer: Learning to Query Tables in Natural Language
Authors Pengcheng Yin, Zhengdong Lu, Hang Li, Kao Ben
Abstract
Tasks Learning to Execute, Question Answering, Semantic Parsing
Published 2016-06-01
URL https://www.aclweb.org/anthology/W16-0105/
PDF https://www.aclweb.org/anthology/W16-0105
PWC https://paperswithcode.com/paper/neural-enquirer-learning-to-query-tables-in
Repo
Framework

Using Confusion Graphs to Understand Classifier Error

Title Using Confusion Graphs to Understand Classifier Error
Authors Davis Yoshida, Jordan Boyd-Graber
Abstract
Tasks Question Answering
Published 2016-06-01
URL https://www.aclweb.org/anthology/W16-0108/
PDF https://www.aclweb.org/anthology/W16-0108
PWC https://paperswithcode.com/paper/using-confusion-graphs-to-understand
Repo
Framework

Could Machine Learning Shed Light on Natural Language Complexity?

Title Could Machine Learning Shed Light on Natural Language Complexity?
Authors María Dolores Jiménez-López, Leonor Becerra-Bonache
Abstract In this paper, we propose to use a subfield of machine learning – grammatical inference – to measure linguistic complexity from a developmental point of view. We focus on relative complexity by considering a child learner in the process of first language acquisition. The relevance of grammatical inference models for measuring linguistic complexity from a developmental point of view is based on the fact that algorithms proposed in this area can be considered computational models for studying first language acquisition. Even though it is possible to use different techniques from the field of machine learning as computational models for dealing with linguistic complexity (since in any model we have algorithms that can learn from data), we claim that grammatical inference models offer some advantages over other tools.
Tasks Language Acquisition
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4101/
PDF https://www.aclweb.org/anthology/W16-4101
PWC https://paperswithcode.com/paper/could-machine-learning-shed-light-on-natural
Repo
Framework

Web services and data mining: combining linguistic tools for Polish with an analytical platform

Title Web services and data mining: combining linguistic tools for Polish with an analytical platform
Authors Maciej Ogrodniczuk
Abstract In this paper we present a new combination of existing language tools for Polish with a popular data mining platform intended to help researchers from digital humanities perform computational analyses without any programming. The toolset includes RapidMiner Studio, a software solution offering graphical setup of integrated analytical processes and Multiservice, a Web service offering access to several state-of-the-art linguistic tools for Polish. The setting is verified in a simple task of counting frequencies of unknown words in a small corpus.
Tasks
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4025/
PDF https://www.aclweb.org/anthology/W16-4025
PWC https://paperswithcode.com/paper/web-services-and-data-mining-combining
Repo
Framework
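The verification task in the abstract above, counting frequencies of unknown words, is simple enough to sketch. The snippet below is a hypothetical stand-in, not the RapidMiner/Multiservice setup: the `LEXICON` set merely imitates the "known word" judgments that the paper obtains from Polish linguistic web services:

```python
from collections import Counter

# Hypothetical mini-lexicon; in the paper, known-word information
# comes from the Multiservice web services for Polish.
LEXICON = {"w", "tym", "artykule", "przedstawiamy"}

def unknown_word_freqs(tokens):
    """Count tokens absent from the lexicon (case-insensitive)."""
    return Counter(t.lower() for t in tokens if t.lower() not in LEXICON)
```

The point of the paper is that researchers can wire exactly this kind of pipeline together graphically, without writing code.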

Addressing surprisal deficiencies in reading time models

Title Addressing surprisal deficiencies in reading time models
Authors Marten van Schijndel, William Schuler
Abstract This study demonstrates a weakness in how n-gram and PCFG surprisal are used to predict reading times in eye-tracking data. In particular, the information conveyed by words skipped during saccades is not usually included in the surprisal measures. This study shows that correcting the surprisal calculation improves n-gram surprisal and that upcoming n-grams affect reading times, replicating previous findings of how lexical frequencies affect reading times. In contrast, the predictivity of PCFG surprisal does not benefit from the surprisal correction despite the fact that lexical sequences skipped by saccades are processed by readers, as demonstrated by the corrected n-gram measure. These results raise questions about the formulation of information-theoretic measures of syntactic processing such as PCFG surprisal and entropy reduction when applied to reading times.
Tasks Eye Tracking
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4104/
PDF https://www.aclweb.org/anthology/W16-4104
PWC https://paperswithcode.com/paper/addressing-surprisal-deficiencies-in-reading
Repo
Framework
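The surprisal correction described in the abstract above, charging a fixated word with the surprisal of the words skipped since the last fixation, can be sketched as follows. This is a toy illustration under assumed names: the bigram probabilities are invented, and `corrected_surprisal` is a hypothetical function, not the authors' implementation:

```python
import math

# Toy bigram probabilities (invented values, not from a trained model).
BIGRAM = {
    ("the", "old"): 0.1,
    ("old", "man"): 0.2,
    ("man", "smiled"): 0.05,
}

def surprisal(prev, word):
    """Bigram surprisal in bits, with a small floor for unseen pairs."""
    return -math.log2(BIGRAM.get((prev, word), 1e-6))

def corrected_surprisal(words, fixated_idx):
    """Cumulative surprisal at each fixated word, including the words
    skipped by saccades since the previous fixation."""
    out = {}
    last = 0
    for idx in fixated_idx:
        total = 0.0
        for j in range(last + 1, idx + 1):
            total += surprisal(words[j - 1], words[j])
        out[idx] = total
        last = idx
    return out
```

If a reader fixates "old" and then "smiled" while skipping "man", the corrected measure at "smiled" includes the surprisal of "man" as well, which is the information the standard measure discards.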

Memory access during incremental sentence processing causes reading time latency

Title Memory access during incremental sentence processing causes reading time latency
Authors Cory Shain, Marten van Schijndel, Richard Futrell, Edward Gibson, William Schuler
Abstract Studies on the role of memory as a predictor of reading time latencies (1) differ in their predictions about when memory effects should occur in processing and (2) have had mixed results, with strong positive effects emerging from isolated constructed stimuli and weak or even negative effects emerging from naturally-occurring stimuli. Our study addresses these concerns by comparing several implementations of prominent sentence processing theories on an exploratory corpus and evaluating the most successful of these on a confirmatory corpus, using a new self-paced reading corpus of seemingly natural narratives constructed to contain an unusually high proportion of memory-intensive constructions. We show highly significant and complementary broad-coverage latency effects both for predictors based on the Dependency Locality Theory and for predictors based on a left-corner parsing model of sentence processing. Our results indicate that memory access during sentence processing does take time, but suggest that stimuli requiring many memory access events may be necessary in order to observe the effect.
Tasks
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4106/
PDF https://www.aclweb.org/anthology/W16-4106
PWC https://paperswithcode.com/paper/memory-access-during-incremental-sentence
Repo
Framework

Automatic Triage of Mental Health Forum Posts

Title Automatic Triage of Mental Health Forum Posts
Authors Benjamin Shickel, Parisa Rashidi
Abstract
Tasks Sentiment Analysis, Text Classification, Word Embeddings
Published 2016-06-01
URL https://www.aclweb.org/anthology/W16-0326/
PDF https://www.aclweb.org/anthology/W16-0326
PWC https://paperswithcode.com/paper/automatic-triage-of-mental-health-forum-posts
Repo
Framework

Semi-supervised CLPsych 2016 Shared Task System Submission

Title Semi-supervised CLPsych 2016 Shared Task System Submission
Authors Nicolas Rey-Villamizar, Prasha Shrestha, Thamar Solorio, Farig Sadeque, Steven Bethard, Ted Pedersen
Abstract
Tasks
Published 2016-06-01
URL https://www.aclweb.org/anthology/W16-0322/
PDF https://www.aclweb.org/anthology/W16-0322
PWC https://paperswithcode.com/paper/semi-supervised-clpsych-2016-shared-task
Repo
Framework

Columbia-Jadavpur submission for EMNLP 2016 Code-Switching Workshop Shared Task: System description

Title Columbia-Jadavpur submission for EMNLP 2016 Code-Switching Workshop Shared Task: System description
Authors Arunavha Chanda, Dipankar Das, Chandan Mazumdar
Abstract
Tasks Language Identification
Published 2016-11-01
URL https://www.aclweb.org/anthology/W16-5814/
PDF https://www.aclweb.org/anthology/W16-5814
PWC https://paperswithcode.com/paper/columbia-jadavpur-submission-for-emnlp-2016
Repo
Framework

Duluth at SemEval 2016 Task 14: Extending Gloss Overlaps to Enrich Semantic Taxonomies

Title Duluth at SemEval 2016 Task 14: Extending Gloss Overlaps to Enrich Semantic Taxonomies
Authors Ted Pedersen
Abstract
Tasks
Published 2016-06-01
URL https://www.aclweb.org/anthology/S16-1207/
PDF https://www.aclweb.org/anthology/S16-1207
PWC https://paperswithcode.com/paper/duluth-at-semeval-2016-task-14-extending
Repo
Framework

A Preliminary Study of Statistically Predictive Syntactic Complexity Features and Manual Simplifications in Basque

Title A Preliminary Study of Statistically Predictive Syntactic Complexity Features and Manual Simplifications in Basque
Authors Itziar Gonzalez-Dios, María Jesús Aranzabe, Arantza Díaz de Ilarraza
Abstract In this paper, we present a comparative analysis of statistically predictive syntactic features of complexity and the treatment of these features by humans when simplifying texts. To that end, we have used a list of the most five statistically predictive features obtained automatically and the Corpus of Basque Simplified Texts (CBST) to analyse how the syntactic phenomena in these features have been manually simplified. Our aim is to go beyond the descriptions of operations found in the corpus and relate the multidisciplinary findings to understand text complexity from different points of view. We also present some issues that can be important when analysing linguistic complexity.
Tasks Text Simplification
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4110/
PDF https://www.aclweb.org/anthology/W16-4110
PWC https://paperswithcode.com/paper/a-preliminary-study-of-statistically
Repo
Framework

Fast and Provably Good Seedings for k-Means

Title Fast and Provably Good Seedings for k-Means
Authors Olivier Bachem, Mario Lucic, Hamed Hassani, Andreas Krause
Abstract Seeding - the task of finding initial cluster centers - is critical in obtaining high-quality clusterings for k-Means. However, k-means++ seeding, the state of the art algorithm, does not scale well to massive datasets as it is inherently sequential and requires k full passes through the data. It was recently shown that Markov chain Monte Carlo sampling can be used to efficiently approximate the seeding step of k-means++. However, this result requires assumptions on the data generating distribution. We propose a simple yet fast seeding algorithm that produces provably good clusterings even without assumptions on the data. Our analysis shows that the algorithm allows for a favourable trade-off between solution quality and computational cost, speeding up k-means++ seeding by up to several orders of magnitude. We validate our theoretical results in extensive experiments on a variety of real-world data sets.
Tasks
Published 2016-12-01
URL http://papers.nips.cc/paper/6478-fast-and-provably-good-seedings-for-k-means
PDF http://papers.nips.cc/paper/6478-fast-and-provably-good-seedings-for-k-means.pdf
PWC https://paperswithcode.com/paper/fast-and-provably-good-seedings-for-k-means
Repo
Framework
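For context on the abstract above, the sequential baseline the paper accelerates is standard k-means++ (D²) seeding, which needs a full pass over the data per center. The sketch below shows that baseline, not the paper's assumption-free MCMC approximation; the pure-Python implementation and the fixed seed are illustrative choices:

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_pp_seed(points, k, seed=0):
    """Exact k-means++ (D^2) seeding: each new center is drawn with
    probability proportional to its squared distance from the
    nearest already-chosen center. Each round is a full data pass."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    for _ in range(k - 1):
        d2 = [min(dist2(p, c) for c in centers) for p in points]
        r = rng.random() * sum(d2)
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(p)
                break
        else:
            centers.append(points[-1])
    return centers
```

The paper's contribution is a seeding scheme that avoids these k full passes while keeping provable quality guarantees without distributional assumptions.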

Computer-assisted stylistic revision with incomplete and noisy feedback. A pilot study

Title Computer-assisted stylistic revision with incomplete and noisy feedback. A pilot study
Authors Christian M. Meyer, Johann Frerik Koch
Abstract
Tasks Grammatical Error Correction
Published 2016-06-01
URL https://www.aclweb.org/anthology/W16-0505/
PDF https://www.aclweb.org/anthology/W16-0505
PWC https://paperswithcode.com/paper/computer-assisted-stylistic-revision-with
Repo
Framework