May 4, 2019

1618 words 8 mins read

Paper Group NANR 208



Tagging Ingush - Language Technology For Low-Resource Languages Using Resources From Linguistic Field Work

Title Tagging Ingush - Language Technology For Low-Resource Languages Using Resources From Linguistic Field Work
Authors Jörg Tiedemann, Johanna Nichols, Ronald Sprouse
Abstract This paper presents on-going work on creating NLP tools for under-resourced languages from very sparse training data coming from linguistic field work. In this work, we focus on Ingush, a Nakh-Daghestanian language spoken by about 300,000 people in the Russian republics Ingushetia and Chechnya. We present work on morphosyntactic taggers trained on transcribed and linguistically analyzed recordings and dependency parsers using English glosses to project annotation for creating synthetic treebanks. Our preliminary results are promising, supporting the goal of bootstrapping efficient NLP tools with limited or no task-specific annotated data resources available.
Tasks Cross-Lingual Transfer
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4020/
PDF https://www.aclweb.org/anthology/W16-4020
PWC https://paperswithcode.com/paper/tagging-ingush-language-technology-for-low
Repo
Framework
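The abstract above mentions projecting annotation from English glosses to build synthetic treebanks. As a minimal sketch of the general idea of annotation projection (not the paper's actual pipeline), the toy function below copies POS tags from aligned gloss tokens onto target-language tokens; the function name, the `(src_idx, tgt_idx)` alignment format, and the `"X"` fallback tag are all hypothetical choices for illustration:

```python
def project_tags(src_tags, alignment, tgt_len, default="X"):
    """Copy tags from source (gloss) tokens to aligned target tokens.

    alignment: list of (src_idx, tgt_idx) word-alignment pairs.
    Unaligned target tokens keep the default tag.
    """
    tgt_tags = [default] * tgt_len
    for s, t in alignment:
        tgt_tags[t] = src_tags[s]
    return tgt_tags
```

In practice, projected tags like these would serve as noisy training data for a tagger or parser in the low-resource language.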

Unified Methods for Exploiting Piecewise Linear Structure in Convex Optimization

Title Unified Methods for Exploiting Piecewise Linear Structure in Convex Optimization
Authors Tyler B. Johnson, Carlos Guestrin
Abstract We develop methods for rapidly identifying important components of a convex optimization problem for the purpose of achieving fast convergence times. By considering a novel problem formulation—the minimization of a sum of piecewise functions—we describe a principled and general mechanism for exploiting piecewise linear structure in convex optimization. This result leads to a theoretically justified working set algorithm and a novel screening test, which generalize and improve upon many prior results on exploiting structure in convex optimization. In empirical comparisons, we study the scalability of our methods. We find that screening scales surprisingly poorly with the size of the problem, while our working set algorithm convincingly outperforms alternative approaches.
Tasks
Published 2016-12-01
URL http://papers.nips.cc/paper/6043-unified-methods-for-exploiting-piecewise-linear-structure-in-convex-optimization
PDF http://papers.nips.cc/paper/6043-unified-methods-for-exploiting-piecewise-linear-structure-in-convex-optimization.pdf
PWC https://paperswithcode.com/paper/unified-methods-for-exploiting-piecewise
Repo
Framework
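The working-set idea in the abstract above can be illustrated on a toy 1-D problem. This is a hypothetical sketch, not the paper's algorithm: it minimizes a quadratic plus a sum of hinge (piecewise linear) terms by repeatedly solving a restricted problem over a working set and adding any hinge terms found active at the current solution, so that many inactive pieces are never touched. All names and the ternary-search solver are illustrative assumptions:

```python
def restricted_obj(x, c, hinges, W):
    """Quadratic anchor plus only the hinge terms in the working set W."""
    val = 0.5 * (x - c) ** 2
    for i in W:
        a, b = hinges[i]
        val += max(0.0, a * x - b)
    return val

def argmin_1d(f, lo=-100.0, hi=100.0, iters=200):
    """Ternary search; valid here because the objective is convex in x."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

def working_set_min(c, hinges, tol=1e-6):
    """Minimize 0.5*(x-c)^2 + sum_i max(0, a_i*x - b_i) via a working set."""
    W = set()
    while True:
        x = argmin_1d(lambda x: restricted_obj(x, c, hinges, W))
        # screening-style check: hinge terms active at x must join W
        violated = {i for i, (a, b) in enumerate(hinges)
                    if i not in W and a * x - b > tol}
        if not violated:
            return x, W
        W |= violated
```

For `c = 3` and hinges `(x - 2)` and `(x - 10)`, the loop converges with only the first hinge in the working set; the second piece is never activated, which is the source of the speedup the paper formalizes.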

Neural Enquirer: Learning to Query Tables in Natural Language

Title Neural Enquirer: Learning to Query Tables in Natural Language
Authors Pengcheng Yin, Zhengdong Lu, Hang Li, Kao Ben
Abstract
Tasks Learning to Execute, Question Answering, Semantic Parsing
Published 2016-06-01
URL https://www.aclweb.org/anthology/W16-0105/
PDF https://www.aclweb.org/anthology/W16-0105
PWC https://paperswithcode.com/paper/neural-enquirer-learning-to-query-tables-in
Repo
Framework

Using Confusion Graphs to Understand Classifier Error

Title Using Confusion Graphs to Understand Classifier Error
Authors Davis Yoshida, Jordan Boyd-Graber
Abstract
Tasks Question Answering
Published 2016-06-01
URL https://www.aclweb.org/anthology/W16-0108/
PDF https://www.aclweb.org/anthology/W16-0108
PWC https://paperswithcode.com/paper/using-confusion-graphs-to-understand
Repo
Framework

Could Machine Learning Shed Light on Natural Language Complexity?

Title Could Machine Learning Shed Light on Natural Language Complexity?
Authors María Dolores Jiménez-López, Leonor Becerra-Bonache
Abstract In this paper, we propose to use a subfield of machine learning – grammatical inference – to measure linguistic complexity from a developmental point of view. We focus on relative complexity by considering a child learner in the process of first language acquisition. The relevance of grammatical inference models for measuring linguistic complexity from a developmental point of view is based on the fact that algorithms proposed in this area can be considered computational models for studying first language acquisition. Even though it is possible to use different techniques from the field of machine learning as computational models for dealing with linguistic complexity (since in any model we have algorithms that can learn from data), we claim that grammatical inference models offer some advantages over other tools.
Tasks Language Acquisition
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4101/
PDF https://www.aclweb.org/anthology/W16-4101
PWC https://paperswithcode.com/paper/could-machine-learning-shed-light-on-natural
Repo
Framework

Web services and data mining: combining linguistic tools for Polish with an analytical platform

Title Web services and data mining: combining linguistic tools for Polish with an analytical platform
Authors Maciej Ogrodniczuk
Abstract In this paper we present a new combination of existing language tools for Polish with a popular data mining platform intended to help researchers from digital humanities perform computational analyses without any programming. The toolset includes RapidMiner Studio, a software solution offering graphical setup of integrated analytical processes and Multiservice, a Web service offering access to several state-of-the-art linguistic tools for Polish. The setting is verified in a simple task of counting frequencies of unknown words in a small corpus.
Tasks
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4025/
PDF https://www.aclweb.org/anthology/W16-4025
PWC https://paperswithcode.com/paper/web-services-and-data-mining-combining
Repo
Framework
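The verification task in the abstract above, counting frequencies of unknown words, is simple enough to sketch. The snippet below is a hypothetical stand-in, not the RapidMiner/Multiservice setup: the `LEXICON` set merely imitates the "known word" judgments that the paper obtains from Polish linguistic web services:

```python
from collections import Counter

# Hypothetical mini-lexicon; in the paper, known-word information
# comes from the Multiservice web services for Polish.
LEXICON = {"w", "tym", "artykule", "przedstawiamy"}

def unknown_word_freqs(tokens):
    """Count tokens absent from the lexicon (case-insensitive)."""
    return Counter(t.lower() for t in tokens if t.lower() not in LEXICON)
```

The point of the paper is that researchers can wire exactly this kind of pipeline together graphically, without writing code.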

Addressing surprisal deficiencies in reading time models

Title Addressing surprisal deficiencies in reading time models
Authors Marten van Schijndel, William Schuler
Abstract This study demonstrates a weakness in how n-gram and PCFG surprisal are used to predict reading times in eye-tracking data. In particular, the information conveyed by words skipped during saccades is not usually included in the surprisal measures. This study shows that correcting the surprisal calculation improves n-gram surprisal and that upcoming n-grams affect reading times, replicating previous findings of how lexical frequencies affect reading times. In contrast, the predictivity of PCFG surprisal does not benefit from the surprisal correction despite the fact that lexical sequences skipped by saccades are processed by readers, as demonstrated by the corrected n-gram measure. These results raise questions about the formulation of information-theoretic measures of syntactic processing such as PCFG surprisal and entropy reduction when applied to reading times.
Tasks Eye Tracking
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4104/
PDF https://www.aclweb.org/anthology/W16-4104
PWC https://paperswithcode.com/paper/addressing-surprisal-deficiencies-in-reading
Repo
Framework
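The surprisal correction described in the abstract above, charging a fixated word with the surprisal of the words skipped since the last fixation, can be sketched as follows. This is a toy illustration under assumed names: the bigram probabilities are invented, and `corrected_surprisal` is a hypothetical function, not the authors' implementation:

```python
import math

# Toy bigram probabilities (invented values, not from a trained model).
BIGRAM = {
    ("the", "old"): 0.1,
    ("old", "man"): 0.2,
    ("man", "smiled"): 0.05,
}

def surprisal(prev, word):
    """Bigram surprisal in bits, with a small floor for unseen pairs."""
    return -math.log2(BIGRAM.get((prev, word), 1e-6))

def corrected_surprisal(words, fixated_idx):
    """Cumulative surprisal at each fixated word, including the words
    skipped by saccades since the previous fixation."""
    out = {}
    last = 0
    for idx in fixated_idx:
        total = 0.0
        for j in range(last + 1, idx + 1):
            total += surprisal(words[j - 1], words[j])
        out[idx] = total
        last = idx
    return out
```

If a reader fixates "old" and then "smiled" while skipping "man", the corrected measure at "smiled" includes the surprisal of "man" as well, which is the information the standard measure discards.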

Memory access during incremental sentence processing causes reading time latency

Title Memory access during incremental sentence processing causes reading time latency
Authors Cory Shain, Marten van Schijndel, Richard Futrell, Edward Gibson, William Schuler
Abstract Studies on the role of memory as a predictor of reading time latencies (1) differ in their predictions about when memory effects should occur in processing and (2) have had mixed results, with strong positive effects emerging from isolated constructed stimuli and weak or even negative effects emerging from naturally-occurring stimuli. Our study addresses these concerns by comparing several implementations of prominent sentence processing theories on an exploratory corpus and evaluating the most successful of these on a confirmatory corpus, using a new self-paced reading corpus of seemingly natural narratives constructed to contain an unusually high proportion of memory-intensive constructions. We show highly significant and complementary broad-coverage latency effects both for predictors based on the Dependency Locality Theory and for predictors based on a left-corner parsing model of sentence processing. Our results indicate that memory access during sentence processing does take time, but suggest that stimuli requiring many memory access events may be necessary in order to observe the effect.
Tasks
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4106/
PDF https://www.aclweb.org/anthology/W16-4106
PWC https://paperswithcode.com/paper/memory-access-during-incremental-sentence
Repo
Framework

Automatic Triage of Mental Health Forum Posts

Title Automatic Triage of Mental Health Forum Posts
Authors Benjamin Shickel, Parisa Rashidi
Abstract
Tasks Sentiment Analysis, Text Classification, Word Embeddings
Published 2016-06-01
URL https://www.aclweb.org/anthology/W16-0326/
PDF https://www.aclweb.org/anthology/W16-0326
PWC https://paperswithcode.com/paper/automatic-triage-of-mental-health-forum-posts
Repo
Framework

Semi-supervised CLPsych 2016 Shared Task System Submission

Title Semi-supervised CLPsych 2016 Shared Task System Submission
Authors Nicolas Rey-Villamizar, Prasha Shrestha, Thamar Solorio, Farig Sadeque, Steven Bethard, Ted Pedersen
Abstract
Tasks
Published 2016-06-01
URL https://www.aclweb.org/anthology/W16-0322/
PDF https://www.aclweb.org/anthology/W16-0322
PWC https://paperswithcode.com/paper/semi-supervised-clpsych-2016-shared-task
Repo
Framework

Columbia-Jadavpur submission for EMNLP 2016 Code-Switching Workshop Shared Task: System description

Title Columbia-Jadavpur submission for EMNLP 2016 Code-Switching Workshop Shared Task: System description
Authors Arunavha Chanda, Dipankar Das, Chandan Mazumdar
Abstract
Tasks Language Identification
Published 2016-11-01
URL https://www.aclweb.org/anthology/W16-5814/
PDF https://www.aclweb.org/anthology/W16-5814
PWC https://paperswithcode.com/paper/columbia-jadavpur-submission-for-emnlp-2016
Repo
Framework

Duluth at SemEval 2016 Task 14: Extending Gloss Overlaps to Enrich Semantic Taxonomies

Title Duluth at SemEval 2016 Task 14: Extending Gloss Overlaps to Enrich Semantic Taxonomies
Authors Ted Pedersen
Abstract
Tasks
Published 2016-06-01
URL https://www.aclweb.org/anthology/S16-1207/
PDF https://www.aclweb.org/anthology/S16-1207
PWC https://paperswithcode.com/paper/duluth-at-semeval-2016-task-14-extending
Repo
Framework

A Preliminary Study of Statistically Predictive Syntactic Complexity Features and Manual Simplifications in Basque

Title A Preliminary Study of Statistically Predictive Syntactic Complexity Features and Manual Simplifications in Basque
Authors Itziar Gonzalez-Dios, María Jesús Aranzabe, Arantza Díaz de Ilarraza
Abstract In this paper, we present a comparative analysis of statistically predictive syntactic features of complexity and the treatment of these features by humans when simplifying texts. To that end, we have used a list of the most five statistically predictive features obtained automatically and the Corpus of Basque Simplified Texts (CBST) to analyse how the syntactic phenomena in these features have been manually simplified. Our aim is to go beyond the descriptions of operations found in the corpus and relate the multidisciplinary findings to understand text complexity from different points of view. We also present some issues that can be important when analysing linguistic complexity.
Tasks Text Simplification
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4110/
PDF https://www.aclweb.org/anthology/W16-4110
PWC https://paperswithcode.com/paper/a-preliminary-study-of-statistically
Repo
Framework

Fast and Provably Good Seedings for k-Means

Title Fast and Provably Good Seedings for k-Means
Authors Olivier Bachem, Mario Lucic, Hamed Hassani, Andreas Krause
Abstract Seeding - the task of finding initial cluster centers - is critical in obtaining high-quality clusterings for k-Means. However, k-means++ seeding, the state of the art algorithm, does not scale well to massive datasets as it is inherently sequential and requires k full passes through the data. It was recently shown that Markov chain Monte Carlo sampling can be used to efficiently approximate the seeding step of k-means++. However, this result requires assumptions on the data generating distribution. We propose a simple yet fast seeding algorithm that produces provably good clusterings even without assumptions on the data. Our analysis shows that the algorithm allows for a favourable trade-off between solution quality and computational cost, speeding up k-means++ seeding by up to several orders of magnitude. We validate our theoretical results in extensive experiments on a variety of real-world data sets.
Tasks
Published 2016-12-01
URL http://papers.nips.cc/paper/6478-fast-and-provably-good-seedings-for-k-means
PDF http://papers.nips.cc/paper/6478-fast-and-provably-good-seedings-for-k-means.pdf
PWC https://paperswithcode.com/paper/fast-and-provably-good-seedings-for-k-means
Repo
Framework
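For context on the abstract above, the sequential baseline the paper accelerates is standard k-means++ (D²) seeding, which needs a full pass over the data per center. The sketch below shows that baseline, not the paper's assumption-free MCMC approximation; the pure-Python implementation and the fixed seed are illustrative choices:

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_pp_seed(points, k, seed=0):
    """Exact k-means++ (D^2) seeding: each new center is drawn with
    probability proportional to its squared distance from the
    nearest already-chosen center. Each round is a full data pass."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    for _ in range(k - 1):
        d2 = [min(dist2(p, c) for c in centers) for p in points]
        r = rng.random() * sum(d2)
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(p)
                break
        else:
            centers.append(points[-1])
    return centers
```

The paper's contribution is a seeding scheme that avoids these k full passes while keeping provable quality guarantees without distributional assumptions.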

Computer-assisted stylistic revision with incomplete and noisy feedback. A pilot study

Title Computer-assisted stylistic revision with incomplete and noisy feedback. A pilot study
Authors Christian M. Meyer, Johann Frerik Koch
Abstract
Tasks Grammatical Error Correction
Published 2016-06-01
URL https://www.aclweb.org/anthology/W16-0505/
PDF https://www.aclweb.org/anthology/W16-0505
PWC https://paperswithcode.com/paper/computer-assisted-stylistic-revision-with
Repo
Framework