July 26, 2019

Paper Group NANR 185

Revita: a system for language learning and supporting endangered languages

Title Revita: a system for language learning and supporting endangered languages
Authors Anisia Katinskaia, Javad Nouri, Roman Yangarber
Abstract
Tasks Language Acquisition
Published 2017-05-01
URL https://www.aclweb.org/anthology/W17-0304/
PDF https://www.aclweb.org/anthology/W17-0304
PWC https://paperswithcode.com/paper/revita-a-system-for-language-learning-and
Repo
Framework

Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Title Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers
Authors
Abstract
Tasks
Published 2017-04-01
URL https://www.aclweb.org/anthology/E17-1000/
PDF https://www.aclweb.org/anthology/E17-1000
PWC https://paperswithcode.com/paper/proceedings-of-the-15th-conference-of-the
Repo
Framework

Generating a Training Corpus for OCR Post-Correction Using Encoder-Decoder Model

Title Generating a Training Corpus for OCR Post-Correction Using Encoder-Decoder Model
Authors Eva D'hondt, Cyril Grouin, Brigitte Grau
Abstract In this paper we present a novel approach to the automatic correction of OCR-induced orthographic errors in a given text. While current systems depend heavily on large training corpora or external information, such as domain-specific lexicons or confidence scores from the OCR process, our system only requires a small amount of (relatively) clean training data from a representative corpus to learn a character-based statistical language model using Bidirectional Long Short-Term Memory Networks (biLSTMs). We demonstrate the versatility and adaptability of our system on different text corpora with varying degrees of textual noise, including a real-life OCR corpus in the medical domain.
Tasks Language Modelling, Named Entity Recognition, Optical Character Recognition
Published 2017-11-01
URL https://www.aclweb.org/anthology/I17-1101/
PDF https://www.aclweb.org/anthology/I17-1101
PWC https://paperswithcode.com/paper/generating-a-training-corpus-for-ocr-post
Repo
Framework

ZipML: Training Linear Models with End-to-End Low Precision, and a Little Bit of Deep Learning

Title ZipML: Training Linear Models with End-to-End Low Precision, and a Little Bit of Deep Learning
Authors Hantian Zhang, Jerry Li, Kaan Kara, Dan Alistarh, Ji Liu, Ce Zhang
Abstract Recently there has been significant interest in training machine-learning models at low precision: by reducing precision, one can reduce computation and communication by one order of magnitude. We examine training at reduced precision, both from a theoretical and practical perspective, and ask: is it possible to train models at end-to-end low precision with provable guarantees? Can this lead to consistent order-of-magnitude speedups? We mainly focus on linear models, and the answer is yes for linear models. We develop a simple framework called ZipML based on one simple but novel strategy called double sampling. Our ZipML framework is able to execute training at low precision with no bias, guaranteeing convergence, whereas naive quantization would introduce significant bias. We validate our framework across a range of applications, and show that it enables an FPGA prototype that is up to $6.5\times$ faster than an implementation using full 32-bit precision. We further develop a variance-optimal stochastic quantization strategy and show that it can make a significant difference in a variety of settings. When applied to linear models together with double sampling, we save up to another $1.7\times$ in data movement compared with uniform quantization. When training deep networks with quantized models, we achieve higher accuracy than the state-of-the-art XNOR-Net.
Tasks Quantization
Published 2017-08-01
URL https://icml.cc/Conferences/2017/Schedule?showEvent=659
PDF http://proceedings.mlr.press/v70/zhang17e/zhang17e.pdf
PWC https://paperswithcode.com/paper/zipml-training-linear-models-with-end-to-end
Repo
Framework
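
The abstract above contrasts naive quantization, which biases the gradient, with double sampling, which does not. The following is a minimal sketch of that contrast for a single least-squares sample, assuming double sampling means drawing two independent stochastic quantizations of the same data vector. It is not the authors' implementation: every function name, the 1-bit grid, and the toy values in the usage note are hypothetical choices made for illustration.

```python
import random


def stochastic_quantize(v, levels, rng):
    """Round v in [0, 1] to one of `levels` evenly spaced grid points.

    The rounding direction is random, chosen so that the expected
    value of the result equals v (unbiased quantization).
    """
    scaled = v * (levels - 1)
    low = int(scaled)          # floor, since v >= 0
    frac = scaled - low        # distance to the lower grid point
    up = 1 if rng.random() < frac else 0
    return (low + up) / (levels - 1)


def quantize_vec(a, levels, rng):
    return [stochastic_quantize(v, levels, rng) for v in a]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def exact_gradient(a, b, x):
    """Gradient of 0.5 * (a.x - b)^2 with respect to x."""
    r = dot(a, x) - b
    return [ai * r for ai in a]


def naive_gradient(a, b, x, levels, rng):
    """Reuses ONE quantized copy of a in both factors (biased)."""
    q = quantize_vec(a, levels, rng)
    r = dot(q, x) - b
    return [qi * r for qi in q]


def double_sampling_gradient(a, b, x, levels, rng):
    """Uses TWO independent quantized copies of a (unbiased)."""
    q1 = quantize_vec(a, levels, rng)
    q2 = quantize_vec(a, levels, rng)
    r = dot(q2, x) - b
    return [qi * r for qi in q1]


def mean_gradient(fn, a, b, x, levels, trials, seed):
    """Average a stochastic gradient estimator over many draws."""
    rng = random.Random(seed)
    total = [0.0] * len(a)
    for _ in range(trials):
        g = fn(a, b, x, levels, rng)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / trials for t in total]
```

Averaging `double_sampling_gradient` over many draws with, say, `a = [0.3, 0.7]`, `b = 0.5`, `x = [1.0, -2.0]` and `levels = 2` should approach `exact_gradient(a, b, x)`, while the average of `naive_gradient` stays offset by the quantization variance times `x`. The independence of the two copies is what removes the squared-noise term.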

Estimating Code-Switching on Twitter with a Novel Generalized Word-Level Language Detection Technique

Title Estimating Code-Switching on Twitter with a Novel Generalized Word-Level Language Detection Technique
Authors Shruti Rijhwani, Royal Sequiera, Monojit Choudhury, Kalika Bali, Chandra Shekhar Maddila
Abstract Word-level language detection is necessary for analyzing code-switched text, where multiple languages could be mixed within a sentence. Existing models are restricted to code-switching between two specific languages and fail in real-world scenarios as text input rarely has a priori information on the languages used. We present a novel unsupervised word-level language detection technique for code-switched text for an arbitrarily large number of languages, which does not require any manually annotated training data. Our experiments with tweets in seven languages show a 74% relative error reduction in word-level labeling with respect to competitive baselines. We then use this system to conduct a large-scale quantitative analysis of code-switching patterns on Twitter, both global as well as region-specific, with 58M tweets.
Tasks
Published 2017-07-01
URL https://www.aclweb.org/anthology/P17-1180/
PDF https://www.aclweb.org/anthology/P17-1180
PWC https://paperswithcode.com/paper/estimating-code-switching-on-twitter-with-a
Repo
Framework

Topic Model Stability for Hierarchical Summarization

Title Topic Model Stability for Hierarchical Summarization
Authors John Miller, Kathleen McCoy
Abstract We envisioned responsive generic hierarchical text summarization with summaries organized by section and paragraph based on hierarchical structure topic models. But we had to be sure that topic models were stable for the sampled corpora. To that end we developed a methodology for aligning multiple hierarchical structure topic models run over the same corpus under similar conditions, calculating a representative centroid model, and reporting stability of the centroid model. We ran stability experiments for standard corpora and a development corpus of Global Warming articles. We found flat and hierarchical structures of two levels plus the root offer stable centroid models, but hierarchical structures of three levels plus the root didn't seem stable enough for use in hierarchical summarization.
Tasks Text Summarization, Topic Models
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-4509/
PDF https://www.aclweb.org/anthology/W17-4509
PWC https://paperswithcode.com/paper/topic-model-stability-for-hierarchical
Repo
Framework

Mining Argumentative Structure from Natural Language text using Automatically Generated Premise-Conclusion Topic Models

Title Mining Argumentative Structure from Natural Language text using Automatically Generated Premise-Conclusion Topic Models
Authors John Lawrence, Chris Reed
Abstract This paper presents a method of extracting argumentative structure from natural language text. The approach presented is based on the way in which we understand an argument being made, not just from the words said, but from existing contextual knowledge and understanding of the broader issues. We leverage high-precision, low-recall techniques in order to automatically build a large corpus of inferential statements related to the text's topic. These statements are then used to produce a matrix representing the inferential relationship between different aspects of the topic. From this matrix, we are able to determine connectedness and directionality of inference between statements in the original text. By following this approach, we obtain results that compare favourably to those of other similar techniques to classify premise-conclusion pairs (with results 22 points above baseline), but without the requirement of large volumes of annotated, domain-specific data.
Tasks Argument Mining, Opinion Mining, Sentiment Analysis, Topic Models
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-5105/
PDF https://www.aclweb.org/anthology/W17-5105
PWC https://paperswithcode.com/paper/mining-argumentative-structure-from-natural
Repo
Framework

Question Retrieval with Distributed Representations and Participant Reputation in Community Question Answering

Title Question Retrieval with Distributed Representations and Participant Reputation in Community Question Answering
Authors Sam Weng, Chun-Kai Wu, Yu-Chun Wang, Richard Tzong-Han Tsai
Abstract
Tasks Community Question Answering, Question Answering
Published 2017-12-01
URL https://www.aclweb.org/anthology/O17-3003/
PDF https://www.aclweb.org/anthology/O17-3003
PWC https://paperswithcode.com/paper/question-retrieval-with-distributed
Repo
Framework

Visual Interaction Networks: Learning a Physics Simulator from Video

Title Visual Interaction Networks: Learning a Physics Simulator from Video
Authors Nicholas Watters, Daniel Zoran, Theophane Weber, Peter Battaglia, Razvan Pascanu, Andrea Tacchetti
Abstract From just a glance, humans can make rich predictions about the future of a wide range of physical systems. On the other hand, modern approaches from engineering, robotics, and graphics are often restricted to narrow domains or require information about the underlying state. We introduce the Visual Interaction Network, a general-purpose model for learning the dynamics of a physical system from raw visual observations. Our model consists of a perceptual front-end based on convolutional neural networks and a dynamics predictor based on interaction networks. Through joint training, the perceptual front-end learns to parse a dynamic visual scene into a set of factored latent object representations. The dynamics predictor learns to roll these states forward in time by computing their interactions, producing a predicted physical trajectory of arbitrary length. We found that from just six input video frames the Visual Interaction Network can generate accurate future trajectories of hundreds of time steps on a wide range of physical systems. Our model can also be applied to scenes with invisible objects, inferring their future states from their effects on the visible objects, and can implicitly infer the unknown mass of objects. This work opens new opportunities for model-based decision-making and planning from raw sensory observations in complex physical environments.
Tasks Decision Making
Published 2017-12-01
URL http://papers.nips.cc/paper/7040-visual-interaction-networks-learning-a-physics-simulator-from-video
PDF http://papers.nips.cc/paper/7040-visual-interaction-networks-learning-a-physics-simulator-from-video.pdf
PWC https://paperswithcode.com/paper/visual-interaction-networks-learning-a
Repo
Framework

Morphosyntactic Analysis of the Pronominal System of Southern Alta

Title Morphosyntactic Analysis of the Pronominal System of Southern Alta
Authors Marvin Abreu
Abstract
Tasks
Published 2017-11-01
URL https://www.aclweb.org/anthology/Y17-1043/
PDF https://www.aclweb.org/anthology/Y17-1043
PWC https://paperswithcode.com/paper/morphosyntactic-analysis-of-the-pronominal
Repo
Framework

A Parallel Corpus for Evaluating Machine Translation between Arabic and European Languages

Title A Parallel Corpus for Evaluating Machine Translation between Arabic and European Languages
Authors Nizar Habash, Nasser Zalmout, Dima Taji, Hieu Hoang, Maverick Alzate
Abstract We present Arab-Acquis, a large publicly available dataset for evaluating machine translation between 22 European languages and Arabic. Arab-Acquis consists of over 12,000 sentences from the JRC-Acquis (Acquis Communautaire) corpus translated twice by professional translators, once from English and once from French, and totaling over 600,000 words. The corpus follows previous data splits in the literature for tuning, development, and testing. We describe the corpus and how it was created. We also present the first benchmarking results on translating to and from Arabic for 22 European languages.
Tasks Machine Translation
Published 2017-04-01
URL https://www.aclweb.org/anthology/E17-2038/
PDF https://www.aclweb.org/anthology/E17-2038
PWC https://paperswithcode.com/paper/a-parallel-corpus-for-evaluating-machine
Repo
Framework

Towards Automatic Construction of News Overview Articles by News Synthesis

Title Towards Automatic Construction of News Overview Articles by News Synthesis
Authors Jianmin Zhang, Xiaojun Wan
Abstract In this paper we investigate a new task of automatically constructing an overview article from a given set of news articles about a news event. We propose a news synthesis approach to address this task based on passage segmentation, ranking, selection and merging. Our proposed approach is compared with several typical multi-document summarization methods on the Wikinews dataset, and achieves the best performance on both automatic evaluation and manual evaluation.
Tasks Document Summarization, Multi-Document Summarization
Published 2017-09-01
URL https://www.aclweb.org/anthology/D17-1224/
PDF https://www.aclweb.org/anthology/D17-1224
PWC https://paperswithcode.com/paper/towards-automatic-construction-of-news
Repo
Framework

Machine Translation, it’s a question of style, innit? The case of English tag questions

Title Machine Translation, it’s a question of style, innit? The case of English tag questions
Authors Rachel Bawden
Abstract In this paper, we address the problem of generating English tag questions (TQs) (e.g. it is, isn't it?) in Machine Translation (MT). We propose a post-edition solution, formulating the problem as a multi-class classification task. We present (i) the automatic annotation of English TQs in a parallel corpus of subtitles and (ii) an approach using a series of classifiers to predict TQ forms, which we use to post-edit state-of-the-art MT outputs. Our method provides significant improvements in English TQ translation when translating from Czech, French and German, in turn improving the fluidity, naturalness, grammatical correctness and pragmatic coherence of MT output.
Tasks Machine Translation
Published 2017-09-01
URL https://www.aclweb.org/anthology/D17-1265/
PDF https://www.aclweb.org/anthology/D17-1265
PWC https://paperswithcode.com/paper/machine-translation-its-a-question-of-style
Repo
Framework

Natural Language Processing in Political Campaigns

Title Natural Language Processing in Political Campaigns
Authors Cristina Moise
Abstract This paper presents the Majoritas ecosystem, providing a complete overview of political campaign assessment aimed at assisting politicians and their staff in delivering consistent and personalized messages on social media.
Tasks Opinion Mining, Sentiment Analysis
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-8106/
PDF http://doi.org/10.26615/978-954-452-046-5_006
PWC https://paperswithcode.com/paper/natural-language-processing-in-political
Repo
Framework

Semantic Dependency Parsing via Book Embedding

Title Semantic Dependency Parsing via Book Embedding
Authors Weiwei Sun, Junjie Cao, Xiaojun Wan
Abstract We model a dependency graph as a book, a particular kind of topological space, for semantic dependency parsing. The spine of the book is made up of a sequence of words, and each page contains a subset of noncrossing arcs. To build a semantic graph for a given sentence, we design new Maximum Subgraph algorithms to generate noncrossing graphs on each page, and a Lagrangian Relaxation-based algorithm to combine pages into a book. Experiments demonstrate the effectiveness of the book embedding framework across a wide range of conditions. Our parser obtains comparable results with a state-of-the-art transition-based parser.
Tasks Combinatorial Optimization, Dependency Parsing, Semantic Dependency Parsing
Published 2017-07-01
URL https://www.aclweb.org/anthology/P17-1077/
PDF https://www.aclweb.org/anthology/P17-1077
PWC https://paperswithcode.com/paper/semantic-dependency-parsing-via-book
Repo
Framework
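
To make the "book" picture in the abstract above concrete, here is a small sketch of the underlying data structure only: words sit on the spine as integer positions, and arcs are split across pages so that no two arcs on the same page cross. The greedy first-fit assignment below is a stand-in chosen for brevity. It is not the Maximum Subgraph or Lagrangian Relaxation algorithm from the paper, and the function names are hypothetical.

```python
def crosses(arc1, arc2):
    """Return True if two arcs over spine positions cross.

    Arcs are (position, position) pairs and are treated as
    undirected. Arcs that only share an endpoint, or where one is
    nested inside the other, do not cross.
    """
    (i, j), (k, l) = sorted(arc1), sorted(arc2)
    return i < k < j < l or k < i < l < j


def greedy_book_embedding(arcs):
    """Assign each arc to the first page where it crosses nothing.

    Returns a list of pages, each page being a list of arcs that are
    pairwise noncrossing. A new page is opened only when an arc
    conflicts with every existing page.
    """
    pages = []
    for arc in arcs:
        for page in pages:
            if not any(crosses(arc, other) for other in page):
                page.append(arc)
                break
        else:
            # No existing page can hold this arc without a crossing.
            pages.append([arc])
    return pages
```

For example, `greedy_book_embedding([(0, 2), (1, 3), (2, 4), (0, 4)])` needs two pages, because `(1, 3)` crosses `(0, 2)` while the other three arcs are mutually noncrossing. First-fit gives no guarantee of using the fewest pages.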