July 26, 2019

Paper Group NANR 185

Revita: a system for language learning and supporting endangered languages. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers. Generating a Training Corpus for OCR Post-Correction Using Encoder-Decoder Model. ZipML: Training Linear Models with End-to-End Low Precision, …

Revita: a system for language learning and supporting endangered languages


Title	Revita: a system for language learning and supporting endangered languages
Authors	Anisia Katinskaia, Javad Nouri, Roman Yangarber
Abstract
Tasks	Language Acquisition
Published	2017-05-01
URL	https://www.aclweb.org/anthology/W17-0304/
PDF	https://www.aclweb.org/anthology/W17-0304
PWC	https://paperswithcode.com/paper/revita-a-system-for-language-learning-and
Repo
Framework

Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers


Title	Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers
Authors
Abstract
Tasks
Published	2017-04-01
URL	https://www.aclweb.org/anthology/E17-1000/
PDF	https://www.aclweb.org/anthology/E17-1000
PWC	https://paperswithcode.com/paper/proceedings-of-the-15th-conference-of-the
Repo
Framework

Generating a Training Corpus for OCR Post-Correction Using Encoder-Decoder Model


Title	Generating a Training Corpus for OCR Post-Correction Using Encoder-Decoder Model
Authors	Eva D'hondt, Cyril Grouin, Brigitte Grau
Abstract	In this paper we present a novel approach to the automatic correction of OCR-induced orthographic errors in a given text. While current systems depend heavily on large training corpora or external information, such as domain-specific lexicons or confidence scores from the OCR process, our system only requires a small amount of (relatively) clean training data from a representative corpus to learn a character-based statistical language model using Bidirectional Long Short-Term Memory Networks (biLSTMs). We demonstrate the versatility and adaptability of our system on different text corpora with varying degrees of textual noise, including a real-life OCR corpus in the medical domain.
Tasks	Language Modelling, Named Entity Recognition, Optical Character Recognition
Published	2017-11-01
URL	https://www.aclweb.org/anthology/I17-1101/
PDF	https://www.aclweb.org/anthology/I17-1101
PWC	https://paperswithcode.com/paper/generating-a-training-corpus-for-ocr-post
Repo
Framework

ZipML: Training Linear Models with End-to-End Low Precision, and a Little Bit of Deep Learning


Title	ZipML: Training Linear Models with End-to-End Low Precision, and a Little Bit of Deep Learning
Authors	Hantian Zhang, Jerry Li, Kaan Kara, Dan Alistarh, Ji Liu, Ce Zhang
Abstract	Recently there has been significant interest in training machine-learning models at low precision: by reducing precision, one can reduce computation and communication by one order of magnitude. We examine training at reduced precision, both from a theoretical and practical perspective, and ask: is it possible to train models at end-to-end low precision with provable guarantees? Can this lead to consistent order-of-magnitude speedups? We mainly focus on linear models, and the answer is yes for linear models. We develop a simple framework called ZipML based on one simple but novel strategy called double sampling. Our ZipML framework is able to execute training at low precision with no bias, guaranteeing convergence, whereas naive quantization would introduce significant bias. We validate our framework across a range of applications, and show that it enables an FPGA prototype that is up to $6.5\times$ faster than an implementation using full 32-bit precision. We further develop a variance-optimal stochastic quantization strategy and show that it can make a significant difference in a variety of settings. When applied to linear models together with double sampling, we save up to another $1.7\times$ in data movement compared with uniform quantization. When training deep networks with quantized models, we achieve higher accuracy than the state-of-the-art XNOR-Net.
Tasks	Quantization
Published	2017-08-01
URL	https://icml.cc/Conferences/2017/Schedule?showEvent=659
PDF	http://proceedings.mlr.press/v70/zhang17e/zhang17e.pdf
PWC	https://paperswithcode.com/paper/zipml-training-linear-models-with-end-to-end
Repo
Framework
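
The double-sampling idea in the ZipML abstract can be illustrated on a single least-squares sample. The following is a minimal sketch under stated assumptions, not the authors' implementation: the uniform grid on [-1, 1], the function names, and the toy values are all hypothetical choices made for illustration. It compares a gradient built from one stochastic quantization of the sample (reused twice) with one built from two independent quantizations.

```python
import random


def stochastic_round(value, rng, levels=4, lo=-1.0, hi=1.0):
    """Randomly round value in [lo, hi] to a neighbouring grid point so that
    the expected result equals value (assumed uniform grid, for illustration)."""
    step = (hi - lo) / levels
    pos = (value - lo) / step
    base = int(pos)
    if base >= levels:
        return hi
    round_up = rng.random() < (pos - base)
    return lo + (base + (1 if round_up else 0)) * step


def quantize(vec, rng):
    return [stochastic_round(v, rng) for v in vec]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def exact_gradient(a, x, b):
    # Full-precision least-squares gradient for one sample: a * (a.x - b)
    r = dot(a, x) - b
    return [ai * r for ai in a]


def naive_gradient(a, x, b, rng):
    # One quantized copy used in both places: the product q_i * q_i
    # picks up the quantization variance, which biases the estimate.
    q = quantize(a, rng)
    r = dot(q, x) - b
    return [qi * r for qi in q]


def double_sampled_gradient(a, x, b, rng):
    # Two independent quantizations: one for the residual, one for the
    # outer factor, so the expectation factorizes and matches the exact gradient.
    q1 = quantize(a, rng)
    q2 = quantize(a, rng)
    r = dot(q2, x) - b
    return [qi * r for qi in q1]


def mean_gradient(estimator, a, x, b, trials=20000, seed=0):
    """Average an estimator over many draws to expose its expectation."""
    rng = random.Random(seed)
    total = [0.0] * len(a)
    for _ in range(trials):
        g = estimator(a, x, b, rng)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / trials for t in total]
```

Calling `mean_gradient(double_sampled_gradient, a, x, b)` and `mean_gradient(naive_gradient, a, x, b)` on a small example and comparing both with `exact_gradient(a, x, b)` should show the double-sampled average staying close to the exact gradient while the naive one drifts by roughly the per-coordinate quantization variance times the corresponding entry of `x`.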

Estimating Code-Switching on Twitter with a Novel Generalized Word-Level Language Detection Technique


Title	Estimating Code-Switching on Twitter with a Novel Generalized Word-Level Language Detection Technique
Authors	Shruti Rijhwani, Royal Sequiera, Monojit Choudhury, Kalika Bali, Chandra Shekhar Maddila
Abstract	Word-level language detection is necessary for analyzing code-switched text, where multiple languages could be mixed within a sentence. Existing models are restricted to code-switching between two specific languages and fail in real-world scenarios as text input rarely has a priori information on the languages used. We present a novel unsupervised word-level language detection technique for code-switched text for an arbitrarily large number of languages, which does not require any manually annotated training data. Our experiments with tweets in seven languages show a 74% relative error reduction in word-level labeling with respect to competitive baselines. We then use this system to conduct a large-scale quantitative analysis of code-switching patterns on Twitter, both global as well as region-specific, with 58M tweets.
Tasks
Published	2017-07-01
URL	https://www.aclweb.org/anthology/P17-1180/
PDF	https://www.aclweb.org/anthology/P17-1180
PWC	https://paperswithcode.com/paper/estimating-code-switching-on-twitter-with-a
Repo
Framework

Topic Model Stability for Hierarchical Summarization


Title	Topic Model Stability for Hierarchical Summarization
Authors	John Miller, Kathleen McCoy
Abstract	We envisioned responsive generic hierarchical text summarization with summaries organized by section and paragraph based on hierarchical structure topic models. But we had to be sure that topic models were stable for the sampled corpora. To that end we developed a methodology for aligning multiple hierarchical structure topic models run over the same corpus under similar conditions, calculating a representative centroid model, and reporting stability of the centroid model. We ran stability experiments for standard corpora and a development corpus of Global Warming articles. We found flat and hierarchical structures of two levels plus the root offer stable centroid models, but hierarchical structures of three levels plus the root didn't seem stable enough for use in hierarchical summarization.
Tasks	Text Summarization, Topic Models
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-4509/
PDF	https://www.aclweb.org/anthology/W17-4509
PWC	https://paperswithcode.com/paper/topic-model-stability-for-hierarchical
Repo
Framework

Mining Argumentative Structure from Natural Language text using Automatically Generated Premise-Conclusion Topic Models


Title	Mining Argumentative Structure from Natural Language text using Automatically Generated Premise-Conclusion Topic Models
Authors	John Lawrence, Chris Reed
Abstract	This paper presents a method of extracting argumentative structure from natural language text. The approach presented is based on the way in which we understand an argument being made, not just from the words said, but from existing contextual knowledge and understanding of the broader issues. We leverage high-precision, low-recall techniques in order to automatically build a large corpus of inferential statements related to the text's topic. These statements are then used to produce a matrix representing the inferential relationship between different aspects of the topic. From this matrix, we are able to determine connectedness and directionality of inference between statements in the original text. By following this approach, we obtain results that compare favourably to those of other similar techniques to classify premise-conclusion pairs (with results 22 points above baseline), but without the requirement of large volumes of annotated, domain-specific data.
Tasks	Argument Mining, Opinion Mining, Sentiment Analysis, Topic Models
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-5105/
PDF	https://www.aclweb.org/anthology/W17-5105
PWC	https://paperswithcode.com/paper/mining-argumentative-structure-from-natural
Repo
Framework

Question Retrieval with Distributed Representations and Participant Reputation in Community Question Answering


Title	Question Retrieval with Distributed Representations and Participant Reputation in Community Question Answering
Authors	Sam Weng, Chun-Kai Wu, Yu-Chun Wang, Richard Tzong-Han Tsai
Abstract
Tasks	Community Question Answering, Question Answering
Published	2017-12-01
URL	https://www.aclweb.org/anthology/O17-3003/
PDF	https://www.aclweb.org/anthology/O17-3003
PWC	https://paperswithcode.com/paper/question-retrieval-with-distributed
Repo
Framework

Visual Interaction Networks: Learning a Physics Simulator from Video


Title	Visual Interaction Networks: Learning a Physics Simulator from Video
Authors	Nicholas Watters, Daniel Zoran, Theophane Weber, Peter Battaglia, Razvan Pascanu, Andrea Tacchetti
Abstract	From just a glance, humans can make rich predictions about the future of a wide range of physical systems. On the other hand, modern approaches from engineering, robotics, and graphics are often restricted to narrow domains or require information about the underlying state. We introduce the Visual Interaction Network, a general-purpose model for learning the dynamics of a physical system from raw visual observations. Our model consists of a perceptual front-end based on convolutional neural networks and a dynamics predictor based on interaction networks. Through joint training, the perceptual front-end learns to parse a dynamic visual scene into a set of factored latent object representations. The dynamics predictor learns to roll these states forward in time by computing their interactions, producing a predicted physical trajectory of arbitrary length. We found that from just six input video frames the Visual Interaction Network can generate accurate future trajectories of hundreds of time steps on a wide range of physical systems. Our model can also be applied to scenes with invisible objects, inferring their future states from their effects on the visible objects, and can implicitly infer the unknown mass of objects. This work opens new opportunities for model-based decision-making and planning from raw sensory observations in complex physical environments.
Tasks	Decision Making
Published	2017-12-01
URL	http://papers.nips.cc/paper/7040-visual-interaction-networks-learning-a-physics-simulator-from-video
PDF	http://papers.nips.cc/paper/7040-visual-interaction-networks-learning-a-physics-simulator-from-video.pdf
PWC	https://paperswithcode.com/paper/visual-interaction-networks-learning-a
Repo
Framework

Morphosyntactic Analysis of the Pronominal System of Southern Alta


Title	Morphosyntactic Analysis of the Pronominal System of Southern Alta
Authors	Marvin Abreu
Abstract
Tasks
Published	2017-11-01
URL	https://www.aclweb.org/anthology/Y17-1043/
PDF	https://www.aclweb.org/anthology/Y17-1043
PWC	https://paperswithcode.com/paper/morphosyntactic-analysis-of-the-pronominal
Repo
Framework

A Parallel Corpus for Evaluating Machine Translation between Arabic and European Languages


Title	A Parallel Corpus for Evaluating Machine Translation between Arabic and European Languages
Authors	Nizar Habash, Nasser Zalmout, Dima Taji, Hieu Hoang, Maverick Alzate
Abstract	We present Arab-Acquis, a large publicly available dataset for evaluating machine translation between 22 European languages and Arabic. Arab-Acquis consists of over 12,000 sentences from the JRC-Acquis (Acquis Communautaire) corpus translated twice by professional translators, once from English and once from French, and totaling over 600,000 words. The corpus follows previous data splits in the literature for tuning, development, and testing. We describe the corpus and how it was created. We also present the first benchmarking results on translating to and from Arabic for 22 European languages.
Tasks	Machine Translation
Published	2017-04-01
URL	https://www.aclweb.org/anthology/E17-2038/
PDF	https://www.aclweb.org/anthology/E17-2038
PWC	https://paperswithcode.com/paper/a-parallel-corpus-for-evaluating-machine
Repo
Framework

Towards Automatic Construction of News Overview Articles by News Synthesis


Title	Towards Automatic Construction of News Overview Articles by News Synthesis
Authors	Jianmin Zhang, Xiaojun Wan
Abstract	In this paper we investigate a new task of automatically constructing an overview article from a given set of news articles about a news event. We propose a news synthesis approach to address this task based on passage segmentation, ranking, selection and merging. Our proposed approach is compared with several typical multi-document summarization methods on the Wikinews dataset, and achieves the best performance on both automatic evaluation and manual evaluation.
Tasks	Document Summarization, Multi-Document Summarization
Published	2017-09-01
URL	https://www.aclweb.org/anthology/D17-1224/
PDF	https://www.aclweb.org/anthology/D17-1224
PWC	https://paperswithcode.com/paper/towards-automatic-construction-of-news
Repo
Framework

Machine Translation, it’s a question of style, innit? The case of English tag questions


Title	Machine Translation, it’s a question of style, innit? The case of English tag questions
Authors	Rachel Bawden
Abstract	In this paper, we address the problem of generating English tag questions (TQs) (e.g. it is, isn't it?) in Machine Translation (MT). We propose a post-edition solution, formulating the problem as a multi-class classification task. We present (i) the automatic annotation of English TQs in a parallel corpus of subtitles and (ii) an approach using a series of classifiers to predict TQ forms, which we use to post-edit state-of-the-art MT outputs. Our method provides significant improvements in English TQ translation when translating from Czech, French and German, in turn improving the fluidity, naturalness, grammatical correctness and pragmatic coherence of MT output.
Tasks	Machine Translation
Published	2017-09-01
URL	https://www.aclweb.org/anthology/D17-1265/
PDF	https://www.aclweb.org/anthology/D17-1265
PWC	https://paperswithcode.com/paper/machine-translation-its-a-question-of-style
Repo
Framework

Natural Language Processing in Political Campaigns


Title	Natural Language Processing in Political Campaigns
Authors	Cristina Moise
Abstract	This paper overviews the Majoritas ecosystem, providing a complete overview of political campaign assessment aimed at assisting politicians and their staff in delivering consistent and personalized messages on social media.
Tasks	Opinion Mining, Sentiment Analysis
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-8106/
PDF	http://doi.org/10.26615/978-954-452-046-5_006
PWC	https://paperswithcode.com/paper/natural-language-processing-in-political
Repo
Framework

Semantic Dependency Parsing via Book Embedding


Title	Semantic Dependency Parsing via Book Embedding
Authors	Weiwei Sun, Junjie Cao, Xiaojun Wan
Abstract	We model a dependency graph as a book, a particular kind of topological space, for semantic dependency parsing. The spine of the book is made up of a sequence of words, and each page contains a subset of noncrossing arcs. To build a semantic graph for a given sentence, we design new Maximum Subgraph algorithms to generate noncrossing graphs on each page, and a Lagrangian Relaxation-based algorithm to combine pages into a book. Experiments demonstrate the effectiveness of the book embedding framework across a wide range of conditions. Our parser obtains comparable results with a state-of-the-art transition-based parser.
Tasks	Combinatorial Optimization, Dependency Parsing, Semantic Dependency Parsing
Published	2017-07-01
URL	https://www.aclweb.org/anthology/P17-1077/
PDF	https://www.aclweb.org/anthology/P17-1077
PWC	https://paperswithcode.com/paper/semantic-dependency-parsing-via-book
Repo
Framework
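
The book structure described in the abstract above (words along the spine, each page holding a set of noncrossing arcs) can be made concrete with a small sketch. This is only an illustration of the data structure, assuming arcs are given as pairs of word indices. It uses a simple greedy first-fit assignment, which is not the Maximum Subgraph or Lagrangian Relaxation algorithm from the paper, and the function names are hypothetical.

```python
def crosses(arc_a, arc_b):
    """Two arcs over a word sequence cross when exactly one endpoint of
    one arc lies strictly inside the span of the other."""
    (i, j), (k, l) = sorted(arc_a), sorted(arc_b)
    return i < k < j < l or k < i < l < j


def greedy_pages(arcs):
    """Place each arc on the first page where it crosses no existing arc,
    opening a new page when none fits (first-fit heuristic, illustration only)."""
    pages = []
    for arc in arcs:
        for page in pages:
            if not any(crosses(arc, other) for other in page):
                page.append(arc)
                break
        else:
            pages.append([arc])
    return pages


# Arcs between word positions 0..4; (0, 2) and (1, 3) cross, so they
# cannot share a page, while (2, 4) only touches (0, 2) at an endpoint.
example_pages = greedy_pages([(0, 2), (1, 3), (2, 4)])
```

Every page returned by `greedy_pages` is a noncrossing arc set, so the result is a valid book embedding of the input graph, though the greedy order gives no guarantee of using the fewest pages.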