Paper Group NANR 185
Revita: a system for language learning and supporting endangered languages
Title | Revita: a system for language learning and supporting endangered languages |
Authors | Anisia Katinskaia, Javad Nouri, Roman Yangarber |
Abstract | |
Tasks | Language Acquisition |
Published | 2017-05-01 |
URL | https://www.aclweb.org/anthology/W17-0304/ |
PDF | https://www.aclweb.org/anthology/W17-0304 |
PWC | https://paperswithcode.com/paper/revita-a-system-for-language-learning-and |
Repo | |
Framework | |
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers
Title | Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers |
Authors | |
Abstract | |
Tasks | |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-1000/ |
PDF | https://www.aclweb.org/anthology/E17-1000 |
PWC | https://paperswithcode.com/paper/proceedings-of-the-15th-conference-of-the |
Repo | |
Framework | |
Generating a Training Corpus for OCR Post-Correction Using Encoder-Decoder Model
Title | Generating a Training Corpus for OCR Post-Correction Using Encoder-Decoder Model |
Authors | Eva D'hondt, Cyril Grouin, Brigitte Grau |
Abstract | In this paper we present a novel approach to the automatic correction of OCR-induced orthographic errors in a given text. While current systems depend heavily on large training corpora or external information, such as domain-specific lexicons or confidence scores from the OCR process, our system only requires a small amount of (relatively) clean training data from a representative corpus to learn a character-based statistical language model using Bidirectional Long Short-Term Memory Networks (biLSTMs). We demonstrate the versatility and adaptability of our system on different text corpora with varying degrees of textual noise, including a real-life OCR corpus in the medical domain. |
Tasks | Language Modelling, Named Entity Recognition, Optical Character Recognition |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/I17-1101/ |
PDF | https://www.aclweb.org/anthology/I17-1101 |
PWC | https://paperswithcode.com/paper/generating-a-training-corpus-for-ocr-post |
Repo | |
Framework | |
ZipML: Training Linear Models with End-to-End Low Precision, and a Little Bit of Deep Learning
Title | ZipML: Training Linear Models with End-to-End Low Precision, and a Little Bit of Deep Learning |
Authors | Hantian Zhang, Jerry Li, Kaan Kara, Dan Alistarh, Ji Liu, Ce Zhang |
Abstract | Recently there has been significant interest in training machine-learning models at low precision: by reducing precision, one can reduce computation and communication by one order of magnitude. We examine training at reduced precision, both from a theoretical and practical perspective, and ask: is it possible to train models at end-to-end low precision with provable guarantees? Can this lead to consistent order-of-magnitude speedups? We mainly focus on linear models, and the answer is yes for linear models. We develop a simple framework called ZipML based on one simple but novel strategy called double sampling. Our ZipML framework is able to execute training at low precision with no bias, guaranteeing convergence, whereas naive quantization would introduce significant bias. We validate our framework across a range of applications, and show that it enables an FPGA prototype that is up to $6.5\times$ faster than an implementation using full 32-bit precision. We further develop a variance-optimal stochastic quantization strategy and show that it can make a significant difference in a variety of settings. When applied to linear models together with double sampling, we save up to another $1.7\times$ in data movement compared with uniform quantization. When training deep networks with quantized models, we achieve higher accuracy than the state-of-the-art XNOR-Net. |
Tasks | Quantization |
Published | 2017-08-01 |
URL | https://icml.cc/Conferences/2017/Schedule?showEvent=659 |
PDF | http://proceedings.mlr.press/v70/zhang17e/zhang17e.pdf |
PWC | https://paperswithcode.com/paper/zipml-training-linear-models-with-end-to-end |
Repo | |
Framework | |
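The ZipML abstract above contrasts naive quantization, which introduces bias, with double sampling, which does not. The following is a minimal scalar sketch of that idea, not the authors' implementation: it assumes values in [0, 1], a uniform grid with `levels` intervals, and stochastic rounding to the two neighbouring grid points. The function names `stochastic_quantize` and `estimate_products` are hypothetical. It estimates a product term `a * a` once by reusing a single quantized sample and once by multiplying two independent quantized samples.

```python
import random


def stochastic_quantize(x, levels, rng):
    """Round x in [0, 1] to a neighbouring grid point k / levels.

    The upper neighbour is chosen with probability equal to the fractional
    distance from the lower one, so the expected result equals x.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    scaled = x * levels
    lower = int(scaled)  # floor, since scaled is non-negative
    if lower >= levels:  # x == 1.0 sits exactly on the top grid point
        return 1.0
    p_up = scaled - lower
    return (lower + (1 if rng.random() < p_up else 0)) / levels


def estimate_products(a, levels, n, seed=0):
    """Estimate a * a from quantized samples in two ways.

    single: reuses one quantized sample for both factors (biased upwards
            by the variance of the quantizer).
    double: multiplies two independent quantized samples (unbiased).
    """
    rng = random.Random(seed)
    single = 0.0
    double = 0.0
    for _ in range(n):
        q1 = stochastic_quantize(a, levels, rng)
        q2 = stochastic_quantize(a, levels, rng)
        single += q1 * q1
        double += q1 * q2
    return single / n, double / n
```

Calling `estimate_products(0.3, 4, 200000)` returns the two averages. Because the two samples are independent and each is unbiased, the second average converges to `a * a`, while the first converges to `a * a` plus the quantizer's variance.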
Estimating Code-Switching on Twitter with a Novel Generalized Word-Level Language Detection Technique
Title | Estimating Code-Switching on Twitter with a Novel Generalized Word-Level Language Detection Technique |
Authors | Shruti Rijhwani, Royal Sequiera, Monojit Choudhury, Kalika Bali, Chandra Shekhar Maddila |
Abstract | Word-level language detection is necessary for analyzing code-switched text, where multiple languages could be mixed within a sentence. Existing models are restricted to code-switching between two specific languages and fail in real-world scenarios as text input rarely has a priori information on the languages used. We present a novel unsupervised word-level language detection technique for code-switched text for an arbitrarily large number of languages, which does not require any manually annotated training data. Our experiments with tweets in seven languages show a 74% relative error reduction in word-level labeling with respect to competitive baselines. We then use this system to conduct a large-scale quantitative analysis of code-switching patterns on Twitter, both global as well as region-specific, with 58M tweets. |
Tasks | |
Published | 2017-07-01 |
URL | https://www.aclweb.org/anthology/P17-1180/ |
PDF | https://www.aclweb.org/anthology/P17-1180 |
PWC | https://paperswithcode.com/paper/estimating-code-switching-on-twitter-with-a |
Repo | |
Framework | |
Topic Model Stability for Hierarchical Summarization
Title | Topic Model Stability for Hierarchical Summarization |
Authors | John Miller, Kathleen McCoy |
Abstract | We envisioned responsive generic hierarchical text summarization with summaries organized by section and paragraph based on hierarchical structure topic models. But we had to be sure that topic models were stable for the sampled corpora. To that end we developed a methodology for aligning multiple hierarchical structure topic models run over the same corpus under similar conditions, calculating a representative centroid model, and reporting stability of the centroid model. We ran stability experiments for standard corpora and a development corpus of Global Warming articles. We found flat and hierarchical structures of two levels plus the root offer stable centroid models, but hierarchical structures of three levels plus the root didn't seem stable enough for use in hierarchical summarization. |
Tasks | Text Summarization, Topic Models |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-4509/ |
PDF | https://www.aclweb.org/anthology/W17-4509 |
PWC | https://paperswithcode.com/paper/topic-model-stability-for-hierarchical |
Repo | |
Framework | |
Mining Argumentative Structure from Natural Language text using Automatically Generated Premise-Conclusion Topic Models
Title | Mining Argumentative Structure from Natural Language text using Automatically Generated Premise-Conclusion Topic Models |
Authors | John Lawrence, Chris Reed |
Abstract | This paper presents a method of extracting argumentative structure from natural language text. The approach presented is based on the way in which we understand an argument being made, not just from the words said, but from existing contextual knowledge and understanding of the broader issues. We leverage high-precision, low-recall techniques in order to automatically build a large corpus of inferential statements related to the text's topic. These statements are then used to produce a matrix representing the inferential relationship between different aspects of the topic. From this matrix, we are able to determine connectedness and directionality of inference between statements in the original text. By following this approach, we obtain results that compare favourably to those of other similar techniques to classify premise-conclusion pairs (with results 22 points above baseline), but without the requirement of large volumes of annotated, domain-specific data. |
Tasks | Argument Mining, Opinion Mining, Sentiment Analysis, Topic Models |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-5105/ |
PDF | https://www.aclweb.org/anthology/W17-5105 |
PWC | https://paperswithcode.com/paper/mining-argumentative-structure-from-natural |
Repo | |
Framework | |
Question Retrieval with Distributed Representations and Participant Reputation in Community Question Answering
Title | Question Retrieval with Distributed Representations and Participant Reputation in Community Question Answering |
Authors | Sam Weng, Chun-Kai Wu, Yu-Chun Wang, Richard Tzong-Han Tsai |
Abstract | |
Tasks | Community Question Answering, Question Answering |
Published | 2017-12-01 |
URL | https://www.aclweb.org/anthology/O17-3003/ |
PDF | https://www.aclweb.org/anthology/O17-3003 |
PWC | https://paperswithcode.com/paper/question-retrieval-with-distributed |
Repo | |
Framework | |
Visual Interaction Networks: Learning a Physics Simulator from Video
Title | Visual Interaction Networks: Learning a Physics Simulator from Video |
Authors | Nicholas Watters, Daniel Zoran, Theophane Weber, Peter Battaglia, Razvan Pascanu, Andrea Tacchetti |
Abstract | From just a glance, humans can make rich predictions about the future of a wide range of physical systems. On the other hand, modern approaches from engineering, robotics, and graphics are often restricted to narrow domains or require information about the underlying state. We introduce the Visual Interaction Network, a general-purpose model for learning the dynamics of a physical system from raw visual observations. Our model consists of a perceptual front-end based on convolutional neural networks and a dynamics predictor based on interaction networks. Through joint training, the perceptual front-end learns to parse a dynamic visual scene into a set of factored latent object representations. The dynamics predictor learns to roll these states forward in time by computing their interactions, producing a predicted physical trajectory of arbitrary length. We found that from just six input video frames the Visual Interaction Network can generate accurate future trajectories of hundreds of time steps on a wide range of physical systems. Our model can also be applied to scenes with invisible objects, inferring their future states from their effects on the visible objects, and can implicitly infer the unknown mass of objects. This work opens new opportunities for model-based decision-making and planning from raw sensory observations in complex physical environments. |
Tasks | Decision Making |
Published | 2017-12-01 |
URL | http://papers.nips.cc/paper/7040-visual-interaction-networks-learning-a-physics-simulator-from-video |
PDF | http://papers.nips.cc/paper/7040-visual-interaction-networks-learning-a-physics-simulator-from-video.pdf |
PWC | https://paperswithcode.com/paper/visual-interaction-networks-learning-a |
Repo | |
Framework | |
Morphosyntactic Analysis of the Pronominal System of Southern Alta
Title | Morphosyntactic Analysis of the Pronominal System of Southern Alta |
Authors | Marvin Abreu |
Abstract | |
Tasks | |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/Y17-1043/ |
PDF | https://www.aclweb.org/anthology/Y17-1043 |
PWC | https://paperswithcode.com/paper/morphosyntactic-analysis-of-the-pronominal |
Repo | |
Framework | |
A Parallel Corpus for Evaluating Machine Translation between Arabic and European Languages
Title | A Parallel Corpus for Evaluating Machine Translation between Arabic and European Languages |
Authors | Nizar Habash, Nasser Zalmout, Dima Taji, Hieu Hoang, Maverick Alzate |
Abstract | We present Arab-Acquis, a large publicly available dataset for evaluating machine translation between 22 European languages and Arabic. Arab-Acquis consists of over 12,000 sentences from the JRC-Acquis (Acquis Communautaire) corpus translated twice by professional translators, once from English and once from French, and totaling over 600,000 words. The corpus follows previous data splits in the literature for tuning, development, and testing. We describe the corpus and how it was created. We also present the first benchmarking results on translating to and from Arabic for 22 European languages. |
Tasks | Machine Translation |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-2038/ |
PDF | https://www.aclweb.org/anthology/E17-2038 |
PWC | https://paperswithcode.com/paper/a-parallel-corpus-for-evaluating-machine |
Repo | |
Framework | |
Towards Automatic Construction of News Overview Articles by News Synthesis
Title | Towards Automatic Construction of News Overview Articles by News Synthesis |
Authors | Jianmin Zhang, Xiaojun Wan |
Abstract | In this paper we investigate a new task of automatically constructing an overview article from a given set of news articles about a news event. We propose a news synthesis approach to address this task based on passage segmentation, ranking, selection and merging. Our proposed approach is compared with several typical multi-document summarization methods on the Wikinews dataset, and achieves the best performance on both automatic evaluation and manual evaluation. |
Tasks | Document Summarization, Multi-Document Summarization |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/D17-1224/ |
PDF | https://www.aclweb.org/anthology/D17-1224 |
PWC | https://paperswithcode.com/paper/towards-automatic-construction-of-news |
Repo | |
Framework | |
Machine Translation, it’s a question of style, innit? The case of English tag questions
Title | Machine Translation, it’s a question of style, innit? The case of English tag questions |
Authors | Rachel Bawden |
Abstract | In this paper, we address the problem of generating English tag questions (TQs) (e.g. it is, isn't it?) in Machine Translation (MT). We propose a post-edition solution, formulating the problem as a multi-class classification task. We present (i) the automatic annotation of English TQs in a parallel corpus of subtitles and (ii) an approach using a series of classifiers to predict TQ forms, which we use to post-edit state-of-the-art MT outputs. Our method provides significant improvements in English TQ translation when translating from Czech, French and German, in turn improving the fluidity, naturalness, grammatical correctness and pragmatic coherence of MT output. |
Tasks | Machine Translation |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/D17-1265/ |
PDF | https://www.aclweb.org/anthology/D17-1265 |
PWC | https://paperswithcode.com/paper/machine-translation-its-a-question-of-style |
Repo | |
Framework | |
Natural Language Processing in Political Campaigns
Title | Natural Language Processing in Political Campaigns |
Authors | Cristina Moise |
Abstract | This paper presents the Majoritas ecosystem, which provides a complete overview of political campaign assessment and aims to assist politicians and their staff in delivering a consistent and personalized message on social media. |
Tasks | Opinion Mining, Sentiment Analysis |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-8106/ |
PDF | http://doi.org/10.26615/978-954-452-046-5_006 |
PWC | https://paperswithcode.com/paper/natural-language-processing-in-political |
Repo | |
Framework | |
Semantic Dependency Parsing via Book Embedding
Title | Semantic Dependency Parsing via Book Embedding |
Authors | Weiwei Sun, Junjie Cao, Xiaojun Wan |
Abstract | We model a dependency graph as a book, a particular kind of topological space, for semantic dependency parsing. The spine of the book is made up of a sequence of words, and each page contains a subset of noncrossing arcs. To build a semantic graph for a given sentence, we design new Maximum Subgraph algorithms to generate noncrossing graphs on each page, and a Lagrangian Relaxation-based algorithm to combine pages into a book. Experiments demonstrate the effectiveness of the book embedding framework across a wide range of conditions. Our parser obtains comparable results with a state-of-the-art transition-based parser. |
Tasks | Combinatorial Optimization, Dependency Parsing, Semantic Dependency Parsing |
Published | 2017-07-01 |
URL | https://www.aclweb.org/anthology/P17-1077/ |
PDF | https://www.aclweb.org/anthology/P17-1077 |
PWC | https://paperswithcode.com/paper/semantic-dependency-parsing-via-book |
Repo | |
Framework | |
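The book-embedding abstract above relies on the constraint that each page holds only noncrossing arcs over the word sequence. A minimal sketch of that constraint check follows. It does not implement the paper's Maximum Subgraph or Lagrangian Relaxation algorithms. It assumes arcs are given as pairs of word positions and that direction is ignored when testing for crossings; the names `arcs_cross`, `is_valid_page`, and `is_valid_book` are hypothetical.

```python
from itertools import combinations


def arcs_cross(a, b):
    """Return True if arcs a and b, drawn on the same page, cross.

    Each arc is a pair of word positions. Two arcs cross exactly when their
    endpoints strictly interleave; nested arcs and arcs that share an
    endpoint do not cross.
    """
    (i, j), (k, l) = sorted(a), sorted(b)
    return i < k < j < l or k < i < l < j


def is_valid_page(arcs):
    """A page is valid if no two of its arcs cross."""
    return not any(arcs_cross(a, b) for a, b in combinations(arcs, 2))


def is_valid_book(pages):
    """A book is valid if every page is a set of noncrossing arcs."""
    return all(is_valid_page(page) for page in pages)
```

For example, the arcs `(0, 2)` and `(1, 3)` interleave and so cannot share a page, but placing them on two separate pages yields a valid two-page book.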