July 26, 2019

1837 words 9 mins read

Paper Group NANR 91



Variance in Historical Data: How bad is it and how can we profit from it for historical linguistics?

Title Variance in Historical Data: How bad is it and how can we profit from it for historical linguistics?
Authors Stefanie Dipper
Abstract
Tasks
Published 2017-05-01
URL https://www.aclweb.org/anthology/W17-0501/
PDF https://www.aclweb.org/anthology/W17-0501
PWC https://paperswithcode.com/paper/variance-in-historical-data-how-bad-is-it-and
Repo
Framework

Synchronized Mediawiki based analyzer dictionary development

Title Synchronized Mediawiki based analyzer dictionary development
Authors Jack Rueter, Mika Hämäläinen
Abstract
Tasks
Published 2017-01-01
URL https://www.aclweb.org/anthology/W17-0601/
PDF https://www.aclweb.org/anthology/W17-0601
PWC https://paperswithcode.com/paper/synchronized-mediawiki-based-analyzer
Repo
Framework

Preliminary Experiments concerning Verbal Predicative Structure Extraction from a Large Finnish Corpus

Title Preliminary Experiments concerning Verbal Predicative Structure Extraction from a Large Finnish Corpus
Authors Guersande Chaminade, Thierry Poibeau
Abstract
Tasks
Published 2017-01-01
URL https://www.aclweb.org/anthology/W17-0605/
PDF https://www.aclweb.org/anthology/W17-0605
PWC https://paperswithcode.com/paper/preliminary-experiments-concerning-verbal
Repo
Framework

Language technology resources and tools for Mansi: an overview

Title Language technology resources and tools for Mansi: an overview
Authors Csilla Horváth, Norbert Szilágyi, Veronika Vincze, Ágoston Nagy
Abstract
Tasks
Published 2017-01-01
URL https://www.aclweb.org/anthology/W17-0606/
PDF https://www.aclweb.org/anthology/W17-0606
PWC https://paperswithcode.com/paper/language-technology-resources-and-tools-for
Repo
Framework

Distributional regularities of verbs and verbal adjectives: Treebank evidence and broader implications

Title Distributional regularities of verbs and verbal adjectives: Treebank evidence and broader implications
Authors Dani{"e}l de Kok, Patricia Fischer, Corina Dima, Erhard Hinrichs
Abstract
Tasks Lemmatization, Word Embeddings
Published 2017-01-01
URL https://www.aclweb.org/anthology/W17-7603/
PDF https://www.aclweb.org/anthology/W17-7603
PWC https://paperswithcode.com/paper/distributional-regularities-of-verbs-and
Repo
Framework

Predicting Japanese scrambling in the wild

Title Predicting Japanese scrambling in the wild
Authors Naho Orita
Abstract Japanese speakers have a choice between canonical SOV and scrambled OSV word order to express the same meaning. Although previous experiments examine the influence of one or two factors for scrambling in a controlled setting, it is not yet known what kinds of multiple effects contribute to scrambling. This study uses naturally distributed data to test the multiple effects on scrambling simultaneously. A regression analysis replicates the NP length effect and suggests the influence of noun types, but it provides no evidence for syntactic priming, given-new ordering, and the animacy effect. These findings only show evidence for sentence-internal factors, but we find no evidence that discourse level factors play a role.
Tasks
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-0706/
PDF https://www.aclweb.org/anthology/W17-0706
PWC https://paperswithcode.com/paper/predicting-japanese-scrambling-in-the-wild
Repo
Framework
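
The abstract above describes a regression analysis that tests several candidate predictors of scrambling at once on naturally occurring data. As a rough sketch of what such a setup looks like, the example below fits a logistic regression with statsmodels; the column names, predictors, and simulated data are assumptions made for illustration, not the paper's corpus or feature set.

```python
# Illustrative only: a logistic regression over hypothetical scrambling predictors.
# The data is simulated; the paper's actual features and corpus differ.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "np_length_diff": rng.integers(-5, 6, n),   # object length minus subject length
    "object_given":   rng.integers(0, 2, n),    # object mentioned in prior discourse
    "object_animate": rng.integers(0, 2, n),
    "primed":         rng.integers(0, 2, n),    # OSV order in the preceding clause
})
# Simulated outcome: longer objects are slightly more likely to scramble.
logit_p = -2.0 + 0.4 * df["np_length_diff"]
probs = 1 / (1 + np.exp(-logit_p.to_numpy()))
df["scrambled"] = rng.binomial(1, probs)        # 1 = OSV, 0 = SOV

# Test all candidate factors simultaneously and inspect coefficients/p-values.
model = smf.logit(
    "scrambled ~ np_length_diff + object_given + object_animate + primed",
    data=df,
).fit(disp=False)
print(model.summary())
```

With real corpus data, the fitted coefficients and their p-values are what would support or fail to support each factor, as in the abstract's NP-length finding.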

Capacity Releasing Diffusion for Speed and Locality.

Title Capacity Releasing Diffusion for Speed and Locality.
Authors Di Wang, Kimon Fountoulakis, Monika Henzinger, Michael W. Mahoney, Satish Rao
Abstract Diffusions and related random walk procedures are of central importance in many areas of machine learning, data analysis, and applied mathematics. Because they spread mass agnostically at each step in an iterative manner, they can sometimes spread mass “too aggressively,” thereby failing to find the “right” clusters. We introduce a novel Capacity Releasing Diffusion (CRD) Process, which is both faster and stays more local than the classical spectral diffusion process. As an application, we use our CRD Process to develop an improved local algorithm for graph clustering. Our local graph clustering method can find local clusters in a model of clustering where one begins the CRD Process in a cluster whose vertices are connected better internally than externally by an $O(\log^2 n)$ factor, where $n$ is the number of nodes in the cluster. Thus, our CRD Process is the first local graph clustering algorithm that is not subject to the well-known quadratic Cheeger barrier. Our result requires a certain smoothness condition, which we expect to be an artifact of our analysis. Our empirical evaluation demonstrates improved results, in particular for realistic social graphs where there are moderately good—but not very good—clusters.
Tasks Graph Clustering
Published 2017-08-01
URL https://icml.cc/Conferences/2017/Schedule?showEvent=827
PDF http://proceedings.mlr.press/v70/wang17b/wang17b.pdf
PWC https://paperswithcode.com/paper/capacity-releasing-diffusion-for-speed-and
Repo
Framework
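
As a concrete point of reference for the abstract above, the sketch below shows the classical local-clustering baseline that CRD is contrasted with: an approximate personalized-PageRank push from a seed node followed by a sweep cut by conductance. This is not the paper's Capacity Releasing Diffusion process, only the standard spectral-style procedure it improves on; the graph and parameters are arbitrary.

```python
# Classical local clustering baseline (PPR push + sweep cut) -- NOT the paper's CRD.
import networkx as nx

def ppr_push(G, seed, alpha=0.15, eps=1e-4):
    """Approximate personalized PageRank from `seed` via the push procedure."""
    p, r = {}, {seed: 1.0}
    active = [seed]
    while active:
        u = active.pop()
        du = G.degree(u)
        if r.get(u, 0.0) < eps * du:
            continue
        ru = r[u]
        p[u] = p.get(u, 0.0) + alpha * ru          # keep alpha fraction at u
        r[u] = (1 - alpha) * ru / 2                # keep half the rest as residual
        share = (1 - alpha) * ru / (2 * du)        # spread the other half to neighbors
        for v in G.neighbors(u):
            r[v] = r.get(v, 0.0) + share
            if r[v] >= eps * G.degree(v):
                active.append(v)
        if r[u] >= eps * du:
            active.append(u)
    return p

def sweep_cut(G, p):
    """Return the prefix of the degree-normalized ranking with smallest conductance."""
    order = sorted(p, key=lambda u: p[u] / G.degree(u), reverse=True)
    total_vol = 2 * G.number_of_edges()
    in_set, vol, cut = set(), 0, 0
    best, best_phi = None, float("inf")
    for u in order:
        vol += G.degree(u)
        cut += G.degree(u) - 2 * sum(1 for v in G.neighbors(u) if v in in_set)
        in_set.add(u)
        denom = min(vol, total_vol - vol)
        if denom > 0 and cut / denom < best_phi:
            best_phi, best = cut / denom, set(in_set)
    return best, best_phi

G = nx.karate_club_graph()
cluster, phi = sweep_cut(G, ppr_push(G, seed=0))
print(sorted(cluster), round(phi, 3))
```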

Leveraging Linguistic Resources for Improving Neural Text Classification

Title Leveraging Linguistic Resources for Improving Neural Text Classification
Authors Ming Liu, Gholamreza Haffari, Wray Buntine, Michelle Ananda-Rajah
Abstract
Tasks Document Classification, Information Retrieval, Sentiment Analysis, Text Classification, Word Embeddings
Published 2017-12-01
URL https://www.aclweb.org/anthology/U17-1004/
PDF https://www.aclweb.org/anthology/U17-1004
PWC https://paperswithcode.com/paper/leveraging-linguistic-resources-for-improving
Repo
Framework

Generating Natural Language Question-Answer Pairs from a Knowledge Graph Using a RNN Based Question Generation Model

Title Generating Natural Language Question-Answer Pairs from a Knowledge Graph Using a RNN Based Question Generation Model
Authors Sathish Reddy, Dinesh Raghu, Mitesh M. Khapra, Sachindra Joshi
Abstract In recent years, knowledge graphs such as Freebase that capture facts about entities and relationships between them have been used actively for answering factoid questions. In this paper, we explore the problem of automatically generating question answer pairs from a given knowledge graph. The generated question answer (QA) pairs can be used in several downstream applications. For example, they could be used for training better QA systems. To generate such QA pairs, we first extract a set of keywords from entities and relationships expressed in a triple stored in the knowledge graph. From each such set, we use a subset of keywords to generate a natural language question that has a unique answer. We treat this subset of keywords as a sequence and propose a sequence to sequence model using RNN to generate a natural language question from it. Our RNN based model generates QA pairs with an accuracy of 33.61 percent and performs 110.47 percent (relative) better than a state-of-the-art template based method for generating natural language question from keywords. We also do an extrinsic evaluation by using the generated QA pairs to train a QA system and observe that the F1-score of the QA system improves by 5.5 percent (relative) when using automatically generated QA pairs in addition to manually generated QA pairs available for training.
Tasks Knowledge Graphs, Question Answering, Question Generation
Published 2017-04-01
URL https://www.aclweb.org/anthology/E17-1036/
PDF https://www.aclweb.org/anthology/E17-1036
PWC https://paperswithcode.com/paper/generating-natural-language-question-answer
Repo
Framework
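
The data-preparation step described in the abstract (keywords extracted from a knowledge-graph triple, with the remaining entity as the answer) can be sketched as below. The triples, the keyword choice, and the crude question template are illustrative assumptions; in the paper an RNN sequence-to-sequence model consumes the keyword sequence and generates the question, rather than a template.

```python
# Illustrative sketch of turning KG triples into (question, answer) pairs.
# Triples and template are made up; the paper generates questions with a seq2seq RNN.
triples = [
    ("Berlin", "located_in", "Germany"),
    ("Python", "created_by", "Guido van Rossum"),
]

def triple_to_pair(subj, relation, obj):
    # Keywords: subject entity plus the relation tokens; the object is the answer.
    keywords = [subj] + relation.split("_")
    # A trained seq2seq model would consume `keywords` and emit a fluent question;
    # this crude template only stands in for that generation step.
    question = f"Which entity is {subj} {' '.join(relation.split('_'))}?"
    return {"keywords": keywords, "question": question, "answer": obj}

for pair in (triple_to_pair(*t) for t in triples):
    print(pair["keywords"], "->", pair["question"], "/", pair["answer"])
```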

SaToS: Assessing and Summarising Terms of Services from German Webshops

Title SaToS: Assessing and Summarising Terms of Services from German Webshops
Authors Daniel Braun, Elena Scepankova, Patrick Holl, Florian Matthes
Abstract Every time we buy something online, we are confronted with Terms of Services. However, only a few people actually read these terms, before accepting them, often to their disadvantage. In this paper, we present the SaToS browser plugin which summarises and simplifies Terms of Services from German webshops.
Tasks Text Generation
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-3534/
PDF https://www.aclweb.org/anthology/W17-3534
PWC https://paperswithcode.com/paper/satos-assessing-and-summarising-terms-of
Repo
Framework

Discourse Annotation of Non-native Spontaneous Spoken Responses Using the Rhetorical Structure Theory Framework

Title Discourse Annotation of Non-native Spontaneous Spoken Responses Using the Rhetorical Structure Theory Framework
Authors Xinhao Wang, James Bruno, Hillary Molloy, Keelan Evanini, Klaus Zechner
Abstract The availability of the Rhetorical Structure Theory (RST) Discourse Treebank has spurred substantial research into discourse analysis of written texts; however, limited research has been conducted to date on RST annotation and parsing of spoken language, in particular, non-native spontaneous speech. Considering that the measurement of discourse coherence is typically a key metric in human scoring rubrics for assessments of spoken language, we initiated a research effort to obtain RST annotations of a large number of non-native spoken responses from a standardized assessment of academic English proficiency. The resulting inter-annotator kappa agreements on the three different levels of Span, Nuclearity, and Relation are 0.848, 0.766, and 0.653, respectively. Furthermore, a set of features was explored to evaluate the discourse structure of non-native spontaneous speech based on these annotations; the highest performing feature resulted in a correlation of 0.612 with scores of discourse coherence provided by expert human raters.
Tasks Machine Translation, Text Generation, Text Summarization
Published 2017-07-01
URL https://www.aclweb.org/anthology/P17-2041/
PDF https://www.aclweb.org/anthology/P17-2041
PWC https://paperswithcode.com/paper/discourse-annotation-of-non-native
Repo
Framework
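
The agreement figures quoted in the abstract are kappa values. As a minimal illustration of the basic statistic, the sketch below computes Cohen's kappa for two annotators labeling the same items; the labels are invented, and the paper's span-, nuclearity-, and relation-level agreement over RST trees is more involved than this.

```python
# Minimal Cohen's kappa for two annotators over the same items; labels are invented.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["Elaboration", "Background", "Elaboration", "Contrast", "Elaboration"]
b = ["Elaboration", "Elaboration", "Elaboration", "Contrast", "Background"]
print(round(cohens_kappa(a, b), 3))
```

scikit-learn's cohen_kappa_score computes the same quantity for nominal labels.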

StingyCD: Safely Avoiding Wasteful Updates in Coordinate Descent

Title StingyCD: Safely Avoiding Wasteful Updates in Coordinate Descent
Authors Tyler B. Johnson, Carlos Guestrin
Abstract Coordinate descent (CD) is a scalable and simple algorithm for solving many optimization problems in machine learning. Despite this fact, CD can also be very computationally wasteful. Due to sparsity in sparse regression problems, for example, the majority of CD updates often result in no progress toward the solution. To address this inefficiency, we propose a modified CD algorithm named “StingyCD.” By skipping over many updates that are guaranteed to not decrease the objective value, StingyCD significantly reduces convergence times. Since StingyCD only skips updates with this guarantee, however, StingyCD does not fully exploit the problem’s sparsity. For this reason, we also propose StingyCD+, an algorithm that achieves further speed-ups by skipping updates more aggressively. Since StingyCD and StingyCD+ rely on simple modifications to CD, it is also straightforward to use these algorithms with other approaches to scaling optimization. In empirical comparisons, StingyCD and StingyCD+ improve convergence times considerably for several L1-regularized optimization problems.
Tasks
Published 2017-08-01
URL https://icml.cc/Conferences/2017/Schedule?showEvent=764
PDF http://proceedings.mlr.press/v70/johnson17a/johnson17a.pdf
PWC https://paperswithcode.com/paper/stingycd-safely-avoiding-wasteful-updates-in
Repo
Framework
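
For context on the abstract above, the sketch below is a plain cyclic coordinate descent solver for the Lasso, the kind of solver StingyCD wraps; the comment marks where the paper's safe skip test would be inserted. The data is synthetic, and the skip test itself is deliberately omitted since its derivation is the paper's contribution.

```python
# Baseline cyclic coordinate descent for the Lasso (1/2 ||y - Xw||^2 + lam ||w||_1).
# StingyCD adds a cheap, safe test (omitted here) to skip useless coordinate updates.
import numpy as np

def lasso_cd(X, y, lam, n_epochs=100):
    n, d = X.shape
    w = np.zeros(d)
    r = y - X @ w                       # residual, kept consistent incrementally
    col_sq = (X ** 2).sum(axis=0)       # ||x_j||^2 for each column
    for _ in range(n_epochs):
        for j in range(d):
            # <-- StingyCD's test would go here: skip j when the update is
            #     guaranteed not to change w[j] (e.g. it stays at zero).
            rho = X[:, j] @ r + col_sq[j] * w[j]
            w_new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r += X[:, j] * (w[j] - w_new)
            w[j] = w_new
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
true_w = np.zeros(50)
true_w[:5] = 1.0
y = X @ true_w + 0.1 * rng.standard_normal(200)
w_hat = lasso_cd(X, y, lam=20.0)
print(np.count_nonzero(w_hat), "nonzero coefficients")
```

The abstract's observation is visible even in this toy run: most coordinates stay at zero on every pass, so most updates do no useful work.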

Chatbot with a Discourse Structure-Driven Dialogue Management

Title Chatbot with a Discourse Structure-Driven Dialogue Management
Authors Boris Galitsky, Dmitry Ilvovsky
Abstract We build a chat bot with iterative content exploration that leads a user through a personalized knowledge acquisition session. The chat bot is designed as an automated customer support or product recommendation agent assisting a user in learning product features, product usability, suitability, troubleshooting and other related tasks. To control the user navigation through content, we extend the notion of a linguistic discourse tree (DT) towards a set of documents with multiple sections covering a topic. For a given paragraph, a DT is built by DT parsers. We then combine DTs for the paragraphs of documents to form what we call extended DT, which is a basis for interactive content exploration facilitated by the chat bot. To provide cohesive answers, we use a measure of rhetoric agreement between a question and an answer by tree kernel learning of their DTs.
Tasks Chatbot, Dialogue Management, Product Recommendation
Published 2017-04-01
URL https://www.aclweb.org/anthology/E17-3022/
PDF https://www.aclweb.org/anthology/E17-3022
PWC https://paperswithcode.com/paper/chatbot-with-a-discourse-structure-driven
Repo
Framework

A Feature Structure Algebra for FTAG

Title A Feature Structure Algebra for FTAG
Authors Alexander Koller
Abstract
Tasks
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-6201/
PDF https://www.aclweb.org/anthology/W17-6201
PWC https://paperswithcode.com/paper/a-feature-structure-algebra-for-ftag
Repo
Framework

Approximating Style by N-gram-based Annotation

Title Approximating Style by N-gram-based Annotation
Authors Melanie Andresen, Heike Zinsmeister
Abstract The concept of style is much debated in theoretical as well as empirical terms. From an empirical perspective, the key question is how to operationalize style and thus make it accessible for annotation and quantification. In authorship attribution, many different approaches have successfully resolved this issue at the cost of linguistic interpretability: The resulting algorithms may be able to distinguish one language variety from the other, but do not give us much information on their distinctive linguistic properties. We approach the issue of interpreting stylistic features by extracting linear and syntactic n-grams that are distinctive for a language variety. We present a study that exemplifies this process by a comparison of the German academic languages of linguistics and literary studies. Overall, our findings show that distinctive n-grams can be related to linguistic categories. The results suggest that the style of German literary studies is characterized by nominal structures and the style of linguistics by verbal ones.
Tasks
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-4913/
PDF https://www.aclweb.org/anthology/W17-4913
PWC https://paperswithcode.com/paper/approximating-style-by-n-gram-based
Repo
Framework
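
One common way to operationalize "n-grams distinctive for a variety", in the spirit of the abstract above, is a keyness score such as the log-likelihood ratio between two corpora. The sketch below is only that generic procedure on toy data; the paper's corpora, preprocessing, and its syntactic n-grams over dependency analyses are not reproduced here.

```python
# Toy keyness sketch: rank bigrams that distinguish two text collections with the
# log-likelihood ratio (Dunning-style G2). Corpora and tokenization are made up.
import math
from collections import Counter

def bigrams(tokens):
    return list(zip(tokens, tokens[1:]))

def log_likelihood(a, b, total_a, total_b):
    """G2 keyness of an item seen a times in corpus A and b times in corpus B."""
    e_a = total_a * (a + b) / (total_a + total_b)
    e_b = total_b * (a + b) / (total_a + total_b)
    g2 = 0.0
    if a > 0:
        g2 += 2 * a * math.log(a / e_a)
    if b > 0:
        g2 += 2 * b * math.log(b / e_b)
    return g2

corpus_a = "the analysis of the data shows a clear effect".split()
corpus_b = "the reading of the novel evokes a sense of loss".split()
counts_a, counts_b = Counter(bigrams(corpus_a)), Counter(bigrams(corpus_b))
total_a, total_b = sum(counts_a.values()), sum(counts_b.values())

scored = {
    ng: log_likelihood(counts_a[ng], counts_b[ng], total_a, total_b)
    for ng in set(counts_a) | set(counts_b)
}
for ng, g2 in sorted(scored.items(), key=lambda kv: -kv[1])[:5]:
    print(ng, round(g2, 2))
```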