Paper Group NANR 91
Variance in Historical Data: How bad is it and how can we profit from it for historical linguistics?
Title | Variance in Historical Data: How bad is it and how can we profit from it for historical linguistics? |
Authors | Stefanie Dipper |
Abstract | |
Tasks | |
Published | 2017-05-01 |
URL | https://www.aclweb.org/anthology/W17-0501/ |
PWC | https://paperswithcode.com/paper/variance-in-historical-data-how-bad-is-it-and |
Repo | |
Framework | |
Synchronized Mediawiki based analyzer dictionary development
Title | Synchronized Mediawiki based analyzer dictionary development |
Authors | Jack Rueter, Mika Hämäläinen
Abstract | |
Tasks | |
Published | 2017-01-01 |
URL | https://www.aclweb.org/anthology/W17-0601/ |
PWC | https://paperswithcode.com/paper/synchronized-mediawiki-based-analyzer |
Repo | |
Framework | |
Preliminary Experiments concerning Verbal Predicative Structure Extraction from a Large Finnish Corpus
Title | Preliminary Experiments concerning Verbal Predicative Structure Extraction from a Large Finnish Corpus |
Authors | Guersande Chaminade, Thierry Poibeau
Abstract | |
Tasks | |
Published | 2017-01-01 |
URL | https://www.aclweb.org/anthology/W17-0605/ |
PWC | https://paperswithcode.com/paper/preliminary-experiments-concerning-verbal |
Repo | |
Framework | |
Language technology resources and tools for Mansi: an overview
Title | Language technology resources and tools for Mansi: an overview |
Authors | Csilla Horváth, Norbert Szilágyi, Veronika Vincze, Ágoston Nagy
Abstract | |
Tasks | |
Published | 2017-01-01 |
URL | https://www.aclweb.org/anthology/W17-0606/ |
PWC | https://paperswithcode.com/paper/language-technology-resources-and-tools-for |
Repo | |
Framework | |
Distributional regularities of verbs and verbal adjectives: Treebank evidence and broader implications
Title | Distributional regularities of verbs and verbal adjectives: Treebank evidence and broader implications |
Authors | Daniël de Kok, Patricia Fischer, Corina Dima, Erhard Hinrichs
Abstract | |
Tasks | Lemmatization, Word Embeddings |
Published | 2017-01-01 |
URL | https://www.aclweb.org/anthology/W17-7603/ |
PWC | https://paperswithcode.com/paper/distributional-regularities-of-verbs-and |
Repo | |
Framework | |
Predicting Japanese scrambling in the wild
Title | Predicting Japanese scrambling in the wild |
Authors | Naho Orita |
Abstract | Japanese speakers have a choice between canonical SOV and scrambled OSV word order to express the same meaning. Although previous experiments examine the influence of one or two factors for scrambling in a controlled setting, it is not yet known what kinds of multiple effects contribute to scrambling. This study uses naturally distributed data to test the multiple effects on scrambling simultaneously. A regression analysis replicates the NP length effect and suggests the influence of noun types, but it provides no evidence for syntactic priming, given-new ordering, and the animacy effect. These findings only show evidence for sentence-internal factors, but we find no evidence that discourse level factors play a role. |
Tasks | |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-0706/ |
PWC | https://paperswithcode.com/paper/predicting-japanese-scrambling-in-the-wild |
Repo | |
Framework | |
Capacity Releasing Diffusion for Speed and Locality
Title | Capacity Releasing Diffusion for Speed and Locality |
Authors | Di Wang, Kimon Fountoulakis, Monika Henzinger, Michael W. Mahoney, Satish Rao |
Abstract | Diffusions and related random walk procedures are of central importance in many areas of machine learning, data analysis, and applied mathematics. Because they spread mass agnostically at each step in an iterative manner, they can sometimes spread mass “too aggressively,” thereby failing to find the “right” clusters. We introduce a novel Capacity Releasing Diffusion (CRD) Process, which is both faster and stays more local than the classical spectral diffusion process. As an application, we use our CRD Process to develop an improved local algorithm for graph clustering. Our local graph clustering method can find local clusters in a model of clustering where one begins the CRD Process in a cluster whose vertices are connected better internally than externally by an $O(\log^2 n)$ factor, where $n$ is the number of nodes in the cluster. Thus, our CRD Process is the first local graph clustering algorithm that is not subject to the well-known quadratic Cheeger barrier. Our result requires a certain smoothness condition, which we expect to be an artifact of our analysis. Our empirical evaluation demonstrates improved results, in particular for realistic social graphs where there are moderately good—but not very good—clusters. |
Tasks | Graph Clustering |
Published | 2017-08-01 |
URL | https://icml.cc/Conferences/2017/Schedule?showEvent=827 |
PDF | http://proceedings.mlr.press/v70/wang17b/wang17b.pdf |
PWC | https://paperswithcode.com/paper/capacity-releasing-diffusion-for-speed-and |
Repo | |
Framework | |
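The abstract contrasts CRD with the classical spectral diffusion it improves on. As background, that classical baseline can be sketched as push-based approximate personalized PageRank over an adjacency list; this is a minimal stdlib sketch of the baseline, not of the CRD process itself, and the `alpha`/`eps` defaults are illustrative:

```python
from collections import deque

def approximate_ppr(adj, seed, alpha=0.15, eps=1e-4):
    """Push-based approximate personalized PageRank (lazy random walk).

    adj: dict mapping node -> list of neighbours; seed: start node.
    Returns a sparse dict of diffusion mass, concentrated near the seed.
    """
    p = {}              # settled probability mass
    r = {seed: 1.0}     # residual mass still to diffuse
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        ru, du = r.get(u, 0.0), len(adj[u])
        if ru < eps * du:
            continue                      # residual too small to push
        p[u] = p.get(u, 0.0) + alpha * ru
        r[u] = (1 - alpha) * ru / 2       # lazy step: half stays at u
        for v in adj[u]:
            r[v] = r.get(v, 0.0) + (1 - alpha) * ru / (2 * du)
            if r[v] >= eps * len(adj[v]):
                queue.append(v)
        if r[u] >= eps * du:
            queue.append(u)
    return p
```

Sweeping nodes by degree-normalized mass then yields a local cluster; the point of CRD is to replace this agnostic spreading with capacity-constrained flow so mass does not leak across bottlenecks.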
Leveraging Linguistic Resources for Improving Neural Text Classification
Title | Leveraging Linguistic Resources for Improving Neural Text Classification |
Authors | Ming Liu, Gholamreza Haffari, Wray Buntine, Michelle Ananda-Rajah
Abstract | |
Tasks | Document Classification, Information Retrieval, Sentiment Analysis, Text Classification, Word Embeddings |
Published | 2017-12-01 |
URL | https://www.aclweb.org/anthology/U17-1004/ |
PWC | https://paperswithcode.com/paper/leveraging-linguistic-resources-for-improving |
Repo | |
Framework | |
Generating Natural Language Question-Answer Pairs from a Knowledge Graph Using a RNN Based Question Generation Model
Title | Generating Natural Language Question-Answer Pairs from a Knowledge Graph Using a RNN Based Question Generation Model |
Authors | Sathish Reddy, Dinesh Raghu, Mitesh M. Khapra, Sachindra Joshi |
Abstract | In recent years, knowledge graphs such as Freebase that capture facts about entities and relationships between them have been used actively for answering factoid questions. In this paper, we explore the problem of automatically generating question answer pairs from a given knowledge graph. The generated question answer (QA) pairs can be used in several downstream applications. For example, they could be used for training better QA systems. To generate such QA pairs, we first extract a set of keywords from entities and relationships expressed in a triple stored in the knowledge graph. From each such set, we use a subset of keywords to generate a natural language question that has a unique answer. We treat this subset of keywords as a sequence and propose a sequence to sequence model using RNN to generate a natural language question from it. Our RNN based model generates QA pairs with an accuracy of 33.61 percent and performs 110.47 percent (relative) better than a state-of-the-art template based method for generating natural language question from keywords. We also do an extrinsic evaluation by using the generated QA pairs to train a QA system and observe that the F1-score of the QA system improves by 5.5 percent (relative) when using automatically generated QA pairs in addition to manually generated QA pairs available for training. |
Tasks | Knowledge Graphs, Question Answering, Question Generation |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-1036/ |
PWC | https://paperswithcode.com/paper/generating-natural-language-question-answer |
Repo | |
Framework | |
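The pipeline the abstract describes starts by flattening a knowledge-graph triple into a keyword sequence for the seq2seq question generator, with the object as the unique answer. A hypothetical sketch of that first step (function name, tokenization, and example triple are ours, not the paper's):

```python
def triple_to_training_pair(subj, rel, obj):
    """Turn a KG triple (subject, relation, object) into a keyword
    sequence (the seq2seq encoder input) plus its unique answer."""
    keywords = [subj] + rel.replace("_", " ").split()
    return keywords, obj
```

For instance, `triple_to_training_pair("Paris", "capital_of", "France")` yields `(["Paris", "capital", "of"], "France")`; the RNN decoder would then generate a natural language question from the keyword sequence, keeping "France" as the answer.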
SaToS: Assessing and Summarising Terms of Services from German Webshops
Title | SaToS: Assessing and Summarising Terms of Services from German Webshops |
Authors | Daniel Braun, Elena Scepankova, Patrick Holl, Florian Matthes |
Abstract | Every time we buy something online, we are confronted with Terms of Services. However, only a few people actually read these terms, before accepting them, often to their disadvantage. In this paper, we present the SaToS browser plugin which summarises and simplifies Terms of Services from German webshops. |
Tasks | Text Generation |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-3534/ |
PWC | https://paperswithcode.com/paper/satos-assessing-and-summarising-terms-of |
Repo | |
Framework | |
Discourse Annotation of Non-native Spontaneous Spoken Responses Using the Rhetorical Structure Theory Framework
Title | Discourse Annotation of Non-native Spontaneous Spoken Responses Using the Rhetorical Structure Theory Framework |
Authors | Xinhao Wang, James Bruno, Hillary Molloy, Keelan Evanini, Klaus Zechner |
Abstract | The availability of the Rhetorical Structure Theory (RST) Discourse Treebank has spurred substantial research into discourse analysis of written texts; however, limited research has been conducted to date on RST annotation and parsing of spoken language, in particular, non-native spontaneous speech. Considering that the measurement of discourse coherence is typically a key metric in human scoring rubrics for assessments of spoken language, we initiated a research effort to obtain RST annotations of a large number of non-native spoken responses from a standardized assessment of academic English proficiency. The resulting inter-annotator kappa agreements on the three different levels of Span, Nuclearity, and Relation are 0.848, 0.766, and 0.653, respectively. Furthermore, a set of features was explored to evaluate the discourse structure of non-native spontaneous speech based on these annotations; the highest performing feature resulted in a correlation of 0.612 with scores of discourse coherence provided by expert human raters. |
Tasks | Machine Translation, Text Generation, Text Summarization |
Published | 2017-07-01 |
URL | https://www.aclweb.org/anthology/P17-2041/ |
PWC | https://paperswithcode.com/paper/discourse-annotation-of-non-native |
Repo | |
Framework | |
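The reported span/nuclearity/relation agreements of 0.848/0.766/0.653 are chance-corrected kappa scores. For reference, Cohen's kappa over two annotators' label sequences can be computed as below (the toy labels in the usage are illustrative, not the paper's data):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement between two annotators,
    corrected for the agreement expected by chance from their
    individual label distributions."""
    n = len(labels_a)
    assert n == len(labels_b) and n > 0
    p_obs = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    p_chance = sum(ca[k] * cb.get(k, 0) for k in ca) / n ** 2
    return (p_obs - p_chance) / (1 - p_chance)
```

Perfect agreement gives 1.0, and agreement no better than chance gives 0.0, so values like 0.653 for Relation indicate substantial but imperfect consistency.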
StingyCD: Safely Avoiding Wasteful Updates in Coordinate Descent
Title | StingyCD: Safely Avoiding Wasteful Updates in Coordinate Descent |
Authors | Tyler B. Johnson, Carlos Guestrin |
Abstract | Coordinate descent (CD) is a scalable and simple algorithm for solving many optimization problems in machine learning. Despite this fact, CD can also be very computationally wasteful. Due to sparsity in sparse regression problems, for example, the majority of CD updates often result in no progress toward the solution. To address this inefficiency, we propose a modified CD algorithm named “StingyCD.” By skipping over many updates that are guaranteed to not decrease the objective value, StingyCD significantly reduces convergence times. Since StingyCD only skips updates with this guarantee, however, StingyCD does not fully exploit the problem’s sparsity. For this reason, we also propose StingyCD+, an algorithm that achieves further speed-ups by skipping updates more aggressively. Since StingyCD and StingyCD+ rely on simple modifications to CD, it is also straightforward to use these algorithms with other approaches to scaling optimization. In empirical comparisons, StingyCD and StingyCD+ improve convergence times considerably for several L1-regularized optimization problems. |
Tasks | |
Published | 2017-08-01 |
URL | https://icml.cc/Conferences/2017/Schedule?showEvent=764 |
PDF | http://proceedings.mlr.press/v70/johnson17a/johnson17a.pdf |
PWC | https://paperswithcode.com/paper/stingycd-safely-avoiding-wasteful-updates-in |
Repo | |
Framework | |
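The central trick, skipping updates that are guaranteed to leave the iterate unchanged, can be illustrated on the lasso. This is a minimal sketch of the idea under the simplest safe condition (a zero coordinate whose partial gradient is within the regularization threshold), not the paper's StingyCD implementation:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iters=100):
    """Coordinate descent for min_w 0.5*||y - Xw||^2 + lam*||w||_1,
    skipping coordinate updates that provably do nothing."""
    n, d = X.shape
    w = np.zeros(d)
    r = y.copy()                      # residual y - Xw
    col_sq = (X ** 2).sum(axis=0)     # per-coordinate curvature
    skipped = 0
    for _ in range(n_iters):
        for j in range(d):
            g = X[:, j] @ r           # negative partial gradient of smooth part
            if w[j] == 0.0 and abs(g) <= lam:
                skipped += 1          # soft-threshold would return 0: skip
                continue
            z = w[j] + g / col_sq[j]
            new = np.sign(z) * max(abs(z) - lam / col_sq[j], 0.0)
            if new != w[j]:
                r -= (new - w[j]) * X[:, j]
                w[j] = new
    return w, skipped
```

The skip test here is exact: when `w[j]` is zero and `|g| <= lam`, soft-thresholding returns zero again, so both the iterate and the residual are untouched and the inner products can be avoided entirely.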
Chatbot with a Discourse Structure-Driven Dialogue Management
Title | Chatbot with a Discourse Structure-Driven Dialogue Management |
Authors | Boris Galitsky, Dmitry Ilvovsky |
Abstract | We build a chat bot with iterative content exploration that leads a user through a personalized knowledge acquisition session. The chat bot is designed as an automated customer support or product recommendation agent assisting a user in learning product features, product usability, suitability, troubleshooting and other related tasks. To control the user navigation through content, we extend the notion of a linguistic discourse tree (DT) towards a set of documents with multiple sections covering a topic. For a given paragraph, a DT is built by DT parsers. We then combine DTs for the paragraphs of documents to form what we call extended DT, which is a basis for interactive content exploration facilitated by the chat bot. To provide cohesive answers, we use a measure of rhetoric agreement between a question and an answer by tree kernel learning of their DTs. |
Tasks | Chatbot, Dialogue Management, Product Recommendation |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-3022/ |
PWC | https://paperswithcode.com/paper/chatbot-with-a-discourse-structure-driven |
Repo | |
Framework | |
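The rhetoric-agreement measure relies on tree kernel learning over discourse trees. A toy stand-in for such a kernel, counting productions shared by two tuple-encoded trees (the encoding and function names are ours, not the paper's):

```python
from collections import Counter

def productions(tree):
    """Flatten a (label, child, ...) tuple tree into its productions;
    leaves are plain strings."""
    if isinstance(tree, str):
        return []
    label, *children = tree
    heads = tuple(c if isinstance(c, str) else c[0] for c in children)
    out = [(label, heads)]
    for c in children:
        out.extend(productions(c))
    return out

def tree_kernel(t1, t2):
    """Count matching production pairs between two trees: a toy
    stand-in for the subtree kernels used for agreement scoring."""
    c1, c2 = Counter(productions(t1)), Counter(productions(t2))
    return sum(c1[p] * c2[p] for p in c1)
```

A question/answer pair whose discourse trees share many productions scores high, which is the intuition behind using the kernel as a cohesiveness measure.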
A Feature Structure Algebra for FTAG
Title | A Feature Structure Algebra for FTAG |
Authors | Alexander Koller
Abstract | |
Tasks | |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-6201/ |
PWC | https://paperswithcode.com/paper/a-feature-structure-algebra-for-ftag |
Repo | |
Framework | |
Approximating Style by N-gram-based Annotation
Title | Approximating Style by N-gram-based Annotation |
Authors | Melanie Andresen, Heike Zinsmeister |
Abstract | The concept of style is much debated in theoretical as well as empirical terms. From an empirical perspective, the key question is how to operationalize style and thus make it accessible for annotation and quantification. In authorship attribution, many different approaches have successfully resolved this issue at the cost of linguistic interpretability: The resulting algorithms may be able to distinguish one language variety from the other, but do not give us much information on their distinctive linguistic properties. We approach the issue of interpreting stylistic features by extracting linear and syntactic n-grams that are distinctive for a language variety. We present a study that exemplifies this process by a comparison of the German academic languages of linguistics and literary studies. Overall, our findings show that distinctive n-grams can be related to linguistic categories. The results suggest that the style of German literary studies is characterized by nominal structures and the style of linguistics by verbal ones. |
Tasks | |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-4913/ |
PWC | https://paperswithcode.com/paper/approximating-style-by-n-gram-based |
Repo | |
Framework | |
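N-grams distinctive for one variety over another, as extracted here, are typically scored with a keyness statistic such as Dunning's log-likelihood ratio. A minimal sketch (the tokenization, function names, and toy corpora are illustrative, not the paper's setup):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def log_likelihood_ratio(k1, n1, k2, n2):
    """Dunning's G^2 for a term seen k1/n1 times in corpus A vs k2/n2 in B."""
    def ll(k, n, p):
        return k * math.log(p) + (n - k) * math.log(1 - p) if 0 < p < 1 else 0.0
    p = (k1 + k2) / (n1 + n2)
    return 2 * (ll(k1, n1, k1 / n1) + ll(k2, n2, k2 / n2)
                - ll(k1, n1, p) - ll(k2, n2, p))

def distinctive(tokens_a, tokens_b, n=2, top=5):
    """The n-grams most characteristic of corpus A relative to corpus B."""
    ca, cb = Counter(ngrams(tokens_a, n)), Counter(ngrams(tokens_b, n))
    na, nb = sum(ca.values()), sum(cb.values())
    scored = {g: log_likelihood_ratio(ca[g], na, cb.get(g, 0), nb) for g in ca}
    return sorted(scored, key=scored.get, reverse=True)[:top]
```

Applied to linear token n-grams this recovers surface-style markers; the paper additionally scores syntactic n-grams, which lets the distinctive items be related to linguistic categories such as nominal versus verbal structure.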