Paper Group NANR 117
VectorWeavers at SemEval-2016 Task 10: From Incremental Meaning to Semantic Unit (phrase by phrase)
Title | VectorWeavers at SemEval-2016 Task 10: From Incremental Meaning to Semantic Unit (phrase by phrase) |
Authors | Andreas Scherbakov, Ekaterina Vylomova, Fei Liu, Timothy Baldwin |
Abstract | |
Tasks | Word Embeddings |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/S16-1145/ |
https://www.aclweb.org/anthology/S16-1145 | |
PWC | https://paperswithcode.com/paper/vectorweavers-at-semeval-2016-task-10-from |
Repo | |
Framework | |
LitWay, Discriminative Extraction for Different Bio-Events
Title | LitWay, Discriminative Extraction for Different Bio-Events |
Authors | Chen Li, Zhiqiang Rao, Xiangrong Zhang |
Abstract | |
Tasks | Relation Extraction |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-3004/ |
https://www.aclweb.org/anthology/W16-3004 | |
PWC | https://paperswithcode.com/paper/litway-discriminative-extraction-for |
Repo | |
Framework | |
Wasserstein Training of Restricted Boltzmann Machines
Title | Wasserstein Training of Restricted Boltzmann Machines |
Authors | Grégoire Montavon, Klaus-Robert Müller, Marco Cuturi |
Abstract | Boltzmann machines are able to learn highly complex, multimodal, structured and multiscale real-world data distributions. Parameters of the model are usually learned by minimizing the Kullback-Leibler (KL) divergence from training samples to the learned model. We propose in this work a novel approach for Boltzmann machine training which assumes that a meaningful metric between observations is known. This metric between observations can then be used to define the Wasserstein distance between the distribution induced by the Boltzmann machine on the one hand, and that given by the training sample on the other hand. We derive a gradient of that distance with respect to the model parameters. Minimization of this new objective leads to generative models with different statistical properties. We demonstrate their practical potential on data completion and denoising, for which the metric between observations plays a crucial role. |
Tasks | Denoising |
Published | 2016-12-01 |
URL | http://papers.nips.cc/paper/6248-wasserstein-training-of-restricted-boltzmann-machines |
http://papers.nips.cc/paper/6248-wasserstein-training-of-restricted-boltzmann-machines.pdf | |
PWC | https://paperswithcode.com/paper/wasserstein-training-of-restricted-boltzmann |
Repo | |
Framework | |
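The abstract above relies on a Wasserstein distance defined through a known metric between observations. The following is a minimal sketch of that idea only, not the authors' training procedure: it approximates the transport cost between two small weighted sets of binary vectors with entropy-regularized Sinkhorn iterations. The Hamming metric, the function names, the `eps` and `n_iter` values, and the toy data are all assumptions made for illustration.

```python
import math


def hamming(x, y):
    """Number of positions where two equal-length binary vectors differ."""
    return sum(a != b for a, b in zip(x, y))


def sinkhorn_cost(xs, p, ys, q, metric, eps=0.1, n_iter=200):
    """Approximate transport cost between weighted point sets (xs, p) and (ys, q).

    Uses entropy-regularized optimal transport (Sinkhorn scaling).
    p and q are strictly positive weights that each sum to 1.
    """
    cost = [[metric(x, y) for y in ys] for x in xs]
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    n, m = len(xs), len(ys)
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Rescale rows, then columns, so the plan matches both marginals.
        u = [p[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [q[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan entry is u[i] * kernel[i][j] * v[j]; weight it by the cost.
    return sum(
        u[i] * kernel[i][j] * v[j] * cost[i][j]
        for i in range(n)
        for j in range(m)
    )


# Hypothetical toy data: an empirical distribution over two binary vectors.
data = [(0, 0, 0), (0, 0, 1)]
data_weights = [0.5, 0.5]

close = sinkhorn_cost(data, data_weights, data, data_weights, hamming)
far = sinkhorn_cost(data, data_weights, [(1, 1, 1)], [1.0], hamming)
print(close, far)
```

In this toy setting `close` is near zero and `far` is roughly 2.5, because half of the mass must move a Hamming distance of 3 and the other half a distance of 2. A KL-based objective would not reflect how far apart the mismatched observations are in this way.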
Cross-lingual Pronoun Prediction with Linguistically Informed Features
Title | Cross-lingual Pronoun Prediction with Linguistically Informed Features |
Authors | Rachel Bawden |
Abstract | |
Tasks | Coreference Resolution, Language Modelling, Machine Translation, Word Alignment |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-2348/ |
https://www.aclweb.org/anthology/W16-2348 | |
PWC | https://paperswithcode.com/paper/cross-lingual-pronoun-prediction-with |
Repo | |
Framework | |
Feature-distributed sparse regression: a screen-and-clean approach
Title | Feature-distributed sparse regression: a screen-and-clean approach |
Authors | Jiyan Yang, Michael W. Mahoney, Michael Saunders, Yuekai Sun |
Abstract | Most existing approaches to distributed sparse regression assume the data is partitioned by samples. However, for high-dimensional data (D ≫ N), it is more natural to partition the data by features. We propose an algorithm for distributed sparse regression when the data is partitioned by features rather than samples. Our approach allows the user to tailor our general method to various distributed computing platforms by trading off the total amount of data (in bits) sent over the communication network and the number of rounds of communication. We show that an implementation of our approach is capable of solving L1-regularized L2 regression problems with millions of features in minutes. |
Tasks | |
Published | 2016-12-01 |
URL | http://papers.nips.cc/paper/6187-feature-distributed-sparse-regression-a-screen-and-clean-approach |
http://papers.nips.cc/paper/6187-feature-distributed-sparse-regression-a-screen-and-clean-approach.pdf | |
PWC | https://paperswithcode.com/paper/feature-distributed-sparse-regression-a |
Repo | |
Framework | |
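The abstract above targets L1-regularized L2 regression (the lasso). As a point of reference, here is a small single-machine sketch of that objective, minimizing (1/2)·||Xw − y||² + λ·||w||₁ by cyclic coordinate descent with soft-thresholding. It is not the paper's feature-distributed screen-and-clean algorithm; the function names, the sweep count, and the toy data are assumptions made for illustration.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=100):
    """Minimize 0.5 * ||Xw - y||^2 + lam * ||w||_1 by coordinate descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    residual = list(y)  # equals y - Xw, since w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(d)]
    for _ in range(n_sweeps):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot change the fit
            # Correlation of column j with the residual that excludes w[j].
            rho = sum(X[i][j] * residual[i] for i in range(n)) + col_sq[j] * w[j]
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated coefficient.
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


# Hypothetical toy problem with orthonormal columns.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [3.0, 0.5]
w = lasso_cd(X, y, lam=1.0)
print(w)
```

With orthonormal columns the solution is just the soft-thresholded response, so the second coefficient is driven exactly to zero. That sparsity is what makes screening features before a final fit plausible in the high-dimensional setting the abstract describes.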
Does String-Based Neural MT Learn Source Syntax?
Title | Does String-Based Neural MT Learn Source Syntax? |
Authors | Xing Shi, Inkit Padhi, Kevin Knight |
Abstract | |
Tasks | Machine Translation |
Published | 2016-11-01 |
URL | https://www.aclweb.org/anthology/D16-1159/ |
https://www.aclweb.org/anthology/D16-1159 | |
PWC | https://paperswithcode.com/paper/does-string-based-neural-mt-learn-source |
Repo | |
Framework | |
Evaluating Embeddings using Syntax-based Classification Tasks as a Proxy for Parser Performance
Title | Evaluating Embeddings using Syntax-based Classification Tasks as a Proxy for Parser Performance |
Authors | Arne Köhn |
Abstract | |
Tasks | Dependency Parsing |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-2512/ |
https://www.aclweb.org/anthology/W16-2512 | |
PWC | https://paperswithcode.com/paper/evaluating-embeddings-using-syntax-based |
Repo | |
Framework | |
Overview of the 2016 ALTA Shared Task: Cross-KB Coreference
Title | Overview of the 2016 ALTA Shared Task: Cross-KB Coreference |
Authors | Andrew Chisholm, Ben Hachey, Diego Mollá |
Abstract | |
Tasks | Coreference Resolution |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/U16-1020/ |
https://www.aclweb.org/anthology/U16-1020 | |
PWC | https://paperswithcode.com/paper/overview-of-the-2016-alta-shared-task-cross |
Repo | |
Framework | |
融合多任務學習類神經網路聲學模型訓練於會議語音辨識之研究(Leveraging Multi-task Learning with Neural Network Based Acoustic Modeling for Improved Meeting Speech Recognition) [In Chinese]
Title | 融合多任務學習類神經網路聲學模型訓練於會議語音辨識之研究(Leveraging Multi-task Learning with Neural Network Based Acoustic Modeling for Improved Meeting Speech Recognition) [In Chinese] |
Authors | Ming-Han Yang, Yao-Chi Hsu, Hsiao-Tsung Hung, Ying-Wen Chen, Berlin Chen, Kuan-Yu Chen |
Abstract | |
Tasks | Multi-Task Learning, Speech Recognition |
Published | 2016-10-01 |
URL | https://www.aclweb.org/anthology/O16-1002/ |
https://www.aclweb.org/anthology/O16-1002 | |
PWC | https://paperswithcode.com/paper/eaaaaa-ceccc2e-e2a-ae-c-14eeae3e34-ea1c |
Repo | |
Framework | |
The on-line version of Grammatical Dictionary of Polish
Title | The on-line version of Grammatical Dictionary of Polish |
Authors | Marcin Woliński, Witold Kieraś |
Abstract | We present the new online edition of a dictionary of Polish inflection, the Grammatical Dictionary of Polish (http://sgjp.pl). The dictionary is interesting for several reasons: it is comprehensive (over 330,000 lexemes corresponding to almost 4,300,000 different textual words; 1116 handcrafted inflectional patterns), the inflection is presented in an explicit manner in the form of carefully designed tables, the user interface facilitates advanced queries by several features (lemmas, forms, applicable grammatical categories, types of inflection). Moreover, the data of the dictionary is used in morphological analysers, including our product Morfeusz (http://sgjp.pl/morfeusz). From the start, the dictionary was meant to be comfortable for the human reader as well as to be ready for use in NLP applications. In the paper we briefly discuss both aspects of the resource. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1412/ |
https://www.aclweb.org/anthology/L16-1412 | |
PWC | https://paperswithcode.com/paper/the-on-line-version-of-grammatical-dictionary |
Repo | |
Framework | |
Metaphor as a Medium for Emotion: An Empirical Study
Title | Metaphor as a Medium for Emotion: An Empirical Study |
Authors | Saif Mohammad, Ekaterina Shutova, Peter Turney |
Abstract | |
Tasks | |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/S16-2003/ |
https://www.aclweb.org/anthology/S16-2003 | |
PWC | https://paperswithcode.com/paper/metaphor-as-a-medium-for-emotion-an-empirical |
Repo | |
Framework | |
Extracting Social Networks from Literary Text with Word Embedding Tools
Title | Extracting Social Networks from Literary Text with Word Embedding Tools |
Authors | Gerhard Wohlgenannt, Ekaterina Chernyak, Dmitry Ilvovsky |
Abstract | In this paper a social network is extracted from a literary text. The social network shows how frequently the characters interact and how similar their social behavior is. Two types of similarity measures are used: the first applies co-occurrence statistics, while the second exploits cosine similarity on different types of word embedding vectors. The results are evaluated by a paid micro-task crowdsourcing survey. The experiments suggest that specific types of word embeddings like word2vec are well-suited for the task at hand and the specific circumstances of literary fiction text. |
Tasks | Language Modelling, Word Embeddings |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4004/ |
https://www.aclweb.org/anthology/W16-4004 | |
PWC | https://paperswithcode.com/paper/extracting-social-networks-from-literary-text |
Repo | |
Framework | |
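The abstract above names two similarity measures: co-occurrence statistics and cosine similarity over word embedding vectors. The sketch below illustrates both in their simplest form. It is not the authors' pipeline; the character names, sentences, and the three-dimensional "embedding" vectors are invented for illustration, and real vectors would come from a trained model such as word2vec.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences, characters):
    """Count how often each pair of characters appears in the same sentence."""
    counts = Counter()
    for sentence in sentences:
        tokens = set(sentence.lower().split())
        present = sorted(c for c in characters if c.lower() in tokens)
        for a, b in combinations(present, 2):
            counts[(a, b)] += 1
    return counts


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical text and characters.
characters = ["Anna", "Boris", "Clara"]
sentences = [
    "Anna met Boris",
    "Boris wrote to Clara",
    "Anna and Boris argued",
]
edges = cooccurrence_counts(sentences, characters)

# Hypothetical embedding vectors standing in for trained word vectors.
vectors = {
    "Anna": [1.0, 2.0, 0.0],
    "Boris": [2.0, 4.0, 0.0],
    "Clara": [0.0, 0.0, 1.0],
}
sim_anna_boris = cosine(vectors["Anna"], vectors["Boris"])
sim_anna_clara = cosine(vectors["Anna"], vectors["Clara"])
print(edges, sim_anna_boris, sim_anna_clara)
```

The co-occurrence counts give edge weights for interaction frequency, while the cosine scores give a separate notion of how similarly two characters are used in context. Whitespace tokenization is a simplification here; punctuation or multi-word names would need proper tokenization.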
New release of Mixer-6: Improved validity for phonetic study of speaker variation and identification
Title | New release of Mixer-6: Improved validity for phonetic study of speaker variation and identification |
Authors | Eleanor Chodroff, Matthew Maciejewski, Jan Trmal, Sanjeev Khudanpur, John Godfrey |
Abstract | The Mixer series of speech corpora were collected over several years, principally to support annual NIST evaluations of speaker recognition (SR) technologies. These evaluations focused on conversational speech over a variety of channels and recording conditions. One of the series, Mixer-6, added a new condition, read speech, to support basic scientific research on speaker characteristics, as well as technology evaluation. With read speech it is possible to make relatively precise measurements of phonetic events and features, which can be correlated with the performance of speaker recognition algorithms, or directly used in phonetic analysis of speaker variability. The read speech, as originally recorded, was adequate for large-scale evaluations (e.g., fixed-text speaker ID algorithms) but only marginally suitable for acoustic-phonetic studies. Numerous errors due largely to speaker behavior remained in the corpus, with no record of their locations or rate of occurrence. We undertook the effort to correct this situation with automatic methods supplemented by human listening and annotation. The present paper describes the tools and methods, resulting corrections, and some examples of the kinds of research studies enabled by these enhancements. |
Tasks | Speaker Recognition |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1210/ |
https://www.aclweb.org/anthology/L16-1210 | |
PWC | https://paperswithcode.com/paper/new-release-of-mixer-6-improved-validity-for |
Repo | |
Framework | |
Arabic Corpora for Credibility Analysis
Title | Arabic Corpora for Credibility Analysis |
Authors | Ayman Al Zaatari, Rim El Ballouli, Shady ELbassouni, Wassim El-Hajj, Hazem Hajj, Khaled Shaban, Nizar Habash, Emad Yahya |
Abstract | A significant portion of data generated on blogging and microblogging websites is non-credible, as shown in many recent studies. To filter out such non-credible information, machine learning can be deployed to build automatic credibility classifiers. However, as is the case with most supervised machine learning approaches, sufficiently large and accurate training data must be available. In this paper, we focus on building a public Arabic corpus of blogs and microblogs that can be used for credibility classification. We focus on Arabic due to the recent popularity of blogs and microblogs in the Arab World and due to the lack of any such public corpora in Arabic. We discuss our data acquisition approach and annotation process, provide rigorous analysis of the annotated data, and finally report some results on the effectiveness of our data for credibility classification. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1696/ |
https://www.aclweb.org/anthology/L16-1696 | |
PWC | https://paperswithcode.com/paper/arabic-corpora-for-credibility-analysis |
Repo | |
Framework | |
Easy Questions First? A Case Study on Curriculum Learning for Question Answering
Title | Easy Questions First? A Case Study on Curriculum Learning for Question Answering |
Authors | Mrinmaya Sachan, Eric Xing |
Abstract | |
Tasks | Active Learning, Question Answering |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/P16-1043/ |
https://www.aclweb.org/anthology/P16-1043 | |
PWC | https://paperswithcode.com/paper/easy-questions-first-a-case-study-on |
Repo | |
Framework | |