Paper Group NANR 156
A Paraphrase and Semantic Similarity Detection System for User Generated Short-Text Content on Microblogs
Title | A Paraphrase and Semantic Similarity Detection System for User Generated Short-Text Content on Microblogs |
Authors | Kuntal Dey, Ritvik Shrivastava, Saroj Kaushik |
Abstract | Existing systems deliver high accuracy and F1-scores for detecting paraphrase and semantic similarity on traditional clean-text corpora. For instance, on the clean-text Microsoft Paraphrase benchmark database, the existing systems attain an accuracy as high as 0.8596. However, the detection of paraphrases and semantic similarity in user-generated short-text content on microblogs such as Twitter, which comprises noisy and ad hoc short text, needs significant research attention. In this paper, we propose a machine learning based approach towards this. We propose a set of features that, although well-known in the NLP literature for solving other problems, have not been explored for detecting paraphrase or semantic similarity on noisy user-generated short-text data such as Twitter. We apply support vector machine (SVM) based learning. We use the benchmark Twitter paraphrase data, released as a part of SemEval 2015, for experiments. Our system delivers a paraphrase detection F1-score of 0.717 and a semantic similarity detection F1-score of 0.741, thereby significantly outperforming the existing systems, which deliver F1-scores of 0.696 and 0.724 for the two problems respectively. Our features also allow us to obtain a rank among the top-10 when trained on the Microsoft Paraphrase corpus and tested on the corresponding test data, thereby empirically establishing our approach as ubiquitous across the different paraphrase detection databases. |
Tasks | Semantic Similarity, Semantic Textual Similarity |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1271/ |
PWC | https://paperswithcode.com/paper/a-paraphrase-and-semantic-similarity |
Repo | |
Framework | |
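The F1-scores quoted in the abstract above combine precision and recall on the positive (paraphrase) class. As a minimal sketch of how such a score is computed, the snippet below uses a word-overlap (Jaccard) threshold as a stand-in classifier. The paper's actual features and SVM are not reproduced here; the function names, the example pairs, and the 0.5 threshold are all illustrative assumptions.

```python
def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard overlap between two short texts."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def f1_score(gold: list, pred: list) -> float:
    """F1 on the positive class: harmonic mean of precision and recall."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical tweet pairs with gold paraphrase labels (illustrative only).
pairs = [
    ("the game was great tonight", "great game tonight", True),
    ("new phone arrives tomorrow", "my new phone arrives tomorrow", True),
    ("traffic is awful today", "loving this sunny weather", False),
    ("he scored twice", "what a goal by him", True),
]
THRESHOLD = 0.5  # assumed cut-off, not taken from the paper
gold = [label for _, _, label in pairs]
pred = [jaccard(a, b) >= THRESHOLD for a, b, _ in pairs]
print(f"F1 = {f1_score(gold, pred):.3f}")
```

The last pair is a paraphrase with no shared words, which a pure overlap measure misses; this is the kind of case where richer features would be expected to help on noisy short text.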
Predicting Restaurant Consumption Level through Social Media Footprints
Title | Predicting Restaurant Consumption Level through Social Media Footprints |
Authors | Yang Xiao, Yuan Wang, Hangyu Mao, Zhen Xiao |
Abstract | Accurate prediction of user attributes from social media is valuable for both social science analysis and consumer targeting. In this paper, we propose a systematic method to leverage users' online social media content for predicting offline restaurant consumption level. We utilize the social login as a bridge and construct a dataset of 8,844 users who have been linked across Dianping (similar to Yelp) and Sina Weibo. More specifically, we construct consumption level ground truth based on users' self-reported spending. We build predictive models using both raw features and, especially, latent features, such as topic distributions and celebrity clusters. The employed methods demonstrate that online social media content has strong predictive power for offline spending. Finally, combined with qualitative feature analysis, we present the differences in word usage, topic interests and following behavior between different consumption level groups. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1314/ |
PWC | https://paperswithcode.com/paper/predicting-restaurant-consumption-level |
Repo | |
Framework | |
The CAMOMILE Collaborative Annotation Platform for Multi-modal, Multi-lingual and Multi-media Documents
Title | The CAMOMILE Collaborative Annotation Platform for Multi-modal, Multi-lingual and Multi-media Documents |
Authors | Johann Poignant, Mateusz Budnik, Hervé Bredin, Claude Barras, Mickael Stefas, Pierrick Bruneau, Gilles Adda, Laurent Besacier, Hazim Ekenel, Gil Francopoulo, Javier Hernando, Joseph Mariani, Ramon Morros, Georges Quénot, Sophie Rosset, Thomas Tamisier |
Abstract | In this paper, we describe the organization and the implementation of the CAMOMILE collaborative annotation framework for multimodal, multimedia, multilingual (3M) data. Given the versatile nature of the analysis which can be performed on 3M data, the structure of the server was kept intentionally simple in order to preserve its genericity, relying on standard Web technologies. Layers of annotations, defined as data associated with a media fragment from the corpus, are stored in a database and can be managed through standard interfaces with authentication. Interfaces tailored specifically to the needed task can then be developed in an agile way, relying on simple but reliable services for the management of the centralized annotations. We then present our implementation of an active learning scenario for person annotation in video, relying on the CAMOMILE server; during a dry run experiment, the manual annotation of 716 speech segments was thus propagated to 3504 labeled tracks. The code of the CAMOMILE framework is distributed as open source. |
Tasks | Active Learning |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1226/ |
PWC | https://paperswithcode.com/paper/the-camomile-collaborative-annotation |
Repo | |
Framework | |
Hyperedge Replacement and Nonprojective Dependency Structures
Title | Hyperedge Replacement and Nonprojective Dependency Structures |
Authors | Daniel Bauer, Owen Rambow |
Abstract | |
Tasks | Machine Translation, Semantic Parsing |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/W16-3311/ |
PWC | https://paperswithcode.com/paper/hyperedge-replacement-and-nonprojective |
Repo | |
Framework | |
LSIS at SemEval-2016 Task 7: Using Web Search Engines for English and Arabic Unsupervised Sentiment Intensity Prediction
Title | LSIS at SemEval-2016 Task 7: Using Web Search Engines for English and Arabic Unsupervised Sentiment Intensity Prediction |
Authors | Amal Htait, Sebastien Fournier, Patrice Bellot |
Abstract | |
Tasks | Sentiment Analysis |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/S16-1076/ |
PWC | https://paperswithcode.com/paper/lsis-at-semeval-2016-task-7-using-web-search |
Repo | |
Framework | |
Use of Semantic Knowledge Base for Enhancement of Coherence of Code-mixed Topic-Based Aspect Clusters
Title | Use of Semantic Knowledge Base for Enhancement of Coherence of Code-mixed Topic-Based Aspect Clusters |
Authors | Kavita Asnani, Jyoti D Pawar |
Abstract | |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-6332/ |
PWC | https://paperswithcode.com/paper/use-of-semantic-knowledge-base-for |
Repo | |
Framework | |
Liberal Event Extraction and Event Schema Induction
Title | Liberal Event Extraction and Event Schema Induction |
Authors | Lifu Huang, Taylor Cassidy, Xiaocheng Feng, Heng Ji, Clare R. Voss, Jiawei Han, Avirup Sil |
Abstract | |
Tasks | |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/P16-1025/ |
PWC | https://paperswithcode.com/paper/liberal-event-extraction-and-event-schema |
Repo | |
Framework | |
Efficient construction of metadata-enhanced web corpora
Title | Efficient construction of metadata-enhanced web corpora |
Authors | Adrien Barbaresi |
Abstract | |
Tasks | Information Retrieval |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-2602/ |
PWC | https://paperswithcode.com/paper/efficient-construction-of-metadata-enhanced |
Repo | |
Framework | |
Unsupervised Event Coreference for Abstract Words
Title | Unsupervised Event Coreference for Abstract Words |
Authors | Dheeraj Rajagopal, Eduard Hovy, Teruko Mitamura |
Abstract | |
Tasks | |
Published | 2016-11-01 |
URL | https://www.aclweb.org/anthology/W16-6005/ |
PWC | https://paperswithcode.com/paper/unsupervised-event-coreference-for-abstract |
Repo | |
Framework | |
Extending WordNet with Fine-Grained Collocational Information via Supervised Distributional Learning
Title | Extending WordNet with Fine-Grained Collocational Information via Supervised Distributional Learning |
Authors | Luis Espinosa-Anke, Jose Camacho-Collados, Sara Rodríguez-Fernández, Horacio Saggion, Leo Wanner |
Abstract | WordNet is probably the best known lexical resource in Natural Language Processing. While it is widely regarded as a high quality repository of concepts and semantic relations, updating and extending it manually is costly. One important type of relation which could potentially add enormous value to WordNet is collocational information, which is paramount in tasks such as Machine Translation, Natural Language Generation and Second Language Learning. In this paper, we present ColWordNet (CWN), an extended WordNet version with fine-grained collocational information, automatically introduced thanks to a method exploiting linear relations between analogous sense-level embedding spaces. We perform both intrinsic and extrinsic evaluations, and release CWN for the use and scrutiny of the community. |
Tasks | Machine Translation, Semantic Textual Similarity, Sentiment Analysis, Text Generation, Word Embeddings, Word Sense Disambiguation |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1323/ |
PWC | https://paperswithcode.com/paper/extending-wordnet-with-fine-grained |
Repo | |
Framework | |
Demonstration of ChaKi.NET – beyond the corpus search system
Title | Demonstration of ChaKi.NET – beyond the corpus search system |
Authors | Masayuki Asahara, Yuji Matsumoto, Toshio Morita |
Abstract | ChaKi.NET is a corpus management system for dependency structure annotated corpora. After more than 10 years of continuous development, the system is now usable not only for corpus search, but also for visualization, annotation, labelling, and formatting for statistical analysis. This paper describes the various functions included in the current ChaKi.NET system. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-2011/ |
PWC | https://paperswithcode.com/paper/demonstration-of-chakinet-a-beyond-the-corpus |
Repo | |
Framework | |
The Power of Adaptivity in Identifying Statistical Alternatives
Title | The Power of Adaptivity in Identifying Statistical Alternatives |
Authors | Kevin G. Jamieson, Daniel Haas, Benjamin Recht |
Abstract | This paper studies the trade-off between two different kinds of pure exploration: breadth versus depth. We focus on the most biased coin problem, asking how many total coin flips are required to identify a "heavy" coin from an infinite bag containing both "heavy" coins with mean $\theta_1 \in (0,1)$, and "light" coins with mean $\theta_0 \in (0,\theta_1)$, where heavy coins are drawn from the bag with proportion $\alpha \in (0,1/2)$. When $\alpha,\theta_0,\theta_1$ are unknown, the key difficulty of this problem lies in distinguishing whether the two kinds of coins have very similar means, or whether heavy coins are just extremely rare. While existing solutions to this problem require some prior knowledge of the parameters $\theta_0,\theta_1,\alpha$, we propose an adaptive algorithm that requires no such knowledge yet still obtains near-optimal sample complexity guarantees. In contrast, we provide a lower bound showing that non-adaptive strategies require at least quadratically more samples. In characterizing this gap between adaptive and nonadaptive strategies, we make connections to anomaly detection and prove lower bounds on the sample complexity of differentiating between a single parametric distribution and a mixture of two such distributions. |
Tasks | Anomaly Detection |
Published | 2016-12-01 |
URL | http://papers.nips.cc/paper/6072-the-power-of-adaptivity-in-identifying-statistical-alternatives |
http://papers.nips.cc/paper/6072-the-power-of-adaptivity-in-identifying-statistical-alternatives.pdf | |
PWC | https://paperswithcode.com/paper/the-power-of-adaptivity-in-identifying |
Repo | |
Framework | |
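To make the most biased coin setup from the abstract above concrete, the following is a minimal simulation sketch of a non-adaptive baseline: every drawn coin is flipped the same number of times and the first coin whose empirical mean reaches a cut-off is declared heavy. This is not the adaptive algorithm proposed in the paper; the function names, the `flips_per_coin` budget, the `threshold`, and the `max_coins` cap are assumptions chosen for illustration.

```python
import random


def draw_coin(rng: random.Random, alpha: float, theta0: float, theta1: float):
    """Draw one coin from the bag; returns (bias, is_heavy)."""
    is_heavy = rng.random() < alpha
    return (theta1 if is_heavy else theta0), is_heavy


def fixed_budget_search(rng, alpha, theta0, theta1, flips_per_coin, threshold,
                        max_coins=10_000):
    """Non-adaptive baseline: flip every drawn coin the same number of times
    and stop at the first coin whose empirical mean reaches `threshold`.

    Returns (is_heavy, total_flips), or (None, total_flips) if no coin passes.
    """
    total_flips = 0
    for _ in range(max_coins):
        bias, is_heavy = draw_coin(rng, alpha, theta0, theta1)
        heads = sum(1 for _ in range(flips_per_coin) if rng.random() < bias)
        total_flips += flips_per_coin
        if heads / flips_per_coin >= threshold:
            return is_heavy, total_flips
    return None, total_flips


rng = random.Random(0)  # fixed seed so the run is repeatable
found_heavy, flips = fixed_budget_search(
    rng, alpha=0.1, theta0=0.4, theta1=0.6, flips_per_coin=200, threshold=0.5
)
print(f"declared coin is heavy: {found_heavy}, total flips used: {flips}")
```

Choosing a sensible `threshold` and `flips_per_coin` here already presumes rough knowledge of $\theta_0$, $\theta_1$ and $\alpha$, which is the kind of prior knowledge the abstract says an adaptive strategy can avoid.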
An Open Source Library for Semantic-Based Datetime Resolution
Title | An Open Source Library for Semantic-Based Datetime Resolution |
Authors | Aurélie Merlo, Denis Pasin |
Abstract | In this paper, we introduce an original Python implementation of datetime resolution in French, which we make available as an open-source library. Our approach is based on Frame Semantics and Corpus Pattern Analysis in order to provide a precise semantic interpretation of datetime expressions. This interpretation facilitates the contextual resolution of datetime expressions in timestamp format. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-2023/ |
PWC | https://paperswithcode.com/paper/an-open-source-library-for-semantic-based |
Repo | |
Framework | |
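As a rough illustration of what contextual resolution of a datetime expression means, the sketch below maps a handful of French relative-day expressions to concrete dates given a reference date. It is a toy lookup, not the library's API or its Frame Semantics approach; the `RELATIVE_DAYS` table and the `resolve` function are hypothetical names introduced for this example.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical lookup of French relative-day expressions to day offsets.
RELATIVE_DAYS = {
    "avant-hier": -2,
    "hier": -1,
    "aujourd'hui": 0,
    "demain": 1,
    "après-demain": 2,
}


def resolve(expression: str, reference: date) -> Optional[date]:
    """Resolve a relative French day expression against a reference date.

    Returns None when the expression is not covered by this toy lookup.
    """
    offset = RELATIVE_DAYS.get(expression.strip().lower())
    if offset is None:
        return None
    return reference + timedelta(days=offset)


reference = date(2016, 12, 1)
for text in ("hier", "demain", "après-demain"):
    print(text, "->", resolve(text, reference).isoformat())
```

The same expression resolves to different dates under different reference dates, which is why the resolution step needs context in addition to the expression itself.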
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorial Abstracts
Title | Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorial Abstracts |
Authors | |
Abstract | |
Tasks | |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/N16-4000/ |
PWC | https://paperswithcode.com/paper/proceedings-of-the-2016-conference-of-the-1 |
Repo | |
Framework | |
USFD at SemEval-2016 Task 1: Putting different State-of-the-Arts into a Box
Title | USFD at SemEval-2016 Task 1: Putting different State-of-the-Arts into a Box |
Authors | Ahmet Aker, Frederic Blain, Andres Duque, Marina Fomicheva, Jurica Seva, Kashif Shah, Daniel Beck |
Abstract | |
Tasks | Information Retrieval, Machine Translation, Semantic Textual Similarity, Word Alignment |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/S16-1092/ |
PWC | https://paperswithcode.com/paper/usfd-at-semeval-2016-task-1-putting-different |
Repo | |
Framework | |