May 5, 2019

Paper Group NANR 45

Developing a Dataset for Evaluating Approaches for Document Expansion with Images. Tight Complexity Bounds for Optimizing Composite Objectives. A Dataset for Multimodal Question Answering in the Cultural Heritage Domain. More than Word Cooccurrence: Exploring Support and Opposition in International Climate Negotiations with Semantic Parsing. A Long …

Developing a Dataset for Evaluating Approaches for Document Expansion with Images

Title Developing a Dataset for Evaluating Approaches for Document Expansion with Images
Authors Debasis Ganguly, Iacer Calixto, Gareth Jones
Abstract Motivated by the adage that a "picture is worth a thousand words", it can be reasoned that automatically enriching the textual content of a document with relevant images can increase the readability of the document. Moreover, features extracted from the additional image data inserted into the textual content of a document may, in principle, also be used by a retrieval engine to better match the topic of a document with that of a given query. In this paper, we describe our approach to building a ground truth dataset to enable further research into the automatic addition of relevant images to text documents. The dataset comprises the official ImageCLEF 2010 collection (a collection of images with textual metadata), which serves as the pool of images available for automatic enrichment of text; a set of 25 benchmark documents to be enriched, which in this case are children's short stories; and a set of manually judged relevant images for each query story, obtained by the standard procedure of depth pooling. We use this benchmark dataset to evaluate the effectiveness of standard information retrieval methods as simple baselines for this task. The results indicate that using the whole story as a weighted query, where the weight of each query term is its tf-idf value, achieves a precision of 0.1714 within the top 5 retrieved images on average.
Tasks Information Retrieval
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1299/
PDF https://www.aclweb.org/anthology/L16-1299
PWC https://paperswithcode.com/paper/developing-a-dataset-for-evaluating
Repo
Framework
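
The baseline mentioned in the abstract above (a whole story used as a tf-idf weighted query, scored by precision within the top 5 results) can be illustrated with a minimal sketch. The abstract does not give the exact tf-idf formula, so the `tf * log(N / df)` variant below is an assumption, and the function names `tfidf_query_weights` and `precision_at_k` are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def tfidf_query_weights(query_tokens, doc_freq, num_docs):
    """Weight each query term by tf * idf, with idf = log(N / df).

    Assumption: this is one common tf-idf variant; the paper may use another.
    """
    term_counts = Counter(query_tokens)
    weights = {}
    for term, count in term_counts.items():
        df = doc_freq.get(term, 0)
        if df == 0:
            continue  # term never occurs in the collection, so it cannot match
        weights[term] = count * math.log(num_docs / df)
    return weights


def precision_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of the top-k retrieved items that were judged relevant."""
    top_k = ranked_ids[:k]
    return sum(1 for item in top_k if item in relevant_ids) / k


# Toy usage with made-up document frequencies and relevance judgements.
weights = tfidf_query_weights(["cat", "cat", "dog"], {"cat": 10, "dog": 100}, 100)
score = precision_at_k(["a", "b", "c", "d", "e"], {"a", "x"}, k=5)
```

Averaging `precision_at_k` over all query stories would give a figure comparable in kind to the one reported, although the numbers in this toy example are invented.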

Tight Complexity Bounds for Optimizing Composite Objectives

Title Tight Complexity Bounds for Optimizing Composite Objectives
Authors Blake E. Woodworth, Nati Srebro
Abstract We provide tight upper and lower bounds on the complexity of minimizing the average of m convex functions using gradient and prox oracles of the component functions. We show a significant gap between the complexity of deterministic vs randomized optimization. For smooth functions, we show that accelerated gradient descent (AGD) and an accelerated variant of SVRG are optimal in the deterministic and randomized settings respectively, and that a gradient oracle is sufficient for the optimal rate. For non-smooth functions, having access to prox oracles reduces the complexity and we present optimal methods based on smoothing that improve over methods using just gradient accesses.
Tasks
Published 2016-12-01
URL http://papers.nips.cc/paper/6058-tight-complexity-bounds-for-optimizing-composite-objectives
PDF http://papers.nips.cc/paper/6058-tight-complexity-bounds-for-optimizing-composite-objectives.pdf
PWC https://paperswithcode.com/paper/tight-complexity-bounds-for-optimizing-1
Repo
Framework
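
For readers unfamiliar with accelerated gradient descent (AGD), which the abstract above names as optimal in the deterministic smooth setting, the following is a minimal one-dimensional sketch applied to an average of m quadratic components. It uses the standard FISTA-style momentum schedule, which is an assumption here; the toy objective, the function name `accelerated_gradient_descent`, and all parameter values are illustrative and not taken from the paper.

```python
import math


def accelerated_gradient_descent(grad, x0, lipschitz, num_steps=200):
    """Nesterov-style AGD for a smooth convex function of one variable.

    grad: gradient oracle; lipschitz: an upper bound on the gradient's
    Lipschitz constant, used to set the step size 1 / lipschitz.
    """
    x_prev = x0
    y = x0
    t = 1.0
    for _ in range(num_steps):
        x = y - grad(y) / lipschitz               # gradient step from the look-ahead point
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x + ((t - 1.0) / t_next) * (x - x_prev)  # momentum extrapolation
        x_prev, t = x, t_next
    return x_prev


# Toy objective: the average of m = 3 components 0.5 * a_i * (x - c_i)^2.
curvatures = [1.0, 2.0, 4.0]
centers = [1.0, 2.0, 3.0]


def average_gradient(x):
    return sum(a * (x - c) for a, c in zip(curvatures, centers)) / len(curvatures)


x_star = accelerated_gradient_descent(average_gradient, 0.0, lipschitz=max(curvatures))
```

The minimizer of this toy objective is the curvature-weighted mean of the centers, 17/7, which the iterate approaches. The sketch only shows the deterministic gradient-oracle case; it says nothing about the lower bounds or the prox-oracle methods that the paper analyzes.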

A Dataset for Multimodal Question Answering in the Cultural Heritage Domain

Title A Dataset for Multimodal Question Answering in the Cultural Heritage Domain
Authors Shurong Sheng, Luc Van Gool, Marie-Francine Moens
Abstract Multimodal question answering in the cultural heritage domain allows visitors to ask questions in a more natural way and thus provides better user experiences with cultural objects while visiting a museum, landmark or any other historical site. In this paper, we introduce the construction of a gold standard dataset that will aid research on multimodal question answering in the cultural heritage domain. The dataset, which will soon be released to the public, contains multimodal content including images of typical artworks from the fascinating old-Egyptian Amarna period, related image-containing documents of the artworks and over 800 multimodal queries integrating visual and textual questions. The multimodal questions and related documents are all in English. The multimodal questions are linked to relevant paragraphs in the related documents that contain the answer to the multimodal query.
Tasks Question Answering, Speech Recognition, Visual Question Answering
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4003/
PDF https://www.aclweb.org/anthology/W16-4003
PWC https://paperswithcode.com/paper/a-dataset-for-multimodal-question-answering
Repo
Framework

More than Word Cooccurrence: Exploring Support and Opposition in International Climate Negotiations with Semantic Parsing

Title More than Word Cooccurrence: Exploring Support and Opposition in International Climate Negotiations with Semantic Parsing
Authors Pablo Ruiz, Clément Plancq, Thierry Poibeau
Abstract Text analysis methods widely used in digital humanities often involve word co-occurrence, e.g. concept co-occurrence networks. These methods provide a useful corpus overview, but cannot determine the predicates that relate co-occurring concepts. Our goal was identifying propositions expressing the points supported or opposed by participants in international climate negotiations. Word co-occurrence methods were not sufficient, and an analysis based on open relation extraction had limited coverage for nominal predicates. We present a pipeline which identifies the points that different actors support and oppose, via a domain model with support/opposition predicates, and analysis rules that exploit the output of semantic role labelling, syntactic dependencies and anaphora resolution. Entity linking and keyphrase extraction are also performed on the propositions related to each actor. A user interface allows examining the main concepts in points supported or opposed by each participant, which participants agree or disagree with each other, and about which issues. The system is an example of tools that digital humanities scholars are asking for, to render rich textual information (beyond word co-occurrence) more amenable to quantitative treatment. An evaluation of the tool was satisfactory.
Tasks Entity Linking, Relation Extraction, Semantic Parsing
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1300/
PDF https://www.aclweb.org/anthology/L16-1300
PWC https://paperswithcode.com/paper/more-than-word-cooccurrence-exploring-support
Repo
Framework

A Long Short-Term Memory Framework for Predicting Humor in Dialogues

Title A Long Short-Term Memory Framework for Predicting Humor in Dialogues
Authors Dario Bertero, Pascale Fung
Abstract
Tasks
Published 2016-06-01
URL https://www.aclweb.org/anthology/N16-1016/
PDF https://www.aclweb.org/anthology/N16-1016
PWC https://paperswithcode.com/paper/a-long-short-term-memory-framework-for
Repo
Framework

Analyzing Time Series Changes of Correlation between Market Share and Concerns on Companies measured through Search Engine Suggests

Title Analyzing Time Series Changes of Correlation between Market Share and Concerns on Companies measured through Search Engine Suggests
Authors Takakazu Imada, Yusuke Inoue, Lei Chen, Syunya Doi, Tian Nie, Chen Zhao, Takehito Utsuro, Yasuhide Kawada
Abstract This paper proposes how to utilize a search engine in order to predict market shares. We propose to compare rates of concerns of those who search for Web pages among several companies which supply products, given a specific products domain. We measure concerns of those who search for Web pages through search engine suggests. Then, we analyze whether rates of concerns of those who search for Web pages have certain correlation with actual market share. We show that those statistics have certain correlations. We finally propose how to predict the market share of a specific product genre based on the rates of concerns of those who search for Web pages.
Tasks Time Series
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1303/
PDF https://www.aclweb.org/anthology/L16-1303
PWC https://paperswithcode.com/paper/analyzing-time-series-changes-of-correlation
Repo
Framework

The Royal Society Corpus: From Uncharted Data to Corpus

Title The Royal Society Corpus: From Uncharted Data to Corpus
Authors Hannah Kermes, Stefania Degaetano-Ortlieb, Ashraf Khamis, Jörg Knappen, Elke Teich
Abstract We present the Royal Society Corpus (RSC) built from the Philosophical Transactions and Proceedings of the Royal Society of London. At present, the corpus contains articles from the first two centuries of the journal (1665–1869) and amounts to around 35 million tokens. The motivation for building the RSC is to investigate the diachronic linguistic development of scientific English. Specifically, we assume that due to specialization, linguistic encodings become more compact over time (Halliday, 1988; Halliday and Martin, 1993), thus creating a specific discourse type characterized by high information density that is functional for expert communication. When building corpora from uncharted material, typically not all relevant meta-data (e.g. author, time, genre) or linguistic data (e.g. sentence/word boundaries, words, parts of speech) is readily available. We present an approach to obtain good quality meta-data and base text data adopting the concept of Agile Software Development.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1305/
PDF https://www.aclweb.org/anthology/L16-1305
PWC https://paperswithcode.com/paper/the-royal-society-corpus-from-uncharted-data
Repo
Framework

Building Evaluation Datasets for Consumer-Oriented Information Retrieval

Title Building Evaluation Datasets for Consumer-Oriented Information Retrieval
Authors Lorraine Goeuriot, Liadh Kelly, Guido Zuccon, Joao Palotti
Abstract Common people often experience difficulties in accessing relevant, correct, accurate and understandable health information online. Developing search techniques that aid these information needs is challenging. In this paper we present the datasets created by CLEF eHealth Lab from 2013-2015 for evaluation of search solutions to support common people finding health information online. Specifically, the CLEF eHealth information retrieval (IR) task of this Lab has provided the research community with benchmarks for evaluating consumer-centered health information retrieval, thus fostering research and development aimed to address this challenging problem. Given consumer queries, the goal of the task is to retrieve relevant documents from the provided collection of web pages. The shared datasets provide a large health web crawl, queries representing people's real world information needs, and relevance assessment judgements for the queries.
Tasks Information Retrieval
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1306/
PDF https://www.aclweb.org/anthology/L16-1306
PWC https://paperswithcode.com/paper/building-evaluation-datasets-for-consumer
Repo
Framework

Phoneme Alignment Using the Information on Phonological Processes in Continuous Speech

Title Phoneme Alignment Using the Information on Phonological Processes in Continuous Speech
Authors Daniil Kocharov
Abstract The current study focuses on optimization of the Levenshtein algorithm for the purpose of computing the optimal alignment between two phoneme transcriptions of a spoken utterance containing sequences of phonetic symbols. The alignment is computed with the help of a confusion matrix in which costs for phonetic symbol deletion, insertion and substitution are defined taking into account various phonological processes that occur in fluent speech, such as anticipatory assimilation, phone elision and epenthesis. A corpus containing about 30 hours of Russian read speech was used to evaluate the presented algorithms. The experimental results have shown a significant reduction of the misalignment rate in comparison with the baseline Levenshtein algorithm: the number of errors has been reduced from 1.1% to 0.28%.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1308/
PDF https://www.aclweb.org/anthology/L16-1308
PWC https://paperswithcode.com/paper/phoneme-alignment-using-the-information-on
Repo
Framework
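
The idea in the abstract above, replacing the uniform costs of the Levenshtein algorithm with costs that depend on the phonetic symbols involved, can be sketched as a weighted edit distance. This is a simplified illustration, not the paper's algorithm: insertion and deletion costs are kept constant here, whereas the paper defines them per symbol via a confusion matrix, and the cost values and the `SIMILAR_PAIRS` set are hypothetical.

```python
def weighted_edit_distance(ref, hyp, sub_cost, ins_cost=1.0, del_cost=1.0):
    """Minimum total cost of turning sequence ref into sequence hyp.

    sub_cost(a, b) returns the cost of substituting symbol a with b
    (it should return 0 when a == b).
    """
    n, m = len(ref), len(hyp)
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = dist[i - 1][0] + del_cost
    for j in range(1, m + 1):
        dist[0][j] = dist[0][j - 1] + ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,                          # delete ref[i-1]
                dist[i][j - 1] + ins_cost,                          # insert hyp[j-1]
                dist[i - 1][j - 1] + sub_cost(ref[i - 1], hyp[j - 1]),  # match or substitute
            )
    return dist[n][m]


# Hypothetical set of symbol pairs treated as phonologically close.
SIMILAR_PAIRS = {("t", "d"), ("d", "t")}


def phonetic_sub_cost(a, b):
    """Illustrative costs: free match, cheap swap for close pairs, else 1."""
    if a == b:
        return 0.0
    return 0.5 if (a, b) in SIMILAR_PAIRS else 1.0


def uniform_sub_cost(a, b):
    """Baseline Levenshtein substitution cost."""
    return 0.0 if a == b else 1.0


close = weighted_edit_distance(["t", "a"], ["d", "a"], phonetic_sub_cost)
baseline = weighted_edit_distance("kitten", "sitting", uniform_sub_cost)
```

With the cheaper substitution for the close pair, an alignment that maps `t` to `d` is preferred over one that deletes and reinserts a symbol, which is the kind of behaviour symbol-dependent costs are meant to encourage. Recovering the alignment itself would require a backtrace over `dist`, which is omitted here.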

Optimal Sparse Linear Encoders and Sparse PCA

Title Optimal Sparse Linear Encoders and Sparse PCA
Authors Malik Magdon-Ismail, Christos Boutsidis
Abstract Principal components analysis (PCA) is the optimal linear encoder of data. Sparse linear encoders (e.g., sparse PCA) produce more interpretable features that can promote better generalization. (i) Given a level of sparsity, what is the best approximation to PCA? (ii) Are there efficient algorithms which can achieve this optimal combinatorial tradeoff? We answer both questions by providing the first polynomial-time algorithms to construct optimal sparse linear auto-encoders; additionally, we demonstrate the performance of our algorithms on real data.
Tasks
Published 2016-12-01
URL http://papers.nips.cc/paper/6252-optimal-sparse-linear-encoders-and-sparse-pca
PDF http://papers.nips.cc/paper/6252-optimal-sparse-linear-encoders-and-sparse-pca.pdf
PWC https://paperswithcode.com/paper/optimal-sparse-linear-encoders-and-sparse-pca
Repo
Framework

Syntactic parsing of chat language in contact center conversation corpus

Title Syntactic parsing of chat language in contact center conversation corpus
Authors Alexis Nasr, Geraldine Damnati, Aleksandra Guerraz, Frederic Bechet
Abstract
Tasks
Published 2016-09-01
URL https://www.aclweb.org/anthology/W16-3621/
PDF https://www.aclweb.org/anthology/W16-3621
PWC https://paperswithcode.com/paper/syntactic-parsing-of-chat-language-in-contact
Repo
Framework

Defining and Counting Phonological Classes in Cross-linguistic Segment Databases

Title Defining and Counting Phonological Classes in Cross-linguistic Segment Databases
Authors Dan Dediu, Scott Moisik
Abstract Recently, there has been an explosion in the availability of large, good-quality cross-linguistic databases such as WALS (Dryer & Haspelmath, 2013), Glottolog (Hammarstrom et al., 2015) and Phoible (Moran & McCloy, 2014). Databases such as Phoible contain the actual segments used by various languages as they are given in the primary language descriptions. However, this segment-level representation cannot be used directly for analyses that require generalizations over classes of segments that share theoretically interesting features. Here we present a method and the associated R (R Core Team, 2014) code that allows the flexible definition of such meaningful classes and that can identify the sets of segments falling into such a class for any language inventory. The method and its results are important for those interested in exploring cross-linguistic patterns of phonetic and phonological diversity and their relationship to extra-linguistic factors and processes such as climate, economics, history or human genetics.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1310/
PDF https://www.aclweb.org/anthology/L16-1310
PWC https://paperswithcode.com/paper/defining-and-counting-phonological-classes-in
Repo
Framework

Learning Embeddings to lexicalise RDF Properties

Title Learning Embeddings to lexicalise RDF Properties
Authors Laura Perez-Beltrachini, Claire Gardent
Abstract
Tasks Text Generation
Published 2016-08-01
URL https://www.aclweb.org/anthology/S16-2027/
PDF https://www.aclweb.org/anthology/S16-2027
PWC https://paperswithcode.com/paper/learning-embeddings-to-lexicalise-rdf
Repo
Framework

Watson Discovery Advisor: Question-answering in an industrial setting

Title Watson Discovery Advisor: Question-answering in an industrial setting
Authors Charley Beller, Graham Katz, Allen Ginsberg, Chris Phipps, Sean Bethard, Paul Chase, Elinna Shek, Kristen Summers
Abstract
Tasks Open-Domain Question Answering, Question Answering
Published 2016-06-01
URL https://www.aclweb.org/anthology/W16-0101/
PDF https://www.aclweb.org/anthology/W16-0101
PWC https://paperswithcode.com/paper/watson-discovery-advisor-question-answering
Repo
Framework

Open-domain Factoid Question Answering via Knowledge Graph Search

Title Open-domain Factoid Question Answering via Knowledge Graph Search
Authors Ahmad Aghaebrahimian, Filip Jurčíček
Abstract
Tasks Knowledge Graphs, Open-Domain Question Answering, Question Answering
Published 2016-06-01
URL https://www.aclweb.org/anthology/W16-0104/
PDF https://www.aclweb.org/anthology/W16-0104
PWC https://paperswithcode.com/paper/open-domain-factoid-question-answering-via
Repo
Framework