Paper Group NANR 45
Developing a Dataset for Evaluating Approaches for Document Expansion with Images. Tight Complexity Bounds for Optimizing Composite Objectives. A Dataset for Multimodal Question Answering in the Cultural Heritage Domain. More than Word Cooccurrence: Exploring Support and Opposition in International Climate Negotiations with Semantic Parsing. A Long …
Developing a Dataset for Evaluating Approaches for Document Expansion with Images
Title | Developing a Dataset for Evaluating Approaches for Document Expansion with Images |
Authors | Debasis Ganguly, Iacer Calixto, Gareth Jones |
Abstract | Motivated by the adage that a "picture is worth a thousand words", it can be reasoned that automatically enriching the textual content of a document with relevant images can increase its readability. Moreover, features extracted from the additional image data inserted into the textual content of a document may, in principle, also be used by a retrieval engine to better match the topic of a document with that of a given query. In this paper, we describe our approach of building a ground truth dataset to enable further research into the automatic addition of relevant images to text documents. The dataset is comprised of the official ImageCLEF 2010 collection (a collection of images with textual metadata) to serve as the images available for automatic enrichment of text, a set of 25 benchmark documents that are to be enriched, which in this case are children's short stories, and a set of manually judged relevant images for each query story obtained by the standard procedure of depth pooling. We use this benchmark dataset to evaluate the effectiveness of standard information retrieval methods as simple baselines for this task. The results indicate that using the whole story as a weighted query, where the weight of each query term is its tf-idf value, achieves a precision of 0.1714 within the top 5 retrieved images on average. |
Tasks | Information Retrieval |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1299/ |
https://www.aclweb.org/anthology/L16-1299 | |
PWC | https://paperswithcode.com/paper/developing-a-dataset-for-evaluating |
Repo | |
Framework | |
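The tf-idf weighted-query baseline from the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tokenization, toy collection, and additive overlap scoring are all assumptions.

```python
import math
from collections import Counter

def tfidf_weights(story_tokens, corpus_docs):
    """Weight each story term by its tf-idf over the image-metadata collection."""
    n = len(corpus_docs)
    df = Counter()
    for doc in corpus_docs:
        df.update(set(doc))  # document frequency: one count per doc
    tf = Counter(story_tokens)
    return {t: tf[t] * math.log((1 + n) / (1 + df[t])) for t in tf}

def score(metadata_tokens, weights):
    """Score an image by the summed weights of query terms found in its metadata."""
    return sum(weights.get(t, 0.0) for t in metadata_tokens)

# Toy example: rank image metadata against a short "story" used as the query.
corpus = [["dragon", "castle", "knight"], ["cat", "sat", "mat"], ["knight", "sword"]]
story = ["the", "brave", "knight", "and", "the", "dragon"]
w = tfidf_weights(story, corpus)
ranked = sorted(corpus, key=lambda d: score(d, w), reverse=True)
```

Rarer story terms ("dragon") outweigh common ones ("knight"), so metadata matching the rarer terms ranks first; a real system would rank all ImageCLEF images this way and take the top 5.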
Tight Complexity Bounds for Optimizing Composite Objectives
Title | Tight Complexity Bounds for Optimizing Composite Objectives |
Authors | Blake E. Woodworth, Nati Srebro |
Abstract | We provide tight upper and lower bounds on the complexity of minimizing the average of m convex functions using gradient and prox oracles of the component functions. We show a significant gap between the complexity of deterministic vs randomized optimization. For smooth functions, we show that accelerated gradient descent (AGD) and an accelerated variant of SVRG are optimal in the deterministic and randomized settings respectively, and that a gradient oracle is sufficient for the optimal rate. For non-smooth functions, having access to prox oracles reduces the complexity and we present optimal methods based on smoothing that improve over methods using just gradient accesses. |
Tasks | |
Published | 2016-12-01 |
URL | http://papers.nips.cc/paper/6058-tight-complexity-bounds-for-optimizing-composite-objectives |
http://papers.nips.cc/paper/6058-tight-complexity-bounds-for-optimizing-composite-objectives.pdf | |
PWC | https://paperswithcode.com/paper/tight-complexity-bounds-for-optimizing-1 |
Repo | |
Framework | |
A Dataset for Multimodal Question Answering in the Cultural Heritage Domain
Title | A Dataset for Multimodal Question Answering in the Cultural Heritage Domain |
Authors | Shurong Sheng, Luc Van Gool, Marie-Francine Moens |
Abstract | Multimodal question answering in the cultural heritage domain allows visitors to ask questions in a more natural way and thus provides better user experiences with cultural objects while visiting a museum, landmark or any other historical site. In this paper, we introduce the construction of a gold-standard dataset that will aid research on multimodal question answering in the cultural heritage domain. The dataset, which will soon be released to the public, contains multimodal content including images of typical artworks from the fascinating old-Egyptian Amarna period, related image-containing documents of the artworks and over 800 multimodal queries integrating visual and textual questions. The multimodal questions and related documents are all in English. The multimodal questions are linked to relevant paragraphs in the related documents that contain the answer to the multimodal query. |
Tasks | Question Answering, Speech Recognition, Visual Question Answering |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4003/ |
https://www.aclweb.org/anthology/W16-4003 | |
PWC | https://paperswithcode.com/paper/a-dataset-for-multimodal-question-answering |
Repo | |
Framework | |
More than Word Cooccurrence: Exploring Support and Opposition in International Climate Negotiations with Semantic Parsing
Title | More than Word Cooccurrence: Exploring Support and Opposition in International Climate Negotiations with Semantic Parsing |
Authors | Pablo Ruiz, Clément Plancq, Thierry Poibeau |
Abstract | Text analysis methods widely used in digital humanities often involve word co-occurrence, e.g. concept co-occurrence networks. These methods provide a useful corpus overview, but cannot determine the predicates that relate co-occurring concepts. Our goal was identifying propositions expressing the points supported or opposed by participants in international climate negotiations. Word co-occurrence methods were not sufficient, and an analysis based on open relation extraction had limited coverage for nominal predicates. We present a pipeline which identifies the points that different actors support and oppose, via a domain model with support/opposition predicates, and analysis rules that exploit the output of semantic role labelling, syntactic dependencies and anaphora resolution. Entity linking and keyphrase extraction are also performed on the propositions related to each actor. A user interface allows examining the main concepts in points supported or opposed by each participant, which participants agree or disagree with each other, and about which issues. The system is an example of tools that digital humanities scholars are asking for, to render rich textual information (beyond word co-occurrence) more amenable to quantitative treatment. An evaluation of the tool was satisfactory. |
Tasks | Entity Linking, Relation Extraction, Semantic Parsing |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1300/ |
https://www.aclweb.org/anthology/L16-1300 | |
PWC | https://paperswithcode.com/paper/more-than-word-cooccurrence-exploring-support |
Repo | |
Framework | |
A Long Short-Term Memory Framework for Predicting Humor in Dialogues
Title | A Long Short-Term Memory Framework for Predicting Humor in Dialogues |
Authors | Dario Bertero, Pascale Fung |
Abstract | |
Tasks | |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/N16-1016/ |
https://www.aclweb.org/anthology/N16-1016 | |
PWC | https://paperswithcode.com/paper/a-long-short-term-memory-framework-for |
Repo | |
Framework | |
Analyzing Time Series Changes of Correlation between Market Share and Concerns on Companies measured through Search Engine Suggests
Title | Analyzing Time Series Changes of Correlation between Market Share and Concerns on Companies measured through Search Engine Suggests |
Authors | Takakazu Imada, Yusuke Inoue, Lei Chen, Syunya Doi, Tian Nie, Chen Zhao, Takehito Utsuro, Yasuhide Kawada |
Abstract | This paper proposes how to utilize a search engine in order to predict market shares. Given a specific product domain, we propose to compare the rates of concern among several companies which supply products in that domain, where concern is measured through search engine suggests. We then analyze whether these rates of concern correlate with actual market share, and show that they do. We finally propose how to predict the market share of a specific product genre based on these rates of concern. |
Tasks | Time Series |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1303/ |
https://www.aclweb.org/anthology/L16-1303 | |
PWC | https://paperswithcode.com/paper/analyzing-time-series-changes-of-correlation |
Repo | |
Framework | |
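The correlation analysis sketched in the abstract above can be illustrated with a short Python snippet. The suggest counts and market-share figures here are invented placeholders, and the plain Pearson correlation is an assumed choice of statistic, not necessarily the one used by the authors.

```python
import statistics

def pearson(x, y):
    """Pearson correlation between two equal-length numeric series."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Hypothetical per-company "concern" counts derived from search-engine suggests,
# normalized into rates, compared against actual market shares.
suggest_counts = [120, 80, 40, 10]
total = sum(suggest_counts)
rates = [c / total for c in suggest_counts]
market_share = [0.45, 0.30, 0.18, 0.07]
r = pearson(rates, market_share)
```

A high positive `r` on such data is what would support using suggest-derived concern rates as a market-share predictor; the paper's time-series analysis would repeat this over successive periods.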
The Royal Society Corpus: From Uncharted Data to Corpus
Title | The Royal Society Corpus: From Uncharted Data to Corpus |
Authors | Hannah Kermes, Stefania Degaetano-Ortlieb, Ashraf Khamis, Jörg Knappen, Elke Teich |
Abstract | We present the Royal Society Corpus (RSC) built from the Philosophical Transactions and Proceedings of the Royal Society of London. At present, the corpus contains articles from the first two centuries of the journal (1665–1869) and amounts to around 35 million tokens. The motivation for building the RSC is to investigate the diachronic linguistic development of scientific English. Specifically, we assume that due to specialization, linguistic encodings become more compact over time (Halliday, 1988; Halliday and Martin, 1993), thus creating a specific discourse type characterized by high information density that is functional for expert communication. When building corpora from uncharted material, typically not all relevant meta-data (e.g. author, time, genre) or linguistic data (e.g. sentence/word boundaries, words, parts of speech) is readily available. We present an approach to obtain good quality meta-data and base text data adopting the concept of Agile Software Development. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1305/ |
https://www.aclweb.org/anthology/L16-1305 | |
PWC | https://paperswithcode.com/paper/the-royal-society-corpus-from-uncharted-data |
Repo | |
Framework | |
Building Evaluation Datasets for Consumer-Oriented Information Retrieval
Title | Building Evaluation Datasets for Consumer-Oriented Information Retrieval |
Authors | Lorraine Goeuriot, Liadh Kelly, Guido Zuccon, Joao Palotti |
Abstract | Common people often experience difficulties in accessing relevant, correct, accurate and understandable health information online. Developing search techniques that aid these information needs is challenging. In this paper we present the datasets created by the CLEF eHealth Lab from 2013-2015 for evaluation of search solutions to support common people finding health information online. Specifically, the CLEF eHealth information retrieval (IR) task of this Lab has provided the research community with benchmarks for evaluating consumer-centered health information retrieval, thus fostering research and development aimed at addressing this challenging problem. Given consumer queries, the goal of the task is to retrieve relevant documents from the provided collection of web pages. The shared datasets provide a large health web crawl, queries representing people's real world information needs, and relevance assessment judgements for the queries. |
Tasks | Information Retrieval |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1306/ |
https://www.aclweb.org/anthology/L16-1306 | |
PWC | https://paperswithcode.com/paper/building-evaluation-datasets-for-consumer |
Repo | |
Framework | |
Phoneme Alignment Using the Information on Phonological Processes in Continuous Speech
Title | Phoneme Alignment Using the Information on Phonological Processes in Continuous Speech |
Authors | Daniil Kocharov |
Abstract | The current study focuses on optimization of the Levenshtein algorithm for the purpose of computing the optimal alignment between two phoneme transcriptions of a spoken utterance containing sequences of phonetic symbols. The alignment is computed with the help of a confusion matrix in which costs for phonetic symbol deletion, insertion and substitution are defined taking into account various phonological processes that occur in fluent speech, such as anticipatory assimilation, phone elision and epenthesis. A corpus containing about 30 hours of Russian read speech was used to evaluate the presented algorithms. The experimental results have shown a significant reduction of the misalignment rate in comparison with the baseline Levenshtein algorithm: the number of errors has been reduced from 1.1% to 0.28%. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1308/ |
https://www.aclweb.org/anthology/L16-1308 | |
PWC | https://paperswithcode.com/paper/phoneme-alignment-using-the-information-on |
Repo | |
Framework | |
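The weighted Levenshtein alignment described in the abstract above can be sketched as a standard dynamic program where the substitution cost comes from a confusion matrix. This is a minimal sketch: the cost function and the cheap-confusion pairs below are illustrative assumptions, not the paper's trained confusion matrix.

```python
def align(ref, hyp, sub_cost, ins_cost=1.0, del_cost=1.0):
    """Minimum-cost alignment between two phoneme sequences.

    sub_cost(a, b) returns the substitution cost; phonologically
    plausible confusions (e.g. assimilation) should be cheaper."""
    n, m = len(ref), len(hyp)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * del_cost
    for j in range(1, m + 1):
        d[0][j] = j * ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + del_cost,            # deletion
                          d[i][j - 1] + ins_cost,            # insertion
                          d[i - 1][j - 1] + sub_cost(ref[i - 1], hyp[j - 1]))
    return d[n][m]

# Hypothetical confusion costs: voicing-assimilation pairs are cheap.
CHEAP = {("t", "d"), ("d", "t"), ("s", "z"), ("z", "s")}
def sub_cost(a, b):
    if a == b:
        return 0.0
    return 0.3 if (a, b) in CHEAP else 1.0

cost = align(list("sdat"), list("stat"), sub_cost)  # one cheap d/t confusion
```

Making phonologically expected confusions cheap is what steers the aligner toward linguistically sensible alignments instead of arbitrary minimum-edit ones.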
Optimal Sparse Linear Encoders and Sparse PCA
Title | Optimal Sparse Linear Encoders and Sparse PCA |
Authors | Malik Magdon-Ismail, Christos Boutsidis |
Abstract | Principal components analysis (PCA) is the optimal linear encoder of data. Sparse linear encoders (e.g., sparse PCA) produce more interpretable features that can promote better generalization. (i) Given a level of sparsity, what is the best approximation to PCA? (ii) Are there efficient algorithms which can achieve this optimal combinatorial tradeoff? We answer both questions by providing the first polynomial-time algorithms to construct optimal sparse linear auto-encoders; additionally, we demonstrate the performance of our algorithms on real data. |
Tasks | |
Published | 2016-12-01 |
URL | http://papers.nips.cc/paper/6252-optimal-sparse-linear-encoders-and-sparse-pca |
http://papers.nips.cc/paper/6252-optimal-sparse-linear-encoders-and-sparse-pca.pdf | |
PWC | https://paperswithcode.com/paper/optimal-sparse-linear-encoders-and-sparse-pca |
Repo | |
Framework | |
Syntactic parsing of chat language in contact center conversation corpus
Title | Syntactic parsing of chat language in contact center conversation corpus |
Authors | Alexis Nasr, Geraldine Damnati, Aleksandra Guerraz, Frederic Bechet |
Abstract | |
Tasks | |
Published | 2016-09-01 |
URL | https://www.aclweb.org/anthology/W16-3621/ |
https://www.aclweb.org/anthology/W16-3621 | |
PWC | https://paperswithcode.com/paper/syntactic-parsing-of-chat-language-in-contact |
Repo | |
Framework | |
Defining and Counting Phonological Classes in Cross-linguistic Segment Databases
Title | Defining and Counting Phonological Classes in Cross-linguistic Segment Databases |
Authors | Dan Dediu, Scott Moisik |
Abstract | Recently, there has been an explosion in the availability of large, good-quality cross-linguistic databases such as WALS (Dryer & Haspelmath, 2013), Glottolog (Hammarstrom et al., 2015) and Phoible (Moran & McCloy, 2014). Databases such as Phoible contain the actual segments used by various languages as they are given in the primary language descriptions. However, this segment-level representation cannot be used directly for analyses that require generalizations over classes of segments that share theoretically interesting features. Here we present a method and the associated R (R Core Team, 2014) code that allows the flexible definition of such meaningful classes and that can identify the sets of segments falling into such a class for any language inventory. The method and its results are important for those interested in exploring cross-linguistic patterns of phonetic and phonological diversity and their relationship to extra-linguistic factors and processes such as climate, economics, history or human genetics. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1310/ |
https://www.aclweb.org/anthology/L16-1310 | |
PWC | https://paperswithcode.com/paper/defining-and-counting-phonological-classes-in |
Repo | |
Framework | |
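The class-definition idea in the abstract above (the paper itself ships R code) can be sketched in Python: define a class by feature values and filter an inventory for matching segments. The tiny feature table and feature names here are hypothetical stand-ins for a real resource such as Phoible's feature data.

```python
# Hypothetical feature table: each segment maps to its feature values.
FEATURES = {
    "p": {"voice": "-", "nasal": "-", "labial": "+"},
    "b": {"voice": "+", "nasal": "-", "labial": "+"},
    "m": {"voice": "+", "nasal": "+", "labial": "+"},
    "t": {"voice": "-", "nasal": "-", "labial": "-"},
    "d": {"voice": "+", "nasal": "-", "labial": "-"},
}

def segments_in_class(inventory, **features):
    """Return the inventory's segments matching every given feature value."""
    return [s for s in inventory
            if all(FEATURES.get(s, {}).get(f) == v for f, v in features.items())]

inventory = ["p", "b", "m", "t"]           # one language's segment inventory
voiced_labials = segments_in_class(inventory, voice="+", labial="+")
```

Counting `len(segments_in_class(...))` per language is then enough to compare how heavily different inventories populate a theoretically defined class.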
Learning Embeddings to lexicalise RDF Properties
Title | Learning Embeddings to lexicalise RDF Properties |
Authors | Laura Perez-Beltrachini, Claire Gardent |
Abstract | |
Tasks | Text Generation |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/S16-2027/ |
https://www.aclweb.org/anthology/S16-2027 | |
PWC | https://paperswithcode.com/paper/learning-embeddings-to-lexicalise-rdf |
Repo | |
Framework | |
Watson Discovery Advisor: Question-answering in an industrial setting
Title | Watson Discovery Advisor: Question-answering in an industrial setting |
Authors | Charley Beller, Graham Katz, Allen Ginsberg, Chris Phipps, Sean Bethard, Paul Chase, Elinna Shek, Kristen Summers |
Abstract | |
Tasks | Open-Domain Question Answering, Question Answering |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/W16-0101/ |
https://www.aclweb.org/anthology/W16-0101 | |
PWC | https://paperswithcode.com/paper/watson-discovery-advisor-question-answering |
Repo | |
Framework | |
Open-domain Factoid Question Answering via Knowledge Graph Search
Title | Open-domain Factoid Question Answering via Knowledge Graph Search |
Authors | Ahmad Aghaebrahimian, Filip Jurčíček |
Abstract | |
Tasks | Knowledge Graphs, Open-Domain Question Answering, Question Answering |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/W16-0104/ |
https://www.aclweb.org/anthology/W16-0104 | |
PWC | https://paperswithcode.com/paper/open-domain-factoid-question-answering-via |
Repo | |
Framework | |