Paper Group NANR 45
Developing a Dataset for Evaluating Approaches for Document Expansion with Images. Tight Complexity Bounds for Optimizing Composite Objectives. A Dataset for Multimodal Question Answering in the Cultural Heritage Domain. More than Word Cooccurrence: Exploring Support and Opposition in International Climate Negotiations with Semantic Parsing. A Long …
Developing a Dataset for Evaluating Approaches for Document Expansion with Images
Title | Developing a Dataset for Evaluating Approaches for Document Expansion with Images |
Authors | Debasis Ganguly, Iacer Calixto, Gareth Jones |
Abstract | Motivated by the adage that a "picture is worth a thousand words", it can be reasoned that automatically enriching the textual content of a document with relevant images can increase its readability. Moreover, features extracted from the additional image data inserted into the textual content of a document may, in principle, also be used by a retrieval engine to better match the topic of a document with that of a given query. In this paper, we describe our approach of building a ground truth dataset to enable further research into the automatic addition of relevant images to text documents. The dataset is comprised of the official ImageCLEF 2010 collection (a collection of images with textual metadata) to serve as the images available for automatic enrichment of text, a set of 25 benchmark documents that are to be enriched, which in this case are children's short stories, and a set of manually judged relevant images for each query story obtained by the standard procedure of depth pooling. We use this benchmark dataset to evaluate the effectiveness of standard information retrieval methods as simple baselines for this task. The results indicate that using the whole story as a weighted query, where the weight of each query term is its tf-idf value, achieves a precision of 0.1714 within the top 5 retrieved images on average. |
Tasks | Information Retrieval |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1299/ |
https://www.aclweb.org/anthology/L16-1299 | |
PWC | https://paperswithcode.com/paper/developing-a-dataset-for-evaluating |
Repo | |
Framework | |
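The tf-idf weighted-query baseline from the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tokenization, toy collection, and additive overlap scoring are all assumptions.

```python
import math
from collections import Counter

def tfidf_weights(story_tokens, corpus_docs):
    """Weight each story term by its tf-idf over the image-metadata collection."""
    n = len(corpus_docs)
    df = Counter()
    for doc in corpus_docs:
        df.update(set(doc))  # document frequency: one count per doc
    tf = Counter(story_tokens)
    return {t: tf[t] * math.log((1 + n) / (1 + df[t])) for t in tf}

def score(metadata_tokens, weights):
    """Score an image by the summed weights of query terms found in its metadata."""
    return sum(weights.get(t, 0.0) for t in metadata_tokens)

# Toy example: rank image metadata against a short "story" used as the query.
corpus = [["dragon", "castle", "knight"], ["cat", "sat", "mat"], ["knight", "sword"]]
story = ["the", "brave", "knight", "and", "the", "dragon"]
w = tfidf_weights(story, corpus)
ranked = sorted(corpus, key=lambda d: score(d, w), reverse=True)
```

Rarer story terms ("dragon") outweigh common ones ("knight"), so metadata matching the rarer terms ranks first; a real system would rank all ImageCLEF images this way and take the top 5.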
Tight Complexity Bounds for Optimizing Composite Objectives
Title | Tight Complexity Bounds for Optimizing Composite Objectives |
Authors | Blake E. Woodworth, Nati Srebro |
Abstract | We provide tight upper and lower bounds on the complexity of minimizing the average of m convex functions using gradient and prox oracles of the component functions. We show a significant gap between the complexity of deterministic vs randomized optimization. For smooth functions, we show that accelerated gradient descent (AGD) and an accelerated variant of SVRG are optimal in the deterministic and randomized settings respectively, and that a gradient oracle is sufficient for the optimal rate. For non-smooth functions, having access to prox oracles reduces the complexity and we present optimal methods based on smoothing that improve over methods using just gradient accesses. |
Tasks | |
Published | 2016-12-01 |
URL | http://papers.nips.cc/paper/6058-tight-complexity-bounds-for-optimizing-composite-objectives |
http://papers.nips.cc/paper/6058-tight-complexity-bounds-for-optimizing-composite-objectives.pdf | |
PWC | https://paperswithcode.com/paper/tight-complexity-bounds-for-optimizing-1 |
Repo | |
Framework | |
A Dataset for Multimodal Question Answering in the Cultural Heritage Domain
Title | A Dataset for Multimodal Question Answering in the Cultural Heritage Domain |
Authors | Shurong Sheng, Luc Van Gool, Marie-Francine Moens |
Abstract | Multimodal question answering in the cultural heritage domain allows visitors to ask questions in a more natural way and thus provides better user experiences with cultural objects while visiting a museum, landmark or any other historical site. In this paper, we introduce the construction of a gold-standard dataset that will aid research on multimodal question answering in the cultural heritage domain. The dataset, which will soon be released to the public, contains multimodal content including images of typical artworks from the fascinating old-Egyptian Amarna period, related image-containing documents of the artworks and over 800 multimodal queries integrating visual and textual questions. The multimodal questions and related documents are all in English. The multimodal questions are linked to relevant paragraphs in the related documents that contain the answer to the multimodal query. |
Tasks | Question Answering, Speech Recognition, Visual Question Answering |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4003/ |
https://www.aclweb.org/anthology/W16-4003 | |
PWC | https://paperswithcode.com/paper/a-dataset-for-multimodal-question-answering |
Repo | |
Framework | |
More than Word Cooccurrence: Exploring Support and Opposition in International Climate Negotiations with Semantic Parsing
Title | More than Word Cooccurrence: Exploring Support and Opposition in International Climate Negotiations with Semantic Parsing |
Authors | Pablo Ruiz, Clément Plancq, Thierry Poibeau |
Abstract | Text analysis methods widely used in digital humanities often involve word co-occurrence, e.g. concept co-occurrence networks. These methods provide a useful corpus overview, but cannot determine the predicates that relate co-occurring concepts. Our goal was identifying propositions expressing the points supported or opposed by participants in international climate negotiations. Word co-occurrence methods were not sufficient, and an analysis based on open relation extraction had limited coverage for nominal predicates. We present a pipeline which identifies the points that different actors support and oppose, via a domain model with support/opposition predicates, and analysis rules that exploit the output of semantic role labelling, syntactic dependencies and anaphora resolution. Entity linking and keyphrase extraction are also performed on the propositions related to each actor. A user interface allows examining the main concepts in points supported or opposed by each participant, which participants agree or disagree with each other, and about which issues. The system is an example of tools that digital humanities scholars are asking for, to render rich textual information (beyond word co-occurrence) more amenable to quantitative treatment. An evaluation of the tool was satisfactory. |
Tasks | Entity Linking, Relation Extraction, Semantic Parsing |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1300/ |
https://www.aclweb.org/anthology/L16-1300 | |
PWC | https://paperswithcode.com/paper/more-than-word-cooccurrence-exploring-support |
Repo | |
Framework | |
A Long Short-Term Memory Framework for Predicting Humor in Dialogues
Title | A Long Short-Term Memory Framework for Predicting Humor in Dialogues |
Authors | Dario Bertero, Pascale Fung |
Abstract | |
Tasks | |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/N16-1016/ |
https://www.aclweb.org/anthology/N16-1016 | |
PWC | https://paperswithcode.com/paper/a-long-short-term-memory-framework-for |
Repo | |
Framework | |
Analyzing Time Series Changes of Correlation between Market Share and Concerns on Companies measured through Search Engine Suggests
Title | Analyzing Time Series Changes of Correlation between Market Share and Concerns on Companies measured through Search Engine Suggests |
Authors | Takakazu Imada, Yusuke Inoue, Lei Chen, Syunya Doi, Tian Nie, Chen Zhao, Takehito Utsuro, Yasuhide Kawada |
Abstract | This paper proposes how to utilize a search engine in order to predict market shares. Given a specific product domain, we propose to compare the rates of concern among several companies which supply products in that domain, where concern is measured through search engine suggests. We then analyze whether these rates of concern correlate with actual market share, and show that they do. We finally propose how to predict the market share of a specific product genre based on these rates of concern. |
Tasks | Time Series |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1303/ |
https://www.aclweb.org/anthology/L16-1303 | |
PWC | https://paperswithcode.com/paper/analyzing-time-series-changes-of-correlation |
Repo | |
Framework | |
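The correlation analysis sketched in the abstract above can be illustrated with a short Python snippet. The suggest counts and market-share figures here are invented placeholders, and the plain Pearson correlation is an assumed choice of statistic, not necessarily the one used by the authors.

```python
import statistics

def pearson(x, y):
    """Pearson correlation between two equal-length numeric series."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Hypothetical per-company "concern" counts derived from search-engine suggests,
# normalized into rates, compared against actual market shares.
suggest_counts = [120, 80, 40, 10]
total = sum(suggest_counts)
rates = [c / total for c in suggest_counts]
market_share = [0.45, 0.30, 0.18, 0.07]
r = pearson(rates, market_share)
```

A high positive `r` on such data is what would support using suggest-derived concern rates as a market-share predictor; the paper's time-series analysis would repeat this over successive periods.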
The Royal Society Corpus: From Uncharted Data to Corpus
Title | The Royal Society Corpus: From Uncharted Data to Corpus |
Authors | Hannah Kermes, Stefania Degaetano-Ortlieb, Ashraf Khamis, Jörg Knappen, Elke Teich |
Abstract | We present the Royal Society Corpus (RSC) built from the Philosophical Transactions and Proceedings of the Royal Society of London. At present, the corpus contains articles from the first two centuries of the journal (1665–1869) and amounts to around 35 million tokens. The motivation for building the RSC is to investigate the diachronic linguistic development of scientific English. Specifically, we assume that due to specialization, linguistic encodings become more compact over time (Halliday, 1988; Halliday and Martin, 1993), thus creating a specific discourse type characterized by high information density that is functional for expert communication. When building corpora from uncharted material, typically not all relevant meta-data (e.g. author, time, genre) or linguistic data (e.g. sentence/word boundaries, words, parts of speech) is readily available. We present an approach to obtain good quality meta-data and base text data adopting the concept of Agile Software Development. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1305/ |
https://www.aclweb.org/anthology/L16-1305 | |
PWC | https://paperswithcode.com/paper/the-royal-society-corpus-from-uncharted-data |
Repo | |
Framework | |
Building Evaluation Datasets for Consumer-Oriented Information Retrieval
Title | Building Evaluation Datasets for Consumer-Oriented Information Retrieval |
Authors | Lorraine Goeuriot, Liadh Kelly, Guido Zuccon, Joao Palotti |
Abstract | Common people often experience difficulties in accessing relevant, correct, accurate and understandable health information online. Developing search techniques that aid these information needs is challenging. In this paper we present the datasets created by the CLEF eHealth Lab from 2013-2015 for evaluation of search solutions to support common people finding health information online. Specifically, the CLEF eHealth information retrieval (IR) task of this Lab has provided the research community with benchmarks for evaluating consumer-centered health information retrieval, thus fostering research and development aimed at addressing this challenging problem. Given consumer queries, the goal of the task is to retrieve relevant documents from the provided collection of web pages. The shared datasets provide a large health web crawl, queries representing people's real world information needs, and relevance assessment judgements for the queries. |
Tasks | Information Retrieval |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1306/ |
https://www.aclweb.org/anthology/L16-1306 | |
PWC | https://paperswithcode.com/paper/building-evaluation-datasets-for-consumer |
Repo | |
Framework | |
Phoneme Alignment Using the Information on Phonological Processes in Continuous Speech
Title | Phoneme Alignment Using the Information on Phonological Processes in Continuous Speech |
Authors | Daniil Kocharov |
Abstract | The current study focuses on optimization of the Levenshtein algorithm for the purpose of computing the optimal alignment between two phoneme transcriptions of a spoken utterance containing sequences of phonetic symbols. The alignment is computed with the help of a confusion matrix in which costs for phonetic symbol deletion, insertion and substitution are defined taking into account various phonological processes that occur in fluent speech, such as anticipatory assimilation, phone elision and epenthesis. A corpus containing about 30 hours of Russian read speech was used to evaluate the presented algorithms. The experimental results have shown a significant reduction of the misalignment rate in comparison with the baseline Levenshtein algorithm: the number of errors has been reduced from 1.1% to 0.28%. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1308/ |
https://www.aclweb.org/anthology/L16-1308 | |
PWC | https://paperswithcode.com/paper/phoneme-alignment-using-the-information-on |
Repo | |
Framework | |
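The weighted Levenshtein alignment described in the abstract above can be sketched as a standard dynamic program where the substitution cost comes from a confusion matrix. This is a minimal sketch: the cost function and the cheap-confusion pairs below are illustrative assumptions, not the paper's trained confusion matrix.

```python
def align(ref, hyp, sub_cost, ins_cost=1.0, del_cost=1.0):
    """Minimum-cost alignment between two phoneme sequences.

    sub_cost(a, b) returns the substitution cost; phonologically
    plausible confusions (e.g. assimilation) should be cheaper."""
    n, m = len(ref), len(hyp)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * del_cost
    for j in range(1, m + 1):
        d[0][j] = j * ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + del_cost,            # deletion
                          d[i][j - 1] + ins_cost,            # insertion
                          d[i - 1][j - 1] + sub_cost(ref[i - 1], hyp[j - 1]))
    return d[n][m]

# Hypothetical confusion costs: voicing-assimilation pairs are cheap.
CHEAP = {("t", "d"), ("d", "t"), ("s", "z"), ("z", "s")}
def sub_cost(a, b):
    if a == b:
        return 0.0
    return 0.3 if (a, b) in CHEAP else 1.0

cost = align(list("sdat"), list("stat"), sub_cost)  # one cheap d/t confusion
```

Making phonologically expected confusions cheap is what steers the aligner toward linguistically sensible alignments instead of arbitrary minimum-edit ones.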
Optimal Sparse Linear Encoders and Sparse PCA
Title | Optimal Sparse Linear Encoders and Sparse PCA |
Authors | Malik Magdon-Ismail, Christos Boutsidis |
Abstract | Principal components analysis (PCA) is the optimal linear encoder of data. Sparse linear encoders (e.g., sparse PCA) produce more interpretable features that can promote better generalization. (i) Given a level of sparsity, what is the best approximation to PCA? (ii) Are there efficient algorithms which can achieve this optimal combinatorial tradeoff? We answer both questions by providing the first polynomial-time algorithms to construct optimal sparse linear auto-encoders; additionally, we demonstrate the performance of our algorithms on real data. |
Tasks | |
Published | 2016-12-01 |
URL | http://papers.nips.cc/paper/6252-optimal-sparse-linear-encoders-and-sparse-pca |
http://papers.nips.cc/paper/6252-optimal-sparse-linear-encoders-and-sparse-pca.pdf | |
PWC | https://paperswithcode.com/paper/optimal-sparse-linear-encoders-and-sparse-pca |
Repo | |
Framework | |
Syntactic parsing of chat language in contact center conversation corpus
Title | Syntactic parsing of chat language in contact center conversation corpus |
Authors | Alexis Nasr, Geraldine Damnati, Aleksandra Guerraz, Frederic Bechet |
Abstract | |
Tasks | |
Published | 2016-09-01 |
URL | https://www.aclweb.org/anthology/W16-3621/ |
https://www.aclweb.org/anthology/W16-3621 | |
PWC | https://paperswithcode.com/paper/syntactic-parsing-of-chat-language-in-contact |
Repo | |
Framework | |
Defining and Counting Phonological Classes in Cross-linguistic Segment Databases
Title | Defining and Counting Phonological Classes in Cross-linguistic Segment Databases |
Authors | Dan Dediu, Scott Moisik |
Abstract | Recently, there has been an explosion in the availability of large, good-quality cross-linguistic databases such as WALS (Dryer & Haspelmath, 2013), Glottolog (Hammarstrom et al., 2015) and Phoible (Moran & McCloy, 2014). Databases such as Phoible contain the actual segments used by various languages as they are given in the primary language descriptions. However, this segment-level representation cannot be used directly for analyses that require generalizations over classes of segments that share theoretically interesting features. Here we present a method and the associated R (R Core Team, 2014) code that allows the flexible definition of such meaningful classes and that can identify the sets of segments falling into such a class for any language inventory. The method and its results are important for those interested in exploring cross-linguistic patterns of phonetic and phonological diversity and their relationship to extra-linguistic factors and processes such as climate, economics, history or human genetics. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1310/ |
https://www.aclweb.org/anthology/L16-1310 | |
PWC | https://paperswithcode.com/paper/defining-and-counting-phonological-classes-in |
Repo | |
Framework | |
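The class-definition idea in the abstract above (the paper itself ships R code) can be sketched in Python: define a class by feature values and filter an inventory for matching segments. The tiny feature table and feature names here are hypothetical stand-ins for a real resource such as Phoible's feature data.

```python
# Hypothetical feature table: each segment maps to its feature values.
FEATURES = {
    "p": {"voice": "-", "nasal": "-", "labial": "+"},
    "b": {"voice": "+", "nasal": "-", "labial": "+"},
    "m": {"voice": "+", "nasal": "+", "labial": "+"},
    "t": {"voice": "-", "nasal": "-", "labial": "-"},
    "d": {"voice": "+", "nasal": "-", "labial": "-"},
}

def segments_in_class(inventory, **features):
    """Return the inventory's segments matching every given feature value."""
    return [s for s in inventory
            if all(FEATURES.get(s, {}).get(f) == v for f, v in features.items())]

inventory = ["p", "b", "m", "t"]           # one language's segment inventory
voiced_labials = segments_in_class(inventory, voice="+", labial="+")
```

Counting `len(segments_in_class(...))` per language is then enough to compare how heavily different inventories populate a theoretically defined class.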
Learning Embeddings to lexicalise RDF Properties
Title | Learning Embeddings to lexicalise RDF Properties |
Authors | Laura Perez-Beltrachini, Claire Gardent |
Abstract | |
Tasks | Text Generation |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/S16-2027/ |
https://www.aclweb.org/anthology/S16-2027 | |
PWC | https://paperswithcode.com/paper/learning-embeddings-to-lexicalise-rdf |
Repo | |
Framework | |
Watson Discovery Advisor: Question-answering in an industrial setting
Title | Watson Discovery Advisor: Question-answering in an industrial setting |
Authors | Charley Beller, Graham Katz, Allen Ginsberg, Chris Phipps, Sean Bethard, Paul Chase, Elinna Shek, Kristen Summers |
Abstract | |
Tasks | Open-Domain Question Answering, Question Answering |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/W16-0101/ |
https://www.aclweb.org/anthology/W16-0101 | |
PWC | https://paperswithcode.com/paper/watson-discovery-advisor-question-answering |
Repo | |
Framework | |
Open-domain Factoid Question Answering via Knowledge Graph Search
Title | Open-domain Factoid Question Answering via Knowledge Graph Search |
Authors | Ahmad Aghaebrahimian, Filip Jurčíček |
Abstract | |
Tasks | Knowledge Graphs, Open-Domain Question Answering, Question Answering |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/W16-0104/ |
https://www.aclweb.org/anthology/W16-0104 | |
PWC | https://paperswithcode.com/paper/open-domain-factoid-question-answering-via |
Repo | |
Framework | |