Paper Group NANR 11
Joint quantile regression in vector-valued RKHSs. How does Dictionary Size Influence Performance of Vietnamese Word Segmentation?. Linear Relaxations for Finding Diverse Elements in Metric Spaces. Vectors or Graphs? On Differences of Representations for Distributional Semantic Models. SemEval-2016 Task 10: Detecting Minimal Semantic Units and their …
Joint quantile regression in vector-valued RKHSs
Title | Joint quantile regression in vector-valued RKHSs |
Authors | Maxime Sangnier, Olivier Fercoq, Florence D’Alché-Buc |
Abstract | To give a more complete picture than the single average relationship provided by standard regression, a novel framework for simultaneously estimating and predicting several conditional quantiles is introduced. The proposed methodology leverages kernel-based multi-task learning to curb the embarrassing phenomenon of quantile crossing, with a one-step estimation procedure and no post-processing. Moreover, this framework comes with theoretical guarantees and an efficient coordinate descent learning algorithm. Numerical experiments on benchmark and real datasets highlight the enhancements of our approach in terms of prediction error, crossing occurrences and training time. |
Tasks | Multi-Task Learning |
Published | 2016-12-01 |
URL | http://papers.nips.cc/paper/6239-joint-quantile-regression-in-vector-valued-rkhss |
http://papers.nips.cc/paper/6239-joint-quantile-regression-in-vector-valued-rkhss.pdf | |
PWC | https://paperswithcode.com/paper/joint-quantile-regression-in-vector-valued |
Repo | |
Framework | |
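The quantile crossing the abstract refers to can be made concrete with a short sketch. This is not the paper's vector-valued RKHS estimator; the pinball loss and the crossing-rate check below are standard tools, and the function names are my own:

```python
import numpy as np

def pinball_loss(y, q, tau):
    """Standard pinball (quantile) loss at level tau: the quantity each
    conditional quantile estimate minimizes in expectation."""
    diff = y - q
    return float(np.mean(np.maximum(tau * diff, (tau - 1) * diff)))

def crossing_rate(quantile_preds):
    """Fraction of inputs where predicted quantile curves cross, i.e. a
    higher quantile level receives a lower predicted value. The paper's
    joint multi-task estimator is designed to keep this near zero without
    post-processing."""
    q = np.asarray(quantile_preds)  # shape (n_levels, n_points), levels ascending
    return float(np.mean(np.any(np.diff(q, axis=0) < 0, axis=0)))
```

Fitting each quantile independently minimizes the pinball loss per level but gives no guarantee that `crossing_rate` stays at zero, which is the motivation for estimating all levels jointly.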
How does Dictionary Size Influence Performance of Vietnamese Word Segmentation?
Title | How does Dictionary Size Influence Performance of Vietnamese Word Segmentation? |
Authors | Wuying Liu, Lin Wang |
Abstract | Vietnamese word segmentation (VWS) is a challenging basic issue for natural language processing. This paper addresses the question of how dictionary size influences VWS performance, proposes two novel measures, the square overlap ratio (SOR) and the relaxed square overlap ratio (RSOR), and validates their effectiveness. The SOR measure is the product of the dictionary overlap ratio and the corpus overlap ratio, and the RSOR measure is a relaxed version of SOR for the unsupervised setting. Both measures indicate how well a segmentation dictionary suits the corpus awaiting segmentation. The experimental results show that a suitably sized dictionary, neither too small nor too large, is what allows dictionary-based Vietnamese word segmenters to achieve state-of-the-art performance. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1172/ |
https://www.aclweb.org/anthology/L16-1172 | |
PWC | https://paperswithcode.com/paper/how-does-dictionary-size-influence |
Repo | |
Framework | |
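The abstract defines SOR only as the product of a dictionary overlap ratio and a corpus overlap ratio. A minimal sketch under that definition follows; the exact denominators of the two ratios (dictionary size and corpus vocabulary size) are my assumptions, not taken from the paper:

```python
def square_overlap_ratio(dictionary, corpus_vocab):
    """One plausible reading of the SOR measure: the product of a
    dictionary overlap ratio (shared entries / dictionary size) and a
    corpus overlap ratio (shared entries / corpus vocabulary size).
    The ratio denominators are assumptions, not the paper's definitions."""
    d, c = set(dictionary), set(corpus_vocab)
    shared = d & c
    dict_overlap = len(shared) / len(d)
    corpus_overlap = len(shared) / len(c)
    return dict_overlap * corpus_overlap
```

Under this reading the measure penalizes both an oversized dictionary (low dictionary overlap ratio) and an undersized one (low corpus overlap ratio), matching the "neither too small nor too large" finding.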
Linear Relaxations for Finding Diverse Elements in Metric Spaces
Title | Linear Relaxations for Finding Diverse Elements in Metric Spaces |
Authors | Aditya Bhaskara, Mehrdad Ghadiri, Vahab Mirrokni, Ola Svensson |
Abstract | Choosing a diverse subset of a large collection of points in a metric space is a fundamental problem, with applications in feature selection, recommender systems, web search, data summarization, etc. Various notions of diversity have been proposed, tailored to different applications. The general algorithmic goal is to find a subset of points that maximize diversity, while obeying a cardinality (or more generally, matroid) constraint. The goal of this paper is to develop a novel linear programming (LP) framework that allows us to design approximation algorithms for such problems. We study an objective known as {\em sum-min} diversity, which is known to be effective in many applications, and give the first constant factor approximation algorithm. Our LP framework allows us to easily incorporate additional constraints, as well as secondary objectives. We also prove a hardness result for two natural diversity objectives, under the so-called {\em planted clique} assumption. Finally, we study the empirical performance of our algorithm on several standard datasets. We first study the approximation quality of the algorithm by comparing with the LP objective. Then, we compare the quality of the solutions produced by our method with other popular diversity maximization algorithms. |
Tasks | Data Summarization, Feature Selection, Recommendation Systems |
Published | 2016-12-01 |
URL | http://papers.nips.cc/paper/6500-linear-relaxations-for-finding-diverse-elements-in-metric-spaces |
http://papers.nips.cc/paper/6500-linear-relaxations-for-finding-diverse-elements-in-metric-spaces.pdf | |
PWC | https://paperswithcode.com/paper/linear-relaxations-for-finding-diverse |
Repo | |
Framework | |
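The sum-min objective studied in the paper has a simple form: each selected point contributes its distance to the nearest other selected point. A sketch that evaluates the objective for a candidate subset (evaluation only; the paper's contribution is the LP relaxation that *selects* a subset with a constant-factor guarantee):

```python
def sum_min_diversity(points, dist):
    """Sum-min diversity of a selected subset: each point contributes its
    distance to the nearest *other* selected point. Maximizing this sum
    spreads the subset out in the metric space."""
    n = len(points)
    return sum(
        min(dist(points[i], points[j]) for j in range(n) if j != i)
        for i in range(n)
    )
```

For instance, on the one-dimensional subset {0, 1, 3} with absolute-difference distance, the nearest-neighbor distances are 1, 1 and 2, giving an objective value of 4.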
Vectors or Graphs? On Differences of Representations for Distributional Semantic Models
Title | Vectors or Graphs? On Differences of Representations for Distributional Semantic Models |
Authors | Chris Biemann |
Abstract | Distributional Semantic Models (DSMs) have recently received increased attention, together with the rise of neural architectures for scalable training of dense vector embeddings. While some of the literature even includes terms like {`}vectors{'} and {`}dimensionality{'} in the definition of DSMs, there are some good reasons why we should consider alternative formulations of distributional models. As an instance, I present a scalable graph-based solution to distributional semantics. The model belongs to the family of {`}count-based{'} DSMs, keeps its representation sparse and explicit, and is thus fully interpretable. I will highlight some important differences between sparse graph-based and dense vector approaches to DSMs: while dense vector-based models are computationally easier to handle and provide a nice uniform representation that can be compared and combined in many ways, they lack interpretability, provenance and robustness. On the other hand, graph-based sparse models have a more straightforward interpretation, handle sense distinctions more naturally and can straightforwardly be linked to knowledge bases, while lacking the ability to compare arbitrary lexical units and a compositionality operation. Since both representations have their merits, I opt for exploring their combination in the outlook. |
Tasks | Information Retrieval |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-5301/ |
https://www.aclweb.org/anthology/W16-5301 | |
PWC | https://paperswithcode.com/paper/vectors-or-graphs-on-differences-of |
Repo | |
Framework | |
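The sparse-versus-dense contrast the abstract draws can be made tangible with a minimal count-based co-occurrence model. This is a generic sketch, not the author's graph construction:

```python
from collections import Counter, defaultdict

def count_based_dsm(tokens, window=2):
    """Minimal count-based DSM: each word maps to an explicit Counter of
    co-occurring words within a symmetric window. Every entry is a directly
    interpretable (word, context, count) triple -- the property the paper's
    sparse graph-based models preserve and dense embeddings give up."""
    model = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                model[word][tokens[j]] += 1
    return model
```

Every nonzero entry here has a provenance (the exact co-occurrences that produced it), whereas a dense embedding dimension has no such reading, which is the interpretability trade-off the abstract describes.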
SemEval-2016 Task 10: Detecting Minimal Semantic Units and their Meanings (DiMSUM)
Title | SemEval-2016 Task 10: Detecting Minimal Semantic Units and their Meanings (DiMSUM) |
Authors | Nathan Schneider, Dirk Hovy, Anders Johannsen, Marine Carpuat |
Abstract | |
Tasks | Part-Of-Speech Tagging, Word Sense Disambiguation |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/S16-1084/ |
https://www.aclweb.org/anthology/S16-1084 | |
PWC | https://paperswithcode.com/paper/semeval-2016-task-10-detecting-minimal |
Repo | |
Framework | |
Automatic Extraction of Learner Errors in ESL Sentences Using Linguistically Enhanced Alignments
Title | Automatic Extraction of Learner Errors in ESL Sentences Using Linguistically Enhanced Alignments |
Authors | Mariano Felice, Christopher Bryant, Ted Briscoe |
Abstract | We propose a new method of automatically extracting learner errors from parallel English as a Second Language (ESL) sentences in an effort to regularise annotation formats and reduce inconsistencies. Specifically, given an original and corrected sentence, our method first uses a linguistically enhanced alignment algorithm to determine the most likely mappings between tokens, and secondly employs a rule-based function to decide which alignments should be merged. Our method beats all previous approaches on the tested datasets, achieving state-of-the-art results for automatic error extraction. |
Tasks | Grammatical Error Correction, Machine Translation |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1079/ |
https://www.aclweb.org/anthology/C16-1079 | |
PWC | https://paperswithcode.com/paper/automatic-extraction-of-learner-errors-in-esl |
Repo | |
Framework | |
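The extract-edits-from-a-sentence-pair step described in the abstract can be sketched with a generic token aligner. `difflib`'s matcher stands in for the paper's linguistically enhanced alignment algorithm and rule-based merging:

```python
import difflib

def extract_edits(orig_tokens, corr_tokens):
    """Token-level error extraction via sequence alignment: align the
    original and corrected sentences, then keep only the non-identical
    spans as (operation, original span, corrected span) edits."""
    sm = difflib.SequenceMatcher(a=orig_tokens, b=corr_tokens, autojunk=False)
    return [
        (op, orig_tokens[i1:i2], corr_tokens[j1:j2])
        for op, i1, i2, j1, j2 in sm.get_opcodes()
        if op != "equal"
    ]
```

For example, aligning "I has a cat" against "I have a cat" yields a single replace edit over "has"/"have"; the paper's contribution lies in making such alignments linguistically informed and in merging adjacent operations consistently.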
Incremental Variational Sparse Gaussian Process Regression
Title | Incremental Variational Sparse Gaussian Process Regression |
Authors | Ching-An Cheng, Byron Boots |
Abstract | Recent work on scaling up Gaussian process regression (GPR) to large datasets has primarily focused on sparse GPR, which leverages a small set of basis functions to approximate the full Gaussian process during inference. However, the majority of these approaches are batch methods that operate on the entire training dataset at once, precluding the use of datasets that are streaming or too large to fit into memory. Although previous work has considered incrementally solving variational sparse GPR, most algorithms fail to update the basis functions and therefore perform suboptimally. We propose a novel incremental learning algorithm for variational sparse GPR based on stochastic mirror ascent of probability densities in reproducing kernel Hilbert space. This new formulation allows our algorithm to update basis functions online in accordance with the manifold structure of probability densities for fast convergence. We conduct several experiments and show that our proposed approach achieves better empirical performance in terms of prediction error than the recent state-of-the-art incremental solutions to variational sparse GPR. |
Tasks | |
Published | 2016-12-01 |
URL | http://papers.nips.cc/paper/6473-incremental-variational-sparse-gaussian-process-regression |
http://papers.nips.cc/paper/6473-incremental-variational-sparse-gaussian-process-regression.pdf | |
PWC | https://paperswithcode.com/paper/incremental-variational-sparse-gaussian |
Repo | |
Framework | |
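For context, here is a sketch of the kind of batch sparse-GP baseline the abstract says incremental variational methods improve on: a Subset-of-Regressors predictive mean with a fixed set of inducing inputs. This is not the paper's mirror-ascent algorithm; the kernel choice and names are my own:

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def sor_predictive_mean(X, y, Z, X_star, noise_var=0.05):
    """Batch Subset-of-Regressors sparse GP predictive mean with inducing
    inputs Z. The basis (Z) is fixed up front and the whole dataset is
    processed at once -- exactly the limitations the paper's incremental
    variational method addresses by updating the basis online."""
    Kzz = rbf_kernel(Z, Z)
    Kzx = rbf_kernel(Z, X)
    Ksz = rbf_kernel(X_star, Z)
    A = Kzz + Kzx @ Kzx.T / noise_var + 1e-8 * np.eye(len(Z))
    w = np.linalg.solve(A, Kzx @ y / noise_var)
    return Ksz @ w
```

With `Z` equal to the full training set this reduces to the exact GP posterior mean; sparsity comes from choosing far fewer inducing inputs than data points, and the incremental setting must additionally revise `Z` as data stream in.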
Adapting Event Embedding for Implicit Discourse Relation Recognition
Title | Adapting Event Embedding for Implicit Discourse Relation Recognition |
Authors | Maria Leonor Pacheco, I-Ta Lee, Xiao Zhang, Abdullah Khan Zehady, Pranjal Daga, Di Jin, Ayush Parolia, Dan Goldwasser |
Abstract | |
Tasks | Feature Engineering |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/K16-2019/ |
https://www.aclweb.org/anthology/K16-2019 | |
PWC | https://paperswithcode.com/paper/adapting-event-embedding-for-implicit |
Repo | |
Framework | |
Samsung Poland NLP Team at SemEval-2016 Task 1: Necessity for diversity; combining recursive autoencoders, WordNet and ensemble methods to measure semantic similarity.
Title | Samsung Poland NLP Team at SemEval-2016 Task 1: Necessity for diversity; combining recursive autoencoders, WordNet and ensemble methods to measure semantic similarity. |
Authors | Barbara Rychalska, Katarzyna Pakulska, Krystyna Chodorowska, Wojciech Walczak, Piotr Andruszkiewicz |
Abstract | |
Tasks | Machine Translation, Question Answering, Semantic Similarity, Semantic Textual Similarity |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/S16-1091/ |
https://www.aclweb.org/anthology/S16-1091 | |
PWC | https://paperswithcode.com/paper/samsung-poland-nlp-team-at-semeval-2016-task |
Repo | |
Framework | |
Leveraging Captions in the Wild to Improve Object Detection
Title | Leveraging Captions in the Wild to Improve Object Detection |
Authors | Mert Kilickaya, Nazli Ikizler-Cinbis, Erkut Erdem, Aykut Erdem |
Abstract | |
Tasks | Object Detection |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-3204/ |
https://www.aclweb.org/anthology/W16-3204 | |
PWC | https://paperswithcode.com/paper/leveraging-captions-in-the-wild-to-improve |
Repo | |
Framework | |
Generalized Correspondence-LDA Models (GC-LDA) for Identifying Functional Regions in the Brain
Title | Generalized Correspondence-LDA Models (GC-LDA) for Identifying Functional Regions in the Brain |
Authors | Timothy Rubin, Oluwasanmi O. Koyejo, Michael N. Jones, Tal Yarkoni |
Abstract | This paper presents Generalized Correspondence-LDA (GC-LDA), a generalization of the Correspondence-LDA model that allows for variable spatial representations to be associated with topics, and increased flexibility in terms of the strength of the correspondence between data types induced by the model. We present three variants of GC-LDA, each of which associates topics with a different spatial representation, and apply them to a corpus of neuroimaging data. In the context of this dataset, each topic corresponds to a functional brain region, where the region’s spatial extent is captured by a probability distribution over neural activity, and the region’s cognitive function is captured by a probability distribution over linguistic terms. We illustrate the qualitative improvements offered by GC-LDA in terms of the types of topics extracted with alternative spatial representations, as well as the model’s ability to incorporate a-priori knowledge from the neuroimaging literature. We furthermore demonstrate that the novel features of GC-LDA improve predictions for missing data. |
Tasks | |
Published | 2016-12-01 |
URL | http://papers.nips.cc/paper/6274-generalized-correspondence-lda-models-gc-lda-for-identifying-functional-regions-in-the-brain |
http://papers.nips.cc/paper/6274-generalized-correspondence-lda-models-gc-lda-for-identifying-functional-regions-in-the-brain.pdf | |
PWC | https://paperswithcode.com/paper/generalized-correspondence-lda-models-gc-lda |
Repo | |
Framework | |
SAARSHEFF at SemEval-2016 Task 1: Semantic Textual Similarity with Machine Translation Evaluation Metrics and (eXtreme) Boosted Tree Ensembles
Title | SAARSHEFF at SemEval-2016 Task 1: Semantic Textual Similarity with Machine Translation Evaluation Metrics and (eXtreme) Boosted Tree Ensembles |
Authors | Liling Tan, Carolina Scarton, Lucia Specia, Josef van Genabith |
Abstract | |
Tasks | Machine Translation, Semantic Textual Similarity |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/S16-1095/ |
https://www.aclweb.org/anthology/S16-1095 | |
PWC | https://paperswithcode.com/paper/saarsheff-at-semeval-2016-task-1-semantic |
Repo | |
Framework | |
Recognizing Salient Entities in Shopping Queries
Title | Recognizing Salient Entities in Shopping Queries |
Authors | Zornitsa Kozareva, Qi Li, Ke Zhai, Weiwei Guo |
Abstract | |
Tasks | Feature Engineering, Structured Prediction, Word Embeddings |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/P16-2018/ |
https://www.aclweb.org/anthology/P16-2018 | |
PWC | https://paperswithcode.com/paper/recognizing-salient-entities-in-shopping |
Repo | |
Framework | |
Towards a resource based on users’ knowledge to overcome the Tip of the Tongue problem.
Title | Towards a resource based on users’ knowledge to overcome the Tip of the Tongue problem. |
Authors | Michael Zock, Chris Biemann |
Abstract | Language production is largely a matter of words which, in the case of access problems, can be searched for in an external resource (lexicon, thesaurus). In this kind of dialogue the user provides the momentarily available knowledge concerning the target and the system responds with the best guess(es) it can make given this input. As tip-of-the-tongue (ToT) studies have shown, people always have some knowledge concerning the target (meaning fragments, number of syllables, …) even if its complete form eludes them. We will show here how to tap into this knowledge to build a resource likely to help authors (speakers/writers) overcome the ToT problem. Yet, before doing so we need a better understanding of the various kinds of knowledge people have when looking for a word. To this end, we asked crowdworkers to provide some cues describing a given target and to then specify how each cue relates to the target, in the hope that this could help others find the elusive word. Next, we checked how well a given search strategy worked when applied to differently built lexical networks. The results showed quite dramatic differences, which is not really surprising: different networks are built for different purposes, hence each is more or less suited to a given task. What was more surprising, though, is that the relational information given by the users did not allow us to find the elusive word in WordNet any better than without it. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-5308/ |
https://www.aclweb.org/anthology/W16-5308 | |
PWC | https://paperswithcode.com/paper/towards-a-resource-based-on-users-knowledge |
Repo | |
Framework | |
Exploring Different Preposition Sets, Models and Feature Sets in Automatic Generation of Spatial Image Descriptions
Title | Exploring Different Preposition Sets, Models and Feature Sets in Automatic Generation of Spatial Image Descriptions |
Authors | Anja Belz, Adrian Muscat, Brandon Birmingham |
Abstract | |
Tasks | |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-3209/ |
https://www.aclweb.org/anthology/W16-3209 | |
PWC | https://paperswithcode.com/paper/exploring-different-preposition-sets-models |
Repo | |
Framework | |