July 26, 2019

Paper Group NANR 192

Topical Coherence in LDA-based Models through Induced Segmentation

Title Topical Coherence in LDA-based Models through Induced Segmentation
Authors Hesam Amoualian, Wei Lu, Eric Gaussier, Georgios Balikas, Massih R. Amini, Marianne Clausel
Abstract This paper presents an LDA-based model that generates topically coherent segments within documents by jointly segmenting documents and assigning topics to their words. The coherence between topics is ensured through a copula, binding the topics associated with the words of a segment. In addition, this model relies on both document-specific and segment-specific topic distributions so as to capture fine-grained differences in topic assignments. We show that the proposed model naturally encompasses other state-of-the-art LDA-based models designed for similar tasks. Furthermore, our experiments, conducted on six different publicly available datasets, show the effectiveness of our model in terms of perplexity, Normalized Pointwise Mutual Information, which captures the coherence between the generated topics, and the Micro F1 measure for text classification.
Tasks Ad-Hoc Information Retrieval, Information Retrieval, Text Classification, Topic Models
Published 2017-07-01
URL https://www.aclweb.org/anthology/P17-1165/
PDF https://www.aclweb.org/anthology/P17-1165
PWC https://paperswithcode.com/paper/topical-coherence-in-lda-based-models-through
Repo
Framework
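
The abstract above evaluates topic coherence with Normalized Pointwise Mutual Information (NPMI). As a rough illustration of that metric only (not the authors' evaluation code), the sketch below computes NPMI from document-level co-occurrence. The function names, the representation of documents as sets of words, and the handling of the degenerate cases are assumptions made for this example.

```python
import math
from itertools import combinations


def npmi(word_a, word_b, documents):
    """NPMI of two words from document co-occurrence; values lie in [-1, 1].

    `documents` is a list of sets of words (an assumed input format).
    """
    n = len(documents)
    p_a = sum(word_a in d for d in documents) / n
    p_b = sum(word_b in d for d in documents) / n
    p_ab = sum(word_a in d and word_b in d for d in documents) / n
    if p_ab == 0:
        return -1.0  # words never co-occur: conventional lower bound
    if p_ab == 1:
        return 0.0  # NPMI is undefined here; 0.0 is this sketch's convention
    pmi = math.log(p_ab / (p_a * p_b))
    return pmi / -math.log(p_ab)


def topic_coherence(top_words, documents):
    """Average NPMI over all pairs of a topic's top words."""
    pairs = list(combinations(top_words, 2))
    return sum(npmi(a, b, documents) for a, b in pairs) / len(pairs)
```

Higher average NPMI means a topic's top words tend to appear in the same documents more often than chance would predict.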

WING-NUS at SemEval-2017 Task 10: Keyphrase Extraction and Classification as Joint Sequence Labeling

Title WING-NUS at SemEval-2017 Task 10: Keyphrase Extraction and Classification as Joint Sequence Labeling
Authors Animesh Prasad, Min-Yen Kan
Abstract We describe an end-to-end pipeline processing approach for SemEval 2017's Task 10 to extract keyphrases and their relations from scientific publications. We jointly identify and classify keyphrases by modeling the subtasks as sequential labeling. Our system utilizes standard, surface-level features along with the adjacent word features, and performs conditional decoding on whole text to extract keyphrases. We focus only on the identification and typing of keyphrases (Subtasks A and B, together referred to as extraction), but provide an end-to-end system inclusive of keyphrase relation identification (Subtask C) for completeness. Our top performing configuration achieves an $F_1$ of 0.27 for the end-to-end keyphrase extraction and relation identification scenario on the final test data, and performs on par with other top ranked systems for keyphrase extraction. Our system outperforms other techniques that do not employ global decoding and hence do not account for dependencies between keyphrases. We believe this is crucial for keyphrase classification in the given context of scientific document mining.
Tasks
Published 2017-08-01
URL https://www.aclweb.org/anthology/S17-2170/
PDF https://www.aclweb.org/anthology/S17-2170
PWC https://paperswithcode.com/paper/wing-nus-at-semeval-2017-task-10-keyphrase
Repo
Framework
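
Treating identification and typing as one sequence-labeling problem means each token tag carries both the span boundary and the keyphrase type. The sketch below shows how such joint tags could be decoded back into typed keyphrases. The BIO tag scheme, the function name `decode_keyphrases`, and the example tags are assumptions for illustration; the abstract does not specify the tag encoding the system used.

```python
def decode_keyphrases(tokens, tags):
    """Turn joint BIO-type tags (e.g. "B-Process", "I-Process", "O")
    into a list of (phrase, type) pairs."""
    spans = []
    current_tokens, current_type = [], None

    def close_span():
        # Emit the span collected so far, if any.
        if current_tokens:
            spans.append((" ".join(current_tokens), current_type))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            close_span()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_type:
            current_tokens.append(token)
        else:
            # "O", or an "I-" tag that does not continue the open span,
            # ends the current span in this sketch.
            close_span()
            current_tokens, current_type = [], None
    close_span()
    return spans
```

For example, tags of `B-Process I-Process I-Process` over "conditional random fields" yield one keyphrase of type `Process`.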

Distributed Batch Gaussian Process Optimization

Title Distributed Batch Gaussian Process Optimization
Authors Erik A. Daxberger, Bryan Kian Hsiang Low
Abstract This paper presents a novel distributed batch Gaussian process upper confidence bound (DB-GP-UCB) algorithm for performing batch Bayesian optimization (BO) of highly complex, costly-to-evaluate black-box objective functions. In contrast to existing batch BO algorithms, DB-GP-UCB can jointly optimize a batch of inputs (as opposed to selecting the inputs of a batch one at a time) while still preserving scalability in the batch size. To realize this, we generalize GP-UCB to a new batch variant amenable to a Markov approximation, which can then be naturally formulated as a multi-agent distributed constraint optimization problem in order to fully exploit the efficiency of its state-of-the-art solvers for achieving linear time in the batch size. Our DB-GP-UCB algorithm offers practitioners the flexibility to trade off between the approximation quality and time efficiency by varying the Markov order. We provide a theoretical guarantee for the convergence rate of DB-GP-UCB via bounds on its cumulative regret. Empirical evaluation on synthetic benchmark objective functions and a real-world optimization problem shows that DB-GP-UCB outperforms the state-of-the-art batch BO algorithms.
Tasks
Published 2017-08-01
URL https://icml.cc/Conferences/2017/Schedule?showEvent=689
PDF http://proceedings.mlr.press/v70/daxberger17a/daxberger17a.pdf
PWC https://paperswithcode.com/paper/distributed-batch-gaussian-process
Repo
Framework
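
DB-GP-UCB builds on the GP-UCB acquisition rule, which scores an input by its posterior mean plus a multiple of its posterior standard deviation. The sketch below illustrates only that single-point rule, given posterior values supplied by the caller. It is not the batch or distributed algorithm from the paper, and the function names and the dictionary input format are assumptions for this example.

```python
import math


def ucb_score(mean, std, beta):
    """GP-UCB acquisition value: posterior mean + sqrt(beta) * posterior std."""
    return mean + math.sqrt(beta) * std


def select_next_input(posterior, beta):
    """Pick the candidate with the highest UCB score.

    `posterior` maps each candidate input to a (mean, std) pair that would
    normally come from a fitted Gaussian process (assumed given here).
    """
    return max(posterior, key=lambda x: ucb_score(*posterior[x], beta))
```

A larger `beta` favors uncertain candidates (exploration), while `beta = 0` reduces to picking the highest posterior mean (exploitation).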

Probabilistic Submodular Maximization in Sub-Linear Time

Title Probabilistic Submodular Maximization in Sub-Linear Time
Authors Serban Stan, Morteza Zadimoghaddam, Andreas Krause, Amin Karbasi
Abstract In this paper, we consider optimizing submodular functions that are drawn from some unknown distribution. This setting arises, e.g., in recommender systems, where the utility of a subset of items may depend on a user-specific submodular utility function. In modern applications, the ground set of items is often so large that even the widely used (lazy) greedy algorithm is not efficient enough. As a remedy, we introduce the problem of sublinear time probabilistic submodular maximization: Given training examples of functions (e.g., via user feature vectors), we seek to reduce the ground set so that optimizing new functions drawn from the same distribution will provide almost as much value when restricted to the reduced ground set as when using the full set. We cast this problem as a two-stage submodular maximization and develop a novel efficient algorithm for this problem which offers $1/2(1 - 1/e^2)$ approximation ratio for general monotone submodular functions and general matroid constraints. We demonstrate the effectiveness of our approach on several real-world applications where running the maximization problem on the reduced ground set leads to two orders of magnitude speed-up while incurring almost no loss.
Tasks Recommendation Systems
Published 2017-08-01
URL https://icml.cc/Conferences/2017/Schedule?showEvent=840
PDF http://proceedings.mlr.press/v70/stan17a/stan17a.pdf
PWC https://paperswithcode.com/paper/probabilistic-submodular-maximization-in-sub
Repo
Framework
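
The abstract refers to the widely used greedy algorithm as the baseline that becomes too slow on very large ground sets. For orientation, the sketch below shows plain greedy maximization of a coverage function (a standard monotone submodular function) under a cardinality constraint. It is not the paper's two-stage algorithm, and the coverage objective, function names, and tie-breaking by sorted item order are assumptions for this example.

```python
def coverage(selected, item_to_elements):
    """Number of distinct elements covered by the selected items."""
    covered = set()
    for item in selected:
        covered |= item_to_elements[item]
    return len(covered)


def greedy_max_coverage(item_to_elements, k):
    """Greedily pick up to k items, each time adding the largest marginal gain."""
    selected = []
    remaining = set(item_to_elements)
    for _ in range(k):
        base = coverage(selected, item_to_elements)
        best_item, best_gain = None, 0
        for item in sorted(remaining):  # sorted for deterministic tie-breaking
            gain = coverage(selected + [item], item_to_elements) - base
            if gain > best_gain:
                best_item, best_gain = item, gain
        if best_item is None:
            break  # no remaining item adds value
        selected.append(best_item)
        remaining.remove(best_item)
    return selected
```

Each round scans every remaining item, which is the cost that motivates shrinking the ground set before optimizing.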

Varying Linguistic Purposes of Emoji in (Twitter) Context

Title Varying Linguistic Purposes of Emoji in (Twitter) Context
Authors Noa Na'aman, Hannah Provenza, Orion Montoya
Abstract
Tasks Feature Engineering, Word Embeddings
Published 2017-07-01
URL https://www.aclweb.org/anthology/P17-3022/
PDF https://www.aclweb.org/anthology/P17-3022
PWC https://paperswithcode.com/paper/varying-linguistic-purposes-of-emoji-in
Repo
Framework

Learning the Structure of Variable-Order CRFs: a finite-state perspective

Title Learning the Structure of Variable-Order CRFs: a finite-state perspective
Authors Thomas Lavergne, François Yvon
Abstract The computational complexity of linear-chain Conditional Random Fields (CRFs) makes it difficult to deal with very large label sets and long range dependencies. Such situations are not rare and arise when dealing with morphologically rich languages or joint labelling tasks. We extend here recent proposals to consider variable order CRFs. Using an effective finite-state representation of variable-length dependencies, we propose new ways to perform feature selection at large scale and report experimental results where we outperform strong baselines on a tagging task.
Tasks Chunking, Feature Selection, Named Entity Recognition, Part-Of-Speech Tagging
Published 2017-09-01
URL https://www.aclweb.org/anthology/D17-1044/
PDF https://www.aclweb.org/anthology/D17-1044
PWC https://paperswithcode.com/paper/learning-the-structure-of-variable-order-crfs
Repo
Framework

Non-lexical Features Encode Political Affiliation on Twitter

Title Non-lexical Features Encode Political Affiliation on Twitter
Authors Rachael Tatman, Leo Stewart, Amandalynne Paullada, Emma Spiro
Abstract Previous work on classifying Twitter users' political alignment has mainly focused on lexical and social network features. This study provides evidence that political affiliation is also reflected in features which have been previously overlooked: users' discourse patterns (proportion of Tweets that are retweets or replies) and their rate of use of capitalization and punctuation. We find robust differences between politically left- and right-leaning communities with respect to these discourse and sub-lexical features, although they are not enough to train a high-accuracy classifier.
Tasks
Published 2017-08-01
URL https://www.aclweb.org/anthology/W17-2909/
PDF https://www.aclweb.org/anthology/W17-2909
PWC https://paperswithcode.com/paper/non-lexical-features-encode-political
Repo
Framework
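
The features named in the abstract (share of retweets and replies, rate of capitalization and punctuation) are simple to compute per user. The sketch below is one possible way to do so, not the authors' feature extraction. It assumes tweets arrive as plain strings, that retweets start with "RT @" and replies with "@", and the function name and feature keys are hypothetical.

```python
import string


def non_lexical_features(tweets):
    """Compute discourse and sub-lexical rates for one user's tweets."""
    if not tweets:
        raise ValueError("need at least one tweet")
    n = len(tweets)
    retweets = sum(t.startswith("RT @") for t in tweets)
    replies = sum(t.startswith("@") for t in tweets)

    chars = "".join(tweets)
    letters = [c for c in chars if c.isalpha()]
    uppercase = sum(c.isupper() for c in letters)
    # string.punctuation includes "@" and "#", so mention and hashtag
    # markers count as punctuation in this sketch.
    punctuation = sum(c in string.punctuation for c in chars)

    return {
        "retweet_share": retweets / n,
        "reply_share": replies / n,
        "capital_rate": uppercase / len(letters) if letters else 0.0,
        "punctuation_rate": punctuation / len(chars) if chars else 0.0,
    }
```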

Joint CTC/attention decoding for end-to-end speech recognition

Title Joint CTC/attention decoding for end-to-end speech recognition
Authors Takaaki Hori, Shinji Watanabe, John Hershey
Abstract End-to-end automatic speech recognition (ASR) has become a popular alternative to conventional DNN/HMM systems because it avoids the need for linguistic resources such as pronunciation dictionaries, tokenization, and context-dependency trees, leading to a greatly simplified model-building process. There are two major types of end-to-end architectures for ASR: attention-based methods, which use an attention mechanism to perform alignment between acoustic frames and recognized symbols, and connectionist temporal classification (CTC), which uses Markov assumptions to efficiently solve sequential problems by dynamic programming. This paper proposes a joint decoding algorithm for end-to-end ASR with a hybrid CTC/attention architecture, which effectively utilizes the advantages of both in decoding. We have applied the proposed method to two ASR benchmarks (spontaneous Japanese and Mandarin Chinese), showing performance comparable to conventional state-of-the-art DNN/HMM ASR systems without linguistic resources.
Tasks End-To-End Speech Recognition, Language Modelling, Speech Recognition, Tokenization
Published 2017-07-01
URL https://www.aclweb.org/anthology/P17-1048/
PDF https://www.aclweb.org/anthology/P17-1048
PWC https://paperswithcode.com/paper/joint-ctcattention-decoding-for-end-to-end
Repo
Framework
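
Joint decoding combines the two models' opinions about each hypothesis. A common way to describe this for hybrid CTC/attention systems is a weighted sum of the two log-probabilities. The sketch below rescores finished hypotheses that way. The interpolation form, the `ctc_weight` parameter and its default value, and the tuple layout of hypotheses are assumptions for this example and are not stated in the abstract.

```python
def joint_score(log_p_ctc, log_p_att, ctc_weight=0.3):
    """Weighted sum of CTC and attention log-probabilities."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must be in [0, 1]")
    return ctc_weight * log_p_ctc + (1.0 - ctc_weight) * log_p_att


def best_hypothesis(hypotheses, ctc_weight=0.3):
    """Return the text of the best hypothesis.

    Each hypothesis is a (text, log_p_ctc, log_p_att) tuple; the scores
    would come from the two decoders (assumed given here).
    """
    best = max(hypotheses, key=lambda h: joint_score(h[1], h[2], ctc_weight))
    return best[0]
```

With `ctc_weight` near 0 the attention model dominates the ranking, and near 1 the CTC model does.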

Draw and Tell: Multimodal Descriptions Outperform Verbal- or Sketch-Only Descriptions in an Image Retrieval Task

Title Draw and Tell: Multimodal Descriptions Outperform Verbal- or Sketch-Only Descriptions in an Image Retrieval Task
Authors Ting Han, David Schlangen
Abstract While language conveys meaning largely symbolically, actual communication acts typically contain iconic elements as well: People gesture while they speak, or may even draw sketches while explaining something. Image retrieval prima facie seems like a task that could profit from combined symbolic and iconic reference, but it is typically set up to work either from language only, or via (iconic) sketches with no verbal contribution. Using a model of grounded language semantics and a model of sketch-to-image mapping, we show that adding even very reduced iconic information to a verbal image description improves recall. Verbal descriptions paired with fully detailed sketches still perform better than these sketches alone. We see these results as supporting the assumption that natural user interfaces should respond to multimodal input, where possible, rather than just language alone.
Tasks Image Retrieval
Published 2017-11-01
URL https://www.aclweb.org/anthology/I17-2061/
PDF https://www.aclweb.org/anthology/I17-2061
PWC https://paperswithcode.com/paper/draw-and-tell-multimodal-descriptions
Repo
Framework

Multi-tape Computing with Synchronous Relations

Title Multi-tape Computing with Synchronous Relations
Authors Christian Wurm, Simon Petitjean
Abstract
Tasks
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-4005/
PDF https://www.aclweb.org/anthology/W17-4005
PWC https://paperswithcode.com/paper/multi-tape-computing-with-synchronous
Repo
Framework

Modality Markers in Cebuano and Tagalog

Title Modality Markers in Cebuano and Tagalog
Authors Michael Tanangkingsing
Abstract
Tasks
Published 2017-11-01
URL https://www.aclweb.org/anthology/Y17-1005/
PDF https://www.aclweb.org/anthology/Y17-1005
PWC https://paperswithcode.com/paper/modality-markers-in-cebuano-and-tagalog
Repo
Framework

Empirically Sampling Universal Dependencies

Title Empirically Sampling Universal Dependencies
Authors Natalie Schluter, Željko Agić
Abstract
Tasks
Published 2017-05-01
URL https://www.aclweb.org/anthology/W17-0415/
PDF https://www.aclweb.org/anthology/W17-0415
PWC https://paperswithcode.com/paper/empirically-sampling-universal-dependencies
Repo
Framework

Deep Multi-Task Learning for Aspect Term Extraction with Memory Interaction

Title Deep Multi-Task Learning for Aspect Term Extraction with Memory Interaction
Authors Xin Li, Wai Lam
Abstract We propose a novel LSTM-based deep multi-task learning framework for aspect term extraction from user review sentences. Two LSTMs equipped with extended memories and neural memory operations are designed for jointly handling the extraction tasks of aspects and opinions via memory interactions. A sentimental sentence constraint is also added for more accurate prediction via another LSTM. Experimental results over two benchmark datasets demonstrate the effectiveness of our framework.
Tasks Aspect-Based Sentiment Analysis, Multi-Task Learning, Sentiment Analysis
Published 2017-09-01
URL https://www.aclweb.org/anthology/D17-1310/
PDF https://www.aclweb.org/anthology/D17-1310
PWC https://paperswithcode.com/paper/deep-multi-task-learning-for-aspect-term
Repo
Framework

Towards Producing Human-Validated Translation Resources for the Fula language through WordNet Linking

Title Towards Producing Human-Validated Translation Resources for the Fula language through WordNet Linking
Authors Khalil Mrini, Martin Benjamin
Abstract We propose methods to link automatically parsed linguistic data to WordNet. We apply these methods on a trilingual dictionary in Fula, English and French. Dictionary entry parsing is used to collect the linguistic data. Then we connect it to the Open Multilingual WordNet (OMW) through two attempts, and use confidence scores to quantify accuracy. We obtained 11,000 entries in parsing and linked about 58% to the OMW on the first attempt, and an additional 14% in the second one. These links are due to be validated by Fula speakers before being added to the Kamusi Project's database.
Tasks Machine Translation
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-7908/
PDF https://doi.org/10.26615/978-954-452-042-7_008
PWC https://paperswithcode.com/paper/towards-producing-human-validated-translation
Repo
Framework

HCS at SemEval-2017 Task 5: Polarity detection in business news using convolutional neural networks

Title HCS at SemEval-2017 Task 5: Polarity detection in business news using convolutional neural networks
Authors Lidia Pivovarova, Llorenç Escoter, Arto Klami, Roman Yangarber
Abstract Task 5 of SemEval-2017 involves fine-grained sentiment analysis on financial microblogs and news. Our solution for determining the sentiment score extends an earlier convolutional neural network for sentiment analysis in several ways. We explicitly encode a focus on a particular company, we apply a data augmentation scheme, and use a larger data collection to complement the small training data provided by the task organizers. The best results were achieved by training a model on an external dataset and then tuning it using the provided training dataset.
Tasks Data Augmentation, Sentiment Analysis
Published 2017-08-01
URL https://www.aclweb.org/anthology/S17-2143/
PDF https://www.aclweb.org/anthology/S17-2143
PWC https://paperswithcode.com/paper/hcs-at-semeval-2017-task-5-polarity-detection
Repo
Framework