May 5, 2019

Paper Group NANR 36

Dynamic Mode Decomposition with Reproducing Kernels for Koopman Spectral Analysis


Title	Dynamic Mode Decomposition with Reproducing Kernels for Koopman Spectral Analysis
Authors	Yoshinobu Kawahara
Abstract	A spectral analysis of the Koopman operator, which is an infinite dimensional linear operator on an observable, gives a (modal) description of the global behavior of a nonlinear dynamical system without any explicit prior knowledge of its governing equations. In this paper, we consider a spectral analysis of the Koopman operator in a reproducing kernel Hilbert space (RKHS). We propose a modal decomposition algorithm to perform the analysis using finite-length data sequences generated from a nonlinear system. The algorithm is in essence reduced to the calculation of a set of orthogonal bases for the Krylov matrix in RKHS and the eigendecomposition of the projection of the Koopman operator onto the subspace spanned by the bases. The algorithm returns a decomposition of the dynamics into a finite number of modes, and thus it can be thought of as a feature extraction procedure for a nonlinear dynamical system. Therefore, we further consider applications in machine learning using extracted features with the presented analysis. We illustrate the method on the applications using synthetic and real-world data.
Tasks
Published	2016-12-01
URL	http://papers.nips.cc/paper/6583-dynamic-mode-decomposition-with-reproducing-kernels-for-koopman-spectral-analysis
PDF	http://papers.nips.cc/paper/6583-dynamic-mode-decomposition-with-reproducing-kernels-for-koopman-spectral-analysis.pdf
PWC	https://paperswithcode.com/paper/dynamic-mode-decomposition-with-reproducing
Repo
Framework
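
As a reading aid for the abstract above, the following sketch writes out the standard Koopman mode decomposition that such an analysis targets. It is not taken from the paper: the symbols f (the dynamics), g (a vector-valued observable), \varphi_j, \lambda_j, and v_j are notation assumed here, and the expansion assumes that g lies in the span of the Koopman eigenfunctions.

```latex
% Discrete-time nonlinear system (assumed notation)
x_{t+1} = f(x_t)

% The Koopman operator acts linearly on observables by composition with f
(\mathcal{K} g)(x) = g\bigl(f(x)\bigr)

% Eigenfunctions and eigenvalues of the Koopman operator
\mathcal{K} \varphi_j = \lambda_j \varphi_j

% If g(x) = \sum_j \varphi_j(x)\, v_j, then applying \mathcal{K} t times gives
g(x_t) = \sum_j \lambda_j^{\,t}\, \varphi_j(x_0)\, v_j
```

Each term is one mode: \lambda_j sets its growth rate and frequency, and v_j its shape. The abstract's finite set of modes corresponds to a truncation of this sum estimated from data.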

IKE - An Interactive Tool for Knowledge Extraction


Title	IKE - An Interactive Tool for Knowledge Extraction
Authors	Bhavana Dalvi, Sumithra Bhakthavatsalam, Chris Clark, Peter Clark, Oren Etzioni, Anthony Fader, Dirk Groeneveld
Abstract
Tasks	Chunking, Slot Filling
Published	2016-06-01
URL	https://www.aclweb.org/anthology/W16-1303/
PDF	https://www.aclweb.org/anthology/W16-1303
PWC	https://paperswithcode.com/paper/ike-an-interactive-tool-for-knowledge
Repo
Framework

Farasa: A New Fast and Accurate Arabic Word Segmenter


Title	Farasa: A New Fast and Accurate Arabic Word Segmenter
Authors	Kareem Darwish, Hamdy Mubarak
Abstract	In this paper, we present Farasa (meaning insight in Arabic), which is a fast and accurate Arabic segmenter. Segmentation involves breaking Arabic words into their constituent clitics. Our approach is based on SVMrank using linear kernels. The features that we utilized account for: likelihood of stems, prefixes, suffixes, and their combination; presence in lexicons containing valid stems and named entities; and underlying stem templates. Farasa matches or outperforms state-of-the-art Arabic segmenters, namely QATARA and MADAMIRA. Meanwhile, Farasa is nearly one order of magnitude faster than QATARA and two orders of magnitude faster than MADAMIRA. The segmenter should be able to process one billion words in less than 5 hours. Farasa is written entirely in native Java, with no external dependencies, and is open-source.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1170/
PDF	https://www.aclweb.org/anthology/L16-1170
PWC	https://paperswithcode.com/paper/farasa-a-new-fast-and-accurate-arabic-word
Repo
Framework

Proceedings of the 15th Workshop on Biomedical Natural Language Processing


Title	Proceedings of the 15th Workshop on Biomedical Natural Language Processing
Authors
Abstract
Tasks
Published	2016-08-01
URL	https://www.aclweb.org/anthology/W16-2900/
PDF	https://www.aclweb.org/anthology/W16-2900
PWC	https://paperswithcode.com/paper/proceedings-of-the-15th-workshop-on
Repo
Framework

A Morphological Lexicon of Esperanto with Morpheme Frequencies


Title	A Morphological Lexicon of Esperanto with Morpheme Frequencies
Authors	Eckhard Bick
Abstract	This paper discusses the internal structure of complex Esperanto words (CWs). Using a morphological analyzer, possible affixation and compounding is checked for over 50,000 Esperanto lexemes against a list of 17,000 root words. Morpheme boundaries in the resulting analyses were then checked manually, creating a CW dictionary of 28,000 words, representing 56.4% of the lexicon, or 19.4% of corpus tokens. The error percentage of the EspGram morphological analyzer for new corpus CWs was 4.3% for types and 6.4% for tokens, with a recall of almost 100%, and wrong/spurious boundaries being more common than missing ones. For pedagogical purposes a morpheme frequency dictionary was constructed for a 16 million word corpus, confirming the importance of agglutinative derivational morphemes in the Esperanto lexicon. Finally, as a means to reduce the morphological ambiguity of CWs, we provide POS likelihoods for Esperanto suffixes.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1171/
PDF	https://www.aclweb.org/anthology/L16-1171
PWC	https://paperswithcode.com/paper/a-morphological-lexicon-of-esperanto-with
Repo
Framework

Neural Headline Generation on Abstract Meaning Representation


Title	Neural Headline Generation on Abstract Meaning Representation
Authors	Sho Takase, Jun Suzuki, Naoaki Okazaki, Tsutomu Hirao, Masaaki Nagata
Abstract
Tasks	Dependency Parsing, Image Captioning, Language Modelling, Machine Translation, Named Entity Recognition, Semantic Role Labeling, Text Generation, Video Description
Published	2016-11-01
URL	https://www.aclweb.org/anthology/D16-1112/
PDF	https://www.aclweb.org/anthology/D16-1112
PWC	https://paperswithcode.com/paper/neural-headline-generation-on-abstract
Repo
Framework

South African Language Resources: Phrase Chunking


Title	South African Language Resources: Phrase Chunking
Authors	Roald Eiselen
Abstract	Phrase chunking remains an important natural language processing (NLP) technique for intermediate syntactic processing. This paper describes the development of protocols, annotated phrase chunking data sets and automatic phrase chunkers for ten South African languages. Various problems with adapting the existing annotation protocols of English are discussed as well as an overview of the annotated data sets. Based on the annotated sets, CRF-based phrase chunkers are created and tested with a combination of different features, including part of speech tags and character n-grams. The results of the phrase chunking evaluation show that disjunctively written languages can achieve notably better results for phrase chunking with a limited data set than conjunctive languages, but that the addition of character n-grams improves the results for conjunctive languages.
Tasks	Chunking
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1109/
PDF	https://www.aclweb.org/anthology/L16-1109
PWC	https://paperswithcode.com/paper/south-african-language-resources-phrase
Repo
Framework

基於字元階層之語音合成用文脈訊息擷取 (Character-Level Linguistic Features Extraction for Text-to-Speech System) [In Chinese]


Title	基於字元階層之語音合成用文脈訊息擷取 (Character-Level Linguistic Features Extraction for Text-to-Speech System) [In Chinese]
Authors	Kuan-Hung Chen, Shu-Han Liao, Yuan-Fu Liao, Yih-Ru Wang
Abstract
Tasks	Feature Engineering, Speech Synthesis
Published	2016-12-01
URL	https://www.aclweb.org/anthology/O16-3005/
PDF	https://www.aclweb.org/anthology/O16-3005
PWC	https://paperswithcode.com/paper/ao14aaeaa1eae3ac-ee-a-character-level
Repo
Framework

Validating bundled gap filling – Empirical evidence for ambiguity reduction and language proficiency testing capabilities


Title	Validating bundled gap filling – Empirical evidence for ambiguity reduction and language proficiency testing capabilities
Authors	Niklas Meyer, Michael Wojatzki, Torsten Zesch
Abstract
Tasks
Published	2016-11-01
URL	https://www.aclweb.org/anthology/W16-6507/
PDF	https://www.aclweb.org/anthology/W16-6507
PWC	https://paperswithcode.com/paper/validating-bundled-gap-filling-a-empirical
Repo
Framework

ITNLP-AiKF at SemEval-2016 Task 3 a question answering system using community QA repository


Title	ITNLP-AiKF at SemEval-2016 Task 3 a question answering system using community QA repository
Authors	Chang'e Jia
Abstract
Tasks	Answer Selection, Community Question Answering, Question Answering, Question Similarity, Semantic Textual Similarity, Topic Models
Published	2016-06-01
URL	https://www.aclweb.org/anthology/S16-1139/
PDF	https://www.aclweb.org/anthology/S16-1139
PWC	https://paperswithcode.com/paper/itnlp-aikf-at-semeval-2016-task-3-a-quesiton
Repo
Framework

Fast Distributed Submodular Cover: Public-Private Data Summarization


Title	Fast Distributed Submodular Cover: Public-Private Data Summarization
Authors	Baharan Mirzasoleiman, Morteza Zadimoghaddam, Amin Karbasi
Abstract	In this paper, we introduce the public-private framework of data summarization motivated by privacy concerns in personalized recommender systems and online social services. Such systems usually have access to massive data generated by a large pool of users. A major fraction of the data is public and is visible to (and can be used for) all users. However, each user can also contribute some private data that should not be shared with other users to ensure her privacy. The goal is to provide a succinct summary of a massive dataset, ideally as small as possible, from which customized summaries can be built for each user, i.e. it can contain elements from the public data (for diversity) and users’ private data (for personalization). To formalize the above challenge, we assume that the scoring function according to which a user evaluates the utility of her summary satisfies submodularity, a widely used notion in data summarization applications. Thus, we model the data summarization targeted to each user as an instance of a submodular cover problem. However, when the data is massive it is infeasible to use the centralized greedy algorithm to find a customized summary even for a single user. Moreover, for a large pool of users, it is too time consuming to find such summaries separately. Instead, we develop a fast distributed algorithm for submodular cover, FASTCOVER, that provides a succinct summary in one shot and for all users. We show that the solution provided by FASTCOVER is competitive with that of the centralized algorithm, with a number of rounds that is exponentially smaller than that of state-of-the-art results. Moreover, we have implemented FASTCOVER with Spark to demonstrate its practical performance on a number of concrete applications, including personalized location recommendation, personalized movie recommendation, and dominating set on tens of millions of data points and varying number of users.
Tasks	Data Summarization, Recommendation Systems
Published	2016-12-01
URL	http://papers.nips.cc/paper/6540-fast-distributed-submodular-cover-public-private-data-summarization
PDF	http://papers.nips.cc/paper/6540-fast-distributed-submodular-cover-public-private-data-summarization.pdf
PWC	https://paperswithcode.com/paper/fast-distributed-submodular-cover-public
Repo
Framework
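
The abstract above uses the centralized greedy algorithm for submodular cover as its baseline. The following is a minimal sketch of that baseline only, not of FASTCOVER, and it is not taken from the paper. The function name `greedy_submodular_cover`, the toy `coverage` utility, and the example data are all hypothetical, chosen for illustration; the sketch assumes the utility is monotone and submodular.

```python
from typing import Callable, FrozenSet, Hashable, List, Sequence


def greedy_submodular_cover(
    ground_set: Sequence[Hashable],
    utility: Callable[[FrozenSet[Hashable]], float],
    target: float,
) -> List[Hashable]:
    """Pick elements greedily until utility(selected) >= target.

    Hypothetical helper: assumes `utility` is monotone submodular.
    """
    selected: List[Hashable] = []
    remaining = list(ground_set)  # keep input order so ties break deterministically
    current = utility(frozenset())

    while current < target and remaining:
        # Choose the element with the largest marginal gain over the current set.
        best = max(
            remaining,
            key=lambda e: utility(frozenset(selected + [e])) - current,
        )
        gain = utility(frozenset(selected + [best])) - current
        if gain <= 0:
            break  # no element helps any more; the target is unreachable
        selected.append(best)
        remaining.remove(best)
        current += gain

    return selected


# Toy utility (an assumption for illustration): number of items covered.
SETS = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 6}}


def coverage(chosen: FrozenSet[Hashable]) -> float:
    covered = set()
    for name in chosen:
        covered |= SETS[name]
    return float(len(covered))


summary = greedy_submodular_cover(list(SETS), coverage, target=6)
print(summary)  # ['a', 'c']
```

Each iteration scans every remaining element, which illustrates why the abstract calls the centralized approach infeasible on massive data and motivates a distributed alternative.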

Convolution-Enhanced Bilingual Recursive Neural Network for Bilingual Semantic Modeling


Title	Convolution-Enhanced Bilingual Recursive Neural Network for Bilingual Semantic Modeling
Authors	Jinsong Su, Biao Zhang, Deyi Xiong, Ruochen Li, Jianmin Yin
Abstract	Estimating similarities at different levels of linguistic units, such as words, sub-phrases and phrases, is helpful for measuring semantic similarity of an entire bilingual phrase. In this paper, we propose a convolution-enhanced bilingual recursive neural network (ConvBRNN), which not only exploits word alignments to guide the generation of phrase structures but also integrates multiple-level information of the generated phrase structures into bilingual semantic modeling. In order to accurately learn the semantic hierarchy of a bilingual phrase, we develop a recursive neural network to constrain the learned bilingual phrase structures to be consistent with word alignments. Upon the generated source and target phrase structures, we stack a convolutional neural network to integrate vector representations of linguistic units on the structures into bilingual phrase embeddings. After that, we fully incorporate information of different linguistic units into a bilinear semantic similarity model. We introduce two max-margin losses to train the ConvBRNN model: one for the phrase structure inference and the other for the semantic similarity model. Experiments on NIST Chinese-English translation tasks demonstrate the high quality of the generated bilingual phrase structures with respect to word alignments and the effectiveness of learned semantic similarities on machine translation.
Tasks	Machine Translation, Semantic Similarity, Semantic Textual Similarity
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-1289/
PDF	https://www.aclweb.org/anthology/C16-1289
PWC	https://paperswithcode.com/paper/convolution-enhanced-bilingual-recursive
Repo
Framework

Building Content-driven Entity Networks for Scarce Scientific Literature using Content Information


Title	Building Content-driven Entity Networks for Scarce Scientific Literature using Content Information
Authors	Reinald Kim Amplayo, Min Song
Abstract	This paper proposes several network construction methods for collections of scarce scientific literature data. We define scarcity as lacking in value and in volume. Instead of using the paper's metadata to construct several kinds of scientific networks, we use the full texts of the articles and automatically extract the entities needed to construct the networks. Specifically, we present seven kinds of networks using the proposed construction methods: co-occurrence networks for author, keyword, and biological entities, and citation networks for author, keyword, biological, and topic entities. We show two case studies that apply our proposed methods: CADASIL, a rare yet the most common form of hereditary stroke disorder, and Metformin, the first-line medication for the treatment of type 2 diabetes. We apply our proposed method to four different applications for evaluation: finding prolific authors, finding important bio-entities, finding meaningful keywords, and discovering influential topics. The results show that the co-occurrence and citation networks constructed using the proposed method outperform the traditional-based networks. We also compare our proposed networks to traditional citation networks constructed using enough data and infer that even with the same amount of enough data, our methods perform comparably or better than the traditional methods.
Tasks	Entity Extraction
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-5103/
PDF	https://www.aclweb.org/anthology/W16-5103
PWC	https://paperswithcode.com/paper/building-content-driven-entity-networks-for
Repo
Framework

Extracting Spatial Entities and Relations in Korean Text


Title	Extracting Spatial Entities and Relations in Korean Text
Authors	Bogyum Kim, Jae Sung Lee
Abstract	A spatial information extraction system retrieves spatial entities and their relationships for geological searches and reasoning. Spatial information systems have been developed mainly for English text, e.g., through the SpaceEval competition. Some of the techniques are useful but not directly applicable to Korean text, because of linguistic differences and the lack of language resources. In this paper, we propose a Korean spatial entity extraction model and a spatial relation extraction model; the spatial entity extraction model uses word vectors to alleviate over-generation, and the spatial relation extraction model uses dependency parse labels to find the proper arguments in relations. Experiments with Korean text show that the two models are effective for spatial information extraction.
Tasks	Entity Extraction, Named Entity Recognition, Question Answering, Relation Extraction, Robot Navigation
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-1225/
PDF	https://www.aclweb.org/anthology/C16-1225
PWC	https://paperswithcode.com/paper/extracting-spatial-entities-and-relations-in
Repo
Framework

Experiments in Candidate Phrase Selection for Financial Named Entity Extraction - A Demo


Title	Experiments in Candidate Phrase Selection for Financial Named Entity Extraction - A Demo
Authors	Aman Kumar, Hassan Alam, Tina Werner, Manan Vyas
Abstract	In this study we develop a system that tags and extracts financial concepts called financial named entities (FNE) along with corresponding numeric values (monetary and temporal). We employ machine learning and natural language processing methods to identify financial concepts and dates, and link them to numerical entities.
Tasks	Entity Extraction
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-2010/
PDF	https://www.aclweb.org/anthology/C16-2010
PWC	https://paperswithcode.com/paper/experiments-in-candidate-phrase-selection-for
Repo
Framework