Paper Group NANR 36
Dynamic Mode Decomposition with Reproducing Kernels for Koopman Spectral Analysis
Title | Dynamic Mode Decomposition with Reproducing Kernels for Koopman Spectral Analysis |
Authors | Yoshinobu Kawahara |
Abstract | A spectral analysis of the Koopman operator, which is an infinite dimensional linear operator on an observable, gives a (modal) description of the global behavior of a nonlinear dynamical system without any explicit prior knowledge of its governing equations. In this paper, we consider a spectral analysis of the Koopman operator in a reproducing kernel Hilbert space (RKHS). We propose a modal decomposition algorithm to perform the analysis using finite-length data sequences generated from a nonlinear system. The algorithm is in essence reduced to the calculation of a set of orthogonal bases for the Krylov matrix in RKHS and the eigendecomposition of the projection of the Koopman operator onto the subspace spanned by the bases. The algorithm returns a decomposition of the dynamics into a finite number of modes, and thus it can be thought of as a feature extraction procedure for a nonlinear dynamical system. Therefore, we further consider applications in machine learning using extracted features with the presented analysis. We illustrate the method on the applications using synthetic and real-world data. |
Tasks | |
Published | 2016-12-01 |
URL | http://papers.nips.cc/paper/6583-dynamic-mode-decomposition-with-reproducing-kernels-for-koopman-spectral-analysis |
http://papers.nips.cc/paper/6583-dynamic-mode-decomposition-with-reproducing-kernels-for-koopman-spectral-analysis.pdf | |
PWC | https://paperswithcode.com/paper/dynamic-mode-decomposition-with-reproducing |
Repo | |
Framework | |
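As a point of reference for the abstract above, here is a minimal sketch of standard (linear) dynamic mode decomposition in NumPy. It is not the kernel/RKHS algorithm proposed in the paper, and no repository is listed for this entry. The function name `dmd`, the `rank` parameter, and the snapshot layout (one column per time step) are assumptions made for illustration.

```python
import numpy as np

def dmd(snapshots, rank):
    """Standard (exact) DMD on a (n_features, n_steps) snapshot matrix.

    Illustrative only: this is the plain linear variant, not the
    RKHS-based decomposition described in the abstract.
    """
    # Pair each snapshot with its successor: Y is approximately A @ X
    X, Y = snapshots[:, :-1], snapshots[:, 1:]

    # Truncated SVD of X gives an orthonormal basis for the data subspace
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]

    # Project the one-step map onto that basis: A_tilde = U^H Y V S^-1
    A_tilde = U.conj().T @ Y @ Vh.conj().T / s

    # Eigenvalues approximate the spectrum of the one-step operator;
    # the lifted eigenvectors are the (exact) DMD modes
    eigvals, W = np.linalg.eig(A_tilde)
    modes = (Y @ Vh.conj().T / s) @ W
    return eigvals, modes
```

For data generated by a linear map, the returned eigenvalues coincide with the eigenvalues of that map, as long as `rank` matches the dimension of the subspace the snapshots span.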
IKE - An Interactive Tool for Knowledge Extraction
Title | IKE - An Interactive Tool for Knowledge Extraction |
Authors | Bhavana Dalvi, Sumithra Bhakthavatsalam, Chris Clark, Peter Clark, Oren Etzioni, Anthony Fader, Dirk Groeneveld |
Abstract | |
Tasks | Chunking, Slot Filling |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/W16-1303/ |
https://www.aclweb.org/anthology/W16-1303 | |
PWC | https://paperswithcode.com/paper/ike-an-interactive-tool-for-knowledge |
Repo | |
Framework | |
Farasa: A New Fast and Accurate Arabic Word Segmenter
Title | Farasa: A New Fast and Accurate Arabic Word Segmenter |
Authors | Kareem Darwish, Hamdy Mubarak |
Abstract | In this paper, we present Farasa (meaning insight in Arabic), which is a fast and accurate Arabic segmenter. Segmentation involves breaking Arabic words into their constituent clitics. Our approach is based on SVMrank using linear kernels. The features that we utilized account for: likelihood of stems, prefixes, suffixes, and their combination; presence in lexicons containing valid stems and named entities; and underlying stem templates. Farasa outperforms or equalizes state-of-the-art Arabic segmenters, namely QATARA and MADAMIRA. Meanwhile, Farasa is nearly one order of magnitude faster than QATARA and two orders of magnitude faster than MADAMIRA. The segmenter should be able to process one billion words in less than 5 hours. Farasa is written entirely in native Java, with no external dependencies, and is open-source. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1170/ |
https://www.aclweb.org/anthology/L16-1170 | |
PWC | https://paperswithcode.com/paper/farasa-a-new-fast-and-accurate-arabic-word |
Repo | |
Framework | |
Proceedings of the 15th Workshop on Biomedical Natural Language Processing
Title | Proceedings of the 15th Workshop on Biomedical Natural Language Processing |
Authors | |
Abstract | |
Tasks | |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-2900/ |
https://www.aclweb.org/anthology/W16-2900 | |
PWC | https://paperswithcode.com/paper/proceedings-of-the-15th-workshop-on |
Repo | |
Framework | |
A Morphological Lexicon of Esperanto with Morpheme Frequencies
Title | A Morphological Lexicon of Esperanto with Morpheme Frequencies |
Authors | Eckhard Bick |
Abstract | This paper discusses the internal structure of complex Esperanto words (CWs). Using a morphological analyzer, possible affixation and compounding is checked for over 50,000 Esperanto lexemes against a list of 17,000 root words. Morpheme boundaries in the resulting analyses were then checked manually, creating a CW dictionary of 28,000 words, representing 56.4% of the lexicon, or 19.4% of corpus tokens. The error percentage of the EspGram morphological analyzer for new corpus CWs was 4.3% for types and 6.4% for tokens, with a recall of almost 100%, and wrong/spurious boundaries being more common than missing ones. For pedagogical purposes a morpheme frequency dictionary was constructed for a 16 million word corpus, confirming the importance of agglutinative derivational morphemes in the Esperanto lexicon. Finally, as a means to reduce the morphological ambiguity of CWs, we provide POS likelihoods for Esperanto suffixes. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1171/ |
https://www.aclweb.org/anthology/L16-1171 | |
PWC | https://paperswithcode.com/paper/a-morphological-lexicon-of-esperanto-with |
Repo | |
Framework | |
Neural Headline Generation on Abstract Meaning Representation
Title | Neural Headline Generation on Abstract Meaning Representation |
Authors | Sho Takase, Jun Suzuki, Naoaki Okazaki, Tsutomu Hirao, Masaaki Nagata |
Abstract | |
Tasks | Dependency Parsing, Image Captioning, Language Modelling, Machine Translation, Named Entity Recognition, Semantic Role Labeling, Text Generation, Video Description |
Published | 2016-11-01 |
URL | https://www.aclweb.org/anthology/D16-1112/ |
https://www.aclweb.org/anthology/D16-1112 | |
PWC | https://paperswithcode.com/paper/neural-headline-generation-on-abstract |
Repo | |
Framework | |
South African Language Resources: Phrase Chunking
Title | South African Language Resources: Phrase Chunking |
Authors | Roald Eiselen |
Abstract | Phrase chunking remains an important natural language processing (NLP) technique for intermediate syntactic processing. This paper describes the development of protocols, annotated phrase chunking data sets and automatic phrase chunkers for ten South African languages. Various problems with adapting the existing annotation protocols of English are discussed as well as an overview of the annotated data sets. Based on the annotated sets, CRF-based phrase chunkers are created and tested with a combination of different features, including part of speech tags and character n-grams. The results of the phrase chunking evaluation show that disjunctively written languages can achieve notably better results for phrase chunking with a limited data set than conjunctive languages, but that the addition of character n-grams improve the results for conjunctive languages. |
Tasks | Chunking |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1109/ |
https://www.aclweb.org/anthology/L16-1109 | |
PWC | https://paperswithcode.com/paper/south-african-language-resources-phrase |
Repo | |
Framework | |
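The abstract above mentions CRF chunkers trained on part-of-speech tags and character n-grams. A rough sketch of what per-token features of that kind could look like follows. The function names, feature keys, n-gram range, and boundary padding are all hypothetical and are not taken from the paper. The resulting dictionaries are in the general shape that CRF toolkits commonly accept, but no specific toolkit is assumed.

```python
def char_ngrams(token, n_min=2, n_max=3):
    """Return character n-grams of a token, padded with boundary markers."""
    padded = f"<{token}>"  # '<' and '>' mark the word start and end
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]

def token_features(tokens, pos_tags, i):
    """Build a feature dict for token i from POS context and character n-grams."""
    feats = {
        "word.lower": tokens[i].lower(),
        "pos": pos_tags[i],
        # Neighbouring POS tags, with sentinel values at sentence edges
        "prev_pos": pos_tags[i - 1] if i > 0 else "BOS",
        "next_pos": pos_tags[i + 1] if i < len(tokens) - 1 else "EOS",
    }
    # One binary feature per character n-gram of the lowercased token
    for gram in char_ngrams(tokens[i].lower()):
        feats["ng=" + gram] = True
    return feats
```

Character n-grams give the model access to sub-word material, which is a plausible reason they help for conjunctively written languages, where a single orthographic word bundles several morphemes.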
基於字元階層之語音合成用文脈訊息擷取 (Character-Level Linguistic Features Extraction for Text-to-Speech System) [In Chinese]
Title | 基於字元階層之語音合成用文脈訊息擷取 (Character-Level Linguistic Features Extraction for Text-to-Speech System) [In Chinese] |
Authors | Kuan-Hung Chen, Shu-Han Liao, Yuan-Fu Liao, Yih-Ru Wang |
Abstract | |
Tasks | Feature Engineering, Speech Synthesis |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/O16-3005/ |
https://www.aclweb.org/anthology/O16-3005 | |
PWC | https://paperswithcode.com/paper/ao14aaeaa1eae3ac-ee-a-character-level |
Repo | |
Framework | |
Validating bundled gap filling – Empirical evidence for ambiguity reduction and language proficiency testing capabilities
Title | Validating bundled gap filling – Empirical evidence for ambiguity reduction and language proficiency testing capabilities |
Authors | Niklas Meyer, Michael Wojatzki, Torsten Zesch |
Abstract | |
Tasks | |
Published | 2016-11-01 |
URL | https://www.aclweb.org/anthology/W16-6507/ |
https://www.aclweb.org/anthology/W16-6507 | |
PWC | https://paperswithcode.com/paper/validating-bundled-gap-filling-a-empirical |
Repo | |
Framework | |
ITNLP-AiKF at SemEval-2016 Task 3 a quesiton answering system using community QA repository
Title | ITNLP-AiKF at SemEval-2016 Task 3 a quesiton answering system using community QA repository |
Authors | Chang'e Jia |
Abstract | |
Tasks | Answer Selection, Community Question Answering, Question Answering, Question Similarity, Semantic Textual Similarity, Topic Models |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/S16-1139/ |
https://www.aclweb.org/anthology/S16-1139 | |
PWC | https://paperswithcode.com/paper/itnlp-aikf-at-semeval-2016-task-3-a-quesiton |
Repo | |
Framework | |
Fast Distributed Submodular Cover: Public-Private Data Summarization
Title | Fast Distributed Submodular Cover: Public-Private Data Summarization |
Authors | Baharan Mirzasoleiman, Morteza Zadimoghaddam, Amin Karbasi |
Abstract | In this paper, we introduce the public-private framework of data summarization motivated by privacy concerns in personalized recommender systems and online social services. Such systems have usually access to massive data generated by a large pool of users. A major fraction of the data is public and is visible to (and can be used for) all users. However, each user can also contribute some private data that should not be shared with other users to ensure her privacy. The goal is to provide a succinct summary of massive dataset, ideally as small as possible, from which customized summaries can be built for each user, i.e. it can contain elements from the public data (for diversity) and users’ private data (for personalization). To formalize the above challenge, we assume that the scoring function according to which a user evaluates the utility of her summary satisfies submodularity, a widely used notion in data summarization applications. Thus, we model the data summarization targeted to each user as an instance of a submodular cover problem. However, when the data is massive it is infeasible to use the centralized greedy algorithm to find a customized summary even for a single user. Moreover, for a large pool of users, it is too time consuming to find such summaries separately. Instead, we develop a fast distributed algorithm for submodular cover, FASTCOVER, that provides a succinct summary in one shot and for all users. We show that the solution provided by FASTCOVER is competitive with that of the centralized algorithm with the number of rounds that is exponentially smaller than state of the art results. Moreover, we have implemented FASTCOVER with Spark to demonstrate its practical performance on a number of concrete applications, including personalized location recommendation, personalized movie recommendation, and dominating set on tens of millions of data points and varying number of users. |
Tasks | Data Summarization, Recommendation Systems |
Published | 2016-12-01 |
URL | http://papers.nips.cc/paper/6540-fast-distributed-submodular-cover-public-private-data-summarization |
http://papers.nips.cc/paper/6540-fast-distributed-submodular-cover-public-private-data-summarization.pdf | |
PWC | https://paperswithcode.com/paper/fast-distributed-submodular-cover-public |
Repo | |
Framework | |
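The abstract above contrasts FASTCOVER with the centralized greedy algorithm. To make that baseline concrete, here is a minimal sketch of centralized greedy on set cover, a simple special case of submodular cover. It is not FASTCOVER and is not distributed. The function name `greedy_cover`, the dictionary input format, and the `target_fraction` parameter are assumptions for illustration.

```python
def greedy_cover(sets, target_fraction=1.0):
    """Centralized greedy for set cover, a special case of submodular cover.

    `sets` maps a name to a set of elements. Sets are added until the
    chosen ones cover at least `target_fraction` of the universe.
    """
    universe = set().union(*sets.values())
    needed = target_fraction * len(universe)
    covered, chosen = set(), []
    while len(covered) < needed:
        # Pick the set with the largest marginal gain in coverage
        best = max(sets, key=lambda name: len(sets[name] - covered))
        if not sets[best] - covered:
            break  # no remaining set adds coverage
        chosen.append(best)
        covered |= sets[best]
    return chosen


example = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5}, "d": {1}}
print(greedy_cover(example))  # ['a', 'c']
```

Each round scans every candidate set, so the number of sequential rounds grows with the size of the summary. That sequential dependence is the cost the abstract says becomes infeasible on massive data.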
Convolution-Enhanced Bilingual Recursive Neural Network for Bilingual Semantic Modeling
Title | Convolution-Enhanced Bilingual Recursive Neural Network for Bilingual Semantic Modeling |
Authors | Jinsong Su, Biao Zhang, Deyi Xiong, Ruochen Li, Jianmin Yin |
Abstract | Estimating similarities at different levels of linguistic units, such as words, sub-phrases and phrases, is helpful for measuring semantic similarity of an entire bilingual phrase. In this paper, we propose a convolution-enhanced bilingual recursive neural network (ConvBRNN), which not only exploits word alignments to guide the generation of phrase structures but also integrates multiple-level information of the generated phrase structures into bilingual semantic modeling. In order to accurately learn the semantic hierarchy of a bilingual phrase, we develop a recursive neural network to constrain the learned bilingual phrase structures to be consistent with word alignments. Upon the generated source and target phrase structures, we stack a convolutional neural network to integrate vector representations of linguistic units on the structures into bilingual phrase embeddings. After that, we fully incorporate information of different linguistic units into a bilinear semantic similarity model. We introduce two max-margin losses to train the ConvBRNN model: one for the phrase structure inference and the other for the semantic similarity model. Experiments on NIST Chinese-English translation tasks demonstrate the high quality of the generated bilingual phrase structures with respect to word alignments and the effectiveness of learned semantic similarities on machine translation. |
Tasks | Machine Translation, Semantic Similarity, Semantic Textual Similarity |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1289/ |
https://www.aclweb.org/anthology/C16-1289 | |
PWC | https://paperswithcode.com/paper/convolution-enhanced-bilingual-recursive |
Repo | |
Framework | |
Building Content-driven Entity Networks for Scarce Scientific Literature using Content Information
Title | Building Content-driven Entity Networks for Scarce Scientific Literature using Content Information |
Authors | Reinald Kim Amplayo, Min Song |
Abstract | This paper proposes several network construction methods for collections of scarce scientific literature data. We define scarcity as lacking in value and in volume. Instead of using the paper's metadata to construct several kinds of scientific networks, we use the full texts of the articles and automatically extract the entities needed to construct the networks. Specifically, we present seven kinds of networks using the proposed construction methods: co-occurrence networks for author, keyword, and biological entities, and citation networks for author, keyword, biological, and topic entities. We show two case studies that applies our proposed methods: CADASIL, a rare yet the most common form of hereditary stroke disorder, and Metformin, the first-line medication to the type 2 diabetes treatment. We apply our proposed method to four different applications for evaluation: finding prolific authors, finding important bio-entities, finding meaningful keywords, and discovering influential topics. The results show that the co-occurrence and citation networks constructed using the proposed method outperforms the traditional-based networks. We also compare our proposed networks to traditional citation networks constructed using enough data and infer that even with the same amount of enough data, our methods perform comparably or better than the traditional methods. |
Tasks | Entity Extraction |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-5103/ |
https://www.aclweb.org/anthology/W16-5103 | |
PWC | https://paperswithcode.com/paper/building-content-driven-entity-networks-for |
Repo | |
Framework | |
Extracting Spatial Entities and Relations in Korean Text
Title | Extracting Spatial Entities and Relations in Korean Text |
Authors | Bogyum Kim, Jae Sung Lee |
Abstract | A spatial information extraction system retrieves spatial entities and their relationships for geological searches and reasoning. Spatial information systems have been developed mainly for English text, e.g., through the SpaceEval competition. Some of the techniques are useful but not directly applicable to Korean text, because of linguistic differences and the lack of language resources. In this paper, we propose a Korean spatial entity extraction model and a spatial relation extraction model; the spatial entity extraction model uses word vectors to alleviate the over generation and the spatial relation extraction model uses dependency parse labels to find the proper arguments in relations. Experiments with Korean text show that the two models are effective for spatial information extraction. |
Tasks | Entity Extraction, Named Entity Recognition, Question Answering, Relation Extraction, Robot Navigation |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1225/ |
https://www.aclweb.org/anthology/C16-1225 | |
PWC | https://paperswithcode.com/paper/extracting-spatial-entities-and-relations-in |
Repo | |
Framework | |
Experiments in Candidate Phrase Selection for Financial Named Entity Extraction - A Demo
Title | Experiments in Candidate Phrase Selection for Financial Named Entity Extraction - A Demo |
Authors | Aman Kumar, Hassan Alam, Tina Werner, Manan Vyas |
Abstract | In this study we develop a system that tags and extracts financial concepts called financial named entities (FNE) along with corresponding numeric values – monetary and temporal. We employ machine learning and natural language processing methods to identify financial concepts and dates, and link them to numerical entities. |
Tasks | Entity Extraction |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-2010/ |
https://www.aclweb.org/anthology/C16-2010 | |
PWC | https://paperswithcode.com/paper/experiments-in-candidate-phrase-selection-for |
Repo | |
Framework | |