May 5, 2019

Paper Group NANR 23

Development of a Bengali parser by cross-lingual transfer from Hindi

Title Development of a Bengali parser by cross-lingual transfer from Hindi
Authors Ayan Das, Agnivo Saha, Sudeshna Sarkar
Abstract In recent years there has been a lot of interest in cross-lingual parsing for developing treebanks for languages with small or no annotated treebanks. In this paper, we explore the development of a cross-lingual transfer parser from Hindi to Bengali using a Hindi parser and a Hindi-Bengali parallel corpus. A parser is trained and applied to the Hindi sentences of the parallel corpus and the parse trees are projected to construct probable parse trees of the corresponding Bengali sentences. Only about 14% of these trees are complete (transferred trees contain all the target sentence words) and they are used to construct a Bengali parser. We relax the criteria of completeness to consider well-formed trees (43% of the trees) leading to an improvement. We note that the words often do not have a one-to-one mapping in the two languages but considering sentences at the chunk-level results in better correspondence between the two languages. Based on this we present a method to use chunking as a preprocessing step and do the transfer on the chunk trees. We find that about 72% of the projected parse trees of Bengali are now well-formed. The resultant parser achieves significant improvement in both Unlabeled Attachment Score (UAS) as well as Labeled Attachment Score (LAS) over the baseline word-level transferred parser.
Tasks Chunking, Cross-Lingual Transfer
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-3704/
PDF https://www.aclweb.org/anthology/W16-3704
PWC https://paperswithcode.com/paper/development-of-a-bengali-parser-by-cross
Repo
Framework
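
The projection step described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the head-index encoding (`-1` for the root), and the one-to-one alignment dictionary are all assumptions made for illustration. It copies each source (Hindi) dependency edge onto the aligned target (Bengali) words and then checks completeness in the abstract's sense, that is, whether every target word received a head.

```python
from typing import Dict, List, Optional

ROOT = -1  # assumed encoding: a head index of -1 marks the root token


def project_heads(src_heads: List[int], align: Dict[int, int],
                  n_tgt: int) -> List[Optional[int]]:
    """Copy each source dependency edge onto the aligned target words.

    src_heads[i] is the head index of source token i (ROOT for the root).
    align maps a source token index to a target token index
    (a one-to-one alignment is assumed here for simplicity).
    """
    tgt_heads: List[Optional[int]] = [None] * n_tgt
    for s, h in enumerate(src_heads):
        if s not in align:
            continue  # unaligned source word: nothing to project
        t = align[s]
        if h == ROOT:
            tgt_heads[t] = ROOT
        elif h in align:
            tgt_heads[t] = align[h]
    return tgt_heads


def is_complete(tgt_heads: List[Optional[int]]) -> bool:
    """A projected tree is complete if every target word got a head."""
    return all(h is not None for h in tgt_heads)


# Fully aligned sentence: every target word receives a head.
full = project_heads([1, ROOT, 1], {0: 0, 1: 2, 2: 1}, 3)
print(full, is_complete(full))        # [2, 2, -1] True

# One source word is unaligned, so one target word is left headless.
partial = project_heads([1, ROOT, 1], {0: 0, 1: 1}, 3)
print(partial, is_complete(partial))  # [1, -1, None] False
```

The second call shows why only a fraction of projected trees are complete: any target word without an aligned source word is left without a head.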

Cross-lingual transfer parser from Hindi to Bengali using delexicalization and chunking

Title Cross-lingual transfer parser from Hindi to Bengali using delexicalization and chunking
Authors Ayan Das, Agnivo Saha, Sudeshna Sarkar
Abstract
Tasks Chunking, Cross-Lingual Transfer
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-6313/
PDF https://www.aclweb.org/anthology/W16-6313
PWC https://paperswithcode.com/paper/cross-lingual-transfer-parser-from-hindi-to
Repo
Framework

A Novel Fast Framework for Topic Labeling Based on Similarity-preserved Hashing

Title A Novel Fast Framework for Topic Labeling Based on Similarity-preserved Hashing
Authors Xian-Ling Mao, Yi-Jing Hao, Qiang Zhou, Wen-Qing Yuan, Liner Yang, Heyan Huang
Abstract Recently, topic modeling has been widely applied in data mining due to its powerful ability. A common, major challenge in applying such topic models to other tasks is to accurately interpret the meaning of each topic. Topic labeling, as a major interpreting method, has attracted significant attention recently. However, most previous work focuses only on the effectiveness of topic labeling, and less attention has been paid to quickly creating good topic descriptors; meanwhile, it's hard to assign labels to newly emerging topics using most existing methods. To solve the problems above, in this paper, we propose a novel fast topic labeling framework that casts the labeling problem as a k-nearest neighbor (KNN) search problem in a probability vector set. Our experimental results show that the proposed sequential interleaving method based on locality sensitive hashing (LSH) technology is efficient in boosting the comparison speed among probability distributions, and the proposed framework can generate meaningful labels to interpret topics, including newly emerging topics.
Tasks Chunking, Topic Models
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1315/
PDF https://www.aclweb.org/anthology/C16-1315
PWC https://paperswithcode.com/paper/a-novel-fast-framework-for-topic-labeling
Repo
Framework
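
To make the idea of casting topic labeling as a nearest-neighbor search over probability vectors more concrete, here is a minimal sketch. It does not reproduce the paper's sequential interleaving method; it substitutes plain random-hyperplane LSH as a stand-in, and all names, the toy label vectors, and the squared Euclidean distance used for re-ranking are assumptions made for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = List[float]


def make_hyperplanes(dim: int, n_bits: int, seed: int = 0) -> List[Vector]:
    """Draw n_bits random hyperplanes (Gaussian normals) of dimension dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec: Vector, planes: List[Vector]) -> Tuple[bool, ...]:
    """Hash a vector to one bit per hyperplane: the side it falls on."""
    return tuple(sum(p * v for p, v in zip(plane, vec)) >= 0.0
                 for plane in planes)


def build_index(labelled: Dict[str, Vector], planes: List[Vector]):
    """Group candidate labels into buckets keyed by their LSH signature."""
    buckets = defaultdict(list)
    for label, vec in labelled.items():
        buckets[signature(vec, planes)].append(label)
    return buckets


def label_topic(topic: Vector, labelled: Dict[str, Vector], buckets,
                planes: List[Vector], k: int = 1) -> List[str]:
    """Return up to k labels nearest to the topic distribution.

    Only labels in the topic's bucket are compared; if the bucket is
    empty, fall back to scanning every label.
    """
    candidates = buckets.get(signature(topic, planes)) or list(labelled)

    def dist(label: str) -> float:
        return sum((a - b) ** 2 for a, b in zip(topic, labelled[label]))

    return sorted(candidates, key=dist)[:k]


# Hypothetical label vectors over a three-word vocabulary.
labelled = {
    "sports": [0.7, 0.2, 0.1],
    "politics": [0.1, 0.8, 0.1],
    "tech": [0.1, 0.1, 0.8],
}
planes = make_hyperplanes(dim=3, n_bits=4)
buckets = build_index(labelled, planes)
print(label_topic([0.7, 0.2, 0.1], labelled, buckets, planes))  # ['sports']
```

Hashing restricts the exact distance computation to a single bucket, which is where the speed-up over a full scan would come from. A new topic can be labeled without rebuilding the index, since it only needs to be hashed and compared against its bucket.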

AIMU: Actionable Items for Meeting Understanding

Title AIMU: Actionable Items for Meeting Understanding
Authors Yun-Nung Chen, Dilek Hakkani-Tür
Abstract With emerging conversational data, automated content analysis is needed for better data interpretation, so that it is accurately understood and can be effectively integrated and utilized in various applications. The ICSI meeting corpus is a publicly released data set of multi-party meetings in an organization that was released over a decade ago and has been fostering meeting understanding research since then. The original data collection includes transcription of participant turns as well as meta-data annotations, such as disfluencies and dialog act tags. This paper presents an extended set of annotations for the ICSI meeting corpus with a goal of deeply understanding meeting conversations, where participant turns are annotated by actionable items that could be performed by an automated meeting assistant. In addition to the user utterances that contain an actionable item, annotations also include the arguments associated with the actionable item. The set of actionable items is determined by aligning human-human interactions to human-machine interactions, where a data annotation schema designed for a virtual personal assistant (human-machine genre) is adapted to the meetings domain (human-human genre). The data set is formed by annotating participants' utterances in meetings with potential intents/actions considering their contexts. The set of actions targets what could be accomplished by an automated meeting assistant, such as taking a note of action items that a participant commits to, or finding emails or topic-related documents that were mentioned during the meeting. A total of 10 defined intents/actions are considered as actionable items in meetings. Turns that include actionable intents were annotated for 22 public ICSI meetings, which include a total of 21K utterances, segmented by speaker turns. Participants' spoken turns, possible actions along with associated arguments and their vector representations as computed by convolutional deep structured semantic models are included in the data set for future research. We present a detailed statistical analysis of the data set and analyze the performance of applying convolutional deep structured semantic models for an actionable item detection task. The data is available at http://research.microsoft.com/projects/meetingunderstanding/.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1117/
PDF https://www.aclweb.org/anthology/L16-1117
PWC https://paperswithcode.com/paper/aimu-actionable-items-for-meeting
Repo
Framework

Proceedings of the 4th BioNLP Shared Task Workshop

Title Proceedings of the 4th BioNLP Shared Task Workshop
Authors
Abstract
Tasks
Published 2016-08-01
URL https://www.aclweb.org/anthology/W16-3000/
PDF https://www.aclweb.org/anthology/W16-3000
PWC https://paperswithcode.com/paper/proceedings-of-the-4th-bionlp-shared-task
Repo
Framework

A Dataset for ICD-10 Coding of Death Certificates: Creation and Usage

Title A Dataset for ICD-10 Coding of Death Certificates: Creation and Usage
Authors Thomas Lavergne, Aurélie Névéol, Aude Robert, Cyril Grouin, Grégoire Rey, Pierre Zweigenbaum
Abstract Very few datasets have been released for the evaluation of diagnosis coding with the International Classification of Diseases, and only one so far in a language other than English. This paper describes a large-scale dataset prepared from French death certificates, and the problems which needed to be solved to turn it into a dataset suitable for the application of machine learning and natural language processing methods of ICD-10 coding. The dataset includes the free-text statements written by medical doctors, the associated meta-data, the human coder-assigned codes for each statement, as well as the statement segments which supported the coder's decision for each code. The dataset comprises 93,694 death certificates totalling 276,103 statements and 377,677 ICD-10 code assignments (3,457 unique codes). It was made available for an international automated coding shared task, which attracted five participating teams. An extended version of the dataset will be used in a new edition of the shared task.
Tasks Named Entity Recognition
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-5107/
PDF https://www.aclweb.org/anthology/W16-5107
PWC https://paperswithcode.com/paper/a-dataset-for-icd-10-coding-of-death
Repo
Framework

Proceedings of the 12th Workshop on Asian Language Resources (ALR12)

Title Proceedings of the 12th Workshop on Asian Language Resources (ALR12)
Authors
Abstract
Tasks
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-5400/
PDF https://www.aclweb.org/anthology/W16-5400
PWC https://paperswithcode.com/paper/proceedings-of-the-12th-workshop-on-asian
Repo
Framework

Building a Motivational Interviewing Dataset

Title Building a Motivational Interviewing Dataset
Authors Verónica Pérez-Rosas, Rada Mihalcea, Kenneth Resnicow, Satinder Singh, Lawrence An
Abstract
Tasks
Published 2016-06-01
URL https://www.aclweb.org/anthology/W16-0305/
PDF https://www.aclweb.org/anthology/W16-0305
PWC https://paperswithcode.com/paper/building-a-motivational-interviewing-dataset
Repo
Framework

Detection, Disambiguation and Argument Identification of Discourse Connectives in Chinese Discourse Parsing

Title Detection, Disambiguation and Argument Identification of Discourse Connectives in Chinese Discourse Parsing
Authors Yong-Siang Shih, Hsin-Hsi Chen
Abstract In this paper, we investigate four important issues together for explicit discourse relation labelling in Chinese texts: (1) discourse connective extraction, (2) linking ambiguity resolution, (3) relation type disambiguation, and (4) argument boundary identification. In a pipelined Chinese discourse parser, we identify potential connective candidates by string matching, eliminate non-discourse usages from them with a binary classifier, resolve linking ambiguities among connective components by ranking, disambiguate relation types by a multiway classifier, and determine the argument boundaries by conditional random fields. The experiments on Chinese Discourse Treebank show that the F1 scores of 0.7506, 0.7693, 0.7458, and 0.3134 are achieved for discourse usage disambiguation, linking disambiguation, relation type disambiguation, and argument boundary identification, respectively, in a pipelined Chinese discourse parser.
Tasks
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1178/
PDF https://www.aclweb.org/anthology/C16-1178
PWC https://paperswithcode.com/paper/detection-disambiguation-and-argument
Repo
Framework

Learning from Small Sample Sets by Combining Unsupervised Meta-Training with CNNs

Title Learning from Small Sample Sets by Combining Unsupervised Meta-Training with CNNs
Authors Yu-Xiong Wang, Martial Hebert
Abstract This work explores CNNs for the recognition of novel categories from few examples. Inspired by the transferability properties of CNNs, we introduce an additional unsupervised meta-training stage that exposes multiple top layer units to a large amount of unlabeled real-world images. By encouraging these units to learn diverse sets of low-density separators across the unlabeled data, we capture a more generic, richer description of the visual world, which decouples these units from ties to a specific set of categories. We propose an unsupervised margin maximization that jointly estimates compact high-density regions and infers low-density separators. The low-density separator (LDS) modules can be plugged into any or all of the top layers of a standard CNN architecture. The resulting CNNs significantly improve the performance in scene classification, fine-grained recognition, and action recognition with small training samples.
Tasks Scene Classification, Temporal Action Localization
Published 2016-12-01
URL http://papers.nips.cc/paper/6408-learning-from-small-sample-sets-by-combining-unsupervised-meta-training-with-cnns
PDF http://papers.nips.cc/paper/6408-learning-from-small-sample-sets-by-combining-unsupervised-meta-training-with-cnns.pdf
PWC https://paperswithcode.com/paper/learning-from-small-sample-sets-by-combining
Repo
Framework

MixKMeans: Clustering Question-Answer Archives

Title MixKMeans: Clustering Question-Answer Archives
Authors Deepak P
Abstract
Tasks Question Answering, Semantic Textual Similarity
Published 2016-11-01
URL https://www.aclweb.org/anthology/D16-1164/
PDF https://www.aclweb.org/anthology/D16-1164
PWC https://paperswithcode.com/paper/mixkmeans-clustering-question-answer-archives
Repo
Framework

Filter and Match Approach to Pair-wise Web URI Linking

Title Filter and Match Approach to Pair-wise Web URI Linking
Authors S. Shivashankar, Yitong Li, Afshin Rahimi
Abstract
Tasks Document Classification, Machine Translation, Semantic Textual Similarity
Published 2016-12-01
URL https://www.aclweb.org/anthology/U16-1022/
PDF https://www.aclweb.org/anthology/U16-1022
PWC https://paperswithcode.com/paper/filter-and-match-approach-to-pair-wise-web
Repo
Framework

Knowledge-Based Semantic Embedding for Machine Translation

Title Knowledge-Based Semantic Embedding for Machine Translation
Authors Chen Shi, Shujie Liu, Shuo Ren, Shi Feng, Mu Li, Ming Zhou, Xu Sun, Houfeng Wang
Abstract
Tasks Machine Translation
Published 2016-08-01
URL https://www.aclweb.org/anthology/P16-1212/
PDF https://www.aclweb.org/anthology/P16-1212
PWC https://paperswithcode.com/paper/knowledge-based-semantic-embedding-for
Repo
Framework

Answering Yes-No Questions by Penalty Scoring in History Subjects of University Entrance Examinations

Title Answering Yes-No Questions by Penalty Scoring in History Subjects of University Entrance Examinations
Authors Yoshinobu Kano
Abstract Answering yes–no questions is more difficult than simply retrieving ranked search results. To answer yes–no questions, especially when the correct answer is no, one must find an objectionable keyword that makes the question's answer no. Existing systems, such as factoid-based ones, cannot answer yes–no questions very well because of insufficient handling of such objectionable keywords. We suggest an algorithm that answers yes–no questions by assigning an importance to objectionable keywords. Concretely speaking, we suggest a penalized scoring method that finds parts of documents that include such objectionable keywords and lowers their scores. We check a keyword distribution for each part of a document such as a paragraph, calculating the keyword density as a basic score. Then we use an objectionable keyword penalty when a keyword does not appear in a target part but appears in other parts of the document. Our algorithm is robust for open domain problems because it requires no training. We achieved an F1 score 4.45 points higher than the best score of the NTCIR-10 RITE2 shared task, and also obtained the best score in the 2014 mock university examination challenge of the Todai Robot project.
Tasks Question Answering
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4413/
PDF https://www.aclweb.org/anthology/W16-4413
PWC https://paperswithcode.com/paper/answering-yes-no-questions-by-penalty-scoring
Repo
Framework
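
The penalized scoring idea in the abstract above can be sketched as follows. This is only an illustration under stated assumptions, not the paper's actual formula: "keyword density" is taken here to mean the fraction of question keywords found in a part, the penalty weight of 0.5 is arbitrary, and the function name, whitespace tokenization, and requirement that keywords be lowercase are hypothetical choices.

```python
from typing import List


def score_parts(parts: List[str], keywords: List[str],
                penalty: float = 0.5) -> List[float]:
    """Score each document part against the question keywords.

    Basic score: fraction of keywords present in the part (assumed
    definition of keyword density).
    Penalty: applied for every keyword that is missing from the part
    but does appear in some other part of the same document.
    Keywords are expected to be lowercase.
    """
    part_words = [set(p.lower().split()) for p in parts]
    scores = []
    for i, words in enumerate(part_words):
        present = sum(1 for k in keywords if k in words)
        density = present / len(keywords)
        # Keywords absent here but found elsewhere in the document.
        elsewhere = sum(
            1 for k in keywords
            if k not in words
            and any(k in other for j, other in enumerate(part_words) if j != i)
        )
        scores.append(density - penalty * elsewhere / len(keywords))
    return scores


parts = [
    "napoleon was exiled to elba",
    "wellington won at waterloo",
]
keywords = ["napoleon", "won", "waterloo"]
print(score_parts(parts, keywords))
```

In this toy document no single part contains all three keywords, so both parts are penalized. The first part, which is missing two keywords that occur elsewhere, ends up with a score of zero, which is the kind of signal that would push the answer towards "no". No training data is involved, consistent with the abstract's claim that the method requires no training.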

Learning Paraphrasing for Multiword Expressions

Title Learning Paraphrasing for Multiword Expressions
Authors Seid Muhie Yimam, Héctor Martínez Alonso, Martin Riedl, Chris Biemann
Abstract
Tasks Learning-To-Rank, Machine Translation, Natural Language Inference, Text Simplification, Word Sense Disambiguation
Published 2016-08-01
URL https://www.aclweb.org/anthology/W16-1801/
PDF https://www.aclweb.org/anthology/W16-1801
PWC https://paperswithcode.com/paper/learning-paraphrasing-for-multiword
Repo
Framework