Paper Group NANR 23
Development of a Bengali parser by cross-lingual transfer from Hindi
Title | Development of a Bengali parser by cross-lingual transfer from Hindi |
Authors | Ayan Das, Agnivo Saha, Sudeshna Sarkar |
Abstract | In recent years there has been a lot of interest in cross-lingual parsing for developing treebanks for languages with small or no annotated treebanks. In this paper, we explore the development of a cross-lingual transfer parser from Hindi to Bengali using a Hindi parser and a Hindi-Bengali parallel corpus. A parser is trained and applied to the Hindi sentences of the parallel corpus and the parse trees are projected to construct probable parse trees of the corresponding Bengali sentences. Only about 14% of these trees are complete (transferred trees contain all the target sentence words) and they are used to construct a Bengali parser. We relax the criteria of completeness to consider well-formed trees (43% of the trees) leading to an improvement. We note that the words often do not have a one-to-one mapping in the two languages but considering sentences at the chunk-level results in better correspondence between the two languages. Based on this we present a method to use chunking as a preprocessing step and do the transfer on the chunk trees. We find that about 72% of the projected parse trees of Bengali are now well-formed. The resultant parser achieves significant improvement in both Unlabeled Attachment Score (UAS) as well as Labeled Attachment Score (LAS) over the baseline word-level transferred parser. |
Tasks | Chunking, Cross-Lingual Transfer |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-3704/ |
https://www.aclweb.org/anthology/W16-3704 | |
PWC | https://paperswithcode.com/paper/development-of-a-bengali-parser-by-cross |
Repo | |
Framework | |
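The abstract above reports results as Unlabeled and Labeled Attachment Scores. As a minimal sketch of how these two metrics are conventionally computed (not the authors' evaluation code), the function below assumes each token is represented by a hypothetical `(head_index, label)` pair and that gold and predicted analyses are aligned token by token. The example sentence and labels are made up for illustration.

```python
from typing import List, Tuple

# One token's analysis: (index of its head, dependency label).
# This representation is an assumption for illustration only.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) for token-aligned gold and predicted arcs."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sentence")
    # UAS: the predicted head matches the gold head.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS: both the head and the dependency label match.
    full_hits = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return head_hits / n, full_hits / n


# Made-up four-token example: head 0 denotes the artificial root.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "nmod")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod"), (2, "nmod")]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")
```

LAS can never exceed UAS under this definition, because every token counted for LAS also has the correct head.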
Cross-lingual transfer parser from Hindi to Bengali using delexicalization and chunking
Title | Cross-lingual transfer parser from Hindi to Bengali using delexicalization and chunking |
Authors | Ayan Das, Agnivo Saha, Sudeshna Sarkar |
Abstract | |
Tasks | Chunking, Cross-Lingual Transfer |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-6313/ |
https://www.aclweb.org/anthology/W16-6313 | |
PWC | https://paperswithcode.com/paper/cross-lingual-transfer-parser-from-hindi-to |
Repo | |
Framework | |
A Novel Fast Framework for Topic Labeling Based on Similarity-preserved Hashing
Title | A Novel Fast Framework for Topic Labeling Based on Similarity-preserved Hashing |
Authors | Xian-Ling Mao, Yi-Jing Hao, Qiang Zhou, Wen-Qing Yuan, Liner Yang, Heyan Huang |
Abstract | Recently, topic modeling has been widely applied in data mining due to its powerful ability. A common, major challenge in applying such topic models to other tasks is to accurately interpret the meaning of each topic. Topic labeling, as a major interpreting method, has attracted significant attention recently. However, most of previous works only focus on the effectiveness of topic labeling, and less attention has been paid to quickly creating good topic descriptors; meanwhile, it's hard to assign labels for new emerging topics by using most of existing methods. To solve the problems above, in this paper, we propose a novel fast topic labeling framework that casts the labeling problem as a k-nearest neighbor (KNN) search problem in a probability vector set. Our experimental results show that the proposed sequential interleaving method based on locality sensitive hashing (LSH) technology is efficient in boosting the comparison speed among probability distributions, and the proposed framework can generate meaningful labels to interpret topics, including new emerging topics. |
Tasks | Chunking, Topic Models |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1315/ |
https://www.aclweb.org/anthology/C16-1315 | |
PWC | https://paperswithcode.com/paper/a-novel-fast-framework-for-topic-labeling |
Repo | |
Framework | |
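The abstract above frames topic labeling as a k-nearest-neighbor search over probability vectors. The sketch below shows only the exhaustive baseline form of that search, not the paper's LSH-based sequential interleaving method. The choice of Hellinger distance, the function names, and the toy candidate vectors are all assumptions made for illustration; the abstract does not state which distance the authors use.

```python
import math
from typing import Dict, List, Sequence


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance between two discrete distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return math.sqrt(
        sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)) / 2.0
    )


def nearest_labels(
    topic: Sequence[float], candidates: Dict[str, Sequence[float]], k: int
) -> List[str]:
    """Return the k candidate labels whose vectors are closest to the topic."""
    # Exhaustive comparison against every candidate; a hashing scheme would
    # replace this step to avoid scanning the whole set.
    ranked = sorted(candidates, key=lambda name: hellinger(topic, candidates[name]))
    return ranked[:k]


# Toy data: a topic and three hypothetical candidate labels over a 3-word vocabulary.
topic = [0.7, 0.2, 0.1]
candidates = {
    "sports": [0.6, 0.3, 0.1],
    "finance": [0.1, 0.2, 0.7],
    "politics": [0.3, 0.4, 0.3],
}
top = nearest_labels(topic, candidates, k=2)
print(top)
```

The exhaustive scan costs one distance computation per candidate, which is the cost that an approximate method such as LSH is meant to reduce.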
AIMU: Actionable Items for Meeting Understanding
Title | AIMU: Actionable Items for Meeting Understanding |
Authors | Yun-Nung Chen, Dilek Hakkani-Tür |
Abstract | With emerging conversational data, automated content analysis is needed for better data interpretation, so that it is accurately understood and can be effectively integrated and utilized in various applications. ICSI meeting corpus is a publicly released data set of multi-party meetings in an organization that has been released over a decade ago, and has been fostering meeting understanding research since then. The original data collection includes transcription of participant turns as well as meta-data annotations, such as disfluencies and dialog act tags. This paper presents an extended set of annotations for the ICSI meeting corpus with a goal of deeply understanding meeting conversations, where participant turns are annotated by actionable items that could be performed by an automated meeting assistant. In addition to the user utterances that contain an actionable item, annotations also include the arguments associated with the actionable item. The set of actionable items are determined by aligning human-human interactions to human-machine interactions, where a data annotation schema designed for a virtual personal assistant (human-machine genre) is adapted to the meetings domain (human-human genre). The data set is formed by annotating participants' utterances in meetings with potential intents/actions considering their contexts. The set of actions target what could be accomplished by an automated meeting assistant, such as taking a note of action items that a participant commits to, or finding emails or topic related documents that were mentioned during the meeting. A total of 10 defined intents/actions are considered as actionable items in meetings. Turns that include actionable intents were annotated for 22 public ICSI meetings, that include a total of 21K utterances, segmented by speaker turns. Participants' spoken turns, possible actions along with associated arguments and their vector representations as computed by convolutional deep structured semantic models are included in the data set for future research. We present a detailed statistical analysis of the data set and analyze the performance of applying convolutional deep structured semantic models for an actionable item detection task. The data is available at http://research.microsoft.com/projects/meetingunderstanding/. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1117/ |
https://www.aclweb.org/anthology/L16-1117 | |
PWC | https://paperswithcode.com/paper/aimu-actionable-items-for-meeting |
Repo | |
Framework | |
Proceedings of the 4th BioNLP Shared Task Workshop
Title | Proceedings of the 4th BioNLP Shared Task Workshop |
Authors | |
Abstract | |
Tasks | |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-3000/ |
https://www.aclweb.org/anthology/W16-3000 | |
PWC | https://paperswithcode.com/paper/proceedings-of-the-4th-bionlp-shared-task |
Repo | |
Framework | |
A Dataset for ICD-10 Coding of Death Certificates: Creation and Usage
Title | A Dataset for ICD-10 Coding of Death Certificates: Creation and Usage |
Authors | Thomas Lavergne, Aurélie Névéol, Aude Robert, Cyril Grouin, Grégoire Rey, Pierre Zweigenbaum |
Abstract | Very few datasets have been released for the evaluation of diagnosis coding with the International Classification of Diseases, and only one so far in a language other than English. This paper describes a large-scale dataset prepared from French death certificates, and the problems which needed to be solved to turn it into a dataset suitable for the application of machine learning and natural language processing methods of ICD-10 coding. The dataset includes the free-text statements written by medical doctors, the associated meta-data, the human coder-assigned codes for each statement, as well as the statement segments which supported the coder's decision for each code. The dataset comprises 93,694 death certificates totalling 276,103 statements and 377,677 ICD-10 code assignments (3,457 unique codes). It was made available for an international automated coding shared task, which attracted five participating teams. An extended version of the dataset will be used in a new edition of the shared task. |
Tasks | Named Entity Recognition |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-5107/ |
https://www.aclweb.org/anthology/W16-5107 | |
PWC | https://paperswithcode.com/paper/a-dataset-for-icd-10-coding-of-death |
Repo | |
Framework | |
Proceedings of the 12th Workshop on Asian Language Resources (ALR12)
Title | Proceedings of the 12th Workshop on Asian Language Resources (ALR12) |
Authors | |
Abstract | |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-5400/ |
https://www.aclweb.org/anthology/W16-5400 | |
PWC | https://paperswithcode.com/paper/proceedings-of-the-12th-workshop-on-asian |
Repo | |
Framework | |
Building a Motivational Interviewing Dataset
Title | Building a Motivational Interviewing Dataset |
Authors | Verónica Pérez-Rosas, Rada Mihalcea, Kenneth Resnicow, Satinder Singh, Lawrence An |
Abstract | |
Tasks | |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/W16-0305/ |
https://www.aclweb.org/anthology/W16-0305 | |
PWC | https://paperswithcode.com/paper/building-a-motivational-interviewing-dataset |
Repo | |
Framework | |
Detection, Disambiguation and Argument Identification of Discourse Connectives in Chinese Discourse Parsing
Title | Detection, Disambiguation and Argument Identification of Discourse Connectives in Chinese Discourse Parsing |
Authors | Yong-Siang Shih, Hsin-Hsi Chen |
Abstract | In this paper, we investigate four important issues together for explicit discourse relation labelling in Chinese texts: (1) discourse connective extraction, (2) linking ambiguity resolution, (3) relation type disambiguation, and (4) argument boundary identification. In a pipelined Chinese discourse parser, we identify potential connective candidates by string matching, eliminate non-discourse usages from them with a binary classifier, resolve linking ambiguities among connective components by ranking, disambiguate relation types by a multiway classifier, and determine the argument boundaries by conditional random fields. The experiments on Chinese Discourse Treebank show that the F1 scores of 0.7506, 0.7693, 0.7458, and 0.3134 are achieved for discourse usage disambiguation, linking disambiguation, relation type disambiguation, and argument boundary identification, respectively, in a pipelined Chinese discourse parser. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1178/ |
https://www.aclweb.org/anthology/C16-1178 | |
PWC | https://paperswithcode.com/paper/detection-disambiguation-and-argument |
Repo | |
Framework | |
Learning from Small Sample Sets by Combining Unsupervised Meta-Training with CNNs
Title | Learning from Small Sample Sets by Combining Unsupervised Meta-Training with CNNs |
Authors | Yu-Xiong Wang, Martial Hebert |
Abstract | This work explores CNNs for the recognition of novel categories from few examples. Inspired by the transferability properties of CNNs, we introduce an additional unsupervised meta-training stage that exposes multiple top layer units to a large amount of unlabeled real-world images. By encouraging these units to learn diverse sets of low-density separators across the unlabeled data, we capture a more generic, richer description of the visual world, which decouples these units from ties to a specific set of categories. We propose an unsupervised margin maximization that jointly estimates compact high-density regions and infers low-density separators. The low-density separator (LDS) modules can be plugged into any or all of the top layers of a standard CNN architecture. The resulting CNNs significantly improve the performance in scene classification, fine-grained recognition, and action recognition with small training samples. |
Tasks | Scene Classification, Temporal Action Localization |
Published | 2016-12-01 |
URL | http://papers.nips.cc/paper/6408-learning-from-small-sample-sets-by-combining-unsupervised-meta-training-with-cnns |
http://papers.nips.cc/paper/6408-learning-from-small-sample-sets-by-combining-unsupervised-meta-training-with-cnns.pdf | |
PWC | https://paperswithcode.com/paper/learning-from-small-sample-sets-by-combining |
Repo | |
Framework | |
MixKMeans: Clustering Question-Answer Archives
Title | MixKMeans: Clustering Question-Answer Archives |
Authors | Deepak P |
Abstract | |
Tasks | Question Answering, Semantic Textual Similarity |
Published | 2016-11-01 |
URL | https://www.aclweb.org/anthology/D16-1164/ |
https://www.aclweb.org/anthology/D16-1164 | |
PWC | https://paperswithcode.com/paper/mixkmeans-clustering-question-answer-archives |
Repo | |
Framework | |
Filter and Match Approach to Pair-wise Web URI Linking
Title | Filter and Match Approach to Pair-wise Web URI Linking |
Authors | S. Shivashankar, Yitong Li, Afshin Rahimi |
Abstract | |
Tasks | Document Classification, Machine Translation, Semantic Textual Similarity |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/U16-1022/ |
https://www.aclweb.org/anthology/U16-1022 | |
PWC | https://paperswithcode.com/paper/filter-and-match-approach-to-pair-wise-web |
Repo | |
Framework | |
Knowledge-Based Semantic Embedding for Machine Translation
Title | Knowledge-Based Semantic Embedding for Machine Translation |
Authors | Chen Shi, Shujie Liu, Shuo Ren, Shi Feng, Mu Li, Ming Zhou, Xu Sun, Houfeng Wang |
Abstract | |
Tasks | Machine Translation |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/P16-1212/ |
https://www.aclweb.org/anthology/P16-1212 | |
PWC | https://paperswithcode.com/paper/knowledge-based-semantic-embedding-for |
Repo | |
Framework | |
Answering Yes-No Questions by Penalty Scoring in History Subjects of University Entrance Examinations
Title | Answering Yes-No Questions by Penalty Scoring in History Subjects of University Entrance Examinations |
Authors | Yoshinobu Kano |
Abstract | Answering yes–no questions is more difficult than simply retrieving ranked search results. To answer yes–no questions, especially when the correct answer is no, one must find an objectionable keyword that makes the question's answer no. Existing systems, such as factoid-based ones, cannot answer yes–no questions very well because of insufficient handling of such objectionable keywords. We suggest an algorithm that answers yes–no questions by assigning an importance to objectionable keywords. Concretely speaking, we suggest a penalized scoring method that finds and makes lower score for parts of documents that include such objectionable keywords. We check a keyword distribution for each part of a document such as a paragraph, calculating the keyword density as a basic score. Then we use an objectionable keyword penalty when a keyword does not appear in a target part but appears in other parts of the document. Our algorithm is robust for open domain problems because it requires no training. We achieved 4.45 point better results in F1 scores than the best score of the NTCIR-10 RITE2 shared task, also obtained the best score in 2014 mock university examination challenge of the Todai Robot project. |
Tasks | Question Answering |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4413/ |
https://www.aclweb.org/anthology/W16-4413 | |
PWC | https://paperswithcode.com/paper/answering-yes-no-questions-by-penalty-scoring |
Repo | |
Framework | |
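The abstract above describes a basic keyword-density score per document part, reduced by a penalty for keywords that are missing from that part but appear elsewhere in the document. The following is a loose sketch of that idea only. The exact density formula, the penalty weight of 0.5, the normalization by the number of keywords, and all names are assumptions, since the abstract does not give them.

```python
from typing import List, Set


def score_parts(
    keywords: Set[str], parts: List[List[str]], penalty: float = 0.5
) -> List[float]:
    """Score each document part by keyword density minus a penalty term."""
    if not keywords:
        raise ValueError("at least one question keyword is required")
    part_sets = [set(words) for words in parts]
    scores = []
    for i, words in enumerate(part_sets):
        # Vocabulary of every other part of the same document.
        elsewhere = set().union(*(s for j, s in enumerate(part_sets) if j != i))
        # Basic score (assumed form): share of question keywords found in this part.
        density = len(keywords & words) / len(keywords)
        # Keywords absent here but present elsewhere are treated as objectionable.
        objectionable = (keywords - words) & elsewhere
        scores.append(density - penalty * len(objectionable) / len(keywords))
    return scores


# Made-up question keywords and a three-paragraph document, already tokenized.
keywords = {"napoleon", "waterloo", "1815"}
parts = [
    ["napoleon", "lost", "at", "waterloo", "in", "1815"],
    ["napoleon", "was", "exiled", "in", "1815"],
    ["wellington", "won"],
]
scores = score_parts(keywords, parts)
print(scores)
```

Under these assumptions the first paragraph, which contains every keyword, scores highest, and the paragraph that shares no keywords with the question is pushed below zero. Like the method described in the abstract, the sketch needs no training data.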
Learning Paraphrasing for Multiword Expressions
Title | Learning Paraphrasing for Multiword Expressions |
Authors | Seid Muhie Yimam, Héctor Martínez Alonso, Martin Riedl, Chris Biemann |
Abstract | |
Tasks | Learning-To-Rank, Machine Translation, Natural Language Inference, Text Simplification, Word Sense Disambiguation |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-1801/ |
https://www.aclweb.org/anthology/W16-1801 | |
PWC | https://paperswithcode.com/paper/learning-paraphrasing-for-multiword |
Repo | |
Framework | |