May 5, 2019


Paper Group NANR 23


Development of a Bengali parser by cross-lingual transfer from Hindi


Title	Development of a Bengali parser by cross-lingual transfer from Hindi
Authors	Ayan Das, Agnivo Saha, Sudeshna Sarkar
Abstract	In recent years there has been a lot of interest in cross-lingual parsing for developing treebanks for languages with small or no annotated treebanks. In this paper, we explore the development of a cross-lingual transfer parser from Hindi to Bengali using a Hindi parser and a Hindi-Bengali parallel corpus. A parser is trained and applied to the Hindi sentences of the parallel corpus and the parse trees are projected to construct probable parse trees of the corresponding Bengali sentences. Only about 14% of these trees are complete (transferred trees contain all the target sentence words) and they are used to construct a Bengali parser. We relax the criteria of completeness to consider well-formed trees (43% of the trees) leading to an improvement. We note that the words often do not have a one-to-one mapping in the two languages but considering sentences at the chunk-level results in better correspondence between the two languages. Based on this we present a method to use chunking as a preprocessing step and do the transfer on the chunk trees. We find that about 72% of the projected parse trees of Bengali are now well-formed. The resultant parser achieves significant improvement in both Unlabeled Attachment Score (UAS) as well as Labeled Attachment Score (LAS) over the baseline word-level transferred parser.
Tasks	Chunking, Cross-Lingual Transfer
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-3704/
PDF	https://www.aclweb.org/anthology/W16-3704
PWC	https://paperswithcode.com/paper/development-of-a-bengali-parser-by-cross
Repo
Framework
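The word-level projection step described in the Bengali parser abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a one-to-one word alignment (which the abstract notes often does not hold), represents each tree as a list of 1-based head indices with 0 for the root, and uses the hypothetical helper names `project_heads` and `is_complete`. Completeness follows the abstract's definition: every target word receives a head.

```python
from typing import Dict, List, Optional


def project_heads(
    src_heads: List[int],
    alignment: Dict[int, int],
    tgt_len: int,
) -> List[Optional[int]]:
    # src_heads[i] is the 1-based head of source token i + 1 (0 means root).
    # alignment maps a 1-based source index to a 1-based target index and is
    # ASSUMED to be one-to-one (a simplification of real Hindi-Bengali data).
    tgt_heads: List[Optional[int]] = [None] * tgt_len
    for s_dep, t_dep in alignment.items():
        s_head = src_heads[s_dep - 1]
        if s_head == 0:
            tgt_heads[t_dep - 1] = 0  # the source root stays a root
        elif s_head in alignment:
            tgt_heads[t_dep - 1] = alignment[s_head]  # copy the arc across
        # otherwise the source head has no aligned target word: leave unattached
    return tgt_heads


def is_complete(tgt_heads: List[Optional[int]]) -> bool:
    # "Complete" as in the abstract: every target word received a head.
    return all(h is not None for h in tgt_heads)


heads = project_heads([2, 0, 2], {1: 1, 2: 2, 3: 3}, 3)
print(heads, is_complete(heads))  # [2, 0, 2] True
```

Target words whose source head has no aligned counterpart stay unattached, which is one reason a strict completeness criterion keeps so few projected trees.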

Cross-lingual transfer parser from Hindi to Bengali using delexicalization and chunking


Title	Cross-lingual transfer parser from Hindi to Bengali using delexicalization and chunking
Authors	Ayan Das, Agnivo Saha, Sudeshna Sarkar
Abstract
Tasks	Chunking, Cross-Lingual Transfer
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-6313/
PDF	https://www.aclweb.org/anthology/W16-6313
PWC	https://paperswithcode.com/paper/cross-lingual-transfer-parser-from-hindi-to
Repo
Framework

A Novel Fast Framework for Topic Labeling Based on Similarity-preserved Hashing


Title	A Novel Fast Framework for Topic Labeling Based on Similarity-preserved Hashing
Authors	Xian-Ling Mao, Yi-Jing Hao, Qiang Zhou, Wen-Qing Yuan, Liner Yang, Heyan Huang
Abstract	Recently, topic modeling has been widely applied in data mining due to its modeling power. A common, major challenge in applying such topic models to other tasks is to accurately interpret the meaning of each topic. Topic labeling, as a major interpretation method, has attracted significant attention recently. However, most previous works focus only on the effectiveness of topic labeling, and less attention has been paid to quickly creating good topic descriptors; meanwhile, it's hard to assign labels to newly emerging topics using most existing methods. To solve these problems, in this paper we propose a novel fast topic labeling framework that casts the labeling problem as a k-nearest neighbor (KNN) search problem in a set of probability vectors. Our experimental results show that the proposed sequential interleaving method based on locality sensitive hashing (LSH) is efficient at speeding up comparisons among probability distributions, and the proposed framework can generate meaningful labels to interpret topics, including newly emerging topics.
Tasks	Chunking, Topic Models
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-1315/
PDF	https://www.aclweb.org/anthology/C16-1315
PWC	https://paperswithcode.com/paper/a-novel-fast-framework-for-topic-labeling
Repo
Framework
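The topic labeling abstract casts labeling as k-nearest-neighbor search over probability vectors, accelerated with LSH. The sketch below is a generic single-table random-hyperplane LSH index with exact cosine re-ranking, not the paper's sequential interleaving method, whose details the abstract does not give. The class name `HyperplaneLSH`, the label names, and the use of cosine similarity (rather than a divergence between distributions) are all assumptions for illustration.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class HyperplaneLSH:
    """Single-table random-hyperplane LSH (a stand-in, not the paper's method)."""

    def __init__(self, dim: int, n_bits: int = 4, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Each random hyperplane contributes one bit of the hash signature.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        self.buckets: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
        self.items: Dict[str, Sequence[float]] = {}

    def _signature(self, v: Sequence[float]) -> Tuple[int, ...]:
        # A bit is 1 when v lies on the non-negative side of the hyperplane.
        return tuple(int(sum(p * x for p, x in zip(plane, v)) >= 0.0) for plane in self.planes)

    def add(self, label: str, v: Sequence[float]) -> None:
        self.items[label] = v
        self.buckets[self._signature(v)].append(label)

    def query(self, v: Sequence[float], k: int = 1) -> List[str]:
        # Only vectors sharing the query's bucket are compared exactly;
        # if that bucket is empty, fall back to an exhaustive scan.
        candidates = self.buckets.get(self._signature(v)) or list(self.items)
        ranked = sorted(candidates, key=lambda lab: cosine(v, self.items[lab]), reverse=True)
        return ranked[:k]


index = HyperplaneLSH(dim=3)
index.add("sports", [0.7, 0.2, 0.1])
index.add("politics", [0.1, 0.2, 0.7])
print(index.query([0.7, 0.2, 0.1], k=1))  # ['sports']
```

With a single hash table a query only sees vectors in its own bucket, so true neighbors can be missed; practical LSH setups usually combine several tables to trade memory for recall.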

AIMU: Actionable Items for Meeting Understanding


Title	AIMU: Actionable Items for Meeting Understanding
Authors	Yun-Nung Chen, Dilek Hakkani-Tür
Abstract	With emerging conversational data, automated content analysis is needed for better data interpretation, so that the data is accurately understood and can be effectively integrated and utilized in various applications. The ICSI meeting corpus is a publicly released data set of multi-party meetings in an organization; it was released over a decade ago and has fostered meeting understanding research since then. The original data collection includes transcriptions of participant turns as well as meta-data annotations, such as disfluencies and dialog act tags. This paper presents an extended set of annotations for the ICSI meeting corpus with the goal of deeply understanding meeting conversations, where participant turns are annotated with actionable items that could be performed by an automated meeting assistant. In addition to the user utterances that contain an actionable item, the annotations also include the arguments associated with the actionable item. The set of actionable items is determined by aligning human-human interactions to human-machine interactions, where a data annotation schema designed for a virtual personal assistant (human-machine genre) is adapted to the meetings domain (human-human genre). The data set is formed by annotating participants' utterances in meetings with potential intents/actions, considering their contexts. The set of actions targets what could be accomplished by an automated meeting assistant, such as taking note of action items that a participant commits to, or finding emails or topic-related documents that were mentioned during the meeting. A total of 10 defined intents/actions are considered as actionable items in meetings. Turns that include actionable intents were annotated for 22 public ICSI meetings, which include a total of 21K utterances, segmented by speaker turns. Participants' spoken turns, possible actions along with associated arguments, and their vector representations as computed by convolutional deep structured semantic models are included in the data set for future research. We present a detailed statistical analysis of the data set and analyze the performance of applying convolutional deep structured semantic models to an actionable item detection task. The data is available at http://research.microsoft.com/projects/meetingunderstanding/.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1117/
PDF	https://www.aclweb.org/anthology/L16-1117
PWC	https://paperswithcode.com/paper/aimu-actionable-items-for-meeting
Repo
Framework
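The AIMU data set includes CDSSM vector representations of utterances and actions. DSSM-style models score a pair of texts by the cosine similarity of their vectors, so one simple way to use such vectors for actionable item detection is to pick the closest action vector. The sketch below assumes the vectors are already computed and uses a hypothetical `detect_action` function, hypothetical intent names, and a hypothetical threshold of 0.5; it is not the evaluation setup reported in the paper.

```python
import math
from typing import Dict, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def detect_action(
    utterance_vec: Sequence[float],
    action_vecs: Dict[str, Sequence[float]],
    threshold: float = 0.5,  # hypothetical cut-off for "no actionable item"
) -> Optional[str]:
    """Return the most similar action, or None if none reaches the threshold."""
    if not action_vecs:
        return None
    scores = {name: cosine(utterance_vec, vec) for name, vec in action_vecs.items()}
    best = max(scores, key=lambda name: scores[name])
    return best if scores[best] >= threshold else None


# Toy 3-dimensional vectors; real CDSSM vectors are much larger.
actions = {"take_note": [1.0, 0.0, 0.0], "find_email": [0.0, 1.0, 0.0]}
print(detect_action([0.9, 0.1, 0.0], actions))  # take_note
print(detect_action([0.0, 0.0, 1.0], actions))  # None
```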

Proceedings of the 4th BioNLP Shared Task Workshop


Title	Proceedings of the 4th BioNLP Shared Task Workshop
Authors
Abstract
Tasks
Published	2016-08-01
URL	https://www.aclweb.org/anthology/W16-3000/
PDF	https://www.aclweb.org/anthology/W16-3000
PWC	https://paperswithcode.com/paper/proceedings-of-the-4th-bionlp-shared-task
Repo
Framework

A Dataset for ICD-10 Coding of Death Certificates: Creation and Usage


Title	A Dataset for ICD-10 Coding of Death Certificates: Creation and Usage
Authors	Thomas Lavergne, Aurélie Névéol, Aude Robert, Cyril Grouin, Grégoire Rey, Pierre Zweigenbaum
Abstract	Very few datasets have been released for the evaluation of diagnosis coding with the International Classification of Diseases, and only one so far in a language other than English. This paper describes a large-scale dataset prepared from French death certificates, and the problems which needed to be solved to turn it into a dataset suitable for the application of machine learning and natural language processing methods of ICD-10 coding. The dataset includes the free-text statements written by medical doctors, the associated meta-data, the human coder-assigned codes for each statement, as well as the statement segments which supported the coder's decision for each code. The dataset comprises 93,694 death certificates totalling 276,103 statements and 377,677 ICD-10 code assignments (3,457 unique codes). It was made available for an international automated coding shared task, which attracted five participating teams. An extended version of the dataset will be used in a new edition of the shared task.
Tasks	Named Entity Recognition
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-5107/
PDF	https://www.aclweb.org/anthology/W16-5107
PWC	https://paperswithcode.com/paper/a-dataset-for-icd-10-coding-of-death
Repo
Framework
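The totals in the ICD-10 abstract imply a few per-item averages that help size the coding task. The snippet below only derives these ratios from the reported counts; they are back-of-the-envelope figures, not statistics reported by the authors.

```python
# Totals as reported in the abstract.
certificates = 93_694
statements = 276_103
code_assignments = 377_677

print(round(statements / certificates, 2))        # statements per certificate: 2.95
print(round(code_assignments / statements, 2))    # codes per statement: 1.37
print(round(code_assignments / certificates, 2))  # codes per certificate: 4.03
```

Since there are more code assignments than statements, a statement can carry more than one code, so the task is not a single-label classification per statement.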

Proceedings of the 12th Workshop on Asian Language Resources (ALR12)


Title	Proceedings of the 12th Workshop on Asian Language Resources (ALR12)
Authors
Abstract
Tasks
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-5400/
PDF	https://www.aclweb.org/anthology/W16-5400
PWC	https://paperswithcode.com/paper/proceedings-of-the-12th-workshop-on-asian
Repo
Framework

Building a Motivational Interviewing Dataset


Title	Building a Motivational Interviewing Dataset
Authors	Verónica Pérez-Rosas, Rada Mihalcea, Kenneth Resnicow, Satinder Singh, Lawrence An
Abstract
Tasks
Published	2016-06-01
URL	https://www.aclweb.org/anthology/W16-0305/
PDF	https://www.aclweb.org/anthology/W16-0305
PWC	https://paperswithcode.com/paper/building-a-motivational-interviewing-dataset
Repo
Framework

Detection, Disambiguation and Argument Identification of Discourse Connectives in Chinese Discourse Parsing


Title	Detection, Disambiguation and Argument Identification of Discourse Connectives in Chinese Discourse Parsing
Authors	Yong-Siang Shih, Hsin-Hsi Chen
Abstract	In this paper, we investigate four important issues together for explicit discourse relation labelling in Chinese texts: (1) discourse connective extraction, (2) linking ambiguity resolution, (3) relation type disambiguation, and (4) argument boundary identification. In a pipelined Chinese discourse parser, we identify potential connective candidates by string matching, eliminate non-discourse usages from them with a binary classifier, resolve linking ambiguities among connective components by ranking, disambiguate relation types by a multiway classifier, and determine the argument boundaries by conditional random fields. The experiments on Chinese Discourse Treebank show that the F1 scores of 0.7506, 0.7693, 0.7458, and 0.3134 are achieved for discourse usage disambiguation, linking disambiguation, relation type disambiguation, and argument boundary identification, respectively, in a pipelined Chinese discourse parser.
Tasks
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-1178/
PDF	https://www.aclweb.org/anthology/C16-1178
PWC	https://paperswithcode.com/paper/detection-disambiguation-and-argument
Repo
Framework
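The first stage of the pipelined Chinese discourse parser identifies connective candidates by string matching. A minimal sketch of that stage follows; the five-entry lexicon, the function name `find_candidates`, and the example sentence are hypothetical, and the later stages (the binary discourse-usage classifier, linking by ranking, the multiway relation classifier, and the CRF for argument boundaries) are not shown.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy lexicon. Glosses: 虽然 "although", 但是 "but",
# 因为 "because", 所以 "therefore", 而且 "moreover".
LEXICON = ["虽然", "但是", "因为", "所以", "而且"]


def find_candidates(text: str, lexicon: Sequence[str] = LEXICON) -> List[Tuple[int, str]]:
    """Return (character offset, connective) for every lexicon match, in text order."""
    hits: List[Tuple[int, str]] = []
    for conn in lexicon:
        start = text.find(conn)
        while start != -1:
            hits.append((start, conn))
            start = text.find(conn, start + 1)  # keep scanning for repeats
    return sorted(hits)


# "Although it rained, we went."
print(find_candidates("虽然下雨，但是我们去了。"))  # [(0, '虽然'), (5, '但是')]
```

String matching overgenerates, because the same characters can occur in non-discourse usages; filtering these out is the job of the binary classifier in the pipeline, and paired components such as 虽然/但是 still have to be linked in the ranking stage.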

Learning from Small Sample Sets by Combining Unsupervised Meta-Training with CNNs


Title	Learning from Small Sample Sets by Combining Unsupervised Meta-Training with CNNs
Authors	Yu-Xiong Wang, Martial Hebert
Abstract	This work explores CNNs for the recognition of novel categories from few examples. Inspired by the transferability properties of CNNs, we introduce an additional unsupervised meta-training stage that exposes multiple top layer units to a large amount of unlabeled real-world images. By encouraging these units to learn diverse sets of low-density separators across the unlabeled data, we capture a more generic, richer description of the visual world, which decouples these units from ties to a specific set of categories. We propose an unsupervised margin maximization that jointly estimates compact high-density regions and infers low-density separators. The low-density separator (LDS) modules can be plugged into any or all of the top layers of a standard CNN architecture. The resulting CNNs significantly improve the performance in scene classification, fine-grained recognition, and action recognition with small training samples.
Tasks	Scene Classification, Temporal Action Localization
Published	2016-12-01
URL	http://papers.nips.cc/paper/6408-learning-from-small-sample-sets-by-combining-unsupervised-meta-training-with-cnns
PDF	http://papers.nips.cc/paper/6408-learning-from-small-sample-sets-by-combining-unsupervised-meta-training-with-cnns.pdf
PWC	https://paperswithcode.com/paper/learning-from-small-sample-sets-by-combining
Repo
Framework

MixKMeans: Clustering Question-Answer Archives


Title	MixKMeans: Clustering Question-Answer Archives
Authors	Deepak P
Abstract
Tasks	Question Answering, Semantic Textual Similarity
Published	2016-11-01
URL	https://www.aclweb.org/anthology/D16-1164/
PDF	https://www.aclweb.org/anthology/D16-1164
PWC	https://paperswithcode.com/paper/mixkmeans-clustering-question-answer-archives
Repo
Framework

Filter and Match Approach to Pair-wise Web URI Linking


Title	Filter and Match Approach to Pair-wise Web URI Linking
Authors	S. Shivashankar, Yitong Li, Afshin Rahimi
Abstract
Tasks	Document Classification, Machine Translation, Semantic Textual Similarity
Published	2016-12-01
URL	https://www.aclweb.org/anthology/U16-1022/
PDF	https://www.aclweb.org/anthology/U16-1022
PWC	https://paperswithcode.com/paper/filter-and-match-approach-to-pair-wise-web
Repo
Framework

Knowledge-Based Semantic Embedding for Machine Translation


Title	Knowledge-Based Semantic Embedding for Machine Translation
Authors	Chen Shi, Shujie Liu, Shuo Ren, Shi Feng, Mu Li, Ming Zhou, Xu Sun, Houfeng Wang
Abstract
Tasks	Machine Translation
Published	2016-08-01
URL	https://www.aclweb.org/anthology/P16-1212/
PDF	https://www.aclweb.org/anthology/P16-1212
PWC	https://paperswithcode.com/paper/knowledge-based-semantic-embedding-for
Repo
Framework

Answering Yes-No Questions by Penalty Scoring in History Subjects of University Entrance Examinations


Title	Answering Yes-No Questions by Penalty Scoring in History Subjects of University Entrance Examinations
Authors	Yoshinobu Kano
Abstract	Answering yes–no questions is more difficult than simply retrieving ranked search results. To answer yes–no questions, especially when the correct answer is no, one must find an objectionable keyword that makes the question's answer no. Existing systems, such as factoid-based ones, cannot answer yes–no questions very well because they do not sufficiently handle such objectionable keywords. We propose an algorithm that answers yes–no questions by assigning importance to objectionable keywords. Concretely, we propose a penalized scoring method that finds parts of documents that include such objectionable keywords and assigns them a lower score. We check the keyword distribution for each part of a document, such as a paragraph, and calculate the keyword density as a basic score. We then apply an objectionable keyword penalty when a keyword does not appear in a target part but appears in other parts of the document. Our algorithm is robust for open-domain problems because it requires no training. We achieved F1 scores 4.45 points higher than the best score of the NTCIR-10 RITE2 shared task, and also obtained the best score in the 2014 mock university examination challenge of the Todai Robot project.
Tasks	Question Answering
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-4413/
PDF	https://www.aclweb.org/anthology/W16-4413
PWC	https://paperswithcode.com/paper/answering-yes-no-questions-by-penalty-scoring
Repo
Framework
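The penalized scoring in the yes–no abstract combines a keyword-density base score with a penalty for keywords that are absent from the target part but present elsewhere in the document. The sketch below is one hypothetical way to combine the two; the subtraction, the penalty weight of 1.0, the function name `score_parts`, and the pre-tokenized example document are assumptions, since the abstract does not give the exact formula.

```python
from typing import List, Sequence, Set


def score_parts(
    parts: Sequence[Sequence[str]],
    keywords: Sequence[str],
    penalty: float = 1.0,  # hypothetical weight per missing keyword
) -> List[float]:
    """Score each part (e.g. a paragraph) of a tokenized document."""
    kw = set(keywords)
    scores: List[float] = []
    for i, part in enumerate(parts):
        # Base score: fraction of tokens in this part that are keywords.
        density = sum(tok in kw for tok in part) / len(part) if part else 0.0
        # Keywords that occur in any other part of the document.
        elsewhere: Set[str] = set()
        for j, other in enumerate(parts):
            if j != i:
                elsewhere |= kw.intersection(other)
        # Penalize keywords missing from this part but present elsewhere.
        missing_here = (kw - set(part)) & elsewhere
        scores.append(density - penalty * len(missing_here))
    return scores


doc = [["tokugawa", "edo", "1603"], ["tokugawa", "kyoto"]]
print(score_parts(doc, ["tokugawa", "edo"]))  # [0.6666666666666666, -0.5]
```

The second paragraph mentions only one keyword while the other keyword appears elsewhere in the document, so it is pushed below the first; because the scoring uses only counts, it needs no training, consistent with the abstract's claim.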

Learning Paraphrasing for Multiword Expressions


Title	Learning Paraphrasing for Multiword Expressions
Authors	Seid Muhie Yimam, Héctor Martínez Alonso, Martin Riedl, Chris Biemann
Abstract
Tasks	Learning-To-Rank, Machine Translation, Natural Language Inference, Text Simplification, Word Sense Disambiguation
Published	2016-08-01
URL	https://www.aclweb.org/anthology/W16-1801/
PDF	https://www.aclweb.org/anthology/W16-1801
PWC	https://paperswithcode.com/paper/learning-paraphrasing-for-multiword
Repo
Framework