Paper Group NANR 48
Enhancing Key-Value Memory Neural Networks for Knowledge Based Question Answering
Title | Enhancing Key-Value Memory Neural Networks for Knowledge Based Question Answering |
Authors | Kun Xu, Yuxuan Lai, Yansong Feng, Zhiguo Wang |
Abstract | Traditional Key-Value Memory Neural Networks (KV-MemNNs) have proven effective for shallow reasoning over a collection of documents in domain-specific Question Answering or Reading Comprehension tasks. However, extending KV-MemNNs to Knowledge Based Question Answering (KB-QA) is not trivial: the model must properly decompose a complex question into a sequence of queries against the memory and update the query representations to support multi-hop reasoning over the memory. In this paper, we propose a novel mechanism to enable conventional KV-MemNN models to perform interpretable reasoning for complex questions. To achieve this, we design a new query updating strategy to mask previously-addressed memory information from the query representations, and introduce a novel STOP strategy to avoid invalid or repeated memory reading without strong annotation signals. This also enables KV-MemNNs to produce structured queries and work in a semantic parsing fashion. Experimental results on benchmark datasets show that our solution, trained with question-answer pairs only, can provide conventional KV-MemNN models with better reasoning abilities on complex questions, and achieve state-of-the-art performance. |
Tasks | Question Answering, Reading Comprehension, Semantic Parsing |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/N19-1301/ |
https://www.aclweb.org/anthology/N19-1301 | |
PWC | https://paperswithcode.com/paper/enhancing-key-value-memory-neural-networks |
Repo | |
Framework | |
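The multi-hop reading with query masking and a STOP action described in this abstract can be illustrated with a short sketch. Everything below (the masking rule, the `stop_threshold` heuristic, the function name `kv_memnn_hops`) is an illustrative assumption, not the authors' implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def kv_memnn_hops(query, keys, values, max_hops=3, stop_threshold=0.9):
    """Multi-hop key-value memory reading with a query-masking update.

    query:  (d,)   current query representation
    keys:   (m, d) memory key embeddings
    values: (m, d) memory value embeddings
    """
    addressed = np.zeros(len(keys))                  # attention mass already spent per slot
    for hop in range(max_hops):
        scores = keys @ query
        probs = softmax(scores - 10.0 * addressed)   # down-weight previously read slots
        if hop > 0 and probs.max() > stop_threshold:
            break                                    # crude stand-in for a learned STOP action
        read = probs @ values                        # weighted read from the value memory
        # "mask" the addressed information: remove the read component from the query
        query = query - ((query @ read) / (read @ read + 1e-8)) * read
        addressed += probs
    return query, addressed
```

In the paper the STOP decision and query update are learned; here they are hard-coded only to show how masking prevents the model from re-reading the same memory slots across hops.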
Linked Open Treebanks. Interlinking Syntactically Annotated Corpora in the LiLa Knowledge Base of Linguistic Resources for Latin
Title | Linked Open Treebanks. Interlinking Syntactically Annotated Corpora in the LiLa Knowledge Base of Linguistic Resources for Latin |
Authors | Francesco Mambrini, Marco Passarotti |
Abstract | |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-7808/ |
https://www.aclweb.org/anthology/W19-7808 | |
PWC | https://paperswithcode.com/paper/linked-open-treebanks-interlinking |
Repo | |
Framework | |
Noun Phrases Rooted by Adjectives: A Dependency Grammar Analysis of the Big Mess Construction
Title | Noun Phrases Rooted by Adjectives: A Dependency Grammar Analysis of the Big Mess Construction |
Authors | Timothy Osborne |
Abstract | |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-7707/ |
https://www.aclweb.org/anthology/W19-7707 | |
PWC | https://paperswithcode.com/paper/noun-phrases-rooted-by-adjectives-a |
Repo | |
Framework | |
Exceptive constructions. A Dependency-based Analysis
Title | Exceptive constructions. A Dependency-based Analysis |
Authors | Mohamed Galal, Sylvain Kahane, Yomna Safwat |
Abstract | |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-7720/ |
https://www.aclweb.org/anthology/W19-7720 | |
PWC | https://paperswithcode.com/paper/exceptive-constructions-a-dependency-based |
Repo | |
Framework | |
Ellipsis in Chinese AMR Corpus
Title | Ellipsis in Chinese AMR Corpus |
Authors | Yihuan Liu, Bin Li, Peiyi Yan, Li Song, Weiguang Qu |
Abstract | Ellipsis is very common in language, and restoring the elided elements in a sentence is necessary for natural language processing. However, only a few corpora annotate ellipsis, which holds back the automatic detection and recovery of elided elements. This paper introduces the annotation of ellipsis in Chinese sentences using Abstract Meaning Representation (AMR), a graph-based representation that provides a good mechanism for restoring elided elements manually. We annotate 5,000 sentences selected from the Chinese TreeBank (CTB). We find that 54.98% of the sentences contain ellipsis; 92% of the elided elements are restored by copying the antecedents' concepts, and 12.9% are newly added concepts. In addition, we find that the elided element is a word or phrase in most cases, but sometimes only the head of a phrase or part of a phrase, which makes automatic recovery of ellipsis rather hard. |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-3310/ |
https://www.aclweb.org/anthology/W19-3310 | |
PWC | https://paperswithcode.com/paper/ellipsis-in-chinese-amr-corpus |
Repo | |
Framework | |
Visual Attention Consistency Under Image Transforms for Multi-Label Image Classification
Title | Visual Attention Consistency Under Image Transforms for Multi-Label Image Classification |
Authors | Hao Guo, Kang Zheng, Xiaochuan Fan, Hongkai Yu, Song Wang |
Abstract | Human visual perception shows good consistency for many multi-label image classification tasks under certain spatial transforms, such as scaling, rotation, flipping and translation. This has motivated the data augmentation strategy widely used in CNN classifier training – transformed images are included for training by assuming the same class labels as their original images. In this paper, we further propose the assumption of perceptual consistency of visual attention regions for classification under such transforms, i.e., the attention region for a classification follows the same transform if the input image is spatially transformed. While the attention regions of CNN classifiers can be derived as an attention heatmap in middle layers of the network, we find that their consistency under many transforms is not preserved. To address this problem, we propose a two-branch network with an original image and its transformed image as inputs and introduce a new attention consistency loss that measures the attention heatmap consistency between two branches. This new loss is then combined with multi-label image classification loss for network training. Experiments on three datasets verify the superiority of the proposed network by achieving new state-of-the-art classification performance. |
Tasks | Data Augmentation, Image Classification |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Guo_Visual_Attention_Consistency_Under_Image_Transforms_for_Multi-Label_Image_Classification_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Guo_Visual_Attention_Consistency_Under_Image_Transforms_for_Multi-Label_Image_Classification_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/visual-attention-consistency-under-image |
Repo | |
Framework | |
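A minimal sketch of the two-branch consistency idea from this abstract: transform the original branch's heatmap with the same spatial transform applied to the input, then penalize the difference to the transformed branch's heatmap. The exact distance function and the weighting `lam` are assumptions; the paper's loss may differ in detail.

```python
import torch
import torch.nn.functional as F

def attention_consistency_loss(heat_orig, heat_trans, transform):
    """Consistency term between the two branches' attention heatmaps.

    heat_orig, heat_trans: (B, C, H, W) class-wise attention heatmaps from the
    original-image branch and the transformed-image branch.
    transform: callable applying the SAME spatial transform used on the input,
               e.g. lambda t: torch.flip(t, dims=[-1]) for horizontal flipping.
    """
    # if attention is perceptually consistent, transforming the original
    # heatmap should reproduce the transformed image's heatmap
    return F.mse_loss(transform(heat_orig), heat_trans)

def training_loss(logits_orig, logits_trans, targets, heat_orig, heat_trans,
                  transform, lam=1.0):
    # multi-label classification loss on both branches plus the consistency term
    cls = (F.binary_cross_entropy_with_logits(logits_orig, targets)
           + F.binary_cross_entropy_with_logits(logits_trans, targets))
    return cls + lam * attention_consistency_loss(heat_orig, heat_trans, transform)
```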
Learning New Tricks From Old Dogs: Multi-Source Transfer Learning From Pre-Trained Networks
Title | Learning New Tricks From Old Dogs: Multi-Source Transfer Learning From Pre-Trained Networks |
Authors | Joshua Lee, Prasanna Sattigeri, Gregory Wornell |
Abstract | The advent of deep learning algorithms for mobile devices and sensors has led to a dramatic expansion in the availability and number of systems trained on a wide range of machine learning tasks, creating a host of opportunities and challenges in the realm of transfer learning. Currently, most transfer learning methods require some kind of control over the systems learned, either by enforcing constraints during the source training, or through the use of a joint optimization objective between tasks that requires all data be co-located for training. However, for practical, privacy, or other reasons, in a variety of applications we may have no control over the individual source task training, nor access to source training samples. Instead we only have access to features pre-trained on such data as the output of “black-boxes.” For such scenarios, we consider the multi-source learning problem of training a classifier using an ensemble of pre-trained neural networks for a set of classes that have not been observed by any of the source networks, and for which we have very few training samples. We show that by using these distributed networks as feature extractors, we can train an effective classifier in a computationally-efficient manner using tools from (nonlinear) maximal correlation analysis. In particular, we develop a method we refer to as maximal correlation weighting (MCW) to build the required target classifier from an appropriate weighting of the feature functions from the source networks. We illustrate the effectiveness of the resulting classifier on datasets derived from the CIFAR-100, Stanford Dogs, and Tiny ImageNet datasets, and, in addition, use the methodology to characterize the relative value of different source tasks in learning a target task. |
Tasks | Transfer Learning |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/8688-learning-new-tricks-from-old-dogs-multi-source-transfer-learning-from-pre-trained-networks |
http://papers.nips.cc/paper/8688-learning-new-tricks-from-old-dogs-multi-source-transfer-learning-from-pre-trained-networks.pdf | |
PWC | https://paperswithcode.com/paper/learning-new-tricks-from-old-dogs-multi |
Repo | |
Framework | |
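The general setup of this abstract (frozen source networks as feature extractors, combined by correlation-based weights, for a few-shot target task) can be sketched as follows. This is only a rough stand-in for the MCW method: the relevance score and the nearest-class-mean classifier below are simplified assumptions, not the paper's maximal-correlation machinery.

```python
import numpy as np

def source_relevance(feats, labels):
    """Crude relevance score for one source network: mean absolute correlation
    between its (standardized) feature dimensions and one-hot target labels.
    A stand-in for the maximal-correlation analysis used in the paper."""
    X = (feats - feats.mean(0)) / (feats.std(0) + 1e-8)        # (n, d)
    Y = np.eye(labels.max() + 1)[labels]                        # (n, k) one-hot
    Y = (Y - Y.mean(0)) / (Y.std(0) + 1e-8)
    return np.abs(X.T @ Y / len(X)).mean()

def weighted_ensemble_predict(train_sets, test_feats):
    """train_sets: list of (feats, labels), one per frozen source network.
    test_feats:  matching list of test-set features from the same networks."""
    k = max(lbl.max() for _, lbl in train_sets) + 1
    scores = 0.0
    for (ftr, lbl), fte in zip(train_sets, test_feats):
        w = source_relevance(ftr, lbl)
        means = np.stack([ftr[lbl == c].mean(0) for c in range(k)])   # few-shot class means
        dist = ((fte[:, None, :] - means[None, :, :]) ** 2).sum(-1)   # (n_test, k)
        scores = scores + w * (-dist)       # closer to a class mean => higher score
    return scores.argmax(1)
```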
Distilling weighted finite automata from arbitrary probabilistic models
Title | Distilling weighted finite automata from arbitrary probabilistic models |
Authors | Ananda Theertha Suresh, Brian Roark, Michael Riley, Vlad Schogol |
Abstract | Weighted finite automata (WFA) are often used to represent probabilistic models, such as n-gram language models, since they are efficient for recognition tasks in time and space. The probabilistic source to be represented as a WFA, however, may come in many forms. Given a generic probabilistic model over sequences, we propose an algorithm to approximate it as a weighted finite automaton such that the Kullback-Leibler divergence between the source model and the WFA target model is minimized. The proposed algorithm involves a counting step and a difference of convex optimization, both of which can be performed efficiently. We demonstrate the usefulness of our approach on some tasks including distilling n-gram models from neural models. |
Tasks | |
Published | 2019-09-01 |
URL | https://www.aclweb.org/anthology/W19-3112/ |
https://www.aclweb.org/anthology/W19-3112 | |
PWC | https://paperswithcode.com/paper/distilling-weighted-finite-automata-from |
Repo | |
Framework | |
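A toy sketch of the distillation setting in this abstract, under the assumption that the only access to the source model is the ability to sample sequences from it. The paper minimizes KL divergence with a counting step plus a difference-of-convex optimization; the sketch below keeps only the counting-by-sampling idea and builds a bigram-style transition table, which is what a simple WFA state structure would encode.

```python
from collections import defaultdict

def distill_bigram_wfa(sample_sequence, num_samples=10000):
    """Distill an arbitrary sequence model into a bigram-style weighted
    automaton by sampling and counting.  `sample_sequence` is assumed to
    return one token list drawn from the source model."""
    counts = defaultdict(lambda: defaultdict(float))
    for _ in range(num_samples):
        seq = ["<s>"] + sample_sequence() + ["</s>"]
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1.0
    # each state is a history token; arcs carry normalized transition weights
    wfa = {prev: {tok: c / sum(nexts.values()) for tok, c in nexts.items()}
           for prev, nexts in counts.items()}
    return wfa
```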
A Communication Efficient Stochastic Multi-Block Alternating Direction Method of Multipliers
Title | A Communication Efficient Stochastic Multi-Block Alternating Direction Method of Multipliers |
Authors | Hao Yu |
Abstract | The alternating direction method of multipliers (ADMM) has recently received tremendous interest for distributed large-scale optimization in machine learning, statistics, multi-agent networks and related applications. In this paper, we propose a new parallel multi-block stochastic ADMM for distributed stochastic optimization, where each node is only required to perform simple stochastic gradient descent updates. The proposed ADMM is fully parallel, can solve problems with arbitrary block structures, and has a convergence rate comparable to or better than existing state-of-the-art ADMM methods for stochastic optimization. Existing stochastic (or deterministic) ADMMs require each node to exchange its updated primal variables across nodes at each iteration and hence cause a significant amount of communication overhead. Existing ADMMs require roughly the same number of inter-node communication rounds as the number of in-node computation rounds. In contrast, the number of communication rounds required by our new ADMM is only the square root of the number of computation rounds. |
Tasks | Stochastic Optimization |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/9068-a-communication-efficient-stochastic-multi-block-alternating-direction-method-of-multipliers |
http://papers.nips.cc/paper/9068-a-communication-efficient-stochastic-multi-block-alternating-direction-method-of-multipliers.pdf | |
PWC | https://paperswithcode.com/paper/a-communication-efficient-stochastic-multi |
Repo | |
Framework | |
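For orientation, here is a minimal sketch of consensus-form parallel ADMM in which each node's local subproblem is solved inexactly with a few SGD steps, the style of per-node update the abstract describes. The communication-saving schedule that is the paper's contribution (roughly square-root-many communication rounds) is not reproduced; this sketch communicates every round.

```python
import numpy as np

def parallel_stochastic_admm(stoch_grads, dim, rounds=100, sgd_steps=5,
                             rho=1.0, lr=0.05):
    """Consensus ADMM with inexact (SGD) primal updates.
    `stoch_grads[i](x)` returns a stochastic gradient of node i's local loss."""
    n = len(stoch_grads)
    x = np.zeros((n, dim))          # local primal variables
    u = np.zeros((n, dim))          # scaled dual variables
    z = np.zeros(dim)               # global consensus variable
    for _ in range(rounds):
        for i in range(n):          # runs in parallel across nodes in practice
            for _ in range(sgd_steps):
                g = stoch_grads[i](x[i]) + rho * (x[i] - z + u[i])
                x[i] -= lr * g
        z = (x + u).mean(axis=0)    # gather/average step (communication)
        u += x - z                  # dual update
    return z
```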
A Contrastive Evaluation of Word Sense Disambiguation Systems for Finnish
Title | A Contrastive Evaluation of Word Sense Disambiguation Systems for Finnish |
Authors | Frankie Robertson |
Abstract | |
Tasks | Word Sense Disambiguation |
Published | 2019-01-01 |
URL | https://www.aclweb.org/anthology/W19-0304/ |
https://www.aclweb.org/anthology/W19-0304 | |
PWC | https://paperswithcode.com/paper/a-contrastive-evaluation-of-word-sense |
Repo | |
Framework | |
PolyU_CBS-CFA at the FinSBD Task: Sentence Boundary Detection of Financial Data with Domain Knowledge Enhancement and Bilingual Training
Title | PolyU_CBS-CFA at the FinSBD Task: Sentence Boundary Detection of Financial Data with Domain Knowledge Enhancement and Bilingual Training |
Authors | Mingyu Wan, Rong Xiang, Emmanuele Chersoni, Natalia Klyueva, Kathleen Ahrens, Bin Miao, David Broadstock, Jian Kang, Amos Yung, Chu-Ren Huang |
Abstract | |
Tasks | Boundary Detection |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-5521/ |
https://www.aclweb.org/anthology/W19-5521 | |
PWC | https://paperswithcode.com/paper/polyu_cbs-cfa-at-the-finsbd-task-sentence |
Repo | |
Framework | |
Learning to Reduce Dual-Level Discrepancy for Infrared-Visible Person Re-Identification
Title | Learning to Reduce Dual-Level Discrepancy for Infrared-Visible Person Re-Identification |
Authors | Zhixiang Wang, Zheng Wang, Yinqiang Zheng, Yung-Yu Chuang, Shin’ichi Satoh |
Abstract | Infrared-Visible person RE-IDentification (IV-REID) is an emerging task. Compared to conventional person re-identification (re-ID), IV-REID must handle an additional modality discrepancy originating from the different imaging processes of spectrum cameras, on top of the appearance discrepancy caused by the viewpoint changes, pose variations and deformations present in the conventional re-ID task. The co-existing discrepancies make IV-REID more difficult to solve. Previous methods attempt to reduce the appearance and modality discrepancies simultaneously using feature-level constraints. It is, however, difficult to eliminate the mixed discrepancies using feature-level constraints alone. To address the problem, this paper introduces a novel Dual-level Discrepancy Reduction Learning (D^2RL) scheme which handles the two discrepancies separately. To reduce the modality discrepancy, an image-level sub-network is trained to translate an infrared image into its visible counterpart and a visible image into its infrared version. With the image-level sub-network, we can unify the representations for images with different modalities. With the help of the unified multi-spectral images, a feature-level sub-network is trained to reduce the remaining appearance discrepancy through feature embedding. By cascading the two sub-networks and training them jointly, the dual-level reductions carry out their respective roles cooperatively. Extensive experiments demonstrate that the proposed approach outperforms the state-of-the-art methods. |
Tasks | Person Re-Identification |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Wang_Learning_to_Reduce_Dual-Level_Discrepancy_for_Infrared-Visible_Person_Re-Identification_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Wang_Learning_to_Reduce_Dual-Level_Discrepancy_for_Infrared-Visible_Person_Re-Identification_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-reduce-dual-level-discrepancy-for |
Repo | |
Framework | |
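The cascade structure described in this abstract (image-level modality translation feeding a feature-level embedding over unified multi-spectral images) can be sketched as a schematic module. All layer choices, channel counts and names below are placeholders, not the authors' D^2RL architecture.

```python
import torch
import torch.nn as nn

class DualLevelReID(nn.Module):
    """Schematic cascade: translate each image into the other spectrum, then
    embed the unified (visible + infrared) multi-spectral representation."""

    def __init__(self, feat_dim=256):
        super().__init__()
        # image-level sub-network: visible->infrared and infrared->visible translators
        self.v2i = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.Tanh())
        self.i2v = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.Tanh())
        # feature-level sub-network over the 6-channel multi-spectral input
        self.embed = nn.Sequential(
            nn.Conv2d(6, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat_dim))

    def forward(self, img, is_visible):
        # build a unified multi-spectral image: (visible, infrared) channel order
        other = self.v2i(img) if is_visible else self.i2v(img)
        unified = torch.cat([img, other] if is_visible else [other, img], dim=1)
        return self.embed(unified)
```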
Minimum Divergence vs. Maximum Margin: an Empirical Comparison on Seq2Seq Models
Title | Minimum Divergence vs. Maximum Margin: an Empirical Comparison on Seq2Seq Models |
Authors | Huan Zhang, Hai Zhao |
Abstract | Sequence to sequence (seq2seq) models have become a popular framework for neural sequence prediction. While traditional seq2seq models are trained by Maximum Likelihood Estimation (MLE), much recent work has made various attempts to optimize evaluation scores directly to solve the mismatch between training and evaluation, since model predictions are usually evaluated by a task specific evaluation metric like BLEU or ROUGE scores instead of perplexity. This paper for the first time puts this existing work into two categories, a) minimum divergence, and b) maximum margin. We introduce a new training criterion based on the analysis of existing work, and empirically compare models in the two categories. Our experimental results show that our new training criterion can usually work better than existing methods, on both the tasks of machine translation and sentence summarization. |
Tasks | Machine Translation |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=H1xD9sR5Fm |
https://openreview.net/pdf?id=H1xD9sR5Fm | |
PWC | https://paperswithcode.com/paper/minimum-divergence-vs-maximum-margin-an |
Repo | |
Framework | |
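For reference, one widely used member of the "minimum divergence" family the abstract refers to is minimum risk training, sketched below. This is not the paper's new criterion; the sampling scheme and the sharpness factor `alpha` are standard assumptions.

```python
import torch

def minimum_risk_loss(log_probs, rewards, alpha=1.0):
    """Minimum risk training for one source sentence.

    log_probs: (N,) model log-probabilities of N sampled candidate outputs
    rewards:   (N,) task metric for each candidate, e.g. sentence-level BLEU in [0, 1]
    """
    q = torch.softmax(alpha * log_probs, dim=0)   # renormalize over the candidate set
    return (q * (1.0 - rewards)).sum()            # expected cost under q
```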
Joint Representative Selection and Feature Learning: A Semi-Supervised Approach
Title | Joint Representative Selection and Feature Learning: A Semi-Supervised Approach |
Authors | Suchen Wang, Jingjing Meng, Junsong Yuan, Yap-Peng Tan |
Abstract | In this paper, we propose a semi-supervised approach for representative selection, which finds a small set of representatives that can well summarize a large data collection. Given labeled source data and big unlabeled target data, we aim to find representatives in the target data, which can not only represent and associate data points belonging to each labeled category, but also discover novel categories in the target data, if any. To leverage labeled source data, we guide representative selection from labeled source to unlabeled target. We propose a joint optimization framework which alternately optimizes (1) representative selection in the target data and (2) discriminative feature learning from both the source and the target for better representative selection. Experiments on image and video datasets demonstrate that our proposed approach not only finds better representatives, but also can discover novel categories in the target data that are not in the source. |
Tasks | |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Wang_Joint_Representative_Selection_and_Feature_Learning_A_Semi-Supervised_Approach_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Wang_Joint_Representative_Selection_and_Feature_Learning_A_Semi-Supervised_Approach_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/joint-representative-selection-and-feature |
Repo | |
Framework | |
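The selection step in this abstract can be illustrated with a standard greedy facility-location sketch over a fixed feature space. This covers only the selection half; the paper alternates selection with discriminative feature learning guided by labeled source data, which is omitted here.

```python
import numpy as np

def greedy_representatives(feats, k):
    """Pick k points maximizing the total cosine similarity of every point
    to its closest selected representative (greedy facility location)."""
    normed = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    sim = normed @ normed.T                          # (n, n) cosine similarities
    selected, best = [], np.full(len(feats), -1.0)   # cosine similarity >= -1
    for _ in range(k):
        gains = np.maximum(sim, best[None, :]).sum(axis=1)  # coverage if candidate added
        gains[selected] = -np.inf                    # never re-pick a representative
        i = int(gains.argmax())
        selected.append(i)
        best = np.maximum(best, sim[i])
    return selected
```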
Deep Contextualized Word Embeddings in Transition-Based and Graph-Based Dependency Parsing - A Tale of Two Parsers Revisited
Title | Deep Contextualized Word Embeddings in Transition-Based and Graph-Based Dependency Parsing - A Tale of Two Parsers Revisited |
Authors | Artur Kulmizev, Miryam de Lhoneux, Johannes Gontrum, Elena Fano, Joakim Nivre |
Abstract | Transition-based and graph-based dependency parsers have previously been shown to have complementary strengths and weaknesses: transition-based parsers exploit rich structural features but suffer from error propagation, while graph-based parsers benefit from global optimization but have restricted feature scope. In this paper, we show that, even though some details of the picture have changed after the switch to neural networks and continuous representations, the basic trade-off between rich features and global optimization remains essentially the same. Moreover, we show that deep contextualized word embeddings, which allow parsers to pack information about global sentence structure into local feature representations, benefit transition-based parsers more than graph-based parsers, making the two approaches virtually equivalent in terms of both accuracy and error profile. We argue that the reason is that these representations help prevent search errors and thereby allow transition-based parsers to better exploit their inherent strength of making accurate local decisions. We support this explanation by an error analysis of parsing experiments on 13 languages. |
Tasks | Dependency Parsing, Word Embeddings |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-1277/ |
https://www.aclweb.org/anthology/D19-1277 | |
PWC | https://paperswithcode.com/paper/deep-contextualized-word-embeddings-in-1 |
Repo | |
Framework | |
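To make concrete how contextualized word embeddings enter a graph-based parser of the kind compared in this paper, here is a minimal biaffine arc scorer over precomputed contextual token vectors (e.g. ELMo or BERT states). The dimensions, nonlinearities and class name are placeholder assumptions following the common biaffine recipe, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class BiaffineArcScorer(nn.Module):
    """Scores every (dependent, head) token pair from contextualized vectors."""

    def __init__(self, ctx_dim=768, arc_dim=256):
        super().__init__()
        self.head = nn.Linear(ctx_dim, arc_dim)   # token viewed as a potential head
        self.dep = nn.Linear(ctx_dim, arc_dim)    # token viewed as a potential dependent
        self.W = nn.Parameter(torch.zeros(arc_dim, arc_dim))
        self.bias = nn.Linear(arc_dim, 1, bias=False)

    def forward(self, ctx):                        # ctx: (n_tokens, ctx_dim)
        h = torch.relu(self.head(ctx))
        d = torch.relu(self.dep(ctx))
        # score[i, j]: plausibility of token j as the head of token i
        return d @ self.W @ h.T + self.bias(h).T
```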