Paper Group NANR 48
Enhancing Key-Value Memory Neural Networks for Knowledge Based Question Answering
Title | Enhancing Key-Value Memory Neural Networks for Knowledge Based Question Answering |
Authors | Kun Xu, Yuxuan Lai, Yansong Feng, Zhiguo Wang |
Abstract | Traditional Key-Value Memory Neural Networks (KV-MemNNs) have proven effective for shallow reasoning over a collection of documents in domain-specific Question Answering or Reading Comprehension tasks. However, extending KV-MemNNs to Knowledge Based Question Answering (KB-QA) is not trivial: the model must properly decompose a complex question into a sequence of queries against the memory and update the query representations to support multi-hop reasoning over the memory. In this paper, we propose a novel mechanism to enable conventional KV-MemNN models to perform interpretable reasoning for complex questions. To achieve this, we design a new query updating strategy to mask previously-addressed memory information from the query representations, and introduce a novel STOP strategy to avoid invalid or repeated memory reading without strong annotation signals. This also enables KV-MemNNs to produce structured queries and work in a semantic parsing fashion. Experimental results on benchmark datasets show that our solution, trained with question-answer pairs only, can provide conventional KV-MemNN models with better reasoning abilities on complex questions, and achieve state-of-the-art performance. |
Tasks | Question Answering, Reading Comprehension, Semantic Parsing |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/N19-1301/ |
https://www.aclweb.org/anthology/N19-1301 | |
PWC | https://paperswithcode.com/paper/enhancing-key-value-memory-neural-networks |
Repo | |
Framework | |
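The multi-hop reading with query masking and a STOP action described in this abstract can be illustrated with a short sketch. Everything below (the masking rule, the `stop_threshold` heuristic, the function name `kv_memnn_hops`) is an illustrative assumption, not the authors' implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def kv_memnn_hops(query, keys, values, max_hops=3, stop_threshold=0.9):
    """Multi-hop key-value memory reading with a query-masking update.

    query:  (d,)   current query representation
    keys:   (m, d) memory key embeddings
    values: (m, d) memory value embeddings
    """
    addressed = np.zeros(len(keys))                  # attention mass already spent per slot
    for hop in range(max_hops):
        scores = keys @ query
        probs = softmax(scores - 10.0 * addressed)   # down-weight previously read slots
        if hop > 0 and probs.max() > stop_threshold:
            break                                    # crude stand-in for a learned STOP action
        read = probs @ values                        # weighted read from the value memory
        # "mask" the addressed information: remove the read component from the query
        query = query - ((query @ read) / (read @ read + 1e-8)) * read
        addressed += probs
    return query, addressed
```

In the paper the STOP decision and query update are learned; here they are hard-coded only to show how masking prevents the model from re-reading the same memory slots across hops.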
Linked Open Treebanks. Interlinking Syntactically Annotated Corpora in the LiLa Knowledge Base of Linguistic Resources for Latin
Title | Linked Open Treebanks. Interlinking Syntactically Annotated Corpora in the LiLa Knowledge Base of Linguistic Resources for Latin |
Authors | Francesco Mambrini, Marco Passarotti |
Abstract | |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-7808/ |
https://www.aclweb.org/anthology/W19-7808 | |
PWC | https://paperswithcode.com/paper/linked-open-treebanks-interlinking |
Repo | |
Framework | |
Noun Phrases Rooted by Adjectives: A Dependency Grammar Analysis of the Big Mess Construction
Title | Noun Phrases Rooted by Adjectives: A Dependency Grammar Analysis of the Big Mess Construction |
Authors | Timothy Osborne |
Abstract | |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-7707/ |
https://www.aclweb.org/anthology/W19-7707 | |
PWC | https://paperswithcode.com/paper/noun-phrases-rooted-by-adjectives-a |
Repo | |
Framework | |
Exceptive constructions. A Dependency-based Analysis
Title | Exceptive constructions. A Dependency-based Analysis |
Authors | Mohamed Galal, Sylvain Kahane, Yomna Safwat |
Abstract | |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-7720/ |
https://www.aclweb.org/anthology/W19-7720 | |
PWC | https://paperswithcode.com/paper/exceptive-constructions-a-dependency-based |
Repo | |
Framework | |
Ellipsis in Chinese AMR Corpus
Title | Ellipsis in Chinese AMR Corpus |
Authors | Yihuan Liu, Bin Li, Peiyi Yan, Li Song, Weiguang Qu |
Abstract | Ellipsis is very common in language, and restoring the elided elements in a sentence is necessary for natural language processing. However, only a few corpora annotate ellipsis, which holds back the automatic detection and recovery of elided elements. This paper introduces the annotation of ellipsis in Chinese sentences using Abstract Meaning Representation (AMR), a graph-based representation that provides a good mechanism for restoring elided elements manually. We annotate 5,000 sentences selected from the Chinese TreeBank (CTB). We find that 54.98% of the sentences contain ellipsis; 92% of the elided elements are restored by copying the antecedents' concepts, and 12.9% are newly added concepts. In addition, we find that the elided element is a word or phrase in most cases, but sometimes only the head of a phrase or part of a phrase, which makes automatic recovery of ellipsis rather hard. |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-3310/ |
https://www.aclweb.org/anthology/W19-3310 | |
PWC | https://paperswithcode.com/paper/ellipsis-in-chinese-amr-corpus |
Repo | |
Framework | |
Visual Attention Consistency Under Image Transforms for Multi-Label Image Classification
Title | Visual Attention Consistency Under Image Transforms for Multi-Label Image Classification |
Authors | Hao Guo, Kang Zheng, Xiaochuan Fan, Hongkai Yu, Song Wang |
Abstract | Human visual perception shows good consistency for many multi-label image classification tasks under certain spatial transforms, such as scaling, rotation, flipping and translation. This has motivated the data augmentation strategy widely used in CNN classifier training – transformed images are included for training by assuming the same class labels as their original images. In this paper, we further propose the assumption of perceptual consistency of visual attention regions for classification under such transforms, i.e., the attention region for a classification follows the same transform if the input image is spatially transformed. While the attention regions of CNN classifiers can be derived as an attention heatmap in middle layers of the network, we find that their consistency under many transforms is not preserved. To address this problem, we propose a two-branch network with an original image and its transformed image as inputs and introduce a new attention consistency loss that measures the attention heatmap consistency between two branches. This new loss is then combined with multi-label image classification loss for network training. Experiments on three datasets verify the superiority of the proposed network by achieving new state-of-the-art classification performance. |
Tasks | Data Augmentation, Image Classification |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Guo_Visual_Attention_Consistency_Under_Image_Transforms_for_Multi-Label_Image_Classification_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Guo_Visual_Attention_Consistency_Under_Image_Transforms_for_Multi-Label_Image_Classification_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/visual-attention-consistency-under-image |
Repo | |
Framework | |
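A minimal sketch of the two-branch consistency idea from this abstract: transform the original branch's heatmap with the same spatial transform applied to the input, then penalize the difference to the transformed branch's heatmap. The exact distance function and the weighting `lam` are assumptions; the paper's loss may differ in detail.

```python
import torch
import torch.nn.functional as F

def attention_consistency_loss(heat_orig, heat_trans, transform):
    """Consistency term between the two branches' attention heatmaps.

    heat_orig, heat_trans: (B, C, H, W) class-wise attention heatmaps from the
    original-image branch and the transformed-image branch.
    transform: callable applying the SAME spatial transform used on the input,
               e.g. lambda t: torch.flip(t, dims=[-1]) for horizontal flipping.
    """
    # if attention is perceptually consistent, transforming the original
    # heatmap should reproduce the transformed image's heatmap
    return F.mse_loss(transform(heat_orig), heat_trans)

def training_loss(logits_orig, logits_trans, targets, heat_orig, heat_trans,
                  transform, lam=1.0):
    # multi-label classification loss on both branches plus the consistency term
    cls = (F.binary_cross_entropy_with_logits(logits_orig, targets)
           + F.binary_cross_entropy_with_logits(logits_trans, targets))
    return cls + lam * attention_consistency_loss(heat_orig, heat_trans, transform)
```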
Learning New Tricks From Old Dogs: Multi-Source Transfer Learning From Pre-Trained Networks
Title | Learning New Tricks From Old Dogs: Multi-Source Transfer Learning From Pre-Trained Networks |
Authors | Joshua Lee, Prasanna Sattigeri, Gregory Wornell |
Abstract | The advent of deep learning algorithms for mobile devices and sensors has led to a dramatic expansion in the availability and number of systems trained on a wide range of machine learning tasks, creating a host of opportunities and challenges in the realm of transfer learning. Currently, most transfer learning methods require some kind of control over the systems learned, either by enforcing constraints during the source training, or through the use of a joint optimization objective between tasks that requires all data be co-located for training. However, for practical, privacy, or other reasons, in a variety of applications we may have no control over the individual source task training, nor access to source training samples. Instead we only have access to features pre-trained on such data as the output of “black-boxes.” For such scenarios, we consider the multi-source learning problem of training a classifier using an ensemble of pre-trained neural networks for a set of classes that have not been observed by any of the source networks, and for which we have very few training samples. We show that by using these distributed networks as feature extractors, we can train an effective classifier in a computationally-efficient manner using tools from (nonlinear) maximal correlation analysis. In particular, we develop a method we refer to as maximal correlation weighting (MCW) to build the required target classifier from an appropriate weighting of the feature functions from the source networks. We illustrate the effectiveness of the resulting classifier on datasets derived from the CIFAR-100, Stanford Dogs, and Tiny ImageNet datasets, and, in addition, use the methodology to characterize the relative value of different source tasks in learning a target task. |
Tasks | Transfer Learning |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/8688-learning-new-tricks-from-old-dogs-multi-source-transfer-learning-from-pre-trained-networks |
http://papers.nips.cc/paper/8688-learning-new-tricks-from-old-dogs-multi-source-transfer-learning-from-pre-trained-networks.pdf | |
PWC | https://paperswithcode.com/paper/learning-new-tricks-from-old-dogs-multi |
Repo | |
Framework | |
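The general setup of this abstract (frozen source networks as feature extractors, combined by correlation-based weights, for a few-shot target task) can be sketched as follows. This is only a rough stand-in for the MCW method: the relevance score and the nearest-class-mean classifier below are simplified assumptions, not the paper's maximal-correlation machinery.

```python
import numpy as np

def source_relevance(feats, labels):
    """Crude relevance score for one source network: mean absolute correlation
    between its (standardized) feature dimensions and one-hot target labels.
    A stand-in for the maximal-correlation analysis used in the paper."""
    X = (feats - feats.mean(0)) / (feats.std(0) + 1e-8)        # (n, d)
    Y = np.eye(labels.max() + 1)[labels]                        # (n, k) one-hot
    Y = (Y - Y.mean(0)) / (Y.std(0) + 1e-8)
    return np.abs(X.T @ Y / len(X)).mean()

def weighted_ensemble_predict(train_sets, test_feats):
    """train_sets: list of (feats, labels), one per frozen source network.
    test_feats:  matching list of test-set features from the same networks."""
    k = max(lbl.max() for _, lbl in train_sets) + 1
    scores = 0.0
    for (ftr, lbl), fte in zip(train_sets, test_feats):
        w = source_relevance(ftr, lbl)
        means = np.stack([ftr[lbl == c].mean(0) for c in range(k)])   # few-shot class means
        dist = ((fte[:, None, :] - means[None, :, :]) ** 2).sum(-1)   # (n_test, k)
        scores = scores + w * (-dist)       # closer to a class mean => higher score
    return scores.argmax(1)
```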
Distilling weighted finite automata from arbitrary probabilistic models
Title | Distilling weighted finite automata from arbitrary probabilistic models |
Authors | Ananda Theertha Suresh, Brian Roark, Michael Riley, Vlad Schogol |
Abstract | Weighted finite automata (WFA) are often used to represent probabilistic models, such as n-gram language models, since they are efficient for recognition tasks in time and space. The probabilistic source to be represented as a WFA, however, may come in many forms. Given a generic probabilistic model over sequences, we propose an algorithm to approximate it as a weighted finite automaton such that the Kullback-Leibler divergence between the source model and the WFA target model is minimized. The proposed algorithm involves a counting step and a difference of convex optimization, both of which can be performed efficiently. We demonstrate the usefulness of our approach on some tasks including distilling n-gram models from neural models. |
Tasks | |
Published | 2019-09-01 |
URL | https://www.aclweb.org/anthology/W19-3112/ |
https://www.aclweb.org/anthology/W19-3112 | |
PWC | https://paperswithcode.com/paper/distilling-weighted-finite-automata-from |
Repo | |
Framework | |
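A toy sketch of the distillation setting in this abstract, under the assumption that the only access to the source model is the ability to sample sequences from it. The paper minimizes KL divergence with a counting step plus a difference-of-convex optimization; the sketch below keeps only the counting-by-sampling idea and builds a bigram-style transition table, which is what a simple WFA state structure would encode.

```python
from collections import defaultdict

def distill_bigram_wfa(sample_sequence, num_samples=10000):
    """Distill an arbitrary sequence model into a bigram-style weighted
    automaton by sampling and counting.  `sample_sequence` is assumed to
    return one token list drawn from the source model."""
    counts = defaultdict(lambda: defaultdict(float))
    for _ in range(num_samples):
        seq = ["<s>"] + sample_sequence() + ["</s>"]
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1.0
    # each state is a history token; arcs carry normalized transition weights
    wfa = {prev: {tok: c / sum(nexts.values()) for tok, c in nexts.items()}
           for prev, nexts in counts.items()}
    return wfa
```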
A Communication Efficient Stochastic Multi-Block Alternating Direction Method of Multipliers
Title | A Communication Efficient Stochastic Multi-Block Alternating Direction Method of Multipliers |
Authors | Hao Yu |
Abstract | The alternating direction method of multipliers (ADMM) has recently received tremendous interest for distributed large-scale optimization in machine learning, statistics, multi-agent networks and related applications. In this paper, we propose a new parallel multi-block stochastic ADMM for distributed stochastic optimization, where each node is only required to perform simple stochastic gradient descent updates. The proposed ADMM is fully parallel, can solve problems with arbitrary block structures, and has a convergence rate comparable to or better than existing state-of-the-art ADMM methods for stochastic optimization. Existing stochastic (or deterministic) ADMMs require each node to exchange its updated primal variables across nodes at each iteration and hence cause a significant amount of communication overhead. Existing ADMMs require roughly the same number of inter-node communication rounds as the number of in-node computation rounds. In contrast, the number of communication rounds required by our new ADMM is only the square root of the number of computation rounds. |
Tasks | Stochastic Optimization |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/9068-a-communication-efficient-stochastic-multi-block-alternating-direction-method-of-multipliers |
http://papers.nips.cc/paper/9068-a-communication-efficient-stochastic-multi-block-alternating-direction-method-of-multipliers.pdf | |
PWC | https://paperswithcode.com/paper/a-communication-efficient-stochastic-multi |
Repo | |
Framework | |
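For orientation, here is a minimal sketch of consensus-form parallel ADMM in which each node's local subproblem is solved inexactly with a few SGD steps, the style of per-node update the abstract describes. The communication-saving schedule that is the paper's contribution (roughly square-root-many communication rounds) is not reproduced; this sketch communicates every round.

```python
import numpy as np

def parallel_stochastic_admm(stoch_grads, dim, rounds=100, sgd_steps=5,
                             rho=1.0, lr=0.05):
    """Consensus ADMM with inexact (SGD) primal updates.
    `stoch_grads[i](x)` returns a stochastic gradient of node i's local loss."""
    n = len(stoch_grads)
    x = np.zeros((n, dim))          # local primal variables
    u = np.zeros((n, dim))          # scaled dual variables
    z = np.zeros(dim)               # global consensus variable
    for _ in range(rounds):
        for i in range(n):          # runs in parallel across nodes in practice
            for _ in range(sgd_steps):
                g = stoch_grads[i](x[i]) + rho * (x[i] - z + u[i])
                x[i] -= lr * g
        z = (x + u).mean(axis=0)    # gather/average step (communication)
        u += x - z                  # dual update
    return z
```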
A Contrastive Evaluation of Word Sense Disambiguation Systems for Finnish
Title | A Contrastive Evaluation of Word Sense Disambiguation Systems for Finnish |
Authors | Frankie Robertson |
Abstract | |
Tasks | Word Sense Disambiguation |
Published | 2019-01-01 |
URL | https://www.aclweb.org/anthology/W19-0304/ |
https://www.aclweb.org/anthology/W19-0304 | |
PWC | https://paperswithcode.com/paper/a-contrastive-evaluation-of-word-sense |
Repo | |
Framework | |
PolyU_CBS-CFA at the FinSBD Task: Sentence Boundary Detection of Financial Data with Domain Knowledge Enhancement and Bilingual Training
Title | PolyU_CBS-CFA at the FinSBD Task: Sentence Boundary Detection of Financial Data with Domain Knowledge Enhancement and Bilingual Training |
Authors | Mingyu Wan, Rong Xiang, Emmanuele Chersoni, Natalia Klyueva, Kathleen Ahrens, Bin Miao, David Broadstock, Jian Kang, Amos Yung, Chu-Ren Huang |
Abstract | |
Tasks | Boundary Detection |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-5521/ |
https://www.aclweb.org/anthology/W19-5521 | |
PWC | https://paperswithcode.com/paper/polyu_cbs-cfa-at-the-finsbd-task-sentence |
Repo | |
Framework | |
Learning to Reduce Dual-Level Discrepancy for Infrared-Visible Person Re-Identification
Title | Learning to Reduce Dual-Level Discrepancy for Infrared-Visible Person Re-Identification |
Authors | Zhixiang Wang, Zheng Wang, Yinqiang Zheng, Yung-Yu Chuang, Shin’ichi Satoh |
Abstract | Infrared-Visible person RE-IDentification (IV-REID) is an emerging task. Compared to conventional person re-identification (re-ID), IV-REID must handle an additional modality discrepancy originating from the different imaging processes of spectrum cameras, on top of the appearance discrepancy caused by the viewpoint changes, pose variations and deformations present in the conventional re-ID task. The co-existing discrepancies make IV-REID more difficult to solve. Previous methods attempt to reduce the appearance and modality discrepancies simultaneously using feature-level constraints. It is, however, difficult to eliminate the mixed discrepancies using feature-level constraints alone. To address the problem, this paper introduces a novel Dual-level Discrepancy Reduction Learning (D^2RL) scheme which handles the two discrepancies separately. To reduce the modality discrepancy, an image-level sub-network is trained to translate an infrared image into its visible counterpart and a visible image into its infrared version. With the image-level sub-network, we can unify the representations for images with different modalities. With the help of the unified multi-spectral images, a feature-level sub-network is trained to reduce the remaining appearance discrepancy through feature embedding. By cascading the two sub-networks and training them jointly, the dual-level reductions carry out their respective roles cooperatively. Extensive experiments demonstrate that the proposed approach outperforms the state-of-the-art methods. |
Tasks | Person Re-Identification |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Wang_Learning_to_Reduce_Dual-Level_Discrepancy_for_Infrared-Visible_Person_Re-Identification_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Wang_Learning_to_Reduce_Dual-Level_Discrepancy_for_Infrared-Visible_Person_Re-Identification_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-reduce-dual-level-discrepancy-for |
Repo | |
Framework | |
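The cascade structure described in this abstract (image-level modality translation feeding a feature-level embedding over unified multi-spectral images) can be sketched as a schematic module. All layer choices, channel counts and names below are placeholders, not the authors' D^2RL architecture.

```python
import torch
import torch.nn as nn

class DualLevelReID(nn.Module):
    """Schematic cascade: translate each image into the other spectrum, then
    embed the unified (visible + infrared) multi-spectral representation."""

    def __init__(self, feat_dim=256):
        super().__init__()
        # image-level sub-network: visible->infrared and infrared->visible translators
        self.v2i = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.Tanh())
        self.i2v = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.Tanh())
        # feature-level sub-network over the 6-channel multi-spectral input
        self.embed = nn.Sequential(
            nn.Conv2d(6, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat_dim))

    def forward(self, img, is_visible):
        # build a unified multi-spectral image: (visible, infrared) channel order
        other = self.v2i(img) if is_visible else self.i2v(img)
        unified = torch.cat([img, other] if is_visible else [other, img], dim=1)
        return self.embed(unified)
```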
Minimum Divergence vs. Maximum Margin: an Empirical Comparison on Seq2Seq Models
Title | Minimum Divergence vs. Maximum Margin: an Empirical Comparison on Seq2Seq Models |
Authors | Huan Zhang, Hai Zhao |
Abstract | Sequence to sequence (seq2seq) models have become a popular framework for neural sequence prediction. While traditional seq2seq models are trained by Maximum Likelihood Estimation (MLE), much recent work has made various attempts to optimize evaluation scores directly to solve the mismatch between training and evaluation, since model predictions are usually evaluated by a task specific evaluation metric like BLEU or ROUGE scores instead of perplexity. This paper for the first time puts this existing work into two categories, a) minimum divergence, and b) maximum margin. We introduce a new training criterion based on the analysis of existing work, and empirically compare models in the two categories. Our experimental results show that our new training criterion can usually work better than existing methods, on both the tasks of machine translation and sentence summarization. |
Tasks | Machine Translation |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=H1xD9sR5Fm |
https://openreview.net/pdf?id=H1xD9sR5Fm | |
PWC | https://paperswithcode.com/paper/minimum-divergence-vs-maximum-margin-an |
Repo | |
Framework | |
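For reference, one widely used member of the "minimum divergence" family the abstract refers to is minimum risk training, sketched below. This is not the paper's new criterion; the sampling scheme and the sharpness factor `alpha` are standard assumptions.

```python
import torch

def minimum_risk_loss(log_probs, rewards, alpha=1.0):
    """Minimum risk training for one source sentence.

    log_probs: (N,) model log-probabilities of N sampled candidate outputs
    rewards:   (N,) task metric for each candidate, e.g. sentence-level BLEU in [0, 1]
    """
    q = torch.softmax(alpha * log_probs, dim=0)   # renormalize over the candidate set
    return (q * (1.0 - rewards)).sum()            # expected cost under q
```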
Joint Representative Selection and Feature Learning: A Semi-Supervised Approach
Title | Joint Representative Selection and Feature Learning: A Semi-Supervised Approach |
Authors | Suchen Wang, Jingjing Meng, Junsong Yuan, Yap-Peng Tan |
Abstract | In this paper, we propose a semi-supervised approach for representative selection, which finds a small set of representatives that can well summarize a large data collection. Given labeled source data and big unlabeled target data, we aim to find representatives in the target data, which can not only represent and associate data points belonging to each labeled category, but also discover novel categories in the target data, if any. To leverage labeled source data, we guide representative selection from labeled source to unlabeled target. We propose a joint optimization framework which alternately optimizes (1) representative selection in the target data and (2) discriminative feature learning from both the source and the target for better representative selection. Experiments on image and video datasets demonstrate that our proposed approach not only finds better representatives, but also can discover novel categories in the target data that are not in the source. |
Tasks | |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Wang_Joint_Representative_Selection_and_Feature_Learning_A_Semi-Supervised_Approach_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Wang_Joint_Representative_Selection_and_Feature_Learning_A_Semi-Supervised_Approach_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/joint-representative-selection-and-feature |
Repo | |
Framework | |
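The selection step in this abstract can be illustrated with a standard greedy facility-location sketch over a fixed feature space. This covers only the selection half; the paper alternates selection with discriminative feature learning guided by labeled source data, which is omitted here.

```python
import numpy as np

def greedy_representatives(feats, k):
    """Pick k points maximizing the total cosine similarity of every point
    to its closest selected representative (greedy facility location)."""
    normed = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    sim = normed @ normed.T                          # (n, n) cosine similarities
    selected, best = [], np.full(len(feats), -1.0)   # cosine similarity >= -1
    for _ in range(k):
        gains = np.maximum(sim, best[None, :]).sum(axis=1)  # coverage if candidate added
        gains[selected] = -np.inf                    # never re-pick a representative
        i = int(gains.argmax())
        selected.append(i)
        best = np.maximum(best, sim[i])
    return selected
```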
Deep Contextualized Word Embeddings in Transition-Based and Graph-Based Dependency Parsing - A Tale of Two Parsers Revisited
Title | Deep Contextualized Word Embeddings in Transition-Based and Graph-Based Dependency Parsing - A Tale of Two Parsers Revisited |
Authors | Artur Kulmizev, Miryam de Lhoneux, Johannes Gontrum, Elena Fano, Joakim Nivre |
Abstract | Transition-based and graph-based dependency parsers have previously been shown to have complementary strengths and weaknesses: transition-based parsers exploit rich structural features but suffer from error propagation, while graph-based parsers benefit from global optimization but have restricted feature scope. In this paper, we show that, even though some details of the picture have changed after the switch to neural networks and continuous representations, the basic trade-off between rich features and global optimization remains essentially the same. Moreover, we show that deep contextualized word embeddings, which allow parsers to pack information about global sentence structure into local feature representations, benefit transition-based parsers more than graph-based parsers, making the two approaches virtually equivalent in terms of both accuracy and error profile. We argue that the reason is that these representations help prevent search errors and thereby allow transition-based parsers to better exploit their inherent strength of making accurate local decisions. We support this explanation by an error analysis of parsing experiments on 13 languages. |
Tasks | Dependency Parsing, Word Embeddings |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-1277/ |
https://www.aclweb.org/anthology/D19-1277 | |
PWC | https://paperswithcode.com/paper/deep-contextualized-word-embeddings-in-1 |
Repo | |
Framework | |
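To make concrete how contextualized word embeddings enter a graph-based parser of the kind compared in this paper, here is a minimal biaffine arc scorer over precomputed contextual token vectors (e.g. ELMo or BERT states). The dimensions, nonlinearities and class name are placeholder assumptions following the common biaffine recipe, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class BiaffineArcScorer(nn.Module):
    """Scores every (dependent, head) token pair from contextualized vectors."""

    def __init__(self, ctx_dim=768, arc_dim=256):
        super().__init__()
        self.head = nn.Linear(ctx_dim, arc_dim)   # token viewed as a potential head
        self.dep = nn.Linear(ctx_dim, arc_dim)    # token viewed as a potential dependent
        self.W = nn.Parameter(torch.zeros(arc_dim, arc_dim))
        self.bias = nn.Linear(arc_dim, 1, bias=False)

    def forward(self, ctx):                        # ctx: (n_tokens, ctx_dim)
        h = torch.relu(self.head(ctx))
        d = torch.relu(self.dep(ctx))
        # score[i, j]: plausibility of token j as the head of token i
        return d @ self.W @ h.T + self.bias(h).T
```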