Paper Group NAWR 12
Cross-Sentence Grammatical Error Correction. Specific and Shared Causal Relation Modeling and Mechanism-Based Clustering. Consistency-based anomaly detection with adaptive multiple-hypotheses predictions. Multi-Relational Script Learning for Discourse Relations. Error Correcting Output Codes Improve Probability Estimation and Adversarial Robustness of Deep Neural Networks. Accelerating Rescaled Gradient Descent: Fast Optimization of Smooth Functions. Proactive Human-Machine Conversation with Explicit Conversation Goal. Complex Word Identification as a Sequence Labelling Task. A Simple Baseline for Audio-Visual Scene-Aware Dialog. DM2C: Deep Mixed-Modal Clustering. Occlusion-Net: 2D/3D Occluded Keypoint Localization Using Graph Networks. Learning Rich Features at High-Speed for Single-Shot Object Detection. Non-local Attention Learning on Large Heterogeneous Information Networks. Graph-Based Meaning Representations: Design and Processing. Input Similarity from the Neural Network Perspective.
Cross-Sentence Grammatical Error Correction
Title | Cross-Sentence Grammatical Error Correction |
Authors | Shamil Chollampatt, Weiqi Wang, Hwee Tou Ng |
Abstract | Automatic grammatical error correction (GEC) research has made remarkable progress in the past decade. However, all existing approaches to GEC correct errors by considering a single sentence alone and ignoring crucial cross-sentence context. Some errors can only be corrected reliably using cross-sentence context and models can also benefit from the additional contextual information in correcting other errors. In this paper, we address this serious limitation of existing approaches and improve strong neural encoder-decoder models by appropriately modeling wider contexts. We employ an auxiliary encoder that encodes previous sentences and incorporate the encoding in the decoder via attention and gating mechanisms. Our approach results in statistically significant improvements in overall GEC performance over strong baselines across multiple test sets. Analysis of our cross-sentence GEC model on a synthetic dataset shows high performance in verb tense corrections that require cross-sentence context. |
Tasks | Grammatical Error Correction |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-1042/ |
PWC | https://paperswithcode.com/paper/cross-sentence-grammatical-error-correction |
Repo | https://github.com/nusnlp/crosentgec |
Framework | pytorch |
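The core architectural move, an auxiliary encoder over preceding sentences whose output enters the decoder through attention and a gate, can be sketched as follows. This is a minimal PyTorch illustration of the fusion step only, not the authors' implementation (see the repo above); the module and argument names are ours.

```python
import torch
import torch.nn as nn

class GatedContextAttention(nn.Module):
    """Sketch: fuse decoder states with an auxiliary encoding of
    previous sentences via attention plus a learned gate."""
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, dec_states, ctx_states):
        # dec_states: (B, T, D) decoder states for the current sentence
        # ctx_states: (B, S, D) auxiliary encoding of preceding sentences
        ctx, _ = self.attn(dec_states, ctx_states, ctx_states)
        g = torch.sigmoid(self.gate(torch.cat([dec_states, ctx], dim=-1)))
        return dec_states + g * ctx  # gate decides how much context to admit
```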
Specific and Shared Causal Relation Modeling and Mechanism-Based Clustering
Title | Specific and Shared Causal Relation Modeling and Mechanism-Based Clustering |
Authors | Biwei Huang, Kun Zhang, Pengtao Xie, Mingming Gong, Eric P. Xing, Clark Glymour |
Abstract | State-of-the-art approaches to causal discovery usually assume a fixed underlying causal model. However, it is often the case that causal models vary across domains or subjects, due to possibly omitted factors that affect the quantitative causal effects. As a typical example, causal connectivity in the brain network has been reported to vary across individuals, with significant differences across groups of people, such as autistics and typical controls. In this paper, we develop a unified framework for causal discovery and mechanism-based group identification. In particular, we propose a specific and shared causal model (SSCM), which takes into account the variabilities of causal relations across individuals/groups and leverages their commonalities to achieve statistically reliable estimation. The learned SSCM gives the specific causal knowledge for each individual as well as the general trend over the population. In addition, the estimated model directly provides the group information of each individual. Experimental results on synthetic and real-world data demonstrate the efficacy of the proposed method. |
Tasks | Causal Discovery |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/9506-specific-and-shared-causal-relation-modeling-and-mechanism-based-clustering |
PDF | http://papers.nips.cc/paper/9506-specific-and-shared-causal-relation-modeling-and-mechanism-based-clustering.pdf
PWC | https://paperswithcode.com/paper/specific-and-shared-causal-relation-modeling |
Repo | https://github.com/Biwei-Huang/Specific-and-Shared-Causal-Relation-Modeling-and-Mechanism-Based-Clustering |
Framework | none |
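The mechanism-based clustering idea can be shown in toy form: estimate a causal coefficient per subject, then group subjects whose coefficients (mechanisms) agree. This is a loose analogue for intuition only, not the SSCM estimator from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy setting: for each subject, y is caused by x with a subject-specific
# coefficient drawn around one of two group-level mechanisms.
true_groups = rng.integers(0, 2, size=40)
coefs = np.where(true_groups == 0, 1.0, -1.0) + 0.1 * rng.standard_normal(40)

est = []
for b in coefs:
    x = rng.standard_normal(200)
    y = b * x + 0.1 * rng.standard_normal(200)
    est.append(np.dot(x, y) / np.dot(x, x))  # per-subject OLS slope

# Cluster subjects by their estimated causal coefficient ("mechanism").
labels = KMeans(n_clusters=2, n_init=10).fit_predict(np.array(est).reshape(-1, 1))
print(labels)
```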
Consistency-based anomaly detection with adaptive multiple-hypotheses predictions
Title | Consistency-based anomaly detection with adaptive multiple-hypotheses predictions |
Authors | Duc Tam Nguyen, Zhongyu Lou, Michael Klar, Thomas Brox |
Abstract | In one-class-learning tasks, only the normal case can be modeled with data, whereas the variation of all possible anomalies is too large to be described sufficiently by samples. Thus, due to the lack of representative data, the widespread discriminative approaches cannot cover such learning tasks, and rather generative models, which attempt to learn the input density of the normal cases, are used. However, generative models suffer from a large input dimensionality (as in images) and are typically inefficient learners. We propose to learn the data distribution more efficiently with a multi-hypotheses autoencoder. Moreover, the model is criticized by a discriminator, which prevents artificial data modes not supported by data, and which enforces diversity across hypotheses. This consistency-based anomaly detection (ConAD) framework allows the reliable identification of out-of-distribution samples. For anomaly detection on CIFAR-10, it yields up to 3.9 percentage points of improvement over previously reported results. On a real anomaly detection task, the approach reduces the error of the baseline models from 6.8% to 1.5%. |
Tasks | Anomaly Detection |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=r1ledo0ctX |
PDF | https://openreview.net/pdf?id=r1ledo0ctX
PWC | https://paperswithcode.com/paper/consistency-based-anomaly-detection-with |
Repo | https://github.com/YeongHyeon/ConAD |
Framework | tf |
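A minimal sketch of the multi-hypotheses idea: the decoder emits K reconstructions, training keeps only the best head per sample (winner-takes-all), and the anomaly score at test time is the best reconstruction error. The discriminator that criticizes the hypotheses is omitted, and all names are ours.

```python
import torch
import torch.nn as nn

class MultiHypothesisAE(nn.Module):
    """Sketch: an autoencoder whose decoder emits K hypotheses; each head
    specializes under winner-takes-all training, and the anomaly score is
    the best (lowest) reconstruction error across heads."""
    def __init__(self, dim=784, hidden=64, k=4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(hidden, dim) for _ in range(k))

    def forward(self, x):                                      # x: (B, dim)
        z = self.encoder(x)
        recons = torch.stack([h(z) for h in self.heads], dim=1)  # (B, K, dim)
        errs = ((recons - x.unsqueeze(1)) ** 2).mean(-1)          # (B, K)
        return errs.min(dim=1).values  # train loss and test-time anomaly score
```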
Multi-Relational Script Learning for Discourse Relations
Title | Multi-Relational Script Learning for Discourse Relations |
Authors | I-Ta Lee, Dan Goldwasser |
Abstract | Modeling script knowledge can be useful for a wide range of NLP tasks. Current statistical script learning approaches embed the events, such that their relationships are indicated by their similarity in the embedding. While intuitive, these approaches fall short of representing nuanced relations needed for downstream tasks. In this paper, we suggest viewing event embedding learning as a multi-relational problem, which allows us to capture different aspects of event pairs. We model a rich set of event relations, such as Cause and Contrast, derived from the Penn Discourse TreeBank. We evaluate our model on three types of tasks: the popular Multi-Choice Narrative Cloze and its variants, several multi-relational prediction tasks, and a related downstream task, implicit discourse sense classification. |
Tasks | |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-1413/ |
PWC | https://paperswithcode.com/paper/multi-relational-script-learning-for |
Repo | https://github.com/doug919/multi_relational_script_learning |
Framework | pytorch |
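Viewing event pairs as a multi-relational problem resembles knowledge-graph embedding: each discourse relation gets its own parameters for scoring event pairs. The TransE-style scorer below conveys only that framing; the paper's actual model and training objective differ (see the repo).

```python
import torch
import torch.nn as nn

class RelationalEventScorer(nn.Module):
    """Sketch: score an (event1, relation, event2) triple TransE-style, so
    each discourse relation (Cause, Contrast, ...) gets its own translation
    vector in event-embedding space."""
    def __init__(self, n_events, n_relations, dim=100):
        super().__init__()
        self.events = nn.Embedding(n_events, dim)
        self.relations = nn.Embedding(n_relations, dim)

    def forward(self, e1, rel, e2):
        # Lower distance = more plausible under that relation.
        return -(self.events(e1) + self.relations(rel)
                 - self.events(e2)).norm(dim=-1)
```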
Error Correcting Output Codes Improve Probability Estimation and Adversarial Robustness of Deep Neural Networks
Title | Error Correcting Output Codes Improve Probability Estimation and Adversarial Robustness of Deep Neural Networks |
Authors | Gunjan Verma, Ananthram Swami |
Abstract | Modern machine learning systems are susceptible to adversarial examples: inputs which clearly preserve the characteristic semantics of a given class, but whose classification is (usually confidently) incorrect. Existing approaches to adversarial defense generally rely on modifying the input, e.g., quantization, or the learned model parameters, e.g., via adversarial training. However, recent research has shown that most such approaches succumb to adversarial examples when different norms or more sophisticated adaptive attacks are considered. In this paper, we propose a fundamentally different approach which instead changes the way the output is represented and decoded. This simple approach achieves state-of-the-art robustness to adversarial examples for L2- and L∞-based adversarial perturbations on MNIST and CIFAR10. In addition, even under strong white-box attacks, we find that our model often assigns adversarial examples a low probability; those with high probability are usually interpretable, i.e., perturbed towards the perceptual boundary between the original and adversarial class. Our approach has several advantages: it yields more meaningful probability estimates, is extremely fast during training and testing, requires essentially no architectural changes to existing discriminative learning pipelines, is wholly complementary to other defense approaches including adversarial training, and does not sacrifice benign test set performance. |
Tasks | Adversarial Defense, Quantization |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/9070-error-correcting-output-codes-improve-probability-estimation-and-adversarial-robustness-of-deep-neural-networks |
PDF | http://papers.nips.cc/paper/9070-error-correcting-output-codes-improve-probability-estimation-and-adversarial-robustness-of-deep-neural-networks.pdf
PWC | https://paperswithcode.com/paper/error-correcting-output-codes-improve |
Repo | https://github.com/Gunjan108/robust-ecoc |
Framework | tf |
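The output-representation change is classical ECOC: classes map to binary codewords, the network predicts each bit independently with a sigmoid, and decoding scores each class by how well the predicted bits match its codeword. The toy decoding below uses made-up codewords and bit probabilities purely for illustration, not the paper's codes or decoder.

```python
import numpy as np

# Toy ECOC decoding: each of 4 classes gets a 7-bit codeword; the network
# would output 7 independent bit probabilities, and the class probability
# comes from how well those bits match each codeword.
codewords = np.array([[1, 1, 1, 1, 1, 1, 1],
                      [1, 0, 0, 1, 1, 0, 0],
                      [0, 1, 0, 1, 0, 1, 0],
                      [0, 0, 1, 0, 1, 1, 0]], dtype=float)

bit_probs = np.array([0.9, 0.8, 0.2, 0.9, 0.7, 0.1, 0.3])  # stand-in for sigmoid outputs

# P(class) is proportional to the product over bits of P(bit matches codeword).
match = codewords * bit_probs + (1 - codewords) * (1 - bit_probs)
scores = match.prod(axis=1)
print(scores / scores.sum())  # decoded class distribution
```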
Accelerating Rescaled Gradient Descent: Fast Optimization of Smooth Functions
Title | Accelerating Rescaled Gradient Descent: Fast Optimization of Smooth Functions |
Authors | Ashia C. Wilson, Lester Mackey, Andre Wibisono |
Abstract | We present a family of algorithms, called descent algorithms, for optimizing convex and non-convex functions. We also introduce a new first-order algorithm, called rescaled gradient descent (RGD), and show that RGD achieves a faster convergence rate than gradient descent provided the function is strongly smooth, a natural generalization of the standard smoothness assumption on the objective function. When the objective function is convex, we present two frameworks for “accelerating” descent methods, one in the style of Nesterov and the other in the style of Monteiro and Svaiter. Rescaled gradient descent can be accelerated under the same strong smoothness assumption using both frameworks. We provide several examples of strongly smooth loss functions in machine learning and numerical experiments that verify our theoretical findings. |
Tasks | |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/9508-accelerating-rescaled-gradient-descent-fast-optimization-of-smooth-functions |
PDF | http://papers.nips.cc/paper/9508-accelerating-rescaled-gradient-descent-fast-optimization-of-smooth-functions.pdf
PWC | https://paperswithcode.com/paper/accelerating-rescaled-gradient-descent-fast |
Repo | https://github.com/aswilson07/ARGD |
Framework | none |
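As we read it, the RGD update divides the gradient by a power of its own norm, with plain gradient descent recovered at p = 2. The sketch below reflects that reading and should be treated as illustrative rather than the paper's exact scheme.

```python
import numpy as np

def rescaled_gd(grad, x0, eta=0.1, p=4, steps=100):
    """Sketch of a rescaled gradient step: the gradient is divided by a
    power of its own norm (p = 2 recovers plain gradient descent).
    The exponent follows our reading of the paper; treat as illustrative."""
    x = x0.astype(float)
    for _ in range(steps):
        g = grad(x)
        norm = np.linalg.norm(g)
        if norm == 0:
            break
        x = x - eta * g / norm ** ((p - 2) / (p - 1))
    return x

# Minimize f(x) = sum(x**4), whose gradient is 4 * x**3.
print(rescaled_gd(lambda x: 4 * x**3, np.array([1.0, -2.0])))
```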
Proactive Human-Machine Conversation with Explicit Conversation Goal
Title | Proactive Human-Machine Conversation with Explicit Conversation Goal |
Authors | Wenquan Wu, Zhen Guo, Xiangyang Zhou, Hua Wu, Xiyuan Zhang, Rongzhong Lian, Haifeng Wang |
Abstract | Though great progress has been made in human-machine conversation, current dialogue systems are still in their infancy: they usually converse passively and utter words more as a matter of response than on their own initiative. In this paper, we take a radical step towards building a human-like conversational agent: endowing it with the ability to proactively lead the conversation (introducing a new topic or maintaining the current topic). To facilitate the development of such conversation systems, we create a new dataset named Konv, where one participant acts as the conversation leader and the other acts as the follower. The leader is provided with a knowledge graph and asked to sequentially change the discussion topics, following the given conversation goal, and meanwhile keep the dialogue as natural and engaging as possible. Konv enables a very challenging task, as the model needs to both understand dialogue and plan over the given knowledge graph. We establish baseline results on this dataset (about 270K utterances and 30K dialogues) using several state-of-the-art models. Experimental results show that dialogue models that plan over the knowledge graph can make full use of related knowledge to generate more diverse multi-turn conversations. The baseline systems along with the dataset are publicly available. |
Tasks | |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-1369/ |
PWC | https://paperswithcode.com/paper/proactive-human-machine-conversation-with-1 |
Repo | https://github.com/PaddlePaddle/models |
Framework | none |
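To make the task setup concrete, one training instance plausibly bundles a conversation goal (a topic path), knowledge-graph triples, and the dialogue turns. The schema below is our guess for illustration only, not the released file format.

```python
# Hypothetical shape of one Konv-style training instance; field names and
# values are illustrative, not the dataset's actual schema.
example = {
    "goal": ["START", "topic_a", "topic_b"],   # topic path the leader must follow
    "knowledge": [                              # triples from the knowledge graph
        ("topic_a", "genre", "comedy"),
        ("topic_a", "related_to", "topic_b"),
    ],
    "conversation": [
        "Have you seen topic_a? It's a great comedy.",  # leader introduces topic_a
        "Not yet, is it worth watching?",               # follower responds
        "Definitely, and if you like it, topic_b is by the same director.",
    ],
}
```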
Complex Word Identification as a Sequence Labelling Task
Title | Complex Word Identification as a Sequence Labelling Task |
Authors | Sian Gooding, Ekaterina Kochmar |
Abstract | Complex Word Identification (CWI) is concerned with detection of words in need of simplification and is a crucial first step in a simplification pipeline. It has been shown that reliable CWI systems considerably improve text simplification. However, most CWI systems to date address the task on a word-by-word basis, not taking the context into account. In this paper, we present a novel approach to CWI based on sequence modelling. Our system is capable of performing CWI in context, does not require extensive feature engineering and outperforms state-of-the-art systems on this task. |
Tasks | Complex Word Identification, Feature Engineering, Text Simplification |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-1109/ |
PWC | https://paperswithcode.com/paper/complex-word-identification-as-a-sequence |
Repo | https://github.com/siangooding/cwi/tree/master/CWI%20Sequence%20Labeller |
Framework | none |
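The shift from word-by-word classification to sequence labelling means each complexity decision is conditioned on the whole sentence. A generic BiLSTM tagger makes the framing concrete; it is not the authors' exact architecture (see the repo above).

```python
import torch
import torch.nn as nn

class CWITagger(nn.Module):
    """Sketch: treat complex word identification as sequence labelling; a
    BiLSTM reads the whole sentence so each word's complexity decision
    sees its context."""
    def __init__(self, vocab_size, emb=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb)
        self.lstm = nn.LSTM(emb, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, 2)  # per-token: simple vs. complex

    def forward(self, token_ids):            # (B, T)
        h, _ = self.lstm(self.emb(token_ids))
        return self.out(h)                   # (B, T, 2) per-token logits
```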
A Simple Baseline for Audio-Visual Scene-Aware Dialog
Title | A Simple Baseline for Audio-Visual Scene-Aware Dialog |
Authors | Idan Schwartz, Alexander G. Schwing, Tamir Hazan |
Abstract | The recently proposed audio-visual scene-aware dialog task paves the way to a more data-driven way of learning virtual assistants, smart speakers and car navigation systems. However, very little is known to date about how to effectively extract meaningful information from a plethora of sensors that pound the computational engine of those devices. Therefore, in this paper, we provide and carefully analyze a simple baseline for audio-visual scene-aware dialog which is trained end-to-end. Our method differentiates in a data-driven manner useful signals from distracting ones using an attention mechanism. We evaluate the proposed approach on the recently introduced and challenging audio-visual scene-aware dataset, and demonstrate the key features that permit it to outperform the current state of the art by more than 20% on CIDEr. |
Tasks | |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Schwartz_A_Simple_Baseline_for_Audio-Visual_Scene-Aware_Dialog_CVPR_2019_paper.html |
PDF | http://openaccess.thecvf.com/content_CVPR_2019/papers/Schwartz_A_Simple_Baseline_for_Audio-Visual_Scene-Aware_Dialog_CVPR_2019_paper.pdf
PWC | https://paperswithcode.com/paper/a-simple-baseline-for-audio-visual-scene-1 |
Repo | https://github.com/idansc/simple-avsd |
Framework | pytorch |
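The "differentiates useful signals from distracting ones" part is an attention mechanism over per-modality features. A minimal sketch, with module names of our choosing:

```python
import torch
import torch.nn as nn

class ModalityAttention(nn.Module):
    """Sketch: weight each modality's feature vector (audio, video, dialog
    history, caption, question) by a learned attention score, so the model
    can focus on useful signals and down-weight distracting ones."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, feats):                 # (B, M, D): M modality features
        w = torch.softmax(self.score(feats), dim=1)
        return (w * feats).sum(dim=1)         # (B, D) fused representation
```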
DM2C: Deep Mixed-Modal Clustering
Title | DM2C: Deep Mixed-Modal Clustering |
Authors | Yangbangyan Jiang, Qianqian Xu, Zhiyong Yang, Xiaochun Cao, Qingming Huang |
Abstract | Data exhibited with multiple modalities are ubiquitous in real-world clustering tasks. Most existing methods, however, make the strong assumption that pairing information for modalities is available for all instances. In this paper, we consider a more challenging task where each instance is represented in only one modality, which we call mixed-modal data. Without any extra pairing supervision across modalities, it is difficult to find a universal semantic space for all of them. To tackle this problem, we present an adversarial learning framework for clustering with mixed-modal data. Instead of transforming all the samples into a joint modality-independent space, our framework learns the mappings across individual modal spaces by virtue of cycle-consistency. Through these mappings, we could easily unify all the samples into a single modal space and perform the clustering. Evaluations on several real-world mixed-modal datasets demonstrate the superiority of our proposed framework. |
Tasks | |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/8823-dm2c-deep-mixed-modal-clustering |
PDF | http://papers.nips.cc/paper/8823-dm2c-deep-mixed-modal-clustering.pdf
PWC | https://paperswithcode.com/paper/dm2c-deep-mixed-modal-clustering |
Repo | https://github.com/jiangyangby/DM2C |
Framework | pytorch |
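Only the cycle-consistency term is sketched here: latent codes from one modality are mapped into the other modality's space and back, and the round trip is penalized for drifting. The adversarial part of the framework and the final clustering step are omitted.

```python
import torch
import torch.nn as nn

# Sketch of the cycle-consistency idea with toy linear mappings; the paper's
# mappings and dimensions differ, and only the cycle term is shown.
dim_x, dim_y = 64, 32
x2y = nn.Linear(dim_x, dim_y)
y2x = nn.Linear(dim_y, dim_x)

x = torch.randn(8, dim_x)                  # latent codes of modality-X samples
cycle_loss = ((y2x(x2y(x)) - x) ** 2).mean()
cycle_loss.backward()                      # trains both mappings jointly
```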
Occlusion-Net: 2D/3D Occluded Keypoint Localization Using Graph Networks
Title | Occlusion-Net: 2D/3D Occluded Keypoint Localization Using Graph Networks |
Authors | N. Dinesh Reddy, Minh Vo, Srinivasa G. Narasimhan |
Abstract | We present Occlusion-Net, a framework to predict 2D and 3D locations of occluded keypoints for objects, in a largely self-supervised manner. We use an off-the-shelf detector (such as Mask R-CNN) that is trained only on visible keypoint annotations as input. This is the only supervision used in this work. A graph encoder network then explicitly classifies invisible edges and a graph decoder network corrects the occluded keypoint locations from the initial detector. Central to this work is a trifocal tensor loss that provides indirect self-supervision for occluded keypoint locations that are visible in other views of the object. The 2D keypoints are then passed into a 3D graph network that estimates the 3D shape and camera pose using a self-supervised re-projection loss. At test time, our approach successfully localizes keypoints in a single view under a diverse set of severe occlusion settings. We demonstrate and evaluate our approach on synthetic CAD data as well as a large image set capturing vehicles at many busy city intersections. As an interesting aside, we compare the accuracy of human labels of invisible keypoints against those obtained from the geometric trifocal tensor loss. |
Tasks | 3D Car Instance Understanding, 3D Object Reconstruction From A Single Image, Pose Estimation |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Reddy_Occlusion-Net_2D3D_Occluded_Keypoint_Localization_Using_Graph_Networks_CVPR_2019_paper.html |
PDF | http://openaccess.thecvf.com/content_CVPR_2019/papers/Reddy_Occlusion-Net_2D3D_Occluded_Keypoint_Localization_Using_Graph_Networks_CVPR_2019_paper.pdf
PWC | https://paperswithcode.com/paper/occlusion-net-2d3d-occluded-keypoint |
Repo | https://github.com/dineshreddy91/Occlusion_Net |
Framework | pytorch |
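A heavily simplified rendering of the encode-classify-decode loop: encode detected keypoints into node features, classify every keypoint-pair edge as visible or occluded, and decode corrected coordinates. The trifocal tensor loss and the 3D graph network are left out, and all names are ours.

```python
import torch
import torch.nn as nn

class KeypointGraphRefiner(nn.Module):
    """Sketch: message-free stand-in for the graph encoder/decoder pair;
    edges between keypoints are classified, then coordinates re-predicted."""
    def __init__(self, hidden=64):
        super().__init__()
        self.node_enc = nn.Linear(2, hidden)         # (x, y) per keypoint
        self.edge_cls = nn.Linear(2 * hidden, 2)     # visible vs. occluded edge
        self.decoder = nn.Linear(hidden, 2)          # corrected (x, y)

    def forward(self, kps):                          # (B, N, 2) detected keypoints
        h = torch.relu(self.node_enc(kps))
        n = h.size(1)
        pair = torch.cat([h.unsqueeze(2).expand(-1, -1, n, -1),
                          h.unsqueeze(1).expand(-1, n, -1, -1)], dim=-1)
        edge_logits = self.edge_cls(pair)            # (B, N, N, 2)
        return edge_logits, self.decoder(h)          # edges + refined keypoints
```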
Learning Rich Features at High-Speed for Single-Shot Object Detection
Title | Learning Rich Features at High-Speed for Single-Shot Object Detection |
Authors | Tiancai Wang, Rao Muhammad Anwer, Hisham Cholakkal, Fahad Shahbaz Khan, Yanwei Pang, Ling Shao |
Abstract | Single-stage object detection methods have received significant attention recently due to their characteristic real-time capabilities and high detection accuracies. Generally, most existing single-stage detectors follow two common practices: they employ a network backbone that is pretrained on ImageNet for the classification task and use a top-down feature pyramid representation for handling scale variations. Contrary to the common pre-training strategy, recent works have demonstrated the benefits of training from scratch to reduce the task gap between classification and localization, especially at high overlap thresholds. However, detection models trained from scratch require significantly longer training time compared to their typical fine-tuning-based counterparts. We introduce a single-stage detection framework that combines the advantages of both fine-tuning pretrained models and training from scratch. Our framework constitutes a standard network that uses a pre-trained backbone and a parallel light-weight auxiliary network trained from scratch. Further, we argue that the commonly used top-down pyramid representation only focuses on passing high-level semantics from the top layers to bottom layers. We introduce a bi-directional network that efficiently circulates both low-/mid-level and high-level semantic information in the detection framework. Experiments are performed on MS COCO and UAVDT datasets. Compared to the baseline, our detector achieves an absolute gain of 7.4% and 4.2% in average precision (AP) on MS COCO and UAVDT datasets, respectively, using a VGG backbone. For a 300x300 input on the MS COCO test set, our detector with ResNet backbone surpasses existing single-stage detection methods for single-scale inference, achieving 34.3 AP while operating at an inference time of 19 milliseconds on a single Titan X GPU. Code is available at https://github.com/vaesl/LRF-Net. |
Tasks | Object Detection |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Wang_Learning_Rich_Features_at_High-Speed_for_Single-Shot_Object_Detection_ICCV_2019_paper.html |
PDF | http://openaccess.thecvf.com/content_ICCV_2019/papers/Wang_Learning_Rich_Features_at_High-Speed_for_Single-Shot_Object_Detection_ICCV_2019_paper.pdf
PWC | https://paperswithcode.com/paper/learning-rich-features-at-high-speed-for |
Repo | https://github.com/vaesl/LRF-Net |
Framework | pytorch |
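The bi-directional pyramid claim means features flow both top-down (semantics to fine maps) and bottom-up (detail to coarse maps). A bare-bones fusion sketch, assuming all levels share a channel count; the real network uses learned transforms at each step.

```python
import torch
import torch.nn.functional as F

def bidirectional_fuse(feats):
    """Sketch of bi-directional pyramid fusion: a top-down pass spreads
    high-level semantics into fine maps, then a bottom-up pass returns
    low-/mid-level detail to coarse maps. `feats` is fine-to-coarse."""
    td = list(feats)
    for i in range(len(td) - 2, -1, -1):             # top-down
        td[i] = td[i] + F.interpolate(td[i + 1], size=td[i].shape[-2:])
    for i in range(1, len(td)):                      # bottom-up
        td[i] = td[i] + F.adaptive_max_pool2d(td[i - 1], td[i].shape[-2:])
    return td

maps = [torch.randn(1, 256, s, s) for s in (64, 32, 16)]
fused = bidirectional_fuse(maps)
```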
Non-local Attention Learning on Large Heterogeneous Information Networks
Title | Non-local Attention Learning on Large Heterogeneous Information Networks |
Authors | Yuxin Xiao, Zecheng Zhang, Carl Yang, Chengxiang Zhai |
Abstract | Heterogeneous information network (HIN) summarizes rich structural information in real-world datasets and plays an important role in many big data applications. Recently, graph neural networks have been extended to the representation learning of HIN. One very recent advancement is the hierarchical attention mechanism which incorporates both node-wise and semantic-wise attention. However, since HIN is more likely to be densely connected given its diverse types of edges, repeatedly applying graph convolutional layers can make the node embeddings indistinguishable very quickly. In order to avoid over-smoothing, existing graph neural networks targeting HIN generally suffer from a shallow structure. Consequently, those approaches ignore information beyond the local neighborhood. This design flaw violates the concept of non-local learning, which emphasizes the importance of capturing long-range dependencies. To properly address this limitation, we propose a novel framework of non-local attention in heterogeneous information networks (NLAH). Our framework utilizes a non-local attention structure to complement the hierarchical attention mechanism. In this way, it leverages both local and non-local information simultaneously. Moreover, a weighted sampling schema is designed for NLAH to reduce the computation cost for large-scale datasets. Extensive experiments on three different real-world heterogeneous information networks illustrate that our framework exhibits extraordinary scalability and outperforms state-of-the-art baselines by significant margins. |
Tasks | Heterogeneous Node Classification, Representation Learning |
Published | 2019-12-12 |
URL | https://ieeexplore.ieee.org/document/9006463 |
PDF | https://xiaoyuxin1002.github.io/docs/NLAH.pdf
PWC | https://paperswithcode.com/paper/non-local-attention-learning-on-large |
Repo | https://github.com/xiaoyuxin1002/NLAH |
Framework | pytorch |
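The non-local complement to hierarchical attention can be sketched as each node attending over a sampled set of distant nodes. The paper's weighted sampling schema is omitted here, and all module names are ours.

```python
import torch
import torch.nn as nn

class NonLocalNodeAttention(nn.Module):
    """Sketch: every node attends over a sampled set of non-neighbor nodes,
    complementing local graph convolutions with long-range signal."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, h, candidates):        # h: (N, D); candidates: (N, S) node ids
        q = self.q(h).unsqueeze(1)           # (N, 1, D)
        k, v = self.k(h)[candidates], self.v(h)[candidates]   # (N, S, D)
        a = torch.softmax((q * k).sum(-1) / h.size(-1) ** 0.5, dim=-1)
        return h + (a.unsqueeze(-1) * v).sum(1)   # local emb + non-local context
```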
Graph-Based Meaning Representations: Design and Processing
Title | Graph-Based Meaning Representations: Design and Processing |
Authors | Alexander Koller, Stephan Oepen, Weiwei Sun |
Abstract | This tutorial is on representing and processing sentence meaning in the form of labeled directed graphs. The tutorial will (a) briefly review relevant background in formal and linguistic semantics; (b) semi-formally define a unified abstract view on different flavors of semantic graphs and associated terminology; (c) survey common frameworks for graph-based meaning representation and available graph banks; and (d) offer a technical overview of a representative selection of different parsing approaches. |
Tasks | |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-4002/ |
PWC | https://paperswithcode.com/paper/graph-based-meaning-representations-design |
Repo | https://github.com/cfmrp/tutorial |
Framework | none |
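A labeled directed graph meaning representation, at its simplest, is nodes for predicates and arguments with labeled edges between them. A toy example in the spirit of the graph banks the tutorial surveys, not tied to any specific framework:

```python
import networkx as nx

# Toy semantic graph for "The dog sleeps": a predicate node linked to its
# argument by a labeled directed edge.
g = nx.DiGraph()
g.add_node("sleep-01")
g.add_node("dog")
g.add_edge("sleep-01", "dog", label="ARG0")
print(list(g.edges(data=True)))
```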
Input Similarity from the Neural Network Perspective
Title | Input Similarity from the Neural Network Perspective |
Authors | Guillaume Charpiat, Nicolas Girard, Loris Felardos, Yuliya Tarabalka |
Abstract | Given a trained neural network, we aim at understanding how similar it considers any two samples. For this, we express a proper definition of similarity from the neural network perspective (i.e. we quantify how undissociable two inputs A and B are), by taking a machine learning viewpoint: how much a parameter variation designed to change the output for A would impact the output for B as well? We study the mathematical properties of this similarity measure, and show how to estimate sample density with it, in low complexity, enabling new types of statistical analysis for neural networks. We also propose to use it during training, to enforce that examples known to be similar should also be seen as similar by the network. We then study the self-denoising phenomenon encountered in regression tasks when training neural networks on datasets with noisy labels. We exhibit a multimodal image registration task where almost perfect accuracy is reached, far beyond label noise variance. Such an impressive self-denoising phenomenon can be explained as a noise averaging effect over the labels of similar examples. We analyze data by retrieving samples perceived as similar by the network, and are able to quantify the denoising effect without requiring true labels. |
Tasks | Denoising, Image Registration |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/8775-input-similarity-from-the-neural-network-perspective |
PDF | http://papers.nips.cc/paper/8775-input-similarity-from-the-neural-network-perspective.pdf
PWC | https://paperswithcode.com/paper/input-similarity-from-the-neural-network |
Repo | https://github.com/Lydorn/netsimilarity |
Framework | pytorch |
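The similarity definition ("how much a parameter variation designed to change the output for A would impact the output for B") reduces, for a scalar output, to comparing the two inputs' parameter gradients. A minimal sketch with a throwaway model, normalizing to a cosine:

```python
import torch

# Sketch: similarity between inputs A and B as the (normalized) inner
# product of their parameter gradients, per the paper's definition.
model = torch.nn.Sequential(torch.nn.Linear(10, 16), torch.nn.Tanh(),
                            torch.nn.Linear(16, 1))

def param_grad(x):
    model.zero_grad()
    model(x).backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()])

gA = param_grad(torch.randn(1, 10))
gB = param_grad(torch.randn(1, 10))
similarity = torch.dot(gA, gB) / (gA.norm() * gB.norm())
print(similarity.item())
```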