January 25, 2020

3057 words 15 mins read

Paper Group NAWR 12

Cross-Sentence Grammatical Error Correction. Specific and Shared Causal Relation Modeling and Mechanism-Based Clustering. Consistency-based anomaly detection with adaptive multiple-hypotheses predictions. Multi-Relational Script Learning for Discourse Relations. Error Correcting Output Codes Improve Probability Estimation and Adversarial Robustness …

Cross-Sentence Grammatical Error Correction

Title Cross-Sentence Grammatical Error Correction
Authors Shamil Chollampatt, Weiqi Wang, Hwee Tou Ng
Abstract Automatic grammatical error correction (GEC) research has made remarkable progress in the past decade. However, all existing approaches to GEC correct errors by considering a single sentence alone and ignoring crucial cross-sentence context. Some errors can only be corrected reliably using cross-sentence context and models can also benefit from the additional contextual information in correcting other errors. In this paper, we address this serious limitation of existing approaches and improve strong neural encoder-decoder models by appropriately modeling wider contexts. We employ an auxiliary encoder that encodes previous sentences and incorporate the encoding in the decoder via attention and gating mechanisms. Our approach results in statistically significant improvements in overall GEC performance over strong baselines across multiple test sets. Analysis of our cross-sentence GEC model on a synthetic dataset shows high performance in verb tense corrections that require cross-sentence context.
Tasks Grammatical Error Correction
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-1042/
PDF https://www.aclweb.org/anthology/P19-1042
PWC https://paperswithcode.com/paper/cross-sentence-grammatical-error-correction
Repo https://github.com/nusnlp/crosentgec
Framework pytorch
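
A minimal PyTorch sketch of the attention-plus-gating fusion the abstract describes: an auxiliary encoding of the previous sentences is attended over from the decoder state and merged through a learned gate. Module names and dimensions are assumptions made for illustration; the authors' actual implementation lives in the linked repo.

```python
import torch
import torch.nn as nn

class ContextGate(nn.Module):
    """Hypothetical sketch of attention + gating over cross-sentence context.
    Shapes and layer choices are assumptions, not the authors' code."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads=1, batch_first=True)
        self.gate = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, decoder_state, context_encodings):
        # decoder_state: (B, tgt_len, H); context_encodings: (B, ctx_len, H)
        ctx, _ = self.attn(decoder_state, context_encodings, context_encodings)
        g = torch.sigmoid(self.gate(torch.cat([decoder_state, ctx], dim=-1)))
        return decoder_state + g * ctx  # gate controls how much context flows in
```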

Specific and Shared Causal Relation Modeling and Mechanism-Based Clustering

Title Specific and Shared Causal Relation Modeling and Mechanism-Based Clustering
Authors Biwei Huang, Kun Zhang, Pengtao Xie, Mingming Gong, Eric P. Xing, Clark Glymour
Abstract State-of-the-art approaches to causal discovery usually assume a fixed underlying causal model. However, it is often the case that causal models vary across domains or subjects, due to possibly omitted factors that affect the quantitative causal effects. As a typical example, causal connectivity in the brain network has been reported to vary across individuals, with significant differences across groups of people, such as autistics and typical controls. In this paper, we develop a unified framework for causal discovery and mechanism-based group identification. In particular, we propose a specific and shared causal model (SSCM), which takes into account the variabilities of causal relations across individuals/groups and leverages their commonalities to achieve statistically reliable estimation. The learned SSCM gives the specific causal knowledge for each individual as well as the general trend over the population. In addition, the estimated model directly provides the group information of each individual. Experimental results on synthetic and real-world data demonstrate the efficacy of the proposed method.
Tasks Causal Discovery
Published 2019-12-01
URL http://papers.nips.cc/paper/9506-specific-and-shared-causal-relation-modeling-and-mechanism-based-clustering
PDF http://papers.nips.cc/paper/9506-specific-and-shared-causal-relation-modeling-and-mechanism-based-clustering.pdf
PWC https://paperswithcode.com/paper/specific-and-shared-causal-relation-modeling
Repo https://github.com/Biwei-Huang/Specific-and-Shared-Causal-Relation-Modeling-and-Mechanism-Based-Clustering
Framework none
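
To make the "specific and shared" decomposition concrete, here is a deliberately simplified linear toy in Python: each subject's coefficient matrix is fit separately and split into a shared population component plus an individual deviation. This is an illustrative stand-in under strong assumptions (linearity, plain ridge regressions), not the SSCM estimator.

```python
import numpy as np

def fit_specific_shared(datasets, lam=1.0):
    """Toy 'specific + shared' decomposition, B_i = B_shared + D_i.
    datasets: list of (n_i, d) arrays, one per subject/individual."""
    per_subject = []
    for X in datasets:
        n, d = X.shape
        B = np.zeros((d, d))
        for j in range(d):
            others = np.delete(np.arange(d), j)
            A = X[:, others]
            # ridge regression of variable j on the remaining variables
            coef = np.linalg.solve(A.T @ A + lam * np.eye(d - 1), A.T @ X[:, j])
            B[j, others] = coef
        per_subject.append(B)
    B_shared = np.mean(per_subject, axis=0)           # population trend
    B_specific = [B - B_shared for B in per_subject]  # individual deviations
    return B_shared, B_specific
```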

Consistency-based anomaly detection with adaptive multiple-hypotheses predictions

Title Consistency-based anomaly detection with adaptive multiple-hypotheses predictions
Authors Duc Tam Nguyen, Zhongyu Lou, Michael Klar, Thomas Brox
Abstract In one-class-learning tasks, only the normal case can be modeled with data, whereas the variation of all possible anomalies is too large to be described sufficiently by samples. Thus, due to the lack of representative data, the widespread discriminative approaches cannot cover such learning tasks, and rather generative models, which attempt to learn the input density of the normal cases, are used. However, generative models suffer from a large input dimensionality (as in images) and are typically inefficient learners. We propose to learn the data distribution more efficiently with a multi-hypotheses autoencoder. Moreover, the model is criticized by a discriminator, which prevents artificial data modes not supported by data, and which enforces diversity across hypotheses. This consistency-based anomaly detection (ConAD) framework allows the reliable identification of out-of-distribution samples. For anomaly detection on CIFAR-10, it yields up to 3.9 percentage points of improvement over previously reported results. On a real anomaly detection task, the approach reduces the error of the baseline models from 6.8% to 1.5%.
Tasks Anomaly Detection
Published 2019-05-01
URL https://openreview.net/forum?id=r1ledo0ctX
PDF https://openreview.net/pdf?id=r1ledo0ctX
PWC https://paperswithcode.com/paper/consistency-based-anomaly-detection-with
Repo https://github.com/YeongHyeon/ConAD
Framework tf
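
A toy sketch of the multiple-hypotheses autoencoder idea: several decoder heads propose reconstructions and only the best hypothesis per sample receives gradient (winner-takes-all), which is what lets the model cover multimodal normal data. The discriminator that criticizes unsupported modes is omitted, and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class MultiHypothesisAE(nn.Module):
    """Toy multi-hypotheses autoencoder: H decoder heads per input."""
    def __init__(self, dim=784, latent=32, num_hypotheses=4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, latent), nn.ReLU())
        self.decoders = nn.ModuleList(
            [nn.Linear(latent, dim) for _ in range(num_hypotheses)]
        )

    def forward(self, x):
        z = self.encoder(x)
        return torch.stack([d(z) for d in self.decoders], dim=1)  # (B, H, dim)

def wta_loss(hypotheses, x):
    # per-sample error of each hypothesis; backprop only through the winner
    errs = ((hypotheses - x.unsqueeze(1)) ** 2).mean(dim=-1)  # (B, H)
    return errs.min(dim=1).values.mean()
```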

Multi-Relational Script Learning for Discourse Relations

Title Multi-Relational Script Learning for Discourse Relations
Authors I-Ta Lee, Dan Goldwasser
Abstract Modeling script knowledge can be useful for a wide range of NLP tasks. Current statistical script learning approaches embed the events, such that their relationships are indicated by their similarity in the embedding. While intuitive, these approaches fall short of representing nuanced relations, needed for downstream tasks. In this paper, we propose to view learning event embeddings as a multi-relational problem, which allows us to capture different aspects of event pairs. We model a rich set of event relations, such as Cause and Contrast, derived from the Penn Discourse Treebank. We evaluate our model on three types of tasks: the popular Multi-Choice Narrative Cloze and its variants, several multi-relational prediction tasks, and a related downstream task, implicit discourse sense classification.
Tasks
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-1413/
PDF https://www.aclweb.org/anthology/P19-1413
PWC https://paperswithcode.com/paper/multi-relational-script-learning-for
Repo https://github.com/doug919/multi_relational_script_learning
Framework pytorch
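
A minimal sketch of one way to read "multi-relational event embedding": a TransE-style scorer in which each discourse relation (Cause, Contrast, …) gets its own translation vector. The paper's actual scoring function may differ; this just illustrates the multi-relational framing.

```python
import torch
import torch.nn as nn

class RelationalEventEmbedding(nn.Module):
    """TransE-style illustration: score(e1, r, e2) = -||e1 + t_r - e2||."""
    def __init__(self, num_events, num_relations, dim=128):
        super().__init__()
        self.events = nn.Embedding(num_events, dim)
        self.relations = nn.Embedding(num_relations, dim)

    def score(self, e1, r, e2):
        # higher score = the pair (e1, e2) better fits relation r
        h, t = self.events(e1), self.events(e2)
        return -(h + self.relations(r) - t).norm(dim=-1)
```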

Error Correcting Output Codes Improve Probability Estimation and Adversarial Robustness of Deep Neural Networks

Title Error Correcting Output Codes Improve Probability Estimation and Adversarial Robustness of Deep Neural Networks
Authors Gunjan Verma, Ananthram Swami
Abstract Modern machine learning systems are susceptible to adversarial examples; inputs which clearly preserve the characteristic semantics of a given class, but whose classification is (usually confidently) incorrect. Existing approaches to adversarial defense generally rely on modifying the input, e.g. quantization, or the learned model parameters, e.g. via adversarial training. However, recent research has shown that most such approaches succumb to adversarial examples when different norms or more sophisticated adaptive attacks are considered. In this paper, we propose a fundamentally different approach which instead changes the way the output is represented and decoded. This simple approach achieves state-of-the-art robustness to adversarial examples for L2- and L∞-based adversarial perturbations on MNIST and CIFAR10. In addition, even under strong white-box attacks, we find that our model often assigns adversarial examples a low probability; those with high probability are usually interpretable, i.e. perturbed towards the perceptual boundary between the original and adversarial class. Our approach has several advantages: it yields more meaningful probability estimates, is extremely fast during training and testing, requires essentially no architectural changes to existing discriminative learning pipelines, is wholly complementary to other defense approaches including adversarial training, and does not sacrifice benign test set performance.
Tasks Adversarial Defense, Quantization
Published 2019-12-01
URL http://papers.nips.cc/paper/9070-error-correcting-output-codes-improve-probability-estimation-and-adversarial-robustness-of-deep-neural-networks
PDF http://papers.nips.cc/paper/9070-error-correcting-output-codes-improve-probability-estimation-and-adversarial-robustness-of-deep-neural-networks.pdf
PWC https://paperswithcode.com/paper/error-correcting-output-codes-improve
Repo https://github.com/Gunjan108/robust-ecoc
Framework tf
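
The output-coding idea is easy to sketch: assign each class a binary codeword, have the network predict the bits, and decode to the nearest codeword, so a few flipped bits (e.g. from an adversarial perturbation) can be corrected. The random codebook below is purely illustrative, not the paper's code design.

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.integers(0, 2, size=(10, 16))   # 10 classes, 16 code bits

def decode(bit_probs):
    """bit_probs: (16,) sigmoid outputs in [0, 1].
    Returns the class whose codeword is nearest in soft Hamming distance."""
    dists = np.abs(codebook - bit_probs).sum(axis=1)
    return int(np.argmin(dists))
```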

Accelerating Rescaled Gradient Descent: Fast Optimization of Smooth Functions

Title Accelerating Rescaled Gradient Descent: Fast Optimization of Smooth Functions
Authors Ashia C. Wilson, Lester Mackey, Andre Wibisono
Abstract We present a family of algorithms, called descent algorithms, for optimizing convex and non-convex functions. We also introduce a new first-order algorithm, called rescaled gradient descent (RGD), and show that RGD achieves a faster convergence rate than gradient descent provided the function is strongly smooth - a natural generalization of the standard smoothness assumption on the objective function. When the objective function is convex, we present two frameworks for “accelerating” descent methods, one in the style of Nesterov and the other in the style of Monteiro and Svaiter. Rescaled gradient descent can be accelerated under the same strong smoothness assumption using both frameworks. We provide several examples of strongly smooth loss functions in machine learning and numerical experiments that verify our theoretical findings.
Tasks
Published 2019-12-01
URL http://papers.nips.cc/paper/9508-accelerating-rescaled-gradient-descent-fast-optimization-of-smooth-functions
PDF http://papers.nips.cc/paper/9508-accelerating-rescaled-gradient-descent-fast-optimization-of-smooth-functions.pdf
PWC https://paperswithcode.com/paper/accelerating-rescaled-gradient-descent-fast
Repo https://github.com/aswilson07/ARGD
Framework none
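
As we read it, the core RGD update normalizes the gradient by a power of its own norm, reducing to plain gradient descent when p = 2. The exponent and step size below are a hedged reconstruction from the abstract's setting, not the paper's exact scheme; consult the paper for the accelerated variants.

```python
import numpy as np

def rescaled_gradient_step(x, grad, eta=0.1, p=4.0):
    """Sketch of one rescaled gradient step:
    x <- x - eta * g / ||g||^((p-2)/(p-1)).
    Assumption-laden illustration; for p = 2 this is ordinary GD."""
    g = np.asarray(grad)
    norm = np.linalg.norm(g)
    if norm == 0.0:
        return x  # at a stationary point, nothing to do
    return x - eta * g / norm ** ((p - 2.0) / (p - 1.0))
```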

Proactive Human-Machine Conversation with Explicit Conversation Goal

Title Proactive Human-Machine Conversation with Explicit Conversation Goal
Authors Wenquan Wu, Zhen Guo, Xiangyang Zhou, Hua Wu, Xiyuan Zhang, Rongzhong Lian, Haifeng Wang
Abstract Though great progress has been made for human-machine conversation, current dialogue systems are still in their infancy: they usually converse passively and utter words more as a matter of response than on their own initiative. In this paper, we take a radical step towards building a human-like conversational agent: endowing it with the ability to proactively lead the conversation (introducing a new topic or maintaining the current topic). To facilitate the development of such conversation systems, we create a new dataset named Konv, in which one participant acts as the conversation leader and the other acts as the follower. The leader is provided with a knowledge graph and asked to sequentially change the discussion topics, following the given conversation goal, and meanwhile keep the dialogue as natural and engaging as possible. Konv enables a very challenging task, as the model needs to both understand dialogue and plan over the given knowledge graph. We establish baseline results on this dataset (about 270K utterances and 30K dialogues) using several state-of-the-art models. Experimental results show that dialogue models that plan over the knowledge graph can make full use of related knowledge to generate more diverse multi-turn conversations. The baseline systems along with the dataset are publicly available.
Tasks
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-1369/
PDF https://www.aclweb.org/anthology/P19-1369
PWC https://paperswithcode.com/paper/proactive-human-machine-conversation-with-1
Repo https://github.com/PaddlePaddle/models
Framework none

Complex Word Identification as a Sequence Labelling Task

Title Complex Word Identification as a Sequence Labelling Task
Authors Sian Gooding, Ekaterina Kochmar
Abstract Complex Word Identification (CWI) is concerned with detection of words in need of simplification and is a crucial first step in a simplification pipeline. It has been shown that reliable CWI systems considerably improve text simplification. However, most CWI systems to date address the task on a word-by-word basis, not taking the context into account. In this paper, we present a novel approach to CWI based on sequence modelling. Our system is capable of performing CWI in context, does not require extensive feature engineering and outperforms state-of-the-art systems on this task.
Tasks Complex Word Identification, Feature Engineering, Text Simplification
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-1109/
PDF https://www.aclweb.org/anthology/P19-1109
PWC https://paperswithcode.com/paper/complex-word-identification-as-a-sequence
Repo https://github.com/siangooding/cwi/tree/master/CWI%20Sequence%20Labeller
Framework none
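
A hedged sketch of the sequence-labelling framing: a BiLSTM reads the whole sentence and tags each token as complex or simple, so every decision sees context rather than a word in isolation. The architecture details here are assumptions, not the authors' system.

```python
import torch
import torch.nn as nn

class CWITagger(nn.Module):
    """Illustrative CWI-as-sequence-labelling tagger (assumed architecture)."""
    def __init__(self, vocab_size, emb=100, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.lstm = nn.LSTM(emb, hidden, bidirectional=True, batch_first=True)
        self.tag = nn.Linear(2 * hidden, 2)  # per-token {simple, complex}

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> per-token logits (batch, seq_len, 2)
        h, _ = self.lstm(self.embed(token_ids))
        return self.tag(h)
```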

A Simple Baseline for Audio-Visual Scene-Aware Dialog

Title A Simple Baseline for Audio-Visual Scene-Aware Dialog
Authors Idan Schwartz, Alexander G. Schwing, Tamir Hazan
Abstract The recently proposed audio-visual scene-aware dialog task paves the way to a more data-driven way of learning virtual assistants, smart speakers and car navigation systems. However, very little is known to date about how to effectively extract meaningful information from a plethora of sensors that pound the computational engine of those devices. Therefore, in this paper, we provide and carefully analyze a simple baseline for audio-visual scene-aware dialog which is trained end-to-end. Our method differentiates useful signals from distracting ones in a data-driven manner using an attention mechanism. We evaluate the proposed approach on the recently introduced and challenging audio-visual scene-aware dataset, and demonstrate the key features that permit it to outperform the current state-of-the-art by more than 20% on CIDEr.
Tasks
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Schwartz_A_Simple_Baseline_for_Audio-Visual_Scene-Aware_Dialog_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Schwartz_A_Simple_Baseline_for_Audio-Visual_Scene-Aware_Dialog_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/a-simple-baseline-for-audio-visual-scene-1
Repo https://github.com/idansc/simple-avsd
Framework pytorch
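
One way to picture the data-driven signal selection: score each modality's features against the question representation and take a softmax-weighted sum, letting useful sensors dominate distracting ones. A schematic attention layer along those lines, with all names and dimensions assumed:

```python
import torch
import torch.nn as nn

class ModalityAttention(nn.Module):
    """Toy attention over per-modality features, conditioned on a query."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, query, modality_feats):
        # query: (B, dim); modality_feats: (B, M, dim) for M sensors/modalities
        scores = (modality_feats @ self.proj(query).unsqueeze(-1)).squeeze(-1)
        w = torch.softmax(scores / modality_feats.size(-1) ** 0.5, dim=1)
        return (w.unsqueeze(-1) * modality_feats).sum(dim=1)  # fused feature
```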

DM2C: Deep Mixed-Modal Clustering

Title DM2C: Deep Mixed-Modal Clustering
Authors Yangbangyan Jiang, Qianqian Xu, Zhiyong Yang, Xiaochun Cao, Qingming Huang
Abstract Data exhibiting multiple modalities are ubiquitous in real-world clustering tasks. Most existing methods, however, pose a strong assumption that the pairing information for modalities is available for all instances. In this paper, we consider a more challenging task where each instance is represented in only one modality, which we call mixed-modal data. Without any extra pairing supervision across modalities, it is difficult to find a universal semantic space for all of them. To tackle this problem, we present an adversarial learning framework for clustering with mixed-modal data. Instead of transforming all the samples into a joint modality-independent space, our framework learns the mappings across individual modal spaces by virtue of cycle-consistency. Through these mappings, we can easily unify all the samples into a single modal space and perform the clustering. Evaluations on several real-world mixed-modal datasets demonstrate the superiority of our proposed framework.
Tasks
Published 2019-12-01
URL http://papers.nips.cc/paper/8823-dm2c-deep-mixed-modal-clustering
PDF http://papers.nips.cc/paper/8823-dm2c-deep-mixed-modal-clustering.pdf
PWC https://paperswithcode.com/paper/dm2c-deep-mixed-modal-clustering
Repo https://github.com/jiangyangby/DM2C
Framework pytorch
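
The cycle-consistency device at the heart of the framework is compact enough to sketch: learn mappings G: X→Y and F: Y→X between two modal spaces and penalize round trips that fail to return the input, which is what removes the need for paired supervision. The adversarial critics are omitted; this is a toy, not the DM2C objective.

```python
import torch.nn as nn

def cycle_loss(G, F, x, y):
    """Cycle-consistency penalty between two modal spaces.
    G: maps modality X -> Y; F: maps modality Y -> X (any nn.Module pair)."""
    l1 = nn.L1Loss()
    # round trips should reproduce the original samples
    return l1(F(G(x)), x) + l1(G(F(y)), y)
```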

Occlusion-Net: 2D/3D Occluded Keypoint Localization Using Graph Networks

Title Occlusion-Net: 2D/3D Occluded Keypoint Localization Using Graph Networks
Authors N. Dinesh Reddy, Minh Vo, Srinivasa G. Narasimhan
Abstract We present Occlusion-Net, a framework to predict 2D and 3D locations of occluded keypoints for objects, in a largely self-supervised manner. We use an off-the-shelf detector (such as Mask R-CNN), trained only on visible keypoint annotations, as input. This is the only supervision used in this work. A graph encoder network then explicitly classifies invisible edges and a graph decoder network corrects the occluded keypoint locations from the initial detector. Central to this work is a trifocal tensor loss that provides indirect self-supervision for occluded keypoint locations that are visible in other views of the object. The 2D keypoints are then passed into a 3D graph network that estimates the 3D shape and camera pose using the self-supervised re-projection loss. At test time, our approach successfully localizes keypoints in a single view under a diverse set of severe occlusion settings. We demonstrate and evaluate our approach on synthetic CAD data as well as a large image set capturing vehicles at many busy city intersections. As an interesting aside, we compare the accuracy of human labels of invisible keypoints against those obtained from the geometric trifocal tensor loss.
Tasks 3D Car Instance Understanding, 3D Object Reconstruction From A Single Image, Pose Estimation
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Reddy_Occlusion-Net_2D3D_Occluded_Keypoint_Localization_Using_Graph_Networks_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Reddy_Occlusion-Net_2D3D_Occluded_Keypoint_Localization_Using_Graph_Networks_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/occlusion-net-2d3d-occluded-keypoint
Repo https://github.com/dineshreddy91/Occlusion_Net
Framework pytorch

Learning Rich Features at High-Speed for Single-Shot Object Detection

Title Learning Rich Features at High-Speed for Single-Shot Object Detection
Authors Tiancai Wang, Rao Muhammad Anwer, Hisham Cholakkal, Fahad Shahbaz Khan, Yanwei Pang, Ling Shao
Abstract Single-stage object detection methods have received significant attention recently due to their characteristic real-time capabilities and high detection accuracies. Generally, most existing single-stage detectors follow two common practices: they employ a network backbone that is pretrained on ImageNet for the classification task and use a top-down feature pyramid representation for handling scale variations. Contrary to the common pre-training strategy, recent works have demonstrated the benefits of training from scratch to reduce the task gap between classification and localization, especially at high overlap thresholds. However, detection models trained from scratch require significantly longer training time compared to their typical fine-tuning based counterparts. We introduce a single-stage detection framework that combines the advantages of both fine-tuning pretrained models and training from scratch. Our framework constitutes a standard network that uses a pre-trained backbone and a parallel light-weight auxiliary network trained from scratch. Further, we argue that the commonly used top-down pyramid representation only focuses on passing high-level semantics from the top layers to the bottom layers. We introduce a bi-directional network that efficiently circulates both low-/mid-level and high-level semantic information in the detection framework. Experiments are performed on the MS COCO and UAVDT datasets. Compared to the baseline, our detector achieves an absolute gain of 7.4% and 4.2% in average precision (AP) on the MS COCO and UAVDT datasets, respectively, using a VGG backbone. For a 300x300 input on the MS COCO test set, our detector with a ResNet backbone surpasses existing single-stage detection methods for single-scale inference, achieving 34.3 AP while operating at an inference time of 19 milliseconds on a single Titan X GPU. Code is available at https://github.com/vaesl/LRF-Net.
Tasks Object Detection
Published 2019-10-01
URL http://openaccess.thecvf.com/content_ICCV_2019/html/Wang_Learning_Rich_Features_at_High-Speed_for_Single-Shot_Object_Detection_ICCV_2019_paper.html
PDF http://openaccess.thecvf.com/content_ICCV_2019/papers/Wang_Learning_Rich_Features_at_High-Speed_for_Single-Shot_Object_Detection_ICCV_2019_paper.pdf
PWC https://paperswithcode.com/paper/learning-rich-features-at-high-speed-for
Repo https://github.com/vaesl/LRF-Net
Framework pytorch
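
The bi-directional circulation argument is worth a sketch: alongside the usual top-down path that passes high-level semantics downward, a bottom-up path sends low-/mid-level detail back up. Two feature levels only; all shapes and ops are assumptions rather than the LRF-Net design.

```python
import torch.nn as nn

class BidirectionalFusion(nn.Module):
    """Schematic two-level bi-directional feature exchange (assumed design)."""
    def __init__(self, c):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.down = nn.Conv2d(c, c, kernel_size=3, stride=2, padding=1)

    def forward(self, low, high):
        # low: (B, c, 2H, 2W) fine features; high: (B, c, H, W) coarse features
        low_enriched = low + self.up(high)     # top-down: semantics flow down
        high_enriched = high + self.down(low)  # bottom-up: detail flows up
        return low_enriched, high_enriched
```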

Non-local Attention Learning on Large Heterogeneous Information Networks

Title Non-local Attention Learning on Large Heterogeneous Information Networks
Authors Yuxin Xiao, Zecheng Zhang, Carl Yang, Chengxiang Zhai
Abstract Heterogeneous information network (HIN) summarizes rich structural information in real-world datasets and plays an important role in many big data applications. Recently, graph neural networks have been extended to the representation learning of HIN. One very recent advancement is the hierarchical attention mechanism which incorporates both node-wise and semantic-wise attention. However, since HIN is more likely to be densely connected given its diverse types of edges, repeatedly applying graph convolutional layers can make the node embeddings indistinguishable very quickly. In order to avoid over-smoothing, existing graph neural networks targeting HIN generally suffer from a shallow structure. Consequently, those approaches ignore information beyond the local neighborhood. This design flaw violates the concept of non-local learning, which emphasizes the importance of capturing long-range dependencies. To properly address this limitation, we propose a novel framework of non-local attention in heterogeneous information networks (NLAH). Our framework utilizes a non-local attention structure to complement the hierarchical attention mechanism. In this way, it leverages both local and non-local information simultaneously. Moreover, a weighted sampling schema is designed for NLAH to reduce the computation cost for large-scale datasets. Extensive experiments on three different real-world heterogeneous information networks illustrate that our framework exhibits extraordinary scalability and outperforms state-of-the-art baselines by significant margins.
Tasks Heterogeneous Node Classification, Representation Learning
Published 2019-12-12
URL https://ieeexplore.ieee.org/document/9006463
PDF https://xiaoyuxin1002.github.io/docs/NLAH.pdf
PWC https://paperswithcode.com/paper/non-local-attention-learning-on-large
Repo https://github.com/xiaoyuxin1002/NLAH
Framework pytorch
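
A schematic non-local attention layer in the spirit of the abstract: each node additionally attends over a sampled set of distant nodes, recovering long-range dependencies that a shallow hierarchical-attention stack misses. This is a generic block with assumed names and shapes, not the NLAH module.

```python
import torch
import torch.nn as nn

class NonLocalAttention(nn.Module):
    """Generic non-local attention over a sampled set of distant nodes."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, h, sampled):
        # h: (N, dim) node embeddings; sampled: (S, dim) non-local nodes
        attn = torch.softmax(
            self.q(h) @ self.k(sampled).T / sampled.size(-1) ** 0.5, dim=-1
        )
        return h + attn @ self.v(sampled)  # residual mix of non-local signal
```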

Graph-Based Meaning Representations: Design and Processing

Title Graph-Based Meaning Representations: Design and Processing
Authors Alexander Koller, Stephan Oepen, Weiwei Sun
Abstract This tutorial is on representing and processing sentence meaning in the form of labeled directed graphs. The tutorial will (a) briefly review relevant background in formal and linguistic semantics; (b) semi-formally define a unified abstract view on different flavors of semantic graphs and associated terminology; (c) survey common frameworks for graph-based meaning representation and available graph banks; and (d) offer a technical overview of a representative selection of different parsing approaches.
Tasks
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-4002/
PDF https://www.aclweb.org/anthology/P19-4002
PWC https://paperswithcode.com/paper/graph-based-meaning-representations-design
Repo https://github.com/cfmrp/tutorial
Framework none

Input Similarity from the Neural Network Perspective

Title Input Similarity from the Neural Network Perspective
Authors Guillaume Charpiat, Nicolas Girard, Loris Felardos, Yuliya Tarabalka
Abstract Given a trained neural network, we aim at understanding how similar it considers any two samples. For this, we express a proper definition of similarity from the neural network perspective (i.e. we quantify how undissociable two inputs A and B are), by taking a machine learning viewpoint: how much a parameter variation designed to change the output for A would impact the output for B as well? We study the mathematical properties of this similarity measure, and show how to estimate sample density with it, in low complexity, enabling new types of statistical analysis for neural networks. We also propose to use it during training, to enforce that examples known to be similar should also be seen as similar by the network. We then study the self-denoising phenomenon encountered in regression tasks when training neural networks on datasets with noisy labels. We exhibit a multimodal image registration task where almost perfect accuracy is reached, far beyond label noise variance. Such an impressive self-denoising phenomenon can be explained as a noise averaging effect over the labels of similar examples. We analyze data by retrieving samples perceived as similar by the network, and are able to quantify the denoising effect without requiring true labels.
Tasks Denoising, Image Registration
Published 2019-12-01
URL http://papers.nips.cc/paper/8775-input-similarity-from-the-neural-network-perspective
PDF http://papers.nips.cc/paper/8775-input-similarity-from-the-neural-network-perspective.pdf
PWC https://paperswithcode.com/paper/input-similarity-from-the-neural-network
Repo https://github.com/Lydorn/netsimilarity
Framework pytorch
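
The similarity definition lends itself to a direct, if brute-force, sketch: measure how much a parameter step that moves the output on A would also move the output on B, via the alignment of per-sample parameter gradients. The paper derives a proper, lower-complexity kernel; this version only conveys the idea.

```python
import torch

def input_similarity(model, loss_fn, xa, ya, xb, yb):
    """Cosine alignment of per-sample parameter gradients as a rough proxy
    for how 'undissociable' inputs A and B are to the network."""
    def flat_grad(x, y):
        model.zero_grad()
        loss_fn(model(x), y).backward()
        return torch.cat([p.grad.reshape(-1)
                          for p in model.parameters() if p.grad is not None])
    ga, gb = flat_grad(xa, ya), flat_grad(xb, yb)
    return torch.dot(ga, gb) / (ga.norm() * gb.norm())
```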