Paper Group NAWR 12
Cross-Sentence Grammatical Error Correction. Specific and Shared Causal Relation Modeling and Mechanism-Based Clustering. Consistency-based anomaly detection with adaptive multiple-hypotheses predictions. Multi-Relational Script Learning for Discourse Relations. Error Correcting Output Codes Improve Probability Estimation and Adversarial Robustness of Deep Neural Networks. Accelerating Rescaled Gradient Descent: Fast Optimization of Smooth Functions. Proactive Human-Machine Conversation with Explicit Conversation Goal. Complex Word Identification as a Sequence Labelling Task. A Simple Baseline for Audio-Visual Scene-Aware Dialog. DM2C: Deep Mixed-Modal Clustering. Occlusion-Net: 2D/3D Occluded Keypoint Localization Using Graph Networks. Learning Rich Features at High-Speed for Single-Shot Object Detection. Non-local Attention Learning on Large Heterogeneous Information Networks. Graph-Based Meaning Representations: Design and Processing. Input Similarity from the Neural Network Perspective.
Cross-Sentence Grammatical Error Correction
Title | Cross-Sentence Grammatical Error Correction |
Authors | Shamil Chollampatt, Weiqi Wang, Hwee Tou Ng |
Abstract | Automatic grammatical error correction (GEC) research has made remarkable progress in the past decade. However, all existing approaches to GEC correct errors by considering a single sentence alone and ignoring crucial cross-sentence context. Some errors can only be corrected reliably using cross-sentence context and models can also benefit from the additional contextual information in correcting other errors. In this paper, we address this serious limitation of existing approaches and improve strong neural encoder-decoder models by appropriately modeling wider contexts. We employ an auxiliary encoder that encodes previous sentences and incorporate the encoding in the decoder via attention and gating mechanisms. Our approach results in statistically significant improvements in overall GEC performance over strong baselines across multiple test sets. Analysis of our cross-sentence GEC model on a synthetic dataset shows high performance in verb tense corrections that require cross-sentence context. |
Tasks | Grammatical Error Correction |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-1042/ |
PWC | https://paperswithcode.com/paper/cross-sentence-grammatical-error-correction |
Repo | https://github.com/nusnlp/crosentgec |
Framework | pytorch |
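The core architectural move, an auxiliary encoder over preceding sentences whose output enters the decoder through attention and a gate, can be sketched as follows. This is a minimal PyTorch illustration of the fusion step only, not the authors' implementation (see the repo above); the module and argument names are ours.

```python
import torch
import torch.nn as nn

class GatedContextAttention(nn.Module):
    """Sketch: fuse decoder states with an auxiliary encoding of
    previous sentences via attention plus a learned gate."""
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, dec_states, ctx_states):
        # dec_states: (B, T, D) decoder states for the current sentence
        # ctx_states: (B, S, D) auxiliary encoding of preceding sentences
        ctx, _ = self.attn(dec_states, ctx_states, ctx_states)
        g = torch.sigmoid(self.gate(torch.cat([dec_states, ctx], dim=-1)))
        return dec_states + g * ctx  # gate decides how much context to admit
```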
Specific and Shared Causal Relation Modeling and Mechanism-Based Clustering
Title | Specific and Shared Causal Relation Modeling and Mechanism-Based Clustering |
Authors | Biwei Huang, Kun Zhang, Pengtao Xie, Mingming Gong, Eric P. Xing, Clark Glymour |
Abstract | State-of-the-art approaches to causal discovery usually assume a fixed underlying causal model. However, it is often the case that causal models vary across domains or subjects, due to possibly omitted factors that affect the quantitative causal effects. As a typical example, causal connectivity in the brain network has been reported to vary across individuals, with significant differences across groups of people, such as autistics and typical controls. In this paper, we develop a unified framework for causal discovery and mechanism-based group identification. In particular, we propose a specific and shared causal model (SSCM), which takes into account the variabilities of causal relations across individuals/groups and leverages their commonalities to achieve statistically reliable estimation. The learned SSCM gives the specific causal knowledge for each individual as well as the general trend over the population. In addition, the estimated model directly provides the group information of each individual. Experimental results on synthetic and real-world data demonstrate the efficacy of the proposed method. |
Tasks | Causal Discovery |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/9506-specific-and-shared-causal-relation-modeling-and-mechanism-based-clustering |
PDF | http://papers.nips.cc/paper/9506-specific-and-shared-causal-relation-modeling-and-mechanism-based-clustering.pdf
PWC | https://paperswithcode.com/paper/specific-and-shared-causal-relation-modeling |
Repo | https://github.com/Biwei-Huang/Specific-and-Shared-Causal-Relation-Modeling-and-Mechanism-Based-Clustering |
Framework | none |
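The mechanism-based clustering idea can be shown in toy form: estimate a causal coefficient per subject, then group subjects whose coefficients (mechanisms) agree. This is a loose analogue for intuition only, not the SSCM estimator from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy setting: for each subject, y is caused by x with a subject-specific
# coefficient drawn around one of two group-level mechanisms.
true_groups = rng.integers(0, 2, size=40)
coefs = np.where(true_groups == 0, 1.0, -1.0) + 0.1 * rng.standard_normal(40)

est = []
for b in coefs:
    x = rng.standard_normal(200)
    y = b * x + 0.1 * rng.standard_normal(200)
    est.append(np.dot(x, y) / np.dot(x, x))  # per-subject OLS slope

# Cluster subjects by their estimated causal coefficient ("mechanism").
labels = KMeans(n_clusters=2, n_init=10).fit_predict(np.array(est).reshape(-1, 1))
print(labels)
```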
Consistency-based anomaly detection with adaptive multiple-hypotheses predictions
Title | Consistency-based anomaly detection with adaptive multiple-hypotheses predictions |
Authors | Duc Tam Nguyen, Zhongyu Lou, Michael Klar, Thomas Brox |
Abstract | In one-class-learning tasks, only the normal case can be modeled with data, whereas the variation of all possible anomalies is too large to be described sufficiently by samples. Thus, due to the lack of representative data, the widespread discriminative approaches cannot cover such learning tasks, and rather generative models, which attempt to learn the input density of the normal cases, are used. However, generative models suffer from a large input dimensionality (as in images) and are typically inefficient learners. We propose to learn the data distribution more efficiently with a multi-hypotheses autoencoder. Moreover, the model is criticized by a discriminator, which prevents artificial data modes not supported by data, and which enforces diversity across hypotheses. This consistency-based anomaly detection (ConAD) framework allows the reliable identification of out-of-distribution samples. For anomaly detection on CIFAR-10, it yields up to 3.9 percentage points of improvement over previously reported results. On a real anomaly detection task, the approach reduces the error of the baseline models from 6.8% to 1.5%. |
Tasks | Anomaly Detection |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=r1ledo0ctX |
PDF | https://openreview.net/pdf?id=r1ledo0ctX
PWC | https://paperswithcode.com/paper/consistency-based-anomaly-detection-with |
Repo | https://github.com/YeongHyeon/ConAD |
Framework | tf |
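A minimal sketch of the multi-hypotheses idea: the decoder emits K reconstructions, training keeps only the best head per sample (winner-takes-all), and the anomaly score at test time is the best reconstruction error. The discriminator that criticizes the hypotheses is omitted, and all names are ours.

```python
import torch
import torch.nn as nn

class MultiHypothesisAE(nn.Module):
    """Sketch: an autoencoder whose decoder emits K hypotheses; each head
    specializes under winner-takes-all training, and the anomaly score is
    the best (lowest) reconstruction error across heads."""
    def __init__(self, dim=784, hidden=64, k=4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(hidden, dim) for _ in range(k))

    def forward(self, x):                                      # x: (B, dim)
        z = self.encoder(x)
        recons = torch.stack([h(z) for h in self.heads], dim=1)  # (B, K, dim)
        errs = ((recons - x.unsqueeze(1)) ** 2).mean(-1)          # (B, K)
        return errs.min(dim=1).values  # train loss and test-time anomaly score
```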
Multi-Relational Script Learning for Discourse Relations
Title | Multi-Relational Script Learning for Discourse Relations |
Authors | I-Ta Lee, Dan Goldwasser |
Abstract | Modeling script knowledge can be useful for a wide range of NLP tasks. Current statistical script learning approaches embed the events, such that their relationships are indicated by their similarity in the embedding. While intuitive, these approaches fall short of representing nuanced relations needed for downstream tasks. In this paper, we suggest viewing event embedding learning as a multi-relational problem, which allows us to capture different aspects of event pairs. We model a rich set of event relations, such as Cause and Contrast, derived from the Penn Discourse TreeBank. We evaluate our model on three types of tasks: the popular Multi-Choice Narrative Cloze and its variants, several multi-relational prediction tasks, and a related downstream task, implicit discourse sense classification. |
Tasks | |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-1413/ |
PWC | https://paperswithcode.com/paper/multi-relational-script-learning-for |
Repo | https://github.com/doug919/multi_relational_script_learning |
Framework | pytorch |
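Viewing event pairs as a multi-relational problem resembles knowledge-graph embedding: each discourse relation gets its own parameters for scoring event pairs. The TransE-style scorer below conveys only that framing; the paper's actual model and training objective differ (see the repo).

```python
import torch
import torch.nn as nn

class RelationalEventScorer(nn.Module):
    """Sketch: score an (event1, relation, event2) triple TransE-style, so
    each discourse relation (Cause, Contrast, ...) gets its own translation
    vector in event-embedding space."""
    def __init__(self, n_events, n_relations, dim=100):
        super().__init__()
        self.events = nn.Embedding(n_events, dim)
        self.relations = nn.Embedding(n_relations, dim)

    def forward(self, e1, rel, e2):
        # Lower distance = more plausible under that relation.
        return -(self.events(e1) + self.relations(rel)
                 - self.events(e2)).norm(dim=-1)
```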
Error Correcting Output Codes Improve Probability Estimation and Adversarial Robustness of Deep Neural Networks
Title | Error Correcting Output Codes Improve Probability Estimation and Adversarial Robustness of Deep Neural Networks |
Authors | Gunjan Verma, Ananthram Swami |
Abstract | Modern machine learning systems are susceptible to adversarial examples: inputs which clearly preserve the characteristic semantics of a given class, but whose classification is (usually confidently) incorrect. Existing approaches to adversarial defense generally rely on modifying the input, e.g., quantization, or the learned model parameters, e.g., via adversarial training. However, recent research has shown that most such approaches succumb to adversarial examples when different norms or more sophisticated adaptive attacks are considered. In this paper, we propose a fundamentally different approach which instead changes the way the output is represented and decoded. This simple approach achieves state-of-the-art robustness to adversarial examples for L2- and L∞-based adversarial perturbations on MNIST and CIFAR10. In addition, even under strong white-box attacks, we find that our model often assigns adversarial examples a low probability; those with high probability are usually interpretable, i.e., perturbed towards the perceptual boundary between the original and adversarial class. Our approach has several advantages: it yields more meaningful probability estimates, is extremely fast during training and testing, requires essentially no architectural changes to existing discriminative learning pipelines, is wholly complementary to other defense approaches including adversarial training, and does not sacrifice benign test set performance. |
Tasks | Adversarial Defense, Quantization |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/9070-error-correcting-output-codes-improve-probability-estimation-and-adversarial-robustness-of-deep-neural-networks |
PDF | http://papers.nips.cc/paper/9070-error-correcting-output-codes-improve-probability-estimation-and-adversarial-robustness-of-deep-neural-networks.pdf
PWC | https://paperswithcode.com/paper/error-correcting-output-codes-improve |
Repo | https://github.com/Gunjan108/robust-ecoc |
Framework | tf |
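The output-representation change is classical ECOC: classes map to binary codewords, the network predicts each bit independently with a sigmoid, and decoding scores each class by how well the predicted bits match its codeword. The toy decoding below uses made-up codewords and bit probabilities purely for illustration, not the paper's codes or decoder.

```python
import numpy as np

# Toy ECOC decoding: each of 4 classes gets a 7-bit codeword; the network
# would output 7 independent bit probabilities, and the class probability
# comes from how well those bits match each codeword.
codewords = np.array([[1, 1, 1, 1, 1, 1, 1],
                      [1, 0, 0, 1, 1, 0, 0],
                      [0, 1, 0, 1, 0, 1, 0],
                      [0, 0, 1, 0, 1, 1, 0]], dtype=float)

bit_probs = np.array([0.9, 0.8, 0.2, 0.9, 0.7, 0.1, 0.3])  # stand-in for sigmoid outputs

# P(class) is proportional to the product over bits of P(bit matches codeword).
match = codewords * bit_probs + (1 - codewords) * (1 - bit_probs)
scores = match.prod(axis=1)
print(scores / scores.sum())  # decoded class distribution
```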
Accelerating Rescaled Gradient Descent: Fast Optimization of Smooth Functions
Title | Accelerating Rescaled Gradient Descent: Fast Optimization of Smooth Functions |
Authors | Ashia C. Wilson, Lester Mackey, Andre Wibisono |
Abstract | We present a family of algorithms, called descent algorithms, for optimizing convex and non-convex functions. We also introduce a new first-order algorithm, called rescaled gradient descent (RGD), and show that RGD achieves a faster convergence rate than gradient descent provided the function is strongly smooth, a natural generalization of the standard smoothness assumption on the objective function. When the objective function is convex, we present two frameworks for “accelerating” descent methods, one in the style of Nesterov and the other in the style of Monteiro and Svaiter. Rescaled gradient descent can be accelerated under the same strong smoothness assumption using both frameworks. We provide several examples of strongly smooth loss functions in machine learning and numerical experiments that verify our theoretical findings. |
Tasks | |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/9508-accelerating-rescaled-gradient-descent-fast-optimization-of-smooth-functions |
PDF | http://papers.nips.cc/paper/9508-accelerating-rescaled-gradient-descent-fast-optimization-of-smooth-functions.pdf
PWC | https://paperswithcode.com/paper/accelerating-rescaled-gradient-descent-fast |
Repo | https://github.com/aswilson07/ARGD |
Framework | none |
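As we read it, the RGD update divides the gradient by a power of its own norm, with plain gradient descent recovered at p = 2. The sketch below reflects that reading and should be treated as illustrative rather than the paper's exact scheme.

```python
import numpy as np

def rescaled_gd(grad, x0, eta=0.1, p=4, steps=100):
    """Sketch of a rescaled gradient step: the gradient is divided by a
    power of its own norm (p = 2 recovers plain gradient descent).
    The exponent follows our reading of the paper; treat as illustrative."""
    x = x0.astype(float)
    for _ in range(steps):
        g = grad(x)
        norm = np.linalg.norm(g)
        if norm == 0:
            break
        x = x - eta * g / norm ** ((p - 2) / (p - 1))
    return x

# Minimize f(x) = sum(x**4), whose gradient is 4 * x**3.
print(rescaled_gd(lambda x: 4 * x**3, np.array([1.0, -2.0])))
```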
Proactive Human-Machine Conversation with Explicit Conversation Goal
Title | Proactive Human-Machine Conversation with Explicit Conversation Goal |
Authors | Wenquan Wu, Zhen Guo, Xiangyang Zhou, Hua Wu, Xiyuan Zhang, Rongzhong Lian, Haifeng Wang |
Abstract | Though great progress has been made in human-machine conversation, current dialogue systems are still in their infancy: they usually converse passively and utter words more as a matter of response than on their own initiative. In this paper, we take a radical step towards building a human-like conversational agent: endowing it with the ability to proactively lead the conversation (introducing a new topic or maintaining the current topic). To facilitate the development of such conversation systems, we create a new dataset named Konv, where one participant acts as the conversation leader and the other acts as the follower. The leader is provided with a knowledge graph and asked to sequentially change the discussion topics, following the given conversation goal, and meanwhile keep the dialogue as natural and engaging as possible. Konv enables a very challenging task, as the model needs to both understand dialogue and plan over the given knowledge graph. We establish baseline results on this dataset (about 270K utterances and 30K dialogues) using several state-of-the-art models. Experimental results show that dialogue models that plan over the knowledge graph can make full use of related knowledge to generate more diverse multi-turn conversations. The baseline systems along with the dataset are publicly available. |
Tasks | |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-1369/ |
PWC | https://paperswithcode.com/paper/proactive-human-machine-conversation-with-1 |
Repo | https://github.com/PaddlePaddle/models |
Framework | none |
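To make the task setup concrete, one training instance plausibly bundles a conversation goal (a topic path), knowledge-graph triples, and the dialogue turns. The schema below is our guess for illustration only, not the released file format.

```python
# Hypothetical shape of one Konv-style training instance; field names and
# values are illustrative, not the dataset's actual schema.
example = {
    "goal": ["START", "topic_a", "topic_b"],   # topic path the leader must follow
    "knowledge": [                              # triples from the knowledge graph
        ("topic_a", "genre", "comedy"),
        ("topic_a", "related_to", "topic_b"),
    ],
    "conversation": [
        "Have you seen topic_a? It's a great comedy.",  # leader introduces topic_a
        "Not yet, is it worth watching?",               # follower responds
        "Definitely, and if you like it, topic_b is by the same director.",
    ],
}
```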
Complex Word Identification as a Sequence Labelling Task
Title | Complex Word Identification as a Sequence Labelling Task |
Authors | Sian Gooding, Ekaterina Kochmar |
Abstract | Complex Word Identification (CWI) is concerned with detection of words in need of simplification and is a crucial first step in a simplification pipeline. It has been shown that reliable CWI systems considerably improve text simplification. However, most CWI systems to date address the task on a word-by-word basis, not taking the context into account. In this paper, we present a novel approach to CWI based on sequence modelling. Our system is capable of performing CWI in context, does not require extensive feature engineering and outperforms state-of-the-art systems on this task. |
Tasks | Complex Word Identification, Feature Engineering, Text Simplification |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-1109/ |
PWC | https://paperswithcode.com/paper/complex-word-identification-as-a-sequence |
Repo | https://github.com/siangooding/cwi/tree/master/CWI%20Sequence%20Labeller |
Framework | none |
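The shift from word-by-word classification to sequence labelling means each complexity decision is conditioned on the whole sentence. A generic BiLSTM tagger makes the framing concrete; it is not the authors' exact architecture (see the repo above).

```python
import torch
import torch.nn as nn

class CWITagger(nn.Module):
    """Sketch: treat complex word identification as sequence labelling; a
    BiLSTM reads the whole sentence so each word's complexity decision
    sees its context."""
    def __init__(self, vocab_size, emb=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb)
        self.lstm = nn.LSTM(emb, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, 2)  # per-token: simple vs. complex

    def forward(self, token_ids):            # (B, T)
        h, _ = self.lstm(self.emb(token_ids))
        return self.out(h)                   # (B, T, 2) per-token logits
```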
A Simple Baseline for Audio-Visual Scene-Aware Dialog
Title | A Simple Baseline for Audio-Visual Scene-Aware Dialog |
Authors | Idan Schwartz, Alexander G. Schwing, Tamir Hazan |
Abstract | The recently proposed audio-visual scene-aware dialog task paves the way to a more data-driven way of learning virtual assistants, smart speakers and car navigation systems. However, very little is known to date about how to effectively extract meaningful information from a plethora of sensors that pound the computational engine of those devices. Therefore, in this paper, we provide and carefully analyze a simple baseline for audio-visual scene-aware dialog which is trained end-to-end. Our method differentiates in a data-driven manner useful signals from distracting ones using an attention mechanism. We evaluate the proposed approach on the recently introduced and challenging audio-visual scene-aware dataset, and demonstrate the key features that permit it to outperform the current state of the art by more than 20% on CIDEr. |
Tasks | |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Schwartz_A_Simple_Baseline_for_Audio-Visual_Scene-Aware_Dialog_CVPR_2019_paper.html |
PDF | http://openaccess.thecvf.com/content_CVPR_2019/papers/Schwartz_A_Simple_Baseline_for_Audio-Visual_Scene-Aware_Dialog_CVPR_2019_paper.pdf
PWC | https://paperswithcode.com/paper/a-simple-baseline-for-audio-visual-scene-1 |
Repo | https://github.com/idansc/simple-avsd |
Framework | pytorch |
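The "differentiates useful signals from distracting ones" part is an attention mechanism over per-modality features. A minimal sketch, with module names of our choosing:

```python
import torch
import torch.nn as nn

class ModalityAttention(nn.Module):
    """Sketch: weight each modality's feature vector (audio, video, dialog
    history, caption, question) by a learned attention score, so the model
    can focus on useful signals and down-weight distracting ones."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, feats):                 # (B, M, D): M modality features
        w = torch.softmax(self.score(feats), dim=1)
        return (w * feats).sum(dim=1)         # (B, D) fused representation
```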
DM2C: Deep Mixed-Modal Clustering
Title | DM2C: Deep Mixed-Modal Clustering |
Authors | Yangbangyan Jiang, Qianqian Xu, Zhiyong Yang, Xiaochun Cao, Qingming Huang |
Abstract | Data exhibited with multiple modalities are ubiquitous in real-world clustering tasks. Most existing methods, however, make the strong assumption that pairing information for modalities is available for all instances. In this paper, we consider a more challenging task where each instance is represented in only one modality, which we call mixed-modal data. Without any extra pairing supervision across modalities, it is difficult to find a universal semantic space for all of them. To tackle this problem, we present an adversarial learning framework for clustering with mixed-modal data. Instead of transforming all the samples into a joint modality-independent space, our framework learns the mappings across individual modal spaces by virtue of cycle-consistency. Through these mappings, we could easily unify all the samples into a single modal space and perform the clustering. Evaluations on several real-world mixed-modal datasets demonstrate the superiority of our proposed framework. |
Tasks | |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/8823-dm2c-deep-mixed-modal-clustering |
PDF | http://papers.nips.cc/paper/8823-dm2c-deep-mixed-modal-clustering.pdf
PWC | https://paperswithcode.com/paper/dm2c-deep-mixed-modal-clustering |
Repo | https://github.com/jiangyangby/DM2C |
Framework | pytorch |
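Only the cycle-consistency term is sketched here: latent codes from one modality are mapped into the other modality's space and back, and the round trip is penalized for drifting. The adversarial part of the framework and the final clustering step are omitted.

```python
import torch
import torch.nn as nn

# Sketch of the cycle-consistency idea with toy linear mappings; the paper's
# mappings and dimensions differ, and only the cycle term is shown.
dim_x, dim_y = 64, 32
x2y = nn.Linear(dim_x, dim_y)
y2x = nn.Linear(dim_y, dim_x)

x = torch.randn(8, dim_x)                  # latent codes of modality-X samples
cycle_loss = ((y2x(x2y(x)) - x) ** 2).mean()
cycle_loss.backward()                      # trains both mappings jointly
```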
Occlusion-Net: 2D/3D Occluded Keypoint Localization Using Graph Networks
Title | Occlusion-Net: 2D/3D Occluded Keypoint Localization Using Graph Networks |
Authors | N. Dinesh Reddy, Minh Vo, Srinivasa G. Narasimhan |
Abstract | We present Occlusion-Net, a framework to predict 2D and 3D locations of occluded keypoints for objects, in a largely self-supervised manner. We use an off-the-shelf detector (such as Mask R-CNN) that is trained only on visible keypoint annotations as input. This is the only supervision used in this work. A graph encoder network then explicitly classifies invisible edges and a graph decoder network corrects the occluded keypoint locations from the initial detector. Central to this work is a trifocal tensor loss that provides indirect self-supervision for occluded keypoint locations that are visible in other views of the object. The 2D keypoints are then passed into a 3D graph network that estimates the 3D shape and camera pose using a self-supervised re-projection loss. At test time, our approach successfully localizes keypoints in a single view under a diverse set of severe occlusion settings. We demonstrate and evaluate our approach on synthetic CAD data as well as a large image set capturing vehicles at many busy city intersections. As an interesting aside, we compare the accuracy of human labels of invisible keypoints against those obtained from the geometric trifocal tensor loss. |
Tasks | 3D Car Instance Understanding, 3D Object Reconstruction From A Single Image, Pose Estimation |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Reddy_Occlusion-Net_2D3D_Occluded_Keypoint_Localization_Using_Graph_Networks_CVPR_2019_paper.html |
PDF | http://openaccess.thecvf.com/content_CVPR_2019/papers/Reddy_Occlusion-Net_2D3D_Occluded_Keypoint_Localization_Using_Graph_Networks_CVPR_2019_paper.pdf
PWC | https://paperswithcode.com/paper/occlusion-net-2d3d-occluded-keypoint |
Repo | https://github.com/dineshreddy91/Occlusion_Net |
Framework | pytorch |
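A heavily simplified rendering of the encode-classify-decode loop: encode detected keypoints into node features, classify every keypoint-pair edge as visible or occluded, and decode corrected coordinates. The trifocal tensor loss and the 3D graph network are left out, and all names are ours.

```python
import torch
import torch.nn as nn

class KeypointGraphRefiner(nn.Module):
    """Sketch: message-free stand-in for the graph encoder/decoder pair;
    edges between keypoints are classified, then coordinates re-predicted."""
    def __init__(self, hidden=64):
        super().__init__()
        self.node_enc = nn.Linear(2, hidden)         # (x, y) per keypoint
        self.edge_cls = nn.Linear(2 * hidden, 2)     # visible vs. occluded edge
        self.decoder = nn.Linear(hidden, 2)          # corrected (x, y)

    def forward(self, kps):                          # (B, N, 2) detected keypoints
        h = torch.relu(self.node_enc(kps))
        n = h.size(1)
        pair = torch.cat([h.unsqueeze(2).expand(-1, -1, n, -1),
                          h.unsqueeze(1).expand(-1, n, -1, -1)], dim=-1)
        edge_logits = self.edge_cls(pair)            # (B, N, N, 2)
        return edge_logits, self.decoder(h)          # edges + refined keypoints
```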
Learning Rich Features at High-Speed for Single-Shot Object Detection
Title | Learning Rich Features at High-Speed for Single-Shot Object Detection |
Authors | Tiancai Wang, Rao Muhammad Anwer, Hisham Cholakkal, Fahad Shahbaz Khan, Yanwei Pang, Ling Shao |
Abstract | Single-stage object detection methods have received significant attention recently due to their characteristic real-time capabilities and high detection accuracies. Generally, most existing single-stage detectors follow two common practices: they employ a network backbone that is pretrained on ImageNet for the classification task and use a top-down feature pyramid representation for handling scale variations. Contrary to the common pre-training strategy, recent works have demonstrated the benefits of training from scratch to reduce the task gap between classification and localization, especially at high overlap thresholds. However, detection models trained from scratch require significantly longer training time compared to their typical fine-tuning-based counterparts. We introduce a single-stage detection framework that combines the advantages of both fine-tuning pretrained models and training from scratch. Our framework constitutes a standard network that uses a pre-trained backbone and a parallel light-weight auxiliary network trained from scratch. Further, we argue that the commonly used top-down pyramid representation only focuses on passing high-level semantics from the top layers to bottom layers. We introduce a bi-directional network that efficiently circulates both low-/mid-level and high-level semantic information in the detection framework. Experiments are performed on MS COCO and UAVDT datasets. Compared to the baseline, our detector achieves an absolute gain of 7.4% and 4.2% in average precision (AP) on MS COCO and UAVDT datasets, respectively, using a VGG backbone. For a 300x300 input on the MS COCO test set, our detector with ResNet backbone surpasses existing single-stage detection methods for single-scale inference, achieving 34.3 AP while operating at an inference time of 19 milliseconds on a single Titan X GPU. Code is available at https://github.com/vaesl/LRF-Net. |
Tasks | Object Detection |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Wang_Learning_Rich_Features_at_High-Speed_for_Single-Shot_Object_Detection_ICCV_2019_paper.html |
PDF | http://openaccess.thecvf.com/content_ICCV_2019/papers/Wang_Learning_Rich_Features_at_High-Speed_for_Single-Shot_Object_Detection_ICCV_2019_paper.pdf
PWC | https://paperswithcode.com/paper/learning-rich-features-at-high-speed-for |
Repo | https://github.com/vaesl/LRF-Net |
Framework | pytorch |
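The bi-directional pyramid claim means features flow both top-down (semantics to fine maps) and bottom-up (detail to coarse maps). A bare-bones fusion sketch, assuming all levels share a channel count; the real network uses learned transforms at each step.

```python
import torch
import torch.nn.functional as F

def bidirectional_fuse(feats):
    """Sketch of bi-directional pyramid fusion: a top-down pass spreads
    high-level semantics into fine maps, then a bottom-up pass returns
    low-/mid-level detail to coarse maps. `feats` is fine-to-coarse."""
    td = list(feats)
    for i in range(len(td) - 2, -1, -1):             # top-down
        td[i] = td[i] + F.interpolate(td[i + 1], size=td[i].shape[-2:])
    for i in range(1, len(td)):                      # bottom-up
        td[i] = td[i] + F.adaptive_max_pool2d(td[i - 1], td[i].shape[-2:])
    return td

maps = [torch.randn(1, 256, s, s) for s in (64, 32, 16)]
fused = bidirectional_fuse(maps)
```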
Non-local Attention Learning on Large Heterogeneous Information Networks
Title | Non-local Attention Learning on Large Heterogeneous Information Networks |
Authors | Yuxin Xiao, Zecheng Zhang, Carl Yang, Chengxiang Zhai |
Abstract | Heterogeneous information network (HIN) summarizes rich structural information in real-world datasets and plays an important role in many big data applications. Recently, graph neural networks have been extended to the representation learning of HIN. One very recent advancement is the hierarchical attention mechanism which incorporates both node-wise and semantic-wise attention. However, since HIN is more likely to be densely connected given its diverse types of edges, repeatedly applying graph convolutional layers can make the node embeddings indistinguishable very quickly. In order to avoid over-smoothing, existing graph neural networks targeting HIN generally suffer from a shallow structure. Consequently, those approaches ignore information beyond the local neighborhood. This design flaw violates the concept of non-local learning, which emphasizes the importance of capturing long-range dependencies. To properly address this limitation, we propose a novel framework of non-local attention in heterogeneous information networks (NLAH). Our framework utilizes a non-local attention structure to complement the hierarchical attention mechanism. In this way, it leverages both local and non-local information simultaneously. Moreover, a weighted sampling schema is designed for NLAH to reduce the computation cost for large-scale datasets. Extensive experiments on three different real-world heterogeneous information networks illustrate that our framework exhibits extraordinary scalability and outperforms state-of-the-art baselines by significant margins. |
Tasks | Heterogeneous Node Classification, Representation Learning |
Published | 2019-12-12 |
URL | https://ieeexplore.ieee.org/document/9006463 |
PDF | https://xiaoyuxin1002.github.io/docs/NLAH.pdf
PWC | https://paperswithcode.com/paper/non-local-attention-learning-on-large |
Repo | https://github.com/xiaoyuxin1002/NLAH |
Framework | pytorch |
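The non-local complement to hierarchical attention can be sketched as each node attending over a sampled set of distant nodes. The paper's weighted sampling schema is omitted here, and all module names are ours.

```python
import torch
import torch.nn as nn

class NonLocalNodeAttention(nn.Module):
    """Sketch: every node attends over a sampled set of non-neighbor nodes,
    complementing local graph convolutions with long-range signal."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, h, candidates):        # h: (N, D); candidates: (N, S) node ids
        q = self.q(h).unsqueeze(1)           # (N, 1, D)
        k, v = self.k(h)[candidates], self.v(h)[candidates]   # (N, S, D)
        a = torch.softmax((q * k).sum(-1) / h.size(-1) ** 0.5, dim=-1)
        return h + (a.unsqueeze(-1) * v).sum(1)   # local emb + non-local context
```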
Graph-Based Meaning Representations: Design and Processing
Title | Graph-Based Meaning Representations: Design and Processing |
Authors | Alexander Koller, Stephan Oepen, Weiwei Sun |
Abstract | This tutorial is on representing and processing sentence meaning in the form of labeled directed graphs. The tutorial will (a) briefly review relevant background in formal and linguistic semantics; (b) semi-formally define a unified abstract view on different flavors of semantic graphs and associated terminology; (c) survey common frameworks for graph-based meaning representation and available graph banks; and (d) offer a technical overview of a representative selection of different parsing approaches. |
Tasks | |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-4002/ |
PWC | https://paperswithcode.com/paper/graph-based-meaning-representations-design |
Repo | https://github.com/cfmrp/tutorial |
Framework | none |
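A labeled directed graph meaning representation, at its simplest, is nodes for predicates and arguments with labeled edges between them. A toy example in the spirit of the graph banks the tutorial surveys, not tied to any specific framework:

```python
import networkx as nx

# Toy semantic graph for "The dog sleeps": a predicate node linked to its
# argument by a labeled directed edge.
g = nx.DiGraph()
g.add_node("sleep-01")
g.add_node("dog")
g.add_edge("sleep-01", "dog", label="ARG0")
print(list(g.edges(data=True)))
```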
Input Similarity from the Neural Network Perspective
Title | Input Similarity from the Neural Network Perspective |
Authors | Guillaume Charpiat, Nicolas Girard, Loris Felardos, Yuliya Tarabalka |
Abstract | Given a trained neural network, we aim at understanding how similar it considers any two samples. For this, we express a proper definition of similarity from the neural network perspective (i.e. we quantify how undissociable two inputs A and B are), by taking a machine learning viewpoint: how much a parameter variation designed to change the output for A would impact the output for B as well? We study the mathematical properties of this similarity measure, and show how to estimate sample density with it, in low complexity, enabling new types of statistical analysis for neural networks. We also propose to use it during training, to enforce that examples known to be similar should also be seen as similar by the network. We then study the self-denoising phenomenon encountered in regression tasks when training neural networks on datasets with noisy labels. We exhibit a multimodal image registration task where almost perfect accuracy is reached, far beyond label noise variance. Such an impressive self-denoising phenomenon can be explained as a noise averaging effect over the labels of similar examples. We analyze data by retrieving samples perceived as similar by the network, and are able to quantify the denoising effect without requiring true labels. |
Tasks | Denoising, Image Registration |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/8775-input-similarity-from-the-neural-network-perspective |
PDF | http://papers.nips.cc/paper/8775-input-similarity-from-the-neural-network-perspective.pdf
PWC | https://paperswithcode.com/paper/input-similarity-from-the-neural-network |
Repo | https://github.com/Lydorn/netsimilarity |
Framework | pytorch |
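The similarity definition ("how much a parameter variation designed to change the output for A would impact the output for B") reduces, for a scalar output, to comparing the two inputs' parameter gradients. A minimal sketch with a throwaway model, normalizing to a cosine:

```python
import torch

# Sketch: similarity between inputs A and B as the (normalized) inner
# product of their parameter gradients, per the paper's definition.
model = torch.nn.Sequential(torch.nn.Linear(10, 16), torch.nn.Tanh(),
                            torch.nn.Linear(16, 1))

def param_grad(x):
    model.zero_grad()
    model(x).backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()])

gA = param_grad(torch.randn(1, 10))
gB = param_grad(torch.randn(1, 10))
similarity = torch.dot(gA, gB) / (gA.norm() * gB.norm())
print(similarity.item())
```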