Paper Group NANR 173
Provable Non-linear Inductive Matrix Completion. Top-Down Structurally-Constrained Neural Response Generation with Lexicalized Probabilistic Context-Free Grammar. Overcoming catastrophic forgetting through weight consolidation and long-term memory. Steve Martin at SemEval-2019 Task 4: Ensemble Learning Model for Detecting Hyperpartisan News. Using …
Provable Non-linear Inductive Matrix Completion
Title | Provable Non-linear Inductive Matrix Completion |
Authors | Kai Zhong, Zhao Song, Prateek Jain, Inderjit S. Dhillon |
Abstract | Consider a standard recommendation/retrieval problem where given a query, the goal is to retrieve the most relevant items. Inductive matrix completion (IMC) method is a standard approach for this problem where the given query as well as the items are embedded in a common low-dimensional space. The inner product between a query embedding and an item embedding reflects relevance of the (query, item) pair. Non-linear IMC (NIMC) uses non-linear networks to embed the query as well as items, and is known to be highly effective for a variety of tasks, such as video recommendations for users, semantic web search, etc. Despite its wide usage, existing literature lacks rigorous understanding of NIMC models. A key challenge in analyzing such models is to deal with the non-convexity arising out of non-linear embeddings in addition to the non-convexity arising out of the low-dimensional restriction of the embedding space, which is akin to the low-rank restriction in the standard matrix completion problem. In this paper, we provide the first theoretical analysis for a simple NIMC model in the realizable setting, where the relevance score of a (query, item) pair is formulated as the inner product between their single-layer neural representations. Our results show that under mild assumptions we can recover the ground truth parameters of the NIMC model using standard (stochastic) gradient descent methods if the methods are initialized within a small distance to the optimal parameters. We show that a standard tensor method can be used to initialize the solution within the required distance to the optimal parameters. Furthermore, we show that the number of query-item relevance observations required, a key parameter in learning such models, scales nearly linearly with the input dimensionality thus matching existing results for the standard linear inductive matrix completion. |
Tasks | Matrix Completion |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/9320-provable-non-linear-inductive-matrix-completion |
http://papers.nips.cc/paper/9320-provable-non-linear-inductive-matrix-completion.pdf | |
PWC | https://paperswithcode.com/paper/provable-non-linear-inductive-matrix |
Repo | |
Framework | |
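
The relevance score described in the abstract above (an inner product between single-layer neural representations of the query and the item) can be sketched as follows. This is only an illustration under stated assumptions: the names `embed`, `relevance`, `U`, and `V`, the toy weights, and the choice of `tanh` as the activation are hypothetical and not taken from the paper.

```python
import math


def embed(weights, x, activation):
    """Single-layer representation: activation of each weight row dotted with x."""
    return [activation(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def relevance(U, V, query, item, activation=math.tanh):
    """Inner product between the query embedding and the item embedding."""
    q = embed(U, query, activation)
    z = embed(V, item, activation)
    return sum(a * b for a, b in zip(q, z))


# Toy parameters (assumed): k = 2 embedding dimensions,
# 3 query features, 2 item features.
U = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
V = [[1.0, 0.0], [-0.4, 0.6]]

score = relevance(U, V, query=[1.0, 2.0, 0.5], item=[0.2, -1.0])
linear_score = relevance(U, V, [1.0, 2.0, 0.5], [0.2, -1.0], activation=lambda t: t)
```

With the identity activation the same function reduces to the linear inductive matrix completion score, which is the baseline the abstract compares sample complexity against.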
Top-Down Structurally-Constrained Neural Response Generation with Lexicalized Probabilistic Context-Free Grammar
Title | Top-Down Structurally-Constrained Neural Response Generation with Lexicalized Probabilistic Context-Free Grammar |
Authors | Wenchao Du, Alan W Black |
Abstract | We consider neural language generation under a novel problem setting: generating the words of a sentence according to the order of their first appearance in its lexicalized PCFG parse tree, in a depth-first, left-to-right manner. Unlike previous tree-based language generation methods, our approach (i) is top-down and (ii) explicitly generates syntactic structure at the same time. In addition, our method combines a neural model with a symbolic approach: word choice at each step is constrained by its predicted syntactic function. We applied our model to the task of dialog response generation, and found that it significantly improves over a sequence-to-sequence baseline in terms of diversity and relevance. We also investigated the effect of lexicalization on language generation, and found that lexicalization schemes that give priority to content words have certain advantages over those focusing on dependency relations. |
Tasks | Text Generation |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/N19-1377/ |
https://www.aclweb.org/anthology/N19-1377 | |
PWC | https://paperswithcode.com/paper/top-down-structurally-constrained-neural |
Repo | |
Framework | |
Overcoming catastrophic forgetting through weight consolidation and long-term memory
Title | Overcoming catastrophic forgetting through weight consolidation and long-term memory |
Authors | Shixian Wen, Laurent Itti |
Abstract | Sequential learning of multiple tasks in artificial neural networks using gradient descent leads to catastrophic forgetting, whereby previously learned knowledge is erased during learning of new, disjoint knowledge. Here, we propose a new approach to sequential learning which leverages the recent discovery of adversarial examples. We use adversarial subspaces from previous tasks to enable learning of new tasks with less interference. We apply our method to sequentially learning to classify digits 0, 1, 2 (task 1), 4, 5, 6 (task 2), and 7, 8, 9 (task 3) in MNIST (disjoint MNIST task). We compare and combine our Adversarial Direction (AD) method with the recently proposed Elastic Weight Consolidation (EWC) method for sequential learning. We train each task for 20 epochs, which yields good initial performance (99.24% correct task 1 performance). After training task 2, and then task 3, both plain gradient descent (PGD) and EWC largely forget task 1 (task 1 accuracy 32.95% for PGD and 41.02% for EWC), while our combined approach (AD+EWC) still achieves 94.53% correct on task 1. We obtain similar results with a much more difficult disjoint CIFAR10 task (70.10% initial task 1 performance, 67.73% after learning tasks 2 and 3 for AD+EWC, while PGD and EWC both fall to chance level). We confirm qualitatively similar results for EMNIST with 5 tasks and under 3 variants of our approach. Our results suggest that AD+EWC can provide better sequential learning performance than either PGD or EWC. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=BJlSHsAcK7 |
https://openreview.net/pdf?id=BJlSHsAcK7 | |
PWC | https://paperswithcode.com/paper/overcoming-catastrophic-forgetting-through |
Repo | |
Framework | |
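
The Elastic Weight Consolidation baseline named in the abstract above adds a quadratic penalty that keeps parameters close to their values after the previous task, weighted by a diagonal Fisher information estimate. A minimal sketch of that penalty is given below. The function names, the `strength` parameter, and the toy numbers are assumptions for illustration; the Adversarial Direction method itself is not shown, since the abstract does not specify it.

```python
def ewc_penalty(params, anchor_params, fisher_diag, strength):
    """Quadratic EWC penalty: (strength / 2) * sum_i F_i * (theta_i - theta*_i)^2."""
    return 0.5 * strength * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher_diag)
    )


def ewc_loss(task_loss, params, anchor_params, fisher_diag, strength=1.0):
    """Loss on the new task plus the penalty anchoring weights to the old task."""
    return task_loss + ewc_penalty(params, anchor_params, fisher_diag, strength)


# Toy values (assumed): two parameters, anchored at the task-1 solution.
penalty = ewc_penalty([1.0, 2.0], [0.0, 2.5], [4.0, 2.0], strength=1.0)
total = ewc_loss(0.3, [1.0, 2.0], [0.0, 2.5], [4.0, 2.0], strength=1.0)
```

Parameters with a large Fisher value are pulled back more strongly, so weights that mattered for the earlier task move less while the new task is learned.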
Steve Martin at SemEval-2019 Task 4: Ensemble Learning Model for Detecting Hyperpartisan News
Title | Steve Martin at SemEval-2019 Task 4: Ensemble Learning Model for Detecting Hyperpartisan News |
Authors | Youngjun Joo, Inchon Hwang |
Abstract | This paper describes our submission to task 4 in SemEval 2019, i.e., hyperpartisan news detection. Our model aims at detecting hyperpartisan news by incorporating the style-based features and the content-based features. We extract a broad number of feature sets and use as our learning algorithms the GBDT and the n-gram CNN model. Finally, we apply the weighted average for effective learning between the two models. Our model achieves an accuracy of 0.745 on the test set in subtask A. |
Tasks | |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/S19-2171/ |
https://www.aclweb.org/anthology/S19-2171 | |
PWC | https://paperswithcode.com/paper/steve-martin-at-semeval-2019-task-4-ensemble |
Repo | |
Framework | |
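
The abstract above combines a GBDT and an n-gram CNN through a weighted average. A minimal sketch of that combination step follows, assuming each model outputs a probability that an article is hyperpartisan. The weight, the threshold, and the function names are hypothetical; the abstract does not report the values actually used.

```python
def weighted_average(p_gbdt, p_cnn, w_gbdt=0.5):
    """Blend two model probabilities; w_gbdt is the weight given to the GBDT."""
    if not 0.0 <= w_gbdt <= 1.0:
        raise ValueError("w_gbdt must lie in [0, 1]")
    return w_gbdt * p_gbdt + (1.0 - w_gbdt) * p_cnn


def predict(p_gbdt, p_cnn, w_gbdt=0.5, threshold=0.5):
    """Label an article hyperpartisan when the blended probability reaches the threshold."""
    return weighted_average(p_gbdt, p_cnn, w_gbdt) >= threshold


# Toy probabilities (assumed) for a single article.
blended = weighted_average(0.8, 0.4, w_gbdt=0.25)
label = predict(0.9, 0.2, w_gbdt=0.7)
```

In practice the weight would typically be tuned on held-out data rather than fixed in advance.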
Using Contextual Representations for Suicide Risk Assessment from Internet Forums
Title | Using Contextual Representations for Suicide Risk Assessment from Internet Forums |
Authors | Ashwin Karthik Ambalavanan, Pranjali Dileep Jagtap, Soumya Adhya, Murthy Devarakonda |
Abstract | Social media posts may yield clues to the subject's (usually, the writer's) suicide risk and intent, which can be used for timely intervention. This research, motivated by the CLPsych 2019 shared task, developed neural network-based methods for analyzing posts in one or more Reddit forums to assess the subject's suicide risk. One of the technical challenges this task poses is the large amount of text from multiple posts of a single user. Our neural network models use the advanced multi-headed Attention-based autoencoder architecture, called Bidirectional Encoder Representations from Transformers (BERT). Our system achieved the 2nd best performance of 0.477 macro averaged F measure on Task A of the challenge. Among the three different alternatives we developed for the challenge, the single BERT model that processed all of a user's posts performed the best on all three Tasks. |
Tasks | |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/W19-3022/ |
https://www.aclweb.org/anthology/W19-3022 | |
PWC | https://paperswithcode.com/paper/using-contextual-representations-for-suicide |
Repo | |
Framework | |
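
The macro-averaged F measure reported in the abstract above averages per-class F1 scores with equal weight per class. The sketch below shows one common way to compute it. The label names and toy predictions are hypothetical, and taking the label set as the union of gold and predicted labels is an assumption; the shared task's official scorer may differ in such details.

```python
def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 over all labels seen in gold or predicted."""
    labels = sorted(set(gold) | set(predicted))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, predicted))
        fp = sum(g != label and p == label for g, p in zip(gold, predicted))
        fn = sum(g == label and p != label for g, p in zip(gold, predicted))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy risk labels (assumed, not the shared task's label set).
gold = ["low", "low", "mid", "high"]
pred = ["low", "mid", "mid", "high"]
result = macro_f1(gold, pred)
```

Because each class counts equally, rare risk levels influence the score as much as frequent ones.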
On Knowledge distillation from complex networks for response prediction
Title | On Knowledge distillation from complex networks for response prediction |
Authors | Siddhartha Arora, Mitesh M. Khapra, Harish G. Ramaswamy |
Abstract | Recent advances in Question Answering have led to the development of very complex models which compute rich representations for query and documents by capturing all pairwise interactions between query and document words. This makes these models expensive in space and time, and in practice one has to restrict the length of the documents that can be fed to these models. Such models have also been recently employed for the task of predicting dialog responses from available background documents (e.g., Holl-E dataset). However, here the documents are longer, thereby rendering these complex models infeasible except in select restricted settings. In order to overcome this, we use standard simple models which do not capture all pairwise interactions, but learn to emulate certain characteristics of a complex teacher network. Specifically, we first investigate the conicity of representations learned by a complex model and observe that it is significantly lower than that of simpler models. Based on this insight, we modify the simple architecture to mimic this characteristic. We go further by using knowledge distillation approaches, where the simple model acts as a student and learns to match the output from the complex teacher network. We experiment with the Holl-E dialog data set and show that by mimicking characteristics and matching outputs from a teacher, even a simple network can give improved performance. |
Tasks | Question Answering |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/N19-1382/ |
https://www.aclweb.org/anthology/N19-1382 | |
PWC | https://paperswithcode.com/paper/on-knowledge-distillation-from-complex |
Repo | |
Framework | |
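
The abstract above compares the conicity of representations from complex and simple models. The sketch below assumes conicity means the average cosine similarity between each vector and the mean of all vectors, which is a definition used in work on embedding geometry; the abstract itself does not spell out the formula, so this reading is an assumption, as are the function names and toy vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def conicity(vectors):
    """Mean cosine similarity of each vector to the mean vector.

    Assumes every vector, and the mean vector, is non-zero.
    """
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return sum(cosine(v, mean) for v in vectors) / len(vectors)


# Toy representations (assumed): identical directions versus orthogonal ones.
narrow = conicity([[1.0, 0.0], [2.0, 0.0]])
spread = conicity([[1.0, 0.0], [0.0, 1.0]])
```

Under this reading, a lower value means the representations point in more varied directions, which matches the abstract's observation that the complex model has lower conicity than simpler models.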
A Personalized Data-to-Text Support Tool for Cancer Patients
Title | A Personalized Data-to-Text Support Tool for Cancer Patients |
Authors | Saar Hommes, Chris van der Lee, Felix Clouth, Jeroen Vermunt, Xander Verbeek, Emiel Krahmer |
Abstract | In this paper, we present a novel data-to-text system for cancer patients, providing information on quality of life implications after treatment, which can be embedded in the context of shared decision making. Currently, information on quality of life implications is often not discussed, partly because (until recently) data has been lacking. In our work, we rely on a newly developed prediction model, which assigns patients to scenarios. Furthermore, we use data-to-text techniques to explain these scenario-based predictions in personalized and understandable language. We highlight the possibilities of NLG for personalization, discuss ethical implications and also present the outcomes of a first evaluation with clinicians. |
Tasks | Decision Making |
Published | 2019-10-01 |
URL | https://www.aclweb.org/anthology/W19-8656/ |
https://www.aclweb.org/anthology/W19-8656 | |
PWC | https://paperswithcode.com/paper/a-personalized-data-to-text-support-tool-for |
Repo | |
Framework | |
Probabilistic Federated Neural Matching
Title | Probabilistic Federated Neural Matching |
Authors | Mikhail Yurochkin, Mayank Agarwal, Soumya Ghosh, Kristjan Greenewald, Nghia Hoang, Yasaman Khazaeni |
Abstract | In federated learning problems, data is scattered across different servers and exchanging or pooling it is often impractical or prohibited. We develop a Bayesian nonparametric framework for federated learning with neural networks. Each data server is assumed to train local neural network weights, which are modeled through our framework. We then develop an inference approach that allows us to synthesize a more expressive global network without additional supervision or data pooling. We then demonstrate the efficacy of our approach on federated learning problems simulated from two popular image classification datasets. |
Tasks | Image Classification |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=SygHGnRqK7 |
https://openreview.net/pdf?id=SygHGnRqK7 | |
PWC | https://paperswithcode.com/paper/probabilistic-federated-neural-matching |
Repo | |
Framework | |
Led3D: A Lightweight and Efficient Deep Approach to Recognizing Low-Quality 3D Faces
Title | Led3D: A Lightweight and Efficient Deep Approach to Recognizing Low-Quality 3D Faces |
Authors | Guodong Mu, Di Huang, Guosheng Hu, Jia Sun, Yunhong Wang |
Abstract | Due to the intrinsic invariance to pose and illumination changes, 3D Face Recognition (FR) has a promising potential in the real world. 3D FR using high-quality faces, which are of high resolution and with smooth surfaces, has been widely studied. However, research on 3D FR with low-quality input is limited, although it involves more applications. In this paper, we focus on 3D FR using low-quality data, targeting an efficient and accurate deep learning solution. To achieve this, we work on two aspects: (1) designing a lightweight yet powerful CNN; (2) generating finer and bigger training data. For (1), we propose a Multi-Scale Feature Fusion (MSFF) module and a Spatial Attention Vectorization (SAV) module to build a compact and discriminative CNN. For (2), we propose a data processing system including point-cloud recovery, surface refinement, and data augmentation (with newly proposed shape jittering and shape scaling). We conduct extensive experiments on Lock3DFace and achieve state-of-the-art results, outperforming many heavy CNNs such as VGG-16 and ResNet-34. In addition, our model can operate at a very high speed (136 fps) on Jetson TX2, and the promising accuracy and efficiency reached show its great applicability on edge/mobile devices. |
Tasks | Data Augmentation, Face Recognition |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Mu_Led3D_A_Lightweight_and_Efficient_Deep_Approach_to_Recognizing_Low-Quality_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Mu_Led3D_A_Lightweight_and_Efficient_Deep_Approach_to_Recognizing_Low-Quality_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/led3d-a-lightweight-and-efficient-deep |
Repo | |
Framework | |
The Design of the SauLTC application for the English-Arabic Learner Translation Corpus
Title | The Design of the SauLTC application for the English-Arabic Learner Translation Corpus |
Authors | Maha Al-Harthi, Amal Alsaif |
Abstract | |
Tasks | |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/W19-5610/ |
https://www.aclweb.org/anthology/W19-5610 | |
PWC | https://paperswithcode.com/paper/the-design-of-the-saultc-application-for-the |
Repo | |
Framework | |
Identification of Conditional Causal Effects under Markov Equivalence
Title | Identification of Conditional Causal Effects under Markov Equivalence |
Authors | Amin Jaber, Jiji Zhang, Elias Bareinboim |
Abstract | Causal identification is the problem of deciding whether a post-interventional distribution is computable from a combination of qualitative knowledge about the data-generating process, which is encoded in a causal diagram, and an observational distribution. A generalization of this problem restricts the qualitative knowledge to a class of Markov equivalent causal diagrams, which, unlike a single, fully-specified causal diagram, can be inferred from the observational distribution. Recent work by (Jaber et al., 2019a) devised a complete algorithm for the identification of unconditional causal effects given a Markov equivalence class of causal diagrams. However, there are identifiable conditional causal effects that cannot be handled by that algorithm. In this work, we derive an algorithm to identify conditional effects, which are particularly useful for evaluating conditional plans or policies. |
Tasks | Causal Identification |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/9327-identification-of-conditional-causal-effects-under-markov-equivalence |
http://papers.nips.cc/paper/9327-identification-of-conditional-causal-effects-under-markov-equivalence.pdf | |
PWC | https://paperswithcode.com/paper/identification-of-conditional-causal-effects |
Repo | |
Framework | |
Effective and Efficient Batch Normalization Using Few Uncorrelated Data for Statistics’ Estimation
Title | Effective and Efficient Batch Normalization Using Few Uncorrelated Data for Statistics’ Estimation |
Authors | Zhaodong Chen, Lei Deng, Guoqi Li, Jiawei Sun, Xing Hu, Ling Liang, Yufei Ding, Yuan Xie |
Abstract | Deep Neural Networks (DNNs) have thrived in recent years, with Batch Normalization (BN) playing an indispensable role. However, it has been observed that BN is costly due to the reduction operations. In this paper, we propose alleviating BN’s cost by using only a small fraction of data for mean & variance estimation at each iteration. The key challenge in reaching this goal is how to achieve a satisfactory balance between normalization effectiveness and execution efficiency. We identify that effectiveness requires less data correlation while efficiency requires a regular execution pattern. To this end, we propose two categories of approach: sampling or creating few uncorrelated data for statistics’ estimation with certain strategy constraints. The former includes “Batch Sampling (BS)” that randomly selects few samples from each batch and “Feature Sampling (FS)” that randomly selects a small patch from each feature map of all samples, and the latter is “Virtual Dataset Normalization (VDN)” that generates few synthetic random samples. Accordingly, multi-way strategies are designed to reduce the data correlation for accurate estimation and optimize the execution pattern for running acceleration in the meantime. All the proposed methods are comprehensively evaluated on various DNN models, where an overall training speedup by up to 21.7% on modern GPUs can be practically achieved without the support of any specialized libraries, and the loss of model accuracy and convergence rate are negligible. Furthermore, our methods demonstrate powerful performance when solving the well-known “micro-batch normalization” problem in the case of tiny batch size. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=HkGsHj05tQ |
https://openreview.net/pdf?id=HkGsHj05tQ | |
PWC | https://paperswithcode.com/paper/effective-and-efficient-batch-normalization |
Repo | |
Framework | |
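
The "Batch Sampling" idea in the abstract above estimates the normalization mean and variance from a few randomly chosen samples, then normalizes the whole batch with those statistics. The sketch below illustrates only that core idea for a single scalar feature. It is a simplification under stated assumptions: the function names, `sample_size`, and `eps` are hypothetical, and the strategy constraints the abstract mentions (for example, keeping a regular execution pattern on GPUs) are not modeled.

```python
import random


def sampled_batch_stats(batch, sample_size, rng):
    """Estimate mean and (biased) variance from a random subset of the batch."""
    sample = rng.sample(batch, sample_size)
    mean = sum(sample) / len(sample)
    var = sum((x - mean) ** 2 for x in sample) / len(sample)
    return mean, var


def normalize(batch, mean, var, eps=1e-5):
    """Normalize every element of the batch with the estimated statistics."""
    return [(x - mean) / (var + eps) ** 0.5 for x in batch]


# Toy activations (assumed) for one feature across a batch of four samples.
batch = [1.0, 2.0, 3.0, 4.0]
rng = random.Random(0)

full_mean, full_var = sampled_batch_stats(batch, len(batch), rng)
few_mean, few_var = sampled_batch_stats(batch, 2, rng)
normalized = normalize(batch, few_mean, few_var)
```

Sampling the whole batch recovers the usual batch statistics, so the sample size acts as a knob between estimation accuracy and the cost of the reduction.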
Understanding the Behaviour of Neural Abstractive Summarizers using Contrastive Examples
Title | Understanding the Behaviour of Neural Abstractive Summarizers using Contrastive Examples |
Authors | Krtin Kumar, Jackie Chi Kit Cheung |
Abstract | Neural abstractive summarizers generate summary texts using a language model conditioned on the input source text, and have recently achieved high ROUGE scores on benchmark summarization datasets. We investigate how they achieve this performance with respect to human-written gold-standard abstracts, and whether the systems are able to understand deeper syntactic and semantic structures. We generate a set of contrastive summaries which are perturbed, deficient versions of human-written summaries, and test whether existing neural summarizers score them more highly than the human-written summaries. We analyze their performance on different datasets and find that these systems fail to understand the source text, in a majority of the cases. |
Tasks | Language Modelling |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/N19-1396/ |
https://www.aclweb.org/anthology/N19-1396 | |
PWC | https://paperswithcode.com/paper/understanding-the-behaviour-of-neural |
Repo | |
Framework | |
Proceedings of The 8th Workshop on Patent and Scientific Literature Translation
Title | Proceedings of The 8th Workshop on Patent and Scientific Literature Translation |
Authors | |
Abstract | |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-7200/ |
https://www.aclweb.org/anthology/W19-7200 | |
PWC | https://paperswithcode.com/paper/proceedings-of-the-8th-workshop-on-patent-and |
Repo | |
Framework | |
PA3D: Pose-Action 3D Machine for Video Recognition
Title | PA3D: Pose-Action 3D Machine for Video Recognition |
Authors | An Yan, Yali Wang, Zhifeng Li, Yu Qiao |
Abstract | Recent studies have witnessed the successes of using 3D CNNs for video action recognition. However, most 3D models are built upon RGB and optical flow streams, which may not fully exploit pose dynamics, an important cue for modeling human actions. To fill this gap, we propose a concise Pose-Action 3D Machine (PA3D), which can effectively encode multiple pose modalities within a unified 3D framework, and consequently learn spatio-temporal pose representations for action recognition. More specifically, we introduce a novel temporal pose convolution to aggregate spatial poses over frames. Unlike the classical temporal convolution, our operation can explicitly learn the pose motions that are discriminative for recognizing human actions. Extensive experiments on three popular benchmarks (i.e., JHMDB, HMDB, and Charades) show that PA3D outperforms the recent pose-based approaches. Furthermore, PA3D is highly complementary to the recent 3D CNNs, e.g., I3D. Multi-stream fusion achieves state-of-the-art performance on all evaluated data sets. |
Tasks | Action Recognition In Videos, Optical Flow Estimation, Skeleton Based Action Recognition, Temporal Action Localization, Video Recognition |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Yan_PA3D_Pose-Action_3D_Machine_for_Video_Recognition_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Yan_PA3D_Pose-Action_3D_Machine_for_Video_Recognition_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/pa3d-pose-action-3d-machine-for-video |
Repo | |
Framework | |