Paper Group NANR 195
Stochastic Training of Graph Convolutional Networks. Towards Controllable Story Generation. Syntax-based Transfer Learning for the Task of Biomedical Relation Extraction. Content Extraction and Lexical Analysis from Customer-Agent Interactions. Covariate Adjusted Precision Matrix Estimation via Nonconvex Optimization. Syntactically Aware Neural Arc …
Stochastic Training of Graph Convolutional Networks
Title | Stochastic Training of Graph Convolutional Networks |
Authors | Jianfei Chen, Jun Zhu |
Abstract | Graph convolutional networks (GCNs) are powerful deep neural networks for graph-structured data. However, a GCN computes node representations recursively from their neighbors, so the receptive field size grows exponentially with the number of layers. Previous attempts to reduce the receptive field size by subsampling neighbors have no convergence guarantee, and their receptive field size per node is still on the order of hundreds. In this paper, we develop a preprocessing strategy and two control-variate-based algorithms to further reduce the receptive field size. Our algorithms are guaranteed to converge to a local optimum of the GCN regardless of the neighbor sampling size. Empirical results show that our algorithms converge at a per-epoch speed similar to that of the exact algorithm even when using only two neighbors per node. On the Reddit dataset, our algorithm takes only one fifth of the time of previous neighbor sampling algorithms. |
Tasks | |
Published | 2018-01-01 |
URL | https://openreview.net/forum?id=rylejExC- |
PDF | https://openreview.net/pdf?id=rylejExC- |
PWC | https://paperswithcode.com/paper/stochastic-training-of-graph-convolutional-1 |
Repo | |
Framework | |
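The abstract names control variates but does not spell out the estimator. A minimal sketch, assuming the common form in which a stale "historical" activation h̄ of each neighbor serves as the control variate: only the residual h − h̄ is estimated from a uniform sample of d out of n neighbors (rescaled by n/d), while the weighted sum of h̄ over all neighbors is added back exactly. Scalar activations, dict-based neighbor weights, and all function names are illustrative assumptions, not the authors' code.

```python
from itertools import combinations


def exact_aggregate(weights, h):
    """Exact neighbor aggregation: sum over all neighbors u of P[v][u] * h[u]."""
    return sum(weights[u] * h[u] for u in weights)


def cv_aggregate(weights, h, h_bar, sample):
    """Control-variate estimate of exact_aggregate from a neighbor sample.

    weights: dict neighbor -> propagation weight P[v][u] (hypothetical layout)
    h:       dict neighbor -> current activation (scalar for simplicity)
    h_bar:   dict neighbor -> stale historical activation (the control variate)
    sample:  neighbors drawn uniformly without replacement
    """
    n, d = len(weights), len(sample)
    # Monte Carlo term on the residual h - h_bar, rescaled by n/d to stay unbiased
    residual = (n / d) * sum(weights[u] * (h[u] - h_bar[u]) for u in sample)
    # Deterministic term over all neighbors, using only the historical values
    baseline = sum(weights[u] * h_bar[u] for u in weights)
    return residual + baseline


def mean_over_all_samples(weights, h, h_bar, d):
    """Average the estimator over every size-d neighbor subset (unbiasedness check)."""
    samples = list(combinations(weights, d))
    return sum(cv_aggregate(weights, h, h_bar, s) for s in samples) / len(samples)
```

When the historical activations equal the current ones, the residual vanishes and even a one-neighbor sample reproduces the exact aggregate, which is one way to see why very small sample sizes can still work once h̄ tracks h closely.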
Towards Controllable Story Generation
Title | Towards Controllable Story Generation |
Authors | Nanyun Peng, Marjan Ghazvininejad, Jonathan May, Kevin Knight |
Abstract | We present a general framework for analyzing existing story corpora to generate controllable and creative new stories. The proposed framework needs little manual annotation to achieve controllable story generation. It creates a new interface for humans to interact with computers to generate personalized stories. We apply the framework to build recurrent neural network (RNN)-based generation models to control story ending valence and storyline. Experiments show that our methods successfully achieve the control and enhance the coherence of stories through introducing storylines. With additional control factors, the generation model gets lower perplexity, and yields more coherent stories that are faithful to the control factors according to human evaluation. |
Tasks | Text Generation |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/W18-1505/ |
PDF | https://www.aclweb.org/anthology/W18-1505 |
PWC | https://paperswithcode.com/paper/towards-controllable-story-generation |
Repo | |
Framework | |
Syntax-based Transfer Learning for the Task of Biomedical Relation Extraction
Title | Syntax-based Transfer Learning for the Task of Biomedical Relation Extraction |
Authors | Joël Legrand, Yannick Toussaint, Chedy Raïssi, Adrien Coulet |
Abstract | |
Tasks | Domain Adaptation, Relation Extraction, Transfer Learning |
Published | 2018-10-01 |
URL | https://www.aclweb.org/anthology/papers/W18-5617/w18-5617 |
PDF | https://www.aclweb.org/anthology/W18-5617 |
PWC | https://paperswithcode.com/paper/syntax-based-transfer-learning-for-the-task |
Repo | |
Framework | |
Content Extraction and Lexical Analysis from Customer-Agent Interactions
Title | Content Extraction and Lexical Analysis from Customer-Agent Interactions |
Authors | Sergiu Nisioi, Anca Bucur, Liviu P. Dinu |
Abstract | In this paper, we provide a comparative lexical analysis of the vocabulary used by customers and agents in an Enterprise Resource Planning (ERP) environment and a potential solution to clean the data and extract relevant content for NLP. As a result, we demonstrate that the actual vocabulary of the language that prevails in the ERP conversations is highly divergent from the standardized dictionary and further different from general language usage as extracted from the Common Crawl corpus. Moreover, in specific business communication circumstances, where heavy use of standardized language would be expected, code switching and non-standard expressions are predominant, emphasizing once more the discrepancy between the day-to-day use of language and the standardized one. |
Tasks | Lexical Analysis |
Published | 2018-11-01 |
URL | https://www.aclweb.org/anthology/W18-6118/ |
PDF | https://www.aclweb.org/anthology/W18-6118 |
PWC | https://paperswithcode.com/paper/content-extraction-and-lexical-analysis-from |
Repo | |
Framework | |
Covariate Adjusted Precision Matrix Estimation via Nonconvex Optimization
Title | Covariate Adjusted Precision Matrix Estimation via Nonconvex Optimization |
Authors | Jinghui Chen, Pan Xu, Lingxiao Wang, Jian Ma, Quanquan Gu |
Abstract | We propose a nonconvex estimator for the covariate adjusted precision matrix estimation problem in the high dimensional regime, under sparsity constraints. To solve this estimator, we propose an alternating gradient descent algorithm with hard thresholding. Compared with existing methods along this line of research, which lack theoretical guarantees in optimization error and/or statistical error, the proposed algorithm not only is computationally much more efficient with a linear rate of convergence, but also attains the optimal statistical rate up to a logarithmic factor. Thorough experiments on both synthetic and real data support our theory. |
Tasks | |
Published | 2018-07-01 |
URL | https://icml.cc/Conferences/2018/Schedule?showEvent=2478 |
PDF | http://proceedings.mlr.press/v80/chen18n/chen18n.pdf |
PWC | https://paperswithcode.com/paper/covariate-adjusted-precision-matrix |
Repo | |
Framework | |
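The abstract describes alternating gradient descent with hard thresholding. The sketch below shows only the generic building block, assuming a target sparsity level s: a gradient step on one parameter block followed by keeping its s largest-magnitude entries. The loss, its gradient, the step size `eta`, and how the two blocks alternate are not given in the abstract and are left to the caller; the function names are hypothetical.

```python
def hard_threshold(matrix, s):
    """Keep the s largest-magnitude entries of a matrix (list of rows); zero the rest."""
    rows, cols = len(matrix), len(matrix[0])
    # Rank positions by magnitude; sorted() stays stable with reverse=True,
    # so ties keep row-major order and the result is deterministic
    ranked = sorted(
        ((i, j) for i in range(rows) for j in range(cols)),
        key=lambda ij: abs(matrix[ij[0]][ij[1]]),
        reverse=True,
    )
    keep = set(ranked[:s])
    return [
        [matrix[i][j] if (i, j) in keep else 0.0 for j in range(cols)]
        for i in range(rows)
    ]


def threshold_step(theta, grad, eta, s):
    """One gradient step on a parameter block, then projection onto s-sparse matrices."""
    stepped = [
        [t - eta * g for t, g in zip(theta_row, grad_row)]
        for theta_row, grad_row in zip(theta, grad)
    ]
    return hard_threshold(stepped, s)
```

Hard thresholding is the Euclidean projection onto the set of s-sparse matrices, which is why it can replace a convex penalty while keeping each iterate exactly sparse.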
Syntactically Aware Neural Architectures for Definition Extraction
Title | Syntactically Aware Neural Architectures for Definition Extraction |
Authors | Luis Espinosa-Anke, Steven Schockaert |
Abstract | Automatically identifying definitional knowledge in text corpora (Definition Extraction or DE) is an important task with direct applications in, among others, Automatic Glossary Generation, Taxonomy Learning, Question Answering and Semantic Search. It is generally cast as a binary classification problem between definitional and non-definitional sentences. In this paper we present a set of neural architectures combining Convolutional and Recurrent Neural Networks, which are further enriched by incorporating linguistic information via syntactic dependencies. Our experimental results in the task of sentence classification, on two benchmarking DE datasets (one generic, one domain-specific), show that these models obtain consistent state-of-the-art results. Furthermore, we demonstrate that models trained on clean Wikipedia-like definitions can successfully be applied to noisier domain-specific corpora. |
Tasks | Question Answering, Sentence Classification, Word Sense Disambiguation |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/N18-2061/ |
PDF | https://www.aclweb.org/anthology/N18-2061 |
PWC | https://paperswithcode.com/paper/syntactically-aware-neural-architectures-for |
Repo | |
Framework | |
Testing Sparsity over Known and Unknown Bases
Title | Testing Sparsity over Known and Unknown Bases |
Authors | Siddharth Barman, Arnab Bhattacharyya, Suprovat Ghoshal |
Abstract | Sparsity is a basic property of real vectors that is exploited in a wide variety of machine learning applications. In this work, we describe property testing algorithms for sparsity that observe a low-dimensional projection of the input. We consider two settings. In the first setting, we test sparsity with respect to an unknown basis: given input vectors $y_1, \ldots, y_p \in \mathbb{R}^d$ whose concatenation as columns forms $Y \in \mathbb{R}^{d \times p}$, does $Y = AX$ for matrices $A \in \mathbb{R}^{d \times m}$ and $X \in \mathbb{R}^{m \times p}$ such that each column of $X$ is $k$-sparse, or is $Y$ “far” from having such a decomposition? In the second setting, we test sparsity with respect to a known basis: for a fixed design matrix $A \in \mathbb{R}^{d \times m}$, given input vector $y \in \mathbb{R}^d$, is $y = Ax$ for some $k$-sparse vector $x$, or is $y$ “far” from having such a decomposition? We analyze our algorithms using tools from high-dimensional geometry and probability. |
Tasks | |
Published | 2018-07-01 |
URL | https://icml.cc/Conferences/2018/Schedule?showEvent=2164 |
PDF | http://proceedings.mlr.press/v80/barman18a/barman18a.pdf |
PWC | https://paperswithcode.com/paper/testing-sparsity-over-known-and-unknown-bases |
Repo | |
Framework | |
Generative Models for Alignment and Data Efficiency in Language
Title | Generative Models for Alignment and Data Efficiency in Language |
Authors | Dustin Tran, Yura Burda, Ilya Sutskever |
Abstract | We examine how learning from unaligned data can both improve the data efficiency of supervised tasks and enable alignments without any supervision. For example, consider unsupervised machine translation: the input is two corpora of English and French, and the task is to translate from one language to the other but without any pairs of English and French sentences. To address this, we develop feature-matching autoencoders (FMAEs). FMAEs ensure that the marginal distributions of feature layers are preserved across forward and inverse mappings between domains. We show that FMAEs achieve state-of-the-art results for data efficiency and alignment across three tasks: text decipherment, sentiment transfer, and neural machine translation for English-to-German and English-to-French. Most compellingly, FMAEs achieve state-of-the-art results for neural translation with limited supervision, with significant BLEU score gains of up to 5.7 and 6.3 over traditional supervised models. Furthermore, on English-to-German, they outperform last year’s best fully supervised models such as ByteNet (Kalchbrenner et al., 2016) while using only half as many supervised examples. |
Tasks | Machine Translation, Unsupervised Machine Translation |
Published | 2018-01-01 |
URL | https://openreview.net/forum?id=rJ7RBNe0- |
PDF | https://openreview.net/pdf?id=rJ7RBNe0- |
PWC | https://paperswithcode.com/paper/generative-models-for-alignment-and-data |
Repo | |
Framework | |
A Dynamic Oracle for Linear-Time 2-Planar Dependency Parsing
Title | A Dynamic Oracle for Linear-Time 2-Planar Dependency Parsing |
Authors | Daniel Fernández-González, Carlos Gómez-Rodríguez |
Abstract | We propose an efficient dynamic oracle for training the 2-Planar transition-based parser, a linear-time parser with over 99% coverage on non-projective syntactic corpora. This novel approach outperforms the static training strategy in the vast majority of languages tested and scored better on most datasets than the arc-hybrid parser enhanced with the Swap transition, which can handle unrestricted non-projectivity. |
Tasks | Dependency Parsing |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/N18-2062/ |
PDF | https://www.aclweb.org/anthology/N18-2062 |
PWC | https://paperswithcode.com/paper/a-dynamic-oracle-for-linear-time-2-planar |
Repo | |
Framework | |
Non-convex Conditional Gradient Sliding
Title | Non-convex Conditional Gradient Sliding |
Authors | Chao Qu, Yan Li, Huan Xu |
Abstract | We investigate a projection-free optimization method, namely non-convex conditional gradient sliding (NCGS), for non-convex optimization problems in the batch, stochastic, and finite-sum settings. The conditional gradient sliding (CGS) method, by integrating Nesterov’s accelerated gradient method with the Frank-Wolfe (FW) method in a smart way, outperforms FW for convex optimization by reducing the number of gradient computations. However, the study of CGS in the non-convex setting is limited. In this paper, we propose the non-convex conditional gradient sliding (NCGS) methods and analyze their convergence properties. We also leverage the idea of variance reduction from recent progress in convex optimization to obtain a new algorithm termed variance-reduced NCGS (NCGS-VR), which achieves a faster convergence rate than batch NCGS in the finite-sum setting. We show that NCGS algorithms outperform their Frank-Wolfe counterparts both in theory and in practice in all three settings, namely the batch, stochastic, and finite-sum settings. This significantly improves our understanding of optimizing non-convex functions with complicated feasible sets (where projection is prohibitively expensive). |
Tasks | |
Published | 2018-07-01 |
URL | https://icml.cc/Conferences/2018/Schedule?showEvent=1947 |
PDF | http://proceedings.mlr.press/v80/qu18a/qu18a.pdf |
PWC | https://paperswithcode.com/paper/non-convex-conditional-gradient-sliding |
Repo | |
Framework | |
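NCGS is compared against Frank-Wolfe, the projection-free method it builds on. As a hedged illustration of what "projection-free" means, the sketch below runs plain Frank-Wolfe (not NCGS, which additionally folds in Nesterov-style acceleration) over the probability simplex, where the linear minimization oracle reduces to picking the coordinate with the smallest gradient entry. The simplex as feasible set, the 2/(t+2) step size, and the convex quadratic used for checking are assumptions chosen for simplicity.

```python
def frank_wolfe_simplex(grad, x0, iterations):
    """Plain Frank-Wolfe over the probability simplex {x >= 0, sum(x) = 1}.

    grad: callable returning the gradient (list of floats) at a point
    x0:   feasible starting point
    """
    x = list(x0)
    for t in range(iterations):
        g = grad(x)
        # Linear minimization oracle: <g, s> over the simplex is minimized at a vertex e_i
        i = min(range(len(g)), key=g.__getitem__)
        gamma = 2.0 / (t + 2.0)  # standard open-loop step size
        # Move toward e_i by a convex combination; no projection is ever needed
        x = [(1.0 - gamma) * xj for xj in x]
        x[i] += gamma
    return x
```

Each iterate is a convex combination of simplex vertices, so feasibility holds by construction; this is the property that makes the FW family attractive when projection onto the feasible set is expensive.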
Kronecker-factored Curvature Approximations for Recurrent Neural Networks
Title | Kronecker-factored Curvature Approximations for Recurrent Neural Networks |
Authors | James Martens, Jimmy Ba, Matt Johnson |
Abstract | Kronecker-factored Approximate Curvature (Martens & Grosse, 2015) (K-FAC) is a second-order optimization method which has been shown to give state-of-the-art performance on large-scale neural network optimization tasks (Ba et al., 2017). It is based on an approximation to the Fisher information matrix (FIM) that makes assumptions about the particular structure of the network and the way it is parameterized. The original K-FAC method was applicable only to fully-connected networks, although it has recently been extended by Grosse & Martens (2016) to handle convolutional networks as well. In this work we extend the method to handle RNNs by introducing a novel approximation to the FIM for RNNs. This approximation works by modelling the covariance structure between the gradient contributions at different time-steps using a chain-structured linear Gaussian graphical model, summing the various cross-covariances, and computing the inverse in closed form. We demonstrate in experiments that our method significantly outperforms general-purpose state-of-the-art optimizers like SGD with momentum and Adam on several challenging RNN training tasks. |
Tasks | |
Published | 2018-01-01 |
URL | https://openreview.net/forum?id=HyMTkQZAb |
PDF | https://openreview.net/pdf?id=HyMTkQZAb |
PWC | https://paperswithcode.com/paper/kronecker-factored-curvature-approximations |
Repo | |
Framework | |
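In the original K-FAC formulation, each layer's Fisher block is approximated by a Kronecker product A ⊗ G of two small factors, so its inverse is A⁻¹ ⊗ G⁻¹ and applying it to a vectorized gradient never requires forming the large matrix. The underlying identity is (A ⊗ B) vec(X) = vec(B X Aᵀ), with vec stacking columns. The sketch below checks that identity numerically in pure Python; it illustrates the generic trick only, not the RNN-specific chain-structured approximation this paper introduces, and the helper names are hypothetical.

```python
def kron(a, b):
    """Kronecker product of matrices given as lists of rows."""
    return [
        [a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
        for i in range(len(a))
        for k in range(len(b))
    ]


def matmul(a, b):
    """Plain matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    return [list(col) for col in zip(*a)]


def vec(x):
    """Column-major vectorization: stack the columns of x."""
    return [x[i][j] for j in range(len(x[0])) for i in range(len(x))]


def matvec(m, v):
    return [sum(mi * vi for mi, vi in zip(row, v)) for row in m]


def kron_apply(a, b, x):
    """Compute (a ⊗ b) vec(x) as vec(b x aᵀ), using only small products."""
    return vec(matmul(matmul(b, x), transpose(a)))
```

With A of size p×q and B of size r×s, the direct product costs O(pqrs) per vector while `kron_apply` works with the small factors; K-FAC uses the same rearrangement with the inverted factors in place of A and B.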
The Potential of the Computational Linguistic Analysis of Social Media for Population Studies
Title | The Potential of the Computational Linguistic Analysis of Social Media for Population Studies |
Authors | Letizia Mencarini |
Abstract | The paper provides an outline of the scope for synergy between computational linguistic analysis and population studies. It first reviews where population studies stand in terms of using social media data. Demographers are entering the realm of big data in force. But, this paper argues, population studies have much to gain from computational linguistic analysis, especially in terms of explaining the drivers behind population processes. The paper gives two examples of how the method can be applied, and concludes with a fundamental caveat. Yes, computational linguistic analysis provides a possible key for integrating micro theory into any demographic analysis of social media data. But results may be of little value insofar as fundamental characteristics of the sample are unknown. |
Tasks | |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/W18-1109/ |
PDF | https://www.aclweb.org/anthology/W18-1109 |
PWC | https://paperswithcode.com/paper/the-potential-of-the-computational-linguistic |
Repo | |
Framework | |
Lancaster at SemEval-2018 Task 3: Investigating Ironic Features in English Tweets
Title | Lancaster at SemEval-2018 Task 3: Investigating Ironic Features in English Tweets |
Authors | Edward Dearden, Alistair Baron |
Abstract | This paper describes the system we submitted to SemEval-2018 Task 3. The aim of the system is to distinguish between irony and non-irony in English tweets. We create a targeted feature set and analyse how different features are useful in the task of irony detection, achieving an F1-score of 0.5914. The analysis of individual features provides insight that may be useful in future attempts at detecting irony in tweets. |
Tasks | Sentiment Analysis |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/S18-1096/ |
PDF | https://www.aclweb.org/anthology/S18-1096 |
PWC | https://paperswithcode.com/paper/lancaster-at-semeval-2018-task-3 |
Repo | |
Framework | |
Identification of Alias Links among Participants in Narratives
Title | Identification of Alias Links among Participants in Narratives |
Authors | Sangameshwar Patil, Sachin Pawar, Swapnil Hingmire, Girish Palshikar, Vasudeva Varma, Pushpak Bhattacharyya |
Abstract | Identification of distinct and independent participants (entities of interest) in a narrative is an important task for many NLP applications. This task becomes challenging because these participants are often referred to using multiple aliases. In this paper, we propose an approach based on linguistic knowledge for identification of aliases mentioned using proper nouns, pronouns or noun phrases with common noun headword. We use Markov Logic Network (MLN) to encode the linguistic knowledge for identification of aliases. We evaluate on four diverse history narratives of varying complexity. Our approach performs better than the state-of-the-art approach as well as a combination of standard named entity recognition and coreference resolution techniques. |
Tasks | Coreference Resolution, Named Entity Recognition, Question Answering |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/P18-2011/ |
PDF | https://www.aclweb.org/anthology/P18-2011 |
PWC | https://paperswithcode.com/paper/identification-of-alias-links-among |
Repo | |
Framework | |
UTFPR at IEST 2018: Exploring Character-to-Word Composition for Emotion Analysis
Title | UTFPR at IEST 2018: Exploring Character-to-Word Composition for Emotion Analysis |
Authors | Gustavo Paetzold |
Abstract | We introduce the UTFPR system for the Implicit Emotions Shared Task of 2018: A compositional character-to-word recurrent neural network that does not exploit heavy and/or hard-to-obtain resources. We find that our approach can outperform multiple baselines, and offers an elegant and effective solution to the problem of orthographic variance in tweets. |
Tasks | Emotion Recognition |
Published | 2018-10-01 |
URL | https://www.aclweb.org/anthology/W18-6224/ |
PDF | https://www.aclweb.org/anthology/W18-6224 |
PWC | https://paperswithcode.com/paper/utfpr-at-iest-2018-exploring-character-to |
Repo | |
Framework | |