Paper Group NANR 19
The RWTH Aachen University Machine Translation Systems for WMT 2019. TabNN: A Universal Neural Network Solution for Tabular Data. High Resolution and Fast Face Completion via Progressively Attentive GANs. Enhancing 2D Representation via Adjacent Views for 3D Shape Retrieval. Domain Adaptation of SRL Systems for Biological Processes. Graph Convoluti …
The RWTH Aachen University Machine Translation Systems for WMT 2019
Title | The RWTH Aachen University Machine Translation Systems for WMT 2019 |
Authors | Jan Rosendahl, Christian Herold, Yunsu Kim, Miguel Graça, Weiyue Wang, Parnia Bahar, Yingbo Gao, Hermann Ney |
Abstract | This paper describes the neural machine translation systems developed at the RWTH Aachen University for the German-English, Chinese-English and Kazakh-English news translation tasks of the Fourth Conference on Machine Translation (WMT19). For all tasks, the final submitted system is based on the Transformer architecture. We focus on improving data filtering and fine-tuning as well as systematically evaluating interesting approaches like unigram language model segmentation and transfer learning. For the De-En task, none of the tested methods gave a significant improvement over last year's winning system and we end up with the same performance, resulting in 39.6% BLEU on newstest2019. In the Zh-En task, we show 1.3% BLEU improvement over our last year's submission, which we mostly attribute to the splitting of long sentences during translation. We further report results on the Kazakh-English task where we gain improvements of 11.1% BLEU over our baseline system. On the same task we present a recent transfer learning approach, which uses half of the free parameters of our submission system and performs on par with it. |
Tasks | Language Modelling, Machine Translation, Transfer Learning |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-5338/ |
https://www.aclweb.org/anthology/W19-5338 | |
PWC | https://paperswithcode.com/paper/the-rwth-aachen-university-machine |
Repo | |
Framework | |
TabNN: A Universal Neural Network Solution for Tabular Data
Title | TabNN: A Universal Neural Network Solution for Tabular Data |
Authors | Guolin Ke, Jia Zhang, Zhenhui Xu, Jiang Bian, Tie-Yan Liu |
Abstract | Neural Network (NN) has achieved state-of-the-art performances in many tasks within image, speech, and text domains. Such great success is mainly due to special structure design to fit the particular data patterns, such as CNN capturing spatial locality and RNN modeling sequential dependency. Essentially, these specific NNs achieve good performance by leveraging the prior knowledge over corresponding domain data. Nevertheless, there are many applications with all kinds of tabular data in other domains. Since there are no shared patterns among these diverse tabular data, it is hard to design specific structures to fit them all. Without careful architecture design based on domain knowledge, it is quite challenging for NN to reach satisfactory performance in these tabular data domains. To fill the gap of NN in tabular data learning, we propose a universal neural network solution, called TabNN, to derive effective NN architectures for tabular data in all kinds of tasks automatically. Specifically, the design of TabNN follows two principles: to explicitly leverage expressive feature combinations and to reduce model complexity. Since GBDT has empirically proven its strength in modeling tabular data, we use GBDT to power the implementation of TabNN. Comprehensive experimental analysis on a variety of tabular datasets demonstrates that TabNN can achieve much better performance than many baseline solutions. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=r1eJssCqY7 |
https://openreview.net/pdf?id=r1eJssCqY7 | |
PWC | https://paperswithcode.com/paper/tabnn-a-universal-neural-network-solution-for |
Repo | |
Framework | |
High Resolution and Fast Face Completion via Progressively Attentive GANs
Title | High Resolution and Fast Face Completion via Progressively Attentive GANs |
Authors | Zeyuan Chen, Shaoliang Nie, Tianfu Wu, Christopher G. Healey |
Abstract | Face completion is a challenging task with the difficulty level increasing significantly with respect to high resolution, the complexity of "holes" and the controllable attributes of filled-in fragments. Our system addresses the challenges by learning a fully end-to-end framework that trains generative adversarial networks (GANs) progressively from low resolution to high resolution with conditional vectors encoding controllable attributes. We design a novel coarse-to-fine attentive module network architecture. Our model is encouraged to attend to finer details while the network is growing to a higher resolution, thus being capable of showing progressive attention to different frequency components in a coarse-to-fine way. We term the module Frequency-oriented Attentive Module (FAM). Our system can complete faces with large structural and appearance variations using a single feed-forward pass of computation with a mean inference time of 0.54 seconds for images at 1024x1024 resolution. A pilot human study shows our approach outperforms state-of-the-art face completion methods. The code will be released upon publication. |
Tasks | Facial Inpainting |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=Hkxx3o0qFX |
https://openreview.net/pdf?id=Hkxx3o0qFX | |
PWC | https://paperswithcode.com/paper/high-resolution-and-fast-face-completion-via |
Repo | |
Framework | |
Enhancing 2D Representation via Adjacent Views for 3D Shape Retrieval
Title | Enhancing 2D Representation via Adjacent Views for 3D Shape Retrieval |
Authors | Cheng Xu, Zhaoqun Li, Qiang Qiu, Biao Leng, Jingfei Jiang |
Abstract | Multi-view shape descriptors obtained from various 2D images are commonly adopted in 3D shape retrieval. One major challenge is that significant shape information is discarded during 2D view rendering through projection. In this paper, we propose a convolutional neural network based method, CenterNet, to enhance each individual 2D view using its neighboring ones. By exploiting cross-view correlations, CenterNet learns how adjacent views can be maximally incorporated for an enhanced 2D representation to effectively describe shapes. We observe that a very small number of enhanced 2D views, e.g., six, is already sufficient for a panoramic shape description. Thus, by simply aggregating features from six enhanced 2D views, we arrive at a highly compact yet discriminative shape descriptor. The proposed shape descriptor significantly outperforms state-of-the-art 3D shape retrieval methods on the ModelNet and ShapeNetCore55 benchmarks, and also exhibits robustness against object occlusion. |
Tasks | 3D Shape Retrieval |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Xu_Enhancing_2D_Representation_via_Adjacent_Views_for_3D_Shape_Retrieval_ICCV_2019_paper.html |
http://openaccess.thecvf.com/content_ICCV_2019/papers/Xu_Enhancing_2D_Representation_via_Adjacent_Views_for_3D_Shape_Retrieval_ICCV_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/enhancing-2d-representation-via-adjacent |
Repo | |
Framework | |
Domain Adaptation of SRL Systems for Biological Processes
Title | Domain Adaptation of SRL Systems for Biological Processes |
Authors | Dheeraj Rajagopal, Nidhi Vyas, Aditya Siddhant, Anirudha Rayasam, Niket Tandon, Eduard Hovy |
Abstract | Domain adaptation remains one of the most challenging aspects in the wide-spread use of Semantic Role Labeling (SRL) systems. Current state-of-the-art methods are typically trained on large-scale datasets, but their performances do not directly transfer to low-resource domain-specific settings. In this paper, we propose two approaches for domain adaptation in the biological domain that involve pre-training an LSTM-CRF on existing large-scale datasets and adapting it for a low-resource corpus of biological processes. Our first approach defines a mapping between the source labels and the target labels, and the other approach modifies the final CRF layer of the sequence-labeling neural network architecture. We perform our experiments on the ProcessBank dataset, which contains fewer than 200 paragraphs on biological processes. We improve over the previous state-of-the-art system on this dataset by 21 F1 points. We also show that, by incorporating event-event relationships in ProcessBank, we are able to achieve an additional 2.6 F1 gain, giving us possible insights into how to improve SRL systems for biological processes using richer annotations. |
Tasks | Domain Adaptation, Semantic Role Labeling |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-5009/ |
https://www.aclweb.org/anthology/W19-5009 | |
PWC | https://paperswithcode.com/paper/domain-adaptation-of-srl-systems-for |
Repo | |
Framework | |
Graph Convolutional Network with Sequential Attention For Goal-Oriented Dialogue Systems
Title | Graph Convolutional Network with Sequential Attention For Goal-Oriented Dialogue Systems |
Authors | Suman Banerjee, Mitesh M. Khapra |
Abstract | Domain specific goal-oriented dialogue systems typically require modeling three types of inputs, viz., (i) the knowledge-base associated with the domain, (ii) the history of the conversation, which is a sequence of utterances and (iii) the current utterance for which the response needs to be generated. While modeling these inputs, current state-of-the-art models such as Mem2Seq typically ignore the rich structure inherent in the knowledge graph and the sentences in the conversation context. Inspired by the recent success of structure-aware Graph Convolutional Networks (GCNs) for various NLP tasks such as machine translation, semantic role labeling and document dating, we propose a memory augmented GCN for goal-oriented dialogues. Our model exploits (i) the entity relation graph in a knowledge-base and (ii) the dependency graph associated with an utterance to compute richer representations for words and entities. Further, we take cognizance of the fact that in certain situations, such as when the conversation is in a code-mixed language, dependency parsers may not be available. We show that in such situations we could use the global word co-occurrence graph to enrich the representations of utterances. We experiment with the modified DSTC2 dataset and its recently released code-mixed versions in four languages and show that our method outperforms existing state-of-the-art methods, using a wide range of evaluation metrics. |
Tasks | Goal-Oriented Dialogue Systems, Machine Translation, Semantic Role Labeling |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=Skz-3j05tm |
https://openreview.net/pdf?id=Skz-3j05tm | |
PWC | https://paperswithcode.com/paper/graph-convolutional-network-with-sequential |
Repo | |
Framework | |
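The core building block the abstract relies on, graph convolution over an entity or dependency graph, can be sketched minimally. This is a generic illustrative GCN layer, not the paper's memory-augmented architecture; the row-normalization and self-loop choices here are common defaults, assumed for the sketch:

```python
import numpy as np

def gcn_layer(A, X, W):
    """One graph-convolution step: average each node's neighbourhood
    (including the node itself via a self-loop) and apply a shared
    linear map followed by a ReLU non-linearity.

    A: (N, N) adjacency matrix, X: (N, F) node features, W: (F, H) weights.
    """
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))  # row-normalise by degree
    return np.maximum(D_inv @ A_hat @ X @ W, 0.0)
```

Stacking such layers over the dependency graph of an utterance (or the entity graph of the knowledge base) yields the "richer representations for words and entities" the abstract refers to.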
Show Some Love to Your n-grams: A Bit of Progress and Stronger n-gram Language Modeling Baselines
Title | Show Some Love to Your n-grams: A Bit of Progress and Stronger n-gram Language Modeling Baselines |
Authors | Ehsan Shareghi, Daniela Gerz, Ivan Vulić, Anna Korhonen |
Abstract | In recent years neural language models (LMs) have set the state-of-the-art performance for several benchmarking datasets. While the reasons for their success and their computational demand are well-documented, a comparison between neural models and more recent developments in n-gram models is neglected. In this paper, we examine the recent progress in n-gram literature, running experiments on 50 languages covering all morphological language families. Experimental results illustrate that a simple extension of Modified Kneser-Ney outperforms an LSTM language model on 42 languages, while a word-level Bayesian n-gram LM (Shareghi et al., 2017) outperforms the character-aware neural model (Kim et al., 2016) on average across all languages, and its extension which explicitly injects linguistic knowledge (Gerz et al., 2018) on 8 languages. Further experiments on larger Europarl datasets for 3 languages indicate that neural architectures are able to outperform computationally much cheaper n-gram models: n-gram training is up to 15,000x quicker. Our experiments illustrate that standalone n-gram models lend themselves as natural choices for resource-lean or morphologically rich languages, while the recent progress has significantly improved their accuracy. |
Tasks | Language Modelling |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/N19-1417/ |
https://www.aclweb.org/anthology/N19-1417 | |
PWC | https://paperswithcode.com/paper/show-some-love-to-your-n-grams-a-bit-of |
Repo | |
Framework | |
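The Kneser-Ney family the abstract builds on is easy to illustrate at the bigram level. This is a minimal interpolated Kneser-Ney sketch with a single discount, not the paper's modified variant (which uses count-dependent discounts) and not its proposed extension:

```python
from collections import Counter

def kneser_ney_bigram(corpus, d=0.75):
    """Build an interpolated Kneser-Ney bigram model from a token list.

    Returns prob(w, h) = P(w | h): the discounted bigram estimate,
    interpolated with a continuation unigram that counts how many
    distinct contexts each word follows.
    """
    bigrams = list(zip(corpus, corpus[1:]))
    big_c = Counter(bigrams)
    uni_c = Counter(corpus[:-1])                  # history counts
    cont = Counter(w for (_, w) in set(bigrams))  # distinct-context counts
    n_bigram_types = len(set(bigrams))
    followers = Counter(h for (h, _) in set(bigrams))

    def prob(w, h):
        p_cont = cont[w] / n_bigram_types         # continuation probability
        lam = d * followers[h] / uni_c[h]         # mass freed by discounting
        return max(big_c[(h, w)] - d, 0) / uni_c[h] + lam * p_cont

    return prob
```

For any observed history, the probabilities over the vocabulary sum to one, which is exactly the mass-redistribution property that discounting plus interpolation is designed to preserve.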
A Recurrent Neural Cascade-Based Model for Continuous-Time Diffusion Process
Title | A Recurrent Neural Cascade-Based Model for Continuous-Time Diffusion Process |
Authors | Sylvain Lamprier |
Abstract | Many works have been proposed in the literature to capture the dynamics of diffusion in networks. While some of them define graphical Markovian models to extract temporal relationships between node infections in networks, others consider diffusion episodes as sequences of infections via recurrent neural models. In this paper we propose a model at the crossroads of these two extremes, which embeds the history of diffusion in infected nodes as hidden continuous states. Depending on the trajectory followed by the content before reaching a given node, the distribution of influence probabilities may vary. However, content trajectories are usually hidden in the data, which induces challenging learning problems. We propose a topological recurrent neural model which exhibits good experimental performance for diffusion modelling and prediction. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=SJNceh0cFX |
https://openreview.net/pdf?id=SJNceh0cFX | |
PWC | https://paperswithcode.com/paper/a-recurrent-neural-cascade-based-model-for |
Repo | |
Framework | |
Named Entity Recognition in Information Security Domain for Russian
Title | Named Entity Recognition in Information Security Domain for Russian |
Authors | Anastasiia Sirotina, Natalia Loukachevitch |
Abstract | In this paper we discuss the named entity recognition task for Russian texts related to cybersecurity. First of all, we describe the problems that arise in the course of labeling unstructured texts from the information security domain. We introduce guidelines for human annotators, according to which a corpus has been marked up. Then, a CRF-based system and different neural architectures have been implemented and applied to the corpus. The named entity recognition systems have been evaluated and compared to determine the most efficient one. |
Tasks | Named Entity Recognition |
Published | 2019-09-01 |
URL | https://www.aclweb.org/anthology/R19-1128/ |
https://www.aclweb.org/anthology/R19-1128 | |
PWC | https://paperswithcode.com/paper/named-entity-recognition-in-information |
Repo | |
Framework | |
Neural Chinese Address Parsing
Title | Neural Chinese Address Parsing |
Authors | Hao Li, Wei Lu, Pengjun Xie, Linlin Li |
Abstract | This paper introduces a new task – Chinese address parsing – the task of mapping Chinese addresses into semantically meaningful chunks. While it is possible to model this problem using a conventional sequence labelling approach, our observation is that there exist complex dependencies between labels that cannot be readily captured by a simple linear-chain structure. We investigate neural structured prediction models with latent variables to capture such rich structural information within Chinese addresses. We create and publicly release a new dataset consisting of 15K Chinese addresses, and conduct extensive experiments on the dataset to investigate the model effectiveness and robustness. We release our code and data at http://statnlp.org/research/sp. |
Tasks | Structured Prediction |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/N19-1346/ |
https://www.aclweb.org/anthology/N19-1346 | |
PWC | https://paperswithcode.com/paper/neural-chinese-address-parsing |
Repo | |
Framework | |
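The "conventional sequence labelling approach" the abstract contrasts itself with is typically decoded with linear-chain Viterbi. A minimal sketch of that baseline decoder (not the paper's latent-variable model; the score matrices here are placeholders):

```python
import numpy as np

def viterbi(emissions, transitions):
    """Max-scoring label sequence for a linear-chain model.

    emissions:   (T, K) per-token label scores.
    transitions: (K, K) score of moving from label i to label j.
    Returns the best label index sequence of length T.
    """
    T, K = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # cand[i, j] = score of ending at label j via previous label i
        cand = score[:, None] + transitions + emissions[t]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):       # follow backpointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

The paper's point is that address chunks exhibit label dependencies richer than what this first-order transition matrix can express, motivating latent-variable structured prediction.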
Exploiting Discourse-Level Segmentation for Extractive Summarization
Title | Exploiting Discourse-Level Segmentation for Extractive Summarization |
Authors | Zhengyuan Liu, Nancy Chen |
Abstract | Extractive summarization selects and concatenates the most essential text spans in a document. Most, if not all, neural approaches use sentences as the elementary unit to select content for summarization. However, semantic segments containing supplementary information or descriptive details are often nonessential in the generated summaries. In this work, we propose to exploit discourse-level segmentation as a finer-grained means to more precisely pinpoint the core content in a document. We investigate how the sub-sentential segmentation improves extractive summarization performance when content selection is modeled through two basic neural network architectures and a deep bi-directional transformer. Experiment results on the CNN/Daily Mail dataset show that discourse-level segmentation is effective in both cases. In particular, we achieve state-of-the-art performance when discourse-level segmentation is combined with our adapted contextual representation model. |
Tasks | |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-5415/ |
https://www.aclweb.org/anthology/D19-5415 | |
PWC | https://paperswithcode.com/paper/exploiting-discourse-level-segmentation-for |
Repo | |
Framework | |
SPARSE: Structured Prediction using Argument-Relative Structured Encoding
Title | SPARSE: Structured Prediction using Argument-Relative Structured Encoding |
Authors | Rishi Bommasani, Arzoo Katiyar, Claire Cardie |
Abstract | We propose structured encoding as a novel approach to learning representations for relations and events in neural structured prediction. Our approach explicitly leverages the structure of available relation and event metadata to generate these representations, which are parameterized by both the attribute structure of the metadata as well as the learned representation of the arguments of the relations and events. We consider affine, biaffine, and recurrent operators for building hierarchical representations and modelling underlying features. We apply our approach to the second-order structured prediction task studied in the 2016/2017 Belief and Sentiment analysis evaluations (BeSt): given a document and its entities, relations, and events (including metadata and mentions), determine the sentiment of each entity towards every relation and event in the document. Without task-specific knowledge sources or domain engineering, we significantly improve over systems and baselines that neglect the available metadata or its hierarchical structure. We observe across-the-board improvements on the BeSt 2016/2017 sentiment analysis task of at least 2.3 (absolute) and 10.6% (relative) F-measure over the previous state-of-the-art. |
Tasks | Sentiment Analysis, Structured Prediction |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/W19-1503/ |
https://www.aclweb.org/anthology/W19-1503 | |
PWC | https://paperswithcode.com/paper/sparse-structured-prediction-using-argument |
Repo | |
Framework | |
Deep Layers as Stochastic Solvers
Title | Deep Layers as Stochastic Solvers |
Authors | Adel Bibi, Bernard Ghanem, Vladlen Koltun, Rene Ranftl |
Abstract | We provide a novel perspective on the forward pass through a block of layers in a deep network. In particular, we show that a forward pass through a standard dropout layer followed by a linear layer and a non-linear activation is equivalent to optimizing a convex optimization objective with a single iteration of a $\tau$-nice Proximal Stochastic Gradient method. We further show that replacing standard Bernoulli dropout with additive dropout is equivalent to optimizing the same convex objective with a variance-reduced proximal method. By expressing both fully-connected and convolutional layers as special cases of a high-order tensor product, we unify the underlying convex optimization problem in the tensor setting and derive a formula for the Lipschitz constant $L$ used to determine the optimal step size of the above proximal methods. We conduct experiments with standard convolutional networks applied to the CIFAR-10 and CIFAR-100 datasets, and show that replacing a block of layers with multiple iterations of the corresponding solver, with step size set via $L$, consistently improves classification accuracy. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=ryxxCiRqYX |
https://openreview.net/pdf?id=ryxxCiRqYX | |
PWC | https://paperswithcode.com/paper/deep-layers-as-stochastic-solvers |
Repo | |
Framework | |
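The equivalence the abstract describes can be made concrete with a single step: sampling a subset of coordinates plays the role of the dropout mask, and the activation enters as a proximal operator (e.g., ReLU is the prox of the indicator of the nonnegative orthant). This is an illustrative τ-nice proximal SGD step under those stated assumptions, not the paper's derivation:

```python
import numpy as np

def tau_nice_prox_sgd_step(x, grad_full, tau, step, prox):
    """One tau-nice proximal stochastic-gradient step:
    sample tau of the n coordinates uniformly (a dropout-like mask),
    take an unbiased sparse gradient step on them, then apply the
    prox operator of the nonsmooth term."""
    n = x.size
    mask = np.zeros(n, dtype=bool)
    mask[np.random.choice(n, size=tau, replace=False)] = True
    g = np.where(mask, grad_full(x) * n / tau, 0.0)  # unbiased estimator
    return prox(x - step * g)

# ReLU is the proximal operator of the indicator of the nonnegative
# orthant, which is how the activation appears in the convex objective.
relu_prox = lambda z: np.maximum(z, 0.0)
```

With `tau = n` the mask is dense and the step reduces to deterministic proximal gradient descent, which makes the dropout-free linear-plus-ReLU layer the special case.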
A Neural Network Component for Knowledge-Based Semantic Representations of Text
Title | A Neural Network Component for Knowledge-Based Semantic Representations of Text |
Authors | Alejandro Piad-Morffis, Rafael Muñoz, Yoan Gutiérrez, Yudivian Almeida-Cruz, Suilan Estevez-Velarde, Andrés Montoyo |
Abstract | This paper presents Semantic Neural Networks (SNNs), a knowledge-aware component based on deep learning. SNNs can be trained to encode explicit semantic knowledge from an arbitrary knowledge base, and can subsequently be combined with other deep learning architectures. At prediction time, SNNs provide a semantic encoding extracted from the input data, which can be exploited by other neural network components to build extended representation models that can face alternative problems. The SNN architecture is defined in terms of the concepts and relations present in a knowledge base. Based on this architecture, a training procedure is developed. Finally, an experimental setup is presented to illustrate the behaviour and performance of an SNN for a specific NLP problem, in this case, opinion mining for the classification of movie reviews. |
Tasks | Opinion Mining |
Published | 2019-09-01 |
URL | https://www.aclweb.org/anthology/R19-1105/ |
https://www.aclweb.org/anthology/R19-1105 | |
PWC | https://paperswithcode.com/paper/a-neural-network-component-for-knowledge |
Repo | |
Framework | |
Aligning Open IE Relations and KB Relations using a Siamese Network Based on Word Embedding
Title | Aligning Open IE Relations and KB Relations using a Siamese Network Based on Word Embedding |
Authors | Rifki Afina Putri, Giwon Hong, Sung-Hyon Myaeng |
Abstract | Open Information Extraction (Open IE) aims to generate entity-relation-entity triples from a large amount of text, capturing its key semantics. Given a triple, the relation expresses the type of semantic relation between the entities. Although relations from an Open IE system are more extensible than those used in a traditional Information Extraction system and a Knowledge Base (KB) such as Knowledge Graphs, the former lack semantics; an Open IE relation is simply a sequence of words, whereas a KB relation has a predefined meaning. As a way to provide a meaning to an Open IE relation, we attempt to align it with one of the predefined set of relations used in a KB. Our approach is to use a Siamese network that compares two sequences of word embeddings representing an Open IE relation and a predefined KB relation. In order to make the approach practical, we automatically generate a training dataset using a distant supervision approach instead of relying on a hand-labeled dataset. Our experiment shows that the proposed method can capture the relational semantics better than the recent approaches. |
Tasks | Knowledge Graphs, Open Information Extraction, Word Embeddings |
Published | 2019-05-01 |
URL | https://www.aclweb.org/anthology/W19-0412/ |
https://www.aclweb.org/anthology/W19-0412 | |
PWC | https://paperswithcode.com/paper/aligning-open-ie-relations-and-kb-relations |
Repo | |
Framework | |
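The Siamese comparison the abstract describes can be sketched in a few lines: one shared branch encodes each relation's word-embedding sequence, and a similarity score compares the two encodings. The pooling, projection size, and cosine scoring below are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))          # shared branch weights (hypothetical dims)

def encode(seq_emb):
    """Shared Siamese branch: mean-pool the word embeddings of a
    relation phrase, then apply a learned linear projection."""
    return np.tanh(W @ seq_emb.mean(axis=0))

def similarity(open_ie_rel, kb_rel):
    """Cosine similarity between the two branch outputs; both inputs
    are (num_words, emb_dim) arrays of word embeddings."""
    a, b = encode(open_ie_rel), encode(kb_rel)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Because both branches share `W`, an Open IE phrase and a KB relation land in the same space, so the KB relation with the highest similarity can be taken as the alignment.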