October 15, 2019

Paper Group NANR 248

An Empirical Study of Machine Translation for the Shared Task of WMT18

Title An Empirical Study of Machine Translation for the Shared Task of WMT18
Authors Chao Bei, Hao Zong, Yiming Wang, Baoyong Fan, Shiqi Li, Conghu Yuan
Abstract This paper describes the Global Tone Communication Co., Ltd.'s submission to the WMT18 shared news translation task. We participated in the English-to-Chinese direction and achieved the best BLEU score (43.8) among all participants. The submitted system focuses on data cleaning and on techniques to build a competitive model for this task. Unlike other participants, we rely mainly on data filtering to obtain the best BLEU score. We filter not only the provided sentences but also the back-translated sentences. The techniques we apply for data filtering include filtering by rules, by language models and by translation models. We also conduct several experiments to validate the effectiveness of the training techniques. According to our experiments, the Annealing Adam optimizer and ensemble decoding are the most effective techniques for model training.
Tasks Chinese Word Segmentation, Language Modelling, Machine Translation, Tokenization
Published 2018-10-01
URL https://www.aclweb.org/anthology/W18-6404/
PDF https://www.aclweb.org/anthology/W18-6404
PWC https://paperswithcode.com/paper/an-empirical-study-of-machine-translation-for
Repo
Framework
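The pipeline described above, rule-based filtering plus model-based scoring of both the provided and the back-translated sentences, can be sketched as follows. The length-ratio rule, the score threshold and the toy language model here are illustrative assumptions, not the authors' actual settings.

```python
def length_ratio_ok(src, tgt, max_ratio=2.0):
    """Rule-based filter: discard pairs with an extreme length mismatch."""
    ls, lt = max(len(src.split()), 1), max(len(tgt.split()), 1)
    return max(ls, lt) / min(ls, lt) <= max_ratio

def filter_corpus(pairs, lm_score, threshold=-6.0):
    """Keep pairs that pass the rules and a target-side language-model
    score (average log-probability per token; higher means more fluent)."""
    kept = []
    for src, tgt in pairs:
        if not length_ratio_ok(src, tgt):
            continue
        if lm_score(tgt) < threshold:
            continue
        kept.append((src, tgt))
    return kept

# Toy usage with a stand-in "language model" that penalizes rare tokens.
vocab = {"the", "cat", "sat", "on", "mat"}
toy_lm = lambda s: sum(0.0 if w in vocab else -10.0
                       for w in s.split()) / max(len(s.split()), 1)
pairs = [("die katze", "the cat sat"),
         ("x", "zzzz qqqq wwww eeee rrrr zzzz qqqq wwww")]
print(filter_corpus(pairs, toy_lm))  # only the first pair survives
```

A real system would replace `toy_lm` with a trained language model and add translation-model scoring as a further filter.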

The RWTH Aachen University English-German and German-English Unsupervised Neural Machine Translation Systems for WMT 2018

Title The RWTH Aachen University English-German and German-English Unsupervised Neural Machine Translation Systems for WMT 2018
Authors Miguel Graça, Yunsu Kim, Julian Schamper, Jiahui Geng, Hermann Ney
Abstract This paper describes the unsupervised neural machine translation (NMT) systems of the RWTH Aachen University developed for the English ↔ German news translation task of the EMNLP 2018 Third Conference on Machine Translation (WMT 2018). Our work is based on iterative back-translation using a shared encoder-decoder NMT model. We extensively compare different vocabulary types, word embedding initialization schemes and optimization methods for our model. We also investigate gating and weight normalization for the word embedding layer.
Tasks Machine Translation, Tokenization, Word Embeddings
Published 2018-10-01
URL https://www.aclweb.org/anthology/W18-6409/
PDF https://www.aclweb.org/anthology/W18-6409
PWC https://paperswithcode.com/paper/the-rwth-aachen-university-english-german-and
Repo
Framework

Adaptive Path-Integral Autoencoders: Representation Learning and Planning for Dynamical Systems

Title Adaptive Path-Integral Autoencoders: Representation Learning and Planning for Dynamical Systems
Authors Jung-Su Ha, Young-Jin Park, Hyeok-Joo Chae, Soon-Seo Park, Han-Lim Choi
Abstract We present a representation learning algorithm that learns a low-dimensional latent dynamical system from high-dimensional sequential raw data, e.g., video. The framework builds upon recent advances in amortized inference methods that use both an inference network and a refinement procedure to output samples from a variational distribution given an observation sequence, and takes advantage of the duality between control and inference to approximately solve the intractable inference problem using the path integral control approach. The learned dynamical model can be used to predict and plan future states; we also present an efficient planning method that exploits the learned low-dimensional latent dynamics. Numerical experiments show that the proposed path-integral-control-based variational inference method leads to tighter lower bounds in statistical model learning of sequential data. Supplementary video: https://youtu.be/xCp35crUoLQ
Tasks Representation Learning
Published 2018-12-01
URL http://papers.nips.cc/paper/8108-adaptive-path-integral-autoencoders-representation-learning-and-planning-for-dynamical-systems
PDF http://papers.nips.cc/paper/8108-adaptive-path-integral-autoencoders-representation-learning-and-planning-for-dynamical-systems.pdf
PWC https://paperswithcode.com/paper/adaptive-path-integral-autoencoders
Repo
Framework

Developing New Linguistic Resources and Tools for the Galician Language

Title Developing New Linguistic Resources and Tools for the Galician Language
Authors Rodrigo Agerri, Xavier Gómez Guinovart, German Rigau, Miguel Anxo Solla Portela
Abstract
Tasks Lemmatization, Named Entity Recognition, Word Sense Disambiguation
Published 2018-05-01
URL https://www.aclweb.org/anthology/L18-1367/
PDF https://www.aclweb.org/anthology/L18-1367
PWC https://paperswithcode.com/paper/developing-new-linguistic-resources-and-tools
Repo
Framework

The Karlsruhe Institute of Technology Systems for the News Translation Task in WMT 2018

Title The Karlsruhe Institute of Technology Systems for the News Translation Task in WMT 2018
Authors Ngoc-Quan Pham, Jan Niehues, Alexander Waibel
Abstract We present our experiments in the scope of the news translation task of WMT 2018, in the English→German direction. The core of our systems is the encoder-decoder-based neural machine translation model using the Transformer architecture. We enhanced the model with a deeper architecture. By using techniques to limit memory consumption, we were able to train models that are 4 times larger on one GPU and to improve performance by 1.2 BLEU points. Furthermore, we performed sentence selection for the newly available ParaCrawl corpus, improving its effectiveness by 0.5 BLEU points.
Tasks Machine Translation, Tokenization
Published 2018-10-01
URL https://www.aclweb.org/anthology/W18-6422/
PDF https://www.aclweb.org/anthology/W18-6422
PWC https://paperswithcode.com/paper/the-karlsruhe-institute-of-technology-systems
Repo
Framework

A Linguistically-Informed Fusion Approach for Multimodal Depression Detection

Title A Linguistically-Informed Fusion Approach for Multimodal Depression Detection
Authors Michelle Morales, Stefan Scherer, Rivka Levitan
Abstract Automated depression detection is inherently a multimodal problem. Therefore, it is critical that researchers investigate fusion techniques for multimodal design. This paper presents the first-ever comprehensive study of fusion techniques for depression detection. In addition, we present novel linguistically-motivated fusion techniques, which we find outperform existing approaches.
Tasks
Published 2018-06-01
URL https://www.aclweb.org/anthology/W18-0602/
PDF https://www.aclweb.org/anthology/W18-0602
PWC https://paperswithcode.com/paper/a-linguistically-informed-fusion-approach-for
Repo
Framework

The RWTH Aachen University Supervised Machine Translation Systems for WMT 2018

Title The RWTH Aachen University Supervised Machine Translation Systems for WMT 2018
Authors Julian Schamper, Jan Rosendahl, Parnia Bahar, Yunsu Kim, Arne Nix, Hermann Ney
Abstract This paper describes the statistical machine translation systems developed at RWTH Aachen University for the German→English, English→Turkish and Chinese→English translation tasks of the EMNLP 2018 Third Conference on Machine Translation (WMT 2018). We use ensembles of neural machine translation systems based on the Transformer architecture. Our main focus is on the German→English task, where we scored first with respect to all automatic metrics provided by the organizers. We identify data selection, fine-tuning, batch size and model dimension as important hyperparameters. In total we improve by 6.8% BLEU over our last year's submission and by 4.8% BLEU over the winning system of the 2017 German→English task. In the English→Turkish task, we show a 3.6% BLEU improvement over last year's winning system. We further report results on the Chinese→English task, where we improve by 2.2% BLEU on average over our baseline systems but stay behind the 2018 winning systems.
Tasks Machine Translation, Tokenization
Published 2018-10-01
URL https://www.aclweb.org/anthology/W18-6426/
PDF https://www.aclweb.org/anthology/W18-6426
PWC https://paperswithcode.com/paper/the-rwth-aachen-university-supervised-machine
Repo
Framework

Reusable workflows for gender prediction

Title Reusable workflows for gender prediction
Authors Matej Martinc, Senja Pollak
Abstract
Tasks Feature Engineering, Gender Prediction, Text Classification
Published 2018-05-01
URL https://www.aclweb.org/anthology/L18-1082/
PDF https://www.aclweb.org/anthology/L18-1082
PWC https://paperswithcode.com/paper/reusable-workflows-for-gender-prediction
Repo
Framework

Deep Linear Networks with Arbitrary Loss: All Local Minima Are Global

Title Deep Linear Networks with Arbitrary Loss: All Local Minima Are Global
Authors Thomas Laurent, James von Brecht
Abstract We consider deep linear networks with arbitrary convex differentiable loss. We provide a short and elementary proof of the fact that all local minima are global minima if the hidden layers are either 1) at least as wide as the input layer, or 2) at least as wide as the output layer. This result is the strongest possible in the following sense: If the loss is convex and Lipschitz but not differentiable then deep linear networks can have sub-optimal local minima.
Tasks
Published 2018-07-01
URL https://icml.cc/Conferences/2018/Schedule?showEvent=1993
PDF http://proceedings.mlr.press/v80/laurent18a/laurent18a.pdf
PWC https://paperswithcode.com/paper/deep-linear-networks-with-arbitrary-loss-all
Repo
Framework

Mutual Information Neural Estimation

Title Mutual Information Neural Estimation
Authors Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeshwar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, Devon Hjelm
Abstract We argue that the estimation of mutual information between high dimensional continuous random variables can be achieved by gradient descent over neural networks. We present a Mutual Information Neural Estimator (MINE) that is linearly scalable in dimensionality as well as in sample size, trainable through back-prop, and strongly consistent. We present a handful of applications on which MINE can be used to minimize or maximize mutual information. We apply MINE to improve adversarially trained generative models. We also use MINE to implement the Information Bottleneck, applying it to supervised classification; our results demonstrate substantial improvement in flexibility and performance in these settings.
Tasks
Published 2018-07-01
URL https://icml.cc/Conferences/2018/Schedule?showEvent=2440
PDF http://proceedings.mlr.press/v80/belghazi18a/belghazi18a.pdf
PWC https://paperswithcode.com/paper/mutual-information-neural-estimation
Repo
Framework
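The core of MINE is the Donsker-Varadhan lower bound I(X;Z) ≥ E_P[T] − log E_{P_X ⊗ P_Z}[e^T], maximized over a neural statistics network T. The sketch below evaluates that bound numerically with a fixed, untrained T; the statistics network and its gradient-descent training, which are the paper's actual contribution, are omitted for brevity.

```python
import numpy as np

def dv_lower_bound(t_joint, t_marginal):
    """Donsker-Varadhan bound: E_P[T] - log E_{P_X (x) P_Z}[exp(T)]."""
    return t_joint.mean() - np.log(np.exp(t_marginal).mean())

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
z = x + 0.1 * rng.normal(size=n)   # strongly dependent pair

T = lambda a, b: 0.5 * a * b       # fixed statistics function (MINE would train this)
bound = dv_lower_bound(T(x, z), T(x, rng.permutation(z)))  # shuffling z mimics the product of marginals
print(bound)  # a valid lower bound on I(X; Z) = 0.5 * log(101) ≈ 2.31 nats
```

Training T to maximize the bound, as MINE does, would tighten the estimate toward the true mutual information.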

Revisiting Dilated Convolution: A Simple Approach for Weakly- and Semi-Supervised Semantic Segmentation

Title Revisiting Dilated Convolution: A Simple Approach for Weakly- and Semi-Supervised Semantic Segmentation
Authors Yunchao Wei, Huaxin Xiao, Honghui Shi, Zequn Jie, Jiashi Feng, Thomas S. Huang
Abstract Despite remarkable progress, weakly supervised segmentation methods are still inferior to their fully supervised counterparts. We observe that the performance gap mainly comes from the inability to produce dense and integral pixel-level object localization for training images with only image-level labels. In this work, we revisit the dilated convolution proposed in [1] and shed light on how it enables the classification network to generate dense object localization. By substantially enlarging the receptive fields of convolutional kernels with different dilation rates, the classification network can localize object regions even when they are not so discriminative for classification, and finally produce reliable object regions that benefit both weakly- and semi-supervised semantic segmentation. Despite the apparent simplicity of dilated convolution, we are able to obtain superior performance for semantic segmentation tasks. In particular, it achieves 60.8% and 67.6% mean Intersection-over-Union (mIoU) on the Pascal VOC 2012 test set in the weakly-supervised (only image-level labels are available) and semi-supervised (1,464 segmentation masks are available) settings, which set a new state of the art.
Tasks Object Localization, Semantic Segmentation, Semi-Supervised Semantic Segmentation
Published 2018-06-01
URL http://openaccess.thecvf.com/content_cvpr_2018/html/Wei_Revisiting_Dilated_Convolution_CVPR_2018_paper.html
PDF http://openaccess.thecvf.com/content_cvpr_2018/papers/Wei_Revisiting_Dilated_Convolution_CVPR_2018_paper.pdf
PWC https://paperswithcode.com/paper/revisiting-dilated-convolution-a-simple
Repo
Framework
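The mechanism the abstract relies on, enlarging receptive fields by spacing the kernel taps `rate` elements apart without adding parameters, is easy to see in 1D. A toy sketch (not the authors' implementation):

```python
import numpy as np

def dilated_conv1d(x, kernel, rate):
    """'Same'-padded 1D convolution whose kernel taps are `rate` apart,
    so the receptive field grows with the rate at constant parameter count."""
    k = len(kernel)
    span = (k - 1) * rate          # receptive field minus one
    pad = span // 2
    xp = np.pad(x, (pad, span - pad))
    return np.array([sum(kernel[j] * xp[i + j * rate] for j in range(k))
                     for i in range(len(x))])

x = np.zeros(9)
x[4] = 1.0                         # unit impulse
for rate in (1, 2, 4):             # larger rate -> wider impulse response
    print(rate, dilated_conv1d(x, [1.0, 1.0, 1.0], rate))
```

The impulse response of the 3-tap kernel spreads from 3 adjacent positions at rate 1 to positions 8 apart at rate 4, which is exactly the receptive-field enlargement exploited for dense localization.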

A Neural Question Answering Model Based on Semi-Structured Tables

Title A Neural Question Answering Model Based on Semi-Structured Tables
Authors Hao Wang, Xiaodong Zhang, Shuming Ma, Xu Sun, Houfeng Wang, Mengxiang Wang
Abstract Most question answering (QA) systems are based on raw text and structured knowledge graphs. However, raw text corpora are hard for QA systems to understand, and structured knowledge graphs need intensive manual work, while it is relatively easy to obtain semi-structured tables from many sources directly, or to build them automatically. In this paper, we build an end-to-end system that answers multiple-choice questions with semi-structured tables as its knowledge. Our system answers queries in two steps. First, it finds the most similar tables. Then the system measures the relevance between the question and each candidate table cell, and chooses the most related cell as the source of the answer. The system is evaluated on the TabMCQ dataset and achieves a large improvement over the state of the art.
Tasks Knowledge Graphs, Question Answering
Published 2018-08-01
URL https://www.aclweb.org/anthology/C18-1165/
PDF https://www.aclweb.org/anthology/C18-1165
PWC https://paperswithcode.com/paper/a-neural-question-answering-model-based-on
Repo
Framework
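The second step, scoring the relevance between the question and each candidate cell and picking the best cell, reduces in the simplest embedding view to a nearest-neighbour search. A minimal sketch with made-up vectors; the paper's actual neural scoring model is not reproduced here:

```python
import numpy as np

def best_cell(question_vec, cell_vecs):
    """Return the index of the cell most similar to the question (cosine)."""
    q = question_vec / np.linalg.norm(question_vec)
    C = cell_vecs / np.linalg.norm(cell_vecs, axis=1, keepdims=True)
    return int(np.argmax(C @ q))

question = np.array([1.0, 0.0, 1.0])   # hypothetical question embedding
cells = np.array([[0.0, 1.0, 0.0],     # unrelated cell
                  [1.0, 0.1, 0.9],     # near-match
                  [-1.0, 0.0, -1.0]])  # opposite meaning
print(best_cell(question, cells))      # → 1
```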

A Linear Speedup Analysis of Distributed Deep Learning with Sparse and Quantized Communication

Title A Linear Speedup Analysis of Distributed Deep Learning with Sparse and Quantized Communication
Authors Peng Jiang, Gagan Agrawal
Abstract The large communication overhead has imposed a bottleneck on the performance of distributed Stochastic Gradient Descent (SGD) for training deep neural networks. Previous works have demonstrated the potential of using gradient sparsification and quantization to reduce the communication cost. However, there is still a lack of understanding about how sparse and quantized communication affects the convergence rate of the training algorithm. In this paper, we study the convergence rate of distributed SGD for non-convex optimization with two communication reducing strategies: sparse parameter averaging and gradient quantization. We show that $O(1/\sqrt{MK})$ convergence rate can be achieved if the sparsification and quantization hyperparameters are configured properly. We also propose a strategy called periodic quantized averaging (PQASGD) that further reduces the communication cost while preserving the $O(1/\sqrt{MK})$ convergence rate. Our evaluation validates our theoretical results and shows that our PQASGD can converge as fast as full-communication SGD with only 3%-5% of the communication data size.
Tasks Quantization
Published 2018-12-01
URL http://papers.nips.cc/paper/7519-a-linear-speedup-analysis-of-distributed-deep-learning-with-sparse-and-quantized-communication
PDF http://papers.nips.cc/paper/7519-a-linear-speedup-analysis-of-distributed-deep-learning-with-sparse-and-quantized-communication.pdf
PWC https://paperswithcode.com/paper/a-linear-speedup-analysis-of-distributed-deep
Repo
Framework
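Gradient quantization of the kind analyzed here maps each coordinate onto a small set of levels while keeping the estimate unbiased, so convergence is preserved. The sketch below implements one common unbiased stochastic rounding scheme; it is an illustrative variant, not necessarily the exact operator studied in the paper.

```python
import numpy as np

def stochastic_quantize(g, levels=4, seed=0):
    """Unbiased stochastic quantization: each |g_i| is mapped onto one of
    `levels` uniform levels of max|g|, rounding up with probability equal
    to the fractional part, so E[output] = g."""
    scale = np.max(np.abs(g))
    if scale == 0.0:
        return g.copy()
    y = np.abs(g) / scale * levels
    lower = np.floor(y)
    round_up = np.random.default_rng(seed).random(g.shape) < (y - lower)
    return np.sign(g) * (lower + round_up) * scale / levels

g = np.array([0.8, -0.3, 0.05, 0.0])
q = stochastic_quantize(g)
print(q)  # every entry lands on a multiple of max|g| / levels = 0.2
```

Each worker would then transmit only the level indices plus the scale, which is where the communication saving comes from.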

Provable Variable Selection for Streaming Features

Title Provable Variable Selection for Streaming Features
Authors Jing Wang, Jie Shen, Ping Li
Abstract In large-scale machine learning applications and high-dimensional statistics, it is ubiquitous to deal with a considerable number of features, many of which are redundant. As a remedy, online feature selection has attracted increasing attention in recent years. It sequentially reveals features and evaluates their importance. Though online feature selection has proven to be an elegant methodology, it is usually challenging to carry out a rigorous theoretical characterization. In this work, we propose a provable online feature selection algorithm that utilizes the online leverage score. The selected features are then fed to $k$-means clustering, making the clustering step memory- and computationally efficient. We prove that, with high probability, performing $k$-means clustering based on the selected feature space does not deviate far from the optimal clustering using the original data. The empirical results on real-world data sets demonstrate the effectiveness of our algorithm.
Tasks Feature Selection
Published 2018-07-01
URL https://icml.cc/Conferences/2018/Schedule?showEvent=2352
PDF http://proceedings.mlr.press/v80/wang18g/wang18g.pdf
PWC https://paperswithcode.com/paper/provable-variable-selection-for-streaming
Repo
Framework
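Leverage scores measure how much each feature contributes to the dominant singular subspace of the data. The batch sketch below illustrates the idea on synthetic data; the paper's algorithm is online/streaming, so this offline SVD-based version is only a conceptual stand-in.

```python
import numpy as np

def top_leverage_features(X, k):
    """Score feature j by the squared norm of row j of the top-k right
    singular subspace of X, then keep the k highest-scoring features."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    scores = np.sum(Vt[:k].T ** 2, axis=1)
    return np.argsort(scores)[::-1][:k], scores

# Synthetic data: 3 strong structured features plus 4 near-zero noise features.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 2.0 * np.pi, 200)
signal = 5.0 * np.column_stack([np.sin(t), np.cos(t), np.sin(2.0 * t)])
noise = 0.01 * rng.normal(size=(200, 4))
X = np.hstack([signal, noise])

idx, scores = top_leverage_features(X, k=3)
print(sorted(int(i) for i in idx))  # the structured columns 0, 1, 2 dominate
```

Running $k$-means on the selected columns instead of all of X is what makes the clustering step memory- and computationally efficient.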

Revisiting the Task of Scoring Open IE Relations

Title Revisiting the Task of Scoring Open IE Relations
Authors William Léchelle, Philippe Langlais
Abstract
Tasks Knowledge Base Completion, Language Modelling, Open Information Extraction
Published 2018-05-01
URL https://www.aclweb.org/anthology/L18-1323/
PDF https://www.aclweb.org/anthology/L18-1323
PWC https://paperswithcode.com/paper/revisiting-the-task-of-scoring-open-ie
Repo
Framework