Paper Group NANR 248
An Empirical Study of Machine Translation for the Shared Task of WMT18. The RWTH Aachen University English-German and German-English Unsupervised Neural Machine Translation Systems for WMT 2018. Adaptive Path-Integral Autoencoders: Representation Learning and Planning for Dynamical Systems. Developing New Linguistic Resources and Tools for the Galician Language …
An Empirical Study of Machine Translation for the Shared Task of WMT18
Title | An Empirical Study of Machine Translation for the Shared Task of WMT18 |
Authors | Chao Bei, Hao Zong, Yiming Wang, Baoyong Fan, Shiqi Li, Conghu Yuan |
Abstract | This paper describes the Global Tone Communication Co., Ltd.'s submission to the WMT18 shared news translation task. We participated in the English-to-Chinese direction and obtained the best BLEU score (43.8) among all participants. The submitted system focuses on data cleaning and on techniques for building a competitive model for this task. Unlike other participants, the submitted system relies mainly on data filtering to obtain the best BLEU score. We filter not only the provided sentences but also the back-translated sentences. The techniques we apply for data filtering include filtering by rules, by language models, and by translation models. We also conduct several experiments to validate the effectiveness of training techniques. According to our experiments, the annealing Adam optimizer and ensemble decoding are the most effective techniques for model training. |
Tasks | Chinese Word Segmentation, Language Modelling, Machine Translation, Tokenization |
Published | 2018-10-01 |
URL | https://www.aclweb.org/anthology/W18-6404/ |
https://www.aclweb.org/anthology/W18-6404 | |
PWC | https://paperswithcode.com/paper/an-empirical-study-of-machine-translation-for |
Repo | |
Framework | |
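The filtering cascade the abstract outlines (rules first, then language-model and translation-model scores) can be pictured with a short, hedged sketch. Everything below is illustrative: `lm_score`, `tm_score`, and the thresholds are placeholders, not the authors' actual components.

```python
# Hedged sketch of a rule/LM/TM filtering cascade of the kind the abstract
# describes; lm_score, tm_score, and all thresholds are illustrative
# placeholders, not the authors' components.

def filter_parallel(pairs, lm_score, tm_score,
                    max_len_ratio=2.0, lm_thresh=-8.0, tm_thresh=-6.0):
    """Keep (source, target) token-list pairs passing rule, LM, and TM filters."""
    kept = []
    for src, tgt in pairs:
        # Rule filter: drop pairs whose token-length ratio is extreme.
        ratio = max(len(src), len(tgt)) / max(1, min(len(src), len(tgt)))
        if ratio > max_len_ratio:
            continue
        # LM filter: the target side should be fluent under a language model.
        # TM filter: the pair should score well under a translation model.
        if lm_score(tgt) >= lm_thresh and tm_score(src, tgt) >= tm_thresh:
            kept.append((src, tgt))
    return kept
```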
The RWTH Aachen University English-German and German-English Unsupervised Neural Machine Translation Systems for WMT 2018
Title | The RWTH Aachen University English-German and German-English Unsupervised Neural Machine Translation Systems for WMT 2018 |
Authors | Miguel Graça, Yunsu Kim, Julian Schamper, Jiahui Geng, Hermann Ney |
Abstract | This paper describes the unsupervised neural machine translation (NMT) systems of the RWTH Aachen University developed for the English ↔ German news translation task of the *EMNLP 2018 Third Conference on Machine Translation* (WMT 2018). Our work is based on iterative back-translation using a shared encoder-decoder NMT model. We extensively compare different vocabulary types, word embedding initialization schemes and optimization methods for our model. We also investigate gating and weight normalization for the word embedding layer. |
Tasks | Machine Translation, Tokenization, Word Embeddings |
Published | 2018-10-01 |
URL | https://www.aclweb.org/anthology/W18-6409/ |
https://www.aclweb.org/anthology/W18-6409 | |
PWC | https://paperswithcode.com/paper/the-rwth-aachen-university-english-german-and |
Repo | |
Framework | |
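Iterative back-translation with a shared model, as described in the abstract above, follows a simple loop. The sketch below assumes a hypothetical `model` object with `translate` and `fit` methods; it illustrates the general technique, not the authors' code.

```python
# Illustrative loop for iterative back-translation with a shared
# encoder-decoder; model.translate / model.fit are hypothetical APIs.

def iterative_back_translation(model, mono_src, mono_tgt, rounds=3):
    for _ in range(rounds):
        # Back-translate each side's monolingual data with the current model.
        synth_src = [model.translate(t, direction="tgt->src") for t in mono_tgt]
        synth_tgt = [model.translate(s, direction="src->tgt") for s in mono_src]
        # Synthetic-source pairs supervise src->tgt training and vice versa,
        # so each round's model produces better synthetic data for the next.
        model.fit(src_tgt_pairs=list(zip(synth_src, mono_tgt)),
                  tgt_src_pairs=list(zip(synth_tgt, mono_src)))
    return model
```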
Adaptive Path-Integral Autoencoders: Representation Learning and Planning for Dynamical Systems
Title | Adaptive Path-Integral Autoencoders: Representation Learning and Planning for Dynamical Systems |
Authors | Jung-Su Ha, Young-Jin Park, Hyeok-Joo Chae, Soon-Seo Park, Han-Lim Choi |
Abstract | We present a representation learning algorithm that learns a low-dimensional latent dynamical system from high-dimensional sequential raw data, e.g., video. The framework builds upon recent advances in amortized inference methods that use both an inference network and a refinement procedure to output samples from a variational distribution given an observation sequence, and takes advantage of the duality between control and inference to approximately solve the intractable inference problem using the path integral control approach. The learned dynamical model can be used to predict and plan future states; we also present an efficient planning method that exploits the learned low-dimensional latent dynamics. Numerical experiments show that the proposed path-integral control based variational inference method leads to tighter lower bounds in statistical model learning of sequential data. Supplementary video: https://youtu.be/xCp35crUoLQ |
Tasks | Representation Learning |
Published | 2018-12-01 |
URL | http://papers.nips.cc/paper/8108-adaptive-path-integral-autoencoders-representation-learning-and-planning-for-dynamical-systems |
http://papers.nips.cc/paper/8108-adaptive-path-integral-autoencoders-representation-learning-and-planning-for-dynamical-systems.pdf | |
PWC | https://paperswithcode.com/paper/adaptive-path-integral-autoencoders |
Repo | |
Framework | |
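For orientation, the sequence-model evidence lower bound that amortized variational methods of this kind tighten can be written as follows (the notation is ours, not the paper's); per the abstract, the refinement procedure uses path-integral control to push the variational posterior $q_\phi$ closer to the true posterior, tightening this bound.

```latex
\log p_\theta(x_{1:T}) \;\ge\;
  \mathbb{E}_{q_\phi(z_{1:T}\mid x_{1:T})}\!\left[\log p_\theta(x_{1:T}\mid z_{1:T})\right]
  - \mathrm{KL}\!\left(q_\phi(z_{1:T}\mid x_{1:T}) \,\middle\|\, p_\theta(z_{1:T})\right)
```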
Developing New Linguistic Resources and Tools for the Galician Language
Title | Developing New Linguistic Resources and Tools for the Galician Language |
Authors | Rodrigo Agerri, Xavier Gómez Guinovart, German Rigau, Miguel Anxo Solla Portela |
Abstract | |
Tasks | Lemmatization, Named Entity Recognition, Word Sense Disambiguation |
Published | 2018-05-01 |
URL | https://www.aclweb.org/anthology/L18-1367/ |
https://www.aclweb.org/anthology/L18-1367 | |
PWC | https://paperswithcode.com/paper/developing-new-linguistic-resources-and-tools |
Repo | |
Framework | |
The Karlsruhe Institute of Technology Systems for the News Translation Task in WMT 2018
Title | The Karlsruhe Institute of Technology Systems for the News Translation Task in WMT 2018 |
Authors | Ngoc-Quan Pham, Jan Niehues, Alexander Waibel |
Abstract | We present our experiments within the scope of the news translation task of WMT 2018, in the English→German direction. The core of our systems is encoder-decoder based neural machine translation models using the Transformer architecture. We enhanced the model with a deeper architecture. By using techniques to limit memory consumption, we were able to train models that are 4 times larger on one GPU and improve the performance by 1.2 BLEU points. Furthermore, we performed sentence selection for the newly available ParaCrawl corpus, which improved the effectiveness of the corpus by 0.5 BLEU points. |
Tasks | Machine Translation, Tokenization |
Published | 2018-10-01 |
URL | https://www.aclweb.org/anthology/W18-6422/ |
https://www.aclweb.org/anthology/W18-6422 | |
PWC | https://paperswithcode.com/paper/the-karlsruhe-institute-of-technology-systems |
Repo | |
Framework | |
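The abstract does not say which memory-limiting techniques were used, but gradient accumulation is one standard way to fit a much larger model on a single GPU; a minimal sketch, offered as illustration only:

```python
import torch

# Illustration only: gradient accumulation trades extra forward/backward
# passes for a smaller per-step memory footprint (the abstract does not
# specify the authors' exact technique).

def accumulated_step(model, optimizer, loss_fn, micro_batches):
    optimizer.zero_grad()
    for x, y in micro_batches:
        # Scale each micro-batch loss so the summed gradient matches the
        # gradient of one large batch.
        loss = loss_fn(model(x), y) / len(micro_batches)
        loss.backward()  # gradients accumulate in .grad across calls
    optimizer.step()
```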
A Linguistically-Informed Fusion Approach for Multimodal Depression Detection
Title | A Linguistically-Informed Fusion Approach for Multimodal Depression Detection |
Authors | Michelle Morales, Stefan Scherer, Rivka Levitan |
Abstract | Automated depression detection is inherently a multimodal problem. Therefore, it is critical that researchers investigate fusion techniques for multimodal design. This paper presents the first-ever comprehensive study of fusion techniques for depression detection. In addition, we present novel linguistically-motivated fusion techniques, which we find outperform existing approaches. |
Tasks | |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/W18-0602/ |
https://www.aclweb.org/anthology/W18-0602 | |
PWC | https://paperswithcode.com/paper/a-linguistically-informed-fusion-approach-for |
Repo | |
Framework | |
The RWTH Aachen University Supervised Machine Translation Systems for WMT 2018
Title | The RWTH Aachen University Supervised Machine Translation Systems for WMT 2018 |
Authors | Julian Schamper, Jan Rosendahl, Parnia Bahar, Yunsu Kim, Arne Nix, Hermann Ney |
Abstract | This paper describes the statistical machine translation systems developed at RWTH Aachen University for the German→English, English→Turkish and Chinese→English translation tasks of the EMNLP 2018 Third Conference on Machine Translation (WMT 2018). We use ensembles of neural machine translation systems based on the Transformer architecture. Our main focus is on the German→English task, where we scored first with respect to all automatic metrics provided by the organizers. We identify data selection, fine-tuning, batch size and model dimension as important hyperparameters. In total we improve by 6.8% BLEU over our last year's submission and by 4.8% BLEU over the winning system of the 2017 German→English task. In the English→Turkish task, we show a 3.6% BLEU improvement over last year's winning system. We further report results on the Chinese→English task, where we improve by 2.2% BLEU on average over our baseline systems but stay behind the 2018 winning systems. |
Tasks | Machine Translation, Tokenization |
Published | 2018-10-01 |
URL | https://www.aclweb.org/anthology/W18-6426/ |
https://www.aclweb.org/anthology/W18-6426 | |
PWC | https://paperswithcode.com/paper/the-rwth-aachen-university-supervised-machine |
Repo | |
Framework | |
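Ensemble decoding for NMT typically averages the member models' next-token distributions at each search step; a minimal sketch under that assumption (`next_token_probs` is a hypothetical per-model API, not the authors' code):

```python
import torch

# Minimal sketch of ensemble decoding: combine member models' per-step
# output distributions in probability space, then return log-probabilities.

def ensemble_next_token_logprobs(models, src, prefix):
    probs = torch.stack([m.next_token_probs(src, prefix) for m in models])
    return torch.log(probs.mean(dim=0))  # average of the M distributions
```

A beam search would then expand hypotheses using these combined log-probabilities.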
Reusable workflows for gender prediction
Title | Reusable workflows for gender prediction |
Authors | Matej Martinc, Senja Pollak |
Abstract | |
Tasks | Feature Engineering, Gender Prediction, Text Classification |
Published | 2018-05-01 |
URL | https://www.aclweb.org/anthology/L18-1082/ |
https://www.aclweb.org/anthology/L18-1082 | |
PWC | https://paperswithcode.com/paper/reusable-workflows-for-gender-prediction |
Repo | |
Framework | |
Deep Linear Networks with Arbitrary Loss: All Local Minima Are Global
Title | Deep Linear Networks with Arbitrary Loss: All Local Minima Are Global |
Authors | Thomas Laurent, James von Brecht |
Abstract | We consider deep linear networks with arbitrary convex differentiable loss. We provide a short and elementary proof of the fact that all local minima are global minima if the hidden layers are either 1) at least as wide as the input layer, or 2) at least as wide as the output layer. This result is the strongest possible in the following sense: If the loss is convex and Lipschitz but not differentiable then deep linear networks can have sub-optimal local minima. |
Tasks | |
Published | 2018-07-01 |
URL | https://icml.cc/Conferences/2018/Schedule?showEvent=1993 |
http://proceedings.mlr.press/v80/laurent18a/laurent18a.pdf | |
PWC | https://paperswithcode.com/paper/deep-linear-networks-with-arbitrary-loss-all |
Repo | |
Framework | |
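In symbols (our notation for the setting the abstract describes), a deep linear network composes weight matrices, and the claim concerns the local minima of the resulting objective:

```latex
f_\Theta(x) = W_L W_{L-1} \cdots W_1 x, \qquad
\mathcal{L}(\Theta) = \sum_{i=1}^{n} \ell\bigl(f_\Theta(x_i),\, y_i\bigr),
\quad \ell \text{ convex and differentiable},
```

with the guarantee that every local minimum of $\mathcal{L}$ is global whenever each hidden layer is at least as wide as the input layer, or at least as wide as the output layer.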
Mutual Information Neural Estimation
Title | Mutual Information Neural Estimation |
Authors | Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeshwar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, Devon Hjelm |
Abstract | We argue that the estimation of mutual information between high dimensional continuous random variables can be achieved by gradient descent over neural networks. We present a Mutual Information Neural Estimator (MINE) that is linearly scalable in dimensionality as well as in sample size, trainable through back-prop, and strongly consistent. We present a handful of applications on which MINE can be used to minimize or maximize mutual information. We apply MINE to improve adversarially trained generative models. We also use MINE to implement the Information Bottleneck, applying it to supervised classification; our results demonstrate substantial improvement in flexibility and performance in these settings. |
Tasks | |
Published | 2018-07-01 |
URL | https://icml.cc/Conferences/2018/Schedule?showEvent=2440 |
http://proceedings.mlr.press/v80/belghazi18a/belghazi18a.pdf | |
PWC | https://paperswithcode.com/paper/mutual-information-neural-estimation |
Repo | |
Framework | |
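MINE trains a "statistics network" $T_\theta$ by gradient ascent on the Donsker-Varadhan lower bound on mutual information. A minimal PyTorch sketch follows; the architecture and sizes are our arbitrary choices, not the paper's.

```python
import torch
import torch.nn as nn

# Minimal MINE-style sketch: maximize the Donsker-Varadhan bound
#   I(X;Z) >= E_joint[T(x,z)] - log E_marginal[exp(T(x,z))]
# over the parameters of a small statistics network T.

class StatisticsNetwork(nn.Module):
    def __init__(self, x_dim, z_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, z):
        return self.net(torch.cat([x, z], dim=1))

def dv_lower_bound(T, x, z):
    # Marginal samples: pair each x with a shuffled z to break dependence.
    z_shuffled = z[torch.randperm(z.size(0))]
    joint_term = T(x, z).mean()
    marginal_term = (torch.logsumexp(T(x, z_shuffled), dim=0)
                     - torch.log(torch.tensor(float(z.size(0)))))
    return joint_term - marginal_term.squeeze()  # ascend this to estimate MI
```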
Revisiting Dilated Convolution: A Simple Approach for Weakly- and Semi-Supervised Semantic Segmentation
Title | Revisiting Dilated Convolution: A Simple Approach for Weakly- and Semi-Supervised Semantic Segmentation |
Authors | Yunchao Wei, Huaxin Xiao, Honghui Shi, Zequn Jie, Jiashi Feng, Thomas S. Huang |
Abstract | Despite remarkable progress, weakly supervised segmentation methods are still inferior to their fully supervised counterparts. We observe that the performance gap mainly comes from their inability to produce dense and integral pixel-level object localization for training images with only image-level labels. In this work, we revisit the dilated convolution proposed in [1] and shed light on how it enables the classification network to generate dense object localization. By substantially enlarging the receptive fields of convolutional kernels with different dilation rates, the classification network can localize object regions even when they are not discriminative for classification, and finally produce reliable object regions that benefit both weakly- and semi-supervised semantic segmentation. Despite the apparent simplicity of dilated convolution, we obtain superior performance on semantic segmentation tasks. In particular, our approach achieves 60.8% and 67.6% mean Intersection-over-Union (mIoU) on the Pascal VOC 2012 test set in the weakly-supervised (only image-level labels are available) and semi-supervised (1,464 segmentation masks are available) settings respectively, which are the new state of the art. |
Tasks | Object Localization, Semantic Segmentation, Semi-Supervised Semantic Segmentation |
Published | 2018-06-01 |
URL | http://openaccess.thecvf.com/content_cvpr_2018/html/Wei_Revisiting_Dilated_Convolution_CVPR_2018_paper.html |
http://openaccess.thecvf.com/content_cvpr_2018/papers/Wei_Revisiting_Dilated_Convolution_CVPR_2018_paper.pdf | |
PWC | https://paperswithcode.com/paper/revisiting-dilated-convolution-a-simple |
Repo | |
Framework | |
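The mechanism the abstract leans on is easy to see in code: raising the dilation rate enlarges a 3×3 kernel's receptive field at no parameter cost, and averaging branches with different rates aggregates localization cues from multiple scales. This is an illustrative sketch, not the authors' implementation; the rates and channel sizes are arbitrary.

```python
import torch
import torch.nn as nn

# Illustrative only: parallel 3x3 branches whose dilation rates enlarge the
# receptive field (effective kernel = 2*d + 1) without adding parameters.
x = torch.randn(1, 512, 32, 32)  # e.g., a classification-backbone feature map
branches = nn.ModuleList([
    nn.Conv2d(512, 512, kernel_size=3, padding=d, dilation=d)
    for d in (1, 3, 6, 9)  # rates chosen for illustration
])
# padding=d keeps the spatial size fixed, so branch outputs can be averaged.
out = torch.stack([b(x) for b in branches]).mean(dim=0)
print(out.shape)  # torch.Size([1, 512, 32, 32])
```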
A Neural Question Answering Model Based on Semi-Structured Tables
Title | A Neural Question Answering Model Based on Semi-Structured Tables |
Authors | Hao Wang, Xiaodong Zhang, Shuming Ma, Xu Sun, Houfeng Wang, Mengxiang Wang |
Abstract | Most question answering (QA) systems are based on raw text and structured knowledge graphs. However, raw text corpora are hard for QA systems to understand, and structured knowledge graphs require intensive manual work, while it is relatively easy to obtain semi-structured tables directly from many sources, or to build them automatically. In this paper, we build an end-to-end system that answers multiple choice questions with semi-structured tables as its knowledge. Our system answers queries in two steps. First, it finds the most similar tables. Then the system measures the relevance between the question and each candidate table cell, and chooses the most related cell as the source of the answer. The system is evaluated on the TabMCQ dataset and achieves a large improvement over the state of the art. |
Tasks | Knowledge Graphs, Question Answering |
Published | 2018-08-01 |
URL | https://www.aclweb.org/anthology/C18-1165/ |
https://www.aclweb.org/anthology/C18-1165 | |
PWC | https://paperswithcode.com/paper/a-neural-question-answering-model-based-on |
Repo | |
Framework | |
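The two-step pipeline the abstract describes (retrieve the most similar table, then score candidate cells) reduces to a pair of nearest-neighbor searches. In this toy sketch, `embed` is a placeholder sentence encoder and tables are lists of rows of string cells; none of it is the paper's model.

```python
import numpy as np

# Toy sketch of the two-step pipeline: (1) retrieve the most similar table,
# (2) pick the most question-relevant cell. `embed` is a placeholder encoder
# mapping a string to a NumPy vector.

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def answer(question, tables, embed):
    q = embed(question)
    # Step 1: table retrieval by similarity between question and table text.
    best_table = max(
        tables,
        key=lambda t: cosine(q, embed(" ".join(c for row in t for c in row))))
    # Step 2: cell selection within the retrieved table.
    cells = [c for row in best_table for c in row]
    return max(cells, key=lambda c: cosine(q, embed(c)))
```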
A Linear Speedup Analysis of Distributed Deep Learning with Sparse and Quantized Communication
Title | A Linear Speedup Analysis of Distributed Deep Learning with Sparse and Quantized Communication |
Authors | Peng Jiang, Gagan Agrawal |
Abstract | The large communication overhead has imposed a bottleneck on the performance of distributed Stochastic Gradient Descent (SGD) for training deep neural networks. Previous works have demonstrated the potential of using gradient sparsification and quantization to reduce the communication cost. However, there is still a lack of understanding about how sparse and quantized communication affects the convergence rate of the training algorithm. In this paper, we study the convergence rate of distributed SGD for non-convex optimization with two communication reducing strategies: sparse parameter averaging and gradient quantization. We show that an $O(1/\sqrt{MK})$ convergence rate can be achieved if the sparsification and quantization hyperparameters are configured properly. We also propose a strategy called periodic quantized averaging (PQASGD) that further reduces the communication cost while preserving the $O(1/\sqrt{MK})$ convergence rate. Our evaluation validates our theoretical results and shows that our PQASGD can converge as fast as full-communication SGD with only 3%-5% of the communication data size. |
Tasks | Quantization |
Published | 2018-12-01 |
URL | http://papers.nips.cc/paper/7519-a-linear-speedup-analysis-of-distributed-deep-learning-with-sparse-and-quantized-communication |
http://papers.nips.cc/paper/7519-a-linear-speedup-analysis-of-distributed-deep-learning-with-sparse-and-quantized-communication.pdf | |
PWC | https://paperswithcode.com/paper/a-linear-speedup-analysis-of-distributed-deep |
Repo | |
Framework | |
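The two communication-reduction primitives the abstract names can be sketched in a few lines; the forms and constants below are illustrative, not the paper's exact schemes.

```python
import numpy as np

rng = np.random.default_rng(0)

def sparsify(vec, keep_frac=0.1):
    # Random sparsification, rescaled so the estimate stays unbiased:
    # E[mask * vec / keep_frac] = vec.
    mask = rng.random(vec.shape) < keep_frac
    return np.where(mask, vec / keep_frac, 0.0)

def quantize(vec, levels=16):
    # Unbiased stochastic rounding onto a uniform grid over [-s, s].
    s = np.abs(vec).max()
    if s == 0.0:
        return vec
    scaled = (vec / s + 1.0) / 2.0 * (levels - 1)     # map into [0, levels-1]
    low = np.floor(scaled)
    q = low + (rng.random(vec.shape) < scaled - low)  # round up w.p. frac part
    return (q / (levels - 1) * 2.0 - 1.0) * s         # map back; E[result] = vec
```

Unbiasedness is what lets such compressed updates keep an SGD-style convergence rate: in expectation, each worker still communicates the true quantity.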
Provable Variable Selection for Streaming Features
Title | Provable Variable Selection for Streaming Features |
Authors | Jing Wang, Jie Shen, Ping Li |
Abstract | In large-scale machine learning applications and high-dimensional statistics, it is ubiquitous to address a considerable number of features, many of which are redundant. As a remedy, online feature selection has attracted increasing attention in recent years. It sequentially reveals features and evaluates their importance. Though online feature selection has proven an elegant methodology, it is usually challenging to carry out a rigorous theoretical characterization. In this work, we propose a provable online feature selection algorithm that utilizes the online leverage score. The selected features are then fed to $k$-means clustering, making the clustering step memory and computationally efficient. We prove that with high probability, performing $k$-means clustering based on the selected feature space does not deviate far from the optimal clustering using the original data. The empirical results on real-world data sets demonstrate the effectiveness of our algorithm. |
Tasks | Feature Selection |
Published | 2018-07-01 |
URL | https://icml.cc/Conferences/2018/Schedule?showEvent=2352 |
http://proceedings.mlr.press/v80/wang18g/wang18g.pdf | |
PWC | https://paperswithcode.com/paper/provable-variable-selection-for-streaming |
Repo | |
Framework | |
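As rough intuition only (a simplified stand-in, not the paper's online leverage-score algorithm): a streaming selector can keep an arriving feature column when enough of its energy lies outside the span of the columns kept so far.

```python
import numpy as np

# Simplified stand-in for score-based streaming feature selection (NOT the
# paper's algorithm): keep a feature column if its residual energy, after
# projecting onto the span of the already-kept columns, exceeds a threshold.

def select_streaming(columns, tau=0.5):
    kept, basis = [], None  # basis: orthonormal columns spanning kept features
    for f in columns:
        f = f / (np.linalg.norm(f) + 1e-12)
        resid = f if basis is None else f - basis @ (basis.T @ f)
        if np.linalg.norm(resid) ** 2 > tau:  # novel direction: keep feature
            kept.append(f)
            q = resid / np.linalg.norm(resid)
            basis = q[:, None] if basis is None else np.column_stack([basis, q])
    return kept
```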
Revisiting the Task of Scoring Open IE Relations
Title | Revisiting the Task of Scoring Open IE Relations |
Authors | William Léchelle, Philippe Langlais |
Abstract | |
Tasks | Knowledge Base Completion, Language Modelling, Open Information Extraction |
Published | 2018-05-01 |
URL | https://www.aclweb.org/anthology/L18-1323/ |
https://www.aclweb.org/anthology/L18-1323 | |
PWC | https://paperswithcode.com/paper/revisiting-the-task-of-scoring-open-ie |
Repo | |
Framework | |