Paper Group NANR 248
An Empirical Study of Machine Translation for the Shared Task of WMT18. The RWTH Aachen University English-German and German-English Unsupervised Neural Machine Translation Systems for WMT 2018. Adaptive Path-Integral Autoencoders: Representation Learning and Planning for Dynamical Systems. Developing New Linguistic Resources and Tools for the Galician Language …
An Empirical Study of Machine Translation for the Shared Task of WMT18
Title | An Empirical Study of Machine Translation for the Shared Task of WMT18 |
Authors | Chao Bei, Hao Zong, Yiming Wang, Baoyong Fan, Shiqi Li, Conghu Yuan |
Abstract | This paper describes the Global Tone Communication Co., Ltd.'s submission to the WMT18 shared news translation task. We participated in the English-to-Chinese direction and obtained the best BLEU score (43.8) among all participants. The submitted system focuses on data cleaning and on techniques for building a competitive model for this task. Unlike other participants, the submitted system relies mainly on data filtering to obtain the best BLEU score. We filter not only the provided sentences but also the back-translated sentences. The techniques we apply for data filtering include filtering by rules, by language models, and by translation models. We also conduct several experiments to validate the effectiveness of training techniques. According to our experiments, the annealing Adam optimizer and ensemble decoding are the most effective techniques for model training. |
Tasks | Chinese Word Segmentation, Language Modelling, Machine Translation, Tokenization |
Published | 2018-10-01 |
URL | https://www.aclweb.org/anthology/W18-6404/ |
https://www.aclweb.org/anthology/W18-6404 | |
PWC | https://paperswithcode.com/paper/an-empirical-study-of-machine-translation-for |
Repo | |
Framework | |
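The filtering cascade the abstract outlines (rules first, then language-model and translation-model scores) can be pictured with a short, hedged sketch. Everything below is illustrative: `lm_score`, `tm_score`, and the thresholds are placeholders, not the authors' actual components.

```python
# Hedged sketch of a rule/LM/TM filtering cascade of the kind the abstract
# describes; lm_score, tm_score, and all thresholds are illustrative
# placeholders, not the authors' components.

def filter_parallel(pairs, lm_score, tm_score,
                    max_len_ratio=2.0, lm_thresh=-8.0, tm_thresh=-6.0):
    """Keep (source, target) token-list pairs passing rule, LM, and TM filters."""
    kept = []
    for src, tgt in pairs:
        # Rule filter: drop pairs whose token-length ratio is extreme.
        ratio = max(len(src), len(tgt)) / max(1, min(len(src), len(tgt)))
        if ratio > max_len_ratio:
            continue
        # LM filter: the target side should be fluent under a language model.
        # TM filter: the pair should score well under a translation model.
        if lm_score(tgt) >= lm_thresh and tm_score(src, tgt) >= tm_thresh:
            kept.append((src, tgt))
    return kept
```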
The RWTH Aachen University English-German and German-English Unsupervised Neural Machine Translation Systems for WMT 2018
Title | The RWTH Aachen University English-German and German-English Unsupervised Neural Machine Translation Systems for WMT 2018 |
Authors | Miguel Graça, Yunsu Kim, Julian Schamper, Jiahui Geng, Hermann Ney |
Abstract | This paper describes the unsupervised neural machine translation (NMT) systems of the RWTH Aachen University developed for the English ↔ German news translation task of the *EMNLP 2018 Third Conference on Machine Translation* (WMT 2018). Our work is based on iterative back-translation using a shared encoder-decoder NMT model. We extensively compare different vocabulary types, word embedding initialization schemes and optimization methods for our model. We also investigate gating and weight normalization for the word embedding layer. |
Tasks | Machine Translation, Tokenization, Word Embeddings |
Published | 2018-10-01 |
URL | https://www.aclweb.org/anthology/W18-6409/ |
https://www.aclweb.org/anthology/W18-6409 | |
PWC | https://paperswithcode.com/paper/the-rwth-aachen-university-english-german-and |
Repo | |
Framework | |
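Iterative back-translation with a shared model, as described in the abstract above, follows a simple loop. The sketch below assumes a hypothetical `model` object with `translate` and `fit` methods; it illustrates the general technique, not the authors' code.

```python
# Illustrative loop for iterative back-translation with a shared
# encoder-decoder; model.translate / model.fit are hypothetical APIs.

def iterative_back_translation(model, mono_src, mono_tgt, rounds=3):
    for _ in range(rounds):
        # Back-translate each side's monolingual data with the current model.
        synth_src = [model.translate(t, direction="tgt->src") for t in mono_tgt]
        synth_tgt = [model.translate(s, direction="src->tgt") for s in mono_src]
        # Synthetic-source pairs supervise src->tgt training and vice versa,
        # so each round's model produces better synthetic data for the next.
        model.fit(src_tgt_pairs=list(zip(synth_src, mono_tgt)),
                  tgt_src_pairs=list(zip(synth_tgt, mono_src)))
    return model
```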
Adaptive Path-Integral Autoencoders: Representation Learning and Planning for Dynamical Systems
Title | Adaptive Path-Integral Autoencoders: Representation Learning and Planning for Dynamical Systems |
Authors | Jung-Su Ha, Young-Jin Park, Hyeok-Joo Chae, Soon-Seo Park, Han-Lim Choi |
Abstract | We present a representation learning algorithm that learns a low-dimensional latent dynamical system from high-dimensional sequential raw data, e.g., video. The framework builds upon recent advances in amortized inference methods that use both an inference network and a refinement procedure to output samples from a variational distribution given an observation sequence, and takes advantage of the duality between control and inference to approximately solve the intractable inference problem using the path integral control approach. The learned dynamical model can be used to predict and plan future states; we also present an efficient planning method that exploits the learned low-dimensional latent dynamics. Numerical experiments show that the proposed path-integral control based variational inference method leads to tighter lower bounds in statistical model learning of sequential data. Supplementary video: https://youtu.be/xCp35crUoLQ |
Tasks | Representation Learning |
Published | 2018-12-01 |
URL | http://papers.nips.cc/paper/8108-adaptive-path-integral-autoencoders-representation-learning-and-planning-for-dynamical-systems |
http://papers.nips.cc/paper/8108-adaptive-path-integral-autoencoders-representation-learning-and-planning-for-dynamical-systems.pdf | |
PWC | https://paperswithcode.com/paper/adaptive-path-integral-autoencoders |
Repo | |
Framework | |
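For orientation, the sequence-model evidence lower bound that amortized variational methods of this kind tighten can be written as follows (the notation is ours, not the paper's); per the abstract, the refinement procedure uses path-integral control to push the variational posterior $q_\phi$ closer to the true posterior, tightening this bound.

```latex
\log p_\theta(x_{1:T}) \;\ge\;
  \mathbb{E}_{q_\phi(z_{1:T}\mid x_{1:T})}\!\left[\log p_\theta(x_{1:T}\mid z_{1:T})\right]
  - \mathrm{KL}\!\left(q_\phi(z_{1:T}\mid x_{1:T}) \,\middle\|\, p_\theta(z_{1:T})\right)
```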
Developing New Linguistic Resources and Tools for the Galician Language
Title | Developing New Linguistic Resources and Tools for the Galician Language |
Authors | Rodrigo Agerri, Xavier Gómez Guinovart, German Rigau, Miguel Anxo Solla Portela |
Abstract | |
Tasks | Lemmatization, Named Entity Recognition, Word Sense Disambiguation |
Published | 2018-05-01 |
URL | https://www.aclweb.org/anthology/L18-1367/ |
https://www.aclweb.org/anthology/L18-1367 | |
PWC | https://paperswithcode.com/paper/developing-new-linguistic-resources-and-tools |
Repo | |
Framework | |
The Karlsruhe Institute of Technology Systems for the News Translation Task in WMT 2018
Title | The Karlsruhe Institute of Technology Systems for the News Translation Task in WMT 2018 |
Authors | Ngoc-Quan Pham, Jan Niehues, Alexander Waibel |
Abstract | We present our experiments within the scope of the news translation task of WMT 2018, in the English→German direction. The core of our systems is encoder-decoder based neural machine translation models using the Transformer architecture. We enhanced the model with a deeper architecture. By using techniques to limit memory consumption, we were able to train models that are 4 times larger on one GPU and improve the performance by 1.2 BLEU points. Furthermore, we performed sentence selection for the newly available ParaCrawl corpus, which improved the effectiveness of the corpus by 0.5 BLEU points. |
Tasks | Machine Translation, Tokenization |
Published | 2018-10-01 |
URL | https://www.aclweb.org/anthology/W18-6422/ |
https://www.aclweb.org/anthology/W18-6422 | |
PWC | https://paperswithcode.com/paper/the-karlsruhe-institute-of-technology-systems |
Repo | |
Framework | |
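The abstract does not say which memory-limiting techniques were used, but gradient accumulation is one standard way to fit a much larger model on a single GPU; a minimal sketch, offered as illustration only:

```python
import torch

# Illustration only: gradient accumulation trades extra forward/backward
# passes for a smaller per-step memory footprint (the abstract does not
# specify the authors' exact technique).

def accumulated_step(model, optimizer, loss_fn, micro_batches):
    optimizer.zero_grad()
    for x, y in micro_batches:
        # Scale each micro-batch loss so the summed gradient matches the
        # gradient of one large batch.
        loss = loss_fn(model(x), y) / len(micro_batches)
        loss.backward()  # gradients accumulate in .grad across calls
    optimizer.step()
```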
A Linguistically-Informed Fusion Approach for Multimodal Depression Detection
Title | A Linguistically-Informed Fusion Approach for Multimodal Depression Detection |
Authors | Michelle Morales, Stefan Scherer, Rivka Levitan |
Abstract | Automated depression detection is inherently a multimodal problem. Therefore, it is critical that researchers investigate fusion techniques for multimodal design. This paper presents the first-ever comprehensive study of fusion techniques for depression detection. In addition, we present novel linguistically-motivated fusion techniques, which we find outperform existing approaches. |
Tasks | |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/W18-0602/ |
https://www.aclweb.org/anthology/W18-0602 | |
PWC | https://paperswithcode.com/paper/a-linguistically-informed-fusion-approach-for |
Repo | |
Framework | |
The RWTH Aachen University Supervised Machine Translation Systems for WMT 2018
Title | The RWTH Aachen University Supervised Machine Translation Systems for WMT 2018 |
Authors | Julian Schamper, Jan Rosendahl, Parnia Bahar, Yunsu Kim, Arne Nix, Hermann Ney |
Abstract | This paper describes the statistical machine translation systems developed at RWTH Aachen University for the German→English, English→Turkish and Chinese→English translation tasks of the EMNLP 2018 Third Conference on Machine Translation (WMT 2018). We use ensembles of neural machine translation systems based on the Transformer architecture. Our main focus is on the German→English task, where we scored first with respect to all automatic metrics provided by the organizers. We identify data selection, fine-tuning, batch size and model dimension as important hyperparameters. In total we improve by 6.8% BLEU over our last year's submission and by 4.8% BLEU over the winning system of the 2017 German→English task. In the English→Turkish task, we show a 3.6% BLEU improvement over last year's winning system. We further report results on the Chinese→English task, where we improve by 2.2% BLEU on average over our baseline systems but stay behind the 2018 winning systems. |
Tasks | Machine Translation, Tokenization |
Published | 2018-10-01 |
URL | https://www.aclweb.org/anthology/W18-6426/ |
https://www.aclweb.org/anthology/W18-6426 | |
PWC | https://paperswithcode.com/paper/the-rwth-aachen-university-supervised-machine |
Repo | |
Framework | |
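Ensemble decoding for NMT typically averages the member models' next-token distributions at each search step; a minimal sketch under that assumption (`next_token_probs` is a hypothetical per-model API, not the authors' code):

```python
import torch

# Minimal sketch of ensemble decoding: combine member models' per-step
# output distributions in probability space, then return log-probabilities.

def ensemble_next_token_logprobs(models, src, prefix):
    probs = torch.stack([m.next_token_probs(src, prefix) for m in models])
    return torch.log(probs.mean(dim=0))  # average of the M distributions
```

A beam search would then expand hypotheses using these combined log-probabilities.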
Reusable workflows for gender prediction
Title | Reusable workflows for gender prediction |
Authors | Matej Martinc, Senja Pollak |
Abstract | |
Tasks | Feature Engineering, Gender Prediction, Text Classification |
Published | 2018-05-01 |
URL | https://www.aclweb.org/anthology/L18-1082/ |
https://www.aclweb.org/anthology/L18-1082 | |
PWC | https://paperswithcode.com/paper/reusable-workflows-for-gender-prediction |
Repo | |
Framework | |
Deep Linear Networks with Arbitrary Loss: All Local Minima Are Global
Title | Deep Linear Networks with Arbitrary Loss: All Local Minima Are Global |
Authors | Thomas Laurent, James von Brecht |
Abstract | We consider deep linear networks with arbitrary convex differentiable loss. We provide a short and elementary proof of the fact that all local minima are global minima if the hidden layers are either 1) at least as wide as the input layer, or 2) at least as wide as the output layer. This result is the strongest possible in the following sense: If the loss is convex and Lipschitz but not differentiable then deep linear networks can have sub-optimal local minima. |
Tasks | |
Published | 2018-07-01 |
URL | https://icml.cc/Conferences/2018/Schedule?showEvent=1993 |
http://proceedings.mlr.press/v80/laurent18a/laurent18a.pdf | |
PWC | https://paperswithcode.com/paper/deep-linear-networks-with-arbitrary-loss-all |
Repo | |
Framework | |
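In symbols (our notation for the setting the abstract describes), a deep linear network composes weight matrices, and the claim concerns the local minima of the resulting objective:

```latex
f_\Theta(x) = W_L W_{L-1} \cdots W_1 x, \qquad
\mathcal{L}(\Theta) = \sum_{i=1}^{n} \ell\bigl(f_\Theta(x_i),\, y_i\bigr),
\quad \ell \text{ convex and differentiable},
```

with the guarantee that every local minimum of $\mathcal{L}$ is global whenever each hidden layer is at least as wide as the input layer, or at least as wide as the output layer.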
Mutual Information Neural Estimation
Title | Mutual Information Neural Estimation |
Authors | Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeshwar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, Devon Hjelm |
Abstract | We argue that the estimation of mutual information between high dimensional continuous random variables can be achieved by gradient descent over neural networks. We present a Mutual Information Neural Estimator (MINE) that is linearly scalable in dimensionality as well as in sample size, trainable through back-prop, and strongly consistent. We present a handful of applications on which MINE can be used to minimize or maximize mutual information. We apply MINE to improve adversarially trained generative models. We also use MINE to implement the Information Bottleneck, applying it to supervised classification; our results demonstrate substantial improvement in flexibility and performance in these settings. |
Tasks | |
Published | 2018-07-01 |
URL | https://icml.cc/Conferences/2018/Schedule?showEvent=2440 |
http://proceedings.mlr.press/v80/belghazi18a/belghazi18a.pdf | |
PWC | https://paperswithcode.com/paper/mutual-information-neural-estimation |
Repo | |
Framework | |
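MINE trains a "statistics network" $T_\theta$ by gradient ascent on the Donsker-Varadhan lower bound on mutual information. A minimal PyTorch sketch follows; the architecture and sizes are our arbitrary choices, not the paper's.

```python
import torch
import torch.nn as nn

# Minimal MINE-style sketch: maximize the Donsker-Varadhan bound
#   I(X;Z) >= E_joint[T(x,z)] - log E_marginal[exp(T(x,z))]
# over the parameters of a small statistics network T.

class StatisticsNetwork(nn.Module):
    def __init__(self, x_dim, z_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, z):
        return self.net(torch.cat([x, z], dim=1))

def dv_lower_bound(T, x, z):
    # Marginal samples: pair each x with a shuffled z to break dependence.
    z_shuffled = z[torch.randperm(z.size(0))]
    joint_term = T(x, z).mean()
    marginal_term = (torch.logsumexp(T(x, z_shuffled), dim=0)
                     - torch.log(torch.tensor(float(z.size(0)))))
    return joint_term - marginal_term.squeeze()  # ascend this to estimate MI
```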
Revisiting Dilated Convolution: A Simple Approach for Weakly- and Semi-Supervised Semantic Segmentation
Title | Revisiting Dilated Convolution: A Simple Approach for Weakly- and Semi-Supervised Semantic Segmentation |
Authors | Yunchao Wei, Huaxin Xiao, Honghui Shi, Zequn Jie, Jiashi Feng, Thomas S. Huang |
Abstract | Despite remarkable progress, weakly supervised segmentation methods are still inferior to their fully supervised counterparts. We observe that the performance gap mainly comes from their inability to produce dense and integral pixel-level object localization for training images with only image-level labels. In this work, we revisit the dilated convolution proposed in [1] and shed light on how it enables the classification network to generate dense object localization. By substantially enlarging the receptive fields of convolutional kernels with different dilation rates, the classification network can localize object regions even when they are not discriminative for classification, and finally produce reliable object regions that benefit both weakly- and semi-supervised semantic segmentation. Despite the apparent simplicity of dilated convolution, we obtain superior performance on semantic segmentation tasks. In particular, our approach achieves 60.8% and 67.6% mean Intersection-over-Union (mIoU) on the Pascal VOC 2012 test set in the weakly-supervised (only image-level labels are available) and semi-supervised (1,464 segmentation masks are available) settings respectively, which are the new state of the art. |
Tasks | Object Localization, Semantic Segmentation, Semi-Supervised Semantic Segmentation |
Published | 2018-06-01 |
URL | http://openaccess.thecvf.com/content_cvpr_2018/html/Wei_Revisiting_Dilated_Convolution_CVPR_2018_paper.html |
http://openaccess.thecvf.com/content_cvpr_2018/papers/Wei_Revisiting_Dilated_Convolution_CVPR_2018_paper.pdf | |
PWC | https://paperswithcode.com/paper/revisiting-dilated-convolution-a-simple |
Repo | |
Framework | |
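The mechanism the abstract leans on is easy to see in code: raising the dilation rate enlarges a 3×3 kernel's receptive field at no parameter cost, and averaging branches with different rates aggregates localization cues from multiple scales. This is an illustrative sketch, not the authors' implementation; the rates and channel sizes are arbitrary.

```python
import torch
import torch.nn as nn

# Illustrative only: parallel 3x3 branches whose dilation rates enlarge the
# receptive field (effective kernel = 2*d + 1) without adding parameters.
x = torch.randn(1, 512, 32, 32)  # e.g., a classification-backbone feature map
branches = nn.ModuleList([
    nn.Conv2d(512, 512, kernel_size=3, padding=d, dilation=d)
    for d in (1, 3, 6, 9)  # rates chosen for illustration
])
# padding=d keeps the spatial size fixed, so branch outputs can be averaged.
out = torch.stack([b(x) for b in branches]).mean(dim=0)
print(out.shape)  # torch.Size([1, 512, 32, 32])
```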
A Neural Question Answering Model Based on Semi-Structured Tables
Title | A Neural Question Answering Model Based on Semi-Structured Tables |
Authors | Hao Wang, Xiaodong Zhang, Shuming Ma, Xu Sun, Houfeng Wang, Mengxiang Wang |
Abstract | Most question answering (QA) systems are based on raw text and structured knowledge graphs. However, raw text corpora are hard for QA systems to understand, and structured knowledge graphs require intensive manual work, while it is relatively easy to obtain semi-structured tables directly from many sources, or to build them automatically. In this paper, we build an end-to-end system that answers multiple choice questions with semi-structured tables as its knowledge. Our system answers queries in two steps. First, it finds the most similar tables. Then the system measures the relevance between the question and each candidate table cell, and chooses the most related cell as the source of the answer. The system is evaluated on the TabMCQ dataset and achieves a large improvement over the state of the art. |
Tasks | Knowledge Graphs, Question Answering |
Published | 2018-08-01 |
URL | https://www.aclweb.org/anthology/C18-1165/ |
https://www.aclweb.org/anthology/C18-1165 | |
PWC | https://paperswithcode.com/paper/a-neural-question-answering-model-based-on |
Repo | |
Framework | |
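The two-step pipeline the abstract describes (retrieve the most similar table, then score candidate cells) reduces to a pair of nearest-neighbor searches. In this toy sketch, `embed` is a placeholder sentence encoder and tables are lists of rows of string cells; none of it is the paper's model.

```python
import numpy as np

# Toy sketch of the two-step pipeline: (1) retrieve the most similar table,
# (2) pick the most question-relevant cell. `embed` is a placeholder encoder
# mapping a string to a NumPy vector.

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def answer(question, tables, embed):
    q = embed(question)
    # Step 1: table retrieval by similarity between question and table text.
    best_table = max(
        tables,
        key=lambda t: cosine(q, embed(" ".join(c for row in t for c in row))))
    # Step 2: cell selection within the retrieved table.
    cells = [c for row in best_table for c in row]
    return max(cells, key=lambda c: cosine(q, embed(c)))
```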
A Linear Speedup Analysis of Distributed Deep Learning with Sparse and Quantized Communication
Title | A Linear Speedup Analysis of Distributed Deep Learning with Sparse and Quantized Communication |
Authors | Peng Jiang, Gagan Agrawal |
Abstract | The large communication overhead has imposed a bottleneck on the performance of distributed Stochastic Gradient Descent (SGD) for training deep neural networks. Previous works have demonstrated the potential of using gradient sparsification and quantization to reduce the communication cost. However, there is still a lack of understanding about how sparse and quantized communication affects the convergence rate of the training algorithm. In this paper, we study the convergence rate of distributed SGD for non-convex optimization with two communication reducing strategies: sparse parameter averaging and gradient quantization. We show that an $O(1/\sqrt{MK})$ convergence rate can be achieved if the sparsification and quantization hyperparameters are configured properly. We also propose a strategy called periodic quantized averaging (PQASGD) that further reduces the communication cost while preserving the $O(1/\sqrt{MK})$ convergence rate. Our evaluation validates our theoretical results and shows that our PQASGD can converge as fast as full-communication SGD with only 3%-5% of the communication data size. |
Tasks | Quantization |
Published | 2018-12-01 |
URL | http://papers.nips.cc/paper/7519-a-linear-speedup-analysis-of-distributed-deep-learning-with-sparse-and-quantized-communication |
http://papers.nips.cc/paper/7519-a-linear-speedup-analysis-of-distributed-deep-learning-with-sparse-and-quantized-communication.pdf | |
PWC | https://paperswithcode.com/paper/a-linear-speedup-analysis-of-distributed-deep |
Repo | |
Framework | |
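The two communication-reduction primitives the abstract names can be sketched in a few lines; the forms and constants below are illustrative, not the paper's exact schemes.

```python
import numpy as np

rng = np.random.default_rng(0)

def sparsify(vec, keep_frac=0.1):
    # Random sparsification, rescaled so the estimate stays unbiased:
    # E[mask * vec / keep_frac] = vec.
    mask = rng.random(vec.shape) < keep_frac
    return np.where(mask, vec / keep_frac, 0.0)

def quantize(vec, levels=16):
    # Unbiased stochastic rounding onto a uniform grid over [-s, s].
    s = np.abs(vec).max()
    if s == 0.0:
        return vec
    scaled = (vec / s + 1.0) / 2.0 * (levels - 1)     # map into [0, levels-1]
    low = np.floor(scaled)
    q = low + (rng.random(vec.shape) < scaled - low)  # round up w.p. frac part
    return (q / (levels - 1) * 2.0 - 1.0) * s         # map back; E[result] = vec
```

Unbiasedness is what lets such compressed updates keep an SGD-style convergence rate: in expectation, each worker still communicates the true quantity.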
Provable Variable Selection for Streaming Features
Title | Provable Variable Selection for Streaming Features |
Authors | Jing Wang, Jie Shen, Ping Li |
Abstract | In large-scale machine learning applications and high-dimensional statistics, it is ubiquitous to address a considerable number of features, many of which are redundant. As a remedy, online feature selection has attracted increasing attention in recent years. It sequentially reveals features and evaluates their importance. Though online feature selection has proven an elegant methodology, it is usually challenging to carry out a rigorous theoretical characterization. In this work, we propose a provable online feature selection algorithm that utilizes the online leverage score. The selected features are then fed to $k$-means clustering, making the clustering step memory and computationally efficient. We prove that with high probability, performing $k$-means clustering based on the selected feature space does not deviate far from the optimal clustering using the original data. The empirical results on real-world data sets demonstrate the effectiveness of our algorithm. |
Tasks | Feature Selection |
Published | 2018-07-01 |
URL | https://icml.cc/Conferences/2018/Schedule?showEvent=2352 |
http://proceedings.mlr.press/v80/wang18g/wang18g.pdf | |
PWC | https://paperswithcode.com/paper/provable-variable-selection-for-streaming |
Repo | |
Framework | |
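As rough intuition only (a simplified stand-in, not the paper's online leverage-score algorithm): a streaming selector can keep an arriving feature column when enough of its energy lies outside the span of the columns kept so far.

```python
import numpy as np

# Simplified stand-in for score-based streaming feature selection (NOT the
# paper's algorithm): keep a feature column if its residual energy, after
# projecting onto the span of the already-kept columns, exceeds a threshold.

def select_streaming(columns, tau=0.5):
    kept, basis = [], None  # basis: orthonormal columns spanning kept features
    for f in columns:
        f = f / (np.linalg.norm(f) + 1e-12)
        resid = f if basis is None else f - basis @ (basis.T @ f)
        if np.linalg.norm(resid) ** 2 > tau:  # novel direction: keep feature
            kept.append(f)
            q = resid / np.linalg.norm(resid)
            basis = q[:, None] if basis is None else np.column_stack([basis, q])
    return kept
```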
Revisiting the Task of Scoring Open IE Relations
Title | Revisiting the Task of Scoring Open IE Relations |
Authors | William Léchelle, Philippe Langlais |
Abstract | |
Tasks | Knowledge Base Completion, Language Modelling, Open Information Extraction |
Published | 2018-05-01 |
URL | https://www.aclweb.org/anthology/L18-1323/ |
https://www.aclweb.org/anthology/L18-1323 | |
PWC | https://paperswithcode.com/paper/revisiting-the-task-of-scoring-open-ie |
Repo | |
Framework | |