October 21, 2019

Paper Group AWR 90

Unsupervised Statistical Machine Translation. Towards Deep Conversational Recommendations. Real-time Automatic Word Segmentation for User-generated Text. Sentences with Gapping: Parsing and Reconstructing Elided Predicates. Understanding Measures of Uncertainty for Adversarial Example Detection. Neural Machine Translation for Query Construction and …

Unsupervised Statistical Machine Translation

Title Unsupervised Statistical Machine Translation
Authors Mikel Artetxe, Gorka Labaka, Eneko Agirre
Abstract While modern machine translation has relied on large parallel corpora, a recent line of work has managed to train Neural Machine Translation (NMT) systems from monolingual corpora only (Artetxe et al., 2018c; Lample et al., 2018). Despite the potential of this approach for low-resource settings, existing systems are far behind their supervised counterparts, limiting their practical interest. In this paper, we propose an alternative approach based on phrase-based Statistical Machine Translation (SMT) that significantly closes the gap with supervised systems. Our method profits from the modular architecture of SMT: we first induce a phrase table from monolingual corpora through cross-lingual embedding mappings, combine it with an n-gram language model, and fine-tune hyperparameters through an unsupervised MERT variant. In addition, iterative backtranslation improves results further, yielding, for instance, 14.08 and 26.22 BLEU points in WMT 2014 English-German and English-French, respectively, an improvement of more than 7-10 BLEU points over previous unsupervised systems, and closing the gap with supervised SMT (Moses trained on Europarl) down to 2-5 BLEU points. Our implementation is available at https://github.com/artetxem/monoses
Tasks Language Modelling, Machine Translation, Unsupervised Machine Translation
Published 2018-09-04
URL http://arxiv.org/abs/1809.01272v1
PDF http://arxiv.org/pdf/1809.01272v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-statistical-machine-translation
Repo https://github.com/artetxem/monoses
Framework pytorch
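
The core of the method in the abstract above is inducing a phrase table from cross-lingual embeddings rather than from parallel text. As a rough illustration of that step only, here is a hedged numpy sketch that scores target phrases for each source phrase with a softmax over cosine similarities in a shared embedding space; the function name, the temperature value, and the random data are illustrative assumptions, not taken from the monoses implementation.

```python
import numpy as np

def induce_phrase_table(src_emb, tgt_emb, temperature=0.1, top_k=5):
    """Toy phrase-table induction: rows of src_emb / tgt_emb are
    L2-normalized phrase embeddings mapped into a shared space.
    Translation scores are a softmax over cosine similarities."""
    sims = src_emb @ tgt_emb.T                 # cosine similarities
    probs = np.exp(sims / temperature)
    probs /= probs.sum(axis=1, keepdims=True)  # p(tgt phrase | src phrase)
    table = {}
    for i in range(src_emb.shape[0]):
        best = np.argsort(-probs[i])[:top_k]
        table[i] = [(int(j), float(probs[i, j])) for j in best]
    return table

# tiny random demo: 4 source phrases, 6 target phrases, 8-dim embeddings
rng = np.random.default_rng(0)
src = rng.normal(size=(4, 8)); src /= np.linalg.norm(src, axis=1, keepdims=True)
tgt = rng.normal(size=(6, 8)); tgt /= np.linalg.norm(tgt, axis=1, keepdims=True)
print(induce_phrase_table(src, tgt))
```

In the actual system such a table is only a starting point: it is combined with an n-gram language model and refined by unsupervised MERT tuning and iterative backtranslation, as the abstract states.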

Towards Deep Conversational Recommendations

Title Towards Deep Conversational Recommendations
Authors Raymond Li, Samira Kahou, Hannes Schulz, Vincent Michalski, Laurent Charlin, Chris Pal
Abstract There has been growing interest in using neural networks and deep learning techniques to create dialogue systems. Conversational recommendation is an interesting setting for the scientific exploration of dialogue with natural language as the associated discourse involves goal-driven dialogue that often transforms naturally into more free-form chat. This paper provides two contributions. First, until now there has been no publicly available large-scale dataset consisting of real-world dialogues centered around recommendations. To address this issue and to facilitate our exploration here, we have collected ReDial, a dataset consisting of over 10,000 conversations centered around the theme of providing movie recommendations. We make this data available to the community for further research. Second, we use this dataset to explore multiple facets of conversational recommendations. In particular we explore new neural architectures, mechanisms, and methods suitable for composing conversational recommendation systems. Our dataset allows us to systematically probe model sub-components addressing different parts of the overall problem domain ranging from: sentiment analysis and cold-start recommendation generation to detailed aspects of how natural language is used in this setting in the real world. We combine such sub-components into a full-blown dialogue system and examine its behavior.
Tasks Recommendation Systems, Sentiment Analysis
Published 2018-12-18
URL http://arxiv.org/abs/1812.07617v2
PDF http://arxiv.org/pdf/1812.07617v2.pdf
PWC https://paperswithcode.com/paper/towards-deep-conversational-recommendations
Repo https://github.com/RaymondLi0/conversational-recommendations
Framework pytorch

Real-time Automatic Word Segmentation for User-generated Text

Title Real-time Automatic Word Segmentation for User-generated Text
Authors Won Ik Cho, Sung Jun Cheon, Woo Hyun Kang, Ji Won Kim, Nam Soo Kim
Abstract For readability, and possibly for disambiguation, appropriate word segmentation is recommended for written text. In this paper, we propose a real-time assistive technology that performs automatic word segmentation. The language investigated is Korean, a head-final language with various morpho-syllabic blocks as characters. The training scheme is fully neural network-based and straightforward. In addition, we show how the proposed system can be used for web-based real-time revision of user-generated text. Through qualitative and quantitative comparison with widely used text processing toolkits, we show the reliability of the proposed system and how well it fits conversation-style and non-canonical texts. The demonstration is available online.
Tasks
Published 2018-10-31
URL https://arxiv.org/abs/1810.13113v2
PDF https://arxiv.org/pdf/1810.13113v2.pdf
PWC https://paperswithcode.com/paper/real-time-automatic-word-segmentation-for
Repo https://github.com/warnikchow/raws
Framework tf
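
The abstract above describes a fully neural, per-character segmentation scheme. As a minimal sketch of what such a tagger can look like, here is a hypothetical PyTorch model (the linked repository uses TensorFlow; this re-sketch is illustrative only) that predicts, for every character of unsegmented text, whether a space should follow it; all layer sizes and names are assumptions.

```python
import torch
import torch.nn as nn

class SpaceTagger(nn.Module):
    """Minimal per-character tagger: for each character of unsegmented
    text, predict whether a space should follow it (binary label)."""
    def __init__(self, vocab_size, emb_dim=64, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, 2)    # {no-space, space} after char

    def forward(self, char_ids):               # (batch, seq_len)
        h, _ = self.rnn(self.emb(char_ids))    # (batch, seq_len, 2*hidden)
        return self.out(h)                     # per-character logits

model = SpaceTagger(vocab_size=2000)
logits = model(torch.randint(0, 2000, (2, 16)))
print(logits.shape)  # torch.Size([2, 16, 2])
```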

Sentences with Gapping: Parsing and Reconstructing Elided Predicates

Title Sentences with Gapping: Parsing and Reconstructing Elided Predicates
Authors Sebastian Schuster, Joakim Nivre, Christopher D. Manning
Abstract Sentences with gapping, such as Paul likes coffee and Mary tea, lack an overt predicate to indicate the relation between two or more arguments. Surface syntax representations of such sentences are often produced poorly by parsers, and even if correct, not well suited to downstream natural language understanding tasks such as relation extraction that are typically designed to extract information from sentences with canonical clause structure. In this paper, we present two methods for parsing to a Universal Dependencies graph representation that explicitly encodes the elided material with additional nodes and edges. We find that both methods can reconstruct elided material from dependency trees with high accuracy when the parser correctly predicts the existence of a gap. We further demonstrate that one of our methods can be applied to other languages based on a case study on Swedish.
Tasks Relation Extraction
Published 2018-04-18
URL http://arxiv.org/abs/1804.06922v1
PDF http://arxiv.org/pdf/1804.06922v1.pdf
PWC https://paperswithcode.com/paper/sentences-with-gapping-parsing-and
Repo https://github.com/dialogue-evaluation/AGRR-2019
Framework none
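
To make the target representation above concrete: in the enhanced Universal Dependencies graphs the paper parses into, the elided predicate is materialized as a copy node so that the stranded arguments receive ordinary relations. A hedged, hand-written illustration for the example sentence (relation labels follow UD conventions; node ids are simplified):

```python
# Enhanced-UD-style graph for "Paul likes coffee and Mary tea".
# The elided predicate is materialized as a copy node ("likes'"),
# giving the second conjunct ordinary nsubj/obj edges.
edges = [
    ("ROOT",   "likes",  "root"),
    ("likes",  "Paul",   "nsubj"),
    ("likes",  "coffee", "obj"),
    ("likes",  "likes'", "conj"),   # copy node for the elided predicate
    ("likes'", "Mary",   "nsubj"),
    ("likes'", "tea",    "obj"),
    ("likes'", "and",    "cc"),
]
for head, dep, rel in edges:
    print(f"{rel}({head}, {dep})")
```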

Understanding Measures of Uncertainty for Adversarial Example Detection

Title Understanding Measures of Uncertainty for Adversarial Example Detection
Authors Lewis Smith, Yarin Gal
Abstract Measuring uncertainty is a promising technique for detecting adversarial examples, crafted inputs on which the model predicts an incorrect class with high confidence. But many measures of uncertainty exist, including predictive entropy and mutual information, each capturing different types of uncertainty. We study these measures, and shed light on why mutual information seems to be effective at the task of adversarial example detection. We highlight failure modes for MC dropout, a widely used approach for estimating uncertainty in deep models. This leads to an improved understanding of the drawbacks of current methods, and a proposal to improve the quality of uncertainty estimates using probabilistic model ensembles. We give illustrative experiments using MNIST to demonstrate the intuition underlying the different measures of uncertainty, as well as experiments on the real-world Kaggle dogs-vs-cats classification dataset.
Tasks
Published 2018-03-22
URL http://arxiv.org/abs/1803.08533v1
PDF http://arxiv.org/pdf/1803.08533v1.pdf
PWC https://paperswithcode.com/paper/understanding-measures-of-uncertainty-for
Repo https://github.com/lsgos/uncertainty-adversarial-paper
Framework tf
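
The two measures contrasted in the abstract are easy to compute from stochastic forward passes, and the contrast is the whole point: predictive entropy mixes aleatoric and epistemic uncertainty, while mutual information isolates the epistemic part that matters for adversarial detection. A hedged numpy sketch of the standard decomposition (not code from the paper's repository):

```python
import numpy as np

def uncertainty_measures(mc_probs, eps=1e-12):
    """mc_probs: (T, C) softmax outputs from T stochastic forward passes
    (e.g. MC dropout) for one input with C classes. Returns predictive
    entropy and mutual information (entropy minus expected entropy)."""
    mean_p = mc_probs.mean(axis=0)
    pred_entropy = -np.sum(mean_p * np.log(mean_p + eps))
    expected_entropy = -np.mean(np.sum(mc_probs * np.log(mc_probs + eps), axis=1))
    return pred_entropy, pred_entropy - expected_entropy

# passes that agree on a flat distribution: high entropy, near-zero MI;
# passes that disagree confidently: same entropy, but high MI
agree = np.tile([0.5, 0.5], (10, 1))
disagree = np.array([[0.95, 0.05], [0.05, 0.95]] * 5)
print(uncertainty_measures(agree))     # (~0.69, ~0.0)
print(uncertainty_measures(disagree))  # (~0.69, ~0.49)
```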

Neural Machine Translation for Query Construction and Composition

Title Neural Machine Translation for Query Construction and Composition
Authors Tommaso Soru, Edgard Marx, André Valdestilhas, Diego Esteves, Diego Moussallem, Gustavo Publio
Abstract Research on question answering with knowledge bases has recently seen an increasing use of deep architectures. In this extended abstract, we study the application of the neural machine translation paradigm to question parsing. We employ a sequence-to-sequence model to learn graph patterns in the SPARQL graph query language and their compositions. Instead of inducing the programs through question-answer pairs, we adopt a semi-supervised approach, where alignments between questions and queries are built through templates. We argue that the coverage of language utterances can be expanded using recent notable works in natural language generation.
Tasks Code Generation, Knowledge Base Question Answering, Machine Translation, Semantic Parsing
Published 2018-06-27
URL http://arxiv.org/abs/1806.10478v2
PDF http://arxiv.org/pdf/1806.10478v2.pdf
PWC https://paperswithcode.com/paper/neural-machine-translation-for-query
Repo https://github.com/AKSW/NSpM
Framework tf
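
The semi-supervised alignment described above builds training pairs from templates rather than from question-answer supervision. A hypothetical toy version of that idea (the template strings and entities are invented for illustration; see the NSpM repository for the real pipeline):

```python
# Filling a shared slot in paired question/query templates yields aligned
# (question, SPARQL) examples for training a sequence-to-sequence model.
TEMPLATE = {
    "question": "where was <A> born",
    "query": "SELECT ?x WHERE { dbr:<A> dbo:birthPlace ?x }",
}
entities = ["Albert_Einstein", "Marie_Curie"]

pairs = [
    (TEMPLATE["question"].replace("<A>", e.replace("_", " ")),
     TEMPLATE["query"].replace("<A>", e))
    for e in entities
]
for question, sparql in pairs:
    print(question, "->", sparql)
```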

An Optimal Control Approach to Deep Learning and Applications to Discrete-Weight Neural Networks

Title An Optimal Control Approach to Deep Learning and Applications to Discrete-Weight Neural Networks
Authors Qianxiao Li, Shuji Hao
Abstract Deep learning is formulated as a discrete-time optimal control problem. This allows one to characterize necessary conditions for optimality and develop training algorithms that do not rely on gradients with respect to the trainable parameters. In particular, we introduce the discrete-time method of successive approximations (MSA), which is based on Pontryagin's maximum principle, for training neural networks. A rigorous error estimate for the discrete MSA is obtained, which sheds light on its dynamics and the means to stabilize the algorithm. The developed methods are applied to train, in a rather principled way, neural networks with weights that are constrained to take values in a discrete set. We obtain competitive performance and, interestingly, very sparse weights in the case of ternary networks, which may be useful for model deployment on low-memory devices.
Tasks
Published 2018-03-04
URL http://arxiv.org/abs/1803.01299v2
PDF http://arxiv.org/pdf/1803.01299v2.pdf
PWC https://paperswithcode.com/paper/an-optimal-control-approach-to-deep-learning
Repo https://github.com/LiQianxiao/discrete-MSA
Framework tf
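
Because MSA replaces gradient steps on the weights with per-layer maximization of a Hamiltonian, it can handle weights restricted to a discrete set, as the abstract notes. A hedged toy on a scalar chain (the paper treats general deep networks and proves error estimates; this merely shows the forward/backward/maximize loop):

```python
# Toy discrete MSA on the scalar dynamics x_{t+1} = theta_t * x_t with
# ternary weights and terminal loss (x_T - y)^2. Illustrative only.
THETAS = [-1.0, 0.0, 1.0]          # discrete weight set
T, x0, y = 3, 2.0, -2.0            # depth, input, target
theta = [1.0] * T                  # initial guess

for _ in range(10):
    # forward pass: propagate the state
    x = [x0]
    for t in range(T):
        x.append(theta[t] * x[t])
    # backward pass: costate p_t = dH/dx = p_{t+1} * theta_t,
    # with p_T set to minus the gradient of the terminal loss
    p = [0.0] * (T + 1)
    p[T] = -2.0 * (x[T] - y)
    for t in reversed(range(T)):
        p[t] = p[t + 1] * theta[t]
    # maximize the Hamiltonian H_t = p_{t+1} * theta * x_t layer by layer
    theta = [max(THETAS, key=lambda th: p[t + 1] * th * x[t]) for t in range(T)]

x_T = x0
for th in theta:
    x_T *= th
print(theta, x_T)  # converges to weights whose product maps x0 to y
```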

Challenges in High-dimensional Reinforcement Learning with Evolution Strategies

Title Challenges in High-dimensional Reinforcement Learning with Evolution Strategies
Authors Nils Müller, Tobias Glasmachers
Abstract Evolution Strategies (ESs) have recently become popular for training deep neural networks, in particular on reinforcement learning tasks, a special form of controller design. Compared to classic problems in continuous direct search, deep networks pose extremely high-dimensional optimization problems, with many thousands or even millions of variables. In addition, many control problems give rise to a stochastic fitness function. Considering the relevance of the application, we study the suitability of evolution strategies for high-dimensional, stochastic problems. Our results give insights into which algorithmic mechanisms of modern ES are of value for the class of problems at hand, and they reveal principled limitations of the approach. They are in line with our theoretical understanding of ESs. We show that combining ESs that offer reduced internal algorithm cost with uncertainty handling techniques yields promising methods for this class of problems.
Tasks
Published 2018-06-04
URL http://arxiv.org/abs/1806.01224v2
PDF http://arxiv.org/pdf/1806.01224v2.pdf
PWC https://paperswithcode.com/paper/challenges-in-high-dimensional-reinforcement
Repo https://github.com/NiMlr/High-Dim-ES-RL
Framework tf
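
To ground the discussion above: the basic ES machinery in question estimates a search direction from random perturbations of the parameters, and one standard uncertainty-handling device for a stochastic fitness function is simply re-evaluating and averaging. A hedged numpy sketch on a toy noisy objective (not the authors' code; their repository implements the full algorithm variants studied):

```python
import numpy as np

def noisy_fitness(w, rng):
    """Stochastic toy fitness: negative sphere function plus evaluation
    noise, standing in for a noisy reinforcement learning return."""
    return -np.sum(w ** 2) + rng.normal(scale=0.5)

def es_step(w, sigma, lr, n_pairs, n_reval, rng):
    """One ES step with antithetic sampling; averaging n_reval
    re-evaluations is a simple uncertainty-handling technique."""
    grad = np.zeros_like(w)
    for _ in range(n_pairs):
        eps = rng.normal(size=w.shape)
        f_plus = np.mean([noisy_fitness(w + sigma * eps, rng) for _ in range(n_reval)])
        f_minus = np.mean([noisy_fitness(w - sigma * eps, rng) for _ in range(n_reval)])
        grad += (f_plus - f_minus) * eps
    return w + lr * grad / (2 * sigma * n_pairs)

rng = np.random.default_rng(0)
w = rng.normal(size=100)                  # high-dimensional toy parameters
for _ in range(200):
    w = es_step(w, sigma=0.1, lr=0.02, n_pairs=20, n_reval=4, rng=rng)
print(np.sum(w ** 2))                     # shrinks toward the noise floor
```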

Recurrent DNNs and its Ensembles on the TIMIT Phone Recognition Task

Title Recurrent DNNs and its Ensembles on the TIMIT Phone Recognition Task
Authors Jan Vanek, Josef Michalek, Josef Psutka
Abstract In this paper, we investigate recurrent deep neural networks (DNNs) in combination with regularization techniques such as dropout, zoneout, and regularization post-layer. As a benchmark, we chose the TIMIT phone recognition task due to its popularity and broad availability in the community. It also simulates a low-resource scenario, which is helpful for minor languages. We also prefer the phone recognition task because it is much more sensitive to acoustic model quality than a large vocabulary continuous speech recognition task. In recent years, recurrent DNNs have pushed down the error rates in automatic speech recognition, but no clear winner has emerged among the proposed architectures. Dropout was used as the regularization technique in most cases, but its combination with other regularization techniques and with model ensembles was left unexplored. In our experiments, an ensemble of recurrent DNNs performed best, achieving an average phone error rate of 14.84% over 10 experiments (minimum 14.69%) on the core test set, which is, to our knowledge, slightly lower than the best published PER to date. Finally, in contrast to most papers, we publish open-source scripts to make it easy to replicate our results and to help continue the development.
Tasks Large Vocabulary Continuous Speech Recognition, Speech Recognition
Published 2018-06-19
URL http://arxiv.org/abs/1806.07186v1
PDF http://arxiv.org/pdf/1806.07186v1.pdf
PWC https://paperswithcode.com/paper/recurrent-dnns-and-its-ensembles-on-the-timit
Repo https://github.com/OrcusCZ/NNAcousticModeling
Framework none
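
Of the regularizers named in the abstract, zoneout is the least widely known: instead of zeroing activations like dropout, it randomly freezes hidden units at their previous time step's value. A hedged one-function sketch of the idea (not taken from the authors' scripts):

```python
import numpy as np

def zoneout(h_prev, h_new, rate, rng, training=True):
    """Zoneout for recurrent nets: during training each hidden unit keeps
    its previous value with probability `rate`; at test time the states
    are deterministically interpolated."""
    if training:
        keep_prev = rng.random(h_new.shape) < rate
        return np.where(keep_prev, h_prev, h_new)
    return rate * h_prev + (1.0 - rate) * h_new

rng = np.random.default_rng(0)
print(zoneout(np.zeros(8), np.ones(8), rate=0.25, rng=rng))
```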

Smooth Loss Functions for Deep Top-k Classification

Title Smooth Loss Functions for Deep Top-k Classification
Authors Leonard Berrada, Andrew Zisserman, M. Pawan Kumar
Abstract The top-k error is a common measure of performance in machine learning and computer vision. In practice, top-k classification is typically performed with deep neural networks trained with the cross-entropy loss. Theoretical results indeed suggest that cross-entropy is an optimal learning objective for such a task in the limit of infinite data. In the context of limited and noisy data however, the use of a loss function that is specifically designed for top-k classification can bring significant improvements. Our empirical evidence suggests that the loss function must be smooth and have non-sparse gradients in order to work well with deep neural networks. Consequently, we introduce a family of smoothed loss functions that are suited to top-k optimization via deep learning. The widely used cross-entropy is a special case of our family. Evaluating our smooth loss functions is computationally challenging: a naïve algorithm would require $\mathcal{O}(\binom{n}{k})$ operations, where n is the number of classes. Thanks to a connection to polynomial algebra and a divide-and-conquer approach, we provide an algorithm with a time complexity of $\mathcal{O}(k n)$. Furthermore, we present a novel approximation to obtain fast and stable algorithms on GPUs with single floating point precision. We compare the performance of the cross-entropy loss and our margin-based losses in various regimes of noise and data size, for the predominant use case of k=5. Our investigation reveals that our loss is more robust to noise and overfitting than cross-entropy.
Tasks
Published 2018-02-21
URL http://arxiv.org/abs/1802.07595v1
PDF http://arxiv.org/pdf/1802.07595v1.pdf
PWC https://paperswithcode.com/paper/smooth-loss-functions-for-deep-top-k
Repo https://github.com/oval-group/smooth-topk
Framework pytorch
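
The computational trick mentioned above hinges on elementary symmetric polynomials: a smooth surrogate of the sum of the top-k scores can be written as $\tau \log e_k(\exp(s/\tau))$, and $e_k$ is computable by a simple $\mathcal{O}(kn)$ dynamic program instead of enumerating all $\binom{n}{k}$ subsets. A hedged sketch of that primitive only (the paper's released code adds the margin terms and a numerically careful scheme for single precision):

```python
import numpy as np

def log_esp_k(log_x, k):
    """log of the k-th elementary symmetric polynomial of exp(log_x),
    via the O(k*n) recursion e_j(i) = e_j(i-1) + x_i * e_{j-1}(i-1).
    Crude max-shift stabilization only; illustrative, not the paper's
    production algorithm."""
    m = log_x.max()
    x = np.exp(log_x - m)
    e = np.zeros(k + 1)
    e[0] = 1.0
    for xi in x:
        for j in range(k, 0, -1):   # high j first so e[j-1] is the old value
            e[j] += xi * e[j - 1]
    return np.log(e[k]) + k * m

def smooth_topk_sum(scores, k, tau=1.0):
    """Smooth surrogate of the sum of the top-k entries of `scores`;
    recovers the hard top-k sum as tau -> 0."""
    return tau * log_esp_k(np.asarray(scores, dtype=float) / tau, k)

s = [4.0, 1.0, 3.0, 0.5, 2.0]
print(smooth_topk_sum(s, k=3, tau=0.1))  # ~9.0, i.e. ~ 4 + 3 + 2
```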

Persistence Fisher Kernel: A Riemannian Manifold Kernel for Persistence Diagrams

Title Persistence Fisher Kernel: A Riemannian Manifold Kernel for Persistence Diagrams
Authors Tam Le, Makoto Yamada
Abstract Algebraic topology methods have recently played an important role in statistical analysis of complicated geometric structured data such as shapes, linked twist maps, and material data. Among them, persistent homology is a well-known tool for extracting robust topological features, and it outputs persistence diagrams (PDs). However, PDs are point multi-sets which cannot be used directly in machine learning algorithms designed for vector data. To deal with this, an emerging approach is to use kernel methods, and an appropriate geometry for PDs is an important factor in measuring the similarity of PDs. A popular geometry for PDs is the Wasserstein metric. However, the Wasserstein distance is not negative definite, so it is difficult to build positive definite kernels upon it without approximation. In this work, we instead rely on the Fisher information geometry to propose a positive definite kernel for PDs without approximation, namely the Persistence Fisher (PF) kernel. We then analyze the eigensystem of the integral operator induced by the proposed kernel for kernel machines, and based on that derive generalization error bounds via covering numbers and Rademacher averages for kernel machines with the PF kernel. Additionally, we show some nice properties of the proposed kernel, such as stability and infinite divisibility. Furthermore, we propose an approximation of the kernel with bounded error whose time complexity is linear in the number of points in the PDs. Through experiments with many different tasks on various benchmark datasets, we illustrate that the PF kernel compares favorably with other baseline kernels for PDs.
Tasks
Published 2018-02-10
URL http://arxiv.org/abs/1802.03569v5
PDF http://arxiv.org/pdf/1802.03569v5.pdf
PWC https://paperswithcode.com/paper/persistence-fisher-kernel-a-riemannian
Repo https://github.com/lttam/PersistenceFisher
Framework none
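
In concrete terms, the PF kernel smooths each diagram into a normalized density, measures the Fisher information (geodesic) distance between the two densities, and exponentiates. A hedged, simplified numpy sketch (it omits the paper's augmentation of each diagram with the other's projection onto the diagonal, and all bandwidths are illustrative):

```python
import numpy as np

def smoothed_density(diagram, grid, sigma=0.1):
    """Gaussian-smoothed, normalized density of a persistence diagram
    (an (m, 2) array of birth/death points) evaluated on grid points."""
    d2 = ((grid[:, None, :] - diagram[None, :, :]) ** 2).sum(-1)
    rho = np.exp(-d2 / (2 * sigma ** 2)).sum(axis=1)
    return rho / rho.sum()

def persistence_fisher_kernel(D1, D2, grid, t=1.0, sigma=0.1):
    """k_PF(D1, D2) = exp(-t * d_FIM) with the Fisher information
    distance d_FIM = arccos(<sqrt(rho1), sqrt(rho2)>)."""
    r1 = smoothed_density(D1, grid, sigma)
    r2 = smoothed_density(D2, grid, sigma)
    bc = np.clip(np.sum(np.sqrt(r1 * r2)), 0.0, 1.0)  # Bhattacharyya coeff.
    return np.exp(-t * np.arccos(bc))

# toy diagrams and a coarse evaluation grid over [0, 1]^2
xs = np.linspace(0, 1, 20)
grid = np.array([(b, d) for b in xs for d in xs])
D1 = np.array([[0.1, 0.6], [0.2, 0.9]])
D2 = np.array([[0.15, 0.55], [0.3, 0.8]])
print(persistence_fisher_kernel(D1, D2, grid))  # close to 1 for similar PDs
```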

The Contextual Loss for Image Transformation with Non-Aligned Data

Title The Contextual Loss for Image Transformation with Non-Aligned Data
Authors Roey Mechrez, Itamar Talmi, Lihi Zelnik-Manor
Abstract Feed-forward CNNs trained for image transformation problems rely on loss functions that measure the similarity between the generated image and a target image. Most of the common loss functions assume that these images are spatially aligned and compare pixels at corresponding locations. However, for many tasks, aligned training pairs of images will not be available. We present an alternative loss function that does not require alignment, thus providing an effective and simple solution for a new space of problems. Our loss is based on both context and semantics – it compares regions with similar semantic meaning, while considering the context of the entire image. Hence, for example, when transferring the style of one face to another, it will translate eyes-to-eyes and mouth-to-mouth. Our code can be found at https://www.github.com/roimehrez/contextualLoss
Tasks
Published 2018-03-06
URL http://arxiv.org/abs/1803.02077v4
PDF http://arxiv.org/pdf/1803.02077v4.pdf
PWC https://paperswithcode.com/paper/the-contextual-loss-for-image-transformation
Repo https://github.com/S-aiueo32/contextual_loss_pytorch
Framework pytorch
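
The contextual loss itself is short enough to state in full: features are compared by cosine distance, distances are converted to similarities relative to each feature's best match, and the loss rewards every target feature having some well-matching generated feature, with no spatial alignment involved. A hedged numpy sketch of the formulation (the released code operates on VGG feature maps and adds further engineering details):

```python
import numpy as np

def contextual_loss(X, Y, h=0.5, eps=1e-5):
    """CX loss between feature sets X (generated) and Y (target) of shape
    (N, d). Bandwidth h follows the paper's notation."""
    mu = Y.mean(axis=0)                        # center both sets on target mean
    Xn = X - mu
    Yn = Y - mu
    Xn = Xn / (np.linalg.norm(Xn, axis=1, keepdims=True) + eps)
    Yn = Yn / (np.linalg.norm(Yn, axis=1, keepdims=True) + eps)
    d = 1.0 - Xn @ Yn.T                        # cosine distances d_ij
    d_rel = d / (d.min(axis=1, keepdims=True) + eps)   # relative distances
    w = np.exp((1.0 - d_rel) / h)
    cx = w / w.sum(axis=1, keepdims=True)      # normalized similarities CX_ij
    return -np.log(cx.max(axis=0).mean() + eps)  # best match per target feature

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 8))
print(contextual_loss(X, X))                         # ~0 for identical sets
print(contextual_loss(X, rng.normal(size=(32, 8))))  # larger when unrelated
```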

The Deep Weight Prior

Title The Deep Weight Prior
Authors Andrei Atanov, Arsenii Ashukha, Kirill Struminsky, Dmitry Vetrov, Max Welling
Abstract Bayesian inference is known to provide a general framework for incorporating prior knowledge or specific properties into machine learning models via a carefully chosen prior distribution. In this work, we propose a new type of prior distribution for convolutional neural networks, the deep weight prior (DWP), which exploits generative models to encourage a specific structure in trained convolutional filters, e.g., spatial correlations of weights. We define DWP in the form of an implicit distribution and propose a method for variational inference with this type of implicit prior. In experiments, we show that DWP improves the performance of Bayesian neural networks when training data are limited, and that initializing weights with samples from DWP accelerates the training of conventional convolutional neural networks.
Tasks Bayesian Inference
Published 2018-10-16
URL http://arxiv.org/abs/1810.06943v6
PDF http://arxiv.org/pdf/1810.06943v6.pdf
PWC https://paperswithcode.com/paper/the-deep-weight-prior
Repo https://github.com/matyushinleonid/WeightPrior
Framework pytorch

Towards Machine Learning on data from Professional Cyclists

Title Towards Machine Learning on data from Professional Cyclists
Authors Agrin Hilmkil, Oscar Ivarsson, Moa Johansson, Dan Kuylenstierna, Teun van Erp
Abstract Professional sports are developing towards increasingly scientific training methods, with increasing amounts of data being collected from laboratory tests, training sessions, and competitions. In cycling, it is standard to equip bicycles with small computers recording data from sensors such as power meters, in addition to heart rate, speed, altitude, etc. Recently, machine learning techniques have had huge success in a wide variety of areas where large amounts of data (big data) are available. In this paper, we perform a pilot experiment on using machine learning to model physical response in elite cyclists. As a first experiment, we show that it is possible to train an LSTM machine learning algorithm to predict the heart-rate response of a cyclist during a training session. This work is a promising first step towards developing more elaborate models based on big data and machine learning to capture performance aspects of athletes.
Tasks
Published 2018-08-01
URL http://arxiv.org/abs/1808.00198v1
PDF http://arxiv.org/pdf/1808.00198v1.pdf
PWC https://paperswithcode.com/paper/towards-machine-learning-on-data-from
Repo https://github.com/agrinh/procyclist_performance
Framework tf

Bringing Cartoons to Life: Towards Improved Cartoon Face Detection and Recognition Systems

Title Bringing Cartoons to Life: Towards Improved Cartoon Face Detection and Recognition Systems
Authors Saurav Jha, Nikhil Agarwal, Suneeta Agarwal
Abstract Given the recent deep learning advancements in face detection and recognition techniques for human faces, this paper answers the question “how well would they work for cartoons?” - a domain that remained largely unexplored until recently, mainly due to the unavailability of large-scale datasets and the failure of traditional methods on them. Our work studies and extends multiple frameworks for the aforementioned tasks. For face detection, we incorporate the Multi-task Cascaded Convolutional Network (MTCNN) architecture and contrast it with conventional methods. For face recognition, our two-fold contributions include: (i) an inductive transfer learning approach combining the feature learning capability of the Inception v3 network and the feature recognizing capability of Support Vector Machines (SVMs), (ii) a proposed Hybrid Convolutional Neural Network (HCNN) framework trained over a fusion of pixel values and 15 manually located facial keypoints. All the methods are evaluated on the Cartoon Faces in the Wild (IIIT-CFW) database. We demonstrate that the HCNN model offers stability superior to that of Inception+SVM over larger input variations, and explore the plausible architectural principles. We show that the Inception+SVM model establishes a state-of-the-art F1 score on the task of gender recognition of cartoon faces. Further, we introduce a small database hosting location coordinates of 15 points on the cartoon faces belonging to 50 public figures of the IIIT-CFW database.
Tasks Face Detection, Face Recognition, Transfer Learning
Published 2018-04-05
URL http://arxiv.org/abs/1804.01753v2
PDF http://arxiv.org/pdf/1804.01753v2.pdf
PWC https://paperswithcode.com/paper/bringing-cartoons-to-life-towards-improved
Repo https://github.com/Saurav0074/Cartoon-Face-Detection-and-Recognition
Framework none