Paper Group AWR 90
Unsupervised Statistical Machine Translation
Title | Unsupervised Statistical Machine Translation |
Authors | Mikel Artetxe, Gorka Labaka, Eneko Agirre |
Abstract | While modern machine translation has relied on large parallel corpora, a recent line of work has managed to train Neural Machine Translation (NMT) systems from monolingual corpora only (Artetxe et al., 2018c; Lample et al., 2018). Despite the potential of this approach for low-resource settings, existing systems are far behind their supervised counterparts, limiting their practical interest. In this paper, we propose an alternative approach based on phrase-based Statistical Machine Translation (SMT) that significantly closes the gap with supervised systems. Our method profits from the modular architecture of SMT: we first induce a phrase table from monolingual corpora through cross-lingual embedding mappings, combine it with an n-gram language model, and fine-tune hyperparameters through an unsupervised MERT variant. In addition, iterative backtranslation improves results further, yielding, for instance, 14.08 and 26.22 BLEU points in WMT 2014 English-German and English-French, respectively, an improvement of more than 7-10 BLEU points over previous unsupervised systems, and closing the gap with supervised SMT (Moses trained on Europarl) down to 2-5 BLEU points. Our implementation is available at https://github.com/artetxem/monoses |
Tasks | Language Modelling, Machine Translation, Unsupervised Machine Translation |
Published | 2018-09-04 |
URL | http://arxiv.org/abs/1809.01272v1 |
http://arxiv.org/pdf/1809.01272v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-statistical-machine-translation |
Repo | https://github.com/artetxem/monoses |
Framework | pytorch |
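A minimal numpy sketch of the phrase-table induction step the abstract describes: translation candidates are scored by cosine similarity of cross-lingually mapped embeddings and normalized into probabilities. This is word-level and illustrative only; the actual system (monoses) also handles multi-word phrases, richer features, and the unsupervised MERT tuning.

```python
# Toy word-level translation table induced from "mapped" embeddings.
import numpy as np

def induce_phrase_table(src_vecs, tgt_vecs, src_words, tgt_words,
                        n_best=3, temperature=0.1):
    """Score target words by cosine similarity of cross-lingually mapped
    embeddings; keep a softmax-normalised n-best list per source word."""
    src = src_vecs / np.linalg.norm(src_vecs, axis=1, keepdims=True)
    tgt = tgt_vecs / np.linalg.norm(tgt_vecs, axis=1, keepdims=True)
    sims = src @ tgt.T                       # cosine similarities
    table = {}
    for i, w in enumerate(src_words):
        best = np.argsort(-sims[i])[:n_best]
        probs = np.exp(sims[i][best] / temperature)
        probs /= probs.sum()                 # translation probabilities
        table[w] = [(tgt_words[j], float(p)) for j, p in zip(best, probs)]
    return table

# usage with random stand-ins for mapped embeddings
rng = np.random.default_rng(0)
table = induce_phrase_table(rng.normal(size=(4, 8)), rng.normal(size=(5, 8)),
                            ["haus", "katze", "hund", "baum"],
                            ["house", "cat", "dog", "tree", "car"])
print(table["haus"])
```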
Towards Deep Conversational Recommendations
Title | Towards Deep Conversational Recommendations |
Authors | Raymond Li, Samira Kahou, Hannes Schulz, Vincent Michalski, Laurent Charlin, Chris Pal |
Abstract | There has been growing interest in using neural networks and deep learning techniques to create dialogue systems. Conversational recommendation is an interesting setting for the scientific exploration of dialogue with natural language, as the associated discourse involves goal-driven dialogue that often transforms naturally into more free-form chat. This paper provides two contributions. First, until now there has been no publicly available large-scale dataset consisting of real-world dialogues centered around recommendations. To address this issue and to facilitate our exploration here, we have collected ReDial, a dataset consisting of over 10,000 conversations centered around the theme of providing movie recommendations. We make this data available to the community for further research. Second, we use this dataset to explore multiple facets of conversational recommendations. In particular, we explore new neural architectures, mechanisms, and methods suitable for composing conversational recommendation systems. Our dataset allows us to systematically probe model sub-components addressing different parts of the overall problem domain, ranging from sentiment analysis and cold-start recommendation generation to detailed aspects of how natural language is used in this setting in the real world. We combine such sub-components into a full-blown dialogue system and examine its behavior. |
Tasks | Recommendation Systems, Sentiment Analysis |
Published | 2018-12-18 |
URL | http://arxiv.org/abs/1812.07617v2 |
http://arxiv.org/pdf/1812.07617v2.pdf | |
PWC | https://paperswithcode.com/paper/towards-deep-conversational-recommendations |
Repo | https://github.com/RaymondLi0/conversational-recommendations |
Framework | pytorch |
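A minimal PyTorch sketch of one sub-component the abstract mentions: a cold-start-style recommender that completes a partially observed vector of movie opinions with an autoencoder. The sizes and architecture here are illustrative assumptions, not the paper's actual model.

```python
# Autoencoder-style recommender: complete a sparse vector of liked movies.
import torch
import torch.nn as nn

class AutoencoderRecommender(nn.Module):
    def __init__(self, n_movies=1000, hidden=64):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(n_movies, hidden), nn.ReLU())
        self.decode = nn.Linear(hidden, n_movies)

    def forward(self, liked):                    # liked: (batch, n_movies) in {0,1}
        return self.decode(self.encode(liked))   # scores for all movies

model = AutoencoderRecommender()
liked = torch.zeros(1, 1000)
liked[0, [3, 17, 42]] = 1.0                      # movies mentioned positively so far
scores = model(liked)
print(scores.topk(5).indices)                    # candidate recommendations
```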
Real-time Automatic Word Segmentation for User-generated Text
Title | Real-time Automatic Word Segmentation for User-generated Text |
Authors | Won Ik Cho, Sung Jun Cheon, Woo Hyun Kang, Ji Won Kim, Nam Soo Kim |
Abstract | For readability, and possibly for disambiguation, appropriate word segmentation is recommended for written text. In this paper, we propose a real-time assistive technology that utilizes automatic segmentation. The language investigated is Korean, a head-final language with various morpho-syllabic blocks as characters. The training scheme is fully neural network-based and straightforward. In addition, we show how the proposed system can be utilized in web-based real-time revision of user-generated text. Through qualitative and quantitative comparison with widely used text processing toolkits, we show the reliability of the proposed system and how well it fits conversation-style and non-canonical texts. The demonstration is available online. |
Tasks | |
Published | 2018-10-31 |
URL | https://arxiv.org/abs/1810.13113v2 |
https://arxiv.org/pdf/1810.13113v2.pdf | |
PWC | https://paperswithcode.com/paper/real-time-automatic-word-segmentation-for |
Repo | https://github.com/warnikchow/raws |
Framework | tf |
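A minimal Keras sketch of the kind of fully neural scheme the abstract implies: a character-level BiLSTM tagger that decides, for each character, whether a space should follow it. The vocabulary size, sequence length, and layer sizes are illustrative assumptions.

```python
# Per-character binary tagger: 1 = insert a space after this character.
import numpy as np
import tensorflow as tf

n_chars, max_len = 2000, 64                  # hypothetical syllable inventory
model = tf.keras.Sequential([
    tf.keras.Input(shape=(max_len,)),
    tf.keras.layers.Embedding(n_chars, 32),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(1, activation="sigmoid")),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# toy training pair: character ids and a 0/1 space label per position
x = np.random.randint(1, n_chars, size=(8, max_len))
y = np.random.randint(0, 2, size=(8, max_len, 1)).astype("float32")
model.fit(x, y, epochs=1, verbose=0)
```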
Sentences with Gapping: Parsing and Reconstructing Elided Predicates
Title | Sentences with Gapping: Parsing and Reconstructing Elided Predicates |
Authors | Sebastian Schuster, Joakim Nivre, Christopher D. Manning |
Abstract | Sentences with gapping, such as "Paul likes coffee and Mary tea", lack an overt predicate to indicate the relation between two or more arguments. Surface syntax representations of such sentences are often produced poorly by parsers and, even when correct, are not well suited to downstream natural language understanding tasks such as relation extraction, which are typically designed to extract information from sentences with canonical clause structure. In this paper, we present two methods for parsing to a Universal Dependencies graph representation that explicitly encodes the elided material with additional nodes and edges. We find that both methods can reconstruct elided material from dependency trees with high accuracy when the parser correctly predicts the existence of a gap. We further demonstrate that one of our methods can be applied to other languages based on a case study on Swedish. |
Tasks | Relation Extraction |
Published | 2018-04-18 |
URL | http://arxiv.org/abs/1804.06922v1 |
http://arxiv.org/pdf/1804.06922v1.pdf | |
PWC | https://paperswithcode.com/paper/sentences-with-gapping-parsing-and |
Repo | https://github.com/dialogue-evaluation/AGRR-2019 |
Framework | none |
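A small illustration of the representation the abstract describes, using its own example sentence: the elided predicate in "Paul likes coffee and Mary tea" is made explicit as a copy node so that "Mary" and "tea" receive a governor. The exact enhanced-UD encoding in the paper carries more detail.

```python
# Dependency graph with a copy node (likes') for the elided predicate.
edges = [
    ("likes",  "Paul",   "nsubj"),
    ("likes",  "coffee", "obj"),
    ("likes",  "likes'", "conj"),    # copy node standing in for the gap
    ("likes'", "Mary",   "nsubj"),
    ("likes'", "tea",    "obj"),
]
for head, dep, rel in edges:
    print(f"{rel}({head}, {dep})")
```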
Understanding Measures of Uncertainty for Adversarial Example Detection
Title | Understanding Measures of Uncertainty for Adversarial Example Detection |
Authors | Lewis Smith, Yarin Gal |
Abstract | Measuring uncertainty is a promising technique for detecting adversarial examples, crafted inputs on which the model predicts an incorrect class with high confidence. But many measures of uncertainty exist, including predictive entropy and mutual information, each capturing different types of uncertainty. We study these measures, and shed light on why mutual information seems to be effective at the task of adversarial example detection. We highlight failure modes for MC dropout, a widely used approach for estimating uncertainty in deep models. This leads to an improved understanding of the drawbacks of current methods, and a proposal to improve the quality of uncertainty estimates using probabilistic model ensembles. We give illustrative experiments using MNIST to demonstrate the intuition underlying the different measures of uncertainty, as well as experiments on a real world Kaggle dogs vs cats classification dataset. |
Tasks | |
Published | 2018-03-22 |
URL | http://arxiv.org/abs/1803.08533v1 |
http://arxiv.org/pdf/1803.08533v1.pdf | |
PWC | https://paperswithcode.com/paper/understanding-measures-of-uncertainty-for |
Repo | https://github.com/lsgos/uncertainty-adversarial-paper |
Framework | tf |
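A minimal numpy sketch of the two measures the abstract contrasts, computed from MC-dropout samples: the predictive entropy of the averaged prediction, and the mutual information (predictive entropy minus the mean per-sample entropy). The toy inputs show why mutual information isolates model disagreement.

```python
# Predictive entropy vs. mutual information from MC-dropout samples.
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    return -np.sum(p * np.log(p + eps), axis=axis)

def uncertainty_measures(mc_probs):
    """mc_probs: (n_samples, n_classes) probabilities from repeated
    stochastic forward passes on the same input."""
    mean_p = mc_probs.mean(axis=0)
    predictive_entropy = entropy(mean_p)
    expected_entropy = entropy(mc_probs, axis=-1).mean()
    mutual_information = predictive_entropy - expected_entropy
    return predictive_entropy, mutual_information

# disagreeing samples -> high mutual information (model uncertainty)
disagree = np.array([[0.9, 0.1], [0.1, 0.9]])
# agreeing but unconfident samples -> high entropy, near-zero mutual information
ambiguous = np.array([[0.5, 0.5], [0.5, 0.5]])
print(uncertainty_measures(disagree))
print(uncertainty_measures(ambiguous))
```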
Neural Machine Translation for Query Construction and Composition
Title | Neural Machine Translation for Query Construction and Composition |
Authors | Tommaso Soru, Edgard Marx, André Valdestilhas, Diego Esteves, Diego Moussallem, Gustavo Publio |
Abstract | Research on question answering over knowledge bases has recently seen an increasing use of deep architectures. In this extended abstract, we study the application of the neural machine translation paradigm to question parsing. We employ a sequence-to-sequence model to learn graph patterns in the SPARQL graph query language and their compositions. Instead of inducing the programs through question-answer pairs, we adopt a semi-supervised approach, where alignments between questions and queries are built through templates. We argue that the coverage of language utterances can be expanded using recent notable works in natural language generation. |
Tasks | Code Generation, Knowledge Base Question Answering, Machine Translation, Semantic Parsing |
Published | 2018-06-27 |
URL | http://arxiv.org/abs/1806.10478v2 |
http://arxiv.org/pdf/1806.10478v2.pdf | |
PWC | https://paperswithcode.com/paper/neural-machine-translation-for-query |
Repo | https://github.com/AKSW/NSpM |
Framework | tf |
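A small illustration of the template idea in the abstract: question/query template pairs are instantiated with entities to produce the aligned data a sequence-to-sequence model can be trained on. The template and predicate below are illustrative, not taken from the paper.

```python
# Instantiating a question/SPARQL template pair with an entity.
template = ("who is the mayor of <A>",
            "SELECT ?x WHERE { <A> dbo:mayor ?x }")

def instantiate(entity_label, entity_uri):
    question, sparql = template
    return (question.replace("<A>", entity_label),
            sparql.replace("<A>", entity_uri))

src, tgt = instantiate("Berlin", "dbr:Berlin")
print(src)   # who is the mayor of Berlin
print(tgt)   # SELECT ?x WHERE { dbr:Berlin dbo:mayor ?x }
```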
An Optimal Control Approach to Deep Learning and Applications to Discrete-Weight Neural Networks
Title | An Optimal Control Approach to Deep Learning and Applications to Discrete-Weight Neural Networks |
Authors | Qianxiao Li, Shuji Hao |
Abstract | Deep learning is formulated as a discrete-time optimal control problem. This allows one to characterize necessary conditions for optimality and develop training algorithms that do not rely on gradients with respect to the trainable parameters. In particular, we introduce the discrete-time method of successive approximations (MSA), which is based on Pontryagin's maximum principle, for training neural networks. A rigorous error estimate for the discrete MSA is obtained, which sheds light on its dynamics and the means to stabilize the algorithm. The developed methods are applied to train, in a rather principled way, neural networks with weights that are constrained to take values in a discrete set. We obtain competitive performance and, interestingly, very sparse weights in the case of ternary networks, which may be useful for model deployment on low-memory devices. |
Tasks | |
Published | 2018-03-04 |
URL | http://arxiv.org/abs/1803.01299v2 |
http://arxiv.org/pdf/1803.01299v2.pdf | |
PWC | https://paperswithcode.com/paper/an-optimal-control-approach-to-deep-learning |
Repo | https://github.com/LiQianxiao/discrete-MSA |
Framework | tf |
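A sketch of one iteration of the discrete MSA as the abstract summarizes it; the notation is illustrative, with f_t the layer map, L_t a running cost, and Phi the terminal loss.

```latex
% One MSA iteration (illustrative notation).
\begin{align*}
  H_t(x, p, \theta) &= p \cdot f_t(x, \theta) - L_t(x, \theta)
      && \text{(layer-wise Hamiltonian)} \\
  x_{t+1} &= f_t(x_t, \theta_t), \quad x_0 \text{ given}
      && \text{(forward pass)} \\
  p_t &= \nabla_x H_t(x_t, p_{t+1}, \theta_t), \quad p_T = -\nabla \Phi(x_T)
      && \text{(backward costate pass)} \\
  \theta_t &\leftarrow \arg\max_{\theta} H_t(x_t, p_{t+1}, \theta)
      && \text{(parameter update)}
\end{align*}
```

Because the update maximizes the Hamiltonian over the parameters directly rather than following a gradient, it remains applicable when the weights are constrained to a discrete set such as ternary values.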
Challenges in High-dimensional Reinforcement Learning with Evolution Strategies
Title | Challenges in High-dimensional Reinforcement Learning with Evolution Strategies |
Authors | Nils Müller, Tobias Glasmachers |
Abstract | Evolution Strategies (ESs) have recently become popular for training deep neural networks, in particular on reinforcement learning tasks, a special form of controller design. Compared to classic problems in continuous direct search, deep networks pose extremely high-dimensional optimization problems, with many thousands or even millions of variables. In addition, many control problems give rise to a stochastic fitness function. Considering the relevance of the application, we study the suitability of evolution strategies for high-dimensional, stochastic problems. Our results give insights into which algorithmic mechanisms of modern ES are of value for the class of problems at hand, and they reveal principled limitations of the approach. They are in line with our theoretical understanding of ESs. We show that combining ESs that offer reduced internal algorithm cost with uncertainty handling techniques yields promising methods for this class of problems. |
Tasks | |
Published | 2018-06-04 |
URL | http://arxiv.org/abs/1806.01224v2 |
http://arxiv.org/pdf/1806.01224v2.pdf | |
PWC | https://paperswithcode.com/paper/challenges-in-high-dimensional-reinforcement |
Repo | https://github.com/NiMlr/High-Dim-ES-RL |
Framework | tf |
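A minimal numpy sketch of the class of algorithm studied: an isotropic evolution strategy that estimates a search gradient from perturbed evaluations of a stochastic fitness function, here a noisy quadratic standing in for an RL return. The hyperparameters are illustrative.

```python
# Isotropic ES with a search-gradient update on a high-dimensional,
# noisy objective.
import numpy as np

rng = np.random.default_rng(0)

def fitness(theta):                        # stochastic objective (to maximize)
    return -np.sum(theta ** 2) + rng.normal(scale=0.1)

dim, sigma, lr, pop = 1000, 0.1, 0.05, 50
theta = np.full(dim, 5.0)
print("initial norm:", np.linalg.norm(theta))
for _ in range(200):
    eps = rng.normal(size=(pop, dim))
    rewards = np.array([fitness(theta + sigma * e) for e in eps])
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    theta += lr / (pop * sigma) * eps.T @ rewards   # search-gradient ascent
print("final norm:", np.linalg.norm(theta))          # should have decreased
```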
Recurrent DNNs and its Ensembles on the TIMIT Phone Recognition Task
Title | Recurrent DNNs and its Ensembles on the TIMIT Phone Recognition Task |
Authors | Jan Vanek, Josef Michalek, Josef Psutka |
Abstract | In this paper, we investigate recurrent deep neural networks (DNNs) in combination with regularization techniques such as dropout, zoneout, and a regularization post-layer. As a benchmark, we chose the TIMIT phone recognition task due to its popularity and broad availability in the community. It also simulates a low-resource scenario, which is helpful for minor languages. We also prefer the phone recognition task because it is much more sensitive to acoustic model quality than a large vocabulary continuous speech recognition task. In recent years, recurrent DNNs have pushed down the error rates in automatic speech recognition, but there has been no clear winner among the proposed architectures. Dropout was used as the regularization technique in most cases, but its combination with other regularization techniques and with model ensembles has been left unexplored. In our experiments, an ensemble of recurrent DNNs performed best, achieving an average phone error rate of 14.84% (minimum 14.69%) over 10 experiments on the core test set, which is slightly lower than the best published PER to date, to our knowledge. Finally, in contrast to most papers, we publish open-source scripts to easily replicate the results and to help continue the development. |
Tasks | Large Vocabulary Continuous Speech Recognition, Speech Recognition |
Published | 2018-06-19 |
URL | http://arxiv.org/abs/1806.07186v1 |
http://arxiv.org/pdf/1806.07186v1.pdf | |
PWC | https://paperswithcode.com/paper/recurrent-dnns-and-its-ensembles-on-the-timit |
Repo | https://github.com/OrcusCZ/NNAcousticModeling |
Framework | none |
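A minimal numpy sketch of the ensembling step the abstract credits with the best result: averaging per-frame phone posteriors from several independently trained models before decoding. The decoding here is a greedy per-frame argmax; the paper's pipeline is more involved.

```python
# Average frame-level phone posteriors from an ensemble of models.
import numpy as np

def ensemble_posteriors(model_outputs):
    """model_outputs: list of (frames, phones) posterior matrices,
    one per trained model. Returns the averaged posteriors."""
    return np.mean(np.stack(model_outputs), axis=0)

rng = np.random.default_rng(0)
outputs = [rng.dirichlet(np.ones(40), size=100) for _ in range(10)]  # 10 models
avg = ensemble_posteriors(outputs)
frame_labels = avg.argmax(axis=1)          # greedy per-frame decision
```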
Smooth Loss Functions for Deep Top-k Classification
Title | Smooth Loss Functions for Deep Top-k Classification |
Authors | Leonard Berrada, Andrew Zisserman, M. Pawan Kumar |
Abstract | The top-k error is a common measure of performance in machine learning and computer vision. In practice, top-k classification is typically performed with deep neural networks trained with the cross-entropy loss. Theoretical results indeed suggest that cross-entropy is an optimal learning objective for such a task in the limit of infinite data. In the context of limited and noisy data however, the use of a loss function that is specifically designed for top-k classification can bring significant improvements. Our empirical evidence suggests that the loss function must be smooth and have non-sparse gradients in order to work well with deep neural networks. Consequently, we introduce a family of smoothed loss functions that are suited to top-k optimization via deep learning. The widely used cross-entropy is a special case of our family. Evaluating our smooth loss functions is computationally challenging: a naïve algorithm would require $\mathcal{O}(\binom{n}{k})$ operations, where n is the number of classes. Thanks to a connection to polynomial algebra and a divide-and-conquer approach, we provide an algorithm with a time complexity of $\mathcal{O}(k n)$. Furthermore, we present a novel approximation to obtain fast and stable algorithms on GPUs with single floating point precision. We compare the performance of the cross-entropy loss and our margin-based losses in various regimes of noise and data size, for the predominant use case of k=5. Our investigation reveals that our loss is more robust to noise and overfitting than cross-entropy. |
Tasks | |
Published | 2018-02-21 |
URL | http://arxiv.org/abs/1802.07595v1 |
http://arxiv.org/pdf/1802.07595v1.pdf | |
PWC | https://paperswithcode.com/paper/smooth-loss-functions-for-deep-top-k |
Repo | https://github.com/oval-group/smooth-topk |
Framework | pytorch |
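A minimal numpy sketch of the combinatorial quantity behind the abstract's complexity claim: smoothing the maximum over all k-element subsets of the scores with a log-sum-exp at temperature tau costs $\mathcal{O}(\binom{n}{k})$ when evaluated naively. This illustrates the smoothed top-k quantity only; the paper's actual loss adds margins and an $\mathcal{O}(kn)$ divide-and-conquer evaluation, neither reproduced here.

```python
# Naive O(n choose k) evaluation of a log-sum-exp-smoothed top-k sum.
from itertools import combinations
import numpy as np

def logsumexp(a):
    m = a.max()
    return m + np.log(np.exp(a - m).sum())

def smooth_topk(scores, k, tau=1.0):
    sums = np.array([scores[list(s)].sum()
                     for s in combinations(range(len(scores)), k)])
    return tau * logsumexp(sums / tau)

scores = np.array([3.0, 1.0, 0.5, 2.0, -1.0])
print(smooth_topk(scores, k=2, tau=0.1))   # ~5.0, close to the exact top-2 sum
print(smooth_topk(scores, k=2, tau=1.0))   # softer, other subsets contribute
```

As tau tends to zero the smoothed value recovers the exact top-k sum, which is what makes this family a drop-in surrogate for top-k objectives.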
Persistence Fisher Kernel: A Riemannian Manifold Kernel for Persistence Diagrams
Title | Persistence Fisher Kernel: A Riemannian Manifold Kernel for Persistence Diagrams |
Authors | Tam Le, Makoto Yamada |
Abstract | Algebraic topology methods have recently played an important role in statistical analysis of complicated, geometrically structured data such as shapes, linked twist maps, and material data. Among them, persistent homology is a well-known tool for extracting robust topological features, whose outputs are persistence diagrams (PDs). However, PDs are point multi-sets, which cannot be used directly in machine learning algorithms designed for vector data. An emerging approach to deal with this is kernel methods, for which an appropriate geometry on PDs is an important factor in measuring their similarity. A popular geometry for PDs is the Wasserstein metric. However, the Wasserstein distance is not negative definite, which limits the construction of positive definite kernels upon the Wasserstein distance without approximation. In this work, we instead rely on Fisher information geometry to propose a positive definite kernel for PDs without approximation, namely the Persistence Fisher (PF) kernel. We then analyze the eigensystem of the integral operator induced by the proposed kernel, and based on that derive generalization error bounds via covering numbers and Rademacher averages for kernel machines with the PF kernel. Additionally, we show that the proposed kernel enjoys properties such as stability and infinite divisibility. Furthermore, we also propose an approximation of our kernel with a bounded error and a time complexity linear in the number of points in the PDs. Through experiments with many different tasks on various benchmark datasets, we illustrate that the PF kernel compares favorably with other baseline kernels for PDs. |
Tasks | |
Published | 2018-02-10 |
URL | http://arxiv.org/abs/1802.03569v5 |
http://arxiv.org/pdf/1802.03569v5.pdf | |
PWC | https://paperswithcode.com/paper/persistence-fisher-kernel-a-riemannian |
Repo | https://github.com/lttam/PersistenceFisher |
Framework | none |
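A minimal numpy sketch of the kernel's core construction: smooth each diagram into a density, move to the square-root (Fisher/Hellinger) representation, take the Bhattacharyya angle as the distance, and exponentiate. The paper additionally augments each diagram with projections of the other onto the diagonal; that step is omitted here for brevity.

```python
# Persistence Fisher kernel on two toy persistence diagrams.
import numpy as np

def smoothed_density(diagram, grid, sigma=0.1):
    d = np.exp(-np.sum((grid[:, None, :] - diagram[None, :, :]) ** 2, axis=2)
               / (2 * sigma ** 2)).sum(axis=1)
    return d / d.sum()

def persistence_fisher_kernel(D1, D2, grid, t=1.0):
    r1 = np.sqrt(smoothed_density(D1, grid))   # square-root representation
    r2 = np.sqrt(smoothed_density(D2, grid))
    fisher_distance = np.arccos(np.clip(np.dot(r1, r2), -1.0, 1.0))
    return np.exp(-t * fisher_distance)

# diagrams as (birth, death) points; coarse evaluation grid on [0,1]^2
grid = np.stack(np.meshgrid(np.linspace(0, 1, 20), np.linspace(0, 1, 20)),
                axis=-1).reshape(-1, 2)
D1 = np.array([[0.1, 0.5], [0.2, 0.9]])
D2 = np.array([[0.15, 0.55], [0.2, 0.8]])
print(persistence_fisher_kernel(D1, D2, grid))
```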
The Contextual Loss for Image Transformation with Non-Aligned Data
Title | The Contextual Loss for Image Transformation with Non-Aligned Data |
Authors | Roey Mechrez, Itamar Talmi, Lihi Zelnik-Manor |
Abstract | Feed-forward CNNs trained for image transformation problems rely on loss functions that measure the similarity between the generated image and a target image. Most of the common loss functions assume that these images are spatially aligned and compare pixels at corresponding locations. However, for many tasks, aligned training pairs of images will not be available. We present an alternative loss function that does not require alignment, thus providing an effective and simple solution for a new space of problems. Our loss is based on both context and semantics – it compares regions with similar semantic meaning, while considering the context of the entire image. Hence, for example, when transferring the style of one face to another, it will translate eyes-to-eyes and mouth-to-mouth. Our code can be found at https://www.github.com/roimehrez/contextualLoss |
Tasks | |
Published | 2018-03-06 |
URL | http://arxiv.org/abs/1803.02077v4 |
http://arxiv.org/pdf/1803.02077v4.pdf | |
PWC | https://paperswithcode.com/paper/the-contextual-loss-for-image-transformation |
Repo | https://github.com/S-aiueo32/contextual_loss_pytorch |
Framework | pytorch |
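A minimal numpy sketch of the contextual similarity the abstract describes: cosine distances between the two feature sets are normalized per feature by its best match, turned into a softmax-like affinity, and the loss rewards every target feature having some strongly matching source feature, regardless of spatial position. The bandwidth h and epsilon follow common choices and are not necessarily the paper's.

```python
# Contextual loss between two sets of (unaligned) deep features.
import numpy as np

def contextual_loss(X, Y, h=0.5, eps=1e-5):
    """X: (n, d) source features, Y: (m, d) target features."""
    mu = Y.mean(axis=0)                       # center both sets by the target mean
    Xn = (X - mu) / (np.linalg.norm(X - mu, axis=1, keepdims=True) + eps)
    Yn = (Y - mu) / (np.linalg.norm(Y - mu, axis=1, keepdims=True) + eps)
    d = 1.0 - Xn @ Yn.T                       # cosine distances, shape (n, m)
    d_rel = d / (d.min(axis=1, keepdims=True) + eps)   # relative to best match
    w = np.exp((1.0 - d_rel) / h)
    cx = w / w.sum(axis=1, keepdims=True)     # contextual similarity CX_ij
    return -np.log(cx.max(axis=0).mean())     # low when Y is well covered by X

rng = np.random.default_rng(0)
print(contextual_loss(rng.normal(size=(64, 32)), rng.normal(size=(64, 32))))
```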
The Deep Weight Prior
Title | The Deep Weight Prior |
Authors | Andrei Atanov, Arsenii Ashukha, Kirill Struminsky, Dmitry Vetrov, Max Welling |
Abstract | Bayesian inference is known to provide a general framework for incorporating prior knowledge or specific properties into machine learning models via a carefully chosen prior distribution. In this work, we propose a new type of prior distribution for convolutional neural networks, the deep weight prior (DWP), which exploits generative models to encourage a specific structure in trained convolutional filters, e.g., spatial correlations of weights. We define the DWP in the form of an implicit distribution and propose a method for variational inference with this type of implicit prior. In experiments, we show that the DWP improves the performance of Bayesian neural networks when training data are limited, and that initializing weights with samples from the DWP accelerates the training of conventional convolutional neural networks. |
Tasks | Bayesian Inference |
Published | 2018-10-16 |
URL | http://arxiv.org/abs/1810.06943v6 |
http://arxiv.org/pdf/1810.06943v6.pdf | |
PWC | https://paperswithcode.com/paper/the-deep-weight-prior |
Repo | https://github.com/matyushinleonid/WeightPrior |
Framework | pytorch |
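A minimal PyTorch sketch of the initialization use case from the abstract: drawing convolutional filters from a generative decoder instead of a standard initializer. The decoder here is untrained and purely illustrative; in the paper it would be a learned generative model of filters.

```python
# Initialize conv filters from samples of a (here untrained) filter decoder.
import torch
import torch.nn as nn

# latent -> flattened 3x3 filter slice; stands in for a learned DWP decoder
decoder = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 9))
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)

with torch.no_grad():
    z = torch.randn(16 * 3, 8)                # one latent per filter slice
    filters = decoder(z).view(16, 3, 3, 3)    # pretend these are DWP samples
    conv.weight.copy_(filters)                # use as initialization
```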
Towards Machine Learning on data from Professional Cyclists
Title | Towards Machine Learning on data from Professional Cyclists |
Authors | Agrin Hilmkil, Oscar Ivarsson, Moa Johansson, Dan Kuylenstierna, Teun van Erp |
Abstract | Professional sports are developing towards increasingly scientific training methods, with increasing amounts of data being collected from laboratory tests, training sessions, and competitions. In cycling, it is standard to equip bicycles with small computers recording data from sensors such as power meters, in addition to heart rate, speed, altitude, etc. Recently, machine learning techniques have seen huge success in a wide variety of areas where large amounts of data (big data) are available. In this paper, we perform a pilot experiment on machine learning to model physical response in elite cyclists. As a first experiment, we show that it is possible to train an LSTM machine learning algorithm to predict the heart-rate response of a cyclist during a training session. This work is a promising first step towards developing more elaborate models based on big data and machine learning to capture performance aspects of athletes. |
Tasks | |
Published | 2018-08-01 |
URL | http://arxiv.org/abs/1808.00198v1 |
http://arxiv.org/pdf/1808.00198v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-machine-learning-on-data-from |
Repo | https://github.com/agrinh/procyclist_performance |
Framework | tf |
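A minimal Keras sketch of the pilot experiment described: an LSTM that maps a window of sensor readings (power, speed, altitude) to a heart-rate value. The window length, layer sizes, and synthetic data are illustrative assumptions.

```python
# LSTM regression from a window of sensor readings to heart rate.
import numpy as np
import tensorflow as tf

window, n_sensors = 30, 3                    # e.g. power, speed, altitude
model = tf.keras.Sequential([
    tf.keras.Input(shape=(window, n_sensors)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1),                # predicted heart rate (bpm)
])
model.compile(optimizer="adam", loss="mse")

x = np.random.rand(16, window, n_sensors).astype("float32")  # toy sessions
y = 120 + 40 * np.random.rand(16, 1).astype("float32")       # toy heart rates
model.fit(x, y, epochs=1, verbose=0)
```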
Bringing Cartoons to Life: Towards Improved Cartoon Face Detection and Recognition Systems
Title | Bringing Cartoons to Life: Towards Improved Cartoon Face Detection and Recognition Systems |
Authors | Saurav Jha, Nikhil Agarwal, Suneeta Agarwal |
Abstract | Given the recent deep learning advancements in face detection and recognition techniques for human faces, this paper answers the question “how well would they work for cartoons?”, a domain that remained largely unexplored until recently, mainly due to the unavailability of large-scale datasets and the failure of traditional methods on them. Our work studies and extends multiple frameworks for the aforementioned tasks. For face detection, we incorporate the Multi-task Cascaded Convolutional Network (MTCNN) architecture and contrast it with conventional methods. For face recognition, our two-fold contributions include: (i) an inductive transfer learning approach combining the feature learning capability of the Inception v3 network and the feature recognizing capability of Support Vector Machines (SVMs), (ii) a proposed Hybrid Convolutional Neural Network (HCNN) framework trained over a fusion of pixel values and 15 manually located facial keypoints. All the methods are evaluated on the Cartoon Faces in the Wild (IIIT-CFW) database. We demonstrate that the HCNN model offers stability superior to that of Inception+SVM over larger input variations, and explore the plausible architectural principles. We show that the Inception+SVM model establishes a state-of-the-art F1 score on the task of gender recognition of cartoon faces. Further, we introduce a small database hosting the location coordinates of 15 points on the cartoon faces of 50 public figures from the IIIT-CFW database. |
Tasks | Face Detection, Face Recognition, Transfer Learning |
Published | 2018-04-05 |
URL | http://arxiv.org/abs/1804.01753v2 |
http://arxiv.org/pdf/1804.01753v2.pdf | |
PWC | https://paperswithcode.com/paper/bringing-cartoons-to-life-towards-improved |
Repo | https://github.com/Saurav0074/Cartoon-Face-Detection-and-Recognition |
Framework | none |
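A minimal sketch of the Inception+SVM transfer-learning pipeline the abstract describes: frozen ImageNet features from Inception v3 feed a linear SVM. Data loading is elided; the images and labels below are random placeholders.

```python
# Frozen Inception v3 features + linear SVM for cartoon-face classification.
import numpy as np
import tensorflow as tf
from sklearn.svm import SVC

extractor = tf.keras.applications.InceptionV3(include_top=False, pooling="avg",
                                              weights="imagenet")
faces = np.random.rand(20, 299, 299, 3).astype("float32")   # placeholder images
labels = np.random.randint(0, 2, size=20)                   # placeholder genders

features = extractor.predict(
    tf.keras.applications.inception_v3.preprocess_input(faces * 255.0),
    verbose=0)
clf = SVC(kernel="linear").fit(features, labels)
print(clf.score(features, labels))
```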