July 29, 2019

Paper Group AWR 127

Doubly Stochastic Variational Inference for Deep Gaussian Processes. PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison. Estimating the unseen from multiple populations. SMPOST: Parts of Speech Tagger for Code-Mixed Indic Social Media Text. ConceptNet at SemEval-2017 Task 2: Extending Word Embeddings with Multilingual Rela …

Doubly Stochastic Variational Inference for Deep Gaussian Processes


Title	Doubly Stochastic Variational Inference for Deep Gaussian Processes
Authors	Hugh Salimbeni, Marc Deisenroth
Abstract	Gaussian processes (GPs) are a good choice for function approximation as they are flexible, robust to over-fitting, and provide well-calibrated predictive uncertainty. Deep Gaussian processes (DGPs) are multi-layer generalisations of GPs, but inference in these models has proved challenging. Existing approaches to inference in DGP models assume approximate posteriors that force independence between the layers, and do not work well in practice. We present a doubly stochastic variational inference algorithm, which does not force independence between layers. With our method of inference we demonstrate that a DGP model can be used effectively on data ranging in size from hundreds to a billion points. We provide strong empirical evidence that our inference scheme for DGPs works well in practice in both classification and regression.
Tasks	Gaussian Processes
Published	2017-05-24
URL	http://arxiv.org/abs/1705.08933v2
PDF	http://arxiv.org/pdf/1705.08933v2.pdf
PWC	https://paperswithcode.com/paper/doubly-stochastic-variational-inference-for
Repo	https://github.com/pyro-ppl/pyro
Framework	pytorch
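"Doubly stochastic" refers to two sources of sampling in the bound: minibatches of data and Monte Carlo samples pushed through the layers. The sketch below is a minimal illustration of the second source. It assumes each layer's approximate posterior marginal is a Gaussian given by a mean and variance function. In the paper these come from sparse inducing-point GPs; here they are hypothetical toy placeholders. Each sample is drawn with the reparameterization trick and fed into the next layer.

```python
import math
import random
from typing import Callable, List, Tuple

# Hypothetical per-layer marginal: maps an input value to (mean, variance).
# In the paper these come from sparse variational GPs; here they are toys.
Layer = Callable[[float], Tuple[float, float]]


def sample_through_layers(x: float, layers: List[Layer], rng: random.Random) -> List[float]:
    """Draw one sample f_1, ..., f_L by reparameterization: f = m + sqrt(v) * eps."""
    samples = []
    h = x
    for layer in layers:
        mean, var = layer(h)
        eps = rng.gauss(0.0, 1.0)          # standard normal noise
        h = mean + math.sqrt(var) * eps    # differentiable w.r.t. mean and var
        samples.append(h)
    return samples


def mc_predictive_mean(x: float, layers: List[Layer], n_samples: int, seed: int = 0) -> float:
    """Monte Carlo estimate of E[f_L(x)] by averaging final-layer samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += sample_through_layers(x, layers, rng)[-1]
    return total / n_samples


# Toy layers (hypothetical): a smooth warping followed by a linear map.
toy_layers: List[Layer] = [
    lambda h: (math.sin(h), 0.01),
    lambda h: (2.0 * h, 0.01),
]

print(mc_predictive_mean(0.5, toy_layers, n_samples=2000))
```

Each sample only needs the marginal at its own input, so the propagation is per data point. This is what makes minibatching over a very large dataset possible.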

PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison


Title	PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison
Authors	Randal S. Olson, William La Cava, Patryk Orzechowski, Ryan J. Urbanowicz, Jason H. Moore
Abstract	The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists. The present study introduces an accessible, curated, and developing public benchmark resource to facilitate identification of the strengths and weaknesses of different machine learning methodologies. We compare meta-features among the current set of benchmark datasets in this resource to characterize the diversity of available data. Finally, we apply a number of established machine learning methods to the entire benchmark suite and analyze how datasets and algorithms cluster in terms of performance. This work is an important first step towards understanding the limitations of popular benchmarking suites and developing a resource that connects existing benchmarking standards to more diverse and efficient standards in the future.
Tasks
Published	2017-03-01
URL	http://arxiv.org/abs/1703.00512v1
PDF	http://arxiv.org/pdf/1703.00512v1.pdf
PWC	https://paperswithcode.com/paper/pmlb-a-large-benchmark-suite-for-machine
Repo	https://github.com/EpistasisLab/penn-ml-benchmarks
Framework	none
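The abstract mentions comparing meta-features across the benchmark datasets. The sketch below is minimal and assumes a dataset arrives as a feature matrix plus a list of class labels. It computes a few simple meta-features of the kind such comparisons use. The feature set and the function name are illustrative only, not PMLB's own.

```python
from collections import Counter
from typing import Dict, Sequence


def simple_meta_features(X: Sequence[Sequence[float]], y: Sequence[object]) -> Dict[str, float]:
    """Illustrative meta-features for a classification dataset (hypothetical set)."""
    n_instances = len(X)
    n_features = len(X[0]) if n_instances else 0
    counts = Counter(y)
    n_classes = len(counts)
    # Imbalance: largest class count divided by smallest class count.
    imbalance = max(counts.values()) / min(counts.values()) if counts else 0.0
    # Number of features that take exactly two distinct values.
    binary = sum(1 for j in range(n_features) if len({row[j] for row in X}) == 2)
    return {
        "n_instances": float(n_instances),
        "n_features": float(n_features),
        "n_classes": float(n_classes),
        "class_imbalance": imbalance,
        "frac_binary_features": binary / n_features if n_features else 0.0,
    }


X = [[0, 1.5], [1, 2.0], [0, 3.1], [1, 0.2]]
y = ["a", "a", "a", "b"]
print(simple_meta_features(X, y))
```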

Estimating the unseen from multiple populations


Title	Estimating the unseen from multiple populations
Authors	Aditi Raghunathan, Greg Valiant, James Zou
Abstract	Given samples from a distribution, how many new elements should we expect to find if we continue sampling this distribution? This is an important and actively studied problem, with many applications ranging from unseen species estimation to genomics. We generalize this extrapolation and related unseen estimation problems to the multiple population setting, where population $j$ has an unknown distribution $D_j$ from which we observe $n_j$ samples. We derive an optimal estimator for the total number of elements we expect to find among new samples across the populations. Surprisingly, we prove that our estimator’s accuracy is independent of the number of populations. We also develop an efficient optimization algorithm to solve the more general problem of estimating multi-population frequency distributions. We validate our methods and theory through extensive experiments. Finally, on a real dataset of human genomes across multiple ancestries, we demonstrate how our approach for unseen estimation can enable cohort designs that can discover interesting mutations with greater efficiency.
Tasks
Published	2017-07-12
URL	http://arxiv.org/abs/1707.03854v1
PDF	http://arxiv.org/pdf/1707.03854v1.pdf
PWC	https://paperswithcode.com/paper/estimating-the-unseen-from-multiple
Repo	https://github.com/siddarthhari95/unseen_estimator
Framework	none
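For reference in the single-population case, the classical Good–Toulmin estimator predicts how many new elements will appear in t·n further samples. It works from the fingerprint Φ_i, the number of elements seen exactly i times. This is a standard baseline, not the paper's multi-population estimator.

```python
from collections import Counter
from typing import Hashable, Iterable


def fingerprint(samples: Iterable[Hashable]) -> Counter:
    """Phi[i] = number of distinct elements observed exactly i times."""
    counts = Counter(samples)
    return Counter(counts.values())


def good_toulmin(samples: Iterable[Hashable], t: float) -> float:
    """Classical Good-Toulmin estimate of new elements in t*n additional samples.

    U(t) = -sum_i (-t)^i * Phi_i. Reliable for t <= 1; its variance explodes for
    t > 1, which is why smoothed estimators are used for longer extrapolation.
    """
    phi = fingerprint(samples)
    return -sum(((-t) ** i) * phi_i for i, phi_i in phi.items())


data = ["a", "b", "b", "c", "c", "c", "d"]
print(good_toulmin(data, t=1.0))  # Phi_1 = 2, Phi_2 = 1, Phi_3 = 1 -> 2 - 1 + 1 = 2.0
```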


SMPOST: Parts of Speech Tagger for Code-Mixed Indic Social Media Text


Title	SMPOST: Parts of Speech Tagger for Code-Mixed Indic Social Media Text
Authors	Deepak Gupta, Shubham Tripathi, Asif Ekbal, Pushpak Bhattacharyya
Abstract	Use of social media has grown dramatically during the last few years. Users communicate on social media in informal language. The language of communication is often mixed: people transcribe their regional language together with English, and this practice has become extremely popular. Natural language processing (NLP) aims to infer information from such text, and Part-of-Speech (PoS) tagging plays an important role in capturing the prosody of the written text. For the task of PoS tagging on Code-Mixed Indian Social Media Text, we develop a supervised system based on a Conditional Random Field classifier. To tackle the problem effectively, we focus on extracting rich linguistic features. We participate in three language pairs, i.e. English-Hindi, English-Bengali and English-Telugu, on three social media platforms: Twitter, Facebook and WhatsApp. The proposed system successfully assigns coarse as well as fine-grained PoS tag labels to a given code-mixed sentence. Experiments show that our system is quite generic and achieves encouraging performance on all three language pairs in all domains.
Tasks	Part-Of-Speech Tagging
Published	2017-02-01
URL	http://arxiv.org/abs/1702.00167v2
PDF	http://arxiv.org/pdf/1702.00167v2.pdf
PWC	https://paperswithcode.com/paper/smpost-parts-of-speech-tagger-for-code-mixed
Repo	https://github.com/stripathi08/pos_cmism
Framework	none
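The abstract credits the rich linguistic features for the CRF's performance but does not list them. The sketch below is a minimal per-token feature extractor for code-mixed social media text. It produces one dict per token, the form CRF toolkits such as sklearn-crfsuite accept. The specific features are assumptions for illustration, not the authors' feature set: affixes, casing, hashtags, mentions, URLs, character elongation and neighbouring words.

```python
import re
from typing import Dict, List, Union

Feature = Union[str, bool]


def token_features(tokens: List[str], i: int) -> Dict[str, Feature]:
    """Hypothetical feature set for token i of a code-mixed sentence."""
    w = tokens[i]
    feats: Dict[str, Feature] = {
        "lower": w.lower(),
        "prefix3": w[:3].lower(),
        "suffix3": w[-3:].lower(),
        "is_title": w.istitle(),
        "is_upper": w.isupper(),
        "is_digit": w.isdigit(),
        "is_hashtag": w.startswith("#"),
        "is_mention": w.startswith("@"),
        "is_url": bool(re.match(r"https?://", w)),
        "has_repeat": bool(re.search(r"(.)\1\1", w)),  # elongation, e.g. "sooo"
    }
    # Context window of one token on each side.
    feats["prev_lower"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    feats["next_lower"] = tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>"
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, Feature]]:
    """One feature dict per token, ready to pair with a list of PoS labels."""
    return [token_features(tokens, i) for i in range(len(tokens))]


sent = ["@friend", "aaj", "movie", "dekhne", "chalo", "#fun"]
print(sentence_features(sent)[0]["is_mention"])
```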

ConceptNet at SemEval-2017 Task 2: Extending Word Embeddings with Multilingual Relational Knowledge


Title	ConceptNet at SemEval-2017 Task 2: Extending Word Embeddings with Multilingual Relational Knowledge
Authors	Robyn Speer, Joanna Lowry-Duda
Abstract	This paper describes Luminoso’s participation in SemEval 2017 Task 2, “Multilingual and Cross-lingual Semantic Word Similarity”, with a system based on ConceptNet. ConceptNet is an open, multilingual knowledge graph that focuses on general knowledge that relates the meanings of words and phrases. Our submission to SemEval was an update of previous work that builds high-quality, multilingual word embeddings from a combination of ConceptNet and distributional semantics. Our system took first place in both subtasks. It ranked first in 4 out of 5 of the separate languages, and also ranked first in all 10 of the cross-lingual language pairs.
Tasks	Multilingual Word Embeddings, Word Embeddings
Published	2017-04-11
URL	http://arxiv.org/abs/1704.03560v2
PDF	http://arxiv.org/pdf/1704.03560v2.pdf
PWC	https://paperswithcode.com/paper/conceptnet-at-semeval-2017-task-2-extending
Repo	https://github.com/commonsense/conceptnet-numberbatch
Framework	none
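The abstract says the embeddings combine ConceptNet with distributional semantics but does not give the procedure. Retrofitting (Faruqui et al., 2015) is one well-known way to pull pretrained vectors toward the edges of a knowledge graph. The sketch below is the plain textbook update with α = 1 and β = 1/degree, applied to toy vectors and a hypothetical graph. It is not Luminoso's pipeline.

```python
import math
from typing import Dict, List

Vector = List[float]


def retrofit(vectors: Dict[str, Vector], graph: Dict[str, List[str]],
             alpha: float = 1.0, iterations: int = 10) -> Dict[str, Vector]:
    """Pull each vector toward its graph neighbours while keeping it close to
    its original (distributional) value; beta_ij = 1 / degree(i)."""
    original = {w: list(v) for w, v in vectors.items()}
    new = {w: list(v) for w, v in vectors.items()}
    for _ in range(iterations):
        for word, neighbours in graph.items():
            nbrs = [n for n in neighbours if n in new]
            if word not in new or not nbrs:
                continue
            beta = 1.0 / len(nbrs)
            updated = []
            for d in range(len(new[word])):
                total = alpha * original[word][d] + sum(beta * new[n][d] for n in nbrs)
                updated.append(total / (alpha + beta * len(nbrs)))
            new[word] = updated
    return new


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


vecs = {"cat": [1.0, 0.0], "gato": [0.0, 1.0], "dog": [0.9, 0.1]}
edges = {"cat": ["gato"], "gato": ["cat"]}  # hypothetical cross-lingual synonym edge
out = retrofit(vecs, edges)
print(round(cosine(vecs["cat"], vecs["gato"]), 3), round(cosine(out["cat"], out["gato"]), 3))
```

Words with no graph edges, like "dog" here, keep their distributional vectors unchanged. Linked words across languages move toward each other, which is the property the cross-lingual similarity subtask rewards.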

GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium


Title	GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium
Authors	Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, Sepp Hochreiter
Abstract	Generative Adversarial Networks (GANs) excel at creating realistic images with complex models for which maximum likelihood is infeasible. However, the convergence of GAN training has still not been proved. We propose a two time-scale update rule (TTUR) for training GANs with stochastic gradient descent on arbitrary GAN loss functions. TTUR has an individual learning rate for both the discriminator and the generator. Using the theory of stochastic approximation, we prove that the TTUR converges under mild assumptions to a stationary local Nash equilibrium. The convergence carries over to the popular Adam optimization, for which we prove that it follows the dynamics of a heavy ball with friction and thus prefers flat minima in the objective landscape. For the evaluation of the performance of GANs at image generation, we introduce the “Fréchet Inception Distance” (FID) which captures the similarity of generated images to real ones better than the Inception Score. In experiments, TTUR improves learning for DCGANs and Improved Wasserstein GANs (WGAN-GP) outperforming conventional GAN training on CelebA, CIFAR-10, SVHN, LSUN Bedrooms, and the One Billion Word Benchmark.
Tasks	Image Generation
Published	2017-06-26
URL	http://arxiv.org/abs/1706.08500v6
PDF	http://arxiv.org/pdf/1706.08500v6.pdf
PWC	https://paperswithcode.com/paper/gans-trained-by-a-two-time-scale-update-rule
Repo	https://github.com/DevashishJoshi/Transferring-GANs-FYP
Framework	tf
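FID fits Gaussians to network activations of real and generated images, then takes the Fréchet distance between them: ‖μ₁ − μ₂‖² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^{1/2}). The sketch below is a minimal computation from feature statistics. It assumes the features are already extracted as (n_samples, dim) arrays; the Inception network itself is omitted. The trace of the matrix square root comes from the eigenvalues of √Σ₁ Σ₂ √Σ₁, which shares its eigenvalues with Σ₁Σ₂. This avoids a general non-symmetric matrix square root.

```python
import numpy as np


def frechet_distance(mu1, sigma1, mu2, sigma2) -> float:
    """||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^{1/2})."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    sigma1, sigma2 = np.asarray(sigma1, float), np.asarray(sigma2, float)

    # Symmetric square root of sigma1 via its eigendecomposition.
    w, v = np.linalg.eigh(sigma1)
    sqrt_s1 = (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

    # sqrt(s1) @ s2 @ sqrt(s1) is symmetric PSD and has the eigenvalues of s1 @ s2.
    m = sqrt_s1 @ sigma2 @ sqrt_s1
    eig = np.linalg.eigvalsh((m + m.T) / 2.0)  # symmetrize against round-off
    tr_covmean = np.sum(np.sqrt(np.clip(eig, 0.0, None)))

    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)


def fid_from_features(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    """FID from two (n_samples, dim) activation arrays."""
    return frechet_distance(feats_real.mean(axis=0), np.cov(feats_real, rowvar=False),
                            feats_fake.mean(axis=0), np.cov(feats_fake, rowvar=False))


# 1-D check: (1 - 3)^2 + 4 + 1 - 2 * sqrt(4 * 1) = 5
print(frechet_distance([1.0], [[4.0]], [3.0], [[1.0]]))
```

Lower is better, and identical statistics give zero.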

Vision-Based Multi-Task Manipulation for Inexpensive Robots Using End-To-End Learning from Demonstration


Title	Vision-Based Multi-Task Manipulation for Inexpensive Robots Using End-To-End Learning from Demonstration
Authors	Rouhollah Rahmatizadeh, Pooya Abolghasemi, Ladislau Bölöni, Sergey Levine
Abstract	We propose a technique for multi-task learning from demonstration that trains the controller of a low-cost robotic arm to accomplish several complex picking and placing tasks, as well as non-prehensile manipulation. The controller is a recurrent neural network using raw images as input and generating robot arm trajectories, with the parameters shared across the tasks. The controller also combines VAE-GAN-based reconstruction with autoregressive multimodal action prediction. Our results demonstrate that it is possible to learn complex manipulation tasks, such as picking up a towel, wiping an object, and depositing the towel to its previous position, entirely from raw images with direct behavior cloning. We show that weight sharing and reconstruction-based regularization substantially improve generalization and robustness, and training on multiple tasks simultaneously increases the success rate on all tasks.
Tasks	Multi-Task Learning
Published	2017-07-10
URL	http://arxiv.org/abs/1707.02920v2
PDF	http://arxiv.org/pdf/1707.02920v2.pdf
PWC	https://paperswithcode.com/paper/vision-based-multi-task-manipulation-for
Repo	https://github.com/rrahmati/roboinstruct-2
Framework	none

DeepFM: A Factorization-Machine based Neural Network for CTR Prediction


Title	DeepFM: A Factorization-Machine based Neural Network for CTR Prediction
Authors	Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, Xiuqiang He
Abstract	Learning sophisticated feature interactions behind user behaviors is critical in maximizing CTR for recommender systems. Despite great progress, existing methods seem to have a strong bias towards low- or high-order interactions, or require expert feature engineering. In this paper, we show that it is possible to derive an end-to-end learning model that emphasizes both low- and high-order feature interactions. The proposed model, DeepFM, combines the power of factorization machines for recommendation and deep learning for feature learning in a new neural network architecture. Compared to the latest Wide & Deep model from Google, DeepFM has a shared input to its “wide” and “deep” parts, with no need of feature engineering besides raw features. Comprehensive experiments are conducted to demonstrate the effectiveness and efficiency of DeepFM over the existing models for CTR prediction, on both benchmark data and commercial data.
Tasks	Click-Through Rate Prediction, Feature Engineering, Recommendation Systems
Published	2017-03-13
URL	http://arxiv.org/abs/1703.04247v1
PDF	http://arxiv.org/pdf/1703.04247v1.pdf
PWC	https://paperswithcode.com/paper/deepfm-a-factorization-machine-based-neural
Repo	https://github.com/Leavingseason/OpenLearning4DeepRecsys
Framework	tf
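The "wide" side of DeepFM is a factorization machine, which captures low-order interactions. Its score is w₀ + Σᵢ wᵢxᵢ + Σ_{i<j} ⟨Vᵢ, Vⱼ⟩xᵢxⱼ. The pairwise term can be computed in O(nk) instead of O(n²k) with a well-known algebraic identity. The sketch below shows that FM component alone, with random toy parameters, and checks it against the naive double loop.

```python
import numpy as np


def fm_score(x: np.ndarray, w0: float, w: np.ndarray, V: np.ndarray) -> float:
    """w0 + sum_i w_i x_i + sum_{i<j} <V_i, V_j> x_i x_j, with the pairwise term as
    0.5 * sum_f [(sum_i V_if x_i)^2 - sum_i V_if^2 x_i^2]   (O(n k))."""
    linear = w0 + float(w @ x)
    xv = x @ V                      # shape (k,): sum_i V_if x_i
    x2v2 = (x ** 2) @ (V ** 2)      # shape (k,): sum_i V_if^2 x_i^2
    pairwise = 0.5 * float(np.sum(xv ** 2 - x2v2))
    return linear + pairwise


def fm_score_naive(x: np.ndarray, w0: float, w: np.ndarray, V: np.ndarray) -> float:
    """O(n^2 k) reference implementation of the same score."""
    total = w0 + float(w @ x)
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            total += float(V[i] @ V[j]) * x[i] * x[j]
    return total


rng = np.random.default_rng(0)
n_features, k = 6, 3
x = rng.random(n_features)
w0, w, V = 0.1, rng.normal(size=n_features), rng.normal(size=(n_features, k))
print(np.isclose(fm_score(x, w0, w, V), fm_score_naive(x, w0, w, V)))
```

In DeepFM, the rows of V double as the feature embeddings fed to the deep component. The final prediction is sigmoid(y_FM + y_DNN). That wiring is omitted here.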

Supervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization


Title	Supervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization
Authors	Nasser Mohammadiha, Paris Smaragdis, Arne Leijon
Abstract	Reducing the interference noise in a monaural noisy speech signal has been a challenging task for many years. Compared to traditional unsupervised speech enhancement methods, e.g., Wiener filtering, supervised approaches, such as algorithms based on hidden Markov models (HMM), lead to higher-quality enhanced speech signals. However, the main practical difficulty of these approaches is that for each noise type a model is required to be trained a priori. In this paper, we investigate a new class of supervised speech denoising algorithms using nonnegative matrix factorization (NMF). We propose a novel speech enhancement method that is based on a Bayesian formulation of NMF (BNMF). To circumvent the mismatch problem between the training and testing stages, we propose two solutions. First, we use an HMM in combination with BNMF (BNMF-HMM) to derive a minimum mean square error (MMSE) estimator for the speech signal with no information about the underlying noise type. Second, we suggest a scheme to learn the required noise BNMF model online, which is then used to develop an unsupervised speech enhancement system. Extensive experiments are carried out to investigate the performance of the proposed methods under different conditions. Moreover, we compare the performance of the developed algorithms with state-of-the-art speech enhancement schemes using various objective measures. Our simulations show that the proposed BNMF-based methods outperform the competing algorithms substantially.
Tasks	Denoising, Speech Enhancement
Published	2017-09-15
URL	http://arxiv.org/abs/1709.05362v1
PDF	http://arxiv.org/pdf/1709.05362v1.pdf
PWC	https://paperswithcode.com/paper/supervised-and-unsupervised-speech
Repo	https://github.com/mohammadiha/bnmf
Framework	none
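The supervised setting described here starts from a speech dictionary and a noise dictionary learned in advance. The noisy spectrogram is explained with both, and the speech part is kept. The sketch below is plain (non-Bayesian) NMF with the standard KL multiplicative update for the activations and a Wiener-like mask. It is a simpler stand-in, not the paper's BNMF or BNMF-HMM. It assumes V is a nonnegative magnitude spectrogram (frequency × frames) and that W_speech and W_noise were trained beforehand.

```python
import numpy as np


def nmf_activations(V, W, n_iter=200, eps=1e-12, seed=0):
    """Learn H >= 0 with W fixed, minimizing KL(V || W H) via the multiplicative
    update H <- H * (W^T (V / WH)) / (W^T 1)."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    ones = np.ones_like(V)
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ ones + eps)
    return H


def enhance(V_noisy, W_speech, W_noise, n_iter=200):
    """Fit activations on the concatenated speech+noise dictionary, then apply
    a Wiener-like mask speech / (speech + noise) to the noisy spectrogram."""
    W = np.hstack([W_speech, W_noise])
    H = nmf_activations(V_noisy, W, n_iter)
    k = W_speech.shape[1]
    speech_part = W_speech @ H[:k]
    noise_part = W_noise @ H[k:]
    mask = speech_part / (speech_part + noise_part + 1e-12)
    return mask * V_noisy
```

The paper's unsupervised variant instead learns the noise dictionary online. In this sketch W_noise would then be updated during enhancement rather than held fixed.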

Machine learning modeling of superconducting critical temperature


Title	Machine learning modeling of superconducting critical temperature
Authors	Valentin Stanev, Corey Oses, A. Gilad Kusne, Efrain Rodriguez, Johnpierre Paglione, Stefano Curtarolo, Ichiro Takeuchi
Abstract	Superconductivity has been the focus of enormous research effort since its discovery more than a century ago. Yet, some features of this unique phenomenon remain poorly understood; prime among these is the connection between superconductivity and chemical/structural properties of materials. To bridge the gap, several machine learning schemes are developed herein to model the critical temperatures ($T_{\mathrm{c}}$) of the 12,000+ known superconductors available via the SuperCon database. Materials are first divided into two classes based on their $T_{\mathrm{c}}$ values, above and below 10 K, and a classification model predicting this label is trained. The model uses coarse-grained features based only on the chemical compositions. It shows strong predictive power, with out-of-sample accuracy of about 92%. Separate regression models are developed to predict the values of $T_{\mathrm{c}}$ for cuprate, iron-based, and “low-$T_{\mathrm{c}}$” compounds. These models also demonstrate good performance, with learned predictors offering potential insights into the mechanisms behind superconductivity in different families of materials. To improve the accuracy and interpretability of these models, new features are incorporated using materials data from the AFLOW Online Repositories. Finally, the classification and regression models are combined into a single integrated pipeline and employed to search the entire Inorganic Crystal Structure Database (ICSD) for potential new superconductors. We identify more than 30 non-cuprate and non-iron-based oxides as candidate materials.
Tasks
Published	2017-09-08
URL	http://arxiv.org/abs/1709.02727v2
PDF	http://arxiv.org/pdf/1709.02727v2.pdf
PWC	https://paperswithcode.com/paper/machine-learning-modeling-of-superconducting
Repo	https://github.com/robertvici/Predicting-the-Critical-Temperature-of-a-Superconductor
Framework	none
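The classification stage uses features derived only from chemical composition, with a 10 K threshold on T_c. The sketch below covers only the first step: parsing a flat formula into atomic fractions, from which composition-weighted descriptors could be derived, plus the binary label. The paper's actual descriptors are not reproduced. Whether exactly 10 K counts as "above" is an assumption here.

```python
import re
from collections import defaultdict
from typing import Dict

# Element symbol followed by an optional (possibly fractional) count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def composition_fractions(formula: str) -> Dict[str, float]:
    """Parse a flat formula such as 'YBa2Cu3O7' or 'La1.85Sr0.15CuO4' into
    element -> atomic fraction. Parentheses and hydrates are not handled."""
    amounts: Dict[str, float] = defaultdict(float)
    for element, count in _TOKEN.findall(formula):
        amounts[element] += float(count) if count else 1.0
    total = sum(amounts.values())
    return {el: amt / total for el, amt in amounts.items()}


def tc_class(tc_kelvin: float, threshold: float = 10.0) -> int:
    """Binary label for the classification stage: 1 if Tc > threshold (K)."""
    return int(tc_kelvin > threshold)


print(composition_fractions("MgB2"))
print(tc_class(39.0), tc_class(4.2))
```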

Easy over Hard: A Case Study on Deep Learning


Title	Easy over Hard: A Case Study on Deep Learning
Authors	Wei Fu, Tim Menzies
Abstract	While deep learning is an exciting new technique, the benefits of this method need to be assessed with respect to its computational cost. This is particularly important for deep learning since these learners need hours (to weeks) to train the model. Such long training time limits the ability of (a) a researcher to test the stability of their conclusion via repeated runs with different random seeds; and (b) other researchers to repeat, improve, or even refute that original work. For example, recently, deep learning was used to find which questions in the Stack Overflow programmer discussion forum can be linked together. That deep learning system took 14 hours to execute. We show here that by applying a very simple optimizer called differential evolution (DE) to fine-tune an SVM, we can achieve similar (and sometimes better) results. The DE approach terminated in 10 minutes, i.e. 84 times faster than the deep learning method. We offer these results as a cautionary tale to the software analytics community and suggest that not every new innovation should be applied without critical analysis. If researchers deploy some new and expensive process, that work should be baselined against some simpler and faster alternatives.
Tasks
Published	2017-03-01
URL	http://arxiv.org/abs/1703.00133v2
PDF	http://arxiv.org/pdf/1703.00133v2.pdf
PWC	https://paperswithcode.com/paper/easy-over-hard-a-case-study-on-deep-learning
Repo	https://github.com/WeiFoo/EasyOverHard
Framework	none
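Differential evolution keeps a population of candidate parameter vectors. It proposes trials by adding a scaled difference of two members to a third, mixes coordinates by crossover, and keeps a trial only if it scores at least as well. The sketch below is the classic DE/rand/1/bin scheme. The objective is a hypothetical stand-in for "validation error of an SVM at these hyperparameters", and the defaults for f, cr and population size are illustrative. SciPy also ships a ready-made `scipy.optimize.differential_evolution`.

```python
import random
from typing import Callable, List, Sequence, Tuple


def differential_evolution(objective: Callable[[List[float]], float],
                           bounds: Sequence[Tuple[float, float]],
                           pop_size: int = 20, f: float = 0.75, cr: float = 0.3,
                           generations: int = 50, seed: int = 0) -> Tuple[List[float], float]:
    """Minimise `objective` inside a box with classic DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(p) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated coordinate
            trial = []
            for d in range(dim):
                if rng.random() < cr or d == j_rand:
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])
                else:
                    v = pop[i][d]
                lo, hi = bounds[d]
                trial.append(min(max(v, lo), hi))  # clip to the search box
            s = objective(trial)
            if s <= scores[i]:  # greedy one-to-one replacement
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


# Hypothetical stand-in for SVM validation error over (log C, log gamma).
toy_error = lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2
print(differential_evolution(toy_error, [(-5.0, 5.0), (-5.0, 5.0)]))
```

Each generation costs only pop_size objective evaluations. With an SVM, each evaluation is one train-and-validate run, which keeps the whole search cheap compared with training a deep network.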

Measuring the tendency of CNNs to Learn Surface Statistical Regularities


Title	Measuring the tendency of CNNs to Learn Surface Statistical Regularities
Authors	Jason Jo, Yoshua Bengio
Abstract	Deep CNNs are known to exhibit the following peculiarity: on the one hand they generalize extremely well to a test set, while on the other hand they are extremely sensitive to so-called adversarial perturbations. The extreme sensitivity of high performance CNNs to adversarial examples casts serious doubt that these networks are learning high level abstractions in the dataset. We are concerned with the following question: How can a deep CNN that does not learn any high level semantics of the dataset manage to generalize so well? The goal of this article is to measure the tendency of CNNs to learn surface statistical regularities of the dataset. To this end, we use Fourier filtering to construct datasets which share the exact same high level abstractions but exhibit qualitatively different surface statistical regularities. For the SVHN and CIFAR-10 datasets, we present two Fourier filtered variants: a low frequency variant and a randomly filtered variant. Each of the Fourier filtering schemes is tuned to preserve the recognizability of the objects. Our main finding is that CNNs exhibit a tendency to latch onto the Fourier image statistics of the training dataset, sometimes exhibiting up to a 28% generalization gap across the various test sets. Moreover, we observe that significantly increasing the depth of a network has a very marginal impact on closing the aforementioned generalization gap. Thus we provide quantitative evidence supporting the hypothesis that deep CNNs tend to learn surface statistical regularities in the dataset rather than higher-level abstract concepts.
Tasks
Published	2017-11-30
URL	http://arxiv.org/abs/1711.11561v1
PDF	http://arxiv.org/pdf/1711.11561v1.pdf
PWC	https://paperswithcode.com/paper/measuring-the-tendency-of-cnns-to-learn
Repo	https://github.com/dtak/local-independence-public
Framework	tf
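The datasets in this study keep the objects recognizable while changing their Fourier statistics. The sketch below shows one such transformation, a low-pass filter that zeroes all spatial frequencies farther than a given radius from the origin, applied per channel. The circular mask and the radius values are assumptions for illustration. The paper's exact filters, and its randomly filtered variant, are not reproduced.

```python
import numpy as np


def low_pass_filter(image: np.ndarray, radius: float) -> np.ndarray:
    """Keep only spatial frequencies within `radius` (frequency-index units) of
    zero, per channel. Accepts (H, W) or (H, W, C) arrays."""
    img = np.asarray(image, dtype=float)
    if img.ndim == 2:
        img = img[..., None]
    h, w = img.shape[:2]
    fy = np.fft.fftfreq(h) * h        # integer frequency indices around 0
    fx = np.fft.fftfreq(w) * w
    dist = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    mask = (dist <= radius).astype(float)
    out = np.empty_like(img)
    for c in range(img.shape[2]):
        spectrum = np.fft.fft2(img[..., c])
        out[..., c] = np.real(np.fft.ifft2(spectrum * mask))
    return out[..., 0] if np.ndim(image) == 2 else out
```

A radius of 0 keeps only the mean intensity, and a radius larger than the spectrum returns the image unchanged. Intermediate radii trade fine texture against recognizability.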

VPGNet: Vanishing Point Guided Network for Lane and Road Marking Detection and Recognition


Title	VPGNet: Vanishing Point Guided Network for Lane and Road Marking Detection and Recognition
Authors	Seokju Lee, Junsik Kim, Jae Shin Yoon, Seunghak Shin, Oleksandr Bailo, Namil Kim, Tae-Hee Lee, Hyun Seok Hong, Seung-Hoon Han, In So Kweon
Abstract	In this paper, we propose a unified end-to-end trainable multi-task network that jointly handles lane and road marking detection and recognition that is guided by a vanishing point under adverse weather conditions. We tackle rainy and low-illumination conditions, which have not been extensively studied until now because of the challenges they pose. For example, images taken on rainy days are subject to low illumination, while wet roads cause light reflection and distort the appearance of lane and road markings. At night, color distortion occurs under limited illumination. As a result, no benchmark dataset exists and only a few developed algorithms work under poor weather conditions. To address this shortcoming, we build a lane and road marking benchmark which consists of about 20,000 images with 17 lane and road marking classes under four different scenarios: no rain, rain, heavy rain, and night. We train and evaluate several versions of the proposed multi-task network and validate the importance of each task. The resulting approach, VPGNet, can detect and classify lanes and road markings, and predict a vanishing point with a single forward pass. Experimental results show that our approach achieves high accuracy and robustness under various conditions in real-time (20 fps). The benchmark and the VPGNet model will be publicly available.
Tasks	Lane Detection
Published	2017-10-17
URL	http://arxiv.org/abs/1710.06288v1
PDF	http://arxiv.org/pdf/1710.06288v1.pdf
PWC	https://paperswithcode.com/paper/vpgnet-vanishing-point-guided-network-for
Repo	https://github.com/cciprianmihai/Self_Driving_Car_NanoDegree_P2_AdvancedLaneLines
Framework	none

Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models


Title	Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models
Authors	Gabriel Lima Guimaraes, Benjamin Sanchez-Lengeling, Carlos Outeiral, Pedro Luis Cunha Farias, Alán Aspuru-Guzik
Abstract	In unsupervised data generation tasks, besides the generation of a sample based on previous observations, one would often like to give hints to the model in order to bias the generation towards desirable metrics. We propose a method that combines Generative Adversarial Networks (GANs) and reinforcement learning (RL) in order to accomplish exactly that. While RL biases the data generation process towards arbitrary metrics, the GAN component of the reward function ensures that the model still remembers information learned from data. We build upon previous results that incorporated GANs and RL in order to generate sequence data and test this model in several settings for the generation of molecules encoded as text sequences (SMILES) and in the context of music generation, showing for each case that we can effectively bias the generation process towards desired metrics.
Tasks	Music Generation
Published	2017-05-30
URL	http://arxiv.org/abs/1705.10843v3
PDF	http://arxiv.org/pdf/1705.10843v3.pdf
PWC	https://paperswithcode.com/paper/objective-reinforced-generative-adversarial
Repo	https://github.com/gablg1/ORGAN
Framework	tf
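The abstract describes a reward in which a GAN term keeps samples realistic while an arbitrary metric biases generation. The sketch below is a minimal version under stated assumptions. It reads the reward as a convex mixture λ·D(Y) + (1 - λ)·O(Y) of discriminator score and objective; check the paper for the exact form and for the Monte Carlo rollouts it uses on partial sequences. The "generator" is a toy categorical distribution over whole strings, trained with REINFORCE. D and O are hypothetical functions.

```python
import math
import random
from typing import Callable, Dict


def mixed_reward(seq: str, discriminator: Callable[[str], float],
                 objective: Callable[[str], float], lam: float) -> float:
    """Blend realism (discriminator) with a task metric; lam in [0, 1]."""
    return lam * discriminator(seq) + (1.0 - lam) * objective(seq)


def reinforce_step(logits: Dict[str, float], reward_fn: Callable[[str], float],
                   lr: float, rng: random.Random, n_samples: int = 32) -> Dict[str, float]:
    """One REINFORCE update for a categorical generator over whole sequences,
    using the batch-mean reward as a baseline."""
    keys = list(logits)
    z = max(logits.values())
    weights = [math.exp(logits[k] - z) for k in keys]   # stable softmax
    probs = [wt / sum(weights) for wt in weights]
    samples = rng.choices(keys, weights=probs, k=n_samples)
    rewards = [reward_fn(s) for s in samples]
    baseline = sum(rewards) / n_samples
    grad = {k: 0.0 for k in keys}
    for s, r in zip(samples, rewards):
        adv = r - baseline
        for k, p in zip(keys, probs):
            # d log p(s) / d logit_k = 1[k == s] - p_k
            grad[k] += adv * ((1.0 if k == s else 0.0) - p)
    return {k: logits[k] + lr * grad[k] / n_samples for k in keys}


# Hypothetical toy: SMILES-like strings, objective favours longer chains.
rng = random.Random(0)
logits = {"CCO": 0.0, "CCCCCC": 0.0, "CO": 0.0}
disc = lambda s: 0.5                 # uninformative discriminator
obj = lambda s: len(s) / 6.0
for _ in range(300):
    logits = reinforce_step(logits, lambda s: mixed_reward(s, disc, obj, 0.0), 0.5, rng)
print(max(logits, key=logits.get))
```

With λ = 0 the generator simply chases the objective. Raising λ weights the discriminator's realism score more heavily, which is how the method keeps information learned from data.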

Unsupervised Neural Machine Translation


Title	Unsupervised Neural Machine Translation
Authors	Mikel Artetxe, Gorka Labaka, Eneko Agirre, Kyunghyun Cho
Abstract	In spite of the recent success of neural machine translation (NMT) in standard benchmarks, the lack of large parallel corpora poses a major practical problem for many language pairs. There have been several proposals to alleviate this issue with, for instance, triangulation and semi-supervised learning techniques, but they still require a strong cross-lingual signal. In this work, we completely remove the need for parallel data and propose a novel method to train an NMT system in a completely unsupervised manner, relying on nothing but monolingual corpora. Our model builds upon the recent work on unsupervised embedding mappings, and consists of a slightly modified attentional encoder-decoder model that can be trained on monolingual corpora alone using a combination of denoising and backtranslation. Despite the simplicity of the approach, our system obtains 15.56 and 10.21 BLEU points in WMT 2014 French-to-English and German-to-English translation. The model can also profit from small parallel corpora, and attains 21.81 and 15.24 points when combined with 100,000 parallel sentences, respectively. Our implementation is released as an open source project.
Tasks	Machine Translation, Unsupervised Machine Translation
Published	2017-10-30
URL	http://arxiv.org/abs/1710.11041v2
PDF	http://arxiv.org/pdf/1710.11041v2.pdf
PWC	https://paperswithcode.com/paper/unsupervised-neural-machine-translation
Repo	https://github.com/artetxem/undreamt
Framework	pytorch
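The abstract names two training signals built from monolingual text only: denoising and backtranslation. The sketch below shows how one example of each could be formed. It corrupts word order with random swaps of adjacent words; the count of N // 2 swaps for an N-token sentence is an assumption. It pairs a synthetic translation with the real sentence for backtranslation. `translate_l2_to_l1` is a hypothetical callable standing in for the current model's reverse direction.

```python
import random
from typing import Callable, List, Tuple

Sentence = List[str]


def add_swap_noise(tokens: Sentence, rng: random.Random) -> Sentence:
    """Corrupt word order with len(tokens) // 2 random swaps of adjacent words."""
    noisy = list(tokens)
    n = len(noisy)
    if n < 2:
        return noisy
    for _ in range(n // 2):
        i = rng.randrange(n - 1)
        noisy[i], noisy[i + 1] = noisy[i + 1], noisy[i]
    return noisy


def training_pairs(l1_sentence: Sentence, l2_sentence: Sentence,
                   translate_l2_to_l1: Callable[[Sentence], Sentence],
                   rng: random.Random) -> Tuple[Tuple[Sentence, Sentence], Tuple[Sentence, Sentence]]:
    """Hypothetical helper returning (input, target) pairs:
    - denoising (L1 -> L1): restore the clean sentence from its noisy copy;
    - backtranslation (L1 -> L2): map a synthetic L1 translation of a real L2
      sentence back to that real L2 sentence."""
    denoising = (add_swap_noise(l1_sentence, rng), l1_sentence)
    backtranslation = (translate_l2_to_l1(l2_sentence), l2_sentence)
    return denoising, backtranslation


rng = random.Random(0)
print(add_swap_noise("the cat sat on the mat".split(), rng))
```

Swapping only adjacent words keeps the corrupted sentence close to the original. The denoising task therefore pushes the model to model word order rather than just copy its input.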