Paper Group AWR 127
Doubly Stochastic Variational Inference for Deep Gaussian Processes
Title | Doubly Stochastic Variational Inference for Deep Gaussian Processes |
Authors | Hugh Salimbeni, Marc Deisenroth |
Abstract | Gaussian processes (GPs) are a good choice for function approximation as they are flexible, robust to over-fitting, and provide well-calibrated predictive uncertainty. Deep Gaussian processes (DGPs) are multi-layer generalisations of GPs, but inference in these models has proved challenging. Existing approaches to inference in DGP models assume approximate posteriors that force independence between the layers, and do not work well in practice. We present a doubly stochastic variational inference algorithm, which does not force independence between layers. With our method of inference we demonstrate that a DGP model can be used effectively on data ranging in size from hundreds to a billion points. We provide strong empirical evidence that our inference scheme for DGPs works well in practice in both classification and regression. |
Tasks | Gaussian Processes |
Published | 2017-05-24 |
URL | http://arxiv.org/abs/1705.08933v2, http://arxiv.org/pdf/1705.08933v2.pdf |
PWC | https://paperswithcode.com/paper/doubly-stochastic-variational-inference-for |
Repo | https://github.com/pyro-ppl/pyro |
Framework | pytorch |
PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison
Title | PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison |
Authors | Randal S. Olson, William La Cava, Patryk Orzechowski, Ryan J. Urbanowicz, Jason H. Moore |
Abstract | The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists. The present study introduces an accessible, curated, and developing public benchmark resource to facilitate identification of the strengths and weaknesses of different machine learning methodologies. We compare meta-features among the current set of benchmark datasets in this resource to characterize the diversity of available data. Finally, we apply a number of established machine learning methods to the entire benchmark suite and analyze how datasets and algorithms cluster in terms of performance. This work is an important first step towards understanding the limitations of popular benchmarking suites and developing a resource that connects existing benchmarking standards to more diverse and efficient standards in the future. |
Tasks | |
Published | 2017-03-01 |
URL | http://arxiv.org/abs/1703.00512v1, http://arxiv.org/pdf/1703.00512v1.pdf |
PWC | https://paperswithcode.com/paper/pmlb-a-large-benchmark-suite-for-machine |
Repo | https://github.com/EpistasisLab/penn-ml-benchmarks |
Framework | none |
Estimating the unseen from multiple populations
Title | Estimating the unseen from multiple populations |
Authors | Aditi Raghunathan, Greg Valiant, James Zou |
Abstract | Given samples from a distribution, how many new elements should we expect to find if we continue sampling this distribution? This is an important and actively studied problem, with many applications ranging from unseen species estimation to genomics. We generalize this extrapolation and related unseen estimation problems to the multiple population setting, where population $j$ has an unknown distribution $D_j$ from which we observe $n_j$ samples. We derive an optimal estimator for the total number of elements we expect to find among new samples across the populations. Surprisingly, we prove that our estimator’s accuracy is independent of the number of populations. We also develop an efficient optimization algorithm to solve the more general problem of estimating multi-population frequency distributions. We validate our methods and theory through extensive experiments. Finally, on a real dataset of human genomes across multiple ancestries, we demonstrate how our approach for unseen estimation can enable cohort designs that can discover interesting mutations with greater efficiency. |
Tasks | |
Published | 2017-07-12 |
URL | http://arxiv.org/abs/1707.03854v1, http://arxiv.org/pdf/1707.03854v1.pdf |
PWC | https://paperswithcode.com/paper/estimating-the-unseen-from-multiple |
Repo | https://github.com/siddarthhari95/unseen_estimator |
Framework | none |
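As background for the single-population question the abstract opens with, the classical Good-Toulmin estimator can be sketched as follows. This is not the multi-population estimator derived in the paper; the function name and the toy sample are illustrative assumptions. With $\Phi_i$ the number of distinct elements seen exactly $i$ times among $n$ samples, the estimated number of new elements in $t \cdot n$ further samples is $\sum_{i \ge 1} (-1)^{i+1} t^i \Phi_i$, which is only well behaved for $t \le 1$.

```python
from collections import Counter


def good_toulmin(samples, t):
    """Classical Good-Toulmin estimate of the number of new distinct
    elements expected in t * len(samples) further draws (use t <= 1)."""
    # How many times each distinct element was observed.
    counts = Counter(samples)
    # fingerprint[i] = number of distinct elements observed exactly i times.
    fingerprint = Counter(counts.values())
    # Alternating power series in t over the fingerprint.
    return sum((-1) ** (i + 1) * (t ** i) * phi
               for i, phi in fingerprint.items())


# Toy sample (hypothetical): a seen twice, b once, c three times, d once.
observed = ["a", "a", "b", "c", "c", "c", "d"]
estimate = good_toulmin(observed, t=0.5)
```

Here the fingerprint is $\Phi_1 = 2$, $\Phi_2 = 1$, $\Phi_3 = 1$, so the estimate for half as many additional samples is $0.5 \cdot 2 - 0.25 \cdot 1 + 0.125 \cdot 1 = 0.875$.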
SMPOST: Parts of Speech Tagger for Code-Mixed Indic Social Media Text
Title | SMPOST: Parts of Speech Tagger for Code-Mixed Indic Social Media Text |
Authors | Deepak Gupta, Shubham Tripathi, Asif Ekbal, Pushpak Bhattacharyya |
Abstract | Use of social media has grown dramatically during the last few years. Users follow informal languages in communicating through social media. The language of communication is often mixed in nature, where people transcribe their regional language with English and this technique is found to be extremely popular. Natural language processing (NLP) aims to infer the information from these texts where Part-of-Speech (PoS) tagging plays an important role in getting the prosody of the written text. For the task of PoS tagging on Code-Mixed Indian Social Media Text, we develop a supervised system based on Conditional Random Field classifier. In order to tackle the problem effectively, we have focused on extracting rich linguistic features. We participate in three different language pairs, i.e., English-Hindi, English-Bengali and English-Telugu on three different social media platforms, Twitter, Facebook & WhatsApp. The proposed system is able to successfully assign coarse as well as fine-grained PoS tag labels for a given code-mixed sentence. Experiments show that our system is quite generic and shows encouraging performance levels on all the three language pairs in all the domains. |
Tasks | Part-Of-Speech Tagging |
Published | 2017-02-01 |
URL | http://arxiv.org/abs/1702.00167v2, http://arxiv.org/pdf/1702.00167v2.pdf |
PWC | https://paperswithcode.com/paper/smpost-parts-of-speech-tagger-for-code-mixed |
Repo | https://github.com/stripathi08/pos_cmism |
Framework | none |
ConceptNet at SemEval-2017 Task 2: Extending Word Embeddings with Multilingual Relational Knowledge
Title | ConceptNet at SemEval-2017 Task 2: Extending Word Embeddings with Multilingual Relational Knowledge |
Authors | Robyn Speer, Joanna Lowry-Duda |
Abstract | This paper describes Luminoso’s participation in SemEval 2017 Task 2, “Multilingual and Cross-lingual Semantic Word Similarity”, with a system based on ConceptNet. ConceptNet is an open, multilingual knowledge graph that focuses on general knowledge that relates the meanings of words and phrases. Our submission to SemEval was an update of previous work that builds high-quality, multilingual word embeddings from a combination of ConceptNet and distributional semantics. Our system took first place in both subtasks. It ranked first in 4 out of 5 of the separate languages, and also ranked first in all 10 of the cross-lingual language pairs. |
Tasks | Multilingual Word Embeddings, Word Embeddings |
Published | 2017-04-11 |
URL | http://arxiv.org/abs/1704.03560v2, http://arxiv.org/pdf/1704.03560v2.pdf |
PWC | https://paperswithcode.com/paper/conceptnet-at-semeval-2017-task-2-extending |
Repo | https://github.com/commonsense/conceptnet-numberbatch |
Framework | none |
GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium
Title | GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium |
Authors | Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, Sepp Hochreiter |
Abstract | Generative Adversarial Networks (GANs) excel at creating realistic images with complex models for which maximum likelihood is infeasible. However, the convergence of GAN training has still not been proved. We propose a two time-scale update rule (TTUR) for training GANs with stochastic gradient descent on arbitrary GAN loss functions. TTUR has an individual learning rate for both the discriminator and the generator. Using the theory of stochastic approximation, we prove that the TTUR converges under mild assumptions to a stationary local Nash equilibrium. The convergence carries over to the popular Adam optimization, for which we prove that it follows the dynamics of a heavy ball with friction and thus prefers flat minima in the objective landscape. For the evaluation of the performance of GANs at image generation, we introduce the “Fréchet Inception Distance” (FID) which captures the similarity of generated images to real ones better than the Inception Score. In experiments, TTUR improves learning for DCGANs and Improved Wasserstein GANs (WGAN-GP) outperforming conventional GAN training on CelebA, CIFAR-10, SVHN, LSUN Bedrooms, and the One Billion Word Benchmark. |
Tasks | Image Generation |
Published | 2017-06-26 |
URL | http://arxiv.org/abs/1706.08500v6, http://arxiv.org/pdf/1706.08500v6.pdf |
PWC | https://paperswithcode.com/paper/gans-trained-by-a-two-time-scale-update-rule |
Repo | https://github.com/DevashishJoshi/Transferring-GANs-FYP |
Framework | tf |
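The idea of giving the discriminator and the generator their own learning rates can be illustrated on a toy problem. The sketch below is not a GAN and not the authors' code: it assumes a scalar game $f(g, d) = g d - \tfrac{1}{2} d^2$, which the "discriminator" $d$ maximizes and the "generator" $g$ minimizes, and the two step sizes are hypothetical values chosen only to show a faster and a slower time scale.

```python
def ttur_toy(lr_d=0.1, lr_g=0.01, steps=5000):
    """Two time-scale gradient updates on the toy game
    f(g, d) = g * d - 0.5 * d**2, which d maximizes and g minimizes."""
    g, d = 1.0, 1.0  # arbitrary starting point
    for _ in range(steps):
        grad_d = g - d   # df/dd
        grad_g = d       # df/dg
        # Simultaneous updates, each player with its own learning rate:
        # gradient ascent for d, gradient descent for g.
        d, g = d + lr_d * grad_d, g - lr_g * grad_g
    return g, d


g_final, d_final = ttur_toy()
```

On this toy game both variables approach the stationary point $(0, 0)$; the example says nothing about convergence for real GAN losses, which is what the paper's analysis addresses.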
Vision-Based Multi-Task Manipulation for Inexpensive Robots Using End-To-End Learning from Demonstration
Title | Vision-Based Multi-Task Manipulation for Inexpensive Robots Using End-To-End Learning from Demonstration |
Authors | Rouhollah Rahmatizadeh, Pooya Abolghasemi, Ladislau Bölöni, Sergey Levine |
Abstract | We propose a technique for multi-task learning from demonstration that trains the controller of a low-cost robotic arm to accomplish several complex picking and placing tasks, as well as non-prehensile manipulation. The controller is a recurrent neural network using raw images as input and generating robot arm trajectories, with the parameters shared across the tasks. The controller also combines VAE-GAN-based reconstruction with autoregressive multimodal action prediction. Our results demonstrate that it is possible to learn complex manipulation tasks, such as picking up a towel, wiping an object, and depositing the towel to its previous position, entirely from raw images with direct behavior cloning. We show that weight sharing and reconstruction-based regularization substantially improve generalization and robustness, and training on multiple tasks simultaneously increases the success rate on all tasks. |
Tasks | Multi-Task Learning |
Published | 2017-07-10 |
URL | http://arxiv.org/abs/1707.02920v2, http://arxiv.org/pdf/1707.02920v2.pdf |
PWC | https://paperswithcode.com/paper/vision-based-multi-task-manipulation-for |
Repo | https://github.com/rrahmati/roboinstruct-2 |
Framework | none |
DeepFM: A Factorization-Machine based Neural Network for CTR Prediction
Title | DeepFM: A Factorization-Machine based Neural Network for CTR Prediction |
Authors | Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, Xiuqiang He |
Abstract | Learning sophisticated feature interactions behind user behaviors is critical in maximizing CTR for recommender systems. Despite great progress, existing methods seem to have a strong bias towards low- or high-order interactions, or require expert feature engineering. In this paper, we show that it is possible to derive an end-to-end learning model that emphasizes both low- and high-order feature interactions. The proposed model, DeepFM, combines the power of factorization machines for recommendation and deep learning for feature learning in a new neural network architecture. Compared to the latest Wide & Deep model from Google, DeepFM has a shared input to its “wide” and “deep” parts, with no need for feature engineering besides raw features. Comprehensive experiments are conducted to demonstrate the effectiveness and efficiency of DeepFM over the existing models for CTR prediction, on both benchmark data and commercial data. |
Tasks | Click-Through Rate Prediction, Feature Engineering, Recommendation Systems |
Published | 2017-03-13 |
URL | http://arxiv.org/abs/1703.04247v1, http://arxiv.org/pdf/1703.04247v1.pdf |
PWC | https://paperswithcode.com/paper/deepfm-a-factorization-machine-based-neural |
Repo | https://github.com/Leavingseason/OpenLearning4DeepRecsys |
Framework | tf |
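The low-order interactions mentioned in the abstract come from the factorization-machine side of the model. As a minimal sketch (not the DeepFM implementation, and with hypothetical inputs), the pairwise term of a standard factorization machine, $\sum_{i<j} \langle v_i, v_j \rangle x_i x_j$, can be computed in linear time using the identity $\tfrac{1}{2} \sum_f \big[ (\sum_i v_{if} x_i)^2 - \sum_i v_{if}^2 x_i^2 \big]$:

```python
from itertools import combinations


def fm_pairwise(x, v):
    """Second-order FM term sum_{i<j} <v_i, v_j> x_i x_j in O(n * k)."""
    k = len(v[0])
    total = 0.0
    for f in range(k):
        # Sum of v_if * x_i, and sum of its squares, for latent factor f.
        s = sum(v[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))
        total += 0.5 * (s * s - s_sq)
    return total


def fm_pairwise_naive(x, v):
    """Same quantity by explicit enumeration of feature pairs."""
    return sum(
        sum(a * b for a, b in zip(v[i], v[j])) * x[i] * x[j]
        for i, j in combinations(range(len(x)), 2)
    )


# Hypothetical input: 3 features, embedding size 2.
x = [1.0, 0.0, 2.0]
v = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
```

For this input only the pair of features 0 and 2 is active, so both functions give $(0.1 \cdot 0.5 + 0.2 \cdot 0.6) \cdot 1 \cdot 2 = 0.34$.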
Supervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization
Title | Supervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization |
Authors | Nasser Mohammadiha, Paris Smaragdis, Arne Leijon |
Abstract | Reducing the interference noise in a monaural noisy speech signal has been a challenging task for many years. Compared to traditional unsupervised speech enhancement methods, e.g., Wiener filtering, supervised approaches, such as algorithms based on hidden Markov models (HMM), lead to higher-quality enhanced speech signals. However, the main practical difficulty of these approaches is that for each noise type a model is required to be trained a priori. In this paper, we investigate a new class of supervised speech denoising algorithms using nonnegative matrix factorization (NMF). We propose a novel speech enhancement method that is based on a Bayesian formulation of NMF (BNMF). To circumvent the mismatch problem between the training and testing stages, we propose two solutions. First, we use an HMM in combination with BNMF (BNMF-HMM) to derive a minimum mean square error (MMSE) estimator for the speech signal with no information about the underlying noise type. Second, we suggest a scheme to learn the required noise BNMF model online, which is then used to develop an unsupervised speech enhancement system. Extensive experiments are carried out to investigate the performance of the proposed methods under different conditions. Moreover, we compare the performance of the developed algorithms with state-of-the-art speech enhancement schemes using various objective measures. Our simulations show that the proposed BNMF-based methods outperform the competing algorithms substantially. |
Tasks | Denoising, Speech Enhancement |
Published | 2017-09-15 |
URL | http://arxiv.org/abs/1709.05362v1, http://arxiv.org/pdf/1709.05362v1.pdf |
PWC | https://paperswithcode.com/paper/supervised-and-unsupervised-speech |
Repo | https://github.com/mohammadiha/bnmf |
Framework | none |
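For readers unfamiliar with NMF itself, the sketch below shows plain NMF with the well-known multiplicative updates for the Euclidean objective, factoring a nonnegative matrix $V \approx W H$. It is not the Bayesian formulation (BNMF) or the BNMF-HMM estimator proposed in the paper; the matrix sizes, the rank, and the random data standing in for a magnitude spectrogram are all assumptions.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf_step(v, w, h, eps=1e-12):
    """One round of multiplicative updates for the Euclidean objective."""
    wt = transpose(w)
    # H <- H * (W^T V) / (W^T W H), elementwise.
    num, den = matmul(wt, v), matmul(matmul(wt, w), h)
    h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(len(h[0]))]
         for i in range(len(h))]
    ht = transpose(h)
    # W <- W * (V H^T) / (W H H^T), elementwise.
    num, den = matmul(v, ht), matmul(w, matmul(h, ht))
    w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(len(w[0]))]
         for i in range(len(w))]
    return w, h


rng = random.Random(0)
# Hypothetical 4x5 nonnegative "spectrogram" and a rank-2 factorization.
v = [[rng.random() for _ in range(5)] for _ in range(4)]
w = [[rng.random() for _ in range(2)] for _ in range(4)]
h = [[rng.random() for _ in range(5)] for _ in range(2)]
error_before = frobenius_error(v, w, h)
for _ in range(50):
    w, h = nmf_step(v, w, h)
error_after = frobenius_error(v, w, h)
```

Because the updates only multiply by nonnegative ratios, $W$ and $H$ stay nonnegative without any explicit projection step.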
Machine learning modeling of superconducting critical temperature
Title | Machine learning modeling of superconducting critical temperature |
Authors | Valentin Stanev, Corey Oses, A. Gilad Kusne, Efrain Rodriguez, Johnpierre Paglione, Stefano Curtarolo, Ichiro Takeuchi |
Abstract | Superconductivity has been the focus of enormous research effort since its discovery more than a century ago. Yet, some features of this unique phenomenon remain poorly understood; prime among these is the connection between superconductivity and chemical/structural properties of materials. To bridge the gap, several machine learning schemes are developed herein to model the critical temperatures ($T_{\mathrm{c}}$) of the 12,000+ known superconductors available via the SuperCon database. Materials are first divided into two classes based on their $T_{\mathrm{c}}$ values, above and below 10 K, and a classification model predicting this label is trained. The model uses coarse-grained features based only on the chemical compositions. It shows strong predictive power, with out-of-sample accuracy of about 92%. Separate regression models are developed to predict the values of $T_{\mathrm{c}}$ for cuprate, iron-based, and “low-$T_{\mathrm{c}}$” compounds. These models also demonstrate good performance, with learned predictors offering potential insights into the mechanisms behind superconductivity in different families of materials. To improve the accuracy and interpretability of these models, new features are incorporated using materials data from the AFLOW Online Repositories. Finally, the classification and regression models are combined into a single integrated pipeline and employed to search the entire Inorganic Crystallographic Structure Database (ICSD) for potential new superconductors. We identify more than 30 non-cuprate and non-iron-based oxides as candidate materials. |
Tasks | |
Published | 2017-09-08 |
URL | http://arxiv.org/abs/1709.02727v2, http://arxiv.org/pdf/1709.02727v2.pdf |
PWC | https://paperswithcode.com/paper/machine-learning-modeling-of-superconducting |
Repo | https://github.com/robertvici/Predicting-the-Critical-Temperature-of-a-Superconductor |
Framework | none |
Easy over Hard: A Case Study on Deep Learning
Title | Easy over Hard: A Case Study on Deep Learning |
Authors | Wei Fu, Tim Menzies |
Abstract | While deep learning is an exciting new technique, the benefits of this method need to be assessed with respect to its computational cost. This is particularly important for deep learning since these learners need hours (to weeks) to train the model. Such long training time limits the ability of (a) a researcher to test the stability of their conclusion via repeated runs with different random seeds; and (b) other researchers to repeat, improve, or even refute that original work. For example, recently, deep learning was used to find which questions in the Stack Overflow programmer discussion forum can be linked together. That deep learning system took 14 hours to execute. We show here that by applying a very simple optimizer called DE to fine-tune an SVM, one can achieve similar (and sometimes better) results. The DE approach terminated in 10 minutes, i.e., 84 times faster than the deep learning method. We offer these results as a cautionary tale to the software analytics community and suggest that not every new innovation should be applied without critical analysis. If researchers deploy some new and expensive process, that work should be baselined against some simpler and faster alternatives. |
Tasks | |
Published | 2017-03-01 |
URL | http://arxiv.org/abs/1703.00133v2, http://arxiv.org/pdf/1703.00133v2.pdf |
PWC | https://paperswithcode.com/paper/easy-over-hard-a-case-study-on-deep-learning |
Repo | https://github.com/WeiFoo/EasyOverHard |
Framework | none |
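The kind of optimizer the abstract calls DE (differential evolution) can be sketched in a few lines. The abstract does not say which DE variant or settings were used, so the DE/rand/1/bin scheme, the population size, and the `f` and `cr` values below are assumptions; the quadratic `toy_validation_error` is a hypothetical stand-in for the validation error of an SVM as a function of two hyperparameters.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=200, seed=0):
    """DE/rand/1/bin minimizer over a box given as [(low, high), ...]."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(p) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct members other than the current one.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one mutated coordinate
            trial = []
            for k, (lo, hi) in enumerate(bounds):
                if k == forced or rng.random() < cr:
                    value = pop[a][k] + f * (pop[b][k] - pop[c][k])
                    value = min(max(value, lo), hi)  # clip to the box
                else:
                    value = pop[i][k]
                trial.append(value)
            trial_score = objective(trial)
            # Greedy selection: keep the trial only if it is no worse.
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def toy_validation_error(params):
    """Stand-in for an SVM's validation error; minimum 0 at (1.0, -2.0)."""
    c, gamma = params
    return (c - 1.0) ** 2 + (gamma + 2.0) ** 2


best_params, best_score = differential_evolution(
    toy_validation_error, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
```

In a real tuning run the objective would train and score a model for each candidate, so the cost is dominated by `pop_size * generations` model fits rather than by the optimizer itself.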
Measuring the tendency of CNNs to Learn Surface Statistical Regularities
Title | Measuring the tendency of CNNs to Learn Surface Statistical Regularities |
Authors | Jason Jo, Yoshua Bengio |
Abstract | Deep CNNs are known to exhibit the following peculiarity: on the one hand they generalize extremely well to a test set, while on the other hand they are extremely sensitive to so-called adversarial perturbations. The extreme sensitivity of high performance CNNs to adversarial examples casts serious doubt that these networks are learning high level abstractions in the dataset. We are concerned with the following question: How can a deep CNN that does not learn any high level semantics of the dataset manage to generalize so well? The goal of this article is to measure the tendency of CNNs to learn surface statistical regularities of the dataset. To this end, we use Fourier filtering to construct datasets which share the exact same high level abstractions but exhibit qualitatively different surface statistical regularities. For the SVHN and CIFAR-10 datasets, we present two Fourier filtered variants: a low frequency variant and a randomly filtered variant. Each of the Fourier filtering schemes is tuned to preserve the recognizability of the objects. Our main finding is that CNNs exhibit a tendency to latch onto the Fourier image statistics of the training dataset, sometimes exhibiting up to a 28% generalization gap across the various test sets. Moreover, we observe that significantly increasing the depth of a network has a very marginal impact on closing the aforementioned generalization gap. Thus we provide quantitative evidence supporting the hypothesis that deep CNNs tend to learn surface statistical regularities in the dataset rather than higher-level abstract concepts. |
Tasks | |
Published | 2017-11-30 |
URL | http://arxiv.org/abs/1711.11561v1, http://arxiv.org/pdf/1711.11561v1.pdf |
PWC | https://paperswithcode.com/paper/measuring-the-tendency-of-cnns-to-learn |
Repo | https://github.com/dtak/local-independence-public |
Framework | tf |
VPGNet: Vanishing Point Guided Network for Lane and Road Marking Detection and Recognition
Title | VPGNet: Vanishing Point Guided Network for Lane and Road Marking Detection and Recognition |
Authors | Seokju Lee, Junsik Kim, Jae Shin Yoon, Seunghak Shin, Oleksandr Bailo, Namil Kim, Tae-Hee Lee, Hyun Seok Hong, Seung-Hoon Han, In So Kweon |
Abstract | In this paper, we propose a unified end-to-end trainable multi-task network that jointly handles lane and road marking detection and recognition that is guided by a vanishing point under adverse weather conditions. We tackle rainy and low illumination conditions, which have not been extensively studied until now due to clear challenges. For example, images taken on rainy days are subject to low illumination, while wet roads cause light reflection and distort the appearance of lane and road markings. At night, color distortion occurs under limited illumination. As a result, no benchmark dataset exists and only a few developed algorithms work under poor weather conditions. To address this shortcoming, we build up a lane and road marking benchmark which consists of about 20,000 images with 17 lane and road marking classes under four different scenarios: no rain, rain, heavy rain, and night. We train and evaluate several versions of the proposed multi-task network and validate the importance of each task. The resulting approach, VPGNet, can detect and classify lanes and road markings, and predict a vanishing point with a single forward pass. Experimental results show that our approach achieves high accuracy and robustness under various conditions in real-time (20 fps). The benchmark and the VPGNet model will be publicly available. |
Tasks | Lane Detection |
Published | 2017-10-17 |
URL | http://arxiv.org/abs/1710.06288v1, http://arxiv.org/pdf/1710.06288v1.pdf |
PWC | https://paperswithcode.com/paper/vpgnet-vanishing-point-guided-network-for |
Repo | https://github.com/cciprianmihai/Self_Driving_Car_NanoDegree_P2_AdvancedLaneLines |
Framework | none |
Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models
Title | Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models |
Authors | Gabriel Lima Guimaraes, Benjamin Sanchez-Lengeling, Carlos Outeiral, Pedro Luis Cunha Farias, Alán Aspuru-Guzik |
Abstract | In unsupervised data generation tasks, besides the generation of a sample based on previous observations, one would often like to give hints to the model in order to bias the generation towards desirable metrics. We propose a method that combines Generative Adversarial Networks (GANs) and reinforcement learning (RL) in order to accomplish exactly that. While RL biases the data generation process towards arbitrary metrics, the GAN component of the reward function ensures that the model still remembers information learned from data. We build upon previous results that incorporated GANs and RL in order to generate sequence data and test this model in several settings for the generation of molecules encoded as text sequences (SMILES) and in the context of music generation, showing for each case that we can effectively bias the generation process towards desired metrics. |
Tasks | Music Generation |
Published | 2017-05-30 |
URL | http://arxiv.org/abs/1705.10843v3, http://arxiv.org/pdf/1705.10843v3.pdf |
PWC | https://paperswithcode.com/paper/objective-reinforced-generative-adversarial |
Repo | https://github.com/gablg1/ORGAN |
Framework | tf |
Unsupervised Neural Machine Translation
Title | Unsupervised Neural Machine Translation |
Authors | Mikel Artetxe, Gorka Labaka, Eneko Agirre, Kyunghyun Cho |
Abstract | In spite of the recent success of neural machine translation (NMT) in standard benchmarks, the lack of large parallel corpora poses a major practical problem for many language pairs. There have been several proposals to alleviate this issue with, for instance, triangulation and semi-supervised learning techniques, but they still require a strong cross-lingual signal. In this work, we completely remove the need for parallel data and propose a novel method to train an NMT system in a completely unsupervised manner, relying on nothing but monolingual corpora. Our model builds upon the recent work on unsupervised embedding mappings, and consists of a slightly modified attentional encoder-decoder model that can be trained on monolingual corpora alone using a combination of denoising and backtranslation. Despite the simplicity of the approach, our system obtains 15.56 and 10.21 BLEU points in WMT 2014 French-to-English and German-to-English translation. The model can also profit from small parallel corpora, and attains 21.81 and 15.24 points when combined with 100,000 parallel sentences, respectively. Our implementation is released as an open source project. |
Tasks | Machine Translation, Unsupervised Machine Translation |
Published | 2017-10-30 |
URL | http://arxiv.org/abs/1710.11041v2, http://arxiv.org/pdf/1710.11041v2.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-neural-machine-translation |
Repo | https://github.com/artetxem/undreamt |
Framework | pytorch |