Paper Group AWR 168
Improved Variational Autoencoders for Text Modeling using Dilated Convolutions
Title | Improved Variational Autoencoders for Text Modeling using Dilated Convolutions |
Authors | Zichao Yang, Zhiting Hu, Ruslan Salakhutdinov, Taylor Berg-Kirkpatrick |
Abstract | Recent work on generative modeling of text has found that variational auto-encoders (VAE) incorporating LSTM decoders perform worse than simpler LSTM language models (Bowman et al., 2015). This negative result is so far poorly understood, but has been attributed to the propensity of LSTM decoders to ignore conditioning information from the encoder. In this paper, we experiment with a new type of decoder for VAE: a dilated CNN. By changing the decoder’s dilation architecture, we control the effective context from previously generated words. In experiments, we find that there is a trade-off between the contextual capacity of the decoder and the amount of encoding information used. We show that with the right decoder, VAE can outperform LSTM language models. We demonstrate perplexity gains on two datasets, representing the first positive experimental result on the use of VAE for generative modeling of text. Further, we conduct an in-depth investigation of the use of VAE (with our new decoding architecture) for semi-supervised and unsupervised labeling tasks, demonstrating gains over several strong baselines. |
Tasks | Text Generation |
Published | 2017-02-27 |
URL | http://arxiv.org/abs/1702.08139v2 |
http://arxiv.org/pdf/1702.08139v2.pdf | |
PWC | https://paperswithcode.com/paper/improved-variational-autoencoders-for-text |
Repo | https://github.com/ryokamoi/dcnn_textvae |
Framework | tf |
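The decoder’s dilation schedule is the knob that bounds how many previously generated words it can condition on. Below is a minimal sketch of a causal, dilated 1-D convolutional stack in PyTorch; the layer width, kernel size, and dilation schedule are illustrative assumptions, not the configuration used in the paper.

```python
import torch
import torch.nn as nn

class CausalDilatedConv1d(nn.Module):
    """1-D convolution padded on the left so position t only sees positions <= t."""
    def __init__(self, channels, kernel_size, dilation):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                                # x: (batch, channels, time)
        return torch.relu(self.conv(nn.functional.pad(x, (self.pad, 0))))

# Doubling dilations grow the receptive field exponentially; truncating the
# schedule shrinks the effective context available to the decoder.
dilations = [1, 2, 4, 8]
decoder = nn.Sequential(*[CausalDilatedConv1d(128, kernel_size=3, dilation=d)
                          for d in dilations])

receptive_field = 1 + sum((3 - 1) * d for d in dilations)
print(receptive_field)   # 31: each output position sees itself plus 30 previous tokens
```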
Learning with Confident Examples: Rank Pruning for Robust Classification with Noisy Labels
Title | Learning with Confident Examples: Rank Pruning for Robust Classification with Noisy Labels |
Authors | Curtis G. Northcutt, Tailin Wu, Isaac L. Chuang |
Abstract | Noisy PN learning is the problem of binary classification when training examples may be mislabeled (flipped) uniformly with noise rate rho1 for positive examples and rho0 for negative examples. We propose Rank Pruning (RP) to solve noisy PN learning and the open problem of estimating the noise rates, i.e. the fraction of wrong positive and negative labels. Unlike prior solutions, RP is time-efficient and general, requiring O(T) for any unrestricted choice of probabilistic classifier with T fitting time. We prove RP has consistent noise estimation and equivalent expected risk as learning with uncorrupted labels in ideal conditions, and derive closed-form solutions when conditions are non-ideal. RP achieves state-of-the-art noise estimation and F1, error, and AUC-PR for both MNIST and CIFAR datasets, regardless of the amount of noise, and performs similarly impressively when a large portion of training examples are noise drawn from a third distribution. To highlight, RP with a CNN classifier can predict if an MNIST digit is a “one” or “not” with only 0.25% error, and 0.46% error across all digits, even when 50% of positive examples are mislabeled and 50% of observed positive labels are mislabeled negative examples. |
Tasks | |
Published | 2017-05-04 |
URL | http://arxiv.org/abs/1705.01936v3 |
http://arxiv.org/pdf/1705.01936v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-with-confident-examples-rank-pruning |
Repo | https://github.com/cgnorthcutt/cleanlab |
Framework | pytorch |
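A rough sketch of the confident-examples idea behind Rank Pruning, using scikit-learn: obtain out-of-sample probabilities, estimate how many labels in each class look inconsistent, prune the least confident examples, and refit. The threshold and noise-rate estimates below are simplified stand-ins for the paper’s estimators, not its exact procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def rank_pruning_fit(X, s, clf=None, cv=3):
    """s: observed (possibly noisy) binary labels.
    1) out-of-sample probabilities, 2) confident thresholds,
    3) prune the least consistent examples in each class, 4) refit."""
    clf = clf or LogisticRegression(max_iter=1000)
    p = cross_val_predict(clf, X, s, cv=cv, method="predict_proba")[:, 1]

    # Confident thresholds: mean predicted probability within each observed class.
    lb = p[s == 1].mean()            # typical score of examples labelled positive
    ub = p[s == 0].mean()            # typical score of examples labelled negative

    # Crude noise-rate estimates from inconsistent examples, then prune that
    # fraction of the lowest-ranked examples in each observed class.
    rho1 = np.mean(p[s == 1] < ub)   # labelled positive but scored like negatives
    rho0 = np.mean(p[s == 0] > lb)   # labelled negative but scored like positives

    keep = np.ones(len(s), dtype=bool)
    pos, neg = np.where(s == 1)[0], np.where(s == 0)[0]
    keep[pos[np.argsort(p[pos])[:int(rho1 * len(pos))]]] = False   # worst positives
    keep[neg[np.argsort(-p[neg])[:int(rho0 * len(neg))]]] = False  # worst negatives

    clf.fit(X[keep], s[keep])
    return clf
```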
Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning
Title | Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning |
Authors | Felipe Petroski Such, Vashisht Madhavan, Edoardo Conti, Joel Lehman, Kenneth O. Stanley, Jeff Clune |
Abstract | Deep artificial neural networks (DNNs) are typically trained via gradient-based learning algorithms, namely backpropagation. Evolution strategies (ES) can rival backprop-based algorithms such as Q-learning and policy gradients on challenging deep reinforcement learning (RL) problems. However, ES can be considered a gradient-based algorithm because it performs stochastic gradient descent via an operation similar to a finite-difference approximation of the gradient. That raises the question of whether non-gradient-based evolutionary algorithms can work at DNN scales. Here we demonstrate they can: we evolve the weights of a DNN with a simple, gradient-free, population-based genetic algorithm (GA) and it performs well on hard deep RL problems, including Atari and humanoid locomotion. The Deep GA successfully evolves networks with over four million free parameters, the largest neural networks ever evolved with a traditional evolutionary algorithm. These results (1) expand our sense of the scale at which GAs can operate, (2) suggest intriguingly that in some cases following the gradient is not the best choice for optimizing performance, and (3) make immediately available the multitude of neuroevolution techniques that improve performance. We demonstrate the latter by showing that combining DNNs with novelty search, which encourages exploration on tasks with deceptive or sparse reward functions, can solve a high-dimensional problem on which reward-maximizing algorithms (e.g. DQN, A3C, ES, and the GA) fail. Additionally, the Deep GA is faster than ES, A3C, and DQN (it can train Atari in ~4 hours on one desktop or ~1 hour distributed on 720 cores), and enables a state-of-the-art, up to 10,000-fold compact encoding technique. |
Tasks | Q-Learning |
Published | 2017-12-18 |
URL | http://arxiv.org/abs/1712.06567v3 |
http://arxiv.org/pdf/1712.06567v3.pdf | |
PWC | https://paperswithcode.com/paper/deep-neuroevolution-genetic-algorithms-are-a |
Repo | https://github.com/kevin5naug/summer_project |
Framework | pytorch |
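A toy sketch of the two ideas the abstract highlights: a truncation-selection GA over DNN weights and the compact encoding in which a genome is just a list of random seeds, so the weight vector is rebuilt by replaying the initial draw plus every mutation. Population size, mutation power, and parameter count below are made-up toy values, not the paper’s settings.

```python
import numpy as np

N_PARAMS, SIGMA, POP, TRUNC = 10_000, 0.002, 20, 5   # toy values; the paper evolves >4M parameters

def decode(seeds):
    """Compact encoding: reconstruct the weight vector from a list of seeds."""
    rng = np.random.default_rng(seeds[0])
    theta = rng.standard_normal(N_PARAMS)             # initialization seed
    for s in seeds[1:]:                                # one additive mutation per seed
        theta += SIGMA * np.random.default_rng(s).standard_normal(N_PARAMS)
    return theta

def evolve(fitness_fn, generations=10):
    population = [[np.random.randint(2**31)] for _ in range(POP)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda g: fitness_fn(decode(g)), reverse=True)
        elite, parents = ranked[0], ranked[:TRUNC]
        # Children copy a surviving parent's seed list and append one mutation seed.
        population = [elite] + [parents[np.random.randint(TRUNC)] + [np.random.randint(2**31)]
                                for _ in range(POP - 1)]
    return decode(population[0])

best = evolve(lambda theta: -np.sum(theta ** 2))      # toy fitness: shrink the weight norm
```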
cvpaper.challenge in 2016: Futuristic Computer Vision through 1,600 Papers Survey
Title | cvpaper.challenge in 2016: Futuristic Computer Vision through 1,600 Papers Survey |
Authors | Hirokatsu Kataoka, Soma Shirakabe, Yun He, Shunya Ueta, Teppei Suzuki, Kaori Abe, Asako Kanezaki, Shin’ichiro Morita, Toshiyuki Yabe, Yoshihiro Kanehara, Hiroya Yatsuyanagi, Shinya Maruyama, Ryosuke Takasawa, Masataka Fuchida, Yudai Miyashita, Kazushige Okayasu, Yuta Matsuzaki |
Abstract | The paper presents the futuristic challenges discussed in the cvpaper.challenge. In 2015 and 2016, we thoroughly studied 1,600+ papers in several conferences/journals such as CVPR/ICCV/ECCV/NIPS/PAMI/IJCV. |
Tasks | |
Published | 2017-07-20 |
URL | http://arxiv.org/abs/1707.06436v1 |
http://arxiv.org/pdf/1707.06436v1.pdf | |
PWC | https://paperswithcode.com/paper/cvpaperchallenge-in-2016-futuristic-computer |
Repo | https://github.com/hurutoriya/hurutoriya.github.io |
Framework | none |
Train Once, Test Anywhere: Zero-Shot Learning for Text Classification
Title | Train Once, Test Anywhere: Zero-Shot Learning for Text Classification |
Authors | Pushpankar Kumar Pushp, Muktabh Mayank Srivastava |
Abstract | Zero-shot learners are models capable of predicting unseen classes. In this work, we propose a zero-shot learning approach for text categorization. Our method involves training a model on a large corpus of sentences to learn the relationship between a sentence and the embedding of the sentence’s tags. Learning such a relationship makes the model generalize to unseen sentences, tags, and even new datasets, provided they can be put into the same embedding space. The model learns to predict whether a given sentence is related to a tag or not, unlike other classifiers that learn to classify the sentence as one of the possible classes. We propose three different neural networks for the task and report their accuracy on the test set of the dataset used for training them as well as on two other standard datasets for which no retraining was done. We show that our models generalize well across new unseen classes in both cases. Although the models do not achieve the accuracy level of state-of-the-art supervised models, they are evidently a step forward towards general intelligence in natural language processing. |
Tasks | Text Classification, Zero-Shot Learning |
Published | 2017-12-16 |
URL | http://arxiv.org/abs/1712.05972v2 |
http://arxiv.org/pdf/1712.05972v2.pdf | |
PWC | https://paperswithcode.com/paper/train-once-test-anywhere-zero-shot-learning |
Repo | https://github.com/adamlin120/Zero-shot_Classification_of_News_Title |
Framework | pytorch |
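The central idea is a binary relatedness model over (sentence embedding, tag embedding) pairs rather than a classifier with a fixed output space. A minimal PyTorch sketch of one such relater follows; the embedding dimensions and the single-hidden-layer architecture are assumptions, not one of the paper’s three networks.

```python
import torch
import torch.nn as nn

class SentenceTagRelater(nn.Module):
    """Scores whether a sentence embedding is related to a tag embedding.
    Any tag with an embedding can be scored at test time, even unseen ones."""
    def __init__(self, sent_dim=300, tag_dim=300, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(sent_dim + tag_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, sent_emb, tag_emb):
        return self.net(torch.cat([sent_emb, tag_emb], dim=-1)).squeeze(-1)

model = SentenceTagRelater()
loss_fn = nn.BCEWithLogitsLoss()
# Training pairs: (sentence embedding, tag embedding, 1 if related else 0).
sent, tag = torch.randn(32, 300), torch.randn(32, 300)
related = torch.randint(0, 2, (32,)).float()
loss = loss_fn(model(sent, tag), related)
loss.backward()
```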
Single-Pass PCA of Large High-Dimensional Data
Title | Single-Pass PCA of Large High-Dimensional Data |
Authors | Wenjian Yu, Yu Gu, Jian Li, Shenghua Liu, Yaohang Li |
Abstract | Principal component analysis (PCA) is a fundamental dimension reduction tool in statistics and machine learning. For large and high-dimensional data, computing the PCA (i.e., the singular vectors corresponding to a number of dominant singular values of the data matrix) becomes a challenging task. In this work, a single-pass randomized algorithm is proposed to compute PCA with only one pass over the data. It is suitable for processing extremely large and high-dimensional data stored in slow memory (hard disk) or the data generated in a streaming fashion. Experiments with synthetic and real data validate the algorithm’s accuracy, which has orders of magnitude smaller error than an existing single-pass algorithm. For a set of high-dimensional data stored as a 150 GB file, the proposed algorithm is able to compute the first 50 principal components in just 24 minutes on a typical 24-core computer, with less than 1 GB memory cost. |
Tasks | Dimensionality Reduction |
Published | 2017-04-25 |
URL | http://arxiv.org/abs/1704.07669v1 |
http://arxiv.org/pdf/1704.07669v1.pdf | |
PWC | https://paperswithcode.com/paper/single-pass-pca-of-large-high-dimensional |
Repo | https://github.com/WenjianYu/rSVD-single-pass |
Framework | none |
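Below is a generic single-pass randomized sketch in the spirit of the method (one-pass sketching in the Halko/Martinsson/Tropp style), not the authors’ exact algorithm: the rows of the data matrix are read once in blocks, two small sketch matrices are accumulated, and the leading components are recovered from them. Mean-centering is omitted for brevity.

```python
import numpy as np

def single_pass_pca(row_blocks, n, k, oversample=10, seed=0):
    """One pass over the rows of A (an iterable of row blocks, each b x n)."""
    rng = np.random.default_rng(seed)
    l = k + oversample
    omega = rng.standard_normal((n, l))               # right sketch matrix
    Y_blocks, Zt, omt_blocks = [], np.zeros((n, l)), []
    for block in row_blocks:                          # the single pass over the data
        omt = rng.standard_normal((block.shape[0], l))
        omt_blocks.append(omt)
        Y_blocks.append(block @ omega)                # accumulates Y = A @ omega
        Zt += block.T @ omt                           # accumulates Z = A.T @ omega_tilde
    Y, omega_tilde = np.vstack(Y_blocks), np.vstack(omt_blocks)
    Q, _ = np.linalg.qr(Y)
    Qt, _ = np.linalg.qr(Zt)
    # Small core B ~= Q.T @ A @ Qt, recovered from the sketches without revisiting A.
    B, *_ = np.linalg.lstsq(omega_tilde.T @ Q, Zt.T @ Qt, rcond=None)
    Ub, S, Vbt = np.linalg.svd(B)
    return Q @ Ub[:, :k], S[:k], Qt @ Vbt.T[:, :k]    # left vectors, singular values, principal directions

blocks = (np.random.randn(100, 2000) for _ in range(10))   # streamed row blocks of A
U, S, V = single_pass_pca(blocks, n=2000, k=50)
```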
Espresso: Efficient Forward Propagation for BCNNs
Title | Espresso: Efficient Forward Propagation for BCNNs |
Authors | Fabrizio Pedersoli, George Tzanetakis, Andrea Tagliasacchi |
Abstract | There are many application scenarios for which the computational performance and memory footprint of the prediction phase of Deep Neural Networks (DNNs) need to be optimized. Binary Deep Neural Networks (BDNNs) have been shown to be an effective way of achieving this objective. In this paper, we show how Convolutional Neural Networks (CNNs) can be implemented using binary representations. Espresso is a compact, yet powerful library written in C/CUDA that features all the functionalities required for the forward propagation of CNNs, in a binary file less than 400KB, without any external dependencies. Although it is mainly designed to take advantage of massive GPU parallelism, Espresso also provides an equivalent CPU implementation for CNNs. Espresso provides special convolutional and dense layers for BCNNs, leveraging bit-packing and bit-wise computations for efficient execution. These techniques provide a speed-up of matrix-multiplication routines, and at the same time, reduce memory usage when storing parameters and activations. We experimentally show that Espresso is significantly faster than existing implementations of optimized binary neural networks ($\approx$ 2 orders of magnitude). Espresso is released under the Apache 2.0 license and is available at http://github.com/fpeder/espresso. |
Tasks | |
Published | 2017-05-19 |
URL | http://arxiv.org/abs/1705.07175v2 |
http://arxiv.org/pdf/1705.07175v2.pdf | |
PWC | https://paperswithcode.com/paper/espresso-efficient-forward-propagation-for |
Repo | https://github.com/fpeder/espresso |
Framework | none |
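The bit-packing trick can be illustrated outside C/CUDA: binarize values to {+1, −1}, pack the sign bits, and replace multiply-accumulate with XOR plus a bit count, since dot = n − 2·(number of mismatching signs). A NumPy sketch of the arithmetic (Espresso itself packs into machine words and uses hardware popcount):

```python
import numpy as np

def binarize_pack(x):
    """Map real values to signs (+1 / -1) and pack one sign bit per position."""
    return np.packbits((x >= 0).astype(np.uint8)), x.size

def binary_dot(packed_a, packed_b, n):
    """Dot product of two {+1,-1} vectors from their packed sign bits:
    dot = n - 2 * (number of positions where the signs differ)."""
    mismatches = np.unpackbits(np.bitwise_xor(packed_a, packed_b))[:n].sum()
    return n - 2 * int(mismatches)

a, b = np.random.randn(256), np.random.randn(256)
pa, n = binarize_pack(a)
pb, _ = binarize_pack(b)
assert binary_dot(pa, pb, n) == int(np.where(a >= 0, 1, -1) @ np.where(b >= 0, 1, -1))
```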
A Survey of Machine Learning for Big Code and Naturalness
Title | A Survey of Machine Learning for Big Code and Naturalness |
Authors | Miltiadis Allamanis, Earl T. Barr, Premkumar Devanbu, Charles Sutton |
Abstract | Research at the intersection of machine learning, programming languages, and software engineering has recently taken important steps in proposing learnable probabilistic models of source code that exploit code’s abundance of patterns. In this article, we survey this work. We contrast programming languages against natural languages and discuss how these similarities and differences drive the design of probabilistic models. We present a taxonomy based on the underlying design principles of each model and use it to navigate the literature. Then, we review how researchers have adapted these models to application areas and discuss cross-cutting and application-specific challenges and opportunities. |
Tasks | |
Published | 2017-09-18 |
URL | http://arxiv.org/abs/1709.06182v2 |
http://arxiv.org/pdf/1709.06182v2.pdf | |
PWC | https://paperswithcode.com/paper/a-survey-of-machine-learning-for-big-code-and |
Repo | https://github.com/quepas/ReadingPublications |
Framework | none |
Thoracic Disease Identification and Localization with Limited Supervision
Title | Thoracic Disease Identification and Localization with Limited Supervision |
Authors | Zhe Li, Chong Wang, Mei Han, Yuan Xue, Wei Wei, Li-Jia Li, Li Fei-Fei |
Abstract | Accurate identification and localization of abnormalities from radiology images play an integral part in clinical diagnosis and treatment planning. Building a highly accurate prediction model for these tasks usually requires a large number of images manually annotated with labels and finding sites of abnormalities. In reality, however, such annotated data are expensive to acquire, especially the ones with location annotations. We need methods that can work well with only a small amount of location annotations. To address this challenge, we present a unified approach that simultaneously performs disease identification and localization through the same underlying model for all images. We demonstrate that our approach can effectively leverage both class information as well as limited location annotation, and significantly outperforms the comparative reference baseline in both classification and localization tasks. |
Tasks | |
Published | 2017-11-17 |
URL | http://arxiv.org/abs/1711.06373v6 |
http://arxiv.org/pdf/1711.06373v6.pdf | |
PWC | https://paperswithcode.com/paper/thoracic-disease-identification-and |
Repo | https://github.com/romanovar/evaluation_MIL |
Framework | tf |
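The abstract does not spell out how image-level labels and the scarce location annotations are combined; the sketch below uses a common multiple-instance formulation (noisy-OR pooling over a patch grid, with direct patch supervision when a box is available) purely as an illustration, not necessarily the authors’ exact model.

```python
import torch

def image_prob(patch_probs):
    """Noisy-OR pooling: P(disease in image) = 1 - prod(1 - p_patch).
    patch_probs: (batch, H, W) per-patch probabilities for one disease."""
    return 1.0 - torch.prod(1.0 - patch_probs.flatten(1), dim=1)

def limited_supervision_loss(patch_probs, image_label, box_mask=None, eps=1e-6):
    """With a box annotation, supervise patches directly via box_mask
    (1 inside the box, 0 outside); otherwise use only the image-level label."""
    if box_mask is not None:
        p = patch_probs.clamp(eps, 1 - eps)
        return -(box_mask * p.log() + (1 - box_mask) * (1 - p).log()).mean()
    p_img = image_prob(patch_probs).clamp(eps, 1 - eps)
    return -(image_label * p_img.log() + (1 - image_label) * (1 - p_img).log()).mean()
```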
Improving Pairwise Ranking for Multi-label Image Classification
Title | Improving Pairwise Ranking for Multi-label Image Classification |
Authors | Yuncheng Li, Yale Song, Jiebo Luo |
Abstract | Learning to rank has recently emerged as an attractive technique to train deep convolutional neural networks for various computer vision tasks. Pairwise ranking, in particular, has been successful in multi-label image classification, achieving state-of-the-art results on various benchmarks. However, most existing approaches use the hinge loss to train their models, which is non-smooth and thus is difficult to optimize especially with deep networks. Furthermore, they employ simple heuristics, such as top-k or thresholding, to determine which labels to include in the output from a ranked list of labels, which limits their use in the real-world setting. In this work, we propose two techniques to improve pairwise ranking based multi-label image classification: (1) we propose a novel loss function for pairwise ranking, which is smooth everywhere and thus is easier to optimize; and (2) we incorporate a label decision module into the model, estimating the optimal confidence thresholds for each visual concept. We provide theoretical analyses of our loss function in the Bayes consistency and risk minimization framework, and show its benefit over existing pairwise ranking formulations. We demonstrate the effectiveness of our approach on three large-scale datasets, VOC2007, NUS-WIDE and MS-COCO, achieving the best reported results in the literature. |
Tasks | Image Classification, Learning-To-Rank |
Published | 2017-04-11 |
URL | http://arxiv.org/abs/1704.03135v3 |
http://arxiv.org/pdf/1704.03135v3.pdf | |
PWC | https://paperswithcode.com/paper/improving-pairwise-ranking-for-multi-label |
Repo | https://github.com/OFRIN/Tensorflow_Improving_Pairwise_Ranking_for_Multi-label_Image_Classification |
Framework | tf |
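The two contributions map naturally onto a loss and a small module: a smooth surrogate for the pairwise hinge and learned per-label decision thresholds. The softplus surrogate below is one smooth choice; the paper’s exact loss and label decision module may differ.

```python
import torch
import torch.nn.functional as F

def smooth_pairwise_rank_loss(scores, labels):
    """Encourage every positive label to outscore every negative label.
    softplus(neg - pos) is a smooth, everywhere-differentiable surrogate for
    the hinge max(0, 1 + neg - pos). scores, labels: (batch, num_labels), labels in {0, 1}."""
    pos = labels.bool()
    losses = []
    for s, p in zip(scores, pos):
        if p.any() and (~p).any():
            diff = s[~p].unsqueeze(0) - s[p].unsqueeze(1)   # (num_pos, num_neg)
            losses.append(F.softplus(diff).mean())
    return torch.stack(losses).mean()

class LabelDecision(torch.nn.Module):
    """Learned per-label confidence thresholds replacing top-k / global thresholding."""
    def __init__(self, num_labels):
        super().__init__()
        self.thresholds = torch.nn.Parameter(torch.zeros(num_labels))

    def forward(self, scores):
        return scores > self.thresholds   # boolean multi-label prediction
```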
Online Structure Learning for Sum-Product Networks with Gaussian Leaves
Title | Online Structure Learning for Sum-Product Networks with Gaussian Leaves |
Authors | Wilson Hsu, Agastya Kalra, Pascal Poupart |
Abstract | Sum-product networks have recently emerged as an attractive representation due to their dual view as a special type of deep neural network with clear semantics and a special type of probabilistic graphical model for which inference is always tractable. Those properties follow from some conditions (i.e., completeness and decomposability) that must be respected by the structure of the network. As a result, it is not easy to specify a valid sum-product network by hand and therefore structure learning techniques are typically used in practice. This paper describes the first online structure learning technique for continuous SPNs with Gaussian leaves. We also introduce an accompanying new parameter learning technique. |
Tasks | |
Published | 2017-01-19 |
URL | http://arxiv.org/abs/1701.05265v1 |
http://arxiv.org/pdf/1701.05265v1.pdf | |
PWC | https://paperswithcode.com/paper/online-structure-learning-for-sum-product |
Repo | https://github.com/whsu/spn |
Framework | none |
A Minimal Developmental Model Can Increase Evolvability in Soft Robots
Title | A Minimal Developmental Model Can Increase Evolvability in Soft Robots |
Authors | Sam Kriegman, Nick Cheney, Francesco Corucci, Josh C. Bongard |
Abstract | Different subsystems of organisms adapt over many time scales, such as rapid changes in the nervous system (learning), slower morphological and neurological change over the lifetime of the organism (postnatal development), and change over many generations (evolution). Much work has focused on instantiating learning or evolution in robots, but relatively little on development. Although many theories have been forwarded as to how development can aid evolution, it is difficult to isolate each such proposed mechanism. Thus, here we introduce a minimal yet embodied model of development: the body of the robot changes over its lifetime, yet growth is not influenced by the environment. We show that even this simple developmental model confers evolvability because it allows evolution to sweep over a larger range of body plans than an equivalent non-developmental system, and subsequent heterochronic mutations ‘lock in’ this body plan in more morphologically-static descendants. Future work will involve gradually complexifying the developmental model to determine when and how such added complexity increases evolvability. |
Tasks | |
Published | 2017-06-22 |
URL | http://arxiv.org/abs/1706.07296v1 |
http://arxiv.org/pdf/1706.07296v1.pdf | |
PWC | https://paperswithcode.com/paper/a-minimal-developmental-model-can-increase |
Repo | https://github.com/skriegman/gecco-2017 |
Framework | none |
An efficient clustering algorithm from the measure of local Gaussian distribution
Title | An efficient clustering algorithm from the measure of local Gaussian distribution |
Authors | Yuan-Yen Tai |
Abstract | In this paper, I will introduce a fast and novel clustering algorithm based on the local Gaussian distribution; it guarantees that cluster centroids are separated by at least a given parameter, $d_s$. The worst-case run-time complexity of this algorithm is approximately $O(T \times N \times \log(N))$, where $T$ is the number of iteration steps and $N$ is the number of features. |
Tasks | |
Published | 2017-09-13 |
URL | https://arxiv.org/abs/1709.08470v2 |
https://arxiv.org/pdf/1709.08470v2.pdf | |
PWC | https://paperswithcode.com/paper/an-efficient-clustering-algorithm-from-the |
Repo | https://github.com/Anrris/glassfire |
Framework | none |
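A toy illustration of the separation guarantee only (greedy leader-style centroid selection), not the paper’s algorithm: a point becomes a new centroid only if it lies at least $d_s$ from every centroid chosen so far.

```python
import numpy as np

def separated_centroids(X, d_s):
    """Greedily keep a point as a new centroid only if it lies at least d_s
    away from every centroid chosen so far (illustrates the constraint only)."""
    centroids = [X[0]]
    for x in X[1:]:
        if min(np.linalg.norm(x - c) for c in centroids) >= d_s:
            centroids.append(x)
    return np.array(centroids)

X = np.random.rand(1000, 2)
C = separated_centroids(X, d_s=0.3)
labels = np.argmin(np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2), axis=1)
```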
Non-linear motor control by local learning in spiking neural networks
Title | Non-linear motor control by local learning in spiking neural networks |
Authors | Aditya Gilra, Wulfram Gerstner |
Abstract | Learning weights in a spiking neural network with hidden neurons, using local, stable and online rules, to control non-linear body dynamics is an open problem. Here, we employ a supervised scheme, Feedback-based Online Local Learning Of Weights (FOLLOW), to train a network of heterogeneous spiking neurons with hidden layers, to control a two-link arm so as to reproduce a desired state trajectory. The network first learns an inverse model of the non-linear dynamics, i.e. from state trajectory as input to the network, it learns to infer the continuous-time command that produced the trajectory. Connection weights are adjusted via a local plasticity rule that involves pre-synaptic firing and post-synaptic feedback of the error in the inferred command. We choose a network architecture, termed differential feedforward, that gives the lowest test error from different feedforward and recurrent architectures. The learned inverse model is then used to generate a continuous-time motor command to control the arm, given a desired trajectory. |
Tasks | |
Published | 2017-12-29 |
URL | http://arxiv.org/abs/1712.10158v1 |
http://arxiv.org/pdf/1712.10158v1.pdf | |
PWC | https://paperswithcode.com/paper/non-linear-motor-control-by-local-learning-in |
Repo | https://github.com/adityagilra/FOLLOWControl |
Framework | none |
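The core of FOLLOW is a local update in which each weight moves in proportion to its presynaptic activity and the fed-back error in the inferred command. The sketch below is a rate-based caricature of that pre × error form; the actual scheme uses heterogeneous spiking neurons and filtered spike trains, and the network and arm dynamics are omitted here.

```python
import numpy as np

def follow_like_update(W, pre_activity, error_feedback, eta=1e-3):
    """Local rule: dW_ij is proportional to (presynaptic activity j) x (fed-back error i);
    no global gradients, each weight sees only its own pre- and post-synaptic signals."""
    return W + eta * np.outer(error_feedback, pre_activity)

rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((2, 100))           # readout: 2 command dimensions, 100 neurons
for t in range(5000):
    r = np.tanh(rng.standard_normal(100))           # stand-in for filtered firing rates
    u_target = np.array([np.sin(0.01 * t), np.cos(0.01 * t)])   # desired command
    error = u_target - W @ r                        # error in the inferred command, fed back
    W = follow_like_update(W, r, error)
```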
Domain Generalization by Marginal Transfer Learning
Title | Domain Generalization by Marginal Transfer Learning |
Authors | Gilles Blanchard, Aniket Anand Deshmukh, Urun Dogan, Gyemin Lee, Clayton Scott |
Abstract | Domain generalization is the problem of assigning class labels to an unlabeled test data set, given several labeled training data sets drawn from similar distributions. This problem arises in several applications where data distributions fluctuate because of biological, technical, or other sources of variation. We develop a distribution-free, kernel-based approach that predicts a classifier from the marginal distribution of features, by leveraging the trends present in related classification tasks. This approach involves identifying an appropriate reproducing kernel Hilbert space and optimizing a regularized empirical risk over the space. We present generalization error analysis, describe universal kernels, and establish universal consistency of the proposed methodology. Experimental results on synthetic data and three real data applications demonstrate the superiority of the method with respect to a pooling strategy. |
Tasks | Domain Generalization, Transfer Learning |
Published | 2017-11-21 |
URL | http://arxiv.org/abs/1711.07910v1 |
http://arxiv.org/pdf/1711.07910v1.pdf | |
PWC | https://paperswithcode.com/paper/domain-generalization-by-marginal-transfer |
Repo | https://github.com/aniketde/DomainGeneralizationMarginal |
Framework | none |
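In practice, predicting a classifier from the marginal distribution amounts to augmenting each sample with a summary of its domain’s feature distribution and training one kernel machine on the augmented representation. The sketch below uses a plain per-domain mean feature vector as a crude stand-in for the kernel mean embedding, so it only illustrates the idea, not the paper’s kernel construction.

```python
import numpy as np
from sklearn.svm import SVC

def augment_with_marginal(X, domain_ids):
    """Append each domain's mean feature vector to its samples, a crude
    stand-in for the kernel mean embedding of the marginal P(X)."""
    means = {d: X[domain_ids == d].mean(axis=0) for d in np.unique(domain_ids)}
    return np.hstack([X, np.stack([means[d] for d in domain_ids])])

# Train on several labelled source domains.
X_tr = np.random.randn(300, 5)
y_tr = np.random.randint(0, 2, 300)
dom_tr = np.repeat([0, 1, 2], 100)
clf = SVC(kernel="rbf").fit(augment_with_marginal(X_tr, dom_tr), y_tr)

# At test time, the unseen domain is summarised the same way from its unlabelled pool.
X_te = np.random.randn(50, 5)
X_te_aug = np.hstack([X_te, np.tile(X_te.mean(axis=0), (len(X_te), 1))])
pred = clf.predict(X_te_aug)
```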