Paper Group AWR 168
Improved Variational Autoencoders for Text Modeling using Dilated Convolutions
Title | Improved Variational Autoencoders for Text Modeling using Dilated Convolutions |
Authors | Zichao Yang, Zhiting Hu, Ruslan Salakhutdinov, Taylor Berg-Kirkpatrick |
Abstract | Recent work on generative modeling of text has found that variational auto-encoders (VAE) incorporating LSTM decoders perform worse than simpler LSTM language models (Bowman et al., 2015). This negative result is so far poorly understood, but has been attributed to the propensity of LSTM decoders to ignore conditioning information from the encoder. In this paper, we experiment with a new type of decoder for VAE: a dilated CNN. By changing the decoder’s dilation architecture, we control the effective context from previously generated words. In experiments, we find that there is a trade-off between the contextual capacity of the decoder and the amount of encoding information used. We show that with the right decoder, VAE can outperform LSTM language models. We demonstrate perplexity gains on two datasets, representing the first positive experimental result on the use of VAE for generative modeling of text. Further, we conduct an in-depth investigation of the use of VAE (with our new decoding architecture) for semi-supervised and unsupervised labeling tasks, demonstrating gains over several strong baselines. |
Tasks | Text Generation |
Published | 2017-02-27 |
URL | http://arxiv.org/abs/1702.08139v2 |
http://arxiv.org/pdf/1702.08139v2.pdf | |
PWC | https://paperswithcode.com/paper/improved-variational-autoencoders-for-text |
Repo | https://github.com/ryokamoi/dcnn_textvae |
Framework | tf |
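The decoder’s dilation schedule is the knob that bounds how many previously generated words it can condition on. Below is a minimal sketch of a causal, dilated 1-D convolutional stack in PyTorch; the layer width, kernel size, and dilation schedule are illustrative assumptions, not the configuration used in the paper.

```python
import torch
import torch.nn as nn

class CausalDilatedConv1d(nn.Module):
    """1-D convolution padded on the left so position t only sees positions <= t."""
    def __init__(self, channels, kernel_size, dilation):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                                # x: (batch, channels, time)
        return torch.relu(self.conv(nn.functional.pad(x, (self.pad, 0))))

# Doubling dilations grow the receptive field exponentially; truncating the
# schedule shrinks the effective context available to the decoder.
dilations = [1, 2, 4, 8]
decoder = nn.Sequential(*[CausalDilatedConv1d(128, kernel_size=3, dilation=d)
                          for d in dilations])

receptive_field = 1 + sum((3 - 1) * d for d in dilations)
print(receptive_field)   # 31: each output position sees itself plus 30 previous tokens
```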
Learning with Confident Examples: Rank Pruning for Robust Classification with Noisy Labels
Title | Learning with Confident Examples: Rank Pruning for Robust Classification with Noisy Labels |
Authors | Curtis G. Northcutt, Tailin Wu, Isaac L. Chuang |
Abstract | Noisy PN learning is the problem of binary classification when training examples may be mislabeled (flipped) uniformly with noise rate rho1 for positive examples and rho0 for negative examples. We propose Rank Pruning (RP) to solve noisy PN learning and the open problem of estimating the noise rates, i.e. the fraction of wrong positive and negative labels. Unlike prior solutions, RP is time-efficient and general, requiring O(T) for any unrestricted choice of probabilistic classifier with T fitting time. We prove RP has consistent noise estimation and equivalent expected risk as learning with uncorrupted labels in ideal conditions, and derive closed-form solutions when conditions are non-ideal. RP achieves state-of-the-art noise estimation and F1, error, and AUC-PR for both MNIST and CIFAR datasets, regardless of the amount of noise, and performs similarly impressively when a large portion of training examples are noise drawn from a third distribution. To highlight, RP with a CNN classifier can predict if an MNIST digit is a “one” or “not” with only 0.25% error, and 0.46% error across all digits, even when 50% of positive examples are mislabeled and 50% of observed positive labels are mislabeled negative examples. |
Tasks | |
Published | 2017-05-04 |
URL | http://arxiv.org/abs/1705.01936v3 |
http://arxiv.org/pdf/1705.01936v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-with-confident-examples-rank-pruning |
Repo | https://github.com/cgnorthcutt/cleanlab |
Framework | pytorch |
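A rough sketch of the confident-examples idea behind Rank Pruning, using scikit-learn: obtain out-of-sample probabilities, estimate how many labels in each class look inconsistent, prune the least confident examples, and refit. The threshold and noise-rate estimates below are simplified stand-ins for the paper’s estimators, not its exact procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def rank_pruning_fit(X, s, clf=None, cv=3):
    """s: observed (possibly noisy) binary labels.
    1) out-of-sample probabilities, 2) confident thresholds,
    3) prune the least consistent examples in each class, 4) refit."""
    clf = clf or LogisticRegression(max_iter=1000)
    p = cross_val_predict(clf, X, s, cv=cv, method="predict_proba")[:, 1]

    # Confident thresholds: mean predicted probability within each observed class.
    lb = p[s == 1].mean()            # typical score of examples labelled positive
    ub = p[s == 0].mean()            # typical score of examples labelled negative

    # Crude noise-rate estimates from inconsistent examples, then prune that
    # fraction of the lowest-ranked examples in each observed class.
    rho1 = np.mean(p[s == 1] < ub)   # labelled positive but scored like negatives
    rho0 = np.mean(p[s == 0] > lb)   # labelled negative but scored like positives

    keep = np.ones(len(s), dtype=bool)
    pos, neg = np.where(s == 1)[0], np.where(s == 0)[0]
    keep[pos[np.argsort(p[pos])[:int(rho1 * len(pos))]]] = False   # worst positives
    keep[neg[np.argsort(-p[neg])[:int(rho0 * len(neg))]]] = False  # worst negatives

    clf.fit(X[keep], s[keep])
    return clf
```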
Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning
Title | Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning |
Authors | Felipe Petroski Such, Vashisht Madhavan, Edoardo Conti, Joel Lehman, Kenneth O. Stanley, Jeff Clune |
Abstract | Deep artificial neural networks (DNNs) are typically trained via gradient-based learning algorithms, namely backpropagation. Evolution strategies (ES) can rival backprop-based algorithms such as Q-learning and policy gradients on challenging deep reinforcement learning (RL) problems. However, ES can be considered a gradient-based algorithm because it performs stochastic gradient descent via an operation similar to a finite-difference approximation of the gradient. That raises the question of whether non-gradient-based evolutionary algorithms can work at DNN scales. Here we demonstrate they can: we evolve the weights of a DNN with a simple, gradient-free, population-based genetic algorithm (GA) and it performs well on hard deep RL problems, including Atari and humanoid locomotion. The Deep GA successfully evolves networks with over four million free parameters, the largest neural networks ever evolved with a traditional evolutionary algorithm. These results (1) expand our sense of the scale at which GAs can operate, (2) suggest intriguingly that in some cases following the gradient is not the best choice for optimizing performance, and (3) make immediately available the multitude of neuroevolution techniques that improve performance. We demonstrate the latter by showing that combining DNNs with novelty search, which encourages exploration on tasks with deceptive or sparse reward functions, can solve a high-dimensional problem on which reward-maximizing algorithms (e.g. DQN, A3C, ES, and the GA) fail. Additionally, the Deep GA is faster than ES, A3C, and DQN (it can train Atari in ~4 hours on one desktop or ~1 hour distributed on 720 cores), and enables a state-of-the-art, up to 10,000-fold compact encoding technique. |
Tasks | Q-Learning |
Published | 2017-12-18 |
URL | http://arxiv.org/abs/1712.06567v3 |
http://arxiv.org/pdf/1712.06567v3.pdf | |
PWC | https://paperswithcode.com/paper/deep-neuroevolution-genetic-algorithms-are-a |
Repo | https://github.com/kevin5naug/summer_project |
Framework | pytorch |
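A toy sketch of the two ideas the abstract highlights: a truncation-selection GA over DNN weights and the compact encoding in which a genome is just a list of random seeds, so the weight vector is rebuilt by replaying the initial draw plus every mutation. Population size, mutation power, and parameter count below are made-up toy values, not the paper’s settings.

```python
import numpy as np

N_PARAMS, SIGMA, POP, TRUNC = 10_000, 0.002, 20, 5   # toy values; the paper evolves >4M parameters

def decode(seeds):
    """Compact encoding: reconstruct the weight vector from a list of seeds."""
    rng = np.random.default_rng(seeds[0])
    theta = rng.standard_normal(N_PARAMS)             # initialization seed
    for s in seeds[1:]:                                # one additive mutation per seed
        theta += SIGMA * np.random.default_rng(s).standard_normal(N_PARAMS)
    return theta

def evolve(fitness_fn, generations=10):
    population = [[np.random.randint(2**31)] for _ in range(POP)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda g: fitness_fn(decode(g)), reverse=True)
        elite, parents = ranked[0], ranked[:TRUNC]
        # Children copy a surviving parent's seed list and append one mutation seed.
        population = [elite] + [parents[np.random.randint(TRUNC)] + [np.random.randint(2**31)]
                                for _ in range(POP - 1)]
    return decode(population[0])

best = evolve(lambda theta: -np.sum(theta ** 2))      # toy fitness: shrink the weight norm
```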
cvpaper.challenge in 2016: Futuristic Computer Vision through 1,600 Papers Survey
Title | cvpaper.challenge in 2016: Futuristic Computer Vision through 1,600 Papers Survey |
Authors | Hirokatsu Kataoka, Soma Shirakabe, Yun He, Shunya Ueta, Teppei Suzuki, Kaori Abe, Asako Kanezaki, Shin’ichiro Morita, Toshiyuki Yabe, Yoshihiro Kanehara, Hiroya Yatsuyanagi, Shinya Maruyama, Ryosuke Takasawa, Masataka Fuchida, Yudai Miyashita, Kazushige Okayasu, Yuta Matsuzaki |
Abstract | The paper presents the futuristic challenges discussed in the cvpaper.challenge. In 2015 and 2016, we thoroughly studied 1,600+ papers in several conferences/journals such as CVPR/ICCV/ECCV/NIPS/PAMI/IJCV. |
Tasks | |
Published | 2017-07-20 |
URL | http://arxiv.org/abs/1707.06436v1 |
http://arxiv.org/pdf/1707.06436v1.pdf | |
PWC | https://paperswithcode.com/paper/cvpaperchallenge-in-2016-futuristic-computer |
Repo | https://github.com/hurutoriya/hurutoriya.github.io |
Framework | none |
Train Once, Test Anywhere: Zero-Shot Learning for Text Classification
Title | Train Once, Test Anywhere: Zero-Shot Learning for Text Classification |
Authors | Pushpankar Kumar Pushp, Muktabh Mayank Srivastava |
Abstract | Zero-shot learners are models capable of predicting unseen classes. In this work, we propose a zero-shot learning approach for text categorization. Our method involves training a model on a large corpus of sentences to learn the relationship between a sentence and the embedding of the sentence’s tags. Learning such a relationship makes the model generalize to unseen sentences, tags, and even new datasets, provided they can be put into the same embedding space. The model learns to predict whether a given sentence is related to a tag or not, unlike other classifiers that learn to classify the sentence as one of the possible classes. We propose three different neural networks for the task and report their accuracy on the test set of the dataset used for training them as well as on two other standard datasets for which no retraining was done. We show that our models generalize well across new unseen classes in both cases. Although the models do not achieve the accuracy level of state-of-the-art supervised models, they are evidently a step forward towards general intelligence in natural language processing. |
Tasks | Text Classification, Zero-Shot Learning |
Published | 2017-12-16 |
URL | http://arxiv.org/abs/1712.05972v2 |
http://arxiv.org/pdf/1712.05972v2.pdf | |
PWC | https://paperswithcode.com/paper/train-once-test-anywhere-zero-shot-learning |
Repo | https://github.com/adamlin120/Zero-shot_Classification_of_News_Title |
Framework | pytorch |
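The central idea is a binary relatedness model over (sentence embedding, tag embedding) pairs rather than a classifier with a fixed output space. A minimal PyTorch sketch of one such relater follows; the embedding dimensions and the single-hidden-layer architecture are assumptions, not one of the paper’s three networks.

```python
import torch
import torch.nn as nn

class SentenceTagRelater(nn.Module):
    """Scores whether a sentence embedding is related to a tag embedding.
    Any tag with an embedding can be scored at test time, even unseen ones."""
    def __init__(self, sent_dim=300, tag_dim=300, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(sent_dim + tag_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, sent_emb, tag_emb):
        return self.net(torch.cat([sent_emb, tag_emb], dim=-1)).squeeze(-1)

model = SentenceTagRelater()
loss_fn = nn.BCEWithLogitsLoss()
# Training pairs: (sentence embedding, tag embedding, 1 if related else 0).
sent, tag = torch.randn(32, 300), torch.randn(32, 300)
related = torch.randint(0, 2, (32,)).float()
loss = loss_fn(model(sent, tag), related)
loss.backward()
```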
Single-Pass PCA of Large High-Dimensional Data
Title | Single-Pass PCA of Large High-Dimensional Data |
Authors | Wenjian Yu, Yu Gu, Jian Li, Shenghua Liu, Yaohang Li |
Abstract | Principal component analysis (PCA) is a fundamental dimension reduction tool in statistics and machine learning. For large and high-dimensional data, computing the PCA (i.e., the singular vectors corresponding to a number of dominant singular values of the data matrix) becomes a challenging task. In this work, a single-pass randomized algorithm is proposed to compute PCA with only one pass over the data. It is suitable for processing extremely large and high-dimensional data stored in slow memory (hard disk) or the data generated in a streaming fashion. Experiments with synthetic and real data validate the algorithm’s accuracy, which has orders of magnitude smaller error than an existing single-pass algorithm. For a set of high-dimensional data stored as a 150 GB file, the proposed algorithm is able to compute the first 50 principal components in just 24 minutes on a typical 24-core computer, with less than 1 GB memory cost. |
Tasks | Dimensionality Reduction |
Published | 2017-04-25 |
URL | http://arxiv.org/abs/1704.07669v1 |
http://arxiv.org/pdf/1704.07669v1.pdf | |
PWC | https://paperswithcode.com/paper/single-pass-pca-of-large-high-dimensional |
Repo | https://github.com/WenjianYu/rSVD-single-pass |
Framework | none |
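Below is a generic single-pass randomized sketch in the spirit of the method (one-pass sketching in the Halko/Martinsson/Tropp style), not the authors’ exact algorithm: the rows of the data matrix are read once in blocks, two small sketch matrices are accumulated, and the leading components are recovered from them. Mean-centering is omitted for brevity.

```python
import numpy as np

def single_pass_pca(row_blocks, n, k, oversample=10, seed=0):
    """One pass over the rows of A (an iterable of row blocks, each b x n)."""
    rng = np.random.default_rng(seed)
    l = k + oversample
    omega = rng.standard_normal((n, l))               # right sketch matrix
    Y_blocks, Zt, omt_blocks = [], np.zeros((n, l)), []
    for block in row_blocks:                          # the single pass over the data
        omt = rng.standard_normal((block.shape[0], l))
        omt_blocks.append(omt)
        Y_blocks.append(block @ omega)                # accumulates Y = A @ omega
        Zt += block.T @ omt                           # accumulates Z = A.T @ omega_tilde
    Y, omega_tilde = np.vstack(Y_blocks), np.vstack(omt_blocks)
    Q, _ = np.linalg.qr(Y)
    Qt, _ = np.linalg.qr(Zt)
    # Small core B ~= Q.T @ A @ Qt, recovered from the sketches without revisiting A.
    B, *_ = np.linalg.lstsq(omega_tilde.T @ Q, Zt.T @ Qt, rcond=None)
    Ub, S, Vbt = np.linalg.svd(B)
    return Q @ Ub[:, :k], S[:k], Qt @ Vbt.T[:, :k]    # left vectors, singular values, principal directions

blocks = (np.random.randn(100, 2000) for _ in range(10))   # streamed row blocks of A
U, S, V = single_pass_pca(blocks, n=2000, k=50)
```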
Espresso: Efficient Forward Propagation for BCNNs
Title | Espresso: Efficient Forward Propagation for BCNNs |
Authors | Fabrizio Pedersoli, George Tzanetakis, Andrea Tagliasacchi |
Abstract | There are many application scenarios for which the computational performance and memory footprint of the prediction phase of Deep Neural Networks (DNNs) need to be optimized. Binary Deep Neural Networks (BDNNs) have been shown to be an effective way of achieving this objective. In this paper, we show how Convolutional Neural Networks (CNNs) can be implemented using binary representations. Espresso is a compact, yet powerful library written in C/CUDA that features all the functionalities required for the forward propagation of CNNs, in a binary file less than 400KB, without any external dependencies. Although it is mainly designed to take advantage of massive GPU parallelism, Espresso also provides an equivalent CPU implementation for CNNs. Espresso provides special convolutional and dense layers for BCNNs, leveraging bit-packing and bit-wise computations for efficient execution. These techniques provide a speed-up of matrix-multiplication routines, and at the same time, reduce memory usage when storing parameters and activations. We experimentally show that Espresso is significantly faster than existing implementations of optimized binary neural networks ($\approx$ 2 orders of magnitude). Espresso is released under the Apache 2.0 license and is available at http://github.com/fpeder/espresso. |
Tasks | |
Published | 2017-05-19 |
URL | http://arxiv.org/abs/1705.07175v2 |
http://arxiv.org/pdf/1705.07175v2.pdf | |
PWC | https://paperswithcode.com/paper/espresso-efficient-forward-propagation-for |
Repo | https://github.com/fpeder/espresso |
Framework | none |
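The bit-packing trick can be illustrated outside C/CUDA: binarize values to {+1, −1}, pack the sign bits, and replace multiply-accumulate with XOR plus a bit count, since dot = n − 2·(number of mismatching signs). A NumPy sketch of the arithmetic (Espresso itself packs into machine words and uses hardware popcount):

```python
import numpy as np

def binarize_pack(x):
    """Map real values to signs (+1 / -1) and pack one sign bit per position."""
    return np.packbits((x >= 0).astype(np.uint8)), x.size

def binary_dot(packed_a, packed_b, n):
    """Dot product of two {+1,-1} vectors from their packed sign bits:
    dot = n - 2 * (number of positions where the signs differ)."""
    mismatches = np.unpackbits(np.bitwise_xor(packed_a, packed_b))[:n].sum()
    return n - 2 * int(mismatches)

a, b = np.random.randn(256), np.random.randn(256)
pa, n = binarize_pack(a)
pb, _ = binarize_pack(b)
assert binary_dot(pa, pb, n) == int(np.where(a >= 0, 1, -1) @ np.where(b >= 0, 1, -1))
```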
A Survey of Machine Learning for Big Code and Naturalness
Title | A Survey of Machine Learning for Big Code and Naturalness |
Authors | Miltiadis Allamanis, Earl T. Barr, Premkumar Devanbu, Charles Sutton |
Abstract | Research at the intersection of machine learning, programming languages, and software engineering has recently taken important steps in proposing learnable probabilistic models of source code that exploit code’s abundance of patterns. In this article, we survey this work. We contrast programming languages against natural languages and discuss how these similarities and differences drive the design of probabilistic models. We present a taxonomy based on the underlying design principles of each model and use it to navigate the literature. Then, we review how researchers have adapted these models to application areas and discuss cross-cutting and application-specific challenges and opportunities. |
Tasks | |
Published | 2017-09-18 |
URL | http://arxiv.org/abs/1709.06182v2 |
http://arxiv.org/pdf/1709.06182v2.pdf | |
PWC | https://paperswithcode.com/paper/a-survey-of-machine-learning-for-big-code-and |
Repo | https://github.com/quepas/ReadingPublications |
Framework | none |
Thoracic Disease Identification and Localization with Limited Supervision
Title | Thoracic Disease Identification and Localization with Limited Supervision |
Authors | Zhe Li, Chong Wang, Mei Han, Yuan Xue, Wei Wei, Li-Jia Li, Li Fei-Fei |
Abstract | Accurate identification and localization of abnormalities from radiology images play an integral part in clinical diagnosis and treatment planning. Building a highly accurate prediction model for these tasks usually requires a large number of images manually annotated with labels and finding sites of abnormalities. In reality, however, such annotated data are expensive to acquire, especially the ones with location annotations. We need methods that can work well with only a small amount of location annotations. To address this challenge, we present a unified approach that simultaneously performs disease identification and localization through the same underlying model for all images. We demonstrate that our approach can effectively leverage both class information as well as limited location annotation, and significantly outperforms the comparative reference baseline in both classification and localization tasks. |
Tasks | |
Published | 2017-11-17 |
URL | http://arxiv.org/abs/1711.06373v6 |
http://arxiv.org/pdf/1711.06373v6.pdf | |
PWC | https://paperswithcode.com/paper/thoracic-disease-identification-and |
Repo | https://github.com/romanovar/evaluation_MIL |
Framework | tf |
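The abstract does not spell out how image-level labels and the scarce location annotations are combined; the sketch below uses a common multiple-instance formulation (noisy-OR pooling over a patch grid, with direct patch supervision when a box is available) purely as an illustration, not necessarily the authors’ exact model.

```python
import torch

def image_prob(patch_probs):
    """Noisy-OR pooling: P(disease in image) = 1 - prod(1 - p_patch).
    patch_probs: (batch, H, W) per-patch probabilities for one disease."""
    return 1.0 - torch.prod(1.0 - patch_probs.flatten(1), dim=1)

def limited_supervision_loss(patch_probs, image_label, box_mask=None, eps=1e-6):
    """With a box annotation, supervise patches directly via box_mask
    (1 inside the box, 0 outside); otherwise use only the image-level label."""
    if box_mask is not None:
        p = patch_probs.clamp(eps, 1 - eps)
        return -(box_mask * p.log() + (1 - box_mask) * (1 - p).log()).mean()
    p_img = image_prob(patch_probs).clamp(eps, 1 - eps)
    return -(image_label * p_img.log() + (1 - image_label) * (1 - p_img).log()).mean()
```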
Improving Pairwise Ranking for Multi-label Image Classification
Title | Improving Pairwise Ranking for Multi-label Image Classification |
Authors | Yuncheng Li, Yale Song, Jiebo Luo |
Abstract | Learning to rank has recently emerged as an attractive technique to train deep convolutional neural networks for various computer vision tasks. Pairwise ranking, in particular, has been successful in multi-label image classification, achieving state-of-the-art results on various benchmarks. However, most existing approaches use the hinge loss to train their models, which is non-smooth and thus is difficult to optimize especially with deep networks. Furthermore, they employ simple heuristics, such as top-k or thresholding, to determine which labels to include in the output from a ranked list of labels, which limits their use in the real-world setting. In this work, we propose two techniques to improve pairwise ranking based multi-label image classification: (1) we propose a novel loss function for pairwise ranking, which is smooth everywhere and thus is easier to optimize; and (2) we incorporate a label decision module into the model, estimating the optimal confidence thresholds for each visual concept. We provide theoretical analyses of our loss function in the Bayes consistency and risk minimization framework, and show its benefit over existing pairwise ranking formulations. We demonstrate the effectiveness of our approach on three large-scale datasets, VOC2007, NUS-WIDE and MS-COCO, achieving the best reported results in the literature. |
Tasks | Image Classification, Learning-To-Rank |
Published | 2017-04-11 |
URL | http://arxiv.org/abs/1704.03135v3 |
http://arxiv.org/pdf/1704.03135v3.pdf | |
PWC | https://paperswithcode.com/paper/improving-pairwise-ranking-for-multi-label |
Repo | https://github.com/OFRIN/Tensorflow_Improving_Pairwise_Ranking_for_Multi-label_Image_Classification |
Framework | tf |
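The two contributions map naturally onto a loss and a small module: a smooth surrogate for the pairwise hinge and learned per-label decision thresholds. The softplus surrogate below is one smooth choice; the paper’s exact loss and label decision module may differ.

```python
import torch
import torch.nn.functional as F

def smooth_pairwise_rank_loss(scores, labels):
    """Encourage every positive label to outscore every negative label.
    softplus(neg - pos) is a smooth, everywhere-differentiable surrogate for
    the hinge max(0, 1 + neg - pos). scores, labels: (batch, num_labels), labels in {0, 1}."""
    pos = labels.bool()
    losses = []
    for s, p in zip(scores, pos):
        if p.any() and (~p).any():
            diff = s[~p].unsqueeze(0) - s[p].unsqueeze(1)   # (num_pos, num_neg)
            losses.append(F.softplus(diff).mean())
    return torch.stack(losses).mean()

class LabelDecision(torch.nn.Module):
    """Learned per-label confidence thresholds replacing top-k / global thresholding."""
    def __init__(self, num_labels):
        super().__init__()
        self.thresholds = torch.nn.Parameter(torch.zeros(num_labels))

    def forward(self, scores):
        return scores > self.thresholds   # boolean multi-label prediction
```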
Online Structure Learning for Sum-Product Networks with Gaussian Leaves
Title | Online Structure Learning for Sum-Product Networks with Gaussian Leaves |
Authors | Wilson Hsu, Agastya Kalra, Pascal Poupart |
Abstract | Sum-product networks have recently emerged as an attractive representation due to their dual view as a special type of deep neural network with clear semantics and a special type of probabilistic graphical model for which inference is always tractable. Those properties follow from some conditions (i.e., completeness and decomposability) that must be respected by the structure of the network. As a result, it is not easy to specify a valid sum-product network by hand and therefore structure learning techniques are typically used in practice. This paper describes the first online structure learning technique for continuous SPNs with Gaussian leaves. We also introduce an accompanying new parameter learning technique. |
Tasks | |
Published | 2017-01-19 |
URL | http://arxiv.org/abs/1701.05265v1 |
http://arxiv.org/pdf/1701.05265v1.pdf | |
PWC | https://paperswithcode.com/paper/online-structure-learning-for-sum-product |
Repo | https://github.com/whsu/spn |
Framework | none |
A Minimal Developmental Model Can Increase Evolvability in Soft Robots
Title | A Minimal Developmental Model Can Increase Evolvability in Soft Robots |
Authors | Sam Kriegman, Nick Cheney, Francesco Corucci, Josh C. Bongard |
Abstract | Different subsystems of organisms adapt over many time scales, such as rapid changes in the nervous system (learning), slower morphological and neurological change over the lifetime of the organism (postnatal development), and change over many generations (evolution). Much work has focused on instantiating learning or evolution in robots, but relatively little on development. Although many theories have been forwarded as to how development can aid evolution, it is difficult to isolate each such proposed mechanism. Thus, here we introduce a minimal yet embodied model of development: the body of the robot changes over its lifetime, yet growth is not influenced by the environment. We show that even this simple developmental model confers evolvability because it allows evolution to sweep over a larger range of body plans than an equivalent non-developmental system, and subsequent heterochronic mutations ‘lock in’ this body plan in more morphologically-static descendants. Future work will involve gradually complexifying the developmental model to determine when and how such added complexity increases evolvability. |
Tasks | |
Published | 2017-06-22 |
URL | http://arxiv.org/abs/1706.07296v1 |
http://arxiv.org/pdf/1706.07296v1.pdf | |
PWC | https://paperswithcode.com/paper/a-minimal-developmental-model-can-increase |
Repo | https://github.com/skriegman/gecco-2017 |
Framework | none |
An efficient clustering algorithm from the measure of local Gaussian distribution
Title | An efficient clustering algorithm from the measure of local Gaussian distribution |
Authors | Yuan-Yen Tai |
Abstract | In this paper, I will introduce a fast and novel clustering algorithm based on the local Gaussian distribution; it guarantees that cluster centroids are separated by at least a given parameter, $d_s$. The worst-case run-time complexity of this algorithm is approximately $O(T \times N \times \log(N))$, where $T$ is the number of iteration steps and $N$ is the number of features. |
Tasks | |
Published | 2017-09-13 |
URL | https://arxiv.org/abs/1709.08470v2 |
https://arxiv.org/pdf/1709.08470v2.pdf | |
PWC | https://paperswithcode.com/paper/an-efficient-clustering-algorithm-from-the |
Repo | https://github.com/Anrris/glassfire |
Framework | none |
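A toy illustration of the separation guarantee only (greedy leader-style centroid selection), not the paper’s algorithm: a point becomes a new centroid only if it lies at least $d_s$ from every centroid chosen so far.

```python
import numpy as np

def separated_centroids(X, d_s):
    """Greedily keep a point as a new centroid only if it lies at least d_s
    away from every centroid chosen so far (illustrates the constraint only)."""
    centroids = [X[0]]
    for x in X[1:]:
        if min(np.linalg.norm(x - c) for c in centroids) >= d_s:
            centroids.append(x)
    return np.array(centroids)

X = np.random.rand(1000, 2)
C = separated_centroids(X, d_s=0.3)
labels = np.argmin(np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2), axis=1)
```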
Non-linear motor control by local learning in spiking neural networks
Title | Non-linear motor control by local learning in spiking neural networks |
Authors | Aditya Gilra, Wulfram Gerstner |
Abstract | Learning weights in a spiking neural network with hidden neurons, using local, stable and online rules, to control non-linear body dynamics is an open problem. Here, we employ a supervised scheme, Feedback-based Online Local Learning Of Weights (FOLLOW), to train a network of heterogeneous spiking neurons with hidden layers, to control a two-link arm so as to reproduce a desired state trajectory. The network first learns an inverse model of the non-linear dynamics, i.e. from state trajectory as input to the network, it learns to infer the continuous-time command that produced the trajectory. Connection weights are adjusted via a local plasticity rule that involves pre-synaptic firing and post-synaptic feedback of the error in the inferred command. We choose a network architecture, termed differential feedforward, that gives the lowest test error from different feedforward and recurrent architectures. The learned inverse model is then used to generate a continuous-time motor command to control the arm, given a desired trajectory. |
Tasks | |
Published | 2017-12-29 |
URL | http://arxiv.org/abs/1712.10158v1 |
http://arxiv.org/pdf/1712.10158v1.pdf | |
PWC | https://paperswithcode.com/paper/non-linear-motor-control-by-local-learning-in |
Repo | https://github.com/adityagilra/FOLLOWControl |
Framework | none |
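The core of FOLLOW is a local update in which each weight moves in proportion to its presynaptic activity and the fed-back error in the inferred command. The sketch below is a rate-based caricature of that pre × error form; the actual scheme uses heterogeneous spiking neurons and filtered spike trains, and the network and arm dynamics are omitted here.

```python
import numpy as np

def follow_like_update(W, pre_activity, error_feedback, eta=1e-3):
    """Local rule: dW_ij is proportional to (presynaptic activity j) x (fed-back error i);
    no global gradients, each weight sees only its own pre- and post-synaptic signals."""
    return W + eta * np.outer(error_feedback, pre_activity)

rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((2, 100))           # readout: 2 command dimensions, 100 neurons
for t in range(5000):
    r = np.tanh(rng.standard_normal(100))           # stand-in for filtered firing rates
    u_target = np.array([np.sin(0.01 * t), np.cos(0.01 * t)])   # desired command
    error = u_target - W @ r                        # error in the inferred command, fed back
    W = follow_like_update(W, r, error)
```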
Domain Generalization by Marginal Transfer Learning
Title | Domain Generalization by Marginal Transfer Learning |
Authors | Gilles Blanchard, Aniket Anand Deshmukh, Urun Dogan, Gyemin Lee, Clayton Scott |
Abstract | Domain generalization is the problem of assigning class labels to an unlabeled test data set, given several labeled training data sets drawn from similar distributions. This problem arises in several applications where data distributions fluctuate because of biological, technical, or other sources of variation. We develop a distribution-free, kernel-based approach that predicts a classifier from the marginal distribution of features, by leveraging the trends present in related classification tasks. This approach involves identifying an appropriate reproducing kernel Hilbert space and optimizing a regularized empirical risk over the space. We present generalization error analysis, describe universal kernels, and establish universal consistency of the proposed methodology. Experimental results on synthetic data and three real data applications demonstrate the superiority of the method with respect to a pooling strategy. |
Tasks | Domain Generalization, Transfer Learning |
Published | 2017-11-21 |
URL | http://arxiv.org/abs/1711.07910v1 |
http://arxiv.org/pdf/1711.07910v1.pdf | |
PWC | https://paperswithcode.com/paper/domain-generalization-by-marginal-transfer |
Repo | https://github.com/aniketde/DomainGeneralizationMarginal |
Framework | none |
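In practice, predicting a classifier from the marginal distribution amounts to augmenting each sample with a summary of its domain’s feature distribution and training one kernel machine on the augmented representation. The sketch below uses a plain per-domain mean feature vector as a crude stand-in for the kernel mean embedding, so it only illustrates the idea, not the paper’s kernel construction.

```python
import numpy as np
from sklearn.svm import SVC

def augment_with_marginal(X, domain_ids):
    """Append each domain's mean feature vector to its samples, a crude
    stand-in for the kernel mean embedding of the marginal P(X)."""
    means = {d: X[domain_ids == d].mean(axis=0) for d in np.unique(domain_ids)}
    return np.hstack([X, np.stack([means[d] for d in domain_ids])])

# Train on several labelled source domains.
X_tr = np.random.randn(300, 5)
y_tr = np.random.randint(0, 2, 300)
dom_tr = np.repeat([0, 1, 2], 100)
clf = SVC(kernel="rbf").fit(augment_with_marginal(X_tr, dom_tr), y_tr)

# At test time, the unseen domain is summarised the same way from its unlabelled pool.
X_te = np.random.randn(50, 5)
X_te_aug = np.hstack([X_te, np.tile(X_te.mean(axis=0), (len(X_te), 1))])
pred = clf.predict(X_te_aug)
```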