May 7, 2019

2947 words 14 mins read

Paper Group AWR 84

Survey of resampling techniques for improving classification performance in unbalanced datasets. Exploiting 2D Floorplan for Building-scale Panorama RGBD Alignment. Learning to Communicate with Deep Multi-Agent Reinforcement Learning. Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders. Texture Networks: Feed-forward Synthes …

Survey of resampling techniques for improving classification performance in unbalanced datasets

Title Survey of resampling techniques for improving classification performance in unbalanced datasets
Authors Ajinkya More
Abstract A number of classification problems need to deal with data imbalance between classes. Often it is desired to have a high recall on the minority class while maintaining a high precision on the majority class. In this paper, we review a number of resampling techniques proposed in the literature to handle unbalanced datasets and study their effect on classification performance.
Tasks
Published 2016-08-22
URL http://arxiv.org/abs/1608.06048v1
PDF http://arxiv.org/pdf/1608.06048v1.pdf
PWC https://paperswithcode.com/paper/survey-of-resampling-techniques-for-improving
Repo https://github.com/wpriyadarshani/Resampling_Techniques
Framework none
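
Random oversampling is the simplest of the surveyed techniques. A minimal sketch, assuming a binary problem where the minority class is the smaller of the two (function and argument names are illustrative; libraries such as imbalanced-learn provide production implementations of this and the other surveyed methods):

```python
import numpy as np

def random_oversample(X, y, minority_label=1, seed=0):
    """Duplicate minority-class rows at random until the classes balance.
    Assumes the minority class really is the smaller one."""
    rng = np.random.default_rng(seed)
    minority = np.where(y == minority_label)[0]
    majority = np.where(y != minority_label)[0]
    extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
    idx = np.concatenate([majority, minority, extra])
    rng.shuffle(idx)
    return X[idx], y[idx]
```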

Exploiting 2D Floorplan for Building-scale Panorama RGBD Alignment

Title Exploiting 2D Floorplan for Building-scale Panorama RGBD Alignment
Authors Erik Wijmans, Yasutaka Furukawa
Abstract This paper presents a novel algorithm that utilizes a 2D floorplan to align panorama RGBD scans. While effective panorama RGBD alignment techniques exist, such systems require extremely dense RGBD image sampling. Our approach can significantly reduce the number of necessary scans with the aid of a floorplan image. We formulate a novel Markov Random Field inference problem as a scan placement over the floorplan, as opposed to the conventional scan-to-scan alignment. The technical contributions lie in multi-modal image correspondence cues (between scans and schematic floorplan) as well as a novel coverage potential avoiding an inherent stacking bias. The proposed approach has been evaluated on five challenging large indoor spaces. To the best of our knowledge, we present the first effective system that utilizes a 2D floorplan image for building-scale 3D pointcloud alignment. The source code and the data will be shared with the community to further enhance indoor mapping research.
Tasks
Published 2016-12-08
URL http://arxiv.org/abs/1612.02859v1
PDF http://arxiv.org/pdf/1612.02859v1.pdf
PWC https://paperswithcode.com/paper/exploiting-2d-floorplan-for-building-scale
Repo https://github.com/erikwijmans/WashU-Research
Framework none
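
The core of the method is scoring candidate scan placements against the floorplan rather than aligning scans to each other. A toy sketch of the unary placement term only, assuming binary free-space grids (the paper's full MRF adds multi-modal correspondence cues, pairwise consistency, and the coverage potential):

```python
import numpy as np

def placement_score(floorplan, scan_mask, x, y):
    """Unary potential: agreement between a scan's free-space mask and the
    floorplan crop at translation (x, y). Higher is better."""
    h, w = scan_mask.shape
    crop = floorplan[y:y + h, x:x + w]
    return float((crop == scan_mask).mean())

def best_placement(floorplan, scan_mask, stride=4):
    """Exhaustive search over translations (rotations omitted for brevity)."""
    H, W = floorplan.shape
    h, w = scan_mask.shape
    best_pos, best_val = None, -1.0
    for y in range(0, H - h + 1, stride):
        for x in range(0, W - w + 1, stride):
            s = placement_score(floorplan, scan_mask, x, y)
            if s > best_val:
                best_pos, best_val = (x, y), s
    return best_pos, best_val
```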

Learning to Communicate with Deep Multi-Agent Reinforcement Learning

Title Learning to Communicate with Deep Multi-Agent Reinforcement Learning
Authors Jakob N. Foerster, Yannis M. Assael, Nando de Freitas, Shimon Whiteson
Abstract We consider the problem of multiple agents sensing and acting in environments with the goal of maximising their shared utility. In these environments, agents must learn communication protocols in order to share information that is needed to solve the tasks. By embracing deep neural networks, we are able to demonstrate end-to-end learning of protocols in complex environments inspired by communication riddles and multi-agent computer vision problems with partial observability. We propose two approaches for learning in these domains: Reinforced Inter-Agent Learning (RIAL) and Differentiable Inter-Agent Learning (DIAL). The former uses deep Q-learning, while the latter exploits the fact that, during learning, agents can backpropagate error derivatives through (noisy) communication channels. Hence, this approach uses centralised learning but decentralised execution. Our experiments introduce new environments for studying the learning of communication protocols and present a set of engineering innovations that are essential for success in these domains.
Tasks Multi-agent Reinforcement Learning, Q-Learning
Published 2016-05-21
URL http://arxiv.org/abs/1605.06676v2
PDF http://arxiv.org/pdf/1605.06676v2.pdf
PWC https://paperswithcode.com/paper/learning-to-communicate-with-deep-multi-agent
Repo https://github.com/iassael/learning-to-communicate
Framework pytorch
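
The DIAL idea can be sketched in a few lines: because the communication channel is differentiable at training time, the receiving agent's loss backpropagates into the sending agent's message encoder. A minimal PyTorch sketch with toy shapes and a toy loss (the actual method uses deep Q-learning and a discretise/regularise unit for decentralised execution):

```python
import torch
import torch.nn as nn

# Agent 1 encodes its observation into a real-valued message; agent 2
# receives it through a noisy channel and acts on it. The channel is
# differentiable, so agent 2's loss trains agent 1's encoder end-to-end.
encoder = nn.Linear(4, 1)   # agent 1: observation -> message
policy = nn.Linear(5, 2)    # agent 2: [own obs, message] -> action scores
opt = torch.optim.Adam(list(encoder.parameters()) + list(policy.parameters()), lr=1e-3)

obs1, obs2 = torch.randn(32, 4), torch.randn(32, 4)
target = torch.randint(0, 2, (32,))              # toy supervision

msg = encoder(obs1)
msg_noisy = msg + 0.1 * torch.randn_like(msg)    # noisy channel (training time)
scores = policy(torch.cat([obs2, msg_noisy], dim=1))
loss = nn.functional.cross_entropy(scores, target)
loss.backward()                                  # gradients flow into the encoder
opt.step()
```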

Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders

Title Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders
Authors Nat Dilokthanakul, Pedro A. M. Mediano, Marta Garnelo, Matthew C. H. Lee, Hugh Salimbeni, Kai Arulkumaran, Murray Shanahan
Abstract We study a variant of the variational autoencoder model (VAE) with a Gaussian mixture as a prior distribution, with the goal of performing unsupervised clustering through deep generative models. We observe that the known problem of over-regularisation that has been shown to arise in regular VAEs also manifests itself in our model and leads to cluster degeneracy. We show that a heuristic called the minimum information constraint, which has been shown to mitigate this effect in VAEs, can also be applied to improve unsupervised clustering performance with our model. Furthermore, we analyse the effect of this heuristic and provide an intuition of the various processes with the help of visualizations. Finally, we demonstrate the performance of our model on synthetic data, MNIST and SVHN, showing that the obtained clusters are distinct and interpretable, and that the model achieves unsupervised clustering performance competitive with the state of the art.
Tasks
Published 2016-11-08
URL http://arxiv.org/abs/1611.02648v2
PDF http://arxiv.org/pdf/1611.02648v2.pdf
PWC https://paperswithcode.com/paper/deep-unsupervised-clustering-with-gaussian
Repo https://github.com/lemon1215/VAE-GMVAE
Framework tf
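
A minimal sketch of the central modelling change, a Gaussian-mixture prior over the latent code, using torch.distributions (the mixture parameters here are fixed stand-ins; the paper learns them and additionally applies the minimum information constraint):

```python
import torch
from torch.distributions import Categorical, Independent, MixtureSameFamily, Normal

# A Gaussian-mixture prior over the VAE latent z: the regularisation term
# of the ELBO is taken against this mixture instead of N(0, I).
K, D = 10, 2                                  # clusters, latent dims
mix = Categorical(logits=torch.zeros(K))      # uniform cluster weights
comp = Independent(Normal(torch.randn(K, D), torch.ones(K, D)), 1)
prior = MixtureSameFamily(mix, comp)

z = torch.randn(32, D)                        # stand-in for encoder samples
log_pz = prior.log_prob(z)                    # replaces log N(z; 0, I) in the ELBO
```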

Texture Networks: Feed-forward Synthesis of Textures and Stylized Images

Title Texture Networks: Feed-forward Synthesis of Textures and Stylized Images
Authors Dmitry Ulyanov, Vadim Lebedev, Andrea Vedaldi, Victor Lempitsky
Abstract Gatys et al. recently demonstrated that deep networks can generate beautiful textures and stylized images from a single texture example. However, their method requires a slow and memory-consuming optimization process. We propose here an alternative approach that moves the computational burden to a learning stage. Given a single example of a texture, our approach trains compact feed-forward convolutional networks to generate multiple samples of the same texture of arbitrary size and to transfer artistic style from a given image to any other image. The resulting networks are remarkably light-weight and can generate textures of quality comparable to Gatys et al., but hundreds of times faster. More generally, our approach highlights the power and flexibility of generative feed-forward models trained with complex and expressive loss functions.
Tasks Style Transfer
Published 2016-03-10
URL http://arxiv.org/abs/1603.03417v1
PDF http://arxiv.org/pdf/1603.03417v1.pdf
PWC https://paperswithcode.com/paper/texture-networks-feed-forward-synthesis-of
Repo https://github.com/zhanghang1989/PyTorch-Multi-Style-Transfer
Framework pytorch
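
The texture statistic being matched is the Gram matrix of convolutional features. A short sketch of the texture loss (the features would come from a fixed pretrained descriptor network such as VGG; the feed-forward generator itself is omitted):

```python
import torch

def gram_matrix(features):
    """Gram matrix of conv features (B, C, H, W): the channel co-occurrence
    statistic that characterises a texture."""
    b, c, h, w = features.shape
    f = features.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def texture_loss(gen_feats, tex_feats):
    """MSE between Gram matrices of generated and example-texture features;
    the generator is trained to minimise this across descriptor layers."""
    return torch.nn.functional.mse_loss(gram_matrix(gen_feats), gram_matrix(tex_feats))
```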

Interpretable Recurrent Neural Networks Using Sequential Sparse Recovery

Title Interpretable Recurrent Neural Networks Using Sequential Sparse Recovery
Authors Scott Wisdom, Thomas Powers, James Pitton, Les Atlas
Abstract Recurrent neural networks (RNNs) are powerful and effective for processing sequential data. However, RNNs are usually considered “black box” models whose internal structure and learned parameters are not interpretable. In this paper, we propose an interpretable RNN based on the sequential iterative soft-thresholding algorithm (SISTA) for solving the sequential sparse recovery problem, which models a sequence of correlated observations with a sequence of sparse latent vectors. The architecture of the resulting SISTA-RNN is implicitly defined by the computational structure of SISTA, which results in a novel stacked RNN architecture. Furthermore, the weights of the SISTA-RNN are perfectly interpretable as the parameters of a principled statistical model, which in this case include a sparsifying dictionary, iterative step size, and regularization parameters. In addition, on a particular sequential compressive sensing task, the SISTA-RNN trains faster and achieves better performance than conventional state-of-the-art black box RNNs, including long short-term memory (LSTM) RNNs.
Tasks Compressive Sensing
Published 2016-11-22
URL http://arxiv.org/abs/1611.07252v1
PDF http://arxiv.org/pdf/1611.07252v1.pdf
PWC https://paperswithcode.com/paper/interpretable-recurrent-neural-networks-using
Repo https://github.com/stwisdom/sista-rnn
Framework none
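
A minimal numpy sketch of sequential ISTA, whose unrolled iterations define the SISTA-RNN layers (the full algorithm adds a term coupling consecutive sparse codes; here each time step is simply warm-started from the previous code):

```python
import numpy as np

def soft(x, t):
    """Soft-thresholding, the nonlinearity of (S)ISTA."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sista(X, A, lam=0.1, step=None, n_iter=10):
    """Sparse codes for a sequence of observations X (T, n) under
    dictionary A. Unrolling the inner iterations as layers, with A,
    step, and lam as learnable weights, yields the SISTA-RNN."""
    if step is None:
        step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant
    z = np.zeros(A.shape[1])
    codes = []
    for x in X:                                  # sweep the sequence
        for _ in range(n_iter):                  # the unrolled "layers"
            z = soft(z - step * A.T @ (A @ z - x), lam * step)
        codes.append(z.copy())                   # warm-start next step
    return np.array(codes)
```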

Ambient Sound Provides Supervision for Visual Learning

Title Ambient Sound Provides Supervision for Visual Learning
Authors Andrew Owens, Jiajun Wu, Josh H. McDermott, William T. Freeman, Antonio Torralba
Abstract The sound of crashing waves, the roar of fast-moving cars – sound conveys important information about the objects in our surroundings. In this work, we show that ambient sounds can be used as a supervisory signal for learning visual models. To demonstrate this, we train a convolutional neural network to predict a statistical summary of the sound associated with a video frame. We show that, through this process, the network learns a representation that conveys information about objects and scenes. We evaluate this representation on several recognition tasks, finding that its performance is comparable to that of other state-of-the-art unsupervised learning methods. Finally, we show through visualizations that the network learns units that are selective to objects that are often associated with characteristic sounds.
Tasks Object Recognition
Published 2016-08-25
URL http://arxiv.org/abs/1608.07017v2
PDF http://arxiv.org/pdf/1608.07017v2.pdf
PWC https://paperswithcode.com/paper/ambient-sound-provides-supervision-for-visual
Repo https://github.com/rowhanm/ambient-sound-self-supervision
Framework pytorch
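
The pretext task is easy to sketch: a CNN maps a single frame to a summary of the accompanying audio. The toy below regresses the statistics directly with MSE for brevity, whereas the paper discretises them into clusters and trains a classifier; all shapes, including the 30-dimensional target, are illustrative:

```python
import torch
import torch.nn as nn

# Self-supervised pretext task: predict a precomputed audio summary from a
# video frame. The conv features learned this way are the representation
# later evaluated on recognition tasks.
cnn = nn.Sequential(
    nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
    nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 30),
)
frames = torch.randn(8, 3, 64, 64)      # batch of video frames
sound_stats = torch.randn(8, 30)        # precomputed audio summaries
loss = nn.functional.mse_loss(cnn(frames), sound_stats)
loss.backward()                         # one training step of the pretext task
```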

F-measure Maximization in Multi-Label Classification with Conditionally Independent Label Subsets

Title F-measure Maximization in Multi-Label Classification with Conditionally Independent Label Subsets
Authors Maxime Gasse, Alex Aussem
Abstract We discuss a method to improve the exact F-measure maximization algorithm called GFM, proposed in (Dembczynski et al. 2011) for multi-label classification, assuming the label set can be partitioned into conditionally independent subsets given the input features. If the labels were all independent, the estimation of only $m$ parameters ($m$ denoting the number of labels) would suffice to derive Bayes-optimal predictions in $O(m^2)$ operations. In the general case, $m^2+1$ parameters are required by GFM, to solve the problem in $O(m^3)$ operations. In this work, we show that the number of parameters can be reduced further to $m^2/n$, in the best case, assuming the label set can be partitioned into $n$ conditionally independent subsets. As this label partition needs to be estimated from the data beforehand, we first use the procedure proposed in (Gasse et al. 2015) that finds such a partition and then infer the required parameters locally in each label subset. The latter are aggregated and serve as input to GFM to form the Bayes-optimal prediction. We show on a synthetic experiment that the reduction in the number of parameters brings about significant benefits in terms of performance.
Tasks Multi-Label Classification
Published 2016-04-26
URL http://arxiv.org/abs/1604.07759v3
PDF http://arxiv.org/pdf/1604.07759v3.pdf
PWC https://paperswithcode.com/paper/f-measure-maximization-in-multi-label
Repo https://github.com/gasse/fgfm-toy
Framework none
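
In the fully independent special case mentioned in the abstract, the expected-F1 maximiser is a top-$k$ set of the labels with the largest marginals, so only the $m$ marginals are needed. A hedged Monte-Carlo sketch of that special case only (GFM itself handles the dependent case exactly via the $m^2+1$ parameters):

```python
import numpy as np

def best_f1_prediction(p, n_samples=20000, seed=0):
    """Given independent label marginals p, estimate E[F1] for each top-k
    prediction by sampling label vectors, and return the best prediction."""
    rng = np.random.default_rng(seed)
    order = np.argsort(-p)                        # labels by decreasing marginal
    Y = rng.random((n_samples, len(p))) < p       # sampled label vectors
    sizes = Y.sum(1)
    best_k, best_val = 0, (sizes == 0).mean()     # empty prediction: F1 = 1 iff y empty
    for k in range(1, len(p) + 1):
        tp = Y[:, order[:k]].sum(1)
        val = (2 * tp / (k + sizes)).mean()       # F1 = 2 TP / (|pred| + |y|)
        if val > best_val:
            best_k, best_val = k, val
    pred = np.zeros(len(p), bool)
    pred[order[:best_k]] = True
    return pred, best_val
```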

Deep Learning over Multi-field Categorical Data: A Case Study on User Response Prediction

Title Deep Learning over Multi-field Categorical Data: A Case Study on User Response Prediction
Authors Weinan Zhang, Tianming Du, Jun Wang
Abstract Predicting user responses, such as click-through rate and conversion rate, is critical in many web applications including web search, personalised recommendation, and online advertising. Unlike the continuous raw features usually found in the image and audio domains, the input features in web applications are multi-field, mostly discrete and categorical, and their dependencies are largely unknown. Major user response prediction models either limit themselves to linear models or require manually building high-order combination features. The former loses the ability to explore feature interactions, while the latter results in heavy computation in the large feature space. To tackle the issue, we propose two novel models using deep neural networks (DNNs) to automatically learn effective patterns from categorical feature interactions and predict users’ ad clicks. To make our DNNs work efficiently, we leverage three feature transformation methods, i.e., factorisation machines (FMs), restricted Boltzmann machines (RBMs) and denoising auto-encoders (DAEs). This paper presents the structure of our models and their efficient training algorithms. Large-scale experiments with real-world data demonstrate that our methods work better than major state-of-the-art models.
Tasks Click-Through Rate Prediction
Published 2016-01-11
URL http://arxiv.org/abs/1601.02376v1
PDF http://arxiv.org/pdf/1601.02376v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-over-multi-field-categorical
Repo https://github.com/ddatta-DAC/Learning
Framework tf
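
The basic shape shared by the proposed models is one embedding per categorical field feeding an MLP. A minimal PyTorch sketch (the paper initialises the embeddings from a pretrained factorisation machine, or uses RBM/DAE pretraining; sizes here are illustrative):

```python
import torch
import torch.nn as nn

class FieldDNN(nn.Module):
    """One embedding per categorical field, concatenated into an MLP."""
    def __init__(self, field_sizes, dim=8):
        super().__init__()
        self.embs = nn.ModuleList(nn.Embedding(n, dim) for n in field_sizes)
        self.mlp = nn.Sequential(
            nn.Linear(dim * len(field_sizes), 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, x):  # x: (batch, n_fields) integer category ids
        e = torch.cat([emb(x[:, i]) for i, emb in enumerate(self.embs)], dim=1)
        return torch.sigmoid(self.mlp(e)).squeeze(1)   # predicted click probability

model = FieldDNN([1000, 500, 20])                      # three fields, toy vocabularies
ctr = model(torch.randint(0, 20, (4, 3)))              # toy batch of 4 impressions
```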

Exploiting inter-image similarity and ensemble of extreme learners for fixation prediction using deep features

Title Exploiting inter-image similarity and ensemble of extreme learners for fixation prediction using deep features
Authors Hamed R. -Tavakoli, Ali Borji, Jorma Laaksonen, Esa Rahtu
Abstract This paper presents a novel fixation prediction and saliency modeling framework based on inter-image similarities and an ensemble of Extreme Learning Machines (ELM). The proposed framework is inspired by two observations: 1) the contextual information of a scene, along with low-level visual cues, modulates attention, and 2) the resemblance of a scene to a former visual experience influences eye movement patterns through scene memorability. Motivated by these observations, we develop a framework that estimates the saliency of a given image using an ensemble of extreme learners, each trained on an image similar to the input image. That is, after retrieving a set of similar images for a given image, a saliency predictor is learnt from each image in the retrieved set using an ELM, resulting in an ensemble. The saliency of the given image is then measured as the mean of the saliency values predicted by the ensemble’s members.
Tasks
Published 2016-10-20
URL http://arxiv.org/abs/1610.06449v1
PDF http://arxiv.org/pdf/1610.06449v1.pdf
PWC https://paperswithcode.com/paper/exploiting-inter-image-similarity-and
Repo https://github.com/hrtavakoli/iseel
Framework none
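
An extreme learning machine is small enough to sketch completely: a fixed random hidden layer with a closed-form least-squares readout. The ensemble step below assumes (feature, fixation-density) training pairs have already been extracted from each retrieved similar image:

```python
import numpy as np

class ELM:
    """Extreme Learning Machine: random hidden layer, least-squares readout."""
    def __init__(self, n_in, n_hidden=100, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((n_in, n_hidden))
        self.b = rng.standard_normal(n_hidden)

    def _h(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        self.beta = np.linalg.pinv(self._h(X)) @ y    # closed-form readout
        return self

    def predict(self, X):
        return self._h(X) @ self.beta

def ensemble_saliency(query_feats, similar_images):
    """Mean prediction of one ELM per retrieved similar image; each element
    of similar_images is an (X, y) pair of features and fixation targets."""
    preds = [ELM(query_feats.shape[1], seed=i).fit(X, y).predict(query_feats)
             for i, (X, y) in enumerate(similar_images)]
    return np.mean(preds, axis=0)
```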

Neural Autoregressive Distribution Estimation

Title Neural Autoregressive Distribution Estimation
Authors Benigno Uria, Marc-Alexandre Côté, Karol Gregor, Iain Murray, Hugo Larochelle
Abstract We present Neural Autoregressive Distribution Estimation (NADE) models, which are neural network architectures applied to the problem of unsupervised distribution and density estimation. They leverage the probability product rule and a weight sharing scheme inspired by restricted Boltzmann machines to yield an estimator that is tractable and generalizes well. We discuss how they achieve competitive performance in modeling both binary and real-valued observations. We also present how deep NADE models can be trained to be agnostic to the ordering of input dimensions used by the autoregressive product rule decomposition. Finally, we show how to exploit the topological structure of pixels in images using a deep convolutional architecture for NADE.
Tasks Density Estimation
Published 2016-05-07
URL http://arxiv.org/abs/1605.02226v3
PDF http://arxiv.org/pdf/1605.02226v3.pdf
PWC https://paperswithcode.com/paper/neural-autoregressive-distribution-estimation
Repo https://github.com/MarcCote/NADE
Framework none
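
The trick that makes NADE tractable is maintaining a running hidden pre-activation while sweeping the input dimensions, so all $D$ conditionals cost $O(DH)$ in total. A numpy sketch for binary observations, with shapes as commented:

```python
import numpy as np

def nade_log_prob(x, W, V, b, c):
    """log p(x) for binary NADE: p(x) = prod_d p(x_d | x_<d).
    Shapes: x (D,), W (H, D), V (D, H), b (D,), c (H,)."""
    sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
    a = c.copy()                 # running hidden pre-activation for x_<d
    logp = 0.0
    for d in range(len(x)):
        h = sigmoid(a)
        p_d = sigmoid(b[d] + V[d] @ h)            # p(x_d = 1 | x_<d)
        logp += np.log(p_d if x[d] else 1.0 - p_d)
        a += W[:, d] * x[d]      # fold x_d in for the next conditional
    return logp

H, D = 16, 8
rng = np.random.default_rng(0)
W, V = 0.1 * rng.standard_normal((H, D)), 0.1 * rng.standard_normal((D, H))
b, c = np.zeros(D), np.zeros(H)
print(nade_log_prob(rng.integers(0, 2, D), W, V, b, c))
```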

A Systematic Evaluation and Benchmark for Person Re-Identification: Features, Metrics, and Datasets

Title A Systematic Evaluation and Benchmark for Person Re-Identification: Features, Metrics, and Datasets
Authors Srikrishna Karanam, Mengran Gou, Ziyan Wu, Angels Rates-Borras, Octavia Camps, Richard J. Radke
Abstract Person re-identification (re-id) is a critical problem in video analytics applications such as security and surveillance. The public release of several datasets and code for vision algorithms has facilitated rapid progress in this area over the last few years. However, directly comparing re-id algorithms reported in the literature has become difficult since a wide variety of features, experimental protocols, and evaluation metrics are employed. In order to address this need, we present an extensive review and performance evaluation of single- and multi-shot re-id algorithms. The experimental protocol incorporates the most recent advances in both feature extraction and metric learning. To ensure a fair comparison, all of the approaches were implemented using a unified code library that includes 11 feature extraction algorithms and 22 metric learning and ranking techniques. All approaches were evaluated using a new large-scale dataset that closely mimics a real-world problem setting, in addition to 16 other publicly available datasets: VIPeR, GRID, CAVIAR, DukeMTMC4ReID, 3DPeS, PRID, V47, WARD, SAIVT-SoftBio, CUHK01, CUHK02, CUHK03, RAiD, iLIDS-VID, HDA+ and Market1501. The evaluation codebase and results will be made publicly available for community use.
Tasks Metric Learning, Person Re-Identification
Published 2016-05-31
URL http://arxiv.org/abs/1605.09653v5
PDF http://arxiv.org/pdf/1605.09653v5.pdf
PWC https://paperswithcode.com/paper/a-systematic-evaluation-and-benchmark-for
Repo https://github.com/NEU-Gou/awesome-reid-dataset
Framework none
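
The headline metric in such evaluations is the Cumulative Match Characteristic curve. A minimal sketch, assuming numpy id arrays and a query-by-gallery distance matrix (real single- and multi-shot protocols additionally exclude same-camera gallery entries):

```python
import numpy as np

def cmc(dist, query_ids, gallery_ids, max_rank=20):
    """CMC curve: fraction of queries whose correct identity appears within
    the top-r ranked gallery entries, for r = 1..max_rank. Assumes every
    query identity occurs in the gallery."""
    ranks = np.zeros(max_rank)
    for i, qid in enumerate(query_ids):
        order = np.argsort(dist[i])                   # nearest gallery first
        hit = np.where(gallery_ids[order] == qid)[0][0]
        if hit < max_rank:
            ranks[hit:] += 1                          # a hit at rank r counts for all r' >= r
    return ranks / len(query_ids)
```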

Pruning Filters for Efficient ConvNets

Title Pruning Filters for Efficient ConvNets
Authors Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet, Hans Peter Graf
Abstract The success of CNNs in various applications is accompanied by a significant increase in the computation and parameter storage costs. Recent efforts toward reducing these overheads involve pruning and compressing the weights of various layers without hurting original accuracy. However, magnitude-based pruning of weights reduces a significant number of parameters from the fully connected layers and may not adequately reduce the computation costs in the convolutional layers due to irregular sparsity in the pruned networks. We present an acceleration method for CNNs, where we prune filters from CNNs that are identified as having a small effect on the output accuracy. By removing whole filters in the network together with their connecting feature maps, the computation costs are reduced significantly. In contrast to pruning weights, this approach does not result in sparse connectivity patterns. Hence, it does not need the support of sparse convolution libraries and can work with existing efficient BLAS libraries for dense matrix multiplications. We show that even simple filter pruning techniques can reduce inference costs for VGG-16 by up to 34% and ResNet-110 by up to 38% on CIFAR10 while regaining close to the original accuracy by retraining the networks.
Tasks Image Classification
Published 2016-08-31
URL http://arxiv.org/abs/1608.08710v3
PDF http://arxiv.org/pdf/1608.08710v3.pdf
PWC https://paperswithcode.com/paper/pruning-filters-for-efficient-convnets
Repo https://github.com/marcoancona/TorchPruner
Framework pytorch
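
The selection criterion is simple enough to sketch directly: rank each layer's filters by the L1 norm of their weights and drop the smallest. The sketch below only identifies the candidates; actually removing them also requires rebuilding the layer and trimming the next layer's input channels:

```python
import torch
import torch.nn as nn

def smallest_filters(conv, n_prune):
    """Indices of a conv layer's n_prune weakest filters by L1 weight norm,
    the pruning candidates under the magnitude criterion."""
    norms = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # one norm per output filter
    return torch.argsort(norms)[:n_prune]

conv = nn.Conv2d(3, 64, 3)
print(smallest_filters(conv, 8))   # indices of the 8 weakest filters
```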

The Predictron: End-To-End Learning and Planning

Title The Predictron: End-To-End Learning and Planning
Authors David Silver, Hado van Hasselt, Matteo Hessel, Tom Schaul, Arthur Guez, Tim Harley, Gabriel Dulac-Arnold, David Reichert, Neil Rabinowitz, Andre Barreto, Thomas Degris
Abstract One of the key challenges of artificial intelligence is to learn models that are effective in the context of planning. In this document we introduce the predictron architecture. The predictron consists of a fully abstract model, represented by a Markov reward process, that can be rolled forward multiple “imagined” planning steps. Each forward pass of the predictron accumulates internal rewards and values over multiple planning depths. The predictron is trained end-to-end so as to make these accumulated values accurately approximate the true value function. We applied the predictron to procedurally generated random mazes and a simulator for the game of pool. The predictron yielded significantly more accurate predictions than conventional deep neural network architectures.
Tasks
Published 2016-12-28
URL http://arxiv.org/abs/1612.08810v3
PDF http://arxiv.org/pdf/1612.08810v3.pdf
PWC https://paperswithcode.com/paper/the-predictron-end-to-end-learning-and
Repo https://github.com/zhongwen/predictron
Framework tf
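
A toy sketch of the predictron core: an abstract state rolled forward k steps while accumulating internal rewards, per-step discounts, and a bootstrap value (the λ-weighting across planning depths and the consistency losses of the paper are omitted):

```python
import torch
import torch.nn as nn

class Predictron(nn.Module):
    """Abstract Markov reward process rolled forward k 'imagined' steps."""
    def __init__(self, dim=16, k=4):
        super().__init__()
        self.k = k
        self.step = nn.Linear(dim, dim)       # abstract state transition
        self.reward = nn.Linear(dim, 1)       # internal reward per step
        self.gamma = nn.Linear(dim, 1)        # per-step discount in (0, 1)
        self.value = nn.Linear(dim, 1)        # bootstrap value

    def forward(self, s):                     # s: (batch, dim) abstract state
        g_acc = s.new_zeros(s.size(0), 1)     # accumulated internal return
        disc = s.new_ones(s.size(0), 1)       # accumulated discount
        for _ in range(self.k):
            s = torch.tanh(self.step(s))
            g_acc = g_acc + disc * self.reward(s)
            disc = disc * torch.sigmoid(self.gamma(s))
        return g_acc + disc * self.value(s)   # k-step return estimate
```

Training regresses this output toward observed returns, so the rollout machinery itself is shaped end-to-end to make the accumulated values accurate.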

A Point Set Generation Network for 3D Object Reconstruction from a Single Image

Title A Point Set Generation Network for 3D Object Reconstruction from a Single Image
Authors Haoqiang Fan, Hao Su, Leonidas Guibas
Abstract Generation of 3D data by deep neural networks has been attracting increasing attention in the research community. The majority of extant works resort to regular representations such as volumetric grids or collections of images; however, these representations obscure the natural invariance of 3D shapes under geometric transformations and also suffer from a number of other issues. In this paper we address the problem of 3D reconstruction from a single image, generating a straightforward form of output: point cloud coordinates. Along with this problem arises a unique and interesting issue: the ground-truth shape for an input image may be ambiguous. Driven by this unorthodox output form and the inherent ambiguity in the ground truth, we design a novel and effective architecture, loss function, and learning paradigm. Our final solution is a conditional shape sampler, capable of predicting multiple plausible 3D point clouds from an input image. In experiments, our system not only outperforms state-of-the-art methods on single-image 3D reconstruction benchmarks, but also shows strong performance on 3D shape completion and a promising ability to make multiple plausible predictions.
Tasks 3D Object Reconstruction, 3D Object Reconstruction From A Single Image, 3D Reconstruction, Object Reconstruction
Published 2016-12-02
URL http://arxiv.org/abs/1612.00603v2
PDF http://arxiv.org/pdf/1612.00603v2.pdf
PWC https://paperswithcode.com/paper/a-point-set-generation-network-for-3d-object
Repo https://github.com/pointcloudlearning/3D-Deep-Learning-Paper-List
Framework none
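
A set-valued output needs a permutation-invariant loss; the Chamfer distance is one of the two used in the paper (alongside the Earth Mover's Distance). A minimal PyTorch sketch:

```python
import torch

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point sets p (N, 3) and q (M, 3):
    each point is matched to its nearest neighbour in the other set."""
    d = torch.cdist(p, q)                     # (N, M) pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()
```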