Paper Group AWR 84
Survey of resampling techniques for improving classification performance in unbalanced datasets. Exploiting 2D Floorplan for Building-scale Panorama RGBD Alignment. Learning to Communicate with Deep Multi-Agent Reinforcement Learning. Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders. Texture Networks: Feed-forward Synthes …
Survey of resampling techniques for improving classification performance in unbalanced datasets
Title | Survey of resampling techniques for improving classification performance in unbalanced datasets |
Authors | Ajinkya More |
Abstract | A number of classification problems need to deal with data imbalance between classes. Often it is desired to have a high recall on the minority class while maintaining a high precision on the majority class. In this paper, we review a number of resampling techniques proposed in the literature to handle unbalanced datasets and study their effect on classification performance. |
Tasks | |
Published | 2016-08-22 |
URL | http://arxiv.org/abs/1608.06048v1 |
PDF | http://arxiv.org/pdf/1608.06048v1.pdf |
PWC | https://paperswithcode.com/paper/survey-of-resampling-techniques-for-improving |
Repo | https://github.com/wpriyadarshani/Resampling_Techniques |
Framework | none |
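As a rough illustration of the simplest techniques the survey covers, here is a minimal numpy sketch of random over- and under-sampling for a binary problem (function names are ours; it assumes the minority class is the smaller one):

```python
import numpy as np

def random_oversample(X, y, minority_label=1, seed=0):
    """Duplicate random minority rows until both classes are the same size."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == minority_label)
    majority = np.flatnonzero(y != minority_label)
    extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
    idx = rng.permutation(np.concatenate([majority, minority, extra]))
    return X[idx], y[idx]

def random_undersample(X, y, minority_label=1, seed=0):
    """Drop random majority rows until both classes are the same size."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == minority_label)
    majority = np.flatnonzero(y != minority_label)
    keep = rng.choice(majority, size=len(minority), replace=False)
    idx = rng.permutation(np.concatenate([keep, minority]))
    return X[idx], y[idx]
```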
Exploiting 2D Floorplan for Building-scale Panorama RGBD Alignment
Title | Exploiting 2D Floorplan for Building-scale Panorama RGBD Alignment |
Authors | Erik Wijmans, Yasutaka Furukawa |
Abstract | This paper presents a novel algorithm that utilizes a 2D floorplan to align panorama RGBD scans. While effective panorama RGBD alignment techniques exist, such a system requires extremely dense RGBD image sampling. Our approach can significantly reduce the number of necessary scans with the aid of a floorplan image. We formulate a novel Markov Random Field inference problem as a scan placement over the floorplan, as opposed to the conventional scan-to-scan alignment. The technical contributions lie in multi-modal image correspondence cues (between scans and schematic floorplan) as well as a novel coverage potential avoiding an inherent stacking bias. The proposed approach has been evaluated on five challenging large indoor spaces. To the best of our knowledge, we present the first effective system that utilizes a 2D floorplan image for building-scale 3D pointcloud alignment. The source code and the data will be shared with the community to further enhance indoor mapping research. |
Tasks | |
Published | 2016-12-08 |
URL | http://arxiv.org/abs/1612.02859v1 |
PDF | http://arxiv.org/pdf/1612.02859v1.pdf |
PWC | https://paperswithcode.com/paper/exploiting-2d-floorplan-for-building-scale |
Repo | https://github.com/erikwijmans/WashU-Research |
Framework | none |
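The paper's MRF inference is considerably more involved, but a toy greedy stand-in conveys the two ingredients the abstract names: a scan-to-floorplan correspondence cue and a coverage potential that penalises stacking. All names, and the wrap-around shifting, are illustrative only:

```python
import numpy as np

def place_scans(scan_masks, floorplan, candidates, coverage_weight=0.5):
    """Toy greedy placement: for each scan's top-down occupancy mask, pick the
    candidate (dy, dx) shift that best matches the floorplan (correspondence
    cue) while overlapping little with already-placed scans (coverage term)."""
    covered = np.zeros_like(floorplan, dtype=float)
    chosen = []
    for mask in scan_masks:
        best, best_score = None, -np.inf
        for dy, dx in candidates:
            shifted = np.roll(mask, (dy, dx), axis=(0, 1))
            score = (shifted * floorplan).sum() \
                    - coverage_weight * (shifted * covered).sum()
            if score > best_score:
                best, best_score = (dy, dx), score
        chosen.append(best)
        covered += np.roll(mask, best, axis=(0, 1))
    return chosen
```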
Learning to Communicate with Deep Multi-Agent Reinforcement Learning
Title | Learning to Communicate with Deep Multi-Agent Reinforcement Learning |
Authors | Jakob N. Foerster, Yannis M. Assael, Nando de Freitas, Shimon Whiteson |
Abstract | We consider the problem of multiple agents sensing and acting in environments with the goal of maximising their shared utility. In these environments, agents must learn communication protocols in order to share information that is needed to solve the tasks. By embracing deep neural networks, we are able to demonstrate end-to-end learning of protocols in complex environments inspired by communication riddles and multi-agent computer vision problems with partial observability. We propose two approaches for learning in these domains: Reinforced Inter-Agent Learning (RIAL) and Differentiable Inter-Agent Learning (DIAL). The former uses deep Q-learning, while the latter exploits the fact that, during learning, agents can backpropagate error derivatives through (noisy) communication channels. Hence, this approach uses centralised learning but decentralised execution. Our experiments introduce new environments for studying the learning of communication protocols and present a set of engineering innovations that are essential for success in these domains. |
Tasks | Multi-agent Reinforcement Learning, Q-Learning |
Published | 2016-05-21 |
URL | http://arxiv.org/abs/1605.06676v2 |
PDF | http://arxiv.org/pdf/1605.06676v2.pdf |
PWC | https://paperswithcode.com/paper/learning-to-communicate-with-deep-multi-agent |
Repo | https://github.com/iassael/learning-to-communicate |
Framework | pytorch |
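The differentiable channel at the heart of DIAL is the discretise/regularise unit (DRU): a noisy sigmoid during centralised learning, a hard threshold at decentralised execution. A minimal PyTorch sketch (the noise level is a hyperparameter; 2.0 here is illustrative):

```python
import torch

def dru(message, training=True, noise_std=2.0):
    """Discretise/regularise unit: a noisy sigmoid keeps the channel
    differentiable during centralised learning; at decentralised execution
    the message is hard-thresholded to a bit."""
    if training:
        return torch.sigmoid(message + noise_std * torch.randn_like(message))
    return (message > 0).float()
```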
Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders
Title | Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders |
Authors | Nat Dilokthanakul, Pedro A. M. Mediano, Marta Garnelo, Matthew C. H. Lee, Hugh Salimbeni, Kai Arulkumaran, Murray Shanahan |
Abstract | We study a variant of the variational autoencoder model (VAE) with a Gaussian mixture as a prior distribution, with the goal of performing unsupervised clustering through deep generative models. We observe that the known problem of over-regularisation that has been shown to arise in regular VAEs also manifests itself in our model and leads to cluster degeneracy. We show that a heuristic called the minimum information constraint, which has been shown to mitigate this effect in VAEs, can also be applied to improve unsupervised clustering performance with our model. Furthermore, we analyse the effect of this heuristic and provide an intuition of the various processes with the help of visualizations. Finally, we demonstrate the performance of our model on synthetic data, MNIST and SVHN, showing that the obtained clusters are distinct, interpretable, and achieve unsupervised clustering performance competitive with the state of the art. |
Tasks | |
Published | 2016-11-08 |
URL | http://arxiv.org/abs/1611.02648v2 |
PDF | http://arxiv.org/pdf/1611.02648v2.pdf |
PWC | https://paperswithcode.com/paper/deep-unsupervised-clustering-with-gaussian |
Repo | https://github.com/lemon1215/VAE-GMVAE |
Framework | tf |
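The minimum information constraint mentioned in the abstract amounts to flooring each KL term of the ELBO, in the spirit of "free bits". A sketch, assuming per-group KL terms are already computed (the exact thresholding granularity in the paper may differ):

```python
import torch

def kl_min_info(kl_terms, lam=0.5):
    """Floor each KL term at lam so the optimiser gains nothing by pushing
    it below the threshold -- the over-regularisation pressure that causes
    cluster degeneracy is removed once KL < lam."""
    return torch.clamp(kl_terms, min=lam).sum()

# loss = reconstruction_nll + kl_min_info(kl_per_latent_group)
```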
Texture Networks: Feed-forward Synthesis of Textures and Stylized Images
Title | Texture Networks: Feed-forward Synthesis of Textures and Stylized Images |
Authors | Dmitry Ulyanov, Vadim Lebedev, Andrea Vedaldi, Victor Lempitsky |
Abstract | Gatys et al. recently demonstrated that deep networks can generate beautiful textures and stylized images from a single texture example. However, their method requires a slow and memory-consuming optimization process. We propose here an alternative approach that moves the computational burden to a learning stage. Given a single example of a texture, our approach trains compact feed-forward convolutional networks to generate multiple samples of the same texture of arbitrary size and to transfer artistic style from a given image to any other image. The resulting networks are remarkably light-weight and can generate textures of quality comparable to Gatys et al., but hundreds of times faster. More generally, our approach highlights the power and flexibility of generative feed-forward models trained with complex and expressive loss functions. |
Tasks | Style Transfer |
Published | 2016-03-10 |
URL | http://arxiv.org/abs/1603.03417v1 |
PDF | http://arxiv.org/pdf/1603.03417v1.pdf |
PWC | https://paperswithcode.com/paper/texture-networks-feed-forward-synthesis-of |
Repo | https://github.com/zhanghang1989/PyTorch-Multi-Style-Transfer |
Framework | pytorch |
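The loss the feed-forward generator is trained against is the Gatys-style Gram-matrix texture descriptor computed on pretrained network features. A PyTorch sketch of that descriptor loss (feature extraction is assumed done elsewhere):

```python
import torch

def gram(features):
    """Normalised Gram matrix of a (B, C, H, W) feature map."""
    b, c, h, w = features.shape
    f = features.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def texture_loss(gen_feats, tex_feats):
    """Sum of squared Gram differences over matched feature layers."""
    return sum(((gram(g) - gram(t)) ** 2).sum()
               for g, t in zip(gen_feats, tex_feats))
```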
Interpretable Recurrent Neural Networks Using Sequential Sparse Recovery
Title | Interpretable Recurrent Neural Networks Using Sequential Sparse Recovery |
Authors | Scott Wisdom, Thomas Powers, James Pitton, Les Atlas |
Abstract | Recurrent neural networks (RNNs) are powerful and effective for processing sequential data. However, RNNs are usually considered “black box” models whose internal structure and learned parameters are not interpretable. In this paper, we propose an interpretable RNN based on the sequential iterative soft-thresholding algorithm (SISTA) for solving the sequential sparse recovery problem, which models a sequence of correlated observations with a sequence of sparse latent vectors. The architecture of the resulting SISTA-RNN is implicitly defined by the computational structure of SISTA, which results in a novel stacked RNN architecture. Furthermore, the weights of the SISTA-RNN are perfectly interpretable as the parameters of a principled statistical model, which in this case include a sparsifying dictionary, iterative step size, and regularization parameters. In addition, on a particular sequential compressive sensing task, the SISTA-RNN trains faster and achieves better performance than conventional state-of-the-art black box RNNs, including long short-term memory (LSTM) RNNs. |
Tasks | Compressive Sensing |
Published | 2016-11-22 |
URL | http://arxiv.org/abs/1611.07252v1 |
PDF | http://arxiv.org/pdf/1611.07252v1.pdf |
PWC | https://paperswithcode.com/paper/interpretable-recurrent-neural-networks-using |
Repo | https://github.com/stwisdom/sista-rnn |
Framework | none |
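The building block of SISTA is the soft-thresholding operator, applied in gradient iterations that are warm-started from the previous time step; untying that computation graph into learnable weights gives the SISTA-RNN. A loose numpy sketch of one time step (the paper's exact parameterisation, which includes a sparsifying dictionary, differs):

```python
import numpy as np

def soft(x, thresh):
    """Soft-thresholding, the nonlinearity at the heart of (S)ISTA."""
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)

def sista_step(h_prev, y, A, F, alpha, lam1, lam2, n_iters=3):
    """One time step: a few thresholded gradient iterations on the sparse
    code h, warm-started from the previous step's code via dynamics F.
    A: measurement matrix, alpha: inverse step size, lam1/lam2: penalties."""
    h = F @ h_prev
    for _ in range(n_iters):
        grad = A.T @ (A @ h - y) + lam2 * (h - F @ h_prev)
        h = soft(h - grad / alpha, lam1 / alpha)
    return h
```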
Ambient Sound Provides Supervision for Visual Learning
Title | Ambient Sound Provides Supervision for Visual Learning |
Authors | Andrew Owens, Jiajun Wu, Josh H. McDermott, William T. Freeman, Antonio Torralba |
Abstract | The sound of crashing waves, the roar of fast-moving cars – sound conveys important information about the objects in our surroundings. In this work, we show that ambient sounds can be used as a supervisory signal for learning visual models. To demonstrate this, we train a convolutional neural network to predict a statistical summary of the sound associated with a video frame. We show that, through this process, the network learns a representation that conveys information about objects and scenes. We evaluate this representation on several recognition tasks, finding that its performance is comparable to that of other state-of-the-art unsupervised learning methods. Finally, we show through visualizations that the network learns units that are selective to objects that are often associated with characteristic sounds. |
Tasks | Object Recognition |
Published | 2016-08-25 |
URL | http://arxiv.org/abs/1608.07017v2 |
PDF | http://arxiv.org/pdf/1608.07017v2.pdf |
PWC | https://paperswithcode.com/paper/ambient-sound-provides-supervision-for-visual |
Repo | https://github.com/rowhanm/ambient-sound-self-supervision |
Framework | pytorch |
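The training setup reduces to ordinary supervised learning with sound-derived targets. The paper predicts a categorical summary (clustered sound statistics); the sketch below regresses the statistics directly for brevity, and every name and shape is illustrative:

```python
import torch
import torch.nn as nn
import torchvision.models as models

K = 30  # number of sound-summary targets per frame (hypothetical choice)

# frames: (B, 3, 224, 224) video frames; sound_stats: (B, K) statistics
# computed from the accompanying audio track -- the free supervisory signal.
cnn = models.resnet18(weights=None)
cnn.fc = nn.Linear(cnn.fc.in_features, K)
opt = torch.optim.Adam(cnn.parameters(), lr=1e-4)

def train_step(frames, sound_stats):
    opt.zero_grad()
    loss = nn.functional.mse_loss(cnn(frames), sound_stats)
    loss.backward()
    opt.step()
    return loss.item()
```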
F-measure Maximization in Multi-Label Classification with Conditionally Independent Label Subsets
Title | F-measure Maximization in Multi-Label Classification with Conditionally Independent Label Subsets |
Authors | Maxime Gasse, Alex Aussem |
Abstract | We discuss a method to improve the exact F-measure maximization algorithm called GFM, proposed in (Dembczynski et al. 2011) for multi-label classification, assuming the label set can be partitioned into conditionally independent subsets given the input features. If the labels were all independent, the estimation of only $m$ parameters ($m$ denoting the number of labels) would suffice to derive Bayes-optimal predictions in $O(m^2)$ operations. In the general case, $m^2+1$ parameters are required by GFM to solve the problem in $O(m^3)$ operations. In this work, we show that the number of parameters can be reduced further to $m^2/n$, in the best case, assuming the label set can be partitioned into $n$ conditionally independent subsets. As this label partition needs to be estimated from the data beforehand, we first use the procedure proposed in (Gasse et al. 2015) that finds such a partition and then infer the required parameters locally in each label subset. These are aggregated and serve as input to GFM to form the Bayes-optimal prediction. We show on a synthetic experiment that the reduction in the number of parameters brings about significant benefits in terms of performance. |
Tasks | Multi-Label Classification |
Published | 2016-04-26 |
URL | http://arxiv.org/abs/1604.07759v3 |
PDF | http://arxiv.org/pdf/1604.07759v3.pdf |
PWC | https://paperswithcode.com/paper/f-measure-maximization-in-multi-label |
Repo | https://github.com/gasse/fgfm-toy |
Framework | none |
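For reference, a numpy sketch of the underlying GFM procedure as we read it from (Dembczynski et al. 2011): given the $m^2$ joint estimates $P(y_i=1, |y|=s)$ plus $P(y=0)$, it returns the F1-optimal prediction in $O(m^3)$ time:

```python
import numpy as np

def gfm(P, p_all_zero):
    """P[i, s-1] = P(y_i = 1, |y| = s), an m x m matrix; p_all_zero = P(y = 0).
    Returns the F1-maximising prediction and its expected F1."""
    m = P.shape[0]
    s = np.arange(1, m + 1)
    W = 1.0 / (s[:, None] + s[None, :])   # W[s-1, k-1] = 1 / (s + k)
    Delta = P @ W                         # Delta[i, k-1] = sum_s P[i,s-1]/(s+k)
    best_h, best_f = np.zeros(m, dtype=int), p_all_zero   # empty prediction
    for k in range(1, m + 1):
        top = np.argsort(Delta[:, k - 1])[-k:]   # best k labels for this size
        f = 2.0 * Delta[top, k - 1].sum()        # expected F1 of that choice
        if f > best_f:
            best_h = np.zeros(m, dtype=int)
            best_h[top] = 1
            best_f = f
    return best_h, best_f
```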
Deep Learning over Multi-field Categorical Data: A Case Study on User Response Prediction
Title | Deep Learning over Multi-field Categorical Data: A Case Study on User Response Prediction |
Authors | Weinan Zhang, Tianming Du, Jun Wang |
Abstract | Predicting user responses, such as click-through rate and conversion rate, is critical in many web applications including web search, personalised recommendation, and online advertising. Unlike the continuous raw features usually found in the image and audio domains, the input features in web applications are multi-field and mostly discrete and categorical, and their dependencies are little known. Major user response prediction models have to either limit themselves to linear models or require manually building up high-order combination features. The former loses the ability to explore feature interactions, while the latter results in heavy computation in the large feature space. To tackle the issue, we propose two novel models using deep neural networks (DNNs) to automatically learn effective patterns from categorical feature interactions and make predictions of users’ ad clicks. To make our DNNs work efficiently, we propose to leverage three feature transformation methods, i.e., factorisation machines (FMs), restricted Boltzmann machines (RBMs) and denoising auto-encoders (DAEs). This paper presents the structure of our models and their efficient training algorithms. Large-scale experiments with real-world data demonstrate that our methods work better than major state-of-the-art models. |
Tasks | Click-Through Rate Prediction |
Published | 2016-01-11 |
URL | http://arxiv.org/abs/1601.02376v1 |
PDF | http://arxiv.org/pdf/1601.02376v1.pdf |
PWC | https://paperswithcode.com/paper/deep-learning-over-multi-field-categorical |
Repo | https://github.com/ddatta-DAC/Learning |
Framework | tf |
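A minimal PyTorch sketch of the shared skeleton of these models: one embedding table per categorical field (initialised from a pre-trained FM in the paper, randomly here), concatenated and passed through an MLP to a click probability. Layer sizes are illustrative:

```python
import torch
import torch.nn as nn

class FieldEmbeddingDNN(nn.Module):
    """One embedding per categorical field, concatenated and fed to an MLP."""
    def __init__(self, field_sizes, dim=10):
        super().__init__()
        self.embeds = nn.ModuleList(nn.Embedding(n, dim) for n in field_sizes)
        self.mlp = nn.Sequential(
            nn.Linear(dim * len(field_sizes), 200), nn.Tanh(),
            nn.Linear(200, 1),
        )

    def forward(self, x):  # x: (B, n_fields) integer category ids
        h = torch.cat([e(x[:, i]) for i, e in enumerate(self.embeds)], dim=1)
        return torch.sigmoid(self.mlp(h)).squeeze(1)
```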
Exploiting inter-image similarity and ensemble of extreme learners for fixation prediction using deep features
Title | Exploiting inter-image similarity and ensemble of extreme learners for fixation prediction using deep features |
Authors | Hamed R. -Tavakoli, Ali Borji, Jorma Laaksonen, Esa Rahtu |
Abstract | This paper presents a novel fixation prediction and saliency modeling framework based on inter-image similarities and an ensemble of Extreme Learning Machines (ELM). The proposed framework is inspired by two observations: 1) the contextual information of a scene, along with low-level visual cues, modulates attention; 2) scene memorability influences eye movement patterns when a scene resembles a former visual experience. Motivated by these observations, we develop a framework that estimates the saliency of a given image using an ensemble of extreme learners, each trained on an image similar to the input image. That is, after retrieving a set of similar images for a given image, a saliency predictor is learnt from each image in the retrieved set using an ELM, resulting in an ensemble. The saliency of the given image is then measured as the mean of the saliency values predicted by the ensemble’s members. |
Tasks | |
Published | 2016-10-20 |
URL | http://arxiv.org/abs/1610.06449v1 |
PDF | http://arxiv.org/pdf/1610.06449v1.pdf |
PWC | https://paperswithcode.com/paper/exploiting-inter-image-similarity-and |
Repo | https://github.com/hrtavakoli/iseel |
Framework | none |
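The base learner here is an extreme learning machine: a random hidden layer whose output weights are solved in closed form. A numpy sketch (hyperparameters illustrative); the framework trains one such learner per retrieved similar image and averages the ensemble's predictions:

```python
import numpy as np

class ELMRegressor:
    """Random hidden layer; output weights solved by ridge regression."""
    def __init__(self, n_hidden=500, reg=1e-3, seed=0):
        self.n_hidden, self.reg, self.seed = n_hidden, reg, seed

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        self.W = rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = rng.standard_normal(self.n_hidden)
        H = np.tanh(X @ self.W + self.b)
        self.beta = np.linalg.solve(H.T @ H + self.reg * np.eye(self.n_hidden),
                                    H.T @ y)
        return self

    def predict(self, X):
        return np.tanh(X @ self.W + self.b) @ self.beta
```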
Neural Autoregressive Distribution Estimation
Title | Neural Autoregressive Distribution Estimation |
Authors | Benigno Uria, Marc-Alexandre Côté, Karol Gregor, Iain Murray, Hugo Larochelle |
Abstract | We present Neural Autoregressive Distribution Estimation (NADE) models, which are neural network architectures applied to the problem of unsupervised distribution and density estimation. They leverage the probability product rule and a weight sharing scheme inspired by restricted Boltzmann machines to yield an estimator that is both tractable and has good generalization performance. We discuss how they achieve competitive performance in modeling both binary and real-valued observations. We also show how deep NADE models can be trained to be agnostic to the ordering of input dimensions used by the autoregressive product rule decomposition. Finally, we show how to exploit the topological structure of pixels in images using a deep convolutional architecture for NADE. |
Tasks | Density Estimation |
Published | 2016-05-07 |
URL | http://arxiv.org/abs/1605.02226v3 |
PDF | http://arxiv.org/pdf/1605.02226v3.pdf |
PWC | https://paperswithcode.com/paper/neural-autoregressive-distribution-estimation |
Repo | https://github.com/MarcCote/NADE |
Framework | none |
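A numpy sketch of the core NADE computation for binary data: thanks to the weight-sharing scheme, the hidden pre-activation is updated incrementally across dimensions, so evaluating the full density costs $O(DH)$ rather than $O(D^2H)$:

```python
import numpy as np

def nade_log_prob(x, W, V, b, c):
    """log p(x) for one binary vector x under NADE.
    W: (H, D), V: (D, H), b: (D,), c: (H,)."""
    a = c.astype(float).copy()   # shared pre-activation, updated per dimension
    logp = 0.0
    for d in range(len(x)):
        h = 1.0 / (1.0 + np.exp(-a))                   # hidden units for x_<d
        p = 1.0 / (1.0 + np.exp(-(b[d] + V[d] @ h)))   # p(x_d = 1 | x_<d)
        logp += np.log(p if x[d] == 1 else 1.0 - p)
        a += W[:, d] * x[d]      # fold x_d in: O(H) instead of recomputing
    return logp
```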
A Systematic Evaluation and Benchmark for Person Re-Identification: Features, Metrics, and Datasets
Title | A Systematic Evaluation and Benchmark for Person Re-Identification: Features, Metrics, and Datasets |
Authors | Srikrishna Karanam, Mengran Gou, Ziyan Wu, Angels Rates-Borras, Octavia Camps, Richard J. Radke |
Abstract | Person re-identification (re-id) is a critical problem in video analytics applications such as security and surveillance. The public release of several datasets and code for vision algorithms has facilitated rapid progress in this area over the last few years. However, directly comparing re-id algorithms reported in the literature has become difficult since a wide variety of features, experimental protocols, and evaluation metrics are employed. To address this, we present an extensive review and performance evaluation of single- and multi-shot re-id algorithms. The experimental protocol incorporates the most recent advances in both feature extraction and metric learning. To ensure a fair comparison, all of the approaches were implemented using a unified code library that includes 11 feature extraction algorithms and 22 metric learning and ranking techniques. All approaches were evaluated using a new large-scale dataset that closely mimics a real-world problem setting, in addition to 16 other publicly available datasets: VIPeR, GRID, CAVIAR, DukeMTMC4ReID, 3DPeS, PRID, V47, WARD, SAIVT-SoftBio, CUHK01, CUHK02, CUHK03, RAiD, iLIDS-VID, HDA+ and Market1501. The evaluation codebase and results will be made publicly available for community use. |
Tasks | Metric Learning, Person Re-Identification |
Published | 2016-05-31 |
URL | http://arxiv.org/abs/1605.09653v5 |
PDF | http://arxiv.org/pdf/1605.09653v5.pdf |
PWC | https://paperswithcode.com/paper/a-systematic-evaluation-and-benchmark-for |
Repo | https://github.com/NEU-Gou/awesome-reid-dataset |
Framework | none |
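A standard metric in such re-id evaluations is the cumulative match characteristic (CMC) curve. A numpy sketch, assuming single-shot matching and that every query has at least one true match in the gallery:

```python
import numpy as np

def cmc(dist, query_ids, gallery_ids, max_rank=20):
    """dist: (num_query, num_gallery) distances. Returns curve where
    curve[k-1] is the fraction of queries whose true match is in the top k."""
    order = np.argsort(dist, axis=1)                   # closest gallery first
    matches = gallery_ids[order] == query_ids[:, None]
    first_hit = matches.argmax(axis=1)                 # rank of first true match
    return np.array([(first_hit <= r).mean() for r in range(max_rank)])
```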
Pruning Filters for Efficient ConvNets
Title | Pruning Filters for Efficient ConvNets |
Authors | Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet, Hans Peter Graf |
Abstract | The success of CNNs in various applications is accompanied by a significant increase in the computation and parameter storage costs. Recent efforts toward reducing these overheads involve pruning and compressing the weights of various layers without hurting original accuracy. However, magnitude-based pruning of weights reduces a significant number of parameters from the fully connected layers and may not adequately reduce the computation costs in the convolutional layers due to irregular sparsity in the pruned networks. We present an acceleration method for CNNs, where we prune filters from CNNs that are identified as having a small effect on the output accuracy. By removing whole filters in the network together with their connecting feature maps, the computation costs are reduced significantly. In contrast to pruning weights, this approach does not result in sparse connectivity patterns. Hence, it does not need the support of sparse convolution libraries and can work with existing efficient BLAS libraries for dense matrix multiplications. We show that even simple filter pruning techniques can reduce inference costs for VGG-16 by up to 34% and ResNet-110 by up to 38% on CIFAR10 while regaining close to the original accuracy by retraining the networks. |
Tasks | Image Classification |
Published | 2016-08-31 |
URL | http://arxiv.org/abs/1608.08710v3 |
PDF | http://arxiv.org/pdf/1608.08710v3.pdf |
PWC | https://paperswithcode.com/paper/pruning-filters-for-efficient-convnets |
Repo | https://github.com/marcoancona/TorchPruner |
Framework | pytorch |
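A PyTorch sketch of the paper's pruning criterion: rank a convolution's output filters by the L1 norm of their weights and rebuild a smaller dense layer from the survivors. Dilation and groups are ignored here, and the next layer's input channels must be sliced with the returned indices:

```python
import torch
import torch.nn as nn

def prune_conv_filters(conv, n_prune):
    """Drop the n_prune filters of a Conv2d with the smallest L1 weight norm,
    returning a smaller dense layer plus the kept filter indices."""
    norms = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # one norm per filter
    keep = torch.sort(torch.argsort(norms, descending=True)
                      [: conv.out_channels - n_prune]).values
    pruned = nn.Conv2d(conv.in_channels, len(keep), conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(conv.weight[keep])
        if conv.bias is not None:
            pruned.bias.copy_(conv.bias[keep])
    return pruned, keep
```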
The Predictron: End-To-End Learning and Planning
Title | The Predictron: End-To-End Learning and Planning |
Authors | David Silver, Hado van Hasselt, Matteo Hessel, Tom Schaul, Arthur Guez, Tim Harley, Gabriel Dulac-Arnold, David Reichert, Neil Rabinowitz, Andre Barreto, Thomas Degris |
Abstract | One of the key challenges of artificial intelligence is to learn models that are effective in the context of planning. In this document we introduce the predictron architecture. The predictron consists of a fully abstract model, represented by a Markov reward process, that can be rolled forward multiple “imagined” planning steps. Each forward pass of the predictron accumulates internal rewards and values over multiple planning depths. The predictron is trained end-to-end so as to make these accumulated values accurately approximate the true value function. We applied the predictron to procedurally generated random mazes and a simulator for the game of pool. The predictron yielded significantly more accurate predictions than conventional deep neural network architectures. |
Tasks | |
Published | 2016-12-28 |
URL | http://arxiv.org/abs/1612.08810v3 |
PDF | http://arxiv.org/pdf/1612.08810v3.pdf |
PWC | https://paperswithcode.com/paper/the-predictron-end-to-end-learning-and |
Repo | https://github.com/zhongwen/predictron |
Framework | tf |
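The quantity the predictron is trained on is the $\lambda$-preturn, which blends the k-step rollouts of the abstract model. A sketch of that recursion (shapes simplified to a single trajectory; variable names are ours):

```python
import torch

def lambda_preturn(values, rewards, discounts, lambdas):
    """values: (K+1,) v_0..v_K from the abstract model; rewards, discounts,
    lambdas: (K,) for steps 1..K. Blends k-step returns backwards from the
    deepest planning depth."""
    g = values[-1]                       # bootstrap from the deepest value
    for k in reversed(range(len(rewards))):
        g = (1 - lambdas[k]) * values[k] \
            + lambdas[k] * (rewards[k] + discounts[k] * g)
    return g
```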
A Point Set Generation Network for 3D Object Reconstruction from a Single Image
Title | A Point Set Generation Network for 3D Object Reconstruction from a Single Image |
Authors | Haoqiang Fan, Hao Su, Leonidas Guibas |
Abstract | Generation of 3D data by deep neural networks has been attracting increasing attention in the research community. The majority of extant works resort to regular representations such as volumetric grids or collections of images; however, these representations obscure the natural invariance of 3D shapes under geometric transformations and also suffer from a number of other issues. In this paper we address the problem of 3D reconstruction from a single image, generating a straightforward form of output: point cloud coordinates. Along with this problem arises a unique and interesting issue: the ground-truth shape for an input image may be ambiguous. Driven by this unorthodox output form and the inherent ambiguity in the ground truth, we design an architecture, loss function and learning paradigm that are novel and effective. Our final solution is a conditional shape sampler, capable of predicting multiple plausible 3D point clouds from an input image. In experiments, our system not only outperforms state-of-the-art methods on single-image 3D reconstruction benchmarks, but also shows strong performance on 3D shape completion and a promising ability to make multiple plausible predictions. |
Tasks | 3D Object Reconstruction, 3D Object Reconstruction From A Single Image, 3D Reconstruction, Object Reconstruction |
Published | 2016-12-02 |
URL | http://arxiv.org/abs/1612.00603v2 |
PDF | http://arxiv.org/pdf/1612.00603v2.pdf |
PWC | https://paperswithcode.com/paper/a-point-set-generation-network-for-3d-object |
Repo | https://github.com/pointcloudlearning/3D-Deep-Learning-Paper-List |
Framework | none |
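One of the set-level losses the paper builds on is the Chamfer distance, which is invariant to the ordering of points. A PyTorch sketch (implementations differ on squared vs. plain Euclidean distances):

```python
import torch

def chamfer_distance(a, b):
    """a: (B, N, 3), b: (B, M, 3) batches of point sets; returns (B,) losses,
    symmetric in the two directions and invariant to point order."""
    d = torch.cdist(a, b)                               # (B, N, M) pairwise
    return d.min(dim=2).values.mean(dim=1) + d.min(dim=1).values.mean(dim=1)
```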