Paper Group AWR 294
Convolutional Analysis Operator Learning: Dependence on Training Data. The HSIC Bottleneck: Deep Learning without Back-Propagation. Decoupled Greedy Learning of CNNs. Iterative Normalization: Beyond Standardization towards Efficient Whitening. Provably Robust Blackbox Optimization for Reinforcement Learning. Hyperbolic Image Embeddings. FIGR: Few-shot Image Generation with Reptile. Deep Learning for Large-Scale Traffic-Sign Detection and Recognition. End-to-end Lyrics Alignment for Polyphonic Music Using an Audio-to-Character Recognition Model. Fine-Grained Entity Typing in Hyperbolic Space. Avoiding Implementation Pitfalls of “Matrix Capsules with EM Routing” by Hinton et al. Break the Ceiling: Stronger Multi-scale Deep Graph Convolutional Networks. Attentive Deep Regression Networks for Real-Time Visual Face Tracking in Video Surveillance. Scale-Adaptive Neural Dense Features: Learning via Hierarchical Context Aggregation. Improving Robot Success Detection using Static Object Data.
Convolutional Analysis Operator Learning: Dependence on Training Data
Title | Convolutional Analysis Operator Learning: Dependence on Training Data |
Authors | Il Yong Chun, David Hong, Ben Adcock, Jeffrey A. Fessler |
Abstract | Convolutional analysis operator learning (CAOL) enables the unsupervised training of (hierarchical) convolutional sparsifying operators or autoencoders from large datasets. One can use many training images for CAOL, but a precise understanding of the impact of doing so has remained an open question. This paper presents a series of results that lend insight into the impact of dataset size on the filter update in CAOL. The first result is a general deterministic bound on errors in the estimated filters, and is followed by a bound on the expected errors as the number of training samples increases. The second result provides a high probability analogue. The bounds depend on properties of the training data, and we investigate their empirical values with real data. Taken together, these results provide evidence for the potential benefit of using more training data in CAOL. |
Tasks | |
Published | 2019-02-21 |
URL | https://arxiv.org/abs/1902.08267v4 |
https://arxiv.org/pdf/1902.08267v4.pdf | |
PWC | https://paperswithcode.com/paper/convolutional-analysis-operator-learning |
Repo | https://github.com/mechatoz/convolt |
Framework | none |
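As a rough illustration of the CAOL setup described above, the sketch below shows the sparse-code half of the alternating update: filter responses are hard-thresholded at sqrt(2α), the proximal map of the ℓ0 penalty. The constrained filter update (with its tight-frame condition) is omitted, and the function and argument names are our own, not the authors'.

```python
import torch
import torch.nn.functional as F

def caol_sparse_code_step(x, filters, alpha):
    """One sparse-code update in the CAOL alternating scheme (a sketch).

    x:       (N, 1, H, W) training images
    filters: (K, 1, R, R) current analysis filters
    alpha:   sparsity weight; prox of alpha*||.||_0 is hard-thresholding
             at sqrt(2 * alpha).
    """
    z = F.conv2d(x, filters, padding=filters.shape[-1] // 2)  # filter responses
    thresh = (2.0 * alpha) ** 0.5
    return torch.where(z.abs() >= thresh, z, torch.zeros_like(z))
```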
The HSIC Bottleneck: Deep Learning without Back-Propagation
Title | The HSIC Bottleneck: Deep Learning without Back-Propagation |
Authors | Wan-Duo Kurt Ma, J. P. Lewis, W. Bastiaan Kleijn |
Abstract | We introduce the HSIC (Hilbert-Schmidt independence criterion) bottleneck for training deep neural networks. The HSIC bottleneck is an alternative to the conventional cross-entropy loss and backpropagation that has a number of distinct advantages. It mitigates exploding and vanishing gradients, resulting in the ability to learn very deep networks without skip connections. There is no requirement for symmetric feedback or update locking. We find that the HSIC bottleneck provides performance on MNIST/FashionMNIST/CIFAR10 classification comparable to backpropagation with a cross-entropy target, even when the system is not encouraged to make the output resemble the classification labels. Appending a single layer trained with SGD (without backpropagation) to reformat the information further improves performance. |
Tasks | |
Published | 2019-08-05 |
URL | https://arxiv.org/abs/1908.01580v3 |
https://arxiv.org/pdf/1908.01580v3.pdf | |
PWC | https://paperswithcode.com/paper/the-hsic-bottleneck-deep-learning-without |
Repo | https://github.com/gusye1234/Pytorch-HSIC-bottleneck |
Framework | pytorch |
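The training signal here is the empirical HSIC between batched activations. A minimal sketch of the standard biased estimator, HSIC(X, Y) = tr(KHLH)/(n−1)² with Gaussian kernels and centering matrix H = I − (1/n)11ᵀ, is given below; the paper layer-wise minimizes HSIC(Z, X) − β·HSIC(Z, Y) (using a normalized variant), which this plain estimator only approximates. The kernel bandwidth is an assumed default.

```python
import torch

def gaussian_kernel(x, sigma):
    # x: (n, d) batch of flattened activations
    sq_dists = torch.cdist(x, x) ** 2
    return torch.exp(-sq_dists / (2 * sigma ** 2))

def hsic(x, y, sigma=5.0):
    """Biased empirical HSIC estimate between two batches (a sketch)."""
    n = x.shape[0]
    K = gaussian_kernel(x.flatten(1), sigma)
    L = gaussian_kernel(y.flatten(1), sigma)
    H = torch.eye(n, device=x.device) - 1.0 / n   # centering matrix
    return torch.trace(K @ H @ L @ H) / (n - 1) ** 2
```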
Decoupled Greedy Learning of CNNs
Title | Decoupled Greedy Learning of CNNs |
Authors | Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon |
Abstract | A commonly cited inefficiency of neural network training by back-propagation is the update locking problem: each layer must wait for the signal to propagate through the network before updating. In recent years multiple authors have considered alternatives that can alleviate this issue. In this context, we consider a simpler, but more effective, substitute that uses minimal feedback, which we call Decoupled Greedy Learning (DGL). It is based on a greedy relaxation of the joint training objective, recently shown to be effective in the context of Convolutional Neural Networks (CNNs) on large-scale image classification. We consider an optimization of this objective that permits us to decouple the layer training, allowing for layers or modules in networks to be trained with a potentially linear parallelization in layers. We show theoretically and empirically that this approach converges. Then, we empirically find that it can lead to better generalization than sequential greedy optimization and sometimes end-to-end back-propagation. We show that an extension of this approach to asynchronous settings, where modules can operate with large communication delays, is possible with the use of a replay buffer. We demonstrate the effectiveness of DGL on the CIFAR-10 dataset against alternatives and on the large-scale ImageNet dataset. |
Tasks | Image Classification |
Published | 2019-01-23 |
URL | https://arxiv.org/abs/1901.08164v2 |
https://arxiv.org/pdf/1901.08164v2.pdf | |
PWC | https://paperswithcode.com/paper/decoupled-greedy-learning-of-cnns |
Repo | https://github.com/eugenium/DGL |
Framework | pytorch |
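The decoupling mechanism is easy to see in code: each stage owns an auxiliary head and optimizer, and a detach() between stages blocks the backward path that creates update locking. The sketch below is sequential for clarity (the paper runs stages in parallel, optionally with buffers); the stage and head shapes are illustrative only.

```python
import torch
import torch.nn as nn

# Illustrative stages and auxiliary heads; the actual DGL architectures differ.
stages = nn.ModuleList([
    nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU()),
    nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU()),
])
heads = nn.ModuleList([
    nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10)),
    nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 10)),
])
opts = [torch.optim.SGD(list(s.parameters()) + list(h.parameters()), lr=0.1)
        for s, h in zip(stages, heads)]
criterion = nn.CrossEntropyLoss()

def dgl_step(x, y):
    """One decoupled greedy step: every stage trains on its own local loss."""
    for stage, head, opt in zip(stages, heads, opts):
        opt.zero_grad()
        out = stage(x)
        loss = criterion(head(out), y)
        loss.backward()
        opt.step()
        x = out.detach()  # next stage sees the output, but no gradient flows back
    return loss
```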
Iterative Normalization: Beyond Standardization towards Efficient Whitening
Title | Iterative Normalization: Beyond Standardization towards Efficient Whitening |
Authors | Lei Huang, Yi Zhou, Fan Zhu, Li Liu, Ling Shao |
Abstract | Batch Normalization (BN) is ubiquitously employed for accelerating neural network training and improving the generalization capability by performing standardization within mini-batches. Decorrelated Batch Normalization (DBN) further boosts the above effectiveness by whitening. However, DBN relies heavily on either a large batch size, or eigen-decomposition that suffers from poor efficiency on GPUs. We propose Iterative Normalization (IterNorm), which employs Newton’s iterations for much more efficient whitening, while simultaneously avoiding the eigen-decomposition. Furthermore, we develop a comprehensive study to show that IterNorm has a better trade-off between optimization and generalization, with theoretical and experimental support. To this end, we exclusively introduce Stochastic Normalization Disturbance (SND), which measures the inherent stochastic uncertainty of samples when applied to normalization operations. With the support of SND, we provide natural explanations for several phenomena from the perspective of optimization, e.g., why the group-wise whitening of DBN generally outperforms full whitening and why the accuracy of BN degenerates with reduced batch sizes. We demonstrate the consistently improved performance of IterNorm with extensive experiments on CIFAR-10 and ImageNet over BN and DBN. |
Tasks | |
Published | 2019-04-06 |
URL | http://arxiv.org/abs/1904.03441v1 |
http://arxiv.org/pdf/1904.03441v1.pdf | |
PWC | https://paperswithcode.com/paper/iterative-normalization-beyond |
Repo | https://github.com/huangleiBuaa/IterNorm-pytorch |
Framework | pytorch |
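The core computation replaces the eigen-decomposition in whitening with the Newton-type iteration P_{k} = (3P_{k−1} − P_{k−1}³Σ_N)/2 on the trace-normalized covariance Σ_N = Σ/tr(Σ), after which Σ^{−1/2} ≈ P_T/√tr(Σ). A minimal sketch, ignoring the group-wise channel handling, running averages, and learnable affine parameters of the full layer:

```python
import torch

def iternorm_whiten(x, T=5, eps=1e-5):
    """Whiten a mini-batch with Newton's iterations instead of an
    eigen-decomposition (a sketch of the IterNorm computation).

    x: (n, d) mini-batch; returns whitened activations of the same shape.
    """
    n, d = x.shape
    xc = x - x.mean(dim=0, keepdim=True)              # center
    sigma = xc.t() @ xc / n + eps * torch.eye(d)      # covariance
    trace = torch.diagonal(sigma).sum()
    sigma_n = sigma / trace                           # trace-normalize
    P = torch.eye(d)
    for _ in range(T):                                # Newton's iteration
        P = 0.5 * (3 * P - torch.matrix_power(P, 3) @ sigma_n)
    whitening = P / trace.sqrt()                      # approx sigma^{-1/2}
    return xc @ whitening
```

Matrix-matrix products dominate the cost, which is why this maps far better onto GPUs than an eigen-decomposition of Σ.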
Provably Robust Blackbox Optimization for Reinforcement Learning
Title | Provably Robust Blackbox Optimization for Reinforcement Learning |
Authors | Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Deepali Jain, Yuxiang Yang, Atil Iscen, Jasmine Hsu, Vikas Sindhwani |
Abstract | Interest in derivative-free optimization (DFO) and “evolutionary strategies” (ES) has recently surged in the Reinforcement Learning (RL) community, with growing evidence that they can match state-of-the-art methods for policy optimization problems in robotics. However, it is well known that DFO methods suffer from prohibitively high sampling complexity. They can also be very sensitive to noisy rewards and stochastic dynamics. In this paper, we propose a new class of algorithms, called Robust Blackbox Optimization (RBO). Remarkably, even if up to 23% of all the measurements are arbitrarily corrupted, RBO can provably recover gradients to high accuracy. RBO relies on learning gradient flows using robust regression methods to enable off-policy updates. On several MuJoCo robot control tasks, when all other RL approaches collapse in the presence of adversarial noise, RBO is able to train policies effectively. We also show that RBO can be applied to legged locomotion tasks, including path tracking for quadruped robots. |
Tasks | |
Published | 2019-03-07 |
URL | https://arxiv.org/abs/1903.02993v2 |
https://arxiv.org/pdf/1903.02993v2.pdf | |
PWC | https://paperswithcode.com/paper/when-random-search-is-not-enough-sample |
Repo | https://github.com/FlorianWilk/SpotMicroAI |
Framework | none |
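The key idea is that finite-difference measurements f(θ+δ) − f(θ) ≈ δᵀ∇f(θ) form a linear regression problem in the gradient, so a robust regressor can recover ∇f even when a fraction of measurements is corrupted. The sketch below uses sklearn's HuberRegressor purely as an illustrative stand-in; the paper analyzes LP-decoding-style robust regression with formal recovery guarantees, and all names and defaults here are our own.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor

def rbo_gradient_estimate(f, theta, num_samples=100, radius=0.1):
    """Estimate grad f(theta) from noisy/corrupted blackbox queries
    via robust regression (an illustrative sketch, not the paper's exact
    estimator)."""
    d = theta.shape[0]
    deltas = np.random.randn(num_samples, d) * radius   # random perturbations
    f0 = f(theta)
    diffs = np.array([f(theta + dlt) - f0 for dlt in deltas])
    # Fit diffs ~ deltas @ grad robustly; corrupted rows are down-weighted
    # instead of dominating the fit as they would in least squares.
    reg = HuberRegressor(fit_intercept=False).fit(deltas, diffs)
    return reg.coef_
```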
Hyperbolic Image Embeddings
Title | Hyperbolic Image Embeddings |
Authors | Valentin Khrulkov, Leyla Mirvakhabova, Evgeniya Ustinova, Ivan Oseledets, Victor Lempitsky |
Abstract | Computer vision tasks such as image classification, image retrieval and few-shot learning are currently dominated by Euclidean and spherical embeddings, so that the final decisions about class belongings or the degree of similarity are made using linear hyperplanes, Euclidean distances, or spherical geodesic distances (cosine similarity). In this work, we demonstrate that in many practical scenarios hyperbolic embeddings provide a better alternative. |
Tasks | Few-Shot Learning, Image Classification, Image Retrieval |
Published | 2019-04-03 |
URL | https://arxiv.org/abs/1904.02239v2 |
https://arxiv.org/pdf/1904.02239v2.pdf | |
PWC | https://paperswithcode.com/paper/hyperbolic-image-embeddings |
Repo | https://github.com/leymir/hyperbolic-image-embeddings |
Framework | pytorch |
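Decisions in this setting are made with hyperbolic rather than Euclidean distances. A minimal sketch of the Poincaré-ball geodesic distance, d(x, y) = arcosh(1 + 2‖x−y‖² / ((1−‖x‖²)(1−‖y‖²))), which replaces the Euclidean/cosine metric; the paper additionally builds hyperbolic network layers that are not shown here.

```python
import torch

def poincare_distance(x, y, eps=1e-5):
    """Geodesic distance in the Poincare ball (a sketch).

    x, y: (..., d) embeddings with norm strictly less than 1.
    """
    sq = ((x - y) ** 2).sum(dim=-1)
    nx = (1 - (x ** 2).sum(dim=-1)).clamp_min(eps)
    ny = (1 - (y ** 2).sum(dim=-1)).clamp_min(eps)
    arg = 1 + 2 * sq / (nx * ny)
    return torch.acosh(arg.clamp_min(1 + eps))  # clamp for numerical safety
```

Points near the ball's boundary are exponentially far from the origin, which is what lets such embeddings encode tree-like hierarchies compactly.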
FIGR: Few-shot Image Generation with Reptile
Title | FIGR: Few-shot Image Generation with Reptile |
Authors | Louis Clouâtre, Marc Demers |
Abstract | Generative Adversarial Networks (GAN) boast impressive capacity to generate realistic images. However, like much of the field of deep learning, they require an inordinate amount of data to produce results, thereby limiting their usefulness in generating novelty. In the same vein, recent advances in meta-learning have opened the door to many few-shot learning applications. In the present work, we propose Few-shot Image Generation using Reptile (FIGR), a GAN meta-trained with Reptile. Our model successfully generates novel images on both MNIST and Omniglot with as few as 4 images from an unseen class. We further contribute FIGR-8, a new dataset for few-shot image generation, which contains 1,548,944 icons categorized in over 18,409 classes. Trained on FIGR-8, initial results show that our model can generalize to more advanced concepts (such as “bird” and “knife”) from as few as 8 samples from a previously unseen class of images and as few as 10 training steps through those 8 images. This work demonstrates the potential of training a GAN for few-shot image generation and aims to set a new benchmark for future work in the domain. |
Tasks | Few-Shot Learning, Image Generation, Meta-Learning, Omniglot |
Published | 2019-01-08 |
URL | http://arxiv.org/abs/1901.02199v1 |
http://arxiv.org/pdf/1901.02199v1.pdf | |
PWC | https://paperswithcode.com/paper/figr-few-shot-image-generation-with-reptile |
Repo | https://github.com/marcdemers/FIGR-8-SVG |
Framework | none |
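Reptile itself is only a few lines: adapt a clone of the model on one task for a few SGD steps, then move the meta-parameters toward the adapted weights, θ ← θ + ε(θ′ − θ). The generic sketch below assumes a `task_batches` iterator and `loss_fn` of our own naming; in FIGR this outer update is applied to both the generator and the discriminator of the GAN.

```python
import copy
import torch

def reptile_meta_step(model, task_batches, loss_fn, inner_lr=0.01,
                      meta_lr=0.1, inner_steps=10):
    """One Reptile meta-update (a sketch): train a clone on a single task,
    then interpolate the meta-parameters toward the adapted weights."""
    fast = copy.deepcopy(model)
    opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
    for _ in range(inner_steps):
        x, y = next(task_batches)          # batches drawn from one task/class
        opt.zero_grad()
        loss_fn(fast(x), y).backward()
        opt.step()
    with torch.no_grad():                  # theta <- theta + eps * (theta' - theta)
        for p, fp in zip(model.parameters(), fast.parameters()):
            p.add_(meta_lr * (fp - p))
```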
Deep Learning for Large-Scale Traffic-Sign Detection and Recognition
Title | Deep Learning for Large-Scale Traffic-Sign Detection and Recognition |
Authors | Domen Tabernik, Danijel Skočaj |
Abstract | Automatic detection and recognition of traffic signs plays a crucial role in the management of the traffic-sign inventory. It provides an accurate and timely way to manage the traffic-sign inventory with minimal human effort. In the computer vision community, the recognition and detection of traffic signs is a well-researched problem. The vast majority of existing approaches perform well on the traffic signs needed for advanced driver-assistance and autonomous systems. However, this represents a relatively small number of all traffic signs (around 50 categories out of several hundred), and performance on the remaining set of traffic signs, which are required to eliminate the manual labor in traffic-sign inventory management, remains an open question. In this paper, we address the issue of detecting and recognizing a large number of traffic-sign categories suitable for automating traffic-sign inventory management. We adopt a convolutional neural network (CNN) approach, the Mask R-CNN, to address the full pipeline of detection and recognition with automatic end-to-end learning. We propose several improvements that are evaluated on the detection of traffic signs and result in an improved overall performance. This approach is applied to the detection of 200 traffic-sign categories represented in our novel dataset. Results are reported on highly challenging traffic-sign categories that have not yet been considered in previous works. We provide a comprehensive analysis of the deep learning method for the detection of traffic signs with large intra-category appearance variation and show error rates below 3% with the proposed approach, which is sufficient for deployment in practical applications of traffic-sign inventory management. |
Tasks | Traffic Sign Recognition |
Published | 2019-04-01 |
URL | http://arxiv.org/abs/1904.00649v1 |
http://arxiv.org/pdf/1904.00649v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-for-large-scale-traffic-sign |
Repo | https://github.com/skokec/detectron-traffic-signs |
Framework | none |
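The authors' code builds on Detectron; as a hedged illustration of the basic setup, here is how one would adapt a stock torchvision Mask R-CNN to roughly 200 traffic-sign categories plus background. None of the paper's proposed improvements to the pipeline are reproduced here.

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

num_classes = 201  # 200 sign categories + background (an illustrative count)

# Start from a COCO-pretrained Mask R-CNN and swap in heads sized for the
# new category set.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
in_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_mask, 256, num_classes)
```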
End-to-end Lyrics Alignment for Polyphonic Music Using an Audio-to-Character Recognition Model
Title | End-to-end Lyrics Alignment for Polyphonic Music Using an Audio-to-Character Recognition Model |
Authors | Daniel Stoller, Simon Durand, Sebastian Ewert |
Abstract | Time-aligned lyrics can enrich the music listening experience by enabling karaoke, text-based song retrieval and intra-song navigation, and other applications. Compared to text-to-speech alignment, lyrics alignment remains highly challenging, despite many attempts to combine numerous sub-modules including vocal separation and detection in an effort to break down the problem. Furthermore, training required fine-grained annotations to be available in some form. Here, we present a novel system based on a modified Wave-U-Net architecture, which predicts character probabilities directly from raw audio using learnt multi-scale representations of the various signal components. There are no sub-modules whose interdependencies need to be optimized. Our training procedure is designed to work with weak, line-level annotations available in the real world. With a mean alignment error of 0.35s on a standard dataset our system outperforms the state-of-the-art by an order of magnitude. |
Tasks | |
Published | 2019-02-18 |
URL | http://arxiv.org/abs/1902.06797v1 |
http://arxiv.org/pdf/1902.06797v1.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-lyrics-alignment-for-polyphonic |
Repo | https://github.com/f90/jamendolyrics |
Framework | none |
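The weak supervision works because CTC marginalizes over all monotonic alignments between per-frame character probabilities and a lyric line, so only line-level text is needed. The sketch below shows just that training signal with stand-in tensors of assumed shapes; the modified Wave-U-Net that actually produces the character probabilities from raw audio is not reproduced.

```python
import torch
import torch.nn as nn

# Illustrative shapes: frames, batch, charset size (index 0 = CTC blank).
T, N, C = 500, 4, 29
log_probs = torch.randn(T, N, C).log_softmax(-1)    # stand-in for model output
targets = torch.randint(1, C, (N, 40))              # encoded lyric characters
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 40, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
```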
Fine-Grained Entity Typing in Hyperbolic Space
Title | Fine-Grained Entity Typing in Hyperbolic Space |
Authors | Federico López, Benjamin Heinzerling, Michael Strube |
Abstract | How can we represent hierarchical information present in large type inventories for entity typing? We study the ability of hyperbolic embeddings to capture hierarchical relations between mentions in context and their target types in a shared vector space. We evaluate on two datasets and investigate two different techniques for creating a large hierarchical entity type inventory: from an expert-generated ontology and by automatically mining type co-occurrences. We find that the hyperbolic model yields improvements over its Euclidean counterpart in some, but not all cases. Our analysis suggests that the adequacy of this geometry depends on the granularity of the type inventory and the way hierarchical relations are inferred. |
Tasks | Entity Typing |
Published | 2019-06-06 |
URL | https://arxiv.org/abs/1906.02505v1 |
https://arxiv.org/pdf/1906.02505v1.pdf | |
PWC | https://paperswithcode.com/paper/fine-grained-entity-typing-in-hyperbolic |
Repo | https://github.com/nlpAThits/figet-hyperbolic-space |
Framework | pytorch |
Avoiding Implementation Pitfalls of “Matrix Capsules with EM Routing” by Hinton et al.
Title | Avoiding Implementation Pitfalls of “Matrix Capsules with EM Routing” by Hinton et al. |
Authors | Ashley Daniel Gritzman |
Abstract | The recent progress on capsule networks by Hinton et al. has generated considerable excitement in the machine learning community. The idea behind a capsule is inspired by a cortical minicolumn in the brain, whereby a vertically organised group of around 100 neurons receive common inputs, have common outputs, are interconnected, and may well constitute a fundamental computation unit of the cerebral cortex. However, Hinton’s paper on “Matrix Capsules with EM Routing” was unfortunately not accompanied by a release of source code, which left interested researchers attempting to implement the architecture and reproduce the benchmarks on their own. This has certainly slowed the progress of research building on this work. While writing our own implementation, we noticed several common mistakes in other open source implementations that we came across. In this paper we share some of these learnings, specifically focusing on three implementation pitfalls and how to avoid them: (1) parent capsules with only one child; (2) normalising the amount of data assigned to parent capsules; (3) parent capsules at different positions compete for child capsules. While our implementation is a considerable improvement over currently available implementations, it still falls slightly short of the performance reported by Hinton et al. (2018). The source code for this implementation is available on GitHub at the following URL: https://github.com/IBM/matrix-capsules-with-em-routing. |
Tasks | |
Published | 2019-07-01 |
URL | https://arxiv.org/abs/1907.00652v1 |
https://arxiv.org/pdf/1907.00652v1.pdf | |
PWC | https://paperswithcode.com/paper/avoiding-implementation-pitfalls-of-matrix |
Repo | https://github.com/IBM/matrix-capsules-with-em-routing |
Framework | tf |
Break the Ceiling: Stronger Multi-scale Deep Graph Convolutional Networks
Title | Break the Ceiling: Stronger Multi-scale Deep Graph Convolutional Networks |
Authors | Sitao Luan, Mingde Zhao, Xiao-Wen Chang, Doina Precup |
Abstract | Recently, neural network based approaches have achieved significant improvement for solving large, complex, graph-structured problems. However, their bottlenecks still need to be addressed, and the advantages of multi-scale information and deep architectures have not been sufficiently exploited. In this paper, we theoretically analyze how existing Graph Convolutional Networks (GCNs) have limited expressive power due to the constraint of the activation functions and their architectures. We generalize spectral graph convolution and deep GCN in block Krylov subspace forms and devise two architectures, both with the potential to be scaled deeper but each making use of the multi-scale information in different ways. We further show that the equivalence of these two architectures can be established under certain conditions. On several node classification tasks, with or without the help of validation, the two new architectures achieve better performance compared to many state-of-the-art methods. |
Tasks | Node Classification |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1906.02174v3 |
https://arxiv.org/pdf/1906.02174v3.pdf | |
PWC | https://paperswithcode.com/paper/break-the-ceiling-stronger-multi-scale-deep |
Repo | https://github.com/PwnerHarry/Stronger_GCN |
Framework | pytorch |
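The multi-scale idea generalizes one-hop spectral convolution by feeding a block Krylov basis [X, ÂX, Â²X, …] of the normalized adjacency Â through a learned map. A minimal single-layer sketch follows; the paper's snowball and truncated-Krylov architectures stack and wire such blocks in more structured ways, and the class name here is our own.

```python
import torch
import torch.nn as nn

class TruncatedKrylovLayer(nn.Module):
    """One multi-scale graph convolution over the block Krylov basis
    [X, AX, A^2 X, ..., A^{m-1} X] (an illustrative sketch)."""
    def __init__(self, in_dim, out_dim, m=4):
        super().__init__()
        self.m = m
        self.lin = nn.Linear(in_dim * m, out_dim)

    def forward(self, A_hat, X):
        # A_hat: (n, n) normalized adjacency; X: (n, in_dim) node features
        blocks, cur = [X], X
        for _ in range(self.m - 1):
            cur = A_hat @ cur              # one more hop of propagation
            blocks.append(cur)
        return torch.relu(self.lin(torch.cat(blocks, dim=1)))
```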
Attentive Deep Regression Networks for Real-Time Visual Face Tracking in Video Surveillance
Title | Attentive Deep Regression Networks for Real-Time Visual Face Tracking in Video Surveillance |
Authors | Safa Alver, Ugur Halici |
Abstract | Visual face tracking is one of the most important tasks in video surveillance systems. However, due to the variations in pose, scale, expression, and illumination it is considered to be a difficult task. Recent studies show that deep learning methods have a significant potential in object tracking tasks and adaptive feature selection methods can boost their performance. Motivated by these, we propose an end-to-end attentive deep learning based tracker, built on top of the state-of-the-art GOTURN tracker, for the task of real-time visual face tracking in video surveillance. Our method outperforms the state-of-the-art GOTURN and IVT trackers by very large margins, and it achieves speeds far beyond the requirements of real-time tracking. Additionally, to overcome the scarce data problem in visual face tracking, we also provide bounding box annotations for the G1 and G2 sets of the ChokePoint dataset and make it suitable for further studies in face tracking under surveillance conditions. |
Tasks | Feature Selection, Object Tracking |
Published | 2019-08-10 |
URL | https://arxiv.org/abs/1908.03812v1 |
https://arxiv.org/pdf/1908.03812v1.pdf | |
PWC | https://paperswithcode.com/paper/attentive-deep-regression-networks-for-real |
Repo | https://github.com/alversafa/chokepoint-bbs |
Framework | none |
Scale-Adaptive Neural Dense Features: Learning via Hierarchical Context Aggregation
Title | Scale-Adaptive Neural Dense Features: Learning via Hierarchical Context Aggregation |
Authors | Jaime Spencer, Richard Bowden, Simon Hadfield |
Abstract | How do computers and intelligent agents view the world around them? Feature extraction and representation constitutes one of the basic building blocks towards answering this question. Traditionally, this has been done with carefully engineered hand-crafted techniques such as HOG, SIFT or ORB. However, there is no “one size fits all” approach that satisfies all requirements. In recent years, the rising popularity of deep learning has resulted in a myriad of end-to-end solutions to many computer vision problems. These approaches, while successful, tend to lack scalability and can’t easily exploit information learned by other systems. Instead, we propose SAND features, a dedicated deep learning solution to feature extraction capable of providing hierarchical context information. This is achieved by employing sparse relative labels indicating relationships of similarity/dissimilarity between image locations. The nature of these labels results in an almost infinite set of dissimilar examples to choose from. We demonstrate how the selection of negative examples during training can be used to modify the feature space and vary its properties. To demonstrate the generality of this approach, we apply the proposed features to a multitude of tasks, each requiring different properties. This includes disparity estimation, semantic segmentation, self-localisation and SLAM. In all cases, we show how incorporating SAND features results in better or comparable results to the baseline, whilst requiring little to no additional training. Code can be found at: https://github.com/jspenmar/SAND_features |
Tasks | Disparity Estimation, Semantic Segmentation |
Published | 2019-03-25 |
URL | http://arxiv.org/abs/1903.10427v1 |
http://arxiv.org/pdf/1903.10427v1.pdf | |
PWC | https://paperswithcode.com/paper/scale-adaptive-neural-dense-features-learning |
Repo | https://github.com/jspenmar/SAND_features |
Framework | pytorch |
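To make the sparse relative labels concrete, here is a hedged sketch of a pixel-pair contrastive hinge loss over a dense feature map: similar locations are pulled together, dissimilar ones pushed beyond a margin. The pair-tensor layout and the loss form are our assumptions for illustration, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def pixel_contrastive_loss(feats, pos_pairs, neg_pairs, margin=1.0):
    """Contrastive loss from sparse similar/dissimilar location pairs
    (an illustrative sketch).

    feats:     (C, H, W) dense feature map for one image.
    pos_pairs: (P, 4) long tensor of (y1, x1, y2, x2) similar locations.
    neg_pairs: (Q, 4) long tensor of dissimilar locations.
    """
    def pair_dist(pairs):
        a = feats[:, pairs[:, 0], pairs[:, 1]]    # (C, P) features at loc 1
        b = feats[:, pairs[:, 2], pairs[:, 3]]    # (C, P) features at loc 2
        return (a - b).norm(dim=0)                # per-pair distance
    pos = pair_dist(pos_pairs) ** 2                   # similar: small distance
    neg = F.relu(margin - pair_dist(neg_pairs)) ** 2  # dissimilar: past margin
    return pos.mean() + neg.mean()
```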
Improving Robot Success Detection using Static Object Data
Title | Improving Robot Success Detection using Static Object Data |
Authors | Rosario Scalise, Jesse Thomason, Yonatan Bisk, Siddhartha Srinivasa |
Abstract | We use static object data to improve success detection for stacking objects on and nesting objects in one another. Such actions are necessary for certain robotics tasks, e.g., clearing a dining table or packing a warehouse bin. However, using an RGB-D camera to detect success can be insufficient: same-colored objects can be difficult to differentiate, and reflective silverware causes noisy depth-camera perception. We show that adding static data about the objects themselves improves the performance of an end-to-end pipeline for classifying action outcomes. Images of the objects, and language expressions describing them, encode prior geometry, shape, and size information that refine classification accuracy. We collect over 13 hours of egocentric manipulation data for training a model to reason about whether a robot successfully placed unseen objects in or on one another. The model achieves up to a 57% absolute gain over the task baseline on pairs of previously unseen objects. |
Tasks | |
Published | 2019-04-02 |
URL | https://arxiv.org/abs/1904.01650v2 |
https://arxiv.org/pdf/1904.01650v2.pdf | |
PWC | https://paperswithcode.com/paper/improving-robot-success-detection-using |
Repo | https://github.com/thomason-jesse/YCBLanguage |
Framework | pytorch |