July 29, 2019

3117 words 15 mins read

Paper Group AWR 148

Neural Discrete Representation Learning. Geometric Insights into Support Vector Machine Behavior using the KKT Conditions. Robust Estimation of Similarity Transformation for Visual Object Tracking. Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras. Multi-view Low-rank Sparse Subspace Cluste …

Neural Discrete Representation Learning

Title Neural Discrete Representation Learning
Authors Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu
Abstract Learning useful representations without supervision remains a key challenge in machine learning. In this paper, we propose a simple yet powerful generative model that learns such discrete representations. Our model, the Vector Quantised-Variational AutoEncoder (VQ-VAE), differs from VAEs in two key ways: the encoder network outputs discrete, rather than continuous, codes; and the prior is learnt rather than static. In order to learn a discrete latent representation, we incorporate ideas from vector quantisation (VQ). Using the VQ method allows the model to circumvent issues of “posterior collapse” – where the latents are ignored when they are paired with a powerful autoregressive decoder – typically observed in the VAE framework. Pairing these representations with an autoregressive prior, the model can generate high quality images, videos, and speech as well as doing high quality speaker conversion and unsupervised learning of phonemes, providing further evidence of the utility of the learnt representations.
Tasks Representation Learning
Published 2017-11-02
URL http://arxiv.org/abs/1711.00937v2
PDF http://arxiv.org/pdf/1711.00937v2.pdf
PWC https://paperswithcode.com/paper/neural-discrete-representation-learning
Repo https://github.com/jellycsc/softmax-vqvae
Framework pytorch
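
A minimal PyTorch sketch of the vector-quantisation step described in the abstract: encoder outputs are snapped to their nearest codebook entry, and the straight-through estimator copies gradients past the non-differentiable lookup. The codebook size, code dimension, and commitment weight below are illustrative assumptions, not the authors' settings.

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Nearest-codebook lookup with a straight-through gradient (sketch)."""
    def __init__(self, num_codes=512, code_dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta  # commitment-loss weight (assumed value)

    def forward(self, z_e):                               # z_e: (batch, code_dim)
        # distance of every encoder output to every codebook vector
        dists = torch.cdist(z_e, self.codebook.weight)    # (batch, num_codes)
        idx = dists.argmin(dim=1)                         # nearest code index
        z_q = self.codebook(idx)                          # quantised latents
        # codebook loss pulls codes toward encoder outputs; the commitment
        # loss pulls encoder outputs toward their chosen codes
        loss = ((z_q - z_e.detach()) ** 2).mean() \
             + self.beta * ((z_e - z_q.detach()) ** 2).mean()
        # straight-through estimator: copy gradients past the argmin
        z_q = z_e + (z_q - z_e).detach()
        return z_q, idx, loss
```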

Geometric Insights into Support Vector Machine Behavior using the KKT Conditions

Title Geometric Insights into Support Vector Machine Behavior using the KKT Conditions
Authors Iain Carmichael, J. S. Marron
Abstract The support vector machine (SVM) is a powerful and widely used classification algorithm. This paper uses the Karush-Kuhn-Tucker conditions to provide rigorous mathematical proof for new insights into the behavior of SVM. These insights provide perhaps unexpected relationships between SVM and two other linear classifiers: the mean difference and the maximal data piling direction. For example, we show that in many cases SVM can be viewed as a cropped version of these classifiers. By carefully exploring these connections we show how SVM tuning behavior is affected by characteristics including: balanced vs. unbalanced classes, low vs. high dimension, separable vs. non-separable data. These results provide further insights into tuning SVM via cross-validation by explaining observed pathological behavior and motivating improved cross-validation methodology. Finally, we also provide new results on the geometry of complete data piling directions in high dimensional space.
Tasks
Published 2017-04-03
URL http://arxiv.org/abs/1704.00767v2
PDF http://arxiv.org/pdf/1704.00767v2.pdf
PWC https://paperswithcode.com/paper/geometric-insights-into-support-vector
Repo https://github.com/idc9/svm_geometry
Framework none
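
As a hedged illustration of the kind of comparison the paper studies, the toy example below fits a linear SVM with scikit-learn and measures how closely its normal vector aligns with the mean-difference direction in a high-dimension, low-sample-size setting. The data and parameters are made up; this is not the authors' analysis code (see the repo above for that).

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# toy two-class data: high dimension, few samples per class
X_pos = rng.normal(loc=+1.0, scale=1.0, size=(20, 50))
X_neg = rng.normal(loc=-1.0, scale=1.0, size=(20, 50))
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 20 + [-1] * 20)

clf = SVC(kernel="linear", C=1.0).fit(X, y)
w_svm = clf.coef_.ravel()                        # SVM normal direction
w_md = X_pos.mean(axis=0) - X_neg.mean(axis=0)   # mean-difference direction

cosine = w_svm @ w_md / (np.linalg.norm(w_svm) * np.linalg.norm(w_md))
print(f"cosine(SVM direction, mean difference) = {cosine:.3f}")
print(f"number of support vectors: {clf.n_support_.sum()}")
```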

Robust Estimation of Similarity Transformation for Visual Object Tracking

Title Robust Estimation of Similarity Transformation for Visual Object Tracking
Authors Yang Li, Jianke Zhu, Steven C. H. Hoi, Wenjie Song, Zhefeng Wang, Hantang Liu
Abstract Most existing correlation filter-based tracking approaches only estimate simple axis-aligned bounding boxes, and very few of them are capable of recovering the underlying similarity transformation. To tackle this challenging problem, in this paper, we propose a new correlation filter-based tracker with a novel robust estimation of similarity transformation under large displacements. In order to efficiently search such a large 4-DoF space in real time, we decompose the problem into two 2-DoF sub-problems and apply an efficient Block Coordinate Descent solver to optimize the estimation result. Specifically, we employ an efficient phase correlation scheme to deal with both scale and rotation changes simultaneously in log-polar coordinates. Moreover, a variant of the correlation filter is used to predict the translational motion separately. Our experimental results demonstrate that the proposed tracker achieves very promising prediction performance compared with state-of-the-art visual object tracking methods, while still retaining the high efficiency and simplicity of conventional correlation filter-based tracking methods.
Tasks Object Tracking, Visual Object Tracking
Published 2017-12-14
URL http://arxiv.org/abs/1712.05231v2
PDF http://arxiv.org/pdf/1712.05231v2.pdf
PWC https://paperswithcode.com/paper/robust-estimation-of-similarity
Repo https://github.com/ihpdep/LDES
Framework none
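
To illustrate the phase-correlation-in-log-polar idea mentioned in the abstract (and not the paper's full Block Coordinate Descent solver), the sketch below approximately recovers a known rotation and scale between two images with scikit-image; the test image, radius, and upsampling factor are arbitrary choices.

```python
import numpy as np
from skimage.data import camera
from skimage.transform import rotate, rescale, warp_polar
from skimage.registration import phase_cross_correlation

image = camera()
target = rescale(rotate(image, 25), 1.2)        # rotate 25 deg, scale 1.2x

radius = 256
img_lp = warp_polar(image, radius=radius, scaling="log")
tgt_lp = warp_polar(target, radius=radius, scaling="log")

# a shift along the angular axis corresponds to rotation, along the
# log-radial axis to scale (recovered up to sign convention)
shifts, _, _ = phase_cross_correlation(img_lp, tgt_lp, upsample_factor=20)
shift_rot, shift_scale = shifts[:2]
recovered_rotation = (360.0 / img_lp.shape[0]) * shift_rot
klog = img_lp.shape[1] / np.log(radius)
recovered_scale = np.exp(shift_scale / klog)
print(f"rotation ~ {recovered_rotation:.1f} deg, scale ~ {recovered_scale:.2f}")
```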

Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras

Title Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras
Authors Keunwoo Choi, Deokjin Joo, Juho Kim
Abstract We introduce Kapre, Keras layers for audio and music signal preprocessing. Music research using deep neural networks requires a heavy and tedious preprocessing stage, for which audio processing parameters are often ignored in parameter optimisation. To solve this problem, Kapre implements time-frequency conversions, normalisation, and data augmentation as Keras layers. We report simple benchmark results, showing real-time on-GPU preprocessing adds a reasonable amount of computation.
Tasks Data Augmentation
Published 2017-06-19
URL http://arxiv.org/abs/1706.05781v1
PDF http://arxiv.org/pdf/1706.05781v1.pdf
PWC https://paperswithcode.com/paper/kapre-on-gpu-audio-preprocessing-layers-for-a
Repo https://github.com/keunwoochoi/kapre
Framework tf
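
The repository above documents Kapre's actual layer classes; the snippet below is only a hedged, library-agnostic sketch of the same idea, namely running the time-frequency conversion on the GPU as part of the Keras graph, here via tf.signal inside a Lambda layer. The frame sizes and the toy classifier head are illustrative assumptions.

```python
import tensorflow as tf

def log_stft(waveform, frame_length=512, frame_step=256):
    """Log-magnitude spectrogram computed on the GPU inside the model graph."""
    stft = tf.signal.stft(waveform, frame_length=frame_length, frame_step=frame_step)
    return tf.math.log(tf.abs(stft) + 1e-6)[..., tf.newaxis]   # add channel axis

model = tf.keras.Sequential([
    tf.keras.Input(shape=(16000,)),                  # 1 s of 16 kHz raw audio
    tf.keras.layers.Lambda(log_stft),                # time-frequency conversion as a layer
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.summary()
```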

Multi-view Low-rank Sparse Subspace Clustering

Title Multi-view Low-rank Sparse Subspace Clustering
Authors Maria Brbic, Ivica Kopriva
Abstract Most existing approaches address the multi-view subspace clustering problem by constructing an affinity matrix on each view separately and afterwards extending the spectral clustering algorithm to handle multi-view data. This paper presents an approach to multi-view subspace clustering that learns a joint subspace representation by constructing an affinity matrix shared among all views. Relying on the importance of both low-rank and sparsity constraints in the construction of the affinity matrix, we introduce an objective that balances agreement across different views while encouraging sparsity and low-rankness of the solution. The related low-rank and sparsity-constrained optimization problem is solved for each view using the alternating direction method of multipliers. Furthermore, we extend our approach to cluster data drawn from nonlinear subspaces by solving the corresponding problem in a reproducing kernel Hilbert space. The proposed algorithm outperforms state-of-the-art multi-view subspace clustering algorithms on one synthetic and four real-world datasets.
Tasks Multi-view Subspace Clustering
Published 2017-08-29
URL http://arxiv.org/abs/1708.08732v1
PDF http://arxiv.org/pdf/1708.08732v1.pdf
PWC https://paperswithcode.com/paper/multi-view-low-rank-sparse-subspace
Repo https://github.com/mbrbic/MultiViewLRSSC
Framework none
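
The ADMM solver lives in the linked repository; the hedged sketch below shows only the final stage shared by subspace clustering methods: turning a joint representation matrix (here a random stand-in for the learned low-rank sparse representation) into a symmetric affinity and running spectral clustering on it.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

n_samples, n_clusters = 120, 4
# stand-in for the joint low-rank + sparse representation C learned by ADMM
C = np.abs(np.random.default_rng(0).normal(size=(n_samples, n_samples)))
np.fill_diagonal(C, 0.0)

W = 0.5 * (C + C.T)                      # symmetric affinity shared across views
labels = SpectralClustering(
    n_clusters=n_clusters, affinity="precomputed", random_state=0
).fit_predict(W)
print(labels[:10])
```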

HDLTex: Hierarchical Deep Learning for Text Classification

Title HDLTex: Hierarchical Deep Learning for Text Classification
Authors Kamran Kowsari, Donald E. Brown, Mojtaba Heidarysafa, Kiana Jafari Meimandi, Matthew S. Gerber, Laura E. Barnes
Abstract The continually increasing number of documents produced each year necessitates ever improving information processing methods for searching, retrieving, and organizing text. Central to these information processing methods is document classification, which has become an important application for supervised learning. Recently the performance of these traditional classifiers has degraded as the number of documents has increased. This is because along with this growth in the number of documents has come an increase in the number of categories. This paper approaches this problem differently from current document classification methods that view the problem as multi-class classification. Instead we perform hierarchical classification using an approach we call Hierarchical Deep Learning for Text classification (HDLTex). HDLTex employs stacks of deep learning architectures to provide specialized understanding at each level of the document hierarchy.
Tasks Document Classification, Text Classification
Published 2017-09-24
URL http://arxiv.org/abs/1709.08267v2
PDF http://arxiv.org/pdf/1709.08267v2.pdf
PWC https://paperswithcode.com/paper/hdltex-hierarchical-deep-learning-for-text
Repo https://github.com/kk7nc/HDLTex
Framework tf
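
A hedged scikit-learn sketch of the hierarchical idea, with a parent-level classifier routing each document to a specialised child-level classifier; HDLTex itself uses stacks of deep networks, and the tiny corpus and labels below are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# toy corpus with (parent, child) labels; real HDLTex trains a deep net per level
docs     = ["convex optimization proofs", "neural network training tricks",
            "protein folding dynamics", "gene expression analysis"]
parents  = ["cs", "cs", "bio", "bio"]
children = ["math", "ml", "biophysics", "genomics"]

vec = TfidfVectorizer().fit(docs)
X = vec.transform(docs)

parent_clf = LogisticRegression(max_iter=1000).fit(X, parents)
child_clfs = {}
for p in set(parents):
    idx = [i for i, lab in enumerate(parents) if lab == p]
    child_clfs[p] = LogisticRegression(max_iter=1000).fit(X[idx], [children[i] for i in idx])

def classify(text):
    x = vec.transform([text])
    p = parent_clf.predict(x)[0]           # level 1: coarse category
    return p, child_clfs[p].predict(x)[0]  # level 2: specialised classifier

print(classify("training a neural network"))
```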

Wavelet Domain Residual Network (WavResNet) for Low-Dose X-ray CT Reconstruction

Title Wavelet Domain Residual Network (WavResNet) for Low-Dose X-ray CT Reconstruction
Authors Eunhee Kang, Junhong Min, Jong Chul Ye
Abstract Model based iterative reconstruction (MBIR) algorithms for low-dose X-ray CT are computationally complex because of the repeated use of the forward and backward projection. Inspired by the success of deep learning in computer vision applications, we recently proposed a deep convolutional neural network (CNN) for low-dose X-ray CT and won second place in the 2016 AAPM Low-Dose CT Grand Challenge. However, some of the texture was not fully recovered, which appeared unfamiliar to some radiologists. To cope with this problem, here we propose a direct residual learning approach in the directional wavelet domain that improves the performance over our previous work. In particular, the new network estimates the noise of each input wavelet transform, and the de-noised wavelet coefficients are then obtained by subtracting the noise from the input wavelet transform bands. The experimental results confirm that the proposed network has significantly improved performance, preserving the detail texture of the original images.
Tasks Low-Dose X-Ray Ct Reconstruction
Published 2017-03-04
URL http://arxiv.org/abs/1703.01383v1
PDF http://arxiv.org/pdf/1703.01383v1.pdf
PWC https://paperswithcode.com/paper/wavelet-domain-residual-network-wavresnet-for
Repo https://github.com/eunh/low_dose_CT
Framework none
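
A hedged sketch of the wavelet-domain residual idea: decompose the image, estimate the noise in the detail bands (here with simple soft-thresholding standing in for the trained CNN), subtract it, and reconstruct. PyWavelets is assumed, and the wavelet, threshold, and toy phantom are arbitrary.

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)
clean = np.zeros((128, 128)); clean[32:96, 32:96] = 1.0   # toy phantom
noisy = clean + 0.1 * rng.normal(size=clean.shape)        # stand-in for low-dose noise

cA, (cH, cV, cD) = pywt.dwt2(noisy, "db3")                # wavelet decomposition

def denoise_band(band, thr=0.05):
    """Stand-in for the CNN: estimate noise as the sub-threshold part and subtract it."""
    noise_estimate = np.clip(band, -thr, thr)
    return band - noise_estimate                          # residual subtraction

denoised = pywt.idwt2((cA, (denoise_band(cH), denoise_band(cV), denoise_band(cD))), "db3")
print("RMSE noisy   :", np.sqrt(np.mean((noisy - clean) ** 2)))
print("RMSE denoised:", np.sqrt(np.mean((denoised[:128, :128] - clean) ** 2)))
```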

Metric Learning for Generalizing Spatial Relations to New Objects

Title Metric Learning for Generalizing Spatial Relations to New Objects
Authors Oier Mees, Nichola Abdo, Mladen Mazuran, Wolfram Burgard
Abstract Human-centered environments are rich with a wide variety of spatial relations between everyday objects. For autonomous robots to operate effectively in such environments, they should be able to reason about these relations and generalize them to objects with different shapes and sizes. For example, having learned to place a toy inside a basket, a robot should be able to generalize this concept using a spoon and a cup. This requires a robot to have the flexibility to learn arbitrary relations in a lifelong manner, making it challenging for an expert to pre-program it with sufficient knowledge to do so beforehand. In this paper, we address the problem of learning spatial relations by introducing a novel method from the perspective of distance metric learning. Our approach enables a robot to reason about the similarity between pairwise spatial relations, thereby enabling it to use its previous knowledge when presented with a new relation to imitate. We show how this makes it possible to learn arbitrary spatial relations from non-expert users using a small number of examples and in an interactive manner. Our extensive evaluation with real-world data demonstrates the effectiveness of our method in reasoning about a continuous spectrum of spatial relations and generalizing them to new objects.
Tasks Metric Learning
Published 2017-03-06
URL http://arxiv.org/abs/1703.01946v3
PDF http://arxiv.org/pdf/1703.01946v3.pdf
PWC https://paperswithcode.com/paper/metric-learning-for-generalizing-spatial
Repo https://github.com/mees/generalize_spatial_relations
Framework none
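
A hedged PyTorch sketch of the distance-metric-learning ingredient: an embedding network trained with a triplet margin loss so that similar spatial relations land closer together than dissimilar ones. The feature dimensions and random tensors are placeholders, not the paper's scene representation.

```python
import torch
import torch.nn as nn

embed = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
criterion = nn.TripletMarginLoss(margin=1.0)
optim = torch.optim.Adam(embed.parameters(), lr=1e-3)

# placeholder features for (anchor, positive, negative) relation examples,
# e.g. two "inside" scenes and one "next to" scene
anchor, positive, negative = (torch.randn(8, 32) for _ in range(3))

for _ in range(100):
    loss = criterion(embed(anchor), embed(positive), embed(negative))
    optim.zero_grad()
    loss.backward()
    optim.step()

# at test time, distances in the learned space rank previously seen relations
dist = torch.norm(embed(anchor) - embed(positive), dim=1)
print(dist.detach())
```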

ICNet for Real-Time Semantic Segmentation on High-Resolution Images

Title ICNet for Real-Time Semantic Segmentation on High-Resolution Images
Authors Hengshuang Zhao, Xiaojuan Qi, Xiaoyong Shen, Jianping Shi, Jiaya Jia
Abstract We focus on the challenging task of real-time semantic segmentation in this paper. It has many practical applications, yet poses the fundamental difficulty of reducing a large portion of the computation required for pixel-wise label inference. We propose an image cascade network (ICNet) that incorporates multi-resolution branches under proper label guidance to address this challenge. We provide in-depth analysis of our framework and introduce the cascade feature fusion unit to quickly achieve high-quality segmentation. Our system yields real-time inference on a single GPU card with decent quality results evaluated on challenging datasets like Cityscapes, CamVid and COCO-Stuff.
Tasks Real-Time Semantic Segmentation, Semantic Segmentation
Published 2017-04-27
URL http://arxiv.org/abs/1704.08545v2
PDF http://arxiv.org/pdf/1704.08545v2.pdf
PWC https://paperswithcode.com/paper/icnet-for-real-time-semantic-segmentation-on
Repo https://github.com/osmr/imgclsmob
Framework mxnet
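
A hedged PyTorch sketch of the multi-resolution cascade idea: deeper processing on a heavily downsampled copy of the image, refined by shallower branches at higher resolutions and fused by upsampling and addition. Channel counts and depths are arbitrary; this is not the actual ICNet architecture or its cascade feature fusion unit.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCascadeSeg(nn.Module):
    def __init__(self, num_classes=19):
        super().__init__()
        self.low  = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())  # 1/4 res, deeper
        self.mid  = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())   # 1/2 res
        self.high = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())   # full res, shallow
        self.proj_low = nn.Conv2d(32, 16, 1)
        self.head = nn.Conv2d(16, num_classes, 1)

    def forward(self, x):
        x_mid = F.interpolate(x, scale_factor=0.5, mode="bilinear", align_corners=False)
        x_low = F.interpolate(x, scale_factor=0.25, mode="bilinear", align_corners=False)
        f_low = self.proj_low(self.low(x_low))
        f_mid = self.mid(x_mid) + F.interpolate(f_low, size=x_mid.shape[2:], mode="bilinear", align_corners=False)
        f_high = self.high(x) + F.interpolate(f_mid, size=x.shape[2:], mode="bilinear", align_corners=False)
        return self.head(f_high)                  # per-pixel class logits

logits = TinyCascadeSeg()(torch.randn(1, 3, 128, 256))
print(logits.shape)   # (1, 19, 128, 256)
```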

Neural Motifs: Scene Graph Parsing with Global Context

Title Neural Motifs: Scene Graph Parsing with Global Context
Authors Rowan Zellers, Mark Yatskar, Sam Thomson, Yejin Choi
Abstract We investigate the problem of producing structured graph representations of visual scenes. Our work analyzes the role of motifs: regularly appearing substructures in scene graphs. We present new quantitative insights on such repeated structures in the Visual Genome dataset. Our analysis shows that object labels are highly predictive of relation labels but not vice-versa. We also find that there are recurring patterns even in larger subgraphs: more than 50% of graphs contain motifs involving at least two relations. Our analysis motivates a new baseline: given object detections, predict the most frequent relation between object pairs with the given labels, as seen in the training set. This baseline improves on the previous state-of-the-art by an average of 3.6% relative improvement across evaluation settings. We then introduce Stacked Motif Networks, a new architecture designed to capture higher order motifs in scene graphs that further improves over our strong baseline by an average 7.1% relative gain. Our code is available at github.com/rowanz/neural-motifs.
Tasks
Published 2017-11-17
URL http://arxiv.org/abs/1711.06640v2
PDF http://arxiv.org/pdf/1711.06640v2.pdf
PWC https://paperswithcode.com/paper/neural-motifs-scene-graph-parsing-with-global
Repo https://github.com/rowanz/neural-motifs
Framework pytorch
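
A hedged sketch of the frequency baseline described in the abstract: record, from training annotations, the most frequent relation for each ordered pair of object labels and predict it at test time. The handful of triples below stands in for Visual Genome.

```python
from collections import Counter, defaultdict

# stand-in training triples (subject label, relation, object label)
train_triples = [
    ("man", "wearing", "shirt"), ("man", "wearing", "hat"),
    ("man", "riding", "horse"),  ("man", "wearing", "shirt"),
    ("dog", "on", "bench"),      ("dog", "near", "bench"), ("dog", "on", "bench"),
]

freq = defaultdict(Counter)
for subj, rel, obj in train_triples:
    freq[(subj, obj)][rel] += 1

def predict_relation(subj_label, obj_label):
    """Frequency baseline: most common training relation for this label pair."""
    counts = freq.get((subj_label, obj_label))
    return counts.most_common(1)[0][0] if counts else None

print(predict_relation("man", "shirt"))   # -> "wearing"
print(predict_relation("dog", "bench"))   # -> "on"
```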

Deep Learning for Target Classification from SAR Imagery: Data Augmentation and Translation Invariance

Title Deep Learning for Target Classification from SAR Imagery: Data Augmentation and Translation Invariance
Authors Hidetoshi Furukawa
Abstract This report deals with translation invariance of convolutional neural networks (CNNs) for automatic target recognition (ATR) from synthetic aperture radar (SAR) imagery. In particular, the translation invariance of CNNs for SAR ATR represents the robustness against misalignment of target chips extracted from SAR images. To understand the translation invariance of the CNNs, we trained CNNs that classify target chips from the MSTAR dataset into ten classes, both with and without data augmentation, and then visualized their translation invariance. According to our results, even with a deep residual network, the translation invariance of a CNN trained without data augmentation on aligned images such as the MSTAR target chips is limited. A more important factor for translation invariance is the use of augmented training data. Furthermore, our CNN using augmented training data achieved a state-of-the-art classification accuracy of 99.6%. These results show the importance of domain-specific data augmentation.
Tasks Data Augmentation
Published 2017-08-26
URL http://arxiv.org/abs/1708.07920v1
PDF http://arxiv.org/pdf/1708.07920v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-for-target-classification-from
Repo https://github.com/singh-shakti94/Deep-Learning-Project
Framework tf
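
A hedged sketch of the kind of translation augmentation the report studies: taking random crops of an oversized SAR chip so the target appears off-centre during training. The chip size, crop size, and shift range are assumptions, not the report's exact settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_translation_crop(chip, out_size=88, max_shift=8):
    """Crop an out_size x out_size window at a random offset from the centre."""
    h, w = chip.shape
    cy, cx = (h - out_size) // 2, (w - out_size) // 2
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    y0 = np.clip(cy + dy, 0, h - out_size)
    x0 = np.clip(cx + dx, 0, w - out_size)
    return chip[y0:y0 + out_size, x0:x0 + out_size]

chip = rng.normal(size=(128, 128))          # stand-in for a 128x128 target chip
augmented = [random_translation_crop(chip) for _ in range(4)]
print(augmented[0].shape)                   # (88, 88)
```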

Underwater Multi-Robot Convoying using Visual Tracking by Detection

Title Underwater Multi-Robot Convoying using Visual Tracking by Detection
Authors Florian Shkurti, Wei-Di Chang, Peter Henderson, Md Jahidul Islam, Juan Camilo Gamboa Higuera, Jimmy Li, Travis Manderson, Anqi Xu, Gregory Dudek, Junaed Sattar
Abstract We present a robust multi-robot convoying approach that relies on visual detection of the leading agent, thus enabling target following in unstructured 3-D environments. Our method is based on the idea of tracking-by-detection, which interleaves efficient model-based object detection with temporal filtering of image-based bounding box estimation. This approach has the important advantage of mitigating tracking drift (i.e. drifting away from the target object), which is a common symptom of model-free trackers and is detrimental to sustained convoying in practice. To illustrate our solution, we collected extensive footage of an underwater robot in ocean settings, and hand-annotated its location in each frame. Based on this dataset, we present an empirical comparison of multiple tracker variants, including the use of several convolutional neural networks, both with and without recurrent connections, as well as frequency-based model-free trackers. We also demonstrate the practicality of this tracking-by-detection strategy in real-world scenarios by successfully controlling a legged underwater robot in five degrees of freedom to follow another robot’s independent motion.
Tasks Object Detection, Visual Tracking
Published 2017-09-25
URL http://arxiv.org/abs/1709.08292v1
PDF http://arxiv.org/pdf/1709.08292v1.pdf
PWC https://paperswithcode.com/paper/underwater-multi-robot-convoying-using-visual
Repo https://github.com/Breakend/TemporalYolo
Framework tf
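
A hedged sketch of the tracking-by-detection loop: run a detector on every frame and temporally filter the returned bounding box, here with a simple exponential moving average standing in for the paper's filtering, falling back to the previous estimate when detection fails.

```python
import numpy as np

def smooth_track(detections, alpha=0.6):
    """Blend each new detection (x, y, w, h) with the running estimate.

    `detections` is a list of per-frame boxes, or None for a missed detection.
    """
    state, track = None, []
    for det in detections:
        if det is not None:
            det = np.asarray(det, dtype=float)
            state = det if state is None else alpha * det + (1 - alpha) * state
        track.append(None if state is None else state.copy())
    return track

# stand-in per-frame detector outputs, with one dropped frame
dets = [(100, 80, 40, 30), (104, 82, 40, 30), None, (112, 85, 42, 31)]
for i, box in enumerate(smooth_track(dets)):
    print(f"frame {i}: {box}")
```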

Maximum Classifier Discrepancy for Unsupervised Domain Adaptation

Title Maximum Classifier Discrepancy for Unsupervised Domain Adaptation
Authors Kuniaki Saito, Kohei Watanabe, Yoshitaka Ushiku, Tatsuya Harada
Abstract In this work, we present a method for unsupervised domain adaptation. Many adversarial learning methods train domain classifier networks to distinguish the features as either a source or target and train a feature generator network to mimic the discriminator. Two problems exist with these methods. First, the domain classifier only tries to distinguish the features as a source or target and thus does not consider task-specific decision boundaries between classes. Therefore, a trained generator can generate ambiguous features near class boundaries. Second, these methods aim to completely match the feature distributions between different domains, which is difficult because of each domain’s characteristics. To solve these problems, we introduce a new approach that attempts to align distributions of source and target by utilizing the task-specific decision boundaries. We propose to maximize the discrepancy between two classifiers’ outputs to detect target samples that are far from the support of the source. A feature generator learns to generate target features near the support to minimize the discrepancy. Our method outperforms other methods on several datasets of image classification and semantic segmentation. The codes are available at \url{https://github.com/mil-tokyo/MCD_DA}
Tasks Domain Adaptation, Image Classification, Semantic Segmentation, Unsupervised Domain Adaptation
Published 2017-12-07
URL http://arxiv.org/abs/1712.02560v4
PDF http://arxiv.org/pdf/1712.02560v4.pdf
PWC https://paperswithcode.com/paper/maximum-classifier-discrepancy-for
Repo https://github.com/YouYueHuang/CycleGAN_Unsupervised_Domain_Adaptation
Framework pytorch
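
A hedged PyTorch sketch of the classifier-discrepancy loss at the core of the method: the mean absolute difference between the two classifiers' softmax outputs on target samples, which one training step maximizes with respect to the classifiers and another minimizes with respect to the feature generator. Network sizes are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def discrepancy(p1, p2):
    """Mean absolute difference between two softmax predictions."""
    return (p1 - p2).abs().mean()

G  = nn.Sequential(nn.Linear(100, 64), nn.ReLU())   # feature generator (placeholder)
C1 = nn.Linear(64, 10)                               # classifier 1
C2 = nn.Linear(64, 10)                               # classifier 2

x_tgt = torch.randn(32, 100)                         # unlabeled target batch
feat = G(x_tgt)
p1 = F.softmax(C1(feat), dim=1)
p2 = F.softmax(C2(feat), dim=1)

d = discrepancy(p1, p2)
# step B: update C1, C2 to *maximize* d (keeping G fixed)
# step C: update G to *minimize* d (keeping C1, C2 fixed)
print(float(d))
```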

Is Second-order Information Helpful for Large-scale Visual Recognition?

Title Is Second-order Information Helpful for Large-scale Visual Recognition?
Authors Peihua Li, Jiangtao Xie, Qilong Wang, Wangmeng Zuo
Abstract By stacking layers of convolution and nonlinearity, convolutional networks (ConvNets) effectively learn from low-level to high-level features and discriminative representations. Since the end goal of large-scale recognition is to delineate complex boundaries of thousands of classes, adequate exploration of feature distributions is important for realizing the full potential of ConvNets. However, state-of-the-art works concentrate only on deeper or wider architecture design, while rarely exploring feature statistics higher than first-order. We take a step towards addressing this problem. Our method applies covariance pooling, instead of the most commonly used first-order pooling, to high-level convolutional features. The main challenges involved are robust covariance estimation given a small sample of large-dimensional features and use of the manifold structure of covariance matrices. To address these challenges, we present a Matrix Power Normalized Covariance (MPN-COV) method. We develop forward and backward propagation formulas regarding the nonlinear matrix functions such that MPN-COV can be trained end-to-end. In addition, we analyze both qualitatively and quantitatively its advantage over the well-known Log-Euclidean metric. On the ImageNet 2012 validation set, by combining MPN-COV we achieve over 4%, 3% and 2.5% gains for AlexNet, VGG-M and VGG-16, respectively; integration of MPN-COV into 50-layer ResNet outperforms ResNet-101 and is comparable to ResNet-152. The source code will be available on the project page: http://www.peihuali.org/MPN-COV
Tasks Object Recognition
Published 2017-03-23
URL http://arxiv.org/abs/1703.08050v3
PDF http://arxiv.org/pdf/1703.08050v3.pdf
PWC https://paperswithcode.com/paper/is-second-order-information-helpful-for-large
Repo https://github.com/jiangtaoxie/MPN-COV
Framework tf
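
A hedged PyTorch sketch of matrix-power-normalized covariance pooling: compute the covariance of the spatial feature vectors from the last convolutional layer, then raise it to a matrix power (0.5 gives the matrix square root) via an eigendecomposition. The paper derives dedicated forward/backward formulas; this sketch simply relies on autograd through torch.linalg.eigh, and the dimensions are placeholders.

```python
import torch

def mpn_cov(features, alpha=0.5, eps=1e-5):
    """features: (batch, channels, H, W) -> (batch, channels, channels) power of covariance."""
    b, c, h, w = features.shape
    x = features.reshape(b, c, h * w)
    x = x - x.mean(dim=2, keepdim=True)                 # center spatial samples
    cov = x @ x.transpose(1, 2) / (h * w - 1)           # per-image covariance (c x c)
    # matrix power via eigendecomposition of the symmetric covariance
    eigvals, eigvecs = torch.linalg.eigh(cov)
    eigvals = eigvals.clamp_min(eps) ** alpha           # alpha=0.5: matrix square root
    return eigvecs @ torch.diag_embed(eigvals) @ eigvecs.transpose(1, 2)

feats = torch.randn(2, 64, 7, 7)                        # stand-in conv feature maps
pooled = mpn_cov(feats)
print(pooled.shape)                                     # (2, 64, 64) second-order descriptor
```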

Deep Steering: Learning End-to-End Driving Model from Spatial and Temporal Visual Cues

Title Deep Steering: Learning End-to-End Driving Model from Spatial and Temporal Visual Cues
Authors Lu Chi, Yadong Mu
Abstract In recent years, autonomous driving algorithms using low-cost vehicle-mounted cameras have attracted increasing endeavors from both academia and industry. There are multiple fronts to these endeavors, including object detection on roads, 3-D reconstruction etc., but in this work we focus on a vision-based model that directly maps raw input images to steering angles using deep networks. This represents a nascent research topic in computer vision. The technical contributions of this work are three-fold. First, the model is learned and evaluated on real human driving videos that are time-synchronized with other vehicle sensors. This differs from many prior models trained from synthetic data in racing games. Second, state-of-the-art models, such as PilotNet, mostly predict the wheel angles independently on each video frame, which contradicts common understanding of driving as a stateful process. Instead, our proposed model strikes a combination of spatial and temporal cues, jointly investigating instantaneous monocular camera observations and vehicle’s historical states. This is in practice accomplished by inserting carefully-designed recurrent units (e.g., LSTM and Conv-LSTM) at proper network layers. Third, to facilitate the interpretability of the learned model, we utilize a visual back-propagation scheme for discovering and visualizing image regions crucially influencing the final steering prediction. Our experimental study is based on about 6 hours of human driving data provided by Udacity. Comprehensive quantitative evaluations demonstrate the effectiveness and robustness of our model, even under scenarios like drastic lighting changes and abrupt turning. The comparison with other state-of-the-art models clearly reveals its superior performance in predicting the due wheel angle for a self-driving car.
Tasks Autonomous Driving, Object Detection
Published 2017-08-12
URL http://arxiv.org/abs/1708.03798v1
PDF http://arxiv.org/pdf/1708.03798v1.pdf
PWC https://paperswithcode.com/paper/deep-steering-learning-end-to-end-driving
Repo https://github.com/abhileshborode/Behavorial-Clonng-Self-driving-cars
Framework tf
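
A hedged PyTorch sketch of the spatial-plus-temporal idea: a small per-frame CNN feeds an LSTM over the last few frames, and the final hidden state regresses the steering angle. Layer sizes and the input resolution are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SteeringNet(nn.Module):
    def __init__(self, feat_dim=128, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(                       # per-frame spatial features
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)  # temporal cue
        self.head = nn.Linear(hidden, 1)                          # steering angle

    def forward(self, frames):                          # frames: (batch, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])                    # angle from the last time step

angles = SteeringNet()(torch.randn(2, 5, 3, 64, 64))
print(angles.shape)                                     # (2, 1)
```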