Paper Group AWR 148
Neural Discrete Representation Learning. Geometric Insights into Support Vector Machine Behavior using the KKT Conditions. Robust Estimation of Similarity Transformation for Visual Object Tracking. Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras. Multi-view Low-rank Sparse Subspace Clustering. HDLTex: Hierarchical Deep Learning for Text Classification. Wavelet Domain Residual Network (WavResNet) for Low-Dose X-ray CT Reconstruction. Metric Learning for Generalizing Spatial Relations to New Objects. ICNet for Real-Time Semantic Segmentation on High-Resolution Images. Neural Motifs: Scene Graph Parsing with Global Context. Deep Learning for Target Classification from SAR Imagery: Data Augmentation and Translation Invariance. Underwater Multi-Robot Convoying using Visual Tracking by Detection. Maximum Classifier Discrepancy for Unsupervised Domain Adaptation. Is Second-order Information Helpful for Large-scale Visual Recognition? Deep Steering: Learning End-to-End Driving Model from Spatial and Temporal Visual Cues.
Neural Discrete Representation Learning
Title | Neural Discrete Representation Learning |
Authors | Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu |
Abstract | Learning useful representations without supervision remains a key challenge in machine learning. In this paper, we propose a simple yet powerful generative model that learns such discrete representations. Our model, the Vector Quantised-Variational AutoEncoder (VQ-VAE), differs from VAEs in two key ways: the encoder network outputs discrete, rather than continuous, codes; and the prior is learnt rather than static. In order to learn a discrete latent representation, we incorporate ideas from vector quantisation (VQ). Using the VQ method allows the model to circumvent issues of “posterior collapse” – where the latents are ignored when they are paired with a powerful autoregressive decoder – typically observed in the VAE framework. Pairing these representations with an autoregressive prior, the model can generate high quality images, videos, and speech as well as doing high quality speaker conversion and unsupervised learning of phonemes, providing further evidence of the utility of the learnt representations. |
Tasks | Representation Learning |
Published | 2017-11-02 |
URL | http://arxiv.org/abs/1711.00937v2 |
PDF | http://arxiv.org/pdf/1711.00937v2.pdf |
PWC | https://paperswithcode.com/paper/neural-discrete-representation-learning |
Repo | https://github.com/jellycsc/softmax-vqvae |
Framework | pytorch |
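The core of VQ-VAE is a nearest-neighbour codebook lookup trained with a straight-through gradient. Below is a minimal PyTorch sketch of such a quantisation layer; codebook size, code dimension, and the commitment weight `beta` are illustrative choices rather than the paper's exact settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Nearest-neighbour codebook lookup with a straight-through gradient."""
    def __init__(self, num_codes=512, code_dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta

    def forward(self, z_e):                                   # z_e: (batch, code_dim)
        d = torch.cdist(z_e, self.codebook.weight)            # distances to all codes
        idx = d.argmin(dim=1)                                 # nearest code per input
        z_q = self.codebook(idx)                              # quantised latents
        # codebook loss + commitment loss (sketch of the paper's training objective)
        loss = F.mse_loss(z_q, z_e.detach()) + self.beta * F.mse_loss(z_e, z_q.detach())
        # straight-through estimator: gradients flow to the encoder as if z_q == z_e
        z_q = z_e + (z_q - z_e).detach()
        return z_q, idx, loss

z_e = torch.randn(8, 64)                                      # stand-in encoder outputs
z_q, idx, vq_loss = VectorQuantizer()(z_e)
print(z_q.shape, idx.shape, float(vq_loss))
```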
Geometric Insights into Support Vector Machine Behavior using the KKT Conditions
Title | Geometric Insights into Support Vector Machine Behavior using the KKT Conditions |
Authors | Iain Carmichael, J. S. Marron |
Abstract | The support vector machine (SVM) is a powerful and widely used classification algorithm. This paper uses the Karush-Kuhn-Tucker conditions to provide rigorous mathematical proof for new insights into the behavior of SVM. These insights provide perhaps unexpected relationships between SVM and two other linear classifiers: the mean difference and the maximal data piling direction. For example, we show that in many cases SVM can be viewed as a cropped version of these classifiers. By carefully exploring these connections we show how SVM tuning behavior is affected by characteristics including: balanced vs. unbalanced classes, low vs. high dimension, separable vs. non-separable data. These results provide further insights into tuning SVM via cross-validation by explaining observed pathological behavior and motivating improved cross-validation methodology. Finally, we also provide new results on the geometry of complete data piling directions in high dimensional space. |
Tasks | |
Published | 2017-04-03 |
URL | http://arxiv.org/abs/1704.00767v2 |
PDF | http://arxiv.org/pdf/1704.00767v2.pdf |
PWC | https://paperswithcode.com/paper/geometric-insights-into-support-vector |
Repo | https://github.com/idc9/svm_geometry |
Framework | none |
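The paper's analysis rests on the KKT conditions of the soft-margin SVM. As a hedged illustration (toy Gaussian data, scikit-learn's SVC, loose numerical tolerances), the snippet below checks the box constraint and complementary slackness and compares the fitted SVM direction with the mean-difference direction discussed in the paper.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y = np.r_[-np.ones(50), np.ones(50)]

C = 1.0
clf = SVC(kernel="linear", C=C).fit(X, y)
alpha = np.abs(clf.dual_coef_).ravel()            # dual variables of the support vectors

# KKT box constraint: 0 <= alpha_i <= C
print("box constraint holds:", bool(np.all((alpha >= -1e-8) & (alpha <= C + 1e-8))))

# complementary slackness: support vectors with 0 < alpha_i < C lie exactly on the margin
margins = y[clf.support_] * clf.decision_function(X[clf.support_])
interior = (alpha > 1e-6) & (alpha < C - 1e-6)
print("interior SVs sit on the margin:", bool(np.allclose(margins[interior], 1.0, atol=1e-2)))

# relate the SVM normal direction to the mean-difference direction
w = clf.coef_.ravel()
md = X[y == 1].mean(axis=0) - X[y == -1].mean(axis=0)
cos = w @ md / (np.linalg.norm(w) * np.linalg.norm(md))
print("cosine(SVM direction, mean difference):", round(float(cos), 3))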
Robust Estimation of Similarity Transformation for Visual Object Tracking
Title | Robust Estimation of Similarity Transformation for Visual Object Tracking |
Authors | Yang Li, Jianke Zhu, Steven C. H. Hoi, Wenjie Song, Zhefeng Wang, Hantang Liu |
Abstract | Most existing correlation filter-based tracking approaches only estimate simple axis-aligned bounding boxes, and very few of them are capable of recovering the underlying similarity transformation. To tackle this challenging problem, in this paper we propose a new correlation filter-based tracker with a novel robust estimation of similarity transformation under large displacements. In order to efficiently search such a large 4-DoF space in real time, we formulate the problem as two 2-DoF sub-problems and apply an efficient block coordinate descent solver to optimize the estimate. Specifically, we employ an efficient phase correlation scheme to deal with both scale and rotation changes simultaneously in log-polar coordinates. Moreover, a variant of correlation filter is used to predict the translational motion individually. Our experimental results demonstrate that the proposed tracker achieves very promising prediction performance compared with state-of-the-art visual object tracking methods while still retaining the advantages of high efficiency and simplicity of conventional correlation filter-based tracking methods. |
Tasks | Object Tracking, Visual Object Tracking |
Published | 2017-12-14 |
URL | http://arxiv.org/abs/1712.05231v2 |
PDF | http://arxiv.org/pdf/1712.05231v2.pdf |
PWC | https://paperswithcode.com/paper/robust-estimation-of-similarity |
Repo | https://github.com/ihpdep/LDES |
Framework | none |
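The 4-DoF search relies on phase correlation; the paper applies it in log-polar coordinates so that scale and rotation turn into translations. The sketch below shows only the underlying translational phase correlation in plain NumPy; the log-polar warp and the correlation-filter components are omitted.

```python
import numpy as np

def phase_correlation(a, b):
    """Estimate the integer translation between two same-sized images."""
    A, B = np.fft.fft2(a), np.fft.fft2(b)
    R = A * np.conj(B)
    R /= np.abs(R) + 1e-12                     # cross-power spectrum
    corr = np.fft.ifft2(R).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # wrap shifts that correspond to negative displacements
    if dy > a.shape[0] // 2:
        dy -= a.shape[0]
    if dx > a.shape[1] // 2:
        dx -= a.shape[1]
    return dy, dx

img = np.random.default_rng(0).random((128, 128))
shifted = np.roll(np.roll(img, 5, axis=0), -9, axis=1)
print(phase_correlation(shifted, img))         # -> (5, -9)
```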
Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras
Title | Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras |
Authors | Keunwoo Choi, Deokjin Joo, Juho Kim |
Abstract | We introduce Kapre, Keras layers for audio and music signal preprocessing. Music research using deep neural networks requires a heavy and tedious preprocessing stage, for which audio processing parameters are often ignored in parameter optimisation. To solve this problem, Kapre implements time-frequency conversions, normalisation, and data augmentation as Keras layers. We report simple benchmark results, showing real-time on-GPU preprocessing adds a reasonable amount of computation. |
Tasks | Data Augmentation |
Published | 2017-06-19 |
URL | http://arxiv.org/abs/1706.05781v1 |
PDF | http://arxiv.org/pdf/1706.05781v1.pdf |
PWC | https://paperswithcode.com/paper/kapre-on-gpu-audio-preprocessing-layers-for-a |
Repo | https://github.com/keunwoochoi/kapre |
Framework | tf |
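Kapre's idea is to make time-frequency conversion a layer inside the Keras graph so it runs on the GPU next to the model. The sketch below reimplements that idea with a custom layer around `tf.signal.stft` rather than kapre's own API; the layer name and parameters are illustrative, and the real layers live in the linked kapre repository.

```python
import tensorflow as tf

class LogSpectrogram(tf.keras.layers.Layer):
    """On-GPU log-magnitude STFT, in the spirit of kapre's preprocessing layers."""
    def __init__(self, frame_length=1024, frame_step=256, **kwargs):
        super().__init__(**kwargs)
        self.frame_length = frame_length
        self.frame_step = frame_step

    def call(self, waveforms):                                     # (batch, samples)
        stft = tf.signal.stft(waveforms, self.frame_length, self.frame_step)
        return tf.math.log(tf.abs(stft) + 1e-6)[..., tf.newaxis]   # (batch, frames, bins, 1)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(16000,)),        # one second of 16 kHz audio
    LogSpectrogram(),                      # preprocessing happens inside the model, on GPU
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.summary()
```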
Multi-view Low-rank Sparse Subspace Clustering
Title | Multi-view Low-rank Sparse Subspace Clustering |
Authors | Maria Brbic, Ivica Kopriva |
Abstract | Most existing approaches address the multi-view subspace clustering problem by constructing an affinity matrix on each view separately and then propose how to extend the spectral clustering algorithm to handle multi-view data. This paper presents an approach to multi-view subspace clustering that learns a joint subspace representation by constructing an affinity matrix shared among all views. Relying on the importance of both low-rank and sparsity constraints in the construction of the affinity matrix, we introduce an objective that balances agreement across different views while at the same time encouraging sparsity and low-rankness of the solution. The related low-rank and sparsity constrained optimization problem is solved for each view using the alternating direction method of multipliers. Furthermore, we extend our approach to cluster data drawn from nonlinear subspaces by solving the corresponding problem in a reproducing kernel Hilbert space. The proposed algorithm outperforms state-of-the-art multi-view subspace clustering algorithms on one synthetic and four real-world datasets. |
Tasks | Multi-view Subspace Clustering |
Published | 2017-08-29 |
URL | http://arxiv.org/abs/1708.08732v1 |
PDF | http://arxiv.org/pdf/1708.08732v1.pdf |
PWC | https://paperswithcode.com/paper/multi-view-low-rank-sparse-subspace |
Repo | https://github.com/mbrbic/MultiViewLRSSC |
Framework | none |
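A very loose sketch of the ingredients: the proximal operators behind the sparsity and low-rank terms (soft thresholding and singular value thresholding), a crude ridge-style per-view self-representation in place of the paper's ADMM solver and view-agreement term, and spectral clustering on the shared affinity. The toy data and regularisation values are assumptions for illustration, not the paper's algorithm; see the linked repository for the reference implementation.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def soft_threshold(A, tau):
    """Proximal operator of the l1 norm (promotes sparsity)."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def svt(A, tau):
    """Singular value thresholding: proximal operator of the nuclear norm (promotes low rank)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

# toy two-view data: the same 3 clusters observed through different random projections
rng = np.random.default_rng(0)
Z = np.repeat(np.eye(3), 20, axis=0) @ rng.normal(size=(3, 10))
views = [Z @ rng.normal(size=(10, d)) for d in (15, 25)]

# crude per-view self-representation X ~= C X (ridge-regularised), then a shared affinity
affinity = np.zeros((60, 60))
for X in views:
    C = np.linalg.lstsq(X @ X.T + 0.1 * np.eye(60), X @ X.T, rcond=None)[0]
    C = soft_threshold(svt(C, 0.05), 0.01)          # encourage low rank and sparsity
    affinity += np.abs(C) + np.abs(C).T
labels = SpectralClustering(n_clusters=3, affinity="precomputed", random_state=0).fit_predict(affinity)
print(labels)
```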
HDLTex: Hierarchical Deep Learning for Text Classification
Title | HDLTex: Hierarchical Deep Learning for Text Classification |
Authors | Kamran Kowsari, Donald E. Brown, Mojtaba Heidarysafa, Kiana Jafari Meimandi, Matthew S. Gerber, Laura E. Barnes |
Abstract | The continually increasing number of documents produced each year necessitates ever improving information processing methods for searching, retrieving, and organizing text. Central to these information processing methods is document classification, which has become an important application for supervised learning. Recently the performance of these traditional classifiers has degraded as the number of documents has increased. This is because along with this growth in the number of documents has come an increase in the number of categories. This paper approaches this problem differently from current document classification methods that view the problem as multi-class classification. Instead we perform hierarchical classification using an approach we call Hierarchical Deep Learning for Text classification (HDLTex). HDLTex employs stacks of deep learning architectures to provide specialized understanding at each level of the document hierarchy. |
Tasks | Document Classification, Text Classification |
Published | 2017-09-24 |
URL | http://arxiv.org/abs/1709.08267v2 |
PDF | http://arxiv.org/pdf/1709.08267v2.pdf |
PWC | https://paperswithcode.com/paper/hdltex-hierarchical-deep-learning-for-text |
Repo | https://github.com/kk7nc/HDLTex |
Framework | tf |
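HDLTex's main idea is a cascade: a top-level classifier picks the parent category, and a per-parent classifier picks the fine label. The sketch below uses logistic regression on synthetic features purely to show the control flow; HDLTex itself stacks DNN/CNN/RNN models at each level, and the class layout here is made up.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# toy corpus features: 4 fine classes nested under 2 parent classes (0,1)->A, (2,3)->B
X, y_fine = make_classification(n_samples=400, n_features=50, n_informative=10,
                                n_classes=4, n_clusters_per_class=1, random_state=0)
y_parent = (y_fine >= 2).astype(int)

# level 1: parent classifier; level 2: one child classifier per parent category
parent_clf = LogisticRegression(max_iter=1000).fit(X, y_parent)
child_clfs = {p: LogisticRegression(max_iter=1000).fit(X[y_parent == p], y_fine[y_parent == p])
              for p in (0, 1)}

def predict(x):
    """Route the document through the hierarchy: parent first, then the fine label."""
    p = parent_clf.predict(x.reshape(1, -1))[0]
    return child_clfs[p].predict(x.reshape(1, -1))[0]

preds = np.array([predict(x) for x in X])
print("training accuracy of the two-level cascade:", (preds == y_fine).mean())
```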
Wavelet Domain Residual Network (WavResNet) for Low-Dose X-ray CT Reconstruction
Title | Wavelet Domain Residual Network (WavResNet) for Low-Dose X-ray CT Reconstruction |
Authors | Eunhee Kang, Junhong Min, Jong Chul Ye |
Abstract | Model-based iterative reconstruction (MBIR) algorithms for low-dose X-ray CT are computationally complex because of the repeated use of forward and backward projections. Inspired by the success of deep learning in computer vision applications, we recently proposed a deep convolutional neural network (CNN) for low-dose X-ray CT and won second place in the 2016 AAPM Low-Dose CT Grand Challenge. However, some of the textures were not fully recovered, which appeared unfamiliar to some radiologists. To cope with this problem, here we propose a direct residual learning approach in the directional wavelet domain that improves on our previous work. In particular, the new network estimates the noise of each input wavelet transform band, and the denoised wavelet coefficients are then obtained by subtracting the estimated noise from the input wavelet transform bands. The experimental results confirm that the proposed network has significantly improved performance, preserving the detail texture of the original images. |
Tasks | Low-Dose X-Ray CT Reconstruction |
Published | 2017-03-04 |
URL | http://arxiv.org/abs/1703.01383v1 |
PDF | http://arxiv.org/pdf/1703.01383v1.pdf |
PWC | https://paperswithcode.com/paper/wavelet-domain-residual-network-wavresnet-for |
Repo | https://github.com/eunh/low_dose_CT |
Framework | none |
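The recipe is residual (noise) learning in a wavelet domain: transform, estimate the noise on the subbands, subtract, and invert. The sketch below uses PyWavelets and a soft-threshold stand-in where WavResNet would use its trained CNN; the wavelet choice, threshold, and toy image are illustrative assumptions.

```python
import numpy as np
import pywt

def denoise_residual(noisy, predict_noise, wavelet="db3"):
    """Estimate noise on the directional (detail) subbands and subtract it."""
    cA, (cH, cV, cD) = pywt.dwt2(noisy, wavelet)
    cH, cV, cD = (band - predict_noise(band) for band in (cH, cV, cD))   # residual step
    return pywt.idwt2((cA, (cH, cV, cD)), wavelet)

# stand-in "network": a soft threshold; WavResNet trains a CNN to predict the noise instead
predict_noise = lambda band: np.clip(band, -0.05, 0.05)

rng = np.random.default_rng(0)
clean = np.outer(np.sin(np.linspace(0, 4, 64)), np.cos(np.linspace(0, 4, 64)))
noisy = clean + 0.05 * rng.normal(size=clean.shape)
denoised = denoise_residual(noisy, predict_noise)
print("MSE noisy:   ", float(np.mean((noisy - clean) ** 2)))
print("MSE denoised:", float(np.mean((denoised - clean) ** 2)))
```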
Metric Learning for Generalizing Spatial Relations to New Objects
Title | Metric Learning for Generalizing Spatial Relations to New Objects |
Authors | Oier Mees, Nichola Abdo, Mladen Mazuran, Wolfram Burgard |
Abstract | Human-centered environments are rich with a wide variety of spatial relations between everyday objects. For autonomous robots to operate effectively in such environments, they should be able to reason about these relations and generalize them to objects with different shapes and sizes. For example, having learned to place a toy inside a basket, a robot should be able to generalize this concept using a spoon and a cup. This requires a robot to have the flexibility to learn arbitrary relations in a lifelong manner, making it challenging for an expert to pre-program it with sufficient knowledge to do so beforehand. In this paper, we address the problem of learning spatial relations by introducing a novel method from the perspective of distance metric learning. Our approach enables a robot to reason about the similarity between pairwise spatial relations, thereby enabling it to use its previous knowledge when presented with a new relation to imitate. We show how this makes it possible to learn arbitrary spatial relations from non-expert users using a small number of examples and in an interactive manner. Our extensive evaluation with real-world data demonstrates the effectiveness of our method in reasoning about a continuous spectrum of spatial relations and generalizing them to new objects. |
Tasks | Metric Learning |
Published | 2017-03-06 |
URL | http://arxiv.org/abs/1703.01946v3 |
PDF | http://arxiv.org/pdf/1703.01946v3.pdf |
PWC | https://paperswithcode.com/paper/metric-learning-for-generalizing-spatial |
Repo | https://github.com/mees/generalize_spatial_relations |
Framework | none |
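The method learns a distance metric so that scenes exhibiting the same spatial relation end up close together. As a generic stand-in for the paper's metric-learning formulation, the sketch below trains a small embedding with a triplet margin loss in PyTorch; the feature dimension and network are assumptions, and the real relation features are derived from object point clouds.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationEmbedder(nn.Module):
    """Embed relation feature vectors so that Euclidean distance reflects relation similarity."""
    def __init__(self, in_dim=20, emb_dim=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(), nn.Linear(32, emb_dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

model = RelationEmbedder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
triplet = nn.TripletMarginLoss(margin=0.2)

# anchor/positive share the same spatial relation (e.g. "inside"); the negative does not
anchor, positive, negative = (torch.randn(16, 20) for _ in range(3))
for _ in range(10):
    opt.zero_grad()
    loss = triplet(model(anchor), model(positive), model(negative))
    loss.backward()
    opt.step()
print("example anchor-positive distance:",
      float(torch.norm(model(anchor[:1]) - model(positive[:1]))))
```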
ICNet for Real-Time Semantic Segmentation on High-Resolution Images
Title | ICNet for Real-Time Semantic Segmentation on High-Resolution Images |
Authors | Hengshuang Zhao, Xiaojuan Qi, Xiaoyong Shen, Jianping Shi, Jiaya Jia |
Abstract | We focus on the challenging task of real-time semantic segmentation in this paper. It has many practical applications, yet comes with the fundamental difficulty of reducing a large portion of the computation needed for pixel-wise label inference. We propose an image cascade network (ICNet) that incorporates multi-resolution branches under proper label guidance to address this challenge. We provide an in-depth analysis of our framework and introduce the cascade feature fusion unit to quickly achieve high-quality segmentation. Our system yields real-time inference on a single GPU card with decent-quality results on challenging datasets such as Cityscapes, CamVid and COCO-Stuff. |
Tasks | Real-Time Semantic Segmentation, Semantic Segmentation |
Published | 2017-04-27 |
URL | http://arxiv.org/abs/1704.08545v2 |
PDF | http://arxiv.org/pdf/1704.08545v2.pdf |
PWC | https://paperswithcode.com/paper/icnet-for-real-time-semantic-segmentation-on |
Repo | https://github.com/osmr/imgclsmob |
Framework | mxnet |
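A sketch of the cascade feature fusion (CFF) unit mentioned in the abstract: the low-resolution branch is upsampled and passed through a dilated convolution, the high-resolution branch through a 1x1 projection, and the two are summed. Channel sizes are illustrative, and the auxiliary loss that the paper attaches to the upsampled branch is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CascadeFeatureFusion(nn.Module):
    """Fuse a low-resolution and a high-resolution feature map (CFF unit, sketched)."""
    def __init__(self, low_ch, high_ch, out_ch):
        super().__init__()
        self.conv_low = nn.Conv2d(low_ch, out_ch, 3, padding=2, dilation=2, bias=False)
        self.conv_high = nn.Conv2d(high_ch, out_ch, 1, bias=False)
        self.bn_low, self.bn_high = nn.BatchNorm2d(out_ch), nn.BatchNorm2d(out_ch)

    def forward(self, x_low, x_high):
        x_low = F.interpolate(x_low, size=x_high.shape[2:], mode="bilinear", align_corners=False)
        x_low = self.bn_low(self.conv_low(x_low))        # dilated conv on the upsampled branch
        x_high = self.bn_high(self.conv_high(x_high))    # 1x1 projection of the high-res branch
        return F.relu(x_low + x_high)

cff = CascadeFeatureFusion(low_ch=256, high_ch=64, out_ch=128)
out = cff(torch.randn(1, 256, 16, 16), torch.randn(1, 64, 32, 32))
print(out.shape)   # torch.Size([1, 128, 32, 32])
```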
Neural Motifs: Scene Graph Parsing with Global Context
Title | Neural Motifs: Scene Graph Parsing with Global Context |
Authors | Rowan Zellers, Mark Yatskar, Sam Thomson, Yejin Choi |
Abstract | We investigate the problem of producing structured graph representations of visual scenes. Our work analyzes the role of motifs: regularly appearing substructures in scene graphs. We present new quantitative insights on such repeated structures in the Visual Genome dataset. Our analysis shows that object labels are highly predictive of relation labels but not vice-versa. We also find that there are recurring patterns even in larger subgraphs: more than 50% of graphs contain motifs involving at least two relations. Our analysis motivates a new baseline: given object detections, predict the most frequent relation between object pairs with the given labels, as seen in the training set. This baseline improves on the previous state-of-the-art by an average of 3.6% relative improvement across evaluation settings. We then introduce Stacked Motif Networks, a new architecture designed to capture higher order motifs in scene graphs that further improves over our strong baseline by an average 7.1% relative gain. Our code is available at github.com/rowanz/neural-motifs. |
Tasks | |
Published | 2017-11-17 |
URL | http://arxiv.org/abs/1711.06640v2 |
PDF | http://arxiv.org/pdf/1711.06640v2.pdf |
PWC | https://paperswithcode.com/paper/neural-motifs-scene-graph-parsing-with-global |
Repo | https://github.com/rowanz/neural-motifs |
Framework | pytorch |
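The frequency baseline from the abstract is easy to reproduce: count, over the training scene graphs, which relation most often links each ordered pair of object labels, and predict that relation at test time. A toy version with made-up triples:

```python
from collections import Counter, defaultdict

# toy "training" scene graphs: (subject_label, relation, object_label)
train_triples = [
    ("man", "wearing", "shirt"), ("man", "wearing", "hat"),
    ("man", "holding", "phone"), ("dog", "on", "bench"),
    ("man", "wearing", "shirt"), ("cup", "on", "table"),
]

# count relations per ordered (subject, object) label pair
freq = defaultdict(Counter)
for s, r, o in train_triples:
    freq[(s, o)][r] += 1

def predict_relation(subj_label, obj_label):
    """Frequency baseline: most common training relation for this label pair."""
    counts = freq.get((subj_label, obj_label))
    return counts.most_common(1)[0][0] if counts else None

print(predict_relation("man", "shirt"))   # -> "wearing"
print(predict_relation("cup", "table"))   # -> "on"
```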
Deep Learning for Target Classification from SAR Imagery: Data Augmentation and Translation Invariance
Title | Deep Learning for Target Classification from SAR Imagery: Data Augmentation and Translation Invariance |
Authors | Hidetoshi Furukawa |
Abstract | This report deals with the translation invariance of convolutional neural networks (CNNs) for automatic target recognition (ATR) from synthetic aperture radar (SAR) imagery. In particular, the translation invariance of CNNs for SAR ATR represents their robustness against misalignment of target chips extracted from SAR images. To understand this translation invariance, we trained CNNs that classify MSTAR target chips into ten classes, both with and without data augmentation, and then visualized their translation invariance. According to our results, even with a deep residual network, a CNN trained without data augmentation on aligned images such as the MSTAR target chips shows only limited translation invariance. A more important factor for translation invariance is the use of augmented training data. Furthermore, our CNN using augmented training data achieved a state-of-the-art classification accuracy of 99.6%. These results show the importance of domain-specific data augmentation. |
Tasks | Data Augmentation |
Published | 2017-08-26 |
URL | http://arxiv.org/abs/1708.07920v1 |
PDF | http://arxiv.org/pdf/1708.07920v1.pdf |
PWC | https://paperswithcode.com/paper/deep-learning-for-target-classification-from |
Repo | https://github.com/singh-shakti94/Deep-Learning-Project |
Framework | tf |
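The data augmentation studied here is essentially random translation of the aligned MSTAR target chips so the CNN also sees misaligned examples. A minimal NumPy sketch; the shift range and chip size are assumptions.

```python
import numpy as np

def random_shift(chip, max_shift=8, rng=None):
    """Randomly translate a target chip and zero-pad, simulating chip misalignment."""
    rng = np.random.default_rng(0) if rng is None else rng
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.zeros_like(chip)
    src = chip[max(0, -dy):chip.shape[0] - max(0, dy),
               max(0, -dx):chip.shape[1] - max(0, dx)]
    out[max(0, dy):max(0, dy) + src.shape[0],
        max(0, dx):max(0, dx) + src.shape[1]] = src
    return out

chip = np.ones((64, 64))          # stand-in for a 64x64 MSTAR target chip
augmented = random_shift(chip)
print(augmented.shape, float(augmented.sum()) <= float(chip.sum()))
```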
Underwater Multi-Robot Convoying using Visual Tracking by Detection
Title | Underwater Multi-Robot Convoying using Visual Tracking by Detection |
Authors | Florian Shkurti, Wei-Di Chang, Peter Henderson, Md Jahidul Islam, Juan Camilo Gamboa Higuera, Jimmy Li, Travis Manderson, Anqi Xu, Gregory Dudek, Junaed Sattar |
Abstract | We present a robust multi-robot convoying approach that relies on visual detection of the leading agent, thus enabling target following in unstructured 3-D environments. Our method is based on the idea of tracking-by-detection, which interleaves efficient model-based object detection with temporal filtering of image-based bounding box estimation. This approach has the important advantage of mitigating tracking drift (i.e. drifting away from the target object), which is a common symptom of model-free trackers and is detrimental to sustained convoying in practice. To illustrate our solution, we collected extensive footage of an underwater robot in ocean settings, and hand-annotated its location in each frame. Based on this dataset, we present an empirical comparison of multiple tracker variants, including the use of several convolutional neural networks, both with and without recurrent connections, as well as frequency-based model-free trackers. We also demonstrate the practicality of this tracking-by-detection strategy in real-world scenarios by successfully controlling a legged underwater robot in five degrees of freedom to follow another robot’s independent motion. |
Tasks | Object Detection, Visual Tracking |
Published | 2017-09-25 |
URL | http://arxiv.org/abs/1709.08292v1 |
PDF | http://arxiv.org/pdf/1709.08292v1.pdf |
PWC | https://paperswithcode.com/paper/underwater-multi-robot-convoying-using-visual |
Repo | https://github.com/Breakend/TemporalYolo |
Framework | tf |
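Tracking-by-detection interleaves a per-frame detector with temporal filtering of the resulting boxes. The detector is out of scope here; the sketch below shows only a lightweight exponential box smoother that holds the last estimate when a detection is missed, which is a simpler stand-in for the paper's filtering stage.

```python
import numpy as np

class BoxSmoother:
    """Temporal filtering of per-frame detections (exponential smoothing of boxes)."""
    def __init__(self, alpha=0.6):
        self.alpha = alpha
        self.state = None

    def update(self, box):
        """box = (x, y, w, h) from the detector, or None if no detection this frame."""
        if box is not None:
            box = np.asarray(box, dtype=float)
            if self.state is None:
                self.state = box
            else:
                self.state = self.alpha * box + (1 - self.alpha) * self.state
        return self.state            # hold the last estimate on missed detections

smoother = BoxSmoother()
detections = [(10, 10, 40, 40), (12, 11, 40, 40), None, (15, 13, 41, 39)]
for d in detections:
    print(smoother.update(d))
```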
Maximum Classifier Discrepancy for Unsupervised Domain Adaptation
Title | Maximum Classifier Discrepancy for Unsupervised Domain Adaptation |
Authors | Kuniaki Saito, Kohei Watanabe, Yoshitaka Ushiku, Tatsuya Harada |
Abstract | In this work, we present a method for unsupervised domain adaptation. Many adversarial learning methods train domain classifier networks to distinguish the features as either a source or target and train a feature generator network to mimic the discriminator. Two problems exist with these methods. First, the domain classifier only tries to distinguish the features as a source or target and thus does not consider task-specific decision boundaries between classes. Therefore, a trained generator can generate ambiguous features near class boundaries. Second, these methods aim to completely match the feature distributions between different domains, which is difficult because of each domain’s characteristics. To solve these problems, we introduce a new approach that attempts to align distributions of source and target by utilizing the task-specific decision boundaries. We propose to maximize the discrepancy between two classifiers’ outputs to detect target samples that are far from the support of the source. A feature generator learns to generate target features near the support to minimize the discrepancy. Our method outperforms other methods on several datasets of image classification and semantic segmentation. The codes are available at \url{https://github.com/mil-tokyo/MCD_DA} |
Tasks | Domain Adaptation, Image Classification, Semantic Segmentation, Unsupervised Domain Adaptation |
Published | 2017-12-07 |
URL | http://arxiv.org/abs/1712.02560v4 |
PDF | http://arxiv.org/pdf/1712.02560v4.pdf |
PWC | https://paperswithcode.com/paper/maximum-classifier-discrepancy-for |
Repo | https://github.com/YouYueHuang/CycleGAN_Unsupervised_Domain_Adaptation |
Framework | pytorch |
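A condensed PyTorch sketch of one MCD round: train the generator and both classifiers on labelled source data, then train the classifiers to maximise their disagreement on target data with the generator fixed, then train the generator to minimise it. Network sizes, optimisers, and batch contents are placeholders; the authors' code is at the URL given in the abstract.

```python
import torch
import torch.nn as nn

def discrepancy(p1, p2):
    """L1 distance between the two classifiers' softmax outputs."""
    return (p1.softmax(dim=1) - p2.softmax(dim=1)).abs().mean()

G = nn.Sequential(nn.Linear(100, 64), nn.ReLU())               # feature generator
F1, F2 = nn.Linear(64, 10), nn.Linear(64, 10)                  # two task classifiers
opt_g = torch.optim.SGD(G.parameters(), lr=1e-2)
opt_f = torch.optim.SGD(list(F1.parameters()) + list(F2.parameters()), lr=1e-2)
ce = nn.CrossEntropyLoss()

xs, ys = torch.randn(32, 100), torch.randint(0, 10, (32,))     # labelled source batch
xt = torch.randn(32, 100)                                      # unlabelled target batch

# step A: train G, F1, F2 on the labelled source data
loss = ce(F1(G(xs)), ys) + ce(F2(G(xs)), ys)
opt_g.zero_grad(); opt_f.zero_grad()
loss.backward()
opt_g.step(); opt_f.step()

# step B: fix G, train F1, F2 to MAXIMISE their disagreement on target features
ft = G(xt).detach()
loss = ce(F1(G(xs)), ys) + ce(F2(G(xs)), ys) - discrepancy(F1(ft), F2(ft))
opt_f.zero_grad()
loss.backward()
opt_f.step()

# step C: fix F1, F2, train G to MINIMISE the disagreement
loss = discrepancy(F1(G(xt)), F2(G(xt)))
opt_g.zero_grad()
loss.backward()
opt_g.step()
print("one MCD round done")
```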
Is Second-order Information Helpful for Large-scale Visual Recognition?
Title | Is Second-order Information Helpful for Large-scale Visual Recognition? |
Authors | Peihua Li, Jiangtao Xie, Qilong Wang, Wangmeng Zuo |
Abstract | By stacking layers of convolution and nonlinearity, convolutional networks (ConvNets) effectively learn from low-level to high-level features and discriminative representations. Since the end goal of large-scale recognition is to delineate complex boundaries of thousands of classes, adequate exploration of feature distributions is important for realizing the full potential of ConvNets. However, state-of-the-art works concentrate only on deeper or wider architecture design, while rarely exploring feature statistics higher than first-order. We take a step towards addressing this problem. Our method consists of covariance pooling, instead of the most commonly used first-order pooling, of high-level convolutional features. The main challenges involved are robust covariance estimation given a small sample of large-dimensional features and usage of the manifold structure of covariance matrices. To address these challenges, we present a Matrix Power Normalized Covariance (MPN-COV) method. We develop forward and backward propagation formulas for the nonlinear matrix functions involved so that MPN-COV can be trained end-to-end. In addition, we analyze both qualitatively and quantitatively its advantage over the well-known Log-Euclidean metric. On the ImageNet 2012 validation set, by combining MPN-COV we achieve gains of over 4%, 3% and 2.5% for AlexNet, VGG-M and VGG-16, respectively; integration of MPN-COV into 50-layer ResNet outperforms ResNet-101 and is comparable to ResNet-152. The source code will be available on the project page: http://www.peihuali.org/MPN-COV |
Tasks | Object Recognition |
Published | 2017-03-23 |
URL | http://arxiv.org/abs/1703.08050v3 |
PDF | http://arxiv.org/pdf/1703.08050v3.pdf |
PWC | https://paperswithcode.com/paper/is-second-order-information-helpful-for-large |
Repo | https://github.com/jiangtaoxie/MPN-COV |
Framework | tf |
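The pooling step replaces global average pooling with a covariance matrix of the last convolutional features followed by a matrix power (the square root in the default setting). The sketch below computes it via a batched eigendecomposition and relies on autograd, whereas the paper derives explicit forward/backward formulas; the epsilon and feature sizes are illustrative.

```python
import torch

def mpn_cov(features, power=0.5, eps=1e-5):
    """Covariance pooling of conv features followed by a matrix power (square root here)."""
    b, c, h, w = features.shape
    x = features.reshape(b, c, h * w)
    x = x - x.mean(dim=2, keepdim=True)
    cov = x @ x.transpose(1, 2) / (h * w - 1)                 # (b, c, c) covariance
    cov = cov + eps * torch.eye(c, device=features.device)    # robustify the small-sample estimate
    evals, evecs = torch.linalg.eigh(cov)                     # symmetric eigendecomposition
    evals = evals.clamp_min(0).pow(power)                     # apply the matrix power to eigenvalues
    return evecs @ torch.diag_embed(evals) @ evecs.transpose(1, 2)

feats = torch.randn(2, 64, 7, 7)          # e.g. the last convolutional feature map
pooled = mpn_cov(feats)
print(pooled.shape)                        # torch.Size([2, 64, 64])
```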
Deep Steering: Learning End-to-End Driving Model from Spatial and Temporal Visual Cues
Title | Deep Steering: Learning End-to-End Driving Model from Spatial and Temporal Visual Cues |
Authors | Lu Chi, Yadong Mu |
Abstract | In recent years, autonomous driving algorithms using low-cost vehicle-mounted cameras have attracted increasing endeavors from both academia and industry. There are multiple fronts to these endeavors, including object detection on roads, 3-D reconstruction, etc., but in this work we focus on a vision-based model that directly maps raw input images to steering angles using deep networks. This represents a nascent research topic in computer vision. The technical contributions of this work are three-fold. First, the model is learned and evaluated on real human driving videos that are time-synchronized with other vehicle sensors. This differs from many prior models trained on synthetic data from racing games. Second, state-of-the-art models, such as PilotNet, mostly predict the wheel angle independently on each video frame, which contradicts the common understanding of driving as a stateful process. Instead, our proposed model combines spatial and temporal cues, jointly investigating instantaneous monocular camera observations and the vehicle’s historical states. In practice this is accomplished by inserting carefully-designed recurrent units (e.g., LSTM and Conv-LSTM) at proper network layers. Third, to facilitate the interpretability of the learned model, we utilize a visual back-propagation scheme for discovering and visualizing image regions crucially influencing the final steering prediction. Our experimental study is based on about 6 hours of human driving data provided by Udacity. Comprehensive quantitative evaluations demonstrate the effectiveness and robustness of our model, even under scenarios like drastic lighting changes and abrupt turning. The comparison with other state-of-the-art models clearly reveals its superior performance in predicting the correct wheel angle for a self-driving car. |
Tasks | Autonomous Driving, Object Detection |
Published | 2017-08-12 |
URL | http://arxiv.org/abs/1708.03798v1 |
PDF | http://arxiv.org/pdf/1708.03798v1.pdf |
PWC | https://paperswithcode.com/paper/deep-steering-learning-end-to-end-driving |
Repo | https://github.com/abhileshborode/Behavorial-Clonng-Self-driving-cars |
Framework | tf |
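The model described above combines per-frame visual features with recurrent units so that steering is predicted as a stateful process. A minimal sketch with a small CNN feeding a standard LSTM; the paper uses carefully placed LSTM/Conv-LSTM units and much larger backbones, and the shapes and sizes here are assumptions.

```python
import torch
import torch.nn as nn

class SteeringNet(nn.Module):
    """Per-frame CNN features fed to an LSTM that predicts a steering angle per time step."""
    def __init__(self):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
        self.head = nn.Linear(64, 1)

    def forward(self, frames):                      # frames: (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).reshape(b, t, -1)   # spatial cues per frame
        out, _ = self.lstm(feats)                   # temporal cues across the sequence
        return self.head(out).squeeze(-1)           # (batch, time) steering angles

angles = SteeringNet()(torch.randn(2, 8, 3, 66, 200))
print(angles.shape)                                  # torch.Size([2, 8])
```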