October 20, 2019

3013 words 15 mins read

Paper Group ANR 4

Gaussian mixture models with Wasserstein distance. The Globally Optimal Reparameterization Algorithm: an Alternative to Fast Dynamic Time Warping for Action Recognition in Video Sequences. Coarse-to-fine: A RNN-based hierarchical attention model for vehicle re-identification. DxNAT - Deep Neural Networks for Explaining Non-Recurring Traffic Congest …

Gaussian mixture models with Wasserstein distance

Title Gaussian mixture models with Wasserstein distance
Authors Benoit Gaujac, Ilya Feige, David Barber
Abstract Generative models with both discrete and continuous latent variables are highly motivated by the structure of many real-world data sets. They present, however, subtleties in training that often manifest in the discrete latent variable being under-leveraged. In this paper, we show that such models are more amenable to training when using the Optimal Transport framework of Wasserstein Autoencoders. We find our discrete latent variable to be fully leveraged by the model when trained, without any modifications to the objective function or significant fine-tuning. Our model generates comparable samples to other approaches while using relatively simple neural networks, since the discrete latent variable carries much of the descriptive burden. Furthermore, the discrete latent provides significant control over generation.
Tasks
Published 2018-06-12
URL http://arxiv.org/abs/1806.04465v1
PDF http://arxiv.org/pdf/1806.04465v1.pdf
PWC https://paperswithcode.com/paper/gaussian-mixture-models-with-wasserstein
Repo
Framework
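
The 2-Wasserstein distance between two Gaussians has a well-known closed form, which is the basic building block behind Wasserstein-style comparisons of Gaussian components. The sketch below computes that closed form with NumPy/SciPy; it illustrates the metric itself, not the authors' Wasserstein Autoencoder training objective, and the example inputs are arbitrary.

```python
# Closed-form squared 2-Wasserstein distance between N(m1, C1) and N(m2, C2):
# W2^2 = ||m1 - m2||^2 + Tr(C1 + C2 - 2 (C2^{1/2} C1 C2^{1/2})^{1/2})
import numpy as np
from scipy.linalg import sqrtm

def gaussian_w2(m1, C1, m2, C2):
    """Squared 2-Wasserstein distance between two multivariate Gaussians."""
    sqrt_C2 = sqrtm(C2)
    cross = np.real(sqrtm(sqrt_C2 @ C1 @ sqrt_C2))  # discard tiny imaginary noise
    mean_term = np.sum((m1 - m2) ** 2)
    cov_term = np.trace(C1 + C2 - 2.0 * cross)
    return mean_term + cov_term

# Example: two 2-D Gaussians
m1, C1 = np.zeros(2), np.eye(2)
m2, C2 = np.array([1.0, 0.0]), 2.0 * np.eye(2)
print(gaussian_w2(m1, C1, m2, C2))  # ~1.34 for these inputs
```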

The Globally Optimal Reparameterization Algorithm: an Alternative to Fast Dynamic Time Warping for Action Recognition in Video Sequences

Title The Globally Optimal Reparameterization Algorithm: an Alternative to Fast Dynamic Time Warping for Action Recognition in Video Sequences
Authors Thomas Mitchel, Sipu Ruan, Yixin Gao, Gregory S. Chirikjian
Abstract Signal alignment has become a popular problem in robotics due in part to its fundamental role in action recognition. Currently, the most successful algorithms for signal alignment are Dynamic Time Warping (DTW) and its variant ‘Fast’ Dynamic Time Warping (FastDTW). Here we introduce a new framework for signal alignment, namely the Globally Optimal Reparameterization Algorithm (GORA). We review the algorithm’s mathematical foundation and provide a numerical verification of its theoretical basis. We compare the performance of GORA with that of the DTW and FastDTW algorithms, in terms of computational efficiency and accuracy in matching signals. Our results show a significant improvement in both speed and accuracy over the DTW and FastDTW algorithms and suggest that GORA has the potential to provide a highly effective framework for signal alignment and action recognition.
Tasks Temporal Action Localization
Published 2018-07-15
URL http://arxiv.org/abs/1807.05485v1
PDF http://arxiv.org/pdf/1807.05485v1.pdf
PWC https://paperswithcode.com/paper/the-globally-optimal-reparameterization
Repo
Framework
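
For reference, here is a minimal implementation of the classic DTW baseline that GORA is compared against; GORA itself (the globally optimal reparameterization) is not reproduced here, and the warped test signal is an illustrative choice.

```python
import numpy as np

def dtw_distance(x, y):
    """Classic O(len(x)*len(y)) dynamic time warping distance between 1-D signals."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # extend the cheapest of the three admissible alignments
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Example: a signal and a time-warped copy of it align with near-zero cost
t = np.linspace(0, 2 * np.pi, 100)
a = np.sin(t)
b = np.sin(t ** 1.2 / (2 * np.pi) ** 0.2)  # monotone reparameterization of a
print(dtw_distance(a, a), dtw_distance(a, b))
```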

Coarse-to-fine: A RNN-based hierarchical attention model for vehicle re-identification

Title Coarse-to-fine: A RNN-based hierarchical attention model for vehicle re-identification
Authors Xiu-Shen Wei, Chen-Lin Zhang, Lingqiao Liu, Chunhua Shen, Jianxin Wu
Abstract Vehicle re-identification is an important problem and has become increasingly desirable with the rapid expansion of applications in video surveillance and intelligent transportation. Recalling the identification process of human vision, we note that there exists a native hierarchical dependency when humans identify different vehicles. Specifically, humans always first determine a vehicle’s coarse-grained category, i.e., the car model/type. Then, under the branch of the predicted car model/type, they identify specific vehicles by relying on subtle visual cues, e.g., customized paintings and windshield stickers, at the fine-grained level. Inspired by this coarse-to-fine hierarchical process, we propose an end-to-end RNN-based Hierarchical Attention (RNN-HA) classification model for vehicle re-identification. RNN-HA consists of three mutually coupled modules: the first module generates image representations for vehicle images, the second hierarchical module models the aforementioned hierarchical dependent relationship, and the last attention module focuses on capturing the subtle visual information distinguishing specific vehicles from each other. By conducting comprehensive experiments on two vehicle re-identification benchmark datasets, VeRi and VehicleID, we demonstrate that the proposed model achieves superior performance over state-of-the-art methods.
Tasks Vehicle Re-Identification
Published 2018-12-11
URL http://arxiv.org/abs/1812.04239v1
PDF http://arxiv.org/pdf/1812.04239v1.pdf
PWC https://paperswithcode.com/paper/coarse-to-fine-a-rnn-based-hierarchical
Repo
Framework
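
A rough PyTorch sketch of the coarse-to-fine idea: shared image features, one RNN step for the coarse model/type decision, and attention-pooled features for the fine identity decision. The layer choices and sizes below are assumptions for illustration, not the authors' exact RNN-HA architecture.

```python
import torch
import torch.nn as nn

class HierarchicalAttentionReID(nn.Module):
    """Illustrative sketch: backbone features -> coarse (model/type) step ->
    fine (identity) step, with a simple spatial attention before the fine head."""
    def __init__(self, num_types, num_ids, feat_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(              # stand-in for a CNN backbone
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(8))
        self.rnn = nn.GRUCell(feat_dim, feat_dim)   # two hierarchy levels = two RNN steps
        self.attn = nn.Conv2d(feat_dim, 1, 1)       # spatial attention map
        self.type_head = nn.Linear(feat_dim, num_types)
        self.id_head = nn.Linear(feat_dim, num_ids)

    def forward(self, x):
        fmap = self.backbone(x)                         # B x C x 8 x 8
        g = fmap.mean(dim=(2, 3))                       # global descriptor
        h = self.rnn(g)                                 # coarse level
        type_logits = self.type_head(h)
        w = torch.softmax(self.attn(fmap).flatten(2), dim=-1)  # B x 1 x 64
        attended = (fmap.flatten(2) * w).sum(-1)        # attention-pooled features
        h = self.rnn(attended, h)                       # fine level, conditioned on coarse
        id_logits = self.id_head(h)
        return type_logits, id_logits
```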

DxNAT - Deep Neural Networks for Explaining Non-Recurring Traffic Congestion

Title DxNAT - Deep Neural Networks for Explaining Non-Recurring Traffic Congestion
Authors Fangzhou Sun, Abhishek Dubey, Jules White
Abstract Non-recurring traffic congestion is caused by temporary disruptions, such as accidents, sports games, adverse weather, etc. We use data related to real-time traffic speed, jam factors (a traffic congestion indicator), and events collected over a year from Nashville, TN to train a multi-layered deep neural network. The traffic dataset contains over 900 million data records. The network is thereafter used to classify the real-time data and identify anomalous operations. Compared with traditional approaches of using statistical or machine learning techniques, our model reaches an accuracy of 98.73 percent when identifying traffic congestion caused by football games. Our approach first encodes the traffic across a region as a scaled image. After that the image data from different timestamps is fused with event- and time-related data. Then a crossover operator is used as a data augmentation method to generate training datasets with more balanced classes. Finally, we use the receiver operating characteristic (ROC) analysis to tune the sensitivity of the classifier. We present the analysis of the training time and the inference time separately.
Tasks Data Augmentation
Published 2018-01-30
URL http://arxiv.org/abs/1802.00002v1
PDF http://arxiv.org/pdf/1802.00002v1.pdf
PWC https://paperswithcode.com/paper/dxnat-deep-neural-networks-for-explaining-non
Repo
Framework
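
The crossover-style augmentation mentioned in the abstract can be sketched as splicing two same-class congestion images; the cut axis, image size, and pairing strategy below are assumptions, since the paper's exact operator is not spelled out here.

```python
import numpy as np

def crossover_augment(img_a, img_b, rng):
    """Hypothetical crossover operator: splice two same-class traffic images at a
    random row to synthesize an extra training sample for the minority class."""
    assert img_a.shape == img_b.shape
    cut = rng.integers(1, img_a.shape[0])           # random horizontal cut point
    return np.vstack([img_a[:cut], img_b[cut:]])    # top of A, bottom of B

# Example: generate an extra sample for an under-represented "congested" class
rng = np.random.default_rng(0)
minority = [rng.random((32, 32)) for _ in range(10)]    # stand-in congestion images
i, j = rng.choice(len(minority), size=2, replace=False)
new_sample = crossover_augment(minority[i], minority[j], rng)
print(new_sample.shape)   # (32, 32)
```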

A multi-task deep learning model for the classification of Age-related Macular Degeneration

Title A multi-task deep learning model for the classification of Age-related Macular Degeneration
Authors Qingyu Chen, Yifan Peng, Tiarnan Keenan, Shazia Dharssi, Elvira Agron, Wai T. Wong, Emily Y. Chew, Zhiyong Lu
Abstract Age-related Macular Degeneration (AMD) is a leading cause of blindness. Although the Age-Related Eye Disease Study group previously developed a 9-step AMD severity scale for manual classification of AMD severity from color fundus images, manual grading of images is time-consuming and expensive. Building on our previous work DeepSeeNet, we developed a novel deep learning model for automated classification of images into the 9-step scale. Instead of predicting the 9-step score directly, our approach simulates the reading center grading process. It first detects four AMD characteristics (drusen area, geographic atrophy, increased pigment, and depigmentation), then combines these to derive the overall 9-step score. Importantly, we applied multi-task learning techniques, which allowed us to train classification of the four characteristics in parallel, share representations, and prevent overfitting. Evaluation on two image datasets showed that the accuracy of the model exceeded the current state-of-the-art model by > 10%.
Tasks Age-Related Macular Degeneration Classification, Multi-Task Learning
Published 2018-12-02
URL http://arxiv.org/abs/1812.00422v1
PDF http://arxiv.org/pdf/1812.00422v1.pdf
PWC https://paperswithcode.com/paper/a-multi-task-deep-learning-model-for-the
Repo
Framework
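
A minimal PyTorch sketch of the multi-task layout described above: a shared encoder, four characteristic heads, and a final layer that combines the head outputs into the 9-step score. The head output sizes and the tiny encoder are illustrative assumptions, not the published DeepSeeNet-based model.

```python
import torch
import torch.nn as nn

class MultiTaskAMD(nn.Module):
    """Sketch: shared encoder, four characteristic heads (drusen area, geographic
    atrophy, increased pigment, depigmentation), combined into a 9-step score."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.encoder = nn.Sequential(          # stand-in for a fundus-image CNN
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat_dim), nn.ReLU())
        self.drusen = nn.Linear(feat_dim, 3)       # e.g. none / small / large
        self.atrophy = nn.Linear(feat_dim, 2)
        self.pigment = nn.Linear(feat_dim, 2)
        self.depigment = nn.Linear(feat_dim, 2)
        self.severity = nn.Linear(3 + 2 + 2 + 2, 9)  # combine heads into the 9-step score

    def forward(self, x):
        f = self.encoder(x)
        heads = [self.drusen(f), self.atrophy(f), self.pigment(f), self.depigment(f)]
        severity = self.severity(torch.cat(heads, dim=1))
        return heads, severity
```

During training each head would get its own cross-entropy loss, with the severity loss added on top; the shared encoder is what provides the representation sharing and regularization described in the abstract.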

Unpaired High-Resolution and Scalable Style Transfer Using Generative Adversarial Networks

Title Unpaired High-Resolution and Scalable Style Transfer Using Generative Adversarial Networks
Authors Andrej Junginger, Markus Hanselmann, Thilo Strauss, Sebastian Boblest, Jens Buchner, Holger Ulmer
Abstract Neural networks have proven their capabilities by outperforming many other approaches on regression or classification tasks on various kinds of data. Other astonishing results have been achieved using neural nets as data generators, especially in settings of generative adversarial networks (GANs). One special application is the field of image domain translations. Here, the goal is to take an image with a certain style (e.g. a photograph) and transform it into another one (e.g. a painting). If such a task is performed for unpaired training examples, the corresponding GAN setting is complex, the neural networks are large, and this leads to high peak memory consumption during both the training and evaluation phases. This sets a limit to the highest processable image size. We address this issue by not processing the whole image at once, but instead training and evaluating the domain translation on overlapping image subsamples. This new approach not only enables us to translate high-resolution images that otherwise cannot be processed by the neural network at once, but also allows us to work with comparably small neural networks and with limited hardware resources. Additionally, the number of images required for the training process is significantly reduced. We present high-quality results on images with a total resolution of over 50 megapixels and demonstrate that our method helps to preserve local image details while it also keeps global consistency.
Tasks Style Transfer
Published 2018-10-10
URL http://arxiv.org/abs/1810.05724v1
PDF http://arxiv.org/pdf/1810.05724v1.pdf
PWC https://paperswithcode.com/paper/unpaired-high-resolution-and-scalable-style
Repo
Framework
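
The overlapping-subsample idea can be sketched independently of the GAN itself: tile the high-resolution image, run the translator on each tile, and blend the overlaps with a weight window so seams stay smooth. `translate_fn` below is a placeholder for a trained generator, and the tile and overlap sizes are arbitrary choices.

```python
import numpy as np

def translate_in_tiles(img, translate_fn, tile=256, overlap=64):
    """Run a translator on overlapping tiles of a large image and blend the
    results with a tent-shaped weight window to avoid visible seams."""
    H, W, C = img.shape
    out = np.zeros_like(img, dtype=np.float64)
    weight = np.zeros((H, W, 1))
    step = tile - overlap
    ramp = np.minimum(np.arange(1, tile + 1), np.arange(tile, 0, -1)).astype(float)
    win = np.outer(ramp, ramp)[..., None]       # tent window, highest in the centre
    for y in range(0, H - overlap, step):
        for x in range(0, W - overlap, step):
            y1, x1 = min(y + tile, H), min(x + tile, W)
            patch = translate_fn(img[y:y1, x:x1])
            w = win[: y1 - y, : x1 - x]
            out[y:y1, x:x1] += patch * w
            weight[y:y1, x:x1] += w
    return out / np.maximum(weight, 1e-8)

# Example with an identity "translator" on a 1000x1200 image
img = np.random.rand(1000, 1200, 3)
result = translate_in_tiles(img, lambda p: p)
print(np.allclose(result, img))  # True: the blending weights sum correctly
```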

DAQN: Deep Auto-encoder and Q-Network

Title DAQN: Deep Auto-encoder and Q-Network
Authors Daiki Kimura
Abstract Deep reinforcement learning methods usually require a large number of training images and executed actions to obtain sufficient results. When such a method is extended to a real task in a real environment with an actual robot, it requires even more training images due to the complexity and noise of the input images, and executing a large number of actions on the real robot also becomes a serious problem. Therefore, we propose an extended deep reinforcement learning method that applies a generative model to initialize the network in order to reduce the number of training trials. In this paper, we use a deep Q-network as the deep reinforcement learning method and a deep auto-encoder as the generative model. We conducted experiments on three different tasks: a cart-pole game, an Atari game, and a real game with an actual robot. The proposed method trained more efficiently than the previous method on all tasks, and was 2.5 times faster on the task with real environment images.
Tasks
Published 2018-06-02
URL http://arxiv.org/abs/1806.00630v1
PDF http://arxiv.org/pdf/1806.00630v1.pdf
PWC https://paperswithcode.com/paper/daqn-deep-auto-encoder-and-q-network
Repo
Framework
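
A compact sketch of the DAQN recipe: pretrain an auto-encoder on raw observations, then reuse the trained encoder as the initial feature extractor of the Q-network. The flattened 84x84 input, layer sizes, and four actions are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Pretrain an auto-encoder on observations, then reuse the encoder in the DQN
# so fewer environment interactions are needed (sizes are illustrative).
encoder = nn.Sequential(nn.Linear(84 * 84, 512), nn.ReLU(), nn.Linear(512, 128), nn.ReLU())
decoder = nn.Sequential(nn.Linear(128, 512), nn.ReLU(), nn.Linear(512, 84 * 84))

def pretrain_autoencoder(obs_batches, epochs=10):
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
    for _ in range(epochs):
        for obs in obs_batches:                      # obs: (B, 84*84) float tensor
            recon = decoder(encoder(obs))
            loss = nn.functional.mse_loss(recon, obs)
            opt.zero_grad()
            loss.backward()
            opt.step()

n_actions = 4
q_network = nn.Sequential(encoder, nn.Linear(128, n_actions))  # encoder initializes the DQN
```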

Theory of Deep Learning IIb: Optimization Properties of SGD

Title Theory of Deep Learning IIb: Optimization Properties of SGD
Authors Chiyuan Zhang, Qianli Liao, Alexander Rakhlin, Brando Miranda, Noah Golowich, Tomaso Poggio
Abstract In Theory IIb we characterize, with a mix of theory and experiments, the optimization of deep convolutional networks by Stochastic Gradient Descent. The main new result in this paper is theoretical and experimental evidence for the following conjecture about SGD: SGD concentrates in probability – like the classical Langevin equation – on large-volume, “flat” minima, selecting flat minimizers which are, with very high probability, also global minimizers.
Tasks
Published 2018-01-07
URL http://arxiv.org/abs/1801.02254v1
PDF http://arxiv.org/pdf/1801.02254v1.pdf
PWC https://paperswithcode.com/paper/theory-of-deep-learning-iib-optimization
Repo
Framework
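
The "flat minima" claim can be illustrated with a toy Langevin-style simulation that is not from the paper: on a double well whose minima have equal depth but very different widths, noisy gradient steps spend almost all of their time in the wide (flat) basin. All constants below are arbitrary illustrative choices.

```python
import numpy as np

# f(x) = min(50 (x+2)^2, 0.5 (x-2)^2): a sharp well at x = -2 and a flat well at
# x = +2, both of depth 0. Langevin-like dynamics strongly favours the flat basin.
def grad(x):
    if 50.0 * (x + 2) ** 2 < 0.5 * (x - 2) ** 2:   # sharp-well branch is active
        return 100.0 * (x + 2)
    return 1.0 * (x - 2)                            # flat-well branch is active

rng = np.random.default_rng(0)
eta, temperature, steps = 0.01, 0.5, 200_000
x, in_flat = 0.0, 0
for _ in range(steps):
    x = x - eta * grad(x) + np.sqrt(2 * eta * temperature) * rng.standard_normal()
    in_flat += int(x > 0)                           # x > 0 means the flat basin
print(f"fraction of time in the flat basin: {in_flat / steps:.2f}")   # close to 1
```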

Machine Learning for Yield Curve Feature Extraction: Application to Illiquid Corporate Bonds (Preliminary Draft)

Title Machine Learning for Yield Curve Feature Extraction: Application to Illiquid Corporate Bonds (Preliminary Draft)
Authors Greg Kirczenow, Ali Fathi, Matt Davison
Abstract This paper studies the application of machine learning in extracting the market-implied features from historical risk-neutral corporate bond yields. We consider the example of a hypothetical illiquid fixed income market. After choosing a surrogate liquid market, we apply the Denoising Autoencoder algorithm from the field of computer vision and pattern recognition to learn the features of the missing yield parameters from the historically implied data of the instruments traded in the chosen liquid market. The results of the trained machine learning algorithm are compared with the outputs of a point-in-time 2-dimensional interpolation algorithm known as the Thin Plate Spline. Finally, the performances of the two algorithms are compared.
Tasks Denoising
Published 2018-06-05
URL http://arxiv.org/abs/1806.01731v1
PDF http://arxiv.org/pdf/1806.01731v1.pdf
PWC https://paperswithcode.com/paper/machine-learning-for-yield-curve-feature
Repo
Framework
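
The thin plate spline baseline mentioned above is available off the shelf in SciPy; the sketch below fills in missing points of a synthetic yield surface (tenor x time), since the paper's data are not reproduced here. The denoising-autoencoder side of the comparison is omitted.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

# Interpolate missing yield-surface quotes with a thin plate spline (synthetic data).
rng = np.random.default_rng(1)
tenors = np.array([0.5, 1, 2, 3, 5, 7, 10, 20, 30])
dates = np.arange(60)                                  # 60 trading days
grid = np.array([(t, d) for t in tenors for d in dates], dtype=float)
true_yield = 0.02 + 0.01 * np.log1p(grid[:, 0]) + 0.0005 * np.sin(grid[:, 1] / 7)

observed = rng.random(len(grid)) > 0.3                 # ~30% of quotes missing
tps = RBFInterpolator(grid[observed], true_yield[observed], kernel="thin_plate_spline")
filled = tps(grid[~observed])                          # infer the missing quotes
print(np.max(np.abs(filled - true_yield[~observed])))  # small reconstruction error
```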

Building Sparse Deep Feedforward Networks using Tree Receptive Fields

Title Building Sparse Deep Feedforward Networks using Tree Receptive Fields
Authors Xiaopeng Li, Zhourong Chen, Nevin L. Zhang
Abstract Sparse connectivity is an important factor behind the success of convolutional neural networks and recurrent neural networks. In this paper, we consider the problem of learning sparse connectivity for feedforward neural networks (FNNs). The key idea is that a unit should be connected to a small number of units at the next level below that are strongly correlated. We use Chow-Liu’s algorithm to learn a tree-structured probabilistic model for the units at the current level, use the tree to identify subsets of units that are strongly correlated, and introduce a new unit with a receptive field over each subset. The procedure is repeated on the new units to build multiple layers of hidden units. The resulting model is called a TRF-net. Empirical results show that, when compared to dense FNNs, TRF-nets achieve better or comparable classification performance with far fewer parameters and sparser structures. They are also more interpretable.
Tasks
Published 2018-03-14
URL http://arxiv.org/abs/1803.05209v2
PDF http://arxiv.org/pdf/1803.05209v2.pdf
PWC https://paperswithcode.com/paper/building-sparse-deep-feedforward-networks
Repo
Framework
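
The tree-building step can be sketched with SciPy: score unit pairs, then take a maximum-weight spanning tree. Using |correlation| in place of mutual information and the toy data below are simplifying assumptions; the paper uses Chow-Liu's algorithm proper on the units of the current level.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def dependency_tree(activations):
    """Sketch of the tree-building step: treat unit activations as variables,
    use |correlation| as a stand-in for mutual information, and return the
    maximum-weight spanning tree as (parent, child) index pairs."""
    corr = np.abs(np.corrcoef(activations, rowvar=False))   # units x units
    np.fill_diagonal(corr, 0.0)
    # minimum spanning tree of the negated weights == maximum spanning tree
    mst = minimum_spanning_tree(-corr).tocoo()
    return list(zip(mst.row, mst.col))

# Example: 1000 samples of 6 "unit" activations with strongly correlated pairs
rng = np.random.default_rng(0)
base = rng.standard_normal((1000, 3))
units = np.hstack([base, base + 0.1 * rng.standard_normal((1000, 3))])
print(dependency_tree(units))   # the correlated pairs (0,3), (1,4), (2,5) appear as edges
```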

Generative One-Shot Learning (GOL): A Semi-Parametric Approach to One-Shot Learning in Autonomous Vision

Title Generative One-Shot Learning (GOL): A Semi-Parametric Approach to One-Shot Learning in Autonomous Vision
Authors Sorin Grigorescu
Abstract Highly Autonomous Driving (HAD) systems rely on deep neural networks for the visual perception of the driving environment. Such networks are trained on large manually annotated databases. In this work, a semi-parametric approach to one-shot learning is proposed, with the aim of bypassing the manual annotation step required for training perception systems used in autonomous driving. The proposed generative framework, coined Generative One-Shot Learning (GOL), takes as input single one-shot objects, or generic patterns, and a small set of so-called regularization samples used to drive the generative process. New synthetic data is generated as Pareto optimal solutions from one-shot objects using a set of generalization functions built into a generalization generator. GOL has been evaluated on environment perception challenges encountered in autonomous vision.
Tasks Autonomous Driving, One-Shot Learning
Published 2018-12-19
URL http://arxiv.org/abs/1812.07567v1
PDF http://arxiv.org/pdf/1812.07567v1.pdf
PWC https://paperswithcode.com/paper/generative-one-shot-learning-gol-a-semi
Repo
Framework

Going Deeper in Spiking Neural Networks: VGG and Residual Architectures

Title Going Deeper in Spiking Neural Networks: VGG and Residual Architectures
Authors Abhronil Sengupta, Yuting Ye, Robert Wang, Chiao Liu, Kaushik Roy
Abstract Over the past few years, Spiking Neural Networks (SNNs) have become popular as a possible pathway to enable low-power event-driven neuromorphic hardware. However, their application in machine learning has largely been limited to very shallow neural network architectures for simple problems. In this paper, we propose a novel algorithmic technique for generating an SNN with a deep architecture, and demonstrate its effectiveness on complex visual recognition problems such as CIFAR-10 and ImageNet. Our technique applies to both VGG and Residual network architectures, with significantly better accuracy than the state-of-the-art. Finally, we present an analysis of the sparse event-driven computations to demonstrate reduced hardware overhead when operating in the spiking domain.
Tasks
Published 2018-02-07
URL http://arxiv.org/abs/1802.02627v4
PDF http://arxiv.org/pdf/1802.02627v4.pdf
PWC https://paperswithcode.com/paper/going-deeper-in-spiking-neural-networks-vgg
Repo
Framework
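
A minimal sketch of the rate-coding view behind ANN-to-SNN conversion: an integrate-and-fire layer with reset-by-subtraction whose output spike rate approximates the ReLU activation of the corresponding ANN layer. The weight scale, input coding, and simulation length are illustrative assumptions; the paper's conversion and weight-balancing scheme are more involved.

```python
import numpy as np

def if_layer(inp_spikes, weights, v, threshold=1.0):
    """One timestep of an integrate-and-fire layer: accumulate weighted input
    spikes into the membrane potential, spike where the threshold is crossed,
    and reset by subtraction so spike rates stay proportional to the input."""
    v += inp_spikes @ weights
    spikes = (v >= threshold).astype(float)
    v -= spikes * threshold
    return spikes, v

# Over many timesteps the output spike rate approximates ReLU(x @ W)
rng = np.random.default_rng(0)
x = rng.random(10)                   # analog input in [0, 1], used as spike probabilities
W = rng.standard_normal((10, 4)) * 0.1
T, v, counts = 1000, np.zeros(4), np.zeros(4)
for _ in range(T):
    inp = (rng.random(10) < x).astype(float)    # Poisson-like rate coding of the input
    s, v = if_layer(inp, W, v)
    counts += s
print(counts / T)                    # approximately np.maximum(x @ W, 0)
print(np.maximum(x @ W, 0.0))
```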

TEA-DNN: the Quest for Time-Energy-Accuracy Co-optimized Deep Neural Networks

Title TEA-DNN: the Quest for Time-Energy-Accuracy Co-optimized Deep Neural Networks
Authors Lile Cai, Anne-Maelle Barneche, Arthur Herbout, Chuan Sheng Foo, Jie Lin, Vijay Ramaseshan Chandrasekhar, Mohamed M. Sabry
Abstract Embedded deep learning platforms have witnessed two simultaneous improvements. First, the accuracy of convolutional neural networks (CNNs) has been significantly improved through the use of automated neural-architecture search (NAS) algorithms to determine CNN structure. Second, there has been increasing interest in developing hardware accelerators for CNNs that provide improved inference performance and energy consumption compared to GPUs. Such embedded deep learning platforms differ in the amount of compute resources and memory-access bandwidth, which would affect performance and energy consumption of CNNs. It is therefore critical to consider the available hardware resources in the network architecture search. To this end, we introduce TEA-DNN, a NAS algorithm targeting multi-objective optimization of execution time, energy consumption, and classification accuracy of CNN workloads on embedded architectures. TEA-DNN leverages energy and execution time measurements on embedded hardware when exploring the Pareto-optimal curves across accuracy, execution time, and energy consumption and does not require additional effort to model the underlying hardware. We apply TEA-DNN for image classification on actual embedded platforms (NVIDIA Jetson TX2 and Intel Movidius Neural Compute Stick). We highlight the Pareto-optimal operating points that emphasize the necessity to explicitly consider hardware characteristics in the search process. To the best of our knowledge, this is the most comprehensive study of Pareto-optimal models across a range of hardware platforms using actual measurements on hardware to obtain objective values.
Tasks Image Classification, Neural Architecture Search
Published 2018-11-29
URL https://arxiv.org/abs/1811.12065v2
PDF https://arxiv.org/pdf/1811.12065v2.pdf
PWC https://paperswithcode.com/paper/tea-dnn-the-quest-for-time-energy-accuracy-co
Repo
Framework
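
The search in TEA-DNN compares candidate networks by Pareto dominance over (time, energy, error). A small, generic dominance filter like the one below conveys the idea; it is not the authors' search code, and the candidate numbers are made up.

```python
def pareto_front(points):
    """Return the Pareto-optimal subset of (latency, energy, error) tuples,
    where lower is better in every objective. Simple O(n^2) filter."""
    front = []
    for p in points:
        dominated = any(all(q[k] <= p[k] for k in range(len(p))) and q != p for q in points)
        if not dominated:
            front.append(p)
    return front

# Example: candidate CNNs measured as (latency ms, energy mJ, top-1 error %)
candidates = [(12.0, 30.0, 8.1), (15.0, 25.0, 7.9), (20.0, 40.0, 7.8), (25.0, 45.0, 9.0)]
print(pareto_front(candidates))   # the last candidate is dominated and dropped
```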

Diversity in Machine Learning

Title Diversity in Machine Learning
Authors Zhiqiang Gong, Ping Zhong, Weidong Hu
Abstract Machine learning methods have achieved good performance and been widely applied in various real-world applications. They can learn models adaptively and be better fitted to the special requirements of different tasks. Generally, a good machine learning system is composed of plentiful training data, a good model training process, and accurate inference. Many factors can affect the performance of the machine learning process, among which diversity is an important one. Diversity helps each of these components contribute to a good overall system: diversity of the training data ensures that the data provide more discriminative information for the model; diversity of the learned model (diversity in the parameters of each model, or diversity among different base models) lets each parameter/model capture unique or complementary information; and diversity in inference provides multiple choices, each corresponding to a specific, plausible local optimum. Even though diversity plays an important role in the machine learning process, there is no systematic analysis of diversification in machine learning systems. In this paper, we systematically summarize methods for data diversification, model diversification, and inference diversification in the machine learning process. In addition, we survey typical applications where diversity technology has improved machine learning performance, including remote sensing imaging tasks, machine translation, camera relocalization, image segmentation, object detection, topic modeling, and others. Finally, we discuss some challenges of diversity technology in machine learning and point out directions for future work.
Tasks Camera Relocalization, Machine Translation, Object Detection, Semantic Segmentation
Published 2018-07-04
URL https://arxiv.org/abs/1807.01477v2
PDF https://arxiv.org/pdf/1807.01477v2.pdf
PWC https://paperswithcode.com/paper/diversity-in-machine-learning
Repo
Framework

Semantically Selective Augmentation for Deep Compact Person Re-Identification

Title Semantically Selective Augmentation for Deep Compact Person Re-Identification
Authors Víctor Ponce-López, Tilo Burghardt, Sion Hannunna, Dima Damen, Alessandro Masullo, Majid Mirmehdi
Abstract We present a deep person re-identification approach that combines semantically selective, deep data augmentation with clustering-based network compression to generate high performance, light and fast inference networks. In particular, we propose to augment limited training data via sampling from a deep convolutional generative adversarial network (DCGAN), whose discriminator is constrained by a semantic classifier to explicitly control the domain specificity of the generation process. Thereby, we encode information in the classifier network which can be utilized to steer adversarial synthesis, and which fuels our CondenseNet ID-network training. We provide a quantitative and qualitative analysis of the approach and its variants on a number of datasets, obtaining results that outperform the state-of-the-art on the LIMA dataset for long-term monitoring in indoor living spaces.
Tasks Data Augmentation, Person Re-Identification
Published 2018-06-11
URL http://arxiv.org/abs/1806.04074v3
PDF http://arxiv.org/pdf/1806.04074v3.pdf
PWC https://paperswithcode.com/paper/semantically-selective-augmentation-for-deep
Repo
Framework
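
The paper constrains the DCGAN discriminator with a semantic classifier during training; as a simpler stand-in for that idea, the sketch below filters generator samples post hoc with a frozen semantic classifier before adding them to the re-identification training pool. `generator`, `semantic_clf`, the latent size, and the confidence threshold are all assumptions.

```python
import torch

def selective_samples(generator, semantic_clf, target_class, n=64, z_dim=100, min_conf=0.9):
    """Sample from a trained GAN generator and keep only images that a frozen
    semantic classifier confidently assigns to the target class (e.g. 'person'),
    so the synthetic augmentation stays domain-specific."""
    z = torch.randn(n, z_dim)
    with torch.no_grad():
        fake = generator(z)                                   # n x C x H x W images
        conf = torch.softmax(semantic_clf(fake), dim=1)[:, target_class]
    return fake[conf > min_conf]                              # only on-domain samples survive
```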