July 29, 2019

3329 words 16 mins read

Paper Group AWR 142


A Framework for Visually Realistic Multi-robot Simulation in Natural Environment

Title A Framework for Visually Realistic Multi-robot Simulation in Natural Environment
Authors Ori Ganoni, Ramakrishnan Mukundan
Abstract This paper presents a generalized framework for the simulation of multiple robots and drones in highly realistic models of natural environments. The proposed simulation architecture uses Unreal Engine 4 for generating both optical and depth sensor outputs from any position and orientation within the environment and provides several key domain-specific simulation capabilities. Various components and functionalities of the system are discussed in detail. The simulation engine also allows users to test and validate a wide range of computer vision algorithms involving different drone configurations under many types of environmental effects, such as wind gusts. The paper demonstrates the effectiveness of the system by giving experimental results for a test scenario where one drone tracks the simulated motion of another in a complex natural environment.
Tasks
Published 2017-08-06
URL http://arxiv.org/abs/1708.01938v1
PDF http://arxiv.org/pdf/1708.01938v1.pdf
PWC https://paperswithcode.com/paper/a-framework-for-visually-realistic-multi
Repo https://github.com/orig74/DroneSimLab
Framework none
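
The test scenario, in which one drone visually tracks another, can be approximated in a few lines of OpenCV. The sketch below is not taken from the DroneSimLab codebase; get_frame and send_velocity are hypothetical stand-ins for whatever interface the simulator exposes.

import cv2

def track_step(frame, template):
    # Locate the target drone via template matching and return its
    # pixel offset from the image center together with the match score.
    result = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
    _, score, _, top_left = cv2.minMaxLoc(result)
    th, tw = template.shape[:2]
    dx = top_left[0] + tw // 2 - frame.shape[1] // 2
    dy = top_left[1] + th // 2 - frame.shape[0] // 2
    return (dx, dy), score

# Hypothetical control loop (kp is a proportional gain):
# while True:
#     frame = get_frame("chaser")              # simulated optical sensor
#     (dx, dy), score = track_step(frame, template)
#     if score > 0.6:
#         send_velocity("chaser", kp * dx, kp * dy)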

Sushi Dish - Object detection and classification from real images

Title Sushi Dish - Object detection and classification from real images
Authors Yeongjin Oh, Seunghyun Son, Gyumin Sim
Abstract In conveyor-belt sushi restaurants, billing is a burdensome job because staff must manually count the dishes and identify their colors to calculate the price. In a busy situation, mistakes can occur in which customers are overcharged or undercharged. To deal with this problem, we developed a method that automatically identifies the colors of dishes and calculates the total price from real images. Our method consists of ellipse fitting and a convolutional neural network. It achieves 85% precision and 96% recall in ellipse detection, and 92% classification accuracy.
Tasks Object Detection
Published 2017-09-03
URL http://arxiv.org/abs/1709.00751v2
PDF http://arxiv.org/pdf/1709.00751v2.pdf
PWC https://paperswithcode.com/paper/sushi-dish-object-detection-and
Repo https://github.com/YeongjinOh/Sushi-Dish
Framework none
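
The ellipse-fitting half of the pipeline maps naturally onto OpenCV primitives. A minimal sketch, assuming OpenCV 4 and a roughly top-down camera view; the thresholds are illustrative, not taken from the paper:

import cv2

def detect_dish_ellipses(image_bgr, min_area=2000):
    # Edge detection followed by per-contour ellipse fitting.
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    ellipses = []
    for c in contours:
        # fitEllipse needs at least 5 points; the area filter drops noise.
        if len(c) >= 5 and cv2.contourArea(c) >= min_area:
            ellipses.append(cv2.fitEllipse(c))
    return ellipses  # each entry: ((cx, cy), (major, minor), angle)

# Each ellipse region is then cropped and passed to the CNN, which
# classifies the dish color so the per-color prices can be summed.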

AxonDeepSeg: automatic axon and myelin segmentation from microscopy data using convolutional neural networks

Title AxonDeepSeg: automatic axon and myelin segmentation from microscopy data using convolutional neural networks
Authors Aldo Zaimi, Maxime Wabartha, Victor Herman, Pierre-Louis Antonsanti, Christian Samuel Perone, Julien Cohen-Adad
Abstract Segmentation of axon and myelin from microscopy images of the nervous system provides useful quantitative information about the tissue microstructure, such as axon density and myelin thickness. This could be used for instance to document cell morphometry across species, or to validate novel non-invasive quantitative magnetic resonance imaging techniques. Most currently-available segmentation algorithms are based on standard image processing and usually require multiple processing steps and/or parameter tuning by the user to adapt to different modalities. Moreover, only a few methods are publicly available. We introduce AxonDeepSeg, an open-source software that performs axon and myelin segmentation of microscopy images using deep learning. AxonDeepSeg features: (i) a convolutional neural network architecture; (ii) an easy training procedure to generate new models based on manually-labelled data; and (iii) two ready-to-use models trained on scanning electron microscopy (SEM) and transmission electron microscopy (TEM) data. Results show high pixel-wise accuracy across various species: 85% on rat SEM, 81% on human SEM, 95% on mouse TEM and 84% on macaque TEM. Segmentation of a full rat spinal cord slice is computed, and morphological metrics are extracted and compared against the literature. AxonDeepSeg is freely available at https://github.com/neuropoly/axondeepseg
Tasks
Published 2017-11-03
URL http://arxiv.org/abs/1711.01004v2
PDF http://arxiv.org/pdf/1711.01004v2.pdf
PWC https://paperswithcode.com/paper/axondeepseg-automatic-axon-and-myelin
Repo https://github.com/neuropoly/axondeepseg
Framework tf
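
For orientation, here is a minimal tf.keras encoder-decoder for three-class pixel-wise labelling (background / myelin / axon). It is a toy sketch, not the AxonDeepSeg architecture:

import tensorflow as tf
from tensorflow.keras import layers

def tiny_segmenter(num_classes=3):
    # Downsample once, upsample once, then classify every pixel.
    inputs = tf.keras.Input(shape=(None, None, 1))
    x = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    x = layers.UpSampling2D(2)(x)
    outputs = layers.Conv2D(num_classes, 1, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

model = tiny_segmenter()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")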

Adaptive Bidirectional Backpropagation: Towards Biologically Plausible Error Signal Transmission in Neural Networks

Title Adaptive Bidirectional Backpropagation: Towards Biologically Plausible Error Signal Transmission in Neural Networks
Authors Hongyin Luo, Jie Fu, James Glass
Abstract The back-propagation (BP) algorithm has been considered the de-facto method for training deep neural networks. It back-propagates errors from the output layer to the hidden layers in an exact manner using the transpose of the feedforward weights. However, it has been argued that this is not biologically plausible because back-propagating error signals with the exact incoming weights is not considered possible in biological neural systems. In this work, we propose a biologically plausible paradigm of neural architecture based on related literature in neuroscience and asymmetric BP-like methods. Specifically, we propose two bidirectional learning algorithms with trainable feedforward and feedback weights. The feedforward weights are used to relay activations from the inputs to target outputs. The feedback weights pass the error signals from the output layer to the hidden layers. Unlike other asymmetric BP-like methods, the feedback weights are also plastic in our framework and are trained to approximate the forward activations. Preliminary results show that our models outperform other asymmetric BP-like methods on the MNIST and the CIFAR-10 datasets.
Tasks
Published 2017-02-23
URL http://arxiv.org/abs/1702.07097v4
PDF http://arxiv.org/pdf/1702.07097v4.pdf
PWC https://paperswithcode.com/paper/adaptive-bidirectional-backpropagation
Repo https://github.com/SkTim/bdfa-torch
Framework torch
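
The core idea, trainable feedback weights replacing the transpose of the forward weights, fits in a short NumPy sketch. The feedback objective below (pulling the relayed error towards the hidden activations) is one plausible reading of the abstract; the paper defines the exact loss.

import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out, lr = 784, 256, 10, 0.01
W1 = rng.normal(0, 0.1, (n_in, n_hid))    # feedforward weights
W2 = rng.normal(0, 0.1, (n_hid, n_out))
B = rng.normal(0, 0.1, (n_out, n_hid))    # trainable feedback weights

def step(x, y_onehot):
    global W1, W2, B
    h = np.maximum(0, x @ W1)              # forward pass, ReLU hidden layer
    e = h @ W2 - y_onehot                  # output error
    dh = (e @ B) * (h > 0)                 # error relayed via B, not W2.T
    W2 -= lr * np.outer(h, e)
    W1 -= lr * np.outer(x, dh)
    B -= lr * np.outer(e, e @ B - h)       # train B so e @ B approximates h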

Fast Feature Fool: A data independent approach to universal adversarial perturbations

Title Fast Feature Fool: A data independent approach to universal adversarial perturbations
Authors Konda Reddy Mopuri, Utsav Garg, R. Venkatesh Babu
Abstract State-of-the-art object recognition Convolutional Neural Networks (CNNs) are shown to be fooled by image-agnostic perturbations, called universal adversarial perturbations. It is also observed that these perturbations generalize across multiple networks trained on the same target data. However, these algorithms require the training data on which the CNNs were trained and compute adversarial perturbations via complex optimization. The fooling performance of these approaches is directly proportional to the amount of available training data. This makes them unsuitable for practical attacks, since it is unreasonable for an attacker to have access to the training data. In this paper, for the first time, we propose a novel data-independent approach to generate image-agnostic perturbations for a range of CNNs trained for object recognition. We further show that these perturbations are transferable across multiple network architectures trained either on the same or on different data. In the absence of data, our method generates universal adversarial perturbations efficiently by fooling the features learned at multiple layers, thereby causing CNNs to misclassify. Experiments demonstrate impressive fooling rates and surprising transferability for the proposed universal perturbations generated without any training data.
Tasks Object Recognition
Published 2017-07-18
URL http://arxiv.org/abs/1707.05572v1
PDF http://arxiv.org/pdf/1707.05572v1.pdf
PWC https://paperswithcode.com/paper/fast-feature-fool-a-data-independent-approach
Repo https://github.com/utsavgarg/fast-feature-fool
Framework tf
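
A rough PyTorch rendering of the objective: feed only the perturbation through the network and maximize the log-product of mean activations across layers, keeping the perturbation within an L-infinity budget. Model choice and hyperparameters here are illustrative (torchvision >= 0.13 assumed for the weights argument):

import torch
import torchvision.models as models

model = models.vgg16(weights="IMAGENET1K_V1").eval()
delta = torch.zeros(1, 3, 224, 224, requires_grad=True)
opt = torch.optim.Adam([delta], lr=0.1)
eps = 10.0 / 255.0                      # L-infinity budget

for _ in range(100):
    x = torch.clamp(delta, -eps, eps)   # the perturbation is the only input
    acts = []
    for layer in model.features:
        x = layer(x)
        if isinstance(layer, torch.nn.ReLU):
            acts.append(x.abs().mean())
    # Maximize the log of the product of per-layer mean activations.
    loss = -sum(torch.log(a + 1e-8) for a in acts)
    opt.zero_grad()
    loss.backward()
    opt.step()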

Stack-Captioning: Coarse-to-Fine Learning for Image Captioning

Title Stack-Captioning: Coarse-to-Fine Learning for Image Captioning
Authors Jiuxiang Gu, Jianfei Cai, Gang Wang, Tsuhan Chen
Abstract Existing image captioning approaches typically train a one-stage sentence decoder, which makes it difficult to generate rich, fine-grained descriptions. On the other hand, multi-stage image captioning models are hard to train due to the vanishing gradient problem. In this paper, we propose a coarse-to-fine multi-stage prediction framework for image captioning, composed of multiple decoders, each of which operates on the output of the previous stage, producing increasingly refined image descriptions. Our proposed learning approach addresses the difficulty of vanishing gradients during training by providing a learning objective function that enforces intermediate supervision. In particular, we optimize our model with a reinforcement learning approach which utilizes the output of each intermediate decoder's test-time inference algorithm as well as the output of its preceding decoder to normalize the rewards, which simultaneously solves the well-known exposure bias problem and the loss-evaluation mismatch problem. We extensively evaluate the proposed approach on MSCOCO and show that it achieves state-of-the-art performance.
Tasks Image Captioning
Published 2017-09-11
URL http://arxiv.org/abs/1709.03376v3
PDF http://arxiv.org/pdf/1709.03376v3.pdf
PWC https://paperswithcode.com/paper/stack-captioning-coarse-to-fine-learning-for
Repo https://github.com/pdaicode/ImageCaptioning
Framework none
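
The reward normalization can be sketched in SCST style: each stage is conditioned on the previous stage's output, and the test-time (greedy) decode serves as a baseline for the sampled decode. decode_sample, decode_greedy and cider are hypothetical helpers, and this is one plausible reading of the abstract rather than the paper's exact objective:

def stage_losses(decoders, feats, cider, decode_sample, decode_greedy):
    # One REINFORCE loss per decoder stage; the advantage is the sampled
    # caption's CIDEr minus the greedy caption's CIDEr (the baseline).
    losses, prev = [], None
    for dec in decoders:
        sample, logprob = decode_sample(dec, feats, prev)   # stochastic decode
        greedy = decode_greedy(dec, feats, prev)            # test-time decode
        advantage = cider(sample) - cider(greedy)
        losses.append(-advantage * logprob)                 # intermediate supervision
        prev = greedy                                       # condition the next stage
    return losses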

TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension

Title TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
Authors Mandar Joshi, Eunsol Choi, Daniel S. Weld, Luke Zettlemoyer
Abstract We present TriviaQA, a challenging reading comprehension dataset containing over 650K question-answer-evidence triples. TriviaQA includes 95K question-answer pairs authored by trivia enthusiasts and independently gathered evidence documents, six per question on average, that provide high-quality distant supervision for answering the questions. We show that, in comparison to other recently introduced large-scale datasets, TriviaQA (1) has relatively complex, compositional questions, (2) has considerable syntactic and lexical variability between questions and corresponding answer-evidence sentences, and (3) requires more cross-sentence reasoning to find answers. We also present two baseline algorithms: a feature-based classifier and a state-of-the-art neural network that performs well on SQuAD reading comprehension. Neither approach comes close to human performance (23% and 40% vs. 80%), suggesting that TriviaQA is a challenging testbed that is worth significant future study. Data and code are available at http://nlp.cs.washington.edu/triviaqa/
Tasks Reading Comprehension
Published 2017-05-09
URL http://arxiv.org/abs/1705.03551v2
PDF http://arxiv.org/pdf/1705.03551v2.pdf
PWC https://paperswithcode.com/paper/triviaqa-a-large-scale-distantly-supervised
Repo https://github.com/mandarjoshi90/triviaqa
Framework tf
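
The distant-supervision signal is simple to state in code: an evidence document counts only if it actually contains one of the answer's aliases. A self-contained sketch:

import re

def distantly_supervised(answer_aliases, evidence_docs):
    # Keep evidence documents that mention at least one answer alias.
    pattern = re.compile("|".join(re.escape(a) for a in answer_aliases),
                         re.IGNORECASE)
    return [doc for doc in evidence_docs if pattern.search(doc)]

docs = distantly_supervised(
    ["Isaac Newton", "Sir Isaac Newton"],
    ["Isaac Newton formulated the laws of motion.", "Unrelated text."])
assert len(docs) == 1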

On Data-Driven Saak Transform

Title On Data-Driven Saak Transform
Authors C. -C. Jay Kuo, Yueru Chen
Abstract Being motivated by the multilayer RECOS (REctified-COrrelations on a Sphere) transform, we develop a data-driven Saak (Subspace approximation with augmented kernels) transform in this work. The Saak transform consists of three steps: 1) building the optimal linear subspace approximation with orthonormal bases using the second-order statistics of input vectors, 2) augmenting each transform kernel with its negative, and 3) applying the rectified linear unit (ReLU) to the transform output. The Karhunen-Loève transform (KLT) is used in the first step. The integration of Steps 2 and 3 is powerful since they resolve the sign confusion problem, remove the rectification loss and allow a straightforward implementation of the inverse Saak transform at the same time. Multiple Saak transforms are cascaded to transform images of a larger size. All Saak transform kernels are derived from the second-order statistics of input random vectors in a one-pass feedforward manner. Neither data labels nor backpropagation is used in kernel determination. Multi-stage Saak transforms offer a family of joint spatial-spectral representations between two extremes; namely, the full spatial-domain representation and the full spectral-domain representation. We select Saak coefficients of higher discriminant power to form a feature vector for pattern recognition, and use the MNIST dataset classification problem as an illustrative example.
Tasks
Published 2017-10-11
URL http://arxiv.org/abs/1710.04176v2
PDF http://arxiv.org/pdf/1710.04176v2.pdf
PWC https://paperswithcode.com/paper/on-data-driven-saak-transform
Repo https://github.com/rickerish-nah/Digital-Image-Processing-Cpp-
Framework none
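
The three steps of a single Saak stage translate directly into NumPy: KLT kernels from the covariance of the input patches, augmentation of each kernel with its negative, then ReLU. A minimal sketch on flattened patches:

import numpy as np

def saak_stage(patches):
    # patches: (num_samples, dim) matrix of flattened local patches.
    centered = patches - patches.mean(axis=0)
    cov = centered.T @ centered / len(patches)      # second-order statistics
    _, eigvecs = np.linalg.eigh(cov)
    kernels = eigvecs[:, ::-1].T                    # KLT bases, descending variance
    augmented = np.vstack([kernels, -kernels])      # step 2: add negated kernels
    return np.maximum(0, centered @ augmented.T)    # step 3: ReLU

rng = np.random.default_rng(0)
out = saak_stage(rng.normal(size=(1000, 16)))       # e.g. flattened 4x4 patches
print(out.shape)  # (1000, 32): dimension doubles, but no sign information is lost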

SqueezeSeg: Convolutional Neural Nets with Recurrent CRF for Real-Time Road-Object Segmentation from 3D LiDAR Point Cloud

Title SqueezeSeg: Convolutional Neural Nets with Recurrent CRF for Real-Time Road-Object Segmentation from 3D LiDAR Point Cloud
Authors Bichen Wu, Alvin Wan, Xiangyu Yue, Kurt Keutzer
Abstract In this paper, we address semantic segmentation of road-objects from 3D LiDAR point clouds. In particular, we wish to detect and categorize instances of interest, such as cars, pedestrians and cyclists. We formulate this problem as a point-wise classification problem, and propose an end-to-end pipeline called SqueezeSeg based on convolutional neural networks (CNN): the CNN takes a transformed LiDAR point cloud as input and directly outputs a point-wise label map, which is then refined by a conditional random field (CRF) implemented as a recurrent layer. Instance-level labels are then obtained by conventional clustering algorithms. Our CNN model is trained on LiDAR point clouds from the KITTI dataset, and our point-wise segmentation labels are derived from 3D bounding boxes from KITTI. To obtain extra training data, we built a LiDAR simulator into Grand Theft Auto V (GTA-V), a popular video game, to synthesize large amounts of realistic training data. Our experiments show that SqueezeSeg achieves high accuracy with astonishingly fast and stable runtime (8.7 ms per frame), highly desirable for autonomous driving applications. Furthermore, additionally training on synthesized data boosts validation accuracy on real-world data. Our source code and synthesized data will be open-sourced.
Tasks Autonomous Driving, Semantic Segmentation
Published 2017-10-19
URL http://arxiv.org/abs/1710.07368v1
PDF http://arxiv.org/pdf/1710.07368v1.pdf
PWC https://paperswithcode.com/paper/squeezeseg-convolutional-neural-nets-with
Repo https://github.com/priyankanagaraj1494/Squeezseg
Framework tf
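
The "transformed LiDAR point cloud" is a spherical projection of the points onto a 2D grid the CNN can consume. A NumPy sketch; the field-of-view bounds are typical 64-beam values, not taken from the paper:

import numpy as np

def spherical_projection(points, H=64, W=512, fov_up=3.0, fov_down=-25.0):
    # points: (N, 3) array of x, y, z coordinates.
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)                          # azimuth angle
    pitch = np.arcsin(z / np.maximum(r, 1e-8))      # elevation angle
    up, down = np.radians(fov_up), np.radians(fov_down)
    u = ((yaw + np.pi) / (2 * np.pi) * W).astype(int) % W
    v = np.clip(((up - pitch) / (up - down) * H).astype(int), 0, H - 1)
    image = np.zeros((H, W))
    image[v, u] = r                                 # range channel of the CNN input
    return image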

Transfer learning for music classification and regression tasks

Title Transfer learning for music classification and regression tasks
Authors Keunwoo Choi, György Fazekas, Mark Sandler, Kyunghyun Cho
Abstract In this paper, we present a transfer learning approach for music classification and regression tasks. We propose to use a pre-trained convnet feature, a concatenated feature vector using the activations of feature maps of multiple layers in a trained convolutional network. We show how this convnet feature can serve as general-purpose music representation. In the experiments, a convnet is trained for music tagging and then transferred to other music-related classification and regression tasks. The convnet feature outperforms the baseline MFCC feature in all the considered tasks and several previous approaches that are aggregating MFCCs as well as low- and high-level music features.
Tasks Music Classification, Transfer Learning
Published 2017-03-27
URL http://arxiv.org/abs/1703.09179v4
PDF http://arxiv.org/pdf/1703.09179v4.pdf
PWC https://paperswithcode.com/paper/transfer-learning-for-music-classification
Repo https://github.com/eatsleepraverepeat/emusic_net
Framework tf
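
The "convnet feature" is just pooled activations from several layers, concatenated. A tf.keras sketch in which model stands in for the pretrained music tagging network (the paper's network is not bundled with any framework):

import tensorflow as tf

def convnet_feature(model, x, layer_names):
    # Globally average-pool the feature maps of the named layers and
    # concatenate them into one general-purpose music representation.
    extractor = tf.keras.Model(
        inputs=model.input,
        outputs=[model.get_layer(n).output for n in layer_names])
    pooled = [tf.reduce_mean(f, axis=[1, 2]) for f in extractor(x)]
    return tf.concat(pooled, axis=-1)

# The concatenated vector is then fed to a simple classifier or
# regressor (e.g. an SVM) for each downstream task.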

Exploring Sparsity in Recurrent Neural Networks

Title Exploring Sparsity in Recurrent Neural Networks
Authors Sharan Narang, Erich Elsen, Gregory Diamos, Shubho Sengupta
Abstract Recurrent Neural Networks (RNN) are widely used to solve a variety of problems and as the quantity of data and the amount of available compute have increased, so have model sizes. The number of parameters in recent state-of-the-art networks makes them hard to deploy, especially on mobile phones and embedded devices. The challenge is due to both the size of the model and the time it takes to evaluate it. In order to deploy these RNNs efficiently, we propose a technique to reduce the parameters of a network by pruning weights during the initial training of the network. At the end of training, the parameters of the network are sparse while accuracy is still close to the original dense neural network. The network size is reduced by 8x and the time required to train the model remains constant. Additionally, we can prune a larger dense network to achieve better than baseline performance while still reducing the total number of parameters significantly. Pruning RNNs reduces the size of the model and can also help achieve significant inference time speed-up using sparse matrix multiply. Benchmarks show that using our technique model size can be reduced by 90% and speed-up is around 2x to 7x.
Tasks
Published 2017-04-17
URL http://arxiv.org/abs/1704.05119v2
PDF http://arxiv.org/pdf/1704.05119v2.pdf
PWC https://paperswithcode.com/paper/exploring-sparsity-in-recurrent-neural
Repo https://github.com/puhsu/pruning
Framework pytorch
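
The mechanism, a magnitude threshold that ramps up during training so weights are pruned gradually, can be sketched in a few lines. The linear ramp below is a simplification of the paper's schedule:

import numpy as np

def threshold_at(step, start, end, final_threshold):
    # Ramp the pruning threshold linearly from 0 to its final value.
    frac = min(max((step - start) / (end - start), 0.0), 1.0)
    return frac * final_threshold

def prune(weights, step, start=2000, end=20000, final_threshold=0.1):
    # Zero out weights whose magnitude falls below the current threshold;
    # zeroed weights stay below any later, larger threshold, so they
    # remain pruned and the network grows sparser as training proceeds.
    mask = np.abs(weights) >= threshold_at(step, start, end, final_threshold)
    return weights * mask, mask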

Quality and Diversity Optimization: A Unifying Modular Framework

Title Quality and Diversity Optimization: A Unifying Modular Framework
Authors Antoine Cully, Yiannis Demiris
Abstract The optimization of functions to find the best solution according to one or several objectives has a central role in many engineering and research fields. Recently, a new family of optimization algorithms, named Quality-Diversity optimization, has been introduced, and contrasts with classic algorithms. Instead of searching for a single solution, Quality-Diversity algorithms search for a large collection of both diverse and high-performing solutions. The role of this collection is to cover the range of possible solution types as much as possible, and to contain the best solution for each type. The contribution of this paper is threefold. Firstly, we present a unifying framework of Quality-Diversity optimization algorithms that covers the two main algorithms of this family (Multi-dimensional Archive of Phenotypic Elites and Novelty Search with Local Competition), and that highlights the large variety of variants that can be investigated within this family. Secondly, we propose a new selection mechanism for Quality-Diversity algorithms that outperforms all the algorithms tested in this paper. Lastly, we present a new collection-management approach that overcomes the erosion issues observed when using unstructured collections. These three contributions are supported by extensive experimental comparisons of Quality-Diversity algorithms in three different experimental scenarios.
Tasks
Published 2017-05-12
URL http://arxiv.org/abs/1708.09251v1
PDF http://arxiv.org/pdf/1708.09251v1.pdf
PWC https://paperswithcode.com/paper/quality-and-diversity-optimization-a-unifying
Repo https://github.com/sferes2/modular_QD
Framework none
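
Of the two algorithms the framework unifies, MAP-Elites is the easier to sketch: an archive keeps the best solution found for each behavior-descriptor cell. sample, mutate and evaluate are problem-specific callables; here evaluate is assumed to return a fitness and a descriptor in [0, 1):

import numpy as np

def map_elites(evaluate, sample, mutate, cells=50, iters=10000):
    archive = {}  # cell index -> (fitness, solution)
    rng = np.random.default_rng(0)
    for _ in range(iters):
        if archive:
            # Pick a random elite and mutate it (the selection mechanism
            # is exactly what the paper proposes to improve).
            parent = mutate(archive[rng.choice(list(archive))][1])
        else:
            parent = sample()
        fitness, descriptor = evaluate(parent)
        cell = min(int(descriptor * cells), cells - 1)
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, parent)
    return archive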

Central Moment Discrepancy (CMD) for Domain-Invariant Representation Learning

Title Central Moment Discrepancy (CMD) for Domain-Invariant Representation Learning
Authors Werner Zellinger, Thomas Grubinger, Edwin Lughofer, Thomas Natschläger, Susanne Saminger-Platz
Abstract The learning of domain-invariant representations in the context of domain adaptation with neural networks is considered. We propose a new regularization method that minimizes the discrepancy between domain-specific latent feature representations directly in the hidden activation space. Although some standard distribution matching approaches exist that can be interpreted as the matching of weighted sums of moments, e.g. Maximum Mean Discrepancy (MMD), an explicit order-wise matching of higher order moments has not been considered before. We propose to match the higher order central moments of probability distributions by means of order-wise moment differences. Our model does not require computationally expensive distance and kernel matrix computations. We utilize the equivalent representation of probability distributions by moment sequences to define a new distance function, called Central Moment Discrepancy (CMD). We prove that CMD is a metric on the set of probability distributions on a compact interval. We further prove that convergence of probability distributions on compact intervals w.r.t. the new metric implies convergence in distribution of the respective random variables. We test our approach on two different benchmark data sets for object recognition (Office) and sentiment analysis of product reviews (Amazon reviews). CMD achieves a new state-of-the-art performance on most domain adaptation tasks of Office and outperforms networks trained with MMD, Variational Fair Autoencoders and Domain Adversarial Neural Networks on Amazon reviews. In addition, a post-hoc parameter sensitivity analysis shows that the new approach is stable w.r.t. parameter changes in a certain interval. The source code of the experiments is publicly available.
Tasks Domain Adaptation, Object Recognition, Representation Learning, Sentiment Analysis
Published 2017-02-28
URL http://arxiv.org/abs/1702.08811v3
PDF http://arxiv.org/pdf/1702.08811v3.pdf
PWC https://paperswithcode.com/paper/central-moment-discrepancy-cmd-for-domain
Repo https://github.com/wzell/cmd
Framework none
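
CMD itself is a few lines of NumPy: the distance between means plus order-wise distances between central moments, each scaled by the width of the activation interval [a, b]:

import numpy as np

def cmd(x, y, K=5, a=0.0, b=1.0):
    # x, y: (samples, features) hidden activations bounded in [a, b].
    span = abs(b - a)
    dx, dy = x - x.mean(axis=0), y - y.mean(axis=0)
    d = np.linalg.norm(x.mean(axis=0) - y.mean(axis=0)) / span
    for k in range(2, K + 1):
        # Order-wise difference of the k-th central moments.
        d += np.linalg.norm((dx ** k).mean(axis=0)
                            - (dy ** k).mean(axis=0)) / span ** k
    return d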

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

Title Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Authors Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He
Abstract Deep learning thrives with large neural networks and large datasets. However, larger networks and larger datasets result in longer training times that impede research and development progress. Distributed synchronous SGD offers a potential solution to this problem by dividing SGD minibatches over a pool of parallel workers. Yet to make this scheme efficient, the per-worker workload must be large, which implies nontrivial growth in the SGD minibatch size. In this paper, we empirically show that on the ImageNet dataset large minibatches cause optimization difficulties, but when these are addressed the trained networks exhibit good generalization. Specifically, we show no loss of accuracy when training with large minibatch sizes up to 8192 images. To achieve this result, we adopt a hyper-parameter-free linear scaling rule for adjusting learning rates as a function of minibatch size and develop a new warmup scheme that overcomes optimization challenges early in training. With these simple techniques, our Caffe2-based system trains ResNet-50 with a minibatch size of 8192 on 256 GPUs in one hour, while matching small minibatch accuracy. Using commodity hardware, our implementation achieves ~90% scaling efficiency when moving from 8 to 256 GPUs. Our findings enable training visual recognition models on internet-scale data with high efficiency.
Tasks Stochastic Optimization
Published 2017-06-08
URL http://arxiv.org/abs/1706.02677v2
PDF http://arxiv.org/pdf/1706.02677v2.pdf
PWC https://paperswithcode.com/paper/accurate-large-minibatch-sgd-training
Repo https://github.com/tensorpack/benchmarks
Framework tf
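
The two ingredients, the linear scaling rule and gradual warmup, fit in one schedule function. This follows the recipe in the abstract; the stepwise decay that follows warmup is left out:

def learning_rate(step, steps_per_epoch, batch_size,
                  base_lr=0.1, base_batch=256, warmup_epochs=5):
    # Linear scaling rule: scale the reference LR with the batch size.
    target = base_lr * batch_size / base_batch
    warmup_steps = warmup_epochs * steps_per_epoch
    if step < warmup_steps:
        # Gradual warmup: ramp from base_lr up to the scaled target.
        return base_lr + (target - base_lr) * step / warmup_steps
    return target  # afterwards, apply the usual stepwise decay

# With batch size 8192 the target LR is 0.1 * 8192 / 256 = 3.2,
# reached gradually over the first five epochs.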

Learning Intrinsic Sparse Structures within Long Short-Term Memory

Title Learning Intrinsic Sparse Structures within Long Short-Term Memory
Authors Wei Wen, Yuxiong He, Samyam Rajbhandari, Minjia Zhang, Wenhan Wang, Fang Liu, Bin Hu, Yiran Chen, Hai Li
Abstract Model compression is significant for the wide adoption of Recurrent Neural Networks (RNNs), both in user devices possessing limited resources and in business clusters requiring quick responses to large-scale service requests. This work aims to learn structurally-sparse Long Short-Term Memory (LSTM) by reducing the sizes of basic structures within LSTM units, including input updates, gates, hidden states, cell states and outputs. Independently reducing the sizes of basic structures can result in inconsistent dimensions among them and, consequently, invalid LSTM units. To overcome this problem, we propose Intrinsic Sparse Structures (ISS) in LSTMs. Removing a component of ISS simultaneously decreases the sizes of all basic structures by one and thereby always maintains dimension consistency. By learning ISS within LSTM units, the obtained LSTMs remain regular while having much smaller basic structures. Based on group Lasso regularization, our method achieves a 10.59x speedup without any perplexity loss on language modeling of the Penn TreeBank dataset. It is also successfully evaluated through a compact model with only 2.69M weights for machine question answering on the SQuAD dataset. Our approach extends successfully to non-LSTM RNNs, like Recurrent Highway Networks (RHNs). Our source code is publicly available at https://github.com/wenwei202/iss-rnns
Tasks Language Modelling, Model Compression, Question Answering
Published 2017-09-15
URL http://arxiv.org/abs/1709.05027v7
PDF http://arxiv.org/pdf/1709.05027v7.pdf
PWC https://paperswithcode.com/paper/learning-intrinsic-sparse-structures-within
Repo https://github.com/wenwei202/iss-rnns
Framework tf
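
The regularizer is ordinary group Lasso over the ISS groups; the subtlety lies entirely in how weights are gathered into groups (one group per hidden index, spanning every gate and state). A PyTorch sketch of the penalty, with the gathering left to a hypothetical collect_iss_groups helper:

import torch

def group_lasso(weight_groups, lam=1e-4):
    # Sum of L2 norms: whole groups are driven to zero together, so the
    # hidden dimension shrinks consistently across all basic structures.
    return lam * sum(g.norm(p=2) for g in weight_groups)

# Training loop: loss = task_loss + group_lasso(collect_iss_groups(lstm))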