October 20, 2019

3148 words 15 mins read

Paper Group AWR 315


Point Convolutional Neural Networks by Extension Operators. FWLBP: A Scale Invariant Descriptor for Texture Classification. Masked Conditional Neural Networks for Automatic Sound Events Recognition. On Kernel Method-Based Connectionist Models and Supervised Deep Learning Without Backpropagation. ClariNet: Parallel Wave Generation in End-to-End Text …

Point Convolutional Neural Networks by Extension Operators

Title Point Convolutional Neural Networks by Extension Operators
Authors Matan Atzmon, Haggai Maron, Yaron Lipman
Abstract This paper presents Point Convolutional Neural Networks (PCNN): a novel framework for applying convolutional neural networks to point clouds. The framework consists of two operators: extension and restriction, mapping point cloud functions to volumetric functions and vice versa. A point cloud convolution is defined by pull-back of the Euclidean volumetric convolution via this extension-restriction mechanism. The point cloud convolution is computationally efficient, invariant to the order of points in the point cloud, robust to different samplings and varying densities, and translation invariant, that is, the same convolution kernel is used at all points. PCNN generalizes image CNNs and allows their architectures to be readily adapted to the point cloud setting. Evaluation of PCNN on three central point cloud learning benchmarks shows that it convincingly outperforms competing point cloud learning methods, as well as the vast majority of methods working with more informative shape representations such as surfaces and/or normals.
Tasks Classify 3D Point Clouds
Published 2018-03-27
URL http://arxiv.org/abs/1803.10091v1
PDF http://arxiv.org/pdf/1803.10091v1.pdf
PWC https://paperswithcode.com/paper/point-convolutional-neural-networks-by
Repo https://github.com/matanatz/pcnn
Framework tf
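
The extension-restriction idea above can be made concrete with a toy example. Below is a minimal NumPy sketch of the pull-back convolution under simplifying assumptions: a Gaussian RBF extension operator, a nearest-neighbour restriction, a regular 16^3 voxel grid, and a fixed box filter standing in for a learned kernel. The function names and parameters are illustrative and not taken from the PCNN code.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def extend(points, feats, grid, sigma=0.1):
    """Extension operator: map a point-cloud function to a volumetric one by
    summing Gaussian RBFs centred at the points (a simplifying choice)."""
    d2 = ((grid[:, None, :] - points[None, :, :]) ** 2).sum(-1)   # (M, N)
    return np.exp(-d2 / (2 * sigma ** 2)) @ feats                 # (M,)

def restrict(volume_vals, grid, points):
    """Restriction operator: sample the volumetric function back at the point
    locations (nearest voxel centre, for brevity)."""
    idx = ((grid[None, :, :] - points[:, None, :]) ** 2).sum(-1).argmin(1)
    return volume_vals[idx]                                       # (N,)

rng = np.random.default_rng(0)
pts = rng.uniform(-1, 1, (128, 3))            # a toy point cloud
f = rng.normal(size=128)                      # a scalar function on the points

axis = np.linspace(-1, 1, 16)                 # a coarse 16^3 voxel grid
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), -1).reshape(-1, 3)

# Pull-back convolution: extend -> Euclidean volumetric convolution -> restrict.
vol = extend(pts, f, grid).reshape(16, 16, 16)
vol = uniform_filter(vol, size=3)             # box filter stands in for a learned kernel
out = restrict(vol.reshape(-1), grid, pts)
print(out.shape)                              # (128,) -- one convolved value per point
```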

FWLBP: A Scale Invariant Descriptor for Texture Classification

Title FWLBP: A Scale Invariant Descriptor for Texture Classification
Authors Swalpa Kumar Roy, Nilavra Bhattacharya, Bhabatosh Chanda, Bidyut B. Chaudhuri, Dipak Kumar Ghosh
Abstract In this paper we propose a novel texture descriptor called Fractal Weighted Local Binary Pattern (FWLBP). The fractal dimension (FD) measure is relatively invariant to scale changes and correlates well with the human perception of surface roughness. We utilize this property to construct a scale-invariant descriptor. The input image is sampled using an augmented form of the local binary pattern (LBP) over three different radii, and an indexing operation then assigns FD weights to the collected samples. The final histogram of the descriptor has its features computed from the LBP codes and its weights computed from the FD image. The proposed descriptor is scale invariant, robust to rotation and reflection, and partially tolerant to noise and illumination changes. In addition, the local fractal dimension is relatively insensitive to bi-Lipschitz transformations, while its extension is sufficient to precisely discriminate fundamental texture primitives. Experiments carried out on standard texture databases show that the proposed descriptor achieves better classification rates than state-of-the-art descriptors.
Tasks Texture Classification
Published 2018-01-10
URL http://arxiv.org/abs/1801.03228v2
PDF http://arxiv.org/pdf/1801.03228v2.pdf
PWC https://paperswithcode.com/paper/fwlbp-a-scale-invariant-descriptor-for
Repo https://github.com/swalpa/FWLBP
Framework none
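
As a rough illustration of the descriptor's two ingredients, the sketch below combines multi-radius uniform LBP histograms with weights from a differential box-counting estimate of the local fractal dimension. The window size, radii, bin count and the per-window weighting scheme are simplifying assumptions, not the authors' exact indexing procedure.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def dbc_fd(patch, sizes=(2, 4, 8)):
    """Differential box-counting estimate of the fractal dimension of a patch."""
    M, counts = patch.shape[0], []
    for s in sizes:
        h = 256.0 * s / M                      # grey-level box height
        n = 0
        for i in range(0, M - s + 1, s):
            for j in range(0, M - s + 1, s):
                block = patch[i:i + s, j:j + s]
                n += int(np.ceil(block.max() / h) - np.floor(block.min() / h)) + 1
        counts.append(n)
    # FD = slope of log(count) versus log(1 / box size)
    return np.polyfit(np.log(1.0 / np.asarray(sizes)), np.log(counts), 1)[0]

def fwlbp_like(image, radii=(1, 2, 3), bins=10, win=16):
    """Multi-radius uniform LBP histogram whose window-wise contributions are
    weighted by the local fractal dimension of that window."""
    hist = np.zeros(len(radii) * bins)
    for k, r in enumerate(radii):
        codes = local_binary_pattern(image, P=8, R=r, method="uniform")
        for i in range(0, image.shape[0] - win + 1, win):
            for j in range(0, image.shape[1] - win + 1, win):
                w = dbc_fd(image[i:i + win, j:j + win].astype(float))
                h, _ = np.histogram(codes[i:i + win, j:j + win], bins=bins, range=(0, bins))
                hist[k * bins:(k + 1) * bins] += w * h
    return hist / (np.linalg.norm(hist) + 1e-12)

img = (np.random.default_rng(0).random((64, 64)) * 255).astype(np.uint8)
print(fwlbp_like(img).shape)   # (30,) scale-weighted descriptor
```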

Masked Conditional Neural Networks for Automatic Sound Events Recognition

Title Masked Conditional Neural Networks for Automatic Sound Events Recognition
Authors Fady Medhat, David Chesmore, John Robinson
Abstract Deep neural network architectures designed for application domains other than sound, especially image recognition, may not optimally harness the time-frequency representation when adapted to the sound recognition problem. In this work, we explore the ConditionaL Neural Network (CLNN) and the Masked ConditionaL Neural Network (MCLNN) for multi-dimensional temporal signal recognition. The CLNN considers the inter-frame relationship, and the MCLNN enforces a systematic sparseness over the network's links so that learning happens over frequency bands rather than individual bins, making the network frequency-shift invariant in a manner that mimics a filterbank. The mask also allows several feature combinations to be considered concurrently, something that is usually handcrafted through exhaustive manual search. We applied the MCLNN to the environmental sound recognition problem using the ESC-10 and ESC-50 datasets. MCLNN achieved performance competitive with state-of-the-art convolutional neural networks while using 12% of the parameters and no augmentation.
Tasks
Published 2018-02-15
URL http://arxiv.org/abs/1802.05792v2
PDF http://arxiv.org/pdf/1802.05792v2.pdf
PWC https://paperswithcode.com/paper/masked-conditional-neural-networks-for
Repo https://github.com/fadymedhat/MCLNN
Framework tf
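
The masking idea can be pictured as a banded binary matrix applied element-wise to a dense layer's weights, so each hidden unit only sees a contiguous band of frequency bins. The sketch below is a minimal NumPy illustration of that; the bandwidth, overlap and wrap-around pattern are assumptions rather than the paper's exact mask design.

```python
import numpy as np

def band_mask(n_in, n_out, bandwidth=5, overlap=3):
    """Binary mask restricting each hidden unit to a contiguous band of input
    (frequency) bins; successive units shift the band by (bandwidth - overlap),
    loosely mimicking a filterbank."""
    mask = np.zeros((n_in, n_out), dtype=np.float32)
    step = max(bandwidth - overlap, 1)
    for j in range(n_out):
        start = (j * step) % n_in
        idx = (start + np.arange(bandwidth)) % n_in   # wrap around the spectrum
        mask[idx, j] = 1.0
    return mask

# Masked dense layer: the mask zeroes the weights, permanently switching off
# the links outside each unit's band.
rng = np.random.default_rng(0)
n_bins, n_hidden = 60, 40
W = rng.normal(scale=0.1, size=(n_bins, n_hidden)).astype(np.float32)
M = band_mask(n_bins, n_hidden)

frame = rng.normal(size=(1, n_bins)).astype(np.float32)   # one spectrogram frame
hidden = np.maximum(frame @ (W * M), 0.0)                  # masked linear + ReLU
print(hidden.shape, int(M.sum()), "active links out of", M.size)
```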

On Kernel Method-Based Connectionist Models and Supervised Deep Learning Without Backpropagation

Title On Kernel Method-Based Connectionist Models and Supervised Deep Learning Without Backpropagation
Authors Shiyu Duan, Shujian Yu, Yunmei Chen, Jose Principe
Abstract We propose a novel family of connectionist models based on kernel machines and consider the problem of learning layer-by-layer a compositional hypothesis class, i.e., a feedforward, multilayer architecture, in a supervised setting. In terms of the models, we present a principled method to “kernelize” (partly or completely) any neural network (NN). With this method, we obtain a counterpart of any given NN that is powered by kernel machines instead of neurons. In terms of learning, when learning a feedforward deep architecture in a supervised setting, one needs to train all the components simultaneously using backpropagation (BP) since there are no explicit targets for the hidden layers (Rumelhart86). We consider without loss of generality the two-layer case and present a general framework that explicitly characterizes a target for the hidden layer that is optimal for minimizing the objective function of the network. This characterization then makes possible a purely greedy training scheme that learns one layer at a time, starting from the input layer. We provide realizations of the abstract framework under certain architectures and objective functions. Based on these realizations, we present a layer-wise training algorithm for an l-layer feedforward network for classification, where l>=2 can be arbitrary. This algorithm can be given an intuitive geometric interpretation that makes the learning dynamics transparent. Empirical results are provided to complement our theory. We show that the kernelized networks, trained layer-wise, compare favorably with classical kernel machines as well as other connectionist models trained by BP. We also visualize the inner workings of the greedy kernelized models to validate our claim on the transparency of the layer-wise algorithm.
Tasks
Published 2018-02-11
URL https://arxiv.org/abs/1802.03774v4
PDF https://arxiv.org/pdf/1802.03774v4.pdf
PWC https://paperswithcode.com/paper/learning-backpropagation-free-deep
Repo https://github.com/michaelshiyu/kerNET
Framework pytorch
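
To convey the greedy, backpropagation-free training style (though not the paper's optimal hidden-layer target construction), here is a loose scikit-learn sketch in which each "layer" is an approximate kernel machine fitted on its own, frozen, and stacked. The choice of Nystroem features, logistic read-outs, and passing the layer-1 scores alongside the raw input are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Layer 1: an (approximate) kernel machine trained on its own, no backprop.
phi1 = Nystroem(kernel="rbf", gamma=0.1, n_components=100, random_state=0)
lin1 = LogisticRegression(max_iter=1000)
lin1.fit(phi1.fit_transform(X_tr), y_tr)

# Freeze layer 1; its class scores (appended to the raw features so no
# information is thrown away) become layer 2's input.
def layer1_out(X):
    s = lin1.decision_function(phi1.transform(X)).reshape(len(X), -1)
    return np.hstack([X, s])

# Layer 2: another kernel machine, again trained greedily on the labels.
phi2 = Nystroem(kernel="rbf", gamma=0.1, n_components=100, random_state=0)
lin2 = LogisticRegression(max_iter=1000)
lin2.fit(phi2.fit_transform(layer1_out(X_tr)), y_tr)

print("greedy two-layer test accuracy:",
      lin2.score(phi2.transform(layer1_out(X_te)), y_te))
```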

ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech

Title ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech
Authors Wei Ping, Kainan Peng, Jitong Chen
Abstract In this work, we propose a new solution for parallel wave generation by WaveNet. In contrast to parallel WaveNet (van den Oord et al., 2018), we distill a Gaussian inverse autoregressive flow from the autoregressive WaveNet by minimizing a regularized KL divergence between their highly-peaked output distributions. Our method computes the KL divergence in closed-form, which simplifies the training algorithm and provides very efficient distillation. In addition, we introduce the first text-to-wave neural architecture for speech synthesis, which is fully convolutional and enables fast end-to-end training from scratch. It significantly outperforms the previous pipeline that connects a text-to-spectrogram model to a separately trained WaveNet (Ping et al., 2018). We also successfully distill a parallel waveform synthesizer conditioned on the hidden representation in this end-to-end model.
Tasks Speech Synthesis
Published 2018-07-19
URL http://arxiv.org/abs/1807.07281v3
PDF http://arxiv.org/pdf/1807.07281v3.pdf
PWC https://paperswithcode.com/paper/clarinet-parallel-wave-generation-in-end-to
Repo https://github.com/rickyHong/ClariNet-WaveNet-repl
Framework pytorch
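
The key ingredient mentioned in the abstract is a closed-form KL divergence between the student's and teacher's Gaussian output distributions. The snippet below shows that standard closed form together with a Monte Carlo sanity check; the regularization ClariNet adds on top is omitted.

```python
import numpy as np

def kl_gaussian(mu_q, sigma_q, mu_p, sigma_p):
    """KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) ) in closed form."""
    return (np.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
            - 0.5)

# Monte Carlo sanity check of the closed form.
rng = np.random.default_rng(0)
mu_q, s_q, mu_p, s_p = 0.3, 0.8, -0.1, 1.2
x = rng.normal(mu_q, s_q, size=1_000_000)
log_q = -0.5 * ((x - mu_q) / s_q) ** 2 - np.log(s_q) - 0.5 * np.log(2 * np.pi)
log_p = -0.5 * ((x - mu_p) / s_p) ** 2 - np.log(s_p) - 0.5 * np.log(2 * np.pi)
print(kl_gaussian(mu_q, s_q, mu_p, s_p), (log_q - log_p).mean())  # ~equal
```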

Reinforcement Learning of Active Vision for Manipulating Objects under Occlusions

Title Reinforcement Learning of Active Vision for Manipulating Objects under Occlusions
Authors Ricson Cheng, Arpit Agarwal, Katerina Fragkiadaki
Abstract We consider artificial agents that learn to jointly control their gripper and camera in order to reinforcement-learn manipulation policies in the presence of occlusions from distractor objects. Distractors often occlude the object of interest and cause it to disappear from the field of view. We propose hand/eye controllers that learn to move the camera to keep the object within the field of view and visible, in coordination with manipulating it to achieve the desired goal, e.g., pushing it to a target location. We incorporate structural biases of object-centric attention within our actor-critic architectures, which our experiments suggest to be key for good performance. Our results further highlight the importance of curriculum with regard to environment difficulty. The resulting active vision / manipulation policies outperform static camera setups for a variety of cluttered environments.
Tasks
Published 2018-11-20
URL http://arxiv.org/abs/1811.08067v2
PDF http://arxiv.org/pdf/1811.08067v2.pdf
PWC https://paperswithcode.com/paper/reinforcement-learning-of-active-vision-for
Repo https://github.com/ricsonc/ActiveVisionManipulation
Framework none
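
As a rough sketch of the joint hand/eye control setup (not the paper's object-centric attention architecture), here is a minimal PyTorch actor-critic with a shared encoder and separate action heads for the gripper and the camera. The observation and action dimensions are placeholders.

```python
import torch
import torch.nn as nn

class HandEyeActorCritic(nn.Module):
    """Minimal actor-critic with separate "hand" (gripper) and "eye" (camera)
    action heads over a shared encoder."""
    def __init__(self, obs_dim=64, hand_dim=4, eye_dim=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                     nn.Linear(128, 128), nn.ReLU())
        self.hand_head = nn.Linear(128, hand_dim)   # e.g. end-effector deltas
        self.eye_head = nn.Linear(128, eye_dim)     # e.g. camera pan/tilt
        self.value_head = nn.Linear(128, 1)         # critic

    def forward(self, obs):
        h = self.encoder(obs)
        return (torch.tanh(self.hand_head(h)),
                torch.tanh(self.eye_head(h)),
                self.value_head(h))

obs = torch.randn(8, 64)
hand, eye, value = HandEyeActorCritic()(obs)
print(hand.shape, eye.shape, value.shape)
```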

Dynamic Runtime Feature Map Pruning

Title Dynamic Runtime Feature Map Pruning
Authors Tailin Liang, Lei Wang, Shaobo Shi, John Glossner
Abstract High bandwidth requirements are an obstacle to accelerating the training and inference of deep neural networks. Most previous research focuses on reducing the size of kernel maps for inference. We analyze parameter sparsity of six popular convolutional neural networks - AlexNet, MobileNet, ResNet-50, SqueezeNet, TinyNet, and VGG16. Of the networks considered, those using ReLU (AlexNet, SqueezeNet, VGG16) contain a high percentage of 0-valued parameters and can be statically pruned. Networks with non-ReLU activation functions in some cases may not contain any 0-valued parameters (ResNet-50, TinyNet). We also investigate runtime feature map usage and find that input feature maps comprise the majority of bandwidth requirements when depth-wise and point-wise convolutions are used. We introduce dynamic runtime pruning of feature maps and show that 10% of dynamic feature map execution can be removed without loss of accuracy. We then extend dynamic pruning to allow for values within an epsilon of zero and show a further 5% reduction of feature map loading with a 1% loss in top-1 accuracy.
Tasks
Published 2018-12-24
URL http://arxiv.org/abs/1812.09922v2
PDF http://arxiv.org/pdf/1812.09922v2.pdf
PWC https://paperswithcode.com/paper/dynamic-runtime-feature-map-pruning
Repo https://github.com/liangtailin/darknet-modified
Framework none
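
One way to picture dynamic runtime pruning is to skip whole feature-map channels whose activations lie within an epsilon of zero for the current input. The NumPy sketch below illustrates that interpretation; the per-channel criterion and the epsilon value are assumptions, not the paper's exact rule.

```python
import numpy as np

def prune_feature_maps(fmaps, eps=0.0):
    """Dynamically drop channels whose activations are all within eps of zero,
    so they need not be loaded or processed by downstream layers."""
    # fmaps: (C, H, W) activations of one layer for one input
    keep = np.abs(fmaps).max(axis=(1, 2)) > eps
    return fmaps[keep], keep

rng = np.random.default_rng(0)
fmaps = np.maximum(rng.normal(size=(64, 14, 14)), 0.0)   # ReLU output: many zeros
fmaps[rng.random(64) < 0.15] = 0.0                        # some channels go fully dark

pruned, keep = prune_feature_maps(fmaps, eps=1e-3)
print(f"kept {keep.sum()}/{len(keep)} channels "
      f"({100 * (1 - keep.mean()):.1f}% of feature-map loads skipped)")
```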

Scalable Coordinated Exploration in Concurrent Reinforcement Learning

Title Scalable Coordinated Exploration in Concurrent Reinforcement Learning
Authors Maria Dimakopoulou, Ian Osband, Benjamin Van Roy
Abstract We consider a team of reinforcement learning agents that concurrently operate in a common environment, and we develop an approach to efficient coordinated exploration that is suitable for problems of practical scale. Our approach builds on seed sampling (Dimakopoulou and Van Roy, 2018) and randomized value function learning (Osband et al., 2016). We demonstrate that, for simple tabular contexts, the approach is competitive with previously proposed tabular model learning methods (Dimakopoulou and Van Roy, 2018). With a higher-dimensional problem and a neural network value function representation, the approach learns quickly with far fewer agents than alternative exploration schemes.
Tasks
Published 2018-05-23
URL http://arxiv.org/abs/1805.08948v2
PDF http://arxiv.org/pdf/1805.08948v2.pdf
PWC https://paperswithcode.com/paper/scalable-coordinated-exploration-in
Repo https://github.com/efancher/cs234_work_final_project
Framework none
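
To give a flavour of seed-based coordinated exploration (only the flavour, not the paper's algorithm), the toy below has several concurrent agents that each commit to their own randomly drawn prior, act greedily under it, and pool all observed data. States here behave like independent bandits for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, n_agents = 10, 2, 4

shared_counts = np.zeros((n_states, n_actions))       # data shared by all agents
shared_reward_sum = np.zeros((n_states, n_actions))

# Each agent's "seed": a fixed random prior over action values.
agent_prior = rng.normal(scale=1.0, size=(n_agents, n_states, n_actions))

def greedy_action(agent, s):
    # Empirical mean from shared data, pulled toward the agent's own prior
    # when data is scarce -- so different agents explore different actions.
    mean = shared_reward_sum[s] / np.maximum(shared_counts[s], 1)
    q = (shared_counts[s] * mean + agent_prior[agent, s]) / (shared_counts[s] + 1)
    return int(np.argmax(q))

true_reward = rng.random((n_states, n_actions))
for t in range(500):
    for a in range(n_agents):                         # agents act concurrently
        s = rng.integers(n_states)
        act = greedy_action(a, s)
        r = true_reward[s, act] + rng.normal(scale=0.1)
        shared_counts[s, act] += 1                    # everyone learns from everyone
        shared_reward_sum[s, act] += r

best = (shared_reward_sum / np.maximum(shared_counts, 1)).argmax(1)
print("fraction of states solved:", (best == true_reward.argmax(1)).mean())
```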

Deconvolution-Based Global Decoding for Neural Machine Translation

Title Deconvolution-Based Global Decoding for Neural Machine Translation
Authors Junyang Lin, Xu Sun, Xuancheng Ren, Shuming Ma, Jinsong Su, Qi Su
Abstract A great proportion of sequence-to-sequence (Seq2Seq) models for Neural Machine Translation (NMT) adopt a Recurrent Neural Network (RNN) to generate translations word by word in sequential order. Since linguistic studies show that language is not a linear word sequence but a sequence with complex structure, translation at each step should be conditioned on the whole target-side context. To tackle this problem, we propose a new NMT model that decodes the sequence with the guidance of a structural prediction of the target-side context. Our model generates the translation based on this structural prediction, so that decoding is freed from the constraint of sequential order. Experimental results demonstrate that our model is more competitive than state-of-the-art methods, and the analysis shows that it is robust when translating sentences of different lengths and reduces repetition thanks to the guidance of the target-side context during decoding.
Tasks Machine Translation
Published 2018-06-10
URL http://arxiv.org/abs/1806.03692v1
PDF http://arxiv.org/pdf/1806.03692v1.pdf
PWC https://paperswithcode.com/paper/deconvolution-based-global-decoding-for
Repo https://github.com/lancopku/DeconvDec
Framework pytorch
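
A minimal way to picture the decoding idea is a stack of transposed convolutions that expands a single sentence vector into a predicted target-side context matrix for the decoder to condition on. The PyTorch sketch below assumes two stride-4 deconvolutions and a fixed context length of 16; both are illustrative choices, not the paper's architecture.

```python
import torch
import torch.nn as nn

class GlobalContextDeconv(nn.Module):
    """Sketch of the deconvolution idea: expand one sentence vector into a
    structural prediction of the whole target-side context (a T x d matrix)
    with transposed convolutions, for the decoder to condition on."""
    def __init__(self, d_model=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose1d(d_model, d_model, kernel_size=4, stride=4),  # length 1 -> 4
            nn.ReLU(),
            nn.ConvTranspose1d(d_model, d_model, kernel_size=4, stride=4),  # length 4 -> 16
        )

    def forward(self, sent_vec):            # sent_vec: (batch, d_model)
        x = sent_vec.unsqueeze(-1)          # (batch, d_model, 1)
        return self.net(x).transpose(1, 2)  # (batch, 16, d_model) global context

ctx = GlobalContextDeconv()(torch.randn(2, 256))
print(ctx.shape)   # torch.Size([2, 16, 256])
```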

Efficient Image Retrieval via Decoupling Diffusion into Online and Offline Processing

Title Efficient Image Retrieval via Decoupling Diffusion into Online and Offline Processing
Authors Fan Yang, Ryota Hinami, Yusuke Matsui, Steven Ly, Shin’ichi Satoh
Abstract Diffusion is commonly used as a ranking or re-ranking method in retrieval tasks to achieve higher retrieval performance, and it has attracted much attention in recent years. A downside of diffusion is that it is slow compared to naive k-NN search, which incurs a non-trivial online computational cost on large datasets. To overcome this weakness, we propose a novel diffusion technique in this paper. In our work, instead of applying diffusion to the query, we pre-compute the diffusion results of each element in the database, making the online search a simple linear combination on top of the k-NN search process. Our proposed method is roughly 10 times faster in terms of online search speed. Moreover, we propose late truncation instead of the early truncation used in previous works, achieving better retrieval performance.
Tasks Image Retrieval
Published 2018-11-27
URL http://arxiv.org/abs/1811.10907v2
PDF http://arxiv.org/pdf/1811.10907v2.pdf
PWC https://paperswithcode.com/paper/efficient-image-retrieval-via-decoupling
Repo https://github.com/fyang93/diffusion
Framework pytorch
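
The decoupling described above can be written in a few lines: solve the diffusion system once offline for every database element, then let the online stage do plain k-NN plus a linear combination of the pre-computed rows. The dense NumPy sketch below assumes a small database, a cubed-cosine affinity and alpha = 0.9; real systems use sparse, truncated solvers.

```python
import numpy as np

rng = np.random.default_rng(0)
db = rng.normal(size=(500, 64))
db /= np.linalg.norm(db, axis=1, keepdims=True)

# Symmetrically normalised affinity matrix S of the database graph.
A = np.clip(db @ db.T, 0, None) ** 3
np.fill_diagonal(A, 0)
d = A.sum(1)
S = A / np.sqrt(np.outer(d, d))

# Offline: diffusion result for every database element, i.e. (I - alpha*S)^-1.
alpha = 0.9
F_offline = np.linalg.solve(np.eye(len(db)) - alpha * S, np.eye(len(db)))

def online_search(query, k=10):
    """Online: plain k-NN, then a linear combination of pre-computed rows."""
    query = query / np.linalg.norm(query)
    sims = db @ query
    nn = np.argsort(-sims)[:k]                             # naive k-NN
    scores = (sims[nn][:, None] * F_offline[nn]).sum(0)    # diffused ranking
    return np.argsort(-scores)

print(online_search(rng.normal(size=64))[:5])
```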

Putting a bug in ML: The moth olfactory network learns to read MNIST

Title Putting a bug in ML: The moth olfactory network learns to read MNIST
Authors Charles B. Delahunt, J. Nathan Kutz
Abstract We seek to (i) characterize the learning architectures exploited in biological neural networks for training on very few samples, and (ii) port these algorithmic structures to a machine learning context. The Moth Olfactory Network is among the simplest biological neural systems that can learn, and its architecture includes key structural elements and mechanisms widespread in biological neural nets, such as cascaded networks, competitive inhibition, high intrinsic noise, sparsity, reward mechanisms, and Hebbian plasticity. These structural biological elements, in combination, enable rapid learning. MothNet is a computational model of the Moth Olfactory Network, closely aligned with the moth’s known biophysics and with in vivo electrode data collected from moths learning new odors. We assign this model the task of learning to read the MNIST digits. We show that MothNet successfully learns to read given very few training samples (1 to 10 samples per class). In this few-samples regime, it outperforms standard machine learning methods such as nearest-neighbors, support-vector machines, and neural networks (NNs), and matches specialized one-shot transfer-learning methods but without the need for pre-training. The MothNet architecture illustrates how algorithmic structures derived from biological brains can be used to build alternative NNs that may avoid some of the learning rate limitations of current engineered NNs.
Tasks Transfer Learning
Published 2018-02-15
URL http://arxiv.org/abs/1802.05405v3
PDF http://arxiv.org/pdf/1802.05405v3.pdf
PWC https://paperswithcode.com/paper/putting-a-bug-in-ml-the-moth-olfactory
Repo https://github.com/charlesDelahunt/PuttingABugInML
Framework none
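
As a toy illustration of one mechanism the abstract highlights (reward-gated Hebbian plasticity on a sparse, noisy projection layer), and not of the MothNet model itself, here is a minimal NumPy sketch. The layer sizes, sparsity level and top-k inhibition are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_sparse, n_class = 784, 2000, 10

proj = (rng.random((n_in, n_sparse)) < 0.02).astype(float)   # sparse, fixed, random
readout = np.zeros((n_sparse, n_class))

def sparse_code(x, k=50):
    h = x @ proj + rng.normal(scale=0.1, size=n_sparse)       # high intrinsic noise
    code = np.zeros(n_sparse)
    code[np.argsort(-h)[:k]] = 1.0                            # competitive inhibition: top-k
    return code

def hebbian_update(x, label, reward=1.0, lr=0.05):
    code = sparse_code(x)
    readout[:, label] += lr * reward * code                   # fire together, wire together

# One "training sample": a random input paired with class 3.
hebbian_update(rng.random(n_in), label=3)
print(int((readout[:, 3] > 0).sum()), "synapses potentiated toward class 3")
```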

DynaSLAM: Tracking, Mapping and Inpainting in Dynamic Scenes

Title DynaSLAM: Tracking, Mapping and Inpainting in Dynamic Scenes
Authors Berta Bescos, José M. Fácil, Javier Civera, José Neira
Abstract The assumption of scene rigidity is typical in SLAM algorithms. Such a strong assumption limits the use of most visual SLAM systems in populated real-world environments, which are the target of several relevant applications such as service robotics or autonomous vehicles. In this paper we present DynaSLAM, a visual SLAM system that, building on ORB-SLAM2 [1], adds the capabilities of dynamic object detection and background inpainting. DynaSLAM is robust in dynamic scenarios for monocular, stereo and RGB-D configurations. We detect moving objects either by multi-view geometry, deep learning, or both. Having a static map of the scene allows inpainting the frame background that has been occluded by such dynamic objects. We evaluate our system on public monocular, stereo and RGB-D datasets. We study the impact of several accuracy/speed trade-offs to assess the limits of the proposed methodology. DynaSLAM outperforms standard visual SLAM baselines in accuracy in highly dynamic scenarios, and it also estimates a map of the static parts of the scene, which is a must for long-term applications in real-world environments.
Tasks Autonomous Vehicles, Object Detection
Published 2018-06-14
URL http://arxiv.org/abs/1806.05620v2
PDF http://arxiv.org/pdf/1806.05620v2.pdf
PWC https://paperswithcode.com/paper/dynaslam-tracking-mapping-and-inpainting-in
Repo https://github.com/BertaBescos/DynaSLAM
Framework tf
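
The multi-view-geometry route to detecting moving objects can be sketched as a depth-consistency test: back-project keyframe pixels, reproject them into the current frame, and flag pixels whose measured depth disagrees with the prediction of a static scene. The dense NumPy version below uses an illustrative threshold and a toy scene; it is not the DynaSLAM implementation.

```python
import numpy as np

def dynamic_mask(depth_kf, depth_cur, T_kf_to_cur, K, tau=0.05):
    """Flag pixels whose current depth is inconsistent with the static-scene
    prediction obtained by reprojecting the keyframe depth (dense, unoptimised)."""
    h, w = depth_kf.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project keyframe pixels to 3D camera coordinates.
    x = (u - K[0, 2]) / K[0, 0] * depth_kf
    y = (v - K[1, 2]) / K[1, 1] * depth_kf
    pts = np.stack([x, y, depth_kf, np.ones_like(depth_kf)], -1).reshape(-1, 4)
    # Transform into the current frame and project with the intrinsics.
    pc = (T_kf_to_cur @ pts.T).T
    z = pc[:, 2]
    uc = np.clip((K[0, 0] * pc[:, 0] / z + K[0, 2]).round().astype(int), 0, w - 1)
    vc = np.clip((K[1, 1] * pc[:, 1] / z + K[1, 2]).round().astype(int), 0, h - 1)
    # "Dynamic" where the measured depth is much closer than the predicted one.
    return (depth_cur[vc, uc] < z - tau).reshape(h, w)

K = np.array([[525.0, 0, 319.5], [0, 525.0, 239.5], [0, 0, 1]])
depth = np.full((480, 640), 2.0)
moved = depth.copy(); moved[200:280, 300:380] = 1.0      # an object moved closer
mask = dynamic_mask(depth, moved, np.eye(4), K)
print(mask.mean())   # fraction of pixels flagged as dynamic
```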

Deformable Generator Network: Unsupervised Disentanglement of Appearance and Geometry

Title Deformable Generator Network: Unsupervised Disentanglement of Appearance and Geometry
Authors Xianglei Xing, Ruiqi Gao, Tian Han, Song-Chun Zhu, Ying Nian Wu
Abstract We present a deformable generator model to disentangle the appearance and geometric information of both image and video data in a purely unsupervised manner. The appearance generator network models information related to appearance, including color, illumination, identity or category, while the geometric generator performs geometric warping, such as rotation and stretching, by generating a deformation field that is used to warp the generated appearance into the final image or video sequence. The two generators take independent latent vectors as input to disentangle the appearance and geometric information from image or video sequences. For video data, a nonlinear transition model is introduced for both the appearance and geometric generators to capture the dynamics over time. The proposed scheme is general and can be easily integrated into different generative models. An extensive set of qualitative and quantitative experiments shows that the appearance and geometric information can be well disentangled, and that the learned geometric generator can be conveniently transferred to other image datasets to facilitate knowledge transfer tasks.
Tasks Transfer Learning
Published 2018-06-16
URL https://arxiv.org/abs/1806.06298v3
PDF https://arxiv.org/pdf/1806.06298v3.pdf
PWC https://paperswithcode.com/paper/deformable-generator-network-unsupervised
Repo https://github.com/andyxingxl/Deformable-generator
Framework tf
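
The two-generator structure can be sketched directly in PyTorch: one network emits an appearance image, the other a dense displacement field, and the final image is produced by warping the appearance with grid_sample. The MLP generators, latent sizes and displacement scaling below are stand-ins for the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableGeneratorSketch(nn.Module):
    """Two generators on independent latents: one emits an appearance image,
    the other a deformation field used to warp it (via grid_sample)."""
    def __init__(self, z_dim=16, size=32):
        super().__init__()
        self.size = size
        self.appearance = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                                        nn.Linear(256, 3 * size * size), nn.Tanh())
        self.geometry = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                                      nn.Linear(256, 2 * size * size), nn.Tanh())

    def forward(self, z_app, z_geo):
        b, s = z_app.shape[0], self.size
        img = self.appearance(z_app).view(b, 3, s, s)
        # Deformation field = identity sampling grid + predicted displacements.
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, s),
                                torch.linspace(-1, 1, s), indexing="ij")
        identity = torch.stack([xs, ys], -1).expand(b, s, s, 2)
        disp = 0.1 * self.geometry(z_geo).view(b, s, s, 2)
        return F.grid_sample(img, identity + disp, align_corners=True)

out = DeformableGeneratorSketch()(torch.randn(4, 16), torch.randn(4, 16))
print(out.shape)   # torch.Size([4, 3, 32, 32])
```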

The ApolloScape Open Dataset for Autonomous Driving and its Application

Title The ApolloScape Open Dataset for Autonomous Driving and its Application
Authors Xinyu Huang, Peng Wang, Xinjing Cheng, Dingfu Zhou, Qichuan Geng, Ruigang Yang
Abstract Autonomous driving has attracted tremendous attention, especially in the past few years. The key techniques for a self-driving car include solving tasks such as 3D map construction, self-localization, parsing the driving road and understanding objects, which enable vehicles to reason and act. However, large-scale datasets for training and system evaluation remain a bottleneck for developing robust perception models. In this paper, we present the ApolloScape dataset [1] and its applications for autonomous driving. Compared with existing public datasets from real scenes, e.g. KITTI [2] or Cityscapes [3], ApolloScape contains much larger and richer labelling, including holistic semantic dense point clouds for each site, stereo, per-pixel semantic labelling, lanemark labelling, instance segmentation, 3D car instances, and highly accurate locations for every frame in various driving videos from multiple sites, cities and daytimes. For each task, it contains at least 15x more images than SOTA datasets. To label such a complete dataset, we develop various tools and algorithms tailored to each task to accelerate the labelling process, such as 3D-2D segment labelling tools, active labelling in videos, etc. Building on ApolloScape, we are able to develop algorithms that jointly consider the learning and inference of multiple tasks. In this paper, we provide a sensor fusion scheme integrating camera videos, consumer-grade motion sensors (GPS/IMU), and a 3D semantic map in order to achieve robust self-localization and semantic segmentation for autonomous driving. We show that, in practice, sensor fusion and joint learning of multiple tasks are beneficial for achieving a more robust and accurate system. We expect that our dataset and the proposed algorithms will support and motivate researchers in the further development of multi-sensor fusion and multi-task learning in the field of computer vision.
Tasks Autonomous Driving, Instance Segmentation, Multi-Task Learning, Semantic Segmentation, Sensor Fusion
Published 2018-03-16
URL https://arxiv.org/abs/1803.06184v4
PDF https://arxiv.org/pdf/1803.06184v4.pdf
PWC https://paperswithcode.com/paper/the-apolloscape-open-dataset-for-autonomous
Repo https://github.com/pengwangucla/DeLS-3D
Framework tf

Sentence-State LSTM for Text Representation

Title Sentence-State LSTM for Text Representation
Authors Yue Zhang, Qi Liu, Linfeng Song
Abstract Bi-directional LSTMs are a powerful tool for text representation. On the other hand, they have been shown to suffer from various limitations due to their sequential nature. We investigate an alternative LSTM structure for encoding text, which consists of a parallel state for each word. Recurrent steps are used to perform local and global information exchange between words simultaneously, rather than incrementally reading a sequence of words. Results on various classification and sequence labelling benchmarks show that the proposed model has strong representation power, giving highly competitive performance compared to stacked BiLSTM models with similar parameter counts.
Tasks Named Entity Recognition, Part-Of-Speech Tagging, Sentiment Analysis, Text Classification
Published 2018-05-07
URL http://arxiv.org/abs/1805.02474v1
PDF http://arxiv.org/pdf/1805.02474v1.pdf
PWC https://paperswithcode.com/paper/sentence-state-lstm-for-text-representation
Repo https://github.com/leuchine/S-LSTM
Framework none
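
A simplified version of one recurrent step helps picture the parallel word states: every word state is updated at once from its neighbours and a global sentence state, and the sentence state is updated from all word states. The PyTorch sketch below replaces the paper's LSTM-style gating with a plain tanh update and uses circular neighbours; both are simplifying assumptions.

```python
import torch
import torch.nn as nn

class SLSTMStepSketch(nn.Module):
    """One simplified recurrent step in the spirit of Sentence-State LSTM:
    word states are updated in parallel from their neighbours and a global
    sentence state, and the sentence state is updated from all word states."""
    def __init__(self, d=64):
        super().__init__()
        self.word_update = nn.Linear(4 * d, d)     # [left, self, right, global]
        self.sent_update = nn.Linear(2 * d, d)     # [mean(words), global]

    def forward(self, H, g):                       # H: (B, T, d), g: (B, d)
        left = torch.roll(H, 1, dims=1)            # circular neighbours for brevity
        right = torch.roll(H, -1, dims=1)
        gexp = g.unsqueeze(1).expand_as(H)
        H_new = torch.tanh(self.word_update(torch.cat([left, H, right, gexp], -1)))
        g_new = torch.tanh(self.sent_update(torch.cat([H_new.mean(1), g], -1)))
        return H_new, g_new

step = SLSTMStepSketch()
H, g = torch.randn(2, 7, 64), torch.zeros(2, 64)
for _ in range(3):                                 # a few rounds of information exchange
    H, g = step(H, g)
print(H.shape, g.shape)
```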