Paper Group AWR 315
Point Convolutional Neural Networks by Extension Operators
Title | Point Convolutional Neural Networks by Extension Operators |
Authors | Matan Atzmon, Haggai Maron, Yaron Lipman |
Abstract | This paper presents Point Convolutional Neural Networks (PCNN): a novel framework for applying convolutional neural networks to point clouds. The framework consists of two operators: extension and restriction, mapping point cloud functions to volumetric functions and vice versa. A point cloud convolution is defined by pull-back of the Euclidean volumetric convolution via an extension-restriction mechanism. The point cloud convolution is computationally efficient, invariant to the order of points in the point cloud, robust to different samplings and varying densities, and translation invariant, that is, the same convolution kernel is used at all points. PCNN generalizes image CNNs and allows readily adapting their architectures to the point cloud setting. Evaluations of PCNN on three central point cloud learning benchmarks show that it convincingly outperforms competing point cloud learning methods, as well as the vast majority of methods working with more informative shape representations such as surfaces and/or normals. |
Tasks | Classify 3D Point Clouds |
Published | 2018-03-27 |
URL | http://arxiv.org/abs/1803.10091v1 |
PDF | http://arxiv.org/pdf/1803.10091v1.pdf |
PWC | https://paperswithcode.com/paper/point-convolutional-neural-networks-by |
Repo | https://github.com/matanatz/pcnn |
Framework | tf |
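The extension-restriction mechanism in the abstract can be illustrated with a minimal sketch: a function on the point cloud is extended to a volumetric grid as a sum of RBF kernels, convolved with a volumetric kernel, and restricted back by sampling at the points. This is not the authors' implementation; the Gaussian RBF, grid resolution, and 3x3x3 kernel are illustrative assumptions.

```python
# Minimal sketch of extension -> volumetric convolution -> restriction (PCNN-style).
# NOT the authors' code: kernel choice, grid resolution, and sampling are assumptions.
import numpy as np
from scipy.ndimage import convolve

def extend(points, values, grid_res=32, sigma=0.08):
    """Extend a function on a point cloud (one value per point) to a volume."""
    axes = np.linspace(0.0, 1.0, grid_res)
    gx, gy, gz = np.meshgrid(axes, axes, axes, indexing="ij")
    grid = np.stack([gx, gy, gz], axis=-1)              # (R, R, R, 3)
    vol = np.zeros((grid_res,) * 3)
    for p, v in zip(points, values):
        d2 = np.sum((grid - p) ** 2, axis=-1)
        vol += v * np.exp(-d2 / (2.0 * sigma ** 2))     # RBF contribution of one point
    return vol

def restrict(vol, points):
    """Restrict a volumetric function back to the points (nearest-cell sampling)."""
    grid_res = vol.shape[0]
    idx = np.clip((points * (grid_res - 1)).round().astype(int), 0, grid_res - 1)
    return vol[idx[:, 0], idx[:, 1], idx[:, 2]]

rng = np.random.default_rng(0)
pts = rng.uniform(0.2, 0.8, size=(128, 3))              # toy point cloud in [0,1]^3
feat = rng.normal(size=128)                             # a scalar feature per point
kernel = rng.normal(size=(3, 3, 3))                     # a learned kernel would go here
vol = extend(pts, feat)
out = restrict(convolve(vol, kernel, mode="constant"), pts)
print(out.shape)                                        # (128,) -- one value per point
```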
FWLBP: A Scale Invariant Descriptor for Texture Classification
Title | FWLBP: A Scale Invariant Descriptor for Texture Classification |
Authors | Swalpa Kumar Roy, Nilavra Bhattacharya, Bhabatosh Chanda, Bidyut B. Chaudhuri, Dipak Kumar Ghosh |
Abstract | In this paper we propose a novel texture descriptor called Fractal Weighted Local Binary Pattern (FWLBP). The fractal dimension (FD) measure is relatively invariant to scale changes and correlates well with the human perception of surface roughness. We utilize this property to construct a scale-invariant descriptor. The input image is sampled using an augmented form of the local binary pattern (LBP) over three different radii, and an indexing operation then assigns FD weights to the collected samples. The final histogram of the descriptor has its features calculated using LBP and its weights computed from the FD image. The proposed descriptor is scale invariant, is also robust to rotation and reflection, and is partially tolerant to noise and illumination changes. In addition, the local fractal dimension is relatively insensitive to bi-Lipschitz transformations, while its extension is sufficient to precisely discriminate fundamental texture primitives. Experimental results on standard texture databases show that the proposed descriptor achieves better classification rates than state-of-the-art descriptors. |
Tasks | Texture Classification |
Published | 2018-01-10 |
URL | http://arxiv.org/abs/1801.03228v2 |
PDF | http://arxiv.org/pdf/1801.03228v2.pdf |
PWC | https://paperswithcode.com/paper/fwlbp-a-scale-invariant-descriptor-for |
Repo | https://github.com/swalpa/FWLBP |
Framework | none |
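The overall shape of the descriptor (LBP codes at three radii, histogrammed with per-pixel fractal-dimension weights) can be sketched as below. This is only a rough illustration: the FD map here is a uniform placeholder, and the paper's indexing scheme and FD estimator are not reproduced.

```python
# Rough sketch of an FWLBP-like feature: LBP over three radii, FD-weighted histograms.
# The fd_map below is a placeholder -- the paper computes it from the image.
import numpy as np
from skimage.feature import local_binary_pattern

def fwlbp_like_histogram(image, radii=(1, 2, 3), n_points=8, n_bins=256):
    fd_map = np.ones_like(image, dtype=float)        # placeholder for the FD image
    feats = []
    for r in radii:
        codes = local_binary_pattern(image, n_points, r, method="default")
        hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins),
                               weights=fd_map)        # FD-weighted histogram
        feats.append(hist / (hist.sum() + 1e-12))
    return np.concatenate(feats)

img = (np.random.rand(64, 64) * 255).astype(np.uint8)
print(fwlbp_like_histogram(img).shape)                # (768,)
```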
Masked Conditional Neural Networks for Automatic Sound Events Recognition
Title | Masked Conditional Neural Networks for Automatic Sound Events Recognition |
Authors | Fady Medhat, David Chesmore, John Robinson |
Abstract | Deep neural network architectures designed for application domains other than sound, especially image recognition, may not optimally harness the time-frequency representation when adapted to the sound recognition problem. In this work, we explore the ConditionaL Neural Network (CLNN) and the Masked ConditionaL Neural Network (MCLNN) for multi-dimensional temporal signal recognition. The CLNN considers the inter-frame relationship, while the MCLNN enforces a systematic sparseness over the network’s links that enables learning in frequency bands rather than individual bins, making the network frequency-shift invariant, much like a filterbank. The mask also allows several combinations of features to be considered concurrently, a step that is usually handcrafted through exhaustive manual search. We applied the MCLNN to the environmental sound recognition problem using the ESC-10 and ESC-50 datasets. The MCLNN achieved performance competitive with state-of-the-art Convolutional Neural Networks while using 12% of the parameters and no data augmentation. |
Tasks | |
Published | 2018-02-15 |
URL | http://arxiv.org/abs/1802.05792v2 |
PDF | http://arxiv.org/pdf/1802.05792v2.pdf |
PWC | https://paperswithcode.com/paper/masked-conditional-neural-networks-for |
Repo | https://github.com/fadymedhat/MCLNN |
Framework | tf |
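The kind of band mask the abstract describes can be built with a few lines: each hidden unit is allowed to connect only to a contiguous band of input (frequency) bins, with the band shifting across units. The bandwidth and stride below are illustrative, not the paper's settings.

```python
# Sketch of a band mask enforcing systematic sparsity over a weight matrix,
# in the spirit of the MCLNN description above (bandwidth/stride are assumptions).
import numpy as np

def band_mask(n_in, n_hidden, bandwidth=5, stride=2):
    mask = np.zeros((n_in, n_hidden))
    for j in range(n_hidden):
        start = (j * stride) % n_in
        rows = (start + np.arange(bandwidth)) % n_in    # wrap around the spectrum
        mask[rows, j] = 1.0
    return mask

m = band_mask(n_in=40, n_hidden=32)
# Element-wise multiply the mask with the weight matrix to enforce the sparsity:
w = np.random.randn(40, 32) * m
print(int(m.sum()), "active links out of", m.size)
```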
On Kernel Method-Based Connectionist Models and Supervised Deep Learning Without Backpropagation
Title | On Kernel Method-Based Connectionist Models and Supervised Deep Learning Without Backpropagation |
Authors | Shiyu Duan, Shujian Yu, Yunmei Chen, Jose Principe |
Abstract | We propose a novel family of connectionist models based on kernel machines and consider the problem of learning layer-by-layer a compositional hypothesis class, i.e., a feedforward, multilayer architecture, in a supervised setting. In terms of the models, we present a principled method to “kernelize” (partly or completely) any neural network (NN). With this method, we obtain a counterpart of any given NN that is powered by kernel machines instead of neurons. In terms of learning, when learning a feedforward deep architecture in a supervised setting, one needs to train all the components simultaneously using backpropagation (BP) since there are no explicit targets for the hidden layers (Rumelhart86). We consider without loss of generality the two-layer case and present a general framework that explicitly characterizes a target for the hidden layer that is optimal for minimizing the objective function of the network. This characterization then makes possible a purely greedy training scheme that learns one layer at a time, starting from the input layer. We provide realizations of the abstract framework under certain architectures and objective functions. Based on these realizations, we present a layer-wise training algorithm for an l-layer feedforward network for classification, where l>=2 can be arbitrary. This algorithm can be given an intuitive geometric interpretation that makes the learning dynamics transparent. Empirical results are provided to complement our theory. We show that the kernelized networks, trained layer-wise, compare favorably with classical kernel machines as well as other connectionist models trained by BP. We also visualize the inner workings of the greedy kernelized models to validate our claim on the transparency of the layer-wise algorithm. |
Tasks | |
Published | 2018-02-11 |
URL | https://arxiv.org/abs/1802.03774v4 |
PDF | https://arxiv.org/pdf/1802.03774v4.pdf |
PWC | https://paperswithcode.com/paper/learning-backpropagation-free-deep |
Repo | https://github.com/michaelshiyu/kerNET |
Framework | pytorch |
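The "kernelize a neuron" idea from the abstract can be illustrated in isolation: replace the inner-product unit w·x + b with a kernel expansion f(x) = Σ_i α_i k(x, c_i) over a set of centers. The Gaussian kernel, random centers, and ridge fit below are illustrative assumptions; the paper's layer-wise targets and training objectives are more elaborate.

```python
# Minimal sketch of a single kernelized unit (not the authors' kerNET code).
import numpy as np

def gaussian_kernel(X, C, gamma=1.0):
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

class KernelizedUnit:
    def __init__(self, centers, gamma=1.0):
        self.centers = centers
        self.gamma = gamma
        self.alpha = np.zeros(len(centers))

    def __call__(self, X):
        return gaussian_kernel(X, self.centers, self.gamma) @ self.alpha

    def fit_least_squares(self, X, y, reg=1e-3):
        # Ridge solution for alpha; stands in for whatever objective the layer uses.
        K = gaussian_kernel(X, self.centers, self.gamma)
        self.alpha = np.linalg.solve(K.T @ K + reg * np.eye(K.shape[1]), K.T @ y)

X = np.random.randn(200, 10)
y = np.sin(X[:, 0])
unit = KernelizedUnit(centers=X[:32])
unit.fit_least_squares(X, y)
print(unit(X).shape)   # (200,)
```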
ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech
Title | ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech |
Authors | Wei Ping, Kainan Peng, Jitong Chen |
Abstract | In this work, we propose a new solution for parallel wave generation by WaveNet. In contrast to parallel WaveNet (van den Oord et al., 2018), we distill a Gaussian inverse autoregressive flow from the autoregressive WaveNet by minimizing a regularized KL divergence between their highly-peaked output distributions. Our method computes the KL divergence in closed-form, which simplifies the training algorithm and provides very efficient distillation. In addition, we introduce the first text-to-wave neural architecture for speech synthesis, which is fully convolutional and enables fast end-to-end training from scratch. It significantly outperforms the previous pipeline that connects a text-to-spectrogram model to a separately trained WaveNet (Ping et al., 2018). We also successfully distill a parallel waveform synthesizer conditioned on the hidden representation in this end-to-end model. |
Tasks | Speech Synthesis |
Published | 2018-07-19 |
URL | http://arxiv.org/abs/1807.07281v3 |
PDF | http://arxiv.org/pdf/1807.07281v3.pdf |
PWC | https://paperswithcode.com/paper/clarinet-parallel-wave-generation-in-end-to |
Repo | https://github.com/rickyHong/ClariNet-WaveNet-repl |
Framework | pytorch |
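The closed-form KL divergence that makes the Gaussian distillation tractable is the standard expression between two univariate Gaussians, KL(N(μ_q, σ_q²) ‖ N(μ_p, σ_p²)) = log(σ_p/σ_q) + (σ_q² + (μ_q − μ_p)²)/(2σ_p²) − 1/2. The sketch below computes only this term; the regularization used in the paper is not reproduced.

```python
# Closed-form KL divergence between two Gaussians, as used for distillation loss terms.
import numpy as np

def gaussian_kl(mu_q, log_sigma_q, mu_p, log_sigma_p):
    var_q = np.exp(2.0 * log_sigma_q)
    var_p = np.exp(2.0 * log_sigma_p)
    return (log_sigma_p - log_sigma_q
            + (var_q + (mu_q - mu_p) ** 2) / (2.0 * var_p)
            - 0.5)

print(gaussian_kl(0.0, 0.0, 0.0, 0.0))              # identical distributions -> 0.0
print(gaussian_kl(0.0, 0.0, 0.5, np.log(1.2)))      # positive for mismatched ones
```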
Reinforcement Learning of Active Vision for Manipulating Objects under Occlusions
Title | Reinforcement Learning of Active Vision for Manipulating Objects under Occlusions |
Authors | Ricson Cheng, Arpit Agarwal, Katerina Fragkiadaki |
Abstract | We consider artificial agents that learn to jointly control their gripper and camera in order to learn manipulation policies via reinforcement learning in the presence of occlusions from distractor objects. Distractors often occlude the object of interest and cause it to disappear from the field of view. We propose hand/eye controllers that learn to move the camera to keep the object within the field of view and visible, in coordination with manipulating it to achieve the desired goal, e.g., pushing it to a target location. We incorporate structural biases of object-centric attention within our actor-critic architectures, which our experiments suggest to be a key for good performance. Our results further highlight the importance of curriculum with regards to environment difficulty. The resulting active vision/manipulation policies outperform static camera setups for a variety of cluttered environments. |
Tasks | |
Published | 2018-11-20 |
URL | http://arxiv.org/abs/1811.08067v2 |
PDF | http://arxiv.org/pdf/1811.08067v2.pdf |
PWC | https://paperswithcode.com/paper/reinforcement-learning-of-active-vision-for |
Repo | https://github.com/ricsonc/ActiveVisionManipulation |
Framework | none |
Dynamic Runtime Feature Map Pruning
Title | Dynamic Runtime Feature Map Pruning |
Authors | Tailin Liang, Lei Wang, Shaobo Shi, John Glossner |
Abstract | High bandwidth requirements are an obstacle for accelerating the training and inference of deep neural networks. Most previous research focuses on reducing the size of kernel maps for inference. We analyze parameter sparsity of six popular convolutional neural networks - AlexNet, MobileNet, ResNet-50, SqueezeNet, TinyNet, and VGG16. Of the networks considered, those using ReLU (AlexNet, SqueezeNet, VGG16) contain a high percentage of 0-valued parameters and can be statically pruned. Networks with non-ReLU activation functions in some cases may not contain any 0-valued parameters (ResNet-50, TinyNet). We also investigate runtime feature map usage and find that input feature maps comprise the majority of bandwidth requirements when depth-wise and point-wise convolutions are used. We introduce dynamic runtime pruning of feature maps and show that 10% of dynamic feature map execution can be removed without loss of accuracy. We then extend dynamic pruning to allow for values within an epsilon of zero and show a further 5% reduction in feature map loading with a 1% loss in top-1 accuracy. |
Tasks | |
Published | 2018-12-24 |
URL | http://arxiv.org/abs/1812.09922v2 |
PDF | http://arxiv.org/pdf/1812.09922v2.pdf |
PWC | https://paperswithcode.com/paper/dynamic-runtime-feature-map-pruning |
Repo | https://github.com/liangtailin/darknet-modified |
Framework | none |
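The epsilon-pruning idea from the abstract is simple to sketch: runtime feature-map values within epsilon of zero are treated as zero, so they need not be loaded or computed downstream. The threshold value below is illustrative only.

```python
# Sketch of epsilon-based dynamic feature-map pruning (threshold is illustrative).
import numpy as np

def prune_feature_map(fmap, eps=0.05):
    pruned = np.where(np.abs(fmap) <= eps, 0.0, fmap)
    saved = 1.0 - np.count_nonzero(pruned) / pruned.size
    return pruned, saved

fmap = np.random.randn(64, 56, 56) * 0.5        # toy post-activation feature map
fmap = np.maximum(fmap, 0.0)                    # ReLU already yields many exact zeros
_, frac_skipped = prune_feature_map(fmap, eps=0.05)
print(f"fraction of values that can be skipped: {frac_skipped:.2%}")
```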
Scalable Coordinated Exploration in Concurrent Reinforcement Learning
Title | Scalable Coordinated Exploration in Concurrent Reinforcement Learning |
Authors | Maria Dimakopoulou, Ian Osband, Benjamin Van Roy |
Abstract | We consider a team of reinforcement learning agents that concurrently operate in a common environment, and we develop an approach to efficient coordinated exploration that is suitable for problems of practical scale. Our approach builds on seed sampling (Dimakopoulou and Van Roy, 2018) and randomized value function learning (Osband et al., 2016). We demonstrate that, for simple tabular contexts, the approach is competitive with previously proposed tabular model learning methods (Dimakopoulou and Van Roy, 2018). With a higher-dimensional problem and a neural network value function representation, the approach learns quickly with far fewer agents than alternative exploration schemes. |
Tasks | |
Published | 2018-05-23 |
URL | http://arxiv.org/abs/1805.08948v2 |
PDF | http://arxiv.org/pdf/1805.08948v2.pdf |
PWC | https://paperswithcode.com/paper/scalable-coordinated-exploration-in |
Repo | https://github.com/efancher/cs234_work_final_project |
Framework | none |
Deconvolution-Based Global Decoding for Neural Machine Translation
Title | Deconvolution-Based Global Decoding for Neural Machine Translation |
Authors | Junyang Lin, Xu Sun, Xuancheng Ren, Shuming Ma, Jinsong Su, Qi Su |
Abstract | A great proportion of sequence-to-sequence (Seq2Seq) models for Neural Machine Translation (NMT) adopt a Recurrent Neural Network (RNN) to generate the translation word by word in sequential order. Since linguistic studies have shown that language is not a linear sequence of words but a sequence with complex structure, translation at each step should be conditioned on the whole target-side context. To tackle this problem, we propose a new NMT model that decodes the sequence with the guidance of a structural prediction of the target-side context. Our model generates translation based on this structural prediction, so the translation is freed from the constraint of sequential order. Experimental results demonstrate that our model is competitive with state-of-the-art methods, and further analysis shows that it is robust when translating sentences of different lengths and that the target-side context guidance also reduces repetition during decoding. |
Tasks | Machine Translation |
Published | 2018-06-10 |
URL | http://arxiv.org/abs/1806.03692v1 |
PDF | http://arxiv.org/pdf/1806.03692v1.pdf |
PWC | https://paperswithcode.com/paper/deconvolution-based-global-decoding-for |
Repo | https://github.com/lancopku/DeconvDec |
Framework | pytorch |
Efficient Image Retrieval via Decoupling Diffusion into Online and Offline Processing
Title | Efficient Image Retrieval via Decoupling Diffusion into Online and Offline Processing |
Authors | Fan Yang, Ryota Hinami, Yusuke Matsui, Steven Ly, Shin’ichi Satoh |
Abstract | Diffusion is commonly used as a ranking or re-ranking method in retrieval tasks to achieve higher retrieval performance, and has attracted a lot of attention in recent years. A downside to diffusion is that it is slow compared to the naive k-NN search, which incurs a non-trivial online computational cost on large datasets. To overcome this weakness, we propose a novel diffusion technique in this paper. In our work, instead of applying diffusion to the query, we pre-compute the diffusion results of each element in the database, making the online search a simple linear combination on top of the k-NN search process. Our proposed method is roughly 10 times faster in terms of online search speed. Moreover, we propose late truncation instead of the early truncation used in previous works to achieve better retrieval performance. |
Tasks | Image Retrieval |
Published | 2018-11-27 |
URL | http://arxiv.org/abs/1811.10907v2 |
PDF | http://arxiv.org/pdf/1811.10907v2.pdf |
PWC | https://paperswithcode.com/paper/efficient-image-retrieval-via-decoupling |
Repo | https://github.com/fyang93/diffusion |
Framework | pytorch |
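The decoupling described above can be sketched as follows: diffusion results are pre-computed offline for every database element, and at query time the ranking is just an affinity-weighted combination of those pre-computed rows on top of a plain k-NN search. This is not the authors' code; the affinity construction and the stand-in diffusion solver are simplifications.

```python
# Sketch of offline/online decoupled diffusion (affinity and solver are simplified).
import numpy as np

rng = np.random.default_rng(0)
db = rng.normal(size=(1000, 64))
db /= np.linalg.norm(db, axis=1, keepdims=True)

# ---- offline: a per-element diffusion result over the whole database -------
S = np.maximum(db @ db.T, 0.0) ** 3                       # simple truncated affinity
S /= S.sum(axis=1, keepdims=True)
offline_diffusion = np.linalg.matrix_power(S, 3)          # stand-in for the real solver

# ---- online: k-NN + linear combination of pre-computed rows ----------------
def search(query, k=10):
    q = query / np.linalg.norm(query)
    sims = db @ q
    nn = np.argsort(-sims)[:k]                            # plain k-NN step
    weights = np.maximum(sims[nn], 0.0) ** 3
    scores = weights @ offline_diffusion[nn]              # cheap linear combination
    return np.argsort(-scores)[:k]

print(search(rng.normal(size=64)))
```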
Putting a bug in ML: The moth olfactory network learns to read MNIST
Title | Putting a bug in ML: The moth olfactory network learns to read MNIST |
Authors | Charles B. Delahunt, J. Nathan Kutz |
Abstract | We seek to (i) characterize the learning architectures exploited in biological neural networks for training on very few samples, and (ii) port these algorithmic structures to a machine learning context. The Moth Olfactory Network is among the simplest biological neural systems that can learn, and its architecture includes key structural elements and mechanisms widespread in biological neural nets, such as cascaded networks, competitive inhibition, high intrinsic noise, sparsity, reward mechanisms, and Hebbian plasticity. These structural biological elements, in combination, enable rapid learning. MothNet is a computational model of the Moth Olfactory Network, closely aligned with the moth’s known biophysics and with in vivo electrode data collected from moths learning new odors. We assign this model the task of learning to read the MNIST digits. We show that MothNet successfully learns to read given very few training samples (1 to 10 samples per class). In this few-samples regime, it outperforms standard machine learning methods such as nearest-neighbors, support-vector machines, and neural networks (NNs), and matches specialized one-shot transfer-learning methods but without the need for pre-training. The MothNet architecture illustrates how algorithmic structures derived from biological brains can be used to build alternative NNs that may avoid some of the learning rate limitations of current engineered NNs. |
Tasks | Transfer Learning |
Published | 2018-02-15 |
URL | http://arxiv.org/abs/1802.05405v3 |
PDF | http://arxiv.org/pdf/1802.05405v3.pdf |
PWC | https://paperswithcode.com/paper/putting-a-bug-in-ml-the-moth-olfactory |
Repo | https://github.com/charlesDelahunt/PuttingABugInML |
Framework | none |
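Of the biological mechanisms listed in the abstract, Hebbian plasticity is the easiest to write down: weights grow in proportion to correlated pre- and post-synaptic activity, with a decay term for stability. MothNet's actual update rules, noise model, and reward mechanism are more involved; this sketch shows only the core rule.

```python
# Basic Hebbian weight update with decay (illustrative, not MothNet's update rule).
import numpy as np

def hebbian_update(W, pre, post, lr=0.01, decay=0.001):
    # pre: (n_pre,), post: (n_post,), W: (n_post, n_pre)
    return W + lr * np.outer(post, pre) - decay * W

W = np.zeros((5, 10))
for _ in range(100):
    pre = np.random.rand(10)    # pre-synaptic activity
    post = np.random.rand(5)    # post-synaptic activity
    W = hebbian_update(W, pre, post)
print(W.mean())
```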
DynaSLAM: Tracking, Mapping and Inpainting in Dynamic Scenes
Title | DynaSLAM: Tracking, Mapping and Inpainting in Dynamic Scenes |
Authors | Berta Bescos, José M. Fácil, Javier Civera, José Neira |
Abstract | The assumption of scene rigidity is typical in SLAM algorithms. Such a strong assumption limits the use of most visual SLAM systems in populated real-world environments, which are the target of several relevant applications like service robotics or autonomous vehicles. In this paper we present DynaSLAM, a visual SLAM system that, building on ORB-SLAM2 [1], adds the capabilities of dynamic object detection and background inpainting. DynaSLAM is robust in dynamic scenarios for monocular, stereo and RGB-D configurations. Moving objects are detected by multi-view geometry, deep learning, or both. Having a static map of the scene allows inpainting the frame background that has been occluded by such dynamic objects. We evaluate our system on public monocular, stereo and RGB-D datasets and study several accuracy/speed trade-offs to assess the limits of the proposed methodology. DynaSLAM outperforms standard visual SLAM baselines in accuracy in highly dynamic scenarios, and it also estimates a map of the static parts of the scene, which is a must for long-term applications in real-world environments. |
Tasks | Autonomous Vehicles, Object Detection |
Published | 2018-06-14 |
URL | http://arxiv.org/abs/1806.05620v2 |
PDF | http://arxiv.org/pdf/1806.05620v2.pdf |
PWC | https://paperswithcode.com/paper/dynaslam-tracking-mapping-and-inpainting-in |
Repo | https://github.com/BertaBescos/DynaSLAM |
Framework | tf |
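A simplified version of the multi-view-geometry cue mentioned in the abstract: back-project a pixel from a keyframe, transform it into the current frame, and compare its predicted depth with the depth the current RGB-D frame actually measures; a large disagreement flags a likely dynamic point. This is not the DynaSLAM implementation; the intrinsics and threshold are illustrative, and the paper combines this cue with learned segmentation.

```python
# Simplified depth-consistency check for flagging dynamic points (illustrative only).
import numpy as np

K = np.array([[525.0, 0.0, 319.5],
              [0.0, 525.0, 239.5],
              [0.0, 0.0, 1.0]])                              # illustrative pinhole intrinsics

def is_dynamic(uv_kf, depth_kf, T_cur_kf, depth_cur, thresh=0.07):
    """uv_kf: pixel in keyframe; depth_kf: its depth (m); T_cur_kf: 4x4 relative pose;
    depth_cur: depth image of the current frame (m)."""
    p_kf = depth_kf * np.linalg.inv(K) @ np.array([uv_kf[0], uv_kf[1], 1.0])
    p_cur = T_cur_kf[:3, :3] @ p_kf + T_cur_kf[:3, 3]        # point in current frame
    uv_cur = K @ (p_cur / p_cur[2])
    u, v = int(round(uv_cur[0])), int(round(uv_cur[1]))
    if not (0 <= v < depth_cur.shape[0] and 0 <= u < depth_cur.shape[1]):
        return False                                         # not visible: cannot decide
    return abs(depth_cur[v, u] - p_cur[2]) > thresh          # depth disagreement

depth_cur = np.full((480, 640), 2.0)
T = np.eye(4); T[0, 3] = 0.05                                # small camera translation
print(is_dynamic((320, 240), 2.0, T, depth_cur))             # False: depths agree
```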
Deformable Generator Network: Unsupervised Disentanglement of Appearance and Geometry
Title | Deformable Generator Network: Unsupervised Disentanglement of Appearance and Geometry |
Authors | Xianglei Xing, Ruiqi Gao, Tian Han, Song-Chun Zhu, Ying Nian Wu |
Abstract | We present a deformable generator model to disentangle the appearance and geometric information of both image and video data in a purely unsupervised manner. The appearance generator network models information related to appearance, including color, illumination, identity or category, while the geometric generator performs geometric warping, such as rotation and stretching, by generating a deformation field that warps the generated appearance to obtain the final image or video sequence. The two generators take independent latent vectors as input to disentangle the appearance and geometric information from image or video sequences. For video data, a nonlinear transition model is introduced to both the appearance and geometric generators to capture the dynamics over time. The proposed scheme is general and can be easily integrated into different generative models. An extensive set of qualitative and quantitative experiments shows that the appearance and geometric information can be well disentangled, and that the learned geometric generator can be conveniently transferred to other image datasets to facilitate knowledge transfer tasks. |
Tasks | Transfer Learning |
Published | 2018-06-16 |
URL | https://arxiv.org/abs/1806.06298v3 |
PDF | https://arxiv.org/pdf/1806.06298v3.pdf |
PWC | https://paperswithcode.com/paper/deformable-generator-network-unsupervised |
Repo | https://github.com/andyxingxl/Deformable-generator |
Framework | tf |
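The "warp the generated appearance by a generated deformation field" step can be sketched with bilinear sampling: an identity sampling grid plus a per-pixel displacement. The generators themselves are omitted, and this is not the authors' code; it only shows the warping operation the abstract refers to.

```python
# Sketch of warping an appearance image with a deformation field (illustrative).
import torch
import torch.nn.functional as F

def warp(appearance, displacement):
    """appearance: (N, C, H, W); displacement: (N, H, W, 2) in normalized coordinates."""
    n, _, h, w = appearance.shape
    ys = torch.linspace(-1.0, 1.0, h)
    xs = torch.linspace(-1.0, 1.0, w)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    identity = torch.stack([gx, gy], dim=-1).expand(n, h, w, 2)   # identity grid
    return F.grid_sample(appearance, identity + displacement, align_corners=True)

app = torch.rand(1, 3, 64, 64)                    # stand-in for an appearance generator output
disp = 0.05 * torch.randn(1, 64, 64, 2)           # stand-in for a geometric generator output
print(warp(app, disp).shape)                      # torch.Size([1, 3, 64, 64])
```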
The ApolloScape Open Dataset for Autonomous Driving and its Application
Title | The ApolloScape Open Dataset for Autonomous Driving and its Application |
Authors | Xinyu Huang, Peng Wang, Xinjing Cheng, Dingfu Zhou, Qichuan Geng, Ruigang Yang |
Abstract | Autonomous driving has attracted tremendous attention, especially in the past few years. The key techniques for a self-driving car include solving tasks like 3D map construction, self-localization, parsing the driving road and understanding objects, which enable vehicles to reason and act. However, large-scale datasets for training and system evaluation are still a bottleneck for developing robust perception models. In this paper, we present the ApolloScape dataset [1] and its applications for autonomous driving. Compared with existing public datasets from real scenes, e.g. KITTI [2] or Cityscapes [3], ApolloScape contains much larger and richer labelling, including holistic semantic dense point clouds for each site, stereo imagery, per-pixel semantic labelling, lane-mark labelling, instance segmentation, 3D car instances, and highly accurate locations for every frame in driving videos from multiple sites, cities and times of day. For each task, it contains at least 15x more images than state-of-the-art datasets. To label such a complete dataset, we develop various tools and algorithms specialized for each task to accelerate the labelling process, such as 3D-2D segment labelling tools and active labelling in videos. Building on ApolloScape, we are able to develop algorithms that jointly consider the learning and inference of multiple tasks. In this paper, we provide a sensor fusion scheme integrating camera videos, consumer-grade motion sensors (GPS/IMU), and a 3D semantic map in order to achieve robust self-localization and semantic segmentation for autonomous driving. We show that, in practice, sensor fusion and joint learning of multiple tasks are beneficial for achieving a more robust and accurate system. We expect our dataset and the proposed algorithms to support and motivate researchers toward further development of multi-sensor fusion and multi-task learning in the field of computer vision. |
Tasks | Autonomous Driving, Instance Segmentation, Multi-Task Learning, Semantic Segmentation, Sensor Fusion |
Published | 2018-03-16 |
URL | https://arxiv.org/abs/1803.06184v4 |
PDF | https://arxiv.org/pdf/1803.06184v4.pdf |
PWC | https://paperswithcode.com/paper/the-apolloscape-open-dataset-for-autonomous |
Repo | https://github.com/pengwangucla/DeLS-3D |
Framework | tf |
Sentence-State LSTM for Text Representation
Title | Sentence-State LSTM for Text Representation |
Authors | Yue Zhang, Qi Liu, Linfeng Song |
Abstract | Bi-directional LSTMs are a powerful tool for text representation. On the other hand, they have been shown to suffer various limitations due to their sequential nature. We investigate an alternative LSTM structure for encoding text, which consists of a parallel state for each word. Recurrent steps are used to perform local and global information exchange between words simultaneously, rather than incremental reading of a sequence of words. Results on various classification and sequence labelling benchmarks show that the proposed model has strong representation power, giving highly competitive performances compared to stacked BiLSTM models with similar parameter numbers. |
Tasks | Named Entity Recognition, Part-Of-Speech Tagging, Sentiment Analysis, Text Classification |
Published | 2018-05-07 |
URL | http://arxiv.org/abs/1805.02474v1 |
PDF | http://arxiv.org/pdf/1805.02474v1.pdf |
PWC | https://paperswithcode.com/paper/sentence-state-lstm-for-text-representation |
Repo | https://github.com/leuchine/S-LSTM |
Framework | none |
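A heavily simplified, non-gated caricature of one S-LSTM recurrent step helps make the idea concrete: every word state is updated in parallel from its neighbours and a global sentence state, and the sentence state is updated from all word states. The paper uses LSTM-style gating throughout; the plain tanh updates and weight shapes below are assumptions for illustration.

```python
# Simplified (non-gated) sketch of one sentence-state exchange step.
import numpy as np

def exchange_step(H, g, Ww, Wg, pad=None):
    """H: (n_words, d) word states; g: (d,) global sentence state."""
    n, d = H.shape
    pad = np.zeros(d) if pad is None else pad
    left = np.vstack([pad, H[:-1]])                 # state of the previous word
    right = np.vstack([H[1:], pad])                 # state of the next word
    ctx = np.concatenate([left, H, right, np.tile(g, (n, 1))], axis=1)
    H_new = np.tanh(ctx @ Ww)                       # parallel update of all word states
    g_new = np.tanh(np.concatenate([g, H_new.mean(0)]) @ Wg)
    return H_new, g_new

d, n = 16, 7
H, g = np.random.randn(n, d), np.zeros(d)
Ww, Wg = np.random.randn(4 * d, d) * 0.1, np.random.randn(2 * d, d) * 0.1
for _ in range(3):                                  # a few recurrent exchange steps
    H, g = exchange_step(H, g, Ww, Wg)
print(H.shape, g.shape)                             # (7, 16) (16,)
```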