October 20, 2019

3225 words 16 mins read

Paper Group AWR 331

PointCNN: Convolution On $\mathcal{X}$-Transformed Points

Title PointCNN: Convolution On $\mathcal{X}$-Transformed Points
Authors Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Xinhan Di, Baoquan Chen
Abstract We present a simple and general framework for feature learning from point clouds. The key to the success of CNNs is the convolution operator, which is capable of leveraging spatially-local correlation in data represented densely in grids (e.g. images). However, point clouds are irregular and unordered, so directly convolving kernels against features associated with the points will result in the loss of shape information and variance to point ordering. To address these problems, we propose to learn an $\mathcal{X}$-transformation from the input points, to simultaneously promote two causes. The first is the weighting of the input features associated with the points, and the second is the permutation of the points into a latent and potentially canonical order. The element-wise product and sum operations of the typical convolution operator are subsequently applied on the $\mathcal{X}$-transformed features. The proposed method is a generalization of typical CNNs to feature learning from point clouds, so we call it PointCNN. Experiments show that PointCNN achieves performance on par with or better than state-of-the-art methods on multiple challenging benchmark datasets and tasks.
Tasks 3D Instance Segmentation, 3D Part Segmentation
Published 2018-01-23
URL http://arxiv.org/abs/1801.07791v5
PDF http://arxiv.org/pdf/1801.07791v5.pdf
PWC https://paperswithcode.com/paper/pointcnn-convolution-on-mathcalx-transformed
Repo https://github.com/hbb1/reading-list
Framework none
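
A minimal sketch of the $\mathcal{X}$-Conv idea, not the authors' implementation: an MLP predicts a $K \times K$ transform from the $K$ neighbor coordinates, that transform weights and permutes the neighbor features, and a standard dense "convolution" is applied afterwards. All layer sizes and shapes here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class XConv(nn.Module):
    def __init__(self, c_in, c_out, k):
        super().__init__()
        self.k = k
        # MLP_delta lifts relative coordinates to additional point features
        self.mlp_delta = nn.Sequential(nn.Linear(3, c_in), nn.ReLU())
        # MLP that predicts the K x K X-transform from the K neighbor coords
        self.mlp_x = nn.Sequential(
            nn.Linear(3 * k, k * k), nn.ReLU(), nn.Linear(k * k, k * k))
        # "Convolution": a dense layer over the transformed K x (2*c_in) patch
        self.conv = nn.Linear(k * 2 * c_in, c_out)

    def forward(self, rel_coords, feats):
        # rel_coords: (B, K, 3) neighbor coords relative to the query point
        # feats:      (B, K, C_in) features attached to the neighbors
        b = rel_coords.size(0)
        lifted = self.mlp_delta(rel_coords)            # (B, K, C_in)
        f = torch.cat([lifted, feats], dim=-1)         # (B, K, 2*C_in)
        x = self.mlp_x(rel_coords.reshape(b, -1))      # (B, K*K)
        x = x.view(b, self.k, self.k)                  # (B, K, K)
        fx = torch.bmm(x, f)                           # weight/permute features
        return self.conv(fx.reshape(b, -1))            # (B, C_out)

out = XConv(c_in=32, c_out=64, k=8)(torch.randn(4, 8, 3), torch.randn(4, 8, 32))
```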

Learning Neural Templates for Text Generation

Title Learning Neural Templates for Text Generation
Authors Sam Wiseman, Stuart M. Shieber, Alexander M. Rush
Abstract While neural, encoder-decoder models have had significant empirical success in text generation, there remain several unaddressed problems with this style of generation. Encoder-decoder models are largely (a) uninterpretable, and (b) difficult to control in terms of their phrasing or content. This work proposes a neural generation system using a hidden semi-markov model (HSMM) decoder, which learns latent, discrete templates jointly with learning to generate. We show that this model learns useful templates, and that these templates make generation both more interpretable and controllable. Furthermore, we show that this approach scales to real data sets and achieves strong performance nearing that of encoder-decoder text generation models.
Tasks Text Generation
Published 2018-08-30
URL https://arxiv.org/abs/1808.10122v3
PDF https://arxiv.org/pdf/1808.10122v3.pdf
PWC https://paperswithcode.com/paper/learning-neural-templates-for-text-generation
Repo https://github.com/harvardnlp/neural-template-gen
Framework pytorch
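
To illustrate the segment-level structure an HSMM decoder works over, here is a toy Viterbi decoder for a semi-Markov model: states emit variable-length segments, and the dynamic program searches over segment boundaries and states jointly. The scores below are random stand-ins; in the paper they come from neural parameterisations.

```python
import numpy as np

def hsmm_viterbi(seg_score, trans, T, K, max_len):
    # seg_score[i, j, k]: score of state k emitting the segment [i, j)
    # trans[k, k']: score of transitioning from state k to state k'
    best = np.full((T + 1, K), -np.inf)   # best[j, k]: best path ending a
    back = {}                             # segment at position j in state k
    best[0, :] = 0.0
    for j in range(1, T + 1):
        for i in range(max(0, j - max_len), j):
            for k in range(K):
                prev = best[i] + (trans[:, k] if i > 0 else 0.0)
                p = int(np.argmax(prev))
                s = prev[p] + seg_score[i, j, k]
                if s > best[j, k]:
                    best[j, k], back[j, k] = s, (i, p)
    # Trace back the best segmentation as (start, end, state) triples
    j, k = T, int(np.argmax(best[T]))
    segs = []
    while j > 0:
        i, p = back[j, k]
        segs.append((i, j, k))
        j, k = i, p
    return segs[::-1]

T, K, L = 10, 4, 4
segs = hsmm_viterbi(np.random.randn(T, T + 1, K), np.random.randn(K, K), T, K, L)
```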

Path-Level Network Transformation for Efficient Architecture Search

Title Path-Level Network Transformation for Efficient Architecture Search
Authors Han Cai, Jiacheng Yang, Weinan Zhang, Song Han, Yong Yu
Abstract We introduce a new function-preserving transformation for efficient neural architecture search. This network transformation allows reusing previously trained networks and existing successful architectures, which improves sample efficiency. We aim to address the limitation of current network transformation operations that can only perform layer-level architecture modifications, such as adding (pruning) filters or inserting (removing) a layer, which fails to change the topology of connection paths. Our proposed path-level transformation operations enable the meta-controller to modify the path topology of the given network while keeping the merits of reusing weights, and thus allow efficiently designing effective structures with complex path topologies like Inception models. We further propose a bidirectional tree-structured reinforcement learning meta-controller to explore a simple yet highly expressive tree-structured architecture space that can be viewed as a generalization of multi-branch architectures. We experimented on image classification datasets with limited computational resources (about 200 GPU-hours), where we observed improved parameter efficiency and better test results (97.70% test accuracy on CIFAR-10 with 14.3M parameters and 74.6% top-1 accuracy on ImageNet in the mobile setting), demonstrating the effectiveness and transferability of our designed architectures.
Tasks Image Classification, Neural Architecture Search
Published 2018-06-07
URL http://arxiv.org/abs/1806.02639v1
PDF http://arxiv.org/pdf/1806.02639v1.pdf
PWC https://paperswithcode.com/paper/path-level-network-transformation-for
Repo https://github.com/han-cai/PathLevel-EAS
Framework pytorch
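
One path-level transformation that is easy to see is function-preserving: replace a trained layer with two replicated branches whose averaged output equals the original output, so the network computes the same function, and each branch can then be mutated independently. The sketch below shows only this replication split; the paper's full operation set is richer.

```python
import copy
import torch
import torch.nn as nn

class ReplicationSplit(nn.Module):
    def __init__(self, layer):
        super().__init__()
        # Both branches start as exact copies of the trained layer.
        self.branch_a = copy.deepcopy(layer)
        self.branch_b = copy.deepcopy(layer)

    def forward(self, x):
        # Averaging identical branches reproduces the original output exactly.
        return 0.5 * (self.branch_a(x) + self.branch_b(x))

layer = nn.Conv2d(16, 16, 3, padding=1)
split = ReplicationSplit(layer)
x = torch.randn(1, 16, 8, 8)
assert torch.allclose(layer(x), split(x), atol=1e-6)  # function preserved
```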

Wide Activation for Efficient and Accurate Image Super-Resolution

Title Wide Activation for Efficient and Accurate Image Super-Resolution
Authors Jiahui Yu, Yuchen Fan, Jianchao Yang, Ning Xu, Zhaowen Wang, Xinchao Wang, Thomas Huang
Abstract In this report we demonstrate that with same parameters and computational budgets, models with wider features before ReLU activation have significantly better performance for single image super-resolution (SISR). The resulted SISR residual network has a slim identity mapping pathway with wider (2x to 4x) channels before activation in each residual block. To further widen activation (6x to 9x) without computational overhead, we introduce linear low-rank convolution into SISR residual networks and achieve even better accuracy-efficiency tradeoffs. In addition, compared with batch normalization or no normalization, we find training with weight normalization leads to better accuracy for deep super-resolution networks. Our proposed SR network WDSR achieves better results on the large-scale DIV2K image super-resolution benchmark in terms of PSNR with same or lower computational complexity. Based on WDSR, our method also won 1st places in the NTIRE 2018 Challenge on Single Image Super-Resolution in all three realistic tracks. Experiments and models are publicly available at https://github.com/JiahuiYu/wdsr_ntire2018
Tasks Image Super-Resolution, Multi-Frame Super-Resolution, Super-Resolution
Published 2018-08-27
URL http://arxiv.org/abs/1808.08718v2
PDF http://arxiv.org/pdf/1808.08718v2.pdf
PWC https://paperswithcode.com/paper/wide-activation-for-efficient-and-accurate
Repo https://github.com/yjn870/WDSR-pytorch
Framework pytorch
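
A minimal wide-activation residual block in the spirit of WDSR: expand the channels before the ReLU, contract after it, keep a slim identity pathway, and wrap the convolutions in weight normalization. The expansion factor and channel counts below are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm

class WideActBlock(nn.Module):
    def __init__(self, channels=32, expansion=4):
        super().__init__()
        wide = channels * expansion
        self.body = nn.Sequential(
            weight_norm(nn.Conv2d(channels, wide, 3, padding=1)),  # widen
            nn.ReLU(inplace=True),                                 # wide ReLU
            weight_norm(nn.Conv2d(wide, channels, 3, padding=1)),  # contract
        )

    def forward(self, x):
        return x + self.body(x)   # slim identity mapping pathway

y = WideActBlock()(torch.randn(1, 32, 24, 24))
```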

Diverse Few-Shot Text Classification with Multiple Metrics

Title Diverse Few-Shot Text Classification with Multiple Metrics
Authors Mo Yu, Xiaoxiao Guo, Jinfeng Yi, Shiyu Chang, Saloni Potdar, Yu Cheng, Gerald Tesauro, Haoyu Wang, Bowen Zhou
Abstract We study few-shot learning in natural language domains. Compared to many existing works that apply either metric-based or optimization-based meta-learning to the image domain with low inter-task variance, we consider a more realistic setting where tasks are diverse. However, this imposes tremendous difficulties on existing state-of-the-art metric-based algorithms, since a single metric is insufficient to capture complex task variations in the natural language domain. To alleviate the problem, we propose an adaptive metric learning approach that automatically determines the best weighted combination from a set of metrics obtained from meta-training tasks for a newly seen few-shot task. Extensive quantitative evaluations on real-world sentiment analysis and dialog intent classification datasets demonstrate that the proposed method performs favorably against state-of-the-art few-shot learning algorithms in terms of predictive accuracy. We make our code and data available for further study.
Tasks Few-Shot Learning, Intent Classification, Meta-Learning, Metric Learning, Sentiment Analysis, Text Classification
Published 2018-05-19
URL http://arxiv.org/abs/1805.07513v1
PDF http://arxiv.org/pdf/1805.07513v1.pdf
PWC https://paperswithcode.com/paper/diverse-few-shot-text-classification-with
Repo https://github.com/Gorov/DiverseFewShot_Amazon
Framework pytorch
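
An illustrative sketch of the core idea of combining metrics: score a query against class prototypes under several metrics and learn a softmax-weighted combination of them. The two metrics (cosine and negative squared Euclidean) and the weighting scheme are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def multi_metric_logits(query, prototypes, alpha):
    # query: (Q, D), prototypes: (C, D), alpha: (M,) metric mixture logits
    cos = F.normalize(query, dim=-1) @ F.normalize(prototypes, dim=-1).T
    neg_l2 = -torch.cdist(query, prototypes) ** 2
    metrics = torch.stack([cos, neg_l2])            # (M=2, Q, C)
    w = torch.softmax(alpha, dim=0)                 # learned metric weights
    return torch.einsum('m,mqc->qc', w, metrics)    # weighted combination

alpha = torch.zeros(2, requires_grad=True)          # fit on meta-training tasks
logits = multi_metric_logits(torch.randn(5, 64), torch.randn(3, 64), alpha)
```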

PointConv: Deep Convolutional Networks on 3D Point Clouds

Title PointConv: Deep Convolutional Networks on 3D Point Clouds
Authors Wenxuan Wu, Zhongang Qi, Li Fuxin
Abstract Unlike images, which are represented in regular dense grids, 3D point clouds are irregular and unordered, hence applying convolution on them can be difficult. In this paper, we extend the dynamic filter to a new convolution operation, named PointConv. PointConv can be applied on point clouds to build deep convolutional networks. We treat convolution kernels as nonlinear functions of the local coordinates of 3D points, comprised of weight and density functions. With respect to a given point, the weight functions are learned with multi-layer perceptron networks and the density functions through kernel density estimation. The most important contribution of this work is a novel reformulation proposed for efficiently computing the weight functions, which allows us to dramatically scale up the network and significantly improve its performance. The learned convolution kernel can be used to compute translation-invariant and permutation-invariant convolution on any point set in 3D space. In addition, PointConv can also be used as a deconvolution operator to propagate features from a subsampled point cloud back to its original resolution. Experiments on ModelNet40, ShapeNet, and ScanNet show that deep convolutional neural networks built on PointConv are able to achieve state-of-the-art performance on challenging semantic segmentation benchmarks on 3D point clouds. Moreover, our experiments converting CIFAR-10 into a point cloud show that networks built on PointConv can match the performance of convolutional networks on 2D images of a similar structure.
Tasks 3D Part Segmentation, Density Estimation, Semantic Segmentation
Published 2018-11-17
URL http://arxiv.org/abs/1811.07246v2
PDF http://arxiv.org/pdf/1811.07246v2.pdf
PWC https://paperswithcode.com/paper/pointconv-deep-convolutional-networks-on-3d
Repo https://github.com/DylanWusee/pointconv_pytorch
Framework pytorch
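
A minimal sketch of the PointConv idea: an MLP maps local 3D offsets to per-neighbor weight vectors, an inverse-density scalar reweights each neighbor, and the "convolution" is a density-weighted sum. The layer sizes and the Gaussian-kernel density estimate below are simplified assumptions, not the paper's efficient reformulation.

```python
import torch
import torch.nn as nn

class PointConvSketch(nn.Module):
    def __init__(self, c_in, c_out, hidden=32):
        super().__init__()
        # Weight function: a nonlinear function of the local coordinates
        self.weight_mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(), nn.Linear(hidden, c_in))
        self.out = nn.Linear(c_in, c_out)

    def forward(self, rel_coords, feats):
        # rel_coords: (B, K, 3); feats: (B, K, C_in)
        w = self.weight_mlp(rel_coords)                  # (B, K, C_in)
        # Crude KDE: points with many close neighbors get a high density
        dist = torch.cdist(rel_coords, rel_coords)       # (B, K, K)
        density = torch.exp(-dist ** 2).mean(dim=-1)     # (B, K)
        scale = (1.0 / density).unsqueeze(-1)            # inverse density
        agg = (scale * w * feats).sum(dim=1)             # (B, C_in)
        return self.out(agg)                             # (B, C_out)

out = PointConvSketch(32, 64)(torch.randn(4, 16, 3), torch.randn(4, 16, 32))
```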

Visual Representations for Semantic Target Driven Navigation

Title Visual Representations for Semantic Target Driven Navigation
Authors Arsalan Mousavian, Alexander Toshev, Marek Fiser, Jana Kosecka, Ayzaan Wahid, James Davidson
Abstract What is a good visual representation for autonomous agents? We address this question in the context of semantic visual navigation, which is the problem of a robot finding its way through a complex environment to a target object, e.g. go to the refrigerator. Instead of acquiring a metric semantic map of an environment and using planning for navigation, our approach learns navigation policies on top of representations that capture spatial layout and semantic contextual cues. We propose using high-level semantic and contextual features, including segmentation and detection masks obtained by off-the-shelf state-of-the-art vision models, as observations, and use a deep network to learn the navigation policy. This choice allows using additional data from orthogonal sources to better train different parts of the model: the representation extraction is trained on large standard vision datasets, while the navigation component leverages large synthetic environments for training. This combination of real and synthetic data is possible because equitable feature representations are available in both (e.g., segmentation and detection masks), which alleviates the need for domain adaptation. Both the representation and the navigation policy can be readily applied to real non-synthetic environments, as demonstrated on the Active Vision Dataset [1]. Our approach successfully reaches the target in 54% of the cases in unexplored environments, compared to 46% for a non-learning-based approach and 28% for the learning-based baseline.
Tasks Domain Adaptation, Visual Navigation
Published 2018-05-15
URL https://arxiv.org/abs/1805.06066v3
PDF https://arxiv.org/pdf/1805.06066v3.pdf
PWC https://paperswithcode.com/paper/visual-representations-for-semantic-target
Repo https://github.com/arsalan-mousavian/Navigation
Framework tf
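
A hedged sketch of the architecture described above: off-the-shelf semantic outputs (per-class segmentation/detection masks) are stacked into the observation, and a small network maps them to navigation actions. The channel count, action set, and layer sizes are all assumptions for illustration.

```python
import torch
import torch.nn as nn

class SemanticNavPolicy(nn.Module):
    def __init__(self, n_semantic_channels=20, n_actions=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(n_semantic_channels, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.policy = nn.Linear(64, n_actions)

    def forward(self, masks):
        # masks: (B, C, H, W) per-class segmentation/detection masks
        return self.policy(self.encoder(masks))   # action logits

logits = SemanticNavPolicy()(torch.randn(2, 20, 84, 84))
```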

LEAF: A Benchmark for Federated Settings

Title LEAF: A Benchmark for Federated Settings
Authors Sebastian Caldas, Sai Meher Karthik Duddu, Peter Wu, Tian Li, Jakub Konečný, H. Brendan McMahan, Virginia Smith, Ameet Talwalkar
Abstract Modern federated networks, such as those comprised of wearable devices, mobile phones, or autonomous vehicles, generate massive amounts of data each day. This wealth of data can help to learn models that can improve the user experience on each device. However, the scale and heterogeneity of federated data present new challenges in research areas such as federated learning, meta-learning, and multi-task learning. As the machine learning community begins to tackle these challenges, we are at a critical time to ensure that developments made in these areas are grounded in realistic benchmarks. To this end, we propose LEAF, a modular benchmarking framework for learning in federated settings. LEAF includes a suite of open-source federated datasets, a rigorous evaluation framework, and a set of reference implementations, all geared towards capturing the obstacles and intricacies of practical federated environments.
Tasks Autonomous Vehicles, Meta-Learning, Multi-Task Learning
Published 2018-12-03
URL https://arxiv.org/abs/1812.01097v3
PDF https://arxiv.org/pdf/1812.01097v3.pdf
PWC https://paperswithcode.com/paper/leaf-a-benchmark-for-federated-settings
Repo https://github.com/PaddlePaddle/PaddleFL
Framework none
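
LEAF partitions each dataset by real users or devices; a typical use is to benchmark an algorithm such as federated averaging over those per-user partitions. Below is a generic FedAvg round, not LEAF's reference implementation; the model and client loaders are placeholders.

```python
import copy
import torch
import torch.nn as nn

def fedavg_round(global_model, client_loaders, lr=0.01, local_steps=5):
    states, sizes = [], []
    for loader in client_loaders:               # one DataLoader per LEAF user
        local = copy.deepcopy(global_model)
        opt = torch.optim.SGD(local.parameters(), lr=lr)
        for step, (x, y) in zip(range(local_steps), loader):  # local training
            opt.zero_grad()
            nn.functional.cross_entropy(local(x), y).backward()
            opt.step()
        states.append(local.state_dict())
        sizes.append(len(loader.dataset))
    total = sum(sizes)
    # Size-weighted average of the client models
    avg = {k: sum(s[k].float() * (n / total) for s, n in zip(states, sizes))
           for k in states[0]}
    global_model.load_state_dict(avg)
    return global_model
```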

Slum Segmentation and Change Detection : A Deep Learning Approach

Title Slum Segmentation and Change Detection : A Deep Learning Approach
Authors Shishira R Maiya, Sudharshan Chandra Babu
Abstract More than one billion people live in slums around the world. In some developing countries, slum residents make up more than half of the population and lack reliable sanitation services, clean water, electricity, and other basic services. Thus, slum rehabilitation and improvement is an important global challenge, and a significant amount of effort and resources have been put into this endeavor. These initiatives rely heavily on slum mapping and monitoring, and it is essential to have robust and efficient methods for mapping and monitoring existing slum settlements. In this work, we introduce an approach to segment and map individual slums from satellite imagery, leveraging regional convolutional neural networks for instance segmentation with transfer learning. In addition, we introduce a method to perform change detection and monitor slum change over time. We show that our approach effectively learns slum shape and appearance, and demonstrates strong quantitative results, achieving a maximum AP of 80.0.
Tasks Instance Segmentation, Semantic Segmentation, Transfer Learning
Published 2018-11-19
URL http://arxiv.org/abs/1811.07896v1
PDF http://arxiv.org/pdf/1811.07896v1.pdf
PWC https://paperswithcode.com/paper/slum-segmentation-and-change-detection-a-deep
Repo https://github.com/cbsudux/Mumbai-slum-segmentation
Framework tf
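
A common way to set up this kind of transfer learning for instance segmentation (shown here with torchvision's Mask R-CNN, which may differ from the authors' exact stack) is to load COCO-pretrained weights and replace the box and mask heads for the new classes, background and slum.

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

num_classes = 2  # background + slum
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)

# Replace the box classification head for the new class set
in_feats = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_feats, num_classes)

# Replace the mask prediction head for the new class set
in_feats_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_feats_mask, 256, num_classes)
# model is now ready to fine-tune on slum annotations
```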

Actor and Observer: Joint Modeling of First and Third-Person Videos

Title Actor and Observer: Joint Modeling of First and Third-Person Videos
Authors Gunnar A. Sigurdsson, Abhinav Gupta, Cordelia Schmid, Ali Farhadi, Karteek Alahari
Abstract Several theories in cognitive neuroscience suggest that when people interact with the world, or simulate interactions, they do so from a first-person egocentric perspective, and seamlessly transfer knowledge between third-person (observer) and first-person (actor). Despite this, learning such models for human action recognition has not been achievable due to the lack of data. This paper takes a step in this direction, with the introduction of Charades-Ego, a large-scale dataset of paired first-person and third-person videos, involving 112 people, with 4000 paired videos. This enables learning the link between the two, actor and observer perspectives. Thereby, we address one of the biggest bottlenecks facing egocentric vision research, providing a link from first-person to the abundant third-person data on the web. We use this data to learn a joint representation of first and third-person videos, with only weak supervision, and show its effectiveness for transferring knowledge from the third-person to the first-person domain.
Tasks Temporal Action Localization
Published 2018-04-25
URL http://arxiv.org/abs/1804.09627v1
PDF http://arxiv.org/pdf/1804.09627v1.pdf
PWC https://paperswithcode.com/paper/actor-and-observer-joint-modeling-of-first
Repo https://github.com/gsig/actor-observer
Framework pytorch
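
A sketch of learning a joint first-/third-person embedding with a triplet objective: a paired (first-person, third-person) clip should be closer in the shared space than a mismatched pair. The linear "backbones" and margin are assumptions; the paper's weak-supervision scheme is more involved.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

embed_first = nn.Linear(512, 128)   # stand-ins for video feature backbones
embed_third = nn.Linear(512, 128)

def joint_embedding_loss(first_feats, third_feats, margin=0.5):
    a = F.normalize(embed_first(first_feats), dim=-1)   # anchors (1st person)
    p = F.normalize(embed_third(third_feats), dim=-1)   # positives (paired)
    n = p.roll(1, dims=0)                               # negatives (mismatched)
    d_pos = (a - p).pow(2).sum(-1)
    d_neg = (a - n).pow(2).sum(-1)
    return F.relu(d_pos - d_neg + margin).mean()        # triplet hinge

loss = joint_embedding_loss(torch.randn(8, 512), torch.randn(8, 512))
```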

IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

Title IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
Authors Lasse Espeholt, Hubert Soyer, Remi Munos, Karen Simonyan, Volodymir Mnih, Tom Ward, Yotam Doron, Vlad Firoiu, Tim Harley, Iain Dunning, Shane Legg, Koray Kavukcuoglu
Abstract In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. A key challenge is to handle the increased amount of data and extended training time. We have developed a new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation. We achieve stable learning at high throughput by combining decoupled acting and learning with a novel off-policy correction method called V-trace. We demonstrate the effectiveness of IMPALA for multi-task reinforcement learning on DMLab-30 (a set of 30 tasks from the DeepMind Lab environment (Beattie et al., 2016)) and Atari-57 (all available Atari games in Arcade Learning Environment (Bellemare et al., 2013a)). Our results show that IMPALA is able to achieve better performance than previous agents with less data, and crucially exhibits positive transfer between tasks as a result of its multi-task approach.
Tasks Atari Games
Published 2018-02-05
URL http://arxiv.org/abs/1802.01561v3
PDF http://arxiv.org/pdf/1802.01561v3.pdf
PWC https://paperswithcode.com/paper/impala-scalable-distributed-deep-rl-with
Repo https://github.com/haje01/impala
Framework pytorch
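
The V-trace correction mentioned above is compact enough to show directly: importance ratios are clipped (rho-bar for the TD term, c-bar for the trace), and corrected value targets are accumulated backwards through the trajectory. This is written from the published definition; treat it as an illustrative sketch rather than DeepMind's implementation.

```python
import torch

def v_trace(rewards, values, bootstrap, rhos, gamma=0.99,
            rho_bar=1.0, c_bar=1.0):
    # rewards, values, rhos: (T,); bootstrap: scalar V(x_T)
    # rhos are the importance ratios pi(a|x) / mu(a|x)
    T = rewards.shape[0]
    clipped_rho = rhos.clamp(max=rho_bar)
    clipped_c = rhos.clamp(max=c_bar)
    next_values = torch.cat([values[1:], bootstrap.view(1)])
    deltas = clipped_rho * (rewards + gamma * next_values - values)
    acc = torch.zeros(())
    vs_minus_v = torch.zeros(T)
    for t in reversed(range(T)):   # backward recursion over the trajectory
        acc = deltas[t] + gamma * clipped_c[t] * acc
        vs_minus_v[t] = acc
    return values + vs_minus_v     # v_s targets for the value function

vs = v_trace(torch.rand(10), torch.rand(10), torch.rand(()),
             torch.rand(10) * 2)
```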

M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network

Title M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network
Authors Qijie Zhao, Tao Sheng, Yongtao Wang, Zhi Tang, Ying Chen, Ling Cai, Haibin Ling
Abstract Feature pyramids are widely exploited by both state-of-the-art one-stage object detectors (e.g., DSSD, RetinaNet, RefineDet) and two-stage object detectors (e.g., Mask R-CNN, DetNet) to alleviate the problem arising from scale variation across object instances. Although these object detectors with feature pyramids achieve encouraging results, they have some limitations because they simply construct the feature pyramid according to the inherent multi-scale, pyramidal architecture of backbones that were actually designed for the object classification task. In this work, we present a method called Multi-Level Feature Pyramid Network (MLFPN) to construct more effective feature pyramids for detecting objects of different scales. First, we fuse multi-level features (i.e. multiple layers) extracted by the backbone as the base feature. Second, we feed the base feature into a block of alternating joint Thinned U-shape Modules and Feature Fusion Modules and exploit the decoder layers of each U-shape module as the features for detecting objects. Finally, we gather up the decoder layers with equivalent scales (sizes) to develop a feature pyramid for object detection, in which every feature map consists of the layers (features) from multiple levels. To evaluate the effectiveness of the proposed MLFPN, we design and train a powerful end-to-end one-stage object detector, which we call M2Det, by integrating MLFPN into the architecture of SSD; it achieves better detection performance than state-of-the-art one-stage detectors. Specifically, on the MS-COCO benchmark, M2Det achieves an AP of 41.0 at 11.8 FPS with a single-scale inference strategy and an AP of 44.2 with a multi-scale inference strategy, which are new state-of-the-art results among one-stage detectors. The code is available at https://github.com/qijiezhao/M2Det.
Tasks Object Classification, Object Detection
Published 2018-11-12
URL http://arxiv.org/abs/1811.04533v3
PDF http://arxiv.org/pdf/1811.04533v3.pdf
PWC https://paperswithcode.com/paper/m2det-a-single-shot-object-detector-based-on
Repo https://github.com/taashi-s/M2Det_keras
Framework none
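
A drastically simplified Thinned U-shape Module (TUM) sketch: an encoder path downsamples, and the decoder path's layers serve as the multi-scale features. Channel counts and depth are assumptions; M2Det stacks several such modules and fuses their outputs by scale.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyTUM(nn.Module):
    def __init__(self, c=64, levels=3):
        super().__init__()
        self.down = nn.ModuleList(
            nn.Conv2d(c, c, 3, stride=2, padding=1) for _ in range(levels))
        self.lat = nn.ModuleList(
            nn.Conv2d(c, c, 1) for _ in range(levels))

    def forward(self, x):
        enc = [x]
        for d in self.down:                            # encoder: shrink
            enc.append(F.relu(d(enc[-1])))
        feats, up = [enc[-1]], enc[-1]
        for skip, lat in zip(enc[-2::-1], self.lat):   # decoder: grow
            up = F.interpolate(up, size=skip.shape[-2:]) + lat(skip)
            feats.append(up)
        return feats   # decoder layers = features at multiple scales

outs = TinyTUM()(torch.randn(1, 64, 32, 32))   # four feature maps, 4x4 to 32x32
```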

Investigating Generative Adversarial Networks based Speech Dereverberation for Robust Speech Recognition

Title Investigating Generative Adversarial Networks based Speech Dereverberation for Robust Speech Recognition
Authors Ke Wang, Junbo Zhang, Sining Sun, Yujun Wang, Fei Xiang, Lei Xie
Abstract We investigate the use of generative adversarial networks (GANs) in speech dereverberation for robust speech recognition. GANs have recently been studied for speech enhancement to remove additive noises, but work examining their ability in speech dereverberation is still lacking, and the advantages of using GANs have not been fully established. In this paper, we provide a deep investigation of GAN-based dereverberation front-ends for ASR. First, we study the effectiveness of different dereverberation networks (the generator in the GAN) and find that an LSTM leads to a significant improvement compared with feed-forward DNN and CNN on our dataset. Second, further adding residual connections in the deep LSTMs boosts the performance as well. Finally, we find that, for the success of the GAN, it is important to update the generator and the discriminator using the same mini-batch data during training. Moreover, using the reverberant spectrogram as a condition to the discriminator, as suggested in previous studies, may degrade the performance. In summary, our GAN-based dereverberation front-end achieves a 14%-19% relative CER reduction compared to the baseline DNN dereverberation network when tested with a strong multi-condition training acoustic model.
Tasks Robust Speech Recognition, Speech Enhancement, Speech Recognition
Published 2018-03-27
URL http://arxiv.org/abs/1803.10132v3
PDF http://arxiv.org/pdf/1803.10132v3.pdf
PWC https://paperswithcode.com/paper/investigating-generative-adversarial-networks
Repo https://github.com/wangkenpu/rsrgan
Framework tf
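
A sketch of the training arrangement the paper reports working well: an LSTM generator with residual connections maps reverberant spectrogram frames to enhanced ones, and the generator and discriminator are updated on the same mini-batch. Feature sizes and the adversarial loss below are simplified assumptions.

```python
import torch
import torch.nn as nn

class ResLSTMGenerator(nn.Module):
    def __init__(self, dim=257, layers=2):
        super().__init__()
        self.lstms = nn.ModuleList(
            nn.LSTM(dim, dim, batch_first=True) for _ in range(layers))

    def forward(self, x):                  # x: (B, T, dim) spectrogram frames
        for lstm in self.lstms:
            out, _ = lstm(x)
            x = x + out                    # residual connection
        return x

G = ResLSTMGenerator()
D = nn.Sequential(nn.Linear(257, 128), nn.ReLU(), nn.Linear(128, 1))
opt_g = torch.optim.Adam(G.parameters(), 1e-4)
opt_d = torch.optim.Adam(D.parameters(), 1e-4)
bce = nn.BCEWithLogitsLoss()

reverb, clean = torch.randn(8, 100, 257), torch.randn(8, 100, 257)
fake = G(reverb)
# Discriminator and generator updates use the SAME mini-batch.
d_loss = bce(D(clean), torch.ones(8, 100, 1)) + \
         bce(D(fake.detach()), torch.zeros(8, 100, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()
g_loss = bce(D(fake), torch.ones(8, 100, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```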

Multilingual bottleneck features for subword modeling in zero-resource languages

Title Multilingual bottleneck features for subword modeling in zero-resource languages
Authors Enno Hermann, Sharon Goldwater
Abstract How can we effectively develop speech technology for languages where no transcribed data is available? Many existing approaches use no annotated resources at all, yet it makes sense to leverage information from large annotated corpora in other languages, for example in the form of multilingual bottleneck features (BNFs) obtained from a supervised speech recognition system. In this work, we evaluate the benefits of BNFs for subword modeling (feature extraction) in six unseen languages on a word discrimination task. First we establish a strong unsupervised baseline by combining two existing methods: vocal tract length normalisation (VTLN) and the correspondence autoencoder (cAE). We then show that BNFs trained on a single language already beat this baseline; including up to 10 languages results in additional improvements which cannot be matched by just adding more data from a single language. Finally, we show that the cAE can improve further on the BNFs if high-quality same-word pairs are available.
Tasks Speech Recognition
Published 2018-03-23
URL http://arxiv.org/abs/1803.08863v2
PDF http://arxiv.org/pdf/1803.08863v2.pdf
PWC https://paperswithcode.com/paper/multilingual-bottleneck-features-for-subword
Repo https://github.com/eginhard/cae-utd-utils
Framework none
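
For illustration, a bottleneck feature extractor is simply a supervised phone classifier with a narrow hidden layer; after multilingual training, the activations of that narrow layer serve as features for the unseen language. The layer sizes and the 39-dim MFCC-style input below are assumptions.

```python
import torch
import torch.nn as nn

class BottleneckNet(nn.Module):
    def __init__(self, n_in=39, n_bottleneck=40, n_phones=120):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_in, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, n_bottleneck))          # narrow bottleneck layer
        self.classifier = nn.Linear(n_bottleneck, n_phones)

    def forward(self, x):
        # Trained with supervised phone labels on high-resource languages
        return self.classifier(torch.relu(self.encoder(x)))

    def extract_bnf(self, x):
        # Discard the classifier; keep bottleneck activations as features
        return self.encoder(x)

net = BottleneckNet()
bnfs = net.extract_bnf(torch.randn(100, 39))   # (frames, 40) BNFs
```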

Fast Dawid-Skene: A Fast Vote Aggregation Scheme for Sentiment Classification

Title Fast Dawid-Skene: A Fast Vote Aggregation Scheme for Sentiment Classification
Authors Vaibhav B Sinha, Sukrut Rao, Vineeth N Balasubramanian
Abstract Many real world problems can now be effectively solved using supervised machine learning. A major roadblock is often the lack of an adequate quantity of labeled data for training. A possible solution is to assign the task of labeling data to a crowd, and then infer the true label using aggregation methods. A well-known approach for aggregation is the Dawid-Skene (DS) algorithm, which is based on the principle of Expectation-Maximization (EM). We propose a new simple, yet effective, EM-based algorithm, which can be interpreted as a 'hard' version of DS, that allows much faster convergence while maintaining similar accuracy in aggregation. We show the use of this algorithm as a quick and effective technique for online, real-time sentiment annotation. We also prove that our algorithm converges to the estimated labels at a linear rate. Our experiments on standard datasets show a significant speedup in time taken for aggregation - up to $\sim$8x over Dawid-Skene and $\sim$6x over other fast EM methods, at competitive accuracy. The code for the implementation of the algorithms can be found at https://github.com/GoodDeeds/Fast-Dawid-Skene
Tasks Sentiment Analysis
Published 2018-03-07
URL http://arxiv.org/abs/1803.02781v3
PDF http://arxiv.org/pdf/1803.02781v3.pdf
PWC https://paperswithcode.com/paper/fast-dawid-skene-a-fast-vote-aggregation
Repo https://github.com/GoodDeeds/Fast-Dawid-Skene
Framework none
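
A compact hard-EM aggregator in the spirit of Fast Dawid-Skene: the E-step commits each item to its single most probable label (the "hard" assignment), and the M-step re-estimates class priors and per-annotator confusion matrices from those assignments. Written from the algorithm's description; the initialisation and smoothing choices here are assumptions.

```python
import numpy as np

def fast_dawid_skene(votes, n_classes, n_iters=20, eps=1e-8):
    # votes: (n_items, n_annotators) integer labels, -1 for "no vote"
    n_items, n_annot = votes.shape
    # Initialise with majority vote
    labels = np.array([np.bincount(row[row >= 0], minlength=n_classes).argmax()
                       for row in votes])
    for _ in range(n_iters):
        # M-step: class priors and annotator confusion matrices
        prior = np.bincount(labels, minlength=n_classes) / n_items
        conf = np.full((n_annot, n_classes, n_classes), eps)
        for a in range(n_annot):
            mask = votes[:, a] >= 0
            np.add.at(conf[a], (labels[mask], votes[mask, a]), 1.0)
        conf /= conf.sum(axis=2, keepdims=True)
        # E-step (hard): pick the argmax posterior label per item
        log_post = np.tile(np.log(prior + eps), (n_items, 1))
        for a in range(n_annot):
            mask = votes[:, a] >= 0
            log_post[mask] += np.log(conf[a][:, votes[mask, a]].T)
        labels = log_post.argmax(axis=1)
    return labels

votes = np.array([[0, 0, 1], [1, 1, 1], [0, -1, 0]])
print(fast_dawid_skene(votes, n_classes=2))   # e.g. [0 1 0]
```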