Paper Group AWR 331
PointCNN: Convolution On $\mathcal{X}$-Transformed Points. Learning Neural Templates for Text Generation. Path-Level Network Transformation for Efficient Architecture Search. Wide Activation for Efficient and Accurate Image Super-Resolution. Diverse Few-Shot Text Classification with Multiple Metrics. PointConv: Deep Convolutional Networks on 3D Point Clouds …
PointCNN: Convolution On $\mathcal{X}$-Transformed Points
Title | PointCNN: Convolution On $\mathcal{X}$-Transformed Points |
Authors | Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Xinhan Di, Baoquan Chen |
Abstract | We present a simple and general framework for feature learning from point clouds. The key to the success of CNNs is the convolution operator, which is capable of leveraging spatially-local correlation in data represented densely in grids (e.g. images). However, point clouds are irregular and unordered, so directly convolving kernels against the features associated with the points results in loss of shape information and sensitivity to point ordering. To address these problems, we propose to learn an $\mathcal{X}$-transformation from the input points that simultaneously serves two purposes. The first is the weighting of the input features associated with the points, and the second is the permutation of the points into a latent and potentially canonical order. The element-wise product and sum operations of the typical convolution operator are subsequently applied to the $\mathcal{X}$-transformed features. The proposed method is a generalization of typical CNNs to feature learning from point clouds, so we call it PointCNN. Experiments show that PointCNN achieves on-par or better performance than state-of-the-art methods on multiple challenging benchmark datasets and tasks. |
Tasks | 3D Instance Segmentation, 3D Part Segmentation |
Published | 2018-01-23 |
URL | http://arxiv.org/abs/1801.07791v5 |
PDF | http://arxiv.org/pdf/1801.07791v5.pdf
PWC | https://paperswithcode.com/paper/pointcnn-convolution-on-mathcalx-transformed |
Repo | https://github.com/hbb1/reading-list |
Framework | none |
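To make the abstract's $\mathcal{X}$-transformation concrete, here is a minimal PyTorch sketch (not the authors' code): an MLP predicts a $K \times K$ transform from the $K$ neighbor coordinates, the transform reorders/weights neighbor features, and a standard linear "kernel" aggregates them. The module name `XConvSketch` and all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class XConvSketch(nn.Module):
    def __init__(self, k, c_in, c_out):
        super().__init__()
        self.k = k
        # Predicts the K x K X-transformation from local point coordinates.
        self.x_mlp = nn.Sequential(nn.Linear(3 * k, k * k), nn.ReLU(),
                                   nn.Linear(k * k, k * k))
        # Plays the role of the convolution kernel over the K slots.
        self.conv = nn.Linear(k * c_in, c_out)

    def forward(self, neighbor_xyz, neighbor_feats):
        # neighbor_xyz: (B, K, 3) coords relative to the query point
        # neighbor_feats: (B, K, C_in)
        b = neighbor_xyz.shape[0]
        x = self.x_mlp(neighbor_xyz.reshape(b, -1)).reshape(b, self.k, self.k)
        transformed = torch.bmm(x, neighbor_feats)       # (B, K, C_in)
        return self.conv(transformed.reshape(b, -1))     # (B, C_out)

out = XConvSketch(k=8, c_in=16, c_out=32)(torch.randn(4, 8, 3), torch.randn(4, 8, 16))
```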
Learning Neural Templates for Text Generation
Title | Learning Neural Templates for Text Generation |
Authors | Sam Wiseman, Stuart M. Shieber, Alexander M. Rush |
Abstract | While neural encoder-decoder models have had significant empirical success in text generation, several problems with this style of generation remain unaddressed. Encoder-decoder models are largely (a) uninterpretable, and (b) difficult to control in terms of their phrasing or content. This work proposes a neural generation system using a hidden semi-Markov model (HSMM) decoder, which learns latent, discrete templates jointly with learning to generate. We show that this model learns useful templates, and that these templates make generation both more interpretable and controllable. Furthermore, we show that this approach scales to real data sets and achieves strong performance nearing that of encoder-decoder text generation models. |
Tasks | Text Generation |
Published | 2018-08-30 |
URL | https://arxiv.org/abs/1808.10122v3 |
PDF | https://arxiv.org/pdf/1808.10122v3.pdf
PWC | https://paperswithcode.com/paper/learning-neural-templates-for-text-generation |
Repo | https://github.com/harvardnlp/neural-template-gen |
Framework | pytorch |
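An HSMM decoder segments the output into labeled spans, and the segmentation it recovers is what makes templates readable. Below is a hedged NumPy sketch of segment-level (semi-Markov) Viterbi decoding, the kind of inference such a model needs; the scoring function is a stand-in, not the paper's learned model.

```python
import numpy as np

def segment_viterbi(score, T, K, max_len):
    # score(j, t, l) -> log-score of labeling tokens [t, t+l) with state j
    best = np.full((T + 1,), -np.inf); best[0] = 0.0
    back = [None] * (T + 1)
    for t in range(1, T + 1):
        for l in range(1, min(max_len, t) + 1):
            for j in range(K):
                s = best[t - l] + score(j, t - l, l)
                if s > best[t]:
                    best[t], back[t] = s, (t - l, j)
    # Follow back-pointers to read off the template (state sequence).
    segs, t = [], T
    while t > 0:
        prev, j = back[t]
        segs.append((prev, t, j)); t = prev
    return segs[::-1]

print(segment_viterbi(lambda j, t, l: -abs(j - l), T=6, K=3, max_len=3))
```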
Path-Level Network Transformation for Efficient Architecture Search
Title | Path-Level Network Transformation for Efficient Architecture Search |
Authors | Han Cai, Jiacheng Yang, Weinan Zhang, Song Han, Yong Yu |
Abstract | We introduce a new function-preserving transformation for efficient neural architecture search. This network transformation allows reusing previously trained networks and existing successful architectures, which improves sample efficiency. We aim to address the limitation of current network transformation operations that can only perform layer-level architecture modifications, such as adding (pruning) filters or inserting (removing) a layer, and thus fail to change the topology of connection paths. Our proposed path-level transformation operations enable the meta-controller to modify the path topology of the given network while keeping the merits of reusing weights, and thus allow efficiently designing effective structures with complex path topologies like Inception models. We further propose a bidirectional tree-structured reinforcement learning meta-controller to explore a simple yet highly expressive tree-structured architecture space that can be viewed as a generalization of multi-branch architectures. We experimented on image classification datasets with limited computational resources (about 200 GPU-hours), where we observed improved parameter efficiency and better test results (97.70% test accuracy on CIFAR-10 with 14.3M parameters and 74.6% top-1 accuracy on ImageNet in the mobile setting), demonstrating the effectiveness and transferability of our designed architectures. |
Tasks | Image Classification, Neural Architecture Search |
Published | 2018-06-07 |
URL | http://arxiv.org/abs/1806.02639v1 |
PDF | http://arxiv.org/pdf/1806.02639v1.pdf
PWC | https://paperswithcode.com/paper/path-level-network-transformation-for |
Repo | https://github.com/han-cai/PathLevel-EAS |
Framework | pytorch |
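A minimal sketch of one function-preserving, path-level transformation of the kind the abstract describes: replace a single layer with replicated branches whose outputs are averaged. The output is unchanged, but the network now has a multi-branch topology that a meta-controller could mutate further. The class name `ReplicationSplit` is an illustrative assumption.

```python
import copy
import torch
import torch.nn as nn

class ReplicationSplit(nn.Module):
    def __init__(self, layer, n_branches=2):
        super().__init__()
        self.branches = nn.ModuleList(copy.deepcopy(layer) for _ in range(n_branches))

    def forward(self, x):
        # Averaging identical replicas reproduces the original layer exactly.
        return torch.stack([b(x) for b in self.branches]).mean(dim=0)

layer = nn.Conv2d(8, 8, 3, padding=1)
split = ReplicationSplit(layer)
x = torch.randn(1, 8, 16, 16)
assert torch.allclose(layer(x), split(x), atol=1e-6)  # function preserved
```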
Wide Activation for Efficient and Accurate Image Super-Resolution
Title | Wide Activation for Efficient and Accurate Image Super-Resolution |
Authors | Jiahui Yu, Yuchen Fan, Jianchao Yang, Ning Xu, Zhaowen Wang, Xinchao Wang, Thomas Huang |
Abstract | In this report we demonstrate that, with the same parameters and computational budgets, models with wider features before the ReLU activation have significantly better performance for single image super-resolution (SISR). The resulting SR residual network has a slim identity mapping pathway with wider ($2\times$ to $4\times$) channels before activation in each residual block. To further widen activation ($6\times$ to $9\times$) without computational overhead, we introduce linear low-rank convolution into SR networks and achieve even better accuracy-efficiency tradeoffs. In addition, compared with batch normalization or no normalization, we find that training with weight normalization leads to better accuracy for deep super-resolution networks. Our proposed SR network WDSR achieves better results on the large-scale DIV2K image super-resolution benchmark in terms of PSNR with the same or lower computational complexity. Based on WDSR, our method also won first places in the NTIRE 2018 Challenge on Single Image Super-Resolution in all three realistic tracks. Experiments and ablation studies support the importance of wide activation for image super-resolution. |
Tasks | Image Super-Resolution, Multi-Frame Super-Resolution, Super-Resolution |
Published | 2018-08-27 |
URL | http://arxiv.org/abs/1808.08718v2 |
PDF | http://arxiv.org/pdf/1808.08718v2.pdf
PWC | https://paperswithcode.com/paper/wide-activation-for-efficient-and-accurate |
Repo | https://github.com/yjn870/WDSR-pytorch |
Framework | pytorch |
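A sketch of a WDSR-style residual block under the abstract's recipe: widen channels before the ReLU (expansion factor `r`) and use weight normalization instead of batch normalization. Details differ from the official code; sizes here are illustrative.

```python
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm

def wide_block(c, r=4):
    return nn.Sequential(
        weight_norm(nn.Conv2d(c, c * r, 3, padding=1)),  # widen before activation
        nn.ReLU(inplace=True),
        weight_norm(nn.Conv2d(c * r, c, 3, padding=1)),  # project back down
    )

class WideResBlock(nn.Module):
    def __init__(self, c=32, r=4):
        super().__init__()
        self.body = wide_block(c, r)

    def forward(self, x):
        return x + self.body(x)  # slim identity mapping pathway

y = WideResBlock()(torch.randn(1, 32, 24, 24))
```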
Diverse Few-Shot Text Classification with Multiple Metrics
Title | Diverse Few-Shot Text Classification with Multiple Metrics |
Authors | Mo Yu, Xiaoxiao Guo, Jinfeng Yi, Shiyu Chang, Saloni Potdar, Yu Cheng, Gerald Tesauro, Haoyu Wang, Bowen Zhou |
Abstract | We study few-shot learning in natural language domains. Compared to many existing works that apply either metric-based or optimization-based meta-learning to the image domain with low inter-task variance, we consider a more realistic setting where tasks are diverse. However, this imposes tremendous difficulties on existing state-of-the-art metric-based algorithms, since a single metric is insufficient to capture complex task variations in the natural language domain. To alleviate the problem, we propose an adaptive metric learning approach that automatically determines the best weighted combination from a set of metrics obtained from meta-training tasks for a newly seen few-shot task. Extensive quantitative evaluations on real-world sentiment analysis and dialog intent classification datasets demonstrate that the proposed method performs favorably against state-of-the-art few-shot learning algorithms in terms of predictive accuracy. We make our code and data available for further study. |
Tasks | Few-Shot Learning, Intent Classification, Meta-Learning, Metric Learning, Sentiment Analysis, Text Classification |
Published | 2018-05-19 |
URL | http://arxiv.org/abs/1805.07513v1 |
PDF | http://arxiv.org/pdf/1805.07513v1.pdf
PWC | https://paperswithcode.com/paper/diverse-few-shot-text-classification-with |
Repo | https://github.com/Gorov/DiverseFewShot_Amazon |
Framework | pytorch |
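A hedged sketch of the core idea: combine several pre-trained metrics (here, stand-in linear encoders with cosine similarity) using task-specific weights, so a new few-shot task can select the mixture that fits it best. All names and shapes are illustrative, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def combined_scores(query, prototypes, encoders, weights):
    # query: (D,), prototypes: (N_classes, D); weights: (M,) learned logits
    w = torch.softmax(weights, dim=0)
    score = 0.0
    for wm, enc in zip(w, encoders):
        q, p = enc(query), enc(prototypes)
        score = score + wm * F.cosine_similarity(q.unsqueeze(0), p, dim=-1)
    return score  # (N_classes,): higher means more similar

encoders = [torch.nn.Linear(16, 8) for _ in range(3)]  # stand-ins for meta-trained metrics
weights = torch.zeros(3, requires_grad=True)           # adapted per target task
s = combined_scores(torch.randn(16), torch.randn(5, 16), encoders, weights)
```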
PointConv: Deep Convolutional Networks on 3D Point Clouds
Title | PointConv: Deep Convolutional Networks on 3D Point Clouds |
Authors | Wenxuan Wu, Zhongang Qi, Li Fuxin |
Abstract | Unlike images, which are represented in regular dense grids, 3D point clouds are irregular and unordered, hence applying convolution on them can be difficult. In this paper, we extend the dynamic filter to a new convolution operation, named PointConv. PointConv can be applied on point clouds to build deep convolutional networks. We treat convolution kernels as nonlinear functions of the local coordinates of 3D points, composed of weight and density functions. With respect to a given point, the weight functions are learned with multi-layer perceptron networks and the density functions through kernel density estimation. The most important contribution of this work is a novel reformulation for efficiently computing the weight functions, which allowed us to dramatically scale up the network and significantly improve its performance. The learned convolution kernel can be used to compute translation-invariant and permutation-invariant convolution on any point set in 3D space. In addition, PointConv can also be used as a deconvolution operator to propagate features from a subsampled point cloud back to its original resolution. Experiments on ModelNet40, ShapeNet, and ScanNet show that deep convolutional neural networks built on PointConv achieve state-of-the-art results on challenging semantic segmentation benchmarks for 3D point clouds. Moreover, our experiments converting CIFAR-10 into a point cloud show that networks built on PointConv can match the performance of convolutional networks on 2D images of a similar structure. |
Tasks | 3D Part Segmentation, Density Estimation, Semantic Segmentation |
Published | 2018-11-17 |
URL | http://arxiv.org/abs/1811.07246v2 |
PDF | http://arxiv.org/pdf/1811.07246v2.pdf
PWC | https://paperswithcode.com/paper/pointconv-deep-convolutional-networks-on-3d |
Repo | https://github.com/DylanWusee/pointconv_pytorch |
Framework | pytorch |
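A minimal sketch of the PointConv idea from the abstract: an MLP maps local 3D offsets to continuous kernel weights, an inverse density estimate reweights unevenly sampled points, and features are aggregated by a weighted sum. This is illustrative only; it does not implement the paper's efficient reformulation.

```python
import torch
import torch.nn as nn

class PointConvSketch(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        # Weight function: MLP from a 3D offset to a (C_in x C_out) kernel slice.
        self.weight_mlp = nn.Sequential(nn.Linear(3, 32), nn.ReLU(),
                                        nn.Linear(32, c_in * c_out))
        self.c_in, self.c_out = c_in, c_out

    def forward(self, offsets, feats, inv_density):
        # offsets: (B, K, 3), feats: (B, K, C_in), inv_density: (B, K)
        b, k, _ = offsets.shape
        w = self.weight_mlp(offsets).view(b, k, self.c_in, self.c_out)
        f = (feats * inv_density.unsqueeze(-1)).unsqueeze(-1)  # (B, K, C_in, 1)
        return (w * f).sum(dim=(1, 2))                          # (B, C_out)

out = PointConvSketch(8, 16)(torch.randn(2, 9, 3), torch.randn(2, 9, 8), torch.rand(2, 9))
```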
Visual Representations for Semantic Target Driven Navigation
Title | Visual Representations for Semantic Target Driven Navigation |
Authors | Arsalan Mousavian, Alexander Toshev, Marek Fiser, Jana Kosecka, Ayzaan Wahid, James Davidson |
Abstract | What is a good visual representation for autonomous agents? We address this question in the context of semantic visual navigation, which is the problem of a robot finding its way through a complex environment to a target object, e.g. go to the refrigerator. Instead of acquiring a metric semantic map of an environment and using planning for navigation, our approach learns navigation policies on top of representations that capture spatial layout and semantic contextual cues. We propose using high-level semantic and contextual features, including segmentation and detection masks obtained by off-the-shelf state-of-the-art vision systems, as observations, and use a deep network to learn the navigation policy. This choice allows using additional data from orthogonal sources to better train different parts of the model: the representation extraction is trained on large standard vision datasets, while the navigation component leverages large synthetic environments for training. This combination of real and synthetic is possible because equitable feature representations are available in both (e.g., segmentation and detection masks), which alleviates the need for domain adaptation. Both the representation and the navigation policy can be readily applied to real non-synthetic environments, as demonstrated on the Active Vision Dataset [1]. Our approach successfully reaches the target in 54% of the cases in unexplored environments, compared to 46% for a non-learning-based approach and 28% for the learning-based baseline. |
Tasks | Domain Adaptation, Visual Navigation |
Published | 2018-05-15 |
URL | https://arxiv.org/abs/1805.06066v3 |
PDF | https://arxiv.org/pdf/1805.06066v3.pdf
PWC | https://paperswithcode.com/paper/visual-representations-for-semantic-target |
Repo | https://github.com/arsalan-mousavian/Navigation |
Framework | tf |
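A hedged sketch of the representation choice the abstract describes: feed per-class segmentation/detection masks (random stand-ins here) to a small network that outputs a discrete navigation action. The channel count, action set, and architecture are illustrative assumptions.

```python
import torch
import torch.nn as nn

policy = nn.Sequential(
    nn.Conv2d(in_channels=20, out_channels=32, kernel_size=5, stride=4),  # 20 mask channels
    nn.ReLU(),
    nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(), nn.Flatten(),
    nn.LazyLinear(6),  # hypothetical action logits, e.g. {forward, back, left, right, look up, look down}
)
masks = torch.rand(1, 20, 64, 64)  # stand-in per-class segmentation/detection masks
action = policy(masks).argmax(dim=-1)
```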
LEAF: A Benchmark for Federated Settings
Title | LEAF: A Benchmark for Federated Settings |
Authors | Sebastian Caldas, Sai Meher Karthik Duddu, Peter Wu, Tian Li, Jakub Konečný, H. Brendan McMahan, Virginia Smith, Ameet Talwalkar |
Abstract | Modern federated networks, such as those composed of wearable devices, mobile phones, or autonomous vehicles, generate massive amounts of data each day. This wealth of data can help to learn models that can improve the user experience on each device. However, the scale and heterogeneity of federated data present new challenges in research areas such as federated learning, meta-learning, and multi-task learning. As the machine learning community begins to tackle these challenges, we are at a critical time to ensure that developments made in these areas are grounded in realistic benchmarks. To this end, we propose LEAF, a modular benchmarking framework for learning in federated settings. LEAF includes a suite of open-source federated datasets, a rigorous evaluation framework, and a set of reference implementations, all geared towards capturing the obstacles and intricacies of practical federated environments. |
Tasks | Autonomous Vehicles, Meta-Learning, Multi-Task Learning |
Published | 2018-12-03 |
URL | https://arxiv.org/abs/1812.01097v3 |
PDF | https://arxiv.org/pdf/1812.01097v3.pdf
PWC | https://paperswithcode.com/paper/leaf-a-benchmark-for-federated-settings |
Repo | https://github.com/PaddlePaddle/PaddleFL |
Framework | none |
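A small sketch of what "federated" structure means in practice: partition a dataset by its natural user/device identity rather than shuffling it i.i.d. The user ids below are synthetic; LEAF ships real per-user splits for its datasets.

```python
import random
from collections import defaultdict

samples = [(f"user{random.randrange(10)}", i) for i in range(1000)]
by_user = defaultdict(list)
for user, item in samples:
    by_user[user].append(item)

# Each "client" trains only on its own shard, mirroring LEAF's per-user layout.
clients = dict(by_user)
print(len(clients), "clients; smallest shards:", sorted(len(s) for s in clients.values())[:3])
```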
Slum Segmentation and Change Detection: A Deep Learning Approach
Title | Slum Segmentation and Change Detection: A Deep Learning Approach |
Authors | Shishira R Maiya, Sudharshan Chandra Babu |
Abstract | More than one billion people live in slums around the world. In some developing countries, slum residents make up more than half of the population and lack reliable sanitation services, clean water, electricity, and other basic services. Thus, slum rehabilitation and improvement is an important global challenge, and a significant amount of effort and resources have been put into this endeavor. These initiatives rely heavily on slum mapping and monitoring, and it is essential to have robust and efficient methods for mapping and monitoring existing slum settlements. In this work, we introduce an approach to segment and map individual slums from satellite imagery, leveraging regional convolutional neural networks for instance segmentation using transfer learning. In addition, we introduce a method to perform change detection and monitor slum change over time. We show that our approach effectively learns slum shape and appearance, and achieves strong quantitative results, with a maximum AP of 80.0. |
Tasks | Instance Segmentation, Semantic Segmentation, Transfer Learning |
Published | 2018-11-19 |
URL | http://arxiv.org/abs/1811.07896v1 |
PDF | http://arxiv.org/pdf/1811.07896v1.pdf
PWC | https://paperswithcode.com/paper/slum-segmentation-and-change-detection-a-deep |
Repo | https://github.com/cbsudux/Mumbai-slum-segmentation |
Framework | tf |
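A hedged sketch of the transfer-learning setup the abstract describes: start from a pre-trained Mask R-CNN (a regional CNN for instance segmentation) and swap its heads for a two-class background-vs-slum problem. This uses torchvision's standard fine-tuning pattern; the hidden-layer size is illustrative.

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

# COCO-pretrained backbone and heads as the transfer-learning starting point.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
num_classes = 2  # background + slum

# Replace the box classification head.
in_feats = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_feats, num_classes)

# Replace the mask prediction head (256 is an illustrative hidden size).
in_feats_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_feats_mask, 256, num_classes)
```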
Actor and Observer: Joint Modeling of First and Third-Person Videos
Title | Actor and Observer: Joint Modeling of First and Third-Person Videos |
Authors | Gunnar A. Sigurdsson, Abhinav Gupta, Cordelia Schmid, Ali Farhadi, Karteek Alahari |
Abstract | Several theories in cognitive neuroscience suggest that when people interact with the world, or simulate interactions, they do so from a first-person egocentric perspective, and seamlessly transfer knowledge between the third-person (observer) and first-person (actor) perspectives. Despite this, learning such models for human action recognition has not been achievable due to the lack of data. This paper takes a step in this direction with the introduction of Charades-Ego, a large-scale dataset of paired first-person and third-person videos involving 112 people, with 4000 paired videos. This enables learning the link between the two perspectives, actor and observer. We thereby address one of the biggest bottlenecks facing egocentric vision research, providing a link from first-person to the abundant third-person data on the web. We use this data to learn a joint representation of first- and third-person videos with only weak supervision, and show its effectiveness for transferring knowledge from the third-person to the first-person domain. |
Tasks | Temporal Action Localization |
Published | 2018-04-25 |
URL | http://arxiv.org/abs/1804.09627v1 |
PDF | http://arxiv.org/pdf/1804.09627v1.pdf
PWC | https://paperswithcode.com/paper/actor-and-observer-joint-modeling-of-first |
Repo | https://github.com/gsig/actor-observer |
Framework | pytorch |
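A sketch of the kind of weakly supervised joint embedding the abstract implies: pull paired first-/third-person clips together and push mismatched pairs apart with a triplet loss. The encoders, feature sizes, and margin are stand-in assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

first_enc, third_enc = nn.Linear(512, 128), nn.Linear(512, 128)  # stand-in clip encoders
triplet = nn.TripletMarginLoss(margin=0.5)

ego = torch.randn(8, 512)        # first-person clip features (stand-ins)
third_pos = torch.randn(8, 512)  # the paired third-person clips
third_neg = torch.roll(third_pos, 1, dims=0)  # mismatched pairings as negatives

loss = triplet(first_enc(ego), third_enc(third_pos), third_enc(third_neg))
loss.backward()
```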
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
Title | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures |
Authors | Lasse Espeholt, Hubert Soyer, Remi Munos, Karen Simonyan, Volodymir Mnih, Tom Ward, Yotam Doron, Vlad Firoiu, Tim Harley, Iain Dunning, Shane Legg, Koray Kavukcuoglu |
Abstract | In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. A key challenge is to handle the increased amount of data and extended training time. We have developed a new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation. We achieve stable learning at high throughput by combining decoupled acting and learning with a novel off-policy correction method called V-trace. We demonstrate the effectiveness of IMPALA for multi-task reinforcement learning on DMLab-30 (a set of 30 tasks from the DeepMind Lab environment (Beattie et al., 2016)) and Atari-57 (all available Atari games in Arcade Learning Environment (Bellemare et al., 2013a)). Our results show that IMPALA is able to achieve better performance than previous agents with less data, and crucially exhibits positive transfer between tasks as a result of its multi-task approach. |
Tasks | Atari Games |
Published | 2018-02-05 |
URL | http://arxiv.org/abs/1802.01561v3 |
PDF | http://arxiv.org/pdf/1802.01561v3.pdf
PWC | https://paperswithcode.com/paper/impala-scalable-distributed-deep-rl-with |
Repo | https://github.com/haje01/impala |
Framework | pytorch |
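The V-trace correction the abstract mentions is easy to state in code: clipped importance weights $\rho_t$ and $c_t$ correct for the gap between the behaviour policy $\mu$ (actors) and the learner policy $\pi$. Below is a NumPy sketch of the target computation; `rho` is the per-step ratio $\pi/\mu$, and the recursion follows the paper's backward form.

```python
import numpy as np

def vtrace(rewards, values, bootstrap, rho, gamma=0.99, rho_bar=1.0, c_bar=1.0):
    T = len(rewards)
    rho_c = np.minimum(rho_bar, rho)   # clipped rho_t
    c = np.minimum(c_bar, rho)         # clipped c_t
    next_values = np.append(values[1:], bootstrap)
    deltas = rho_c * (rewards + gamma * next_values - values)  # delta_t V
    vs = np.zeros(T)
    acc = 0.0
    for t in reversed(range(T)):       # v_t = V_t + delta_t + gamma*c_t*(v_{t+1} - V_{t+1})
        acc = deltas[t] + gamma * c[t] * acc
        vs[t] = values[t] + acc
    return vs

vs = vtrace(np.ones(5), np.zeros(5), bootstrap=0.0, rho=np.full(5, 0.8))
```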
M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network
Title | M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network |
Authors | Qijie Zhao, Tao Sheng, Yongtao Wang, Zhi Tang, Ying Chen, Ling Cai, Haibin Ling |
Abstract | Feature pyramids are widely exploited by both state-of-the-art one-stage object detectors (e.g., DSSD, RetinaNet, RefineDet) and two-stage object detectors (e.g., Mask R-CNN, DetNet) to alleviate the problem arising from scale variation across object instances. Although these object detectors with feature pyramids achieve encouraging results, they have some limitations because they simply construct the feature pyramid according to the inherent multi-scale, pyramidal architecture of backbones that were actually designed for the object classification task. In this work, we present a method called Multi-Level Feature Pyramid Network (MLFPN) to construct more effective feature pyramids for detecting objects of different scales. First, we fuse multi-level features (i.e. multiple layers) extracted by the backbone as the base feature. Second, we feed the base feature into a block of alternating joint Thinned U-shape Modules and Feature Fusion Modules and exploit the decoder layers of each U-shape module as the features for detecting objects. Finally, we gather up the decoder layers with equivalent scales (sizes) to develop a feature pyramid for object detection, in which every feature map consists of the layers (features) from multiple levels. To evaluate the effectiveness of the proposed MLFPN, we design and train a powerful end-to-end one-stage object detector, which we call M2Det, by integrating it into the architecture of SSD; it achieves better detection performance than state-of-the-art one-stage detectors. Specifically, on the MS-COCO benchmark, M2Det achieves an AP of 41.0 at a speed of 11.8 FPS with a single-scale inference strategy and an AP of 44.2 with a multi-scale inference strategy, which are new state-of-the-art results among one-stage detectors. The code will be made available at \url{https://github.com/qijiezhao/M2Det}. |
Tasks | Object Classification, Object Detection |
Published | 2018-11-12 |
URL | http://arxiv.org/abs/1811.04533v3 |
PDF | http://arxiv.org/pdf/1811.04533v3.pdf
PWC | https://paperswithcode.com/paper/m2det-a-single-shot-object-detector-based-on |
Repo | https://github.com/taashi-s/M2Det_keras |
Framework | none |
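A hedged sketch of the feature-fusion step the abstract describes: reduce and upsample a deep backbone feature map, then concatenate it with a shallow one to form the base feature fed to the U-shape modules. Channel sizes and the module name `FFMv1Sketch` are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FFMv1Sketch(nn.Module):
    def __init__(self, c_shallow=512, c_deep=1024, c_out=768):
        super().__init__()
        self.reduce_s = nn.Conv2d(c_shallow, c_out // 2, 1)  # 1x1 channel reduction
        self.reduce_d = nn.Conv2d(c_deep, c_out // 2, 1)

    def forward(self, shallow, deep):
        # Upsample the deep map to the shallow map's spatial size, then fuse.
        deep_up = F.interpolate(self.reduce_d(deep), size=shallow.shape[-2:],
                                mode="nearest")
        return torch.cat([self.reduce_s(shallow), deep_up], dim=1)

base = FFMv1Sketch()(torch.randn(1, 512, 40, 40), torch.randn(1, 1024, 20, 20))
```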
Investigating Generative Adversarial Networks based Speech Dereverberation for Robust Speech Recognition
Title | Investigating Generative Adversarial Networks based Speech Dereverberation for Robust Speech Recognition |
Authors | Ke Wang, Junbo Zhang, Sining Sun, Yujun Wang, Fei Xiang, Lei Xie |
Abstract | We investigate the use of generative adversarial networks (GANs) in speech dereverberation for robust speech recognition. GANs have recently been studied for speech enhancement to remove additive noises, but there is still little work examining their ability in speech dereverberation, and the advantages of using GANs have not been fully established. In this paper, we provide a deep investigation into the use of a GAN-based dereverberation front-end for ASR. First, we study the effectiveness of different dereverberation networks (the generator in the GAN) and find that LSTM leads to a significant improvement compared with feed-forward DNN and CNN on our dataset. Second, further adding residual connections to the deep LSTMs can boost the performance as well. Finally, we find that, for the success of the GAN, it is important to update the generator and the discriminator using the same mini-batch data during training. Moreover, using the reverberant spectrogram as a condition to the discriminator, as suggested in previous studies, may degrade the performance. In summary, our GAN-based dereverberation front-end achieves a 14%-19% relative CER reduction compared to the baseline DNN dereverberation network when tested with a strong multi-condition training acoustic model. |
Tasks | Robust Speech Recognition, Speech Enhancement, Speech Recognition |
Published | 2018-03-27 |
URL | http://arxiv.org/abs/1803.10132v3 |
PDF | http://arxiv.org/pdf/1803.10132v3.pdf
PWC | https://paperswithcode.com/paper/investigating-generative-adversarial-networks |
Repo | https://github.com/wangkenpu/rsrgan |
Framework | tf |
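A sketch of the training detail the abstract highlights: update the generator and the discriminator on the *same* mini-batch. The models, data, and losses below are stand-ins for the paper's LSTM generator and spectrogram inputs.

```python
import torch
import torch.nn as nn

G, D = nn.Linear(64, 64), nn.Linear(64, 1)  # stand-in generator/discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

reverb, clean = torch.randn(16, 64), torch.randn(16, 64)  # one mini-batch

# Discriminator step on this batch (generator output detached) ...
d_loss = bce(D(clean), torch.ones(16, 1)) + bce(D(G(reverb).detach()), torch.zeros(16, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# ... then the generator step on the *same* batch.
g_loss = bce(D(G(reverb)), torch.ones(16, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```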
Multilingual bottleneck features for subword modeling in zero-resource languages
Title | Multilingual bottleneck features for subword modeling in zero-resource languages |
Authors | Enno Hermann, Sharon Goldwater |
Abstract | How can we effectively develop speech technology for languages where no transcribed data is available? Many existing approaches use no annotated resources at all, yet it makes sense to leverage information from large annotated corpora in other languages, for example in the form of multilingual bottleneck features (BNFs) obtained from a supervised speech recognition system. In this work, we evaluate the benefits of BNFs for subword modeling (feature extraction) in six unseen languages on a word discrimination task. First we establish a strong unsupervised baseline by combining two existing methods: vocal tract length normalisation (VTLN) and the correspondence autoencoder (cAE). We then show that BNFs trained on a single language already beat this baseline; including up to 10 languages results in additional improvements which cannot be matched by just adding more data from a single language. Finally, we show that the cAE can improve further on the BNFs if high-quality same-word pairs are available. |
Tasks | Speech Recognition |
Published | 2018-03-23 |
URL | http://arxiv.org/abs/1803.08863v2 |
PDF | http://arxiv.org/pdf/1803.08863v2.pdf
PWC | https://paperswithcode.com/paper/multilingual-bottleneck-features-for-subword |
Repo | https://github.com/eginhard/cae-utd-utils |
Framework | none |
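A minimal sketch of what a "bottleneck feature" is: train a network with a narrow hidden layer on a supervised multilingual task, then keep that layer's activations as features for a new, zero-resource language. The layer sizes, 40-dim bottleneck width, and target count below are illustrative assumptions.

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(39, 512), nn.ReLU(),   # e.g. MFCC-like input frames (39-dim assumed)
    nn.Linear(512, 40),              # the narrow bottleneck layer
    nn.ReLU(),
    nn.Linear(40, 2000),             # multilingual phone/senone targets (assumed size)
)
# After supervised training, chop off the classifier and keep the bottleneck.
bnf_extractor = net[:3]
bnfs = bnf_extractor(torch.randn(100, 39))  # (frames, 40) bottleneck features
```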
Fast Dawid-Skene: A Fast Vote Aggregation Scheme for Sentiment Classification
Title | Fast Dawid-Skene: A Fast Vote Aggregation Scheme for Sentiment Classification |
Authors | Vaibhav B Sinha, Sukrut Rao, Vineeth N Balasubramanian |
Abstract | Many real-world problems can now be effectively solved using supervised machine learning. A major roadblock is often the lack of an adequate quantity of labeled data for training. A possible solution is to assign the task of labeling data to a crowd, and then infer the true label using aggregation methods. A well-known approach for aggregation is the Dawid-Skene (DS) algorithm, which is based on the principle of Expectation-Maximization (EM). We propose a new simple, yet effective, EM-based algorithm, which can be interpreted as a 'hard' version of DS, that allows much faster convergence while maintaining similar accuracy in aggregation. We show the use of this algorithm as a quick and effective technique for online, real-time sentiment annotation. We also prove that our algorithm converges to the estimated labels at a linear rate. Our experiments on standard datasets show a significant speedup in the time taken for aggregation - up to $\sim$8x over Dawid-Skene and $\sim$6x over other fast EM methods, at competitive accuracy. The code for the implementation of the algorithms can be found at https://github.com/GoodDeeds/Fast-Dawid-Skene |
Tasks | Sentiment Analysis |
Published | 2018-03-07 |
URL | http://arxiv.org/abs/1803.02781v3 |
PDF | http://arxiv.org/pdf/1803.02781v3.pdf
PWC | https://paperswithcode.com/paper/fast-dawid-skene-a-fast-vote-aggregation |
Repo | https://github.com/GoodDeeds/Fast-Dawid-Skene |
Framework | none |
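A hedged sketch of the "hard EM" idea the abstract describes: the E-step commits to the single most likely label (an argmax) instead of keeping a soft posterior, and the M-step re-estimates each worker's confusion matrix from those hard labels. This simplified version omits class priors; the official implementation is linked above.

```python
import numpy as np

def fast_dawid_skene(votes, n_classes, n_iters=20):
    # votes: (n_items, n_workers) of class ids, -1 where a worker didn't vote
    n_items, n_workers = votes.shape
    labels = np.array([np.bincount(v[v >= 0], minlength=n_classes).argmax()
                       for v in votes])                      # majority-vote init
    for _ in range(n_iters):
        # M-step: per-worker confusion matrices from the current hard labels.
        conf = np.full((n_workers, n_classes, n_classes), 1e-2)  # small smoothing
        for i in range(n_items):
            for w in range(n_workers):
                if votes[i, w] >= 0:
                    conf[w, labels[i], votes[i, w]] += 1
        conf /= conf.sum(axis=2, keepdims=True)
        # E-step (hard): pick the argmax label under the current model.
        ll = np.zeros((n_items, n_classes))
        for i in range(n_items):
            for w in range(n_workers):
                if votes[i, w] >= 0:
                    ll[i] += np.log(conf[w, :, votes[i, w]])
        labels = ll.argmax(axis=1)
    return labels

print(fast_dawid_skene(np.array([[0, 0, 1], [1, 1, 1], [0, -1, 0]]), n_classes=2))
```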