Paper Group AWR 331
PointCNN: Convolution On $\mathcal{X}$-Transformed Points. Learning Neural Templates for Text Generation. Path-Level Network Transformation for Efficient Architecture Search. Wide Activation for Efficient and Accurate Image Super-Resolution. Diverse Few-Shot Text Classification with Multiple Metrics. PointConv: Deep Convolutional Networks on 3D Point Clouds …
PointCNN: Convolution On $\mathcal{X}$-Transformed Points
Title | PointCNN: Convolution On $\mathcal{X}$-Transformed Points |
Authors | Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Xinhan Di, Baoquan Chen |
Abstract | We present a simple and general framework for feature learning from point clouds. The key to the success of CNNs is the convolution operator, which is capable of leveraging spatially-local correlation in data represented densely in grids (e.g. images). However, point clouds are irregular and unordered, so directly convolving kernels against the features associated with the points results in loss of shape information and sensitivity to point ordering. To address these problems, we propose to learn an $\mathcal{X}$-transformation from the input points that simultaneously serves two purposes. The first is the weighting of the input features associated with the points, and the second is the permutation of the points into a latent and potentially canonical order. The element-wise product and sum operations of the typical convolution operator are subsequently applied to the $\mathcal{X}$-transformed features. The proposed method is a generalization of typical CNNs to feature learning from point clouds, so we call it PointCNN. Experiments show that PointCNN achieves on-par or better performance than state-of-the-art methods on multiple challenging benchmark datasets and tasks. |
Tasks | 3D Instance Segmentation, 3D Part Segmentation |
Published | 2018-01-23 |
URL | http://arxiv.org/abs/1801.07791v5 |
PDF | http://arxiv.org/pdf/1801.07791v5.pdf
PWC | https://paperswithcode.com/paper/pointcnn-convolution-on-mathcalx-transformed |
Repo | https://github.com/hbb1/reading-list |
Framework | none |
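To make the abstract's $\mathcal{X}$-transformation concrete, here is a minimal PyTorch sketch (not the authors' code): an MLP predicts a $K \times K$ transform from the $K$ neighbor coordinates, the transform reorders/weights neighbor features, and a standard linear "kernel" aggregates them. The module name `XConvSketch` and all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class XConvSketch(nn.Module):
    def __init__(self, k, c_in, c_out):
        super().__init__()
        self.k = k
        # Predicts the K x K X-transformation from local point coordinates.
        self.x_mlp = nn.Sequential(nn.Linear(3 * k, k * k), nn.ReLU(),
                                   nn.Linear(k * k, k * k))
        # Plays the role of the convolution kernel over the K slots.
        self.conv = nn.Linear(k * c_in, c_out)

    def forward(self, neighbor_xyz, neighbor_feats):
        # neighbor_xyz: (B, K, 3) coords relative to the query point
        # neighbor_feats: (B, K, C_in)
        b = neighbor_xyz.shape[0]
        x = self.x_mlp(neighbor_xyz.reshape(b, -1)).reshape(b, self.k, self.k)
        transformed = torch.bmm(x, neighbor_feats)       # (B, K, C_in)
        return self.conv(transformed.reshape(b, -1))     # (B, C_out)

out = XConvSketch(k=8, c_in=16, c_out=32)(torch.randn(4, 8, 3), torch.randn(4, 8, 16))
```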
Learning Neural Templates for Text Generation
Title | Learning Neural Templates for Text Generation |
Authors | Sam Wiseman, Stuart M. Shieber, Alexander M. Rush |
Abstract | While neural encoder-decoder models have had significant empirical success in text generation, several problems with this style of generation remain unaddressed. Encoder-decoder models are largely (a) uninterpretable, and (b) difficult to control in terms of their phrasing or content. This work proposes a neural generation system using a hidden semi-Markov model (HSMM) decoder, which learns latent, discrete templates jointly with learning to generate. We show that this model learns useful templates, and that these templates make generation both more interpretable and controllable. Furthermore, we show that this approach scales to real data sets and achieves strong performance nearing that of encoder-decoder text generation models. |
Tasks | Text Generation |
Published | 2018-08-30 |
URL | https://arxiv.org/abs/1808.10122v3 |
PDF | https://arxiv.org/pdf/1808.10122v3.pdf
PWC | https://paperswithcode.com/paper/learning-neural-templates-for-text-generation |
Repo | https://github.com/harvardnlp/neural-template-gen |
Framework | pytorch |
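An HSMM decoder segments the output into labeled spans, and the segmentation it recovers is what makes templates readable. Below is a hedged NumPy sketch of segment-level (semi-Markov) Viterbi decoding, the kind of inference such a model needs; the scoring function is a stand-in, not the paper's learned model.

```python
import numpy as np

def segment_viterbi(score, T, K, max_len):
    # score(j, t, l) -> log-score of labeling tokens [t, t+l) with state j
    best = np.full((T + 1,), -np.inf); best[0] = 0.0
    back = [None] * (T + 1)
    for t in range(1, T + 1):
        for l in range(1, min(max_len, t) + 1):
            for j in range(K):
                s = best[t - l] + score(j, t - l, l)
                if s > best[t]:
                    best[t], back[t] = s, (t - l, j)
    # Follow back-pointers to read off the template (state sequence).
    segs, t = [], T
    while t > 0:
        prev, j = back[t]
        segs.append((prev, t, j)); t = prev
    return segs[::-1]

print(segment_viterbi(lambda j, t, l: -abs(j - l), T=6, K=3, max_len=3))
```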
Path-Level Network Transformation for Efficient Architecture Search
Title | Path-Level Network Transformation for Efficient Architecture Search |
Authors | Han Cai, Jiacheng Yang, Weinan Zhang, Song Han, Yong Yu |
Abstract | We introduce a new function-preserving transformation for efficient neural architecture search. This network transformation allows reusing previously trained networks and existing successful architectures, which improves sample efficiency. We aim to address the limitation of current network transformation operations that can only perform layer-level architecture modifications, such as adding (pruning) filters or inserting (removing) a layer, and thus fail to change the topology of connection paths. Our proposed path-level transformation operations enable the meta-controller to modify the path topology of the given network while keeping the merits of reusing weights, and thus allow efficiently designing effective structures with complex path topologies like Inception models. We further propose a bidirectional tree-structured reinforcement learning meta-controller to explore a simple yet highly expressive tree-structured architecture space that can be viewed as a generalization of multi-branch architectures. We experimented on image classification datasets with limited computational resources (about 200 GPU-hours), where we observed improved parameter efficiency and better test results (97.70% test accuracy on CIFAR-10 with 14.3M parameters and 74.6% top-1 accuracy on ImageNet in the mobile setting), demonstrating the effectiveness and transferability of our designed architectures. |
Tasks | Image Classification, Neural Architecture Search |
Published | 2018-06-07 |
URL | http://arxiv.org/abs/1806.02639v1 |
PDF | http://arxiv.org/pdf/1806.02639v1.pdf
PWC | https://paperswithcode.com/paper/path-level-network-transformation-for |
Repo | https://github.com/han-cai/PathLevel-EAS |
Framework | pytorch |
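A minimal sketch of one function-preserving, path-level transformation of the kind the abstract describes: replace a single layer with replicated branches whose outputs are averaged. The output is unchanged, but the network now has a multi-branch topology that a meta-controller could mutate further. The class name `ReplicationSplit` is an illustrative assumption.

```python
import copy
import torch
import torch.nn as nn

class ReplicationSplit(nn.Module):
    def __init__(self, layer, n_branches=2):
        super().__init__()
        self.branches = nn.ModuleList(copy.deepcopy(layer) for _ in range(n_branches))

    def forward(self, x):
        # Averaging identical replicas reproduces the original layer exactly.
        return torch.stack([b(x) for b in self.branches]).mean(dim=0)

layer = nn.Conv2d(8, 8, 3, padding=1)
split = ReplicationSplit(layer)
x = torch.randn(1, 8, 16, 16)
assert torch.allclose(layer(x), split(x), atol=1e-6)  # function preserved
```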
Wide Activation for Efficient and Accurate Image Super-Resolution
Title | Wide Activation for Efficient and Accurate Image Super-Resolution |
Authors | Jiahui Yu, Yuchen Fan, Jianchao Yang, Ning Xu, Zhaowen Wang, Xinchao Wang, Thomas Huang |
Abstract | In this report we demonstrate that, with the same parameters and computational budgets, models with wider features before the ReLU activation have significantly better performance for single image super-resolution (SISR). The resulting SR residual network has a slim identity mapping pathway with wider ($2\times$ to $4\times$) channels before activation in each residual block. To further widen activation ($6\times$ to $9\times$) without computational overhead, we introduce linear low-rank convolution into SR networks and achieve even better accuracy-efficiency tradeoffs. In addition, compared with batch normalization or no normalization, we find that training with weight normalization leads to better accuracy for deep super-resolution networks. Our proposed SR network WDSR achieves better results on the large-scale DIV2K image super-resolution benchmark in terms of PSNR with the same or lower computational complexity. Based on WDSR, our method also won first places in the NTIRE 2018 Challenge on Single Image Super-Resolution in all three realistic tracks. Experiments and ablation studies support the importance of wide activation for image super-resolution. |
Tasks | Image Super-Resolution, Multi-Frame Super-Resolution, Super-Resolution |
Published | 2018-08-27 |
URL | http://arxiv.org/abs/1808.08718v2 |
PDF | http://arxiv.org/pdf/1808.08718v2.pdf
PWC | https://paperswithcode.com/paper/wide-activation-for-efficient-and-accurate |
Repo | https://github.com/yjn870/WDSR-pytorch |
Framework | pytorch |
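A sketch of a WDSR-style residual block under the abstract's recipe: widen channels before the ReLU (expansion factor `r`) and use weight normalization instead of batch normalization. Details differ from the official code; sizes here are illustrative.

```python
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm

def wide_block(c, r=4):
    return nn.Sequential(
        weight_norm(nn.Conv2d(c, c * r, 3, padding=1)),  # widen before activation
        nn.ReLU(inplace=True),
        weight_norm(nn.Conv2d(c * r, c, 3, padding=1)),  # project back down
    )

class WideResBlock(nn.Module):
    def __init__(self, c=32, r=4):
        super().__init__()
        self.body = wide_block(c, r)

    def forward(self, x):
        return x + self.body(x)  # slim identity mapping pathway

y = WideResBlock()(torch.randn(1, 32, 24, 24))
```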
Diverse Few-Shot Text Classification with Multiple Metrics
Title | Diverse Few-Shot Text Classification with Multiple Metrics |
Authors | Mo Yu, Xiaoxiao Guo, Jinfeng Yi, Shiyu Chang, Saloni Potdar, Yu Cheng, Gerald Tesauro, Haoyu Wang, Bowen Zhou |
Abstract | We study few-shot learning in natural language domains. Compared to many existing works that apply either metric-based or optimization-based meta-learning to the image domain with low inter-task variance, we consider a more realistic setting where tasks are diverse. However, this imposes tremendous difficulties on existing state-of-the-art metric-based algorithms, since a single metric is insufficient to capture complex task variations in the natural language domain. To alleviate the problem, we propose an adaptive metric learning approach that automatically determines the best weighted combination from a set of metrics obtained from meta-training tasks for a newly seen few-shot task. Extensive quantitative evaluations on real-world sentiment analysis and dialog intent classification datasets demonstrate that the proposed method performs favorably against state-of-the-art few-shot learning algorithms in terms of predictive accuracy. We make our code and data available for further study. |
Tasks | Few-Shot Learning, Intent Classification, Meta-Learning, Metric Learning, Sentiment Analysis, Text Classification |
Published | 2018-05-19 |
URL | http://arxiv.org/abs/1805.07513v1 |
PDF | http://arxiv.org/pdf/1805.07513v1.pdf
PWC | https://paperswithcode.com/paper/diverse-few-shot-text-classification-with |
Repo | https://github.com/Gorov/DiverseFewShot_Amazon |
Framework | pytorch |
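A hedged sketch of the core idea: combine several pre-trained metrics (here, stand-in linear encoders with cosine similarity) using task-specific weights, so a new few-shot task can select the mixture that fits it best. All names and shapes are illustrative, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def combined_scores(query, prototypes, encoders, weights):
    # query: (D,), prototypes: (N_classes, D); weights: (M,) learned logits
    w = torch.softmax(weights, dim=0)
    score = 0.0
    for wm, enc in zip(w, encoders):
        q, p = enc(query), enc(prototypes)
        score = score + wm * F.cosine_similarity(q.unsqueeze(0), p, dim=-1)
    return score  # (N_classes,): higher means more similar

encoders = [torch.nn.Linear(16, 8) for _ in range(3)]  # stand-ins for meta-trained metrics
weights = torch.zeros(3, requires_grad=True)           # adapted per target task
s = combined_scores(torch.randn(16), torch.randn(5, 16), encoders, weights)
```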
PointConv: Deep Convolutional Networks on 3D Point Clouds
Title | PointConv: Deep Convolutional Networks on 3D Point Clouds |
Authors | Wenxuan Wu, Zhongang Qi, Li Fuxin |
Abstract | Unlike images, which are represented in regular dense grids, 3D point clouds are irregular and unordered, hence applying convolution on them can be difficult. In this paper, we extend the dynamic filter to a new convolution operation, named PointConv. PointConv can be applied on point clouds to build deep convolutional networks. We treat convolution kernels as nonlinear functions of the local coordinates of 3D points, composed of weight and density functions. With respect to a given point, the weight functions are learned with multi-layer perceptron networks and the density functions through kernel density estimation. The most important contribution of this work is a novel reformulation for efficiently computing the weight functions, which allowed us to dramatically scale up the network and significantly improve its performance. The learned convolution kernel can be used to compute translation-invariant and permutation-invariant convolution on any point set in 3D space. In addition, PointConv can also be used as a deconvolution operator to propagate features from a subsampled point cloud back to its original resolution. Experiments on ModelNet40, ShapeNet, and ScanNet show that deep convolutional neural networks built on PointConv achieve state-of-the-art results on challenging semantic segmentation benchmarks for 3D point clouds. Moreover, our experiments converting CIFAR-10 into a point cloud show that networks built on PointConv can match the performance of convolutional networks on 2D images of a similar structure. |
Tasks | 3D Part Segmentation, Density Estimation, Semantic Segmentation |
Published | 2018-11-17 |
URL | http://arxiv.org/abs/1811.07246v2 |
PDF | http://arxiv.org/pdf/1811.07246v2.pdf
PWC | https://paperswithcode.com/paper/pointconv-deep-convolutional-networks-on-3d |
Repo | https://github.com/DylanWusee/pointconv_pytorch |
Framework | pytorch |
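A minimal sketch of the PointConv idea from the abstract: an MLP maps local 3D offsets to continuous kernel weights, an inverse density estimate reweights unevenly sampled points, and features are aggregated by a weighted sum. This is illustrative only; it does not implement the paper's efficient reformulation.

```python
import torch
import torch.nn as nn

class PointConvSketch(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        # Weight function: MLP from a 3D offset to a (C_in x C_out) kernel slice.
        self.weight_mlp = nn.Sequential(nn.Linear(3, 32), nn.ReLU(),
                                        nn.Linear(32, c_in * c_out))
        self.c_in, self.c_out = c_in, c_out

    def forward(self, offsets, feats, inv_density):
        # offsets: (B, K, 3), feats: (B, K, C_in), inv_density: (B, K)
        b, k, _ = offsets.shape
        w = self.weight_mlp(offsets).view(b, k, self.c_in, self.c_out)
        f = (feats * inv_density.unsqueeze(-1)).unsqueeze(-1)  # (B, K, C_in, 1)
        return (w * f).sum(dim=(1, 2))                          # (B, C_out)

out = PointConvSketch(8, 16)(torch.randn(2, 9, 3), torch.randn(2, 9, 8), torch.rand(2, 9))
```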
Visual Representations for Semantic Target Driven Navigation
Title | Visual Representations for Semantic Target Driven Navigation |
Authors | Arsalan Mousavian, Alexander Toshev, Marek Fiser, Jana Kosecka, Ayzaan Wahid, James Davidson |
Abstract | What is a good visual representation for autonomous agents? We address this question in the context of semantic visual navigation, which is the problem of a robot finding its way through a complex environment to a target object, e.g. go to the refrigerator. Instead of acquiring a metric semantic map of an environment and using planning for navigation, our approach learns navigation policies on top of representations that capture spatial layout and semantic contextual cues. We propose using high-level semantic and contextual features, including segmentation and detection masks obtained by off-the-shelf state-of-the-art vision systems, as observations, and use a deep network to learn the navigation policy. This choice allows using additional data from orthogonal sources to better train different parts of the model: the representation extraction is trained on large standard vision datasets, while the navigation component leverages large synthetic environments for training. This combination of real and synthetic is possible because equitable feature representations are available in both (e.g., segmentation and detection masks), which alleviates the need for domain adaptation. Both the representation and the navigation policy can be readily applied to real non-synthetic environments, as demonstrated on the Active Vision Dataset [1]. Our approach successfully reaches the target in 54% of the cases in unexplored environments, compared to 46% for a non-learning-based approach and 28% for the learning-based baseline. |
Tasks | Domain Adaptation, Visual Navigation |
Published | 2018-05-15 |
URL | https://arxiv.org/abs/1805.06066v3 |
PDF | https://arxiv.org/pdf/1805.06066v3.pdf
PWC | https://paperswithcode.com/paper/visual-representations-for-semantic-target |
Repo | https://github.com/arsalan-mousavian/Navigation |
Framework | tf |
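A hedged sketch of the representation choice the abstract describes: feed per-class segmentation/detection masks (random stand-ins here) to a small network that outputs a discrete navigation action. The channel count, action set, and architecture are illustrative assumptions.

```python
import torch
import torch.nn as nn

policy = nn.Sequential(
    nn.Conv2d(in_channels=20, out_channels=32, kernel_size=5, stride=4),  # 20 mask channels
    nn.ReLU(),
    nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(), nn.Flatten(),
    nn.LazyLinear(6),  # hypothetical action logits, e.g. {forward, back, left, right, look up, look down}
)
masks = torch.rand(1, 20, 64, 64)  # stand-in per-class segmentation/detection masks
action = policy(masks).argmax(dim=-1)
```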
LEAF: A Benchmark for Federated Settings
Title | LEAF: A Benchmark for Federated Settings |
Authors | Sebastian Caldas, Sai Meher Karthik Duddu, Peter Wu, Tian Li, Jakub Konečný, H. Brendan McMahan, Virginia Smith, Ameet Talwalkar |
Abstract | Modern federated networks, such as those composed of wearable devices, mobile phones, or autonomous vehicles, generate massive amounts of data each day. This wealth of data can help to learn models that can improve the user experience on each device. However, the scale and heterogeneity of federated data present new challenges in research areas such as federated learning, meta-learning, and multi-task learning. As the machine learning community begins to tackle these challenges, we are at a critical time to ensure that developments made in these areas are grounded in realistic benchmarks. To this end, we propose LEAF, a modular benchmarking framework for learning in federated settings. LEAF includes a suite of open-source federated datasets, a rigorous evaluation framework, and a set of reference implementations, all geared towards capturing the obstacles and intricacies of practical federated environments. |
Tasks | Autonomous Vehicles, Meta-Learning, Multi-Task Learning |
Published | 2018-12-03 |
URL | https://arxiv.org/abs/1812.01097v3 |
PDF | https://arxiv.org/pdf/1812.01097v3.pdf
PWC | https://paperswithcode.com/paper/leaf-a-benchmark-for-federated-settings |
Repo | https://github.com/PaddlePaddle/PaddleFL |
Framework | none |
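A small sketch of what "federated" structure means in practice: partition a dataset by its natural user/device identity rather than shuffling it i.i.d. The user ids below are synthetic; LEAF ships real per-user splits for its datasets.

```python
import random
from collections import defaultdict

samples = [(f"user{random.randrange(10)}", i) for i in range(1000)]
by_user = defaultdict(list)
for user, item in samples:
    by_user[user].append(item)

# Each "client" trains only on its own shard, mirroring LEAF's per-user layout.
clients = dict(by_user)
print(len(clients), "clients; smallest shards:", sorted(len(s) for s in clients.values())[:3])
```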
Slum Segmentation and Change Detection: A Deep Learning Approach
Title | Slum Segmentation and Change Detection: A Deep Learning Approach |
Authors | Shishira R Maiya, Sudharshan Chandra Babu |
Abstract | More than one billion people live in slums around the world. In some developing countries, slum residents make up more than half of the population and lack reliable sanitation services, clean water, electricity, and other basic services. Thus, slum rehabilitation and improvement is an important global challenge, and a significant amount of effort and resources have been put into this endeavor. These initiatives rely heavily on slum mapping and monitoring, and it is essential to have robust and efficient methods for mapping and monitoring existing slum settlements. In this work, we introduce an approach to segment and map individual slums from satellite imagery, leveraging regional convolutional neural networks for instance segmentation using transfer learning. In addition, we introduce a method to perform change detection and monitor slum change over time. We show that our approach effectively learns slum shape and appearance, and achieves strong quantitative results, with a maximum AP of 80.0. |
Tasks | Instance Segmentation, Semantic Segmentation, Transfer Learning |
Published | 2018-11-19 |
URL | http://arxiv.org/abs/1811.07896v1 |
PDF | http://arxiv.org/pdf/1811.07896v1.pdf
PWC | https://paperswithcode.com/paper/slum-segmentation-and-change-detection-a-deep |
Repo | https://github.com/cbsudux/Mumbai-slum-segmentation |
Framework | tf |
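A hedged sketch of the transfer-learning setup the abstract describes: start from a pre-trained Mask R-CNN (a regional CNN for instance segmentation) and swap its heads for a two-class background-vs-slum problem. This uses torchvision's standard fine-tuning pattern; the hidden-layer size is illustrative.

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

# COCO-pretrained backbone and heads as the transfer-learning starting point.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
num_classes = 2  # background + slum

# Replace the box classification head.
in_feats = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_feats, num_classes)

# Replace the mask prediction head (256 is an illustrative hidden size).
in_feats_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_feats_mask, 256, num_classes)
```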
Actor and Observer: Joint Modeling of First and Third-Person Videos
Title | Actor and Observer: Joint Modeling of First and Third-Person Videos |
Authors | Gunnar A. Sigurdsson, Abhinav Gupta, Cordelia Schmid, Ali Farhadi, Karteek Alahari |
Abstract | Several theories in cognitive neuroscience suggest that when people interact with the world, or simulate interactions, they do so from a first-person egocentric perspective, and seamlessly transfer knowledge between the third-person (observer) and first-person (actor) perspectives. Despite this, learning such models for human action recognition has not been achievable due to the lack of data. This paper takes a step in this direction with the introduction of Charades-Ego, a large-scale dataset of paired first-person and third-person videos involving 112 people, with 4000 paired videos. This enables learning the link between the two perspectives, actor and observer. We thereby address one of the biggest bottlenecks facing egocentric vision research, providing a link from first-person to the abundant third-person data on the web. We use this data to learn a joint representation of first- and third-person videos with only weak supervision, and show its effectiveness for transferring knowledge from the third-person to the first-person domain. |
Tasks | Temporal Action Localization |
Published | 2018-04-25 |
URL | http://arxiv.org/abs/1804.09627v1 |
PDF | http://arxiv.org/pdf/1804.09627v1.pdf
PWC | https://paperswithcode.com/paper/actor-and-observer-joint-modeling-of-first |
Repo | https://github.com/gsig/actor-observer |
Framework | pytorch |
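A sketch of the kind of weakly supervised joint embedding the abstract implies: pull paired first-/third-person clips together and push mismatched pairs apart with a triplet loss. The encoders, feature sizes, and margin are stand-in assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

first_enc, third_enc = nn.Linear(512, 128), nn.Linear(512, 128)  # stand-in clip encoders
triplet = nn.TripletMarginLoss(margin=0.5)

ego = torch.randn(8, 512)        # first-person clip features (stand-ins)
third_pos = torch.randn(8, 512)  # the paired third-person clips
third_neg = torch.roll(third_pos, 1, dims=0)  # mismatched pairings as negatives

loss = triplet(first_enc(ego), third_enc(third_pos), third_enc(third_neg))
loss.backward()
```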
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
Title | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures |
Authors | Lasse Espeholt, Hubert Soyer, Remi Munos, Karen Simonyan, Volodymir Mnih, Tom Ward, Yotam Doron, Vlad Firoiu, Tim Harley, Iain Dunning, Shane Legg, Koray Kavukcuoglu |
Abstract | In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. A key challenge is to handle the increased amount of data and extended training time. We have developed a new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation. We achieve stable learning at high throughput by combining decoupled acting and learning with a novel off-policy correction method called V-trace. We demonstrate the effectiveness of IMPALA for multi-task reinforcement learning on DMLab-30 (a set of 30 tasks from the DeepMind Lab environment (Beattie et al., 2016)) and Atari-57 (all available Atari games in Arcade Learning Environment (Bellemare et al., 2013a)). Our results show that IMPALA is able to achieve better performance than previous agents with less data, and crucially exhibits positive transfer between tasks as a result of its multi-task approach. |
Tasks | Atari Games |
Published | 2018-02-05 |
URL | http://arxiv.org/abs/1802.01561v3 |
PDF | http://arxiv.org/pdf/1802.01561v3.pdf
PWC | https://paperswithcode.com/paper/impala-scalable-distributed-deep-rl-with |
Repo | https://github.com/haje01/impala |
Framework | pytorch |
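The V-trace correction the abstract mentions is easy to state in code: clipped importance weights $\rho_t$ and $c_t$ correct for the gap between the behaviour policy $\mu$ (actors) and the learner policy $\pi$. Below is a NumPy sketch of the target computation; `rho` is the per-step ratio $\pi/\mu$, and the recursion follows the paper's backward form.

```python
import numpy as np

def vtrace(rewards, values, bootstrap, rho, gamma=0.99, rho_bar=1.0, c_bar=1.0):
    T = len(rewards)
    rho_c = np.minimum(rho_bar, rho)   # clipped rho_t
    c = np.minimum(c_bar, rho)         # clipped c_t
    next_values = np.append(values[1:], bootstrap)
    deltas = rho_c * (rewards + gamma * next_values - values)  # delta_t V
    vs = np.zeros(T)
    acc = 0.0
    for t in reversed(range(T)):       # v_t = V_t + delta_t + gamma*c_t*(v_{t+1} - V_{t+1})
        acc = deltas[t] + gamma * c[t] * acc
        vs[t] = values[t] + acc
    return vs

vs = vtrace(np.ones(5), np.zeros(5), bootstrap=0.0, rho=np.full(5, 0.8))
```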
M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network
Title | M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network |
Authors | Qijie Zhao, Tao Sheng, Yongtao Wang, Zhi Tang, Ying Chen, Ling Cai, Haibin Ling |
Abstract | Feature pyramids are widely exploited by both state-of-the-art one-stage object detectors (e.g., DSSD, RetinaNet, RefineDet) and two-stage object detectors (e.g., Mask R-CNN, DetNet) to alleviate the problem arising from scale variation across object instances. Although these object detectors with feature pyramids achieve encouraging results, they have some limitations because they simply construct the feature pyramid according to the inherent multi-scale, pyramidal architecture of backbones that were actually designed for the object classification task. In this work, we present a method called Multi-Level Feature Pyramid Network (MLFPN) to construct more effective feature pyramids for detecting objects of different scales. First, we fuse multi-level features (i.e. multiple layers) extracted by the backbone as the base feature. Second, we feed the base feature into a block of alternating joint Thinned U-shape Modules and Feature Fusion Modules and exploit the decoder layers of each U-shape module as the features for detecting objects. Finally, we gather up the decoder layers with equivalent scales (sizes) to develop a feature pyramid for object detection, in which every feature map consists of the layers (features) from multiple levels. To evaluate the effectiveness of the proposed MLFPN, we design and train a powerful end-to-end one-stage object detector, which we call M2Det, by integrating it into the architecture of SSD; it achieves better detection performance than state-of-the-art one-stage detectors. Specifically, on the MS-COCO benchmark, M2Det achieves an AP of 41.0 at a speed of 11.8 FPS with a single-scale inference strategy and an AP of 44.2 with a multi-scale inference strategy, which are new state-of-the-art results among one-stage detectors. The code will be made available at \url{https://github.com/qijiezhao/M2Det}. |
Tasks | Object Classification, Object Detection |
Published | 2018-11-12 |
URL | http://arxiv.org/abs/1811.04533v3 |
PDF | http://arxiv.org/pdf/1811.04533v3.pdf
PWC | https://paperswithcode.com/paper/m2det-a-single-shot-object-detector-based-on |
Repo | https://github.com/taashi-s/M2Det_keras |
Framework | none |
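A hedged sketch of the feature-fusion step the abstract describes: reduce and upsample a deep backbone feature map, then concatenate it with a shallow one to form the base feature fed to the U-shape modules. Channel sizes and the module name `FFMv1Sketch` are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FFMv1Sketch(nn.Module):
    def __init__(self, c_shallow=512, c_deep=1024, c_out=768):
        super().__init__()
        self.reduce_s = nn.Conv2d(c_shallow, c_out // 2, 1)  # 1x1 channel reduction
        self.reduce_d = nn.Conv2d(c_deep, c_out // 2, 1)

    def forward(self, shallow, deep):
        # Upsample the deep map to the shallow map's spatial size, then fuse.
        deep_up = F.interpolate(self.reduce_d(deep), size=shallow.shape[-2:],
                                mode="nearest")
        return torch.cat([self.reduce_s(shallow), deep_up], dim=1)

base = FFMv1Sketch()(torch.randn(1, 512, 40, 40), torch.randn(1, 1024, 20, 20))
```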
Investigating Generative Adversarial Networks based Speech Dereverberation for Robust Speech Recognition
Title | Investigating Generative Adversarial Networks based Speech Dereverberation for Robust Speech Recognition |
Authors | Ke Wang, Junbo Zhang, Sining Sun, Yujun Wang, Fei Xiang, Lei Xie |
Abstract | We investigate the use of generative adversarial networks (GANs) in speech dereverberation for robust speech recognition. GANs have recently been studied for speech enhancement to remove additive noises, but there is still little work examining their ability in speech dereverberation, and the advantages of using GANs have not been fully established. In this paper, we provide a deep investigation into the use of a GAN-based dereverberation front-end for ASR. First, we study the effectiveness of different dereverberation networks (the generator in the GAN) and find that LSTM leads to a significant improvement compared with feed-forward DNN and CNN on our dataset. Second, further adding residual connections to the deep LSTMs can boost the performance as well. Finally, we find that, for the success of the GAN, it is important to update the generator and the discriminator using the same mini-batch data during training. Moreover, using the reverberant spectrogram as a condition to the discriminator, as suggested in previous studies, may degrade the performance. In summary, our GAN-based dereverberation front-end achieves a 14%-19% relative CER reduction compared to the baseline DNN dereverberation network when tested with a strong multi-condition training acoustic model. |
Tasks | Robust Speech Recognition, Speech Enhancement, Speech Recognition |
Published | 2018-03-27 |
URL | http://arxiv.org/abs/1803.10132v3 |
PDF | http://arxiv.org/pdf/1803.10132v3.pdf
PWC | https://paperswithcode.com/paper/investigating-generative-adversarial-networks |
Repo | https://github.com/wangkenpu/rsrgan |
Framework | tf |
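A sketch of the training detail the abstract highlights: update the generator and the discriminator on the *same* mini-batch. The models, data, and losses below are stand-ins for the paper's LSTM generator and spectrogram inputs.

```python
import torch
import torch.nn as nn

G, D = nn.Linear(64, 64), nn.Linear(64, 1)  # stand-in generator/discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

reverb, clean = torch.randn(16, 64), torch.randn(16, 64)  # one mini-batch

# Discriminator step on this batch (generator output detached) ...
d_loss = bce(D(clean), torch.ones(16, 1)) + bce(D(G(reverb).detach()), torch.zeros(16, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# ... then the generator step on the *same* batch.
g_loss = bce(D(G(reverb)), torch.ones(16, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```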
Multilingual bottleneck features for subword modeling in zero-resource languages
Title | Multilingual bottleneck features for subword modeling in zero-resource languages |
Authors | Enno Hermann, Sharon Goldwater |
Abstract | How can we effectively develop speech technology for languages where no transcribed data is available? Many existing approaches use no annotated resources at all, yet it makes sense to leverage information from large annotated corpora in other languages, for example in the form of multilingual bottleneck features (BNFs) obtained from a supervised speech recognition system. In this work, we evaluate the benefits of BNFs for subword modeling (feature extraction) in six unseen languages on a word discrimination task. First we establish a strong unsupervised baseline by combining two existing methods: vocal tract length normalisation (VTLN) and the correspondence autoencoder (cAE). We then show that BNFs trained on a single language already beat this baseline; including up to 10 languages results in additional improvements which cannot be matched by just adding more data from a single language. Finally, we show that the cAE can improve further on the BNFs if high-quality same-word pairs are available. |
Tasks | Speech Recognition |
Published | 2018-03-23 |
URL | http://arxiv.org/abs/1803.08863v2 |
PDF | http://arxiv.org/pdf/1803.08863v2.pdf
PWC | https://paperswithcode.com/paper/multilingual-bottleneck-features-for-subword |
Repo | https://github.com/eginhard/cae-utd-utils |
Framework | none |
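A minimal sketch of what a "bottleneck feature" is: train a network with a narrow hidden layer on a supervised multilingual task, then keep that layer's activations as features for a new, zero-resource language. The layer sizes, 40-dim bottleneck width, and target count below are illustrative assumptions.

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(39, 512), nn.ReLU(),   # e.g. MFCC-like input frames (39-dim assumed)
    nn.Linear(512, 40),              # the narrow bottleneck layer
    nn.ReLU(),
    nn.Linear(40, 2000),             # multilingual phone/senone targets (assumed size)
)
# After supervised training, chop off the classifier and keep the bottleneck.
bnf_extractor = net[:3]
bnfs = bnf_extractor(torch.randn(100, 39))  # (frames, 40) bottleneck features
```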
Fast Dawid-Skene: A Fast Vote Aggregation Scheme for Sentiment Classification
Title | Fast Dawid-Skene: A Fast Vote Aggregation Scheme for Sentiment Classification |
Authors | Vaibhav B Sinha, Sukrut Rao, Vineeth N Balasubramanian |
Abstract | Many real-world problems can now be effectively solved using supervised machine learning. A major roadblock is often the lack of an adequate quantity of labeled data for training. A possible solution is to assign the task of labeling data to a crowd, and then infer the true label using aggregation methods. A well-known approach for aggregation is the Dawid-Skene (DS) algorithm, which is based on the principle of Expectation-Maximization (EM). We propose a new simple, yet effective, EM-based algorithm, which can be interpreted as a 'hard' version of DS, that allows much faster convergence while maintaining similar accuracy in aggregation. We show the use of this algorithm as a quick and effective technique for online, real-time sentiment annotation. We also prove that our algorithm converges to the estimated labels at a linear rate. Our experiments on standard datasets show a significant speedup in the time taken for aggregation - up to $\sim$8x over Dawid-Skene and $\sim$6x over other fast EM methods, at competitive accuracy. The code for the implementation of the algorithms can be found at https://github.com/GoodDeeds/Fast-Dawid-Skene |
Tasks | Sentiment Analysis |
Published | 2018-03-07 |
URL | http://arxiv.org/abs/1803.02781v3 |
PDF | http://arxiv.org/pdf/1803.02781v3.pdf
PWC | https://paperswithcode.com/paper/fast-dawid-skene-a-fast-vote-aggregation |
Repo | https://github.com/GoodDeeds/Fast-Dawid-Skene |
Framework | none |
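A hedged sketch of the "hard EM" idea the abstract describes: the E-step commits to the single most likely label (an argmax) instead of keeping a soft posterior, and the M-step re-estimates each worker's confusion matrix from those hard labels. This simplified version omits class priors; the official implementation is linked above.

```python
import numpy as np

def fast_dawid_skene(votes, n_classes, n_iters=20):
    # votes: (n_items, n_workers) of class ids, -1 where a worker didn't vote
    n_items, n_workers = votes.shape
    labels = np.array([np.bincount(v[v >= 0], minlength=n_classes).argmax()
                       for v in votes])                      # majority-vote init
    for _ in range(n_iters):
        # M-step: per-worker confusion matrices from the current hard labels.
        conf = np.full((n_workers, n_classes, n_classes), 1e-2)  # small smoothing
        for i in range(n_items):
            for w in range(n_workers):
                if votes[i, w] >= 0:
                    conf[w, labels[i], votes[i, w]] += 1
        conf /= conf.sum(axis=2, keepdims=True)
        # E-step (hard): pick the argmax label under the current model.
        ll = np.zeros((n_items, n_classes))
        for i in range(n_items):
            for w in range(n_workers):
                if votes[i, w] >= 0:
                    ll[i] += np.log(conf[w, :, votes[i, w]])
        labels = ll.argmax(axis=1)
    return labels

print(fast_dawid_skene(np.array([[0, 0, 1], [1, 1, 1], [0, -1, 0]]), n_classes=2))
```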