Paper Group AWR 252
MeteorNet: Deep Learning on Dynamic 3D Point Cloud Sequences. SENSE: a Shared Encoder Network for Scene-flow Estimation. Improving the generalizability of convolutional neural network-based segmentation on CMR images. Attenuating Bias in Word Vectors. An optical diffractive deep neural network with multiple frequency-channels. Learning Exploration …
MeteorNet: Deep Learning on Dynamic 3D Point Cloud Sequences
Title | MeteorNet: Deep Learning on Dynamic 3D Point Cloud Sequences |
Authors | Xingyu Liu, Mengyuan Yan, Jeannette Bohg |
Abstract | Understanding dynamic 3D environments is crucial for robotic agents and many other applications. We propose a novel neural network architecture called MeteorNet for learning representations for dynamic 3D point cloud sequences. Unlike previous work that adopts a grid-based representation and applies 3D or 4D convolutions, our network directly processes point clouds. We propose two ways to construct spatiotemporal neighborhoods for each point in the point cloud sequence. Information from these neighborhoods is aggregated to learn features per point. We benchmark our network on a variety of 3D recognition tasks including action recognition, semantic segmentation and scene flow estimation. MeteorNet shows stronger performance than previous grid-based methods while achieving state-of-the-art performance on Synthia. MeteorNet also outperforms previous baseline methods that are able to process at most two consecutive point clouds. To the best of our knowledge, this is the first work on deep learning for dynamic raw point cloud sequences. |
Tasks | Scene Flow Estimation, Semantic Segmentation |
Published | 2019-10-21 |
URL | https://arxiv.org/abs/1910.09165v2 |
PDF | https://arxiv.org/pdf/1910.09165v2.pdf |
PWC | https://paperswithcode.com/paper/meteornet-deep-learning-on-dynamic-3d-point |
Repo | https://github.com/xingyul/meteornet |
Framework | tf |
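The abstract says MeteorNet builds a spatiotemporal neighborhood for each point and aggregates information over it. Below is a minimal sketch of one such construction, assuming a fixed radius applied to every earlier frame and a plain max-pool over relative (dx, dy, dz, dt) offsets in place of the learned per-point network. `spatiotemporal_neighbors` and `aggregate` are hypothetical names, not functions from the MeteorNet repository.

```python
import numpy as np

def spatiotemporal_neighbors(frames, t, radius):
    """For each point of frame t, collect (frame, index) pairs of points in
    frames 0..t that lie within `radius` (fixed radius is an assumption)."""
    query = frames[t]
    neighbors = [[] for _ in range(len(query))]
    for s in range(t + 1):
        # pairwise Euclidean distances, shape (n_query, n_points_in_frame_s)
        dist = np.linalg.norm(query[:, None, :] - frames[s][None, :, :], axis=-1)
        for q, p in zip(*np.nonzero(dist <= radius)):
            neighbors[int(q)].append((s, int(p)))
    return neighbors

def aggregate(frames, t, neighbors):
    """Max-pool relative (dx, dy, dz, dt) offsets over each neighborhood.
    A learned MLP would normally transform the offsets before pooling."""
    out = []
    for q, nbrs in enumerate(neighbors):
        offsets = [np.append(frames[s][p] - frames[t][q], s - t) for s, p in nbrs]
        out.append(np.max(offsets, axis=0))
    return np.stack(out)
```

Each query point is always in its own neighborhood (distance 0 in frame t), so no neighborhood is empty.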
SENSE: a Shared Encoder Network for Scene-flow Estimation
Title | SENSE: a Shared Encoder Network for Scene-flow Estimation |
Authors | Huaizu Jiang, Deqing Sun, Varun Jampani, Zhaoyang Lv, Erik Learned-Miller, Jan Kautz |
Abstract | We introduce a compact network for holistic scene flow estimation, called SENSE, which shares common encoder features among four closely-related tasks: optical flow estimation, disparity estimation from stereo, occlusion estimation, and semantic segmentation. Our key insight is that sharing features makes the network more compact, induces better feature representations, and can better exploit interactions among these tasks to handle partially labeled data. With a shared encoder, we can flexibly add decoders for different tasks during training. This modular design leads to a compact and efficient model at inference time. Exploiting the interactions among these tasks allows us to introduce distillation and self-supervised losses in addition to supervised losses, which can better handle partially labeled real-world data. SENSE achieves state-of-the-art results on several optical flow benchmarks and runs as fast as networks specifically designed for optical flow. It also compares favorably against the state of the art on stereo and scene flow, while consuming much less memory. |
Tasks | Disparity Estimation, Optical Flow Estimation, Scene Flow Estimation, Semantic Segmentation |
Published | 2019-10-27 |
URL | https://arxiv.org/abs/1910.12361v1 |
PDF | https://arxiv.org/pdf/1910.12361v1.pdf |
PWC | https://paperswithcode.com/paper/sense-a-shared-encoder-network-for-scene-flow-1 |
Repo | https://github.com/NVlabs/SENSE |
Framework | none |
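The shared-encoder idea in the SENSE abstract (one encoder, several task decoders, and training on partially labeled data) can be illustrated with a toy NumPy sketch. It assumes a single linear-plus-tanh encoder, linear decoders, and an MSE loss that simply skips tasks without labels. SENSE itself adds distillation and self-supervised losses for those tasks, which are not modelled here. All names are hypothetical.

```python
import numpy as np

def shared_forward(x, encoder_w, decoders):
    """Run one shared encoder, then every task-specific decoder on its features."""
    features = np.tanh(x @ encoder_w)
    return {task: features @ w for task, w in decoders.items()}

def partial_label_loss(preds, labels):
    """Sum per-task mean squared errors over tasks whose label is present.
    Tasks labelled None are skipped (a simplification, see lead-in)."""
    total = 0.0
    for task, target in labels.items():
        if target is not None:
            total += float(np.mean((preds[task] - target) ** 2))
    return total
```

Because decoders are just entries in a dictionary, adding a task during training means adding one more decoder while the encoder stays shared.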
Improving the generalizability of convolutional neural network-based segmentation on CMR images
Title | Improving the generalizability of convolutional neural network-based segmentation on CMR images |
Authors | Chen Chen, Wenjia Bai, Rhodri H. Davies, Anish N. Bhuva, Charlotte Manisty, James C. Moon, Nay Aung, Aaron M. Lee, Mihir M. Sanghvi, Kenneth Fung, Jose Miguel Paiva, Steffen E. Petersen, Elena Lukaschuk, Stefan K. Piechnik, Stefan Neubauer, Daniel Rueckert |
Abstract | Convolutional neural network (CNN) based segmentation methods provide an efficient and automated way for clinicians to assess the structure and function of the heart in cardiac MR images. While CNNs can generally perform the segmentation tasks with high accuracy when training and test images come from the same domain (e.g. same scanner or site), their performance often degrades dramatically on images from different scanners or clinical sites. We propose a simple yet effective way for improving the network generalization ability by carefully designing data normalization and augmentation strategies to accommodate common scenarios in multi-site, multi-scanner clinical imaging data sets. We demonstrate that a neural network trained on a single-site single-scanner dataset from the UK Biobank can be successfully applied to segmenting cardiac MR images across different sites and different scanners without substantial loss of accuracy. Specifically, the method was trained on a large set of 3,975 subjects from the UK Biobank. It was then directly tested on 600 different subjects from the UK Biobank for intra-domain testing and two other sets for cross-domain testing: the ACDC dataset (100 subjects, 1 site, 2 scanners) and the BSCMR-AS dataset (599 subjects, 6 sites, 9 scanners). The proposed method produces promising segmentation results on the UK Biobank test set which are comparable to previously reported values in the literature, while also performing well on cross-domain test sets, achieving a mean Dice metric of 0.90 for the left ventricle, 0.81 for the myocardium and 0.82 for the right ventricle on the ACDC dataset; and 0.89 for the left ventricle, 0.83 for the myocardium on the BSCMR-AS dataset. The proposed method offers a potential solution to improve CNN-based model generalizability for the cross-scanner and cross-site cardiac MR image segmentation task. |
Tasks | Semantic Segmentation |
Published | 2019-07-02 |
URL | https://arxiv.org/abs/1907.01268v2 |
PDF | https://arxiv.org/pdf/1907.01268v2.pdf |
PWC | https://paperswithcode.com/paper/improving-the-generalizability-of |
Repo | https://github.com/cherise215/CardiacMRSegmentation |
Framework | none |
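The results above are reported as mean Dice scores per cardiac structure. For reference, the Dice overlap between a predicted and a ground-truth label map for one structure is 2|A∩B| / (|A| + |B|). A minimal NumPy sketch follows; the integer label values (for example 1 for the left ventricle) are hypothetical, and returning 1.0 when a structure is absent from both masks is a convention, not something stated in the abstract.

```python
import numpy as np

def dice(pred, target, label):
    """Dice overlap 2|A∩B| / (|A| + |B|) for one structure label."""
    a = pred == label
    b = target == label
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # structure absent in both masks: treated as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom
```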
Attenuating Bias in Word Vectors
Title | Attenuating Bias in Word Vectors |
Authors | Sunipa Dev, Jeff Phillips |
Abstract | Word vector representations are well-developed tools for various NLP and Machine Learning tasks and are known to retain significant semantic and syntactic structure of languages. But they are prone to carrying and amplifying bias, which can perpetuate discrimination in various applications. In this work, we explore simple new ways to detect the most stereotypically gendered words in an embedding and remove the bias from them. We verify how names are masked carriers of gender bias and then use that as a tool to attenuate bias in embeddings. Further, we extend this property of names to show how names can be used to detect other types of bias in the embeddings, such as bias based on race, ethnicity, and age. |
Tasks | |
Published | 2019-01-23 |
URL | http://arxiv.org/abs/1901.07656v1 |
PDF | http://arxiv.org/pdf/1901.07656v1.pdf |
PWC | https://paperswithcode.com/paper/attenuating-bias-in-word-vectors |
Repo | https://github.com/sunipa/Attenuating-Bias-in-Word-Vec |
Framework | none |
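A common way to attenuate a bias direction in word vectors is linear projection: estimate a unit bias direction v and replace every vector w with w - ⟨w, v⟩v. The sketch below assumes the direction is the normalized mean difference of a few word pairs such as ("he", "she"). The paper also uses names to find the direction, and its exact procedure may differ. The function names and the toy embedding are hypothetical.

```python
import numpy as np

def bias_direction(pairs, emb):
    """Unit vector along the mean difference of word pairs, e.g. ('he', 'she')."""
    diffs = [emb[a] - emb[b] for a, b in pairs]
    v = np.mean(diffs, axis=0)
    return v / np.linalg.norm(v)

def project_out(w, v):
    """Remove the component of vector w along unit vector v."""
    return w - np.dot(w, v) * v
```

After projection every vector is orthogonal to v, so the embedding no longer separates words along that single direction, although bias may still be recoverable from other directions.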
An optical diffractive deep neural network with multiple frequency-channels
Title | An optical diffractive deep neural network with multiple frequency-channels |
Authors | Yingshi Chen, Jinfeng Zhu |
Abstract | A diffractive deep neural network (DNNet) is a novel machine learning framework based on the modulation of optical transmission. A diffractive network computes its predictions at the speed of light. Its architecture is purely passive, with no additional power consumption. We improved the accuracy of diffractive networks by using optical waves at different frequencies. Each layer has multiple frequency-channels (optical distributions at different frequencies). These channels are merged at the output plane to get the final output. Experiments on the Fashion-MNIST and EMNIST datasets showed that multiple frequency-channels substantially increase accuracy. We also give a detailed analysis of the difference between DNNet and MLP. The modulation process in DNNet is effectively an optical activation function. We develop an open-source package, ONNet. The source code is available at https://github.com/closest-git/ONNet. |
Tasks | |
Published | 2019-12-23 |
URL | https://arxiv.org/abs/1912.10730v1 |
PDF | https://arxiv.org/pdf/1912.10730v1.pdf |
PWC | https://paperswithcode.com/paper/an-optical-diffractive-deep-neural-network |
Repo | https://github.com/closest-git/ONNet |
Framework | pytorch |
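To make the diffractive forward pass concrete, here is a minimal NumPy sketch under stated assumptions: free-space propagation uses the angular-spectrum method, each layer is a phase-only mask, the same masks are reused for every wavelength, and frequency channels are merged by summing output intensities. These are illustrative choices, not necessarily how ONNet models layers or merges channels, and all function names are hypothetical.

```python
import numpy as np

def propagate(field, wavelength, distance, pixel):
    """Angular-spectrum free-space propagation of a square complex field."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=pixel)
    FX, FY = np.meshgrid(fx, fx, indexing="ij")
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    # propagating components get a phase shift; evanescent ones (arg <= 0) are dropped
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    H = np.where(arg > 0, np.exp(1j * kz * distance), 0.0)
    return np.fft.ifft2(np.fft.fft2(field) * H)

def diffractive_forward(field, phases, wavelength, distance, pixel):
    """Propagate through phase-only layers and return the output intensity."""
    for phase in phases:
        field = propagate(field, wavelength, distance, pixel) * np.exp(1j * phase)
    field = propagate(field, wavelength, distance, pixel)
    return np.abs(field) ** 2

def multi_frequency_output(image, phases, wavelengths, distance, pixel):
    """Merge frequency channels by summing their output intensities (an assumption)."""
    return sum(diffractive_forward(image.astype(complex), phases, wl, distance, pixel)
               for wl in wavelengths)
```

When every spatial frequency propagates, both propagation and phase masks preserve total energy, so the summed output intensity equals the input energy times the number of channels.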
Learning Exploration Policies for Navigation
Title | Learning Exploration Policies for Navigation |
Authors | Tao Chen, Saurabh Gupta, Abhinav Gupta |
Abstract | Numerous past works have tackled the problem of task-driven navigation. However, how to effectively explore a new environment to enable a variety of downstream tasks has received much less attention. In this work, we study how agents can autonomously explore realistic and complex 3D environments without the context of task-rewards. We propose a learning-based approach and investigate different policy architectures, reward functions, and training paradigms. We find that the use of policies with spatial memory that are bootstrapped with imitation learning and finally finetuned with coverage rewards derived purely from on-board sensors can be effective at exploring novel environments. We show that our learned exploration policies can explore better than classical approaches based on geometry alone and generic learning-based exploration techniques. Finally, we also show how such task-agnostic exploration can be used for downstream tasks. Code and videos are available at: https://sites.google.com/view/exploration-for-nav. |
Tasks | Imitation Learning |
Published | 2019-03-05 |
URL | http://arxiv.org/abs/1903.01959v1 |
PDF | http://arxiv.org/pdf/1903.01959v1.pdf |
PWC | https://paperswithcode.com/paper/learning-exploration-policies-for-navigation |
Repo | https://github.com/s-gupta/map-plan-baseline |
Framework | none |
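A coverage reward of the kind mentioned in the abstract can be sketched as "number of map cells observed for the first time at this step". The sketch assumes the environment is discretized into a grid and that a boolean mask of currently visible cells has already been computed from the on-board sensors (that step is not shown). `CoverageReward` is a hypothetical class, not taken from the linked code.

```python
import numpy as np

class CoverageReward:
    """Reward = number of grid cells newly observed at this step (hypothetical)."""

    def __init__(self, shape):
        self.seen = np.zeros(shape, dtype=bool)

    def step(self, visible_mask):
        new = visible_mask & ~self.seen   # cells visible now but never seen before
        self.seen |= visible_mask
        return int(new.sum())
```

Revisiting already-seen cells earns nothing, which pushes the policy toward unexplored areas.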
FSPool: Learning Set Representations with Featurewise Sort Pooling
Title | FSPool: Learning Set Representations with Featurewise Sort Pooling |
Authors | Yan Zhang, Jonathon Hare, Adam Prügel-Bennett |
Abstract | Traditional set prediction models can struggle with simple datasets due to an issue we call the responsibility problem. We introduce a pooling method for sets of feature vectors based on sorting features across elements of the set. This can be used to construct a permutation-equivariant auto-encoder that avoids this responsibility problem. On a toy dataset of polygons and a set version of MNIST, we show that such an auto-encoder produces considerably better reconstructions and representations. Replacing the pooling function in existing set encoders with FSPool improves accuracy and convergence speed on a variety of datasets. |
Tasks | |
Published | 2019-06-06 |
URL | https://arxiv.org/abs/1906.02795v3 |
PDF | https://arxiv.org/pdf/1906.02795v3.pdf |
PWC | https://paperswithcode.com/paper/fspool-learning-set-representations-with |
Repo | https://github.com/Cyanogenoid/fspool |
Framework | pytorch |
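Featurewise sort pooling, as described above, sorts each feature independently across the set elements and then combines the sorted values with learned weights. A minimal NumPy sketch for a fixed set size n follows. The paper also handles variable set sizes, which this sketch does not. `fspool` is an illustrative name, not the repository's API.

```python
import numpy as np

def fspool(x, weights):
    """x: (n, d) set of n feature vectors; weights: (n, d) learned weights.
    Sort each feature column in descending order, then take a weighted sum."""
    sorted_x = -np.sort(-x, axis=0)   # descending order per feature
    return (sorted_x * weights).sum(axis=0)
```

Sorting makes the result independent of element order. Choosing the weights to be one in the first row and zero elsewhere recovers max pooling, and all-ones weights recover sum pooling.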
Do Massively Pretrained Language Models Make Better Storytellers?
Title | Do Massively Pretrained Language Models Make Better Storytellers? |
Authors | Abigail See, Aneesh Pappu, Rohun Saxena, Akhila Yerukola, Christopher D. Manning |
Abstract | Large neural language models trained on massive amounts of text have emerged as a formidable strategy for Natural Language Understanding tasks. However, the strength of these models as Natural Language Generators is less clear. Though anecdotal evidence suggests that these models generate better quality text, there has been no detailed study characterizing their generation abilities. In this work, we compare the performance of an extensively pretrained model, OpenAI GPT2-117 (Radford et al., 2019), to a state-of-the-art neural story generation model (Fan et al., 2018). By evaluating the generated text across a wide variety of automatic metrics, we characterize the ways in which pretrained models do, and do not, make better storytellers. We find that although GPT2-117 conditions more strongly on context, is more sensitive to ordering of events, and uses more unusual words, it is just as likely to produce repetitive and under-diverse text when using likelihood-maximizing decoding algorithms. |
Tasks | |
Published | 2019-09-24 |
URL | https://arxiv.org/abs/1909.10705v1 |
PDF | https://arxiv.org/pdf/1909.10705v1.pdf |
PWC | https://paperswithcode.com/paper/do-massively-pretrained-language-models-make |
Repo | https://github.com/abisee/story-generation-eval |
Framework | pytorch |
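The abstract characterizes generated text as "repetitive and under-diverse" using automatic metrics. One widely used diversity metric is distinct-n, the fraction of n-grams in a text that are unique. The sketch below illustrates that idea. The paper's own metric suite may define diversity differently, and `distinct_n` is a hypothetical name.

```python
def distinct_n(tokens, n):
    """Fraction of n-grams in `tokens` that are unique (higher = more diverse)."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)
```

For "the cat sat the cat sat", 3 of 6 unigrams and 3 of 5 bigrams are unique, which gives distinct-1 = 0.5 and distinct-2 = 0.6.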
Improving Collaborative Metric Learning with Efficient Negative Sampling
Title | Improving Collaborative Metric Learning with Efficient Negative Sampling |
Authors | Viet-Anh Tran, Romain Hennequin, Jimena Royo-Letelier, Manuel Moussallam |
Abstract | Distance metric learning based on triplet loss has been applied successfully in a wide range of applications such as face recognition, image retrieval, speaker change detection and, more recently, recommendation with the CML model. However, as we show in this article, CML requires large batches to work reasonably well because of an overly simplistic uniform negative sampling strategy for selecting triplets. Due to memory limitations, this makes it difficult to scale in high-dimensional scenarios. To alleviate this problem, we propose a two-stage negative sampling strategy that finds triplets that are highly informative for learning. Our strategy allows CML to work effectively in terms of accuracy and popularity bias, even when the batch size is an order of magnitude smaller than what would be needed with the default uniform sampling. We demonstrate the suitability of the proposed strategy for recommendation and exhibit consistent positive results across various datasets. |
Tasks | Face Recognition, Image Retrieval, Metric Learning |
Published | 2019-09-24 |
URL | https://arxiv.org/abs/1909.10912v1 |
PDF | https://arxiv.org/pdf/1909.10912v1.pdf |
PWC | https://paperswithcode.com/paper/improving-collaborative-metric-learning-with |
Repo | https://github.com/deezer/sigir2019-2stagesampling |
Framework | tf |
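A two-stage negative sampler could be structured roughly as follows. This is a sketch under assumptions, not the paper's exact procedure. Stage 1 draws a small candidate pool with probability proportional to item popularity raised to a power `beta`. Stage 2 keeps the candidate closest to the user in the metric space, which is the hardest negative for a Euclidean triplet loss. The function and parameter names are hypothetical, and excluding the user's known positives is left out for brevity.

```python
import numpy as np

def two_stage_negatives(user_vec, item_vecs, popularity, n_candidates, beta, rng):
    """Stage 1: sample candidates with probability proportional to popularity**beta.
    Stage 2: return the candidate closest to the user (hardest negative)."""
    p = popularity.astype(float) ** beta
    p /= p.sum()
    cand = rng.choice(len(item_vecs), size=n_candidates, replace=False, p=p)
    dist = np.linalg.norm(item_vecs[cand] - user_vec, axis=1)
    return int(cand[np.argmin(dist)])
```

Only the small candidate pool is scored against the user, so informative negatives can be found without large batches.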
Fast and Simple Natural-Gradient Variational Inference with Mixture of Exponential-family Approximations
Title | Fast and Simple Natural-Gradient Variational Inference with Mixture of Exponential-family Approximations |
Authors | Wu Lin, Mohammad Emtiyaz Khan, Mark Schmidt |
Abstract | Natural-gradient methods enable fast and simple algorithms for variational inference, but due to computational difficulties, their use is mostly limited to minimal exponential-family (EF) approximations. In this paper, we extend their application to estimate structured approximations such as mixtures of EF distributions. Such approximations can fit complex, multimodal posterior distributions and are generally more accurate than unimodal EF approximations. By using a minimal conditional-EF representation of such approximations, we derive simple natural-gradient updates. Our empirical results demonstrate faster convergence of our natural-gradient method compared to black-box gradient-based methods with reparameterization gradients. Our work expands the scope of natural gradients for Bayesian inference and makes them more widely applicable than before. |
Tasks | Bayesian Inference |
Published | 2019-06-07 |
URL | https://arxiv.org/abs/1906.02914v2 |
PDF | https://arxiv.org/pdf/1906.02914v2.pdf |
PWC | https://paperswithcode.com/paper/fast-and-simple-natural-gradient-variational |
Repo | https://github.com/yorkerlin/VB-MixEF |
Framework | none |
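The restriction to minimal EF approximations comes from a standard identity. For a minimal EF with natural parameter λ, the Fisher information equals the Jacobian of the expectation parameter m, so the natural gradient with respect to λ is simply the ordinary gradient with respect to m. The derivation below covers only this single-EF case. The paper's extension to mixtures through a minimal conditional-EF representation is not reproduced here. ℒ denotes the ELBO and β is a step size.

```latex
\begin{aligned}
q_\lambda(z) &= h(z)\exp\!\big(\langle \lambda, T(z)\rangle - A(\lambda)\big),
  \qquad m(\lambda) = \mathbb{E}_{q_\lambda}[T(z)] = \nabla A(\lambda),\\
F(\lambda) &= \nabla^2 A(\lambda) = \frac{\partial m}{\partial \lambda}
  \quad (\text{symmetric, invertible for a minimal EF}),\\
\nabla_\lambda \mathcal{L} &= \Big(\frac{\partial m}{\partial \lambda}\Big)^{\!\top} \nabla_m \mathcal{L}
  = F(\lambda)\, \nabla_m \mathcal{L}
  \;\;\Longrightarrow\;\; F(\lambda)^{-1} \nabla_\lambda \mathcal{L} = \nabla_m \mathcal{L},\\
\lambda_{t+1} &= \lambda_t + \beta\, \nabla_m \mathcal{L}\big|_{m = m(\lambda_t)}.
\end{aligned}
```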
Nasal Patches and Curves for Expression-robust 3D Face Recognition
Title | Nasal Patches and Curves for Expression-robust 3D Face Recognition |
Authors | Mehryar Emambakhsh, Adrian Evans |
Abstract | The potential of the nasal region for expression-robust 3D face recognition is thoroughly investigated by a novel five-step algorithm. First, the nose tip location is coarsely detected and the face is segmented, aligned and the nasal region cropped. Then, a very accurate and consistent nasal landmarking algorithm detects seven keypoints on the nasal region. In the third step, a feature extraction algorithm based on the surface normals of Gabor-wavelet filtered depth maps is utilised and, then, a set of spherical patches and curves are localised over the nasal region to provide the feature descriptors. The last step applies a genetic algorithm-based feature selector to detect the most stable patches and curves over different facial expressions. The algorithm provides the highest reported nasal region-based recognition ranks on the FRGC, Bosphorus and BU-3DFE datasets. The results are comparable with, and in many cases better than, many state-of-the-art 3D face recognition algorithms, which use the whole facial domain. The proposed method does not rely on sophisticated alignment or denoising steps, is very robust when only one sample per subject is used in the gallery, and does not require a training step for the landmarking algorithm. https://github.com/mehryaragha/NoseBiometrics |
Tasks | Denoising, Face Recognition |
Published | 2019-01-01 |
URL | http://arxiv.org/abs/1901.00206v1 |
PDF | http://arxiv.org/pdf/1901.00206v1.pdf |
PWC | https://paperswithcode.com/paper/nasal-patches-and-curves-for-expression |
Repo | https://github.com/mehryaragha/NoseBiometrics |
Framework | none |
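The feature-extraction step relies on surface normals of depth maps. For a surface z = depth(y, x), an unnormalized normal is (-∂z/∂x, -∂z/∂y, 1). The sketch below estimates the derivatives with finite differences (`np.gradient`). It skips the Gabor-wavelet filtering the paper applies first, and `surface_normals` is a hypothetical helper.

```python
import numpy as np

def surface_normals(depth, spacing=1.0):
    """Unit normals of the surface z = depth(y, x); returns shape (H, W, 3)."""
    dz_dy, dz_dx = np.gradient(depth, spacing)   # axis 0 = rows (y), axis 1 = cols (x)
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)
```

For the tilted plane z = 2x, every normal is (-2, 0, 1)/√5.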
DEDPUL: Difference-of-Estimated-Densities-based Positive-Unlabeled Learning
Title | DEDPUL: Difference-of-Estimated-Densities-based Positive-Unlabeled Learning |
Authors | Dmitry Ivanov |
Abstract | Positive-Unlabeled Learning is an analog to supervised binary classification for the case when the negative (N) sample in the training set is noisy, i.e. it is contaminated with latent instances of the positive (P) class and hence is unlabeled (U). The objective is to classify U, which first requires identifying the mixing proportions of P and N in U. Recently, unbiased Risk Estimation has been successfully applied to train classifiers on PU data, achieving state-of-the-art results (Kiryo et al., 2017). This approach, however, exhibits two major bottlenecks. First, the mixing proportions are assumed to be known in the domain or estimated with additional methods. Second, the approach relies on the classifier being a neural network. In this paper, we propose DEDPUL, a method that solves PU Learning without the aforementioned issues. The mechanism behind DEDPUL is to apply a computationally cheap post-processing procedure to predictions of any classifier trained to distinguish P from U. Instead of assuming the proportions to be known, DEDPUL estimates them alongside classifying U. Experiments show that DEDPUL outperforms the Risk Estimation approach when neural networks are used as classifiers. On some data sets, the performance can be improved even further if ensembles of trees are used as classifiers instead. At the same time, DEDPUL also outperforms the current state-of-the-art in Mixture Proportion Estimation, especially in the cases when P and U distributions are similar. |
Tasks | Density Estimation |
Published | 2019-02-19 |
URL | https://arxiv.org/abs/1902.06965v4 |
PDF | https://arxiv.org/pdf/1902.06965v4.pdf |
PWC | https://paperswithcode.com/paper/dedpul-method-for-mixture-proportion |
Repo | https://github.com/dimonenka/DEDPUL |
Framework | pytorch |
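The post-processing idea can be illustrated in a much simplified form. Write f_p and f_u for the densities of a classifier's scores on P and U, and α for the share of positives in U. Then f_u = α f_p + (1 − α) f_n implies f_u / f_p ≥ α wherever f_p > 0, so the minimum of that ratio upper-bounds α. Bayes' rule then gives P(positive | score) = α f_p / f_u. The sketch below assumes scores in [0, 1] and uses coarse histograms. DEDPUL's actual procedure (density estimation, smoothing, and refinement of the estimate) is more involved. `estimate_positive_proportion` is a hypothetical name.

```python
import numpy as np

def estimate_positive_proportion(scores_p, scores_u, bins=10):
    """Histogram densities of classifier scores on P and U, the bound
    alpha <= min f_u / f_p, and per-bin posteriors alpha * f_p / f_u."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    f_p, _ = np.histogram(scores_p, bins=edges, density=True)
    f_u, _ = np.histogram(scores_u, bins=edges, density=True)
    mask = f_p > 0
    alpha = min(1.0, float(np.min(f_u[mask] / f_p[mask])))
    # P(positive | score bin) for unlabeled data, clipped to [0, 1]
    ratio = np.divide(f_p, f_u, out=np.zeros_like(f_p), where=f_u > 0)
    posterior = np.clip(alpha * ratio, 0.0, 1.0)
    return alpha, posterior
```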
ETNLP: a visual-aided systematic approach to select pre-trained embeddings for a downstream task
Title | ETNLP: a visual-aided systematic approach to select pre-trained embeddings for a downstream task |
Authors | Xuan-Son Vu, Thanh Vu, Son N. Tran, Lili Jiang |
Abstract | Given the many recent advanced embedding models, selecting the pre-trained word embedding (a.k.a. word representation) models that best fit a specific downstream task is non-trivial. In this paper, we propose a systematic approach, called ETNLP, for extracting, evaluating, and visualizing multiple sets of pre-trained word embeddings to determine which embeddings should be used in a downstream task. For extraction, we provide a method to extract subsets of the embeddings to be used in the downstream task. For evaluation, we analyse the quality of pre-trained embeddings using an input word analogy list. Finally, we visualize the word representations in the embedding space to explore the embedded words interactively. We demonstrate the effectiveness of the proposed approach on our pre-trained word embedding models in Vietnamese to select which models are suitable for a named entity recognition (NER) task. Specifically, we create a large Vietnamese word analogy list to evaluate and select the pre-trained embedding models for the task. We then utilize the selected embeddings for the NER task and achieve new state-of-the-art results on the task benchmark dataset. We also apply the approach to another downstream task of privacy-guaranteed embedding selection, and show that it helps users quickly select the most suitable embeddings. In addition, we create an open-source system using the proposed systematic approach to facilitate similar studies on other NLP tasks. The source code and data are available at https://github.com/vietnlp/etnlp. |
Tasks | Named Entity Recognition, Word Embeddings |
Published | 2019-03-11 |
URL | https://arxiv.org/abs/1903.04433v2 |
PDF | https://arxiv.org/pdf/1903.04433v2.pdf |
PWC | https://paperswithcode.com/paper/etnlp-a-toolkit-for-extraction-evaluation-and |
Repo | https://github.com/vietnlp/etnlp |
Framework | none |
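Evaluating embeddings with a word analogy list usually means answering "a is to b as c is to ?" by taking the word whose vector is most cosine-similar to b − a + c, excluding the three query words. The sketch below shows that standard vector-offset scoring. ETNLP's exact scoring may differ, and the function names and toy vectors are hypothetical.

```python
import numpy as np

def analogy(emb, a, b, c):
    """Solve a : b :: c : ? via the offset b - a + c and cosine similarity,
    excluding the three query words."""
    target = emb[b] - emb[a] + emb[c]
    target = target / np.linalg.norm(target)
    best, best_sim = None, -np.inf
    for word, vec in emb.items():
        if word in (a, b, c):
            continue
        sim = float(np.dot(vec, target) / np.linalg.norm(vec))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

def analogy_accuracy(emb, questions):
    """Fraction of (a, b, c, d) questions answered with d."""
    hits = sum(analogy(emb, a, b, c) == d for a, b, c, d in questions)
    return hits / len(questions)
```

Running `analogy_accuracy` for each candidate embedding on the same analogy list gives one comparable number per model.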
Relaxing Bijectivity Constraints with Continuously Indexed Normalising Flows
Title | Relaxing Bijectivity Constraints with Continuously Indexed Normalising Flows |
Authors | Rob Cornish, Anthony L. Caterini, George Deligiannidis, Arnaud Doucet |
Abstract | We show that Normalising Flows become pathological when used to model targets whose supports have complicated topologies. In this scenario, we prove that a flow must become arbitrarily numerically noninvertible in order to approximate the target closely. This result has implications for all flow-based models, and especially Residual Flows (ResFlows), which explicitly control the Lipschitz constant of the bijection used. To address this, we propose Continuously Indexed Flows (CIFs), which replace the single bijection used by normalising flows with a continuously indexed family of bijections, and which can intuitively “clean up” mass that would otherwise be misplaced by a single bijection. We show that CIFs can exactly match the support of a target even when its topology differs from the prior, and obtain empirically better performance for a variety of models and benchmarks. |
Tasks | Density Estimation, Normalising Flows |
Published | 2019-09-30 |
URL | https://arxiv.org/abs/1909.13833v2 |
PDF | https://arxiv.org/pdf/1909.13833v2.pdf |
PWC | https://paperswithcode.com/paper/localised-generative-flows-1 |
Repo | https://github.com/jrmcornish/lgf |
Framework | pytorch |
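To illustrate "a continuously indexed family of bijections", the sketch below uses a 1D affine family x = exp(s(u))·z + t(u) with z ~ N(0, 1). For a fixed index u, the conditional density follows from the change-of-variables formula log p(x | u) = log p_Z(z) − s(u). The choices s(u) = tanh(u), t(u) = u, and an index prior u ~ N(0, 1) drawn independently of z are all hypothetical simplifications. In the paper the family is parameterized by neural networks, and the marginal over u is handled by variational inference, which is not shown.

```python
import numpy as np

def std_normal_logpdf(z):
    return -0.5 * (z**2 + np.log(2 * np.pi))

def affine_family(u):
    """Hypothetical index-dependent log-scale and shift: s(u) = tanh(u), t(u) = u."""
    return np.tanh(u), u

def log_p_x_given_u(x, u):
    """Change of variables for x = exp(s(u)) * z + t(u), z ~ N(0, 1)."""
    s, t = affine_family(u)
    z = (x - t) * np.exp(-s)            # inverse of the bijection indexed by u
    return std_normal_logpdf(z) - s     # log |dz/dx| = -s(u)

def sample(n, rng):
    """Draw x by picking an index u, then pushing z through the u-th bijection."""
    u = rng.standard_normal(n)          # index prior p(u) = N(0, 1), an assumption
    z = rng.standard_normal(n)
    s, t = affine_family(u)
    return np.exp(s) * z + t
```

Because each sample uses a different bijection, mass can be placed where no single bijection of the prior would put it.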
PointRNN: Point Recurrent Neural Network for Moving Point Cloud Processing
Title | PointRNN: Point Recurrent Neural Network for Moving Point Cloud Processing |
Authors | Hehe Fan, Yi Yang |
Abstract | In this paper, we introduce a Point Recurrent Neural Network (PointRNN) for moving point cloud processing. At each time step, PointRNN takes point coordinates $\boldsymbol{P} \in \mathbb{R}^{n \times 3}$ and point features $\boldsymbol{X} \in \mathbb{R}^{n \times d}$ as input ($n$ and $d$ denote the number of points and the number of feature channels, respectively). The state of PointRNN is composed of point coordinates $\boldsymbol{P}$ and point states $\boldsymbol{S} \in \mathbb{R}^{n \times d'}$ ($d'$ denotes the number of state channels). Similarly, the output of PointRNN is composed of $\boldsymbol{P}$ and new point features $\boldsymbol{Y} \in \mathbb{R}^{n \times d''}$ ($d''$ denotes the number of new feature channels). Since point clouds are orderless, point features and states from two time steps cannot be directly operated on. Therefore, a point-based spatiotemporally-local correlation is adopted to aggregate point features and states according to point coordinates. We further propose two variants of PointRNN, i.e., Point Gated Recurrent Unit (PointGRU) and Point Long Short-Term Memory (PointLSTM). We apply PointRNN, PointGRU and PointLSTM to moving point cloud prediction, which aims to predict the future trajectories of points in a set given their past movements. Experimental results show that PointRNN, PointGRU and PointLSTM are able to produce correct predictions on both synthetic and real-world datasets, demonstrating their ability to model point cloud sequences. The code has been released at https://github.com/hehefan/PointRNN. |
Tasks | Moving Point Cloud Processing |
Published | 2019-10-18 |
URL | https://arxiv.org/abs/1910.08287v2 |
PDF | https://arxiv.org/pdf/1910.08287v2.pdf |
PWC | https://paperswithcode.com/paper/pointrnn-point-recurrent-neural-network-for |
Repo | https://github.com/hehefan/PointRNN-PyTorch |
Framework | pytorch |
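The point-based spatiotemporally-local correlation can be sketched as follows. Each point at time t finds its k nearest points from time t−1. For each neighbor, it concatenates its own features, the neighbor's state, and their displacement. A shared map is applied, and the results are max-pooled into the new state. The shared linear map with ReLU, the weight matrix `W`, and the name `point_rnn_cell` are assumptions for illustration, not the released implementation. The new state's width is set by the number of columns of `W`.

```python
import numpy as np

def point_rnn_cell(P_t, X_t, P_prev, S_prev, W, k):
    """One PointRNN-style update (sketch).
    P_t: (n, 3) coords and X_t: (n, d) features at time t;
    P_prev: (m, 3) coords and S_prev: (m, d') states at time t-1;
    W: (d + d' + 3, d_out) weights of a shared linear map (hypothetical)."""
    dist = np.linalg.norm(P_t[:, None, :] - P_prev[None, :, :], axis=-1)
    knn = np.argsort(dist, axis=1)[:, :k]                  # (n, k) neighbor indices
    disp = P_prev[knn] - P_t[:, None, :]                   # (n, k, 3) displacements
    feats = np.concatenate(
        [np.repeat(X_t[:, None, :], k, axis=1), S_prev[knn], disp], axis=-1)
    corr = np.maximum(feats @ W, 0.0)                      # shared linear map + ReLU
    S_t = corr.max(axis=1)                                 # max-pool over neighbors
    return P_t, S_t
```

Because neighbors are found by coordinates and pooled with a max, the result does not depend on how the points of the previous frame are ordered.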