February 1, 2020

3201 words 16 mins read

Paper Group AWR 252

MeteorNet: Deep Learning on Dynamic 3D Point Cloud Sequences. SENSE: a Shared Encoder Network for Scene-flow Estimation. Improving the generalizability of convolutional neural network-based segmentation on CMR images. Attenuating Bias in Word Vectors. An optical diffractive deep neural network with multiple frequency-channels. Learning Exploration …

MeteorNet: Deep Learning on Dynamic 3D Point Cloud Sequences

Title MeteorNet: Deep Learning on Dynamic 3D Point Cloud Sequences
Authors Xingyu Liu, Mengyuan Yan, Jeannette Bohg
Abstract Understanding dynamic 3D environments is crucial for robotic agents and many other applications. We propose a novel neural network architecture called $MeteorNet$ for learning representations for dynamic 3D point cloud sequences. Unlike previous work that adopts a grid-based representation and applies 3D or 4D convolutions, our network directly processes point clouds. We propose two ways to construct spatiotemporal neighborhoods for each point in the point cloud sequence. Information from these neighborhoods is aggregated to learn features per point. We benchmark our network on a variety of 3D recognition tasks including action recognition, semantic segmentation and scene flow estimation. MeteorNet shows stronger performance than previous grid-based methods while achieving state-of-the-art performance on Synthia. MeteorNet also outperforms previous baseline methods that are able to process at most two consecutive point clouds. To the best of our knowledge, this is the first work on deep learning for dynamic raw point cloud sequences.
Tasks Scene Flow Estimation, Semantic Segmentation
Published 2019-10-21
URL https://arxiv.org/abs/1910.09165v2
PDF https://arxiv.org/pdf/1910.09165v2.pdf
PWC https://paperswithcode.com/paper/meteornet-deep-learning-on-dynamic-3d-point
Repo https://github.com/xingyul/meteornet
Framework tf
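
The core operation described above, building a spatiotemporal neighborhood around each point, can be sketched in a few lines of PyTorch. This is an illustrative reconstruction rather than the authors' implementation; the tensor shapes, k, and the temporal radius are assumptions:

```python
# Direct-grouping sketch: for every point in the sequence, find the k nearest
# neighbors among all points whose frames fall within a temporal radius.
import torch

def spatiotemporal_group(xyz, t, k=16, t_radius=2.0):
    """xyz: (N, 3) points pooled from the whole sequence; t: (N,) frame indices."""
    dist = torch.cdist(xyz, xyz)                          # (N, N) pairwise spatial distances
    dt = (t[:, None] - t[None, :]).abs()                  # (N, N) temporal offsets
    dist = dist.masked_fill(dt > t_radius, float("inf"))  # restrict to the temporal window
    return dist.topk(k, largest=False).indices            # (N, k) neighbor indices

xyz = torch.randn(512, 3)
t = torch.randint(0, 4, (512,)).float()                   # a 4-frame sequence
idx = spatiotemporal_group(xyz, t)
# Per-point features would then be learned by pooling an MLP over each neighborhood.
```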

SENSE: a Shared Encoder Network for Scene-flow Estimation

Title SENSE: a Shared Encoder Network for Scene-flow Estimation
Authors Huaizu Jiang, Deqing Sun, Varun Jampani, Zhaoyang Lv, Erik Learned-Miller, Jan Kautz
Abstract We introduce a compact network for holistic scene flow estimation, called SENSE, which shares common encoder features among four closely-related tasks: optical flow estimation, disparity estimation from stereo, occlusion estimation, and semantic segmentation. Our key insight is that sharing features makes the network more compact, induces better feature representations, and can better exploit interactions among these tasks to handle partially labeled data. With a shared encoder, we can flexibly add decoders for different tasks during training. This modular design leads to a compact and efficient model at inference time. Exploiting the interactions among these tasks allows us to introduce distillation and self-supervised losses in addition to supervised losses, which can better handle partially labeled real-world data. SENSE achieves state-of-the-art results on several optical flow benchmarks and runs as fast as networks specifically designed for optical flow. It also compares favorably against the state of the art on stereo and scene flow, while consuming much less memory.
Tasks Disparity Estimation, Optical Flow Estimation, Scene Flow Estimation, Semantic Segmentation
Published 2019-10-27
URL https://arxiv.org/abs/1910.12361v1
PDF https://arxiv.org/pdf/1910.12361v1.pdf
PWC https://paperswithcode.com/paper/sense-a-shared-encoder-network-for-scene-flow-1
Repo https://github.com/NVlabs/SENSE
Framework none
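
The shared-encoder/multi-decoder design is easy to picture as a module skeleton. The sketch below is a structural illustration under assumed layer sizes, not the released NVlabs code; its point is that decoders can be attached per task and skipped when labels are missing:

```python
import torch.nn as nn

class SharedEncoderNet(nn.Module):
    """One encoder feeds four task heads; unlabeled tasks simply omit their loss."""
    def __init__(self, tasks=("flow", "disparity", "occlusion", "segmentation")):
        super().__init__()
        self.encoder = nn.Sequential(                     # stand-in for the real pyramid encoder
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.decoders = nn.ModuleDict(                    # one lightweight head per task
            {t: nn.Conv2d(64, 2 if t == "flow" else 1, 3, padding=1) for t in tasks})

    def forward(self, x, active_tasks=None):
        z = self.encoder(x)                               # features shared across all tasks
        return {t: self.decoders[t](z) for t in (active_tasks or self.decoders)}
```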

Improving the generalizability of convolutional neural network-based segmentation on CMR images

Title Improving the generalizability of convolutional neural network-based segmentation on CMR images
Authors Chen Chen, Wenjia Bai, Rhodri H. Davies, Anish N. Bhuva, Charlotte Manisty, James C. Moon, Nay Aung, Aaron M. Lee, Mihir M. Sanghvi, Kenneth Fung, Jose Miguel Paiva, Steffen E. Petersen, Elena Lukaschuk, Stefan K. Piechnik, Stefan Neubauer, Daniel Rueckert
Abstract Convolutional neural network (CNN) based segmentation methods provide an efficient and automated way for clinicians to assess the structure and function of the heart in cardiac MR images. While CNNs can generally perform the segmentation tasks with high accuracy when training and test images come from the same domain (e.g. same scanner or site), their performance often degrades dramatically on images from different scanners or clinical sites. We propose a simple yet effective way for improving the network generalization ability by carefully designing data normalization and augmentation strategies to accommodate common scenarios in multi-site, multi-scanner clinical imaging data sets. We demonstrate that a neural network trained on a single-site single-scanner dataset from the UK Biobank can be successfully applied to segmenting cardiac MR images across different sites and different scanners without substantial loss of accuracy. Specifically, the method was trained on a large set of 3,975 subjects from the UK Biobank. It was then directly tested on 600 different subjects from the UK Biobank for intra-domain testing and two other sets for cross-domain testing: the ACDC dataset (100 subjects, 1 site, 2 scanners) and the BSCMR-AS dataset (599 subjects, 6 sites, 9 scanners). The proposed method produces promising segmentation results on the UK Biobank test set which are comparable to previously reported values in the literature, while also performing well on cross-domain test sets, achieving a mean Dice metric of 0.90 for the left ventricle, 0.81 for the myocardium and 0.82 for the right ventricle on the ACDC dataset; and 0.89 for the left ventricle, 0.83 for the myocardium on the BSCMR-AS dataset. The proposed method offers a potential solution to improve CNN-based model generalizability for the cross-scanner and cross-site cardiac MR image segmentation task.
Tasks Semantic Segmentation
Published 2019-07-02
URL https://arxiv.org/abs/1907.01268v2
PDF https://arxiv.org/pdf/1907.01268v2.pdf
PWC https://paperswithcode.com/paper/improving-the-generalizability-of
Repo https://github.com/cherise215/CardiacMRSegmentation
Framework none
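
As a concrete illustration of the kind of normalization and augmentation the paper argues for, here is a minimal NumPy sketch; the percentile cutoffs and gamma range are placeholder values, not the paper's settings:

```python
import numpy as np

def normalize_intensity(img, low=1.0, high=99.0):
    """Clip to robust percentiles and rescale to [0, 1], so images from
    different scanners end up in a comparable intensity range."""
    lo, hi = np.percentile(img, [low, high])
    return np.clip((img - lo) / (hi - lo + 1e-8), 0.0, 1.0)

def augment(img, rng=np.random):
    """Cheap augmentation on a normalized 2-D slice: random flip plus gamma
    jitter to mimic contrast differences between scanners."""
    if rng.rand() < 0.5:
        img = img[:, ::-1].copy()          # random horizontal flip
    gamma = rng.uniform(0.7, 1.5)
    return img ** gamma                    # valid because img is in [0, 1]
```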

Attenuating Bias in Word Vectors

Title Attenuating Bias in Word Vectors
Authors Sunipa Dev, Jeff Phillips
Abstract Word vector representations are well-developed tools for various NLP and Machine Learning tasks and are known to retain significant semantic and syntactic structure of languages. But they are prone to carrying and amplifying bias, which can perpetrate discrimination in various applications. In this work, we explore new, simple ways to detect the most stereotypically gendered words in an embedding and remove the bias from them. We verify how names are masked carriers of gender bias and then use that as a tool to attenuate bias in embeddings. Further, we extend this property of names to show how names can be used to detect other types of bias in the embeddings, such as bias based on race, ethnicity, and age.
Tasks
Published 2019-01-23
URL http://arxiv.org/abs/1901.07656v1
PDF http://arxiv.org/pdf/1901.07656v1.pdf
PWC https://paperswithcode.com/paper/attenuating-bias-in-word-vectors
Repo https://github.com/sunipa/Attenuating-Bias-in-Word-Vec
Framework none
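
The projection-style debiasing the abstract alludes to can be sketched as follows; the 300-dimensional random vectors and the word pairs are placeholders for a real embedding:

```python
import numpy as np

def bias_direction(emb, pairs):
    """Average difference vector over gendered pairs, e.g. ('he', 'she')."""
    v = np.mean([emb[a] - emb[b] for a, b in pairs], axis=0)
    return v / np.linalg.norm(v)

def debias(vec, v):
    """Remove the component of vec along the bias direction v."""
    return vec - np.dot(vec, v) * v

emb = {w: np.random.randn(300) for w in ["he", "she", "john", "mary", "doctor"]}
v = bias_direction(emb, [("he", "she"), ("john", "mary")])
emb_debiased = {w: debias(x, v) for w, x in emb.items()}  # 'doctor' loses its gender component
```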

An optical diffractive deep neural network with multiple frequency-channels

Title An optical diffractive deep neural network with multiple frequency-channels
Authors Yingshi Chen, Jinfeng Zhu
Abstract The diffractive deep neural network (DNNet) is a novel machine learning framework based on the modulation of optical transmission. A diffractive network produces predictions at the speed of light, and its architecture is purely passive, with no additional power consumption. We improve the accuracy of the diffractive network by using optical waves at different frequencies. Each layer has multiple frequency channels (optical distributions at different frequencies), and these channels are merged at the output plane to produce the final output. Experiments on the Fashion-MNIST and EMNIST datasets show that multiple frequency channels substantially increase accuracy. We also give a detailed analysis of the differences between DNNet and an MLP: the modulation process in DNNet acts as an optical activation function. We develop an open-source package, ONNet. The source codes are available at https://github.com/closest-git/ONNet.
Tasks
Published 2019-12-23
URL https://arxiv.org/abs/1912.10730v1
PDF https://arxiv.org/pdf/1912.10730v1.pdf
PWC https://paperswithcode.com/paper/an-optical-diffractive-deep-neural-network
Repo https://github.com/closest-git/ONNet
Framework pytorch
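
A toy numerical rendering of the multi-channel idea, not ONNet itself: one field per frequency channel passes through its own phase masks, and the channel intensities are merged at the output plane. Free-space propagation is collapsed into a single FFT purely for illustration:

```python
import numpy as np

def diffractive_channel(field, masks):
    for m in masks:                                  # each layer modulates the phase
        field = np.fft.fft2(field * np.exp(1j * m))  # modulate, then propagate
    return np.abs(field) ** 2                        # detectors measure intensity

rng = np.random.default_rng(0)
x = rng.random((28, 28)).astype(complex)             # input image as the initial field
channels = [[rng.uniform(0, 2 * np.pi, (28, 28)) for _ in range(3)] for _ in range(4)]
output = sum(diffractive_channel(x, masks) for masks in channels)
# Class scores would be read off detector regions of the merged output plane.
```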

Learning Exploration Policies for Navigation

Title Learning Exploration Policies for Navigation
Authors Tao Chen, Saurabh Gupta, Abhinav Gupta
Abstract Numerous past works have tackled the problem of task-driven navigation, but how to effectively explore a new environment to enable a variety of downstream tasks has received much less attention. In this work, we study how agents can autonomously explore realistic and complex 3D environments without the context of task-rewards. We propose a learning-based approach and investigate different policy architectures, reward functions, and training paradigms. We find that policies with spatial memory that are bootstrapped with imitation learning and finally finetuned with coverage rewards derived purely from on-board sensors can be effective at exploring novel environments. We show that our learned exploration policies can explore better than classical approaches based on geometry alone and generic learning-based exploration techniques. Finally, we also show how such task-agnostic exploration can be used for downstream tasks. Code and videos are available at: https://sites.google.com/view/exploration-for-nav.
Tasks Imitation Learning
Published 2019-03-05
URL http://arxiv.org/abs/1903.01959v1
PDF http://arxiv.org/pdf/1903.01959v1.pdf
PWC https://paperswithcode.com/paper/learning-exploration-policies-for-navigation
Repo https://github.com/s-gupta/map-plan-baseline
Framework none
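
The coverage reward the paper derives from on-board sensors can be illustrated with an occupancy grid: the agent is rewarded for each newly observed cell. The grid size and update rule below are assumptions for the sketch:

```python
import numpy as np

class CoverageReward:
    def __init__(self, grid_size=(200, 200)):
        self.seen = np.zeros(grid_size, dtype=bool)

    def step(self, visible_cells):
        """visible_cells: (K, 2) integer grid coordinates currently in view."""
        r, c = visible_cells[:, 0], visible_cells[:, 1]
        reward = int((~self.seen[r, c]).sum())   # count cells seen for the first time
        self.seen[r, c] = True
        return reward

cr = CoverageReward()
print(cr.step(np.array([[10, 12], [10, 13]])))   # 2, then 0 if the same view repeats
```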

FSPool: Learning Set Representations with Featurewise Sort Pooling

Title FSPool: Learning Set Representations with Featurewise Sort Pooling
Authors Yan Zhang, Jonathon Hare, Adam Prügel-Bennett
Abstract Traditional set prediction models can struggle with simple datasets due to an issue we call the responsibility problem. We introduce a pooling method for sets of feature vectors based on sorting features across elements of the set. This can be used to construct a permutation-equivariant auto-encoder that avoids this responsibility problem. On a toy dataset of polygons and a set version of MNIST, we show that such an auto-encoder produces considerably better reconstructions and representations. Replacing the pooling function in existing set encoders with FSPool improves accuracy and convergence speed on a variety of datasets.
Tasks
Published 2019-06-06
URL https://arxiv.org/abs/1906.02795v3
PDF https://arxiv.org/pdf/1906.02795v3.pdf
PWC https://paperswithcode.com/paper/fspool-learning-set-representations-with
Repo https://github.com/Cyanogenoid/fspool
Framework pytorch
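
The pooling operation itself is compact: sort each feature independently across the elements of the set, then take a learned weighted sum over the sorted positions. Below is a fixed-set-size sketch; the released module additionally handles variable set sizes with a continuous relaxation:

```python
import torch
import torch.nn as nn

class FSPoolFixed(nn.Module):
    """Featurewise sort pooling for sets of a fixed size."""
    def __init__(self, n_channels, set_size):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_channels, set_size))

    def forward(self, x):
        """x: (batch, channels, set_size) -> (batch, channels)."""
        sorted_x, _ = x.sort(dim=2, descending=True)   # sort each feature across the set
        return (sorted_x * self.weight).sum(dim=2)     # learned weighting of sorted positions

pool = FSPoolFixed(n_channels=64, set_size=10)
out = pool(torch.randn(4, 64, 10))                     # permutation-invariant by construction
```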

Do Massively Pretrained Language Models Make Better Storytellers?

Title Do Massively Pretrained Language Models Make Better Storytellers?
Authors Abigail See, Aneesh Pappu, Rohun Saxena, Akhila Yerukola, Christopher D. Manning
Abstract Large neural language models trained on massive amounts of text have emerged as a formidable strategy for Natural Language Understanding tasks. However, the strength of these models as Natural Language Generators is less clear. Though anecdotal evidence suggests that these models generate better quality text, there has been no detailed study characterizing their generation abilities. In this work, we compare the performance of an extensively pretrained model, OpenAI GPT2-117 (Radford et al., 2019), to a state-of-the-art neural story generation model (Fan et al., 2018). By evaluating the generated text across a wide variety of automatic metrics, we characterize the ways in which pretrained models do, and do not, make better storytellers. We find that although GPT2-117 conditions more strongly on context, is more sensitive to ordering of events, and uses more unusual words, it is just as likely to produce repetitive and under-diverse text when using likelihood-maximizing decoding algorithms.
Tasks
Published 2019-09-24
URL https://arxiv.org/abs/1909.10705v1
PDF https://arxiv.org/pdf/1909.10705v1.pdf
PWC https://paperswithcode.com/paper/do-massively-pretrained-language-models-make
Repo https://github.com/abisee/story-generation-eval
Framework pytorch
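
Studies like this lean on automatic diversity metrics; a representative (and generic) one is distinct-n, the fraction of unique n-grams in a generated text, which directly probes the repetition issue the paper reports:

```python
def distinct_n(tokens, n=2):
    """Fraction of n-grams that are unique; lower means more repetitive text."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / max(len(ngrams), 1)

story = "the cat sat on the mat and the cat sat again".split()
print(distinct_n(story, 1), distinct_n(story, 2))
```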

Improving Collaborative Metric Learning with Efficient Negative Sampling

Title Improving Collaborative Metric Learning with Efficient Negative Sampling
Authors Viet-Anh Tran, Romain Hennequin, Jimena Royo-Letelier, Manuel Moussallam
Abstract Distance metric learning based on triplet loss has been applied with success in a wide range of applications such as face recognition, image retrieval, speaker change detection and, recently, recommendation with the CML model. However, as we show in this article, CML requires large batches to work reasonably well because its uniform negative sampling strategy for selecting triplets is too simplistic. Due to memory limitations, this makes it difficult to scale in high-dimensional scenarios. To alleviate this problem, we propose a 2-stage negative sampling strategy which finds triplets that are highly informative for learning. Our strategy allows CML to work effectively in terms of accuracy and popularity bias, even when the batch size is an order of magnitude smaller than what would be needed with the default uniform sampling. We demonstrate the suitability of the proposed strategy for recommendation and exhibit consistent positive results across various datasets.
Tasks Face Recognition, Image Retrieval, Metric Learning
Published 2019-09-24
URL https://arxiv.org/abs/1909.10912v1
PDF https://arxiv.org/pdf/1909.10912v1.pdf
PWC https://paperswithcode.com/paper/improving-collaborative-metric-learning-with
Repo https://github.com/deezer/sigir2019-2stagesampling
Framework tf
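
A common instantiation of a 2-stage sampler is sketched below: draw a small uniform candidate pool, then keep the candidate closest to the anchor as the most informative negative. This shows the generic pattern, not necessarily the paper's exact sampler:

```python
import torch

def two_stage_negative(anchor, item_emb, pool_size=100):
    """anchor: (d,) embedding; item_emb: (n_items, d). Returns one negative index."""
    cand = torch.randint(0, item_emb.size(0), (pool_size,))   # stage 1: uniform pool
    d = ((item_emb[cand] - anchor) ** 2).sum(dim=1)           # distances to the anchor
    return cand[d.argmin()]                                   # stage 2: hardest candidate

item_emb = torch.randn(10_000, 64)
neg = two_stage_negative(torch.randn(64), item_emb)
```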

Fast and Simple Natural-Gradient Variational Inference with Mixture of Exponential-family Approximations

Title Fast and Simple Natural-Gradient Variational Inference with Mixture of Exponential-family Approximations
Authors Wu Lin, Mohammad Emtiyaz Khan, Mark Schmidt
Abstract Natural-gradient methods enable fast and simple algorithms for variational inference, but due to computational difficulties, their use is mostly limited to \emph{minimal} exponential-family (EF) approximations. In this paper, we extend their application to estimate \emph{structured} approximations such as mixtures of EF distributions. Such approximations can fit complex, multimodal posterior distributions and are generally more accurate than unimodal EF approximations. By using a \emph{minimal conditional-EF} representation of such approximations, we derive simple natural-gradient updates. Our empirical results demonstrate a faster convergence of our natural-gradient method compared to black-box gradient-based methods with reparameterization gradients. Our work expands the scope of natural gradients for Bayesian inference and makes them more widely applicable than before.
Tasks Bayesian Inference
Published 2019-06-07
URL https://arxiv.org/abs/1906.02914v2
PDF https://arxiv.org/pdf/1906.02914v2.pdf
PWC https://paperswithcode.com/paper/fast-and-simple-natural-gradient-variational
Repo https://github.com/yorkerlin/VB-MixEF
Framework none
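
For orientation, the generic natural-gradient step this line of work builds on replaces the Euclidean gradient with one preconditioned by the Fisher information of the variational distribution $q_\lambda$ (written here in standard form, with step size $\rho$; the paper's contribution is making such updates simple for mixture approximations):

```latex
\lambda_{t+1} = \lambda_t + \rho\, F(\lambda_t)^{-1} \nabla_{\lambda} \mathcal{L}(\lambda_t),
\qquad
F(\lambda) = \mathbb{E}_{q_\lambda}\!\left[\nabla_\lambda \log q_\lambda(z)\,
\nabla_\lambda \log q_\lambda(z)^{\top}\right].
```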

Nasal Patches and Curves for Expression-robust 3D Face Recognition

Title Nasal Patches and Curves for Expression-robust 3D Face Recognition
Authors Mehryar Emambakhsh, Adrian Evans
Abstract The potential of the nasal region for expression-robust 3D face recognition is thoroughly investigated by a novel five-step algorithm. First, the nose tip location is coarsely detected and the face is segmented, aligned and the nasal region cropped. Then, a very accurate and consistent nasal landmarking algorithm detects seven keypoints on the nasal region. In the third step, a feature extraction algorithm based on the surface normals of Gabor-wavelet filtered depth maps is utilised and, then, a set of spherical patches and curves are localised over the nasal region to provide the feature descriptors. The last step applies a genetic algorithm-based feature selector to detect the most stable patches and curves over different facial expressions. The algorithm provides the highest reported nasal region-based recognition ranks on the FRGC, Bosphorus and BU-3DFE datasets. The results are comparable with, and in many cases better than, many state-of-the-art 3D face recognition algorithms, which use the whole facial domain. The proposed method does not rely on sophisticated alignment or denoising steps, is very robust when only one sample per subject is used in the gallery, and does not require a training step for the landmarking algorithm. https://github.com/mehryaragha/NoseBiometrics
Tasks Denoising, Face Recognition
Published 2019-01-01
URL http://arxiv.org/abs/1901.00206v1
PDF http://arxiv.org/pdf/1901.00206v1.pdf
PWC https://paperswithcode.com/paper/nasal-patches-and-curves-for-expression
Repo https://github.com/mehryaragha/NoseBiometrics
Framework none

DEDPUL: Difference-of-Estimated-Densities-based Positive-Unlabeled Learning

Title DEDPUL: Difference-of-Estimated-Densities-based Positive-Unlabeled Learning
Authors Dmitry Ivanov
Abstract Positive-Unlabeled Learning is an analog of supervised binary classification for the case when the negative (N) sample in the training set is noisy, i.e. is contaminated with latent instances of the positive (P) class and hence is unlabeled (U). The objective is to classify U, which first requires identifying the mixing proportions of P and N in U. Recently, unbiased Risk Estimation has been successfully applied to train classifiers on PU data, achieving state-of-the-art results (Kiryo et al., 2017). This approach, however, exhibits two major bottlenecks: first, the mixing proportions are assumed to be known in the domain or estimated with additional methods; second, the approach relies on the classifier being a neural network. In this paper, we propose DEDPUL, a method that solves PU Learning without the aforementioned issues. The mechanism behind DEDPUL is to apply a computationally cheap post-processing procedure to the predictions of any classifier trained to distinguish P from U. Instead of assuming the proportions to be identified, DEDPUL estimates them while classifying U. Experiments show that DEDPUL outperforms the Risk Estimation approach when neural networks are used as classifiers. On some data sets, performance can be improved even further if ensembles of trees are used as classifiers instead. At the same time, DEDPUL also outperforms the current state-of-the-art in Mixture Proportion Estimation, especially when the P and U distributions are similar.
Tasks Density Estimation
Published 2019-02-19
URL https://arxiv.org/abs/1902.06965v4
PDF https://arxiv.org/pdf/1902.06965v4.pdf
PWC https://paperswithcode.com/paper/dedpul-method-for-mixture-proportion
Repo https://github.com/dimonenka/DEDPUL
Framework pytorch
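
The density-based post-processing can be sketched with two kernel density estimates over a classifier's scores. It rests on the mixture identity $f_U = \alpha f_P + (1-\alpha) f_N$, which implies $f_U(x)/f_P(x) \geq \alpha$ everywhere; the bandwidths and the crude min-ratio estimator below are simplifications, not the paper's exact procedure:

```python
import numpy as np
from scipy.stats import gaussian_kde

def dedpul_like(scores_p, scores_u):
    """scores_*: 1-D arrays of a P-vs-U classifier's outputs on P and on U."""
    f_p = gaussian_kde(scores_p)                       # score density on labeled positives
    f_u = gaussian_kde(scores_u)                       # score density on the unlabeled sample
    r = f_u(scores_u) / np.maximum(f_p(scores_u), 1e-12)
    alpha = float(np.clip(r.min(), 0.0, 1.0))          # f_u/f_p >= alpha everywhere
    posterior = np.clip(alpha / np.maximum(r, 1e-12), 0.0, 1.0)  # P(positive | x)
    return alpha, posterior

alpha, post = dedpul_like(np.random.beta(5, 2, 500), np.random.beta(2, 2, 500))
```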

ETNLP: a visual-aided systematic approach to select pre-trained embeddings for a downstream task

Title ETNLP: a visual-aided systematic approach to select pre-trained embeddings for a downstream task
Authors Xuan-Son Vu, Thanh Vu, Son N. Tran, Lili Jiang
Abstract Given many recent advanced embedding models, selecting pre-trained word embedding (a.k.a., word representation) models best fit for a specific downstream task is non-trivial. In this paper, we propose a systematic approach, called ETNLP, for extracting, evaluating, and visualizing multiple sets of pre-trained word embeddings to determine which embeddings should be used in a downstream task. For extraction, we provide a method to extract subsets of the embeddings to be used in the downstream task. For evaluation, we analyse the quality of pre-trained embeddings using an input word analogy list. Finally, we visualize the word representations in the embedding space to explore the embedded words interactively. We demonstrate the effectiveness of the proposed approach on our pre-trained word embedding models in Vietnamese to select which models are suitable for a named entity recognition (NER) task. Specifically, we create a large Vietnamese word analogy list to evaluate and select the pre-trained embedding models for the task. We then utilize the selected embeddings for the NER task and achieve the new state-of-the-art results on the task benchmark dataset. We also apply the approach to another downstream task of privacy-guaranteed embedding selection, and show that it helps users quickly select the most suitable embeddings. In addition, we create an open-source system using the proposed systematic approach to facilitate similar studies on other NLP tasks. The source code and data are available at https://github.com/vietnlp/etnlp.
Tasks Named Entity Recognition, Word Embeddings
Published 2019-03-11
URL https://arxiv.org/abs/1903.04433v2
PDF https://arxiv.org/pdf/1903.04433v2.pdf
PWC https://paperswithcode.com/paper/etnlp-a-toolkit-for-extraction-evaluation-and
Repo https://github.com/vietnlp/etnlp
Framework none
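
The analogy-based evaluation reduces to a nearest-neighbor query: does vec(b) - vec(a) + vec(c) land closest to the expected word d? A generic cosine-similarity sketch over a dict of vectors (illustrative, not the toolkit's API):

```python
import numpy as np

def solve_analogy(emb, a, b, c):
    """Return the word whose vector is most similar to emb[b] - emb[a] + emb[c]."""
    q = emb[b] - emb[a] + emb[c]
    q /= np.linalg.norm(q)
    candidates = ((w, q @ v / np.linalg.norm(v)) for w, v in emb.items()
                  if w not in (a, b, c))               # exclude the query words
    return max(candidates, key=lambda kv: kv[1])[0]

# Accuracy on an analogy list = fraction of quadruples (a, b, c, d)
# for which solve_analogy(emb, a, b, c) == d.
```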

Relaxing Bijectivity Constraints with Continuously Indexed Normalising Flows

Title Relaxing Bijectivity Constraints with Continuously Indexed Normalising Flows
Authors Rob Cornish, Anthony L. Caterini, George Deligiannidis, Arnaud Doucet
Abstract We show that Normalising Flows become pathological when used to model targets whose supports have complicated topologies. In this scenario, we prove that a flow must become arbitrarily numerically noninvertible in order to approximate the target closely. This result has implications for all flow-based models, and especially Residual Flows (ResFlows), which explicitly control the Lipschitz constant of the bijection used. To address this, we propose Continuously Indexed Flows (CIFs), which replace the single bijection used by normalising flows with a continuously indexed family of bijections, and which can intuitively “clean up” mass that would otherwise be misplaced by a single bijection. We show that CIFs can exactly match the support of a target even when its topology differs from the prior, and obtain empirically better performance for a variety of models and benchmarks.
Tasks Density Estimation, Normalising Flows
Published 2019-09-30
URL https://arxiv.org/abs/1909.13833v2
PDF https://arxiv.org/pdf/1909.13833v2.pdf
PWC https://paperswithcode.com/paper/localised-generative-flows-1
Repo https://github.com/jrmcornish/lgf
Framework pytorch

PointRNN: Point Recurrent Neural Network for Moving Point Cloud Processing

Title PointRNN: Point Recurrent Neural Network for Moving Point Cloud Processing
Authors Hehe Fan, Yi Yang
Abstract In this paper, we introduce a Point Recurrent Neural Network (PointRNN) for moving point cloud processing. At each time step, PointRNN takes point coordinates $\boldsymbol{P} \in \mathbb{R}^{n \times 3}$ and point features $\boldsymbol{X} \in \mathbb{R}^{n \times d}$ as input ($n$ and $d$ denote the number of points and the number of feature channels, respectively). The state of PointRNN is composed of point coordinates $\boldsymbol{P}$ and point states $\boldsymbol{S} \in \mathbb{R}^{n \times d'}$ ($d'$ denotes the number of state channels). Similarly, the output of PointRNN is composed of $\boldsymbol{P}$ and new point features $\boldsymbol{Y} \in \mathbb{R}^{n \times d''}$ ($d''$ denotes the number of new feature channels). Since point clouds are orderless, point features and states from two time steps cannot be operated on directly. Therefore, a point-based spatiotemporally-local correlation is adopted to aggregate point features and states according to point coordinates. We further propose two variants of PointRNN, i.e., Point Gated Recurrent Unit (PointGRU) and Point Long Short-Term Memory (PointLSTM). We apply PointRNN, PointGRU and PointLSTM to moving point cloud prediction, which aims to predict the future trajectories of points in a set given their history movements. Experimental results show that PointRNN, PointGRU and PointLSTM are able to produce correct predictions on both synthetic and real-world datasets, demonstrating their ability to model point cloud sequences. The code has been released at \url{https://github.com/hehefan/PointRNN}.
Tasks Moving Point Cloud Processing
Published 2019-10-18
URL https://arxiv.org/abs/1910.08287v2
PDF https://arxiv.org/pdf/1910.08287v2.pdf
PWC https://paperswithcode.com/paper/pointrnn-point-recurrent-neural-network-for
Repo https://github.com/hehefan/PointRNN-PyTorch
Framework pytorch
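
The point-based spatiotemporally-local correlation in the abstract can be sketched as a k-nearest-neighbor gather between consecutive frames; the shapes, k, and the max-pool aggregation are assumptions, not the released code:

```python
import torch

def correlate_states(p_t, p_prev, s_prev, k=4):
    """p_t: (n, 3) current coordinates; p_prev: (m, 3); s_prev: (m, d) previous states."""
    idx = torch.cdist(p_t, p_prev).topk(k, largest=False).indices  # (n, k) neighbors at t-1
    disp = p_prev[idx] - p_t[:, None, :]              # (n, k, 3) relative displacement
    neigh = torch.cat([s_prev[idx], disp], dim=2)     # (n, k, d + 3) per-pair inputs
    return neigh.max(dim=1).values                    # order-invariant aggregation, (n, d + 3)

s_agg = correlate_states(torch.randn(256, 3), torch.randn(256, 3), torch.randn(256, 16))
# A shared MLP on the per-pair inputs before pooling would complete the correlation unit.
```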