Paper Group ANR 603
RNN-based Generative Model for Fine-Grained Sketching. Joint shape learning and segmentation for medical images using a minimalistic deep network. NEMO: Future Object Localization Using Noisy Ego Priors. Credit Card Fraud Detection Using Autoencoder Neural Network. Learning Predicates as Functions to Enable Few-shot Scene Graph Prediction. Contextual Joint Factor Acoustic Embeddings. The Noise Collector for sparse recovery in high dimensions. R2D2: Repeatable and Reliable Detector and Descriptor. Learning Variations in Human Motion via Mix-and-Match Perturbation. Part-based Multi-stream Model for Vehicle Searching. Multitask Learning For Different Subword Segmentations In Neural Machine Translation. Deep Kinematic Models for Physically Realistic Prediction of Vehicle Trajectories. SLAM-based Integrity Monitoring Using GPS and Fish-eye Camera. Embeddings for DNN speaker adaptive training. How to Host a Data Competition: Statistical Advice for Design and Analysis of a Data Competition.
RNN-based Generative Model for Fine-Grained Sketching
Title | RNN-based Generative Model for Fine-Grained Sketching |
Authors | Andrin Jenal, Nikolay Savinov, Torsten Sattler, Gaurav Chaurasia |
Abstract | Deep generative models have shown great promise when it comes to synthesising novel images. While they can generate images that look convincing at a higher level, generating fine-grained details is still a challenge. In order to foster research on more powerful generative approaches, this paper proposes a novel task: generative modelling of 2D tree skeletons. Trees are an interesting shape class because they exhibit complexity and variations that are well-suited to measuring the ability of a generative model to generate detailed structures. We propose a new dataset for this task and demonstrate that state-of-the-art generative models fail to synthesise realistic images on our benchmark, even though they perform well on current datasets like MNIST digits. Motivated by these results, we propose a novel network architecture that combines a variational autoencoder based on Recurrent Neural Networks with a convolutional discriminator. The network, error metrics and training procedure are adapted to the task of fine-grained sketching. Through quantitative and perceptual experiments, we show that our model outperforms previous work and that our dataset is a valuable benchmark for generative models. We will make our dataset publicly available. |
Tasks | |
Published | 2019-01-13 |
URL | http://arxiv.org/abs/1901.03991v1 |
http://arxiv.org/pdf/1901.03991v1.pdf | |
PWC | https://paperswithcode.com/paper/rnn-based-generative-model-for-fine-grained |
Repo | |
Framework | |
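The entry above pairs a recurrent variational autoencoder with a convolutional discriminator. Below is a minimal PyTorch sketch of that general combination; the GRU choice, all layer sizes, and the 64x64 render resolution are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SketchVAE(nn.Module):
    """Recurrent VAE over stroke-point sequences (all sizes are assumptions)."""
    def __init__(self, point_dim=2, hidden=128, latent=32):
        super().__init__()
        self.encoder = nn.GRU(point_dim, hidden, batch_first=True)
        self.to_mu = nn.Linear(hidden, latent)
        self.to_logvar = nn.Linear(hidden, latent)
        self.z_to_h = nn.Linear(latent, hidden)
        self.decoder = nn.GRU(point_dim, hidden, batch_first=True)
        self.to_point = nn.Linear(hidden, point_dim)

    def forward(self, strokes):                     # strokes: (B, T, 2)
        _, h = self.encoder(strokes)
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterisation
        h0 = self.z_to_h(z).unsqueeze(0)
        out, _ = self.decoder(strokes, h0)          # teacher-forced reconstruction
        return self.to_point(out), mu, logvar

class ConvDiscriminator(nn.Module):
    """Scores rendered 64x64 sketches as real or generated."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Flatten(), nn.Linear(64 * 16 * 16, 1))
    def forward(self, img):
        return self.net(img)

recon, mu, logvar = SketchVAE()(torch.randn(4, 50, 2))
print(recon.shape)   # (4, 50, 2)
```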
Joint shape learning and segmentation for medical images using a minimalistic deep network
Title | Joint shape learning and segmentation for medical images using a minimalistic deep network |
Authors | Balamurali Murugesan, Kaushik Sarveswaran, Sharath M Shankaranarayana, Keerthi Ram, Mohanasankar Sivaprakasam |
Abstract | Recently, state-of-the-art results have been achieved in semantic segmentation using fully convolutional networks (FCNs). Most of these networks employ encoder-decoder style architectures similar to U-Net and are trained with images and the corresponding segmentation maps as a pixel-wise classification task. Such frameworks only exploit class information through the ground truth segmentation maps. In this paper, we propose a multi-task learning framework with the main aim of exploiting structural and spatial information along with the class information. We modify the decoder part of the FCN to exploit structural information in addition to class information, while keeping the number of network parameters as low as possible. We obtain the structural information in either of two ways: i) using the contour map and ii) using the distance map, both of which can be derived from ground truth segmentation maps with no additional annotation cost. We also explore different ways in which distance maps can be computed and study their effects on segmentation performance. We experiment extensively on two different medical image segmentation applications: i) optic disc and cup segmentation in color fundus images and ii) polyp segmentation in endoscopic images. Through our experiments, we report results comparable to, and in some cases better than, current state-of-the-art architectures, with roughly a 2x reduction in the number of parameters. |
Tasks | Medical Image Segmentation, Multi-Task Learning, Semantic Segmentation |
Published | 2019-01-25 |
URL | http://arxiv.org/abs/1901.08824v1 |
http://arxiv.org/pdf/1901.08824v1.pdf | |
PWC | https://paperswithcode.com/paper/joint-shape-learning-and-segmentation-for |
Repo | |
Framework | |
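Both auxiliary targets named in the abstract, contour maps and distance maps, can be derived from the ground-truth mask at no extra annotation cost. Here is a small scipy sketch of one plausible way to compute them; the signed-distance convention is an assumption, since the paper studies several distance-map variants.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt, binary_erosion

def contour_map(mask):
    """1-pixel-wide boundary of a binary mask: the mask minus its erosion."""
    m = mask.astype(bool)
    return m & ~binary_erosion(m)

def signed_distance_map(mask):
    """Signed Euclidean distance: positive inside the object, negative outside."""
    m = mask.astype(bool)
    return distance_transform_edt(m) - distance_transform_edt(~m)

mask = np.zeros((64, 64), dtype=np.uint8)
mask[20:40, 20:40] = 1                       # toy square "organ"
print(contour_map(mask).sum())               # number of boundary pixels
print(signed_distance_map(mask).max())       # deepest interior distance
```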
NEMO: Future Object Localization Using Noisy Ego Priors
Title | NEMO: Future Object Localization Using Noisy Ego Priors |
Authors | Srikanth Malla, Isht Dwivedi, Behzad Dariush, Chiho Choi |
Abstract | Predicting the future trajectory of agents from visual observations is an important problem for the realization of safe and effective navigation of autonomous systems in dynamic environments. This paper focuses on two aspects of future trajectory forecasting that are particularly relevant for mobile platforms: 1) modeling the uncertainty of predictions, particularly from egocentric views, where uncertainty in the interactive reactions and behaviors of other agents must account for the uncertainty in the ego-motion, and 2) modeling the multi-modal nature of the problem, which is particularly prevalent at junctions in urban traffic scenes. To address these problems in a unified approach, we propose NEMO (Noisy Ego MOtion priors for future object localization) for future forecasting of agents in the egocentric view. In the proposed approach, a predictive distribution of future forecasts is jointly modeled with the uncertainty of predictions. For this, we divide the problem into two tasks: future ego-motion prediction and future object localization. We first model the multi-modal distribution of future ego-motion with uncertainty estimates. The resulting distribution of ego-behavior is used to sample multiple modes of future ego-motion. Then, each modality is used as a prior to understand the interactions between the ego-vehicle and the target agent. We predict the multi-modal future locations of the target from individual modes of the ego-vehicle while modeling the uncertainty of the target's behavior. We extensively evaluate the proposed framework using the publicly available benchmark dataset (HEV-I) supplemented with odometry data from an Inertial Measurement Unit (IMU). |
Tasks | motion prediction, Object Localization |
Published | 2019-09-17 |
URL | https://arxiv.org/abs/1909.08150v2 |
https://arxiv.org/pdf/1909.08150v2.pdf | |
PWC | https://paperswithcode.com/paper/nemo-future-object-localization-using-noisy |
Repo | |
Framework | |
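The core pipeline above, sampling several modes of noisy ego-motion and using each sample as a prior for localizing a target, can be illustrated with a toy numpy sketch. The mixture parameters and the rigid-transform localization below are made-up assumptions standing in for the paper's learned networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed output of an ego-motion network: a small mixture over future ego
# displacement (dx, dy, dyaw), each mode a diagonal Gaussian with a weight.
modes = [
    {"mean": np.array([5.0, 0.0, 0.00]), "std": np.array([0.5, 0.2, 0.02]), "w": 0.6},
    {"mean": np.array([3.0, 2.0, 0.30]), "std": np.array([0.6, 0.4, 0.05]), "w": 0.4},
]

def sample_ego_priors(n):
    """Draw n noisy ego-motion samples from the mixture."""
    idx = rng.choice(len(modes), size=n, p=[m["w"] for m in modes])
    return np.stack([modes[i]["mean"] + rng.normal(0.0, modes[i]["std"]) for i in idx])

def localize_target(target_xy, ego):
    """Express a (here static) target in the sampled future ego frame."""
    dx, dy, dyaw = ego
    c, s = np.cos(-dyaw), np.sin(-dyaw)
    p = target_xy - np.array([dx, dy])
    return np.array([c * p[0] - s * p[1], s * p[0] + c * p[1]])

futures = np.stack([localize_target(np.array([10.0, 1.0]), e)
                    for e in sample_ego_priors(200)])
print(futures.mean(axis=0), futures.std(axis=0))  # spread mirrors ego uncertainty
```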
Credit Card Fraud Detection Using Autoencoder Neural Network
Title | Credit Card Fraud Detection Using Autoencoder Neural Network |
Authors | Junyi Zou, Jinliang Zhang, Ping Jiang |
Abstract | Imbalanced data classification has always been a popular topic in machine learning research. To balance the samples between the majority and minority classes, oversampling algorithms are used to synthesize new minority-class samples, but they can introduce noise. To address this noise problem, this paper proposes a denoising autoencoder neural network (DAE) algorithm that not only oversamples minority-class samples through misclassification cost, but also denoises and classifies the sampled dataset. In experiments comparing against a denoising autoencoder with a plain oversampling step and against traditional fully connected neural networks, the results show that the proposed algorithm improves the classification accuracy of the minority class on imbalanced datasets. |
Tasks | Denoising, Fraud Detection |
Published | 2019-08-30 |
URL | https://arxiv.org/abs/1908.11553v1 |
https://arxiv.org/pdf/1908.11553v1.pdf | |
PWC | https://paperswithcode.com/paper/credit-card-fraud-detection-using-autoencoder |
Repo | |
Framework | |
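A minimal PyTorch sketch of the general idea, a denoising autoencoder whose code also feeds a classifier, with the misclassification cost expressed as a class weight. The feature count, hidden size, noise level, and the 50x fraud weight are all assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

class FraudDAE(nn.Module):
    """Denoising autoencoder with a classification head (sizes are assumptions)."""
    def __init__(self, n_features=30, hidden=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, n_features)
        self.classifier = nn.Linear(hidden, 2)

    def forward(self, x, noise_std=0.1):
        code = self.encoder(x + noise_std * torch.randn_like(x))  # corrupt, encode
        return self.decoder(code), self.classifier(code)

model = FraudDAE()
# Misclassification cost: weight the rare fraud class far above the majority.
cls_loss = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 50.0]))
rec_loss = nn.MSELoss()

x = torch.randn(64, 30)
y = (torch.rand(64) < 0.05).long()           # ~5% simulated fraud labels
recon, logits = model(x)
loss = rec_loss(recon, x) + cls_loss(logits, y)
loss.backward()
```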
Learning Predicates as Functions to Enable Few-shot Scene Graph Prediction
Title | Learning Predicates as Functions to Enable Few-shot Scene Graph Prediction |
Authors | Apoorva Dornadula, Austin Narcomey, Ranjay Krishna, Michael Bernstein, Li Fei-Fei |
Abstract | Scene graph prediction — classifying the set of objects and predicates in a visual scene — requires substantial training data. However, most predicates occur only a handful of times, making them difficult to learn. We introduce the first scene graph prediction model that supports few-shot learning of predicates. Existing scene graph generation models represent objects using pretrained object detectors or word embeddings that capture semantic object information at the cost of encoding information about which relationships they afford. As a result, these object representations are unable to generalize to new few-shot relationships. We introduce a framework that induces object representations structured according to their visual relationships. Unlike past methods, our framework embeds objects that afford similar relationships closer together. This property allows our model to perform well in the few-shot setting. For example, applying the ‘riding’ predicate transformation to ‘person’ modifies the representation towards objects like ‘skateboard’ and ‘horse’ that enable riding. We generate object representations by learning predicates trained as message passing functions within a new graph convolution framework. The object representations are used to build few-shot predicate classifiers for rare predicates with as few as 1 labeled example. We achieve a 5-shot performance of 22.70 recall@50, a 3.7-point increase over strong transfer learning baselines. |
Tasks | Few-Shot Learning, Graph Generation, Scene Graph Generation, Transfer Learning, Word Embeddings |
Published | 2019-06-12 |
URL | https://arxiv.org/abs/1906.04876v4 |
https://arxiv.org/pdf/1906.04876v4.pdf | |
PWC | https://paperswithcode.com/paper/visual-relationships-as-functions-enabling |
Repo | |
Framework | |
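The "predicates as functions" idea can be sketched in a few lines of PyTorch: a predicate is a learned transformation over object embeddings, so applying 'riding' to 'person' should land near rideable objects. The MLP form, vocabulary size, and similarity scoring below are assumptions, not the paper's graph convolution framework.

```python
import torch
import torch.nn as nn

embed_dim = 64
objects = nn.Embedding(100, embed_dim)       # hypothetical object vocabulary

class PredicateFn(nn.Module):
    """A predicate as a learned transformation on subject embeddings."""
    def __init__(self):
        super().__init__()
        self.fn = nn.Sequential(nn.Linear(embed_dim, embed_dim), nn.ReLU(),
                                nn.Linear(embed_dim, embed_dim))
    def forward(self, subj):
        return self.fn(subj)

riding = PredicateFn()
person = objects(torch.tensor([0]))
# Training (not shown) would pull riding(person) toward 'rideable' objects, so
# a few-shot predicate classifier can score candidates by embedding similarity:
scores = torch.cosine_similarity(riding(person), objects.weight, dim=-1)
print(scores.topk(5).indices)   # nearest objects under the 'riding' transform
```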
Contextual Joint Factor Acoustic Embeddings
Title | Contextual Joint Factor Acoustic Embeddings |
Authors | Yanpei Shi, Qiang Huang, Thomas Hain |
Abstract | Embedding acoustic information into fixed-length representations is of interest for a whole range of applications in speech and audio technology. We propose two novel unsupervised approaches to generate acoustic embeddings by modelling acoustic context. The first approach is a contextual joint factor synthesis encoder, where the encoder in an encoder/decoder framework is trained to extract joint factors from surrounding audio frames to best generate the target output. The second approach is a contextual joint factor analysis encoder, where the encoder is trained to analyse joint factors from the source signal that correlate best with the neighbouring audio. To evaluate the effectiveness of our approaches against prior work, we chose two tasks, phone classification and speaker recognition, and tested on different TIMIT data sets. Experimental results show that one of our proposed approaches outperforms phone classification baselines, yielding a classification accuracy of 74.1%. When using additional out-of-domain data for training, a further 2-3% improvement can be obtained for both the phone classification and speaker recognition tasks. |
Tasks | Speaker Recognition |
Published | 2019-10-16 |
URL | https://arxiv.org/abs/1910.07601v1 |
https://arxiv.org/pdf/1910.07601v1.pdf | |
PWC | https://paperswithcode.com/paper/contextual-joint-factor-acoustic-embeddings |
Repo | |
Framework | |
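A rough PyTorch sketch of the synthesis variant: a centre frame is encoded into a fixed-length embedding that must regenerate its surrounding frames. Feature dimensionality, context width, and the single-linear decoder are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class CJFSEncoder(nn.Module):
    """Synthesis variant: embed a centre frame, then generate +/- c neighbours."""
    def __init__(self, feat_dim=40, emb_dim=128, context=4):
        super().__init__()
        self.feat_dim, self.context = feat_dim, context
        self.encode = nn.Sequential(nn.Linear(feat_dim, emb_dim), nn.Tanh())
        self.decode = nn.Linear(emb_dim, feat_dim * 2 * context)

    def forward(self, centre):                      # centre: (B, feat_dim)
        emb = self.encode(centre)                   # fixed-length acoustic embedding
        pred = self.decode(emb).view(-1, 2 * self.context, self.feat_dim)
        return emb, pred

model = CJFSEncoder()
frames = torch.randn(8, 9, 40)                      # 9-frame windows of features
centre = frames[:, 4]
neighbours = torch.cat([frames[:, :4], frames[:, 5:]], dim=1)
emb, pred = model(centre)
loss = nn.functional.mse_loss(pred, neighbours)     # embedding must explain context
```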
The Noise Collector for sparse recovery in high dimensions
Title | The Noise Collector for sparse recovery in high dimensions |
Authors | Miguel Moscoso, Alexei Novikov, George Papanicolaou, Chrysoula Tsogka |
Abstract | The ability to detect sparse signals from noisy high-dimensional data is a top priority in modern science and engineering. A sparse solution of the linear system $A \rho = b_0$ can be found efficiently with an $l_1$-norm minimization approach if the data is noiseless. Detection of the signal’s support from data corrupted by noise is still a challenging problem, especially if the level of noise must be estimated. We propose a new efficient approach that does not require any parameter estimation. We introduce the Noise Collector (NC) matrix $C$ and solve an augmented system $A \rho + C \eta = b_0 + e$, where $e$ is the noise. We show that the $l_1$-norm minimal solution of the augmented system has zero false discovery rate for any level of noise and with probability that tends to one as the dimension of $b_0$ increases to infinity. We also obtain exact support recovery if the noise is not too large, and develop a Fast Noise Collector Algorithm which makes the computational cost of solving the augmented system comparable to that of the original one. Finally, we demonstrate the effectiveness of the method in applications to passive array imaging. |
Tasks | |
Published | 2019-08-05 |
URL | https://arxiv.org/abs/1908.04412v1 |
https://arxiv.org/pdf/1908.04412v1.pdf | |
PWC | https://paperswithcode.com/paper/the-noise-collector-for-sparse-recovery-in |
Repo | |
Framework | |
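The augmented system $A \rho + C \eta = b_0 + e$ can be attacked with standard $l_1$ machinery. Below is a toy basis-pursuit formulation solved as a linear program with scipy; using random normalized columns for $C$ and omitting the paper's weighting of the NC block are simplifying assumptions.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, m, k = 40, 80, 120                        # measurements, unknowns, NC columns
A = rng.standard_normal((n, m)); A /= np.linalg.norm(A, axis=0)
C = rng.standard_normal((n, k)); C /= np.linalg.norm(C, axis=0)

rho_true = np.zeros(m); rho_true[[3, 17, 42]] = [1.0, -2.0, 1.5]
b = A @ rho_true + 0.05 * rng.standard_normal(n)     # noisy measurements

# min ||rho||_1 + ||eta||_1  s.t.  A rho + C eta = b, written as an LP via the
# usual split x = x_plus - x_minus into nonnegative parts.
M = np.hstack([A, C])
res = linprog(c=np.ones(2 * (m + k)),
              A_eq=np.hstack([M, -M]), b_eq=b,
              bounds=(0, None), method="highs")
x = res.x[: m + k] - res.x[m + k:]
print(np.flatnonzero(np.abs(x[:m]) > 1e-3))          # recovered support of rho
```

The noise is soaked up by the $\eta$ block, so the support of $\rho$ stays clean without estimating the noise level.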
R2D2: Repeatable and Reliable Detector and Descriptor
Title | R2D2: Repeatable and Reliable Detector and Descriptor |
Authors | Jerome Revaud, Philippe Weinzaepfel, César De Souza, Noe Pion, Gabriela Csurka, Yohann Cabon, Martin Humenberger |
Abstract | Interest point detection and local feature description are fundamental steps in many computer vision applications. Classical methods for these tasks are based on a detect-then-describe paradigm where separate handcrafted methods are used to first identify repeatable keypoints and then represent them with a local descriptor. Neural networks trained with metric learning losses have recently caught up with these techniques, focusing on learning repeatable saliency maps for keypoint detection and learning descriptors at the detected keypoint locations. In this work, we argue that salient regions are not necessarily discriminative, and can therefore harm the performance of the description. Furthermore, we claim that descriptors should be learned only in regions for which matching can be performed with high confidence. We thus propose to jointly learn keypoint detection and description together with a predictor of the local descriptor discriminativeness. This allows us to avoid ambiguous areas and leads to reliable keypoint detections and descriptions. Our detection-and-description approach, trained with self-supervision, can simultaneously output sparse, repeatable and reliable keypoints that outperform state-of-the-art detectors and descriptors on the HPatches dataset. It also establishes a record on the recently released Aachen Day-Night localization dataset. |
Tasks | Interest Point Detection, Keypoint Detection, Metric Learning |
Published | 2019-06-14 |
URL | https://arxiv.org/abs/1906.06195v2 |
https://arxiv.org/pdf/1906.06195v2.pdf | |
PWC | https://paperswithcode.com/paper/r2d2-reliable-and-repeatable-detectors-and |
Repo | |
Framework | |
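The "repeatable and reliable" split corresponds to two dense score maps alongside the descriptor map. A minimal PyTorch sketch of that three-head layout follows; the tiny backbone and layer sizes are assumptions, not R2D2's actual network.

```python
import torch
import torch.nn as nn

class R2D2Like(nn.Module):
    """Backbone with three dense heads: descriptors, repeatability, reliability."""
    def __init__(self, dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, dim, 3, padding=1), nn.ReLU())
        self.repeatability = nn.Conv2d(dim, 1, 1)   # where detections repeat
        self.reliability = nn.Conv2d(dim, 1, 1)     # where descriptors can be trusted
    def forward(self, img):
        feat = self.backbone(img)
        desc = nn.functional.normalize(feat, dim=1)  # per-pixel unit descriptors
        return desc, self.repeatability(feat).sigmoid(), self.reliability(feat).sigmoid()

desc, rep, rel = R2D2Like()(torch.randn(1, 3, 64, 64))
score = rep * rel      # keypoints = local maxima of the joint score map
print(desc.shape, score.shape)
```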
Learning Variations in Human Motion via Mix-and-Match Perturbation
Title | Learning Variations in Human Motion via Mix-and-Match Perturbation |
Authors | Mohammad Sadegh Aliakbarian, Fatemeh Sadat Saleh, Mathieu Salzmann, Lars Petersson, Stephen Gould, Amirhossein Habibian |
Abstract | Human motion prediction is a stochastic process: Given an observed sequence of poses, multiple future motions are plausible. Existing approaches to modeling this stochasticity typically combine a random noise vector with information about the previous poses. This combination, however, is done in a deterministic manner, which gives the network the flexibility to learn to ignore the random noise. In this paper, we introduce an approach to stochastically combine the root of variations with previous pose information, which forces the model to take the noise into account. We exploit this idea for motion prediction by incorporating it into a recurrent encoder-decoder network with a conditional variational autoencoder block that learns to exploit the perturbations. Our experiments demonstrate that our model yields high-quality pose sequences that are much more diverse than those from state-of-the-art stochastic motion prediction techniques. |
Tasks | motion prediction |
Published | 2019-08-02 |
URL | https://arxiv.org/abs/1908.00733v2 |
https://arxiv.org/pdf/1908.00733v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-variations-in-human-motion-via-mix |
Repo | |
Framework | |
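One plausible reading of the mix-and-match perturbation is sketched below: instead of deterministically concatenating noise with the hidden state, a randomly resampled subset of hidden units is overwritten by noise, so the network cannot learn fixed weights that ignore it. The mixing ratio and tensor shapes are assumptions.

```python
import torch

def mix_and_match(hidden, noise, alpha=0.5):
    """Stochastically overwrite a random subset of hidden units with noise.

    Which units carry noise changes on every call, so no fixed set of weights
    can learn to discard the noise channel.
    """
    d = hidden.size(-1)
    idx = torch.randperm(d)[: int(alpha * d)]   # resampled every forward pass
    mixed = hidden.clone()
    mixed[..., idx] = noise[..., idx]
    return mixed

h = torch.randn(16, 128)                        # encoder hidden state
z = torch.randn(16, 128)                        # random noise vector
print(mix_and_match(h, z).shape)
```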
Part-based Multi-stream Model for Vehicle Searching
Title | Part-based Multi-stream Model for Vehicle Searching |
Authors | Ya Sun, Minxian Li, Jianfeng Lu |
Abstract | Due to the enormous demands of public security and intelligent transportation systems, searching for an identical vehicle has become more and more important. Current studies usually treat a vehicle as an integral object and then train a distance metric to measure the similarity between vehicles. However, raw images of distinct vehicles can look nearly identical, and background pixels may disturb the distance metric learning. In this paper, we propose a novel and useful method that segments an original vehicle image into several discriminative foreground parts, each consisting of fine-grained regions called discriminative patches. These parts, combined with the raw image, are fed into the proposed deep learning network, and the similarity of two vehicle images is measured by the Euclidean distance between their FC-layer features. The two main contributions of this paper are as follows. First, we propose a method to estimate whether a patch in a raw vehicle image is discriminative. Second, we design a new Part-based Multi-Stream Model (PMSM) optimized for vehicle retrieval and re-identification tasks. We evaluate the proposed method on the VehicleID dataset, and the experimental results show that our method outperforms the baseline. |
Tasks | Metric Learning |
Published | 2019-11-11 |
URL | https://arxiv.org/abs/1911.04144v1 |
https://arxiv.org/pdf/1911.04144v1.pdf | |
PWC | https://paperswithcode.com/paper/part-based-multi-stream-model-for-vehicle |
Repo | |
Framework | |
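Once the multi-stream network produces an FC-layer feature per image, retrieval reduces to ranking by Euclidean distance. A short numpy sketch of that final step; the 256-dimensional features and gallery size are made-up placeholders for the network's outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
gallery = rng.standard_normal((1000, 256))   # FC features of gallery vehicles
query = rng.standard_normal(256)             # FC feature of the query image

# Similarity is plain Euclidean distance in the learned feature space.
dists = np.linalg.norm(gallery - query, axis=1)
ranking = np.argsort(dists)                  # most similar vehicles first
print(ranking[:10])
```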
Multitask Learning For Different Subword Segmentations In Neural Machine Translation
Title | Multitask Learning For Different Subword Segmentations In Neural Machine Translation |
Authors | Tejas Srinivasan, Ramon Sanabria, Florian Metze |
Abstract | In Neural Machine Translation (NMT) the usage of subwords and characters as source and target units offers a simple and flexible solution for translation of rare and unseen words. However, selecting the optimal subword segmentation involves a trade-off between expressiveness and flexibility, and is language- and dataset-dependent. We present Block Multitask Learning (BMTL), a novel NMT architecture that predicts multiple targets of different granularities simultaneously, removing the need to search for the optimal segmentation strategy. Our multi-task model exhibits improvements of up to 1.7 BLEU points on each decoder over single-task baseline models with the same number of parameters on datasets from two language pairs of IWSLT15 and one from IWSLT19. The multiple hypotheses generated at different granularities can be combined as a post-processing step to give better translations, which improves over hypothesis combination from baseline models while using substantially fewer parameters. |
Tasks | Machine Translation |
Published | 2019-10-27 |
URL | https://arxiv.org/abs/1910.12368v1 |
https://arxiv.org/pdf/1910.12368v1.pdf | |
PWC | https://paperswithcode.com/paper/multitask-learning-for-different-subword |
Repo | |
Framework | |
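Structurally, BMTL amounts to a shared encoder feeding one decoder per target granularity, with the per-granularity losses summed. The sketch below compresses that layout into a few PyTorch lines; the GRU encoder, linear "decoders", and vocabulary sizes are stand-in assumptions, far simpler than the paper's sequence-to-sequence decoders.

```python
import torch
import torch.nn as nn

class BMTLSketch(nn.Module):
    """Shared encoder with one output head per target granularity (sizes assumed)."""
    def __init__(self, src_vocab=8000, char_vocab=100, bpe_vocab=8000, d=256):
        super().__init__()
        self.embed = nn.Embedding(src_vocab, d)
        self.encoder = nn.GRU(d, d, batch_first=True)
        self.char_head = nn.Linear(d, char_vocab)   # character-level targets
        self.bpe_head = nn.Linear(d, bpe_vocab)     # subword-level targets

    def forward(self, src):                          # src: (B, T) token ids
        enc, _ = self.encoder(self.embed(src))
        return self.char_head(enc), self.bpe_head(enc)

char_logits, bpe_logits = BMTLSketch()(torch.randint(0, 8000, (4, 20)))
# Training sums the per-granularity cross-entropies; at inference the paper
# combines hypotheses from the different decoders as a post-processing step.
```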
Deep Kinematic Models for Physically Realistic Prediction of Vehicle Trajectories
Title | Deep Kinematic Models for Physically Realistic Prediction of Vehicle Trajectories |
Authors | Henggang Cui, Thi Nguyen, Fang-Chieh Chou, Tsung-Han Lin, Jeff Schneider, David Bradley, Nemanja Djuric |
Abstract | Self-driving vehicles (SDVs) hold great potential for improving traffic safety and are poised to positively affect the quality of life of millions of people. To unlock this potential, a critical aspect of the autonomous technology is understanding and predicting the future movement of vehicles surrounding the SDV. This work presents a deep-learning-based method for kinematically feasible motion prediction of such traffic actors. Previous work did not explicitly encode vehicle kinematics and instead relied on the models to learn the constraints directly from the data, potentially resulting in kinematically infeasible, suboptimal trajectory predictions. To address this issue, we propose a method that seamlessly combines ideas from AI with physically grounded vehicle motion models. In this way we get the best of both worlds, coupling powerful learning models with strong feasibility guarantees for their outputs. The proposed approach is general and applicable to any type of learning method. Extensive experiments using deep convnets on real-world data strongly indicate its benefits, outperforming the existing state-of-the-art. |
Tasks | motion prediction |
Published | 2019-08-01 |
URL | https://arxiv.org/abs/1908.00219v2 |
https://arxiv.org/pdf/1908.00219v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-kinematic-models-for-physically |
Repo | |
Framework | |
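The feasibility guarantee comes from predicting controls and integrating them through a vehicle motion model rather than regressing positions directly. A minimal numpy sketch of a kinematic bicycle unroll follows; the wheelbase, time step, and control values are illustrative assumptions.

```python
import numpy as np

def unroll_bicycle(x, y, yaw, v, controls, wheelbase=2.8, dt=0.1):
    """Integrate a kinematic bicycle model over predicted (accel, steer) controls.

    Because positions come out of the kinematic layer rather than direct
    regression, every output trajectory is physically feasible by construction.
    """
    traj = []
    for accel, steer in controls:
        x += v * np.cos(yaw) * dt
        y += v * np.sin(yaw) * dt
        yaw += v / wheelbase * np.tan(steer) * dt
        v = max(0.0, v + accel * dt)      # no reversing in this simple sketch
        traj.append((x, y, yaw, v))
    return np.array(traj)

# A network would predict these controls; here they are a made-up example.
controls = [(0.5, 0.02)] * 30             # gentle acceleration, slight left turn
print(unroll_bicycle(0.0, 0.0, 0.0, 10.0, controls)[-1])
```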
SLAM-based Integrity Monitoring Using GPS and Fish-eye Camera
Title | SLAM-based Integrity Monitoring Using GPS and Fish-eye Camera |
Authors | Sriramya Bhamidipati, Grace Xingxin Gao |
Abstract | Urban navigation using GPS and fish-eye camera suffers from multipath effects in GPS measurements and data association errors in pixel intensities across image frames. We propose a Simultaneous Localization and Mapping (SLAM)-based Integrity Monitoring (IM) algorithm to compute the position protection levels while accounting for multiple faults in both GPS and vision. We perform graph optimization using the sequential data of GPS pseudoranges, pixel intensities, vehicle dynamics, and satellite ephemeris to simultaneously localize the vehicle as well as the landmarks, namely GPS satellites and key image pixels in the world frame. We estimate the fault mode vector by analyzing the temporal correlation across the GPS measurement residuals and spatial correlation across the vision intensity residuals. In particular, to detect and isolate the vision faults, we developed a superpixel-based piecewise Random Sample Consensus (RANSAC) technique to perform spatial voting across image pixels. For an estimated fault mode, we compute the protection levels by applying worst-case failure slope analysis to the linearized Graph-SLAM framework. We perform ground vehicle experiments in the semi-urban area of Champaign, IL and have demonstrated the successful detection and isolation of multiple faults. We also validate tighter protection levels and lower localization errors achieved via the proposed algorithm as compared to SLAM-based IM that utilizes only GPS measurements. |
Tasks | Simultaneous Localization and Mapping |
Published | 2019-10-04 |
URL | https://arxiv.org/abs/1910.02165v1 |
https://arxiv.org/pdf/1910.02165v1.pdf | |
PWC | https://paperswithcode.com/paper/slam-based-integrity-monitoring-using-gps-and |
Repo | |
Framework | |
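At the heart of any integrity monitor is a statistical test on measurement residuals. The snippet below is a deliberately simplified stand-in for the paper's temporal/spatial correlation tests: a chi-square check on weighted GPS residuals that flags a multipath-corrupted satellite. The noise sigma, significance level, and residual values are assumptions.

```python
import numpy as np
from scipy.stats import chi2

def detect_faults(residuals, sigma, alpha=1e-3):
    """Flag a measurement set whose weighted residual norm exceeds a
    chi-square threshold (simplified residual-based fault detection)."""
    q = np.sum((residuals / sigma) ** 2)           # test statistic
    threshold = chi2.ppf(1 - alpha, df=len(residuals))
    return q > threshold, q, threshold

# Pseudorange residuals (metres) with one multipath-corrupted satellite.
res = np.array([0.8, -1.1, 0.5, 12.0, -0.7])
print(detect_faults(res, sigma=1.5))
```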
Embeddings for DNN speaker adaptive training
Title | Embeddings for DNN speaker adaptive training |
Authors | Joanna Rownicka, Peter Bell, Steve Renals |
Abstract | In this work, we investigate the use of embeddings for speaker-adaptive training of DNNs (DNN-SAT), focusing on a small amount of adaptation data per speaker. DNN-SAT can be viewed as learning a mapping from each embedding to transformation parameters that are applied to the shared parameters of the DNN. We investigate different approaches to applying these transformations, and find that with a good training strategy, a multi-layer adaptation network applied to all hidden layers is no more effective than a single linear layer acting on the embeddings to transform the input features. In the second part of our work, we evaluate different embeddings (i-vectors, x-vectors and deep CNN embeddings) in an additional speaker recognition task in order to gain insight into what should characterize an embedding for DNN-SAT. We find that the speaker recognition performance of a given representation is not correlated with its ASR performance; in fact, the ability to capture more speech attributes than speaker identity alone was the most important characteristic of the embeddings for effective DNN-SAT ASR. Our best models achieved relative WER gains of 4% and 9% over DNN baselines using speaker-level cepstral mean normalisation (CMN) and a fully speaker-independent model, respectively. |
Tasks | Speaker Recognition |
Published | 2019-09-30 |
URL | https://arxiv.org/abs/1909.13537v1 |
https://arxiv.org/pdf/1909.13537v1.pdf | |
PWC | https://paperswithcode.com/paper/embeddings-for-dnn-speaker-adaptive-training |
Repo | |
Framework | |
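The winning configuration in the abstract, a single linear layer mapping the speaker embedding to a transformation of the input features, is easy to sketch. Below, the embedding produces a per-speaker feature shift applied at every frame; the feature and embedding dimensions, and the shift-only (no scale) form, are assumptions.

```python
import torch
import torch.nn as nn

class EmbeddingSAT(nn.Module):
    """Single linear layer mapping a speaker embedding to an input-feature
    shift, applied identically at every frame (dims are assumptions)."""
    def __init__(self, feat_dim=40, emb_dim=100):
        super().__init__()
        self.to_shift = nn.Linear(emb_dim, feat_dim)

    def forward(self, feats, spk_emb):    # feats: (B, T, 40), spk_emb: (B, 100)
        return feats + self.to_shift(spk_emb).unsqueeze(1)  # adapted features

adapt = EmbeddingSAT()
feats = torch.randn(2, 50, 40)            # fbank frames for two speakers
ivecs = torch.randn(2, 100)               # per-speaker i-vectors
adapted = adapt(feats, ivecs)             # feed to the shared acoustic model
```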
How to Host a Data Competition: Statistical Advice for Design and Analysis of a Data Competition
Title | How to Host a Data Competition: Statistical Advice for Design and Analysis of a Data Competition |
Authors | Christine M. Anderson-Cook, Kary L. Myers, Lu Lu, Michael L. Fugate, Kevin R. Quinlan, Norma Pawley |
Abstract | Data competitions rely on real-time leaderboards to rank competitor entries and stimulate algorithm improvement. While such competitions have become quite popular and prevalent, particularly in supervised learning formats, their implementations by the host are highly variable. Without careful planning, a supervised learning competition is vulnerable to overfitting, where the winning solutions are so closely tuned to the particular set of provided data that they cannot generalize to the underlying problem of interest to the host. This paper outlines some important considerations for strategically designing relevant and informative data sets to maximize the learning outcome from hosting a competition based on our experience. It also describes a post-competition analysis that enables robust and efficient assessment of the strengths and weaknesses of solutions from different competitors, as well as greater understanding of the regions of the input space that are well-solved. The post-competition analysis, which complements the leaderboard, uses exploratory data analysis and generalized linear models (GLMs). The GLMs not only expand the range of results we can explore, they also provide more detailed analysis of individual sub-questions including similarities and differences between algorithms across different types of scenarios, universally easy or hard regions of the input space, and different learning objectives. When coupled with a strategically planned data generation approach, the methods provide richer and more informative summaries to enhance the interpretation of results beyond just the rankings on the leaderboard. The methods are illustrated with a recently completed competition to evaluate algorithms capable of detecting, identifying, and locating radioactive materials in an urban environment. |
Tasks | |
Published | 2019-01-16 |
URL | http://arxiv.org/abs/1901.05356v1 |
http://arxiv.org/pdf/1901.05356v1.pdf | |
PWC | https://paperswithcode.com/paper/how-to-host-a-data-competition-statistical |
Repo | |
Framework | |
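The post-competition GLM analysis described above maps naturally onto a logistic regression with competitor and input-region factors. A hedged statsmodels sketch on fabricated placeholder data follows; the column names, team labels, and region difficulty levels are all hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
# Hypothetical per-entry results: one row per (competitor, test scenario) pair.
df = pd.DataFrame({
    "algorithm": rng.choice(["team_a", "team_b", "team_c"], 300),
    "region": rng.choice(["easy", "medium", "hard"], 300),
})
base = {"easy": 0.9, "medium": 0.6, "hard": 0.3}
df["correct"] = (rng.random(300) < df["region"].map(base)).astype(int)

# Logistic GLM: which competitors and which regions of the input space drive
# success, beyond what a single leaderboard ranking can reveal.
fit = smf.glm("correct ~ C(algorithm) + C(region)",
              data=df, family=sm.families.Binomial()).fit()
print(fit.summary())
```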