Paper Group ANR 603
RNN-based Generative Model for Fine-Grained Sketching. Joint shape learning and segmentation for medical images using a minimalistic deep network. NEMO: Future Object Localization Using Noisy Ego Priors. Credit Card Fraud Detection Using Autoencoder Neural Network. Learning Predicates as Functions to Enable Few-shot Scene Graph Prediction. Contextual Joint Factor Acoustic Embeddings. The Noise Collector for sparse recovery in high dimensions. R2D2: Repeatable and Reliable Detector and Descriptor. Learning Variations in Human Motion via Mix-and-Match Perturbation. Part-based Multi-stream Model for Vehicle Searching. Multitask Learning For Different Subword Segmentations In Neural Machine Translation. Deep Kinematic Models for Physically Realistic Prediction of Vehicle Trajectories. SLAM-based Integrity Monitoring Using GPS and Fish-eye Camera. Embeddings for DNN speaker adaptive training. How to Host a Data Competition: Statistical Advice for Design and Analysis of a Data Competition.
RNN-based Generative Model for Fine-Grained Sketching
Title | RNN-based Generative Model for Fine-Grained Sketching |
Authors | Andrin Jenal, Nikolay Savinov, Torsten Sattler, Gaurav Chaurasia |
Abstract | Deep generative models have shown great promise when it comes to synthesising novel images. While they can generate images that look convincing at a higher level, generating fine-grained details is still a challenge. In order to foster research on more powerful generative approaches, this paper proposes a novel task: generative modelling of 2D tree skeletons. Trees are an interesting shape class because they exhibit complexity and variations that are well-suited to measuring the ability of a generative model to generate detailed structures. We propose a new dataset for this task and demonstrate that state-of-the-art generative models fail to synthesise realistic images on our benchmark, even though they perform well on current datasets like MNIST digits. Motivated by these results, we propose a novel network architecture that combines a variational autoencoder based on Recurrent Neural Networks with a convolutional discriminator. The network, error metrics and training procedure are adapted to the task of fine-grained sketching. Through quantitative and perceptual experiments, we show that our model outperforms previous work and that our dataset is a valuable benchmark for generative models. We will make our dataset publicly available. |
Tasks | |
Published | 2019-01-13 |
URL | http://arxiv.org/abs/1901.03991v1 |
http://arxiv.org/pdf/1901.03991v1.pdf | |
PWC | https://paperswithcode.com/paper/rnn-based-generative-model-for-fine-grained |
Repo | |
Framework | |
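The entry above pairs a recurrent variational autoencoder with a convolutional discriminator. Below is a minimal PyTorch sketch of that general combination; the GRU choice, all layer sizes, and the 64x64 render resolution are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SketchVAE(nn.Module):
    """Recurrent VAE over stroke-point sequences (all sizes are assumptions)."""
    def __init__(self, point_dim=2, hidden=128, latent=32):
        super().__init__()
        self.encoder = nn.GRU(point_dim, hidden, batch_first=True)
        self.to_mu = nn.Linear(hidden, latent)
        self.to_logvar = nn.Linear(hidden, latent)
        self.z_to_h = nn.Linear(latent, hidden)
        self.decoder = nn.GRU(point_dim, hidden, batch_first=True)
        self.to_point = nn.Linear(hidden, point_dim)

    def forward(self, strokes):                     # strokes: (B, T, 2)
        _, h = self.encoder(strokes)
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterisation
        h0 = self.z_to_h(z).unsqueeze(0)
        out, _ = self.decoder(strokes, h0)          # teacher-forced reconstruction
        return self.to_point(out), mu, logvar

class ConvDiscriminator(nn.Module):
    """Scores rendered 64x64 sketches as real or generated."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Flatten(), nn.Linear(64 * 16 * 16, 1))
    def forward(self, img):
        return self.net(img)

recon, mu, logvar = SketchVAE()(torch.randn(4, 50, 2))
print(recon.shape)   # (4, 50, 2)
```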
Joint shape learning and segmentation for medical images using a minimalistic deep network
Title | Joint shape learning and segmentation for medical images using a minimalistic deep network |
Authors | Balamurali Murugesan, Kaushik Sarveswaran, Sharath M Shankaranarayana, Keerthi Ram, Mohanasankar Sivaprakasam |
Abstract | Recently, state-of-the-art results have been achieved in semantic segmentation using fully convolutional networks (FCNs). Most of these networks employ encoder-decoder style architectures similar to U-Net and are trained with images and the corresponding segmentation maps as a pixel-wise classification task. Such frameworks only exploit class information through the ground truth segmentation maps. In this paper, we propose a multi-task learning framework with the main aim of exploiting structural and spatial information along with the class information. We modify the decoder part of the FCN to exploit structural information in addition to class information, while keeping the number of network parameters as low as possible. We obtain the structural information in either of two ways: i) using the contour map and ii) using the distance map, both of which can be derived from ground truth segmentation maps with no additional annotation cost. We also explore different ways in which distance maps can be computed and study their effects on segmentation performance. We experiment extensively on two different medical image segmentation applications: i) optic disc and cup segmentation in color fundus images and ii) polyp segmentation in endoscopic images. Through our experiments, we report results comparable to, and in some cases better than, current state-of-the-art architectures, with roughly a 2x reduction in the number of parameters. |
Tasks | Medical Image Segmentation, Multi-Task Learning, Semantic Segmentation |
Published | 2019-01-25 |
URL | http://arxiv.org/abs/1901.08824v1 |
http://arxiv.org/pdf/1901.08824v1.pdf | |
PWC | https://paperswithcode.com/paper/joint-shape-learning-and-segmentation-for |
Repo | |
Framework | |
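Both auxiliary targets named in the abstract, contour maps and distance maps, can be derived from the ground-truth mask at no extra annotation cost. Here is a small scipy sketch of one plausible way to compute them; the signed-distance convention is an assumption, since the paper studies several distance-map variants.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt, binary_erosion

def contour_map(mask):
    """1-pixel-wide boundary of a binary mask: the mask minus its erosion."""
    m = mask.astype(bool)
    return m & ~binary_erosion(m)

def signed_distance_map(mask):
    """Signed Euclidean distance: positive inside the object, negative outside."""
    m = mask.astype(bool)
    return distance_transform_edt(m) - distance_transform_edt(~m)

mask = np.zeros((64, 64), dtype=np.uint8)
mask[20:40, 20:40] = 1                       # toy square "organ"
print(contour_map(mask).sum())               # number of boundary pixels
print(signed_distance_map(mask).max())       # deepest interior distance
```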
NEMO: Future Object Localization Using Noisy Ego Priors
Title | NEMO: Future Object Localization Using Noisy Ego Priors |
Authors | Srikanth Malla, Isht Dwivedi, Behzad Dariush, Chiho Choi |
Abstract | Predicting the future trajectory of agents from visual observations is an important problem for the realization of safe and effective navigation of autonomous systems in dynamic environments. This paper focuses on two aspects of future trajectory forecasting that are particularly relevant for mobile platforms: 1) modeling the uncertainty of predictions, particularly from egocentric views, where uncertainty in the interactive reactions and behaviors of other agents must account for the uncertainty in the ego-motion, and 2) modeling the multi-modal nature of the problem, which is particularly prevalent at junctions in urban traffic scenes. To address these problems in a unified approach, we propose NEMO (Noisy Ego MOtion priors for future object localization) for future forecasting of agents in the egocentric view. In the proposed approach, a predictive distribution of future forecasts is jointly modeled with the uncertainty of predictions. For this, we divide the problem into two tasks: future ego-motion prediction and future object localization. We first model the multi-modal distribution of future ego-motion with uncertainty estimates. The resulting distribution of ego-behavior is used to sample multiple modes of future ego-motion. Then, each modality is used as a prior to understand the interactions between the ego-vehicle and the target agent. We predict the multi-modal future locations of the target from individual modes of the ego-vehicle while modeling the uncertainty of the target's behavior. We extensively evaluate the proposed framework using the publicly available benchmark dataset (HEV-I) supplemented with odometry data from an Inertial Measurement Unit (IMU). |
Tasks | motion prediction, Object Localization |
Published | 2019-09-17 |
URL | https://arxiv.org/abs/1909.08150v2 |
https://arxiv.org/pdf/1909.08150v2.pdf | |
PWC | https://paperswithcode.com/paper/nemo-future-object-localization-using-noisy |
Repo | |
Framework | |
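The core pipeline above, sampling several modes of noisy ego-motion and using each sample as a prior for localizing a target, can be illustrated with a toy numpy sketch. The mixture parameters and the rigid-transform localization below are made-up assumptions standing in for the paper's learned networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed output of an ego-motion network: a small mixture over future ego
# displacement (dx, dy, dyaw), each mode a diagonal Gaussian with a weight.
modes = [
    {"mean": np.array([5.0, 0.0, 0.00]), "std": np.array([0.5, 0.2, 0.02]), "w": 0.6},
    {"mean": np.array([3.0, 2.0, 0.30]), "std": np.array([0.6, 0.4, 0.05]), "w": 0.4},
]

def sample_ego_priors(n):
    """Draw n noisy ego-motion samples from the mixture."""
    idx = rng.choice(len(modes), size=n, p=[m["w"] for m in modes])
    return np.stack([modes[i]["mean"] + rng.normal(0.0, modes[i]["std"]) for i in idx])

def localize_target(target_xy, ego):
    """Express a (here static) target in the sampled future ego frame."""
    dx, dy, dyaw = ego
    c, s = np.cos(-dyaw), np.sin(-dyaw)
    p = target_xy - np.array([dx, dy])
    return np.array([c * p[0] - s * p[1], s * p[0] + c * p[1]])

futures = np.stack([localize_target(np.array([10.0, 1.0]), e)
                    for e in sample_ego_priors(200)])
print(futures.mean(axis=0), futures.std(axis=0))  # spread mirrors ego uncertainty
```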
Credit Card Fraud Detection Using Autoencoder Neural Network
Title | Credit Card Fraud Detection Using Autoencoder Neural Network |
Authors | Junyi Zou, Jinliang Zhang, Ping Jiang |
Abstract | Imbalanced data classification has always been a popular topic in machine learning research. To balance the samples between the majority and minority classes, oversampling algorithms are used to synthesize new minority-class samples, but they can introduce noise. To address this noise problem, this paper proposes a denoising autoencoder neural network (DAE) algorithm that not only oversamples minority-class samples through misclassification cost, but also denoises and classifies the sampled dataset. In experiments comparing against a denoising autoencoder with a plain oversampling step and against traditional fully connected neural networks, the results show that the proposed algorithm improves the classification accuracy of the minority class on imbalanced datasets. |
Tasks | Denoising, Fraud Detection |
Published | 2019-08-30 |
URL | https://arxiv.org/abs/1908.11553v1 |
https://arxiv.org/pdf/1908.11553v1.pdf | |
PWC | https://paperswithcode.com/paper/credit-card-fraud-detection-using-autoencoder |
Repo | |
Framework | |
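A minimal PyTorch sketch of the general idea, a denoising autoencoder whose code also feeds a classifier, with the misclassification cost expressed as a class weight. The feature count, hidden size, noise level, and the 50x fraud weight are all assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

class FraudDAE(nn.Module):
    """Denoising autoencoder with a classification head (sizes are assumptions)."""
    def __init__(self, n_features=30, hidden=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, n_features)
        self.classifier = nn.Linear(hidden, 2)

    def forward(self, x, noise_std=0.1):
        code = self.encoder(x + noise_std * torch.randn_like(x))  # corrupt, encode
        return self.decoder(code), self.classifier(code)

model = FraudDAE()
# Misclassification cost: weight the rare fraud class far above the majority.
cls_loss = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 50.0]))
rec_loss = nn.MSELoss()

x = torch.randn(64, 30)
y = (torch.rand(64) < 0.05).long()           # ~5% simulated fraud labels
recon, logits = model(x)
loss = rec_loss(recon, x) + cls_loss(logits, y)
loss.backward()
```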
Learning Predicates as Functions to Enable Few-shot Scene Graph Prediction
Title | Learning Predicates as Functions to Enable Few-shot Scene Graph Prediction |
Authors | Apoorva Dornadula, Austin Narcomey, Ranjay Krishna, Michael Bernstein, Li Fei-Fei |
Abstract | Scene graph prediction — classifying the set of objects and predicates in a visual scene — requires substantial training data. However, most predicates occur only a handful of times, making them difficult to learn. We introduce the first scene graph prediction model that supports few-shot learning of predicates. Existing scene graph generation models represent objects using pretrained object detectors or word embeddings that capture semantic object information at the cost of encoding information about which relationships they afford. As a result, these object representations are unable to generalize to new few-shot relationships. We introduce a framework that induces object representations structured according to their visual relationships. Unlike past methods, our framework embeds objects that afford similar relationships closer together. This property allows our model to perform well in the few-shot setting. For example, applying the ‘riding’ predicate transformation to ‘person’ modifies the representation towards objects like ‘skateboard’ and ‘horse’ that enable riding. We generate object representations by learning predicates trained as message passing functions within a new graph convolution framework. The object representations are used to build few-shot predicate classifiers for rare predicates with as few as 1 labeled example. We achieve a 5-shot performance of 22.70 recall@50, a 3.7-point increase over strong transfer learning baselines. |
Tasks | Few-Shot Learning, Graph Generation, Scene Graph Generation, Transfer Learning, Word Embeddings |
Published | 2019-06-12 |
URL | https://arxiv.org/abs/1906.04876v4 |
https://arxiv.org/pdf/1906.04876v4.pdf | |
PWC | https://paperswithcode.com/paper/visual-relationships-as-functions-enabling |
Repo | |
Framework | |
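The "predicates as functions" idea can be sketched in a few lines of PyTorch: a predicate is a learned transformation over object embeddings, so applying 'riding' to 'person' should land near rideable objects. The MLP form, vocabulary size, and similarity scoring below are assumptions, not the paper's graph convolution framework.

```python
import torch
import torch.nn as nn

embed_dim = 64
objects = nn.Embedding(100, embed_dim)       # hypothetical object vocabulary

class PredicateFn(nn.Module):
    """A predicate as a learned transformation on subject embeddings."""
    def __init__(self):
        super().__init__()
        self.fn = nn.Sequential(nn.Linear(embed_dim, embed_dim), nn.ReLU(),
                                nn.Linear(embed_dim, embed_dim))
    def forward(self, subj):
        return self.fn(subj)

riding = PredicateFn()
person = objects(torch.tensor([0]))
# Training (not shown) would pull riding(person) toward 'rideable' objects, so
# a few-shot predicate classifier can score candidates by embedding similarity:
scores = torch.cosine_similarity(riding(person), objects.weight, dim=-1)
print(scores.topk(5).indices)   # nearest objects under the 'riding' transform
```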
Contextual Joint Factor Acoustic Embeddings
Title | Contextual Joint Factor Acoustic Embeddings |
Authors | Yanpei Shi, Qiang Huang, Thomas Hain |
Abstract | Embedding acoustic information into fixed-length representations is of interest for a whole range of applications in speech and audio technology. We propose two novel unsupervised approaches to generate acoustic embeddings by modelling acoustic context. The first approach is a contextual joint factor synthesis encoder, where the encoder in an encoder/decoder framework is trained to extract joint factors from surrounding audio frames to best generate the target output. The second approach is a contextual joint factor analysis encoder, where the encoder is trained to analyse joint factors from the source signal that correlate best with the neighbouring audio. To evaluate the effectiveness of our approaches against prior work, we chose two tasks, phone classification and speaker recognition, and tested on different TIMIT data sets. Experimental results show that one of our proposed approaches outperforms phone classification baselines, yielding a classification accuracy of 74.1%. When using additional out-of-domain data for training, a further 2-3% improvement can be obtained for both the phone classification and speaker recognition tasks. |
Tasks | Speaker Recognition |
Published | 2019-10-16 |
URL | https://arxiv.org/abs/1910.07601v1 |
https://arxiv.org/pdf/1910.07601v1.pdf | |
PWC | https://paperswithcode.com/paper/contextual-joint-factor-acoustic-embeddings |
Repo | |
Framework | |
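A rough PyTorch sketch of the synthesis variant: a centre frame is encoded into a fixed-length embedding that must regenerate its surrounding frames. Feature dimensionality, context width, and the single-linear decoder are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class CJFSEncoder(nn.Module):
    """Synthesis variant: embed a centre frame, then generate +/- c neighbours."""
    def __init__(self, feat_dim=40, emb_dim=128, context=4):
        super().__init__()
        self.feat_dim, self.context = feat_dim, context
        self.encode = nn.Sequential(nn.Linear(feat_dim, emb_dim), nn.Tanh())
        self.decode = nn.Linear(emb_dim, feat_dim * 2 * context)

    def forward(self, centre):                      # centre: (B, feat_dim)
        emb = self.encode(centre)                   # fixed-length acoustic embedding
        pred = self.decode(emb).view(-1, 2 * self.context, self.feat_dim)
        return emb, pred

model = CJFSEncoder()
frames = torch.randn(8, 9, 40)                      # 9-frame windows of features
centre = frames[:, 4]
neighbours = torch.cat([frames[:, :4], frames[:, 5:]], dim=1)
emb, pred = model(centre)
loss = nn.functional.mse_loss(pred, neighbours)     # embedding must explain context
```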
The Noise Collector for sparse recovery in high dimensions
Title | The Noise Collector for sparse recovery in high dimensions |
Authors | Miguel Moscoso, Alexei Novikov, George Papanicolaou, Chrysoula Tsogka |
Abstract | The ability to detect sparse signals from noisy high-dimensional data is a top priority in modern science and engineering. A sparse solution of the linear system $A \rho = b_0$ can be found efficiently with an $l_1$-norm minimization approach if the data is noiseless. Detection of the signal’s support from data corrupted by noise is still a challenging problem, especially if the level of noise must be estimated. We propose a new efficient approach that does not require any parameter estimation. We introduce the Noise Collector (NC) matrix $C$ and solve an augmented system $A \rho + C \eta = b_0 + e$, where $e$ is the noise. We show that the $l_1$-norm minimal solution of the augmented system has zero false discovery rate for any level of noise and with probability that tends to one as the dimension of $b_0$ increases to infinity. We also obtain exact support recovery if the noise is not too large, and develop a Fast Noise Collector Algorithm which makes the computational cost of solving the augmented system comparable to that of the original one. Finally, we demonstrate the effectiveness of the method in applications to passive array imaging. |
Tasks | |
Published | 2019-08-05 |
URL | https://arxiv.org/abs/1908.04412v1 |
https://arxiv.org/pdf/1908.04412v1.pdf | |
PWC | https://paperswithcode.com/paper/the-noise-collector-for-sparse-recovery-in |
Repo | |
Framework | |
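The augmented system $A \rho + C \eta = b_0 + e$ can be attacked with standard $l_1$ machinery. Below is a toy basis-pursuit formulation solved as a linear program with scipy; using random normalized columns for $C$ and omitting the paper's weighting of the NC block are simplifying assumptions.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, m, k = 40, 80, 120                        # measurements, unknowns, NC columns
A = rng.standard_normal((n, m)); A /= np.linalg.norm(A, axis=0)
C = rng.standard_normal((n, k)); C /= np.linalg.norm(C, axis=0)

rho_true = np.zeros(m); rho_true[[3, 17, 42]] = [1.0, -2.0, 1.5]
b = A @ rho_true + 0.05 * rng.standard_normal(n)     # noisy measurements

# min ||rho||_1 + ||eta||_1  s.t.  A rho + C eta = b, written as an LP via the
# usual split x = x_plus - x_minus into nonnegative parts.
M = np.hstack([A, C])
res = linprog(c=np.ones(2 * (m + k)),
              A_eq=np.hstack([M, -M]), b_eq=b,
              bounds=(0, None), method="highs")
x = res.x[: m + k] - res.x[m + k:]
print(np.flatnonzero(np.abs(x[:m]) > 1e-3))          # recovered support of rho
```

The noise is soaked up by the $\eta$ block, so the support of $\rho$ stays clean without estimating the noise level.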
R2D2: Repeatable and Reliable Detector and Descriptor
Title | R2D2: Repeatable and Reliable Detector and Descriptor |
Authors | Jerome Revaud, Philippe Weinzaepfel, César De Souza, Noe Pion, Gabriela Csurka, Yohann Cabon, Martin Humenberger |
Abstract | Interest point detection and local feature description are fundamental steps in many computer vision applications. Classical methods for these tasks are based on a detect-then-describe paradigm where separate handcrafted methods are used to first identify repeatable keypoints and then represent them with a local descriptor. Neural networks trained with metric learning losses have recently caught up with these techniques, focusing on learning repeatable saliency maps for keypoint detection and learning descriptors at the detected keypoint locations. In this work, we argue that salient regions are not necessarily discriminative, and can therefore harm the performance of the description. Furthermore, we claim that descriptors should be learned only in regions for which matching can be performed with high confidence. We thus propose to jointly learn keypoint detection and description together with a predictor of the local descriptor discriminativeness. This allows us to avoid ambiguous areas and leads to reliable keypoint detections and descriptions. Our detection-and-description approach, trained with self-supervision, can simultaneously output sparse, repeatable and reliable keypoints that outperform state-of-the-art detectors and descriptors on the HPatches dataset. It also establishes a record on the recently released Aachen Day-Night localization dataset. |
Tasks | Interest Point Detection, Keypoint Detection, Metric Learning |
Published | 2019-06-14 |
URL | https://arxiv.org/abs/1906.06195v2 |
https://arxiv.org/pdf/1906.06195v2.pdf | |
PWC | https://paperswithcode.com/paper/r2d2-reliable-and-repeatable-detectors-and |
Repo | |
Framework | |
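The "repeatable and reliable" split corresponds to two dense score maps alongside the descriptor map. A minimal PyTorch sketch of that three-head layout follows; the tiny backbone and layer sizes are assumptions, not R2D2's actual network.

```python
import torch
import torch.nn as nn

class R2D2Like(nn.Module):
    """Backbone with three dense heads: descriptors, repeatability, reliability."""
    def __init__(self, dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, dim, 3, padding=1), nn.ReLU())
        self.repeatability = nn.Conv2d(dim, 1, 1)   # where detections repeat
        self.reliability = nn.Conv2d(dim, 1, 1)     # where descriptors can be trusted
    def forward(self, img):
        feat = self.backbone(img)
        desc = nn.functional.normalize(feat, dim=1)  # per-pixel unit descriptors
        return desc, self.repeatability(feat).sigmoid(), self.reliability(feat).sigmoid()

desc, rep, rel = R2D2Like()(torch.randn(1, 3, 64, 64))
score = rep * rel      # keypoints = local maxima of the joint score map
print(desc.shape, score.shape)
```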
Learning Variations in Human Motion via Mix-and-Match Perturbation
Title | Learning Variations in Human Motion via Mix-and-Match Perturbation |
Authors | Mohammad Sadegh Aliakbarian, Fatemeh Sadat Saleh, Mathieu Salzmann, Lars Petersson, Stephen Gould, Amirhossein Habibian |
Abstract | Human motion prediction is a stochastic process: Given an observed sequence of poses, multiple future motions are plausible. Existing approaches to modeling this stochasticity typically combine a random noise vector with information about the previous poses. This combination, however, is done in a deterministic manner, which gives the network the flexibility to learn to ignore the random noise. In this paper, we introduce an approach to stochastically combine the root of variations with previous pose information, which forces the model to take the noise into account. We exploit this idea for motion prediction by incorporating it into a recurrent encoder-decoder network with a conditional variational autoencoder block that learns to exploit the perturbations. Our experiments demonstrate that our model yields high-quality pose sequences that are much more diverse than those from state-of-the-art stochastic motion prediction techniques. |
Tasks | motion prediction |
Published | 2019-08-02 |
URL | https://arxiv.org/abs/1908.00733v2 |
https://arxiv.org/pdf/1908.00733v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-variations-in-human-motion-via-mix |
Repo | |
Framework | |
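One plausible reading of the mix-and-match perturbation is sketched below: instead of deterministically concatenating noise with the hidden state, a randomly resampled subset of hidden units is overwritten by noise, so the network cannot learn fixed weights that ignore it. The mixing ratio and tensor shapes are assumptions.

```python
import torch

def mix_and_match(hidden, noise, alpha=0.5):
    """Stochastically overwrite a random subset of hidden units with noise.

    Which units carry noise changes on every call, so no fixed set of weights
    can learn to discard the noise channel.
    """
    d = hidden.size(-1)
    idx = torch.randperm(d)[: int(alpha * d)]   # resampled every forward pass
    mixed = hidden.clone()
    mixed[..., idx] = noise[..., idx]
    return mixed

h = torch.randn(16, 128)                        # encoder hidden state
z = torch.randn(16, 128)                        # random noise vector
print(mix_and_match(h, z).shape)
```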
Part-based Multi-stream Model for Vehicle Searching
Title | Part-based Multi-stream Model for Vehicle Searching |
Authors | Ya Sun, Minxian Li, Jianfeng Lu |
Abstract | Due to the enormous demands of public security and intelligent transportation systems, searching for an identical vehicle has become more and more important. Current studies usually treat a vehicle as an integral object and then train a distance metric to measure the similarity between vehicles. However, raw images of distinct vehicles can look nearly identical, and background pixels may disturb the distance metric learning. In this paper, we propose a novel and useful method that segments an original vehicle image into several discriminative foreground parts, each consisting of fine-grained regions called discriminative patches. These parts, combined with the raw image, are fed into the proposed deep learning network, and the similarity of two vehicle images is measured by the Euclidean distance between their FC-layer features. The two main contributions of this paper are as follows. First, we propose a method to estimate whether a patch in a raw vehicle image is discriminative. Second, we design a new Part-based Multi-Stream Model (PMSM) optimized for vehicle retrieval and re-identification tasks. We evaluate the proposed method on the VehicleID dataset, and the experimental results show that our method outperforms the baseline. |
Tasks | Metric Learning |
Published | 2019-11-11 |
URL | https://arxiv.org/abs/1911.04144v1 |
https://arxiv.org/pdf/1911.04144v1.pdf | |
PWC | https://paperswithcode.com/paper/part-based-multi-stream-model-for-vehicle |
Repo | |
Framework | |
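Once the multi-stream network produces an FC-layer feature per image, retrieval reduces to ranking by Euclidean distance. A short numpy sketch of that final step; the 256-dimensional features and gallery size are made-up placeholders for the network's outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
gallery = rng.standard_normal((1000, 256))   # FC features of gallery vehicles
query = rng.standard_normal(256)             # FC feature of the query image

# Similarity is plain Euclidean distance in the learned feature space.
dists = np.linalg.norm(gallery - query, axis=1)
ranking = np.argsort(dists)                  # most similar vehicles first
print(ranking[:10])
```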
Multitask Learning For Different Subword Segmentations In Neural Machine Translation
Title | Multitask Learning For Different Subword Segmentations In Neural Machine Translation |
Authors | Tejas Srinivasan, Ramon Sanabria, Florian Metze |
Abstract | In Neural Machine Translation (NMT) the usage of subwords and characters as source and target units offers a simple and flexible solution for translation of rare and unseen words. However, selecting the optimal subword segmentation involves a trade-off between expressiveness and flexibility, and is language- and dataset-dependent. We present Block Multitask Learning (BMTL), a novel NMT architecture that predicts multiple targets of different granularities simultaneously, removing the need to search for the optimal segmentation strategy. Our multi-task model exhibits improvements of up to 1.7 BLEU points on each decoder over single-task baseline models with the same number of parameters on datasets from two language pairs of IWSLT15 and one from IWSLT19. The multiple hypotheses generated at different granularities can be combined as a post-processing step to give better translations, which improves over hypothesis combination from baseline models while using substantially fewer parameters. |
Tasks | Machine Translation |
Published | 2019-10-27 |
URL | https://arxiv.org/abs/1910.12368v1 |
https://arxiv.org/pdf/1910.12368v1.pdf | |
PWC | https://paperswithcode.com/paper/multitask-learning-for-different-subword |
Repo | |
Framework | |
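Structurally, BMTL amounts to a shared encoder feeding one decoder per target granularity, with the per-granularity losses summed. The sketch below compresses that layout into a few PyTorch lines; the GRU encoder, linear "decoders", and vocabulary sizes are stand-in assumptions, far simpler than the paper's sequence-to-sequence decoders.

```python
import torch
import torch.nn as nn

class BMTLSketch(nn.Module):
    """Shared encoder with one output head per target granularity (sizes assumed)."""
    def __init__(self, src_vocab=8000, char_vocab=100, bpe_vocab=8000, d=256):
        super().__init__()
        self.embed = nn.Embedding(src_vocab, d)
        self.encoder = nn.GRU(d, d, batch_first=True)
        self.char_head = nn.Linear(d, char_vocab)   # character-level targets
        self.bpe_head = nn.Linear(d, bpe_vocab)     # subword-level targets

    def forward(self, src):                          # src: (B, T) token ids
        enc, _ = self.encoder(self.embed(src))
        return self.char_head(enc), self.bpe_head(enc)

char_logits, bpe_logits = BMTLSketch()(torch.randint(0, 8000, (4, 20)))
# Training sums the per-granularity cross-entropies; at inference the paper
# combines hypotheses from the different decoders as a post-processing step.
```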
Deep Kinematic Models for Physically Realistic Prediction of Vehicle Trajectories
Title | Deep Kinematic Models for Physically Realistic Prediction of Vehicle Trajectories |
Authors | Henggang Cui, Thi Nguyen, Fang-Chieh Chou, Tsung-Han Lin, Jeff Schneider, David Bradley, Nemanja Djuric |
Abstract | Self-driving vehicles (SDVs) hold great potential for improving traffic safety and are poised to positively affect the quality of life of millions of people. To unlock this potential, a critical aspect of the autonomous technology is understanding and predicting the future movement of vehicles surrounding the SDV. This work presents a deep-learning-based method for kinematically feasible motion prediction of such traffic actors. Previous work did not explicitly encode vehicle kinematics and instead relied on the models to learn the constraints directly from the data, potentially resulting in kinematically infeasible, suboptimal trajectory predictions. To address this issue, we propose a method that seamlessly combines ideas from AI with physically grounded vehicle motion models. In this way we get the best of both worlds, coupling powerful learning models with strong feasibility guarantees for their outputs. The proposed approach is general and applicable to any type of learning method. Extensive experiments using deep convnets on real-world data strongly indicate its benefits, outperforming the existing state-of-the-art. |
Tasks | motion prediction |
Published | 2019-08-01 |
URL | https://arxiv.org/abs/1908.00219v2 |
https://arxiv.org/pdf/1908.00219v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-kinematic-models-for-physically |
Repo | |
Framework | |
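The feasibility guarantee comes from predicting controls and integrating them through a vehicle motion model rather than regressing positions directly. A minimal numpy sketch of a kinematic bicycle unroll follows; the wheelbase, time step, and control values are illustrative assumptions.

```python
import numpy as np

def unroll_bicycle(x, y, yaw, v, controls, wheelbase=2.8, dt=0.1):
    """Integrate a kinematic bicycle model over predicted (accel, steer) controls.

    Because positions come out of the kinematic layer rather than direct
    regression, every output trajectory is physically feasible by construction.
    """
    traj = []
    for accel, steer in controls:
        x += v * np.cos(yaw) * dt
        y += v * np.sin(yaw) * dt
        yaw += v / wheelbase * np.tan(steer) * dt
        v = max(0.0, v + accel * dt)      # no reversing in this simple sketch
        traj.append((x, y, yaw, v))
    return np.array(traj)

# A network would predict these controls; here they are a made-up example.
controls = [(0.5, 0.02)] * 30             # gentle acceleration, slight left turn
print(unroll_bicycle(0.0, 0.0, 0.0, 10.0, controls)[-1])
```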
SLAM-based Integrity Monitoring Using GPS and Fish-eye Camera
Title | SLAM-based Integrity Monitoring Using GPS and Fish-eye Camera |
Authors | Sriramya Bhamidipati, Grace Xingxin Gao |
Abstract | Urban navigation using GPS and fish-eye camera suffers from multipath effects in GPS measurements and data association errors in pixel intensities across image frames. We propose a Simultaneous Localization and Mapping (SLAM)-based Integrity Monitoring (IM) algorithm to compute the position protection levels while accounting for multiple faults in both GPS and vision. We perform graph optimization using the sequential data of GPS pseudoranges, pixel intensities, vehicle dynamics, and satellite ephemeris to simultaneously localize the vehicle as well as the landmarks, namely GPS satellites and key image pixels in the world frame. We estimate the fault mode vector by analyzing the temporal correlation across the GPS measurement residuals and spatial correlation across the vision intensity residuals. In particular, to detect and isolate the vision faults, we developed a superpixel-based piecewise Random Sample Consensus (RANSAC) technique to perform spatial voting across image pixels. For an estimated fault mode, we compute the protection levels by applying worst-case failure slope analysis to the linearized Graph-SLAM framework. We perform ground vehicle experiments in the semi-urban area of Champaign, IL and have demonstrated the successful detection and isolation of multiple faults. We also validate tighter protection levels and lower localization errors achieved via the proposed algorithm as compared to SLAM-based IM that utilizes only GPS measurements. |
Tasks | Simultaneous Localization and Mapping |
Published | 2019-10-04 |
URL | https://arxiv.org/abs/1910.02165v1 |
https://arxiv.org/pdf/1910.02165v1.pdf | |
PWC | https://paperswithcode.com/paper/slam-based-integrity-monitoring-using-gps-and |
Repo | |
Framework | |
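At the heart of any integrity monitor is a statistical test on measurement residuals. The snippet below is a deliberately simplified stand-in for the paper's temporal/spatial correlation tests: a chi-square check on weighted GPS residuals that flags a multipath-corrupted satellite. The noise sigma, significance level, and residual values are assumptions.

```python
import numpy as np
from scipy.stats import chi2

def detect_faults(residuals, sigma, alpha=1e-3):
    """Flag a measurement set whose weighted residual norm exceeds a
    chi-square threshold (simplified residual-based fault detection)."""
    q = np.sum((residuals / sigma) ** 2)           # test statistic
    threshold = chi2.ppf(1 - alpha, df=len(residuals))
    return q > threshold, q, threshold

# Pseudorange residuals (metres) with one multipath-corrupted satellite.
res = np.array([0.8, -1.1, 0.5, 12.0, -0.7])
print(detect_faults(res, sigma=1.5))
```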
Embeddings for DNN speaker adaptive training
Title | Embeddings for DNN speaker adaptive training |
Authors | Joanna Rownicka, Peter Bell, Steve Renals |
Abstract | In this work, we investigate the use of embeddings for speaker-adaptive training of DNNs (DNN-SAT), focusing on a small amount of adaptation data per speaker. DNN-SAT can be viewed as learning a mapping from each embedding to transformation parameters that are applied to the shared parameters of the DNN. We investigate different approaches to applying these transformations, and find that with a good training strategy, a multi-layer adaptation network applied to all hidden layers is no more effective than a single linear layer acting on the embeddings to transform the input features. In the second part of our work, we evaluate different embeddings (i-vectors, x-vectors and deep CNN embeddings) in an additional speaker recognition task in order to gain insight into what should characterize an embedding for DNN-SAT. We find that the speaker recognition performance of a given representation is not correlated with its ASR performance; in fact, the ability to capture more speech attributes than speaker identity alone was the most important characteristic of the embeddings for effective DNN-SAT ASR. Our best models achieved relative WER gains of 4% and 9% over DNN baselines using speaker-level cepstral mean normalisation (CMN) and a fully speaker-independent model, respectively. |
Tasks | Speaker Recognition |
Published | 2019-09-30 |
URL | https://arxiv.org/abs/1909.13537v1 |
https://arxiv.org/pdf/1909.13537v1.pdf | |
PWC | https://paperswithcode.com/paper/embeddings-for-dnn-speaker-adaptive-training |
Repo | |
Framework | |
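The winning configuration in the abstract, a single linear layer mapping the speaker embedding to a transformation of the input features, is easy to sketch. Below, the embedding produces a per-speaker feature shift applied at every frame; the feature and embedding dimensions, and the shift-only (no scale) form, are assumptions.

```python
import torch
import torch.nn as nn

class EmbeddingSAT(nn.Module):
    """Single linear layer mapping a speaker embedding to an input-feature
    shift, applied identically at every frame (dims are assumptions)."""
    def __init__(self, feat_dim=40, emb_dim=100):
        super().__init__()
        self.to_shift = nn.Linear(emb_dim, feat_dim)

    def forward(self, feats, spk_emb):    # feats: (B, T, 40), spk_emb: (B, 100)
        return feats + self.to_shift(spk_emb).unsqueeze(1)  # adapted features

adapt = EmbeddingSAT()
feats = torch.randn(2, 50, 40)            # fbank frames for two speakers
ivecs = torch.randn(2, 100)               # per-speaker i-vectors
adapted = adapt(feats, ivecs)             # feed to the shared acoustic model
```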
How to Host a Data Competition: Statistical Advice for Design and Analysis of a Data Competition
Title | How to Host a Data Competition: Statistical Advice for Design and Analysis of a Data Competition |
Authors | Christine M. Anderson-Cook, Kary L. Myers, Lu Lu, Michael L. Fugate, Kevin R. Quinlan, Norma Pawley |
Abstract | Data competitions rely on real-time leaderboards to rank competitor entries and stimulate algorithm improvement. While such competitions have become quite popular and prevalent, particularly in supervised learning formats, their implementations by the host are highly variable. Without careful planning, a supervised learning competition is vulnerable to overfitting, where the winning solutions are so closely tuned to the particular set of provided data that they cannot generalize to the underlying problem of interest to the host. This paper outlines some important considerations for strategically designing relevant and informative data sets to maximize the learning outcome from hosting a competition based on our experience. It also describes a post-competition analysis that enables robust and efficient assessment of the strengths and weaknesses of solutions from different competitors, as well as greater understanding of the regions of the input space that are well-solved. The post-competition analysis, which complements the leaderboard, uses exploratory data analysis and generalized linear models (GLMs). The GLMs not only expand the range of results we can explore, they also provide more detailed analysis of individual sub-questions including similarities and differences between algorithms across different types of scenarios, universally easy or hard regions of the input space, and different learning objectives. When coupled with a strategically planned data generation approach, the methods provide richer and more informative summaries to enhance the interpretation of results beyond just the rankings on the leaderboard. The methods are illustrated with a recently completed competition to evaluate algorithms capable of detecting, identifying, and locating radioactive materials in an urban environment. |
Tasks | |
Published | 2019-01-16 |
URL | http://arxiv.org/abs/1901.05356v1 |
http://arxiv.org/pdf/1901.05356v1.pdf | |
PWC | https://paperswithcode.com/paper/how-to-host-a-data-competition-statistical |
Repo | |
Framework | |
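The post-competition GLM analysis described above maps naturally onto a logistic regression with competitor and input-region factors. A hedged statsmodels sketch on fabricated placeholder data follows; the column names, team labels, and region difficulty levels are all hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
# Hypothetical per-entry results: one row per (competitor, test scenario) pair.
df = pd.DataFrame({
    "algorithm": rng.choice(["team_a", "team_b", "team_c"], 300),
    "region": rng.choice(["easy", "medium", "hard"], 300),
})
base = {"easy": 0.9, "medium": 0.6, "hard": 0.3}
df["correct"] = (rng.random(300) < df["region"].map(base)).astype(int)

# Logistic GLM: which competitors and which regions of the input space drive
# success, beyond what a single leaderboard ranking can reveal.
fit = smf.glm("correct ~ C(algorithm) + C(region)",
              data=df, family=sm.families.Binomial()).fit()
print(fit.summary())
```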