April 3, 2020


Paper Group AWR 40


Embedding Propagation: Smoother Manifold for Few-Shot Classification

Title Embedding Propagation: Smoother Manifold for Few-Shot Classification
Authors Pau Rodríguez, Issam Laradji, Alexandre Drouin, Alexandre Lacoste
Abstract Few-shot classification is challenging because the data distribution of the training set can be widely different from the distribution of the test set, as their classes are disjoint. This distribution shift often results in poor generalization. Manifold smoothing has been shown to address the distribution shift problem by extending the decision boundaries and reducing the noise of the class representations. Moreover, manifold smoothness is a key factor for semi-supervised learning and transductive learning algorithms. In this work, we present embedding propagation as an unsupervised non-parametric regularizer for manifold smoothing. Embedding propagation leverages interpolations between the extracted features of a neural network based on a similarity graph. We empirically show that embedding propagation yields a smoother embedding manifold. We also show that incorporating embedding propagation into a transductive classifier leads to new state-of-the-art results on mini-Imagenet, tiered-Imagenet, and CUB. Furthermore, we show that embedding propagation yields additional performance improvements in semi-supervised learning scenarios.
Tasks
Published 2020-03-09
URL https://arxiv.org/abs/2003.04151v1
PDF https://arxiv.org/pdf/2003.04151v1.pdf
PWC https://paperswithcode.com/paper/embedding-propagation-smoother-manifold-for
Repo https://github.com/ElementAI/embedding-propagation
Framework pytorch
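
For intuition, a minimal sketch of a propagation step in this spirit (not the authors' exact implementation; the RBF similarity graph, the closed-form solve, and all hyperparameters here are assumptions):

```python
import torch

def embedding_propagation(z, alpha=0.5, scale=1.0):
    # z: (n, d) batch of extracted features. Build an RBF similarity
    # graph over the batch (assumed kernel; no self-loops).
    d2 = torch.cdist(z, z).pow(2)
    w = torch.exp(-d2 / scale)
    w.fill_diagonal_(0)
    # Symmetrically normalize, as in label propagation.
    deg = w.sum(dim=1).clamp(min=1e-12)
    d_inv_sqrt = deg.rsqrt()
    a = d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    # Closed-form diffusion: (I - alpha A)^(-1) z replaces each embedding
    # with an interpolation of its graph neighbors, smoothing the manifold.
    n = z.shape[0]
    return torch.linalg.solve(torch.eye(n) - alpha * a, z)
```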

Discrete-valued Preference Estimation with Graph Side Information

Title Discrete-valued Preference Estimation with Graph Side Information
Authors Changhun Jo, Kangwook Lee
Abstract Incorporating graph side information into recommender systems has been widely used to better predict ratings, but relatively few works have focused on theoretical guarantees. Ahn et al. (2018) first characterized the optimal sample complexity in the presence of graph side information, but the results are limited by strict, unrealistic assumptions made about the unknown preference matrix. In this work, we propose a new model in which the unknown preference matrix can have any discrete values, thereby relaxing the assumptions made in prior work. Under this new model, we fully characterize the optimal sample complexity and develop a computationally efficient algorithm that matches it. We also show that our algorithm is robust to model errors, and that it outperforms existing algorithms on both synthetic and real datasets.
Tasks Recommendation Systems
Published 2020-03-16
URL https://arxiv.org/abs/2003.07040v1
PDF https://arxiv.org/pdf/2003.07040v1.pdf
PWC https://paperswithcode.com/paper/discrete-valued-preference-estimation-with
Repo https://github.com/changhunjo0927/Discrete_Preference_Codesource
Framework none
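
To make the setting concrete, here is a hedged sketch of the classic two-stage recipe this line of work builds on (cluster users with the social graph, then majority-vote discrete ratings per cluster); the spectral-clustering choice and function name are illustrative, not the paper's algorithm:

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from sklearn.cluster import KMeans

def estimate_discrete_preferences(ratings, adj, n_clusters):
    # ratings: (users, items) with np.nan for unobserved entries.
    # Step 1: cluster users via a spectral embedding of the social graph.
    lap = laplacian(adj.astype(float), normed=True)
    _, vecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    labels = KMeans(n_clusters=n_clusters,
                    n_init=10).fit_predict(vecs[:, :n_clusters])
    # Step 2: within each cluster, majority-vote the observed discrete
    # ratings of every item.
    est = np.zeros_like(ratings)
    for c in range(n_clusters):
        rows = labels == c
        for j in range(ratings.shape[1]):
            obs = ratings[rows, j]
            obs = obs[~np.isnan(obs)]
            if obs.size:
                vals, counts = np.unique(obs, return_counts=True)
                est[rows, j] = vals[counts.argmax()]
    return est
```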

PeelNet: Textured 3D reconstruction of human body using single view RGB image

Title PeelNet: Textured 3D reconstruction of human body using single view RGB image
Authors Sai Sagar Jinka, Rohan Chacko, Avinash Sharma, P. J. Narayanan
Abstract Reconstructing human shape and pose from a single image is a challenging problem due to issues like severe self-occlusions, clothing variations, and changes in lighting, to name a few. Many applications in the entertainment industry, e-commerce, health-care (physiotherapy), and mobile-based AR/VR platforms can benefit from recovering the 3D human shape, pose, and texture. In this paper, we present PeelNet, an end-to-end generative adversarial framework to tackle the problem of textured 3D reconstruction of the human body from a single RGB image. Motivated by ray tracing for generating realistic images of a 3D scene, we tackle this problem by representing the human body as a set of peeled depth and RGB maps, which are obtained by extending rays beyond their first intersection with the 3D object. This formulation allows us to handle self-occlusions efficiently. Current parametric model-based approaches fail to model loose clothing and surface-level details, as they are designed for the underlying naked human body. The majority of non-parametric approaches are either computationally expensive or provide unsatisfactory results. We present a simple non-parametric solution in which the peeled maps are generated from a single RGB image as input. Our proposed peeled depth maps are back-projected to a 3D volume to obtain a complete 3D shape. The corresponding RGB maps provide vertex-level texture details. We compare our method against current state-of-the-art methods in 3D reconstruction and demonstrate its effectiveness on the BUFF and MonoPerfCap datasets.
Tasks 3D Reconstruction
Published 2020-02-16
URL https://arxiv.org/abs/2002.06664v1
PDF https://arxiv.org/pdf/2002.06664v1.pdf
PWC https://paperswithcode.com/paper/peelnet-textured-3d-reconstruction-of-human
Repo https://github.com/chingswy/HumanPoseMemo
Framework pytorch
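
The peeled-representation idea itself is easy to sketch: march each camera ray through the surface and keep the depths of successive intersections, not just the first. A hedged illustration using trimesh (the layer count and helper name are ours, not the paper's):

```python
import numpy as np
import trimesh

def peeled_depth_maps(mesh, origins, directions, n_layers=4):
    # Collect all intersections of every ray with the surface.
    locs, ray_ids, _ = mesh.ray.intersects_location(
        origins, directions, multiple_hits=True)
    depths = np.full((len(origins), n_layers), np.nan)
    for i in range(len(origins)):
        hits = locs[ray_ids == i]
        if len(hits) == 0:
            continue
        d = np.sort(np.linalg.norm(hits - origins[i], axis=1))
        k = min(n_layers, d.size)
        depths[i, :k] = d[:k]  # first k "peels" along this ray
    return depths
```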

Outcome Correlation in Graph Neural Network Regression

Title Outcome Correlation in Graph Neural Network Regression
Authors Junteng Jia, Austin Benson
Abstract Graph neural networks aggregate features in vertex neighborhoods to learn vector representations of all vertices, using supervision from some labeled vertices during training. The predictor is then a function of the vector representation, and predictions are made independently on unlabeled nodes. This widely adopted approach implicitly assumes that vertex labels are independent after conditioning on their neighborhoods. We show that this strong assumption is far from true on many real-world graph datasets and severely limits predictive power on a number of regression tasks. Given that traditional graph-based semi-supervised learning methods operate in the opposite manner, explicitly modeling the correlation in predicted outcomes, this limitation may not be all that surprising. Here, we address this issue with a simple and interpretable framework that can improve any graph neural network architecture by modeling the correlation structure in regression outcome residuals. Specifically, we model the joint distribution of outcome residuals on vertices with a parameterized multivariate Gaussian, whose parameters are estimated by maximizing the marginal likelihood of the observed labels. Our model substantially boosts the performance of graph neural networks, and the learned parameters can also be interpreted as the strength of correlation among connected vertices. To scale to large networks, we design linear-time algorithms for low-variance, unbiased model parameter estimates based on stochastic trace estimation. We also provide a simplified version of our method that makes stronger assumptions on the correlation structure but is extremely easy to implement and performs well in practice in several cases.
Tasks
Published 2020-02-19
URL https://arxiv.org/abs/2002.08274v1
PDF https://arxiv.org/pdf/2002.08274v1.pdf
PWC https://paperswithcode.com/paper/outcome-correlation-in-graph-neural-network
Repo https://github.com/000Justin000/gnn-residual-correlation
Framework none
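
The "simplified version" mentioned in the abstract is close in spirit to residual propagation, which is easy to sketch (a hedged illustration with assumed hyperparameters, not the authors' exact estimator):

```python
import numpy as np

def residual_propagation(adj, preds, y, labeled, alpha=0.9, n_iters=50):
    # adj: (n, n) adjacency; preds: GNN regression outputs; labeled: bool mask.
    deg = adj.sum(axis=1).clip(min=1e-12)
    s = adj / np.sqrt(np.outer(deg, deg))  # symmetric normalization
    r = np.zeros_like(preds)
    r[labeled] = y[labeled] - preds[labeled]
    for _ in range(n_iters):
        r = alpha * s @ r                         # diffuse residuals over edges
        r[labeled] = y[labeled] - preds[labeled]  # clamp observed residuals
    return preds + r  # corrected predictions on all vertices
```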

Explainable Deep Convolutional Candlestick Learner

Title Explainable Deep Convolutional Candlestick Learner
Authors Jun-Hao Chen, Samuel Yen-Chi Chen, Yun-Cheng Tsai, Chih-Shiang Shur
Abstract Candlesticks are graphical representations of price movements for a given period. Traders can discover the trend of an asset by looking at candlestick patterns. Although deep convolutional neural networks have achieved great success at recognizing candlestick patterns, their reasoning is hidden inside a black box, so traders cannot be sure what the model has learned. In this contribution, we provide a framework that explains how the learned model determines specific candlestick patterns in time series. Using local-search adversarial attacks, we show that the learned model perceives candlestick patterns in a way similar to a human trader.
Tasks Time Series
Published 2020-01-08
URL https://arxiv.org/abs/2001.02767v3
PDF https://arxiv.org/pdf/2001.02767v3.pdf
PWC https://paperswithcode.com/paper/explainable-deep-convolutional-candlestick
Repo https://github.com/pecu/FinancialVision
Framework tf
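
A hedged sketch of the kind of greedy local-search attack such a framework can build on (the coordinate-wise search and step size are illustrative, and `model` is assumed to expose a Keras-style `predict`):

```python
import numpy as np

def local_search_attack(model, x, step=0.01, max_iters=50):
    # x: a single OHLC candlestick window, e.g. shape (timesteps, 4).
    x_adv = x.copy()
    base = model.predict(x_adv[None])[0].argmax()
    for _ in range(max_iters):
        best, best_conf = None, np.inf
        for idx in np.ndindex(x_adv.shape):       # try each OHLC entry
            for delta in (step, -step):
                cand = x_adv.copy()
                cand[idx] += delta
                probs = model.predict(cand[None])[0]
                if probs.argmax() != base:
                    return cand                   # prediction flipped
                if probs[base] < best_conf:       # track most damaging move
                    best, best_conf = cand, probs[base]
        x_adv = best
    return x_adv  # best effort if no flip was found
```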

Knowledge Distillation for Brain Tumor Segmentation

Title Knowledge Distillation for Brain Tumor Segmentation
Authors Dmitrii Lachinov, Elena Shipunova, Vadim Turlapov
Abstract The segmentation of brain tumors in multimodal MRIs is one of the most challenging tasks in medical image analysis. Recent state-of-the-art algorithms for this task are based on machine learning approaches, and deep learning in particular. The amount and variability of the data used to train such models are a keystone for building an algorithm with high representational power. In this paper, we study the relationship between the performance of the model and the amount of data employed during the training process. Using the brain tumor segmentation challenge as an example, we compare the model trained with the labeled data provided by the challenge organizers against the same model trained in an omni-supervised manner, using additional unlabeled data annotated with an ensemble of heterogeneous models. As a result, a single model trained with additional data achieves performance close to the ensemble of multiple models and outperforms individual methods.
Tasks Brain Tumor Segmentation
Published 2020-02-10
URL https://arxiv.org/abs/2002.03688v1
PDF https://arxiv.org/pdf/2002.03688v1.pdf
PWC https://paperswithcode.com/paper/knowledge-distillation-for-brain-tumor
Repo https://github.com/lachinov/brats2019
Framework pytorch
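
The omni-supervised step reduces to ensemble pseudo-labeling, sketched below (hedged: we assume each ensemble member exposes a `predict` returning per-voxel class probabilities):

```python
import numpy as np

def pseudo_label(ensemble, unlabeled_volumes):
    # Average the ensemble's soft segmentations into hard pseudo-labels;
    # the single student model is then trained on these alongside the
    # originally labeled data.
    pseudo = []
    for vol in unlabeled_volumes:
        probs = np.mean([m.predict(vol) for m in ensemble], axis=0)
        pseudo.append(probs.argmax(axis=0))  # (D, H, W) class map
    return pseudo
```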

Deep Residual-Dense Lattice Network for Speech Enhancement

Title Deep Residual-Dense Lattice Network for Speech Enhancement
Authors Mohammad Nikzad, Aaron Nicolson, Yongsheng Gao, Jun Zhou, Kuldip K. Paliwal, Fanhua Shang
Abstract Convolutional neural networks (CNNs) with residual links (ResNets) and causal dilated convolutional units have been the network of choice for deep learning approaches to speech enhancement. While residual links improve gradient flow during training, feature diminution of shallow layer outputs can occur due to repetitive summations with deeper layer outputs. One strategy to improve feature re-usage is to fuse both ResNets and densely connected CNNs (DenseNets). DenseNets, however, over-allocate parameters for feature re-usage. Motivated by this, we propose the residual-dense lattice network (RDL-Net), which is a new CNN for speech enhancement that employs both residual and dense aggregations without over-allocating parameters for feature re-usage. This is managed through the topology of the RDL blocks, which limit the number of outputs used for dense aggregations. Our extensive experimental investigation shows that RDL-Nets are able to achieve a higher speech enhancement performance than CNNs that employ residual and/or dense aggregations. RDL-Nets also use substantially fewer parameters and have a lower computational requirement. Furthermore, we demonstrate that RDL-Nets outperform many state-of-the-art deep learning approaches to speech enhancement.
Tasks Speech Enhancement
Published 2020-02-27
URL https://arxiv.org/abs/2002.12794v1
PDF https://arxiv.org/pdf/2002.12794v1.pdf
PWC https://paperswithcode.com/paper/deep-residual-dense-lattice-network-for
Repo https://github.com/nick-nikzad/RDL-SE
Framework tf
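
The key topological idea, limiting how many previous outputs feed each dense aggregation, can be sketched as follows (a hedged PyTorch illustration; the real RDL-Net blocks operate on spectrograms and are more elaborate):

```python
import torch
import torch.nn as nn

class RDLBlockSketch(nn.Module):
    def __init__(self, channels, depth=4, window=2):
        super().__init__()
        self.window = window
        # Each conv sees at most (window + 1) previous outputs, so dense
        # aggregation does not over-allocate parameters to feature re-usage.
        self.convs = nn.ModuleList(
            nn.Conv1d(min(i + 1, window + 1) * channels, channels,
                      kernel_size=3, padding=1)
            for i in range(depth))

    def forward(self, x):
        outs = [x]
        for conv in self.convs:
            feats = torch.cat(outs[-(self.window + 1):], dim=1)
            outs.append(torch.relu(conv(feats)))
        return x + outs[-1]  # residual link around the block
```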

NPLDA: A Deep Neural PLDA Model for Speaker Verification

Title NPLDA: A Deep Neural PLDA Model for Speaker Verification
Authors Shreyas Ramoji, Prashant Krishnan, Sriram Ganapathy
Abstract The state-of-the-art approach for speaker verification consists of a neural network based embedding extractor along with a backend generative model such as Probabilistic Linear Discriminant Analysis (PLDA). In this work, we propose a neural network approach for backend modeling in speaker recognition. The likelihood ratio score of the generative PLDA model is posed as a discriminative similarity function, and the learnable parameters of the score function are optimized using a verification cost. The proposed model, termed neural PLDA (NPLDA), is initialized using the generative PLDA model parameters. The loss function for the NPLDA model is an approximation of the minimum detection cost function (DCF). The speaker recognition experiments using the NPLDA model are performed on the speaker verification task in the VOiCES datasets as well as the SITW challenge dataset. In these experiments, the NPLDA model optimized using the proposed loss function improves significantly over the state-of-the-art PLDA based speaker verification system.
Tasks Speaker Recognition, Speaker Verification
Published 2020-02-10
URL https://arxiv.org/abs/2002.03562v1
PDF https://arxiv.org/pdf/2002.03562v1.pdf
PWC https://paperswithcode.com/paper/nplda-a-deep-neural-plda-model-for-speaker
Repo https://github.com/iiscleap/NeuralPlda
Framework pytorch
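
The discriminative similarity has a compact form: the PLDA log-likelihood ratio is a quadratic function of the two embeddings with learnable matrices. A hedged sketch (the initializations and the exact soft-DCF form below are illustrative):

```python
import torch
import torch.nn as nn

class NeuralPLDA(nn.Module):
    # Score s(e1, e2) = e1'P e2 + e1'Q e1 + e2'Q e2 + b; in the paper,
    # P, Q, b are initialized from a generative PLDA backend.
    def __init__(self, dim):
        super().__init__()
        self.P = nn.Parameter(torch.eye(dim))
        self.Q = nn.Parameter(-0.1 * torch.eye(dim))
        self.b = nn.Parameter(torch.zeros(1))

    def forward(self, e1, e2):
        cross = (e1 @ self.P * e2).sum(-1)
        quad = (e1 @ self.Q * e1).sum(-1) + (e2 @ self.Q * e2).sum(-1)
        return cross + quad + self.b

def soft_detection_cost(scores, labels, threshold=0.0, c_miss=1.0, c_fa=1.0):
    # Differentiable stand-in for the minimum detection cost: sigmoids
    # replace the hard accept/reject decision at the threshold.
    p_accept = torch.sigmoid(scores - threshold)
    p_miss = ((1 - p_accept) * labels).sum() / labels.sum().clamp(min=1)
    p_fa = (p_accept * (1 - labels)).sum() / (1 - labels).sum().clamp(min=1)
    return c_miss * p_miss + c_fa * p_fa
```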

DropClass and DropAdapt: Dropping classes for deep speaker representation learning

Title DropClass and DropAdapt: Dropping classes for deep speaker representation learning
Authors Chau Luu, Peter Bell, Steve Renals
Abstract Many recent works on deep speaker embeddings train their feature extraction networks on large classification tasks, distinguishing between all speakers in a training set. Empirically, this has been shown to produce speaker-discriminative embeddings, even for unseen speakers. However, it is not clear that this is the optimal means of training embeddings that generalize well. This work proposes two approaches to learning embeddings based on the notion of dropping classes during training. We demonstrate that both approaches can yield performance gains in speaker verification tasks. The first proposed method, DropClass, works by periodically dropping a random subset of classes from the training data and the output layer throughout training, resulting in a feature extractor trained on many different classification tasks. Combined with an additive angular margin loss, this method can yield a 7.9% relative improvement in equal error rate (EER) over a strong baseline on VoxCeleb. The second proposed method, DropAdapt, is a means of adapting a trained model to a set of enrolment speakers in an unsupervised manner. This is performed by fine-tuning a model on only those classes which produce high-probability predictions when the enrolment speakers are used as input, again dropping the relevant rows from the output layer. This method yields a large 13.2% relative improvement in EER on VoxCeleb. The code for this paper has been made publicly available.
Tasks Representation Learning, Speaker Verification
Published 2020-02-02
URL https://arxiv.org/abs/2002.00453v1
PDF https://arxiv.org/pdf/2002.00453v1.pdf
PWC https://paperswithcode.com/paper/dropclass-and-dropadapt-dropping-classes-for
Repo https://github.com/cvqluu/dropclass_speaker
Framework pytorch
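
Mechanically, a DropClass iteration just filters the training set and slices the classifier head. A hedged sketch (we assume class labels directly index rows of the output-layer weight matrix):

```python
import random
import torch

def drop_classes(labels, head_weight, keep_frac=0.9):
    # labels: per-example class ids; head_weight: (n_classes, d) output layer.
    classes = sorted(set(labels))
    kept = sorted(random.sample(classes, int(keep_frac * len(classes))))
    kept_set = set(kept)
    remap = {c: i for i, c in enumerate(kept)}
    keep_idx = [i for i, y in enumerate(labels) if y in kept_set]
    sub_weight = head_weight[torch.tensor(kept)]  # matching classifier rows
    sub_labels = [remap[labels[i]] for i in keep_idx]
    return keep_idx, sub_labels, sub_weight  # train on this subtask for a while
```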

Drone Based RGBT Vehicle Detection and Counting: A Challenge

Title Drone Based RGBT Vehicle Detection and Counting: A Challenge
Authors Pengfei Zhu, Yiming Sun, Longyin Wen, Yu Feng, Qinghua Hu
Abstract Camera-equipped drones can capture targets on the ground from a wider field of view than static cameras or moving sensors on the ground. In this paper, we present a large-scale vehicle detection and counting benchmark, named DroneVehicle, aiming at advancing visual analysis tasks on the drone platform. The images in the benchmark were captured over various urban areas, which include different types of urban roads, residential areas, parking lots, highways, etc., from day to night. Specifically, DroneVehicle consists of 15,532 pairs of images, i.e., RGB images and infrared images, with rich annotations including oriented object bounding boxes, object categories, etc. With an intensive annotation effort, our benchmark has 441,642 annotated instances in 31,064 images. As a large-scale dataset with both RGB and thermal infrared (RGBT) images, the benchmark enables extensive evaluation and investigation of visual analysis algorithms on the drone platform. In particular, we design two popular tasks with the benchmark: object detection and object counting. All these tasks are extremely challenging in the proposed dataset due to factors such as illumination, occlusion, and scale variations. We hope the benchmark will significantly boost research and development in visual analysis on drone platforms. The DroneVehicle dataset can be downloaded from https://github.com/VisDrone/DroneVehicle.
Tasks Object Counting, Object Detection
Published 2020-03-05
URL https://arxiv.org/abs/2003.02437v1
PDF https://arxiv.org/pdf/2003.02437v1.pdf
PWC https://paperswithcode.com/paper/drone-based-rgbt-vehicle-detection-and
Repo https://github.com/VisDrone/DroneVehicle
Framework none

Regularizers for Single-step Adversarial Training

Title Regularizers for Single-step Adversarial Training
Authors B. S. Vivek, R. Venkatesh Babu
Abstract The progress of the last decade has enabled machine learning models to achieve impressive performance across a wide range of tasks in computer vision. However, a plethora of works have demonstrated the susceptibility of these models to adversarial samples. Adversarial training has been proposed to defend against such adversarial attacks. Adversarial training methods augment mini-batches with adversarial samples, and typically single-step (non-iterative) methods are used for generating these samples. However, models trained using single-step adversarial training converge to degenerate minima where the model merely appears to be robust. The pseudo robustness of these models is due to the gradient masking effect. Although multi-step adversarial training helps to learn robust models, it is hard to scale due to the use of iterative methods for generating adversarial samples. To address these issues, we propose three different types of regularizers that help to learn robust models using single-step adversarial training methods. The proposed regularizers mitigate the effect of gradient masking by harnessing properties that differentiate a robust model from a pseudo-robust one. The performance of models trained using the proposed regularizers is on par with that of models trained using computationally expensive multi-step adversarial training methods.
Tasks
Published 2020-02-03
URL https://arxiv.org/abs/2002.00614v1
PDF https://arxiv.org/pdf/2002.00614v1.pdf
PWC https://paperswithcode.com/paper/regularizers-for-single-step-adversarial
Repo https://github.com/val-iisc/SAT-Rx
Framework pytorch
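
For context, single-step adversarial training pairs naturally with a regularizer on the adversarial batch. The sketch below uses a clean-vs-adversarial logit penalty as a stand-in; the paper's three actual regularizers differ, so treat this particular form as an assumption:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    # Single-step (FGSM) adversarial sample generation.
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).detach()

def regularized_sat_loss(model, x, y, eps, lam=1.0):
    # Train on FGSM samples plus a penalty keeping clean and adversarial
    # logits close, one simple way to discourage gradient masking.
    x_adv = fgsm(model, x, y, eps)
    logits_clean, logits_adv = model(x), model(x_adv)
    loss = F.cross_entropy(logits_adv, y)
    reg = F.mse_loss(logits_adv, logits_clean)  # assumed regularizer form
    return loss + lam * reg
```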

Variational Depth Search in ResNets

Title Variational Depth Search in ResNets
Authors Javier Antorán, James Urquhart Allingham, José Miguel Hernández-Lobato
Abstract One-shot neural architecture search allows joint learning of weights and network architecture, reducing computational cost. We limit our search space to the depth of residual networks and formulate an analytically tractable variational objective that allows for obtaining an unbiased approximate posterior over depths in one shot. We propose a heuristic to prune our networks based on this distribution. We compare our proposed method against manual search over network depths on the MNIST, Fashion-MNIST, and SVHN datasets. We find that pruned networks do not incur a loss in predictive performance, obtaining accuracies competitive with unpruned networks. Marginalising over depth allows us to obtain better-calibrated test-time uncertainty estimates than regular networks, in a single forward pass.
Tasks Neural Architecture Search
Published 2020-02-06
URL https://arxiv.org/abs/2002.02797v3
PDF https://arxiv.org/pdf/2002.02797v3.pdf
PWC https://paperswithcode.com/paper/variational-depth-search-in-resnets
Repo https://github.com/anonimoose12345678/arch_uncert
Framework pytorch
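
The one-shot objective can be sketched compactly: evaluate the network's head after every residual block, weight the per-depth likelihoods by a learnable categorical posterior, and subtract a KL term to a uniform prior. A hedged toy version (MLP blocks, with batch cross-entropy as the likelihood proxy):

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthSearchSketch(nn.Module):
    def __init__(self, dim, max_depth, n_classes):
        super().__init__()
        self.blocks = nn.ModuleList(nn.Linear(dim, dim) for _ in range(max_depth))
        self.head = nn.Linear(dim, n_classes)
        self.depth_logits = nn.Parameter(torch.zeros(max_depth + 1))

    def loss(self, x, y):
        q = F.softmax(self.depth_logits, dim=0)  # posterior over depths 0..D
        h = x
        log_liks = [-F.cross_entropy(self.head(h), y)]
        for block in self.blocks:
            h = h + torch.relu(block(h))         # one residual block deeper
            log_liks.append(-F.cross_entropy(self.head(h), y))
        exp_ll = (q * torch.stack(log_liks)).sum()
        # KL(q || uniform) = sum q log q + log K
        kl = (q * q.clamp_min(1e-12).log()).sum() + math.log(q.numel())
        return -(exp_ll - kl)                    # negative ELBO to minimize
```

Pruning then amounts to dropping all blocks past the depth where the cumulative posterior mass exceeds some threshold.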

Searching Central Difference Convolutional Networks for Face Anti-Spoofing

Title Searching Central Difference Convolutional Networks for Face Anti-Spoofing
Authors Zitong Yu, Chenxu Zhao, Zezheng Wang, Yunxiao Qin, Zhuo Su, Xiaobai Li, Feng Zhou, Guoying Zhao
Abstract Face anti-spoofing (FAS) plays a vital role in face recognition systems. Most state-of-the-art FAS methods 1) rely on stacked convolutions and expert-designed networks, which are weak at describing detailed fine-grained information and easily become ineffective when the environment varies (e.g., under different illumination), and 2) prefer to use long sequences as input to extract dynamic features, making them difficult to deploy in scenarios that need a quick response. Here we propose a novel frame-level FAS method based on Central Difference Convolution (CDC), which is able to capture intrinsic detailed patterns by aggregating both intensity and gradient information. A network built with CDC, called the Central Difference Convolutional Network (CDCN), provides more robust modeling capacity than its counterpart built with vanilla convolution. Furthermore, over a specifically designed CDC search space, Neural Architecture Search (NAS) is utilized to discover a more powerful network structure (CDCN++), which can be assembled with a Multiscale Attention Fusion Module (MAFM) to further boost performance. Comprehensive experiments are performed on six benchmark datasets to show that 1) the proposed method achieves superior performance on intra-dataset testing (especially 0.2% ACER in Protocol-1 of the OULU-NPU dataset), and 2) it also generalizes well on cross-dataset testing (particularly 6.5% HTER from CASIA-MFSD to Replay-Attack). The code is available at https://github.com/ZitongYu/CDCN.
Tasks Face Anti-Spoofing, Face Recognition, Neural Architecture Search
Published 2020-03-09
URL https://arxiv.org/abs/2003.04092v1
PDF https://arxiv.org/pdf/2003.04092v1.pdf
PWC https://paperswithcode.com/paper/searching-central-difference-convolutional
Repo https://github.com/ZitongYu/CDCN
Framework pytorch
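
CDC admits a neat two-convolution sketch: the central-difference term can be computed as a 1x1 convolution with the kernel's spatial sum, subtracted from the vanilla output. A hedged illustration (theta follows the paper's notation; other details are assumptions):

```python
import torch.nn as nn
import torch.nn.functional as F

class CDConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, theta=0.7):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False)
        self.theta = theta

    def forward(self, x):
        out = self.conv(x)  # vanilla 3x3 convolution (intensity information)
        # Central-difference term: summing the kernel spatially and applying
        # it as a 1x1 convolution matches convolving the center value with
        # the whole window, so the subtraction aggregates (neighbor - center)
        # gradient information.
        k_sum = self.conv.weight.sum(dim=(2, 3), keepdim=True)
        return out - self.theta * F.conv2d(x, k_sum)
```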

A Spatio-Temporal Spot-Forecasting Framework for Urban Traffic Prediction

Title A Spatio-Temporal Spot-Forecasting Framework for Urban Traffic Prediction
Authors Rodrigo de Medrano, José L. Aznarte
Abstract Spatio-temporal forecasting is an open research field attracting rapidly growing interest. In this work, we focus on creating a deep neural framework for spatio-temporal traffic forecasting that delivers comparatively very good performance, adapts across several spatio-temporal conditions, and remains easy to understand and interpret. Our proposal is based on an interpretable attention-based neural network in which several modules are combined to capture key spatio-temporal time series components. Through extensive experimentation, we show that the results of our approach are stable and better than those of other state-of-the-art alternatives.
Tasks Spatio-Temporal Forecasting, Time Series, Traffic Prediction
Published 2020-03-31
URL https://arxiv.org/abs/2003.13977v2
PDF https://arxiv.org/pdf/2003.13977v2.pdf
PWC https://paperswithcode.com/paper/a-spatio-temporal-spot-forecasting-framework
Repo https://github.com/rdemedrano/crann_traffic
Framework none

Représentations lexicales pour la détection non supervisée d'événements dans un flux de tweets : étude sur des corpus français et anglais (Lexical Representations for Unsupervised Event Detection in a Tweet Stream: A Study on French and English Corpora)

Title Représentations lexicales pour la détection non supervisée d'événements dans un flux de tweets : étude sur des corpus français et anglais (Lexical Representations for Unsupervised Event Detection in a Tweet Stream: A Study on French and English Corpora)
Authors Béatrice Mazoyer, Nicolas Hervé, Céline Hudelot, Julia Cage
Abstract In this work, we evaluate the performance of recent text embeddings for the automatic detection of events in a stream of tweets. We model this task as a dynamic clustering problem. Our experiments are conducted on a publicly available corpus of tweets in English and on a similar dataset in French annotated by our team. We show that recent techniques based on deep neural networks (ELMo, Universal Sentence Encoder, BERT, SBERT), although promising in many applications, are not very suitable for this task. We also experiment with different types of fine-tuning to improve these results on French data. Finally, we propose a detailed analysis of the results obtained, showing the superiority of tf-idf approaches for this task.
Tasks
Published 2020-01-13
URL https://arxiv.org/abs/2001.04139v1
PDF https://arxiv.org/pdf/2001.04139v1.pdf
PWC https://paperswithcode.com/paper/representations-lexicales-pour-la-detection
Repo https://github.com/ina-foss/twembeddings
Framework tf
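
The winning tf-idf baseline amounts to first-story-detection-style dynamic clustering, sketched below (hedged: batch-fitting the vectorizer and the running-mean centroid update are simplifications of a true streaming setup):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def cluster_tweet_stream(tweets, threshold=0.5):
    vecs = TfidfVectorizer().fit_transform(tweets).toarray()
    centroids, assignments = [], []
    for v in vecs:
        if centroids:
            sims = cosine_similarity(v[None], np.stack(centroids))[0]
            best = int(sims.argmax())
            if sims[best] >= threshold:
                assignments.append(best)               # join the closest event
                centroids[best] = (centroids[best] + v) / 2
                continue
        centroids.append(v)                            # open a new event
        assignments.append(len(centroids) - 1)
    return assignments
```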