Paper Group AWR 40
Embedding Propagation: Smoother Manifold for Few-Shot Classification. Discrete-valued Preference Estimation with Graph Side Information. PeelNet: Textured 3D reconstruction of human body using single view RGB image. Outcome Correlation in Graph Neural Network Regression. Explainable Deep Convolutional Candlestick Learner. Knowledge Distillation for …
Embedding Propagation: Smoother Manifold for Few-Shot Classification
Title | Embedding Propagation: Smoother Manifold for Few-Shot Classification |
Authors | Pau Rodríguez, Issam Laradji, Alexandre Drouin, Alexandre Lacoste |
Abstract | Few-shot classification is challenging because the data distribution of the training set can differ widely from that of the test set, as their classes are disjoint. This distribution shift often results in poor generalization. Manifold smoothing has been shown to address the distribution shift problem by extending the decision boundaries and reducing the noise of the class representations. Moreover, manifold smoothness is a key factor for semi-supervised learning and transductive learning algorithms. In this work, we present embedding propagation as an unsupervised non-parametric regularizer for manifold smoothing. Embedding propagation leverages interpolations between the extracted features of a neural network based on a similarity graph. We empirically show that embedding propagation yields a smoother embedding manifold. We also show that incorporating embedding propagation into a transductive classifier leads to new state-of-the-art results on mini-Imagenet, tiered-Imagenet, and CUB. Furthermore, we show that embedding propagation yields additional performance improvements in semi-supervised learning scenarios. |
Tasks | |
Published | 2020-03-09 |
URL | https://arxiv.org/abs/2003.04151v1 |
https://arxiv.org/pdf/2003.04151v1.pdf | |
PWC | https://paperswithcode.com/paper/embedding-propagation-smoother-manifold-for |
Repo | https://github.com/ElementAI/embedding-propagation |
Framework | pytorch |
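The propagation step lends itself to a compact implementation: build a similarity graph over an episode's extracted features, form a normalized propagator, and replace each feature with a graph-weighted interpolation of all features. The sketch below illustrates this idea in PyTorch; the RBF bandwidth, the `alpha` mixing coefficient, and the function name are illustrative choices, not the authors' exact implementation (their code is in the linked repo).

```python
import torch

def embedding_propagation(features, alpha=0.5, rbf_scale=1.0):
    """Smooth a batch of embeddings by propagating them over a similarity graph.

    Sketch of the embedding-propagation idea: build an RBF similarity graph over
    the extracted features, form a normalized label-propagation-style propagator,
    and replace each embedding by a weighted interpolation of all embeddings.
    `alpha` and `rbf_scale` are illustrative hyperparameters, not the paper's values.
    """
    n = features.shape[0]
    # Pairwise squared Euclidean distances and an RBF similarity graph (no self-loops).
    dist = torch.cdist(features, features) ** 2
    sims = torch.exp(-dist / (rbf_scale * dist.mean() + 1e-8))
    sims.fill_diagonal_(0)
    # Symmetrically normalized adjacency, as in label propagation.
    deg = sims.sum(dim=1)
    d_inv_sqrt = torch.diag(deg.clamp(min=1e-8).rsqrt())
    s = d_inv_sqrt @ sims @ d_inv_sqrt
    # Closed-form propagator (I - alpha * S)^{-1}, then interpolate the embeddings.
    propagator = torch.linalg.inv(torch.eye(n) - alpha * s)
    return propagator @ features

# Example: smooth the embeddings of a 25-sample episode with 64-d features.
smoothed = embedding_propagation(torch.randn(25, 64))
```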
Discrete-valued Preference Estimation with Graph Side Information
Title | Discrete-valued Preference Estimation with Graph Side Information |
Authors | Changhun Jo, Kangwook Lee |
Abstract | Incorporating graph side information into recommender systems has been widely used to better predict ratings, but relatively few works have focused on theoretical guarantees. Ahn et al. (2018) first characterized the optimal sample complexity in the presence of graph side information, but their results are limited by strict, unrealistic assumptions on the unknown preference matrix. In this work, we propose a new model in which the unknown preference matrix can have any discrete values, thereby relaxing the assumptions made in prior work. Under this new model, we fully characterize the optimal sample complexity and develop a computationally efficient algorithm that matches it. We also show that our algorithm is robust to model errors and that it outperforms existing algorithms on both synthetic and real datasets. |
Tasks | Recommendation Systems |
Published | 2020-03-16 |
URL | https://arxiv.org/abs/2003.07040v1 |
https://arxiv.org/pdf/2003.07040v1.pdf | |
PWC | https://paperswithcode.com/paper/discrete-valued-preference-estimation-with |
Repo | https://github.com/changhunjo0927/Discrete_Preference_Codesource |
Framework | none |
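The abstract does not spell out the algorithm, so the snippet below is only a rough illustration of the general recipe such graph-assisted estimators follow: recover user groups from the graph side information, then fill in each group's discrete preference for an item by a plurality vote over observed ratings. Everything here (spectral clustering, the `estimate_discrete_preferences` helper, the NaN convention for missing entries) is an assumption made for illustration, not the paper's method.

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from sklearn.cluster import KMeans

def estimate_discrete_preferences(adj, ratings, n_clusters=2):
    """Illustrative pipeline (not the paper's exact algorithm): cluster users with
    the graph side information, then fill each (cluster, item) preference by a
    plurality vote over the observed discrete ratings.

    adj:     (n_users, n_users) adjacency matrix of the social graph.
    ratings: (n_users, n_items) observed ratings; np.nan marks missing entries.
    """
    # Spectral embedding of the user graph, then k-means to recover user groups.
    lap = laplacian(adj, normed=True)
    eigvals, eigvecs = np.linalg.eigh(lap)
    clusters = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(eigvecs[:, :n_clusters])

    # Plurality vote inside each cluster gives the discrete preference per item.
    estimate = np.copy(ratings)
    for c in range(n_clusters):
        members = clusters == c
        for j in range(ratings.shape[1]):
            observed = ratings[members, j]
            observed = observed[~np.isnan(observed)]
            if observed.size:
                values, counts = np.unique(observed, return_counts=True)
                estimate[members, j] = values[np.argmax(counts)]
    return estimate
```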
PeelNet: Textured 3D reconstruction of human body using single view RGB image
Title | PeelNet: Textured 3D reconstruction of human body using single view RGB image |
Authors | Sai Sagar Jinka, Rohan Chacko, Avinash Sharma, P. J. Narayanan |
Abstract | Reconstructing human shape and pose from a single image is a challenging problem due to issues such as severe self-occlusions, clothing variations, and changes in lighting, to name a few. Many applications in the entertainment industry, e-commerce, health care (physiotherapy), and mobile-based AR/VR platforms can benefit from recovering 3D human shape, pose, and texture. In this paper, we present PeelNet, an end-to-end generative adversarial framework for textured 3D reconstruction of the human body from a single RGB image. Motivated by ray tracing for generating realistic images of a 3D scene, we represent the human body as a set of peeled depth and RGB maps, obtained by extending rays beyond the first intersection with the 3D object. This formulation allows us to handle self-occlusions efficiently. Current parametric model-based approaches are designed for the underlying naked human body and fail to model loose clothing and surface-level details. The majority of non-parametric approaches are either computationally expensive or produce unsatisfactory results. We present a simple non-parametric solution in which the peeled maps are generated from a single RGB image. The predicted peeled depth maps are back-projected to a 3D volume to obtain a complete 3D shape, and the corresponding RGB maps provide vertex-level texture details. We compare our method against current state-of-the-art methods in 3D reconstruction and demonstrate its effectiveness on the BUFF and MonoPerfCap datasets. |
Tasks | 3D Reconstruction |
Published | 2020-02-16 |
URL | https://arxiv.org/abs/2002.06664v1 |
https://arxiv.org/pdf/2002.06664v1.pdf | |
PWC | https://paperswithcode.com/paper/peelnet-textured-3d-reconstruction-of-human |
Repo | https://github.com/chingswy/HumanPoseMemo |
Framework | pytorch |
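Once the network predicts peeled depth and RGB maps, the reconstruction step is essentially a per-layer pinhole unprojection. The sketch below shows that back-projection for an assumed pinhole camera; the function name, the zero-depth convention for empty rays, and the intrinsics interface are illustrative assumptions rather than details from the paper.

```python
import numpy as np

def peeled_maps_to_pointcloud(depth_layers, rgb_layers, fx, fy, cx, cy):
    """Back-project a set of peeled depth/RGB maps into a colored 3D point cloud.

    A minimal sketch of the reconstruction step described in the abstract,
    assuming a simple pinhole camera with intrinsics (fx, fy, cx, cy).
    depth_layers: (L, H, W) peeled depth maps, 0 where a ray hits nothing.
    rgb_layers:   (L, H, W, 3) peeled RGB maps aligned with the depth maps.
    """
    points, colors = [], []
    h, w = depth_layers.shape[1:]
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    for depth, rgb in zip(depth_layers, rgb_layers):
        valid = depth > 0
        z = depth[valid]
        # Standard pinhole unprojection of each valid pixel at its peeled depth.
        x = (us[valid] - cx) * z / fx
        y = (vs[valid] - cy) * z / fy
        points.append(np.stack([x, y, z], axis=-1))
        colors.append(rgb[valid])
    return np.concatenate(points), np.concatenate(colors)
```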
Outcome Correlation in Graph Neural Network Regression
Title | Outcome Correlation in Graph Neural Network Regression |
Authors | Junteng Jia, Austin Benson |
Abstract | Graph neural networks aggregate features in vertex neighborhoods to learn vector representations of all vertices, using supervision from some labeled vertices during training. The predictor is then a function of the vector representation, and predictions are made independently on unlabeled nodes. This widely adopted approach implicitly assumes that vertex labels are independent after conditioning on their neighborhoods. We show that this strong assumption is far from true on many real-world graph datasets and severely limits predictive power on a number of regression tasks. Given that traditional graph-based semi-supervised learning methods operate in the opposite manner, explicitly modeling the correlation in predicted outcomes, this limitation may not be all that surprising. Here, we address this issue with a simple and interpretable framework that can improve any graph neural network architecture by modeling the correlation structure in regression outcome residuals. Specifically, we model the joint distribution of outcome residuals on vertices with a parameterized multivariate Gaussian, whose parameters are estimated by maximizing the marginal likelihood of the observed labels. Our model substantially boosts the performance of graph neural networks, and the learned parameters can be interpreted as the strength of correlation among connected vertices. To scale to large networks, we design linear-time algorithms for low-variance, unbiased estimates of the model parameters based on stochastic trace estimation. We also provide a simplified version of our method that makes stronger assumptions on the correlation structure but is extremely easy to implement and performs well in practice in several cases. |
Tasks | |
Published | 2020-02-19 |
URL | https://arxiv.org/abs/2002.08274v1 |
https://arxiv.org/pdf/2002.08274v1.pdf | |
PWC | https://paperswithcode.com/paper/outcome-correlation-in-graph-neural-network |
Repo | https://github.com/000Justin000/gnn-residual-correlation |
Framework | none |
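The paper's simplified variant amounts to propagating the base model's residuals from labeled to unlabeled vertices over the graph. A minimal NumPy sketch of that idea follows; the diffusion coefficient `alpha`, the iteration count, and the symmetric normalization are illustrative choices rather than the paper's exact settings.

```python
import numpy as np

def residual_propagation(adj, base_pred, labels, labeled_mask, alpha=0.9, n_iter=50):
    """Sketch of the simplified residual-correlation idea: take a trained GNN's
    predictions, compute residuals on the labeled vertices, diffuse those
    residuals over the graph, and add them back to the unlabeled predictions.
    `alpha` and `n_iter` are illustrative choices, not values from the paper.
    """
    # Symmetrically normalized adjacency for the diffusion.
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.clip(deg, 1e-8, None)))
    s = d_inv_sqrt @ adj @ d_inv_sqrt

    residual = np.zeros_like(base_pred)
    residual[labeled_mask] = labels[labeled_mask] - base_pred[labeled_mask]
    propagated = residual.copy()
    for _ in range(n_iter):
        propagated = alpha * s @ propagated + (1 - alpha) * residual
        # Keep the observed residuals clamped to their known values.
        propagated[labeled_mask] = residual[labeled_mask]
    return base_pred + propagated
```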
Explainable Deep Convolutional Candlestick Learner
Title | Explainable Deep Convolutional Candlestick Learner |
Authors | Jun-Hao Chen, Samuel Yen-Chi Chen, Yun-Cheng Tsai, Chih-Shiang Shur |
Abstract | Candlesticks are graphical representations of price movements over a given period. Traders can discover the trend of an asset by looking at candlestick patterns. Although deep convolutional neural networks have achieved great success in recognizing candlestick patterns, their reasoning is hidden inside a black box, and traders cannot be sure what the model has learned. In this contribution, we provide a framework that explains how the learned model identifies specific candlestick patterns in a time series. Based on local-search adversarial attacks, we show that the learned model perceives candlestick patterns in a way similar to a human trader. |
Tasks | Time Series |
Published | 2020-01-08 |
URL | https://arxiv.org/abs/2001.02767v3 |
https://arxiv.org/pdf/2001.02767v3.pdf | |
PWC | https://paperswithcode.com/paper/explainable-deep-convolutional-candlestick |
Repo | https://github.com/pecu/FinancialVision |
Framework | tf |
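A local-search attack needs no gradients: it repeatedly tries small edits to individual candlestick values and keeps the edit that most undermines the predicted pattern. The sketch below shows one greedy variant for an arbitrary `model_predict` callable; the perturbation size, the stopping rule, and the OHLC layout are assumptions made for illustration, not the paper's exact attack.

```python
import numpy as np

def local_search_attack(model_predict, candles, step=0.01, max_iters=100):
    """Sketch of a local-search adversarial probe for a candlestick classifier.

    `model_predict` is any function mapping an OHLC array of shape (T, 4) to
    class probabilities; the attack greedily nudges one price value at a time
    and keeps the perturbation that most reduces the originally predicted
    class's probability.
    """
    x = candles.copy()
    target = int(np.argmax(model_predict(x)))
    for _ in range(max_iters):
        best_score, best_x = model_predict(x)[target], None
        for t in range(x.shape[0]):
            for f in range(x.shape[1]):
                for direction in (+step, -step):
                    candidate = x.copy()
                    candidate[t, f] *= 1 + direction
                    score = model_predict(candidate)[target]
                    if score < best_score:
                        best_score, best_x = score, candidate
        if best_x is None:          # no single edit helps any more
            break
        x = best_x
        if np.argmax(model_predict(x)) != target:
            break                   # prediction flipped: x is adversarial
    return x
```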
Knowledge Distillation for Brain Tumor Segmentation
Title | Knowledge Distillation for Brain Tumor Segmentation |
Authors | Dmitrii Lachinov, Elena Shipunova, Vadim Turlapov |
Abstract | The segmentation of brain tumors in multimodal MRIs is one of the most challenging tasks in medical image analysis. Recent state-of-the-art algorithms for this task are based on machine learning, and on deep learning in particular. The amount and variability of the data used to train such models are key to building an algorithm with high representation power. In this paper, we study the relationship between model performance and the amount of data employed during training. Using the brain tumor segmentation challenge as an example, we compare a model trained only with the labeled data provided by the challenge organizers against the same model trained in an omni-supervised manner, using additional unlabeled data annotated by an ensemble of heterogeneous models. As a result, a single model trained with the additional data achieves performance close to that of the ensemble and outperforms the individual methods. |
Tasks | Brain Tumor Segmentation |
Published | 2020-02-10 |
URL | https://arxiv.org/abs/2002.03688v1 |
https://arxiv.org/pdf/2002.03688v1.pdf | |
PWC | https://paperswithcode.com/paper/knowledge-distillation-for-brain-tumor |
Repo | https://github.com/lachinov/brats2019 |
Framework | pytorch |
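The omni-supervised setup can be summarized as pseudo-labeling with an ensemble of teachers. The sketch below shows one training loop under that reading; the loader interfaces, the equal loss weighting, and the argmax pseudo-label are placeholder assumptions, not details taken from the paper or its repository.

```python
import torch

def distill_from_ensemble(student, ensemble, labeled_loader, unlabeled_loader,
                          optimizer, loss_fn, epochs=10):
    """Minimal sketch of an omni-supervised recipe: an ensemble of heterogeneous
    teachers annotates extra unlabeled scans, and a single student is trained on
    the union of real and ensemble-generated labels. Interfaces are placeholders.
    """
    for epoch in range(epochs):
        for (x_lab, y_lab), (x_unlab, _) in zip(labeled_loader, unlabeled_loader):
            with torch.no_grad():
                # Average the teachers' soft segmentation maps as pseudo-labels.
                pseudo = torch.stack([teacher(x_unlab) for teacher in ensemble]).mean(0)
                pseudo = pseudo.argmax(dim=1)
            optimizer.zero_grad()
            loss = loss_fn(student(x_lab), y_lab) + loss_fn(student(x_unlab), pseudo)
            loss.backward()
            optimizer.step()
```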
Deep Residual-Dense Lattice Network for Speech Enhancement
Title | Deep Residual-Dense Lattice Network for Speech Enhancement |
Authors | Mohammad Nikzad, Aaron Nicolson, Yongsheng Gao, Jun Zhou, Kuldip K. Paliwal, Fanhua Shang |
Abstract | Convolutional neural networks (CNNs) with residual links (ResNets) and causal dilated convolutional units have been the network of choice for deep learning approaches to speech enhancement. While residual links improve gradient flow during training, feature diminution of shallow layer outputs can occur due to repetitive summations with deeper layer outputs. One strategy to improve feature re-usage is to fuse both ResNets and densely connected CNNs (DenseNets). DenseNets, however, over-allocate parameters for feature re-usage. Motivated by this, we propose the residual-dense lattice network (RDL-Net), which is a new CNN for speech enhancement that employs both residual and dense aggregations without over-allocating parameters for feature re-usage. This is managed through the topology of the RDL blocks, which limit the number of outputs used for dense aggregations. Our extensive experimental investigation shows that RDL-Nets are able to achieve a higher speech enhancement performance than CNNs that employ residual and/or dense aggregations. RDL-Nets also use substantially fewer parameters and have a lower computational requirement. Furthermore, we demonstrate that RDL-Nets outperform many state-of-the-art deep learning approaches to speech enhancement. |
Tasks | Speech Enhancement |
Published | 2020-02-27 |
URL | https://arxiv.org/abs/2002.12794v1 |
https://arxiv.org/pdf/2002.12794v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-residual-dense-lattice-network-for |
Repo | https://github.com/nick-nikzad/RDL-SE |
Framework | tf |
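As a rough picture of combining the two aggregation styles without letting channel counts explode, the block below concatenates only the most recent layer outputs (dense aggregation with a cap) and adds the block input back at the end (residual aggregation). It is a generic illustration, not the actual RDL-block topology, and all sizes are arbitrary.

```python
import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    """Illustration of mixing residual and dense aggregation for 1-D speech
    features while capping how many previous outputs feed each dense
    concatenation; this is a generic block, not the exact RDL-block design.
    """
    def __init__(self, channels, n_layers=4, max_dense_inputs=2, kernel_size=3):
        super().__init__()
        self.max_dense_inputs = max_dense_inputs
        self.convs = nn.ModuleList([
            nn.Conv1d(channels * min(i + 1, max_dense_inputs), channels,
                      kernel_size, padding=kernel_size // 2)
            for i in range(n_layers)
        ])

    def forward(self, x):
        outputs = [x]
        for conv in self.convs:
            # Dense aggregation over only the most recent outputs.
            dense_in = torch.cat(outputs[-self.max_dense_inputs:], dim=1)
            outputs.append(torch.relu(conv(dense_in)))
        # Residual aggregation: add the block input back to the final features.
        return x + outputs[-1]

block = ResidualDenseBlock(channels=64)
features = block(torch.randn(8, 64, 200))   # (batch, channels, frames)
```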
NPLDA: A Deep Neural PLDA Model for Speaker Verification
Title | NPLDA: A Deep Neural PLDA Model for Speaker Verification |
Authors | Shreyas Ramoji, Prashant Krishnan, Sriram Ganapathy |
Abstract | The state-of-the-art approach for speaker verification consists of a neural network based embedding extractor along with a backend generative model such as Probabilistic Linear Discriminant Analysis (PLDA). In this work, we propose a neural network approach to backend modeling in speaker recognition. The likelihood ratio score of the generative PLDA model is posed as a discriminative similarity function, and the learnable parameters of the score function are optimized using a verification cost. The proposed model, termed neural PLDA (NPLDA), is initialized using the generative PLDA model parameters. The loss function for the NPLDA model is an approximation of the minimum detection cost function (DCF). Speaker recognition experiments with the NPLDA model are performed on the speaker verification task in the VOiCES dataset as well as the SITW challenge dataset. In these experiments, the NPLDA model optimized using the proposed loss function improves significantly over the state-of-the-art PLDA-based speaker verification system. |
Tasks | Speaker Recognition, Speaker Verification |
Published | 2020-02-10 |
URL | https://arxiv.org/abs/2002.03562v1 |
https://arxiv.org/pdf/2002.03562v1.pdf | |
PWC | https://paperswithcode.com/paper/nplda-a-deep-neural-plda-model-for-speaker |
Repo | https://github.com/iiscleap/NeuralPlda |
Framework | pytorch |
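The core of a discriminative PLDA backend is a learnable quadratic similarity between two embeddings, trained with a smooth detection-cost surrogate. The sketch below shows such a scorer and a soft DCF; the parameterization, initialization, and loss weights are illustrative assumptions rather than the NPLDA paper's exact formulation.

```python
import torch
import torch.nn as nn

class PairwiseQuadraticScorer(nn.Module):
    """Sketch of a discriminative PLDA-style backend: the verification score of
    an embedding pair is a learnable quadratic function of the two embeddings.
    Parameter shapes and initializations here are illustrative only.
    """
    def __init__(self, dim):
        super().__init__()
        self.P = nn.Parameter(torch.eye(dim) * 0.01)   # cross term
        self.Q = nn.Parameter(torch.zeros(dim, dim))   # within-embedding term
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x1, x2):
        cross = (x1 @ self.P * x2).sum(dim=1)
        within = (x1 @ self.Q * x1).sum(dim=1) + (x2 @ self.Q * x2).sum(dim=1)
        return cross + within + self.bias

def soft_detection_cost(scores, labels, threshold=0.0, p_target=0.05, alpha=10.0):
    """Differentiable stand-in for the minimum detection cost (DCF)."""
    probs = torch.sigmoid(alpha * (scores - threshold))
    p_miss = ((1 - probs) * labels).sum() / labels.sum().clamp(min=1)
    p_fa = (probs * (1 - labels)).sum() / (1 - labels).sum().clamp(min=1)
    return p_target * p_miss + (1 - p_target) * p_fa
```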
DropClass and DropAdapt: Dropping classes for deep speaker representation learning
Title | DropClass and DropAdapt: Dropping classes for deep speaker representation learning |
Authors | Chau Luu, Peter Bell, Steve Renals |
Abstract | Many recent works on deep speaker embeddings train their feature extraction networks on large classification tasks, distinguishing between all speakers in a training set. Empirically, this has been shown to produce speaker-discriminative embeddings, even for unseen speakers. However, it is not clear that this is the optimal means of training embeddings that generalize well. This work proposes two approaches to learning embeddings, based on the notion of dropping classes during training. We demonstrate that both approaches can yield performance gains in speaker verification tasks. The first proposed method, DropClass, works by periodically dropping a random subset of classes from the training data and the output layer throughout training, resulting in a feature extractor trained on many different classification tasks. Combined with an additive angular margin loss, this method can yield a 7.9% relative improvement in equal error rate (EER) over a strong baseline on VoxCeleb. The second proposed method, DropAdapt, is a means of adapting a trained model to a set of enrolment speakers in an unsupervised manner. This is performed by fine-tuning a model on only those classes which produce high-probability predictions when the enrolment speakers are used as input, again dropping the relevant rows from the output layer. This method yields a large 13.2% relative improvement in EER on VoxCeleb. The code for this paper has been made publicly available. |
Tasks | Representation Learning, Speaker Verification |
Published | 2020-02-02 |
URL | https://arxiv.org/abs/2002.00453v1 |
https://arxiv.org/pdf/2002.00453v1.pdf | |
PWC | https://paperswithcode.com/paper/dropclass-and-dropadapt-dropping-classes-for |
Repo | https://github.com/cvqluu/dropclass_speaker |
Framework | pytorch |
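The DropClass idea itself is simple to emulate: sample a subset of classes, keep only their rows of the output layer and only the samples belonging to them, and train on the reduced task. The sketch below shows one such step with plain cross-entropy; the per-batch sampling, the interfaces, and the omission of the additive angular margin loss are simplifications for illustration, not the authors' recipe.

```python
import random
import torch
import torch.nn.functional as F

def dropclass_step(extractor, classifier_weights, batch_x, batch_y,
                   all_classes, n_keep, optimizer):
    """Single training step in the spirit of DropClass: sample a subset of
    speaker classes, keep only their rows of the output layer and only the
    samples belonging to them, then train on the reduced classification task.
    The real recipe drops classes per period rather than per batch.
    """
    kept = sorted(random.sample(all_classes, n_keep))
    remap = {c: i for i, c in enumerate(kept)}
    mask = torch.tensor([y.item() in remap for y in batch_y])
    if not mask.any():
        return None
    x = batch_x[mask]
    y = torch.tensor([remap[y.item()] for y in batch_y[mask]])

    optimizer.zero_grad()
    logits = extractor(x) @ classifier_weights[kept].t()   # reduced output layer
    loss = F.cross_entropy(logits, y)
    loss.backward()
    optimizer.step()
    return loss.item()
```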
Drone Based RGBT Vehicle Detection and Counting: A Challenge
Title | Drone Based RGBT Vehicle Detection and Counting: A Challenge |
Authors | Pengfei Zhu, Yiming Sun, Longyin Wen, Yu Feng, Qinghua Hu |
Abstract | Camera-equipped drones can capture targets on the ground from a wider field of view than static cameras or moving sensors over the ground. In this paper we present a large-scale vehicle detection and counting benchmark, named DroneVehicle, aiming at advancing visual analysis tasks on the drone platform. The images in the benchmark were captured over various urban areas, which include different types of urban roads, residential areas, parking lots, highways, etc., from day to night. Specifically, DroneVehicle consists of 15,532 pairs of images, i.e., RGB images and infrared images, with rich annotations including oriented object bounding boxes and object categories. With an intensive annotation effort, our benchmark has 441,642 annotated instances in 31,064 images. As a large-scale dataset with both RGB and thermal infrared (RGBT) images, the benchmark enables extensive evaluation and investigation of visual analysis algorithms on the drone platform. In particular, we design two popular tasks on the benchmark: object detection and object counting. All these tasks are extremely challenging in the proposed dataset due to factors such as illumination, occlusion, and scale variations. We hope the benchmark will largely boost research and development in visual analysis on drone platforms. The DroneVehicle dataset can be downloaded from https://github.com/VisDrone/DroneVehicle. |
Tasks | Object Counting, Object Detection |
Published | 2020-03-05 |
URL | https://arxiv.org/abs/2003.02437v1 |
https://arxiv.org/pdf/2003.02437v1.pdf | |
PWC | https://paperswithcode.com/paper/drone-based-rgbt-vehicle-detection-and |
Repo | https://github.com/VisDrone/DroneVehicle |
Framework | none |
Regularizers for Single-step Adversarial Training
Title | Regularizers for Single-step Adversarial Training |
Authors | B. S. Vivek, R. Venkatesh Babu |
Abstract | The progress in the last decade has enabled machine learning models to achieve impressive performance across a wide range of tasks in computer vision. However, a plethora of works have demonstrated the susceptibility of these models to adversarial samples. Adversarial training has been proposed to defend against such adversarial attacks. Adversarial training methods augment mini-batches with adversarial samples, and typically single-step (non-iterative) methods are used to generate these samples. However, models trained using single-step adversarial training converge to degenerate minima where the model merely appears to be robust. The pseudo robustness of these models is due to the gradient masking effect. Although multi-step adversarial training helps to learn robust models, it is hard to scale due to the use of iterative methods for generating adversarial samples. To address these issues, we propose three different types of regularizers that help to learn robust models using single-step adversarial training methods. The proposed regularizers mitigate the effect of gradient masking by harnessing properties that differentiate a robust model from a pseudo-robust model. The performance of models trained using the proposed regularizers is on par with that of models trained using computationally expensive multi-step adversarial training methods. |
Tasks | |
Published | 2020-02-03 |
URL | https://arxiv.org/abs/2002.00614v1 |
https://arxiv.org/pdf/2002.00614v1.pdf | |
PWC | https://paperswithcode.com/paper/regularizers-for-single-step-adversarial |
Repo | https://github.com/val-iisc/SAT-Rx |
Framework | pytorch |
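For context, the single-step adversarial training the abstract refers to augments each mini-batch with FGSM samples generated from the current model. The sketch below shows that baseline step only; the paper's three regularizers are not reproduced here, and the epsilon and loss weighting are illustrative choices.

```python
import torch
import torch.nn.functional as F

def fgsm_adversarial_step(model, x, y, optimizer, epsilon=8 / 255):
    """One mini-batch of single-step (FGSM) adversarial training, the setting
    the abstract targets. The paper's specific regularizers are not included;
    this sketch only shows the standard augmentation they are added to.
    """
    # Generate single-step adversarial samples with the sign of the input gradient.
    x_adv = x.clone().detach().requires_grad_(True)
    loss_clean = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss_clean, x_adv)[0]
    x_adv = (x_adv + epsilon * grad.sign()).clamp(0, 1).detach()

    # Train on a mini-batch augmented with the adversarial samples.
    optimizer.zero_grad()
    loss = 0.5 * F.cross_entropy(model(x), y) + 0.5 * F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```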
Variational Depth Search in ResNets
Title | Variational Depth Search in ResNets |
Authors | Javier Antorán, James Urquhart Allingham, José Miguel Hernández-Lobato |
Abstract | One-shot neural architecture search allows joint learning of weights and network architecture, reducing computational cost. We limit our search space to the depth of residual networks and formulate an analytically tractable variational objective that allows for obtaining an unbiased approximate posterior over depths in one shot. We propose a heuristic to prune our networks based on this distribution. We compare our proposed method against manual search over network depths on the MNIST, Fashion-MNIST, and SVHN datasets. We find that pruned networks do not incur a loss in predictive performance, obtaining accuracies competitive with unpruned networks. Marginalising over depth allows us to obtain better-calibrated test-time uncertainty estimates than regular networks, in a single forward pass. |
Tasks | Neural Architecture Search |
Published | 2020-02-06 |
URL | https://arxiv.org/abs/2002.02797v3 |
https://arxiv.org/pdf/2002.02797v3.pdf | |
PWC | https://paperswithcode.com/paper/variational-depth-search-in-resnets |
Repo | https://github.com/anonimoose12345678/arch_uncert |
Framework | pytorch |
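The essence of the method is that one forward pass yields a prediction at every depth, and a learnable categorical distribution over depths is fit with a variational objective. The toy sketch below illustrates that structure with linear residual blocks and a uniform prior over depths; the architecture, prior, and ELBO details are simplified assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthMarginalizedNet(nn.Module):
    """Toy sketch of a one-shot posterior over depths: one forward pass yields a
    prediction after every residual block, a learnable categorical distribution
    over depths weights their likelihoods, and the objective is an ELBO with a
    uniform prior over depths. Architecture sizes are illustrative only.
    """
    def __init__(self, dim, n_classes, max_depth):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(max_depth)])
        self.head = nn.Linear(dim, n_classes)
        self.depth_logits = nn.Parameter(torch.zeros(max_depth))  # defines q(d)

    def forward(self, x):
        logits_per_depth = []
        h = x
        for block in self.blocks:
            h = h + block(h)                  # residual update
            logits_per_depth.append(self.head(h))
        return torch.stack(logits_per_depth)  # (max_depth, batch, n_classes)

    def elbo(self, x, y):
        logits = self.forward(x)
        q = F.softmax(self.depth_logits, dim=0)
        log_lik = torch.stack([-F.cross_entropy(l, y) for l in logits])
        kl = (q * torch.log(q * len(q))).sum()   # KL(q || uniform prior)
        return (q * log_lik).sum() - kl

net = DepthMarginalizedNet(dim=784, n_classes=10, max_depth=12)
```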
Searching Central Difference Convolutional Networks for Face Anti-Spoofing
Title | Searching Central Difference Convolutional Networks for Face Anti-Spoofing |
Authors | Zitong Yu, Chenxu Zhao, Zezheng Wang, Yunxiao Qin, Zhuo Su, Xiaobai Li, Feng Zhou, Guoying Zhao |
Abstract | Face anti-spoofing (FAS) plays a vital role in face recognition systems. Most state-of-the-art FAS methods 1) rely on stacked convolutions and expert-designed networks, which are weak at describing detailed fine-grained information and easily become ineffective when the environment varies (e.g., different illumination), and 2) prefer to use long sequences as input to extract dynamic features, making them difficult to deploy in scenarios that require a quick response. Here we propose a novel frame-level FAS method based on Central Difference Convolution (CDC), which is able to capture intrinsic detailed patterns by aggregating both intensity and gradient information. A network built with CDC, called the Central Difference Convolutional Network (CDCN), provides more robust modeling capacity than its counterpart built with vanilla convolution. Furthermore, over a specifically designed CDC search space, Neural Architecture Search (NAS) is utilized to discover a more powerful network structure (CDCN++), which can be assembled with a Multiscale Attention Fusion Module (MAFM) for further boosting performance. Comprehensive experiments are performed on six benchmark datasets to show that 1) the proposed method achieves superior performance on intra-dataset testing (especially 0.2% ACER in Protocol-1 of the OULU-NPU dataset), and 2) it also generalizes well on cross-dataset testing (particularly 6.5% HTER from CASIA-MFSD to Replay-Attack). The code is available at https://github.com/ZitongYu/CDCN. |
Tasks | Face Anti-Spoofing, Face Recognition, Neural Architecture Search |
Published | 2020-03-09 |
URL | https://arxiv.org/abs/2003.04092v1 |
https://arxiv.org/pdf/2003.04092v1.pdf | |
PWC | https://paperswithcode.com/paper/searching-central-difference-convolutional |
Repo | https://github.com/ZitongYu/CDCN |
Framework | pytorch |
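Central Difference Convolution has a particularly compact implementation: because the central-difference term sums the kernel over its spatial support, it reduces to a 1x1 convolution with the spatially summed weights, subtracted from the vanilla convolution output with weight theta. The sketch below follows that decomposition; theta=0.7 is an illustrative choice, not necessarily the paper's setting.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CentralDifferenceConv2d(nn.Module):
    """Central Difference Convolution as described in the abstract: the output
    blends vanilla convolution (intensity information) with a central-difference
    term (gradient information), weighted by theta. theta = 0 recovers a plain
    convolution.
    """
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1, theta=0.7):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding, bias=False)
        self.theta = theta

    def forward(self, x):
        out_vanilla = self.conv(x)
        # The central-difference term reduces to a 1x1 convolution with the
        # spatially summed kernel, applied to the center pixel.
        kernel_sum = self.conv.weight.sum(dim=(2, 3), keepdim=True)
        out_center = F.conv2d(x, kernel_sum)
        return out_vanilla - self.theta * out_center

x = torch.randn(1, 3, 256, 256)
features = CentralDifferenceConv2d(3, 64)(x)   # (1, 64, 256, 256)
```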
A Spatio-Temporal Spot-Forecasting Framework for Urban Traffic Prediction
Title | A Spatio-Temporal Spot-Forecasting Framework for Urban Traffic Prediction |
Authors | Rodrigo de Medrano, José L. Aznarte |
Abstract | Spatio-temporal forecasting is an open research field attracting rapidly growing interest. In this work we develop a deep neural framework for spatio-temporal traffic forecasting that performs comparatively well, adapts to a variety of spatio-temporal conditions, and remains easy to understand and interpret. Our proposal is based on an interpretable attention-based neural network in which several modules are combined in order to capture key spatio-temporal time series components. Through extensive experimentation, we show that our approach is stable and outperforms other state-of-the-art alternatives. |
Tasks | Spatio-Temporal Forecasting, Time Series, Traffic Prediction |
Published | 2020-03-31 |
URL | https://arxiv.org/abs/2003.13977v2 |
https://arxiv.org/pdf/2003.13977v2.pdf | |
PWC | https://paperswithcode.com/paper/a-spatio-temporal-spot-forecasting-framework |
Repo | https://github.com/rdemedrano/crann_traffic |
Framework | none |
Représentations lexicales pour la détection non supervisée d'événements dans un flux de tweets : étude sur des corpus français et anglais (Lexical representations for unsupervised event detection in a tweet stream: a study on French and English corpora)
Title | Représentations lexicales pour la détection non supervisée d'événements dans un flux de tweets : étude sur des corpus français et anglais |
Authors | Béatrice Mazoyer, Nicolas Hervé, Céline Hudelot, Julia Cage |
Abstract | In this work, we evaluate the performance of recent text embeddings for the automatic detection of events in a stream of tweets. We model this task as a dynamic clustering problem. Our experiments are conducted on a publicly available corpus of tweets in English and on a similar dataset in French annotated by our team. We show that recent techniques based on deep neural networks (ELMo, Universal Sentence Encoder, BERT, SBERT), although promising for many applications, are not very suitable for this task. We also experiment with different types of fine-tuning to improve these results on the French data. Finally, we present a detailed analysis of the results, showing the superiority of tf-idf approaches for this task. |
Tasks | |
Published | 2020-01-13 |
URL | https://arxiv.org/abs/2001.04139v1 |
https://arxiv.org/pdf/2001.04139v1.pdf | |
PWC | https://paperswithcode.com/paper/representations-lexicales-pour-la-detection |
Repo | https://github.com/ina-foss/twembeddings |
Framework | tf |
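The dynamic-clustering formulation with tf-idf features can be illustrated with a simple first-story-detection style loop: each tweet joins the closest existing cluster if its similarity to that cluster's centroid passes a threshold, otherwise it opens a new cluster (a new candidate event). The sketch below is such an illustration; the threshold, the vectorizer settings, and fitting tf-idf on the whole collection at once (rather than online) are simplifying assumptions, not the paper's tuned setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def cluster_tweet_stream(tweets, threshold=0.5):
    """Sketch of the dynamic-clustering setup with tf-idf vectors: each incoming
    tweet joins the most similar existing cluster if its cosine similarity to
    that cluster's centroid exceeds a threshold, otherwise it starts a new
    cluster (a new candidate event).
    """
    vectors = TfidfVectorizer(min_df=1).fit_transform(tweets)
    centroids, assignments = [], []
    for i in range(vectors.shape[0]):
        v = vectors[i]
        if centroids:
            sims = [cosine_similarity(v, c)[0, 0] for c in centroids]
            best = max(range(len(sims)), key=sims.__getitem__)
            if sims[best] >= threshold:
                assignments.append(best)
                centroids[best] = centroids[best] + v   # running (unnormalized) centroid
                continue
        centroids.append(v)
        assignments.append(len(centroids) - 1)
    return assignments
```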