Paper Group ANR 516
microbatchGAN: Stimulating Diversity with Multi-Adversarial Discrimination
Title | microbatchGAN: Stimulating Diversity with Multi-Adversarial Discrimination |
Authors | Gonçalo Mordido, Haojin Yang, Christoph Meinel |
Abstract | We propose to tackle the mode collapse problem in generative adversarial networks (GANs) by using multiple discriminators and assigning a different portion of each minibatch, called microbatch, to each discriminator. We gradually change each discriminator’s task from distinguishing between real and fake samples to discriminating samples coming from inside or outside its assigned microbatch by using a diversity parameter $\alpha$. The generator is then forced to promote variety in each minibatch to make the microbatch discrimination harder to achieve by each discriminator. Thus, all models in our framework benefit from having variety in the generated set to reduce their respective losses. We show evidence that our solution promotes sample diversity from the early stages of training on multiple datasets. |
Tasks | |
Published | 2020-01-10 |
URL | https://arxiv.org/abs/2001.03376v1 |
https://arxiv.org/pdf/2001.03376v1.pdf | |
PWC | https://paperswithcode.com/paper/microbatchgan-stimulating-diversity-with |
Repo | |
Framework | |
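The abstract describes a concrete mechanism: split each minibatch of generated samples into microbatches, one per discriminator, and blend (via the diversity parameter alpha) each discriminator's real-vs-fake task with an inside-vs-outside-its-microbatch task. The PyTorch sketch below only illustrates that split and blending; the toy networks, loss weighting, and the generator objective are my own assumptions, not the paper's exact formulation.

```python
# Toy sketch: n_disc discriminators, each assigned one microbatch of the fakes.
# With weight (1 - alpha) each D does ordinary real-vs-fake discrimination;
# with weight alpha it must separate its own microbatch from the other fakes,
# which the generator can only make hard by producing a varied minibatch.
import torch
import torch.nn as nn

n_disc, micro, z_dim = 4, 16, 64
G = nn.Sequential(nn.Linear(z_dim, 32), nn.ReLU(), nn.Linear(32, 2))   # toy generator
Ds = [nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1)) for _ in range(n_disc)]
bce = nn.BCEWithLogitsLoss()
alpha = 0.5                                        # diversity parameter

real = torch.randn(n_disc * micro, 2)              # stand-in real batch
fake = G(torch.randn(n_disc * micro, z_dim))

d_losses = []
for k, D in enumerate(Ds):
    own = fake[k * micro:(k + 1) * micro].detach()                 # D_k's microbatch
    others = torch.cat([fake[:k * micro], fake[(k + 1) * micro:]]).detach()
    real_fake = bce(D(real), torch.ones(len(real), 1)) + \
                bce(D(fake.detach()), torch.zeros(len(fake), 1))
    microbatch = bce(D(own), torch.ones(len(own), 1)) + \
                 bce(D(others), torch.zeros(len(others), 1))
    d_losses.append((1 - alpha) * real_fake + alpha * microbatch)
# Each D_k is stepped on d_losses[k]; the generator is then updated
# adversarially against all n_disc discriminators.
```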
EMOPAIN Challenge 2020: Multimodal Pain Evaluation from Facial and Bodily Expressions
Title | EMOPAIN Challenge 2020: Multimodal Pain Evaluation from Facial and Bodily Expressions |
Authors | Joy O. Egede, Siyang Song, Temitayo A. Olugbade, Chongyang Wang, Amanda Williams, Hongying Meng, Min Aung, Nicholas D. Lane, Michel Valstar, Nadia Bianchi-Berthouze |
Abstract | The EmoPain 2020 Challenge is the first international competition aimed at creating a uniform platform for the comparison of machine learning and multimedia processing methods of automatic chronic pain assessment from human expressive behaviour, and also the identification of pain-related behaviours. The objective of the challenge is to promote research in the development of assistive technologies that help improve the quality of life for people with chronic pain via real-time monitoring and feedback to help manage their condition and remain physically active. The challenge also aims to encourage the use of the relatively underutilised, albeit vital, bodily expression signals for automatic pain and pain-related emotion recognition. This paper presents a description of the challenge, competition guidelines, benchmarking dataset, and the baseline systems’ architecture and performance on the three sub-tasks: pain estimation from facial expressions, pain recognition from multimodal movement, and protective movement behaviour detection. |
Tasks | Emotion Recognition |
Published | 2020-01-21 |
URL | https://arxiv.org/abs/2001.07739v3 |
https://arxiv.org/pdf/2001.07739v3.pdf | |
PWC | https://paperswithcode.com/paper/emopain-challenge-2020-multimodal-pain |
Repo | |
Framework | |
Noisy Machines: Understanding Noisy Neural Networks and Enhancing Robustness to Analog Hardware Errors Using Distillation
Title | Noisy Machines: Understanding Noisy Neural Networks and Enhancing Robustness to Analog Hardware Errors Using Distillation |
Authors | Chuteng Zhou, Prad Kadambi, Matthew Mattina, Paul N. Whatmough |
Abstract | The success of deep learning has brought forth a wave of interest in computer hardware design to better meet the high demands of neural network inference. In particular, analog computing hardware, based on electronic, optical, or photonic devices, has been heavily motivated specifically for accelerating neural networks and may well achieve lower power consumption than conventional digital electronics. However, these proposed analog accelerators suffer from the intrinsic noise generated by their physical components, which makes it challenging to achieve high accuracy on deep neural networks. Hence, for successful deployment on analog accelerators, it is essential to be able to train deep neural networks to be robust to random continuous noise in the network weights, which is a somewhat new challenge in machine learning. In this paper, we advance the understanding of noisy neural networks. We outline how a noisy neural network has reduced learning capacity as a result of loss of mutual information between its input and output. To combat this, we propose using knowledge distillation combined with noise injection during training to achieve more noise-robust networks, which is demonstrated experimentally across different networks and datasets, including ImageNet. Our method achieves models with as much as two times greater noise tolerance compared with the previous best attempts, which is a significant step towards making analog hardware practical for deep learning. |
Tasks | |
Published | 2020-01-14 |
URL | https://arxiv.org/abs/2001.04974v1 |
https://arxiv.org/pdf/2001.04974v1.pdf | |
PWC | https://paperswithcode.com/paper/noisy-machines-understanding-noisy-neural-1 |
Repo | |
Framework | |
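The training recipe outlined in the abstract (weight-noise injection during training plus distillation from a clean teacher) can be sketched compactly in PyTorch. The noise model, temperature, and loss weights below are illustrative assumptions rather than the paper's settings.

```python
# Sketch: a student whose linear layers add Gaussian weight noise on every
# forward pass (mimicking analog hardware), distilled from a noise-free
# teacher with the usual temperature-scaled KL loss plus cross-entropy.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Linear):
    """Linear layer with multiplicative-scale Gaussian weight noise at each forward."""
    def __init__(self, in_f, out_f, rel_sigma=0.05):
        super().__init__(in_f, out_f)
        self.rel_sigma = rel_sigma
    def forward(self, x):
        scale = self.rel_sigma * self.weight.abs().mean().detach()
        return F.linear(x, self.weight + scale * torch.randn_like(self.weight), self.bias)

teacher = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10)).eval()  # stand-in for a pretrained clean model
student = nn.Sequential(NoisyLinear(784, 256), nn.ReLU(), NoisyLinear(256, 10))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
T, lam = 4.0, 0.9                                    # distillation temperature and mixing weight

x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))   # stand-in batch
with torch.no_grad():
    t_logits = teacher(x)
s_logits = student(x)                                 # forward pass is noisy
kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
              F.softmax(t_logits / T, dim=1), reduction="batchmean") * T * T
loss = lam * kd + (1 - lam) * F.cross_entropy(s_logits, y)
loss.backward(); opt.step()
```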
Neural Networks are Convex Regularizers: Exact Polynomial-time Convex Optimization Formulations for Two-Layer Networks
Title | Neural Networks are Convex Regularizers: Exact Polynomial-time Convex Optimization Formulations for Two-Layer Networks |
Authors | Mert Pilanci, Tolga Ergen |
Abstract | We develop exact representations of two-layer neural networks with rectified linear units in terms of a single convex program whose number of variables is polynomial in the number of training samples and the number of hidden neurons. Our theory utilizes semi-infinite duality and minimum norm regularization. Moreover, we show that certain standard multi-layer convolutional neural networks are equivalent to L1-regularized linear models in a polynomial-sized discrete Fourier feature space. We also introduce exact semi-definite programming representations of convolutional and fully connected linear multi-layer networks which are of polynomial size in both the sample size and the dimension. |
Tasks | |
Published | 2020-02-24 |
URL | https://arxiv.org/abs/2002.10553v1 |
https://arxiv.org/pdf/2002.10553v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-networks-are-convex-regularizers-exact |
Repo | |
Framework | |
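For reference, here is a hedged sketch of the kind of convex program the abstract refers to, for a two-layer ReLU network with weight decay beta trained on a data matrix X in R^{n x d} with targets y. This is my reading of the paper's construction; consult the paper for the precise statement.

```latex
% Convex reformulation of two-layer ReLU training (sketch).
% D_1,...,D_P are the diagonal 0/1 matrices encoding the ReLU activation
% patterns diag(1[Xu >= 0]); P is polynomial in n when the rank of X is fixed.
\begin{align*}
\min_{\{v_i,\,w_i\}_{i=1}^{P}} \quad
  & \tfrac{1}{2}\Big\| \textstyle\sum_{i=1}^{P} D_i X (v_i - w_i) - y \Big\|_2^2
    \;+\; \beta \textstyle\sum_{i=1}^{P}\big( \|v_i\|_2 + \|w_i\|_2 \big) \\
\text{s.t.} \quad
  & (2D_i - I_n)\, X v_i \ge 0, \qquad (2D_i - I_n)\, X w_i \ge 0, \qquad i = 1,\dots,P .
\end{align*}
```

In the paper's construction, the group-sparsity penalty corresponds to weight decay in the original nonconvex problem, and an optimal two-layer network can be read off from the nonzero blocks (v_i, w_i).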
CQ-VQA: Visual Question Answering on Categorized Questions
Title | CQ-VQA: Visual Question Answering on Categorized Questions |
Authors | Aakansha Mishra, Ashish Anand, Prithwijit Guha |
Abstract | This paper proposes CQ-VQA, a novel 2-level hierarchical but end-to-end model to solve the task of visual question answering (VQA). The first level of CQ-VQA, referred to as question categorizer (QC), classifies questions to reduce the potential answer search space. The QC uses attended and fused features of the input question and image. The second level, referred to as answer predictor (AP), comprises a set of distinct classifiers corresponding to each question category. Depending on the question category predicted by QC, only one of the classifiers of AP remains active. The loss functions of QC and AP are aggregated together to make it an end-to-end model. The proposed model (CQ-VQA) is evaluated on the TDIUC dataset and is benchmarked against state-of-the-art approaches. Results indicate competitive or better performance of CQ-VQA. |
Tasks | Question Answering, Visual Question Answering |
Published | 2020-02-17 |
URL | https://arxiv.org/abs/2002.06800v1 |
https://arxiv.org/pdf/2002.06800v1.pdf | |
PWC | https://paperswithcode.com/paper/cq-vqa-visual-question-answering-on |
Repo | |
Framework | |
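A rough sketch of the two-level design described above: a question categorizer (QC) over fused question/image features, one answer head per category, and the two losses summed for end-to-end training. Feature sizes, the fusion step, and the attention mechanism are placeholders, not the paper's architecture.

```python
# CQ-VQA-style sketch: QC picks a question category; only that category's
# answer head (AP) is active; QC and AP losses are aggregated.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CQVQASketch(nn.Module):
    def __init__(self, feat_dim=512, n_cats=12, answers_per_cat=40):
        super().__init__()
        self.qc = nn.Linear(feat_dim, n_cats)                         # question categorizer
        self.ap = nn.ModuleList([nn.Linear(feat_dim, answers_per_cat)
                                 for _ in range(n_cats)])             # per-category answer heads

    def forward(self, fused, cat_label, ans_label):
        qc_loss = F.cross_entropy(self.qc(fused), cat_label)
        # at training time the ground-truth category selects the active head
        # (at test time the QC's prediction would be used instead)
        ap_logits = torch.stack([self.ap[int(c)](f) for f, c in zip(fused, cat_label)])
        ap_loss = F.cross_entropy(ap_logits, ans_label)
        return qc_loss + ap_loss                                      # aggregated objective

model = CQVQASketch()
fused = torch.randn(8, 512)                    # stand-in for attended, fused question+image features
loss = model(fused, torch.randint(0, 12, (8,)), torch.randint(0, 40, (8,)))
loss.backward()
```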
Sparse and Structured Visual Attention
Title | Sparse and Structured Visual Attention |
Authors | Pedro Henrique Martins, Vlad Niculae, Zita Marinho, André Martins |
Abstract | Visual attention mechanisms are widely used in multimodal tasks, such as image captioning and visual question answering (VQA). One drawback of softmax-based attention mechanisms is that they assign probability mass to all image regions, regardless of their adjacency structure and of their relevance to the text. In this paper, to better link the image structure with the text, we replace the traditional softmax attention mechanism with two alternative sparsity-promoting transformations: sparsemax, which is able to select the relevant regions only (assigning zero weight to the rest), and a newly proposed Total-Variation Sparse Attention (TVmax), which further encourages the joint selection of adjacent spatial locations. Experiments in image captioning and VQA, using both LSTM and Transformer architectures, show gains in terms of human-rated caption quality, attention relevance, and VQA accuracy, with improved interpretability. |
Tasks | Image Captioning, Question Answering, Visual Question Answering |
Published | 2020-02-13 |
URL | https://arxiv.org/abs/2002.05556v1 |
https://arxiv.org/pdf/2002.05556v1.pdf | |
PWC | https://paperswithcode.com/paper/sparse-and-structured-visual-attention-1 |
Repo | |
Framework | |
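Of the two transformations mentioned in the abstract, sparsemax has a simple closed-form solution (sort, threshold, clip); TVmax adds a total-variation term coupling adjacent spatial locations and needs a more involved solver, so only plain sparsemax is sketched below.

```python
# Compact sparsemax (Martins & Astudillo, 2016) over the last dimension:
# project the score vector onto the probability simplex, yielding attention
# weights that are exactly zero for irrelevant regions.
import torch

def sparsemax(z: torch.Tensor, dim: int = -1) -> torch.Tensor:
    z_sorted, _ = torch.sort(z, dim=dim, descending=True)
    cum = z_sorted.cumsum(dim) - 1
    k = torch.arange(1, z.size(dim) + 1, device=z.device, dtype=z.dtype)
    k = k.view([-1 if d == (dim % z.dim()) else 1 for d in range(z.dim())])
    support = (k * z_sorted) > cum                    # coordinates that stay nonzero
    k_z = support.sum(dim=dim, keepdim=True)
    tau = cum.gather(dim, k_z - 1) / k_z              # threshold
    return torch.clamp(z - tau, min=0)

attn = sparsemax(torch.tensor([[1.5, 0.3, 0.2, -1.0]]))   # -> tensor([[1., 0., 0., 0.]])
```

Dropping this in place of `softmax` over the attention scores is enough to obtain sparse attention maps; the weights still sum to one over the selected regions.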
Low-volatility Anomaly and the Adaptive Multi-Factor Model
Title | Low-volatility Anomaly and the Adaptive Multi-Factor Model |
Authors | Robert A. Jarrow, Rinald Murataj, Martin T. Wells, Liao Zhu |
Abstract | The paper explains the low-volatility anomaly from a new perspective. We use the Adaptive Multi-Factor (AMF) model estimated by the Groupwise Interpretable Basis Selection (GIBS) algorithm to find the basis assets significantly related to each of the portfolios. The AMF results show that the two portfolios load on very different factors, which indicates that volatility is not an independent measure of risk, but is related to the basis assets and risk factors in the related industries. It is the performance of the loaded factors that results in the low-volatility anomaly. The out-performance of the low-volatility portfolio may not be because of its low risk (which would contradict the risk-premium theory), but because of the out-performance of the risk factors the low-volatility portfolio is loaded on. Also, we compare the AMF model with the traditional Fama-French 5-factor (FF5) model in various aspects, which shows the superior performance of the AMF model over FF5 in many respects. |
Tasks | |
Published | 2020-03-16 |
URL | https://arxiv.org/abs/2003.08302v1 |
https://arxiv.org/pdf/2003.08302v1.pdf | |
PWC | https://paperswithcode.com/paper/low-volatility-anomaly-and-the-adaptive-multi |
Repo | |
Framework | |
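As a loose illustration of the select-then-refit idea behind basis selection (not the actual GIBS algorithm, which adds a groupwise clustering/prototype step): pick a sparse set of basis-asset return series with the lasso, then refit the loadings by OLS on the selected assets. All data and names below are synthetic stand-ins.

```python
# Select-then-refit sketch: lasso chooses which basis assets a portfolio loads
# on, then ordinary least squares re-estimates the loadings on that subset.
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.default_rng(0)
T, n_assets = 500, 50
basis = rng.standard_normal((T, n_assets))            # stand-in basis-asset returns
portfolio = basis[:, [3, 17]] @ np.array([0.6, 0.4]) + 0.01 * rng.standard_normal(T)

sel = LassoCV(cv=5).fit(basis, portfolio)
picked = np.flatnonzero(sel.coef_ != 0)               # basis assets the portfolio loads on
ols = LinearRegression().fit(basis[:, picked], portfolio)
print(picked, ols.coef_)                              # selected factors and their loadings
```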
Fast Convergence for Langevin Diffusion with Matrix Manifold Structure
Title | Fast Convergence for Langevin Diffusion with Matrix Manifold Structure |
Authors | Ankur Moitra, Andrej Risteski |
Abstract | In this paper, we study the problem of sampling from distributions of the form p(x) \propto e^{-\beta f(x)} for some function f whose values and gradients we can query. This mode of access to f is natural in the scenarios in which such problems arise, for instance sampling from posteriors in parametric Bayesian models. Classical results show that a natural random walk, Langevin diffusion, mixes rapidly when f is convex. Unfortunately, even in simple examples, the applications listed above will entail working with functions f that are nonconvex – for which sampling from p may in general require an exponential number of queries. In this paper, we study one aspect of nonconvexity relevant for modern machine learning applications: existence of invariances (symmetries) in the function f, as a result of which the distribution p will have manifolds of points with equal probability. We give a recipe for proving mixing time bounds of Langevin dynamics in order to sample from manifolds of local optima of the function f in settings where the distribution is well-concentrated around them. We specialize our arguments to classic matrix factorization-like Bayesian inference problems where we get noisy measurements A(XX^T), X \in R^{d \times k} of a low-rank matrix, i.e. f(X) = \|A(XX^T) - b\|_2^2, X \in R^{d \times k}, and \beta is the inverse of the variance of the noise. Such functions f are invariant under orthogonal transformations, and include problems like matrix factorization, sensing, and completion. Beyond sampling, Langevin dynamics is a popular toy model for studying stochastic gradient descent. Along these lines, we believe that our work is an important first step towards understanding how SGD behaves when there is a high degree of symmetry in the space of parameters that produce the same output. |
Tasks | Bayesian Inference |
Published | 2020-02-13 |
URL | https://arxiv.org/abs/2002.05576v1 |
https://arxiv.org/pdf/2002.05576v1.pdf | |
PWC | https://paperswithcode.com/paper/fast-convergence-for-langevin-diffusion-with |
Repo | |
Framework | |
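The sampler being analyzed is a discretization of Langevin diffusion. A minimal unadjusted-Langevin sketch for the low-rank objective f(X) = ||XX^T - M||_F^2, a special case of the A(XX^T) measurements in the abstract, looks like this; the step size and beta are illustrative.

```python
# Unadjusted Langevin dynamics targeting p(X) proportional to exp(-beta * f(X))
# with f(X) = ||X X^T - M||_F^2. Gradients come from autograd; each step is a
# gradient step on beta*f plus isotropic Gaussian noise of variance 2*eta.
import torch

d, k, beta, eta, steps = 10, 2, 100.0, 1e-4, 3000
X_true = torch.randn(d, k)
M = X_true @ X_true.T                                  # noiseless low-rank observation

X = torch.randn(d, k, requires_grad=True)
for _ in range(steps):
    f = ((X @ X.T - M) ** 2).sum()
    (grad,) = torch.autograd.grad(f, X)
    with torch.no_grad():
        X += -eta * beta * grad + (2 * eta) ** 0.5 * torch.randn_like(X)
print(((X @ X.T - M) ** 2).sum().item())               # samples concentrate near the manifold X X^T = M
```

Note that any orthogonal rotation X -> XQ leaves f unchanged, which is exactly the manifold of equivalent optima the paper's mixing-time analysis deals with.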
Semi-supervised Grasp Detection by Representation Learning in a Vector Quantized Latent Space
Title | Semi-supervised Grasp Detection by Representation Learning in a Vector Quantized Latent Space |
Authors | Mridul Mahajan, Tryambak Bhattacharjee, Arya Krishnan, Priya Shukla, G C Nandi |
Abstract | For a robot to perform complex manipulation tasks, it is necessary for it to have a good grasping ability. However, vision-based robotic grasp detection is hindered by the unavailability of sufficient labelled data. Furthermore, the application of semi-supervised learning techniques to grasp detection is under-explored. In this paper, a semi-supervised learning-based grasp detection approach is presented, which models a discrete latent space using a Vector Quantized Variational AutoEncoder (VQ-VAE). To the best of our knowledge, this is the first time a Variational AutoEncoder (VAE) has been applied in the domain of robotic grasp detection. The VAE helps the model generalize beyond the Cornell Grasping Dataset (CGD) despite having a limited amount of labelled data, by also utilizing the unlabelled data. This claim has been validated by testing the model on images which are not available in the CGD. Along with this, we augment the Generative Grasping Convolutional Neural Network (GGCNN) architecture with the decoder structure used in the VQ-VAE model, with the intuition that it should help to regress in the vector-quantized latent space. Subsequently, the model performs significantly better than the existing approaches which do not make use of unlabelled images to improve grasp detection. |
Tasks | Representation Learning |
Published | 2020-01-23 |
URL | https://arxiv.org/abs/2001.08477v3 |
https://arxiv.org/pdf/2001.08477v3.pdf | |
PWC | https://paperswithcode.com/paper/semi-supervised-grasp-detection-by |
Repo | |
Framework | |
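The central component named in the abstract is the vector-quantized latent space. Below is a generic VQ-VAE quantization layer (nearest-codebook lookup, codebook and commitment losses, straight-through gradients); the codebook size, dimensionality, and loss weight are placeholders, and the grasp-specific GGCNN-style encoder/decoder are omitted.

```python
# Generic VQ-VAE quantizer: snap encoder outputs to the nearest codebook
# vector, add the codebook/commitment losses, and pass gradients through to
# the encoder with the straight-through estimator.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    def __init__(self, n_codes=128, dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(n_codes, dim)
        self.beta = beta

    def forward(self, z_e):                                   # z_e: (batch, dim) encoder outputs
        dists = torch.cdist(z_e, self.codebook.weight)        # (batch, n_codes)
        idx = dists.argmin(dim=1)
        z_q = self.codebook(idx)
        vq_loss = F.mse_loss(z_q, z_e.detach()) + self.beta * F.mse_loss(z_e, z_q.detach())
        z_q = z_e + (z_q - z_e).detach()                      # straight-through estimator
        return z_q, vq_loss

z_e = torch.randn(8, 64, requires_grad=True)
z_q, vq_loss = VectorQuantizer()(z_e)                         # z_q feeds the decoder / grasp head
```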
A Deep Unsupervised Feature Learning Spiking Neural Network with Binarized Classification Layers for EMNIST Classification
Title | A Deep Unsupervised Feature Learning Spiking Neural Network with Binarized Classification Layers for EMNIST Classification |
Authors | Ruthvik Vaila, John Chiasson, Vishal Saxena |
Abstract | End-user AI is trained on large server farms with data collected from the users. With the ever-increasing demand for IoT devices, there is a need for deep learning approaches that can be implemented (at the edge) in an energy-efficient manner. In this work we approach this using spiking neural networks. The unsupervised learning technique of spike timing dependent plasticity (STDP) with binary activations is used to extract features from spiking input data. Gradient descent (backpropagation) is used only on the output layer to perform the training for classification. The accuracies obtained for the balanced EMNIST data set compare favorably with other approaches. The effect of stochastic gradient descent (SGD) approximations on the learning capabilities of our network is also explored. |
Tasks | |
Published | 2020-02-26 |
URL | https://arxiv.org/abs/2002.11843v1 |
https://arxiv.org/pdf/2002.11843v1.pdf | |
PWC | https://paperswithcode.com/paper/a-deep-unsupervised-feature-learning-spiking |
Repo | |
Framework | |
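As a rough illustration of the unsupervised part, here is one common simplified STDP update used in spiking feature-learning networks: when an output neuron fires, inputs that spiked earlier are potentiated and the rest depressed, with a w(1-w) factor keeping weights in [0, 1]. The constants are illustrative, and this is not necessarily the exact rule or network used in the paper.

```python
# Simplified pair-based STDP update applied when post-synaptic neurons fire.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 100, 10
w = rng.uniform(0.3, 0.7, size=(n_out, n_in))        # weights kept in [0, 1]
a_plus, a_minus = 0.004, 0.003                       # learning rates (illustrative)

pre_fired = rng.random(n_in) < 0.2                   # inputs that spiked before the post spike
post_fired = rng.random(n_out) < 0.1                 # output neurons that just fired

for j in np.flatnonzero(post_fired):
    # potentiate causal inputs, depress the rest; w(1-w) bounds the weights
    dw = np.where(pre_fired, a_plus, -a_minus) * w[j] * (1 - w[j])
    w[j] = np.clip(w[j] + dw, 0.0, 1.0)
```

A classifier (here, the backprop-trained output layer described in the abstract) is then trained on the spike-count features produced by the STDP layers.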
A Tree Adjoining Grammar Representation for Models Of Stochastic Dynamical Systems
Title | A Tree Adjoining Grammar Representation for Models Of Stochastic Dynamical Systems |
Authors | Dhruv Khandelwal, Maarten Schoukens, Roland Tóth |
Abstract | Model structure and complexity selection remains a challenging problem in system identification, especially for parametric non-linear models. Many Evolutionary Algorithm (EA) based methods have been proposed in the literature for estimating model structure and complexity. In most cases, the proposed methods are devised for estimating structure and complexity within a specified model class and hence these methods do not extend to other model structures without significant changes. In this paper, we propose a Tree Adjoining Grammar (TAG) for stochastic parametric models. TAGs can be used to generate models in an EA framework while imposing desirable structural constraints and incorporating prior knowledge. In this paper, we propose a TAG that can systematically generate models ranging from FIRs to polynomial NARMAX models. Furthermore, we demonstrate that TAGs can be easily extended to more general model classes, such as the non-linear Box-Jenkins model class, enabling the realization of flexible and automatic model structure and complexity selection via EA. |
Tasks | |
Published | 2020-01-15 |
URL | https://arxiv.org/abs/2001.05320v1 |
https://arxiv.org/pdf/2001.05320v1.pdf | |
PWC | https://paperswithcode.com/paper/a-tree-adjoining-grammar-representation-for |
Repo | |
Framework | |
SMArtCast: Predicting soil moisture interpolations into the future using Earth observation data in a deep learning framework
Title | SMArtCast: Predicting soil moisture interpolations into the future using Earth observation data in a deep learning framework |
Authors | Conrad James Foley, Sagar Vaze, Mohamed El Amine Seddiq, Alexey Unagaev, Natalia Efremova |
Abstract | Soil moisture is a critical component of crop health, and monitoring it can enable further actions for increasing yield or preventing catastrophic die-off. As climate change increases the likelihood of extreme weather events and reduces the predictability of weather, non-optimal soil moisture for crops may become more likely. In this work, we use a series of LSTM architectures to analyze measurements of soil moisture and vegetation indices derived from satellite imagery. The system learns to predict the future values of these measurements. These spatially sparse values and indices are used as input features to an interpolation method that infers a spatially dense moisture map for a future time point. This has the potential to provide advance warning of soil moisture levels that may be inhospitable to crops across an area with limited monitoring capacity. |
Tasks | |
Published | 2020-03-16 |
URL | https://arxiv.org/abs/2003.10823v1 |
https://arxiv.org/pdf/2003.10823v1.pdf | |
PWC | https://paperswithcode.com/paper/smartcast-predicting-soil-moisture |
Repo | |
Framework | |
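A minimal per-location LSTM forecaster in the spirit of the abstract; the spatial interpolation into a dense moisture map is a separate downstream step not shown here, and the feature count and sequence length are placeholders.

```python
# Sketch: given a history of (soil moisture, vegetation index) observations at
# a point, an LSTM predicts the next-step values.
import torch
import torch.nn as nn

class MoistureLSTM(nn.Module):
    def __init__(self, n_features=2, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_features)

    def forward(self, x):                    # x: (batch, time, features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])         # next-step prediction

model = MoistureLSTM()
history = torch.randn(4, 30, 2)              # 30 past observations of (moisture, NDVI)
next_step = model(history)                   # (4, 2) forecasts, one per location
```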
DAWSON: A Domain Adaptive Few Shot Generation Framework
Title | DAWSON: A Domain Adaptive Few Shot Generation Framework |
Authors | Weixin Liang, Zixuan Liu, Can Liu |
Abstract | Training a Generative Adversarial Network (GAN) for a new domain from scratch requires an enormous amount of training data and days of training time. To this end, we propose DAWSON, a Domain Adaptive Few-Shot Generation Framework for GANs based on meta-learning. A major challenge of applying meta-learning to GANs is obtaining gradients for the generator from evaluating it on development sets, due to the likelihood-free nature of GANs. To address this challenge, we propose an alternative GAN training procedure that naturally combines the two-step training procedure of GANs and the two-step training procedure of meta-learning algorithms. DAWSON is a plug-and-play framework that supports a broad family of meta-learning algorithms and various GANs with architectural variants. Based on DAWSON, we also propose MUSIC MATINEE, which is the first few-shot music generation model. Our experiments show that MUSIC MATINEE can quickly adapt to new domains with only tens of songs from the target domains. We also show that DAWSON can learn to generate new digits with only four samples from the MNIST dataset. We release source code implementations of DAWSON in both PyTorch and TensorFlow, generated music samples in two genres, and a lightning video. |
Tasks | Meta-Learning, Music Generation |
Published | 2020-01-02 |
URL | https://arxiv.org/abs/2001.00576v1 |
https://arxiv.org/pdf/2001.00576v1.pdf | |
PWC | https://paperswithcode.com/paper/dawson-a-domain-adaptive-few-shot-generation |
Repo | |
Framework | |
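DAWSON's specific contribution is a GAN training procedure that yields generator meta-gradients from development sets; the sketch below does not reproduce that. It only shows the overall shape of a meta-learning outer loop (here a first-order, Reptile-style update) wrapped around ordinary GAN inner updates, with all names, data, and constants being illustrative.

```python
# Shape of meta-learning + GAN training: per sampled domain, clone G and D,
# run a few ordinary GAN steps, then move the meta-parameters toward the
# adapted weights (Reptile-style meta-update).
import copy
import torch
import torch.nn as nn

def gan_inner_steps(G, D, data, steps=5, lr=1e-3):
    optG = torch.optim.SGD(G.parameters(), lr=lr)
    optD = torch.optim.SGD(D.parameters(), lr=lr)
    bce = nn.BCEWithLogitsLoss()
    for _ in range(steps):
        z = torch.randn(len(data), 8)
        fake = G(z)
        d_loss = bce(D(data), torch.ones(len(data), 1)) + \
                 bce(D(fake.detach()), torch.zeros(len(data), 1))
        optD.zero_grad(); d_loss.backward(); optD.step()
        g_loss = bce(D(G(z)), torch.ones(len(data), 1))
        optG.zero_grad(); g_loss.backward(); optG.step()

G = nn.Sequential(nn.Linear(8, 2))
D = nn.Sequential(nn.Linear(2, 1))
meta_lr = 0.1
for _ in range(100):                                           # meta-iterations over sampled domains
    domain_data = torch.randn(32, 2) + torch.randn(1, 2) * 3   # a toy "domain"
    G_fast, D_fast = copy.deepcopy(G), copy.deepcopy(D)
    gan_inner_steps(G_fast, D_fast, domain_data)
    with torch.no_grad():                                      # move meta-weights toward adapted weights
        for p, q in zip(G.parameters(), G_fast.parameters()):
            p += meta_lr * (q - p)
        for p, q in zip(D.parameters(), D_fast.parameters()):
            p += meta_lr * (q - p)
```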
Super-resolution of multispectral satellite images using convolutional neural networks
Title | Super-resolution of multispectral satellite images using convolutional neural networks |
Authors | M. U. Müller, N. Ekhtiari, R. M. Almeida, C. Rieke |
Abstract | Super-resolution aims at increasing image resolution by algorithmic means and has progressed over the recent years due to advances in the fields of computer vision and deep learning. Convolutional Neural Networks based on a variety of architectures have been applied to the problem, e.g. autoencoders and residual networks. While most research focuses on the processing of photographs consisting only of RGB color channels, little work can be found concentrating on multi-band, analytic satellite imagery. Satellite images often include a panchromatic band, which has higher spatial resolution but lower spectral resolution than the other bands. In the field of remote sensing, there is a long tradition of applying pan-sharpening to satellite images, i.e. bringing the multispectral bands to the higher spatial resolution by merging them with the panchromatic band. To our knowledge there are so far no approaches to super-resolution which take advantage of the panchromatic band. In this paper we propose a method to train state-of-the-art CNNs using pairs of lower-resolution multispectral and high-resolution pan-sharpened image tiles in order to create super-resolved analytic images. The derived quality metrics show that the method improves information content of the processed images. We compare the results created by four CNN architectures, with RedNet30 performing best. |
Tasks | Super-Resolution |
Published | 2020-02-03 |
URL | https://arxiv.org/abs/2002.00580v1 |
https://arxiv.org/pdf/2002.00580v1.pdf | |
PWC | https://paperswithcode.com/paper/super-resolution-of-multispectral-satellite |
Repo | |
Framework | |
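A sketch of the training setup described above: a small CNN maps an upsampled lower-resolution multispectral tile to its high-resolution pan-sharpened counterpart. The band count, scale factor, and tiny residual network are placeholders (the paper compares several architectures, with RedNet30 performing best).

```python
# Super-resolution sketch: bicubic upsampling of the multispectral tile plus a
# learned residual, supervised by the pan-sharpened high-resolution tile.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySRNet(nn.Module):
    def __init__(self, bands=4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(bands, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, bands, 3, padding=1),
        )

    def forward(self, lr_tile, scale=4):
        x = F.interpolate(lr_tile, scale_factor=scale, mode="bicubic", align_corners=False)
        return x + self.body(x)                      # residual prediction on the upsampled tile

model = TinySRNet()
lr_ms = torch.randn(2, 4, 32, 32)                    # lower-resolution multispectral tiles
hr_pansharp = torch.randn(2, 4, 128, 128)            # pan-sharpened high-resolution targets
loss = F.l1_loss(model(lr_ms), hr_pansharp)
loss.backward()
```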
Generative Adversarial Networks for LHCb Fast Simulation
Title | Generative Adversarial Networks for LHCb Fast Simulation |
Authors | Fedor Ratnikov |
Abstract | LHCb is one of the major experiments operating at the Large Hadron Collider at CERN. The richness of the physics program and the increasing precision of the measurements in LHCb lead to the need for ever larger simulated samples. This need will increase further when the upgraded LHCb detector starts collecting data in LHC Run 3. Given the computing resources pledged for the production of Monte Carlo simulated events in the coming years, the use of fast simulation techniques will be mandatory to cope with the expected dataset size. In LHCb, generative models, which are nowadays widely used for computer vision and image processing, are being investigated in order to accelerate the generation of showers in the calorimeter and high-level responses of the Cherenkov detector. We demonstrate that this approach provides high-fidelity results along with a significant speed increase, and discuss possible implications of these results. We also present an implementation of this algorithm in the LHCb simulation software and validation tests. |
Tasks | |
Published | 2020-03-21 |
URL | https://arxiv.org/abs/2003.09762v1 |
https://arxiv.org/pdf/2003.09762v1.pdf | |
PWC | https://paperswithcode.com/paper/generative-adversarial-networks-for-lhcb-fast |
Repo | |
Framework | |