Paper Group AWR 182
PixelSNAIL: An Improved Autoregressive Generative Model
Title | PixelSNAIL: An Improved Autoregressive Generative Model |
Authors | Xi Chen, Nikhil Mishra, Mostafa Rohaninejad, Pieter Abbeel |
Abstract | Autoregressive generative models consistently achieve the best results in density estimation tasks involving high dimensional data, such as images or audio. They pose density estimation as a sequence modeling task, where a recurrent neural network (RNN) models the conditional distribution over the next element conditioned on all previous elements. In this paradigm, the bottleneck is the extent to which the RNN can model long-range dependencies, and the most successful approaches rely on causal convolutions, which offer better access to earlier parts of the sequence than conventional RNNs. Taking inspiration from recent work in meta reinforcement learning, where dealing with long-range dependencies is also essential, we introduce a new generative model architecture that combines causal convolutions with self attention. In this note, we describe the resulting model and present state-of-the-art log-likelihood results on CIFAR-10 (2.85 bits per dim) and $32 \times 32$ ImageNet (3.80 bits per dim). Our implementation is available at https://github.com/neocxi/pixelsnail-public |
Tasks | Density Estimation, Image Generation |
Published | 2017-12-28 |
URL | http://arxiv.org/abs/1712.09763v1 |
http://arxiv.org/pdf/1712.09763v1.pdf | |
PWC | https://paperswithcode.com/paper/pixelsnail-an-improved-autoregressive |
Repo | https://github.com/neocxi/pixelsnail-public |
Framework | tf |
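As a rough illustration of the abstract's core idea, pairing causal convolutions with masked self attention so that position t only conditions on earlier positions, here is a minimal PyTorch sketch. It is a toy 1-D block with illustrative names and sizes, not the paper's 2-D PixelCNN-style architecture; see the linked repo for the official TensorFlow implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConvAttentionBlock(nn.Module):
    """Toy block pairing a causal 1-D convolution with masked self-attention."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.kernel_size = kernel_size
        self.conv = nn.Conv1d(channels, channels, kernel_size)
        self.qkv = nn.Linear(channels, 3 * channels)

    def forward(self, x):                    # x: (batch, channels, length)
        # Causal convolution: pad only on the left so position t never sees t+1, ...
        h = self.conv(F.pad(x, (self.kernel_size - 1, 0)))
        h = h.transpose(1, 2)                # (batch, length, channels)
        q, k, v = self.qkv(h).chunk(3, dim=-1)
        scores = q @ k.transpose(1, 2) / k.shape[-1] ** 0.5
        # Causal mask: disallow attention to future positions.
        mask = torch.triu(torch.ones_like(scores), diagonal=1).bool()
        scores = scores.masked_fill(mask, float("-inf"))
        out = torch.softmax(scores, dim=-1) @ v
        return (h + out).transpose(1, 2)     # residual connection, back to (B, C, L)

# Toy usage: batch of 2 sequences, 16 channels, length 10.
block = CausalConvAttentionBlock(16)
print(block(torch.randn(2, 16, 10)).shape)   # -> torch.Size([2, 16, 10])
```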
A Data-Oriented Model of Literary Language
Title | A Data-Oriented Model of Literary Language |
Authors | Andreas van Cranenburgh, Rens Bod |
Abstract | We consider the task of predicting how literary a text is, with a gold standard from human ratings. Aside from a standard bigram baseline, we apply rich syntactic tree fragments, mined from the training set, and a series of hand-picked features. Our model is the first to distinguish degrees of highly and less literary novels using a variety of lexical and syntactic features, and explains 76.0% of the variation in literary ratings. |
Tasks | |
Published | 2017-01-12 |
URL | http://arxiv.org/abs/1701.03329v2 |
http://arxiv.org/pdf/1701.03329v2.pdf | |
PWC | https://paperswithcode.com/paper/a-data-oriented-model-of-literary-language |
Repo | https://github.com/andreasvc/literariness |
Framework | none |
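To make the bigram baseline from the abstract concrete, the sketch below fits a bag-of-bigrams ridge regressor to literariness ratings and reports explained variance. The `texts` and `ratings` variables are hypothetical placeholders; the paper's 76.0% figure comes from its richer model with tree fragments and hand-picked features, not from this baseline.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical inputs: `texts` is a list of novels (strings), `ratings` the
# mean human literariness rating for each novel.
texts = ["an example novel about the sea ...", "another example novel about a city ..."] * 20
ratings = [4.2, 5.6] * 20

X_tr, X_te, y_tr, y_te = train_test_split(texts, ratings, random_state=0)

# Bigram baseline: bag-of-bigrams features feeding a ridge regressor.
vec = CountVectorizer(ngram_range=(2, 2), min_df=1)
model = Ridge(alpha=1.0)
model.fit(vec.fit_transform(X_tr), y_tr)

print("explained variance (R^2):", r2_score(y_te, model.predict(vec.transform(X_te))))
```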
Cascade Adversarial Machine Learning Regularized with a Unified Embedding
Title | Cascade Adversarial Machine Learning Regularized with a Unified Embedding |
Authors | Taesik Na, Jong Hwan Ko, Saibal Mukhopadhyay |
Abstract | Injecting adversarial examples during training, known as adversarial training, can improve robustness against one-step attacks, but not for unknown iterative attacks. To address this challenge, we first show iteratively generated adversarial images easily transfer between networks trained with the same strategy. Inspired by this observation, we propose cascade adversarial training, which transfers the knowledge of the end results of adversarial training. We train a network from scratch by injecting iteratively generated adversarial images crafted from already defended networks in addition to one-step adversarial images from the network being trained. We also propose to utilize embedding space for both classification and low-level (pixel-level) similarity learning to ignore unknown pixel level perturbation. During training, we inject adversarial images without replacing their corresponding clean images and penalize the distance between the two embeddings (clean and adversarial). Experimental results show that cascade adversarial training together with our proposed low-level similarity learning efficiently enhances the robustness against iterative attacks, but at the expense of decreased robustness against one-step attacks. We show that combining those two techniques can also improve robustness under the worst case black box attack scenario. |
Tasks | |
Published | 2017-08-08 |
URL | http://arxiv.org/abs/1708.02582v3 |
http://arxiv.org/pdf/1708.02582v3.pdf | |
PWC | https://paperswithcode.com/paper/cascade-adversarial-machine-learning |
Repo | https://github.com/taesikna/cascade_adv_training |
Framework | tf |
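A minimal sketch of the training objective described in the abstract: classification loss on both clean and adversarial images plus a penalty on the distance between their embeddings (the low-level similarity learning). The `model` interface and the way `x_adv` is produced (one-step from the current network, or iteratively from an already-defended network as in cascade adversarial training) are assumptions for illustration, not the authors' exact code.

```python
import torch
import torch.nn.functional as F

def cascade_adv_loss(model, x_clean, x_adv, labels, lam=0.1):
    """Hedged sketch: classify clean and adversarial images and penalize the
    distance between their embeddings.

    Assumes `model(x)` returns (logits, embedding); `x_adv` is assumed to be
    crafted elsewhere (one-step from this network, or iteratively from an
    already-defended network)."""
    logits_c, emb_c = model(x_clean)
    logits_a, emb_a = model(x_adv)
    cls = F.cross_entropy(logits_c, labels) + F.cross_entropy(logits_a, labels)
    sim = F.mse_loss(emb_a, emb_c.detach())   # pull adversarial embedding toward clean
    return cls + lam * sim
```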
AMAT: Medial Axis Transform for Natural Images
Title | AMAT: Medial Axis Transform for Natural Images |
Authors | Stavros Tsogkas, Sven Dickinson |
Abstract | We introduce Appearance-MAT (AMAT), a generalization of the medial axis transform for natural images, that is framed as a weighted geometric set cover problem. We make the following contributions: i) we extend previous medial point detection methods for color images, by associating each medial point with a local scale; ii) inspired by the invertibility property of the binary MAT, we also associate each medial point with a local encoding that allows us to invert the AMAT, reconstructing the input image; iii) we describe a clustering scheme that takes advantage of the additional scale and appearance information to group individual points into medial branches, providing a shape decomposition of the underlying image regions. In our experiments, we show state-of-the-art performance in medial point detection on Berkeley Medial AXes (BMAX500), a new dataset of medial axes based on the BSDS500 database, and good generalization on the SK506 and WH-SYMMAX datasets. We also measure the quality of reconstructed images from BMAX500, obtained by inverting their computed AMAT. Our approach delivers significantly better reconstruction quality with respect to three baselines, using just 10% of the image pixels. Our code and annotations are available at https://github.com/tsogkas/amat . |
Tasks | |
Published | 2017-03-24 |
URL | http://arxiv.org/abs/1703.08628v2 |
http://arxiv.org/pdf/1703.08628v2.pdf | |
PWC | https://paperswithcode.com/paper/amat-medial-axis-transform-for-natural-images |
Repo | https://github.com/tsogkas/amat |
Framework | none |
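The abstract frames the AMAT as a weighted geometric set cover over image points. As a generic stand-in for that formulation (not the paper's actual solver), here is the textbook greedy heuristic that repeatedly picks the candidate set with the lowest cost per newly covered element; set names and costs are illustrative.

```python
def greedy_weighted_set_cover(universe, sets, costs):
    """Generic greedy weighted set cover: repeatedly pick the set with the
    lowest cost per newly covered element. A textbook stand-in for the
    geometric set cover formulation in the abstract, not the paper's solver.

    universe: elements to cover (e.g. pixels); sets: dict name -> set of elements;
    costs: dict name -> positive cost."""
    uncovered, chosen = set(universe), []
    while uncovered:
        name = min(
            (s for s in sets if sets[s] & uncovered),
            key=lambda s: costs[s] / len(sets[s] & uncovered),
        )
        chosen.append(name)
        uncovered -= sets[name]
    return chosen

# Toy usage: cover pixels {1..5} with three candidate medial disks.
print(greedy_weighted_set_cover(
    {1, 2, 3, 4, 5},
    {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5}},
    {"a": 1.0, "b": 1.0, "c": 1.0},
))
```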
LOGAN: Membership Inference Attacks Against Generative Models
Title | LOGAN: Membership Inference Attacks Against Generative Models |
Authors | Jamie Hayes, Luca Melis, George Danezis, Emiliano De Cristofaro |
Abstract | Generative models estimate the underlying distribution of a dataset to generate realistic samples according to that distribution. In this paper, we present the first membership inference attacks against generative models: given a data point, the adversary determines whether or not it was used to train the model. Our attacks leverage Generative Adversarial Networks (GANs), which combine a discriminative and a generative model, to detect overfitting and recognize inputs that were part of training datasets, using the discriminator’s capacity to learn statistical differences in distributions. We present attacks based on both white-box and black-box access to the target model, against several state-of-the-art generative models, over datasets of complex representations of faces (LFW), objects (CIFAR-10), and medical images (Diabetic Retinopathy). We also discuss the sensitivity of the attacks to different training parameters, and their robustness against mitigation strategies, finding that defenses are either ineffective or lead to significantly worse performances of the generative models in terms of training stability and/or sample quality. |
Tasks | |
Published | 2017-05-22 |
URL | http://arxiv.org/abs/1705.07663v4 |
http://arxiv.org/pdf/1705.07663v4.pdf | |
PWC | https://paperswithcode.com/paper/logan-membership-inference-attacks-against |
Repo | https://github.com/jhayes14/gen_mem_inf |
Framework | pytorch |
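The white-box attack described in the abstract reduces to a simple ranking rule: score each candidate record with the target GAN's discriminator and flag the highest-scoring ones as training members, since an overfit discriminator assigns higher confidence to data it was trained on. A hedged numpy sketch with made-up scores follows; how the scores are obtained from the target model is assumed.

```python
import numpy as np

def whitebox_membership_attack(discriminator_scores, n_members):
    """Sketch of the white-box attack idea: rank candidate records by the
    target discriminator's confidence and declare the top-n as training-set
    members. `discriminator_scores` is assumed to come from the target GAN."""
    order = np.argsort(discriminator_scores)[::-1]       # highest confidence first
    predicted_members = np.zeros(len(discriminator_scores), dtype=bool)
    predicted_members[order[:n_members]] = True
    return predicted_members

# Toy usage with made-up scores for 6 candidate records.
print(whitebox_membership_attack(np.array([0.9, 0.2, 0.8, 0.4, 0.95, 0.1]), n_members=3))
```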
SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability
Title | SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability |
Authors | Maithra Raghu, Justin Gilmer, Jason Yosinski, Jascha Sohl-Dickstein |
Abstract | We propose a new technique, Singular Vector Canonical Correlation Analysis (SVCCA), a tool for quickly comparing two representations in a way that is both invariant to affine transform (allowing comparison between different layers and networks) and fast to compute (allowing more comparisons to be calculated than with previous methods). We deploy this tool to measure the intrinsic dimensionality of layers, showing in some cases needless over-parameterization; to probe learning dynamics throughout training, finding that networks converge to final representations from the bottom up; to show where class-specific information in networks is formed; and to suggest new training regimes that simultaneously save computation and overfit less. Code: https://github.com/google/svcca/ |
Tasks | |
Published | 2017-06-19 |
URL | http://arxiv.org/abs/1706.05806v2 |
http://arxiv.org/pdf/1706.05806v2.pdf | |
PWC | https://paperswithcode.com/paper/svcca-singular-vector-canonical-correlation |
Repo | https://github.com/google/svcca |
Framework | none |
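SVCCA can be sketched directly from the abstract: reduce each layer's activation matrix with an SVD that keeps most of the variance, then run CCA on the reduced representations. Below is a minimal numpy version using the QR-based formulation of CCA; the variance threshold and shapes are illustrative, and the linked repo holds the reference implementation.

```python
import numpy as np

def svcca(X, Y, keep=0.99):
    """Minimal SVCCA sketch: SVD-reduce each activation matrix, then CCA.

    X, Y: (num_datapoints, num_neurons) activation matrices.
    Returns the canonical correlation coefficients."""
    def svd_reduce(A, keep):
        A = A - A.mean(axis=0)
        U, s, _ = np.linalg.svd(A, full_matrices=False)
        k = np.searchsorted(np.cumsum(s**2) / np.sum(s**2), keep) + 1
        return U[:, :k] * s[:k]          # directions explaining `keep` of the variance

    Xr, Yr = svd_reduce(X, keep), svd_reduce(Y, keep)
    Qx, _ = np.linalg.qr(Xr)
    Qy, _ = np.linalg.qr(Yr)
    rho = np.linalg.svd(Qx.T @ Qy, compute_uv=False)   # canonical correlations
    return np.clip(rho, 0.0, 1.0)

# Toy usage: two 500-example activation matrices with 20 and 30 neurons.
rng = np.random.default_rng(0)
print(svcca(rng.normal(size=(500, 20)), rng.normal(size=(500, 30))))
```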
What’s in a Question: Using Visual Questions as a Form of Supervision
Title | What’s in a Question: Using Visual Questions as a Form of Supervision |
Authors | Siddha Ganju, Olga Russakovsky, Abhinav Gupta |
Abstract | Collecting fully annotated image datasets is challenging and expensive. Many types of weak supervision have been explored: weak manual annotations, web search results, temporal continuity, ambient sound and others. We focus on one particular unexplored mode: visual questions that are asked about images. The key observation that inspires our work is that the question itself provides useful information about the image (even without the answer being available). For instance, the question “what is the breed of the dog?” informs the AI that the animal in the scene is a dog and that there is only one dog present. We make three contributions: (1) providing an extensive qualitative and quantitative analysis of the information contained in human visual questions, (2) proposing two simple but surprisingly effective modifications to the standard visual question answering models that allow them to make use of weak supervision in the form of unanswered questions associated with images and (3) demonstrating that a simple data augmentation strategy inspired by our insights results in a 7.1% improvement on the standard VQA benchmark. |
Tasks | Data Augmentation, Visual Question Answering |
Published | 2017-04-12 |
URL | http://arxiv.org/abs/1704.03895v1 |
http://arxiv.org/pdf/1704.03895v1.pdf | |
PWC | https://paperswithcode.com/paper/whats-in-a-question-using-visual-questions-as |
Repo | https://github.com/sidgan/whats_in_a_question |
Framework | none |
From Distance Correlation to Multiscale Graph Correlation
Title | From Distance Correlation to Multiscale Graph Correlation |
Authors | Cencheng Shen, Carey E. Priebe, Joshua T. Vogelstein |
Abstract | Understanding and developing a correlation measure that can detect general dependencies is not only imperative to statistics and machine learning, but also crucial to general scientific discovery in the big data age. In this paper, we establish a new framework that generalizes distance correlation — a correlation measure that was recently proposed and shown to be universally consistent for dependence testing against all joint distributions of finite moments — to the Multiscale Graph Correlation (MGC). By utilizing the characteristic functions and incorporating the nearest neighbor machinery, we formalize the population version of local distance correlations, define the optimal scale in a given dependency, and name the optimal local correlation as MGC. The new theoretical framework motivates a theoretically sound Sample MGC and allows a number of desirable properties to be proved, including the universal consistency, convergence and almost unbiasedness of the sample version. The advantages of MGC are illustrated via a comprehensive set of simulations with linear, nonlinear, univariate, multivariate, and noisy dependencies, where it loses almost no power in monotone dependencies while achieving better performance in general dependencies, compared to distance correlation and other popular methods. |
Tasks | |
Published | 2017-10-26 |
URL | http://arxiv.org/abs/1710.09768v3 |
http://arxiv.org/pdf/1710.09768v3.pdf | |
PWC | https://paperswithcode.com/paper/from-distance-correlation-to-multiscale-graph |
Repo | https://github.com/neurodata/mgc-matlab |
Framework | none |
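MGC generalizes distance correlation to local, multiscale variants built on nearest-neighbor graphs. As background, the snippet below computes plain (biased, V-statistic) sample distance correlation from double-centered pairwise distance matrices; it is not the MGC statistic itself, for which see the linked MATLAB code.

```python
import numpy as np

def distance_correlation(x, y):
    """Plain sample distance correlation (the measure MGC generalizes),
    computed from double-centered pairwise distance matrices."""
    x = x.reshape(len(x), -1).astype(float)
    y = y.reshape(len(y), -1).astype(float)
    a = np.linalg.norm(x[:, None] - x[None, :], axis=-1)    # pairwise distances
    b = np.linalg.norm(y[:, None] - y[None, :], axis=-1)
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()       # double centering
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    dvar_x, dvar_y = (A * A).mean(), (B * B).mean()
    denom = np.sqrt(dvar_x * dvar_y)
    return float(np.sqrt(max(dcov2, 0.0) / denom)) if denom > 0 else 0.0

# Toy check: a noisy nonlinear (quadratic) dependency.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
print(distance_correlation(t, t**2 + 0.1 * rng.normal(size=200)))
```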
Geometric SMOTE: Effective oversampling for imbalanced learning through a geometric extension of SMOTE
Title | Geometric SMOTE: Effective oversampling for imbalanced learning through a geometric extension of SMOTE |
Authors | Georgios Douzas, Fernando Bacao |
Abstract | Classification of imbalanced datasets is a challenging task for standard algorithms. Although many methods exist to address this problem in different ways, generating artificial data for the minority class is a more general approach compared to algorithmic modifications. SMOTE algorithm and its variations generate synthetic samples along a line segment that joins minority class instances. In this paper we propose Geometric SMOTE (G-SMOTE) as a generalization of the SMOTE data generation mechanism. G-SMOTE generates synthetic samples in a geometric region of the input space, around each selected minority instance. While in the basic configuration this region is a hyper-sphere, G-SMOTE allows its deformation to a hyper-spheroid and finally to a line segment, emulating, in the last case, the SMOTE mechanism. The performance of G-SMOTE is compared against multiple standard oversampling algorithms. We present empirical results that show a significant improvement in the quality of the generated data when G-SMOTE is used as an oversampling algorithm. |
Tasks | |
Published | 2017-09-21 |
URL | http://arxiv.org/abs/1709.07377v1 |
http://arxiv.org/pdf/1709.07377v1.pdf | |
PWC | https://paperswithcode.com/paper/geometric-smote-effective-oversampling-for |
Repo | https://github.com/AlgoWit/publications |
Framework | none |
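A minimal sketch of the data generation idea in the abstract: draw a synthetic sample in a geometric region around a selected minority instance, and let a deformation parameter interpolate between the hyper-sphere case and the SMOTE-style line segment. This is an illustration of the abstract only, with a simplified parameterization rather than the paper's exact truncation and deformation scheme.

```python
import numpy as np

def g_smote_sample(center, neighbor, deformation=0.0, rng=None):
    """Hedged G-SMOTE-style sketch: deformation=0 samples inside a hyper-sphere
    of radius |neighbor - center| around `center`; deformation=1 collapses the
    region onto the SMOTE line segment between the two points."""
    rng = rng or np.random.default_rng()
    radius = np.linalg.norm(neighbor - center)
    direction = rng.normal(size=center.shape)
    direction /= np.linalg.norm(direction)                  # random direction on the sphere
    point = center + rng.uniform(0, radius) * direction     # random point in the hyper-sphere
    segment_point = center + rng.uniform(0, 1) * (neighbor - center)
    return (1 - deformation) * point + deformation * segment_point

# Toy usage in 2-D.
print(g_smote_sample(np.array([0.0, 0.0]), np.array([1.0, 0.0]), deformation=0.5))
```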
End-to-End Multimodal Emotion Recognition using Deep Neural Networks
Title | End-to-End Multimodal Emotion Recognition using Deep Neural Networks |
Authors | Panagiotis Tzirakis, George Trigeorgis, Mihalis A. Nicolaou, Björn Schuller, Stefanos Zafeiriou |
Abstract | Automatic affect recognition is a challenging task due to the various modalities emotions can be expressed with. Applications can be found in many domains, including multimedia retrieval and human-computer interaction. In recent years, deep neural networks have been used with great success in determining emotional states. Inspired by this success, we propose an emotion recognition system using auditory and visual modalities. To capture the emotional content for various styles of speaking, robust features need to be extracted. To this purpose, we utilize a Convolutional Neural Network (CNN) to extract features from the speech, while for the visual modality a deep residual network (ResNet) of 50 layers is used. In addition to the importance of feature extraction, a machine learning algorithm also needs to be insensitive to outliers while being able to model the context. To tackle this problem, Long Short-Term Memory (LSTM) networks are utilized. The system is then trained in an end-to-end fashion where - by also taking advantage of the correlations of each of the streams - we manage to significantly outperform the traditional approaches based on auditory and visual handcrafted features for the prediction of spontaneous and natural emotions on the RECOLA database of the AVEC 2016 research challenge on emotion recognition. |
Tasks | Emotion Recognition, Multimodal Emotion Recognition |
Published | 2017-04-27 |
URL | http://arxiv.org/abs/1704.08619v1 |
http://arxiv.org/pdf/1704.08619v1.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-multimodal-emotion-recognition |
Repo | https://github.com/asfathermou/human-computer-interaction |
Framework | tf |
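The pipeline in the abstract (speech CNN, visual CNN, fusion, LSTM over time, continuous emotion outputs) can be sketched in a few lines of PyTorch. Everything below is illustrative: the layer sizes are tiny stand-ins, the visual branch replaces the paper's 50-layer ResNet, and the two-dimensional output is assumed to be arousal/valence.

```python
import torch
import torch.nn as nn

class MultimodalEmotionNet(nn.Module):
    """Hedged sketch of an end-to-end audio-visual emotion regressor."""
    def __init__(self, hidden=64):
        super().__init__()
        self.audio = nn.Sequential(nn.Conv1d(1, 16, 8, stride=4), nn.ReLU(),
                                   nn.AdaptiveAvgPool1d(1), nn.Flatten())
        self.visual = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.lstm = nn.LSTM(32, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)          # assumed arousal, valence outputs

    def forward(self, speech, frames):
        # speech: (batch, time, samples); frames: (batch, time, 3, H, W)
        b, t = speech.shape[:2]
        a = self.audio(speech.reshape(b * t, 1, -1)).reshape(b, t, -1)
        v = self.visual(frames.reshape(b * t, *frames.shape[2:])).reshape(b, t, -1)
        out, _ = self.lstm(torch.cat([a, v], dim=-1))   # fuse, then model context over time
        return self.head(out)                          # per-timestep predictions

# Toy usage: batch of 2 sequences, 5 timesteps each.
net = MultimodalEmotionNet()
print(net(torch.randn(2, 5, 640), torch.randn(2, 5, 3, 32, 32)).shape)   # -> (2, 5, 2)
```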
CVAE-GAN: Fine-Grained Image Generation through Asymmetric Training
Title | CVAE-GAN: Fine-Grained Image Generation through Asymmetric Training |
Authors | Jianmin Bao, Dong Chen, Fang Wen, Houqiang Li, Gang Hua |
Abstract | We present variational generative adversarial networks, a general learning framework that combines a variational auto-encoder with a generative adversarial network, for synthesizing images in fine-grained categories, such as faces of a specific person or objects in a category. Our approach models an image as a composition of label and latent attributes in a probabilistic model. By varying the fine-grained category label fed into the resulting generative model, we can generate images in a specific category with randomly drawn values on a latent attribute vector. Our approach has two novel aspects. First, we adopt a cross entropy loss for the discriminative and classifier network, but a mean discrepancy objective for the generative network. This kind of asymmetric loss function makes the GAN training more stable. Second, we adopt an encoder network to learn the relationship between the latent space and the real image space, and use pairwise feature matching to keep the structure of generated images. We experiment with natural images of faces, flowers, and birds, and demonstrate that the proposed models are capable of generating realistic and diverse samples with fine-grained category labels. We further show that our models can be applied to other tasks, such as image inpainting, super-resolution, and data augmentation for training better face recognition models. |
Tasks | Data Augmentation, Face Recognition, Image Generation, Image Inpainting, Super-Resolution |
Published | 2017-03-29 |
URL | http://arxiv.org/abs/1703.10155v2 |
http://arxiv.org/pdf/1703.10155v2.pdf | |
PWC | https://paperswithcode.com/paper/cvae-gan-fine-grained-image-generation |
Repo | https://github.com/One-sixth/CVAE-GAN_tensorlayer |
Framework | tf |
A Discriminative Event Based Model for Alzheimer’s Disease Progression Modeling
Title | A Discriminative Event Based Model for Alzheimer’s Disease Progression Modeling |
Authors | Vikram Venkatraghavan, Esther Bron, Wiro Niessen, Stefan Klein |
Abstract | The event-based model (EBM) for data-driven disease progression modeling estimates the sequence in which biomarkers for a disease become abnormal. This helps in understanding the dynamics of disease progression and facilitates early diagnosis by staging patients on a disease progression timeline. Existing EBM methods are all generative in nature. In this work we propose a novel discriminative approach to EBM, which is shown to be more accurate as well as computationally more efficient than existing state-of-the-art EBM methods. The method first estimates for each subject an approximate ordering of events, by ranking the posterior probabilities of individual biomarkers being abnormal. Subsequently, the central ordering over all subjects is estimated by fitting a generalized Mallows model to these approximate subject-specific orderings based on a novel probabilistic Kendall’s Tau distance. To evaluate the accuracy, we performed extensive experiments on synthetic data simulating the progression of Alzheimer’s disease. Subsequently, the method was applied to the Alzheimer’s Disease Neuroimaging Initiative (ADNI) data to estimate the central event ordering in the dataset. The experiments benchmark the accuracy of the new model under various conditions and compare it with existing state-of-the-art EBM methods. The results indicate that discriminative EBM could be a simple and elegant approach to disease progression modeling. |
Tasks | |
Published | 2017-02-21 |
URL | http://arxiv.org/abs/1702.06408v1 |
http://arxiv.org/pdf/1702.06408v1.pdf | |
PWC | https://paperswithcode.com/paper/a-discriminative-event-based-model-for |
Repo | https://github.com/88vikram/pyebm |
Framework | none |
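The first step of the method in the abstract, deriving an approximate per-subject event ordering by ranking posterior abnormality probabilities, is easy to sketch. For the second step (the central ordering), the toy below substitutes a simple Borda-count consensus for the paper's generalized Mallows model with a probabilistic Kendall's tau distance, so it illustrates the idea but not the actual estimator.

```python
import numpy as np

def subject_orderings(p_abnormal):
    """Per-subject approximate event ordering: rank biomarkers by posterior
    probability of being abnormal, most abnormal first."""
    return np.argsort(-p_abnormal, axis=1)

def central_ordering(orderings, n_events):
    """Stand-in aggregation via Borda count (average rank). The paper instead
    fits a generalized Mallows model; this simpler consensus is illustrative only."""
    ranks = np.empty_like(orderings)
    np.put_along_axis(ranks, orderings, np.arange(n_events)[None, :], axis=1)
    return np.argsort(ranks.mean(axis=0))

# Toy usage: 3 subjects, 4 biomarkers (posterior probabilities of abnormality).
p = np.array([[0.9, 0.7, 0.2, 0.1],
              [0.8, 0.9, 0.3, 0.2],
              [0.7, 0.6, 0.4, 0.1]])
orders = subject_orderings(p)
print(orders)
print(central_ordering(orders, n_events=4))
```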
Real-time Convolutional Neural Networks for Emotion and Gender Classification
Title | Real-time Convolutional Neural Networks for Emotion and Gender Classification |
Authors | Octavio Arriaga, Matias Valdenegro-Toro, Paul Plöger |
Abstract | In this paper we propose and implement a general convolutional neural network (CNN) building framework for designing real-time CNNs. We validate our models by creating a real-time vision system which accomplishes the tasks of face detection, gender classification and emotion classification simultaneously in one blended step using our proposed CNN architecture. After presenting the details of the training procedure setup we proceed to evaluate on standard benchmark sets. We report accuracies of 96% on the IMDB gender dataset and 66% on the FER-2013 emotion dataset. Along with this, we also introduce the very recent real-time enabled guided back-propagation visualization technique. Guided back-propagation uncovers the dynamics of the weight changes and evaluates the learned features. We argue that the careful implementation of modern CNN architectures, the use of current regularization methods and the visualization of previously hidden features are necessary in order to reduce the gap between slow performances and real-time architectures. Our system has been validated by its deployment on a Care-O-bot 3 robot used during RoboCup@Home competitions. All our code, demos and pre-trained architectures have been released under an open-source license in our public repository. |
Tasks | Emotion Classification, Face Detection, Gender Prediction |
Published | 2017-10-20 |
URL | http://arxiv.org/abs/1710.07557v1 |
http://arxiv.org/pdf/1710.07557v1.pdf | |
PWC | https://paperswithcode.com/paper/real-time-convolutional-neural-networks-for |
Repo | https://github.com/ajinkyabedekar/Face-to-Emoji |
Framework | none |
Dense Transformer Networks
Title | Dense Transformer Networks |
Authors | Jun Li, Yongjun Chen, Lei Cai, Ian Davidson, Shuiwang Ji |
Abstract | The key idea of current deep learning methods for dense prediction is to apply a model on a regular patch centered on each pixel to make pixel-wise predictions. These methods are limited in the sense that the patches are determined by network architecture instead of learned from data. In this work, we propose the dense transformer networks, which can learn the shapes and sizes of patches from data. The dense transformer networks employ an encoder-decoder architecture, and a pair of dense transformer modules are inserted into each of the encoder and decoder paths. The novelty of this work is that we provide technical solutions for learning the shapes and sizes of patches from data and efficiently restoring the spatial correspondence required for dense prediction. The proposed dense transformer modules are differentiable, thus the entire network can be trained. We apply the proposed networks on natural and biological image segmentation tasks and show superior performance is achieved in comparison to baseline methods. |
Tasks | Semantic Segmentation |
Published | 2017-05-24 |
URL | http://arxiv.org/abs/1705.08881v2 |
http://arxiv.org/pdf/1705.08881v2.pdf | |
PWC | https://paperswithcode.com/paper/dense-transformer-networks |
Repo | https://github.com/zhengyang-wang/Unet_3D |
Framework | tf |
Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks
Title | Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks |
Authors | Guokun Lai, Wei-Cheng Chang, Yiming Yang, Hanxiao Liu |
Abstract | Multivariate time series forecasting is an important machine learning problem across many domains, including predictions of solar plant energy output, electricity consumption, and traffic jam situations. Temporal data arising in these real-world applications often involve a mixture of long-term and short-term patterns, for which traditional approaches such as autoregressive models and Gaussian processes may fail. In this paper, we propose a novel deep learning framework, namely the Long- and Short-term Time-series network (LSTNet), to address this open challenge. LSTNet uses a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN) to extract short-term local dependency patterns among variables and to discover long-term patterns for time series trends. Furthermore, we leverage a traditional autoregressive model to tackle the scale insensitivity problem of the neural network model. In our evaluation on real-world data with complex mixtures of repetitive patterns, LSTNet achieves significant performance improvements over several state-of-the-art baseline methods. All the data and experiment code are available online. |
Tasks | Multivariate Time Series Forecasting, Time Series, Time Series Forecasting |
Published | 2017-03-21 |
URL | http://arxiv.org/abs/1703.07015v3 |
http://arxiv.org/pdf/1703.07015v3.pdf | |
PWC | https://paperswithcode.com/paper/modeling-long-and-short-term-temporal |
Repo | https://github.com/laiguokun/LSTNet |
Framework | pytorch |
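The abstract's three ingredients (a CNN for short-term local patterns, an RNN for longer-term dynamics, and a linear autoregressive path to handle scale sensitivity) can be combined in a few lines of PyTorch. The sketch below omits the recurrent-skip component of the full LSTNet and uses illustrative sizes; the linked repo contains the authors' implementation.

```python
import torch
import torch.nn as nn

class TinyLSTNet(nn.Module):
    """Hedged LSTNet-style sketch: CNN -> GRU -> linear head, plus a linear
    autoregressive component on the raw recent values of each series."""
    def __init__(self, n_series, hidden=50, kernel=6, ar_window=7):
        super().__init__()
        self.ar_window = ar_window
        self.conv = nn.Conv1d(n_series, hidden, kernel)
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_series)
        self.ar = nn.Linear(ar_window, 1)

    def forward(self, x):                  # x: (batch, time, n_series)
        c = torch.relu(self.conv(x.transpose(1, 2)))        # short-term local patterns
        _, h = self.gru(c.transpose(1, 2))                   # long-term dynamics
        neural = self.fc(h[-1])                              # (batch, n_series)
        ar = self.ar(x[:, -self.ar_window:, :].transpose(1, 2)).squeeze(-1)
        return neural + ar                                   # one-step-ahead forecast

# Toy usage: 8 series, input window of 24 timesteps.
print(TinyLSTNet(n_series=8)(torch.randn(4, 24, 8)).shape)   # -> torch.Size([4, 8])
```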