Paper Group AWR 182
PixelSNAIL: An Improved Autoregressive Generative Model
Title | PixelSNAIL: An Improved Autoregressive Generative Model |
Authors | Xi Chen, Nikhil Mishra, Mostafa Rohaninejad, Pieter Abbeel |
Abstract | Autoregressive generative models consistently achieve the best results in density estimation tasks involving high dimensional data, such as images or audio. They pose density estimation as a sequence modeling task, where a recurrent neural network (RNN) models the conditional distribution over the next element conditioned on all previous elements. In this paradigm, the bottleneck is the extent to which the RNN can model long-range dependencies, and the most successful approaches rely on causal convolutions, which offer better access to earlier parts of the sequence than conventional RNNs. Taking inspiration from recent work in meta reinforcement learning, where dealing with long-range dependencies is also essential, we introduce a new generative model architecture that combines causal convolutions with self attention. In this note, we describe the resulting model and present state-of-the-art log-likelihood results on CIFAR-10 (2.85 bits per dim) and $32 \times 32$ ImageNet (3.80 bits per dim). Our implementation is available at https://github.com/neocxi/pixelsnail-public |
Tasks | Density Estimation, Image Generation |
Published | 2017-12-28 |
URL | http://arxiv.org/abs/1712.09763v1 |
http://arxiv.org/pdf/1712.09763v1.pdf | |
PWC | https://paperswithcode.com/paper/pixelsnail-an-improved-autoregressive |
Repo | https://github.com/neocxi/pixelsnail-public |
Framework | tf |
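As a rough illustration of the abstract's core idea, pairing causal convolutions with masked self attention so that position t only conditions on earlier positions, here is a minimal PyTorch sketch. It is a toy 1-D block with illustrative names and sizes, not the paper's 2-D PixelCNN-style architecture; see the linked repo for the official TensorFlow implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConvAttentionBlock(nn.Module):
    """Toy block pairing a causal 1-D convolution with masked self-attention."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.kernel_size = kernel_size
        self.conv = nn.Conv1d(channels, channels, kernel_size)
        self.qkv = nn.Linear(channels, 3 * channels)

    def forward(self, x):                    # x: (batch, channels, length)
        # Causal convolution: pad only on the left so position t never sees t+1, ...
        h = self.conv(F.pad(x, (self.kernel_size - 1, 0)))
        h = h.transpose(1, 2)                # (batch, length, channels)
        q, k, v = self.qkv(h).chunk(3, dim=-1)
        scores = q @ k.transpose(1, 2) / k.shape[-1] ** 0.5
        # Causal mask: disallow attention to future positions.
        mask = torch.triu(torch.ones_like(scores), diagonal=1).bool()
        scores = scores.masked_fill(mask, float("-inf"))
        out = torch.softmax(scores, dim=-1) @ v
        return (h + out).transpose(1, 2)     # residual connection, back to (B, C, L)

# Toy usage: batch of 2 sequences, 16 channels, length 10.
block = CausalConvAttentionBlock(16)
print(block(torch.randn(2, 16, 10)).shape)   # -> torch.Size([2, 16, 10])
```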
A Data-Oriented Model of Literary Language
Title | A Data-Oriented Model of Literary Language |
Authors | Andreas van Cranenburgh, Rens Bod |
Abstract | We consider the task of predicting how literary a text is, with a gold standard from human ratings. Aside from a standard bigram baseline, we apply rich syntactic tree fragments, mined from the training set, and a series of hand-picked features. Our model is the first to distinguish degrees of highly and less literary novels using a variety of lexical and syntactic features, and explains 76.0% of the variation in literary ratings. |
Tasks | |
Published | 2017-01-12 |
URL | http://arxiv.org/abs/1701.03329v2 |
http://arxiv.org/pdf/1701.03329v2.pdf | |
PWC | https://paperswithcode.com/paper/a-data-oriented-model-of-literary-language |
Repo | https://github.com/andreasvc/literariness |
Framework | none |
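To make the bigram baseline from the abstract concrete, the sketch below fits a bag-of-bigrams ridge regressor to literariness ratings and reports explained variance. The `texts` and `ratings` variables are hypothetical placeholders; the paper's 76.0% figure comes from its richer model with tree fragments and hand-picked features, not from this baseline.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical inputs: `texts` is a list of novels (strings), `ratings` the
# mean human literariness rating for each novel.
texts = ["an example novel about the sea ...", "another example novel about a city ..."] * 20
ratings = [4.2, 5.6] * 20

X_tr, X_te, y_tr, y_te = train_test_split(texts, ratings, random_state=0)

# Bigram baseline: bag-of-bigrams features feeding a ridge regressor.
vec = CountVectorizer(ngram_range=(2, 2), min_df=1)
model = Ridge(alpha=1.0)
model.fit(vec.fit_transform(X_tr), y_tr)

print("explained variance (R^2):", r2_score(y_te, model.predict(vec.transform(X_te))))
```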
Cascade Adversarial Machine Learning Regularized with a Unified Embedding
Title | Cascade Adversarial Machine Learning Regularized with a Unified Embedding |
Authors | Taesik Na, Jong Hwan Ko, Saibal Mukhopadhyay |
Abstract | Injecting adversarial examples during training, known as adversarial training, can improve robustness against one-step attacks, but not for unknown iterative attacks. To address this challenge, we first show iteratively generated adversarial images easily transfer between networks trained with the same strategy. Inspired by this observation, we propose cascade adversarial training, which transfers the knowledge of the end results of adversarial training. We train a network from scratch by injecting iteratively generated adversarial images crafted from already defended networks in addition to one-step adversarial images from the network being trained. We also propose to utilize embedding space for both classification and low-level (pixel-level) similarity learning to ignore unknown pixel level perturbation. During training, we inject adversarial images without replacing their corresponding clean images and penalize the distance between the two embeddings (clean and adversarial). Experimental results show that cascade adversarial training together with our proposed low-level similarity learning efficiently enhances the robustness against iterative attacks, but at the expense of decreased robustness against one-step attacks. We show that combining those two techniques can also improve robustness under the worst case black box attack scenario. |
Tasks | |
Published | 2017-08-08 |
URL | http://arxiv.org/abs/1708.02582v3 |
http://arxiv.org/pdf/1708.02582v3.pdf | |
PWC | https://paperswithcode.com/paper/cascade-adversarial-machine-learning |
Repo | https://github.com/taesikna/cascade_adv_training |
Framework | tf |
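A minimal sketch of the training objective described in the abstract: classification loss on both clean and adversarial images plus a penalty on the distance between their embeddings (the low-level similarity learning). The `model` interface and the way `x_adv` is produced (one-step from the current network, or iteratively from an already-defended network as in cascade adversarial training) are assumptions for illustration, not the authors' exact code.

```python
import torch
import torch.nn.functional as F

def cascade_adv_loss(model, x_clean, x_adv, labels, lam=0.1):
    """Hedged sketch: classify clean and adversarial images and penalize the
    distance between their embeddings.

    Assumes `model(x)` returns (logits, embedding); `x_adv` is assumed to be
    crafted elsewhere (one-step from this network, or iteratively from an
    already-defended network)."""
    logits_c, emb_c = model(x_clean)
    logits_a, emb_a = model(x_adv)
    cls = F.cross_entropy(logits_c, labels) + F.cross_entropy(logits_a, labels)
    sim = F.mse_loss(emb_a, emb_c.detach())   # pull adversarial embedding toward clean
    return cls + lam * sim
```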
AMAT: Medial Axis Transform for Natural Images
Title | AMAT: Medial Axis Transform for Natural Images |
Authors | Stavros Tsogkas, Sven Dickinson |
Abstract | We introduce Appearance-MAT (AMAT), a generalization of the medial axis transform for natural images, that is framed as a weighted geometric set cover problem. We make the following contributions: i) we extend previous medial point detection methods for color images, by associating each medial point with a local scale; ii) inspired by the invertibility property of the binary MAT, we also associate each medial point with a local encoding that allows us to invert the AMAT, reconstructing the input image; iii) we describe a clustering scheme that takes advantage of the additional scale and appearance information to group individual points into medial branches, providing a shape decomposition of the underlying image regions. In our experiments, we show state-of-the-art performance in medial point detection on Berkeley Medial AXes (BMAX500), a new dataset of medial axes based on the BSDS500 database, and good generalization on the SK506 and WH-SYMMAX datasets. We also measure the quality of reconstructed images from BMAX500, obtained by inverting their computed AMAT. Our approach delivers significantly better reconstruction quality with respect to three baselines, using just 10% of the image pixels. Our code and annotations are available at https://github.com/tsogkas/amat . |
Tasks | |
Published | 2017-03-24 |
URL | http://arxiv.org/abs/1703.08628v2 |
http://arxiv.org/pdf/1703.08628v2.pdf | |
PWC | https://paperswithcode.com/paper/amat-medial-axis-transform-for-natural-images |
Repo | https://github.com/tsogkas/amat |
Framework | none |
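The abstract frames the AMAT as a weighted geometric set cover over image points. As a generic stand-in for that formulation (not the paper's actual solver), here is the textbook greedy heuristic that repeatedly picks the candidate set with the lowest cost per newly covered element; set names and costs are illustrative.

```python
def greedy_weighted_set_cover(universe, sets, costs):
    """Generic greedy weighted set cover: repeatedly pick the set with the
    lowest cost per newly covered element. A textbook stand-in for the
    geometric set cover formulation in the abstract, not the paper's solver.

    universe: elements to cover (e.g. pixels); sets: dict name -> set of elements;
    costs: dict name -> positive cost."""
    uncovered, chosen = set(universe), []
    while uncovered:
        name = min(
            (s for s in sets if sets[s] & uncovered),
            key=lambda s: costs[s] / len(sets[s] & uncovered),
        )
        chosen.append(name)
        uncovered -= sets[name]
    return chosen

# Toy usage: cover pixels {1..5} with three candidate medial disks.
print(greedy_weighted_set_cover(
    {1, 2, 3, 4, 5},
    {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5}},
    {"a": 1.0, "b": 1.0, "c": 1.0},
))
```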
LOGAN: Membership Inference Attacks Against Generative Models
Title | LOGAN: Membership Inference Attacks Against Generative Models |
Authors | Jamie Hayes, Luca Melis, George Danezis, Emiliano De Cristofaro |
Abstract | Generative models estimate the underlying distribution of a dataset to generate realistic samples according to that distribution. In this paper, we present the first membership inference attacks against generative models: given a data point, the adversary determines whether or not it was used to train the model. Our attacks leverage Generative Adversarial Networks (GANs), which combine a discriminative and a generative model, to detect overfitting and recognize inputs that were part of training datasets, using the discriminator’s capacity to learn statistical differences in distributions. We present attacks based on both white-box and black-box access to the target model, against several state-of-the-art generative models, over datasets of complex representations of faces (LFW), objects (CIFAR-10), and medical images (Diabetic Retinopathy). We also discuss the sensitivity of the attacks to different training parameters, and their robustness against mitigation strategies, finding that defenses are either ineffective or lead to significantly worse performances of the generative models in terms of training stability and/or sample quality. |
Tasks | |
Published | 2017-05-22 |
URL | http://arxiv.org/abs/1705.07663v4 |
http://arxiv.org/pdf/1705.07663v4.pdf | |
PWC | https://paperswithcode.com/paper/logan-membership-inference-attacks-against |
Repo | https://github.com/jhayes14/gen_mem_inf |
Framework | pytorch |
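The white-box attack described in the abstract reduces to a simple ranking rule: score each candidate record with the target GAN's discriminator and flag the highest-scoring ones as training members, since an overfit discriminator assigns higher confidence to data it was trained on. A hedged numpy sketch with made-up scores follows; how the scores are obtained from the target model is assumed.

```python
import numpy as np

def whitebox_membership_attack(discriminator_scores, n_members):
    """Sketch of the white-box attack idea: rank candidate records by the
    target discriminator's confidence and declare the top-n as training-set
    members. `discriminator_scores` is assumed to come from the target GAN."""
    order = np.argsort(discriminator_scores)[::-1]       # highest confidence first
    predicted_members = np.zeros(len(discriminator_scores), dtype=bool)
    predicted_members[order[:n_members]] = True
    return predicted_members

# Toy usage with made-up scores for 6 candidate records.
print(whitebox_membership_attack(np.array([0.9, 0.2, 0.8, 0.4, 0.95, 0.1]), n_members=3))
```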
SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability
Title | SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability |
Authors | Maithra Raghu, Justin Gilmer, Jason Yosinski, Jascha Sohl-Dickstein |
Abstract | We propose a new technique, Singular Vector Canonical Correlation Analysis (SVCCA), a tool for quickly comparing two representations in a way that is both invariant to affine transform (allowing comparison between different layers and networks) and fast to compute (allowing more comparisons to be calculated than with previous methods). We deploy this tool to measure the intrinsic dimensionality of layers, showing in some cases needless over-parameterization; to probe learning dynamics throughout training, finding that networks converge to final representations from the bottom up; to show where class-specific information in networks is formed; and to suggest new training regimes that simultaneously save computation and overfit less. Code: https://github.com/google/svcca/ |
Tasks | |
Published | 2017-06-19 |
URL | http://arxiv.org/abs/1706.05806v2 |
http://arxiv.org/pdf/1706.05806v2.pdf | |
PWC | https://paperswithcode.com/paper/svcca-singular-vector-canonical-correlation |
Repo | https://github.com/google/svcca |
Framework | none |
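SVCCA can be sketched directly from the abstract: reduce each layer's activation matrix with an SVD that keeps most of the variance, then run CCA on the reduced representations. Below is a minimal numpy version using the QR-based formulation of CCA; the variance threshold and shapes are illustrative, and the linked repo holds the reference implementation.

```python
import numpy as np

def svcca(X, Y, keep=0.99):
    """Minimal SVCCA sketch: SVD-reduce each activation matrix, then CCA.

    X, Y: (num_datapoints, num_neurons) activation matrices.
    Returns the canonical correlation coefficients."""
    def svd_reduce(A, keep):
        A = A - A.mean(axis=0)
        U, s, _ = np.linalg.svd(A, full_matrices=False)
        k = np.searchsorted(np.cumsum(s**2) / np.sum(s**2), keep) + 1
        return U[:, :k] * s[:k]          # directions explaining `keep` of the variance

    Xr, Yr = svd_reduce(X, keep), svd_reduce(Y, keep)
    Qx, _ = np.linalg.qr(Xr)
    Qy, _ = np.linalg.qr(Yr)
    rho = np.linalg.svd(Qx.T @ Qy, compute_uv=False)   # canonical correlations
    return np.clip(rho, 0.0, 1.0)

# Toy usage: two 500-example activation matrices with 20 and 30 neurons.
rng = np.random.default_rng(0)
print(svcca(rng.normal(size=(500, 20)), rng.normal(size=(500, 30))))
```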
What’s in a Question: Using Visual Questions as a Form of Supervision
Title | What’s in a Question: Using Visual Questions as a Form of Supervision |
Authors | Siddha Ganju, Olga Russakovsky, Abhinav Gupta |
Abstract | Collecting fully annotated image datasets is challenging and expensive. Many types of weak supervision have been explored: weak manual annotations, web search results, temporal continuity, ambient sound and others. We focus on one particular unexplored mode: visual questions that are asked about images. The key observation that inspires our work is that the question itself provides useful information about the image (even without the answer being available). For instance, the question “what is the breed of the dog?” informs the AI that the animal in the scene is a dog and that there is only one dog present. We make three contributions: (1) providing an extensive qualitative and quantitative analysis of the information contained in human visual questions, (2) proposing two simple but surprisingly effective modifications to the standard visual question answering models that allow them to make use of weak supervision in the form of unanswered questions associated with images and (3) demonstrating that a simple data augmentation strategy inspired by our insights results in a 7.1% improvement on the standard VQA benchmark. |
Tasks | Data Augmentation, Visual Question Answering |
Published | 2017-04-12 |
URL | http://arxiv.org/abs/1704.03895v1 |
http://arxiv.org/pdf/1704.03895v1.pdf | |
PWC | https://paperswithcode.com/paper/whats-in-a-question-using-visual-questions-as |
Repo | https://github.com/sidgan/whats_in_a_question |
Framework | none |
From Distance Correlation to Multiscale Graph Correlation
Title | From Distance Correlation to Multiscale Graph Correlation |
Authors | Cencheng Shen, Carey E. Priebe, Joshua T. Vogelstein |
Abstract | Understanding and developing a correlation measure that can detect general dependencies is not only imperative to statistics and machine learning, but also crucial to general scientific discovery in the big data age. In this paper, we establish a new framework that generalizes distance correlation — a correlation measure that was recently proposed and shown to be universally consistent for dependence testing against all joint distributions of finite moments — to the Multiscale Graph Correlation (MGC). By utilizing the characteristic functions and incorporating the nearest neighbor machinery, we formalize the population version of local distance correlations, define the optimal scale in a given dependency, and name the optimal local correlation as MGC. The new theoretical framework motivates a theoretically sound Sample MGC and allows a number of desirable properties to be proved, including the universal consistency, convergence and almost unbiasedness of the sample version. The advantages of MGC are illustrated via a comprehensive set of simulations with linear, nonlinear, univariate, multivariate, and noisy dependencies, where it loses almost no power in monotone dependencies while achieving better performance in general dependencies, compared to distance correlation and other popular methods. |
Tasks | |
Published | 2017-10-26 |
URL | http://arxiv.org/abs/1710.09768v3 |
http://arxiv.org/pdf/1710.09768v3.pdf | |
PWC | https://paperswithcode.com/paper/from-distance-correlation-to-multiscale-graph |
Repo | https://github.com/neurodata/mgc-matlab |
Framework | none |
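MGC generalizes distance correlation to local, multiscale variants built on nearest-neighbor graphs. As background, the snippet below computes plain (biased, V-statistic) sample distance correlation from double-centered pairwise distance matrices; it is not the MGC statistic itself, for which see the linked MATLAB code.

```python
import numpy as np

def distance_correlation(x, y):
    """Plain sample distance correlation (the measure MGC generalizes),
    computed from double-centered pairwise distance matrices."""
    x = x.reshape(len(x), -1).astype(float)
    y = y.reshape(len(y), -1).astype(float)
    a = np.linalg.norm(x[:, None] - x[None, :], axis=-1)    # pairwise distances
    b = np.linalg.norm(y[:, None] - y[None, :], axis=-1)
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()       # double centering
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    dvar_x, dvar_y = (A * A).mean(), (B * B).mean()
    denom = np.sqrt(dvar_x * dvar_y)
    return float(np.sqrt(max(dcov2, 0.0) / denom)) if denom > 0 else 0.0

# Toy check: a noisy nonlinear (quadratic) dependency.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
print(distance_correlation(t, t**2 + 0.1 * rng.normal(size=200)))
```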
Geometric SMOTE: Effective oversampling for imbalanced learning through a geometric extension of SMOTE
Title | Geometric SMOTE: Effective oversampling for imbalanced learning through a geometric extension of SMOTE |
Authors | Georgios Douzas, Fernando Bacao |
Abstract | Classification of imbalanced datasets is a challenging task for standard algorithms. Although many methods exist to address this problem in different ways, generating artificial data for the minority class is a more general approach compared to algorithmic modifications. SMOTE algorithm and its variations generate synthetic samples along a line segment that joins minority class instances. In this paper we propose Geometric SMOTE (G-SMOTE) as a generalization of the SMOTE data generation mechanism. G-SMOTE generates synthetic samples in a geometric region of the input space, around each selected minority instance. While in the basic configuration this region is a hyper-sphere, G-SMOTE allows its deformation to a hyper-spheroid and finally to a line segment, emulating, in the last case, the SMOTE mechanism. The performance of G-SMOTE is compared against multiple standard oversampling algorithms. We present empirical results that show a significant improvement in the quality of the generated data when G-SMOTE is used as an oversampling algorithm. |
Tasks | |
Published | 2017-09-21 |
URL | http://arxiv.org/abs/1709.07377v1 |
http://arxiv.org/pdf/1709.07377v1.pdf | |
PWC | https://paperswithcode.com/paper/geometric-smote-effective-oversampling-for |
Repo | https://github.com/AlgoWit/publications |
Framework | none |
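A minimal sketch of the data generation idea in the abstract: draw a synthetic sample in a geometric region around a selected minority instance, and let a deformation parameter interpolate between the hyper-sphere case and the SMOTE-style line segment. This is an illustration of the abstract only, with a simplified parameterization rather than the paper's exact truncation and deformation scheme.

```python
import numpy as np

def g_smote_sample(center, neighbor, deformation=0.0, rng=None):
    """Hedged G-SMOTE-style sketch: deformation=0 samples inside a hyper-sphere
    of radius |neighbor - center| around `center`; deformation=1 collapses the
    region onto the SMOTE line segment between the two points."""
    rng = rng or np.random.default_rng()
    radius = np.linalg.norm(neighbor - center)
    direction = rng.normal(size=center.shape)
    direction /= np.linalg.norm(direction)                  # random direction on the sphere
    point = center + rng.uniform(0, radius) * direction     # random point in the hyper-sphere
    segment_point = center + rng.uniform(0, 1) * (neighbor - center)
    return (1 - deformation) * point + deformation * segment_point

# Toy usage in 2-D.
print(g_smote_sample(np.array([0.0, 0.0]), np.array([1.0, 0.0]), deformation=0.5))
```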
End-to-End Multimodal Emotion Recognition using Deep Neural Networks
Title | End-to-End Multimodal Emotion Recognition using Deep Neural Networks |
Authors | Panagiotis Tzirakis, George Trigeorgis, Mihalis A. Nicolaou, Björn Schuller, Stefanos Zafeiriou |
Abstract | Automatic affect recognition is a challenging task due to the various modalities emotions can be expressed with. Applications can be found in many domains, including multimedia retrieval and human-computer interaction. In recent years, deep neural networks have been used with great success in determining emotional states. Inspired by this success, we propose an emotion recognition system using auditory and visual modalities. To capture the emotional content for various styles of speaking, robust features need to be extracted. To this purpose, we utilize a Convolutional Neural Network (CNN) to extract features from the speech, while for the visual modality a deep residual network (ResNet) of 50 layers is used. In addition to the importance of feature extraction, a machine learning algorithm also needs to be insensitive to outliers while being able to model the context. To tackle this problem, Long Short-Term Memory (LSTM) networks are utilized. The system is then trained in an end-to-end fashion where - by also taking advantage of the correlations of each of the streams - we manage to significantly outperform the traditional approaches based on auditory and visual handcrafted features for the prediction of spontaneous and natural emotions on the RECOLA database of the AVEC 2016 research challenge on emotion recognition. |
Tasks | Emotion Recognition, Multimodal Emotion Recognition |
Published | 2017-04-27 |
URL | http://arxiv.org/abs/1704.08619v1 |
http://arxiv.org/pdf/1704.08619v1.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-multimodal-emotion-recognition |
Repo | https://github.com/asfathermou/human-computer-interaction |
Framework | tf |
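The pipeline in the abstract (speech CNN, visual CNN, fusion, LSTM over time, continuous emotion outputs) can be sketched in a few lines of PyTorch. Everything below is illustrative: the layer sizes are tiny stand-ins, the visual branch replaces the paper's 50-layer ResNet, and the two-dimensional output is assumed to be arousal/valence.

```python
import torch
import torch.nn as nn

class MultimodalEmotionNet(nn.Module):
    """Hedged sketch of an end-to-end audio-visual emotion regressor."""
    def __init__(self, hidden=64):
        super().__init__()
        self.audio = nn.Sequential(nn.Conv1d(1, 16, 8, stride=4), nn.ReLU(),
                                   nn.AdaptiveAvgPool1d(1), nn.Flatten())
        self.visual = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.lstm = nn.LSTM(32, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)          # assumed arousal, valence outputs

    def forward(self, speech, frames):
        # speech: (batch, time, samples); frames: (batch, time, 3, H, W)
        b, t = speech.shape[:2]
        a = self.audio(speech.reshape(b * t, 1, -1)).reshape(b, t, -1)
        v = self.visual(frames.reshape(b * t, *frames.shape[2:])).reshape(b, t, -1)
        out, _ = self.lstm(torch.cat([a, v], dim=-1))   # fuse, then model context over time
        return self.head(out)                          # per-timestep predictions

# Toy usage: batch of 2 sequences, 5 timesteps each.
net = MultimodalEmotionNet()
print(net(torch.randn(2, 5, 640), torch.randn(2, 5, 3, 32, 32)).shape)   # -> (2, 5, 2)
```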
CVAE-GAN: Fine-Grained Image Generation through Asymmetric Training
Title | CVAE-GAN: Fine-Grained Image Generation through Asymmetric Training |
Authors | Jianmin Bao, Dong Chen, Fang Wen, Houqiang Li, Gang Hua |
Abstract | We present variational generative adversarial networks, a general learning framework that combines a variational auto-encoder with a generative adversarial network, for synthesizing images in fine-grained categories, such as faces of a specific person or objects in a category. Our approach models an image as a composition of label and latent attributes in a probabilistic model. By varying the fine-grained category label fed into the resulting generative model, we can generate images in a specific category with randomly drawn values on a latent attribute vector. Our approach has two novel aspects. First, we adopt a cross entropy loss for the discriminative and classifier network, but a mean discrepancy objective for the generative network. This kind of asymmetric loss function makes the GAN training more stable. Second, we adopt an encoder network to learn the relationship between the latent space and the real image space, and use pairwise feature matching to keep the structure of generated images. We experiment with natural images of faces, flowers, and birds, and demonstrate that the proposed models are capable of generating realistic and diverse samples with fine-grained category labels. We further show that our models can be applied to other tasks, such as image inpainting, super-resolution, and data augmentation for training better face recognition models. |
Tasks | Data Augmentation, Face Recognition, Image Generation, Image Inpainting, Super-Resolution |
Published | 2017-03-29 |
URL | http://arxiv.org/abs/1703.10155v2 |
http://arxiv.org/pdf/1703.10155v2.pdf | |
PWC | https://paperswithcode.com/paper/cvae-gan-fine-grained-image-generation |
Repo | https://github.com/One-sixth/CVAE-GAN_tensorlayer |
Framework | tf |
A Discriminative Event Based Model for Alzheimer’s Disease Progression Modeling
Title | A Discriminative Event Based Model for Alzheimer’s Disease Progression Modeling |
Authors | Vikram Venkatraghavan, Esther Bron, Wiro Niessen, Stefan Klein |
Abstract | The event-based model (EBM) for data-driven disease progression modeling estimates the sequence in which biomarkers for a disease become abnormal. This helps in understanding the dynamics of disease progression and facilitates early diagnosis by staging patients on a disease progression timeline. Existing EBM methods are all generative in nature. In this work we propose a novel discriminative approach to EBM, which is shown to be more accurate as well as computationally more efficient than existing state-of-the-art EBM methods. The method first estimates for each subject an approximate ordering of events, by ranking the posterior probabilities of individual biomarkers being abnormal. Subsequently, the central ordering over all subjects is estimated by fitting a generalized Mallows model to these approximate subject-specific orderings based on a novel probabilistic Kendall’s Tau distance. To evaluate the accuracy, we performed extensive experiments on synthetic data simulating the progression of Alzheimer’s disease. Subsequently, the method was applied to the Alzheimer’s Disease Neuroimaging Initiative (ADNI) data to estimate the central event ordering in the dataset. The experiments benchmark the accuracy of the new model under various conditions and compare it with existing state-of-the-art EBM methods. The results indicate that discriminative EBM could be a simple and elegant approach to disease progression modeling. |
Tasks | |
Published | 2017-02-21 |
URL | http://arxiv.org/abs/1702.06408v1 |
http://arxiv.org/pdf/1702.06408v1.pdf | |
PWC | https://paperswithcode.com/paper/a-discriminative-event-based-model-for |
Repo | https://github.com/88vikram/pyebm |
Framework | none |
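The first step of the method in the abstract, deriving an approximate per-subject event ordering by ranking posterior abnormality probabilities, is easy to sketch. For the second step (the central ordering), the toy below substitutes a simple Borda-count consensus for the paper's generalized Mallows model with a probabilistic Kendall's tau distance, so it illustrates the idea but not the actual estimator.

```python
import numpy as np

def subject_orderings(p_abnormal):
    """Per-subject approximate event ordering: rank biomarkers by posterior
    probability of being abnormal, most abnormal first."""
    return np.argsort(-p_abnormal, axis=1)

def central_ordering(orderings, n_events):
    """Stand-in aggregation via Borda count (average rank). The paper instead
    fits a generalized Mallows model; this simpler consensus is illustrative only."""
    ranks = np.empty_like(orderings)
    np.put_along_axis(ranks, orderings, np.arange(n_events)[None, :], axis=1)
    return np.argsort(ranks.mean(axis=0))

# Toy usage: 3 subjects, 4 biomarkers (posterior probabilities of abnormality).
p = np.array([[0.9, 0.7, 0.2, 0.1],
              [0.8, 0.9, 0.3, 0.2],
              [0.7, 0.6, 0.4, 0.1]])
orders = subject_orderings(p)
print(orders)
print(central_ordering(orders, n_events=4))
```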
Real-time Convolutional Neural Networks for Emotion and Gender Classification
Title | Real-time Convolutional Neural Networks for Emotion and Gender Classification |
Authors | Octavio Arriaga, Matias Valdenegro-Toro, Paul Plöger |
Abstract | In this paper we propose and implement a general convolutional neural network (CNN) building framework for designing real-time CNNs. We validate our models by creating a real-time vision system which accomplishes the tasks of face detection, gender classification and emotion classification simultaneously in one blended step using our proposed CNN architecture. After presenting the details of the training procedure setup we proceed to evaluate on standard benchmark sets. We report accuracies of 96% on the IMDB gender dataset and 66% on the FER-2013 emotion dataset. Along with this, we also introduce the very recent real-time enabled guided back-propagation visualization technique. Guided back-propagation uncovers the dynamics of the weight changes and evaluates the learned features. We argue that the careful implementation of modern CNN architectures, the use of current regularization methods and the visualization of previously hidden features are necessary in order to reduce the gap between slow performances and real-time architectures. Our system has been validated by its deployment on a Care-O-bot 3 robot used during RoboCup@Home competitions. All our code, demos and pre-trained architectures have been released under an open-source license in our public repository. |
Tasks | Emotion Classification, Face Detection, Gender Prediction |
Published | 2017-10-20 |
URL | http://arxiv.org/abs/1710.07557v1 |
http://arxiv.org/pdf/1710.07557v1.pdf | |
PWC | https://paperswithcode.com/paper/real-time-convolutional-neural-networks-for |
Repo | https://github.com/ajinkyabedekar/Face-to-Emoji |
Framework | none |
Dense Transformer Networks
Title | Dense Transformer Networks |
Authors | Jun Li, Yongjun Chen, Lei Cai, Ian Davidson, Shuiwang Ji |
Abstract | The key idea of current deep learning methods for dense prediction is to apply a model on a regular patch centered on each pixel to make pixel-wise predictions. These methods are limited in the sense that the patches are determined by network architecture instead of learned from data. In this work, we propose the dense transformer networks, which can learn the shapes and sizes of patches from data. The dense transformer networks employ an encoder-decoder architecture, and a pair of dense transformer modules are inserted into each of the encoder and decoder paths. The novelty of this work is that we provide technical solutions for learning the shapes and sizes of patches from data and efficiently restoring the spatial correspondence required for dense prediction. The proposed dense transformer modules are differentiable, thus the entire network can be trained. We apply the proposed networks on natural and biological image segmentation tasks and show superior performance is achieved in comparison to baseline methods. |
Tasks | Semantic Segmentation |
Published | 2017-05-24 |
URL | http://arxiv.org/abs/1705.08881v2 |
http://arxiv.org/pdf/1705.08881v2.pdf | |
PWC | https://paperswithcode.com/paper/dense-transformer-networks |
Repo | https://github.com/zhengyang-wang/Unet_3D |
Framework | tf |
Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks
Title | Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks |
Authors | Guokun Lai, Wei-Cheng Chang, Yiming Yang, Hanxiao Liu |
Abstract | Multivariate time series forecasting is an important machine learning problem across many domains, including predictions of solar plant energy output, electricity consumption, and traffic jam situations. Temporal data arising in these real-world applications often involve a mixture of long-term and short-term patterns, for which traditional approaches such as autoregressive models and Gaussian processes may fail. In this paper, we propose a novel deep learning framework, namely the Long- and Short-term Time-series network (LSTNet), to address this open challenge. LSTNet uses a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN) to extract short-term local dependency patterns among variables and to discover long-term patterns for time series trends. Furthermore, we leverage a traditional autoregressive model to tackle the scale insensitivity problem of the neural network model. In our evaluation on real-world data with complex mixtures of repetitive patterns, LSTNet achieves significant performance improvements over several state-of-the-art baseline methods. All the data and experiment code are available online. |
Tasks | Multivariate Time Series Forecasting, Time Series, Time Series Forecasting |
Published | 2017-03-21 |
URL | http://arxiv.org/abs/1703.07015v3 |
http://arxiv.org/pdf/1703.07015v3.pdf | |
PWC | https://paperswithcode.com/paper/modeling-long-and-short-term-temporal |
Repo | https://github.com/laiguokun/LSTNet |
Framework | pytorch |
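The abstract's three ingredients (a CNN for short-term local patterns, an RNN for longer-term dynamics, and a linear autoregressive path to handle scale sensitivity) can be combined in a few lines of PyTorch. The sketch below omits the recurrent-skip component of the full LSTNet and uses illustrative sizes; the linked repo contains the authors' implementation.

```python
import torch
import torch.nn as nn

class TinyLSTNet(nn.Module):
    """Hedged LSTNet-style sketch: CNN -> GRU -> linear head, plus a linear
    autoregressive component on the raw recent values of each series."""
    def __init__(self, n_series, hidden=50, kernel=6, ar_window=7):
        super().__init__()
        self.ar_window = ar_window
        self.conv = nn.Conv1d(n_series, hidden, kernel)
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_series)
        self.ar = nn.Linear(ar_window, 1)

    def forward(self, x):                  # x: (batch, time, n_series)
        c = torch.relu(self.conv(x.transpose(1, 2)))        # short-term local patterns
        _, h = self.gru(c.transpose(1, 2))                   # long-term dynamics
        neural = self.fc(h[-1])                              # (batch, n_series)
        ar = self.ar(x[:, -self.ar_window:, :].transpose(1, 2)).squeeze(-1)
        return neural + ar                                   # one-step-ahead forecast

# Toy usage: 8 series, input window of 24 timesteps.
print(TinyLSTNet(n_series=8)(torch.randn(4, 24, 8)).shape)   # -> torch.Size([4, 8])
```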