Paper Group AWR 348
Towards High Resolution Video Generation with Progressive Growing of Sliced Wasserstein GANs. Unsupervised Representation Learning by Predicting Image Rotations. Simultaneous Edge Alignment and Learning. RepMet: Representative-based metric learning for classification and one-shot object detection. Deep Learning using Rectified Linear Units (ReLU). …
Towards High Resolution Video Generation with Progressive Growing of Sliced Wasserstein GANs
Title | Towards High Resolution Video Generation with Progressive Growing of Sliced Wasserstein GANs |
Authors | Dinesh Acharya, Zhiwu Huang, Danda Pani Paudel, Luc Van Gool |
Abstract | The extension of image generation to video generation turns out to be a very difficult task, since the temporal dimension of videos introduces an extra challenge during the generation process. Besides, due to the limitation of memory and training stability, the generation becomes increasingly challenging with the increase of the resolution/duration of videos. In this work, we exploit the idea of progressive growing of Generative Adversarial Networks (GANs) for higher resolution video generation. In particular, we begin by producing low-resolution, short-duration video samples, and then progressively increase the resolution and duration separately (or jointly) by adding new spatiotemporal convolutional layers to the current networks. Starting from learning the very coarse spatial appearance and temporal movement of the video distribution, the proposed progressive method learns spatiotemporal information incrementally to generate higher resolution videos. Furthermore, we introduce a sliced version of the Wasserstein GAN (SWGAN) loss to improve distribution learning on video data, which is high-dimensional and has a mixed spatiotemporal distribution. The SWGAN loss replaces the distance between joint distributions by that between one-dimensional marginal distributions, making the loss easier to compute. We evaluate the proposed model on our collected face video dataset of 10,900 videos to generate photorealistic face videos of 256x256x32 resolution. In addition, our model also reaches a record inception score of 14.57 on the unsupervised action recognition dataset UCF-101. |
Tasks | Image Generation, Temporal Action Localization, Video Generation |
Published | 2018-10-04 |
URL | http://arxiv.org/abs/1810.02419v2 |
http://arxiv.org/pdf/1810.02419v2.pdf | |
PWC | https://paperswithcode.com/paper/towards-high-resolution-video-generation-with |
Repo | https://github.com/musikisomorphie/swd |
Framework | tf |
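The sliced Wasserstein idea in the abstract above replaces one high-dimensional optimal-transport comparison by many one-dimensional ones, each of which has a closed form via sorting. Below is a minimal NumPy sketch of the sliced Wasserstein-2 distance between two sample sets; the random projections and sample sizes are illustrative assumptions, and the paper embeds this idea into a GAN loss rather than computing it directly on raw samples like this.

```python
import numpy as np

def sliced_wasserstein(x, y, n_projections=128, seed=0):
    """Approximate sliced Wasserstein-2 distance between two sample sets.

    x, y: arrays of shape (n_samples, dim) with equal n_samples. Each random
    projection reduces the samples to 1-D, where the Wasserstein distance is
    obtained by comparing the sorted projected values order statistic by
    order statistic.
    """
    rng = np.random.default_rng(seed)
    dim = x.shape[1]
    total = 0.0
    for _ in range(n_projections):
        theta = rng.normal(size=dim)
        theta /= np.linalg.norm(theta)          # random unit direction
        px, py = np.sort(x @ theta), np.sort(y @ theta)
        total += np.mean((px - py) ** 2)        # 1-D W2^2 via sorted samples
    return total / n_projections

# toy usage: two Gaussian clouds with shifted means
a = np.random.randn(512, 64)
b = np.random.randn(512, 64) + 0.5
print(sliced_wasserstein(a, b))
```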
Unsupervised Representation Learning by Predicting Image Rotations
Title | Unsupervised Representation Learning by Predicting Image Rotations |
Authors | Spyros Gidaris, Praveer Singh, Nikos Komodakis |
Abstract | Over the last years, deep convolutional neural networks (ConvNets) have transformed the field of computer vision thanks to their unparalleled capacity to learn high level semantic image features. However, in order to successfully learn those features, they usually require massive amounts of manually labeled data, which is both expensive and impractical to scale. Therefore, unsupervised semantic feature learning, i.e., learning without requiring manual annotation effort, is of crucial importance in order to successfully harvest the vast amount of visual data that are available today. In our work we propose to learn image features by training ConvNets to recognize the 2d rotation applied to their input image. We demonstrate both qualitatively and quantitatively that this apparently simple task actually provides a very powerful supervisory signal for semantic feature learning. We exhaustively evaluate our method on various unsupervised feature learning benchmarks and exhibit state-of-the-art performance in all of them. Specifically, our results on those benchmarks demonstrate dramatic improvements w.r.t. prior state-of-the-art approaches in unsupervised representation learning and thus significantly close the gap with supervised feature learning. For instance, on the PASCAL VOC 2007 detection task our unsupervised pre-trained AlexNet model achieves the state-of-the-art (among unsupervised methods) mAP of 54.4%, which is only 2.4 points lower than the supervised case. We get similarly striking results when we transfer our unsupervised-learned features to various other tasks, such as ImageNet classification, PASCAL classification, PASCAL segmentation, and CIFAR-10 classification. The code and models of our paper will be published on: https://github.com/gidariss/FeatureLearningRotNet . |
Tasks | Representation Learning, Unsupervised Representation Learning |
Published | 2018-03-21 |
URL | http://arxiv.org/abs/1803.07728v1 |
http://arxiv.org/pdf/1803.07728v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-representation-learning-by-1 |
Repo | https://github.com/k-han/AutoNovel |
Framework | pytorch |
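The rotation pretext task needs only the images themselves: every unlabeled image yields four training examples, one per rotation, with the rotation index as the pseudo-label. A minimal PyTorch sketch of building such a self-supervised batch follows; the 4-way classifier `model` in the usage comment is an assumed backbone (the paper uses AlexNet/NIN-style ConvNets), not part of this snippet.

```python
import torch
import torch.nn.functional as F

def rotate_batch(images):
    """Create the 4-way rotation pretext task from an unlabeled batch.

    images: tensor of shape (B, C, H, W). Returns rotated copies of shape
    (4B, C, H, W) and pseudo-labels in {0, 1, 2, 3} for {0, 90, 180, 270}
    degree rotations.
    """
    rotated = torch.cat([torch.rot90(images, k, dims=(2, 3)) for k in range(4)])
    labels = torch.arange(4).repeat_interleave(images.size(0))
    return rotated, labels

# usage with any 4-class classifier `model` (hypothetical backbone):
# x, y = rotate_batch(unlabeled_images)
# loss = F.cross_entropy(model(x), y)
```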
Simultaneous Edge Alignment and Learning
Title | Simultaneous Edge Alignment and Learning |
Authors | Zhiding Yu, Weiyang Liu, Yang Zou, Chen Feng, Srikumar Ramalingam, B. V. K. Vijaya Kumar, Jan Kautz |
Abstract | Edge detection is among the most fundamental vision problems for its role in perceptual grouping and its wide applications. Recent advances in representation learning have led to considerable improvements in this area. Many state-of-the-art edge detection models are learned with fully convolutional networks (FCNs). However, FCN-based edge learning tends to be vulnerable to misaligned labels due to the delicate structure of edges. While this problem has been considered in evaluation benchmarks, a similar issue has not been explicitly addressed in general edge learning. In this paper, we show that label misalignment can cause considerably degraded edge learning quality, and address this issue by proposing a simultaneous edge alignment and learning framework. To this end, we formulate a probabilistic model where edge alignment is treated as latent variable optimization, and is learned end-to-end during network training. Experiments show several applications of this work, including improved edge detection with state-of-the-art performance, and automatic refinement of noisy annotations. |
Tasks | Edge Detection, Representation Learning |
Published | 2018-08-06 |
URL | http://arxiv.org/abs/1808.01992v3 |
http://arxiv.org/pdf/1808.01992v3.pdf | |
PWC | https://paperswithcode.com/paper/simultaneous-edge-alignment-and-learning |
Repo | https://github.com/Chrisding/seal |
Framework | none |
RepMet: Representative-based metric learning for classification and one-shot object detection
Title | RepMet: Representative-based metric learning for classification and one-shot object detection |
Authors | Leonid Karlinsky, Joseph Shtok, Sivan Harary, Eli Schwartz, Amit Aides, Rogerio Feris, Raja Giryes, Alex M. Bronstein |
Abstract | Distance metric learning (DML) has been successfully applied to object classification, both in the standard regime of rich training data and in the few-shot scenario, where each category is represented by only a few examples. In this work, we propose a new method for DML that simultaneously learns the backbone network parameters, the embedding space, and the multi-modal distribution of each of the training categories in that space, in a single end-to-end training process. Our approach outperforms state-of-the-art methods for DML-based object classification on a variety of standard fine-grained datasets. Furthermore, we demonstrate the effectiveness of our approach on the problem of few-shot object detection, by incorporating the proposed DML architecture as a classification head into a standard object detection model. We achieve the best results on the ImageNet-LOC dataset compared to strong baselines, when only a few training examples are available. We also offer the community a new episodic benchmark based on the ImageNet dataset for the few-shot object detection task. |
Tasks | Few-Shot Object Detection, Metric Learning, Object Classification, Object Detection, One-Shot Object Detection |
Published | 2018-06-12 |
URL | http://arxiv.org/abs/1806.04728v3 |
http://arxiv.org/pdf/1806.04728v3.pdf | |
PWC | https://paperswithcode.com/paper/repmet-representative-based-metric-learning |
Repo | https://github.com/jshtok/RepMet |
Framework | mxnet |
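RepMet's classification head scores an embedding against a small set of learned representatives (modes) per class, with the class score driven by the nearest mode. The hedged PyTorch sketch below captures that idea; the number of modes, the Gaussian-kernel scoring, and the dimensions are assumptions based on the abstract, not the exact published architecture, and the backbone that produces the embeddings is omitted.

```python
import torch
import torch.nn as nn

class RepresentativeHead(nn.Module):
    """Score embeddings against K learned representatives per class."""

    def __init__(self, n_classes, n_modes, emb_dim, sigma=0.5):
        super().__init__()
        # learnable representatives: (n_classes, n_modes, emb_dim)
        self.reps = nn.Parameter(torch.randn(n_classes, n_modes, emb_dim))
        self.sigma = sigma

    def forward(self, emb):
        # emb: (B, emb_dim) -> squared distances to every mode of every class
        d2 = ((emb[:, None, None, :] - self.reps[None]) ** 2).sum(-1)   # (B, C, K)
        # class score = closest mode under a Gaussian kernel
        scores = torch.exp(-d2 / (2 * self.sigma ** 2)).max(dim=-1).values  # (B, C)
        return scores

head = RepresentativeHead(n_classes=10, n_modes=5, emb_dim=64)
print(head(torch.randn(8, 64)).shape)  # torch.Size([8, 10])
```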
Deep Learning using Rectified Linear Units (ReLU)
Title | Deep Learning using Rectified Linear Units (ReLU) |
Authors | Abien Fred Agarap |
Abstract | We introduce the use of rectified linear units (ReLU) as the classification function in a deep neural network (DNN). Conventionally, ReLU is used as an activation function in DNNs, with the Softmax function as their classification function. However, there have been several studies on using a classification function other than Softmax, and this study is an addition to those. We accomplish this by taking the activation of the penultimate layer $h_{n - 1}$ in a neural network, then multiplying it by weight parameters $\theta$ to get the raw scores $o_{i}$. Afterwards, we threshold the raw scores $o_{i}$ at $0$, i.e. $f(o) = \max(0, o_{i})$, where $f(o)$ is the ReLU function. We provide class predictions $\hat{y}$ through the argmax function, i.e. $\hat{y} = \arg\max f(o)$. |
Tasks | |
Published | 2018-03-22 |
URL | http://arxiv.org/abs/1803.08375v2 |
http://arxiv.org/pdf/1803.08375v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-using-rectified-linear-units |
Repo | https://github.com/AFAgarap/relu-classifier |
Framework | none |
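The abstract fully specifies the classifier: take the penultimate activation $h$, multiply by weights $\theta$, threshold the raw scores at zero, and take the argmax. A minimal NumPy sketch of exactly that scoring rule (the dimensions are illustrative):

```python
import numpy as np

def relu_classify(h, theta):
    """Predict classes with ReLU as the classification function.

    h:     penultimate-layer activations, shape (B, d)
    theta: weight parameters, shape (d, n_classes)
    """
    o = h @ theta                 # raw scores o_i
    f = np.maximum(0.0, o)        # f(o) = max(0, o_i), the ReLU
    return np.argmax(f, axis=1)   # y_hat = argmax f(o)

h = np.random.randn(4, 128)
theta = np.random.randn(128, 10)
print(relu_classify(h, theta))
```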
Scalable Convolutional Dictionary Learning with Constrained Recurrent Sparse Auto-encoders
Title | Scalable Convolutional Dictionary Learning with Constrained Recurrent Sparse Auto-encoders |
Authors | Bahareh Tolooshams, Sourav Dey, Demba Ba |
Abstract | Given a convolutional dictionary underlying a set of observed signals, can a carefully designed auto-encoder recover the dictionary in the presence of noise? We introduce an auto-encoder architecture, termed constrained recurrent sparse auto-encoder (CRsAE), that answers this question in the affirmative. Given an input signal and an approximate dictionary, the encoder finds a sparse approximation using FISTA. The decoder reconstructs the signal by applying the dictionary to the output of the encoder. The encoder and decoder in CRsAE parallel the sparse-coding and dictionary update steps in optimization-based alternating-minimization schemes for dictionary learning. As such, the parameters of the encoder and decoder are not independent, a constraint which we enforce for the first time. We derive the back-propagation algorithm for CRsAE. CRsAE is a framework for blind source separation that is able to separate sources knowing only their number (the number of dictionary elements) and assuming that only sparsely-many of them overlap. We demonstrate its utility in the context of spike sorting, a source separation problem in computational neuroscience. We demonstrate the ability of CRsAE to recover the underlying dictionary and characterize its sensitivity as a function of SNR. |
Tasks | Dictionary Learning |
Published | 2018-07-12 |
URL | http://arxiv.org/abs/1807.04734v1 |
http://arxiv.org/pdf/1807.04734v1.pdf | |
PWC | https://paperswithcode.com/paper/scalable-convolutional-dictionary-learning |
Repo | https://github.com/ds2p/crsae |
Framework | none |
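In CRsAE, the encoder runs a sparse-coding solver (FISTA) with the current dictionary and the decoder reconstructs by applying that same dictionary, so encoder and decoder share parameters. The sketch below is a deliberately simplified, non-convolutional stand-in for one forward pass, using a plain ISTA loop in place of FISTA; the step size, sparsity penalty, and linear dictionary are assumptions made for brevity.

```python
import numpy as np

def ista(y, D, lam=0.1, n_iter=100):
    """Sparse-code y ~ D @ x with an ISTA loop (FISTA adds a momentum term)."""
    L = np.linalg.norm(D, 2) ** 2              # Lipschitz constant of D^T D
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)
        z = x - grad / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x

rng = np.random.default_rng(0)
D = rng.normal(size=(64, 128))
D /= np.linalg.norm(D, axis=0)                 # unit-norm dictionary atoms
x_true = np.zeros(128)
x_true[rng.choice(128, 5, replace=False)] = 1.0
y = D @ x_true + 0.01 * rng.normal(size=64)

code = ista(y, D)                              # "encoder": sparse approximation
recon = D @ code                               # "decoder": apply the dictionary
print(np.linalg.norm(y - recon))
```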
Self-Paced Learning with Adaptive Deep Visual Embeddings
Title | Self-Paced Learning with Adaptive Deep Visual Embeddings |
Authors | Vithursan Thangarasa, Graham W. Taylor |
Abstract | Selecting the most appropriate data examples to present a deep neural network (DNN) at different stages of training is an unsolved challenge. Though practitioners typically ignore this problem, a non-trivial data scheduling method may result in a significant improvement in both convergence and generalization performance. In this paper, we introduce Self-Paced Learning with Adaptive Deep Visual Embeddings (SPL-ADVisE), a novel end-to-end training protocol that unites self-paced learning (SPL) and deep metric learning (DML). We leverage the Magnet Loss to train an embedding convolutional neural network (CNN) to learn a salient representation space. The student CNN classifier dynamically selects similar instance-level training examples to form a mini-batch, where the easiness from the cross-entropy loss and the true diverseness of examples from the learned metric space serve as sample importance priors. To demonstrate the effectiveness of SPL-ADVisE, we use deep CNN architectures for the task of supervised image classification on several coarse- and fine-grained visual recognition datasets. Results show that, across all datasets, the proposed method converges faster and reaches a higher final accuracy than other SPL variants, particularly on fine-grained classes. |
Tasks | Fine-Grained Visual Recognition, Image Classification, Metric Learning |
Published | 2018-07-24 |
URL | http://arxiv.org/abs/1807.09200v1 |
http://arxiv.org/pdf/1807.09200v1.pdf | |
PWC | https://paperswithcode.com/paper/self-paced-learning-with-adaptive-deep-visual |
Repo | https://github.com/vithursant/SPL-ADVisE |
Framework | pytorch |
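SPL-ADVisE combines a metric-learned embedding with self-paced sample selection. The sketch below shows only the generic self-paced ingredient: keeping "easy" examples whose current loss falls below a pace threshold that grows during training. This is the classic SPL rule given for orientation, not the paper's full Magnet-loss-based batch construction; the model, batch, and pace schedule in the usage comments are assumptions.

```python
import torch

def self_paced_mask(losses, pace):
    """Classic self-paced selection: keep easy examples (loss < pace).

    losses: per-example losses, shape (B,). Returns a 0/1 weight per example.
    """
    return (losses < pace).float()

# usage inside a training step (model, logits, targets, and pace are assumed):
# per_example = torch.nn.functional.cross_entropy(logits, targets, reduction="none")
# weights = self_paced_mask(per_example.detach(), pace)
# loss = (weights * per_example).sum() / weights.sum().clamp(min=1.0)
# pace *= 1.1  # gradually admit harder examples
```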
Hierarchical Graph Representation Learning with Differentiable Pooling
Title | Hierarchical Graph Representation Learning with Differentiable Pooling |
Authors | Rex Ying, Jiaxuan You, Christopher Morris, Xiang Ren, William L. Hamilton, Jure Leskovec |
Abstract | Recently, graph neural networks (GNNs) have revolutionized the field of graph representation learning through effectively learned node embeddings, and achieved state-of-the-art results in tasks such as node classification and link prediction. However, current GNN methods are inherently flat and do not learn hierarchical representations of graphs—a limitation that is especially problematic for the task of graph classification, where the goal is to predict the label associated with an entire graph. Here we propose DiffPool, a differentiable graph pooling module that can generate hierarchical representations of graphs and can be combined with various graph neural network architectures in an end-to-end fashion. DiffPool learns a differentiable soft cluster assignment for nodes at each layer of a deep GNN, mapping nodes to a set of clusters, which then form the coarsened input for the next GNN layer. Our experimental results show that combining existing GNN methods with DiffPool yields an average improvement of 5-10% accuracy on graph classification benchmarks, compared to all existing pooling approaches, achieving a new state-of-the-art on four out of five benchmark data sets. |
Tasks | Graph Classification, Graph Representation Learning, Link Prediction, Node Classification, Representation Learning |
Published | 2018-06-22 |
URL | http://arxiv.org/abs/1806.08804v4 |
http://arxiv.org/pdf/1806.08804v4.pdf | |
PWC | https://paperswithcode.com/paper/hierarchical-graph-representation-learning |
Repo | https://github.com/VoVAllen/diffpool |
Framework | pytorch |
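DiffPool's core operation is compact in matrix form: one GNN produces node embeddings Z, another produces a soft assignment S of nodes to clusters, and the coarsened features and adjacency are X' = S^T Z and A' = S^T A S. The PyTorch sketch below uses a single-layer GCN for both roles; the one-layer GCN, the cluster count, and the dense adjacency are simplifying assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleGCN(nn.Module):
    """One-layer GCN: H = ReLU(A_norm X W) with symmetric normalization of A + I."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, A, X):
        A_hat = A + torch.eye(A.size(0))
        d = A_hat.sum(1)
        A_norm = A_hat / torch.sqrt(d[:, None] * d[None, :])
        return F.relu(A_norm @ self.lin(X))

class DiffPoolLayer(nn.Module):
    """One differentiable pooling step: coarsen n nodes down to k clusters."""

    def __init__(self, in_dim, emb_dim, n_clusters):
        super().__init__()
        self.gnn_embed = SimpleGCN(in_dim, emb_dim)
        self.gnn_pool = SimpleGCN(in_dim, n_clusters)

    def forward(self, A, X):
        Z = self.gnn_embed(A, X)                     # node embeddings
        S = torch.softmax(self.gnn_pool(A, X), -1)   # soft cluster assignment
        X_coarse = S.T @ Z                           # pooled features: S^T Z
        A_coarse = S.T @ A @ S                       # pooled adjacency: S^T A S
        return A_coarse, X_coarse

layer = DiffPoolLayer(in_dim=16, emb_dim=32, n_clusters=4)
A = (torch.rand(10, 10) > 0.7).float()
A = ((A + A.T) > 0).float()                          # symmetric toy adjacency
A_c, X_c = layer(A, torch.randn(10, 16))
print(A_c.shape, X_c.shape)  # torch.Size([4, 4]) torch.Size([4, 32])
```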
Link Prediction in Networks with Core-Fringe Data
Title | Link Prediction in Networks with Core-Fringe Data |
Authors | Austin R. Benson, Jon Kleinberg |
Abstract | Data collection often involves the partial measurement of a larger system. A common example arises in collecting network data: we often obtain network datasets by recording all of the interactions among a small set of core nodes, so that we end up with a measurement of the network consisting of these core nodes along with a potentially much larger set of fringe nodes that have links to the core. Given the ubiquity of this process for assembling network data, it is crucial to understand the role of such a 'core-fringe' structure. Here we study how the inclusion of fringe nodes affects the standard task of network link prediction. One might initially think the inclusion of any additional data is useful, and hence that it should be beneficial to include all fringe nodes that are available. However, we find that this is not true; in fact, there is substantial variability in the value of the fringe nodes for prediction. Once an algorithm is selected, in some datasets, including any additional data from the fringe can actually hurt prediction performance; in other datasets, including some amount of fringe information is useful before prediction performance saturates or even declines; and in further cases, including the entire fringe leads to the best performance. While such variety might seem surprising, we show that these behaviors are exhibited by simple random graph models. |
Tasks | Link Prediction |
Published | 2018-11-28 |
URL | http://arxiv.org/abs/1811.11540v2 |
http://arxiv.org/pdf/1811.11540v2.pdf | |
PWC | https://paperswithcode.com/paper/link-prediction-in-networks-with-core-fringe |
Repo | https://github.com/arbenson/cflp |
Framework | none |
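The core-fringe setup is concrete enough to illustrate: fringe nodes never appear in the prediction targets (core-core links), but they may contribute to the features, for example to common-neighbor counts between core pairs. The small sketch below makes that distinction explicit; the common-neighbors score is a standard link-prediction baseline chosen here purely for illustration and is not claimed to be the paper's predictor.

```python
from itertools import combinations

def common_neighbor_scores(edges, core, use_fringe=True):
    """Score core-core pairs by their number of common neighbors.

    edges: iterable of (u, v) pairs; core: set of core node ids.
    With use_fringe=False, shared fringe neighbors are ignored.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    scores = {}
    for a, b in combinations(sorted(core), 2):
        common = adj.get(a, set()) & adj.get(b, set())
        if not use_fringe:
            common &= core                      # drop fringe neighbors
        scores[(a, b)] = len(common)
    return scores

# toy graph: core nodes 1-3, fringe nodes "f1", "f2"
edges = [(1, 2), (1, "f1"), (2, "f1"), (2, 3), (3, "f2"), (1, "f2")]
core = {1, 2, 3}
print(common_neighbor_scores(edges, core, use_fringe=True))
print(common_neighbor_scores(edges, core, use_fringe=False))
```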
Variational Wasserstein Clustering
Title | Variational Wasserstein Clustering |
Authors | Liang Mi, Wen Zhang, Xianfeng Gu, Yalin Wang |
Abstract | We propose a new clustering method based on optimal transportation. We solve optimal transportation with variational principles, and investigate the use of power diagrams as transportation plans for aggregating arbitrary domains into a fixed number of clusters. We iteratively drive centroids through target domains while maintaining the minimum clustering energy by adjusting the power diagrams. Thus, we simultaneously pursue clustering and the Wasserstein distances between the centroids and the target domains, resulting in a measure-preserving mapping. We demonstrate the use of our method in domain adaptation, remeshing, and representation learning on synthetic and real data. |
Tasks | Domain Adaptation, Representation Learning |
Published | 2018-06-23 |
URL | http://arxiv.org/abs/1806.09045v4 |
http://arxiv.org/pdf/1806.09045v4.pdf | |
PWC | https://paperswithcode.com/paper/variational-wasserstein-clustering |
Repo | https://github.com/icemiliang/pyvot |
Framework | pytorch |
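The method alternates between a measure-preserving assignment given by a power diagram (nearest centroid under the power distance ||x - c_i||^2 - h_i) and moving the centroids. The NumPy sketch below is a crude empirical stand-in for that loop: the dual-weight update rule, step size, and equal-mass target are assumptions for illustration, and the paper solves the transport step with variational principles rather than this simple gradient step.

```python
import numpy as np

def wasserstein_clustering(X, k, n_iter=200, lr=0.5, seed=0):
    """Cluster samples into k roughly equal-mass clusters via power diagrams.

    Assignment uses the power distance ||x - c_i||^2 - h_i; the weights h_i
    are adjusted so every cluster captures the same mass, and centroids move
    to the mean of their assigned samples.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    C = X[rng.choice(n, k, replace=False)].copy()   # initial centroids
    h = np.zeros(k)                                  # power-diagram weights
    target = 1.0 / k                                 # equal target mass
    for _ in range(n_iter):
        power = ((X[:, None, :] - C[None]) ** 2).sum(-1) - h[None]
        assign = power.argmin(1)
        mass = np.bincount(assign, minlength=k) / n
        h += lr * (target - mass)                    # enlarge under-filled cells
        for i in range(k):
            if (assign == i).any():
                C[i] = X[assign == i].mean(0)        # centroid update
    return C, assign

X = np.vstack([np.random.randn(100, 2), np.random.randn(100, 2) + 3])
C, labels = wasserstein_clustering(X, k=2)
print(C)
```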
RS-Net: Regression-Segmentation 3D CNN for Synthesis of Full Resolution Missing Brain MRI in the Presence of Tumours
Title | RS-Net: Regression-Segmentation 3D CNN for Synthesis of Full Resolution Missing Brain MRI in the Presence of Tumours |
Authors | Raghav Mehta, Tal Arbel |
Abstract | Accurate synthesis of a full 3D MR image containing tumours from available MRI (e.g. to replace an image that is currently unavailable or corrupted) would provide a clinician as well as downstream inference methods with important complementary information for disease analysis. In this paper, we present an end-to-end 3D convolutional neural network that takes a set of acquired MR image sequences (e.g. T1, T2, T1ce) as input and concurrently performs (1) regression of the missing full resolution 3D MRI (e.g. FLAIR) and (2) segmentation of the tumour into subtypes (e.g. enhancement, core). The hypothesis is that this would focus the network to perform accurate synthesis in the area of the tumour. Experiments on the BraTS 2015 and 2017 datasets [1] show that: (1) the proposed method gives better performance than state-of-the-art methods in terms of established global evaluation metrics (e.g. PSNR), (2) replacing real MR volumes with the synthesized MRI does not lead to significant degradation in tumour and sub-structure segmentation accuracy. The system further provides uncertainty estimates based on Monte Carlo (MC) dropout [11] for the synthesized volume at each voxel, permitting quantification of the system’s confidence in the output at each location. |
Tasks | |
Published | 2018-07-28 |
URL | http://arxiv.org/abs/1807.10972v1 |
http://arxiv.org/pdf/1807.10972v1.pdf | |
PWC | https://paperswithcode.com/paper/rs-net-regression-segmentation-3d-cnn-for |
Repo | https://github.com/RagMeh11/RS-Net |
Framework | none |
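The uncertainty estimate mentioned at the end of the abstract is obtained with Monte Carlo dropout: keep dropout active at test time, run the network several times, and use the per-voxel mean and variance of the outputs. A hedged PyTorch sketch of that procedure for an arbitrary model with dropout layers follows; the model, the input volume, and the number of samples are assumptions, and the paper applies this to its 3D regression-segmentation network.

```python
import torch

def mc_dropout_predict(model, x, n_samples=20):
    """Monte Carlo dropout: mean prediction and per-voxel variance.

    Dropout layers are switched back to train mode so each forward pass
    samples a different dropout mask, while the rest of the model stays
    in eval mode.
    """
    model.eval()
    for m in model.modules():
        if isinstance(m, (torch.nn.Dropout, torch.nn.Dropout3d)):
            m.train()
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(0), preds.var(0)

# usage (hypothetical 3D network and input volume):
# mean_flair, uncertainty = mc_dropout_predict(rs_net, mri_volume, n_samples=20)
```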
Enhancing Perceptual Attributes with Bayesian Style Generation
Title | Enhancing Perceptual Attributes with Bayesian Style Generation |
Authors | Aliaksandr Siarohin, Gloria Zen, Nicu Sebe, Elisa Ricci |
Abstract | Deep learning has brought unprecedented progress in computer vision, and significant advances have been made in predicting subjective properties inherent to visual data (e.g., memorability, aesthetic quality, evoked emotions, etc.). Recently, some research works have even proposed deep learning approaches to modify images so as to appropriately alter these properties. Following this research line, this paper introduces a novel deep learning framework for synthesizing images in order to enhance a predefined perceptual attribute. Our approach takes as input a natural image and exploits recent models for deep style transfer and generative adversarial networks to change its style in order to modify a specific high-level attribute. Unlike previous works that focus on enhancing a specific property of visual content, we propose a general framework and demonstrate its effectiveness in two use cases, i.e. increasing image memorability and generating scary pictures. We evaluate the proposed approach on publicly available benchmarks, demonstrating its advantages over state-of-the-art methods. |
Tasks | Style Transfer |
Published | 2018-12-03 |
URL | http://arxiv.org/abs/1812.00717v1 |
http://arxiv.org/pdf/1812.00717v1.pdf | |
PWC | https://paperswithcode.com/paper/enhancing-perceptual-attributes-with-bayesian |
Repo | https://github.com/aliaksandrsiarohin/bae |
Framework | tf |
Data Augmentation for Skin Lesion Analysis
Title | Data Augmentation for Skin Lesion Analysis |
Authors | Fábio Perez, Cristina Vasconcelos, Sandra Avila, Eduardo Valle |
Abstract | Deep learning models show remarkable results in automated skin lesion analysis. However, these models demand considerable amounts of data, while the availability of annotated skin lesion images is often limited. Data augmentation can expand the training dataset by transforming input images. In this work, we investigate the impact of 13 data augmentation scenarios for melanoma classification trained on three CNNs (Inception-v4, ResNet, and DenseNet). Scenarios include traditional color and geometric transforms, and more unusual augmentations such as elastic transforms, random erasing and a novel augmentation that mixes different lesions. We also explore the use of data augmentation at test-time and the impact of data augmentation on various dataset sizes. Our results confirm the importance of data augmentation in both training and testing and show that it can lead to more performance gains than obtaining new images. The best scenario results in an AUC of 0.882 for melanoma classification without using external data, outperforming the top-ranked submission (0.874) for the ISIC Challenge 2017, which was trained with additional data. |
Tasks | Data Augmentation, Skin Cancer Classification, Skin Lesion Classification |
Published | 2018-09-05 |
URL | http://arxiv.org/abs/1809.01442v1 |
http://arxiv.org/pdf/1809.01442v1.pdf | |
PWC | https://paperswithcode.com/paper/data-augmentation-for-skin-lesion-analysis |
Repo | https://github.com/fabioperez/skin-data-augmentation |
Framework | pytorch |
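Two of the ingredients in the abstract are easy to show compactly: a train-time pipeline of color and geometric transforms, and test-time augmentation where the prediction is averaged over several augmented copies. A hedged torchvision/PyTorch sketch follows; the specific transform parameters and the classifier are assumptions, not the paper's exact 13 augmentation scenarios.

```python
import torch
from torchvision import transforms

# train-time augmentation: color + geometric transforms on lesion images
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.ToTensor(),
])

def predict_with_tta(model, image, n_augments=16):
    """Test-time augmentation: average softmax outputs over augmented copies."""
    model.eval()
    with torch.no_grad():
        batch = torch.stack([train_transform(image) for _ in range(n_augments)])
        probs = torch.softmax(model(batch), dim=1)
    return probs.mean(0)

# usage (hypothetical classifier and PIL image):
# melanoma_prob = predict_with_tta(cnn, lesion_image)[1]
```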
Transductive Adversarial Networks (TAN)
Title | Transductive Adversarial Networks (TAN) |
Authors | Sean Rowan |
Abstract | Transductive Adversarial Networks (TAN) is a novel domain-adaptation machine learning framework that is designed for learning a conditional probability distribution on unlabelled input data in a target domain, while also only having access to: (1) easily obtained labelled data from a related source domain, which may have a different conditional probability distribution than the target domain, and (2) a marginalised prior distribution on the labels for the target domain. TAN leverages a fully adversarial training procedure and a unique generator/encoder architecture which approximates the transductive combination of the available source- and target-domain data. A benefit of TAN is that it allows the distance between the source- and target-domain label-vector marginal probability distributions to be greater than 0 (i.e. different tasks across the source and target domains) whereas other domain-adaptation algorithms require this distance to equal 0 (i.e. a single task across the source and target domains). TAN can, however, still handle the latter case and is a more generalised approach to this case. Another benefit of TAN is that due to being a fully adversarial algorithm, it has the potential to accurately approximate highly complex distributions. Theoretical analysis demonstrates the viability of the TAN framework. |
Tasks | Domain Adaptation |
Published | 2018-02-08 |
URL | http://arxiv.org/abs/1802.02798v1 |
http://arxiv.org/pdf/1802.02798v1.pdf | |
PWC | https://paperswithcode.com/paper/transductive-adversarial-networks-tan |
Repo | https://github.com/sean-rowan/tan |
Framework | none |
A Grammar-Based Structural CNN Decoder for Code Generation
Title | A Grammar-Based Structural CNN Decoder for Code Generation |
Authors | Zeyu Sun, Qihao Zhu, Lili Mou, Yingfei Xiong, Ge Li, Lu Zhang |
Abstract | Code generation maps a program description to executable source code in a programming language. Existing approaches mainly rely on a recurrent neural network (RNN) as the decoder. However, we find that a program contains significantly more tokens than a natural language sentence, and thus it may be inappropriate for an RNN to capture such a long sequence. In this paper, we propose a grammar-based structural convolutional neural network (CNN) for code generation. Our model generates a program by predicting the grammar rules of the programming language; we design several CNN modules, including the tree-based convolution and pre-order convolution, whose information is further aggregated by dedicated attentive pooling layers. Experimental results on the HearthStone benchmark dataset show that our CNN code generator significantly outperforms the previous state-of-the-art method by 5 percentage points; additional experiments on several semantic parsing tasks demonstrate the robustness of our model. We also conduct an in-depth ablation test to better understand each component of our model. |
Tasks | Code Generation, Semantic Parsing |
Published | 2018-11-14 |
URL | http://arxiv.org/abs/1811.06837v1 |
http://arxiv.org/pdf/1811.06837v1.pdf | |
PWC | https://paperswithcode.com/paper/a-grammar-based-structural-cnn-decoder-for |
Repo | https://github.com/zysszy/GrammarCNN |
Framework | tf |
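The decoder described above generates a program by repeatedly choosing a grammar rule to expand a nonterminal, so the output is a sequence of rule choices rather than surface tokens. Below is a toy Python sketch of that decoding loop with a hand-written grammar and a placeholder scorer; the grammar, the uniform-random scoring function, and the greedy leftmost expansion are illustrative assumptions, whereas the paper scores rules with tree-based and pre-order CNN modules plus attentive pooling.

```python
import random

# toy grammar: nonterminal -> list of candidate rules (right-hand sides)
GRAMMAR = {
    "stmt": [["expr"], ["if", "(", "expr", ")", "stmt"]],
    "expr": [["NAME"], ["NAME", "+", "expr"]],
}
TERMINALS = {"if", "(", ")", "+", "NAME"}

def score_rules(nonterminal, partial_tokens):
    """Placeholder for the neural rule scorer (here: uniform random scores)."""
    return [random.random() for _ in GRAMMAR[nonterminal]]

def generate(start="stmt", max_steps=20):
    """Expand the leftmost nonterminal with the best-scoring rule each step.

    max_steps caps the expansion, so a toy run may stop with nonterminals left.
    """
    tokens, rule_ids = [start], []
    for _ in range(max_steps):
        nts = [i for i, t in enumerate(tokens) if t not in TERMINALS]
        if not nts:
            break
        i = nts[0]                                   # leftmost nonterminal
        scores = score_rules(tokens[i], tokens)
        best = max(range(len(scores)), key=scores.__getitem__)
        rule_ids.append((tokens[i], best))
        tokens[i:i + 1] = GRAMMAR[tokens[i]][best]   # apply the chosen rule
    return tokens, rule_ids

print(generate())
```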