October 21, 2019

Paper Group AWR 86

ST-GAN: Spatial Transformer Generative Adversarial Networks for Image Compositing. Learning Speaker Representations with Mutual Information. Efficient, Certifiably Optimal Clustering with Applications to Latent Variable Graphical Models. A Survey of Learning Causality with Data: Problems and Methods. SADA: Semantic Adversarial Diagnostic Attacks fo …

ST-GAN: Spatial Transformer Generative Adversarial Networks for Image Compositing


Title	ST-GAN: Spatial Transformer Generative Adversarial Networks for Image Compositing
Authors	Chen-Hsuan Lin, Ersin Yumer, Oliver Wang, Eli Shechtman, Simon Lucey
Abstract	We address the problem of finding realistic geometric corrections to a foreground object such that it appears natural when composited into a background image. To achieve this, we propose a novel Generative Adversarial Network (GAN) architecture that utilizes Spatial Transformer Networks (STNs) as the generator, which we call Spatial Transformer GANs (ST-GANs). ST-GANs seek image realism by operating in the geometric warp parameter space. In particular, we exploit an iterative STN warping scheme and propose a sequential training strategy that achieves better results compared to naive training of a single generator. One of the key advantages of ST-GAN is its applicability to high-resolution images indirectly since the predicted warp parameters are transferable between reference frames. We demonstrate our approach in two applications: (1) visualizing how indoor furniture (e.g. from product images) might be perceived in a room, (2) hallucinating how accessories like glasses would look when matched with real portraits.
Tasks
Published	2018-03-05
URL	http://arxiv.org/abs/1803.01837v1
PDF	http://arxiv.org/pdf/1803.01837v1.pdf
PWC	https://paperswithcode.com/paper/st-gan-spatial-transformer-generative
Repo	https://github.com/chenhsuanlin/spatial-transformer-GAN
Framework	tf
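
The abstract's point that predicted warp parameters transfer between reference frames can be illustrated with a plain homography. The following is a minimal sketch, assuming the warp is a 3x3 homography acting on pixel coordinates and that the high-resolution frame is a uniform rescaling of the low-resolution one; the helper names and the example matrix are hypothetical and are not taken from the ST-GAN code.

```python
def matmul3(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rescale_homography(h_low, scale):
    """Express a low-resolution homography in a frame `scale` times larger.

    If S scales coordinates by `scale`, the equivalent warp is S * H * S^-1.
    """
    s = [[scale, 0.0, 0.0], [0.0, scale, 0.0], [0.0, 0.0, 1.0]]
    s_inv = [[1.0 / scale, 0.0, 0.0], [0.0, 1.0 / scale, 0.0], [0.0, 0.0, 1.0]]
    return matmul3(matmul3(s, h_low), s_inv)


def apply_homography(h, x, y):
    """Warp a single point and return its inhomogeneous coordinates."""
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return u / w, v / w


# Hypothetical warp predicted on a low-resolution image.
h_low = [[1.0, 0.1, 5.0], [0.0, 1.2, -3.0], [0.001, 0.0, 1.0]]
h_high = rescale_homography(h_low, 4.0)

x_low, y_low = 10.0, 20.0
u_low, v_low = apply_homography(h_low, x_low, y_low)
u_high, v_high = apply_homography(h_high, 4.0 * x_low, 4.0 * y_low)
```

Warping the scaled point with `h_high` lands on the scaled version of the low-resolution result, which is why a warp estimated at low resolution can be reused on the full-resolution image.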

Learning Speaker Representations with Mutual Information


Title	Learning Speaker Representations with Mutual Information
Authors	Mirco Ravanelli, Yoshua Bengio
Abstract	Learning good representations is of crucial importance in deep learning. Mutual Information (MI) or similar measures of statistical dependence are promising tools for learning these representations in an unsupervised way. Even though the mutual information between two random variables is hard to measure directly in high dimensional spaces, some recent studies have shown that an implicit optimization of MI can be achieved with an encoder-discriminator architecture similar to that of Generative Adversarial Networks (GANs). In this work, we learn representations that capture speaker identities by maximizing the mutual information between the encoded representations of chunks of speech randomly sampled from the same sentence. The proposed encoder relies on the SincNet architecture and transforms raw speech waveform into a compact feature vector. The discriminator is fed by either positive samples (of the joint distribution of encoded chunks) or negative samples (from the product of the marginals) and is trained to separate them. We report experiments showing that this approach effectively learns useful speaker representations, leading to promising results on speaker identification and verification tasks. Our experiments consider both unsupervised and semi-supervised settings and compare the performance achieved with different objective functions.
Tasks	Speaker Identification
Published	2018-12-01
URL	http://arxiv.org/abs/1812.00271v2
PDF	http://arxiv.org/pdf/1812.00271v2.pdf
PWC	https://paperswithcode.com/paper/learning-speaker-representations-with-mutual
Repo	https://github.com/Js-Mim/rl_singing_voice
Framework	pytorch
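
The positive and negative samples described in the abstract can be sketched as follows. This is an illustrative assumption rather than the authors' pipeline: chunks are represented by string identifiers, a negative pair is formed by drawing the second chunk from a different sentence (a common stand-in for sampling from the product of the marginals), and binary cross-entropy is only one of the possible objectives.

```python
import math
import random


def sample_pairs(sentences, rng):
    """Return (positive_pair, negative_pair) of speech chunks.

    `sentences` maps a sentence id to its list of chunks (hypothetical layout).
    """
    anchor_id, other_id = rng.sample(sorted(sentences), 2)
    # Positive pair: two different chunks of the same sentence.
    anchor, positive = rng.sample(sentences[anchor_id], 2)
    # Negative pair: the anchor with a chunk of another sentence.
    negative = rng.choice(sentences[other_id])
    return (anchor, positive), (anchor, negative)


def bce_loss(score_pos, score_neg):
    """Binary cross-entropy on discriminator outputs that lie in (0, 1)."""
    return -(math.log(score_pos) + math.log(1.0 - score_neg))


sentences = {
    "a": ["a0", "a1", "a2"],
    "b": ["b0", "b1"],
    "c": ["c0", "c1", "c2"],
}
rng = random.Random(0)
pos_pair, neg_pair = sample_pairs(sentences, rng)
```

A discriminator that scores positive pairs near 1 and negative pairs near 0 yields a lower `bce_loss`, which is the signal used to train the encoder in this kind of setup.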

Efficient, Certifiably Optimal Clustering with Applications to Latent Variable Graphical Models


Title	Efficient, Certifiably Optimal Clustering with Applications to Latent Variable Graphical Models
Authors	Carson Eisenach, Han Liu
Abstract	Motivated by the task of clustering either $d$ variables or $d$ points into $K$ groups, we investigate efficient algorithms to solve the Peng-Wei (P-W) $K$-means semi-definite programming (SDP) relaxation. The P-W SDP has been shown in the literature to have good statistical properties in a variety of settings, but remains intractable to solve in practice. To this end we propose FORCE, a new algorithm to solve this SDP relaxation. Compared to the naive interior point method, our method reduces the computational complexity of solving the SDP from $\tilde{O}(d^7\log\epsilon^{-1})$ to $\tilde{O}(d^{6}K^{-2}\epsilon^{-1})$ arithmetic operations for an $\epsilon$-optimal solution. Our method combines a primal first-order method with a dual optimality certificate search, which, when successful, allows for early termination of the primal method. We show for certain variable clustering problems that, with high probability, FORCE is guaranteed to find the optimal solution to the SDP relaxation and provide a certificate of exact optimality. As verified by our numerical experiments, this allows FORCE to solve the P-W SDP with dimensions in the hundreds in only tens of seconds. For a variation of the P-W SDP where $K$ is not known a priori, a slight modification of FORCE reduces the computational complexity of solving this problem as well: from $\tilde{O}(d^7\log\epsilon^{-1})$ using a standard SDP solver to $\tilde{O}(d^{4}\epsilon^{-1})$.
Tasks
Published	2018-06-01
URL	http://arxiv.org/abs/1806.00530v3
PDF	http://arxiv.org/pdf/1806.00530v3.pdf
PWC	https://paperswithcode.com/paper/efficient-certifiably-optimal-clustering-with
Repo	https://github.com/ceisenach/R_GFORCE
Framework	none
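
For context, the P-W relaxation is usually written over $d \times d$ matrices $Z$ that are positive semi-definite, entrywise non-negative, have unit row sums, and have trace $K$. This formulation is recalled from the $K$-means SDP literature, not from the abstract above. The sketch below builds the normalized partition matrix of a hard clustering and checks the linear constraints; the function name and the toy labels are hypothetical.

```python
def partition_matrix(labels, k):
    """Normalized partition matrix: Z[i][j] = 1/|G| if i and j share group G."""
    sizes = [labels.count(c) for c in range(k)]
    d = len(labels)
    return [[1.0 / sizes[labels[i]] if labels[i] == labels[j] else 0.0
             for j in range(d)]
            for i in range(d)]


labels = [0, 0, 1, 1, 1, 2]  # toy assignment of d = 6 items to K = 3 groups
Z = partition_matrix(labels, 3)

row_sums = [sum(row) for row in Z]              # each should equal 1
trace = sum(Z[i][i] for i in range(len(Z)))     # should equal K
nonnegative = all(v >= 0.0 for row in Z for v in row)
```

Positive semi-definiteness is not checked numerically here; it holds by construction because $Z$ is a sum of scaled outer products of group indicator vectors. Every hard clustering is therefore feasible for the relaxation, and the SDP optimizes over the larger convex set.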

A Survey of Learning Causality with Data: Problems and Methods


Title	A Survey of Learning Causality with Data: Problems and Methods
Authors	Ruocheng Guo, Lu Cheng, Jundong Li, P. Richard Hahn, Huan Liu
Abstract	The era of big data provides researchers with convenient access to copious data. However, people often have little knowledge about it. The increasing prevalence of big data is challenging the traditional methods of learning causality because they were developed for cases with a limited amount of data and solid prior causal knowledge. This survey aims to close the gap between big data and learning causality with a comprehensive and structured review of traditional and frontier methods and a discussion about some open problems of learning causality. We begin with preliminaries of learning causality. Then we categorize and revisit methods of learning causality for the typical problems and data types. After that, we discuss the connections between learning causality and machine learning. At the end, some open problems are presented to show the great potential of learning causality with data.
Tasks
Published	2018-09-25
URL	http://arxiv.org/abs/1809.09337v3
PDF	http://arxiv.org/pdf/1809.09337v3.pdf
PWC	https://paperswithcode.com/paper/a-survey-of-learning-causality-with-data
Repo	https://github.com/rguo12/awesome-causality-algorithms
Framework	none

SADA: Semantic Adversarial Diagnostic Attacks for Autonomous Applications


Title	SADA: Semantic Adversarial Diagnostic Attacks for Autonomous Applications
Authors	Abdullah Hamdi, Matthias Müller, Bernard Ghanem
Abstract	One major factor impeding more widespread adoption of deep neural networks (DNNs) is their lack of robustness, which is essential for safety-critical applications such as autonomous driving. This has motivated much recent work on adversarial attacks for DNNs, which mostly focus on pixel-level perturbations void of semantic meaning. In contrast, we present a general framework for adversarial attacks on trained agents, which covers semantic perturbations to the environment of the agent performing the task as well as pixel-level attacks. To do this, we re-frame the adversarial attack problem as learning a distribution of parameters that always fools the agent. In the semantic case, our proposed adversary (denoted as BBGAN) is trained to sample parameters that describe the environment with which the black-box agent interacts, such that the agent performs its dedicated task poorly in this environment. We apply BBGAN on three different tasks, primarily targeting aspects of autonomous navigation: object detection, self-driving, and autonomous UAV racing. On these tasks, BBGAN can generate failure cases that consistently fool a trained agent.
Tasks	Adversarial Attack, Autonomous Driving, Autonomous Navigation, Object Detection
Published	2018-12-05
URL	https://arxiv.org/abs/1812.02132v3
PDF	https://arxiv.org/pdf/1812.02132v3.pdf
PWC	https://paperswithcode.com/paper/sada-semantic-adversarial-diagnostic-attacks
Repo	https://github.com/ajhamdi/SADA
Framework	tf

Automatic Induction of Neural Network Decision Tree Algorithms


Title	Automatic Induction of Neural Network Decision Tree Algorithms
Authors	Chapman Siu
Abstract	This work presents an approach to the automatic induction of non-greedy decision trees constructed from neural network architectures. This construction can be used to transfer weights when growing or pruning a decision tree, allowing non-greedy decision tree algorithms to automatically learn and adapt to the ideal architecture. In this work, we examine the underpinning ideas within ensemble modelling and Bayesian model averaging which allow our neural network to asymptotically approach the ideal architecture through weights transfer. Experimental results demonstrate that this approach improves on models with a fixed set of hyperparameters for decision tree models and decision forest models.
Tasks
Published	2018-11-26
URL	http://arxiv.org/abs/1811.10735v4
PDF	http://arxiv.org/pdf/1811.10735v4.pdf
PWC	https://paperswithcode.com/paper/automatic-induction-of-neural-network
Repo	https://github.com/chappers/automatic-induction-neural-decision-tree
Framework	tf

MobileNetV2: Inverted Residuals and Linear Bottlenecks


Title	MobileNetV2: Inverted Residuals and Linear Bottlenecks
Authors	Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen
Abstract	In this paper we describe a new mobile architecture, MobileNetV2, that improves the state-of-the-art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes. We also describe efficient ways of applying these mobile models to object detection in a novel framework we call SSDLite. Additionally, we demonstrate how to build mobile semantic segmentation models through a reduced form of DeepLabv3 which we call Mobile DeepLabv3. The MobileNetV2 architecture is based on an inverted residual structure where the input and output of the residual block are thin bottleneck layers, in contrast to traditional residual models, which use expanded representations in the input. MobileNetV2 uses lightweight depthwise convolutions to filter features in the intermediate expansion layer. Additionally, we find that it is important to remove non-linearities in the narrow layers in order to maintain representational power. We demonstrate that this improves performance and provide an intuition that led to this design. Finally, our approach allows decoupling of the input/output domains from the expressiveness of the transformation, which provides a convenient framework for further analysis. We measure our performance on ImageNet classification, COCO object detection, and VOC image segmentation. We evaluate the trade-offs between accuracy and the number of operations measured by multiply-adds (MAdd), as well as the number of parameters.
Tasks	Image Classification, Object Detection, Semantic Segmentation
Published	2018-01-13
URL	http://arxiv.org/abs/1801.04381v4
PDF	http://arxiv.org/pdf/1801.04381v4.pdf
PWC	https://paperswithcode.com/paper/mobilenetv2-inverted-residuals-and-linear
Repo	https://github.com/zym1119/MobileNetV2_pytorch_cifar
Framework	pytorch
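
The cost of the inverted residual block described above can be tallied layer by layer. The following is a rough sketch under stated assumptions: stride 1, "same" padding, and no accounting for biases, batch normalization, or the residual addition. The function names and the example layer sizes are illustrative.

```python
def inverted_residual_madds(h, w, d_in, d_out, t, k=3):
    """Multiply-adds of one inverted residual block on an h x w feature map.

    t is the expansion factor and k the depthwise kernel size.
    """
    expanded = t * d_in
    expand = h * w * d_in * expanded        # 1x1 pointwise expansion
    depthwise = h * w * expanded * k * k    # k x k depthwise filtering
    project = h * w * expanded * d_out      # 1x1 linear bottleneck projection
    return expand + depthwise + project


def standard_conv_madds(h, w, d_in, d_out, k=3):
    """Multiply-adds of a dense k x k convolution, for comparison."""
    return h * w * d_in * d_out * k * k


block = inverted_residual_madds(56, 56, 24, 24, t=6)
closed_form = 56 * 56 * 6 * 24 * (24 + 3 * 3 + 24)
dense_on_expanded = standard_conv_madds(56, 56, 6 * 24, 6 * 24)
```

The three terms sum to `h * w * t * d_in * (d_in + k * k + d_out)`. Comparing `block` with `dense_on_expanded` shows how much cheaper the depthwise design is than a dense 3x3 convolution operating at the expanded width.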

A Biologically Plausible Learning Rule for Deep Learning in the Brain


Title	A Biologically Plausible Learning Rule for Deep Learning in the Brain
Authors	Isabella Pozzi, Sander Bohté, Pieter Roelfsema
Abstract	Researchers have proposed that deep learning, which is providing important progress in a wide range of high complexity tasks, might inspire new insights into learning in the brain. However, the methods used for deep learning by artificial neural networks are biologically unrealistic and would need to be replaced by biologically realistic counterparts. Previous biologically plausible reinforcement learning rules, like AGREL and AuGMEnT, showed promising results but focused on shallow networks with three layers. Will these learning rules also generalize to networks with more layers and can they handle tasks of higher complexity? We demonstrate the learning scheme on classical and hard image-classification benchmarks, namely MNIST, CIFAR10 and CIFAR100, cast as direct reward tasks, for fully connected, convolutional, and locally connected architectures. We show that our learning rule, Q-AGREL, performs comparably to supervised learning via error-backpropagation, with this type of trial-and-error reinforcement learning requiring only 1.5-2.5 times more epochs, even when classifying 100 different classes as in CIFAR100. Our results provide new insights into how deep learning may be implemented in the brain.
Tasks	Image Classification
Published	2018-11-05
URL	https://arxiv.org/abs/1811.01768v3
PDF	https://arxiv.org/pdf/1811.01768v3.pdf
PWC	https://paperswithcode.com/paper/a-biologically-plausible-learning-rule-for
Repo	https://github.com/csxeba/Reproducing-Q-AGREL
Framework	none

Deeply Supervised Rotation Equivariant Network for Lesion Segmentation in Dermoscopy Images


Title	Deeply Supervised Rotation Equivariant Network for Lesion Segmentation in Dermoscopy Images
Authors	Xiaomeng Li, Lequan Yu, Chi-Wing Fu, Pheng-Ann Heng
Abstract	Automatic lesion segmentation in dermoscopy images is an essential step for computer-aided diagnosis of melanoma. Dermoscopy images exhibit rotational and reflectional symmetry; however, this geometric property has not been encoded in state-of-the-art skin lesion segmentation methods based on convolutional neural networks. In this paper, we present a deeply supervised rotation equivariant network for skin lesion segmentation by extending the recent group rotation equivariant network (Cohen and Welling, 2016). Specifically, we propose the G-upsampling and G-projection operations to adapt the rotation equivariant classification network for our skin lesion segmentation problem. To further increase the performance, we integrate the deep supervision scheme into our proposed rotation equivariant segmentation architecture. The whole framework is equivariant to input transformations, including rotation and reflection, which improves the network efficiency and thus contributes to the segmentation performance. We extensively evaluate our method on the ISIC 2017 skin lesion challenge dataset. The experimental results show that our rotation equivariant networks consistently outperform their regular counterparts with the same model complexity under different experimental settings. Our best model achieves 77.23% (JA) on the test dataset, outperforming the state-of-the-art challenge methods and further demonstrating the effectiveness of our proposed deeply supervised rotation equivariant segmentation network.
Tasks	Lesion Segmentation
Published	2018-07-08
URL	http://arxiv.org/abs/1807.02804v1
PDF	http://arxiv.org/pdf/1807.02804v1.pdf
PWC	https://paperswithcode.com/paper/deeply-supervised-rotation-equivariant
Repo	https://github.com/xmengli999/Deeply-Supervised-Rotation-Equivariant-Network-for-Lesion-Segmentation
Framework	pytorch
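
The reported score is labeled JA, which is assumed here to denote the Jaccard index (intersection over union) between the predicted and reference lesion masks. A minimal sketch of that metric on binary masks follows; the toy masks are hypothetical, and returning 1.0 for two empty masks is a convention chosen for this example.

```python
def jaccard_index(pred, target):
    """Intersection over union of two binary masks given as nested lists."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Two empty masks agree perfectly (convention chosen for this sketch).
    return intersection / union if union else 1.0


pred = [[1, 1, 0],
        [0, 1, 0],
        [0, 0, 0]]
target = [[1, 0, 0],
          [0, 1, 1],
          [0, 0, 0]]
score = jaccard_index(pred, target)  # 2 shared pixels out of 4 marked pixels
```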

The Helmholtz Method: Using Perceptual Compression to Reduce Machine Learning Complexity


Title	The Helmholtz Method: Using Perceptual Compression to Reduce Machine Learning Complexity
Authors	Gerald Friedland, Jingkang Wang, Ruoxi Jia, Bo Li
Abstract	This paper proposes a fundamental answer to a frequently asked question in multimedia computing and machine learning: Do artifacts from perceptual compression contribute to error in the machine learning process and if so, how much? Our approach to the problem is a reinterpretation of the Helmholtz Free Energy formula from physics to explain the relationship between content and noise when using sensors (such as cameras or microphones) to capture multimedia data. The reinterpretation allows a bit-measurement of the noise contained in images, audio, and video by combining a classifier with perceptual compression, such as JPEG or MP3. Our experiments on CIFAR-10 as well as Fraunhofer’s IDMT-SMT-Audio-Effects dataset indicate that, at the right quality level, perceptual compression is actually not harmful but contributes to a significant reduction of complexity of the machine learning process. That is, our noise quantification method can be used to speed up the training of deep learning classifiers significantly while maintaining, or sometimes even improving, overall classification accuracy. Moreover, our results provide insights into the reasons for the success of deep learning.
Tasks
Published	2018-07-10
URL	http://arxiv.org/abs/1807.10569v1
PDF	http://arxiv.org/pdf/1807.10569v1.pdf
PWC	https://paperswithcode.com/paper/the-helmholtz-method-using-perceptual
Repo	https://github.com/wangjksjtu/Helmholtz-DL
Framework	tf

TVQA: Localized, Compositional Video Question Answering


Title	TVQA: Localized, Compositional Video Question Answering
Authors	Jie Lei, Licheng Yu, Mohit Bansal, Tamara L. Berg
Abstract	Recent years have witnessed an increasing interest in image-based question-answering (QA) tasks. However, due to data limitations, there has been much less work on video-based QA. In this paper, we present TVQA, a large-scale video QA dataset based on 6 popular TV shows. TVQA consists of 152,545 QA pairs from 21,793 clips, spanning over 460 hours of video. Questions are designed to be compositional in nature, requiring systems to jointly localize relevant moments within a clip, comprehend subtitle-based dialogue, and recognize relevant visual concepts. We provide analyses of this new dataset as well as several baselines and a multi-stream end-to-end trainable neural network framework for the TVQA task. The dataset is publicly available at http://tvqa.cs.unc.edu.
Tasks	Video Question Answering
Published	2018-09-05
URL	https://arxiv.org/abs/1809.01696v2
PDF	https://arxiv.org/pdf/1809.01696v2.pdf
PWC	https://paperswithcode.com/paper/tvqa-localized-compositional-video-question
Repo	https://github.com/jayleicn/TVQA
Framework	pytorch

Weakly- and Semi-Supervised Panoptic Segmentation


Title	Weakly- and Semi-Supervised Panoptic Segmentation
Authors	Qizhu Li, Anurag Arnab, Philip H. S. Torr
Abstract	We present a weakly supervised model that jointly performs both semantic- and instance-segmentation – a particularly relevant problem given the substantial cost of obtaining pixel-perfect annotation for these tasks. In contrast to many popular instance segmentation approaches based on object detectors, our method does not predict any overlapping instances. Moreover, we are able to segment both “thing” and “stuff” classes, and thus explain all the pixels in the image. “Thing” classes are weakly-supervised with bounding boxes, and “stuff” with image-level tags. We obtain state-of-the-art results on Pascal VOC, for both full and weak supervision (which achieves about 95% of fully-supervised performance). Furthermore, we present the first weakly-supervised results on Cityscapes for both semantic- and instance-segmentation. Finally, we use our weakly supervised framework to analyse the relationship between annotation quality and predictive performance, which is of interest to dataset creators.
Tasks	Instance Segmentation, Panoptic Segmentation, Semantic Segmentation, Weakly-supervised instance segmentation, Weakly-supervised panoptic segmentation, Weakly-Supervised Semantic Segmentation
Published	2018-08-10
URL	http://arxiv.org/abs/1808.03575v3
PDF	http://arxiv.org/pdf/1808.03575v3.pdf
PWC	https://paperswithcode.com/paper/weakly-and-semi-supervised-panoptic
Repo	https://github.com/qizhuli/Weakly-Supervised-Panoptic-Segmentation
Framework	none

Learning State Representations for Query Optimization with Deep Reinforcement Learning


Title	Learning State Representations for Query Optimization with Deep Reinforcement Learning
Authors	Jennifer Ortiz, Magdalena Balazinska, Johannes Gehrke, S. Sathiya Keerthi
Abstract	Deep reinforcement learning is quickly changing the field of artificial intelligence. These models are able to capture a high level understanding of their environment, enabling them to learn difficult dynamic tasks in a variety of domains. In the database field, query optimization remains a difficult problem. Our goal in this work is to explore the capabilities of deep reinforcement learning in the context of query optimization. At each state, we build queries incrementally and encode properties of subqueries through a learned representation. The challenge here lies in the formation of the state transition function, which defines how the current subquery state combines with the next query operation (action) to yield the next state. As a first step in this direction, we focus on the state representation problem and the formation of the state transition function. We describe our approach and show preliminary results. We further discuss how we can use the state representation to improve query optimization using reinforcement learning.
Tasks
Published	2018-03-22
URL	http://arxiv.org/abs/1803.08604v1
PDF	http://arxiv.org/pdf/1803.08604v1.pdf
PWC	https://paperswithcode.com/paper/learning-state-representations-for-query
Repo	https://github.com/jinw18/Readings_MLDB
Framework	none

Learning recurrent dynamics in spiking networks


Title	Learning recurrent dynamics in spiking networks
Authors	Christopher Kim, Carson Chow
Abstract	Spiking activity of neurons engaged in learning and performing a task shows complex spatiotemporal dynamics. While the output of recurrent network models can learn to perform various tasks, the possible range of recurrent dynamics that emerge after learning remains unknown. Here we show that modifying the recurrent connectivity with a recursive least squares algorithm provides sufficient flexibility for synaptic and spiking rate dynamics of spiking networks to produce a wide range of spatiotemporal activity. We apply the training method to learn arbitrary firing patterns, stabilize irregular spiking activity of a balanced network, and reproduce the heterogeneous spiking rate patterns of cortical neurons engaged in motor planning and movement. We identify sufficient conditions for successful learning, characterize two types of learning errors, and assess the network capacity. Our findings show that synaptically-coupled recurrent spiking networks possess a vast computational capability that can support the diverse activity patterns in the brain.
Tasks
Published	2018-03-18
URL	http://arxiv.org/abs/1803.06622v2
PDF	http://arxiv.org/pdf/1803.06622v2.pdf
PWC	https://paperswithcode.com/paper/learning-recurrent-dynamics-in-spiking
Repo	https://github.com/chrismkkim/SpikeLearning
Framework	none
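
The recursive least squares (RLS) algorithm mentioned in the abstract can be illustrated on a toy linear problem. This sketch shows only the generic RLS update for a single weight vector; it does not model spiking neurons or recurrent connectivity, and the target weights, input distribution, and regularization value are assumptions made for the example.

```python
import random


def rls_step(w, P, x, target):
    """Apply one RLS update in place and return the pre-update error."""
    n = len(x)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = 1.0 + sum(x[i] * Px[i] for i in range(n))
    gain = [v / denom for v in Px]
    error = sum(w[i] * x[i] for i in range(n)) - target
    for i in range(n):
        w[i] -= error * gain[i]
        for j in range(n):
            # P stays symmetric, so x^T P equals (P x)^T.
            P[i][j] -= gain[i] * Px[j]
    return error


rng = random.Random(0)
true_w = [0.5, -1.0, 2.0]          # hypothetical target weights
w = [0.0, 0.0, 0.0]
reg = 1e-2                          # assumed regularization strength
P = [[(1.0 / reg) if i == j else 0.0 for j in range(3)] for i in range(3)]

for _ in range(200):
    x = [rng.uniform(-1.0, 1.0) for _ in range(3)]
    target = sum(a * b for a, b in zip(true_w, x))
    rls_step(w, P, x, target)
```

Each step refines both the weights and the running estimate `P` of the inverse input correlation matrix, which is why RLS typically converges in far fewer samples than plain gradient descent on the same problem.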

Learning When to Concentrate or Divert Attention: Self-Adaptive Attention Temperature for Neural Machine Translation


Title	Learning When to Concentrate or Divert Attention: Self-Adaptive Attention Temperature for Neural Machine Translation
Authors	Junyang Lin, Xu Sun, Xuancheng Ren, Muyu Li, Qi Su
Abstract	Most Neural Machine Translation (NMT) models are based on the sequence-to-sequence (Seq2Seq) model with an encoder-decoder framework equipped with the attention mechanism. However, the conventional attention mechanism treats the decoding at each time step equally with the same matrix, which is problematic since the softness of the attention for different types of words (e.g. content words and function words) should differ. Therefore, we propose a new model with a mechanism called Self-Adaptive Control of Temperature (SACT) to control the softness of attention by means of an attention temperature. Experimental results on Chinese-English and English-Vietnamese translation demonstrate that our model outperforms the baseline models, and the analysis and the case study show that our model can attend to the most relevant elements in the source-side contexts and generate high-quality translations.
Tasks	Machine Translation
Published	2018-08-22
URL	http://arxiv.org/abs/1808.07374v2
PDF	http://arxiv.org/pdf/1808.07374v2.pdf
PWC	https://paperswithcode.com/paper/learning-when-to-concentrate-or-divert
Repo	https://github.com/lancopku/SACT
Framework	pytorch
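
The effect of an attention temperature on the softness of attention can be shown with a temperature-scaled softmax. This is a minimal sketch with fixed, hand-picked temperatures and made-up alignment scores; in SACT the temperature is learned and adapted at each decoding step, which is not reproduced here.

```python
import math


def attention_weights(scores, temperature):
    """Softmax over alignment scores divided by a positive temperature."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(weights):
    """Shannon entropy in nats; higher means softer attention."""
    return -sum(p * math.log(p) for p in weights if p > 0.0)


scores = [2.0, 1.0, 0.5, 0.1]               # hypothetical alignment scores
sharp = attention_weights(scores, 0.5)      # low temperature concentrates
soft = attention_weights(scores, 2.0)       # high temperature diverts
```

A low temperature puts most of the weight on the best-matching source position, while a high temperature spreads it across the context, matching the "concentrate or divert" behaviour in the title.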