Paper Group AWR 86
ST-GAN: Spatial Transformer Generative Adversarial Networks for Image Compositing
Title | ST-GAN: Spatial Transformer Generative Adversarial Networks for Image Compositing |
Authors | Chen-Hsuan Lin, Ersin Yumer, Oliver Wang, Eli Shechtman, Simon Lucey |
Abstract | We address the problem of finding realistic geometric corrections to a foreground object such that it appears natural when composited into a background image. To achieve this, we propose a novel Generative Adversarial Network (GAN) architecture that utilizes Spatial Transformer Networks (STNs) as the generator, which we call Spatial Transformer GANs (ST-GANs). ST-GANs seek image realism by operating in the geometric warp parameter space. In particular, we exploit an iterative STN warping scheme and propose a sequential training strategy that achieves better results compared to naive training of a single generator. One of the key advantages of ST-GAN is its applicability to high-resolution images indirectly since the predicted warp parameters are transferable between reference frames. We demonstrate our approach in two applications: (1) visualizing how indoor furniture (e.g. from product images) might be perceived in a room, (2) hallucinating how accessories like glasses would look when matched with real portraits. |
Tasks | |
Published | 2018-03-05 |
URL | http://arxiv.org/abs/1803.01837v1 |
http://arxiv.org/pdf/1803.01837v1.pdf | |
PWC | https://paperswithcode.com/paper/st-gan-spatial-transformer-generative |
Repo | https://github.com/chenhsuanlin/spatial-transformer-GAN |
Framework | tf |
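One property the abstract highlights is that predicted warp parameters are transferable between reference frames, which is what makes ST-GAN applicable to high-resolution images. A minimal pure-Python sketch of why (illustrative matrices and resolutions, not taken from the linked repo): an affine warp estimated in a low-resolution frame maps to the full-resolution frame by conjugating with the scale change between the two frames.

```python
# Sketch: a warp predicted in a 64x64 frame, applied at 512x512 via
# W_hi = S * W_lo * S^{-1}, where S is the change of reference frame.

def matmul3(a, b):
    # 3x3 matrix product on plain nested lists.
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def scale(s):
    # Homogeneous scaling matrix for a resolution change of factor s.
    return [[s, 0.0, 0.0], [0.0, s, 0.0], [0.0, 0.0, 1.0]]

def apply_warp(w, p):
    # Apply an affine warp (bottom row assumed [0, 0, 1]) to a 2D point.
    x, y = p
    return (w[0][0] * x + w[0][1] * y + w[0][2],
            w[1][0] * x + w[1][1] * y + w[1][2])

# Hypothetical warp predicted in the low-resolution frame: shear + shift.
w_lo = [[1.0, 0.1, 2.0],
        [0.0, 1.0, -1.0],
        [0.0, 0.0, 1.0]]

s = 8.0  # 64 -> 512 change of reference frame
w_hi = matmul3(matmul3(scale(s), w_lo), scale(1.0 / s))

# A point and its 8x-scaled counterpart map consistently under both warps.
x_lo = apply_warp(w_lo, (10.0, 20.0))
x_hi = apply_warp(w_hi, (80.0, 160.0))
```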
Learning Speaker Representations with Mutual Information
Title | Learning Speaker Representations with Mutual Information |
Authors | Mirco Ravanelli, Yoshua Bengio |
Abstract | Learning good representations is of crucial importance in deep learning. Mutual Information (MI) or similar measures of statistical dependence are promising tools for learning these representations in an unsupervised way. Even though the mutual information between two random variables is hard to measure directly in high dimensional spaces, some recent studies have shown that an implicit optimization of MI can be achieved with an encoder-discriminator architecture similar to that of Generative Adversarial Networks (GANs). In this work, we learn representations that capture speaker identities by maximizing the mutual information between the encoded representations of chunks of speech randomly sampled from the same sentence. The proposed encoder relies on the SincNet architecture and transforms raw speech waveform into a compact feature vector. The discriminator is fed by either positive samples (of the joint distribution of encoded chunks) or negative samples (from the product of the marginals) and is trained to separate them. We report experiments showing that this approach effectively learns useful speaker representations, leading to promising results on speaker identification and verification tasks. Our experiments consider both unsupervised and semi-supervised settings and compare the performance achieved with different objective functions. |
Tasks | Speaker Identification |
Published | 2018-12-01 |
URL | http://arxiv.org/abs/1812.00271v2 |
http://arxiv.org/pdf/1812.00271v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-speaker-representations-with-mutual |
Repo | https://github.com/Js-Mim/rl_singing_voice |
Framework | pytorch |
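The core of the training signal described above is the contrast between positive samples (two chunks from the same sentence, i.e. the joint distribution) and negative samples (chunks from different sentences, i.e. the product of marginals). A toy sketch of that sampling step, with illustrative names and lists of numbers standing in for waveforms (not from the paper's code):

```python
import random

def sample_chunk(sentence, chunk_len):
    # Draw a random fixed-length chunk from one "waveform" (a plain list).
    start = random.randrange(len(sentence) - chunk_len + 1)
    return sentence[start:start + chunk_len]

def sample_pair(sentences, chunk_len, positive):
    # Positive: both chunks from the same sentence (joint distribution).
    # Negative: chunks from two different sentences (product of marginals).
    if positive:
        s = random.choice(sentences)
        return sample_chunk(s, chunk_len), sample_chunk(s, chunk_len), 1
    s1, s2 = random.sample(sentences, 2)
    return sample_chunk(s1, chunk_len), sample_chunk(s2, chunk_len), 0

# Four toy "sentences"; sentence k holds values in [100*k, 100*k + 49],
# so a chunk's origin is recoverable as value // 100.
sentences = [[i + 100 * k for i in range(50)] for k in range(4)]
random.seed(0)
pos = sample_pair(sentences, chunk_len=10, positive=True)
neg = sample_pair(sentences, chunk_len=10, positive=False)
```

The discriminator in the paper is then trained to tell `pos`-style pairs from `neg`-style pairs, which implicitly maximizes the mutual information between chunk encodings.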
Efficient, Certifiably Optimal Clustering with Applications to Latent Variable Graphical Models
Title | Efficient, Certifiably Optimal Clustering with Applications to Latent Variable Graphical Models |
Authors | Carson Eisenach, Han Liu |
Abstract | Motivated by the task of clustering either $d$ variables or $d$ points into $K$ groups, we investigate efficient algorithms to solve the Peng-Wei (P-W) $K$-means semi-definite programming (SDP) relaxation. The P-W SDP has been shown in the literature to have good statistical properties in a variety of settings, but remains intractable to solve in practice. To this end we propose FORCE, a new algorithm to solve this SDP relaxation. Compared to the naive interior point method, our method reduces the computational complexity of solving the SDP from $\tilde{O}(d^7\log\epsilon^{-1})$ to $\tilde{O}(d^{6}K^{-2}\epsilon^{-1})$ arithmetic operations for an $\epsilon$-optimal solution. Our method combines a primal first-order method with a dual optimality certificate search, which when successful, allows for early termination of the primal method. We show for certain variable clustering problems that, with high probability, FORCE is guaranteed to find the optimal solution to the SDP relaxation and provide a certificate of exact optimality. As verified by our numerical experiments, this allows FORCE to solve the P-W SDP with dimensions in the hundreds in only tens of seconds. For a variation of the P-W SDP where $K$ is not known a priori a slight modification of FORCE reduces the computational complexity of solving this problem as well: from $\tilde{O}(d^7\log\epsilon^{-1})$ using a standard SDP solver to $\tilde{O}(d^{4}\epsilon^{-1})$. |
Tasks | |
Published | 2018-06-01 |
URL | http://arxiv.org/abs/1806.00530v3 |
http://arxiv.org/pdf/1806.00530v3.pdf | |
PWC | https://paperswithcode.com/paper/efficient-certifiably-optimal-clustering-with |
Repo | https://github.com/ceisenach/R_GFORCE |
Framework | none |

A Survey of Learning Causality with Data: Problems and Methods
Title | A Survey of Learning Causality with Data: Problems and Methods |
Authors | Ruocheng Guo, Lu Cheng, Jundong Li, P. Richard Hahn, Huan Liu |
Abstract | The era of big data provides researchers with convenient access to copious data. However, people often have little knowledge about it. The increasing prevalence of big data is challenging the traditional methods of learning causality because they were developed for cases with a limited amount of data and solid prior causal knowledge. This survey aims to close the gap between big data and learning causality with a comprehensive and structured review of traditional and frontier methods and a discussion about some open problems of learning causality. We begin with preliminaries of learning causality. Then we categorize and revisit methods of learning causality for the typical problems and data types. After that, we discuss the connections between learning causality and machine learning. At the end, some open problems are presented to show the great potential of learning causality with data. |
Tasks | |
Published | 2018-09-25 |
URL | http://arxiv.org/abs/1809.09337v3 |
http://arxiv.org/pdf/1809.09337v3.pdf | |
PWC | https://paperswithcode.com/paper/a-survey-of-learning-causality-with-data |
Repo | https://github.com/rguo12/awesome-causality-algorithms |
Framework | none |
SADA: Semantic Adversarial Diagnostic Attacks for Autonomous Applications
Title | SADA: Semantic Adversarial Diagnostic Attacks for Autonomous Applications |
Authors | Abdullah Hamdi, Matthias Müller, Bernard Ghanem |
Abstract | One major factor impeding more widespread adoption of deep neural networks (DNNs) is their lack of robustness, which is essential for safety-critical applications such as autonomous driving. This has motivated much recent work on adversarial attacks for DNNs, which mostly focus on pixel-level perturbations void of semantic meaning. In contrast, we present a general framework for adversarial attacks on trained agents, which covers semantic perturbations to the environment of the agent performing the task as well as pixel-level attacks. To do this, we re-frame the adversarial attack problem as learning a distribution of parameters that always fools the agent. In the semantic case, our proposed adversary (denoted as BBGAN) is trained to sample parameters that describe the environment with which the black-box agent interacts, such that the agent performs its dedicated task poorly in this environment. We apply BBGAN on three different tasks, primarily targeting aspects of autonomous navigation: object detection, self-driving, and autonomous UAV racing. On these tasks, BBGAN can generate failure cases that consistently fool a trained agent. |
Tasks | Adversarial Attack, Autonomous Driving, Autonomous Navigation, Object Detection |
Published | 2018-12-05 |
URL | https://arxiv.org/abs/1812.02132v3 |
https://arxiv.org/pdf/1812.02132v3.pdf | |
PWC | https://paperswithcode.com/paper/sada-semantic-adversarial-diagnostic-attacks |
Repo | https://github.com/ajhamdi/SADA |
Framework | tf |
Automatic Induction of Neural Network Decision Tree Algorithms
Title | Automatic Induction of Neural Network Decision Tree Algorithms |
Authors | Chapman Siu |
Abstract | This work presents an approach to automatically inducing non-greedy decision trees constructed from a neural network architecture. This construction can be used to transfer weights when growing or pruning a decision tree, allowing non-greedy decision tree algorithms to automatically learn and adapt to the ideal architecture. In this work, we examine the underpinning ideas within ensemble modelling and Bayesian model averaging which allow our neural network to asymptotically approach the ideal architecture through weight transfer. Experimental results demonstrate that this approach improves models over a fixed set of hyperparameters for both decision tree models and decision forest models. |
Tasks | |
Published | 2018-11-26 |
URL | http://arxiv.org/abs/1811.10735v4 |
http://arxiv.org/pdf/1811.10735v4.pdf | |
PWC | https://paperswithcode.com/paper/automatic-induction-of-neural-network |
Repo | https://github.com/chappers/automatic-induction-neural-decision-tree |
Framework | tf |
MobileNetV2: Inverted Residuals and Linear Bottlenecks
Title | MobileNetV2: Inverted Residuals and Linear Bottlenecks |
Authors | Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen |
Abstract | In this paper we describe a new mobile architecture, MobileNetV2, that improves the state-of-the-art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes. We also describe efficient ways of applying these mobile models to object detection in a novel framework we call SSDLite. Additionally, we demonstrate how to build mobile semantic segmentation models through a reduced form of DeepLabv3 which we call Mobile DeepLabv3. The MobileNetV2 architecture is based on an inverted residual structure where the input and output of the residual block are thin bottleneck layers, opposite to traditional residual models which use expanded representations in the input. MobileNetV2 uses lightweight depthwise convolutions to filter features in the intermediate expansion layer. Additionally, we find that it is important to remove non-linearities in the narrow layers in order to maintain representational power. We demonstrate that this improves performance and provide an intuition that led to this design. Finally, our approach allows decoupling of the input/output domains from the expressiveness of the transformation, which provides a convenient framework for further analysis. We measure our performance on ImageNet classification, COCO object detection, and VOC image segmentation. We evaluate the trade-offs between accuracy and number of operations measured by multiply-adds (MAdds), as well as the number of parameters. |
Tasks | Image Classification, Object Detection, Semantic Segmentation |
Published | 2018-01-13 |
URL | http://arxiv.org/abs/1801.04381v4 |
http://arxiv.org/pdf/1801.04381v4.pdf | |
PWC | https://paperswithcode.com/paper/mobilenetv2-inverted-residuals-and-linear |
Repo | https://github.com/zym1119/MobileNetV2_pytorch_cifar |
Framework | pytorch |
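A back-of-envelope sketch of the inverted residual block described above (assumed shapes, not the reference implementation): a 1x1 pointwise expansion to `t * c_in` channels, a 3x3 depthwise convolution over the expanded features, then a linear 1x1 projection back down to `c_out` channels. Counting weights shows where the savings over a full convolution on the expanded representation come from.

```python
def inverted_residual_params(c_in, c_out, t, k=3):
    # Weight count of one inverted residual block (biases/BN ignored).
    expanded = t * c_in
    expand = c_in * expanded        # 1x1 pointwise expansion
    depthwise = k * k * expanded    # kxk depthwise: one filter per channel
    project = expanded * c_out      # linear 1x1 bottleneck projection
    return expand + depthwise + project

def full_kxk_params(channels, k=3):
    # A full kxk convolution over the same expanded channels, for contrast.
    return k * k * channels * channels

# Illustrative config: 64 -> 64 channels with expansion factor t = 6.
print(inverted_residual_params(64, 64, t=6))  # 52,608 weights
print(full_kxk_params(6 * 64))                # 1,327,104 for a full 3x3
```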
A Biologically Plausible Learning Rule for Deep Learning in the Brain
Title | A Biologically Plausible Learning Rule for Deep Learning in the Brain |
Authors | Isabella Pozzi, Sander Bohté, Pieter Roelfsema |
Abstract | Researchers have proposed that deep learning, which is providing important progress in a wide range of high complexity tasks, might inspire new insights into learning in the brain. However, the methods used for deep learning by artificial neural networks are biologically unrealistic and would need to be replaced by biologically realistic counterparts. Previous biologically plausible reinforcement learning rules, like AGREL and AuGMEnT, showed promising results but focused on shallow networks with three layers. Will these learning rules also generalize to networks with more layers and can they handle tasks of higher complexity? We demonstrate the learning scheme on classical and hard image-classification benchmarks, namely MNIST, CIFAR10 and CIFAR100, cast as direct reward tasks, both for fully connected, convolutional and locally connected architectures. We show that our learning rule - Q-AGREL - performs comparably to supervised learning via error-backpropagation, with this type of trial-and-error reinforcement learning requiring only 1.5-2.5 times more epochs, even when classifying 100 different classes as in CIFAR100. Our results provide new insights into how deep learning may be implemented in the brain. |
Tasks | Image Classification |
Published | 2018-11-05 |
URL | https://arxiv.org/abs/1811.01768v3 |
https://arxiv.org/pdf/1811.01768v3.pdf | |
PWC | https://paperswithcode.com/paper/a-biologically-plausible-learning-rule-for |
Repo | https://github.com/csxeba/Reproducing-Q-AGREL |
Framework | none |
Deeply Supervised Rotation Equivariant Network for Lesion Segmentation in Dermoscopy Images
Title | Deeply Supervised Rotation Equivariant Network for Lesion Segmentation in Dermoscopy Images |
Authors | Xiaomeng Li, Lequan Yu, Chi-Wing Fu, Pheng-Ann Heng |
Abstract | Automatic lesion segmentation in dermoscopy images is an essential step for computer-aided diagnosis of melanoma. Dermoscopy images exhibit rotational and reflectional symmetry; however, this geometric property has not been encoded in state-of-the-art convolutional neural network based skin lesion segmentation methods. In this paper, we present a deeply supervised rotation equivariant network for skin lesion segmentation by extending the recent group rotation equivariant network~\cite{cohen2016group}. Specifically, we propose the G-upsampling and G-projection operations to adapt the rotation equivariant classification network for our skin lesion segmentation problem. To further increase the performance, we integrate the deep supervision scheme into our proposed rotation equivariant segmentation architecture. The whole framework is equivariant to input transformations, including rotation and reflection, which improves the network efficiency and thus contributes to the segmentation performance. We extensively evaluate our method on the ISIC 2017 skin lesion challenge dataset. The experimental results show that our rotation equivariant networks consistently outperform their regular counterparts with the same model complexity under different experimental settings. Our best model achieves 77.23% (JA) on the test dataset, outperforming the state-of-the-art methods and further demonstrating the effectiveness of our proposed deeply supervised rotation equivariant segmentation network. |
Tasks | Lesion Segmentation |
Published | 2018-07-08 |
URL | http://arxiv.org/abs/1807.02804v1 |
http://arxiv.org/pdf/1807.02804v1.pdf | |
PWC | https://paperswithcode.com/paper/deeply-supervised-rotation-equivariant |
Repo | https://github.com/xmengli999/Deeply-Supervised-Rotation-Equivariant-Network-for-Lesion-Segmentation |
Framework | pytorch |
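The symmetry exploited above can be seen in a minimal pure-Python sketch (illustrative, unrelated to the paper's G-CNN code): pooling any feature over the 4-fold rotation group C4 yields an output that is invariant to rotating the input image, which is the group-theoretic idea behind rotation equivariant networks.

```python
def rot90(grid):
    # Rotate a square grid (list of lists) by 90 degrees.
    n = len(grid)
    return [[grid[n - 1 - j][i] for j in range(n)] for i in range(n)]

def feature(grid):
    # Any non-symmetric scalar feature of the image (row-weighted sum).
    return sum(i * v for i, row in enumerate(grid) for v in row)

def group_pooled_feature(grid):
    # Evaluate the feature on the whole C4 orbit and pool with max:
    # the result cannot change when the input is rotated.
    vals = []
    for _ in range(4):
        vals.append(feature(grid))
        grid = rot90(grid)
    return max(vals)

img = [[1, 2, 0],
       [0, 5, 3],
       [4, 0, 1]]
print(group_pooled_feature(img), group_pooled_feature(rot90(img)))
```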
The Helmholtz Method: Using Perceptual Compression to Reduce Machine Learning Complexity
Title | The Helmholtz Method: Using Perceptual Compression to Reduce Machine Learning Complexity |
Authors | Gerald Friedland, Jingkang Wang, Ruoxi Jia, Bo Li |
Abstract | This paper proposes a fundamental answer to a frequently asked question in multimedia computing and machine learning: Do artifacts from perceptual compression contribute to error in the machine learning process and if so, how much? Our approach to the problem is a reinterpretation of the Helmholtz Free Energy formula from physics to explain the relationship between content and noise when using sensors (such as cameras or microphones) to capture multimedia data. The reinterpretation allows a bit-measurement of the noise contained in images, audio, and video by combining a classifier with perceptual compression, such as JPEG or MP3. Our experiments on CIFAR-10 as well as Fraunhofer’s IDMT-SMT-Audio-Effects dataset indicate that, at the right quality level, perceptual compression is actually not harmful but contributes to a significant reduction of complexity of the machine learning process. That is, our noise quantification method can be used to speed up the training of deep learning classifiers significantly while maintaining, or sometimes even improving, overall classification accuracy. Moreover, our results provide insights into the reasons for the success of deep learning. |
Tasks | |
Published | 2018-07-10 |
URL | http://arxiv.org/abs/1807.10569v1 |
http://arxiv.org/pdf/1807.10569v1.pdf | |
PWC | https://paperswithcode.com/paper/the-helmholtz-method-using-perceptual |
Repo | https://github.com/wangjksjtu/Helmholtz-DL |
Framework | tf |
TVQA: Localized, Compositional Video Question Answering
Title | TVQA: Localized, Compositional Video Question Answering |
Authors | Jie Lei, Licheng Yu, Mohit Bansal, Tamara L. Berg |
Abstract | Recent years have witnessed an increasing interest in image-based question-answering (QA) tasks. However, due to data limitations, there has been much less work on video-based QA. In this paper, we present TVQA, a large-scale video QA dataset based on 6 popular TV shows. TVQA consists of 152,545 QA pairs from 21,793 clips, spanning over 460 hours of video. Questions are designed to be compositional in nature, requiring systems to jointly localize relevant moments within a clip, comprehend subtitle-based dialogue, and recognize relevant visual concepts. We provide analyses of this new dataset as well as several baselines and a multi-stream end-to-end trainable neural network framework for the TVQA task. The dataset is publicly available at http://tvqa.cs.unc.edu. |
Tasks | Video Question Answering |
Published | 2018-09-05 |
URL | https://arxiv.org/abs/1809.01696v2 |
https://arxiv.org/pdf/1809.01696v2.pdf | |
PWC | https://paperswithcode.com/paper/tvqa-localized-compositional-video-question |
Repo | https://github.com/jayleicn/TVQA |
Framework | pytorch |
Weakly- and Semi-Supervised Panoptic Segmentation
Title | Weakly- and Semi-Supervised Panoptic Segmentation |
Authors | Qizhu Li, Anurag Arnab, Philip H. S. Torr |
Abstract | We present a weakly supervised model that jointly performs both semantic- and instance-segmentation – a particularly relevant problem given the substantial cost of obtaining pixel-perfect annotation for these tasks. In contrast to many popular instance segmentation approaches based on object detectors, our method does not predict any overlapping instances. Moreover, we are able to segment both “thing” and “stuff” classes, and thus explain all the pixels in the image. “Thing” classes are weakly-supervised with bounding boxes, and “stuff” with image-level tags. We obtain state-of-the-art results on Pascal VOC, for both full and weak supervision (which achieves about 95% of fully-supervised performance). Furthermore, we present the first weakly-supervised results on Cityscapes for both semantic- and instance-segmentation. Finally, we use our weakly supervised framework to analyse the relationship between annotation quality and predictive performance, which is of interest to dataset creators. |
Tasks | Instance Segmentation, Panoptic Segmentation, Semantic Segmentation, Weakly-supervised instance segmentation, Weakly-supervised panoptic segmentation, Weakly-Supervised Semantic Segmentation |
Published | 2018-08-10 |
URL | http://arxiv.org/abs/1808.03575v3 |
http://arxiv.org/pdf/1808.03575v3.pdf | |
PWC | https://paperswithcode.com/paper/weakly-and-semi-supervised-panoptic |
Repo | https://github.com/qizhuli/Weakly-Supervised-Panoptic-Segmentation |
Framework | none |
Learning State Representations for Query Optimization with Deep Reinforcement Learning
Title | Learning State Representations for Query Optimization with Deep Reinforcement Learning |
Authors | Jennifer Ortiz, Magdalena Balazinska, Johannes Gehrke, S. Sathiya Keerthi |
Abstract | Deep reinforcement learning is quickly changing the field of artificial intelligence. These models are able to capture a high level understanding of their environment, enabling them to learn difficult dynamic tasks in a variety of domains. In the database field, query optimization remains a difficult problem. Our goal in this work is to explore the capabilities of deep reinforcement learning in the context of query optimization. At each state, we build queries incrementally and encode properties of subqueries through a learned representation. The challenge here lies in the formation of the state transition function, which defines how the current subquery state combines with the next query operation (action) to yield the next state. As a first step in this direction, we focus on the state representation problem and the formation of the state transition function. We describe our approach and show preliminary results. We further discuss how we can use the state representation to improve query optimization using reinforcement learning. |
Tasks | |
Published | 2018-03-22 |
URL | http://arxiv.org/abs/1803.08604v1 |
http://arxiv.org/pdf/1803.08604v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-state-representations-for-query |
Repo | https://github.com/jinw18/Readings_MLDB |
Framework | none |
Learning recurrent dynamics in spiking networks
Title | Learning recurrent dynamics in spiking networks |
Authors | Christopher Kim, Carson Chow |
Abstract | Spiking activity of neurons engaged in learning and performing a task show complex spatiotemporal dynamics. While the output of recurrent network models can learn to perform various tasks, the possible range of recurrent dynamics that emerge after learning remains unknown. Here we show that modifying the recurrent connectivity with a recursive least squares algorithm provides sufficient flexibility for synaptic and spiking rate dynamics of spiking networks to produce a wide range of spatiotemporal activity. We apply the training method to learn arbitrary firing patterns, stabilize irregular spiking activity of a balanced network, and reproduce the heterogeneous spiking rate patterns of cortical neurons engaged in motor planning and movement. We identify sufficient conditions for successful learning, characterize two types of learning errors, and assess the network capacity. Our findings show that synaptically-coupled recurrent spiking networks possess a vast computational capability that can support the diverse activity patterns in the brain. |
Tasks | |
Published | 2018-03-18 |
URL | http://arxiv.org/abs/1803.06622v2 |
http://arxiv.org/pdf/1803.06622v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-recurrent-dynamics-in-spiking |
Repo | https://github.com/chrismkkim/SpikeLearning |
Framework | none |
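The recursive least squares (RLS) modification of recurrent connectivity mentioned above has a compact core update. A schematic version with toy dimensions and plain lists (the paper applies this to spiking-network synapses; variable names here are illustrative):

```python
def rls_step(w, P, r, target):
    # One RLS update: w are readout weights, P is the running inverse
    # correlation matrix of the rate vector r, target is the desired output.
    n = len(r)
    Pr = [sum(P[i][j] * r[j] for j in range(n)) for i in range(n)]
    k = 1.0 + sum(r[i] * Pr[i] for i in range(n))
    err = sum(w[i] * r[i] for i in range(n)) - target  # a priori error
    w = [w[i] - err * Pr[i] / k for i in range(n)]
    P = [[P[i][j] - Pr[i] * Pr[j] / k for j in range(n)] for i in range(n)]
    return w, P, err

# Repeatedly fitting a fixed target from a fixed rate vector: the error
# should shrink monotonically as P and w adapt.
w = [0.0, 0.0, 0.0]
P = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
r = [0.5, -0.2, 0.8]
errs = []
for _ in range(5):
    w, P, e = rls_step(w, P, r, target=1.0)
    errs.append(abs(e))
print(errs)
```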
Learning When to Concentrate or Divert Attention: Self-Adaptive Attention Temperature for Neural Machine Translation
Title | Learning When to Concentrate or Divert Attention: Self-Adaptive Attention Temperature for Neural Machine Translation |
Authors | Junyang Lin, Xu Sun, Xuancheng Ren, Muyu Li, Qi Su |
Abstract | Most of the Neural Machine Translation (NMT) models are based on the sequence-to-sequence (Seq2Seq) model with an encoder-decoder framework equipped with the attention mechanism. However, the conventional attention mechanism treats the decoding at each time step equally with the same matrix, which is problematic since the softness of the attention for different types of words (e.g. content words and function words) should differ. Therefore, we propose a new model with a mechanism called Self-Adaptive Control of Temperature (SACT) to control the softness of attention by means of an attention temperature. Experimental results on the Chinese-English translation and English-Vietnamese translation demonstrate that our model outperforms the baseline models, and the analysis and the case study show that our model can attend to the most relevant elements in the source-side contexts and generate the translation of high quality. |
Tasks | Machine Translation |
Published | 2018-08-22 |
URL | http://arxiv.org/abs/1808.07374v2 |
http://arxiv.org/pdf/1808.07374v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-when-to-concentrate-or-divert |
Repo | https://github.com/lancopku/SACT |
Framework | pytorch |
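The temperature mechanism described above has a simple one-line effect on the attention distribution. A toy sketch with made-up scores (not the SACT implementation): dividing attention scores by a larger temperature flattens the distribution (diverted attention), while a smaller temperature sharpens it (concentrated attention).

```python
import math

def attention_weights(scores, temperature=1.0):
    # Softmax over scores scaled by a temperature: higher temperature
    # yields a softer (more uniform) attention distribution.
    exps = [math.exp(s / temperature) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

scores = [2.0, 1.0, 0.1]
sharp = attention_weights(scores, temperature=0.5)
soft = attention_weights(scores, temperature=2.0)
print(max(sharp), max(soft))  # the low-temperature distribution is more peaked
```

SACT's contribution is to predict this temperature per decoding step, so the model can choose how soft its attention should be for each generated word.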