October 20, 2019

2972 words · 14 min read

Paper Group AWR 295

Random mesh projectors for inverse problems. Learning to Generate Wikipedia Summaries for Underserved Languages from Wikidata. Generative Adversarial Network Architectures For Image Synthesis Using Capsule Networks. Joint Unsupervised Learning of Optical Flow and Depth by Watching Stereo Videos. Training Generative Adversarial Networks Via Turing Test …

Random mesh projectors for inverse problems

Title Random mesh projectors for inverse problems
Authors Sidharth Gupta, Konik Kothari, Maarten V. de Hoop, Ivan Dokmanić
Abstract We propose a new learning-based approach to solve ill-posed inverse problems in imaging. We address the case where ground truth training samples are rare and the problem is severely ill-posed, both because of the underlying physics and because we can only obtain a few measurements. This setting is common in geophysical imaging and remote sensing. We show that in this case the common approach of directly learning the mapping from the measured data to the reconstruction becomes unstable. Instead, we propose to first learn an ensemble of simpler mappings from the data to projections of the unknown image into random piecewise-constant subspaces. We then combine the projections to form a final reconstruction by solving a deconvolution-like problem. We show experimentally that the proposed method is more robust to measurement noise and to corruptions not seen during training than a directly learned inverse.
Tasks
Published 2018-05-29
URL http://arxiv.org/abs/1805.11718v3
PDF http://arxiv.org/pdf/1805.11718v3.pdf
PWC https://paperswithcode.com/paper/random-mesh-projectors-for-inverse-problems
Repo https://github.com/swing-research/deepmesh
Framework none
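The core idea above, projecting the unknown image onto random piecewise-constant subspaces and fusing the projections, can be illustrated with a minimal NumPy sketch. It assumes Voronoi-style random cells and plain averaging in place of the paper's deconvolution step; all names are illustrative, not taken from the deepmesh repository. In the actual method each projection is predicted from the measurements by a simple learned mapping rather than computed from the image itself.

```python
import numpy as np

def random_partition(shape, n_cells, rng):
    """Label each pixel with its nearest random seed point, giving a
    random piecewise-constant (Voronoi-like) partition of the image."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    pixels = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    seeds = rng.uniform(0, shape, size=(n_cells, 2))
    d2 = ((pixels[:, None, :] - seeds[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1).reshape(shape)

def project(image, labels):
    """Orthogonal projection onto images constant on each cell:
    replace every cell by its mean value."""
    out = np.empty_like(image, dtype=float)
    for c in np.unique(labels):
        out[labels == c] = image[labels == c].mean()
    return out

rng = np.random.default_rng(0)
image = rng.normal(size=(64, 64))
projections = [project(image, random_partition(image.shape, 50, rng))
               for _ in range(10)]
combined = np.mean(projections, axis=0)  # crude stand-in for the final fusion step
```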

Learning to Generate Wikipedia Summaries for Underserved Languages from Wikidata

Title Learning to Generate Wikipedia Summaries for Underserved Languages from Wikidata
Authors Lucie-Aimée Kaffee, Hady Elsahar, Pavlos Vougiouklis, Christophe Gravier, Frédérique Laforest, Jonathon Hare, Elena Simperl
Abstract While Wikipedia exists in 287 languages, its content is unevenly distributed among them. In this work, we investigate the generation of open-domain Wikipedia summaries in underserved languages using structured data from Wikidata. To this end, we propose a neural network architecture equipped with copy actions that learns to generate single-sentence, comprehensible textual summaries from Wikidata triples. We demonstrate the effectiveness of the proposed approach by evaluating it against a set of baselines on two languages of different natures: Arabic, a morphologically rich language with a larger vocabulary than English, and Esperanto, a constructed language known for its ease of acquisition.
Tasks
Published 2018-03-19
URL http://arxiv.org/abs/1803.07116v2
PDF http://arxiv.org/pdf/1803.07116v2.pdf
PWC https://paperswithcode.com/paper/learning-to-generate-wikipedia-summaries-for
Repo https://github.com/pvougiou/Wikidata2Wikipedia
Framework none
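To make the input side concrete, here is a hedged Python sketch of how Wikidata triples about an entity could be linearized into a sequence for such an encoder-decoder. The serialization format, the `<sep>` token, and the `[[...]]` copy placeholder are assumptions for illustration, not the paper's exact scheme.

```python
# Sketch: linearize Wikidata triples into an encoder input sequence.
triples = [
    ("Q42", "P31", "Q5"),        # Douglas Adams -- instance of -- human
    ("Q42", "P106", "Q36180"),   # Douglas Adams -- occupation -- writer
]
labels = {"Q42": "Douglas Adams", "P31": "instance of",
          "P106": "occupation", "Q5": "human", "Q36180": "writer"}

def linearize(triples, labels):
    tokens = []
    for s, p, o in triples:
        # rare entities can stay as copyable placeholders, so the decoder's
        # copy actions can emit them verbatim in any target language
        tokens += [labels[s], labels[p], labels.get(o, f"[[{o}]]")]
    return " <sep> ".join(tokens)

print(linearize(triples, labels))
```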

Generative Adversarial Network Architectures For Image Synthesis Using Capsule Networks

Title Generative Adversarial Network Architectures For Image Synthesis Using Capsule Networks
Authors Yash Upadhyay, Paul Schrater
Abstract In this paper, we propose Generative Adversarial Network (GAN) architectures that use Capsule Networks for image synthesis. Based on the principle of positional equivariance of features, a Capsule Network's ability to encode spatial relationships between the features of an image helps it become a more powerful critic than the Convolutional Neural Networks (CNNs) used in current architectures for image synthesis. Our proposed GAN architectures learn the data manifold much faster and therefore synthesize visually accurate images with significantly fewer training samples and training epochs than GANs and their variants that use CNNs. Apart from analyzing the quantitative results for the images generated by different architectures, we also explore the reasons for the lower coverage and diversity of the GAN architectures that use CNN critics.
Tasks Image Generation
Published 2018-06-11
URL http://arxiv.org/abs/1806.03796v4
PDF http://arxiv.org/pdf/1806.03796v4.pdf
PWC https://paperswithcode.com/paper/generative-adversarial-network-architectures
Repo https://github.com/yash-1995-2006/Conditional-and-nonConditional-Capsule-GANs
Framework tf
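As a concrete illustration of the capsule ingredients such a critic relies on, below is a minimal PyTorch sketch: the squash nonlinearity and a toy discriminator whose final capsule length serves as the realness score. Layer sizes assume 28x28 grayscale inputs and dynamic routing is omitted entirely, so this is a simplification of the architectures in the paper, not a reproduction.

```python
import torch
import torch.nn as nn

def squash(s, dim=-1, eps=1e-8):
    """Capsule squashing: preserves orientation, maps length into [0, 1)."""
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + eps)

class CapsuleCritic(nn.Module):
    """Toy capsule-style discriminator: conv features -> primary capsules
    -> a single output capsule whose length scores realness."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 32, kernel_size=9)                 # (B,32,20,20)
        self.primary = nn.Conv2d(32, 32, kernel_size=9, stride=2)   # (B,32,6,6)
        self.readout = nn.Linear(32 * 6 * 6, 16)

    def forward(self, x):
        h = torch.relu(self.conv(x))
        u = self.primary(h).view(x.size(0), -1, 4)  # primary capsules of dim 4
        u = squash(u)
        v = squash(self.readout(u.flatten(1)))      # output capsule, dim 16
        return v.norm(dim=-1)                       # length in (0,1): realness

score = CapsuleCritic()(torch.randn(2, 1, 28, 28))
```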

Joint Unsupervised Learning of Optical Flow and Depth by Watching Stereo Videos

Title Joint Unsupervised Learning of Optical Flow and Depth by Watching Stereo Videos
Authors Yang Wang, Zhenheng Yang, Peng Wang, Yi Yang, Chenxu Luo, Wei Xu
Abstract Learning depth and optical flow via deep neural networks by watching videos has made significant progress recently. In this paper, we jointly solve the two tasks by exploiting the underlying geometric rules within stereo videos. Specifically, given two consecutive stereo image pairs from a video, we first estimate depth, camera ego-motion and optical flow from three neural networks. Then the whole scene is decomposed into moving foreground and static background by comparing the estimated optical flow and the rigid flow derived from the depth and ego-motion. We propose a novel consistency loss to let the optical flow learn from the more accurate rigid flow in static regions. We also design a rigid alignment module which helps refine ego-motion estimation by using the estimated depth and optical flow. Experiments on the KITTI dataset show that our results significantly outperform other state-of-the-art algorithms. Source code can be found at https://github.com/baidu-research/UnDepthflow
Tasks Motion Estimation, Optical Flow Estimation
Published 2018-10-08
URL http://arxiv.org/abs/1810.03654v1
PDF http://arxiv.org/pdf/1810.03654v1.pdf
PWC https://paperswithcode.com/paper/joint-unsupervised-learning-of-optical-flow
Repo https://github.com/baidu-research/UnDepthflow
Framework tf
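A heavily simplified PyTorch sketch of the rigid-flow consistency idea follows, assuming known camera intrinsics K and a 4x4 ego-motion matrix T. The static/moving decomposition here is a crude per-pixel threshold, whereas the paper compares the estimated optical flow against the rigid flow over the whole scene and adds a rigid alignment module; nothing below is taken from the UnDepthflow code.

```python
import torch

def rigid_flow(depth, K, T):
    """Flow induced by camera motion alone: back-project pixels with depth,
    transform by ego-motion T (4x4), re-project with intrinsics K (3x3)."""
    H, W = depth.shape
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).reshape(3, -1)
    cam = torch.linalg.inv(K) @ pix * depth.reshape(1, -1)   # 3D points
    cam2 = (T[:3, :3] @ cam) + T[:3, 3:4]                    # points after motion
    proj = K @ cam2
    proj = proj[:2] / proj[2:].clamp(min=1e-6)
    return (proj - pix[:2]).reshape(2, H, W)

def consistency_loss(optical_flow, rigid, thresh=1.0):
    """In static regions (where the two flows roughly agree), pull the
    learned optical flow toward the more accurate rigid flow."""
    err = (optical_flow - rigid).norm(dim=0)
    static = (err < thresh).float()
    return (static * err).sum() / static.sum().clamp(min=1.0)

K = torch.tensor([[100.0, 0, 32], [0, 100.0, 24], [0, 0, 1]])
T = torch.eye(4); T[0, 3] = 0.1               # small sideways translation
flow_rigid = rigid_flow(torch.full((48, 64), 5.0), K, T)
```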

Training Generative Adversarial Networks Via Turing Test

Title Training Generative Adversarial Networks Via Turing Test
Authors Jianlin Su
Abstract In this article, we introduce a new mode for training Generative Adversarial Networks (GANs). Rather than minimizing the distance between the evidence distribution $\tilde{p}(x)$ and the generative distribution $q(x)$, we minimize the distance between $\tilde{p}(x_r)q(x_f)$ and $\tilde{p}(x_f)q(x_r)$. This adversarial pattern can be interpreted as a Turing test in GANs. It allows the generator to use information from real samples during training and accelerates the whole training procedure. We even find that, by simply increasing the size of the discriminator and generator proportionally, the method succeeds at 256x256 resolution without careful hyperparameter tuning.
Tasks
Published 2018-10-25
URL http://arxiv.org/abs/1810.10948v2
PDF http://arxiv.org/pdf/1810.10948v2.pdf
PWC https://paperswithcode.com/paper/training-generative-adversarial-networks-via
Repo https://github.com/bojone/T-GANs
Framework tf
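The pairwise objective can be prototyped on a toy 1-D distribution: the discriminator receives an ordered pair and learns to label (real, fake) as 1 and (fake, real) as 0, while the generator tries to make the pairs indistinguishable. This sketch is one minimal reading of the objective, not the T-GANs reference code; the network and batch sizes are illustrative.

```python
import torch
import torch.nn as nn

D = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))  # pair critic
G = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1))  # noise -> sample
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    x_r = torch.randn(128, 1) * 0.5 + 2.0    # "real" data: N(2, 0.25)
    x_f = G(torch.randn(128, 4))             # fake samples
    # discriminator: (real, fake) labeled 1, (fake, real) labeled 0
    d_loss = bce(D(torch.cat([x_r, x_f.detach()], 1)), torch.ones(128, 1)) + \
             bce(D(torch.cat([x_f.detach(), x_r], 1)), torch.zeros(128, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # generator: fool D into swapping the pair; note that real samples
    # participate directly in the generator's loss
    g_loss = bce(D(torch.cat([x_f, x_r], 1)), torch.ones(128, 1)) + \
             bce(D(torch.cat([x_r, x_f], 1)), torch.zeros(128, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```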

Multi-component Image Translation for Deep Domain Generalization

Title Multi-component Image Translation for Deep Domain Generalization
Authors Mohammad Mahfujur Rahman, Clinton Fookes, Mahsa Baktashmotlagh, Sridha Sridharan
Abstract Domain adaptation (DA) and domain generalization (DG) are two closely related approaches, both concerned with the task of assigning labels to an unlabeled data set. The only difference between them is that DA can access the target data during the training phase, whereas in DG the target data remain entirely unseen during training. The task of DG is challenging because we have no prior knowledge of the target samples. If DA methods are applied directly to DG by simply excluding the target data from training, poor performance results. In this paper, we tackle the domain generalization challenge in two ways. In our first approach, we propose a novel deep domain generalization architecture utilizing synthetic data generated by a Generative Adversarial Network (GAN). The discrepancy between the generated images and synthetic images is minimized using existing domain discrepancy metrics such as maximum mean discrepancy or correlation alignment. In our second approach, we introduce a protocol for applying DA methods to a DG scenario by excluding the target data from the training phase, splitting the source data into training and validation parts, and treating the validation data as target data for DA. We conduct extensive experiments on four cross-domain benchmark datasets. Experimental results show that our proposed model outperforms the current state-of-the-art methods for DG.
Tasks Domain Adaptation, Domain Generalization
Published 2018-12-21
URL http://arxiv.org/abs/1812.08974v1
PDF http://arxiv.org/pdf/1812.08974v1.pdf
PWC https://paperswithcode.com/paper/multi-component-image-translation-for-deep
Repo https://github.com/mahfujur1/Domain-Adaptation-Papers-and-Codes
Framework none
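For reference, the maximum mean discrepancy minimized in the first approach can be written down in a few lines. Below is a simple (biased) multi-kernel RBF estimate in PyTorch; the bandwidths are chosen arbitrarily, and the feature tensors are random placeholders.

```python
import torch

def mmd_rbf(x, y, sigmas=(1.0, 2.0, 4.0)):
    """Biased multi-kernel MMD^2 estimate between two feature batches."""
    def k(a, b):
        d2 = torch.cdist(a, b) ** 2
        return sum(torch.exp(-d2 / (2 * s ** 2)) for s in sigmas)
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

real_feats = torch.randn(64, 128)       # features of source-domain images
gen_feats = torch.randn(64, 128)        # features of GAN-translated images
loss = mmd_rbf(real_feats, gen_feats)   # added to the task loss during training
```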

Robust Place Categorization with Deep Domain Generalization

Title Robust Place Categorization with Deep Domain Generalization
Authors Massimiliano Mancini, Samuel Rota Bulò, Barbara Caputo, Elisa Ricci
Abstract Traditional place categorization approaches in robot vision assume that training and test images have similar visual appearance. Therefore, any seasonal, illumination and environmental changes typically lead to severe degradation in performance. To cope with this problem, recent works have proposed to adopt domain adaptation techniques. While effective, these methods assume that some prior information about the scenario where the robot will operate is available at training time. Unfortunately, in many cases this assumption does not hold, as we often do not know where a robot will be deployed. To overcome this issue, in this paper we present an approach which aims at learning classification models able to generalize to unseen scenarios. Specifically, we propose a novel deep learning framework for domain generalization. Our method develops from the intuition that, given a set of different classification models associated to known domains (e.g. corresponding to multiple environments, robots), the best model for a new sample in the novel domain can be computed directly at test time by optimally combining the known models. To implement our idea, we exploit recent advances in deep domain adaptation and design a Convolutional Neural Network architecture with novel layers performing a weighted version of Batch Normalization. Our experiments, conducted on three common datasets for robot place categorization, confirm the validity of our contribution.
Tasks Domain Adaptation, Domain Generalization
Published 2018-05-30
URL http://arxiv.org/abs/1805.12048v1
PDF http://arxiv.org/pdf/1805.12048v1.pdf
PWC https://paperswithcode.com/paper/robust-place-categorization-with-deep-domain
Repo https://github.com/mancinimassimiliano/caffe
Framework none
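The weighted Batch Normalization idea can be sketched as follows: normalize a test sample's features with a convex combination of per-domain BN statistics, weighted by a predicted domain assignment. This is a minimal sketch assuming the weights are given; the exact mixture variance would also include a between-means term, which this simplification ignores, and none of the names come from the authors' Caffe code.

```python
import torch

def weighted_bn(x, domain_stats, w, eps=1e-5):
    """Normalize x with a convex combination of per-domain BatchNorm stats.
    domain_stats: list of (mean, var) per known domain; w: domain weights."""
    mean = sum(wi * m for wi, (m, _) in zip(w, domain_stats))
    var = sum(wi * v for wi, (_, v) in zip(w, domain_stats))
    return (x - mean) / torch.sqrt(var + eps)

# two known domains with different feature statistics
stats = [(torch.zeros(8), torch.ones(8)),
         (torch.full((8,), 2.0), torch.full((8,), 4.0))]
w = torch.softmax(torch.tensor([0.3, 0.7]), dim=0)  # predicted domain assignment
x_norm = weighted_bn(torch.randn(16, 8), stats, w)
```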

A Bimodal Learning Approach to Assist Multi-sensory Effects Synchronization

Title A Bimodal Learning Approach to Assist Multi-sensory Effects Synchronization
Authors Raphael Abreu, Joel dos Santos, Eduardo Bezerra
Abstract In mulsemedia applications, traditional media content (text, image, audio, video, etc.) can be related to media objects that target other human senses (e.g., smell, haptics, taste). Such applications aim at bridging the virtual and real worlds through sensors and actuators. Actuators are responsible for the execution of sensory effects (e.g., wind, heat, light), which produce sensory stimulations on the users. In these applications, sensory stimulation must happen in a timely manner with respect to the other media content being presented. For example, at the moment an explosion is shown in the audiovisual content, it may be appropriate to activate actuators that produce heat and light. It is common to use a declarative multimedia authoring language to relate the timestamp at which each media object is presented to the execution of some sensory effect. One problem in this setting is that the synchronization of media objects and sensory effects is done manually by the author(s) of the application, a process which is time-consuming and error-prone. In this paper, we present a bimodal neural network architecture to assist the synchronization task in mulsemedia applications. Our approach is based on the idea that audio and video signals can be used simultaneously to identify the timestamps at which some sensory effect should be executed. Our learning architecture combines audio and video signals for the prediction of scene components. For evaluation purposes, we construct a dataset based on Google's AudioSet. We provide experiments to validate our bimodal architecture. Our results show that the bimodal approach produces better results than several variants of unimodal architectures.
Tasks
Published 2018-04-28
URL http://arxiv.org/abs/1804.10822v1
PDF http://arxiv.org/pdf/1804.10822v1.pdf
PWC https://paperswithcode.com/paper/a-bimodal-learning-approach-to-assist-multi
Repo https://github.com/MLRG-CEFET-RJ/bimodal_audioset
Framework tf
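Architecturally, the bimodal combination boils down to two encoder branches fused before a shared classifier. Below is a minimal PyTorch sketch under assumed feature dimensions (128-d audio and 512-d video clip features); the actual model presumably uses convolutional encoders over raw signals rather than the linear stand-ins here.

```python
import torch
import torch.nn as nn

class BimodalNet(nn.Module):
    """Toy two-branch network: separate audio and video encoders whose
    embeddings are concatenated before the scene-component classifier."""
    def __init__(self, n_audio=128, n_video=512, n_classes=10):
        super().__init__()
        self.audio = nn.Sequential(nn.Linear(n_audio, 64), nn.ReLU())
        self.video = nn.Sequential(nn.Linear(n_video, 64), nn.ReLU())
        self.head = nn.Linear(128, n_classes)

    def forward(self, a, v):
        return self.head(torch.cat([self.audio(a), self.video(v)], dim=1))

model = BimodalNet()
logits = model(torch.randn(4, 128), torch.randn(4, 512))  # per-clip prediction
```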

Universal Language Model Fine-tuning for Text Classification

Title Universal Language Model Fine-tuning for Text Classification
Authors Jeremy Howard, Sebastian Ruder
Abstract Inductive transfer learning has greatly impacted computer vision, but existing approaches in NLP still require task-specific modifications and training from scratch. We propose Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduce techniques that are key for fine-tuning a language model. Our method significantly outperforms the state-of-the-art on six text classification tasks, reducing the error by 18-24% on the majority of datasets. Furthermore, with only 100 labeled examples, it matches the performance of training from scratch on 100x more data. We open-source our pretrained models and code.
Tasks Language Modelling, Sentiment Analysis, Text Classification, Transfer Learning
Published 2018-01-18
URL http://arxiv.org/abs/1801.06146v5
PDF http://arxiv.org/pdf/1801.06146v5.pdf
PWC https://paperswithcode.com/paper/universal-language-model-fine-tuning-for-text
Repo https://github.com/SkullFang/ULMFIT_NLP_Classification
Framework none
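One of the fine-tuning techniques the paper introduces, the slanted triangular learning rate, is simple enough to quote directly: a short linear warm-up to the peak rate followed by a long linear decay. The defaults below follow the paper's formulation.

```python
def slanted_triangular_lr(t, T, lr_max=0.01, cut_frac=0.1, ratio=32):
    """ULMFiT's slanted triangular schedule: linear warm-up for the first
    cut_frac of training, then linear decay. t is the current step, T the
    total number of steps."""
    cut = int(T * cut_frac)
    p = t / cut if t < cut else 1 - (t - cut) / (cut * (1 / cut_frac - 1))
    return lr_max * (1 + p * (ratio - 1)) / ratio

lrs = [slanted_triangular_lr(t, T=1000) for t in range(1000)]
```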

Who Let The Dogs Out? Modeling Dog Behavior From Visual Data

Title Who Let The Dogs Out? Modeling Dog Behavior From Visual Data
Authors Kiana Ehsani, Hessam Bagherinezhad, Joseph Redmon, Roozbeh Mottaghi, Ali Farhadi
Abstract We introduce the task of directly modeling a visually intelligent agent. Computer vision typically focuses on solving various subtasks related to visual intelligence. We depart from this standard approach to computer vision; instead we directly model a visually intelligent agent. Our model takes visual information as input and directly predicts the actions of the agent. Toward this end we introduce DECADE, a large-scale dataset of ego-centric videos from a dog’s perspective as well as her corresponding movements. Using this data we model how the dog acts and how the dog plans her movements. We show under a variety of metrics that given just visual input we can successfully model this intelligent agent in many situations. Moreover, the representation learned by our model encodes distinct information compared to representations trained on image classification, and our learned representation can generalize to other domains. In particular, we show strong results on the task of walkable surface estimation by using this dog modeling task as representation learning.
Tasks Representation Learning
Published 2018-03-28
URL http://arxiv.org/abs/1803.10827v2
PDF http://arxiv.org/pdf/1803.10827v2.pdf
PWC https://paperswithcode.com/paper/who-let-the-dogs-out-modeling-dog-behavior
Repo https://github.com/ehsanik/dogTorch
Framework pytorch

Constrained Graph Variational Autoencoders for Molecule Design

Title Constrained Graph Variational Autoencoders for Molecule Design
Authors Qi Liu, Miltiadis Allamanis, Marc Brockschmidt, Alexander L. Gaunt
Abstract Graphs are ubiquitous data structures for representing interactions between entities. With an emphasis on the use of graphs to represent chemical molecules, we explore the task of learning to generate graphs that conform to a distribution observed in training data. We propose a variational autoencoder model in which both encoder and decoder are graph-structured. Our decoder assumes a sequential ordering of graph extension steps and we discuss and analyze design choices that mitigate the potential downsides of this linearization. Experiments compare our approach with a wide range of baselines on the molecule generation task and show that our method is more successful at matching the statistics of the original dataset on semantically important metrics. Furthermore, we show that by using appropriate shaping of the latent space, our model allows us to design molecules that are (locally) optimal in desired properties.
Tasks
Published 2018-05-23
URL http://arxiv.org/abs/1805.09076v2
PDF http://arxiv.org/pdf/1805.09076v2.pdf
PWC https://paperswithcode.com/paper/constrained-graph-variational-autoencoders
Repo https://github.com/Microsoft/constrained-graph-variational-autoencoder
Framework tf
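A flavor of the hard chemistry constraints used during sequential decoding: edge-addition logits are masked to negative infinity whenever a proposed bond would exceed an atom's remaining valence, so only valid molecules can be sampled. The helper names and the small valence table below are illustrative, not taken from the Microsoft repository.

```python
import math

# Toy valence table; real implementations cover more atom and bond types.
MAX_VALENCE = {"C": 4, "N": 3, "O": 2}

def allowed_new_bond(atom_type, current_bond_order, proposed_order=1):
    """A new bond is valid only if it fits in the atom's remaining valence."""
    return current_bond_order + proposed_order <= MAX_VALENCE[atom_type]

def mask_edge_logits(logits, atoms, bond_orders):
    """Set logits of chemically invalid extension steps to -inf so the
    sequential decoder can only sample valid graph extensions."""
    return [l if allowed_new_bond(a, b) else -math.inf
            for l, a, b in zip(logits, atoms, bond_orders)]

# C already has 4 bonds and N has 3, so only the O extension survives
print(mask_edge_logits([0.3, 1.2, -0.5], ["C", "O", "N"], [4, 1, 3]))
```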

DAGs with NO TEARS: Continuous Optimization for Structure Learning

Title DAGs with NO TEARS: Continuous Optimization for Structure Learning
Authors Xun Zheng, Bryon Aragam, Pradeep Ravikumar, Eric P. Xing
Abstract Estimating the structure of directed acyclic graphs (DAGs, also known as Bayesian networks) is a challenging problem since the search space of DAGs is combinatorial and scales superexponentially with the number of nodes. Existing approaches rely on various local heuristics for enforcing the acyclicity constraint. In this paper, we introduce a fundamentally different strategy: We formulate the structure learning problem as a purely \emph{continuous} optimization problem over real matrices that avoids this combinatorial constraint entirely. This is achieved by a novel characterization of acyclicity that is not only smooth but also exact. The resulting problem can be efficiently solved by standard numerical algorithms, which also makes implementation effortless. The proposed method outperforms existing ones, without imposing any structural assumptions on the graph such as bounded treewidth or in-degree. Code implementing the proposed algorithm is open-source and publicly available at https://github.com/xunzheng/notears.
Tasks
Published 2018-03-04
URL http://arxiv.org/abs/1803.01422v2
PDF http://arxiv.org/pdf/1803.01422v2.pdf
PWC https://paperswithcode.com/paper/dags-with-no-tears-continuous-optimization
Repo https://github.com/jmoss20/notears
Framework none
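The paper's key device is its smooth, exact acyclicity function: a weighted adjacency matrix W represents a DAG iff h(W) = tr(exp(W ∘ W)) - d = 0, where ∘ is the Hadamard product and d the number of nodes. A quick numerical check:

```python
import numpy as np
from scipy.linalg import expm

def notears_acyclicity(W):
    """h(W) = tr(exp(W * W)) - d, zero exactly when W encodes a DAG."""
    d = W.shape[0]
    return np.trace(expm(W * W)) - d  # W * W is the elementwise square

W_dag = np.array([[0.0, 1.5], [0.0, 0.0]])   # edge 0 -> 1, acyclic
W_cyc = np.array([[0.0, 1.5], [0.8, 0.0]])   # 2-cycle
print(notears_acyclicity(W_dag))  # 0.0
print(notears_acyclicity(W_cyc))  # > 0
```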

An Introduction to Deep Reinforcement Learning

Title An Introduction to Deep Reinforcement Learning
Authors Vincent Francois-Lavet, Peter Henderson, Riashat Islam, Marc G. Bellemare, Joelle Pineau
Abstract Deep reinforcement learning is the combination of reinforcement learning (RL) and deep learning. This field of research has been able to solve a wide range of complex decision-making tasks that were previously out of reach for a machine. Thus, deep RL opens up many new applications in domains such as healthcare, robotics, smart grids, finance, and many more. This manuscript provides an introduction to deep reinforcement learning models, algorithms and techniques. Particular focus is on the aspects related to generalization and how deep RL can be used for practical applications. We assume the reader is familiar with basic machine learning concepts.
Tasks Decision Making
Published 2018-11-30
URL http://arxiv.org/abs/1811.12560v2
PDF http://arxiv.org/pdf/1811.12560v2.pdf
PWC https://paperswithcode.com/paper/an-introduction-to-deep-reinforcement
Repo https://github.com/marcsv87/DL-RL
Framework pytorch

One-Shot Segmentation in Clutter

Title One-Shot Segmentation in Clutter
Authors Claudio Michaelis, Matthias Bethge, Alexander S. Ecker
Abstract We tackle the problem of one-shot segmentation: finding and segmenting a previously unseen object in a cluttered scene based on a single instruction example. We propose a novel dataset, which we call $\textit{cluttered Omniglot}$. Using a baseline architecture combining a Siamese embedding for detection with a U-net for segmentation we show that increasing levels of clutter make the task progressively harder. Using oracle models with access to various amounts of ground-truth information, we evaluate different aspects of the problem and show that in this kind of visual search task, detection and segmentation are two intertwined problems, the solution to each of which helps solving the other. We therefore introduce $\textit{MaskNet}$, an improved model that attends to multiple candidate locations, generates segmentation proposals to mask out background clutter and selects among the segmented objects. Our findings suggest that such image recognition models based on an iterative refinement of object detection and foreground segmentation may provide a way to deal with highly cluttered scenes.
Tasks Omniglot, One-Shot Segmentation
Published 2018-03-26
URL http://arxiv.org/abs/1803.09597v2
PDF http://arxiv.org/pdf/1803.09597v2.pdf
PWC https://paperswithcode.com/paper/one-shot-segmentation-in-clutter
Repo https://github.com/michaelisc/cluttered-omniglot
Framework tf
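The detection half of the baseline can be sketched as embedding both the instruction example and the scene with a shared encoder and cross-correlating the two embeddings to find candidate locations. The single conv layer below is a stand-in for the Siamese encoder, and all sizes are illustrative; the paper's full model feeds such detections into a U-Net for segmentation.

```python
import torch
import torch.nn.functional as F

# Shared (Siamese) encoder applied to both inputs.
encoder = torch.nn.Conv2d(1, 16, kernel_size=3, padding=1)

scene = torch.randn(1, 1, 96, 96)    # cluttered Omniglot scene
target = torch.randn(1, 1, 24, 24)   # single instruction example

f_scene = encoder(scene)             # (1, 16, 96, 96)
f_target = encoder(target)           # (1, 16, 24, 24)

# Treat the target embedding as a correlation filter over the scene.
heatmap = F.conv2d(f_scene, f_target)   # (1, 1, 73, 73) similarity map
peak = heatmap.flatten().argmax()       # best candidate location
```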

Teacher-Student Compression with Generative Adversarial Networks

Title Teacher-Student Compression with Generative Adversarial Networks
Authors Ruishan Liu, Nicolo Fusi, Lester Mackey
Abstract More accurate machine learning models often demand more computation and memory at test time, making them difficult to deploy on CPU- or memory-constrained devices. Teacher-student compression (TSC), also known as distillation, alleviates this burden by training a less expensive student model to mimic the expensive teacher model while maintaining most of the original accuracy. However, when fresh data is unavailable for the compression task, the teacher’s training data is typically reused, leading to suboptimal compression. In this work, we propose to augment the compression dataset with synthetic data from a generative adversarial network (GAN) designed to approximate the training data distribution. Our GAN-assisted TSC (GAN-TSC) significantly improves student accuracy for expensive models such as large random forests and deep neural networks on both tabular and image datasets. Building on these results, we propose a comprehensive metric—the TSC Score—to evaluate the quality of synthetic datasets based on their induced TSC performance. The TSC Score captures both data diversity and class affinity, and we illustrate its benefits over the popular Inception Score in the context of image classification.
Tasks Image Classification, Model Compression
Published 2018-12-05
URL https://arxiv.org/abs/1812.02271v4
PDF https://arxiv.org/pdf/1812.02271v4.pdf
PWC https://paperswithcode.com/paper/model-compression-with-generative-adversarial
Repo https://github.com/RuishanLiu/GAN-TSC
Framework tf
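The compression step itself is standard teacher-student distillation; the paper's contribution is the data it runs on. Below is a minimal sketch of a temperature-softened distillation loss, together with the GAN-augmented compression batch; the generator output is a random placeholder here, and the paper's exact student objective may differ from this common formulation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Student mimics the teacher's softened class probabilities
    (standard temperature-based distillation loss)."""
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

# GAN-TSC in one line: compress on the original training data *plus*
# GAN samples approximating its distribution (placeholder tensors here).
real_batch = torch.randn(32, 64)
gan_batch = torch.randn(32, 64)          # stand-in for generator output
compression_batch = torch.cat([real_batch, gan_batch], dim=0)
```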