October 20, 2019

2972 words · 14 min read

Paper Group AWR 295

Random mesh projectors for inverse problems. Learning to Generate Wikipedia Summaries for Underserved Languages from Wikidata. Generative Adversarial Network Architectures For Image Synthesis Using Capsule Networks. Joint Unsupervised Learning of Optical Flow and Depth by Watching Stereo Videos. Training Generative Adversarial Networks Via Turing Test …

Random mesh projectors for inverse problems

Title Random mesh projectors for inverse problems
Authors Sidharth Gupta, Konik Kothari, Maarten V. de Hoop, Ivan Dokmanić
Abstract We propose a new learning-based approach to solve ill-posed inverse problems in imaging. We address the case where ground truth training samples are rare and the problem is severely ill-posed, both because of the underlying physics and because we can only obtain a few measurements. This setting is common in geophysical imaging and remote sensing. We show that in this case the common approach of directly learning the mapping from the measured data to the reconstruction becomes unstable. Instead, we propose to first learn an ensemble of simpler mappings from the data to projections of the unknown image into random piecewise-constant subspaces. We then combine the projections to form a final reconstruction by solving a deconvolution-like problem. We show experimentally that the proposed method is more robust to measurement noise and to corruptions not seen during training than a directly learned inverse.
Tasks
Published 2018-05-29
URL http://arxiv.org/abs/1805.11718v3
PDF http://arxiv.org/pdf/1805.11718v3.pdf
PWC https://paperswithcode.com/paper/random-mesh-projectors-for-inverse-problems
Repo https://github.com/swing-research/deepmesh
Framework none
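The core idea above, projecting the unknown image onto random piecewise-constant subspaces and fusing the projections, can be illustrated with a minimal NumPy sketch. It assumes Voronoi-style random cells and plain averaging in place of the paper's deconvolution step; all names are illustrative, not taken from the deepmesh repository. In the actual method each projection is predicted from the measurements by a simple learned mapping rather than computed from the image itself.

```python
import numpy as np

def random_partition(shape, n_cells, rng):
    """Label each pixel with its nearest random seed point, giving a
    random piecewise-constant (Voronoi-like) partition of the image."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    pixels = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    seeds = rng.uniform(0, shape, size=(n_cells, 2))
    d2 = ((pixels[:, None, :] - seeds[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1).reshape(shape)

def project(image, labels):
    """Orthogonal projection onto images constant on each cell:
    replace every cell by its mean value."""
    out = np.empty_like(image, dtype=float)
    for c in np.unique(labels):
        out[labels == c] = image[labels == c].mean()
    return out

rng = np.random.default_rng(0)
image = rng.normal(size=(64, 64))
projections = [project(image, random_partition(image.shape, 50, rng))
               for _ in range(10)]
combined = np.mean(projections, axis=0)  # crude stand-in for the final fusion step
```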

Learning to Generate Wikipedia Summaries for Underserved Languages from Wikidata

Title Learning to Generate Wikipedia Summaries for Underserved Languages from Wikidata
Authors Lucie-Aimée Kaffee, Hady Elsahar, Pavlos Vougiouklis, Christophe Gravier, Frédérique Laforest, Jonathon Hare, Elena Simperl
Abstract While Wikipedia exists in 287 languages, its content is unevenly distributed among them. In this work, we investigate the generation of open-domain Wikipedia summaries in underserved languages using structured data from Wikidata. To this end, we propose a neural network architecture equipped with copy actions that learns to generate single-sentence, comprehensible textual summaries from Wikidata triples. We demonstrate the effectiveness of the proposed approach by evaluating it against a set of baselines on two languages of different natures: Arabic, a morphologically rich language with a larger vocabulary than English, and Esperanto, a constructed language known for its ease of acquisition.
Tasks
Published 2018-03-19
URL http://arxiv.org/abs/1803.07116v2
PDF http://arxiv.org/pdf/1803.07116v2.pdf
PWC https://paperswithcode.com/paper/learning-to-generate-wikipedia-summaries-for
Repo https://github.com/pvougiou/Wikidata2Wikipedia
Framework none
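To make the input side concrete, here is a hedged Python sketch of how Wikidata triples about an entity could be linearized into a sequence for such an encoder-decoder. The serialization format, the `<sep>` token, and the `[[...]]` copy placeholder are assumptions for illustration, not the paper's exact scheme.

```python
# Sketch: linearize Wikidata triples into an encoder input sequence.
triples = [
    ("Q42", "P31", "Q5"),        # Douglas Adams -- instance of -- human
    ("Q42", "P106", "Q36180"),   # Douglas Adams -- occupation -- writer
]
labels = {"Q42": "Douglas Adams", "P31": "instance of",
          "P106": "occupation", "Q5": "human", "Q36180": "writer"}

def linearize(triples, labels):
    tokens = []
    for s, p, o in triples:
        # rare entities can stay as copyable placeholders, so the decoder's
        # copy actions can emit them verbatim in any target language
        tokens += [labels[s], labels[p], labels.get(o, f"[[{o}]]")]
    return " <sep> ".join(tokens)

print(linearize(triples, labels))
```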

Generative Adversarial Network Architectures For Image Synthesis Using Capsule Networks

Title Generative Adversarial Network Architectures For Image Synthesis Using Capsule Networks
Authors Yash Upadhyay, Paul Schrater
Abstract In this paper, we propose Generative Adversarial Network (GAN) architectures that use Capsule Networks for image synthesis. Based on the principle of positional equivariance of features, a Capsule Network's ability to encode spatial relationships between the features of an image helps it become a more powerful critic than the Convolutional Neural Networks (CNNs) used in current architectures for image synthesis. Our proposed GAN architectures learn the data manifold much faster and therefore synthesize visually accurate images with significantly fewer training samples and training epochs than GANs and their variants that use CNNs. Apart from analyzing the quantitative results for the images generated by different architectures, we also explore the reasons for the lower coverage and diversity of the GAN architectures that use CNN critics.
Tasks Image Generation
Published 2018-06-11
URL http://arxiv.org/abs/1806.03796v4
PDF http://arxiv.org/pdf/1806.03796v4.pdf
PWC https://paperswithcode.com/paper/generative-adversarial-network-architectures
Repo https://github.com/yash-1995-2006/Conditional-and-nonConditional-Capsule-GANs
Framework tf
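As a concrete illustration of the capsule ingredients such a critic relies on, below is a minimal PyTorch sketch: the squash nonlinearity and a toy discriminator whose final capsule length serves as the realness score. Layer sizes assume 28x28 grayscale inputs and dynamic routing is omitted entirely, so this is a simplification of the architectures in the paper, not a reproduction.

```python
import torch
import torch.nn as nn

def squash(s, dim=-1, eps=1e-8):
    """Capsule squashing: preserves orientation, maps length into [0, 1)."""
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + eps)

class CapsuleCritic(nn.Module):
    """Toy capsule-style discriminator: conv features -> primary capsules
    -> a single output capsule whose length scores realness."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 32, kernel_size=9)                 # (B,32,20,20)
        self.primary = nn.Conv2d(32, 32, kernel_size=9, stride=2)   # (B,32,6,6)
        self.readout = nn.Linear(32 * 6 * 6, 16)

    def forward(self, x):
        h = torch.relu(self.conv(x))
        u = self.primary(h).view(x.size(0), -1, 4)  # primary capsules of dim 4
        u = squash(u)
        v = squash(self.readout(u.flatten(1)))      # output capsule, dim 16
        return v.norm(dim=-1)                       # length in (0,1): realness

score = CapsuleCritic()(torch.randn(2, 1, 28, 28))
```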

Joint Unsupervised Learning of Optical Flow and Depth by Watching Stereo Videos

Title Joint Unsupervised Learning of Optical Flow and Depth by Watching Stereo Videos
Authors Yang Wang, Zhenheng Yang, Peng Wang, Yi Yang, Chenxu Luo, Wei Xu
Abstract Learning depth and optical flow via deep neural networks by watching videos has made significant progress recently. In this paper, we jointly solve the two tasks by exploiting the underlying geometric rules within stereo videos. Specifically, given two consecutive stereo image pairs from a video, we first estimate depth, camera ego-motion and optical flow from three neural networks. Then the whole scene is decomposed into moving foreground and static background by comparing the estimated optical flow and the rigid flow derived from the depth and ego-motion. We propose a novel consistency loss to let the optical flow learn from the more accurate rigid flow in static regions. We also design a rigid alignment module which helps refine ego-motion estimation by using the estimated depth and optical flow. Experiments on the KITTI dataset show that our results significantly outperform other state-of-the-art algorithms. Source code can be found at https://github.com/baidu-research/UnDepthflow
Tasks Motion Estimation, Optical Flow Estimation
Published 2018-10-08
URL http://arxiv.org/abs/1810.03654v1
PDF http://arxiv.org/pdf/1810.03654v1.pdf
PWC https://paperswithcode.com/paper/joint-unsupervised-learning-of-optical-flow
Repo https://github.com/baidu-research/UnDepthflow
Framework tf
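A heavily simplified PyTorch sketch of the rigid-flow consistency idea follows, assuming known camera intrinsics K and a 4x4 ego-motion matrix T. The static/moving decomposition here is a crude per-pixel threshold, whereas the paper compares the estimated optical flow against the rigid flow over the whole scene and adds a rigid alignment module; nothing below is taken from the UnDepthflow code.

```python
import torch

def rigid_flow(depth, K, T):
    """Flow induced by camera motion alone: back-project pixels with depth,
    transform by ego-motion T (4x4), re-project with intrinsics K (3x3)."""
    H, W = depth.shape
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).reshape(3, -1)
    cam = torch.linalg.inv(K) @ pix * depth.reshape(1, -1)   # 3D points
    cam2 = (T[:3, :3] @ cam) + T[:3, 3:4]                    # points after motion
    proj = K @ cam2
    proj = proj[:2] / proj[2:].clamp(min=1e-6)
    return (proj - pix[:2]).reshape(2, H, W)

def consistency_loss(optical_flow, rigid, thresh=1.0):
    """In static regions (where the two flows roughly agree), pull the
    learned optical flow toward the more accurate rigid flow."""
    err = (optical_flow - rigid).norm(dim=0)
    static = (err < thresh).float()
    return (static * err).sum() / static.sum().clamp(min=1.0)

K = torch.tensor([[100.0, 0, 32], [0, 100.0, 24], [0, 0, 1]])
T = torch.eye(4); T[0, 3] = 0.1               # small sideways translation
flow_rigid = rigid_flow(torch.full((48, 64), 5.0), K, T)
```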

Training Generative Adversarial Networks Via Turing Test

Title Training Generative Adversarial Networks Via Turing Test
Authors Jianlin Su
Abstract In this article, we introduce a new mode for training Generative Adversarial Networks (GANs). Rather than minimizing the distance between the evidence distribution $\tilde{p}(x)$ and the generative distribution $q(x)$, we minimize the distance between $\tilde{p}(x_r)q(x_f)$ and $\tilde{p}(x_f)q(x_r)$. This adversarial pattern can be interpreted as a Turing test in GANs. It allows the generator to use information from real samples during training and accelerates the whole training procedure. We even find that, by simply increasing the size of the discriminator and generator proportionally, the method succeeds at 256x256 resolution without careful hyperparameter tuning.
Tasks
Published 2018-10-25
URL http://arxiv.org/abs/1810.10948v2
PDF http://arxiv.org/pdf/1810.10948v2.pdf
PWC https://paperswithcode.com/paper/training-generative-adversarial-networks-via
Repo https://github.com/bojone/T-GANs
Framework tf
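The pairwise objective can be prototyped on a toy 1-D distribution: the discriminator receives an ordered pair and learns to label (real, fake) as 1 and (fake, real) as 0, while the generator tries to make the pairs indistinguishable. This sketch is one minimal reading of the objective, not the T-GANs reference code; the network and batch sizes are illustrative.

```python
import torch
import torch.nn as nn

D = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))  # pair critic
G = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1))  # noise -> sample
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    x_r = torch.randn(128, 1) * 0.5 + 2.0    # "real" data: N(2, 0.25)
    x_f = G(torch.randn(128, 4))             # fake samples
    # discriminator: (real, fake) labeled 1, (fake, real) labeled 0
    d_loss = bce(D(torch.cat([x_r, x_f.detach()], 1)), torch.ones(128, 1)) + \
             bce(D(torch.cat([x_f.detach(), x_r], 1)), torch.zeros(128, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # generator: fool D into swapping the pair; note that real samples
    # participate directly in the generator's loss
    g_loss = bce(D(torch.cat([x_f, x_r], 1)), torch.ones(128, 1)) + \
             bce(D(torch.cat([x_r, x_f], 1)), torch.zeros(128, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```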

Multi-component Image Translation for Deep Domain Generalization

Title Multi-component Image Translation for Deep Domain Generalization
Authors Mohammad Mahfujur Rahman, Clinton Fookes, Mahsa Baktashmotlagh, Sridha Sridharan
Abstract Domain adaptation (DA) and domain generalization (DG) are two closely related approaches, both concerned with the task of assigning labels to an unlabeled data set. The only difference between them is that DA can access the target data during the training phase, whereas in DG the target data remain entirely unseen during training. The task of DG is challenging because we have no prior knowledge of the target samples. If DA methods are applied directly to DG by simply excluding the target data from training, poor performance results. In this paper, we tackle the domain generalization challenge in two ways. In our first approach, we propose a novel deep domain generalization architecture utilizing synthetic data generated by a Generative Adversarial Network (GAN). The discrepancy between the generated images and synthetic images is minimized using existing domain discrepancy metrics such as maximum mean discrepancy or correlation alignment. In our second approach, we introduce a protocol for applying DA methods to a DG scenario by excluding the target data from the training phase, splitting the source data into training and validation parts, and treating the validation data as target data for DA. We conduct extensive experiments on four cross-domain benchmark datasets. Experimental results show that our proposed model outperforms the current state-of-the-art methods for DG.
Tasks Domain Adaptation, Domain Generalization
Published 2018-12-21
URL http://arxiv.org/abs/1812.08974v1
PDF http://arxiv.org/pdf/1812.08974v1.pdf
PWC https://paperswithcode.com/paper/multi-component-image-translation-for-deep
Repo https://github.com/mahfujur1/Domain-Adaptation-Papers-and-Codes
Framework none
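For reference, the maximum mean discrepancy minimized in the first approach can be written down in a few lines. Below is a simple (biased) multi-kernel RBF estimate in PyTorch; the bandwidths are chosen arbitrarily, and the feature tensors are random placeholders.

```python
import torch

def mmd_rbf(x, y, sigmas=(1.0, 2.0, 4.0)):
    """Biased multi-kernel MMD^2 estimate between two feature batches."""
    def k(a, b):
        d2 = torch.cdist(a, b) ** 2
        return sum(torch.exp(-d2 / (2 * s ** 2)) for s in sigmas)
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

real_feats = torch.randn(64, 128)       # features of source-domain images
gen_feats = torch.randn(64, 128)        # features of GAN-translated images
loss = mmd_rbf(real_feats, gen_feats)   # added to the task loss during training
```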

Robust Place Categorization with Deep Domain Generalization

Title Robust Place Categorization with Deep Domain Generalization
Authors Massimiliano Mancini, Samuel Rota Bulò, Barbara Caputo, Elisa Ricci
Abstract Traditional place categorization approaches in robot vision assume that training and test images have similar visual appearance. Therefore, any seasonal, illumination and environmental changes typically lead to severe degradation in performance. To cope with this problem, recent works have proposed to adopt domain adaptation techniques. While effective, these methods assume that some prior information about the scenario where the robot will operate is available at training time. Unfortunately, in many cases this assumption does not hold, as we often do not know where a robot will be deployed. To overcome this issue, in this paper we present an approach which aims at learning classification models able to generalize to unseen scenarios. Specifically, we propose a novel deep learning framework for domain generalization. Our method develops from the intuition that, given a set of different classification models associated to known domains (e.g. corresponding to multiple environments, robots), the best model for a new sample in the novel domain can be computed directly at test time by optimally combining the known models. To implement our idea, we exploit recent advances in deep domain adaptation and design a Convolutional Neural Network architecture with novel layers performing a weighted version of Batch Normalization. Our experiments, conducted on three common datasets for robot place categorization, confirm the validity of our contribution.
Tasks Domain Adaptation, Domain Generalization
Published 2018-05-30
URL http://arxiv.org/abs/1805.12048v1
PDF http://arxiv.org/pdf/1805.12048v1.pdf
PWC https://paperswithcode.com/paper/robust-place-categorization-with-deep-domain
Repo https://github.com/mancinimassimiliano/caffe
Framework none
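The weighted Batch Normalization idea can be sketched as follows: normalize a test sample's features with a convex combination of per-domain BN statistics, weighted by a predicted domain assignment. This is a minimal sketch assuming the weights are given; the exact mixture variance would also include a between-means term, which this simplification ignores, and none of the names come from the authors' Caffe code.

```python
import torch

def weighted_bn(x, domain_stats, w, eps=1e-5):
    """Normalize x with a convex combination of per-domain BatchNorm stats.
    domain_stats: list of (mean, var) per known domain; w: domain weights."""
    mean = sum(wi * m for wi, (m, _) in zip(w, domain_stats))
    var = sum(wi * v for wi, (_, v) in zip(w, domain_stats))
    return (x - mean) / torch.sqrt(var + eps)

# two known domains with different feature statistics
stats = [(torch.zeros(8), torch.ones(8)),
         (torch.full((8,), 2.0), torch.full((8,), 4.0))]
w = torch.softmax(torch.tensor([0.3, 0.7]), dim=0)  # predicted domain assignment
x_norm = weighted_bn(torch.randn(16, 8), stats, w)
```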

A Bimodal Learning Approach to Assist Multi-sensory Effects Synchronization

Title A Bimodal Learning Approach to Assist Multi-sensory Effects Synchronization
Authors Raphael Abreu, Joel dos Santos, Eduardo Bezerra
Abstract In mulsemedia applications, traditional media content (text, image, audio, video, etc.) can be related to media objects that target other human senses (e.g., smell, haptics, taste). Such applications aim at bridging the virtual and real worlds through sensors and actuators. Actuators are responsible for the execution of sensory effects (e.g., wind, heat, light), which produce sensory stimulations on the users. In these applications, sensory stimulation must happen in a timely manner with respect to the other media content being presented. For example, at the moment an explosion is shown in the audiovisual content, it may be appropriate to activate actuators that produce heat and light. It is common to use a declarative multimedia authoring language to relate the timestamp at which each media object is presented to the execution of some sensory effect. One problem in this setting is that the synchronization of media objects and sensory effects is done manually by the author(s) of the application, a process which is time-consuming and error-prone. In this paper, we present a bimodal neural network architecture to assist the synchronization task in mulsemedia applications. Our approach is based on the idea that audio and video signals can be used simultaneously to identify the timestamps at which some sensory effect should be executed. Our learning architecture combines audio and video signals for the prediction of scene components. For evaluation purposes, we construct a dataset based on Google's AudioSet. We provide experiments to validate our bimodal architecture. Our results show that the bimodal approach produces better results than several variants of unimodal architectures.
Tasks
Published 2018-04-28
URL http://arxiv.org/abs/1804.10822v1
PDF http://arxiv.org/pdf/1804.10822v1.pdf
PWC https://paperswithcode.com/paper/a-bimodal-learning-approach-to-assist-multi
Repo https://github.com/MLRG-CEFET-RJ/bimodal_audioset
Framework tf
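Architecturally, the bimodal combination boils down to two encoder branches fused before a shared classifier. Below is a minimal PyTorch sketch under assumed feature dimensions (128-d audio and 512-d video clip features); the actual model presumably uses convolutional encoders over raw signals rather than the linear stand-ins here.

```python
import torch
import torch.nn as nn

class BimodalNet(nn.Module):
    """Toy two-branch network: separate audio and video encoders whose
    embeddings are concatenated before the scene-component classifier."""
    def __init__(self, n_audio=128, n_video=512, n_classes=10):
        super().__init__()
        self.audio = nn.Sequential(nn.Linear(n_audio, 64), nn.ReLU())
        self.video = nn.Sequential(nn.Linear(n_video, 64), nn.ReLU())
        self.head = nn.Linear(128, n_classes)

    def forward(self, a, v):
        return self.head(torch.cat([self.audio(a), self.video(v)], dim=1))

model = BimodalNet()
logits = model(torch.randn(4, 128), torch.randn(4, 512))  # per-clip prediction
```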

Universal Language Model Fine-tuning for Text Classification

Title Universal Language Model Fine-tuning for Text Classification
Authors Jeremy Howard, Sebastian Ruder
Abstract Inductive transfer learning has greatly impacted computer vision, but existing approaches in NLP still require task-specific modifications and training from scratch. We propose Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduce techniques that are key for fine-tuning a language model. Our method significantly outperforms the state-of-the-art on six text classification tasks, reducing the error by 18-24% on the majority of datasets. Furthermore, with only 100 labeled examples, it matches the performance of training from scratch on 100x more data. We open-source our pretrained models and code.
Tasks Language Modelling, Sentiment Analysis, Text Classification, Transfer Learning
Published 2018-01-18
URL http://arxiv.org/abs/1801.06146v5
PDF http://arxiv.org/pdf/1801.06146v5.pdf
PWC https://paperswithcode.com/paper/universal-language-model-fine-tuning-for-text
Repo https://github.com/SkullFang/ULMFIT_NLP_Classification
Framework none
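One of the fine-tuning techniques the paper introduces, the slanted triangular learning rate, is simple enough to quote directly: a short linear warm-up to the peak rate followed by a long linear decay. The defaults below follow the paper's formulation.

```python
def slanted_triangular_lr(t, T, lr_max=0.01, cut_frac=0.1, ratio=32):
    """ULMFiT's slanted triangular schedule: linear warm-up for the first
    cut_frac of training, then linear decay. t is the current step, T the
    total number of steps."""
    cut = int(T * cut_frac)
    p = t / cut if t < cut else 1 - (t - cut) / (cut * (1 / cut_frac - 1))
    return lr_max * (1 + p * (ratio - 1)) / ratio

lrs = [slanted_triangular_lr(t, T=1000) for t in range(1000)]
```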

Who Let The Dogs Out? Modeling Dog Behavior From Visual Data

Title Who Let The Dogs Out? Modeling Dog Behavior From Visual Data
Authors Kiana Ehsani, Hessam Bagherinezhad, Joseph Redmon, Roozbeh Mottaghi, Ali Farhadi
Abstract We introduce the task of directly modeling a visually intelligent agent. Computer vision typically focuses on solving various subtasks related to visual intelligence. We depart from this standard approach to computer vision; instead we directly model a visually intelligent agent. Our model takes visual information as input and directly predicts the actions of the agent. Toward this end we introduce DECADE, a large-scale dataset of ego-centric videos from a dog’s perspective as well as her corresponding movements. Using this data we model how the dog acts and how the dog plans her movements. We show under a variety of metrics that given just visual input we can successfully model this intelligent agent in many situations. Moreover, the representation learned by our model encodes distinct information compared to representations trained on image classification, and our learned representation can generalize to other domains. In particular, we show strong results on the task of walkable surface estimation by using this dog modeling task as representation learning.
Tasks Representation Learning
Published 2018-03-28
URL http://arxiv.org/abs/1803.10827v2
PDF http://arxiv.org/pdf/1803.10827v2.pdf
PWC https://paperswithcode.com/paper/who-let-the-dogs-out-modeling-dog-behavior
Repo https://github.com/ehsanik/dogTorch
Framework pytorch

Constrained Graph Variational Autoencoders for Molecule Design

Title Constrained Graph Variational Autoencoders for Molecule Design
Authors Qi Liu, Miltiadis Allamanis, Marc Brockschmidt, Alexander L. Gaunt
Abstract Graphs are ubiquitous data structures for representing interactions between entities. With an emphasis on the use of graphs to represent chemical molecules, we explore the task of learning to generate graphs that conform to a distribution observed in training data. We propose a variational autoencoder model in which both encoder and decoder are graph-structured. Our decoder assumes a sequential ordering of graph extension steps and we discuss and analyze design choices that mitigate the potential downsides of this linearization. Experiments compare our approach with a wide range of baselines on the molecule generation task and show that our method is more successful at matching the statistics of the original dataset on semantically important metrics. Furthermore, we show that by using appropriate shaping of the latent space, our model allows us to design molecules that are (locally) optimal in desired properties.
Tasks
Published 2018-05-23
URL http://arxiv.org/abs/1805.09076v2
PDF http://arxiv.org/pdf/1805.09076v2.pdf
PWC https://paperswithcode.com/paper/constrained-graph-variational-autoencoders
Repo https://github.com/Microsoft/constrained-graph-variational-autoencoder
Framework tf
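A flavor of the hard chemistry constraints used during sequential decoding: edge-addition logits are masked to negative infinity whenever a proposed bond would exceed an atom's remaining valence, so only valid molecules can be sampled. The helper names and the small valence table below are illustrative, not taken from the Microsoft repository.

```python
import math

# Toy valence table; real implementations cover more atom and bond types.
MAX_VALENCE = {"C": 4, "N": 3, "O": 2}

def allowed_new_bond(atom_type, current_bond_order, proposed_order=1):
    """A new bond is valid only if it fits in the atom's remaining valence."""
    return current_bond_order + proposed_order <= MAX_VALENCE[atom_type]

def mask_edge_logits(logits, atoms, bond_orders):
    """Set logits of chemically invalid extension steps to -inf so the
    sequential decoder can only sample valid graph extensions."""
    return [l if allowed_new_bond(a, b) else -math.inf
            for l, a, b in zip(logits, atoms, bond_orders)]

# C already has 4 bonds and N has 3, so only the O extension survives
print(mask_edge_logits([0.3, 1.2, -0.5], ["C", "O", "N"], [4, 1, 3]))
```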

DAGs with NO TEARS: Continuous Optimization for Structure Learning

Title DAGs with NO TEARS: Continuous Optimization for Structure Learning
Authors Xun Zheng, Bryon Aragam, Pradeep Ravikumar, Eric P. Xing
Abstract Estimating the structure of directed acyclic graphs (DAGs, also known as Bayesian networks) is a challenging problem since the search space of DAGs is combinatorial and scales superexponentially with the number of nodes. Existing approaches rely on various local heuristics for enforcing the acyclicity constraint. In this paper, we introduce a fundamentally different strategy: We formulate the structure learning problem as a purely \emph{continuous} optimization problem over real matrices that avoids this combinatorial constraint entirely. This is achieved by a novel characterization of acyclicity that is not only smooth but also exact. The resulting problem can be efficiently solved by standard numerical algorithms, which also makes implementation effortless. The proposed method outperforms existing ones, without imposing any structural assumptions on the graph such as bounded treewidth or in-degree. Code implementing the proposed algorithm is open-source and publicly available at https://github.com/xunzheng/notears.
Tasks
Published 2018-03-04
URL http://arxiv.org/abs/1803.01422v2
PDF http://arxiv.org/pdf/1803.01422v2.pdf
PWC https://paperswithcode.com/paper/dags-with-no-tears-continuous-optimization
Repo https://github.com/jmoss20/notears
Framework none
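The paper's key device is its smooth, exact acyclicity function: a weighted adjacency matrix W represents a DAG iff h(W) = tr(exp(W ∘ W)) - d = 0, where ∘ is the Hadamard product and d the number of nodes. A quick numerical check:

```python
import numpy as np
from scipy.linalg import expm

def notears_acyclicity(W):
    """h(W) = tr(exp(W * W)) - d, zero exactly when W encodes a DAG."""
    d = W.shape[0]
    return np.trace(expm(W * W)) - d  # W * W is the elementwise square

W_dag = np.array([[0.0, 1.5], [0.0, 0.0]])   # edge 0 -> 1, acyclic
W_cyc = np.array([[0.0, 1.5], [0.8, 0.0]])   # 2-cycle
print(notears_acyclicity(W_dag))  # 0.0
print(notears_acyclicity(W_cyc))  # > 0
```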

An Introduction to Deep Reinforcement Learning

Title An Introduction to Deep Reinforcement Learning
Authors Vincent Francois-Lavet, Peter Henderson, Riashat Islam, Marc G. Bellemare, Joelle Pineau
Abstract Deep reinforcement learning is the combination of reinforcement learning (RL) and deep learning. This field of research has been able to solve a wide range of complex decision-making tasks that were previously out of reach for a machine. Thus, deep RL opens up many new applications in domains such as healthcare, robotics, smart grids, finance, and many more. This manuscript provides an introduction to deep reinforcement learning models, algorithms and techniques. Particular focus is on the aspects related to generalization and how deep RL can be used for practical applications. We assume the reader is familiar with basic machine learning concepts.
Tasks Decision Making
Published 2018-11-30
URL http://arxiv.org/abs/1811.12560v2
PDF http://arxiv.org/pdf/1811.12560v2.pdf
PWC https://paperswithcode.com/paper/an-introduction-to-deep-reinforcement
Repo https://github.com/marcsv87/DL-RL
Framework pytorch

One-Shot Segmentation in Clutter

Title One-Shot Segmentation in Clutter
Authors Claudio Michaelis, Matthias Bethge, Alexander S. Ecker
Abstract We tackle the problem of one-shot segmentation: finding and segmenting a previously unseen object in a cluttered scene based on a single instruction example. We propose a novel dataset, which we call $\textit{cluttered Omniglot}$. Using a baseline architecture combining a Siamese embedding for detection with a U-net for segmentation we show that increasing levels of clutter make the task progressively harder. Using oracle models with access to various amounts of ground-truth information, we evaluate different aspects of the problem and show that in this kind of visual search task, detection and segmentation are two intertwined problems, the solution to each of which helps solving the other. We therefore introduce $\textit{MaskNet}$, an improved model that attends to multiple candidate locations, generates segmentation proposals to mask out background clutter and selects among the segmented objects. Our findings suggest that such image recognition models based on an iterative refinement of object detection and foreground segmentation may provide a way to deal with highly cluttered scenes.
Tasks Omniglot, One-Shot Segmentation
Published 2018-03-26
URL http://arxiv.org/abs/1803.09597v2
PDF http://arxiv.org/pdf/1803.09597v2.pdf
PWC https://paperswithcode.com/paper/one-shot-segmentation-in-clutter
Repo https://github.com/michaelisc/cluttered-omniglot
Framework tf
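The detection half of the baseline can be sketched as embedding both the instruction example and the scene with a shared encoder and cross-correlating the two embeddings to find candidate locations. The single conv layer below is a stand-in for the Siamese encoder, and all sizes are illustrative; the paper's full model feeds such detections into a U-Net for segmentation.

```python
import torch
import torch.nn.functional as F

# Shared (Siamese) encoder applied to both inputs.
encoder = torch.nn.Conv2d(1, 16, kernel_size=3, padding=1)

scene = torch.randn(1, 1, 96, 96)    # cluttered Omniglot scene
target = torch.randn(1, 1, 24, 24)   # single instruction example

f_scene = encoder(scene)             # (1, 16, 96, 96)
f_target = encoder(target)           # (1, 16, 24, 24)

# Treat the target embedding as a correlation filter over the scene.
heatmap = F.conv2d(f_scene, f_target)   # (1, 1, 73, 73) similarity map
peak = heatmap.flatten().argmax()       # best candidate location
```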

Teacher-Student Compression with Generative Adversarial Networks

Title Teacher-Student Compression with Generative Adversarial Networks
Authors Ruishan Liu, Nicolo Fusi, Lester Mackey
Abstract More accurate machine learning models often demand more computation and memory at test time, making them difficult to deploy on CPU- or memory-constrained devices. Teacher-student compression (TSC), also known as distillation, alleviates this burden by training a less expensive student model to mimic the expensive teacher model while maintaining most of the original accuracy. However, when fresh data is unavailable for the compression task, the teacher’s training data is typically reused, leading to suboptimal compression. In this work, we propose to augment the compression dataset with synthetic data from a generative adversarial network (GAN) designed to approximate the training data distribution. Our GAN-assisted TSC (GAN-TSC) significantly improves student accuracy for expensive models such as large random forests and deep neural networks on both tabular and image datasets. Building on these results, we propose a comprehensive metric—the TSC Score—to evaluate the quality of synthetic datasets based on their induced TSC performance. The TSC Score captures both data diversity and class affinity, and we illustrate its benefits over the popular Inception Score in the context of image classification.
Tasks Image Classification, Model Compression
Published 2018-12-05
URL https://arxiv.org/abs/1812.02271v4
PDF https://arxiv.org/pdf/1812.02271v4.pdf
PWC https://paperswithcode.com/paper/model-compression-with-generative-adversarial
Repo https://github.com/RuishanLiu/GAN-TSC
Framework tf
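The compression step itself is standard teacher-student distillation; the paper's contribution is the data it runs on. Below is a minimal sketch of a temperature-softened distillation loss, together with the GAN-augmented compression batch; the generator output is a random placeholder here, and the paper's exact student objective may differ from this common formulation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Student mimics the teacher's softened class probabilities
    (standard temperature-based distillation loss)."""
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

# GAN-TSC in one line: compress on the original training data *plus*
# GAN samples approximating its distribution (placeholder tensors here).
real_batch = torch.randn(32, 64)
gan_batch = torch.randn(32, 64)          # stand-in for generator output
compression_batch = torch.cat([real_batch, gan_batch], dim=0)
```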