October 16, 2019

2937 words 14 mins read

Paper Group ANR 1160

Tensor Monte Carlo: particle methods for the GPU era. AI in Game Playing: Sokoban Solver. QuAC : Question Answering in Context. Efficient Egocentric Visual Perception Combining Eye-tracking, a Software Retina and Deep Learning. Matrix Factorization via Deep Learning. Nonlinear ICA Using Auxiliary Variables and Generalized Contrastive Learning. Perf …

Tensor Monte Carlo: particle methods for the GPU era

Title Tensor Monte Carlo: particle methods for the GPU era
Authors Laurence Aitchison
Abstract Multi-sample, importance-weighted variational autoencoders (IWAE) give tighter bounds and more accurate uncertainty estimates than variational autoencoders (VAE) trained with a standard single-sample objective. However, IWAEs scale poorly: as the latent dimensionality grows, they require exponentially many samples to retain the benefits of importance weighting. While sequential Monte-Carlo (SMC) can address this problem, it is prohibitively slow because the resampling step imposes sequential structure which cannot be parallelised, and moreover, resampling is non-differentiable which is problematic when learning approximate posteriors. To address these issues, we developed tensor Monte-Carlo (TMC) which gives exponentially many importance samples by separately drawing $K$ samples for each of the $n$ latent variables, then averaging over all $K^n$ possible combinations. While the sum over exponentially many terms might seem to be intractable, in many cases it can be computed efficiently as a series of tensor inner-products. We show that TMC is superior to IWAE on a generative model with multiple stochastic layers trained on the MNIST handwritten digit database, and we show that TMC can be combined with standard variance reduction techniques.
Tasks
Published 2018-06-22
URL http://arxiv.org/abs/1806.08593v3
PDF http://arxiv.org/pdf/1806.08593v3.pdf
PWC https://paperswithcode.com/paper/tensor-monte-carlo-particle-methods-for-the
Repo
Framework
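
To make the average over $K^n$ sample combinations concrete, the sketch below handles a chain-structured latent model, where the exponential sum factorizes into a sequence of $K \times K$ contractions carried out in log-space. The function, factor-table layout, and shapes are illustrative assumptions, not the paper's code.

```python
import numpy as np
from scipy.special import logsumexp

def tmc_log_mean(log_factors):
    """Hypothetical chain-structured TMC estimate.

    log_factors: list of (K, K) arrays; log_factors[i][a, b] is the log
    importance-weight factor linking sample a of latent z_i to sample b of
    latent z_{i+1}. Returns the log of the mean over all K^(n+1) sample
    combinations, computed by K x K contractions instead of enumeration.
    """
    K = log_factors[0].shape[0]
    log_msg = np.zeros(K)                       # log 1 for each sample of the first latent
    for log_f in log_factors:
        # average over the previous latent's K samples, in log-space
        log_msg = logsumexp(log_msg[:, None] + log_f, axis=0) - np.log(K)
    return logsumexp(log_msg) - np.log(K)       # average over the last latent's samples
```

With, say, four latents and $K = 10$ samples each, three (10, 10) tables stand in for a sum over 10,000 combinations.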

AI in Game Playing: Sokoban Solver

Title AI in Game Playing: Sokoban Solver
Authors Anand Venkatesan, Atishay Jain, Rakesh Grewal
Abstract Artificial Intelligence is becoming instrumental in a variety of applications. Games serve as a good breeding ground for trying and testing these algorithms in a sandbox with simpler constraints in comparison to real life. In this project, we aim to develop an AI agent that can solve the classical Japanese game of Sokoban using various algorithms and heuristics and compare their performances through standard metrics.
Tasks
Published 2018-06-29
URL http://arxiv.org/abs/1807.00049v1
PDF http://arxiv.org/pdf/1807.00049v1.pdf
PWC https://paperswithcode.com/paper/ai-in-game-playing-sokoban-solver
Repo
Framework
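
The abstract does not name specific algorithms, so the following is one plausible instance: a minimal A* solver skeleton with an admissible boxes-to-goals Manhattan-distance heuristic. The state encoding and the `successors` interface are assumptions for illustration.

```python
import heapq
import itertools

def boxes_to_goals(boxes, goals):
    """Admissible heuristic: each box pays the Manhattan distance to its nearest goal."""
    return sum(min(abs(bx - gx) + abs(by - gy) for gx, gy in goals) for bx, by in boxes)

def solve_sokoban(start, goals, successors):
    """start: (player_pos, frozenset_of_box_positions); goals: frozenset of (x, y)
    goal squares; successors(state) yields (next_state, cost) for legal moves/pushes."""
    tie = itertools.count()                        # breaks ties so states are never compared
    frontier = [(boxes_to_goals(start[1], goals), 0, next(tie), start, [])]
    visited = set()
    while frontier:
        _, g, _, state, path = heapq.heappop(frontier)
        if state[1] <= goals:                      # every box sits on a goal square
            return path
        if state in visited:
            continue
        visited.add(state)
        for nxt, cost in successors(state):
            if nxt not in visited:
                h = boxes_to_goals(nxt[1], goals)
                heapq.heappush(frontier, (g + cost + h, g + cost, next(tie), nxt, path + [nxt]))
    return None                                    # no solution found
```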

QuAC : Question Answering in Context

Title QuAC : Question Answering in Context
Authors Eunsol Choi, He He, Mohit Iyyer, Mark Yatskar, Wen-tau Yih, Yejin Choi, Percy Liang, Luke Zettlemoyer
Abstract We present QuAC, a dataset for Question Answering in Context that contains 14K information-seeking QA dialogs (100K questions in total). The dialogs involve two crowd workers: (1) a student who poses a sequence of freeform questions to learn as much as possible about a hidden Wikipedia text, and (2) a teacher who answers the questions by providing short excerpts from the text. QuAC introduces challenges not found in existing machine comprehension datasets: its questions are often more open-ended, unanswerable, or only meaningful within the dialog context, as we show in a detailed qualitative evaluation. We also report results for a number of reference models, including a recent state-of-the-art reading comprehension architecture extended to model dialog context. Our best model underperforms humans by 20 F1, suggesting that there is significant room for future work on this data. Dataset, baseline, and leaderboard available at http://quac.ai.
Tasks Question Answering, Reading Comprehension
Published 2018-08-21
URL http://arxiv.org/abs/1808.07036v3
PDF http://arxiv.org/pdf/1808.07036v3.pdf
PWC https://paperswithcode.com/paper/quac-question-answering-in-context
Repo
Framework

Efficient Egocentric Visual Perception Combining Eye-tracking, a Software Retina and Deep Learning

Title Efficient Egocentric Visual Perception Combining Eye-tracking, a Software Retina and Deep Learning
Authors Nina Hristozova, Piotr Ozimek, Jan Paul Siebert
Abstract We present ongoing work to harness biological approaches to achieving highly efficient egocentric perception by combining the space-variant imaging architecture of the mammalian retina with Deep Learning methods. By pre-processing images collected by means of eye-tracking glasses to control the fixation locations of a software retina model, we demonstrate that we can reduce the input to a DCNN by a factor of 3, reduce the required number of training epochs and obtain over 98% classification rates when training and validating the system on a database of over 26,000 images of 9 object classes.
Tasks Eye Tracking
Published 2018-09-05
URL http://arxiv.org/abs/1809.01633v1
PDF http://arxiv.org/pdf/1809.01633v1.pdf
PWC https://paperswithcode.com/paper/efficient-egocentric-visual-perception
Repo
Framework
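
As a toy illustration of space-variant, fixation-centred sampling (not the authors' software retina), the snippet below resamples an image on a log-polar grid around a fixation point, shrinking the input that would be passed to a DCNN. Ring and wedge counts are assumptions.

```python
import numpy as np

def retina_sample(image, fixation, n_rings=32, n_wedges=64, max_r=None):
    """Toy space-variant sampling: dense near the fixation, sparse in the periphery.
    Returns an (n_rings, n_wedges) "cortical" image; parameters are illustrative."""
    h, w = image.shape[:2]
    fy, fx = fixation
    max_r = max_r or min(h, w) / 2
    radii = np.exp(np.linspace(0, np.log(max_r), n_rings))          # log-spaced rings
    angles = np.linspace(0, 2 * np.pi, n_wedges, endpoint=False)
    ys = np.clip((fy + radii[:, None] * np.sin(angles)).astype(int), 0, h - 1)
    xs = np.clip((fx + radii[:, None] * np.cos(angles)).astype(int), 0, w - 1)
    return image[ys, xs]
```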

Matrix Factorization via Deep Learning

Title Matrix Factorization via Deep Learning
Authors Duc Minh Nguyen, Evaggelia Tsiligianni, Nikos Deligiannis
Abstract Matrix completion is one of the key problems in signal processing and machine learning. In recent years, deep-learning-based models have achieved state-of-the-art results in matrix completion. Nevertheless, they suffer from two drawbacks: (i) they cannot be easily extended to rows or columns unseen during training; and (ii) their results often degrade when discrete predictions are required. This paper addresses these two drawbacks by presenting a deep matrix factorization model and a generic method that allows joint training of the factorization model and the discretization operator. Experiments on a real movie-rating dataset show the efficacy of the proposed models.
Tasks Matrix Completion
Published 2018-12-04
URL http://arxiv.org/abs/1812.01478v1
PDF http://arxiv.org/pdf/1812.01478v1.pdf
PWC https://paperswithcode.com/paper/matrix-factorization-via-deep-learning
Repo
Framework
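
A generic deep matrix factorization sketch in the spirit of the abstract (not the authors' specific model): two small networks produce latent factors whose inner product predicts an observed entry, trained with a squared-error loss on known ratings. Sizes and the use of index embeddings are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DeepMF(nn.Module):
    """Two networks map row/column ids to latent factors; their dot product is the entry."""
    def __init__(self, n_rows, n_cols, dim=32):
        super().__init__()
        self.row_net = nn.Sequential(nn.Embedding(n_rows, 64), nn.ReLU(), nn.Linear(64, dim))
        self.col_net = nn.Sequential(nn.Embedding(n_cols, 64), nn.ReLU(), nn.Linear(64, dim))

    def forward(self, row_idx, col_idx):
        u = self.row_net(row_idx)              # (batch, dim)
        v = self.col_net(col_idx)              # (batch, dim)
        return (u * v).sum(dim=-1)             # predicted matrix entry

model = DeepMF(n_rows=1000, n_cols=1700)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# one training step on a mini-batch of observed (row, col, rating) triples:
rows, cols = torch.randint(0, 1000, (256,)), torch.randint(0, 1700, (256,))
ratings = torch.randint(1, 6, (256,)).float()
loss = nn.functional.mse_loss(model(rows, cols), ratings)
opt.zero_grad(); loss.backward(); opt.step()
```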

Nonlinear ICA Using Auxiliary Variables and Generalized Contrastive Learning

Title Nonlinear ICA Using Auxiliary Variables and Generalized Contrastive Learning
Authors Aapo Hyvarinen, Hiroaki Sasaki, Richard E. Turner
Abstract Nonlinear ICA is a fundamental problem for unsupervised representation learning, emphasizing the capacity to recover the underlying latent variables generating the data (i.e., identifiability). Recently, the very first identifiability proofs for nonlinear ICA have been proposed, leveraging the temporal structure of the independent components. Here, we propose a general framework for nonlinear ICA, which, as a special case, can make use of temporal structure. It is based on augmenting the data by an auxiliary variable, such as the time index, the history of the time series, or any other available information. We propose to learn nonlinear ICA by discriminating between true augmented data and data in which the auxiliary variable has been randomized. This enables the framework to be implemented algorithmically through logistic regression, possibly in a neural network. We provide a comprehensive proof of the identifiability of the model as well as the consistency of our estimation method. The approach not only provides a general theoretical framework combining and generalizing previously proposed nonlinear ICA models and algorithms, but also brings practical advantages.
Tasks Representation Learning, Time Series, Unsupervised Representation Learning
Published 2018-05-22
URL http://arxiv.org/abs/1805.08651v3
PDF http://arxiv.org/pdf/1805.08651v3.pdf
PWC https://paperswithcode.com/paper/nonlinear-ica-using-auxiliary-variables-and
Repo
Framework
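
The estimation scheme described above can be sketched directly: a feature extractor and a logistic-regression-style discriminator are trained to tell real (x, u) pairs from pairs in which the auxiliary variable u has been shuffled. Architectures and dimensions below are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class ContrastiveICA(nn.Module):
    """h(x) extracts candidate components; disc scores (h(x), u) compatibility."""
    def __init__(self, x_dim, u_dim, n_components):
        super().__init__()
        self.h = nn.Sequential(nn.Linear(x_dim, 128), nn.ReLU(), nn.Linear(128, n_components))
        self.disc = nn.Sequential(nn.Linear(n_components + u_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def logit(self, x, u):
        return self.disc(torch.cat([self.h(x), u], dim=-1)).squeeze(-1)

def training_step(model, x, u, opt):
    u_fake = u[torch.randperm(len(u))]                 # break the dependence between x and u
    logits = torch.cat([model.logit(x, u), model.logit(x, u_fake)])
    labels = torch.cat([torch.ones(len(x)), torch.zeros(len(x))])
    loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```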

Performance Estimation of Synthesis Flows cross Technologies using LSTMs and Transfer Learning

Title Performance Estimation of Synthesis Flows cross Technologies using LSTMs and Transfer Learning
Authors Cunxi Yu, Wang Zhou
Abstract Due to the increasing complexity of Integrated Circuits (ICs) and Systems-on-Chip (SoC), developing high-quality synthesis flows within a short time-to-market becomes more challenging. We propose a general approach that precisely estimates the Quality-of-Result (QoR), such as delay and area, of unseen synthesis flows for specific designs. The main idea is training a Recurrent Neural Network (RNN) regressor, where the flows are inputs and the QoRs are ground truth. The RNN regressor is constructed with Long Short-Term Memory (LSTM) and fully-connected layers. This approach is demonstrated with 1.2 million data points collected using 14nm, 7nm regular-voltage (RVT), and 7nm low-voltage (LVT) FinFET technologies on twelve IC designs. The accuracy of predicting the QoRs (delay and area) within one technology is ≥98.0% over ~240,000 test points. To enable accurate predictions across different technologies and different IC designs, we propose a transfer-learning approach that utilizes the model pre-trained on 14nm datasets. Our transfer-learning approach obtains estimation accuracy ≥96.3% over ~960,000 test points, using only 100 data points for training.
Tasks Transfer Learning
Published 2018-11-14
URL http://arxiv.org/abs/1811.06017v1
PDF http://arxiv.org/pdf/1811.06017v1.pdf
PWC https://paperswithcode.com/paper/performance-estimation-of-synthesis-flows
Repo
Framework
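
A rough sketch of the described regressor, assuming a synthesis flow is encoded as a sequence of transformation ids: an LSTM followed by fully-connected layers maps the flow to a predicted QoR value, and the transfer-learning step reuses the pretrained body. Vocabulary size, layer widths, and the freezing choice are assumptions, not the authors' setup.

```python
import torch
import torch.nn as nn

class FlowQoRRegressor(nn.Module):
    """LSTM + fully-connected regressor: flow sequence in, QoR (e.g., delay or area) out."""
    def __init__(self, n_transforms=30, emb=16, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(n_transforms, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, flows):                     # flows: (batch, flow_length) transformation ids
        _, (h_n, _) = self.lstm(self.embed(flows))
        return self.head(h_n[-1]).squeeze(-1)     # predicted QoR per flow

# transfer across technologies: reuse the body pretrained on one technology,
# fine-tune only the head on a handful of points from the new technology (one option)
model = FlowQoRRegressor()
for p in model.lstm.parameters():
    p.requires_grad = False
```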

TwoStreamVAN: Improving Motion Modeling in Video Generation

Title TwoStreamVAN: Improving Motion Modeling in Video Generation
Authors Ximeng Sun, Huijuan Xu, Kate Saenko
Abstract Video generation is an inherently challenging task, as it requires modeling realistic temporal dynamics as well as spatial content. Existing methods entangle the two intrinsically different tasks of motion and content creation in a single generator network, but this approach struggles to simultaneously generate plausible motion and content. To improve motion modeling in video generation tasks, we propose a two-stream model that disentangles motion generation from content generation, called a Two-Stream Variational Adversarial Network (TwoStreamVAN). Given an action label and a noise vector, our model is able to create clear and consistent motion, and thus yields photorealistic videos. The key idea is to progressively generate and fuse multi-scale motion with its corresponding spatial content. Our model significantly outperforms existing methods on the standard Weizmann Human Action, MUG Facial Expression, and VoxCeleb datasets, as well as our new dataset of diverse human actions with challenging and complex motion. Our code is available at https://github.com/sunxm2357/TwoStreamVAN/.
Tasks Video Generation
Published 2018-12-03
URL https://arxiv.org/abs/1812.01037v2
PDF https://arxiv.org/pdf/1812.01037v2.pdf
PWC https://paperswithcode.com/paper/a-two-stream-variational-adversarial-network
Repo
Framework

Improved Semantic-Aware Network Embedding with Fine-Grained Word Alignment

Title Improved Semantic-Aware Network Embedding with Fine-Grained Word Alignment
Authors Dinghan Shen, Xinyuan Zhang, Ricardo Henao, Lawrence Carin
Abstract Network embeddings, which learn low-dimensional representations for each vertex in a large-scale network, have received considerable attention in recent years. For a wide range of applications, vertices in a network are typically accompanied by rich textual information such as user profiles, paper abstracts, etc. We propose to incorporate semantic features into network embeddings by matching important words between text sequences for all pairs of vertices. We introduce a word-by-word alignment framework that measures the compatibility of embeddings between word pairs, and then adaptively accumulates these alignment features with a simple yet effective aggregation function. In experiments, we evaluate the proposed framework on three real-world benchmarks for downstream tasks, including link prediction and multi-label vertex classification. Results demonstrate that our model outperforms state-of-the-art network embedding methods by a large margin.
Tasks Link Prediction, Network Embedding, Word Alignment
Published 2018-08-29
URL http://arxiv.org/abs/1808.09633v1
PDF http://arxiv.org/pdf/1808.09633v1.pdf
PWC https://paperswithcode.com/paper/improved-semantic-aware-network-embedding
Repo
Framework
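
A minimal sketch of the word-by-word alignment idea, assuming precomputed word embeddings for the two vertices' texts: every word pair is scored, and each word's best match is aggregated into a fixed-size feature for the vertex pair. The max/mean aggregation is a simple stand-in, not the paper's exact function.

```python
import torch
import torch.nn.functional as F

def alignment_features(text_a, text_b):
    """text_a: (La, d) and text_b: (Lb, d) word embeddings for the two vertices.
    Scores all word pairs, keeps each word's best match, and averages per side."""
    sim = F.normalize(text_a, dim=-1) @ F.normalize(text_b, dim=-1).T   # (La, Lb) cosine scores
    a_to_b = sim.max(dim=1).values                                       # best match per word of a
    b_to_a = sim.max(dim=0).values                                       # best match per word of b
    return torch.stack([a_to_b.mean(), b_to_a.mean()])                   # pair-level feature
```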

Event-based High Dynamic Range Image and Very High Frame Rate Video Generation using Conditional Generative Adversarial Networks

Title Event-based High Dynamic Range Image and Very High Frame Rate Video Generation using Conditional Generative Adversarial Networks
Authors S. Mohammad Mostafavi I., Lin Wang, Yo-Sung Ho, Kuk-Jin Yoon
Abstract Event cameras have many advantages over traditional cameras, such as low latency, high temporal resolution, and high dynamic range. However, since the outputs of event cameras are sequences of asynchronous events over time rather than actual intensity images, existing algorithms cannot be directly applied. Therefore, generating intensity images from events is needed for other tasks. In this paper, we unlock the potential of event-camera-based conditional generative adversarial networks to create images/videos from an adjustable portion of the event data stream. Stacks of space-time coordinates of events are used as inputs, and the network is trained to reproduce images based on the spatio-temporal intensity changes. The usefulness of event cameras for generating high dynamic range (HDR) images even in extreme illumination conditions, as well as non-blurred images under rapid motion, is also shown. In addition, the possibility of generating very high frame rate videos is demonstrated, theoretically up to 1 million frames per second (FPS), since the temporal resolution of event cameras is about 1 μs. The proposed methods are evaluated by comparing the results with the intensity images captured on the same pixel grid as the events, using publicly available real datasets and synthetic datasets produced by an event camera simulator.
Tasks Video Generation
Published 2018-11-20
URL http://arxiv.org/abs/1811.08230v1
PDF http://arxiv.org/pdf/1811.08230v1.pdf
PWC https://paperswithcode.com/paper/event-based-high-dynamic-range-image-and-very
Repo
Framework
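
A toy version of the input construction described above (not the authors' pipeline): events from an adjustable time slice are accumulated into a small stack of temporal frames that a conditional GAN generator could consume. The grid size, bin count, and polarity encoding are assumptions.

```python
import numpy as np

def stack_events(events, h, w, n_bins=3):
    """events: (N, 4) array of (x, y, t, polarity).
    Accumulates signed event counts into n_bins temporal frames of size (h, w)."""
    x, y, t, p = events.T
    t0, t1 = t.min(), t.max()
    bins = np.minimum(((t - t0) / max(t1 - t0, 1e-9) * n_bins).astype(int), n_bins - 1)
    stack = np.zeros((n_bins, h, w), dtype=np.float32)
    np.add.at(stack, (bins, y.astype(int), x.astype(int)), np.where(p > 0, 1.0, -1.0))
    return stack   # multi-channel image fed to the conditional generator
```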

Full-body High-resolution Anime Generation with Progressive Structure-conditional Generative Adversarial Networks

Title Full-body High-resolution Anime Generation with Progressive Structure-conditional Generative Adversarial Networks
Authors Koichi Hamada, Kentaro Tachibana, Tianqi Li, Hiroto Honda, Yusuke Uchida
Abstract We propose Progressive Structure-conditional Generative Adversarial Networks (PSGAN), a new framework that can generate full-body and high-resolution character images based on structural information. Recent progress in generative adversarial networks with progressive training has made it possible to generate high-resolution images. However, existing approaches have limitations in achieving both high image quality and structural consistency at the same time. Our method tackles the limitations by progressively increasing the resolution of both generated images and structural conditions during training. In this paper, we empirically demonstrate the effectiveness of this method by showing the comparison with existing approaches and video generation results of diverse anime characters at 1024x1024 based on target pose sequences. We also create a novel dataset containing full-body 1024x1024 high-resolution images and exact 2D pose keypoints using Unity 3D Avatar models.
Tasks Video Generation
Published 2018-09-06
URL http://arxiv.org/abs/1809.01890v1
PDF http://arxiv.org/pdf/1809.01890v1.pdf
PWC https://paperswithcode.com/paper/full-body-high-resolution-anime-generation
Repo
Framework

Controllable Image-to-Video Translation: A Case Study on Facial Expression Generation

Title Controllable Image-to-Video Translation: A Case Study on Facial Expression Generation
Authors Lijie Fan, Wenbing Huang, Chuang Gan, Junzhou Huang, Boqing Gong
Abstract Recent advances in deep learning have made it possible to generate photo-realistic images using neural networks and even to extrapolate video frames from an input video clip. In this paper, to further this exploration and pursue a realistic application, we study image-to-video translation with a particular focus on videos of facial expressions. Compared to image-to-image translation, this problem challenges deep neural networks with an additional temporal dimension. Moreover, the single input image defeats most existing video generation methods, which rely on recurrent models. We propose a user-controllable approach to generate video clips of various lengths from a single face image, where the lengths and types of the expressions are controlled by users. To this end, we design a novel neural network architecture that incorporates the user input into its skip connections, and we propose several improvements to the adversarial training method for the network. Experiments and user studies verify the effectiveness of our approach. In particular, even for face images in the wild (downloaded from the Web and the authors’ own photos), our model generates high-quality facial expression videos, about 50% of which are labeled as real by Amazon Mechanical Turk workers.
Tasks Image-to-Image Translation, Video Generation
Published 2018-08-09
URL http://arxiv.org/abs/1808.02992v1
PDF http://arxiv.org/pdf/1808.02992v1.pdf
PWC https://paperswithcode.com/paper/controllable-image-to-video-translation-a
Repo
Framework

Learning to Forecast and Refine Residual Motion for Image-to-Video Generation

Title Learning to Forecast and Refine Residual Motion for Image-to-Video Generation
Authors Long Zhao, Xi Peng, Yu Tian, Mubbasir Kapadia, Dimitris Metaxas
Abstract We consider the problem of image-to-video translation, where an input image is translated into an output video containing motions of a single object. Recent methods for such problems typically train transformation networks to generate future frames conditioned on the structure sequence. Parallel work has shown that short high-quality motions can be generated by spatiotemporal generative networks that leverage temporal knowledge from the training data. We combine the benefits of both approaches and propose a two-stage generation framework where videos are generated from structures and then refined by temporal signals. To model motions more efficiently, we train networks to learn residual motion between the current and future frames, which avoids learning motion-irrelevant details. We conduct extensive experiments on two image-to-video translation tasks: facial expression retargeting and human pose forecasting. Superior results over the state-of-the-art methods on both tasks demonstrate the effectiveness of our approach.
Tasks Human Pose Forecasting, Video Generation
Published 2018-07-26
URL http://arxiv.org/abs/1807.09951v1
PDF http://arxiv.org/pdf/1807.09951v1.pdf
PWC https://paperswithcode.com/paper/learning-to-forecast-and-refine-residual
Repo
Framework
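
The residual-motion idea can be sketched as follows, with a toy convolutional network standing in for the paper's generator: the network predicts only the change from the current frame, so static content contributes little to the loss. Layer sizes and the L1 objective are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ResidualMotionPredictor(nn.Module):
    """Predicts the next frame as current frame + learned residual motion."""
    def __init__(self, channels=3):
        super().__init__()
        self.delta_net = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1), nn.Tanh())

    def forward(self, current_frame):
        residual = self.delta_net(current_frame)     # motion-related change only
        return current_frame + residual              # predicted future frame

model = ResidualMotionPredictor()
frame_t, frame_t1 = torch.rand(4, 3, 64, 64), torch.rand(4, 3, 64, 64)
loss = nn.functional.l1_loss(model(frame_t), frame_t1)   # supervised by the true next frame
```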

Multi-task Prediction of Patient Workload

Title Multi-task Prediction of Patient Workload
Authors Mohammad Hessam Olya, Dongxiao Zhu, Kai Yang
Abstract Developing reliable workload prediction models can affect many aspects of the clinical decision-making process. The primary challenge in healthcare systems is handling demand uncertainty over time. This issue becomes more critical for healthcare facilities that treat chronic diseases because of the need for continuous treatment over time. Although some researchers have recently explored methods for workload prediction, few studies have focused on forecasting a quantitative measure of healthcare providers' workload, and most of them considered workload prediction within a single facility. The drawback of previous studies is that the problem has not been investigated across multiple facilities, where the quality of the provided service, the equipment and resources used, and the diagnosis and treatment procedures may differ even for patients with similar conditions. To tackle this issue, this paper proposes a framework for patient workload prediction using patient data from VA facilities across the US. To capture the information of patients with similar attributes and make the prediction more accurate, a heuristic cluster-based algorithm for single-task learning as well as a multi-task learning approach are developed.
Tasks Decision Making, Multi-Task Learning
Published 2018-12-27
URL http://arxiv.org/abs/1901.00746v1
PDF http://arxiv.org/pdf/1901.00746v1.pdf
PWC https://paperswithcode.com/paper/multi-task-prediction-of-patient-workload
Repo
Framework

Dual Policy Iteration

Title Dual Policy Iteration
Authors Wen Sun, Geoffrey J. Gordon, Byron Boots, J. Andrew Bagnell
Abstract Recently, a novel class of Approximate Policy Iteration (API) algorithms has demonstrated impressive practical performance (e.g., ExIt from [2], AlphaGo-Zero from [27]). This new family of algorithms maintains, and alternately optimizes, two policies: a fast, reactive policy (e.g., a deep neural network) deployed at test time, and a slow, non-reactive policy (e.g., Tree Search) that can plan multiple steps ahead. The reactive policy is updated under supervision from the non-reactive policy, while the non-reactive policy is improved with guidance from the reactive policy. In this work we study this Dual Policy Iteration (DPI) strategy in an alternating optimization framework and provide a convergence analysis that extends existing API theory. We also develop a special instance of this framework which reduces the update of non-reactive policies to model-based optimal control using learned local models, and provides a theoretically sound way of unifying model-free and model-based RL approaches with unknown dynamics. We demonstrate the efficacy of our approach on various continuous control Markov Decision Processes.
Tasks Continuous Control
Published 2018-05-28
URL http://arxiv.org/abs/1805.10755v2
PDF http://arxiv.org/pdf/1805.10755v2.pdf
PWC https://paperswithcode.com/paper/dual-policy-iteration
Repo
Framework
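
The alternating scheme reads naturally as a short loop; the sketch below uses hypothetical interfaces (collect_states, plan, fit) purely to show the structure, not the authors' implementation.

```python
# Sketch of the DPI alternation: a slow planner improves on the fast policy,
# and the fast policy imitates the planner.
def dual_policy_iteration(fast_policy, slow_planner, env_model, n_iters=100):
    for _ in range(n_iters):
        states = fast_policy.collect_states()                 # roll out the reactive policy
        # slow, non-reactive update: plan several steps ahead from visited states,
        # e.g., model-based optimal control with a locally learned dynamics model
        expert_actions = [slow_planner.plan(s, env_model, guide=fast_policy) for s in states]
        # fast, reactive update: supervised imitation of the planner's actions
        fast_policy.fit(states, expert_actions)
        env_model.fit(fast_policy.recent_transitions())       # refresh the local dynamics model
    return fast_policy
```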