Paper Group ANR 1160
Tensor Monte Carlo: particle methods for the GPU era. AI in Game Playing: Sokoban Solver. QuAC : Question Answering in Context. Efficient Egocentric Visual Perception Combining Eye-tracking, a Software Retina and Deep Learning. Matrix Factorization via Deep Learning. Nonlinear ICA Using Auxiliary Variables and Generalized Contrastive Learning. Perf …
Tensor Monte Carlo: particle methods for the GPU era
Title | Tensor Monte Carlo: particle methods for the GPU era |
Authors | Laurence Aitchison |
Abstract | Multi-sample, importance-weighted variational autoencoders (IWAE) give tighter bounds and more accurate uncertainty estimates than variational autoencoders (VAE) trained with a standard single-sample objective. However, IWAEs scale poorly: as the latent dimensionality grows, they require exponentially many samples to retain the benefits of importance weighting. While sequential Monte-Carlo (SMC) can address this problem, it is prohibitively slow because the resampling step imposes sequential structure which cannot be parallelised, and moreover, resampling is non-differentiable which is problematic when learning approximate posteriors. To address these issues, we developed tensor Monte-Carlo (TMC) which gives exponentially many importance samples by separately drawing $K$ samples for each of the $n$ latent variables, then averaging over all $K^n$ possible combinations. While the sum over exponentially many terms might seem to be intractable, in many cases it can be computed efficiently as a series of tensor inner-products. We show that TMC is superior to IWAE on a generative model with multiple stochastic layers trained on the MNIST handwritten digit database, and we show that TMC can be combined with standard variance reduction techniques. |
Tasks | |
Published | 2018-06-22 |
URL | http://arxiv.org/abs/1806.08593v3 |
http://arxiv.org/pdf/1806.08593v3.pdf | |
PWC | https://paperswithcode.com/paper/tensor-monte-carlo-particle-methods-for-the |
Repo | |
Framework | |
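The abstract's key computational claim, that the average over all $K^n$ sample combinations reduces to a chain of tensor inner products, can be checked on a toy chain-structured model. The sketch below is illustrative only (not the authors' code) and assumes a two-latent chain whose per-factor importance weights have already been computed.

```python
import numpy as np
from itertools import product

# Hedged sketch: for a chain-structured model p(x | z1) p(z1 | z2) p(z2),
# drawing K samples per latent gives per-factor importance weights; the sum
# over all K**n joint combinations factorises into a chain of tensor
# (here matrix/vector) products.
rng = np.random.default_rng(0)
K = 4
w_lik = rng.random(K)        # w_lik[k1]    ~ p(x | z1^{k1})
w_12  = rng.random((K, K))   # w_12[k1,k2]  ~ p(z1^{k1} | z2^{k2}) / q(z1^{k1})
w_2   = rng.random(K)        # w_2[k2]      ~ p(z2^{k2}) / q(z2^{k2})

# Brute force: average over all K**2 combinations of (k1, k2).
brute = np.mean([w_lik[k1] * w_12[k1, k2] * w_2[k2]
                 for k1, k2 in product(range(K), repeat=2)])

# Tensor-contraction form: two inner products instead of K**2 summands.
fast = w_lik @ w_12 @ w_2 / K**2

assert np.allclose(brute, fast)
print(brute, fast)
```

With deeper chains the same pattern holds: one contraction per latent variable, so the cost grows linearly in the number of latents rather than exponentially.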
AI in Game Playing: Sokoban Solver
Title | AI in Game Playing: Sokoban Solver |
Authors | Anand Venkatesan, Atishay Jain, Rakesh Grewal |
Abstract | Artificial Intelligence is becoming instrumental in a variety of applications. Games serve as a good breeding ground for trying and testing these algorithms in a sandbox with simpler constraints in comparison to real life. In this project, we aim to develop an AI agent that can solve the classical Japanese game of Sokoban using various algorithms and heuristics and compare their performances through standard metrics. |
Tasks | |
Published | 2018-06-29 |
URL | http://arxiv.org/abs/1807.00049v1 |
http://arxiv.org/pdf/1807.00049v1.pdf | |
PWC | https://paperswithcode.com/paper/ai-in-game-playing-sokoban-solver |
Repo | |
Framework | |
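Sokoban is usually attacked with informed search. One common admissible heuristic (an assumption here; the abstract does not name the heuristics used) is the sum, over boxes, of the Manhattan distance to the nearest goal, which plugs directly into A* or greedy best-first search.

```python
# Hedged sketch of a standard Sokoban heuristic, not necessarily the paper's:
# for each box, take the Manhattan distance to its nearest goal and sum.
def manhattan_heuristic(boxes, goals):
    """boxes, goals: iterables of (row, col) tuples."""
    return sum(min(abs(br - gr) + abs(bc - gc) for gr, gc in goals)
               for br, bc in boxes)

print(manhattan_heuristic(boxes=[(1, 1), (3, 4)], goals=[(1, 2), (4, 4)]))  # 2
```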
QuAC : Question Answering in Context
Title | QuAC : Question Answering in Context |
Authors | Eunsol Choi, He He, Mohit Iyyer, Mark Yatskar, Wen-tau Yih, Yejin Choi, Percy Liang, Luke Zettlemoyer |
Abstract | We present QuAC, a dataset for Question Answering in Context that contains 14K information-seeking QA dialogs (100K questions in total). The dialogs involve two crowd workers: (1) a student who poses a sequence of freeform questions to learn as much as possible about a hidden Wikipedia text, and (2) a teacher who answers the questions by providing short excerpts from the text. QuAC introduces challenges not found in existing machine comprehension datasets: its questions are often more open-ended, unanswerable, or only meaningful within the dialog context, as we show in a detailed qualitative evaluation. We also report results for a number of reference models, including a recent state-of-the-art reading comprehension architecture extended to model dialog context. Our best model underperforms humans by 20 F1, suggesting that there is significant room for future work on this data. Dataset, baseline, and leaderboard available at http://quac.ai. |
Tasks | Question Answering, Reading Comprehension |
Published | 2018-08-21 |
URL | http://arxiv.org/abs/1808.07036v3 |
http://arxiv.org/pdf/1808.07036v3.pdf | |
PWC | https://paperswithcode.com/paper/quac-question-answering-in-context |
Repo | |
Framework | |
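The "20 F1" gap quoted in the abstract refers to the token-overlap F1 commonly used for extractive QA. A minimal sketch of that metric follows; whether QuAC applies additional answer normalisation is an assumption not taken from the abstract.

```python
from collections import Counter

# Hedged sketch of SQuAD-style token-overlap F1 between a predicted answer
# span and a reference span.
def token_f1(prediction: str, reference: str) -> float:
    pred, ref = prediction.lower().split(), reference.lower().split()
    common = Counter(pred) & Counter(ref)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(token_f1("the brown fox", "a brown fox"))  # 0.666...
```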
Efficient Egocentric Visual Perception Combining Eye-tracking, a Software Retina and Deep Learning
Title | Efficient Egocentric Visual Perception Combining Eye-tracking, a Software Retina and Deep Learning |
Authors | Nina Hristozova, Piotr Ozimek, Jan Paul Siebert |
Abstract | We present ongoing work to harness biological approaches to achieving highly efficient egocentric perception by combining the space-variant imaging architecture of the mammalian retina with Deep Learning methods. By pre-processing images collected by means of eye-tracking glasses to control the fixation locations of a software retina model, we demonstrate that we can reduce the input to a DCNN by a factor of 3, reduce the required number of training epochs and obtain over 98% classification rates when training and validating the system on a database of over 26,000 images of 9 object classes. |
Tasks | Eye Tracking |
Published | 2018-09-05 |
URL | http://arxiv.org/abs/1809.01633v1 |
http://arxiv.org/pdf/1809.01633v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-egocentric-visual-perception |
Repo | |
Framework | |
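The input reduction comes from space-variant sampling: resolution is dense at the fixation point and sparse in the periphery. The sketch below uses a plain log-polar lookup as a stand-in for the paper's software retina (whose receptive-field tessellation is more sophisticated), purely to illustrate how a full image collapses to a much smaller retinal image centred on the tracked fixation.

```python
import numpy as np

# Hedged sketch: a space-variant (retina-like) sampler implemented as a
# log-polar lookup around the fixation point (cx, cy).  The ring/wedge counts
# are illustrative assumptions, not the paper's configuration.
def log_polar_sample(img, cx, cy, n_rings=32, n_wedges=64, r_max=None):
    h, w = img.shape[:2]
    r_max = r_max or min(h, w) / 2
    rings = np.geomspace(1.0, r_max, n_rings)            # radii grow exponentially
    thetas = np.linspace(0, 2 * np.pi, n_wedges, endpoint=False)
    ys = np.clip((cy + rings[:, None] * np.sin(thetas)).astype(int), 0, h - 1)
    xs = np.clip((cx + rings[:, None] * np.cos(thetas)).astype(int), 0, w - 1)
    return img[ys, xs]                                    # shape (n_rings, n_wedges)

img = np.random.rand(480, 640)
print(log_polar_sample(img, cx=320, cy=240).shape)        # (32, 64): far fewer samples
```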
Matrix Factorization via Deep Learning
Title | Matrix Factorization via Deep Learning |
Authors | Duc Minh Nguyen, Evaggelia Tsiligianni, Nikos Deligiannis |
Abstract | Matrix completion is one of the key problems in signal processing and machine learning. In recent years, deep-learning-based models have achieved state-of-the-art results in matrix completion. Nevertheless, they suffer from two drawbacks: (i) they cannot be easily extended to rows or columns unseen during training; and (ii) their results often degrade when discrete predictions are required. This paper addresses these two drawbacks by presenting a deep matrix factorization model and a generic method to allow joint training of the factorization model and the discretization operator. Experiments on a real movie rating dataset show the efficacy of the proposed models. |
Tasks | Matrix Completion |
Published | 2018-12-04 |
URL | http://arxiv.org/abs/1812.01478v1 |
http://arxiv.org/pdf/1812.01478v1.pdf | |
PWC | https://paperswithcode.com/paper/matrix-factorization-via-deep-learning |
Repo | |
Framework | |
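For contrast with the deep model, a shallow matrix-factorisation baseline with naive rounding for the discrete case is sketched below. The paper replaces the two factor matrices with neural networks and learns the discretisation operator jointly with them; neither of those is reproduced here.

```python
import numpy as np

# Hedged sketch: plain rank-r matrix factorisation fitted by gradient descent
# on the observed entries of a synthetic rating matrix, with rounding as a
# naive stand-in for a learned discretisation operator.
rng = np.random.default_rng(0)
n_users, n_items, rank = 50, 40, 5
R = rng.integers(1, 6, size=(n_users, n_items)).astype(float)   # ratings 1..5
mask = rng.random((n_users, n_items)) < 0.2                      # observed entries

U = 0.1 * rng.standard_normal((n_users, rank))
V = 0.1 * rng.standard_normal((n_items, rank))
lr = 0.01
for _ in range(500):
    err = mask * (U @ V.T - R)          # error on observed entries only
    U -= lr * err @ V
    V -= lr * err.T @ U
pred = np.clip(np.rint(U @ V.T), 1, 5)  # naive discretisation: round and clip
rmse = np.sqrt(((mask * (pred - R)) ** 2).sum() / mask.sum())
print(rmse)                             # RMSE on observed entries after rounding
```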
Nonlinear ICA Using Auxiliary Variables and Generalized Contrastive Learning
Title | Nonlinear ICA Using Auxiliary Variables and Generalized Contrastive Learning |
Authors | Aapo Hyvarinen, Hiroaki Sasaki, Richard E. Turner |
Abstract | Nonlinear ICA is a fundamental problem for unsupervised representation learning, emphasizing the capacity to recover the underlying latent variables generating the data (i.e., identifiability). Recently, the very first identifiability proofs for nonlinear ICA have been proposed, leveraging the temporal structure of the independent components. Here, we propose a general framework for nonlinear ICA, which, as a special case, can make use of temporal structure. It is based on augmenting the data by an auxiliary variable, such as the time index, the history of the time series, or any other available information. We propose to learn nonlinear ICA by discriminating between true augmented data and data in which the auxiliary variable has been randomized. This enables the framework to be implemented algorithmically through logistic regression, possibly in a neural network. We provide a comprehensive proof of the identifiability of the model as well as the consistency of our estimation method. The approach not only provides a general theoretical framework combining and generalizing previously proposed nonlinear ICA models and algorithms, but also brings practical advantages. |
Tasks | Representation Learning, Time Series, Unsupervised Representation Learning |
Published | 2018-05-22 |
URL | http://arxiv.org/abs/1805.08651v3 |
http://arxiv.org/pdf/1805.08651v3.pdf | |
PWC | https://paperswithcode.com/paper/nonlinear-ica-using-auxiliary-variables-and |
Repo | |
Framework | |
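The estimation principle reduces to a binary classification problem: label genuine (data, auxiliary-variable) pairs as positive and pairs with a shuffled auxiliary variable as negative, then train a discriminator. The sketch below shows only the data construction with a linear logistic regression; the paper's discriminator is a nonlinear (neural-network) function whose hidden features recover the independent components.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hedged sketch of the generalised contrastive set-up on toy data.
rng = np.random.default_rng(0)
n = 2000
u = rng.integers(0, 5, size=n)                        # auxiliary variable, e.g. segment index
s = rng.standard_normal((n, 2)) * (1 + u[:, None])    # sources modulated by u
x = np.tanh(s)                                        # toy nonlinear mixing

X_true = np.column_stack([x, u])                      # genuine (x, u) pairs -> label 1
X_fake = np.column_stack([x, rng.permutation(u)])     # u randomised          -> label 0
X = np.vstack([X_true, X_fake])
y = np.concatenate([np.ones(n), np.zeros(n)])

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.score(X, y))                                # better than chance if x depends on u
```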
Performance Estimation of Synthesis Flows cross Technologies using LSTMs and Transfer Learning
Title | Performance Estimation of Synthesis Flows cross Technologies using LSTMs and Transfer Learning |
Authors | Cunxi Yu, Wang Zhou |
Abstract | Due to the increasing complexity of Integrated Circuits (ICs) and System-on-Chip (SoC), developing high-quality synthesis flows within a short time-to-market becomes more challenging. We propose a general approach that precisely estimates the Quality-of-Result (QoR), such as delay and area, of unseen synthesis flows for specific designs. The main idea is training a Recurrent Neural Network (RNN) regressor, where the flows are inputs and QoRs are ground truth. The RNN regressor is constructed with Long Short-Term Memory (LSTM) and fully-connected layers. This approach is demonstrated with 1.2 million data points collected using 14nm, 7nm regular-voltage (RVT), and 7nm low-voltage (LVT) FinFET technologies with twelve IC designs. The accuracy of predicting the QoRs (delay and area) within one technology is $\geq$98.0% over $\sim$240,000 test points. To enable accurate predictions across different technologies and different IC designs, we propose a transfer-learning approach that utilizes the model pre-trained with 14nm datasets. Our transfer learning approach obtains estimation accuracy $\geq$96.3% over $\sim$960,000 test points, using only 100 data points for training. |
Tasks | Transfer Learning |
Published | 2018-11-14 |
URL | http://arxiv.org/abs/1811.06017v1 |
http://arxiv.org/pdf/1811.06017v1.pdf | |
PWC | https://paperswithcode.com/paper/performance-estimation-of-synthesis-flows |
Repo | |
Framework | |
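A flow here is an ordered sequence of synthesis transformations, which maps naturally onto a sequence model. The sketch below is a minimal version of such an LSTM regressor (vocabulary and layer sizes are assumptions, not the paper's configuration); for the cross-technology transfer-learning step, the same weights would be reloaded and fine-tuned on the roughly 100 data points from the target technology.

```python
import torch
import torch.nn as nn

# Hedged sketch: a flow is a sequence of synthesis-transformation IDs, which
# is embedded, run through an LSTM, and regressed to a (delay, area) pair.
class FlowQoRRegressor(nn.Module):
    def __init__(self, n_transforms=30, embed_dim=16, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(n_transforms, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, 32), nn.ReLU(), nn.Linear(32, 2))

    def forward(self, flows):            # flows: (batch, flow_len) integer IDs
        h, _ = self.lstm(self.embed(flows))
        return self.head(h[:, -1])       # QoR estimate: (delay, area)

model = FlowQoRRegressor()
flows = torch.randint(0, 30, (8, 12))    # 8 candidate flows, 12 steps each
print(model(flows).shape)                # torch.Size([8, 2])
```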
TwoStreamVAN: Improving Motion Modeling in Video Generation
Title | TwoStreamVAN: Improving Motion Modeling in Video Generation |
Authors | Ximeng Sun, Huijuan Xu, Kate Saenko |
Abstract | Video generation is an inherently challenging task, as it requires modeling realistic temporal dynamics as well as spatial content. Existing methods entangle the two intrinsically different tasks of motion and content creation in a single generator network, but this approach struggles to simultaneously generate plausible motion and content. To improve motion modeling in video generation tasks, we propose a two-stream model that disentangles motion generation from content generation, called a Two-Stream Variational Adversarial Network (TwoStreamVAN). Given an action label and a noise vector, our model is able to create clear and consistent motion, and thus yields photorealistic videos. The key idea is to progressively generate and fuse multi-scale motion with its corresponding spatial content. Our model significantly outperforms existing methods on the standard Weizmann Human Action, MUG Facial Expression, and VoxCeleb datasets, as well as our new dataset of diverse human actions with challenging and complex motion. Our code is available at https://github.com/sunxm2357/TwoStreamVAN/. |
Tasks | Video Generation |
Published | 2018-12-03 |
URL | https://arxiv.org/abs/1812.01037v2 |
https://arxiv.org/pdf/1812.01037v2.pdf | |
PWC | https://paperswithcode.com/paper/a-two-stream-variational-adversarial-network |
Repo | |
Framework | |
Improved Semantic-Aware Network Embedding with Fine-Grained Word Alignment
Title | Improved Semantic-Aware Network Embedding with Fine-Grained Word Alignment |
Authors | Dinghan Shen, Xinyuan Zhang, Ricardo Henao, Lawrence Carin |
Abstract | Network embeddings, which learn low-dimensional representations for each vertex in a large-scale network, have received considerable attention in recent years. For a wide range of applications, vertices in a network are typically accompanied by rich textual information such as user profiles, paper abstracts, etc. We propose to incorporate semantic features into network embeddings by matching important words between text sequences for all pairs of vertices. We introduce a word-by-word alignment framework that measures the compatibility of embeddings between word pairs, and then adaptively accumulates these alignment features with a simple yet effective aggregation function. In experiments, we evaluate the proposed framework on three real-world benchmarks for downstream tasks, including link prediction and multi-label vertex classification. Results demonstrate that our model outperforms state-of-the-art network embedding methods by a large margin. |
Tasks | Link Prediction, Network Embedding, Word Alignment |
Published | 2018-08-29 |
URL | http://arxiv.org/abs/1808.09633v1 |
http://arxiv.org/pdf/1808.09633v1.pdf | |
PWC | https://paperswithcode.com/paper/improved-semantic-aware-network-embedding |
Repo | |
Framework | |
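The core scoring step, measuring compatibility for every word pair between two vertices' texts and then aggregating, can be sketched with cosine similarities and a simple max-then-mean aggregation. The paper's aggregation function is learned and more elaborate; this stand-in only shows the shape of the computation.

```python
import numpy as np

# Hedged sketch of word-by-word alignment between two vertices' texts:
# score every word pair by cosine similarity, take each word's best match,
# and average in both directions.
def alignment_score(E_a, E_b):
    """E_a: (n_a, d) word embeddings of vertex a; E_b: (n_b, d) of vertex b."""
    E_a = E_a / np.linalg.norm(E_a, axis=1, keepdims=True)
    E_b = E_b / np.linalg.norm(E_b, axis=1, keepdims=True)
    sim = E_a @ E_b.T                         # (n_a, n_b) word-pair compatibilities
    return 0.5 * (sim.max(axis=1).mean() + sim.max(axis=0).mean())

rng = np.random.default_rng(0)
print(alignment_score(rng.standard_normal((7, 50)), rng.standard_normal((9, 50))))
```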
Event-based High Dynamic Range Image and Very High Frame Rate Video Generation using Conditional Generative Adversarial Networks
Title | Event-based High Dynamic Range Image and Very High Frame Rate Video Generation using Conditional Generative Adversarial Networks |
Authors | S. Mohammad Mostafavi I., Lin Wang, Yo-Sung Ho, Kuk-Jin Yoon |
Abstract | Event cameras have many advantages over traditional cameras, such as low latency, high temporal resolution, and high dynamic range. However, since the outputs of event cameras are sequences of asynchronous events over time rather than actual intensity images, existing algorithms cannot be directly applied. Therefore, it is desirable to generate intensity images from events for other tasks. In this paper, we unlock the potential of event camera-based conditional generative adversarial networks to create images/videos from an adjustable portion of the event data stream. The stacks of space-time coordinates of events are used as inputs and the network is trained to reproduce images based on the spatio-temporal intensity changes. The usefulness of event cameras to generate high dynamic range (HDR) images even in extreme illumination conditions, and also non-blurred images under rapid motion, is also shown. In addition, the possibility of generating very high frame rate videos is demonstrated, theoretically up to 1 million frames per second (FPS), since the temporal resolution of event cameras is about 1 μs. The proposed methods are evaluated by comparing the results with the intensity images captured on the same pixel grid-line of events, using publicly available real datasets and synthetic datasets produced by the event camera simulator. |
Tasks | Video Generation |
Published | 2018-11-20 |
URL | http://arxiv.org/abs/1811.08230v1 |
http://arxiv.org/pdf/1811.08230v1.pdf | |
PWC | https://paperswithcode.com/paper/event-based-high-dynamic-range-image-and-very |
Repo | |
Framework | |
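The phrase "stacks of space-time coordinates of events" refers to binning the asynchronous event stream into a fixed number of temporal slices before feeding it to the generator. A minimal binning sketch is below; the slice count and the use of signed polarity sums are assumptions for illustration, not the paper's exact stacking scheme.

```python
import numpy as np

# Hedged sketch: bin asynchronous events (x, y, t, polarity) into temporal
# slices, accumulating signed event counts per pixel in each slice.
def stack_events(events, height, width, n_bins=8):
    """events: array of shape (N, 4) with columns x, y, t, polarity (+1/-1)."""
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    t = events[:, 2]
    p = events[:, 3]
    span = t.max() - t.min() + 1e-9
    bins = np.minimum((n_bins * (t - t.min()) / span).astype(int), n_bins - 1)
    stack = np.zeros((n_bins, height, width))
    np.add.at(stack, (bins, y, x), p)      # accumulate signed event counts
    return stack

ev = np.column_stack([np.random.randint(0, 64, 1000),    # x
                      np.random.randint(0, 48, 1000),    # y
                      np.sort(np.random.rand(1000)),     # t
                      np.random.choice([-1, 1], 1000)])  # polarity
print(stack_events(ev, height=48, width=64).shape)        # (8, 48, 64)
```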
Full-body High-resolution Anime Generation with Progressive Structure-conditional Generative Adversarial Networks
Title | Full-body High-resolution Anime Generation with Progressive Structure-conditional Generative Adversarial Networks |
Authors | Koichi Hamada, Kentaro Tachibana, Tianqi Li, Hiroto Honda, Yusuke Uchida |
Abstract | We propose Progressive Structure-conditional Generative Adversarial Networks (PSGAN), a new framework that can generate full-body and high-resolution character images based on structural information. Recent progress in generative adversarial networks with progressive training has made it possible to generate high-resolution images. However, existing approaches have limitations in achieving both high image quality and structural consistency at the same time. Our method tackles the limitations by progressively increasing the resolution of both generated images and structural conditions during training. In this paper, we empirically demonstrate the effectiveness of this method by showing the comparison with existing approaches and video generation results of diverse anime characters at 1024x1024 based on target pose sequences. We also create a novel dataset containing full-body 1024x1024 high-resolution images and exact 2D pose keypoints using Unity 3D Avatar models. |
Tasks | Video Generation |
Published | 2018-09-06 |
URL | http://arxiv.org/abs/1809.01890v1 |
http://arxiv.org/pdf/1809.01890v1.pdf | |
PWC | https://paperswithcode.com/paper/full-body-high-resolution-anime-generation |
Repo | |
Framework | |
Controllable Image-to-Video Translation: A Case Study on Facial Expression Generation
Title | Controllable Image-to-Video Translation: A Case Study on Facial Expression Generation |
Authors | Lijie Fan, Wenbing Huang, Chuang Gan, Junzhou Huang, Boqing Gong |
Abstract | The recent advances in deep learning have made it possible to generate photo-realistic images by using neural networks and even to extrapolate video frames from an input video clip. In this paper, for the sake of both furthering this exploration and our own interest in a realistic application, we study image-to-video translation and particularly focus on videos of facial expressions. This problem challenges deep neural networks with an additional temporal dimension compared to image-to-image translation. Moreover, its single input image defeats most existing video generation methods that rely on recurrent models. We propose a user-controllable approach to generate video clips of various lengths from a single face image. The lengths and types of the expressions are controlled by users. To this end, we design a novel neural network architecture that can incorporate the user input into its skip connections and propose several improvements to the adversarial training method for the neural network. Experiments and user studies verify the effectiveness of our approach. In particular, we highlight that even for face images in the wild (downloaded from the Web and the authors’ own photos), our model can generate high-quality facial expression videos of which about 50% are labeled as real by Amazon Mechanical Turk workers. |
Tasks | Image-to-Image Translation, Video Generation |
Published | 2018-08-09 |
URL | http://arxiv.org/abs/1808.02992v1 |
http://arxiv.org/pdf/1808.02992v1.pdf | |
PWC | https://paperswithcode.com/paper/controllable-image-to-video-translation-a |
Repo | |
Framework | |
Learning to Forecast and Refine Residual Motion for Image-to-Video Generation
Title | Learning to Forecast and Refine Residual Motion for Image-to-Video Generation |
Authors | Long Zhao, Xi Peng, Yu Tian, Mubbasir Kapadia, Dimitris Metaxas |
Abstract | We consider the problem of image-to-video translation, where an input image is translated into an output video containing motions of a single object. Recent methods for such problems typically train transformation networks to generate future frames conditioned on the structure sequence. Parallel work has shown that short high-quality motions can be generated by spatiotemporal generative networks that leverage temporal knowledge from the training data. We combine the benefits of both approaches and propose a two-stage generation framework where videos are generated from structures and then refined by temporal signals. To model motions more efficiently, we train networks to learn residual motion between the current and future frames, which avoids learning motion-irrelevant details. We conduct extensive experiments on two image-to-video translation tasks: facial expression retargeting and human pose forecasting. Superior results over the state-of-the-art methods on both tasks demonstrate the effectiveness of our approach. |
Tasks | Human Pose Forecasting, Video Generation |
Published | 2018-07-26 |
URL | http://arxiv.org/abs/1807.09951v1 |
http://arxiv.org/pdf/1807.09951v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-forecast-and-refine-residual |
Repo | |
Framework | |
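The "residual motion" idea means the network predicts only the change between the current frame and the next one rather than the full next frame. The minimal sketch below captures just that skip-connection structure; the paper additionally conditions on a structure (pose/landmark) sequence and refines the result with a temporal stream, neither of which appears here, and the layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hedged sketch: predict a residual and add it back to the input frame.
class ResidualMotionPredictor(nn.Module):
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1))

    def forward(self, frame):
        return frame + self.net(frame)     # next frame = current + predicted residual

model = ResidualMotionPredictor()
frame = torch.rand(2, 3, 64, 64)
print(model(frame).shape)                  # torch.Size([2, 3, 64, 64])
```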
Multi-task Prediction of Patient Workload
Title | Multi-task Prediction of Patient Workload |
Authors | Mohammad Hessam Olya, Dongxiao Zhu, Kai Yang |
Abstract | Developing reliable workload predictive models can affect many aspects of the clinical decision-making process. The primary challenge in healthcare systems is handling demand uncertainty over time. This issue becomes more critical for healthcare facilities that treat chronic diseases, because such treatment must continue over time. Although some researchers have recently explored methods for workload prediction, few have focused on forecasting a quantitative measure of healthcare providers' workload, and most of these studies consider workload prediction within a single facility. The drawback of previous studies is that the problem has not been investigated across multiple facilities, where the quality of the provided service, the equipment and resources used, and the diagnosis and treatment procedures may differ even for patients with similar conditions. To address this issue, this paper proposes a framework for patient workload prediction using patient data from VA facilities across the US. To capture the information of patients with similar attributes and make the prediction more accurate, a heuristic cluster-based algorithm for single-task learning as well as a multi-task learning approach are developed in this research. |
Tasks | Decision Making, Multi-Task Learning |
Published | 2018-12-27 |
URL | http://arxiv.org/abs/1901.00746v1 |
http://arxiv.org/pdf/1901.00746v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-task-prediction-of-patient-workload |
Repo | |
Framework | |
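A minimal sketch of the multi-task formulation described above: facilities share a trunk that captures common patient-attribute structure, while each facility keeps its own regression head for workload. Feature count, facility count, and layer sizes are illustrative assumptions, and the paper's cluster-based single-task variant is not shown.

```python
import torch
import torch.nn as nn

# Hedged sketch: shared trunk + one regression head per facility.
class MultiFacilityWorkload(nn.Module):
    def __init__(self, n_features=20, n_facilities=5, hidden=64):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        self.heads = nn.ModuleList([nn.Linear(hidden, 1) for _ in range(n_facilities)])

    def forward(self, x, facility):        # one facility per forward pass
        return self.heads[facility](self.shared(x)).squeeze(-1)

model = MultiFacilityWorkload()
x = torch.rand(16, 20)                      # 16 patients, 20 attributes each
print(model(x, facility=2).shape)           # torch.Size([16])
```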
Dual Policy Iteration
Title | Dual Policy Iteration |
Authors | Wen Sun, Geoffrey J. Gordon, Byron Boots, J. Andrew Bagnell |
Abstract | Recently, a novel class of Approximate Policy Iteration (API) algorithms has demonstrated impressive practical performance (e.g., ExIt from [2], AlphaGo-Zero from [27]). This new family of algorithms maintains, and alternately optimizes, two policies: a fast, reactive policy (e.g., a deep neural network) deployed at test time, and a slow, non-reactive policy (e.g., Tree Search), that can plan multiple steps ahead. The reactive policy is updated under supervision from the non-reactive policy, while the non-reactive policy is improved with guidance from the reactive policy. In this work we study this Dual Policy Iteration (DPI) strategy in an alternating optimization framework and provide a convergence analysis that extends existing API theory. We also develop a special instance of this framework which reduces the update of non-reactive policies to model-based optimal control using learned local models, and provides a theoretically sound way of unifying model-free and model-based RL approaches with unknown dynamics. We demonstrate the efficacy of our approach on various continuous control Markov Decision Processes. |
Tasks | Continuous Control |
Published | 2018-05-28 |
URL | http://arxiv.org/abs/1805.10755v2 |
http://arxiv.org/pdf/1805.10755v2.pdf | |
PWC | https://paperswithcode.com/paper/dual-policy-iteration |
Repo | |
Framework | |
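A toy tabular sketch of the alternating structure: the slow policy plans with a one-step lookahead under a given model, and the fast reactive policy is distilled toward it. The paper's instantiation uses model-based optimal control with learned local models and function approximation, which this sketch does not attempt.

```python
import numpy as np

# Hedged toy sketch of the DPI loop on a tiny random tabular MDP.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.random((n_states, n_actions))

V = np.zeros(n_states)
fast_counts = np.zeros((n_states, n_actions))   # reactive policy as action counts
for _ in range(50):
    Q = R + gamma * P @ V                       # slow policy: one-step lookahead
    slow = Q.argmax(axis=1)
    V = Q[np.arange(n_states), slow]
    fast_counts[np.arange(n_states), slow] += 1.0   # distil slow policy into fast one
print(slow, fast_counts.argmax(axis=1))             # the two policies agree at the end
```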