October 19, 2019

2860 words 14 mins read

Paper Group ANR 172

Artificial Retina Using A Hybrid Neural Network With Spatial Transform Capability

Title Artificial Retina Using A Hybrid Neural Network With Spatial Transform Capability
Authors Richard Wood, Alexander McGlashan, C. B. Moon, W. Y. Kim
Abstract This paper covers the design and programming of a hybrid (digital/analog) neural network that functions as an artificial retina with the ability to perform a spatial discrete cosine transform. We describe the structure of the circuit, which uses analog cells interlinked through a programmable digital array. The paper is broken into three main parts. First, we present the results of a Matlab simulation. Then we show the circuit simulation in Spice. This is followed by a demonstration of the practical device. The system's components are intentionally separated, with the specialty analog circuits kept apart from the readily available digital field-programmable gate array (FPGA) components. Further development includes the use of rapidly manufacturable organic electronics for the analog components. The planned uses for this platform include crowd development of software that uses the underlying pulse-based processing. The development package will include simulators in the form of Matlab- and Spice-type software platforms.
Tasks
Published 2018-11-26
URL http://arxiv.org/abs/1811.10126v1
PDF http://arxiv.org/pdf/1811.10126v1.pdf
PWC https://paperswithcode.com/paper/artificial-retina-using-a-hybrid-neural
Repo
Framework
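The spatial transform the circuit implements is a standard 2D discrete cosine transform. As a point of reference for what the analog cells approximate, here is a minimal digital implementation in Python; the paper itself works with Matlab/Spice simulations and pulse-based analog hardware, so none of this code is from the paper:

```python
# Reference implementation of the 2D spatial DCT that the hybrid retina
# circuit approximates in analog hardware. This is only the ideal digital
# transform (the kind of sanity check one might run in Matlab).
import numpy as np
from scipy.fftpack import dct, idct

def dct2(block: np.ndarray) -> np.ndarray:
    """Orthonormal 2D DCT-II, applied over rows then columns."""
    return dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

def idct2(coeffs: np.ndarray) -> np.ndarray:
    """Inverse 2D DCT (DCT-III), recovering the spatial block."""
    return idct(idct(coeffs, axis=0, norm="ortho"), axis=1, norm="ortho")

# A toy 8x8 "retina patch": the transform concentrates energy in the
# low-frequency (top-left) coefficients.
patch = np.random.rand(8, 8)
coeffs = dct2(patch)
assert np.allclose(idct2(coeffs), patch)  # the round trip is exact
```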

A Hybrid Variational Autoencoder for Collaborative Filtering

Title A Hybrid Variational Autoencoder for Collaborative Filtering
Authors Kilol Gupta, Mukund Yelahanka Raghuprasad, Pankhuri Kumar
Abstract In today's world, where almost every industry has an online presence and users interact in online marketplaces, personalized recommendations have become quite important. Traditionally, the problem of collaborative filtering has been tackled using Matrix Factorization, which is linear in nature. We extend the work of [11] on using variational autoencoders (VAEs) for collaborative filtering with implicit feedback by proposing a hybrid, multi-modal approach. Our approach combines movie embeddings (learned from a sibling VAE network) with user ratings from the MovieLens 20M dataset and applies the combined model to the task of movie recommendation. We empirically show how the VAE network is empowered by incorporating movie embeddings. We also visualize movie and user embeddings by clustering their latent representations obtained from a VAE.
Tasks
Published 2018-07-14
URL http://arxiv.org/abs/1808.01006v2
PDF http://arxiv.org/pdf/1808.01006v2.pdf
PWC https://paperswithcode.com/paper/a-hybrid-variational-autoencoder-for
Repo
Framework
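A minimal sketch (PyTorch) of the kind of architecture the abstract describes: a variational autoencoder over a user's implicit-feedback vector, augmented with a pooled movie-embedding input. The layer sizes, the concatenation-based fusion, and the multinomial likelihood (in the lineage of [11]) are assumptions for illustration, not the authors' exact model:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridVAE(nn.Module):
    def __init__(self, n_items=20000, emb_dim=128, latent=64):
        super().__init__()
        # The encoder sees the user's binary interaction vector plus a
        # pooled movie embedding (e.g. the mean embedding of seen movies).
        self.enc = nn.Linear(n_items + emb_dim, 600)
        self.mu, self.logvar = nn.Linear(600, latent), nn.Linear(600, latent)
        self.dec = nn.Sequential(nn.Linear(latent, 600), nn.Tanh(),
                                 nn.Linear(600, n_items))

    def forward(self, x, movie_emb):
        h = torch.tanh(self.enc(torch.cat([x, movie_emb], dim=-1)))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.dec(z), mu, logvar

def loss_fn(logits, x, mu, logvar, beta=0.2):
    # Multinomial log-likelihood over items plus the KL regularizer.
    nll = -(F.log_softmax(logits, dim=-1) * x).sum(-1).mean()
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
    return nll + beta * kl
```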

Sample-Efficient Policy Learning based on Completely Behavior Cloning

Title Sample-Efficient Policy Learning based on Completely Behavior Cloning
Authors Qiming Zou, Ling Wang, Ke Lu, Yu Li
Abstract Direct policy search is one of the most important algorithms in reinforcement learning. However, learning from scratch requires a large amount of experience data and is easily prone to poor local optima. In addition, a partially trained policy tends to take actions that are dangerous to the agent and the environment. To overcome these challenges, this paper proposes a policy initialization algorithm called Policy Learning based on Completely Behavior Cloning (PLCBC). PLCBC first transforms the Model Predictive Control (MPC) controller into a piecewise affine (PWA) function using multi-parametric programming, and uses a neural network to express this function. In this way, PLCBC can completely clone the MPC controller without any performance loss, and is totally training-free. The experiments show that this initialization strategy can help the agent learn in high-reward regions of the state space, and converge faster and better.
Tasks
Published 2018-11-09
URL http://arxiv.org/abs/1811.03853v1
PDF http://arxiv.org/pdf/1811.03853v1.pdf
PWC https://paperswithcode.com/paper/sample-efficient-policy-learning-based-on
Repo
Framework
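The central observation, that an explicit MPC law is piecewise affine and a ReLU network is also piecewise affine, can be illustrated in one dimension. The breakpoints and slopes below are invented; the step this sketch omits is deriving them from the MPC controller via multi-parametric programming:

```python
import numpy as np

# Target PWA control law: the slope changes at each breakpoint b_k.
breakpoints = np.array([-1.0, 0.5])      # hypothetical region boundaries
slopes      = np.array([2.0, -1.0, 0.5]) # slope inside each region
bias        = 0.3                        # intercept of the leftmost piece

def relu(x):
    return np.maximum(x, 0.0)

def pwa_as_relu_net(x):
    # y = bias + m_0*x + sum_k (m_k - m_{k-1}) * relu(x - b_k)
    y = bias + slopes[0] * x
    for k, b in enumerate(breakpoints):
        y += (slopes[k + 1] - slopes[k]) * relu(x - b)
    return y

# The "cloned network" reproduces the controller everywhere, not just on
# sampled states -- this is what makes the initialization lossless.
xs = np.linspace(-3, 3, 7)
print(pwa_as_relu_net(xs))
```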

Flow-Grounded Spatial-Temporal Video Prediction from Still Images

Title Flow-Grounded Spatial-Temporal Video Prediction from Still Images
Authors Yijun Li, Chen Fang, Jimei Yang, Zhaowen Wang, Xin Lu, Ming-Hsuan Yang
Abstract Existing video prediction methods mainly rely on observing multiple historical frames or focus on predicting only the next frame. In this work, we study the problem of generating multiple consecutive future frames from a single still image. We formulate the multi-frame prediction task as a multiple-time-step flow (multi-flow) prediction phase followed by a flow-to-frame synthesis phase. The multi-flow prediction is modeled in a variational probabilistic manner, with spatial-temporal relationships learned through 3D convolutions. The flow-to-frame synthesis is modeled as a generative process that keeps the predicted results close to the manifold of real video sequences. This two-phase design prevents the model from directly operating in the high-dimensional pixel space of the frame sequence and is demonstrated to be more effective in producing high-quality and diverse results. Extensive experimental results on videos with different types of motion show that the proposed algorithm performs favorably against existing methods in terms of quality, diversity, and human perceptual evaluation.
Tasks Video Prediction
Published 2018-07-25
URL http://arxiv.org/abs/1807.09755v2
PDF http://arxiv.org/pdf/1807.09755v2.pdf
PWC https://paperswithcode.com/paper/flow-grounded-spatial-temporal-video
Repo
Framework
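The flow-to-frame phase amounts to warping the still image by each predicted flow field. A minimal backward-warping sketch in PyTorch; the variational multi-flow predictor is omitted, and `flow` is just a placeholder tensor in pixel units:

```python
import torch
import torch.nn.functional as F

def warp(image, flow):
    """image: (N, C, H, W); flow: (N, 2, H, W) sampling offsets in pixels."""
    n, _, h, w = image.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float()          # (2, H, W)
    coords = base.unsqueeze(0) + flow                    # where to sample from
    # Normalize to [-1, 1] in the (N, H, W, 2) layout grid_sample expects.
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((coords_x, coords_y), dim=-1)
    return F.grid_sample(image, grid, align_corners=True)

frame0 = torch.rand(1, 3, 64, 64)
flow = torch.zeros(1, 2, 64, 64)   # zero flow: next frame == still image
assert torch.allclose(warp(frame0, flow), frame0, atol=1e-5)
```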

Video Prediction with Appearance and Motion Conditions

Title Video Prediction with Appearance and Motion Conditions
Authors Yunseok Jang, Gunhee Kim, Yale Song
Abstract Video prediction aims to generate realistic future frames by learning dynamic visual patterns. One fundamental challenge is to deal with future uncertainty: How should a model behave when there are multiple correct, equally probable futures? We propose an Appearance-Motion Conditional GAN to address this challenge. We provide appearance and motion information as conditions that specify what the future may look like, reducing the level of uncertainty. Our model consists of a generator, two discriminators taking charge of appearance and motion pathways, and a perceptual ranking module that encourages videos of similar conditions to look similar. To train our model, we develop a novel conditioning scheme that consists of different combinations of appearance and motion conditions. We evaluate our model using facial expression and human action datasets and report favorable results compared to existing methods.
Tasks Video Prediction
Published 2018-07-07
URL http://arxiv.org/abs/1807.02635v1
PDF http://arxiv.org/pdf/1807.02635v1.pdf
PWC https://paperswithcode.com/paper/video-prediction-with-appearance-and-motion
Repo
Framework
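A schematic of the two-pathway adversarial objective: one discriminator judges appearance on individual frames, the other judges motion on temporal differences, each conditioned on its respective input. The network bodies and shapes below are stand-ins, not the paper's architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDisc(nn.Module):
    """Stand-in conditional discriminator: scores flattened input + condition."""
    def __init__(self, in_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.LeakyReLU(0.2),
                                 nn.Linear(64, 1))
    def forward(self, x, cond):
        return self.net(torch.cat([x.flatten(1), cond.flatten(1)], dim=1))

N, T, C, H, W = 2, 4, 3, 16, 16
video = torch.rand(N, T, C, H, W)          # generator output (placeholder)
app_cond = video[:, 0]                     # appearance condition: first frame
mot_cond = torch.rand(N, 8)                # motion condition (e.g. a label code)

d_app = TinyDisc(2 * C * H * W)            # judges single frames
d_mot = TinyDisc((T - 1) * C * H * W + 8)  # judges temporal differences

real = lambda logits: F.binary_cross_entropy_with_logits(
    logits, torch.ones_like(logits))

# Generator objective: fool both pathways (last frame shown for brevity).
g_loss = real(d_app(video[:, -1], app_cond)) \
       + real(d_mot(video[:, 1:] - video[:, :-1], mot_cond))
```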

Robust 3D Human Motion Reconstruction Via Dynamic Template Construction

Title Robust 3D Human Motion Reconstruction Via Dynamic Template Construction
Authors Zhong Li, Yu Ji, Wei Yang, Jinwei Ye, Jingyi Yu
Abstract In multi-view human body capture systems, the recovered 3D geometry or even the acquired imagery data can be heavily corrupted due to occlusions, noise, limited field of view, etc. Direct estimation of 3D pose, body shape, or motion from these low-quality data has been traditionally challenging. In this paper, we present a graph-based non-rigid shape registration framework that can simultaneously recover 3D human body geometry and estimate pose/motion at high fidelity. Our approach first generates a global full-body template by registering all poses in the acquired motion sequence. We then construct a deformable graph by utilizing the rigid components in the global template. We directly warp the global template graph back to each motion frame in order to fill in missing geometry. Specifically, we combine local rigidity and temporal coherence constraints to maintain geometry and motion consistencies. Comprehensive experiments on various scenes show that our method is accurate and robust even in the presence of drastic motions.
Tasks
Published 2018-01-31
URL http://arxiv.org/abs/1801.10434v1
PDF http://arxiv.org/pdf/1801.10434v1.pdf
PWC https://paperswithcode.com/paper/robust-3d-human-motion-reconstruction-via
Repo
Framework
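The two regularizers the abstract names can be written down compactly for a toy deformation graph: a local rigidity energy (each node's rotation and translation should predict its neighbors' motion) and a temporal coherence energy (per-node motion should vary smoothly across frames). These are illustrative energy terms only; the paper optimizes them jointly with data-fitting terms:

```python
import numpy as np

def rigidity_energy(nodes, edges, R, t):
    """nodes: (N,3) graph node positions; R: (N,3,3); t: (N,3); edges: (i,j) pairs."""
    e = 0.0
    for i, j in edges:
        # Node i's local rigid motion should carry neighbor j along with it.
        predicted_j = R[i] @ (nodes[j] - nodes[i]) + nodes[i] + t[i]
        e += np.sum((predicted_j - (nodes[j] + t[j])) ** 2)
    return e

def temporal_energy(t_prev, t_curr):
    # Per-node translations should change smoothly between adjacent frames.
    return np.sum((t_curr - t_prev) ** 2)

nodes = np.random.rand(5, 3)
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
R = np.tile(np.eye(3), (5, 1, 1))   # identity rotations
t = np.zeros((5, 3))                # zero translations
assert rigidity_energy(nodes, edges, R, t) == 0.0  # the rest pose is rigid
```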

Uncorrelated Feature Encoding for Faster Image Style Transfer

Title Uncorrelated Feature Encoding for Faster Image Style Transfer
Authors Minseong Kim, Jongju Shin, Myung-Cheol Roh, Hyun-Chul Choi
Abstract Recent fast style transfer methods use a pre-trained convolutional neural network as a feature encoder and a perceptual loss network. Although the pre-trained network is used to generate receptive-field responses effective for representing the style and content of an image, it is optimized not for image style transfer but for image classification. Furthermore, it requires a time-consuming, correlation-aware feature alignment process for image style transfer because of its inter-channel correlation. In this paper, we propose an end-to-end learning method that optimizes an encoder/decoder network for the purpose of style transfer and relieves the feature alignment process from considering inter-channel correlation. We use an uncorrelation loss, i.e., the total correlation coefficient between the responses of different encoder channels, together with style and content losses for training the style transfer network. This trains the encoder network to generate inter-channel uncorrelated features optimized for the task of image style transfer, maintaining image style quality with only a lightweight, correlation-unaware feature alignment process. Moreover, our method drastically reduces redundant channels in the encoded feature, resulting in a more compact network and faster forward processing. Our method can also be applied to a cascade network scheme for style transfer at multiple scales, and allows user control of style strength through a content-style trade-off parameter.
Tasks Image Classification, Style Transfer
Published 2018-07-04
URL http://arxiv.org/abs/1807.01493v1
PDF http://arxiv.org/pdf/1807.01493v1.pdf
PWC https://paperswithcode.com/paper/uncorrelated-feature-encoding-for-faster
Repo
Framework
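A plausible form of the uncorrelation loss: penalize the off-diagonal entries of the inter-channel correlation matrix of the encoded feature map. The exact normalization in the paper may differ; this captures the stated idea that the encoder is pushed toward inter-channel uncorrelated features:

```python
import torch

def uncorrelation_loss(feat, eps=1e-8):
    """feat: (N, C, H, W) encoder output."""
    n, c, h, w = feat.shape
    x = feat.reshape(n, c, h * w)
    x = x - x.mean(dim=2, keepdim=True)              # center each channel
    cov = torch.bmm(x, x.transpose(1, 2)) / (h * w)  # (N, C, C) covariance
    std = torch.sqrt(torch.diagonal(cov, dim1=1, dim2=2) + eps)
    corr = cov / (std.unsqueeze(2) * std.unsqueeze(1))
    # Keep only off-diagonal correlations (diagonal is identically 1).
    off_diag = corr - torch.diag_embed(torch.diagonal(corr, dim1=1, dim2=2))
    return off_diag.abs().mean()

loss = uncorrelation_loss(torch.rand(2, 64, 32, 32))
print(loss)   # added to style and content losses during training
```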

MVOR: A Multi-view RGB-D Operating Room Dataset for 2D and 3D Human Pose Estimation

Title MVOR: A Multi-view RGB-D Operating Room Dataset for 2D and 3D Human Pose Estimation
Authors Vinkle Srivastav, Thibaut Issenhuth, Abdolrahim Kadkhodamohammadi, Michel de Mathelin, Afshin Gangi, Nicolas Padoy
Abstract Person detection and pose estimation are key requirements for developing intelligent context-aware assistance systems. To foster the development of human pose estimation methods and their applications in the Operating Room (OR), we release the Multi-View Operating Room (MVOR) dataset, the first public dataset recorded during real clinical interventions. It consists of 732 synchronized multi-view frames recorded by three RGB-D cameras in a hybrid OR, and includes the visual challenges present in such environments, such as occlusions and clutter. We provide camera calibration parameters, color and depth frames, human bounding boxes, and 2D/3D pose annotations. In this paper, we present the dataset, its annotations, and baseline results from several recent person detection and 2D/3D pose estimation methods. Since we need to blur some parts of the images to hide identity and nudity in the released dataset, we also present a comparative study of how the baselines are impacted by the blurring. Results show a large margin for improvement and suggest that the MVOR dataset can be useful for comparing the performance of different methods.
Tasks 3D Human Pose Estimation, 3D Pose Estimation, Calibration, Human Detection, Pose Estimation
Published 2018-08-24
URL https://arxiv.org/abs/1808.08180v2
PDF https://arxiv.org/pdf/1808.08180v2.pdf
PWC https://paperswithcode.com/paper/mvor-a-multi-view-rgb-d-operating-room
Repo
Framework
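Since the dataset ships camera calibration parameters alongside the 2D/3D annotations, a typical first step for a consumer is projecting 3D joints into a camera view for comparison against the 2D poses. A generic pinhole projection; the intrinsics below are invented values, not MVOR's:

```python
import numpy as np

K = np.array([[540.0, 0.0, 320.0],   # fx,  0, cx  (hypothetical intrinsics)
              [0.0, 540.0, 240.0],   #  0, fy, cy
              [0.0,   0.0,   1.0]])

def project(joints_3d_cam, K):
    """joints_3d_cam: (J, 3) points in the camera frame, depth > 0."""
    uvw = joints_3d_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]   # perspective divide -> pixel coords

joints = np.array([[0.1, -0.2, 2.0],
                   [0.0,  0.0, 2.5]])
print(project(joints, K))             # (J, 2) pixel coordinates
```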

Data-Driven Methods for Solving Algebra Word Problems

Title Data-Driven Methods for Solving Algebra Word Problems
Authors Benjamin Robaidek, Rik Koncel-Kedziorski, Hannaneh Hajishirzi
Abstract We explore contemporary, data-driven techniques for solving math word problems over recent large-scale datasets. We show that well-tuned neural equation classifiers can outperform more sophisticated models such as sequence-to-sequence and self-attention across these datasets. Our error analysis indicates that, while fully data-driven models show some promise, semantic and world knowledge is necessary for further advances.
Tasks
Published 2018-04-28
URL http://arxiv.org/abs/1804.10718v1
PDF http://arxiv.org/pdf/1804.10718v1.pdf
PWC https://paperswithcode.com/paper/data-driven-methods-for-solving-algebra-word
Repo
Framework
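The equation-classifier framing, mapping a word problem to one of a fixed set of equation templates, is easy to illustrate. A non-neural stand-in with the same structure; the templates and training pairs are toy examples, not drawn from the paper's datasets:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

problems = [
    "John has 3 apples and buys 4 more. How many apples now?",
    "A shirt costs 20 dollars after a 5 dollar discount. Original price?",
    "Sara had 10 pens and gave away 6. How many are left?",
    "A number increased by 7 equals 15. Find the number.",
]
# Each problem is labeled with an equation template, not a final answer;
# solving the template with the extracted numbers is a separate step.
templates = ["x = a + b", "x = a + b", "x = a - b", "x = b - a"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(problems, templates)
print(clf.predict(["Tom has 2 cars and gets 3 more. How many cars?"]))
```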

Neuromodulated Learning in Deep Neural Networks

Title Neuromodulated Learning in Deep Neural Networks
Authors Dennis G Wilson, Sylvain Cussat-Blanc, Hervé Luga, Kyle Harrington
Abstract In the brain, learning signals change over time and synaptic location, and are applied based on the learning history at the synapse, in the complex process of neuromodulation. Learning in artificial neural networks, on the other hand, is shaped by hyper-parameters set before learning starts, which remain static throughout learning, and which are uniform for the entire network. In this work, we propose a method of deep artificial neuromodulation which applies the concepts of biological neuromodulation to stochastic gradient descent. Evolved neuromodulatory dynamics modify learning parameters at each layer in a deep neural network over the course of the network’s training. We show that the same neuromodulatory dynamics can be applied to different models and can scale to new problems not encountered during evolution. Finally, we examine the evolved neuromodulation, showing that evolution found dynamic, location-specific learning strategies.
Tasks
Published 2018-12-05
URL http://arxiv.org/abs/1812.03365v1
PDF http://arxiv.org/pdf/1812.03365v1.pdf
PWC https://paperswithcode.com/paper/neuromodulated-learning-in-deep-neural
Repo
Framework
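The core mechanism, learning parameters that vary by layer and over the course of training rather than staying global and static, can be sketched as per-layer learning-rate modulation in plain SGD. The modulation schedule below is an arbitrary stand-in for the paper's evolved dynamics:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
layers = [model[0], model[2]]
# One optimizer parameter group per layer, so each gets its own signal.
opt = torch.optim.SGD([{"params": l.parameters(), "lr": 0.1} for l in layers])

def modulate(step, layer_idx):
    # Hypothetical neuromodulatory signal: layer- and time-dependent.
    return 0.1 * (0.99 ** step) * (1.0 + 0.5 * layer_idx)

for step in range(100):
    x, y = torch.randn(16, 10), torch.randn(16, 1)
    loss = nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    for i, group in enumerate(opt.param_groups):
        group["lr"] = modulate(step, i)   # per-layer, per-step learning rate
    opt.step()
```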

Cost-Sensitive Learning for Predictive Maintenance

Title Cost-Sensitive Learning for Predictive Maintenance
Authors Stephan Spiegel, Fabian Mueller, Dorothea Weismann, John Bird
Abstract In predictive maintenance, model performance is usually assessed by means of precision, recall, and F1-score. However, employing the model with the best performance, e.g. the highest F1-score, does not necessarily result in minimum maintenance cost, but can instead lead to additional expenses. Thus, we propose to perform model selection based on the economic costs associated with the particular maintenance application. We show that cost-sensitive learning for predictive maintenance can result in significant cost reductions and fault-tolerant policies, since it allows various business constraints and requirements to be incorporated.
Tasks Model Selection
Published 2018-09-28
URL http://arxiv.org/abs/1809.10979v1
PDF http://arxiv.org/pdf/1809.10979v1.pdf
PWC https://paperswithcode.com/paper/cost-sensitive-learning-for-predictive
Repo
Framework
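The proposed selection criterion in miniature: score models by expected maintenance cost under an application-specific cost matrix rather than by F1-score. The cost figures below are invented for illustration; in practice they come from the business case (inspection cost versus downtime cost):

```python
import numpy as np

# cost[(true, pred)]: cost of predicting `pred` when the truth is `true`.
COST = {("ok", "fault"):    50.0,   # unnecessary inspection
        ("fault", "ok"):  5000.0,   # missed failure -> unplanned downtime
        ("ok", "ok"):        0.0,
        ("fault", "fault"): 50.0}   # caught failure -> planned repair

def expected_cost(y_true, y_pred):
    return np.mean([COST[(t, p)] for t, p in zip(y_true, y_pred)])

y_true  = ["ok"] * 95 + ["fault"] * 5
model_a = ["ok"] * 100                    # high precision, misses all faults
model_b = ["ok"] * 85 + ["fault"] * 15    # noisier, but catches the faults
print(expected_cost(y_true, model_a))     # dominated by missed failures
print(expected_cost(y_true, model_b))     # far cheaper despite a worse F1
```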

A study on speech enhancement using exponent-only floating point quantized neural network (EOFP-QNN)

Title A study on speech enhancement using exponent-only floating point quantized neural network (EOFP-QNN)
Authors Yi-Te Hsu, Yu-Chen Lin, Szu-Wei Fu, Yu Tsao, Tei-Wei Kuo
Abstract Numerous studies have investigated the effectiveness of neural network quantization on pattern classification tasks. The present study, for the first time, investigated the performance of speech enhancement (a regression task in speech processing) using a novel exponent-only floating-point quantized neural network (EOFP-QNN). The proposed EOFP-QNN consists of two stages: mantissa-quantization and exponent-quantization. In the mantissa-quantization stage, EOFP-QNN learns how to quantize the mantissa bits of the model parameters while preserving the regression accuracy using the least mantissa precision. In the exponent-quantization stage, the exponent part of the parameters is further quantized without causing any additional performance degradation. We evaluated the proposed EOFP quantization technique on two types of neural networks, namely, bidirectional long short-term memory (BLSTM) and fully convolutional neural network (FCN), on a speech enhancement task. Experimental results showed that the model sizes can be significantly reduced (the model sizes of the quantized BLSTM and FCN models were only 18.75% and 21.89%, respectively, compared to those of the original models) while maintaining satisfactory speech-enhancement performance.
Tasks Quantization, Speech Enhancement
Published 2018-08-17
URL http://arxiv.org/abs/1808.06474v4
PDF http://arxiv.org/pdf/1808.06474v4.pdf
PWC https://paperswithcode.com/paper/a-study-on-speech-enhancement-using-exponent
Repo
Framework
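The mantissa-quantization stage can be illustrated directly on IEEE-754 bit patterns: keep only the top k mantissa bits of each float32 parameter, leaving sign and exponent untouched (the paper's second stage then quantizes the exponent separately). A sketch, not the authors' code:

```python
import numpy as np

def quantize_mantissa(x: np.ndarray, keep_bits: int) -> np.ndarray:
    """Truncate the 23-bit float32 mantissa to `keep_bits` bits."""
    assert x.dtype == np.float32 and 0 <= keep_bits <= 23
    bits = x.view(np.uint32)
    # Mask clears the low (23 - keep_bits) mantissa bits; sign and
    # exponent bits (the top 9) are always preserved.
    mask = np.uint32((0xFFFFFFFF << (23 - keep_bits)) & 0xFFFFFFFF)
    return (bits & mask).view(np.float32)

w = np.random.randn(5).astype(np.float32)
print(w)
print(quantize_mantissa(w, 3))   # exponent-only would be keep_bits = 0
```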

Relative Importance Sampling For Off-Policy Actor-Critic in Deep Reinforcement Learning

Title Relative Importance Sampling For Off-Policy Actor-Critic in Deep Reinforcement Learning
Authors Mahammad Humayoo, Xueqi Cheng
Abstract Off-policy learning is less stable than on-policy learning in reinforcement learning (RL). One reason for this instability is the discrepancy between the target ($\pi$) and behavior ($b$) policy distributions. This discrepancy can be alleviated by employing a smooth variant of importance sampling (IS), such as relative importance sampling (RIS). RIS has a parameter $\beta\in[0, 1]$ which controls smoothness. To cope with instability, we present the first relative importance sampling off-policy actor-critic (RIS-Off-PAC) model-free algorithms in RL. In our method, the network yields a target policy (the actor) and a value function (the critic) assessing the current policy ($\pi$) using samples drawn from the behavior policy. We use the action value generated by the behavior policy, rather than the target policy, in the reward function to train our algorithm. We also use deep neural networks to train both actor and critic. We evaluated our algorithm on a number of OpenAI Gym benchmark problems and demonstrate performance better than or comparable to several state-of-the-art RL baselines.
Tasks
Published 2018-10-30
URL https://arxiv.org/abs/1810.12558v6
PDF https://arxiv.org/pdf/1810.12558v6.pdf
PWC https://paperswithcode.com/paper/relative-importance-sampling-for-off-policy
Repo
Framework
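The abstract does not spell out the RIS formula, so the sketch below assumes the standard form from relative density-ratio estimation, which matches the stated behavior of $\beta\in[0, 1]$: $\beta=1$ recovers the ordinary importance weight $\pi/b$, while smaller $\beta$ flattens the weights toward 1 (and bounds them by $1/(1-\beta)$):

```python
import numpy as np

def ris_weight(pi_a, b_a, beta):
    """Assumed relative-IS weight: pi / (beta * b + (1 - beta) * pi)."""
    return pi_a / (beta * b_a + (1.0 - beta) * pi_a)

pi = np.array([0.5, 0.1, 0.4])   # target policy probabilities
b  = np.array([0.2, 0.6, 0.2])   # behavior policy probabilities
for beta in (1.0, 0.5, 0.1):
    # beta = 1.0 prints the ordinary IS ratios pi/b; smaller beta
    # smooths the extreme weights, trading variance for bias.
    print(beta, ris_weight(pi, b, beta))
```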

Learning representations of molecules and materials with atomistic neural networks

Title Learning representations of molecules and materials with atomistic neural networks
Authors Kristof T. Schütt, Alexandre Tkatchenko, Klaus-Robert Müller
Abstract Deep learning has been shown to learn efficient representations for structured data such as images, text, or audio. In this chapter, we present neural network architectures that are able to learn efficient representations of molecules and materials. In particular, the continuous-filter convolutional network SchNet accurately predicts chemical properties across compositional and configurational space on a variety of datasets. Beyond that, we analyze the obtained representations to find evidence that their spatial and chemical properties agree with chemical intuition.
Tasks
Published 2018-12-11
URL http://arxiv.org/abs/1812.04690v1
PDF http://arxiv.org/pdf/1812.04690v1.pdf
PWC https://paperswithcode.com/paper/learning-representations-of-molecules-and
Repo
Framework
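The building block the chapter centers on is the continuous-filter convolution, in which the filter values are generated from interatomic distances by a small network instead of being indexed on a grid. A bare sketch with SchNet-style choices (Gaussian distance expansion, softplus nonlinearity); the dimensions are assumptions, and the authors' SchNetPack library is the reference implementation:

```python
import torch
import torch.nn as nn

class CFConv(nn.Module):
    def __init__(self, n_features=64, n_rbf=20, cutoff=5.0):
        super().__init__()
        # Gaussian radial-basis centers spanning [0, cutoff].
        self.centers = nn.Parameter(torch.linspace(0, cutoff, n_rbf),
                                    requires_grad=False)
        # Filter-generating network: distances -> per-feature filter values.
        self.filter_net = nn.Sequential(nn.Linear(n_rbf, n_features),
                                        nn.Softplus(),
                                        nn.Linear(n_features, n_features))

    def forward(self, h, pos):
        # h: (N, F) atom features; pos: (N, 3) atom positions.
        d = torch.cdist(pos, pos)                           # (N, N) distances
        rbf = torch.exp(-10.0 * (d.unsqueeze(-1) - self.centers) ** 2)
        W = self.filter_net(rbf)                            # (N, N, F) filters
        return (h.unsqueeze(0) * W).sum(dim=1)              # aggregate neighbors

layer = CFConv()
h, pos = torch.rand(6, 64), torch.rand(6, 3)
print(layer(h, pos).shape)   # torch.Size([6, 64])
```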

Expert Finding in Heterogeneous Bibliographic Networks with Locally-trained Embeddings

Title Expert Finding in Heterogeneous Bibliographic Networks with Locally-trained Embeddings
Authors Huan Gui, Qi Zhu, Liyuan Liu, Aston Zhang, Jiawei Han
Abstract Expert finding is an important task in both industry and academia. It is challenging to rank candidates with the appropriate expertise for various queries. In addition, different types of objects interact with one another, naturally forming heterogeneous information networks. We study the task of expert finding in heterogeneous bibliographic networks based on two aspects: textual content analysis and authority ranking. For the textual content analysis, we propose a new method for query expansion via locally-trained embedding learning with a concept hierarchy as guidance, particularly tailored to specific queries with narrow semantic meanings. Compared with global embedding learning, locally-trained embedding learning projects the terms into a latent semantic space constrained to relevant topics, and therefore preserves more precise and subtle information for specific queries. For candidate ranking, the heterogeneous information network structure, largely ignored in previous studies of expert finding, provides additional information: different types of interactions among objects play different roles. We propose a ranking algorithm to estimate the authority of objects in the network, treating each strongly-typed edge type individually. To demonstrate the effectiveness of the proposed framework, we apply the proposed method to a large-scale bibliographic dataset with over two million entries and one million candidate researchers. The experimental results show that the proposed framework outperforms existing methods for both general and specific queries.
Tasks
Published 2018-03-09
URL http://arxiv.org/abs/1803.03370v1
PDF http://arxiv.org/pdf/1803.03370v1.pdf
PWC https://paperswithcode.com/paper/expert-finding-in-heterogeneous-bibliographic
Repo
Framework
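The query-expansion idea in miniature: expand a narrow query with its nearest neighbors in an embedding space. In the paper the embeddings are locally trained on documents relevant to the query and the expansion is guided by a concept hierarchy; the toy vectors below merely stand in for such locally-trained embeddings:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

terms = ["query expansion", "pseudo relevance feedback", "embedding",
         "authority ranking", "metapath"]
vecs = np.random.rand(len(terms), 16)     # placeholder local embeddings

nn_index = NearestNeighbors(n_neighbors=3).fit(vecs)
_, idx = nn_index.kneighbors(vecs[[0]])   # expand the first term
print([terms[i] for i in idx[0]])         # the query term plus 2 expansions
```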