October 19, 2019

2860 words 14 mins read

Paper Group ANR 172

Artificial Retina Using A Hybrid Neural Network With Spatial Transform Capability

Title Artificial Retina Using A Hybrid Neural Network With Spatial Transform Capability
Authors Richard Wood, Alexander McGlashan, C. B. Moon, W. Y. Kim
Abstract This paper covers the design and programming of a hybrid (digital/analog) neural network that functions as an artificial retina with the ability to perform a spatial discrete cosine transform. We describe the structure of the circuit, which uses analog cells interlinked through a programmable digital array. The paper is broken into three main parts. First, we present the results of a Matlab simulation. Then we show the circuit simulation in Spice. This is followed by a demonstration of the practical device. The system's components are intentionally separated, with the specialty analog circuits kept apart from the readily available digital field-programmable gate array (FPGA) components. Further development includes the use of rapidly manufacturable organic electronics for the analog components. The planned uses for this platform include crowd development of software that uses the underlying pulse-based processing. The development package will include simulators in the form of Matlab- and Spice-type software platforms.
Tasks
Published 2018-11-26
URL http://arxiv.org/abs/1811.10126v1
PDF http://arxiv.org/pdf/1811.10126v1.pdf
PWC https://paperswithcode.com/paper/artificial-retina-using-a-hybrid-neural
Repo
Framework
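The spatial transform the circuit implements is a standard 2D discrete cosine transform. As a point of reference for what the analog cells approximate, here is a minimal digital implementation in Python; the paper itself works with Matlab/Spice simulations and pulse-based analog hardware, so none of this code is from the paper:

```python
# Reference implementation of the 2D spatial DCT that the hybrid retina
# circuit approximates in analog hardware. This is only the ideal digital
# transform (the kind of sanity check one might run in Matlab).
import numpy as np
from scipy.fftpack import dct, idct

def dct2(block: np.ndarray) -> np.ndarray:
    """Orthonormal 2D DCT-II, applied over rows then columns."""
    return dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

def idct2(coeffs: np.ndarray) -> np.ndarray:
    """Inverse 2D DCT (DCT-III), recovering the spatial block."""
    return idct(idct(coeffs, axis=0, norm="ortho"), axis=1, norm="ortho")

# A toy 8x8 "retina patch": the transform concentrates energy in the
# low-frequency (top-left) coefficients.
patch = np.random.rand(8, 8)
coeffs = dct2(patch)
assert np.allclose(idct2(coeffs), patch)  # the round trip is exact
```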

A Hybrid Variational Autoencoder for Collaborative Filtering

Title A Hybrid Variational Autoencoder for Collaborative Filtering
Authors Kilol Gupta, Mukund Yelahanka Raghuprasad, Pankhuri Kumar
Abstract In today's world, where almost every industry has an online presence and users interact in online marketplaces, personalized recommendations have become quite important. Traditionally, the problem of collaborative filtering has been tackled using Matrix Factorization, which is linear in nature. We extend the work of [11] on using variational autoencoders (VAEs) for collaborative filtering with implicit feedback by proposing a hybrid, multi-modal approach. Our approach combines movie embeddings (learned from a sibling VAE network) with user ratings from the MovieLens 20M dataset and applies the combined model to the task of movie recommendation. We empirically show how the VAE network is empowered by incorporating movie embeddings. We also visualize movie and user embeddings by clustering their latent representations obtained from a VAE.
Tasks
Published 2018-07-14
URL http://arxiv.org/abs/1808.01006v2
PDF http://arxiv.org/pdf/1808.01006v2.pdf
PWC https://paperswithcode.com/paper/a-hybrid-variational-autoencoder-for
Repo
Framework
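A minimal sketch (PyTorch) of the kind of architecture the abstract describes: a variational autoencoder over a user's implicit-feedback vector, augmented with a pooled movie-embedding input. The layer sizes, the concatenation-based fusion, and the multinomial likelihood (in the lineage of [11]) are assumptions for illustration, not the authors' exact model:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridVAE(nn.Module):
    def __init__(self, n_items=20000, emb_dim=128, latent=64):
        super().__init__()
        # The encoder sees the user's binary interaction vector plus a
        # pooled movie embedding (e.g. the mean embedding of seen movies).
        self.enc = nn.Linear(n_items + emb_dim, 600)
        self.mu, self.logvar = nn.Linear(600, latent), nn.Linear(600, latent)
        self.dec = nn.Sequential(nn.Linear(latent, 600), nn.Tanh(),
                                 nn.Linear(600, n_items))

    def forward(self, x, movie_emb):
        h = torch.tanh(self.enc(torch.cat([x, movie_emb], dim=-1)))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.dec(z), mu, logvar

def loss_fn(logits, x, mu, logvar, beta=0.2):
    # Multinomial log-likelihood over items plus the KL regularizer.
    nll = -(F.log_softmax(logits, dim=-1) * x).sum(-1).mean()
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
    return nll + beta * kl
```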

Sample-Efficient Policy Learning based on Completely Behavior Cloning

Title Sample-Efficient Policy Learning based on Completely Behavior Cloning
Authors Qiming Zou, Ling Wang, Ke Lu, Yu Li
Abstract Direct policy search is one of the most important algorithms in reinforcement learning. However, learning from scratch requires a large amount of experience data and is easily prone to poor local optima. In addition, a partially trained policy tends to take actions that are dangerous to the agent and the environment. To overcome these challenges, this paper proposes a policy initialization algorithm called Policy Learning based on Completely Behavior Cloning (PLCBC). PLCBC first transforms the Model Predictive Control (MPC) controller into a piecewise affine (PWA) function using multi-parametric programming, and uses a neural network to express this function. In this way, PLCBC can completely clone the MPC controller without any performance loss, and is totally training-free. The experiments show that this initialization strategy can help the agent learn in high-reward regions of the state space, and converge faster and better.
Tasks
Published 2018-11-09
URL http://arxiv.org/abs/1811.03853v1
PDF http://arxiv.org/pdf/1811.03853v1.pdf
PWC https://paperswithcode.com/paper/sample-efficient-policy-learning-based-on
Repo
Framework
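The central observation, that an explicit MPC law is piecewise affine and a ReLU network is also piecewise affine, can be illustrated in one dimension. The breakpoints and slopes below are invented; the step this sketch omits is deriving them from the MPC controller via multi-parametric programming:

```python
import numpy as np

# Target PWA control law: the slope changes at each breakpoint b_k.
breakpoints = np.array([-1.0, 0.5])      # hypothetical region boundaries
slopes      = np.array([2.0, -1.0, 0.5]) # slope inside each region
bias        = 0.3                        # intercept of the leftmost piece

def relu(x):
    return np.maximum(x, 0.0)

def pwa_as_relu_net(x):
    # y = bias + m_0*x + sum_k (m_k - m_{k-1}) * relu(x - b_k)
    y = bias + slopes[0] * x
    for k, b in enumerate(breakpoints):
        y += (slopes[k + 1] - slopes[k]) * relu(x - b)
    return y

# The "cloned network" reproduces the controller everywhere, not just on
# sampled states -- this is what makes the initialization lossless.
xs = np.linspace(-3, 3, 7)
print(pwa_as_relu_net(xs))
```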

Flow-Grounded Spatial-Temporal Video Prediction from Still Images

Title Flow-Grounded Spatial-Temporal Video Prediction from Still Images
Authors Yijun Li, Chen Fang, Jimei Yang, Zhaowen Wang, Xin Lu, Ming-Hsuan Yang
Abstract Existing video prediction methods mainly rely on observing multiple historical frames or focus on predicting only the next frame. In this work, we study the problem of generating multiple consecutive future frames from a single still image. We formulate the multi-frame prediction task as a multiple-time-step flow (multi-flow) prediction phase followed by a flow-to-frame synthesis phase. The multi-flow prediction is modeled in a variational probabilistic manner, with spatial-temporal relationships learned through 3D convolutions. The flow-to-frame synthesis is modeled as a generative process that keeps the predicted results close to the manifold of real video sequences. This two-phase design prevents the model from directly operating in the high-dimensional pixel space of the frame sequence and is demonstrated to be more effective in producing high-quality and diverse results. Extensive experimental results on videos with different types of motion show that the proposed algorithm performs favorably against existing methods in terms of quality, diversity, and human perceptual evaluation.
Tasks Video Prediction
Published 2018-07-25
URL http://arxiv.org/abs/1807.09755v2
PDF http://arxiv.org/pdf/1807.09755v2.pdf
PWC https://paperswithcode.com/paper/flow-grounded-spatial-temporal-video
Repo
Framework
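The flow-to-frame phase amounts to warping the still image by each predicted flow field. A minimal backward-warping sketch in PyTorch; the variational multi-flow predictor is omitted, and `flow` is just a placeholder tensor in pixel units:

```python
import torch
import torch.nn.functional as F

def warp(image, flow):
    """image: (N, C, H, W); flow: (N, 2, H, W) sampling offsets in pixels."""
    n, _, h, w = image.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float()          # (2, H, W)
    coords = base.unsqueeze(0) + flow                    # where to sample from
    # Normalize to [-1, 1] in the (N, H, W, 2) layout grid_sample expects.
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((coords_x, coords_y), dim=-1)
    return F.grid_sample(image, grid, align_corners=True)

frame0 = torch.rand(1, 3, 64, 64)
flow = torch.zeros(1, 2, 64, 64)   # zero flow: next frame == still image
assert torch.allclose(warp(frame0, flow), frame0, atol=1e-5)
```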

Video Prediction with Appearance and Motion Conditions

Title Video Prediction with Appearance and Motion Conditions
Authors Yunseok Jang, Gunhee Kim, Yale Song
Abstract Video prediction aims to generate realistic future frames by learning dynamic visual patterns. One fundamental challenge is to deal with future uncertainty: How should a model behave when there are multiple correct, equally probable futures? We propose an Appearance-Motion Conditional GAN to address this challenge. We provide appearance and motion information as conditions that specify what the future may look like, reducing the level of uncertainty. Our model consists of a generator, two discriminators taking charge of appearance and motion pathways, and a perceptual ranking module that encourages videos of similar conditions to look similar. To train our model, we develop a novel conditioning scheme that consists of different combinations of appearance and motion conditions. We evaluate our model using facial expression and human action datasets and report favorable results compared to existing methods.
Tasks Video Prediction
Published 2018-07-07
URL http://arxiv.org/abs/1807.02635v1
PDF http://arxiv.org/pdf/1807.02635v1.pdf
PWC https://paperswithcode.com/paper/video-prediction-with-appearance-and-motion
Repo
Framework
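A schematic of the two-pathway adversarial objective: one discriminator judges appearance on individual frames, the other judges motion on temporal differences, each conditioned on its respective input. The network bodies and shapes below are stand-ins, not the paper's architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDisc(nn.Module):
    """Stand-in conditional discriminator: scores flattened input + condition."""
    def __init__(self, in_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.LeakyReLU(0.2),
                                 nn.Linear(64, 1))
    def forward(self, x, cond):
        return self.net(torch.cat([x.flatten(1), cond.flatten(1)], dim=1))

N, T, C, H, W = 2, 4, 3, 16, 16
video = torch.rand(N, T, C, H, W)          # generator output (placeholder)
app_cond = video[:, 0]                     # appearance condition: first frame
mot_cond = torch.rand(N, 8)                # motion condition (e.g. a label code)

d_app = TinyDisc(2 * C * H * W)            # judges single frames
d_mot = TinyDisc((T - 1) * C * H * W + 8)  # judges temporal differences

real = lambda logits: F.binary_cross_entropy_with_logits(
    logits, torch.ones_like(logits))

# Generator objective: fool both pathways (last frame shown for brevity).
g_loss = real(d_app(video[:, -1], app_cond)) \
       + real(d_mot(video[:, 1:] - video[:, :-1], mot_cond))
```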

Robust 3D Human Motion Reconstruction Via Dynamic Template Construction

Title Robust 3D Human Motion Reconstruction Via Dynamic Template Construction
Authors Zhong Li, Yu Ji, Wei Yang, Jinwei Ye, Jingyi Yu
Abstract In multi-view human body capture systems, the recovered 3D geometry or even the acquired imagery data can be heavily corrupted due to occlusions, noise, limited field of view, etc. Direct estimation of 3D pose, body shape, or motion from these low-quality data has been traditionally challenging. In this paper, we present a graph-based non-rigid shape registration framework that can simultaneously recover 3D human body geometry and estimate pose/motion at high fidelity. Our approach first generates a global full-body template by registering all poses in the acquired motion sequence. We then construct a deformable graph by utilizing the rigid components in the global template. We directly warp the global template graph back to each motion frame in order to fill in missing geometry. Specifically, we combine local rigidity and temporal coherence constraints to maintain geometry and motion consistencies. Comprehensive experiments on various scenes show that our method is accurate and robust even in the presence of drastic motions.
Tasks
Published 2018-01-31
URL http://arxiv.org/abs/1801.10434v1
PDF http://arxiv.org/pdf/1801.10434v1.pdf
PWC https://paperswithcode.com/paper/robust-3d-human-motion-reconstruction-via
Repo
Framework
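The two regularizers the abstract names can be written down compactly for a toy deformation graph: a local rigidity energy (each node's rotation and translation should predict its neighbors' motion) and a temporal coherence energy (per-node motion should vary smoothly across frames). These are illustrative energy terms only; the paper optimizes them jointly with data-fitting terms:

```python
import numpy as np

def rigidity_energy(nodes, edges, R, t):
    """nodes: (N,3) graph node positions; R: (N,3,3); t: (N,3); edges: (i,j) pairs."""
    e = 0.0
    for i, j in edges:
        # Node i's local rigid motion should carry neighbor j along with it.
        predicted_j = R[i] @ (nodes[j] - nodes[i]) + nodes[i] + t[i]
        e += np.sum((predicted_j - (nodes[j] + t[j])) ** 2)
    return e

def temporal_energy(t_prev, t_curr):
    # Per-node translations should change smoothly between adjacent frames.
    return np.sum((t_curr - t_prev) ** 2)

nodes = np.random.rand(5, 3)
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
R = np.tile(np.eye(3), (5, 1, 1))   # identity rotations
t = np.zeros((5, 3))                # zero translations
assert rigidity_energy(nodes, edges, R, t) == 0.0  # the rest pose is rigid
```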

Uncorrelated Feature Encoding for Faster Image Style Transfer

Title Uncorrelated Feature Encoding for Faster Image Style Transfer
Authors Minseong Kim, Jongju Shin, Myung-Cheol Roh, Hyun-Chul Choi
Abstract Recent fast style transfer methods use a pre-trained convolutional neural network as a feature encoder and a perceptual loss network. Although the pre-trained network is used to generate receptive-field responses effective for representing the style and content of an image, it is optimized not for image style transfer but for image classification. Furthermore, it requires a time-consuming, correlation-aware feature alignment process for image style transfer because of its inter-channel correlation. In this paper, we propose an end-to-end learning method that optimizes an encoder/decoder network for the purpose of style transfer and relieves the feature alignment process from considering inter-channel correlation. We use an uncorrelation loss, i.e., the total correlation coefficient between the responses of different encoder channels, together with style and content losses for training the style transfer network. This trains the encoder network to generate inter-channel uncorrelated features optimized for the task of image style transfer, maintaining image style quality with only a lightweight, correlation-unaware feature alignment process. Moreover, our method drastically reduces redundant channels in the encoded feature, resulting in a more compact network and faster forward processing. Our method can also be applied to a cascade network scheme for style transfer at multiple scales, and allows user control of style strength through a content-style trade-off parameter.
Tasks Image Classification, Style Transfer
Published 2018-07-04
URL http://arxiv.org/abs/1807.01493v1
PDF http://arxiv.org/pdf/1807.01493v1.pdf
PWC https://paperswithcode.com/paper/uncorrelated-feature-encoding-for-faster
Repo
Framework
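A plausible form of the uncorrelation loss: penalize the off-diagonal entries of the inter-channel correlation matrix of the encoded feature map. The exact normalization in the paper may differ; this captures the stated idea that the encoder is pushed toward inter-channel uncorrelated features:

```python
import torch

def uncorrelation_loss(feat, eps=1e-8):
    """feat: (N, C, H, W) encoder output."""
    n, c, h, w = feat.shape
    x = feat.reshape(n, c, h * w)
    x = x - x.mean(dim=2, keepdim=True)              # center each channel
    cov = torch.bmm(x, x.transpose(1, 2)) / (h * w)  # (N, C, C) covariance
    std = torch.sqrt(torch.diagonal(cov, dim1=1, dim2=2) + eps)
    corr = cov / (std.unsqueeze(2) * std.unsqueeze(1))
    # Keep only off-diagonal correlations (diagonal is identically 1).
    off_diag = corr - torch.diag_embed(torch.diagonal(corr, dim1=1, dim2=2))
    return off_diag.abs().mean()

loss = uncorrelation_loss(torch.rand(2, 64, 32, 32))
print(loss)   # added to style and content losses during training
```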

MVOR: A Multi-view RGB-D Operating Room Dataset for 2D and 3D Human Pose Estimation

Title MVOR: A Multi-view RGB-D Operating Room Dataset for 2D and 3D Human Pose Estimation
Authors Vinkle Srivastav, Thibaut Issenhuth, Abdolrahim Kadkhodamohammadi, Michel de Mathelin, Afshin Gangi, Nicolas Padoy
Abstract Person detection and pose estimation are key requirements for developing intelligent context-aware assistance systems. To foster the development of human pose estimation methods and their applications in the Operating Room (OR), we release the Multi-View Operating Room (MVOR) dataset, the first public dataset recorded during real clinical interventions. It consists of 732 synchronized multi-view frames recorded by three RGB-D cameras in a hybrid OR, and includes the visual challenges present in such environments, such as occlusions and clutter. We provide camera calibration parameters, color and depth frames, human bounding boxes, and 2D/3D pose annotations. In this paper, we present the dataset, its annotations, and baseline results from several recent person detection and 2D/3D pose estimation methods. Since we need to blur some parts of the images to hide identity and nudity in the released dataset, we also present a comparative study of how the baselines are impacted by the blurring. Results show a large margin for improvement and suggest that the MVOR dataset can be useful for comparing the performance of different methods.
Tasks 3D Human Pose Estimation, 3D Pose Estimation, Calibration, Human Detection, Pose Estimation
Published 2018-08-24
URL https://arxiv.org/abs/1808.08180v2
PDF https://arxiv.org/pdf/1808.08180v2.pdf
PWC https://paperswithcode.com/paper/mvor-a-multi-view-rgb-d-operating-room
Repo
Framework
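Since the dataset ships camera calibration parameters alongside the 2D/3D annotations, a typical first step for a consumer is projecting 3D joints into a camera view for comparison against the 2D poses. A generic pinhole projection; the intrinsics below are invented values, not MVOR's:

```python
import numpy as np

K = np.array([[540.0, 0.0, 320.0],   # fx,  0, cx  (hypothetical intrinsics)
              [0.0, 540.0, 240.0],   #  0, fy, cy
              [0.0,   0.0,   1.0]])

def project(joints_3d_cam, K):
    """joints_3d_cam: (J, 3) points in the camera frame, depth > 0."""
    uvw = joints_3d_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]   # perspective divide -> pixel coords

joints = np.array([[0.1, -0.2, 2.0],
                   [0.0,  0.0, 2.5]])
print(project(joints, K))             # (J, 2) pixel coordinates
```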

Data-Driven Methods for Solving Algebra Word Problems

Title Data-Driven Methods for Solving Algebra Word Problems
Authors Benjamin Robaidek, Rik Koncel-Kedziorski, Hannaneh Hajishirzi
Abstract We explore contemporary, data-driven techniques for solving math word problems over recent large-scale datasets. We show that well-tuned neural equation classifiers can outperform more sophisticated models such as sequence-to-sequence and self-attention across these datasets. Our error analysis indicates that, while fully data-driven models show some promise, semantic and world knowledge is necessary for further advances.
Tasks
Published 2018-04-28
URL http://arxiv.org/abs/1804.10718v1
PDF http://arxiv.org/pdf/1804.10718v1.pdf
PWC https://paperswithcode.com/paper/data-driven-methods-for-solving-algebra-word
Repo
Framework
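The equation-classifier framing, mapping a word problem to one of a fixed set of equation templates, is easy to illustrate. A non-neural stand-in with the same structure; the templates and training pairs are toy examples, not drawn from the paper's datasets:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

problems = [
    "John has 3 apples and buys 4 more. How many apples now?",
    "A shirt costs 20 dollars after a 5 dollar discount. Original price?",
    "Sara had 10 pens and gave away 6. How many are left?",
    "A number increased by 7 equals 15. Find the number.",
]
# Each problem is labeled with an equation template, not a final answer;
# solving the template with the extracted numbers is a separate step.
templates = ["x = a + b", "x = a + b", "x = a - b", "x = b - a"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(problems, templates)
print(clf.predict(["Tom has 2 cars and gets 3 more. How many cars?"]))
```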

Neuromodulated Learning in Deep Neural Networks

Title Neuromodulated Learning in Deep Neural Networks
Authors Dennis G Wilson, Sylvain Cussat-Blanc, Hervé Luga, Kyle Harrington
Abstract In the brain, learning signals change over time and synaptic location, and are applied based on the learning history at the synapse, in the complex process of neuromodulation. Learning in artificial neural networks, on the other hand, is shaped by hyper-parameters set before learning starts, which remain static throughout learning, and which are uniform for the entire network. In this work, we propose a method of deep artificial neuromodulation which applies the concepts of biological neuromodulation to stochastic gradient descent. Evolved neuromodulatory dynamics modify learning parameters at each layer in a deep neural network over the course of the network’s training. We show that the same neuromodulatory dynamics can be applied to different models and can scale to new problems not encountered during evolution. Finally, we examine the evolved neuromodulation, showing that evolution found dynamic, location-specific learning strategies.
Tasks
Published 2018-12-05
URL http://arxiv.org/abs/1812.03365v1
PDF http://arxiv.org/pdf/1812.03365v1.pdf
PWC https://paperswithcode.com/paper/neuromodulated-learning-in-deep-neural
Repo
Framework
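The core mechanism, learning parameters that vary by layer and over the course of training rather than staying global and static, can be sketched as per-layer learning-rate modulation in plain SGD. The modulation schedule below is an arbitrary stand-in for the paper's evolved dynamics:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
layers = [model[0], model[2]]
# One optimizer parameter group per layer, so each gets its own signal.
opt = torch.optim.SGD([{"params": l.parameters(), "lr": 0.1} for l in layers])

def modulate(step, layer_idx):
    # Hypothetical neuromodulatory signal: layer- and time-dependent.
    return 0.1 * (0.99 ** step) * (1.0 + 0.5 * layer_idx)

for step in range(100):
    x, y = torch.randn(16, 10), torch.randn(16, 1)
    loss = nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    for i, group in enumerate(opt.param_groups):
        group["lr"] = modulate(step, i)   # per-layer, per-step learning rate
    opt.step()
```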

Cost-Sensitive Learning for Predictive Maintenance

Title Cost-Sensitive Learning for Predictive Maintenance
Authors Stephan Spiegel, Fabian Mueller, Dorothea Weismann, John Bird
Abstract In predictive maintenance, model performance is usually assessed by means of precision, recall, and F1-score. However, employing the model with the best performance, e.g. the highest F1-score, does not necessarily result in minimum maintenance cost, but can instead lead to additional expenses. Thus, we propose to perform model selection based on the economic costs associated with the particular maintenance application. We show that cost-sensitive learning for predictive maintenance can result in significant cost reductions and fault-tolerant policies, since it allows various business constraints and requirements to be incorporated.
Tasks Model Selection
Published 2018-09-28
URL http://arxiv.org/abs/1809.10979v1
PDF http://arxiv.org/pdf/1809.10979v1.pdf
PWC https://paperswithcode.com/paper/cost-sensitive-learning-for-predictive
Repo
Framework
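The proposed selection criterion in miniature: score models by expected maintenance cost under an application-specific cost matrix rather than by F1-score. The cost figures below are invented for illustration; in practice they come from the business case (inspection cost versus downtime cost):

```python
import numpy as np

# cost[(true, pred)]: cost of predicting `pred` when the truth is `true`.
COST = {("ok", "fault"):    50.0,   # unnecessary inspection
        ("fault", "ok"):  5000.0,   # missed failure -> unplanned downtime
        ("ok", "ok"):        0.0,
        ("fault", "fault"): 50.0}   # caught failure -> planned repair

def expected_cost(y_true, y_pred):
    return np.mean([COST[(t, p)] for t, p in zip(y_true, y_pred)])

y_true  = ["ok"] * 95 + ["fault"] * 5
model_a = ["ok"] * 100                    # high precision, misses all faults
model_b = ["ok"] * 85 + ["fault"] * 15    # noisier, but catches the faults
print(expected_cost(y_true, model_a))     # dominated by missed failures
print(expected_cost(y_true, model_b))     # far cheaper despite a worse F1
```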

A study on speech enhancement using exponent-only floating point quantized neural network (EOFP-QNN)

Title A study on speech enhancement using exponent-only floating point quantized neural network (EOFP-QNN)
Authors Yi-Te Hsu, Yu-Chen Lin, Szu-Wei Fu, Yu Tsao, Tei-Wei Kuo
Abstract Numerous studies have investigated the effectiveness of neural network quantization on pattern classification tasks. The present study, for the first time, investigated the performance of speech enhancement (a regression task in speech processing) using a novel exponent-only floating-point quantized neural network (EOFP-QNN). The proposed EOFP-QNN consists of two stages: mantissa-quantization and exponent-quantization. In the mantissa-quantization stage, EOFP-QNN learns how to quantize the mantissa bits of the model parameters while preserving the regression accuracy using the least mantissa precision. In the exponent-quantization stage, the exponent part of the parameters is further quantized without causing any additional performance degradation. We evaluated the proposed EOFP quantization technique on two types of neural networks, namely, bidirectional long short-term memory (BLSTM) and fully convolutional neural network (FCN), on a speech enhancement task. Experimental results showed that the model sizes can be significantly reduced (the model sizes of the quantized BLSTM and FCN models were only 18.75% and 21.89%, respectively, compared to those of the original models) while maintaining satisfactory speech-enhancement performance.
Tasks Quantization, Speech Enhancement
Published 2018-08-17
URL http://arxiv.org/abs/1808.06474v4
PDF http://arxiv.org/pdf/1808.06474v4.pdf
PWC https://paperswithcode.com/paper/a-study-on-speech-enhancement-using-exponent
Repo
Framework
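The mantissa-quantization stage can be illustrated directly on IEEE-754 bit patterns: keep only the top k mantissa bits of each float32 parameter, leaving sign and exponent untouched (the paper's second stage then quantizes the exponent separately). A sketch, not the authors' code:

```python
import numpy as np

def quantize_mantissa(x: np.ndarray, keep_bits: int) -> np.ndarray:
    """Truncate the 23-bit float32 mantissa to `keep_bits` bits."""
    assert x.dtype == np.float32 and 0 <= keep_bits <= 23
    bits = x.view(np.uint32)
    # Mask clears the low (23 - keep_bits) mantissa bits; sign and
    # exponent bits (the top 9) are always preserved.
    mask = np.uint32((0xFFFFFFFF << (23 - keep_bits)) & 0xFFFFFFFF)
    return (bits & mask).view(np.float32)

w = np.random.randn(5).astype(np.float32)
print(w)
print(quantize_mantissa(w, 3))   # exponent-only would be keep_bits = 0
```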

Relative Importance Sampling For Off-Policy Actor-Critic in Deep Reinforcement Learning

Title Relative Importance Sampling For Off-Policy Actor-Critic in Deep Reinforcement Learning
Authors Mahammad Humayoo, Xueqi Cheng
Abstract Off-policy learning is less stable than on-policy learning in reinforcement learning (RL). One reason for this instability is the discrepancy between the target ($\pi$) and behavior ($b$) policy distributions. This discrepancy can be alleviated by employing a smooth variant of importance sampling (IS), such as relative importance sampling (RIS). RIS has a parameter $\beta\in[0, 1]$ which controls smoothness. To cope with instability, we present the first relative importance sampling off-policy actor-critic (RIS-Off-PAC) model-free algorithms in RL. In our method, the network yields a target policy (the actor) and a value function (the critic) assessing the current policy ($\pi$) using samples drawn from the behavior policy. We use the action value generated by the behavior policy, rather than the target policy, in the reward function to train our algorithm. We also use deep neural networks to train both actor and critic. We evaluated our algorithm on a number of OpenAI Gym benchmark problems and demonstrate performance better than or comparable to several state-of-the-art RL baselines.
Tasks
Published 2018-10-30
URL https://arxiv.org/abs/1810.12558v6
PDF https://arxiv.org/pdf/1810.12558v6.pdf
PWC https://paperswithcode.com/paper/relative-importance-sampling-for-off-policy
Repo
Framework
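The abstract does not spell out the RIS formula, so the sketch below assumes the standard form from relative density-ratio estimation, which matches the stated behavior of $\beta\in[0, 1]$: $\beta=1$ recovers the ordinary importance weight $\pi/b$, while smaller $\beta$ flattens the weights toward 1 (and bounds them by $1/(1-\beta)$):

```python
import numpy as np

def ris_weight(pi_a, b_a, beta):
    """Assumed relative-IS weight: pi / (beta * b + (1 - beta) * pi)."""
    return pi_a / (beta * b_a + (1.0 - beta) * pi_a)

pi = np.array([0.5, 0.1, 0.4])   # target policy probabilities
b  = np.array([0.2, 0.6, 0.2])   # behavior policy probabilities
for beta in (1.0, 0.5, 0.1):
    # beta = 1.0 prints the ordinary IS ratios pi/b; smaller beta
    # smooths the extreme weights, trading variance for bias.
    print(beta, ris_weight(pi, b, beta))
```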

Learning representations of molecules and materials with atomistic neural networks

Title Learning representations of molecules and materials with atomistic neural networks
Authors Kristof T. Schütt, Alexandre Tkatchenko, Klaus-Robert Müller
Abstract Deep learning has been shown to learn efficient representations for structured data such as images, text, or audio. In this chapter, we present neural network architectures that are able to learn efficient representations of molecules and materials. In particular, the continuous-filter convolutional network SchNet accurately predicts chemical properties across compositional and configurational space on a variety of datasets. Beyond that, we analyze the obtained representations to find evidence that their spatial and chemical properties agree with chemical intuition.
Tasks
Published 2018-12-11
URL http://arxiv.org/abs/1812.04690v1
PDF http://arxiv.org/pdf/1812.04690v1.pdf
PWC https://paperswithcode.com/paper/learning-representations-of-molecules-and
Repo
Framework
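The building block the chapter centers on is the continuous-filter convolution, in which the filter values are generated from interatomic distances by a small network instead of being indexed on a grid. A bare sketch with SchNet-style choices (Gaussian distance expansion, softplus nonlinearity); the dimensions are assumptions, and the authors' SchNetPack library is the reference implementation:

```python
import torch
import torch.nn as nn

class CFConv(nn.Module):
    def __init__(self, n_features=64, n_rbf=20, cutoff=5.0):
        super().__init__()
        # Gaussian radial-basis centers spanning [0, cutoff].
        self.centers = nn.Parameter(torch.linspace(0, cutoff, n_rbf),
                                    requires_grad=False)
        # Filter-generating network: distances -> per-feature filter values.
        self.filter_net = nn.Sequential(nn.Linear(n_rbf, n_features),
                                        nn.Softplus(),
                                        nn.Linear(n_features, n_features))

    def forward(self, h, pos):
        # h: (N, F) atom features; pos: (N, 3) atom positions.
        d = torch.cdist(pos, pos)                           # (N, N) distances
        rbf = torch.exp(-10.0 * (d.unsqueeze(-1) - self.centers) ** 2)
        W = self.filter_net(rbf)                            # (N, N, F) filters
        return (h.unsqueeze(0) * W).sum(dim=1)              # aggregate neighbors

layer = CFConv()
h, pos = torch.rand(6, 64), torch.rand(6, 3)
print(layer(h, pos).shape)   # torch.Size([6, 64])
```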

Expert Finding in Heterogeneous Bibliographic Networks with Locally-trained Embeddings

Title Expert Finding in Heterogeneous Bibliographic Networks with Locally-trained Embeddings
Authors Huan Gui, Qi Zhu, Liyuan Liu, Aston Zhang, Jiawei Han
Abstract Expert finding is an important task in both industry and academia. It is challenging to rank candidates with the appropriate expertise for various queries. In addition, different types of objects interact with one another, naturally forming heterogeneous information networks. We study the task of expert finding in heterogeneous bibliographic networks based on two aspects: textual content analysis and authority ranking. For the textual content analysis, we propose a new method for query expansion via locally-trained embedding learning with a concept hierarchy as guidance, particularly tailored to specific queries with narrow semantic meanings. Compared with global embedding learning, locally-trained embedding learning projects the terms into a latent semantic space constrained to relevant topics, and therefore preserves more precise and subtle information for specific queries. For candidate ranking, the heterogeneous information network structure, largely ignored in previous studies of expert finding, provides additional information: different types of interactions among objects play different roles. We propose a ranking algorithm to estimate the authority of objects in the network, treating each strongly-typed edge type individually. To demonstrate the effectiveness of the proposed framework, we apply the proposed method to a large-scale bibliographic dataset with over two million entries and one million candidate researchers. The experimental results show that the proposed framework outperforms existing methods for both general and specific queries.
Tasks
Published 2018-03-09
URL http://arxiv.org/abs/1803.03370v1
PDF http://arxiv.org/pdf/1803.03370v1.pdf
PWC https://paperswithcode.com/paper/expert-finding-in-heterogeneous-bibliographic
Repo
Framework
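The query-expansion idea in miniature: expand a narrow query with its nearest neighbors in an embedding space. In the paper the embeddings are locally trained on documents relevant to the query and the expansion is guided by a concept hierarchy; the toy vectors below merely stand in for such locally-trained embeddings:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

terms = ["query expansion", "pseudo relevance feedback", "embedding",
         "authority ranking", "metapath"]
vecs = np.random.rand(len(terms), 16)     # placeholder local embeddings

nn_index = NearestNeighbors(n_neighbors=3).fit(vecs)
_, idx = nn_index.kneighbors(vecs[[0]])   # expand the first term
print([terms[i] for i in idx[0]])         # the query term plus 2 expansions
```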