Paper Group AWR 96
On Improving Deep Reinforcement Learning for POMDPs. Guiding InfoGAN with Semi-Supervision. A Survey Of Cross-lingual Word Embedding Models. Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback. Predicting Aesthetic Score Distribution through Cumulative Jensen-Shannon Divergence. Learning to Compose with Professional Photographs on the Web. Recurrent Scale Approximation for Object Detection in CNN. A Bag-of-Words Equivalent Recurrent Neural Network for Action Recognition. Boundary-Seeking Generative Adversarial Networks. Im2Flow: Motion Hallucination from Static Images for Action Recognition. A Dirichlet Mixture Model of Hawkes Processes for Event Sequence Clustering. DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset. Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-supervised Object and Action Localization. SHINE: Signed Heterogeneous Information Network Embedding for Sentiment Link Prediction. Single-Shot Multi-Person 3D Pose Estimation From Monocular RGB.
On Improving Deep Reinforcement Learning for POMDPs
Title | On Improving Deep Reinforcement Learning for POMDPs |
Authors | Pengfei Zhu, Xin Li, Pascal Poupart, Guanghui Miao |
Abstract | Deep Reinforcement Learning (RL) recently emerged as one of the most competitive approaches for learning in sequential decision making problems with fully observable environments, e.g., computer Go. However, very little work has been done in deep RL to handle partially observable environments. We propose a new architecture called Action-specific Deep Recurrent Q-Network (ADRQN) to enhance learning performance in partially observable domains. Actions are encoded by a fully connected layer and coupled with a convolutional observation to form an action-observation pair. The time series of action-observation pairs are then integrated by an LSTM layer that learns latent states based on which a fully connected layer computes Q-values as in conventional Deep Q-Networks (DQNs). We demonstrate the effectiveness of our new architecture in several partially observable domains, including flickering Atari games. |
Tasks | Atari Games, Decision Making, Time Series |
Published | 2017-04-26 |
URL | http://arxiv.org/abs/1704.07978v6 |
PDF | http://arxiv.org/pdf/1704.07978v6.pdf |
PWC | https://paperswithcode.com/paper/on-improving-deep-reinforcement-learning-for |
Repo | https://github.com/bit1029public/ADRQN |
Framework | none |
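To make the ADRQN architecture described in the abstract concrete, here is a minimal PyTorch sketch, assuming 84x84 grayscale observations and a discrete action space; the layer sizes are illustrative guesses, not necessarily the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ADRQN(nn.Module):
    def __init__(self, n_actions, action_embed_dim=16, lstm_hidden=512):
        super().__init__()
        # Convolutional encoder for the (possibly flickering) observation.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        conv_out = 64 * 7 * 7  # for 84x84 inputs with the strides above
        # Fully connected embedding of the previous action (one-hot input).
        self.action_embed = nn.Linear(n_actions, action_embed_dim)
        # LSTM integrates the time series of action-observation pairs.
        self.lstm = nn.LSTM(conv_out + action_embed_dim, lstm_hidden, batch_first=True)
        # Q-values computed from the latent state, as in a standard DQN head.
        self.q_head = nn.Linear(lstm_hidden, n_actions)

    def forward(self, obs_seq, prev_action_seq, hidden=None):
        # obs_seq: (B, T, 1, 84, 84); prev_action_seq: (B, T, n_actions) one-hot
        B, T = obs_seq.shape[:2]
        obs_feat = self.conv(obs_seq.reshape(B * T, *obs_seq.shape[2:])).reshape(B, T, -1)
        act_feat = self.action_embed(prev_action_seq)
        z, hidden = self.lstm(torch.cat([obs_feat, act_feat], dim=-1), hidden)
        return self.q_head(z), hidden
```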
Guiding InfoGAN with Semi-Supervision
Title | Guiding InfoGAN with Semi-Supervision |
Authors | Adrian Spurr, Emre Aksan, Otmar Hilliges |
Abstract | In this paper we propose a new semi-supervised GAN architecture (ss-InfoGAN) for image synthesis that leverages information from few labels (as little as 0.22%, max. 10% of the dataset) to learn semantically meaningful and controllable data representations where latent variables correspond to label categories. The architecture builds on Information Maximizing Generative Adversarial Networks (InfoGAN) and is shown to learn both continuous and categorical codes and achieves higher quality of synthetic samples compared to fully unsupervised settings. Furthermore, we show that using small amounts of labeled data speeds-up training convergence. The architecture maintains the ability to disentangle latent variables for which no labels are available. Finally, we contribute an information-theoretic reasoning on how introducing semi-supervision increases mutual information between synthetic and real data. |
Tasks | Image Generation |
Published | 2017-07-14 |
URL | http://arxiv.org/abs/1707.04487v1 |
PDF | http://arxiv.org/pdf/1707.04487v1.pdf |
PWC | https://paperswithcode.com/paper/guiding-infogan-with-semi-supervision |
Repo | https://github.com/spurra/ss-infogan |
Framework | torch |
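As a hedged sketch of how the semi-supervision enters the InfoGAN objective: the auxiliary head Q is trained both to recover the sampled categorical code from generated images (the usual mutual-information lower bound) and to predict the true labels on the small labeled subset of real images. The weighting and function below are illustrative assumptions, not the authors' implementation.

```python
import torch.nn.functional as F

def latent_code_losses(q_logits_fake, sampled_code, q_logits_labeled, labels,
                       supervised_weight=1.0):
    # q_logits_fake: Q(c|G(z,c)) logits for generated samples
    # sampled_code:  indices of the categorical code fed to the generator
    # q_logits_labeled: Q(c|x) logits for the few labeled real images
    # labels: their ground-truth class indices
    info_loss = F.cross_entropy(q_logits_fake, sampled_code)   # unsupervised MI term
    sup_loss = F.cross_entropy(q_logits_labeled, labels)       # semi-supervised term
    return info_loss + supervised_weight * sup_loss
```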
A Survey Of Cross-lingual Word Embedding Models
Title | A Survey Of Cross-lingual Word Embedding Models |
Authors | Sebastian Ruder, Ivan Vulić, Anders Søgaard |
Abstract | Cross-lingual representations of words enable us to reason about word meaning in multilingual contexts and are a key facilitator of cross-lingual transfer when developing natural language processing models for low-resource languages. In this survey, we provide a comprehensive typology of cross-lingual word embedding models. We compare their data requirements and objective functions. The recurring theme of the survey is that many of the models presented in the literature optimize for the same objectives, and that seemingly different models are often equivalent modulo optimization strategies, hyper-parameters, and such. We also discuss the different ways cross-lingual word embeddings are evaluated, as well as future challenges and research horizons. |
Tasks | Cross-Lingual Transfer, Word Embeddings |
Published | 2017-06-15 |
URL | https://arxiv.org/abs/1706.04902v4 |
PDF | https://arxiv.org/pdf/1706.04902v4.pdf |
PWC | https://paperswithcode.com/paper/a-survey-of-cross-lingual-word-embedding |
Repo | https://github.com/muyeby/Awesome-Word-Embeddings |
Framework | none |
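The survey's recurring observation is that many seemingly different models optimize closely related objectives. As one illustration (not drawn from the survey's text), the orthogonal Procrustes solution underlying many mapping-based methods fits in a few lines of NumPy:

```python
import numpy as np

def procrustes_map(X_src, Y_tgt):
    """Learn an orthogonal W minimizing ||X_src @ W - Y_tgt||_F over a seed dictionary.

    X_src, Y_tgt: (n_pairs, dim) source/target embeddings of translation pairs.
    """
    U, _, Vt = np.linalg.svd(X_src.T @ Y_tgt)
    return U @ Vt  # orthogonal mapping; project source words as X @ W

# usage with random stand-in data
rng = np.random.default_rng(0)
X, Y = rng.normal(size=(1000, 300)), rng.normal(size=(1000, 300))
W = procrustes_map(X, Y)
```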
Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback
Title | Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback |
Authors | Khanh Nguyen, Hal Daumé III, Jordan Boyd-Graber |
Abstract | Machine translation is a natural candidate problem for reinforcement learning from human feedback: users provide quick, dirty ratings on candidate translations to guide a system to improve. Yet, current neural machine translation training focuses on expensive human-generated reference translations. We describe a reinforcement learning algorithm that improves neural machine translation systems from simulated human feedback. Our algorithm combines the advantage actor-critic algorithm (Mnih et al., 2016) with the attention-based neural encoder-decoder architecture (Luong et al., 2015). This algorithm (a) is well-designed for problems with a large action space and delayed rewards, (b) effectively optimizes traditional corpus-level machine translation metrics, and (c) is robust to skewed, high-variance, granular feedback modeled after actual human behaviors. |
Tasks | Machine Translation |
Published | 2017-07-24 |
URL | http://arxiv.org/abs/1707.07402v4 |
PDF | http://arxiv.org/pdf/1707.07402v4.pdf |
PWC | https://paperswithcode.com/paper/reinforcement-learning-for-bandit-neural |
Repo | https://github.com/khanhptnk/bandit-nmt |
Framework | pytorch |
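A rough sketch of the bandit actor-critic update described in the abstract, under simplifying assumptions (a single sampled translation, a scalar simulated rating, and a critic that predicts the expected reward for the source sentence); this is not the authors' implementation.

```python
import torch

def bandit_a2c_loss(token_logprobs, reward, value_estimate, value_coef=0.5):
    # token_logprobs: (T,) log-probs of the sampled target tokens
    # reward: scalar simulated human rating of the whole translation (float)
    # value_estimate: critic's predicted reward for this source sentence (tensor)
    advantage = (reward - value_estimate).detach()          # no gradient through the baseline
    actor_loss = -(advantage * token_logprobs.sum())        # advantage-weighted log-likelihood
    critic_loss = value_coef * (reward - value_estimate) ** 2
    return actor_loss + critic_loss
```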
Predicting Aesthetic Score Distribution through Cumulative Jensen-Shannon Divergence
Title | Predicting Aesthetic Score Distribution through Cumulative Jensen-Shannon Divergence |
Authors | Xin Jin, Le Wu, Xiaodong Li, Siyu Chen, Siwei Peng, Jingying Chi, Shiming Ge, Chenggen Song, Geng Zhao |
Abstract | Aesthetic quality prediction is a challenging task in the computer vision community because of the complex interplay of semantic content and photographic technique. Recent studies on deep learning based aesthetic quality assessment usually use a binary high-low label or a numerical score to represent the aesthetic quality. However, such scalar representations cannot adequately describe the underlying variety in human perception of aesthetics. In this work, we propose to predict the aesthetic score distribution (i.e., a score distribution vector of the ordinal basic human ratings) using a Deep Convolutional Neural Network (DCNN). Conventional DCNNs, which aim to minimize the difference between predicted scalar numbers or vectors and the ground truth, cannot be directly used for the ordinal basic rating distribution. Thus, a novel CNN based on the Cumulative distribution with Jensen-Shannon divergence (CJS-CNN) is presented to predict the aesthetic score distribution of human ratings, together with a new reliability-sensitive learning method based on the kurtosis of the score distribution, which eliminates the requirement of the original full data of human ratings (without normalization). Experimental results on a large-scale aesthetic dataset demonstrate the effectiveness of the proposed CJS-CNN on this task. |
Tasks | |
Published | 2017-08-23 |
URL | http://arxiv.org/abs/1708.07089v2 |
PDF | http://arxiv.org/pdf/1708.07089v2.pdf |
PWC | https://paperswithcode.com/paper/predicting-aesthetic-score-distribution |
Repo | https://github.com/luke321321/portfolio |
Framework | none |
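As a loose illustration of comparing ordinal score distributions through their cumulative distributions, the snippet below applies a Jensen-Shannon style formula to the CDFs of the predicted and ground-truth rating histograms. The paper's exact CJS definition and normalization may differ, so treat this purely as a sketch of the idea.

```python
import numpy as np

def cumulative_js(p, q, eps=1e-12):
    # p, q: score distributions over the ordinal rating bins, each summing to 1
    Fp, Fq = np.cumsum(p), np.cumsum(q)      # cumulative distributions
    M = 0.5 * (Fp + Fq)
    term_p = Fp * np.log((Fp + eps) / (M + eps))
    term_q = Fq * np.log((Fq + eps) / (M + eps))
    return 0.5 * np.sum(term_p + term_q)

# usage: ten ordinal rating bins (e.g. scores 1-10)
p = np.array([0.00, 0.02, 0.08, 0.20, 0.30, 0.22, 0.12, 0.04, 0.01, 0.01])
q = np.array([0.01, 0.03, 0.10, 0.25, 0.28, 0.18, 0.10, 0.03, 0.01, 0.01])
print(cumulative_js(p, q))
```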
Learning to Compose with Professional Photographs on the Web
Title | Learning to Compose with Professional Photographs on the Web |
Authors | Yi-Ling Chen, Jan Klopp, Min Sun, Shao-Yi Chien, Kwan-Liu Ma |
Abstract | Photo composition is an important factor affecting the aesthetics in photography. However, it is a highly challenging task to model the aesthetic properties of good compositions due to the lack of rules that apply globally across the wide variety of photographic styles. Inspired by the thought process behind taking a photo, we formulate the photo composition problem as a view finding process which successively examines pairs of views and determines their aesthetic preferences. We further exploit the rich professional photographs on the web to mine unlimited high-quality ranking samples and demonstrate that an aesthetics-aware deep ranking network can be trained without explicitly modeling any photographic rules. The resulting model is simple and effective in terms of its architectural design and data sampling method. It is also generic since it naturally learns any photographic rules implicitly encoded in professional photographs. The experiments show that the proposed view finding network achieves state-of-the-art performance with a sliding-window search strategy on two image cropping datasets. |
Tasks | Image Cropping |
Published | 2017-02-01 |
URL | http://arxiv.org/abs/1702.00503v2 |
PDF | http://arxiv.org/pdf/1702.00503v2.pdf |
PWC | https://paperswithcode.com/paper/learning-to-compose-with-professional |
Repo | https://github.com/yiling-chen/view-finding-network |
Framework | tf |
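A plausible instantiation of the aesthetics-aware ranking objective sketched in the abstract: a scoring network is trained so that the professionally composed view outscores a crop mined from the same photo by a margin. The margin value and pairing scheme here are assumptions for illustration.

```python
import torch

def pairwise_ranking_loss(score_pro, score_crop, margin=1.0):
    # score_pro:  scores of the professional views, shape (B,)
    # score_crop: scores of the corresponding mined crops, shape (B,)
    # hinge loss: penalize whenever the professional view does not win by the margin
    return torch.clamp(margin - (score_pro - score_crop), min=0).mean()
```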
Recurrent Scale Approximation for Object Detection in CNN
Title | Recurrent Scale Approximation for Object Detection in CNN |
Authors | Yu Liu, Hongyang Li, Junjie Yan, Fangyin Wei, Xiaogang Wang, Xiaoou Tang |
Abstract | Since convolutional neural networks (CNNs) lack an inherent mechanism to handle large scale variations, feature maps must be computed multiple times for multi-scale object detection, which is a computational bottleneck in practice. To address this, we devise a recurrent scale approximation (RSA) that computes the feature map only once and approximates the maps at the other scales from it. At the core of RSA is the recursive rolling out mechanism: given an initial map at a particular scale, it generates the prediction at a smaller scale that is half the size of the input. To further increase efficiency and accuracy, we (a) design a scale-forecast network to globally predict potential scales in the image, since there is no need to compute maps on all levels of the pyramid, and (b) propose a landmark retracing network (LRN) to trace back locations of the regressed landmarks and generate a confidence score for each landmark; LRN can effectively alleviate false positives caused by the accumulated error in RSA. The whole system can be trained end-to-end in a unified CNN framework. Experiments demonstrate that our proposed algorithm is superior to state-of-the-art methods on face detection benchmarks and achieves comparable results for generic proposal generation. The source code of RSA is available at github.com/sciencefans/RSA-for-object-detection. |
Tasks | Face Detection, Object Detection |
Published | 2017-07-29 |
URL | http://arxiv.org/abs/1707.09531v2 |
PDF | http://arxiv.org/pdf/1707.09531v2.pdf |
PWC | https://paperswithcode.com/paper/recurrent-scale-approximation-for-object |
Repo | https://github.com/sciencefans/RSA-for-object-detection |
Framework | none |
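A hedged sketch of the recursive rollout at the core of RSA: one feature map is computed at the finest scale, and a small rolling module repeatedly predicts the map at half the spatial size. The rolling module below is a stand-in, not the paper's architecture.

```python
import torch
import torch.nn as nn

class RollingUnit(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # stride-2 conv halves the spatial resolution at each rollout step
        self.reduce = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, feat):
        return self.reduce(feat)

def rsa_rollout(base_feature, rolling_unit, num_scales):
    """Approximate the feature pyramid from a single base map by recursive rollout."""
    pyramid = [base_feature]
    for _ in range(num_scales - 1):
        pyramid.append(rolling_unit(pyramid[-1]))  # each map is half the previous size
    return pyramid
```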
A Bag-of-Words Equivalent Recurrent Neural Network for Action Recognition
Title | A Bag-of-Words Equivalent Recurrent Neural Network for Action Recognition |
Authors | Alexander Richard, Juergen Gall |
Abstract | The traditional bag-of-words approach has found a wide range of applications in computer vision. The standard pipeline consists of generating a visual vocabulary, quantizing the features into histograms of visual words, and a classification step, for which a support vector machine with a non-linear kernel is usually used. Given large amounts of data, however, the model suffers from a lack of discriminative power. This applies particularly to action recognition, where the vast amount of video features needs to be subsampled for unsupervised visual vocabulary generation. Moreover, the kernel computation can be very expensive on large datasets. In this work, we propose a recurrent neural network that is equivalent to the traditional bag-of-words approach but enables discriminative training. The model further allows the kernel computation to be incorporated directly into the neural network, solving the complexity issue and representing the complete classification system within a single network. We evaluate our method on four recent action recognition benchmarks and show that the conventional model as well as sparse coding methods are outperformed. |
Tasks | Quantization, Temporal Action Localization |
Published | 2017-03-23 |
URL | http://arxiv.org/abs/1703.08089v1 |
PDF | http://arxiv.org/pdf/1703.08089v1.pdf |
PWC | https://paperswithcode.com/paper/a-bag-of-words-equivalent-recurrent-neural |
Repo | https://github.com/alexanderrichard/squirrel |
Framework | none |
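To see why a bag-of-words histogram can be expressed as a recurrent computation, the sketch below accumulates a soft assignment of each frame's descriptor to the visual words; with soft assignments the pipeline becomes differentiable and hence trainable discriminatively. The details are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def soft_bow_recurrence(features, codebook, temperature=1.0):
    # features: (T, d) per-frame descriptors; codebook: (K, d) visual words
    histogram = np.zeros(len(codebook))
    for x in features:                        # the "recurrent" accumulation step
        logits = -np.sum((codebook - x) ** 2, axis=1) / temperature
        assign = np.exp(logits - logits.max())
        histogram += assign / assign.sum()    # soft one-hot assignment to visual words
    return histogram / len(features)          # normalized bag-of-words histogram

hist = soft_bow_recurrence(np.random.rand(50, 64), np.random.rand(100, 64))
```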
Boundary-Seeking Generative Adversarial Networks
Title | Boundary-Seeking Generative Adversarial Networks |
Authors | R Devon Hjelm, Athul Paul Jacob, Tong Che, Adam Trischler, Kyunghyun Cho, Yoshua Bengio |
Abstract | Generative adversarial networks (GANs) are a learning framework that relies on training a discriminator to estimate a measure of difference between the target and generated distributions. GANs, as normally formulated, rely on the generated samples being completely differentiable w.r.t. the generative parameters, and thus do not work for discrete data. We introduce a method for training GANs with discrete data that uses the estimated difference measure from the discriminator to compute importance weights for generated samples, thus providing a policy gradient for training the generator. The importance weights have a strong connection to the decision boundary of the discriminator, and we call our method boundary-seeking GANs (BGANs). We demonstrate the effectiveness of the proposed algorithm with discrete image and character-based natural language generation. In addition, the boundary-seeking objective extends to continuous data, which can be used to improve stability of training, and we demonstrate this on CelebA, Large-scale Scene Understanding (LSUN) bedrooms, and ImageNet without conditioning. |
Tasks | Scene Understanding, Text Generation |
Published | 2017-02-27 |
URL | http://arxiv.org/abs/1702.08431v4 |
PDF | http://arxiv.org/pdf/1702.08431v4.pdf |
PWC | https://paperswithcode.com/paper/boundary-seeking-generative-adversarial |
Repo | https://github.com/eriklindernoren/PyTorch-GAN |
Framework | pytorch |
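For the continuous-data case, the boundary-seeking generator objective is commonly stated as pushing the discriminator output toward its decision boundary by minimizing the squared gap between log D(G(z)) and log(1 - D(G(z))). The sketch below illustrates that form only; the discrete-data importance-weighting variant is not shown.

```python
import torch

def bgan_generator_loss(d_fake_probs, eps=1e-7):
    # d_fake_probs: discriminator outputs D(G(z)) in (0, 1), shape (B,)
    d = d_fake_probs.clamp(eps, 1 - eps)
    # zero when D(G(z)) = 0.5, i.e. the generated sample sits on the decision boundary
    return 0.5 * ((torch.log(d) - torch.log(1 - d)) ** 2).mean()
```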
Im2Flow: Motion Hallucination from Static Images for Action Recognition
Title | Im2Flow: Motion Hallucination from Static Images for Action Recognition |
Authors | Ruohan Gao, Bo Xiong, Kristen Grauman |
Abstract | Existing methods to recognize actions in static images take the images at their face value, learning the appearances—objects, scenes, and body poses—that distinguish each action class. However, such models are deprived of the rich dynamic structure and motions that also define human activity. We propose an approach that hallucinates the unobserved future motion implied by a single snapshot to help static-image action recognition. The key idea is to learn a prior over short-term dynamics from thousands of unlabeled videos, infer the anticipated optical flow on novel static images, and then train discriminative models that exploit both streams of information. Our main contributions are twofold. First, we devise an encoder-decoder convolutional neural network and a novel optical flow encoding that can translate a static image into an accurate flow map. Second, we show the power of hallucinated flow for recognition, successfully transferring the learned motion into a standard two-stream network for activity recognition. On seven datasets, we demonstrate the power of the approach. It not only achieves state-of-the-art accuracy for dense optical flow prediction, but also consistently enhances recognition of actions and dynamic scenes. |
Tasks | Activity Recognition, Optical Flow Estimation, Temporal Action Localization |
Published | 2017-12-12 |
URL | http://arxiv.org/abs/1712.04109v3 |
PDF | http://arxiv.org/pdf/1712.04109v3.pdf |
PWC | https://paperswithcode.com/paper/im2flow-motion-hallucination-from-static |
Repo | https://github.com/rhgao/separating-object-sounds |
Framework | pytorch |
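A heavily reduced sketch of the image-to-flow mapping: a convolutional encoder-decoder takes a single RGB frame and outputs a dense flow map. The paper uses its own optical-flow encoding and a richer architecture; the 2-channel (u, v) output and layer sizes here are simplifying assumptions.

```python
import torch
import torch.nn as nn

class Im2FlowSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 2, 4, stride=2, padding=1),  # (u, v) flow channels
        )

    def forward(self, image):
        return self.decoder(self.encoder(image))

# usage: the predicted flow can then feed the motion stream of a two-stream network
flow = Im2FlowSketch()(torch.randn(1, 3, 224, 224))  # -> (1, 2, 224, 224)
```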
A Dirichlet Mixture Model of Hawkes Processes for Event Sequence Clustering
Title | A Dirichlet Mixture Model of Hawkes Processes for Event Sequence Clustering |
Authors | Hongteng Xu, Hongyuan Zha |
Abstract | We propose an effective method to solve the event sequence clustering problems based on a novel Dirichlet mixture model of a special but significant type of point processes — Hawkes process. In this model, each event sequence belonging to a cluster is generated via the same Hawkes process with specific parameters, and different clusters correspond to different Hawkes processes. The prior distribution of the Hawkes processes is controlled via a Dirichlet distribution. We learn the model via a maximum likelihood estimator (MLE) and propose an effective variational Bayesian inference algorithm. We specifically analyze the resulting EM-type algorithm in the context of inner-outer iterations and discuss several inner iteration allocation strategies. The identifiability of our model, the convergence of our learning method, and its sample complexity are analyzed in both theoretical and empirical ways, which demonstrate the superiority of our method to other competitors. The proposed method learns the number of clusters automatically and is robust to model misspecification. Experiments on both synthetic and real-world data show that our method can learn diverse triggering patterns hidden in asynchronous event sequences and achieve encouraging performance on clustering purity and consistency. |
Tasks | Bayesian Inference, Point Processes |
Published | 2017-01-31 |
URL | http://arxiv.org/abs/1701.09177v5 |
PDF | http://arxiv.org/pdf/1701.09177v5.pdf |
PWC | https://paperswithcode.com/paper/a-dirichlet-mixture-model-of-hawkes-processes |
Repo | https://github.com/HongtengXu/Hawkes-Process-Toolkit |
Framework | none |
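For reference (standard background rather than code from the paper), each mixture component is a Hawkes process whose conditional intensity is a base rate plus a sum of triggering kernels over past events; with an exponential kernel this is:

```python
import numpy as np

def hawkes_intensity(t, event_times, mu, alpha, beta):
    """lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i))."""
    past = np.asarray([ti for ti in event_times if ti < t])
    return mu + np.sum(alpha * np.exp(-beta * (t - past)))

# usage: intensity at t=5 given three earlier events
print(hawkes_intensity(5.0, [1.0, 2.5, 4.2], mu=0.2, alpha=0.8, beta=1.0))
```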
DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset
Title | DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset |
Authors | Yanran Li, Hui Su, Xiaoyu Shen, Wenjie Li, Ziqiang Cao, Shuzi Niu |
Abstract | We develop a high-quality multi-turn dialog dataset, DailyDialog, which is intriguing in several respects. The language is human-written and less noisy. The dialogues in the dataset reflect the way we communicate in daily life and cover various everyday topics. We also manually label the dataset with communication intention and emotion information. We then evaluate existing approaches on the DailyDialog dataset and hope it benefits research on dialog systems. |
Tasks | |
Published | 2017-10-11 |
URL | http://arxiv.org/abs/1710.03957v1 |
PDF | http://arxiv.org/pdf/1710.03957v1.pdf |
PWC | https://paperswithcode.com/paper/dailydialog-a-manually-labelled-multi-turn |
Repo | https://github.com/snakeztc/NeuralDialog-LAED |
Framework | pytorch |
Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-supervised Object and Action Localization
Title | Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-supervised Object and Action Localization |
Authors | Krishna Kumar Singh, Yong Jae Lee |
Abstract | We propose 'Hide-and-Seek', a weakly-supervised framework that aims to improve object localization in images and action localization in videos. Most existing weakly-supervised methods localize only the most discriminative parts of an object rather than all relevant parts, which leads to suboptimal performance. Our key idea is to hide patches in a training image randomly, forcing the network to seek other relevant parts when the most discriminative part is hidden. Our approach only needs to modify the input image and can work with any network designed for object localization. During testing, we do not need to hide any patches. Our Hide-and-Seek approach obtains superior performance compared to previous methods for weakly-supervised object localization on the ILSVRC dataset. We also demonstrate that our framework can be easily extended to weakly-supervised action localization. |
Tasks | Action Localization, Object Localization, Weakly Supervised Action Localization, Weakly-Supervised Object Localization |
Published | 2017-04-13 |
URL | http://arxiv.org/abs/1704.04232v2 |
PDF | http://arxiv.org/pdf/1704.04232v2.pdf |
PWC | https://paperswithcode.com/paper/hide-and-seek-forcing-a-network-to-be |
Repo | https://github.com/zhengshou/AutoLoc |
Framework | none |
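The core augmentation is simple to sketch: divide each training image into a grid and hide each patch independently with some probability, replacing it with the (dataset) mean so the network must look beyond the most discriminative part. Grid size and hiding probability below are illustrative choices, not the paper's settings.

```python
import numpy as np

def hide_patches(image, grid=4, p_hide=0.5, fill=None, rng=None):
    # image: (H, W, C) training image; hiding is applied only at training time
    rng = rng or np.random.default_rng()
    out = image.copy()
    fill = image.mean(axis=(0, 1)) if fill is None else fill  # dataset mean in practice
    h, w = image.shape[:2]
    ph, pw = h // grid, w // grid
    for i in range(grid):
        for j in range(grid):
            if rng.random() < p_hide:
                out[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw] = fill
    return out

augmented = hide_patches(np.random.rand(224, 224, 3))
```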
SHINE: Signed Heterogeneous Information Network Embedding for Sentiment Link Prediction
Title | SHINE: Signed Heterogeneous Information Network Embedding for Sentiment Link Prediction |
Authors | Hongwei Wang, Fuzheng Zhang, Min Hou, Xing Xie, Minyi Guo, Qi Liu |
Abstract | In online social networks people often express attitudes towards others, which forms massive sentiment links among users. Predicting the sign of sentiment links is a fundamental task in many areas such as personal advertising and public opinion analysis. Previous works mainly focus on textual sentiment classification; however, text information can only disclose the “tip of the iceberg” about users’ true opinions, most of which are unobserved but implied by other sources of information such as social relations and user profiles. To address this problem, in this paper we investigate how to predict possibly existing sentiment links in the presence of heterogeneous information. First, due to the lack of explicit sentiment links in mainstream social networks, we establish a labeled heterogeneous sentiment dataset which consists of users’ sentiment relations, social relations and profile knowledge, built using an entity-level sentiment extraction method. Then we propose a novel and flexible end-to-end Signed Heterogeneous Information Network Embedding (SHINE) framework to extract users’ latent representations from heterogeneous networks and predict the sign of unobserved sentiment links. SHINE utilizes multiple deep autoencoders to map each user into a low-dimensional feature space while preserving the network structure. We demonstrate the superiority of SHINE over state-of-the-art baselines on link prediction and node recommendation in two real-world datasets. The experimental results also prove the efficacy of SHINE in the cold-start scenario. |
Tasks | Link Prediction, Network Embedding, Sentiment Analysis |
Published | 2017-12-03 |
URL | http://arxiv.org/abs/1712.00732v1 |
PDF | http://arxiv.org/pdf/1712.00732v1.pdf |
PWC | https://paperswithcode.com/paper/shine-signed-heterogeneous-information |
Repo | https://github.com/boom85423/hello_SHINE |
Framework | none |
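A loose sketch of the SHINE pipeline, with layer sizes and the aggregation of per-source embeddings chosen as assumptions: each information source (sentiment, social, profile) is compressed by its own autoencoder, and the sign of a sentiment link is predicted from the combined bottleneck codes of the two users.

```python
import torch
import torch.nn as nn

class SourceAutoencoder(nn.Module):
    def __init__(self, in_dim, code_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 128), nn.ReLU(), nn.Linear(128, in_dim))

    def forward(self, x):
        code = self.encoder(x)          # low-dimensional user representation
        return code, self.decoder(code)  # reconstruction preserves network structure

def predict_link_sign(codes_u, codes_v):
    # codes_*: list of bottleneck codes from the sentiment / social / profile
    # autoencoders for users u and v; concatenation + dot product stands in for
    # the paper's aggregation function.
    zu, zv = torch.cat(codes_u, dim=-1), torch.cat(codes_v, dim=-1)
    return torch.tanh((zu * zv).sum(dim=-1))  # >0 positive link, <0 negative link
```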
Single-Shot Multi-Person 3D Pose Estimation From Monocular RGB
Title | Single-Shot Multi-Person 3D Pose Estimation From Monocular RGB |
Authors | Dushyant Mehta, Oleksandr Sotnychenko, Franziska Mueller, Weipeng Xu, Srinath Sridhar, Gerard Pons-Moll, Christian Theobalt |
Abstract | We propose a new single-shot method for multi-person 3D pose estimation in general scenes from a monocular RGB camera. Our approach uses novel occlusion-robust pose-maps (ORPM) which enable full body pose inference even under strong partial occlusions by other people and objects in the scene. ORPM outputs a fixed number of maps which encode the 3D joint locations of all people in the scene. Body part associations allow us to infer 3D pose for an arbitrary number of people without explicit bounding box prediction. To train our approach we introduce MuCo-3DHP, the first large-scale training dataset showing real images of sophisticated multi-person interactions and occlusions. We synthesize a large corpus of multi-person images by compositing images of individual people (with ground truth from multi-view performance capture). We evaluate our method on our new challenging 3D annotated multi-person test set MuPoTs-3D where we achieve state-of-the-art performance. To further stimulate research in multi-person 3D pose estimation, we will make our new datasets and associated code publicly available for research purposes. |
Tasks | 3D Pose Estimation, Pose Estimation |
Published | 2017-12-09 |
URL | http://arxiv.org/abs/1712.03453v3 |
PDF | http://arxiv.org/pdf/1712.03453v3.pdf |
PWC | https://paperswithcode.com/paper/single-shot-multi-person-3d-pose-estimation |
Repo | https://github.com/Daniil-Osokin/lightweight-human-pose-estimation-3d-demo.pytorch |
Framework | pytorch |