Paper Group AWR 96
On Improving Deep Reinforcement Learning for POMDPs. Guiding InfoGAN with Semi-Supervision. A Survey Of Cross-lingual Word Embedding Models. Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback. Predicting Aesthetic Score Distribution through Cumulative Jensen-Shannon Divergence. Learning to Compose with Professional Photographs on the Web. Recurrent Scale Approximation for Object Detection in CNN. A Bag-of-Words Equivalent Recurrent Neural Network for Action Recognition. Boundary-Seeking Generative Adversarial Networks. Im2Flow: Motion Hallucination from Static Images for Action Recognition. A Dirichlet Mixture Model of Hawkes Processes for Event Sequence Clustering. DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset. Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-supervised Object and Action Localization. SHINE: Signed Heterogeneous Information Network Embedding for Sentiment Link Prediction. Single-Shot Multi-Person 3D Pose Estimation From Monocular RGB.
On Improving Deep Reinforcement Learning for POMDPs
Title | On Improving Deep Reinforcement Learning for POMDPs |
Authors | Pengfei Zhu, Xin Li, Pascal Poupart, Guanghui Miao |
Abstract | Deep Reinforcement Learning (RL) recently emerged as one of the most competitive approaches for learning in sequential decision making problems with fully observable environments, e.g., computer Go. However, very little work has been done in deep RL to handle partially observable environments. We propose a new architecture called Action-specific Deep Recurrent Q-Network (ADRQN) to enhance learning performance in partially observable domains. Actions are encoded by a fully connected layer and coupled with a convolutional observation to form an action-observation pair. The time series of action-observation pairs are then integrated by an LSTM layer that learns latent states based on which a fully connected layer computes Q-values as in conventional Deep Q-Networks (DQNs). We demonstrate the effectiveness of our new architecture in several partially observable domains, including flickering Atari games. |
Tasks | Atari Games, Decision Making, Time Series |
Published | 2017-04-26 |
URL | http://arxiv.org/abs/1704.07978v6 |
PDF | http://arxiv.org/pdf/1704.07978v6.pdf |
PWC | https://paperswithcode.com/paper/on-improving-deep-reinforcement-learning-for |
Repo | https://github.com/bit1029public/ADRQN |
Framework | none |
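To make the ADRQN architecture described in the abstract concrete, here is a minimal PyTorch sketch, assuming 84x84 grayscale observations and a discrete action space; the layer sizes are illustrative guesses, not necessarily the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ADRQN(nn.Module):
    def __init__(self, n_actions, action_embed_dim=16, lstm_hidden=512):
        super().__init__()
        # Convolutional encoder for the (possibly flickering) observation.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        conv_out = 64 * 7 * 7  # for 84x84 inputs with the strides above
        # Fully connected embedding of the previous action (one-hot input).
        self.action_embed = nn.Linear(n_actions, action_embed_dim)
        # LSTM integrates the time series of action-observation pairs.
        self.lstm = nn.LSTM(conv_out + action_embed_dim, lstm_hidden, batch_first=True)
        # Q-values computed from the latent state, as in a standard DQN head.
        self.q_head = nn.Linear(lstm_hidden, n_actions)

    def forward(self, obs_seq, prev_action_seq, hidden=None):
        # obs_seq: (B, T, 1, 84, 84); prev_action_seq: (B, T, n_actions) one-hot
        B, T = obs_seq.shape[:2]
        obs_feat = self.conv(obs_seq.reshape(B * T, *obs_seq.shape[2:])).reshape(B, T, -1)
        act_feat = self.action_embed(prev_action_seq)
        z, hidden = self.lstm(torch.cat([obs_feat, act_feat], dim=-1), hidden)
        return self.q_head(z), hidden
```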
Guiding InfoGAN with Semi-Supervision
Title | Guiding InfoGAN with Semi-Supervision |
Authors | Adrian Spurr, Emre Aksan, Otmar Hilliges |
Abstract | In this paper we propose a new semi-supervised GAN architecture (ss-InfoGAN) for image synthesis that leverages information from few labels (as little as 0.22%, max. 10% of the dataset) to learn semantically meaningful and controllable data representations where latent variables correspond to label categories. The architecture builds on Information Maximizing Generative Adversarial Networks (InfoGAN) and is shown to learn both continuous and categorical codes and achieves higher quality of synthetic samples compared to fully unsupervised settings. Furthermore, we show that using small amounts of labeled data speeds-up training convergence. The architecture maintains the ability to disentangle latent variables for which no labels are available. Finally, we contribute an information-theoretic reasoning on how introducing semi-supervision increases mutual information between synthetic and real data. |
Tasks | Image Generation |
Published | 2017-07-14 |
URL | http://arxiv.org/abs/1707.04487v1 |
PDF | http://arxiv.org/pdf/1707.04487v1.pdf |
PWC | https://paperswithcode.com/paper/guiding-infogan-with-semi-supervision |
Repo | https://github.com/spurra/ss-infogan |
Framework | torch |
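As a hedged sketch of how the semi-supervision enters the InfoGAN objective: the auxiliary head Q is trained both to recover the sampled categorical code from generated images (the usual mutual-information lower bound) and to predict the true labels on the small labeled subset of real images. The weighting and function below are illustrative assumptions, not the authors' implementation.

```python
import torch.nn.functional as F

def latent_code_losses(q_logits_fake, sampled_code, q_logits_labeled, labels,
                       supervised_weight=1.0):
    # q_logits_fake: Q(c|G(z,c)) logits for generated samples
    # sampled_code:  indices of the categorical code fed to the generator
    # q_logits_labeled: Q(c|x) logits for the few labeled real images
    # labels: their ground-truth class indices
    info_loss = F.cross_entropy(q_logits_fake, sampled_code)   # unsupervised MI term
    sup_loss = F.cross_entropy(q_logits_labeled, labels)       # semi-supervised term
    return info_loss + supervised_weight * sup_loss
```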
A Survey Of Cross-lingual Word Embedding Models
Title | A Survey Of Cross-lingual Word Embedding Models |
Authors | Sebastian Ruder, Ivan Vulić, Anders Søgaard |
Abstract | Cross-lingual representations of words enable us to reason about word meaning in multilingual contexts and are a key facilitator of cross-lingual transfer when developing natural language processing models for low-resource languages. In this survey, we provide a comprehensive typology of cross-lingual word embedding models. We compare their data requirements and objective functions. The recurring theme of the survey is that many of the models presented in the literature optimize for the same objectives, and that seemingly different models are often equivalent modulo optimization strategies, hyper-parameters, and such. We also discuss the different ways cross-lingual word embeddings are evaluated, as well as future challenges and research horizons. |
Tasks | Cross-Lingual Transfer, Word Embeddings |
Published | 2017-06-15 |
URL | https://arxiv.org/abs/1706.04902v4 |
PDF | https://arxiv.org/pdf/1706.04902v4.pdf |
PWC | https://paperswithcode.com/paper/a-survey-of-cross-lingual-word-embedding |
Repo | https://github.com/muyeby/Awesome-Word-Embeddings |
Framework | none |
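The survey's recurring observation is that many seemingly different models optimize closely related objectives. As one illustration (not drawn from the survey's text), the orthogonal Procrustes solution underlying many mapping-based methods fits in a few lines of NumPy:

```python
import numpy as np

def procrustes_map(X_src, Y_tgt):
    """Learn an orthogonal W minimizing ||X_src @ W - Y_tgt||_F over a seed dictionary.

    X_src, Y_tgt: (n_pairs, dim) source/target embeddings of translation pairs.
    """
    U, _, Vt = np.linalg.svd(X_src.T @ Y_tgt)
    return U @ Vt  # orthogonal mapping; project source words as X @ W

# usage with random stand-in data
rng = np.random.default_rng(0)
X, Y = rng.normal(size=(1000, 300)), rng.normal(size=(1000, 300))
W = procrustes_map(X, Y)
```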
Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback
Title | Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback |
Authors | Khanh Nguyen, Hal Daumé III, Jordan Boyd-Graber |
Abstract | Machine translation is a natural candidate problem for reinforcement learning from human feedback: users provide quick, dirty ratings on candidate translations to guide a system to improve. Yet, current neural machine translation training focuses on expensive human-generated reference translations. We describe a reinforcement learning algorithm that improves neural machine translation systems from simulated human feedback. Our algorithm combines the advantage actor-critic algorithm (Mnih et al., 2016) with the attention-based neural encoder-decoder architecture (Luong et al., 2015). This algorithm (a) is well-designed for problems with a large action space and delayed rewards, (b) effectively optimizes traditional corpus-level machine translation metrics, and (c) is robust to skewed, high-variance, granular feedback modeled after actual human behaviors. |
Tasks | Machine Translation |
Published | 2017-07-24 |
URL | http://arxiv.org/abs/1707.07402v4 |
PDF | http://arxiv.org/pdf/1707.07402v4.pdf |
PWC | https://paperswithcode.com/paper/reinforcement-learning-for-bandit-neural |
Repo | https://github.com/khanhptnk/bandit-nmt |
Framework | pytorch |
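A rough sketch of the bandit actor-critic update described in the abstract, under simplifying assumptions (a single sampled translation, a scalar simulated rating, and a critic that predicts the expected reward for the source sentence); this is not the authors' implementation.

```python
import torch

def bandit_a2c_loss(token_logprobs, reward, value_estimate, value_coef=0.5):
    # token_logprobs: (T,) log-probs of the sampled target tokens
    # reward: scalar simulated human rating of the whole translation (float)
    # value_estimate: critic's predicted reward for this source sentence (tensor)
    advantage = (reward - value_estimate).detach()          # no gradient through the baseline
    actor_loss = -(advantage * token_logprobs.sum())        # advantage-weighted log-likelihood
    critic_loss = value_coef * (reward - value_estimate) ** 2
    return actor_loss + critic_loss
```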
Predicting Aesthetic Score Distribution through Cumulative Jensen-Shannon Divergence
Title | Predicting Aesthetic Score Distribution through Cumulative Jensen-Shannon Divergence |
Authors | Xin Jin, Le Wu, Xiaodong Li, Siyu Chen, Siwei Peng, Jingying Chi, Shiming Ge, Chenggen Song, Geng Zhao |
Abstract | Aesthetic quality prediction is a challenging task in the computer vision community because of the complex interplay of semantic content and photographic technique. Recent studies on deep learning based aesthetic quality assessment usually use a binary high-low label or a numerical score to represent the aesthetic quality. However, such scalar representations cannot adequately describe the underlying variety in human perception of aesthetics. In this work, we propose to predict the aesthetic score distribution (i.e., a score distribution vector of the ordinal basic human ratings) using a Deep Convolutional Neural Network (DCNN). Conventional DCNNs, which aim to minimize the difference between predicted scalar numbers or vectors and the ground truth, cannot be directly used for the ordinal basic rating distribution. Thus, a novel CNN based on the Cumulative distribution with Jensen-Shannon divergence (CJS-CNN) is presented to predict the aesthetic score distribution of human ratings, together with a new reliability-sensitive learning method based on the kurtosis of the score distribution, which eliminates the requirement of the original full data of human ratings (without normalization). Experimental results on a large-scale aesthetic dataset demonstrate the effectiveness of the proposed CJS-CNN on this task. |
Tasks | |
Published | 2017-08-23 |
URL | http://arxiv.org/abs/1708.07089v2 |
PDF | http://arxiv.org/pdf/1708.07089v2.pdf |
PWC | https://paperswithcode.com/paper/predicting-aesthetic-score-distribution |
Repo | https://github.com/luke321321/portfolio |
Framework | none |
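As a loose illustration of comparing ordinal score distributions through their cumulative distributions, the snippet below applies a Jensen-Shannon style formula to the CDFs of the predicted and ground-truth rating histograms. The paper's exact CJS definition and normalization may differ, so treat this purely as a sketch of the idea.

```python
import numpy as np

def cumulative_js(p, q, eps=1e-12):
    # p, q: score distributions over the ordinal rating bins, each summing to 1
    Fp, Fq = np.cumsum(p), np.cumsum(q)      # cumulative distributions
    M = 0.5 * (Fp + Fq)
    term_p = Fp * np.log((Fp + eps) / (M + eps))
    term_q = Fq * np.log((Fq + eps) / (M + eps))
    return 0.5 * np.sum(term_p + term_q)

# usage: ten ordinal rating bins (e.g. scores 1-10)
p = np.array([0.00, 0.02, 0.08, 0.20, 0.30, 0.22, 0.12, 0.04, 0.01, 0.01])
q = np.array([0.01, 0.03, 0.10, 0.25, 0.28, 0.18, 0.10, 0.03, 0.01, 0.01])
print(cumulative_js(p, q))
```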
Learning to Compose with Professional Photographs on the Web
Title | Learning to Compose with Professional Photographs on the Web |
Authors | Yi-Ling Chen, Jan Klopp, Min Sun, Shao-Yi Chien, Kwan-Liu Ma |
Abstract | Photo composition is an important factor affecting the aesthetics in photography. However, it is a highly challenging task to model the aesthetic properties of good compositions due to the lack of rules that apply globally across the wide variety of photographic styles. Inspired by the thought process behind taking a photo, we formulate the photo composition problem as a view finding process which successively examines pairs of views and determines their aesthetic preferences. We further exploit the rich professional photographs on the web to mine unlimited high-quality ranking samples and demonstrate that an aesthetics-aware deep ranking network can be trained without explicitly modeling any photographic rules. The resulting model is simple and effective in terms of its architectural design and data sampling method. It is also generic since it naturally learns any photographic rules implicitly encoded in professional photographs. The experiments show that the proposed view finding network achieves state-of-the-art performance with a sliding-window search strategy on two image cropping datasets. |
Tasks | Image Cropping |
Published | 2017-02-01 |
URL | http://arxiv.org/abs/1702.00503v2 |
PDF | http://arxiv.org/pdf/1702.00503v2.pdf |
PWC | https://paperswithcode.com/paper/learning-to-compose-with-professional |
Repo | https://github.com/yiling-chen/view-finding-network |
Framework | tf |
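A plausible instantiation of the aesthetics-aware ranking objective sketched in the abstract: a scoring network is trained so that the professionally composed view outscores a crop mined from the same photo by a margin. The margin value and pairing scheme here are assumptions for illustration.

```python
import torch

def pairwise_ranking_loss(score_pro, score_crop, margin=1.0):
    # score_pro:  scores of the professional views, shape (B,)
    # score_crop: scores of the corresponding mined crops, shape (B,)
    # hinge loss: penalize whenever the professional view does not win by the margin
    return torch.clamp(margin - (score_pro - score_crop), min=0).mean()
```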
Recurrent Scale Approximation for Object Detection in CNN
Title | Recurrent Scale Approximation for Object Detection in CNN |
Authors | Yu Liu, Hongyang Li, Junjie Yan, Fangyin Wei, Xiaogang Wang, Xiaoou Tang |
Abstract | Since convolutional neural networks (CNNs) lack an inherent mechanism to handle large scale variations, feature maps must be computed multiple times for multi-scale object detection, which is a computational bottleneck in practice. To address this, we devise a recurrent scale approximation (RSA) that computes the feature map only once and approximates the maps at the other scales from it. At the core of RSA is the recursive rolling out mechanism: given an initial map at a particular scale, it generates the prediction at a smaller scale that is half the size of the input. To further increase efficiency and accuracy, we (a) design a scale-forecast network to globally predict potential scales in the image, since there is no need to compute maps on all levels of the pyramid, and (b) propose a landmark retracing network (LRN) to trace back locations of the regressed landmarks and generate a confidence score for each landmark; LRN can effectively alleviate false positives caused by the accumulated error in RSA. The whole system can be trained end-to-end in a unified CNN framework. Experiments demonstrate that our proposed algorithm is superior to state-of-the-art methods on face detection benchmarks and achieves comparable results for generic proposal generation. The source code of RSA is available at github.com/sciencefans/RSA-for-object-detection. |
Tasks | Face Detection, Object Detection |
Published | 2017-07-29 |
URL | http://arxiv.org/abs/1707.09531v2 |
PDF | http://arxiv.org/pdf/1707.09531v2.pdf |
PWC | https://paperswithcode.com/paper/recurrent-scale-approximation-for-object |
Repo | https://github.com/sciencefans/RSA-for-object-detection |
Framework | none |
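A hedged sketch of the recursive rollout at the core of RSA: one feature map is computed at the finest scale, and a small rolling module repeatedly predicts the map at half the spatial size. The rolling module below is a stand-in, not the paper's architecture.

```python
import torch
import torch.nn as nn

class RollingUnit(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # stride-2 conv halves the spatial resolution at each rollout step
        self.reduce = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, feat):
        return self.reduce(feat)

def rsa_rollout(base_feature, rolling_unit, num_scales):
    """Approximate the feature pyramid from a single base map by recursive rollout."""
    pyramid = [base_feature]
    for _ in range(num_scales - 1):
        pyramid.append(rolling_unit(pyramid[-1]))  # each map is half the previous size
    return pyramid
```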
A Bag-of-Words Equivalent Recurrent Neural Network for Action Recognition
Title | A Bag-of-Words Equivalent Recurrent Neural Network for Action Recognition |
Authors | Alexander Richard, Juergen Gall |
Abstract | The traditional bag-of-words approach has found a wide range of applications in computer vision. The standard pipeline consists of generating a visual vocabulary, quantizing the features into histograms of visual words, and a classification step, for which a support vector machine with a non-linear kernel is usually used. Given large amounts of data, however, the model suffers from a lack of discriminative power. This applies particularly to action recognition, where the vast amount of video features needs to be subsampled for unsupervised visual vocabulary generation. Moreover, the kernel computation can be very expensive on large datasets. In this work, we propose a recurrent neural network that is equivalent to the traditional bag-of-words approach but enables discriminative training. The model further allows the kernel computation to be incorporated directly into the neural network, solving the complexity issue and representing the complete classification system within a single network. We evaluate our method on four recent action recognition benchmarks and show that the conventional model as well as sparse coding methods are outperformed. |
Tasks | Quantization, Temporal Action Localization |
Published | 2017-03-23 |
URL | http://arxiv.org/abs/1703.08089v1 |
PDF | http://arxiv.org/pdf/1703.08089v1.pdf |
PWC | https://paperswithcode.com/paper/a-bag-of-words-equivalent-recurrent-neural |
Repo | https://github.com/alexanderrichard/squirrel |
Framework | none |
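To see why a bag-of-words histogram can be expressed as a recurrent computation, the sketch below accumulates a soft assignment of each frame's descriptor to the visual words; with soft assignments the pipeline becomes differentiable and hence trainable discriminatively. The details are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def soft_bow_recurrence(features, codebook, temperature=1.0):
    # features: (T, d) per-frame descriptors; codebook: (K, d) visual words
    histogram = np.zeros(len(codebook))
    for x in features:                        # the "recurrent" accumulation step
        logits = -np.sum((codebook - x) ** 2, axis=1) / temperature
        assign = np.exp(logits - logits.max())
        histogram += assign / assign.sum()    # soft one-hot assignment to visual words
    return histogram / len(features)          # normalized bag-of-words histogram

hist = soft_bow_recurrence(np.random.rand(50, 64), np.random.rand(100, 64))
```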
Boundary-Seeking Generative Adversarial Networks
Title | Boundary-Seeking Generative Adversarial Networks |
Authors | R Devon Hjelm, Athul Paul Jacob, Tong Che, Adam Trischler, Kyunghyun Cho, Yoshua Bengio |
Abstract | Generative adversarial networks (GANs) are a learning framework that relies on training a discriminator to estimate a measure of difference between the target and generated distributions. GANs, as normally formulated, rely on the generated samples being completely differentiable w.r.t. the generative parameters, and thus do not work for discrete data. We introduce a method for training GANs with discrete data that uses the estimated difference measure from the discriminator to compute importance weights for generated samples, thus providing a policy gradient for training the generator. The importance weights have a strong connection to the decision boundary of the discriminator, and we call our method boundary-seeking GANs (BGANs). We demonstrate the effectiveness of the proposed algorithm with discrete image and character-based natural language generation. In addition, the boundary-seeking objective extends to continuous data, which can be used to improve stability of training, and we demonstrate this on CelebA, Large-scale Scene Understanding (LSUN) bedrooms, and ImageNet without conditioning. |
Tasks | Scene Understanding, Text Generation |
Published | 2017-02-27 |
URL | http://arxiv.org/abs/1702.08431v4 |
PDF | http://arxiv.org/pdf/1702.08431v4.pdf |
PWC | https://paperswithcode.com/paper/boundary-seeking-generative-adversarial |
Repo | https://github.com/eriklindernoren/PyTorch-GAN |
Framework | pytorch |
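For the continuous-data case, the boundary-seeking generator objective is commonly stated as pushing the discriminator output toward its decision boundary by minimizing the squared gap between log D(G(z)) and log(1 - D(G(z))). The sketch below illustrates that form only; the discrete-data importance-weighting variant is not shown.

```python
import torch

def bgan_generator_loss(d_fake_probs, eps=1e-7):
    # d_fake_probs: discriminator outputs D(G(z)) in (0, 1), shape (B,)
    d = d_fake_probs.clamp(eps, 1 - eps)
    # zero when D(G(z)) = 0.5, i.e. the generated sample sits on the decision boundary
    return 0.5 * ((torch.log(d) - torch.log(1 - d)) ** 2).mean()
```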
Im2Flow: Motion Hallucination from Static Images for Action Recognition
Title | Im2Flow: Motion Hallucination from Static Images for Action Recognition |
Authors | Ruohan Gao, Bo Xiong, Kristen Grauman |
Abstract | Existing methods to recognize actions in static images take the images at their face value, learning the appearances—objects, scenes, and body poses—that distinguish each action class. However, such models are deprived of the rich dynamic structure and motions that also define human activity. We propose an approach that hallucinates the unobserved future motion implied by a single snapshot to help static-image action recognition. The key idea is to learn a prior over short-term dynamics from thousands of unlabeled videos, infer the anticipated optical flow on novel static images, and then train discriminative models that exploit both streams of information. Our main contributions are twofold. First, we devise an encoder-decoder convolutional neural network and a novel optical flow encoding that can translate a static image into an accurate flow map. Second, we show the power of hallucinated flow for recognition, successfully transferring the learned motion into a standard two-stream network for activity recognition. On seven datasets, we demonstrate the power of the approach. It not only achieves state-of-the-art accuracy for dense optical flow prediction, but also consistently enhances recognition of actions and dynamic scenes. |
Tasks | Activity Recognition, Optical Flow Estimation, Temporal Action Localization |
Published | 2017-12-12 |
URL | http://arxiv.org/abs/1712.04109v3 |
PDF | http://arxiv.org/pdf/1712.04109v3.pdf |
PWC | https://paperswithcode.com/paper/im2flow-motion-hallucination-from-static |
Repo | https://github.com/rhgao/separating-object-sounds |
Framework | pytorch |
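A heavily reduced sketch of the image-to-flow mapping: a convolutional encoder-decoder takes a single RGB frame and outputs a dense flow map. The paper uses its own optical-flow encoding and a richer architecture; the 2-channel (u, v) output and layer sizes here are simplifying assumptions.

```python
import torch
import torch.nn as nn

class Im2FlowSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 2, 4, stride=2, padding=1),  # (u, v) flow channels
        )

    def forward(self, image):
        return self.decoder(self.encoder(image))

# usage: the predicted flow can then feed the motion stream of a two-stream network
flow = Im2FlowSketch()(torch.randn(1, 3, 224, 224))  # -> (1, 2, 224, 224)
```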
A Dirichlet Mixture Model of Hawkes Processes for Event Sequence Clustering
Title | A Dirichlet Mixture Model of Hawkes Processes for Event Sequence Clustering |
Authors | Hongteng Xu, Hongyuan Zha |
Abstract | We propose an effective method to solve the event sequence clustering problems based on a novel Dirichlet mixture model of a special but significant type of point processes — Hawkes process. In this model, each event sequence belonging to a cluster is generated via the same Hawkes process with specific parameters, and different clusters correspond to different Hawkes processes. The prior distribution of the Hawkes processes is controlled via a Dirichlet distribution. We learn the model via a maximum likelihood estimator (MLE) and propose an effective variational Bayesian inference algorithm. We specifically analyze the resulting EM-type algorithm in the context of inner-outer iterations and discuss several inner iteration allocation strategies. The identifiability of our model, the convergence of our learning method, and its sample complexity are analyzed in both theoretical and empirical ways, which demonstrate the superiority of our method to other competitors. The proposed method learns the number of clusters automatically and is robust to model misspecification. Experiments on both synthetic and real-world data show that our method can learn diverse triggering patterns hidden in asynchronous event sequences and achieve encouraging performance on clustering purity and consistency. |
Tasks | Bayesian Inference, Point Processes |
Published | 2017-01-31 |
URL | http://arxiv.org/abs/1701.09177v5 |
PDF | http://arxiv.org/pdf/1701.09177v5.pdf |
PWC | https://paperswithcode.com/paper/a-dirichlet-mixture-model-of-hawkes-processes |
Repo | https://github.com/HongtengXu/Hawkes-Process-Toolkit |
Framework | none |
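For reference (standard background rather than code from the paper), each mixture component is a Hawkes process whose conditional intensity is a base rate plus a sum of triggering kernels over past events; with an exponential kernel this is:

```python
import numpy as np

def hawkes_intensity(t, event_times, mu, alpha, beta):
    """lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i))."""
    past = np.asarray([ti for ti in event_times if ti < t])
    return mu + np.sum(alpha * np.exp(-beta * (t - past)))

# usage: intensity at t=5 given three earlier events
print(hawkes_intensity(5.0, [1.0, 2.5, 4.2], mu=0.2, alpha=0.8, beta=1.0))
```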
DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset
Title | DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset |
Authors | Yanran Li, Hui Su, Xiaoyu Shen, Wenjie Li, Ziqiang Cao, Shuzi Niu |
Abstract | We develop a high-quality multi-turn dialog dataset, DailyDialog, which is intriguing in several respects. The language is human-written and less noisy. The dialogues in the dataset reflect the way we communicate in daily life and cover various everyday topics. We also manually label the dataset with communication intention and emotion information. We then evaluate existing approaches on the DailyDialog dataset and hope it benefits research on dialog systems. |
Tasks | |
Published | 2017-10-11 |
URL | http://arxiv.org/abs/1710.03957v1 |
PDF | http://arxiv.org/pdf/1710.03957v1.pdf |
PWC | https://paperswithcode.com/paper/dailydialog-a-manually-labelled-multi-turn |
Repo | https://github.com/snakeztc/NeuralDialog-LAED |
Framework | pytorch |
Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-supervised Object and Action Localization
Title | Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-supervised Object and Action Localization |
Authors | Krishna Kumar Singh, Yong Jae Lee |
Abstract | We propose 'Hide-and-Seek', a weakly-supervised framework that aims to improve object localization in images and action localization in videos. Most existing weakly-supervised methods localize only the most discriminative parts of an object rather than all relevant parts, which leads to suboptimal performance. Our key idea is to hide patches in a training image randomly, forcing the network to seek other relevant parts when the most discriminative part is hidden. Our approach only needs to modify the input image and can work with any network designed for object localization. During testing, we do not need to hide any patches. Our Hide-and-Seek approach obtains superior performance compared to previous methods for weakly-supervised object localization on the ILSVRC dataset. We also demonstrate that our framework can be easily extended to weakly-supervised action localization. |
Tasks | Action Localization, Object Localization, Weakly Supervised Action Localization, Weakly-Supervised Object Localization |
Published | 2017-04-13 |
URL | http://arxiv.org/abs/1704.04232v2 |
PDF | http://arxiv.org/pdf/1704.04232v2.pdf |
PWC | https://paperswithcode.com/paper/hide-and-seek-forcing-a-network-to-be |
Repo | https://github.com/zhengshou/AutoLoc |
Framework | none |
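The core augmentation is simple to sketch: divide each training image into a grid and hide each patch independently with some probability, replacing it with the (dataset) mean so the network must look beyond the most discriminative part. Grid size and hiding probability below are illustrative choices, not the paper's settings.

```python
import numpy as np

def hide_patches(image, grid=4, p_hide=0.5, fill=None, rng=None):
    # image: (H, W, C) training image; hiding is applied only at training time
    rng = rng or np.random.default_rng()
    out = image.copy()
    fill = image.mean(axis=(0, 1)) if fill is None else fill  # dataset mean in practice
    h, w = image.shape[:2]
    ph, pw = h // grid, w // grid
    for i in range(grid):
        for j in range(grid):
            if rng.random() < p_hide:
                out[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw] = fill
    return out

augmented = hide_patches(np.random.rand(224, 224, 3))
```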
SHINE: Signed Heterogeneous Information Network Embedding for Sentiment Link Prediction
Title | SHINE: Signed Heterogeneous Information Network Embedding for Sentiment Link Prediction |
Authors | Hongwei Wang, Fuzheng Zhang, Min Hou, Xing Xie, Minyi Guo, Qi Liu |
Abstract | In online social networks people often express attitudes towards others, which forms massive sentiment links among users. Predicting the sign of sentiment links is a fundamental task in many areas such as personal advertising and public opinion analysis. Previous works mainly focus on textual sentiment classification; however, text information can only disclose the “tip of the iceberg” about users’ true opinions, most of which are unobserved but implied by other sources of information such as social relations and user profiles. To address this problem, in this paper we investigate how to predict possibly existing sentiment links in the presence of heterogeneous information. First, due to the lack of explicit sentiment links in mainstream social networks, we establish a labeled heterogeneous sentiment dataset which consists of users’ sentiment relations, social relations and profile knowledge, built using an entity-level sentiment extraction method. Then we propose a novel and flexible end-to-end Signed Heterogeneous Information Network Embedding (SHINE) framework to extract users’ latent representations from heterogeneous networks and predict the sign of unobserved sentiment links. SHINE utilizes multiple deep autoencoders to map each user into a low-dimensional feature space while preserving the network structure. We demonstrate the superiority of SHINE over state-of-the-art baselines on link prediction and node recommendation in two real-world datasets. The experimental results also prove the efficacy of SHINE in the cold-start scenario. |
Tasks | Link Prediction, Network Embedding, Sentiment Analysis |
Published | 2017-12-03 |
URL | http://arxiv.org/abs/1712.00732v1 |
PDF | http://arxiv.org/pdf/1712.00732v1.pdf |
PWC | https://paperswithcode.com/paper/shine-signed-heterogeneous-information |
Repo | https://github.com/boom85423/hello_SHINE |
Framework | none |
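A loose sketch of the SHINE pipeline, with layer sizes and the aggregation of per-source embeddings chosen as assumptions: each information source (sentiment, social, profile) is compressed by its own autoencoder, and the sign of a sentiment link is predicted from the combined bottleneck codes of the two users.

```python
import torch
import torch.nn as nn

class SourceAutoencoder(nn.Module):
    def __init__(self, in_dim, code_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 128), nn.ReLU(), nn.Linear(128, in_dim))

    def forward(self, x):
        code = self.encoder(x)          # low-dimensional user representation
        return code, self.decoder(code)  # reconstruction preserves network structure

def predict_link_sign(codes_u, codes_v):
    # codes_*: list of bottleneck codes from the sentiment / social / profile
    # autoencoders for users u and v; concatenation + dot product stands in for
    # the paper's aggregation function.
    zu, zv = torch.cat(codes_u, dim=-1), torch.cat(codes_v, dim=-1)
    return torch.tanh((zu * zv).sum(dim=-1))  # >0 positive link, <0 negative link
```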
Single-Shot Multi-Person 3D Pose Estimation From Monocular RGB
Title | Single-Shot Multi-Person 3D Pose Estimation From Monocular RGB |
Authors | Dushyant Mehta, Oleksandr Sotnychenko, Franziska Mueller, Weipeng Xu, Srinath Sridhar, Gerard Pons-Moll, Christian Theobalt |
Abstract | We propose a new single-shot method for multi-person 3D pose estimation in general scenes from a monocular RGB camera. Our approach uses novel occlusion-robust pose-maps (ORPM) which enable full body pose inference even under strong partial occlusions by other people and objects in the scene. ORPM outputs a fixed number of maps which encode the 3D joint locations of all people in the scene. Body part associations allow us to infer 3D pose for an arbitrary number of people without explicit bounding box prediction. To train our approach we introduce MuCo-3DHP, the first large-scale training dataset showing real images of sophisticated multi-person interactions and occlusions. We synthesize a large corpus of multi-person images by compositing images of individual people (with ground truth from multi-view performance capture). We evaluate our method on our new challenging 3D annotated multi-person test set MuPoTs-3D where we achieve state-of-the-art performance. To further stimulate research in multi-person 3D pose estimation, we will make our new datasets and associated code publicly available for research purposes. |
Tasks | 3D Pose Estimation, Pose Estimation |
Published | 2017-12-09 |
URL | http://arxiv.org/abs/1712.03453v3 |
PDF | http://arxiv.org/pdf/1712.03453v3.pdf |
PWC | https://paperswithcode.com/paper/single-shot-multi-person-3d-pose-estimation |
Repo | https://github.com/Daniil-Osokin/lightweight-human-pose-estimation-3d-demo.pytorch |
Framework | pytorch |