Paper Group AWR 121
Emergence of Locomotion Behaviours in Rich Environments. Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2 Applications and Future Perspectives. PAC-Bayesian Margin Bounds for Convolutional Neural Networks. Towards Neural Phrase-based Machine Translation. Regret Minimization for Partially Observable Deep Reinforceme …
Emergence of Locomotion Behaviours in Rich Environments
Title | Emergence of Locomotion Behaviours in Rich Environments |
Authors | Nicolas Heess, Dhruva TB, Srinivasan Sriram, Jay Lemmon, Josh Merel, Greg Wayne, Yuval Tassa, Tom Erez, Ziyu Wang, S. M. Ali Eslami, Martin Riedmiller, David Silver |
Abstract | The reinforcement learning paradigm allows, in principle, for complex behaviours to be learned directly from simple reward signals. In practice, however, it is common to carefully hand-design the reward function to encourage a particular solution, or to derive it from demonstration data. In this paper we explore how a rich environment can help to promote the learning of complex behavior. Specifically, we train agents in diverse environmental contexts, and find that this encourages the emergence of robust behaviours that perform well across a suite of tasks. We demonstrate this principle for locomotion – behaviours that are known for their sensitivity to the choice of reward. We train several simulated bodies on a diverse set of challenging terrains and obstacles, using a simple reward function based on forward progress. Using a novel scalable variant of policy gradient reinforcement learning, our agents learn to run, jump, crouch and turn as required by the environment without explicit reward-based guidance. A visual depiction of highlights of the learned behavior can be viewed at https://youtu.be/hx_bgoTF7bs . |
Tasks | |
Published | 2017-07-07 |
URL | http://arxiv.org/abs/1707.02286v2 |
http://arxiv.org/pdf/1707.02286v2.pdf | |
PWC | https://paperswithcode.com/paper/emergence-of-locomotion-behaviours-in-rich |
Repo | https://github.com/Unity-Technologies/marathon-envs |
Framework | none |
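The abstract's "simple reward function based on forward progress" amounts to rewarding velocity along the course. A minimal sketch (not the paper's exact definition; the control timestep and the availability of the torso x-position are assumptions):

```python
# Forward-progress reward: proportional to how far the torso moved along the
# track direction during one control step (assumed 10 ms here).

def forward_progress_reward(x_prev: float, x_curr: float, dt: float = 0.01) -> float:
    """Reward equal to forward velocity of the torso along the course."""
    return (x_curr - x_prev) / dt

# Example: the torso moved 2 cm forward in one 10 ms step -> reward 2.0 (m/s scale).
print(forward_progress_reward(1.00, 1.02))
```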
Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2 Applications and Future Perspectives
Title | Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2 Applications and Future Perspectives |
Authors | A. Cichocki, A-H. Phan, Q. Zhao, N. Lee, I. V. Oseledets, M. Sugiyama, D. Mandic |
Abstract | Part 2 of this monograph builds on the introduction to tensor networks and their operations presented in Part 1. It focuses on tensor network models for super-compressed higher-order representation of data/parameters and related cost functions, while providing an outline of their applications in machine learning and data analytics. A particular emphasis is on the tensor train (TT) and Hierarchical Tucker (HT) decompositions, and their physically meaningful interpretations which reflect the scalability of the tensor network approach. Through a graphical approach, we also elucidate how, by virtue of the underlying low-rank tensor approximations and sophisticated contractions of core tensors, tensor networks have the ability to perform distributed computations on otherwise prohibitively large volumes of data/parameters, thereby alleviating or even eliminating the curse of dimensionality. The usefulness of this concept is illustrated over a number of applied areas, including generalized regression and classification (support tensor machines, canonical correlation analysis, higher order partial least squares), generalized eigenvalue decomposition, Riemannian optimization, and in the optimization of deep neural networks. Part 1 and Part 2 of this work can be used either as stand-alone separate texts, or indeed as a conjoint comprehensive review of the exciting field of low-rank tensor networks and tensor decompositions. |
Tasks | Dimensionality Reduction, Tensor Networks |
Published | 2017-08-30 |
URL | http://arxiv.org/abs/1708.09165v1 |
http://arxiv.org/pdf/1708.09165v1.pdf | |
PWC | https://paperswithcode.com/paper/tensor-networks-for-dimensionality-reduction |
Repo | https://github.com/rballester/ttrecipes |
Framework | none |
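The tensor train (TT) format emphasized above can be illustrated with the standard TT-SVD construction: a dense tensor is factored into a chain of 3-way cores by sequential, truncated SVDs. A minimal NumPy sketch (not from the monograph; the rank cap and the random test tensor are assumptions):

```python
import numpy as np

def tt_svd(tensor: np.ndarray, max_rank: int = 8):
    """Factor a dense tensor into TT cores via sequential truncated SVDs."""
    dims = tensor.shape
    cores, r_prev, mat = [], 1, np.asarray(tensor)
    for n in dims[:-1]:
        mat = mat.reshape(r_prev * n, -1)
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(s))
        cores.append(u[:, :r].reshape(r_prev, n, r))   # next 3-way TT core
        mat = np.diag(s[:r]) @ vt[:r]                   # push the remainder rightwards
        r_prev = r
    cores.append(mat.reshape(r_prev, dims[-1], 1))      # last core
    return cores

def tt_reconstruct(cores):
    """Contract the TT cores back into a dense tensor."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out.squeeze(axis=(0, -1))

x = np.random.rand(4, 5, 6, 7)
print(np.allclose(tt_reconstruct(tt_svd(x, max_rank=32)), x))  # True: ranks not truncated
```

With `max_rank` small, the same routine gives the low-rank, "super-compressed" representation the abstract refers to, at the cost of an approximation error controlled by the discarded singular values.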
PAC-Bayesian Margin Bounds for Convolutional Neural Networks
Title | PAC-Bayesian Margin Bounds for Convolutional Neural Networks |
Authors | Konstantinos Pitas, Mike Davies, Pierre Vandergheynst |
Abstract | Recently the generalization error of deep neural networks has been analyzed through the PAC-Bayesian framework, for the case of fully connected layers. We adapt this approach to the convolutional setting. |
Tasks | |
Published | 2017-12-30 |
URL | http://arxiv.org/abs/1801.00171v2 |
http://arxiv.org/pdf/1801.00171v2.pdf | |
PWC | https://paperswithcode.com/paper/pac-bayesian-margin-bounds-for-convolutional |
Repo | https://github.com/konstantinos-p/PAC_Bayesian_Generalization |
Framework | tf |
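For orientation, the generic PAC-Bayesian margin bound that this line of work builds on has roughly the following shape (constants and logarithmic factors differ between formulations; the paper's contribution is in how the perturbation and KL terms are handled for convolutional layers). With probability at least 1 − δ over an i.i.d. training sample of size m, the expected 0–1 loss of the predictor f_w is bounded by its empirical margin loss at margin γ plus a complexity term:

```latex
% Generic PAC-Bayesian margin bound; Q is the posterior over weights (trained
% weights plus a perturbation), P the prior, m the sample size, \delta the
% confidence parameter. Constants vary by formulation.
L_0(f_w) \;\le\; \widehat{L}_\gamma(f_w)
  \;+\; \mathcal{O}\!\left(\sqrt{\frac{\mathrm{KL}\big(Q \,\|\, P\big) + \ln\frac{m}{\delta}}{m}}\right)
```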
Towards Neural Phrase-based Machine Translation
Title | Towards Neural Phrase-based Machine Translation |
Authors | Po-Sen Huang, Chong Wang, Sitao Huang, Dengyong Zhou, Li Deng |
Abstract | In this paper, we present Neural Phrase-based Machine Translation (NPMT). Our method explicitly models the phrase structures in output sequences using Sleep-WAke Networks (SWAN), a recently proposed segmentation-based sequence modeling method. To mitigate the monotonic alignment requirement of SWAN, we introduce a new layer to perform (soft) local reordering of input sequences. Different from existing neural machine translation (NMT) approaches, NPMT does not use attention-based decoding mechanisms. Instead, it directly outputs phrases in a sequential order and can decode in linear time. Our experiments show that NPMT achieves superior performance on the IWSLT 2014 German-English/English-German and IWSLT 2015 English-Vietnamese machine translation tasks compared with strong NMT baselines. We also observe that our method produces meaningful phrases in the output languages. |
Tasks | Machine Translation |
Published | 2017-06-17 |
URL | http://arxiv.org/abs/1706.05565v8 |
http://arxiv.org/pdf/1706.05565v8.pdf | |
PWC | https://paperswithcode.com/paper/towards-neural-phrase-based-machine |
Repo | https://github.com/Microsoft/NPMT |
Framework | torch |
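The "(soft) local reordering" layer mentioned in the abstract can be sketched as follows: each output position mixes the inputs inside a small window with learned softmax weights, so nearby tokens can be swapped "softly". This is a sketch of the idea only, not the paper's exact parameterization; the window size and per-position scoring are assumptions.

```python
import torch
import torch.nn as nn

class SoftLocalReorder(nn.Module):
    def __init__(self, dim: int, window: int = 3):
        super().__init__()
        self.window = window
        self.score = nn.Linear(dim, 2 * window + 1)   # one score per window offset

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, length, dim)
        b, t, d = x.shape
        pad = self.window
        xp = nn.functional.pad(x, (0, 0, pad, pad))        # pad along the time axis
        # gather the window around every position: (b, t, 2*window+1, d)
        windows = torch.stack([xp[:, i:i + t] for i in range(2 * pad + 1)], dim=2)
        weights = torch.softmax(self.score(x), dim=-1)     # (b, t, 2*window+1)
        return torch.einsum("btw,btwd->btd", weights, windows)

x = torch.randn(2, 7, 16)
print(SoftLocalReorder(16)(x).shape)  # torch.Size([2, 7, 16])
```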
Regret Minimization for Partially Observable Deep Reinforcement Learning
Title | Regret Minimization for Partially Observable Deep Reinforcement Learning |
Authors | Peter Jin, Kurt Keutzer, Sergey Levine |
Abstract | Deep reinforcement learning algorithms that estimate state and state-action value functions have been shown to be effective in a variety of challenging domains, including learning control strategies from raw image pixels. However, algorithms that estimate state and state-action value functions typically assume a fully observed state and must compensate for partial observations by using finite length observation histories or recurrent networks. In this work, we propose a new deep reinforcement learning algorithm based on counterfactual regret minimization that iteratively updates an approximation to an advantage-like function and is robust to partially observed state. We demonstrate that this new algorithm can substantially outperform strong baseline methods on several partially observed reinforcement learning tasks: learning first-person 3D navigation in Doom and Minecraft, and acting in the presence of partially observed objects in Doom and Pong. |
Tasks | |
Published | 2017-10-31 |
URL | http://arxiv.org/abs/1710.11424v2 |
http://arxiv.org/pdf/1710.11424v2.pdf | |
PWC | https://paperswithcode.com/paper/regret-minimization-for-partially-observable |
Repo | https://github.com/peterhj/arm-pytorch |
Framework | pytorch |
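The core step behind counterfactual-regret-minimization methods of this kind is regret matching: the policy puts probability on each action in proportion to its positive advantage-like value, falling back to uniform when nothing is positive. A minimal sketch (the advantage estimates themselves, which the paper approximates with a learned function, are assumed here):

```python
import numpy as np

def regret_matching_policy(advantages: np.ndarray) -> np.ndarray:
    """Policy proportional to the positive part of the advantage-like values."""
    positive = np.maximum(advantages, 0.0)
    total = positive.sum()
    if total <= 0.0:
        return np.full_like(advantages, 1.0 / len(advantages))  # uniform fallback
    return positive / total

print(regret_matching_policy(np.array([0.5, -0.2, 1.5])))  # [0.25 0.   0.75]
```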
Learning to Map Vehicles into Bird’s Eye View
Title | Learning to Map Vehicles into Bird’s Eye View |
Authors | Andrea Palazzi, Guido Borghi, Davide Abati, Simone Calderara, Rita Cucchiara |
Abstract | Awareness of the road scene is an essential component for both autonomous vehicles and Advanced Driver Assistance Systems, and is gaining importance for both academia and car companies. This paper presents a way to learn a semantic-aware transformation which maps detections from a dashboard camera view onto a broader bird’s eye occupancy map of the scene. To this end, a huge synthetic dataset featuring 1M couples of frames, taken from both car dashboard and bird’s eye view, has been collected and automatically annotated. A deep network is then trained to warp detections from the first to the second view. We demonstrate the effectiveness of our model against several baselines and observe that it is able to generalize on real-world data despite having been trained solely on synthetic data. |
Tasks | Autonomous Vehicles |
Published | 2017-06-26 |
URL | http://arxiv.org/abs/1706.08442v1 |
http://arxiv.org/pdf/1706.08442v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-map-vehicles-into-birds-eye-view |
Repo | https://github.com/cciprianmihai/Self_Driving_Car_NanoDegree_P2_AdvancedLaneLines |
Framework | none |
Generating and designing DNA with deep generative models
Title | Generating and designing DNA with deep generative models |
Authors | Nathan Killoran, Leo J. Lee, Andrew Delong, David Duvenaud, Brendan J. Frey |
Abstract | We propose generative neural network methods to generate DNA sequences and tune them to have desired properties. We present three approaches: creating synthetic DNA sequences using a generative adversarial network; a DNA-based variant of the activation maximization (“deep dream”) design method; and a joint procedure which combines these two approaches together. We show that these tools capture important structures of the data and, when applied to designing probes for protein binding microarrays, allow us to generate new sequences whose properties are estimated to be superior to those found in the training data. We believe that these results open the door for applying deep generative models to advance genomics research. |
Tasks | |
Published | 2017-12-17 |
URL | http://arxiv.org/abs/1712.06148v1 |
http://arxiv.org/pdf/1712.06148v1.pdf | |
PWC | https://paperswithcode.com/paper/generating-and-designing-dna-with-deep |
Repo | https://github.com/DamLabResources/journalclub |
Framework | none |
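The activation-maximization ("deep dream") design idea from the abstract can be sketched as gradient ascent on a continuous relaxation of a one-hot DNA sequence, driving up the output of a differentiable property predictor. The toy linear predictor and sequence length below are assumptions; in the paper the predictor is a trained network.

```python
import torch

torch.manual_seed(0)
seq_len, alphabet = 20, 4                            # A, C, G, T
logits = torch.zeros(seq_len, alphabet, requires_grad=True)
predictor = torch.nn.Linear(seq_len * alphabet, 1)   # stand-in for a trained property model
opt = torch.optim.Adam([logits], lr=0.1)

for _ in range(200):
    probs = torch.softmax(logits, dim=-1)            # relaxed one-hot sequence
    loss = -predictor(probs.flatten()).sum()         # ascend the predicted property
    opt.zero_grad()
    loss.backward()
    opt.step()

designed = "".join("ACGT"[i] for i in torch.softmax(logits, -1).argmax(-1).tolist())
print(designed)                                      # a 20-nt candidate sequence
```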
Equivalence of restricted Boltzmann machines and tensor network states
Title | Equivalence of restricted Boltzmann machines and tensor network states |
Authors | Jing Chen, Song Cheng, Haidong Xie, Lei Wang, Tao Xiang |
Abstract | The restricted Boltzmann machine (RBM) is one of the fundamental building blocks of deep learning. RBMs find wide application in dimensionality reduction, feature extraction, and recommender systems via modeling the probability distributions of a variety of input data, including natural images, speech signals, and customer ratings. We build a bridge between RBM and tensor network states (TNS) widely used in quantum many-body physics research. We devise efficient algorithms to translate an RBM into the commonly used TNS. Conversely, we give necessary and sufficient conditions to determine whether a TNS can be transformed into an RBM of given architectures. Revealing these general and constructive connections can cross-fertilize both deep learning and quantum many-body physics. Notably, by exploiting the entanglement entropy bound of TNS, we can rigorously quantify the expressive power of RBM on complex data sets. Insights into TNS and its entanglement capacity can guide the design of more powerful deep learning architectures. On the other hand, RBM can represent quantum many-body states with fewer parameters compared to TNS, which may allow more efficient classical simulations. |
Tasks | Recommendation Systems |
Published | 2017-01-17 |
URL | http://arxiv.org/abs/1701.04831v2 |
http://arxiv.org/pdf/1701.04831v2.pdf | |
PWC | https://paperswithcode.com/paper/equivalence-of-restricted-boltzmann-machines |
Repo | https://github.com/yzcj105/rbm2mps |
Framework | none |
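For reference, the distribution being translated is the standard binary RBM marginal. Tracing out the hidden units factorizes it into a product over hidden units, and each factor can be written as a small local tensor, which is the structure a mapping to matrix product states / TNS can exploit (the exact translation algorithm is in the paper):

```latex
% Marginal of a binary RBM with visible units v, hidden units h, biases a, b and
% weights W; the sum over h factorizes into a product over hidden units.
p(v) \;=\; \frac{1}{Z}\sum_{h\in\{0,1\}^{n_h}} e^{\,a^{\top}v + b^{\top}h + v^{\top}Wh}
      \;=\; \frac{1}{Z}\, e^{\,a^{\top}v}\prod_{j=1}^{n_h}\Big(1 + e^{\,b_j + \sum_i W_{ij} v_i}\Big)
```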
Real-time marker-less multi-person 3D pose estimation in RGB-Depth camera networks
Title | Real-time marker-less multi-person 3D pose estimation in RGB-Depth camera networks |
Authors | Marco Carraro, Matteo Munaro, Jeff Burke, Emanuele Menegatti |
Abstract | This paper proposes a novel system to estimate and track the 3D poses of multiple persons in calibrated RGB-Depth camera networks. The multi-view 3D pose of each person is computed by a central node which receives the single-view outcomes from each camera of the network. Each single-view outcome is computed by using a CNN for 2D pose estimation and extending the resulting skeletons to 3D by means of the sensor depth. The proposed system is marker-less, multi-person, independent of background and does not make any assumption on people's appearance or initial pose. The system provides real-time outcomes, thus being perfectly suited for applications requiring user interaction. Experimental results show the effectiveness of this work with respect to a baseline multi-view approach in different scenarios. To foster research and applications based on this work, we released the source code in OpenPTrack, an open source project for RGB-D people tracking. |
Tasks | 3D Pose Estimation, Pose Estimation |
Published | 2017-10-17 |
URL | http://arxiv.org/abs/1710.06235v1 |
http://arxiv.org/pdf/1710.06235v1.pdf | |
PWC | https://paperswithcode.com/paper/real-time-marker-less-multi-person-3d-pose |
Repo | https://github.com/marketto89/open_ptrack |
Framework | none |
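The depth-based lifting step described in the abstract — extending a CNN-detected 2D joint to 3D by means of the sensor depth — is essentially back-projection through the pinhole camera model. A minimal sketch (the intrinsics and the sample pixel are assumed, illustrative values):

```python
def lift_joint_to_3d(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth (metres) into camera coordinates."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return x, y, depth_m

# Example with typical VGA RGB-D intrinsics (assumed values).
print(lift_joint_to_3d(400, 300, 2.5, fx=525.0, fy=525.0, cx=319.5, cy=239.5))
```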
Online Adaptation of Convolutional Neural Networks for Video Object Segmentation
Title | Online Adaptation of Convolutional Neural Networks for Video Object Segmentation |
Authors | Paul Voigtlaender, Bastian Leibe |
Abstract | We tackle the task of semi-supervised video object segmentation, i.e. segmenting the pixels belonging to an object in the video using the ground truth pixel mask for the first frame. We build on the recently introduced one-shot video object segmentation (OSVOS) approach which uses a pretrained network and fine-tunes it on the first frame. While achieving impressive performance, at test time OSVOS uses the fine-tuned network in unchanged form and is not able to adapt to large changes in object appearance. To overcome this limitation, we propose Online Adaptive Video Object Segmentation (OnAVOS) which updates the network online using training examples selected based on the confidence of the network and the spatial configuration. Additionally, we add a pretraining step based on objectness, which is learned on PASCAL. Our experiments show that both extensions are highly effective and improve the state of the art on DAVIS to an intersection-over-union score of 85.7%. |
Tasks | Semantic Segmentation, Semi-supervised Video Object Segmentation, Video Object Segmentation, Video Semantic Segmentation, Visual Object Tracking |
Published | 2017-06-28 |
URL | http://arxiv.org/abs/1706.09364v2 |
http://arxiv.org/pdf/1706.09364v2.pdf | |
PWC | https://paperswithcode.com/paper/online-adaptation-of-convolutional-neural |
Repo | https://github.com/Stocastico/OnAVOS |
Framework | tf |
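The online-adaptation selection rule described in the abstract — picking update targets from the network's own confidence and the spatial configuration — can be sketched as: very confident pixels become positive training examples, pixels far from the currently predicted mask become negatives, and everything else is ignored during the online fine-tuning step. The thresholds and the distance rule below are assumptions standing in for the paper's exact criteria.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def select_online_targets(fg_prob, pos_thresh=0.97, neg_dist=50):
    """Return per-pixel targets: 1 = positive, 0 = negative, -1 = ignored."""
    targets = np.full(fg_prob.shape, -1, dtype=np.int8)
    targets[fg_prob > pos_thresh] = 1                      # confident foreground -> positive
    dist_to_object = distance_transform_edt(fg_prob <= 0.5)  # pixel distance to predicted mask
    targets[dist_to_object > neg_dist] = 0                 # far from the object -> negative
    return targets

fg_prob = np.random.rand(480, 854)                         # stand-in for the network's output
targets = select_online_targets(fg_prob)
print((targets == 1).sum(), (targets == 0).sum(), (targets == -1).sum())
```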
Evidence Aggregation for Answer Re-Ranking in Open-Domain Question Answering
Title | Evidence Aggregation for Answer Re-Ranking in Open-Domain Question Answering |
Authors | Shuohang Wang, Mo Yu, Jing Jiang, Wei Zhang, Xiaoxiao Guo, Shiyu Chang, Zhiguo Wang, Tim Klinger, Gerald Tesauro, Murray Campbell |
Abstract | A popular recent approach to answering open-domain questions is to first search for question-related passages and then apply reading comprehension models to extract answers. Existing methods usually extract answers from single passages independently, but some questions require combining evidence from across different sources to answer correctly. In this paper, we propose two models which make use of multiple passages to generate their answers. Both use an answer-reranking approach which reorders the answer candidates generated by an existing state-of-the-art QA model. We propose two methods, namely strength-based re-ranking and coverage-based re-ranking, to make use of the aggregated evidence from different passages to better determine the answer. Our models have achieved state-of-the-art results on three public open-domain QA datasets: Quasar-T, SearchQA and the open-domain version of TriviaQA, with about 8 percentage points of improvement on the former two datasets. |
Tasks | Open-Domain Question Answering, Question Answering, Reading Comprehension |
Published | 2017-11-14 |
URL | http://arxiv.org/abs/1711.05116v2 |
http://arxiv.org/pdf/1711.05116v2.pdf | |
PWC | https://paperswithcode.com/paper/evidence-aggregation-for-answer-re-ranking-in |
Repo | https://github.com/shuohangwang/mprc |
Framework | torch |
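The strength-based re-ranking idea from the abstract can be sketched as grouping identical answer candidates extracted from different passages, aggregating their passage-level scores, and letting candidates supported by more passages rise to the top. The candidate list and scores below are made-up illustrative values.

```python
from collections import defaultdict

def strength_rerank(candidates):
    """candidates: list of (answer_text, passage_score) pairs from a base QA model."""
    strength = defaultdict(float)
    for answer, score in candidates:
        strength[answer.strip().lower()] += score   # aggregate evidence across passages
    return sorted(strength.items(), key=lambda kv: kv[1], reverse=True)

candidates = [("Paris", 0.41), ("paris", 0.35), ("Lyon", 0.62), ("Paris", 0.30)]
print(strength_rerank(candidates))  # 'paris' ranks first: supported by three passages
```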
VAE with a VampPrior
Title | VAE with a VampPrior |
Authors | Jakub M. Tomczak, Max Welling |
Abstract | Many different methods to train deep generative models have been introduced in the past. In this paper, we propose to extend the variational auto-encoder (VAE) framework with a new type of prior which we call “Variational Mixture of Posteriors” prior, or VampPrior for short. The VampPrior consists of a mixture distribution (e.g., a mixture of Gaussians) with components given by variational posteriors conditioned on learnable pseudo-inputs. We further extend this prior to a two-layer hierarchical model and show that this architecture, with a coupled prior and posterior, learns significantly better models. The model also avoids the usual local optima issues related to useless latent dimensions that plague VAEs. We provide empirical studies on six datasets, namely static and binary MNIST, OMNIGLOT, Caltech 101 Silhouettes, Frey Faces and Histopathology patches, and show that applying the hierarchical VampPrior delivers state-of-the-art results on all datasets in the unsupervised permutation-invariant setting, and results that are the best or comparable to SOTA methods for the approach with convolutional networks. |
Tasks | Omniglot |
Published | 2017-05-19 |
URL | http://arxiv.org/abs/1705.07120v5 |
http://arxiv.org/pdf/1705.07120v5.pdf | |
PWC | https://paperswithcode.com/paper/vae-with-a-vampprior |
Repo | https://github.com/belaalb/CEVAE-VampPrior |
Framework | pytorch |
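The VampPrior density described in the abstract is a uniform mixture of the variational posteriors evaluated at K learnable pseudo-inputs: log p(z) = logsumexp_k log q(z | u_k) − log K. A minimal PyTorch sketch (the toy encoder and dimensions are assumptions; in the paper the mixture components come from the VAE's own inference network):

```python
import math
import torch
import torch.nn as nn

class VampPrior(nn.Module):
    def __init__(self, encoder, n_pseudo, input_dim):
        super().__init__()
        self.encoder = encoder                                    # the VAE's inference net
        self.pseudo_inputs = nn.Parameter(torch.randn(n_pseudo, input_dim))

    def log_prob(self, z):                                        # z: (batch, latent_dim)
        mu, logvar = self.encoder(self.pseudo_inputs)             # (K, latent_dim) each
        dist = torch.distributions.Normal(mu, (0.5 * logvar).exp())
        log_q = dist.log_prob(z.unsqueeze(1)).sum(-1)             # log q(z | u_k): (batch, K)
        return torch.logsumexp(log_q, dim=1) - math.log(self.pseudo_inputs.shape[0])

class ToyEncoder(nn.Module):                                      # assumption: toy inference net
    def __init__(self, input_dim=784, latent_dim=8):
        super().__init__()
        self.mu = nn.Linear(input_dim, latent_dim)
        self.logvar = nn.Linear(input_dim, latent_dim)
    def forward(self, x):
        return self.mu(x), self.logvar(x)

prior = VampPrior(ToyEncoder(), n_pseudo=16, input_dim=784)
print(prior.log_prob(torch.randn(5, 8)).shape)                    # torch.Size([5])
```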
Improving Facial Attribute Prediction using Semantic Segmentation
Title | Improving Facial Attribute Prediction using Semantic Segmentation |
Authors | Mahdi M. Kalayeh, Boqing Gong, Mubarak Shah |
Abstract | Attributes are semantically meaningful characteristics whose applicability widely crosses category boundaries. They are particularly important in describing and recognizing concepts for which no explicit training example is given, e.g., zero-shot learning. Additionally, since attributes are human describable, they can be used for efficient human-computer interaction. In this paper, we propose to employ semantic segmentation to improve facial attribute prediction. The core idea lies in the fact that many facial attributes describe local properties. In other words, the probability of an attribute appearing in a face image is far from uniform across the spatial domain. We build our facial attribute prediction model jointly with a deep semantic segmentation network. This harnesses the localization cues learned by the semantic segmentation to guide the attention of the attribute prediction to the regions where different attributes naturally show up. As a result of this approach, in addition to recognition, we are able to localize the attributes, despite merely having access to image-level labels (weak supervision) during training. We evaluate our proposed method on the CelebA and LFWA datasets and achieve superior results to the prior art. Furthermore, we show that in the reverse problem, semantic face parsing improves when facial attributes are available. This reaffirms the need to jointly model these two interconnected tasks. |
Tasks | Semantic Segmentation, Zero-Shot Learning |
Published | 2017-04-27 |
URL | http://arxiv.org/abs/1704.08740v1 |
http://arxiv.org/pdf/1704.08740v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-facial-attribute-prediction-using |
Repo | https://github.com/nbansal90/Facial_attribute_segmentation |
Framework | pytorch |
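The segmentation-guided attention idea from the abstract can be sketched as using face-part masks as spatial attention, so each attribute classifier pools features mainly from the regions where that attribute appears. The feature extractor, number of parts and number of attributes below are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MaskGuidedAttributeHead(nn.Module):
    def __init__(self, feat_channels: int, n_parts: int, n_attributes: int):
        super().__init__()
        self.classifier = nn.Linear(feat_channels * n_parts, n_attributes)

    def forward(self, feats: torch.Tensor, part_masks: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W); part_masks: (B, P, H, W), soft masks in [0, 1]
        weights = part_masks / (part_masks.sum(dim=(2, 3), keepdim=True) + 1e-6)
        pooled = torch.einsum("bchw,bphw->bpc", feats, weights)   # per-part average features
        return self.classifier(pooled.flatten(1))                  # attribute logits

head = MaskGuidedAttributeHead(feat_channels=64, n_parts=7, n_attributes=40)
feats = torch.randn(2, 64, 28, 28)
masks = torch.rand(2, 7, 28, 28)
print(head(feats, masks).shape)  # torch.Size([2, 40])
```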
On Identifying Disaster-Related Tweets: Matching-based or Learning-based?
Title | On Identifying Disaster-Related Tweets: Matching-based or Learning-based? |
Authors | Hien To, Sumeet Agrawal, Seon Ho Kim, Cyrus Shahabi |
Abstract | Social media such as tweets are emerging as platforms contributing to situational awareness during disasters. Information shared on Twitter by both the affected population (e.g., requesting assistance, warning) and those outside the impact zone (e.g., providing assistance) would help first responders, decision makers, and the public to understand the situation first-hand. Effective use of such information requires timely selection and analysis of tweets that are relevant to a particular disaster. Even though abundant tweets are promising as a data source, it is challenging to automatically identify relevant messages since tweets are short and unstructured, resulting in unsatisfactory classification performance of conventional learning-based approaches. Thus, we propose a simple yet effective algorithm to identify relevant messages based on matching keywords and hashtags, and provide a comparison between matching-based and learning-based approaches. To evaluate the two approaches, we put them into a framework specifically proposed for analyzing disaster-related tweets. Analysis results on eleven datasets covering various disaster types show that our technique provides relevant tweets of higher quality, and more interpretable sentiment-analysis results, than the learning-based approach. |
Tasks | Sentiment Analysis |
Published | 2017-05-04 |
URL | http://arxiv.org/abs/1705.02009v1 |
http://arxiv.org/pdf/1705.02009v1.pdf | |
PWC | https://paperswithcode.com/paper/on-identifying-disaster-related-tweets |
Repo | https://github.com/infolab-usc/bdr-tweet |
Framework | none |
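The matching-based identification described in the abstract boils down to marking a tweet as relevant if it contains any of a disaster-specific set of keywords or hashtags. A minimal sketch (the example keyword list and tweets are illustrative assumptions):

```python
import re

def is_relevant(tweet: str, keywords, hashtags) -> bool:
    """Mark a tweet relevant if it matches any disaster keyword or hashtag."""
    tokens = set(re.findall(r"#?\w+", tweet.lower()))
    return bool(tokens & keywords) or bool(tokens & hashtags)

keywords = {"flood", "evacuate", "rescue"}
hashtags = {"#harveysos", "#houstonflood"}
print(is_relevant("Need rescue near Buffalo Bayou #HarveySOS", keywords, hashtags))  # True
print(is_relevant("Great day at the beach!", keywords, hashtags))                    # False
```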
Temporal Segment Networks for Action Recognition in Videos
Title | Temporal Segment Networks for Action Recognition in Videos |
Authors | Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc Van Gool |
Abstract | Deep convolutional networks have achieved great success for image recognition. However, for action recognition in videos, their advantage over traditional methods is not so evident. We present a general and flexible video-level framework for learning action models in videos. This method, called temporal segment network (TSN), aims to model long-range temporal structures with a new segment-based sampling and aggregation module. This unique design enables our TSN to efficiently learn action models using the whole action videos. The learned models can be easily adapted for action recognition in both trimmed and untrimmed videos with simple average pooling and multi-scale temporal window integration, respectively. We also study a series of good practices for the instantiation of the TSN framework given limited training samples. Our approach obtains state-of-the-art performance on four challenging action recognition benchmarks: HMDB51 (71.0%), UCF101 (94.9%), THUMOS14 (80.1%), and ActivityNet v1.2 (89.6%). Using the proposed RGB difference for motion models, our method can still achieve competitive accuracy on UCF101 (91.0%) while running at 340 FPS. Furthermore, based on temporal segment networks, we won the video classification track at the ActivityNet Challenge 2016 among 24 teams, which demonstrates the effectiveness of TSN and the proposed good practices. |
Tasks | Action Classification, Action Recognition In Videos, Temporal Action Localization, Video Classification |
Published | 2017-05-08 |
URL | http://arxiv.org/abs/1705.02953v1 |
http://arxiv.org/pdf/1705.02953v1.pdf | |
PWC | https://paperswithcode.com/paper/temporal-segment-networks-for-action |
Repo | https://github.com/open-mmlab/mmaction |
Framework | pytorch |
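The segment-based sampling and aggregation described in the abstract can be sketched as: split the video into K equal segments, sample one snippet from each, score every snippet with a shared backbone, and fuse the snippet scores with a segmental consensus (here a simple average). The random "backbone" scores below are a stand-in assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_snippet_indices(num_frames, num_segments=3):
    """Split the video into equal segments and draw one snippet index from each."""
    edges = np.linspace(0, num_frames, num_segments + 1, dtype=int)
    return [int(rng.integers(edges[k], edges[k + 1])) for k in range(num_segments)]

def tsn_video_scores(snippet_scores):
    """snippet_scores: (num_segments, num_classes) -> video-level class scores."""
    return snippet_scores.mean(axis=0)            # segmental consensus = average pooling

indices = sample_snippet_indices(num_frames=300, num_segments=3)
snippet_scores = rng.random((3, 101))             # e.g. 101 classes, as in UCF101
print(indices, tsn_video_scores(snippet_scores).shape)  # three snippet indices, (101,) scores
```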