Paper Group AWR 121
Emergence of Locomotion Behaviours in Rich Environments. Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2 Applications and Future Perspectives. PAC-Bayesian Margin Bounds for Convolutional Neural Networks. Towards Neural Phrase-based Machine Translation. Regret Minimization for Partially Observable Deep Reinforceme …
Emergence of Locomotion Behaviours in Rich Environments
Title | Emergence of Locomotion Behaviours in Rich Environments |
Authors | Nicolas Heess, Dhruva TB, Srinivasan Sriram, Jay Lemmon, Josh Merel, Greg Wayne, Yuval Tassa, Tom Erez, Ziyu Wang, S. M. Ali Eslami, Martin Riedmiller, David Silver |
Abstract | The reinforcement learning paradigm allows, in principle, for complex behaviours to be learned directly from simple reward signals. In practice, however, it is common to carefully hand-design the reward function to encourage a particular solution, or to derive it from demonstration data. In this paper we explore how a rich environment can help to promote the learning of complex behavior. Specifically, we train agents in diverse environmental contexts, and find that this encourages the emergence of robust behaviours that perform well across a suite of tasks. We demonstrate this principle for locomotion – behaviours that are known for their sensitivity to the choice of reward. We train several simulated bodies on a diverse set of challenging terrains and obstacles, using a simple reward function based on forward progress. Using a novel scalable variant of policy gradient reinforcement learning, our agents learn to run, jump, crouch and turn as required by the environment without explicit reward-based guidance. A visual depiction of highlights of the learned behavior can be viewed at https://youtu.be/hx_bgoTF7bs . |
Tasks | |
Published | 2017-07-07 |
URL | http://arxiv.org/abs/1707.02286v2 |
http://arxiv.org/pdf/1707.02286v2.pdf | |
PWC | https://paperswithcode.com/paper/emergence-of-locomotion-behaviours-in-rich |
Repo | https://github.com/Unity-Technologies/marathon-envs |
Framework | none |
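The abstract's "simple reward function based on forward progress" amounts to rewarding velocity along the course. A minimal sketch (not the paper's exact definition; the control timestep and the availability of the torso x-position are assumptions):

```python
# Forward-progress reward: proportional to how far the torso moved along the
# track direction during one control step (assumed 10 ms here).

def forward_progress_reward(x_prev: float, x_curr: float, dt: float = 0.01) -> float:
    """Reward equal to forward velocity of the torso along the course."""
    return (x_curr - x_prev) / dt

# Example: the torso moved 2 cm forward in one 10 ms step -> reward 2.0 (m/s scale).
print(forward_progress_reward(1.00, 1.02))
```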
Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2 Applications and Future Perspectives
Title | Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2 Applications and Future Perspectives |
Authors | A. Cichocki, A-H. Phan, Q. Zhao, N. Lee, I. V. Oseledets, M. Sugiyama, D. Mandic |
Abstract | Part 2 of this monograph builds on the introduction to tensor networks and their operations presented in Part 1. It focuses on tensor network models for super-compressed higher-order representation of data/parameters and related cost functions, while providing an outline of their applications in machine learning and data analytics. A particular emphasis is on the tensor train (TT) and Hierarchical Tucker (HT) decompositions, and their physically meaningful interpretations which reflect the scalability of the tensor network approach. Through a graphical approach, we also elucidate how, by virtue of the underlying low-rank tensor approximations and sophisticated contractions of core tensors, tensor networks have the ability to perform distributed computations on otherwise prohibitively large volumes of data/parameters, thereby alleviating or even eliminating the curse of dimensionality. The usefulness of this concept is illustrated over a number of applied areas, including generalized regression and classification (support tensor machines, canonical correlation analysis, higher order partial least squares), generalized eigenvalue decomposition, Riemannian optimization, and in the optimization of deep neural networks. Part 1 and Part 2 of this work can be used either as stand-alone separate texts, or indeed as a conjoint comprehensive review of the exciting field of low-rank tensor networks and tensor decompositions. |
Tasks | Dimensionality Reduction, Tensor Networks |
Published | 2017-08-30 |
URL | http://arxiv.org/abs/1708.09165v1 |
http://arxiv.org/pdf/1708.09165v1.pdf | |
PWC | https://paperswithcode.com/paper/tensor-networks-for-dimensionality-reduction |
Repo | https://github.com/rballester/ttrecipes |
Framework | none |
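The tensor train (TT) format emphasized above can be illustrated with the standard TT-SVD construction: a dense tensor is factored into a chain of 3-way cores by sequential, truncated SVDs. A minimal NumPy sketch (not from the monograph; the rank cap and the random test tensor are assumptions):

```python
import numpy as np

def tt_svd(tensor: np.ndarray, max_rank: int = 8):
    """Factor a dense tensor into TT cores via sequential truncated SVDs."""
    dims = tensor.shape
    cores, r_prev, mat = [], 1, np.asarray(tensor)
    for n in dims[:-1]:
        mat = mat.reshape(r_prev * n, -1)
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(s))
        cores.append(u[:, :r].reshape(r_prev, n, r))   # next 3-way TT core
        mat = np.diag(s[:r]) @ vt[:r]                   # push the remainder rightwards
        r_prev = r
    cores.append(mat.reshape(r_prev, dims[-1], 1))      # last core
    return cores

def tt_reconstruct(cores):
    """Contract the TT cores back into a dense tensor."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out.squeeze(axis=(0, -1))

x = np.random.rand(4, 5, 6, 7)
print(np.allclose(tt_reconstruct(tt_svd(x, max_rank=32)), x))  # True: ranks not truncated
```

With `max_rank` small, the same routine gives the low-rank, "super-compressed" representation the abstract refers to, at the cost of an approximation error controlled by the discarded singular values.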
PAC-Bayesian Margin Bounds for Convolutional Neural Networks
Title | PAC-Bayesian Margin Bounds for Convolutional Neural Networks |
Authors | Konstantinos Pitas, Mike Davies, Pierre Vandergheynst |
Abstract | Recently the generalization error of deep neural networks has been analyzed through the PAC-Bayesian framework, for the case of fully connected layers. We adapt this approach to the convolutional setting. |
Tasks | |
Published | 2017-12-30 |
URL | http://arxiv.org/abs/1801.00171v2 |
http://arxiv.org/pdf/1801.00171v2.pdf | |
PWC | https://paperswithcode.com/paper/pac-bayesian-margin-bounds-for-convolutional |
Repo | https://github.com/konstantinos-p/PAC_Bayesian_Generalization |
Framework | tf |
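For orientation, the generic PAC-Bayesian margin bound that this line of work builds on has roughly the following shape (constants and logarithmic factors differ between formulations; the paper's contribution is in how the perturbation and KL terms are handled for convolutional layers). With probability at least 1 − δ over an i.i.d. training sample of size m, the expected 0–1 loss of the predictor f_w is bounded by its empirical margin loss at margin γ plus a complexity term:

```latex
% Generic PAC-Bayesian margin bound; Q is the posterior over weights (trained
% weights plus a perturbation), P the prior, m the sample size, \delta the
% confidence parameter. Constants vary by formulation.
L_0(f_w) \;\le\; \widehat{L}_\gamma(f_w)
  \;+\; \mathcal{O}\!\left(\sqrt{\frac{\mathrm{KL}\big(Q \,\|\, P\big) + \ln\frac{m}{\delta}}{m}}\right)
```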
Towards Neural Phrase-based Machine Translation
Title | Towards Neural Phrase-based Machine Translation |
Authors | Po-Sen Huang, Chong Wang, Sitao Huang, Dengyong Zhou, Li Deng |
Abstract | In this paper, we present Neural Phrase-based Machine Translation (NPMT). Our method explicitly models the phrase structures in output sequences using Sleep-WAke Networks (SWAN), a recently proposed segmentation-based sequence modeling method. To mitigate the monotonic alignment requirement of SWAN, we introduce a new layer to perform (soft) local reordering of input sequences. Different from existing neural machine translation (NMT) approaches, NPMT does not use attention-based decoding mechanisms. Instead, it directly outputs phrases in a sequential order and can decode in linear time. Our experiments show that NPMT achieves superior performance on the IWSLT 2014 German-English/English-German and IWSLT 2015 English-Vietnamese machine translation tasks compared with strong NMT baselines. We also observe that our method produces meaningful phrases in the output languages. |
Tasks | Machine Translation |
Published | 2017-06-17 |
URL | http://arxiv.org/abs/1706.05565v8 |
http://arxiv.org/pdf/1706.05565v8.pdf | |
PWC | https://paperswithcode.com/paper/towards-neural-phrase-based-machine |
Repo | https://github.com/Microsoft/NPMT |
Framework | torch |
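The "(soft) local reordering" layer mentioned in the abstract can be sketched as follows: each output position mixes the inputs inside a small window with learned softmax weights, so nearby tokens can be swapped "softly". This is a sketch of the idea only, not the paper's exact parameterization; the window size and per-position scoring are assumptions.

```python
import torch
import torch.nn as nn

class SoftLocalReorder(nn.Module):
    def __init__(self, dim: int, window: int = 3):
        super().__init__()
        self.window = window
        self.score = nn.Linear(dim, 2 * window + 1)   # one score per window offset

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, length, dim)
        b, t, d = x.shape
        pad = self.window
        xp = nn.functional.pad(x, (0, 0, pad, pad))        # pad along the time axis
        # gather the window around every position: (b, t, 2*window+1, d)
        windows = torch.stack([xp[:, i:i + t] for i in range(2 * pad + 1)], dim=2)
        weights = torch.softmax(self.score(x), dim=-1)     # (b, t, 2*window+1)
        return torch.einsum("btw,btwd->btd", weights, windows)

x = torch.randn(2, 7, 16)
print(SoftLocalReorder(16)(x).shape)  # torch.Size([2, 7, 16])
```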
Regret Minimization for Partially Observable Deep Reinforcement Learning
Title | Regret Minimization for Partially Observable Deep Reinforcement Learning |
Authors | Peter Jin, Kurt Keutzer, Sergey Levine |
Abstract | Deep reinforcement learning algorithms that estimate state and state-action value functions have been shown to be effective in a variety of challenging domains, including learning control strategies from raw image pixels. However, algorithms that estimate state and state-action value functions typically assume a fully observed state and must compensate for partial observations by using finite length observation histories or recurrent networks. In this work, we propose a new deep reinforcement learning algorithm based on counterfactual regret minimization that iteratively updates an approximation to an advantage-like function and is robust to partially observed state. We demonstrate that this new algorithm can substantially outperform strong baseline methods on several partially observed reinforcement learning tasks: learning first-person 3D navigation in Doom and Minecraft, and acting in the presence of partially observed objects in Doom and Pong. |
Tasks | |
Published | 2017-10-31 |
URL | http://arxiv.org/abs/1710.11424v2 |
http://arxiv.org/pdf/1710.11424v2.pdf | |
PWC | https://paperswithcode.com/paper/regret-minimization-for-partially-observable |
Repo | https://github.com/peterhj/arm-pytorch |
Framework | pytorch |
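The core step behind counterfactual-regret-minimization methods of this kind is regret matching: the policy puts probability on each action in proportion to its positive advantage-like value, falling back to uniform when nothing is positive. A minimal sketch (the advantage estimates themselves, which the paper approximates with a learned function, are assumed here):

```python
import numpy as np

def regret_matching_policy(advantages: np.ndarray) -> np.ndarray:
    """Policy proportional to the positive part of the advantage-like values."""
    positive = np.maximum(advantages, 0.0)
    total = positive.sum()
    if total <= 0.0:
        return np.full_like(advantages, 1.0 / len(advantages))  # uniform fallback
    return positive / total

print(regret_matching_policy(np.array([0.5, -0.2, 1.5])))  # [0.25 0.   0.75]
```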
Learning to Map Vehicles into Bird’s Eye View
Title | Learning to Map Vehicles into Bird’s Eye View |
Authors | Andrea Palazzi, Guido Borghi, Davide Abati, Simone Calderara, Rita Cucchiara |
Abstract | Awareness of the road scene is an essential component for both autonomous vehicles and Advanced Driver Assistance Systems, and is gaining importance for both academia and car companies. This paper presents a way to learn a semantic-aware transformation which maps detections from a dashboard camera view onto a broader bird’s eye occupancy map of the scene. To this end, a huge synthetic dataset featuring 1M couples of frames, taken from both car dashboard and bird’s eye view, has been collected and automatically annotated. A deep network is then trained to warp detections from the first to the second view. We demonstrate the effectiveness of our model against several baselines and observe that it is able to generalize on real-world data despite having been trained solely on synthetic data. |
Tasks | Autonomous Vehicles |
Published | 2017-06-26 |
URL | http://arxiv.org/abs/1706.08442v1 |
http://arxiv.org/pdf/1706.08442v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-map-vehicles-into-birds-eye-view |
Repo | https://github.com/cciprianmihai/Self_Driving_Car_NanoDegree_P2_AdvancedLaneLines |
Framework | none |
Generating and designing DNA with deep generative models
Title | Generating and designing DNA with deep generative models |
Authors | Nathan Killoran, Leo J. Lee, Andrew Delong, David Duvenaud, Brendan J. Frey |
Abstract | We propose generative neural network methods to generate DNA sequences and tune them to have desired properties. We present three approaches: creating synthetic DNA sequences using a generative adversarial network; a DNA-based variant of the activation maximization (“deep dream”) design method; and a joint procedure which combines these two approaches together. We show that these tools capture important structures of the data and, when applied to designing probes for protein binding microarrays, allow us to generate new sequences whose properties are estimated to be superior to those found in the training data. We believe that these results open the door for applying deep generative models to advance genomics research. |
Tasks | |
Published | 2017-12-17 |
URL | http://arxiv.org/abs/1712.06148v1 |
http://arxiv.org/pdf/1712.06148v1.pdf | |
PWC | https://paperswithcode.com/paper/generating-and-designing-dna-with-deep |
Repo | https://github.com/DamLabResources/journalclub |
Framework | none |
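The activation-maximization ("deep dream") design idea from the abstract can be sketched as gradient ascent on a continuous relaxation of a one-hot DNA sequence, driving up the output of a differentiable property predictor. The toy linear predictor and sequence length below are assumptions; in the paper the predictor is a trained network.

```python
import torch

torch.manual_seed(0)
seq_len, alphabet = 20, 4                            # A, C, G, T
logits = torch.zeros(seq_len, alphabet, requires_grad=True)
predictor = torch.nn.Linear(seq_len * alphabet, 1)   # stand-in for a trained property model
opt = torch.optim.Adam([logits], lr=0.1)

for _ in range(200):
    probs = torch.softmax(logits, dim=-1)            # relaxed one-hot sequence
    loss = -predictor(probs.flatten()).sum()         # ascend the predicted property
    opt.zero_grad()
    loss.backward()
    opt.step()

designed = "".join("ACGT"[i] for i in torch.softmax(logits, -1).argmax(-1).tolist())
print(designed)                                      # a 20-nt candidate sequence
```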
Equivalence of restricted Boltzmann machines and tensor network states
Title | Equivalence of restricted Boltzmann machines and tensor network states |
Authors | Jing Chen, Song Cheng, Haidong Xie, Lei Wang, Tao Xiang |
Abstract | The restricted Boltzmann machine (RBM) is one of the fundamental building blocks of deep learning. RBMs find wide application in dimensionality reduction, feature extraction, and recommender systems via modeling the probability distributions of a variety of input data, including natural images, speech signals, and customer ratings. We build a bridge between RBM and tensor network states (TNS) widely used in quantum many-body physics research. We devise efficient algorithms to translate an RBM into the commonly used TNS. Conversely, we give necessary and sufficient conditions to determine whether a TNS can be transformed into an RBM of given architectures. Revealing these general and constructive connections can cross-fertilize both deep learning and quantum many-body physics. Notably, by exploiting the entanglement entropy bound of TNS, we can rigorously quantify the expressive power of RBM on complex data sets. Insights into TNS and its entanglement capacity can guide the design of more powerful deep learning architectures. On the other hand, RBM can represent quantum many-body states with fewer parameters compared to TNS, which may allow more efficient classical simulations. |
Tasks | Recommendation Systems |
Published | 2017-01-17 |
URL | http://arxiv.org/abs/1701.04831v2 |
http://arxiv.org/pdf/1701.04831v2.pdf | |
PWC | https://paperswithcode.com/paper/equivalence-of-restricted-boltzmann-machines |
Repo | https://github.com/yzcj105/rbm2mps |
Framework | none |
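For reference, the distribution being translated is the standard binary RBM marginal. Tracing out the hidden units factorizes it into a product over hidden units, and each factor can be written as a small local tensor, which is the structure a mapping to matrix product states / TNS can exploit (the exact translation algorithm is in the paper):

```latex
% Marginal of a binary RBM with visible units v, hidden units h, biases a, b and
% weights W; the sum over h factorizes into a product over hidden units.
p(v) \;=\; \frac{1}{Z}\sum_{h\in\{0,1\}^{n_h}} e^{\,a^{\top}v + b^{\top}h + v^{\top}Wh}
      \;=\; \frac{1}{Z}\, e^{\,a^{\top}v}\prod_{j=1}^{n_h}\Big(1 + e^{\,b_j + \sum_i W_{ij} v_i}\Big)
```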
Real-time marker-less multi-person 3D pose estimation in RGB-Depth camera networks
Title | Real-time marker-less multi-person 3D pose estimation in RGB-Depth camera networks |
Authors | Marco Carraro, Matteo Munaro, Jeff Burke, Emanuele Menegatti |
Abstract | This paper proposes a novel system to estimate and track the 3D poses of multiple persons in calibrated RGB-Depth camera networks. The multi-view 3D pose of each person is computed by a central node which receives the single-view outcomes from each camera of the network. Each single-view outcome is computed by using a CNN for 2D pose estimation and extending the resulting skeletons to 3D by means of the sensor depth. The proposed system is marker-less, multi-person, independent of background and does not make any assumption on people's appearance or initial pose. The system provides real-time outcomes, thus being perfectly suited for applications requiring user interaction. Experimental results show the effectiveness of this work with respect to a baseline multi-view approach in different scenarios. To foster research and applications based on this work, we released the source code in OpenPTrack, an open source project for RGB-D people tracking. |
Tasks | 3D Pose Estimation, Pose Estimation |
Published | 2017-10-17 |
URL | http://arxiv.org/abs/1710.06235v1 |
http://arxiv.org/pdf/1710.06235v1.pdf | |
PWC | https://paperswithcode.com/paper/real-time-marker-less-multi-person-3d-pose |
Repo | https://github.com/marketto89/open_ptrack |
Framework | none |
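The depth-based lifting step described in the abstract — extending a CNN-detected 2D joint to 3D by means of the sensor depth — is essentially back-projection through the pinhole camera model. A minimal sketch (the intrinsics and the sample pixel are assumed, illustrative values):

```python
def lift_joint_to_3d(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth (metres) into camera coordinates."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return x, y, depth_m

# Example with typical VGA RGB-D intrinsics (assumed values).
print(lift_joint_to_3d(400, 300, 2.5, fx=525.0, fy=525.0, cx=319.5, cy=239.5))
```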
Online Adaptation of Convolutional Neural Networks for Video Object Segmentation
Title | Online Adaptation of Convolutional Neural Networks for Video Object Segmentation |
Authors | Paul Voigtlaender, Bastian Leibe |
Abstract | We tackle the task of semi-supervised video object segmentation, i.e. segmenting the pixels belonging to an object in the video using the ground truth pixel mask for the first frame. We build on the recently introduced one-shot video object segmentation (OSVOS) approach which uses a pretrained network and fine-tunes it on the first frame. While achieving impressive performance, at test time OSVOS uses the fine-tuned network in unchanged form and is not able to adapt to large changes in object appearance. To overcome this limitation, we propose Online Adaptive Video Object Segmentation (OnAVOS) which updates the network online using training examples selected based on the confidence of the network and the spatial configuration. Additionally, we add a pretraining step based on objectness, which is learned on PASCAL. Our experiments show that both extensions are highly effective and improve the state of the art on DAVIS to an intersection-over-union score of 85.7%. |
Tasks | Semantic Segmentation, Semi-supervised Video Object Segmentation, Video Object Segmentation, Video Semantic Segmentation, Visual Object Tracking |
Published | 2017-06-28 |
URL | http://arxiv.org/abs/1706.09364v2 |
http://arxiv.org/pdf/1706.09364v2.pdf | |
PWC | https://paperswithcode.com/paper/online-adaptation-of-convolutional-neural |
Repo | https://github.com/Stocastico/OnAVOS |
Framework | tf |
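The online-adaptation selection rule described in the abstract — picking update targets from the network's own confidence and the spatial configuration — can be sketched as: very confident pixels become positive training examples, pixels far from the currently predicted mask become negatives, and everything else is ignored during the online fine-tuning step. The thresholds and the distance rule below are assumptions standing in for the paper's exact criteria.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def select_online_targets(fg_prob, pos_thresh=0.97, neg_dist=50):
    """Return per-pixel targets: 1 = positive, 0 = negative, -1 = ignored."""
    targets = np.full(fg_prob.shape, -1, dtype=np.int8)
    targets[fg_prob > pos_thresh] = 1                      # confident foreground -> positive
    dist_to_object = distance_transform_edt(fg_prob <= 0.5)  # pixel distance to predicted mask
    targets[dist_to_object > neg_dist] = 0                 # far from the object -> negative
    return targets

fg_prob = np.random.rand(480, 854)                         # stand-in for the network's output
targets = select_online_targets(fg_prob)
print((targets == 1).sum(), (targets == 0).sum(), (targets == -1).sum())
```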
Evidence Aggregation for Answer Re-Ranking in Open-Domain Question Answering
Title | Evidence Aggregation for Answer Re-Ranking in Open-Domain Question Answering |
Authors | Shuohang Wang, Mo Yu, Jing Jiang, Wei Zhang, Xiaoxiao Guo, Shiyu Chang, Zhiguo Wang, Tim Klinger, Gerald Tesauro, Murray Campbell |
Abstract | A popular recent approach to answering open-domain questions is to first search for question-related passages and then apply reading comprehension models to extract answers. Existing methods usually extract answers from single passages independently, but some questions require combining evidence from across different sources to answer correctly. In this paper, we propose two models which make use of multiple passages to generate their answers. Both use an answer-reranking approach which reorders the answer candidates generated by an existing state-of-the-art QA model. We propose two methods, namely strength-based re-ranking and coverage-based re-ranking, to make use of the aggregated evidence from different passages to better determine the answer. Our models have achieved state-of-the-art results on three public open-domain QA datasets: Quasar-T, SearchQA and the open-domain version of TriviaQA, with about 8 percentage points of improvement on the former two datasets. |
Tasks | Open-Domain Question Answering, Question Answering, Reading Comprehension |
Published | 2017-11-14 |
URL | http://arxiv.org/abs/1711.05116v2 |
http://arxiv.org/pdf/1711.05116v2.pdf | |
PWC | https://paperswithcode.com/paper/evidence-aggregation-for-answer-re-ranking-in |
Repo | https://github.com/shuohangwang/mprc |
Framework | torch |
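The strength-based re-ranking idea from the abstract can be sketched as grouping identical answer candidates extracted from different passages, aggregating their passage-level scores, and letting candidates supported by more passages rise to the top. The candidate list and scores below are made-up illustrative values.

```python
from collections import defaultdict

def strength_rerank(candidates):
    """candidates: list of (answer_text, passage_score) pairs from a base QA model."""
    strength = defaultdict(float)
    for answer, score in candidates:
        strength[answer.strip().lower()] += score   # aggregate evidence across passages
    return sorted(strength.items(), key=lambda kv: kv[1], reverse=True)

candidates = [("Paris", 0.41), ("paris", 0.35), ("Lyon", 0.62), ("Paris", 0.30)]
print(strength_rerank(candidates))  # 'paris' ranks first: supported by three passages
```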
VAE with a VampPrior
Title | VAE with a VampPrior |
Authors | Jakub M. Tomczak, Max Welling |
Abstract | Many different methods to train deep generative models have been introduced in the past. In this paper, we propose to extend the variational auto-encoder (VAE) framework with a new type of prior which we call “Variational Mixture of Posteriors” prior, or VampPrior for short. The VampPrior consists of a mixture distribution (e.g., a mixture of Gaussians) with components given by variational posteriors conditioned on learnable pseudo-inputs. We further extend this prior to a two-layer hierarchical model and show that this architecture, with a coupled prior and posterior, learns significantly better models. The model also avoids the usual local optima issues related to useless latent dimensions that plague VAEs. We provide empirical studies on six datasets, namely static and binary MNIST, OMNIGLOT, Caltech 101 Silhouettes, Frey Faces and Histopathology patches, and show that applying the hierarchical VampPrior delivers state-of-the-art results on all datasets in the unsupervised permutation-invariant setting, and results that are the best or comparable to SOTA methods for the approach with convolutional networks. |
Tasks | Omniglot |
Published | 2017-05-19 |
URL | http://arxiv.org/abs/1705.07120v5 |
http://arxiv.org/pdf/1705.07120v5.pdf | |
PWC | https://paperswithcode.com/paper/vae-with-a-vampprior |
Repo | https://github.com/belaalb/CEVAE-VampPrior |
Framework | pytorch |
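The VampPrior density described in the abstract is a uniform mixture of the variational posteriors evaluated at K learnable pseudo-inputs: log p(z) = logsumexp_k log q(z | u_k) − log K. A minimal PyTorch sketch (the toy encoder and dimensions are assumptions; in the paper the mixture components come from the VAE's own inference network):

```python
import math
import torch
import torch.nn as nn

class VampPrior(nn.Module):
    def __init__(self, encoder, n_pseudo, input_dim):
        super().__init__()
        self.encoder = encoder                                    # the VAE's inference net
        self.pseudo_inputs = nn.Parameter(torch.randn(n_pseudo, input_dim))

    def log_prob(self, z):                                        # z: (batch, latent_dim)
        mu, logvar = self.encoder(self.pseudo_inputs)             # (K, latent_dim) each
        dist = torch.distributions.Normal(mu, (0.5 * logvar).exp())
        log_q = dist.log_prob(z.unsqueeze(1)).sum(-1)             # log q(z | u_k): (batch, K)
        return torch.logsumexp(log_q, dim=1) - math.log(self.pseudo_inputs.shape[0])

class ToyEncoder(nn.Module):                                      # assumption: toy inference net
    def __init__(self, input_dim=784, latent_dim=8):
        super().__init__()
        self.mu = nn.Linear(input_dim, latent_dim)
        self.logvar = nn.Linear(input_dim, latent_dim)
    def forward(self, x):
        return self.mu(x), self.logvar(x)

prior = VampPrior(ToyEncoder(), n_pseudo=16, input_dim=784)
print(prior.log_prob(torch.randn(5, 8)).shape)                    # torch.Size([5])
```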
Improving Facial Attribute Prediction using Semantic Segmentation
Title | Improving Facial Attribute Prediction using Semantic Segmentation |
Authors | Mahdi M. Kalayeh, Boqing Gong, Mubarak Shah |
Abstract | Attributes are semantically meaningful characteristics whose applicability widely crosses category boundaries. They are particularly important in describing and recognizing concepts for which no explicit training example is given, e.g., zero-shot learning. Additionally, since attributes are human describable, they can be used for efficient human-computer interaction. In this paper, we propose to employ semantic segmentation to improve facial attribute prediction. The core idea lies in the fact that many facial attributes describe local properties. In other words, the probability of an attribute appearing in a face image is far from uniform across the spatial domain. We build our facial attribute prediction model jointly with a deep semantic segmentation network. This harnesses the localization cues learned by the semantic segmentation to guide the attention of the attribute prediction to the regions where different attributes naturally show up. As a result of this approach, in addition to recognition, we are able to localize the attributes, despite merely having access to image-level labels (weak supervision) during training. We evaluate our proposed method on the CelebA and LFWA datasets and achieve superior results to the prior art. Furthermore, we show that in the reverse problem, semantic face parsing improves when facial attributes are available. This reaffirms the need to jointly model these two interconnected tasks. |
Tasks | Semantic Segmentation, Zero-Shot Learning |
Published | 2017-04-27 |
URL | http://arxiv.org/abs/1704.08740v1 |
http://arxiv.org/pdf/1704.08740v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-facial-attribute-prediction-using |
Repo | https://github.com/nbansal90/Facial_attribute_segmentation |
Framework | pytorch |
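The segmentation-guided attention idea from the abstract can be sketched as using face-part masks as spatial attention, so each attribute classifier pools features mainly from the regions where that attribute appears. The feature extractor, number of parts and number of attributes below are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MaskGuidedAttributeHead(nn.Module):
    def __init__(self, feat_channels: int, n_parts: int, n_attributes: int):
        super().__init__()
        self.classifier = nn.Linear(feat_channels * n_parts, n_attributes)

    def forward(self, feats: torch.Tensor, part_masks: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W); part_masks: (B, P, H, W), soft masks in [0, 1]
        weights = part_masks / (part_masks.sum(dim=(2, 3), keepdim=True) + 1e-6)
        pooled = torch.einsum("bchw,bphw->bpc", feats, weights)   # per-part average features
        return self.classifier(pooled.flatten(1))                  # attribute logits

head = MaskGuidedAttributeHead(feat_channels=64, n_parts=7, n_attributes=40)
feats = torch.randn(2, 64, 28, 28)
masks = torch.rand(2, 7, 28, 28)
print(head(feats, masks).shape)  # torch.Size([2, 40])
```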
On Identifying Disaster-Related Tweets: Matching-based or Learning-based?
Title | On Identifying Disaster-Related Tweets: Matching-based or Learning-based? |
Authors | Hien To, Sumeet Agrawal, Seon Ho Kim, Cyrus Shahabi |
Abstract | Social media such as tweets are emerging as platforms contributing to situational awareness during disasters. Information shared on Twitter by both the affected population (e.g., requesting assistance, warning) and those outside the impact zone (e.g., providing assistance) would help first responders, decision makers, and the public to understand the situation first-hand. Effective use of such information requires timely selection and analysis of tweets that are relevant to a particular disaster. Even though abundant tweets are promising as a data source, it is challenging to automatically identify relevant messages since tweets are short and unstructured, resulting in unsatisfactory classification performance of conventional learning-based approaches. Thus, we propose a simple yet effective algorithm to identify relevant messages based on matching keywords and hashtags, and provide a comparison between matching-based and learning-based approaches. To evaluate the two approaches, we put them into a framework specifically proposed for analyzing disaster-related tweets. Analysis results on eleven datasets covering various disaster types show that our technique provides relevant tweets of higher quality, and more interpretable sentiment-analysis results, than the learning-based approach. |
Tasks | Sentiment Analysis |
Published | 2017-05-04 |
URL | http://arxiv.org/abs/1705.02009v1 |
http://arxiv.org/pdf/1705.02009v1.pdf | |
PWC | https://paperswithcode.com/paper/on-identifying-disaster-related-tweets |
Repo | https://github.com/infolab-usc/bdr-tweet |
Framework | none |
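The matching-based identification described in the abstract boils down to marking a tweet as relevant if it contains any of a disaster-specific set of keywords or hashtags. A minimal sketch (the example keyword list and tweets are illustrative assumptions):

```python
import re

def is_relevant(tweet: str, keywords, hashtags) -> bool:
    """Mark a tweet relevant if it matches any disaster keyword or hashtag."""
    tokens = set(re.findall(r"#?\w+", tweet.lower()))
    return bool(tokens & keywords) or bool(tokens & hashtags)

keywords = {"flood", "evacuate", "rescue"}
hashtags = {"#harveysos", "#houstonflood"}
print(is_relevant("Need rescue near Buffalo Bayou #HarveySOS", keywords, hashtags))  # True
print(is_relevant("Great day at the beach!", keywords, hashtags))                    # False
```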
Temporal Segment Networks for Action Recognition in Videos
Title | Temporal Segment Networks for Action Recognition in Videos |
Authors | Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc Van Gool |
Abstract | Deep convolutional networks have achieved great success for image recognition. However, for action recognition in videos, their advantage over traditional methods is not so evident. We present a general and flexible video-level framework for learning action models in videos. This method, called temporal segment network (TSN), aims to model long-range temporal structures with a new segment-based sampling and aggregation module. This unique design enables our TSN to efficiently learn action models using the whole action videos. The learned models can be easily adapted for action recognition in both trimmed and untrimmed videos with simple average pooling and multi-scale temporal window integration, respectively. We also study a series of good practices for the instantiation of the TSN framework given limited training samples. Our approach obtains state-of-the-art performance on four challenging action recognition benchmarks: HMDB51 (71.0%), UCF101 (94.9%), THUMOS14 (80.1%), and ActivityNet v1.2 (89.6%). Using the proposed RGB difference for motion models, our method can still achieve competitive accuracy on UCF101 (91.0%) while running at 340 FPS. Furthermore, based on temporal segment networks, we won the video classification track at the ActivityNet Challenge 2016 among 24 teams, which demonstrates the effectiveness of TSN and the proposed good practices. |
Tasks | Action Classification, Action Recognition In Videos, Temporal Action Localization, Video Classification |
Published | 2017-05-08 |
URL | http://arxiv.org/abs/1705.02953v1 |
http://arxiv.org/pdf/1705.02953v1.pdf | |
PWC | https://paperswithcode.com/paper/temporal-segment-networks-for-action |
Repo | https://github.com/open-mmlab/mmaction |
Framework | pytorch |
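The segment-based sampling and aggregation described in the abstract can be sketched as: split the video into K equal segments, sample one snippet from each, score every snippet with a shared backbone, and fuse the snippet scores with a segmental consensus (here a simple average). The random "backbone" scores below are a stand-in assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_snippet_indices(num_frames, num_segments=3):
    """Split the video into equal segments and draw one snippet index from each."""
    edges = np.linspace(0, num_frames, num_segments + 1, dtype=int)
    return [int(rng.integers(edges[k], edges[k + 1])) for k in range(num_segments)]

def tsn_video_scores(snippet_scores):
    """snippet_scores: (num_segments, num_classes) -> video-level class scores."""
    return snippet_scores.mean(axis=0)            # segmental consensus = average pooling

indices = sample_snippet_indices(num_frames=300, num_segments=3)
snippet_scores = rng.random((3, 101))             # e.g. 101 classes, as in UCF101
print(indices, tsn_video_scores(snippet_scores).shape)  # three snippet indices, (101,) scores
```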