July 29, 2019

3198 words 16 mins read

Paper Group AWR 157

Sticking the Landing: Simple, Lower-Variance Gradient Estimators for Variational Inference

Title Sticking the Landing: Simple, Lower-Variance Gradient Estimators for Variational Inference
Authors Geoffrey Roeder, Yuhuai Wu, David Duvenaud
Abstract We propose a simple and general variant of the standard reparameterized gradient estimator for the variational evidence lower bound. Specifically, we remove a part of the total derivative with respect to the variational parameters that corresponds to the score function. Removing this term produces an unbiased gradient estimator whose variance approaches zero as the approximate posterior approaches the exact posterior. We analyze the behavior of this gradient estimator theoretically and empirically, and generalize it to more complex variational distributions such as mixtures and importance-weighted posteriors.
Tasks
Published 2017-03-27
URL http://arxiv.org/abs/1703.09194v3
PDF http://arxiv.org/pdf/1703.09194v3.pdf
PWC https://paperswithcode.com/paper/sticking-the-landing-simple-lower-variance
Repo https://github.com/geoffroeder/iwae
Framework none
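
A minimal sketch of the estimator in PyTorch, assuming a diagonal-Gaussian approximate posterior and a hypothetical `log_joint` callable for log p(x, z) (an illustration, not the authors' released code): evaluating log q at *detached* variational parameters drops the score-function term from the total derivative, while the path derivative still flows through the reparameterized sample.

```python
import torch

def stl_surrogate(mu, log_sigma, log_joint):
    """Path-derivative ("sticking the landing") ELBO surrogate sketch.

    mu, log_sigma: variational parameters of a diagonal Gaussian q(z).
    log_joint: hypothetical callable returning log p(x, z) for a sample z.
    """
    eps = torch.randn_like(mu)
    z = mu + eps * log_sigma.exp()   # reparameterized sample; gradients flow here
    # Detach the parameters inside log q so backprop keeps only the path
    # derivative: the estimator stays unbiased, and its variance approaches
    # zero as q approaches the exact posterior.
    q_detached = torch.distributions.Normal(mu.detach(), log_sigma.detach().exp())
    return log_joint(z) - q_detached.log_prob(z).sum()
```

Calling `.backward()` on the returned surrogate yields the lower-variance gradient; at q = p the per-sample path derivative of log p − log q vanishes, which is exactly why the variance goes to zero.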

Deeply-Learned Part-Aligned Representations for Person Re-Identification

Title Deeply-Learned Part-Aligned Representations for Person Re-Identification
Authors Liming Zhao, Xi Li, Jingdong Wang, Yueting Zhuang
Abstract In this paper, we address the problem of person re-identification, which refers to associating the persons captured from different cameras. We propose a simple yet effective human part-aligned representation for handling the body part misalignment problem. Our approach decomposes the human body into regions (parts) which are discriminative for person matching, accordingly computes the representations over the regions, and aggregates the similarities computed between the corresponding regions of a pair of probe and gallery images as the overall matching score. Our formulation, inspired by attention models, is a deep neural network that models the three steps together and is learned by minimizing the triplet loss function without requiring body part labeling information. Unlike most existing deep learning algorithms that learn a global or spatial partition-based local representation, our approach performs human body partition, and thus is more robust to pose changes and various human spatial distributions in the person bounding box. Our approach shows state-of-the-art results on the standard datasets Market-1501, CUHK03, CUHK01, and VIPeR.
Tasks Person Re-Identification
Published 2017-07-23
URL http://arxiv.org/abs/1707.07256v1
PDF http://arxiv.org/pdf/1707.07256v1.pdf
PWC https://paperswithcode.com/paper/deeply-learned-part-aligned-representations
Repo https://github.com/Phoebe-star/part_aligned
Framework tf
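
As a rough illustration of the three steps (part map estimation, per-part pooling, similarity aggregation), here is a toy PyTorch module; the attention-style part maps, feature sizes, and dot-product aggregation are simplifying assumptions, and in the paper the whole pipeline is trained with a triplet loss.

```python
import torch
import torch.nn.functional as F

class PartAligned(torch.nn.Module):
    """Toy part-aligned descriptor: K attention maps -> K pooled part features."""
    def __init__(self, in_ch=256, parts=4):
        super().__init__()
        self.att = torch.nn.Conv2d(in_ch, parts, kernel_size=1)

    def forward(self, fmap):                     # fmap: (B, C, H, W) CNN features
        masks = torch.softmax(self.att(fmap).flatten(2), dim=-1)  # (B, K, HW)
        feats = fmap.flatten(2)                                   # (B, C, HW)
        parts = torch.einsum('bkn,bcn->bkc', masks, feats)        # (B, K, C)
        return F.normalize(parts, dim=-1)        # unit-norm per-part descriptors

def match_score(probe, gallery):
    # Aggregate per-part similarities into one overall matching score.
    return (probe * gallery).sum(-1).mean(-1)    # (B,)
```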

Pedestrian Alignment Network for Large-scale Person Re-identification

Title Pedestrian Alignment Network for Large-scale Person Re-identification
Authors Zhedong Zheng, Liang Zheng, Yi Yang
Abstract Person re-identification (person re-ID) is mostly viewed as an image retrieval problem. This task aims to search a query person in a large image pool. In practice, person re-ID usually adopts automatic detectors to obtain cropped pedestrian images. However, this process suffers from two types of detector errors: excessive background and part missing. Both errors deteriorate the quality of pedestrian alignment and may compromise pedestrian matching due to the position and scale variances. To address the misalignment problem, we propose that alignment can be learned from an identification procedure. We introduce the pedestrian alignment network (PAN) which allows discriminative embedding learning and pedestrian alignment without extra annotations. Our key observation is that when the convolutional neural network (CNN) learns to discriminate between different identities, the learned feature maps usually exhibit strong activations on the human body rather than the background. The proposed network thus takes advantage of this attention mechanism to adaptively locate and align pedestrians within a bounding box. Visual examples show that pedestrians are better aligned with PAN. Experiments on three large-scale re-ID datasets confirm that PAN improves the discriminative ability of the feature embeddings and yields competitive accuracy with the state-of-the-art methods.
Tasks Image Retrieval, Large-Scale Person Re-Identification, Person Re-Identification
Published 2017-07-03
URL http://arxiv.org/abs/1707.00408v1
PDF http://arxiv.org/pdf/1707.00408v1.pdf
PWC https://paperswithcode.com/paper/pedestrian-alignment-network-for-large-scale
Repo https://github.com/layumi/Pedestrian_Alignment
Framework none
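
A toy sketch of the underlying intuition, assuming PyTorch: treat the channel-summed feature map as an attention heat map and re-sample the image around its centroid with an affine grid. The actual PAN predicts the affine parameters with a dedicated alignment branch trained jointly with the identification loss; the centroid heuristic below is a stand-in.

```python
import torch
import torch.nn.functional as F

def align_by_activation(img, fmap, zoom=1.3):
    """Toy alignment: re-crop the image around the CNN's strongest activations.

    img: (B, 3, H, W) detector crops; fmap: (B, C, h, w) feature maps.
    """
    heat = fmap.relu().sum(1)                              # (B, h, w) heat map
    heat = heat / heat.sum(dim=(1, 2), keepdim=True)
    ys = torch.linspace(-1, 1, heat.size(1), device=img.device)
    xs = torch.linspace(-1, 1, heat.size(2), device=img.device)
    cy = (heat.sum(2) * ys).sum(1)                         # activation centroid
    cx = (heat.sum(1) * xs).sum(1)
    theta = torch.zeros(img.size(0), 2, 3, device=img.device)
    theta[:, 0, 0] = theta[:, 1, 1] = 1.0 / zoom           # zoom toward the body
    theta[:, 0, 2], theta[:, 1, 2] = cx, cy                # re-center on pedestrian
    grid = F.affine_grid(theta, img.shape, align_corners=False)
    return F.grid_sample(img, grid, align_corners=False)
```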

DSLR-Quality Photos on Mobile Devices with Deep Convolutional Networks

Title DSLR-Quality Photos on Mobile Devices with Deep Convolutional Networks
Authors Andrey Ignatov, Nikolay Kobyshev, Radu Timofte, Kenneth Vanhoey, Luc Van Gool
Abstract Despite a rapid rise in the quality of built-in smartphone cameras, their physical limitations (small sensor size, compact lenses, and the lack of dedicated hardware) prevent them from achieving the quality of DSLR cameras. In this work we present an end-to-end deep learning approach that bridges this gap by translating ordinary photos into DSLR-quality images. We propose learning the translation function using a residual convolutional neural network that improves both color rendition and image sharpness. Since the standard mean squared loss is not well suited for measuring perceptual image quality, we introduce a composite perceptual error function that combines content, color, and texture losses. The first two losses are defined analytically, while the texture loss is learned in an adversarial fashion. We also present DPED, a large-scale dataset consisting of real photos captured by three different phones and one high-end reflex camera. Our quantitative and qualitative assessments reveal that the enhanced image quality is comparable to that of DSLR-taken photos, and the methodology generalizes to any type of digital camera.
Tasks
Published 2017-04-08
URL http://arxiv.org/abs/1704.02470v2
PDF http://arxiv.org/pdf/1704.02470v2.pdf
PWC https://paperswithcode.com/paper/dslr-quality-photos-on-mobile-devices-with
Repo https://github.com/antegrbesa/cnn-project
Framework tf
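
A sketch of how the three loss terms could be combined, with illustrative weights (not the paper's); `feat_net` stands in for a fixed pretrained feature extractor for the content term, and the texture term is the generator-side adversarial loss computed from a discriminator's logits on the enhanced image.

```python
import torch
import torch.nn.functional as F

def composite_loss(enhanced, target, d_logits_fake, feat_net,
                   w_content=1.0, w_color=0.5, w_texture=0.01):
    """Sketch of a content + color + texture perceptual error function."""
    # Content: distance between deep features of enhanced and target photos.
    content = F.mse_loss(feat_net(enhanced), feat_net(target))
    # Color: compare blurred images so only color rendition matters.
    blur = lambda x: F.avg_pool2d(x, kernel_size=9, stride=1, padding=4)
    color = F.mse_loss(blur(enhanced), blur(target))
    # Texture: adversarial term; push the discriminator toward "real".
    texture = F.binary_cross_entropy_with_logits(
        d_logits_fake, torch.ones_like(d_logits_fake))
    return w_content * content + w_color * color + w_texture * texture
```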

SurfaceNet: An End-to-end 3D Neural Network for Multiview Stereopsis

Title SurfaceNet: An End-to-end 3D Neural Network for Multiview Stereopsis
Authors Mengqi Ji, Juergen Gall, Haitian Zheng, Yebin Liu, Lu Fang
Abstract This paper proposes an end-to-end learning framework for multiview stereopsis. We term the network SurfaceNet. It takes a set of images and their corresponding camera parameters as input and directly infers the 3D model. The key advantage of the framework is that both photo-consistency and the geometric relations of the surface structure can be learned directly for the purpose of multiview stereopsis in an end-to-end fashion. SurfaceNet is a fully 3D convolutional network, achieved by encoding the camera parameters together with the images in a 3D voxel representation. We evaluate SurfaceNet on the large-scale DTU benchmark.
Tasks
Published 2017-08-05
URL http://arxiv.org/abs/1708.01749v1
PDF http://arxiv.org/pdf/1708.01749v1.pdf
PWC https://paperswithcode.com/paper/surfacenet-an-end-to-end-3d-neural-network
Repo https://github.com/mjiUST/SurfaceNet
Framework none
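
A toy version of the input encoding, assuming NumPy and a 3x4 pinhole projection matrix: each voxel center is projected into the view and takes the color of the pixel it lands on, producing a per-view colored voxel cube like the one SurfaceNet consumes (visibility handling and interpolation are omitted).

```python
import numpy as np

def colored_voxel_cube(image, P, grid):
    """Encode one view into a voxel grid by back-assigning pixel colors.

    image: (H, W, 3) view; P: 3x4 camera projection matrix;
    grid:  (N, 3) world coordinates of voxel centers.
    Returns (N, 3) per-voxel colors, zero where the projection misses the image.
    """
    H, W, _ = image.shape
    homo = np.hstack([grid, np.ones((len(grid), 1))])      # (N, 4) homogeneous
    proj = homo @ P.T                                      # (N, 3)
    uv = proj[:, :2] / proj[:, 2:3]                        # perspective divide
    u, v = np.round(uv[:, 0]).astype(int), np.round(uv[:, 1]).astype(int)
    ok = (proj[:, 2] > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    colors = np.zeros_like(grid)
    colors[ok] = image[v[ok], u[ok]]                       # nearest-pixel color
    return colors
```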

Rule-Mining based classification: a benchmark study

Title Rule-Mining based classification: a benchmark study
Authors Margaux Luck, Nicolas Pallet, Cecilia Damon
Abstract This study proposes an exhaustive, stable, and reproducible rule-mining algorithm combined with a classifier to generate models that are both accurate and interpretable. Our method first extracts rules (i.e., conjunctions of conditions on the values of a small number of input features) with our exhaustive rule-mining algorithm, then constructs a new feature space based on the most relevant rules, called "local features", and finally builds a local predictive model by training a standard classifier on the new local feature space. This local feature space is easily interpretable, as it provides a human-understandable explanation in the explicit form of rules. Furthermore, our local predictive approach is as powerful as global classical methods like logistic regression (LR) and support vector machines (SVM), and as tree-based methods like random forests (RF) and gradient boosted trees (GBT).
Tasks
Published 2017-06-30
URL http://arxiv.org/abs/1706.10199v1
PDF http://arxiv.org/pdf/1706.10199v1.pdf
PWC https://paperswithcode.com/paper/rule-mining-based-classification-a-benchmark
Repo https://github.com/Museau/Rule-Mining
Framework none
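
A runnable toy of the pipeline, assuming scikit-learn: the hand-written rules below stand in for the output of the exhaustive rule-mining step, each rule becomes one binary "local feature", and a logistic regression over those features yields a model whose coefficients weight human-readable rules.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each mined rule is a conjunction of threshold conditions on input features.
rules = [
    lambda X: (X[:, 0] > 0.5) & (X[:, 2] <= 1.0),
    lambda X: (X[:, 1] <= 0.0),
    lambda X: (X[:, 0] <= 0.5) & (X[:, 1] > 0.3),
]

def local_features(X):
    """Binary local feature space: one column per rule."""
    return np.column_stack([r(X).astype(float) for r in rules])

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)          # synthetic labels
clf = LogisticRegression().fit(local_features(X), y)
# Each coefficient now weights one human-readable rule.
print(dict(zip([f"rule_{i}" for i in range(len(rules))], clf.coef_[0])))
```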

Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards

Title Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards
Authors Mel Vecerik, Todd Hester, Jonathan Scholz, Fumin Wang, Olivier Pietquin, Bilal Piot, Nicolas Heess, Thomas Rothörl, Thomas Lampe, Martin Riedmiller
Abstract We propose a general and model-free approach for Reinforcement Learning (RL) on real robots with sparse rewards. We build upon the Deep Deterministic Policy Gradient (DDPG) algorithm to make use of demonstrations. Both demonstrations and actual interactions are used to fill a replay buffer, and the sampling ratio between demonstrations and transitions is automatically tuned via a prioritized replay mechanism. Typically, carefully engineered shaping rewards are required to enable agents to explore efficiently on high-dimensional control problems such as robotics. They are also required for model-based acceleration methods relying on local solvers such as iLQG (e.g., Guided Policy Search and Normalized Advantage Function). The demonstrations replace the need for carefully engineered rewards and reduce the exploration problem encountered by classical RL approaches in these domains. Demonstrations are collected by a robot kinesthetically force-controlled by a human demonstrator. Results on four simulated insertion tasks show that DDPG from demonstrations outperforms DDPG and does not require engineered rewards. Finally, we demonstrate the method on a real robot task consisting of inserting a clip (a flexible object) into a rigid object.
Tasks
Published 2017-07-27
URL http://arxiv.org/abs/1707.08817v2
PDF http://arxiv.org/pdf/1707.08817v2.pdf
PWC https://paperswithcode.com/paper/leveraging-demonstrations-for-deep
Repo https://github.com/kairproject/kair_algorithms_draft
Framework pytorch
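
A simplified sketch of the replay mechanism described above: demonstrations are preloaded, never evicted, and given a small priority bonus, while priority-proportional sampling tunes the demonstration/agent mixing ratio automatically. Class and parameter names are illustrative, and full prioritized replay (importance weights, online TD-error updates) is omitted.

```python
import random

class MixedReplayBuffer:
    """Replay buffer mixing permanent demonstrations with agent transitions."""
    def __init__(self, capacity=100_000, demo_bonus=0.1):
        self.demo, self.agent = [], []
        self.capacity, self.demo_bonus = capacity, demo_bonus

    def add_demo(self, transition, td_error=1.0):
        # Demonstrations get a priority bonus and are kept forever.
        self.demo.append((transition, abs(td_error) + self.demo_bonus))

    def add(self, transition, td_error=1.0):
        if len(self.agent) >= self.capacity:
            self.agent.pop(0)                    # evict oldest agent transition
        self.agent.append((transition, abs(td_error)))

    def sample(self, k):
        # Sampling proportional to priority sets the demo/agent ratio itself.
        pool = self.demo + self.agent
        weights = [p for _, p in pool]
        return random.choices([t for t, _ in pool], weights=weights, k=k)
```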

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

Title MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Authors Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam
Abstract We present a class of efficient models called MobileNets for mobile and embedded vision applications. MobileNets are based on a streamlined architecture that uses depth-wise separable convolutions to build lightweight deep neural networks. We introduce two simple global hyper-parameters that efficiently trade off between latency and accuracy. These hyper-parameters allow the model builder to choose the right-sized model for their application based on the constraints of the problem. We present extensive experiments on resource and accuracy tradeoffs and show strong performance compared to other popular models on ImageNet classification. We then demonstrate the effectiveness of MobileNets across a wide range of applications and use cases, including object detection, fine-grained classification, face attributes, and large-scale geo-localization.
Tasks Image Classification, Object Detection
Published 2017-04-17
URL http://arxiv.org/abs/1704.04861v1
PDF http://arxiv.org/pdf/1704.04861v1.pdf
PWC https://paperswithcode.com/paper/mobilenets-efficient-convolutional-neural
Repo https://github.com/Edmonton-School-of-AI/ml5-Simple-Image-Classification
Framework none
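
The core building block is easy to sketch in PyTorch: a depthwise 3x3 convolution (one filter per channel, via `groups`) followed by a pointwise 1x1 convolution, with the width multiplier alpha thinning every layer; channel sizes here are illustrative (the second hyper-parameter, the resolution multiplier, simply shrinks the input).

```python
import torch

class DepthwiseSeparable(torch.nn.Module):
    """MobileNet building block sketch: depthwise 3x3 + pointwise 1x1."""
    def __init__(self, in_ch, out_ch, stride=1, alpha=1.0):
        super().__init__()
        in_ch, out_ch = int(in_ch * alpha), int(out_ch * alpha)  # width multiplier
        self.depthwise = torch.nn.Conv2d(in_ch, in_ch, 3, stride, 1,
                                         groups=in_ch, bias=False)
        self.pointwise = torch.nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn1 = torch.nn.BatchNorm2d(in_ch)
        self.bn2 = torch.nn.BatchNorm2d(out_ch)

    def forward(self, x):
        x = torch.relu(self.bn1(self.depthwise(x)))   # filter per channel
        return torch.relu(self.bn2(self.pointwise(x)))  # combine channels
```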

Deep Convolutional Neural Network for Age Estimation based on VGG-Face Model

Title Deep Convolutional Neural Network for Age Estimation based on VGG-Face Model
Authors Zakariya Qawaqneh, Arafat Abu Mallouh, Buket D. Barkana
Abstract Automatic age estimation from real-world and unconstrained face images is rapidly gaining importance. In our proposed work, a deep CNN model that was trained on a face recognition database is used to estimate age on the Adience database. This paper makes three significant contributions to the field. (1) This work shows that a CNN model trained for a face recognition task can be utilized for age estimation to improve performance; (2) the overfitting problem can be overcome by employing a CNN pretrained on a large face recognition database; (3) not only do the number of training images and the number of subjects in the training database affect the performance of the age estimation model, but the pre-training task of the employed CNN also determines its performance.
Tasks Age Estimation, Face Recognition
Published 2017-09-06
URL http://arxiv.org/abs/1709.01664v1
PDF http://arxiv.org/pdf/1709.01664v1.pdf
PWC https://paperswithcode.com/paper/deep-convolutional-neural-network-for-age
Repo https://github.com/shridharrhegde/vgg-age
Framework pytorch
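
A sketch of the transfer recipe in PyTorch; torchvision's ImageNet-pretrained `vgg16` stands in here for the VGG-Face model (whose weights are not bundled with torchvision), and the head is replaced to classify the eight Adience age groups.

```python
import torch
from torchvision import models

# Keep the pretrained convolutional features, retrain only the head.
model = models.vgg16(weights=models.VGG16_Weights.DEFAULT)  # VGG-Face stand-in
for p in model.features.parameters():
    p.requires_grad = False                      # freeze pretrained features
model.classifier[6] = torch.nn.Linear(4096, 8)   # 8 Adience age groups
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3, momentum=0.9)
```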

StarCraft II: A New Challenge for Reinforcement Learning

Title StarCraft II: A New Challenge for Reinforcement Learning
Authors Oriol Vinyals, Timo Ewalds, Sergey Bartunov, Petko Georgiev, Alexander Sasha Vezhnevets, Michelle Yeo, Alireza Makhzani, Heinrich Küttler, John Agapiou, Julian Schrittwieser, John Quan, Stephen Gaffney, Stig Petersen, Karen Simonyan, Tom Schaul, Hado van Hasselt, David Silver, Timothy Lillicrap, Kevin Calderone, Paul Keet, Anthony Brunasso, David Lawrence, Anders Ekermo, Jacob Repp, Rodney Tsing
Abstract This paper introduces SC2LE (StarCraft II Learning Environment), a reinforcement learning environment based on the StarCraft II game. This domain poses a new grand challenge for reinforcement learning, representing a more difficult class of problems than considered in most prior work. It is a multi-agent problem with multiple players interacting; there is imperfect information due to a partially observed map; it has a large action space involving the selection and control of hundreds of units; it has a large state space that must be observed solely from raw input feature planes; and it has delayed credit assignment requiring long-term strategies over thousands of steps. We describe the observation, action, and reward specification for the StarCraft II domain and provide an open source Python-based interface for communicating with the game engine. In addition to the main game maps, we provide a suite of mini-games focusing on different elements of StarCraft II gameplay. For the main game maps, we also provide an accompanying dataset of game replay data from human expert players. We give initial baseline results for neural networks trained from this data to predict game outcomes and player actions. Finally, we present initial baseline results for canonical deep reinforcement learning agents applied to the StarCraft II domain. On the mini-games, these agents learn to achieve a level of play that is comparable to a novice player. However, when trained on the main game, these agents are unable to make significant progress. Thus, SC2LE offers a new and challenging environment for exploring deep reinforcement learning algorithms and architectures.
Tasks Real-Time Strategy Games, Starcraft, Starcraft II
Published 2017-08-16
URL http://arxiv.org/abs/1708.04782v1
PDF http://arxiv.org/pdf/1708.04782v1.pdf
PWC https://paperswithcode.com/paper/starcraft-ii-a-new-challenge-for
Repo https://github.com/tuomaso/SC2LE-implementation
Framework tf
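
For orientation, a generic interaction loop over the spec the abstract describes: observations arrive as stacks of raw feature planes (screen/minimap layers), and an action is a function identifier plus its arguments. `make_env` and the method names below are hypothetical placeholders, not the actual pysc2 API that the paper's open-source interface provides.

```python
# Hypothetical Gym-style wrapper around the SC2LE observation/action spec.
def run_episode(make_env, policy):
    env = make_env("MoveToBeacon")               # one of the SC2LE mini-games
    obs, total_reward, done = env.reset(), 0.0, False
    while not done:
        planes = obs.feature_planes              # e.g. (C, 84, 84) raw layers
        fn_id, args = policy(planes)             # choose action function + args
        obs, reward, done = env.step(fn_id, args)
        total_reward += reward                   # credit is delayed over long games
    return total_reward
```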

Network Inference via the Time-Varying Graphical Lasso

Title Network Inference via the Time-Varying Graphical Lasso
Authors David Hallac, Youngsuk Park, Stephen Boyd, Jure Leskovec
Abstract Many important problems can be modeled as a system of interconnected entities, where each entity is recording time-dependent observations or measurements. In order to spot trends, detect anomalies, and interpret the temporal dynamics of such data, it is essential to understand the relationships between the different entities and how these relationships evolve over time. In this paper, we introduce the time-varying graphical lasso (TVGL), a method of inferring time-varying networks from raw time series data. We cast the problem in terms of estimating a sparse time-varying inverse covariance matrix, which reveals a dynamic network of interdependencies between the entities. Since dynamic network inference is a computationally expensive task, we derive a scalable message-passing algorithm based on the Alternating Direction Method of Multipliers (ADMM) to solve this problem in an efficient way. We also discuss several extensions, including a streaming algorithm to update the model and incorporate new observations in real time. Finally, we evaluate our TVGL algorithm on both real and synthetic datasets, obtaining interpretable results and outperforming state-of-the-art baselines in terms of both accuracy and scalability.
Tasks Time Series
Published 2017-03-06
URL http://arxiv.org/abs/1703.01958v2
PDF http://arxiv.org/pdf/1703.01958v2.pdf
PWC https://paperswithcode.com/paper/network-inference-via-the-time-varying
Repo https://github.com/ams129/TVGL
Framework none
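
The optimization problem behind TVGL, as the abstract describes it, couples per-timestamp graphical lasso estimates through a temporal penalty (per-window sample-size weights on the likelihood terms are omitted here):

$$
\min_{\Theta_1,\dots,\Theta_T \succ 0}\;
\sum_{i=1}^{T} \Big( \operatorname{tr}(S_i \Theta_i) - \log\det \Theta_i
+ \lambda \lVert \Theta_i \rVert_{\mathrm{od},1} \Big)
\;+\; \beta \sum_{i=2}^{T} \psi(\Theta_i - \Theta_{i-1})
$$

Here $S_i$ is the empirical covariance in window $i$, $\lVert\cdot\rVert_{\mathrm{od},1}$ penalizes off-diagonal entries to induce a sparse network, and the convex penalty $\psi$ encodes the assumed evolution (e.g., an $\ell_1$ penalty for a few abrupt edge changes, a squared Frobenius penalty for smooth drift); the ADMM message-passing scheme splits this objective across timestamps.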

3D-PRNN: Generating Shape Primitives with Recurrent Neural Networks

Title 3D-PRNN: Generating Shape Primitives with Recurrent Neural Networks
Authors Chuhang Zou, Ersin Yumer, Jimei Yang, Duygu Ceylan, Derek Hoiem
Abstract The success of various applications, including robotics, digital content creation, and visualization, demands a structured and abstract representation of the 3D world from limited sensor data. Inspired by the nature of human perception of 3D shapes as a collection of simple parts, we explore such an abstract shape representation based on primitives. Given a single depth image of an object, we present 3D-PRNN, a generative recurrent neural network that synthesizes multiple plausible shapes composed of a set of primitives. Our generative model encodes symmetry characteristics of common man-made objects, preserves long-range structural coherence, and describes objects of varying complexity with a compact representation. We also propose a method based on Gaussian Fields to generate a large-scale dataset of primitive-based shape representations to train our network. We evaluate our approach on a wide range of examples and show that it outperforms nearest-neighbor based shape retrieval methods and is on par with voxel-based generative models while using a significantly reduced parameter space.
Tasks
Published 2017-08-04
URL http://arxiv.org/abs/1708.01648v1
PDF http://arxiv.org/pdf/1708.01648v1.pdf
PWC https://paperswithcode.com/paper/3d-prnn-generating-shape-primitives-with
Repo https://github.com/paschalidoud/superquadric_parsing
Framework pytorch
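
A toy single step of the generator, assuming PyTorch: an LSTM cell conditioned on a depth-image feature emits one primitive's parameters plus a stop signal per step. The dimensions and the 9-parameter box encoding (translation, scale, rotation) are illustrative simplifications of the paper's mixture-density outputs.

```python
import torch

class PrimitiveRNN(torch.nn.Module):
    """One 3D-PRNN-style generation step: depth feature -> next primitive."""
    def __init__(self, cond_dim=64, hidden=128):
        super().__init__()
        # Input: depth feature concatenated with the previous 10-dim output.
        self.rnn = torch.nn.LSTMCell(cond_dim + 10, hidden)
        self.head = torch.nn.Linear(hidden, 10)  # 3 trans + 3 scale + 3 rot + stop

    def forward(self, cond, prev_prim, state=None):
        h, c = self.rnn(torch.cat([cond, prev_prim], dim=-1), state)
        out = self.head(h)
        prim, stop_logit = out[:, :9], out[:, 9:]
        return prim, torch.sigmoid(stop_logit), (h, c)  # stop prob ends the sequence
```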

3D Anisotropic Hybrid Network: Transferring Convolutional Features from 2D Images to 3D Anisotropic Volumes

Title 3D Anisotropic Hybrid Network: Transferring Convolutional Features from 2D Images to 3D Anisotropic Volumes
Authors Siqi Liu, Daguang Xu, S. Kevin Zhou, Thomas Mertelmeier, Julia Wicklein, Anna Jerebko, Sasa Grbic, Olivier Pauly, Weidong Cai, Dorin Comaniciu
Abstract While deep convolutional neural networks (CNNs) have been successfully applied to 2D image analysis, it is still challenging to apply them to 3D anisotropic volumes, especially when the within-slice resolution is much higher than the between-slice resolution and when the number of 3D volumes is relatively small. On one hand, directly learning a CNN with 3D convolution kernels suffers from the lack of data and likely ends up with poor generalization; insufficient GPU memory also limits the model size and representational power. On the other hand, applying a 2D CNN with generalizable features to 2D slices ignores between-slice information. Coupling a 2D network with an LSTM to handle the between-slice information is not optimal due to the difficulty of LSTM learning. To overcome these challenges, we propose a 3D Anisotropic Hybrid Network (AH-Net) that transfers convolutional features learned from 2D images to 3D anisotropic volumes. Such a transfer inherits the desired strong generalization capability for within-slice information while naturally exploiting between-slice information for more effective modelling. The focal loss is further utilized for more effective end-to-end learning. We experiment with the proposed 3D AH-Net on two different medical image analysis tasks, namely lesion detection from Digital Breast Tomosynthesis volumes, and liver and liver tumor segmentation from Computed Tomography volumes, and obtain state-of-the-art results.
Tasks
Published 2017-11-23
URL http://arxiv.org/abs/1711.08580v2
PDF http://arxiv.org/pdf/1711.08580v2.pdf
PWC https://paperswithcode.com/paper/3d-anisotropic-hybrid-network-transferring
Repo https://github.com/lsqshr/AH-Net
Framework pytorch
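
The transfer idea reduces to an anisotropic factorization that is easy to sketch in PyTorch: 3x3x1 convolutions reuse 2D-pretrained kernels within each slice, and cheap 1x1x3 convolutions fuse information across slices; the real AH-Net wraps such blocks in a full encoder-decoder.

```python
import torch

class AnisotropicBlock(torch.nn.Module):
    """Sketch of transferring 2D kernels into an anisotropic 3D block."""
    def __init__(self, ch):
        super().__init__()
        self.within = torch.nn.Conv3d(ch, ch, kernel_size=(3, 3, 1),
                                      padding=(1, 1, 0))   # in-plane
        self.between = torch.nn.Conv3d(ch, ch, kernel_size=(1, 1, 3),
                                       padding=(0, 0, 1))  # across slices

    def load_2d_weights(self, conv2d):
        # A 2D kernel (C, C, 3, 3) becomes the (C, C, 3, 3, 1) in-plane kernel.
        with torch.no_grad():
            self.within.weight.copy_(conv2d.weight.unsqueeze(-1))

    def forward(self, x):                        # x: (B, C, H, W, slices)
        return torch.relu(self.between(torch.relu(self.within(x))))
```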

Recurrent Poisson Factorization for Temporal Recommendation

Title Recurrent Poisson Factorization for Temporal Recommendation
Authors Seyed Abbas Hosseini, Keivan Alizadeh, Ali Khodadadi, Ali Arabzadeh, Mehrdad Farajtabar, Hongyuan Zha, Hamid R. Rabiee
Abstract Poisson factorization is a probabilistic model of users and items for recommendation systems, where the so-called implicit consumer data is modeled by a factorized Poisson distribution. There are many variants of Poisson factorization that show state-of-the-art performance on real-world recommendation tasks. However, most of them do not explicitly take into account the temporal behavior and recurrent activities of users, which are essential for recommending the right item to the right user at the right time. In this paper, we introduce the Recurrent Poisson Factorization (RPF) framework, which generalizes classical PF methods by utilizing a Poisson process to model the implicit feedback. RPF treats time as a natural constituent of the model and brings to the table a rich family of time-sensitive factorization models. To elaborate, we instantiate several variants of RPF that are capable of handling dynamic user preferences and item specification (DRPF), modeling the social aspect of product adoption (SRPF), and capturing the consumption heterogeneity among users and items (HRPF). We also develop a variational algorithm for approximate posterior inference that scales up to massive data sets. Furthermore, we demonstrate RPF's superior performance over many state-of-the-art methods on a synthetic dataset and on large-scale real-world datasets of music streaming logs and user-item interactions from M-Commerce platforms.
Tasks Recommendation Systems
Published 2017-03-04
URL http://arxiv.org/abs/1703.01442v1
PDF http://arxiv.org/pdf/1703.01442v1.pdf
PWC https://paperswithcode.com/paper/recurrent-poisson-factorization-for-temporal
Repo https://github.com/AHosseini/RPF
Framework none
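
As a rough illustration, the intensity for one user-item pair can combine a latent-factor base rate with self-excitation from the user's past consumption; the exponential kernel and parameters below are illustrative assumptions, not the paper's exact parameterization (the DRPF/SRPF/HRPF variants refine this basic form).

```python
import numpy as np

def rpf_intensity(t, theta_u, beta_i, history, alpha=0.5, omega=1.0):
    """Sketch of a self-exciting Poisson-factorization intensity.

    theta_u, beta_i: non-negative latent factors; history: past event times.
    """
    base = float(theta_u @ beta_i)               # preference from latent factors
    past = np.asarray([s for s in history if s < t])
    # Recent consumption raises the rate, decaying exponentially with age.
    excitation = alpha * np.exp(-omega * (t - past)).sum() if past.size else 0.0
    return base + excitation
```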

Learning Discourse-level Diversity for Neural Dialog Models using Conditional Variational Autoencoders

Title Learning Discourse-level Diversity for Neural Dialog Models using Conditional Variational Autoencoders
Authors Tiancheng Zhao, Ran Zhao, Maxine Eskenazi
Abstract While recent neural encoder-decoder models have shown great promise in modeling open-domain conversations, they often generate dull and generic responses. Unlike past work that has focused on diversifying the output of the decoder at the word level to alleviate this problem, we present a novel framework based on conditional variational autoencoders that captures discourse-level diversity in the encoder. Our model uses latent variables to learn a distribution over potential conversational intents and generates diverse responses using only greedy decoders. We further develop a novel variant that integrates linguistic prior knowledge for better performance. Finally, the training procedure is improved by introducing a bag-of-words loss. Our proposed models are shown to generate significantly more diverse responses than baseline approaches and exhibit competence in discourse-level decision-making.
Tasks Decision Making, Dialogue Generation, Text Generation
Published 2017-03-31
URL http://arxiv.org/abs/1703.10960v3
PDF http://arxiv.org/pdf/1703.10960v3.pdf
PWC https://paperswithcode.com/paper/learning-discourse-level-diversity-for-neural
Repo https://github.com/snakeztc/NeuralDialog-CVAE
Framework tf
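
A sketch of the training objective, assuming PyTorch and diagonal-Gaussian recognition/prior networks; `nll` is the decoder's reconstruction loss and all tensor names are illustrative (padding-token masking is omitted). The bag-of-words term scores every response token against one softmax computed from the latent variable, which forces z to carry discourse-level content regardless of word order.

```python
import torch
import torch.nn.functional as F

def cvae_losses(recog_mu, recog_logvar, prior_mu, prior_logvar,
                bow_logits, response_ids, nll):
    """ELBO (reconstruction + KL) plus the bag-of-words auxiliary loss."""
    # KL( q(z | context, response) || p(z | context) ) for diagonal Gaussians.
    kl = 0.5 * torch.sum(
        prior_logvar - recog_logvar
        + (recog_logvar.exp() + (recog_mu - prior_mu) ** 2) / prior_logvar.exp()
        - 1.0, dim=-1).mean()
    # Bag-of-words loss: every response token scored from one softmax over z.
    log_probs = F.log_softmax(bow_logits, dim=-1)           # (B, vocab)
    bow = -log_probs.gather(1, response_ids).sum(1).mean()  # response_ids: (B, T)
    return nll + kl + bow
```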