October 20, 2019

2998 words 15 mins read

Paper Group AWR 279


Deep Content-User Embedding Model for Music Recommendation

Title Deep Content-User Embedding Model for Music Recommendation
Authors Jongpil Lee, Kyungyun Lee, Jiyoung Park, Jangyeon Park, Juhan Nam
Abstract Recently, deep learning-based recommendation systems have been actively explored to solve the cold-start problem using a hybrid approach. However, the majority of previous studies proposed a hybrid model where collaborative filtering and content-based filtering modules are trained independently. The end-to-end approach that takes different modality data as input and jointly trains the model can provide better optimization, but it has not yet been fully explored. In this work, we propose the deep content-user embedding model, a simple and intuitive architecture that combines the user-item interaction and music audio content. We evaluate the model on music recommendation and music auto-tagging tasks. The results show that the proposed model significantly outperforms the previous work. We also discuss various directions to improve the proposed model further.
Tasks Music Auto-Tagging, Recommendation Systems
Published 2018-07-18
URL http://arxiv.org/abs/1807.06786v1
PDF http://arxiv.org/pdf/1807.06786v1.pdf
PWC https://paperswithcode.com/paper/deep-content-user-embedding-model-for-music
Repo https://github.com/jongpillee/deep-content-user
Framework tf
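
The abstract describes a joint model over user-item interactions and audio content. Below is a minimal two-tower sketch of that idea: a user-ID embedding, a small convolutional audio encoder over mel-spectrograms, and a dot-product affinity trained with binary cross-entropy. Layer sizes and the loss are illustrative assumptions, not the authors' exact architecture.

```python
# Hypothetical two-tower sketch of a joint content-user model (not the authors' exact code).
import torch
import torch.nn as nn

class ContentUserModel(nn.Module):
    def __init__(self, n_users, n_mels=128, dim=64):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)           # user tower: ID lookup
        self.audio_enc = nn.Sequential(                       # item tower: audio content encoder
            nn.Conv1d(n_mels, 128, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv1d(128, dim, kernel_size=4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )

    def forward(self, user_ids, mel_spec):
        u = self.user_emb(user_ids)               # (B, dim)
        v = self.audio_enc(mel_spec)              # (B, dim) from (B, n_mels, T)
        return (u * v).sum(dim=-1)                # affinity score per (user, track) pair

model = ContentUserModel(n_users=1000)
score = model(torch.tensor([3, 7]), torch.randn(2, 128, 256))
loss = nn.functional.binary_cross_entropy_with_logits(score, torch.tensor([1.0, 0.0]))
```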

R-grams: Unsupervised Learning of Semantic Units in Natural Language

Title R-grams: Unsupervised Learning of Semantic Units in Natural Language
Authors Ariel Ekgren, Amaru Cuba Gyllensten, Magnus Sahlgren
Abstract This paper investigates data-driven segmentation using Re-Pair or Byte Pair Encoding techniques. In contrast to previous work, which has primarily focused on subword units for machine translation, we are interested in the general properties of such segments above the word level. We call these segments r-grams, and discuss their properties and the effect they have on the token frequency distribution. The proposed approach is evaluated by demonstrating its viability in embedding techniques, both in monolingual and multilingual test settings. We also provide a number of qualitative examples of the proposed methodology, demonstrating its viability as a language-invariant segmentation procedure.
Tasks Machine Translation
Published 2018-08-14
URL http://arxiv.org/abs/1808.04670v2
PDF http://arxiv.org/pdf/1808.04670v2.pdf
PWC https://paperswithcode.com/paper/r-grams-unsupervised-learning-of-semantic
Repo https://github.com/bakirillov/rgramlib
Framework none
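
Since r-grams are built with Re-Pair/BPE-style merging above the word level, a toy sketch helps make the procedure concrete. The snippet below greedily merges the most frequent adjacent word pair; the merge count and the underscore joining are assumptions for illustration only.

```python
# Minimal BPE-style merging over word tokens (a sketch of r-gram segmentation, not the authors' code).
from collections import Counter

def learn_rgrams(corpus, n_merges=2):
    """Greedily merge the most frequent adjacent token pair, producing multi-word segments."""
    tokens = corpus.split()
    merges = []
    for _ in range(n_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merges.append((a, b))
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == (a, b):
                merged.append(a + "_" + b)   # fuse the pair into one r-gram
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens, merges

text = "new york is big and new york is busy"
segments, merges = learn_rgrams(text)
print(segments)   # ['new_york_is', 'big', 'and', 'new_york_is', 'busy']
```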

Convolutional Deblurring for Natural Imaging

Title Convolutional Deblurring for Natural Imaging
Authors Mahdi S. Hosseini, Konstantinos N. Plataniotis
Abstract In this paper, we propose a novel design of image deblurring in the form of one-shot convolution filtering that can directly convolve with naturally blurred images for restoration. The problem of optical blurring is a common disadvantage to many imaging applications that suffer from optical imperfections. Numerous deconvolution methods blindly estimate the blur in either inclusive or exclusive forms, but they are practically challenging due to high computational cost and low image reconstruction quality. Both high accuracy and high speed are prerequisites for high-throughput imaging platforms in digital archiving, where deblurring is required after image acquisition and before the image is stored, previewed, or processed for high-level interpretation. Therefore, on-the-fly correction of such images is important to avoid possible time delays, mitigate computational expenses, and increase image perception quality. We bridge this gap by synthesizing a deconvolution kernel as a linear combination of Finite Impulse Response (FIR) even-derivative filters that can be directly convolved with blurry input images to boost the frequency fall-off of the Point Spread Function (PSF) associated with the optical blur. We employ a Gaussian low-pass filter to decouple the image denoising problem from image edge deblurring. Furthermore, we propose a blind approach to estimate the PSF statistics for two Gaussian and Laplacian models that are common in many imaging pipelines. Thorough experiments are designed to test and validate the efficiency of the proposed method using 2054 naturally blurred images across six imaging applications, in comparison with seven state-of-the-art deconvolution methods.
Tasks Deblurring, Denoising, Image Denoising, Image Reconstruction
Published 2018-10-25
URL https://arxiv.org/abs/1810.10725v2
PDF https://arxiv.org/pdf/1810.10725v2.pdf
PWC https://paperswithcode.com/paper/convolutional-deblurring-for-natural-imaging
Repo https://github.com/mahdihosseini/1Shot-MaxPol
Framework none
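
The core idea is a single convolution kernel assembled from even-derivative FIR filters that counteracts the frequency fall-off caused by the PSF. The sketch below builds such a kernel from 2nd- and 4th-order central-difference stencils with hand-picked coefficients; the paper instead derives the coefficients from estimated PSF statistics, so treat this as a rough illustration.

```python
# A rough sketch of deblurring by one-shot convolution with even-derivative FIR filters.
# The coefficients below are illustrative; the paper derives them from the estimated PSF statistics.
import numpy as np
from scipy.ndimage import convolve

# 1-D even-derivative FIR stencils (2nd- and 4th-order central differences)
d2 = np.array([1., -2., 1.])
d4 = np.array([1., -4., 6., -4., 1.])

def sharpening_kernel(alpha=0.8, beta=0.15):
    """Identity plus a linear combination of even derivatives -> a single high-boost kernel."""
    k = np.zeros(5)
    k[2] = 1.0                        # identity (delta), so the DC gain stays 1
    k[1:4] -= alpha * d2              # subtracting the 2nd derivative boosts mid/high frequencies
    k += beta * d4                    # adding the 4th derivative boosts the highest frequencies
    return k

k1d = sharpening_kernel()
k2d = np.outer(k1d, k1d)              # separable 2-D kernel

blurry = np.random.rand(64, 64)       # stand-in for a naturally blurred image
deblurred = convolve(blurry, k2d, mode="reflect")
```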

MultiPoseNet: Fast Multi-Person Pose Estimation using Pose Residual Network

Title MultiPoseNet: Fast Multi-Person Pose Estimation using Pose Residual Network
Authors Muhammed Kocabas, Salih Karagoz, Emre Akbas
Abstract In this paper, we present MultiPoseNet, a novel bottom-up multi-person pose estimation architecture that combines a multi-task model with a novel assignment method. MultiPoseNet can jointly handle person detection, keypoint detection, person segmentation and pose estimation problems. The novel assignment method is implemented by the Pose Residual Network (PRN), which receives keypoint and person detections and produces accurate poses by assigning keypoints to person instances. On the COCO keypoints dataset, our pose estimation method outperforms all previous bottom-up methods both in accuracy (+4-point mAP over the previous best result) and speed; it also performs on par with the best top-down methods while being at least 4x faster. Our method is the fastest real-time system, running at 23 frames/sec. Source code is available at: https://github.com/mkocabas/pose-residual-network
Tasks Human Detection, Keypoint Detection, Multi-Person Pose Estimation, Pose Estimation
Published 2018-07-11
URL http://arxiv.org/abs/1807.04067v1
PDF http://arxiv.org/pdf/1807.04067v1.pdf
PWC https://paperswithcode.com/paper/multiposenet-fast-multi-person-pose
Repo https://github.com/danielperezr88/multiposenet-aries
Framework tf
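
The Pose Residual Network receives keypoint detections restricted to a person detection and assigns them to that instance. A hypothetical minimal version, a residual MLP over box-cropped keypoint heatmaps, might look like the following (the heatmap size, hidden width, and softmax readout are assumptions):

```python
# A hypothetical sketch of the Pose Residual Network idea: given keypoint heatmaps cropped to one
# person's box, a small residual MLP picks out the keypoints belonging to that person.
import torch
import torch.nn as nn

class PoseResidualNet(nn.Module):
    def __init__(self, n_keypoints=17, h=36, w=56, hidden=1024):
        super().__init__()
        self.in_dim = n_keypoints * h * w
        self.mlp = nn.Sequential(
            nn.Linear(self.in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, self.in_dim),
        )

    def forward(self, heatmaps):                      # (B, K, H, W) cropped to a detection box
        x = heatmaps.flatten(1)
        out = x + self.mlp(x)                         # residual connection over the heatmaps
        return torch.softmax(out, dim=-1).view_as(heatmaps)

prn = PoseResidualNet()
person_pose = prn(torch.rand(2, 17, 36, 56))          # per-person keypoint assignment maps
```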

Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow

Title Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow
Authors Xue Bin Peng, Angjoo Kanazawa, Sam Toyer, Pieter Abbeel, Sergey Levine
Abstract Adversarial learning methods have been proposed for a wide range of applications, but the training of adversarial models can be notoriously unstable. Effectively balancing the performance of the generator and discriminator is critical, since a discriminator that achieves very high accuracy will produce relatively uninformative gradients. In this work, we propose a simple and general technique to constrain information flow in the discriminator by means of an information bottleneck. By enforcing a constraint on the mutual information between the observations and the discriminator’s internal representation, we can effectively modulate the discriminator’s accuracy and maintain useful and informative gradients. We demonstrate that our proposed variational discriminator bottleneck (VDB) leads to significant improvements across three distinct application areas for adversarial learning algorithms. Our primary evaluation studies the applicability of the VDB to imitation learning of dynamic continuous control skills, such as running. We show that our method can learn such skills directly from raw video demonstrations, substantially outperforming prior adversarial imitation learning methods. The VDB can also be combined with adversarial inverse reinforcement learning to learn parsimonious reward functions that can be transferred and re-optimized in new settings. Finally, we demonstrate that VDB can train GANs more effectively for image generation, improving upon a number of prior stabilization methods.
Tasks Continuous Control, Image Generation, Imitation Learning
Published 2018-10-01
URL http://arxiv.org/abs/1810.00821v3
PDF http://arxiv.org/pdf/1810.00821v3.pdf
PWC https://paperswithcode.com/paper/variational-discriminator-bottleneck
Repo https://github.com/akanimax/Variational_Discriminator_Bottleneck
Framework pytorch
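
The VDB constrains the mutual information between the discriminator's input and its internal stochastic encoding, enforced through a Lagrange multiplier updated by dual gradient ascent. A minimal sketch of the resulting loss, assuming a Gaussian encoder that outputs mu and logvar (shapes and learning rates are illustrative):

```python
# A minimal sketch of the VDB loss on the discriminator side (assumed shapes; not the authors' code).
import torch
import torch.nn.functional as F

def vdb_loss(logits, labels, mu, logvar, beta, i_c=0.5):
    """Discriminator loss + beta * (KL(q(z|x) || N(0, I)) - I_c) information constraint."""
    bce = F.binary_cross_entropy_with_logits(logits, labels)
    kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1.0).sum(dim=1).mean()
    return bce + beta * (kl - i_c), kl

def update_beta(beta, kl, i_c=0.5, lr=1e-5):
    """Dual gradient ascent on the Lagrange multiplier, kept non-negative."""
    return max(0.0, beta + lr * (kl.item() - i_c))

logits, labels = torch.randn(8), torch.randint(0, 2, (8,)).float()
mu, logvar = torch.randn(8, 32), torch.randn(8, 32)
beta = 0.1
loss, kl = vdb_loss(logits, labels, mu, logvar, beta)
beta = update_beta(beta, kl)
```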

Analyzing Inverse Problems with Invertible Neural Networks

Title Analyzing Inverse Problems with Invertible Neural Networks
Authors Lynton Ardizzone, Jakob Kruse, Sebastian Wirkert, Daniel Rahner, Eric W. Pellegrini, Ralf S. Klessen, Lena Maier-Hein, Carsten Rother, Ullrich Köthe
Abstract In many tasks, in particular in natural science, the goal is to determine hidden system parameters from a set of measurements. Often, the forward process from parameter- to measurement-space is a well-defined function, whereas the inverse problem is ambiguous: one measurement may map to multiple different sets of parameters. In this setting, the posterior parameter distribution, conditioned on an input measurement, has to be determined. We argue that a particular class of neural networks is well suited for this task – so-called Invertible Neural Networks (INNs). Although INNs are not new, they have, so far, received little attention in the literature. While classical neural networks attempt to solve the ambiguous inverse problem directly, INNs are able to learn it jointly with the well-defined forward process, using additional latent output variables to capture the information otherwise lost. Given a specific measurement and sampled latent variables, the inverse pass of the INN provides a full distribution over parameter space. We verify experimentally, on artificial data and real-world problems from astrophysics and medicine, that INNs are a powerful analysis tool to find multi-modalities in parameter space, to uncover parameter correlations, and to identify unrecoverable parameters.
Tasks
Published 2018-08-14
URL http://arxiv.org/abs/1808.04730v3
PDF http://arxiv.org/pdf/1808.04730v3.pdf
PWC https://paperswithcode.com/paper/analyzing-inverse-problems-with-invertible
Repo https://github.com/VLL-HD/analyzing_inverse_problems
Framework pytorch
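
INNs are typically built from invertible coupling blocks, so the forward simulation and the inverse (posterior) pass share the same weights. A self-contained sketch of one affine coupling block (not the paper's exact architecture, which also combines permutations and latent-variable conditioning):

```python
# Sketch of an affine coupling block, the building block commonly used in INNs (illustrative only).
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(nn.Linear(self.half, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * (dim - self.half)))

    def forward(self, x):                       # forward pass: x -> y
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(x1).chunk(2, dim=1)
        y2 = x2 * torch.exp(s) + t              # invertible affine transform of the second half
        return torch.cat([x1, y2], dim=1)

    def inverse(self, y):                       # exact inverse: y -> x
        y1, y2 = y[:, :self.half], y[:, self.half:]
        s, t = self.net(y1).chunk(2, dim=1)
        x2 = (y2 - t) * torch.exp(-s)
        return torch.cat([y1, x2], dim=1)

block = AffineCoupling(dim=6)
x = torch.randn(4, 6)
assert torch.allclose(block.inverse(block(x)), x, atol=1e-5)   # invertibility check
```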

Learning Probabilistic Trajectory Models of Aircraft in Terminal Airspace from Position Data

Title Learning Probabilistic Trajectory Models of Aircraft in Terminal Airspace from Position Data
Authors Shane Barratt, Mykel Kochenderfer, Stephen Boyd
Abstract Models for predicting aircraft motion are an important component of modern aeronautical systems. These models help aircraft plan collision avoidance maneuvers and help conduct offline performance and safety analyses. In this article, we develop a method for learning a probabilistic generative model of aircraft motion in terminal airspace, the controlled airspace surrounding a given airport. The method fits the model based on a historical dataset of radar-based position measurements of aircraft landings and takeoffs at that airport. We find that the model generates realistic trajectories, provides accurate predictions, and captures the statistical properties of aircraft trajectories. Furthermore, the model trains quickly, is compact, and allows for efficient real-time inference.
Tasks
Published 2018-10-22
URL http://arxiv.org/abs/1810.09568v1
PDF http://arxiv.org/pdf/1810.09568v1.pdf
PWC https://paperswithcode.com/paper/learning-probabilistic-trajectory-models-of
Repo https://github.com/sisl/terminal-airspace-models
Framework none
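
As a concrete, hedged illustration of learning a generative trajectory model from position data, one simple baseline is to resample each track to a fixed length and fit a Gaussian mixture that can then be sampled and scored. The paper's model is more structured than this, so the snippet below is only a stand-in for the overall workflow, with toy data in place of radar tracks.

```python
# One plausible baseline in this spirit (not necessarily the authors' exact model): resample each
# trajectory to a fixed length, flatten, and fit a Gaussian mixture that can be sampled from.
import numpy as np
from sklearn.mixture import GaussianMixture

def resample(traj, n_points=50):
    """Linearly resample a (T, 3) position track (x, y, altitude) to n_points samples."""
    t_old = np.linspace(0.0, 1.0, len(traj))
    t_new = np.linspace(0.0, 1.0, n_points)
    return np.column_stack([np.interp(t_new, t_old, traj[:, k]) for k in range(traj.shape[1])])

# toy data: 200 "radar tracks" of varying length
tracks = [np.cumsum(np.random.randn(np.random.randint(60, 120), 3), axis=0) for _ in range(200)]
X = np.stack([resample(tr).ravel() for tr in tracks])         # (200, 50*3)

gmm = GaussianMixture(n_components=5, covariance_type="diag").fit(X)
generated = gmm.sample(10)[0].reshape(10, 50, 3)               # 10 synthetic trajectories
log_probs = gmm.score_samples(X[:5])                           # likelihoods for real tracks
```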

Efficient Collaborative Multi-Agent Deep Reinforcement Learning for Large-Scale Fleet Management

Title Efficient Collaborative Multi-Agent Deep Reinforcement Learning for Large-Scale Fleet Management
Authors Kaixiang Lin, Renyu Zhao, Zhe Xu, Jiayu Zhou
Abstract Large-scale online ride-sharing platforms have substantially transformed our lives by reallocating transportation resources to alleviate traffic congestion and promote transportation efficiency. An efficient fleet management strategy can not only significantly improve the utilization of transportation resources but also increase revenue and customer satisfaction. It is a challenging task to design an effective fleet management strategy that can adapt to an environment involving complex dynamics between demand and supply. Existing studies usually work on a simplified problem setting that can hardly capture the complicated stochastic demand-supply variations in high-dimensional space. In this paper, we propose to tackle the large-scale fleet management problem using reinforcement learning, and propose a contextual multi-agent reinforcement learning framework including three concrete algorithms to achieve coordination among a large number of agents adaptive to different contexts. We show significant improvements of the proposed framework over state-of-the-art approaches through extensive empirical studies.
Tasks Multi-agent Reinforcement Learning, Q-Learning
Published 2018-02-18
URL https://arxiv.org/abs/1802.06444v3
PDF https://arxiv.org/pdf/1802.06444v3.pdf
PWC https://paperswithcode.com/paper/efficient-large-scale-fleet-management-via
Repo https://github.com/cambriandot/papers
Framework none
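
One way to picture the contextual multi-agent setup is a shared, context-conditioned value function used by every vehicle in a grid cell. The toy sketch below is a plain tabular Q-learning loop with that parameter sharing; it is far simpler than the paper's three algorithms and is meant only to illustrate coordination through sharing, with all environment quantities simulated.

```python
# Toy sketch of parameter-sharing contextual Q-learning for dispatch: all idle vehicles in a cell
# share one Q-table indexed by (cell, action), and a local demand-supply gap enters the reward.
# Purely illustrative; not the paper's algorithms.
import numpy as np

N_CELLS, N_ACTIONS = 25, 7            # toy hexagonal grid of 25 cells; stay or move to 6 neighbours
Q = np.zeros((N_CELLS, N_ACTIONS))

def q_update(cell, action, reward, next_cell, alpha=0.1, gamma=0.95):
    """Shared Q-learning update applied once per agent transition."""
    td_target = reward + gamma * Q[next_cell].max()
    Q[cell, action] += alpha * (td_target - Q[cell, action])

rng = np.random.default_rng(0)
for _ in range(100):                                   # one simulated step for a fleet of 100 agents
    cell = rng.integers(N_CELLS)
    action = rng.integers(N_ACTIONS) if rng.random() < 0.1 else Q[cell].argmax()
    demand_gap = rng.normal()                          # contextual signal: local demand minus supply
    reward = demand_gap - 0.1 * (action != 0)          # serve demand, small cost for repositioning
    next_cell = rng.integers(N_CELLS)                  # stand-in for the actual transition dynamics
    q_update(cell, action, reward, next_cell)
```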

Parameter-Free Spatial Attention Network for Person Re-Identification

Title Parameter-Free Spatial Attention Network for Person Re-Identification
Authors Haoran Wang, Yue Fan, Zexin Wang, Licheng Jiao, Bernt Schiele
Abstract Global average pooling (GAP) allows localization of discriminative information for recognition [40]. While GAP helps the convolutional neural network attend to the most discriminative features of an object, it may suffer if that information is missing, e.g., due to camera viewpoint changes. To circumvent this issue, we argue that it is advantageous to attend to the global configuration of the object by modeling spatial relations among high-level features. We propose a novel architecture for person re-identification based on a parameter-free spatial attention layer that introduces spatial relations among the feature map activations back into the model. Our spatial attention layer consistently improves the performance over the model without it. Results on four benchmarks demonstrate the superiority of our model over the state of the art, achieving rank-1 accuracies of 94.7% on Market-1501, 89.0% on DukeMTMC-ReID, 74.9% on CUHK03-labeled and 69.7% on CUHK03-detected.
Tasks Person Re-Identification
Published 2018-11-29
URL http://arxiv.org/abs/1811.12150v1
PDF http://arxiv.org/pdf/1811.12150v1.pdf
PWC https://paperswithcode.com/paper/parameter-free-spatial-attention-network-for
Repo https://github.com/HRanWang/Spatial-Attention
Framework pytorch
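
Because the attention layer is parameter-free, the spatial weights must come from the activations themselves. One plausible formulation (an assumption, not necessarily the paper's exact operator) sums the feature map over channels and normalizes with a softmax over spatial locations:

```python
# A hypothetical parameter-free spatial attention layer in the spirit of the paper: spatial weights
# are computed from the activations themselves, so the layer adds no learnable parameters.
import torch

def spatial_attention(feat):
    """feat: (B, C, H, W) -> re-weighted feature map of the same shape."""
    b, c, h, w = feat.shape
    energy = feat.sum(dim=1).view(b, h * w)             # aggregate evidence per spatial location
    attn = torch.softmax(energy, dim=-1).view(b, 1, h, w)
    return feat * attn * (h * w)                        # rescale so the average weight is one

x = torch.randn(2, 256, 24, 8)                           # typical re-ID backbone feature map
out = spatial_attention(x)
```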

TFLMS: Large Model Support in TensorFlow by Graph Rewriting

Title TFLMS: Large Model Support in TensorFlow by Graph Rewriting
Authors Tung D. Le, Haruki Imai, Yasushi Negishi, Kiyokuni Kawachiya
Abstract While accelerators such as GPUs have limited memory, deep neural networks are becoming larger and will not fit within the memory limits of accelerators for training. We propose an approach to tackle this problem by rewriting the computational graph of a neural network, in which swap-out and swap-in operations are inserted to temporarily store intermediate results in CPU memory. In particular, we first revise the concept of a computational graph by defining a concrete semantics for variables in a graph. We then formally show how to derive swap-out and swap-in operations from an existing graph and present rules to optimize the graph. To realize our approach, we developed a module in TensorFlow, named TFLMS. TFLMS is published as a pull request in the TensorFlow repository for contributing to the TensorFlow community. With TFLMS, we were able to train ResNet-50 and 3DUnet with 4.7x and 2x larger batch sizes, respectively. In particular, we were able to train 3DUNet using images of size $192^3$ for image segmentation, which, without TFLMS, had been possible only by dividing the images into smaller images, which affects accuracy.
Tasks Semantic Segmentation
Published 2018-07-05
URL https://arxiv.org/abs/1807.02037v2
PDF https://arxiv.org/pdf/1807.02037v2.pdf
PWC https://paperswithcode.com/paper/tflms-large-model-support-in-tensorflow-by
Repo https://github.com/IBM/tensorflow-large-model-support
Framework tf
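
The rewrite inserts swap-out/swap-in operations so that tensors with distant consumers are parked in CPU memory in between. The toy sketch below mimics that decision on a linearized op list; the real TFLMS module operates on the TensorFlow graph itself and also applies the optimization rules described in the paper.

```python
# A toy illustration of the swap-out / swap-in rewrite (not the TFLMS implementation): when a
# tensor produced early is consumed much later, insert a copy-to-host op after the producer and a
# copy-to-device op right before the consumer, so the tensor does not occupy GPU memory in between.
ops = ["conv1", "conv2", "conv3", "conv4", "grad_conv1"]          # linearized compute order
consumers = {"conv1": ["conv2", "grad_conv1"]}                     # who reads each op's output

def rewrite_with_swaps(ops, consumers, distance_threshold=2):
    rewritten = list(ops)
    for producer, users in consumers.items():
        for user in users:
            gap = rewritten.index(user) - rewritten.index(producer)
            if gap > distance_threshold:                            # far-apart consumer -> swap
                rewritten.insert(rewritten.index(producer) + 1, f"swap_out({producer})")
                rewritten.insert(rewritten.index(user), f"swap_in({producer})")
    return rewritten

print(rewrite_with_swaps(ops, consumers))
# ['conv1', 'swap_out(conv1)', 'conv2', 'conv3', 'conv4', 'swap_in(conv1)', 'grad_conv1']
```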

Densely Connected Attention Propagation for Reading Comprehension

Title Densely Connected Attention Propagation for Reading Comprehension
Authors Yi Tay, Luu Anh Tuan, Siu Cheung Hui, Jian Su
Abstract We propose DecaProp (Densely Connected Attention Propagation), a new densely connected neural architecture for reading comprehension (RC). There are two distinct characteristics of our model. Firstly, our model densely connects all pairwise layers of the network, modeling relationships between passage and query across all hierarchical levels. Secondly, the dense connectors in our network are learned via attention instead of standard residual skip-connectors. To this end, we propose novel Bidirectional Attention Connectors (BAC) for efficiently forging connections throughout the network. We conduct extensive experiments on four challenging RC benchmarks. Our proposed approach achieves state-of-the-art results on all four, outperforming existing baselines by up to 2.6%-14.2% in absolute F1 score.
Tasks Open-Domain Question Answering, Question Answering, Reading Comprehension
Published 2018-11-10
URL http://arxiv.org/abs/1811.04210v2
PDF http://arxiv.org/pdf/1811.04210v2.pdf
PWC https://paperswithcode.com/paper/densely-connected-attention-propagation-for
Repo https://github.com/vanzytay/NIPS2018_DECAPROP
Framework tf
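
The Bidirectional Attention Connectors compute attention in both directions between passage and query states and compress the result to a small vector so that every pair of layers can be connected cheaply. A rough sketch, with the compression width and the dot-product scoring as assumptions:

```python
# A rough sketch of a bidirectional attention connector between passage and query states at one
# layer (dimensions and the compression layer are assumptions, not the paper's exact design).
import torch
import torch.nn as nn

class BidirAttentionConnector(nn.Module):
    def __init__(self, dim, out_dim=2):
        super().__init__()
        self.compress = nn.Linear(2 * dim, out_dim)    # low-dimensional output keeps connectors cheap

    def forward(self, passage, query):                 # (B, Lp, d), (B, Lq, d)
        affinity = torch.bmm(passage, query.transpose(1, 2))             # (B, Lp, Lq)
        p2q = torch.softmax(affinity, dim=-1) @ query                    # passage attends to query
        q2p = torch.softmax(affinity, dim=1).transpose(1, 2) @ passage   # query attends to passage
        p_feat = self.compress(torch.cat([passage, p2q], dim=-1))        # (B, Lp, out_dim)
        q_feat = self.compress(torch.cat([query, q2p], dim=-1))          # (B, Lq, out_dim)
        return p_feat, q_feat                          # small features propagated to later layers

bac = BidirAttentionConnector(dim=128)
p, q = bac(torch.randn(2, 300, 128), torch.randn(2, 30, 128))
```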

RedNet: Residual Encoder-Decoder Network for indoor RGB-D Semantic Segmentation

Title RedNet: Residual Encoder-Decoder Network for indoor RGB-D Semantic Segmentation
Authors Jindong Jiang, Lunan Zheng, Fei Luo, Zhijun Zhang
Abstract Indoor semantic segmentation has always been a difficult task in computer vision. In this paper, we propose an RGB-D residual encoder-decoder architecture, named RedNet, for indoor RGB-D semantic segmentation. In RedNet, the residual module is applied to both the encoder and decoder as the basic building block, and the skip-connection is used to bypass the spatial feature between the encoder and decoder. In order to incorporate the depth information of the scene, a fusion structure is constructed, which makes inference on RGB image and depth image separately, and fuses their features over several layers. In order to efficiently optimize the network’s parameters, we propose a ‘pyramid supervision’ training scheme, which applies supervised learning over different layers in the decoder, to cope with the problem of gradients vanishing. Experiment results show that the proposed RedNet(ResNet-50) achieves a state-of-the-art mIoU accuracy of 47.8% on the SUN RGB-D benchmark dataset.
Tasks Semantic Segmentation
Published 2018-06-04
URL http://arxiv.org/abs/1806.01054v2
PDF http://arxiv.org/pdf/1806.01054v2.pdf
PWC https://paperswithcode.com/paper/rednet-residual-encoder-decoder-network-for
Repo https://github.com/JindongJiang/RedNet
Framework pytorch
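
The fusion structure encodes RGB and depth in separate branches and merges the depth features into the RGB stream at several stages; the fused maps then feed decoder skip-connections and the pyramid-supervision losses. A much-simplified two-stage sketch of that encoder-side fusion (layer sizes are illustrative, not RedNet's ResNet-50 backbone):

```python
# Sketch of the two-branch fusion idea: RGB and depth are encoded separately and the depth
# features are added into the RGB stream at matching stages (a simplification of RedNet).
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                         nn.BatchNorm2d(c_out), nn.ReLU())

rgb_stages = nn.ModuleList([conv_block(3, 64), conv_block(64, 128)])
depth_stages = nn.ModuleList([conv_block(1, 64), conv_block(64, 128)])

def encode(rgb, depth):
    fused = []
    x, d = rgb, depth
    for rgb_stage, depth_stage in zip(rgb_stages, depth_stages):
        x, d = rgb_stage(x), depth_stage(d)
        x = x + d                        # fuse depth features into the RGB stream
        fused.append(x)                  # kept for decoder skip-connections / pyramid supervision
    return fused

feats = encode(torch.randn(1, 3, 480, 640), torch.randn(1, 1, 480, 640))
```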

Neural Message Passing with Edge Updates for Predicting Properties of Molecules and Materials

Title Neural Message Passing with Edge Updates for Predicting Properties of Molecules and Materials
Authors Peter Bjørn Jørgensen, Karsten Wedel Jacobsen, Mikkel N. Schmidt
Abstract Neural message passing on molecular graphs is one of the most promising methods for predicting formation energy and other properties of molecules and materials. In this work we extend the neural message passing model with an edge update network, which allows the information exchanged between atoms to depend on the hidden state of the receiving atom. We benchmark the proposed model on three publicly available datasets (QM9, The Materials Project and OQMD) and show that the proposed model yields superior prediction of formation energies and other properties on all three datasets in comparison with the best published results. Furthermore, we investigate different methods for constructing the graph used to represent crystalline structures, and we find that using a graph based on K-nearest neighbors achieves better prediction accuracy than using a maximum distance cutoff or the Voronoi tessellation graph.
Tasks Formation Energy
Published 2018-06-08
URL http://arxiv.org/abs/1806.03146v1
PDF http://arxiv.org/pdf/1806.03146v1.pdf
PWC https://paperswithcode.com/paper/neural-message-passing-with-edge-updates-for
Repo https://github.com/toshi-k/kaggle-champs-scalar-coupling
Framework none
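
The key modification is an edge update network, so the state of an edge is recomputed from both endpoints, including the receiving atom, before messages are aggregated. A compact single-step sketch (network sizes and the GRU node update are assumptions, not the paper's exact networks):

```python
# A minimal message-passing step with an edge update network, where the new edge state depends on
# both endpoint hidden states, including the receiving atom's (a sketch, not the paper's code).
import torch
import torch.nn as nn

dim = 32
edge_update = nn.Sequential(nn.Linear(3 * dim, dim), nn.Tanh())
message_net = nn.Sequential(nn.Linear(2 * dim, dim), nn.Tanh())
node_update = nn.GRUCell(dim, dim)

def mp_step(h, e, edges):
    """h: (N, dim) node states, e: (E, dim) edge states, edges: list of (sender, receiver)."""
    send, recv = zip(*edges)
    send, recv = torch.tensor(send), torch.tensor(recv)
    e = edge_update(torch.cat([h[send], h[recv], e], dim=-1))      # edge state sees the receiver
    msgs = message_net(torch.cat([h[send], e], dim=-1))
    agg = torch.zeros_like(h).index_add_(0, recv, msgs)            # sum messages per receiving atom
    return node_update(agg, h), e

h = torch.randn(4, dim)                       # 4 atoms
edges = [(0, 1), (1, 0), (1, 2), (2, 3)]
e = torch.randn(len(edges), dim)
h, e = mp_step(h, e, edges)
```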

Modelling sparsity, heterogeneity, reciprocity and community structure in temporal interaction data

Title Modelling sparsity, heterogeneity, reciprocity and community structure in temporal interaction data
Authors Xenia Miscouridou, François Caron, Yee Whye Teh
Abstract We propose a novel class of network models for temporal dyadic interaction data. Our goal is to capture a number of important features often observed in social interactions: sparsity, degree heterogeneity, community structure and reciprocity. We propose a family of models based on self-exciting Hawkes point processes in which events depend on the history of the process. The key component is the conditional intensity function of the Hawkes process, which captures the fact that interactions may arise as a response to past interactions (reciprocity), or due to shared interests between individuals (community structure). In order to capture sparsity and degree heterogeneity, the base (non-time-dependent) part of the intensity function builds on compound random measures following Todeschini et al. (2016). We conduct experiments on a variety of real-world temporal interaction data and show that the proposed model outperforms many competing approaches for link prediction, and leads to interpretable parameters.
Tasks Link Prediction, Point Processes
Published 2018-03-16
URL http://arxiv.org/abs/1803.06070v2
PDF http://arxiv.org/pdf/1803.06070v2.pdf
PWC https://paperswithcode.com/paper/modelling-sparsity-heterogeneity-reciprocity
Repo https://github.com/OxCSML-BayesNP/HawkesNetOC
Framework none
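
Reciprocity enters through the self-exciting part of the conditional intensity: past events on the reverse direction of a pair raise the rate of future events on that pair. A small sketch of such an intensity with an exponential kernel (parameter values are illustrative; the paper's full model adds the compound-random-measure base rates):

```python
# A small sketch of a self-exciting (Hawkes) conditional intensity for a directed pair (i, j):
# a constant base rate plus exponentially decaying excitation from earlier reverse-direction events.
import numpy as np

def hawkes_intensity(t, base_rate, past_event_times, alpha=0.8, beta=1.5):
    """lambda_ij(t) = mu_ij + alpha * sum_{t_k < t} exp(-beta * (t - t_k))."""
    past = np.asarray([s for s in past_event_times if s < t])
    return base_rate + alpha * np.exp(-beta * (t - past)).sum()

# events j -> i at these times raise the intensity of future i -> j events (reciprocity)
events_ji = [0.5, 1.1, 1.2]
print(hawkes_intensity(1.3, base_rate=0.2, past_event_times=events_ji))
```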

Robust Ordinal Embedding from Contaminated Relative Comparisons

Title Robust Ordinal Embedding from Contaminated Relative Comparisons
Authors Ke Ma, Qianqian Xu, Xiaochun Cao
Abstract Existing ordinal embedding methods usually follow a two-stage routine: outlier detection is first employed to pick out the inconsistent comparisons; then an embedding is learned from the clean data. However, learning in a multi-stage manner is well known to suffer from sub-optimal solutions. In this paper, we propose a unified framework to jointly identify the contaminated comparisons and derive reliable embeddings. The merits of our method are three-fold: (1) by virtue of the proposed unified framework, the sub-optimality of traditional methods is largely alleviated; (2) the proposed method is aware of global inconsistency by minimizing a corresponding cost, while traditional methods only involve local inconsistency; (3) instead of considering the nuclear norm heuristics, we adopt an exact solution for the rank equality constraint. Our studies are supported by experiments with both simulated examples and real-world data. The proposed framework provides us a promising tool for robust ordinal embedding from contaminated comparisons.
Tasks Outlier Detection
Published 2018-12-05
URL http://arxiv.org/abs/1812.01945v1
PDF http://arxiv.org/pdf/1812.01945v1.pdf
PWC https://paperswithcode.com/paper/robust-ordinal-embedding-from-contaminated
Repo https://github.com/alphaprime/ROE
Framework none
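
The unified formulation learns the embedding and identifies contaminated comparisons jointly. The sketch below captures that spirit with a per-triplet outlier slack kept sparse by an L1 penalty; the paper instead uses a global-inconsistency cost and an exact treatment of the rank constraint, so this is only a soft-penalty analogue on toy data.

```python
# A hedged sketch of joint embedding + outlier identification from triplet comparisons.
import torch

n, d = 50, 2
X = torch.randn(n, d, requires_grad=True)                 # embedding to be learned
triplets = torch.randint(0, n, (200, 3))                  # (i, j, k): i should be closer to j than to k
s = torch.zeros(len(triplets), requires_grad=True)        # outlier slack per comparison
opt = torch.optim.Adam([X, s], lr=0.05)

for _ in range(100):
    i, j, k = triplets.T
    d_ij = (X[i] - X[j]).pow(2).sum(1)
    d_ik = (X[i] - X[k]).pow(2).sum(1)
    hinge = torch.relu(1.0 + d_ij - d_ik - s)              # violations can be absorbed by s ...
    loss = hinge.mean() + 0.1 * s.abs().mean()             # ... but s is kept sparse by an L1 term
    opt.zero_grad()
    loss.backward()
    opt.step()
```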