July 30, 2019

Paper Group AWR 23

Estimating Missing Data in Temporal Data Streams Using Multi-directional Recurrent Neural Networks


Title	Estimating Missing Data in Temporal Data Streams Using Multi-directional Recurrent Neural Networks
Authors	Jinsung Yoon, William R. Zame, Mihaela van der Schaar
Abstract	Missing data is a ubiquitous problem. It is especially challenging in medical settings because many streams of measurements are collected at different - and often irregular - times. Accurate estimation of those missing measurements is critical for many reasons, including diagnosis, prognosis and treatment. Existing methods address this estimation problem by interpolating within data streams or imputing across data streams (both of which ignore important information) or ignoring the temporal aspect of the data and imposing strong assumptions about the nature of the data-generating process and/or the pattern of missing data (both of which are especially problematic for medical data). We propose a new approach, based on a novel deep learning architecture that we call a Multi-directional Recurrent Neural Network (M-RNN) that interpolates within data streams and imputes across data streams. We demonstrate the power of our approach by applying it to five real-world medical datasets. We show that it provides dramatically improved estimation of missing measurements in comparison to 11 state-of-the-art benchmarks (including Spline and Cubic Interpolations, MICE, MissForest, matrix completion and several RNN methods); typical improvements in Root Mean Square Error are between 35% - 50%. Additional experiments based on the same five datasets demonstrate that the improvements provided by our method are extremely robust.
Tasks	Matrix Completion, Multivariate Time Series Imputation
Published	2017-11-23
URL	http://arxiv.org/abs/1711.08742v1
PDF	http://arxiv.org/pdf/1711.08742v1.pdf
PWC	https://paperswithcode.com/paper/estimating-missing-data-in-temporal-data
Repo	https://github.com/jsyoon0823/MRNN
Framework	tf

A Simple Exponential Family Framework for Zero-Shot Learning


Title	A Simple Exponential Family Framework for Zero-Shot Learning
Authors	Vinay Kumar Verma, Piyush Rai
Abstract	We present a simple generative framework for learning to predict previously unseen classes, based on estimating class-attribute-gated class-conditional distributions. We model each class-conditional distribution as an exponential family distribution and the parameters of the distribution of each seen/unseen class are defined as functions of the respective observed class attributes. These functions can be learned using only the seen class data and can be used to predict the parameters of the class-conditional distribution of each unseen class. Unlike most existing methods for zero-shot learning that represent classes as fixed embeddings in some vector space, our generative model naturally represents each class as a probability distribution. It is simple to implement and also allows leveraging additional unlabeled data from unseen classes to improve the estimates of their class-conditional distributions using transductive/semi-supervised learning. Moreover, it extends seamlessly to few-shot learning by easily updating these distributions when provided with a small number of additional labelled examples from unseen classes. Through a comprehensive set of experiments on several benchmark data sets, we demonstrate the efficacy of our framework.
Tasks	Few-Shot Learning, Zero-Shot Learning
Published	2017-07-25
URL	http://arxiv.org/abs/1707.08040v3
PDF	http://arxiv.org/pdf/1707.08040v3.pdf
PWC	https://paperswithcode.com/paper/a-simple-exponential-family-framework-for
Repo	https://github.com/vkverma01/Zero-Shot-Learning
Framework	none

Deep Generative Adversarial Networks for Compressed Sensing Automates MRI


Title	Deep Generative Adversarial Networks for Compressed Sensing Automates MRI
Authors	Morteza Mardani, Enhao Gong, Joseph Y. Cheng, Shreyas Vasanawala, Greg Zaharchuk, Marcus Alley, Neil Thakur, Song Han, William Dally, John M. Pauly, Lei Xing
Abstract	Magnetic resonance image (MRI) reconstruction is a severely ill-posed linear inverse task demanding time and resource intensive computations that can substantially trade off accuracy for speed in real-time imaging. In addition, state-of-the-art compressed sensing (CS) analytics are not cognizant of the image diagnostic quality. To cope with these challenges we put forth a novel CS framework that permeates benefits from generative adversarial networks (GAN) to train a (low-dimensional) manifold of diagnostic-quality MR images from historical patients. Leveraging a mixture of least-squares (LS) GANs and pixel-wise $\ell_1$ cost, a deep residual network with skip connections is trained as the generator that learns to remove the aliasing artifacts by projecting onto the manifold. LSGAN learns the texture details, while $\ell_1$ controls the high-frequency noise. A multilayer convolutional neural network is then jointly trained based on diagnostic quality images to discriminate the projection quality. The test phase performs feed-forward propagation over the generator network that demands a very low computational overhead. Extensive evaluations are performed on a large contrast-enhanced MR dataset of pediatric patients. In particular, images rated based on expert radiologists corroborate that GANCS retrieves high contrast images with detailed texture relative to conventional CS, and pixel-wise schemes. In addition, it offers reconstruction under a few milliseconds, two orders of magnitude faster than state-of-the-art CS-MRI schemes.
Tasks
Published	2017-05-31
URL	http://arxiv.org/abs/1706.00051v1
PDF	http://arxiv.org/pdf/1706.00051v1.pdf
PWC	https://paperswithcode.com/paper/deep-generative-adversarial-networks-for
Repo	https://github.com/gongenhao/GANCS
Framework	tf

Deep Learning Sparse Ternary Projections for Compressed Sensing of Images


Title	Deep Learning Sparse Ternary Projections for Compressed Sensing of Images
Authors	Duc Minh Nguyen, Evaggelia Tsiligianni, Nikos Deligiannis
Abstract	Compressed sensing (CS) is a sampling theory that allows reconstruction of sparse (or compressible) signals from an incomplete number of measurements, using a sensing mechanism implemented by an appropriate projection matrix. The CS theory is based on random Gaussian projection matrices, which satisfy recovery guarantees with high probability; however, sparse ternary {0, -1, +1} projections are more suitable for hardware implementation. In this paper, we present a deep learning approach to obtain very sparse ternary projections for compressed sensing. Our deep learning architecture jointly learns a pair of a projection matrix and a reconstruction operator in an end-to-end fashion. The experimental results on real images demonstrate the effectiveness of the proposed approach compared to state-of-the-art methods, with significant advantage in terms of complexity.
Tasks
Published	2017-08-28
URL	http://arxiv.org/abs/1708.08311v1
PDF	http://arxiv.org/pdf/1708.08311v1.pdf
PWC	https://paperswithcode.com/paper/deep-learning-sparse-ternary-projections-for
Repo	https://github.com/nmduc/deep-ternary
Framework	tf

Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time


Title	Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time
Authors	Chantat Eksombatchai, Pranav Jindal, Jerry Zitao Liu, Yuchen Liu, Rahul Sharma, Charles Sugnet, Mark Ulrich, Jure Leskovec
Abstract	User experience in modern content discovery applications critically depends on high-quality personalized recommendations. However, building systems that provide such recommendations presents a major challenge due to a massive pool of items, a large number of users, and requirements for recommendations to be responsive to user actions and generated on demand in real-time. Here we present Pixie, a scalable graph-based real-time recommender system that we developed and deployed at Pinterest. Given a set of user-specific pins as a query, Pixie selects in real-time from billions of possible pins those that are most related to the query. To generate recommendations, we develop the Pixie Random Walk algorithm that utilizes the Pinterest object graph of 3 billion nodes and 17 billion edges. Experiments show that recommendations provided by Pixie lead to up to 50% higher user engagement when compared to the previous Hadoop-based production system. Furthermore, we develop a graph pruning strategy that leads to an additional 58% improvement in recommendations. Last, we discuss system aspects of Pixie, where a single server executes 1,200 recommendation requests per second with 60 millisecond latency. Today, systems backed by Pixie contribute to more than 80% of all user engagement on Pinterest.
Tasks	Recommendation Systems
Published	2017-11-21
URL	http://arxiv.org/abs/1711.07601v1
PDF	http://arxiv.org/pdf/1711.07601v1.pdf
PWC	https://paperswithcode.com/paper/pixie-a-system-for-recommending-3-billion
Repo	https://github.com/jd557/pixie-rust
Framework	none
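
The abstract describes recommendations generated by random walks over a graph that start from the user's query pins. The following is only a minimal sketch of that general idea, not Pixie's actual algorithm: it assumes a bipartite pin/board graph stored in two plain dictionaries, a fixed restart probability, and visit counts as the ranking score. The function name `random_walk_recommend` and all of its parameters are hypothetical.

```python
import random
from collections import Counter


def random_walk_recommend(pin_to_boards, board_to_pins, query_pin,
                          steps=1000, restart_prob=0.5, top_k=3, seed=0):
    """Rank pins by how often a restarting random walk from query_pin visits them.

    pin_to_boards: dict mapping each pin to a non-empty list of boards.
    board_to_pins: dict mapping each board to a non-empty list of pins.
    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    visits = Counter()
    current = query_pin
    for _ in range(steps):
        # One step: hop from the current pin to a random board, then to a random pin.
        board = rng.choice(pin_to_boards[current])
        current = rng.choice(board_to_pins[board])
        visits[current] += 1
        # Occasionally jump back to the query so the walk stays local.
        if rng.random() < restart_prob:
            current = query_pin
    visits.pop(query_pin, None)  # do not recommend the query itself
    return [pin for pin, _ in visits.most_common(top_k)]


# Toy graph: p1 and p2 share board b1; p3 and p4 are only reachable via b2.
pin_to_boards = {"p1": ["b1"], "p2": ["b1"], "p3": ["b2"], "p4": ["b2"]}
board_to_pins = {"b1": ["p1", "p2"], "b2": ["p3", "p4"]}
recommended = random_walk_recommend(pin_to_boards, board_to_pins, "p1")
```

In this toy graph the walk can never leave board `b1`, so only `p2` can be recommended for the query `p1`.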

Weighted-SVD: Matrix Factorization with Weights on the Latent Factors


Title	Weighted-SVD: Matrix Factorization with Weights on the Latent Factors
Authors	Hung-Hsuan Chen
Abstract	The Matrix Factorization models, sometimes called the latent factor models, are a family of methods in the recommender system research area to (1) generate the latent factors for the users and the items and (2) predict users’ ratings on items based on their latent factors. However, current Matrix Factorization models presume that all the latent factors are equally weighted, which may not always be a reasonable assumption in practice. In this paper, we propose a new model, called Weighted-SVD, to integrate the linear regression model with the SVD model such that each latent factor is accompanied by a corresponding weight parameter. This mechanism allows the latent factors to have different weights to influence the final ratings. The complexity of the Weighted-SVD model is slightly larger than the SVD model but much smaller than the SVD++ model. We compared the Weighted-SVD model with several latent factor models on five public datasets based on the Root-Mean-Squared-Errors (RMSEs). The results show that the Weighted-SVD model outperforms the baseline methods in all the experimental datasets under almost all settings.
Tasks	Recommendation Systems
Published	2017-10-02
URL	http://arxiv.org/abs/1710.00482v1
PDF	http://arxiv.org/pdf/1710.00482v1.pdf
PWC	https://paperswithcode.com/paper/weighted-svd-matrix-factorization-with
Repo	https://github.com/demianbucik/collaborative-filtering-recommender-systems
Framework	none
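
To make the idea of a weight per latent factor concrete, here is a minimal sketch that contrasts a plain latent-factor dot product with a weighted one. It assumes the prediction is a weighted sum over factor products and leaves out bias terms and training; the paper's exact formulation may differ. The function names and the weight vector `w` are hypothetical.

```python
def svd_predict(p_u, q_i):
    """Plain latent-factor score: every factor contributes with weight 1."""
    return sum(p * q for p, q in zip(p_u, q_i))


def weighted_svd_predict(p_u, q_i, w):
    """Weighted score: factor k contributes w[k] * p_u[k] * q_i[k].

    p_u: user latent factors, q_i: item latent factors, w: one weight per factor.
    This form is an illustrative assumption, not the paper's stated equation.
    """
    if not (len(p_u) == len(q_i) == len(w)):
        raise ValueError("p_u, q_i and w must have the same length")
    return sum(wk * p * q for wk, p, q in zip(w, p_u, q_i))


p_u = [1.0, 2.0]
q_i = [3.0, 4.0]
plain = svd_predict(p_u, q_i)                            # 1*3 + 2*4
weighted = weighted_svd_predict(p_u, q_i, [0.5, 2.0])    # 0.5*3 + 2*8
```

With all weights set to 1 the weighted score reduces to the plain dot product, which is the equal-weighting assumption the abstract questions.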

Using Posters to Recommend Anime and Mangas in a Cold-Start Scenario


Title	Using Posters to Recommend Anime and Mangas in a Cold-Start Scenario
Authors	Jill-Jênn Vie, Florian Yger, Ryan Lahfa, Basile Clement, Kévin Cocchi, Thomas Chalumeau, Hisashi Kashima
Abstract	Item cold-start is a classical issue in recommender systems that affects anime and manga recommendations as well. This problem can be framed as follows: how to predict whether a user will like a manga that received few ratings from the community? Content-based techniques can alleviate this issue but require extra information that is usually expensive to gather. In this paper, we use a deep learning technique, Illustration2Vec, to easily extract tag information from the manga and anime posters (e.g., sword, or ponytail). We propose BALSE (Blended Alternate Least Squares with Explanation), a new model for collaborative filtering, that benefits from this extra information to recommend mangas. We show, using real data from an online manga recommender system called Mangaki, that our model substantially improves the quality of recommendations, especially for less-known manga, and is able to provide an interpretation of the taste of the users.
Tasks	Recommendation Systems
Published	2017-09-03
URL	http://arxiv.org/abs/1709.01584v2
PDF	http://arxiv.org/pdf/1709.01584v2.pdf
PWC	https://paperswithcode.com/paper/using-posters-to-recommend-anime-and-mangas
Repo	https://github.com/mangaki/balse
Framework	tf

Fine-Grained Head Pose Estimation Without Keypoints


Title	Fine-Grained Head Pose Estimation Without Keypoints
Authors	Nataniel Ruiz, Eunji Chong, James M. Rehg
Abstract	Estimating the head pose of a person is a crucial problem that has a large number of applications such as aiding in gaze estimation, modeling attention, fitting 3D models to video and performing face alignment. Traditionally, head pose is computed by estimating some keypoints from the target face and solving the 2D to 3D correspondence problem with a mean human head model. We argue that this is a fragile method because it relies entirely on landmark detection performance, the extraneous head model and an ad-hoc fitting step. We present an elegant and robust way to determine pose by training a multi-loss convolutional neural network on 300W-LP, a large synthetically expanded dataset, to predict intrinsic Euler angles (yaw, pitch and roll) directly from image intensities through joint binned pose classification and regression. We present empirical tests on common in-the-wild pose benchmark datasets which show state-of-the-art results. Additionally, we test our method on a dataset usually used for pose estimation using depth and start to close the gap with state-of-the-art depth pose methods. We open-source our training and testing code as well as release our pre-trained models.
Tasks	Face Alignment, Gaze Estimation, Head Pose Estimation, Pose Estimation
Published	2017-10-02
URL	http://arxiv.org/abs/1710.00925v5
PDF	http://arxiv.org/pdf/1710.00925v5.pdf
PWC	https://paperswithcode.com/paper/fine-grained-head-pose-estimation-without
Repo	https://github.com/chenyeheng/SmartCar-FaceRec
Framework	tf
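
One common way to combine binned classification with regression is to turn the per-bin scores into probabilities and take the expected angle over the bin centers. The sketch below illustrates that step only, under the assumption that this is how the continuous angle is read out; the bin centers, function names and values are hypothetical and not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of bin scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_angle(logits, bin_centers):
    """Continuous angle estimate: probability-weighted mean of the bin centers."""
    probs = softmax(logits)
    return sum(p * c for p, c in zip(probs, bin_centers))


# Three illustrative yaw bins centered at -30, 0 and +30 degrees.
centers = [-30.0, 0.0, 30.0]
undecided = expected_angle([0.0, 0.0, 0.0], centers)    # uniform probabilities
confident = expected_angle([100.0, 0.0, 0.0], centers)  # mass on the first bin
```

A classification loss can then be applied to the bin scores and a regression loss to the expected angle, which is one reading of the "multi-loss" setup mentioned in the abstract.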

HandSeg: An Automatically Labeled Dataset for Hand Segmentation from Depth Images


Title	HandSeg: An Automatically Labeled Dataset for Hand Segmentation from Depth Images
Authors	Abhishake Kumar Bojja, Franziska Mueller, Sri Raghu Malireddi, Markus Oberweger, Vincent Lepetit, Christian Theobalt, Kwang Moo Yi, Andrea Tagliasacchi
Abstract	We propose an automatic method for generating high-quality annotations for depth-based hand segmentation, and introduce a large-scale hand segmentation dataset. Existing datasets are typically limited to a single hand. By exploiting the visual cues given by an RGBD sensor and a pair of colored gloves, we automatically generate dense annotations for two-hand segmentation. This lowers the cost/complexity of creating high-quality datasets, and makes it easy to expand the dataset in the future. We further show that existing datasets, even with data augmentation, are not sufficient to train a hand segmentation algorithm that can distinguish two hands. Source and datasets will be made publicly available.
Tasks	Data Augmentation, Hand Segmentation
Published	2017-11-16
URL	http://arxiv.org/abs/1711.05944v4
PDF	http://arxiv.org/pdf/1711.05944v4.pdf
PWC	https://paperswithcode.com/paper/handseg-an-automatically-labeled-dataset-for
Repo	https://github.com/lukasuz/List-of-Hand-Segmentation-Data-Sets
Framework	none

Emergent Communication in a Multi-Modal, Multi-Step Referential Game


Title	Emergent Communication in a Multi-Modal, Multi-Step Referential Game
Authors	Katrina Evtimova, Andrew Drozdov, Douwe Kiela, Kyunghyun Cho
Abstract	Inspired by previous work on emergent communication in referential games, we propose a novel multi-modal, multi-step referential game, where the sender and receiver have access to distinct modalities of an object, and their information exchange is bidirectional and of arbitrary duration. The multi-modal multi-step setting allows agents to develop an internal communication significantly closer to natural language, in that they share a single set of messages, and that the length of the conversation may vary according to the difficulty of the task. We examine these properties empirically using a dataset consisting of images and textual descriptions of mammals, where the agents are tasked with identifying the correct object. Our experiments indicate that a robust and efficient communication protocol emerges, where gradual information exchange informs better predictions and higher communication bandwidth improves generalization.
Tasks
Published	2017-05-29
URL	http://arxiv.org/abs/1705.10369v4
PDF	http://arxiv.org/pdf/1705.10369v4.pdf
PWC	https://paperswithcode.com/paper/emergent-communication-in-a-multi-modal-multi
Repo	https://github.com/nyu-dl/MultimodalGame
Framework	pytorch

Recurrent Pixel Embedding for Instance Grouping


Title	Recurrent Pixel Embedding for Instance Grouping
Authors	Shu Kong, Charless Fowlkes
Abstract	We introduce a differentiable, end-to-end trainable framework for solving pixel-level grouping problems such as instance segmentation consisting of two novel components. First, we regress pixels into a hyper-spherical embedding space so that pixels from the same group have high cosine similarity while those from different groups have similarity below a specified margin. We analyze the choice of embedding dimension and margin, relating them to theoretical results on the problem of distributing points uniformly on the sphere. Second, to group instances, we utilize a variant of mean-shift clustering, implemented as a recurrent neural network parameterized by kernel bandwidth. This recurrent grouping module is differentiable, enjoys convergent dynamics and probabilistic interpretability. Backpropagating the group-weighted loss through this module allows learning to focus on only correcting embedding errors that won’t be resolved during subsequent clustering. Our framework, while conceptually simple and theoretically abundant, is also practically effective and computationally efficient. We demonstrate substantial improvements over state-of-the-art instance segmentation for object proposal generation, as well as demonstrating the benefits of grouping loss for classification tasks such as boundary detection and semantic segmentation.
Tasks	Boundary Detection, Instance Segmentation, Object Proposal Generation, Semantic Segmentation
Published	2017-12-22
URL	http://arxiv.org/abs/1712.08273v1
PDF	http://arxiv.org/pdf/1712.08273v1.pdf
PWC	https://paperswithcode.com/paper/recurrent-pixel-embedding-for-instance
Repo	https://github.com/aimerykong/predictive-filter-flow
Framework	pytorch
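
The first component in the abstract asks for high cosine similarity between pixels of the same group and similarity below a margin for pixels of different groups. The sketch below shows one simple pairwise loss with that behavior. It is an assumption for illustration, not the paper's loss: the function names and the default margin are hypothetical, and embeddings are assumed to be non-zero vectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pair_loss(u, v, same_group, margin=0.5):
    """Penalize same-group pairs for low similarity and
    different-group pairs only when similarity exceeds the margin."""
    s = cosine_similarity(u, v)
    if same_group:
        return 1.0 - s
    return max(0.0, s - margin)


aligned_same = pair_loss([1.0, 0.0], [1.0, 0.0], same_group=True)
orthogonal_diff = pair_loss([1.0, 0.0], [0.0, 1.0], same_group=False)
aligned_diff = pair_loss([1.0, 0.0], [1.0, 0.0], same_group=False)
```

Identical embeddings cost nothing when they belong to the same group, and different-group pairs cost nothing once their similarity falls below the margin.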

Deformable Convolutional Networks


Title	Deformable Convolutional Networks
Authors	Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, Yichen Wei
Abstract	Convolutional neural networks (CNNs) are inherently limited in modeling geometric transformations due to the fixed geometric structures in their building modules. In this work, we introduce two new modules to enhance the transformation modeling capacity of CNNs, namely, deformable convolution and deformable RoI pooling. Both are based on the idea of augmenting the spatial sampling locations in the modules with additional offsets and learning the offsets from target tasks, without additional supervision. The new modules can readily replace their plain counterparts in existing CNNs and can be easily trained end-to-end by standard back-propagation, giving rise to deformable convolutional networks. Extensive experiments validate the effectiveness of our approach on sophisticated vision tasks of object detection and semantic segmentation. The code will be released.
Tasks	Object Detection, Semantic Segmentation
Published	2017-03-17
URL	http://arxiv.org/abs/1703.06211v3
PDF	http://arxiv.org/pdf/1703.06211v3.pdf
PWC	https://paperswithcode.com/paper/deformable-convolutional-networks
Repo	https://github.com/NVIDIAAICITYCHALLENGE/AICity_Team6_ISU
Framework	tf

MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment


Title	MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment
Authors	Hao-Wen Dong, Wen-Yi Hsiao, Li-Chia Yang, Yi-Hsuan Yang
Abstract	Generating music has a few notable differences from generating images and videos. First, music is an art of time, necessitating a temporal model. Second, music is usually composed of multiple instruments/tracks with their own temporal dynamics, but collectively they unfold over time interdependently. Lastly, musical notes are often grouped into chords, arpeggios or melodies in polyphonic music, so introducing a chronological ordering of notes is not naturally suitable. In this paper, we propose three models for symbolic multi-track music generation under the framework of generative adversarial networks (GANs). The three models, which differ in the underlying assumptions and accordingly the network architectures, are referred to as the jamming model, the composer model and the hybrid model. We trained the proposed models on a dataset of over one hundred thousand bars of rock music and applied them to generate piano-rolls of five tracks: bass, drums, guitar, piano and strings. A few intra-track and inter-track objective metrics are also proposed to evaluate the generative results, in addition to a subjective user study. We show that our models can generate coherent music of four bars right from scratch (i.e. without human inputs). We also extend our models to human-AI cooperative music generation: given a specific track composed by a human, we can generate four additional tracks to accompany it. All code, the dataset and the rendered audio samples are available at https://salu133445.github.io/musegan/ .
Tasks	Music Generation
Published	2017-09-19
URL	http://arxiv.org/abs/1709.06298v2
PDF	http://arxiv.org/pdf/1709.06298v2.pdf
PWC	https://paperswithcode.com/paper/musegan-multi-track-sequential-generative
Repo	https://github.com/salu133445/musegan
Framework	tf

MidiNet: A Convolutional Generative Adversarial Network for Symbolic-domain Music Generation


Title	MidiNet: A Convolutional Generative Adversarial Network for Symbolic-domain Music Generation
Authors	Li-Chia Yang, Szu-Yu Chou, Yi-Hsuan Yang
Abstract	Most existing neural network models for music generation use recurrent neural networks. However, the recent WaveNet model proposed by DeepMind shows that convolutional neural networks (CNNs) can also generate realistic musical waveforms in the audio domain. In this light, we investigate using CNNs for generating melody (a series of MIDI notes) one bar after another in the symbolic domain. In addition to the generator, we use a discriminator to learn the distributions of melodies, making it a generative adversarial network (GAN). Moreover, we propose a novel conditional mechanism to exploit available prior knowledge, so that the model can generate melodies either from scratch, by following a chord sequence, or by conditioning on the melody of previous bars (e.g. a priming melody), among other possibilities. The resulting model, named MidiNet, can be expanded to generate music with multiple MIDI channels (i.e. tracks). We conduct a user study to compare eight-bar-long melodies generated by MidiNet and by Google’s MelodyRNN models, each time using the same priming melody. The results show that MidiNet performs comparably with MelodyRNN models in being realistic and pleasant to listen to, yet MidiNet’s melodies are reported to be much more interesting.
Tasks	Music Generation
Published	2017-03-31
URL	http://arxiv.org/abs/1703.10847v2
PDF	http://arxiv.org/pdf/1703.10847v2.pdf
PWC	https://paperswithcode.com/paper/midinet-a-convolutional-generative
Repo	https://github.com/annahung31/MIdiNet-by-pytorch
Framework	pytorch

Deep Residual Learning for Weakly-Supervised Relation Extraction


Title	Deep Residual Learning for Weakly-Supervised Relation Extraction
Authors	Yi Yao Huang, William Yang Wang
Abstract	Deep residual learning (ResNet) is a new method for training very deep neural networks using identity mapping for shortcut connections. ResNet has won the ImageNet ILSVRC 2015 classification task, and achieved state-of-the-art performances in many computer vision tasks. However, the effect of residual learning on noisy natural language processing tasks is still not well understood. In this paper, we design a novel convolutional neural network (CNN) with residual learning, and investigate its impacts on the task of distantly supervised noisy relation extraction. Contrary to the popular belief that ResNet only works well for very deep networks, we found that even with 9 layers of CNNs, using identity mapping could significantly improve the performance for distantly-supervised relation extraction.
Tasks	Relation Extraction
Published	2017-07-27
URL	http://arxiv.org/abs/1707.08866v1
PDF	http://arxiv.org/pdf/1707.08866v1.pdf
PWC	https://paperswithcode.com/paper/deep-residual-learning-for-weakly-supervised
Repo	https://github.com/liuzhencheng/zcliu_code
Framework	tf