Paper Group AWR 23
Estimating Missing Data in Temporal Data Streams Using Multi-directional Recurrent Neural Networks. A Simple Exponential Family Framework for Zero-Shot Learning. Deep Generative Adversarial Networks for Compressed Sensing Automates MRI. Deep Learning Sparse Ternary Projections for Compressed Sensing of Images. Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time. Weighted-SVD: Matrix Factorization with Weights on the Latent Factors. Using Posters to Recommend Anime and Mangas in a Cold-Start Scenario. Fine-Grained Head Pose Estimation Without Keypoints. HandSeg: An Automatically Labeled Dataset for Hand Segmentation from Depth Images. Emergent Communication in a Multi-Modal, Multi-Step Referential Game. Recurrent Pixel Embedding for Instance Grouping. Deformable Convolutional Networks. MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment. MidiNet: A Convolutional Generative Adversarial Network for Symbolic-domain Music Generation. Deep Residual Learning for Weakly-Supervised Relation Extraction.
Estimating Missing Data in Temporal Data Streams Using Multi-directional Recurrent Neural Networks
Title | Estimating Missing Data in Temporal Data Streams Using Multi-directional Recurrent Neural Networks |
Authors | Jinsung Yoon, William R. Zame, Mihaela van der Schaar |
Abstract | Missing data is a ubiquitous problem. It is especially challenging in medical settings because many streams of measurements are collected at different - and often irregular - times. Accurate estimation of those missing measurements is critical for many reasons, including diagnosis, prognosis and treatment. Existing methods address this estimation problem by interpolating within data streams or imputing across data streams (both of which ignore important information) or by ignoring the temporal aspect of the data and imposing strong assumptions about the nature of the data-generating process and/or the pattern of missing data (both of which are especially problematic for medical data). We propose a new approach, based on a novel deep learning architecture that we call a Multi-directional Recurrent Neural Network (M-RNN), which interpolates within data streams and imputes across data streams. We demonstrate the power of our approach by applying it to five real-world medical datasets. We show that it provides dramatically improved estimation of missing measurements in comparison to 11 state-of-the-art benchmarks (including Spline and Cubic Interpolations, MICE, MissForest, matrix completion and several RNN methods); typical improvements in Root Mean Square Error are between 35% and 50%. Additional experiments based on the same five datasets demonstrate that the improvements provided by our method are extremely robust. |
Tasks | Matrix Completion, Multivariate Time Series Imputation |
Published | 2017-11-23 |
URL | http://arxiv.org/abs/1711.08742v1 |
http://arxiv.org/pdf/1711.08742v1.pdf | |
PWC | https://paperswithcode.com/paper/estimating-missing-data-in-temporal-data |
Repo | https://github.com/jsyoon0823/MRNN |
Framework | tf |
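
As a rough intuition for the two directions the M-RNN combines, here is a minimal NumPy sketch: plain linear interpolation stands in for the within-stream bidirectional RNN, and an ordinary least-squares fit across the other streams stands in for the learned cross-stream imputation layer. The function names and the interpolation/regression choices are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def interpolate_stream(t, x):
    # within-stream pass: fill one stream's missing values from its own
    # observed time points (stand-in for the bidirectional RNN)
    mask = ~np.isnan(x)
    return np.interp(t, t[mask], x[mask])

def impute_across_streams(X):
    # across-stream pass: re-estimate each stream from the other streams
    # at the same timestamp (stand-in for the learned imputation layer)
    X_hat = X.copy()
    for j in range(X.shape[1]):
        A = np.column_stack([np.delete(X, j, axis=1), np.ones(len(X))])
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        X_hat[:, j] = A @ coef
    return X_hat

t = np.arange(10.0)
X = np.random.randn(10, 3)
X[3, 0] = np.nan                                    # a missing measurement
X = np.column_stack([interpolate_stream(t, X[:, j]) for j in range(3)])
X = impute_across_streams(X)
```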
A Simple Exponential Family Framework for Zero-Shot Learning
Title | A Simple Exponential Family Framework for Zero-Shot Learning |
Authors | Vinay Kumar Verma, Piyush Rai |
Abstract | We present a simple generative framework for learning to predict previously unseen classes, based on estimating class-attribute-gated class-conditional distributions. We model each class-conditional distribution as an exponential family distribution and the parameters of the distribution of each seen/unseen class are defined as functions of the respective observed class attributes. These functions can be learned using only the seen class data and can be used to predict the parameters of the class-conditional distribution of each unseen class. Unlike most existing methods for zero-shot learning that represent classes as fixed embeddings in some vector space, our generative model naturally represents each class as a probability distribution. It is simple to implement and also allows leveraging additional unlabeled data from unseen classes to improve the estimates of their class-conditional distributions using transductive/semi-supervised learning. Moreover, it extends seamlessly to few-shot learning by easily updating these distributions when provided with a small number of additional labelled examples from unseen classes. Through a comprehensive set of experiments on several benchmark data sets, we demonstrate the efficacy of our framework. |
Tasks | Few-Shot Learning, Zero-Shot Learning |
Published | 2017-07-25 |
URL | http://arxiv.org/abs/1707.08040v3 |
http://arxiv.org/pdf/1707.08040v3.pdf | |
PWC | https://paperswithcode.com/paper/a-simple-exponential-family-framework-for |
Repo | https://github.com/vkverma01/Zero-Shot-Learning |
Framework | none |
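
A minimal sketch of the core idea, under a simplifying assumption of isotropic Gaussian class-conditionals: the mean of each class is a linear (ridge-regressed) function of its attribute vector, learned on seen classes only, and a test point from an unseen class is assigned to the nearest predicted mean. All names and the linear/Gaussian choices are illustrative, not the paper's exact exponential-family formulation.

```python
import numpy as np

def fit_attribute_map(A_seen, Mu_seen, lam=1.0):
    # ridge regression from class attributes to class-conditional means,
    # estimated using seen-class data only
    d = A_seen.shape[1]
    return np.linalg.solve(A_seen.T @ A_seen + lam * np.eye(d),
                           A_seen.T @ Mu_seen)

def classify_unseen(x, A_unseen, W):
    # unseen-class parameters are predicted from their attributes; under
    # equal isotropic covariances, max likelihood = nearest predicted mean
    Mu_unseen = A_unseen @ W
    return int(np.argmin(((Mu_unseen - x) ** 2).sum(axis=1)))
```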
Deep Generative Adversarial Networks for Compressed Sensing Automates MRI
Title | Deep Generative Adversarial Networks for Compressed Sensing Automates MRI |
Authors | Morteza Mardani, Enhao Gong, Joseph Y. Cheng, Shreyas Vasanawala, Greg Zaharchuk, Marcus Alley, Neil Thakur, Song Han, William Dally, John M. Pauly, Lei Xing |
Abstract | Magnetic resonance image (MRI) reconstruction is a severely ill-posed linear inverse task demanding time- and resource-intensive computations that can substantially trade off *accuracy* for *speed* in real-time imaging. In addition, state-of-the-art compressed sensing (CS) analytics are not cognizant of the image's *diagnostic quality*. To cope with these challenges we put forth a novel CS framework that draws on the benefits of generative adversarial networks (GANs) to learn a (low-dimensional) manifold of diagnostic-quality MR images from historical patients. Leveraging a mixture of least-squares (LS) GANs and a pixel-wise $\ell_1$ cost, a deep residual network with skip connections is trained as the generator, which learns to remove the *aliasing* artifacts by projecting onto the manifold. The LSGAN learns the texture details, while the $\ell_1$ cost controls the high-frequency noise. A multilayer convolutional neural network is then jointly trained on diagnostic-quality images to discriminate the projection quality. The test phase performs feed-forward propagation over the generator network, which demands very low computational overhead. Extensive evaluations are performed on a large contrast-enhanced MR dataset of pediatric patients. In particular, ratings by expert radiologists corroborate that GANCS retrieves high-contrast images with detailed texture relative to conventional CS and pixel-wise schemes. In addition, it offers reconstruction in a few milliseconds, two orders of magnitude faster than state-of-the-art CS-MRI schemes. |
Tasks | |
Published | 2017-05-31 |
URL | http://arxiv.org/abs/1706.00051v1 |
http://arxiv.org/pdf/1706.00051v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-generative-adversarial-networks-for |
Repo | https://github.com/gongenhao/GANCS |
Framework | tf |
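
The generator objective is the easiest piece to write down: an LSGAN adversarial term mixed with a pixel-wise $\ell_1$ term. A hedged sketch follows; the mixing weight `lam` and the function name are assumptions, and the paper's exact weighting may differ.

```python
import numpy as np

def gancs_generator_loss(d_fake, x_rec, x_true, lam=0.9):
    # LSGAN term: push the discriminator's score on G's output toward 1
    # (this is the part that learns texture detail)
    ls_term = np.mean((d_fake - 1.0) ** 2)
    # pixel-wise l1 term: suppresses high-frequency noise
    l1_term = np.mean(np.abs(x_rec - x_true))
    return (1.0 - lam) * ls_term + lam * l1_term
```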
Deep Learning Sparse Ternary Projections for Compressed Sensing of Images
Title | Deep Learning Sparse Ternary Projections for Compressed Sensing of Images |
Authors | Duc Minh Nguyen, Evaggelia Tsiligianni, Nikos Deligiannis |
Abstract | Compressed sensing (CS) is a sampling theory that allows reconstruction of sparse (or compressible) signals from an incomplete number of measurements, using a sensing mechanism implemented by an appropriate projection matrix. The CS theory is based on random Gaussian projection matrices, which satisfy recovery guarantees with high probability; however, sparse ternary {0, -1, +1} projections are more suitable for hardware implementation. In this paper, we present a deep learning approach to obtain very sparse ternary projections for compressed sensing. Our deep learning architecture jointly learns a projection matrix and a reconstruction operator in an end-to-end fashion. The experimental results on real images demonstrate the effectiveness of the proposed approach compared to state-of-the-art methods, with a significant advantage in terms of complexity. |
Tasks | |
Published | 2017-08-28 |
URL | http://arxiv.org/abs/1708.08311v1 |
http://arxiv.org/pdf/1708.08311v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-sparse-ternary-projections-for |
Repo | https://github.com/nmduc/deep-ternary |
Framework | tf |
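
Whatever the training procedure, the target structure of the sensing matrix is easy to illustrate: keep only the largest-magnitude entries of a dense matrix and snap them to {-1, 0, +1}. A toy sketch; the paper learns the ternary projection end-to-end, whereas this shows only the quantization step, and `sparsity=0.9` is an arbitrary illustrative value.

```python
import numpy as np

def ternarize(W, sparsity=0.9):
    # keep the (1 - sparsity) fraction of entries with largest magnitude,
    # replace them by their sign, and zero out the rest
    k = max(1, int(round((1.0 - sparsity) * W.size)))
    thresh = np.sort(np.abs(W), axis=None)[-k]
    return np.where(np.abs(W) >= thresh, np.sign(W), 0.0)
```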
Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time
Title | Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time |
Authors | Chantat Eksombatchai, Pranav Jindal, Jerry Zitao Liu, Yuchen Liu, Rahul Sharma, Charles Sugnet, Mark Ulrich, Jure Leskovec |
Abstract | User experience in modern content discovery applications critically depends on high-quality personalized recommendations. However, building systems that provide such recommendations presents a major challenge due to a massive pool of items, a large number of users, and requirements for recommendations to be responsive to user actions and generated on demand in real-time. Here we present Pixie, a scalable graph-based real-time recommender system that we developed and deployed at Pinterest. Given a set of user-specific pins as a query, Pixie selects in real-time from billions of possible pins those that are most related to the query. To generate recommendations, we develop the Pixie Random Walk algorithm, which utilizes the Pinterest object graph of 3 billion nodes and 17 billion edges. Experiments show that recommendations provided by Pixie lead to up to 50% higher user engagement when compared to the previous Hadoop-based production system. Furthermore, we develop a graph pruning strategy that leads to an additional 58% improvement in recommendations. Last, we discuss system aspects of Pixie, where a single server executes 1,200 recommendation requests per second with 60 millisecond latency. Today, systems backed by Pixie contribute to more than 80% of all user engagement on Pinterest. |
Tasks | Recommendation Systems |
Published | 2017-11-21 |
URL | http://arxiv.org/abs/1711.07601v1 |
http://arxiv.org/pdf/1711.07601v1.pdf | |
PWC | https://paperswithcode.com/paper/pixie-a-system-for-recommending-3-billion |
Repo | https://github.com/jd557/pixie-rust |
Framework | none |
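
The heart of Pixie is a random walk with restarts over the pin-board bipartite graph, scoring pins by visit counts. A much-simplified sketch, with adjacency as plain Python dicts; the production system adds per-query step budgets, early stopping, and multi-query boosting that this omits.

```python
import random
from collections import Counter

def pixie_walk(pin_to_boards, board_to_pins, query_pin,
               n_steps=100000, alpha=0.5):
    # random walk with restart probability alpha; each step hops
    # pin -> board -> pin, and visit counts rank candidate pins
    visits, pin = Counter(), query_pin
    for _ in range(n_steps):
        if random.random() < alpha:
            pin = query_pin                       # restart at the query
        board = random.choice(pin_to_boards[pin])
        pin = random.choice(board_to_pins[board])
        visits[pin] += 1
    return visits.most_common(10)
```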
Weighted-SVD: Matrix Factorization with Weights on the Latent Factors
Title | Weighted-SVD: Matrix Factorization with Weights on the Latent Factors |
Authors | Hung-Hsuan Chen |
Abstract | The Matrix Factorization models, sometimes called the latent factor models, are a family of methods in the recommender system research area that (1) generate the latent factors for the users and the items and (2) predict users' ratings on items based on their latent factors. However, current Matrix Factorization models presume that all the latent factors are equally weighted, which may not always be a reasonable assumption in practice. In this paper, we propose a new model, called Weighted-SVD, that integrates the linear regression model with the SVD model such that each latent factor is accompanied by a corresponding weight parameter. This mechanism allows the latent factors to have different weights when influencing the final ratings. The complexity of the Weighted-SVD model is slightly higher than that of the SVD model but much lower than that of the SVD++ model. We compared the Weighted-SVD model with several latent factor models on five public datasets based on the Root-Mean-Squared-Errors (RMSEs). The results show that the Weighted-SVD model outperforms the baseline methods on all the experimental datasets under almost all settings. |
Tasks | Recommendation Systems |
Published | 2017-10-02 |
URL | http://arxiv.org/abs/1710.00482v1 |
http://arxiv.org/pdf/1710.00482v1.pdf | |
PWC | https://paperswithcode.com/paper/weighted-svd-matrix-factorization-with |
Repo | https://github.com/demianbucik/collaborative-filtering-recommender-systems |
Framework | none |
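
The model change is a one-liner relative to plain SVD: each factor product is scaled by a learned global weight. A sketch of the prediction rule, with variable names assumed (`mu` the global mean, `b_u` and `b_i` the user/item biases, `w` the per-factor weights):

```python
import numpy as np

def weighted_svd_predict(mu, b_u, b_i, w, p_u, q_i):
    # plain SVD would use np.dot(p_u, q_i); Weighted-SVD inserts a learned
    # per-factor weight w_k, adding only K extra parameters to the model
    return mu + b_u + b_i + np.sum(w * p_u * q_i)
```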
Using Posters to Recommend Anime and Mangas in a Cold-Start Scenario
Title | Using Posters to Recommend Anime and Mangas in a Cold-Start Scenario |
Authors | Jill-Jênn Vie, Florian Yger, Ryan Lahfa, Basile Clement, Kévin Cocchi, Thomas Chalumeau, Hisashi Kashima |
Abstract | Item cold-start is a classical issue in recommender systems that affects anime and manga recommendations as well. This problem can be framed as follows: how do we predict whether a user will like a manga that has received few ratings from the community? Content-based techniques can alleviate this issue but require extra information that is usually expensive to gather. In this paper, we use a deep learning technique, Illustration2Vec, to easily extract tag information from manga and anime posters (e.g., sword, or ponytail). We propose BALSE (Blended Alternate Least Squares with Explanation), a new model for collaborative filtering that benefits from this extra information to recommend mangas. We show, using real data from an online manga recommender system called Mangaki, that our model substantially improves the quality of recommendations, especially for lesser-known manga, and is able to provide an interpretation of the users' tastes. |
Tasks | Recommendation Systems |
Published | 2017-09-03 |
URL | http://arxiv.org/abs/1709.01584v2 |
http://arxiv.org/pdf/1709.01584v2.pdf | |
PWC | https://paperswithcode.com/paper/using-posters-to-recommend-anime-and-mangas |
Repo | https://github.com/mangaki/balse |
Framework | tf |
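
A hedged sketch of the blending idea: trust the ALS collaborative-filtering score for well-rated items and fall back to the poster-tag (Illustration2Vec) content score in the cold-start regime. The sigmoid gate on rating counts and its parameters are illustrative assumptions, not the paper's exact rule.

```python
import numpy as np

def balse_predict(r_als, r_tags, n_ratings, pivot=3.0, gamma=1.0):
    # gate -> 1 when the item has many ratings (use the ALS score),
    # gate -> 0 for cold-start items (use the tag-based content score)
    gate = 1.0 / (1.0 + np.exp(-gamma * (n_ratings - pivot)))
    return gate * r_als + (1.0 - gate) * r_tags
```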
Fine-Grained Head Pose Estimation Without Keypoints
Title | Fine-Grained Head Pose Estimation Without Keypoints |
Authors | Nataniel Ruiz, Eunji Chong, James M. Rehg |
Abstract | Estimating the head pose of a person is a crucial problem with a large number of applications such as aiding gaze estimation, modeling attention, fitting 3D models to video and performing face alignment. Traditionally, head pose is computed by estimating some keypoints from the target face and solving the 2D-to-3D correspondence problem with a mean human head model. We argue that this is a fragile method because it relies entirely on landmark detection performance, the extraneous head model and an ad-hoc fitting step. We present an elegant and robust way to determine pose by training a multi-loss convolutional neural network on 300W-LP, a large synthetically expanded dataset, to predict intrinsic Euler angles (yaw, pitch and roll) directly from image intensities through joint binned pose classification and regression. We present empirical tests on common in-the-wild pose benchmark datasets which show state-of-the-art results. Additionally, we test our method on a dataset usually used for pose estimation using depth and begin to close the gap with state-of-the-art depth-based pose methods. We open-source our training and testing code and release our pre-trained models. |
Tasks | Face Alignment, Gaze Estimation, Head Pose Estimation, Pose Estimation |
Published | 2017-10-02 |
URL | http://arxiv.org/abs/1710.00925v5 |
http://arxiv.org/pdf/1710.00925v5.pdf | |
PWC | https://paperswithcode.com/paper/fine-grained-head-pose-estimation-without |
Repo | https://github.com/chenyeheng/SmartCar-FaceRec |
Framework | tf |
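
At inference time, the "joint binned pose classification and regression" reduces to a soft-argmax: softmax the bin logits, then take the expectation over bin centers to get a continuous angle. A sketch assuming 66 bins of 3° spanning roughly [-99°, 99°] as in the released models; the exact bin centering is an assumption.

```python
import numpy as np

def expected_angle(logits, bin_width=3.0, angle_min=-99.0):
    # softmax over the pose bins for one Euler angle (yaw, pitch, or roll)
    p = np.exp(logits - logits.max())
    p /= p.sum()
    # expectation over bin centers yields a continuous, fine-grained angle
    centers = angle_min + bin_width * (np.arange(len(logits)) + 0.5)
    return float(np.sum(p * centers))
```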
HandSeg: An Automatically Labeled Dataset for Hand Segmentation from Depth Images
Title | HandSeg: An Automatically Labeled Dataset for Hand Segmentation from Depth Images |
Authors | Abhishake Kumar Bojja, Franziska Mueller, Sri Raghu Malireddi, Markus Oberweger, Vincent Lepetit, Christian Theobalt, Kwang Moo Yi, Andrea Tagliasacchi |
Abstract | We propose an automatic method for generating high-quality annotations for depth-based hand segmentation, and introduce a large-scale hand segmentation dataset. Existing datasets are typically limited to a single hand. By exploiting the visual cues given by an RGBD sensor and a pair of colored gloves, we automatically generate dense annotations for two-hand segmentation. This lowers the cost/complexity of creating high-quality datasets, and makes it easy to expand the dataset in the future. We further show that existing datasets, even with data augmentation, are not sufficient to train a hand segmentation algorithm that can distinguish two hands. Source code and datasets will be made publicly available. |
Tasks | Data Augmentation, Hand Segmentation |
Published | 2017-11-16 |
URL | http://arxiv.org/abs/1711.05944v4 |
http://arxiv.org/pdf/1711.05944v4.pdf | |
PWC | https://paperswithcode.com/paper/handseg-an-automatically-labeled-dataset-for |
Repo | https://github.com/lukasuz/List-of-Hand-Segmentation-Data-Sets |
Framework | none |
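
The automatic-annotation trick can be mimicked in a few lines: classify each RGB pixel by glove hue and transfer the label to the registered depth pixel. A toy sketch; the hue windows below are made-up placeholders, and the real pipeline additionally handles sensor calibration and noise.

```python
import numpy as np

def hue(rgb):
    # rgb: (h, w, 3) floats in [0, 1] -> hue in degrees [0, 360)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    mx, mn = rgb.max(-1), rgb.min(-1)
    d = np.where(mx == mn, 1e-9, mx - mn)
    h = np.where(mx == r, ((g - b) / d) % 6,
        np.where(mx == g, (b - r) / d + 2, (r - g) / d + 4))
    return 60.0 * h

def glove_labels(rgb, left_hue=(200, 260), right_hue=(80, 140)):
    # 0 = background, 1 = left-glove pixels, 2 = right-glove pixels;
    # labels carry over to the aligned depth image as dense annotations
    h = hue(rgb)
    labels = np.zeros(h.shape, dtype=np.uint8)
    labels[(h >= left_hue[0]) & (h <= left_hue[1])] = 1
    labels[(h >= right_hue[0]) & (h <= right_hue[1])] = 2
    return labels
```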
Emergent Communication in a Multi-Modal, Multi-Step Referential Game
Title | Emergent Communication in a Multi-Modal, Multi-Step Referential Game |
Authors | Katrina Evtimova, Andrew Drozdov, Douwe Kiela, Kyunghyun Cho |
Abstract | Inspired by previous work on emergent communication in referential games, we propose a novel multi-modal, multi-step referential game, where the sender and receiver have access to distinct modalities of an object, and their information exchange is bidirectional and of arbitrary duration. The multi-modal multi-step setting allows agents to develop an internal communication significantly closer to natural language, in that they share a single set of messages, and that the length of the conversation may vary according to the difficulty of the task. We examine these properties empirically using a dataset consisting of images and textual descriptions of mammals, where the agents are tasked with identifying the correct object. Our experiments indicate that a robust and efficient communication protocol emerges, where gradual information exchange informs better predictions and higher communication bandwidth improves generalization. |
Tasks | |
Published | 2017-05-29 |
URL | http://arxiv.org/abs/1705.10369v4 |
http://arxiv.org/pdf/1705.10369v4.pdf | |
PWC | https://paperswithcode.com/paper/emergent-communication-in-a-multi-modal-multi |
Repo | https://github.com/nyu-dl/MultimodalGame |
Framework | pytorch |
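
The game protocol itself is simple to sketch. Below, `sender` and `receiver` are assumed callables (e.g. small RNNs); the loop makes the two defining properties concrete: messages flow in both directions, and the receiver decides when it is confident enough to stop.

```python
def referential_game(sender, receiver, image_feat, text_feats, max_steps=10):
    # toy protocol loop under assumed interfaces: the sender sees the image,
    # the receiver sees candidate descriptions; conversation length adapts
    # to task difficulty because the receiver chooses when to stop
    msg_to_sender = None
    guess = None
    for _ in range(max_steps):
        msg_to_receiver = sender(image_feat, msg_to_sender)
        msg_to_sender, stop, guess = receiver(text_feats, msg_to_receiver)
        if stop:
            break
    return guess
```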
Recurrent Pixel Embedding for Instance Grouping
Title | Recurrent Pixel Embedding for Instance Grouping |
Authors | Shu Kong, Charless Fowlkes |
Abstract | We introduce a differentiable, end-to-end trainable framework for solving pixel-level grouping problems such as instance segmentation, consisting of two novel components. First, we regress pixels into a hyper-spherical embedding space so that pixels from the same group have high cosine similarity while those from different groups have similarity below a specified margin. We analyze the choice of embedding dimension and margin, relating them to theoretical results on the problem of distributing points uniformly on the sphere. Second, to group instances, we utilize a variant of mean-shift clustering, implemented as a recurrent neural network parameterized by kernel bandwidth. This recurrent grouping module is differentiable and enjoys convergent dynamics and probabilistic interpretability. Backpropagating the group-weighted loss through this module allows learning to focus only on correcting embedding errors that won't be resolved during subsequent clustering. Our framework, while conceptually simple and theoretically grounded, is also practically effective and computationally efficient. We demonstrate substantial improvements over state-of-the-art instance segmentation for object proposal generation, as well as the benefits of our grouping loss for classification tasks such as boundary detection and semantic segmentation. |
Tasks | Boundary Detection, Instance Segmentation, Object Proposal Generation, Semantic Segmentation |
Published | 2017-12-22 |
URL | http://arxiv.org/abs/1712.08273v1 |
http://arxiv.org/pdf/1712.08273v1.pdf | |
PWC | https://paperswithcode.com/paper/recurrent-pixel-embedding-for-instance |
Repo | https://github.com/aimerykong/predictive-filter-flow |
Framework | pytorch |
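
The recurrent grouping module is a differentiable mean-shift on the unit sphere. One iteration as a NumPy sketch; the von Mises-Fisher kernel and the bandwidth value are assumptions consistent with spherical embeddings, not the paper's exact parameterization. Iterating this step a few times collapses each instance's pixels toward a common mode.

```python
import numpy as np

def mean_shift_step(E, bandwidth=10.0):
    # E: (n, d) unit-norm pixel embeddings on the hypersphere
    S = E @ E.T                        # pairwise cosine similarities
    K = np.exp(bandwidth * S)          # von Mises-Fisher kernel weights
    E_new = K @ E                      # kernel-weighted means
    E_new /= np.linalg.norm(E_new, axis=1, keepdims=True)  # re-project
    return E_new                       # unrolled as an RNN in the paper
```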
Deformable Convolutional Networks
Title | Deformable Convolutional Networks |
Authors | Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, Yichen Wei |
Abstract | Convolutional neural networks (CNNs) are inherently limited in modeling geometric transformations due to the fixed geometric structures in their building modules. In this work, we introduce two new modules to enhance the transformation modeling capacity of CNNs, namely, deformable convolution and deformable RoI pooling. Both are based on the idea of augmenting the spatial sampling locations in the modules with additional offsets and learning the offsets from target tasks, without additional supervision. The new modules can readily replace their plain counterparts in existing CNNs and can be easily trained end-to-end by standard back-propagation, giving rise to deformable convolutional networks. Extensive experiments validate the effectiveness of our approach on sophisticated vision tasks of object detection and semantic segmentation. The code will be released. |
Tasks | Object Detection, Semantic Segmentation |
Published | 2017-03-17 |
URL | http://arxiv.org/abs/1703.06211v3 |
http://arxiv.org/pdf/1703.06211v3.pdf | |
PWC | https://paperswithcode.com/paper/deformable-convolutional-networks |
Repo | https://github.com/NVIDIAAICITYCHALLENGE/AICity_Team6_ISU |
Framework | tf |
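
The mechanism is easiest to see at a single output location: each of the 3x3 taps samples the input at its regular grid position plus a learned fractional offset, using bilinear interpolation so the whole thing stays differentiable. A single-channel NumPy sketch; shapes and names are illustrative.

```python
import numpy as np

def bilinear(img, y, x):
    # sample a single-channel image at a fractional location, clamped to bounds
    h, w = img.shape
    y, x = np.clip(y, 0, h - 1), np.clip(x, 0, w - 1)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * img[y0, x0] + (1 - wy) * wx * img[y0, x1]
            + wy * (1 - wx) * img[y1, x0] + wy * wx * img[y1, x1])

def deformable_conv_at(img, weights, offsets, cy, cx):
    # weights: (9,) kernel taps; offsets: (9, 2) learned (dy, dx) per tap,
    # predicted by a separate conv layer in the actual module
    out, k = 0.0, 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            oy, ox = offsets[k]
            out += weights[k] * bilinear(img, cy + dy + oy, cx + dx + ox)
            k += 1
    return out
```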
MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment
Title | MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment |
Authors | Hao-Wen Dong, Wen-Yi Hsiao, Li-Chia Yang, Yi-Hsuan Yang |
Abstract | Generating music has a few notable differences from generating images and videos. First, music is an art of time, necessitating a temporal model. Second, music is usually composed of multiple instruments/tracks, each with its own temporal dynamics, but collectively they unfold over time interdependently. Lastly, musical notes are often grouped into chords, arpeggios or melodies in polyphonic music, so imposing a chronological ordering of notes is not naturally suitable. In this paper, we propose three models for symbolic multi-track music generation under the framework of generative adversarial networks (GANs). The three models, which differ in their underlying assumptions and accordingly their network architectures, are referred to as the jamming model, the composer model and the hybrid model. We trained the proposed models on a dataset of over one hundred thousand bars of rock music and applied them to generate piano-rolls of five tracks: bass, drums, guitar, piano and strings. A few intra-track and inter-track objective metrics are also proposed to evaluate the generative results, in addition to a subjective user study. We show that our models can generate coherent music of four bars right from scratch (i.e. without human inputs). We also extend our models to human-AI cooperative music generation: given a specific track composed by a human, we can generate four additional tracks to accompany it. All code, the dataset and the rendered audio samples are available at https://salu133445.github.io/musegan/ . |
Tasks | Music Generation |
Published | 2017-09-19 |
URL | http://arxiv.org/abs/1709.06298v2 |
http://arxiv.org/pdf/1709.06298v2.pdf | |
PWC | https://paperswithcode.com/paper/musegan-multi-track-sequential-generative |
Repo | https://github.com/salu133445/musegan |
Framework | tf |
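
The three models differ mainly in how latent noise is shared across tracks, which is easy to make concrete. A sketch under assumed dimensions; treating the hybrid case as a simple concatenation of shared and private latents is an illustrative simplification.

```python
import numpy as np

def jamming_latents(n_tracks, d):
    # jamming: fully independent per-track generators and inputs
    return [np.random.randn(d) for _ in range(n_tracks)]

def composer_latents(n_tracks, d):
    # composer: one shared latent drives all tracks, for tight coordination
    z = np.random.randn(d)
    return [z] * n_tracks

def hybrid_latents(n_tracks, d):
    # hybrid: a shared inter-track latent plus a private intra-track one
    z_shared = np.random.randn(d)
    return [np.concatenate([z_shared, np.random.randn(d)])
            for _ in range(n_tracks)]
```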
MidiNet: A Convolutional Generative Adversarial Network for Symbolic-domain Music Generation
Title | MidiNet: A Convolutional Generative Adversarial Network for Symbolic-domain Music Generation |
Authors | Li-Chia Yang, Szu-Yu Chou, Yi-Hsuan Yang |
Abstract | Most existing neural network models for music generation use recurrent neural networks. However, the recent WaveNet model proposed by DeepMind shows that convolutional neural networks (CNNs) can also generate realistic musical waveforms in the audio domain. In this light, we investigate using CNNs for generating melody (a series of MIDI notes) one bar after another in the symbolic domain. In addition to the generator, we use a discriminator to learn the distributions of melodies, making it a generative adversarial network (GAN). Moreover, we propose a novel conditional mechanism to exploit available prior knowledge, so that the model can generate melodies either from scratch, by following a chord sequence, or by conditioning on the melody of previous bars (e.g. a priming melody), among other possibilities. The resulting model, named MidiNet, can be expanded to generate music with multiple MIDI channels (i.e. tracks). We conduct a user study to compare eight-bar melodies generated by MidiNet and by Google's MelodyRNN models, each time using the same priming melody. Results show that MidiNet performs comparably with the MelodyRNN models in being realistic and pleasant to listen to, yet MidiNet's melodies are reported to be much more interesting. |
Tasks | Music Generation |
Published | 2017-03-31 |
URL | http://arxiv.org/abs/1703.10847v2 |
http://arxiv.org/pdf/1703.10847v2.pdf | |
PWC | https://paperswithcode.com/paper/midinet-a-convolutional-generative |
Repo | https://github.com/annahung31/MIdiNet-by-pytorch |
Framework | pytorch |
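
The conditional mechanism amounts to generating bar by bar, feeding the previous bar (and optionally the current chord) back into the generator. A skeleton loop under assumed interfaces; the piano-roll shape and the `generator` signature are hypothetical.

```python
import numpy as np

def generate_melody(generator, chords, n_bars=8, z_dim=100):
    # bar-by-bar sampling: bar t is conditioned on bar t-1 and chord t,
    # so the model can follow a chord sequence or a priming melody
    prev_bar = np.zeros((128, 16))     # assumed pitches x time-steps grid
    bars = []
    for t in range(n_bars):
        z = np.random.randn(z_dim)
        bar = generator(z, prev_bar, chords[t])
        bars.append(bar)
        prev_bar = bar
    return bars
```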
Deep Residual Learning for Weakly-Supervised Relation Extraction
Title | Deep Residual Learning for Weakly-Supervised Relation Extraction |
Authors | Yi Yao Huang, William Yang Wang |
Abstract | Deep residual learning (ResNet) is a new method for training very deep neural networks using identity mappings for shortcut connections. ResNet won the ImageNet ILSVRC 2015 classification task and has achieved state-of-the-art performance in many computer vision tasks. However, the effect of residual learning on noisy natural language processing tasks is still not well understood. In this paper, we design a novel convolutional neural network (CNN) with residual learning, and investigate its impact on the task of distantly supervised noisy relation extraction. Contrary to the popular belief that ResNet only works well for very deep networks, we found that even with 9 layers of CNNs, using identity mapping could significantly improve the performance of distantly-supervised relation extraction. |
Tasks | Relation Extraction |
Published | 2017-07-27 |
URL | http://arxiv.org/abs/1707.08866v1 |
http://arxiv.org/pdf/1707.08866v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-residual-learning-for-weakly-supervised |
Repo | https://github.com/liuzhencheng/zcliu_code |
Framework | tf |
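
The finding is that even a shallow text CNN benefits from identity shortcuts. A minimal NumPy sketch of one residual block over a token-embedding sequence; the 'same'-padded convolution helper (odd kernel width) and equal input/output widths are simplifying assumptions.

```python
import numpy as np

def conv1d_same(x, w):
    # x: (seq_len, d_in); w: (k, d_in, d_out); zero-padded 'same' convolution
    k, pad = w.shape[0], w.shape[0] // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([np.tensordot(xp[i:i + k], w, axes=([0, 1], [0, 1]))
                     for i in range(x.shape[0])])

def residual_block(x, w1, w2):
    # two conv layers plus the identity shortcut; requires d_out == d_in
    h = np.maximum(conv1d_same(x, w1), 0.0)
    return np.maximum(conv1d_same(h, w2) + x, 0.0)
```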