October 20, 2019

3397 words 16 mins read

Paper Group AWR 291

DeepTAM: Deep Tracking and Mapping

Title DeepTAM: Deep Tracking and Mapping
Authors Huizhong Zhou, Benjamin Ummenhofer, Thomas Brox
Abstract We present a system for keyframe-based dense camera tracking and depth map estimation that is entirely learned. For tracking, we estimate small pose increments between the current camera image and a synthetic viewpoint. This significantly simplifies the learning problem and alleviates the dataset bias for camera motions. Further, we show that generating a large number of pose hypotheses leads to more accurate predictions. For mapping, we accumulate information in a cost volume centered at the current depth estimate. The mapping network then combines the cost volume and the keyframe image to update the depth prediction, thereby effectively making use of depth measurements and image-based priors. Our approach yields state-of-the-art results with few images and is robust with respect to noisy camera poses. We demonstrate that the performance of our 6 DOF tracking competes with RGB-D tracking algorithms. We compare favorably against strong classic and deep learning powered dense depth algorithms.
Tasks Depth Estimation
Published 2018-08-06
URL http://arxiv.org/abs/1808.01900v2
PDF http://arxiv.org/pdf/1808.01900v2.pdf
PWC https://paperswithcode.com/paper/deeptam-deep-tracking-and-mapping
Repo https://github.com/lmb-freiburg/deeptam
Framework tf
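
To make the pose-hypothesis idea concrete, here is a minimal sketch in Python: perturb a predicted pose increment with small random offsets and keep the hypothesis with the lowest photometric error against the keyframe rendering. The function name, the perturbation scale, and the error callback are illustrative assumptions, not the authors' implementation (which predicts hypotheses with a network).

```python
import numpy as np

def best_pose_hypothesis(base_increment, photometric_error, n_hypotheses=64, scale=0.01):
    """Generate many small perturbations of a predicted pose increment and keep
    the one with the lowest photometric error against the keyframe rendering.
    base_increment: (6,) se(3) twist; photometric_error: callable twist -> float."""
    hypotheses = base_increment + scale * np.random.randn(n_hypotheses, 6)
    errors = [photometric_error(h) for h in hypotheses]
    return hypotheses[int(np.argmin(errors))]
```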

T2Net: Synthetic-to-Realistic Translation for Solving Single-Image Depth Estimation Tasks

Title T2Net: Synthetic-to-Realistic Translation for Solving Single-Image Depth Estimation Tasks
Authors Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai
Abstract Current methods for single-image depth estimation use training datasets with real image-depth pairs or stereo pairs, which are not easy to acquire. We propose a framework, trained on synthetic image-depth pairs and unpaired real images, that comprises an image translation network for enhancing realism of input images, followed by a depth prediction network. A key idea is having the first network act as a wide-spectrum input translator, taking in either synthetic or real images, and ideally producing minimally modified realistic images. This is done via a reconstruction loss when the training input is real, and GAN loss when synthetic, removing the need for heuristic self-regularization. The second network is trained on a task loss for synthetic image-depth pairs, with extra GAN loss to unify real and synthetic feature distributions. Importantly, the framework can be trained end-to-end, leading to good results, even surpassing early deep-learning methods that use real paired data.
Tasks Depth Estimation
Published 2018-08-04
URL http://arxiv.org/abs/1808.01454v1
PDF http://arxiv.org/pdf/1808.01454v1.pdf
PWC https://paperswithcode.com/paper/t2net-synthetic-to-realistic-translation-for
Repo https://github.com/lyndonzheng/Synthetic2Realistic
Framework pytorch
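
A rough sketch of how T2Net's three losses fit together, treating the translator, discriminator, and depth network as black boxes. All names are illustrative, the loss weights are omitted, and this is a reading of the abstract rather than the released code.

```python
import torch
import torch.nn.functional as F

def t2net_style_losses(translator, discriminator, depth_net, real_img, syn_img, syn_depth):
    # Identity/reconstruction loss: real inputs should pass through nearly unchanged.
    recon = F.l1_loss(translator(real_img), real_img)
    # GAN loss: translated synthetic images should look real to the discriminator.
    fake = translator(syn_img)
    logits = discriminator(fake)
    gan = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    # Task loss: depth supervision exists only for (translated) synthetic pairs.
    task = F.l1_loss(depth_net(fake), syn_depth)
    return recon + gan + task  # the paper weights these terms; weights omitted here
```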

BRITS: Bidirectional Recurrent Imputation for Time Series

Title BRITS: Bidirectional Recurrent Imputation for Time Series
Authors Wei Cao, Dong Wang, Jian Li, Hao Zhou, Lei Li, Yitan Li
Abstract Time series are widely used as signals in many classification/regression tasks, and missing values in them are ubiquitous. Given multiple correlated time series, how can we fill in missing values and predict their class labels? Existing imputation methods often impose strong assumptions on the underlying data-generating process, such as linear dynamics in the state space. In this paper, we propose BRITS, a novel method based on recurrent neural networks for missing value imputation in time series data. Our proposed method directly learns the missing values in a bidirectional recurrent dynamical system, without any such assumption. The imputed values are treated as variables of the RNN graph and can be effectively updated during backpropagation. BRITS has three advantages: (a) it can handle multiple correlated missing values in time series; (b) it generalizes to time series with underlying nonlinear dynamics; (c) it provides a data-driven imputation procedure and applies to general settings with missing data. We evaluate our model on three real-world datasets: an air quality dataset, a health-care dataset, and a human-activity localization dataset. Experiments show that our model outperforms state-of-the-art methods in both imputation and classification/regression accuracy.
Tasks Imputation, Multivariate Time Series Forecasting, Multivariate Time Series Imputation, Time Series
Published 2018-05-27
URL http://arxiv.org/abs/1805.10572v1
PDF http://arxiv.org/pdf/1805.10572v1.pdf
PWC https://paperswithcode.com/paper/brits-bidirectional-recurrent-imputation-for
Repo https://github.com/johnypark/TimeSeries
Framework none
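
The core recurrent-imputation step can be sketched as follows: at each time step the hidden state regresses an estimate of the current inputs, observed values are kept, and the estimates fill the gaps before feeding the RNN cell. This simplified single-direction cell omits the paper's temporal decay and feature-based estimation; names are illustrative.

```python
import torch
import torch.nn as nn

class RecurrentImputationCell(nn.Module):
    """One direction of a BRITS-style imputer (simplified sketch)."""
    def __init__(self, n_features, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.rnn = nn.LSTMCell(n_features, hidden_size)
        self.regress = nn.Linear(hidden_size, n_features)

    def forward(self, x, mask):
        # x, mask: (batch, time, features); mask is 1 where observed, 0 where missing
        h = x.new_zeros(x.size(0), self.hidden_size)
        c = torch.zeros_like(h)
        imputed = []
        for t in range(x.size(1)):
            x_hat = self.regress(h)                                # history-based estimate
            x_c = mask[:, t] * x[:, t] + (1 - mask[:, t]) * x_hat  # fill the gaps
            h, c = self.rnn(x_c, (h, c))
            imputed.append(x_c)
        return torch.stack(imputed, dim=1)                         # imputed series
```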

Pyramidal Recurrent Unit for Language Modeling

Title Pyramidal Recurrent Unit for Language Modeling
Authors Sachin Mehta, Rik Koncel-Kedziorski, Mohammad Rastegari, Hannaneh Hajishirzi
Abstract LSTMs are powerful tools for modeling contextual information, as evidenced by their success at the task of language modeling. However, modeling contexts in very high dimensional space can lead to poor generalizability. We introduce the Pyramidal Recurrent Unit (PRU), which enables learning representations in high dimensional space with more generalization power and fewer parameters. PRUs replace the linear transformation in LSTMs with more sophisticated interactions, including pyramidal and grouped linear transformations. This architecture gives strong results on word-level language modeling while significantly reducing the number of parameters. In particular, PRU improves the perplexity of the recent state-of-the-art language model of Merity et al. (2018) by up to 1.3 points while learning 15-20% fewer parameters. For a similar number of model parameters, PRU outperforms all previous RNN models that exploit different gating mechanisms and transformations. We provide a detailed examination of the PRU and its behavior on language modeling tasks. Our code is open-source and available at https://sacmehta.github.io/PRU/
Tasks Language Modelling
Published 2018-08-27
URL http://arxiv.org/abs/1808.09029v1
PDF http://arxiv.org/pdf/1808.09029v1.pdf
PWC https://paperswithcode.com/paper/pyramidal-recurrent-unit-for-language
Repo https://github.com/sacmehta/PRU
Framework pytorch
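
The grouped linear transformation at the heart of the PRU can be sketched in a few lines: split the input into groups and transform each chunk with its own small weight matrix, cutting parameters by roughly a factor of the group count. This is a sketch of the general operation, not the released PRU code.

```python
import torch

def grouped_linear(x, weight, groups):
    """Grouped linear transform (sketch of the PRU building block).
    x: (batch, in_dim); weight: (groups, in_dim // groups, out_dim // groups)."""
    b, d = x.shape
    x = x.view(b, groups, d // groups)           # (batch, groups, in_dim/groups)
    y = torch.einsum('bgi,gio->bgo', x, weight)  # independent per-group transform
    return y.reshape(b, -1)                      # (batch, out_dim)

# e.g. weight = torch.randn(4, 64, 64) maps a 256-d input to a 256-d output
# with 4 * 64 * 64 parameters instead of 256 * 256 for a full linear layer.
```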

QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension

Title QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension
Authors Adams Wei Yu, David Dohan, Minh-Thang Luong, Rui Zhao, Kai Chen, Mohammad Norouzi, Quoc V. Le
Abstract Current end-to-end machine reading and question answering (Q&A) models are primarily based on recurrent neural networks (RNNs) with attention. Despite their success, these models are often slow for both training and inference due to the sequential nature of RNNs. We propose a new Q&A architecture called QANet, which does not require recurrent networks: Its encoder consists exclusively of convolution and self-attention, where convolution models local interactions and self-attention models global interactions. On the SQuAD dataset, our model is 3x to 13x faster in training and 4x to 9x faster in inference, while achieving equivalent accuracy to recurrent models. The speed-up gain allows us to train the model with much more data. We hence combine our model with data generated by backtranslation from a neural machine translation model. On the SQuAD dataset, our single model, trained with augmented data, achieves 84.6 F1 score on the test set, which is significantly better than the best published F1 score of 81.8.
Tasks Machine Translation, Question Answering, Reading Comprehension
Published 2018-04-23
URL http://arxiv.org/abs/1804.09541v1
PDF http://arxiv.org/pdf/1804.09541v1.pdf
PWC https://paperswithcode.com/paper/qanet-combining-local-convolution-with-global
Repo https://github.com/TSLNIHAOGIT/QANet_keras_debug
Framework tf
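
Below is a simplified QANet-style encoder block: a depthwise-separable convolution models local interactions, multi-head self-attention models global interactions, and both sit under residual connections with layer normalization. Layer counts, kernel size, and dimensions are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class QANetStyleBlock(nn.Module):
    """Conv + self-attention + feed-forward, each with pre-norm and residual."""
    def __init__(self, d_model=128, kernel=7, heads=8):
        super().__init__()
        self.conv = nn.Sequential(
            # depthwise conv over the sequence, then a pointwise (1x1) conv
            nn.Conv1d(d_model, d_model, kernel, padding=kernel // 2, groups=d_model),
            nn.Conv1d(d_model, d_model, 1), nn.ReLU())
        self.attn = nn.MultiheadAttention(d_model, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                 nn.Linear(d_model, d_model))
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(3))

    def forward(self, x):                         # x: (batch, seq, d_model)
        y = self.norms[0](x)
        x = x + self.conv(y.transpose(1, 2)).transpose(1, 2)  # local interactions
        y = self.norms[1](x)
        x = x + self.attn(y, y, y, need_weights=False)[0]     # global interactions
        return x + self.ffn(self.norms[2](x))
```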

Convolutional Neural Networks for Toxic Comment Classification

Title Convolutional Neural Networks for Toxic Comment Classification
Authors Spiros V. Georgakopoulos, Sotiris K. Tasoulis, Aristidis G. Vrahatis, Vassilis P. Plagianakos
Abstract A flood of information is produced daily through global Internet usage, arising from online interactive communication among users. While this contributes significantly to the quality of human life, it also carries serious dangers, since highly toxic online texts can lead to personal attacks, online harassment, and bullying. This has spurred both industry and the research community in recent years, with several attempts to identify an effective model for online toxic comment prediction. However, these efforts are still in their infancy, and new approaches and frameworks are required. In parallel, the constant explosion of data makes the construction of new machine learning tools for managing this information an imperative need. Advances in hardware, cloud computing, and big data management have enabled the development of Deep Learning approaches with very promising performance so far. For text classification in particular, Convolutional Neural Networks (CNNs) have recently been proposed as a modern approach to text analytics that emphasizes the structure of words in a document. In this work, we employ this approach to detect toxic comments in a large pool of documents provided by a recent Kaggle competition on Wikipedia’s talk page edits. To justify this choice, we compare CNNs against the traditional bag-of-words approach combined with a selection of algorithms proven to be very effective in text classification. The reported results provide strong evidence that CNNs improve toxic comment classification, reinforcing research interest in this direction.
Tasks Text Classification
Published 2018-02-27
URL http://arxiv.org/abs/1802.09957v1
PDF http://arxiv.org/pdf/1802.09957v1.pdf
PWC https://paperswithcode.com/paper/convolutional-neural-networks-for-toxic
Repo https://github.com/chrisdangerhaddad/toxic_comments
Framework tf
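
A standard convolutional text classifier of the kind the paper evaluates might look like the following sketch; the embedding size, filter widths, and class count are illustrative placeholders.

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    """Embed tokens, apply parallel 1-D convolutions of several widths,
    max-pool over time, and classify from the concatenated features."""
    def __init__(self, vocab_size, embed_dim=128, n_filters=100,
                 kernel_sizes=(3, 4, 5), n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, n_filters, k) for k in kernel_sizes)
        self.fc = nn.Linear(n_filters * len(kernel_sizes), n_classes)

    def forward(self, tokens):                   # tokens: (batch, seq_len)
        x = self.embed(tokens).transpose(1, 2)   # (batch, embed_dim, seq_len)
        pooled = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.fc(torch.cat(pooled, dim=1))  # class logits
```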

CentralNet: a Multilayer Approach for Multimodal Fusion

Title CentralNet: a Multilayer Approach for Multimodal Fusion
Authors Valentin Vielzeuf, Alexis Lechervy, Stéphane Pateux, Frédéric Jurie
Abstract This paper proposes a novel multimodal fusion approach, aiming to produce the best possible decisions by integrating information coming from multiple media. While most past multimodal approaches either project the features of different modalities into the same space, or coordinate the representations of each modality through the use of constraints, our approach borrows from both views. More specifically, assuming each modality can be processed by a separate deep convolutional network, allowing decisions to be taken independently from each modality, we introduce a central network linking the modality-specific networks. This central network not only provides a common feature embedding but also regularizes the modality-specific networks through the use of multi-task learning. The proposed approach is validated on 4 different computer vision tasks, on which it consistently improves the accuracy of existing multimodal fusion approaches.
Tasks Multi-Task Learning
Published 2018-08-22
URL http://arxiv.org/abs/1808.07275v1
PDF http://arxiv.org/pdf/1808.07275v1.pdf
PWC https://paperswithcode.com/paper/centralnet-a-multilayer-approach-for
Repo https://github.com/jxnding/dsc531_bayes
Framework none
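
The central idea reduces to a small fusion layer: at each depth, the central representation is a learned weighted sum of the previous central state and each modality's hidden features. A minimal sketch under the assumption of equal feature dimensions; not the authors' implementation.

```python
import torch
import torch.nn as nn

class CentralFusionLayer(nn.Module):
    """One CentralNet-style fusion layer: weighted sum of the previous central
    state and the per-modality features at the same depth, then a projection."""
    def __init__(self, n_modalities, dim):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(n_modalities + 1))  # learned fusion weights
        self.proj = nn.Linear(dim, dim)

    def forward(self, central_prev, modality_feats):  # each: (batch, dim)
        mix = self.alpha[0] * central_prev
        for a, f in zip(self.alpha[1:], modality_feats):
            mix = mix + a * f
        return torch.relu(self.proj(mix))
```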

Joint Classification and Prediction CNN Framework for Automatic Sleep Stage Classification

Title Joint Classification and Prediction CNN Framework for Automatic Sleep Stage Classification
Authors Huy Phan, Fernando Andreotti, Navin Cooray, Oliver Y. Chén, Maarten De Vos
Abstract Correctly identifying sleep stages is important in diagnosing and treating sleep disorders. This work proposes a joint classification-and-prediction framework based on CNNs for automatic sleep staging, and, subsequently, introduces a simple yet efficient CNN architecture to power the framework. Given a single input epoch, the novel framework jointly determines its label (classification) and its neighboring epochs’ labels (prediction) in the contextual output. While the proposed framework is orthogonal to the widely adopted classification schemes, which take one or multiple epochs as contextual inputs and produce a single classification decision on the target epoch, we demonstrate its advantages in several ways. First, it leverages the dependency among consecutive sleep epochs while avoiding the problems experienced with the common classification schemes. Second, even with a single model, the framework has the capacity to produce multiple decisions, which are essential in obtaining good performance as in ensemble-of-models methods, with very little induced computational overhead. Probabilistic aggregation techniques are then proposed to leverage the availability of multiple decisions. We conducted experiments on two public datasets: Sleep-EDF Expanded with 20 subjects, and the Montreal Archive of Sleep Studies dataset with 200 subjects. The proposed framework yields an overall classification accuracy of 82.3% and 83.6%, respectively. We also show that the proposed framework not only is superior to the baselines based on the common classification schemes but also outperforms existing deep-learning approaches. To our knowledge, this is the first work going beyond standard single-output classification to consider multitask neural networks for automatic sleep staging. This framework provides avenues for further studies of different neural-network architectures for automatic sleep staging.
Tasks Automatic Sleep Stage Classification, Sleep Stage Detection
Published 2018-05-16
URL http://arxiv.org/abs/1805.06546v3
PDF http://arxiv.org/pdf/1805.06546v3.pdf
PWC https://paperswithcode.com/paper/joint-classification-and-prediction-cnn
Repo https://github.com/pquochuy/MultitaskSleepNet
Framework tf
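
The probabilistic aggregation step can be sketched as follows: every epoch is covered by several overlapping contextual outputs, and their class probabilities are combined multiplicatively (a geometric mean in log space). The array shapes and the aggregation rule shown here are a plausible reading of the paper, not its exact code.

```python
import numpy as np

def aggregate_decisions(probs):
    """probs: list over sliding windows of (context, n_classes) probability
    arrays, where window i covers epochs i .. i+context-1. Returns one label
    per epoch after multiplicative aggregation of the overlapping decisions."""
    context, n_classes = probs[0].shape
    n_epochs = len(probs) + context - 1
    log_sum = np.zeros((n_epochs, n_classes))
    counts = np.zeros(n_epochs)
    for i, p in enumerate(probs):
        log_sum[i:i + context] += np.log(p + 1e-12)  # accumulate log-probabilities
        counts[i:i + context] += 1
    return np.argmax(log_sum / counts[:, None], axis=1)
```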

Phase transition in the recoverability of network history

Title Phase transition in the recoverability of network history
Authors Jean-Gabriel Young, Guillaume St-Onge, Edward Laurence, Charles Murphy, Laurent Hébert-Dufresne, Patrick Desrosiers
Abstract Network growth processes can be understood as generative models of the structure and history of complex networks. This point of view naturally leads to the problem of network archaeology: reconstructing all the past states of a network from its structure—a difficult permutation inference problem. In this paper, we introduce a Bayesian formulation of network archaeology, with a generalization of preferential attachment as our generative mechanism. We develop a sequential Monte Carlo algorithm to evaluate the posterior averages of this model, as well as an efficient heuristic that uncovers a history well correlated with the true one, in polynomial time. We use these methods to identify and characterize a phase transition in the quality of the reconstructed history, when they are applied to artificial networks generated by the model itself. Despite the existence of a no-recovery phase, we find that nontrivial inference is possible in a large portion of the parameter space as well as on empirical data.
Tasks
Published 2018-03-25
URL https://arxiv.org/abs/1803.09191v3
PDF https://arxiv.org/pdf/1803.09191v3.pdf
PWC https://paperswithcode.com/paper/network-archaeology-phase-transition-in-the
Repo https://github.com/jg-you/network-archaeology
Framework none
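
The generative side of the problem is easy to sketch: grow a network by (generalized) preferential attachment while recording the true edge arrival order, which is exactly the hidden history the inference methods try to recover. The attachment kernel below is a simple power of degree, an illustrative stand-in for the paper's model.

```python
import random

def preferential_attachment_history(n_nodes, gamma=1.0, seed=0):
    """Grow a network by preferential attachment and record the true edge
    arrival order. gamma generalizes linear preferential attachment (gamma=1);
    a sketch of the model class, not the paper's exact kernel."""
    rng = random.Random(seed)
    edges = [(0, 1)]
    degree = {0: 1, 1: 1}
    for new in range(2, n_nodes):
        weights = [degree[v] ** gamma for v in range(new)]  # attachment kernel
        target = rng.choices(range(new), weights=weights)[0]
        edges.append((new, target))
        degree[new] = 1
        degree[target] += 1
    return edges  # edges in chronological order: the hidden history
```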

Adaptive Image Sampling using Deep Learning and its Application on X-Ray Fluorescence Image Reconstruction

Title Adaptive Image Sampling using Deep Learning and its Application on X-Ray Fluorescence Image Reconstruction
Authors Qiqin Dai, Henry Chopp, Emeline Pouyet, Oliver Cossairt, Marc Walton, Aggelos K. Katsaggelos
Abstract This paper presents an adaptive image sampling algorithm based on Deep Learning (DL). The adaptive sampling mask generation network is jointly trained with an image inpainting network. The sampling rate is controlled in the mask generation network, and a binarization strategy is investigated to make the sampling mask binary. Beyond the image sampling and reconstruction application, we show that the proposed adaptive sampling algorithm is able to speed up raster scan processes such as X-Ray fluorescence (XRF) image scanning. Recently, XRF laboratory-based systems have evolved into lightweight and portable instruments thanks to technological advancements in both X-Ray generation and detection. However, the scanning time of an XRF image is usually long due to the long exposure required (e.g., $100~\mu s$-$1~ms$ per point). We propose an XRF image inpainting approach to address the issue of long scanning time, thus speeding up the scanning process while still maintaining the possibility to reconstruct a high quality XRF image. The proposed adaptive image sampling algorithm is applied to the RGB image of the scanning target to generate the sampling mask. The XRF scanner is then driven according to the sampling mask to scan a subset of the total image pixels. Finally, we inpaint the scanned XRF image by fusing the RGB image to reconstruct the full scan XRF image. The experiments show that the proposed adaptive sampling algorithm effectively samples the image and achieves better reconstruction accuracy than existing methods.
Tasks Image Inpainting, Image Reconstruction
Published 2018-12-27
URL https://arxiv.org/abs/1812.10836v3
PDF https://arxiv.org/pdf/1812.10836v3.pdf
PWC https://paperswithcode.com/paper/adaptive-image-sampling-using-deep-learning
Repo https://github.com/usstdqq/deep-adaptive-sampling-mask
Framework pytorch
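
One common way to make a learned sampling mask binary while keeping it trainable is a straight-through estimator, sketched below: threshold the soft mask at the desired sampling rate in the forward pass, but let gradients flow through the soft values. This illustrates the binarization strategy in spirit; the paper's exact scheme may differ.

```python
import torch

def binarize_mask(soft_mask, rate):
    """Keep the top `rate` fraction of mask scores as 1s, the rest as 0s,
    with a straight-through estimator so the mask generator stays trainable."""
    k = max(1, int(rate * soft_mask.numel()))
    threshold = soft_mask.flatten().topk(k).values.min()
    hard = (soft_mask >= threshold).float()
    # forward pass sees the hard mask; backward pass sees the soft mask's gradient
    return hard + soft_mask - soft_mask.detach()
```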

Large-Scale Distributed Second-Order Optimization Using Kronecker-Factored Approximate Curvature for Deep Convolutional Neural Networks

Title Large-Scale Distributed Second-Order Optimization Using Kronecker-Factored Approximate Curvature for Deep Convolutional Neural Networks
Authors Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Rio Yokota, Satoshi Matsuoka
Abstract Large-scale distributed training of deep neural networks suffers from the generalization gap caused by the increase in the effective mini-batch size. Previous approaches try to solve this problem by varying the learning rate and batch size over epochs and layers, or with ad hoc modifications of batch normalization. We propose an alternative approach using a second-order optimization method that shows similar generalization capability to first-order methods, but converges faster and can handle larger mini-batches. To test our method on a benchmark where highly optimized first-order methods are available as references, we train ResNet-50 on ImageNet. We converged to 75% Top-1 validation accuracy in 35 epochs for mini-batch sizes under 16,384, and achieved 75% even with a mini-batch size of 131,072, which took only 978 iterations.
Tasks
Published 2018-11-29
URL http://arxiv.org/abs/1811.12019v5
PDF http://arxiv.org/pdf/1811.12019v5.pdf
PWC https://paperswithcode.com/paper/large-scale-distributed-second-order
Repo https://github.com/tyohei/chainerkfac
Framework none
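
For a single fully connected layer, the K-FAC preconditioning step can be sketched as follows: approximate the layer's Fisher block as a Kronecker product A ⊗ G of input and gradient second moments, and apply the two small inverses to the weight gradient. The distributed machinery of the paper is omitted; the damping value is illustrative.

```python
import torch

def kfac_update(weight_grad, a, g, damping=1e-3):
    """K-FAC preconditioning for one linear layer (sketch). With A = E[a a^T]
    over layer inputs and G = E[g g^T] over backpropagated output gradients,
    the preconditioned step is G^{-1} grad_W A^{-1}.
    a: (batch, in), g: (batch, out), weight_grad: (out, in)."""
    A = a.t() @ a / a.size(0) + damping * torch.eye(a.size(1))
    G = g.t() @ g / g.size(0) + damping * torch.eye(g.size(1))
    step = torch.linalg.solve(A, weight_grad.t()).t()  # grad_W A^{-1}
    return torch.linalg.solve(G, step)                 # G^{-1} (grad_W A^{-1})
```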

Deep Learning under Privileged Information Using Heteroscedastic Dropout

Title Deep Learning under Privileged Information Using Heteroscedastic Dropout
Authors John Lambert, Ozan Sener, Silvio Savarese
Abstract Unlike machines, humans learn through rapid, abstract model-building. The role of a teacher is not simply to hammer home right or wrong answers, but rather to provide intuitive comments, comparisons, and explanations to a pupil. This is what the Learning Under Privileged Information (LUPI) paradigm endeavors to model by utilizing extra knowledge only available during training. We propose a new LUPI algorithm specifically designed for Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). We propose to use a heteroscedastic dropout (i.e., dropout with a varying variance) and make the variance of the dropout a function of privileged information. Intuitively, this corresponds to using the privileged information to control the uncertainty of the model output. We perform experiments using CNNs and RNNs for the tasks of image classification and machine translation. Our method significantly increases sample efficiency during learning, resulting in higher accuracy by a large margin when the number of training examples is limited. We also theoretically justify the gains in sample efficiency by providing a generalization error bound decreasing with $O(\frac{1}{n})$, where $n$ is the number of training examples, in an oracle case.
Tasks Image Classification, Machine Translation
Published 2018-05-29
URL http://arxiv.org/abs/1805.11614v1
PDF http://arxiv.org/pdf/1805.11614v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-under-privileged-information
Repo https://github.com/johnwlambert/dlupi-heteroscedastic-dropout
Framework pytorch
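
The core mechanism is compact enough to sketch: multiplicative Gaussian dropout whose per-feature scale is predicted from the privileged information, so the teacher signal modulates the noise injected into the student's activations. Layer sizes and the Softplus parameterization are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class HeteroscedasticDropout(nn.Module):
    """Multiplicative Gaussian dropout with variance predicted from
    privileged information z; deterministic pass-through at test time."""
    def __init__(self, priv_dim, feat_dim):
        super().__init__()
        self.sigma = nn.Sequential(nn.Linear(priv_dim, feat_dim), nn.Softplus())

    def forward(self, x, z):                 # x: features, z: privileged info
        if not self.training:
            return x                         # no noise at inference
        noise = 1 + self.sigma(z) * torch.randn_like(x)
        return x * noise
```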

Text2Shape: Generating Shapes from Natural Language by Learning Joint Embeddings

Title Text2Shape: Generating Shapes from Natural Language by Learning Joint Embeddings
Authors Kevin Chen, Christopher B. Choy, Manolis Savva, Angel X. Chang, Thomas Funkhouser, Silvio Savarese
Abstract We present a method for generating colored 3D shapes from natural language. To this end, we first learn joint embeddings of freeform text descriptions and colored 3D shapes. Our model combines and extends learning by association and metric learning approaches to learn implicit cross-modal connections, and produces a joint representation that captures the many-to-many relations between language and physical properties of 3D shapes such as color and shape. To evaluate our approach, we collect a large dataset of natural language descriptions for physical 3D objects in the ShapeNet dataset. With this learned joint embedding we demonstrate text-to-shape retrieval that outperforms baseline approaches. Using our embeddings with a novel conditional Wasserstein GAN framework, we generate colored 3D shapes from text. Our method is the first to connect natural language text with realistic 3D objects exhibiting rich variations in color, texture, and shape detail. See video at https://youtu.be/zraPvRdl13Q
Tasks Metric Learning
Published 2018-03-22
URL http://arxiv.org/abs/1803.08495v1
PDF http://arxiv.org/pdf/1803.08495v1.pdf
PWC https://paperswithcode.com/paper/text2shape-generating-shapes-from-natural
Repo https://github.com/kchen92/text2shape
Framework tf
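
The metric-learning component of the joint embedding can be sketched with a cross-modal triplet-style loss: matched (text, shape) pairs are pulled together and mismatched pairs pushed below a margin. The paper combines this with learning by association and a conditional Wasserstein GAN; only the metric term is shown here, with illustrative names.

```python
import torch
import torch.nn.functional as F

def cross_modal_triplet_loss(text_emb, shape_emb, margin=0.5):
    """text_emb, shape_emb: (batch, dim), with row i of each modality matched.
    Penalizes any mismatched pair whose similarity comes within `margin`
    of the matched pair's similarity."""
    text_emb = F.normalize(text_emb, dim=1)
    shape_emb = F.normalize(shape_emb, dim=1)
    sim = text_emb @ shape_emb.t()            # cosine similarity matrix
    pos = sim.diag().unsqueeze(1)             # similarities of matched pairs
    hinge = F.relu(margin - pos + sim)        # margin violation per pair
    off_diag = ~torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    return hinge[off_diag].mean()             # average over mismatched pairs only
```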

Reinforcement Learning Decoders for Fault-Tolerant Quantum Computation

Title Reinforcement Learning Decoders for Fault-Tolerant Quantum Computation
Authors Ryan Sweke, Markus S. Kesselring, Evert P. L. van Nieuwenburg, Jens Eisert
Abstract Topological error correcting codes, and particularly the surface code, currently provide the most feasible roadmap towards large-scale fault-tolerant quantum computation. As such, obtaining fast and flexible decoding algorithms for these codes, within the experimentally relevant context of faulty syndrome measurements, is of critical importance. In this work, we show that the problem of decoding such codes, in the full fault-tolerant setting, can be naturally reformulated as a process of repeated interactions between a decoding agent and a code environment, to which the machinery of reinforcement learning can be applied to obtain decoding agents. As a demonstration, using deep Q-learning, we obtain fast decoding agents for the surface code, for a variety of noise models.
Tasks
Published 2018-10-16
URL http://arxiv.org/abs/1810.07207v1
PDF http://arxiv.org/pdf/1810.07207v1.pdf
PWC https://paperswithcode.com/paper/reinforcement-learning-decoders-for-fault
Repo https://github.com/R-Sweke/DeepQ-Decoding
Framework tf
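
Below is a generic deep-Q update of the kind used to train such decoding agents, where syndrome observations play the role of states and correction operations the role of actions. This is a textbook DQN step, not the paper's exact architecture, reward, or environment.

```python
import torch
import torch.nn.functional as F

def dqn_step(q_net, target_net, optimizer, batch, gamma=0.99):
    """One DQN update from a replay-buffer batch (s, a, r, s_next, done)."""
    s, a, r, s_next, done = batch
    with torch.no_grad():  # bootstrapped TD target from the frozen target network
        target = r + gamma * (1 - done) * target_net(s_next).max(dim=1).values
    q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)  # Q(s, a)
    loss = F.smooth_l1_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```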

Predicting pregnancy using large-scale data from a women’s health tracking mobile application

Title Predicting pregnancy using large-scale data from a women’s health tracking mobile application
Authors Bo Liu, Shuyang Shi, Yongshang Wu, Daniel Thomas, Laura Symul, Emma Pierson, Jure Leskovec
Abstract Predicting pregnancy has been a fundamental problem in women’s health for more than 50 years. Previous datasets have been collected via carefully curated medical studies, but the recent growth of women’s health tracking mobile apps offers potential for reaching a much broader population. However, the feasibility of predicting pregnancy from mobile health tracking data is unclear. Here we develop four models – a logistic regression model and three LSTM models – to predict a woman’s probability of becoming pregnant using data from a women’s health tracking app, Clue by BioWink GmbH. Evaluating our models on a dataset of 79 million logs from 65,276 women with ground-truth pregnancy test data, we show that our predicted pregnancy probabilities meaningfully stratify women: women in the top 10% of predicted probabilities have an 89% chance of becoming pregnant over 6 menstrual cycles, compared to a 27% chance for women in the bottom 10%. We develop a technique for extracting interpretable time trends from our deep learning models, and show these trends are consistent with previous fertility research. Our findings illustrate the potential that women’s health tracking data offers for predicting pregnancy in a broader population; we conclude by discussing the steps needed to fulfill this potential.
Tasks
Published 2018-12-05
URL http://arxiv.org/abs/1812.02222v2
PDF http://arxiv.org/pdf/1812.02222v2.pdf
PWC https://paperswithcode.com/paper/predicting-pregnancy-using-large-scale-data
Repo https://github.com/AndyYSWoo/pregnancy-prediction
Framework tf
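
The paper's headline comparison, pregnancy rates in the top versus bottom decile of predicted probability, is a simple evaluation to sketch; the function below is an illustrative reconstruction, not the authors' code.

```python
import numpy as np

def decile_stratification(pred_probs, outcomes):
    """pred_probs, outcomes: aligned 1-D arrays, outcomes coded 0/1.
    Returns the outcome rate in the top and bottom deciles of predicted risk."""
    order = np.argsort(pred_probs)
    k = max(1, len(order) // 10)
    top_rate = outcomes[order[-k:]].mean()     # e.g. ~0.89 in the paper
    bottom_rate = outcomes[order[:k]].mean()   # e.g. ~0.27 in the paper
    return top_rate, bottom_rate
```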