July 29, 2019

3332 words 16 mins read

Paper Group AWR 93

Self-Supervised Visual Planning with Temporal Skip Connections. Learning Multimodal Transition Dynamics for Model-Based Reinforcement Learning. f-GANs in an Information Geometric Nutshell. A signature-based machine learning model for bipolar disorder and borderline personality disorder. Learning a Hierarchical Latent-Variable Model of 3D Shapes. Ch …

Self-Supervised Visual Planning with Temporal Skip Connections

Title Self-Supervised Visual Planning with Temporal Skip Connections
Authors Frederik Ebert, Chelsea Finn, Alex X. Lee, Sergey Levine
Abstract In order to autonomously learn wide repertoires of complex skills, robots must be able to learn from their own autonomously collected data, without human supervision. One learning signal that is always available for autonomously collected data is prediction: if a robot can learn to predict the future, it can use this predictive model to take actions to produce desired outcomes, such as moving an object to a particular location. However, in complex open-world scenarios, designing a representation for prediction is difficult. In this work, we instead aim to enable self-supervised robotic learning through direct video prediction: instead of attempting to design a good representation, we directly predict what the robot will see next, and then use this model to achieve desired goals. A key challenge in video prediction for robotic manipulation is handling complex spatial arrangements such as occlusions. To that end, we introduce a video prediction model that can keep track of objects through occlusion by incorporating temporal skip-connections. Together with a novel planning criterion and action space formulation, we demonstrate that this model substantially outperforms prior work on video prediction-based control. Our results show manipulation of objects not seen during training, handling multiple objects, and pushing objects around obstructions. These results represent a significant advance in the range and complexity of skills that can be performed entirely with self-supervised robotic learning.
Tasks Video Prediction
Published 2017-10-15
URL http://arxiv.org/abs/1710.05268v1
PDF http://arxiv.org/pdf/1710.05268v1.pdf
PWC https://paperswithcode.com/paper/self-supervised-visual-planning-with-temporal
Repo https://github.com/joelouismarino/amortized-variational-filtering
Framework pytorch
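
A minimal sketch of the compositing-with-temporal-skip idea described in the abstract: the next frame is blended from candidate images that include the first observed frame, so content hidden behind an occluder can be copied back in once it reappears. The module name, layer sizes, and the three-candidate set are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipCompositor(nn.Module):
    """Composite the next frame from (previous frame, first frame, generated pixels)."""
    def __init__(self, channels=3, hidden=32, n_candidates=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(2 * channels, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
        )
        self.mask_head = nn.Conv2d(hidden, n_candidates, 1)    # one mask per candidate image
        self.pixel_head = nn.Conv2d(hidden, channels, 1)       # freshly generated pixels

    def forward(self, prev_frame, first_frame):
        h = self.encoder(torch.cat([prev_frame, first_frame], dim=1))
        masks = F.softmax(self.mask_head(h), dim=1)            # B x 3 x H x W
        generated = torch.sigmoid(self.pixel_head(h))
        # temporal skip connection: the first frame stays available as a candidate
        candidates = torch.stack([prev_frame, first_frame, generated], dim=1)
        return (masks.unsqueeze(2) * candidates).sum(dim=1)    # per-pixel convex combination

pred = SkipCompositor()(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
```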

Learning Multimodal Transition Dynamics for Model-Based Reinforcement Learning

Title Learning Multimodal Transition Dynamics for Model-Based Reinforcement Learning
Authors Thomas M. Moerland, Joost Broekens, Catholijn M. Jonker
Abstract In this paper we study how to learn stochastic, multimodal transition dynamics in reinforcement learning (RL) tasks. We focus on evaluating transition function estimation, while we defer planning over this model to future work. Stochasticity is a fundamental property of many task environments. However, discriminative function approximators have difficulty estimating multimodal stochasticity. In contrast, deep generative models do capture complex high-dimensional outcome distributions. First we discuss why, amongst such models, conditional variational inference (VI) is theoretically most appealing for model-based RL. Subsequently, we compare different VI models on their ability to learn complex stochasticity on simulated functions, as well as on a typical RL gridworld with multimodal dynamics. Results show VI successfully predicts multimodal outcomes, but also robustly ignores these for deterministic parts of the transition dynamics. In summary, we show a robust method to learn multimodal transitions using function approximation, which is a key preliminary for model-based RL in stochastic domains.
Tasks
Published 2017-05-01
URL http://arxiv.org/abs/1705.00470v2
PDF http://arxiv.org/pdf/1705.00470v2.pdf
PWC https://paperswithcode.com/paper/learning-multimodal-transition-dynamics-for
Repo https://github.com/tmoer/multimodal_varinf
Framework tf
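
A minimal conditional-VAE sketch for a stochastic transition model p(s' | s, a), in the spirit of the conditional variational inference the abstract argues for. The linked code is in TensorFlow; this PyTorch version and its layer sizes are purely illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransitionCVAE(nn.Module):
    def __init__(self, s_dim, a_dim, z_dim=4, hidden=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(2 * s_dim + a_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * z_dim))      # q(z | s, a, s')
        self.prior = nn.Sequential(nn.Linear(s_dim + a_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, 2 * z_dim))    # p(z | s, a)
        self.dec = nn.Sequential(nn.Linear(s_dim + a_dim + z_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, s_dim))          # p(s' | s, a, z)

    def forward(self, s, a, s_next):
        mu_q, logvar_q = self.enc(torch.cat([s, a, s_next], -1)).chunk(2, -1)
        mu_p, logvar_p = self.prior(torch.cat([s, a], -1)).chunk(2, -1)
        z = mu_q + torch.randn_like(mu_q) * (0.5 * logvar_q).exp()  # reparameterization trick
        recon_loss = F.mse_loss(self.dec(torch.cat([s, a, z], -1)), s_next)
        # KL( q(z|s,a,s') || p(z|s,a) ) between diagonal Gaussians
        kl = 0.5 * (logvar_p - logvar_q
                    + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp() - 1).sum(-1).mean()
        return recon_loss + kl

loss = TransitionCVAE(s_dim=4, a_dim=2)(torch.rand(16, 4), torch.rand(16, 2), torch.rand(16, 4))
```

At test time, repeatedly sampling z from the learned prior and decoding yields multiple plausible next states, which is how such a model can represent multimodal dynamics.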

f-GANs in an Information Geometric Nutshell

Title f-GANs in an Information Geometric Nutshell
Authors Richard Nock, Zac Cranko, Aditya Krishna Menon, Lizhen Qu, Robert C. Williamson
Abstract Nowozin et al. showed last year how to extend the GAN principle to all $f$-divergences. The approach is elegant but falls short of a full description of the supervised game, and says little about the key player, the generator: for example, what does the generator actually converge to if solving the GAN game means convergence in some space of parameters? How does that provide hints on the generator’s design and compare to the flourishing but almost exclusively experimental literature on the subject? In this paper, we unveil a broad class of distributions for which such convergence happens — namely, deformed exponential families, a wide superset of exponential families — and show tight connections with the three other key GAN parameters: loss, game and architecture. In particular, we show that current deep architectures are able to factorize a very large number of such densities using an especially compact design, hence displaying the power of deep architectures and their concinnity in the $f$-GAN game. This result holds given a sufficient condition on activation functions — which turns out to be satisfied by popular choices. The key to our results is a variational generalization of an old theorem that relates the KL divergence between regular exponential families and divergences between their natural parameters. We complete this picture with additional results and experimental insights on how these results may be used to ground further improvements of GAN architectures, via (i) a principled design of the activation functions in the generator and (ii) an explicit integration of proper composite losses’ link function in the discriminator.
Tasks
Published 2017-07-14
URL http://arxiv.org/abs/1707.04385v1
PDF http://arxiv.org/pdf/1707.04385v1.pdf
PWC https://paperswithcode.com/paper/f-gans-in-an-information-geometric-nutshell
Repo https://github.com/qulizhen/fgan_info_geometric
Framework pytorch
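
A toy sketch of the f-GAN variational objective the paper analyzes, instantiated for the KL divergence, whose output activation is g_f(v) = v and whose Fenchel conjugate is f*(t) = exp(t - 1). The 1-D toy data, network sizes, and optimizer settings are illustrative only.

```python
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=64):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))

generator, critic = mlp(2, 1), mlp(1, 1)            # z in R^2 -> x in R; T_w maps x to R
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

for step in range(1000):
    x_real = torch.randn(128, 1) * 0.5 + 2.0        # toy data distribution P
    x_fake = generator(torch.randn(128, 2))         # samples from Q_theta
    # critic ascends F(theta, w) = E_P[T_w(x)] - E_Q[f*(T_w(x))]
    d_loss = -(critic(x_real).mean() - torch.exp(critic(x_fake.detach()) - 1).mean())
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()
    # generator descends F, i.e. maximizes E_Q[f*(T_w(x))]
    g_loss = -torch.exp(critic(x_fake) - 1).mean()
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```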

A signature-based machine learning model for bipolar disorder and borderline personality disorder

Title A signature-based machine learning model for bipolar disorder and borderline personality disorder
Authors Imanol Perez Arribas, Kate Saunders, Guy Goodwin, Terry Lyons
Abstract Mobile technologies offer opportunities for higher resolution monitoring of health conditions. This opportunity seems of particular promise in psychiatry where diagnoses often rely on retrospective and subjective recall of mood states. However, getting actionable information from these rather complex time series is challenging, and at present the implications for clinical care are largely hypothetical. This research demonstrates that, with well-chosen cohorts (of bipolar disorder, borderline personality disorder, and control) and modern methods, it is possible to objectively learn to identify distinctive behaviour over short periods (20 reports) that effectively separate the cohorts. Participants with bipolar disorder or borderline personality disorder and healthy volunteers completed daily mood ratings using a bespoke smartphone app for up to a year. A signature-based machine learning model was used to classify participants on the basis of the interrelationship between the different mood items assessed and to predict subsequent mood. The signature methodology was significantly superior to earlier statistical approaches applied to this data in distinguishing the three participant groups, clearly placing 75% into their original groups on the basis of their reports. Subsequent mood ratings were correctly predicted with greater than 70% accuracy in all groups. Prediction of mood was most accurate in healthy volunteers (89-98%) compared to bipolar disorder (82-90%) and borderline personality disorder (70-78%).
Tasks Time Series
Published 2017-07-22
URL http://arxiv.org/abs/1707.07124v2
PDF http://arxiv.org/pdf/1707.07124v2.pdf
PWC https://paperswithcode.com/paper/a-signature-based-machine-learning-model-for
Repo https://github.com/Haadem/FSN_Workshop
Framework none
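
A small illustration (not the authors' pipeline) of turning a multivariate mood time series into path-signature features: the level-1 terms are total increments and the level-2 terms are iterated integrals, computed here exactly for a piecewise-linear path. In a classifier these features would simply be fed to, say, logistic regression.

```python
import numpy as np

def signature_level2(path):
    """path: (T, d) array of mood ratings over time; returns d + d*d signature features."""
    increments = np.diff(path, axis=0)                  # increments of each linear segment
    level1 = path[-1] - path[0]                         # S^(i): total change per channel
    # cumulative increment at the start of each segment, i.e. X_k - X_0
    start = np.cumsum(np.vstack([np.zeros(path.shape[1]), increments[:-1]]), axis=0)
    # S^(i,j) = sum_k (X^i_k - X^i_0) dX^j_k + 1/2 dX^i_k dX^j_k  (Chen's identity, level 2)
    level2 = start.T @ increments + 0.5 * increments.T @ increments
    return np.concatenate([level1, level2.ravel()])

mood = np.random.rand(20, 6)                            # 20 reports, 6 mood items
features = signature_level2(mood)                       # 6 + 36 = 42 features per participant
```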

Learning a Hierarchical Latent-Variable Model of 3D Shapes

Title Learning a Hierarchical Latent-Variable Model of 3D Shapes
Authors Shikun Liu, C. Lee Giles, Alexander G. Ororbia II
Abstract We propose the Variational Shape Learner (VSL), a generative model that learns the underlying structure of voxelized 3D shapes in an unsupervised fashion. Through the use of skip-connections, our model can successfully learn and infer a latent, hierarchical representation of objects. Furthermore, realistic 3D objects can be easily generated by sampling the VSL’s latent probabilistic manifold. We show that our generative model can be trained end-to-end from 2D images to perform single image 3D model retrieval. Experiments show, both quantitatively and qualitatively, the improved generalization of our proposed model over a range of tasks, performing better or comparable to various state-of-the-art alternatives.
Tasks 3D Object Classification, 3D Object Recognition, 3D Reconstruction, 3D Shape Generation
Published 2017-05-17
URL http://arxiv.org/abs/1705.05994v4
PDF http://arxiv.org/pdf/1705.05994v4.pdf
PWC https://paperswithcode.com/paper/learning-a-hierarchical-latent-variable-model
Repo https://github.com/lorenmt/vsl
Framework tf
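
A much-reduced sketch of a hierarchical latent-variable voxel model in the spirit of the VSL: a global latent code plus a local latent conditioned on it, decoded into an occupancy grid. The real model uses several local latents with skip connections; the fully connected layers and toy sizes here are only for illustration.

```python
import torch
import torch.nn as nn

class TinyHierarchicalVoxelVAE(nn.Module):
    def __init__(self, voxels=32 * 32 * 32, z_global=16, z_local=8, hidden=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(voxels, hidden), nn.ReLU())
        self.to_global = nn.Linear(hidden, 2 * z_global)
        self.to_local = nn.Linear(hidden + z_global, 2 * z_local)   # local code sees the global code
        self.dec = nn.Sequential(nn.Linear(z_global + z_local, hidden), nn.ReLU(),
                                 nn.Linear(hidden, voxels))

    @staticmethod
    def sample(stats):
        mu, logvar = stats.chunk(2, -1)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

    def forward(self, x):
        h = self.enc(x.flatten(1))
        z_global = self.sample(self.to_global(h))
        z_local = self.sample(self.to_local(torch.cat([h, z_global], -1)))
        return torch.sigmoid(self.dec(torch.cat([z_global, z_local], -1)))  # occupancy probabilities

recon = TinyHierarchicalVoxelVAE()(torch.rand(2, 32, 32, 32))
```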

Challenges in Disentangling Independent Factors of Variation

Title Challenges in Disentangling Independent Factors of Variation
Authors Attila Szabó, Qiyang Hu, Tiziano Portenier, Matthias Zwicker, Paolo Favaro
Abstract We study the problem of building models that disentangle independent factors of variation. Such models could be used to encode features that can efficiently be used for classification and to transfer attributes between different images in image synthesis. As data we use a weakly labeled training set. Our weak labels indicate what single factor has changed between two data samples, although the relative value of the change is unknown. This labeling is of particular interest as it may be readily available without annotation costs. To make use of weak labels we introduce an autoencoder model and train it through constraints on image pairs and triplets. We formally prove that without additional knowledge there is no guarantee that two images with the same factor of variation will be mapped to the same feature. We call this issue the reference ambiguity. Moreover, we show the role of the feature dimensionality and adversarial training. We demonstrate experimentally that the proposed model can successfully transfer attributes on several datasets, but show also cases when the reference ambiguity occurs.
Tasks Image Generation
Published 2017-11-07
URL http://arxiv.org/abs/1711.02245v1
PDF http://arxiv.org/pdf/1711.02245v1.pdf
PWC https://paperswithcode.com/paper/challenges-in-disentangling-independent
Repo https://github.com/ananyahjha93/challenges-in-disentangling
Framework pytorch

TALL: Temporal Activity Localization via Language Query

Title TALL: Temporal Activity Localization via Language Query
Authors Jiyang Gao, Chen Sun, Zhenheng Yang, Ram Nevatia
Abstract This paper focuses on temporal localization of actions in untrimmed videos. Existing methods typically train classifiers for a pre-defined list of actions and apply them in a sliding window fashion. However, activities in the wild consist of a wide combination of actors, actions and objects; it is difficult to design a proper activity list that meets users’ needs. We propose to localize activities by natural language queries. Temporal Activity Localization via Language (TALL) is challenging as it requires: (1) suitable design of text and video representations to allow cross-modal matching of actions and language queries; (2) ability to locate actions accurately given features from sliding windows of limited granularity. We propose a novel Cross-modal Temporal Regression Localizer (CTRL) to jointly model text queries and video clips and output alignment scores and action boundary regression results for candidate clips. For evaluation, we adopt the TACoS dataset, and build a new dataset for this task on top of Charades by adding sentence temporal annotations, called Charades-STA. We also build complex sentence queries in Charades-STA for testing. Experimental results show that CTRL outperforms previous methods significantly on both datasets.
Tasks Temporal Localization
Published 2017-05-05
URL http://arxiv.org/abs/1705.02101v2
PDF http://arxiv.org/pdf/1705.02101v2.pdf
PWC https://paperswithcode.com/paper/tall-temporal-activity-localization-via
Repo https://github.com/WuJie1010/TSP-PRL
Framework pytorch
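
A condensed sketch of the cross-modal scoring-and-regression idea: fuse a clip feature with a sentence feature, then emit an alignment score plus start/end offsets for the clip boundaries. The feature dimensions and the particular fusion used here are illustrative rather than the exact CTRL layout.

```python
import torch
import torch.nn as nn

class ClipSentenceScorer(nn.Module):
    def __init__(self, vis_dim=500, txt_dim=300, hidden=256):
        super().__init__()
        self.vis_proj = nn.Linear(vis_dim, hidden)
        self.txt_proj = nn.Linear(txt_dim, hidden)
        self.head = nn.Sequential(nn.Linear(3 * hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 3))         # [alignment, d_start, d_end]

    def forward(self, clip_feat, sent_feat):
        v, s = self.vis_proj(clip_feat), self.txt_proj(sent_feat)
        fused = torch.cat([v * s, v + s, torch.abs(v - s)], dim=-1)   # simple cross-modal fusion
        out = self.head(fused)
        return out[..., 0], out[..., 1:]                        # score, (start, end) offsets

score, offsets = ClipSentenceScorer()(torch.rand(4, 500), torch.rand(4, 300))
```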

Pairwise Confusion for Fine-Grained Visual Classification

Title Pairwise Confusion for Fine-Grained Visual Classification
Authors Abhimanyu Dubey, Otkrist Gupta, Pei Guo, Ramesh Raskar, Ryan Farrell, Nikhil Naik
Abstract Fine-Grained Visual Classification (FGVC) datasets contain small sample sizes, along with significant intra-class variation and inter-class similarity. While prior work has addressed intra-class variation using localization and segmentation techniques, inter-class similarity may also affect feature learning and reduce classification performance. In this work, we address this problem using a novel optimization procedure for the end-to-end neural network training on FGVC tasks. Our procedure, called Pairwise Confusion (PC), reduces overfitting by intentionally introducing confusion in the activations. With PC regularization, we obtain state-of-the-art performance on six of the most widely-used FGVC datasets and demonstrate improved localization ability. PC is easy to implement, does not need excessive hyperparameter tuning during training, and does not add significant overhead during test time.
Tasks Fine-Grained Image Classification
Published 2017-05-22
URL http://arxiv.org/abs/1705.08016v3
PDF http://arxiv.org/pdf/1705.08016v3.pdf
PWC https://paperswithcode.com/paper/pairwise-confusion-for-fine-grained-visual
Repo https://github.com/abhimanyudubey/confusion
Framework pytorch
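
A compact sketch of the Pairwise Confusion idea as the abstract describes it: alongside the usual cross-entropy, pairs of samples from different classes are penalized for having dissimilar prediction distributions, deliberately "confusing" the activations. The pairing-by-shuffling scheme and the weight lam are illustrative choices.

```python
import torch
import torch.nn.functional as F

def pairwise_confusion_loss(logits, labels, lam=10.0):
    ce = F.cross_entropy(logits, labels)
    probs = F.softmax(logits, dim=1)
    perm = torch.randperm(logits.size(0))                 # pair each sample with a shuffled partner
    cross_class = (labels != labels[perm]).float()        # only confuse pairs from different classes
    euclid = ((probs - probs[perm]) ** 2).sum(dim=1)      # squared distance between predictions
    confusion = (cross_class * euclid).sum() / cross_class.sum().clamp(min=1)
    return ce + lam * confusion

loss = pairwise_confusion_loss(torch.randn(8, 200), torch.randint(0, 200, (8,)))
```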

A Tutorial on Canonical Correlation Methods

Title A Tutorial on Canonical Correlation Methods
Authors Viivi Uurtio, João M. Monteiro, Jaz Kandola, John Shawe-Taylor, Delmiro Fernandez-Reyes, Juho Rousu
Abstract Canonical correlation analysis is a family of multivariate statistical methods for the analysis of paired sets of variables. Since its proposition, canonical correlation analysis has for instance been extended to extract relations between two sets of variables when the sample size is insufficient in relation to the data dimensionality, when the relations have been considered to be non-linear, and when the dimensionality is too large for human interpretation. This tutorial explains the theory of canonical correlation analysis including its regularised, kernel, and sparse variants. Additionally, the deep and Bayesian CCA extensions are briefly reviewed. Together with the numerical examples, this overview provides a coherent compendium on the applicability of the variants of canonical correlation analysis. By bringing together techniques for solving the optimisation problems, evaluating the statistical significance and generalisability of the canonical correlation model, and interpreting the relations, we hope that this article can serve as a hands-on tool for applying canonical correlation methods in data analysis.
Tasks
Published 2017-11-07
URL http://arxiv.org/abs/1711.02391v1
PDF http://arxiv.org/pdf/1711.02391v1.pdf
PWC https://paperswithcode.com/paper/a-tutorial-on-canonical-correlation-methods
Repo https://github.com/aalto-ics-kepaco/cca-tutorial
Framework none
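
A bare-bones numerical illustration of classical CCA as reviewed in the tutorial: whiten each view, then take the SVD of the cross-covariance; the singular values are the canonical correlations. The small ridge term is a common regularisation for numerical stability.

```python
import numpy as np

def cca(X, Y, reg=1e-6):
    X = X - X.mean(axis=0); Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)

    def inv_sqrt(C):                              # inverse square root of a symmetric matrix
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    K = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(K)
    A = inv_sqrt(Cxx) @ U                         # canonical weights for view X
    B = inv_sqrt(Cyy) @ Vt.T                      # canonical weights for view Y
    return A, B, s                                # s holds the canonical correlations

rng = np.random.default_rng(0)
A, B, corrs = cca(rng.normal(size=(100, 5)), rng.normal(size=(100, 4)))
```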

Video Frame Synthesis using Deep Voxel Flow

Title Video Frame Synthesis using Deep Voxel Flow
Authors Ziwei Liu, Raymond A. Yeh, Xiaoou Tang, Yiming Liu, Aseem Agarwala
Abstract We address the problem of synthesizing new video frames in an existing video, either in-between existing frames (interpolation), or subsequent to them (extrapolation). This problem is challenging because video appearance and motion can be highly complex. Traditional optical-flow-based solutions often fail where flow estimation is challenging, while newer neural-network-based methods that hallucinate pixel values directly often produce blurry results. We combine the advantages of these two methods by training a deep network that learns to synthesize video frames by flowing pixel values from existing ones, which we call deep voxel flow. Our method requires no human supervision, and any video can be used as training data by dropping, and then learning to predict, existing frames. The technique is efficient, and can be applied at any video resolution. We demonstrate that our method produces results that both quantitatively and qualitatively improve upon the state-of-the-art.
Tasks Optical Flow Estimation
Published 2017-02-08
URL http://arxiv.org/abs/1702.02463v2
PDF http://arxiv.org/pdf/1702.02463v2.pdf
PWC https://paperswithcode.com/paper/video-frame-synthesis-using-deep-voxel-flow
Repo https://github.com/NVIDIA/unsupervised-video-interpolation
Framework pytorch
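
A minimal sketch (assuming a recent PyTorch) of the voxel-flow sampling step: given a predicted (dx, dy) flow and a per-pixel temporal blend weight, the in-between frame is bilinearly sampled from both input frames and blended, so the network copies pixels rather than hallucinating them. The network that would predict flow and blend is omitted, and the symmetric sampling convention here is an assumption of this sketch.

```python
import torch
import torch.nn.functional as F

def voxel_flow_sample(frame0, frame1, flow, blend):
    """frame*: B x C x H x W; flow: B x 2 x H x W in pixels; blend: B x 1 x H x W in [0, 1]."""
    B, _, H, W = frame0.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    base = torch.stack([xs, ys], dim=0).float().unsqueeze(0)     # 1 x 2 x H x W pixel grid

    def sample(frame, coords):
        gx = 2.0 * coords[:, 0] / (W - 1) - 1.0                  # normalize to [-1, 1]
        gy = 2.0 * coords[:, 1] / (H - 1) - 1.0
        return F.grid_sample(frame, torch.stack([gx, gy], dim=-1), align_corners=True)

    out0 = sample(frame0, base - flow)                           # step backwards into frame 0
    out1 = sample(frame1, base + flow)                           # step forwards into frame 1
    return blend * out0 + (1.0 - blend) * out1

frame = voxel_flow_sample(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64),
                          torch.zeros(1, 2, 64, 64), torch.full((1, 1, 64, 64), 0.5))
```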

Neural Offset Min-Sum Decoding

Title Neural Offset Min-Sum Decoding
Authors Loren Lugosch, Warren J. Gross
Abstract Recently, it was shown that if multiplicative weights are assigned to the edges of a Tanner graph used in belief propagation decoding, it is possible to use deep learning techniques to find values for the weights which improve the error-correction performance of the decoder. Unfortunately, this approach requires many multiplications, which are generally expensive operations. In this paper, we suggest a more hardware-friendly approach in which offset min-sum decoding is augmented with learnable offset parameters. Our method uses no multiplications and has a parameter count less than half that of the multiplicative algorithm. This both speeds up training and provides a feasible path to hardware architectures. After describing our method, we compare the performance of the two neural decoding algorithms and show that our method achieves error-correction performance within 0.1 dB of the multiplicative approach and as much as 1 dB better than traditional belief propagation for the codes under consideration.
Tasks
Published 2017-01-20
URL http://arxiv.org/abs/1701.05931v3
PDF http://arxiv.org/pdf/1701.05931v3.pdf
PWC https://paperswithcode.com/paper/neural-offset-min-sum-decoding
Repo https://github.com/lorenlugosch/neural-min-sum-decoding
Framework tf
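
A small sketch of the learnable-offset check-node update the abstract describes: the usual offset min-sum rule sign(prod) * max(min|L| - beta, 0), with beta made a trainable parameter. Using one offset per check node and a plain (rather than leave-one-out) minimum are simplifications for brevity.

```python
import torch
import torch.nn as nn

class OffsetMinSumCheckNode(nn.Module):
    def __init__(self, n_checks):
        super().__init__()
        self.beta = nn.Parameter(torch.zeros(n_checks))          # learnable offsets

    def forward(self, llrs):
        """llrs: n_checks x degree tensor of incoming variable-to-check messages."""
        signs = torch.prod(torch.sign(llrs), dim=1)
        # offsets kept non-negative; a real decoder would use leave-one-out minima per edge
        magnitude = torch.clamp(llrs.abs().min(dim=1).values - torch.relu(self.beta), min=0.0)
        return signs * magnitude                                 # one outgoing message per check

msgs = OffsetMinSumCheckNode(n_checks=4)(torch.randn(4, 6))
```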

Facial Emotion Detection Using Convolutional Neural Networks and Representational Autoencoder Units

Title Facial Emotion Detection Using Convolutional Neural Networks and Representational Autoencoder Units
Authors Prudhvi Raj Dachapally
Abstract Emotion being a subjective thing, leveraging the knowledge and science behind labeled data and extracting the components that constitute it has been a challenging problem in the industry for many years. With the evolution of deep learning in computer vision, emotion recognition has become a widely-tackled research problem. In this work, we propose two independent methods for this very task. The first method uses autoencoders to construct a unique representation of each emotion, while the second method is an 8-layer convolutional neural network (CNN). These methods were trained on the posed-emotion dataset (JAFFE), and to test their robustness, both models were also tested on 100 random images from the Labeled Faces in the Wild (LFW) dataset, which consists of images that are more candid than posed. The results show that with more fine-tuning and depth, our CNN model can outperform the state-of-the-art methods for emotion recognition. We also propose some exciting ideas for expanding the concept of representational autoencoders to improve their performance.
Tasks Emotion Recognition
Published 2017-06-05
URL http://arxiv.org/abs/1706.01509v1
PDF http://arxiv.org/pdf/1706.01509v1.pdf
PWC https://paperswithcode.com/paper/facial-emotion-detection-using-convolutional
Repo https://github.com/JoyceHao/FinalProject
Framework none

Detecting and classifying lesions in mammograms with Deep Learning

Title Detecting and classifying lesions in mammograms with Deep Learning
Authors Dezső Ribli, Anna Horváth, Zsuzsa Unger, Péter Pollner, István Csabai
Abstract In the last two decades Computer Aided Diagnostics (CAD) systems were developed to help radiologists analyze screening mammograms. The benefits of current CAD technologies appear to be contradictory and they should be improved to be ultimately considered useful. Since 2012 deep convolutional neural networks (CNNs) have been a tremendous success in image recognition, reaching human performance. These methods have greatly surpassed the traditional approaches, which are similar to currently used CAD solutions. Deep CNNs have the potential to revolutionize medical image analysis. We propose a CAD system based on one of the most successful object detection frameworks, Faster R-CNN. The system detects and classifies malignant or benign lesions on a mammogram without any human intervention. The proposed method sets the state-of-the-art classification performance on the public INbreast database, AUC = 0.95. The approach described here achieved 2nd place in the Digital Mammography DREAM Challenge with AUC = 0.85. When used as a detector, the system reaches high sensitivity with very few false positive marks per image on the INbreast dataset. Source code, the trained model and an OsiriX plugin are available online at https://github.com/riblidezso/frcnn_cad .
Tasks Breast Cancer Detection, Object Detection
Published 2017-07-26
URL http://arxiv.org/abs/1707.08401v3
PDF http://arxiv.org/pdf/1707.08401v3.pdf
PWC https://paperswithcode.com/paper/detecting-and-classifying-lesions-in
Repo https://github.com/riblidezso/frcnn_cad
Framework none
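
The repository above contains the full system; for orientation only, this is the standard torchvision fine-tuning recipe (assuming torchvision >= 0.13) for pointing a Faster R-CNN detector at a small number of lesion classes. The class indexing and the toy image/target are illustrative, not the paper's training setup.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# start from the COCO-pretrained detector and swap in a 3-class box head
# (background, benign lesion, malignant lesion -- an illustrative labelling)
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=3)

# toy training step: mammograms are single-channel, so replicate to three channels
images = [torch.rand(1, 800, 800).expand(3, -1, -1)]
targets = [{"boxes": torch.tensor([[100.0, 120.0, 220.0, 260.0]]),
            "labels": torch.tensor([2])}]
model.train()
loss_dict = model(images, targets)        # classification + box-regression losses
```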

FingerNet: An Unified Deep Network for Fingerprint Minutiae Extraction

Title FingerNet: An Unified Deep Network for Fingerprint Minutiae Extraction
Authors Yao Tang, Fei Gao, Jufu Feng, Yuhang Liu
Abstract Minutiae extraction is of critical importance in automated fingerprint recognition. Previous works on rolled/slap fingerprints failed on latent fingerprints due to noisy ridge patterns and complex background noise. In this paper, we propose a new way to design a deep convolutional network combining domain knowledge and the representation ability of deep learning. In terms of orientation estimation, segmentation, enhancement and minutiae extraction, several typical traditional methods that perform well on rolled/slap fingerprints are reformulated as convolutional operations and integrated into a unified plain network. We demonstrate that this pipeline is equivalent to a shallow network with fixed weights. The network is then expanded to enhance its representation ability and the weights are released to learn complex background variance from data, while preserving end-to-end differentiability. Experimental results on the NIST SD27 latent database and FVC 2004 slap database demonstrate that the proposed algorithm outperforms the state-of-the-art minutiae extraction algorithms. Code is made publicly available at: https://github.com/felixTY/FingerNet.
Tasks
Published 2017-09-07
URL http://arxiv.org/abs/1709.02228v1
PDF http://arxiv.org/pdf/1709.02228v1.pdf
PWC https://paperswithcode.com/paper/fingernet-an-unified-deep-network-for
Repo https://github.com/felixTY/FingerNet
Framework tf

High-Resolution Breast Cancer Screening with Multi-View Deep Convolutional Neural Networks

Title High-Resolution Breast Cancer Screening with Multi-View Deep Convolutional Neural Networks
Authors Krzysztof J. Geras, Stacey Wolfson, Yiqiu Shen, Nan Wu, S. Gene Kim, Eric Kim, Laura Heacock, Ujas Parikh, Linda Moy, Kyunghyun Cho
Abstract Advances in deep learning for natural images have prompted a surge of interest in applying similar techniques to medical images. The majority of the initial attempts focused on replacing the input of a deep convolutional neural network with a medical image, which does not take into consideration the fundamental differences between these two types of images. Specifically, fine details are necessary for detection in medical images, unlike in natural images where coarse structures matter most. This difference makes it inadequate to use the existing network architectures developed for natural images, because they work on heavily downscaled images to reduce the memory requirements. This hides details necessary to make accurate predictions. Additionally, a single exam in medical imaging often comes with a set of views which must be fused in order to reach a correct conclusion. In our work, we propose to use a multi-view deep convolutional neural network that handles a set of high-resolution medical images. We evaluate it on large-scale mammography-based breast cancer screening (BI-RADS prediction) using 886,000 images. We focus on investigating the impact of the training set size and image size on the prediction accuracy. Our results highlight that performance increases with the size of the training set, and that the best performance can only be achieved using the original resolution. In the reader study, performed on a random subset of the test set, we confirmed the efficacy of our model, which achieved performance comparable to a committee of radiologists when presented with the same data.
Tasks Breast Cancer Detection
Published 2017-03-21
URL http://arxiv.org/abs/1703.07047v3
PDF http://arxiv.org/pdf/1703.07047v3.pdf
PWC https://paperswithcode.com/paper/high-resolution-breast-cancer-screening-with
Repo https://github.com/saidbm24/CNN-for-BIRADS
Framework pytorch
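
An illustrative multi-view sketch of the architecture family described above: each mammography view gets its own convolutional column, the pooled features are concatenated, and a shared head predicts the BI-RADS category. Column depth, feature sizes, and the toy resolution are placeholders; the real model operates on far larger images than this example.

```python
import torch
import torch.nn as nn

def column(hidden=32):                              # one convolutional column per view
    return nn.Sequential(
        nn.Conv2d(1, hidden, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(hidden, hidden, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten())

class MultiViewNet(nn.Module):
    def __init__(self, n_views=4, n_classes=3, hidden=32):
        super().__init__()
        self.columns = nn.ModuleList([column(hidden) for _ in range(n_views)])
        self.head = nn.Linear(n_views * hidden, n_classes)

    def forward(self, views):                       # list of B x 1 x H x W tensors (L/R, CC/MLO)
        feats = [col(v) for col, v in zip(self.columns, views)]
        return self.head(torch.cat(feats, dim=1))   # BI-RADS logits

logits = MultiViewNet()([torch.rand(2, 1, 256, 256) for _ in range(4)])
```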