October 16, 2019

2923 words 14 mins read

Paper Group ANR 1060

Video-based Sign Language Recognition without Temporal Segmentation. Learning Tree Distributions by Hidden Markov Models. GANVO: Unsupervised Deep Monocular Visual Odometry and Depth Estimation with Generative Adversarial Networks. Synthesizing Programs for Images using Reinforced Adversarial Learning. Stochastic Combinatorial Ensembles for Defending Against Adversarial Examples. Diving Deep onto Discriminative Ensemble of Histological Hashing & Class-Specific Manifold Learning for Multi-class Breast Carcinoma Taxonomy. Deep Convolutional Generative Adversarial Network Based Food Recognition Using Partially Labeled Data. Deep Learning Power Allocation in Massive MIMO. Deep Ensemble Bayesian Active Learning: Addressing the Mode Collapse issue in Monte Carlo dropout via Ensembles. Predicting Destinations by Nearest Neighbor Search on Training Vessel Routes. Hypertree Decompositions Revisited for PGMs. Identifiability of Gaussian Structural Equation Models with Dependent Errors Having Equal Variances. Crossing Generative Adversarial Networks for Cross-View Person Re-identification. Restructuring Batch Normalization to Accelerate CNN Training. RUC+CMU: System Report for Dense Captioning Events in Videos.

Video-based Sign Language Recognition without Temporal Segmentation

Title Video-based Sign Language Recognition without Temporal Segmentation
Authors Jie Huang, Wengang Zhou, Qilin Zhang, Houqiang Li, Weiping Li
Abstract Millions of hearing-impaired people around the world routinely use some variant of sign language to communicate, so automatic sign language translation is meaningful and important. Currently, there are two sub-problems in Sign Language Recognition (SLR): isolated SLR, which recognizes word by word, and continuous SLR, which translates entire sentences. Existing continuous SLR methods typically utilize isolated SLRs as building blocks, with an extra layer of preprocessing (temporal segmentation) and another layer of post-processing (sentence synthesis). Unfortunately, temporal segmentation itself is non-trivial and inevitably propagates errors into subsequent steps. Worse still, isolated SLR methods typically require strenuous labeling of each word separately in a sentence, severely limiting the amount of attainable training data. To address these challenges, we propose a novel continuous sign recognition framework, the Hierarchical Attention Network with Latent Space (LS-HAN), which eliminates the preprocessing of temporal segmentation. The proposed LS-HAN consists of three components: a two-stream Convolutional Neural Network (CNN) for video feature representation generation, a Latent Space (LS) for semantic gap bridging, and a Hierarchical Attention Network (HAN) for latent space based recognition. Experiments are carried out on two large scale datasets. Experimental results demonstrate the effectiveness of the proposed framework.
Tasks Sign Language Recognition
Published 2018-01-30
URL http://arxiv.org/abs/1801.10111v1
PDF http://arxiv.org/pdf/1801.10111v1.pdf
PWC https://paperswithcode.com/paper/video-based-sign-language-recognition-without
Repo
Framework
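The hierarchical attention idea above — attend over frames within a clip, then over clip vectors — can be sketched in a few lines. This is an illustrative numpy sketch with made-up shapes and a dot-product scoring function, not the authors' LS-HAN code:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(feats, query):
    # feats: (n, d) feature sequence; query: (d,) learned context vector
    scores = softmax(feats @ query)        # attention weight per element
    return scores @ feats                  # weighted sum -> (d,) summary

def hierarchical_encode(video, frame_query, clip_query):
    # video: list of clips, each an (n_frames, d) array of CNN features
    clip_vecs = np.stack([attention_pool(c, frame_query) for c in video])
    return attention_pool(clip_vecs, clip_query)  # one (d,) video vector
```

The resulting video vector would then be matched against sentence embeddings in the latent space.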

Learning Tree Distributions by Hidden Markov Models

Title Learning Tree Distributions by Hidden Markov Models
Authors Davide Bacciu, Daniele Castellana
Abstract Hidden tree Markov models allow learning distributions for tree structured data while being interpretable as nondeterministic automata. We provide a concise summary of the main approaches in the literature, focusing in particular on the causality assumptions introduced by the choice of a specific tree visit direction. We will then sketch a novel non-parametric generalization of the bottom-up hidden tree Markov model with its interpretation as a nondeterministic tree automaton with infinite states.
Tasks
Published 2018-05-31
URL http://arxiv.org/abs/1805.12372v1
PDF http://arxiv.org/pdf/1805.12372v1.pdf
PWC https://paperswithcode.com/paper/learning-tree-distributions-by-hidden-markov
Repo
Framework
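To make the "tree visit direction" concrete, here is a minimal inside recursion for a *top-down* hidden tree Markov model, where a parent state independently generates each child's state. This is a generic textbook-style sketch, not code from the paper (the data structure and variable names are assumptions):

```python
import numpy as np

def inside_loglik(tree, A, B, pi, root=0):
    # tree: node -> (symbol, [children]); A[i, j] = P(child state j | parent i)
    # B[i, v] = P(symbol v | state i); pi: prior over the root state.
    def beta(u):
        sym, children = tree[u]
        b = B[:, sym].copy()              # emission likelihood per state
        for c in children:
            b *= A @ beta(c)              # marginalize each child's state
        return b
    return float(np.log(pi @ beta(root)))
```

The bottom-up variant discussed in the paper instead conditions a node's state on the joint configuration of its children, which is where the nondeterministic-automaton interpretation becomes richer.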

GANVO: Unsupervised Deep Monocular Visual Odometry and Depth Estimation with Generative Adversarial Networks

Title GANVO: Unsupervised Deep Monocular Visual Odometry and Depth Estimation with Generative Adversarial Networks
Authors Yasin Almalioglu, Muhamad Risqi U. Saputra, Pedro P. B. de Gusmao, Andrew Markham, Niki Trigoni
Abstract In the last decade, supervised deep learning approaches have been extensively employed in visual odometry (VO) applications, but such approaches are not feasible in environments where labelled data is not abundant. On the other hand, unsupervised deep learning approaches for localization and mapping in unknown environments from unlabelled data have received comparatively less attention in VO research. In this study, we propose a generative unsupervised learning framework that predicts the 6-DoF camera pose and the monocular depth map of the scene from unlabelled RGB image sequences, using deep convolutional Generative Adversarial Networks (GANs). We create a supervisory signal by warping view sequences and assigning the re-projection minimization to the objective loss function adopted in the multi-view pose estimation and single-view depth generation networks. Detailed quantitative and qualitative evaluations of the proposed framework on the KITTI and Cityscapes datasets show that the proposed method outperforms both existing traditional and unsupervised deep VO methods, providing better results for both pose estimation and depth recovery.
Tasks Depth Estimation, Monocular Visual Odometry, Pose Estimation, Visual Odometry
Published 2018-09-16
URL http://arxiv.org/abs/1809.05786v3
PDF http://arxiv.org/pdf/1809.05786v3.pdf
PWC https://paperswithcode.com/paper/ganvo-unsupervised-deep-monocular-visual
Repo
Framework
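The view-warping supervisory signal rests on standard pinhole reprojection: each target pixel is back-projected with its predicted depth, transformed by the predicted relative pose, and re-projected into the source view; the photometric error between the target image and the warped source is the loss. A minimal numpy sketch of that geometry (intrinsics, pose convention and shapes are generic assumptions, not the paper's implementation):

```python
import numpy as np

def reproject(depth, K, T):
    # depth: (H, W) predicted depth; K: (3, 3) intrinsics; T: (4, 4) relative pose.
    # Returns the (H, W, 2) source-view pixel coordinates for each target pixel.
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], -1).reshape(-1, 3).T   # (3, N)
    cam = np.linalg.inv(K) @ pix * depth.reshape(-1)               # back-project
    cam_h = np.vstack([cam, np.ones(cam.shape[1])])                # homogeneous
    src = K @ (T @ cam_h)[:3]                                      # into source view
    return (src[:2] / src[2]).T.reshape(H, W, 2)

def photometric_loss(target, source_warped):
    return float(np.abs(target - source_warped).mean())            # L1 error
```

In the full pipeline the coordinates returned by `reproject` would drive a differentiable bilinear sampler, and the GAN discriminator provides an additional realism signal on the synthesized views.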

Synthesizing Programs for Images using Reinforced Adversarial Learning

Title Synthesizing Programs for Images using Reinforced Adversarial Learning
Authors Yaroslav Ganin, Tejas Kulkarni, Igor Babuschkin, S. M. Ali Eslami, Oriol Vinyals
Abstract Advances in deep generative networks have led to impressive results in recent years. Nevertheless, such models can often waste their capacity on the minutiae of datasets, presumably due to weak inductive biases in their decoders. This is where graphics engines may come in handy since they abstract away low-level details and represent images as high-level programs. Current methods that combine deep learning and renderers are limited by hand-crafted likelihood or distance functions, a need for large amounts of supervision, or difficulties in scaling their inference algorithms to richer datasets. To mitigate these issues, we present SPIRAL, an adversarially trained agent that generates a program which is executed by a graphics engine to interpret and sample images. The goal of this agent is to fool a discriminator network that distinguishes between real and rendered data, trained with a distributed reinforcement learning setup without any supervision. A surprising finding is that using the discriminator’s output as a reward signal is key to allowing the agent to make meaningful progress at matching the desired output rendering. To the best of our knowledge, this is the first demonstration of an end-to-end, unsupervised and adversarial inverse graphics agent on challenging real-world (MNIST, Omniglot, CelebA) and synthetic 3D datasets.
Tasks Omniglot
Published 2018-04-03
URL http://arxiv.org/abs/1804.01118v1
PDF http://arxiv.org/pdf/1804.01118v1.pdf
PWC https://paperswithcode.com/paper/synthesizing-programs-for-images-using
Repo
Framework
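The core loop — agent emits a program, a renderer executes it, and the discriminator's score becomes the reward — is easy to sketch. This is a toy skeleton under assumed interfaces (`agent_step`, `render`, `discriminator` are placeholders, not SPIRAL's actual components):

```python
import numpy as np

def rollout_reward(agent_step, render, discriminator, n_steps=5):
    # agent_step: state -> (action, new_state); render executes the action
    # "program" with a graphics engine; discriminator scores realism in (0, 1).
    actions, state = [], None
    for _ in range(n_steps):
        action, state = agent_step(state)
        actions.append(action)
    image = render(actions)                       # program -> image
    return float(discriminator(image)), actions   # score becomes the RL reward
```

A policy-gradient learner would then reinforce action sequences whose rendered output the discriminator mistakes for real data.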

Stochastic Combinatorial Ensembles for Defending Against Adversarial Examples

Title Stochastic Combinatorial Ensembles for Defending Against Adversarial Examples
Authors George A. Adam, Petr Smirnov, David Duvenaud, Benjamin Haibe-Kains, Anna Goldenberg
Abstract Many deep learning algorithms can be easily fooled with simple adversarial examples. To address the limitations of existing defenses, we devised a probabilistic framework that can generate an exponentially large ensemble of models from a single model with just a linear cost. This framework takes advantage of neural network depth and stochastically decides whether or not to insert noise removal operators such as VAEs between layers. We show empirically the important role that model gradients have when it comes to determining transferability of adversarial examples, and take advantage of this result to demonstrate that it is possible to train models with limited adversarial attack transferability. Additionally, we propose a detection method based on metric learning in order to detect adversarial examples that have no hope of being cleaned of maliciously engineered noise.
Tasks Adversarial Attack, Metric Learning
Published 2018-08-20
URL http://arxiv.org/abs/1808.06645v2
PDF http://arxiv.org/pdf/1808.06645v2.pdf
PWC https://paperswithcode.com/paper/stochastic-combinatorial-ensembles-for
Repo
Framework
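The "exponentially large ensemble at linear cost" claim comes from the stochastic forward pass: each layer boundary independently decides whether to apply a noise-removal operator, so a depth-L network implicitly defines 2^L ensemble members. A minimal sketch of that mechanism (callables stand in for real layers and VAE denoisers, which are assumptions of this illustration):

```python
import numpy as np

def stochastic_forward(x, layers, denoisers, p=0.5, rng=None):
    # layers: list of callables; denoisers: per-position noise-removal ops.
    # Each boundary flips a coin, so one model yields 2**len(layers) variants.
    rng = rng or np.random.default_rng()
    for layer, den in zip(layers, denoisers):
        x = layer(x)
        if rng.random() < p:
            x = den(x)
    return x
```

Averaging predictions over several such stochastic passes approximates the ensemble's vote.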

Diving Deep onto Discriminative Ensemble of Histological Hashing & Class-Specific Manifold Learning for Multi-class Breast Carcinoma Taxonomy

Title Diving Deep onto Discriminative Ensemble of Histological Hashing & Class-Specific Manifold Learning for Multi-class Breast Carcinoma Taxonomy
Authors Sawon Pratiher, Subhankar Chattoraj
Abstract Histopathological images (HI) encode resolution-dependent heterogeneous textures and diverse color-distribution variability, manifesting in micro-structural surface tissue convolutions. Also, the inherently high coherency of cancerous cells poses significant challenges to breast cancer (BC) multi-classification. As such, multi-class stratification is sparsely explored, and prior work mainly focuses on benign and malignant tissue characterization only, which forestalls further quantitative analysis of subordinate classes like adenosis, mucinous carcinoma and fibroadenoma for diagnostic competence. In this work, a fully-automated, near-real-time and computationally inexpensive robust multi-classification deep framework from HI is presented. The proposed scheme employs a deep neural network (DNN) aided discriminative ensemble of holistic class-specific manifold learning (CSML) for underlying HI sub-space embedding and HI-hashing-based local shallow signatures. The model achieves 95.8% multi-classification accuracy, a 2.8% overall performance improvement, and a 38.2% enhancement in the Lobular carcinoma (LC) sub-class recognition rate compared to the existing state-of-the-art on the well-known BreakHis dataset. Also, a 99.3% recognition rate at 200X magnification and a sensitivity of 100% for binary grading at all magnifications validate its suitability for clinical deployment in hand-held smart devices.
Tasks
Published 2018-06-18
URL http://arxiv.org/abs/1806.06876v3
PDF http://arxiv.org/pdf/1806.06876v3.pdf
PWC https://paperswithcode.com/paper/diving-deep-onto-discriminative-ensemble-of
Repo
Framework

Deep Convolutional Generative Adversarial Network Based Food Recognition Using Partially Labeled Data

Title Deep Convolutional Generative Adversarial Network Based Food Recognition Using Partially Labeled Data
Authors Bappaditya Mandal, N. B. Puhan, Avijit Verma
Abstract Traditional machine learning algorithms using hand-crafted feature extraction techniques (such as local binary patterns) have limited accuracy because of high variation in images of the same class (intra-class variation) for the food recognition task. In recent works, convolutional neural networks (CNN) have been applied to this task with better results than all previously reported methods. However, they perform best when trained with a large amount of annotated (labeled) food images, and obtaining such annotations in large volume is expensive, laborious and impractical. Our work aims at developing an efficient deep CNN learning-based method for food recognition that alleviates these limitations by using partially labeled training data with generative adversarial networks (GANs). We make new enhancements to the unsupervised training architecture introduced by Goodfellow et al. (2014), which was originally aimed at generating new data by sampling a dataset. In this work, we make modifications to deep convolutional GANs to make them robust and efficient for classifying food images. Experimental results on benchmarking datasets show the superiority of our proposed method as compared to the current state-of-the-art methodologies even when trained with partially labeled training data.
Tasks Food Recognition
Published 2018-12-26
URL http://arxiv.org/abs/1812.10179v1
PDF http://arxiv.org/pdf/1812.10179v1.pdf
PWC https://paperswithcode.com/paper/deep-convolutional-generative-adversarial
Repo
Framework
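A common way to classify with a GAN discriminator on partially labeled data is the K+1-class head: K real food classes plus one "generated" class, so unlabeled and generated images still contribute gradient. The abstract does not spell out the authors' head, so treat this as a generic semi-supervised-GAN sketch, not their architecture:

```python
import numpy as np

def k_plus_one_probs(logits):
    # logits: (N, K+1) — K food classes plus one "generated/fake" class.
    # Returns per-class probabilities and the probability of being fake.
    e = np.exp(logits - logits.max(-1, keepdims=True))   # stable softmax
    p = e / e.sum(-1, keepdims=True)
    return p[:, :-1], p[:, -1]
```

Labeled images train the first K outputs; real-vs-generated pairs train the last one.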

Deep Learning Power Allocation in Massive MIMO

Title Deep Learning Power Allocation in Massive MIMO
Authors Luca Sanguinetti, Alessio Zappone, Merouane Debbah
Abstract This work advocates the use of deep learning to perform max-min and max-prod power allocation in the downlink of Massive MIMO networks. More precisely, a deep neural network is trained to learn the map between the positions of user equipments (UEs) and the optimal power allocation policies, and is then used to predict the power allocation profiles for a new set of UEs’ positions. The use of deep learning significantly improves the complexity-performance trade-off of power allocation, compared to traditional optimization-oriented methods. Particularly, the proposed approach does not require the computation of any statistical average, which would instead be necessary with standard methods, and is able to guarantee near-optimal performance.
Tasks
Published 2018-12-10
URL https://arxiv.org/abs/1812.03640v2
PDF https://arxiv.org/pdf/1812.03640v2.pdf
PWC https://paperswithcode.com/paper/deep-learning-power-allocation-in-massive
Repo
Framework
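The learned map "UE positions in, power profile out" can be illustrated with a tiny feed-forward network. The softmax output that enforces a total-power budget is an assumption of this sketch (the paper's output layer and architecture may differ):

```python
import numpy as np

def mlp_power(positions, W1, b1, W2, b2, p_max=1.0):
    # positions: (K, 2) UE coordinates, flattened into one input vector.
    h = np.maximum(0.0, positions.reshape(-1) @ W1 + b1)   # ReLU hidden layer
    logits = h @ W2 + b2                                   # one logit per UE
    e = np.exp(logits - logits.max())
    return p_max * e / e.sum()   # non-negative powers summing to p_max
```

At inference time, evaluating this map for new UE positions is a handful of matrix products, versus re-running an optimization solver.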

Deep Ensemble Bayesian Active Learning: Addressing the Mode Collapse issue in Monte Carlo dropout via Ensembles

Title Deep Ensemble Bayesian Active Learning: Addressing the Mode Collapse issue in Monte Carlo dropout via Ensembles
Authors Remus Pop, Patric Fulop
Abstract In image classification tasks, the ability of deep CNNs to deal with complex image data has proven to be unrivalled. However, they require large amounts of labeled training data to reach their full potential. In specialised domains such as healthcare, labeled data can be difficult and expensive to obtain. Active Learning aims to alleviate this problem, by reducing the amount of labelled data needed for a specific task while delivering satisfactory performance. We propose DEBAL, a new active learning strategy designed for deep neural networks. This method improves upon the current state-of-the-art deep Bayesian active learning method, which suffers from the mode collapse problem. We correct for this deficiency by making use of the expressive power and statistical properties of model ensembles. Our proposed method manages to capture superior data uncertainty, which translates into improved classification performance. We demonstrate empirically that our ensemble method yields faster convergence of CNNs trained on the MNIST and CIFAR-10 datasets.
Tasks Active Learning, Image Classification
Published 2018-11-09
URL http://arxiv.org/abs/1811.03897v1
PDF http://arxiv.org/pdf/1811.03897v1.pdf
PWC https://paperswithcode.com/paper/deep-ensemble-bayesian-active-learning-1
Repo
Framework
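A standard way to score "data uncertainty" from an ensemble, as used in Bayesian active learning, is the mutual information (BALD) between predictions and model identity: predictive entropy minus the average per-member entropy. This sketch shows the generic acquisition score, not DEBAL's exact formulation:

```python
import numpy as np

def bald_ensemble(probs):
    # probs: (M, N, C) class probabilities from M ensemble members for N inputs.
    eps = 1e-12
    mean_p = probs.mean(0)
    H_mean = -(mean_p * np.log(mean_p + eps)).sum(-1)        # predictive entropy
    mean_H = -(probs * np.log(probs + eps)).sum(-1).mean(0)  # expected entropy
    return H_mean - mean_H   # disagreement between members, per input
```

Inputs where the members confidently disagree score highest and are the ones worth labeling next; an ensemble that has mode-collapsed scores everything near zero.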

Predicting Destinations by Nearest Neighbor Search on Training Vessel Routes

Title Predicting Destinations by Nearest Neighbor Search on Training Vessel Routes
Authors Valentin Roşca, Emanuel Onica, Paul Diac, Ciprian Amariei
Abstract The DEBS Grand Challenge 2018 is set in the context of maritime route prediction. Vessel routes are modeled as streams of Automatic Identification System (AIS) data points selected from real-world tracking data. The challenge requires correctly estimating the destination ports and arrival times of vessel trips as early as possible. Our proposed solution partitions the training vessel routes by reported destination port and uses a nearest neighbor search to find the training routes that are closest to the query AIS point. Particular improvements have been included as well, such as a way to avoid changing the predicted ports frequently within one query route and automating the parameter tuning by the use of a genetic algorithm. This leads to significant improvements on the final score.
Tasks
Published 2018-09-28
URL http://arxiv.org/abs/1810.00096v1
PDF http://arxiv.org/pdf/1810.00096v1.pdf
PWC https://paperswithcode.com/paper/predicting-destinations-by-nearest-neighbor
Repo
Framework
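At its core, the prediction step is a nearest-neighbor lookup: find the training AIS point closest to the query and report its route's destination port. A bare-bones sketch (flat Euclidean distance and the port names are illustrative assumptions; the real system partitions routes by port and smooths predictions along a trip):

```python
import numpy as np

def predict_port(query_point, train_points, train_ports):
    # train_points: (N, 2) AIS positions from training routes;
    # train_ports: destination-port label for each training point.
    d = np.linalg.norm(train_points - query_point, axis=1)
    return train_ports[int(d.argmin())]   # port of the closest training point
```

In practice one would use a spatial index (e.g. a k-d tree) instead of brute force, and haversine distance on latitude/longitude rather than Euclidean.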

Hypertree Decompositions Revisited for PGMs

Title Hypertree Decompositions Revisited for PGMs
Authors Aarthy Shivram Arun, Sai Vikneshwar Mani Jayaraman, Christopher Ré, Atri Rudra
Abstract We revisit the classical problem of exact inference on probabilistic graphical models (PGMs). Our algorithm is based on recent worst-case optimal database join algorithms, which can be asymptotically faster than traditional data processing methods. We present the first empirical evaluation of these new algorithms via JoinInfer, a new exact inference engine. We empirically explore the properties of the data for which our engine can be expected to outperform traditional inference engines, refining current theoretical notions. Further, JoinInfer outperforms existing state-of-the-art inference engines (ACE, IJGP and libDAI) on some standard benchmark datasets by up to a factor of 630x. Finally, we propose a promising data-driven heuristic that extends JoinInfer to automatically tailor its parameters and/or switch to the traditional inference algorithms.
Tasks
Published 2018-04-05
URL http://arxiv.org/abs/1804.01640v1
PDF http://arxiv.org/pdf/1804.01640v1.pdf
PWC https://paperswithcode.com/paper/hypertree-decompositions-revisited-for-pgms-1
Repo
Framework

Identifiability of Gaussian Structural Equation Models with Dependent Errors Having Equal Variances

Title Identifiability of Gaussian Structural Equation Models with Dependent Errors Having Equal Variances
Authors Jose M. Peña
Abstract In this paper, we prove that some Gaussian structural equation models with dependent errors having equal variances are identifiable from their corresponding Gaussian distributions. Specifically, we prove identifiability for the Gaussian structural equation models that can be represented as Andersson-Madigan-Perlman chain graphs (Andersson et al., 2001). These chain graphs were originally developed to represent independence models. However, they are also suitable for representing causal models with additive noise (Peña, 2016). Our result then implies that these causal models can be identified from observational data alone. Our result generalizes the result by Peters and Bühlmann (2014), who considered independent errors having equal variances. The suitability of the equal error variances assumption should be assessed on a per-domain basis.
Tasks
Published 2018-06-21
URL http://arxiv.org/abs/1806.08156v4
PDF http://arxiv.org/pdf/1806.08156v4.pdf
PWC https://paperswithcode.com/paper/identifiability-of-gaussian-structural-1
Repo
Framework

Crossing Generative Adversarial Networks for Cross-View Person Re-identification

Title Crossing Generative Adversarial Networks for Cross-View Person Re-identification
Authors Chengyuan Zhang, Lin Wu, Yang Wang
Abstract Person re-identification (re-id) refers to matching pedestrians across disjoint, non-overlapping camera views. The most effective way to match pedestrians undergoing significant visual variations is to seek reliably invariant features that describe the person of interest faithfully. Most existing methods are supervised, producing discriminative features by relying on labeled image pairs in correspondence. However, annotating pair-wise images is prohibitively labor-intensive, and thus not practical in large-scale camera networks. Moreover, seeking comparable representations across camera views demands a flexible model to address the complex distributions of images. In this work, we study the co-occurrence statistic patterns between pairs of images, and propose a crossing Generative Adversarial Network (Cross-GAN) for learning a joint distribution over cross-image representations in an unsupervised manner. Given a pair of person images, the proposed model consists of a variational auto-encoder to encode the pair into respective latent variables, a proposed cross-view alignment to reduce the view disparity, and an adversarial layer to seek the joint distribution of latent representations. The learned latent representations are well-aligned to reflect the co-occurrence patterns of paired images. We empirically evaluate the proposed model on challenging datasets, and our results show the importance of joint invariant features in improving the matching rates of person re-id in comparison to semi-/unsupervised state-of-the-art methods.
Tasks Cross-Modal Person Re-Identification, Person Re-Identification
Published 2018-01-04
URL http://arxiv.org/abs/1801.01760v1
PDF http://arxiv.org/pdf/1801.01760v1.pdf
PWC https://paperswithcode.com/paper/crossing-generative-adversarial-networks-for
Repo
Framework

Restructuring Batch Normalization to Accelerate CNN Training

Title Restructuring Batch Normalization to Accelerate CNN Training
Authors Wonkyung Jung, Daejin Jung, Byeongho Kim, Sunjung Lee, Wonjong Rhee, Jung Ho Ahn
Abstract Batch Normalization (BN) has become a core design block of modern Convolutional Neural Networks (CNNs). A typical modern CNN has a large number of BN layers in its lean and deep architecture. BN requires mean and variance calculations over each mini-batch during training. Therefore, the existing memory access reduction techniques, such as fusing multiple CONV layers, are not effective for accelerating BN due to their inability to optimize mini-batch related calculations during training. To address this increasingly important problem, we propose to restructure BN layers by first splitting a BN layer into two sub-layers (fission) and then combining the first sub-layer with its preceding CONV layer and the second sub-layer with the following activation and CONV layers (fusion). The proposed solution can significantly reduce main-memory accesses while training the latest CNN models, and the experiments on a chip multiprocessor show that the proposed BN restructuring can improve the performance of DenseNet-121 by 25.7%.
Tasks
Published 2018-07-04
URL http://arxiv.org/abs/1807.01702v2
PDF http://arxiv.org/pdf/1807.01702v2.pdf
PWC https://paperswithcode.com/paper/restructuring-batch-normalization-to
Repo
Framework
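The fission step splits batch normalization into its two natural halves: the data-dependent normalization (fusable with the preceding CONV, since both need the same activations) and the learned affine transform (fusable with the following activation and CONV). A minimal sketch of that decomposition; the memory-access scheduling that makes it pay off on hardware is, of course, not captured here:

```python
import numpy as np

def bn_fission(x, mean, var, gamma, beta, eps=1e-5):
    # Sub-layer 1: normalization — fused with the preceding CONV layer,
    # which already has the mini-batch activations in cache.
    x_hat = (x - mean) / np.sqrt(var + eps)
    # Sub-layer 2: affine transform — fused with the following activation
    # and CONV layers, avoiding a separate pass over main memory.
    return gamma * x_hat + beta
```

Mathematically the two sub-layers compose to exactly the standard BN output; the restructuring changes *when* each half touches memory, not *what* it computes.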

RUC+CMU: System Report for Dense Captioning Events in Videos

Title RUC+CMU: System Report for Dense Captioning Events in Videos
Authors Shizhe Chen, Yuqing Song, Yida Zhao, Jiarong Qiu, Qin Jin, Alexander Hauptmann
Abstract This notebook paper presents our system in the ActivityNet Dense Captioning in Video task (task 3). Temporal proposal generation and caption generation are both important to the dense captioning task. Therefore, we propose a proposal ranking model to employ a set of effective feature representations for proposal generation, and ensemble a series of caption models enhanced with context information to generate captions robustly on predicted proposals. Our approach achieves the state-of-the-art performance on the dense video captioning task with 8.529 METEOR score on the challenge testing set.
Tasks Dense Video Captioning, Video Captioning
Published 2018-06-22
URL http://arxiv.org/abs/1806.08854v1
PDF http://arxiv.org/pdf/1806.08854v1.pdf
PWC https://paperswithcode.com/paper/ruccmu-system-report-for-dense-captioning
Repo
Framework