January 31, 2020

2904 words 14 mins read

Paper Group AWR 419

Powering Hidden Markov Model by Neural Network based Generative Models. BlockSwap: Fisher-guided Block Substitution for Network Compression on a Budget. Deeper Text Understanding for IR with Contextual Neural Language Modeling. CullNet: Calibrated and Pose Aware Confidence Scores for Object Pose Estimation. Modular Universal Reparameterization: Dee …

Powering Hidden Markov Model by Neural Network based Generative Models

Title Powering Hidden Markov Model by Neural Network based Generative Models
Authors Dong Liu, Antoine Honoré, Saikat Chatterjee, Lars K. Rasmussen
Abstract The hidden Markov model (HMM) has been used successfully for sequential data modeling. In this work, we propose to strengthen the modeling capacity of the HMM by bringing in neural-network-based generative models. The proposed model, termed GenHMM, associates each HMM hidden state with a neural generative model that has a tractable exact likelihood and provides efficient likelihood computation. A generative model in GenHMM consists of a mixture of generators realized by flow models. A learning algorithm for GenHMM is proposed within the expectation-maximization framework, and the convergence of GenHMM learning is analyzed. We demonstrate the efficiency of GenHMM on classification tasks over practical sequential data. Code is available at https://github.com/FirstHandScientist/genhmm.
Tasks
Published 2019-10-13
URL https://arxiv.org/abs/1910.05744v2
PDF https://arxiv.org/pdf/1910.05744v2.pdf
PWC https://paperswithcode.com/paper/powering-hidden-markov-model-by-neural
Repo https://github.com/FirstHandScientist/genhmm
Framework pytorch
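
As a rough illustration of the idea, the sketch below pairs a standard log-space HMM forward recursion with flow-based emission densities. The single affine-coupling layer stands in for the paper's mixture of flow generators, and all names and sizes are illustrative rather than taken from the authors' repository.

```python
# Sketch: HMM whose per-state emission density is a normalizing flow with
# exact, tractable log-likelihood (the core GenHMM idea, simplified).
import torch
import torch.nn as nn

class AffineCouplingFlow(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(nn.Linear(self.half, 64), nn.Tanh(),
                                 nn.Linear(64, 2 * (dim - self.half)))

    def log_prob(self, x):
        # z = f(x) has a triangular Jacobian, so the density is exact.
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(x1).chunk(2, dim=-1)
        z = torch.cat([x1, x2 * torch.exp(s) + t], dim=-1)
        base = torch.distributions.Normal(0.0, 1.0)
        return base.log_prob(z).sum(-1) + s.sum(-1)

def forward_log_likelihood(obs, log_pi, log_A, flows):
    # Standard forward recursion in log space; emissions come from the flows.
    log_b = torch.stack([f.log_prob(obs) for f in flows], dim=1)  # (T, S)
    alpha = log_pi + log_b[0]
    for t in range(1, obs.shape[0]):
        alpha = torch.logsumexp(alpha.unsqueeze(1) + log_A, dim=0) + log_b[t]
    return torch.logsumexp(alpha, dim=0)
```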

BlockSwap: Fisher-guided Block Substitution for Network Compression on a Budget

Title BlockSwap: Fisher-guided Block Substitution for Network Compression on a Budget
Authors Jack Turner, Elliot J. Crowley, Michael O’Boyle, Amos Storkey, Gavin Gray
Abstract The desire to map neural networks to varying-capacity devices has led to the development of a wealth of compression techniques, many of which involve replacing standard convolutional blocks in a large network with cheap alternative blocks. However, not all blocks are created equal; for a required compute budget there may exist a potent combination of many different cheap blocks, though exhaustively searching for such a combination is prohibitively expensive. In this work, we develop BlockSwap: a fast algorithm for choosing networks with interleaved block types by passing a single minibatch of training data through randomly initialised networks and gauging their Fisher potential. These networks can then be used as students and distilled with the original large network as a teacher. We demonstrate the effectiveness of the chosen networks across CIFAR-10 and ImageNet for classification, and COCO for detection, and provide a comprehensive ablation study of our approach. BlockSwap quickly explores possible block configurations using a simple architecture ranking system, yielding highly competitive networks in orders of magnitude less time than most architecture search techniques (e.g. under 5 minutes on a single GPU for CIFAR-10). Code is available at https://github.com/BayesWatch/pytorch-blockswap.
Tasks
Published 2019-06-10
URL https://arxiv.org/abs/1906.04113v2
PDF https://arxiv.org/pdf/1906.04113v2.pdf
PWC https://paperswithcode.com/paper/blockswap-fisher-guided-block-substitution
Repo https://github.com/BayesWatch/pytorch-blockswap
Framework pytorch
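
The ranking step can be illustrated with a short sketch: score each randomly initialised candidate on one minibatch and keep the highest-scoring ones as students. Note that `fisher_potential` below is a simplified proxy (summed squared parameter gradients) rather than the paper's per-block activation-gradient measure, and `candidate_factories` is a hypothetical input.

```python
# Sketch: one-minibatch Fisher-style ranking of random candidate networks.
import torch
import torch.nn.functional as F

def fisher_potential(model, x, y):
    # Simplified proxy: one backward pass, no training; sum the squared
    # gradients of a cross-entropy loss over all parameters.
    model.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return sum((p.grad ** 2).sum().item()
               for p in model.parameters() if p.grad is not None)

def rank_candidates(candidate_factories, x, y):
    # candidate_factories: callables returning freshly initialised networks
    # with different interleavings of cheap and standard blocks.
    scored = [(fisher_potential(make(), x, y), i)
              for i, make in enumerate(candidate_factories)]
    return sorted(scored, reverse=True)  # highest Fisher potential first
```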

Deeper Text Understanding for IR with Contextual Neural Language Modeling

Title Deeper Text Understanding for IR with Contextual Neural Language Modeling
Authors Zhuyun Dai, Jamie Callan
Abstract Neural networks provide new possibilities to automatically learn complex language patterns and query-document relations. Neural IR models have achieved promising results in learning query-document relevance patterns, but few explorations have been done on understanding the text content of a query or a document. This paper studies leveraging a recently-proposed contextual neural language model, BERT, to provide deeper text understanding for IR. Experimental results demonstrate that the contextual text representations from BERT are more effective than traditional word embeddings. Compared to bag-of-words retrieval models, the contextual language model can better leverage language structures, bringing large improvements on queries written in natural languages. Combining the text understanding ability with search knowledge leads to an enhanced pre-trained BERT model that can benefit related search tasks where training data are limited.
Tasks Ad-Hoc Information Retrieval, Language Modelling, Word Embeddings
Published 2019-05-22
URL https://arxiv.org/abs/1905.09217v1
PDF https://arxiv.org/pdf/1905.09217v1.pdf
PWC https://paperswithcode.com/paper/deeper-text-understanding-for-ir-with
Repo https://github.com/AdeDZY/SIGIR19-BERT-IR
Framework pytorch
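
A minimal sketch of BERT-based query-document scoring in the spirit of the paper, written against the Hugging Face `transformers` API rather than the authors' code; the `bert-base-uncased` checkpoint is a stand-in, and the classification head would still need fine-tuning on relevance labels before the scores mean anything.

```python
# Sketch: score a query-passage pair with a BERT cross-encoder.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # relevant / not relevant

def score(query: str, passage: str) -> float:
    # Query and passage are packed as one [CLS] q [SEP] d [SEP] sequence;
    # the relevance probability comes from the classification head.
    inputs = tokenizer(query, passage, truncation=True, max_length=512,
                       return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

print(score("car maintenance tips", "How to change your engine oil safely."))
```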

CullNet: Calibrated and Pose Aware Confidence Scores for Object Pose Estimation

Title CullNet: Calibrated and Pose Aware Confidence Scores for Object Pose Estimation
Authors Kartik Gupta, Lars Petersson, Richard Hartley
Abstract We present a new approach for single-view, image-based object pose estimation, addressing the problem of culling false positives among several pose proposal estimates. Our approach targets the inaccurate confidence values predicted by CNNs, which are used by many current methods to choose a final object pose prediction. We present a network called CullNet to solve this task. CullNet takes as input pairs of pose masks rendered from a 3D model and cropped regions in the original image, and uses them to calibrate the confidence scores of the pose proposals. This new set of confidence scores is significantly more reliable for accurate object pose estimation, as shown by our results. Our experimental results on multiple challenging datasets (LINEMOD and Occlusion LINEMOD) reflect the utility of our proposed method: our overall pose estimation pipeline outperforms state-of-the-art object pose estimation methods on these standard datasets. Our code is publicly available at https://github.com/kartikgupta-at-anu/CullNet.
Tasks 6D Pose Estimation using RGB, Pose Estimation, Pose Prediction
Published 2019-09-30
URL https://arxiv.org/abs/1909.13476v1
PDF https://arxiv.org/pdf/1909.13476v1.pdf
PWC https://paperswithcode.com/paper/cullnet-calibrated-and-pose-aware-confidence
Repo https://github.com/kartikgupta-at-anu/CullNet
Framework none
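
The culling logic can be sketched as follows, assuming hypothetical helpers `render_mask` and `crop` plus a trained `cullnet` that maps a (rendered mask, image crop) pair to a calibrated confidence; this illustrates only the selection step, not the authors' implementation.

```python
# Sketch: pick the pose proposal with the best calibrated confidence.
import torch

def select_pose(proposals, image, model3d, cullnet, render_mask, crop):
    best_pose, best_conf = None, float("-inf")
    for pose in proposals:
        mask = render_mask(model3d, pose)      # pose mask from the 3D model
        patch = crop(image, pose)              # image region under the pose
        pair = torch.cat([mask, patch], dim=0)[None]  # stacked as channels
        conf = cullnet(pair).item()            # calibrated confidence score
        if conf > best_conf:
            best_pose, best_conf = pose, conf
    return best_pose, best_conf
```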

Modular Universal Reparameterization: Deep Multi-task Learning Across Diverse Domains

Title Modular Universal Reparameterization: Deep Multi-task Learning Across Diverse Domains
Authors Elliot Meyerson, Risto Miikkulainen
Abstract As deep learning applications continue to become more diverse, an interesting question arises: Can general problem solving arise from jointly learning several such diverse tasks? To approach this question, deep multi-task learning is extended in this paper to the setting where there is no obvious overlap between task architectures. The idea is that any set of (architecture, task) pairs can be decomposed into a set of potentially related subproblems, whose sharing is optimized by an efficient stochastic algorithm. The approach is first validated in a classic synthetic multi-task learning benchmark, and then applied to sharing across disparate architectures for vision, NLP, and genomics tasks. It discovers regularities across these domains, encodes them into sharable modules, and combines these modules systematically to improve performance in the individual tasks. The results confirm that sharing learned functionality across diverse domains and architectures is indeed beneficial, thus establishing a key ingredient for general problem solving in the future.
Tasks Multi-Task Learning
Published 2019-05-31
URL https://arxiv.org/abs/1906.00097v2
PDF https://arxiv.org/pdf/1906.00097v2.pdf
PWC https://paperswithcode.com/paper/190600097
Repo https://github.com/leaf-ai/muir
Framework pytorch
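
A toy sketch of the reparameterization idea: parameters at many "locations" across different task networks are generated from a small shared bank of modules via a learned soft assignment. Shapes and the assignment scheme here are illustrative only; the paper optimizes sharing with a stochastic algorithm rather than a plain softmax.

```python
# Sketch: weight blocks at different locations materialized from a shared
# bank of modules, so functionality can be reused across architectures.
import torch
import torch.nn as nn

class SharedParamBank(nn.Module):
    def __init__(self, n_modules=8, block=16):
        super().__init__()
        self.bank = nn.Parameter(torch.randn(n_modules, block, block) * 0.02)

    def materialize(self, logits):
        # Each location owns logits over the bank; its weight block is a
        # softmax-weighted mixture of the shared modules.
        w = torch.softmax(logits, dim=0)             # (n_modules,)
        return torch.einsum("m,mij->ij", w, self.bank)

bank = SharedParamBank()
loc_logits = nn.Parameter(torch.zeros(8))            # one location's assignment
weight_block = bank.materialize(loc_logits)          # a (16, 16) weight tile
```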

Temporal Normalizing Flows

Title Temporal Normalizing Flows
Authors Remy Kusters, Gert-Jan Both
Abstract Analyzing and interpreting time-dependent stochastic data requires accurate and robust density estimation. In this paper we extend the concept of normalizing flows to so-called temporal normalizing flows (tNFs) to estimate time-dependent distributions, leveraging the full spatio-temporal information present in the dataset. Our approach is unsupervised, does not require an a priori characteristic scale, and can accurately estimate multi-scale distributions of vastly different length scales. We illustrate tNFs on sparse datasets of Brownian and chemotactic walkers, showing that the inclusion of temporal information enhances density estimation. Finally, we speculate how tNFs can be applied to fit and discover the continuous PDE underlying a stochastic process.
Tasks Density Estimation
Published 2019-12-19
URL https://arxiv.org/abs/1912.09092v1
PDF https://arxiv.org/pdf/1912.09092v1.pdf
PWC https://paperswithcode.com/paper/temporal-normalizing-flows
Repo https://github.com/PhIMaL/temporal_normalizing_flows
Framework pytorch
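
One way to picture a tNF is a flow conditioned on time, fitted jointly over all (x, t) samples; the single conditional affine layer below is a minimal stand-in for the paper's full flow, with stand-in random data in place of real trajectories.

```python
# Sketch: a time-conditioned flow fitted by maximum likelihood on (x, t).
import torch
import torch.nn as nn

class ConditionalAffineFlow(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 2))  # scale, shift from t

    def log_prob(self, x, t):
        s, b = self.net(t).chunk(2, dim=-1)   # time-dependent parameters
        z = (x - b) * torch.exp(-s)           # invert x = z * exp(s) + b
        base = torch.distributions.Normal(0.0, 1.0)
        return (base.log_prob(z) - s).sum(-1) # change-of-variables correction

flow = ConditionalAffineFlow()
opt = torch.optim.Adam(flow.parameters(), lr=1e-3)
x, t = torch.randn(256, 1), torch.rand(256, 1)  # stand-in trajectory samples
opt.zero_grad()
loss = -flow.log_prob(x, t).mean()              # joint spatio-temporal fit
loss.backward()
opt.step()
```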

Compound Probabilistic Context-Free Grammars for Grammar Induction

Title Compound Probabilistic Context-Free Grammars for Grammar Induction
Authors Yoon Kim, Chris Dyer, Alexander M. Rush
Abstract We study a formalization of the grammar induction problem that models sentences as being generated by a compound probabilistic context-free grammar. In contrast to traditional formulations which learn a single stochastic grammar, our grammar’s rule probabilities are modulated by a per-sentence continuous latent variable, which induces marginal dependencies beyond the traditional context-free assumptions. Inference in this grammar is performed by collapsed variational inference, in which an amortized variational posterior is placed on the continuous variable, and the latent trees are marginalized out with dynamic programming. Experiments on English and Chinese show the effectiveness of our approach compared to recent state-of-the-art methods when evaluated on unsupervised parsing.
Tasks Constituency Grammar Induction
Published 2019-06-24
URL https://arxiv.org/abs/1906.10225v9
PDF https://arxiv.org/pdf/1906.10225v9.pdf
PWC https://paperswithcode.com/paper/compound-probabilistic-context-free-grammars
Repo https://github.com/harvardnlp/compound-pcfg
Framework pytorch
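
The key move can be sketched compactly: rule log-probabilities come from a network that consumes a per-sentence latent z, so each sentence effectively gets its own PCFG. Dimensions below are illustrative, and the inside algorithm and variational posterior are omitted entirely.

```python
# Sketch: per-sentence latent z modulates the grammar's rule probabilities.
import torch
import torch.nn as nn

NT, T = 30, 60                                  # nonterminals, preterminals
z_dim, emb = 64, 128
rule_emb = nn.Parameter(torch.randn(NT, emb))   # one embedding per parent NT
mlp = nn.Sequential(nn.Linear(emb + z_dim, 256), nn.ReLU(),
                    nn.Linear(256, (NT + T) ** 2))  # scores for A -> B C

def rule_log_probs(z):
    # Concatenate z to every parent embedding; normalize over children pairs.
    h = torch.cat([rule_emb, z.expand(NT, -1)], dim=-1)
    return torch.log_softmax(mlp(h), dim=-1)    # (NT, (NT+T)^2)

z = torch.randn(1, z_dim)                       # sampled once per sentence
log_probs = rule_log_probs(z)                   # this sentence's grammar
```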

MaCow: Masked Convolutional Generative Flow

Title MaCow: Masked Convolutional Generative Flow
Authors Xuezhe Ma, Xiang Kong, Shanghang Zhang, Eduard Hovy
Abstract Flow-based generative models, conceptually attractive due to the tractability of both exact log-likelihood computation and latent-variable inference, and the efficiency of both training and sampling, have led to a number of impressive empirical successes and spawned many advanced variants and theoretical investigations. Despite their computational efficiency, however, the density estimation performance of flow-based generative models falls significantly behind that of state-of-the-art autoregressive models. In this work, we introduce the masked convolutional generative flow (MaCow), a simple yet effective architecture for generative flows using masked convolution. By restricting the local connectivity to a small kernel, MaCow enjoys fast and stable training and efficient sampling, while achieving significant improvements over Glow for density estimation on standard image benchmarks, considerably narrowing the gap to autoregressive models.
Tasks Density Estimation, Image Generation
Published 2019-02-12
URL https://arxiv.org/abs/1902.04208v5
PDF https://arxiv.org/pdf/1902.04208v5.pdf
PWC https://paperswithcode.com/paper/macow-masked-convolutional-generative-flow
Repo https://github.com/XuezheMax/macow
Framework pytorch
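
A sketch of the masked convolution at the core of the model: zeroing the centre-and-after portion of a small kernel restricts each output to already-visited neighbours, so per-pixel affine parameters computed from that context give a triangular Jacobian. This is a simplified illustration, not the authors' layer.

```python
# Sketch: a small masked conv whose receptive field excludes the current
# pixel and everything after it in raster order; MaCow-style units would use
# such context to produce per-pixel affine scale/shift for an invertible flow.
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    def __init__(self, channels, kernel_size=3):
        super().__init__(channels, channels, kernel_size,
                         padding=kernel_size // 2)
        k = kernel_size
        mask = torch.ones_like(self.weight)
        mask[:, :, k // 2, k // 2:] = 0   # block the centre and to its right
        mask[:, :, k // 2 + 1:, :] = 0    # block every row below the centre
        self.register_buffer("mask", mask)

    def forward(self, x):
        return nn.functional.conv2d(x, self.weight * self.mask, self.bias,
                                    padding=self.padding)

ctx = MaskedConv2d(8)(torch.randn(1, 8, 32, 32))  # causal local context
```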

Mesh R-CNN

Title Mesh R-CNN
Authors Georgia Gkioxari, Jitendra Malik, Justin Johnson
Abstract Rapid advances in 2D perception have led to systems that accurately detect objects in real-world images. However, these systems make predictions in 2D, ignoring the 3D structure of the world. Concurrently, advances in 3D shape prediction have mostly focused on synthetic benchmarks and isolated objects. We unify advances in these two areas. We propose a system that detects objects in real-world images and produces a triangle mesh giving the full 3D shape of each detected object. Our system, called Mesh R-CNN, augments Mask R-CNN with a mesh prediction branch that outputs meshes with varying topological structure by first predicting coarse voxel representations, which are then converted to meshes and refined with a graph convolution network operating over the mesh’s vertices and edges. We validate our mesh prediction branch on ShapeNet, where we outperform prior work on single-image shape prediction. We then deploy our full Mesh R-CNN system on Pix3D, where we jointly detect objects and predict their 3D shapes.
Tasks 3D Shape Modeling
Published 2019-06-06
URL https://arxiv.org/abs/1906.02739v2
PDF https://arxiv.org/pdf/1906.02739v2.pdf
PWC https://paperswithcode.com/paper/mesh-r-cnn
Repo https://github.com/facebookresearch/meshrcnn
Framework pytorch
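
The voxels-to-mesh step can be sketched with PyTorch3D's real `cubify` op, which turns predicted occupancies into an initial triangle mesh; the voxel head, the graph-convolution refinement stages, and the Mask R-CNN backbone are all stubbed out here.

```python
# Sketch: coarse voxel predictions -> initial mesh (requires pytorch3d).
import torch
from pytorch3d.ops import cubify

voxel_logits = torch.randn(1, 24, 24, 24)   # stand-in voxel head output
occupancy = torch.sigmoid(voxel_logits)     # (N, D, H, W) probabilities
mesh = cubify(occupancy, thresh=0.5)        # coarse triangle mesh (Meshes)
# Mesh R-CNN then refines the mesh vertices with graph convolutions over the
# mesh's edges, conditioned on image features sampled at each vertex.
```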

FastV2C-HandNet: Fast Voxel to Coordinate Hand Pose Estimation with 3D Convolutional Neural Networks

Title FastV2C-HandNet: Fast Voxel to Coordinate Hand Pose Estimation with 3D Convolutional Neural Networks
Authors Rohan Lekhwani, Bhupendra Singh
Abstract Hand pose estimation from monocular depth images has been an important and challenging problem in the Computer Vision community. In this paper, we present a novel approach to estimate 3D hand joint locations from 2D depth images. Unlike most previous methods, our model captures the 3D spatial information from a depth image, giving it a greater understanding of the input. We voxelize the input depth map to capture the 3D features of the input and perform 3D data augmentations to make our network robust to real-world images. Our network is trained in an end-to-end manner, which reduces time and space complexity significantly when compared to other methods. Through extensive experiments, we show that our model outperforms state-of-the-art methods with respect to the time it takes to train and predict 3D hand joint locations. This makes our method more suitable for real-world hand pose estimation scenarios.
Tasks Hand Pose Estimation, Pose Estimation
Published 2019-07-15
URL https://arxiv.org/abs/1907.06327v3
PDF https://arxiv.org/pdf/1907.06327v3.pdf
PWC https://paperswithcode.com/paper/fastv2c-handnet-fast-voxel-to-coordinate-hand
Repo https://github.com/RonLek/FastV2C-HandNet
Framework tf
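
The voxelization step might look like the sketch below: back-project the depth map through a pinhole camera model and bin the resulting points into an occupancy grid for the 3D CNN. The intrinsics and grid size are placeholders, not values from the paper.

```python
# Sketch: depth map -> point cloud -> binary occupancy grid.
import numpy as np

def depth_to_voxels(depth, fx, fy, cx, cy, grid=32):
    v, u = np.nonzero(depth)                 # pixels with valid depth
    z = depth[v, u]
    x = (u - cx) * z / fx                    # pinhole back-projection
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=1)
    pts -= pts.mean(axis=0)                  # centre the cloud on the hand
    scale = np.abs(pts).max() + 1e-6
    idx = ((pts / scale * 0.5 + 0.5) * (grid - 1)).astype(int)
    vox = np.zeros((grid, grid, grid), dtype=np.float32)
    vox[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0   # mark occupied cells
    return vox
```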

AutoAssist: A Framework to Accelerate Training of Deep Neural Networks

Title AutoAssist: A Framework to Accelerate Training of Deep Neural Networks
Authors Jiong Zhang, Hsiang-fu Yu, Inderjit S. Dhillon
Abstract Deep neural networks have yielded superior performance in many applications; however, gradient computation in a deep model with millions of instances leads to a lengthy training process even with modern GPU/TPU hardware acceleration. In this paper, we propose AutoAssist, a simple framework to accelerate the training of a deep neural network. Typically, as training evolves, the improvement the current model gains from a stochastic gradient update on each instance varies dynamically. AutoAssist exploits this fact with a simple instance-shrinking operation that filters out instances with relatively low marginal improvement to the current model, so that the computationally intensive gradient computations are performed on informative instances as much as possible. We prove that the proposed technique outperforms vanilla SGD with existing importance-sampling approaches for linear SVM problems, and establish O(1/k) convergence for strongly convex problems. To apply the proposed techniques to deep models, we jointly train a very lightweight Assistant network in addition to the original deep network, referred to as the Boss. The Assistant network is designed to gauge the importance of a given instance with respect to the current Boss so that a shrinking operation can be applied in the batch generator. With careful design, we train the Boss and Assistant in a non-blocking, asynchronous fashion such that overhead is minimal. We demonstrate that AutoAssist reduces the number of epochs by 40% for training a ResNet to reach the same test accuracy on an image classification dataset, and saves 30% of the training time needed for a transformer model to yield the same BLEU scores on a translation dataset.
Tasks Image Classification
Published 2019-05-08
URL https://arxiv.org/abs/1905.03381v1
PDF https://arxiv.org/pdf/1905.03381v1.pdf
PWC https://paperswithcode.com/paper/190503381
Repo https://github.com/zhangjiong724/autoassist-exp
Framework pytorch
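
A much-simplified sketch of one Boss/Assistant step: the lightweight Assistant scores each instance, and the expensive Boss gradient update runs only on the instances kept. The thresholding rule and the asynchronous training of the Assistant itself are heavily condensed here.

```python
# Sketch: instance shrinking in the batch generator via an Assistant network.
import torch
import torch.nn.functional as F

def autoassist_step(boss, assistant, opt, x, y, keep_threshold=0.5):
    with torch.no_grad():
        keep_prob = torch.sigmoid(assistant(x)).squeeze(-1)
    mask = keep_prob > keep_threshold        # shrink the batch
    if mask.sum() == 0:
        return None                          # nothing informative this batch
    opt.zero_grad()
    loss = F.cross_entropy(boss(x[mask]), y[mask])
    loss.backward()                          # gradients on kept instances only
    opt.step()
    return loss.item()
```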

Implicit 3D Orientation Learning for 6D Object Detection from RGB Images

Title Implicit 3D Orientation Learning for 6D Object Detection from RGB Images
Authors Martin Sundermeyer, Zoltan-Csaba Marton, Maximilian Durner, Manuel Brucker, Rudolph Triebel
Abstract We propose a real-time RGB-based pipeline for object detection and 6D pose estimation. Our novel 3D orientation estimation is based on a variant of the Denoising Autoencoder that is trained on simulated views of a 3D model using Domain Randomization. This so-called Augmented Autoencoder has several advantages over existing methods: It does not require real, pose-annotated training data, generalizes to various test sensors and inherently handles object and view symmetries. Instead of learning an explicit mapping from input images to object poses, it provides an implicit representation of object orientations defined by samples in a latent space. Our pipeline achieves state-of-the-art performance on the T-LESS dataset both in the RGB and RGB-D domain. We also evaluate on the LineMOD dataset where we can compete with other synthetically trained approaches. We further increase performance by correcting 3D orientation estimates to account for perspective errors when the object deviates from the image center and show extended results.
Tasks 6D Pose Estimation, 6D Pose Estimation using RGB, Denoising, Object Detection, Pose Estimation
Published 2019-02-04
URL https://arxiv.org/abs/1902.01275v2
PDF https://arxiv.org/pdf/1902.01275v2.pdf
PWC https://paperswithcode.com/paper/implicit-3d-orientation-learning-for-6d
Repo https://github.com/DLR-RM/AugmentedAutoencoder
Framework tf
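
The test-time orientation lookup can be sketched as a cosine-similarity search over a codebook of latent codes computed from renders at known rotations; the encoder and the codebook construction are placeholders, and the perspective correction mentioned in the abstract is omitted.

```python
# Sketch: implicit orientation -> explicit rotation via codebook lookup.
import torch
import torch.nn.functional as F

def estimate_orientation(encoder, crop, codebook_z, codebook_R):
    # codebook_z: (N, d) latent codes of renders at N known rotations
    # codebook_R: list of the N corresponding 3x3 rotation matrices
    z = F.normalize(encoder(crop[None]), dim=-1)   # (1, d) implicit pose code
    sims = z @ F.normalize(codebook_z, dim=-1).T   # cosine similarities
    return codebook_R[sims.argmax().item()]        # closest stored rotation
```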

Connecting Language and Knowledge with Heterogeneous Representations for Neural Relation Extraction

Title Connecting Language and Knowledge with Heterogeneous Representations for Neural Relation Extraction
Authors Peng Xu, Denilson Barbosa
Abstract Knowledge Bases (KBs) require constant updating to reflect changes to the world they represent. For general-purpose KBs, this is often done through Relation Extraction (RE), the task of predicting KB relations expressed in text mentioning entities known to the KB. One way to improve RE is to use KB Embeddings (KBE) for link prediction. However, despite clear connections between RE and KBE, little has been done toward properly unifying these models systematically. We help close the gap with a framework that unifies the learning of RE and KBE models, leading to significant improvements over the state of the art in RE. The code is available at https://github.com/billy-inn/HRERE.
Tasks Link Prediction, Relation Extraction
Published 2019-03-25
URL https://arxiv.org/abs/1903.10126v3
PDF https://arxiv.org/pdf/1903.10126v3.pdf
PWC https://paperswithcode.com/paper/connecting-language-and-knowledge-with
Repo https://github.com/billy-inn/HRERE
Framework tf
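
The unification can be sketched as a joint objective that couples a text-based relation classifier with a KB-embedding score over the same relation vectors; the TransE-style term below is a stand-in for the paper's KBE component, and the coupling weight is illustrative.

```python
# Sketch: one loss ties relation extraction to KB-embedding consistency.
import torch
import torch.nn.functional as F

def joint_loss(text_logits, rel_ids, head_e, rel_e, tail_e, alpha=0.5):
    re_loss = F.cross_entropy(text_logits, rel_ids)    # text-side RE loss
    kbe_loss = (head_e + rel_e - tail_e).norm(p=2, dim=-1).mean()  # TransE
    return re_loss + alpha * kbe_loss                  # shared relation space
```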

jMetalPy: a Python Framework for Multi-Objective Optimization with Metaheuristics

Title jMetalPy: a Python Framework for Multi-Objective Optimization with Metaheuristics
Authors Antonio Benitez-Hidalgo, Antonio J. Nebro, Jose Garcia-Nieto, Izaskun Oregi, Javier Del Ser
Abstract This paper describes jMetalPy, an object-oriented Python-based framework for multi-objective optimization with metaheuristic techniques. Building upon our experiences with the well-known jMetal framework, we have developed a new multi-objective optimization software platform aiming not only at replicating the former one in a different programming language, but also at taking advantage of the full feature set of Python, including its facilities for fast prototyping and the large number of available libraries for data processing, data analysis, data visualization, and high-performance computing. As a result, jMetalPy provides an environment for solving multi-objective optimization problems focused not only on traditional metaheuristics, but also on techniques supporting preference articulation and dynamic problems, along with a rich set of features related to the automatic generation of statistical data from the results generated, as well as the real-time and interactive visualization of the Pareto front approximations produced by the algorithms. jMetalPy additionally offers support for parallel computing on multicore and cluster systems. We include some use cases to explore the main features of jMetalPy and to illustrate how to work with it.
Tasks
Published 2019-03-07
URL http://arxiv.org/abs/1903.02915v2
PDF http://arxiv.org/pdf/1903.02915v2.pdf
PWC https://paperswithcode.com/paper/jmetalpy-a-python-framework-for-multi
Repo https://github.com/jMetal/jMetalPy
Framework none
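
A minimal usage example, adapted from the project's documented quickstart (argument names may differ slightly across jMetalPy versions): solve the ZDT1 benchmark with NSGA-II and retrieve the resulting front approximation.

```python
# Sketch: NSGA-II on the ZDT1 benchmark with jMetalPy.
from jmetal.algorithm.multiobjective.nsgaii import NSGAII
from jmetal.operator import SBXCrossover, PolynomialMutation
from jmetal.problem import ZDT1
from jmetal.util.termination_criterion import StoppingByEvaluations

problem = ZDT1()
algorithm = NSGAII(
    problem=problem,
    population_size=100,
    offspring_population_size=100,
    mutation=PolynomialMutation(probability=1.0 / problem.number_of_variables,
                                distribution_index=20),
    crossover=SBXCrossover(probability=1.0, distribution_index=20),
    termination_criterion=StoppingByEvaluations(max_evaluations=25000))
algorithm.run()
front = algorithm.get_result()   # non-dominated solutions found
```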

Feeding the zombies: Synthesizing brain volumes using a 3D progressive growing GAN

Title Feeding the zombies: Synthesizing brain volumes using a 3D progressive growing GAN
Authors Anders Eklund
Abstract Deep learning requires large datasets for training (convolutional) networks with millions of parameters. In neuroimaging, there are few open datasets with more than 100 subjects, which makes it difficult to, for example, train a classifier to discriminate controls from diseased persons. Generative adversarial networks (GANs) can be used to synthesize data, but virtually all research is focused on 2D images. In medical imaging, and especially in neuroimaging, most datasets are 3D or 4D. Here we therefore present preliminary results showing that a 3D progressive growing GAN can be used to synthesize MR brain volumes.
Tasks
Published 2019-12-11
URL https://arxiv.org/abs/1912.05357v2
PDF https://arxiv.org/pdf/1912.05357v2.pdf
PWC https://paperswithcode.com/paper/feeding-the-zombies-synthesizing-brain
Repo https://github.com/wanderine/ProgressiveGAN3D
Framework tf
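
A minimal sketch of the kind of 3D generator block such a GAN grows progressively: Conv3d layers with trilinear upsampling produce volumes instead of images. The channel counts and the progressive-growing schedule are illustrative only.

```python
# Sketch: a 3D upsampling generator block for volumetric synthesis.
import torch
import torch.nn as nn

class UpBlock3D(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        self.body = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="trilinear", align_corners=False),
            nn.Conv3d(c_in, c_out, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv3d(c_out, c_out, 3, padding=1), nn.LeakyReLU(0.2))

    def forward(self, x):
        return self.body(x)

z = torch.randn(1, 256, 4, 4, 4)   # latent projected to a 4^3 starting volume
vol = UpBlock3D(256, 128)(z)       # -> (1, 128, 8, 8, 8); stack to grow further
```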