January 31, 2020

2904 words 14 mins read

Paper Group AWR 419

Powering Hidden Markov Model by Neural Network based Generative Models. BlockSwap: Fisher-guided Block Substitution for Network Compression on a Budget. Deeper Text Understanding for IR with Contextual Neural Language Modeling. CullNet: Calibrated and Pose Aware Confidence Scores for Object Pose Estimation. Modular Universal Reparameterization: Dee …

Powering Hidden Markov Model by Neural Network based Generative Models

Title Powering Hidden Markov Model by Neural Network based Generative Models
Authors Dong Liu, Antoine Honoré, Saikat Chatterjee, Lars K. Rasmussen
Abstract The hidden Markov model (HMM) has been used successfully for sequential data modeling. In this work, we propose to strengthen the modeling capacity of the HMM by bringing in neural-network-based generative models. The proposed model, termed GenHMM, associates each HMM hidden state with a neural generative model that has a tractable exact likelihood and provides efficient likelihood computation. A generative model in GenHMM consists of a mixture of generators realized by flow models. A learning algorithm for GenHMM is proposed within the expectation-maximization framework, and the convergence of GenHMM learning is analyzed. We demonstrate the efficiency of GenHMM on classification tasks over practical sequential data. Code is available at https://github.com/FirstHandScientist/genhmm.
Tasks
Published 2019-10-13
URL https://arxiv.org/abs/1910.05744v2
PDF https://arxiv.org/pdf/1910.05744v2.pdf
PWC https://paperswithcode.com/paper/powering-hidden-markov-model-by-neural
Repo https://github.com/FirstHandScientist/genhmm
Framework pytorch
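
As a rough illustration of the idea, the sketch below pairs a standard log-space HMM forward recursion with flow-based emission densities. The single affine-coupling layer stands in for the paper's mixture of flow generators, and all names and sizes are illustrative rather than taken from the authors' repository.

```python
# Sketch: HMM whose per-state emission density is a normalizing flow with
# exact, tractable log-likelihood (the core GenHMM idea, simplified).
import torch
import torch.nn as nn

class AffineCouplingFlow(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(nn.Linear(self.half, 64), nn.Tanh(),
                                 nn.Linear(64, 2 * (dim - self.half)))

    def log_prob(self, x):
        # z = f(x) has a triangular Jacobian, so the density is exact.
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(x1).chunk(2, dim=-1)
        z = torch.cat([x1, x2 * torch.exp(s) + t], dim=-1)
        base = torch.distributions.Normal(0.0, 1.0)
        return base.log_prob(z).sum(-1) + s.sum(-1)

def forward_log_likelihood(obs, log_pi, log_A, flows):
    # Standard forward recursion in log space; emissions come from the flows.
    log_b = torch.stack([f.log_prob(obs) for f in flows], dim=1)  # (T, S)
    alpha = log_pi + log_b[0]
    for t in range(1, obs.shape[0]):
        alpha = torch.logsumexp(alpha.unsqueeze(1) + log_A, dim=0) + log_b[t]
    return torch.logsumexp(alpha, dim=0)
```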

BlockSwap: Fisher-guided Block Substitution for Network Compression on a Budget

Title BlockSwap: Fisher-guided Block Substitution for Network Compression on a Budget
Authors Jack Turner, Elliot J. Crowley, Michael O’Boyle, Amos Storkey, Gavin Gray
Abstract The desire to map neural networks to varying-capacity devices has led to the development of a wealth of compression techniques, many of which involve replacing standard convolutional blocks in a large network with cheap alternative blocks. However, not all blocks are created equal; for a required compute budget there may exist a potent combination of many different cheap blocks, though exhaustively searching for such a combination is prohibitively expensive. In this work, we develop BlockSwap: a fast algorithm for choosing networks with interleaved block types by passing a single minibatch of training data through randomly initialised networks and gauging their Fisher potential. These networks can then be used as students and distilled with the original large network as a teacher. We demonstrate the effectiveness of the chosen networks across CIFAR-10 and ImageNet for classification, and COCO for detection, and provide a comprehensive ablation study of our approach. BlockSwap quickly explores possible block configurations using a simple architecture ranking system, yielding highly competitive networks in orders of magnitude less time than most architecture search techniques (e.g. under 5 minutes on a single GPU for CIFAR-10). Code is available at https://github.com/BayesWatch/pytorch-blockswap.
Tasks
Published 2019-06-10
URL https://arxiv.org/abs/1906.04113v2
PDF https://arxiv.org/pdf/1906.04113v2.pdf
PWC https://paperswithcode.com/paper/blockswap-fisher-guided-block-substitution
Repo https://github.com/BayesWatch/pytorch-blockswap
Framework pytorch
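
The ranking step can be illustrated with a short sketch: score each randomly initialised candidate on one minibatch and keep the highest-scoring ones as students. Note that `fisher_potential` below is a simplified proxy (summed squared parameter gradients) rather than the paper's per-block activation-gradient measure, and `candidate_factories` is a hypothetical input.

```python
# Sketch: one-minibatch Fisher-style ranking of random candidate networks.
import torch
import torch.nn.functional as F

def fisher_potential(model, x, y):
    # Simplified proxy: one backward pass, no training; sum the squared
    # gradients of a cross-entropy loss over all parameters.
    model.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return sum((p.grad ** 2).sum().item()
               for p in model.parameters() if p.grad is not None)

def rank_candidates(candidate_factories, x, y):
    # candidate_factories: callables returning freshly initialised networks
    # with different interleavings of cheap and standard blocks.
    scored = [(fisher_potential(make(), x, y), i)
              for i, make in enumerate(candidate_factories)]
    return sorted(scored, reverse=True)  # highest Fisher potential first
```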

Deeper Text Understanding for IR with Contextual Neural Language Modeling

Title Deeper Text Understanding for IR with Contextual Neural Language Modeling
Authors Zhuyun Dai, Jamie Callan
Abstract Neural networks provide new possibilities to automatically learn complex language patterns and query-document relations. Neural IR models have achieved promising results in learning query-document relevance patterns, but few explorations have been done on understanding the text content of a query or a document. This paper studies leveraging a recently-proposed contextual neural language model, BERT, to provide deeper text understanding for IR. Experimental results demonstrate that the contextual text representations from BERT are more effective than traditional word embeddings. Compared to bag-of-words retrieval models, the contextual language model can better leverage language structures, bringing large improvements on queries written in natural languages. Combining the text understanding ability with search knowledge leads to an enhanced pre-trained BERT model that can benefit related search tasks where training data are limited.
Tasks Ad-Hoc Information Retrieval, Language Modelling, Word Embeddings
Published 2019-05-22
URL https://arxiv.org/abs/1905.09217v1
PDF https://arxiv.org/pdf/1905.09217v1.pdf
PWC https://paperswithcode.com/paper/deeper-text-understanding-for-ir-with
Repo https://github.com/AdeDZY/SIGIR19-BERT-IR
Framework pytorch
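
A minimal sketch of BERT-based query-document scoring in the spirit of the paper, written against the Hugging Face `transformers` API rather than the authors' code; the `bert-base-uncased` checkpoint is a stand-in, and the classification head would still need fine-tuning on relevance labels before the scores mean anything.

```python
# Sketch: score a query-passage pair with a BERT cross-encoder.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # relevant / not relevant

def score(query: str, passage: str) -> float:
    # Query and passage are packed as one [CLS] q [SEP] d [SEP] sequence;
    # the relevance probability comes from the classification head.
    inputs = tokenizer(query, passage, truncation=True, max_length=512,
                       return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

print(score("car maintenance tips", "How to change your engine oil safely."))
```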

CullNet: Calibrated and Pose Aware Confidence Scores for Object Pose Estimation

Title CullNet: Calibrated and Pose Aware Confidence Scores for Object Pose Estimation
Authors Kartik Gupta, Lars Petersson, Richard Hartley
Abstract We present a new approach for single-view, image-based object pose estimation, addressing the problem of culling false positives among several pose proposal estimates. Our approach targets the inaccurate confidence values predicted by CNNs, which are used by many current methods to choose a final object pose prediction. We present a network called CullNet to solve this task. CullNet takes as input pairs of pose masks rendered from a 3D model and cropped regions in the original image, and uses them to calibrate the confidence scores of the pose proposals. This new set of confidence scores is significantly more reliable for accurate object pose estimation, as shown by our results. Our experimental results on multiple challenging datasets (LINEMOD and Occlusion LINEMOD) reflect the utility of our proposed method: our overall pose estimation pipeline outperforms state-of-the-art object pose estimation methods on these standard datasets. Our code is publicly available at https://github.com/kartikgupta-at-anu/CullNet.
Tasks 6D Pose Estimation using RGB, Pose Estimation, Pose Prediction
Published 2019-09-30
URL https://arxiv.org/abs/1909.13476v1
PDF https://arxiv.org/pdf/1909.13476v1.pdf
PWC https://paperswithcode.com/paper/cullnet-calibrated-and-pose-aware-confidence
Repo https://github.com/kartikgupta-at-anu/CullNet
Framework none
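
The culling logic can be sketched as follows, assuming hypothetical helpers `render_mask` and `crop` plus a trained `cullnet` that maps a (rendered mask, image crop) pair to a calibrated confidence; this illustrates only the selection step, not the authors' implementation.

```python
# Sketch: pick the pose proposal with the best calibrated confidence.
import torch

def select_pose(proposals, image, model3d, cullnet, render_mask, crop):
    best_pose, best_conf = None, float("-inf")
    for pose in proposals:
        mask = render_mask(model3d, pose)      # pose mask from the 3D model
        patch = crop(image, pose)              # image region under the pose
        pair = torch.cat([mask, patch], dim=0)[None]  # stacked as channels
        conf = cullnet(pair).item()            # calibrated confidence score
        if conf > best_conf:
            best_pose, best_conf = pose, conf
    return best_pose, best_conf
```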

Modular Universal Reparameterization: Deep Multi-task Learning Across Diverse Domains

Title Modular Universal Reparameterization: Deep Multi-task Learning Across Diverse Domains
Authors Elliot Meyerson, Risto Miikkulainen
Abstract As deep learning applications continue to become more diverse, an interesting question arises: Can general problem solving arise from jointly learning several such diverse tasks? To approach this question, deep multi-task learning is extended in this paper to the setting where there is no obvious overlap between task architectures. The idea is that any set of (architecture, task) pairs can be decomposed into a set of potentially related subproblems, whose sharing is optimized by an efficient stochastic algorithm. The approach is first validated in a classic synthetic multi-task learning benchmark, and then applied to sharing across disparate architectures for vision, NLP, and genomics tasks. It discovers regularities across these domains, encodes them into sharable modules, and combines these modules systematically to improve performance in the individual tasks. The results confirm that sharing learned functionality across diverse domains and architectures is indeed beneficial, thus establishing a key ingredient for general problem solving in the future.
Tasks Multi-Task Learning
Published 2019-05-31
URL https://arxiv.org/abs/1906.00097v2
PDF https://arxiv.org/pdf/1906.00097v2.pdf
PWC https://paperswithcode.com/paper/190600097
Repo https://github.com/leaf-ai/muir
Framework pytorch
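
A toy sketch of the reparameterization idea: parameters at many "locations" across different task networks are generated from a small shared bank of modules via a learned soft assignment. Shapes and the assignment scheme here are illustrative only; the paper optimizes sharing with a stochastic algorithm rather than a plain softmax.

```python
# Sketch: weight blocks at different locations materialized from a shared
# bank of modules, so functionality can be reused across architectures.
import torch
import torch.nn as nn

class SharedParamBank(nn.Module):
    def __init__(self, n_modules=8, block=16):
        super().__init__()
        self.bank = nn.Parameter(torch.randn(n_modules, block, block) * 0.02)

    def materialize(self, logits):
        # Each location owns logits over the bank; its weight block is a
        # softmax-weighted mixture of the shared modules.
        w = torch.softmax(logits, dim=0)             # (n_modules,)
        return torch.einsum("m,mij->ij", w, self.bank)

bank = SharedParamBank()
loc_logits = nn.Parameter(torch.zeros(8))            # one location's assignment
weight_block = bank.materialize(loc_logits)          # a (16, 16) weight tile
```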

Temporal Normalizing Flows

Title Temporal Normalizing Flows
Authors Remy Kusters, Gert-Jan Both
Abstract Analyzing and interpreting time-dependent stochastic data requires accurate and robust density estimation. In this paper we extend the concept of normalizing flows to so-called temporal normalizing flows (tNFs) to estimate time-dependent distributions, leveraging the full spatio-temporal information present in the dataset. Our approach is unsupervised, does not require an a priori characteristic scale, and can accurately estimate multi-scale distributions of vastly different length scales. We illustrate tNFs on sparse datasets of Brownian and chemotactic walkers, showing that the inclusion of temporal information enhances density estimation. Finally, we speculate how tNFs can be applied to fit and discover the continuous PDE underlying a stochastic process.
Tasks Density Estimation
Published 2019-12-19
URL https://arxiv.org/abs/1912.09092v1
PDF https://arxiv.org/pdf/1912.09092v1.pdf
PWC https://paperswithcode.com/paper/temporal-normalizing-flows
Repo https://github.com/PhIMaL/temporal_normalizing_flows
Framework pytorch
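
One way to picture a tNF is a flow conditioned on time, fitted jointly over all (x, t) samples; the single conditional affine layer below is a minimal stand-in for the paper's full flow, with stand-in random data in place of real trajectories.

```python
# Sketch: a time-conditioned flow fitted by maximum likelihood on (x, t).
import torch
import torch.nn as nn

class ConditionalAffineFlow(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 2))  # scale, shift from t

    def log_prob(self, x, t):
        s, b = self.net(t).chunk(2, dim=-1)   # time-dependent parameters
        z = (x - b) * torch.exp(-s)           # invert x = z * exp(s) + b
        base = torch.distributions.Normal(0.0, 1.0)
        return (base.log_prob(z) - s).sum(-1) # change-of-variables correction

flow = ConditionalAffineFlow()
opt = torch.optim.Adam(flow.parameters(), lr=1e-3)
x, t = torch.randn(256, 1), torch.rand(256, 1)  # stand-in trajectory samples
opt.zero_grad()
loss = -flow.log_prob(x, t).mean()              # joint spatio-temporal fit
loss.backward()
opt.step()
```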

Compound Probabilistic Context-Free Grammars for Grammar Induction

Title Compound Probabilistic Context-Free Grammars for Grammar Induction
Authors Yoon Kim, Chris Dyer, Alexander M. Rush
Abstract We study a formalization of the grammar induction problem that models sentences as being generated by a compound probabilistic context-free grammar. In contrast to traditional formulations which learn a single stochastic grammar, our grammar’s rule probabilities are modulated by a per-sentence continuous latent variable, which induces marginal dependencies beyond the traditional context-free assumptions. Inference in this grammar is performed by collapsed variational inference, in which an amortized variational posterior is placed on the continuous variable, and the latent trees are marginalized out with dynamic programming. Experiments on English and Chinese show the effectiveness of our approach compared to recent state-of-the-art methods when evaluated on unsupervised parsing.
Tasks Constituency Grammar Induction
Published 2019-06-24
URL https://arxiv.org/abs/1906.10225v9
PDF https://arxiv.org/pdf/1906.10225v9.pdf
PWC https://paperswithcode.com/paper/compound-probabilistic-context-free-grammars
Repo https://github.com/harvardnlp/compound-pcfg
Framework pytorch
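
The key move can be sketched compactly: rule log-probabilities come from a network that consumes a per-sentence latent z, so each sentence effectively gets its own PCFG. Dimensions below are illustrative, and the inside algorithm and variational posterior are omitted entirely.

```python
# Sketch: per-sentence latent z modulates the grammar's rule probabilities.
import torch
import torch.nn as nn

NT, T = 30, 60                                  # nonterminals, preterminals
z_dim, emb = 64, 128
rule_emb = nn.Parameter(torch.randn(NT, emb))   # one embedding per parent NT
mlp = nn.Sequential(nn.Linear(emb + z_dim, 256), nn.ReLU(),
                    nn.Linear(256, (NT + T) ** 2))  # scores for A -> B C

def rule_log_probs(z):
    # Concatenate z to every parent embedding; normalize over children pairs.
    h = torch.cat([rule_emb, z.expand(NT, -1)], dim=-1)
    return torch.log_softmax(mlp(h), dim=-1)    # (NT, (NT+T)^2)

z = torch.randn(1, z_dim)                       # sampled once per sentence
log_probs = rule_log_probs(z)                   # this sentence's grammar
```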

MaCow: Masked Convolutional Generative Flow

Title MaCow: Masked Convolutional Generative Flow
Authors Xuezhe Ma, Xiang Kong, Shanghang Zhang, Eduard Hovy
Abstract Flow-based generative models, conceptually attractive due to the tractability of both exact log-likelihood computation and latent-variable inference, and the efficiency of both training and sampling, have led to a number of impressive empirical successes and spawned many advanced variants and theoretical investigations. Despite their computational efficiency, however, the density estimation performance of flow-based generative models falls significantly behind that of state-of-the-art autoregressive models. In this work, we introduce the masked convolutional generative flow (MaCow), a simple yet effective architecture for generative flows using masked convolution. By restricting the local connectivity to a small kernel, MaCow enjoys fast and stable training and efficient sampling, while achieving significant improvements over Glow for density estimation on standard image benchmarks, considerably narrowing the gap to autoregressive models.
Tasks Density Estimation, Image Generation
Published 2019-02-12
URL https://arxiv.org/abs/1902.04208v5
PDF https://arxiv.org/pdf/1902.04208v5.pdf
PWC https://paperswithcode.com/paper/macow-masked-convolutional-generative-flow
Repo https://github.com/XuezheMax/macow
Framework pytorch
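
A sketch of the masked convolution at the core of the model: zeroing the centre-and-after portion of a small kernel restricts each output to already-visited neighbours, so per-pixel affine parameters computed from that context give a triangular Jacobian. This is a simplified illustration, not the authors' layer.

```python
# Sketch: a small masked conv whose receptive field excludes the current
# pixel and everything after it in raster order; MaCow-style units would use
# such context to produce per-pixel affine scale/shift for an invertible flow.
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    def __init__(self, channels, kernel_size=3):
        super().__init__(channels, channels, kernel_size,
                         padding=kernel_size // 2)
        k = kernel_size
        mask = torch.ones_like(self.weight)
        mask[:, :, k // 2, k // 2:] = 0   # block the centre and to its right
        mask[:, :, k // 2 + 1:, :] = 0    # block every row below the centre
        self.register_buffer("mask", mask)

    def forward(self, x):
        return nn.functional.conv2d(x, self.weight * self.mask, self.bias,
                                    padding=self.padding)

ctx = MaskedConv2d(8)(torch.randn(1, 8, 32, 32))  # causal local context
```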

Mesh R-CNN

Title Mesh R-CNN
Authors Georgia Gkioxari, Jitendra Malik, Justin Johnson
Abstract Rapid advances in 2D perception have led to systems that accurately detect objects in real-world images. However, these systems make predictions in 2D, ignoring the 3D structure of the world. Concurrently, advances in 3D shape prediction have mostly focused on synthetic benchmarks and isolated objects. We unify advances in these two areas. We propose a system that detects objects in real-world images and produces a triangle mesh giving the full 3D shape of each detected object. Our system, called Mesh R-CNN, augments Mask R-CNN with a mesh prediction branch that outputs meshes with varying topological structure by first predicting coarse voxel representations, which are then converted to meshes and refined with a graph convolution network operating over the mesh’s vertices and edges. We validate our mesh prediction branch on ShapeNet, where we outperform prior work on single-image shape prediction. We then deploy our full Mesh R-CNN system on Pix3D, where we jointly detect objects and predict their 3D shapes.
Tasks 3D Shape Modeling
Published 2019-06-06
URL https://arxiv.org/abs/1906.02739v2
PDF https://arxiv.org/pdf/1906.02739v2.pdf
PWC https://paperswithcode.com/paper/mesh-r-cnn
Repo https://github.com/facebookresearch/meshrcnn
Framework pytorch
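
The voxels-to-mesh step can be sketched with PyTorch3D's real `cubify` op, which turns predicted occupancies into an initial triangle mesh; the voxel head, the graph-convolution refinement stages, and the Mask R-CNN backbone are all stubbed out here.

```python
# Sketch: coarse voxel predictions -> initial mesh (requires pytorch3d).
import torch
from pytorch3d.ops import cubify

voxel_logits = torch.randn(1, 24, 24, 24)   # stand-in voxel head output
occupancy = torch.sigmoid(voxel_logits)     # (N, D, H, W) probabilities
mesh = cubify(occupancy, thresh=0.5)        # coarse triangle mesh (Meshes)
# Mesh R-CNN then refines the mesh vertices with graph convolutions over the
# mesh's edges, conditioned on image features sampled at each vertex.
```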

FastV2C-HandNet: Fast Voxel to Coordinate Hand Pose Estimation with 3D Convolutional Neural Networks

Title FastV2C-HandNet: Fast Voxel to Coordinate Hand Pose Estimation with 3D Convolutional Neural Networks
Authors Rohan Lekhwani, Bhupendra Singh
Abstract Hand pose estimation from monocular depth images has been an important and challenging problem in the Computer Vision community. In this paper, we present a novel approach to estimate 3D hand joint locations from 2D depth images. Unlike most previous methods, our model captures the 3D spatial information from a depth image, giving it a greater understanding of the input. We voxelize the input depth map to capture the 3D features of the input and perform 3D data augmentations to make our network robust to real-world images. Our network is trained in an end-to-end manner, which reduces time and space complexity significantly when compared to other methods. Through extensive experiments, we show that our model outperforms state-of-the-art methods with respect to the time it takes to train and predict 3D hand joint locations. This makes our method more suitable for real-world hand pose estimation scenarios.
Tasks Hand Pose Estimation, Pose Estimation
Published 2019-07-15
URL https://arxiv.org/abs/1907.06327v3
PDF https://arxiv.org/pdf/1907.06327v3.pdf
PWC https://paperswithcode.com/paper/fastv2c-handnet-fast-voxel-to-coordinate-hand
Repo https://github.com/RonLek/FastV2C-HandNet
Framework tf
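
The voxelization step might look like the sketch below: back-project the depth map through a pinhole camera model and bin the resulting points into an occupancy grid for the 3D CNN. The intrinsics and grid size are placeholders, not values from the paper.

```python
# Sketch: depth map -> point cloud -> binary occupancy grid.
import numpy as np

def depth_to_voxels(depth, fx, fy, cx, cy, grid=32):
    v, u = np.nonzero(depth)                 # pixels with valid depth
    z = depth[v, u]
    x = (u - cx) * z / fx                    # pinhole back-projection
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=1)
    pts -= pts.mean(axis=0)                  # centre the cloud on the hand
    scale = np.abs(pts).max() + 1e-6
    idx = ((pts / scale * 0.5 + 0.5) * (grid - 1)).astype(int)
    vox = np.zeros((grid, grid, grid), dtype=np.float32)
    vox[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0   # mark occupied cells
    return vox
```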

AutoAssist: A Framework to Accelerate Training of Deep Neural Networks

Title AutoAssist: A Framework to Accelerate Training of Deep Neural Networks
Authors Jiong Zhang, Hsiang-fu Yu, Inderjit S. Dhillon
Abstract Deep neural networks have yielded superior performance in many applications; however, gradient computation in a deep model with millions of instances leads to a lengthy training process even with modern GPU/TPU hardware acceleration. In this paper, we propose AutoAssist, a simple framework to accelerate the training of a deep neural network. Typically, as training evolves, the improvement the current model gains from a stochastic gradient update on each instance varies dynamically. AutoAssist exploits this fact with a simple instance-shrinking operation that filters out instances with relatively low marginal improvement to the current model, so that the computationally intensive gradient computations are performed on informative instances as much as possible. We prove that the proposed technique outperforms vanilla SGD with existing importance-sampling approaches for linear SVM problems, and establish O(1/k) convergence for strongly convex problems. To apply the proposed techniques to deep models, we jointly train a very lightweight Assistant network in addition to the original deep network, referred to as the Boss. The Assistant network is designed to gauge the importance of a given instance with respect to the current Boss so that a shrinking operation can be applied in the batch generator. With careful design, we train the Boss and Assistant in a non-blocking, asynchronous fashion such that overhead is minimal. We demonstrate that AutoAssist reduces the number of epochs by 40% for training a ResNet to reach the same test accuracy on an image classification dataset, and saves 30% of the training time needed for a transformer model to yield the same BLEU scores on a translation dataset.
Tasks Image Classification
Published 2019-05-08
URL https://arxiv.org/abs/1905.03381v1
PDF https://arxiv.org/pdf/1905.03381v1.pdf
PWC https://paperswithcode.com/paper/190503381
Repo https://github.com/zhangjiong724/autoassist-exp
Framework pytorch
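
A much-simplified sketch of one Boss/Assistant step: the lightweight Assistant scores each instance, and the expensive Boss gradient update runs only on the instances kept. The thresholding rule and the asynchronous training of the Assistant itself are heavily condensed here.

```python
# Sketch: instance shrinking in the batch generator via an Assistant network.
import torch
import torch.nn.functional as F

def autoassist_step(boss, assistant, opt, x, y, keep_threshold=0.5):
    with torch.no_grad():
        keep_prob = torch.sigmoid(assistant(x)).squeeze(-1)
    mask = keep_prob > keep_threshold        # shrink the batch
    if mask.sum() == 0:
        return None                          # nothing informative this batch
    opt.zero_grad()
    loss = F.cross_entropy(boss(x[mask]), y[mask])
    loss.backward()                          # gradients on kept instances only
    opt.step()
    return loss.item()
```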

Implicit 3D Orientation Learning for 6D Object Detection from RGB Images

Title Implicit 3D Orientation Learning for 6D Object Detection from RGB Images
Authors Martin Sundermeyer, Zoltan-Csaba Marton, Maximilian Durner, Manuel Brucker, Rudolph Triebel
Abstract We propose a real-time RGB-based pipeline for object detection and 6D pose estimation. Our novel 3D orientation estimation is based on a variant of the Denoising Autoencoder that is trained on simulated views of a 3D model using Domain Randomization. This so-called Augmented Autoencoder has several advantages over existing methods: It does not require real, pose-annotated training data, generalizes to various test sensors and inherently handles object and view symmetries. Instead of learning an explicit mapping from input images to object poses, it provides an implicit representation of object orientations defined by samples in a latent space. Our pipeline achieves state-of-the-art performance on the T-LESS dataset both in the RGB and RGB-D domain. We also evaluate on the LineMOD dataset where we can compete with other synthetically trained approaches. We further increase performance by correcting 3D orientation estimates to account for perspective errors when the object deviates from the image center and show extended results.
Tasks 6D Pose Estimation, 6D Pose Estimation using RGB, Denoising, Object Detection, Pose Estimation
Published 2019-02-04
URL https://arxiv.org/abs/1902.01275v2
PDF https://arxiv.org/pdf/1902.01275v2.pdf
PWC https://paperswithcode.com/paper/implicit-3d-orientation-learning-for-6d
Repo https://github.com/DLR-RM/AugmentedAutoencoder
Framework tf
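
The test-time orientation lookup can be sketched as a cosine-similarity search over a codebook of latent codes computed from renders at known rotations; the encoder and the codebook construction are placeholders, and the perspective correction mentioned in the abstract is omitted.

```python
# Sketch: implicit orientation -> explicit rotation via codebook lookup.
import torch
import torch.nn.functional as F

def estimate_orientation(encoder, crop, codebook_z, codebook_R):
    # codebook_z: (N, d) latent codes of renders at N known rotations
    # codebook_R: list of the N corresponding 3x3 rotation matrices
    z = F.normalize(encoder(crop[None]), dim=-1)   # (1, d) implicit pose code
    sims = z @ F.normalize(codebook_z, dim=-1).T   # cosine similarities
    return codebook_R[sims.argmax().item()]        # closest stored rotation
```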

Connecting Language and Knowledge with Heterogeneous Representations for Neural Relation Extraction

Title Connecting Language and Knowledge with Heterogeneous Representations for Neural Relation Extraction
Authors Peng Xu, Denilson Barbosa
Abstract Knowledge Bases (KBs) require constant updating to reflect changes to the world they represent. For general-purpose KBs, this is often done through Relation Extraction (RE), the task of predicting KB relations expressed in text mentioning entities known to the KB. One way to improve RE is to use KB Embeddings (KBE) for link prediction. However, despite clear connections between RE and KBE, little has been done toward properly unifying these models systematically. We help close the gap with a framework that unifies the learning of RE and KBE models, leading to significant improvements over the state of the art in RE. The code is available at https://github.com/billy-inn/HRERE.
Tasks Link Prediction, Relation Extraction
Published 2019-03-25
URL https://arxiv.org/abs/1903.10126v3
PDF https://arxiv.org/pdf/1903.10126v3.pdf
PWC https://paperswithcode.com/paper/connecting-language-and-knowledge-with
Repo https://github.com/billy-inn/HRERE
Framework tf
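
The unification can be sketched as a joint objective that couples a text-based relation classifier with a KB-embedding score over the same relation vectors; the TransE-style term below is a stand-in for the paper's KBE component, and the coupling weight is illustrative.

```python
# Sketch: one loss ties relation extraction to KB-embedding consistency.
import torch
import torch.nn.functional as F

def joint_loss(text_logits, rel_ids, head_e, rel_e, tail_e, alpha=0.5):
    re_loss = F.cross_entropy(text_logits, rel_ids)    # text-side RE loss
    kbe_loss = (head_e + rel_e - tail_e).norm(p=2, dim=-1).mean()  # TransE
    return re_loss + alpha * kbe_loss                  # shared relation space
```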

jMetalPy: a Python Framework for Multi-Objective Optimization with Metaheuristics

Title jMetalPy: a Python Framework for Multi-Objective Optimization with Metaheuristics
Authors Antonio Benitez-Hidalgo, Antonio J. Nebro, Jose Garcia-Nieto, Izaskun Oregi, Javier Del Ser
Abstract This paper describes jMetalPy, an object-oriented Python-based framework for multi-objective optimization with metaheuristic techniques. Building upon our experiences with the well-known jMetal framework, we have developed a new multi-objective optimization software platform aiming not only at replicating the former one in a different programming language, but also at taking advantage of the full feature set of Python, including its facilities for fast prototyping and the large number of available libraries for data processing, data analysis, data visualization, and high-performance computing. As a result, jMetalPy provides an environment for solving multi-objective optimization problems focused not only on traditional metaheuristics, but also on techniques supporting preference articulation and dynamic problems, along with a rich set of features related to the automatic generation of statistical data from the results generated, as well as the real-time and interactive visualization of the Pareto front approximations produced by the algorithms. jMetalPy additionally offers support for parallel computing on multicore and cluster systems. We include some use cases to explore the main features of jMetalPy and to illustrate how to work with it.
Tasks
Published 2019-03-07
URL http://arxiv.org/abs/1903.02915v2
PDF http://arxiv.org/pdf/1903.02915v2.pdf
PWC https://paperswithcode.com/paper/jmetalpy-a-python-framework-for-multi
Repo https://github.com/jMetal/jMetalPy
Framework none
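
A minimal usage example, adapted from the project's documented quickstart (argument names may differ slightly across jMetalPy versions): solve the ZDT1 benchmark with NSGA-II and retrieve the resulting front approximation.

```python
# Sketch: NSGA-II on the ZDT1 benchmark with jMetalPy.
from jmetal.algorithm.multiobjective.nsgaii import NSGAII
from jmetal.operator import SBXCrossover, PolynomialMutation
from jmetal.problem import ZDT1
from jmetal.util.termination_criterion import StoppingByEvaluations

problem = ZDT1()
algorithm = NSGAII(
    problem=problem,
    population_size=100,
    offspring_population_size=100,
    mutation=PolynomialMutation(probability=1.0 / problem.number_of_variables,
                                distribution_index=20),
    crossover=SBXCrossover(probability=1.0, distribution_index=20),
    termination_criterion=StoppingByEvaluations(max_evaluations=25000))
algorithm.run()
front = algorithm.get_result()   # non-dominated solutions found
```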

Feeding the zombies: Synthesizing brain volumes using a 3D progressive growing GAN

Title Feeding the zombies: Synthesizing brain volumes using a 3D progressive growing GAN
Authors Anders Eklund
Abstract Deep learning requires large datasets for training (convolutional) networks with millions of parameters. In neuroimaging, there are few open datasets with more than 100 subjects, which makes it difficult to, for example, train a classifier to discriminate controls from diseased persons. Generative adversarial networks (GANs) can be used to synthesize data, but virtually all research is focused on 2D images. In medical imaging, and especially in neuroimaging, most datasets are 3D or 4D. Here we therefore present preliminary results showing that a 3D progressive growing GAN can be used to synthesize MR brain volumes.
Tasks
Published 2019-12-11
URL https://arxiv.org/abs/1912.05357v2
PDF https://arxiv.org/pdf/1912.05357v2.pdf
PWC https://paperswithcode.com/paper/feeding-the-zombies-synthesizing-brain
Repo https://github.com/wanderine/ProgressiveGAN3D
Framework tf
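
A minimal sketch of the kind of 3D generator block such a GAN grows progressively: Conv3d layers with trilinear upsampling produce volumes instead of images. The channel counts and the progressive-growing schedule are illustrative only.

```python
# Sketch: a 3D upsampling generator block for volumetric synthesis.
import torch
import torch.nn as nn

class UpBlock3D(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        self.body = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="trilinear", align_corners=False),
            nn.Conv3d(c_in, c_out, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv3d(c_out, c_out, 3, padding=1), nn.LeakyReLU(0.2))

    def forward(self, x):
        return self.body(x)

z = torch.randn(1, 256, 4, 4, 4)   # latent projected to a 4^3 starting volume
vol = UpBlock3D(256, 128)(z)       # -> (1, 128, 8, 8, 8); stack to grow further
```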