Paper Group AWR 419
Powering Hidden Markov Model by Neural Network based Generative Models
Title | Powering Hidden Markov Model by Neural Network based Generative Models |
Authors | Dong Liu, Antoine Honoré, Saikat Chatterjee, Lars K. Rasmussen |
Abstract | The hidden Markov model (HMM) has been successfully used for sequential data modeling problems. In this work, we propose to power the modeling capacity of the HMM by bringing in neural-network-based generative models. The proposed model is termed GenHMM. In GenHMM, each hidden state is associated with a neural-network-based generative model that admits exact and efficient likelihood computation. A generative model in GenHMM consists of a mixture of generators realized by flow models. A learning algorithm for GenHMM is proposed in the expectation-maximization framework, and the convergence of GenHMM learning is analyzed. We demonstrate the efficiency of GenHMM on classification tasks over practical sequential data. Code is available at https://github.com/FirstHandScientist/genhmm. |
Tasks | |
Published | 2019-10-13 |
URL | https://arxiv.org/abs/1910.05744v2 |
PDF | https://arxiv.org/pdf/1910.05744v2.pdf |
PWC | https://paperswithcode.com/paper/powering-hidden-markov-model-by-neural |
Repo | https://github.com/FirstHandScientist/genhmm |
Framework | pytorch |
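A minimal PyTorch sketch of the emission model described in the abstract above: each hidden state emits through an invertible map, so exact per-state likelihoods follow from the change-of-variables formula. The single affine bijection per state, the uniform transition/initial probabilities, and all dimensions are illustrative assumptions; the paper stacks full flow networks and learns everything via EM.

```python
import torch
import torch.distributions as D

K, d = 3, 2                                   # hidden states, feature dim
log_A = torch.full((K, K), 1.0 / K).log()     # uniform transitions (fixed here)
log_pi = torch.full((K,), 1.0 / K).log()      # uniform initial probabilities

# One affine bijection z = (x - b_k) * exp(-s_k) per state; a real flow
# would stack many coupling layers here.
s = torch.randn(K, d) * 0.1
b = torch.randn(K, d)
base = D.MultivariateNormal(torch.zeros(d), torch.eye(d))

def emission_loglik(x):
    """log p_k(x) for every state k via change of variables."""
    z = (x.unsqueeze(0) - b) * torch.exp(-s)   # (K, d)
    return base.log_prob(z) - s.sum(dim=1)     # log|det dz/dx| = -sum(s_k)

def sequence_loglik(xs):
    """Forward algorithm in log space over a sequence of frames."""
    alpha = log_pi + emission_loglik(xs[0])
    for x in xs[1:]:
        alpha = torch.logsumexp(alpha.unsqueeze(1) + log_A, dim=0) \
                + emission_loglik(x)
    return torch.logsumexp(alpha, dim=0)

print(sequence_loglik(torch.randn(10, d)))
```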
BlockSwap: Fisher-guided Block Substitution for Network Compression on a Budget
Title | BlockSwap: Fisher-guided Block Substitution for Network Compression on a Budget |
Authors | Jack Turner, Elliot J. Crowley, Michael O’Boyle, Amos Storkey, Gavin Gray |
Abstract | The desire to map neural networks to varying-capacity devices has led to the development of a wealth of compression techniques, many of which involve replacing standard convolutional blocks in a large network with cheap alternative blocks. However, not all blocks are created equal; for a required compute budget there may exist a potent combination of many different cheap blocks, though exhaustively searching for such a combination is prohibitively expensive. In this work, we develop BlockSwap: a fast algorithm for choosing networks with interleaved block types by passing a single minibatch of training data through randomly initialised networks and gauging their Fisher potential. These networks can then be used as students and distilled with the original large network as a teacher. We demonstrate the effectiveness of the chosen networks across CIFAR-10 and ImageNet for classification, and COCO for detection, and provide a comprehensive ablation study of our approach. BlockSwap quickly explores possible block configurations using a simple architecture ranking system, yielding highly competitive networks in orders of magnitude less time than most architecture search techniques (e.g. under 5 minutes on a single GPU for CIFAR-10). Code is available at https://github.com/BayesWatch/pytorch-blockswap. |
Tasks | |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.04113v2 |
PDF | https://arxiv.org/pdf/1906.04113v2.pdf |
PWC | https://paperswithcode.com/paper/blockswap-fisher-guided-block-substitution |
Repo | https://github.com/Vini90/pytorch-BlockSwap-InferenceTime |
Framework | pytorch |
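A hedged sketch of the ranking signal the abstract describes (not the released BayesWatch code): push one minibatch through a randomly initialised candidate and score it by the summed squared activation-gradient products at its block outputs. The toy network, the choice of tracked blocks, and the exact aggregation are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fisher_potential(model, blocks, x, y):
    """Score a random candidate by the sum over blocks of E[(a * dL/da)^2]."""
    acts, hooks = {}, []
    for name, module in blocks:
        def save(mod, inp, out, key=name):
            out.retain_grad()                 # keep grads on non-leaf outputs
            acts[key] = out
        hooks.append(module.register_forward_hook(save))
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    for h in hooks:
        h.remove()
    return sum((a * a.grad).pow(2).sum(dim=(1, 2, 3)).mean().item()
               for a in acts.values())

# Toy candidate with two conv "blocks" on CIFAR-sized inputs.
net = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10),
)
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
print(fisher_potential(net, [("b0", net[0]), ("b1", net[2])], x, y))
```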
Deeper Text Understanding for IR with Contextual Neural Language Modeling
Title | Deeper Text Understanding for IR with Contextual Neural Language Modeling |
Authors | Zhuyun Dai, Jamie Callan |
Abstract | Neural networks provide new possibilities to automatically learn complex language patterns and query-document relations. Neural IR models have achieved promising results in learning query-document relevance patterns, but little exploration has been done on understanding the text content of a query or a document. This paper studies leveraging a recently proposed contextual neural language model, BERT, to provide deeper text understanding for IR. Experimental results demonstrate that the contextual text representations from BERT are more effective than traditional word embeddings. Compared to bag-of-words retrieval models, the contextual language model can better leverage language structures, bringing large improvements on queries written in natural language. Combining this text understanding ability with search knowledge leads to an enhanced pre-trained BERT model that can benefit related search tasks where training data are limited. |
Tasks | Ad-Hoc Information Retrieval, Language Modelling, Word Embeddings |
Published | 2019-05-22 |
URL | https://arxiv.org/abs/1905.09217v1 |
PDF | https://arxiv.org/pdf/1905.09217v1.pdf |
PWC | https://paperswithcode.com/paper/deeper-text-understanding-for-ir-with |
Repo | https://github.com/NavePnow/Google-BERT-on-fake_or_real-news-dataset |
Framework | pytorch |
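For context, a minimal sketch of the general BERT-reranking recipe the paper builds on, using the Hugging Face transformers API: a query-document pair is scored by a cross-encoder over "[CLS] query [SEP] document [SEP]". The base checkpoint below is an untuned stand-in; it would first be fine-tuned on relevance labels.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)        # fine-tune on relevance labels

def relevance_score(query, doc):
    """P(relevant) for a query-document pair via the [CLS] classifier head."""
    inputs = tok(query, doc, truncation=True, max_length=512,
                 return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.softmax(dim=-1)[0, 1].item()

print(relevance_score("neural ranking models",
                      "BERT brings deeper text understanding to ad-hoc retrieval."))
```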
CullNet: Calibrated and Pose Aware Confidence Scores for Object Pose Estimation
Title | CullNet: Calibrated and Pose Aware Confidence Scores for Object Pose Estimation |
Authors | Kartik Gupta, Lars Petersson, Richard Hartley |
Abstract | We present a new approach for single-view, image-based object pose estimation. Specifically, this paper addresses the problem of culling false positives among several pose proposal estimates. Our approach targets the problem of inaccurate confidence values predicted by CNNs, which are used by many current methods to choose a final object pose prediction. We present a network, called CullNet, that solves this task. CullNet takes as input pairs of pose masks rendered from a 3D model and cropped regions in the original image, and uses them to calibrate the confidence scores of the pose proposals. Our results show that this new set of confidence scores is significantly more reliable for accurate object pose estimation. Our experimental results on multiple challenging datasets (LINEMOD and Occlusion LINEMOD) reflect the utility of the proposed method, and our overall pose estimation pipeline outperforms state-of-the-art object pose estimation methods on these standard datasets. Our code is publicly available at https://github.com/kartikgupta-at-anu/CullNet. |
Tasks | 6D Pose Estimation using RGB, Pose Estimation, Pose Prediction |
Published | 2019-09-30 |
URL | https://arxiv.org/abs/1909.13476v1 |
PDF | https://arxiv.org/pdf/1909.13476v1.pdf |
PWC | https://paperswithcode.com/paper/cullnet-calibrated-and-pose-aware-confidence |
Repo | https://github.com/kartikgupta-at-anu/CullNet |
Framework | none |
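An assumption-level illustration of the input pairing described in the abstract (not the authors' architecture): a small CNN consumes a rendered pose mask concatenated with the corresponding image crop and regresses a calibrated confidence for that pose proposal.

```python
import torch
import torch.nn as nn

class TinyCullNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(          # input: 3ch crop + 1ch mask
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.score = nn.Sequential(nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, crop, mask):
        return self.score(self.features(torch.cat([crop, mask], dim=1)))

net = TinyCullNet()
crop = torch.rand(2, 3, 64, 64)    # cropped image regions around proposals
mask = torch.rand(2, 1, 64, 64)    # pose masks rendered from the 3D model
print(net(crop, mask).squeeze(1))  # calibrated confidences in (0, 1)
```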
Modular Universal Reparameterization: Deep Multi-task Learning Across Diverse Domains
Title | Modular Universal Reparameterization: Deep Multi-task Learning Across Diverse Domains |
Authors | Elliot Meyerson, Risto Miikkulainen |
Abstract | As deep learning applications continue to become more diverse, an interesting question arises: Can general problem solving arise from jointly learning several such diverse tasks? To approach this question, deep multi-task learning is extended in this paper to the setting where there is no obvious overlap between task architectures. The idea is that any set of (architecture, task) pairs can be decomposed into a set of potentially related subproblems, whose sharing is optimized by an efficient stochastic algorithm. The approach is first validated in a classic synthetic multi-task learning benchmark, and then applied to sharing across disparate architectures for vision, NLP, and genomics tasks. It discovers regularities across these domains, encodes them into sharable modules, and combines these modules systematically to improve performance in the individual tasks. The results confirm that sharing learned functionality across diverse domains and architectures is indeed beneficial, thus establishing a key ingredient for general problem solving in the future. |
Tasks | Multi-Task Learning |
Published | 2019-05-31 |
URL | https://arxiv.org/abs/1906.00097v2 |
PDF | https://arxiv.org/pdf/1906.00097v2.pdf |
PWC | https://paperswithcode.com/paper/190600097 |
Repo | https://github.com/leaf-ai/muir |
Framework | pytorch |
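A toy sketch of the reparameterization idea: weights of otherwise unrelated layers are tiled from a shared pool of small modules, and an assignment decides which pool slot fills which location. Pool size, block size, and the fixed assignments below are assumptions; the paper optimizes the mapping with a stochastic algorithm.

```python
import torch
import torch.nn as nn

POOL, BLK = 8, 4                    # pool size, module block size
pool = nn.Parameter(torch.randn(POOL, BLK, BLK) * 0.1)  # shared modules

def assemble(assignment, rows, cols):
    """Build a (rows*BLK, cols*BLK) weight by tiling pool modules."""
    grid = [[pool[assignment[i][j]] for j in range(cols)] for i in range(rows)]
    return torch.cat([torch.cat(r, dim=1) for r in grid], dim=0)

# Two "layers" in different models share module 0 in their top-left block;
# in the paper this assignment itself is learned, not hand-picked.
w_vision = assemble([[0, 1], [2, 3]], rows=2, cols=2)   # 8x8 weight
w_text   = assemble([[0, 4], [5, 6]], rows=2, cols=2)   # 8x8 weight
x = torch.randn(5, 8)
print((x @ w_vision.T).shape, (x @ w_text.T).shape)
```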
Temporal Normalizing Flows
Title | Temporal Normalizing Flows |
Authors | Remy Kusters, Gert-Jan Both |
Abstract | Analyzing and interpreting time-dependent stochastic data requires accurate and robust density estimation. In this paper we extend the concept of normalizing flows to so-called temporal normalizing flows (tNFs) to estimate time-dependent distributions, leveraging the full spatio-temporal information present in the dataset. Our approach is unsupervised, does not require an a priori characteristic scale and can accurately estimate multi-scale distributions of vastly different length scales. We illustrate tNFs on sparse datasets of Brownian and chemotactic walkers, showing that the inclusion of temporal information enhances density estimation. Finally, we speculate how tNFs can be applied to fit and discover the continuous PDE underlying a stochastic process. |
Tasks | Density Estimation |
Published | 2019-12-19 |
URL | https://arxiv.org/abs/1912.09092v1 |
PDF | https://arxiv.org/pdf/1912.09092v1.pdf |
PWC | https://paperswithcode.com/paper/temporal-normalizing-flows |
Repo | https://github.com/PhIMaL/temporal_normalizing_flows |
Framework | pytorch |
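A compact sketch of the temporal conditioning under simplifying assumptions: a single affine bijection whose shift and scale are functions of time t, giving a tractable time-dependent density p(x | t) by change of variables. The authors' tNFs are richer, but the maximum-likelihood objective has this form.

```python
import torch
import torch.nn as nn
import torch.distributions as D

class TimeAffineFlow(nn.Module):
    def __init__(self):
        super().__init__()
        # t -> (shift, log_scale); a real tNF would condition a deeper flow.
        self.net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 2))
        self.base = D.Normal(0.0, 1.0)

    def log_prob(self, x, t):
        shift, log_scale = self.net(t).chunk(2, dim=-1)
        z = (x - shift) * torch.exp(-log_scale)
        return self.base.log_prob(z) - log_scale     # + log|dz/dx|

flow = TimeAffineFlow()
x = torch.randn(100, 1)                 # e.g. walker positions
t = torch.rand(100, 1)                  # observation times
loss = -flow.log_prob(x, t).mean()      # maximum likelihood over (x, t) pairs
loss.backward()
print(loss.item())
```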
Compound Probabilistic Context-Free Grammars for Grammar Induction
Title | Compound Probabilistic Context-Free Grammars for Grammar Induction |
Authors | Yoon Kim, Chris Dyer, Alexander M. Rush |
Abstract | We study a formalization of the grammar induction problem that models sentences as being generated by a compound probabilistic context-free grammar. In contrast to traditional formulations which learn a single stochastic grammar, our grammar’s rule probabilities are modulated by a per-sentence continuous latent variable, which induces marginal dependencies beyond the traditional context-free assumptions. Inference in this grammar is performed by collapsed variational inference, in which an amortized variational posterior is placed on the continuous variable, and the latent trees are marginalized out with dynamic programming. Experiments on English and Chinese show the effectiveness of our approach compared to recent state-of-the-art methods when evaluated on unsupervised parsing. |
Tasks | Constituency Grammar Induction |
Published | 2019-06-24 |
URL | https://arxiv.org/abs/1906.10225v9 |
PDF | https://arxiv.org/pdf/1906.10225v9.pdf |
PWC | https://paperswithcode.com/paper/compound-probabilistic-context-free-grammars |
Repo | https://github.com/harvardnlp/compound-pcfg |
Framework | pytorch |
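A sketch of the two ingredients named in the abstract, under toy dimensions and random emission probabilities: rule log-probabilities produced from a per-sentence latent z, and the inside algorithm that marginalises latent binary trees with dynamic programming. The symbol layout and start-symbol convention are assumptions, not the released model.

```python
import torch
import torch.nn as nn

NT, T, V = 4, 3, 10          # nonterminals, preterminals, vocabulary size
S = NT + T                   # all symbols; nonterminal 0 is the start symbol
mlp = nn.Linear(16, NT * S * S)   # per-sentence latent -> binary-rule scores

def inside(z, sent):
    """log p(sent | z) for a binary (CNF) grammar modulated by z."""
    rule = mlp(z).view(NT, S * S).log_softmax(-1).view(NT, S, S)
    term = torch.randn(T, V).log_softmax(-1)      # stand-in emission probs
    n = len(sent)
    beta = torch.full((n, n, S), -1e9)            # inside chart, log space
    for i, w in enumerate(sent):                  # width-1 spans: preterminals
        beta[i, i, NT:] = term[:, w]
    for width in range(2, n + 1):                 # wider spans
        for i in range(n - width + 1):
            j = i + width - 1
            split_scores = []
            for k in range(i, j):                 # marginalise split points
                s = rule + beta[i, k].view(1, S, 1) + beta[k + 1, j].view(1, 1, S)
                split_scores.append(s.logsumexp(dim=(1, 2)))
            beta[i, j, :NT] = torch.stack(split_scores).logsumexp(dim=0)
    return beta[0, n - 1, 0]

print(inside(torch.randn(16), [1, 4, 2, 7]))
```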
MaCow: Masked Convolutional Generative Flow
Title | MaCow: Masked Convolutional Generative Flow |
Authors | Xuezhe Ma, Xiang Kong, Shanghang Zhang, Eduard Hovy |
Abstract | Flow-based generative models, conceptually attractive due to the tractability of both exact log-likelihood computation and latent-variable inference, and the efficiency of both training and sampling, have led to a number of impressive empirical successes and spawned many advanced variants and theoretical investigations. Despite their computational efficiency, however, the density estimation performance of flow-based generative models significantly falls behind that of state-of-the-art autoregressive models. In this work, we introduce masked convolutional generative flow (MaCow), a simple yet effective architecture of generative flow using masked convolution. By restricting the local connectivity to a small kernel, MaCow enjoys fast and stable training and efficient sampling, while achieving significant improvements over Glow for density estimation on standard image benchmarks, considerably narrowing the gap to autoregressive models. |
Tasks | Density Estimation, Image Generation |
Published | 2019-02-12 |
URL | https://arxiv.org/abs/1902.04208v5 |
PDF | https://arxiv.org/pdf/1902.04208v5.pdf |
PWC | https://paperswithcode.com/paper/macow-masked-convolutional-generative-flow |
Repo | https://github.com/XuezheMax/macow |
Framework | pytorch |
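A minimal masked 2D convolution in the spirit of the abstract (illustrative; the paper composes such units into invertible flow steps): the kernel is zeroed so every output position depends only on inputs above and to the left, which keeps the Jacobian of the resulting flow triangular.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv2d(nn.Conv2d):
    """Convolution whose kernel sees only pixels above/left of the centre."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        kH, kW = self.kernel_size
        mask = torch.zeros(1, 1, kH, kW)
        mask[:, :, :kH // 2, :] = 1           # rows strictly above the centre
        mask[:, :, kH // 2, :kW // 2] = 1     # left of centre in the centre row
        self.register_buffer("mask", mask)

    def forward(self, x):
        return F.conv2d(x, self.weight * self.mask, self.bias,
                        self.stride, self.padding)

conv = MaskedConv2d(3, 8, kernel_size=3, padding=1)
print(conv(torch.randn(1, 3, 16, 16)).shape)  # torch.Size([1, 8, 16, 16])
```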
Mesh R-CNN
Title | Mesh R-CNN |
Authors | Georgia Gkioxari, Jitendra Malik, Justin Johnson |
Abstract | Rapid advances in 2D perception have led to systems that accurately detect objects in real-world images. However, these systems make predictions in 2D, ignoring the 3D structure of the world. Concurrently, advances in 3D shape prediction have mostly focused on synthetic benchmarks and isolated objects. We unify advances in these two areas. We propose a system that detects objects in real-world images and produces a triangle mesh giving the full 3D shape of each detected object. Our system, called Mesh R-CNN, augments Mask R-CNN with a mesh prediction branch that outputs meshes with varying topological structure by first predicting coarse voxel representations which are converted to meshes and refined with a graph convolution network operating over the mesh’s vertices and edges. We validate our mesh prediction branch on ShapeNet, where we outperform prior work on single-image shape prediction. We then deploy our full Mesh R-CNN system on Pix3D, where we jointly detect objects and predict their 3D shapes. |
Tasks | 3D Shape Modeling |
Published | 2019-06-06 |
URL | https://arxiv.org/abs/1906.02739v2 |
PDF | https://arxiv.org/pdf/1906.02739v2.pdf |
PWC | https://paperswithcode.com/paper/mesh-r-cnn |
Repo | https://github.com/Penguinazor/mse.wem.project |
Framework | none |
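A sketch of the refinement ingredient (an illustration, not the released Mesh R-CNN code): a graph convolution over mesh vertices, v_i' = ReLU(W0 v_i + sum over neighbours j of W1 v_j), of the kind used to refine vertex positions of the cubified voxel mesh.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    def __init__(self, dim_in, dim_out):
        super().__init__()
        self.w0 = nn.Linear(dim_in, dim_out)              # self term
        self.w1 = nn.Linear(dim_in, dim_out, bias=False)  # neighbour term

    def forward(self, verts, edges):
        """verts: (V, F) features; edges: (E, 2) undirected index pairs."""
        h = self.w1(verts)
        msg = torch.zeros_like(h)
        src, dst = edges[:, 0], edges[:, 1]
        msg.index_add_(0, dst, h[src])        # aggregate both edge directions
        msg.index_add_(0, src, h[dst])
        return torch.relu(self.w0(verts) + msg)

verts = torch.randn(4, 3)                     # a tetrahedron: 4 vertices
edges = torch.tensor([[0, 1], [0, 2], [0, 3], [1, 2], [1, 3], [2, 3]])
print(GraphConv(3, 16)(verts, edges).shape)   # torch.Size([4, 16])
```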
FastV2C-HandNet: Fast Voxel to Coordinate Hand Pose Estimation with 3D Convolutional Neural Networks
Title | FastV2C-HandNet: Fast Voxel to Coordinate Hand Pose Estimation with 3D Convolutional Neural Networks |
Authors | Rohan Lekhwani, Bhupendra Singh |
Abstract | Hand pose estimation from monocular depth images has been an important and challenging problem in the Computer Vision community. In this paper, we present a novel approach to estimate 3D hand joint locations from 2D depth images. Unlike most of the previous methods, our model captures the 3D spatial information from a depth image thereby giving it a greater understanding of the input. We voxelize the input depth map to capture the 3D features of the input and perform 3D data augmentations to make our network robust to real-world images. Our network is trained in an end-to-end manner which reduces time and space complexity significantly when compared to other methods. Through extensive experiments, we show that our model outperforms state-of-the-art methods with respect to the time it takes to train and predict 3D hand joint locations. This makes our method more suitable for real-world hand pose estimation scenarios. |
Tasks | Hand Pose Estimation, Pose Estimation |
Published | 2019-07-15 |
URL | https://arxiv.org/abs/1907.06327v3 |
PDF | https://arxiv.org/pdf/1907.06327v3.pdf |
PWC | https://paperswithcode.com/paper/fastv2c-handnet-fast-voxel-to-coordinate-hand |
Repo | https://github.com/RonLek/FastV2C-HandNet |
Framework | tf |
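A hedged NumPy sketch of the voxelization step the abstract describes; the focal lengths, grid resolution, and spatial extent below are placeholder assumptions. The depth map is back-projected to 3D points and binned into a binary occupancy grid that a 3D CNN can consume.

```python
import numpy as np

def voxelize(depth, fx=588.0, fy=587.0, grid=32, extent=300.0):
    """depth: (H, W) in mm; returns a (grid, grid, grid) occupancy volume."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - W / 2) * z / fx            # pinhole back-projection
    y = (v[valid] - H / 2) * z / fy
    pts = np.stack([x, y, z - z.mean()], axis=1)
    idx = ((pts / extent + 0.5) * grid).astype(int)   # centre and scale
    keep = ((idx >= 0) & (idx < grid)).all(axis=1)    # drop out-of-bounds pts
    vol = np.zeros((grid, grid, grid), dtype=np.float32)
    vol[tuple(idx[keep].T)] = 1.0
    return vol

depth = np.random.uniform(400, 700, size=(120, 160)).astype(np.float32)
print(voxelize(depth).sum(), "occupied voxels")
```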
AutoAssist: A Framework to Accelerate Training of Deep Neural Networks
Title | AutoAssist: A Framework to Accelerate Training of Deep Neural Networks |
Authors | Jiong Zhang, Hsiang-fu Yu, Inderjit S. Dhillon |
Abstract | Deep neural networks have yielded superior performance in many applications; however, the gradient computation in a deep model over millions of instances leads to a lengthy training process even with modern GPU/TPU hardware acceleration. In this paper, we propose AutoAssist, a simple framework to accelerate training of a deep neural network. Typically, as the training procedure evolves, the amount of improvement the current model gains from a stochastic gradient update on each instance varies dynamically. AutoAssist exploits this fact with a simple instance shrinking operation that filters out instances with relatively low marginal improvement to the current model, so that the computationally intensive gradient computations are performed on informative instances as much as possible. We prove that the proposed technique outperforms vanilla SGD with existing importance sampling approaches for linear SVM problems, and establish O(1/k) convergence for strongly convex problems. To apply the proposed technique to deep models, we propose to jointly train a very lightweight Assistant network alongside the original deep network, referred to as the Boss. The Assistant network is designed to gauge the importance of a given instance with respect to the current Boss so that a shrinking operation can be applied in the batch generator. With careful design, we train the Boss and Assistant in a non-blocking and asynchronous fashion such that the overhead is minimal. We demonstrate that AutoAssist reduces the number of epochs by 40% for training a ResNet to the same test accuracy on an image classification dataset, and saves 30% of the training time needed for a transformer model to yield the same BLEU scores on a translation dataset. |
Tasks | Image Classification |
Published | 2019-05-08 |
URL | https://arxiv.org/abs/1905.03381v1 |
PDF | https://arxiv.org/pdf/1905.03381v1.pdf |
PWC | https://paperswithcode.com/paper/190503381 |
Repo | https://github.com/zhangjiong724/autoassist-exp |
Framework | pytorch |
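A sketch of the Boss/Assistant split under simplifying assumptions: the Assistant here is an untrained linear scorer and the keep rule is a plain top-k, whereas the paper trains the Assistant to track the Boss's per-instance improvement and runs the two asynchronously.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

boss = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
assistant = nn.Linear(784, 1)   # cheap scorer; would be trained to predict
                                # which instances still improve the Boss

def shrink_batch(x, y, keep_frac=0.6):
    """Keep the fraction of instances the Assistant rates most informative."""
    with torch.no_grad():
        scores = assistant(x).squeeze(1)
    k = max(1, int(keep_frac * len(x)))
    top = scores.topk(k).indices
    return x[top], y[top]

x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))
xs, ys = shrink_batch(x, y)
loss = F.cross_entropy(boss(xs), ys)   # expensive gradient on kept 60% only
loss.backward()
print(xs.shape, loss.item())
```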
Implicit 3D Orientation Learning for 6D Object Detection from RGB Images
Title | Implicit 3D Orientation Learning for 6D Object Detection from RGB Images |
Authors | Martin Sundermeyer, Zoltan-Csaba Marton, Maximilian Durner, Manuel Brucker, Rudolph Triebel |
Abstract | We propose a real-time RGB-based pipeline for object detection and 6D pose estimation. Our novel 3D orientation estimation is based on a variant of the Denoising Autoencoder that is trained on simulated views of a 3D model using Domain Randomization. This so-called Augmented Autoencoder has several advantages over existing methods: It does not require real, pose-annotated training data, generalizes to various test sensors and inherently handles object and view symmetries. Instead of learning an explicit mapping from input images to object poses, it provides an implicit representation of object orientations defined by samples in a latent space. Our pipeline achieves state-of-the-art performance on the T-LESS dataset both in the RGB and RGB-D domain. We also evaluate on the LineMOD dataset where we can compete with other synthetically trained approaches. We further increase performance by correcting 3D orientation estimates to account for perspective errors when the object deviates from the image center and show extended results. |
Tasks | 6D Pose Estimation, 6D Pose Estimation using RGB, Denoising, Object Detection, Pose Estimation |
Published | 2019-02-04 |
URL | https://arxiv.org/abs/1902.01275v2 |
PDF | https://arxiv.org/pdf/1902.01275v2.pdf |
PWC | https://paperswithcode.com/paper/implicit-3d-orientation-learning-for-6d |
Repo | https://github.com/DLR-RM/AugmentedAutoencoder |
Framework | tf |
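An assumption-level illustration of the implicit orientation lookup: rendered views with known rotations are encoded into a latent codebook offline, and a test crop is matched against the codebook by cosine similarity to retrieve the nearest orientation. The stand-in linear encoder below replaces the paper's convolutional Augmented Autoencoder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64 * 3, 128))  # stand-in

# Offline: encode simulated views of the 3D model with known rotations R[i].
with torch.no_grad():
    views = torch.rand(1000, 3, 64, 64)
    codebook = F.normalize(encoder(views), dim=1)        # (1000, 128)

def estimate_orientation(crop):
    """Index of the rendered view whose latent code best matches the crop."""
    with torch.no_grad():
        z = F.normalize(encoder(crop.unsqueeze(0)), dim=1)
    return (codebook @ z.squeeze(0)).argmax().item()     # cosine similarity

print(estimate_orientation(torch.rand(3, 64, 64)))
```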
Connecting Language and Knowledge with Heterogeneous Representations for Neural Relation Extraction
Title | Connecting Language and Knowledge with Heterogeneous Representations for Neural Relation Extraction |
Authors | Peng Xu, Denilson Barbosa |
Abstract | Knowledge Bases (KBs) require constant updating to reflect changes to the world they represent. For general-purpose KBs, this is often done through Relation Extraction (RE), the task of predicting KB relations expressed in text mentioning entities known to the KB. One way to improve RE is to use KB Embeddings (KBE) for link prediction. However, despite clear connections between RE and KBE, little has been done toward properly unifying these models systematically. We help close the gap with a framework that unifies the learning of RE and KBE models, leading to significant improvements over the state of the art in RE. The code is available at https://github.com/billy-inn/HRERE. |
Tasks | Link Prediction, Relation Extraction |
Published | 2019-03-25 |
URL | https://arxiv.org/abs/1903.10126v3 |
PDF | https://arxiv.org/pdf/1903.10126v3.pdf |
PWC | https://paperswithcode.com/paper/connecting-language-and-knowledge-with |
Repo | https://github.com/billy-inn/HRERE |
Framework | tf |
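A toy sketch of the unification the abstract describes (the architecture and loss weighting are assumptions): a text-based relation classifier and a KB-embedding triple scorer, here DistMult-style, coupled through a single joint loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

R, E, d = 5, 100, 32                  # relations, entities, embedding dim
text_clf = nn.Linear(64, R)           # relation scores from a sentence encoder
ent, rel = nn.Embedding(E, d), nn.Embedding(R, d)

def kbe_score(h, r, t):
    """DistMult-style plausibility of the triple (h, r, t)."""
    return (ent(h) * rel(r) * ent(t)).sum(-1)

sent_vec = torch.randn(8, 64)         # encoded sentences mentioning (h, t)
h, t = torch.randint(0, E, (8,)), torch.randint(0, E, (8,))
y = torch.randint(0, R, (8,))         # gold relation labels

loss = F.cross_entropy(text_clf(sent_vec), y) \
     - F.logsigmoid(kbe_score(h, y, t)).mean()   # joint RE + KBE objective
loss.backward()
print(loss.item())
```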
jMetalPy: a Python Framework for Multi-Objective Optimization with Metaheuristics
Title | jMetalPy: a Python Framework for Multi-Objective Optimization with Metaheuristics |
Authors | Antonio Benitez-Hidalgo, Antonio J. Nebro, Jose Garcia-Nieto, Izaskun Oregi, Javier Del Ser |
Abstract | This paper describes jMetalPy, an object-oriented Python-based framework for multi-objective optimization with metaheuristic techniques. Building upon our experiences with the well-known jMetal framework, we have developed a new multi-objective optimization software platform aimed not only at replicating the former in a different programming language, but also at taking advantage of the full feature set of Python, including its facilities for fast prototyping and the large number of available libraries for data processing, data analysis, data visualization, and high-performance computing. As a result, jMetalPy provides an environment for solving multi-objective optimization problems focused not only on traditional metaheuristics, but also on techniques supporting preference articulation and dynamic problems, along with a rich set of features related to the automatic generation of statistical data from the results, as well as real-time and interactive visualization of the Pareto front approximations produced by the algorithms. jMetalPy additionally offers support for parallel computing on multicore and cluster systems. We include some use cases to explore the main features of jMetalPy and to illustrate how to work with it. |
Tasks | |
Published | 2019-03-07 |
URL | http://arxiv.org/abs/1903.02915v2 |
PDF | http://arxiv.org/pdf/1903.02915v2.pdf |
PWC | https://paperswithcode.com/paper/jmetalpy-a-python-framework-for-multi |
Repo | https://github.com/jMetal/jMetalPy |
Framework | none |
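A usage sketch following jMetalPy's documented NSGA-II example on the ZDT1 benchmark; keyword names have shifted slightly between releases, so treat the exact signatures as a guide rather than gospel.

```python
from jmetal.algorithm.multiobjective.nsgaii import NSGAII
from jmetal.operator import SBXCrossover, PolynomialMutation
from jmetal.problem import ZDT1
from jmetal.util.termination_criterion import StoppingByEvaluations

problem = ZDT1()
algorithm = NSGAII(
    problem=problem,
    population_size=100,
    offspring_population_size=100,
    mutation=PolynomialMutation(probability=1.0 / problem.number_of_variables,
                                distribution_index=20),
    crossover=SBXCrossover(probability=1.0, distribution_index=20),
    termination_criterion=StoppingByEvaluations(max_evaluations=25000),
)
algorithm.run()
front = algorithm.get_result()   # non-dominated solutions found
print(len(front), "solutions in the Pareto front approximation")
```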
Feeding the zombies: Synthesizing brain volumes using a 3D progressive growing GAN
Title | Feeding the zombies: Synthesizing brain volumes using a 3D progressive growing GAN |
Authors | Anders Eklund |
Abstract | Deep learning requires large datasets for training (convolutional) networks with millions of parameters. In neuroimaging, there are few open datasets with more than 100 subjects, which makes it difficult to, for example, train a classifier to discriminate controls from diseased persons. Generative adversarial networks (GANs) can be used to synthesize data, but virtually all research is focused on 2D images. In medical imaging, and especially in neuroimaging, most datasets are 3D or 4D. Here we therefore present preliminary results showing that a 3D progressive growing GAN can be used to synthesize MR brain volumes. |
Tasks | |
Published | 2019-12-11 |
URL | https://arxiv.org/abs/1912.05357v2 |
PDF | https://arxiv.org/pdf/1912.05357v2.pdf |
PWC | https://paperswithcode.com/paper/feeding-the-zombies-synthesizing-brain |
Repo | https://github.com/wanderine/ProgressiveGAN3D |
Framework | tf |
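A minimal PyTorch illustration of the 3D ingredient (shapes are assumptions, and the released code implements full progressive growing in TensorFlow): the usual 2D generator ops are swapped for their 3D counterparts so the GAN emits volumes instead of images.

```python
import torch
import torch.nn as nn

gen = nn.Sequential(
    nn.ConvTranspose3d(128, 64, 4, stride=2, padding=1), nn.ReLU(),  # 8 -> 16
    nn.ConvTranspose3d(64, 32, 4, stride=2, padding=1), nn.ReLU(),   # 16 -> 32
    nn.Conv3d(32, 1, 3, padding=1), nn.Tanh(),     # single-channel MR volume
)
z = torch.randn(1, 128, 8, 8, 8)   # low-resolution latent volume
print(gen(z).shape)                # torch.Size([1, 1, 32, 32, 32])
```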