October 20, 2019

3003 words 15 mins read

Paper Group AWR 328


Deeper Image Quality Transfer: Training Low-Memory Neural Networks for 3D Images. Visual Text Correction. Lipschitz regularity of deep neural networks: analysis and efficient estimation. Integrating Local Context and Global Cohesiveness for Open Information Extraction. Symbolic Priors for RNN-based Semantic Parsing. Semi-Amortized Variational Autoencoders …

Deeper Image Quality Transfer: Training Low-Memory Neural Networks for 3D Images

Title Deeper Image Quality Transfer: Training Low-Memory Neural Networks for 3D Images
Authors Stefano B. Blumberg, Ryutaro Tanno, Iasonas Kokkinos, Daniel C. Alexander
Abstract In this paper we address the memory demands that come with the processing of 3-dimensional, high-resolution, multi-channeled medical images in deep learning. We exploit memory-efficient backpropagation techniques to reduce the memory complexity of network training from being linear in the network’s depth to being roughly constant, permitting us to elongate deep architectures with negligible memory increase. We evaluate our methodology in the paradigm of Image Quality Transfer, whilst noting its potential application to various tasks that use deep learning. We study the impact of depth on accuracy and show that deeper models have more predictive power, which may exploit larger training sets. We obtain substantially better results than the previous state-of-the-art model with a slight memory increase, reducing the root-mean-squared error by 13%. Our code is publicly available.
Tasks
Published 2018-08-16
URL http://arxiv.org/abs/1808.05577v1
PDF http://arxiv.org/pdf/1808.05577v1.pdf
PWC https://paperswithcode.com/paper/deeper-image-quality-transfer-training-low
Repo https://github.com/sbb-gh/Deeper-Image-Quality-Transfer-Training-Low-Memory-Neural-Networks-for-3D-Images
Framework pytorch
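The memory-efficient backpropagation the abstract describes is the gradient-checkpointing idea available in stock PyTorch. Below is a minimal sketch (not the authors' code; the 3D architecture, channel counts, and depth are placeholders) showing how checkpointing each block keeps training memory roughly constant in depth.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class DeepRegressor(nn.Module):
    def __init__(self, channels=16, depth=20):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Conv3d(channels, channels, 3, padding=1), nn.ReLU())
            for _ in range(depth)
        )

    def forward(self, x):
        for block in self.blocks:
            # Activations inside `block` are discarded after the forward pass and
            # recomputed during backward, trading extra compute for memory.
            x = checkpoint(block, x)
        return x

model = DeepRegressor()
x = torch.randn(1, 16, 32, 32, 32, requires_grad=True)
model(x).mean().backward()  # peak memory grows with block size, not with depth
```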

Visual Text Correction

Title Visual Text Correction
Authors Amir Mazaheri, Mubarak Shah
Abstract Videos, images, and sentences are mediums that can express the same semantics. One can imagine a picture by reading a sentence or can describe a scene with some words. However, even small changes in a sentence can cause a significant semantic inconsistency with the corresponding video/image. For example, by changing the verb of a sentence, the meaning may drastically change. There have been many efforts to encode a video/sentence and decode it as a sentence/video. In this research, we study a new scenario in which both the sentence and the video are given, but the sentence is inaccurate. A semantic inconsistency between the sentence and the video or between the words of a sentence can result in an inaccurate description. This paper introduces a new problem, called Visual Text Correction (VTC), i.e., finding and replacing an inaccurate word in the textual description of a video. We propose a deep network that can simultaneously detect an inaccuracy in a sentence and fix it by replacing the inaccurate word(s). Our method leverages the semantic interdependence of videos and words, as well as the short-term and long-term relations of the words in a sentence. In our formulation, part of a visual feature vector for every single word is dynamically selected through a gating process. Furthermore, to train and evaluate our model, we propose an approach to automatically construct a large dataset for the VTC problem. Our experiments and performance analysis demonstrate that the proposed method provides very good results and also highlights the general challenges in solving the VTC problem. To the best of our knowledge, this work is the first of its kind for the Visual Text Correction task.
Tasks Grammatical Error Correction, Visual Text Correction
Published 2018-01-06
URL http://arxiv.org/abs/1801.01967v3
PDF http://arxiv.org/pdf/1801.01967v3.pdf
PWC https://paperswithcode.com/paper/visual-text-correction
Repo https://github.com/amirmazaheri1990/Visual-Text-Correction
Framework tf
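As a rough illustration of the per-word gating described above, here is a toy PyTorch sketch (the dimensions, layers, and BiLSTM scorer are my assumptions, not the paper's architecture): each word embedding gates a shared visual feature vector, and a small head scores each word's likelihood of being inaccurate.

```python
import torch
import torch.nn as nn

class WordGatedDetector(nn.Module):
    def __init__(self, word_dim=300, vis_dim=512, hidden=256):
        super().__init__()
        self.gate = nn.Linear(word_dim, vis_dim)          # per-word gate over visual features
        self.lstm = nn.LSTM(word_dim + vis_dim, hidden, batch_first=True, bidirectional=True)
        self.detect = nn.Linear(2 * hidden, 1)            # inaccuracy score per word

    def forward(self, words, visual):
        # words: (B, T, word_dim), visual: (B, vis_dim)
        g = torch.sigmoid(self.gate(words))               # (B, T, vis_dim)
        gated_visual = g * visual.unsqueeze(1)             # dynamically selected visual evidence
        h, _ = self.lstm(torch.cat([words, gated_visual], dim=-1))
        return self.detect(h).squeeze(-1)                  # (B, T) scores; argmax = suspected word

model = WordGatedDetector()
scores = model(torch.randn(2, 12, 300), torch.randn(2, 512))
print(scores.argmax(dim=1))  # index of the word flagged as inaccurate in each sentence
```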

Lipschitz regularity of deep neural networks: analysis and efficient estimation

Title Lipschitz regularity of deep neural networks: analysis and efficient estimation
Authors Kevin Scaman, Aladin Virmaux
Abstract Deep neural networks are notorious for being sensitive to small well-chosen perturbations, and estimating the regularity of such architectures is of utmost importance for safe and robust practical applications. In this paper, we investigate one of the key characteristics to assess the regularity of such methods: the Lipschitz constant of deep learning architectures. First, we show that, even for two-layer neural networks, the exact computation of this quantity is NP-hard and state-of-the-art methods may significantly overestimate it. Then, we both extend and improve previous estimation methods by providing AutoLip, the first generic algorithm for upper bounding the Lipschitz constant of any automatically differentiable function. We provide a power method algorithm working with automatic differentiation, allowing efficient computations even on large convolutions. Second, for sequential neural networks, we propose an improved algorithm named SeqLip that takes advantage of the linear computation graph to split the computation per pair of consecutive layers. Third, we propose heuristics on SeqLip in order to tackle very large networks. Our experiments show that SeqLip can significantly improve on the existing upper bounds. Finally, we provide an implementation of AutoLip in the PyTorch environment that may be used to better estimate the robustness of a given neural network to small perturbations or regularize it using more precise Lipschitz estimations.
Tasks
Published 2018-05-28
URL https://arxiv.org/abs/1805.10965v2
PDF https://arxiv.org/pdf/1805.10965v2.pdf
PWC https://paperswithcode.com/paper/lipschitz-regularity-of-deep-neural-networks
Repo https://github.com/avirmaux/lipEstimation
Framework pytorch
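For context, the simplest upper bound on a network's Lipschitz constant (the coarse bound that AutoLip generalizes and SeqLip tightens) is the product of per-layer spectral norms when the activations are 1-Lipschitz. A small sketch, with power iteration used to estimate each spectral norm:

```python
import torch
import torch.nn as nn

def spectral_norm(weight, n_iter=50):
    # Power iteration on W^T W to estimate the largest singular value of W.
    v = torch.randn(weight.shape[1])
    for _ in range(n_iter):
        v = weight.t() @ (weight @ v)
        v = v / v.norm()
    return (weight @ v).norm()

net = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
bound = 1.0
for layer in net:
    if isinstance(layer, nn.Linear):
        bound *= spectral_norm(layer.weight).item()
print("Lipschitz upper bound:", bound)  # SeqLip improves on this product bound
```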

Integrating Local Context and Global Cohesiveness for Open Information Extraction

Title Integrating Local Context and Global Cohesiveness for Open Information Extraction
Authors Qi Zhu, Xiang Ren, Jingbo Shang, Yu Zhang, Ahmed El-Kishky, Jiawei Han
Abstract Extracting entities and their relations from text is an important task for understanding massive text corpora. Open information extraction (IE) systems mine relation tuples (i.e., entity arguments and a predicate string to describe their relation) from sentences. These relation tuples are not confined to a predefined schema for the relations of interest. However, current Open IE systems focus on modeling local context information in a sentence to extract relation tuples, while ignoring the fact that global statistics in a large corpus can be collectively leveraged to identify high-quality sentence-level extractions. In this paper, we propose a novel Open IE system, called ReMine, which integrates local context signals and global structural signals in a unified, distant-supervision framework. Leveraging facts from external knowledge bases as supervision, the new system can be applied to many different domains to facilitate sentence-level tuple extraction using corpus-level statistics. Our system operates by solving a joint optimization problem to unify (1) segmenting entity/relation phrases in individual sentences based on local context; and (2) measuring the quality of tuples extracted from individual sentences with a translating-based objective. Learning the two subtasks jointly helps correct errors produced in each subtask so that they can mutually enhance each other. Experiments on two real-world corpora from different domains demonstrate the effectiveness, generality, and robustness of ReMine when compared to state-of-the-art open IE systems.
Tasks Open Information Extraction
Published 2018-04-26
URL http://arxiv.org/abs/1804.09931v4
PDF http://arxiv.org/pdf/1804.09931v4.pdf
PWC https://paperswithcode.com/paper/integrating-local-context-and-global
Repo https://github.com/GentleZhu/ReMine
Framework none
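The "translating-based objective" for measuring tuple quality can be illustrated with a TransE-style score (a sketch under my own assumptions about embeddings and dimensions, not ReMine's exact formulation): a tuple scores well when head plus predicate lands near tail in embedding space.

```python
import torch

def tuple_quality(head_emb, pred_emb, tail_emb):
    # Higher is better; negated L2 translation error of head + predicate vs. tail.
    return -torch.norm(head_emb + pred_emb - tail_emb, dim=-1)

dim = 50
entity = {"Barack Obama": torch.randn(dim), "Hawaii": torch.randn(dim)}
predicate = {"was born in": torch.randn(dim)}
score = tuple_quality(entity["Barack Obama"], predicate["was born in"], entity["Hawaii"])
print(float(score))  # with trained embeddings, good tuples score higher than bad ones
```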

Symbolic Priors for RNN-based Semantic Parsing

Title Symbolic Priors for RNN-based Semantic Parsing
Authors Chunyang Xiao, Marc Dymetman, Claire Gardent
Abstract Seq2seq models based on Recurrent Neural Networks (RNNs) have recently received a lot of attention in the domain of Semantic Parsing for Question Answering. While in principle they can be trained directly on pairs (natural language utterances, logical forms), their performance is limited by the amount of available data. To alleviate this problem, we propose to exploit various sources of prior knowledge: the well-formedness of the logical forms is modeled by a weighted context-free grammar; the likelihood that certain entities present in the input utterance are also present in the logical form is modeled by weighted finite-state automata. The grammar and automata are combined together through an efficient intersection algorithm to form a soft guide (“background”) to the RNN. We test our method on an extension of the Overnight dataset and show that it not only strongly improves over an RNN baseline, but also outperforms non-RNN models based on rich sets of hand-crafted features.
Tasks Question Answering, Semantic Parsing
Published 2018-09-20
URL http://arxiv.org/abs/1809.07721v1
PDF http://arxiv.org/pdf/1809.07721v1.pdf
PWC https://paperswithcode.com/paper/symbolic-priors-for-rnn-based-semantic
Repo https://github.com/chunyangx/overnight_more
Framework none
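The "soft guide" idea, combining the RNN decoder with symbolic prior scores over the next token, can be sketched as a log-linear interpolation. In the paper the prior comes from intersecting a weighted CFG with weighted automata; here it is simply a given table of log-probabilities, an assumption made for illustration.

```python
import torch

def guided_next_token(rnn_logits, prior_logprobs, alpha=1.0):
    # Log-linear combination of the RNN's distribution with the symbolic prior.
    combined = torch.log_softmax(rnn_logits, dim=-1) + alpha * prior_logprobs
    return torch.softmax(combined, dim=-1)

vocab = 1000
rnn_logits = torch.randn(vocab)
prior_logprobs = torch.full((vocab,), -20.0)   # prior heavily penalizes most tokens...
prior_logprobs[:50] = 0.0                      # ...and allows a grammatical subset
probs = guided_next_token(rnn_logits, prior_logprobs)
print(probs[:50].sum())  # nearly all probability mass falls on tokens the prior permits
```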

Semi-Amortized Variational Autoencoders

Title Semi-Amortized Variational Autoencoders
Authors Yoon Kim, Sam Wiseman, Andrew C. Miller, David Sontag, Alexander M. Rush
Abstract Amortized variational inference (AVI) replaces instance-specific local inference with a global inference network. While AVI has enabled efficient training of deep generative models such as variational autoencoders (VAE), recent empirical work suggests that inference networks can produce suboptimal variational parameters. We propose a hybrid approach: use AVI to initialize the variational parameters, then run stochastic variational inference (SVI) to refine them. Crucially, the local SVI procedure is itself differentiable, so the inference network and generative model can be trained end-to-end with gradient-based optimization. This semi-amortized approach enables the use of rich generative models without experiencing the posterior-collapse phenomenon common in training VAEs for problems like text generation. Experiments show this approach outperforms strong autoregressive and variational baselines on standard text and image datasets.
Tasks Text Generation
Published 2018-02-07
URL http://arxiv.org/abs/1802.02550v7
PDF http://arxiv.org/pdf/1802.02550v7.pdf
PWC https://paperswithcode.com/paper/semi-amortized-variational-autoencoders
Repo https://github.com/harvardnlp/sa-vae
Framework pytorch
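A minimal sketch of the semi-amortized procedure (model sizes, step size, and number of SVI steps are illustrative, and end-to-end training through the SVI steps is omitted here): the encoder proposes (mu, logvar), then a few gradient steps on the instance's ELBO refine them.

```python
import torch
import torch.nn as nn

x_dim, z_dim = 784, 32
encoder = nn.Linear(x_dim, 2 * z_dim)
decoder = nn.Linear(z_dim, x_dim)

def neg_elbo(x, mu, logvar):
    z = mu + (0.5 * logvar).exp() * torch.randn_like(mu)     # reparameterized sample
    recon = nn.functional.binary_cross_entropy_with_logits(decoder(z), x, reduction="sum")
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum()
    return recon + kl

x = torch.rand(1, x_dim)
mu, logvar = encoder(x).chunk(2, dim=-1)                     # amortized initialization (AVI)
mu, logvar = mu.detach().requires_grad_(), logvar.detach().requires_grad_()
for _ in range(10):                                          # SVI refinement steps
    loss = neg_elbo(x, mu, logvar)
    g_mu, g_logvar = torch.autograd.grad(loss, (mu, logvar))
    mu, logvar = mu - 0.1 * g_mu, logvar - 0.1 * g_logvar
print(float(neg_elbo(x, mu, logvar)))                        # refined (negative) ELBO
```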

Benchmarking Automatic Machine Learning Frameworks

Title Benchmarking Automatic Machine Learning Frameworks
Authors Adithya Balaji, Alexander Allen
Abstract AutoML serves as the bridge between varying levels of expertise when designing machine learning systems and expedites the data science process. A wide range of techniques has been developed to address this; however, no objective comparison of these techniques exists. We present a benchmark of current open source AutoML solutions using open source datasets. We test auto-sklearn, TPOT, auto_ml, and H2O’s AutoML solution against a compiled set of regression and classification datasets sourced from OpenML and find that auto-sklearn performs the best across classification datasets and TPOT performs the best across regression datasets.
Tasks Automated Feature Engineering, AutoML, Hyperparameter Optimization
Published 2018-08-17
URL http://arxiv.org/abs/1808.06492v1
PDF http://arxiv.org/pdf/1808.06492v1.pdf
PWC https://paperswithcode.com/paper/benchmarking-automatic-machine-learning
Repo https://github.com/ClimbsRocks/auto_ml
Framework tf
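A sketch of the benchmarking loop, shown for TPOT, one of the evaluated frameworks (the dataset, search budget, and metric here are placeholders, not the paper's exact protocol):

```python
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import f1_score
from tpot import TPOTClassifier

# One OpenML classification dataset; the paper evaluates a compiled set of many.
X, y = fetch_openml("credit-g", version=1, return_X_y=True, as_frame=False)
y = LabelEncoder().fit_transform(y)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

automl = TPOTClassifier(generations=5, population_size=20, random_state=0, verbosity=0)
automl.fit(X_tr, y_tr)
print("weighted F1:", f1_score(y_te, automl.predict(X_te), average="weighted"))
```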

Horovod: fast and easy distributed deep learning in TensorFlow

Title Horovod: fast and easy distributed deep learning in TensorFlow
Authors Alexander Sergeev, Mike Del Balso
Abstract Training modern deep learning models requires large amounts of computation, often provided by GPUs. Scaling computation from one GPU to many can enable much faster training and research progress but entails two complications. First, the training library must support inter-GPU communication. Depending on the particular methods employed, this communication may entail anywhere from negligible to significant overhead. Second, the user must modify his or her training code to take advantage of inter-GPU communication. Depending on the training library’s API, the modification required may be either significant or minimal. Existing methods for enabling multi-GPU training under the TensorFlow library entail non-negligible communication overhead and require users to heavily modify their model-building code, leading many researchers to avoid the whole mess and stick with slower single-GPU training. In this paper we introduce Horovod, an open source library that improves on both obstructions to scaling: it employs efficient inter-GPU communication via ring reduction and requires only a few lines of modification to user code, enabling faster, easier distributed training in TensorFlow. Horovod is available under the Apache 2.0 license at https://github.com/uber/horovod
Tasks
Published 2018-02-15
URL http://arxiv.org/abs/1802.05799v3
PDF http://arxiv.org/pdf/1802.05799v3.pdf
PWC https://paperswithcode.com/paper/horovod-fast-and-easy-distributed-deep
Repo https://github.com/horovod/horovod
Framework tf
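The "few lines of modification" are roughly the pattern from the Horovod documentation, sketched here for tf.keras (the model and data are placeholders): initialize Horovod, pin one GPU per process, wrap the optimizer so gradients are averaged with ring-allreduce, and broadcast the initial state from rank 0.

```python
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()
gpus = tf.config.experimental.list_physical_devices("GPU")
if gpus:
    tf.config.experimental.set_visible_devices(gpus[hvd.local_rank()], "GPU")

model = tf.keras.Sequential([tf.keras.layers.Dense(10, activation="softmax")])
opt = hvd.DistributedOptimizer(tf.keras.optimizers.SGD(0.01 * hvd.size()))  # allreduce gradients
model.compile(loss="sparse_categorical_crossentropy", optimizer=opt)

callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]  # sync initial weights from rank 0
# model.fit(x_train, y_train, callbacks=callbacks)  # launch with: horovodrun -np 4 python train.py
```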

Metropolis-Hastings Generative Adversarial Networks

Title Metropolis-Hastings Generative Adversarial Networks
Authors Ryan Turner, Jane Hung, Eric Frank, Yunus Saatci, Jason Yosinski
Abstract We introduce the Metropolis-Hastings generative adversarial network (MH-GAN), which combines aspects of Markov chain Monte Carlo and GANs. The MH-GAN draws samples from the distribution implicitly defined by a GAN’s discriminator-generator pair, as opposed to standard GANs which draw samples from the distribution defined only by the generator. It uses the discriminator from GAN training to build a wrapper around the generator for improved sampling. With a perfect discriminator, this wrapped generator samples from the true distribution on the data exactly even when the generator is imperfect. We demonstrate the benefits of the improved generator on multiple benchmark datasets, including CIFAR-10 and CelebA, using the DCGAN, WGAN, and progressive GAN.
Tasks
Published 2018-11-28
URL https://arxiv.org/abs/1811.11357v2
PDF https://arxiv.org/pdf/1811.11357v2.pdf
PWC https://paperswithcode.com/paper/metropolis-hastings-generative-adversarial
Repo https://github.com/nardeas/MHGAN
Framework tf
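The acceptance rule is easy to sketch: with a (calibrated) discriminator D, the density ratio p_data/p_g is proportional to D/(1-D), so an independence Metropolis-Hastings chain over generator samples accepts a proposal x' over the current x with probability min(1, r(x')/r(x)). The generator and discriminator below are stand-ins, and the calibration step from the paper is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
generator = lambda: rng.normal(size=2)                       # placeholder G: draws a proposal
discriminator = lambda x: 1.0 / (1.0 + np.exp(-x.sum()))     # placeholder D in (0, 1)

def density_ratio(x, eps=1e-8):
    d = discriminator(x)
    return d / (1.0 - d + eps)

def mh_gan_sample(n_steps=100):
    x = generator()                                          # start the chain from a generator draw
    for _ in range(n_steps):
        x_prop = generator()                                 # independence proposal from G
        alpha = min(1.0, density_ratio(x_prop) / density_ratio(x))
        if rng.uniform() < alpha:
            x = x_prop
    return x                                                 # approximately a sample from p_data

print(mh_gan_sample())
```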

Acquisition of Localization Confidence for Accurate Object Detection

Title Acquisition of Localization Confidence for Accurate Object Detection
Authors Borui Jiang, Ruixuan Luo, Jiayuan Mao, Tete Xiao, Yuning Jiang
Abstract Modern CNN-based object detectors rely on bounding box regression and non-maximum suppression to localize objects. While the probabilities for class labels naturally reflect classification confidence, localization confidence is absent. This makes properly localized bounding boxes degenerate during iterative regression or even be suppressed during NMS. In this paper we propose IoU-Net, which learns to predict the IoU between each detected bounding box and the matched ground truth. The network acquires this confidence of localization, which improves the NMS procedure by preserving accurately localized bounding boxes. Furthermore, an optimization-based bounding box refinement method is proposed, where the predicted IoU is formulated as the objective. Extensive experiments on the MS-COCO dataset show the effectiveness of IoU-Net, as well as its compatibility with and adaptivity to several state-of-the-art object detectors.
Tasks Object Detection
Published 2018-07-30
URL http://arxiv.org/abs/1807.11590v1
PDF http://arxiv.org/pdf/1807.11590v1.pdf
PWC https://paperswithcode.com/paper/acquisition-of-localization-confidence-for
Repo https://github.com/CSAILVision/unifiedparsing
Framework pytorch
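The NMS change is easy to sketch: rank and suppress boxes by the predicted IoU (localization confidence) instead of the classification score. This numpy sketch keeps only that ranking change and omits the rest of the paper's refinement procedure; boxes are in (x1, y1, x2, y2) format.

```python
import numpy as np

def box_iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-8)

def iou_guided_nms(boxes, pred_ious, thresh=0.5):
    # Sort by predicted IoU (localization confidence), not classification score.
    order = np.argsort(-pred_ious)
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        overlaps = np.array([box_iou(boxes[i], boxes[j]) for j in rest])
        order = rest[overlaps < thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
pred_ious = np.array([0.6, 0.9, 0.8])    # the second box is better localized
print(iou_guided_nms(boxes, pred_ious))  # keeps indices [1, 2]
```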

Analyzing biological and artificial neural networks: challenges with opportunities for synergy?

Title Analyzing biological and artificial neural networks: challenges with opportunities for synergy?
Authors David G. T. Barrett, Ari S. Morcos, Jakob H. Macke
Abstract Deep neural networks (DNNs) transform stimuli across multiple processing stages to produce representations that can be used to solve complex tasks, such as object recognition in images. However, a full understanding of how they achieve this remains elusive. The complexity of biological neural networks substantially exceeds the complexity of DNNs, making it even more challenging to understand the representations that they learn. Thus, both machine learning and computational neuroscience are faced with a shared challenge: how can we analyze their representations in order to understand how they solve complex tasks? We review how data-analysis concepts and techniques developed by computational neuroscientists can be useful for analyzing representations in DNNs, and in turn, how recently developed techniques for analysis of DNNs can be useful for understanding representations in biological neural networks. We explore opportunities for synergy between the two fields, such as the use of DNNs as in-silico model systems for neuroscience, and how this synergy can lead to new hypotheses about the operating principles of biological neural networks.
Tasks Object Recognition
Published 2018-10-31
URL http://arxiv.org/abs/1810.13373v1
PDF http://arxiv.org/pdf/1810.13373v1.pdf
PWC https://paperswithcode.com/paper/analyzing-biological-and-artificial-neural
Repo https://github.com/sciple/Behavioral-Neuroscience-for-Rational-Minds
Framework none

A Tutorial on Deep Latent Variable Models of Natural Language

Title A Tutorial on Deep Latent Variable Models of Natural Language
Authors Yoon Kim, Sam Wiseman, Alexander M. Rush
Abstract There has been much recent, exciting work on combining the complementary strengths of latent variable models and deep learning. Latent variable modeling makes it easy to explicitly specify model constraints through conditional independence properties, while deep learning makes it possible to parameterize these conditional likelihoods with powerful function approximators. While these “deep latent variable” models provide a rich, flexible framework for modeling many real-world phenomena, difficulties exist: deep parameterizations of conditional likelihoods usually make posterior inference intractable, and latent variable objectives often complicate backpropagation by introducing points of non-differentiability. This tutorial explores these issues in depth through the lens of variational inference.
Tasks Latent Variable Models
Published 2018-12-17
URL https://arxiv.org/abs/1812.06834v3
PDF https://arxiv.org/pdf/1812.06834v3.pdf
PWC https://paperswithcode.com/paper/a-tutorial-on-deep-latent-variable-models-of
Repo https://github.com/Francix/Deep-Generative-Models-for-Natural-Language-Processing
Framework tf
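As a tiny worked example of the machinery the tutorial covers, here is the reparameterization trick for a Gaussian variational posterior, with an illustrative stand-in for the ELBO integrand (the values and objective are made up for the demonstration):

```python
import torch

mu = torch.tensor(0.5, requires_grad=True)
log_sigma = torch.tensor(-1.0, requires_grad=True)

def f(z):                       # stand-in for log p(x, z) - log q(z)
    return -(z - 2.0) ** 2

eps = torch.randn(1000)
z = mu + log_sigma.exp() * eps  # reparameterized samples from N(mu, sigma^2)
loss = -f(z).mean()             # Monte Carlo estimate of -E_q[f(z)]
loss.backward()
print(mu.grad, log_sigma.grad)  # low-variance pathwise gradient estimates
```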

A General Method for Amortizing Variational Filtering

Title A General Method for Amortizing Variational Filtering
Authors Joseph Marino, Milan Cvitkovic, Yisong Yue
Abstract We introduce the variational filtering EM algorithm, a simple, general-purpose method for performing variational inference in dynamical latent variable models using information from only past and present variables, i.e. filtering. The algorithm is derived from the variational objective in the filtering setting and consists of an optimization procedure at each time step. By performing each inference optimization procedure with an iterative amortized inference model, we obtain a computationally efficient implementation of the algorithm, which we call amortized variational filtering. We present experiments demonstrating that this general-purpose method improves performance across several deep dynamical latent variable models.
Tasks Latent Variable Models
Published 2018-11-13
URL http://arxiv.org/abs/1811.05090v1
PDF http://arxiv.org/pdf/1811.05090v1.pdf
PWC https://paperswithcode.com/paper/a-general-method-for-amortizing-variational
Repo https://github.com/joelouismarino/amortized-variational-filtering
Framework pytorch
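A schematic of the filtering loop (heavily simplified: the update network here consumes the current estimate and the observation rather than the ELBO gradients used by the paper's iterative amortized inference, and the transition prior is fixed to a standard normal):

```python
import torch
import torch.nn as nn

z_dim, x_dim = 8, 16
update_net = nn.GRUCell(z_dim + x_dim, 2 * z_dim)   # amortized inference update model
decoder = nn.Linear(z_dim, x_dim)                    # placeholder likelihood model

def step_neg_elbo(x_t, mu, logvar):
    z = mu + (0.5 * logvar).exp() * torch.randn_like(mu)
    recon = ((decoder(z) - x_t) ** 2).sum()
    kl = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum()   # N(0, I) prior for simplicity
    return recon + kl

x_seq = torch.randn(5, 1, x_dim)
mu, logvar = torch.zeros(1, z_dim), torch.zeros(1, z_dim)
for x_t in x_seq:                                    # filtering: past and present variables only
    h = torch.cat([mu, logvar], dim=-1)
    for _ in range(3):                               # iterative amortized refinement per step
        h = update_net(torch.cat([mu, x_t], dim=-1), h)
        mu, logvar = h.chunk(2, dim=-1)
    print(float(step_neg_elbo(x_t, mu, logvar)))     # per-step free energy after refinement
```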

CompILE: Compositional Imitation Learning and Execution

Title CompILE: Compositional Imitation Learning and Execution
Authors Thomas Kipf, Yujia Li, Hanjun Dai, Vinicius Zambaldi, Alvaro Sanchez-Gonzalez, Edward Grefenstette, Pushmeet Kohli, Peter Battaglia
Abstract We introduce Compositional Imitation Learning and Execution (CompILE): a framework for learning reusable, variable-length segments of hierarchically-structured behavior from demonstration data. CompILE uses a novel unsupervised, fully-differentiable sequence segmentation module to learn latent encodings of sequential data that can be re-composed and executed to perform new tasks. Once trained, our model generalizes to sequences of longer length and from environment instances not seen during training. We evaluate CompILE in a challenging 2D multi-task environment and a continuous control task, and show that it can find correct task boundaries and event encodings in an unsupervised manner. Latent codes and associated behavior policies discovered by CompILE can be used by a hierarchical agent, where the high-level policy selects actions in the latent code space, and the low-level, task-specific policies are simply the learned decoders. We found that our CompILE-based agent could learn given only sparse rewards, where agents without task-specific policies struggle.
Tasks Continuous Control, Imitation Learning
Published 2018-12-04
URL https://arxiv.org/abs/1812.01483v2
PDF https://arxiv.org/pdf/1812.01483v2.pdf
PWC https://paperswithcode.com/paper/compositional-imitation-learning-explaining
Repo https://github.com/tkipf/compile
Framework pytorch
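The execution side described at the end of the abstract, a high-level policy selecting a latent code with the learned decoder acting as the low-level policy, can be sketched as follows (the networks, dimensions, and continuous code space are stand-ins, not the CompILE architecture):

```python
import torch
import torch.nn as nn

state_dim, code_dim, n_actions = 10, 4, 6
high_level = nn.Linear(state_dim, code_dim)              # high-level policy: selects a latent code
low_level = nn.Linear(state_dim + code_dim, n_actions)   # decoder = task-specific low-level policy

def act(state):
    code = torch.tanh(high_level(state))                 # action in the latent code space
    logits = low_level(torch.cat([state, code], dim=-1))
    return torch.distributions.Categorical(logits=logits).sample(), code

state = torch.randn(1, state_dim)
action, code = act(state)
print(action.item(), code)
```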

Semantically Meaningful View Selection

Title Semantically Meaningful View Selection
Authors Joris Guérin, Olivier Gibaru, Eric Nyiri, Stéphane Thiery, Byron Boots
Abstract An understanding of the nature of objects could help robots to solve both high-level abstract tasks and improve performance at lower-level concrete tasks. Although deep learning has facilitated progress in image understanding, a robot’s performance in problems like object recognition often depends on the angle from which the object is observed. Traditionally, robot sorting tasks rely on a fixed top-down view of an object. By changing its viewing angle, a robot can select a more semantically informative view, leading to better performance for object recognition. In this paper, we introduce the problem of semantic view selection, which seeks to find good camera poses to gain semantic knowledge about an observed object. We propose a conceptual formulation of the problem, together with a solvable relaxation based on clustering. We then present a new image dataset consisting of around 10k images representing various views of 144 objects under different poses. Finally, we use this dataset to propose a first solution to the problem by training a neural network to predict a “semantic score” from a top-view image and camera pose. The views predicted to have higher scores are then shown to provide better clustering results than fixed top-down views.
Tasks Object Recognition
Published 2018-07-26
URL http://arxiv.org/abs/1807.10303v1
PDF http://arxiv.org/pdf/1807.10303v1.pdf
PWC https://paperswithcode.com/paper/semantically-meaningful-view-selection
Repo https://github.com/jorisguerin/SemanticViewSelection_dataset
Framework none
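The scoring network described above lends itself to a short sketch (a small CNN for the top-view image, an MLP for the candidate camera pose, and a regression head; all sizes are illustrative assumptions): the robot scores candidate poses and moves to the highest-scoring one.

```python
import torch
import torch.nn as nn

class ViewScorer(nn.Module):
    def __init__(self):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.pose = nn.Sequential(nn.Linear(6, 32), nn.ReLU())   # e.g. (x, y, z, roll, pitch, yaw)
        self.head = nn.Linear(16 + 32, 1)

    def forward(self, top_view, candidate_pose):
        feats = torch.cat([self.cnn(top_view), self.pose(candidate_pose)], dim=-1)
        return self.head(feats).squeeze(-1)                      # predicted semantic score

scorer = ViewScorer()
top_view = torch.randn(8, 3, 64, 64)             # the top view, replicated per candidate
poses = torch.randn(8, 6)                        # 8 candidate camera poses
print(scorer(top_view, poses).argmax())          # index of the pose to move to
```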