October 20, 2019

3003 words 15 mins read

Paper Group AWR 328


Deeper Image Quality Transfer: Training Low-Memory Neural Networks for 3D Images. Visual Text Correction. Lipschitz regularity of deep neural networks: analysis and efficient estimation. Integrating Local Context and Global Cohesiveness for Open Information Extraction. Symbolic Priors for RNN-based Semantic Parsing. Semi-Amortized Variational Autoencoders …

Deeper Image Quality Transfer: Training Low-Memory Neural Networks for 3D Images

Title Deeper Image Quality Transfer: Training Low-Memory Neural Networks for 3D Images
Authors Stefano B. Blumberg, Ryutaro Tanno, Iasonas Kokkinos, Daniel C. Alexander
Abstract In this paper we address the memory demands that come with the processing of 3-dimensional, high-resolution, multi-channeled medical images in deep learning. We exploit memory-efficient backpropagation techniques to reduce the memory complexity of network training from being linear in the network’s depth to being roughly constant, permitting us to elongate deep architectures with negligible memory increase. We evaluate our methodology in the paradigm of Image Quality Transfer, whilst noting its potential application to various tasks that use deep learning. We study the impact of depth on accuracy and show that deeper models have more predictive power, which may exploit larger training sets. We obtain substantially better results than the previous state-of-the-art model with a slight memory increase, reducing the root-mean-squared error by 13%. Our code is publicly available.
Tasks
Published 2018-08-16
URL http://arxiv.org/abs/1808.05577v1
PDF http://arxiv.org/pdf/1808.05577v1.pdf
PWC https://paperswithcode.com/paper/deeper-image-quality-transfer-training-low
Repo https://github.com/sbb-gh/Deeper-Image-Quality-Transfer-Training-Low-Memory-Neural-Networks-for-3D-Images
Framework pytorch
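The memory-efficient backpropagation the abstract describes is the gradient-checkpointing idea available in stock PyTorch. Below is a minimal sketch (not the authors' code; the 3D architecture, channel counts, and depth are placeholders) showing how checkpointing each block keeps training memory roughly constant in depth.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class DeepRegressor(nn.Module):
    def __init__(self, channels=16, depth=20):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Conv3d(channels, channels, 3, padding=1), nn.ReLU())
            for _ in range(depth)
        )

    def forward(self, x):
        for block in self.blocks:
            # Activations inside `block` are discarded after the forward pass and
            # recomputed during backward, trading extra compute for memory.
            x = checkpoint(block, x)
        return x

model = DeepRegressor()
x = torch.randn(1, 16, 32, 32, 32, requires_grad=True)
model(x).mean().backward()  # peak memory grows with block size, not with depth
```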

Visual Text Correction

Title Visual Text Correction
Authors Amir Mazaheri, Mubarak Shah
Abstract Videos, images, and sentences are mediums that can express the same semantics. One can imagine a picture by reading a sentence or can describe a scene with some words. However, even small changes in a sentence can cause a significant semantic inconsistency with the corresponding video/image. For example, by changing the verb of a sentence, the meaning may drastically change. There have been many efforts to encode a video/sentence and decode it as a sentence/video. In this research, we study a new scenario in which both the sentence and the video are given, but the sentence is inaccurate. A semantic inconsistency between the sentence and the video or between the words of a sentence can result in an inaccurate description. This paper introduces a new problem, called Visual Text Correction (VTC), i.e., finding and replacing an inaccurate word in the textual description of a video. We propose a deep network that can simultaneously detect an inaccuracy in a sentence and fix it by replacing the inaccurate word(s). Our method leverages the semantic interdependence of videos and words, as well as the short-term and long-term relations of the words in a sentence. In our formulation, part of a visual feature vector for every single word is dynamically selected through a gating process. Furthermore, to train and evaluate our model, we propose an approach to automatically construct a large dataset for the VTC problem. Our experiments and performance analysis demonstrate that the proposed method provides very good results and also highlights the general challenges in solving the VTC problem. To the best of our knowledge, this work is the first of its kind for the Visual Text Correction task.
Tasks Grammatical Error Correction, Visual Text Correction
Published 2018-01-06
URL http://arxiv.org/abs/1801.01967v3
PDF http://arxiv.org/pdf/1801.01967v3.pdf
PWC https://paperswithcode.com/paper/visual-text-correction
Repo https://github.com/amirmazaheri1990/Visual-Text-Correction
Framework tf
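As a rough illustration of the per-word gating described above, here is a toy PyTorch sketch (the dimensions, layers, and BiLSTM scorer are my assumptions, not the paper's architecture): each word embedding gates a shared visual feature vector, and a small head scores each word's likelihood of being inaccurate.

```python
import torch
import torch.nn as nn

class WordGatedDetector(nn.Module):
    def __init__(self, word_dim=300, vis_dim=512, hidden=256):
        super().__init__()
        self.gate = nn.Linear(word_dim, vis_dim)          # per-word gate over visual features
        self.lstm = nn.LSTM(word_dim + vis_dim, hidden, batch_first=True, bidirectional=True)
        self.detect = nn.Linear(2 * hidden, 1)            # inaccuracy score per word

    def forward(self, words, visual):
        # words: (B, T, word_dim), visual: (B, vis_dim)
        g = torch.sigmoid(self.gate(words))               # (B, T, vis_dim)
        gated_visual = g * visual.unsqueeze(1)             # dynamically selected visual evidence
        h, _ = self.lstm(torch.cat([words, gated_visual], dim=-1))
        return self.detect(h).squeeze(-1)                  # (B, T) scores; argmax = suspected word

model = WordGatedDetector()
scores = model(torch.randn(2, 12, 300), torch.randn(2, 512))
print(scores.argmax(dim=1))  # index of the word flagged as inaccurate in each sentence
```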

Lipschitz regularity of deep neural networks: analysis and efficient estimation

Title Lipschitz regularity of deep neural networks: analysis and efficient estimation
Authors Kevin Scaman, Aladin Virmaux
Abstract Deep neural networks are notorious for being sensitive to small well-chosen perturbations, and estimating the regularity of such architectures is of utmost importance for safe and robust practical applications. In this paper, we investigate one of the key characteristics to assess the regularity of such methods: the Lipschitz constant of deep learning architectures. First, we show that, even for two-layer neural networks, the exact computation of this quantity is NP-hard and state-of-the-art methods may significantly overestimate it. Then, we both extend and improve previous estimation methods by providing AutoLip, the first generic algorithm for upper bounding the Lipschitz constant of any automatically differentiable function. We provide a power method algorithm working with automatic differentiation, allowing efficient computations even on large convolutions. Second, for sequential neural networks, we propose an improved algorithm named SeqLip that takes advantage of the linear computation graph to split the computation per pair of consecutive layers. Third, we propose heuristics on SeqLip in order to tackle very large networks. Our experiments show that SeqLip can significantly improve on the existing upper bounds. Finally, we provide an implementation of AutoLip in the PyTorch environment that may be used to better estimate the robustness of a given neural network to small perturbations or regularize it using more precise Lipschitz estimations.
Tasks
Published 2018-05-28
URL https://arxiv.org/abs/1805.10965v2
PDF https://arxiv.org/pdf/1805.10965v2.pdf
PWC https://paperswithcode.com/paper/lipschitz-regularity-of-deep-neural-networks
Repo https://github.com/avirmaux/lipEstimation
Framework pytorch
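For context, the simplest upper bound on a network's Lipschitz constant (the coarse bound that AutoLip generalizes and SeqLip tightens) is the product of per-layer spectral norms when the activations are 1-Lipschitz. A small sketch, with power iteration used to estimate each spectral norm:

```python
import torch
import torch.nn as nn

def spectral_norm(weight, n_iter=50):
    # Power iteration on W^T W to estimate the largest singular value of W.
    v = torch.randn(weight.shape[1])
    for _ in range(n_iter):
        v = weight.t() @ (weight @ v)
        v = v / v.norm()
    return (weight @ v).norm()

net = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
bound = 1.0
for layer in net:
    if isinstance(layer, nn.Linear):
        bound *= spectral_norm(layer.weight).item()
print("Lipschitz upper bound:", bound)  # SeqLip improves on this product bound
```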

Integrating Local Context and Global Cohesiveness for Open Information Extraction

Title Integrating Local Context and Global Cohesiveness for Open Information Extraction
Authors Qi Zhu, Xiang Ren, Jingbo Shang, Yu Zhang, Ahmed El-Kishky, Jiawei Han
Abstract Extracting entities and their relations from text is an important task for understanding massive text corpora. Open information extraction (IE) systems mine relation tuples (i.e., entity arguments and a predicate string to describe their relation) from sentences. These relation tuples are not confined to a predefined schema for the relations of interest. However, current Open IE systems focus on modeling local context information in a sentence to extract relation tuples, while ignoring the fact that global statistics in a large corpus can be collectively leveraged to identify high-quality sentence-level extractions. In this paper, we propose a novel Open IE system, called ReMine, which integrates local context signals and global structural signals in a unified, distant-supervision framework. Leveraging facts from external knowledge bases as supervision, the new system can be applied to many different domains to facilitate sentence-level tuple extraction using corpus-level statistics. Our system operates by solving a joint optimization problem to unify (1) segmenting entity/relation phrases in individual sentences based on local context; and (2) measuring the quality of tuples extracted from individual sentences with a translating-based objective. Learning the two subtasks jointly helps correct errors produced in each subtask so that they can mutually enhance each other. Experiments on two real-world corpora from different domains demonstrate the effectiveness, generality, and robustness of ReMine when compared to state-of-the-art open IE systems.
Tasks Open Information Extraction
Published 2018-04-26
URL http://arxiv.org/abs/1804.09931v4
PDF http://arxiv.org/pdf/1804.09931v4.pdf
PWC https://paperswithcode.com/paper/integrating-local-context-and-global
Repo https://github.com/GentleZhu/ReMine
Framework none
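The "translating-based objective" for measuring tuple quality can be illustrated with a TransE-style score (a sketch under my own assumptions about embeddings and dimensions, not ReMine's exact formulation): a tuple scores well when head plus predicate lands near tail in embedding space.

```python
import torch

def tuple_quality(head_emb, pred_emb, tail_emb):
    # Higher is better; negated L2 translation error of head + predicate vs. tail.
    return -torch.norm(head_emb + pred_emb - tail_emb, dim=-1)

dim = 50
entity = {"Barack Obama": torch.randn(dim), "Hawaii": torch.randn(dim)}
predicate = {"was born in": torch.randn(dim)}
score = tuple_quality(entity["Barack Obama"], predicate["was born in"], entity["Hawaii"])
print(float(score))  # with trained embeddings, good tuples score higher than bad ones
```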

Symbolic Priors for RNN-based Semantic Parsing

Title Symbolic Priors for RNN-based Semantic Parsing
Authors Chunyang Xiao, Marc Dymetman, Claire Gardent
Abstract Seq2seq models based on Recurrent Neural Networks (RNNs) have recently received a lot of attention in the domain of Semantic Parsing for Question Answering. While in principle they can be trained directly on pairs (natural language utterances, logical forms), their performance is limited by the amount of available data. To alleviate this problem, we propose to exploit various sources of prior knowledge: the well-formedness of the logical forms is modeled by a weighted context-free grammar; the likelihood that certain entities present in the input utterance are also present in the logical form is modeled by weighted finite-state automata. The grammar and automata are combined together through an efficient intersection algorithm to form a soft guide (“background”) to the RNN. We test our method on an extension of the Overnight dataset and show that it not only strongly improves over an RNN baseline, but also outperforms non-RNN models based on rich sets of hand-crafted features.
Tasks Question Answering, Semantic Parsing
Published 2018-09-20
URL http://arxiv.org/abs/1809.07721v1
PDF http://arxiv.org/pdf/1809.07721v1.pdf
PWC https://paperswithcode.com/paper/symbolic-priors-for-rnn-based-semantic
Repo https://github.com/chunyangx/overnight_more
Framework none
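The "soft guide" idea, combining the RNN decoder with symbolic prior scores over the next token, can be sketched as a log-linear interpolation. In the paper the prior comes from intersecting a weighted CFG with weighted automata; here it is simply a given table of log-probabilities, an assumption made for illustration.

```python
import torch

def guided_next_token(rnn_logits, prior_logprobs, alpha=1.0):
    # Log-linear combination of the RNN's distribution with the symbolic prior.
    combined = torch.log_softmax(rnn_logits, dim=-1) + alpha * prior_logprobs
    return torch.softmax(combined, dim=-1)

vocab = 1000
rnn_logits = torch.randn(vocab)
prior_logprobs = torch.full((vocab,), -20.0)   # prior heavily penalizes most tokens...
prior_logprobs[:50] = 0.0                      # ...and allows a grammatical subset
probs = guided_next_token(rnn_logits, prior_logprobs)
print(probs[:50].sum())  # nearly all probability mass falls on tokens the prior permits
```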

Semi-Amortized Variational Autoencoders

Title Semi-Amortized Variational Autoencoders
Authors Yoon Kim, Sam Wiseman, Andrew C. Miller, David Sontag, Alexander M. Rush
Abstract Amortized variational inference (AVI) replaces instance-specific local inference with a global inference network. While AVI has enabled efficient training of deep generative models such as variational autoencoders (VAE), recent empirical work suggests that inference networks can produce suboptimal variational parameters. We propose a hybrid approach: use AVI to initialize the variational parameters, then run stochastic variational inference (SVI) to refine them. Crucially, the local SVI procedure is itself differentiable, so the inference network and generative model can be trained end-to-end with gradient-based optimization. This semi-amortized approach enables the use of rich generative models without experiencing the posterior-collapse phenomenon common in training VAEs for problems like text generation. Experiments show this approach outperforms strong autoregressive and variational baselines on standard text and image datasets.
Tasks Text Generation
Published 2018-02-07
URL http://arxiv.org/abs/1802.02550v7
PDF http://arxiv.org/pdf/1802.02550v7.pdf
PWC https://paperswithcode.com/paper/semi-amortized-variational-autoencoders
Repo https://github.com/harvardnlp/sa-vae
Framework pytorch
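A minimal sketch of the semi-amortized procedure (model sizes, step size, and number of SVI steps are illustrative, and end-to-end training through the SVI steps is omitted here): the encoder proposes (mu, logvar), then a few gradient steps on the instance's ELBO refine them.

```python
import torch
import torch.nn as nn

x_dim, z_dim = 784, 32
encoder = nn.Linear(x_dim, 2 * z_dim)
decoder = nn.Linear(z_dim, x_dim)

def neg_elbo(x, mu, logvar):
    z = mu + (0.5 * logvar).exp() * torch.randn_like(mu)     # reparameterized sample
    recon = nn.functional.binary_cross_entropy_with_logits(decoder(z), x, reduction="sum")
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum()
    return recon + kl

x = torch.rand(1, x_dim)
mu, logvar = encoder(x).chunk(2, dim=-1)                     # amortized initialization (AVI)
mu, logvar = mu.detach().requires_grad_(), logvar.detach().requires_grad_()
for _ in range(10):                                          # SVI refinement steps
    loss = neg_elbo(x, mu, logvar)
    g_mu, g_logvar = torch.autograd.grad(loss, (mu, logvar))
    mu, logvar = mu - 0.1 * g_mu, logvar - 0.1 * g_logvar
print(float(neg_elbo(x, mu, logvar)))                        # refined (negative) ELBO
```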

Benchmarking Automatic Machine Learning Frameworks

Title Benchmarking Automatic Machine Learning Frameworks
Authors Adithya Balaji, Alexander Allen
Abstract AutoML serves as the bridge between varying levels of expertise when designing machine learning systems and expedites the data science process. A wide range of techniques has been developed to address this; however, no objective comparison of these techniques exists. We present a benchmark of current open source AutoML solutions using open source datasets. We test auto-sklearn, TPOT, auto_ml, and H2O’s AutoML solution against a compiled set of regression and classification datasets sourced from OpenML and find that auto-sklearn performs the best across classification datasets and TPOT performs the best across regression datasets.
Tasks Automated Feature Engineering, AutoML, Hyperparameter Optimization
Published 2018-08-17
URL http://arxiv.org/abs/1808.06492v1
PDF http://arxiv.org/pdf/1808.06492v1.pdf
PWC https://paperswithcode.com/paper/benchmarking-automatic-machine-learning
Repo https://github.com/ClimbsRocks/auto_ml
Framework tf
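A sketch of the benchmarking loop, shown for TPOT, one of the evaluated frameworks (the dataset, search budget, and metric here are placeholders, not the paper's exact protocol):

```python
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import f1_score
from tpot import TPOTClassifier

# One OpenML classification dataset; the paper evaluates a compiled set of many.
X, y = fetch_openml("credit-g", version=1, return_X_y=True, as_frame=False)
y = LabelEncoder().fit_transform(y)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

automl = TPOTClassifier(generations=5, population_size=20, random_state=0, verbosity=0)
automl.fit(X_tr, y_tr)
print("weighted F1:", f1_score(y_te, automl.predict(X_te), average="weighted"))
```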

Horovod: fast and easy distributed deep learning in TensorFlow

Title Horovod: fast and easy distributed deep learning in TensorFlow
Authors Alexander Sergeev, Mike Del Balso
Abstract Training modern deep learning models requires large amounts of computation, often provided by GPUs. Scaling computation from one GPU to many can enable much faster training and research progress but entails two complications. First, the training library must support inter-GPU communication. Depending on the particular methods employed, this communication may entail anywhere from negligible to significant overhead. Second, the user must modify his or her training code to take advantage of inter-GPU communication. Depending on the training library’s API, the modification required may be either significant or minimal. Existing methods for enabling multi-GPU training under the TensorFlow library entail non-negligible communication overhead and require users to heavily modify their model-building code, leading many researchers to avoid the whole mess and stick with slower single-GPU training. In this paper we introduce Horovod, an open source library that improves on both obstructions to scaling: it employs efficient inter-GPU communication via ring reduction and requires only a few lines of modification to user code, enabling faster, easier distributed training in TensorFlow. Horovod is available under the Apache 2.0 license at https://github.com/uber/horovod
Tasks
Published 2018-02-15
URL http://arxiv.org/abs/1802.05799v3
PDF http://arxiv.org/pdf/1802.05799v3.pdf
PWC https://paperswithcode.com/paper/horovod-fast-and-easy-distributed-deep
Repo https://github.com/horovod/horovod
Framework tf
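The "few lines of modification" are roughly the pattern from the Horovod documentation, sketched here for tf.keras (the model and data are placeholders): initialize Horovod, pin one GPU per process, wrap the optimizer so gradients are averaged with ring-allreduce, and broadcast the initial state from rank 0.

```python
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()
gpus = tf.config.experimental.list_physical_devices("GPU")
if gpus:
    tf.config.experimental.set_visible_devices(gpus[hvd.local_rank()], "GPU")

model = tf.keras.Sequential([tf.keras.layers.Dense(10, activation="softmax")])
opt = hvd.DistributedOptimizer(tf.keras.optimizers.SGD(0.01 * hvd.size()))  # allreduce gradients
model.compile(loss="sparse_categorical_crossentropy", optimizer=opt)

callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]  # sync initial weights from rank 0
# model.fit(x_train, y_train, callbacks=callbacks)  # launch with: horovodrun -np 4 python train.py
```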

Metropolis-Hastings Generative Adversarial Networks

Title Metropolis-Hastings Generative Adversarial Networks
Authors Ryan Turner, Jane Hung, Eric Frank, Yunus Saatci, Jason Yosinski
Abstract We introduce the Metropolis-Hastings generative adversarial network (MH-GAN), which combines aspects of Markov chain Monte Carlo and GANs. The MH-GAN draws samples from the distribution implicitly defined by a GAN’s discriminator-generator pair, as opposed to standard GANs which draw samples from the distribution defined only by the generator. It uses the discriminator from GAN training to build a wrapper around the generator for improved sampling. With a perfect discriminator, this wrapped generator samples from the true distribution on the data exactly even when the generator is imperfect. We demonstrate the benefits of the improved generator on multiple benchmark datasets, including CIFAR-10 and CelebA, using the DCGAN, WGAN, and progressive GAN.
Tasks
Published 2018-11-28
URL https://arxiv.org/abs/1811.11357v2
PDF https://arxiv.org/pdf/1811.11357v2.pdf
PWC https://paperswithcode.com/paper/metropolis-hastings-generative-adversarial
Repo https://github.com/nardeas/MHGAN
Framework tf
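The acceptance rule is easy to sketch: with a (calibrated) discriminator D, the density ratio p_data/p_g is proportional to D/(1-D), so an independence Metropolis-Hastings chain over generator samples accepts a proposal x' over the current x with probability min(1, r(x')/r(x)). The generator and discriminator below are stand-ins, and the calibration step from the paper is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
generator = lambda: rng.normal(size=2)                       # placeholder G: draws a proposal
discriminator = lambda x: 1.0 / (1.0 + np.exp(-x.sum()))     # placeholder D in (0, 1)

def density_ratio(x, eps=1e-8):
    d = discriminator(x)
    return d / (1.0 - d + eps)

def mh_gan_sample(n_steps=100):
    x = generator()                                          # start the chain from a generator draw
    for _ in range(n_steps):
        x_prop = generator()                                 # independence proposal from G
        alpha = min(1.0, density_ratio(x_prop) / density_ratio(x))
        if rng.uniform() < alpha:
            x = x_prop
    return x                                                 # approximately a sample from p_data

print(mh_gan_sample())
```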

Acquisition of Localization Confidence for Accurate Object Detection

Title Acquisition of Localization Confidence for Accurate Object Detection
Authors Borui Jiang, Ruixuan Luo, Jiayuan Mao, Tete Xiao, Yuning Jiang
Abstract Modern CNN-based object detectors rely on bounding box regression and non-maximum suppression to localize objects. While the probabilities for class labels naturally reflect classification confidence, localization confidence is absent. This makes properly localized bounding boxes degenerate during iterative regression or even be suppressed during NMS. In this paper we propose IoU-Net, which learns to predict the IoU between each detected bounding box and the matched ground truth. The network acquires this confidence of localization, which improves the NMS procedure by preserving accurately localized bounding boxes. Furthermore, an optimization-based bounding box refinement method is proposed, where the predicted IoU is formulated as the objective. Extensive experiments on the MS-COCO dataset show the effectiveness of IoU-Net, as well as its compatibility with and adaptivity to several state-of-the-art object detectors.
Tasks Object Detection
Published 2018-07-30
URL http://arxiv.org/abs/1807.11590v1
PDF http://arxiv.org/pdf/1807.11590v1.pdf
PWC https://paperswithcode.com/paper/acquisition-of-localization-confidence-for
Repo https://github.com/CSAILVision/unifiedparsing
Framework pytorch
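The NMS change is easy to sketch: rank and suppress boxes by the predicted IoU (localization confidence) instead of the classification score. This numpy sketch keeps only that ranking change and omits the rest of the paper's refinement procedure; boxes are in (x1, y1, x2, y2) format.

```python
import numpy as np

def box_iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-8)

def iou_guided_nms(boxes, pred_ious, thresh=0.5):
    # Sort by predicted IoU (localization confidence), not classification score.
    order = np.argsort(-pred_ious)
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        overlaps = np.array([box_iou(boxes[i], boxes[j]) for j in rest])
        order = rest[overlaps < thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
pred_ious = np.array([0.6, 0.9, 0.8])    # the second box is better localized
print(iou_guided_nms(boxes, pred_ious))  # keeps indices [1, 2]
```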

Analyzing biological and artificial neural networks: challenges with opportunities for synergy?

Title Analyzing biological and artificial neural networks: challenges with opportunities for synergy?
Authors David G. T. Barrett, Ari S. Morcos, Jakob H. Macke
Abstract Deep neural networks (DNNs) transform stimuli across multiple processing stages to produce representations that can be used to solve complex tasks, such as object recognition in images. However, a full understanding of how they achieve this remains elusive. The complexity of biological neural networks substantially exceeds the complexity of DNNs, making it even more challenging to understand the representations that they learn. Thus, both machine learning and computational neuroscience are faced with a shared challenge: how can we analyze their representations in order to understand how they solve complex tasks? We review how data-analysis concepts and techniques developed by computational neuroscientists can be useful for analyzing representations in DNNs, and in turn, how recently developed techniques for analysis of DNNs can be useful for understanding representations in biological neural networks. We explore opportunities for synergy between the two fields, such as the use of DNNs as in-silico model systems for neuroscience, and how this synergy can lead to new hypotheses about the operating principles of biological neural networks.
Tasks Object Recognition
Published 2018-10-31
URL http://arxiv.org/abs/1810.13373v1
PDF http://arxiv.org/pdf/1810.13373v1.pdf
PWC https://paperswithcode.com/paper/analyzing-biological-and-artificial-neural
Repo https://github.com/sciple/Behavioral-Neuroscience-for-Rational-Minds
Framework none

A Tutorial on Deep Latent Variable Models of Natural Language

Title A Tutorial on Deep Latent Variable Models of Natural Language
Authors Yoon Kim, Sam Wiseman, Alexander M. Rush
Abstract There has been much recent, exciting work on combining the complementary strengths of latent variable models and deep learning. Latent variable modeling makes it easy to explicitly specify model constraints through conditional independence properties, while deep learning makes it possible to parameterize these conditional likelihoods with powerful function approximators. While these “deep latent variable” models provide a rich, flexible framework for modeling many real-world phenomena, difficulties exist: deep parameterizations of conditional likelihoods usually make posterior inference intractable, and latent variable objectives often complicate backpropagation by introducing points of non-differentiability. This tutorial explores these issues in depth through the lens of variational inference.
Tasks Latent Variable Models
Published 2018-12-17
URL https://arxiv.org/abs/1812.06834v3
PDF https://arxiv.org/pdf/1812.06834v3.pdf
PWC https://paperswithcode.com/paper/a-tutorial-on-deep-latent-variable-models-of
Repo https://github.com/Francix/Deep-Generative-Models-for-Natural-Language-Processing
Framework tf
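As a tiny worked example of the machinery the tutorial covers, here is the reparameterization trick for a Gaussian variational posterior, with an illustrative stand-in for the ELBO integrand (the values and objective are made up for the demonstration):

```python
import torch

mu = torch.tensor(0.5, requires_grad=True)
log_sigma = torch.tensor(-1.0, requires_grad=True)

def f(z):                       # stand-in for log p(x, z) - log q(z)
    return -(z - 2.0) ** 2

eps = torch.randn(1000)
z = mu + log_sigma.exp() * eps  # reparameterized samples from N(mu, sigma^2)
loss = -f(z).mean()             # Monte Carlo estimate of -E_q[f(z)]
loss.backward()
print(mu.grad, log_sigma.grad)  # low-variance pathwise gradient estimates
```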

A General Method for Amortizing Variational Filtering

Title A General Method for Amortizing Variational Filtering
Authors Joseph Marino, Milan Cvitkovic, Yisong Yue
Abstract We introduce the variational filtering EM algorithm, a simple, general-purpose method for performing variational inference in dynamical latent variable models using information from only past and present variables, i.e. filtering. The algorithm is derived from the variational objective in the filtering setting and consists of an optimization procedure at each time step. By performing each inference optimization procedure with an iterative amortized inference model, we obtain a computationally efficient implementation of the algorithm, which we call amortized variational filtering. We present experiments demonstrating that this general-purpose method improves performance across several deep dynamical latent variable models.
Tasks Latent Variable Models
Published 2018-11-13
URL http://arxiv.org/abs/1811.05090v1
PDF http://arxiv.org/pdf/1811.05090v1.pdf
PWC https://paperswithcode.com/paper/a-general-method-for-amortizing-variational
Repo https://github.com/joelouismarino/amortized-variational-filtering
Framework pytorch
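A schematic of the filtering loop (heavily simplified: the update network here consumes the current estimate and the observation rather than the ELBO gradients used by the paper's iterative amortized inference, and the transition prior is fixed to a standard normal):

```python
import torch
import torch.nn as nn

z_dim, x_dim = 8, 16
update_net = nn.GRUCell(z_dim + x_dim, 2 * z_dim)   # amortized inference update model
decoder = nn.Linear(z_dim, x_dim)                    # placeholder likelihood model

def step_neg_elbo(x_t, mu, logvar):
    z = mu + (0.5 * logvar).exp() * torch.randn_like(mu)
    recon = ((decoder(z) - x_t) ** 2).sum()
    kl = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum()   # N(0, I) prior for simplicity
    return recon + kl

x_seq = torch.randn(5, 1, x_dim)
mu, logvar = torch.zeros(1, z_dim), torch.zeros(1, z_dim)
for x_t in x_seq:                                    # filtering: past and present variables only
    h = torch.cat([mu, logvar], dim=-1)
    for _ in range(3):                               # iterative amortized refinement per step
        h = update_net(torch.cat([mu, x_t], dim=-1), h)
        mu, logvar = h.chunk(2, dim=-1)
    print(float(step_neg_elbo(x_t, mu, logvar)))     # per-step free energy after refinement
```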

CompILE: Compositional Imitation Learning and Execution

Title CompILE: Compositional Imitation Learning and Execution
Authors Thomas Kipf, Yujia Li, Hanjun Dai, Vinicius Zambaldi, Alvaro Sanchez-Gonzalez, Edward Grefenstette, Pushmeet Kohli, Peter Battaglia
Abstract We introduce Compositional Imitation Learning and Execution (CompILE): a framework for learning reusable, variable-length segments of hierarchically-structured behavior from demonstration data. CompILE uses a novel unsupervised, fully-differentiable sequence segmentation module to learn latent encodings of sequential data that can be re-composed and executed to perform new tasks. Once trained, our model generalizes to sequences of longer length and from environment instances not seen during training. We evaluate CompILE in a challenging 2D multi-task environment and a continuous control task, and show that it can find correct task boundaries and event encodings in an unsupervised manner. Latent codes and associated behavior policies discovered by CompILE can be used by a hierarchical agent, where the high-level policy selects actions in the latent code space, and the low-level, task-specific policies are simply the learned decoders. We found that our CompILE-based agent could learn given only sparse rewards, where agents without task-specific policies struggle.
Tasks Continuous Control, Imitation Learning
Published 2018-12-04
URL https://arxiv.org/abs/1812.01483v2
PDF https://arxiv.org/pdf/1812.01483v2.pdf
PWC https://paperswithcode.com/paper/compositional-imitation-learning-explaining
Repo https://github.com/tkipf/compile
Framework pytorch
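The execution side described at the end of the abstract, a high-level policy selecting a latent code with the learned decoder acting as the low-level policy, can be sketched as follows (the networks, dimensions, and continuous code space are stand-ins, not the CompILE architecture):

```python
import torch
import torch.nn as nn

state_dim, code_dim, n_actions = 10, 4, 6
high_level = nn.Linear(state_dim, code_dim)              # high-level policy: selects a latent code
low_level = nn.Linear(state_dim + code_dim, n_actions)   # decoder = task-specific low-level policy

def act(state):
    code = torch.tanh(high_level(state))                 # action in the latent code space
    logits = low_level(torch.cat([state, code], dim=-1))
    return torch.distributions.Categorical(logits=logits).sample(), code

state = torch.randn(1, state_dim)
action, code = act(state)
print(action.item(), code)
```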

Semantically Meaningful View Selection

Title Semantically Meaningful View Selection
Authors Joris Guérin, Olivier Gibaru, Eric Nyiri, Stéphane Thiery, Byron Boots
Abstract An understanding of the nature of objects could help robots to solve both high-level abstract tasks and improve performance at lower-level concrete tasks. Although deep learning has facilitated progress in image understanding, a robot’s performance in problems like object recognition often depends on the angle from which the object is observed. Traditionally, robot sorting tasks rely on a fixed top-down view of an object. By changing its viewing angle, a robot can select a more semantically informative view, leading to better performance for object recognition. In this paper, we introduce the problem of semantic view selection, which seeks to find good camera poses to gain semantic knowledge about an observed object. We propose a conceptual formulation of the problem, together with a solvable relaxation based on clustering. We then present a new image dataset consisting of around 10k images representing various views of 144 objects under different poses. Finally, we use this dataset to propose a first solution to the problem by training a neural network to predict a “semantic score” from a top-view image and camera pose. The views predicted to have higher scores are then shown to provide better clustering results than fixed top-down views.
Tasks Object Recognition
Published 2018-07-26
URL http://arxiv.org/abs/1807.10303v1
PDF http://arxiv.org/pdf/1807.10303v1.pdf
PWC https://paperswithcode.com/paper/semantically-meaningful-view-selection
Repo https://github.com/jorisguerin/SemanticViewSelection_dataset
Framework none
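The scoring network described above lends itself to a short sketch (a small CNN for the top-view image, an MLP for the candidate camera pose, and a regression head; all sizes are illustrative assumptions): the robot scores candidate poses and moves to the highest-scoring one.

```python
import torch
import torch.nn as nn

class ViewScorer(nn.Module):
    def __init__(self):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.pose = nn.Sequential(nn.Linear(6, 32), nn.ReLU())   # e.g. (x, y, z, roll, pitch, yaw)
        self.head = nn.Linear(16 + 32, 1)

    def forward(self, top_view, candidate_pose):
        feats = torch.cat([self.cnn(top_view), self.pose(candidate_pose)], dim=-1)
        return self.head(feats).squeeze(-1)                      # predicted semantic score

scorer = ViewScorer()
top_view = torch.randn(8, 3, 64, 64)             # the top view, replicated per candidate
poses = torch.randn(8, 6)                        # 8 candidate camera poses
print(scorer(top_view, poses).argmax())          # index of the pose to move to
```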