Paper Group AWR 328
Deeper Image Quality Transfer: Training Low-Memory Neural Networks for 3D Images
Title | Deeper Image Quality Transfer: Training Low-Memory Neural Networks for 3D Images |
Authors | Stefano B. Blumberg, Ryutaro Tanno, Iasonas Kokkinos, Daniel C. Alexander |
Abstract | In this paper we address the memory demands that come with the processing of 3-dimensional, high-resolution, multi-channeled medical images in deep learning. We exploit memory-efficient backpropagation techniques to reduce the memory complexity of network training from being linear in the network’s depth to being roughly constant, permitting us to elongate deep architectures with negligible memory increase. We evaluate our methodology in the paradigm of Image Quality Transfer, whilst noting its potential application to various tasks that use deep learning. We study the impact of depth on accuracy and show that deeper models have more predictive power, which may exploit larger training sets. We obtain substantially better results than the previous state-of-the-art model with a slight memory increase, reducing the root-mean-squared error by 13%. Our code is publicly available. |
Tasks | |
Published | 2018-08-16 |
URL | http://arxiv.org/abs/1808.05577v1 |
PDF | http://arxiv.org/pdf/1808.05577v1.pdf |
PWC | https://paperswithcode.com/paper/deeper-image-quality-transfer-training-low |
Repo | https://github.com/sbb-gh/Deeper-Image-Quality-Transfer-Training-Low-Memory-Neural-Networks-for-3D-Images |
Framework | pytorch |
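The memory trick this paper builds on is standard enough to sketch. Below is a minimal PyTorch illustration of checkpointed backpropagation — not the authors’ architecture, just the mechanism: activations inside each segment are recomputed during the backward pass, so peak memory grows with the number of segments rather than with the full depth.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

depth = 32  # elongate the network with little extra training memory
blocks = nn.Sequential(*[
    nn.Sequential(nn.Conv3d(16, 16, 3, padding=1), nn.ReLU())
    for _ in range(depth)
])

x = torch.randn(1, 16, 24, 24, 24, requires_grad=True)
# Only segment-boundary activations are stored; everything inside a
# segment is recomputed on the fly during backward.
y = checkpoint_sequential(blocks, 4, x)
y.sum().backward()
```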
Visual Text Correction
Title | Visual Text Correction |
Authors | Amir Mazaheri, Mubarak Shah |
Abstract | Videos, images, and sentences are mediums that can express the same semantics. One can imagine a picture by reading a sentence or can describe a scene with some words. However, even small changes in a sentence can cause a significant semantic inconsistency with the corresponding video/image. For example, by changing the verb of a sentence, the meaning may drastically change. There have been many efforts to encode a video/sentence and decode it as a sentence/video. In this research, we study a new scenario in which both the sentence and the video are given, but the sentence is inaccurate. A semantic inconsistency between the sentence and the video or between the words of a sentence can result in an inaccurate description. This paper introduces a new problem, called Visual Text Correction (VTC), i.e., finding and replacing an inaccurate word in the textual description of a video. We propose a deep network that can simultaneously detect an inaccuracy in a sentence and fix it by replacing the inaccurate word(s). Our method leverages the semantic interdependence of videos and words, as well as the short-term and long-term relations of the words in a sentence. In our formulation, part of a visual feature vector for every single word is dynamically selected through a gating process. Furthermore, to train and evaluate our model, we propose an approach to automatically construct a large dataset for the VTC problem. Our experiments and performance analysis demonstrate that the proposed method provides good results and also highlight the general challenges in solving the VTC problem. To the best of our knowledge, this work is the first of its kind for the Visual Text Correction task. |
Tasks | Grammatical Error Correction, Visual Text Correction |
Published | 2018-01-06 |
URL | http://arxiv.org/abs/1801.01967v3 |
PDF | http://arxiv.org/pdf/1801.01967v3.pdf |
PWC | https://paperswithcode.com/paper/visual-text-correction |
Repo | https://github.com/amirmazaheri1990/Visual-Text-Correction |
Framework | tf |
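The abstract’s gating idea can be sketched compactly. The module below is a hypothetical illustration (layer sizes and names are ours, not the authors’): a per-dimension sigmoid gate, computed from the word embedding, selects which parts of the visual feature vector each word uses.

```python
import torch
import torch.nn as nn

class VisualGating(nn.Module):
    """Sketch of the gating process described in the abstract: for each
    word, a learned gate dynamically selects part of the visual feature
    vector. Dimensions are illustrative, not the paper's."""
    def __init__(self, word_dim=300, visual_dim=2048):
        super().__init__()
        self.gate = nn.Linear(word_dim, visual_dim)

    def forward(self, word_emb, visual_feat):
        # word_emb: (batch, word_dim); visual_feat: (batch, visual_dim)
        g = torch.sigmoid(self.gate(word_emb))  # per-dimension gate in (0, 1)
        return g * visual_feat                  # gated visual evidence per word
```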
Lipschitz regularity of deep neural networks: analysis and efficient estimation
Title | Lipschitz regularity of deep neural networks: analysis and efficient estimation |
Authors | Kevin Scaman, Aladin Virmaux |
Abstract | Deep neural networks are notorious for being sensitive to small well-chosen perturbations, and estimating the regularity of such architectures is of utmost importance for safe and robust practical applications. In this paper, we investigate one of the key characteristics to assess the regularity of such methods: the Lipschitz constant of deep learning architectures. First, we show that, even for two-layer neural networks, the exact computation of this quantity is NP-hard and state-of-the-art methods may significantly overestimate it. Then, we both extend and improve previous estimation methods by providing AutoLip, the first generic algorithm for upper bounding the Lipschitz constant of any automatically differentiable function. We provide a power method algorithm working with automatic differentiation, allowing efficient computations even on large convolutions. Second, for sequential neural networks, we propose an improved algorithm named SeqLip that takes advantage of the linear computation graph to split the computation per pair of consecutive layers. Third, we propose heuristics on SeqLip in order to tackle very large networks. Our experiments show that SeqLip can significantly improve on the existing upper bounds. Finally, we provide an implementation of AutoLip in the PyTorch environment that may be used to better estimate the robustness of a given neural network to small perturbations or regularize it using more precise Lipschitz estimations. |
Tasks | |
Published | 2018-05-28 |
URL | https://arxiv.org/abs/1805.10965v2 |
PDF | https://arxiv.org/pdf/1805.10965v2.pdf |
PWC | https://paperswithcode.com/paper/lipschitz-regularity-of-deep-neural-networks |
Repo | https://github.com/avirmaux/lipEstimation |
Framework | pytorch |
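The power-method-through-autograd ingredient is concrete enough to sketch. Assuming a bias-free convolution (a linear map), the largest singular value of its Jacobian can be estimated without ever materializing the matrix; this mirrors the AutoLip building block described above, though the snippet is ours, not the authors’ implementation.

```python
import torch
import torch.nn.functional as F

def conv_operator_norm(weight, input_shape, n_iter=100, padding=1):
    """Power iteration on J^T J for a linear conv layer, using autograd
    for the transpose product; returns an estimate of sigma_max."""
    v = torch.randn(1, *input_shape)
    for _ in range(n_iter):
        v = (v / v.norm()).detach().requires_grad_(True)
        u = F.conv2d(v, weight, padding=padding)           # u = J v
        # vector-Jacobian product gives J^T u without building J
        v = torch.autograd.grad(u, v, grad_outputs=u)[0]   # v <- J^T J v
    v = (v / v.norm()).detach()
    return F.conv2d(v, weight, padding=padding).norm().item()

# e.g. a 3x3 conv with 16 input / 32 output channels on 32x32 inputs
w = torch.randn(32, 16, 3, 3)
print(conv_operator_norm(w, (16, 32, 32)))
```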
Integrating Local Context and Global Cohesiveness for Open Information Extraction
Title | Integrating Local Context and Global Cohesiveness for Open Information Extraction |
Authors | Qi Zhu, Xiang Ren, Jingbo Shang, Yu Zhang, Ahmed El-Kishky, Jiawei Han |
Abstract | Extracting entities and their relations from text is an important task for understanding massive text corpora. Open information extraction (IE) systems mine relation tuples (i.e., entity arguments and a predicate string to describe their relation) from sentences. These relation tuples are not confined to a predefined schema for the relations of interest. However, current Open IE systems focus on modeling local context information in a sentence to extract relation tuples, while ignoring the fact that global statistics in a large corpus can be collectively leveraged to identify high-quality sentence-level extractions. In this paper, we propose a novel Open IE system, called ReMine, which integrates local context signals and global structural signals in a unified, distant-supervision framework. Leveraging facts from external knowledge bases as supervision, the new system can be applied to many different domains to facilitate sentence-level tuple extractions using corpus-level statistics. Our system operates by solving a joint optimization problem to unify (1) segmenting entity/relation phrases in individual sentences based on local context; and (2) measuring the quality of tuples extracted from individual sentences with a translating-based objective. Learning the two subtasks jointly helps correct errors produced in each subtask so that they can mutually enhance each other. Experiments on two real-world corpora from different domains demonstrate the effectiveness, generality, and robustness of ReMine when compared to state-of-the-art open IE systems. |
Tasks | Open Information Extraction |
Published | 2018-04-26 |
URL | http://arxiv.org/abs/1804.09931v4 |
PDF | http://arxiv.org/pdf/1804.09931v4.pdf |
PWC | https://paperswithcode.com/paper/integrating-local-context-and-global |
Repo | https://github.com/GentleZhu/ReMine |
Framework | none |
Symbolic Priors for RNN-based Semantic Parsing
Title | Symbolic Priors for RNN-based Semantic Parsing |
Authors | Chunyang Xiao, Marc Dymetman, Claire Gardent |
Abstract | Seq2seq models based on Recurrent Neural Networks (RNNs) have recently received a lot of attention in the domain of Semantic Parsing for Question Answering. While in principle they can be trained directly on pairs (natural language utterances, logical forms), their performance is limited by the amount of available data. To alleviate this problem, we propose to exploit various sources of prior knowledge: the well-formedness of the logical forms is modeled by a weighted context-free grammar; the likelihood that certain entities present in the input utterance are also present in the logical form is modeled by weighted finite-state automata. The grammar and automata are combined together through an efficient intersection algorithm to form a soft guide (“background”) to the RNN. We test our method on an extension of the Overnight dataset and show that it not only strongly improves over an RNN baseline, but also outperforms non-RNN models based on rich sets of hand-crafted features. |
Tasks | Question Answering, Semantic Parsing |
Published | 2018-09-20 |
URL | http://arxiv.org/abs/1809.07721v1 |
PDF | http://arxiv.org/pdf/1809.07721v1.pdf |
PWC | https://paperswithcode.com/paper/symbolic-priors-for-rnn-based-semantic |
Repo | https://github.com/chunyangx/overnight_more |
Framework | none |
Semi-Amortized Variational Autoencoders
Title | Semi-Amortized Variational Autoencoders |
Authors | Yoon Kim, Sam Wiseman, Andrew C. Miller, David Sontag, Alexander M. Rush |
Abstract | Amortized variational inference (AVI) replaces instance-specific local inference with a global inference network. While AVI has enabled efficient training of deep generative models such as variational autoencoders (VAE), recent empirical work suggests that inference networks can produce suboptimal variational parameters. We propose a hybrid approach: use AVI to initialize the variational parameters and run stochastic variational inference (SVI) to refine them. Crucially, the local SVI procedure is itself differentiable, so the inference network and generative model can be trained end-to-end with gradient-based optimization. This semi-amortized approach enables the use of rich generative models without experiencing the posterior-collapse phenomenon common in training VAEs for problems like text generation. Experiments show this approach outperforms strong autoregressive and variational baselines on standard text and image datasets. |
Tasks | Text Generation |
Published | 2018-02-07 |
URL | http://arxiv.org/abs/1802.02550v7 |
PDF | http://arxiv.org/pdf/1802.02550v7.pdf |
PWC | https://paperswithcode.com/paper/semi-amortized-variational-autoencoders |
Repo | https://github.com/harvardnlp/sa-vae |
Framework | pytorch |
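The inner SVI refinement is easy to sketch. The snippet below shows only that loop, assuming a Gaussian posterior and a Bernoulli decoder; the paper additionally backpropagates *through* these refinement steps to train the encoder end-to-end, which requires higher-order gradients and is omitted here.

```python
import torch
import torch.nn.functional as F

def svi_refine(decoder, x, mu, logvar, n_steps=10, lr=1e-2):
    """Refine encoder-initialized (mu, logvar) with gradient steps on the
    per-instance negative ELBO. `decoder(z)` is assumed to return
    Bernoulli logits matching x."""
    mu = mu.detach().clone().requires_grad_(True)
    logvar = logvar.detach().clone().requires_grad_(True)
    opt = torch.optim.SGD([mu, logvar], lr=lr)
    for _ in range(n_steps):
        opt.zero_grad()
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        nll = F.binary_cross_entropy_with_logits(decoder(z), x, reduction="sum")
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum()
        (nll + kl).backward()  # negative ELBO
        opt.step()
    return mu.detach(), logvar.detach()
```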
Benchmarking Automatic Machine Learning Frameworks
Title | Benchmarking Automatic Machine Learning Frameworks |
Authors | Adithya Balaji, Alexander Allen |
Abstract | AutoML serves as the bridge between varying levels of expertise when designing machine learning systems and expedites the data science process. A wide range of techniques has been developed to address this; however, no objective comparison of these techniques exists. We present a benchmark of current open source AutoML solutions using open source datasets. We test auto-sklearn, TPOT, auto_ml, and H2O’s AutoML solution against a compiled set of regression and classification datasets sourced from OpenML and find that auto-sklearn performs the best across classification datasets and TPOT performs the best across regression datasets. |
Tasks | Automated Feature Engineering, AutoML, Hyperparameter Optimization |
Published | 2018-08-17 |
URL | http://arxiv.org/abs/1808.06492v1 |
PDF | http://arxiv.org/pdf/1808.06492v1.pdf |
PWC | https://paperswithcode.com/paper/benchmarking-automatic-machine-learning |
Repo | https://github.com/ClimbsRocks/auto_ml |
Framework | tf |
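The protocol amounts to fitting each AutoML system on a common split and scoring the result. A toy sketch of that loop — using a small scikit-learn dataset rather than the paper’s OpenML suite, with illustrative time budgets and weighted F1 as one reasonable classification metric:

```python
from sklearn.datasets import load_digits
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

import autosklearn.classification  # pip install auto-sklearn
from tpot import TPOTClassifier    # pip install tpot

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

candidates = {
    "auto-sklearn": autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=300),
    "TPOT": TPOTClassifier(generations=5, population_size=20,
                           max_time_mins=5, random_state=0),
}
for name, automl in candidates.items():
    automl.fit(X_tr, y_tr)                      # same budget, same split
    score = f1_score(y_te, automl.predict(X_te), average="weighted")
    print(f"{name}: weighted F1 = {score:.3f}")
```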
Horovod: fast and easy distributed deep learning in TensorFlow
Title | Horovod: fast and easy distributed deep learning in TensorFlow |
Authors | Alexander Sergeev, Mike Del Balso |
Abstract | Training modern deep learning models requires large amounts of computation, often provided by GPUs. Scaling computation from one GPU to many can enable much faster training and research progress but entails two complications. First, the training library must support inter-GPU communication. Depending on the particular methods employed, this communication may entail anywhere from negligible to significant overhead. Second, the user must modify his or her training code to take advantage of inter-GPU communication. Depending on the training library’s API, the modification required may be either significant or minimal. Existing methods for enabling multi-GPU training under the TensorFlow library entail non-negligible communication overhead and require users to heavily modify their model-building code, leading many researchers to avoid the whole mess and stick with slower single-GPU training. In this paper we introduce Horovod, an open source library that improves on both obstructions to scaling: it employs efficient inter-GPU communication via ring reduction and requires only a few lines of modification to user code, enabling faster, easier distributed training in TensorFlow. Horovod is available under the Apache 2.0 license at https://github.com/uber/horovod |
Tasks | |
Published | 2018-02-15 |
URL | http://arxiv.org/abs/1802.05799v3 |
PDF | http://arxiv.org/pdf/1802.05799v3.pdf |
PWC | https://paperswithcode.com/paper/horovod-fast-and-easy-distributed-deep |
Repo | https://github.com/horovod/horovod |
Framework | tf |
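Horovod’s “few lines of modification” claim is easy to make concrete. A minimal Keras-style sketch (the model and data are placeholders; the learning-rate scaling and broadcast callback follow Horovod’s documented conventions):

```python
import numpy as np
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()
gpus = tf.config.list_physical_devices('GPU')
if gpus:  # pin each worker process to a single GPU
    tf.config.set_visible_devices(gpus[hvd.local_rank()], 'GPU')

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(32,)),
    tf.keras.layers.Dense(10, activation='softmax'),
])
# Scale the learning rate by the number of workers, then wrap the optimizer
# so gradients are averaged via ring-allreduce before each update.
opt = hvd.DistributedOptimizer(tf.keras.optimizers.SGD(0.01 * hvd.size()))
model.compile(loss='sparse_categorical_crossentropy', optimizer=opt)

x = np.random.randn(1024, 32).astype('float32')
y = np.random.randint(0, 10, size=1024)
model.fit(x, y, batch_size=64, epochs=2,
          # sync initial weights from rank 0 so all workers start identical
          callbacks=[hvd.callbacks.BroadcastGlobalVariablesCallback(0)],
          verbose=1 if hvd.rank() == 0 else 0)
# Launch with e.g.: horovodrun -np 4 python train.py
```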
Metropolis-Hastings Generative Adversarial Networks
Title | Metropolis-Hastings Generative Adversarial Networks |
Authors | Ryan Turner, Jane Hung, Eric Frank, Yunus Saatci, Jason Yosinski |
Abstract | We introduce the Metropolis-Hastings generative adversarial network (MH-GAN), which combines aspects of Markov chain Monte Carlo and GANs. The MH-GAN draws samples from the distribution implicitly defined by a GAN’s discriminator-generator pair, as opposed to standard GANs which draw samples from the distribution defined only by the generator. It uses the discriminator from GAN training to build a wrapper around the generator for improved sampling. With a perfect discriminator, this wrapped generator samples from the true distribution on the data exactly even when the generator is imperfect. We demonstrate the benefits of the improved generator on multiple benchmark datasets, including CIFAR-10 and CelebA, using the DCGAN, WGAN, and progressive GAN. |
Tasks | |
Published | 2018-11-28 |
URL | https://arxiv.org/abs/1811.11357v2 |
PDF | https://arxiv.org/pdf/1811.11357v2.pdf |
PWC | https://paperswithcode.com/paper/metropolis-hastings-generative-adversarial |
Repo | https://github.com/nardeas/MHGAN |
Framework | tf |
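The sampling wrapper itself is only a few lines. A sketch with hypothetical `generator()`/`discriminator()` callables, where the acceptance ratio uses the density ratio D/(1−D) implied by an optimal discriminator, as in the paper:

```python
import numpy as np

def mh_gan_sample(generator, discriminator, n_proposals=100, rng=np.random):
    """Metropolis-Hastings selection over generator samples.
    Assumptions: `generator()` returns one sample and `discriminator(x)`
    returns D(x) strictly inside (0, 1). With a perfect discriminator, the
    accepted chain targets the data distribution even if the generator is
    imperfect."""
    x = generator()
    d_x = discriminator(x)
    for _ in range(n_proposals):
        x_prop = generator()              # independent proposal from p_g
        d_prop = discriminator(x_prop)
        # acceptance ratio from r(x) = D(x) / (1 - D(x))
        alpha = min(1.0, (1.0 / d_x - 1.0) / (1.0 / d_prop - 1.0))
        if rng.uniform() < alpha:
            x, d_x = x_prop, d_prop
    return x
```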
Acquisition of Localization Confidence for Accurate Object Detection
Title | Acquisition of Localization Confidence for Accurate Object Detection |
Authors | Borui Jiang, Ruixuan Luo, Jiayuan Mao, Tete Xiao, Yuning Jiang |
Abstract | Modern CNN-based object detectors rely on bounding box regression and non-maximum suppression (NMS) to localize objects. While the probabilities for class labels naturally reflect classification confidence, localization confidence is absent. This allows properly localized bounding boxes to degenerate during iterative regression or even be suppressed during NMS. In this paper we propose IoU-Net, which learns to predict the IoU between each detected bounding box and its matched ground truth. The network thereby acquires a confidence in its localization, which improves the NMS procedure by preserving accurately localized bounding boxes. Furthermore, an optimization-based bounding box refinement method is proposed, in which the predicted IoU is formulated as the objective. Extensive experiments on the MS-COCO dataset show the effectiveness of IoU-Net, as well as its compatibility with and adaptivity to several state-of-the-art object detectors. |
Tasks | Object Detection |
Published | 2018-07-30 |
URL | http://arxiv.org/abs/1807.11590v1 |
PDF | http://arxiv.org/pdf/1807.11590v1.pdf |
PWC | https://paperswithcode.com/paper/acquisition-of-localization-confidence-for |
Repo | https://github.com/CSAILVision/unifiedparsing |
Framework | pytorch |
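The NMS modification is simple to sketch: rank candidates by predicted localization confidence instead of classification score, and let each kept box absorb the best classification score among the boxes it suppresses. A NumPy illustration (ours, not the authors’ code):

```python
import numpy as np

def box_iou(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def iou_guided_nms(boxes, cls_scores, loc_conf, thresh=0.5):
    """Rank by predicted localization confidence (loc_conf); a kept box
    absorbs the highest classification score among those it suppresses."""
    order = np.argsort(-np.asarray(loc_conf))
    suppressed = np.zeros(len(boxes), dtype=bool)
    keep, scores = [], []
    for i in order:
        if suppressed[i]:
            continue
        best = cls_scores[i]
        for j in order:
            if j != i and not suppressed[j] and box_iou(boxes[i], boxes[j]) > thresh:
                suppressed[j] = True
                best = max(best, cls_scores[j])  # absorb suppressed score
        keep.append(int(i))
        scores.append(best)
    return keep, scores
```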
Analyzing biological and artificial neural networks: challenges with opportunities for synergy?
Title | Analyzing biological and artificial neural networks: challenges with opportunities for synergy? |
Authors | David G. T. Barrett, Ari S. Morcos, Jakob H. Macke |
Abstract | Deep neural networks (DNNs) transform stimuli across multiple processing stages to produce representations that can be used to solve complex tasks, such as object recognition in images. However, a full understanding of how they achieve this remains elusive. The complexity of biological neural networks substantially exceeds the complexity of DNNs, making it even more challenging to understand the representations that they learn. Thus, both machine learning and computational neuroscience are faced with a shared challenge: how can we analyze their representations in order to understand how they solve complex tasks? We review how data-analysis concepts and techniques developed by computational neuroscientists can be useful for analyzing representations in DNNs, and in turn, how recently developed techniques for analysis of DNNs can be useful for understanding representations in biological neural networks. We explore opportunities for synergy between the two fields, such as the use of DNNs as in-silico model systems for neuroscience, and how this synergy can lead to new hypotheses about the operating principles of biological neural networks. |
Tasks | Object Recognition |
Published | 2018-10-31 |
URL | http://arxiv.org/abs/1810.13373v1 |
PDF | http://arxiv.org/pdf/1810.13373v1.pdf |
PWC | https://paperswithcode.com/paper/analyzing-biological-and-artificial-neural |
Repo | https://github.com/sciple/Behavioral-Neuroscience-for-Rational-Minds |
Framework | none |
A Tutorial on Deep Latent Variable Models of Natural Language
Title | A Tutorial on Deep Latent Variable Models of Natural Language |
Authors | Yoon Kim, Sam Wiseman, Alexander M. Rush |
Abstract | There has been much recent, exciting work on combining the complementary strengths of latent variable models and deep learning. Latent variable modeling makes it easy to explicitly specify model constraints through conditional independence properties, while deep learning makes it possible to parameterize these conditional likelihoods with powerful function approximators. While these “deep latent variable” models provide a rich, flexible framework for modeling many real-world phenomena, difficulties exist: deep parameterizations of conditional likelihoods usually make posterior inference intractable, and latent variable objectives often complicate backpropagation by introducing points of non-differentiability. This tutorial explores these issues in depth through the lens of variational inference. |
Tasks | Latent Variable Models |
Published | 2018-12-17 |
URL | https://arxiv.org/abs/1812.06834v3 |
PDF | https://arxiv.org/pdf/1812.06834v3.pdf |
PWC | https://paperswithcode.com/paper/a-tutorial-on-deep-latent-variable-models-of |
Repo | https://github.com/Francix/Deep-Generative-Models-for-Natural-Language-Processing |
Framework | tf |
A General Method for Amortizing Variational Filtering
Title | A General Method for Amortizing Variational Filtering |
Authors | Joseph Marino, Milan Cvitkovic, Yisong Yue |
Abstract | We introduce the variational filtering EM algorithm, a simple, general-purpose method for performing variational inference in dynamical latent variable models using information from only past and present variables, i.e. filtering. The algorithm is derived from the variational objective in the filtering setting and consists of an optimization procedure at each time step. By performing each inference optimization procedure with an iterative amortized inference model, we obtain a computationally efficient implementation of the algorithm, which we call amortized variational filtering. We present experiments demonstrating that this general-purpose method improves performance across several deep dynamical latent variable models. |
Tasks | Latent Variable Models |
Published | 2018-11-13 |
URL | http://arxiv.org/abs/1811.05090v1 |
PDF | http://arxiv.org/pdf/1811.05090v1.pdf |
PWC | https://paperswithcode.com/paper/a-general-method-for-amortizing-variational |
Repo | https://github.com/joelouismarino/amortized-variational-filtering |
Framework | pytorch |
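One filtering step can be sketched under explicit assumptions: a Gaussian approximate posterior initialized at that step’s prior, and a learned `update_net` that maps current parameters and ELBO gradients to parameter updates (iterative amortized inference). Module names and the two-iteration budget are ours, not the paper’s code.

```python
import torch

def filtering_step(update_net, log_likelihood, x_t, prior_mu, prior_logvar,
                   n_inf_iter=2):
    """Inference-time sketch of one amortized variational filtering step.
    `log_likelihood(z, x_t)` returns log p(x_t | z); for training, one
    would retain the graph to backprop through the updates."""
    mu, logvar = prior_mu.detach().clone(), prior_logvar.detach().clone()
    for _ in range(n_inf_iter):
        mu = mu.detach().requires_grad_(True)
        logvar = logvar.detach().requires_grad_(True)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        nll = -log_likelihood(z, x_t)
        # KL( N(mu, var) || N(prior_mu, prior_var) )
        kl = 0.5 * (prior_logvar - logvar
                    + (logvar.exp() + (mu - prior_mu) ** 2) / prior_logvar.exp()
                    - 1).sum()
        g_mu, g_lv = torch.autograd.grad(nll + kl, (mu, logvar))
        d_mu, d_lv = update_net(mu, logvar, g_mu, g_lv)  # learned update
        mu, logvar = mu + d_mu, logvar + d_lv
    return mu, logvar
```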
CompILE: Compositional Imitation Learning and Execution
Title | CompILE: Compositional Imitation Learning and Execution |
Authors | Thomas Kipf, Yujia Li, Hanjun Dai, Vinicius Zambaldi, Alvaro Sanchez-Gonzalez, Edward Grefenstette, Pushmeet Kohli, Peter Battaglia |
Abstract | We introduce Compositional Imitation Learning and Execution (CompILE): a framework for learning reusable, variable-length segments of hierarchically-structured behavior from demonstration data. CompILE uses a novel unsupervised, fully-differentiable sequence segmentation module to learn latent encodings of sequential data that can be re-composed and executed to perform new tasks. Once trained, our model generalizes to sequences of longer length and from environment instances not seen during training. We evaluate CompILE in a challenging 2D multi-task environment and a continuous control task, and show that it can find correct task boundaries and event encodings in an unsupervised manner. Latent codes and associated behavior policies discovered by CompILE can be used by a hierarchical agent, where the high-level policy selects actions in the latent code space, and the low-level, task-specific policies are simply the learned decoders. We found that our CompILE-based agent could learn given only sparse rewards, where agents without task-specific policies struggle. |
Tasks | Continuous Control, Imitation Learning |
Published | 2018-12-04 |
URL | https://arxiv.org/abs/1812.01483v2 |
PDF | https://arxiv.org/pdf/1812.01483v2.pdf |
PWC | https://paperswithcode.com/paper/compositional-imitation-learning-explaining |
Repo | https://github.com/tkipf/compile |
Framework | pytorch |
Semantically Meaningful View Selection
Title | Semantically Meaningful View Selection |
Authors | Joris Guérin, Olivier Gibaru, Eric Nyiri, Stéphane Thiery, Byron Boots |
Abstract | An understanding of the nature of objects could help robots to solve both high-level abstract tasks and improve performance at lower-level concrete tasks. Although deep learning has facilitated progress in image understanding, a robot’s performance in problems like object recognition often depends on the angle from which the object is observed. Traditionally, robot sorting tasks rely on a fixed top-down view of an object. By changing its viewing angle, a robot can select a more semantically informative view, leading to better performance for object recognition. In this paper, we introduce the problem of semantic view selection, which seeks to find good camera poses to gain semantic knowledge about an observed object. We propose a conceptual formulation of the problem, together with a solvable relaxation based on clustering. We then present a new image dataset consisting of around 10k images representing various views of 144 objects under different poses. Finally, we use this dataset to propose a first solution to the problem by training a neural network to predict a “semantic score” from a top view image and camera pose. The views predicted to have higher scores are then shown to provide better clustering results than fixed top-down views. |
Tasks | Object Recognition |
Published | 2018-07-26 |
URL | http://arxiv.org/abs/1807.10303v1 |
PDF | http://arxiv.org/pdf/1807.10303v1.pdf |
PWC | https://paperswithcode.com/paper/semantically-meaningful-view-selection |
Repo | https://github.com/jorisguerin/SemanticViewSelection_dataset |
Framework | none |