October 20, 2019


Paper Group AWR 183

The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation. geomstats: a Python Package for Riemannian Geometry in Machine Learning. Variational zero-inflated Gaussian processes with sparse kernels. Classification of Breast Cancer Histology using Deep Learning. Deep CNN Frame Interpolation with Lessons Learned from Natural Language Processing, and more.

The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation

Title The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation
Authors Mia Xu Chen, Orhan Firat, Ankur Bapna, Melvin Johnson, Wolfgang Macherey, George Foster, Llion Jones, Niki Parmar, Mike Schuster, Zhifeng Chen, Yonghui Wu, Macduff Hughes
Abstract The past year has witnessed rapid advances in sequence-to-sequence (seq2seq) modeling for Machine Translation (MT). The classic RNN-based approaches to MT were first outperformed by the convolutional seq2seq model, which was then outperformed by the more recent Transformer model. Each of these new approaches consists of a fundamental architecture accompanied by a set of modeling and training techniques that are in principle applicable to other seq2seq architectures. In this paper, we tease apart the new architectures and their accompanying techniques in two ways. First, we identify several key modeling and training techniques, and apply them to the RNN architecture, yielding a new RNMT+ model that outperforms all three fundamental architectures on the benchmark WMT’14 English-to-French and English-to-German tasks. Second, we analyze the properties of each fundamental seq2seq architecture and devise new hybrid architectures intended to combine their strengths. Our hybrid models obtain further improvements, outperforming the RNMT+ model on both benchmark datasets.
Tasks Machine Translation
Published 2018-04-26
URL http://arxiv.org/abs/1804.09849v2
PDF http://arxiv.org/pdf/1804.09849v2.pdf
PWC https://paperswithcode.com/paper/the-best-of-both-worlds-combining-recent
Repo https://github.com/zysite/post
Framework pytorch
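Among the training techniques the paper identifies and transfers to the RNN architecture is label smoothing. A minimal numpy sketch of smoothed targets and the resulting cross-entropy (an illustration of the technique, not the paper's implementation):

```python
import numpy as np

def smoothed_targets(labels, num_classes, eps=0.1):
    """Replace one-hot targets with a smoothed distribution:
    the true class gets 1 - eps, the rest share eps uniformly."""
    t = np.full((len(labels), num_classes), eps / (num_classes - 1))
    t[np.arange(len(labels)), labels] = 1.0 - eps
    return t

def cross_entropy(logits, targets):
    # log-softmax, shifted by the row max for numerical stability
    z = logits - logits.max(axis=1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -(targets * log_p).sum(axis=1).mean()

labels = np.array([0, 2])
logits = np.array([[2.0, 0.1, 0.1], [0.1, 0.1, 2.0]])
targets = smoothed_targets(labels, num_classes=3, eps=0.1)
loss = cross_entropy(logits, targets)
```

Smoothing keeps the model from driving the true-class probability to 1, which the paper lists among the regularization choices that make RNMT+ train stably.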

geomstats: a Python Package for Riemannian Geometry in Machine Learning

Title geomstats: a Python Package for Riemannian Geometry in Machine Learning
Authors Nina Miolane, Johan Mathe, Claire Donnat, Mikael Jorda, Xavier Pennec
Abstract We introduce geomstats, a Python package that performs computations on manifolds such as hyperspheres, hyperbolic spaces, spaces of symmetric positive definite matrices and Lie groups of transformations. We provide efficient and extensively unit-tested implementations of these manifolds, together with useful Riemannian metrics and associated Exponential and Logarithm maps. The corresponding geodesic distances provide a range of intuitive choices of Machine Learning loss functions. We also give the corresponding Riemannian gradients. The operations implemented in geomstats are available with different computing backends such as numpy, tensorflow and keras. We have enabled GPU support and integrated geomstats manifold computations into the Keras deep learning framework. This paper also presents a review of manifolds in machine learning and an overview of the geomstats package with examples demonstrating its use for efficient and user-friendly Riemannian geometry.
Tasks
Published 2018-05-21
URL http://arxiv.org/abs/1805.08308v2
PDF http://arxiv.org/pdf/1805.08308v2.pdf
PWC https://paperswithcode.com/paper/geomstats-a-python-package-for-riemannian
Repo https://github.com/geomstats/geomstats
Framework pytorch
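The Exponential and Logarithm maps the abstract mentions can be written down in closed form for the unit hypersphere. A self-contained numpy sketch of the underlying math (geomstats itself exposes these through its manifold classes; this is not the package's API):

```python
import numpy as np

def sphere_exp(base, tangent):
    """Riemannian exponential map on the unit sphere: follow the
    geodesic from `base` in direction `tangent` (tangent must be
    orthogonal to base)."""
    norm = np.linalg.norm(tangent)
    if norm < 1e-12:
        return base.copy()
    return np.cos(norm) * base + np.sin(norm) * tangent / norm

def sphere_log(base, point):
    """Inverse map: the tangent vector at `base` pointing toward `point`,
    with length equal to the geodesic distance."""
    cos_t = np.clip(base @ point, -1.0, 1.0)
    theta = np.arccos(cos_t)
    if theta < 1e-12:
        return np.zeros_like(base)
    residual = point - cos_t * base
    return theta * residual / np.linalg.norm(residual)

base = np.array([1.0, 0.0, 0.0])
tangent = np.array([0.0, 0.5, 0.0])
p = sphere_exp(base, tangent)
# the geodesic distance equals the tangent norm
dist = np.arccos(np.clip(base @ p, -1, 1))
```

The geodesic distance computed this way is exactly the kind of quantity the paper proposes as an intuitive manifold-aware loss.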

Variational zero-inflated Gaussian processes with sparse kernels

Title Variational zero-inflated Gaussian processes with sparse kernels
Authors Pashupati Hegde, Markus Heinonen, Samuel Kaski
Abstract Zero-inflated datasets, which have an excess of zero outputs, are commonly encountered in problems such as climate or rare event modelling. Conventional machine learning approaches tend to overestimate the non-zeros leading to poor performance. We propose a novel model family of zero-inflated Gaussian processes (ZiGP) for such zero-inflated datasets, produced by sparse kernels through learning a latent probit Gaussian process that can zero out kernel rows and columns whenever the signal is absent. The ZiGPs are particularly useful for making the powerful Gaussian process networks more interpretable. We introduce sparse GP networks where variable-order latent modelling is achieved through sparse mixing signals. We derive the non-trivial stochastic variational inference tractably for scalable learning of the sparse kernels in both models. The novel output-sparse approach improves both prediction of zero-inflated data and interpretability of latent mixing models.
Tasks Gaussian Processes
Published 2018-03-13
URL http://arxiv.org/abs/1803.05036v1
PDF http://arxiv.org/pdf/1803.05036v1.pdf
PWC https://paperswithcode.com/paper/variational-zero-inflated-gaussian-processes
Repo https://github.com/hegdepashupati/zero-inflated-gp
Framework tf
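The generative idea behind the model, a latent gating process that zeroes out the signal wherever it is absent, can be illustrated with a toy sampler. This is only a hard-gated caricature of the paper's variational probit model, not its sparse-kernel inference:

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(x, lengthscale=1.0):
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

x = np.linspace(0, 10, 100)
K = rbf_kernel(x) + 1e-6 * np.eye(len(x))  # jitter for Cholesky stability
L = np.linalg.cholesky(K)

f = L @ rng.standard_normal(len(x))   # latent signal GP
g = L @ rng.standard_normal(len(x))   # latent gating GP
gate = (g > 0).astype(float)          # hard probit-style on/off gate
y = gate * f                          # zero-inflated output
zero_fraction = (y == 0).mean()
```

A conventional GP fit to `y` would smear mass over the exact zeros; modeling the gate explicitly is what lets the ZiGP predict them.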

Classification of Breast Cancer Histology using Deep Learning

Title Classification of Breast Cancer Histology using Deep Learning
Authors Aditya Golatkar, Deepak Anand, Amit Sethi
Abstract Breast Cancer is a major cause of death worldwide among women. Hematoxylin and Eosin (H&E) stained breast tissue samples from biopsies are observed under microscopes for the primary diagnosis of breast cancer. In this paper, we propose a deep learning-based method for classification of H&E stained breast tissue images released for the BACH challenge 2018 by fine-tuning the Inception-v3 convolutional neural network (CNN) proposed by Szegedy et al. These images are to be classified into four classes, namely: i) normal tissue, ii) benign tumor, iii) in-situ carcinoma and iv) invasive carcinoma. Our strategy is to extract patches based on nuclei density instead of random or grid sampling, along with rejection of patches that are not rich in nuclei (non-epithelial regions) for training and testing. Every patch (nuclei-dense region) in an image is classified into one of the four above-mentioned categories. The class of the entire image is determined using majority voting over the patch classes. We obtained an average four-class accuracy of 85% and an average two-class (non-cancer vs. carcinoma) accuracy of 93%, which improves upon a previous benchmark by Araujo et al.
Tasks
Published 2018-02-22
URL http://arxiv.org/abs/1802.08080v2
PDF http://arxiv.org/pdf/1802.08080v2.pdf
PWC https://paperswithcode.com/paper/classification-of-breast-cancer-histology
Repo https://github.com/AdityaGolatkar/Classification-of-Breast-Cancer-Histology-using-Deep-Learning
Framework tf
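The image-level decision rule described in the abstract is plain majority voting over per-patch predictions; a minimal sketch:

```python
from collections import Counter

# The four BACH classes from the abstract
CLASSES = ["normal", "benign", "in-situ", "invasive"]

def image_class(patch_predictions):
    """Aggregate per-patch class predictions into an image-level label
    by majority vote (ties broken by first-seen order)."""
    return Counter(patch_predictions).most_common(1)[0][0]

# e.g., five nuclei-dense patches extracted from one image
preds = ["invasive", "invasive", "benign", "invasive", "normal"]
label = image_class(preds)
```

Voting over nuclei-dense patches only (rather than a uniform grid) is the paper's key sampling choice: patches without epithelial nuclei carry little diagnostic signal.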

Deep CNN Frame Interpolation with Lessons Learned from Natural Language Processing

Title Deep CNN Frame Interpolation with Lessons Learned from Natural Language Processing
Authors Kian Ghodoussi, Nihar Sheth, Zane Durante, Markie Wagner
Abstract A major area of growth within deep learning has been the study and implementation of convolutional neural networks. The general explanation within the deep learning community of the robustness of convolutional neural networks (CNNs) within image recognition rests upon the idea that CNNs are able to extract localized features. However, recent developments in fields such as Natural Language Processing are demonstrating that this paradigm may be incorrect. In this paper, we analyze the current state of the field concerning CNN’s and present a hypothesis that provides a novel explanation for the robustness of CNN models. From there, we demonstrate the effectiveness of our approach by presenting novel deep CNN frame interpolation architecture that is comparable to the state of the art interpolation models with a fraction of the complexity.
Tasks
Published 2018-09-14
URL http://arxiv.org/abs/1809.05286v2
PDF http://arxiv.org/pdf/1809.05286v2.pdf
PWC https://paperswithcode.com/paper/deep-cnn-frame-interpolation-with-lessons
Repo https://github.com/ghodouss/Aperio
Framework none

A Corpus with Multi-Level Annotations of Patients, Interventions and Outcomes to Support Language Processing for Medical Literature

Title A Corpus with Multi-Level Annotations of Patients, Interventions and Outcomes to Support Language Processing for Medical Literature
Authors Benjamin Nye, Junyi Jessy Li, Roma Patel, Yinfei Yang, Iain J. Marshall, Ani Nenkova, Byron C. Wallace
Abstract We present a corpus of 5,000 richly annotated abstracts of medical articles describing clinical randomized controlled trials. Annotations include demarcations of text spans that describe the Patient population enrolled, the Interventions studied and to what they were Compared, and the Outcomes measured (the ‘PICO’ elements). These spans are further annotated at a more granular level, e.g., individual interventions within them are marked and mapped onto a structured medical vocabulary. We acquired annotations from a diverse set of workers with varying levels of expertise and cost. We describe our data collection process and the corpus itself in detail. We then outline a set of challenging NLP tasks that would aid searching of the medical literature and the practice of evidence-based medicine.
Tasks Participant Intervention Comparison Outcome Extraction
Published 2018-06-11
URL http://arxiv.org/abs/1806.04185v1
PDF http://arxiv.org/pdf/1806.04185v1.pdf
PWC https://paperswithcode.com/paper/a-corpus-with-multi-level-annotations-of
Repo https://github.com/maxaalexeeva/PICO
Framework none
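The abstract describes text spans mapped to PICO element types. A hypothetical illustration of such an annotation (the sentence and the span representation below are invented; the corpus's actual file format is not specified in the abstract):

```python
# Hypothetical character-offset span annotation over one sentence.
abstract = ("We randomized 120 adults with type 2 diabetes "
            "to metformin or placebo and measured HbA1c.")

def span(label, text):
    """Build a span record by locating `text` inside the abstract."""
    start = abstract.find(text)
    return {"type": label, "start": start, "end": start + len(text)}

spans = [
    span("Population", "120 adults with type 2 diabetes"),
    span("Intervention", "metformin"),
    span("Comparator", "placebo"),
    span("Outcome", "HbA1c"),
]

def extract(s):
    return abstract[s["start"]:s["end"]]
```

The corpus's "more granular" second level would then attach structured-vocabulary codes to individual spans like the intervention above.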

CBinfer: Exploiting Frame-to-Frame Locality for Faster Convolutional Network Inference on Video Streams

Title CBinfer: Exploiting Frame-to-Frame Locality for Faster Convolutional Network Inference on Video Streams
Authors Lukas Cavigelli, Luca Benini
Abstract The last few years have brought advances in computer vision at an amazing pace, grounded on new findings in deep neural network construction and training as well as the availability of large labeled datasets. Applying these networks to images demands a high computational effort and pushes the use of state-of-the-art networks on real-time video data out of reach of embedded platforms. Many recent works focus on reducing network complexity for real-time inference on embedded computing platforms. We adopt an orthogonal viewpoint and propose a novel algorithm exploiting the spatio-temporal sparsity of pixel changes. This optimized inference procedure resulted in an average speed-up of 9.1x over cuDNN on the Tegra X2 platform at a negligible accuracy loss of <0.1% and no retraining of the network for a semantic segmentation application. Similarly, an average speed-up of 7.0x has been achieved for a pose detection DNN and a reduction of 5x of the number of arithmetic operations to be performed for object detection on static camera video surveillance data. These throughput gains combined with a lower power consumption result in an energy efficiency of 511 GOp/s/W compared to 70 GOp/s/W for the baseline.
Tasks Object Detection, Semantic Segmentation
Published 2018-08-15
URL http://arxiv.org/abs/1808.05488v2
PDF http://arxiv.org/pdf/1808.05488v2.pdf
PWC https://paperswithcode.com/paper/cbinfer-exploiting-frame-to-frame-locality
Repo https://github.com/lukasc-ch/CBinfer
Framework pytorch
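The core idea, recomputing only where pixels changed between consecutive frames, can be sketched at tile granularity (CBinfer itself works at finer granularity with custom GPU kernels; tiles keep the sketch short):

```python
import numpy as np

def changed_tiles(prev, curr, tile=8, threshold=0.1):
    """Mark which tile x tile blocks of the frame changed by more than
    `threshold` (max absolute pixel difference); only those blocks
    would need their convolution outputs recomputed."""
    h, w = curr.shape
    diff = np.abs(curr - prev)
    mask = np.zeros((h // tile, w // tile), dtype=bool)
    for i in range(h // tile):
        for j in range(w // tile):
            block = diff[i * tile:(i + 1) * tile, j * tile:(j + 1) * tile]
            mask[i, j] = block.max() > threshold
    return mask

prev = np.zeros((32, 32))
curr = prev.copy()
curr[0:4, 0:4] = 1.0            # only the top-left tile changes
mask = changed_tiles(prev, curr)
frac = mask.mean()              # fraction of tiles to recompute
```

On static-camera video most tiles are unchanged frame to frame, which is exactly the spatio-temporal sparsity the reported 7-9x speed-ups exploit.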

Multi-Layered Gradient Boosting Decision Trees

Title Multi-Layered Gradient Boosting Decision Trees
Authors Ji Feng, Yang Yu, Zhi-Hua Zhou
Abstract Multi-layered representation is believed to be the key ingredient of deep neural networks, especially in cognitive tasks like computer vision. While non-differentiable models such as gradient boosting decision trees (GBDTs) are the dominant methods for modeling discrete or tabular data, it is hard to endow them with such representation learning ability. In this work, we propose the multi-layered GBDT forest (mGBDT), with an explicit emphasis on exploring the ability to learn hierarchical representations by stacking several layers of regression GBDTs as its building blocks. The model can be jointly trained by a variant of target propagation across layers, without requiring back-propagation or differentiability. Experiments and visualizations confirmed the effectiveness of the model in terms of performance and representation learning ability.
Tasks Representation Learning
Published 2018-05-31
URL http://arxiv.org/abs/1806.00007v1
PDF http://arxiv.org/pdf/1806.00007v1.pdf
PWC https://paperswithcode.com/paper/multi-layered-gradient-boosting-decision
Repo https://github.com/kingfengji/mGBDT
Framework pytorch
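The target-propagation wiring can be illustrated with linear least-squares fits standing in for the GBDT layers and their learned inverses (the linear stand-ins are purely illustrative; the paper fits boosted trees for both the forward maps and the inverses):

```python
import numpy as np

rng = np.random.default_rng(1)

def fit(X, Y):
    """Least-squares stand-in for 'fit a regression layer mapping X to Y'."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

X = rng.standard_normal((200, 3))
y = X @ np.array([[1.0], [-2.0], [0.5]]) + 0.01 * rng.standard_normal((200, 1))

W1 = fit(X, y)          # layer 1, initialized toward the final target
H = X @ W1              # hidden representation
W2 = fit(H, y)          # output layer

# Target propagation step: the pseudo-target for H is obtained by
# mapping y backward through a learned inverse of layer 2.
G2 = fit(y, H)          # learned inverse of layer 2
H_target = y @ G2
W1 = fit(X, H_target)   # re-fit layer 1 toward the propagated pseudo-target

pred = (X @ W1) @ W2
mse = float(np.mean((pred - y) ** 2))
```

No gradient ever flows through the layers: each layer only needs to support "fit to a target", which is what makes the scheme compatible with non-differentiable GBDTs.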

A Generative Appearance Model for End-to-end Video Object Segmentation

Title A Generative Appearance Model for End-to-end Video Object Segmentation
Authors Joakim Johnander, Martin Danelljan, Emil Brissman, Fahad Shahbaz Khan, Michael Felsberg
Abstract One of the fundamental challenges in video object segmentation is to find an effective representation of the target and background appearance. The best performing approaches resort to extensive fine-tuning of a convolutional neural network for this purpose. Besides being prohibitively expensive, this strategy cannot be truly trained end-to-end since the online fine-tuning procedure is not integrated into the offline training of the network. To address these issues, we propose a network architecture that learns a powerful representation of the target and background appearance in a single forward pass. The introduced appearance module learns a probabilistic generative model of target and background feature distributions. Given a new image, it predicts the posterior class probabilities, providing a highly discriminative cue, which is processed in later network modules. Both the learning and prediction stages of our appearance module are fully differentiable, enabling true end-to-end training of the entire segmentation pipeline. Comprehensive experiments demonstrate the effectiveness of the proposed approach on three video object segmentation benchmarks. We close the gap to approaches based on online fine-tuning on DAVIS17, while operating at 15 FPS on a single GPU. Furthermore, our method outperforms all published approaches on the large-scale YouTube-VOS dataset.
Tasks Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation
Published 2018-11-28
URL http://arxiv.org/abs/1811.11611v2
PDF http://arxiv.org/pdf/1811.11611v2.pdf
PWC https://paperswithcode.com/paper/a-generative-appearance-model-for-end-to-end
Repo https://github.com/joakimjohnander/agame-vos
Framework pytorch
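The probabilistic heart of the appearance module, class-conditional Gaussians over features queried for posterior class probabilities, can be sketched as follows (diagonal covariances and equal priors; a simplification of the paper's learned module):

```python
import numpy as np

def fit_gaussians(features, labels):
    """Per-class mean and (diagonal) variance of feature vectors."""
    stats = {}
    for c in np.unique(labels):
        f = features[labels == c]
        stats[int(c)] = (f.mean(axis=0), f.var(axis=0) + 1e-6)
    return stats

def posterior(stats, x):
    """Posterior class probabilities for feature x under equal priors."""
    classes = sorted(stats)
    logps = []
    for c in classes:
        mu, var = stats[c]
        logps.append(-0.5 * np.sum((x - mu) ** 2 / var + np.log(var)))
    logps = np.array(logps)
    p = np.exp(logps - logps.max())
    return classes, p / p.sum()

rng = np.random.default_rng(0)
target = rng.standard_normal((50, 4)) + 3.0   # "target" features
background = rng.standard_normal((50, 4))     # "background" features
features = np.vstack([target, background])
labels = np.array([1] * 50 + [0] * 50)

stats = fit_gaussians(features, labels)
classes, probs = posterior(stats, np.full(4, 3.0))
```

Because both the fitting and the posterior query are closed-form and differentiable, this kind of module can sit inside the network and be trained end-to-end, which is the paper's central point.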

Task-Driven Convolutional Recurrent Models of the Visual System

Title Task-Driven Convolutional Recurrent Models of the Visual System
Authors Aran Nayebi, Daniel Bear, Jonas Kubilius, Kohitij Kar, Surya Ganguli, David Sussillo, James J. DiCarlo, Daniel L. K. Yamins
Abstract Feed-forward convolutional neural networks (CNNs) are currently state-of-the-art for object classification tasks such as ImageNet. Further, they are quantitatively accurate models of temporally-averaged responses of neurons in the primate brain’s visual system. However, biological visual systems have two ubiquitous architectural features not shared with typical CNNs: local recurrence within cortical areas, and long-range feedback from downstream areas to upstream areas. Here we explored the role of recurrence in improving classification performance. We found that standard forms of recurrence (vanilla RNNs and LSTMs) do not perform well within deep CNNs on the ImageNet task. In contrast, novel cells that incorporated two structural features, bypassing and gating, were able to boost task accuracy substantially. We extended these design principles in an automated search over thousands of model architectures, which identified novel local recurrent cells and long-range feedback connections useful for object recognition. Moreover, these task-optimized ConvRNNs matched the dynamics of neural activity in the primate visual system better than feedforward networks, suggesting a role for the brain’s recurrent connections in performing difficult visual behaviors.
Tasks Object Classification, Object Recognition
Published 2018-06-20
URL http://arxiv.org/abs/1807.00053v2
PDF http://arxiv.org/pdf/1807.00053v2.pdf
PWC https://paperswithcode.com/paper/task-driven-convolutional-recurrent-models-of
Repo https://github.com/neuroailab/tnn
Framework tf
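The two structural features the architecture search found useful, gating and bypassing, can be shown in a toy recurrent cell (dense rather than convolutional, purely to keep the sketch small; the paper's cells are convolutional):

```python
import numpy as np

def gated_bypass_cell(x, h, Wx, Wh, Wg):
    """One step of a toy recurrent cell with a learned gate mixing old
    and new state, plus a bypass (skip) connection adding the input
    directly to the output."""
    update = np.tanh(x @ Wx + h @ Wh)
    gate = 1.0 / (1.0 + np.exp(-(x @ Wg)))    # sigmoid gate from the input
    h_new = gate * update + (1.0 - gate) * h  # gated state update
    return h_new + x                          # bypass: input skips to output

rng = np.random.default_rng(0)
d = 4
Wx, Wh, Wg = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))

h = np.zeros(d)
x = rng.standard_normal(d)
for _ in range(3):                            # unroll a few time steps
    h = gated_bypass_cell(x, h, Wx, Wh, Wg)
```

Vanilla RNN and LSTM cells lack the bypass path; the paper reports that adding it (together with gating) is what lets recurrence help rather than hurt deep CNNs on ImageNet.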

Neural Generative Models for Global Optimization with Gradients

Title Neural Generative Models for Global Optimization with Gradients
Authors Louis Faury, Flavian Vasile, Clément Calauzènes, Olivier Fercoq
Abstract The aim of global optimization is to find the global optimum of arbitrary classes of functions, possibly highly multimodal ones. In this paper we focus on the subproblem of global optimization for differentiable functions and we propose an Evolutionary Search-inspired solution where we model point search distributions via Generative Neural Networks. This approach enables us to model diverse and complex search distributions based on which we can efficiently explore complicated objective landscapes. In our experiments we show the practical superiority of our algorithm versus classical Evolutionary Search and gradient-based solutions on a benchmark set of multimodal functions, and demonstrate how it can be used to accelerate Bayesian Optimization with Gaussian Processes.
Tasks Gaussian Processes
Published 2018-05-22
URL http://arxiv.org/abs/1805.08594v3
PDF http://arxiv.org/pdf/1805.08594v3.pdf
PWC https://paperswithcode.com/paper/neural-generative-models-for-global
Repo https://github.com/Hugodovs/meta-blackbox-optimization
Framework none

Towards Interpretable Face Recognition

Title Towards Interpretable Face Recognition
Authors Bangjie Yin, Luan Tran, Haoxiang Li, Xiaohui Shen, Xiaoming Liu
Abstract Deep CNNs have been pushing the frontier of visual recognition over past years. Besides recognition accuracy, strong demands in understanding deep CNNs in the research community motivate developments of tools to dissect pre-trained models to visualize how they make predictions. Recent works further push the interpretability in the network learning stage to learn more meaningful representations. In this work, focusing on a specific area of visual recognition, we report our efforts towards interpretable face recognition. We propose a spatial activation diversity loss to learn more structured face representations. By leveraging the structure, we further design a feature activation diversity loss to push the interpretable representations to be discriminative and robust to occlusions. We demonstrate on three face recognition benchmarks that our proposed method is able to improve face recognition accuracy with easily interpretable face representations.
Tasks Face Recognition
Published 2018-05-02
URL https://arxiv.org/abs/1805.00611v2
PDF https://arxiv.org/pdf/1805.00611v2.pdf
PWC https://paperswithcode.com/paper/towards-interpretable-face-recognition
Repo https://github.com/yubangji123/Interpret_FR
Framework tf

MURAUER: Mapping Unlabeled Real Data for Label AUstERity

Title MURAUER: Mapping Unlabeled Real Data for Label AUstERity
Authors Georg Poier, Michael Opitz, David Schinagl, Horst Bischof
Abstract Data labeling for learning 3D hand pose estimation models is a huge effort. Readily available, accurately labeled synthetic data has the potential to reduce the effort. However, to successfully exploit synthetic data, current state-of-the-art methods still require a large amount of labeled real data. In this work, we remove this requirement by learning to map from the features of real data to the features of synthetic data, mainly using a large amount of synthetic and unlabeled real data. We exploit unlabeled data using two auxiliary objectives, which enforce that (i) the mapped representation is pose-specific and (ii) at the same time, the distributions of real and synthetic data are aligned. While pose specificity is enforced by a self-supervisory signal requiring that the representation is predictive for the appearance from different views, distributions are aligned by an adversarial term. In this way, we can significantly improve the results of the baseline system, which does not use unlabeled data, and outperform many recent approaches with only about 1% of the labeled real data. This presents a step towards faster deployment of learning-based hand pose estimation, making it accessible for a larger range of applications.
Tasks Hand Pose Estimation, Pose Estimation
Published 2018-11-23
URL http://arxiv.org/abs/1811.09497v2
PDF http://arxiv.org/pdf/1811.09497v2.pdf
PWC https://paperswithcode.com/paper/murauer-mapping-unlabeled-real-data-for-label
Repo https://github.com/poier/murauer
Framework pytorch

Visual Question Generation for Class Acquisition of Unknown Objects

Title Visual Question Generation for Class Acquisition of Unknown Objects
Authors Kohei Uehara, Antonio Tejero-De-Pablos, Yoshitaka Ushiku, Tatsuya Harada
Abstract Traditional image recognition methods only consider objects belonging to already learned classes. However, since training a recognition model with every object class in the world is unfeasible, a way of getting information on unknown objects (i.e., objects whose class has not been learned) is necessary. A way for an image recognition system to learn new classes could be asking a human about objects that are unknown. In this paper, we propose a method for generating questions about unknown objects in an image, as means to get information about classes that have not been learned. Our method consists of a module for proposing objects, a module for identifying unknown objects, and a module for generating questions about unknown objects. The experimental results via human evaluation show that our method can successfully get information about unknown objects in an image dataset. Our code and dataset are available at https://github.com/mil-tokyo/vqg-unknown.
Tasks Question Generation
Published 2018-08-06
URL http://arxiv.org/abs/1808.01821v1
PDF http://arxiv.org/pdf/1808.01821v1.pdf
PWC https://paperswithcode.com/paper/visual-question-generation-for-class
Repo https://github.com/mil-tokyo/vqg-unknown
Framework none

Tree-Structured Recurrent Switching Linear Dynamical Systems for Multi-Scale Modeling

Title Tree-Structured Recurrent Switching Linear Dynamical Systems for Multi-Scale Modeling
Authors Josue Nassar, Scott W. Linderman, Monica Bugallo, Il Memming Park
Abstract Many real-world systems studied are governed by complex, nonlinear dynamics. By modeling these dynamics, we can gain insight into how these systems work, make predictions about how they will behave, and develop strategies for controlling them. While there are many methods for modeling nonlinear dynamical systems, existing techniques face a trade-off between offering interpretable descriptions and making accurate predictions. Here, we develop a class of models that aims to achieve both simultaneously, smoothly interpolating between simple descriptions and more complex, yet also more accurate, models. Our probabilistic model achieves this multi-scale property through a hierarchy of locally linear dynamics that jointly approximate global nonlinear dynamics. We call it the tree-structured recurrent switching linear dynamical system. To fit this model, we present a fully Bayesian sampling procedure using Polya-Gamma data augmentation to allow for fast and conjugate Gibbs sampling. Through a variety of synthetic and real examples, we show how these models outperform existing methods in both interpretability and predictive capability.
Tasks Data Augmentation
Published 2018-11-29
URL https://arxiv.org/abs/1811.12386v6
PDF https://arxiv.org/pdf/1811.12386v6.pdf
PWC https://paperswithcode.com/paper/tree-structured-recurrent-switching-linear
Repo https://github.com/catniplab/tree_structured_rslds
Framework pytorch
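A switching linear dynamical system whose regime depends on the current state can be simulated in a few lines; the hand-written partition below is a toy stand-in for the paper's learned, tree-structured partition of the state space:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two linear regimes: slow decay vs. rotation. A recurrent switching
# LDS chooses the regime as a function of the current state; here the
# switch is simply the sign of the first coordinate.
A = [
    np.array([[0.95, 0.0], [0.0, 0.95]]),                      # decay
    np.array([[np.cos(0.3), -np.sin(0.3)],
              [np.sin(0.3),  np.cos(0.3)]]),                   # rotation
]

x = np.array([1.0, 0.5])
states, regimes = [x], []
for _ in range(100):
    k = 0 if x[0] > 0 else 1          # state-dependent regime choice
    regimes.append(k)
    x = A[k] @ x + 0.01 * rng.standard_normal(2)               # noisy step
    states.append(x)
states = np.array(states)
```

Inference for such models is the hard part; the paper's Polya-Gamma augmentation makes the regime assignments and linear dynamics jointly amenable to conjugate Gibbs updates.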