January 31, 2020

3391 words 16 mins read

Paper Group AWR 438

Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates. Improved ICH classification using task-dependent learning. 3D BAT: A Semi-Automatic, Web-based 3D Annotation Toolbox for Full-Surround, Multi-Modal Data Streams. Neural Outlier Rejection for Self-Supervised Keypoint Learning. TorchBeast: A PyTorch Platform for Distribu …

Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates

Title Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates
Authors Sharan Vaswani, Aaron Mishkin, Issam Laradji, Mark Schmidt, Gauthier Gidel, Simon Lacoste-Julien
Abstract Recent works have shown that stochastic gradient descent (SGD) achieves the fast convergence rates of full-batch gradient descent for over-parameterized models satisfying certain interpolation conditions. However, the step-size used in these works depends on unknown quantities and SGD’s practical performance heavily relies on the choice of this step-size. We propose to use line-search techniques to automatically set the step-size when training models that can interpolate the data. In the interpolation setting, we prove that SGD with a stochastic variant of the classic Armijo line-search attains the deterministic convergence rates for both convex and strongly-convex functions. Under additional assumptions, SGD with Armijo line-search is shown to achieve fast convergence for non-convex functions. Furthermore, we show that stochastic extra-gradient with a Lipschitz line-search attains linear convergence for an important class of non-convex functions and saddle-point problems satisfying interpolation. To improve the proposed methods’ practical performance, we give heuristics to use larger step-sizes and acceleration. We compare the proposed algorithms against numerous optimization methods on standard classification tasks using both kernel methods and deep networks. The proposed methods result in competitive performance across all models and datasets, while being robust to the precise choices of hyper-parameters. For multi-class classification using deep networks, SGD with Armijo line-search results in both faster convergence and better generalization.
Tasks
Published 2019-05-24
URL https://arxiv.org/abs/1905.09997v3
PDF https://arxiv.org/pdf/1905.09997v3.pdf
PWC https://paperswithcode.com/paper/painless-stochastic-gradient-interpolation
Repo https://github.com/IssamLaradji/sls
Framework pytorch
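
To make the line-search concrete, below is a minimal sketch of one SGD step with a stochastic Armijo backtracking search, evaluated on the same mini-batch used for the gradient. Function names and constants here are illustrative assumptions; the authors' reference implementation lives in the linked repo.

```python
# Minimal sketch of SGD with a stochastic Armijo line-search, assuming a
# closure() that recomputes the current mini-batch loss. Illustrative only.
import torch

def sgd_armijo_step(params, closure, eta_max=1.0, c=0.1, beta=0.5, max_backtracks=30):
    loss = closure()                            # mini-batch loss at current params
    grads = torch.autograd.grad(loss, params)
    grad_norm_sq = sum((g * g).sum() for g in grads)
    eta = eta_max
    with torch.no_grad():
        orig = [p.clone() for p in params]
        for _ in range(max_backtracks):
            for p, p0, g in zip(params, orig, grads):
                p.copy_(p0 - eta * g)           # trial step on the SAME mini-batch
            # Armijo condition: sufficient decrease of the mini-batch loss
            if closure() <= loss - c * eta * grad_norm_sq:
                break
            eta *= beta                         # backtrack
    return loss, eta
```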

Improved ICH classification using task-dependent learning

Title Improved ICH classification using task-dependent learning
Authors Amir Bar, Michal Mauda, Yoni Turner, Michal Safadi, Eldad Elnekave
Abstract Head CT is one of the most commonly performed imaging studies in the Emergency Department setting, and intracranial hemorrhage (ICH) is among the most critical and time-sensitive findings to be detected on head CT. We present BloodNet, a deep learning architecture designed for optimal triaging of head CTs, with the goal of decreasing the time from CT acquisition to accurate ICH detection. The BloodNet architecture incorporates dependency between the otherwise independent tasks of segmentation and classification, achieving improved classification results. AUCs of 0.9493 and 0.9566 are reported on held-out positive-enriched and randomly sampled sets comprising over 1400 studies acquired from over 10 different hospitals. These results are comparable to previously reported results with a smaller number of tagged studies.
Tasks
Published 2019-06-29
URL https://arxiv.org/abs/1907.00148v1
PDF https://arxiv.org/pdf/1907.00148v1.pdf
PWC https://paperswithcode.com/paper/improved-ich-classification-using-task
Repo https://github.com/portelaraian/Raian-Intracranial-Hemorrhage
Framework pytorch
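
The abstract does not spell out the architecture, but the core "task-dependent" idea, letting the classifier condition on the segmentation output, can be sketched as follows. This is an illustrative coupling under assumed module shapes, not the exact BloodNet design.

```python
# Hedged sketch: a classifier that depends on the segmentation prediction.
import torch
import torch.nn as nn

class SegThenClassify(nn.Module):
    def __init__(self, backbone, seg_head, feat_channels, num_classes=2):
        super().__init__()
        self.backbone = backbone            # shared feature extractor -> (B, C, H, W)
        self.seg_head = seg_head            # per-pixel hemorrhage map -> (B, 1, H, W)
        self.classifier = nn.Linear(feat_channels + 1, num_classes)

    def forward(self, x):
        feats = self.backbone(x)
        seg_logits = self.seg_head(feats)
        pooled_feats = feats.mean(dim=(2, 3))                     # (B, C)
        pooled_seg = torch.sigmoid(seg_logits).mean(dim=(2, 3))   # (B, 1)
        # classification sees the segmentation output, coupling the two tasks
        cls_logits = self.classifier(torch.cat([pooled_feats, pooled_seg], dim=1))
        return cls_logits, seg_logits
```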

3D BAT: A Semi-Automatic, Web-based 3D Annotation Toolbox for Full-Surround, Multi-Modal Data Streams

Title 3D BAT: A Semi-Automatic, Web-based 3D Annotation Toolbox for Full-Surround, Multi-Modal Data Streams
Authors Walter Zimmer, Akshay Rangesh, Mohan Trivedi
Abstract In this paper, we focus on obtaining 2D and 3D labels, as well as track IDs for objects on the road with the help of a novel 3D Bounding Box Annotation Toolbox (3D BAT). Our open source, web-based 3D BAT incorporates several smart features to improve usability and efficiency. For instance, this annotation toolbox supports semi-automatic labeling of tracks using interpolation, which is vital for downstream tasks like tracking, motion planning and motion prediction. Moreover, annotations for all camera images are automatically obtained by projecting annotations from 3D space into the image domain. In addition to the raw image and point cloud feeds, a Masterview consisting of the top view (bird’s-eye-view), side view and front views is made available to observe objects of interest from different perspectives. Comparisons of our method with other publicly available annotation tools reveal that 3D annotations can be obtained faster and more efficiently by using our toolbox.
Tasks Motion Planning, motion prediction
Published 2019-05-01
URL http://arxiv.org/abs/1905.00525v1
PDF http://arxiv.org/pdf/1905.00525v1.pdf
PWC https://paperswithcode.com/paper/3d-bat-a-semi-automatic-web-based-3d
Repo https://github.com/walzimmer/3d-bat
Framework none
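
A minimal sketch of the semi-automatic track-labeling idea follows: annotate two keyframes and fill the frames in between by interpolating the box parameters. Field names are illustrative assumptions; the toolbox's actual implementation is in the repo.

```python
# Sketch: linear interpolation of a 3D box (center, size, yaw) between keyframes.
import numpy as np

def interpolate_box(box_a, box_b, t):
    """Interpolate a box dict(center, size, yaw) at t in [0, 1]."""
    center = (1 - t) * np.asarray(box_a["center"]) + t * np.asarray(box_b["center"])
    size = (1 - t) * np.asarray(box_a["size"]) + t * np.asarray(box_b["size"])
    # interpolate heading on the circle to avoid wrap-around at +/- pi
    dyaw = (box_b["yaw"] - box_a["yaw"] + np.pi) % (2 * np.pi) - np.pi
    return {"center": center, "size": size, "yaw": box_a["yaw"] + t * dyaw}

kf0 = {"center": (0.0, 0.0, 0.0), "size": (4.5, 1.8, 1.5), "yaw": 0.0}
kf10 = {"center": (8.0, 1.0, 0.0), "size": (4.5, 1.8, 1.5), "yaw": 0.3}
track = [interpolate_box(kf0, kf10, t / 10) for t in range(11)]  # frames 0..10
```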

Neural Outlier Rejection for Self-Supervised Keypoint Learning

Title Neural Outlier Rejection for Self-Supervised Keypoint Learning
Authors Jiexiong Tang, Hanme Kim, Vitor Guizilini, Sudeep Pillai, Rares Ambrus
Abstract Identifying salient points in images is a crucial component for visual odometry, Structure-from-Motion or SLAM algorithms. Recently, several learned keypoint methods have demonstrated compelling performance on challenging benchmarks. However, generating consistent and accurate training data for interest-point detection in natural images still remains challenging, especially for human annotators. We introduce IO-Net (i.e. InlierOutlierNet), a novel proxy task for the self-supervision of keypoint detection, description and matching. By making the sampling of inlier-outlier sets from point-pair correspondences fully differentiable within the keypoint learning framework, we show that we are able to simultaneously self-supervise keypoint description and improve keypoint matching. Second, we introduce KeyPointNet, a keypoint-network architecture that is especially amenable to robust keypoint detection and description. We design the network to allow local keypoint aggregation to avoid artifacts due to the spatial discretizations commonly used for this task, and we improve fine-grained keypoint descriptor performance by taking advantage of efficient sub-pixel convolutions to upsample the descriptor feature-maps to a higher operating resolution. Through extensive experiments and ablative analysis, we show that the proposed self-supervised keypoint learning method greatly improves the quality of feature matching and homography estimation on challenging benchmarks over the state-of-the-art.
Tasks Homography Estimation, Interest Point Detection, Keypoint Detection, Visual Odometry
Published 2019-12-23
URL https://arxiv.org/abs/1912.10615v1
PDF https://arxiv.org/pdf/1912.10615v1.pdf
PWC https://paperswithcode.com/paper/neural-outlier-rejection-for-self-supervised-1
Repo https://github.com/TRI-ML/KP2D
Framework none
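
The sub-pixel convolution trick mentioned for the descriptor head is worth a small sketch: a convolution expands channels by a factor of r^2, then PixelShuffle rearranges those channels into an r-times higher-resolution map. Layer sizes below are illustrative, not KeyPointNet's exact configuration.

```python
# Sketch: sub-pixel (PixelShuffle) upsampling of a descriptor feature map.
import torch
import torch.nn as nn

r = 2  # upsampling factor
descriptor_head = nn.Sequential(
    nn.Conv2d(256, 256 * r * r, kernel_size=3, padding=1),
    nn.PixelShuffle(r),                  # (B, 256*r^2, H, W) -> (B, 256, rH, rW)
)
feats = torch.randn(1, 256, 60, 80)
desc = descriptor_head(feats)            # torch.Size([1, 256, 120, 160])
```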

TorchBeast: A PyTorch Platform for Distributed RL

Title TorchBeast: A PyTorch Platform for Distributed RL
Authors Heinrich Küttler, Nantas Nardelli, Thibaut Lavril, Marco Selvatici, Viswanath Sivakumar, Tim Rocktäschel, Edward Grefenstette
Abstract TorchBeast is a platform for reinforcement learning (RL) research in PyTorch. It implements a version of the popular IMPALA algorithm for fast, asynchronous, parallel training of RL agents. Additionally, TorchBeast has simplicity as an explicit design goal: We provide both a pure-Python implementation (“MonoBeast”) as well as a multi-machine high-performance version (“PolyBeast”). In the latter, parts of the implementation are written in C++, but all parts pertaining to machine learning are kept in simple Python using PyTorch, with the environments provided using the OpenAI Gym interface. This enables researchers to conduct scalable RL research using TorchBeast without any programming knowledge beyond Python and PyTorch. In this paper, we describe the TorchBeast design principles and implementation and demonstrate that it performs on-par with IMPALA on Atari. TorchBeast is released as an open-source package under the Apache 2.0 license and is available at \url{https://github.com/facebookresearch/torchbeast}.
Tasks
Published 2019-10-08
URL https://arxiv.org/abs/1910.03552v1
PDF https://arxiv.org/pdf/1910.03552v1.pdf
PWC https://paperswithcode.com/paper/torchbeast-a-pytorch-platform-for-distributed
Repo https://github.com/facebookresearch/mvfst-rl
Framework pytorch
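
The actor/learner split described above can be illustrated with a bare-bones torch.multiprocessing sketch: actor processes push rollouts onto a queue, and a single learner batches them for updates. This only shows the pattern, with placeholder rollouts and a placeholder loss; it is not TorchBeast's actual API or its IMPALA/V-trace loss.

```python
# Schematic actor/learner pattern, NOT TorchBeast's real implementation.
import torch
import torch.multiprocessing as mp

def actor(queue, steps=50):
    for _ in range(steps):
        # stand-in for collecting an unroll by stepping a Gym environment
        queue.put(torch.randn(20, 4))     # (unroll_length, obs_dim)

def learner(queue, batch_size=4, updates=25):
    model = torch.nn.Linear(4, 2)
    opt = torch.optim.SGD(model.parameters(), lr=1e-2)
    for _ in range(updates):
        batch = torch.stack([queue.get() for _ in range(batch_size)])
        loss = model(batch).pow(2).mean() # placeholder for the IMPALA loss
        opt.zero_grad(); loss.backward(); opt.step()

if __name__ == "__main__":
    q = mp.Queue(maxsize=32)
    actors = [mp.Process(target=actor, args=(q,)) for _ in range(2)]
    for p in actors:
        p.start()
    learner(q)
    for p in actors:
        p.join()
```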

Cluster Alignment with a Teacher for Unsupervised Domain Adaptation

Title Cluster Alignment with a Teacher for Unsupervised Domain Adaptation
Authors Zhijie Deng, Yucen Luo, Jun Zhu
Abstract Deep learning methods have shown promise in unsupervised domain adaptation, which aims to leverage a labeled source domain to learn a classifier for the unlabeled target domain with a different distribution. However, such methods typically learn a domain-invariant representation space to match the marginal distributions of the source and target domains, while ignoring their fine-level structures. In this paper, we propose Cluster Alignment with a Teacher (CAT) for unsupervised domain adaptation, which can effectively incorporate the discriminative clustering structures in both domains for better adaptation. Technically, CAT leverages an implicit ensembling teacher model to reliably discover the class-conditional structure in the feature space for the unlabeled target domain. Then CAT forces the features of both the source and the target domains to form discriminative class-conditional clusters and aligns the corresponding clusters across domains. Empirical results demonstrate that CAT achieves state-of-the-art results in several unsupervised domain adaptation scenarios.
Tasks Domain Adaptation, Unsupervised Domain Adaptation
Published 2019-03-24
URL https://arxiv.org/abs/1903.09980v2
PDF https://arxiv.org/pdf/1903.09980v2.pdf
PWC https://paperswithcode.com/paper/cluster-alignment-with-a-teacher-for
Repo https://github.com/thudzj/CAT
Framework tf
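
Two of CAT's ingredients lend themselves to a short sketch: an EMA ("implicit ensembling") teacher that pseudo-labels the target domain, and a loss pulling per-class feature centroids of the two domains together. The details below are illustrative assumptions, not the paper's exact losses.

```python
# Hedged sketch of an EMA teacher update and a cluster-alignment loss.
import torch

@torch.no_grad()
def ema_update(teacher, student, decay=0.99):
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(decay).add_(ps, alpha=1 - decay)

def cluster_alignment_loss(feat_s, y_s, feat_t, y_t_pseudo, num_classes):
    loss = 0.0
    for c in range(num_classes):
        ms, mt = (y_s == c), (y_t_pseudo == c)
        if ms.any() and mt.any():
            # pull the class-c centroids of source and target together
            loss = loss + (feat_s[ms].mean(0) - feat_t[mt].mean(0)).pow(2).sum()
    return loss / num_classes
```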

Unifying Unsupervised Domain Adaptation and Zero-Shot Visual Recognition

Title Unifying Unsupervised Domain Adaptation and Zero-Shot Visual Recognition
Authors Qian Wang, Penghui Bu, Toby P. Breckon
Abstract Unsupervised domain adaptation aims to transfer knowledge from a source domain to a target domain so that the target domain data can be recognized without any explicit labelling information for this domain. One limitation of the problem setting is that testing data from the target domain, despite having no labels, is needed during training, which prevents the trained model from being directly applied to classify unseen test instances. We formulate a new cross-domain classification problem arising from real-world scenarios where labelled data is available for a subset of classes (known classes) in the target domain, and we expect to recognize new samples belonging to any class (known and unseen classes) once the model is learned. This is a generalized zero-shot learning problem where the side information comes from the source domain in the form of labelled samples instead of the class-level semantic representations commonly used in traditional zero-shot learning. We present a unified domain adaptation framework for both unsupervised and zero-shot learning conditions. Our approach learns a joint subspace from the source and target domains so that the projections of both data in the subspace are domain invariant and easily separable. We use the supervised locality preserving projection (SLPP) as the enabling technique and conduct experiments under both unsupervised and zero-shot learning conditions, achieving state-of-the-art results on three domain adaptation benchmark datasets: Office-Caltech, Office31 and Office-Home.
Tasks Domain Adaptation, Unsupervised Domain Adaptation, Zero-Shot Learning
Published 2019-03-25
URL https://arxiv.org/abs/1903.10601v2
PDF https://arxiv.org/pdf/1903.10601v2.pdf
PWC https://paperswithcode.com/paper/unifying-unsupervised-domain-adaptation-and
Repo https://github.com/hellowangqian/domain-adaptation-capls
Framework none
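
The enabling technique named in the abstract, supervised locality preserving projection, reduces to a generalized eigenproblem on a same-class affinity graph. The sketch below is the standard LPP machinery with a label-based affinity, which may differ in details from the authors' exact variant.

```python
# Compact sketch of supervised locality preserving projection (SLPP).
import numpy as np
from scipy.linalg import eigh

def slpp(X, y, dim):
    """X: (n, d) samples, y: (n,) labels -> (d, dim) projection matrix."""
    W = (y[:, None] == y[None, :]).astype(float)   # affinity: 1 iff same class
    D = np.diag(W.sum(axis=1))
    L = D - W                                      # graph Laplacian
    A = X.T @ L @ X
    B = X.T @ D @ X + 1e-6 * np.eye(X.shape[1])    # regularize for stability
    vals, vecs = eigh(A, B)                        # generalized eigenproblem
    return vecs[:, :dim]                           # smallest eigenvalues first
```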

STAR: A Structure and Texture Aware Retinex Model

Title STAR: A Structure and Texture Aware Retinex Model
Authors Jun Xu, Yingkun Hou, Dongwei Ren, Li Liu, Fan Zhu, Mengyang Yu, Haoqian Wang, Ling Shao
Abstract Retinex theory was developed mainly to decompose an image into illumination and reflectance components by analyzing local image derivatives. In this theory, larger derivatives are attributed to changes in reflectance, while smaller derivatives arise in the smooth illumination. In this paper, we utilize exponentiated local derivatives (with an exponent {\gamma}) of an observed image to generate its structure map and texture map. The structure map is produced by amplifying the derivatives with {\gamma} > 1, while the texture map is generated by shrinking them with {\gamma} < 1. To this end, we design exponential filters for the local derivatives and demonstrate their capability to extract accurate structure and texture maps, as influenced by the choice of exponent {\gamma}. The extracted structure and texture maps are employed to regularize the illumination and reflectance components in Retinex decomposition. A novel Structure and Texture Aware Retinex (STAR) model is further proposed for illumination and reflectance decomposition of a single image. We solve the STAR model with an alternating optimization algorithm; each sub-problem is transformed into a vectorized least squares regression with a closed-form solution. Comprehensive experiments on commonly tested datasets demonstrate that the proposed STAR model produces better quantitative and qualitative performance than previous competing methods on illumination and reflectance decomposition, low-light image enhancement, and color correction. The code is publicly available at https://github.com/csjunxu/STAR.
Tasks Image Enhancement, Low-Light Image Enhancement
Published 2019-06-16
URL https://arxiv.org/abs/1906.06690v5
PDF https://arxiv.org/pdf/1906.06690v5.pdf
PWC https://paperswithcode.com/paper/star-a-structure-and-texture-aware-retinex
Repo https://github.com/csjunxu/STAR
Framework none
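
The exponentiated-derivative idea is easy to show in a few lines: the same local derivative map is raised to {\gamma} > 1 to emphasize structure and to {\gamma} < 1 to emphasize texture. This sketch illustrates the map construction only, not the full alternating STAR optimization.

```python
# Sketch: structure/texture maps from exponentiated local derivatives.
import numpy as np

def derivative_map(img, gamma):
    gx = np.abs(np.gradient(img, axis=1))
    gy = np.abs(np.gradient(img, axis=0))
    g = gx + gy
    g = g / (g.max() + 1e-8)          # normalize to [0, 1]
    return g ** gamma                 # gamma > 1 amplifies, gamma < 1 shrinks

img = np.random.rand(64, 64)          # stand-in for a grayscale observation
structure_map = derivative_map(img, gamma=1.5)
texture_map = derivative_map(img, gamma=0.5)
```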

HybridSN: Exploring 3D-2D CNN Feature Hierarchy for Hyperspectral Image Classification

Title HybridSN: Exploring 3D-2D CNN Feature Hierarchy for Hyperspectral Image Classification
Authors Swalpa Kumar Roy, Gopal Krishna, Shiv Ram Dubey, Bidyut B. Chaudhuri
Abstract Hyperspectral image (HSI) classification is widely used for the analysis of remotely sensed images. Hyperspectral imagery includes varying bands of images. The Convolutional Neural Network (CNN) is one of the most frequently used deep learning based methods for visual data processing, and its use for HSI classification is also visible in recent works. These approaches are mostly based on 2D CNNs, whereas HSI classification performance is highly dependent on both spatial and spectral information. Very few methods have utilized 3D CNNs because of their increased computational complexity. This letter proposes a Hybrid Spectral Convolutional Neural Network (HybridSN) for HSI classification. HybridSN is a spectral-spatial 3D-CNN followed by a spatial 2D-CNN. The 3D-CNN facilitates the joint spatial-spectral feature representation from a stack of spectral bands. The 2D-CNN on top of the 3D-CNN further learns a more abstract spatial representation. Moreover, the use of hybrid CNNs reduces the complexity of the model compared to a 3D-CNN alone. To test the performance of this hybrid approach, very rigorous HSI classification experiments are performed on the Indian Pines, Pavia University and Salinas Scene remote sensing datasets. The results are compared with state-of-the-art hand-crafted as well as end-to-end deep learning based methods. A very satisfactory performance is obtained using the proposed HybridSN for HSI classification. The source code can be found at \url{https://github.com/gokriznastic/HybridSN}.
Tasks Hyperspectral Image Classification, Image Classification
Published 2019-02-18
URL https://arxiv.org/abs/1902.06701v3
PDF https://arxiv.org/pdf/1902.06701v3.pdf
PWC https://paperswithcode.com/paper/hybridsn-exploring-3d-2d-cnn-feature
Repo https://github.com/gokriznastic/HybridSN
Framework tf
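
The 3D-then-2D layout is simple to sketch in PyTorch: 3D convolutions operate over the spectral-spatial cube, then the remaining spectral depth is folded into channels so a 2D convolution mixes the learned spectral features spatially. Layer counts and channel sizes below are condensed and illustrative, not the paper's exact configuration.

```python
# Condensed sketch of the HybridSN 3D-CNN -> 2D-CNN idea.
import torch
import torch.nn as nn

class HybridSketch(nn.Module):
    def __init__(self, bands=30, num_classes=16):
        super().__init__()
        self.conv3d = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=(7, 3, 3)), nn.ReLU(),
            nn.Conv3d(8, 16, kernel_size=(5, 3, 3)), nn.ReLU(),
        )
        spectral = bands - 7 + 1 - 5 + 1              # depth left after 3D convs
        self.conv2d = nn.Sequential(nn.Conv2d(16 * spectral, 64, 3), nn.ReLU())
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(64, num_classes))

    def forward(self, x):                             # x: (B, 1, bands, H, W)
        x = self.conv3d(x)                            # (B, 16, spectral, H', W')
        x = x.flatten(1, 2)                           # fold depth into channels
        return self.head(self.conv2d(x))

out = HybridSketch()(torch.randn(2, 1, 30, 25, 25))  # -> (2, 16)
```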

Political Text Scaling Meets Computational Semantics

Title Political Text Scaling Meets Computational Semantics
Authors Federico Nanni, Goran Glavas, Simone Paolo Ponzetto, Heiner Stuckenschmidt
Abstract During the last fifteen years, text scaling approaches have become a central element for the text-as-data community. However, they are based on the assumption that latent positions can be captured just by modeling word-frequency information from the different documents under study. We challenge this by presenting a new semantically aware unsupervised scaling algorithm, SemScale, which relies upon distributional representations of the documents under study. We conduct an extensive quantitative analysis over a collection of speeches from the European Parliament in five different languages and from two different legislative periods, in order to understand whether a) an approach that is aware of semantics would better capture known underlying political dimensions compared to a frequency-based scaling method, b) such positioning correlates in particular with a specific subset of linguistic traits, compared to the use of the entire text, and c) these findings hold across different languages. To support further research on this new branch of text scaling approaches, we release the employed dataset and evaluation setting, an easy-to-use online demo, and a Python implementation of SemScale.
Tasks
Published 2019-04-12
URL https://arxiv.org/abs/1904.06217v2
PDF https://arxiv.org/pdf/1904.06217v2.pdf
PWC https://paperswithcode.com/paper/political-text-scaling-meets-computational
Repo https://github.com/umanlp/SemScale
Framework none
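
To make "scaling from distributional representations" concrete, here is a deliberately simplified toy stand-in: represent each speech by its average word embedding and place documents on a 1D scale via the leading principal component. This is only meant to illustrate the general idea; it is not SemScale's actual algorithm, which is in the linked repo.

```python
# Toy semantically aware scaling baseline (NOT SemScale itself).
import numpy as np

def scale_documents(doc_embeddings):
    """doc_embeddings: (n_docs, dim) averaged word vectors -> (n_docs,) scores."""
    X = doc_embeddings - doc_embeddings.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    scores = X @ vt[0]                    # projection on the first principal axis
    return (scores - scores.min()) / (scores.ptp() + 1e-12)  # map to [0, 1]
```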

GraphX-Convolution for Point Cloud Deformation in 2D-to-3D Conversion

Title GraphX-Convolution for Point Cloud Deformation in 2D-to-3D Conversion
Authors Anh-Duc Nguyen, Seonghwa Choi, Woojae Kim, Sanghoon Lee
Abstract In this paper, we present a novel deep method to reconstruct a point cloud of an object from a single still image. Prior works in the field struggle to reconstruct an accurate and scalable 3D model due to inefficient and expensive 3D representations, a dependency between the output and the number of model parameters, or the lack of a suitable computing operation. We propose to overcome these issues by deforming a random point cloud to the object shape through two steps: feature blending and deformation. In the first step, the global and point-specific shape features extracted from a 2D object image are blended with the encoded features of a randomly generated point cloud, and this mixture is then sent to the deformation step to produce the final representative point set of the object. In the deformation process, we introduce a new layer, termed GraphX, that considers the inter-relationship between points like common graph convolutions but operates on unordered sets. Moreover, with a simple trick, the proposed model can generate an arbitrarily sized point cloud, making it the first deep method to do so. Extensive experiments verify that we outperform existing models and halve the state-of-the-art distance score in single-image 3D reconstruction.
Tasks 3D Reconstruction
Published 2019-11-15
URL https://arxiv.org/abs/1911.06600v1
PDF https://arxiv.org/pdf/1911.06600v1.pdf
PWC https://paperswithcode.com/paper/graphx-convolution-for-point-cloud-1
Repo https://github.com/justanhduc/graphx-conv
Framework pytorch
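
A GraphX-style layer can be sketched as follows: like a graph convolution, every output point mixes information from all input points, but the mixing weights are learned rather than given by a fixed adjacency, so the layer works on unordered sets (and can change the number of points). Sizes below are illustrative assumptions, not the paper's exact layer.

```python
# Hedged sketch of a GraphX-style point-mixing layer.
import torch
import torch.nn as nn

class GraphXLayer(nn.Module):
    def __init__(self, in_points, out_points, in_feats, out_feats):
        super().__init__()
        self.mix = nn.Linear(in_points, out_points, bias=False)  # learned point mixing
        self.lin = nn.Linear(in_feats, out_feats)                # per-point feature map

    def forward(self, x):                 # x: (B, in_points, in_feats)
        x = self.mix(x.transpose(1, 2))   # mix across the point dimension
        return torch.relu(self.lin(x.transpose(1, 2)))

pts = GraphXLayer(256, 512, 3, 64)(torch.randn(4, 256, 3))  # -> (4, 512, 64)
```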

Image-Adaptive GAN based Reconstruction

Title Image-Adaptive GAN based Reconstruction
Authors Shady Abu Hussein, Tom Tirer, Raja Giryes
Abstract In recent years, there has been a significant improvement in the quality of samples produced by (deep) generative models such as variational auto-encoders and generative adversarial networks. However, the representation capabilities of these methods still do not capture the full distribution of complex classes of images, such as human faces. This deficiency has been clearly observed in previous works that use pre-trained generative models to solve imaging inverse problems. In this paper, we propose to mitigate the limited representation capabilities of generators by making them image-adaptive and by enforcing compliance of the restoration with the observations via back-projections. We empirically demonstrate the advantages of our proposed approach for image super-resolution and compressed sensing.
Tasks Image Super-Resolution, Super-Resolution
Published 2019-06-12
URL https://arxiv.org/abs/1906.05284v2
PDF https://arxiv.org/pdf/1906.05284v2.pdf
PWC https://paperswithcode.com/paper/image-adaptive-gan-based-reconstruction
Repo https://github.com/shadyabh/IAGAN
Framework pytorch
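
The two ingredients, image-adaptive fine-tuning and back-projection, can be sketched schematically. For the sketch, the forward operator A is a plain matrix acting on flattened images and G maps a latent to a flattened image; all names, shapes, and hyper-parameters are illustrative assumptions, not the paper's exact procedure.

```python
# Schematic sketch of image-adaptive reconstruction with back-projection.
import torch

def adapt(G, z, y, A, steps=300, lr=1e-4):
    # make the generator image-adaptive: tune latent AND generator weights
    z = z.clone().requires_grad_(True)
    opt = torch.optim.Adam([z, *G.parameters()], lr=lr)
    for _ in range(steps):
        loss = (A @ G(z) - y).pow(2).mean()        # data fidelity in y-space
        opt.zero_grad(); loss.backward(); opt.step()
    return G(z).detach()

def back_project(x, y, A):
    # x <- x + A^+ (y - A x): smallest correction making A x agree with y
    return x + torch.linalg.pinv(A) @ (y - A @ x)
```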

Effectively Unbiased FID and Inception Score and where to find them

Title Effectively Unbiased FID and Inception Score and where to find them
Authors Min Jin Chong, David Forsyth
Abstract This paper shows that two commonly used evaluation metrics for generative models, the Fréchet Inception Distance (FID) and the Inception Score (IS), are biased – the expected value of the score computed for a finite sample set is not the true value of the score. Worse, the paper shows that the bias term depends on the particular model being evaluated, so model A may get a better score than model B simply because model A’s bias term is smaller. This effect cannot be fixed by evaluating at a fixed number of samples. This means all comparisons using FID or IS as currently computed are unreliable. We then show how to extrapolate the score to obtain an effectively bias-free estimate of scores computed with an infinite number of samples, which we term $\overline{\textrm{FID}}_\infty$ and $\overline{\textrm{IS}}_\infty$. In turn, this effectively bias-free estimate requires good estimates of scores with a finite number of samples. We show that using Quasi-Monte Carlo integration notably improves estimates of FID and IS for finite sample sets. Our extrapolated scores are simple, drop-in replacements for the finite sample scores. Additionally, we show that using low-discrepancy sequences in GAN training offers small improvements in the resulting generator.
Tasks
Published 2019-11-16
URL https://arxiv.org/abs/1911.07023v2
PDF https://arxiv.org/pdf/1911.07023v2.pdf
PWC https://paperswithcode.com/paper/effectively-unbiased-fid-and-inception-score
Repo https://github.com/mchong6/FID_IS_infinity
Framework pytorch
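
The extrapolation itself is compact: since the bias of FID decays roughly like 1/N, fit FID(N) against 1/N over several sample sizes and take the intercept as the $\overline{\textrm{FID}}_\infty$ estimate. In the sketch below, compute_fid is a placeholder for any standard FID routine, and the sample sizes are illustrative.

```python
# Sketch: extrapolate FID to an infinite-sample estimate via a fit in 1/N.
import numpy as np

def fid_infinity(compute_fid, sample_sizes=(5000, 10000, 20000, 50000)):
    ns = np.array(sample_sizes, dtype=float)
    fids = np.array([compute_fid(int(n)) for n in ns])
    slope, intercept = np.polyfit(1.0 / ns, fids, deg=1)  # FID is ~linear in 1/N
    return intercept                                       # value as N -> infinity
```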

Episodic Memory Reader: Learning What to Remember for Question Answering from Streaming Data

Title Episodic Memory Reader: Learning What to Remember for Question Answering from Streaming Data
Authors Moonsu Han, Minki Kang, Hyunwoo Jung, Sung Ju Hwang
Abstract We consider a novel question answering (QA) task where the machine needs to read from large streaming data (long documents or videos) without knowing when the questions will be given, which is difficult to solve with existing QA methods due to their lack of scalability. To tackle this problem, we propose a novel end-to-end deep network model for reading comprehension, the Episodic Memory Reader (EMR), which sequentially reads the input contexts into an external memory while replacing memories that are less important for answering \emph{unseen} questions. Specifically, we train an RL agent to replace a memory entry when the memory is full, in order to maximize its QA accuracy at a future timepoint, while encoding the external memory using either the GRU or the Transformer architecture to learn representations that consider the relative importance between memory entries. We validate our model on a synthetic dataset (bAbI) as well as real-world large-scale textual QA (TriviaQA) and video QA (TVQA) datasets, on which it achieves significant improvements over rule-based memory scheduling policies and an RL-based baseline that independently learns the query-specific importance of each memory.
Tasks Question Answering, Reading Comprehension
Published 2019-03-14
URL https://arxiv.org/abs/1903.06164v3
PDF https://arxiv.org/pdf/1903.06164v3.pdf
PWC https://paperswithcode.com/paper/episodic-memory-reader-learning-what-to
Repo https://github.com/h19920918/emr
Framework pytorch
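
The scheduling problem EMR learns can be shown with a toy sketch: when the external memory is full, score the entries and evict the least important one. EMR trains this scoring policy with RL; here the scorer is just a placeholder module to show the mechanics, not the paper's trained agent.

```python
# Toy sketch of the evict-least-important memory scheduling mechanic.
import torch
import torch.nn as nn

class EpisodicMemory:
    def __init__(self, capacity, scorer):
        self.capacity, self.scorer, self.entries = capacity, scorer, []

    def write(self, entry):               # entry: (dim,) encoded context
        self.entries.append(entry)
        if len(self.entries) > self.capacity:
            with torch.no_grad():
                mem = torch.stack(self.entries)        # (capacity+1, dim)
                scores = self.scorer(mem).squeeze(-1)  # learned importance
            self.entries.pop(int(scores.argmin()))     # evict least useful entry

memory = EpisodicMemory(capacity=64, scorer=nn.Linear(128, 1))
for t in range(200):                      # stream in more contexts than fit
    memory.write(torch.randn(128))
```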

Contextual Word Representations: A Contextual Introduction

Title Contextual Word Representations: A Contextual Introduction
Authors Noah A. Smith
Abstract This introduction aims to tell the story of how we put words into computers. It is part of the story of the field of natural language processing (NLP), a branch of artificial intelligence. It targets a wide audience with a basic understanding of computer programming, but avoids a detailed mathematical treatment, and it does not present any algorithms. It also does not focus on any particular application of NLP such as translation, question answering, or information extraction. The ideas presented here were developed by many researchers over many decades, so the citations are not exhaustive but rather direct the reader to a handful of papers that are, in the author’s view, seminal. After reading this document, you should have a general understanding of word vectors (also known as word embeddings): why they exist, what problems they solve, where they come from, how they have changed over time, and what some of the open questions about them are. Readers already familiar with word vectors are advised to skip to Section 5 for the discussion of the most recent advance, contextual word vectors.
Tasks Question Answering, Word Embeddings
Published 2019-02-15
URL http://arxiv.org/abs/1902.06006v2
PDF http://arxiv.org/pdf/1902.06006v2.pdf
PWC https://paperswithcode.com/paper/contextual-word-representations-a-contextual
Repo https://github.com/tintinrevient/methods-of-ai-research
Framework none