April 2, 2020

3158 words 15 mins read

Paper Group ANR 180

Few-shot Learning with Multi-scale Self-supervision. A Machine Learning Imaging Core using Separable FIR-IIR Filters. Histogram Layers for Texture Analysis. Deep Semantic Face Deblurring. Radiomic feature selection for lung cancer classifiers. On Learning Sets of Symmetric Elements. Motion Deblurring using Spatiotemporal Phase Aperture Coding. Semi …

Few-shot Learning with Multi-scale Self-supervision

Title Few-shot Learning with Multi-scale Self-supervision
Authors Hongguang Zhang, Philip H. S. Torr, Piotr Koniusz
Abstract Learning concepts from a limited number of datapoints is a challenging task usually addressed by so-called one- or few-shot learning. Recently, an application of second-order pooling in few-shot learning demonstrated superior performance, as the aggregation step handles varying image resolutions without the need to modify CNNs to fit specific image sizes, while capturing highly descriptive co-occurrences. However, using a single resolution per image (even if the resolution varies across a dataset) is suboptimal, as the importance of image contents varies across coarse-to-fine levels depending on the object and its class label, e.g., generic objects and scenes rely on their global appearance while fine-grained objects rely more on their localized texture patterns. Multi-scale representations are popular in image deblurring, super-resolution and image recognition, but they have not been investigated in few-shot learning due to its relational nature, which complicates the use of standard techniques. In this paper, we propose a novel multi-scale relation network based on the properties of second-order pooling to estimate image relations in the few-shot setting. To optimize the model, we leverage a scale selector to re-weight scale-wise representations based on their second-order features. Furthermore, we propose to apply self-supervised scale prediction. Specifically, we leverage an extra discriminator to predict the scale labels and the scale discrepancy between pairs of images. Our model achieves state-of-the-art results on standard few-shot learning datasets.
Tasks Deblurring, Few-Shot Learning, Super-Resolution
Published 2020-01-06
URL https://arxiv.org/abs/2001.01600v1
PDF https://arxiv.org/pdf/2001.01600v1.pdf
PWC https://paperswithcode.com/paper/few-shot-learning-with-multi-scale-self
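The second-order pooling the abstract builds on can be illustrated with a minimal NumPy sketch (function names and dimensions are illustrative, not from the paper): averaging outer products of local CNN features yields a fixed-size descriptor regardless of image resolution.

```python
import numpy as np

def second_order_pool(feats):
    """Aggregate a variable-size set of feature vectors into a fixed-size
    second-order (covariance-like) descriptor: the mean of outer products.

    feats: (N, d) array of d-dim features from N spatial locations;
    N may differ per image, but the output is always (d, d)."""
    return feats.T @ feats / feats.shape[0]

# Two "images" at different resolutions yield same-size descriptors.
rng = np.random.default_rng(0)
small = second_order_pool(rng.normal(size=(49, 8)))    # e.g. a 7x7 feature map
large = second_order_pool(rng.normal(size=(196, 8)))   # e.g. a 14x14 feature map
```

Both `small` and `large` are 8x8 symmetric matrices, which is what lets the relation network compare images of different scales.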

A Machine Learning Imaging Core using Separable FIR-IIR Filters

Title A Machine Learning Imaging Core using Separable FIR-IIR Filters
Authors Masayoshi Asama, Leo F. Isikdogan, Sushma Rao, Bhavin V. Nayak, Gilad Michael
Abstract We propose fixed-function neural network hardware that is designed to perform pixel-to-pixel image transformations in a highly efficient way. We use a fully trainable, fixed-topology neural network to build a model that can perform a wide variety of image processing tasks. Our model uses compressed skip lines and hybrid FIR-IIR blocks to reduce the latency and hardware footprint. Our proposed Machine Learning Imaging Core, dubbed MagIC, uses a silicon area of ~3 mm^2 (in TSMC 16nm), which is orders of magnitude smaller than a comparable pixel-wise dense prediction model. MagIC requires no DDR bandwidth, no SRAM, and practically no external memory. Each MagIC core consumes 56 mW (215 mW max power) at 500 MHz and achieves an energy-efficient throughput of 23 TOPS/W/mm^2. MagIC can be used as a multi-purpose image processing block in an imaging pipeline, approximating compute-heavy image processing applications, such as image deblurring, denoising, and colorization, within the power and silicon area limits of mobile devices.
Tasks Colorization, Deblurring, Denoising
Published 2020-01-02
URL https://arxiv.org/abs/2001.00630v1
PDF https://arxiv.org/pdf/2001.00630v1.pdf
PWC https://paperswithcode.com/paper/a-machine-learning-imaging-core-using
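A hedged sketch of the separable FIR-IIR idea (the coefficients and structure are illustrative assumptions, not MagIC's actual design): a row-wise FIR pass followed by a column-wise first-order IIR recursion produces a 2D smoothing response from very few taps, which is why such blocks are hardware-friendly.

```python
import numpy as np

def fir_horizontal(img, taps):
    """FIR pass along rows ('same'-size convolution)."""
    return np.apply_along_axis(lambda r: np.convolve(r, taps, mode="same"), 1, img)

def iir_vertical(img, a):
    """First-order IIR pass down columns: y[i] = (1-a)*x[i] + a*y[i-1].
    A single coefficient gives an infinite (exponential) impulse response."""
    out = np.empty(img.shape, dtype=float)
    out[0] = img[0]
    for i in range(1, img.shape[0]):
        out[i] = (1 - a) * img[i] + a * out[i - 1]
    return out

img = np.random.default_rng(1).random((16, 16))
smoothed = iir_vertical(fir_horizontal(img, np.array([0.25, 0.5, 0.25])), a=0.5)
```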

Histogram Layers for Texture Analysis

Title Histogram Layers for Texture Analysis
Authors Joshua Peeples, Weihuang Xu, Alina Zare
Abstract We present a histogram layer for artificial neural networks (ANNs). An essential aspect of texture analysis is the extraction of features that describe the distribution of values in local spatial regions. The proposed histogram layer directly computes the spatial distribution of features for texture analysis and parameters for the layer are estimated during backpropagation. We compare our method with state-of-the-art texture encoding methods such as the Deep Encoding Network Pooling (DEP), Deep Texture Encoding Network (DeepTEN), Fisher Vector convolutional neural network (FV-CNN), and Multi-level Texture Encoding and Representation (MuLTER) on three material/texture datasets: (1) the Describable Texture Dataset (DTD); (2) an extension of the ground terrain in outdoor scenes (GTOS-mobile); (3) and a subset of the Materials in Context (MINC-2500) dataset. Results indicate that the inclusion of the proposed histogram layer improves performance. The source code for the histogram layer is publicly available.
Tasks Texture Classification
Published 2020-01-01
URL https://arxiv.org/abs/2001.00215v6
PDF https://arxiv.org/pdf/2001.00215v6.pdf
PWC https://paperswithcode.com/paper/histogram-layers-for-texture-analysis
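A minimal sketch of a differentiable ("soft") histogram, assuming Gaussian bin memberships; the layer's actual parameterization may differ, but the key property is the same: bin centers and widths enter smoothly, so they can be estimated by backpropagation.

```python
import numpy as np

def soft_histogram(x, centers, width):
    """Differentiable histogram: each value contributes to every bin through
    a Gaussian membership, instead of a hard, non-differentiable assignment."""
    resid = x[:, None] - centers[None, :]   # (N, B) per-value, per-bin residuals
    memb = np.exp(-(resid / width) ** 2)    # soft membership in [0, 1]
    return memb.mean(axis=0)                # normalized soft count per bin

x = np.random.default_rng(2).random(1000)       # values in [0, 1)
centers = np.linspace(0.1, 0.9, 5)              # 5 learnable bin centers
h = soft_histogram(x, centers, width=0.1)
```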

Deep Semantic Face Deblurring

Title Deep Semantic Face Deblurring
Authors Ziyi Shen, Wei-Sheng Lai, Tingfa Xu, Jan Kautz, Ming-Hsuan Yang
Abstract In this paper, we propose an effective and efficient face deblurring algorithm by exploiting semantic cues via deep convolutional neural networks. As human faces are highly structured and share unified facial components (e.g., eyes and mouths), such semantic information provides a strong prior for restoration. We incorporate face semantic labels as input priors and propose an adaptive structural loss to regularize facial local structures within an end-to-end deep convolutional neural network. Specifically, we first use a coarse deblurring network to reduce the motion blur on the input face image. We then adopt a parsing network to extract the semantic features from the coarse deblurred image. Finally, the fine deblurring network utilizes the semantic information to restore a clear face image. We train the network with perceptual and adversarial losses to generate photo-realistic results. The proposed method restores sharp images with more accurate facial features and details. Quantitative and qualitative evaluations demonstrate that the proposed face deblurring algorithm performs favorably against the state-of-the-art methods in terms of restoration quality, face recognition and execution speed.
Tasks Deblurring, Face Recognition
Published 2020-01-19
URL https://arxiv.org/abs/2001.06822v1
PDF https://arxiv.org/pdf/2001.06822v1.pdf
PWC https://paperswithcode.com/paper/deep-semantic-face-deblurring-2

Radiomic feature selection for lung cancer classifiers

Title Radiomic feature selection for lung cancer classifiers
Authors Hina Shakir, Haroon Rasheed, Tariq Mairaj Rasool Khan
Abstract Machine learning methods that integrate quantitative imaging features have recently gained a lot of attention for lung nodule classification. However, there is a dearth of studies in the literature on effective feature ranking methods for classification purposes. Moreover, the optimal number of features required for the classification task also needs to be evaluated. In this study, we investigate the impact of supervised and unsupervised feature selection techniques on machine learning methods for nodule classification in Computed Tomography (CT) images. The research work explores the classification performance of Naive Bayes and Support Vector Machine (SVM) classifiers when trained with the 2, 4, 8, 12, 16 and 20 highest-ranked features from supervised and unsupervised ranking approaches. The best classification results were achieved using an SVM trained with 8 radiomic features selected by supervised feature ranking methods, with an accuracy of 100%. The study further revealed that very good nodule classification can be achieved by training either the SVM or Naive Bayes with few radiomic features. A periodic increment in the number of radiomic features from 2 to 20 did not improve the classification results, whether the selection was made using supervised or unsupervised ranking approaches.
Tasks Computed Tomography (CT), Feature Selection, Lung Nodule Classification
Published 2020-03-16
URL https://arxiv.org/abs/2003.07098v1
PDF https://arxiv.org/pdf/2003.07098v1.pdf
PWC https://paperswithcode.com/paper/radiomic-feature-selection-for-lung-cancer
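The abstract does not name its specific ranking methods, but a typical supervised ranking such as the Fisher score can be sketched as follows (the data and helper names here are illustrative, not from the paper):

```python
import numpy as np

def fisher_scores(X, y):
    """Supervised ranking: between-class separation over within-class spread,
    computed per feature (larger = more discriminative)."""
    mu = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        num += len(Xc) * (Xc.mean(axis=0) - mu) ** 2
        den += len(Xc) * Xc.var(axis=0)
    return num / (den + 1e-12)

rng = np.random.default_rng(3)
y = np.repeat([0, 1], 50)               # two nodule classes, 50 samples each
X = rng.normal(size=(100, 20))          # 20 synthetic "radiomic" features
X[:, 0] += 3 * y                        # make feature 0 strongly class-dependent
top8 = np.argsort(fisher_scores(X, y))[::-1][:8]   # keep the 8 best, as in the study
```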

On Learning Sets of Symmetric Elements

Title On Learning Sets of Symmetric Elements
Authors Haggai Maron, Or Litany, Gal Chechik, Ethan Fetaya
Abstract Learning from unordered sets is a fundamental learning setup, which is attracting increasing attention. Research in this area has focused on the case where elements of the set are represented by feature vectors, and far less emphasis has been given to the common case where set elements themselves adhere to certain symmetries. That case is relevant to numerous applications, from deblurring image bursts to multi-view 3D shape recognition and reconstruction. In this paper, we present a principled approach to learning sets of general symmetric elements. We first characterize the space of linear layers that are equivariant both to element reordering and to the inherent symmetries of elements, like translation in the case of images. We further show that networks that are composed of these layers, called Deep Sets for Symmetric elements layers (DSS), are universal approximators of both invariant and equivariant functions. DSS layers are also straightforward to implement. Finally, we show that they improve over existing set-learning architectures in a series of experiments with images, graphs, and point-clouds.
Tasks 3D Shape Recognition, Deblurring
Published 2020-02-20
URL https://arxiv.org/abs/2002.08599v1
PDF https://arxiv.org/pdf/2002.08599v1.pdf
PWC https://paperswithcode.com/paper/on-learning-sets-of-symmetric-elements
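The core DSS building block, a linear layer equivariant to element reordering, can be sketched in NumPy (shapes and weight names are illustrative; the paper additionally constrains these maps to respect per-element symmetries such as translation):

```python
import numpy as np

def dss_linear(X, W_self, W_sum):
    """Linear layer equivariant to reordering of set elements: each element
    gets its own transform plus a shared transform of the set sum."""
    return X @ W_self.T + X.sum(axis=0, keepdims=True) @ W_sum.T

rng = np.random.default_rng(4)
X = rng.normal(size=(5, 3))            # a set of 5 elements, 3 features each
W1, W2 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
out = dss_linear(X, W1, W2)

# Permuting the input elements permutes the outputs the same way (equivariance).
perm = rng.permutation(5)
assert np.allclose(dss_linear(X[perm], W1, W2), out[perm])
```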

Motion Deblurring using Spatiotemporal Phase Aperture Coding

Title Motion Deblurring using Spatiotemporal Phase Aperture Coding
Authors Shay Elmalem, Raja Giryes, Emanuel Marom
Abstract Motion blur is a known issue in photography, as it limits the exposure time while capturing moving objects. Extensive research has been carried out to compensate for it. In this work, a computational imaging approach for motion deblurring is proposed and demonstrated. Using dynamic phase-coding in the lens aperture during the image acquisition, the trajectory of the motion is encoded in an intermediate optical image. This encoding embeds both the motion direction and extent by coloring the spatial blur of each object. The color cues serve as prior information for a blind deblurring process, implemented using a convolutional neural network (CNN) trained to utilize such coding for image restoration. We demonstrate the advantage of the proposed approach over blind deblurring with no coding and other solutions that use coded acquisition, both in simulation and real-world experiments.
Tasks Deblurring, Image Restoration
Published 2020-02-18
URL https://arxiv.org/abs/2002.07483v1
PDF https://arxiv.org/pdf/2002.07483v1.pdf
PWC https://paperswithcode.com/paper/motion-deblurring-using-spatiotemporal-phase

Semi-Automatic Generation of Tight Binary Masks and Non-Convex Isosurfaces for Quantitative Analysis of 3D Biological Samples

Title Semi-Automatic Generation of Tight Binary Masks and Non-Convex Isosurfaces for Quantitative Analysis of 3D Biological Samples
Authors Sourabh Bhide, Ralf Mikut, Maria Leptin, Johannes Stegmaier
Abstract Current in vivo microscopy allows detailed spatiotemporal imaging (3D+t) of complete organisms and offers insights into their development on the cellular level. Even though imaging speed and quality are steadily improving, fully-automated segmentation and analysis methods are often not accurate enough. This is particularly true while imaging large samples (100um - 1mm) and deep inside the specimen. Drosophila embryogenesis, widely used as a developmental paradigm, presents an example of such a challenge, especially where cell outlines need to be imaged - a general challenge in other systems as well. To deal with the current bottleneck in quantitatively analyzing the 3D+t light-sheet microscopy images of Drosophila embryos, we developed a collection of semi-automatic open-source tools. The presented methods include a semi-automatic masking procedure, automatic projection of non-convex 3D isosurfaces to 2D representations, as well as cell segmentation and tracking.
Tasks Cell Segmentation
Published 2020-01-30
URL https://arxiv.org/abs/2001.11469v1
PDF https://arxiv.org/pdf/2001.11469v1.pdf
PWC https://paperswithcode.com/paper/semi-automatic-generation-of-tight-binary

Multi-Complementary and Unlabeled Learning for Arbitrary Losses and Models

Title Multi-Complementary and Unlabeled Learning for Arbitrary Losses and Models
Authors Yuzhou Cao, Yitian Xu
Abstract A weakly-supervised learning framework named complementary-label learning has been proposed recently, where each sample is equipped with a single complementary label that denotes one of the classes the sample does not belong to. However, the existing complementary-label learning methods cannot learn from the easily accessible unlabeled samples and samples with multiple complementary labels, which are more informative. In this paper, to remove these limitations, we propose a novel multi-complementary and unlabeled learning framework that allows unbiased estimation of the classification risk from samples with any number of complementary labels and unlabeled samples, for arbitrary loss functions and models. We first give an unbiased estimator of the classification risk from samples with multiple complementary labels, and then further improve the estimator by incorporating unlabeled samples into the risk formulation. The estimation error bounds show that the proposed methods achieve the optimal parametric convergence rate. Finally, experiments on both linear and deep models show the effectiveness of our methods.
Published 2020-01-13
URL https://arxiv.org/abs/2001.04243v2
PDF https://arxiv.org/pdf/2001.04243v2.pdf
PWC https://paperswithcode.com/paper/multi-complementary-and-unlabeled-learning

Attentional Speech Recognition Models Misbehave on Out-of-domain Utterances

Title Attentional Speech Recognition Models Misbehave on Out-of-domain Utterances
Authors Phillip Keung, Wei Niu, Yichao Lu, Julian Salazar, Vikas Bhardwaj
Abstract We discuss the problem of echographic transcription in autoregressive sequence-to-sequence attentional architectures for automatic speech recognition, where a model produces very long sequences of repetitive outputs when presented with out-of-domain utterances. We decode audio from the British National Corpus with an attentional encoder-decoder model trained solely on the LibriSpeech corpus. We observe that there are many 5-second recordings that produce more than 500 characters of decoding output (i.e. more than 100 characters per second). A frame-synchronous hybrid (DNN-HMM) model trained on the same data does not produce these unusually long transcripts. These decoding issues are reproducible in a speech transformer model from ESPnet, and to a lesser extent in a self-attention CTC model, suggesting that these issues are intrinsic to the use of the attention mechanism. We create a separate length prediction model to predict the correct number of wordpieces in the output, which allows us to identify and truncate problematic decoding results without increasing word error rates on the LibriSpeech task.
Tasks Speech Recognition
Published 2020-02-12
URL https://arxiv.org/abs/2002.05150v1
PDF https://arxiv.org/pdf/2002.05150v1.pdf
PWC https://paperswithcode.com/paper/attentional-speech-recognition-models
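The detection heuristic the abstract motivates — flagging decodes whose length is implausible for the audio duration — can be sketched directly (the 100 chars/sec figure comes from the abstract; the helper name is made up, and the paper's actual approach uses a learned length-prediction model rather than a fixed threshold):

```python
def flag_runaway(transcript, audio_seconds, max_chars_per_sec=100):
    """Flag a decode whose character rate is implausible for the audio length,
    as observed for attentional models on out-of-domain utterances."""
    return len(transcript) / audio_seconds > max_chars_per_sec

# A 5-second clip decoded to 600 repetitive characters is clearly a runaway.
assert flag_runaway("THE THE THE " * 50, audio_seconds=5.0)
assert not flag_runaway("HELLO WORLD", audio_seconds=5.0)
```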

Stein Self-Repulsive Dynamics: Benefits From Past Samples

Title Stein Self-Repulsive Dynamics: Benefits From Past Samples
Authors Mao Ye, Tongzheng Ren, Qiang Liu
Abstract We propose a new Stein self-repulsive dynamics for obtaining diversified samples from intractable un-normalized distributions. Our idea is to introduce Stein variational gradient as a repulsive force to push the samples of Langevin dynamics away from the past trajectories. This simple idea allows us to significantly decrease the auto-correlation in Langevin dynamics and hence increase the effective sample size. Importantly, as we establish in our theoretical analysis, the asymptotic stationary distribution remains correct even with the addition of the repulsive force, thanks to the special properties of the Stein variational gradient. We perform extensive empirical studies of our new algorithm, showing that our method yields much higher sample efficiency and better uncertainty estimation than vanilla Langevin dynamics.
Published 2020-02-21
URL https://arxiv.org/abs/2002.09070v1
PDF https://arxiv.org/pdf/2002.09070v1.pdf
PWC https://paperswithcode.com/paper/stein-self-repulsive-dynamics-benefits-from-1
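A toy sketch of the idea, assuming a plain RBF-kernel repulsive force as a simplified stand-in for the Stein variational gradient (step sizes, kernel, and names are illustrative, not the paper's algorithm):

```python
import numpy as np

def repulsive_langevin_step(grad_logp, x, past, step=0.01, alpha=0.1, h=1.0, seed=0):
    """One chain update: Langevin drift plus a kernel force pushing the
    current sample away from points on the past trajectory."""
    rng = np.random.default_rng(seed)
    diffs = x - past                                  # (M, d) offsets from past samples
    w = np.exp(-np.sum(diffs ** 2, axis=1) / h)       # RBF weights: nearer = stronger
    repulse = (w[:, None] * diffs).sum(axis=0)        # net push away from the past
    noise = rng.normal(size=x.shape)
    return x + step * grad_logp(x) + alpha * repulse + np.sqrt(2 * step) * noise

grad_logp = lambda x: -x                              # standard Gaussian target
past = np.zeros((10, 2))                              # past samples sitting at the origin
x1 = repulsive_langevin_step(grad_logp, np.array([0.1, 0.0]), past)
```

The repulsive term decorrelates the chain from where it has already been; the paper's contribution is showing that, with the Stein variational form of this force, the stationary distribution stays correct.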

Behavior Cloning in OpenAI using Case Based Reasoning

Title Behavior Cloning in OpenAI using Case Based Reasoning
Authors Chad Peters, Babak Esfandiari, Mohamad Zalat, Robert West
Abstract Learning from Observation (LfO), also known as Behavioral Cloning, is an approach for building software agents by recording the behavior of an expert (human or artificial) and using the recorded data to generate the required behavior. jLOAF is a platform that uses Case-Based Reasoning to achieve LfO. In this paper we interface jLOAF with the popular OpenAI Gym environment. Our experimental results show how our approach can be used to provide a baseline for comparison in this domain, as well as identify the strengths and weaknesses when dealing with environmental complexity.
Published 2020-02-23
URL https://arxiv.org/abs/2002.11197v1
PDF https://arxiv.org/pdf/2002.11197v1.pdf
PWC https://paperswithcode.com/paper/behavior-cloning-in-openai-using-case-based
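In its simplest form, case-based reasoning for LfO retrieves the expert's action from the most similar recorded case; a toy 1-nearest-neighbor sketch (the expert trace below is hypothetical and not from jLOAF):

```python
import numpy as np

def retrieve_action(state, case_states, case_actions):
    """Case-based action selection: return the action of the most similar
    recorded case (1-nearest-neighbor over expert state-action pairs)."""
    dists = np.linalg.norm(case_states - state, axis=1)
    return case_actions[int(np.argmin(dists))]

# A toy expert trace on a 1-D state: act 1 when x > 0, act 0 when x < 0.
case_states = np.array([[-1.0], [-0.5], [0.5], [1.0]])
case_actions = np.array([0, 0, 1, 1])
assert retrieve_action(np.array([0.8]), case_states, case_actions) == 1
assert retrieve_action(np.array([-0.9]), case_states, case_actions) == 0
```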

Deep Gated Networks: A framework to understand training and generalisation in deep learning

Title Deep Gated Networks: A framework to understand training and generalisation in deep learning
Authors Chandrashekar Lakshminarayanan, Amit Vikram Singh
Abstract Understanding the role of (stochastic) gradient descent (SGD) in the training and generalisation of deep neural networks (DNNs) with ReLU activation has been an object of study in the recent past. In this paper, we make use of deep gated networks (DGNs) as a framework to obtain insights about DNNs with ReLU activation. In DGNs, a single neuronal unit has two components, namely the pre-activation input (equal to the inner product of the layer weights and the previous layer's outputs) and a gating value which belongs to $[0,1]$; the output of the neuronal unit is the product of the pre-activation input and the gating value. The standard DNN with ReLU activation is a special case of the DGNs, wherein the gating value is $1$ or $0$ depending on whether the pre-activation input is positive or negative. We theoretically analyse and experiment with several variants of DGNs, each variant suited to understanding a particular aspect of either training or generalisation in DNNs with ReLU activation. Our theory throws light on two questions: (i) why does increasing depth up to a point help training, and (ii) why does increasing depth beyond that point hurt training? We also present experimental evidence to show that gate adaptation, i.e., the change of gating values through the course of training, is key for generalisation.
Published 2020-02-10
URL https://arxiv.org/abs/2002.03996v2
PDF https://arxiv.org/pdf/2002.03996v2.pdf
PWC https://paperswithcode.com/paper/deep-gated-networks-a-framework-to-understand
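The DGN decomposition described in the abstract is easy to verify in a sketch: with indicator gates, a gated layer reproduces a ReLU layer exactly (dimensions and names below are illustrative):

```python
import numpy as np

def dgn_layer(x, W, gate):
    """One DGN layer: output = gating value * pre-activation input.
    gate maps pre-activations to [0, 1]; ReLU is the special case
    gate(z) = 1 if z > 0 else 0."""
    z = W @ x
    return gate(z) * z

rng = np.random.default_rng(5)
W = rng.normal(size=(4, 4))
x = rng.normal(size=4)

relu_gate = lambda z: (z > 0).astype(float)
# With indicator gates the DGN layer equals a ReLU layer exactly.
assert np.allclose(dgn_layer(x, W, relu_gate), np.maximum(W @ x, 0))
```

Keeping the gate as a separate object is what lets the paper study gate adaptation independently of the pre-activation path.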

Beyond the Camera: Neural Networks in World Coordinates

Title Beyond the Camera: Neural Networks in World Coordinates
Authors Gunnar A. Sigurdsson, Abhinav Gupta, Cordelia Schmid, Karteek Alahari
Abstract Eye movement and strategic placement of the visual field onto the retina give animals increased resolution of the scene and suppress distracting information. This fundamental system has been missing from video understanding with deep networks, typically limited to 224 by 224 pixel content locked to the camera frame. We propose a simple idea, WorldFeatures, where each feature at every layer has a spatial transformation, and the feature map is only transformed as needed. We show that a network built with these WorldFeatures can be used to model eye movements, such as saccades, fixation, and smooth pursuit, even in a batch setting on pre-recorded video. That is, the network can, for example, use all 224 by 224 pixels to look at a small detail one moment, and the whole scene the next. We show that typical building blocks, such as convolutions and pooling, can be adapted to support WorldFeatures using available tools. Experiments are presented on the Charades, Olympic Sports, and Caltech-UCSD Birds-200-2011 datasets, exploring action recognition, fine-grained recognition, and video stabilization.
Tasks Video Understanding
Published 2020-03-12
URL https://arxiv.org/abs/2003.05614v1
PDF https://arxiv.org/pdf/2003.05614v1.pdf
PWC https://paperswithcode.com/paper/beyond-the-camera-neural-networks-in-world

Classifying All Interacting Pairs in a Single Shot

Title Classifying All Interacting Pairs in a Single Shot
Authors Sanaa Chafik, Astrid Orcesi, Romaric Audigier, Bertrand Luvison
Abstract In this paper, we introduce a novel human interaction detection approach, based on CALIPSO (Classifying ALl Interacting Pairs in a Single shOt), a classifier of human-object interactions. This new single-shot interaction classifier estimates interactions simultaneously for all human-object pairs, regardless of their number and class. State-of-the-art approaches adopt a multi-shot strategy based on a pairwise estimate of interactions for a set of human-object candidate pairs, which leads to a complexity depending, at least, on the number of interactions or, at most, on the number of candidate pairs. In contrast, the proposed method estimates the interactions on the whole image. Indeed, it simultaneously estimates all interactions between all human subjects and object targets by performing a single forward pass throughout the image. Consequently, it leads to a constant complexity and computation time independent of the number of subjects, objects or interactions in the image. In detail, interaction classification is achieved on a dense grid of anchors thanks to a joint multi-task network that learns three complementary tasks simultaneously: (i) prediction of the types of interaction, (ii) estimation of the presence of a target and (iii) learning of an embedding which maps interacting subject and target to a same representation, by using a metric learning strategy. In addition, we introduce an object-centric passive-voice verb estimation which significantly improves results. Evaluations on the two well-known Human-Object Interaction image datasets, V-COCO and HICO-DET, demonstrate the competitiveness of the proposed method (2nd place) compared to the state-of-the-art while having constant computation time regardless of the number of objects and interactions in the image.
Tasks Human-Object Interaction Detection, Metric Learning
Published 2020-01-13
URL https://arxiv.org/abs/2001.04360v1
PDF https://arxiv.org/pdf/2001.04360v1.pdf
PWC https://paperswithcode.com/paper/classifying-all-interacting-pairs-in-a-single