January 30, 2020

Paper Group ANR 447

Band-limited Training and Inference for Convolutional Neural Networks. Learning Probably Approximately Correct Maximin Strategies in Simulation-Based Games with Infinite Strategy Spaces. Active Scene Understanding via Online Semantic Reconstruction. Towards Omni-Supervised Face Alignment for Large Scale Unlabeled Videos. BARISTA: Efficient and Scal …

Band-limited Training and Inference for Convolutional Neural Networks

Title Band-limited Training and Inference for Convolutional Neural Networks
Authors Adam Dziedzic, John Paparrizos, Sanjay Krishnan, Aaron Elmore, Michael Franklin
Abstract Convolutional layers are core building blocks of neural network architectures. In general, a convolutional filter applies to the entire frequency spectrum of the input data. We explore artificially constraining the frequency spectra of these filters and data, called band-limiting, during training. The frequency domain constraints apply to both the feed-forward and back-propagation steps. Experimentally, we observe that Convolutional Neural Networks (CNNs) are resilient to this compression scheme, and the results suggest that CNNs learn to leverage lower-frequency components. In particular, we found that: (1) band-limited training can effectively control the resource usage (GPU and memory); (2) models trained with band-limited layers retain high prediction accuracy; and (3) unlike other compression schemes, band-limited training requires no modification to existing training algorithms or neural network architectures.
Tasks
Published 2019-11-21
URL https://arxiv.org/abs/1911.09287v1
PDF https://arxiv.org/pdf/1911.09287v1.pdf
PWC https://paperswithcode.com/paper/band-limited-training-and-inference-for
Repo
Framework
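
To make the idea of band-limiting concrete, here is a minimal sketch of a 1D circular convolution computed in the frequency domain, with every frequency above a cutoff set to zero. It is an illustration only, not the authors' implementation: the function names, the naive DFT, and the `keep` parameter are assumptions introduced for this example.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (O(n^2)); fine for tiny inputs."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def band_limited_conv(signal, kernel, keep):
    """Circular convolution that keeps only frequency indices with |f| <= keep.

    `keep` is a hypothetical knob for this sketch: keep = len(signal) // 2
    retains the full spectrum, keep = 0 retains only the DC component.
    """
    n = len(signal)
    padded_kernel = list(kernel) + [0.0] * (n - len(kernel))
    sig_f, ker_f = dft(signal), dft(padded_kernel)
    product = []
    for k in range(n):
        freq = min(k, n - k)  # distance of bin k from the DC bin
        product.append(sig_f[k] * ker_f[k] if freq <= keep else 0.0)
    # The mask is symmetric in k and n - k, so the result is real up to rounding.
    return [value.real for value in idft(product)]


signal = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
kernel = [1.0, 1.0]
full = band_limited_conv(signal, kernel, keep=4)     # full spectrum
lowpass = band_limited_conv(signal, kernel, keep=1)  # DC plus lowest frequency
```

Lowering `keep` shrinks the number of frequency coefficients that have to be multiplied and stored, which is the kind of resource control the abstract describes.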

Learning Probably Approximately Correct Maximin Strategies in Simulation-Based Games with Infinite Strategy Spaces

Title Learning Probably Approximately Correct Maximin Strategies in Simulation-Based Games with Infinite Strategy Spaces
Authors Alberto Marchesi, Francesco Trovò, Nicola Gatti
Abstract We tackle the problem of learning equilibria in simulation-based games. In such games, the players’ utility functions cannot be described analytically, as they are given through a black-box simulator that can be queried to obtain noisy estimates of the utilities. This is the case in many real-world games in which a complete description of the elements involved is not available upfront, such as complex military settings and online auctions. In these situations, one usually needs to run costly simulation processes to get an accurate estimate of the game outcome. As a result, solving these games begets the challenge of designing learning algorithms that can find (approximate) equilibria with high confidence, using as few simulator queries as possible. Moreover, since running the simulator during the game is unfeasible, the algorithms must first perform a pure exploration learning phase and, then, use the (approximate) equilibrium learned this way to play the game. In this work, we focus on two-player zero-sum games with infinite strategy spaces. Drawing from the best arm identification literature, we design two algorithms with theoretical guarantees to learn maximin strategies in these games. The first one works in the fixed-confidence setting, guaranteeing the desired confidence level while minimizing the number of queries. Instead, the second algorithm fits the fixed-budget setting, maximizing the confidence without exceeding the given maximum number of queries. First, we formally prove δ-PAC theoretical guarantees for our algorithms under some regularity assumptions, which are encoded by letting the utility functions be drawn from a Gaussian process. Then, we experimentally evaluate our techniques on a testbed made of randomly generated games and instances representing simple real-world security settings.
Tasks
Published 2019-11-18
URL https://arxiv.org/abs/1911.07755v2
PDF https://arxiv.org/pdf/1911.07755v2.pdf
PWC https://paperswithcode.com/paper/learning-probably-approximately-correct
Repo
Framework

Active Scene Understanding via Online Semantic Reconstruction

Title Active Scene Understanding via Online Semantic Reconstruction
Authors Lintao Zheng, Chenyang Zhu, Jiazhao Zhang, Hang Zhao, Hui Huang, Matthias Niessner, Kai Xu
Abstract We propose a novel approach to robot-operated active understanding of unknown indoor scenes, based on online RGBD reconstruction with semantic segmentation. In our method, the exploratory robot scanning is both driven by and targeted at the recognition and segmentation of semantic objects from the scene. Our algorithm is built on top of the volumetric depth fusion framework (e.g., KinectFusion) and performs real-time voxel-based semantic labeling over the online reconstructed volume. The robot is guided by an online estimated discrete viewing score field (VSF) parameterized over the 3D space of 2D location and azimuth rotation. VSF stores for each grid cell the score of the corresponding view, which measures how much it reduces the uncertainty (entropy) of both geometric reconstruction and semantic labeling. Based on VSF, we select the next best views (NBV) as the target for each time step. We then jointly optimize the traverse path and camera trajectory between two adjacent NBVs, through maximizing the integral viewing score (information gain) along the path and trajectory. Through extensive evaluation, we show that our method achieves efficient and accurate online scene parsing during exploratory scanning.
Tasks Scene Parsing, Scene Understanding, Semantic Segmentation
Published 2019-06-18
URL https://arxiv.org/abs/1906.07409v1
PDF https://arxiv.org/pdf/1906.07409v1.pdf
PWC https://paperswithcode.com/paper/active-scene-understanding-via-online
Repo
Framework
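
As a rough illustration of entropy-driven view selection, the sketch below scores each candidate view by the summed label entropy of the voxels it would observe and picks the highest-scoring one. This is a simplified proxy, not the viewing score field from the paper: the data layout (`views` mapping a view name to voxel ids, `label_probs` mapping a voxel id to class probabilities) and all function names are assumptions made for the example.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def view_score(visible_voxels, label_probs):
    """Proxy score: total label uncertainty among the voxels a view can see."""
    return sum(entropy(label_probs[v]) for v in visible_voxels)


def next_best_view(views, label_probs):
    """Return the name of the candidate view with the highest proxy score."""
    return max(views, key=lambda name: view_score(views[name], label_probs))


# Hypothetical toy scene: three voxels, two candidate views.
label_probs = {
    0: [0.5, 0.5],  # maximally uncertain voxel
    1: [1.0, 0.0],  # already certain
    2: [0.9, 0.1],  # mostly certain
}
views = {"a": [0], "b": [1, 2]}
best = next_best_view(views, label_probs)
```

In this toy scene view "a" wins because its single voxel is more uncertain than the two voxels of view "b" combined.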

Towards Omni-Supervised Face Alignment for Large Scale Unlabeled Videos

Title Towards Omni-Supervised Face Alignment for Large Scale Unlabeled Videos
Authors Congcong Zhu, Hao Liu, Zhenhua Yu, Xuehong Sun
Abstract In this paper, we propose a spatial-temporal relational reasoning networks (STRRN) approach to investigate the problem of omni-supervised face alignment in videos. Unlike existing fully supervised methods which rely on numerous annotations by hand, our learner exploits large scale unlabeled videos plus available labeled data to generate auxiliary plausible training annotations. Motivated by the fact that neighbouring facial landmarks are usually correlated and coherent across consecutive frames, our approach automatically reasons about discriminative spatial-temporal relationships among landmarks for stable face tracking. Specifically, we carefully develop an interpretable and efficient network module, which disentangles facial geometry relationships for every static frame and simultaneously enforces bi-directional cycle-consistency across adjacent frames, thus allowing the modeling of intrinsic spatial-temporal relations from raw face sequences. Extensive experimental results demonstrate that our approach surpasses the performance of most fully supervised state-of-the-art methods.
Tasks Face Alignment, Relational Reasoning
Published 2019-12-16
URL https://arxiv.org/abs/1912.07243v1
PDF https://arxiv.org/pdf/1912.07243v1.pdf
PWC https://paperswithcode.com/paper/towards-omni-supervised-face-alignment-for
Repo
Framework

BARISTA: Efficient and Scalable Serverless Serving System for Deep Learning Prediction Services

Title BARISTA: Efficient and Scalable Serverless Serving System for Deep Learning Prediction Services
Authors Anirban Bhattacharjee, Ajay Dev Chhokra, Zhuangwei Kang, Hongyang Sun, Aniruddha Gokhale, Gabor Karsai
Abstract Pre-trained deep learning models are increasingly being used to offer a variety of compute-intensive predictive analytics services such as fitness tracking, speech and image recognition. The stateless and highly parallelizable nature of deep learning models makes them well-suited for the serverless computing paradigm. However, making effective resource management decisions for these services is a hard problem due to the dynamic workloads and the diverse set of available resource configurations, each with its own deployment and management costs. To address these challenges, we present a distributed and scalable deep-learning prediction serving system called Barista and make the following contributions. First, we present a fast and effective methodology for forecasting workloads by identifying various trends. Second, we formulate an optimization problem to minimize the total cost incurred while ensuring bounded prediction latency with reasonable accuracy. Third, we propose an efficient heuristic to identify suitable compute resource configurations. Fourth, we propose an intelligent agent to allocate and manage the compute resources by horizontal and vertical scaling to maintain the required prediction latency. Finally, using representative real-world workloads for an urban transportation service, we demonstrate and validate the capabilities of Barista.
Tasks
Published 2019-04-02
URL http://arxiv.org/abs/1904.01576v2
PDF http://arxiv.org/pdf/1904.01576v2.pdf
PWC https://paperswithcode.com/paper/barista-efficient-and-scalable-serverless
Repo
Framework

Automatic discrete differentiation and its applications

Title Automatic discrete differentiation and its applications
Authors Ai Ishikawa, Takaharu Yaguchi
Abstract In this paper, a method for automatically deriving energy-preserving numerical methods for the Euler-Lagrange equation and the Hamilton equation is proposed. The derived energy-preserving scheme is based on the discrete gradient method. In the proposed approach, the discrete gradient, which is a key tool for designing the scheme, is automatically computed by an algorithm similar to automatic differentiation. Moreover, the discrete gradient coincides with the usual gradient if the two arguments required to define the discrete gradient are the same. Hence the proposed method is an extension of automatic differentiation in the sense that it derives not only the discrete gradient but also the usual gradient. Due to this feature, both energy-preserving integrators and variational (and hence symplectic) integrators can be implemented in the same program simultaneously. This allows users to freely switch between the energy-preserving numerical method and the symplectic numerical method in accordance with the problem setting and other requirements. As applications, an energy-preserving numerical scheme for a nonlinear wave equation and a training algorithm for artificial neural networks derived from an energy-dissipative numerical scheme are shown.
Tasks
Published 2019-05-21
URL https://arxiv.org/abs/1905.08604v1
PDF https://arxiv.org/pdf/1905.08604v1.pdf
PWC https://paperswithcode.com/paper/automatic-discrete-differentiation-and-its
Repo
Framework
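
The following sketch illustrates the discrete gradient method on a one-dimensional Hamiltonian H(q, p) = p²/2 + V(q) with V(q) = q⁴/4. It is not the automatic procedure the paper proposes: the discrete gradient here is expanded by hand for this particular potential, and the function names, step size, and fixed-point solver are assumptions chosen for the example. It shows the two properties named in the abstract: the discrete gradient reduces to the ordinary derivative when both arguments coincide, and the resulting scheme conserves energy.

```python
def potential(q):
    """Quartic potential V(q) = q^4 / 4 (chosen only for illustration)."""
    return q ** 4 / 4


def discrete_gradient(q, q_next):
    """(V(q_next) - V(q)) / (q_next - q), expanded by hand so it is also
    defined at q_next == q, where it equals V'(q) = q^3."""
    return (q_next ** 3 + q_next ** 2 * q + q_next * q ** 2 + q ** 3) / 4


def energy(q, p):
    return p * p / 2 + potential(q)


def step(q, p, h, iters=50):
    """One step of the implicit scheme
         (q' - q) / h = (p' + p) / 2
         (p' - p) / h = -discrete_gradient(q, q')
    solved by plain fixed-point iteration (adequate for small h)."""
    q_new, p_new = q, p
    for _ in range(iters):
        q_new, p_new = (q + h * (p + p_new) / 2,
                        p - h * discrete_gradient(q, q_new))
    return q_new, p_new


def simulate(q, p, h, steps):
    for _ in range(steps):
        q, p = step(q, p, h)
    return q, p


q0, p0 = 1.0, 0.0
q_end, p_end = simulate(q0, p0, h=0.1, steps=200)
drift = abs(energy(q_end, p_end) - energy(q0, p0))
```

Multiplying the second update equation by (q' - q) and using the first shows that H(q', p') - H(q, p) vanishes exactly, so `drift` reflects only rounding and solver tolerance.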

Feature Losses for Adversarial Robustness

Title Feature Losses for Adversarial Robustness
Authors Kirthi Shankar Sivamani
Abstract Deep learning has made tremendous advances in computer vision tasks such as image classification. However, recent studies have shown that deep learning models are vulnerable to specifically crafted adversarial inputs that are quasi-imperceptible to humans. In this work, we propose a novel approach to defending against adversarial attacks. We employ an input processing technique based on denoising autoencoders as a defense. It has been shown that the input perturbations grow and accumulate as noise in feature maps while propagating through a convolutional neural network (CNN). We exploit the noisy feature maps by using an additional subnetwork to extract image feature maps and train an auto-encoder on perceptual losses of these feature maps. This technique achieves close to state-of-the-art results on defending the MNIST and CIFAR10 datasets, but more importantly, shows a new way of employing a defense that cannot be trivially trained end-to-end by the attacker. Empirical results demonstrate the effectiveness of this approach on the MNIST and CIFAR10 datasets on simple as well as iterative LP attacks. Our method can be applied as a preprocessing technique to any off-the-shelf CNN.
Tasks Denoising, Image Classification
Published 2019-12-10
URL https://arxiv.org/abs/1912.04497v1
PDF https://arxiv.org/pdf/1912.04497v1.pdf
PWC https://paperswithcode.com/paper/feature-losses-for-adversarial-robustness
Repo
Framework

Incorporating Biological Knowledge with Factor Graph Neural Network for Interpretable Deep Learning

Title Incorporating Biological Knowledge with Factor Graph Neural Network for Interpretable Deep Learning
Authors Tianle Ma, Aidong Zhang
Abstract While deep learning has achieved great success in many fields, one common criticism of deep learning is its lack of interpretability. In most cases, the hidden units in a deep neural network do not have a clear semantic meaning or correspond to any physical entities. However, model interpretability and explainability are crucial in many biomedical applications. To address this challenge, we developed the Factor Graph Neural Network model that is interpretable and predictable by combining probabilistic graphical models with deep learning. We directly encode biological knowledge such as Gene Ontology as a factor graph into the model architecture, making the model transparent and interpretable. Furthermore, we devised an attention mechanism that can capture multi-scale hierarchical interactions among biological entities such as genes and Gene Ontology terms. With a parameter sharing mechanism, the unrolled Factor Graph Neural Network model can be trained with stochastic depth and generalize well. We applied our model to two cancer genomic datasets to predict target clinical variables and achieved better results than other traditional machine learning and deep learning models. Our model can also be used for gene set enrichment analysis and selecting Gene Ontology terms that are important to target clinical variables.
Tasks
Published 2019-06-03
URL https://arxiv.org/abs/1906.00537v1
PDF https://arxiv.org/pdf/1906.00537v1.pdf
PWC https://paperswithcode.com/paper/190600537
Repo
Framework

Dance Hit Song Prediction

Title Dance Hit Song Prediction
Authors Dorien Herremans, David Martens, Kenneth Sörensen
Abstract Record companies invest billions of dollars in new talent around the globe each year. Gaining insight into what actually makes a hit song would provide tremendous benefits for the music industry. In this research we tackle this question by focussing on the dance hit song classification problem. A database of dance hit songs from 1985 until 2013 is built, including basic musical features, as well as more advanced features that capture a temporal aspect. A number of different classifiers are used to build and test dance hit prediction models. The resulting best model has a good performance when predicting whether a song is a “top 10” dance hit versus a lower listed position.
Tasks
Published 2019-05-17
URL https://arxiv.org/abs/1905.08076v1
PDF https://arxiv.org/pdf/1905.08076v1.pdf
PWC https://paperswithcode.com/paper/dance-hit-song-prediction
Repo
Framework

Coarse Correlation in Extensive-Form Games

Title Coarse Correlation in Extensive-Form Games
Authors Gabriele Farina, Tommaso Bianchi, Tuomas Sandholm
Abstract Coarse correlation models strategic interactions of rational agents complemented by a correlation device, that is, a mediator that can recommend behavior but not enforce it. Despite being a classical concept in the theory of normal-form games for more than forty years, not much is known about the merits of coarse correlation in extensive-form settings. In this paper, we consider two instantiations of the idea of coarse correlation in extensive-form games: normal-form coarse-correlated equilibrium (NFCCE), already defined in the literature, and extensive-form coarse-correlated equilibrium (EFCCE), which we introduce for the first time. We show that EFCCE is a subset of NFCCE and a superset of the related extensive-form correlated equilibrium. We also show that, in two-player extensive-form games, social-welfare-maximizing EFCCEs and NFCCEs are bilinear saddle points, and give new efficient algorithms for the special case of games with no chance moves. In our experiments, our proposed algorithm for NFCCE is two to four orders of magnitude faster than the prior state of the art.
Tasks
Published 2019-08-26
URL https://arxiv.org/abs/1908.09893v1
PDF https://arxiv.org/pdf/1908.09893v1.pdf
PWC https://paperswithcode.com/paper/coarse-correlation-in-extensive-form-games
Repo
Framework

Dermtrainer: A Decision Support System for Dermatological Diseases

Title Dermtrainer: A Decision Support System for Dermatological Diseases
Authors Gernot Salzer, Agata Ciabattoni, Christian Fermüller, Martin Haiduk, Harald Kittler, Arno Lukas, Rosa María Rodríguez Domínguez, Antonia Wesinger, Elisabeth Riedl
Abstract Dermtrainer is a medical decision support system that assists general practitioners in diagnosing skin diseases and serves as a training platform for dermatologists. Its key components are a comprehensive dermatological knowledge base, a clinical algorithm for diagnosing skin diseases, a reasoning component for deducing the most likely differential diagnoses for a patient, and a library of high-quality images. This report describes the technical components of the system, in particular the ranking algorithm for retrieving appropriate diseases as diagnoses.
Tasks
Published 2019-07-01
URL https://arxiv.org/abs/1907.00635v1
PDF https://arxiv.org/pdf/1907.00635v1.pdf
PWC https://paperswithcode.com/paper/dermtrainer-a-decision-support-system-for
Repo
Framework

A Framework for Decoding Event-Related Potentials from Text

Title A Framework for Decoding Event-Related Potentials from Text
Authors Shaorong Yan, Aaron Steven White
Abstract We propose a novel framework for modeling event-related potentials (ERPs) collected during reading that couples pre-trained convolutional decoders with a language model. Using this framework, we compare the abilities of a variety of existing and novel sentence processing models to reconstruct ERPs. We find that modern contextual word embeddings underperform surprisal-based models but that, combined, the two outperform either on its own.
Tasks Language Modelling, Word Embeddings
Published 2019-02-27
URL http://arxiv.org/abs/1902.10296v2
PDF http://arxiv.org/pdf/1902.10296v2.pdf
PWC https://paperswithcode.com/paper/a-framework-for-decoding-event-related
Repo
Framework

Identity Crisis: Memorization and Generalization under Extreme Overparameterization

Title Identity Crisis: Memorization and Generalization under Extreme Overparameterization
Authors Chiyuan Zhang, Samy Bengio, Moritz Hardt, Michael C. Mozer, Yoram Singer
Abstract We study the interplay between memorization and generalization of overparameterized networks in the extreme case of a single training example and an identity-mapping task. We examine fully-connected and convolutional networks (FCN and CNN), both linear and nonlinear, initialized randomly and then trained to minimize the reconstruction error. The trained networks stereotypically take one of two forms: the constant function (memorization) and the identity function (generalization). We formally characterize generalization in single-layer FCNs and CNNs. We show empirically that different architectures exhibit strikingly different inductive biases. For example, CNNs of up to 10 layers are able to generalize from a single example, whereas FCNs cannot learn the identity function reliably from 60k examples. Deeper CNNs often fail, but nonetheless do astonishing work to memorize the training output: because CNN biases are location invariant, the model must progressively grow an output pattern from the image boundaries via the coordination of many layers. Our work helps to quantify and visualize the sensitivity of inductive biases to architectural choices such as depth, kernel width, and number of channels.
Tasks
Published 2019-02-13
URL https://arxiv.org/abs/1902.04698v4
PDF https://arxiv.org/pdf/1902.04698v4.pdf
PWC https://paperswithcode.com/paper/identity-crisis-memorization-and
Repo
Framework

Understanding Unconventional Preprocessors in Deep Convolutional Neural Networks for Face Identification

Title Understanding Unconventional Preprocessors in Deep Convolutional Neural Networks for Face Identification
Authors Chollette C. Olisah, Lyndon Smith
Abstract Deep networks have achieved huge successes in application domains like object and face recognition. The performance gain is attributed to different facets of the network architecture such as depth of the convolutional layers, activation function, pooling, batch normalization, forward and back propagation, and many more. However, very little emphasis is placed on the preprocessors. Therefore, in this paper, the network’s preprocessing module is varied across different preprocessing approaches while keeping the other facets of the network architecture constant, to investigate the contribution preprocessing makes to the network. Commonly used preprocessors are data augmentation and normalization, which are termed conventional preprocessors. The others are termed unconventional preprocessors: color space converters (HSV, CIE Lab* and YCBCR); grey-level resolution preprocessors (full-based and plane-based image quantization); and illumination normalization and insensitive feature preprocessing using histogram equalization (HE), local contrast normalization (LN) and complete face structural pattern (CFSP). To achieve fixed network parameters, CNNs with transfer learning are employed. Knowledge from the high-level feature vectors of the Inception-V3 network is transferred to offline preprocessed LFW target data, and features are trained using the SoftMax classifier for face identification. The experiments show that the discriminative capability of the deep networks can be improved by preprocessing RGB data with the HE, full-based and plane-based quantization, rgbGELog, and YCBCR preprocessors before feeding it to CNNs. However, for best performance, the right setup of preprocessed data with augmentation and/or normalization is required. The plane-based image quantization is found to increase the homogeneity of neighborhood pixels and utilizes reduced bit depth for better storage efficiency.
Tasks Data Augmentation, Face Identification, Face Recognition, Quantization, Transfer Learning
Published 2019-03-27
URL http://arxiv.org/abs/1904.00815v2
PDF http://arxiv.org/pdf/1904.00815v2.pdf
PWC https://paperswithcode.com/paper/understanding-unconventional-preprocessors-in
Repo
Framework

Multi-Scale Dual-Branch Fully Convolutional Network for Hand Parsing

Title Multi-Scale Dual-Branch Fully Convolutional Network for Hand Parsing
Authors Yang Lu, Xiaohui Liang, Frederick W. B. Li
Abstract Recently, fully convolutional neural networks (FCNs) have shown significant performance in image parsing, including scene parsing and object parsing. Different from generic object parsing tasks, hand parsing is more challenging due to small size, complex structure, heavy self-occlusion and ambiguous texture problems. In this paper, we propose a novel parsing framework, Multi-Scale Dual-Branch Fully Convolutional Network (MSDB-FCN), for hand parsing tasks. Our network employs a Dual-Branch architecture to extract features of the hand area, focusing attention on the hand itself. These features are used to generate multi-scale features with a pyramid pooling strategy. In order to better encode multi-scale features, we design a Deconvolution and Bilinear Interpolation Block (DB-Block) for upsampling and merging the features of different scales. To address data imbalance, which is a common problem in many computer vision tasks as well as hand parsing tasks, we propose a generalization of Focal Loss, namely Multi-Class Balanced Focal Loss, to tackle data imbalance in multi-class classification. Extensive experiments on the RHD-PARSING dataset demonstrate that our MSDB-FCN achieves state-of-the-art performance for hand parsing.
Tasks Scene Parsing
Published 2019-05-24
URL https://arxiv.org/abs/1905.10100v1
PDF https://arxiv.org/pdf/1905.10100v1.pdf
PWC https://paperswithcode.com/paper/multi-scale-dual-branch-fully-convolutional
Repo
Framework
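
For readers unfamiliar with focal loss, the sketch below shows a generic class-weighted multi-class focal loss for a single example, of the form -w_c (1 - p_c)^γ log p_c for the true class c. It is not the Multi-Class Balanced Focal Loss defined in the paper, whose exact formulation is not given in the abstract: the per-class weights, the `gamma` default, and the function names are assumptions made for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_focal_loss(logits, target, class_weights, gamma=2.0):
    """Class-weighted focal loss for one example.

    target        -- index of the true class
    class_weights -- hypothetical per-class weights (e.g. larger for rare classes)
    gamma         -- focusing parameter; gamma = 0 recovers weighted cross-entropy
    """
    p_true = softmax(logits)[target]
    return -class_weights[target] * (1.0 - p_true) ** gamma * math.log(p_true)


# A confidently correct example contributes far less than an uncertain one.
easy = weighted_focal_loss([5.0, 0.0, 0.0], target=0, class_weights=[1.0, 1.0, 1.0])
hard = weighted_focal_loss([0.0, 0.0, 0.0], target=0, class_weights=[1.0, 1.0, 1.0])
```

The (1 - p)^γ factor down-weights examples the model already classifies well, while the per-class weight lets rare classes count for more, which is how this family of losses addresses class imbalance.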