January 26, 2020

Paper Group ANR 1580

Predictive Ensemble Learning with Application to Scene Text Detection

Title Predictive Ensemble Learning with Application to Scene Text Detection
Authors Danlu Chen, Xu-Yao Zhang, Wei Zhang, Yao Lu, Xiuli Li, Tao Mei
Abstract Deep learning based approaches have achieved significant progress in different tasks like classification, detection, segmentation, and so on. Ensemble learning is widely known to further improve performance by combining multiple complementary models. It is easy to apply ensemble learning for classification tasks, for example, based on averaging, voting, or other methods. However, for other tasks (like object detection) where the outputs vary in quantity and cannot be simply compared, the ensemble of multiple models becomes difficult. In this paper, we propose a new method called Predictive Ensemble Learning (PEL), based on the powerful predictive ability of deep neural networks, to directly predict the best performing model among a pool of base models for each test example, thus transforming ensemble learning into a traditional classification task. Taking scene text detection as the application, where no suitable ensemble learning strategy exists, PEL can significantly improve the performance, compared to either individual state-of-the-art models, or the fusion of multiple models by non-maximum suppression. Experimental results show the possibility and potential of PEL in predicting different models’ performance based only on a query example, which can be extended for ensemble learning in many other complex tasks.
Tasks Object Detection, Scene Text Detection
Published 2019-05-12
URL https://arxiv.org/abs/1905.04641v2
PDF https://arxiv.org/pdf/1905.04641v2.pdf
PWC https://paperswithcode.com/paper/predictive-ensemble-learning-with-application
Repo
Framework
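
The per-example selection idea in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: `select_and_run`, `detector_a`, `detector_b`, and `toy_selector` are hypothetical names, and the selector here is a hand-written rule standing in for a trained network that predicts which base model will perform best.

```python
from typing import Callable, List, Sequence

# Hypothetical type: a base model maps an example id to a list of detections.
Detector = Callable[[str], List[str]]


def select_and_run(
    example: str,
    detectors: Sequence[Detector],
    selector: Callable[[str], Sequence[float]],
) -> List[str]:
    """Run only the base model that the selector predicts to perform best."""
    scores = selector(example)  # one predicted score per base model
    if len(scores) != len(detectors):
        raise ValueError("selector must return one score per detector")
    best = max(range(len(detectors)), key=lambda i: scores[i])
    return detectors[best](example)


# Toy stand-ins (assumptions): real base models would be text detectors.
def detector_a(example: str) -> List[str]:
    return [f"A:{example}"]


def detector_b(example: str) -> List[str]:
    return [f"B:{example}"]


def toy_selector(example: str) -> List[float]:
    # Hypothetical rule: prefer detector_b for examples tagged "curved".
    return [0.2, 0.8] if "curved" in example else [0.7, 0.3]
```

Because only the chosen model's output is returned, the variable-sized detection lists never have to be merged, which is the difficulty the abstract attributes to conventional ensembling for detection.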

A Single-MOSFET MAC for Confidence and Resolution (CORE) Driven Machine Learning Classification

Title A Single-MOSFET MAC for Confidence and Resolution (CORE) Driven Machine Learning Classification
Authors Farid Kenarangi, Inna Partin-Vaisband
Abstract Mixed-signal machine-learning classification has recently been demonstrated as an efficient alternative to classification with power-expensive digital circuits. In this paper, a high-COnfidence high-REsolution (CORE) mixed-signal classifier is proposed for classifying high-dimensional input data into a multi-class output space with less power and area than state-of-the-art classifiers. A high-resolution multiplication is facilitated within a single MOSFET by feeding the features and feature weights into, respectively, the body and gate inputs. A high-resolution classifier that considers the confidence of the individual predictors is designed at the 45 nm technology node and operates at 100 MHz in the subthreshold region. To evaluate the performance of the classifier, a reduced MNIST dataset is generated by downsampling the MNIST digit images from 28 $\times$ 28 features to 9 $\times$ 9 features. The system is simulated across a wide range of PVT variations, exhibiting a nominal accuracy of 90%, energy consumption of 6.2 pJ per classification (over 45 times lower than state-of-the-art classifiers), area of 2,179 $\mu\text{m}^{2}$ (over 7.3 times lower than state-of-the-art classifiers), and a stable response under PVT variations.
Tasks
Published 2019-10-21
URL https://arxiv.org/abs/1910.09597v1
PDF https://arxiv.org/pdf/1910.09597v1.pdf
PWC https://paperswithcode.com/paper/a-single-mosfet-mac-for-confidence-and
Repo
Framework

Optimal Algorithms for Ski Rental with Soft Machine-Learned Predictions

Title Optimal Algorithms for Ski Rental with Soft Machine-Learned Predictions
Authors Rohan Kodialam
Abstract We consider a variant of the classic Ski Rental online algorithm with applications to machine learning. In our variant, we allow the skier access to a black-box machine-learning algorithm that provides an estimate of the probability that there will be at most a threshold number of ski-days. We derive a class of optimal randomized algorithms to determine the strategy that minimizes the worst-case expected competitive ratio for the skier given a prediction from the machine learning algorithm, and analyze the performance and robustness of these algorithms.
Tasks
Published 2019-02-28
URL http://arxiv.org/abs/1903.00092v2
PDF http://arxiv.org/pdf/1903.00092v2.pdf
PWC https://paperswithcode.com/paper/optimal-algorithms-for-ski-rental-with-soft
Repo
Framework
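
For context, the classic prediction-free Ski Rental problem that the abstract above builds on can be sketched as below. This is the textbook deterministic break-even baseline, not the paper's randomized prediction-aware algorithm; the function names and the unit rental price are assumptions made for illustration.

```python
def break_even_cost(num_days: int, buy_price: int) -> int:
    """Cost of renting (1 per day) for buy_price - 1 days, then buying."""
    if num_days < buy_price:
        return num_days  # the season ended before we ever bought
    return (buy_price - 1) + buy_price


def optimal_cost(num_days: int, buy_price: int) -> int:
    """Cost of the offline optimum, which knows num_days in advance."""
    return min(num_days, buy_price)


def competitive_ratio(num_days: int, buy_price: int) -> float:
    """Ratio of the online break-even cost to the offline optimum."""
    return break_even_cost(num_days, buy_price) / optimal_cost(num_days, buy_price)


def worst_case_ratio(buy_price: int, max_days: int) -> float:
    """Largest ratio over all season lengths from 1 to max_days."""
    return max(competitive_ratio(d, buy_price) for d in range(1, max_days + 1))
```

With a buy price of `b`, the worst case for this baseline is `(2b - 1) / b`, reached whenever the season lasts at least `b` days; a prediction about the number of ski-days is what lets an algorithm aim below that bound.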

Skeleton-Based Online Action Prediction Using Scale Selection Network

Title Skeleton-Based Online Action Prediction Using Scale Selection Network
Authors Jun Liu, Amir Shahroudy, Gang Wang, Ling-Yu Duan, Alex C. Kot
Abstract Action prediction is to recognize the class label of an ongoing activity when only a part of it is observed. In this paper, we focus on online action prediction in streaming 3D skeleton sequences. A dilated convolutional network is introduced to model the motion dynamics in the temporal dimension via a sliding window over the temporal axis. Since there are significant temporal scale variations in the observed part of the ongoing action at different time steps, a novel window scale selection method is proposed to make our network focus on the performed part of the ongoing action and try to suppress the possible incoming interference from the previous actions at each step. An activation sharing scheme is also proposed to handle the overlapping computations among the adjacent time steps, which enables our framework to run more efficiently. Moreover, to enhance the performance of our framework for action prediction with the skeletal input data, a hierarchy of dilated tree convolutions is also designed to learn the multi-level structured semantic representations over the skeleton joints at each frame. Our proposed approach is evaluated on four challenging datasets. The extensive experiments demonstrate the effectiveness of our method for skeleton-based online action prediction.
Tasks Skeleton Based Action Recognition
Published 2019-02-08
URL http://arxiv.org/abs/1902.03084v2
PDF http://arxiv.org/pdf/1902.03084v2.pdf
PWC https://paperswithcode.com/paper/skeleton-based-online-action-prediction-using
Repo
Framework
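
The temporal dilated convolution mentioned in the abstract above can be illustrated with a minimal one-dimensional sketch. It assumes a causal layout with zero padding on the left and scalar features per time step; these choices, and the names `dilated_conv1d` and `receptive_field`, are illustrative assumptions rather than details taken from the paper.

```python
from typing import List, Sequence


def dilated_conv1d(
    signal: Sequence[float], kernel: Sequence[float], dilation: int
) -> List[float]:
    """Causal 1D dilated convolution with implicit zero padding on the left.

    out[t] = sum_k kernel[k] * signal[t - (K - 1 - k) * dilation]
    """
    if dilation < 1:
        raise ValueError("dilation must be >= 1")
    k_size = len(kernel)
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = t - (k_size - 1 - k) * dilation
            if idx >= 0:  # positions before the start count as zeros
                acc += weight * signal[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input steps seen by one output of a stack of dilated layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

Stacking layers with growing dilation widens the temporal window quickly: with a kernel of size 2 and dilations 1, 2, 4, a single output step covers 8 input frames.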

Scene Text Synthesis for Efficient and Effective Deep Network Training

Title Scene Text Synthesis for Efficient and Effective Deep Network Training
Authors Fangneng Zhan, Hongyuan Zhu, Shijian Lu
Abstract A large amount of annotated training images is critical for training accurate and robust deep network models, but collecting them is often time-consuming and costly. Image synthesis alleviates this constraint by generating annotated training images automatically, which has attracted increasing interest in recent deep learning research. We develop an innovative image synthesis technique that composes annotated training images by realistically embedding foreground objects of interest (OOI) into background images. The proposed technique consists of two key components that in principle boost the usefulness of the synthesized images in deep network training. The first is context-aware semantic coherence, which ensures that the OOI are placed around semantically coherent regions within the background image. The second is harmonious appearance adaptation, which ensures that the embedded OOI are agreeable to the surrounding background in terms of both geometry alignment and appearance realism. The proposed technique has been evaluated over two related but very different computer vision challenges, namely, scene text detection and scene text recognition. Experiments over a number of public datasets demonstrate the effectiveness of our proposed image synthesis technique: the use of our synthesized images in deep network training is capable of achieving similar or even better scene text detection and scene text recognition performance as compared with using real images.
Tasks Image Generation, Scene Text Detection, Scene Text Recognition
Published 2019-01-26
URL http://arxiv.org/abs/1901.09193v1
PDF http://arxiv.org/pdf/1901.09193v1.pdf
PWC https://paperswithcode.com/paper/scene-text-synthesis-for-efficient-and
Repo
Framework

Ablate, Variate, and Contemplate: Visual Analytics for Discovering Neural Architectures

Title Ablate, Variate, and Contemplate: Visual Analytics for Discovering Neural Architectures
Authors Dylan Cashman, Adam Perer, Remco Chang, Hendrik Strobelt
Abstract Deep learning models require the configuration of many layers and parameters in order to get good results. However, there are currently few systematic guidelines for how to configure a successful model. This means model builders often have to experiment with different configurations by manually programming different architectures (which is tedious and time consuming) or rely on purely automated approaches to generate and train the architectures (which is expensive). In this paper, we present Rapid Exploration of Model Architectures and Parameters, or REMAP, a visual analytics tool that allows a model builder to discover a deep learning model quickly via exploration and rapid experimentation of neural network architectures. In REMAP, the user explores the large and complex parameter space for neural network architectures using a combination of global inspection and local experimentation. Through a visual overview of a set of models, the user identifies interesting clusters of architectures. Based on their findings, the user can run ablation and variation experiments to identify the effects of adding, removing, or replacing layers in a given architecture and generate new models accordingly. They can also handcraft new models using a simple graphical interface. As a result, a model builder can build deep learning models quickly, efficiently, and without manual programming. We inform the design of REMAP through a design study with four deep learning model builders. Through a use case, we demonstrate that REMAP allows users to discover performant neural network architectures efficiently using visual exploration and user-defined semi-automated searches through the model space.
Tasks
Published 2019-07-30
URL https://arxiv.org/abs/1908.00387v1
PDF https://arxiv.org/pdf/1908.00387v1.pdf
PWC https://paperswithcode.com/paper/ablate-variate-and-contemplate-visual
Repo
Framework

Achieving Robustness in the Wild via Adversarial Mixing with Disentangled Representations

Title Achieving Robustness in the Wild via Adversarial Mixing with Disentangled Representations
Authors Sven Gowal, Chongli Qin, Po-Sen Huang, Taylan Cemgil, Krishnamurthy Dvijotham, Timothy Mann, Pushmeet Kohli
Abstract Recent research has made the surprising finding that state-of-the-art deep learning models sometimes fail to generalize to small variations of the input. Adversarial training has been shown to be an effective approach to overcome this problem. However, its application has been limited to enforcing invariance to analytically defined transformations like $\ell_p$-norm bounded perturbations. Such perturbations do not necessarily cover plausible real-world variations that preserve the semantics of the input (such as a change in lighting conditions). In this paper, we propose a novel approach to express and formalize robustness to these kinds of real-world transformations of the input. The two key ideas underlying our formulation are (1) leveraging disentangled representations of the input to define different factors of variations, and (2) generating new input images by adversarially composing the representations of different images. We use a StyleGAN model to demonstrate the efficacy of this framework. Specifically, we leverage the disentangled latent representations computed by a StyleGAN model to generate perturbations of an image that are similar to real-world variations (like adding make-up, or changing the skin-tone of a person) and train models to be invariant to these perturbations. Extensive experiments show that our method improves generalization and reduces the effect of spurious correlations (reducing the error rate of a “smile” detector by 21% for example).
Tasks
Published 2019-12-06
URL https://arxiv.org/abs/1912.03192v2
PDF https://arxiv.org/pdf/1912.03192v2.pdf
PWC https://paperswithcode.com/paper/achieving-robustness-in-the-wild-via
Repo
Framework

Calibration of fisheye camera using entrance pupil

Title Calibration of fisheye camera using entrance pupil
Authors Peter Fasogbon, Emre Aksu
Abstract Most conventional camera calibration algorithms assume that the imaging device has a Single Viewpoint (SVP). This is not necessarily true for special imaging devices such as fisheye lenses. As a consequence, the intrinsic camera calibration result is not always reliable. In this paper, we propose a new formation model that tends to relax this assumption so that a Non-Single Viewpoint (NSVP) system is corrected to always maintain an SVP, by taking into account the variation of the Entrance Pupil (EP) using thin lens modeling. In addition, we present a calibration procedure for the image formation to estimate these EP parameters using a nonlinear optimization procedure with bundle adjustment. From experiments, we are able to obtain slightly better re-projection error than traditional methods, and the camera parameters are better estimated. The proposed calibration procedure is simple and can easily be integrated into any other thin lens image formation model.
Tasks Calibration
Published 2019-07-03
URL https://arxiv.org/abs/1907.01759v1
PDF https://arxiv.org/pdf/1907.01759v1.pdf
PWC https://paperswithcode.com/paper/calibration-of-fisheye-camera-using-entrance
Repo
Framework

Learning protein conformational space by enforcing physics with convolutions and latent interpolations

Title Learning protein conformational space by enforcing physics with convolutions and latent interpolations
Authors Venkata K. Ramaswamy, Chris G. Willcocks, Matteo T. Degiacomi
Abstract Determining the different conformational states of a protein and the transition paths between them is key to fully understanding the relationship between biomolecular structure and function. This can be accomplished by sampling protein conformational space with molecular simulation methodologies. Despite advances in computing hardware and sampling techniques, simulations always yield a discretized representation of this space, with transition states undersampled proportionally to their associated energy barrier. We present a convolutional neural network that learns a continuous conformational space representation from example structures, and loss functions that ensure intermediates between examples are physically plausible. We show that this network, trained with simulations of distinct protein states, can correctly predict a biologically relevant non-linear transition path, without any example on the path provided. We also show we can transfer features learnt from one protein to others, which results in superior performance and requires a surprisingly small number of training examples.
Tasks Transfer Learning
Published 2019-10-10
URL https://arxiv.org/abs/1910.04543v2
PDF https://arxiv.org/pdf/1910.04543v2.pdf
PWC https://paperswithcode.com/paper/learning-protein-conformational-space-by
Repo
Framework

Axial Attention in Multidimensional Transformers

Title Axial Attention in Multidimensional Transformers
Authors Jonathan Ho, Nal Kalchbrenner, Dirk Weissenborn, Tim Salimans
Abstract We propose Axial Transformers, a self-attention-based autoregressive model for images and other data organized as high dimensional tensors. Existing autoregressive models either suffer from excessively large computational resource requirements for high dimensional data, or make compromises in terms of distribution expressiveness or ease of implementation in order to decrease resource requirements. Our architecture, by contrast, maintains both full expressiveness over joint distributions over data and ease of implementation with standard deep learning frameworks, while requiring reasonable memory and computation and achieving state-of-the-art results on standard generative modeling benchmarks. Our models are based on axial attention, a simple generalization of self-attention that naturally aligns with the multiple dimensions of the tensors in both the encoding and the decoding settings. Notably, the proposed structure of the layers allows for the vast majority of the context to be computed in parallel during decoding without introducing any independence assumptions. This semi-parallel structure goes a long way to making decoding from even a very large Axial Transformer broadly applicable. We demonstrate state-of-the-art results for the Axial Transformer on the ImageNet-32 and ImageNet-64 image benchmarks as well as on the BAIR Robotic Pushing video benchmark. We open source the implementation of Axial Transformers.
Tasks
Published 2019-12-20
URL https://arxiv.org/abs/1912.12180v1
PDF https://arxiv.org/pdf/1912.12180v1.pdf
PWC https://paperswithcode.com/paper/axial-attention-in-multidimensional-1
Repo
Framework
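
The core idea of axial attention described in the abstract above, attending along one tensor axis at a time, can be sketched in plain Python. This is a simplified illustration under several assumptions: attention is unmasked (no autoregressive ordering), the query, key, and value projections are the identity, and there is a single head. All names are hypothetical and this is not the released implementation.

```python
import math
from typing import List

Vector = List[float]
Grid = List[List[Vector]]  # indexed as [height][width][channels]


def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq: List[Vector]) -> List[Vector]:
    """Unmasked scaled dot-product self-attention with identity projections."""
    dim = len(seq[0])
    out = []
    for query in seq:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in seq
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * value[c] for w, value in zip(weights, seq)) for c in range(dim)]
        )
    return out


def row_attention(grid: Grid) -> Grid:
    """Each position attends only to positions in the same row."""
    return [self_attention(row) for row in grid]


def transpose(grid: Grid) -> Grid:
    return [list(column) for column in zip(*grid)]


def column_attention(grid: Grid) -> Grid:
    """Each position attends only to positions in the same column."""
    return transpose(row_attention(transpose(grid)))
```

Applying `row_attention` followed by `column_attention` lets information reach every position of an H × W grid while each attention call only handles sequences of length W or H, so the cost scales with H·W·(H + W) rather than (H·W)² for full attention over the flattened grid.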

The Unbearable Weight of Generating Artificial Errors for Grammatical Error Correction

Title The Unbearable Weight of Generating Artificial Errors for Grammatical Error Correction
Authors Phu Mon Htut, Joel Tetreault
Abstract In recent years, sequence-to-sequence models have been very effective for end-to-end grammatical error correction (GEC). As creating a human-annotated parallel corpus for GEC is expensive and time-consuming, there has been work on artificial corpus generation with the aim of creating sentences that contain realistic grammatical errors from grammatically correct sentences. In this paper, we investigate the impact of using recent neural models for generating errors to help neural models to correct errors. We conduct a battery of experiments on the effect of data size, models, and comparison with a rule-based approach.
Tasks Grammatical Error Correction
Published 2019-07-21
URL https://arxiv.org/abs/1907.08889v1
PDF https://arxiv.org/pdf/1907.08889v1.pdf
PWC https://paperswithcode.com/paper/the-unbearable-weight-of-generating
Repo
Framework

Higher-order Comparisons of Sentence Encoder Representations

Title Higher-order Comparisons of Sentence Encoder Representations
Authors Mostafa Abdou, Artur Kulmizev, Felix Hill, Daniel M. Low, Anders Søgaard
Abstract Representational Similarity Analysis (RSA) is a technique developed by neuroscientists for comparing activity patterns of different measurement modalities (e.g., fMRI, electrophysiology, behavior). As a framework, RSA has several advantages over existing approaches to interpretation of language encoders based on probing or diagnostic classification: namely, it does not require large training samples, is not prone to overfitting, and it enables a more transparent comparison between the representational geometries of different models and modalities. We demonstrate the utility of RSA by establishing a previously unknown correspondence between widely-employed pretrained language encoders and human processing difficulty via eye-tracking data, showcasing its potential in the interpretability toolbox for neural models.
Tasks Eye Tracking
Published 2019-09-01
URL https://arxiv.org/abs/1909.00303v2
PDF https://arxiv.org/pdf/1909.00303v2.pdf
PWC https://paperswithcode.com/paper/higher-order-comparisons-of-sentence-encoder
Repo
Framework
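
A first-order RSA comparison of the kind the abstract above refers to can be sketched as follows: build a representational dissimilarity matrix (RDM) for each system over the same stimuli, then correlate the two RDMs. The Euclidean distance and Pearson correlation used here are simplifying assumptions (other distance and rank-correlation choices are common), and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def euclidean_rdm(representations: Sequence[Sequence[float]]) -> Matrix:
    """Pairwise distances between the representations of each stimulus."""
    return [
        [math.dist(a, b) for b in representations] for a in representations
    ]


def upper_triangle(matrix: Matrix) -> List[float]:
    """Flatten the entries above the diagonal (the RDM is symmetric)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def rsa_score(
    reps_a: Sequence[Sequence[float]], reps_b: Sequence[Sequence[float]]
) -> float:
    """Correlate the RDMs of two systems measured on the same stimuli."""
    if len(reps_a) != len(reps_b):
        raise ValueError("both systems must cover the same stimuli")
    return pearson(
        upper_triangle(euclidean_rdm(reps_a)), upper_triangle(euclidean_rdm(reps_b))
    )
```

Because only the stimulus-by-stimulus distances are compared, the two systems may have different dimensionalities, which is what allows an encoder's vectors to be compared with, say, eye-tracking measurements, and no mapping between the two spaces has to be trained.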

Deep One-bit Compressive Autoencoding

Title Deep One-bit Compressive Autoencoding
Authors Shahin Khobahi, Arindam Bose, Mojtaba Soltanalian
Abstract Parameterized mathematical models play a central role in the understanding and design of complex information systems. However, they often cannot take into account the intricate interactions innate to such systems. In contrast, purely data-driven approaches do not need explicit mathematical models for data generation and have a wider applicability at the cost of interpretability. In this paper, we consider the design of a one-bit compressive autoencoder, and propose a novel hybrid model-based and data-driven methodology that allows us not only to design the sensing matrix for one-bit data acquisition, but also to learn the latent parameters of an iterative optimization algorithm specifically designed for the problem of one-bit sparse signal recovery. Our results demonstrate a significant improvement compared to state-of-the-art model-based algorithms.
Tasks
Published 2019-12-10
URL https://arxiv.org/abs/1912.05539v1
PDF https://arxiv.org/pdf/1912.05539v1.pdf
PWC https://paperswithcode.com/paper/deep-one-bit-compressive-autoencoding
Repo
Framework
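
The one-bit acquisition step that the abstract above refers to keeps only the sign of each linear measurement. A minimal sketch, assuming a fixed hand-picked sensing matrix (the paper learns it) and the hypothetical name `one_bit_measure`:

```python
from typing import List, Sequence


def one_bit_measure(
    sensing_matrix: Sequence[Sequence[float]], signal: Sequence[float]
) -> List[int]:
    """Return sign(A x) with entries in {-1, +1}; zero maps to +1 by convention."""
    measurements = []
    for row in sensing_matrix:
        projection = sum(a * s for a, s in zip(row, signal))
        measurements.append(1 if projection >= 0 else -1)
    return measurements
```

Scaling the signal by any positive constant leaves the measurements unchanged, so the amplitude is lost at acquisition and only the direction of a sparse signal can be recovered from such data.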

Accelerating Minibatch Stochastic Gradient Descent using Typicality Sampling

Title Accelerating Minibatch Stochastic Gradient Descent using Typicality Sampling
Authors Xinyu Peng, Li Li, Fei-Yue Wang
Abstract Machine learning, especially deep neural networks, has been rapidly developed in fields including computer vision, speech recognition and reinforcement learning. Although Mini-batch SGD is one of the most popular stochastic optimization methods in training deep networks, it shows a slow convergence rate due to the large noise in gradient approximation. In this paper, we attempt to remedy this problem by building a more efficient batch selection method based on typicality sampling, which reduces the error of gradient estimation in conventional Minibatch SGD. We analyze the convergence rate of the resulting typical batch SGD algorithm and compare convergence properties between Minibatch SGD and the algorithm. Experimental results demonstrate that our batch selection scheme works well and more complex Minibatch SGD variants can benefit from the proposed batch selection strategy.
Tasks Speech Recognition, Stochastic Optimization
Published 2019-03-11
URL http://arxiv.org/abs/1903.04192v1
PDF http://arxiv.org/pdf/1903.04192v1.pdf
PWC https://paperswithcode.com/paper/accelerating-minibatch-stochastic-gradient-1
Repo
Framework

A Closer Look at Double Backpropagation

Title A Closer Look at Double Backpropagation
Authors Christian Etmann
Abstract In recent years, an increasing number of neural network models have included derivatives with respect to inputs in their loss functions, resulting in so-called double backpropagation for first-order optimization. However, so far no general description of the involved derivatives exists. Here, we cover a wide array of special cases in a very general Hilbert space framework, which allows us to provide optimized backpropagation rules for many real-world scenarios. This includes the reduction of calculations for Frobenius-norm-penalties on Jacobians by roughly a third for locally linear activation functions. Furthermore, we provide a description of the discontinuous loss surface of ReLU networks both in the inputs and the parameters and demonstrate why the discontinuities do not pose a big problem in reality.
Tasks
Published 2019-06-16
URL https://arxiv.org/abs/1906.06637v1
PDF https://arxiv.org/pdf/1906.06637v1.pdf
PWC https://paperswithcode.com/paper/a-closer-look-at-double-backpropagation
Repo
Framework
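
To make the notion of double backpropagation in the abstract above concrete, here is a small worked sketch for a linear model with a squared-error loss plus a penalty on the squared norm of the loss gradient with respect to the input. The model, the penalty weight `lam`, and all function names are assumptions chosen so the derivatives can be written by hand; this is not the paper's Hilbert space formulation.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def total_loss(w: Sequence[float], x: Sequence[float], y: float, lam: float) -> float:
    """0.5 * r^2 + lam * ||d(0.5 * r^2)/dx||^2 with residual r = w.x - y."""
    r = dot(w, x) - y
    grad_x = [r * wi for wi in w]  # first backward pass: gradient w.r.t. the input
    return 0.5 * r * r + lam * dot(grad_x, grad_x)


def total_loss_grad_w(
    w: Sequence[float], x: Sequence[float], y: float, lam: float
) -> List[float]:
    """Gradient w.r.t. the weights, differentiating through grad_x as well."""
    r = dot(w, x) - y
    w_sq = dot(w, w)
    # The penalty equals r^2 * ||w||^2, so its derivative w.r.t. w_i is
    # 2 * r * x_i * ||w||^2 + 2 * r^2 * w_i.
    return [
        r * xi + lam * (2 * r * xi * w_sq + 2 * r * r * wi) for wi, xi in zip(w, x)
    ]


def numeric_grad_w(
    w: Sequence[float], x: Sequence[float], y: float, lam: float, eps: float = 1e-6
) -> List[float]:
    """Central finite differences, used only to check the analytic gradient."""
    grads = []
    for i in range(len(w)):
        up = list(w)
        down = list(w)
        up[i] += eps
        down[i] -= eps
        grads.append((total_loss(up, x, y, lam) - total_loss(down, x, y, lam)) / (2 * eps))
    return grads
```

The second term in `total_loss_grad_w` exists only because the penalty itself depends on a gradient; computing it in a general network is what requires a second backward pass.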