January 26, 2020


Paper Group ANR 1580

Predictive Ensemble Learning with Application to Scene Text Detection. A Single-MOSFET MAC for Confidence and Resolution (CORE) Driven Machine Learning Classification. Optimal Algorithms for Ski Rental with Soft Machine-Learned Predictions. Skeleton-Based Online Action Prediction Using Scale Selection Network. Scene Text Synthesis for Efficient and …

Predictive Ensemble Learning with Application to Scene Text Detection


Title	Predictive Ensemble Learning with Application to Scene Text Detection
Authors	Danlu Chen, Xu-Yao Zhang, Wei Zhang, Yao Lu, Xiuli Li, Tao Mei
Abstract	Deep learning based approaches have achieved significant progress in different tasks like classification, detection, segmentation, and so on. Ensemble learning is widely known to further improve performance by combining multiple complementary models. It is easy to apply ensemble learning for classification tasks, for example, based on averaging, voting, or other methods. However, for other tasks (like object detection) where the outputs vary in quantity and cannot be simply compared, the ensemble of multiple models becomes difficult. In this paper, we propose a new method called Predictive Ensemble Learning (PEL), based on the powerful predictive ability of deep neural networks, to directly predict the best performing model among a pool of base models for each test example, thus transforming ensemble learning to a traditional classification task. Taking scene text detection as the application, where no suitable ensemble learning strategy exists, PEL can significantly improve the performance, compared to either individual state-of-the-art models, or the fusion of multiple models by non-maximum suppression. Experimental results show the possibility and potential of PEL in predicting different models’ performance based only on a query example, which can be extended for ensemble learning in many other complex tasks.
Tasks	Object Detection, Scene Text Detection
Published	2019-05-12
URL	https://arxiv.org/abs/1905.04641v2
PDF	https://arxiv.org/pdf/1905.04641v2.pdf
PWC	https://paperswithcode.com/paper/predictive-ensemble-learning-with-application
Repo
Framework
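
The selection idea in the abstract above (label each training example with its best-performing base model, then learn to predict that label) can be sketched as follows. This is only an illustration under stated assumptions: the paper uses a deep network as the predictor, whereas this sketch swaps in a nearest-centroid classifier, and the feature vectors, scores, and function names are all hypothetical.

```python
from collections import defaultdict


def best_model_labels(scores):
    """scores[i][m] is the quality of base model m on example i (toy values)."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


def fit_centroids(features, labels):
    """Stand-in predictor: one mean feature vector per best-model label."""
    sums, counts = {}, defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def select_model(centroids, x):
    """Pick the base model whose centroid is closest to the query features."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))


# Hypothetical image features and per-model detection scores.
train_features = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
train_scores = [[0.9, 0.4], [0.8, 0.5], [0.3, 0.7], [0.2, 0.9]]

labels = best_model_labels(train_scores)
centroids = fit_centroids(train_features, labels)
chosen = select_model(centroids, [0.85, 0.9])
```

At test time only the chosen base model would be run on the query image, so the outputs of different detectors never need to be merged.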

A Single-MOSFET MAC for Confidence and Resolution (CORE) Driven Machine Learning Classification


Title	A Single-MOSFET MAC for Confidence and Resolution (CORE) Driven Machine Learning Classification
Authors	Farid Kenarangi, Inna Partin-Vaisband
Abstract	Mixed-signal machine-learning classification has recently been demonstrated as an efficient alternative for classification with power-expensive digital circuits. In this paper, a high-COnfidence high-REsolution (CORE) mixed-signal classifier is proposed for classifying high-dimensional input data into multi-class output space with less power and area than state-of-the-art classifiers. A high-resolution multiplication is facilitated within a single-MOSFET by feeding the features and feature weights into, respectively, the body and gate inputs. A high-resolution classifier that considers the confidence of the individual predictors is designed at the 45 nm technology node and operates at 100 MHz in the subthreshold region. To evaluate the performance of the classifier, a reduced MNIST dataset is generated by downsampling the MNIST digit images from 28 $\times$ 28 features to 9 $\times$ 9 features. The system is simulated across a wide range of PVT variations, exhibiting nominal accuracy of 90%, energy consumption of 6.2 pJ per classification (over 45 times lower than state-of-the-art classifiers), area of 2,179 $\mu m^{2}$ (over 7.3 times lower than state-of-the-art classifiers), and a stable response under PVT variations.
Tasks
Published	2019-10-21
URL	https://arxiv.org/abs/1910.09597v1
PDF	https://arxiv.org/pdf/1910.09597v1.pdf
PWC	https://paperswithcode.com/paper/a-single-mosfet-mac-for-confidence-and
Repo
Framework

Optimal Algorithms for Ski Rental with Soft Machine-Learned Predictions


Title	Optimal Algorithms for Ski Rental with Soft Machine-Learned Predictions
Authors	Rohan Kodialam
Abstract	We consider a variant of the classic Ski Rental online algorithm with applications to machine learning. In our variant, we allow the skier access to a black-box machine-learning algorithm that provides an estimate of the probability that there will be at most a threshold number of ski-days. We derive a class of optimal randomized algorithms to determine the strategy that minimizes the worst-case expected competitive ratio for the skier given a prediction from the machine learning algorithm, and analyze the performance and robustness of these algorithms.
Tasks
Published	2019-02-28
URL	http://arxiv.org/abs/1903.00092v2
PDF	http://arxiv.org/pdf/1903.00092v2.pdf
PWC	https://paperswithcode.com/paper/optimal-algorithms-for-ski-rental-with-soft
Repo
Framework
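
For readers unfamiliar with the underlying problem, the sketch below covers only the classic prediction-free setting, not the randomized prediction-based algorithms the paper derives. It assumes renting costs 1 per day and buying costs a fixed `buy_price`, and it compares deterministic "buy on day k" strategies by their worst-case competitive ratio. All names are illustrative.

```python
from fractions import Fraction


def strategy_cost(buy_day, season_days, buy_price):
    """Rent (1 per day) before buy_day, then buy on buy_day if still skiing."""
    if season_days < buy_day:
        return season_days
    return (buy_day - 1) + buy_price


def optimal_cost(season_days, buy_price):
    """Offline optimum: knows the season length in advance."""
    return min(season_days, buy_price)


def worst_case_ratio(buy_day, buy_price, horizon):
    """Largest cost ratio against the offline optimum over all season lengths."""
    return max(
        Fraction(strategy_cost(buy_day, n, buy_price), optimal_cost(n, buy_price))
        for n in range(1, horizon + 1)
    )


buy_price = 10
ratios = {k: worst_case_ratio(k, buy_price, horizon=100) for k in range(1, 21)}
best_day = min(ratios, key=ratios.get)
```

With these toy numbers the break-even strategy (buy on day 10) gives the smallest worst-case ratio, 19/10, which is the familiar 2 - 1/b bound for deterministic strategies.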

Skeleton-Based Online Action Prediction Using Scale Selection Network


Title	Skeleton-Based Online Action Prediction Using Scale Selection Network
Authors	Jun Liu, Amir Shahroudy, Gang Wang, Ling-Yu Duan, Alex C. Kot
Abstract	Action prediction is to recognize the class label of an ongoing activity when only a part of it is observed. In this paper, we focus on online action prediction in streaming 3D skeleton sequences. A dilated convolutional network is introduced to model the motion dynamics in temporal dimension via a sliding window over the temporal axis. Since there are significant temporal scale variations in the observed part of the ongoing action at different time steps, a novel window scale selection method is proposed to make our network focus on the performed part of the ongoing action and try to suppress the possible incoming interference from the previous actions at each step. An activation sharing scheme is also proposed to handle the overlapping computations among the adjacent time steps, which enables our framework to run more efficiently. Moreover, to enhance the performance of our framework for action prediction with the skeletal input data, a hierarchy of dilated tree convolutions is also designed to learn the multi-level structured semantic representations over the skeleton joints at each frame. Our proposed approach is evaluated on four challenging datasets. The extensive experiments demonstrate the effectiveness of our method for skeleton-based online action prediction.
Tasks	Skeleton Based Action Recognition
Published	2019-02-08
URL	http://arxiv.org/abs/1902.03084v2
PDF	http://arxiv.org/pdf/1902.03084v2.pdf
PWC	https://paperswithcode.com/paper/skeleton-based-online-action-prediction-using
Repo
Framework

Scene Text Synthesis for Efficient and Effective Deep Network Training


Title	Scene Text Synthesis for Efficient and Effective Deep Network Training
Authors	Fangneng Zhan, Hongyuan Zhu, Shijian Lu
Abstract	A large amount of annotated training images is critical for training accurate and robust deep network models, but the collection of a large amount of annotated training images is often time-consuming and costly. Image synthesis alleviates this constraint by generating annotated training images automatically by machine, which has attracted increasing interest in recent deep learning research. We develop an innovative image synthesis technique that composes annotated training images by realistically embedding foreground objects of interest (OOI) into background images. The proposed technique consists of two key components that in principle boost the usefulness of the synthesized images in deep network training. The first is context-aware semantic coherence which ensures that the OOI are placed around semantically coherent regions within the background image. The second is harmonious appearance adaptation which ensures that the embedded OOI are agreeable to the surrounding background from both geometry alignment and appearance realism. The proposed technique has been evaluated over two related but very different computer vision challenges, namely, scene text detection and scene text recognition. Experiments over a number of public datasets demonstrate the effectiveness of our proposed image synthesis technique - the use of our synthesized images in deep network training is capable of achieving similar or even better scene text detection and scene text recognition performance as compared with using real images.
Tasks	Image Generation, Scene Text Detection, Scene Text Recognition
Published	2019-01-26
URL	http://arxiv.org/abs/1901.09193v1
PDF	http://arxiv.org/pdf/1901.09193v1.pdf
PWC	https://paperswithcode.com/paper/scene-text-synthesis-for-efficient-and
Repo
Framework

Ablate, Variate, and Contemplate: Visual Analytics for Discovering Neural Architectures


Title	Ablate, Variate, and Contemplate: Visual Analytics for Discovering Neural Architectures
Authors	Dylan Cashman, Adam Perer, Remco Chang, Hendrik Strobelt
Abstract	Deep learning models require the configuration of many layers and parameters in order to get good results. However, there are currently few systematic guidelines for how to configure a successful model. This means model builders often have to experiment with different configurations by manually programming different architectures (which is tedious and time consuming) or rely on purely automated approaches to generate and train the architectures (which is expensive). In this paper, we present Rapid Exploration of Model Architectures and Parameters, or REMAP, a visual analytics tool that allows a model builder to discover a deep learning model quickly via exploration and rapid experimentation of neural network architectures. In REMAP, the user explores the large and complex parameter space for neural network architectures using a combination of global inspection and local experimentation. Through a visual overview of a set of models, the user identifies interesting clusters of architectures. Based on their findings, the user can run ablation and variation experiments to identify the effects of adding, removing, or replacing layers in a given architecture and generate new models accordingly. They can also handcraft new models using a simple graphical interface. As a result, a model builder can build deep learning models quickly, efficiently, and without manual programming. We inform the design of REMAP through a design study with four deep learning model builders. Through a use case, we demonstrate that REMAP allows users to discover performant neural network architectures efficiently using visual exploration and user-defined semi-automated searches through the model space.
Tasks
Published	2019-07-30
URL	https://arxiv.org/abs/1908.00387v1
PDF	https://arxiv.org/pdf/1908.00387v1.pdf
PWC	https://paperswithcode.com/paper/ablate-variate-and-contemplate-visual
Repo
Framework

Achieving Robustness in the Wild via Adversarial Mixing with Disentangled Representations


Title	Achieving Robustness in the Wild via Adversarial Mixing with Disentangled Representations
Authors	Sven Gowal, Chongli Qin, Po-Sen Huang, Taylan Cemgil, Krishnamurthy Dvijotham, Timothy Mann, Pushmeet Kohli
Abstract	Recent research has made the surprising finding that state-of-the-art deep learning models sometimes fail to generalize to small variations of the input. Adversarial training has been shown to be an effective approach to overcome this problem. However, its application has been limited to enforcing invariance to analytically defined transformations like $\ell_p$-norm bounded perturbations. Such perturbations do not necessarily cover plausible real-world variations that preserve the semantics of the input (such as a change in lighting conditions). In this paper, we propose a novel approach to express and formalize robustness to these kinds of real-world transformations of the input. The two key ideas underlying our formulation are (1) leveraging disentangled representations of the input to define different factors of variations, and (2) generating new input images by adversarially composing the representations of different images. We use a StyleGAN model to demonstrate the efficacy of this framework. Specifically, we leverage the disentangled latent representations computed by a StyleGAN model to generate perturbations of an image that are similar to real-world variations (like adding make-up, or changing the skin-tone of a person) and train models to be invariant to these perturbations. Extensive experiments show that our method improves generalization and reduces the effect of spurious correlations (reducing the error rate of a “smile” detector by 21% for example).
Tasks
Published	2019-12-06
URL	https://arxiv.org/abs/1912.03192v2
PDF	https://arxiv.org/pdf/1912.03192v2.pdf
PWC	https://paperswithcode.com/paper/achieving-robustness-in-the-wild-via
Repo
Framework

Calibration of fisheye camera using entrance pupil


Title	Calibration of fisheye camera using entrance pupil
Authors	Peter Fasogbon, Emre Aksu
Abstract	Most conventional camera calibration algorithms assume that the imaging device has a Single Viewpoint (SVP). This is not necessarily true for special imaging devices such as fisheye lenses. As a consequence, the intrinsic camera calibration result is not always reliable. In this paper, we propose a new formation model that tends to relax this assumption so that a Non-Single Viewpoint (NSVP) system is corrected to always maintain a SVP, by taking into account the variation of the Entrance Pupil (EP) using thin lens modeling. In addition, we present a calibration procedure for the image formation to estimate these EP parameters using a nonlinear optimization procedure with bundle adjustment. From experiments, we are able to obtain slightly better re-projection error than traditional methods, and the camera parameters are better estimated. The proposed calibration procedure is simple and can easily be integrated into any other thin lens image formation model.
Tasks	Calibration
Published	2019-07-03
URL	https://arxiv.org/abs/1907.01759v1
PDF	https://arxiv.org/pdf/1907.01759v1.pdf
PWC	https://paperswithcode.com/paper/calibration-of-fisheye-camera-using-entrance
Repo
Framework

Learning protein conformational space by enforcing physics with convolutions and latent interpolations


Title	Learning protein conformational space by enforcing physics with convolutions and latent interpolations
Authors	Venkata K. Ramaswamy, Chris G. Willcocks, Matteo T. Degiacomi
Abstract	Determining the different conformational states of a protein and the transition paths between them is key to fully understanding the relationship between biomolecular structure and function. This can be accomplished by sampling protein conformational space with molecular simulation methodologies. Despite advances in computing hardware and sampling techniques, simulations always yield a discretized representation of this space, with transition states undersampled proportionally to their associated energy barrier. We present a convolutional neural network that learns a continuous conformational space representation from example structures, and loss functions that ensure intermediates between examples are physically plausible. We show that this network, trained with simulations of distinct protein states, can correctly predict a biologically relevant non-linear transition path, without any example on the path provided. We also show we can transfer features learnt from one protein to others, which results in superior performance, and requires a surprisingly small number of training examples.
Tasks	Transfer Learning
Published	2019-10-10
URL	https://arxiv.org/abs/1910.04543v2
PDF	https://arxiv.org/pdf/1910.04543v2.pdf
PWC	https://paperswithcode.com/paper/learning-protein-conformational-space-by
Repo
Framework

Axial Attention in Multidimensional Transformers


Title	Axial Attention in Multidimensional Transformers
Authors	Jonathan Ho, Nal Kalchbrenner, Dirk Weissenborn, Tim Salimans
Abstract	We propose Axial Transformers, a self-attention-based autoregressive model for images and other data organized as high dimensional tensors. Existing autoregressive models either suffer from excessively large computational resource requirements for high dimensional data, or make compromises in terms of distribution expressiveness or ease of implementation in order to decrease resource requirements. Our architecture, by contrast, maintains both full expressiveness over joint distributions over data and ease of implementation with standard deep learning frameworks, while requiring reasonable memory and computation and achieving state-of-the-art results on standard generative modeling benchmarks. Our models are based on axial attention, a simple generalization of self-attention that naturally aligns with the multiple dimensions of the tensors in both the encoding and the decoding settings. Notably, the proposed structure of the layers allows for the vast majority of the context to be computed in parallel during decoding without introducing any independence assumptions. This semi-parallel structure goes a long way to making decoding from even a very large Axial Transformer broadly applicable. We demonstrate state-of-the-art results for the Axial Transformer on the ImageNet-32 and ImageNet-64 image benchmarks as well as on the BAIR Robotic Pushing video benchmark. We open source the implementation of Axial Transformers.
Tasks
Published	2019-12-20
URL	https://arxiv.org/abs/1912.12180v1
PDF	https://arxiv.org/pdf/1912.12180v1.pdf
PWC	https://paperswithcode.com/paper/axial-attention-in-multidimensional-1
Repo
Framework
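
As a rough illustration of the axial attention idea named in the abstract above, the sketch below applies plain dot-product self-attention along one axis of a 2D grid at a time: first within each row, then within each column. It is not the authors' released implementation. It assumes a single head, no learned projections (queries, keys, and values are the inputs themselves), and no autoregressive masking, and all function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(seq):
    """Unmasked dot-product self-attention over a list of vectors (q = k = v)."""
    d = len(seq[0])
    out = []
    for q in seq:
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        weights = softmax(logits)
        out.append([sum(w * v[j] for w, v in zip(weights, seq)) for j in range(d)])
    return out


def row_attention(grid):
    """Each position attends only to positions in its own row."""
    return [attend(row) for row in grid]


def column_attention(grid):
    """Each position attends only to positions in its own column."""
    cols = [attend(list(col)) for col in zip(*grid)]
    return [list(row) for row in zip(*cols)]


# Toy 2 x 3 grid of 2-dimensional vectors.
grid = [[[float(r), float(c)] for c in range(3)] for r in range(2)]
mixed = column_attention(row_attention(grid))
```

For an H x W grid, full self-attention compares (HW)^2 position pairs, while one row pass plus one column pass compares HW(H + W) pairs, which is where the memory and computation savings come from.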

The Unbearable Weight of Generating Artificial Errors for Grammatical Error Correction


Title	The Unbearable Weight of Generating Artificial Errors for Grammatical Error Correction
Authors	Phu Mon Htut, Joel Tetreault
Abstract	In recent years, sequence-to-sequence models have been very effective for end-to-end grammatical error correction (GEC). As creating a human-annotated parallel corpus for GEC is expensive and time-consuming, there has been work on artificial corpus generation with the aim of creating sentences that contain realistic grammatical errors from grammatically correct sentences. In this paper, we investigate the impact of using recent neural models for generating errors to help neural models to correct errors. We conduct a battery of experiments on the effect of data size, models, and comparison with a rule-based approach.
Tasks	Grammatical Error Correction
Published	2019-07-21
URL	https://arxiv.org/abs/1907.08889v1
PDF	https://arxiv.org/pdf/1907.08889v1.pdf
PWC	https://paperswithcode.com/paper/the-unbearable-weight-of-generating
Repo
Framework

Higher-order Comparisons of Sentence Encoder Representations


Title	Higher-order Comparisons of Sentence Encoder Representations
Authors	Mostafa Abdou, Artur Kulmizev, Felix Hill, Daniel M. Low, Anders Søgaard
Abstract	Representational Similarity Analysis (RSA) is a technique developed by neuroscientists for comparing activity patterns of different measurement modalities (e.g., fMRI, electrophysiology, behavior). As a framework, RSA has several advantages over existing approaches to interpretation of language encoders based on probing or diagnostic classification: namely, it does not require large training samples, is not prone to overfitting, and it enables a more transparent comparison between the representational geometries of different models and modalities. We demonstrate the utility of RSA by establishing a previously unknown correspondence between widely-employed pretrained language encoders and human processing difficulty via eye-tracking data, showcasing its potential in the interpretability toolbox for neural models.
Tasks	Eye Tracking
Published	2019-09-01
URL	https://arxiv.org/abs/1909.00303v2
PDF	https://arxiv.org/pdf/1909.00303v2.pdf
PWC	https://paperswithcode.com/paper/higher-order-comparisons-of-sentence-encoder
Repo
Framework
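
A first-order RSA comparison as described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the paper's pipeline: it assumes Euclidean distance for the dissimilarity matrices and Pearson correlation for comparing them (other choices, such as rank correlation, are common), and the toy representations and function names are made up.

```python
import math
from itertools import combinations


def rdm_upper(reps):
    """Upper triangle of the pairwise Euclidean dissimilarity matrix."""
    return [math.dist(a, b) for a, b in combinations(reps, 2)]


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of floats."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def rsa_score(reps_a, reps_b):
    """Similarity of two representational geometries over the same items."""
    return pearson(rdm_upper(reps_a), rdm_upper(reps_b))


# Four items as seen by two hypothetical encoders; the second has a
# different dimensionality and scale but the same geometry.
model_a = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
model_b = [[2 * x for x in v] + [0.0] for v in model_a]
score = rsa_score(model_a, model_b)
```

Because only item-by-item dissimilarities are compared, the two representations do not need the same dimensionality, and nothing is trained, which matches the advantages over probing that the abstract lists.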

Deep One-bit Compressive Autoencoding


Title	Deep One-bit Compressive Autoencoding
Authors	Shahin Khobahi, Arindam Bose, Mojtaba Soltanalian
Abstract	Parameterized mathematical models play a central role in understanding and design of complex information systems. However, they often cannot take into account the intricate interactions innate to such systems. On the contrary, purely data-driven approaches do not need explicit mathematical models for data generation and have a wider applicability at the cost of interpretability. In this paper, we consider the design of a one-bit compressive autoencoder, and propose a novel hybrid model-based and data-driven methodology that allows us to not only design the sensing matrix for one-bit data acquisition, but also allows for learning the latent-parameters of an iterative optimization algorithm specifically designed for the problem of one-bit sparse signal recovery. Our results demonstrate a significant improvement compared to state-of-the-art model-based algorithms.
Tasks
Published	2019-12-10
URL	https://arxiv.org/abs/1912.05539v1
PDF	https://arxiv.org/pdf/1912.05539v1.pdf
PWC	https://paperswithcode.com/paper/deep-one-bit-compressive-autoencoding
Repo
Framework

Accelerating Minibatch Stochastic Gradient Descent using Typicality Sampling


Title	Accelerating Minibatch Stochastic Gradient Descent using Typicality Sampling
Authors	Xinyu Peng, Li Li, Fei-Yue Wang
Abstract	Machine learning, especially deep neural networks, has been rapidly developed in fields including computer vision, speech recognition and reinforcement learning. Although Mini-batch SGD is one of the most popular stochastic optimization methods in training deep networks, it shows a slow convergence rate due to the large noise in gradient approximation. In this paper, we attempt to remedy this problem by building a more efficient batch selection method based on typicality sampling, which reduces the error of gradient estimation in conventional Minibatch SGD. We analyze the convergence rate of the resulting typical batch SGD algorithm and compare convergence properties between Minibatch SGD and the algorithm. Experimental results demonstrate that our batch selection scheme works well and more complex Minibatch SGD variants can benefit from the proposed batch selection strategy.
Tasks	Speech Recognition, Stochastic Optimization
Published	2019-03-11
URL	http://arxiv.org/abs/1903.04192v1
PDF	http://arxiv.org/pdf/1903.04192v1.pdf
PWC	https://paperswithcode.com/paper/accelerating-minibatch-stochastic-gradient-1
Repo
Framework

A Closer Look at Double Backpropagation


Title	A Closer Look at Double Backpropagation
Authors	Christian Etmann
Abstract	In recent years, an increasing number of neural network models have included derivatives with respect to inputs in their loss functions, resulting in so-called double backpropagation for first-order optimization. However, so far no general description of the involved derivatives exists. Here, we cover a wide array of special cases in a very general Hilbert space framework, which allows us to provide optimized backpropagation rules for many real-world scenarios. This includes the reduction of calculations for Frobenius-norm-penalties on Jacobians by roughly a third for locally linear activation functions. Furthermore, we provide a description of the discontinuous loss surface of ReLU networks both in the inputs and the parameters and demonstrate why the discontinuities do not pose a big problem in reality.
Tasks
Published	2019-06-16
URL	https://arxiv.org/abs/1906.06637v1
PDF	https://arxiv.org/pdf/1906.06637v1.pdf
PWC	https://paperswithcode.com/paper/a-closer-look-at-double-backpropagation
Repo
Framework