Paper Group ANR 1580
Predictive Ensemble Learning with Application to Scene Text Detection
Title | Predictive Ensemble Learning with Application to Scene Text Detection |
Authors | Danlu Chen, Xu-Yao Zhang, Wei Zhang, Yao Lu, Xiuli Li, Tao Mei |
Abstract | Deep learning based approaches have achieved significant progress in different tasks like classification, detection, segmentation, and so on. Ensemble learning is widely known to further improve performance by combining multiple complementary models. It is easy to apply ensemble learning for classification tasks, for example, based on averaging, voting, or other methods. However, for other tasks (like object detection) where the outputs vary in quantity and cannot be simply compared, the ensemble of multiple models becomes difficult. In this paper, we propose a new method called Predictive Ensemble Learning (PEL), based on the powerful predictive ability of deep neural networks, to directly predict the best performing model among a pool of base models for each test example, thus transforming ensemble learning into a traditional classification task. Taking scene text detection as the application, where no suitable ensemble learning strategy exists, PEL can significantly improve the performance, compared to either individual state-of-the-art models, or the fusion of multiple models by non-maximum suppression. Experimental results show the possibility and potential of PEL in predicting different models’ performance based only on a query example, which can be extended for ensemble learning in many other complex tasks. |
Tasks | Object Detection, Scene Text Detection |
Published | 2019-05-12 |
URL | https://arxiv.org/abs/1905.04641v2 |
https://arxiv.org/pdf/1905.04641v2.pdf | |
PWC | https://paperswithcode.com/paper/predictive-ensemble-learning-with-application |
Repo | |
Framework | |
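
The routing idea in the abstract above (predict the best base model for each example, then use only that model) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `selector_labels`, `predict_with_selected_model`, the per-example quality scores, and the toy selector are all hypothetical.

```python
from typing import Any, Callable, List, Sequence


def selector_labels(scores: Sequence[Sequence[float]]) -> List[int]:
    """Turn per-example quality scores into classification targets.

    scores[i][k] is an assumed quality measure (e.g. a detection F-measure)
    of base model k on training example i. The label for example i is the
    index of the best-scoring model; ties go to the lowest index.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


def predict_with_selected_model(
    example: Any,
    selector: Callable[[Any], int],
    base_models: Sequence[Callable[[Any], Any]],
) -> Any:
    """Ask the selector which base model to trust, then run only that model."""
    chosen = selector(example)
    return base_models[chosen](example)


# Toy usage: two stand-in "detectors" and a hand-written selector.
base_models = [lambda x: ("model_a", x), lambda x: ("model_b", x)]
toy_selector = lambda x: 0 if x < 5 else 1
result = predict_with_selected_model(7, toy_selector, base_models)
```

In practice the selector would be a trained classifier fitted on the labels produced by `selector_labels`; the hand-written rule here only stands in for it.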
A Single-MOSFET MAC for Confidence and Resolution (CORE) Driven Machine Learning Classification
Title | A Single-MOSFET MAC for Confidence and Resolution (CORE) Driven Machine Learning Classification |
Authors | Farid Kenarangi, Inna Partin-Vaisband |
Abstract | Mixed-signal machine-learning classification has recently been demonstrated as an efficient alternative for classification with power-expensive digital circuits. In this paper, a high-COnfidence high-REsolution (CORE) mixed-signal classifier is proposed for classifying high-dimensional input data into multi-class output space with less power and area than state-of-the-art classifiers. A high-resolution multiplication is facilitated within a single MOSFET by feeding the features and feature weights into, respectively, the body and gate inputs. A high-resolution classifier that considers the confidence of the individual predictors is designed at the 45 nm technology node and operates at 100 MHz in the subthreshold region. To evaluate the performance of the classifier, a reduced MNIST dataset is generated by downsampling the MNIST digit images from 28 $\times$ 28 features to 9 $\times$ 9 features. The system is simulated across a wide range of PVT variations, exhibiting nominal accuracy of 90%, energy consumption of 6.2 pJ per classification (over 45 times lower than state-of-the-art classifiers), area of 2,179 $\mu\mathrm{m}^{2}$ (over 7.3 times lower than state-of-the-art classifiers), and a stable response under PVT variations. |
Tasks | |
Published | 2019-10-21 |
URL | https://arxiv.org/abs/1910.09597v1 |
https://arxiv.org/pdf/1910.09597v1.pdf | |
PWC | https://paperswithcode.com/paper/a-single-mosfet-mac-for-confidence-and |
Repo | |
Framework | |
Optimal Algorithms for Ski Rental with Soft Machine-Learned Predictions
Title | Optimal Algorithms for Ski Rental with Soft Machine-Learned Predictions |
Authors | Rohan Kodialam |
Abstract | We consider a variant of the classic Ski Rental online algorithm with applications to machine learning. In our variant, we allow the skier access to a black-box machine-learning algorithm that provides an estimate of the probability that there will be at most a threshold number of ski-days. We derive a class of optimal randomized algorithms to determine the strategy that minimizes the worst-case expected competitive ratio for the skier given a prediction from the machine learning algorithm, and analyze the performance and robustness of these algorithms. |
Tasks | |
Published | 2019-02-28 |
URL | http://arxiv.org/abs/1903.00092v2 |
http://arxiv.org/pdf/1903.00092v2.pdf | |
PWC | https://paperswithcode.com/paper/optimal-algorithms-for-ski-rental-with-soft |
Repo | |
Framework | |
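
For context, the classic Ski Rental problem that this paper builds on can be sketched as below. This is the textbook deterministic break-even baseline, not the randomized, prediction-aware algorithm derived in the paper; the function names and the unit rental price are assumptions made for illustration.

```python
def break_even_cost(buy_price: int, ski_days: int) -> int:
    """Cost of the break-even strategy: rent at 1 per day, buy on day `buy_price`.

    If the season ends before day `buy_price`, the skier only ever rented.
    Otherwise the skier rented for buy_price - 1 days and then bought.
    """
    if ski_days < buy_price:
        return ski_days
    return (buy_price - 1) + buy_price


def offline_optimum(buy_price: int, ski_days: int) -> int:
    """Cost with hindsight: rent every day or buy on day one, whichever is cheaper."""
    return min(ski_days, buy_price)


def worst_case_ratio(buy_price: int, horizon: int) -> float:
    """Largest online/offline cost ratio over season lengths 1..horizon."""
    return max(
        break_even_cost(buy_price, d) / offline_optimum(buy_price, d)
        for d in range(1, horizon + 1)
    )
```

With a purchase price of `B` rental-days, the worst case occurs when the season lasts at least `B` days, giving a competitive ratio of `2 - 1/B`. The paper studies how a probabilistic prediction about the season length can be used to improve on guarantees of this kind.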
Skeleton-Based Online Action Prediction Using Scale Selection Network
Title | Skeleton-Based Online Action Prediction Using Scale Selection Network |
Authors | Jun Liu, Amir Shahroudy, Gang Wang, Ling-Yu Duan, Alex C. Kot |
Abstract | Action prediction is to recognize the class label of an ongoing activity when only a part of it is observed. In this paper, we focus on online action prediction in streaming 3D skeleton sequences. A dilated convolutional network is introduced to model the motion dynamics in temporal dimension via a sliding window over the temporal axis. Since there are significant temporal scale variations in the observed part of the ongoing action at different time steps, a novel window scale selection method is proposed to make our network focus on the performed part of the ongoing action and try to suppress the possible incoming interference from the previous actions at each step. An activation sharing scheme is also proposed to handle the overlapping computations among the adjacent time steps, which enables our framework to run more efficiently. Moreover, to enhance the performance of our framework for action prediction with the skeletal input data, a hierarchy of dilated tree convolutions is also designed to learn the multi-level structured semantic representations over the skeleton joints at each frame. Our proposed approach is evaluated on four challenging datasets. The extensive experiments demonstrate the effectiveness of our method for skeleton-based online action prediction. |
Tasks | Skeleton Based Action Recognition |
Published | 2019-02-08 |
URL | http://arxiv.org/abs/1902.03084v2 |
http://arxiv.org/pdf/1902.03084v2.pdf | |
PWC | https://paperswithcode.com/paper/skeleton-based-online-action-prediction-using |
Repo | |
Framework | |
Scene Text Synthesis for Efficient and Effective Deep Network Training
Title | Scene Text Synthesis for Efficient and Effective Deep Network Training |
Authors | Fangneng Zhan, Hongyuan Zhu, Shijian Lu |
Abstract | A large amount of annotated training images is critical for training accurate and robust deep network models, but collecting them is often time-consuming and costly. Image synthesis alleviates this constraint by generating annotated training images automatically by machines, which has attracted increasing interest in recent deep learning research. We develop an innovative image synthesis technique that composes annotated training images by realistically embedding foreground objects of interest (OOI) into background images. The proposed technique consists of two key components that in principle boost the usefulness of the synthesized images in deep network training. The first is context-aware semantic coherence which ensures that the OOI are placed around semantically coherent regions within the background image. The second is harmonious appearance adaptation which ensures that the embedded OOI are agreeable to the surrounding background from both geometry alignment and appearance realism. The proposed technique has been evaluated over two related but very different computer vision challenges, namely, scene text detection and scene text recognition. Experiments over a number of public datasets demonstrate the effectiveness of our proposed image synthesis technique - the use of our synthesized images in deep network training is capable of achieving similar or even better scene text detection and scene text recognition performance as compared with using real images. |
Tasks | Image Generation, Scene Text Detection, Scene Text Recognition |
Published | 2019-01-26 |
URL | http://arxiv.org/abs/1901.09193v1 |
http://arxiv.org/pdf/1901.09193v1.pdf | |
PWC | https://paperswithcode.com/paper/scene-text-synthesis-for-efficient-and |
Repo | |
Framework | |
Ablate, Variate, and Contemplate: Visual Analytics for Discovering Neural Architectures
Title | Ablate, Variate, and Contemplate: Visual Analytics for Discovering Neural Architectures |
Authors | Dylan Cashman, Adam Perer, Remco Chang, Hendrik Strobelt |
Abstract | Deep learning models require the configuration of many layers and parameters in order to get good results. However, there are currently few systematic guidelines for how to configure a successful model. This means model builders often have to experiment with different configurations by manually programming different architectures (which is tedious and time consuming) or rely on purely automated approaches to generate and train the architectures (which is expensive). In this paper, we present Rapid Exploration of Model Architectures and Parameters, or REMAP, a visual analytics tool that allows a model builder to discover a deep learning model quickly via exploration and rapid experimentation of neural network architectures. In REMAP, the user explores the large and complex parameter space for neural network architectures using a combination of global inspection and local experimentation. Through a visual overview of a set of models, the user identifies interesting clusters of architectures. Based on their findings, the user can run ablation and variation experiments to identify the effects of adding, removing, or replacing layers in a given architecture and generate new models accordingly. They can also handcraft new models using a simple graphical interface. As a result, a model builder can build deep learning models quickly, efficiently, and without manual programming. We inform the design of REMAP through a design study with four deep learning model builders. Through a use case, we demonstrate that REMAP allows users to discover performant neural network architectures efficiently using visual exploration and user-defined semi-automated searches through the model space. |
Tasks | |
Published | 2019-07-30 |
URL | https://arxiv.org/abs/1908.00387v1 |
https://arxiv.org/pdf/1908.00387v1.pdf | |
PWC | https://paperswithcode.com/paper/ablate-variate-and-contemplate-visual |
Repo | |
Framework | |
Achieving Robustness in the Wild via Adversarial Mixing with Disentangled Representations
Title | Achieving Robustness in the Wild via Adversarial Mixing with Disentangled Representations |
Authors | Sven Gowal, Chongli Qin, Po-Sen Huang, Taylan Cemgil, Krishnamurthy Dvijotham, Timothy Mann, Pushmeet Kohli |
Abstract | Recent research has made the surprising finding that state-of-the-art deep learning models sometimes fail to generalize to small variations of the input. Adversarial training has been shown to be an effective approach to overcome this problem. However, its application has been limited to enforcing invariance to analytically defined transformations like $\ell_p$-norm bounded perturbations. Such perturbations do not necessarily cover plausible real-world variations that preserve the semantics of the input (such as a change in lighting conditions). In this paper, we propose a novel approach to express and formalize robustness to these kinds of real-world transformations of the input. The two key ideas underlying our formulation are (1) leveraging disentangled representations of the input to define different factors of variations, and (2) generating new input images by adversarially composing the representations of different images. We use a StyleGAN model to demonstrate the efficacy of this framework. Specifically, we leverage the disentangled latent representations computed by a StyleGAN model to generate perturbations of an image that are similar to real-world variations (like adding make-up, or changing the skin-tone of a person) and train models to be invariant to these perturbations. Extensive experiments show that our method improves generalization and reduces the effect of spurious correlations (reducing the error rate of a “smile” detector by 21% for example). |
Tasks | |
Published | 2019-12-06 |
URL | https://arxiv.org/abs/1912.03192v2 |
https://arxiv.org/pdf/1912.03192v2.pdf | |
PWC | https://paperswithcode.com/paper/achieving-robustness-in-the-wild-via |
Repo | |
Framework | |
Calibration of fisheye camera using entrance pupil
Title | Calibration of fisheye camera using entrance pupil |
Authors | Peter Fasogbon, Emre Aksu |
Abstract | Most conventional camera calibration algorithms assume that the imaging device has a Single Viewpoint (SVP). This is not necessarily true for special imaging devices such as fisheye lenses. As a consequence, the intrinsic camera calibration result is not always reliable. In this paper, we propose a new formation model that tends to relax this assumption so that a Non-Single Viewpoint (NSVP) system is corrected to always maintain an SVP, by taking into account the variation of the Entrance Pupil (EP) using thin lens modeling. In addition, we present a calibration procedure for the image formation to estimate these EP parameters using a nonlinear optimization procedure with bundle adjustment. From experiments, we are able to obtain slightly better re-projection error than traditional methods, and the camera parameters are better estimated. The proposed calibration procedure is simple and can easily be integrated into any other thin lens image formation model. |
Tasks | Calibration |
Published | 2019-07-03 |
URL | https://arxiv.org/abs/1907.01759v1 |
https://arxiv.org/pdf/1907.01759v1.pdf | |
PWC | https://paperswithcode.com/paper/calibration-of-fisheye-camera-using-entrance |
Repo | |
Framework | |
Learning protein conformational space by enforcing physics with convolutions and latent interpolations
Title | Learning protein conformational space by enforcing physics with convolutions and latent interpolations |
Authors | Venkata K. Ramaswamy, Chris G. Willcocks, Matteo T. Degiacomi |
Abstract | Determining the different conformational states of a protein and the transition paths between them is key to fully understanding the relationship between biomolecular structure and function. This can be accomplished by sampling protein conformational space with molecular simulation methodologies. Despite advances in computing hardware and sampling techniques, simulations always yield a discretized representation of this space, with transition states undersampled proportionally to their associated energy barrier. We present a convolutional neural network that learns a continuous conformational space representation from example structures, and loss functions that ensure intermediates between examples are physically plausible. We show that this network, trained with simulations of distinct protein states, can correctly predict a biologically relevant non-linear transition path, without any example on the path provided. We also show we can transfer features learnt from one protein to others, which results in superior performance, and requires a surprisingly small number of training examples. |
Tasks | Transfer Learning |
Published | 2019-10-10 |
URL | https://arxiv.org/abs/1910.04543v2 |
https://arxiv.org/pdf/1910.04543v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-protein-conformational-space-by |
Repo | |
Framework | |
Axial Attention in Multidimensional Transformers
Title | Axial Attention in Multidimensional Transformers |
Authors | Jonathan Ho, Nal Kalchbrenner, Dirk Weissenborn, Tim Salimans |
Abstract | We propose Axial Transformers, a self-attention-based autoregressive model for images and other data organized as high dimensional tensors. Existing autoregressive models either suffer from excessively large computational resource requirements for high dimensional data, or make compromises in terms of distribution expressiveness or ease of implementation in order to decrease resource requirements. Our architecture, by contrast, maintains both full expressiveness over joint distributions over data and ease of implementation with standard deep learning frameworks, while requiring reasonable memory and computation and achieving state-of-the-art results on standard generative modeling benchmarks. Our models are based on axial attention, a simple generalization of self-attention that naturally aligns with the multiple dimensions of the tensors in both the encoding and the decoding settings. Notably the proposed structure of the layers allows for the vast majority of the context to be computed in parallel during decoding without introducing any independence assumptions. This semi-parallel structure goes a long way to making decoding from even a very large Axial Transformer broadly applicable. We demonstrate state-of-the-art results for the Axial Transformer on the ImageNet-32 and ImageNet-64 image benchmarks as well as on the BAIR Robotic Pushing video benchmark. We open source the implementation of Axial Transformers. |
Tasks | |
Published | 2019-12-20 |
URL | https://arxiv.org/abs/1912.12180v1 |
https://arxiv.org/pdf/1912.12180v1.pdf | |
PWC | https://paperswithcode.com/paper/axial-attention-in-multidimensional-1 |
Repo | |
Framework | |
The Unbearable Weight of Generating Artificial Errors for Grammatical Error Correction
Title | The Unbearable Weight of Generating Artificial Errors for Grammatical Error Correction |
Authors | Phu Mon Htut, Joel Tetreault |
Abstract | In recent years, sequence-to-sequence models have been very effective for end-to-end grammatical error correction (GEC). As creating a human-annotated parallel corpus for GEC is expensive and time-consuming, there has been work on artificial corpus generation with the aim of creating sentences that contain realistic grammatical errors from grammatically correct sentences. In this paper, we investigate the impact of using recent neural models for generating errors to help neural models to correct errors. We conduct a battery of experiments on the effect of data size, models, and comparison with a rule-based approach. |
Tasks | Grammatical Error Correction |
Published | 2019-07-21 |
URL | https://arxiv.org/abs/1907.08889v1 |
https://arxiv.org/pdf/1907.08889v1.pdf | |
PWC | https://paperswithcode.com/paper/the-unbearable-weight-of-generating |
Repo | |
Framework | |
Higher-order Comparisons of Sentence Encoder Representations
Title | Higher-order Comparisons of Sentence Encoder Representations |
Authors | Mostafa Abdou, Artur Kulmizev, Felix Hill, Daniel M. Low, Anders Søgaard |
Abstract | Representational Similarity Analysis (RSA) is a technique developed by neuroscientists for comparing activity patterns of different measurement modalities (e.g., fMRI, electrophysiology, behavior). As a framework, RSA has several advantages over existing approaches to interpretation of language encoders based on probing or diagnostic classification: namely, it does not require large training samples, is not prone to overfitting, and it enables a more transparent comparison between the representational geometries of different models and modalities. We demonstrate the utility of RSA by establishing a previously unknown correspondence between widely employed pretrained language encoders and human processing difficulty via eye-tracking data, showcasing its potential in the interpretability toolbox for neural models. |
Tasks | Eye Tracking |
Published | 2019-09-01 |
URL | https://arxiv.org/abs/1909.00303v2 |
https://arxiv.org/pdf/1909.00303v2.pdf | |
PWC | https://paperswithcode.com/paper/higher-order-comparisons-of-sentence-encoder |
Repo | |
Framework | |
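
The basic RSA recipe referred to in the abstract above can be sketched in a few lines: build a dissimilarity matrix over the same stimuli for each system, then correlate the two matrices. This is a generic illustration, not the paper's pipeline; the choice of `1 - Pearson r` as the dissimilarity, Spearman rank correlation as the comparison, and all function names are assumptions.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rdm_upper(reps: Sequence[Sequence[float]]) -> List[float]:
    """Upper triangle of the dissimilarity matrix (1 - r) over all stimulus pairs."""
    n = len(reps)
    return [1.0 - pearson(reps[i], reps[j]) for i in range(n) for j in range(i + 1, n)]


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def rsa_score(reps_a: Sequence[Sequence[float]], reps_b: Sequence[Sequence[float]]) -> float:
    """Spearman correlation between the two systems' dissimilarity structures."""
    return pearson(ranks(rdm_upper(reps_a)), ranks(rdm_upper(reps_b)))
```

Here `reps_a[i]` and `reps_b[i]` are the two systems' representations of the same stimulus `i`; the vectors may have different dimensionality across systems, since only the pairwise dissimilarities are compared.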
Deep One-bit Compressive Autoencoding
Title | Deep One-bit Compressive Autoencoding |
Authors | Shahin Khobahi, Arindam Bose, Mojtaba Soltanalian |
Abstract | Parameterized mathematical models play a central role in the understanding and design of complex information systems. However, they often cannot take into account the intricate interactions innate to such systems. In contrast, purely data-driven approaches do not need explicit mathematical models for data generation and have a wider applicability at the cost of interpretability. In this paper, we consider the design of a one-bit compressive autoencoder, and propose a novel hybrid model-based and data-driven methodology that not only allows us to design the sensing matrix for one-bit data acquisition, but also allows for learning the latent parameters of an iterative optimization algorithm specifically designed for the problem of one-bit sparse signal recovery. Our results demonstrate a significant improvement compared to state-of-the-art model-based algorithms. |
Tasks | |
Published | 2019-12-10 |
URL | https://arxiv.org/abs/1912.05539v1 |
https://arxiv.org/pdf/1912.05539v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-one-bit-compressive-autoencoding |
Repo | |
Framework | |
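
The one-bit data acquisition step mentioned in the abstract above is commonly modeled as keeping only the sign of each linear measurement. The sketch below illustrates that measurement model only; it does not cover the learned sensing matrix or the recovery algorithm from the paper. The function name and the convention of mapping a zero measurement to `+1` are assumptions.

```python
from typing import List, Sequence


def one_bit_measure(
    sensing_matrix: Sequence[Sequence[float]], signal: Sequence[float]
) -> List[int]:
    """Return sign(A x) entrywise, with values in {-1, +1}.

    Each row of `sensing_matrix` yields one measurement. A measurement that
    is exactly zero is mapped to +1 (an arbitrary convention chosen here).
    """
    return [
        1 if sum(a * x for a, x in zip(row, signal)) >= 0 else -1
        for row in sensing_matrix
    ]


A = [[1.0, 0.0], [0.0, -1.0], [1.0, 1.0]]
y = one_bit_measure(A, [2.0, 3.0])
y_scaled = one_bit_measure(A, [4.0, 6.0])
```

Because only signs are kept, `y` and `y_scaled` are identical: the amplitude of the signal is lost, which is one reason recovery from one-bit measurements needs a dedicated algorithm.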
Accelerating Minibatch Stochastic Gradient Descent using Typicality Sampling
Title | Accelerating Minibatch Stochastic Gradient Descent using Typicality Sampling |
Authors | Xinyu Peng, Li Li, Fei-Yue Wang |
Abstract | Machine learning, especially deep neural networks, has been rapidly developed in fields including computer vision, speech recognition and reinforcement learning. Although Mini-batch SGD is one of the most popular stochastic optimization methods in training deep networks, it shows a slow convergence rate due to the large noise in gradient approximation. In this paper, we attempt to remedy this problem by building a more efficient batch selection method based on typicality sampling, which reduces the error of gradient estimation in conventional Minibatch SGD. We analyze the convergence rate of the resulting typical batch SGD algorithm and compare convergence properties between Minibatch SGD and the algorithm. Experimental results demonstrate that our batch selection scheme works well and more complex Minibatch SGD variants can benefit from the proposed batch selection strategy. |
Tasks | Speech Recognition, Stochastic Optimization |
Published | 2019-03-11 |
URL | http://arxiv.org/abs/1903.04192v1 |
http://arxiv.org/pdf/1903.04192v1.pdf | |
PWC | https://paperswithcode.com/paper/accelerating-minibatch-stochastic-gradient-1 |
Repo | |
Framework | |
A Closer Look at Double Backpropagation
Title | A Closer Look at Double Backpropagation |
Authors | Christian Etmann |
Abstract | In recent years, an increasing number of neural network models have included derivatives with respect to inputs in their loss functions, resulting in so-called double backpropagation for first-order optimization. However, so far no general description of the involved derivatives exists. Here, we cover a wide array of special cases in a very general Hilbert space framework, which allows us to provide optimized backpropagation rules for many real-world scenarios. This includes the reduction of calculations for Frobenius-norm-penalties on Jacobians by roughly a third for locally linear activation functions. Furthermore, we provide a description of the discontinuous loss surface of ReLU networks both in the inputs and the parameters and demonstrate why the discontinuities do not pose a big problem in reality. |
Tasks | |
Published | 2019-06-16 |
URL | https://arxiv.org/abs/1906.06637v1 |
https://arxiv.org/pdf/1906.06637v1.pdf | |
PWC | https://paperswithcode.com/paper/a-closer-look-at-double-backpropagation |
Repo | |
Framework | |
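
As a generic illustration of the setting described in the abstract above, a loss with a Frobenius-norm penalty on the input Jacobian can be written as follows. The symbols $f_\theta$, $\ell$, $\lambda$, and $J_{f_\theta}$ are notation assumed here for illustration and are not taken from the paper.

$$
\mathcal{L}(\theta) = \ell\big(f_\theta(x), y\big) + \lambda \,\big\lVert J_{f_\theta}(x) \big\rVert_F^2,
\qquad
J_{f_\theta}(x) = \frac{\partial f_\theta(x)}{\partial x}.
$$

Evaluating $J_{f_\theta}(x)$ already requires a backward pass with respect to the input, so computing $\nabla_\theta \mathcal{L}$ for a first-order optimizer means differentiating through that backward pass as well, which is where the name double backpropagation comes from.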