May 5, 2019

3153 words 15 mins read

Paper Group ANR 545

Stamp processing with examplar features

Title Stamp processing with examplar features
Authors Yash Bhalgat, Mandar Kulkarni, Shirish Karande, Sachin Lodha
Abstract Document digitization is becoming increasingly crucial. In this work, we propose a shape-based approach for automatic stamp verification/detection in document images using unsupervised feature learning. Given a small set of training images, our algorithm learns an appropriate shape representation using unsupervised clustering. Experimental results demonstrate the effectiveness of our framework in challenging scenarios.
Tasks
Published 2016-09-16
URL http://arxiv.org/abs/1609.05001v1
PDF http://arxiv.org/pdf/1609.05001v1.pdf
PWC https://paperswithcode.com/paper/stamp-processing-with-examplar-features
Repo
Framework

DSD: Dense-Sparse-Dense Training for Deep Neural Networks

Title DSD: Dense-Sparse-Dense Training for Deep Neural Networks
Authors Song Han, Jeff Pool, Sharan Narang, Huizi Mao, Enhao Gong, Shijian Tang, Erich Elsen, Peter Vajda, Manohar Paluri, John Tran, Bryan Catanzaro, William J. Dally
Abstract Modern deep neural networks have a large number of parameters, making them very hard to train. We propose DSD, a dense-sparse-dense training flow, for regularizing deep neural networks and achieving better optimization performance. In the first D (Dense) step, we train a dense network to learn connection weights and importance. In the S (Sparse) step, we regularize the network by pruning the unimportant connections with small weights and retraining the network given the sparsity constraint. In the final D (re-Dense) step, we increase the model capacity by removing the sparsity constraint, re-initialize the pruned parameters from zero and retrain the whole dense network. Experiments show that DSD training can improve the performance for a wide range of CNNs, RNNs and LSTMs on the tasks of image classification, caption generation and speech recognition. On ImageNet, DSD improved the Top1 accuracy of GoogLeNet by 1.1%, VGG-16 by 4.3%, ResNet-18 by 1.2% and ResNet-50 by 1.1%, respectively. On the WSJ’93 dataset, DSD improved DeepSpeech and DeepSpeech2 WER by 2.0% and 1.1%. On the Flickr-8K dataset, DSD improved the NeuralTalk BLEU score by over 1.7. DSD is easy to use in practice: at training time, DSD incurs only one extra hyper-parameter: the sparsity ratio in the S step. At testing time, DSD doesn’t change the network architecture or incur any inference overhead. The consistent and significant performance gain of DSD experiments shows the inadequacy of the current training methods for finding the best local optimum, while DSD effectively achieves superior optimization performance for finding a better solution. DSD models are available to download at https://songhan.github.io/DSD.
Tasks Image Classification, Speech Recognition
Published 2016-07-15
URL http://arxiv.org/abs/1607.04381v2
PDF http://arxiv.org/pdf/1607.04381v2.pdf
PWC https://paperswithcode.com/paper/dsd-dense-sparse-dense-training-for-deep
Repo
Framework
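
The dense-sparse-dense schedule above is simple enough to sketch. The following is a minimal NumPy illustration for a single weight array with a user-supplied gradient function; it shows the D-S-D flow under those simplifying assumptions and is not the authors' implementation, which applies the schedule layer-wise to full CNNs, RNNs and LSTMs.

```python
import numpy as np

def sgd_step(W, grad_fn, lr=0.01):
    """One plain SGD update; grad_fn returns dLoss/dW for the current weights."""
    return W - lr * grad_fn(W)

def dsd_train(W, grad_fn, sparsity=0.5, steps=1000):
    # D (Dense): train the dense network to learn weights and their importance.
    for _ in range(steps):
        W = sgd_step(W, grad_fn)

    # S (Sparse): prune the smallest-magnitude connections, then retrain
    # under the fixed sparsity mask (the single extra hyper-parameter is `sparsity`).
    mask = np.abs(W) > np.quantile(np.abs(W), sparsity)
    W = W * mask
    for _ in range(steps):
        W = sgd_step(W, grad_fn) * mask   # pruned weights stay exactly zero

    # D (re-Dense): drop the constraint, restart the pruned weights from zero,
    # and retrain the whole dense network.
    for _ in range(steps):
        W = sgd_step(W, grad_fn)
    return W
```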

Guided Filter based Edge-preserving Image Non-blind Deconvolution

Title Guided Filter based Edge-preserving Image Non-blind Deconvolution
Authors Hang Yang, Ming Zhu, Zhongbo Zhang, Heyan Huang
Abstract In this work, we propose a new approach for efficient edge-preserving image deconvolution. Our algorithm is based on a novel type of explicit image filter - the guided filter. The guided filter can be used as an edge-preserving smoothing operator like the popular bilateral filter, but has better behavior near edges. We propose an efficient iterative algorithm that decouples the deblurring and denoising steps in the restoration process. In the deblurring step, we propose two cost functions, both of which can be computed efficiently with the fast Fourier transform. The solution of the first serves as the guidance image, and the solution of the second is filtered in the next step. In the denoising step, the guided filter is applied to the two obtained images for efficient edge-preserving filtering. Furthermore, we derive a simple and effective method to automatically adjust the regularization parameter at each iteration. We compare our deconvolution algorithm with many competitive deconvolution techniques in terms of ISNR and visual quality.
Tasks Deblurring, Denoising, Image Deconvolution
Published 2016-09-07
URL http://arxiv.org/abs/1609.01839v1
PDF http://arxiv.org/pdf/1609.01839v1.pdf
PWC https://paperswithcode.com/paper/guided-filter-based-edge-preserving-image-non
Repo
Framework
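
As a rough illustration of the decoupled deblur/denoise loop described above: the sketch below uses two generic quadratic (FFT-diagonal) deblurring costs and a textbook guided filter. The specific cost functions, the fixed `beta` values, and the omission of the paper's automatic regularization-parameter update are all simplifications, not the paper's exact algorithm.

```python
import numpy as np
from scipy.ndimage import uniform_filter
from scipy.fft import fft2, ifft2

def guided_filter(guide, p, r=8, eps=1e-3):
    """Standard guided filter (He et al.): edge-preserving smoothing of p, guided by `guide`."""
    size = 2 * r + 1
    mean_I, mean_p = uniform_filter(guide, size), uniform_filter(p, size)
    cov_Ip = uniform_filter(guide * p, size) - mean_I * mean_p
    var_I = uniform_filter(guide * guide, size) - mean_I ** 2
    a = cov_Ip / (var_I + eps)
    b = mean_p - a * mean_I
    return uniform_filter(a, size) * guide + uniform_filter(b, size)

def deblur_step(y, k, z, beta):
    """FFT solution of argmin_x ||k*x - y||^2 + beta*||x - z||^2 (circular blur assumed)."""
    K, Y, Z = fft2(k, s=y.shape), fft2(y), fft2(z)
    return np.real(ifft2((np.conj(K) * Y + beta * Z) / (np.abs(K) ** 2 + beta)))

def deconvolve(y, k, iters=5, beta=0.01, beta_guide=0.1):
    x = y.copy()
    for _ in range(iters):
        guide = deblur_step(y, k, x, beta_guide)  # more regularized estimate -> guidance image
        sharp = deblur_step(y, k, x, beta)        # less regularized estimate -> to be filtered
        x = guided_filter(guide, sharp)           # edge-preserving denoising step
    return x
```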

Asymptotic equivalence of regularization methods in thresholded parameter space

Title Asymptotic equivalence of regularization methods in thresholded parameter space
Authors Yingying Fan, Jinchi Lv
Abstract High-dimensional data analysis has motivated a spectrum of regularization methods for variable selection and sparse modeling, with two popular classes: convex methods and concave methods. A long debate has been on whether one class dominates the other, an important question both in theory and to practitioners. In this paper, we characterize the asymptotic equivalence of regularization methods, with general penalty functions, in a thresholded parameter space under the generalized linear model setting, where the dimensionality can grow exponentially with the sample size. To assess their performance, we establish the oracle inequalities, as in Bickel, Ritov and Tsybakov (2009), of the global minimizer for these methods under various prediction and variable selection losses. These results reveal an interesting phase transition phenomenon. For polynomially growing dimensionality, the $L_1$-regularization method of the Lasso and concave methods are asymptotically equivalent, having the same convergence rates in the oracle inequalities. For exponentially growing dimensionality, concave methods are asymptotically equivalent but have faster convergence rates than the Lasso. We also establish a stronger property of the oracle risk inequalities of the regularization methods, as well as the sampling properties of computable solutions. Our new theoretical results are illustrated and justified by simulation and real data examples.
Tasks
Published 2016-05-11
URL http://arxiv.org/abs/1605.03310v1
PDF http://arxiv.org/pdf/1605.03310v1.pdf
PWC https://paperswithcode.com/paper/asymptotic-equivalence-of-regularization
Repo
Framework
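
To make the "convex class vs. concave class" distinction in the abstract concrete, here is a small sketch comparing the Lasso's $L_1$ penalty with SCAD, a standard concave penalty; SCAD and the conventional choice $a = 3.7$ are illustrative examples and are not taken from this paper.

```python
import numpy as np

def l1_penalty(t, lam):
    return lam * np.abs(t)

def scad_penalty(t, lam, a=3.7):
    """Fan & Li's SCAD penalty: L1-like near zero, flat (hence concave) for large |t|."""
    t = np.abs(t)
    small = lam * t
    mid = (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1))
    large = lam ** 2 * (a + 1) / 2
    return np.where(t <= lam, small, np.where(t <= a * lam, mid, large))

t = np.linspace(-4, 4, 9)
print(l1_penalty(t, lam=1.0))    # keeps growing with |t|: biases large coefficients
print(scad_penalty(t, lam=1.0))  # levels off: nearly unbiased for large coefficients
```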

Facial Expression Recognition in the Wild using Rich Deep Features

Title Facial Expression Recognition in the Wild using Rich Deep Features
Authors Abubakrelsedik Karali, Ahmad Bassiouny, Motaz El-Saban
Abstract Facial Expression Recognition is an active area of research in computer vision with a wide range of applications. Several approaches have been developed to solve this problem for different benchmark datasets. However, Facial Expression Recognition in the wild remains an area where much work is still needed to serve real-world applications. To this end, in this paper we present a novel approach to facial expression recognition. We fuse rich deep features with domain knowledge through encoding discriminant facial patches. We conduct experiments on two of the most popular benchmark datasets: CK and TFE. Moreover, we present a novel dataset that, unlike its predecessors, consists of natural - not acted - expression images. Experimental results show that our approach achieves state-of-the-art results over standard benchmarks and our own dataset.
Tasks Facial Expression Recognition
Published 2016-01-11
URL http://arxiv.org/abs/1601.02487v1
PDF http://arxiv.org/pdf/1601.02487v1.pdf
PWC https://paperswithcode.com/paper/facial-expression-recognition-in-the-wild
Repo
Framework
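
A loose sketch of the fusion idea, under heavy assumptions: a global deep descriptor is concatenated with features encoded from a few discriminant facial patches, and a linear classifier is trained on the fused vector. The stand-in feature extractors, patch coordinates and classifier below are hypothetical placeholders, not the paper's pipeline.

```python
import numpy as np
from sklearn.svm import LinearSVC

def deep_features(face):
    # Placeholder for a rich deep descriptor (e.g. a pretrained CNN's penultimate layer).
    return face.mean(axis=(0, 1))

def patch_features(face, patches):
    # Encode a few discriminant facial patches (eyes, brows, mouth, ...) and stack them.
    feats = [np.histogram(face[r0:r1, c0:c1], bins=16, range=(0.0, 1.0))[0]
             for (r0, r1, c0, c1) in patches]
    return np.concatenate(feats).astype(float)

def fused_descriptor(face, patches):
    return np.concatenate([deep_features(face), patch_features(face, patches)])

# Usage sketch: X = np.stack([fused_descriptor(f, patches) for f in faces])
#               clf = LinearSVC().fit(X, labels)
```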

A robust particle detection algorithm based on symmetry

Title A robust particle detection algorithm based on symmetry
Authors Alvaro Rodriguez, Hanqing Zhang, Krister Wiklund, Tomas Brodin, Jonatan Klaminder, Patrik Andersson, Magnus Andersson
Abstract Particle tracking is common in many biophysical, ecological, and micro-fluidic applications. Reliable tracking information depends heavily on the system under study and on algorithms that correctly determine particle positions between images. However, in a real environmental context with the presence of noise, including particulate or dissolved matter in water, and low and fluctuating light conditions, many algorithms fail to obtain reliable information. We propose a new algorithm, the Circular Symmetry algorithm (C-Sym), for detecting the position of a circular particle with high accuracy and precision in noisy conditions. The algorithm takes advantage of the spatial symmetry of the particle, allowing for subpixel accuracy. We compare the proposed algorithm with four different methods using both synthetic and experimental datasets. The results show that C-Sym is the most accurate and precise algorithm when tracking micro-particles in all tested conditions, and it has the potential for use in applications including tracking biota in their environment.
Tasks
Published 2016-05-11
URL http://arxiv.org/abs/1605.03328v1
PDF http://arxiv.org/pdf/1605.03328v1.pdf
PWC https://paperswithcode.com/paper/a-robust-particle-detection-algorithm-based
Repo
Framework
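
The exact C-Sym estimator is defined in the paper; as a hedged illustration of the underlying idea, the sketch below scores candidate centres by how well a local window correlates with its own 180-degree rotation, then refines the best pixel to sub-pixel precision with a parabolic fit.

```python
import numpy as np

def symmetry_score(img, r, c, radius):
    """Normalized correlation of a window with its 180-degree rotation about (r, c)."""
    w = img[r - radius:r + radius + 1, c - radius:c + radius + 1]
    w = w - w.mean()
    flipped = w[::-1, ::-1]
    denom = np.sqrt((w ** 2).sum() * (flipped ** 2).sum()) + 1e-12
    return (w * flipped).sum() / denom

def detect_particle(img, radius=6):
    # Assumes a single particle whose centre is not at the image border.
    rows, cols = img.shape
    scores = np.full(img.shape, -np.inf)
    for r in range(radius, rows - radius):
        for c in range(radius, cols - radius):
            scores[r, c] = symmetry_score(img, r, c, radius)
    r, c = np.unravel_index(np.argmax(scores), scores.shape)

    # Sub-pixel refinement: fit a parabola through the peak and its neighbours.
    dr = 0.5 * (scores[r - 1, c] - scores[r + 1, c]) / (
        scores[r - 1, c] - 2 * scores[r, c] + scores[r + 1, c] + 1e-12)
    dc = 0.5 * (scores[r, c - 1] - scores[r, c + 1]) / (
        scores[r, c - 1] - 2 * scores[r, c] + scores[r, c + 1] + 1e-12)
    return r + dr, c + dc
```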

Deep Cuboid Detection: Beyond 2D Bounding Boxes

Title Deep Cuboid Detection: Beyond 2D Bounding Boxes
Authors Debidatta Dwibedi, Tomasz Malisiewicz, Vijay Badrinarayanan, Andrew Rabinovich
Abstract We present a Deep Cuboid Detector which takes a consumer-quality RGB image of a cluttered scene and localizes all 3D cuboids (box-like objects). Contrary to classical approaches which fit a 3D model from low-level cues like corners, edges, and vanishing points, we propose an end-to-end deep learning system to detect cuboids across many semantic categories (e.g., ovens, shipping boxes, and furniture). We localize cuboids with a 2D bounding box, and simultaneously localize the cuboid’s corners, effectively producing a 3D interpretation of box-like objects. We refine keypoints by pooling convolutional features iteratively, improving the baseline method significantly. Our deep learning cuboid detector is trained in an end-to-end fashion and is suitable for real-time applications in augmented reality (AR) and robotics.
Tasks
Published 2016-11-30
URL http://arxiv.org/abs/1611.10010v1
PDF http://arxiv.org/pdf/1611.10010v1.pdf
PWC https://paperswithcode.com/paper/deep-cuboid-detection-beyond-2d-bounding
Repo
Framework

Online Learning to Rank with Feedback at the Top

Title Online Learning to Rank with Feedback at the Top
Authors Sougata Chaudhuri, Ambuj Tewari
Abstract We consider an online learning to rank setting in which, at each round, an oblivious adversary generates a list of $m$ documents, pertaining to a query, and the learner produces scores to rank the documents. The adversary then generates a relevance vector and the learner updates its ranker according to the feedback received. We consider the setting where the feedback is restricted to be the relevance levels of only the top $k$ documents in the ranked list for $k \ll m$. However, the performance of the learner is judged based on the unrevealed full relevance vectors, using an appropriate learning to rank loss function. We develop efficient algorithms for well-known losses in the pointwise, pairwise and listwise families. We also prove that no online algorithm can have sublinear regret, with top-1 feedback, for any loss that is calibrated with respect to NDCG. We apply our algorithms on benchmark datasets demonstrating efficient online learning of a ranking function from highly restricted feedback.
Tasks Learning-To-Rank
Published 2016-03-06
URL http://arxiv.org/abs/1603.01855v1
PDF http://arxiv.org/pdf/1603.01855v1.pdf
PWC https://paperswithcode.com/paper/online-learning-to-rank-with-feedback-at-the
Repo
Framework
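
Not one of the paper's algorithms, but a sketch of the interaction protocol it studies: a linear scorer ranks the $m$ documents, only the relevances of the top $k$ are revealed, and the ranker is updated by online gradient descent on a pointwise squared loss restricted to those $k$ documents.

```python
import numpy as np

def rank(scores):
    return np.argsort(-scores)           # document indices, best first

def online_l2r_topk(stream, dim, k=1, lr=0.1):
    """stream yields (X, rel): X is (m, dim) document features, rel the full relevance vector."""
    w = np.zeros(dim)
    for X, rel in stream:
        scores = X @ w
        order = rank(scores)
        top = order[:k]                  # only these relevances are revealed to the learner
        # Pointwise squared-loss gradient on the observed top-k documents.
        grad = X[top].T @ (scores[top] - rel[top]) / k
        w -= lr * grad
    return w
```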

A Glimpse Far into the Future: Understanding Long-term Crowd Worker Quality

Title A Glimpse Far into the Future: Understanding Long-term Crowd Worker Quality
Authors Kenji Hata, Ranjay Krishna, Li Fei-Fei, Michael S. Bernstein
Abstract Microtask crowdsourcing is increasingly critical to the creation of extremely large datasets. As a result, crowd workers spend weeks or months repeating the exact same tasks, making it necessary to understand their behavior over these long periods of time. We utilize three large, longitudinal datasets of nine million annotations collected from Amazon Mechanical Turk to examine claims that workers fatigue or satisfice over these long periods, producing lower quality work. We find that, contrary to these claims, workers are extremely stable in their quality over the entire period. To understand whether workers set their quality based on the task’s requirements for acceptance, we then perform an experiment where we vary the required quality for a large crowdsourcing task. Workers did not adjust their quality based on the acceptance threshold: workers who were above the threshold continued working at their usual quality level, and workers below the threshold self-selected themselves out of the task. Capitalizing on this consistency, we demonstrate that it is possible to predict workers’ long-term quality using just a glimpse of their quality on the first five tasks.
Tasks
Published 2016-09-15
URL http://arxiv.org/abs/1609.04855v2
PDF http://arxiv.org/pdf/1609.04855v2.pdf
PWC https://paperswithcode.com/paper/a-glimpse-far-into-the-future-understanding
Repo
Framework

A continuum among logarithmic, linear, and exponential functions, and its potential to improve generalization in neural networks

Title A continuum among logarithmic, linear, and exponential functions, and its potential to improve generalization in neural networks
Authors Luke B. Godfrey, Michael S. Gashler
Abstract We present the soft exponential activation function for artificial neural networks that continuously interpolates between logarithmic, linear, and exponential functions. This activation function is simple, differentiable, and parameterized so that it can be trained as the rest of the network is trained. We hypothesize that soft exponential has the potential to improve neural network learning, as it can exactly calculate many natural operations that typical neural networks can only approximate, including addition, multiplication, inner product, distance, polynomials, and sinusoids.
Tasks
Published 2016-02-03
URL http://arxiv.org/abs/1602.01321v1
PDF http://arxiv.org/pdf/1602.01321v1.pdf
PWC https://paperswithcode.com/paper/a-continuum-among-logarithmic-linear-and
Repo
Framework
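
The activation itself is short enough to state. The sketch below follows the piecewise form of soft exponential as commonly quoted (logarithmic regime for negative alpha, identity at zero, exponential regime for positive alpha); treat it as recalled from the paper rather than verified against it, and note that in the paper alpha is a trainable parameter learned along with the other weights.

```python
import numpy as np

def soft_exponential(x, alpha):
    """Soft exponential activation: interpolates log (alpha<0), identity (alpha=0), exp (alpha>0)."""
    if alpha < 0:
        return -np.log(1.0 - alpha * (x + alpha)) / alpha
    if alpha == 0:
        return x
    return (np.exp(alpha * x) - 1.0) / alpha + alpha

x = np.linspace(-1, 1, 5)
print(soft_exponential(x, -0.5))   # logarithmic regime
print(soft_exponential(x, 0.0))    # identity
print(soft_exponential(x, 0.5))    # exponential regime
```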

Fast and Reliable Parameter Estimation from Nonlinear Observations

Title Fast and Reliable Parameter Estimation from Nonlinear Observations
Authors Samet Oymak, Mahdi Soltanolkotabi
Abstract In this paper we study the problem of recovering a structured but unknown parameter ${\bf{\theta}}^*$ from $n$ nonlinear observations of the form $y_i=f(\langle {\bf{x}}_i,{\bf{\theta}}^*\rangle)$ for $i=1,2,\ldots,n$. We develop a framework for characterizing time-data tradeoffs for a variety of parameter estimation algorithms when the nonlinear function $f$ is unknown. This framework includes many popular heuristics such as projected/proximal gradient descent and stochastic schemes. For example, we show that a projected gradient descent scheme converges at a linear rate to a reliable solution with a near minimal number of samples. We provide a sharp characterization of the convergence rate of such algorithms as a function of sample size, the amount of a priori knowledge available about the parameter, and a measure of the nonlinearity of the function $f$. These results provide a precise understanding of the various tradeoffs involved between statistical and computational resources as well as a priori side information available for such nonlinear parameter estimation problems.
Tasks
Published 2016-10-23
URL http://arxiv.org/abs/1610.07108v1
PDF http://arxiv.org/pdf/1610.07108v1.pdf
PWC https://paperswithcode.com/paper/fast-and-reliable-parameter-estimation-from
Repo
Framework
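
A hedged sketch of the projected gradient heuristic mentioned in the abstract, specialized to sparsity as the structure (so the projection is hard thresholding) and to a linear least-squares surrogate loss; the step-size rule and iteration count are ad-hoc choices, not the paper's.

```python
import numpy as np

def hard_threshold(theta, s):
    """Projection onto s-sparse vectors: keep only the s largest-magnitude entries."""
    out = np.zeros_like(theta)
    keep = np.argsort(np.abs(theta))[-s:]
    out[keep] = theta[keep]
    return out

def projected_gradient(X, y, s, iters=200):
    n, p = X.shape
    lr = n / np.linalg.norm(X, 2) ** 2        # heuristic 1/L step for the surrogate loss
    theta = np.zeros(p)
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / n      # gradient of the linear least-squares surrogate
        theta = hard_threshold(theta - lr * grad, s)
    return theta
```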

The Bayesian Linear Information Filtering Problem

Title The Bayesian Linear Information Filtering Problem
Authors Bangrui Chen, Peter I. Frazier
Abstract We present a Bayesian sequential decision-making formulation of the information filtering problem, in which an algorithm presents items (news articles, scientific papers, tweets) arriving in a stream, and learns relevance from user feedback on presented items. We model user preferences using a Bayesian linear model, similar in spirit to a Bayesian linear bandit. We compute a computational upper bound on the value of the optimal policy, which allows computing an optimality gap for implementable policies. We then use this analysis as motivation in introducing a pair of new Decompose-Then-Decide (DTD) heuristic policies, DTD-Dynamic-Programming (DTD-DP) and DTD-Upper-Confidence-Bound (DTD-UCB). We compare DTD-DP and DTD-UCB against several benchmarks on real and simulated data, demonstrating significant improvement, and show that the achieved performance is close to the upper bound.
Tasks Decision Making
Published 2016-05-30
URL http://arxiv.org/abs/1605.09088v2
PDF http://arxiv.org/pdf/1605.09088v2.pdf
PWC https://paperswithcode.com/paper/the-bayesian-linear-information-filtering
Repo
Framework
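
The DTD policies are specific to the paper, but the underlying model is a Bayesian linear one. Below is a generic sketch in that spirit: a Gaussian posterior over user-preference weights, a UCB-style rule for deciding whether to present an item, and a conjugate update from feedback on presented items. The threshold and exploration weight are illustrative choices, not the paper's.

```python
import numpy as np

class BayesianLinearFilter:
    """Gaussian posterior over user-preference weights; presents items with high UCB."""
    def __init__(self, dim, prior_var=1.0, noise_var=1.0, beta=2.0):
        self.A = np.eye(dim) / prior_var      # posterior precision
        self.b = np.zeros(dim)
        self.noise_var = noise_var
        self.beta = beta                      # exploration weight

    def ucb(self, x):
        cov = np.linalg.inv(self.A)
        mean = cov @ self.b
        return x @ mean + self.beta * np.sqrt(x @ cov @ x)

    def should_present(self, x, threshold=0.0):
        return self.ucb(x) > threshold

    def update(self, x, reward):
        # Standard Bayesian linear-regression update from feedback on a presented item.
        self.A += np.outer(x, x) / self.noise_var
        self.b += reward * x / self.noise_var
```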

Orientation Driven Bag of Appearances for Person Re-identification

Title Orientation Driven Bag of Appearances for Person Re-identification
Authors Liqian Ma, Hong Liu, Liang Hu, Can Wang, Qianru Sun
Abstract Person re-identification (re-id) consists of associating individuals across a camera network, which is valuable for intelligent video surveillance and has drawn wide attention. Although person re-identification research is making progress, it still faces some challenges such as varying poses, illumination and viewpoints. For feature representation in re-identification, existing works usually use low-level descriptors which do not take full advantage of body structure information, resulting in low representation ability. To solve this problem, this paper proposes the mid-level body-structure based feature representation (BSFR) which introduces a body structure pyramid for codebook learning and feature pooling in the vertical direction of the human body. Besides, varying viewpoints in the horizontal direction of the human body usually cause a data missing problem, i.e., the appearances obtained in different orientations of the identical person could vary significantly. To address this problem, the orientation driven bag of appearances (ODBoA) is proposed to utilize person orientation information extracted by orientation estimation techniques. To properly evaluate the proposed approach, we introduce a new re-identification dataset (Market-1203) based on the Market-1501 dataset and propose a new re-identification dataset (PKU-Reid). Both datasets contain multiple images captured in different body orientations for each person. Experimental results on three public datasets and two proposed datasets demonstrate the superiority of the proposed approach, indicating the effectiveness of body structure and orientation information for improving re-identification performance.
Tasks Person Re-Identification
Published 2016-05-09
URL http://arxiv.org/abs/1605.02464v1
PDF http://arxiv.org/pdf/1605.02464v1.pdf
PWC https://paperswithcode.com/paper/orientation-driven-bag-of-appearances-for
Repo
Framework

Errors-in-variables models with dependent measurements

Title Errors-in-variables models with dependent measurements
Authors Mark Rudelson, Shuheng Zhou
Abstract Suppose that we observe $y \in \mathbb{R}^n$ and $X \in \mathbb{R}^{n \times m}$ in the following errors-in-variables model: \begin{eqnarray*} y & = & X_0 \beta^* + \epsilon, \\ X & = & X_0 + W, \end{eqnarray*} where $X_0$ is an $n \times m$ design matrix with independent subgaussian row vectors, $\epsilon \in \mathbb{R}^n$ is a noise vector and $W$ is a mean zero $n \times m$ random noise matrix with independent subgaussian column vectors, independent of $X_0$ and $\epsilon$. This model is significantly different from those analyzed in the literature in the sense that we allow the measurement error for each covariate to be a dependent vector across its $n$ observations. Such error structures appear in the science literature when modeling the trial-to-trial fluctuations in response strength shared across a set of neurons. Under sparsity and restrictive eigenvalue type of conditions, we show that one is able to recover a sparse vector $\beta^* \in \mathbb{R}^m$ from the model given a single observation matrix $X$ and the response vector $y$. We establish consistency in estimating $\beta^*$ and obtain the rates of convergence in the $\ell_q$ norm, where $q = 1, 2$. We show error bounds which approach that of the regular Lasso and the Dantzig selector in case the errors in $W$ are tending to 0. We analyze the convergence rates of the gradient descent methods for solving the nonconvex programs and show that the composite gradient descent algorithm is guaranteed to converge at a geometric rate to a neighborhood of the global minimizers: the size of the neighborhood is bounded by the statistical error in the $\ell_2$ norm. Our analysis reveals interesting connections between computational and statistical efficiency and the concentration of measure phenomenon in random matrix theory. We provide simulation evidence illuminating the theoretical predictions.
Tasks
Published 2016-11-15
URL http://arxiv.org/abs/1611.04701v2
PDF http://arxiv.org/pdf/1611.04701v2.pdf
PWC https://paperswithcode.com/paper/errors-in-variables-models-with-dependent
Repo
Framework
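
A sketch of the corrected-Lasso-style nonconvex program alluded to in the abstract (in the spirit of Loh and Wainwright), assuming the column-noise covariance `Sigma_W` is known: the surrogate Gram matrix $X^\top X/n - \Sigma_W$ can be indefinite, which is what makes the program nonconvex, and the update below is plain composite (proximal) gradient descent. The theoretical analyses additionally constrain the $\ell_1$ norm of the iterates, which is omitted here, and the step size is an arbitrary small constant.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def corrected_lasso(X, y, Sigma_W, lam, lr=0.01, iters=500):
    """Composite gradient descent on 0.5*b'Gb - g'b + lam*||b||_1,
    with G = X'X/n - Sigma_W (possibly indefinite) and g = X'y/n."""
    n, m = X.shape
    G = X.T @ X / n - Sigma_W     # unbiased surrogate for X0'X0/n under additive noise W
    g = X.T @ y / n               # unbiased surrogate for X0'y/n
    beta = np.zeros(m)
    for _ in range(iters):
        grad = G @ beta - g
        beta = soft_threshold(beta - lr * grad, lr * lam)
    return beta
```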

Recognition of Text Image Using Multilayer Perceptron

Title Recognition of Text Image Using Multilayer Perceptron
Authors Singh Vijendra, Nisha Vasudeva, Hem Jyotsana Parashar
Abstract The biggest challenge in the field of image processing is to recognize documents in both printed and handwritten formats. Optical Character Recognition (OCR) is a type of document image analysis in which a scanned digital image containing either machine-printed or handwritten script is fed into an OCR software engine and translated into an editable, machine-readable digital text format. A neural network is designed to model the way in which the brain performs a particular task or function of interest; here, the neural network is simulated in software on a digital computer. Character recognition refers to the process of converting printed text documents into translated Unicode text. The printed documents available in the form of books, papers, magazines, etc. are scanned using standard scanners, which produce an image of the scanned document. Lines are identified by an algorithm that locates the top and bottom of each line. Within each line, character boundaries are then calculated, and using these calculations the characters are isolated from the image and classified by basic back-propagation. Each character image comprises 30x20 pixels. We use a back-propagation neural network for efficient recognition, in which errors are corrected through back-propagation and the rectified neuron values are transmitted by the feed-forward method through the multilayer network.
Tasks Optical Character Recognition
Published 2016-12-02
URL http://arxiv.org/abs/1612.00625v1
PDF http://arxiv.org/pdf/1612.00625v1.pdf
PWC https://paperswithcode.com/paper/recognition-of-text-image-using-multilayer
Repo
Framework
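
A minimal sketch of the classification stage described above, assuming characters have already been segmented into 30x20 images; scikit-learn's MLPClassifier (a back-propagation-trained multilayer perceptron) stands in for the hand-rolled network in the paper, and the hidden-layer size is an arbitrary choice.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def to_vector(char_img):
    # Each segmented character is a 30x20 image, flattened to a 600-dim vector.
    assert char_img.shape == (30, 20)
    return char_img.reshape(-1).astype(float)

def train_ocr(char_imgs, labels):
    X = np.stack([to_vector(c) for c in char_imgs])
    clf = MLPClassifier(hidden_layer_sizes=(128,), max_iter=500)
    return clf.fit(X, labels)

# Usage sketch: model = train_ocr(train_chars, train_labels)
#               model.predict([to_vector(test_char)])
```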