July 27, 2019

Paper Group ANR 521

Enhanced Characterness for Text Detection in the Wild. Blind image deblurring using class-adapted image priors. Toward Robustness against Label Noise in Training Deep Discriminative Neural Networks. Parle: parallelizing stochastic gradient descent. Comparison of Global Algorithms in Word Sense Disambiguation. Playing for Benchmarks. The IIT Bombay …

Enhanced Characterness for Text Detection in the Wild

Title Enhanced Characterness for Text Detection in the Wild
Authors Aarushi Agrawal, Prerana Mukherjee, Siddharth Srivastava, Brejesh Lall
Abstract Text spotting is an interesting research problem, as text may appear at any location and in various forms. Moreover, the ability to detect text opens new horizons for many advanced computer vision problems. In this paper, we propose a novel language-agnostic text detection method that uses edge-enhanced Maximally Stable Extremal Regions in natural scenes and defines strong characterness measures. We show that a simple combination of characterness cues helps in rejecting non-text regions. These regions are further fine-tuned to reject non-textual neighbor regions. Comprehensive evaluation of the proposed scheme shows that it provides comparable or better generalization performance than traditional methods for this task.
Tasks Text Spotting
Published 2017-12-04
URL http://arxiv.org/abs/1712.04927v1
PDF http://arxiv.org/pdf/1712.04927v1.pdf
PWC https://paperswithcode.com/paper/enhanced-characterness-for-text-detection-in
Repo
Framework

Blind image deblurring using class-adapted image priors

Title Blind image deblurring using class-adapted image priors
Authors Marina Ljubenović, Mário A. T. Figueiredo
Abstract Blind image deblurring (BID) is an ill-posed inverse problem, usually addressed by imposing prior knowledge on the (unknown) image and on the blurring filter. Most of the work on BID has focused on natural images, using image priors based on statistical properties of generic natural images. However, in many applications, it is known that the image being recovered belongs to some specific class (e.g., text, face, fingerprints), and exploiting this knowledge allows obtaining more accurate priors. In this work, we propose a method where a Gaussian mixture model (GMM) is used to learn a class-adapted prior, by training on a dataset of clean images of that class. Experiments show the competitiveness of the proposed method in terms of restoration quality when dealing with images containing text, faces, or fingerprints. Additionally, experiments show that the proposed method is able to handle text images at high noise levels, outperforming state-of-the-art methods specifically designed for BID of text images.
Tasks Blind Image Deblurring, Deblurring
Published 2017-09-06
URL http://arxiv.org/abs/1709.01710v1
PDF http://arxiv.org/pdf/1709.01710v1.pdf
PWC https://paperswithcode.com/paper/blind-image-deblurring-using-class-adapted
Repo
Framework
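
As a rough illustration of the class-adapted prior described in the abstract above, the sketch below scores a patch under two Gaussian mixture models and shows that a text-like patch is more probable under a text-like prior. It only evaluates a GMM; the learning step (fitting the mixture to clean images of the class) and the deblurring itself are omitted. The function name, the two-pixel "patches", and all mixture parameters are hypothetical and hand-picked for the example, not taken from the paper.

```python
import math

def gmm_log_density(x, weights, means, variances):
    """Log density of vector x under a diagonal-covariance Gaussian mixture."""
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        log_gauss = 0.0
        for xi, mi, vi in zip(x, mu, var):
            log_gauss += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        log_terms.append(math.log(w) + log_gauss)
    # Log-sum-exp over components, shifted by the maximum for numerical stability.
    top = max(log_terms)
    return top + math.log(sum(math.exp(t - top) for t in log_terms))

# Hypothetical priors over 2-pixel patches with intensities in [0, 1].
# The "text" prior concentrates near black and white; the "generic" prior is broad.
text_prior = dict(
    weights=[0.5, 0.5],
    means=[[0.05, 0.05], [0.95, 0.95]],
    variances=[[0.01, 0.01], [0.01, 0.01]],
)
generic_prior = dict(
    weights=[0.5, 0.5],
    means=[[0.4, 0.4], [0.6, 0.6]],
    variances=[[0.09, 0.09], [0.09, 0.09]],
)

patch = [0.02, 0.08]  # a nearly black, text-like patch
score_text = gmm_log_density(patch, **text_prior)
score_generic = gmm_log_density(patch, **generic_prior)
```

In a restoration pipeline, a log density of this kind would typically enter the objective as a regularization term; how the paper combines it with the blur estimate is not shown here.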

Toward Robustness against Label Noise in Training Deep Discriminative Neural Networks

Title Toward Robustness against Label Noise in Training Deep Discriminative Neural Networks
Authors Arash Vahdat
Abstract Collecting large training datasets, annotated with high-quality labels, is costly and time-consuming. This paper proposes a novel framework for training deep convolutional neural networks from noisy labeled datasets that can be obtained cheaply. The problem is formulated using an undirected graphical model that represents the relationship between noisy and clean labels, trained in a semi-supervised setting. In our formulation, the inference over latent clean labels is tractable and is regularized during training using auxiliary sources of information. The proposed model is applied to the image labeling problem and is shown to be effective in labeling unseen images as well as reducing label noise in training on CIFAR-10 and MS COCO datasets.
Tasks
Published 2017-05-31
URL http://arxiv.org/abs/1706.00038v2
PDF http://arxiv.org/pdf/1706.00038v2.pdf
PWC https://paperswithcode.com/paper/toward-robustness-against-label-noise-in
Repo
Framework

Parle: parallelizing stochastic gradient descent

Title Parle: parallelizing stochastic gradient descent
Authors Pratik Chaudhari, Carlo Baldassi, Riccardo Zecchina, Stefano Soatto, Ameet Talwalkar, Adam Oberman
Abstract We propose a new algorithm called Parle for parallel training of deep networks that converges 2-4x faster than a data-parallel implementation of SGD, while achieving significantly improved error rates that are nearly state-of-the-art on several benchmarks including CIFAR-10 and CIFAR-100, without introducing any additional hyper-parameters. We exploit the phenomenon of flat minima that has been shown to lead to improved generalization error for deep networks. Parle requires very infrequent communication with the parameter server and instead performs more computation on each client, which makes it well-suited to both single-machine, multi-GPU settings and distributed implementations.
Tasks
Published 2017-07-03
URL http://arxiv.org/abs/1707.00424v2
PDF http://arxiv.org/pdf/1707.00424v2.pdf
PWC https://paperswithcode.com/paper/parle-parallelizing-stochastic-gradient
Repo
Framework

Comparison of Global Algorithms in Word Sense Disambiguation

Title Comparison of Global Algorithms in Word Sense Disambiguation
Authors Loïc Vial, Andon Tchechmedjiev, Didier Schwab
Abstract This article compares four probabilistic algorithms (global algorithms) for Word Sense Disambiguation (WSD) in terms of the number of scorer calls (local algorithm) and the F1 score as determined by a gold-standard scorer. Two algorithms come from the state of the art, a Simulated Annealing Algorithm (SAA) and a Genetic Algorithm (GA), and two are state-of-the-art probabilistic search algorithms that we adapt to WSD for the first time, namely a Cuckoo Search Algorithm (CSA) and a Bat Search algorithm (BS). As WSD requires evaluating exponentially many word sense combinations (with branching factors of up to 6 or more), probabilistic algorithms make it possible to find approximate solutions in tractable time by sampling the search space. We find that CSA, GA and SAA all eventually converge to similar results (0.98 F1 score), but CSA gets there faster, reaching 0.95 F1 in fewer scorer calls than SAA. In BS, a strict convergence criterion prevents it from reaching above 0.89 F1.
Tasks Word Sense Disambiguation
Published 2017-04-07
URL http://arxiv.org/abs/1704.02293v1
PDF http://arxiv.org/pdf/1704.02293v1.pdf
PWC https://paperswithcode.com/paper/comparison-of-global-algorithms-in-word-sense
Repo
Framework
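
To make the idea of a global algorithm that samples the space of sense combinations and is charged per scorer call more concrete, here is a minimal simulated annealing sketch. It is not the authors' implementation: the scorer `toy_score`, the `SENSE_TOPICS` table, the cooling schedule, and every parameter value are assumptions chosen only to keep the example small and deterministic.

```python
import math
import random

# Hypothetical topic label for each candidate sense of each word.
SENSE_TOPICS = [
    ["finance", "river"],
    ["river", "finance", "music"],
    ["music", "finance"],
    ["finance", "river", "sport"],
]

def toy_score(assignment):
    """Toy global score: number of word pairs whose chosen senses share a topic."""
    topics = [SENSE_TOPICS[i][s] for i, s in enumerate(assignment)]
    return sum(
        1
        for i in range(len(topics))
        for j in range(i + 1, len(topics))
        if topics[i] == topics[j]
    )

def simulated_annealing(sense_counts, score, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Search sense assignments; returns (best assignment, best score, scorer calls)."""
    rng = random.Random(seed)
    current = [rng.randrange(n) for n in sense_counts]
    current_score = score(current)
    best, best_score = list(current), current_score
    calls = 1
    temperature = t0
    for _ in range(steps):
        # Propose a neighbor by resampling the sense of one random word.
        candidate = list(current)
        i = rng.randrange(len(candidate))
        candidate[i] = rng.randrange(sense_counts[i])
        candidate_score = score(candidate)
        calls += 1
        delta = candidate_score - current_score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = list(current), current_score
        temperature *= cooling
    return best, best_score, calls

sense_counts = [len(senses) for senses in SENSE_TOPICS]
best, best_score, calls = simulated_annealing(sense_counts, toy_score, steps=500)
```

Counting `calls` mirrors the comparison criterion in the abstract, where algorithms are judged by the F1 they reach for a given number of scorer calls.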

Playing for Benchmarks

Title Playing for Benchmarks
Authors Stephan R. Richter, Zeeshan Hayder, Vladlen Koltun
Abstract We present a benchmark suite for visual perception. The benchmark is based on more than 250K high-resolution video frames, all annotated with ground-truth data for both low-level and high-level vision tasks, including optical flow, semantic instance segmentation, object detection and tracking, object-level 3D scene layout, and visual odometry. Ground-truth data for all tasks is available for every frame. The data was collected while driving, riding, and walking a total of 184 kilometers in diverse ambient conditions in a realistic virtual world. To create the benchmark, we have developed a new approach to collecting ground-truth data from simulated worlds without access to their source code or content. We conduct statistical analyses that show that the composition of the scenes in the benchmark closely matches the composition of corresponding physical environments. The realism of the collected data is further validated via perceptual experiments. We analyze the performance of state-of-the-art methods for multiple tasks, providing reference baselines and highlighting challenges for future research. The supplementary video can be viewed at https://youtu.be/T9OybWv923Y
Tasks Instance Segmentation, Object Detection, Optical Flow Estimation, Semantic Segmentation, Visual Odometry
Published 2017-09-21
URL http://arxiv.org/abs/1709.07322v1
PDF http://arxiv.org/pdf/1709.07322v1.pdf
PWC https://paperswithcode.com/paper/playing-for-benchmarks
Repo
Framework

The IIT Bombay English-Hindi Parallel Corpus

Title The IIT Bombay English-Hindi Parallel Corpus
Authors Anoop Kunchukuttan, Pratik Mehta, Pushpak Bhattacharyya
Abstract We present the IIT Bombay English-Hindi Parallel Corpus. The corpus is a compilation of parallel corpora previously available in the public domain as well as new parallel corpora we collected. The corpus contains 1.49 million parallel segments, of which 694k segments were not previously available in the public domain. The corpus has been pre-processed for machine translation, and we report baseline phrase-based SMT and NMT translation results on this corpus. This corpus has been used in two editions of shared tasks at the Workshop on Asian Language Translation (2016 and 2017). The corpus is freely available for non-commercial research. To the best of our knowledge, this is the largest publicly available English-Hindi parallel corpus.
Tasks Machine Translation
Published 2017-10-08
URL http://arxiv.org/abs/1710.02855v2
PDF http://arxiv.org/pdf/1710.02855v2.pdf
PWC https://paperswithcode.com/paper/the-iit-bombay-english-hindi-parallel-corpus
Repo
Framework

Learning a Predictive Model for Music Using PULSE

Title Learning a Predictive Model for Music Using PULSE
Authors Jonas Langhabel
Abstract Predictive models for music are studied by researchers of algorithmic composition, the cognitive sciences and machine learning. They serve as base models for composition, can simulate human prediction and provide a multidisciplinary application domain for learning algorithms. A particularly well established and constantly advanced subtask is the prediction of monophonic melodies. As melodies typically involve non-Markovian dependencies their prediction requires a capable learning algorithm. In this thesis, I apply the recent feature discovery and learning method PULSE to the realm of symbolic music modeling. PULSE is comprised of a feature generating operation and L1-regularized optimization. These are used to iteratively expand and cull the feature set, effectively exploring feature spaces that are too large for common feature selection approaches. I design a general Python framework for PULSE, propose task-optimized feature generating operations and various music-theoretically motivated features that are evaluated on a standard corpus of monophonic folk and chorale melodies. The proposed method significantly outperforms comparable state-of-the-art models. I further discuss the free parameters of the learning algorithm and analyze the feature composition of the learned models. The models learned by PULSE afford an easy inspection and are musicologically interpreted for the first time.
Tasks Feature Selection, Music Modeling
Published 2017-09-26
URL http://arxiv.org/abs/1709.08842v1
PDF http://arxiv.org/pdf/1709.08842v1.pdf
PWC https://paperswithcode.com/paper/learning-a-predictive-model-for-music-using
Repo
Framework

Texture and Structure Incorporated ScatterNet Hybrid Deep Learning Network (TS-SHDL) For Brain Matter Segmentation

Title Texture and Structure Incorporated ScatterNet Hybrid Deep Learning Network (TS-SHDL) For Brain Matter Segmentation
Authors Amarjot Singh, Devamanyu Hazarika, Aniruddha Bhattacharya
Abstract Automation of brain matter segmentation from MR images is a challenging task due to the irregular boundaries between the grey and white matter regions. In addition, the presence of intensity inhomogeneity in the MR images further complicates the problem. In this paper, we propose a texture and vesselness incorporated version of the ScatterNet Hybrid Deep Learning Network (TS-SHDL) that extracts hierarchical invariant mid-level features, used by Fisher vector encoding and a conditional random field (CRF) to perform the desired segmentation. The performance of the proposed network is evaluated by extensive experimentation and comparison with state-of-the-art methods on several 2D MRI scans taken from the synthetic McGill Brain Web as well as on the MRBrainS dataset of real 3D MRI scans. The advantages of the TS-SHDL network over supervised deep learning networks are also presented, in addition to its superior performance over the state of the art.
Tasks
Published 2017-08-30
URL http://arxiv.org/abs/1708.09300v1
PDF http://arxiv.org/pdf/1708.09300v1.pdf
PWC https://paperswithcode.com/paper/texture-and-structure-incorporated-scatternet
Repo
Framework

Socially Compliant Navigation through Raw Depth Inputs with Generative Adversarial Imitation Learning

Title Socially Compliant Navigation through Raw Depth Inputs with Generative Adversarial Imitation Learning
Authors Lei Tai, Jingwei Zhang, Ming Liu, Wolfram Burgard
Abstract We present an approach for mobile robots to learn to navigate in dynamic environments with pedestrians via raw depth inputs, in a socially compliant manner. To achieve this, we adopt a generative adversarial imitation learning (GAIL) strategy, which improves upon a pre-trained behavior cloning policy. Our approach overcomes the disadvantages of previous methods, which depend heavily on full knowledge of the location and velocity of nearby pedestrians; this not only requires specific sensors, but extracting such state information from raw sensory input can also consume considerable computation time. Our proposed GAIL-based model operates directly on raw depth inputs and plans in real time. Experiments show that our GAIL-based approach greatly improves the safety and efficiency of mobile robot behavior compared with pure behavior cloning. The real-world deployment also shows that our method is capable of guiding autonomous vehicles to navigate in a socially compliant manner directly through raw depth inputs. In addition, we release a simulation plugin for modeling pedestrian behaviors based on the social force model.
Tasks Autonomous Vehicles, Imitation Learning
Published 2017-10-06
URL http://arxiv.org/abs/1710.02543v2
PDF http://arxiv.org/pdf/1710.02543v2.pdf
PWC https://paperswithcode.com/paper/socially-compliant-navigation-through-raw
Repo
Framework

Robust Localized Multi-view Subspace Clustering

Title Robust Localized Multi-view Subspace Clustering
Authors Yanbo Fan, Jian Liang, Ran He, Bao-Gang Hu, Siwei Lyu
Abstract In multi-view clustering, different views may have different confidence levels when learning a consensus representation. Existing methods usually address this by assigning distinctive weights to different views. However, due to the noisy nature of real-world applications, the confidence levels of samples in the same view may also vary. Thus, using a single weight for a whole view may lead to suboptimal solutions. In this paper, we propose a novel localized multi-view subspace clustering model that considers the confidence levels of both views and samples. By properly assigning a weight to each sample under each view, we can obtain a robust consensus representation by fusing the noiseless structures among views and samples. We further develop a regularizer on the weight parameters based on convex conjugacy theory, and sample weights are determined in an adaptive manner. An efficient iterative algorithm is developed with a convergence guarantee. Experimental results on four benchmarks demonstrate the correctness and effectiveness of the proposed model.
Tasks Multi-view Subspace Clustering
Published 2017-05-22
URL http://arxiv.org/abs/1705.07777v1
PDF http://arxiv.org/pdf/1705.07777v1.pdf
PWC https://paperswithcode.com/paper/robust-localized-multi-view-subspace
Repo
Framework

Any-Angle Pathfinding for Multiple Agents Based on SIPP Algorithm

Title Any-Angle Pathfinding for Multiple Agents Based on SIPP Algorithm
Authors Konstantin Yakovlev, Anton Andreychuk
Abstract The paper addresses the problem of finding conflict-free trajectories for multiple agents of identical circular shape operating in a shared 2D workspace, and a decoupled (e.g., prioritized) approach is used to solve it. The agents’ workspace is tessellated into a square grid on which any-angle moves are allowed, i.e., each agent can move in an arbitrary direction as long as the move follows a straight line segment whose endpoints are tied to distinct grid elements. A novel any-angle planner based on the Safe Interval Path Planning (SIPP) algorithm is proposed to find trajectories for an agent moving amidst dynamic obstacles (other agents) on a grid. This algorithm is then used as part of a prioritized multi-agent planner, AA-SIPP(m). On the theoretical side, we show that AA-SIPP(m) is complete under well-defined conditions. On the experimental side, in simulation tests with up to 200 agents involved, we show that our planner finds much better solutions in terms of cost (up to 20%) compared to planners relying on cardinal moves only.
Tasks
Published 2017-03-12
URL http://arxiv.org/abs/1703.04159v2
PDF http://arxiv.org/pdf/1703.04159v2.pdf
PWC https://paperswithcode.com/paper/any-angle-pathfinding-for-multiple-agents
Repo
Framework
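
The abstract above builds on Safe Interval Path Planning, whose basic building block is the set of time intervals during which a grid cell is free of dynamic obstacles. The sketch below shows only that building block, not the any-angle planner or AA-SIPP(m): given the intervals during which other agents occupy a cell, it computes the complementary safe intervals and the earliest safe arrival time. Function names and the half-open interval convention are assumptions made for this example.

```python
import math

def safe_intervals(collision_intervals, horizon=math.inf):
    """Complement of the occupied intervals of one cell over [0, horizon).

    Each interval is a (start, end) pair, treated as half-open [start, end).
    """
    safe = []
    start = 0.0
    for lo, hi in sorted(collision_intervals):
        if lo > start:
            safe.append((start, lo))
        # Merge overlapping occupied intervals by keeping the furthest end seen.
        start = max(start, hi)
    if start < horizon:
        safe.append((start, horizon))
    return safe

def earliest_safe_arrival(intervals, t):
    """Earliest time >= t at which the cell can be entered, or None if never."""
    for lo, hi in intervals:
        if t < hi:
            return max(t, lo)
    return None

# Hypothetical occupancy of one cell by two higher-priority agents.
cell_safe = safe_intervals([(2, 4), (7, 8)])
```

In a prioritized planner, the trajectories of already-planned agents would supply the occupied intervals for each cell, and the search would expand (cell, safe interval) pairs instead of (cell, time step) pairs.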

SVSGAN: Singing Voice Separation via Generative Adversarial Network

Title SVSGAN: Singing Voice Separation via Generative Adversarial Network
Authors Zhe-Cheng Fan, Yen-Lin Lai, Jyh-Shing Roger Jang
Abstract Separating two sources from an audio mixture is an important task with many applications. It is a challenging problem since only one signal channel is available for analysis. In this paper, we propose a novel framework for singing voice separation using the generative adversarial network (GAN) with a time-frequency masking function. The mixture spectra are considered to be a distribution and are mapped to the clean spectra, which are also considered a distribution. The approximation of distributions between mixture spectra and clean spectra is performed during the adversarial training process. In contrast with current deep learning approaches for source separation, the parameters of the proposed framework are first initialized in a supervised setting and then optimized by the training procedure of GAN in an unsupervised setting. Experimental results on three datasets (MIR-1K, iKala and DSD100) show that performance can be improved by the proposed framework consisting of conventional networks.
Tasks
Published 2017-10-31
URL http://arxiv.org/abs/1710.11428v2
PDF http://arxiv.org/pdf/1710.11428v2.pdf
PWC https://paperswithcode.com/paper/svsgan-singing-voice-separation-via
Repo
Framework
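
The time-frequency masking function mentioned in the abstract above can be illustrated with a soft ratio mask, a common choice in source separation; whether the paper uses exactly this form is an assumption here. Given magnitude estimates for the vocals and the accompaniment (which in the paper would come from the generator network, omitted in this sketch), the mask splits each mixture bin so that the two outputs sum back to the mixture. All names and values are hypothetical.

```python
def soft_mask_separate(mixture, vocal_est, accomp_est, eps=1e-8):
    """Split a magnitude spectrogram (list of frames) with a soft ratio mask.

    For each time-frequency bin, mask = |v| / (|v| + |a|); the vocal output is
    mask * mixture and the accompaniment output is (1 - mask) * mixture.
    """
    vocals, accomp = [], []
    for m_row, v_row, a_row in zip(mixture, vocal_est, accomp_est):
        v_out, a_out = [], []
        for m, v, a in zip(m_row, v_row, a_row):
            mask = abs(v) / (abs(v) + abs(a) + eps)  # eps avoids division by zero
            v_out.append(mask * m)
            a_out.append((1.0 - mask) * m)
        vocals.append(v_out)
        accomp.append(a_out)
    return vocals, accomp

# One frame with two frequency bins; the estimates are made-up numbers.
mix = [[1.0, 2.0]]
vocal_guess = [[3.0, 0.0]]
accomp_guess = [[1.0, 4.0]]
vocals, accomp = soft_mask_separate(mix, vocal_guess, accomp_guess)
```

Because the mask lies in [0, 1], the separated outputs can never exceed the mixture magnitude in any bin, which is one reason masking is often preferred over predicting spectra directly.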

Marginal sequential Monte Carlo for doubly intractable models

Title Marginal sequential Monte Carlo for doubly intractable models
Authors Richard G. Everitt, Dennis Prangle, Philip Maybank, Mark Bell
Abstract Bayesian inference for models that have an intractable partition function is known as a doubly intractable problem, where standard Monte Carlo methods are not applicable. The past decade has seen the development of auxiliary variable Monte Carlo techniques (Møller et al., 2006; Murray et al., 2006) for tackling this problem; these approaches being members of the more general class of pseudo-marginal, or exact-approximate, Monte Carlo algorithms (Andrieu and Roberts, 2009), which make use of unbiased estimates of intractable posteriors. Everitt et al. (2017) investigated the use of exact-approximate importance sampling (IS) and sequential Monte Carlo (SMC) in doubly intractable problems, but focussed only on SMC algorithms that used data-point tempering. This paper describes SMC samplers that may use alternative sequences of distributions, and describes ways in which likelihood estimates may be improved adaptively as the algorithm progresses, building on ideas from Moores et al. (2015). This approach is compared with a number of alternative algorithms for doubly intractable problems, including approximate Bayesian computation (ABC), which we show is closely related to the method of Møller et al. (2006).
Tasks Bayesian Inference
Published 2017-10-12
URL http://arxiv.org/abs/1710.04382v1
PDF http://arxiv.org/pdf/1710.04382v1.pdf
PWC https://paperswithcode.com/paper/marginal-sequential-monte-carlo-for-doubly
Repo
Framework

Mixing time estimation in reversible Markov chains from a single sample path

Title Mixing time estimation in reversible Markov chains from a single sample path
Authors Daniel Hsu, Aryeh Kontorovich, David A. Levin, Yuval Peres, Csaba Szepesvári
Abstract The spectral gap $\gamma$ of a finite, ergodic, and reversible Markov chain is an important parameter measuring the asymptotic rate of convergence. In applications, the transition matrix $P$ may be unknown, yet one sample of the chain up to a fixed time $n$ may be observed. We consider here the problem of estimating $\gamma$ from this data. Let $\pi$ be the stationary distribution of $P$, and $\pi_\star = \min_x \pi(x)$. We show that if $n = \tilde{O}\bigl(\frac{1}{\gamma \pi_\star}\bigr)$, then $\gamma$ can be estimated to within multiplicative constants with high probability. When $\pi$ is uniform on $d$ states, this matches (up to logarithmic correction) a lower bound of $\tilde{\Omega}\bigl(\frac{d}{\gamma}\bigr)$ steps required for precise estimation of $\gamma$. Moreover, we provide the first procedure for computing a fully data-dependent interval, from a single finite-length trajectory of the chain, that traps the mixing time $t_{\text{mix}}$ of the chain at a prescribed confidence level. The interval does not require the knowledge of any parameters of the chain. This stands in contrast to previous approaches, which either only provide point estimates, or require a reset mechanism, or additional prior knowledge. The interval is constructed around the relaxation time $t_{\text{relax}} = 1/\gamma$, which is strongly related to the mixing time, and the width of the interval converges to zero roughly at a $1/\sqrt{n}$ rate, where $n$ is the length of the sample path.
Tasks
Published 2017-08-24
URL http://arxiv.org/abs/1708.07367v1
PDF http://arxiv.org/pdf/1708.07367v1.pdf
PWC https://paperswithcode.com/paper/mixing-time-estimation-in-reversible-markov-1
Repo
Framework
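
To give a feel for estimating the spectral gap from a single sample path, as discussed in the abstract above, the sketch below handles the simplest case: a two-state chain, where the second eigenvalue of the transition matrix has the closed form $1 - p_{01} - p_{10}$. It is a plain plug-in point estimate built from empirical transition frequencies, using the absolute spectral gap $1 - |\lambda_2|$; it does not construct the data-dependent confidence intervals that the paper contributes. The function name and the example path are hypothetical.

```python
import math

def estimate_two_state_gap(path):
    """Plug-in estimate of (spectral gap, relaxation time) for a chain on {0, 1}."""
    counts = {0: [0, 0], 1: [0, 0]}
    for a, b in zip(path, path[1:]):
        counts[a][b] += 1
    if sum(counts[0]) == 0 or sum(counts[1]) == 0:
        raise ValueError("both states must be left at least once in the path")
    p01 = counts[0][1] / sum(counts[0])  # empirical P(0 -> 1)
    p10 = counts[1][0] / sum(counts[1])  # empirical P(1 -> 0)
    lambda2 = 1.0 - p01 - p10            # second eigenvalue of a 2x2 stochastic matrix
    gamma = 1.0 - abs(lambda2)           # absolute spectral gap
    t_relax = 1.0 / gamma if gamma > 0 else math.inf
    return gamma, t_relax

# A short, made-up trajectory of a "sticky" chain.
gamma_hat, t_relax_hat = estimate_two_state_gap([0, 0, 0, 0, 1, 1, 1, 1, 0])
```

For chains with more states there is no such closed form, and one would instead compute the eigenvalues of a (suitably symmetrized) empirical transition matrix numerically.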