Paper Group ANR 245
MCMC Shape Sampling for Image Segmentation with Nonparametric Shape Priors
Title | MCMC Shape Sampling for Image Segmentation with Nonparametric Shape Priors |
Authors | Ertunc Erdil, Sinan Yıldırım, Müjdat Çetin, Tolga Taşdizen |
Abstract | Segmenting images of low quality or with missing data is a challenging problem. Integrating statistical prior information about the shapes to be segmented can improve the segmentation results significantly. Most shape-based segmentation algorithms optimize an energy functional and find a point estimate for the object to be segmented. This does not provide a measure of the degree of confidence in that result, nor does it provide a picture of other probable solutions based on the data and the priors. From a statistical viewpoint, addressing these issues involves characterizing the posterior densities of the shapes of the objects to be segmented. For such characterization, we propose a Markov chain Monte Carlo (MCMC) sampling-based image segmentation algorithm that uses statistical shape priors. In addition to better characterization of the statistical structure of the problem, such an approach also has the potential to address the problem of getting stuck at local optima, from which existing shape-based segmentation methods suffer. Our approach is able to characterize the posterior probability density in the space of shapes through its samples, and to return multiple solutions, potentially from different modes of a multimodal probability density, which would be encountered, e.g., in segmenting objects from multiple shape classes. We present promising results on a variety of data sets. We also provide an extension for segmenting shapes of objects with parts that can go through independent shape variations. This extension involves the use of local shape priors on object parts and provides robustness to limitations in shape training data size. |
Tasks | Semantic Segmentation |
Published | 2016-11-11 |
URL | http://arxiv.org/abs/1611.03749v1 |
PDF | http://arxiv.org/pdf/1611.03749v1.pdf |
PWC | https://paperswithcode.com/paper/mcmc-shape-sampling-for-image-segmentation |
Repo | |
Framework | |
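
The abstract above argues for sampling a posterior instead of returning a single point estimate, so that several modes can be recovered. The following is a minimal sketch of that idea only: it is a generic random-walk Metropolis-Hastings sampler on a toy one-dimensional bimodal density, not the authors' sampler in the space of shapes. The target density, the step size, and the function names are all assumptions made for illustration.

```python
import math
import random


def log_target(x):
    # Toy stand-in for a multimodal posterior (an assumption, not from the paper):
    # unnormalized log-density of an equal mixture of N(-2, 1) and N(2, 1),
    # computed with log-sum-exp for numerical safety.
    a = -0.5 * (x + 2.0) ** 2
    b = -0.5 * (x - 2.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def metropolis_hastings(n_samples, step=2.0, x0=0.0, seed=0):
    # Random-walk Metropolis-Hastings with a symmetric Gaussian proposal.
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_accept = log_target(proposal) - log_target(x)
        # Accept with probability min(1, target(proposal) / target(x)).
        if rng.random() < math.exp(min(0.0, log_accept)):
            x = proposal
        samples.append(x)
    return samples


samples = metropolis_hastings(20000)
fraction_right_mode = sum(1 for s in samples if s > 0) / len(samples)
```

Because the chain visits both modes, the samples describe the whole density, which a single optimum of an energy functional would not.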
Invariant Representations for Noisy Speech Recognition
Title | Invariant Representations for Noisy Speech Recognition |
Authors | Dmitriy Serdyuk, Kartik Audhkhasi, Philémon Brakel, Bhuvana Ramabhadran, Samuel Thomas, Yoshua Bengio |
Abstract | Modern automatic speech recognition (ASR) systems need to be robust under acoustic variability arising from environmental, speaker, channel, and recording conditions. Ensuring such robustness to variability is a challenge in modern day neural network-based ASR systems, especially when all types of variability are not seen during training. We attempt to address this problem by encouraging the neural network acoustic model to learn invariant feature representations. We use ideas from recent research on image generation using Generative Adversarial Networks and domain adaptation ideas extending adversarial gradient-based training. A recent work from Ganin et al. proposes to use adversarial training for image domain adaptation by using an intermediate representation from the main target classification network to deteriorate the domain classifier performance through a separate neural network. Our work focuses on investigating neural architectures which produce representations invariant to noise conditions for ASR. We evaluate the proposed architecture on the Aurora-4 task, a popular benchmark for noise robust ASR. We show that our method generalizes better than the standard multi-condition training especially when only a few noise categories are seen during training. |
Tasks | Domain Adaptation, Image Generation, Noisy Speech Recognition, Speech Recognition |
Published | 2016-11-27 |
URL | http://arxiv.org/abs/1612.01928v1 |
PDF | http://arxiv.org/pdf/1612.01928v1.pdf |
PWC | https://paperswithcode.com/paper/invariant-representations-for-noisy-speech |
Repo | |
Framework | |
Color Constancy with Derivative Colors
Title | Color Constancy with Derivative Colors |
Authors | Huan Lei, Guang Jiang, Long Quan |
Abstract | Information about the illuminant color is well contained in both achromatic regions and the specular components of highlight regions. In this paper, we propose a novel way to achieve color constancy by exploiting such clues. The key to our approach lies in the use of suitably extracted derivative colors, which are able to compute the illuminant color robustly with kernel density estimation. While extracting derivative colors from achromatic regions to approximate the illuminant color well is basically straightforward, the success of our extraction in highlight regions is attributed to the different rates of variation of the diffuse and specular magnitudes in the dichromatic reflection model. The proposed approach requires no training phase and is simple to implement. More significantly, it performs quite satisfactorily under inter-database parameter settings. Our experiments on three standard databases demonstrate its effectiveness and fine performance in comparison to state-of-the-art methods. |
Tasks | Color Constancy, Density Estimation |
Published | 2016-11-25 |
URL | http://arxiv.org/abs/1611.08389v1 |
PDF | http://arxiv.org/pdf/1611.08389v1.pdf |
PWC | https://paperswithcode.com/paper/color-constancy-with-derivative-colors |
Repo | |
Framework | |
A Versatile Scene Model with Differentiable Visibility Applied to Generative Pose Estimation
Title | A Versatile Scene Model with Differentiable Visibility Applied to Generative Pose Estimation |
Authors | Helge Rhodin, Nadia Robertini, Christian Richardt, Hans-Peter Seidel, Christian Theobalt |
Abstract | Generative reconstruction methods compute the 3D configuration (such as pose and/or geometry) of a shape by optimizing the overlap of the projected 3D shape model with images. Proper handling of occlusions is a big challenge, since the visibility function that indicates if a surface point is seen from a camera can often not be formulated in closed form, and is in general discrete and non-differentiable at occlusion boundaries. We present a new scene representation that enables an analytically differentiable closed-form formulation of surface visibility. In contrast to previous methods, this yields smooth, analytically differentiable, and efficient to optimize pose similarity energies with rigorous occlusion handling, fewer local minima, and experimentally verified improved convergence of numerical optimization. The underlying idea is a new image formation model that represents opaque objects by a translucent medium with a smooth Gaussian density distribution which turns visibility into a smooth phenomenon. We demonstrate the advantages of our versatile scene model in several generative pose estimation problems, namely marker-less multi-object pose estimation, marker-less human motion capture with few cameras, and image-based 3D geometry estimation. |
Tasks | Motion Capture, Pose Estimation |
Published | 2016-02-11 |
URL | http://arxiv.org/abs/1602.03725v1 |
PDF | http://arxiv.org/pdf/1602.03725v1.pdf |
PWC | https://paperswithcode.com/paper/a-versatile-scene-model-with-differentiable |
Repo | |
Framework | |
Fast and Extensible Online Multivariate Kernel Density Estimation
Title | Fast and Extensible Online Multivariate Kernel Density Estimation |
Authors | Jaime Ferreira, David Martins de Matos, Ricardo Ribeiro |
Abstract | We present xokde++, a state-of-the-art online kernel density estimation approach that maintains Gaussian mixture models of input data streams. The approach follows state-of-the-art work on online density estimation, but was redesigned with computational efficiency, numerical robustness, and extensibility in mind. Our approach produces comparable or better results than the current state-of-the-art, while achieving significant computational performance gains and improved numerical stability. The use of diagonal covariance Gaussian kernels, which further improve performance and stability at a small loss of modelling quality, is also explored. Our approach is up to 40 times faster, while requiring 90% less memory than the closest state-of-the-art counterpart. |
Tasks | Density Estimation |
Published | 2016-06-08 |
URL | http://arxiv.org/abs/1606.02608v1 |
PDF | http://arxiv.org/pdf/1606.02608v1.pdf |
PWC | https://paperswithcode.com/paper/fast-and-extensible-online-multivariate |
Repo | |
Framework | |
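
To make the idea of an online kernel density estimate concrete, here is a deliberately naive one-dimensional sketch. It is not xokde++: it keeps one Gaussian kernel per observed sample with a fixed bandwidth and performs no mixture compression, which is exactly the cost an approach like the one described above is designed to avoid. The class name `OnlineKDE` and its methods are hypothetical.

```python
import math


class OnlineKDE:
    """Naive streaming KDE: one fixed-bandwidth Gaussian kernel per sample."""

    def __init__(self, bandwidth):
        self.bandwidth = bandwidth
        self.centers = []

    def update(self, x):
        # Each new observation simply becomes a new mixture component.
        self.centers.append(x)

    def pdf(self, x):
        # Equal-weight Gaussian mixture evaluated at x.
        if not self.centers:
            return 0.0
        h = self.bandwidth
        norm = 1.0 / (len(self.centers) * h * math.sqrt(2.0 * math.pi))
        return norm * sum(math.exp(-0.5 * ((x - c) / h) ** 2) for c in self.centers)


kde = OnlineKDE(bandwidth=1.0)
for observation in [0.0, 2.0]:
    kde.update(observation)
density_at_midpoint = kde.pdf(1.0)
```

Memory and evaluation time grow linearly with the stream length here; a practical online estimator would merge or prune components as data arrive.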
Microscopic Pedestrian Flow Characteristics: Development of an Image Processing Data Collection and Simulation Model
Title | Microscopic Pedestrian Flow Characteristics: Development of an Image Processing Data Collection and Simulation Model |
Authors | Kardi Teknomo |
Abstract | Microscopic pedestrian studies consider detailed interaction of pedestrians to control their movement in pedestrian traffic flow. The tools to collect the microscopic data and to analyze microscopic pedestrian flow are still very much in their infancy. The microscopic pedestrian flow characteristics need to be understood. Manual, semi-manual and automatic image processing data collection systems were developed. It was found that the microscopic speed resembles a normal distribution with a mean of 1.38 m/s and a standard deviation of 0.37 m/s. The acceleration distribution also bears a resemblance to the normal distribution, with an average of 0.68 m/s². A physically based microscopic pedestrian simulation model was also developed. Both the Microscopic Video Data Collection and the Microscopic Pedestrian Simulation Model generate a database called the NTXY database. The formulations of the flow performance or microscopic pedestrian characteristics are explained. The sensitivity of the simulation and the relationship between the flow performances are described. Validation of the simulation using real-world data is then explained through a comparison between the average instantaneous speed distributions of the real-world data and the results of the simulations. The simulation model is then applied to experiments on a hypothetical situation to gain more understanding of pedestrian behavior in one-way and two-way situations, to learn how the system behaves if the number of elderly pedestrians increases, and to evaluate a policy of lane-like segregation at a pedestrian crossing and inspect the performance of the crossing. It was revealed that microscopic pedestrian studies have been successfully applied to improve understanding of the behavior of microscopic pedestrian flow, to predict theoretical and practical situations, and to evaluate design policies before their implementation. |
Tasks | |
Published | 2016-09-06 |
URL | http://arxiv.org/abs/1610.00029v1 |
PDF | http://arxiv.org/pdf/1610.00029v1.pdf |
PWC | https://paperswithcode.com/paper/microscopic-pedestrian-flow-characteristics |
Repo | |
Framework | |
Multispectral image denoising with optimized vector non-local mean filter
Title | Multispectral image denoising with optimized vector non-local mean filter |
Authors | Ahmed Ben Said, Rachid Hadjidj, Kamel Eddine Melkemi, Sebti Foufou |
Abstract | Nowadays, many applications rely on images of high quality to ensure good performance in conducting their tasks. However, noise goes against this objective as it is an unavoidable issue in most applications. Therefore, it is essential to develop techniques to attenuate the impact of noise, while maintaining the integrity of relevant information in images. We propose in this work to extend the application of the Non-Local Means filter (NLM) to the vector case and apply it to denoising multispectral images. The objective is to benefit from the additional information brought by multispectral imaging systems. The NLM filter exploits the redundancy of information in an image to remove noise. A restored pixel is a weighted average of all pixels in the image. In our contribution, we propose an optimization framework where we dynamically fine-tune the NLM filter parameters and reduce its computational complexity by considering only pixels which are most similar to each other in computing a restored pixel. Filter parameters are optimized using Stein’s Unbiased Risk Estimator (SURE) rather than using ad hoc means. Experiments have been conducted on multispectral images corrupted with additive white Gaussian noise, and PSNR and similarity comparisons with other approaches are provided to illustrate the efficiency of our approach in terms of both denoising performance and computational complexity. |
Tasks | Denoising, Image Denoising |
Published | 2016-10-21 |
URL | http://arxiv.org/abs/1610.06688v1 |
PDF | http://arxiv.org/pdf/1610.06688v1.pdf |
PWC | https://paperswithcode.com/paper/multispectral-image-denoising-with-optimized |
Repo | |
Framework | |
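
The statement that "a restored pixel is a weighted average of all pixels in the image" can be sketched in a few lines. The version below is a plain scalar Non-Local Means filter on a one-dimensional signal, with weights derived from patch similarity. It does not implement the vector (multispectral) extension or the SURE-based parameter optimization described above; the function name `nlm_1d`, the border clamping, and the parameters `patch_radius` and `h` are assumptions chosen for illustration.

```python
import math


def nlm_1d(signal, patch_radius=1, h=0.5):
    """Scalar Non-Local Means on a 1D signal (illustrative sketch)."""
    n = len(signal)

    def patch(i):
        # Neighborhood around sample i, with indices clamped at the borders.
        return [signal[min(max(i + d, 0), n - 1)]
                for d in range(-patch_radius, patch_radius + 1)]

    patches = [patch(i) for i in range(n)]
    restored = []
    for i in range(n):
        weights = []
        for j in range(n):
            # Mean squared difference between the two patches.
            d2 = sum((a - b) ** 2 for a, b in zip(patches[i], patches[j]))
            d2 /= len(patches[i])
            # Similar patches get weights near 1, dissimilar ones near 0.
            weights.append(math.exp(-d2 / (h * h)))
        total = sum(weights)
        restored.append(sum(w * s for w, s in zip(weights, signal)) / total)
    return restored


step = [0.0, 0.0, 0.0, 0.0, 10.0, 10.0, 10.0, 10.0]
restored_step = nlm_1d(step)
```

The double loop over all sample pairs is what makes the basic filter expensive, which is why restricting the average to the most similar pixels, as the abstract proposes, reduces the cost.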
Automatic Synchronization of Multi-User Photo Galleries
Title | Automatic Synchronization of Multi-User Photo Galleries |
Authors | E. Sansone, K. Apostolidis, N. Conci, G. Boato, V. Mezaris, F. G. B. De Natale |
Abstract | In this paper we address the issue of photo gallery synchronization, where pictures related to the same event are collected by different users. Existing solutions to the problem are usually based on unrealistic assumptions, like time consistency across photo galleries, and often rely heavily on heuristics, thereby limiting their applicability to real-world scenarios. We propose a solution that achieves better generalization performance for the synchronization task compared to the available literature. The method is characterized by three stages: first, deep convolutional neural network features are used to assess the visual similarity among the photos; then, pairs of similar photos are detected across different galleries and used to construct a graph; finally, a probabilistic graphical model is used to estimate the temporal offset of each pair of galleries, by traversing the minimum spanning tree extracted from this graph. The experimental evaluation is conducted on four publicly available datasets covering different types of events, demonstrating the strength of our proposed method. A thorough discussion of the obtained results is provided for a critical assessment of the quality of the synchronization. |
Tasks | |
Published | 2016-08-24 |
URL | http://arxiv.org/abs/1608.06770v2 |
PDF | http://arxiv.org/pdf/1608.06770v2.pdf |
PWC | https://paperswithcode.com/paper/automatic-synchronization-of-multi-user-photo |
Repo | |
Framework | |
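
The last stage described above, traversing a minimum spanning tree to obtain gallery offsets, can be sketched as follows. This is a simplification under stated assumptions: each graph edge already carries a fixed pairwise offset and a cost (lower meaning more trusted), whereas the paper estimates offsets with a probabilistic graphical model. The function name `estimate_offsets` and the edge tuple layout are hypothetical.

```python
from collections import defaultdict, deque


def estimate_offsets(n_galleries, edges, reference=0):
    """edges: (u, v, offset, cost), where offset = clock of v minus clock of u."""
    # Kruskal's algorithm with a small union-find structure.
    parent = list(range(n_galleries))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    tree = defaultdict(list)
    for u, v, offset, cost in sorted(edges, key=lambda e: e[3]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            tree[u].append((v, offset))
            tree[v].append((u, -offset))

    # Breadth-first traversal of the tree, accumulating offsets
    # relative to the reference gallery.
    offsets = {reference: 0.0}
    queue = deque([reference])
    while queue:
        u = queue.popleft()
        for v, offset in tree[u]:
            if v not in offsets:
                offsets[v] = offsets[u] + offset
                queue.append(v)
    return offsets


# Three galleries; the high-cost edge (0, 2) is an unreliable match.
example_edges = [(0, 1, 5.0, 0.1), (1, 2, -2.0, 0.2), (0, 2, 10.0, 0.9)]
example_offsets = estimate_offsets(3, example_edges)
```

Using only tree edges means each gallery's offset is determined by the most trusted chain of matches, so the unreliable direct edge between galleries 0 and 2 is ignored.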
VIPLFaceNet: An Open Source Deep Face Recognition SDK
Title | VIPLFaceNet: An Open Source Deep Face Recognition SDK |
Authors | Xin Liu, Meina Kan, Wanglong Wu, Shiguang Shan, Xilin Chen |
Abstract | Robust face representation is imperative to highly accurate face recognition. In this work, we propose an open source face recognition method with deep representation named VIPLFaceNet, which is a 10-layer deep convolutional neural network with 7 convolutional layers and 3 fully-connected layers. Compared with the well-known AlexNet, our VIPLFaceNet takes only 20% of the training time and 60% of the testing time, yet achieves a 40% drop in error rate on the real-world face recognition benchmark LFW. Our VIPLFaceNet achieves 98.60% mean accuracy on LFW using one single network. An open-source C++ SDK based on VIPLFaceNet is released under the BSD license. The SDK takes about 150 ms to process one face image in a single thread on an i7 desktop CPU. VIPLFaceNet provides a state-of-the-art starting point for both academic and industrial face recognition applications. |
Tasks | Face Recognition |
Published | 2016-09-13 |
URL | http://arxiv.org/abs/1609.03892v1 |
PDF | http://arxiv.org/pdf/1609.03892v1.pdf |
PWC | https://paperswithcode.com/paper/viplfacenet-an-open-source-deep-face |
Repo | |
Framework | |
Probabilistic Fluorescence-Based Synapse Detection
Title | Probabilistic Fluorescence-Based Synapse Detection |
Authors | Anish K. Simhal, Cecilia Aguerrebere, Forrest Collman, Joshua T. Vogelstein, Kristina D. Micheva, Richard J. Weinberg, Stephen J. Smith, Guillermo Sapiro |
Abstract | Brain function results from communication between neurons connected by complex synaptic networks. Synapses are themselves highly complex and diverse signaling machines, containing protein products of hundreds of different genes, some in hundreds of copies, arranged in a precise lattice at each individual synapse. Synapses are fundamental not only to synaptic network function but also to network development, adaptation, and memory. In addition, abnormalities of synapse numbers or molecular components are implicated in most mental and neurological disorders. Despite their obvious importance, mammalian synapse populations have so far resisted detailed quantitative study. In human brains and most animal nervous systems, synapses are very small and very densely packed: there are approximately 1 billion synapses per cubic millimeter of human cortex. This volumetric density poses very substantial challenges to proteometric analysis at the critical level of the individual synapse. The present work describes new probabilistic image analysis methods for single-synapse analysis of synapse populations in both animal and human brains. |
Tasks | |
Published | 2016-11-16 |
URL | http://arxiv.org/abs/1611.05479v1 |
PDF | http://arxiv.org/pdf/1611.05479v1.pdf |
PWC | https://paperswithcode.com/paper/probabilistic-fluorescence-based-synapse |
Repo | |
Framework | |
Deep Embedding for Spatial Role Labeling
Title | Deep Embedding for Spatial Role Labeling |
Authors | Oswaldo Ludwig, Xiao Liu, Parisa Kordjamshidi, Marie-Francine Moens |
Abstract | This paper introduces the visually informed embedding of word (VIEW), a continuous vector representation for a word extracted from a deep neural model trained using the Microsoft COCO data set to forecast the spatial arrangements between visual objects, given a textual description. The model is composed of a deep multilayer perceptron (MLP) stacked on top of a Long Short Term Memory (LSTM) network, the latter being preceded by an embedding layer. The VIEW is applied to transferring multimodal background knowledge to Spatial Role Labeling (SpRL) algorithms, which recognize spatial relations between objects mentioned in the text. This work also contributes a new method to select complementary features and a fine-tuning method for MLP that improves the $F_1$ measure in classifying the words into spatial roles. The VIEW is evaluated with the Task 3 of SemEval-2013 benchmark data set, SpaceEval. |
Tasks | |
Published | 2016-03-28 |
URL | http://arxiv.org/abs/1603.08474v1 |
PDF | http://arxiv.org/pdf/1603.08474v1.pdf |
PWC | https://paperswithcode.com/paper/deep-embedding-for-spatial-role-labeling |
Repo | |
Framework | |
Overdispersed Black-Box Variational Inference
Title | Overdispersed Black-Box Variational Inference |
Authors | Francisco J. R. Ruiz, Michalis K. Titsias, David M. Blei |
Abstract | We introduce overdispersed black-box variational inference, a method to reduce the variance of the Monte Carlo estimator of the gradient in black-box variational inference. Instead of taking samples from the variational distribution, we use importance sampling to take samples from an overdispersed distribution in the same exponential family as the variational approximation. Our approach is general since it can be readily applied to any exponential family distribution, which is the typical choice for the variational approximation. We run experiments on two non-conjugate probabilistic models to show that our method effectively reduces the variance, and the overhead introduced by the computation of the proposal parameters and the importance weights is negligible. We find that our overdispersed importance sampling scheme provides lower variance than black-box variational inference, even when the latter uses twice the number of samples. This results in faster convergence of the black-box inference procedure. |
Tasks | |
Published | 2016-03-03 |
URL | http://arxiv.org/abs/1603.01140v1 |
PDF | http://arxiv.org/pdf/1603.01140v1.pdf |
PWC | https://paperswithcode.com/paper/overdispersed-black-box-variational-inference |
Repo | |
Framework | |
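
The core mechanism named above, drawing samples from an overdispersed proposal and correcting with importance weights, can be illustrated on a toy expectation. The sketch below does not reproduce the paper's gradient estimator or its choice of proposal parameters. It assumes a zero-mean Gaussian variational distribution q with standard deviation `q_sigma`, a wider zero-mean Gaussian proposal r with standard deviation `r_sigma`, and an arbitrary test function; all names are hypothetical.

```python
import math
import random


def normal_pdf(z, sigma):
    # Density of a zero-mean Gaussian with standard deviation sigma.
    return math.exp(-0.5 * (z / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def overdispersed_estimate(f, n, q_sigma=1.0, r_sigma=2.0, seed=0):
    """Estimate E_q[f(z)] using samples from the wider proposal r."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, r_sigma)  # sample from the proposal r
        weight = normal_pdf(z, q_sigma) / normal_pdf(z, r_sigma)  # q(z) / r(z)
        total += weight * f(z)
    return total / n


# Under q = N(0, 1), the exact value of E_q[z^2] is 1.
second_moment = overdispersed_estimate(lambda z: z * z, 20000)
```

The weights keep the estimator unbiased for expectations under q, while the heavier-tailed proposal places more samples where the integrand is large, which is the property the abstract exploits to reduce variance.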
Designing Intelligent Instruments
Title | Designing Intelligent Instruments |
Authors | Kevin H. Knuth, Philip M. Erner, Scott Frasso |
Abstract | Remote science operations require automated systems that can both act and react with minimal human intervention. One such vision is that of an intelligent instrument that collects data in an automated fashion, and based on what it learns, decides which new measurements to take. This innovation implements experimental design and unites it with data analysis in such a way that it completes the cycle of learning. This cycle is the basis of the Scientific Method. The three basic steps of this cycle are hypothesis generation, inquiry, and inference. Hypothesis generation is implemented by artificially supplying the instrument with a parameterized set of possible hypotheses that might be used to describe the physical system. The act of inquiry is handled by an inquiry engine that relies on Bayesian adaptive exploration where the optimal experiment is chosen as the one which maximizes the expected information gain. The inference engine is implemented using the nested sampling algorithm, which provides the inquiry engine with a set of posterior samples from which the expected information gain can be estimated. With these computational structures in place, the instrument will refine its hypotheses, and repeat the learning cycle by taking measurements until the system under study is described within a pre-specified tolerance. We will demonstrate our first attempts toward achieving this goal with an intelligent instrument constructed using the LEGO MINDSTORMS NXT robotics platform. |
Tasks | |
Published | 2016-02-13 |
URL | http://arxiv.org/abs/1602.04290v1 |
PDF | http://arxiv.org/pdf/1602.04290v1.pdf |
PWC | https://paperswithcode.com/paper/designing-intelligent-instruments |
Repo | |
Framework | |
Effective Deterministic Initialization for $k$-Means-Like Methods via Local Density Peaks Searching
Title | Effective Deterministic Initialization for $k$-Means-Like Methods via Local Density Peaks Searching |
Authors | Fengfu Li, Hong Qiao, Bo Zhang |
Abstract | The $k$-means clustering algorithm is popular but has the following main drawbacks: 1) the number of clusters, $k$, needs to be provided by the user in advance, 2) it can easily reach local minima with randomly selected initial centers, 3) it is sensitive to outliers, and 4) it can only deal with well separated hyperspherical clusters. In this paper, we propose a Local Density Peaks Searching (LDPS) initialization framework to address these issues. The LDPS framework includes two basic components: one of them is the local density that characterizes the density distribution of a data set, and the other is the local distinctiveness index (LDI) which we introduce to characterize how distinctive a data point is compared with its neighbors. Based on these two components, we search for the local density peaks, which are characterized by high local densities and high LDIs, to deal with 1) and 2). Moreover, we detect outliers, characterized by low local densities but high LDIs, and exclude them before clustering begins. Finally, we apply the LDPS initialization framework to $k$-medoids, which is a variant of $k$-means that chooses data samples as centers, with diverse similarity measures other than the Euclidean distance to fix the last drawback of $k$-means. Combining the LDPS initialization framework with $k$-means and $k$-medoids, we obtain two novel clustering methods called LDPS-means and LDPS-medoids, respectively. Experiments on synthetic data sets verify the effectiveness of the proposed methods, especially when the ground truth of the cluster number $k$ is large. Further, experiments on several real world data sets, Handwritten Pendigits, Coil-20, Coil-100 and Olivetti Face Database, illustrate that our methods outperform analogous approaches in both estimating $k$ and unsupervised object categorization. |
Tasks | |
Published | 2016-11-21 |
URL | http://arxiv.org/abs/1611.06777v1 |
PDF | http://arxiv.org/pdf/1611.06777v1.pdf |
PWC | https://paperswithcode.com/paper/effective-deterministic-initialization-for-k |
Repo | |
Framework | |
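
Drawback 2) above, that $k$-means can settle in a poor local minimum depending on its initial centers, is easy to reproduce. The sketch below is plain Lloyd iteration on one-dimensional data with user-supplied initial centers; it does not implement LDPS, the local density, or the LDI, whose definitions are not given in the abstract. The function name `lloyd_1d` and the toy data are assumptions.

```python
def lloyd_1d(points, centers, max_iter=100):
    """Plain Lloyd iteration for k-means on 1D data with given initial centers."""
    centers = [float(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda k: abs(p - centers[k]))
            clusters[nearest].append(p)
        # Update step: move each center to its cluster mean
        # (an empty cluster keeps its previous center).
        new_centers = [sum(c) / len(c) if c else centers[k]
                       for k, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


# Three well separated groups of two points each.
data = [0, 1, 10, 11, 20, 21]
good = lloyd_1d(data, [0, 10, 20])  # one initial center per group
bad = lloyd_1d(data, [0, 1, 10])    # two initial centers in the same group
```

With one initial center per group the iteration converges to the group means, `[0.5, 10.5, 20.5]`. With two initial centers in the first group it stops at `[0.0, 1.0, 15.5]`, merging the last two groups, which is the kind of outcome a deterministic, density-aware initialization aims to avoid.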
Tradeoffs for Space, Time, Data and Risk in Unsupervised Learning
Title | Tradeoffs for Space, Time, Data and Risk in Unsupervised Learning |
Authors | Mario Lucic, Mesrob I. Ohannessian, Amin Karbasi, Andreas Krause |
Abstract | Faced with massive data, is it possible to trade off (statistical) risk, and (computational) space and time? This challenge lies at the heart of large-scale machine learning. Using k-means clustering as a prototypical unsupervised learning problem, we show how we can strategically summarize the data (control space) in order to trade off risk and time when data is generated by a probabilistic model. Our summarization is based on coreset constructions from computational geometry. We also develop an algorithm, TRAM, to navigate the space/time/data/risk tradeoff in practice. In particular, we show that for a fixed risk (or data size), as the data size increases (resp. risk increases) the running time of TRAM decreases. Our extensive experiments on real data sets demonstrate the existence and practical utility of such tradeoffs, not only for k-means but also for Gaussian Mixture Models. |
Tasks | |
Published | 2016-05-02 |
URL | http://arxiv.org/abs/1605.00529v1 |
PDF | http://arxiv.org/pdf/1605.00529v1.pdf |
PWC | https://paperswithcode.com/paper/tradeoffs-for-space-time-data-and-risk-in |
Repo | |
Framework | |