May 6, 2019


Paper Group ANR 180

When is Nontrivial Estimation Possible for Graphons and Stochastic Block Models? Cohomology of Cryo-Electron Microscopy. Distribution-dependent concentration inequalities for tighter generalization bounds. Question Relevance in VQA: Identifying Non-Visual And False-Premise Questions. A review of Gaussian Markov models for conditional independence. …

When is Nontrivial Estimation Possible for Graphons and Stochastic Block Models?


Title	When is Nontrivial Estimation Possible for Graphons and Stochastic Block Models?
Authors	Audra McMillan, Adam Smith
Abstract	Block graphons (also called stochastic block models) are an important and widely-studied class of models for random networks. We provide a lower bound on the accuracy of estimators for block graphons with a large number of blocks. We show that, given only the number $k$ of blocks and an upper bound $\rho$ on the values (connection probabilities) of the graphon, every estimator incurs error at least on the order of $\min(\rho, \sqrt{\rho k^2/n^2})$ in the $\delta_2$ metric with constant probability, in the worst case over graphons. In particular, our bound rules out any nontrivial estimation (that is, with $\delta_2$ error substantially less than $\rho$) when $k\geq n\sqrt{\rho}$. Combined with previous upper and lower bounds, our results characterize, up to logarithmic terms, the minimax accuracy of graphon estimation in the $\delta_2$ metric. A similar lower bound to ours was obtained independently by Klopp, Tsybakov and Verzelen (2016).
Tasks	Graphon Estimation
Published	2016-04-07
URL	http://arxiv.org/abs/1604.01871v1
PDF	http://arxiv.org/pdf/1604.01871v1.pdf
PWC	https://paperswithcode.com/paper/when-is-nontrivial-estimation-possible-for
Repo
Framework
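
Because $\sqrt{\rho k^2/n^2} = k\sqrt{\rho}/n$, the bound in the abstract reaches its ceiling $\rho$ exactly when $k \geq n\sqrt{\rho}$. The Python sketch below evaluates the order of the bound and that threshold for a few block counts. It is an illustration only: constants and logarithmic factors are dropped, and the function names are hypothetical.

```python
import math


def lower_bound_order(n: int, k: int, rho: float) -> float:
    """Order of the delta_2 lower bound, min(rho, sqrt(rho * k^2 / n^2)); constants dropped."""
    return min(rho, math.sqrt(rho * k**2 / n**2))


def nontrivial_estimation_ruled_out(n: int, k: int, rho: float) -> bool:
    """sqrt(rho * k^2 / n^2) = k * sqrt(rho) / n, so the bound saturates at rho once k >= n * sqrt(rho)."""
    return k >= n * math.sqrt(rho)


if __name__ == "__main__":
    n, rho = 1000, 0.01  # threshold n * sqrt(rho) = 100 blocks
    for k in (10, 50, 200, 500):
        print(k, lower_bound_order(n, k, rho), nontrivial_estimation_ruled_out(n, k, rho))
```

With $n = 1000$ and $\rho = 0.01$, any $k$ of a few hundred blocks already pushes the worst-case error up to the trivial level $\rho$.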

Cohomology of Cryo-Electron Microscopy


Title	Cohomology of Cryo-Electron Microscopy
Authors	Ke Ye, Lek-Heng Lim
Abstract	The goal of cryo-electron microscopy (EM) is to reconstruct the 3-dimensional structure of a molecule from a collection of its 2-dimensional projected images. In this article, we show that the basic premise of cryo-EM, namely patching together 2-dimensional projections to reconstruct a 3-dimensional object, is naturally one of Čech cohomology with SO(2) coefficients. We deduce that every cryo-EM reconstruction problem corresponds to an oriented circle bundle on a simplicial complex, allowing us to classify cryo-EM problems via principal bundles. In practice, the 2-dimensional images are noisy, and a main task in cryo-EM is to denoise them. We show how the aforementioned insights can be used towards this end.
Tasks
Published	2016-04-05
URL	http://arxiv.org/abs/1604.01319v2
PDF	http://arxiv.org/pdf/1604.01319v2.pdf
PWC	https://paperswithcode.com/paper/cohomology-of-cryo-electron-microscopy
Repo
Framework
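
The flavor of the cohomological viewpoint can be illustrated with a small Python sketch. As an assumption for illustration only, images are vertices of a simplicial complex, each overlapping pair $(i, j)$ carries a relative in-plane rotation $g_{ij} \in SO(2)$ stored as an angle with $g_{ji} = -g_{ij}$, and triples of images form triangles. Such data form a Čech 1-cocycle when $g_{ij} + g_{jk} + g_{ki} \equiv 0 \pmod{2\pi}$ on every triangle, and rotations induced by per-image angles $\theta_i$ (a coboundary, $g_{ij} = \theta_j - \theta_i$) always satisfy this. The helper names and sign conventions are hypothetical, not the paper's construction.

```python
import math
from itertools import combinations

TWO_PI = 2.0 * math.pi


def wrap(angle: float) -> float:
    """Represent an SO(2) element by its angle in (-pi, pi]."""
    a = math.fmod(angle + math.pi, TWO_PI)
    if a <= 0.0:
        a += TWO_PI
    return a - math.pi


def edge_angle(g: dict, i: int, j: int) -> float:
    """Relative rotation g_ij, using g_ji = -g_ij for the reversed edge."""
    return g[(i, j)] if (i, j) in g else -g[(j, i)]


def is_cocycle(g: dict, triangles, tol: float = 1e-9) -> bool:
    """Cocycle condition: g_ij + g_jk + g_ki = 0 in SO(2) on every triangle (i, j, k)."""
    for i, j, k in triangles:
        total = edge_angle(g, i, j) + edge_angle(g, j, k) + edge_angle(g, k, i)
        if abs(wrap(total)) > tol:
            return False
    return True


def coboundary(theta, edges) -> dict:
    """Relative rotations g_ij = theta_j - theta_i induced by per-image angles theta."""
    return {(i, j): wrap(theta[j] - theta[i]) for i, j in edges}


if __name__ == "__main__":
    edges = list(combinations(range(4), 2))        # complete graph on 4 images
    triangles = list(combinations(range(4), 3))    # all 2-simplices
    g = coboundary([0.3, 1.2, -2.0, 2.9], edges)
    print(is_cocycle(g, triangles))                # rotations induced by per-image angles
    g[(0, 1)] += 0.5                               # corrupt one relative rotation
    print(is_cocycle(g, triangles))
```

Corrupting a single edge breaks the condition on every triangle containing it, which is the kind of inconsistency a cohomological formulation makes explicit.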

Distribution-dependent concentration inequalities for tighter generalization bounds


Title	Distribution-dependent concentration inequalities for tighter generalization bounds
Authors	Xinxing Wu, Junping Zhang
Abstract	Concentration inequalities are indispensable tools for studying the generalization capacity of learning models. Hoeffding’s and McDiarmid’s inequalities are commonly used and give bounds that are independent of the data distribution. Although this makes them widely applicable, a drawback is that the bounds can be too loose in some specific cases. Efforts have already been devoted to improving these bounds, but we find that they can be tightened further in some distribution-dependent scenarios and that the conditions for the inequalities can be relaxed. In particular, we propose four types of conditions for probabilistic boundedness and bounded differences, and derive several distribution-dependent extensions of Hoeffding’s and McDiarmid’s inequalities. These extensions provide bounds for functions that do not satisfy the conditions of the existing inequalities and, in some special cases, tighter bounds. Furthermore, we obtain generalization bounds for unbounded and hierarchy-bounded loss functions. Finally, we discuss potential applications of our extensions to learning theory.
Tasks
Published	2016-07-19
URL	http://arxiv.org/abs/1607.05506v2
PDF	http://arxiv.org/pdf/1607.05506v2.pdf
PWC	https://paperswithcode.com/paper/distribution-dependent-concentration
Repo
Framework
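
For reference, the distribution-free baseline the paper starts from is Hoeffding’s inequality: for $n$ i.i.d. samples in $[a, b]$, $P(|\bar{X}_n - \mathbb{E}\bar{X}_n| \geq t) \leq 2\exp(-2nt^2/(b-a)^2)$. The Python sketch below (not one of the paper’s extensions) compares this bound with a Monte Carlo estimate for Bernoulli samples. For a low-variance distribution the bound is far from tight, which is the looseness the abstract refers to. Function names and parameter values are illustrative.

```python
import math
import random


def hoeffding_bound(n: int, t: float, a: float = 0.0, b: float = 1.0) -> float:
    """Two-sided Hoeffding bound on P(|sample mean - true mean| >= t), capped at 1."""
    return min(1.0, 2.0 * math.exp(-2.0 * n * t**2 / (b - a) ** 2))


def empirical_tail(n: int, t: float, p: float, trials: int = 5000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(|mean - p| >= t) for n i.i.d. Bernoulli(p) samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mean = sum(rng.random() < p for _ in range(n)) / n
        hits += abs(mean - p) >= t - 1e-12  # small slack absorbs float rounding at the boundary
    return hits / trials


if __name__ == "__main__":
    n, t = 100, 0.1
    print("Hoeffding bound:", hoeffding_bound(n, t))
    for p in (0.5, 0.02):
        print(f"p={p}: empirical tail {empirical_tail(n, t, p):.4f}")
```

The bound is the same for both values of p, while the actual tail probability for p = 0.02 is much smaller; distribution-dependent inequalities aim to exploit exactly this gap.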

Question Relevance in VQA: Identifying Non-Visual And False-Premise Questions


Title	Question Relevance in VQA: Identifying Non-Visual And False-Premise Questions
Authors	Arijit Ray, Gordon Christie, Mohit Bansal, Dhruv Batra, Devi Parikh
Abstract	Visual Question Answering (VQA) is the task of answering natural-language questions about images. We introduce the novel problem of determining the relevance of questions to images in VQA. Current VQA models do not reason about whether a question is even related to the given image (e.g., “What is the capital of Argentina?”) or whether it requires information from external resources to answer correctly. This can break the continuity of a dialogue in human-machine interaction. Our approaches for determining relevance are composed of two stages. Given an image and a question, (1) we first determine whether the question is visual or not, and (2) if it is visual, we determine whether the question is relevant to the given image or not. Our approaches, based on LSTM-RNNs, VQA model uncertainty, and caption-question similarity, are able to outperform strong baselines on both relevance tasks. We also present human studies showing that VQA models augmented with such question relevance reasoning are perceived as more intelligent, reasonable, and human-like.
Tasks	Question Answering, Question Similarity, Visual Question Answering
Published	2016-06-21
URL	http://arxiv.org/abs/1606.06622v3
PDF	http://arxiv.org/pdf/1606.06622v3.pdf
PWC	https://paperswithcode.com/paper/question-relevance-in-vqa-identifying-non
Repo
Framework
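
The two-stage decision described above can be sketched in Python as follows. This is a simplified stand-in, not the paper’s models: stage 1 is passed in as any callable (the toy keyword rule in the demo is a placeholder for a learned LSTM classifier), and stage 2 uses bag-of-words cosine similarity between the question and an image caption. The stopword list, the 0.2 threshold and all function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Callable

# Illustrative stopword list; a real system would use learned representations.
STOPWORDS = {"a", "an", "the", "is", "are", "what", "of", "on", "in", "this", "there", "to"}


def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts with stopwords removed."""
    return Counter(w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS)


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * v[word] for word, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def question_relevance(question: str, caption: str,
                       is_visual: Callable[[str], bool], threshold: float = 0.2) -> str:
    """Stage 1: visual vs. non-visual. Stage 2: caption-question similarity decides relevance."""
    if not is_visual(question):
        return "non-visual"
    similarity = cosine(bag_of_words(question), bag_of_words(caption))
    return "relevant" if similarity >= threshold else "false-premise"


def toy_is_visual(question: str) -> bool:
    """Placeholder for a learned visual/non-visual classifier."""
    return "capital" not in question.lower()


if __name__ == "__main__":
    caption = "A brown dog sitting on the grass"
    for q in ("What color is the dog?", "What is the cat eating?", "What is the capital of Argentina?"):
        print(q, "->", question_relevance(q, caption, toy_is_visual))
```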

A review of Gaussian Markov models for conditional independence


Title	A review of Gaussian Markov models for conditional independence
Authors	Irene Córdoba, Concha Bielza, Pedro Larrañaga
Abstract	Markov models lie at the interface between statistical independence in a probability distribution and graph separation properties. We review model selection and estimation in directed and undirected Markov models with Gaussian parametrization, emphasizing their main similarities and differences. These two model classes are similar but not equivalent, although they share a common intersection. We present the existing results from a historical perspective, taking into account the literature from both the artificial intelligence and statistics research communities, where these models originated. We cover classical topics such as maximum likelihood estimation and model selection via hypothesis testing, as well as more modern approaches such as regularization and Bayesian methods. We also discuss how the Markov models reviewed fit into the rich hierarchy of other, higher-level Markov model classes. Finally, we close the paper with an overview of relaxations of the Gaussian assumption and point out the main areas of application where these Markov models are currently used.
Tasks	Model Selection
Published	2016-06-23
URL	https://arxiv.org/abs/1606.07282v5
PDF	https://arxiv.org/pdf/1606.07282v5.pdf
PWC	https://paperswithcode.com/paper/on-gaussian-markov-models-for-conditional
Repo
Framework
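
For the undirected Gaussian case, the link between conditional independence and the graph is concrete: $X_i$ and $X_j$ are independent given all other variables exactly when the precision (inverse covariance) entry $K_{ij}$ is zero, and the corresponding partial correlation is $-K_{ij}/\sqrt{K_{ii}K_{jj}}$. The Python sketch below reads the graph off a precision matrix for a three-variable chain; the matrix values and function names are illustrative.

```python
import numpy as np


def partial_correlations(precision: np.ndarray) -> np.ndarray:
    """Partial correlation of X_i and X_j given all other variables: -K_ij / sqrt(K_ii K_jj)."""
    d = np.sqrt(np.diag(precision))
    pc = -precision / np.outer(d, d)
    np.fill_diagonal(pc, 1.0)
    return pc


def undirected_edges(precision: np.ndarray, tol: float = 1e-8) -> list:
    """Edges of the Gaussian graphical model: keep (i, j) only where K_ij is non-zero."""
    p = precision.shape[0]
    return [(i, j) for i in range(p) for j in range(i + 1, p) if abs(precision[i, j]) > tol]


if __name__ == "__main__":
    # Chain 0 - 1 - 2: K[0, 2] = 0, so X_0 and X_2 are independent given X_1.
    K = np.array([[2.0, -0.8, 0.0],
                  [-0.8, 2.0, -0.8],
                  [0.0, -0.8, 2.0]])
    Sigma = np.linalg.inv(K)
    print("covariance Sigma[0, 2] =", Sigma[0, 2])     # non-zero: marginally dependent
    print("edges:", undirected_edges(np.linalg.inv(Sigma)))
    print(partial_correlations(K).round(3))
```

The example shows why the precision matrix, not the covariance, carries the graph: $X_0$ and $X_2$ are correlated marginally, yet have no edge.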

Correlation Preserving Sparse Coding Over Multi-level Dictionaries for Image Denoising


Title	Correlation Preserving Sparse Coding Over Multi-level Dictionaries for Image Denoising
Authors	Rui Chen, Huizhu Jia, Xiaodong Xie, Wen Gao
Abstract	In this letter, we propose a novel image denoising method based on correlation preserving sparse coding. Because unstable and unreliable correlations among the basis set can limit the performance of dictionary-driven denoising methods, two regularization strategies are employed in the coding process. Specifically, a graph-based regularizer is built to preserve the global similarity correlations, which can adaptively capture both the geometrical structures and the discriminative features of textured patches. In particular, the edge weights in the graph are obtained by seeking a nonnegative low-rank construction. In addition, a robust locality-constrained coding automatically preserves not only spatial neighborhood information but also the internal consistency present in noisy patches while learning an overcomplete dictionary. Experimental results demonstrate that the proposed method achieves state-of-the-art denoising performance in terms of both PSNR and subjective visual quality.
Tasks	Denoising, Image Denoising
Published	2016-12-23
URL	http://arxiv.org/abs/1612.08049v1
PDF	http://arxiv.org/pdf/1612.08049v1.pdf
PWC	https://paperswithcode.com/paper/correlation-preserving-sparse-coding-over
Repo
Framework
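
The idea of a graph-based regularizer in sparse coding can be sketched with a generic objective, $\|X - DA\|_F^2 + \lambda\|A\|_1 + \beta\,\mathrm{tr}(A L A^\top)$, where $L$ is the Laplacian of a patch-similarity graph, solved by ISTA (proximal gradient). This is a swapped-in textbook formulation, not the paper’s method: it keeps the dictionary $D$ fixed, omits the locality-constrained term, and uses Gaussian-kernel weights instead of the paper’s nonnegative low-rank graph. All names and parameter values are hypothetical.

```python
import numpy as np


def laplacian(W: np.ndarray) -> np.ndarray:
    """Graph Laplacian L = diag(W 1) - W of a symmetric non-negative weight matrix."""
    return np.diag(W.sum(axis=1)) - W


def soft_threshold(z: np.ndarray, tau: float) -> np.ndarray:
    """Proximal operator of tau * ||.||_1 (element-wise shrinkage)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)


def objective(X, D, A, W, lam, beta):
    """||X - D A||_F^2 + lam * ||A||_1 + beta * tr(A L A^T)."""
    L = laplacian(W)
    return (np.sum((X - D @ A) ** 2) + lam * np.abs(A).sum()
            + beta * np.trace(A @ L @ A.T))


def graph_regularized_coding(X, D, W, lam=0.1, beta=0.1, iters=300):
    """ISTA for codes A (atoms x patches) that are sparse and smooth over the patch graph.

    X: (dim, patches) noisy patches, D: (dim, atoms) dictionary,
    W: (patches, patches) symmetric non-negative similarity weights.
    """
    L = laplacian(W)
    # Lipschitz constant of the gradient of the two smooth terms.
    lipschitz = 2.0 * (np.linalg.norm(D, 2) ** 2 + beta * np.linalg.norm(L, 2))
    step = 1.0 / lipschitz
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(iters):
        grad = -2.0 * D.T @ (X - D @ A) + 2.0 * beta * A @ L
        A = soft_threshold(A - step * grad, lam * step)
    return A


def gaussian_similarity(X: np.ndarray) -> np.ndarray:
    """Hypothetical patch-similarity weights: Gaussian kernel on squared patch distances."""
    dist2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    W = np.exp(-dist2 / dist2.mean())
    np.fill_diagonal(W, 0.0)
    return W


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = rng.standard_normal((16, 32))
    D /= np.linalg.norm(D, axis=0)               # unit-norm atoms
    X = rng.standard_normal((16, 10))            # stand-in for 10 vectorised noisy patches
    W = gaussian_similarity(X)
    A = graph_regularized_coding(X, D, W)
    print("objective at A = 0:", objective(X, D, np.zeros_like(A), W, 0.1, 0.1))
    print("objective after ISTA:", objective(X, D, A, W, 0.1, 0.1))
```

The Laplacian term equals a weighted sum of squared differences between the codes of similar patches, so larger $\beta$ pushes similar patches toward similar sparse codes.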

Leveraging Semantic Web Search and Browse Sessions for Multi-Turn Spoken Dialog Systems


Title	Leveraging Semantic Web Search and Browse Sessions for Multi-Turn Spoken Dialog Systems
Authors	Lu Wang, Larry Heck, Dilek Hakkani-Tur
Abstract	Training statistical dialog models in spoken dialog systems (SDS) requires large amounts of annotated data. The lack of scalable methods for data mining and annotation poses a significant hurdle for state-of-the-art statistical dialog managers. This paper presents an approach that directly leverages billions of web search and browse sessions to overcome this hurdle. The key insight is that task completion through web search and browse sessions is (a) predictable and (b) generalizes to spoken dialog task completion. The new method automatically mines behavioral search and browse patterns from web logs and translates them into spoken dialog models. We experiment with naturally occurring spoken dialogs and large-scale web logs. Our session-based models outperform the state-of-the-art method for the entity extraction task in SDS. We also achieve better performance for both entity and relation extraction on web search queries when compared with nontrivial baselines.
Tasks	Entity Extraction, Relation Extraction
Published	2016-06-25
URL	http://arxiv.org/abs/1606.07967v1
PDF	http://arxiv.org/pdf/1606.07967v1.pdf
PWC	https://paperswithcode.com/paper/leveraging-semantic-web-search-and-browse
Repo
Framework

Estimation of Bandlimited Grayscale Images From the Single Bit Observations of Pixels Affected by Additive Gaussian Noise


Title	Estimation of Bandlimited Grayscale Images From the Single Bit Observations of Pixels Affected by Additive Gaussian Noise
Authors	Abhinav Kumar, Animesh Kumar
Abstract	This paper presents the estimation of grayscale images from single-bit observations of their pixels, each affected by zero-mean additive Gaussian noise. The images are assumed to be bandlimited in the Fourier cosine transform (FCT) domain and are oversampled above their Nyquist rate in that domain. We propose a non-recursive approach, based on a first-order approximation of the cumulative distribution function (CDF), to estimate the image from single-bit pixels; the approach itself is based on Banach’s contraction theorem. The decay rate of the mean squared error of estimating such images is found to be independent of the precision of the quantizer, and it varies as $O(1/N)$, where $N$ is the “effective” oversampling ratio with respect to the Nyquist rate in the FCT domain.
Tasks
Published	2016-10-27
URL	http://arxiv.org/abs/1610.08627v1
PDF	http://arxiv.org/pdf/1610.08627v1.pdf
PWC	https://paperswithcode.com/paper/estimation-of-bandlimited-grayscale-images
Repo
Framework
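
The basic reason single noisy bits still carry information can be sketched for one constant pixel value $x$: with bit $b = 1[x + w > 0]$ and $w \sim N(0, \sigma^2)$, we have $P(b = 1) = \Phi(x/\sigma)$, so inverting the Gaussian CDF at the observed fraction of ones estimates $x$. The Python sketch below is this standard inverse-CDF estimator, not the paper’s FCT-domain method with its first-order CDF approximation; $\sigma$ is assumed known and the function names are hypothetical.

```python
import random
from statistics import NormalDist


def one_bit_samples(value: float, sigma: float, n: int, rng: random.Random) -> list:
    """Sign quantiser applied to value + N(0, sigma^2) noise: 1 if positive, else 0."""
    return [1 if value + rng.gauss(0.0, sigma) > 0 else 0 for _ in range(n)]


def estimate_from_bits(bits: list, sigma: float, eps: float = 1e-6) -> float:
    """P(bit = 1) = Phi(value / sigma), so invert the standard normal CDF at the fraction of ones."""
    p = sum(bits) / len(bits)
    p = min(max(p, eps), 1.0 - eps)  # keep the inverse CDF finite when all bits agree
    return sigma * NormalDist().inv_cdf(p)


def mse(value: float, sigma: float, n: int, trials: int = 100, seed: int = 0) -> float:
    """Monte Carlo mean squared error of the estimator for n one-bit observations."""
    rng = random.Random(seed)
    errors = [(estimate_from_bits(one_bit_samples(value, sigma, n, rng), sigma) - value) ** 2
              for _ in range(trials)]
    return sum(errors) / trials


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, mse(0.3, 1.0, n))
```

In this simple setting the error also shrinks roughly in proportion to 1/n, loosely echoing the $O(1/N)$ rate stated in the abstract.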

Auto-JacoBin: Auto-encoder Jacobian Binary Hashing


Title	Auto-JacoBin: Auto-encoder Jacobian Binary Hashing
Authors	Xiping Fu, Brendan McCane, Steven Mills, Michael Albert, Lech Szymanski
Abstract	Binary codes can be used to speed up nearest neighbor search in large-scale data sets because they are efficient for both storage and retrieval. In this paper, we propose a robust auto-encoder model that preserves the geometric relationships of high-dimensional data sets in Hamming space. This is done by considering a noise-removing function in a region surrounding the manifold on which the training data points lie. This function is defined so that it projects data points near the manifold onto the manifold, and we approximate it to first order. Experimental results show that the proposed method outperforms state-of-the-art methods on three large-scale, high-dimensional data sets.
Tasks
Published	2016-02-25
URL	http://arxiv.org/abs/1602.08127v2
PDF	http://arxiv.org/pdf/1602.08127v2.pdf
PWC	https://paperswithcode.com/paper/auto-jacobin-auto-encoder-jacobian-binary
Repo
Framework
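
The retrieval side of binary hashing can be sketched independently of how the codes are learned. The Python example below substitutes random-hyperplane hashing (the sign of random projections) for the paper’s auto-encoder, packs the bits into bytes, and ranks database items by Hamming distance computed with XOR and a bit count. Array sizes and function names are illustrative assumptions.

```python
import numpy as np


def binary_codes(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One bit per column of W (sign of the projection), packed 8 bits per uint8."""
    return np.packbits((X @ W > 0).astype(np.uint8), axis=1)


def hamming_distances(query_code: np.ndarray, db_codes: np.ndarray) -> np.ndarray:
    """XOR the packed codes, then count the differing bits per database row."""
    return np.unpackbits(np.bitwise_xor(db_codes, query_code), axis=1).sum(axis=1)


def hamming_knn(query_code: np.ndarray, db_codes: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k database codes closest in Hamming distance."""
    return np.argsort(hamming_distances(query_code, db_codes), kind="stable")[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.standard_normal((1000, 64))   # 1000 points in 64 dimensions
    W = rng.standard_normal((64, 32))        # 32-bit codes
    codes = binary_codes(data, W)            # shape (1000, 4): 4 bytes per point
    query = binary_codes(data[17:18] + 0.01 * rng.standard_normal((1, 64)), W)[0]
    print(hamming_knn(query, codes, 5))
```

Storing 4 bytes per point instead of 64 floats, and comparing with XOR, is what makes binary codes attractive for large-scale search; learned codes such as the paper’s aim to make Hamming neighbors match true neighbors more faithfully.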

Single-image RGB Photometric Stereo With Spatially-varying Albedo


Title	Single-image RGB Photometric Stereo With Spatially-varying Albedo
Authors	Ayan Chakrabarti, Kalyan Sunkavalli
Abstract	We present a single-shot system to recover the surface geometry of objects with spatially-varying albedos from images captured under a calibrated RGB photometric stereo setup, with three light directions multiplexed across the different color channels of the observed RGB image. Since the problem is ill-posed point-wise, we assume that the albedo map can be modeled as piecewise constant with a restricted number of distinct albedo values. We show that under ideal conditions, the shape of a non-degenerate surface patch with locally constant albedo can theoretically be recovered exactly. Moreover, we present a practical and efficient algorithm that uses this model to robustly recover shape from real images. Our method first reasons about shape locally in a dense set of patches in the observed image, producing shape distributions for every patch. These local distributions are then combined to produce a single consistent surface normal map. We demonstrate the efficacy of the approach through experiments on both synthetic renderings and real captured images.
Tasks
Published	2016-09-14
URL	http://arxiv.org/abs/1609.04079v1
PDF	http://arxiv.org/pdf/1609.04079v1.pdf
PWC	https://paperswithcode.com/paper/single-image-rgb-photometric-stereo-with
Repo
Framework
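
Under the standard Lambertian model without shadows, each color channel $c$ of an RGB photometric stereo image satisfies $I_c = a_c\,\ell_c^\top n$, where $\ell_c$ is the calibrated direction of the light multiplexed into that channel, $a_c$ the albedo in that channel and $n$ the unit surface normal. One pixel gives three equations but has three unknown albedo values plus two degrees of freedom in $n$, which is why the problem is ill-posed point-wise. The Python sketch below covers only the easy case where the albedo is assumed known (for example, already estimated for a constant-albedo patch); it is not the paper’s patch-based inference, and the function name is hypothetical.

```python
import numpy as np


def normals_from_rgb(image: np.ndarray, lights: np.ndarray, albedo: np.ndarray) -> np.ndarray:
    """Per-pixel Lambertian inversion for RGB-multiplexed photometric stereo.

    image:  (H, W, 3) intensities; channel c is lit only by light c.
    lights: (3, 3) array whose row c is the unit direction of light c.
    albedo: (3,) per-channel albedo, assumed known here.
    """
    shading = image / albedo                                    # I_c / a_c = l_c . n
    g = np.linalg.solve(lights, shading.reshape(-1, 3).T).T     # solve L n = shading per pixel
    n = g / np.linalg.norm(g, axis=1, keepdims=True)            # renormalise to unit length
    return n.reshape(image.shape)


if __name__ == "__main__":
    lights = np.array([[0.5, 0.0, 0.866], [-0.25, 0.433, 0.866], [-0.25, -0.433, 0.866]])
    lights /= np.linalg.norm(lights, axis=1, keepdims=True)
    n_true = np.array([0.1, -0.2, 1.0])
    n_true /= np.linalg.norm(n_true)
    albedo = np.array([0.8, 0.5, 0.3])
    image = np.tile(albedo * (lights @ n_true), (2, 2, 1))      # synthetic 2x2 image
    print(normals_from_rgb(image, lights, albedo)[0, 0], n_true)
```

When the albedo is unknown, any rescaling of the channels yields a different normal that explains the same pixel, so extra assumptions such as piecewise-constant albedo are needed.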

High-dimensional Bayesian inference via the Unadjusted Langevin Algorithm


Title	High-dimensional Bayesian inference via the Unadjusted Langevin Algorithm
Authors	Alain Durmus, Eric Moulines
Abstract	We consider in this paper the problem of sampling a high-dimensional probability distribution $\pi$ having a density with respect to the Lebesgue measure on $\mathbb{R}^d$, known up to a normalization constant: $x \mapsto \pi(x)= \mathrm{e}^{-U(x)}/\int_{\mathbb{R}^d} \mathrm{e}^{-U(y)} \mathrm{d} y$. Such problems arise naturally, for example, in Bayesian inference and machine learning. Under the assumption that $U$ is continuously differentiable, $\nabla U$ is globally Lipschitz and $U$ is strongly convex, we obtain non-asymptotic bounds for the convergence to stationarity, in Wasserstein distance of order $2$ and in total variation distance, of the sampling method based on the Euler discretization of the Langevin stochastic differential equation, for both constant and decreasing step sizes. The dependence of these bounds on the dimension of the state space is explicit. The convergence of an appropriately weighted empirical measure is also investigated, and bounds for the mean squared error and an exponential deviation inequality are reported for functions which are measurable and bounded. An illustration on Bayesian inference for binary regression is presented to support our claims.
Tasks	Bayesian Inference
Published	2016-05-05
URL	http://arxiv.org/abs/1605.01559v4
PDF	http://arxiv.org/pdf/1605.01559v4.pdf
PWC	https://paperswithcode.com/paper/high-dimensional-bayesian-inference-via-the
Repo
Framework
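
The Unadjusted Langevin Algorithm is the Euler discretization mentioned in the abstract: $x_{k+1} = x_k - \gamma \nabla U(x_k) + \sqrt{2\gamma}\, Z_k$ with $Z_k \sim N(0, I)$ and no Metropolis correction. The Python sketch below runs it on a standard Gaussian target, $U(x) = \|x\|^2/2$, with a constant step size; the target, step size and function name are illustrative choices. For this target the recursion is $x_{k+1} = (1-\gamma)x_k + \sqrt{2\gamma}Z_k$, whose stationary variance $v$ solves $v = (1-\gamma)^2 v + 2\gamma$, giving $v = 1/(1-\gamma/2)$ rather than $1$: a small step-size bias of the sort non-asymptotic bounds for ULA must account for.

```python
import numpy as np


def ula(grad_U, x0, step: float, n_steps: int, rng: np.random.Generator) -> np.ndarray:
    """Unadjusted Langevin Algorithm with a constant step size.

    x_{k+1} = x_k - step * grad_U(x_k) + sqrt(2 * step) * Z_k,  Z_k ~ N(0, I).
    """
    x = np.asarray(x0, dtype=float).copy()
    samples = np.empty((n_steps, x.size))
    for k in range(n_steps):
        x = x - step * grad_U(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.size)
        samples[k] = x
    return samples


if __name__ == "__main__":
    # Target: standard Gaussian in 2 dimensions, U(x) = |x|^2 / 2, so grad U(x) = x.
    rng = np.random.default_rng(0)
    step = 0.1
    chain = ula(lambda x: x, np.zeros(2), step, 50_000, rng)[1_000:]  # drop burn-in
    print("mean:", chain.mean(axis=0))
    print("variance:", chain.var(axis=0), "vs. 1 / (1 - step / 2) =", 1 / (1 - step / 2))
```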

Semantic tracking: Single-target tracking with inter-supervised convolutional networks


Title	Semantic tracking: Single-target tracking with inter-supervised convolutional networks
Authors	Jingjing Xiao, Qiang Lan, Linbo Qiao, Ales Leonardis
Abstract	This article presents a semantic tracker which simultaneously tracks a single target and recognises its category. In general, it is hard to design a tracking model suitable for all object categories; e.g., a rigid tracker for a car is not suitable for a deformable gymnast. Category-based trackers usually achieve superior tracking performance for objects of their specific category, but are difficult to generalise. Therefore, we propose a novel unified robust tracking framework which explicitly encodes both generic features and category-based features. The tracker consists of a shared convolutional network (NetS), which feeds into two parallel networks, NetC for classification and NetT for tracking. NetS is pre-trained on ImageNet to serve as a generic feature extractor across the different object categories for NetC and NetT. NetC utilises those features within fully connected layers to classify the object category. NetT has multiple branches, corresponding to multiple categories, to distinguish the tracked object from the background. Since each branch in NetT is trained on the videos of a specific category or group of similar categories, NetT encodes category-based features for tracking. During online tracking, NetC and NetT jointly determine the target regions with the right category and foreground labels for target estimation. To improve robustness and precision, NetC and NetT inter-supervise each other and trigger network adaptation when their outputs are ambiguous for the same image regions (i.e., when the category label contradicts the foreground/background classification). We have compared the performance of our tracker to other state-of-the-art trackers on a large-scale tracking benchmark (100 sequences); the results demonstrate the effectiveness of the proposed tracker, which outperformed 38 other state-of-the-art tracking algorithms.
Tasks
Published	2016-11-19
URL	http://arxiv.org/abs/1611.06395v1
PDF	http://arxiv.org/pdf/1611.06395v1.pdf
PWC	https://paperswithcode.com/paper/semantic-tracking-single-target-tracking-with
Repo
Framework
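
The inter-supervision rule (adapt when the category label contradicts the foreground/background label for the same region) can be sketched as a small decision function. The Python example below is hypothetical: the two scores stand in for NetC and NetT outputs, and the 0.5 threshold, the product used to rank agreed regions, and all names are assumptions not stated in the abstract.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RegionScores:
    category_prob: float    # NetC-style score: region shows the target's category
    foreground_prob: float  # NetT-style score: region is target foreground


def disagree(r: RegionScores, threshold: float = 0.5) -> bool:
    """Ambiguity rule: category label and foreground/background label contradict each other."""
    return (r.category_prob >= threshold) != (r.foreground_prob >= threshold)


def inter_supervise(regions: List[RegionScores],
                    threshold: float = 0.5) -> Tuple[Optional[int], List[int]]:
    """Pick the region both networks accept; flag contradictory regions for network adaptation."""
    agreed = [i for i, r in enumerate(regions)
              if r.category_prob >= threshold and r.foreground_prob >= threshold]
    best = max(agreed, key=lambda i: regions[i].category_prob * regions[i].foreground_prob,
               default=None)
    flagged = [i for i, r in enumerate(regions) if disagree(r, threshold)]
    return best, flagged


if __name__ == "__main__":
    candidates = [RegionScores(0.9, 0.8), RegionScores(0.9, 0.2),
                  RegionScores(0.1, 0.7), RegionScores(0.2, 0.1)]
    print(inter_supervise(candidates))  # best region index and regions that trigger adaptation
```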

Local Training for PLDA in Speaker Verification


Title	Local Training for PLDA in Speaker Verification
Authors	Chenghui Zhao, Lantian Li, Dong Wang, April Pu
Abstract	PLDA is a popular normalization approach for the i-vector model, and it has delivered state-of-the-art performance in speaker verification. However, PLDA training requires a large amount of labeled development data, which is highly expensive in most cases. One way to mitigate this problem is to use unsupervised adaptation methods, which use unlabeled data to adapt the PLDA scattering matrices to the target domain. In this paper, we present a new ‘local training’ approach that utilizes inaccurate but much cheaper local labels to train the PLDA model. These local labels discriminate speakers within a single conversation only, and so are much easier to obtain than the normal ‘global labels’. Our experiments show that the proposed approach can deliver significant performance improvement, particularly with limited globally-labeled data.
Tasks	Speaker Verification
Published	2016-09-27
URL	http://arxiv.org/abs/1609.08433v1
PDF	http://arxiv.org/pdf/1609.08433v1.pdf
PWC	https://paperswithcode.com/paper/local-training-for-plda-in-speaker
Repo
Framework
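
One way to see why conversation-local labels are still useful: a within-speaker scatter matrix only needs to know which segments share a speaker, and a local label identifies that inside a conversation. The Python sketch below, an assumption about how such labels could be used rather than the paper’s training procedure, groups i-vectors by the pair (conversation, local label) so that identical local labels from different conversations are never merged. How the paper handles between-speaker statistics is not described in the abstract and is not shown; the function name is hypothetical.

```python
from collections import defaultdict

import numpy as np


def within_speaker_scatter(ivectors: np.ndarray, conversation_ids, local_labels) -> np.ndarray:
    """Average within-speaker scatter, grouping segments by (conversation, local label)."""
    groups = defaultdict(list)
    for vec, conv, label in zip(ivectors, conversation_ids, local_labels):
        groups[(conv, label)].append(vec)    # local labels are only meaningful inside a conversation
    dim = ivectors.shape[1]
    scatter = np.zeros((dim, dim))
    n_segments = 0
    for members in groups.values():
        M = np.asarray(members)
        centered = M - M.mean(axis=0)        # deviations from this speaker's mean i-vector
        scatter += centered.T @ centered
        n_segments += len(M)
    return scatter / n_segments


if __name__ == "__main__":
    ivectors = np.array([[1.0, 0.0], [3.0, 0.0], [10.0, 10.0], [10.0, 12.0]])
    # Local label 0 in conversation "A" and in conversation "B" may be different speakers.
    print(within_speaker_scatter(ivectors, ["A", "A", "B", "B"], [0, 0, 0, 0]))
```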

A Semi-Automated Method for Object Segmentation in Infant’s Egocentric Videos to Study Object Perception


Title	A Semi-Automated Method for Object Segmentation in Infant’s Egocentric Videos to Study Object Perception
Authors	Qazaleh Mirsharif, Sidharth Sadani, Shishir Shah, Hanako Yoshida, Joseph Burling
Abstract	Object segmentation in infant’s egocentric videos is a fundamental step in studying how children perceive objects in early stages of development. From the computer vision perspective, object segmentation in such videos poses quite a few challenges because the child’s view is unfocused, often with large head movements, resulting in sudden changes in the child’s point of view and hence frequent changes in object properties such as size, shape and illumination. In this paper, we develop a semi-automated, domain-specific method to address these concerns and facilitate the object annotation process for cognitive scientists, allowing them to select and monitor the object under segmentation. The method starts with a user annotation of the desired object and employs graph cut segmentation and optical flow computation to predict the object mask for subsequent video frames automatically. To maintain accuracy, we use domain-specific heuristic rules to re-initialize the program with new user input whenever object properties change dramatically. The evaluations demonstrate the high speed and accuracy of the presented method for object segmentation in voluminous egocentric videos. We apply the proposed method to investigate potential patterns in object distribution in the child’s view at progressive ages.
Tasks	Optical Flow Estimation, Semantic Segmentation
Published	2016-02-08
URL	http://arxiv.org/abs/1602.02522v1
PDF	http://arxiv.org/pdf/1602.02522v1.pdf
PWC	https://paperswithcode.com/paper/a-semi-automated-method-for-object
Repo
Framework
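
The re-initialization step can be pictured as a check run after each propagated frame. The Python sketch below is hypothetical: the thresholds on area ratio and mask overlap (IoU) are illustrative stand-ins for the paper’s domain-specific heuristic rules, which the abstract does not spell out, and the graph cut and optical flow propagation themselves are not shown.

```python
import numpy as np


def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over union of two boolean masks."""
    union = np.logical_or(a, b).sum()
    return 1.0 if union == 0 else float(np.logical_and(a, b).sum() / union)


def should_reinitialize(prev_mask: np.ndarray, new_mask: np.ndarray,
                        max_area_ratio: float = 2.0, min_iou: float = 0.3) -> bool:
    """Hypothetical rule: request new user input when the propagated mask changes abruptly."""
    prev_area, new_area = int(prev_mask.sum()), int(new_mask.sum())
    if prev_area == 0 or new_area == 0:
        return True                                   # object lost: ask the annotator
    ratio = max(prev_area, new_area) / min(prev_area, new_area)
    return ratio > max_area_ratio or mask_iou(prev_mask, new_mask) < min_iou


if __name__ == "__main__":
    prev = np.zeros((10, 10), dtype=bool)
    prev[2:6, 2:6] = True                             # 16-pixel object
    shifted = np.zeros_like(prev)
    shifted[3:7, 2:6] = True                          # small motion between frames
    shrunk = np.zeros_like(prev)
    shrunk[2:4, 2:4] = True                           # abrupt change in apparent size
    print(should_reinitialize(prev, shifted), should_reinitialize(prev, shrunk))
```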

4D Cardiac Ultrasound Standard Plane Location by Spatial-Temporal Correlation


Title	4D Cardiac Ultrasound Standard Plane Location by Spatial-Temporal Correlation
Authors	Yun Gu, Guang-Zhong Yang, Jie Yang, Kun Sun
Abstract	Echocardiography plays an important part in diagnostic aid for cardiac diseases. A critical step in echocardiography-aided diagnosis is to extract the standard planes, since they tend to provide informative views of different structures that are beneficial to diagnosis. To this end, this paper proposes a spatial-temporal embedding framework to extract the standard view planes from 4D STIC (spatial-temporal image correlation) volumes. The proposed method comprises three stages: frame smoothing, spatial-temporal embedding and final classification. In the first stage, an $L_0$ smoothing filter is used to preprocess the frames, removing the noise while preserving the boundaries. Then a compact representation is learned by embedding spatial and temporal features into a latent space in a supervised scheme that considers both standard plane information and the diagnosis result. In the last stage, the learned features are fed into a support vector machine to identify the standard plane. We evaluate the proposed method on a 4D STIC volume dataset with 92 normal cases and 93 abnormal cases in three standard planes. The results demonstrate that our method outperforms the baselines in both classification accuracy and computational efficiency.
Tasks
Published	2016-07-20
URL	http://arxiv.org/abs/1607.05969v1
PDF	http://arxiv.org/pdf/1607.05969v1.pdf
PWC	https://paperswithcode.com/paper/4d-cardiac-ultrasound-standard-plane-location
Repo
Framework