Paper Group ANR 180
When is Nontrivial Estimation Possible for Graphons and Stochastic Block Models?. Cohomology of Cryo-Electron Microscopy. Distribution-dependent concentration inequalities for tighter generalization bounds. Question Relevance in VQA: Identifying Non-Visual And False-Premise Questions. A review of Gaussian Markov models for conditional independence. …
When is Nontrivial Estimation Possible for Graphons and Stochastic Block Models?
Title | When is Nontrivial Estimation Possible for Graphons and Stochastic Block Models? |
Authors | Audra McMillan, Adam Smith |
Abstract | Block graphons (also called stochastic block models) are an important and widely-studied class of models for random networks. We provide a lower bound on the accuracy of estimators for block graphons with a large number of blocks. We show that, given only the number $k$ of blocks and an upper bound $\rho$ on the values (connection probabilities) of the graphon, every estimator incurs error at least on the order of $\min(\rho, \sqrt{\rho k^2/n^2})$ in the $\delta_2$ metric with constant probability, in the worst case over graphons. In particular, our bound rules out any nontrivial estimation (that is, with $\delta_2$ error substantially less than $\rho$) when $k\geq n\sqrt{\rho}$. Combined with previous upper and lower bounds, our results characterize, up to logarithmic terms, the minimax accuracy of graphon estimation in the $\delta_2$ metric. A similar lower bound to ours was obtained independently by Klopp, Tsybakov and Verzelen (2016). |
Tasks | Graphon Estimation |
Published | 2016-04-07 |
URL | http://arxiv.org/abs/1604.01871v1 http://arxiv.org/pdf/1604.01871v1.pdf |
PWC | https://paperswithcode.com/paper/when-is-nontrivial-estimation-possible-for |
Repo | |
Framework | |
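The lower bound in the abstract is easy to explore numerically. Below is a minimal sketch, assuming nothing beyond the abstract: a `sample_sbm` helper (our own naming) that draws a graph from a $k$-block model with connection probabilities bounded by $\rho$, and the rate $\min(\rho, \sqrt{\rho k^2/n^2})$, which saturates at $\rho$ once $k \geq n\sqrt{\rho}$, i.e. where nontrivial estimation becomes impossible.

```python
import numpy as np

def sample_sbm(n, k, B, rng=None):
    """Sample an undirected graph from a k-block stochastic block model.

    B is a k x k symmetric matrix of connection probabilities; nodes are
    assigned to blocks uniformly at random.
    """
    rng = np.random.default_rng(rng)
    z = rng.integers(0, k, size=n)           # block assignment per node
    P = B[z][:, z]                           # n x n edge-probability matrix
    upper = np.triu(rng.random((n, n)) < P, k=1)
    return (upper | upper.T).astype(int)     # symmetric adjacency, no self-loops

def minimax_rate(n, k, rho):
    """Order of the delta_2 estimation-error lower bound from the abstract."""
    return min(rho, np.sqrt(rho * k**2 / n**2))

B = 0.3 * 0.1 * np.ones((5, 5)) + 0.7 * 0.1 * np.eye(5)  # values bounded by rho = 0.1
A = sample_sbm(200, 5, B, rng=0)

# Once k >= n * sqrt(rho), the bound saturates at rho: no nontrivial estimation.
n, rho = 1000, 0.1
for k in (10, 100, int(n * np.sqrt(rho)) + 1):
    print(k, minimax_rate(n, k, rho))
```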
Cohomology of Cryo-Electron Microscopy
Title | Cohomology of Cryo-Electron Microscopy |
Authors | Ke Ye, Lek-Heng Lim |
Abstract | The goal of cryo-electron microscopy (EM) is to reconstruct the 3-dimensional structure of a molecule from a collection of its 2-dimensional projected images. In this article, we show that the basic premise of cryo-EM — patching together 2-dimensional projections to reconstruct a 3-dimensional object — is naturally one of Čech cohomology with SO(2)-coefficients. We deduce that every cryo-EM reconstruction problem corresponds to an oriented circle bundle on a simplicial complex, allowing us to classify cryo-EM problems via principal bundles. In practice, the 2-dimensional images are noisy and a main task in cryo-EM is to denoise them. We will see how the aforementioned insights can be used towards this end. |
Tasks | |
Published | 2016-04-05 |
URL | http://arxiv.org/abs/1604.01319v2 http://arxiv.org/pdf/1604.01319v2.pdf |
PWC | https://paperswithcode.com/paper/cohomology-of-cryo-electron-microscopy |
Repo | |
Framework | |
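For readers unfamiliar with the cohomological language, here is the standard Čech setup the abstract alludes to, written in our own notation (the paper's precise definitions may differ):

```latex
% Overlapping 2-D views are glued by in-plane rotations, i.e. transition
% functions into SO(2) that must satisfy the Cech cocycle condition:
g_{ij}\colon U_i \cap U_j \to SO(2), \qquad
g_{ij}\, g_{jk} = g_{ik} \quad \text{on } U_i \cap U_j \cap U_k.
% A consistent family of such gluings defines a class in the Cech cohomology
% group \check{H}^1(K;\, SO(2)), which classifies oriented circle bundles
% on the simplicial complex K, as stated in the abstract.
```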
Distribution-dependent concentration inequalities for tighter generalization bounds
Title | Distribution-dependent concentration inequalities for tighter generalization bounds |
Authors | Xinxing Wu, Junping Zhang |
Abstract | Concentration inequalities are indispensable tools for studying the generalization capacity of learning models. Hoeffding’s and McDiarmid’s inequalities are commonly used, giving bounds independent of the data distribution. Although this makes them widely applicable, a drawback is that the bounds can be too loose in some specific cases. While efforts have been devoted to improving these bounds, we find that they can be further tightened in some distribution-dependent scenarios and that the conditions of the inequalities can be relaxed. In particular, we propose four types of conditions for probabilistic boundedness and bounded differences, and derive several distribution-dependent extensions of Hoeffding’s and McDiarmid’s inequalities. These extensions provide bounds for functions not satisfying the conditions of the existing inequalities, and, in some special cases, tighter bounds. Furthermore, we obtain generalization bounds for unbounded and hierarchy-bounded loss functions. Finally, we discuss the potential applications of our extensions to learning theory. |
Tasks | |
Published | 2016-07-19 |
URL | http://arxiv.org/abs/1607.05506v2 http://arxiv.org/pdf/1607.05506v2.pdf |
PWC | https://paperswithcode.com/paper/distribution-dependent-concentration |
Repo | |
Framework | |
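A short simulation (a sketch under our own assumptions, not the paper's constructions) makes the looseness of distribution-free bounds concrete: for a skewed Bernoulli sample, Hoeffding's bound sits orders of magnitude above the true tail probability, which is exactly the gap distribution-dependent inequalities aim to close.

```python
import numpy as np

rng = np.random.default_rng(0)
n, t, trials = 200, 0.05, 100_000
p = 0.05                                 # Bernoulli(p): bounded in [0, 1], low variance

samples = rng.random((trials, n)) < p
deviation = np.abs(samples.mean(axis=1) - p)
empirical = (deviation >= t).mean()

hoeffding = 2 * np.exp(-2 * n * t**2)    # distribution-free, range (b - a) = 1
print(f"empirical tail: {empirical:.4f}   Hoeffding bound: {hoeffding:.4f}")
# The distribution-free bound (~0.74) is far from the true tail (~0.001):
# the looseness that distribution-dependent extensions target.
```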
Question Relevance in VQA: Identifying Non-Visual And False-Premise Questions
Title | Question Relevance in VQA: Identifying Non-Visual And False-Premise Questions |
Authors | Arijit Ray, Gordon Christie, Mohit Bansal, Dhruv Batra, Devi Parikh |
Abstract | Visual Question Answering (VQA) is the task of answering natural-language questions about images. We introduce the novel problem of determining the relevance of questions to images in VQA. Current VQA models do not reason about whether a question is even related to the given image (e.g., "What is the capital of Argentina?") or whether it requires information from external resources to answer correctly. This can break the continuity of a dialogue in human-machine interaction. Our approaches for determining relevance are composed of two stages. Given an image and a question, (1) we first determine whether the question is visual or not; (2) if visual, we determine whether the question is relevant to the given image. Our approaches, based on LSTM-RNNs, VQA model uncertainty, and caption-question similarity, are able to outperform strong baselines on both relevance tasks. We also present human studies showing that VQA models augmented with such question relevance reasoning are perceived as more intelligent, reasonable, and human-like. |
Tasks | Question Answering, Question Similarity, Visual Question Answering |
Published | 2016-06-21 |
URL | http://arxiv.org/abs/1606.06622v3 http://arxiv.org/pdf/1606.06622v3.pdf |
PWC | https://paperswithcode.com/paper/question-relevance-in-vqa-identifying-non |
Repo | |
Framework | |
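As a rough illustration of the caption-question similarity cue, here is a sketch using TF-IDF cosine similarity as a crude stand-in for the paper's learned LSTM-based similarity; the `question_relevance` helper and the 0.2 threshold are our own illustrative choices.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def question_relevance(caption: str, question: str, threshold: float = 0.2) -> bool:
    """Crude stand-in for the caption-question similarity cue: a question
    whose TF-IDF vector is far from the image caption's is flagged as
    likely irrelevant (false premise or non-visual)."""
    vec = TfidfVectorizer().fit([caption, question])
    sim = cosine_similarity(vec.transform([caption]), vec.transform([question]))[0, 0]
    return sim >= threshold

caption = "a brown dog sleeping on a red couch"
print(question_relevance(caption, "what is the dog sleeping on?"))       # True: shared terms
print(question_relevance(caption, "what is the capital of Argentina?"))  # False: non-visual
```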
A review of Gaussian Markov models for conditional independence
Title | A review of Gaussian Markov models for conditional independence |
Authors | Irene Córdoba, Concha Bielza, Pedro Larrañaga |
Abstract | Markov models lie at the interface between statistical independence in a probability distribution and graph separation properties. We review model selection and estimation in directed and undirected Markov models with Gaussian parametrization, emphasizing the main similarities and differences. These two model classes are similar but not equivalent, although they share a common intersection. We present the existing results from a historical perspective, taking into account the literature from both the artificial intelligence and statistics research communities, where these models originated. We cover classical topics such as maximum likelihood estimation and model selection via hypothesis testing, but also more modern approaches like regularization and Bayesian methods. We also discuss how the Markov models reviewed fit into the rich hierarchy of other, higher-level Markov model classes. Finally, we close the paper by reviewing relaxations of the Gaussian assumption and pointing out the main areas of application where these Markov models are used today. |
Tasks | Model Selection |
Published | 2016-06-23 |
URL | https://arxiv.org/abs/1606.07282v5 https://arxiv.org/pdf/1606.07282v5.pdf |
PWC | https://paperswithcode.com/paper/on-gaussian-markov-models-for-conditional |
Repo | |
Framework | |
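The regularization approaches the review covers can be tried directly in scikit-learn. A minimal sketch with the graphical lasso (an $\ell_1$-penalized MLE for the precision matrix), where a zero off-diagonal entry corresponds to a missing edge, i.e. a conditional independence:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)

# Ground truth: a sparse precision matrix encodes the conditional
# independences of a Gaussian Markov model (zero entry <=> missing edge).
prec = np.array([[2.0, 0.6, 0.0],
                 [0.6, 2.0, 0.6],
                 [0.0, 0.6, 2.0]])
cov = np.linalg.inv(prec)
X = rng.multivariate_normal(np.zeros(3), cov, size=2000)

# l1-regularized MLE (graphical lasso); alpha is a tuning parameter.
model = GraphicalLasso(alpha=0.05).fit(X)
print(np.round(model.precision_, 2))   # near-zero (1,3) entry: X1 _||_ X3 | X2
```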
Correlation Preserving Sparse Coding Over Multi-level Dictionaries for Image Denoising
Title | Correlation Preserving Sparse Coding Over Multi-level Dictionaries for Image Denoising |
Authors | Rui Chen, Huizhu Jia, Xiaodong Xie, Wen Gao |
Abstract | In this letter, we propose a novel image denoising method based on correlation-preserving sparse coding. Because unstable and unreliable correlations among the basis set can limit the performance of dictionary-driven denoising methods, two effective regularization strategies are employed in the coding process. Specifically, a graph-based regularizer is built to preserve the global similarity correlations, which can adaptively capture both the geometrical structures and discriminative features of textured patches. In particular, edge weights in the graph are obtained by seeking a nonnegative low-rank construction. In addition, a robust locality-constrained coding scheme automatically preserves not only spatial neighborhood information but also the internal consistency present in noisy patches while learning an overcomplete dictionary. Experimental results demonstrate that our proposed method achieves state-of-the-art denoising performance in terms of both PSNR and subjective visual quality. |
Tasks | Denoising, Image Denoising |
Published | 2016-12-23 |
URL | http://arxiv.org/abs/1612.08049v1 http://arxiv.org/pdf/1612.08049v1.pdf |
PWC | https://paperswithcode.com/paper/correlation-preserving-sparse-coding-over |
Repo | |
Framework | |
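For orientation, here is what plain dictionary-based sparse coding for denoising looks like (ISTA over an orthonormal DCT dictionary). This is only the base model; the paper's graph-based and locality-constrained regularizers are omitted.

```python
import numpy as np
from scipy.fft import dct

def ista_denoise(y, D, lam=0.1, step=None, iters=100):
    """Plain ISTA for min_x 0.5 ||y - D x||^2 + lam ||x||_1; the denoised
    signal is D x. (The paper adds graph and locality regularizers on top
    of this kind of sparse coding; those are omitted here.)"""
    if step is None:
        step = 1.0 / np.linalg.norm(D, 2) ** 2                     # 1 / Lipschitz constant
    x = np.zeros(D.shape[1])
    for _ in range(iters):
        g = x - step * D.T @ (D @ x - y)                           # gradient step
        x = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)   # soft-threshold
    return D @ x

n = 64
D = dct(np.eye(n), norm='ortho')              # orthonormal DCT dictionary
rng = np.random.default_rng(0)
clean = D[:, :4] @ rng.normal(size=4)         # signal sparse in the dictionary
noisy = clean + 0.1 * rng.normal(size=n)
print(np.linalg.norm(noisy - clean),
      np.linalg.norm(ista_denoise(noisy, D, lam=0.05) - clean))
```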
Leveraging Semantic Web Search and Browse Sessions for Multi-Turn Spoken Dialog Systems
Title | Leveraging Semantic Web Search and Browse Sessions for Multi-Turn Spoken Dialog Systems |
Authors | Lu Wang, Larry Heck, Dilek Hakkani-Tur |
Abstract | Training statistical dialog models in spoken dialog systems (SDS) requires large amounts of annotated data. The lack of scalable methods for data mining and annotation poses a significant hurdle for state-of-the-art statistical dialog managers. This paper presents an approach that directly leverages billions of web search and browse sessions to overcome this hurdle. The key insight is that task completion through web search and browse sessions is (a) predictable and (b) generalizes to spoken dialog task completion. The new method automatically mines behavioral search and browse patterns from web logs and translates them into spoken dialog models. We experiment with naturally occurring spoken dialogs and large-scale web logs. Our session-based models outperform the state-of-the-art method for the entity extraction task in SDS. We also achieve better performance for both entity and relation extraction on web search queries when compared with nontrivial baselines. |
Tasks | Entity Extraction, Relation Extraction |
Published | 2016-06-25 |
URL | http://arxiv.org/abs/1606.07967v1 http://arxiv.org/pdf/1606.07967v1.pdf |
PWC | https://paperswithcode.com/paper/leveraging-semantic-web-search-and-browse |
Repo | |
Framework | |
Estimation of Bandlimited Grayscale Images From the Single Bit Observations of Pixels Affected by Additive Gaussian Noise
Title | Estimation of Bandlimited Grayscale Images From the Single Bit Observations of Pixels Affected by Additive Gaussian Noise |
Authors | Abhinav Kumar, Animesh Kumar |
Abstract | The estimation of grayscale images from single-bit observations of their pixels affected by zero-mean Gaussian noise is presented in this paper. The images are assumed to be bandlimited in the Fourier Cosine transform (FCT) domain and are oversampled above their Nyquist rate in that domain. We propose a non-recursive approach, based on a first-order approximation of the cumulative distribution function (CDF) and on Banach’s contraction theorem, to estimate the image from the single-bit pixels. The decay rate of the mean squared error of estimating such images is found to be independent of the precision of the quantizer, and it varies as $O(1/N)$ where $N$ is the “effective” oversampling ratio with respect to the Nyquist rate in the FCT domain. |
Tasks | |
Published | 2016-10-27 |
URL | http://arxiv.org/abs/1610.08627v1 http://arxiv.org/pdf/1610.08627v1.pdf |
PWC | https://paperswithcode.com/paper/estimation-of-bandlimited-grayscale-images |
Repo | |
Framework | |
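The statistical core can be sketched in a few lines: with single-bit (sign) observations under Gaussian noise, the bit probability is $\Phi(x/\sigma)$, so inverting the empirical frequency recovers the value. This is a simplified scalar illustration of CDF-based estimation, not the paper's full FCT-domain algorithm.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x_true, sigma, N = 0.3, 1.0, 10_000        # pixel value, noise std, #observations

# Each observation is only the sign (one bit) of the noisy pixel.
bits = (x_true + sigma * rng.normal(size=N)) > 0

# P(bit = 1) = Phi(x / sigma), so invert the empirical frequency:
p_hat = bits.mean()
x_hat = sigma * norm.ppf(p_hat)
print(x_true, round(x_hat, 4))             # MSE decays like O(1/N), as in the abstract
```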
Auto-JacoBin: Auto-encoder Jacobian Binary Hashing
Title | Auto-JacoBin: Auto-encoder Jacobian Binary Hashing |
Authors | Xiping Fu, Brendan McCane, Steven Mills, Michael Albert, Lech Szymanski |
Abstract | Binary codes can be used to speed up nearest neighbor search in large-scale data sets, as they are efficient for both storage and retrieval. In this paper, we propose a robust auto-encoder model that preserves the geometric relationships of high-dimensional data sets in Hamming space. This is done by considering a noise-removing function in a region surrounding the manifold where the training data points lie. This function is defined with the property that it projects data points near the manifold onto the manifold, and we approximate it by its first-order approximation. Experimental results show that the proposed method improves on state-of-the-art results on three large-scale, high-dimensional data sets. |
Tasks | |
Published | 2016-02-25 |
URL | http://arxiv.org/abs/1602.08127v2 http://arxiv.org/pdf/1602.08127v2.pdf |
PWC | https://paperswithcode.com/paper/auto-jacobin-auto-encoder-jacobian-binary |
Repo | |
Framework | |
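A minimal sketch of the retrieval side, with a random projection standing in for the trained auto-encoder. The paper's contribution is precisely learning a better encoder; everything below except the Hamming search is a placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)
d, bits, n = 128, 32, 10_000

X = rng.normal(size=(n, d))                # database of high-dimensional points
W = rng.normal(size=(d, bits))             # stand-in for the trained encoder
codes = (X @ W > 0)                        # binarize encoder outputs -> Hamming space

def hamming_search(query, k=5):
    q = (query @ W > 0)
    dist = (codes != q).sum(axis=1)        # Hamming distance to every stored code
    return np.argsort(dist)[:k]

query = X[42] + 0.05 * rng.normal(size=d)  # noisy copy of a database point
print(hamming_search(query))               # index 42 should rank near the top
```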
Single-image RGB Photometric Stereo With Spatially-varying Albedo
Title | Single-image RGB Photometric Stereo With Spatially-varying Albedo |
Authors | Ayan Chakrabarti, Kalyan Sunkavalli |
Abstract | We present a single-shot system to recover the surface geometry of objects with spatially-varying albedos, from images captured under a calibrated RGB photometric stereo setup—with three light directions multiplexed across different color channels in the observed RGB image. Since the problem is ill-posed point-wise, we assume that the albedo map can be modeled as piece-wise constant with a restricted number of distinct albedo values. We show that under ideal conditions, the shape of a non-degenerate, locally constant-albedo surface patch can theoretically be recovered exactly. Moreover, we present a practical and efficient algorithm that uses this model to robustly recover shape from real images. Our method first reasons about shape locally in a dense set of patches in the observed image, producing shape distributions for every patch. These local distributions are then combined to produce a single consistent surface normal map. We demonstrate the efficacy of the approach through experiments on both synthetic renderings and real captured images. |
Tasks | |
Published | 2016-09-14 |
URL | http://arxiv.org/abs/1609.04079v1 http://arxiv.org/pdf/1609.04079v1.pdf |
PWC | https://paperswithcode.com/paper/single-image-rgb-photometric-stereo-with |
Repo | |
Framework | |
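The underlying per-pixel model is classic Lambertian photometric stereo with the three light directions stacked into a matrix. A sketch of the point-wise solve (light directions, normal, and albedo below are made up for illustration):

```python
import numpy as np

# Three calibrated light directions, one per color channel (rows of L).
L = np.array([[0.0, 0.0, 1.0],
              [0.8, 0.0, 0.6],
              [0.0, 0.8, 0.6]])

def recover_normal(rgb, L):
    """Classic per-pixel Lambertian photometric stereo: rgb = albedo * L @ n.
    This point-wise solve assumes a single gray albedo shared by all three
    channels; the paper addresses the harder spatially-varying case with a
    piecewise-constant albedo model."""
    b = np.linalg.solve(L, rgb)            # b = albedo * n
    albedo = np.linalg.norm(b)
    return b / albedo, albedo

n_true = np.array([0.2, 0.1, 0.97])
n_true /= np.linalg.norm(n_true)
rgb = 0.7 * L @ n_true                     # ideal noise-free observation, albedo 0.7
print(recover_normal(rgb, L))
```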
High-dimensional Bayesian inference via the Unadjusted Langevin Algorithm
Title | High-dimensional Bayesian inference via the Unadjusted Langevin Algorithm |
Authors | Alain Durmus, Eric Moulines |
Abstract | We consider in this paper the problem of sampling a high-dimensional probability distribution $\pi$ having a density with respect to the Lebesgue measure on $\mathbb{R}^d$, known up to a normalization constant $x \mapsto \pi(x)= \mathrm{e}^{-U(x)}/\int_{\mathbb{R}^d} \mathrm{e}^{-U(y)} \mathrm{d} y$. Such problems occur naturally in, for example, Bayesian inference and machine learning. Under the assumption that $U$ is continuously differentiable, $\nabla U$ is globally Lipschitz and $U$ is strongly convex, we obtain non-asymptotic bounds for the convergence to stationarity, in Wasserstein distance of order $2$ and in total variation distance, of the sampling method based on the Euler discretization of the Langevin stochastic differential equation, for both constant and decreasing step sizes. The dependence of these bounds on the dimension of the state space is explicit. The convergence of an appropriately weighted empirical measure is also investigated, and bounds for the mean square error and an exponential deviation inequality are reported for functions which are measurable and bounded. An application to Bayesian inference for binary regression is presented to support our claims. |
Tasks | Bayesian Inference |
Published | 2016-05-05 |
URL | http://arxiv.org/abs/1605.01559v4 http://arxiv.org/pdf/1605.01559v4.pdf |
PWC | https://paperswithcode.com/paper/high-dimensional-bayesian-inference-via-the |
Repo | |
Framework | |
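The sampling method analyzed here, the Unadjusted Langevin Algorithm, is a few lines of code. A minimal sketch on a strongly convex $U$ matching the paper's assumptions ($U(x) = \|x\|^2/2$, so $\pi = N(0, I)$):

```python
import numpy as np

def ula(grad_U, x0, gamma, n_steps, rng=None):
    """Unadjusted Langevin Algorithm: Euler discretization of the Langevin
    SDE  dX_t = -grad U(X_t) dt + sqrt(2) dB_t,  targeting pi ∝ exp(-U)."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float)
    chain = np.empty((n_steps, x.size))
    for i in range(n_steps):
        x = x - gamma * grad_U(x) + np.sqrt(2 * gamma) * rng.normal(size=x.size)
        chain[i] = x
    return chain

# U(x) = ||x||^2 / 2 is strongly convex with Lipschitz gradient grad U(x) = x.
d = 10
chain = ula(lambda x: x, np.zeros(d), gamma=0.05, n_steps=50_000, rng=0)
print(chain[1000:].mean(axis=0).round(2))   # ≈ 0, up to discretization bias
```

Note the absence of a Metropolis accept/reject step: the chain targets a biased version of $\pi$, and the paper quantifies that bias as a function of the step size $\gamma$ and the dimension $d$.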
Semantic tracking: Single-target tracking with inter-supervised convolutional networks
Title | Semantic tracking: Single-target tracking with inter-supervised convolutional networks |
Authors | Jingjing Xiao, Qiang Lan, Linbo Qiao, Ales Leonardis |
Abstract | This article presents a semantic tracker which simultaneously tracks a single target and recognises its category. In general, it is hard to design a tracking model suitable for all object categories; e.g., a rigid tracker for a car is not suitable for a deformable gymnast. Category-based trackers usually achieve superior tracking performance for objects of their specific category, but are difficult to generalise. Therefore, we propose a novel unified robust tracking framework which explicitly encodes both generic features and category-based features. The tracker consists of a shared convolutional network (NetS), which feeds into two parallel networks, NetC for classification and NetT for tracking. NetS is pre-trained on ImageNet to serve as a generic feature extractor across the different object categories for NetC and NetT. NetC utilises those features within fully connected layers to classify the object category. NetT has multiple branches, corresponding to multiple categories, to distinguish the tracked object from the background. Since each branch in NetT is trained on the videos of a specific category or group of similar categories, NetT encodes category-based features for tracking. During online tracking, NetC and NetT jointly determine the target regions with the right category and foreground labels for target estimation. To improve robustness and precision, NetC and NetT inter-supervise each other and trigger network adaptation when their outputs are ambiguous for the same image regions (i.e., when the category label contradicts the foreground/background classification). We have compared the performance of our tracker to other state-of-the-art trackers on a large-scale tracking benchmark (100 sequences); the results demonstrate the effectiveness of our proposed tracker, which outperformed 38 other state-of-the-art tracking algorithms. |
Tasks | |
Published | 2016-11-19 |
URL | http://arxiv.org/abs/1611.06395v1 http://arxiv.org/pdf/1611.06395v1.pdf |
PWC | https://paperswithcode.com/paper/semantic-tracking-single-target-tracking-with |
Repo | |
Framework | |
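A minimal PyTorch sketch of the NetS/NetC/NetT layout (layer sizes, the number of category branches, and the hard argmax routing are illustrative choices, not the paper's; the real NetS is pre-trained on ImageNet):

```python
import torch
import torch.nn as nn

class SemanticTracker(nn.Module):
    """Minimal sketch of the shared-trunk, two-head layout described above."""
    def __init__(self, n_categories=10):
        super().__init__()
        self.net_s = nn.Sequential(                    # shared feature extractor (NetS)
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten())
        self.net_c = nn.Linear(64 * 16, n_categories)  # category classifier (NetC)
        self.net_t = nn.ModuleList(                    # one fg/bg branch per category (NetT)
            [nn.Linear(64 * 16, 2) for _ in range(n_categories)])

    def forward(self, patch):
        feat = self.net_s(patch)
        category_logits = self.net_c(feat)
        branch = category_logits.argmax(dim=1)         # route each sample to its branch
        fg_bg_logits = torch.stack(
            [self.net_t[b](f) for b, f in zip(branch.tolist(), feat)])
        return category_logits, fg_bg_logits

model = SemanticTracker()
cat, fgbg = model(torch.randn(2, 3, 64, 64))
print(cat.shape, fgbg.shape)                           # (2, 10), (2, 2)
```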
Local Training for PLDA in Speaker Verification
Title | Local Training for PLDA in Speaker Verification |
Authors | Chenghui Zhao, Lantian Li, Dong Wang, April Pu |
Abstract | PLDA is a popular normalization approach for the i-vector model, and it has delivered state-of-the-art performance in speaker verification. However, PLDA training requires a large amount of labeled development data, which is highly expensive in most cases. A possible approach to mitigate the problem is various unsupervised adaptation methods, which use unlabeled data to adapt the PLDA scattering matrices to the target domain. In this paper, we present a new ‘local training’ approach that utilizes inaccurate but much cheaper local labels to train the PLDA model. These local labels discriminate speakers within a single conversation only, and so are much easier to obtain than the normal ‘global labels’. Our experiments show that the proposed approach can deliver significant performance improvement, particularly with limited globally-labeled data. |
Tasks | Speaker Verification |
Published | 2016-09-27 |
URL | http://arxiv.org/abs/1609.08433v1 http://arxiv.org/pdf/1609.08433v1.pdf |
PWC | https://paperswithcode.com/paper/local-training-for-plda-in-speaker |
Repo | |
Framework | |
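For context, a sketch of two-covariance PLDA scoring, the kind of model the local labels are used to train. The covariances and data here are synthetic stand-ins, not the paper's i-vector setup.

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

def plda_llr(x1, x2, B, W):
    """Two-covariance PLDA log-likelihood ratio for 'same speaker' vs
    'different speakers': speaker mean y ~ N(0, B), observation x = y + e
    with e ~ N(0, W), so a shared y induces cross-covariance B."""
    d = len(x1)
    joint = np.concatenate([x1, x2])
    same = np.block([[B + W, B], [B, B + W]])
    diff = np.block([[B + W, np.zeros((d, d))], [np.zeros((d, d)), B + W]])
    return (mvn.logpdf(joint, np.zeros(2 * d), same)
            - mvn.logpdf(joint, np.zeros(2 * d), diff))

rng = np.random.default_rng(0)
d = 4
B, W = 2.0 * np.eye(d), 0.5 * np.eye(d)            # between/within speaker covariances
spk = np.sqrt(2.0) * rng.normal(size=d)            # a speaker's latent mean
same_pair = spk + np.sqrt(0.5) * rng.normal(size=(2, d))
print(plda_llr(same_pair[0], same_pair[1], B, W))              # positive: same speaker
print(plda_llr(rng.normal(size=d), rng.normal(size=d), B, W))  # likely negative
```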
A Semi-Automated Method for Object Segmentation in Infant’s Egocentric Videos to Study Object Perception
Title | A Semi-Automated Method for Object Segmentation in Infant’s Egocentric Videos to Study Object Perception |
Authors | Qazaleh Mirsharif, Sidharth Sadani, Shishir Shah, Hanako Yoshida, Joseph Burling |
Abstract | Object segmentation in infants’ egocentric videos is a fundamental step in studying how children perceive objects in the early stages of development. From the computer vision perspective, object segmentation in such videos poses quite a few challenges because the child’s view is unfocused, often with large head movements, resulting in sudden changes in the child’s point of view which lead to frequent changes in object properties such as size, shape and illumination. In this paper, we develop a semi-automated, domain-specific method to address these concerns and facilitate the object annotation process for cognitive scientists, allowing them to select and monitor the object under segmentation. The method starts with a user annotation of the desired object and employs graph cut segmentation and optical flow computation to predict the object mask for subsequent video frames automatically. To maintain accuracy, we use domain-specific heuristic rules to re-initialize the program with new user input whenever object properties change dramatically. The evaluations demonstrate the high speed and accuracy of the presented method for object segmentation in voluminous egocentric videos. We apply the proposed method to investigate potential patterns in object distribution in the child’s view at progressive ages. |
Tasks | Optical Flow Estimation, Semantic Segmentation |
Published | 2016-02-08 |
URL | http://arxiv.org/abs/1602.02522v1 http://arxiv.org/pdf/1602.02522v1.pdf |
PWC | https://paperswithcode.com/paper/a-semi-automated-method-for-object |
Repo | |
Framework | |
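The graph-cut-plus-optical-flow propagation step can be sketched with standard OpenCV calls (grabCut and Farneback flow as generic stand-ins for the paper's specific components; the domain-specific re-initialization heuristics are omitted):

```python
import cv2
import numpy as np

def propagate_mask(prev_frame, next_frame, rect):
    """One propagation step: segment the object with graph cut in the
    current frame, then carry the mask to the next frame via dense flow.
    Frames are 8-bit BGR images; rect is a user rectangle (x, y, w, h)."""
    # Graph-cut segmentation seeded by the user rectangle.
    gc_mask = np.zeros(prev_frame.shape[:2], np.uint8)
    bgd = np.zeros((1, 65), np.float64)
    fgd = np.zeros((1, 65), np.float64)
    cv2.grabCut(prev_frame, gc_mask, rect, bgd, fgd, 5, cv2.GC_INIT_WITH_RECT)
    obj = np.isin(gc_mask, (cv2.GC_FGD, cv2.GC_PR_FGD)).astype(np.uint8)

    # Dense flow from the *next* frame back to the previous one, so the
    # mask can be pulled forward by backward warping.
    g0 = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(g1, g0, None, 0.5, 3, 15, 3, 5, 1.2, 0)

    h, w = obj.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    return cv2.remap(obj, xs + flow[..., 0], ys + flow[..., 1], cv2.INTER_NEAREST)
```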
4D Cardiac Ultrasound Standard Plane Location by Spatial-Temporal Correlation
Title | 4D Cardiac Ultrasound Standard Plane Location by Spatial-Temporal Correlation |
Authors | Yun Gu, Guang-Zhong Yang, Jie Yang, Kun Sun |
Abstract | Echocardiography plays an important part in diagnostic aid for cardiac diseases. A critical step in echocardiography-aided diagnosis is to extract the standard planes, since they tend to provide promising views of different structures that are beneficial to diagnosis. To this end, this paper proposes a spatial-temporal embedding framework to extract the standard view planes from 4D STIC (spatial-temporal image correlation) volumes. The proposed method comprises three stages: frame smoothing, spatial-temporal embedding, and final classification. In the first stage, an $L_0$ smoothing filter is used to preprocess the frames, removing noise while preserving boundaries. Then a compact representation is learned by embedding spatial and temporal features into a latent space in a supervised scheme that considers both the standard-plane information and the diagnosis result. In the last stage, the learned features are fed into a support vector machine to identify the standard plane. We evaluate the proposed method on a 4D STIC volume dataset with 92 normal cases and 93 abnormal cases in three standard planes. The results demonstrate that our method outperforms the baselines in both classification accuracy and computational efficiency. |
Tasks | |
Published | 2016-07-20 |
URL | http://arxiv.org/abs/1607.05969v1 http://arxiv.org/pdf/1607.05969v1.pdf |
PWC | https://paperswithcode.com/paper/4d-cardiac-ultrasound-standard-plane-location |
Repo | |
Framework | |
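The final classification stage is a standard SVM. A runnable sketch with synthetic features in place of the learned spatial-temporal embedding:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Stand-in data: one embedded feature vector per frame, labeled with the
# standard-plane class. The paper learns these embeddings from 4D STIC
# volumes; random features are used here only to make the sketch runnable.
rng = np.random.default_rng(0)
X = rng.normal(size=(185, 64))            # 92 normal + 93 abnormal cases
y = rng.integers(0, 3, size=185)          # three standard planes

clf = make_pipeline(StandardScaler(), SVC(kernel='rbf', C=1.0))
print(cross_val_score(clf, X, y, cv=5).mean())   # ≈ chance on random features
```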