Paper Group ANR 163
Adams Conditioning and Likelihood Ratio Transfer Mediated Inference
Title | Adams Conditioning and Likelihood Ratio Transfer Mediated Inference |
Authors | Jan A. Bergstra |
Abstract | Bayesian inference as applied in a legal setting is about belief transfer and involves a plurality of agents and communication protocols. A forensic expert (FE) may first communicate to a trier of fact (TOF) the value of a certain likelihood ratio with respect to the FE’s belief state, as represented by a probability function on the FE’s proposition space. Subsequently, the FE communicates its recently acquired confirmation that a certain evidence proposition is true. The TOF then performs likelihood ratio transfer mediated reasoning, thereby revising its own belief state. The logical principles involved in likelihood ratio transfer mediated reasoning are discussed in a setting where probabilistic arithmetic is done within a meadow, and with Adams conditioning placed in a central role. |
Tasks | Bayesian Inference |
Published | 2016-11-26 |
URL | https://arxiv.org/abs/1611.09351v5 |
https://arxiv.org/pdf/1611.09351v5.pdf | |
PWC | https://paperswithcode.com/paper/adams-conditioning-and-likelihood-ratio |
Repo | |
Framework | |
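The core arithmetic behind likelihood ratio transfer is the odds-form Bayes rule: posterior odds equal the likelihood ratio times the prior odds. A minimal Python sketch, assuming ordinary real arithmetic rather than the meadow-based arithmetic (with total division) the paper actually works in:

```python
def posterior_probability(prior: float, likelihood_ratio: float) -> float:
    """Update P(H) after evidence E with LR = P(E|H) / P(E|not H)."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = likelihood_ratio * prior_odds  # odds-form Bayes rule
    return posterior_odds / (1.0 + posterior_odds)

# The TOF holds a prior of 0.2 on hypothesis H; the FE reports LR = 10 and
# later confirms that the evidence proposition is true.
print(posterior_probability(0.2, 10.0))  # ~0.714
```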
Bottom-Up Top-Down Cues for Weakly-Supervised Semantic Segmentation
Title | Bottom-Up Top-Down Cues for Weakly-Supervised Semantic Segmentation |
Authors | Qibin Hou, Puneet Kumar Dokania, Daniela Massiceti, Yunchao Wei, Ming-Ming Cheng, Philip Torr |
Abstract | We consider the task of learning a classifier for semantic segmentation using weak supervision in the form of image labels specifying the object classes present in the image. Our method uses deep convolutional neural networks (CNNs) and adopts an Expectation-Maximization (EM) based approach. We focus on the following three aspects of EM: (i) initialization; (ii) latent posterior estimation (E-step); and (iii) the parameter update (M-step). We show that saliency and attention maps (our bottom-up and top-down cues, respectively) of simple images provide very good cues for learning an initialization for the EM-based algorithm. Intuitively, we show that before trying to learn to segment complex images, it is much easier and highly effective to first learn to segment a set of simple images and then move towards the complex ones. Next, in order to update the parameters, we propose minimizing the combination of the standard softmax loss and the KL divergence between the true latent posterior and the likelihood given by the CNN. We argue that this combination is more robust to wrong predictions made by the expectation step of the EM method, and we support this argument with empirical and visual results. Extensive experiments and discussions show that: (i) our method is very simple and intuitive; (ii) it requires only image-level labels; and (iii) it consistently outperforms other weakly-supervised state-of-the-art methods by a large margin on the PASCAL VOC 2012 dataset. |
Tasks | Semantic Segmentation, Weakly-Supervised Semantic Segmentation |
Published | 2016-12-07 |
URL | http://arxiv.org/abs/1612.02101v3 |
http://arxiv.org/pdf/1612.02101v3.pdf | |
PWC | https://paperswithcode.com/paper/bottom-up-top-down-cues-for-weakly-supervised |
Repo | |
Framework | |
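A minimal NumPy sketch of the M-step objective described above: softmax cross-entropy plus a KL term between the latent posterior and the CNN likelihood. The balancing weight `lam` and the flat per-pixel layout are assumptions; the paper's exact weighting may differ.

```python
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def m_step_loss(logits, posterior, lam=1.0, eps=1e-12):
    """Softmax cross-entropy against the E-step's hard labels, plus
    KL(posterior || CNN likelihood). `lam` balances the two terms
    (an assumption; the paper's weighting may differ)."""
    p = softmax(logits)                       # CNN likelihood per pixel
    hard = posterior.argmax(axis=-1)          # most likely latent labels
    ce = -np.log(p[np.arange(len(hard)), hard] + eps).mean()
    kl = (posterior * (np.log(posterior + eps) - np.log(p + eps))).sum(-1).mean()
    return ce + lam * kl

logits = np.random.randn(4, 3)                # 4 pixels, 3 classes
posterior = softmax(np.random.randn(4, 3))    # E-step posterior
print(m_step_loss(logits, posterior))
```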
Sparse Boltzmann Machines with Structure Learning as Applied to Text Analysis
Title | Sparse Boltzmann Machines with Structure Learning as Applied to Text Analysis |
Authors | Zhourong Chen, Nevin L. Zhang, Dit-Yan Yeung, Peixian Chen |
Abstract | We are interested in exploring the possibility and benefits of structure learning for deep models. As the first step, this paper investigates the matter for Restricted Boltzmann Machines (RBMs). We conduct the study with Replicated Softmax, a variant of RBMs for unsupervised text analysis. We present a method for learning what we call Sparse Boltzmann Machines, where each hidden unit is connected to a subset of the visible units instead of all of them. Empirical results show that the method yields models with significantly improved model fit and interpretability as compared with RBMs where each hidden unit is connected to all visible units. |
Tasks | |
Published | 2016-09-17 |
URL | http://arxiv.org/abs/1609.05294v3 |
http://arxiv.org/pdf/1609.05294v3.pdf | |
PWC | https://paperswithcode.com/paper/sparse-boltzmann-machines-with-structure |
Repo | |
Framework | |
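The structural idea above, each hidden unit connected to only a subset of the visible units, amounts to a binary mask on the RBM weight matrix. A sketch with a fixed random mask and one CD-1 update; note the paper learns the structure rather than fixing it, and biases are omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)
n_vis, n_hid = 12, 4

# Each hidden unit connects to a random ~30% of the visible units.
mask = (rng.random((n_vis, n_hid)) < 0.3).astype(float)
W = 0.01 * rng.standard_normal((n_vis, n_hid)) * mask

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, mask, lr=0.1):
    """One CD-1 update; masking the gradient preserves the sparse structure."""
    h0 = sigmoid(v0 @ W)
    v1 = sigmoid(h0 @ W.T)
    h1 = sigmoid(v1 @ W)
    grad = np.outer(v0, h0) - np.outer(v1, h1)
    return W + lr * grad * mask

v = rng.integers(0, 2, n_vis).astype(float)
W = cd1_step(v, W, mask)
```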
Information Projection and Approximate Inference for Structured Sparse Variables
Title | Information Projection and Approximate Inference for Structured Sparse Variables |
Authors | Rajiv Khanna, Joydeep Ghosh, Russell Poldrack, Oluwasanmi Koyejo |
Abstract | Approximate inference via information projection has been recently introduced as a general-purpose approach for efficient probabilistic inference given sparse variables. This manuscript goes beyond classical sparsity by proposing efficient algorithms for approximate inference via information projection that are applicable to any structure on the set of variables that admits enumeration using a matroid. We show that the resulting information projection can be reduced to combinatorial submodular optimization subject to matroid constraints. Further, leveraging recent advances in submodular optimization, we provide an efficient greedy algorithm with strong optimization-theoretic guarantees. The class of probabilistic models that can be expressed in this way is quite broad and, as we show, includes group sparse regression, group sparse principal components analysis and sparse canonical correlation analysis, among others. Moreover, empirical results on simulated data and high dimensional neuroimaging data highlight the superior performance of the information projection approach as compared to established baselines for a range of probabilistic models. |
Tasks | |
Published | 2016-07-12 |
URL | http://arxiv.org/abs/1607.03204v1 |
http://arxiv.org/pdf/1607.03204v1.pdf | |
PWC | https://paperswithcode.com/paper/information-projection-and-approximate |
Repo | |
Framework | |
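A sketch of the greedy scheme the abstract refers to: repeatedly add the feasible element with the largest marginal gain, subject to a matroid independence test. The toy objective is set coverage under a uniform matroid of rank 2, not the paper's information-projection objective:

```python
def greedy_matroid(ground, f, independent):
    """Greedy submodular maximization subject to a matroid independence test."""
    S = set()
    while True:
        best, best_gain = None, 0.0
        for e in ground - S:
            if independent(S | {e}):
                gain = f(S | {e}) - f(S)
                if gain > best_gain:
                    best, best_gain = e, gain
        if best is None:
            return S
        S.add(best)

# Toy instance: coverage objective under a uniform matroid (|S| <= 2).
sets = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d", "e"}}
f = lambda S: len(set().union(*(sets[i] for i in S))) if S else 0
independent = lambda S: len(S) <= 2
print(greedy_matroid(set(sets), f, independent))  # {1, 3}
```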
Learning Moore Machines from Input-Output Traces
Title | Learning Moore Machines from Input-Output Traces |
Authors | Georgios Giantamidis, Stavros Tripakis |
Abstract | The problem of learning automata from example traces (but no equivalence or membership queries) is fundamental in automata learning theory and practice. In this paper we study this problem for finite state machines with inputs and outputs, and in particular for Moore machines. We develop three algorithms for solving this problem: (1) the PTAP algorithm, which transforms a set of input-output traces into an incomplete Moore machine and then completes the machine with self-loops; (2) the PRPNI algorithm, which uses the well-known RPNI algorithm for automata learning to learn a product of automata encoding a Moore machine; and (3) the MooreMI algorithm, which directly learns a Moore machine using PTAP extended with state merging. We prove that MooreMI has the fundamental identification in the limit property. We also compare the algorithms experimentally in terms of the size of the learned machine and several notions of accuracy, introduced in this paper. Finally, we compare with OSTIA, an algorithm that learns a more general class of transducers, and find that OSTIA generally does not learn a Moore machine, even when fed with a characteristic sample. |
Tasks | |
Published | 2016-05-25 |
URL | http://arxiv.org/abs/1605.07805v2 |
http://arxiv.org/pdf/1605.07805v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-moore-machines-from-input-output |
Repo | |
Framework | |
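A sketch of the PTAP idea as described above: fold the input-output traces into a prefix-tree Moore machine, then complete missing transitions with self-loops. The trace format (one output per visited state, including the initial one) and the mutual consistency of traces are assumptions:

```python
def ptap(traces):
    """Prefix tree construction with outputs, then self-loop completion.
    Assumes each trace is (inputs, outputs) with len(outputs) == len(inputs) + 1
    (one output per visited state, including the initial state) and that the
    traces are mutually consistent."""
    delta, out, alphabet = {}, {}, set()
    for inputs, outputs in traces:
        state = ()                         # a state is an input prefix
        out[state] = outputs[0]
        for i, a in enumerate(inputs):
            alphabet.add(a)
            nxt = state + (a,)
            delta[(state, a)] = nxt
            out[nxt] = outputs[i + 1]
            state = nxt
    for s in out:                          # completion: self-loops on missing inputs
        for a in alphabet:
            delta.setdefault((s, a), s)
    return delta, out

delta, out = ptap([("ab", "010"), ("aa", "011")])
print(out[("a",)], delta[((), "b")])       # '1' and a self-loop back to ()
```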
Avoiding Imposters and Delinquents: Adversarial Crowdsourcing and Peer Prediction
Title | Avoiding Imposters and Delinquents: Adversarial Crowdsourcing and Peer Prediction |
Authors | Jacob Steinhardt, Gregory Valiant, Moses Charikar |
Abstract | We consider a crowdsourcing model in which $n$ workers are asked to rate the quality of $n$ items previously generated by other workers. An unknown set of $\alpha n$ workers generate reliable ratings, while the remaining workers may behave arbitrarily and possibly adversarially. The manager of the experiment can also manually evaluate the quality of a small number of items, and wishes to curate together almost all of the high-quality items with at most an $\epsilon$ fraction of low-quality items. Perhaps surprisingly, we show that this is possible with an amount of work required of the manager, and each worker, that does not scale with $n$: the dataset can be curated with $\tilde{O}\Big(\frac{1}{\beta\alpha^3\epsilon^4}\Big)$ ratings per worker, and $\tilde{O}\Big(\frac{1}{\beta\epsilon^2}\Big)$ ratings by the manager, where $\beta$ is the fraction of high-quality items. Our results extend to the more general setting of peer prediction, including peer grading in online classrooms. |
Tasks | |
Published | 2016-06-16 |
URL | http://arxiv.org/abs/1606.05374v1 |
http://arxiv.org/pdf/1606.05374v1.pdf | |
PWC | https://paperswithcode.com/paper/avoiding-imposters-and-delinquents |
Repo | |
Framework | |
Sparse Signal Reconstruction with Multiple Side Information using Adaptive Weights for Multiview Sources
Title | Sparse Signal Reconstruction with Multiple Side Information using Adaptive Weights for Multiview Sources |
Authors | Huynh Van Luong, Jürgen Seiler, André Kaup, Søren Forchhammer |
Abstract | This work considers reconstructing a target signal in a context of distributed sparse sources. We propose an efficient reconstruction algorithm that uses other given sources as multiple side information (SI). The proposed algorithm takes advantage of compressive sensing (CS) with SI and adaptive weights by solving a proposed weighted $n$-$\ell_{1}$ minimization. The algorithm computes the adaptive weights on two levels: first the intra-SI weights within each individual SI, and then the inter-SI weights, are iteratively updated at every reconstruction iteration. This two-level optimization allows the proposed reconstruction algorithm with multiple SI using adaptive weights (RAMSIA) to robustly exploit multiple SIs of different qualities. We evaluate our algorithm on generated sparse signals and on correlated feature histograms serving as multiview sparse sources from a multiview image database. The results show that RAMSIA significantly outperforms both classical CS and CS with a single SI, and that RAMSIA with a higher number of SIs gains more than with a smaller number of SIs. |
Tasks | Compressive Sensing |
Published | 2016-05-22 |
URL | http://arxiv.org/abs/1605.06776v1 |
http://arxiv.org/pdf/1605.06776v1.pdf | |
PWC | https://paperswithcode.com/paper/sparse-signal-reconstruction-with-multiple |
Repo | |
Framework | |
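The general shape of the weighted n-ℓ1 objective, with a generic inverse-distance reweighting step standing in for RAMSIA's intra-/inter-SI updates (the paper's exact update rules may differ):

```python
import numpy as np

def weighted_n_l1(x, y, A, side_infos, weights, lam=1.0):
    """f(x) = 0.5*||Ax - y||^2 + lam * sum_i sum_j w_ij * |x_j - z_ij|,
    the general shape of a weighted n-l1 reconstruction objective."""
    fidelity = 0.5 * np.sum((A @ x - y) ** 2)
    penalty = sum(np.sum(w * np.abs(x - z)) for z, w in zip(side_infos, weights))
    return fidelity + lam * penalty

def update_weights(x, side_infos, eps=1e-3):
    """Inverse-distance reweighting: SI components close to the current
    estimate get larger weights. A generic heuristic, not RAMSIA's exact rule."""
    return [1.0 / (np.abs(x - z) + eps) for z in side_infos]

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 50))
x = np.zeros(50)                              # current estimate
y = A @ x
zs = [0.1 * rng.standard_normal(50), 0.5 * rng.standard_normal(50)]  # two SIs
ws = update_weights(x, zs)
print(weighted_n_l1(x, y, A, zs, ws))
```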
SimVerb-3500: A Large-Scale Evaluation Set of Verb Similarity
Title | SimVerb-3500: A Large-Scale Evaluation Set of Verb Similarity |
Authors | Daniela Gerz, Ivan Vulić, Felix Hill, Roi Reichart, Anna Korhonen |
Abstract | Verbs play a critical role in the meaning of sentences, but these ubiquitous words have received little attention in recent distributional semantics research. We introduce SimVerb-3500, an evaluation resource that provides human ratings for the similarity of 3,500 verb pairs. SimVerb-3500 covers all normed verb types from the USF free-association database, providing at least three examples for every VerbNet class. This broad coverage facilitates detailed analyses of how syntactic and semantic phenomena together influence human understanding of verb meaning. Further, with significantly larger development and test sets than existing benchmarks, SimVerb-3500 enables more robust evaluation of representation learning architectures and promotes the development of methods tailored to verbs. We hope that SimVerb-3500 will enable a richer understanding of the diversity and complexity of verb semantics and guide the development of systems that can effectively represent and interpret this meaning. |
Tasks | Representation Learning |
Published | 2016-08-02 |
URL | http://arxiv.org/abs/1608.00869v4 |
http://arxiv.org/pdf/1608.00869v4.pdf | |
PWC | https://paperswithcode.com/paper/simverb-3500-a-large-scale-evaluation-set-of |
Repo | |
Framework | |
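Evaluation on resources like SimVerb-3500 follows the standard word-similarity protocol: Spearman correlation between human ratings and model cosine similarities. A sketch of that protocol with toy embeddings (nothing here assumes the released file format):

```python
import numpy as np
from scipy.stats import spearmanr

def eval_similarity(pairs, gold, vectors):
    """Spearman correlation between human ratings and cosine similarities;
    pairs with out-of-vocabulary words are skipped."""
    model, kept = [], []
    for (w1, w2), g in zip(pairs, gold):
        if w1 in vectors and w2 in vectors:
            u, v = vectors[w1], vectors[w2]
            model.append(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
            kept.append(g)
    return spearmanr(model, kept).correlation

vecs = {w: np.random.randn(50) for w in ["run", "walk", "eat", "devour"]}
pairs = [("run", "walk"), ("eat", "devour"), ("run", "devour")]
print(eval_similarity(pairs, [8.1, 7.5, 2.0], vecs))
```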
Scalable Semantic Matching of Queries to Ads in Sponsored Search Advertising
Title | Scalable Semantic Matching of Queries to Ads in Sponsored Search Advertising |
Authors | Mihajlo Grbovic, Nemanja Djuric, Vladan Radosavljevic, Fabrizio Silvestri, Ricardo Baeza-Yates, Andrew Feng, Erik Ordentlich, Lee Yang, Gavin Owens |
Abstract | Sponsored search represents a major source of revenue for web search engines. This popular advertising model gives advertisers a unique opportunity to target users’ immediate intent communicated through a search query, usually by displaying their ads alongside organic search results for queries deemed relevant to their products or services. However, due to the large number of unique queries, it is challenging for advertisers to identify all such relevant queries. For this reason, search engines often provide a service of advanced matching, which automatically finds additional relevant queries for advertisers to bid on. We present a novel advanced matching approach based on the idea of semantic embeddings of queries and ads. The embeddings were learned using a large data set of user search sessions, consisting of search queries, clicked ads, and search links, while utilizing contextual information such as dwell time and skipped ads. To address the large-scale nature of our problem, both in terms of data and vocabulary size, we propose a novel distributed algorithm for training the embeddings. Finally, we present an approach for overcoming the cold-start problem associated with new ads and queries. We report results of editorial evaluation and online tests on actual search traffic. The results show that our approach significantly outperforms baselines in terms of relevance, coverage, and incremental revenue. Lastly, we open-source the learned query embeddings to be used by researchers in computational advertising and related fields. |
Tasks | |
Published | 2016-07-07 |
URL | http://arxiv.org/abs/1607.01869v1 |
http://arxiv.org/pdf/1607.01869v1.pdf | |
PWC | https://paperswithcode.com/paper/scalable-semantic-matching-of-queries-to-ads |
Repo | |
Framework | |
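A minimal stand-in for the embedding training described above, treating each search session (queries, ad clicks, link clicks) as one "sentence" for skip-gram word2vec via gensim. The token scheme is illustrative, and the paper's distributed trainer and dwell-time/skipped-ad weighting are not reproduced:

```python
from gensim.models import Word2Vec

# One "sentence" per search session: queries, clicked ads and search links
# as tokens (the token scheme here is illustrative, not the paper's).
sessions = [
    ["cheap flights", "ad_1423", "flights to paris", "link_88"],
    ["flights to paris", "ad_1423", "paris hotels"],
    ["paris hotels", "ad_9004", "link_12"],
]

model = Word2Vec(sessions, vector_size=64, window=5, min_count=1, sg=1, epochs=50)
print(model.wv.most_similar("ad_1423", topn=2))   # queries close to this ad
```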
Construction Safety Risk Modeling and Simulation
Title | Construction Safety Risk Modeling and Simulation |
Authors | Antoine J. -P. Tixier, Matthew R. Hallowell, Balaji Rajagopalan |
Abstract | Building on a recently introduced genetic-inspired, attribute-based conceptual framework for safety risk analysis, we propose a novel methodology to compute univariate and bivariate construction safety risk at a situational level. Our fully data-driven approach provides construction practitioners and academicians with an easy and automated way of extracting valuable empirical insights from databases of unstructured textual injury reports. By applying our methodology to an attribute and outcome dataset directly obtained from 814 injury reports, we show that the frequency-magnitude distribution of construction safety risk is very similar to that of natural phenomena such as precipitation or earthquakes. Motivated by this observation, and drawing on state-of-the-art techniques in hydroclimatology and insurance, we introduce univariate and bivariate nonparametric stochastic safety risk generators based on kernel density estimators and copulas. These generators enable the user to produce large numbers of synthetic safety risk values that remain faithful to the original data, allowing safety-related decision-making under uncertainty to be grounded in extensive empirical evidence. Just as the accurate modeling and simulation of natural phenomena such as wind or streamflow is indispensable to successful structure dimensioning or water reservoir management, we posit that improving construction safety calls for the accurate modeling, simulation, and assessment of safety risk. The underlying assumption is that, like natural phenomena, construction safety may benefit from being studied in an empirical and quantitative way rather than qualitatively, which is the current industry standard. Finally, a side but interesting finding is that attributes related to high energy levels and to human error emerge as strong risk shapers in the dataset we used to illustrate our methodology. |
Tasks | Decision Making, Decision Making Under Uncertainty |
Published | 2016-09-26 |
URL | http://arxiv.org/abs/1609.07912v1 |
http://arxiv.org/pdf/1609.07912v1.pdf | |
PWC | https://paperswithcode.com/paper/construction-safety-risk-modeling-and |
Repo | |
Framework | |
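A sketch of the univariate nonparametric generator: fit a kernel density estimator on observed risk values and resample synthetic ones. The bivariate copula-based generator is not reproduced, and the lognormal stand-in data is an assumption:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(7)
# Stand-in for risk values extracted from the 814 injury reports; the
# heavy-tailed lognormal shape echoes the precipitation analogy above.
observed_risk = rng.lognormal(mean=0.0, sigma=1.2, size=814)

kde = gaussian_kde(observed_risk)            # univariate nonparametric fit
synthetic = kde.resample(10_000, seed=7)[0]  # synthetic risk values

print(np.percentile(observed_risk, 95), np.percentile(synthetic, 95))
```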
Learning Word Embeddings from Intrinsic and Extrinsic Views
Title | Learning Word Embeddings from Intrinsic and Extrinsic Views |
Authors | Jifan Chen, Kan Chen, Xipeng Qiu, Qi Zhang, Xuanjing Huang, Zheng Zhang |
Abstract | While word embeddings are currently predominant in natural language processing, most existing models learn them solely from their contexts. However, these context-based word embeddings are limited, since not all words’ meanings can be learned from context alone. Moreover, it is also difficult to learn representations for rare words due to the data sparsity problem. In this work, we address these issues by learning the representations of words by integrating their intrinsic (descriptive) and extrinsic (contextual) information. To prove the effectiveness of our model, we evaluate it on four tasks, including word similarity, reverse dictionaries, Wiki link prediction, and document classification. Experimental results show that our model is powerful in both word and document modeling. |
Tasks | Document Classification, Learning Word Embeddings, Link Prediction, Word Embeddings |
Published | 2016-08-20 |
URL | http://arxiv.org/abs/1608.05852v1 |
http://arxiv.org/pdf/1608.05852v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-word-embeddings-from-intrinsic-and |
Repo | |
Framework | |
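A sketch of the two-view idea: blend a word's extrinsic (context) vector with an intrinsic vector built from its description, here the mean of the definition words' vectors. The mixing weight is an assumption; the paper learns both views jointly rather than averaging after the fact:

```python
import numpy as np

def combined_embedding(word, context_vecs, definitions, alpha=0.5):
    """Blend the extrinsic (context) vector with an intrinsic vector built
    from the word's definition; `alpha` and the averaging are illustrative."""
    extrinsic = context_vecs[word]
    def_words = [w for w in definitions[word].split() if w in context_vecs]
    intrinsic = np.mean([context_vecs[w] for w in def_words], axis=0)
    return alpha * extrinsic + (1 - alpha) * intrinsic

vocab = ["feline", "cat", "small", "domesticated", "animal"]
vecs = {w: np.random.randn(50) for w in vocab}
defs = {"feline": "a small domesticated animal related to the cat"}
print(combined_embedding("feline", vecs, defs).shape)  # (50,)
```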
WarpNet: Weakly Supervised Matching for Single-view Reconstruction
Title | WarpNet: Weakly Supervised Matching for Single-view Reconstruction |
Authors | Angjoo Kanazawa, David W. Jacobs, Manmohan Chandraker |
Abstract | We present an approach to matching images of objects in fine-grained datasets without using part annotations, with an application to the challenging problem of weakly supervised single-view reconstruction. This is in contrast to prior works that require part annotations, since matching objects across class and pose variations is challenging with appearance features alone. We overcome this challenge through a novel deep learning architecture, WarpNet, that aligns an object in one image with a different object in another. We exploit the structure of the fine-grained dataset to create artificial data for training this network in an unsupervised-discriminative learning approach. The output of the network acts as a spatial prior that allows generalization at test time to match real images across variations in appearance, viewpoint and articulation. On the CUB-200-2011 dataset of bird categories, we improve the AP over an appearance-only network by 13.6%. We further demonstrate that our WarpNet matches, together with the structure of fine-grained datasets, allow single-view reconstructions with quality comparable to using annotated point correspondences. |
Tasks | |
Published | 2016-04-19 |
URL | http://arxiv.org/abs/1604.05592v2 |
http://arxiv.org/pdf/1604.05592v2.pdf | |
PWC | https://paperswithcode.com/paper/warpnet-weakly-supervised-matching-for-single |
Repo | |
Framework | |
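A sketch of the artificial-data idea: apply a known random warp to an image's keypoints so that correspondence supervision comes for free. WarpNet itself predicts warps with a CNN; the random affine warp here is a deliberate simplification:

```python
import numpy as np

rng = np.random.default_rng(3)

def artificial_pair(points, scale=0.1):
    """Source keypoints plus the same keypoints under a known random affine
    warp; the known warp supplies free supervision. (A simplification of the
    paper's warping scheme.)"""
    A = np.eye(2) + scale * rng.standard_normal((2, 2))
    t = scale * rng.standard_normal(2)
    return points, points @ A.T + t, (A, t)

src = rng.random((15, 2))                    # keypoints in [0, 1]^2
src_pts, tgt_pts, true_warp = artificial_pair(src)
# A matching network trained on (src_pts, tgt_pts) can be scored against true_warp.
```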
Harder, Better, Faster, Stronger Convergence Rates for Least-Squares Regression
Title | Harder, Better, Faster, Stronger Convergence Rates for Least-Squares Regression |
Authors | Aymeric Dieuleveut, Nicolas Flammarion, Francis Bach |
Abstract | We consider the optimization of a quadratic objective function whose gradients are only accessible through a stochastic oracle that returns the gradient at any given point plus a zero-mean finite variance random error. We present the first algorithm that achieves jointly the optimal prediction error rates for least-squares regression, both in terms of forgetting of initial conditions in O(1/n^2), and in terms of dependence on the noise and dimension d of the problem, as O(d/n). Our new algorithm is based on averaged accelerated regularized gradient descent, and may also be analyzed through finer assumptions on initial conditions and the Hessian matrix, leading to dimension-free quantities that may still be small while the "optimal" terms above are large. In order to characterize the tightness of these new bounds, we consider an application to non-parametric regression and use the known lower bounds on the statistical performance (without computational limits), which happen to match our bounds obtained from a single pass on the data and thus show optimality of our algorithm in a wide variety of particular trade-offs between bias and variance. |
Tasks | |
Published | 2016-02-17 |
URL | http://arxiv.org/abs/1602.05419v2 |
http://arxiv.org/pdf/1602.05419v2.pdf | |
PWC | https://paperswithcode.com/paper/harder-better-faster-stronger-convergence |
Repo | |
Framework | |
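A sketch of iterate averaging for least squares with a stochastic gradient oracle, in a single pass over fresh samples. This is plain averaged SGD, not the paper's averaged accelerated regularized variant; the step size and noise level are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 20, 5000
w_star = rng.standard_normal(d)

w, w_avg, lr = np.zeros(d), np.zeros(d), 0.02
for t in range(1, n + 1):
    x = rng.standard_normal(d)            # one fresh sample per step: single pass
    y = x @ w_star + 0.1 * rng.standard_normal()
    grad = (x @ w - y) * x                # stochastic gradient of 0.5*(x.w - y)^2
    w -= lr * grad
    w_avg += (w - w_avg) / t              # running average of the iterates
print(np.linalg.norm(w_avg - w_star))
```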
Bridging LSTM Architecture and the Neural Dynamics during Reading
Title | Bridging LSTM Architecture and the Neural Dynamics during Reading |
Authors | Peng Qian, Xipeng Qiu, Xuanjing Huang |
Abstract | Recently, the long short-term memory neural network (LSTM) has attracted wide interest due to its success in many tasks. The LSTM architecture consists of a memory cell and three gates, which looks similar to the neuronal networks in the brain. However, evidence for the cognitive plausibility of the LSTM architecture, as well as for its working mechanism, is still lacking. In this paper, we study the cognitive plausibility of LSTM by aligning its internal architecture with the brain activity observed via fMRI while subjects read a story. Experimental results show that the artificial memory vector in LSTM can accurately predict the observed sequential brain activities, indicating a correlation between the LSTM architecture and the cognitive process of story reading. |
Tasks | |
Published | 2016-04-22 |
URL | http://arxiv.org/abs/1604.06635v1 |
http://arxiv.org/pdf/1604.06635v1.pdf | |
PWC | https://paperswithcode.com/paper/bridging-lstm-architecture-and-the-neural |
Repo | |
Framework | |
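The analysis implied above is a standard encoding model: regress fMRI voxel activity onto the LSTM's memory vectors, one time step at a time. A sketch with synthetic stand-in data (the ridge penalty, dimensions, and train/test split are assumptions, not the paper's setup):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
T, d_lstm, n_vox = 200, 128, 50
cell = rng.standard_normal((T, d_lstm))            # LSTM memory vectors per step
bold = 0.1 * cell @ rng.standard_normal((d_lstm, n_vox)) \
       + rng.standard_normal((T, n_vox))           # synthetic stand-in for fMRI

X_tr, X_te, y_tr, y_te = train_test_split(cell, bold, test_size=0.25,
                                          random_state=0)
enc = Ridge(alpha=10.0).fit(X_tr, y_tr)            # one linear model per voxel
print(enc.score(X_te, y_te))                       # R^2: LSTM states -> BOLD
```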
Statistical Mechanics of High-Dimensional Inference
Title | Statistical Mechanics of High-Dimensional Inference |
Authors | Madhu Advani, Surya Ganguli |
Abstract | To model modern large-scale datasets, we need efficient algorithms to infer a set of $P$ unknown model parameters from $N$ noisy measurements. What are fundamental limits on the accuracy of parameter inference, given finite signal-to-noise ratios, limited measurements, prior information, and computational tractability requirements? How can we combine prior information with measurements to achieve these limits? Classical statistics gives incisive answers to these questions as the measurement density $\alpha = \frac{N}{P}\rightarrow \infty$. However, these classical results are not relevant to modern high-dimensional inference problems, which instead occur at finite $\alpha$. We formulate and analyze high-dimensional inference as a problem in the statistical physics of quenched disorder. Our analysis uncovers fundamental limits on the accuracy of inference in high dimensions, and reveals that widely cherished inference algorithms like maximum likelihood (ML) and maximum a posteriori (MAP) inference cannot achieve these limits. We further find optimal, computationally tractable algorithms that can achieve these limits. Intriguingly, in high dimensions, these optimal algorithms become computationally simpler than MAP and ML, while still outperforming them. For example, such optimal algorithms can lead to as much as a 20% reduction in the amount of data to achieve the same performance relative to MAP. Moreover, our analysis reveals simple relations between optimal high dimensional inference and low dimensional scalar Bayesian inference, insights into the nature of generalization and predictive power in high dimensions, information theoretic limits on compressed sensing, phase transitions in quadratic inference, and connections to central mathematical objects in convex optimization theory and random matrix theory. |
Tasks | Bayesian Inference |
Published | 2016-01-18 |
URL | http://arxiv.org/abs/1601.04650v2 |
http://arxiv.org/pdf/1601.04650v2.pdf | |
PWC | https://paperswithcode.com/paper/statistical-mechanics-of-high-dimensional |
Repo | |
Framework | |
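A toy experiment illustrating the abstract's central point at finite measurement density α = N/P: maximum likelihood (least squares) is beaten by a MAP/ridge estimator with a matched Gaussian prior. The paper further shows that even MAP is suboptimal, which this sketch does not reproduce:

```python
import numpy as np

rng = np.random.default_rng(0)
P, alpha = 200, 2.0                   # P parameters, measurement density N/P
N = int(alpha * P)

w_true = rng.standard_normal(P)       # drawn from the (known) Gaussian prior
X = rng.standard_normal((N, P)) / np.sqrt(P)
y = X @ w_true + 0.5 * rng.standard_normal(N)

w_ml = np.linalg.lstsq(X, y, rcond=None)[0]                  # maximum likelihood
lam = 0.25                                                   # noise_var / prior_var
w_map = np.linalg.solve(X.T @ X + lam * np.eye(P), X.T @ y)  # MAP (ridge)

err = lambda w: np.linalg.norm(w - w_true) / np.linalg.norm(w_true)
print("ML error: ", err(w_ml))
print("MAP error:", err(w_map))
```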