Paper Group ANR 412
Resampled Priors for Variational Autoencoders
| Title | Resampled Priors for Variational Autoencoders |
| --- | --- |
| Authors | Matthias Bauer, Andriy Mnih |
| Abstract | We propose Learned Accept/Reject Sampling (LARS), a method for constructing richer priors using rejection sampling with a learned acceptance function. This work is motivated by recent analyses of the VAE objective, which pointed out that commonly used simple priors can lead to underfitting. As the distribution induced by LARS involves an intractable normalizing constant, we show how to estimate it and its gradients efficiently. We demonstrate that LARS priors improve VAE performance on several standard datasets both when they are learned jointly with the rest of the model and when they are fitted to a pretrained model. Finally, we show that LARS can be combined with existing methods for defining flexible priors for an additional boost in performance. |
| Tasks | |
| Published | 2018-10-26 |
| URL | http://arxiv.org/abs/1810.11428v2 |
| PDF | http://arxiv.org/pdf/1810.11428v2.pdf |
| PWC | https://paperswithcode.com/paper/resampled-priors-for-variational-autoencoders |
| Repo | |
| Framework | |
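A minimal numpy sketch of the accept/reject idea described in the abstract: propose from a simple prior, accept with a learned probability, and fall back to the last proposal after a fixed budget so sampling time stays bounded. The acceptance network here is an untrained MLP standing in for the learned one, and the truncation budget and shapes are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the learned acceptance function a(z) in [0, 1]:
# an untrained two-layer MLP, purely for illustration.
W1 = rng.normal(size=(2, 16)); b1 = np.zeros(16)
W2 = rng.normal(size=(16, 1)); b2 = np.zeros(1)

def acceptance_prob(z):
    h = np.tanh(z @ W1 + b1)
    logit = (h @ W2 + b2).item()
    return 1.0 / (1.0 + np.exp(-logit))    # sigmoid

def lars_sample(max_tries=100):
    """Truncated rejection sampling from the resampled prior:
    propose z ~ N(0, I), accept with probability a(z); after
    max_tries proposals, return the last one unconditionally."""
    for _ in range(max_tries):
        z = rng.normal(size=2)             # simple proposal N(0, I)
        if rng.uniform() < acceptance_prob(z):
            return z
    return z

samples = np.stack([lars_sample() for _ in range(1000)])
print("mean:", samples.mean(axis=0), "std:", samples.std(axis=0))
```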
Doing the impossible: Why neural networks can be trained at all
| Title | Doing the impossible: Why neural networks can be trained at all |
| --- | --- |
| Authors | Nathan O. Hodas, Panos Stinis |
| Abstract | As deep neural networks grow in size, from thousands to millions to billions of weights, the performance of those networks becomes limited by our ability to accurately train them. A common naive question arises: if we have a system with billions of degrees of freedom, don’t we also need billions of samples to train it? Of course, the success of deep learning indicates that reliable models can be learned with reasonable amounts of data. Similar questions arise in protein folding, spin glasses and biological neural networks. With effectively infinite potential folding/spin/wiring configurations, how does the system find the precise arrangement that leads to useful and robust results? Simple sampling of the possible configurations until an optimal one is reached is not a viable option even if one waited for the age of the universe. On the contrary, there appears to be a mechanism in the above phenomena that forces them to achieve configurations that live on a low-dimensional manifold, avoiding the curse of dimensionality. In the current work we use the concept of mutual information between successive layers of a deep neural network to elucidate this mechanism and suggest possible ways of exploiting it to accelerate training. We show that adding structure to the neural network that enforces higher mutual information between layers speeds training and leads to more accurate results. High mutual information between layers implies that the effective number of free parameters is exponentially smaller than the raw number of tunable weights. |
| Tasks | |
| Published | 2018-05-13 |
| URL | http://arxiv.org/abs/1805.04928v2 |
| PDF | http://arxiv.org/pdf/1805.04928v2.pdf |
| PWC | https://paperswithcode.com/paper/doing-the-impossible-why-neural-networks-can |
| Repo | |
| Framework | |
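The abstract's central quantity, mutual information between successive layers, can be estimated crudely by discretizing activations. A toy sketch under obvious assumptions (tiny random network, histogram estimator, one scalar summary per layer), just to make the quantity concrete:

```python
import numpy as np

rng = np.random.default_rng(0)

def mutual_information(x, y, bins=16):
    """Histogram estimate of I(X; Y) in nats for scalar samples."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float((pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])).sum())

# Toy two-layer net; each layer is summarized by its mean activation
# per input so the scalar histogram estimator applies. All sizes are
# illustrative, not from the paper.
X = rng.normal(size=(5000, 10))
W1 = rng.normal(size=(10, 32))
W2 = rng.normal(size=(32, 32))
h1 = np.tanh(X @ W1)
h2 = np.tanh(h1 @ W2)

print("I(layer1; layer2) ~", mutual_information(h1.mean(axis=1), h2.mean(axis=1)))
```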
Feature Analysis for Assessing the Quality of Wikipedia Articles through Supervised Classification
| Title | Feature Analysis for Assessing the Quality of Wikipedia Articles through Supervised Classification |
| --- | --- |
| Authors | Elias Bassani, Marco Viviani |
| Abstract | Nowadays, thanks to Web 2.0 technologies, people have the possibility to generate and spread content on different social media in a very easy way. In this context, the evaluation of the quality of the information that is available online is becoming more and more a crucial issue. In fact, a constant flow of content is generated every day by often unknown sources, which are not certified by traditional authoritative entities. This requires the development of appropriate methodologies that can evaluate this content in a systematic way, based on 'objective' aspects connected with it. This would help individuals, who nowadays tend to increasingly form their opinions based on what they read online and on social media, to come into contact with information that is actually useful and verified. Wikipedia is nowadays one of the biggest online resources on which users rely as a source of information. The amount of collaboratively generated content that is sent to the online encyclopedia every day can lead to the creation of low-quality articles (and, consequently, misinformation) if not properly monitored and revised. For this reason, in this paper, the problem of automatically assessing the quality of Wikipedia articles is considered. In particular, the focus is on the analysis of hand-crafted features that can be employed by supervised machine learning techniques to perform the classification of Wikipedia articles on qualitative bases. With respect to prior literature, a wider set of characteristics connected to Wikipedia articles is taken into account and illustrated in detail. Evaluations are performed by considering a labeled dataset provided in a prior work, and different supervised machine learning algorithms, which produced encouraging results with respect to the considered features. |
| Tasks | |
| Published | 2018-12-06 |
| URL | http://arxiv.org/abs/1812.02655v1 |
| PDF | http://arxiv.org/pdf/1812.02655v1.pdf |
| PWC | https://paperswithcode.com/paper/feature-analysis-for-assessing-the-quality-of |
| Repo | |
| Framework | |
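A compact sketch of the supervised pipeline the abstract describes: hand-crafted article features fed to a standard classifier. The feature names and random data below are invented placeholders; the paper's actual feature set and labeled dataset are much richer.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical hand-crafted features per article, e.g. length,
# number of references, number of editors, images, section count.
feature_names = ["n_chars", "n_refs", "n_editors", "n_images", "n_sections"]
X = rng.normal(size=(500, len(feature_names)))   # placeholder features
y = rng.integers(0, 2, size=500)                 # placeholder quality labels

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print("5-fold accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))

# Feature importances indicate which hand-crafted features matter.
clf.fit(X, y)
for name, imp in zip(feature_names, clf.feature_importances_):
    print(name, round(imp, 3))
```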
Extraction Of Technical Information From Normative Documents Using Automated Methods Based On Ontologies: Application To The ISO 15531 MANDATE Standard - Methodology And First Results
| Title | Extraction Of Technical Information From Normative Documents Using Automated Methods Based On Ontologies: Application To The ISO 15531 MANDATE Standard - Methodology And First Results |
| --- | --- |
| Authors | A. F. Cutting-Decelle, A. Digeon, R. I. Young, J. L. Barraud, P. Lamboley |
| Abstract | Problems faced by international standardization bodies become more and more crucial as the number and the size of the standards they produce increase. Sometimes, also, the lack of coordination among the committees in charge of the development of standards may lead to overlaps, mistakes or incompatibilities in the documents. The aim of this study is to present a methodology enabling the automatic extraction of the technical concepts (terms) found in normative documents, through the use of semantic tools coming from the field of language processing. The first part of the paper provides a description of the standardization world, its structure, its way of working and the problems faced; we then introduce the concepts of semantic annotation, information extraction and the software tools available in this domain. The next section explains the concept of ontology and its potential use in the field of standardization. We propose here a methodology enabling the extraction of technical information from a given normative corpus, based on a semantic annotation process done according to reference ontologies. The application to the ISO 15531 MANDATE corpus provides a first use case of the methodology described in this paper. The paper ends with the description of the first experimental results produced by this approach, and with some issues and perspectives, notably its application to other standards and/or Technical Committees and the possibility of creating pre-defined technical dictionaries of terms. |
| Tasks | |
| Published | 2018-06-06 |
| URL | http://arxiv.org/abs/1806.02242v2 |
| PDF | http://arxiv.org/pdf/1806.02242v2.pdf |
| PWC | https://paperswithcode.com/paper/extraction-of-technical-information-from |
| Repo | |
| Framework | |
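The extraction step the abstract outlines, matching a normative text against terms drawn from a reference ontology, can be sketched as a simple dictionary annotator. The mini-ontology below is invented for illustration; real pipelines use dedicated annotation tools and full ontologies.

```python
import re

# Toy "ontology": concept -> surface terms. Invented entries, loosely
# evocative of ISO 15531-style manufacturing-data vocabulary.
ontology = {
    "Resource":    ["resource", "machine tool", "equipment"],
    "Process":     ["process", "manufacturing process", "operation"],
    "FlowControl": ["flow control", "scheduling"],
}

def annotate(text):
    """Return (concept, term, offset) for each ontology term found;
    longer matches suppress shorter overlapping ones."""
    raw = []
    for concept, terms in ontology.items():
        for term in terms:
            for m in re.finditer(r"\b" + re.escape(term) + r"\b", text, re.I):
                raw.append((m.start(), -len(m.group(0)), m.group(0), concept))
    hits, last_end = [], -1
    for start, neg_len, term, concept in sorted(raw):
        if start >= last_end:              # drop overlapping shorter matches
            hits.append((concept, term, start))
            last_end = start - neg_len     # i.e. start + len(term)
    return hits

sample = ("The scheduling of each manufacturing process shall take "
          "the capacity of every resource into account.")
for concept, term, pos in annotate(sample):
    print(f"{pos:3d}  {term!r:25} -> {concept}")
```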
Quantum Codes from Neural Networks
| Title | Quantum Codes from Neural Networks |
| --- | --- |
| Authors | Johannes Bausch, Felix Leditzky |
| Abstract | We examine the usefulness of applying neural networks as a variational state ansatz for many-body quantum systems in the context of quantum information-processing tasks. In the neural network state ansatz, the complex amplitude function of a quantum state is computed by a neural network. The resulting multipartite entanglement structure captured by this ansatz has proven rich enough to describe the ground states and unitary dynamics of various physical systems of interest. In the present paper, we initiate the study of neural network states in quantum information-processing tasks. We demonstrate that neural network states are capable of efficiently representing quantum codes for quantum information transmission and quantum error correction, supplying further evidence for the usefulness of neural network states to describe multipartite entanglement. In particular, we show the following main results: a) Neural network states yield quantum codes with a high coherent information for two important quantum channels, the generalized amplitude damping channel and the dephrasure channel. These codes outperform all other known codes for these channels, and cannot be found using a direct parametrization of the quantum state. b) For the depolarizing channel, the neural network state ansatz reliably finds the best known codes given by repetition codes. c) Neural network states can be used to represent absolutely maximally entangled states, a special type of quantum error-correcting codes. In all three cases, the neural network state ansatz provides an efficient and versatile means as a variational parametrization of these highly entangled states. |
| Tasks | |
| Published | 2018-06-22 |
| URL | https://arxiv.org/abs/1806.08781v2 |
| PDF | https://arxiv.org/pdf/1806.08781v2.pdf |
| PWC | https://paperswithcode.com/paper/quantum-codes-from-neural-networks |
| Repo | |
| Framework | |
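To make "the complex amplitude function of a quantum state is computed by a neural network" concrete, here is a tiny RBM-style ansatz, the standard neural-network-state parametrization, mapping spin configurations to unnormalized complex amplitudes. The parameters are random and the system size is tiny; this is a generic sketch, not the codes found in the paper.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
n_visible, n_hidden = 4, 8

# Complex RBM parameters: a common neural-network-state ansatz.
a = rng.normal(size=n_visible) + 1j * rng.normal(size=n_visible)
b = rng.normal(size=n_hidden) + 1j * rng.normal(size=n_hidden)
W = (rng.normal(size=(n_visible, n_hidden))
     + 1j * rng.normal(size=(n_visible, n_hidden))) * 0.1

def amplitude(s):
    """Unnormalized amplitude psi(s) for spins s in {-1, +1}^n."""
    s = np.asarray(s, dtype=float)
    return np.exp(a @ s) * np.prod(2 * np.cosh(b + s @ W))

# Build and normalize the full state vector (feasible only for tiny n).
basis = list(product([-1, 1], repeat=n_visible))
psi = np.array([amplitude(s) for s in basis])
psi /= np.linalg.norm(psi)
print("norm:", np.linalg.norm(psi), " first amplitudes:", psi[:3])
```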
A Hybrid Approach for Trajectory Control Design
| Title | A Hybrid Approach for Trajectory Control Design |
| --- | --- |
| Authors | Luigi Freda, Mario Gianni, Fiora Pirri |
| Abstract | This work presents a methodology to design trajectory tracking feedback control laws, which embed non-parametric statistical models, such as Gaussian Processes (GPs). The aim is to minimize unmodeled dynamics such as undesired slippages. The proposed approach has the benefit of avoiding complex terramechanics analysis to directly estimate from data the robot dynamics on a wide class of trajectories. Experiments in both real and simulated environments prove that the proposed methodology is promising. |
| Tasks | Gaussian Processes |
| Published | 2018-10-08 |
| URL | http://arxiv.org/abs/1810.03711v2 |
| PDF | http://arxiv.org/pdf/1810.03711v2.pdf |
| PWC | https://paperswithcode.com/paper/a-hybrid-approach-for-trajectory-control |
| Repo | |
| Framework | |
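A schematic of the idea in the abstract: learn the unmodeled dynamics (e.g. slippage) from data with a Gaussian Process and use the prediction as a feedforward correction. The 1-D "slippage vs. commanded velocity" data is synthetic; the paper's models and controller are far more detailed.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Synthetic training data: commanded velocity -> observed slippage.
v_cmd = rng.uniform(0.0, 1.0, size=(40, 1))
slip = 0.2 * np.sin(3 * v_cmd[:, 0]) + 0.02 * rng.normal(size=40)

gp = GaussianProcessRegressor(kernel=RBF(0.3) + WhiteKernel(1e-3),
                              normalize_y=True)
gp.fit(v_cmd, slip)

def corrected_command(v_desired):
    """Feedforward-compensate the predicted slippage (illustrative)."""
    pred_slip, std = gp.predict(np.array([[v_desired]]), return_std=True)
    return v_desired + pred_slip[0], std[0]

v, sigma = corrected_command(0.5)
print(f"command {v:.3f} (predicted-slip uncertainty {sigma:.3f})")
```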
A Comparative Study of Rule Extraction for Recurrent Neural Networks
| Title | A Comparative Study of Rule Extraction for Recurrent Neural Networks |
| --- | --- |
| Authors | Qinglong Wang, Kaixuan Zhang, Alexander G. Ororbia II, Xinyu Xing, Xue Liu, C. Lee Giles |
| Abstract | Understanding recurrent networks through rule extraction has a long history. This has taken on new interests due to the need for interpreting or verifying neural networks. One basic form for representing stateful rules is deterministic finite automata (DFA). Previous research shows that extracting DFAs from trained second-order recurrent networks is not only possible but also relatively stable. Recently, several new types of recurrent networks with more complicated architectures have been introduced. These handle challenging learning tasks usually involving sequential data. However, it remains an open problem whether DFAs can be adequately extracted from these models. Specifically, it is not clear how DFA extraction will be affected when applied to different recurrent networks trained on data sets with different levels of complexity. Here, we investigate DFA extraction on several widely adopted recurrent networks that are trained to learn a set of seven regular Tomita grammars. We first formally analyze the complexity of Tomita grammars and categorize these grammars according to that complexity. Then we empirically evaluate different recurrent networks for their performance of DFA extraction on all Tomita grammars. Our experiments show that for most recurrent networks, their extraction performance decreases as the complexity of the underlying grammar increases. On grammars of lower complexity, most recurrent networks obtain desirable extraction performance. As for the grammars with the highest level of complexity, several complicated models fail, with only certain recurrent networks achieving satisfactory extraction performance. |
| Tasks | |
| Published | 2018-01-16 |
| URL | http://arxiv.org/abs/1801.05420v2 |
| PDF | http://arxiv.org/pdf/1801.05420v2.pdf |
| PWC | https://paperswithcode.com/paper/a-comparative-study-of-rule-extraction-for |
| Repo | |
| Framework | |
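The seven Tomita grammars the abstract refers to are regular languages over {0, 1}; three of them are easy to state and are sketched below as membership tests, together with labeled-string generation of the kind used to train such recurrent networks. The grammar definitions are the standard ones; only the sampling sizes are arbitrary.

```python
import random

# Three of the seven Tomita grammars (standard definitions):
#   T1: strings of the form 1*
#   T2: strings of the form (10)*
#   T4: strings that do not contain "000"
def tomita1(s): return "0" not in s
def tomita2(s): return len(s) % 2 == 0 and s == "10" * (len(s) // 2)
def tomita4(s): return "000" not in s

def make_dataset(grammar, n=1000, max_len=12, seed=0):
    """Random labeled strings for training/evaluating an RNN."""
    rnd = random.Random(seed)
    data = []
    for _ in range(n):
        s = "".join(rnd.choice("01") for _ in range(rnd.randint(1, max_len)))
        data.append((s, grammar(s)))
    return data

for name, g in [("T1", tomita1), ("T2", tomita2), ("T4", tomita4)]:
    d = make_dataset(g)
    pos = sum(lbl for _, lbl in d)
    print(f"{name}: {pos}/{len(d)} positive examples")
```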
A Geometric Analysis of Model- and Algorithm-Induced Uncertainties for Randomized Least Squares Regression
| Title | A Geometric Analysis of Model- and Algorithm-Induced Uncertainties for Randomized Least Squares Regression |
| --- | --- |
| Authors | Jocelyn T. Chi, Ilse C. F. Ipsen |
| Abstract | For full-rank least squares regression problems under a Gaussian linear model, we analyze the uncertainties when the minimum-norm solution is computed by random row-sketching and, in particular, random row-sampling. Our expressions for the total expectation and variance of the solution – with regard to both model- and algorithm-induced uncertainties – are exact; hold for general sketching matrices; and make no assumptions on the rank of the sketched matrix. They show that expectation and variance are governed by the rank-deficiency and spatial geometry induced by the sketching process, rather than by structural properties of specific sketching or sampling methods. In order to analyze the rank-deficient matrices from row-sketching, we introduce two projectors that connect least squares problems of different dimensions. From a deterministic perspective, our structural perturbation bounds imply that least squares solutions are less sensitive to multiplicative perturbations than to additive perturbations. From a probabilistic perspective, we show that the differences between the total bias and variance on the one hand, and the model bias and variance on the other hand, are governed by two factors: (i) the expected rank deficiency of the sketched matrix, and (ii) the expected difference between projectors onto the spaces of the original and the sketched problems. Surprisingly, the matrix condition number has far less impact on the statistical quantities than it has on numerical errors. |
| Tasks | |
| Published | 2018-08-17 |
| URL | https://arxiv.org/abs/1808.05924v2 |
| PDF | https://arxiv.org/pdf/1808.05924v2.pdf |
| PWC | https://paperswithcode.com/paper/randomized-least-squares-regression-combining |
| Repo | |
| Framework | |
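A small numpy experiment matching the setup in the abstract: solve a full least squares problem, then the row-sampled (sketched) problem via the minimum-norm solution, and compare. Sampling here is uniform with replacement and the sizes are arbitrary; the paper's analysis covers general sketching matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, c = 500, 20, 80          # rows, cols, sampled rows (arbitrary)

A = rng.normal(size=(m, n))    # full column rank with probability 1
beta = rng.normal(size=n)
b = A @ beta + 0.1 * rng.normal(size=m)   # Gaussian linear model

x_full = np.linalg.lstsq(A, b, rcond=None)[0]

# Row-sampling sketch: pick c rows uniformly with replacement and
# rescale so that E[S^T S] = I (the usual unbiased sampling sketch).
idx = rng.integers(0, m, size=c)
scale = np.sqrt(m / c)
As, bs = scale * A[idx], scale * b[idx]

# Minimum-norm solution of the (possibly rank-deficient) sketched problem.
x_sketch = np.linalg.lstsq(As, bs, rcond=None)[0]

print("relative error vs full solution:",
      np.linalg.norm(x_sketch - x_full) / np.linalg.norm(x_full))
```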
Toward Metric Indexes for Incremental Insertion and Querying
| Title | Toward Metric Indexes for Incremental Insertion and Querying |
| --- | --- |
| Authors | Edward Raff, Charles Nicholas |
| Abstract | In this work we explore the use of metric index structures, which accelerate nearest neighbor queries, in the scenario where we need to interleave insertions and queries during deployment. This use-case is inspired by a real-life need in malware analysis triage, and is surprisingly understudied. Existing literature tends to either focus on only final query efficiency, often does not support incremental insertion, or does not support arbitrary distance metrics. We modify and improve three algorithms to support our scenario of incremental insertion and querying with arbitrary metrics, and evaluate them on multiple datasets and distance metrics while varying the value of $k$ for the desired number of nearest neighbors. In doing so we determine that our improved Vantage-Point tree of Minimum-Variance performs best for this scenario. |
| Tasks | |
| Published | 2018-01-12 |
| URL | http://arxiv.org/abs/1801.05055v1 |
| PDF | http://arxiv.org/pdf/1801.05055v1.pdf |
| PWC | https://paperswithcode.com/paper/toward-metric-indexes-for-incremental |
| Repo | |
| Framework | |
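A minimal incremental Vantage-Point tree for an arbitrary metric, to make the abstract's scenario concrete: interleaved insertions and k-NN queries with triangle-inequality pruning. This is a plain VP-tree sketch, not the minimum-variance variant the paper develops.

```python
import heapq, random

class VPNode:
    __slots__ = ("point", "radius", "inner", "outer")
    def __init__(self, point):
        self.point, self.radius = point, None
        self.inner = self.outer = None

class VPTree:
    """Incremental VP-tree for an arbitrary metric `dist`."""
    def __init__(self, dist):
        self.dist, self.root = dist, None

    def insert(self, p):
        if self.root is None:
            self.root = VPNode(p); return
        node = self.root
        while True:
            d = self.dist(p, node.point)
            if node.radius is None:          # first child fixes the radius
                node.radius = d
            child = "inner" if d <= node.radius else "outer"
            nxt = getattr(node, child)
            if nxt is None:
                setattr(node, child, VPNode(p)); return
            node = nxt

    def knn(self, q, k):
        heap = []                            # max-heap via negated distances
        def visit(node):
            if node is None: return
            d = self.dist(q, node.point)
            if len(heap) < k: heapq.heappush(heap, (-d, node.point))
            elif d < -heap[0][0]: heapq.heapreplace(heap, (-d, node.point))
            tau = -heap[0][0] if len(heap) == k else float("inf")
            if node.radius is None: return
            # Triangle inequality: skip subtrees that cannot contain
            # anything closer than the current k-th best distance tau.
            if d - tau <= node.radius: visit(node.inner)
            if d + tau >= node.radius: visit(node.outer)
        visit(self.root)
        return sorted((-nd, p) for nd, p in heap)

rnd = random.Random(0)
tree = VPTree(lambda a, b: abs(a - b))       # 1-D metric for the demo
for _ in range(200):                          # interleave inserts and queries
    tree.insert(rnd.uniform(0, 100))
print(tree.knn(50.0, k=3))                    # [(distance, point), ...]
```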
Approximate Exploration through State Abstraction
| Title | Approximate Exploration through State Abstraction |
| --- | --- |
| Authors | Adrien Ali Taïga, Aaron Courville, Marc G. Bellemare |
| Abstract | Although exploration in reinforcement learning is well understood from a theoretical point of view, provably correct methods remain impractical. In this paper we study the interplay between exploration and approximation, what we call approximate exploration. Our main goal is to further our theoretical understanding of pseudo-count based exploration bonuses (Bellemare et al., 2016), a practical exploration scheme based on density modelling. As a warm-up, we quantify the performance of an exploration algorithm, MBIE-EB (Strehl and Littman, 2008), when explicitly combined with state aggregation. This allows us to confirm that, as might be expected, approximation allows the agent to trade off between learning speed and quality of the learned policy. Next, we show how a given density model can be related to an abstraction and that the corresponding pseudo-count bonus can act as a substitute in MBIE-EB combined with this abstraction, but may lead to either under- or over-exploration. Then, we show that a given density model also defines an implicit abstraction, and find a surprising mismatch between pseudo-counts derived either implicitly or explicitly. Finally we derive a new pseudo-count bonus alleviating this issue. |
| Tasks | |
| Published | 2018-08-29 |
| URL | http://arxiv.org/abs/1808.09819v2 |
| PDF | http://arxiv.org/pdf/1808.09819v2.pdf |
| PWC | https://paperswithcode.com/paper/approximate-exploration-through-state |
| Repo | |
| Framework | |
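The ingredients the abstract combines are easy to state in code: MBIE-EB adds a bonus of the form beta / sqrt(N(s, a)) to rewards, and state aggregation replaces the raw state by an abstraction phi(s) before counting. A minimal sketch of the counting side, with an invented grid abstraction:

```python
from collections import defaultdict
import math

BETA = 0.1                      # bonus scale (illustrative)
counts = defaultdict(int)       # N(phi(s), a)

def phi(state):
    """State aggregation: map a continuous 2-D state to a coarse cell.
    The 10x10 grid is an invented example abstraction."""
    x, y = state
    return (int(x * 10), int(y * 10))

def exploration_bonus(state, action):
    """MBIE-EB style bonus beta / sqrt(N) over the aggregated state."""
    counts[(phi(state), action)] += 1
    n = counts[(phi(state), action)]
    return BETA / math.sqrt(n)

# The agent would add this bonus to the observed reward at each step.
# The first two states fall in the same cell, so the second bonus shrinks.
for step, s in enumerate([(0.11, 0.52), (0.12, 0.53), (0.87, 0.10)]):
    print(step, round(exploration_bonus(s, action=0), 4))
```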
Out-of-Distribution Detection using Multiple Semantic Label Representations
| Title | Out-of-Distribution Detection using Multiple Semantic Label Representations |
| --- | --- |
| Authors | Gabi Shalev, Yossi Adi, Joseph Keshet |
| Abstract | Deep Neural Networks are powerful models that have attained remarkable results on a variety of tasks. These models are shown to be extremely efficient when training and test data are drawn from the same distribution. However, it is not clear how a network will act when it is fed with an out-of-distribution example. In this work, we consider the problem of out-of-distribution detection in neural networks. We propose to use multiple semantic dense representations instead of sparse representation as the target label. Specifically, we propose to use several word representations obtained from different corpora or architectures as target labels. We evaluated the proposed model on computer vision and speech command detection tasks and compared it to previous methods. Results suggest that our method compares favorably with previous work. In addition, we demonstrate the efficiency of our approach in detecting wrongly classified and adversarial examples. |
| Tasks | Out-of-Distribution Detection |
| Published | 2018-08-20 |
| URL | http://arxiv.org/abs/1808.06664v3 |
| PDF | http://arxiv.org/pdf/1808.06664v3.pdf |
| PWC | https://paperswithcode.com/paper/out-of-distribution-detection-using-multiple |
| Repo | |
| Framework | |
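A schematic of the scoring idea: several regression heads each predict a word embedding of the label from a different corpus or architecture, and an input is flagged as out-of-distribution when the predicted vectors sit far from every known label's embeddings. Everything below (dimensions, random "embeddings", the scoring rule) is a placeholder sketch, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, n_heads, dim = 10, 3, 50   # placeholder sizes

# One label-embedding matrix per head (e.g. embeddings from different
# corpora/architectures in the paper; random stand-ins here).
label_emb = [rng.normal(size=(n_classes, dim)) for _ in range(n_heads)]

def cosine(u, V):
    return (V @ u) / (np.linalg.norm(V, axis=1) * np.linalg.norm(u) + 1e-9)

def ood_score(head_outputs):
    """Sum over heads of the best cosine similarity to any label
    embedding; a low total similarity suggests out-of-distribution."""
    return sum(cosine(out, emb).max()
               for out, emb in zip(head_outputs, label_emb))

in_dist = [label_emb[h][3] + 0.1 * rng.normal(size=dim) for h in range(n_heads)]
ood = [rng.normal(size=dim) for _ in range(n_heads)]
print("in-distribution score :", round(ood_score(in_dist), 3))
print("out-of-distribution   :", round(ood_score(ood), 3))
```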
Functional Nonlinear Sparse Models
| Title | Functional Nonlinear Sparse Models |
| --- | --- |
| Authors | Luiz F. O. Chamon, Yonina C. Eldar, Alejandro Ribeiro |
| Abstract | Signal processing is rich in inherently continuous and often nonlinear applications, such as spectral estimation, optical imaging, and super-resolution microscopy, in which sparsity plays a key role in obtaining state-of-the-art results. Coping with the infinite dimensionality and non-convexity of these problems typically involves discretization and convex relaxations, e.g., using atomic norms. Nevertheless, grid mismatch and other coherence issues often lead to discretized versions of sparse signals that are not sparse. Even if they are, recovering sparse solutions using convex relaxations requires assumptions that may be hard to meet in practice. What is more, problems involving nonlinear measurements remain non-convex even after relaxing the sparsity objective. We address these issues by directly tackling the continuous, nonlinear problem cast as a sparse functional optimization program. We prove that when these problems are non-atomic, they have no duality gap and can therefore be solved efficiently using duality and (stochastic) convex optimization methods. We illustrate the wide range of applications of this approach by formulating and solving problems from nonlinear spectral estimation and robust classification. |
| Tasks | Spectrum Cartography, Super-Resolution |
| Published | 2018-11-01 |
| URL | https://arxiv.org/abs/1811.00577v4 |
| PDF | https://arxiv.org/pdf/1811.00577v4.pdf |
| PWC | https://paperswithcode.com/paper/functional-nonlinear-sparse-models |
| Repo | |
| Framework | |
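As a purely illustrative rendering of the "sparse functional optimization program" mentioned in the abstract (the paper's exact formulation differs in its details), one prototype problem looks like:

```latex
% Illustrative prototype only: seek a function x supported on as small
% a subset of the domain \Omega as possible while reproducing m
% (possibly nonlinear) measurements \Phi_i.
\begin{align*}
  \min_{x \in L_2(\Omega)} \quad
      & \|x\|_{L_0} \;\triangleq\;
        \mu\bigl(\{\omega \in \Omega : x(\omega) \neq 0\}\bigr) \\
  \text{s.t.} \quad
      & z_i = \Phi_i(x), \qquad i = 1, \dots, m,
\end{align*}
```

The abstract's claim is that, despite the non-convex $L_0$-type objective and nonlinear measurement maps, non-atomic instances of such programs have no duality gap and can be attacked through their duals with (stochastic) convex optimization.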
Paired Comparison Sentiment Scores
| Title | Paired Comparison Sentiment Scores |
| --- | --- |
| Authors | Christoph Dalitz, Jens Wilberg, Katrin E. Bednarek |
| Abstract | The method of paired comparisons is an established method in psychology. In this article, it is applied to obtain continuous sentiment scores for words from comparisons made by test persons. We created an initial lexicon with $n=199$ German words from a two-fold all-pair comparison experiment with ten different test persons. From the probabilistic models taken into account, the logistic model showed the best agreement with the results of the comparison experiment. The initial lexicon can then be used in different ways. One is to create special purpose sentiment lexica through the addition of arbitrary words that are compared with some of the initial words by test persons. A cross-validation experiment suggests that only about 18 two-fold comparisons are necessary to estimate the score of a new, yet unknown word, provided these words are selected by a modification of a method by Silverstein & Farrell. Another application of the initial lexicon is the evaluation of automatically created corpus-based lexica. By such an evaluation, we compared the corpus-based lexica SentiWS, SenticNet, and SentiWordNet, of which SenticNet 4 performed best. This technical report is a corrected and extended version of a presentation made at the ICDM Sentire workshop in 2016. |
| Tasks | |
| Published | 2018-07-10 |
| URL | http://arxiv.org/abs/1807.03591v1 |
| PDF | http://arxiv.org/pdf/1807.03591v1.pdf |
| PWC | https://paperswithcode.com/paper/paired-comparison-sentiment-scores |
| Repo | |
| Framework | |
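The logistic paired-comparison model the abstract selects (a Bradley-Terry-type model) can be fit by maximum likelihood in a few lines: each observation "word i beat word j" contributes log sigma(s_i - s_j) to the likelihood. The words and comparison counts below are invented.

```python
import numpy as np
from scipy.optimize import minimize

words = ["good", "okay", "bad"]                 # invented mini-lexicon
# Comparisons (winner_index, loser_index); invented counts in which
# "good" mostly beats "okay", which mostly beats "bad".
comparisons = ([(0, 1)] * 8 + [(1, 0)] * 2 +
               [(1, 2)] * 9 + [(2, 1)] * 1 + [(0, 2)] * 10)

def neg_log_lik(s):
    # Logistic model: P(i beats j) = sigmoid(s_i - s_j), so each
    # comparison contributes log(1 + exp(-(s_i - s_j))) to the loss.
    diffs = np.array([s[w] - s[l] for w, l in comparisons])
    return np.sum(np.log1p(np.exp(-diffs)))

res = minimize(neg_log_lik, x0=np.zeros(len(words)), method="BFGS")
# The model is shift-invariant, so center the scores for identifiability.
scores = res.x - res.x.mean()
for w, s in zip(words, scores):
    print(f"{w:5s} {s:+.3f}")
```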
Stochastic model-based minimization under high-order growth
| Title | Stochastic model-based minimization under high-order growth |
| --- | --- |
| Authors | Damek Davis, Dmitriy Drusvyatskiy, Kellie J. MacPhee |
| Abstract | Given a nonsmooth, nonconvex minimization problem, we consider algorithms that iteratively sample and minimize stochastic convex models of the objective function. Assuming that the one-sided approximation quality and the variation of the models is controlled by a Bregman divergence, we show that the scheme drives a natural stationarity measure to zero at the rate $O(k^{-1/4})$. Under additional convexity and relative strong convexity assumptions, the function values converge to the minimum at the rate of $O(k^{-1/2})$ and $\widetilde{O}(k^{-1})$, respectively. We discuss consequences for stochastic proximal point, mirror descent, regularized Gauss-Newton, and saddle point algorithms. |
| Tasks | |
| Published | 2018-07-01 |
| URL | http://arxiv.org/abs/1807.00255v1 |
| PDF | http://arxiv.org/pdf/1807.00255v1.pdf |
| PWC | https://paperswithcode.com/paper/stochastic-model-based-minimization-under |
| Repo | |
| Framework | |
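One of the algorithms covered by the analysis, stochastic proximal point, is easy to instantiate for least squares, where the per-sample step $x^+ = \arg\min_x f_i(x) + \frac{1}{2\eta}\|x - x_k\|^2$ has a closed form. A toy sketch with invented problem data and step sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 200, 10
A = rng.normal(size=(m, n))
x_true = rng.normal(size=n)
b = A @ x_true + 0.01 * rng.normal(size=m)

def prox_point_step(x, a, bi, eta):
    """Closed-form stochastic proximal point step for
    f_i(x) = 0.5 * (a @ x - bi)**2:
        x+ = argmin_x f_i(x) + (1/(2*eta)) * ||x - x_k||^2
           = x - eta * a * (a @ x - bi) / (1 + eta * ||a||^2)"""
    return x - eta * a * (a @ x - bi) / (1.0 + eta * (a @ a))

x = np.zeros(n)
for k in range(5000):
    i = rng.integers(m)
    eta = 1.0 / np.sqrt(k + 1)          # diminishing steps (illustrative)
    x = prox_point_step(x, A[i], b[i], eta)

print("distance to x_true:", np.linalg.norm(x - x_true))
```

Unlike plain SGD, the proximal step never overshoots on a single sample, which is one reason model-based methods tolerate poorly scaled step sizes.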
Collective Learning From Diverse Datasets for Entity Typing in the Wild
| Title | Collective Learning From Diverse Datasets for Entity Typing in the Wild |
| --- | --- |
| Authors | Abhishek Abhishek, Amar Prakash Azad, Balaji Ganesan, Ashish Anand, Amit Awekar |
| Abstract | Entity typing (ET) is the problem of assigning labels to given entity mentions in a sentence. Existing works for ET require knowledge about the domain and target label set for a given test instance. ET in the absence of such knowledge is a novel problem that we address as ET in the wild. We hypothesize that the solution to this problem is to build supervised models that generalize better on the ET task as a whole, rather than a specific dataset. In this direction, we propose a Collective Learning Framework (CLF), which enables learning from diverse datasets in a unified way. The CLF first creates a unified hierarchical label set (UHLS) and a label mapping by aggregating label information from all available datasets. Then it builds a single neural network classifier using UHLS, label mapping, and a partial loss function. The single classifier predicts the finest possible label across all available domains even though these labels may not be present in any domain-specific dataset. We also propose a set of evaluation schemes and metrics to evaluate the performance of models in this novel problem. Extensive experimentation on seven diverse real-world datasets demonstrates the efficacy of our CLF. |
| Tasks | Entity Typing |
| Published | 2018-10-20 |
| URL | https://arxiv.org/abs/1810.08782v3 |
| PDF | https://arxiv.org/pdf/1810.08782v3.pdf |
| PWC | https://paperswithcode.com/paper/a-unified-labeling-approach-by-pooling |
| Repo | |
| Framework | |
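The "partial loss function" ingredient of the CLF can be sketched as a masked binary cross-entropy over the unified label set: labels a given source dataset does not annotate contribute nothing, so one classifier can train on datasets with different label vocabularies. The sizes, label layout, and mask below are invented for illustration.

```python
import numpy as np

def partial_bce(logits, targets, known_mask, eps=1e-9):
    """Binary cross-entropy over the unified label set, restricted to
    the labels the example's source dataset actually annotates
    (known_mask == 1); all other labels are ignored."""
    p = 1.0 / (1.0 + np.exp(-logits))
    ll = targets * np.log(p + eps) + (1 - targets) * np.log(1 - p + eps)
    return -(ll * known_mask).sum() / max(known_mask.sum(), 1)

# Invented unified label set of 6 types across two datasets:
# dataset A annotates labels 0-3, dataset B annotates labels 2-5.
logits  = np.array([2.0, -1.0, 0.5, -2.0, 3.0, -3.0])
targets = np.array([1,    0,    1,   0,   0,    0])   # from dataset A
mask_A  = np.array([1,    1,    1,   1,   0,    0])   # B-only labels unknown
print("partial loss:", round(partial_bce(logits, targets, mask_A), 4))
```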