Paper Group ANR 729
Fast spatial inference in the homogeneous Ising model
Title | Fast spatial inference in the homogeneous Ising model |
Authors | Alejandro Murua, Ranjan Maitra |
Abstract | The Ising model is important in statistical modeling and inference in many applications; however, its normalizing constant, mean number of active vertices, and mean spin interaction are intractable. We provide accurate approximations that make it possible to calculate these quantities numerically. Simulation studies indicate good performance when compared to Markov Chain Monte Carlo methods and at a tiny fraction of the time. The methodology is also used to perform Bayesian inference in a functional Magnetic Resonance Imaging activation detection experiment. |
Tasks | Bayesian Inference |
Published | 2017-12-06 |
URL | http://arxiv.org/abs/1712.02195v2 , http://arxiv.org/pdf/1712.02195v2.pdf |
PWC | https://paperswithcode.com/paper/fast-spatial-inference-in-the-homogeneous |
Repo | |
Framework | |
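To see why the normalizing constant mentioned in the abstract is intractable, the following minimal sketch computes it by brute force on a tiny grid. It is not the authors' approximation. The ±1 spin coding, the free boundary, and the names `ising_log_partition`, `beta`, and `h` are assumptions made for illustration, and the paper's parametrization may differ. The loop visits all 2^n configurations, which is only feasible for a handful of sites.

```python
import itertools
import math


def ising_log_partition(rows, cols, beta, h=0.0):
    """Brute-force log Z for a rows x cols grid with free boundaries.

    Assumed model (illustrative): spins in {-1, +1}, interaction
    strength beta on nearest-neighbour edges, external field h.
    """
    n = rows * cols
    # Nearest-neighbour edges between site indices r * cols + c.
    edges = []
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                edges.append((i, i + 1))
            if r + 1 < rows:
                edges.append((i, i + cols))
    total = 0.0
    # Sum over all 2**n spin configurations: exponential cost in n.
    for spins in itertools.product((-1, 1), repeat=n):
        interaction = sum(spins[i] * spins[j] for i, j in edges)
        total += math.exp(beta * interaction + h * sum(spins))
    return math.log(total)
```

A 4x4 grid already needs 65,536 terms and a 10x10 grid about 1.3e30, which is why approximations or MCMC are used in practice.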
Real-time convolutional networks for sonar image classification in low-power embedded systems
Title | Real-time convolutional networks for sonar image classification in low-power embedded systems |
Authors | Matias Valdenegro-Toro |
Abstract | Deep Neural Networks have impressive classification performance, but this comes at the expense of significant computational resources at inference time. Autonomous Underwater Vehicles use low-power embedded systems for sonar image perception, and cannot execute large neural networks in real-time. We propose using max-pooling aggressively, and we demonstrate it with a Fire-based module and a new Tiny module that includes max-pooling in each module. By stacking them we build networks that achieve the same accuracy as bigger ones, while reducing the number of parameters and considerably increasing computational performance. Our networks can classify a 96x96 sonar image with 98.8 - 99.7% accuracy in only 41 to 61 milliseconds on a Raspberry Pi 2, which corresponds to speedups of 28.6 - 19.7. |
Tasks | Image Classification |
Published | 2017-09-07 |
URL | http://arxiv.org/abs/1709.02153v1 , http://arxiv.org/pdf/1709.02153v1.pdf |
PWC | https://paperswithcode.com/paper/real-time-convolutional-networks-for-sonar |
Repo | |
Framework | |
Inferring Narrative Causality between Event Pairs in Films
Title | Inferring Narrative Causality between Event Pairs in Films |
Authors | Zhichao Hu, Marilyn A. Walker |
Abstract | To understand narrative, humans draw inferences about the underlying relations between narrative events. Cognitive theories of narrative understanding define these inferences as four different types of causality, ranging from pairs of events A, B where A physically causes B (X drop, X break) to pairs of events where A causes emotional state B (Y saw X, Y felt fear). Previous work on learning narrative relations from text has either focused on “strict” physical causality, or has been vague about what relation is being learned. This paper learns pairs of causal events from a corpus of film scene descriptions, which are action rich and tend to be told in chronological order. We show that event pairs induced using our methods are of high quality and are judged to have a stronger causal relation than event pairs from Rel-grams. |
Tasks | |
Published | 2017-08-30 |
URL | http://arxiv.org/abs/1708.09496v1 , http://arxiv.org/pdf/1708.09496v1.pdf |
PWC | https://paperswithcode.com/paper/inferring-narrative-causality-between-event |
Repo | |
Framework | |
Revisiting the Design Issues of Local Models for Japanese Predicate-Argument Structure Analysis
Title | Revisiting the Design Issues of Local Models for Japanese Predicate-Argument Structure Analysis |
Authors | Yuichiroh Matsubayashi, Kentaro Inui |
Abstract | The research trend in Japanese predicate-argument structure (PAS) analysis is shifting from pointwise prediction models with local features to global models designed to search for globally optimal solutions. However, the existing global models tend to employ only relatively simple local features; therefore, the overall performance gains are rather limited. The importance of designing a local model is demonstrated in this study by showing that the performance of a sophisticated local model can be considerably improved with recent feature embedding methods and a feature combination learning based on a neural network, outperforming the state-of-the-art global models in $F_1$ on a common benchmark dataset. |
Tasks | |
Published | 2017-10-12 |
URL | http://arxiv.org/abs/1710.04437v1 , http://arxiv.org/pdf/1710.04437v1.pdf |
PWC | https://paperswithcode.com/paper/revisiting-the-design-issues-of-local-models |
Repo | |
Framework | |
DocTag2Vec: An Embedding Based Multi-label Learning Approach for Document Tagging
Title | DocTag2Vec: An Embedding Based Multi-label Learning Approach for Document Tagging |
Authors | Sheng Chen, Akshay Soni, Aasish Pappu, Yashar Mehdad |
Abstract | Tagging news articles or blog posts with relevant tags from a collection of predefined ones is termed document tagging in this work. Accurate tagging of articles can benefit several downstream applications such as recommendation and search. In this work, we propose a novel yet simple approach called DocTag2Vec to accomplish this task. We substantially extend Word2Vec and Doc2Vec, two popular models for learning distributed representations of words and documents. In DocTag2Vec, we simultaneously learn the representation of words, documents, and tags in a joint vector space during training, and employ the simple $k$-nearest neighbor search to predict tags for unseen documents. In contrast to previous multi-label learning methods, DocTag2Vec directly deals with raw text instead of a provided feature vector and, in addition, enjoys advantages such as the learning of tag representations and the ability to handle newly created tags. To demonstrate the effectiveness of our approach, we conduct experiments on several datasets and show promising results against state-of-the-art methods. |
Tasks | Multi-Label Learning |
Published | 2017-07-14 |
URL | http://arxiv.org/abs/1707.04596v1 , http://arxiv.org/pdf/1707.04596v1.pdf |
PWC | https://paperswithcode.com/paper/doctag2vec-an-embedding-based-multi-label |
Repo | |
Framework | |
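The prediction step described in the abstract, a $k$-nearest neighbor search in a joint embedding space, can be sketched as follows. This is only an illustration under stated assumptions: the embeddings are made-up two-dimensional vectors rather than learned ones, cosine similarity is an assumed choice of metric, and `nearest_tags` is a hypothetical helper, not part of any DocTag2Vec release.

```python
import numpy as np


def nearest_tags(doc_vec, tag_names, tag_vecs, k=2):
    """Return the k tags whose embeddings are closest to doc_vec (cosine)."""
    tag_unit = tag_vecs / np.linalg.norm(tag_vecs, axis=1, keepdims=True)
    doc_unit = doc_vec / np.linalg.norm(doc_vec)
    sims = tag_unit @ doc_unit          # cosine similarity to every tag
    order = np.argsort(-sims)[:k]       # indices of the k largest similarities
    return [tag_names[i] for i in order]


# Toy, hand-written embeddings (assumption: real ones would be learned jointly).
tag_names = ["sports", "politics", "finance"]
tag_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
doc_vec = np.array([0.9, 0.2])
predicted = nearest_tags(doc_vec, tag_names, tag_vecs, k=2)
```

Because tags live in the same space as documents, a newly created tag only needs an embedding row appended to `tag_vecs` to become predictable, which matches the advantage the abstract claims.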
Aggregating Algorithm for Prediction of Packs
Title | Aggregating Algorithm for Prediction of Packs |
Authors | Dmitry Adamskiy, Tony Bellotti, Raisa Dzhamtyrova, Yuri Kalnishkan |
Abstract | This paper formulates the protocol for prediction of packs, which is a special case of prediction under delayed feedback. Under this protocol, the learner must make a few predictions without seeing the outcomes and then the outcomes are revealed. We develop the theory of prediction with expert advice for packs. By applying Vovk’s Aggregating Algorithm to this problem we obtain a number of algorithms with tight upper bounds. We carry out empirical experiments on housing data. |
Tasks | |
Published | 2017-10-23 |
URL | http://arxiv.org/abs/1710.08114v1 , http://arxiv.org/pdf/1710.08114v1.pdf |
PWC | https://paperswithcode.com/paper/aggregating-algorithm-for-prediction-of-packs |
Repo | |
Framework | |
Numerical Integration and Dynamic Discretization in Heuristic Search Planning over Hybrid Domains
Title | Numerical Integration and Dynamic Discretization in Heuristic Search Planning over Hybrid Domains |
Authors | Miquel Ramirez, Enrico Scala, Patrik Haslum, Sylvie Thiebaux |
Abstract | In this paper we look into the problem of planning over hybrid domains, where change can be both discrete and instantaneous, or continuous over time. In addition, it is required that each state on the trajectory induced by the execution of plans complies with a given set of global constraints. We approach the computation of plans for such domains as the problem of searching over a deterministic state model. In this model, some of the successor states are obtained by solving numerically the so-called initial value problem over a set of ordinary differential equations (ODE) given by the current plan prefix. These equations hold over time intervals whose duration is determined dynamically, according to whether zero crossing events take place for a set of invariant conditions. The resulting planner, FS+, incorporates these features together with effective heuristic guidance. FS+ does not impose any of the syntactic restrictions on process effects often found in the existing literature on Hybrid Planning. A key concept of our approach is that a clear separation is struck between planning and simulation time steps. The former is the time allowed to observe the evolution of a given dynamical system before committing to a future course of action, whilst the latter is part of the model of the environment. FS+ is shown to be a robust planner over a diverse set of hybrid domains, taken from the existing literature on hybrid planning and systems. |
Tasks | |
Published | 2017-03-13 |
URL | http://arxiv.org/abs/1703.04232v1 , http://arxiv.org/pdf/1703.04232v1.pdf |
PWC | https://paperswithcode.com/paper/numerical-integration-and-dynamic |
Repo | |
Framework | |
Fractional Langevin Monte Carlo: Exploring Lévy Driven Stochastic Differential Equations for Markov Chain Monte Carlo
Title | Fractional Langevin Monte Carlo: Exploring Lévy Driven Stochastic Differential Equations for Markov Chain Monte Carlo |
Authors | Umut Şimşekli |
Abstract | Along with the recent advances in scalable Markov Chain Monte Carlo methods, sampling techniques that are based on Langevin diffusions have started receiving increasing attention. These so-called Langevin Monte Carlo (LMC) methods are based on diffusions driven by a Brownian motion, which gives rise to Gaussian proposal distributions in the resulting algorithms. Even though these approaches have proven successful in many applications, their performance can be limited by the light-tailed nature of the Gaussian proposals. In this study, we extend classical LMC and develop a novel Fractional LMC (FLMC) framework that is based on a family of heavy-tailed distributions, called $\alpha$-stable Lévy distributions. As opposed to classical approaches, the proposed approach can possess large jumps while targeting the correct distribution, which would be beneficial for efficient exploration of the state space. We develop novel computational methods that can scale up to large-scale problems and we provide formal convergence analysis of the proposed scheme. Our experiments support our theory: FLMC can provide superior performance in multi-modal settings, improved convergence rates, and robustness to algorithm parameters. |
Tasks | Efficient Exploration |
Published | 2017-06-12 |
URL | http://arxiv.org/abs/1706.03649v1 , http://arxiv.org/pdf/1706.03649v1.pdf |
PWC | https://paperswithcode.com/paper/fractional-langevin-monte-carlo-exploring-1 |
Repo | |
Framework | |
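For context, the classical LMC baseline that the abstract contrasts against can be sketched as the unadjusted Langevin update below, where each step adds a gradient drift and Gaussian noise. This is a minimal sketch of the standard Brownian-motion scheme only. The fractional variant with $\alpha$-stable noise is not reproduced here, and the target (a one-dimensional Gaussian with mean 3), the step size, and the function name `langevin_samples` are assumptions chosen for illustration.

```python
import numpy as np


def langevin_samples(grad_log_density, x0, step, n_steps, rng):
    """Unadjusted Langevin: x <- x + step * grad log p(x) + sqrt(2 * step) * N(0, 1)."""
    x = float(x0)
    out = np.empty(n_steps)
    for t in range(n_steps):
        noise = rng.standard_normal()  # Gaussian (light-tailed) proposal noise
        x = x + step * grad_log_density(x) + np.sqrt(2.0 * step) * noise
        out[t] = x
    return out


rng = np.random.default_rng(0)
# Target N(3, 1): grad log p(x) = -(x - 3).
samples = langevin_samples(lambda x: -(x - 3.0), x0=0.0, step=0.1,
                           n_steps=5000, rng=rng)
```

With Gaussian noise the chain moves in small increments, so crossing between well-separated modes can take many steps. That is the limitation the heavy-tailed proposals in the abstract are meant to address.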
Robust Frequent Directions with Application in Online Learning
Title | Robust Frequent Directions with Application in Online Learning |
Authors | Luo Luo, Cheng Chen, Zhihua Zhang, Wu-Jun Li, Tong Zhang |
Abstract | The frequent directions (FD) technique is a deterministic approach for online sketching that has many applications in machine learning. The conventional FD is a heuristic procedure that often outputs rank deficient matrices. To overcome the rank deficiency problem, we propose a new sketching strategy called robust frequent directions (RFD) by introducing a regularization term. RFD can be derived from an optimization problem. It updates the sketch matrix and the regularization term adaptively and jointly. RFD reduces the approximation error of FD without increasing the computational cost. We also apply RFD to online learning and propose an effective hyperparameter-free online Newton algorithm. We derive a regret bound for our online Newton algorithm based on RFD, which guarantees the robustness of the algorithm. The experimental studies demonstrate that the proposed method outperforms state-of-the-art second order online learning algorithms. |
Tasks | |
Published | 2017-05-15 |
URL | http://arxiv.org/abs/1705.05067v3 , http://arxiv.org/pdf/1705.05067v3.pdf |
PWC | https://paperswithcode.com/paper/robust-frequent-directions-with-application |
Repo | |
Framework | |
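The conventional FD procedure that the abstract builds on can be sketched as below. This is a simple textbook variant that runs one SVD per incoming row and shrinks by the smallest squared singular value. It is not the RFD method of the paper, and the name `frequent_directions` and the toy data are assumptions. The sketch assumes `ell` is at most the number of columns.

```python
import numpy as np


def frequent_directions(A, ell):
    """Sketch the n x d matrix A into an ell x d matrix B (requires ell <= d)."""
    n, d = A.shape
    B = np.zeros((ell, d))
    for row in A:
        B[-1] = row  # the last row of B is always zero at this point
        _, s, Vt = np.linalg.svd(B, full_matrices=False)
        delta = s[-1] ** 2  # smallest squared singular value
        # Shrink every squared singular value by delta; the last becomes zero.
        s_shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
        B = s_shrunk[:, None] * Vt
    return B


rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
B = frequent_directions(A, ell=5)
# Spectral-norm error of the covariance approximation.
err = np.linalg.norm(A.T @ A - B.T @ B, 2)
bound = np.linalg.norm(A, "fro") ** 2 / 5
```

For this variant the error satisfies `err <= bound`, the standard FD guarantee. Note that `B.T @ B` has rank below `d`, which illustrates the rank deficiency that the regularization term in the abstract is introduced to address.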
On Convergence of some Gradient-based Temporal-Differences Algorithms for Off-Policy Learning
Title | On Convergence of some Gradient-based Temporal-Differences Algorithms for Off-Policy Learning |
Authors | Huizhen Yu |
Abstract | We consider off-policy temporal-difference (TD) learning methods for policy evaluation in Markov decision processes with finite spaces and discounted reward criteria, and we present a collection of convergence results for several gradient-based TD algorithms with linear function approximation. The algorithms we analyze include: (i) two basic forms of two-time-scale gradient-based TD algorithms, which we call GTD and which minimize the mean squared projected Bellman error using stochastic gradient-descent; (ii) their “robustified” biased variants; (iii) their mirror-descent versions which combine the mirror-descent idea with TD learning; and (iv) a single-time-scale version of GTD that solves minimax problems formulated for approximate policy evaluation. We derive convergence results for three types of stepsizes: constant stepsize, slowly diminishing stepsize, as well as the standard type of diminishing stepsize with a square-summable condition. For the first two types of stepsizes, we apply the weak convergence method from stochastic approximation theory to characterize the asymptotic behavior of the algorithms, and for the standard type of stepsize, we analyze the algorithmic behavior with respect to a stronger mode of convergence, almost sure convergence. Our convergence results are for the aforementioned TD algorithms with three general ways of setting their $\lambda$-parameters: (i) state-dependent $\lambda$; (ii) a recently proposed scheme of using history-dependent $\lambda$ to keep the eligibility traces of the algorithms bounded while allowing for relatively large values of $\lambda$; and (iii) a composite scheme of setting the $\lambda$-parameters that combines the preceding two schemes and allows a broader class of generalized Bellman operators to be used for approximate policy evaluation with TD methods. |
Tasks | |
Published | 2017-12-27 |
URL | http://arxiv.org/abs/1712.09652v2 , http://arxiv.org/pdf/1712.09652v2.pdf |
PWC | https://paperswithcode.com/paper/on-convergence-of-some-gradient-based |
Repo | |
Framework | |
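As a rough illustration of a two-time-scale gradient-based TD update with linear function approximation, the sketch below implements one step of the commonly cited GTD2 rule. It is a simplification under stated assumptions: no eligibility traces ($\lambda = 0$), no importance-sampling ratios (so it is the on-policy form), and constant stepsizes `alpha` and `beta`. The function name `gtd2_step` is hypothetical, and the exact algorithms analyzed in the paper may differ.

```python
import numpy as np


def gtd2_step(theta, w, phi, phi_next, reward, gamma, alpha, beta):
    """One GTD2 update; theta are value weights, w are auxiliary weights."""
    # TD error under the current linear value estimate.
    delta = reward + gamma * (phi_next @ theta) - phi @ theta
    # Slow time scale: move theta along the estimated gradient direction.
    theta_new = theta + alpha * (phi - gamma * phi_next) * (phi @ w)
    # Fast time scale: w tracks the projection of the TD error onto features.
    w_new = w + beta * (delta - phi @ w) * phi
    return theta_new, w_new


phi = np.array([1.0, 0.0])
phi_next = np.array([0.0, 1.0])
theta0, w0 = np.zeros(2), np.zeros(2)
theta1, w1 = gtd2_step(theta0, w0, phi, phi_next, reward=1.0,
                       gamma=0.9, alpha=0.1, beta=0.5)
theta2, w2 = gtd2_step(theta1, w1, phi, phi_next, reward=1.0,
                       gamma=0.9, alpha=0.1, beta=0.5)
```

The first step leaves `theta` unchanged because `w` starts at zero. Only once the auxiliary weights have picked up the TD error does the main parameter vector move, which is the coupling the two stepsizes control.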
Cnvlutin2: Ineffectual-Activation-and-Weight-Free Deep Neural Network Computing
Title | Cnvlutin2: Ineffectual-Activation-and-Weight-Free Deep Neural Network Computing |
Authors | Patrick Judd, Alberto Delmas, Sayeh Sharify, Andreas Moshovos |
Abstract | We discuss several modifications and extensions over the previously proposed Cnvlutin (CNV) accelerator for convolutional and fully-connected layers of Deep Learning Networks. We first describe different encodings of the activations that are deemed ineffectual. The encodings have different memory overhead and energy characteristics. We propose using a level of indirection when accessing activations from memory to reduce their memory footprint by storing only the effectual activations. We also present a modified organization that detects the activations that are deemed ineffectual while fetching them from memory. This differs from the original design, which instead detected them at the output of the preceding layer. Finally, we present an extended CNV that can also skip ineffectual weights. |
Tasks | |
Published | 2017-04-29 |
URL | http://arxiv.org/abs/1705.00125v1 , http://arxiv.org/pdf/1705.00125v1.pdf |
PWC | https://paperswithcode.com/paper/cnvlutin2-ineffectual-activation-and-weight |
Repo | |
Framework | |
Mapping higher-order network flows in memory and multilayer networks with Infomap
Title | Mapping higher-order network flows in memory and multilayer networks with Infomap |
Authors | Daniel Edler, Ludvig Bohlin, Martin Rosvall |
Abstract | Comprehending complex systems by simplifying and highlighting important dynamical patterns requires modeling and mapping higher-order network flows. However, complex systems come in many forms and demand a range of representations, including memory and multilayer networks, which in turn call for versatile community-detection algorithms to reveal important modular regularities in the flows. Here we show that various forms of higher-order network flows can be represented in a unified way with networks that distinguish physical nodes for representing a complex system’s objects from state nodes for describing flows between the objects. Moreover, these so-called sparse memory networks allow the information-theoretic community detection method known as the map equation to identify overlapping and nested flow modules in data from a range of different higher-order interactions such as multistep, multi-source, and temporal data. We derive the map equation applied to sparse memory networks and describe its search algorithm Infomap, which can exploit the flexibility of sparse memory networks. Together they provide a general solution to reveal overlapping modular patterns in higher-order flows through complex systems. |
Tasks | Community Detection |
Published | 2017-06-15 |
URL | http://arxiv.org/abs/1706.04792v2 , http://arxiv.org/pdf/1706.04792v2.pdf |
PWC | https://paperswithcode.com/paper/mapping-higher-order-network-flows-in-memory |
Repo | |
Framework | |
Exhaustive search for sparse variable selection in linear regression
Title | Exhaustive search for sparse variable selection in linear regression |
Authors | Yasuhiko Igarashi, Hikaru Takenaka, Yoshinori Nakanishi-Ohno, Makoto Uemura, Shiro Ikeda, Masato Okada |
Abstract | We propose a K-sparse exhaustive search (ES-K) method and a K-sparse approximate exhaustive search method (AES-K) for selecting variables in linear regression. With these methods, K-sparse combinations of variables are tested exhaustively assuming that the optimal combination of explanatory variables is K-sparse. By collecting the results of exhaustively computing ES-K, various approximate methods for selecting sparse variables can be summarized as density of states. With this density of states, we can compare different methods for selecting sparse variables such as relaxation and sampling. For large problems where the combinatorial explosion of explanatory variables is crucial, the AES-K method enables density of states to be effectively reconstructed by using the replica-exchange Monte Carlo method and the multiple histogram method. Applying the ES-K and AES-K methods to type Ia supernova data, we confirmed the conventional understanding in astronomy when an appropriate K is given beforehand. However, we found it difficult to determine K from the data. Using virtual measurement and analysis, we argue that this is caused by data shortage. |
Tasks | |
Published | 2017-07-07 |
URL | http://arxiv.org/abs/1707.02050v1 , http://arxiv.org/pdf/1707.02050v1.pdf |
PWC | https://paperswithcode.com/paper/exhaustive-search-for-sparse-variable |
Repo | |
Framework | |
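The exhaustive part of the idea, testing every K-sparse combination of variables, can be sketched with a plain least-squares fit per subset. This is an illustrative sketch and not the authors' implementation. The residual sum of squares as the score, the synthetic noise-free data, and the name `exhaustive_k_sparse` are assumptions.

```python
import itertools

import numpy as np


def exhaustive_k_sparse(X, y, K):
    """Fit least squares on every K-column subset; return (rss, subset) sorted by rss."""
    results = []
    for subset in itertools.combinations(range(X.shape[1]), K):
        cols = list(subset)
        coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        rss = float(np.sum((y - X[:, cols] @ coef) ** 2))
        results.append((rss, subset))
    results.sort()
    return results


rng = np.random.default_rng(0)
X = rng.standard_normal((40, 6))
y = 2.0 * X[:, 1] - 3.0 * X[:, 4]  # synthetic response using columns 1 and 4 only
ranked = exhaustive_k_sparse(X, y, K=2)
```

A histogram of the `rss` values over all subsets plays the role of the density of states described in the abstract. The number of subsets grows as "N choose K", which is why the abstract turns to an approximate, sampling-based method for large problems.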
Learning Representations of Emotional Speech with Deep Convolutional Generative Adversarial Networks
Title | Learning Representations of Emotional Speech with Deep Convolutional Generative Adversarial Networks |
Authors | Jonathan Chang, Stefan Scherer |
Abstract | Automatically assessing emotional valence in human speech has historically been a difficult task for machine learning algorithms. The subtle changes in the voice of the speaker that are indicative of positive or negative emotional states are often “overshadowed” by voice characteristics relating to emotional intensity or emotional activation. In this work we explore a representation learning approach that automatically derives discriminative representations of emotional speech. In particular, we investigate two machine learning strategies to improve classifier performance: (1) utilization of unlabeled data using a deep convolutional generative adversarial network (DCGAN), and (2) multitask learning. Within our extensive experiments we leverage a multitask annotated emotional corpus as well as a large unlabeled meeting corpus (around 100 hours). Our speaker-independent classification experiments show that, in particular, the use of unlabeled data in our investigations improves performance of the classifiers, and both fully supervised baseline approaches are outperformed considerably. We improve the classification of emotional valence on a discrete 5-point scale to 43.88% and on a 3-point scale to 49.80%, which is competitive with state-of-the-art performance. |
Tasks | Representation Learning |
Published | 2017-04-22 |
URL | http://arxiv.org/abs/1705.02394v1 , http://arxiv.org/pdf/1705.02394v1.pdf |
PWC | https://paperswithcode.com/paper/learning-representations-of-emotional-speech |
Repo | |
Framework | |
Exploration of Large Networks with Covariates via Fast and Universal Latent Space Model Fitting
Title | Exploration of Large Networks with Covariates via Fast and Universal Latent Space Model Fitting |
Authors | Zhuang Ma, Zongming Ma |
Abstract | Latent space models are effective tools for statistical modeling and exploration of network data. These models can effectively model real world network characteristics such as degree heterogeneity, transitivity, homophily, etc. Due to their close connection to generalized linear models, it is also natural to incorporate covariate information in them. The current paper presents two universal fitting algorithms for networks with edge covariates: one based on nuclear norm penalization and the other based on projected gradient descent. Both algorithms are motivated by maximizing likelihood for a special class of inner-product models while working simultaneously for a wide range of different latent space models, such as distance models, which allow latent vectors to affect edge formation in flexible ways. These fitting methods, especially the one based on projected gradient descent, are fast and scalable to large networks. We obtain their rates of convergence for both inner-product models and beyond. The effectiveness of the modeling approach and fitting algorithms is demonstrated on five real world network datasets for different statistical tasks, including community detection with and without edge covariates, and network assisted learning. |
Tasks | Community Detection |
Published | 2017-05-05 |
URL | http://arxiv.org/abs/1705.02372v2 , http://arxiv.org/pdf/1705.02372v2.pdf |
PWC | https://paperswithcode.com/paper/exploration-of-large-networks-with-covariates |
Repo | |
Framework | |