Paper Group ANR 244
Papers in this group:

A network that learns Strassen multiplication
Sparse Linear Regression via Generalized Orthogonal Least-Squares
Detangling People: Individuating Multiple Close People and Their Body Parts via Region Assembly
Improved Parsing for Argument-Clusters Coordination
Reinforcement Learning approach for Real Time Strategy Games Battle city and S3
Zero-Shot Learning with Multi-Battery Factor Analysis
Recalling Holistic Information for Semantic Segmentation
A Spectral Series Approach to High-Dimensional Nonparametric Regression
PCANet: An energy perspective
Connectome Smoothing via Low-rank Approximations
Regression-based reduced-order models to predict transient thermal output for enhanced geothermal systems
Solving General Arithmetic Word Problems
Less than a Single Pass: Stochastically Controlled Stochastic Gradient Method
Efficient Optimization for Rank-based Loss Functions
The Mondrian Kernel
A network that learns Strassen multiplication
Title | A network that learns Strassen multiplication |
Authors | Veit Elser |
Abstract | We study neural networks whose only non-linear components are multipliers, to test a new training rule in a context where the precise representation of data is paramount. These networks are challenged to discover the rules of matrix multiplication, given many examples. By limiting the number of multipliers, the network is forced to discover the Strassen multiplication rules. This is the mathematical equivalent of finding low rank decompositions of the $n\times n$ matrix multiplication tensor, $M_n$. We train these networks with the conservative learning rule, which makes minimal changes to the weights so as to give the correct output for each input at the time the input-output pair is received. Conservative learning needs a few thousand examples to find the rank 7 decomposition of $M_2$, and $10^5$ for the rank 23 decomposition of $M_3$ (the lowest known). High precision is critical, especially for $M_3$, to discriminate between true decompositions and “border approximations”. |
Tasks | |
Published | 2016-01-26 |
URL | http://arxiv.org/abs/1601.07227v1 |
http://arxiv.org/pdf/1601.07227v1.pdf | |
PWC | https://paperswithcode.com/paper/a-network-that-learns-strassen-multiplication |
Repo | |
Framework | |
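For reference, the rank-7 decomposition of $M_2$ that the network is expected to rediscover is Strassen's classical scheme. Below is a minimal NumPy sketch of that decomposition, written out directly (this is not the paper's training code):

```python
import numpy as np

def strassen_2x2(A, B):
    """2x2 matrix product using Strassen's 7 multiplications, i.e. the
    rank-7 decomposition of the matrix multiplication tensor M_2."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4, m1 - m2 + m3 + m6]])

A, B = np.random.randn(2, 2), np.random.randn(2, 2)
assert np.allclose(strassen_2x2(A, B), A @ B)   # matches the ordinary product
```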
Sparse Linear Regression via Generalized Orthogonal Least-Squares
Title | Sparse Linear Regression via Generalized Orthogonal Least-Squares |
Authors | Abolfazl Hashemi, Haris Vikalo |
Abstract | Sparse linear regression, which entails finding a sparse solution to an underdetermined system of linear equations, can formally be expressed as an $l_0$-constrained least-squares problem. The Orthogonal Least-Squares (OLS) algorithm sequentially selects features (i.e., columns of the coefficient matrix) to greedily find an approximate sparse solution. In this paper, we propose a generalization of Orthogonal Least-Squares that relies on a recursive relation between the components of the optimal solution to select $L$ features at each step and solve the resulting overdetermined system of equations. Simulation results demonstrate that the generalized OLS algorithm is computationally efficient and achieves performance superior to that of existing greedy algorithms broadly used in the literature. |
Tasks | |
Published | 2016-02-22 |
URL | http://arxiv.org/abs/1602.06916v2 |
http://arxiv.org/pdf/1602.06916v2.pdf | |
PWC | https://paperswithcode.com/paper/sparse-linear-regression-via-generalized |
Repo | |
Framework | |
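A rough sketch of the greedy loop the abstract describes: pick $L$ columns per pass and refit the overdetermined least-squares problem on the growing support. The scoring rule below is a simplified stand-in, not the authors' recursive update:

```python
import numpy as np

def generalized_ols(A, y, k, L=2):
    """Select ~k columns of A, L per pass, refitting least squares each time.
    The correlation-based score is a simplification of the paper's criterion."""
    n = A.shape[1]
    support = []
    r = y.astype(float).copy()
    while len(support) < k:                        # may slightly overshoot k
        scores = np.abs(A.T @ r) / np.linalg.norm(A, axis=0)
        scores[support] = -np.inf                  # never re-pick a column
        support.extend(int(j) for j in np.argsort(scores)[-L:])
        x_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        r = y - A[:, support] @ x_s                # residual after the refit
    x = np.zeros(n)
    x[support] = x_s
    return x, sorted(support)
```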
Detangling People: Individuating Multiple Close People and Their Body Parts via Region Assembly
Title | Detangling People: Individuating Multiple Close People and Their Body Parts via Region Assembly |
Authors | Hao Jiang, Kristen Grauman |
Abstract | Today’s person detection methods work best when people are in common upright poses and appear reasonably well spaced out in the image. However, in many real images, that’s not what people do. People often appear quite close to each other, e.g., with limbs linked or heads touching, and their poses are often not pedestrian-like. We propose an approach to detangle people in multi-person images. We formulate the task as a region assembly problem. Starting from a large set of overlapping regions from body part semantic segmentation and generic object proposals, our optimization approach reassembles those pieces together into multiple person instances. It enforces that the composed body part regions of each person instance obey constraints on relative sizes, mutual spatial relationships, foreground coverage, and exclusive label assignments when overlapping. Since optimal region assembly is a challenging combinatorial problem, we present a Lagrangian relaxation method to accelerate the lower bound estimation, thereby enabling a fast branch and bound solution for the global optimum. As output, our method produces a pixel-level map indicating both 1) the body part labels (arm, leg, torso, and head), and 2) which parts belong to which individual person. Our results on three challenging datasets show our method is robust to clutter, occlusion, and complex poses. It outperforms a variety of competing methods, including existing detector CRF methods and region CNN approaches. In addition, we demonstrate its impact on a proxemics recognition task, which demands a precise representation of “whose body part is where” in crowded images. |
Tasks | Human Detection, Semantic Segmentation |
Published | 2016-04-13 |
URL | http://arxiv.org/abs/1604.03880v1 |
http://arxiv.org/pdf/1604.03880v1.pdf | |
PWC | https://paperswithcode.com/paper/detangling-people-individuating-multiple |
Repo | |
Framework | |
Improved Parsing for Argument-Clusters Coordination
Title | Improved Parsing for Argument-Clusters Coordination |
Authors | Jessica Ficler, Yoav Goldberg |
Abstract | Syntactic parsers perform poorly in prediction of Argument-Cluster Coordination (ACC). We change the PTB representation of ACC to be more suitable for learning by a statistical PCFG parser, affecting 125 trees in the training set. Training on the modified trees yields a slight improvement in EVALB scores on sections 22 and 23. The main evaluation is on a corpus of 4th grade science exams, in which ACC structures are prevalent. On this corpus, we obtain an impressive 2.7× improvement in recovering ACC structures compared to a parser trained on the original PTB trees. |
Tasks | |
Published | 2016-06-01 |
URL | http://arxiv.org/abs/1606.00294v1 |
http://arxiv.org/pdf/1606.00294v1.pdf | |
PWC | https://paperswithcode.com/paper/improved-parsing-for-argument-clusters |
Repo | |
Framework | |
Reinforcement Learning approach for Real Time Strategy Games Battle city and S3
Title | Reinforcement Learning approach for Real Time Strategy Games Battle city and S3 |
Authors | Harshit Sethy, Amit Patel |
Abstract | In this paper we propose reinforcement learning algorithms with a generalized reward function. We use the Q-learning and SARSA algorithms with this generalized reward function to train the reinforcement learning agent, and we evaluate the performance of the proposed algorithms on two real-time strategy games, BattleCity and S3. The approach has two main advantages over other work on RTS games: (1) it does not require a simulator, which is often game-specific and usually hard-coded for each type of RTS game, and (2) the system learns from interaction with any opponent and quickly adapts its strategy accordingly, without needing the human traces used in previous works. Keywords: Reinforcement learning, Machine learning, Real-time strategy, Artificial intelligence. |
Tasks | Q-Learning, Real-Time Strategy Games |
Published | 2016-02-16 |
URL | http://arxiv.org/abs/1602.04936v1 |
http://arxiv.org/pdf/1602.04936v1.pdf | |
PWC | https://paperswithcode.com/paper/reinforcement-learning-approach-for-real-time |
Repo | |
Framework | |
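The tabular Q-learning update the agents rely on is standard; a compact sketch follows. The environment interface (`reset`, `step`, `actions`) and the reward returned by `step` are assumptions standing in for the paper's game-specific state encoding and generalized reward function:

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, eps=0.1):
    """Tabular Q-learning sketch. `env` is assumed to expose Gym-style
    reset()/step() plus a finite `actions` list; the reward from step()
    stands in for the paper's generalized reward function."""
    Q = defaultdict(float)                         # Q[(state, action)]
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            if random.random() < eps:              # epsilon-greedy exploration
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda a2: Q[(s, a2)])
            s2, r, done = env.step(a)
            nxt = 0.0 if done else max(Q[(s2, a2)] for a2 in env.actions)
            Q[(s, a)] += alpha * (r + gamma * nxt - Q[(s, a)])
            s = s2
    return Q
```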
Zero-Shot Learning with Multi-Battery Factor Analysis
Title | Zero-Shot Learning with Multi-Battery Factor Analysis |
Authors | Zhong Ji, Yuzhong Xie, Yanwei Pang, Lei Chen, Zhongfei Zhang |
Abstract | Zero-shot learning (ZSL) extends the conventional image classification technique to a more challenging situation where the test image categories are not seen in the training samples. Most studies on ZSL utilize side information such as attributes or word vectors to bridge the relations between the seen classes and the unseen classes. However, existing approaches to ZSL typically exploit a shared space for each type of side information independently, which cannot make full use of the complementary knowledge of different types of side information. To this end, this paper presents an MBFA-ZSL approach to embed different types of side information as well as the visual feature into one shared space. Specifically, we first develop an algorithm named Multi-Battery Factor Analysis (MBFA) to build a unified semantic space, and then employ multiple types of side information in it to achieve ZSL. The closed-form solution makes MBFA-ZSL simple to implement and efficient to run on large datasets. Extensive experiments on the popular AwA, CUB, and SUN datasets show its significant superiority over the state-of-the-art approaches. |
Tasks | Image Classification, Zero-Shot Learning |
Published | 2016-06-30 |
URL | http://arxiv.org/abs/1606.09349v1 |
http://arxiv.org/pdf/1606.09349v1.pdf | |
PWC | https://paperswithcode.com/paper/zero-shot-learning-with-multi-battery-factor |
Repo | |
Framework | |
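MBFA itself is not reproduced here, but the shared-space idea with a closed-form solution can be illustrated by a generic linear baseline: a ridge map from visual features into the side-information space, with unseen classes predicted by nearest prototype. All names below are illustrative:

```python
import numpy as np

def fit_shared_space(X, S, lam=1.0):
    """Closed-form ridge map W from visual features X (n x d) to side
    information S (n x k). A generic shared-space baseline illustrating
    the idea, not the MBFA algorithm itself."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ S)

def predict_unseen(W, x, S_unseen):
    """Embed a test feature vector and return the index of the nearest
    unseen-class prototype (rows of S_unseen are per-class vectors)."""
    z = x @ W
    return int(np.argmin(np.linalg.norm(S_unseen - z, axis=1)))
```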
Recalling Holistic Information for Semantic Segmentation
Title | Recalling Holistic Information for Semantic Segmentation |
Authors | Hexiang Hu, Zhiwei Deng, Guang-tong Zhou, Fei Sha, Greg Mori |
Abstract | Semantic segmentation requires a detailed labeling of image pixels by object category. Information derived from local image patches is necessary to describe the detailed shape of individual objects. However, this information is ambiguous and can result in noisy labels. Global inference of image content can instead capture the general semantic concepts present. We advocate that high-recall holistic inference of image concepts provides valuable information for detailed pixel labeling. We build a two-stream neural network architecture that facilitates information flow from holistic information to local pixels, while keeping common image features shared among the low-level layers of both the holistic analysis and segmentation branches. We empirically evaluate our network on four standard semantic segmentation datasets. Our network obtains state-of-the-art performance on PASCAL-Context and NYUDv2, and ablation studies verify its effectiveness on ADE20K and SIFT-Flow. |
Tasks | Semantic Segmentation |
Published | 2016-11-24 |
URL | http://arxiv.org/abs/1611.08061v1 |
http://arxiv.org/pdf/1611.08061v1.pdf | |
PWC | https://paperswithcode.com/paper/recalling-holistic-information-for-semantic |
Repo | |
Framework | |
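A schematic of the two-branch layout the abstract describes, in PyTorch: shared low-level convolutions feed both a holistic image-level branch and a pixel-labeling branch, and the holistic prediction is broadcast back into the segmentation head. All layer sizes here are placeholders, not the paper's architecture:

```python
import torch
import torch.nn as nn

class TwoStream(nn.Module):
    """Schematic two-stream net: shared features, a holistic concept branch,
    and a segmentation branch conditioned on the holistic prediction."""
    def __init__(self, n_classes=59):
        super().__init__()
        self.shared = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                                    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
        self.holistic = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                      nn.Linear(64, n_classes))   # image-level concepts
        self.seg = nn.Conv2d(64 + n_classes, n_classes, 1)        # pixel labels

    def forward(self, x):
        f = self.shared(x)
        concepts = torch.sigmoid(self.holistic(f))                # high-recall concepts
        tiled = concepts[:, :, None, None].expand(-1, -1, *f.shape[2:])
        return self.seg(torch.cat([f, tiled], dim=1)), concepts
```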
A Spectral Series Approach to High-Dimensional Nonparametric Regression
Title | A Spectral Series Approach to High-Dimensional Nonparametric Regression |
Authors | Ann B. Lee, Rafael Izbicki |
Abstract | A key question in modern statistics is how to make fast and reliable inferences for complex, high-dimensional data. While there has been much interest in sparse techniques, current methods do not generalize well to data with nonlinear structure. In this work, we present an orthogonal series estimator for predictors that are complex aggregate objects, such as natural images, galaxy spectra, trajectories, and movies. Our series approach ties together ideas from kernel machine learning and Fourier methods. We expand the unknown regression function on the data in terms of the eigenfunctions of a kernel-based operator, and we take advantage of orthogonality of the basis with respect to the underlying data distribution, P, to speed up computations and the tuning of parameters. If the kernel is appropriately chosen, then the eigenfunctions adapt to the intrinsic geometry and dimension of the data. We provide theoretical guarantees for a radial kernel with varying bandwidth, and we relate smoothness of the regression function with respect to P to sparsity in the eigenbasis. Finally, using simulated and real-world data, we systematically compare the performance of the spectral series approach with classical kernel smoothing, k-nearest neighbors regression, kernel ridge regression, and state-of-the-art manifold and local regression methods. |
Tasks | |
Published | 2016-02-01 |
URL | http://arxiv.org/abs/1602.00355v1 |
http://arxiv.org/pdf/1602.00355v1.pdf | |
PWC | https://paperswithcode.com/paper/a-spectral-series-approach-to-high |
Repo | |
Framework | |
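A minimal version of the spectral series estimator with a Gaussian radial kernel: expand on the top eigenvectors of the kernel matrix and extend to new points with the Nystrom formula. The bandwidth `h` and number of terms `J` are the tuning parameters; this is a sketch of the general recipe, not the paper's implementation:

```python
import numpy as np

def fit_spectral_series(X, y, h=1.0, J=20):
    """Fit f(x) = sum_j beta_j psi_j(x) with psi_j the top-J kernel
    eigenfunctions. Assumes the top-J eigenvalues are well above zero."""
    K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / (2 * h**2))
    vals, vecs = np.linalg.eigh(K)
    vals, vecs = vals[::-1][:J], vecs[:, ::-1][:, :J]   # top-J eigenpairs
    coeffs = vecs.T @ y / len(y)                        # series coefficients
    return X, vals, vecs, coeffs, h

def predict_spectral_series(model, Xnew):
    X, vals, vecs, coeffs, h = model
    Kx = np.exp(-((Xnew[:, None, :] - X[None, :, :]) ** 2).sum(-1) / (2 * h**2))
    Psi = len(X) * (Kx @ vecs) / vals                   # Nystrom extension
    return Psi @ coeffs
```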
PCANet: An energy perspective
Title | PCANet: An energy perspective |
Authors | Jiasong Wu, Shijie Qiu, Youyong Kong, Longyu Jiang, Lotfi Senhadji, Huazhong Shu |
Abstract | The principal component analysis network (PCANet), one of the recently proposed deep learning architectures, achieves state-of-the-art classification accuracy on various databases. However, an explanation of why PCANet works has been lacking. In this paper, we try to explain why PCANet works well from an energy perspective, based on a set of experiments. The impact of various parameters on the error rate of PCANet is analyzed in depth. We find that this error rate is correlated with the logarithm of the image energy. The proposed energy explanation can also be used as a testing method for checking whether every step of a constructed network is necessary. |
Tasks | |
Published | 2016-03-03 |
URL | http://arxiv.org/abs/1603.00944v1 |
http://arxiv.org/pdf/1603.00944v1.pdf | |
PWC | https://paperswithcode.com/paper/pcanet-an-energy-perspective |
Repo | |
Framework | |
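The quantity the correlation is reported against is simple to compute; a sketch, assuming "energy" means the sum of squared pixel intensities in the usual signal-processing sense:

```python
import numpy as np

def log_image_energy(img):
    """Log of the image energy; 'energy' is assumed here to be the sum of
    squared pixel intensities (an assumption, not the paper's definition)."""
    return np.log((np.asarray(img, dtype=float) ** 2).sum())

# Checking the reported relation then reduces to a correlation such as
# np.corrcoef(log_energies, error_rates)[0, 1] over a set of test images.
```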
Connectome Smoothing via Low-rank Approximations
Title | Connectome Smoothing via Low-rank Approximations |
Authors | Runze Tang, Michael Ketcha, Alexandra Badea, Evan D. Calabrese, Daniel S. Margulies, Joshua T. Vogelstein, Carey E. Priebe, Daniel L. Sussman |
Abstract | In statistical connectomics, the quantitative study of brain networks, estimating the mean of a population of graphs based on a sample is a core problem. Often, this problem is especially difficult because the sample or cohort size is relatively small, sometimes even a single subject. While using the element-wise sample mean of the adjacency matrices is a common approach, this method does not exploit any underlying structural properties of the graphs. We propose using a low-rank method which incorporates tools for dimension selection and diagonal augmentation to smooth the estimates and improve performance over the naive methodology for small sample sizes. Theoretical results for the stochastic blockmodel show that this method offers major improvements when there are many vertices. Similarly, we demonstrate that the low-rank methods outperform the standard sample mean for a variety of independent edge distributions as well as human connectome data derived from magnetic resonance imaging, especially when sample sizes are small. Moreover, the low-rank methods yield “eigen-connectomes”, which correlate with the lobe-structure of the human brain and superstructures of the mouse brain. These results indicate that low-rank methods are an important part of the tool box for researchers studying populations of graphs in general, and statistical connectomics in particular. |
Tasks | |
Published | 2016-09-06 |
URL | http://arxiv.org/abs/1609.01672v3 |
http://arxiv.org/pdf/1609.01672v3.pdf | |
PWC | https://paperswithcode.com/paper/connectome-smoothing-via-low-rank |
Repo | |
Framework | |
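A sketch of the low-rank smoothing step: average the adjacency matrices, augment the diagonal, and truncate the eigendecomposition. Dimension selection is taken as a given `rank` here rather than estimated as in the paper, and the diagonal-augmentation rule is one common variant:

```python
import numpy as np

def low_rank_mean(graphs, rank):
    """Smooth the element-wise mean adjacency matrix of a list/array of
    n x n adjacency matrices with a rank-truncated eigendecomposition."""
    Abar = np.mean(graphs, axis=0)                      # naive sample mean
    n = Abar.shape[0]
    Aaug = Abar + np.diag(Abar.sum(axis=1) / (n - 1))   # diagonal augmentation
    vals, vecs = np.linalg.eigh(Aaug)
    idx = np.argsort(np.abs(vals))[::-1][:rank]         # top-|eigenvalue| pairs
    return (vecs[:, idx] * vals[idx]) @ vecs[:, idx].T
```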
Regression-based reduced-order models to predict transient thermal output for enhanced geothermal systems
Title | Regression-based reduced-order models to predict transient thermal output for enhanced geothermal systems |
Authors | M. K. Mudunuru, S. Karra, D. R. Harp, G. D. Guthrie, H. S. Viswanathan |
Abstract | The goal of this paper is to assess the utility of Reduced-Order Models (ROMs) developed from 3D physics-based models for predicting transient thermal power output for an enhanced geothermal reservoir while explicitly accounting for uncertainties in the subsurface system and site-specific details. Numerical simulations are performed based on Latin Hypercube Sampling (LHS) of model inputs drawn from uniform probability distributions. Key sensitive parameters are identified from these simulations: fracture zone permeability, well/skin factor, bottom hole pressure, and injection flow rate. The inputs for the ROMs are based on these key sensitive parameters. The ROMs are then used to evaluate the influence of subsurface attributes on thermal power production curves. The resulting ROMs are compared with field data and the detailed physics-based numerical simulations. We propose three different ROMs with different levels of model parsimony, each describing key and essential features of the power production curves. ROM-1 is able to accurately reproduce the power output of numerical simulations for low values of permeability and certain features of the field-scale data, and is relatively parsimonious. ROM-2 is a more complex model than ROM-1, but it accurately describes the field data. At higher permeabilities, ROM-2 reproduces numerical results better than ROM-1; however, there is considerable deviation at low fracture zone permeabilities. ROM-3 is developed by taking the best aspects of ROM-1 and ROM-2 and provides a middle ground for model parsimony. It is able to describe various features of the numerical simulations and field data. Through the proposed workflow, we demonstrate that these simple ROMs are able to capture various complex features of the power production curves of the Fenton Hill HDR system. For typical EGS applications, ROM-2 and ROM-3 outperform ROM-1. |
Tasks | |
Published | 2016-06-14 |
URL | http://arxiv.org/abs/1606.04567v3 |
http://arxiv.org/pdf/1606.04567v3.pdf | |
PWC | https://paperswithcode.com/paper/regression-based-reduced-order-models-to |
Repo | |
Framework | |
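The sampling stage is plain Latin Hypercube Sampling over the four sensitive inputs named in the abstract. A sketch with SciPy; the bounds below are placeholders, since the true ranges are site-specific:

```python
from scipy.stats import qmc

# Placeholder bounds (assumed, not from the paper):
# [fracture zone permeability (m^2), well/skin factor,
#  bottom hole pressure (Pa), injection flow rate (kg/s)]
bounds_low = [1e-16, 0.1, 10.0e6, 10.0]
bounds_high = [1e-13, 10.0, 30.0e6, 100.0]

sampler = qmc.LatinHypercube(d=4, seed=0)
samples = qmc.scale(sampler.random(n=200), bounds_low, bounds_high)
# Each row of `samples` is one input set for a forward reservoir simulation.
```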
Solving General Arithmetic Word Problems
Title | Solving General Arithmetic Word Problems |
Authors | Subhro Roy, Dan Roth |
Abstract | This paper presents a novel approach to automatically solving arithmetic word problems. This is the first algorithmic approach that can handle arithmetic problems with multiple steps and operations, without depending on additional annotations or predefined templates. We develop a theory for expression trees that can be used to represent and evaluate the target arithmetic expressions; we use it to uniquely decompose the target arithmetic problem into multiple classification problems; we then compose an expression tree, combining these with world knowledge through a constrained inference framework. Our classifiers gain from the use of *quantity schemas*, which support better extraction of features. Experimental results show that our method outperforms existing systems, achieving state-of-the-art performance on benchmark datasets of arithmetic word problems. |
Tasks | |
Published | 2016-08-04 |
URL | http://arxiv.org/abs/1608.01413v2 |
http://arxiv.org/pdf/1608.01413v2.pdf | |
PWC | https://paperswithcode.com/paper/solving-general-arithmetic-word-problems |
Repo | |
Framework | |
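The target representation is an expression tree over quantities extracted from the text; evaluating one is straightforward. A toy sketch (the paper's contribution is predicting the tree via classification and constrained inference, not evaluating it):

```python
import operator

OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}

def evaluate(node):
    """Evaluate an expression tree whose leaves are quantities from the
    problem text and whose internal nodes are arithmetic operations."""
    if isinstance(node, (int, float)):
        return node
    op, left, right = node
    return OPS[op](evaluate(left), evaluate(right))

# "Tom had 5 apples and bought 2 bags of 3 each" -> 5 + 2*3
tree = ('+', 5, ('*', 2, 3))
assert evaluate(tree) == 11
```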
Less than a Single Pass: Stochastically Controlled Stochastic Gradient Method
Title | Less than a Single Pass: Stochastically Controlled Stochastic Gradient Method |
Authors | Lihua Lei, Michael I. Jordan |
Abstract | We develop and analyze a procedure for gradient-based optimization that we refer to as stochastically controlled stochastic gradient (SCSG). As a member of the SVRG family of algorithms, SCSG makes use of gradient estimates at two scales, with the number of updates at the faster scale being governed by a geometric random variable. Unlike most existing algorithms in this family, both the computation cost and the communication cost of SCSG do not necessarily scale linearly with the sample size $n$; indeed, these costs are independent of $n$ when the target accuracy is low. An experimental evaluation on real datasets confirms the effectiveness of SCSG. |
Tasks | |
Published | 2016-09-12 |
URL | https://arxiv.org/abs/1609.03261v3 |
https://arxiv.org/pdf/1609.03261v3.pdf | |
PWC | https://paperswithcode.com/paper/less-than-a-single-pass-stochastically |
Repo | |
Framework | |
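A simplified sketch of the SCSG loop: each outer step anchors on a small-batch gradient and runs an SVRG-style inner loop whose length is drawn from a geometric distribution with mean roughly the batch size. The step size, inner-loop sampling, and exact geometric parameterization are simplifications of the paper's choices:

```python
import numpy as np

def scsg(grad, x0, n, b=64, eta=0.05, outer=50, seed=0):
    """SCSG sketch. `grad(x, idx)` is assumed to return the average gradient
    over the data points indexed by `idx` (data access left to the caller)."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(outer):
        batch = rng.choice(n, size=b, replace=False)
        mu = grad(x, batch)                       # anchor gradient on the batch
        y = x.copy()
        for _ in range(rng.geometric(1.0 / b)):   # N ~ Geometric, mean ~ b
            i = rng.choice(batch, size=1)
            y -= eta * (grad(y, i) - grad(x, i) + mu)   # variance-reduced step
        x = y
    return x
```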
Efficient Optimization for Rank-based Loss Functions
Title | Efficient Optimization for Rank-based Loss Functions |
Authors | Pritish Mohapatra, Michal Rolinek, C. V. Jawahar, Vladimir Kolmogorov, M. Pawan Kumar |
Abstract | The accuracy of information retrieval systems is often measured using complex loss functions such as the average precision (AP) or the normalized discounted cumulative gain (NDCG). Given a set of positive and negative samples, the parameters of a retrieval system can be estimated by minimizing these loss functions. However, the non-differentiability and non-decomposability of these loss functions do not allow for simple gradient-based optimization algorithms. This issue is generally circumvented by either optimizing a structured hinge-loss upper bound to the loss function or by using asymptotic methods like the direct-loss minimization framework. Yet, the high computational complexity of loss-augmented inference, which is necessary for both frameworks, prohibits their use on large training data sets. To alleviate this deficiency, we present a novel quicksort-flavored algorithm for a large class of non-decomposable loss functions. We provide a complete characterization of the loss functions that are amenable to our algorithm, and show that it includes both AP- and NDCG-based loss functions. Furthermore, we prove that no comparison-based algorithm can asymptotically improve upon the computational complexity of our approach. We demonstrate the effectiveness of our approach in the context of optimizing the structured hinge-loss upper bound of the AP and NDCG losses for learning models for a variety of vision tasks. We show that our approach provides significantly better results than simpler decomposable loss functions, while requiring comparable training time. |
Tasks | Information Retrieval |
Published | 2016-04-27 |
URL | http://arxiv.org/abs/1604.08269v3 |
http://arxiv.org/pdf/1604.08269v3.pdf | |
PWC | https://paperswithcode.com/paper/efficient-optimization-for-rank-based-loss |
Repo | |
Framework | |
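The AP loss at the center of the paper, computed directly for one ranking; this is the non-decomposable quantity whose structured hinge upper bound the quicksort-flavored algorithm optimizes:

```python
import numpy as np

def average_precision(scores, labels):
    """AP of a ranking: mean of precision@k over the ranks of the positives."""
    order = np.argsort(-scores)
    hits = labels[order] == 1
    ranks = np.flatnonzero(hits) + 1                 # 1-based ranks of positives
    precisions = np.cumsum(hits)[ranks - 1] / ranks
    return precisions.mean()

scores = np.array([0.9, 0.3, 0.8, 0.1])
labels = np.array([1, 0, 0, 1])
print(average_precision(scores, labels))             # 0.75
```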
The Mondrian Kernel
Title | The Mondrian Kernel |
Authors | Matej Balog, Balaji Lakshminarayanan, Zoubin Ghahramani, Daniel M. Roy, Yee Whye Teh |
Abstract | We introduce the Mondrian kernel, a fast random feature approximation to the Laplace kernel. It is suitable for both batch and online learning, and admits a fast kernel-width-selection procedure as the random features can be re-used efficiently for all kernel widths. The features are constructed by sampling trees via a Mondrian process [Roy and Teh, 2009], and we highlight the connection to Mondrian forests [Lakshminarayanan et al., 2014], where trees are also sampled via a Mondrian process, but fit independently. This link provides a new insight into the relationship between kernel methods and random forests. |
Tasks | |
Published | 2016-06-16 |
URL | http://arxiv.org/abs/1606.05241v1 |
http://arxiv.org/pdf/1606.05241v1.pdf | |
PWC | https://paperswithcode.com/paper/the-mondrian-kernel |
Repo | |
Framework | |
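A compact sketch of sampling one Mondrian partition over the data's bounding box; one-hot leaf indicators from many such trees give random features whose inner product approximates the Laplace kernel $\exp(-\lambda \lVert x - x'\rVert_1)$. This is a simplified reading of the construction, not the authors' code:

```python
import numpy as np

def mondrian_leaf_ids(X, lam, seed=0):
    """Sample one Mondrian partition (lifetime `lam`) restricted to the data
    and return the leaf-cell id of each point. Simplification: cells with
    fewer than two points are never split."""
    rng = np.random.default_rng(seed)
    ids = np.empty(len(X), dtype=int)
    stack, next_id = [(np.arange(len(X)), 0.0)], 0
    while stack:
        idx, t = stack.pop()
        lo, hi = X[idx].min(axis=0), X[idx].max(axis=0)
        size = (hi - lo).sum()                        # linear dimension of the cell
        t_split = t + (rng.exponential(1.0 / size) if size > 0 else np.inf)
        if t_split > lam or len(idx) < 2:             # cell survives to the lifetime
            ids[idx] = next_id
            next_id += 1
            continue
        d = rng.choice(len(lo), p=(hi - lo) / size)   # split dim ~ side length
        cut = rng.uniform(lo[d], hi[d])
        left = idx[X[idx, d] <= cut]
        stack += [(left, t_split), (np.setdiff1d(idx, left), t_split)]
    return ids
```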