January 28, 2020

Paper Group ANR 941

An Intrinsically-Motivated Approach for Learning Highly Exploring and Fast Mixing Policies. Complexity of Linear Regions in Deep Networks. Learning Representations from Persian Handwriting for Offline Signature Verification, a Deep Transfer Learning Approach. Deriving a Quantitative Relationship Between Resolution and Human Classification Error. Py …

An Intrinsically-Motivated Approach for Learning Highly Exploring and Fast Mixing Policies


Title	An Intrinsically-Motivated Approach for Learning Highly Exploring and Fast Mixing Policies
Authors	Mirco Mutti, Marcello Restelli
Abstract	What is a good exploration strategy for an agent that interacts with an environment in the absence of external rewards? Ideally, we would like to get a policy driving towards a uniform state-action visitation (highly exploring) in a minimum number of steps (fast mixing), in order to ease efficient learning of any goal-conditioned policy later on. Unfortunately, it is remarkably arduous to directly learn an optimal policy of this nature. In this paper, we propose a novel surrogate objective for learning highly exploring and fast mixing policies, which focuses on maximizing a lower bound to the entropy of the steady-state distribution induced by the policy. In particular, we introduce three novel lower bounds, which lead to as many optimization problems and trade off theoretical guarantees against computational complexity. Then, we present a model-based reinforcement learning algorithm, IDE$^{3}$AL, to learn an optimal policy according to the introduced objective. Finally, we provide an empirical evaluation of this algorithm on a set of hard-exploration tasks.
Tasks
Published	2019-07-10
URL	https://arxiv.org/abs/1907.04662v2
PDF	https://arxiv.org/pdf/1907.04662v2.pdf
PWC	https://paperswithcode.com/paper/an-intrinsically-motivated-approach-for
Repo
Framework
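
As a rough illustration of the quantity this abstract targets, the sketch below computes the entropy of the steady-state distribution of a small Markov chain. It is not the IDE$^{3}$AL algorithm and does not implement the paper's lower bounds; the transition matrices, function names, and the use of state (rather than state-action) visitation are assumptions made for the example.

```python
import math

def stationary_distribution(P, iters=10_000, tol=1e-12):
    """Power iteration for the steady-state distribution of a row-stochastic matrix P.

    Assumes the chain is ergodic, so the iteration converges to a unique distribution.
    """
    n = len(P)
    d = [1.0 / n] * n
    for _ in range(iters):
        new = [sum(d[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, d)) < tol:
            return new
        d = new
    return d

def entropy(d):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in d if p > 0)

# Hypothetical 3-state chains induced by two different policies.
# P_uniform is doubly stochastic, so its steady state is uniform (entropy log 3).
P_uniform = [[0.5, 0.5, 0.0],
             [0.0, 0.5, 0.5],
             [0.5, 0.0, 0.5]]
# P_sticky lingers in state 0, so its steady state is less uniform.
P_sticky = [[0.9, 0.1, 0.0],
            [0.0, 0.5, 0.5],
            [0.5, 0.0, 0.5]]

if __name__ == "__main__":
    for name, P in [("uniform", P_uniform), ("sticky", P_sticky)]:
        d = stationary_distribution(P)
        print(name, [round(p, 4) for p in d], round(entropy(d), 4))
```

A policy whose induced chain has higher steady-state entropy visits states more evenly; the abstract's objective additionally asks that the chain reach that distribution quickly, which this sketch does not measure.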

Complexity of Linear Regions in Deep Networks


Title	Complexity of Linear Regions in Deep Networks
Authors	Boris Hanin, David Rolnick
Abstract	It is well-known that the expressivity of a neural network depends on its architecture, with deeper networks expressing more complex functions. In the case of networks that compute piecewise linear functions, such as those with ReLU activation, the number of distinct linear regions is a natural measure of expressivity. It is possible to construct networks with merely a single region, or for which the number of linear regions grows exponentially with depth; it is not clear where within this range most networks fall in practice, either before or after training. In this paper, we provide a mathematical framework to count the number of linear regions of a piecewise linear network and measure the volume of the boundaries between these regions. In particular, we prove that for networks at initialization, the average number of regions along any one-dimensional subspace grows linearly in the total number of neurons, far below the exponential upper bound. We also find that the average distance to the nearest region boundary at initialization scales like the inverse of the number of neurons. Our theory suggests that, even after training, the number of linear regions is far below exponential, an intuition that matches our empirical observations. We conclude that the practical expressivity of neural networks is likely far below that of the theoretical maximum, and that this gap can be quantified.
Tasks
Published	2019-01-25
URL	https://arxiv.org/abs/1901.09021v2
PDF	https://arxiv.org/pdf/1901.09021v2.pdf
PWC	https://paperswithcode.com/paper/complexity-of-linear-regions-in-deep-networks
Repo
Framework
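
The idea of counting regions along a one-dimensional path can be sketched numerically. The code below is a grid-based estimate, not the paper's analytical framework: it initializes a small ReLU network at random and counts how often the activation pattern changes along a line segment. The He-style weight scale, the bias scale, the layer widths, and all function names are assumptions for illustration, and a finite grid can miss very thin regions.

```python
import random

def init_network(widths, rng):
    """Random ReLU layers; widths = [input_dim, hidden_1, hidden_2, ...]."""
    layers = []
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        std = (2.0 / n_in) ** 0.5  # assumed He-style scaling
        W = [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]
        b = [rng.gauss(0.0, 0.1) for _ in range(n_out)]  # assumed bias scale
        layers.append((W, b))
    return layers

def activation_pattern(layers, x):
    """Tuple of on/off states of every ReLU neuron at input x."""
    pattern = []
    h = x
    for W, b in layers:
        pre = [sum(w * v for w, v in zip(row, h)) + bi for row, bi in zip(W, b)]
        pattern.extend(p > 0 for p in pre)
        h = [max(p, 0.0) for p in pre]
    return tuple(pattern)

def count_regions_on_segment(layers, start, end, steps=2000):
    """Count activation-pattern changes along the segment from start to end."""
    count = 1
    prev = activation_pattern(layers, start)
    for k in range(1, steps + 1):
        t = k / steps
        x = [s + t * (e - s) for s, e in zip(start, end)]
        cur = activation_pattern(layers, x)
        if cur != prev:
            count += 1
            prev = cur
    return count

if __name__ == "__main__":
    rng = random.Random(0)
    for widths in ([2, 8], [2, 8, 8], [2, 8, 8, 8]):
        net = init_network(widths, rng)
        neurons = sum(widths[1:])
        regions = count_regions_on_segment(net, [-3.0, -3.0], [3.0, 3.0])
        print(f"{neurons} neurons -> about {regions} regions on the segment")
```

Comparing the printed counts with the neuron totals gives a quick, informal sense of the growth rate discussed in the abstract; a careful experiment would average over many initializations and segments.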

Learning Representations from Persian Handwriting for Offline Signature Verification, a Deep Transfer Learning Approach


Title	Learning Representations from Persian Handwriting for Offline Signature Verification, a Deep Transfer Learning Approach
Authors	Omid Mersa, Farhood Etaati, Saeed Masoudnia, Babak N. Araabi
Abstract	Offline Signature Verification (OSV) is a challenging pattern recognition task, especially when it is expected to generalize well on skilled forgeries that are not available during training. Its challenges also include small training samples and large intra-class variations. Considering these limitations, we suggest a novel transfer learning approach from the Persian handwriting domain to the multi-language OSV domain. We train two Residual CNNs on the source domain separately based on two different tasks of word classification and writer identification. Since identifying a person's signature resembles identifying one's handwriting, it seems perfectly convenient to use handwriting for the feature learning phase. The learned representation on the more varied and plentiful handwriting dataset can compensate for the lack of training data in the original task, i.e. OSV, without sacrificing the generalizability. Our proposed OSV system includes two steps: learning representation and verification of the input signature. For the first step, the signature images are fed into the trained Residual CNNs. The output representations are then used to train SVMs for the verification. We test our OSV system on three different signature datasets, including MCYT (a Spanish signature dataset), UTSig (a Persian one) and GPDS-Synthetic (an artificial dataset). On UTSig, we achieved 9.80% Equal Error Rate (EER) which showed substantial improvement over the best EER in the literature, 17.45%. Our proposed method surpassed the state of the art by 6% on GPDS-Synthetic, achieving 6.81%. On MCYT, EER of 3.98% was obtained which is comparable to the best previously reported results.
Tasks	Transfer Learning
Published	2019-02-28
URL	http://arxiv.org/abs/1903.06249v1
PDF	http://arxiv.org/pdf/1903.06249v1.pdf
PWC	https://paperswithcode.com/paper/learning-representations-from-persian
Repo
Framework

Deriving a Quantitative Relationship Between Resolution and Human Classification Error


Title	Deriving a Quantitative Relationship Between Resolution and Human Classification Error
Authors	Josiah I. Clark, Caroline A. Clark
Abstract	For machine learning perception problems, human-level classification performance is used as an estimate of top algorithm performance. Thus, it is important to understand as precisely as possible the factors that impact human-level performance. Knowing this 1) provides a benchmark for model performance, 2) tells a project manager what type of data to obtain for human labelers in order to get accurate labels, and 3) enables ground-truth analysis–largely conducted by humans–to be carried out smoothly. In this empirical study, we explored the relationship between resolution and human classification performance using the MNIST data set down-sampled to various resolutions. The quantitative heuristic we derived could prove useful for predicting machine model performance, predicting data storage requirements, and saving valuable resources in the deployment of machine learning projects. It also has the potential to be used in a wide variety of fields such as remote sensing, medical imaging, scientific imaging, and astronomy.
Tasks
Published	2019-08-24
URL	https://arxiv.org/abs/1908.09183v1
PDF	https://arxiv.org/pdf/1908.09183v1.pdf
PWC	https://paperswithcode.com/paper/deriving-a-quantitative-relationship-between
Repo
Framework

PyDEns: a Python Framework for Solving Differential Equations with Neural Networks


Title	PyDEns: a Python Framework for Solving Differential Equations with Neural Networks
Authors	Alexander Koryagin, Roman Khudorozkov, Sergey Tsimfer
Abstract	Recently, many papers have proposed using neural networks to approximately solve partial differential equations (PDEs). Yet, there has been a lack of a flexible framework for convenient experimentation. In an attempt to fill the gap, we introduce the PyDEns module, open-sourced on GitHub. Coupled with the capabilities of BatchFlow, an open-source framework for convenient and reproducible deep learning, the PyDEns module allows one to 1) solve partial differential equations from a large family, including the heat equation and the wave equation; 2) easily search for the best neural-network architecture among a zoo that includes ResNet and DenseNet; 3) fully control the process of model training by testing different point-sampling schemes. With that in mind, our main contribution goes as follows: implementation of a ready-to-use and open-source numerical solver of PDEs of a novel format, based on neural networks.
Tasks
Published	2019-09-25
URL	https://arxiv.org/abs/1909.11544v1
PDF	https://arxiv.org/pdf/1909.11544v1.pdf
PWC	https://paperswithcode.com/paper/pydens-a-python-framework-for-solving
Repo
Framework

Bridging the Gap between Semantics and Multimedia Processing


Title	Bridging the Gap between Semantics and Multimedia Processing
Authors	Marcio Ferreira Moreno, Guilherme Lima, Rodrigo Costa Mesquita Santos, Roberto Azevedo, Markus Endler
Abstract	In this paper, we give an overview of the semantic gap problem in multimedia and discuss how machine learning and symbolic AI can be combined to narrow this gap. We describe the gap in terms of a classical architecture for multimedia processing and discuss a structured approach to bridge it. This approach combines machine learning (for mapping signals to objects) and symbolic AI (for linking objects to meanings). Our main goal is to raise awareness and discuss the challenges involved in this structured approach to multimedia understanding, especially in the view of the latest developments in machine learning and symbolic AI.
Tasks
Published	2019-11-25
URL	https://arxiv.org/abs/1911.11631v2
PDF	https://arxiv.org/pdf/1911.11631v2.pdf
PWC	https://paperswithcode.com/paper/bridging-the-gap-between-semantics-and
Repo
Framework

MMF: Attribute Interpretable Collaborative Filtering


Title	MMF: Attribute Interpretable Collaborative Filtering
Authors	Yixin Su, Sarah Monazam Erfani, Rui Zhang
Abstract	Collaborative filtering is one of the most popular techniques in designing recommendation systems, and its most representative model, matrix factorization, has been widely used by researchers and the industry. However, this model suffers from the lack of interpretability and the item cold-start problem, which limit its reliability and practicability. In this paper, we propose an interpretable recommendation model called Multi-Matrix Factorization (MMF), which addresses these two limitations and achieves the state-of-the-art prediction accuracy by exploiting common attributes that are present in different items. In the model, predicted item ratings are regarded as weighted aggregations of attribute ratings generated by the inner product of the user latent vectors and the attribute latent vectors. MMF provides more fine-grained analyses than matrix factorization in the following ways: attribute ratings with weights allow the understanding of how much each attribute contributes to the recommendation and hence provide interpretability; the common attributes can act as a link between existing and new items, which solves the item cold-start problem when no rating exists on an item. We evaluate the interpretability of MMF comprehensively, and conduct extensive experiments on real datasets to show that MMF outperforms state-of-the-art baselines in terms of accuracy.
Tasks	Recommendation Systems
Published	2019-08-03
URL	https://arxiv.org/abs/1908.01099v1
PDF	https://arxiv.org/pdf/1908.01099v1.pdf
PWC	https://paperswithcode.com/paper/mmf-attribute-interpretable-collaborative
Repo
Framework
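
The rating rule described in this abstract, an item rating formed as a weighted aggregation of attribute ratings, each an inner product of a user vector and an attribute vector, can be sketched as follows. This is an illustrative reading of the abstract only, not the authors' model or training procedure; the latent vectors, attribute names, weights, and function names are all made up for the example.

```python
def attribute_ratings(user_vec, attr_vecs):
    """Per-attribute rating: inner product of user and attribute latent vectors."""
    return {name: sum(u * a for u, a in zip(user_vec, vec))
            for name, vec in attr_vecs.items()}

def predict_item_rating(user_vec, attr_vecs, weights):
    """Item rating as a weighted sum of the ratings of the item's attributes.

    `weights` maps each attribute of the item to its (assumed) weight.
    Returns the prediction together with each attribute's contribution,
    which is what makes the prediction inspectable.
    """
    ratings = attribute_ratings(user_vec, attr_vecs)
    contributions = {name: w * ratings[name] for name, w in weights.items()}
    return sum(contributions.values()), contributions

# Hypothetical 2-dimensional latent space.
user = [1.0, 2.0]
attributes = {"action": [0.5, 1.0], "comedy": [1.0, 0.0]}
item_weights = {"action": 0.75, "comedy": 0.25}

if __name__ == "__main__":
    rating, parts = predict_item_rating(user, attributes, item_weights)
    print(rating, parts)
```

Because a new item only needs weights over attributes that already have latent vectors, the same function can score an item with no ratings, which is the cold-start argument made in the abstract.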

$\mathtt{MedGraph:}$ Structural and Temporal Representation Learning of Electronic Medical Records


Title	$\mathtt{MedGraph:}$ Structural and Temporal Representation Learning of Electronic Medical Records
Authors	Bhagya Hettige, Yuan-Fang Li, Weiqing Wang, Suong Le, Wray Buntine
Abstract	Electronic medical record (EMR) data contains historical sequences of visits of patients, and each visit contains rich information, such as patient demographics, hospital utilisation and medical codes, including diagnosis, procedure and medication codes. Most existing EMR embedding methods capture visit-code associations by constructing input visit representations as binary vectors with a static vocabulary of medical codes. With this limited representation, they fail in encapsulating rich attribute information of visits (demographics and utilisation information) and/or codes (e.g., medical code descriptions). Furthermore, current work considers visits of the same patient as discrete-time events and ignores time gaps between them. However, the time gaps between visits depict dynamics of the patient’s medical history inducing varying influences on future visits. To address these limitations, we present $\mathtt{MedGraph}$, a supervised EMR embedding method that captures two types of information: (1) the visit-code associations in an attributed bipartite graph, and (2) the temporal sequencing of visits through a point process. $\mathtt{MedGraph}$ produces Gaussian embeddings for visits and codes to model the uncertainty. We evaluate the performance of $\mathtt{MedGraph}$ through an extensive experimental study and show that $\mathtt{MedGraph}$ outperforms state-of-the-art EMR embedding methods in several medical risk prediction tasks.
Tasks	Point Processes, Representation Learning
Published	2019-12-08
URL	https://arxiv.org/abs/1912.03703v2
PDF	https://arxiv.org/pdf/1912.03703v2.pdf
PWC	https://paperswithcode.com/paper/mathttmedgraph-structural-and-temporal
Repo
Framework

Graph Neural Networks for Decentralized Multi-Robot Path Planning


Title	Graph Neural Networks for Decentralized Multi-Robot Path Planning
Authors	Qingbiao Li, Fernando Gama, Alejandro Ribeiro, Amanda Prorok
Abstract	Efficient and collision-free navigation in multi-robot systems is fundamental to advancing mobility. Scenarios where the robots are restricted in observation and communication range call for decentralized solutions, whereby robots execute localized planning policies. From the point of view of an individual robot, however, its local decision-making system is incomplete, since other agents’ unobservable states affect future values. The manner in which information is shared is crucial to the system’s performance, yet is not well addressed by current approaches. To address these challenges, we propose a combined architecture, with the goal of learning a decentralized sequential action policy that yields efficient path plans for all robots. Our framework is composed of a convolutional neural network (CNN) that extracts adequate features from local observations, and a graph neural network (GNN) that communicates these features among robots. We train the model to imitate an expert algorithm, and use the resulting model online in decentralized planning involving only local communication. We evaluate our method in simulations involving teams of robots in cluttered workspaces. We measure the success rates and sum of costs over the planned paths. The results show a performance close to that of our expert algorithm, demonstrating the validity of our approach. In particular, we show our model’s capability to generalize to previously unseen cases (involving larger environments and larger robot teams).
Tasks	Decision Making
Published	2019-12-12
URL	https://arxiv.org/abs/1912.06095v1
PDF	https://arxiv.org/pdf/1912.06095v1.pdf
PWC	https://paperswithcode.com/paper/graph-neural-networks-for-decentralized-multi
Repo
Framework

Towards Regulated Deep Learning


Title	Towards Regulated Deep Learning
Authors	Andrés García-Camino
Abstract	Regulation of Multi-Agent Systems (MAS) was a research topic of the past decade and one of these proposals was Electronic Institutions. However, with the recent reformulation of Artificial Neural Networks (ANN) as Deep Learning (DL), Security, Privacy, Ethical and Legal issues regarding the use of DL have raised concerns in the Artificial Intelligence (AI) Community. Now that the Regulation of MAS is almost correctly addressed, we propose the Regulation of ANN as Agent-based Training of a special type of regulated ANN that we call Institutional Neural Network. This paper introduces the former concept and provides $\mathcal{I}$, a language previously used to model and extend Electronic Institutions, as a means to implement and regulate DL.
Tasks
Published	2019-12-31
URL	https://arxiv.org/abs/1912.13122v1
PDF	https://arxiv.org/pdf/1912.13122v1.pdf
PWC	https://paperswithcode.com/paper/towards-regulated-deep-learning
Repo
Framework

To What Extent Does Downsampling, Compression, and Data Scarcity Impact Renal Image Analysis?


Title	To What Extent Does Downsampling, Compression, and Data Scarcity Impact Renal Image Analysis?
Authors	Can Peng, Kun Zhao, Arnold Wiliem, Teng Zhang, Peter Hobson, Anthony Jennings, Brian C. Lovell
Abstract	The condition of the Glomeruli, or filter sacks, in renal Direct Immunofluorescence (DIF) specimens is a critical indicator for diagnosing kidney diseases. A digital pathology system which digitizes a glass histology slide into a Whole Slide Image (WSI) and then automatically detects and zooms in on the glomeruli with a higher magnification objective will be extremely helpful for pathologists. In this paper, using glomerulus detection as the study case, we provide analysis and observations on several important issues to help with the development of Computer Aided Diagnostic (CAD) systems to process WSIs. Large image resolution, large file size, and data scarcity are always challenging to deal with. To this end, we first examine image downsampling rates in terms of their effect on detection accuracy. Second, we examine the impact of image compression. Third, we examine the relationship between the size of the training set and detection accuracy. To understand the above issues, experiments are performed on the state-of-the-art detectors: Faster R-CNN, R-FCN, Mask R-CNN and SSD. Critical findings are observed: (1) The best balance between detection accuracy, detection speed and file size is achieved at 8 times downsampling captured with a $40\times$ objective; (2) compression, which reduces the file size dramatically, does not necessarily have an adverse effect on overall accuracy; (3) reducing the amount of training data to some extent causes a drop in precision but has a negligible impact on the recall; (4) in most cases, Faster R-CNN achieves the best accuracy in the glomerulus detection task. We show that the image file size of $40\times$ WSI images can be reduced by a factor of over 6000 with negligible loss of glomerulus detection accuracy.
Tasks	Image Compression
Published	2019-09-22
URL	https://arxiv.org/abs/1909.09945v1
PDF	https://arxiv.org/pdf/1909.09945v1.pdf
PWC	https://paperswithcode.com/paper/190909945
Repo
Framework

Take an Emotion Walk: Perceiving Emotions from Gaits Using Hierarchical Attention Pooling and Affective Mapping


Title	Take an Emotion Walk: Perceiving Emotions from Gaits Using Hierarchical Attention Pooling and Affective Mapping
Authors	Uttaran Bhattacharya, Christian Roncal, Trisha Mittal, Rohan Chandra, Aniket Bera, Dinesh Manocha
Abstract	We present an autoencoder-based semi-supervised approach to classify perceived human emotions from walking styles obtained from videos or from motion-captured data and represented as sequences of 3D poses. Given the motion on each joint in the pose at each time step extracted from 3D pose sequences, we hierarchically pool these joint motions in a bottom-up manner in the encoder, following the kinematic chains in the human body. We also constrain the latent embeddings of the encoder to contain the space of psychologically-motivated affective features underlying the gaits. We train the decoder to reconstruct the motions per joint per time step in a top-down manner from the latent embeddings. For the annotated data, we also train a classifier to map the latent embeddings to emotion labels. Our semi-supervised approach achieves a mean average precision of 0.84 on the Emotion-Gait benchmark dataset, which contains gaits collected from multiple sources. We outperform current state-of-the-art algorithms for both emotion recognition and action recognition from 3D gaits by 7%–23% in absolute terms.
Tasks	Emotion Recognition
Published	2019-11-20
URL	https://arxiv.org/abs/1911.08708v1
PDF	https://arxiv.org/pdf/1911.08708v1.pdf
PWC	https://paperswithcode.com/paper/take-an-emotion-walk-perceiving-emotions-from
Repo
Framework

Roweis Discriminant Analysis: A Generalized Subspace Learning Method


Title	Roweis Discriminant Analysis: A Generalized Subspace Learning Method
Authors	Benyamin Ghojogh, Fakhri Karray, Mark Crowley
Abstract	We present a new method which generalizes subspace learning based on eigenvalue and generalized eigenvalue problems. This method, Roweis Discriminant Analysis (RDA), is named after Sam Roweis to whom the field of subspace learning owes significantly. RDA is a family of an infinite number of algorithms where Principal Component Analysis (PCA), Supervised PCA (SPCA), and Fisher Discriminant Analysis (FDA) are special cases. One of the extreme special cases, which we name Double Supervised Discriminant Analysis (DSDA), uses the labels twice; it is novel and has not appeared elsewhere. We propose a dual for RDA for some special cases. We also propose kernel RDA, generalizing kernel PCA, kernel SPCA, and kernel FDA, using both dual RDA and representation theory. Our theoretical analysis explains previously known facts such as why SPCA can use regression but FDA cannot, why PCA and SPCA have duals but FDA does not, why kernel PCA and kernel SPCA use the kernel trick but kernel FDA does not, and why PCA is the best linear method for reconstruction. Roweisfaces and kernel Roweisfaces are also proposed generalizing eigenfaces, Fisherfaces, supervised eigenfaces, and their kernel variants. We also report experiments showing the effectiveness of RDA and kernel RDA on some benchmark datasets.
Tasks
Published	2019-10-11
URL	https://arxiv.org/abs/1910.05437v1
PDF	https://arxiv.org/pdf/1910.05437v1.pdf
PWC	https://paperswithcode.com/paper/roweis-discriminant-analysis-a-generalized
Repo
Framework

Large Scale Structure of Neural Network Loss Landscapes


Title	Large Scale Structure of Neural Network Loss Landscapes
Authors	Stanislav Fort, Stanislaw Jastrzebski
Abstract	There are many surprising and perhaps counter-intuitive properties of optimization of deep neural networks. We propose and experimentally verify a unified phenomenological model of the loss landscape that incorporates many of them. High dimensionality plays a key role in our model. Our core idea is to model the loss landscape as a set of high dimensional wedges that together form a large-scale, inter-connected structure and towards which optimization is drawn. We first show that hyperparameter choices such as learning rate, network width and $L_2$ regularization affect the path the optimizer takes through the landscape in similar ways, influencing the large-scale curvature of the regions the optimizer explores. Finally, we predict and demonstrate new counter-intuitive properties of the loss landscape. We show the existence of low-loss subspaces connecting a set (not only a pair) of solutions, and verify it experimentally. Finally, we analyze recently popular ensembling techniques for deep networks in the light of our model.
Tasks
Published	2019-06-11
URL	https://arxiv.org/abs/1906.04724v1
PDF	https://arxiv.org/pdf/1906.04724v1.pdf
PWC	https://paperswithcode.com/paper/large-scale-structure-of-neural-network-loss
Repo
Framework

Basis Pursuit and Orthogonal Matching Pursuit for Subspace-preserving Recovery: Theoretical Analysis


Title	Basis Pursuit and Orthogonal Matching Pursuit for Subspace-preserving Recovery: Theoretical Analysis
Authors	Daniel P. Robinson, Rene Vidal, Chong You
Abstract	Given an overcomplete dictionary $A$ and a signal $b = Ac^*$ for some sparse vector $c^*$ whose nonzero entries correspond to linearly independent columns of $A$, classical sparse signal recovery theory considers the problem of whether $c^*$ can be recovered as the unique sparsest solution to $b = A c$. It is now well-understood that such recovery is possible by practical algorithms when the dictionary $A$ is incoherent or restricted isometric. In this paper, we consider the more general case where $b$ lies in a subspace $\mathcal{S}_0$ spanned by a subset of linearly dependent columns of $A$, and the remaining columns are outside of the subspace. In this case, the sparsest representation may not be unique, and the dictionary may not be incoherent or restricted isometric. The goal is to have the representation $c$ correctly identify the subspace, i.e. the nonzero entries of $c$ should correspond to columns of $A$ that are in the subspace $\mathcal{S}_0$. Such a representation $c$ is called subspace-preserving, a key concept that has found important applications for learning low-dimensional structures in high-dimensional data. We present various geometric conditions that guarantee subspace-preserving recovery. Among them, the major results are characterized by the covering radius and the angular distance, which capture the distribution of points in the subspace and the similarity between points in the subspace and points outside the subspace, respectively. Importantly, these conditions do not require the dictionary to be incoherent or restricted isometric. By establishing that the subspace-preserving recovery problem and the classical sparse signal recovery problem are equivalent under common assumptions on the latter, we show that several of our proposed conditions are generalizations of some well-known conditions in the sparse signal recovery literature.
Tasks
Published	2019-12-30
URL	https://arxiv.org/abs/1912.13091v1
PDF	https://arxiv.org/pdf/1912.13091v1.pdf
PWC	https://paperswithcode.com/paper/basis-pursuit-and-orthogonal-matching-pursuit
Repo
Framework
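
To make the notion of a subspace-preserving representation concrete, the sketch below runs a plain Orthogonal Matching Pursuit loop on a tiny hand-built dictionary and checks whether every selected column lies in the subspace $\mathcal{S}_0$. It illustrates the definition only and does not verify any of the paper's geometric conditions; the dictionary, the signal, the tolerance, and the function names are assumptions chosen for the example.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def omp_support(columns, b, max_atoms, tol=1e-10):
    """Indices selected by Orthogonal Matching Pursuit, in selection order.

    The residual is kept orthogonal to all selected columns by maintaining
    an orthonormal basis of their span (Gram-Schmidt).
    """
    residual = list(b)
    basis, support = [], []
    for _ in range(min(max_atoms, len(columns))):
        if math.sqrt(dot(residual, residual)) <= tol:
            break
        # Greedy step: unused column most correlated with the residual.
        j = max((k for k in range(len(columns)) if k not in support),
                key=lambda k: abs(dot(columns[k], residual)))
        support.append(j)
        # Orthogonalize the new column against the current basis.
        q = list(columns[j])
        for u in basis:
            c = dot(u, q)
            q = [qi - c * ui for qi, ui in zip(q, u)]
        norm = math.sqrt(dot(q, q))
        if norm <= tol:
            continue  # column already in the span; residual is unchanged
        q = [qi / norm for qi in q]
        basis.append(q)
        c = dot(q, residual)
        residual = [ri - c * qi for ri, qi in zip(residual, q)]
    return support

def is_subspace_preserving(support, in_subspace):
    """True if every selected column index belongs to the subspace's columns."""
    return set(support) <= set(in_subspace)

# Hypothetical dictionary in R^3; S_0 is the xy-plane.
s = 1.0 / math.sqrt(2.0)
A = [
    [1.0, 0.0, 0.0],   # in S_0
    [0.0, 1.0, 0.0],   # in S_0
    [s, s, 0.0],       # in S_0 (linearly dependent on the two above)
    [0.0, 0.0, 1.0],   # outside S_0
    [0.0, 0.6, 0.8],   # outside S_0
]
S0_columns = [0, 1, 2]
b = [2.0, 1.0, 0.0]    # signal lying in S_0

if __name__ == "__main__":
    support = omp_support(A, b, max_atoms=3)
    print(support, is_subspace_preserving(support, S0_columns))
```

In this toy dictionary the columns of $\mathcal{S}_0$ are linearly dependent, so the sparsest representation of `b` is not unique, yet any representation supported on those columns still counts as subspace-preserving.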