# Paper Group AWR 442

Explainability Techniques for Graph Convolutional Networks. Learning from the Past: Continual Meta-Learning via Bayesian Graph Modeling. LUTNet: Learning FPGA Configurations for Highly Efficient Neural Network Inference. Fast Compressive Sensing Recovery Using Generative Models with Structured Latent Variables. Semi-Supervised Monocular Depth Estim …

#### Explainability Techniques for Graph Convolutional Networks

Title | Explainability Techniques for Graph Convolutional Networks |

Authors | Federico Baldassarre, Hossein Azizpour |

Abstract | Graph Networks are used to make decisions in potentially complex scenarios but it is usually not obvious how or why they made them. In this work, we study the explainability of Graph Network decisions using two main classes of techniques, gradient-based and decomposition-based, on a toy dataset and a chemistry task. Our study sets the ground for future development as well as application to real-world problems. |

Tasks | |

Published | 2019-05-31 |

URL | https://arxiv.org/abs/1905.13686v1 |

PDF | https://arxiv.org/pdf/1905.13686v1.pdf |

PWC | https://paperswithcode.com/paper/explainability-techniques-for-graph |

Repo | https://github.com/baldassarreFe/graph-network-explainability |

Framework | pytorch |

#### Learning from the Past: Continual Meta-Learning via Bayesian Graph Modeling

Title | Learning from the Past: Continual Meta-Learning via Bayesian Graph Modeling |

Authors | Yadan Luo, Zi Huang, Zheng Zhang, Ziwei Wang, Mahsa Baktashmotlagh, Yang Yang |

Abstract | Meta-learning for few-shot learning allows a machine to leverage previously acquired knowledge as a prior, thus improving the performance on novel tasks with only small amounts of data. However, most mainstream models suffer from catastrophic forgetting and insufficient robustness, thereby failing to fully retain or exploit long-term knowledge while being prone to severe error accumulation. In this paper, we propose a novel Continual Meta-Learning approach with Bayesian Graph Neural Networks (CML-BGNN) that mathematically formulates meta-learning as continual learning of a sequence of tasks. With each task formed as a graph, the intra- and inter-task correlations can be well preserved via message-passing and history transition. To remedy topological uncertainty from graph initialization, we utilize the Bayes by Backprop strategy, which approximates the posterior distribution of task-specific parameters with amortized inference networks that are seamlessly integrated into the end-to-end edge learning. Extensive experiments conducted on the miniImageNet and tieredImageNet datasets demonstrate the effectiveness and efficiency of the proposed method, improving the performance by 42.8% compared with state-of-the-art on the miniImageNet 5-way 1-shot classification task. |

Tasks | Continual Learning, Few-Shot Learning, Meta-Learning |

Published | 2019-11-12 |

URL | https://arxiv.org/abs/1911.04695v1 |

PDF | https://arxiv.org/pdf/1911.04695v1.pdf |

PWC | https://paperswithcode.com/paper/learning-from-the-past-continual-meta |

Repo | https://github.com/Luoyadan/BGNN-AAAI |

Framework | pytorch |

#### LUTNet: Learning FPGA Configurations for Highly Efficient Neural Network Inference

Title | LUTNet: Learning FPGA Configurations for Highly Efficient Neural Network Inference |

Authors | Erwei Wang, James J. Davis, Peter Y. K. Cheung, George A. Constantinides |

Abstract | Research has shown that deep neural networks contain significant redundancy, and thus that high classification accuracy can be achieved even when weights and activations are quantized down to binary values. Network binarization on FPGAs greatly increases area efficiency by replacing resource-hungry multipliers with lightweight XNOR gates. However, an FPGA’s fundamental building block, the K-LUT, is capable of implementing far more than an XNOR: it can perform any K-input Boolean operation. Inspired by this observation, we propose LUTNet, an end-to-end hardware-software framework for the construction of area-efficient FPGA-based neural network accelerators using the native LUTs as inference operators. We describe the realization of both unrolled and tiled LUTNet architectures, with the latter facilitating smaller, less power-hungry deployment over the former while sacrificing area and energy efficiency along with throughput. For both varieties, we demonstrate that the exploitation of LUT flexibility allows for far heavier pruning than possible in prior works, resulting in significant area savings while achieving comparable accuracy. Against the state-of-the-art binarized neural network implementation, we achieve up to twice the area efficiency for several standard network models when inferencing popular datasets. We also demonstrate that even greater energy efficiency improvements are obtainable. |

Tasks | |

Published | 2019-10-24 |

URL | https://arxiv.org/abs/1910.12625v2 |

PDF | https://arxiv.org/pdf/1910.12625v2.pdf |

PWC | https://paperswithcode.com/paper/lutnet-learning-fpga-configurations-for |

Repo | https://github.com/awai54st/LUTNet |

Framework | tf |
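
The abstract's central observation, that a K-LUT can implement any K-input Boolean function and XNOR is only one of them, can be illustrated with a small truth-table model. This is a minimal sketch, not code from the LUTNet repository; `make_lut` and the example functions are hypothetical names introduced here.

```python
from itertools import product


def make_lut(k, truth_table):
    """Return a function that evaluates a K-input LUT (hypothetical helper).

    truth_table holds 2**k output bits; entry i is the output for the input
    combination whose bits, read most-significant first, encode i.
    """
    if len(truth_table) != 2 ** k:
        raise ValueError("truth table must have 2**k entries")

    def lut(*bits):
        if len(bits) != k:
            raise ValueError("expected %d input bits" % k)
        index = 0
        for b in bits:
            index = (index << 1) | (1 if b else 0)
        return truth_table[index]

    return lut


# XNOR is one particular 2-input configuration: output 1 when the inputs agree.
xnor = make_lut(2, [1, 0, 0, 1])

# A 4-input LUT can hold any of 2**(2**4) = 65536 functions. As one example,
# this table outputs 1 when at least three of the four inputs are 1.
at_least_three = make_lut(
    4, [1 if sum(bits) >= 3 else 0 for bits in product((0, 1), repeat=4)]
)
```

Changing the truth table changes the function without changing the hardware cost of the LUT, which is the flexibility the abstract refers to.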

#### Fast Compressive Sensing Recovery Using Generative Models with Structured Latent Variables

Title | Fast Compressive Sensing Recovery Using Generative Models with Structured Latent Variables |

Authors | Shaojie Xu, Sihan Zeng, Justin Romberg |

Abstract | Deep learning models have significantly improved the visual quality and accuracy on compressive sensing recovery. In this paper, we propose an algorithm for signal reconstruction from compressed measurements with image priors captured by a generative model. We search and constrain on latent variable space to make the method stable when the number of compressed measurements is extremely limited. We show that, by exploiting certain structures of the latent variables, the proposed method produces improved reconstruction accuracy and preserves realistic and non-smooth features in the image. Our algorithm achieves high computation speed by projecting between the original signal space and the latent variable space in an alternating fashion. |

Tasks | Compressive Sensing |

Published | 2019-02-19 |

URL | https://arxiv.org/abs/1902.06913v4 |

PDF | https://arxiv.org/pdf/1902.06913v4.pdf |

PWC | https://paperswithcode.com/paper/fast-compressive-sensing-recovery-using |

Repo | https://github.com/sihan-zeng/f-csrg |

Framework | tf |
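
The idea of alternating between signal space and latent space can be sketched with projected gradient descent. This is not the authors' algorithm: a linear map `W` stands in for the trained generative model so that the projection onto its range has a closed form, and every name and size below is an assumption made for illustration.

```python
import numpy as np


def recover(A, y, W, n_iters=2000):
    """Alternate a gradient step in signal space with a projection onto range(W)."""
    W_pinv = np.linalg.pinv(W)                # maps a signal to its best latent code
    step = 1.0 / np.linalg.norm(A, 2) ** 2    # 1 / (largest singular value of A)^2
    x = np.zeros(A.shape[1])
    z = np.zeros(W.shape[1])
    for _ in range(n_iters):
        x = x - step * A.T @ (A @ x - y)      # reduce the measurement error ||Ax - y||^2
        z = W_pinv @ x                        # project into latent space
        x = W @ z                             # map back to signal space
    return x, z


rng = np.random.default_rng(0)
n, m, k = 50, 20, 5                           # signal dim, measurements, latent dim
W = rng.standard_normal((n, k))               # hypothetical linear "generator"
A = rng.standard_normal((m, n)) / np.sqrt(m)  # random measurement matrix
x_true = W @ rng.standard_normal(k)           # signal that lies in range(W)
y = A @ x_true                                # noiseless compressed measurements

x_hat, z_hat = recover(A, y, W)
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
```

With far fewer measurements than signal dimensions (20 versus 50), recovery is possible here only because the estimate is constrained to the 5-dimensional range of the generator.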

#### Semi-Supervised Monocular Depth Estimation with Left-Right Consistency Using Deep Neural Network

Title | Semi-Supervised Monocular Depth Estimation with Left-Right Consistency Using Deep Neural Network |

Authors | Ali Jahani Amiri, Shing Yan Loo, Hong Zhang |

Abstract | There has been tremendous research progress in estimating the depth of a scene from a monocular camera image. Existing methods for single-image depth prediction are exclusively based on deep neural networks, and their training can be unsupervised using stereo image pairs, supervised using LiDAR point clouds, or semi-supervised using both stereo and LiDAR. In general, semi-supervised training is preferred as it does not suffer from the weaknesses of either supervised training, resulting from the difference in the camera's and the LiDAR's fields of view, or unsupervised training, resulting from the poor depth accuracy that can be recovered from a stereo pair. In this paper, we present our research in single image depth prediction using semi-supervised training that outperforms the state-of-the-art. We achieve this through a loss function that explicitly exploits left-right consistency in a stereo reconstruction, which has not been adopted in previous semi-supervised training. In addition, we describe the correct use of ground truth depth derived from LiDAR that can significantly reduce prediction error. The performance of our depth prediction model is evaluated on popular datasets, and the importance of each aspect of our semi-supervised training approach is demonstrated through experimental results. Our deep neural network model has been made publicly available. |

Tasks | Depth Estimation, Monocular Depth Estimation |

Published | 2019-05-18 |

URL | https://arxiv.org/abs/1905.07542v1 |

PDF | https://arxiv.org/pdf/1905.07542v1.pdf |

PWC | https://paperswithcode.com/paper/semi-supervised-monocular-depth-estimation |

Repo | https://github.com/a-jahani/semiDepth |

Framework | tf |
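
A left-right consistency term of the kind the abstract mentions is commonly written as the mean absolute difference between the left disparity map and the right disparity map sampled at the matching position. The sketch below applies that common formulation to a single scanline; it is an assumption about the general technique, not the loss implemented in the semiDepth repository, and the sign convention (a left pixel at `x` appears at `x - d` in the right image) is also assumed.

```python
import numpy as np


def lr_consistency_loss(disp_left, disp_right):
    """Mean |d_l(x) - d_r(x - d_l(x))| over one scanline, disparities in pixels."""
    x = np.arange(disp_left.shape[0], dtype=float)
    # A left-image pixel at x is assumed to appear at x - d_l(x) in the right image.
    x_in_right = x - disp_left
    # Sample the right disparity map at those positions with linear interpolation;
    # np.interp holds the edge values constant outside the image.
    disp_right_warped = np.interp(x_in_right, x, disp_right)
    return float(np.mean(np.abs(disp_left - disp_right_warped)))


# Constant, equal disparities in both views are perfectly consistent.
consistent = lr_consistency_loss(np.full(8, 2.0), np.full(8, 2.0))
# Predictions that disagree by one pixel everywhere are penalised.
inconsistent = lr_consistency_loss(np.full(8, 2.0), np.full(8, 3.0))
```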

#### DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators

Title | DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators |

Authors | Lu Lu, Pengzhan Jin, George Em Karniadakis |

Abstract | While it is widely known that neural networks are universal approximators of continuous functions, a less known and perhaps more powerful result is that a neural network with a single hidden layer can approximate accurately any nonlinear continuous operator (Chen and Chen, 1995). This universal approximation theorem is suggestive of the potential application of neural networks in learning nonlinear operators from data. However, the theorem guarantees only a small approximation error for a sufficiently large network, and does not consider the important optimization and generalization errors. To realize this theorem in practice, we propose deep operator networks (DeepONets) to learn operators accurately and efficiently from a relatively small dataset. A DeepONet consists of two sub-networks, one for encoding the input function at a fixed number of sensors $x_i, i=1,\dots,m$ (branch net), and another for encoding the locations for the output functions (trunk net). We perform systematic simulations for identifying two types of operators, i.e., dynamic systems and partial differential equations, and demonstrate that DeepONet significantly reduces the generalization error compared to the fully-connected networks. We also derive theoretically the dependence of the approximation error in terms of the number of sensors (where the input function is defined) as well as the input function type, and we verify the theorem with computational results. More importantly, we observe high-order error convergence in our computational tests, namely polynomial rates (from half order to fourth order) and even exponential convergence with respect to the training dataset size. |

Tasks | |

Published | 2019-10-08 |

URL | https://arxiv.org/abs/1910.03193v1 |

PDF | https://arxiv.org/pdf/1910.03193v1.pdf |

PWC | https://paperswithcode.com/paper/deeponet-learning-nonlinear-operators-for |

Repo | https://github.com/lululxvi/deepxde |

Framework | tf |
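
The branch/trunk structure described in the abstract can be sketched as a forward pass in NumPy. The branch net sees the input function sampled at `m` fixed sensors, the trunk net sees the query location, and the two feature vectors are combined by an inner product, which is the usual DeepONet construction. This is an untrained illustration and does not use the DeepXDE API; the layer sizes, the initialisation, and the helper names `mlp`, `init_mlp` and `deeponet_forward` are assumptions.

```python
import numpy as np


def mlp(params, x):
    """Tiny fully connected network with tanh hidden layers."""
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b


def init_mlp(rng, sizes):
    """Random weights scaled by 1/sqrt(fan_in), zero biases (an arbitrary choice)."""
    return [(rng.standard_normal((i, o)) / np.sqrt(i), np.zeros(o))
            for i, o in zip(sizes[:-1], sizes[1:])]


def deeponet_forward(branch, trunk, u_sensors, y):
    """Approximate G(u)(y) as the inner product of branch and trunk features.

    u_sensors: (batch, m) values of the input function at m fixed sensors.
    y:         (batch, d) locations at which the output function is evaluated.
    """
    b = mlp(branch, u_sensors)      # (batch, p)
    t = mlp(trunk, y)               # (batch, p)
    return np.sum(b * t, axis=1)    # (batch,)


rng = np.random.default_rng(0)
m, d, p = 10, 1, 16                 # sensors, output coordinate dim, feature width
branch = init_mlp(rng, [m, 32, p])
trunk = init_mlp(rng, [d, 32, p])

sensors = np.linspace(0.0, 1.0, m)
# One input function u(x) = sin(2*pi*x), repeated for four query locations.
u = np.sin(2 * np.pi * sensors)[None, :].repeat(4, axis=0)
y = rng.uniform(0.0, 1.0, size=(4, d))
out = deeponet_forward(branch, trunk, u, y)
```

Because the sensors are fixed, the same branch input can be paired with any number of query locations `y`, which is how one input function yields a whole output function.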

#### Tracing Forum Posts to MOOC Content using Topic Analysis

Title | Tracing Forum Posts to MOOC Content using Topic Analysis |

Authors | Alexander William Wong, Ken Wong, Abram Hindle |

Abstract | Massive Open Online Courses are educational programs that are open and accessible to a large number of people through the internet. To facilitate learning, MOOC discussion forums exist where students and instructors communicate questions, answers, and thoughts related to the course. The primary objective of this paper is to investigate tracing discussion forum posts back to course lecture videos and readings using topic analysis. We utilize both unsupervised and supervised variants of Latent Dirichlet Allocation (LDA) to extract topics from course material and classify forum posts. We validate our approach on posts bootstrapped from five Coursera courses and determine that topic models can be used to map student discussion posts back to the underlying course lecture or reading. Labeled LDA outperforms unsupervised Hierarchical Dirichlet Process LDA and base LDA for our traceability task. This research is useful as it provides an automated approach for clustering student discussions by course material, enabling instructors to quickly evaluate student misunderstanding of content and clarify materials accordingly. |

Tasks | Topic Models |

Published | 2019-04-15 |

URL | http://arxiv.org/abs/1904.07307v1 |

PDF | http://arxiv.org/pdf/1904.07307v1.pdf |

PWC | https://paperswithcode.com/paper/tracing-forum-posts-to-mooc-content-using |

Repo | https://github.com/awwong1/topic-traceability |

Framework | none |

#### Bayesian Allocation Model: Inference by Sequential Monte Carlo for Nonnegative Tensor Factorizations and Topic Models using Polya Urns

Title | Bayesian Allocation Model: Inference by Sequential Monte Carlo for Nonnegative Tensor Factorizations and Topic Models using Polya Urns |

Authors | Ali Taylan Cemgil, Mehmet Burak Kurutmaz, Sinan Yildirim, Melih Barsbey, Umut Simsekli |

Abstract | We introduce a dynamic generative model, Bayesian allocation model (BAM), which establishes explicit connections between nonnegative tensor factorization (NTF), graphical models of discrete probability distributions and their Bayesian extensions, and topic models such as latent Dirichlet allocation. BAM is based on a Poisson process, whose events are marked by using a Bayesian network, where the conditional probability tables of this network are then integrated out analytically. We show that the resulting marginal process turns out to be a Polya urn, an integer valued self-reinforcing process. These urn processes, which we name Polya-Bayes processes, obey certain conditional independence properties that provide further insight about the nature of NTF. These insights also let us develop space efficient simulation algorithms that respect the potential sparsity of data: we propose a class of sequential importance sampling algorithms for computing NTF and approximating their marginal likelihood, which would be useful for model selection. The resulting methods can also be viewed as a model scoring method for topic models and discrete Bayesian networks with hidden variables. The new algorithms have favourable properties in the sparse data regime when contrasted with variational algorithms that become more accurate when the total sum of the elements of the observed tensor goes to infinity. We illustrate the performance on several examples and numerically study the behaviour of the algorithms for various data regimes. |

Tasks | Model Selection, Topic Models |

Published | 2019-03-11 |

URL | http://arxiv.org/abs/1903.04478v1 |

PDF | http://arxiv.org/pdf/1903.04478v1.pdf |

PWC | https://paperswithcode.com/paper/bayesian-allocation-model-inference-by |

Repo | https://github.com/atcemgil/bam |

Framework | none |
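
The abstract describes a Polya urn as an integer valued self-reinforcing process. A basic single-urn simulation makes that concrete; it is the textbook urn, not the Polya-Bayes process or the sequential Monte Carlo sampler from the paper, and the function name is introduced here for illustration.

```python
import random


def polya_urn(initial_counts, n_draws, seed=0):
    """Simulate a basic Polya urn.

    Each step draws a colour with probability proportional to its current
    count, then returns the ball together with one extra ball of that colour.
    """
    rng = random.Random(seed)
    counts = list(initial_counts)
    for _ in range(n_draws):
        colour = rng.choices(range(len(counts)), weights=counts)[0]
        counts[colour] += 1   # self-reinforcement: drawn colours become likelier
    return counts


counts = polya_urn([1, 1, 1], n_draws=100)
```

The state is a vector of integer counts that only ever grows, and colours drawn early tend to dominate later draws.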

#### A Unifying Bayesian View of Continual Learning

Title | A Unifying Bayesian View of Continual Learning |

Authors | Sebastian Farquhar, Yarin Gal |

Abstract | Some machine learning applications require continual learning - where data comes in a sequence of datasets, each is used for training and then permanently discarded. From a Bayesian perspective, continual learning seems straightforward: Given the model posterior one would simply use this as the prior for the next task. However, exact posterior evaluation is intractable with many models, especially with Bayesian neural networks (BNNs). Instead, posterior approximations are often sought. Unfortunately, when posterior approximations are used, prior-focused approaches do not succeed in evaluations designed to capture properties of realistic continual learning use cases. As an alternative to prior-focused methods, we introduce a new approximate Bayesian derivation of the continual learning loss. Our loss does not rely on the posterior from earlier tasks, and instead adapts the model itself by changing the likelihood term. We call these approaches likelihood-focused. We then combine prior- and likelihood-focused methods into one objective, tying the two views together under a single unifying framework of approximate Bayesian continual learning. |

Tasks | Continual Learning |

Published | 2019-02-18 |

URL | http://arxiv.org/abs/1902.06494v1 |

PDF | http://arxiv.org/pdf/1902.06494v1.pdf |

PWC | https://paperswithcode.com/paper/a-unifying-bayesian-view-of-continual |

Repo | https://github.com/Saraharas/A-Unifying-Bayesian-View-of-Continual-Learning |

Framework | pytorch |
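
The prior-focused view in the abstract, where the posterior from one task becomes the prior for the next, is exact when the posterior is tractable. A conjugate Beta-Bernoulli model shows this in a few lines. It is a toy illustration of the principle, not the paper's method, which addresses the case where the posterior has to be approximated.

```python
def beta_update(prior, data):
    """Exact Bayesian update of a Beta(a, b) prior with Bernoulli observations."""
    a, b = prior
    heads = sum(data)
    return (a + heads, b + len(data) - heads)


task1 = [1, 1, 0, 1]
task2 = [0, 0, 1]

prior = (1, 1)                                    # uniform Beta(1, 1)
post_task1 = beta_update(prior, task1)            # task 1 data can now be discarded
post_sequential = beta_update(post_task1, task2)  # posterior reused as the prior
post_joint = beta_update(prior, task1 + task2)    # all data seen at once
```

Here `post_sequential` and `post_joint` coincide, so nothing is lost by discarding earlier data. With approximate posteriors this equality no longer holds, which is the situation the abstract addresses.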

#### Unsupervised Learning of Graph Hierarchical Abstractions with Differentiable Coarsening and Optimal Transport

Title | Unsupervised Learning of Graph Hierarchical Abstractions with Differentiable Coarsening and Optimal Transport |

Authors | Tengfei Ma, Jie Chen |

Abstract | Hierarchical abstractions are a methodology for solving large-scale graph problems in various disciplines. Coarsening is one such approach: it generates a pyramid of graphs whereby the one in the next level is a structural summary of the prior one. With a long history in scientific computing, many coarsening strategies were developed based on mathematically driven heuristics. Recently, there has been renewed interest in deep learning for designing hierarchical methods learnable through differentiable parameterization. These approaches are paired with downstream tasks for supervised learning. In practice, however, supervised signals (e.g., labels) are scarce and are often laborious to obtain. In this work, we propose an unsupervised approach, coined OTCoarsening, with the use of optimal transport. Both the coarsening matrix and the transport cost matrix are parameterized, so that an optimal coarsening strategy can be learned and tailored for a given set of graphs. We demonstrate that the proposed approach produces meaningful coarse graphs and yields competitive performance compared with supervised methods for graph classification and regression. |

Tasks | Graph Classification |

Published | 2019-12-24 |

URL | https://arxiv.org/abs/1912.11176v1 |

PDF | https://arxiv.org/pdf/1912.11176v1.pdf |

PWC | https://paperswithcode.com/paper/unsupervised-learning-of-graph-hierarchical-1 |

Repo | https://github.com/matenure/OTCoarsening |

Framework | pytorch |

#### Relational Graph Learning for Crowd Navigation

Title | Relational Graph Learning for Crowd Navigation |

Authors | Changan Chen, Sha Hu, Payam Nikdel, Greg Mori, Manolis Savva |

Abstract | We present a relational graph learning approach for robotic crowd navigation using model-based deep reinforcement learning that plans actions by looking into the future. Our approach reasons about the relations between all agents based on their latent features and uses a Graph Convolutional Network to encode higher-order interactions in each agent’s state representation, which is subsequently leveraged for state prediction and value estimation. The ability to predict human motion allows us to perform multi-step lookahead planning, taking into account the temporal evolution of human crowds. We evaluate our approach against a state-of-the-art baseline for crowd navigation and ablations of our model to demonstrate that navigation with our approach is more efficient, results in fewer collisions, and avoids failure cases involving oscillatory and freezing behaviors. |

Tasks | |

Published | 2019-09-28 |

URL | https://arxiv.org/abs/1909.13165v2 |

PDF | https://arxiv.org/pdf/1909.13165v2.pdf |

PWC | https://paperswithcode.com/paper/relational-graph-learning-for-crowd |

Repo | https://github.com/vita-epfl/CrowdNav |

Framework | none |

#### Limitations of Lazy Training of Two-layers Neural Networks

Title | Limitations of Lazy Training of Two-layers Neural Networks |

Authors | Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari |

Abstract | We study the supervised learning problem under either of the following two models: (1) Feature vectors ${\boldsymbol x}_i$ are $d$-dimensional Gaussians and responses are $y_i = f_*({\boldsymbol x}_i)$ for $f_*$ an unknown quadratic function; (2) Feature vectors ${\boldsymbol x}_i$ are distributed as a mixture of two $d$-dimensional centered Gaussians, and $y_i$'s are the corresponding class labels. We use two-layers neural networks with quadratic activations, and compare three different learning regimes: the random features (RF) regime in which we only train the second-layer weights; the neural tangent (NT) regime in which we train a linearization of the neural network around its initialization; the fully trained neural network (NN) regime in which we train all the weights in the network. We prove that, even for the simple quadratic model of point (1), there is a potentially unbounded gap between the prediction risk achieved in these three training regimes, when the number of neurons is smaller than the ambient dimension. When the number of neurons is larger than the number of dimensions, the problem is significantly easier and both NT and NN learning achieve zero risk. |

Tasks | |

Published | 2019-06-21 |

URL | https://arxiv.org/abs/1906.08899v1 |

PDF | https://arxiv.org/pdf/1906.08899v1.pdf |

PWC | https://paperswithcode.com/paper/limitations-of-lazy-training-of-two-layers |

Repo | https://github.com/bGhorbani/Lazy-Training-Neural-Nets |

Framework | tf |
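
The RF and NT regimes in the abstract differ in which features the linear model is fitted on. For a two-layer network with quadratic activation, $f({\boldsymbol x}) = \sum_i a_i \langle {\boldsymbol w}_i, {\boldsymbol x}\rangle^2$, both feature maps have closed forms. The sketch below writes them out and checks the NT features against a finite-difference derivative. It assumes the linearisation is taken with respect to the first-layer weights only, and it is not code from the authors' repository.

```python
import numpy as np


def network(W, a, x):
    """Two-layer network with quadratic activation: sum_i a_i * <w_i, x>^2."""
    return float(a @ (W @ x) ** 2)


def rf_features(W, x):
    """RF regime: the first layer is frozen, so the model is linear in a."""
    return (W @ x) ** 2                                 # shape (N,)


def nt_features(W, a, x):
    """NT regime: gradient of the network w.r.t. W at initialization.

    The NT model is linear in the perturbation of W:
    f_NT(x) = f(x; W0) + <nt_features(W0, a, x), W - W0>.
    """
    return 2.0 * (a * (W @ x))[:, None] * x[None, :]    # shape (N, d)


rng = np.random.default_rng(0)
d, N = 6, 4                                             # fewer neurons than dimensions
W0 = rng.standard_normal((N, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=N)
x = rng.standard_normal(d)

# Check the NT features against a central finite-difference directional derivative.
D = rng.standard_normal((N, d))
eps = 1e-6
fd = (network(W0 + eps * D, a, x) - network(W0 - eps * D, a, x)) / (2 * eps)
analytic = float(np.sum(nt_features(W0, a, x) * D))
```

The fully trained (NN) regime, by contrast, optimises `W` and `a` directly and is not restricted to either linear feature space.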

#### Declarative Question Answering over Knowledge Bases containing Natural Language Text with Answer Set Programming

Title | Declarative Question Answering over Knowledge Bases containing Natural Language Text with Answer Set Programming |

Authors | Arindam Mitra, Peter Clark, Oyvind Tafjord, Chitta Baral |

Abstract | While in recent years machine learning (ML) based approaches have been the popular approach in developing end-to-end question answering systems, such systems often struggle when additional knowledge is needed to correctly answer the questions. Proposed alternatives involve translating the question and the natural language text to a logical representation and then using logical reasoning. However, this alternative falters when the size of the text gets bigger. To address this we propose an approach that does logical reasoning over premises written in natural language text. The proposed method uses recent features of Answer Set Programming (ASP) to call external NLP modules (which may be based on ML) which perform simple textual entailment. To test our approach we develop a corpus based on the life cycle questions and show that our system achieves up to 18% performance gain when compared to standard MCQ solvers. |

Tasks | Natural Language Inference, Question Answering |

Published | 2019-05-01 |

URL | http://arxiv.org/abs/1905.00198v1 |

PDF | http://arxiv.org/pdf/1905.00198v1.pdf |

PWC | https://paperswithcode.com/paper/declarative-question-answering-over-knowledge |

Repo | https://github.com/OpenSourceAI/sota_server |

Framework | none |

#### Self-Supervised Monocular Depth Hints

Title | Self-Supervised Monocular Depth Hints |

Authors | Jamie Watson, Michael Firman, Gabriel J. Brostow, Daniyar Turmukhambetov |

Abstract | Monocular depth estimators can be trained with various forms of self-supervision from binocular-stereo data to circumvent the need for high-quality laser scans or other ground-truth data. The disadvantage, however, is that the photometric reprojection losses used with self-supervised learning typically have multiple local minima. These plausible-looking alternatives to ground truth can restrict what a regression network learns, causing it to predict depth maps of limited quality. As one prominent example, depth discontinuities around thin structures are often incorrectly estimated by current state-of-the-art methods. Here, we study the problem of ambiguous reprojections in depth prediction from stereo-based self-supervision, and introduce Depth Hints to alleviate their effects. Depth Hints are complementary depth suggestions obtained from simple off-the-shelf stereo algorithms. These hints enhance an existing photometric loss function, and are used to guide a network to learn better weights. They require no additional data, and are assumed to be right only sometimes. We show that using our Depth Hints gives a substantial boost when training several leading self-supervised-from-stereo models, not just our own. Further, combined with other good practices, we produce state-of-the-art depth predictions on the KITTI benchmark. |

Tasks | Depth Estimation, Monocular Depth Estimation |

Published | 2019-09-19 |

URL | https://arxiv.org/abs/1909.09051v1 |

PDF | https://arxiv.org/pdf/1909.09051v1.pdf |

PWC | https://paperswithcode.com/paper/self-supervised-monocular-depth-hints |

Repo | https://github.com/nianticlabs/depth-hints |

Framework | none |

#### Learn Stereo, Infer Mono: Siamese Networks for Self-Supervised, Monocular, Depth Estimation

Title | Learn Stereo, Infer Mono: Siamese Networks for Self-Supervised, Monocular, Depth Estimation |

Authors | Matan Goldman, Tal Hassner, Shai Avidan |

Abstract | The field of self-supervised monocular depth estimation has seen huge advancements in recent years. Most methods assume stereo data is available during training but usually under-utilize it and only treat it as a reference signal. We propose a novel self-supervised approach which uses both left and right images equally during training, but can still be used with a single input image at test time, for monocular depth estimation. Our Siamese network architecture consists of two twin networks, each of which learns to predict a disparity map from a single image. At test time, however, only one of these networks is used in order to infer depth. We show state-of-the-art results on the standard KITTI Eigen split benchmark as well as being the highest scoring self-supervised method on the new KITTI single view benchmark. To demonstrate the ability of our method to generalize to new data sets, we further provide results on the Make3D benchmark, which was not used during training. |

Tasks | Depth Estimation, Monocular Depth Estimation |

Published | 2019-05-01 |

URL | http://arxiv.org/abs/1905.00401v1 |

PDF | http://arxiv.org/pdf/1905.00401v1.pdf |

PWC | https://paperswithcode.com/paper/learn-stereo-infer-mono-siamese-networks-for |

Repo | https://github.com/mtngld/lsim |

Framework | tf |