April 2, 2020

Paper Group ANR 88

Physics-Guided Machine Learning for Scientific Discovery: An Application in Simulating Lake Temperature Profiles

Title Physics-Guided Machine Learning for Scientific Discovery: An Application in Simulating Lake Temperature Profiles
Authors Xiaowei Jia, Jared Willard, Anuj Karpatne, Jordan S Read, Jacob A Zwart, Michael Steinbach, Vipin Kumar
Abstract Physics-based models of dynamical systems are often used to study engineering and environmental systems. Despite their extensive use, these models have several well-known limitations due to simplified representations of the physical processes being modeled or challenges in selecting appropriate parameters. While state-of-the-art machine learning models can sometimes outperform physics-based models given an ample amount of training data, they can produce results that are physically inconsistent. This paper proposes a physics-guided recurrent neural network model (PGRNN) that combines RNNs and physics-based models to leverage their complementary strengths and improve the modeling of physical processes. Specifically, we show that a PGRNN can improve prediction accuracy over that of physics-based models, while generating outputs consistent with physical laws. An important aspect of our PGRNN approach lies in its ability to incorporate the knowledge encoded in physics-based models. This allows training the PGRNN model using very few true observed data while also ensuring high prediction accuracy. Although we present and evaluate this methodology in the context of modeling the dynamics of temperature in lakes, it is applicable more widely to a range of scientific and engineering disciplines where physics-based (also known as mechanistic) models are used, e.g., climate science, materials science, computational chemistry, and biomedicine.
Published 2020-01-28
URL https://arxiv.org/abs/2001.11086v1
PDF https://arxiv.org/pdf/2001.11086v1.pdf
PWC https://paperswithcode.com/paper/physics-guided-machine-learning-for

Coherent Gradients: An Approach to Understanding Generalization in Gradient Descent-based Optimization

Title Coherent Gradients: An Approach to Understanding Generalization in Gradient Descent-based Optimization
Authors Satrajit Chatterjee
Abstract An open question in the Deep Learning community is why neural networks trained with Gradient Descent generalize well on real datasets even though they are capable of fitting random data. We propose an approach to answering this question based on a hypothesis about the dynamics of gradient descent that we call Coherent Gradients: Gradients from similar examples are similar and so the overall gradient is stronger in certain directions where these reinforce each other. Thus changes to the network parameters during training are biased towards those that (locally) simultaneously benefit many examples when such similarity exists. We support this hypothesis with heuristic arguments and perturbative experiments and outline how this can explain several common empirical observations about Deep Learning. Furthermore, our analysis is not just descriptive, but prescriptive. It suggests a natural modification to gradient descent that can greatly reduce overfitting.
Published 2020-02-25
URL https://arxiv.org/abs/2002.10657v1
PDF https://arxiv.org/pdf/2002.10657v1.pdf
PWC https://paperswithcode.com/paper/coherent-gradients-an-approach-to-1
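
The Coherent Gradients hypothesis in the abstract above can be illustrated with a toy calculation. The following is a minimal sketch, not the paper's experiment: it assumes a linear model with squared loss, and the function names and example data are hypothetical. It compares the norm of the summed per-example gradient for two similar examples against two conflicting ones.

```python
import math


def grad_squared_loss(w, x, y):
    """Gradient of 0.5 * (w.x - y)^2 with respect to w for one example."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [err * xi for xi in x]


def summed_gradient_norm(w, examples):
    """Euclidean norm of the sum of per-example gradients."""
    total = [0.0] * len(w)
    for x, y in examples:
        g = grad_squared_loss(w, x, y)
        total = [t + gi for t, gi in zip(total, g)]
    return math.sqrt(sum(t * t for t in total))


w = [0.0, 0.0]  # illustrative starting parameters
similar = [([1.0, 0.0], 1.0), ([0.9, 0.1], 1.0)]      # nearly identical inputs
conflicting = [([1.0, 0.0], 1.0), ([-1.0, 0.0], 1.0)]  # opposite inputs, same label

coherent_norm = summed_gradient_norm(w, similar)
incoherent_norm = summed_gradient_norm(w, conflicting)
```

In this toy setting the gradients of the similar pair reinforce each other, while those of the conflicting pair cancel, so the overall update is dominated by directions that help many examples at once.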

Learning to Multi-Task Learn for Better Neural Machine Translation

Title Learning to Multi-Task Learn for Better Neural Machine Translation
Authors Poorya Zaremoodi, Gholamreza Haffari
Abstract Scarcity of parallel sentence pairs is a major challenge for training high quality neural machine translation (NMT) models in bilingually low-resource scenarios, as NMT is data-hungry. Multi-task learning is an elegant approach to inject linguistic-related inductive biases into NMT, using auxiliary syntactic and semantic tasks, to improve generalisation. The challenge, however, is to devise effective training schedules, prescribing when to make use of the auxiliary tasks during the training process to fill the knowledge gaps of the main translation task, a setting referred to as biased-MTL. Current approaches for the training schedule are based on hand-engineering heuristics, whose effectiveness varies in different MTL settings. We propose a novel framework for learning the training schedule, i.e., learning to multi-task learn, for the MTL setting of interest. We formulate the training schedule as a Markov decision process which paves the way to employ policy learning methods to learn the scheduling policy. We effectively and efficiently learn the training schedule policy within the imitation learning framework using an oracle policy algorithm that dynamically sets the importance weights of auxiliary tasks based on their contributions to the generalisability of the main NMT task. Experiments on low-resource NMT settings show the resulting automatically learned training schedulers are competitive with the best heuristics, and lead to up to +1.1 BLEU score improvements.
Tasks Imitation Learning, Machine Translation, Multi-Task Learning
Published 2020-01-10
URL https://arxiv.org/abs/2001.03294v1
PDF https://arxiv.org/pdf/2001.03294v1.pdf
PWC https://paperswithcode.com/paper/learning-to-multi-task-learn-for-better

Gated Graph Recurrent Neural Networks

Title Gated Graph Recurrent Neural Networks
Authors Luana Ruiz, Fernando Gama, Alejandro Ribeiro
Abstract Graph processes exhibit a temporal structure determined by the sequence index and a spatial structure determined by the graph support. To learn from graph processes, an information processing architecture must then be able to exploit both underlying structures. We introduce Graph Recurrent Neural Networks (GRNNs), which achieve this goal by leveraging the hidden Markov model (HMM) together with graph signal processing (GSP). In the GRNN, the number of learnable parameters is independent of the length of the sequence and of the size of the graph, guaranteeing scalability. We also prove that GRNNs are permutation equivariant and that they are stable to perturbations of the underlying graph support. Following the observation that stability decreases with longer sequences, we propose a time-gated extension of GRNNs. We also put forward node- and edge-gated variants of the GRNN to address the problem of vanishing gradients arising from long range graph dependencies. The advantages of GRNNs over GNNs and RNNs are demonstrated in a synthetic regression experiment and in a classification problem where seismic wave readings from a network of seismographs are used to predict the region of an earthquake. Finally, the benefits of time, node and edge gating are experimentally validated in multiple time and spatial correlation scenarios.
Published 2020-02-03
URL https://arxiv.org/abs/2002.01038v1
PDF https://arxiv.org/pdf/2002.01038v1.pdf
PWC https://paperswithcode.com/paper/gated-graph-recurrent-neural-networks
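
The claim above that the parameter count does not depend on the size of the graph can be made concrete with a polynomial graph filter, a standard GSP building block. This is a minimal sketch under the assumption that such a filter is a reasonable stand-in for the operations inside a GRNN; the abstract does not spell out the exact form, and all names and values below are hypothetical.

```python
def mat_vec(S, x):
    """Multiply a square matrix S (list of rows) by a vector x."""
    return [sum(s_ij * x_j for s_ij, x_j in zip(row, x)) for row in S]


def graph_filter(S, x, taps):
    """Compute sum_k taps[k] * S^k x for graph shift operator S."""
    out = [0.0] * len(x)
    z = list(x)  # holds S^k x, starting with k = 0
    for h in taps:
        out = [o + h * zi for o, zi in zip(out, z)]
        z = mat_vec(S, z)
    return out


taps = [1.0, 0.5]  # the only learnable parameters in this sketch

# Adjacency matrices of path graphs with 3 and 4 nodes.
path3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
path4 = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]

y3 = graph_filter(path3, [1.0, 0.0, 0.0], taps)
y4 = graph_filter(path4, [0.0, 0.0, 0.0, 1.0], taps)
```

The same two filter taps are applied to both graphs, so the number of parameters is set by the filter order rather than by the number of nodes.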

Interactive Robot Training for Non-Markov Tasks

Title Interactive Robot Training for Non-Markov Tasks
Authors Ankit Shah, Julie Shah
Abstract Defining sound and complete specifications for robots using formal languages is challenging, while learning formal specifications directly from demonstrations can lead to over-constrained task policies. In this paper, we propose a Bayesian interactive robot training framework that allows the robot to learn from both demonstrations provided by a teacher, and that teacher’s assessments of the robot’s task executions. We also present an active learning approach – inspired by uncertainty sampling – to identify the task execution with the most uncertain degree of acceptability. We demonstrate that active learning within our framework identifies a teacher’s intended task specification to a greater degree of similarity when compared with an approach that learns purely from demonstrations. Finally, we also conduct a user-study that demonstrates the efficacy of our active learning framework in learning a table-setting task from a human teacher.
Tasks Active Learning
Published 2020-03-04
URL https://arxiv.org/abs/2003.02232v1
PDF https://arxiv.org/pdf/2003.02232v1.pdf
PWC https://paperswithcode.com/paper/interactive-robot-training-for-non-markov
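
The uncertainty-sampling idea mentioned in the abstract above can be sketched as follows. This is a generic binary uncertainty-sampling rule, not necessarily the paper's exact criterion: it assumes each candidate task execution has an estimated probability of being acceptable, and it queries the teacher about the one closest to 0.5. The names and probabilities are hypothetical.

```python
def most_uncertain(acceptability):
    """Return the candidate whose acceptability probability is closest to 0.5."""
    return min(acceptability, key=lambda name: abs(acceptability[name] - 0.5))


# Hypothetical beliefs about whether each execution satisfies the task.
beliefs = {
    "execution_a": 0.95,  # almost surely acceptable
    "execution_b": 0.55,  # highly uncertain
    "execution_c": 0.10,  # almost surely unacceptable
}

query = most_uncertain(beliefs)
```

Here the learner would ask the teacher to assess `execution_b`, since a label for that execution is the least predictable under the current belief.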

FASTER: Fast and Safe Trajectory Planner for Flights in Unknown Environments

Title FASTER: Fast and Safe Trajectory Planner for Flights in Unknown Environments
Authors Jesus Tordesillas, Brett T. Lopez, Michael Everett, Jonathan P. How
Abstract Planning high-speed trajectories for UAVs in unknown environments requires algorithmic techniques that enable fast reaction times to guarantee safety as more information about the environment becomes available. The standard approach to ensure safety is to enforce a “stop” condition in the free-known space. However, this can severely limit the speed of the vehicle, especially in situations where much of the world is unknown. Moreover, the ad-hoc time and interval allocation scheme usually imposed on the trajectory also leads to conservative and slower trajectories. This work proposes FASTER (Fast and Safe Trajectory Planner) to ensure safety without sacrificing speed. FASTER obtains high-speed trajectories by enabling the local planner to optimize in both the free-known and unknown spaces. Safety guarantees are ensured by always having a feasible, safe back-up trajectory in the free-known space at the start of each replanning step. The Mixed Integer Quadratic Program formulation proposed allows the solver to choose the trajectory interval allocation, and the time allocation is found by a line search algorithm initialized with a heuristic computed from the previous replanning iteration. This proposed algorithm is tested extensively both in simulation and in real hardware, showing agile flights in unknown cluttered environments with velocities up to 7.8 m/s. To demonstrate the generality of the proposed framework, FASTER is also applied to a skid-steer robot, and the maximum speed specified for the robot (2 m/s) is achieved in real hardware experiments.
Published 2020-01-09
URL https://arxiv.org/abs/2001.04420v1
PDF https://arxiv.org/pdf/2001.04420v1.pdf
PWC https://paperswithcode.com/paper/faster-fast-and-safe-trajectory-planner-for

iCap: Interactive Image Captioning with Predictive Text

Title iCap: Interactive Image Captioning with Predictive Text
Authors Zhengxiong Jia, Xirong Li
Abstract In this paper we study a brand new topic of interactive image captioning with human in the loop. Different from automated image captioning where a given test image is the sole input in the inference stage, we have access to both the test image and a sequence of (incomplete) user-input sentences in the interactive scenario. We formulate the problem as Visually Conditioned Sentence Completion (VCSC). For VCSC, we propose asynchronous bidirectional decoding for image caption completion (ABD-Cap). With ABD-Cap as the core module, we build iCap, a web-based interactive image captioning system capable of predicting new text with respect to live input from a user. A number of experiments covering both automated evaluations and real user studies show the viability of our proposals.
Tasks Image Captioning
Published 2020-01-31
URL https://arxiv.org/abs/2001.11782v3
PDF https://arxiv.org/pdf/2001.11782v3.pdf
PWC https://paperswithcode.com/paper/icap-interative-image-captioning-with

Particle-Gibbs Sampling For Bayesian Feature Allocation Models

Title Particle-Gibbs Sampling For Bayesian Feature Allocation Models
Authors Alexandre Bouchard-Côté, Andrew Roth
Abstract Bayesian feature allocation models are a popular tool for modelling data with a combinatorial latent structure. Exact inference in these models is generally intractable and so practitioners typically apply Markov Chain Monte Carlo (MCMC) methods for posterior inference. The most widely used MCMC strategies rely on an element-wise Gibbs update of the feature allocation matrix. These element-wise updates can be inefficient as features are typically strongly correlated. To overcome this problem we have developed a Gibbs sampler that can update an entire row of the feature allocation matrix in a single move. However, this sampler is impractical for models with a large number of features as the computational complexity scales exponentially in the number of features. We develop a Particle Gibbs sampler that targets the same distribution as the row-wise Gibbs updates, but has computational complexity that only grows linearly in the number of features. We compare the performance of our proposed methods to the standard Gibbs sampler using synthetic data from a range of feature allocation models. Our results suggest that row-wise updates using the PG methodology can significantly improve the performance of samplers for feature allocation models.
Published 2020-01-25
URL https://arxiv.org/abs/2001.09367v1
PDF https://arxiv.org/pdf/2001.09367v1.pdf
PWC https://paperswithcode.com/paper/particle-gibbs-sampling-for-bayesian-feature
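
To see why an exact row-wise Gibbs update scales exponentially in the number of features, as the abstract above states, consider the following minimal sketch. It enumerates every binary row and normalizes an unnormalized weight over all of them. The `toy_log_weight` function is a hypothetical stand-in for a model's conditional log-probability and is not taken from the paper.

```python
import math
from itertools import product


def row_conditional(log_weight, num_features):
    """Normalized probabilities over every binary row of length num_features."""
    rows = list(product((0, 1), repeat=num_features))  # 2 ** num_features rows
    weights = [math.exp(log_weight(row)) for row in rows]
    total = sum(weights)
    return {row: w / total for row, w in zip(rows, weights)}


def toy_log_weight(row):
    """Hypothetical unnormalized log-probability that favors sparse rows."""
    return -1.0 * sum(row)


conditional = row_conditional(toy_log_weight, num_features=3)
```

With 3 features the table has 8 entries; each additional feature doubles it, which is the cost the Particle Gibbs sampler described in the abstract is designed to avoid.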

Preference Modeling with Context-Dependent Salient Features

Title Preference Modeling with Context-Dependent Salient Features
Authors Amanda Bower, Laura Balzano
Abstract We consider the problem of estimating a ranking on a set of items from noisy pairwise comparisons given item features. We address the fact that pairwise comparison data often reflects irrational choice, e.g. intransitivity. Our key observation is that two items compared in isolation from other items may be compared based on only a salient subset of features. Formalizing this framework, we propose the “salient feature preference model” and prove a sample complexity result for learning the parameters of our model and the underlying ranking with maximum likelihood estimation. We also provide empirical results that support our theoretical bounds and illustrate how our model explains systematic intransitivity. Finally we demonstrate strong performance of maximum likelihood estimation of our model on both synthetic data and two real data sets: the UT Zappos50K data set and comparison data about the compactness of legislative districts in the US.
Published 2020-02-22
URL https://arxiv.org/abs/2002.09615v1
PDF https://arxiv.org/pdf/2002.09615v1.pdf
PWC https://paperswithcode.com/paper/preference-modeling-with-context-dependent

Sparsity-promoting algorithms for the discovery of informative Koopman invariant subspaces

Title Sparsity-promoting algorithms for the discovery of informative Koopman invariant subspaces
Authors Shaowu Pan, Nicholas Arnold-Medabalimi, Karthik Duraisamy
Abstract Koopman decomposition is a non-linear generalization of eigen decomposition, and is being increasingly utilized in the analysis of spatio-temporal dynamics. Well-known techniques such as the dynamic mode decomposition (DMD) and its variants provide approximations to the Koopman operator, and have been applied extensively in many fluid dynamic problems. Despite being endowed with a richer dictionary of nonlinear observables, nonlinear variants of the DMD, such as extended/kernel dynamic mode decomposition (EDMD/KDMD) are seldom applied to large-scale problems primarily due to the difficulty of discerning the Koopman invariant subspace from thousands of resulting Koopman triplets: eigenvalues, eigenvectors, and modes. To address this issue, we revisit the formulation of EDMD and KDMD, and propose an algorithm based on multi-task feature learning to extract the most informative Koopman invariant subspace by removing redundant and spurious Koopman triplets. These algorithms can be viewed as sparsity promoting extensions of EDMD/KDMD and are presented in an open-source package. Further, we extend KDMD to a continuous-time setting and show a relationship between the present algorithm, sparsity-promoting DMD and an empirical criterion from the viewpoint of non-convex optimization. The effectiveness of our algorithm is demonstrated on examples ranging from simple dynamical systems to two-dimensional cylinder wake flows at different Reynolds numbers and a three-dimensional turbulent ship air-wake flow. The latter two problems are designed such that very strong transients are present in the flow evolution, thus requiring accurate representation of decaying modes.
Published 2020-02-25
URL https://arxiv.org/abs/2002.10637v1
PDF https://arxiv.org/pdf/2002.10637v1.pdf
PWC https://paperswithcode.com/paper/sparsity-promoting-algorithms-for-the

Generating Natural Adversarial Hyperspectral examples with a modified Wasserstein GAN

Title Generating Natural Adversarial Hyperspectral examples with a modified Wasserstein GAN
Authors Jean-Christophe Burnel, Kilian Fatras, Nicolas Courty
Abstract Adversarial examples are a hot topic due to their ability to fool a classifier’s prediction. There are two strategies to create such examples: one uses the attacked classifier’s gradients, while the other only requires access to the classifier’s prediction. The latter is particularly appealing when the classifier is not fully known (black box model). In this paper, we present a new method which is able to generate natural adversarial examples from the true data following the second paradigm. Based on Generative Adversarial Networks (GANs) [5], it reweights the true data empirical distribution to encourage the classifier to generate adversarial examples. We provide a proof of concept of our method by generating adversarial hyperspectral signatures on a remote sensing dataset.
Published 2020-01-27
URL https://arxiv.org/abs/2001.09993v1
PDF https://arxiv.org/pdf/2001.09993v1.pdf
PWC https://paperswithcode.com/paper/generating-natural-adversarial-hyperspectral

Subspace Fitting Meets Regression: The Effects of Supervision and Orthonormality Constraints on Double Descent of Generalization Errors

Title Subspace Fitting Meets Regression: The Effects of Supervision and Orthonormality Constraints on Double Descent of Generalization Errors
Authors Yehuda Dar, Paul Mayer, Lorenzo Luzi, Richard G. Baraniuk
Abstract We study the linear subspace fitting problem in the overparameterized setting, where the estimated subspace can perfectly interpolate the training examples. Our scope includes the least-squares solutions to subspace fitting tasks with varying levels of supervision in the training data (i.e., the proportion of input-output examples of the desired low-dimensional mapping) and orthonormality of the vectors defining the learned operator. This flexible family of problems connects standard, unsupervised subspace fitting that enforces strict orthonormality with a corresponding regression task that is fully supervised and does not constrain the linear operator structure. This class of problems is defined over a supervision-orthonormality plane, where each coordinate induces a problem instance with a unique pair of supervision level and softness of orthonormality constraints. We explore this plane and show that the generalization errors of the corresponding subspace fitting problems follow double descent trends as the settings become more supervised and less orthonormally constrained.
Published 2020-02-25
URL https://arxiv.org/abs/2002.10614v1
PDF https://arxiv.org/pdf/2002.10614v1.pdf
PWC https://paperswithcode.com/paper/subspace-fitting-meets-regression-the-effects

Medicine Strip Identification using 2-D Cepstral Feature Extraction and Multiclass Classification Methods

Title Medicine Strip Identification using 2-D Cepstral Feature Extraction and Multiclass Classification Methods
Authors Anirudh Itagi, Ritam Sil, Saurav Mohapatra, Subham Rout, Bharath K P, Karthik R, Rajesh Kumar Muthu
Abstract Misclassification of medicine is perilous to the health of a patient, more so if the said patient is visually impaired or simply did not recognize the color, shape or type of medicine strip. This paper proposes a method for identification of medicine strips by 2-D cepstral analysis of their images followed by performing classification that has been done using the K-Nearest Neighbor (KNN), Support Vector Machine (SVM) and Logistic Regression (LR) Classifiers. The 2-D cepstral features extracted are extremely distinct to a medicine strip and consequently make identifying them exceptionally accurate. This paper also proposes the Color Gradient and Pill shape Feature (CGPF) extraction procedure and discusses the Binary Robust Invariant Scalable Keypoints (BRISK) algorithm as well. The mentioned algorithms were implemented and their identification results have been compared.
Published 2020-02-03
URL https://arxiv.org/abs/2003.00810v1
PDF https://arxiv.org/pdf/2003.00810v1.pdf
PWC https://paperswithcode.com/paper/medicine-strip-identification-using-2-d

Triangle-Net: Towards Robustness in Point Cloud Classification

Title Triangle-Net: Towards Robustness in Point Cloud Classification
Authors Chenxi Xiao, Juan Wachs
Abstract 3D object recognition is becoming a key desired capability for many computer vision systems such as autonomous vehicles, service robots and surveillance drones to operate more effectively in unstructured environments. These real-time systems require effective classification methods that are robust to sampling resolution, measurement noise, and pose configuration of the objects. Previous research has shown that sparsity, rotation and positional variance of points can lead to a significant drop in the performance of point cloud based classification techniques. In this regard, we propose a novel approach for 3D classification that takes sparse point clouds as input and learns a model that is robust to rotational and positional variance as well as point sparsity. To this end, we introduce new feature descriptors which are fed as an input to our proposed neural network in order to learn a robust latent representation of the 3D object. We show that such latent representations can significantly improve the performance of object classification and retrieval. Further, we show that our approach outperforms PointNet and 3DmFV by 34.4% and 27.4% respectively in classification tasks using sparse point clouds of only 16 points under arbitrary SO(3) rotation.
Tasks 3D Object Recognition, Autonomous Vehicles, Object Classification, Object Recognition
Published 2020-02-27
URL https://arxiv.org/abs/2003.00856v1
PDF https://arxiv.org/pdf/2003.00856v1.pdf
PWC https://paperswithcode.com/paper/triangle-net-towards-robustness-in-point

Retrain or not retrain? – efficient pruning methods of deep CNN networks

Title Retrain or not retrain? – efficient pruning methods of deep CNN networks
Authors Marcin Pietron, Maciej Wielgosz
Abstract Convolutional neural networks (CNNs) play a major role in image processing tasks such as image classification, object detection, and semantic segmentation. CNNs very often have from several to a hundred stacked layers with several megabytes of weights. One possible method to reduce complexity and memory footprint is pruning. Pruning is the process of removing weights that connect neurons in two adjacent layers of the network. Finding a near-optimal solution with a specified drop in accuracy can be more involved when the DL model has a larger number of convolutional layers. In the paper, a few approaches based on retraining and no retraining are described and compared.
Tasks Image Classification, Object Detection, Semantic Segmentation
Published 2020-02-12
URL https://arxiv.org/abs/2002.07051v1
PDF https://arxiv.org/pdf/2002.07051v1.pdf
PWC https://paperswithcode.com/paper/retrain-or-not-retrain-efficient-pruning
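
The pruning operation described in the abstract above can be sketched with magnitude pruning, a common baseline. This is an assumption made for illustration: the abstract does not say which pruning criterion the paper uses, and the function name and weights below are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out roughly the given fraction of weights with the smallest magnitude.

    Weights tied with the threshold are also pruned, so the realized
    sparsity can exceed the requested one.
    """
    flat = sorted(abs(w) for row in weights for w in row)
    num_pruned = int(len(flat) * sparsity)
    if num_pruned == 0:
        return [list(row) for row in weights]
    threshold = flat[num_pruned - 1]
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]


# Hypothetical weight matrix connecting two adjacent layers.
layer = [[0.8, -0.05, 0.3],
         [0.01, -0.6, 0.2]]

pruned = magnitude_prune(layer, sparsity=0.5)
```

Whether the remaining weights are then retrained or left as they are is the choice that the paper's title refers to; this sketch covers only the removal step.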