Paper Group ANR 88
Physics-Guided Machine Learning for Scientific Discovery: An Application in Simulating Lake Temperature Profiles
Title | Physics-Guided Machine Learning for Scientific Discovery: An Application in Simulating Lake Temperature Profiles |
Authors | Xiaowei Jia, Jared Willard, Anuj Karpatne, Jordan S Read, Jacob A Zwart, Michael Steinbach, Vipin Kumar |
Abstract | Physics-based models of dynamical systems are often used to study engineering and environmental systems. Despite their extensive use, these models have several well-known limitations due to simplified representations of the physical processes being modeled or challenges in selecting appropriate parameters. While state-of-the-art machine learning models can sometimes outperform physics-based models given ample training data, they can produce results that are physically inconsistent. This paper proposes a physics-guided recurrent neural network model (PGRNN) that combines RNNs and physics-based models to leverage their complementary strengths and improve the modeling of physical processes. Specifically, we show that a PGRNN can improve prediction accuracy over that of physics-based models, while generating outputs consistent with physical laws. An important aspect of our PGRNN approach lies in its ability to incorporate the knowledge encoded in physics-based models. This allows training the PGRNN model using very few true observed data while also ensuring high prediction accuracy. Although we present and evaluate this methodology in the context of modeling the dynamics of temperature in lakes, it is applicable more widely to a range of scientific and engineering disciplines where physics-based (also known as mechanistic) models are used, e.g., climate science, materials science, computational chemistry, and biomedicine. |
Tasks | |
Published | 2020-01-28 |
URL | https://arxiv.org/abs/2001.11086v1 |
https://arxiv.org/pdf/2001.11086v1.pdf | |
PWC | https://paperswithcode.com/paper/physics-guided-machine-learning-for |
Repo | |
Framework | |
Coherent Gradients: An Approach to Understanding Generalization in Gradient Descent-based Optimization
Title | Coherent Gradients: An Approach to Understanding Generalization in Gradient Descent-based Optimization |
Authors | Satrajit Chatterjee |
Abstract | An open question in the Deep Learning community is why neural networks trained with Gradient Descent generalize well on real datasets even though they are capable of fitting random data. We propose an approach to answering this question based on a hypothesis about the dynamics of gradient descent that we call Coherent Gradients: Gradients from similar examples are similar and so the overall gradient is stronger in certain directions where these reinforce each other. Thus changes to the network parameters during training are biased towards those that (locally) simultaneously benefit many examples when such similarity exists. We support this hypothesis with heuristic arguments and perturbative experiments and outline how this can explain several common empirical observations about Deep Learning. Furthermore, our analysis is not just descriptive, but prescriptive. It suggests a natural modification to gradient descent that can greatly reduce overfitting. |
Tasks | |
Published | 2020-02-25 |
URL | https://arxiv.org/abs/2002.10657v1 |
https://arxiv.org/pdf/2002.10657v1.pdf | |
PWC | https://paperswithcode.com/paper/coherent-gradients-an-approach-to-1 |
Repo | |
Framework | |
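A toy illustration of the Coherent Gradients hypothesis summarized above, not the paper's experiments: it assumes a linear model with squared loss and compares how well per-example gradients agree when labels follow one shared rule versus when they are random. The function names and the data setup are hypothetical.

```python
import numpy as np

def per_example_gradients(w, X, y):
    """Gradients of 0.5 * (x @ w - y)**2 with respect to w, one row per example."""
    residuals = X @ w - y
    return residuals[:, None] * X

def mean_pairwise_cosine(grads):
    """Average cosine similarity over all pairs of distinct per-example gradients."""
    unit = grads / np.linalg.norm(grads, axis=1, keepdims=True)
    n = len(unit)
    total = np.linalg.norm(unit.sum(axis=0)) ** 2  # sum of all n*n pairwise cosines
    return (total - n) / (n * (n - 1))             # drop the n self-pairs

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5))
w = np.zeros(5)                          # untrained model (assumption)
y_structured = X @ rng.normal(size=5)    # labels generated by one shared rule
y_random = rng.normal(size=64)           # labels unrelated to the inputs

coherence_structured = mean_pairwise_cosine(per_example_gradients(w, X, y_structured))
coherence_random = mean_pairwise_cosine(per_example_gradients(w, X, y_random))
print(coherence_structured, coherence_random)
```

When the labels share structure, the per-example gradients point in broadly similar directions and reinforce one another in the summed gradient; with random labels they largely cancel.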
Learning to Multi-Task Learn for Better Neural Machine Translation
Title | Learning to Multi-Task Learn for Better Neural Machine Translation |
Authors | Poorya Zaremoodi, Gholamreza Haffari |
Abstract | Scarcity of parallel sentence pairs is a major challenge for training high quality neural machine translation (NMT) models in bilingually low-resource scenarios, as NMT is data-hungry. Multi-task learning is an elegant approach to inject linguistic-related inductive biases into NMT, using auxiliary syntactic and semantic tasks, to improve generalisation. The challenge, however, is to devise effective training schedules, prescribing when to make use of the auxiliary tasks during the training process to fill the knowledge gaps of the main translation task, a setting referred to as biased-MTL. Current approaches for the training schedule are based on hand-engineering heuristics, whose effectiveness varies in different MTL settings. We propose a novel framework for learning the training schedule, i.e., learning to multi-task learn, for the MTL setting of interest. We formulate the training schedule as a Markov decision process which paves the way to employ policy learning methods to learn the scheduling policy. We effectively and efficiently learn the training schedule policy within the imitation learning framework using an oracle policy algorithm that dynamically sets the importance weights of auxiliary tasks based on their contributions to the generalisability of the main NMT task. Experiments on low-resource NMT settings show the resulting automatically learned training schedulers are competitive with the best heuristics, and lead to up to +1.1 BLEU score improvements. |
Tasks | Imitation Learning, Machine Translation, Multi-Task Learning |
Published | 2020-01-10 |
URL | https://arxiv.org/abs/2001.03294v1 |
https://arxiv.org/pdf/2001.03294v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-multi-task-learn-for-better |
Repo | |
Framework | |
Gated Graph Recurrent Neural Networks
Title | Gated Graph Recurrent Neural Networks |
Authors | Luana Ruiz, Fernando Gama, Alejandro Ribeiro |
Abstract | Graph processes exhibit a temporal structure determined by the sequence index and a spatial structure determined by the graph support. To learn from graph processes, an information processing architecture must then be able to exploit both underlying structures. We introduce Graph Recurrent Neural Networks (GRNNs), which achieve this goal by leveraging the hidden Markov model (HMM) together with graph signal processing (GSP). In the GRNN, the number of learnable parameters is independent of the length of the sequence and of the size of the graph, guaranteeing scalability. We also prove that GRNNs are permutation equivariant and that they are stable to perturbations of the underlying graph support. Following the observation that stability decreases with longer sequences, we propose a time-gated extension of GRNNs. We also put forward node- and edge-gated variants of the GRNN to address the problem of vanishing gradients arising from long range graph dependencies. The advantages of GRNNs over GNNs and RNNs are demonstrated in a synthetic regression experiment and in a classification problem where seismic wave readings from a network of seismographs are used to predict the region of an earthquake. Finally, the benefits of time, node and edge gating are experimentally validated in multiple time and spatial correlation scenarios. |
Tasks | |
Published | 2020-02-03 |
URL | https://arxiv.org/abs/2002.01038v1 |
https://arxiv.org/pdf/2002.01038v1.pdf | |
PWC | https://paperswithcode.com/paper/gated-graph-recurrent-neural-networks |
Repo | |
Framework | |
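A rough sketch of the two properties claimed in the GRNN abstract above (parameter count independent of graph size and sequence length, and permutation equivariance). The recurrence used here, a tanh of two polynomial graph filters with one scalar feature per node, is an assumed simplification and may differ from the authors' architecture; all names are hypothetical.

```python
import numpy as np

def graph_filter(S, x, taps):
    """Apply the polynomial graph filter sum_k taps[k] * S^k to the signal x."""
    out = np.zeros_like(x)
    shifted = x.copy()
    for tap in taps:
        out = out + tap * shifted
        shifted = S @ shifted
    return out

def grnn_states(S, inputs, in_taps, state_taps):
    """Run h_t = tanh(A(S) x_t + B(S) h_{t-1}) over a sequence of graph signals."""
    h = np.zeros(S.shape[0])
    states = []
    for x in inputs:
        h = np.tanh(graph_filter(S, x, in_taps) + graph_filter(S, h, state_taps))
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
num_nodes, num_steps = 6, 4
S = rng.normal(size=(num_nodes, num_nodes))
S = 0.1 * (S + S.T)                            # symmetric graph shift operator
inputs = rng.normal(size=(num_steps, num_nodes))
in_taps = np.array([0.5, -0.3, 0.1])           # the only learnable parameters:
state_taps = np.array([0.2, 0.4, -0.1])        # 6 scalars for any graph or length

states = grnn_states(S, inputs, in_taps, state_taps)

# Relabel the nodes: permute the graph and the inputs consistently.
P = np.eye(num_nodes)[rng.permutation(num_nodes)]
states_permuted = grnn_states(P @ S @ P.T, inputs @ P.T, in_taps, state_taps)
```

Relabeling the nodes permutes the hidden states in the same way, which is what permutation equivariance means in this setting.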
Interactive Robot Training for Non-Markov Tasks
Title | Interactive Robot Training for Non-Markov Tasks |
Authors | Ankit Shah, Julie Shah |
Abstract | Defining sound and complete specifications for robots using formal languages is challenging, while learning formal specifications directly from demonstrations can lead to over-constrained task policies. In this paper, we propose a Bayesian interactive robot training framework that allows the robot to learn from both demonstrations provided by a teacher, and that teacher’s assessments of the robot’s task executions. We also present an active learning approach – inspired by uncertainty sampling – to identify the task execution with the most uncertain degree of acceptability. We demonstrate that active learning within our framework identifies a teacher’s intended task specification to a greater degree of similarity when compared with an approach that learns purely from demonstrations. Finally, we also conduct a user study that demonstrates the efficacy of our active learning framework in learning a table-setting task from a human teacher. |
Tasks | Active Learning |
Published | 2020-03-04 |
URL | https://arxiv.org/abs/2003.02232v1 |
https://arxiv.org/pdf/2003.02232v1.pdf | |
PWC | https://paperswithcode.com/paper/interactive-robot-training-for-non-markov |
Repo | |
Framework | |
FASTER: Fast and Safe Trajectory Planner for Flights in Unknown Environments
Title | FASTER: Fast and Safe Trajectory Planner for Flights in Unknown Environments |
Authors | Jesus Tordesillas, Brett T. Lopez, Michael Everett, Jonathan P. How |
Abstract | Planning high-speed trajectories for UAVs in unknown environments requires algorithmic techniques that enable fast reaction times to guarantee safety as more information about the environment becomes available. The standard approach to ensure safety is to enforce a “stop” condition in the free-known space. However, this can severely limit the speed of the vehicle, especially in situations where much of the world is unknown. Moreover, the ad-hoc time and interval allocation scheme usually imposed on the trajectory also leads to conservative and slower trajectories. This work proposes FASTER (Fast and Safe Trajectory Planner) to ensure safety without sacrificing speed. FASTER obtains high-speed trajectories by enabling the local planner to optimize in both the free-known and unknown spaces. Safety guarantees are ensured by always having a feasible, safe back-up trajectory in the free-known space at the start of each replanning step. The Mixed Integer Quadratic Program formulation proposed allows the solver to choose the trajectory interval allocation, and the time allocation is found by a line search algorithm initialized with a heuristic computed from the previous replanning iteration. This proposed algorithm is tested extensively both in simulation and in real hardware, showing agile flights in unknown cluttered environments with velocities up to 7.8 m/s. To demonstrate the generality of the proposed framework, FASTER is also applied to a skid-steer robot, and the maximum speed specified for the robot (2 m/s) is achieved in real hardware experiments. |
Tasks | |
Published | 2020-01-09 |
URL | https://arxiv.org/abs/2001.04420v1 |
https://arxiv.org/pdf/2001.04420v1.pdf | |
PWC | https://paperswithcode.com/paper/faster-fast-and-safe-trajectory-planner-for |
Repo | |
Framework | |
iCap: Interactive Image Captioning with Predictive Text
Title | iCap: Interactive Image Captioning with Predictive Text |
Authors | Zhengxiong Jia, Xirong Li |
Abstract | In this paper we study a brand new topic of interactive image captioning with a human in the loop. Different from automated image captioning where a given test image is the sole input in the inference stage, we have access to both the test image and a sequence of (incomplete) user-input sentences in the interactive scenario. We formulate the problem as Visually Conditioned Sentence Completion (VCSC). For VCSC, we propose asynchronous bidirectional decoding for image caption completion (ABD-Cap). With ABD-Cap as the core module, we build iCap, a web-based interactive image captioning system capable of predicting new text with respect to live input from a user. A number of experiments covering both automated evaluations and real user studies show the viability of our proposals. |
Tasks | Image Captioning |
Published | 2020-01-31 |
URL | https://arxiv.org/abs/2001.11782v3 |
https://arxiv.org/pdf/2001.11782v3.pdf | |
PWC | https://paperswithcode.com/paper/icap-interative-image-captioning-with |
Repo | |
Framework | |
Particle-Gibbs Sampling For Bayesian Feature Allocation Models
Title | Particle-Gibbs Sampling For Bayesian Feature Allocation Models |
Authors | Alexandre Bouchard-Côté, Andrew Roth |
Abstract | Bayesian feature allocation models are a popular tool for modelling data with a combinatorial latent structure. Exact inference in these models is generally intractable and so practitioners typically apply Markov Chain Monte Carlo (MCMC) methods for posterior inference. The most widely used MCMC strategies rely on an element-wise Gibbs update of the feature allocation matrix. These element-wise updates can be inefficient as features are typically strongly correlated. To overcome this problem we have developed a Gibbs sampler that can update an entire row of the feature allocation matrix in a single move. However, this sampler is impractical for models with a large number of features as the computational complexity scales exponentially in the number of features. We develop a Particle Gibbs sampler that targets the same distribution as the row-wise Gibbs updates, but has computational complexity that only grows linearly in the number of features. We compare the performance of our proposed methods to the standard Gibbs sampler using synthetic data from a range of feature allocation models. Our results suggest that row-wise updates using the PG methodology can significantly improve the performance of samplers for feature allocation models. |
Tasks | |
Published | 2020-01-25 |
URL | https://arxiv.org/abs/2001.09367v1 |
https://arxiv.org/pdf/2001.09367v1.pdf | |
PWC | https://paperswithcode.com/paper/particle-gibbs-sampling-for-bayesian-feature |
Repo | |
Framework | |
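To make the cost argument in the Particle-Gibbs abstract above concrete, here is a minimal sketch of an exact row-wise update done by brute-force enumeration. It is not the authors' sampler: `log_joint` is a hypothetical stand-in for the model's unnormalized log conditional of one binary row, and the toy target is purely illustrative.

```python
import itertools
import numpy as np

def sample_row_by_enumeration(log_joint, num_features, rng):
    """Draw one binary row from its exact conditional by scoring every configuration."""
    configs = np.array(list(itertools.product([0, 1], repeat=num_features)))
    log_weights = np.array([log_joint(z) for z in configs])
    probs = np.exp(log_weights - log_weights.max())  # stabilise before normalising
    probs /= probs.sum()
    index = rng.choice(len(configs), p=probs)
    return configs[index], len(configs)

def toy_log_joint(z):
    """Illustrative target: features 0 and 1 strongly prefer to agree."""
    return 4.0 if z[0] == z[1] else 0.0

rng = np.random.default_rng(0)
row, evaluations = sample_row_by_enumeration(toy_log_joint, 10, rng)
print(evaluations)  # number of configurations scored for a single row update
```

Each row update scores 2^K configurations for K features, which is the exponential scaling that motivates a sampler whose cost grows only linearly in K.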
Preference Modeling with Context-Dependent Salient Features
Title | Preference Modeling with Context-Dependent Salient Features |
Authors | Amanda Bower, Laura Balzano |
Abstract | We consider the problem of estimating a ranking on a set of items from noisy pairwise comparisons given item features. We address the fact that pairwise comparison data often reflects irrational choice, e.g. intransitivity. Our key observation is that two items compared in isolation from other items may be compared based on only a salient subset of features. Formalizing this framework, we propose the “salient feature preference model” and prove a sample complexity result for learning the parameters of our model and the underlying ranking with maximum likelihood estimation. We also provide empirical results that support our theoretical bounds and illustrate how our model explains systematic intransitivity. Finally, we demonstrate strong performance of maximum likelihood estimation of our model on both synthetic data and two real data sets: the UT Zappos50K data set and comparison data about the compactness of legislative districts in the US. |
Tasks | |
Published | 2020-02-22 |
URL | https://arxiv.org/abs/2002.09615v1 |
https://arxiv.org/pdf/2002.09615v1.pdf | |
PWC | https://paperswithcode.com/paper/preference-modeling-with-context-dependent |
Repo | |
Framework | |
Sparsity-promoting algorithms for the discovery of informative Koopman invariant subspaces
Title | Sparsity-promoting algorithms for the discovery of informative Koopman invariant subspaces |
Authors | Shaowu Pan, Nicholas Arnold-Medabalimi, Karthik Duraisamy |
Abstract | Koopman decomposition is a non-linear generalization of eigendecomposition, and is being increasingly utilized in the analysis of spatio-temporal dynamics. Well-known techniques such as the dynamic mode decomposition (DMD) and its variants provide approximations to the Koopman operator, and have been applied extensively in many fluid dynamic problems. Despite being endowed with a richer dictionary of nonlinear observables, nonlinear variants of the DMD, such as extended/kernel dynamic mode decomposition (EDMD/KDMD), are seldom applied to large-scale problems, primarily due to the difficulty of discerning the Koopman invariant subspace from thousands of resulting Koopman triplets: eigenvalues, eigenvectors, and modes. To address this issue, we revisit the formulation of EDMD and KDMD, and propose an algorithm based on multi-task feature learning to extract the most informative Koopman invariant subspace by removing redundant and spurious Koopman triplets. These algorithms can be viewed as sparsity-promoting extensions of EDMD/KDMD and are presented in an open-source package. Further, we extend KDMD to a continuous-time setting and show a relationship between the present algorithm, sparsity-promoting DMD and an empirical criterion from the viewpoint of non-convex optimization. The effectiveness of our algorithm is demonstrated on examples ranging from simple dynamical systems to two-dimensional cylinder wake flows at different Reynolds numbers and a three-dimensional turbulent ship air-wake flow. The latter two problems are designed such that very strong transients are present in the flow evolution, thus requiring accurate representation of decaying modes. |
Tasks | |
Published | 2020-02-25 |
URL | https://arxiv.org/abs/2002.10637v1 |
https://arxiv.org/pdf/2002.10637v1.pdf | |
PWC | https://paperswithcode.com/paper/sparsity-promoting-algorithms-for-the |
Repo | |
Framework | |
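For readers unfamiliar with the baseline that the Koopman abstract above builds on, the following is a minimal sketch of standard (exact) DMD, not the paper's sparsity-promoting EDMD/KDMD algorithm or its open-source package. The function name and the two-state linear test system are assumptions chosen so the result can be checked by hand.

```python
import numpy as np

def dmd(snapshots, rank):
    """Exact DMD: eigenvalues and modes of the best-fit linear map between snapshots."""
    X, Y = snapshots[:, :-1], snapshots[:, 1:]       # states and their successors
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]      # keep the leading `rank` directions
    Y_V_Sinv = (Y @ Vh.conj().T) / s                 # Y V S^{-1}
    A_reduced = U.conj().T @ Y_V_Sinv                # operator projected onto span(U)
    eigvals, W = np.linalg.eig(A_reduced)
    modes = Y_V_Sinv @ W                             # exact DMD modes
    return eigvals, modes

# Toy linear system x_{k+1} = A x_k with eigenvalues 0.9 and 0.5.
A = np.array([[0.9, 0.2], [0.0, 0.5]])
states = [np.array([1.0, 1.0])]
for _ in range(9):
    states.append(A @ states[-1])
snapshots = np.stack(states, axis=1)                 # shape (2, 10)

eigvals, modes = dmd(snapshots, rank=2)
print(np.sort(eigvals.real))
```

On a linear system like this one, the DMD eigenvalues coincide with those of `A`; for nonlinear dynamics with a large dictionary of observables, many of the resulting eigenvalue and mode pairs are spurious, which is the selection problem the abstract describes.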
Generating Natural Adversarial Hyperspectral examples with a modified Wasserstein GAN
Title | Generating Natural Adversarial Hyperspectral examples with a modified Wasserstein GAN |
Authors | Jean-Christophe Burnel, Kilian Fatras, Nicolas Courty |
Abstract | Adversarial examples are a hot topic due to their ability to fool a classifier’s prediction. There are two strategies to create such examples: one uses the attacked classifier’s gradients, while the other only requires access to the classifier’s prediction. This is particularly appealing when the classifier is not fully known (black box model). In this paper, we present a new method which is able to generate natural adversarial examples from the true data following the second paradigm. Based on Generative Adversarial Networks (GANs) [5], it reweights the true data empirical distribution to encourage the classifier to generate adversarial examples. We provide a proof of concept of our method by generating adversarial hyperspectral signatures on a remote sensing dataset. |
Tasks | |
Published | 2020-01-27 |
URL | https://arxiv.org/abs/2001.09993v1 |
https://arxiv.org/pdf/2001.09993v1.pdf | |
PWC | https://paperswithcode.com/paper/generating-natural-adversarial-hyperspectral |
Repo | |
Framework | |
Subspace Fitting Meets Regression: The Effects of Supervision and Orthonormality Constraints on Double Descent of Generalization Errors
Title | Subspace Fitting Meets Regression: The Effects of Supervision and Orthonormality Constraints on Double Descent of Generalization Errors |
Authors | Yehuda Dar, Paul Mayer, Lorenzo Luzi, Richard G. Baraniuk |
Abstract | We study the linear subspace fitting problem in the overparameterized setting, where the estimated subspace can perfectly interpolate the training examples. Our scope includes the least-squares solutions to subspace fitting tasks with varying levels of supervision in the training data (i.e., the proportion of input-output examples of the desired low-dimensional mapping) and orthonormality of the vectors defining the learned operator. This flexible family of problems connects standard, unsupervised subspace fitting that enforces strict orthonormality with a corresponding regression task that is fully supervised and does not constrain the linear operator structure. This class of problems is defined over a supervision-orthonormality plane, where each coordinate induces a problem instance with a unique pair of supervision level and softness of orthonormality constraints. We explore this plane and show that the generalization errors of the corresponding subspace fitting problems follow double descent trends as the settings become more supervised and less orthonormally constrained. |
Tasks | |
Published | 2020-02-25 |
URL | https://arxiv.org/abs/2002.10614v1 |
https://arxiv.org/pdf/2002.10614v1.pdf | |
PWC | https://paperswithcode.com/paper/subspace-fitting-meets-regression-the-effects |
Repo | |
Framework | |
Medicine Strip Identification using 2-D Cepstral Feature Extraction and Multiclass Classification Methods
Title | Medicine Strip Identification using 2-D Cepstral Feature Extraction and Multiclass Classification Methods |
Authors | Anirudh Itagi, Ritam Sil, Saurav Mohapatra, Subham Rout, Bharath K P, Karthik R, Rajesh Kumar Muthu |
Abstract | Misclassification of medicine is perilous to the health of a patient, more so if the said patient is visually impaired or simply did not recognize the color, shape or type of medicine strip. This paper proposes a method for identification of medicine strips by 2-D cepstral analysis of their images followed by classification using the K-Nearest Neighbor (KNN), Support Vector Machine (SVM) and Logistic Regression (LR) classifiers. The 2-D cepstral features extracted are extremely distinct to a medicine strip and consequently make identifying them exceptionally accurate. This paper also proposes the Color Gradient and Pill shape Feature (CGPF) extraction procedure and discusses the Binary Robust Invariant Scalable Keypoints (BRISK) algorithm as well. The mentioned algorithms were implemented and their identification results have been compared. |
Tasks | |
Published | 2020-02-03 |
URL | https://arxiv.org/abs/2003.00810v1 |
https://arxiv.org/pdf/2003.00810v1.pdf | |
PWC | https://paperswithcode.com/paper/medicine-strip-identification-using-2-d |
Repo | |
Framework | |
Triangle-Net: Towards Robustness in Point Cloud Classification
Title | Triangle-Net: Towards Robustness in Point Cloud Classification |
Authors | Chenxi Xiao, Juan Wachs |
Abstract | 3D object recognition is becoming a key desired capability for many computer vision systems such as autonomous vehicles, service robots and surveillance drones to operate more effectively in unstructured environments. These real-time systems require effective classification methods that are robust to sampling resolution, measurement noise, and pose configuration of the objects. Previous research has shown that sparsity, rotation and positional variance of points can lead to a significant drop in the performance of point cloud based classification techniques. In this regard, we propose a novel approach for 3D classification that takes sparse point clouds as input and learns a model that is robust to rotational and positional variance as well as point sparsity. To this end, we introduce new feature descriptors which are fed as an input to our proposed neural network in order to learn a robust latent representation of the 3D object. We show that such latent representations can significantly improve the performance of object classification and retrieval. Further, we show that our approach outperforms PointNet and 3DmFV by 34.4% and 27.4% respectively in classification tasks using sparse point clouds of only 16 points under arbitrary SO(3) rotation. |
Tasks | 3D Object Recognition, Autonomous Vehicles, Object Classification, Object Recognition |
Published | 2020-02-27 |
URL | https://arxiv.org/abs/2003.00856v1 |
https://arxiv.org/pdf/2003.00856v1.pdf | |
PWC | https://paperswithcode.com/paper/triangle-net-towards-robustness-in-point |
Repo | |
Framework | |
Retrain or not retrain? – efficient pruning methods of deep CNN networks
Title | Retrain or not retrain? – efficient pruning methods of deep CNN networks |
Authors | Marcin Pietron, Maciej Wielgosz |
Abstract | Convolutional neural networks (CNNs) play a major role in image processing tasks such as image classification, object detection, and semantic segmentation. Very often CNNs have from several to hundreds of stacked layers with several megabytes of weights. One possible method to reduce complexity and memory footprint is pruning. Pruning is a process of removing weights which connect neurons from two adjacent layers in the network. The process of finding a near-optimal solution with a specified drop in accuracy can be more sophisticated when the DL model has a higher number of convolutional layers. In this paper, a few approaches based on retraining and no retraining are described and compared. |
Tasks | Image Classification, Object Detection, Semantic Segmentation |
Published | 2020-02-12 |
URL | https://arxiv.org/abs/2002.07051v1 |
https://arxiv.org/pdf/2002.07051v1.pdf | |
PWC | https://paperswithcode.com/paper/retrain-or-not-retrain-efficient-pruning |
Repo | |
Framework | |
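A minimal sketch of pruning without retraining, as described in the abstract above. The abstract does not say which weights are removed, so the magnitude criterion used here is an assumption, and `magnitude_prune` is a hypothetical helper rather than the authors' method.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights; return (pruned, keep_mask)."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(sparsity * weights.size)                 # number of weights to remove
    if k == 0:
        return weights.copy(), np.ones(weights.shape, dtype=bool)
    magnitudes = np.abs(weights).ravel()
    threshold = np.partition(magnitudes, k - 1)[k - 1]  # k-th smallest magnitude
    mask = np.abs(weights) > threshold               # keep only the larger weights
    return weights * mask, mask

rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 16))                   # stand-in for one layer's weights
pruned, mask = magnitude_prune(weights, sparsity=0.5)
print(np.count_nonzero(pruned), "of", weights.size, "weights kept")
```

A retraining-based variant would fine-tune the surviving weights after each pruning step while holding the mask fixed; that step is omitted here.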