April 2, 2020


Paper Group ANR 101


Up to two billion times acceleration of scientific simulations with deep neural architecture search. Towards Practical Lottery Ticket Hypothesis for Adversarial Training. Deeper Insights into Weight Sharing in Neural Architecture Search. Baryons from Mesons: A Machine Learning Perspective. Wavelet-based Temporal Forecasting Models of Human Activiti …

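Up to two billion times acceleration of scientific simulations with deep neural architecture search

Title Up to two billion times acceleration of scientific simulations with deep neural architecture search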
Authors M. F. Kasim, D. Watson-Parris, L. Deaconu, S. Oliver, P. Hatfield, D. H. Froula, G. Gregori, M. Jarvis, S. Khatiwala, J. Korenaga, J. Topp-Mugglestone, E. Viezzer, S. M. Vinko
Abstract Computer simulations are invaluable tools for scientific discovery. However, accurate simulations are often slow to execute, which limits their applicability to extensive parameter exploration, large-scale data analysis, and uncertainty quantification. A promising route to accelerate simulations by building fast emulators with machine learning requires large training datasets, which can be prohibitively expensive to obtain with slow simulations. Here we present a method based on neural architecture search to build accurate emulators even with a limited amount of training data. The method successfully accelerates simulations by up to 2 billion times in 10 scientific cases including astrophysics, climate science, biogeochemistry, high energy density physics, fusion energy, and seismology, using the same super-architecture, algorithm, and hyperparameters. Our approach also inherently provides emulator uncertainty estimation, adding further confidence in their use. We anticipate this work will accelerate research involving expensive simulations, allow more extensive parameter exploration, and enable new, previously unfeasible computational discovery.
Tasks Neural Architecture Search
Published 2020-01-17
URL https://arxiv.org/abs/2001.08055v1
PDF https://arxiv.org/pdf/2001.08055v1.pdf
PWC https://paperswithcode.com/paper/up-to-two-billion-times-acceleration-of

Towards Practical Lottery Ticket Hypothesis for Adversarial Training

Title Towards Practical Lottery Ticket Hypothesis for Adversarial Training
Authors Bai Li, Shiqi Wang, Yunhan Jia, Yantao Lu, Zhenyu Zhong, Lawrence Carin, Suman Jana
Abstract Recent research has proposed the lottery ticket hypothesis, suggesting that for a deep neural network, there exist trainable sub-networks performing equally or better than the original model with commensurate training steps. While this discovery is insightful, finding proper sub-networks requires iterative training and pruning. The high cost incurred limits the applications of the lottery ticket hypothesis. We show there exists a subset of the aforementioned sub-networks that converge significantly faster during the training process and thus can mitigate the cost issue. We conduct extensive experiments to show such sub-networks consistently exist across various model structures for a restrictive setting of hyperparameters (e.g., carefully selected learning rate, pruning ratio, and model capacity). As a practical application of our findings, we demonstrate that such sub-networks can help in cutting down the total time of adversarial training, a standard approach to improve robustness, by up to 49% on CIFAR-10 to achieve the state-of-the-art robustness.
Published 2020-03-06
URL https://arxiv.org/abs/2003.05733v1
PDF https://arxiv.org/pdf/2003.05733v1.pdf
PWC https://paperswithcode.com/paper/towards-practical-lottery-ticket-hypothesis
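
The "iterative training and pruning" that the abstract describes as costly can be pictured with a single toy magnitude-pruning step. The sketch below is only an illustration and not the authors' procedure: the helper names `magnitude_mask` and `apply_mask`, the flat weight list, and the 50% pruning ratio are all assumptions made for the example.

```python
def magnitude_mask(weights, prune_ratio):
    """Return a 0/1 mask that drops the smallest-magnitude weights.

    Hypothetical helper: `prune_ratio` is the fraction of weights to remove.
    """
    n_prune = int(len(weights) * prune_ratio)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


def apply_mask(weights, mask):
    """Keep surviving weights and set pruned ones to zero."""
    return [w * m for w, m in zip(weights, mask)]


# Toy "trained" weights. In lottery-ticket experiments the surviving weights
# would then be reset to their initial values and retrained (not shown here).
trained = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2]
mask = magnitude_mask(trained, prune_ratio=0.5)
sparse = apply_mask(trained, mask)
```

Repeating this step after every training round is what makes the search expensive, which is the cost the abstract aims to reduce.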

Deeper Insights into Weight Sharing in Neural Architecture Search

Title Deeper Insights into Weight Sharing in Neural Architecture Search
Authors Yuge Zhang, Zejun Lin, Junyang Jiang, Quanlu Zhang, Yujing Wang, Hui Xue, Chen Zhang, Yaming Yang
Abstract With the success of deep neural networks, Neural Architecture Search (NAS) as a way of automatic model design has attracted wide attention. As training every child model from scratch is very time-consuming, recent works leverage weight-sharing to speed up the model evaluation procedure. These approaches greatly reduce computation by maintaining a single copy of weights on the super-net and sharing the weights among all child models. However, weight-sharing has no theoretical guarantee and its impact has not been well studied before. In this paper, we conduct comprehensive experiments to reveal the impact of weight-sharing: (1) The best-performing models from different runs or even from consecutive epochs within the same run have significant variance; (2) Even with high variance, we can extract valuable information from training the super-net with shared weights; (3) The interference between child models is a main factor that induces high variance; (4) Properly reducing the degree of weight sharing could effectively reduce variance and improve performance.
Tasks Neural Architecture Search
Published 2020-01-06
URL https://arxiv.org/abs/2001.01431v1
PDF https://arxiv.org/pdf/2001.01431v1.pdf
PWC https://paperswithcode.com/paper/deeper-insights-into-weight-sharing-in-neural-1

Baryons from Mesons: A Machine Learning Perspective

Title Baryons from Mesons: A Machine Learning Perspective
Authors Yarin Gal, Vishnu Jejjala, Damian Kaloni Mayorga Pena, Challenger Mishra
Abstract Quantum chromodynamics (QCD) is the theory of the strong interaction. The fundamental particles of QCD, quarks and gluons, carry colour charge and form colourless bound states at low energies. The hadronic bound states of primary interest to us are the mesons and the baryons. From knowledge of the meson spectrum, we use neural networks and Gaussian processes to predict the masses of baryons with 90.3% and 96.6% accuracy, respectively. These results compare favourably to the constituent quark model. We also predict the masses of pentaquarks and other exotic hadrons.
Tasks Gaussian Processes
Published 2020-03-23
URL https://arxiv.org/abs/2003.10445v1
PDF https://arxiv.org/pdf/2003.10445v1.pdf
PWC https://paperswithcode.com/paper/baryons-from-mesons-a-machine-learning

Wavelet-based Temporal Forecasting Models of Human Activities for Anomaly Detection

Title Wavelet-based Temporal Forecasting Models of Human Activities for Anomaly Detection
Authors Manuel Fernandez-Carmona, Nicola Bellotto
Abstract This paper presents a novel approach for temporal modelling of long-term human activities based on wavelet transforms. The model is applied to binary smart-home sensors to forecast their signals, which are then used as temporal priors to infer anomalies in office and Active & Assisted Living (AAL) scenarios. Such inference is performed by a new extension of Hybrid Markov Logic Networks (HMLNs) that merges different anomaly indicators, including activity levels detected by sensors, expert rules and the new temporal models. The latter in particular allow the inference system to discover deviations from long-term activity patterns, which cannot be detected by simpler frequency-based models. Two new publicly available datasets were collected using several smart-sensors to evaluate the wavelet-based temporal models and their application to signal forecasting and anomaly detection. The experimental results show the effectiveness of the proposed techniques and their successful application to detect unexpected activities in office and AAL settings.
Tasks Anomaly Detection
Published 2020-02-26
URL https://arxiv.org/abs/2002.11503v1
PDF https://arxiv.org/pdf/2002.11503v1.pdf
PWC https://paperswithcode.com/paper/wavelet-based-temporal-forecasting-models-of
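
As a rough picture of what a wavelet transform does to a binary sensor signal, the sketch below applies one level of the Haar transform and then inverts it. The choice of the Haar wavelet, the function names, and the sample readings are assumptions for illustration; the abstract does not say which wavelet family the authors use.

```python
import math


def haar_step(signal):
    """One level of the Haar wavelet transform on an even-length sequence.

    Returns (approximation, detail) coefficient lists.
    """
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_step, recovering the original sequence."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


# Hypothetical binary sensor readings (1 = activity detected).
readings = [0, 0, 1, 1, 1, 0, 0, 0]
approx, detail = haar_step(readings)
restored = haar_inverse(approx, detail)
```

The approximation coefficients summarise activity at a coarser time scale, while non-zero detail coefficients mark where the signal changes within a pair of samples.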

ClusterVO: Clustering Moving Instances and Estimating Visual Odometry for Self and Surroundings

Title ClusterVO: Clustering Moving Instances and Estimating Visual Odometry for Self and Surroundings
Authors Jiahui Huang, Sheng Yang, Tai-Jiang Mu, Shi-Min Hu
Abstract We present ClusterVO, a stereo Visual Odometry which simultaneously clusters and estimates the motion of both ego and surrounding rigid clusters/objects. Unlike previous solutions relying on batch input or imposing priors on scene structure or dynamic object models, ClusterVO is online, general and thus can be used in various scenarios including indoor scene understanding and autonomous driving. At the core of our system lies a multi-level probabilistic association mechanism and a heterogeneous Conditional Random Field (CRF) clustering approach combining semantic, spatial and motion information to jointly infer cluster segmentations online for every frame. The poses of camera and dynamic objects are instantly solved through a sliding-window optimization. Our system is evaluated on the Oxford Multimotion and KITTI datasets both quantitatively and qualitatively, reaching comparable results to state-of-the-art solutions on both odometry and dynamic trajectory recovery.
Tasks Autonomous Driving, Scene Understanding, Visual Odometry
Published 2020-03-29
URL https://arxiv.org/abs/2003.12980v1
PDF https://arxiv.org/pdf/2003.12980v1.pdf
PWC https://paperswithcode.com/paper/clustervo-clustering-moving-instances-and

An Artificial Intelligence-Based System to Assess Nutrient Intake for Hospitalised Patients

Title An Artificial Intelligence-Based System to Assess Nutrient Intake for Hospitalised Patients
Authors Ya Lu, Thomai Stathopoulou, Maria F. Vasiloglou, Stergios Christodoulidis, Zeno Stanga, Stavroula Mougiakakou
Abstract Regular monitoring of nutrient intake in hospitalised patients plays a critical role in reducing the risk of disease-related malnutrition. Although several methods to estimate nutrient intake have been developed, there is still a clear demand for a more reliable and fully automated technique, as this could improve data accuracy and reduce both the burden on participants and health costs. In this paper, we propose a novel system based on artificial intelligence (AI) to accurately estimate nutrient intake, by simply processing RGB Depth (RGB-D) image pairs captured before and after meal consumption. The system includes a novel multi-task contextual network for food segmentation, a few-shot learning-based classifier built by limited training samples for food recognition, and an algorithm for 3D surface construction. This allows sequential food segmentation, recognition, and estimation of the consumed food volume, permitting fully automatic estimation of the nutrient intake for each meal. For the development and evaluation of the system, a dedicated new database containing images and nutrient recipes of 322 meals is assembled, coupled to data annotation using innovative strategies. Experimental results demonstrate that the estimated nutrient intake is highly correlated (> 0.91) to the ground truth and shows very small mean relative errors (< 20%), outperforming existing techniques proposed for nutrient intake assessment.
Tasks Few-Shot Learning, Food Recognition
Published 2020-03-18
URL https://arxiv.org/abs/2003.08273v1
PDF https://arxiv.org/pdf/2003.08273v1.pdf
PWC https://paperswithcode.com/paper/an-artificial-intelligence-based-system-to

Spectral Graph Attention Network

Title Spectral Graph Attention Network
Authors Heng Chang, Yu Rong, Tingyang Xu, Wenbing Huang, Somayeh Sojoudi, Junzhou Huang, Wenwu Zhu
Abstract Variants of Graph Neural Networks (GNNs) for representation learning have been proposed recently and achieved fruitful results in various fields. Among them, graph attention networks (GATs) first employ a self-attention strategy to learn attention weights for each edge in the spatial domain. However, learning the attentions over edges only pays attention to the local information of graphs and greatly increases the number of parameters. In this paper, we first introduce attentions in the spectral domain of graphs. Accordingly, we present Spectral Graph Attention Network (SpGAT) that learns representations for different frequency components regarding weighted filters and graph wavelets bases. In this way, SpGAT can better capture global patterns of graphs in an efficient manner with far fewer learned parameters than GAT. We thoroughly evaluate the performance of SpGAT in the semi-supervised node classification task and verify the effectiveness of the learned attentions in the spectral domain.
Tasks Node Classification, Representation Learning
Published 2020-03-16
URL https://arxiv.org/abs/2003.07450v1
PDF https://arxiv.org/pdf/2003.07450v1.pdf
PWC https://paperswithcode.com/paper/spectral-graph-attention-network

Learning Fine Grained Place Embeddings with Spatial Hierarchy from Human Mobility Trajectories

Title Learning Fine Grained Place Embeddings with Spatial Hierarchy from Human Mobility Trajectories
Authors Toru Shimizu, Takahiro Yabe, Kota Tsubouchi
Abstract Place embeddings generated from human mobility trajectories have become a popular method to understand the functionality of places. Place embeddings with high spatial resolution are desirable for many applications; however, downscaling the spatial resolution deteriorates the quality of embeddings due to data sparsity, especially in less populated areas. We address this issue by proposing a method that generates fine grained place embeddings, which leverages spatial hierarchical information according to the local density of observed data points. The effectiveness of our fine grained place embeddings is compared to baseline methods via next place prediction tasks using real world trajectory data from 3 cities in Japan. In addition, we demonstrate the value of our fine grained place embeddings for land use classification applications. We believe that our technique of incorporating spatial hierarchical information can complement and reinforce various place embedding generating methods.
Published 2020-02-06
URL https://arxiv.org/abs/2002.02058v1
PDF https://arxiv.org/pdf/2002.02058v1.pdf
PWC https://paperswithcode.com/paper/learning-fine-grained-place-embeddings-with

A Sample Complexity Separation between Non-Convex and Convex Meta-Learning

Title A Sample Complexity Separation between Non-Convex and Convex Meta-Learning
Authors Nikunj Saunshi, Yi Zhang, Mikhail Khodak, Sanjeev Arora
Abstract One popular trend in meta-learning is to learn from many training tasks a common initialization for a gradient-based method that can be used to solve a new task with few samples. The theory of meta-learning is still in its early stages, with several recent learning-theoretic analyses of methods such as Reptile [Nichol et al., 2018] being for convex models. This work shows that convex-case analysis might be insufficient to understand the success of meta-learning, and that even for non-convex models it is important to look inside the optimization black-box, specifically at properties of the optimization trajectory. We construct a simple meta-learning instance that captures the problem of one-dimensional subspace learning. For the convex formulation of linear regression on this instance, we show that the new task sample complexity of any initialization-based meta-learning algorithm is $\Omega(d)$, where $d$ is the input dimension. In contrast, for the non-convex formulation of a two layer linear network on the same instance, we show that both Reptile and multi-task representation learning can have new task sample complexity of $\mathcal{O}(1)$, demonstrating a separation from convex meta-learning. Crucially, analyses of the training dynamics of these methods reveal that they can meta-learn the correct subspace onto which the data should be projected.
Tasks Meta-Learning, Representation Learning
Published 2020-02-25
URL https://arxiv.org/abs/2002.11172v1
PDF https://arxiv.org/pdf/2002.11172v1.pdf
PWC https://paperswithcode.com/paper/a-sample-complexity-separation-between-non
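
To make "initialization-based meta-learning" concrete, here is a minimal sketch of the Reptile update rule on toy scalar tasks. It is not the paper's subspace-learning instance: the quadratic task loss, the function names, and all hyperparameter values are assumptions chosen so the example stays small and deterministic.

```python
def inner_gd(theta, target, lr, steps):
    """Run a few gradient steps on the toy task loss (w - target)**2."""
    w = theta
    for _ in range(steps):
        grad = 2.0 * (w - target)
        w -= lr * grad
    return w


def reptile(task_targets, meta_lr=0.1, inner_lr=0.1, inner_steps=5, epochs=200):
    """Reptile-style loop: nudge the initialization toward task-adapted weights."""
    theta = 0.0
    for _ in range(epochs):
        for target in task_targets:
            phi = inner_gd(theta, target, inner_lr, inner_steps)
            # Meta-update: theta <- theta + meta_lr * (phi - theta).
            theta += meta_lr * (phi - theta)
    return theta


# Hypothetical scalar tasks whose optima are 1, 2 and 3.
init = reptile([1.0, 2.0, 3.0])
```

On these toy tasks the learned initialization settles close to the average of the task optima, so a new task drawn from the same family needs only a few inner steps.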

Predicting Multidimensional Data via Tensor Learning

Title Predicting Multidimensional Data via Tensor Learning
Authors Giuseppe Brandi, T. Di Matteo
Abstract The analysis of multidimensional data is becoming an increasingly relevant topic in statistical and machine learning research. Given their complexity, such data objects are usually reshaped into matrices or vectors and then analysed. However, this methodology presents several drawbacks. First of all, it destroys the intrinsic interconnections among datapoints in the multidimensional space and, secondly, the number of parameters to be estimated in a model increases exponentially. We develop a model that overcomes such drawbacks. In particular, we propose a parsimonious tensor regression based model that retains the intrinsic multidimensional structure of the dataset. Tucker structure is employed to achieve parsimony and a shrinkage penalization is introduced to deal with over-fitting and collinearity. An Alternating Least Squares (ALS) algorithm is developed to estimate the model parameters. A simulation exercise is conducted to validate the model and its robustness. Finally, an empirical application to Foursquares spatio-temporal dataset and macroeconomic time series is also performed. Overall, the proposed model is able to outperform existing models present in forecasting literature.
Tasks Time Series
Published 2020-02-11
URL https://arxiv.org/abs/2002.04328v1
PDF https://arxiv.org/pdf/2002.04328v1.pdf
PWC https://paperswithcode.com/paper/predicting-multidimensional-data-via-tensor

An Effective Automatic Image Annotation Model Via Attention Model and Data Equilibrium

Title An Effective Automatic Image Annotation Model Via Attention Model and Data Equilibrium
Authors Amir Vatani, Milad Taleby Ahvanooey, Mostafa Rahimi
Abstract Nowadays, a huge number of images are available. However, retrieving a required image for an ordinary user is a challenging task in computer vision systems. During the past two decades, much research has been devoted to improving the performance of automatic image annotation, traditionally with a focus on content-based image retrieval. However, recent research demonstrates that there is a semantic gap between content-based image retrieval and image semantics understandable by humans. As a result, existing research in this area has sought to bridge the semantic gap between low-level image features and high-level semantics. The conventional method of bridging the semantic gap is automatic image annotation (AIA), which extracts semantic features using machine learning techniques. In this paper, we propose a novel AIA model based on a deep learning feature extraction method. The proposed model has three phases, including a feature extractor, a tag generator, and an image annotator. First, the proposed model automatically extracts the high- and low-level features based on the dual-tree continuous wavelet transform (DT-CWT), singular value decomposition, distribution of color tone, and a deep neural network. Moreover, the tag generator balances the dictionary of the annotated keywords by a new log-entropy auto-encoder (LEAE) and then describes these keywords by word embedding. Finally, the annotator works based on a long short-term memory (LSTM) network in order to obtain the importance degree of specific features of the image. The experiments conducted on two benchmark datasets confirm the superiority of the proposed model compared to the previous models in terms of performance criteria.
Tasks Content-Based Image Retrieval, Image Retrieval
Published 2020-01-26
URL https://arxiv.org/abs/2001.10590v1
PDF https://arxiv.org/pdf/2001.10590v1.pdf
PWC https://paperswithcode.com/paper/an-effective-automatic-image-annotation-model

Can We Find Near-Approximately-Stationary Points of Nonsmooth Nonconvex Functions?

Title Can We Find Near-Approximately-Stationary Points of Nonsmooth Nonconvex Functions?
Authors Ohad Shamir
Abstract It is well-known that given a bounded, smooth nonconvex function, standard gradient-based methods can find $\epsilon$-stationary points (where the gradient norm is less than $\epsilon$) in $\mathcal{O}(1/\epsilon^2)$ iterations. However, many important nonconvex optimization problems, such as those associated with training modern neural networks, are inherently not smooth, making these results inapplicable. Moreover, as recently pointed out in Zhang et al. [2020], it is generally impossible to provide finite-time guarantees for finding an $\epsilon$-stationary point of nonsmooth functions. Perhaps the most natural relaxation of this is to find points which are near such $\epsilon$-stationary points. In this paper, we show that even this relaxed goal is hard to obtain in general, given only black-box access to the function values and gradients. We also discuss the pros and cons of alternative approaches.
Published 2020-02-27
URL https://arxiv.org/abs/2002.11962v2
PDF https://arxiv.org/pdf/2002.11962v2.pdf
PWC https://paperswithcode.com/paper/can-we-find-near-approximately-stationary
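
For contrast with the nonsmooth setting studied in the paper, the sketch below shows the smooth case the abstract starts from: plain gradient descent run until the gradient norm drops below $\epsilon$. The test function, step size, and starting point are assumptions picked for illustration; the function is smooth and nonconvex with second derivative bounded by 8, so a step of 0.05 is safely below 1/8.

```python
import math


def grad_f(x):
    """Gradient of the smooth nonconvex function f(x) = x**2 + 3*sin(x)**2."""
    return 2.0 * x + 3.0 * math.sin(2.0 * x)


def find_stationary(x0, step, eps, max_iters=100_000):
    """Gradient descent until |f'(x)| < eps; returns (x, iterations used)."""
    x = x0
    for k in range(max_iters):
        g = grad_f(x)
        if abs(g) < eps:
            return x, k
        x -= step * g
    raise RuntimeError("no eps-stationary point found within max_iters")


x_star, iters = find_stationary(x0=2.5, step=0.05, eps=1e-6)
```

The stopping test relies on the gradient varying continuously, which is exactly what fails for nonsmooth objectives such as those arising in neural network training.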

A Feedback Shift Correction in Predicting Conversion Rates under Delayed Feedback

Title A Feedback Shift Correction in Predicting Conversion Rates under Delayed Feedback
Authors Shota Yasui, Gota Morishita, Komei Fujita, Masashi Shibata
Abstract In display advertising, predicting the conversion rate, that is, the probability that a user takes a predefined action on an advertiser’s website, such as purchasing goods, is fundamental in estimating the value of displaying the advertisement. However, there is a relatively long time delay between a click and its resultant conversion. Because of the delayed feedback, some positive instances in the training period are labeled as negative because some conversions have not yet occurred when training data are gathered. As a result, the conditional label distributions differ between the training data and the production environment. This situation is referred to as a feedback shift. We address this problem by using an importance weight approach typically used for covariate shift correction. We prove its consistency for the feedback shift. Results in both offline and online experiments show that our proposed method outperforms the existing method.
Published 2020-02-06
URL https://arxiv.org/abs/2002.02068v1
PDF https://arxiv.org/pdf/2002.02068v1.pdf
PWC https://paperswithcode.com/paper/a-feedback-shift-correction-in-predicting
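
The general idea of importance weighting can be sketched as a per-example weight applied to the training loss. The code below shows only that generic mechanism, not the authors' estimator: the function name, the labels, the predicted probabilities, and especially the weight values are made up for illustration.

```python
import math


def weighted_log_loss(labels, probs, weights):
    """Importance-weighted average of the binary cross-entropy loss."""
    total = 0.0
    for y, p, w in zip(labels, probs, weights):
        loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        total += w * loss
    return total / len(labels)


# Hypothetical observed labels and model predictions.
labels = [1, 0, 0, 1]
probs = [0.8, 0.3, 0.6, 0.9]
uniform = [1.0, 1.0, 1.0, 1.0]
# Made-up weights for illustration; estimating real weights is not shown here.
reweighted = [1.2, 0.7, 0.7, 1.2]

plain = weighted_log_loss(labels, probs, uniform)
corrected = weighted_log_loss(labels, probs, reweighted)
```

With uniform weights this reduces to the ordinary log loss; non-uniform weights change how much each training example contributes, which is how a mismatch between training and production label distributions can be compensated.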

A Machine Learning Framework for Data Ingestion in Document Images

Title A Machine Learning Framework for Data Ingestion in Document Images
Authors Han Fu, Yunyu Bai, Zhuo Li, Jun Shen, Jianling Sun
Abstract Paper documents are widely used as an irreplaceable channel of information in many fields, especially in the financial industry, fostering a great amount of demand for systems which can convert document images into structured data representations. In this paper, we present a machine learning framework for data ingestion in document images, which processes the images uploaded by users and returns fine-grained data in JSON format. Details of model architectures, design strategies, distinctions from existing solutions and lessons learned during development are elaborated. We conduct extensive experiments on both synthetic and real-world data in State Street. The experimental results indicate the effectiveness and efficiency of our methods.
Published 2020-02-11
URL https://arxiv.org/abs/2003.00838v1
PDF https://arxiv.org/pdf/2003.00838v1.pdf
PWC https://paperswithcode.com/paper/a-machine-learning-framework-for-data