Paper Group ANR 154
Traffic Data Imputation using Deep Convolutional Neural Networks. Explainable $k$-Means and $k$-Medians Clustering. Multi-objective Neural Architecture Search via Non-stationary Policy Gradient. Efficient Neural Architecture Search: A Broad Version. Upper Confidence Primal-Dual Optimization: Stochastically Constrained Markov Decision Processes with …
Traffic Data Imputation using Deep Convolutional Neural Networks
Title | Traffic Data Imputation using Deep Convolutional Neural Networks |
Authors | Ouafa Benkraouda, Bilal Thonnam Thodi, Hwasoo Yeo, Monica Menendez, Saif Eddin Jabari |
Abstract | We propose a statistical learning-based traffic speed estimation method that uses sparse vehicle trajectory information. Using a convolutional encoder-decoder based architecture, we show that a well trained neural network can learn spatio-temporal traffic speed dynamics from time-space diagrams. We demonstrate this for a homogeneous road section using simulated vehicle trajectories and then validate it using real-world data from NGSIM. Our results show that with probe vehicle penetration levels as low as 5%, the proposed estimation method can provide a sound reconstruction of macroscopic traffic speeds and reproduce realistic shockwave patterns, implying applicability in a variety of traffic conditions. We further discuss the model’s reconstruction mechanisms and confirm its ability to differentiate various traffic behaviors such as congested and free-flow traffic states, transition dynamics, and shockwave propagation. |
Tasks | Imputation, Traffic Data Imputation |
Published | 2020-01-21 |
URL | https://arxiv.org/abs/2002.04406v1 |
https://arxiv.org/pdf/2002.04406v1.pdf | |
PWC | https://paperswithcode.com/paper/traffic-data-imputation-using-deep |
Repo | |
Framework | |
Explainable $k$-Means and $k$-Medians Clustering
Title | Explainable $k$-Means and $k$-Medians Clustering |
Authors | Sanjoy Dasgupta, Nave Frost, Michal Moshkovitz, Cyrus Rashtchian |
Abstract | Clustering is a popular form of unsupervised learning for geometric data. Unfortunately, many clustering algorithms lead to cluster assignments that are hard to explain, partially because they depend on all the features of the data in a complicated way. To improve interpretability, we consider using a small decision tree to partition a data set into clusters, so that clusters can be characterized in a straightforward manner. We study this problem from a theoretical viewpoint, measuring cluster quality by the $k$-means and $k$-medians objectives: Must there exist a tree-induced clustering whose cost is comparable to that of the best unconstrained clustering, and if so, how can it be found? In terms of negative results, we show, first, that popular top-down decision tree algorithms may lead to clusterings with arbitrarily large cost, and second, that any tree-induced clustering must in general incur an $\Omega(\log k)$ approximation factor compared to the optimal clustering. On the positive side, we design an efficient algorithm that produces explainable clusters using a tree with $k$ leaves. For two means/medians, we show that a single threshold cut suffices to achieve a constant factor approximation, and we give nearly-matching lower bounds. For general $k \geq 2$, our algorithm is an $O(k)$ approximation to the optimal $k$-medians and an $O(k^2)$ approximation to the optimal $k$-means. Prior to our work, no algorithms were known with provable guarantees independent of dimension and input size. |
Tasks | |
Published | 2020-02-28 |
URL | https://arxiv.org/abs/2002.12538v1 |
https://arxiv.org/pdf/2002.12538v1.pdf | |
PWC | https://paperswithcode.com/paper/explainable-k-means-and-k-medians-clustering |
Repo | |
Framework | |
Multi-objective Neural Architecture Search via Non-stationary Policy Gradient
Title | Multi-objective Neural Architecture Search via Non-stationary Policy Gradient |
Authors | Zewei Chen, Fengwei Zhou, George Trimponias, Zhenguo Li |
Abstract | Multi-objective Neural Architecture Search (NAS) aims to discover novel architectures in the presence of multiple conflicting objectives. Despite recent progress, the problem of approximating the full Pareto front accurately and efficiently remains challenging. In this work, we explore the novel reinforcement learning (RL) based paradigm of non-stationary policy gradient (NPG). NPG utilizes a non-stationary reward function, and encourages a continuous adaptation of the policy to capture the entire Pareto front efficiently. We introduce two novel reward functions with elements from the dominant paradigms of scalarization and evolution. To handle non-stationarity, we propose a new exploration scheme using cosine temperature decay with warm restarts. For fast and accurate architecture evaluation, we introduce a novel pre-trained shared model that we continuously fine-tune throughout training. Our extensive experimental study with various datasets shows that our framework can approximate the full Pareto front well at fast speeds. Moreover, our discovered cells can achieve supreme predictive performance compared to other multi-objective NAS methods, and other single-objective NAS methods at similar network sizes. Our work demonstrates the potential of NPG as a simple, efficient, and effective paradigm for multi-objective NAS. |
Tasks | Neural Architecture Search |
Published | 2020-01-23 |
URL | https://arxiv.org/abs/2001.08437v2 |
https://arxiv.org/pdf/2001.08437v2.pdf | |
PWC | https://paperswithcode.com/paper/multi-objective-neural-architecture-search-2 |
Repo | |
Framework | |
Efficient Neural Architecture Search: A Broad Version
Title | Efficient Neural Architecture Search: A Broad Version |
Authors | Zixiang Ding, Yaran Chen, Nannan Li, Dongbin Zhao, C. L. Philip Chen |
Abstract | Efficient Neural Architecture Search (ENAS) achieves novel efficiency for learning architecture with high-performance via parameter sharing, but suffers from an issue of slow propagation speed of search model with deep topology. In this paper, we propose a Broad version for ENAS (BENAS) to solve the above issue, by learning broad architecture whose propagation speed is fast with reinforcement learning and parameter sharing used in ENAS, thereby achieving a higher search efficiency. In particular, we elaborately design Broad Convolutional Neural Network (BCNN), the search paradigm of BENAS with fast propagation speed, which can obtain a satisfactory performance with broad topology, i.e. fast forward and backward propagation speed. The proposed BCNN extracts multi-scale features and enhancement representations, and feeds them into global average pooling layer to yield more reasonable and comprehensive representations so that the achieved performance of BCNN with shallow topology can be promised. In order to verify the effectiveness of BENAS, several experiments are performed and experimental results show that 1) BENAS delivers 0.23 day which is 2x less expensive than ENAS, 2) the architecture learned by BENAS based small-size BCNNs with 0.5 and 1.1 millions parameters obtain state-of-the-art performance, 3.63% and 3.40% test error on CIFAR-10, 3) the learned architecture based BCNN achieves 25.3% top-1 error on ImageNet just using 3.9 millions parameters. |
Tasks | Neural Architecture Search |
Published | 2020-01-18 |
URL | https://arxiv.org/abs/2001.06679v1 |
https://arxiv.org/pdf/2001.06679v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-neural-architecture-search-a-broad |
Repo | |
Framework | |
Upper Confidence Primal-Dual Optimization: Stochastically Constrained Markov Decision Processes with Adversarial Losses and Unknown Transitions
Title | Upper Confidence Primal-Dual Optimization: Stochastically Constrained Markov Decision Processes with Adversarial Losses and Unknown Transitions |
Authors | Shuang Qiu, Xiaohan Wei, Zhuoran Yang, Jieping Ye, Zhaoran Wang |
Abstract | We consider online learning for episodic Markov decision processes (MDPs) with stochastic long-term budget constraints, which plays a central role in ensuring the safety of reinforcement learning. Here the loss function can vary arbitrarily across the episodes, whereas both the loss received and the budget consumption are revealed at the end of each episode. Previous works solve this problem under the restrictive assumption that the transition model of the MDP is known a priori and establish regret bounds that depend polynomially on the cardinalities of the state space $\mathcal{S}$ and the action space $\mathcal{A}$. In this work, we propose a new \emph{upper confidence primal-dual} algorithm, which only requires the trajectories sampled from the transition model. In particular, we prove that the proposed algorithm achieves $\tilde{\mathcal{O}}(L\mathcal{S}\sqrt{\mathcal{A}T})$ upper bounds of both the regret and the constraint violation, where $L$ is the length of each episode. Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning, which demonstrates the power of “optimism in the face of uncertainty” in constrained online learning. |
Tasks | |
Published | 2020-03-02 |
URL | https://arxiv.org/abs/2003.00660v1 |
https://arxiv.org/pdf/2003.00660v1.pdf | |
PWC | https://paperswithcode.com/paper/upper-confidence-primal-dual-optimization |
Repo | |
Framework | |
Software System for Road Condition Forecast Correction
Title | Software System for Road Condition Forecast Correction |
Authors | Dmitrii Smolyakov, Evgeny Burnaev |
Abstract | In this paper, we present a monitoring system that allows increasing road safety by predicting ice formation. The system consists of a network of road weather stations and intelligence data processing program module. The results were achieved by combining physical models for forecasting road conditions based on measurements from stations and machine learning models for detecting incorrect data and forecast correction. |
Tasks | |
Published | 2020-03-22 |
URL | https://arxiv.org/abs/2003.09957v1 |
https://arxiv.org/pdf/2003.09957v1.pdf | |
PWC | https://paperswithcode.com/paper/software-system-for-road-condition-forecast |
Repo | |
Framework | |
Heterogeneous Relational Reasoning in Knowledge Graphs with Reinforcement Learning
Title | Heterogeneous Relational Reasoning in Knowledge Graphs with Reinforcement Learning |
Authors | Mandana Saebi, Steven Krieg, Chuxu Zhang, Meng Jiang, Nitesh Chawla |
Abstract | Path-based relational reasoning over knowledge graphs has become increasingly popular due to a variety of downstream applications such as question answering in dialogue systems, fact prediction, and recommender systems. In recent years, reinforcement learning (RL) has provided solutions that are more interpretable and explainable than other deep learning models. However, these solutions still face several challenges, including large action space for the RL agent and accurate representation of entity neighborhood structure. We address these problems by introducing a type-enhanced RL agent that uses the local neighborhood information for efficient path-based reasoning over knowledge graphs. Our solution uses graph neural network (GNN) for encoding the neighborhood information and utilizes entity types to prune the action space. Experiments on real-world dataset show that our method outperforms state-of-the-art RL methods and discovers more novel paths during the training procedure. |
Tasks | Knowledge Graphs, Question Answering, Recommendation Systems, Relational Reasoning |
Published | 2020-03-12 |
URL | https://arxiv.org/abs/2003.06050v1 |
https://arxiv.org/pdf/2003.06050v1.pdf | |
PWC | https://paperswithcode.com/paper/heterogeneous-relational-reasoning-in |
Repo | |
Framework | |
FTT-NAS: Discovering Fault-Tolerant Neural Architecture
Title | FTT-NAS: Discovering Fault-Tolerant Neural Architecture |
Authors | Xuefei Ning, Guangjun Ge, Wenshuo Li, Zhenhua Zhu, Yin Zheng, Xiaoming Chen, Zhen Gao, Yu Wang, Huazhong Yang |
Abstract | With the fast evolvement of embedded deep-learning computing systems, applications powered by deep learning are moving from the cloud to the edge. When deploying neural networks (NNs) onto the devices under complex environments, there are various types of possible faults: soft errors caused by cosmic radiation and radioactive impurities, voltage instability, aging, temperature variations, and malicious attackers. Thus the safety risk of deploying NNs is now drawing much attention. In this paper, after the analysis of the possible faults in various types of NN accelerators, we formalize and implement various fault models from the algorithmic perspective. We propose Fault-Tolerant Neural Architecture Search (FT-NAS) to automatically discover convolutional neural network (CNN) architectures that are reliable to various faults in nowadays devices. Then we incorporate fault-tolerant training (FTT) in the search process to achieve better results, which is referred to as FTT-NAS. Experiments on CIFAR-10 show that the discovered architectures outperform other manually designed baseline architectures significantly, with comparable or fewer floating-point operations (FLOPs) and parameters. Specifically, with the same fault settings, F-FTT-Net discovered under the feature fault model achieves an accuracy of 86.2% (VS. 68.1% achieved by MobileNet-V2), and W-FTT-Net discovered under the weight fault model achieves an accuracy of 69.6% (VS. 60.8% achieved by ResNet-20). By inspecting the discovered architectures, we find that the operation primitives, the weight quantization range, the capacity of the model, and the connection pattern have influences on the fault resilience capability of NN models. |
Tasks | Neural Architecture Search, Quantization |
Published | 2020-03-20 |
URL | https://arxiv.org/abs/2003.10375v1 |
https://arxiv.org/pdf/2003.10375v1.pdf | |
PWC | https://paperswithcode.com/paper/ftt-nas-discovering-fault-tolerant-neural |
Repo | |
Framework | |
Customized Video QoE Estimation with Algorithm-Agnostic Transfer Learning
Title | Customized Video QoE Estimation with Algorithm-Agnostic Transfer Learning |
Authors | Selim Ickin, Markus Fiedler, Konstantinos Vandikas |
Abstract | The development of QoE models by means of Machine Learning (ML) is challenging, amongst others due to small-size datasets, lack of diversity in user profiles in the source domain, and too much diversity in the target domains of QoE models. Furthermore, datasets can be hard to share between research entities, as the machine learning models and the collected user data from the user studies may be IPR- or GDPR-sensitive. This makes a decentralized learning-based framework appealing for sharing and aggregating learned knowledge in-between the local models that map the obtained metrics to the user QoE, such as Mean Opinion Scores (MOS). In this paper, we present a transfer learning-based ML model training approach, which allows decentralized local models to share generic indicators on MOS to learn a generic base model, and then customize the generic base model further using additional features that are unique to those specific localized (and potentially sensitive) QoE nodes. We show that the proposed approach is agnostic to specific ML algorithms, stacked upon each other, as it does not necessitate the collaborating localized nodes to run the same ML algorithm. Our reproducible results reveal the advantages of stacking various generic and specific models with corresponding weight factors. Moreover, we identify the optimal combination of algorithms and weight factors for the corresponding localized QoE nodes. |
Tasks | Transfer Learning |
Published | 2020-03-12 |
URL | https://arxiv.org/abs/2003.08730v1 |
https://arxiv.org/pdf/2003.08730v1.pdf | |
PWC | https://paperswithcode.com/paper/customized-video-qoe-estimation-with |
Repo | |
Framework | |
Learning the Hypotheses Space from data Part I: Learning Space and U-curve Property
Title | Learning the Hypotheses Space from data Part I: Learning Space and U-curve Property |
Authors | Diego Marcondes, Adilson Simonis, Junior Barrera |
Abstract | The agnostic PAC learning model consists of: a Hypothesis Space $\mathcal{H}$, a probability distribution $P$, a sample complexity function $m_{\mathcal{H}}(\epsilon,\delta): [0,1]^{2} \mapsto \mathbb{Z}_{+}$ of precision $\epsilon$ and confidence $1 - \delta$, a finite i.i.d. sample $\mathcal{D}_{N}$, a cost function $\ell$ and a learning algorithm $\mathbb{A}(\mathcal{H},\mathcal{D}_{N})$, which estimates $\hat{h} \in \mathcal{H}$ that approximates a target function $h^{\star} \in \mathcal{H}$ seeking to minimize out-of-sample error. In this model, prior information is represented by $\mathcal{H}$ and $\ell$, while problem solution is performed through their instantiation in several applied learning models, with specific algebraic structures for $\mathcal{H}$ and corresponding learning algorithms. However, these applied models use additional important concepts not covered by the classic PAC learning theory: model selection and regularization. This paper presents an extension of this model which covers these concepts. The main principle added is the selection, based solely on data, of a subspace of $\mathcal{H}$ with a VC-dimension compatible with the available sample. In order to formalize this principle, the concept of Learning Space $\mathbb{L}(\mathcal{H})$, which is a poset of subsets of $\mathcal{H}$ that covers $\mathcal{H}$ and satisfies a property regarding the VC dimension of related subspaces, is presented as the natural search space for model selection algorithms. A remarkable result obtained on this new framework are conditions on $\mathbb{L}(\mathcal{H})$ and $\ell$ that lead to estimated out-of-sample error surfaces, which are true U-curves on $\mathbb{L}(\mathcal{H})$ chains, enabling a more efficient search on $\mathbb{L}(\mathcal{H})$. Hence, in this new framework, the U-curve optimization problem becomes a natural component of model selection algorithms. |
Tasks | Model Selection |
Published | 2020-01-26 |
URL | https://arxiv.org/abs/2001.09532v1 |
https://arxiv.org/pdf/2001.09532v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-the-hypotheses-space-from-data-part |
Repo | |
Framework | |
Towards better understanding of meta-features contributions
Title | Towards better understanding of meta-features contributions |
Authors | Katarzyna Woźnica, Przemysław Biecek |
Abstract | Meta learning is a difficult problem as the expected performance of a model is affected by various aspects such as selected hyperparameters, dataset properties and landmarkers. Existing approaches are focused on searching for the best model but do not explain how these different aspects contribute to the performance of a model. To build a new generation of meta-models we need a deeper understanding of relative importance of meta-features and construction of better meta-features. In this paper we (1) introduce a method that can be used to understand how meta-features influence a model performance, (2) discuss the relative importance of different groups of meta-features, (3) analyse in detail the most informative hyperparameters that may result in insights for selection of empirical priors of engineering for hyperparameters. To our knowledge this is the first paper that uses techniques developed for eXplainable Artificial Intelligence (XAI) to examine the behaviour of a meta model. |
Tasks | Meta-Learning |
Published | 2020-02-11 |
URL | https://arxiv.org/abs/2002.04276v1 |
https://arxiv.org/pdf/2002.04276v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-better-understanding-of-meta-features |
Repo | |
Framework | |
Distribution-Agnostic Model-Agnostic Meta-Learning
Title | Distribution-Agnostic Model-Agnostic Meta-Learning |
Authors | Liam Collins, Aryan Mokhtari, Sanjay Shakkottai |
Abstract | The Model-Agnostic Meta-Learning (MAML) algorithm \citep{finn2017model} has been celebrated for its efficiency and generality, as it has demonstrated success in quickly learning the parameters of an arbitrary learning model. However, MAML implicitly assumes that the tasks come from a particular distribution, and optimizes the expected (or sample average) loss over tasks drawn from this distribution. Here, we amend this limitation of MAML by reformulating the objective function as a min-max problem, where the maximization is over the set of possible distributions over tasks. Our proposed algorithm is the first distribution-agnostic and model-agnostic meta-learning method, and we show that it converges to an $\epsilon$-accurate point at the rate of $\mathcal{O}(1/\epsilon^2)$ in the convex setting and to an $(\epsilon, \delta)$-stationary point at the rate of $\mathcal{O}(\max{1/\epsilon^5, 1/\delta^5})$ in nonconvex settings. We also provide numerical experiments that demonstrate the worst-case superiority of our algorithm in comparison to MAML. |
Tasks | Meta-Learning |
Published | 2020-02-12 |
URL | https://arxiv.org/abs/2002.04766v1 |
https://arxiv.org/pdf/2002.04766v1.pdf | |
PWC | https://paperswithcode.com/paper/distribution-agnostic-model-agnostic-meta |
Repo | |
Framework | |
A Multi-scale CNN-CRF Framework for Environmental Microorganism Image Segmentation
Title | A Multi-scale CNN-CRF Framework for Environmental Microorganism Image Segmentation |
Authors | Jinghua Zhang, Chen Li, Frank Kulwa, Xin Zhao, Changhao Sun, Zihan Li, Tao Jiang, Hong Li |
Abstract | In order to assist researchers to identify Environmental Microorganisms (EMs) effectively, a Multi-scale CNN-CRF (MSCC) framework for the EM image segmentation is proposed in this paper. There are two parts in this framework: The first is a novel pixel-level segmentation approach, using a newly introduced Convolutional Neural Network (CNN), namely “mU-Net-B3”, with a dense Conditional Random Field (CRF) post-processing. The second is a VGG-16 based patch-level segmentation method with a novel “buffer” strategy, which further improves the segmentation quality of the details of the EMs. In the experiment, compared with the state-of-the-art methods on 420 EM images, the proposed MSCC method reduces the memory requirement from 355 MB to 103 MB, improves the overall evaluation indexes (Dice, Jaccard, Recall, Accuracy) from 85.24%, 77.42%, 82.27% and 96.76% to 87.13%, 79.74%, 87.12% and 96.91% respectively, and reduces the volume overlap error from 22.58% to 20.26%. Therefore, the MSCC method shows a big potential in the EM segmentation field. |
Tasks | Semantic Segmentation |
Published | 2020-03-08 |
URL | https://arxiv.org/abs/2003.03744v1 |
https://arxiv.org/pdf/2003.03744v1.pdf | |
PWC | https://paperswithcode.com/paper/a-multi-scale-cnn-crf-framework-for |
Repo | |
Framework | |
When Wireless Security Meets Machine Learning: Motivation, Challenges, and Research Directions
Title | When Wireless Security Meets Machine Learning: Motivation, Challenges, and Research Directions |
Authors | Yalin E. Sagduyu, Yi Shi, Tugba Erpek, William Headley, Bryse Flowers, George Stantchev, Zhuo Lu |
Abstract | Wireless systems are vulnerable to various attacks such as jamming and eavesdropping due to the shared and broadcast nature of wireless medium. To support both attack and defense strategies, machine learning (ML) provides automated means to learn from and adapt to wireless communication characteristics that are hard to capture by hand-crafted features and models. This article discusses motivation, background, and scope of research efforts that bridge ML and wireless security. Motivated by research directions surveyed in the context of ML for wireless security, ML-based attack and defense solutions and emerging adversarial ML techniques in the wireless domain are identified along with a roadmap to foster research efforts in bridging ML and wireless security. |
Tasks | |
Published | 2020-01-24 |
URL | https://arxiv.org/abs/2001.08883v1 |
https://arxiv.org/pdf/2001.08883v1.pdf | |
PWC | https://paperswithcode.com/paper/when-wireless-security-meets-machine-learning |
Repo | |
Framework | |
On the stability of projection-based model order reduction for convection-dominated laminar and turbulent flows
Title | On the stability of projection-based model order reduction for convection-dominated laminar and turbulent flows |
Authors | Sebastian Grimberg, Charbel Farhat, Noah Youkilis |
Abstract | In the literature on projection-based nonlinear model order reduction for fluid dynamics problems, it is often claimed that due to modal truncation, a projection-based reduced-order model (PROM) does not resolve the dissipative regime of the turbulent energy cascade and therefore is numerically unstable. Efforts at addressing this claim have ranged from attempting to model the effects of the truncated modes to enriching the classical subspace of approximation in order to account for the truncated phenomena. This paper challenges this claim. Exploring the relationship between projection-based model order reduction and semi-discretization and using numerical evidence from three relevant flow problems, it argues in an orderly manner that the real culprit behind most if not all reported numerical instabilities of PROMs for turbulence and convection-dominated turbulent flow problems is the Galerkin framework that has been used for constructing the PROMs. The paper also shows that alternatively, a Petrov-Galerkin framework can be used to construct numerically stable PROMs for convection-dominated laminar as well as turbulent flow problems that are numerically stable and accurate, without resorting to additional closure models or tailoring of the subspace of approximation. It also shows that such alternative PROMs deliver significant speedup factors. |
Tasks | |
Published | 2020-01-27 |
URL | https://arxiv.org/abs/2001.10110v1 |
https://arxiv.org/pdf/2001.10110v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-stability-of-projection-based-model |
Repo | |
Framework | |