Paper Group ANR 95
Explaining Deep Neural Networks Using Spectrum-Based Fault Localization
Title | Explaining Deep Neural Networks Using Spectrum-Based Fault Localization |
Authors | Youcheng Sun, Hana Chockler, Xiaowei Huang, Daniel Kroening |
Abstract | Deep neural networks (DNNs) increasingly replace traditionally developed software in a broad range of applications. However, in stark contrast to traditional software, the black-box nature of DNNs makes it impossible to understand their outputs, creating demand for “Explainable AI”. Explanations of the outputs of the DNN are essential for the training process and are supporting evidence of the adequacy of the DNN. In this paper, we show that spectrum-based fault localization delivers good explanations of the outputs of DNNs. We present an algorithm and a tool, PROTOZOA, which synthesizes a ranking of the parts of the inputs using several spectrum-based fault localization measures. We show that the highest-ranked parts provide explanations that are consistent with the standard definitions of explanations in the literature. Our experimental results on ImageNet show that the explanations we generate are useful visual indicators for the progress of the training of the DNN. We compare the results of PROTOZOA with SHAP and show that the explanations generated by PROTOZOA are on par with or superior to SHAP’s. We also generate adversarial examples using our explanations; the efficiency of this process can serve as a proxy metric for the quality of the explanations. Our measurements show that PROTOZOA’s explanations yield a higher number of adversarial examples than those produced by SHAP. |
Tasks | |
Published | 2019-08-06 |
URL | https://arxiv.org/abs/1908.02374v1 |
https://arxiv.org/pdf/1908.02374v1.pdf | |
PWC | https://paperswithcode.com/paper/explaining-deep-neural-networks-using |
Repo | |
Framework | |
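As a rough illustration of the idea, the sketch below applies one common spectrum-based measure (Ochiai) to rank image pixels by how often their presence coincides with a flipped prediction. The masking scheme, the `model` callable, and the choice of measure are assumptions for illustration, not the authors' PROTOZOA implementation.

```python
# Hypothetical sketch of spectrum-based explanation ranking; `model` is a
# stand-in callable returning a class label for an (H, W, C) image.
import numpy as np

def ochiai_ranking(model, image, label, n_trials=500, keep_prob=0.5, rng=None):
    """Rank pixels by an Ochiai suspiciousness score.

    Each trial masks a random pixel subset; a trial "fails" when the
    prediction flips away from `label`. Pixels that appear in many failing
    (and few passing) trials are ranked as most explanatory.
    """
    rng = np.random.default_rng(rng)
    h, w = image.shape[:2]
    executed_fail = np.zeros((h, w))   # pixel visible in a failing trial
    executed_pass = np.zeros((h, w))   # pixel visible in a passing trial
    total_fail = 0
    for _ in range(n_trials):
        mask = rng.random((h, w)) < keep_prob      # pixels kept visible
        pred = model(image * mask[..., None])      # masked forward pass
        if pred != label:
            executed_fail += mask
            total_fail += 1
        else:
            executed_pass += mask
    # Ochiai: ef / sqrt((ef + nf) * (ef + ep)), with ef + nf = total_fail
    denom = np.sqrt(total_fail * (executed_fail + executed_pass))
    return np.divide(executed_fail, denom,
                     out=np.zeros_like(denom), where=denom > 0)
```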
Learning to Code: Coded Caching via Deep Reinforcement Learning
Title | Learning to Code: Coded Caching via Deep Reinforcement Learning |
Authors | Navid Naderializadeh, Seyed Mohammad Asghari |
Abstract | We consider a system comprising a file library and a network with a server and multiple users equipped with cache memories. The system operates in two phases: a prefetching phase, where users load their caches with parts of contents from the library, and a delivery phase, where users request files from the library and the server needs to send the uncached parts of the requested files to the users. For the case where the users’ caches are arbitrarily loaded, we propose an algorithm based on deep reinforcement learning to minimize the delay of delivering requested contents to the users in the delivery phase. Simulation results demonstrate that our proposed deep reinforcement learning agent learns a coded delivery strategy for sending the requested contents to the users, which slightly outperforms the state-of-the-art performance in terms of delivery delay, while drastically reducing the computational complexity. |
Tasks | |
Published | 2019-12-09 |
URL | https://arxiv.org/abs/1912.04321v1 |
https://arxiv.org/pdf/1912.04321v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-code-coded-caching-via-deep |
Repo | |
Framework | |
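For intuition about the setting, here is the textbook two-user coded delivery example (in the style of Maddah-Ali and Niesen) that shows why a single coded multicast can serve two different requests. It is background for the problem, not the paper's learned policy.

```python
# Toy coded delivery: two files split in halves, two users with disjoint caches.
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

A, B = b"AAAAaaaa", b"BBBBbbbb"            # two 8-byte files
A1, A2, B1, B2 = A[:4], A[4:], B[:4], B[4:]

cache_u1 = {"A1": A1, "B1": B1}            # prefetching phase
cache_u2 = {"A2": A2, "B2": B2}

# Delivery phase: user 1 requests A, user 2 requests B.
# One coded multicast serves both missing halves at once.
coded = xor(A2, B1)
u1_A = cache_u1["A1"] + xor(coded, cache_u1["B1"])   # user 1 recovers A2
u2_B = xor(coded, cache_u2["A2"]) + cache_u2["B2"]   # user 2 recovers B1
assert u1_A == A and u2_B == B
```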
DAR-Net: Dynamic Aggregation Network for Semantic Scene Segmentation
Title | DAR-Net: Dynamic Aggregation Network for Semantic Scene Segmentation |
Authors | Zongyue Zhao, Min Liu, Karthik Ramani |
Abstract | Traditional grid/neighbor-based static pooling has become a constraint for point cloud geometry analysis. In this paper, we propose DAR-Net, a novel network architecture that focuses on dynamic feature aggregation. The central idea of DAR-Net is to generate a self-adaptive pooling skeleton that considers both scene complexity and local geometry features. Providing variable semi-local receptive fields and weights, the skeleton serves as a bridge that connects local convolutional feature extractors and a global recurrent feature integrator. Experimental results on indoor scene datasets show the advantages of the proposed approach compared to state-of-the-art architectures that adopt static pooling methods. |
Tasks | Scene Segmentation |
Published | 2019-07-28 |
URL | https://arxiv.org/abs/1907.12022v2 |
https://arxiv.org/pdf/1907.12022v2.pdf | |
PWC | https://paperswithcode.com/paper/dar-net-dynamic-aggregation-network-for |
Repo | |
Framework | |
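A minimal sketch of what a self-adaptive pooling skeleton could look like: farthest-point sampling picks skeleton nodes, and each node aggregates features over a data-dependent (semi-local) radius that grows where points are sparse. The sampling choice and the density heuristic are assumptions; DAR-Net's actual construction is more involved.

```python
import numpy as np

def farthest_point_sampling(points, k, rng=None):
    """Greedy FPS: repeatedly pick the point farthest from all picks so far."""
    rng = np.random.default_rng(rng)
    idx = [int(rng.integers(len(points)))]
    dists = np.linalg.norm(points - points[idx[0]], axis=1)
    for _ in range(k - 1):
        idx.append(int(dists.argmax()))
        dists = np.minimum(dists, np.linalg.norm(points - points[idx[-1]], axis=1))
    return np.array(idx)

def dynamic_aggregate(points, feats, k=32):
    centers = farthest_point_sampling(points, k)
    pooled = []
    for c in centers:
        d = np.linalg.norm(points - points[c], axis=1)
        radius = np.partition(d, 16)[16]        # semi-local: 16th-neighbour distance
        mask = d <= radius
        pooled.append(feats[mask].max(axis=0))  # max-pool inside the field
    return points[centers], np.stack(pooled)

pts = np.random.rand(1024, 3)                   # toy point cloud
ft = np.random.rand(1024, 64)                   # toy per-point features
skel_xyz, skel_feat = dynamic_aggregate(pts, ft)
```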
Automatic Mouse Embryo Brain Ventricle & Body Segmentation and Mutant Classification From Ultrasound Data Using Deep Learning
Title | Automatic Mouse Embryo Brain Ventricle & Body Segmentation and Mutant Classification From Ultrasound Data Using Deep Learning |
Authors | Ziming Qiu, Nitin Nair, Jack Langerman, Orlando Aristizabal, Jonathan Mamou, Daniel H. Turnbull, Jeffrey A. Ketterling, Yao Wang |
Abstract | High-frequency ultrasound (HFU) is well suited for imaging embryonic mice in vivo because it is non-invasive and real-time. Manual segmentation of the brain ventricles (BVs) and whole body from 3D HFU images is time-consuming and requires specialized training. This paper presents a deep-learning-based segmentation pipeline which automates several time-consuming, repetitive tasks currently performed to study genetic mutations in developing mouse embryos. Namely, the pipeline accurately segments the BV and body regions in 3D HFU images of mouse embryos, despite significant challenges due to position and shape variation of the embryos, as well as imaging artifacts. Based on the BV segmentation, a 3D convolutional neural network (CNN) is further trained to detect embryos with the Engrailed-1 (En1) mutation. The algorithms achieve 0.896 and 0.925 Dice Similarity Coefficient (DSC) for BV and body segmentation, respectively, and 95.8% accuracy on mutant classification. Through gradient-based interrogation and visualization of the trained classifier, it is demonstrated that the model focuses on the morphological structures known to be affected by the En1 mutation. |
Tasks | |
Published | 2019-09-23 |
URL | https://arxiv.org/abs/1909.10555v1 |
https://arxiv.org/pdf/1909.10555v1.pdf | |
PWC | https://paperswithcode.com/paper/automatic-mouse-embryo-brain-ventricle-body |
Repo | |
Framework | |
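The reported 0.896 and 0.925 scores use the Dice Similarity Coefficient. For reference, a standard implementation for binary 3D masks looks like this (the toy masks are illustrative):

```python
import numpy as np

def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice Similarity Coefficient: 2|A ∩ B| / (|A| + |B|)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    return 2.0 * inter / denom if denom else 1.0  # both empty -> perfect match

a = np.zeros((4, 4, 4)); a[1:3, 1:3, 1:3] = 1     # toy "prediction" volume
b = np.zeros((4, 4, 4)); b[1:3, 1:3, :2] = 1      # toy "ground truth" volume
print(dice(a, b))                                  # 0.5 for these masks
```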
A Preliminary Study of Disentanglement With Insights on the Inadequacy of Metrics
Title | A Preliminary Study of Disentanglement With Insights on the Inadequacy of Metrics |
Authors | Amir H. Abdi, Purang Abolmaesumi, Sidney Fels |
Abstract | Disentangled encoding is an important step towards better representation learning. However, despite numerous efforts, there is still no clear winner that captures the independent features of the data in an unsupervised fashion. In this work we empirically evaluate the performance of six unsupervised disentanglement approaches on the mpi3d toy dataset curated and released for the NeurIPS 2019 Disentanglement Challenge. The methods investigated in this work are Beta-VAE, Factor-VAE, DIP-I-VAE, DIP-II-VAE, Info-VAE, and Beta-TCVAE. The capacities of all models were progressively increased throughout the training and the hyper-parameters were kept intact across experiments. The methods were evaluated based on five disentanglement metrics, namely, DCI, Factor-VAE, IRS, MIG, and SAP-Score. Within the limitations of this study, the Beta-TCVAE approach was found to outperform its alternatives with respect to the normalized sum of metrics. However, a qualitative study of the encoded latents reveals that there is no consistent correlation between the reported metrics and the disentanglement potential of the model. |
Tasks | Representation Learning |
Published | 2019-11-26 |
URL | https://arxiv.org/abs/1911.11791v1 |
https://arxiv.org/pdf/1911.11791v1.pdf | |
PWC | https://paperswithcode.com/paper/a-preliminary-study-of-disentanglement-with |
Repo | |
Framework | |
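For context, the compared methods all modify a beta-weighted VAE objective; a minimal numpy sketch of the Beta-VAE case follows (Beta-TCVAE further decomposes the KL term to penalize total correlation specifically). Shapes and values are illustrative assumptions.

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """KL( N(mu, exp(logvar)) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    recon = np.sum((x - x_recon) ** 2, axis=-1)   # per-sample reconstruction
    return np.mean(recon + beta * gaussian_kl(mu, logvar))

x = np.random.rand(8, 64)                          # toy batch
mu = np.random.randn(8, 10) * 0.1                  # toy encoder outputs
print(beta_vae_loss(x, x * 0.9, mu, np.zeros((8, 10))))
```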
Interoperability and machine-to-machine translation model with mappings to machine learning tasks
Title | Interoperability and machine-to-machine translation model with mappings to machine learning tasks |
Authors | Jacob Nilsson, Fredrik Sandin, Jerker Delsing |
Abstract | Modern large-scale automation systems integrate thousands to hundreds of thousands of physical sensors and actuators. Demands for more flexible reconfiguration of production systems and optimization across different information models, standards and legacy systems challenge current system interoperability concepts. Automatic semantic translation across information models and standards is an increasingly important problem that needs to be addressed to fulfill these demands in a cost-efficient manner under constraints of human capacity and resources in relation to timing requirements and system complexity. Here we define a translator-based operational interoperability model for interacting cyber-physical systems in mathematical terms, which includes system identification and ontology-based translation as special cases. We present alternative mathematical definitions of the translator learning task and mappings to similar machine learning tasks and solutions based on recent developments in machine learning. Possibilities to learn translators between artefacts without a common physical context, for example in simulations of digital twins and across layers of the automation pyramid, are briefly discussed. |
Tasks | Machine Translation |
Published | 2019-03-26 |
URL | http://arxiv.org/abs/1903.10735v1 |
http://arxiv.org/pdf/1903.10735v1.pdf | |
PWC | https://paperswithcode.com/paper/interoperability-and-machine-to-machine |
Repo | |
Framework | |
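As a toy instance of the translator learning task, the sketch below fits a map g so that g(x_A) ≈ x_B from paired observations of two information models, which is the sense in which system identification appears as a special case. The linear least-squares model is an assumption chosen for transparency, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)
X_a = rng.normal(size=(200, 4))               # messages in model/standard A
T_true = rng.normal(size=(4, 3))              # unknown ground-truth translator
X_b = X_a @ T_true + 0.01 * rng.normal(size=(200, 3))  # same state, standard B

# Learn the translator from paired data by least squares.
T_hat, *_ = np.linalg.lstsq(X_a, X_b, rcond=None)
print(np.allclose(T_hat, T_true, atol=0.05))  # recovered up to noise
```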
Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS
Title | Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS |
Authors | Haohan Guo, Frank K. Soong, Lei He, Lei Xie |
Abstract | The end-to-end TTS, which can predict speech directly from a given sequence of graphemes or phonemes, has shown improved performance over the conventional TTS. However, its predicting capability is still limited by the acoustic/phonetic coverage of the training data, usually constrained by the training set size. To further improve the TTS quality in pronunciation, prosody and perceived naturalness, we propose to exploit the information embedded in a syntactically parsed tree, where the inter-phrase/word information of a sentence is organized in a multilevel tree structure. Specifically, two key features are investigated: phrase structure and the relations between adjacent words. Experimental results in subjective listening, measured on three test sets, show that the proposed approach is effective in improving the pronunciation clarity, prosody and naturalness of the synthesized speech of the baseline system. |
Tasks | |
Published | 2019-04-09 |
URL | http://arxiv.org/abs/1904.04764v1 |
http://arxiv.org/pdf/1904.04764v1.pdf | |
PWC | https://paperswithcode.com/paper/exploiting-syntactic-features-in-a-parsed |
Repo | |
Framework | |
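A hedged sketch of extracting the two feature kinds named in the abstract (phrase structure and the relation between adjacent words) from a constituency parse, using nltk's Tree. The concrete feature definitions here (leaf depth, parent tag, lowest-common-ancestor depth) are illustrative assumptions, not the paper's exact features.

```python
from nltk import Tree  # pip install nltk

parse = Tree.fromstring(
    "(S (NP (DT the) (NN cat)) (VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))))")

positions = [parse.leaf_treeposition(i) for i in range(len(parse.leaves()))]

features = []
for i, pos in enumerate(positions):
    depth = len(pos)                         # phrase-structure depth of the word
    parent_label = parse[pos[:-1]].label()   # POS tag of the word
    if i + 1 < len(positions):
        nxt = positions[i + 1]
        # relation to the next word: depth of their lowest common ancestor
        common = 0
        while common < min(len(pos), len(nxt)) and pos[common] == nxt[common]:
            common += 1
        relation = common
    else:
        relation = 0
    features.append((parse.leaves()[i], depth, parent_label, relation))

for f in features:
    print(f)
```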
An Overview of Data-Importance Aware Radio Resource Management for Edge Machine Learning
Title | An Overview of Data-Importance Aware Radio Resource Management for Edge Machine Learning |
Authors | Dingzhu Wen, Xiaoyang Li, Qunsong Zeng, Jinke Ren, Kaibin Huang |
Abstract | The 5G network connecting billions of Internet-of-Things (IoT) devices will make it possible to harvest an enormous amount of real-time mobile data. Furthermore, the 5G virtualization architecture will enable cloud computing at the (network) edge. The availability of both rich data and computation power at the edge has motivated Internet companies to deploy artificial intelligence (AI) there, creating the hot area of edge-AI. Edge learning, the theme of this project, concerns training edge-AI models, which endow IoT devices with the intelligence to respond to real-time events. However, the transmission of high-dimensional data from many edge devices to servers can result in excessive communication latency, creating a bottleneck for edge learning. Traditional wireless techniques designed for only radio access are ineffective in tackling the challenge. Attempts to overcome the communication bottleneck have led to the development of a new class of techniques for intelligent radio resource management (RRM), called data-importance aware RRM. Their designs feature the interplay of active machine learning and wireless communication. Specifically, the metrics that measure data importance in active learning (e.g., classification uncertainty and data diversity) are applied to RRM for efficient acquisition of distributed data in wireless networks to train AI models at servers. This article aims at providing an introduction to the emerging area of importance-aware RRM. To this end, we will introduce the design principles, survey recent advancements in the area, discuss some design examples, and suggest some promising research opportunities. |
Tasks | Active Learning |
Published | 2019-11-10 |
URL | https://arxiv.org/abs/1911.03878v2 |
https://arxiv.org/pdf/1911.03878v2.pdf | |
PWC | https://paperswithcode.com/paper/an-overview-of-data-importance-aware-radio |
Repo | |
Framework | |
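To make the design principle concrete, the sketch below scores each device's unlabeled sample by prediction entropy (one of the active-learning uncertainty metrics the article mentions) and allocates bandwidth in proportion. The proportional-allocation rule is an illustrative assumption, not a scheme from the survey.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy of a categorical distribution (classification uncertainty)."""
    return -np.sum(p * np.log(p + eps), axis=-1)

# Soft predictions for one unlabeled sample per edge device.
device_probs = np.array([[0.98, 0.01, 0.01],   # confident  -> low importance
                         [0.40, 0.35, 0.25],   # uncertain  -> high importance
                         [0.70, 0.20, 0.10]])
importance = entropy(device_probs)
bandwidth = importance / importance.sum()       # share of radio resources
print(dict(enumerate(np.round(bandwidth, 2))))
```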
Swap Dynamics in Single-Peaked House Markets
Title | Swap Dynamics in Single-Peaked House Markets |
Authors | Aurélie Beynier, Nicolas Maudet, Simon Rey, Parham Shams |
Abstract | This paper focuses on the problem of fairly and efficiently allocating resources to agents. We consider a restricted framework in which all the resources are initially owned by the agents, with exactly one resource per agent (house market). In this framework, and with strict preferences, the Top Trading Cycle (TTC) algorithm is the only procedure satisfying Pareto-optimality, individual rationality and strategy-proofness. When preferences are single-peaked, the Crawler enjoys the same properties. These two centralized procedures might involve long trading cycles. In this paper we focus instead on a procedure involving the shortest cycles: bilateral swap deals. In such a swap dynamics, the agents perform pairwise mutually improving deals until reaching a swap-stable allocation (no improving swap-deal is possible). We prove that on the single-peaked domain every swap-stable allocation is Pareto-optimal, showing the efficiency of the swap dynamics. Moreover, the outcomes of both TTC and the Crawler can always be reached by sequences of swaps. However, some Pareto-optimal allocations are not reachable through improving swap-deals. We further analyze the swap-deal procedure through the study of the average or minimum rank of the resources obtained by agents in the final allocation. We start by providing the price of anarchy of these procedures. Finally, we present an extensive experimental study in which different versions of swap dynamics as well as other existing allocation procedures are compared. We show that swap-deal procedures exhibit good results on average in this domain, under different cultures for generating synthetic data. |
Tasks | |
Published | 2019-06-24 |
URL | https://arxiv.org/abs/1906.10250v2 |
https://arxiv.org/pdf/1906.10250v2.pdf | |
PWC | https://paperswithcode.com/paper/house-markets-and-single-peaked-preferences |
Repo | |
Framework | |
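A compact sketch of the swap dynamics studied here: agents repeatedly perform pairwise deals that strictly improve both sides, until the allocation is swap-stable. The preference lists, random deal order, and toy instance are assumptions for illustration.

```python
import random

def rank(agent_prefs, item):
    return agent_prefs.index(item)   # lower index = more preferred

def swap_dynamics(prefs, allocation, seed=0):
    """Apply mutually improving bilateral swaps until none remain."""
    rng = random.Random(seed)
    improved = True
    while improved:
        improved = False
        pairs = [(i, j) for i in prefs for j in prefs if i < j]
        rng.shuffle(pairs)
        for i, j in pairs:
            # a deal requires strict improvement for *both* agents
            if (rank(prefs[i], allocation[j]) < rank(prefs[i], allocation[i]) and
                    rank(prefs[j], allocation[i]) < rank(prefs[j], allocation[j])):
                allocation[i], allocation[j] = allocation[j], allocation[i]
                improved = True
    return allocation   # swap-stable

prefs = {1: ["a", "b", "c"], 2: ["b", "a", "c"], 3: ["a", "c", "b"]}
print(swap_dynamics(prefs, {1: "c", 2: "a", 3: "b"}))  # {1: 'a', 2: 'b', 3: 'c'}
```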
Adaptivity in Adaptive Submodularity
Title | Adaptivity in Adaptive Submodularity |
Authors | Hossein Esfandiari, Amin Karbasi, Vahab Mirrokni |
Abstract | Adaptive sequential decision making is one of the central challenges in machine learning and artificial intelligence. In such problems, the goal is to design an interactive policy that plans for an action to take, from a finite set of $n$ actions, given some partial observations. It has been shown that in many applications such as active learning, robotics, sequential experimental design, and active detection, the utility function satisfies adaptive submodularity, a notion that generalizes the notion of diminishing returns to policies. In this paper, we revisit the power of adaptivity in maximizing an adaptive monotone submodular function. We propose an efficient batch policy that with $O(\log n \times\log k)$ adaptive rounds of observations can achieve an almost tight $(1-1/e-\epsilon)$ approximation guarantee with respect to an optimal policy that carries out $k$ actions in a fully sequential setting. To complement our results, we also show that it is impossible to achieve a constant factor approximation with $o(\log n)$ adaptive rounds. We also extend our result to the case of adaptive stochastic minimum cost coverage where the goal is to reach a desired utility $Q$ with the cheapest policy. We first prove the conjecture by Golovin and Krause that the greedy policy achieves the asymptotically tight logarithmic approximation guarantee without resorting to stronger notions of adaptivity. We then propose a batch policy that provides the same guarantee in polylogarithmic adaptive rounds through a similar information-parallelism scheme. Our results shrink the adaptivity gap in adaptive submodular maximization by an exponential factor. |
Tasks | Active Learning, Decision Making |
Published | 2019-11-09 |
URL | https://arxiv.org/abs/1911.03620v1 |
https://arxiv.org/pdf/1911.03620v1.pdf | |
PWC | https://paperswithcode.com/paper/adaptivity-in-adaptive-submodularity |
Repo | |
Framework | |
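For intuition about the fully sequential baseline, the sketch below runs an adaptive greedy policy on a toy stochastic-coverage utility: each round it takes the action with the largest expected marginal gain, observes the realized outcome, and adapts. The coverage model and the 0.5 survival probability are illustrative assumptions, not the paper's setting.

```python
import numpy as np

def greedy_policy(action_sets, k, rng=None):
    """Pick k actions to maximize coverage; each chosen action stochastically
    reveals a subset (every element survives with probability 0.5 here)."""
    rng = np.random.default_rng(rng)
    covered, chosen = set(), []
    for _ in range(k):
        gains = [len(s - covered) * 0.5 for s in action_sets]  # expected gain
        best = int(np.argmax(gains))
        chosen.append(best)
        realized = {e for e in action_sets[best] if rng.random() < 0.5}
        covered |= realized                     # observe, then adapt
    return chosen, covered

sets = [set(range(0, 6)), set(range(4, 10)), set(range(8, 14))]
print(greedy_policy(sets, k=3, rng=1))
```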
Subspace Clustering with Active Learning
Title | Subspace Clustering with Active Learning |
Authors | Hankui Peng, Nicos G. Pavlidis |
Abstract | Subspace clustering is a growing field of unsupervised learning that has gained much popularity in the computer vision community. Applications can be found in areas such as motion segmentation and face clustering. It assumes that data originate from a union of subspaces, and clusters the data depending on the corresponding subspace. In practice, it is reasonable to assume that a limited amount of labels can be obtained, potentially at a cost. Therefore, algorithms that can effectively and efficiently incorporate this information to improve the clustering model are desirable. In this paper, we propose an active learning framework for subspace clustering that sequentially queries informative points and updates the subspace model. The query stage of the proposed framework relies on results from the perturbation theory of principal component analysis, to identify influential and potentially misclassified points. A constrained subspace clustering algorithm is proposed that monotonically decreases the objective function subject to the constraints imposed by the labelled data. We show that our proposed framework is suitable for subspace clustering algorithms including iterative methods and spectral methods. Experiments on synthetic data sets, motion segmentation data sets, and Yale Faces data sets demonstrate the advantage of our proposed active strategy over the state of the art. |
Tasks | Active Learning, Motion Segmentation |
Published | 2019-11-08 |
URL | https://arxiv.org/abs/1911.03299v2 |
https://arxiv.org/pdf/1911.03299v2.pdf | |
PWC | https://paperswithcode.com/paper/subspace-clustering-with-active-learning |
Repo | |
Framework | |
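A minimal sketch of the query stage's intent: fit a low-dimensional PCA subspace per cluster and query the point with the largest reconstruction residual to its own subspace. The residual criterion is a simple stand-in for the paper's perturbation-based influence analysis.

```python
import numpy as np

def residuals(X, labels, dim=2):
    """Distance of each point to the PCA subspace fitted to its own cluster."""
    res = np.empty(len(X))
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        Xc = X[idx] - X[idx].mean(0)
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        proj = Xc @ Vt[:dim].T @ Vt[:dim]          # projection onto subspace
        res[idx] = np.linalg.norm(Xc - proj, axis=1)
    return res

X = np.random.rand(100, 5)                          # toy data
labels = np.random.randint(0, 2, 100)               # toy cluster assignment
query = int(np.argmax(residuals(X, labels)))        # point to label next
print("query point:", query)
```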
Modeling continuous-time stochastic processes using $\mathcal{N}$-Curve mixtures
Title | Modeling continuous-time stochastic processes using $\mathcal{N}$-Curve mixtures |
Authors | Ronny Hug, Wolfgang Hübner, Michael Arens |
Abstract | Representations of sequential data are commonly based on the assumption that observed sequences are realizations of an unknown underlying stochastic process, where the learning problem includes determination of the model parameters. In this context the model must be able to capture the multi-modal nature of the data, without blurring between modes. This property is essential for applications like trajectory prediction or human motion modeling. Towards this end, a neural network model for continuous-time stochastic processes usable for sequence prediction is proposed. The model is based on Mixture Density Networks using Bézier curves with Gaussian random variables as control points (abbrev.: $\mathcal{N}$-Curves). Key advantages of the model include the ability to generate smooth multi-mode predictions in a single inference step, which reduces the need for the Monte Carlo simulation required by many multi-step prediction models based on state-of-the-art neural networks. Essential properties of the proposed approach are illustrated by several toy examples and the task of multi-step sequence prediction. Further, the model performance is evaluated on two real-world use cases, i.e., human trajectory prediction and human motion modeling, outperforming different state-of-the-art models. |
Tasks | Trajectory Prediction |
Published | 2019-08-12 |
URL | https://arxiv.org/abs/1908.04030v4 |
https://arxiv.org/pdf/1908.04030v4.pdf | |
PWC | https://paperswithcode.com/paper/modeling-continuous-time-stochastic-processes |
Repo | |
Framework | |
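A small numpy sketch of evaluating an $\mathcal{N}$-Curve at time t: a Bernstein-weighted combination of Gaussian control points yields a Gaussian at every t (assuming independent control points, as in the simplest reading of the model). Shapes and values here are illustrative.

```python
import numpy as np
from math import comb

def n_curve(mus, covs, t):
    """mus: (n+1, d) control-point means; covs: (n+1, d, d) covariances; 0 <= t <= 1."""
    n = len(mus) - 1
    # Bernstein basis weights of a degree-n Bezier curve at parameter t
    w = np.array([comb(n, i) * t**i * (1 - t)**(n - i) for i in range(n + 1)])
    mean = (w[:, None] * mus).sum(axis=0)
    cov = (w[:, None, None] ** 2 * covs).sum(axis=0)   # independence assumed
    return mean, cov

mus = np.array([[0., 0.], [1., 2.], [2., 0.]])          # toy 2D control means
covs = np.stack([0.1 * np.eye(2)] * 3)
print(n_curve(mus, covs, 0.5))                          # Gaussian at curve midpoint
```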
Selective sampling for accelerating training of deep neural networks
Title | Selective sampling for accelerating training of deep neural networks |
Authors | Berry Weinstein, Shai Fine, Yacov Hel-Or |
Abstract | We present a selective sampling method designed to accelerate the training of deep neural networks. To this end, we introduce a novel measurement, the minimal margin score (MMS), which measures the minimal amount of displacement an input should take until its predicted classification is switched. For multi-class linear classification, the MMS measure is a natural generalization of the margin-based selection criterion, which was thoroughly studied in the binary classification setting. In addition, the MMS measure provides an interesting insight into the progress of the training process and can be useful for designing and monitoring new training regimes. Empirically we demonstrate a substantial acceleration when training commonly used deep neural network architectures for popular image classification tasks. The efficiency of our method is compared against the standard training procedures, and against commonly used selective sampling alternatives: Hard negative mining selection, and Entropy-based selection. Finally, we demonstrate an additional speedup when we adopt a more aggressive learning drop regime while using the MMS selective sampling method. |
Tasks | Image Classification |
Published | 2019-11-16 |
URL | https://arxiv.org/abs/1911.06996v1 |
https://arxiv.org/pdf/1911.06996v1.pdf | |
PWC | https://paperswithcode.com/paper/selective-sampling-for-accelerating-training-1 |
Repo | |
Framework | |
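For the multi-class linear case the abstract describes, the MMS can be computed in closed form as the smallest normalized score gap between the predicted class and any rival. A sketch follows; the batch-selection step at the end is an illustrative assumption about how the score would drive sampling.

```python
import numpy as np

def mms(W, b, x):
    """Minimal margin score for a linear classifier.

    W: (classes, features), b: (classes,), x: (features,). Returns the
    smallest input displacement that switches the predicted class.
    """
    scores = W @ x + b
    top = int(np.argmax(scores))
    margins = []
    for j in range(len(scores)):
        if j == top:
            continue
        # displacement along (W[top] - W[j]) needed to equalize the two scores
        margins.append((scores[top] - scores[j]) / np.linalg.norm(W[top] - W[j]))
    return min(margins)

rng = np.random.default_rng(0)
W, b = rng.normal(size=(5, 16)), rng.normal(size=5)
X = rng.normal(size=(32, 16))
batch_scores = np.array([mms(W, b, x) for x in X])
selected = np.argsort(batch_scores)[:8]      # keep the 8 hardest samples
print(selected)
```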
Allen’s Interval Algebra Makes the Difference
Title | Allen’s Interval Algebra Makes the Difference |
Authors | Tomi Janhunen, Michael Sioutis |
Abstract | Allen’s Interval Algebra constitutes a framework for reasoning about temporal information in a qualitative manner. In particular, it uses intervals, i.e., pairs of endpoints, on the timeline to represent entities corresponding to actions, events, or tasks, and binary relations such as precedes and overlaps to encode the possible configurations between those entities. Allen’s calculus has found its way into many academic and industrial applications that involve, most commonly, planning and scheduling, temporal databases, and healthcare. In this paper, we present a novel encoding of Interval Algebra using answer-set programming (ASP) extended by difference constraints, i.e., the fragment abbreviated as ASP(DL), and demonstrate its performance via a preliminary experimental evaluation. Although our ASP encoding is presented in the case of Allen’s calculus for the sake of clarity, we suggest that analogous encodings can be devised for other point-based calculi, too. |
Tasks | |
Published | 2019-09-03 |
URL | https://arxiv.org/abs/1909.01128v1 |
https://arxiv.org/pdf/1909.01128v1.pdf | |
PWC | https://paperswithcode.com/paper/allens-interval-algebra-makes-the-difference |
Repo | |
Framework | |
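For readers unfamiliar with the calculus: between two concrete intervals, the thirteen Allen relations reduce to endpoint comparisons. A plain reference implementation is below, independent of the paper's ASP(DL) encoding.

```python
def allen_relation(a, b):
    """Return the Allen relation holding between intervals a and b."""
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:  return "precedes"
    if e2 < s1:  return "preceded-by"
    if e1 == s2: return "meets"
    if e2 == s1: return "met-by"
    if s1 == s2 and e1 == e2: return "equals"
    if s1 == s2: return "starts" if e1 < e2 else "started-by"
    if e1 == e2: return "finishes" if s1 > s2 else "finished-by"
    if s2 < s1 and e1 < e2: return "during"
    if s1 < s2 and e2 < e1: return "contains"
    return "overlaps" if s1 < s2 else "overlapped-by"

print(allen_relation((1, 4), (3, 8)))   # overlaps
```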
Performance of Three Slim Variants of The Long Short-Term Memory (LSTM) Layer
Title | Performance of Three Slim Variants of The Long Short-Term Memory (LSTM) Layer |
Authors | Daniel Kent, Fathi M. Salem |
Abstract | The Long Short-Term Memory (LSTM) layer is an important advancement in the field of neural networks and machine learning, allowing for effective training and impressive inference performance. LSTM-based neural networks have been successfully employed in various applications such as speech processing and language translation. The LSTM layer can be simplified by removing certain components, potentially speeding up training and runtime with limited change in performance. In particular, the recently introduced variants, called SLIM LSTMs, have shown success in initial experiments to support this view. Here, we perform a computational analysis of the validation accuracy of a convolutional-plus-recurrent neural network architecture, comparing the standard LSTM layer against three SLIM LSTM layers. We have found that some realizations of the SLIM LSTM layers can potentially perform as well as the standard LSTM layer for our considered architecture. |
Tasks | |
Published | 2019-01-02 |
URL | http://arxiv.org/abs/1901.00525v1 |
http://arxiv.org/pdf/1901.00525v1.pdf | |
PWC | https://paperswithcode.com/paper/performance-of-three-slim-variants-of-the |
Repo | |
Framework | |
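To make the kind of simplification concrete, the sketch below contrasts a standard LSTM cell step with one representative slim reduction in which the gates drop the input-signal term. The paper's three SLIM variants make different (and not necessarily this) reductions, so treat this as an assumption-laden illustration rather than their exact layers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, Wx, Wh, b, slim=False):
    """Wx: (4H, D), Wh: (4H, H), b: (4H,). Rows ordered as gates i, f, o, g."""
    H = h.shape[0]
    full = Wx @ x + Wh @ h + b               # standard pre-activations
    gates = (Wh @ h + b) if slim else full   # slim: gates drop the input term
    i = sigmoid(gates[0*H:1*H])              # input gate
    f = sigmoid(gates[1*H:2*H])              # forget gate
    o = sigmoid(gates[2*H:3*H])              # output gate
    g = np.tanh(full[3*H:4*H])               # candidate always sees the input
    c_new = f * c + i * g
    return o * np.tanh(c_new), c_new

D, H = 8, 16
rng = np.random.default_rng(0)
Wx = rng.normal(size=(4*H, D)) * 0.1
Wh = rng.normal(size=(4*H, H)) * 0.1
b = np.zeros(4*H)
h, c = np.zeros(H), np.zeros(H)
h, c = lstm_step(rng.normal(size=D), h, c, Wx, Wh, b, slim=True)
```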