Paper Group ANR 266
Scheduling in Cellular Federated Edge Learning with Importance and Channel Awareness. A Corrective View of Neural Networks: Representation, Memorization and Learning. Entity Extraction from Wikipedia List Pages. Review and Prospect: Deep Learning in Nuclear Magnetic Resonance Spectroscopy. A survey on Semi-, Self- and Unsupervised Techniques in Image Classification …
Scheduling in Cellular Federated Edge Learning with Importance and Channel Awareness
Title | Scheduling in Cellular Federated Edge Learning with Importance and Channel Awareness |
Authors | Jinke Ren, Yinghui He, Dingzhu Wen, Guanding Yu, Kaibin Huang, Dongning Guo |
Abstract | In cellular federated edge learning (FEEL), multiple edge devices holding local data jointly train a learning algorithm by communicating learning updates with an access point without exchanging their data samples. With limited communication resources, it is beneficial to schedule the most informative local learning update. In this paper, a novel scheduling policy is proposed to exploit both diversity in multiuser channels and diversity in the importance of the edge devices’ learning updates. First, a new probabilistic scheduling framework is developed to yield unbiased update aggregation in FEEL. The importance of a local learning update is measured by gradient divergence. If one edge device is scheduled in each communication round, the scheduling policy is derived in closed form to achieve the optimal trade-off between channel quality and update importance. The probabilistic scheduling framework is then extended to allow scheduling multiple edge devices in each communication round. Numerical results obtained using popular models and learning datasets demonstrate that the proposed scheduling policy can achieve faster model convergence and higher learning accuracy than conventional scheduling policies that only exploit a single type of diversity. |
Tasks | |
Published | 2020-04-01 |
URL | https://arxiv.org/abs/2004.00490v1 |
https://arxiv.org/pdf/2004.00490v1.pdf | |
PWC | https://paperswithcode.com/paper/scheduling-in-cellular-federated-edge |
Repo | |
Framework | |
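
To make the probabilistic scheduling idea above concrete, here is a minimal NumPy sketch, not the authors' closed-form policy: each device's selection probability combines an importance score (here a gradient-norm proxy for gradient divergence) with its channel quality, and inverse-probability weighting keeps the aggregated update unbiased. The score combination and `alpha` trade-off are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def schedule_and_aggregate(local_updates, channel_gains, alpha=0.5):
    """One communication round with probabilistic device scheduling.

    local_updates : (K, d) array of local gradient updates.
    channel_gains : (K,) array of nonnegative channel qualities.
    alpha         : importance/channel trade-off (an illustrative choice,
                    not the paper's derived optimum).
    """
    importance = np.linalg.norm(local_updates, axis=1)   # gradient-divergence proxy
    score = importance ** alpha * channel_gains ** (1 - alpha)
    probs = score / score.sum()                          # scheduling distribution
    k = rng.choice(len(probs), p=probs)                  # schedule one device
    # Inverse-probability weighting keeps aggregation unbiased:
    # E[update_k / (K * p_k)] = (1/K) * sum_k update_k, the full average.
    return local_updates[k] / (len(probs) * probs[k])

updates = rng.normal(size=(10, 5))   # 10 devices, 5-dimensional model
gains = rng.uniform(0.1, 1.0, 10)
print(schedule_and_aggregate(updates, gains))
```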
A Corrective View of Neural Networks: Representation, Memorization and Learning
Title | A Corrective View of Neural Networks: Representation, Memorization and Learning |
Authors | Guy Bresler, Dheeraj Nagaraj |
Abstract | We develop a corrective mechanism for neural network approximation: the total available non-linear units are divided into multiple groups and the first group approximates the function under consideration, the second group approximates the error in approximation produced by the first group and corrects it, the third group approximates the error produced by the first and second groups together and so on. This technique yields several new representation and learning results for neural networks. First, we show that two-layer neural networks in the random features regime (RF) can memorize arbitrary labels for arbitrary points under a Euclidean distance separation condition using $\tilde{O}(n)$ ReLU or Step activation functions, which is optimal in $n$ up to logarithmic factors. Next, we give a powerful representation result for two-layer neural networks with ReLU and smoothed ReLU units which can achieve a squared error of at most $\epsilon$ with $O(C(a,d)\epsilon^{-1/(a+1)})$ units for $a \in \mathbb{N}\cup\{0\}$ when the function is smooth enough (roughly when it has $\Theta(ad)$ bounded derivatives). In certain cases $d$ can be replaced with effective dimension $q \ll d$. Previous results of this type implement Taylor series approximation using deep architectures. We also consider three-layer neural networks and show that the corrective mechanism yields faster representation rates for smooth radial functions. Lastly, we obtain the first $O(\mathrm{subpoly}(1/\epsilon))$ upper bound on the number of neurons required for a two-layer network to learn low degree polynomials up to squared error $\epsilon$ via gradient descent. Even though deep networks can express these polynomials with $O(\mathrm{polylog}(1/\epsilon))$ neurons, the best learning bounds on this problem require $\mathrm{poly}(1/\epsilon)$ neurons. |
Tasks | |
Published | 2020-02-01 |
URL | https://arxiv.org/abs/2002.00274v1 |
https://arxiv.org/pdf/2002.00274v1.pdf | |
PWC | https://paperswithcode.com/paper/a-corrective-view-of-neural-networks |
Repo | |
Framework | |
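
The corrective mechanism described in the abstract translates directly into iterative residual fitting. Below is a toy NumPy sketch under that reading: each group of random-feature ReLU units is fit (here by ridge-regularized least squares, an illustrative choice) to the residual left by all previous groups.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_relu_features(X, m):
    """Random-features ReLU layer: fixed random first layer, trainable output."""
    W = rng.normal(size=(X.shape[1], m))
    b = rng.normal(size=m)
    return np.maximum(X @ W + b, 0.0)

def corrective_fit(X, y, n_groups=4, m=200, ridge=1e-6):
    """Each group approximates the residual error left by the previous groups."""
    residual = y.copy()
    for _ in range(n_groups):
        Phi = random_relu_features(X, m)
        # Least-squares fit of this group's output weights to the residual.
        a = np.linalg.solve(Phi.T @ Phi + ridge * np.eye(m), Phi.T @ residual)
        residual = residual - Phi @ a
        print("residual MSE:", np.mean(residual ** 2))

X = rng.uniform(-1, 1, size=(500, 2))
y = np.sin(3 * X[:, 0]) * np.cos(2 * X[:, 1])
corrective_fit(X, y)
```

Each successive group only has to model what earlier groups got wrong, which is why the printed residual error shrinks group by group.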
Entity Extraction from Wikipedia List Pages
Title | Entity Extraction from Wikipedia List Pages |
Authors | Nicolas Heist, Heiko Paulheim |
Abstract | When it comes to factual knowledge about a wide range of domains, Wikipedia is often the prime source of information on the web. DBpedia and YAGO, as large cross-domain knowledge graphs, encode a subset of that knowledge by creating an entity for each page in Wikipedia, and connecting them through edges. It is well known, however, that Wikipedia-based knowledge graphs are far from complete. In particular, as Wikipedia’s policies permit pages about subjects only if they have a certain popularity, such graphs tend to lack information about less well-known entities. Information about these entities is oftentimes available in the encyclopedia, but not represented as an individual page. In this paper, we present a two-phased approach for the extraction of entities from Wikipedia’s list pages, which have proven to serve as a valuable source of information. In the first phase, we build a large taxonomy from categories and list pages with DBpedia as a backbone. With distant supervision, we extract training data for the identification of new entities in list pages that we use in the second phase to train a classification model. With this approach we extract over 700k new entities and extend DBpedia with 7.5M new type statements and 3.8M new facts of high precision. |
Tasks | Entity Extraction, Knowledge Graphs |
Published | 2020-03-11 |
URL | https://arxiv.org/abs/2003.05146v1 |
https://arxiv.org/pdf/2003.05146v1.pdf | |
PWC | https://paperswithcode.com/paper/entity-extraction-from-wikipedia-list-pages |
Repo | |
Framework | |
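
A rough sketch of the distant-supervision step described above: list-page entries already typed in DBpedia provide positive and negative training labels, and a classifier then decides about the unknown entries. The feature set, helper names, and data layout here are all hypothetical stand-ins, not the paper's actual pipeline.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def distant_labels(entries, dbpedia_types, list_class):
    """Entries whose DBpedia type matches the list page's inferred class are
    positives; entries with a conflicting known type are negatives. Untyped
    entries are what the trained model will later classify."""
    labeled = []
    for e in entries:
        t = dbpedia_types.get(e["uri"])
        if t == list_class:
            labeled.append((e, 1))
        elif t is not None:
            labeled.append((e, 0))
    return labeled

def features(entry):
    # Illustrative positional/layout features; the paper's feature set differs.
    return {"position": entry["position"], "has_link": entry["has_link"],
            "word_count": entry["word_count"]}

entries = [
    {"uri": "dbr:A", "position": 0, "has_link": True, "word_count": 3},
    {"uri": "dbr:B", "position": 1, "has_link": True, "word_count": 2},
    {"uri": "dbr:C", "position": 2, "has_link": False, "word_count": 7},
]
types = {"dbr:A": "dbo:Person", "dbr:C": "dbo:Place"}

train = distant_labels(entries, types, "dbo:Person")
vec = DictVectorizer()
X = vec.fit_transform([features(e) for e, _ in train])
clf = LogisticRegression().fit(X, [y for _, y in train])
```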
Review and Prospect: Deep Learning in Nuclear Magnetic Resonance Spectroscopy
Title | Review and Prospect: Deep Learning in Nuclear Magnetic Resonance Spectroscopy |
Authors | Dicheng Chen, Zi Wang, Di Guo, Vladislav Orekhov, Xiaobo Qu |
Abstract | Since the concept of deep learning (DL) was formally proposed in 2006, it has had a major impact on academic research and industry. Nowadays, DL provides an unprecedented way to analyze and process data, with great results demonstrated in computer vision, medical imaging, natural language processing, etc. In this Minireview, we summarize applications of DL in nuclear magnetic resonance (NMR) spectroscopy and outline a perspective for DL as an entirely new approach that is likely to transform NMR spectroscopy into a much more efficient and powerful technique in chemistry and life science. |
Tasks | |
Published | 2020-01-13 |
URL | https://arxiv.org/abs/2001.04813v1 |
https://arxiv.org/pdf/2001.04813v1.pdf | |
PWC | https://paperswithcode.com/paper/review-and-prospect-deep-learning-in-nuclear |
Repo | |
Framework | |
A survey on Semi-, Self- and Unsupervised Techniques in Image Classification
Title | A survey on Semi-, Self- and Unsupervised Techniques in Image Classification |
Authors | Lars Schmarje, Monty Santarossa, Simon-Martin Schröder, Reinhard Koch |
Abstract | While deep learning strategies achieve outstanding results in computer vision tasks, one issue remains: the current strategies rely heavily on a huge amount of labeled data. In many real-world problems it is not feasible to create such an amount of labeled training data. Therefore, researchers try to incorporate unlabeled data into the training process to reach equal results with fewer labels. Due to a lot of concurrent research, it is difficult to keep track of recent developments. In this survey we provide an overview of often used techniques and methods in image classification with fewer labels. We compare 21 methods. In our analysis we identify three major trends. 1. State-of-the-art methods are scalable to real-world applications based on their accuracy. 2. The degree of supervision which is needed to achieve comparable results to the usage of all labels is decreasing. 3. All methods share common techniques while only few methods combine these techniques to achieve better performance. Based on these three trends we identify future research opportunities. |
Tasks | Image Classification |
Published | 2020-02-20 |
URL | https://arxiv.org/abs/2002.08721v1 |
https://arxiv.org/pdf/2002.08721v1.pdf | |
PWC | https://paperswithcode.com/paper/a-survey-on-semi-self-and-unsupervised |
Repo | |
Framework | |
Medical-based Deep Curriculum Learning for Improved Fracture Classification
Title | Medical-based Deep Curriculum Learning for Improved Fracture Classification |
Authors | Amelia Jiménez-Sánchez, Diana Mateus, Sonja Kirchhoff, Chlodwig Kirchhoff, Peter Biberthaler, Nassir Navab, Miguel A. González Ballester, Gemma Piella |
Abstract | Current deep-learning-based methods do not integrate easily into clinical protocols, nor do they take full advantage of medical knowledge. In this work, we propose and compare several strategies relying on curriculum learning to support the classification of proximal femur fractures from X-ray images, a challenging problem as reflected by existing intra- and inter-expert disagreement. Our strategies are derived from knowledge such as medical decision trees and inconsistencies in the annotations of multiple experts, which allows us to assign a degree of difficulty to each training sample. We demonstrate that if we start learning with “easy” examples and move towards “hard” ones, the model can reach better performance, even with less data. The evaluation is performed on the classification of a clinical dataset of about 1000 X-ray images. Our results show that, compared to class-uniform and random strategies, the proposed medical-knowledge-based curriculum performs up to 15% better in terms of accuracy, achieving the performance of experienced trauma surgeons. |
Tasks | |
Published | 2020-04-01 |
URL | https://arxiv.org/abs/2004.00482v1 |
https://arxiv.org/pdf/2004.00482v1.pdf | |
PWC | https://paperswithcode.com/paper/medical-based-deep-curriculum-learning-for |
Repo | |
Framework | |
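
The easy-to-hard training schedule above is straightforward to implement once each sample has a difficulty score. Here is a minimal sketch of one such pacing scheme; the linear pool-growth schedule and `start_frac` parameter are illustrative assumptions, whereas the paper compares several curriculum strategies.

```python
import numpy as np

rng = np.random.default_rng(0)

def curriculum_batches(difficulty, n_epochs, batch_size, start_frac=0.3):
    """Yield index batches that start with 'easy' samples and gradually
    open up the full training set (one simple pacing scheme)."""
    order = np.argsort(difficulty)            # easy -> hard
    n = len(order)
    for epoch in range(n_epochs):
        frac = min(1.0, start_frac + (1 - start_frac) * epoch / max(1, n_epochs - 1))
        pool = order[: max(batch_size, int(frac * n))]
        pool = rng.permutation(pool)          # shuffle within the current pool
        for i in range(0, len(pool), batch_size):
            yield pool[i : i + batch_size]

# Difficulty could come from a medical decision tree or from inter-expert
# disagreement on each label, as the abstract suggests; here it is random.
difficulty = rng.uniform(size=1000)
for batch in curriculum_batches(difficulty, n_epochs=3, batch_size=32):
    pass  # train_step(batch) would go here
```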
Distance and Equivalence between Finite State Machines and Recurrent Neural Networks: Computational results
Title | Distance and Equivalence between Finite State Machines and Recurrent Neural Networks: Computational results |
Authors | Reda Marzouk, Colin de la Higuera |
Abstract | The need to interpret Deep Learning (DL) models has led, during the past years, to a proliferation of works concerned with this issue. Among the strategies that aim at shedding some light on how information is represented internally in DL models, one consists in extracting symbolic rule-based machines from connectionist models that are supposed to approximate their behaviour well. In order to better understand how reasonable these approximation strategies are, we need to know the computational complexity of measuring the quality of approximation. In this article, we prove several computational results related to the problem of extracting Finite State Machine (FSM) based models from trained RNN language models. More precisely, we show the following: (a) for general weighted RNN-LMs with a single hidden layer and a ReLU activation: the equivalence problem of a PDFA/PFA/WFA and a weighted first-order RNN-LM is undecidable; as a corollary, the distance problem between languages generated by PDFA/PFA/WFA and that of a weighted RNN-LM is not recursive; the intersection between a DFA and the cut language of a weighted RNN-LM is undecidable; the equivalence of a PDFA/PFA/WFA and a weighted RNN-LM on a finite support is EXP-hard; (b) for consistent weight RNN-LMs with any computable activation function: the Chebyshev distance approximation is decidable; the Chebyshev distance approximation on a finite support is NP-hard. Moreover, our reduction technique from 3-SAT makes this latter fact easily generalizable to other RNN architectures (e.g. LSTMs/RNNs), and to RNNs with finite precision. |
Tasks | |
Published | 2020-04-01 |
URL | https://arxiv.org/abs/2004.00478v1 |
https://arxiv.org/pdf/2004.00478v1.pdf | |
PWC | https://paperswithcode.com/paper/distance-and-equivalence-between-finite-state |
Repo | |
Framework | |
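
To illustrate the finite-support case, here is a brute-force sketch of the Chebyshev (L-infinity) distance between two string distributions, enumerating every string up to a length bound. Given the NP-hardness result above, enumeration like this is essentially what one can hope for on small supports; the two toy models standing in for a PFA and an RNN-LM are purely illustrative.

```python
from itertools import product

def chebyshev_distance_finite(p_model, q_model, alphabet, max_len):
    """L-infinity distance between two string distributions restricted to a
    finite support: all strings over `alphabet` of length <= max_len."""
    best = 0.0
    for n in range(max_len + 1):
        for w in product(alphabet, repeat=n):
            best = max(best, abs(p_model(w) - q_model(w)))
    return best

# Toy models: a 'PFA-like' geometric distribution over strings and a
# perturbed copy standing in for an RNN language model.
p = lambda w: 0.5 ** (len(w) + 1) / (2 ** len(w))
q = lambda w: 0.45 ** (len(w) + 1) / (2 ** len(w))
print(chebyshev_distance_finite(p, q, alphabet="ab", max_len=4))
```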
Particle Swarm Optimization: Stability Analysis using N-Informers under Arbitrary Coefficient Distributions
Title | Particle Swarm Optimization: Stability Analysis using N-Informers under Arbitrary Coefficient Distributions |
Authors | Christopher W Cleghorn, Belinda Stapelberg |
Abstract | This paper derives, under minimal modelling assumptions, a simple to use theorem for obtaining both order-$1$ and order-$2$ stability criteria for a common class of particle swarm optimization (PSO) variants. Specifically, PSO variants that can be rewritten as a finite sum of stochastically weighted difference vectors between a particle’s position and swarm informers are covered by the theorem. Additionally, the derived theorem allows a PSO practitioner to obtain stability criteria that contain no artificial restriction on the relationship between control coefficients. Almost all previous PSO stability results have provided stability criteria under the restriction that the social and cognitive control coefficients are equal; no such restriction is present when using the derived theorem. As a demonstration of its ease of use, the theorem is used to derive stability criteria for three popular PSO variants without the imposed restriction on the relation between the control coefficients. |
Tasks | |
Published | 2020-04-01 |
URL | https://arxiv.org/abs/2004.00476v1 |
https://arxiv.org/pdf/2004.00476v1.pdf | |
PWC | https://paperswithcode.com/paper/particle-swarm-optimization-stability |
Repo | |
Framework | |
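
Order-$1$ and order-$2$ stability concern the convergence of a particle's first and second position moments. The sketch below checks this empirically for the classic two-informer PSO update, which is exactly of the covered form: a sum of stochastically weighted difference vectors between the particle's position and its informers. The specific coefficient values are illustrative and sit in a region known to be stable for this variant.

```python
import numpy as np

rng = np.random.default_rng(0)

def moment_trajectories(w=0.7, c1=1.4, c2=1.4, informers=(0.0, 1.0),
                        n_runs=2000, n_steps=200):
    """Empirically track E[x_t] and E[x_t^2] over many independent runs of a
    single particle with fixed informers. Bounded, converging moments
    correspond to order-1/order-2 stability."""
    x = rng.normal(size=n_runs)
    v = rng.normal(size=n_runs)
    m1, m2 = [], []
    for _ in range(n_steps):
        r1 = rng.uniform(size=n_runs)
        r2 = rng.uniform(size=n_runs)
        # Sum of stochastically weighted difference vectors to the informers.
        v = w * v + c1 * r1 * (informers[0] - x) + c2 * r2 * (informers[1] - x)
        x = x + v
        m1.append(x.mean())
        m2.append((x ** 2).mean())
    return np.array(m1), np.array(m2)

m1, m2 = moment_trajectories()
print("final E[x], E[x^2]:", m1[-1], m2[-1])  # bounded values suggest stability
```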
Stopping Criteria for, and Strong Convergence of, Stochastic Gradient Descent on Bottou-Curtis-Nocedal Functions
Title | Stopping Criteria for, and Strong Convergence of, Stochastic Gradient Descent on Bottou-Curtis-Nocedal Functions |
Authors | Vivak Patel |
Abstract | While Stochastic Gradient Descent (SGD) is a rather efficient algorithm for data-driven problems, it is an incomplete optimization algorithm as it lacks stopping criteria, which has limited its adoption in situations where such criteria are necessary. Unlike stopping criteria for deterministic methods, stopping criteria for SGD require a detailed understanding of (A) strong convergence, (B) whether the criteria will be triggered, (C) how false negatives are controlled, and (D) how false positives are controlled. In order to address these issues, we first prove strong global convergence (i.e., convergence with probability one) of SGD on a popular and general class of convex and nonconvex functions that are specified by, what we call, the Bottou-Curtis-Nocedal structure. Our proof of strong global convergence refines many techniques currently in the literature and employs new ones that are of independent interest. With strong convergence established, we then present several stopping criteria and rigorously explore whether they will be triggered in finite time and supply bounds on false negative probabilities. Ultimately, we lay a foundation for rigorously developing stopping criteria for SGD methods for a broad class of functions, in hopes of making SGD a more complete optimization algorithm with greater adoption for data-driven problems. |
Tasks | |
Published | 2020-04-01 |
URL | https://arxiv.org/abs/2004.00475v1 |
https://arxiv.org/pdf/2004.00475v1.pdf | |
PWC | https://paperswithcode.com/paper/stopping-criteria-for-and-strong-convergence |
Repo | |
Framework | |
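
As a flavor of what a stopping criterion for SGD looks like in practice, here is a generic sketch: stop when a running average of squared stochastic-gradient norms falls below a tolerance. This illustrates only the *kind* of rule the paper analyzes; the paper's criteria come with rigorous trigger-time and false-positive/false-negative guarantees that this sketch does not.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_with_stopping(grad_sample, x0, lr=0.05, tol=0.1, window=100,
                      max_iter=50000):
    """SGD that stops once a running average of squared stochastic-gradient
    norms drops below tol. Note tol must sit above the gradient-noise floor,
    or the criterion will never trigger."""
    x = np.asarray(x0, dtype=float)
    avg = None
    for t in range(max_iter):
        g = grad_sample(x)
        sq = float(g @ g)
        avg = sq if avg is None else avg + (sq - avg) / min(t + 1, window)
        if avg < tol:
            return x, t
        x = x - lr * g
    return x, max_iter

# Toy problem: noisy gradient of f(x) = 0.5 * ||x||^2.
grad = lambda x: x + 0.1 * rng.normal(size=x.shape)
x_final, steps = sgd_with_stopping(grad, x0=np.ones(5))
print(steps, np.linalg.norm(x_final))
```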
Robust Classification of High-Dimensional Spectroscopy Data Using Deep Learning and Data Synthesis
Title | Robust Classification of High-Dimensional Spectroscopy Data Using Deep Learning and Data Synthesis |
Authors | James Houston, Frank G. Glavin, Michael G. Madden |
Abstract | This paper presents a new approach to classification of high-dimensional spectroscopy data and demonstrates that it outperforms other current state-of-the-art approaches. The specific task we consider is identifying whether samples contain chlorinated solvents or not, based on their Raman spectra. We also examine robustness in classifying outlier samples that are not represented in the training set (negative outliers). A novel application of a locally-connected neural network (NN) for the binary classification of spectroscopy data is proposed and demonstrated to yield improved accuracy over traditionally popular algorithms. Additionally, we present the ability to further increase the accuracy of the locally-connected NN algorithm through the use of synthetic training spectra, and we investigate the use of autoencoder based one-class classifiers and outlier detectors. Finally, a two-step classification process is presented as an alternative to the binary and one-class classification paradigms. This process combines the locally-connected NN classifier, the use of synthetic training data, and an autoencoder based outlier detector to produce a model that is shown both to achieve high classification accuracy and to be robust to the presence of negative outliers. |
Tasks | |
Published | 2020-03-26 |
URL | https://arxiv.org/abs/2003.11842v1 |
https://arxiv.org/pdf/2003.11842v1.pdf | |
PWC | https://paperswithcode.com/paper/robust-classification-of-high-dimensional |
Repo | |
Framework | |
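
The two-step process above reduces to: reject spectra the autoencoder reconstructs poorly, then binary-classify the rest. A minimal PyTorch sketch of that outline follows; the toy fully-connected architectures (the paper uses a locally-connected classifier), the dimension, and the threshold are all illustrative, and training of both networks is omitted.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
DIM = 1024  # number of spectral channels (illustrative)

# Step 1: an autoencoder flags outlier spectra by reconstruction error.
autoencoder = nn.Sequential(nn.Linear(DIM, 64), nn.ReLU(), nn.Linear(64, DIM))
# Step 2: a binary classifier handles the spectra accepted as inliers.
classifier = nn.Sequential(nn.Linear(DIM, 64), nn.ReLU(), nn.Linear(64, 1))

def two_step_predict(x, threshold):
    """Return 1/0 class for inliers, -1 for spectra rejected as outliers."""
    with torch.no_grad():
        err = ((autoencoder(x) - x) ** 2).mean(dim=1)  # per-sample recon error
        logits = classifier(x).squeeze(1)
    labels = (logits > 0).long()
    labels[err > threshold] = -1                       # negative-outlier reject
    return labels

x = torch.randn(8, DIM)
print(two_step_predict(x, threshold=1.5))
```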
Diversity-Based Generalization for Neural Unsupervised Text Classification under Domain Shift
Title | Diversity-Based Generalization for Neural Unsupervised Text Classification under Domain Shift |
Authors | Jitin Krishnan, Hemant Purohit, Huzefa Rangwala |
Abstract | Domain adaptation approaches seek to learn from a source domain and generalize it to an unseen target domain. At present, the state-of-the-art domain adaptation approaches for subjective text classification problems are semi-supervised, and use unlabeled target data along with labeled source data. In this paper, we propose a novel method for domain adaptation of single-task text classification problems based on a simple but effective idea of diversity-based generalization that does not require unlabeled target data. Diversity plays the role of promoting the model to better generalize and be indiscriminate towards domain shift by forcing the model not to rely on the same features for prediction. We apply this concept on the most explainable component of neural networks, the attention layer. To generate sufficient diversity, we create a multi-head attention model and infuse a diversity constraint between the attention heads such that each head will learn differently. We further expand upon our model by tri-training and designing a procedure with an additional diversity constraint between the attention heads of the tri-trained classifiers. Extensive evaluation using the standard benchmark dataset of Amazon reviews and a newly constructed dataset of Crisis events shows that our fully unsupervised method matches the competing semi-supervised baselines. Our results demonstrate that machine learning architectures that ensure sufficient diversity can generalize better, encouraging future research to design ubiquitously usable learning models without using unlabeled target data. |
Tasks | Domain Adaptation, Text Classification |
Published | 2020-02-25 |
URL | https://arxiv.org/abs/2002.10937v1 |
https://arxiv.org/pdf/2002.10937v1.pdf | |
PWC | https://paperswithcode.com/paper/diversity-based-generalization-for-neural |
Repo | |
Framework | |
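
One simple way to realize a diversity constraint between attention heads is to penalize pairwise similarity of their attention distributions, so that no two heads attend to the same tokens. The sketch below uses cosine similarity as the penalty; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def head_diversity_penalty(attn):
    """attn: (batch, heads, seq) attention distributions, one per head.
    Returns the mean pairwise cosine similarity between heads; adding this
    to the task loss pushes each head to attend differently."""
    a = F.normalize(attn, dim=-1)                 # unit vector per head
    sim = torch.einsum("bhs,bgs->bhg", a, a)      # pairwise cosine similarities
    h = attn.shape[1]
    mask = ~torch.eye(h, dtype=torch.bool, device=attn.device)
    return sim[:, mask].mean()                    # off-diagonal pairs only

# Usage: total_loss = task_loss + lam * head_diversity_penalty(attn_weights)
attn = torch.softmax(torch.randn(4, 8, 20), dim=-1)  # 4 examples, 8 heads, 20 tokens
print(head_diversity_penalty(attn))
```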
Counterfactual Multi-Agent Reinforcement Learning with Graph Convolution Communication
Title | Counterfactual Multi-Agent Reinforcement Learning with Graph Convolution Communication |
Authors | Jianyu Su, Stephen Adams, Peter A. Beling |
Abstract | We consider a fully cooperative multi-agent system where agents cooperate to maximize a system’s utility in a partially observable environment. We propose that multi-agent systems must have the ability to (1) communicate and understand the interplay between agents and (2) correctly distribute rewards based on an individual agent’s contribution. In contrast, most work in this setting considers only one of the above abilities. In this study, we develop an architecture that allows for communication among agents and tailors the system’s reward for each individual agent. Our architecture represents agent communication through graph convolution and applies an existing credit assignment structure, counterfactual multi-agent policy gradient (COMA), to assist agents in learning communication by back-propagation. The flexibility of the graph structure enables our method to be applicable to a variety of multi-agent systems, e.g. dynamic systems that consist of varying numbers of agents and static systems with a fixed number of agents. We evaluate our method on a range of tasks, demonstrating the advantage of marrying communication with credit assignment. In the experiments, our proposed method yields better performance than the state-of-the-art methods, including COMA. Moreover, we show that the communication strategies offer insights into, and interpretability of, the system’s cooperative policies. |
Tasks | Multi-agent Reinforcement Learning |
Published | 2020-04-01 |
URL | https://arxiv.org/abs/2004.00470v1 |
https://arxiv.org/pdf/2004.00470v1.pdf | |
PWC | https://paperswithcode.com/paper/counterfactual-multi-agent-reinforcement |
Repo | |
Framework | |
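
Graph-convolution communication among agents boils down to message passing over the agent adjacency graph. Here is a minimal NumPy sketch of one such round; the row-normalized mean aggregation is a common choice but an assumption here, and the paper's layer may differ in detail.

```python
import numpy as np

rng = np.random.default_rng(0)

def gcn_round(H, A, W):
    """One graph-convolution communication round among agents.
    H: (n_agents, d) local observations/hidden states,
    A: (n_agents, n_agents) adjacency (who can talk to whom),
    W: (d, d_out) learned weights."""
    A_hat = A + np.eye(A.shape[0])            # include self-messages
    deg = A_hat.sum(axis=1, keepdims=True)
    return np.maximum((A_hat / deg) @ H @ W, 0.0)   # mean-aggregate, ReLU

n, d = 5, 8
H = rng.normal(size=(n, d))
A = (rng.uniform(size=(n, n)) < 0.4).astype(float)
A = np.triu(A, 1); A = A + A.T               # symmetric graph, no self-loops yet
msgs = gcn_round(H, A, rng.normal(size=(d, d)))
# Each agent's policy, and the COMA counterfactual critic for credit
# assignment, would consume `msgs` downstream.
print(msgs.shape)
```

Because the adjacency matrix can change between time steps, the same layer handles dynamic systems with varying numbers of agents, which is the flexibility the abstract highlights.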
A scale-dependent notion of effective dimension
Title | A scale-dependent notion of effective dimension |
Authors | Oksana Berezniuk, Alessio Figalli, Raffaele Ghigliazza, Kharen Musaelian |
Abstract | We introduce a notion of “effective dimension” of a statistical model based on the number of cubes of size $1/\sqrt{n}$ needed to cover the model space when endowed with the Fisher Information Matrix as metric, $n$ being the number of observations. The number of observations fixes a natural scale or resolution. The effective dimension is then measured via the spectrum of the Fisher Information Matrix regularized using this natural scale. |
Tasks | |
Published | 2020-01-29 |
URL | https://arxiv.org/abs/2001.10872v1 |
https://arxiv.org/pdf/2001.10872v1.pdf | |
PWC | https://paperswithcode.com/paper/a-scale-dependent-notion-of-effective |
Repo | |
Framework | |
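
As a numerical illustration, here is one plausible reading of a scale-dependent effective dimension computed from the regularized FIM spectrum; treat the exact constants and the $n$-dependent scale factor as assumptions rather than the paper's definition. A model with only a few non-negligible Fisher directions comes out with a correspondingly small effective dimension.

```python
import numpy as np

rng = np.random.default_rng(0)

def effective_dimension(fisher_spectra, n):
    """Monte-Carlo sketch of a scale-dependent effective dimension from
    samples of the Fisher Information Matrix spectrum.

    fisher_spectra: (n_samples, d) FIM eigenvalues at sampled parameter points.
    n: number of observations, which fixes the natural scale/resolution."""
    kappa = n / (2 * np.pi * np.log(n))       # n-dependent regularization scale
    logdet = np.sum(np.log1p(kappa * fisher_spectra), axis=1)  # log det(I + kF)
    avg = np.log(np.mean(np.exp(0.5 * logdet)))  # average sqrt(det(...)) over params
    return 2 * avg / np.log(kappa)

# Toy model: 10 parameters, but the FIM has only 3 non-negligible directions.
spectra = np.concatenate([rng.uniform(0.5, 1.0, size=(100, 3)),
                          rng.uniform(0, 1e-6, size=(100, 7))], axis=1)
print(effective_dimension(spectra, n=10000))   # close to 3, not 10
```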
Deep transformation models: Tackling complex regression problems with neural network based transformation models
Title | Deep transformation models: Tackling complex regression problems with neural network based transformation models |
Authors | Beate Sick, Torsten Hothorn, Oliver Dürr |
Abstract | We present a deep transformation model for probabilistic regression. Deep learning is known for outstandingly accurate predictions on complex data but in regression tasks, it is predominantly used to just predict a single number. This ignores the non-deterministic character of most tasks. Especially if crucial decisions are based on the predictions, like in medical applications, it is essential to quantify the prediction uncertainty. The presented deep learning transformation model estimates the whole conditional probability distribution, which is the most thorough way to capture uncertainty about the outcome. We combine ideas from a statistical transformation model (most likely transformation) with recent transformation models from deep learning (normalizing flows) to predict complex outcome distributions. The core of the method is a parameterized transformation function which can be trained with the usual maximum likelihood framework using gradient descent. The method can be combined with existing deep learning architectures. On small machine learning benchmark datasets, we report state-of-the-art performance for most datasets and in some cases even surpass it. Our method works for complex input data, which we demonstrate by employing a CNN architecture on image data. |
Tasks | |
Published | 2020-04-01 |
URL | https://arxiv.org/abs/2004.00464v1 |
https://arxiv.org/pdf/2004.00464v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-transformation-models-tackling-complex |
Repo | |
Framework | |
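
The core idea, a parameterized strictly increasing transformation $h(y|x)$ trained by maximum likelihood, gives the change-of-variables likelihood $-\log p(y|x) = \tfrac{1}{2}h(y|x)^2 + \tfrac{1}{2}\log 2\pi - \log h'(y|x)$ under a standard-normal base density. The PyTorch sketch below uses a simple bump-plus-affine transformation as a stand-in for the richer transformation family the paper uses; everything about the architecture is an illustrative assumption.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)

class DeepTransformation(nn.Module):
    """A network maps input x to the parameters of a strictly increasing
    transformation h(y|x); maximum likelihood training follows from the
    change-of-variables formula with a standard-normal base density."""
    def __init__(self, d_in, d=2.0):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_in, 32), nn.ReLU(), nn.Linear(32, 3))
        self.d = d

    def nll(self, x, y):
        a_raw, b, c_raw = self.net(x).unbind(dim=1)
        a = nn.functional.softplus(a_raw)      # slope > 0
        c = nn.functional.softplus(c_raw)      # bump height >= 0
        s = torch.sigmoid(self.d * y)
        h = a * y + b + c * s                  # strictly increasing in y
        dh = a + c * self.d * s * (1 - s)      # dh/dy > 0 by construction
        return (0.5 * h ** 2 + 0.5 * math.log(2 * math.pi) - torch.log(dh)).mean()

model = DeepTransformation(d_in=1)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
x = torch.randn(256, 1)
y = (0.5 * x.squeeze(1) + 0.3 * torch.randn(256)) ** 3   # skewed conditional outcome
for _ in range(200):
    opt.zero_grad(); loss = model.nll(x, y); loss.backward(); opt.step()
print(float(loss))
```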
PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization
Title | PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization |
Authors | Shunsuke Saito, Tomas Simon, Jason Saragih, Hanbyul Joo |
Abstract | Recent advances in image-based 3D human shape estimation have been driven by the significant improvement in representation power afforded by deep neural networks. Although current approaches have demonstrated the potential in real world settings, they still fail to produce reconstructions with the level of detail often present in the input images. We argue that this limitation stems primarily from two conflicting requirements; accurate predictions require large context, but precise predictions require high resolution. Due to memory limitations in current hardware, previous approaches tend to take low resolution images as input to cover large spatial context, and produce less precise (or low resolution) 3D estimates as a result. We address this limitation by formulating a multi-level architecture that is end-to-end trainable. A coarse level observes the whole image at lower resolution and focuses on holistic reasoning. This provides context to a fine level which estimates highly detailed geometry by observing higher-resolution images. We demonstrate that our approach significantly outperforms existing state-of-the-art techniques on single image human shape reconstruction by fully leveraging 1k-resolution input images. |
Tasks | |
Published | 2020-04-01 |
URL | https://arxiv.org/abs/2004.00452v1 |
https://arxiv.org/pdf/2004.00452v1.pdf | |
PWC | https://paperswithcode.com/paper/pifuhd-multi-level-pixel-aligned-implicit |
Repo | |
Framework | |
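
The coarse-to-fine, pixel-aligned structure described above can be sketched compactly: each 3D query point is projected into the image, image features are sampled at that pixel, and an MLP predicts occupancy, with the fine level consuming both high-resolution features and the coarse level's context embedding. The toy encoders, layer sizes, and image resolutions below are illustrative stand-ins for the paper's much larger backbones.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

def sample_pixel_aligned(feat, xy):
    """Sample per-point image features at projected 2D locations xy in [-1,1].
    feat: (B, C, H, W); xy: (B, N, 2) -> returns (B, N, C)."""
    s = F.grid_sample(feat, xy.unsqueeze(2), align_corners=True)  # (B, C, N, 1)
    return s.squeeze(-1).transpose(1, 2)

coarse_enc = nn.Conv2d(3, 16, 3, padding=1)   # toy coarse image encoder
fine_enc = nn.Conv2d(3, 16, 3, padding=1)     # toy fine image encoder
coarse_mlp = nn.Sequential(nn.Linear(16 + 1, 32), nn.ReLU(), nn.Linear(32, 32))
fine_mlp = nn.Sequential(nn.Linear(16 + 32, 32), nn.ReLU(), nn.Linear(32, 1))

def occupancy(img_lo, img_hi, xy, z):
    """Coarse level sees the low-res image for holistic context; the fine
    level adds high-res pixel-aligned features and predicts occupancy."""
    f_lo = sample_pixel_aligned(coarse_enc(img_lo), xy)
    ctx = coarse_mlp(torch.cat([f_lo, z], dim=-1))        # context embedding
    f_hi = sample_pixel_aligned(fine_enc(img_hi), xy)
    return torch.sigmoid(fine_mlp(torch.cat([f_hi, ctx], dim=-1)))

xy = torch.rand(1, 100, 2) * 2 - 1      # projected query points
z = torch.rand(1, 100, 1)               # depth along each ray
print(occupancy(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 256, 256), xy, z).shape)
```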