Paper Group ANR 395
On the Unreasonable Effectiveness of Knowledge Distillation: Analysis in the Kernel Regime. Deep Molecular Programming: A Natural Implementation of Binary-Weight ReLU Neural Networks. The Problem with Metrics is a Fundamental Problem for AI. Detection of FLOSS version release events from Stack Overflow message data. Low Latency ASR for Simultaneous …
On the Unreasonable Effectiveness of Knowledge Distillation: Analysis in the Kernel Regime
Title | On the Unreasonable Effectiveness of Knowledge Distillation: Analysis in the Kernel Regime |
Authors | Arman Rahbar, Ashkan Panahi, Chiranjib Bhattacharyya, Devdatt Dubhashi, Morteza Haghir Chehreghani |
Abstract | Knowledge distillation (KD), i.e. one classifier being trained on the outputs of another classifier, is an empirically very successful technique for knowledge transfer between classifiers. It has even been observed that classifiers learn much faster and more reliably if trained with the outputs of another classifier as soft labels, instead of from ground truth data. However, there has been little or no theoretical analysis of this phenomenon. We provide the first theoretical analysis of KD in the setting of extremely wide two-layer non-linear networks, in the model and regime of (Arora et al., 2019; Du & Hu, 2019; Cao & Gu, 2019). We prove results on what the student network learns and on the rate of convergence for the student network. Intriguingly, we also confirm the lottery ticket hypothesis (Frankle & Carbin, 2019) in this model. To prove our results, we extend the repertoire of techniques from linear systems dynamics. We give corresponding experimental analysis that validates the theoretical results and yields additional insights. |
Tasks | Transfer Learning |
Published | 2020-03-30 |
URL | https://arxiv.org/abs/2003.13438v1 |
https://arxiv.org/pdf/2003.13438v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-unreasonable-effectiveness-of |
Repo | |
Framework | |
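A minimal sketch of the knowledge-distillation setup analyzed in the paper above: a student network trained on the soft outputs of a fixed teacher instead of hard labels. The wide two-layer architecture, temperature, and loss weighting are illustrative assumptions, not the paper's exact experimental configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_step(student, teacher, x, optimizer, temperature=4.0):
    """One gradient step of the student on the teacher's soft labels."""
    with torch.no_grad():
        teacher_logits = teacher(x)                      # soft targets, no gradient
    student_logits = student(x)
    # KL divergence between softened distributions (standard Hinton-style KD loss).
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example: an extremely wide two-layer student, mirroring the regime studied in the paper.
student = nn.Sequential(nn.Linear(784, 10000), nn.ReLU(), nn.Linear(10000, 10))
teacher = nn.Sequential(nn.Linear(784, 10000), nn.ReLU(), nn.Linear(10000, 10))
opt = torch.optim.SGD(student.parameters(), lr=0.1)
print(distillation_step(student, teacher, torch.randn(32, 784), opt))
```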
Deep Molecular Programming: A Natural Implementation of Binary-Weight ReLU Neural Networks
Title | Deep Molecular Programming: A Natural Implementation of Binary-Weight ReLU Neural Networks |
Authors | Marko Vasic, Cameron Chalk, Sarfraz Khurshid, David Soloveichik |
Abstract | Embedding computation in molecular contexts incompatible with traditional electronics is expected to have wide-ranging impact in synthetic biology, medicine, nanofabrication and other fields. A key remaining challenge lies in developing programming paradigms for molecular computation that are well-aligned with the underlying chemical hardware and do not attempt to shoehorn ill-fitting electronics paradigms. We discover a surprisingly tight connection between a popular class of neural networks (binary-weight ReLU, aka BinaryConnect) and a class of coupled chemical reactions that are absolutely robust to reaction rates. The robustness of rate-independent chemical computation makes it a promising target for bioengineering implementation. We show how a BinaryConnect neural network trained in silico using well-founded deep learning optimization techniques can be compiled to an equivalent chemical reaction network, providing a novel molecular programming paradigm. We illustrate such translation on the paradigmatic IRIS and MNIST datasets. Toward intended applications of chemical computation, we further use our method to generate a CRN that can discriminate between different virus types based on gene expression levels. Our work sets the stage for rich knowledge transfer between the neural network and molecular programming communities. |
Tasks | Transfer Learning |
Published | 2020-03-30 |
URL | https://arxiv.org/abs/2003.13720v1 |
https://arxiv.org/pdf/2003.13720v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-molecular-programming-a-natural |
Repo | |
Framework | |
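A minimal sketch of a binary-weight ReLU (BinaryConnect-style) forward pass, the network class that the paper above compiles into rate-independent chemical reaction networks. Layer sizes and the sign-thresholding rule are illustrative assumptions; the compilation to a CRN is not shown.

```python
import numpy as np

def binarize(w_real):
    """BinaryConnect-style binarization: real-valued latent weights -> {-1, +1}."""
    return np.where(w_real >= 0.0, 1.0, -1.0)

def binary_relu_forward(x, real_weights):
    """Forward pass using only binarized weights and ReLU activations."""
    a = x
    for w_real in real_weights[:-1]:
        a = np.maximum(0.0, a @ binarize(w_real))   # hidden layers: ReLU(x @ W_b)
    return a @ binarize(real_weights[-1])            # linear output layer

rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 8)), rng.standard_normal((8, 3))]  # e.g. an IRIS-sized net
logits = binary_relu_forward(rng.standard_normal((5, 4)), weights)
print(logits.shape)  # (5, 3)
```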
The Problem with Metrics is a Fundamental Problem for AI
Title | The Problem with Metrics is a Fundamental Problem for AI |
Authors | Rachel Thomas, David Uminsky |
Abstract | Optimizing a given metric is a central aspect of most current AI approaches, yet overemphasizing metrics leads to manipulation, gaming, a myopic focus on short-term goals, and other unexpected negative consequences. This poses a fundamental contradiction for AI development. Through a series of real-world case studies, we look at various aspects of where metrics go wrong in practice and aspects of how our online environment and current business practices are exacerbating these failures. Finally, we propose a framework towards mitigating the harms caused by overemphasis of metrics within AI by: (1) using a slate of metrics to get a fuller and more nuanced picture, (2) combining metrics with qualitative accounts, and (3) involving a range of stakeholders, including those who will be most impacted. |
Tasks | |
Published | 2020-02-20 |
URL | https://arxiv.org/abs/2002.08512v1 |
https://arxiv.org/pdf/2002.08512v1.pdf | |
PWC | https://paperswithcode.com/paper/the-problem-with-metrics-is-a-fundamental |
Repo | |
Framework | |
Detection of FLOSS version release events from Stack Overflow message data
Title | Detection of FLOSS version release events from Stack Overflow message data |
Authors | A. Sokolovsky, T. Gross, J. Bacardit |
Abstract | Topic Detection and Tracking (TDT) is a very active research area within text mining, generally applied to news feeds and Twitter datasets, where topics and events are detected. The notion of “event” is broad, but typically it applies to occurrences that can be detected from a single post or message. Little attention has been paid to what we call “micro-events”, which, due to their nature, cannot be detected from a single piece of textual information. This study investigates micro-event detection on textual data using a sample of messages from the Stack Overflow Q&A platform in order to detect Free/Libre Open Source Software (FLOSS) version releases. Micro-events are detected using logistic regression models with step-wise forward regression feature selection from a set of LDA topics and sentiment analysis features. We perform a detailed statistical analysis of the models, including influential cases, variance inflation factors, validation of the linearity assumption, pseudo R squared measures and the no-information rate. Finally, in order to understand the detection limits and improve the performance of the estimators, we suggest a method for generating synthetic micro-event datasets and use them to identify the micro-event detectability thresholds. |
Tasks | Feature Selection, Sentiment Analysis |
Published | 2020-03-30 |
URL | https://arxiv.org/abs/2003.14257v1 |
https://arxiv.org/pdf/2003.14257v1.pdf | |
PWC | https://paperswithcode.com/paper/detection-of-floss-version-release-events |
Repo | |
Framework | |
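A minimal sketch of the modelling step described above: logistic regression with step-wise forward feature selection over LDA-topic and sentiment features. The feature matrix and labels are hypothetical placeholders, and scikit-learn's `SequentialFeatureSelector` is used as a stand-in for the paper's step-wise forward procedure.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.random((500, 25))          # e.g. 20 LDA topic proportions + 5 sentiment scores (assumed)
y = rng.integers(0, 2, size=500)   # 1 = message window overlaps a version release (assumed)

base = LogisticRegression(max_iter=1000)
selector = SequentialFeatureSelector(base, n_features_to_select=5, direction="forward", cv=5)
selector.fit(X, y)

# Refit the detector on the selected features only.
model = LogisticRegression(max_iter=1000).fit(X[:, selector.get_support()], y)
print("selected feature indices:", np.flatnonzero(selector.get_support()))
```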
Low Latency ASR for Simultaneous Speech Translation
Title | Low Latency ASR for Simultaneous Speech Translation |
Authors | Thai Son Nguyen, Jan Niehues, Eunah Cho, Thanh-Le Ha, Kevin Kilgour, Markus Muller, Matthias Sperber, Sebastian Stueker, Alex Waibel |
Abstract | User studies have shown that reducing the latency of our simultaneous lecture translation system should be the most important goal. We have therefore worked on several techniques for reducing the latency of both components, the automatic speech recognition and the speech translation module. Since the commonly used commitment latency is not appropriate in our case of continuous stream decoding, we focused on word latency. We used it to analyze the performance of our current system and to identify opportunities for improvement. In order to minimize the latency, we combined run-on decoding with a technique for identifying stable partial hypotheses during stream decoding and a protocol for dynamic output update that allows us to revise the most recent parts of the transcription. This combination reduces the latency at the word level, where words are final and will never be updated again, from 18.1s to 1.1s without sacrificing performance in terms of word error rate. |
Tasks | Speech Recognition |
Published | 2020-03-22 |
URL | https://arxiv.org/abs/2003.09891v1 |
https://arxiv.org/pdf/2003.09891v1.pdf | |
PWC | https://paperswithcode.com/paper/low-latency-asr-for-simultaneous-speech |
Repo | |
Framework | |
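A minimal sketch of the "stable partial hypothesis" idea described above: during run-on decoding, only the prefix that has stopped changing between successive partial hypotheses is committed, while the most recent words stay revisable. The agreement rule used here (longest common prefix of consecutive hypotheses) is an illustrative assumption, not the paper's exact stability criterion.

```python
def stable_prefix(prev_hyp, curr_hyp):
    """Return the longest common word prefix of two partial hypotheses."""
    stable = []
    for prev_word, curr_word in zip(prev_hyp, curr_hyp):
        if prev_word != curr_word:
            break
        stable.append(curr_word)
    return stable

def update_output(committed, prev_hyp, curr_hyp):
    """Dynamic output update: commit newly stable words, keep the tail revisable."""
    stable = stable_prefix(prev_hyp, curr_hyp)
    newly_committed = stable[len(committed):]
    revisable_tail = curr_hyp[len(stable):]
    return committed + newly_committed, revisable_tail

committed, tail, prev = [], [], []
for hyp in [["we"], ["we", "are"], ["we", "were", "told"], ["we", "were", "told", "that"]]:
    committed, tail = update_output(committed, prev, hyp)
    prev = hyp
print(committed, "| revisable:", tail)
```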
Experience Selection Using Dynamics Similarity for Efficient Multi-Source Transfer Learning Between Robots
Title | Experience Selection Using Dynamics Similarity for Efficient Multi-Source Transfer Learning Between Robots |
Authors | Michael J. Sorocky, Siqi Zhou, Angela P. Schoellig |
Abstract | In the robotics literature, different knowledge transfer approaches have been proposed to leverage the experience from a source task or robot – real or virtual – to accelerate the learning process on a new task or robot. A commonly made but infrequently examined assumption is that incorporating experience from a source task or robot will be beneficial. In practice, inappropriate knowledge transfer can result in negative transfer or unsafe behaviour. In this work, inspired by a system gap metric from robust control theory, the $\nu$-gap, we present a data-efficient algorithm for estimating the similarity between pairs of robot systems. In a multi-source inter-robot transfer learning setup, we show that this similarity metric allows us to predict relative transfer performance and thus informatively select experiences from a source robot before knowledge transfer. We demonstrate our approach with quadrotor experiments, where we transfer an inverse dynamics model from a real or virtual source quadrotor to enhance the tracking performance of a target quadrotor on arbitrary hand-drawn trajectories. We show that selecting experiences based on the proposed similarity metric effectively facilitates the learning of the target quadrotor, improving performance by 62% compared to a poorly selected experience. |
Tasks | Transfer Learning |
Published | 2020-03-29 |
URL | https://arxiv.org/abs/2003.13150v1 |
https://arxiv.org/pdf/2003.13150v1.pdf | |
PWC | https://paperswithcode.com/paper/experience-selection-using-dynamics |
Repo | |
Framework | |
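A minimal sketch of a frequency-wise gap computation in the spirit of the ν-gap metric mentioned above. For SISO systems the pointwise chordal distance between two frequency responses is |P1 − P2| / (√(1 + |P1|²) · √(1 + |P2|²)); its supremum over frequency equals the ν-gap only when an additional winding-number condition holds, which is not checked here. The example transfer functions are hypothetical, so this is a rough similarity proxy rather than the paper's full experience-selection method.

```python
import numpy as np
from scipy import signal

def pointwise_gap(P1, P2, omegas):
    """Chordal distance between two SISO frequency responses on a frequency grid."""
    _, r1 = signal.freqresp(P1, omegas)
    _, r2 = signal.freqresp(P2, omegas)
    return np.abs(r1 - r2) / (np.sqrt(1 + np.abs(r1) ** 2) * np.sqrt(1 + np.abs(r2) ** 2))

# Hypothetical first-order models of a target robot and two candidate source robots.
target  = signal.TransferFunction([1.0], [0.5, 1.0])
sources = {"source_A": signal.TransferFunction([1.0], [0.6, 1.0]),
           "source_B": signal.TransferFunction([1.0], [2.0, 1.0])}

omegas = np.logspace(-2, 2, 400)
gaps = {name: pointwise_gap(target, src, omegas).max() for name, src in sources.items()}
best = min(gaps, key=gaps.get)   # select experience from the most similar source
print(gaps, "-> transfer from", best)
```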
A Real-Time Deep Network for Crowd Counting
Title | A Real-Time Deep Network for Crowd Counting |
Authors | Xiaowen Shi, Xin Li, Caili Wu, Shuchen Kong, Jing Yang, Liang He |
Abstract | Automatic analysis of highly crowded people has attracted extensive attention in computer vision research. Previous approaches to crowd counting have already achieved promising performance across various benchmarks. However, to deal with real situations, we want the model to run as fast as possible while maintaining accuracy. In this paper, we propose a compact convolutional neural network for crowd counting which learns a more efficient model with a small number of parameters. With three parallel filters executing the convolutional operation on the input image simultaneously at the front of the network, our model can achieve nearly real-time speed and save computing resources. Experiments on two benchmarks show that our proposed method not only strikes a balance between performance and efficiency, making it more suitable for real scenes, but is also superior to existing light-weight models in speed. |
Tasks | Crowd Counting |
Published | 2020-02-16 |
URL | https://arxiv.org/abs/2002.06515v1 |
https://arxiv.org/pdf/2002.06515v1.pdf | |
PWC | https://paperswithcode.com/paper/a-real-time-deep-network-for-crowd-counting |
Repo | |
Framework | |
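A minimal sketch of the front end described above: three parallel convolutional branches with different receptive fields applied to the same input image, followed by a lightweight trunk that regresses a density map. Kernel sizes and channel counts are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ParallelFrontEnd(nn.Module):
    def __init__(self):
        super().__init__()
        # Three branches with small/medium/large kernels, run on the same input.
        self.branch_s = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.branch_m = nn.Conv2d(3, 8, kernel_size=5, padding=2)
        self.branch_l = nn.Conv2d(3, 8, kernel_size=7, padding=3)
        self.trunk = nn.Sequential(
            nn.ReLU(inplace=True),
            nn.Conv2d(24, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(16, 1, kernel_size=1),   # 1-channel density map
        )

    def forward(self, x):
        feats = torch.cat([self.branch_s(x), self.branch_m(x), self.branch_l(x)], dim=1)
        return self.trunk(feats)

model = ParallelFrontEnd()
density = model(torch.randn(1, 3, 256, 256))
print(density.shape, "estimated count:", density.sum().item())
```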
Robustness Analysis of the Data-Selective Volterra NLMS Algorithm
Title | Robustness Analysis of the Data-Selective Volterra NLMS Algorithm |
Authors | Javad Sharafi, Abbas Maarefparvar |
Abstract | Recently, data-selective adaptive Volterra filters have been proposed; however, up to now, their behavior has been studied only through numerical simulations, without any theoretical analysis. Therefore, in this paper, we analyze the robustness (in the sense of l2-stability) of the data-selective Volterra normalized least-mean-square (DS-VNLMS) algorithm. First, we study the local robustness of this algorithm at any iteration, then we propose a global bound for the error/discrepancy in the coefficient vector. Also, we demonstrate that the DS-VNLMS algorithm improves the parameter estimation for the majority of the iterations in which an update is implemented. Moreover, we prove that if the noise bound is known, we can set up the DS-VNLMS so that it never degrades the estimate. The simulation results corroborate the validity of the presented analysis and demonstrate that the DS-VNLMS algorithm is robust against noise, no matter how its parameters are chosen. |
Tasks | |
Published | 2020-03-25 |
URL | https://arxiv.org/abs/2003.11514v1 |
https://arxiv.org/pdf/2003.11514v1.pdf | |
PWC | https://paperswithcode.com/paper/robustness-analysis-of-the-data-selective |
Repo | |
Framework | |
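A minimal sketch of a data-selective normalized LMS update of the kind analyzed above: the coefficient vector is updated only when the error magnitude exceeds a threshold tied to the noise bound, so uninformative samples are skipped. A second-order Volterra input expansion is used for illustration; the threshold rule and step size are assumptions, not the paper's exact DS-VNLMS parameterization.

```python
import numpy as np

def volterra_input(x_window):
    """Second-order Volterra regressor: linear taps plus their pairwise products."""
    quad = np.outer(x_window, x_window)[np.triu_indices(len(x_window))]
    return np.concatenate([x_window, quad])

def ds_vnlms(x, d, n_taps=3, mu=0.5, noise_bound=0.1, eps=1e-8):
    n_coeffs = n_taps + n_taps * (n_taps + 1) // 2
    w = np.zeros(n_coeffs)
    updates = 0
    for k in range(n_taps - 1, len(x)):
        u = volterra_input(x[k - n_taps + 1:k + 1])
        e = d[k] - w @ u
        if abs(e) > noise_bound:             # data selection: update only on informative errors
            w += mu * e * u / (u @ u + eps)  # normalized LMS step
            updates += 1
    return w, updates

rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
d = 0.4 * x + 0.2 * np.roll(x, 1) + 0.1 * x ** 2 + 0.05 * rng.standard_normal(2000)
w, updates = ds_vnlms(x, d)
print("coefficients:", np.round(w, 3), "| updates used:", updates)
```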
Surrogate-free machine learning-based organ dose reconstruction for pediatric abdominal radiotherapy
Title | Surrogate-free machine learning-based organ dose reconstruction for pediatric abdominal radiotherapy |
Authors | M. Virgolin, Z. Wang, B. V. Balgobind, I. W. E. M. van Dijk, J. Wiersma, P. S. Kroon, G. O. Janssens, M. van Herk, D. C. Hodgson, L. Zadravec Zaletel, C. R. N. Rasch, A. Bel, P. A. N. Bosman, T. Alderliesten |
Abstract | To study radiotherapy-related adverse effects, detailed dose information (3D distribution) is needed for accurate dose-effect modeling. For childhood cancer survivors who underwent radiotherapy in the pre-CT era, only 2D radiographs were acquired, thus 3D dose distributions must be reconstructed. State-of-the-art methods achieve this by using 3D surrogate anatomies. These can however lack personalization and lead to coarse reconstructions. We present and validate a surrogate-free dose reconstruction method based on Machine Learning (ML). Abdominal planning CTs (n=142) of recently-treated childhood cancer patients were gathered, their organs at risk were segmented, and 300 artificial Wilms’ tumor plans were sampled automatically. Each artificial plan was automatically emulated on the 142 CTs, resulting in 42,600 3D dose distributions from which dose-volume metrics were derived. Anatomical features were extracted from digitally reconstructed radiographs simulated from the CTs to resemble historical radiographs. Further, patient and radiotherapy plan features typically available from historical treatment records were collected. An evolutionary ML algorithm was then used to link features to dose-volume metrics. Besides 5-fold cross validation, a further evaluation was done on an independent dataset of five CTs each associated with two clinical plans. Cross-validation resulted in mean absolute errors $\leq$0.6 Gy for organs completely inside or outside the field. For organs positioned at the edge of the field, mean absolute errors $\leq$1.7 Gy for $D_{mean}$, $\leq$2.9 Gy for $D_{2cc}$, and $\leq$13% for $V_{5Gy}$ and $V_{10Gy}$, were obtained, without systematic bias. Similar results were found for the independent dataset. To conclude, our novel organ dose reconstruction method is not only accurate, but also efficient, as the setup of a surrogate is no longer needed. |
Tasks | |
Published | 2020-02-17 |
URL | https://arxiv.org/abs/2002.07161v1 |
https://arxiv.org/pdf/2002.07161v1.pdf | |
PWC | https://paperswithcode.com/paper/surrogate-free-machine-learning-based-organ |
Repo | |
Framework | |
180-degree Outpainting from a Single Image
Title | 180-degree Outpainting from a Single Image |
Authors | Zhenqiang Ying, Alan Bovik |
Abstract | Presenting context images to a viewer’s peripheral vision is one of the most effective techniques to enhance immersive visual experiences. However, most images only present a narrow view, since the field-of-view (FoV) of standard cameras is small. To overcome this limitation, we propose a deep learning approach that learns to predict a 180° panoramic image from a narrow-view image. Specifically, we design a foveated framework that applies different strategies on near-periphery and mid-periphery regions. Two networks are trained separately, and then are employed jointly to sequentially perform narrow-to-90° generation and 90°-to-180° generation. The generated outputs are then fused with their aligned inputs to produce expanded equirectangular images for viewing. Our experimental results show that single-view-to-panoramic image generation using deep learning is both feasible and promising. |
Tasks | Image Generation |
Published | 2020-01-13 |
URL | https://arxiv.org/abs/2001.04568v1 |
https://arxiv.org/pdf/2001.04568v1.pdf | |
PWC | https://paperswithcode.com/paper/180-degree-outpainting-from-a-single-image |
Repo | |
Framework | |
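A minimal sketch of the sequential generate-then-fuse flow described above: a first network expands a narrow view to roughly 90°, a second expands 90° to 180°, and the result is fused with the aligned input so observed pixels are kept. The generator stubs, the center placement, and the mask-based fusion rule are illustrative assumptions about the pipeline shape, not the paper's actual models.

```python
import numpy as np

def fuse(generated, aligned_input, known_mask):
    """Keep observed pixels from the input, fill the rest from the generator output."""
    return np.where(known_mask, aligned_input, generated)

def outpaint_180(narrow_view, gen_to_90, gen_to_180):
    mid = gen_to_90(narrow_view)                          # narrow -> ~90 degree panorama
    wide = gen_to_180(mid)                                # ~90 -> ~180 degree panorama
    aligned = np.zeros_like(wide)
    h, w = narrow_view.shape[:2]
    y0 = (wide.shape[0] - h) // 2
    x0 = (wide.shape[1] - w) // 2
    aligned[y0:y0 + h, x0:x0 + w] = narrow_view           # place the input at the center
    known = np.zeros(wide.shape, dtype=bool)
    known[y0:y0 + h, x0:x0 + w] = True
    return fuse(wide, aligned, known)

# Stub generators that just widen the image, standing in for the two trained networks.
gen_to_90 = lambda img: np.kron(img, np.ones((1, 2, 1)))
gen_to_180 = lambda img: np.kron(img, np.ones((1, 2, 1)))
pano = outpaint_180(np.random.rand(64, 64, 3), gen_to_90, gen_to_180)
print(pano.shape)
```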
A Neural Approach to Ordinal Regression for the Preventive Assessment of Developmental Dyslexia
Title | A Neural Approach to Ordinal Regression for the Preventive Assessment of Developmental Dyslexia |
Authors | F. J. Martinez-Murcia, A. Ortiz, M. Lopez-Zamora, J. L. Luque, A. Giménez |
Abstract | Developmental Dyslexia (DD) is a learning disability related to the acquisition of reading skills that affects about 5% of the population. DD can have an enormous impact on the intellectual and personal development of affected children, so early detection is key to implementing preventive strategies for teaching language. Research has shown that there may be biological underpinnings to DD that affect phoneme processing, and hence these symptoms may be identifiable before reading ability is acquired, allowing for early intervention. In this paper we propose a new methodology to assess the risk of DD before students learn to read. For this purpose, we propose a mixed neural model that calculates risk levels of dyslexia from tests that can be completed at the age of 5 years. Our method first trains an auto-encoder, and then combines the trained encoder with an optimized ordinal regression neural network devised to ensure consistency of predictions. Our experiments show that the system is able to detect unaffected subjects two years before it can assess the risk of DD based mainly on phonological processing, giving a specificity of 0.969 and a correct rate of more than 0.92. In addition, the trained encoder can be used to transform test results into an interpretable subject spatial distribution that facilitates risk assessment and validates the methodology. |
Tasks | |
Published | 2020-02-06 |
URL | https://arxiv.org/abs/2002.02184v1 |
https://arxiv.org/pdf/2002.02184v1.pdf | |
PWC | https://paperswithcode.com/paper/a-neural-approach-to-ordinal-regression-for |
Repo | |
Framework | |
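A minimal sketch of an ordinal-regression output layer that guarantees consistent (monotone) risk-level predictions, in the spirit of the consistency requirement mentioned above. It follows the common cumulative-threshold formulation with a shared score and ordered thresholds; the encoder stub, feature sizes, and threshold scheme are illustrative assumptions, not the paper's exact model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OrdinalHead(nn.Module):
    """P(y > k) = sigmoid(score - theta_k) with ordered thresholds theta_0 < ... < theta_{K-2}."""
    def __init__(self, in_features, n_levels):
        super().__init__()
        self.score = nn.Linear(in_features, 1)
        self.offsets = nn.Parameter(torch.zeros(n_levels - 1))  # positive gaps -> ordered thetas

    def forward(self, h):
        thetas = torch.cumsum(F.softplus(self.offsets), dim=0)
        return torch.sigmoid(self.score(h) - thetas)   # (batch, n_levels - 1), non-increasing in k

encoder = nn.Sequential(nn.Linear(30, 16), nn.ReLU())   # stands in for the trained auto-encoder
head = OrdinalHead(16, n_levels=4)                       # e.g. four dyslexia risk levels (assumed)
probs_exceed = head(encoder(torch.randn(8, 30)))
risk_level = (probs_exceed > 0.5).sum(dim=1)             # consistent ordinal prediction
print(probs_exceed.shape, risk_level)
```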
ShapeVis: High-dimensional Data Visualization at Scale
Title | ShapeVis: High-dimensional Data Visualization at Scale |
Authors | Nupur Kumari, Siddarth R., Akash Rupela, Piyush Gupta, Balaji Krishnamurthy |
Abstract | We present ShapeVis, a scalable visualization technique for point cloud data inspired by topological data analysis. Our method captures the underlying geometric and topological structure of the data in a compressed graphical representation. Much success has been reported for the data visualization technique Mapper, which discretely approximates the Reeb graph of a filter function on the data. However, when using standard dimensionality reduction algorithms as the filter function, Mapper suffers from considerable computational cost. This makes it difficult to scale to high-dimensional data. Our proposed technique relies on finding a subset of points called landmarks along the data manifold to construct a weighted witness graph over it. This graph captures the structural characteristics of the point cloud, and its weights are determined using a finite Markov chain. We further compress this graph by applying induced maps from standard community detection algorithms. Using techniques borrowed from manifold tearing, we prune and reinstate edges in the induced graph based on their modularity to summarize the shape of the data. We empirically demonstrate how our technique captures the structural characteristics of real and synthetic data sets. Further, we compare our approach with Mapper using various filter functions like t-SNE, UMAP and LargeVis, and show that our algorithm scales to millions of data points while preserving the quality of the data visualization. |
Tasks | Community Detection, Dimensionality Reduction, Topological Data Analysis |
Published | 2020-01-15 |
URL | https://arxiv.org/abs/2001.05166v2 |
https://arxiv.org/pdf/2001.05166v2.pdf | |
PWC | https://paperswithcode.com/paper/shapevis-high-dimensional-data-visualization |
Repo | |
Framework | |
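A minimal sketch of the landmark-plus-witness-graph idea described above: a subset of landmark points is chosen (max-min sampling here), every data point "witnesses" its two nearest landmarks, and an edge is drawn between landmark pairs that share witnesses. The sampling rule, the two-nearest-landmark witness rule, and the co-witness edge weights are illustrative assumptions; the paper additionally weights the graph with a finite Markov chain and compresses it via community detection, which is not shown here.

```python
import numpy as np

def maxmin_landmarks(X, n_landmarks, seed=0):
    """Greedy max-min (farthest-point) landmark selection."""
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(X)))]
    dists = np.linalg.norm(X - X[idx[0]], axis=1)
    for _ in range(n_landmarks - 1):
        idx.append(int(dists.argmax()))                        # farthest point from current set
        dists = np.minimum(dists, np.linalg.norm(X - X[idx[-1]], axis=1))
    return np.array(idx)

def witness_graph(X, landmark_idx):
    """Weighted graph over landmarks: edge weight = number of shared witnesses."""
    L = X[landmark_idx]
    edges = {}
    for x in X:
        a, b = np.argsort(np.linalg.norm(L - x, axis=1))[:2]   # two nearest landmarks witness x
        key = tuple(sorted((int(a), int(b))))
        edges[key] = edges.get(key, 0) + 1
    return edges

X = np.random.default_rng(1).random((1000, 10))
landmarks = maxmin_landmarks(X, n_landmarks=20)
print(sorted(witness_graph(X, landmarks).items())[:5])
```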
Controllable Person Image Synthesis with Attribute-Decomposed GAN
Title | Controllable Person Image Synthesis with Attribute-Decomposed GAN |
Authors | Yifang Men, Yiming Mao, Yuning Jiang, Wei-Ying Ma, Zhouhui Lian |
Abstract | This paper introduces the Attribute-Decomposed GAN, a novel generative model for controllable person image synthesis, which can produce realistic person images with desired human attributes (e.g., pose, head, upper clothes and pants) provided in various source inputs. The core idea of the proposed model is to embed human attributes into the latent space as independent codes and thus achieve flexible and continuous control of attributes via mixing and interpolation operations in explicit style representations. Specifically, a new architecture consisting of two encoding pathways with style block connections is proposed to decompose the original hard mapping into multiple more accessible subtasks. In the source pathway, we further extract component layouts with an off-the-shelf human parser and feed them into a shared global texture encoder for decomposed latent codes. This strategy allows for the synthesis of more realistic output images and automatic separation of un-annotated attributes. Experimental results demonstrate the proposed method’s superiority over the state of the art in pose transfer and its effectiveness in the brand-new task of component attribute transfer. |
Tasks | Continuous Control, Image Generation, Pose Transfer |
Published | 2020-03-27 |
URL | https://arxiv.org/abs/2003.12267v1 |
https://arxiv.org/pdf/2003.12267v1.pdf | |
PWC | https://paperswithcode.com/paper/controllable-person-image-synthesis-with |
Repo | |
Framework | |
Online Hierarchical Forecasting for Power Consumption Data
Title | Online Hierarchical Forecasting for Power Consumption Data |
Authors | Margaux Brégère, Malo Huard |
Abstract | We study the forecasting of the power consumption of a population of households and of subpopulations thereof. These subpopulations are built according to location, to exogenous information and/or to profiles we determined from historical household consumption time series. Thus, we aim to forecast the electricity consumption time series at several levels of household aggregation. These time series are linked through summation constraints which induce a hierarchy. Our approach consists of three steps: feature generation, aggregation and projection. First (feature generation step), we build, for each considered group of households, benchmark forecasts (called features), using random forests or generalized additive models. Second (aggregation step), aggregation algorithms, run in parallel, aggregate these forecasts and provide new predictions. Finally (projection step), we use the summation constraints induced by the underlying hierarchy of the time series to reconcile the forecasts by projecting them onto a well-chosen linear subspace. We provide some theoretical guarantees on the average prediction error of this methodology, through the minimization of a quantity called regret. We also test our approach on household power consumption data collected in Great Britain by multiple energy providers in the context of the Energy Demand Research Project. We build and compare various population segmentations to evaluate the performance of our approach. |
Tasks | Time Series |
Published | 2020-03-01 |
URL | https://arxiv.org/abs/2003.00585v1 |
https://arxiv.org/pdf/2003.00585v1.pdf | |
PWC | https://paperswithcode.com/paper/online-hierarchical-forecasting-for-power |
Repo | |
Framework | |
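A minimal sketch of the projection step described above: given forecasts for every node of the hierarchy, they are projected onto the linear subspace in which each aggregate forecast equals the sum of its children's forecasts. The toy hierarchy (one total and three subpopulations) and the orthogonal projection are illustrative assumptions about how the summation constraints are enforced.

```python
import numpy as np

def reconcile(forecasts, C):
    """Orthogonal projection of `forecasts` onto the subspace {y : C @ y = 0}."""
    correction = C.T @ np.linalg.solve(C @ C.T, C @ forecasts)
    return forecasts - correction

# Hierarchy: node 0 is the total, nodes 1-3 are subpopulations, so y0 - y1 - y2 - y3 = 0.
C = np.array([[1.0, -1.0, -1.0, -1.0]])
raw = np.array([10.0, 3.0, 4.0, 2.0])       # incoherent: parts sum to 9, total says 10
coherent = reconcile(raw, C)
print(coherent, "sum check:", coherent[0] - coherent[1:].sum())
```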
Multiview Chirality
Title | Multiview Chirality |
Authors | Sameer Agarwal, Andrew Pryhuber, Rainer Sinn, Rekha R. Thomas |
Abstract | Given an arrangement of cameras $\mathcal{A} = \{A_1,\dots, A_m\}$, the chiral domain of $\mathcal{A}$ is the subset of $\mathbb{P}^3$ that lies in front of it. It is a generalization of the classical definition of chirality. We give an algebraic description of this set and use it to generalize Hartley’s theory of chiral reconstruction to $m \ge 2$ views and to derive a chiral version of Triggs’ Joint Image. |
Tasks | |
Published | 2020-03-19 |
URL | https://arxiv.org/abs/2003.09265v1 |
https://arxiv.org/pdf/2003.09265v1.pdf | |
PWC | https://paperswithcode.com/paper/multiview-chirality |
Repo | |
Framework | |
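A minimal sketch of the classical chirality (cheirality) test that the paper above generalizes: a scene point lies in front of a finite camera when its projective depth is positive. The depth formula below follows the standard textbook form for $P = [M \mid p_4]$ and a homogeneous point $X = (X_1, X_2, X_3, T)$; the example camera and points are hypothetical.

```python
import numpy as np

def depth(P, X):
    """Projective depth of homogeneous point X w.r.t. camera P (positive = in front)."""
    M = P[:, :3]
    w = (P @ X)[2]
    return np.sign(np.linalg.det(M)) * w / (X[3] * np.linalg.norm(M[2, :]))

def in_front(P, X):
    return depth(P, X) > 0

P = np.hstack([np.eye(3), np.zeros((3, 1))])          # canonical camera [I | 0]
print(in_front(P, np.array([0.0, 0.0, 5.0, 1.0])),    # point 5 units ahead -> True
      in_front(P, np.array([0.0, 0.0, -5.0, 1.0])))   # point behind the camera -> False
```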