Paper Group ANR 10
Point-of-Care Diabetic Retinopathy Diagnosis: A Standalone Mobile Application Approach
Title | Point-of-Care Diabetic Retinopathy Diagnosis: A Standalone Mobile Application Approach |
Authors | Misgina Tsighe Hagos |
Abstract | Although deep learning research and applications have grown rapidly over the past decade, they have shown limitations in healthcare applications and in their reach to people in remote areas. One of the challenges of incorporating deep learning in medical data classification or prediction is the shortage of annotated training data in the healthcare industry. Privacy issues around medical data sharing and limited patient population sizes are among the reasons for this shortage of training data. Methods to exploit deep learning applications in healthcare have been proposed and implemented in this dissertation. Traditional diagnosis of diabetic retinopathy requires trained ophthalmologists and expensive imaging equipment to reach healthcare centres in order to provide facilities for treatment of preventable blindness. Diabetic people residing in remote areas with a shortage of healthcare services and ophthalmologists usually fail to get periodic diagnosis of diabetic retinopathy, and thereby face the risk of vision loss or impairment. Deep learning and mobile application development have been integrated in this dissertation to provide an easy-to-use, point-of-care, smartphone-based diagnosis of diabetic retinopathy. To address the shortage of healthcare centres and trained ophthalmologists, the standalone diagnostic service was built so that it can be operated by a non-expert without an internet connection. This approach could be transferred to other areas of medical image classification. |
Tasks | Image Classification |
Published | 2020-01-26 |
URL | https://arxiv.org/abs/2002.04066v1, https://arxiv.org/pdf/2002.04066v1.pdf |
PWC | https://paperswithcode.com/paper/point-of-care-diabetic-retinopathy-diagnosis |
Repo | |
Framework | |
Ballooning Multi-Armed Bandits
Title | Ballooning Multi-Armed Bandits |
Authors | Ganesh Ghalme, Swapnil Dhamal, Shweta Jain, Sujit Gujar, Y. Narahari |
Abstract | In this paper, we introduce Ballooning Multi-Armed Bandits (BL-MAB), a novel extension to the classical stochastic MAB model. In the BL-MAB model, the set of available arms grows (or balloons) over time. In contrast to the classical MAB setting, where the regret is computed with respect to the best arm overall, the regret in a BL-MAB setting is computed with respect to the best available arm at each time. We first observe that the existing MAB algorithms are not regret-optimal for the BL-MAB model. We show that if the best arm is equally likely to arrive at any time, a sub-linear regret cannot be achieved, irrespective of the arrival of other arms. We further show that if the best arm is more likely to arrive in the early rounds, one can achieve sub-linear regret. Our proposed algorithm determines (1) the fraction of the time horizon for which the newly arriving arms should be explored and (2) the sequence of arm pulls in the exploitation phase from among the explored arms. Making reasonable assumptions on the arrival distribution of the best arm in terms of the thinness of the distribution’s tail, we prove that the proposed algorithm achieves sub-linear instance-independent regret. We further quantify the explicit dependence of regret on the arrival distribution parameters. We reinforce our theoretical findings with extensive simulation results. |
Tasks | Multi-Armed Bandits |
Published | 2020-01-24 |
URL | https://arxiv.org/abs/2001.10055v1, https://arxiv.org/pdf/2001.10055v1.pdf |
PWC | https://paperswithcode.com/paper/ballooning-multi-armed-bandits |
Repo | |
Framework | |
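The regret notion described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' algorithm: the function name `ballooning_regret`, the `arrival_round` encoding, and the example numbers are all assumptions made for this sketch. It only computes expected regret against the best arm available in each round.

```python
from typing import Sequence


def ballooning_regret(
    means: Sequence[float],
    arrival_round: Sequence[int],
    pulls: Sequence[int],
) -> float:
    """Expected regret measured against the best arm *available* in each round.

    means[i]         -- expected reward of arm i (hypothetical input)
    arrival_round[i] -- first round (0-based) in which arm i can be pulled
    pulls[t]         -- index of the arm pulled in round t
    """
    regret = 0.0
    for t, pulled in enumerate(pulls):
        if arrival_round[pulled] > t:
            raise ValueError(f"arm {pulled} is not available in round {t}")
        # Best mean among the arms that have arrived by round t.
        best_available = max(m for m, a in zip(means, arrival_round) if a <= t)
        regret += best_available - means[pulled]
    return regret


# Three arms; the best one (index 1) only arrives in round 2.
means = [0.5, 0.9, 0.7]
arrival_round = [0, 2, 1]
pulls = [0, 0, 2, 1]
total = ballooning_regret(means, arrival_round, pulls)
```

In this example rounds 0 and 1 are not charged against arm 1, because it has not arrived yet; a classical regret definition would compare every round with the best arm overall.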
Workshop Report: Detection and Classification in Marine Bioacoustics with Deep Learning
Title | Workshop Report: Detection and Classification in Marine Bioacoustics with Deep Learning |
Authors | Fabio Frazao, Bruno Padovese, Oliver S. Kirsebom |
Abstract | On 21-22 November 2019, about 30 researchers gathered in Victoria, BC, Canada, for the workshop “Detection and Classification in Marine Bioacoustics with Deep Learning” organized by MERIDIAN and hosted by Ocean Networks Canada. The workshop was attended by marine biologists, data scientists, and computer scientists coming from both Canadian coasts and the US and representing a wide spectrum of research organizations including universities, government (Fisheries and Oceans Canada, National Oceanic and Atmospheric Administration), industry (JASCO Applied Sciences, Google, Axiom Data Science), and not-for-profits (Orcasound, OrcaLab). Consisting of a mix of oral presentations, open discussion sessions, and hands-on tutorials, the workshop program offered a rare opportunity for specialists from distinctly different domains to engage in conversation about deep learning and its promising potential for the development of detection and classification algorithms in underwater acoustics. In this workshop report, we summarize key points from the presentations and discussion sessions. |
Tasks | |
Published | 2020-02-18 |
URL | https://arxiv.org/abs/2002.08249v1, https://arxiv.org/pdf/2002.08249v1.pdf |
PWC | https://paperswithcode.com/paper/workshop-report-detection-and-classification |
Repo | |
Framework | |
Addressing the Memory Bottleneck in AI Model Training
Title | Addressing the Memory Bottleneck in AI Model Training |
Authors | David Ojika, Bhavesh Patel, G. Anthony Reina, Trent Boyer, Chad Martin, Prashant Shah |
Abstract | Using medical imaging as a case study, we demonstrate how Intel-optimized TensorFlow on an x86-based server equipped with 2nd Generation Intel Xeon Scalable Processors and large system memory allows for the training of memory-intensive AI/deep-learning models in a scale-up server configuration. We believe our work represents the first training of a deep neural network with a large memory footprint (~1 TB) on a single-node server. We recommend this configuration to scientists and researchers who wish to develop large, state-of-the-art AI models but are currently limited by memory. |
Tasks | |
Published | 2020-03-11 |
URL | https://arxiv.org/abs/2003.08732v1, https://arxiv.org/pdf/2003.08732v1.pdf |
PWC | https://paperswithcode.com/paper/addressing-the-memory-bottleneck-in-ai-model |
Repo | |
Framework | |
Pose Augmentation: Class-agnostic Object Pose Transformation for Object Recognition
Title | Pose Augmentation: Class-agnostic Object Pose Transformation for Object Recognition |
Authors | Yunhao Ge, Jiaping Zhao, Laurent Itti |
Abstract | Object pose increases interclass object variance which makes object recognition from 2D images harder. To render a classifier robust to pose variations, most deep neural networks try to eliminate the influence of pose by using large datasets with many poses for each class. Here, we propose a different approach: a class-agnostic object pose transformation network (OPT-Net) can transform an image along 3D yaw and pitch axes to synthesize additional poses continuously. Synthesized images lead to better training of an object classifier. We design a novel eliminate-add structure to explicitly disentangle pose from object identity: first eliminate pose information of the input image and then add target pose information (regularized as continuous variables) to synthesize any target pose. We trained OPT-Net on images of toy vehicles shot on a turntable from the iLab-20M dataset. After training on unbalanced discrete poses (5 classes with 6 poses per object instance, plus 5 classes with only 2 poses), we show that OPT-Net can synthesize balanced continuous new poses along yaw and pitch axes with high quality. Training a ResNet-18 classifier with original plus synthesized poses improves mAP accuracy by 9% over training on original poses only. Further, the pre-trained OPT-Net can generalize to new object classes, which we demonstrate on both iLab-20M and RGB-D. We also show that the learned features can generalize to ImageNet. |
Tasks | Object Recognition |
Published | 2020-03-19 |
URL | https://arxiv.org/abs/2003.08526v1, https://arxiv.org/pdf/2003.08526v1.pdf |
PWC | https://paperswithcode.com/paper/pose-augmentation-class-agnostic-object-pose |
Repo | |
Framework | |
Fabric Surface Characterization: Assessment of Deep Learning-based Texture Representations Using a Challenging Dataset
Title | Fabric Surface Characterization: Assessment of Deep Learning-based Texture Representations Using a Challenging Dataset |
Authors | Yuting Hu, Zhiling Long, Anirudha Sundaresan, Motaz Alfarraj, Ghassan AlRegib, Sungmee Park, Sundaresan Jayaraman |
Abstract | Tactile sensing or fabric hand plays a critical role in an individual’s decision to buy a certain fabric from the range of available fabrics for a desired application. Therefore, textile and clothing manufacturers have long been in search of an objective method for assessing fabric hand, which can then be used to engineer fabrics with a desired hand. Recognizing textures and materials in real-world images has played an important role in object recognition and scene understanding. In this paper, we explore how to computationally characterize apparent or latent properties (e.g., surface smoothness) of materials, i.e., computational material surface characterization, which moves a step further beyond material recognition. We formulate the problem as a very fine-grained texture classification problem, and study how deep learning-based texture representation techniques can help tackle the task. We introduce a new, large-scale challenging microscopic material surface dataset (CoMMonS), geared towards an automated fabric quality assessment mechanism in an intelligent manufacturing system. We then conduct a comprehensive evaluation of state-of-the-art deep learning-based methods for texture classification using CoMMonS. Additionally, we propose a multi-level texture encoding and representation network (MuLTER), which simultaneously leverages low- and high-level features to maintain both texture details and spatial information in the texture representation. Our results show that, in comparison with the state-of-the-art deep texture descriptors, MuLTER yields higher accuracy not only on our CoMMonS dataset for material characterization, but also on established datasets such as MINC-2500 and GTOS-mobile for material recognition. |
Tasks | Material Recognition, Object Recognition, Scene Understanding, Texture Classification |
Published | 2020-03-16 |
URL | https://arxiv.org/abs/2003.07725v1, https://arxiv.org/pdf/2003.07725v1.pdf |
PWC | https://paperswithcode.com/paper/fabric-surface-characterization-assessment-of |
Repo | |
Framework | |
Towards a Framework for Visual Intelligence in Service Robotics: Epistemic Requirements and Gap Analysis
Title | Towards a Framework for Visual Intelligence in Service Robotics: Epistemic Requirements and Gap Analysis |
Authors | Agnese Chiatti, Enrico Motta, Enrico Daga |
Abstract | A key capability required by service robots operating in real-world, dynamic environments is that of Visual Intelligence, i.e., the ability to use their vision system, reasoning components and background knowledge to make sense of their environment. In this paper, we analyze the epistemic requirements for Visual Intelligence, both in a top-down fashion, using existing frameworks for human-like Visual Intelligence in the literature, and from the bottom up, based on the errors emerging from object recognition trials in a real-world robotic scenario. Finally, we use these requirements to evaluate current knowledge bases for Service Robotics and to identify gaps in the support they provide for Visual Intelligence. These gaps provide the basis of a research agenda for developing more effective knowledge representations for Visual Intelligence. |
Tasks | Object Recognition |
Published | 2020-03-13 |
URL | https://arxiv.org/abs/2003.06171v1, https://arxiv.org/pdf/2003.06171v1.pdf |
PWC | https://paperswithcode.com/paper/towards-a-framework-for-visual-intelligence |
Repo | |
Framework | |
Automatic marker-free registration of tree point-cloud data based on rotating projection
Title | Automatic marker-free registration of tree point-cloud data based on rotating projection |
Authors | Xiuxian Xu, Pei Wang, Xiaozheng Gan, Yaxin Li, Li Zhang, Qing Zhang, Mei Zhou, Yinghui Zhao, Xinwei Li |
Abstract | Point-cloud data acquired using a terrestrial laser scanner (TLS) play an important role in digital forestry research. Multiple scans are generally used to overcome occlusion effects and obtain complete tree structural information. However, it is time-consuming and difficult to place artificial reflectors in a forest with complex terrain for marker-based registration, a process that reduces registration automation and efficiency. In this study, we propose an automatic coarse-to-fine method for the registration of point-cloud data from multiple scans of a single tree. In coarse registration, point clouds produced by each scan are projected onto a spherical surface to generate a series of two-dimensional (2D) images, which are used to estimate the initial positions of multiple scans. Corresponding feature-point pairs are then extracted from these series of 2D images. In fine registration, point-cloud data slicing and fitting methods are used to extract corresponding central stem and branch centers for use as tie points to calculate fine transformation parameters. To evaluate the accuracy of registration results, we propose an error-evaluation model that calculates the distances between center points of corresponding branches in adjacent scans. For accurate evaluation, we conducted experiments on two simulated trees and a real-world tree. Average registration errors of the proposed method were around 0.26 m on the simulated tree point clouds and around 0.05 m on the real-world tree point cloud. |
Tasks | |
Published | 2020-01-30 |
URL | https://arxiv.org/abs/2001.11192v1, https://arxiv.org/pdf/2001.11192v1.pdf |
PWC | https://paperswithcode.com/paper/automatic-marker-free-registration-of-tree |
Repo | |
Framework | |
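The error evaluation mentioned in the abstract above, based on distances between center points of corresponding branches in adjacent scans, might look roughly like the following. This is only a sketch under assumptions: the abstract does not give the exact model, so the plain mean of Euclidean distances, the function name `mean_center_distance`, and the sample coordinates are all hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def mean_center_distance(centers_a: Sequence[Point], centers_b: Sequence[Point]) -> float:
    """Mean Euclidean distance between matched branch centers of two scans.

    centers_a[i] and centers_b[i] are assumed to describe the same branch,
    already expressed in a common coordinate frame after registration.
    """
    if len(centers_a) != len(centers_b) or not centers_a:
        raise ValueError("need two non-empty, equally long lists of matched centers")
    return sum(math.dist(p, q) for p, q in zip(centers_a, centers_b)) / len(centers_a)


# Hypothetical branch centers (in metres) from two adjacent, registered scans.
scan_1 = [(0.0, 0.0, 1.0), (1.0, 2.0, 3.0)]
scan_2 = [(0.0, 0.03, 1.04), (1.0, 2.0, 3.1)]
error = mean_center_distance(scan_1, scan_2)
```

A smaller value means the matched centers line up more closely after registration; `math.dist` requires Python 3.8 or later.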
RCNet: Incorporating Structural Information into Deep RNN for MIMO-OFDM Symbol Detection with Limited Training
Title | RCNet: Incorporating Structural Information into Deep RNN for MIMO-OFDM Symbol Detection with Limited Training |
Authors | Zhou Zhou, Lingjia Liu, Shashank Jere, Jianzhong Zhang, Yang Yi |
Abstract | In this paper, we investigate learning-based MIMO-OFDM symbol detection strategies focusing on a special recurrent neural network (RNN): reservoir computing (RC). We first introduce the Time-Frequency RC to take advantage of the structural information inherent in OFDM signals. Using the time domain RC and the time-frequency RC as the building blocks, we provide two extensions of the shallow RC to RCNet: 1) Stacking multiple time domain RCs; 2) Stacking multiple time-frequency RCs into a deep structure. The combination of RNN dynamics, the time-frequency structure of MIMO-OFDM signals, and the deep network enables RCNet to handle the interference and nonlinear distortion of MIMO-OFDM signals to outperform existing methods. Unlike most existing NN-based detection strategies, RCNet is also shown to provide a good generalization performance even with a limited training set (i.e., a similar amount of reference signals/training as standard model-based approaches). Numerical experiments demonstrate that the introduced RCNet can offer a faster learning convergence and as much as 20% gain in bit error rate over a shallow RC structure by compensating for the nonlinear distortion of the MIMO-OFDM signal, such as that due to power amplifier compression in the transmitter or finite quantization resolution in the receiver. |
Tasks | Quantization |
Published | 2020-03-15 |
URL | https://arxiv.org/abs/2003.06923v1, https://arxiv.org/pdf/2003.06923v1.pdf |
PWC | https://paperswithcode.com/paper/rcnet-incorporating-structural-information |
Repo | |
Framework | |
Generative Low-bitwidth Data Free Quantization
Title | Generative Low-bitwidth Data Free Quantization |
Authors | Shoukai Xu, Haokun Li, Bohan Zhuang, Jing Liu, Jiezhang Cao, Chuangrun Liang, Mingkui Tan |
Abstract | Neural network quantization is an effective way to compress deep models and improve the execution latency and energy efficiency, so that they can be deployed on mobile or embedded devices. Existing quantization methods require original data for calibration or fine-tuning to get better performance. However, in many real-world scenarios, the data may not be available due to confidentiality or privacy issues, making existing quantization methods inapplicable. Moreover, due to the absence of original data, the recently developed generative adversarial networks (GANs) cannot be applied to generate data. Although the full precision model may contain the entire data information, such information alone is hard to exploit for recovering the original data or generating new meaningful data. In this paper, we investigate a simple-yet-effective method called Generative Low-bitwidth Data Free Quantization to remove the data dependence burden. Specifically, we propose a Knowledge Matching Generator to produce meaningful fake data by exploiting classification boundary knowledge and distribution information in the pre-trained model. With the help of generated data, we are able to quantize a model by learning knowledge from the pre-trained model. Extensive experiments on three data sets demonstrate the effectiveness of our method. More critically, our method achieves much higher accuracy on 4-bit quantization than the existing data free quantization method. |
Tasks | Calibration, Quantization |
Published | 2020-03-07 |
URL | https://arxiv.org/abs/2003.03603v1, https://arxiv.org/pdf/2003.03603v1.pdf |
PWC | https://paperswithcode.com/paper/generative-low-bitwidth-data-free |
Repo | |
Framework | |
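For readers unfamiliar with what "4-bit quantization" means in the abstract above, a generic illustration may help. The sketch below is plain symmetric uniform quantization of a weight list; it is not the Knowledge Matching Generator or the paper's quantization scheme, and the function names and sample weights are assumptions chosen for this example.

```python
from typing import List, Tuple


def quantize_uniform(weights: List[float], bits: int = 4) -> Tuple[List[int], float]:
    """Symmetric uniform quantization to signed integer codes of `bits` bits."""
    if bits < 2:
        raise ValueError("need at least 2 bits for a signed symmetric grid")
    qmax = 2 ** (bits - 1) - 1  # 7 for 4 bits, so codes lie in [-7, 7]
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest grid point and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate real-valued weights."""
    return [c * scale for c in codes]


weights = [-0.7, -0.12, 0.0, 0.33, 0.7]
codes, scale = quantize_uniform(weights, bits=4)
restored = dequantize(codes, scale)
```

Each restored weight differs from the original by at most half a quantization step (`scale / 2`), which is the error that calibration or fine-tuning is normally used to compensate for.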
Reward-Free Exploration for Reinforcement Learning
Title | Reward-Free Exploration for Reinforcement Learning |
Authors | Chi Jin, Akshay Krishnamurthy, Max Simchowitz, Tiancheng Yu |
Abstract | Exploration is widely regarded as one of the most challenging aspects of reinforcement learning (RL), with many naive approaches succumbing to exponential sample complexity. To isolate the challenges of exploration, we propose a new “reward-free RL” framework. In the exploration phase, the agent first collects trajectories from an MDP $\mathcal{M}$ without a pre-specified reward function. After exploration, it is tasked with computing near-optimal policies under $\mathcal{M}$ for a collection of given reward functions. This framework is particularly suitable when there are many reward functions of interest, or when the reward function is shaped by an external agent to elicit desired behavior. We give an efficient algorithm that conducts $\tilde{\mathcal{O}}(S^2A\mathrm{poly}(H)/\epsilon^2)$ episodes of exploration and returns $\epsilon$-suboptimal policies for an arbitrary number of reward functions. We achieve this by finding exploratory policies that visit each “significant” state with probability proportional to its maximum visitation probability under any possible policy. Moreover, our planning procedure can be instantiated by any black-box approximate planner, such as value iteration or natural policy gradient. We also give a nearly-matching $\Omega(S^2AH^2/\epsilon^2)$ lower bound, demonstrating the near-optimality of our algorithm in this setting. |
Tasks | |
Published | 2020-02-07 |
URL | https://arxiv.org/abs/2002.02794v1, https://arxiv.org/pdf/2002.02794v1.pdf |
PWC | https://paperswithcode.com/paper/reward-free-exploration-for-reinforcement |
Repo | |
Framework | |
Is Local SGD Better than Minibatch SGD?
Title | Is Local SGD Better than Minibatch SGD? |
Authors | Blake Woodworth, Kumar Kshitij Patel, Sebastian U. Stich, Zhen Dai, Brian Bullins, H. Brendan McMahan, Ohad Shamir, Nathan Srebro |
Abstract | We study local SGD (also known as parallel SGD and federated averaging), a natural and frequently used stochastic distributed optimization method. Its theoretical foundations are currently lacking and we highlight how all existing error guarantees in the convex setting are dominated by a simple baseline, minibatch SGD. (1) For quadratic objectives we prove that local SGD strictly dominates minibatch SGD and that accelerated local SGD is minimax optimal for quadratics; (2) For general convex objectives we provide the first guarantee that at least sometimes improves over minibatch SGD; (3) We show that indeed local SGD does not dominate minibatch SGD by presenting a lower bound on the performance of local SGD that is worse than the minibatch SGD guarantee. |
Tasks | Distributed Optimization |
Published | 2020-02-18 |
URL | https://arxiv.org/abs/2002.07839v1, https://arxiv.org/pdf/2002.07839v1.pdf |
PWC | https://paperswithcode.com/paper/is-local-sgd-better-than-minibatch-sgd |
Repo | |
Framework | |
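The two methods compared in the abstract above differ only in where the stochastic gradients are evaluated. The following is a toy sketch on the one-dimensional quadratic f(x) = 0.5 * x**2, not the paper's experimental setup: the objective, the Gaussian noise model, the shared step size, and all function and parameter names are assumptions for illustration. Both routines use the same number of gradients and the same number of communication rounds.

```python
import random


def noisy_grad(x: float, sigma: float, rng: random.Random) -> float:
    """Stochastic gradient of f(x) = 0.5 * x**2 with additive Gaussian noise."""
    return x + rng.gauss(0.0, sigma)


def local_sgd(x0: float, machines: int, local_steps: int, rounds: int,
              lr: float, sigma: float, seed: int = 0) -> float:
    """Each machine takes `local_steps` SGD steps on its own; iterates are averaged once per round."""
    rng = random.Random(seed)
    x = x0
    for _ in range(rounds):
        finals = []
        for _ in range(machines):
            xm = x  # every machine starts the round from the shared iterate
            for _ in range(local_steps):
                xm -= lr * noisy_grad(xm, sigma, rng)
            finals.append(xm)
        x = sum(finals) / machines  # communication: average the local iterates
    return x


def minibatch_sgd(x0: float, machines: int, local_steps: int, rounds: int,
                  lr: float, sigma: float, seed: int = 0) -> float:
    """One step per round, using all machines * local_steps gradients at the same point."""
    rng = random.Random(seed)
    x = x0
    batch = machines * local_steps
    for _ in range(rounds):
        g = sum(noisy_grad(x, sigma, rng) for _ in range(batch)) / batch
        x -= lr * g
    return x


x_local = local_sgd(1.0, machines=4, local_steps=5, rounds=10, lr=0.1, sigma=0.5)
x_mini = minibatch_sgd(1.0, machines=4, local_steps=5, rounds=10, lr=0.1, sigma=0.5)
```

With the same step size, local SGD takes more (noisier) steps per round while minibatch SGD takes fewer steps with lower-variance gradients; in practice the two methods would typically be tuned with different step sizes, which this sketch does not attempt.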
Cluster Pruning: An Efficient Filter Pruning Method for Edge AI Vision Applications
Title | Cluster Pruning: An Efficient Filter Pruning Method for Edge AI Vision Applications |
Authors | Chinthaka Gamanayake, Lahiru Jayasinghe, Benny Ng, Chau Yuen |
Abstract | Even though Convolutional Neural Networks (CNNs) have shown superior results in the field of computer vision, it is still challenging to run computer vision algorithms in real time at the edge, especially on a low-cost IoT device, because of the high memory consumption and computational complexity of a CNN. Network compression methodologies such as weight pruning, filter pruning, and quantization are used to overcome this problem. Even though filter pruning has shown better performance than other techniques, the irregular number of filters pruned across different layers of a CNN might not suit the majority of neural computing hardware architectures. In this paper, a novel greedy approach called cluster pruning is proposed, which provides a structured way of removing filters in a CNN by considering the importance of filters and the underlying hardware architecture. The proposed methodology is compared with the conventional filter pruning algorithm on the Pascal-VOC open dataset and the Head-Counting dataset, which is our own dataset developed to detect and count people entering a room. We benchmark our proposed method on three hardware architectures, namely CPU, GPU, and Intel Movidius Neural Compute Stick (NCS), using the popular SSD-MobileNet and SSD-SqueezeNet neural network architectures used for edge-AI vision applications. Results demonstrate that our method outperforms the conventional filter pruning methodology on both datasets and on all of the above hardware architectures. Furthermore, a low-cost IoT hardware setup consisting of an Intel Movidius NCS is proposed to deploy an edge-AI application using our proposed pruning methodology. |
Tasks | Quantization |
Published | 2020-03-05 |
URL | https://arxiv.org/abs/2003.02449v1, https://arxiv.org/pdf/2003.02449v1.pdf |
PWC | https://paperswithcode.com/paper/cluster-pruning-an-efficient-filter-pruning |
Repo | |
Framework | |
A Survey on Deep Hashing Methods
Title | A Survey on Deep Hashing Methods |
Authors | Xiao Luo, Chong Chen, Huasong Zhong, Hao Zhang, Minghua Deng, Jianqiang Huang, Xiansheng Hua |
Abstract | Nearest neighbor search aims to find the data points in a database whose distances to the query are the smallest. It is a fundamental problem in various domains, such as computer vision, recommendation systems and machine learning. Hashing is one of the most widely used methods because of its computational and storage efficiency. With the development of deep learning, deep hashing methods show more advantages than traditional methods. In this paper, we present a comprehensive survey of deep hashing algorithms. Based on the loss function, we categorize deep supervised hashing methods according to the manner in which they preserve similarities: pairwise similarity preserving, multiwise similarity preserving, implicit similarity preserving, and quantization. In addition, we introduce some other topics such as deep unsupervised hashing and multi-modal deep hashing methods. We also present some commonly used public datasets and the scheme used to measure the performance of deep hashing algorithms. Finally, we discuss some potential research directions in the conclusion. |
Tasks | Quantization, Recommendation Systems |
Published | 2020-03-04 |
URL | https://arxiv.org/abs/2003.03369v1, https://arxiv.org/pdf/2003.03369v1.pdf |
PWC | https://paperswithcode.com/paper/a-survey-on-deep-hashing-methods |
Repo | |
Framework | |
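The efficiency argument for hashing in the abstract above comes from comparing short binary codes instead of full feature vectors. The sketch below shows only that retrieval step, assuming the binary codes already exist; how a deep network would produce them is outside its scope, and the helper names and example codes are made up for illustration.

```python
from typing import List, Sequence


def to_code(bits: Sequence[int]) -> int:
    """Pack a sequence of 0/1 values into a single integer hash code."""
    code = 0
    for b in bits:
        code = (code << 1) | (1 if b else 0)
    return code


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def nearest(query: int, database: List[int], k: int = 1) -> List[int]:
    """Indices of the k database codes closest to `query` in Hamming distance.

    Ties are broken by database index so the result is deterministic.
    """
    order = sorted(range(len(database)), key=lambda i: (hamming(query, database[i]), i))
    return order[:k]


database = [
    to_code([0, 0, 0, 0]),
    to_code([1, 1, 0, 0]),
    to_code([1, 1, 1, 0]),
    to_code([0, 1, 1, 1]),
]
query = to_code([1, 1, 1, 1])
top2 = nearest(query, database, k=2)
```

A linear scan with XOR and a bit count is shown here for clarity; real systems usually add an index structure on top so that not every code has to be visited.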
A Common Operating Picture Framework Leveraging Data Fusion and Deep Learning
Title | A Common Operating Picture Framework Leveraging Data Fusion and Deep Learning |
Authors | Benjamin Ortiz, David Lindenbaum, Joseph Nassar, Brendan Lammers, John Wahl, Robert Mangum, Margaret Smith, Marc Bosch |
Abstract | Organizations are starting to realize the combined power of data and data-driven algorithmic models to gain insights, improve situational awareness, and advance their mission. A common challenge to gaining insights is connecting inherently different datasets. Taken separately, these datasets (e.g., geocoded features, video streams, raw text, social network data) provide very narrow answers; collectively, however, they can provide new capabilities. In this work, we present a data fusion framework for accelerating solutions for Processing, Exploitation, and Dissemination (PED). Our platform is a collection of services that extract information from several data sources (each processed separately) by leveraging deep learning and other means of processing. This information is fused by a set of analytical engines that perform data correlations, searches, and other modeling operations to combine information from the disparate data sources. As a result, events of interest are detected, geolocated, logged, and presented in a common operating picture. This common operating picture allows the user to visualize all the data sources in real time, both separately and in combination. In addition, forensic activities have been implemented and made available through the framework. Users can review archived results and compare them to the most recent snapshot of the operational environment. In our first iteration we have focused on visual data (FMV, WAMI, CCTV/PTZ cameras, open source video, etc.) and AIS data streams (satellite and terrestrial sources). As a proof of concept, our experiments show how FMV detections can be combined with vessel tracking signals from AIS sources to confirm identity, tip-and-cue aerial reconnaissance, and monitor vessel activity in an area. |
Tasks | |
Published | 2020-01-16 |
URL | https://arxiv.org/abs/2001.05982v1, https://arxiv.org/pdf/2001.05982v1.pdf |
PWC | https://paperswithcode.com/paper/a-common-operating-picture-framework |
Repo | |
Framework | |