January 25, 2020

Paper Group ANR 1693

Interpretable Multiple-Kernel Prototype Learning for Discriminative Representation and Feature Selection

Title Interpretable Multiple-Kernel Prototype Learning for Discriminative Representation and Feature Selection
Authors Babak Hosseini, Barbara Hammer
Abstract Prototype-based methods are of particular interest to domain specialists and practitioners because they summarize a dataset by a small set of representatives. Therefore, in a classification setting, the interpretability of the prototypes is as significant as the prediction accuracy of the algorithm. Nevertheless, state-of-the-art methods make inefficient trade-offs between these concerns by sacrificing one in favor of the other, especially if the given data has a kernel-based representation. In this paper, we propose a novel interpretable multiple-kernel prototype learning (IMKPL) method to construct highly interpretable prototypes in the feature space, which are also efficient for the discriminative representation of the data. Our method focuses on the local discrimination of the classes in the feature space and on shaping the prototypes based on condensed class-homogeneous neighborhoods of data. In addition, IMKPL learns a combined embedding in the feature space in which the above objectives are better fulfilled. When the base kernels coincide with the data dimensions, this embedding results in a discriminative feature selection. We evaluate IMKPL on several benchmarks from different domains, which demonstrate its superiority over related state-of-the-art methods regarding both interpretability and discriminative representation.
Tasks Feature Selection
Published 2019-11-10
URL https://arxiv.org/abs/1911.03949v1
PDF https://arxiv.org/pdf/1911.03949v1.pdf
PWC https://paperswithcode.com/paper/interpretable-multiple-kernel-prototype
Repo
Framework

Ensemble Learning based Convexification of Power Flow with Application in OPF

Title Ensemble Learning based Convexification of Power Flow with Application in OPF
Authors Ren Hu, Qifeng Li
Abstract This paper proposes an ensemble learning based approach for convexifying AC power flow equations, which differs from the existing relaxation-based convexification techniques. The proposed approach is based on the quadratic power flow equations in rectangular coordinates. To develop this data-driven convex model of power flow, polynomial regression (PR) is first deployed as a base learner to fit convex relationships between the independent and dependent variables. Then, ensemble learning algorithms, i.e., gradient boosting (GB) and bagging, are introduced to combine learners and boost model performance. Based on the learned convex models of power flow, optimal power flow (OPF) is formulated as a convex quadratic programming problem. The simulation results on IEEE standard cases illustrate that 1) GB outperforms PR and bagging in prediction accuracy, and 2) in the context of solving OPF, the proposed data-driven convex model outperforms the conventional SDP relaxation in both accuracy and computational efficiency.
Tasks
Published 2019-09-12
URL https://arxiv.org/abs/1909.05748v1
PDF https://arxiv.org/pdf/1909.05748v1.pdf
PWC https://paperswithcode.com/paper/ensemble-learning-based-convexification-of
Repo
Framework

MetaFusion: Controlled False-Negative Reduction of Minority Classes in Semantic Segmentation

Title MetaFusion: Controlled False-Negative Reduction of Minority Classes in Semantic Segmentation
Authors Robin Chan, Matthias Rottmann, Fabian Hüger, Peter Schlicht, Hanno Gottschalk
Abstract In semantic segmentation datasets, classes of high importance are oftentimes underrepresented, e.g., humans in street scenes. Neural networks are usually trained to reduce the overall number of errors, attaching identical loss to errors of all kinds. However, this is not necessarily aligned with human intuition. For instance, an overlooked pedestrian seems more severe than an incorrectly detected one. One possible remedy is to deploy different decision rules by introducing class priors that assign larger weight to underrepresented classes. While this reduces the false negatives of the underrepresented class, it also leads to a considerable increase in false-positive indications. In this work, we combine decision rules with methods for false-positive detection. We therefore fuse false-negative detection with uncertainty-based false-positive meta classification. We present proof-of-concept results for CIFAR-10, and prove the efficiency of our method for the semantic segmentation of street scenes on the Cityscapes dataset based on predicted instances of the ‘human’ class. In the latter we employ an advanced false-positive detection method using uncertainty measures aggregated over instances. We thereby achieve improved trade-offs between false-negative and false-positive samples of the underrepresented classes.
Tasks Semantic Segmentation
Published 2019-12-16
URL https://arxiv.org/abs/1912.07420v1
PDF https://arxiv.org/pdf/1912.07420v1.pdf
PWC https://paperswithcode.com/paper/metafusion-controlled-false-negative
Repo
Framework
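
The decision-rule idea in the MetaFusion abstract can be illustrated with a small sketch. It compares the usual rule (pick the class with the highest predicted probability) against a prior-weighted rule that divides each probability by the class frequency. The function names, the two-class setup, and all numbers are assumptions made for illustration; they are not taken from the paper.

```python
def argmax_decision(probs):
    """Pick the class with the highest predicted probability."""
    return max(range(len(probs)), key=lambda k: probs[k])


def prior_weighted_decision(probs, priors):
    """Pick the class with the highest probability-to-prior ratio.

    Dividing by the prior gives rare classes a larger weight.
    """
    return max(range(len(probs)), key=lambda k: probs[k] / priors[k])


# Hypothetical softmax output for one pixel; classes are (road, human).
pixel_probs = [0.70, 0.30]
# Hypothetical class frequencies estimated from training data.
class_priors = [0.95, 0.05]

plain = argmax_decision(pixel_probs)                          # road (index 0)
weighted = prior_weighted_decision(pixel_probs, class_priors) # human (index 1)
print(plain, weighted)
```

With these numbers the prior-weighted rule flips the pixel to the rare class, which is the mechanism that lowers false negatives and, as the abstract notes, raises false positives.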

Attention-Based Face AntiSpoofing of RGB Images, using a Minimal End-2-End Neural Network

Title Attention-Based Face AntiSpoofing of RGB Images, using a Minimal End-2-End Neural Network
Authors Ali Ghofrani, Rahil Mahdian Toroghi, Seyed Mojtaba Tabatabaie
Abstract Face anti-spoofing aims at distinguishing real faces from fake ones, and has gained considerable attention in security-sensitive applications, liveness detection, fingerprinting, and so on. In this paper, we address the anti-spoofing problem by proposing two end-to-end systems of convolutional neural networks. One model is based on the EfficientNet B0 network, which has been modified in the final dense layers. The second one is a very light model of the MobileNet V2, which has been contracted, modified and retrained efficiently on data created from the Rose-Youtu dataset for this purpose. The experiments show that both of the proposed architectures achieve remarkable results in detecting real and fake images in the face input data. The experiments clearly show that the heavy-weight model could be efficiently employed in server-side implementations, whereas the low-weight model could easily be implemented on hand-held devices, and both perform perfectly well using merely RGB input images.
Tasks Face Anti-Spoofing
Published 2019-12-18
URL https://arxiv.org/abs/1912.08870v1
PDF https://arxiv.org/pdf/1912.08870v1.pdf
PWC https://paperswithcode.com/paper/attention-based-face-antispoofing-of-rgb
Repo
Framework

Automated Architecture Design for Deep Neural Networks

Title Automated Architecture Design for Deep Neural Networks
Authors Steven Abreu
Abstract Machine learning has made tremendous progress in recent years and received large amounts of public attention. Though we are still far from designing a full artificially intelligent agent, machine learning has brought us many applications in which computers solve human learning tasks remarkably well. Much of this progress comes from a recent trend within machine learning, called deep learning. Deep learning models are responsible for many state-of-the-art applications of machine learning. Despite their success, deep learning models are hard to train, very difficult to understand, and often times so complex that training is only possible on very large GPU clusters. Lots of work has been done on enabling neural networks to learn efficiently. However, the design and architecture of such neural networks is often done manually through trial and error and expert knowledge. This thesis inspects different approaches, existing and novel, to automate the design of deep feedforward neural networks in an attempt to create less complex models with good performance that take away the burden of deciding on an architecture and make it more efficient to design and train such deep networks.
Tasks
Published 2019-08-22
URL https://arxiv.org/abs/1908.10714v1
PDF https://arxiv.org/pdf/1908.10714v1.pdf
PWC https://paperswithcode.com/paper/automated-architecture-design-for-deep-neural
Repo
Framework

Static and Dynamic Fusion for Multi-modal Cross-ethnicity Face Anti-spoofing

Title Static and Dynamic Fusion for Multi-modal Cross-ethnicity Face Anti-spoofing
Authors Ajian Liu, Zichang Tan, Xuan Li, Jun Wan, Sergio Escalera, Guodong Guo, Stan Z. Li
Abstract Regardless of the usage of deep learning and handcrafted methods, the dynamic information from videos and the effect of cross-ethnicity are rarely considered in face anti-spoofing. In this work, we propose a static-dynamic fusion mechanism for multi-modal face anti-spoofing. Inspired by motion divergences between real and fake faces, we incorporate the dynamic image calculated by rank pooling with static information into a convolutional neural network (CNN) for each modality (i.e., RGB, Depth and infrared (IR)). Then, we develop a partially shared fusion method to learn complementary information from multiple modalities. Furthermore, in order to study the generalization capability of the proposal in terms of cross-ethnicity attacks and unknown spoofs, we introduce the largest public cross-ethnicity Face Anti-spoofing (CASIA-CeFA) dataset, covering 3 ethnicities, 3 modalities, 1607 subjects, and 2D plus 3D attack types. Experiments demonstrate that the proposed method achieves state-of-the-art results on CASIA-CeFA, CASIA-SURF, OULU-NPU and SiW.
Tasks Face Anti-Spoofing
Published 2019-12-05
URL https://arxiv.org/abs/1912.02340v2
PDF https://arxiv.org/pdf/1912.02340v2.pdf
PWC https://paperswithcode.com/paper/static-and-dynamic-fusion-for-multi-modal
Repo
Framework
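
The "dynamic image calculated by rank pooling" mentioned in the abstract above can be sketched as follows. This is a minimal illustration of the commonly used simplified form of approximate rank pooling, in which frame t of T is weighted by 2t - T - 1; the paper may use a different variant, and the flat-list frame representation is an assumption chosen to keep the example dependency-free.

```python
def dynamic_image(frames):
    """Collapse a clip into one image by a weighted sum of its frames.

    Uses simplified approximate rank pooling coefficients
    alpha_t = 2t - T - 1 for t = 1..T, so late frames get positive
    weight and early frames negative weight.
    Each frame is a flat list of pixel values (an assumption).
    """
    num_frames = len(frames)
    out = [0.0] * len(frames[0])
    for t, frame in enumerate(frames, start=1):
        alpha = 2 * t - num_frames - 1
        for i, value in enumerate(frame):
            out[i] += alpha * value
    return out


# A static clip: the weights sum to zero, so nothing survives.
static_clip = [[5.0, 5.0], [5.0, 5.0], [5.0, 5.0]]
# A clip where only the first pixel changes over time.
moving_clip = [[0.0, 5.0], [1.0, 5.0], [2.0, 5.0]]

print(dynamic_image(static_clip))
print(dynamic_image(moving_clip))
```

Because the coefficients sum to zero, pixels that do not change contribute nothing, so the result emphasizes motion, which is the cue the abstract says differs between real and fake faces.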

Real-Time Semantic Stereo Matching

Title Real-Time Semantic Stereo Matching
Authors Pier Luigi Dovesi, Matteo Poggi, Lorenzo Andraghetti, Miquel Martí, Hedvig Kjellström, Alessandro Pieropan, Stefano Mattoccia
Abstract Scene understanding is paramount in robotics, self-navigation, augmented reality, and many other fields. To fully accomplish this task, an autonomous agent has to infer the 3D structure of the sensed scene (to know where it is looking) and its content (to know what it sees). To tackle the two tasks, deep neural networks trained to infer semantic segmentation and depth from stereo images are often the preferred choices. Specifically, Semantic Stereo Matching can be tackled by either standalone models trained for the two tasks independently or joint end-to-end architectures. Nonetheless, as proposed so far, both solutions are inefficient, because they require two forward passes in the former case or because of the complexity of a single network in the latter, although jointly tackling both tasks is usually beneficial in terms of accuracy. In this paper, we propose a single compact and lightweight architecture for real-time semantic stereo matching. Our framework relies on coarse-to-fine estimations in a multi-stage fashion, allowing: i) very fast inference even on embedded devices, with marginal drops in accuracy compared to state-of-the-art networks, and ii) trading accuracy for speed, according to the specific application requirements. Experimental results on high-end GPUs as well as on an embedded Jetson TX2 confirm the superiority of semantic stereo matching compared to standalone tasks and highlight the versatility of our framework on any hardware and for any application.
Tasks Scene Understanding, Semantic Segmentation, Stereo Matching
Published 2019-10-01
URL https://arxiv.org/abs/1910.00541v2
PDF https://arxiv.org/pdf/1910.00541v2.pdf
PWC https://paperswithcode.com/paper/real-time-semantic-stereo-matching
Repo
Framework

Learning Multi-dimensional Indexes

Title Learning Multi-dimensional Indexes
Authors Vikram Nathan, Jialin Ding, Mohammad Alizadeh, Tim Kraska
Abstract Scanning and filtering over multi-dimensional tables are key operations in modern analytical database engines. To optimize the performance of these operations, databases often create clustered indexes over a single dimension or multi-dimensional indexes such as R-trees, or use complex sort orders (e.g., Z-ordering). However, these schemes are often hard to tune and their performance is inconsistent across different datasets and queries. In this paper, we introduce Flood, a multi-dimensional in-memory index that automatically adapts itself to a particular dataset and workload by jointly optimizing the index structure and data storage. Flood achieves up to three orders of magnitude faster performance for range scans with predicates than state-of-the-art multi-dimensional indexes or sort orders on real-world datasets and workloads. Our work serves as a building block towards an end-to-end learned database system.
Tasks
Published 2019-12-03
URL https://arxiv.org/abs/1912.01668v1
PDF https://arxiv.org/pdf/1912.01668v1.pdf
PWC https://paperswithcode.com/paper/learning-multi-dimensional-indexes
Repo
Framework
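
For readers unfamiliar with the Z-ordering baseline named in the abstract above, the sketch below shows the standard bit-interleaving (Morton code) construction for two dimensions. It illustrates only that conventional sort order, not Flood itself; the function name and the sample points are made up for the example.

```python
def z_order(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into one Morton code.

    Bit i of x goes to position 2*i, bit i of y to position 2*i + 1.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


# Hypothetical 2-D records, sorted so that points close in both
# dimensions tend to be stored near each other.
points = [(3, 1), (0, 0), (1, 1), (2, 2), (0, 1)]
points.sort(key=lambda p: z_order(p[0], p[1]))
print(points)
```

Sorting by this key gives a single physical order that serves range predicates on either column, at the cost of the tuning and consistency problems the abstract describes.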

Food Recommendation: Framework, Existing Solutions and Challenges

Title Food Recommendation: Framework, Existing Solutions and Challenges
Authors Weiqing Min, Shuqiang Jiang, Ramesh Jain
Abstract A growing proportion of the global population is becoming overweight or obese, leading to various diseases (e.g., diabetes, ischemic heart disease and even cancer) due to unhealthy eating patterns, such as increased intake of food with high energy and high fat. Food recommendation is of paramount importance to alleviate this problem. Unfortunately, modern multimedia research has enhanced the performance and experience of multimedia recommendation in many fields such as movies and POI, yet largely lags in the food domain. This article proposes a unified framework for food recommendation, and identifies main issues affecting food recommendation including building the personal model, analyzing unique food characteristics, incorporating various context and domain knowledge. We then review existing solutions for these issues, and finally elaborate research challenges and future directions in this field. To our knowledge, this is the first survey that targets the study of food recommendation in the multimedia field and offers a collection of research studies and technologies to benefit researchers in this field.
Tasks
Published 2019-05-15
URL https://arxiv.org/abs/1905.06269v2
PDF https://arxiv.org/pdf/1905.06269v2.pdf
PWC https://paperswithcode.com/paper/food-recommendation-framework-existing
Repo
Framework

A Perspective on Objects and Systematic Generalization in Model-Based RL

Title A Perspective on Objects and Systematic Generalization in Model-Based RL
Authors Sjoerd van Steenkiste, Klaus Greff, Jürgen Schmidhuber
Abstract In order to meet the diverse challenges in solving many real-world problems, an intelligent agent has to be able to dynamically construct a model of its environment. Objects facilitate the modular reuse of prior knowledge and the combinatorial construction of such models. In this work, we argue that dynamically bound features (objects) do not simply emerge in connectionist models of the world. We identify several requirements that need to be fulfilled in overcoming this limitation and highlight corresponding inductive biases.
Tasks
Published 2019-06-03
URL https://arxiv.org/abs/1906.01035v1
PDF https://arxiv.org/pdf/1906.01035v1.pdf
PWC https://paperswithcode.com/paper/a-perspective-on-objects-and-systematic
Repo
Framework

Novel evaluation of surgical activity recognition models using task-based efficiency metrics

Title Novel evaluation of surgical activity recognition models using task-based efficiency metrics
Authors Aneeq Zia, Liheng Guo, Linlin Zhou, Irfan Essa, Anthony Jarc
Abstract Purpose: Surgical task-based metrics (rather than entire procedure metrics) can be used to improve surgeon training and, ultimately, patient care through focused training interventions. Machine learning models to automatically recognize individual tasks or activities are needed to overcome the otherwise manual effort of video review. Traditionally, these models have been evaluated using frame-level accuracy. Here, we propose evaluating surgical activity recognition models by their effect on task-based efficiency metrics. In this way, we can determine when models have achieved adequate performance for providing surgeon feedback via metrics from individual tasks. Methods: We propose a new CNN-LSTM model, RP-Net-V2, to recognize the 12 steps of robotic-assisted radical prostatectomies (RARP). We evaluated our model both in terms of conventional methods (e.g. Jaccard Index, task boundary accuracy) as well as novel ways, such as the accuracy of efficiency metrics computed from instrument movements and system events. Results: Our proposed model achieves a Jaccard Index of 0.85 thereby outperforming previous models on robotic-assisted radical prostatectomies. Additionally, we show that metrics computed from tasks automatically identified using RP-Net-V2 correlate well with metrics from tasks labeled by clinical experts. Conclusions: We demonstrate that metrics-based evaluation of surgical activity recognition models is a viable approach to determine when models can be used to quantify surgical efficiencies. We believe this approach and our results illustrate the potential for fully automated, post-operative efficiency reports.
Tasks Activity Recognition
Published 2019-07-03
URL https://arxiv.org/abs/1907.02060v1
PDF https://arxiv.org/pdf/1907.02060v1.pdf
PWC https://paperswithcode.com/paper/novel-evaluation-of-surgical-activity
Repo
Framework
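
The Jaccard Index reported in the abstract above is a standard overlap measure. A minimal sketch of a frame-level, class-averaged version follows; how the paper averages (per class, per video, or otherwise) is not stated in the abstract, so the averaging here is an assumption, as are the function name and the toy labels.

```python
def mean_jaccard(true_labels, pred_labels):
    """Average per-class Jaccard Index over frame-level labels.

    For each class c: |frames where both say c| / |frames where either says c|.
    Both sequences are assumed to have the same length.
    """
    pairs = list(zip(true_labels, pred_labels))
    classes = sorted(set(true_labels) | set(pred_labels))
    scores = []
    for c in classes:
        intersection = sum(1 for t, p in pairs if t == c and p == c)
        union = sum(1 for t, p in pairs if t == c or p == c)
        scores.append(intersection / union)
    return sum(scores) / len(scores)


# Hypothetical step labels for eight video frames.
truth = [0, 0, 0, 1, 1, 1, 2, 2]
prediction = [0, 0, 1, 1, 1, 1, 2, 2]
score = mean_jaccard(truth, prediction)
print(round(score, 4))
```

A single mislabeled frame at a task boundary lowers the score for both neighboring classes, which is why boundary accuracy and downstream efficiency metrics can tell a different story from frame-level accuracy.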

CloudSegNet: A Deep Network for Nychthemeron Cloud Image Segmentation

Title CloudSegNet: A Deep Network for Nychthemeron Cloud Image Segmentation
Authors Soumyabrata Dev, Atul Nautiyal, Yee Hui Lee, Stefan Winkler
Abstract We analyze clouds in the earth’s atmosphere using ground-based sky cameras. An accurate segmentation of clouds in the captured sky/cloud image is difficult, owing to the fuzzy boundaries of clouds. Several techniques have been proposed that use color as the discriminatory feature for cloud detection. In the existing literature, however, analysis of daytime and nighttime images is considered separately, mainly because of differences in image characteristics and applications. In this paper, we propose a light-weight deep-learning architecture called CloudSegNet. It is the first that integrates daytime and nighttime (also known as nychthemeron) image segmentation in a single framework, and achieves state-of-the-art results on public databases.
Tasks Cloud Detection, Semantic Segmentation
Published 2019-04-16
URL http://arxiv.org/abs/1904.07979v1
PDF http://arxiv.org/pdf/1904.07979v1.pdf
PWC https://paperswithcode.com/paper/cloudsegnet-a-deep-network-for-nychthemeron
Repo
Framework

Knowledge Distillation for Incremental Learning in Semantic Segmentation

Title Knowledge Distillation for Incremental Learning in Semantic Segmentation
Authors Umberto Michieli, Pietro Zanuttigh
Abstract Deep learning architectures have shown remarkable results in scene understanding problems; however, they exhibit a critical drop in performance when they are required to incrementally learn new tasks without forgetting old ones. This catastrophic forgetting phenomenon impacts the deployment of artificial intelligence in real-world scenarios where systems need to learn new and different representations over time. Current approaches for incremental learning deal only with image classification and object detection tasks, while in this work we formally introduce incremental learning for semantic segmentation. We tackle the problem by applying various knowledge distillation techniques to the previous model. In this way, we retain the information about learned classes, whilst updating the current model to learn the new ones. We developed four main methodologies of knowledge distillation working on both output layers and internal feature representations. We do not store any image belonging to previous training stages, and only the last model is used to preserve high accuracy on previously learned classes. Extensive experimental results on the Pascal VOC2012 and MSRC-v2 datasets show the effectiveness of the proposed approaches in several incremental learning scenarios.
Tasks Image Classification, Object Detection, Scene Understanding, Semantic Segmentation
Published 2019-11-08
URL https://arxiv.org/abs/1911.03462v3
PDF https://arxiv.org/pdf/1911.03462v3.pdf
PWC https://paperswithcode.com/paper/knowledge-distillation-for-incremental
Repo
Framework
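
Output-layer knowledge distillation, one of the families of techniques named in the abstract above, can be sketched in a few lines. The example computes the cross-entropy between the previous model's class distribution and the current model's distribution over the same old classes. Restricting the softmax to the old-class logits, the function names, and the logit values are assumptions for illustration; the paper's four formulations are not reproduced here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(old_logits, new_logits):
    """Cross-entropy of the new model against the old model's soft targets.

    Only the first len(old_logits) outputs of the new model (the
    previously learned classes) take part; this is an assumption.
    """
    num_old = len(old_logits)
    target = softmax(old_logits)
    predicted = softmax(new_logits[:num_old])
    return -sum(t * math.log(p) for t, p in zip(target, predicted))


# Hypothetical logits for one pixel: three old classes, one new class.
old_model = [2.0, 0.5, -1.0]
new_model_kept = [2.0, 0.5, -1.0, 0.3]     # agrees with the old model
new_model_drifted = [0.0, 2.0, -1.0, 0.3]  # has forgotten the old ranking

print(distillation_loss(old_model, new_model_kept))
print(distillation_loss(old_model, new_model_drifted))
```

The loss is smallest when the new model reproduces the old model's distribution, so adding it to the training objective penalizes drift on old classes without storing any old images.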

Relevance Vector Machines for harmonization of MRI brain volumes using image descriptors

Title Relevance Vector Machines for harmonization of MRI brain volumes using image descriptors
Authors Maria Ines Meyer, Ezequiel de la Rosa, Koen Van Leemput, Diana M. Sima
Abstract With the increased need for multi-center magnetic resonance imaging studies, problems arise related to differences in hardware and software between centers. In particular, current algorithms for brain volume quantification are unreliable for the longitudinal assessment of volume changes in this type of setting. Currently, most methods attempt to mitigate this issue by regressing out the scanner and/or center effects from the original data. In this work, we explore a novel approach to harmonize brain volume measurements by using only image descriptors. First, we explore the relationships between volumes and image descriptors. Then, we train a Relevance Vector Machine (RVM) model over a large multi-site dataset of healthy subjects to perform volume harmonization. Finally, we validate the method over two different datasets: i) a subset of unseen healthy controls; and ii) a test-retest dataset of multiple sclerosis (MS) patients. The method decreases scanner and center variability while preserving measurements that did not require correction in MS patient data. We show that image descriptors can be used as input to a machine learning algorithm to improve the reliability of longitudinal volumetric studies.
Tasks
Published 2019-11-08
URL https://arxiv.org/abs/1911.04289v1
PDF https://arxiv.org/pdf/1911.04289v1.pdf
PWC https://paperswithcode.com/paper/relevance-vector-machines-for-harmonization
Repo
Framework

AccUDNN: A GPU Memory Efficient Accelerator for Training Ultra-deep Neural Networks

Title AccUDNN: A GPU Memory Efficient Accelerator for Training Ultra-deep Neural Networks
Authors Jinrong Guo, Wantao Liu, Wang Wang, Qu Lu, Songlin Hu, Jizhong Han, Ruixuan Li
Abstract Typically, an ultra-deep neural network (UDNN) tends to yield a high-quality model, but its training process is usually resource intensive and time-consuming. The scarce DRAM capacity of modern GPUs is the primary bottleneck that hinders the trainability and the training efficiency of UDNNs. In this paper, we present “AccUDNN”, an accelerator that aims to make the utmost use of finite GPU memory resources to speed up the training process of UDNNs. AccUDNN mainly includes two modules: a memory optimizer and a hyperparameter tuner. The memory optimizer develops a performance-model-guided dynamic swap out/in strategy: by offloading appropriate data to host memory, the GPU memory footprint can be significantly reduced to overcome the restriction on the trainability of UDNNs. After the memory optimization strategy is applied, the hyperparameter tuner explores the efficiency-optimal minibatch size and the matched learning rate. Evaluations demonstrate that AccUDNN cuts down the GPU memory requirement of ResNet-152 from more than 24GB to 8GB. In turn, given a 12GB GPU memory budget, the efficiency-optimal minibatch size can be 4.2x larger than in original Caffe. Benefiting from better utilization of a single GPU’s computing resources and from the less frequent parameter synchronization of a large minibatch size, a 7.7x speed-up is achieved on a cluster of 8 GPUs without any communication optimization and with no accuracy loss.
Tasks
Published 2019-01-21
URL https://arxiv.org/abs/1901.06773v2
PDF https://arxiv.org/pdf/1901.06773v2.pdf
PWC https://paperswithcode.com/paper/accudnn-a-gpu-memory-efficient-accelerator
Repo
Framework