Paper Group ANR 1029
Papers in this group:
The Visual Centrifuge: Model-Free Layered Video Representations
Transfer Learning of Artist Group Factors to Musical Genre Classification
Tool Detection and Operative Skill Assessment in Surgical Videos Using Region-Based Convolutional Neural Networks
GANs for Medical Image Analysis
Tensor Matched Kronecker-Structured Subspace Detection for Missing Information
Model-Based Learning of Turbulent Flows using a Mobile Robot
Recursive Optimization of Convex Risk Measures: Mean-Semideviation Models
RealPoint3D: Point Cloud Generation from a Single Image with Complex Background
Gradient Boosting With Piece-Wise Linear Regression Trees
S4ND: Single-Shot Single-Scale Lung Nodule Detection
A family of OWA operators based on Faulhaber’s formulas
Towards A Unified Analysis of Random Fourier Features
To understand deep learning we need to understand kernel learning
DenseRAN for Offline Handwritten Chinese Character Recognition
The EcoLexicon English Corpus as an open corpus in Sketch Engine
The Visual Centrifuge: Model-Free Layered Video Representations
Title | The Visual Centrifuge: Model-Free Layered Video Representations |
Authors | Jean-Baptiste Alayrac, João Carreira, Andrew Zisserman |
Abstract | True video understanding requires making sense of non-Lambertian scenes where the color of light arriving at the camera sensor encodes information about not just the last object it collided with, but about multiple media – colored windows, dirty mirrors, smoke or rain. Layered video representations have the potential of accurately modelling realistic scenes but have so far required stringent assumptions on motion, lighting and shape. Here we propose a learning-based approach to multi-layered video representation: we introduce novel uncertainty-capturing 3D convolutional architectures and train them to separate blended videos. We show that these models then generalize to single videos, where they exhibit interesting abilities: color constancy, factoring out shadows and separating reflections. We present quantitative and qualitative results on real-world videos. |
Tasks | Color Constancy, Video Understanding |
Published | 2018-12-04 |
URL | http://arxiv.org/abs/1812.01461v2 |
http://arxiv.org/pdf/1812.01461v2.pdf | |
PWC | https://paperswithcode.com/paper/the-visual-centrifuge-model-free-layered |
Repo | |
Framework | |
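The core training trick in the abstract — blend two clips, then train the network to un-mix them — hinges on a loss that does not care which output slot each recovered layer lands in. The NumPy toy below is an illustrative sketch, not the paper's architecture: the clip shapes, the 50/50 averaging blend, and the plain L2 loss are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two toy "videos" as (frames, height, width, channels) arrays in [0, 1].
v1 = rng.random((4, 8, 8, 3))
v2 = rng.random((4, 8, 8, 3))

# Training input: a simple 50/50 average blend of the two clips.
blend = 0.5 * (v1 + v2)

def permutation_invariant_loss(pred_a, pred_b, tgt_1, tgt_2):
    """L2 reconstruction loss under the best assignment of predicted
    layers to ground-truth layers, so the separator is never penalized
    for emitting the two layers in the "wrong" order."""
    def l2(a, b):
        return float(np.mean((a - b) ** 2))
    straight = l2(pred_a, tgt_1) + l2(pred_b, tgt_2)
    swapped = l2(pred_a, tgt_2) + l2(pred_b, tgt_1)
    return min(straight, swapped)

# A perfect separator that emits the layers in swapped order: zero loss.
loss_perfect = permutation_invariant_loss(v2, v1, v1, v2)
# A degenerate "separator" that just copies the blend twice: penalized.
loss_degenerate = permutation_invariant_loss(blend, blend, v1, v2)
```

The min over assignments is what lets the model discover its own layer ordering during training.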
Transfer Learning of Artist Group Factors to Musical Genre Classification
Title | Transfer Learning of Artist Group Factors to Musical Genre Classification |
Authors | Jaehun Kim, Minz Won, Xavier Serra, Cynthia C. S. Liem |
Abstract | The automated recognition of music genres from audio information is a challenging problem, as genre labels are subjective and noisy. Artist labels are less subjective and less noisy, while certain artists may relate more strongly to certain genres. At the same time, at prediction time, it is not guaranteed that artist labels are available for a given audio segment. Therefore, in this work, we propose to apply the transfer learning framework, learning artist-related information which will be used at inference time for genre classification. We consider different types of artist-related information, expressed through artist group factors, which will allow for more efficient learning and stronger robustness to potential label noise. Furthermore, we investigate how to achieve the highest validation accuracy on the given FMA dataset, by experimenting with various kinds of transfer methods, including single-task transfer, multi-task transfer and finally multi-task learning. |
Tasks | Multi-Task Learning, Transfer Learning |
Published | 2018-05-05 |
URL | http://arxiv.org/abs/1805.02043v2 |
http://arxiv.org/pdf/1805.02043v2.pdf | |
PWC | https://paperswithcode.com/paper/transfer-learning-of-artist-group-factors-to |
Repo | |
Framework | |
Tool Detection and Operative Skill Assessment in Surgical Videos Using Region-Based Convolutional Neural Networks
Title | Tool Detection and Operative Skill Assessment in Surgical Videos Using Region-Based Convolutional Neural Networks |
Authors | Amy Jin, Serena Yeung, Jeffrey Jopling, Jonathan Krause, Dan Azagury, Arnold Milstein, Li Fei-Fei |
Abstract | Five billion people in the world lack access to quality surgical care. Surgeon skill varies dramatically, and many surgical patients suffer complications and avoidable harm. Improving surgical training and feedback would help to reduce the rate of complications, half of which have been shown to be preventable. To do this, it is essential to assess operative skill, a process that currently requires experts and is manual, time consuming, and subjective. In this work, we introduce an approach to automatically assess surgeon performance by tracking and analyzing tool movements in surgical videos, leveraging region-based convolutional neural networks. In order to study this problem, we also introduce a new dataset, m2cai16-tool-locations, which extends the m2cai16-tool dataset with spatial bounds of tools. While previous methods have addressed tool presence detection, ours is the first to not only detect presence but also spatially localize surgical tools in real-world laparoscopic surgical videos. We show that our method both effectively detects the spatial bounds of tools as well as significantly outperforms existing methods on tool presence detection. We further demonstrate the ability of our method to assess surgical quality through analysis of tool usage patterns, movement range, and economy of motion. |
Tasks | |
Published | 2018-02-24 |
URL | http://arxiv.org/abs/1802.08774v2 |
http://arxiv.org/pdf/1802.08774v2.pdf | |
PWC | https://paperswithcode.com/paper/tool-detection-and-operative-skill-assessment |
Repo | |
Framework | |
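The skill metrics named at the end of the abstract — movement range and economy of motion — reduce to simple geometry once per-frame tool positions are available. The sketch below uses hypothetical data; a real pipeline would feed in bounding-box centers emitted by the region-based tool detector.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-frame tool-tip positions (100 frames of a noisy path).
t = np.linspace(0.0, 1.0, 100)
traj = np.stack([t, np.sin(2 * np.pi * t)], axis=1)
traj += 0.01 * rng.standard_normal(traj.shape)

def path_length(points):
    """Total distance travelled by the tool tip (economy of motion:
    a shorter path for the same task indicates more economical movement)."""
    return float(np.sum(np.linalg.norm(np.diff(points, axis=0), axis=1)))

def movement_range(points):
    """Area of the axis-aligned bounding box swept by the tool tip."""
    span = points.max(axis=0) - points.min(axis=0)
    return float(span[0] * span[1])

eom = path_length(traj)
mrange = movement_range(traj)
```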
GANs for Medical Image Analysis
Title | GANs for Medical Image Analysis |
Authors | Salome Kazeminia, Christoph Baur, Arjan Kuijper, Bram van Ginneken, Nassir Navab, Shadi Albarqouni, Anirban Mukhopadhyay |
Abstract | Generative Adversarial Networks (GANs) and their extensions have carved open many exciting ways to tackle well-known and challenging medical image analysis problems such as medical image denoising, reconstruction, segmentation, data simulation, detection or classification. Furthermore, their ability to synthesize images at unprecedented levels of realism also gives hope that the chronic scarcity of labeled data in the medical field can be resolved with the help of these generative models. In this review paper, a broad overview of recent literature on GANs for medical applications is given, the shortcomings and opportunities of the proposed methods are thoroughly discussed, and potential future work is elaborated. We review the most relevant papers published until the submission date. For quick access, important details such as the underlying method, datasets and performance are tabulated. An interactive visualization that categorizes all papers, intended to keep the review alive, is available at http://livingreview.in.tum.de/GANs_for_Medical_Applications. |
Tasks | |
Published | 2018-09-13 |
URL | https://arxiv.org/abs/1809.06222v3 |
https://arxiv.org/pdf/1809.06222v3.pdf | |
PWC | https://paperswithcode.com/paper/gans-for-medical-image-analysis |
Repo | |
Framework | |
Tensor Matched Kronecker-Structured Subspace Detection for Missing Information
Title | Tensor Matched Kronecker-Structured Subspace Detection for Missing Information |
Authors | Ishan Jindal, Matthew Nokleby |
Abstract | We consider the problem of detecting whether a tensor signal with many missing entries lies within a given low-dimensional Kronecker-Structured (KS) subspace. This is a matched subspace detection problem. The tensor matched subspace detection problem is more challenging because of the intertwined signal dimensions. We solve this problem by projecting the signal onto the Kronecker-structured subspace, which is a Kronecker product of different subspaces corresponding to each signal dimension. Under this framework, we define the KS subspaces and the orthogonal projection of the signal onto the KS subspace. We prove that reliable detection is possible as long as the cardinality of the missing signal is greater than the dimensions of the KS subspace, by bounding the residual energy of the sampled signal with high probability. |
Tasks | |
Published | 2018-10-25 |
URL | http://arxiv.org/abs/1810.10957v1 |
http://arxiv.org/pdf/1810.10957v1.pdf | |
PWC | https://paperswithcode.com/paper/tensor-matched-kronecker-structured-subspace |
Repo | |
Framework | |
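The detection statistic the abstract describes — project the observed entries onto the KS subspace and test the residual energy — can be illustrated directly in NumPy. This is an illustrative sketch, not the paper's estimator: the mode dimensions, subspace ranks, and sampling set are arbitrary choices, and `np.linalg.lstsq` stands in for the orthogonal projection onto the rows of the Kronecker basis restricted to the observed indices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Per-mode orthonormal bases; the KS subspace is their Kronecker product.
U1, _ = np.linalg.qr(rng.standard_normal((6, 2)))  # mode-1 basis (rank 2)
U2, _ = np.linalg.qr(rng.standard_normal((5, 3)))  # mode-2 basis (rank 3)
U = np.kron(U1, U2)                                # 30 x 6 KS basis

# A signal inside the subspace, observed only on a random index set
# whose size (20) exceeds the subspace dimension (6).
x = U @ rng.standard_normal(6)
omega = rng.choice(30, size=20, replace=False)
U_om, x_om = U[omega], x[omega]

# Residual energy after projecting the observed entries onto the
# restricted basis: ~0 for an in-subspace signal.
coef, *_ = np.linalg.lstsq(U_om, x_om, rcond=None)
residual_in = float(np.sum((x_om - U_om @ coef) ** 2))

# A generic signal leaves substantial residual energy.
y_om = rng.standard_normal(20)
coef_y, *_ = np.linalg.lstsq(U_om, y_om, rcond=None)
residual_out = float(np.sum((y_om - U_om @ coef_y) ** 2))
```

Thresholding the residual energy then gives the in-subspace vs. out-of-subspace decision.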
Model-Based Learning of Turbulent Flows using a Mobile Robot
Title | Model-Based Learning of Turbulent Flows using a Mobile Robot |
Authors | Reza Khodayi-mehr, Michael M. Zavlanos |
Abstract | We consider the problem of model-based learning of turbulent flows using mobile robots. Specifically, we use empirical data to improve on numerical solutions obtained from Reynolds-Averaged Navier-Stokes (RANS) models. RANS models are computationally efficient but rely on assumptions that require experimental validation. Here we construct statistical models of the flow properties using Gaussian processes (GPs) and rely on the numerical solutions to inform their mean. Utilizing Bayesian inference, we incorporate measurements of the time-averaged velocity and turbulent intensity into these GPs. We account for model ambiguity and parameter uncertainty via Bayesian model selection, and for measurement noise by systematically incorporating it in the GPs. To collect the measurements, we control a custom-built mobile robot through a sequence of waypoints that maximize the information content of the measurements. The end result is a posterior distribution of the flow field that better approximates the real flow and quantifies the uncertainty in its properties. We experimentally demonstrate a considerable improvement in the prediction of these properties compared to numerical solutions. |
Tasks | Bayesian Inference, Gaussian Processes, Model Selection |
Published | 2018-12-10 |
URL | https://arxiv.org/abs/1812.03894v3 |
https://arxiv.org/pdf/1812.03894v3.pdf | |
PWC | https://paperswithcode.com/paper/model-based-learning-of-turbulent-flows-using |
Repo | |
Framework | |
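The central construction — a GP whose prior mean is the numerical (RANS) solution, updated with noisy robot measurements — can be sketched in one dimension. Everything below is a toy stand-in: the "RANS" curve, the model-error term, the RBF kernel with its length scale, and the waypoint locations are all assumptions. The conditioning step, however, is the standard GP posterior with a non-zero prior mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D "flow": a stand-in RANS solution used as the GP prior mean,
# and a true flow that deviates from it by an unknown smooth model error.
xs = np.linspace(0.0, 1.0, 50)
rans_mean = np.sin(2 * np.pi * xs)
true_flow = rans_mean + 0.3 * np.cos(np.pi * xs)

def rbf(a, b, ell=0.3, var=0.5):
    """Squared-exponential covariance (an illustrative kernel choice)."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / ell) ** 2)

# Noisy measurements at a handful of robot waypoints.
idx = np.array([5, 15, 25, 35, 45])
noise = 0.05
y = true_flow[idx] + noise * rng.standard_normal(idx.size)

# GP posterior mean with a non-zero prior mean: condition on the
# residuals (measurement minus prior mean) and add the mean back.
K = rbf(xs[idx], xs[idx]) + noise**2 * np.eye(idx.size)
Ks = rbf(xs, xs[idx])
post_mean = rans_mean + Ks @ np.linalg.solve(K, y - rans_mean[idx])

err_prior = float(np.mean((rans_mean - true_flow) ** 2))
err_post = float(np.mean((post_mean - true_flow) ** 2))
```

The posterior corrects the numerical solution wherever measurements disagree with it, which is the improvement the abstract reports.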
Recursive Optimization of Convex Risk Measures: Mean-Semideviation Models
Title | Recursive Optimization of Convex Risk Measures: Mean-Semideviation Models |
Authors | Dionysios S. Kalogerias, Warren B. Powell |
Abstract | We develop recursive, data-driven, stochastic subgradient methods for optimizing a new, versatile, and application-driven class of convex risk measures, termed here as mean-semideviations, strictly generalizing the well-known and popular mean-upper-semideviation. We introduce the MESSAGEp algorithm, which is an efficient compositional subgradient procedure for iteratively solving convex mean-semideviation risk-averse problems to optimality. We analyze the asymptotic behavior of the MESSAGEp algorithm under a flexible and structure-exploiting set of problem assumptions. In particular: 1) Under appropriate stepsize rules, we establish pathwise convergence of the MESSAGEp algorithm in a strong technical sense, confirming its asymptotic consistency. 2) Assuming a strongly convex cost, we show that, for fixed semideviation order $p>1$ and for $\epsilon\in\left[0,1\right)$, the MESSAGEp algorithm achieves a squared-${\cal L}_{2}$ solution suboptimality rate of the order of ${\cal O}(n^{-\left(1-\epsilon\right)/2})$ in the number of iterations $n$, where, for $\epsilon>0$, pathwise convergence is simultaneously guaranteed. This result establishes a rate of order arbitrarily close to ${\cal O}(n^{-1/2})$, while ensuring strongly stable pathwise operation. For $p\equiv1$, the rate order improves to ${\cal O}(n^{-2/3})$, which also suffices for pathwise convergence, and matches previous results. 3) Likewise, in the general case of a convex cost, we show that, for any $\epsilon\in\left[0,1\right)$, the MESSAGEp algorithm with iterate smoothing achieves an ${\cal L}_{1}$ objective suboptimality rate of the order of ${\cal O}(n^{-\left(1-\epsilon\right)/\left(4\mathbf{1}_{\left\{ p>1\right\} }+4\right)})$ in the number of iterations. This result provides maximal rates of ${\cal O}(n^{-1/4})$, if $p\equiv1$, and ${\cal O}(n^{-1/8})$, if $p>1$, matching the state of the art, as well. |
Tasks | |
Published | 2018-04-02 |
URL | http://arxiv.org/abs/1804.00636v5 |
http://arxiv.org/pdf/1804.00636v5.pdf | |
PWC | https://paperswithcode.com/paper/recursive-optimization-of-convex-risk |
Repo | |
Framework | |
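For concreteness, the mean-upper-semideviation that the paper's mean-semideviation class strictly generalizes is easy to compute empirically: rho(X) = E[X] + c * E[((X - E[X])_+)^p]^(1/p) with c in [0, 1] and p >= 1. The sketch below (the sample sizes and the two Gaussian loss distributions are illustrative choices, not from the paper) shows that it penalizes dispersion above the mean.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_upper_semideviation(losses, c=0.5, p=1):
    """rho(X) = E[X] + c * E[((X - E[X])_+)^p]^(1/p), c in [0, 1], p >= 1.
    Empirical version of the classic risk measure; the paper's
    mean-semideviation class strictly generalizes it."""
    m = losses.mean()
    upper = np.maximum(losses - m, 0.0)  # only deviations ABOVE the mean
    return float(m + c * (upper ** p).mean() ** (1.0 / p))

# Two loss distributions with the same mean but different dispersion:
# a risk-averse measure prefers the concentrated one.
safe = rng.normal(1.0, 0.1, 100_000)
risky = rng.normal(1.0, 1.0, 100_000)

rho_safe = mean_upper_semideviation(safe, c=0.5, p=2)
rho_risky = mean_upper_semideviation(risky, c=0.5, p=2)
```

The MESSAGEp algorithm itself optimizes such compositional objectives by stochastic subgradient steps, which this static evaluation does not attempt to reproduce.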
RealPoint3D: Point Cloud Generation from a Single Image with Complex Background
Title | RealPoint3D: Point Cloud Generation from a Single Image with Complex Background |
Authors | Yan Xia, Yang Zhang, Dingfu Zhou, Xinyu Huang, Cheng Wang, Ruigang Yang |
Abstract | 3D point cloud generation from a single image by a deep neural network has been attracting more and more researchers’ attention. However, recently proposed methods require the objects to be captured with relatively clean backgrounds and fixed viewpoints, which severely limits their application in real environments. To overcome these drawbacks, we propose to integrate prior 3D shape knowledge into the network to guide the 3D generation. By taking additional 3D information, the proposed network can handle 3D object generation from a single real image captured from any viewpoint and with a complex background. Specifically, given a query image, we retrieve the nearest shape model from a pre-prepared 3D model database. Then, the image together with the retrieved shape model is fed into the proposed network to generate the fine-grained 3D point cloud. The effectiveness of our proposed framework has been verified on different kinds of datasets. Experimental results show that the proposed framework achieves state-of-the-art accuracy compared to other volumetric-based and point set generation methods. Furthermore, the proposed framework works well for real images with complex backgrounds and various view angles. |
Tasks | Point Cloud Generation |
Published | 2018-09-08 |
URL | http://arxiv.org/abs/1809.02743v1 |
http://arxiv.org/pdf/1809.02743v1.pdf | |
PWC | https://paperswithcode.com/paper/realpoint3d-point-cloud-generation-from-a |
Repo | |
Framework | |
Gradient Boosting With Piece-Wise Linear Regression Trees
Title | Gradient Boosting With Piece-Wise Linear Regression Trees |
Authors | Yu Shi, Jian Li, Zhize Li |
Abstract | Gradient Boosted Decision Trees (GBDT) is a very successful ensemble learning algorithm widely used across a variety of applications. Recently, several variants of GBDT training algorithms and implementations have been designed and heavily optimized in some very popular open-source toolkits including XGBoost, LightGBM and CatBoost. In this paper, we show that both the accuracy and efficiency of GBDT can be further enhanced by using more complex base learners. Specifically, we extend gradient boosting to use piecewise linear regression trees (PL Trees), instead of piecewise constant regression trees, as base learners. We show that PL Trees can accelerate the convergence of GBDT and improve its accuracy. We also propose some optimization tricks to substantially reduce the training time of PL Trees, with little sacrifice of accuracy. Moreover, we propose several implementation techniques to speed up our algorithm on modern computer architectures with powerful Single Instruction Multiple Data (SIMD) parallelism. The experimental results show that GBDT with PL Trees can provide very competitive testing accuracy with comparable or shorter training time. |
Tasks | |
Published | 2018-02-15 |
URL | https://arxiv.org/abs/1802.05640v3 |
https://arxiv.org/pdf/1802.05640v3.pdf | |
PWC | https://paperswithcode.com/paper/gradient-boosting-with-piece-wise-linear |
Repo | |
Framework | |
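The paper's key change to GBDT — linear models in the leaves instead of constants — can be seen on a toy example. The sketch below is a single boosting step with a depth-1 "PL tree" whose split point is fixed at the known kink; the real algorithm searches splits and runs many rounds, so the data, the split, and the unit learning rate are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression target with a kink at 0, which a piecewise *linear*
# leaf captures exactly while a piecewise-constant leaf cannot.
x = np.sort(rng.uniform(-1, 1, 200))
y = np.where(x < 0, -0.5 * x, 2.0 * x)

def fit_linear(xs, ys):
    """Least-squares line a*x + b for one leaf."""
    A = np.stack([xs, np.ones_like(xs)], axis=1)
    coef, *_ = np.linalg.lstsq(A, ys, rcond=None)
    return coef

def pl_stump_predict(x_tr, resid, x_eval, split=0.0):
    """Depth-1 'PL Tree': split at `split` (fixed here for illustration),
    then fit an independent linear model in each leaf."""
    left = x_tr < split
    a_l, b_l = fit_linear(x_tr[left], resid[left])
    a_r, b_r = fit_linear(x_tr[~left], resid[~left])
    return np.where(x_eval < split, a_l * x_eval + b_l, a_r * x_eval + b_r)

# One gradient-boosting step on squared loss (fit the residual), lr = 1.
pred = np.zeros_like(y)
pred = pred + pl_stump_predict(x, y - pred, x)
mse_pl = float(np.mean((y - pred) ** 2))

# Classic constant-leaf stump for comparison.
const_pred = np.where(x < 0, y[x < 0].mean(), y[x >= 0].mean())
mse_const = float(np.mean((y - const_pred) ** 2))
```

On this piecewise-linear target, one PL-tree round already reaches (numerically) zero error, while the constant-leaf stump would need many more rounds — the convergence-acceleration effect the abstract claims.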
S4ND: Single-Shot Single-Scale Lung Nodule Detection
Title | S4ND: Single-Shot Single-Scale Lung Nodule Detection |
Authors | Naji Khosravan, Ulas Bagci |
Abstract | State-of-the-art lung nodule detection studies rely on computationally expensive multi-stage frameworks to detect nodules from CT scans. To address this computational challenge and provide better performance, in this paper we propose S4ND, a new deep-learning-based method for lung nodule detection. Our approach uses a single feed-forward pass of a single network for detection and provides better performance when compared to the current literature. The whole detection pipeline is designed as a single $3D$ Convolutional Neural Network (CNN) with dense connections, trained in an end-to-end manner. S4ND does not require any further post-processing or user guidance to refine detection results. Experimentally, we compared our network with the current state-of-the-art object detection network (SSD) in computer vision as well as the state-of-the-art published method for lung nodule detection (3D DCNN). We used $888$ publicly available CT scans from the LUNA challenge dataset and showed that the proposed method outperforms the current literature both in terms of efficiency and accuracy, achieving an average FROC score of $0.897$. We also provide an in-depth analysis of our proposed network to shed light on the unclear paradigms of tiny object detection. |
Tasks | Lung Nodule Detection, Object Detection |
Published | 2018-05-06 |
URL | http://arxiv.org/abs/1805.02279v2 |
http://arxiv.org/pdf/1805.02279v2.pdf | |
PWC | https://paperswithcode.com/paper/s4nd-single-shot-single-scale-lung-nodule |
Repo | |
Framework | |
A family of OWA operators based on Faulhaber’s formulas
Title | A family of OWA operators based on Faulhaber’s formulas |
Authors | Oscar Duarte, Sandra Téllez |
Abstract | In this paper we develop a new family of Ordered Weighted Averaging (OWA) operators. The weight vector is obtained from a desired orness of the operator. Using Faulhaber’s formulas we obtain direct and simple expressions for the weight vector without any iteration loop. With the exception of one weight, the remaining weights follow a straight-line relation. As a result, a fast and robust algorithm is developed. The resulting weight vector is suboptimal according to the Maximum Entropy criterion, but it is very close to the optimal. Comparisons are made with other procedures. |
Tasks | |
Published | 2018-01-31 |
URL | http://arxiv.org/abs/1801.10545v1 |
http://arxiv.org/pdf/1801.10545v1.pdf | |
PWC | https://paperswithcode.com/paper/a-family-of-owa-operators-based-on-faulhabers |
Repo | |
Framework | |
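The two definitions the abstract relies on — the OWA aggregation itself and the orness of a weight vector — are short enough to state in code. This sketch implements the standard textbook definitions, not the paper's Faulhaber-based weight-generation formula.

```python
import numpy as np

def owa(values, weights):
    """OWA: the i-th weight multiplies the i-th *largest* value,
    so weights act on ranks, not on fixed argument positions."""
    v = np.sort(np.asarray(values, dtype=float))[::-1]
    return float(np.dot(weights, v))

def orness(weights):
    """orness(w) = (1/(n-1)) * sum_{i=1..n} (n - i) * w_i.
    1 -> max operator, 0.5 -> arithmetic mean, 0 -> min operator."""
    w = np.asarray(weights, dtype=float)
    n = w.size
    return float(np.sum((n - np.arange(1, n + 1)) * w) / (n - 1))

x = [3.0, 9.0, 6.0]
w_max = [1.0, 0.0, 0.0]        # all weight on the largest value
w_avg = [1 / 3, 1 / 3, 1 / 3]  # uniform weights
```

The paper's contribution is the inverse problem: given a target orness, produce the weight vector in closed form; the `orness` function above is what that construction must satisfy.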
Towards A Unified Analysis of Random Fourier Features
Title | Towards A Unified Analysis of Random Fourier Features |
Authors | Zhu Li, Jean-Francois Ton, Dino Oglic, Dino Sejdinovic |
Abstract | Random Fourier features is a widely used, simple, and effective technique for scaling up kernel methods. The existing theoretical analysis of the approach, however, remains focused on specific learning tasks and typically gives pessimistic bounds which are at odds with the empirical results. We tackle these problems and provide the first unified risk analysis of learning with random Fourier features using the squared error and Lipschitz continuous loss functions. In our bounds, the trade-off between the computational cost and the expected risk convergence rate is problem specific and expressed in terms of the regularization parameter and the \emph{number of effective degrees of freedom}. We study both the standard random Fourier features method for which we improve the existing bounds on the number of features required to guarantee the corresponding minimax risk convergence rate of kernel ridge regression, as well as a data-dependent modification which samples features proportional to \emph{ridge leverage scores} and further reduces the required number of features. As ridge leverage scores are expensive to compute, we devise a simple approximation scheme which provably reduces the computational cost without loss of statistical efficiency. |
Tasks | |
Published | 2018-06-24 |
URL | https://arxiv.org/abs/1806.09178v4 |
https://arxiv.org/pdf/1806.09178v4.pdf | |
PWC | https://paperswithcode.com/paper/towards-a-unified-analysis-of-random-fourier |
Repo | |
Framework | |
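The method under analysis, standard random Fourier features, is compact enough to sketch: sample frequencies from the Gaussian kernel's spectral density, and the feature map z(x) = sqrt(2/D) cos(W^T x + b) satisfies E[z(x)^T z(y)] = exp(-gamma ||x - y||^2). The dimensions and gamma below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def rff_features(X, n_features, gamma):
    """Random Fourier features for the Gaussian kernel
    k(x, y) = exp(-gamma * ||x - y||^2): by Bochner's theorem the
    frequencies are drawn from the spectral density N(0, 2*gamma*I)."""
    d = X.shape[1]
    W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

X = rng.standard_normal((50, 3))
gamma = 0.5
Z = rff_features(X, n_features=5000, gamma=gamma)

# Exact Gaussian kernel matrix vs. its low-dimensional approximation.
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K_exact = np.exp(-gamma * sq_dists)
K_approx = Z @ Z.T
max_err = float(np.abs(K_exact - K_approx).max())
```

The paper's question is how small `n_features` can be made (uniformly, or via ridge leverage score sampling) while preserving the learning rates of the exact kernel method.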
To understand deep learning we need to understand kernel learning
Title | To understand deep learning we need to understand kernel learning |
Authors | Mikhail Belkin, Siyuan Ma, Soumik Mandal |
Abstract | Generalization performance of classifiers in deep learning has recently become a subject of intense study. Deep models, typically over-parametrized, tend to fit the training data exactly. Despite this “overfitting”, they perform well on test data, a phenomenon not yet fully understood. The first point of our paper is that strong performance of overfitted classifiers is not a unique feature of deep learning. Using six real-world and two synthetic datasets, we establish experimentally that kernel machines trained to have zero classification or near-zero regression error perform very well on test data, even when the labels are corrupted with a high level of noise. We proceed to give a lower bound on the norm of zero-loss solutions for smooth kernels, showing that they increase nearly exponentially with data size. We point out that this is difficult to reconcile with the existing generalization bounds. Moreover, none of the bounds produce non-trivial results for interpolating solutions. Second, we show experimentally that (non-smooth) Laplacian kernels easily fit random labels, a finding that parallels results for ReLU neural networks. In contrast, fitting noisy data requires many more epochs for smooth Gaussian kernels. The similar performance of overfitted Laplacian and Gaussian classifiers on test data suggests that generalization is tied to the properties of the kernel function rather than the optimization process. Certain key phenomena of deep learning are manifested similarly in kernel methods in the modern “overfitted” regime. The combination of the experimental and theoretical results presented in this paper indicates a need for new theoretical ideas for understanding properties of classical kernel methods. We argue that progress on understanding deep learning will be difficult until more tractable “shallow” kernel methods are better understood. |
Tasks | |
Published | 2018-02-05 |
URL | http://arxiv.org/abs/1802.01396v3 |
http://arxiv.org/pdf/1802.01396v3.pdf | |
PWC | https://paperswithcode.com/paper/to-understand-deep-learning-we-need-to |
Repo | |
Framework | |
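One of the paper's experiments — a (non-smooth) Laplacian kernel machine driven to zero training error on random labels — is reproducible in a few lines of ridgeless kernel regression. The data sizes and bandwidth below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random inputs with completely random +/-1 labels: nothing to "learn".
X = rng.standard_normal((80, 5))
y = rng.choice([-1.0, 1.0], size=80)

def laplacian_kernel(A, B, sigma=2.0):
    """k(x, y) = exp(-||x - y|| / sigma); non-smooth at x = y."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return np.exp(-d / sigma)

# Ridgeless ("interpolating") kernel regression: solve K alpha = y with
# no regularization, so the machine fits even random labels exactly.
K = laplacian_kernel(X, X)
alpha = np.linalg.solve(K, y)
train_err = float(np.max(np.abs(K @ alpha - y)))
```

A direct solve reaches the interpolating solution in one step; the paper's contrast is that gradient-based training with a smooth Gaussian kernel needs many more epochs to get there.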
DenseRAN for Offline Handwritten Chinese Character Recognition
Title | DenseRAN for Offline Handwritten Chinese Character Recognition |
Authors | Wenchao Wang, Jianshu Zhang, Jun Du, Zi-Rui Wang, Yixing Zhu |
Abstract | Recently, great success has been achieved in offline handwritten Chinese character recognition by using deep learning methods. Chinese characters are mainly logographic and consist of basic radicals; however, previous research mostly treated each Chinese character as a whole without explicitly considering its internal two-dimensional structure and radicals. In this study, we propose a novel radical analysis network with a densely connected architecture (DenseRAN) to analyze Chinese character radicals and their two-dimensional structures simultaneously. DenseRAN first encodes the input image into high-level visual features by employing DenseNet as an encoder. Then a decoder based on recurrent neural networks is employed, aiming at generating captions of Chinese characters by detecting radicals and two-dimensional structures through an attention mechanism. Treating a Chinese character as a composition of two-dimensional structures and radicals reduces the size of the vocabulary and enables DenseRAN to recognize unseen Chinese character classes, provided the corresponding radicals have been seen in the training set. Evaluated on the ICDAR-2013 competition database, the proposed approach significantly outperforms the whole-character modeling approach with a relative character error rate (CER) reduction of 18.54%. Meanwhile, for the case of recognizing 3277 unseen Chinese characters in the CASIA-HWDB1.2 database, DenseRAN can achieve a character accuracy of about 41%, while the traditional whole-character method has no capability to handle them. |
Tasks | Offline Handwritten Chinese Character Recognition |
Published | 2018-08-13 |
URL | http://arxiv.org/abs/1808.04134v1 |
http://arxiv.org/pdf/1808.04134v1.pdf | |
PWC | https://paperswithcode.com/paper/denseran-for-offline-handwritten-chinese |
Repo | |
Framework | |
The EcoLexicon English Corpus as an open corpus in Sketch Engine
Title | The EcoLexicon English Corpus as an open corpus in Sketch Engine |
Authors | Pilar Leon-Arauz, Antonio San Martin, Arianne Reimerink |
Abstract | The EcoLexicon English Corpus (EEC) is a 23.1-million-word corpus of contemporary environmental texts. It was compiled by the LexiCon research group for the development of EcoLexicon (Faber, Leon-Arauz & Reimerink 2016; San Martin et al. 2017), a terminological knowledge base on the environment. It is available as an open corpus in the well-known corpus query system Sketch Engine (Kilgarriff et al. 2014), which means that any user, even without a subscription, can freely access and query the corpus. In this paper, the EEC is introduced by describing how it was built and compiled and how it can be queried and exploited, based both on the functionalities provided by Sketch Engine and on the parameters according to which the texts in the EEC are classified. |
Tasks | |
Published | 2018-07-16 |
URL | http://arxiv.org/abs/1807.05797v1 |
http://arxiv.org/pdf/1807.05797v1.pdf | |
PWC | https://paperswithcode.com/paper/the-ecolexicon-english-corpus-as-an-open |
Repo | |
Framework | |