Paper Group ANR 673
Detection of Moving Object in Dynamic Background Using Gaussian Max-Pooling and Segmentation Constrained RPCA
Title | Detection of Moving Object in Dynamic Background Using Gaussian Max-Pooling and Segmentation Constrained RPCA |
Authors | Yang Li, Guangcan Liu, Shengyong Chen |
Abstract | Due to its efficiency and stability, Robust Principal Component Analysis (RPCA) has been emerging as a promising tool for moving object detection. Unfortunately, existing RPCA-based methods assume a static or quasi-static background, and thereby may have trouble coping with background scenes that exhibit persistent dynamic behavior. In this work, we introduce two techniques to fill this gap. First, instead of using raw pixel values as features, which are brittle in the presence of a dynamic background, we devise a so-called Gaussian max-pooling operator to estimate a “stable value” for each pixel. These stable values are robust to various background changes and can therefore effectively distinguish the foreground objects from the background. Then, to obtain more accurate results, we further propose a Segmentation Constrained RPCA (SC-RPCA) model, which incorporates the temporal and spatial continuity of images into RPCA. The inference process of SC-RPCA is a group-sparsity-constrained nuclear norm minimization problem, which is convex and easy to solve. Experimental results on seven videos from the CDnet 2014 database show the superior performance of the proposed method. |
Tasks | Object Detection |
Published | 2017-09-03 |
URL | http://arxiv.org/abs/1709.00657v1 |
| http://arxiv.org/pdf/1709.00657v1.pdf |
PWC | https://paperswithcode.com/paper/detection-of-moving-object-in-dynamic |
Repo | |
Framework | |
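For intuition, the per-pixel "stable value" above can be sketched as a temporal mode estimate under a Gaussian kernel; this is one plausible reading of the Gaussian max-pooling operator, not the paper's exact definition, and `bandwidth` is a hypothetical parameter:

```python
import numpy as np

def stable_values(frames, bandwidth=10.0):
    """Per-pixel 'stable value' for a (T, H, W) stack of grayscale frames.

    Sketch: for each pixel, pick the temporal sample with the highest
    Gaussian-kernel density over that pixel's history (a crude mode
    estimate). O(T^2 * H * W) memory, so suitable only for short clips.
    """
    T, H, W = frames.shape
    x = frames.reshape(T, -1).astype(np.float64)                     # (T, H*W)
    diff = x[:, None, :] - x[None, :, :]                             # (T, T, H*W)
    dens = np.exp(-(diff ** 2) / (2 * bandwidth ** 2)).sum(axis=1)   # (T, H*W)
    idx = dens.argmax(axis=0)                                        # densest sample per pixel
    return x[idx, np.arange(H * W)].reshape(H, W)
```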
K-Means Clustering using Tabu Search with Quantized Means
Title | K-Means Clustering using Tabu Search with Quantized Means |
Authors | Kojo Sarfo Gyamfi, James Brusey, Andrew Hunt |
Abstract | The Tabu Search (TS) metaheuristic has been proposed for K-Means clustering as an alternative to Lloyd’s algorithm, which, for all its ease of implementation and fast runtime, has the major drawback of being trapped at local optima. While the TS approach can yield superior performance, it involves a high computational complexity. Moreover, the difficulty of parameter selection in the existing TS approach further limits its attractiveness. This paper presents an alternative, low-complexity formulation of the TS optimization procedure for K-Means clustering that requires few parameter settings. We initially constrain the centers to points in the dataset. We then evolve these centers using a unique neighborhood structure that makes use of gradient information of the objective function. This results in an efficient exploration of the search space, after which the means are refined. The proposed scheme is implemented in MATLAB and tested on four real-world datasets, and it achieves a significant improvement over the existing TS approach in terms of the intra-cluster sum of squares and computational time. |
Tasks | Efficient Exploration |
Published | 2017-03-24 |
URL | http://arxiv.org/abs/1703.08440v1 |
| http://arxiv.org/pdf/1703.08440v1.pdf |
PWC | https://paperswithcode.com/paper/k-means-clustering-using-tabu-search-with |
Repo | |
Framework | |
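A minimal sketch of tabu search for K-Means with centers quantized to data points is given below; it uses a random candidate neighborhood rather than the paper's gradient-informed one, and omits the final refinement of the means:

```python
import numpy as np

def wcss(X, centers):
    """Within-cluster sum of squares (the intra-cluster objective)."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return d.min(axis=1).sum()

def tabu_kmeans(X, k, iters=200, tabu_len=10, n_cand=20, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    current = list(rng.choice(n, size=k, replace=False))   # centers = data points
    best, best_cost = list(current), wcss(X, X[current])
    tabu = []
    for _ in range(iters):
        i = int(rng.integers(k))                           # which center to move
        cand = [j for j in rng.choice(n, size=min(n_cand, n), replace=False)
                if j not in current and j not in tabu]
        if not cand:
            continue
        cost, j = min((wcss(X, X[current[:i] + [j] + current[i + 1:]]), j)
                      for j in cand)                       # best non-tabu neighbor
        tabu.append(current[i]); tabu[:] = tabu[-tabu_len:]  # forbid revisits
        current[i] = j
        if cost < best_cost:
            best, best_cost = list(current), cost
    return X[best], best_cost
```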
Learning to select computations
Title | Learning to select computations |
Authors | Frederick Callaway, Sayan Gul, Paul M. Krueger, Thomas L. Griffiths, Falk Lieder |
Abstract | The efficient use of limited computational resources is an essential ingredient of intelligence. Selecting computations optimally according to rational metareasoning would achieve this, but this is computationally intractable. Inspired by psychology and neuroscience, we propose the first concrete and domain-general learning algorithm for approximating the optimal selection of computations: Bayesian metalevel policy search (BMPS). We derive this general, sample-efficient search algorithm for a computation-selecting metalevel policy based on the insight that the value of information lies between the myopic value of information and the value of perfect information. We evaluate BMPS on three increasingly difficult metareasoning problems: when to terminate computation, how to allocate computation between competing options, and planning. Across all three domains, BMPS achieved near-optimal performance and compared favorably to previously proposed metareasoning heuristics. Finally, we demonstrate the practical utility of BMPS in an emergency management scenario, even accounting for the overhead of metareasoning. |
Tasks | |
Published | 2017-11-18 |
URL | http://arxiv.org/abs/1711.06892v3 |
| http://arxiv.org/pdf/1711.06892v3.pdf |
PWC | https://paperswithcode.com/paper/learning-to-select-computations |
Repo | |
Framework | |
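The abstract's key insight, that the value of a computation is bracketed by the myopic VOI and the value of perfect information, suggests a metalevel policy scoring candidate computations with a learned weighting of those features against cost. A hedged sketch (feature extraction and the policy search over `w` are omitted):

```python
import numpy as np

def metalevel_value(features, w):
    """Score one candidate computation by a linear combination of VOI features.

    Per the abstract, the true value of computation lies between the myopic
    VOI and the VPI, so a policy over these features plus cost is a natural
    parameterization. The feature set here is an assumption.
    """
    cost, voi_myopic, vpi = features
    return -cost + w[0] * voi_myopic + w[1] * vpi

def select_computation(candidates, w):
    # candidates: list of (cost, myopic VOI, VPI) tuples; pick the best one.
    return int(np.argmax([metalevel_value(f, w) for f in candidates]))
```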
RADNET: Radiologist Level Accuracy using Deep Learning for HEMORRHAGE detection in CT Scans
Title | RADNET: Radiologist Level Accuracy using Deep Learning for HEMORRHAGE detection in CT Scans |
Authors | Monika Grewal, Muktabh Mayank Srivastava, Pulkit Kumar, Srikrishna Varadarajan |
Abstract | We describe a deep learning approach for automated brain hemorrhage detection from computed tomography (CT) scans. Our model emulates the procedure radiologists follow to analyse a 3D CT scan in the real world. Similar to radiologists, the model sifts through 2D cross-sectional slices while paying close attention to potential hemorrhagic regions. Further, the model utilizes 3D context from neighboring slices to improve predictions at each slice and subsequently aggregates the slice-level predictions to provide a diagnosis at the CT level. We refer to our proposed approach as Recurrent Attention DenseNet (RADnet), as it employs the original DenseNet architecture augmented with attention components for slice-level predictions and a recurrent neural network layer for incorporating 3D context. The real-world performance of RADnet has been benchmarked against independent analysis performed by three senior radiologists on 77 brain CTs. RADnet demonstrates 81.82% hemorrhage prediction accuracy at the CT level, which is comparable to the radiologists. Further, RADnet achieves higher recall than two of the three radiologists, which is remarkable. |
Tasks | Computed Tomography (CT) |
Published | 2017-10-13 |
URL | http://arxiv.org/abs/1710.04934v2 |
| http://arxiv.org/pdf/1710.04934v2.pdf |
PWC | https://paperswithcode.com/paper/radnet-radiologist-level-accuracy-using-deep |
Repo | |
Framework | |
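A hypothetical PyTorch skeleton of this slice-then-context design is shown below; the toy CNN stands in for the DenseNet-plus-attention backbone, and the max-aggregation of slice predictions is an assumption:

```python
import torch
import torch.nn as nn

class RADnetSketch(nn.Module):
    """Sketch: per-slice CNN features, a bidirectional LSTM over neighboring
    slices for 3D context, per-slice sigmoid logits, and a CT-level
    aggregate. Not the paper's exact architecture."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(            # stand-in for DenseNet + attention
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.context = nn.LSTM(feat_dim, feat_dim,
                               bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * feat_dim, 1)

    def forward(self, scan):                     # scan: (B, S, 1, H, W) slices
        B, S = scan.shape[:2]
        f = self.encoder(scan.flatten(0, 1)).view(B, S, -1)
        f, _ = self.context(f)                   # 3D context across slices
        slice_logits = self.head(f).squeeze(-1)  # (B, S)
        ct_logit = slice_logits.max(dim=1).values  # any positive slice => positive CT
        return slice_logits, ct_logit
```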
Video Object Detection with an Aligned Spatial-Temporal Memory
Title | Video Object Detection with an Aligned Spatial-Temporal Memory |
Authors | Fanyi Xiao, Yong Jae Lee |
Abstract | We introduce Spatial-Temporal Memory Networks for video object detection. At its core, a novel Spatial-Temporal Memory module (STMM) serves as the recurrent computation unit to model long-term temporal appearance and motion dynamics. The STMM’s design enables full integration of pretrained backbone CNN weights, which we find to be critical for accurate detection. Furthermore, in order to tackle object motion in videos, we propose a novel MatchTrans module to align the spatial-temporal memory from frame to frame. Our method produces state-of-the-art results on the benchmark ImageNet VID dataset, and our ablative studies clearly demonstrate the contribution of our different design choices. We release our code and models at http://fanyix.cs.ucdavis.edu/project/stmn/project.html. |
Tasks | Object Detection, Video Object Detection |
Published | 2017-12-18 |
URL | http://arxiv.org/abs/1712.06317v3 |
| http://arxiv.org/pdf/1712.06317v3.pdf |
PWC | https://paperswithcode.com/paper/video-object-detection-with-an-aligned |
Repo | |
Framework | |
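As a rough stand-in for the STMM, the sketch below implements a ConvGRU-style recurrent update over spatial feature maps; the paper's actual cell and the MatchTrans alignment of the memory differ in detail:

```python
import torch
import torch.nn as nn

class STMMSketch(nn.Module):
    """ConvGRU-style spatial memory update (a stand-in for the STMM).
    MatchTrans would warp `mem` to the current frame before this update."""
    def __init__(self, ch):
        super().__init__()
        self.gates = nn.Conv2d(2 * ch, 2 * ch, 3, padding=1)
        self.cand = nn.Conv2d(2 * ch, ch, 3, padding=1)

    def forward(self, x, mem):                   # x, mem: (B, ch, H, W)
        zr = torch.sigmoid(self.gates(torch.cat([x, mem], dim=1)))
        z, r = zr.chunk(2, dim=1)                # update / reset gates
        cand = torch.relu(self.cand(torch.cat([x, r * mem], dim=1)))
        return (1 - z) * mem + z * cand          # updated spatial-temporal memory
```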
Opportunistic Self Organizing Migrating Algorithm for Real-Time Dynamic Traveling Salesman Problem
Title | Opportunistic Self Organizing Migrating Algorithm for Real-Time Dynamic Traveling Salesman Problem |
Authors | Shubham Dokania, Sunyam Bagga, Rohit Sharma |
Abstract | The Self Organizing Migrating Algorithm (SOMA) is a meta-heuristic algorithm based on the self-organizing behavior of individuals in a simulated social environment. SOMA performs iterative computations on a population of potential solutions in the given search space to obtain an optimal solution. In this paper, an Opportunistic Self Organizing Migrating Algorithm (OSOMA) is proposed that introduces a novel strategy to generate perturbations effectively. This strategy allows individuals to span more possible solutions and is thus able to produce better solutions. A comprehensive analysis of OSOMA on multi-dimensional unconstrained benchmark test functions is performed. OSOMA is then applied to solve the real-time Dynamic Traveling Salesman Problem (DTSP). The real-time DTSP has been formulated and simulated using real-time data from Google Maps with a varying cost metric between any two cities. Although DTSP is a very common and intuitive model of the real world, its presence in the literature is still very limited. OSOMA performs exceptionally well on the problems mentioned above. To substantiate this claim, the performance of OSOMA is compared with SOMA, Differential Evolution and Particle Swarm Optimization. |
Tasks | |
Published | 2017-09-12 |
URL | http://arxiv.org/abs/1709.03793v1 |
| http://arxiv.org/pdf/1709.03793v1.pdf |
PWC | https://paperswithcode.com/paper/opportunistic-self-organizing-migrating |
Repo | |
Framework | |
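For reference, one migration loop of the base SOMA (AllToOne variant) that OSOMA builds on can be sketched as follows; OSOMA's opportunistic perturbation strategy itself is not reproduced here:

```python
import numpy as np

def soma_migrate(pop, fitness, step=0.11, path_len=3.0, prt=0.3, rng=None):
    """One SOMA AllToOne migration: each individual jumps toward the current
    leader along a PRT-masked direction and keeps the best position found."""
    rng = rng or np.random.default_rng()
    costs = np.array([fitness(ind) for ind in pop])
    leader = pop[costs.argmin()]
    new_pop = pop.copy()
    for i, x in enumerate(pop):
        best_x, best_c = x, costs[i]
        for t in np.arange(step, path_len + step, step):
            mask = (rng.random(x.shape) < prt).astype(float)  # PRT vector
            trial = x + (leader - x) * t * mask
            c = fitness(trial)
            if c < best_c:
                best_x, best_c = trial, c
        new_pop[i] = best_x
    return new_pop
```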
Investigating how well contextual features are captured by bi-directional recurrent neural network models
Title | Investigating how well contextual features are captured by bi-directional recurrent neural network models |
Authors | Kushal Chawla, Sunil Kumar Sahu, Ashish Anand |
Abstract | Learning algorithms for natural language processing (NLP) tasks traditionally rely on manually defined relevant contextual features. On the other hand, neural network models using only a distributional representation of words have been successfully applied to several NLP tasks. Such models learn features automatically and avoid explicit feature engineering. Across several domains, neural models become a natural choice, specifically when limited characteristics of the data are known. However, this flexibility comes at the cost of interpretability. In this paper, we define three different methods to investigate the ability of bi-directional recurrent neural networks (RNNs) to capture contextual features. In particular, we analyze RNNs for sequence tagging tasks. We perform a comprehensive analysis on general as well as biomedical domain datasets. Our experiments focus on important contextual words as features, which can easily be extended to analyze various other feature types. We also investigate positional effects of context words and show how the developed methods can be used for error analysis. |
Tasks | Feature Engineering |
Published | 2017-09-03 |
URL | http://arxiv.org/abs/1709.00659v2 |
| http://arxiv.org/pdf/1709.00659v2.pdf |
PWC | https://paperswithcode.com/paper/investigating-how-well-contextual-features |
Repo | |
Framework | |
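One generic way to probe contextual features of this kind is an occlusion test: mask each context word and measure the drop in the tagger's probability for the tag of interest. A sketch under that assumption (the paper's three methods may differ):

```python
import torch

def context_importance(model, embeds, position, tag_index):
    """How much does zeroing each context word change the model's probability
    for the tag at `position`? embeds: (1, T, D) input embeddings; `model`
    is assumed to return (1, T, n_tags) logits."""
    with torch.no_grad():
        base = model(embeds).softmax(-1)[0, position, tag_index]
        scores = []
        for j in range(embeds.size(1)):
            masked = embeds.clone()
            masked[0, j] = 0.0                    # occlude context word j
            p = model(masked).softmax(-1)[0, position, tag_index]
            scores.append((base - p).item())      # drop in tag probability
    return scores
```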
Pix2face: Direct 3D Face Model Estimation
Title | Pix2face: Direct 3D Face Model Estimation |
Authors | Daniel Crispell, Maxim Bazik |
Abstract | An efficient, fully automatic method for 3D face shape and pose estimation in unconstrained 2D imagery is presented. The proposed method jointly estimates a dense set of 3D landmarks and facial geometry using a single pass of a modified version of the popular “U-Net” neural network architecture. Additionally, we propose a method for directly estimating a set of 3D Morphable Model (3DMM) parameters, using the estimated 3D landmarks and geometry as constraints in a simple linear system. Qualitative modeling results are presented, as well as quantitative evaluation of predicted 3D face landmarks in unconstrained video sequences. |
Tasks | Pose Estimation |
Published | 2017-08-29 |
URL | http://arxiv.org/abs/1708.09006v1 |
| http://arxiv.org/pdf/1708.09006v1.pdf |
PWC | https://paperswithcode.com/paper/pix2face-direct-3d-face-model-estimation |
Repo | |
Framework | |
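The "simple linear system" for 3DMM parameters can be sketched as a ridge-regularized least-squares fit of basis coefficients to the estimated dense landmarks; the regularizer and exact constraint setup here are assumptions:

```python
import numpy as np

def fit_3dmm(mean_shape, basis, observed, lam=1e-3):
    """Solve min_a ||mean_shape + basis @ a - observed||^2 + lam * ||a||^2,
    a linear system in the spirit of the paper's direct parameter estimation.

    mean_shape, observed: (3N,) flattened 3D point sets; basis: (3N, K).
    """
    A = basis.T @ basis + lam * np.eye(basis.shape[1])
    b = basis.T @ (observed - mean_shape)
    return np.linalg.solve(A, b)   # 3DMM coefficients a, shape (K,)
```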
DeepVoting: A Robust and Explainable Deep Network for Semantic Part Detection under Partial Occlusion
Title | DeepVoting: A Robust and Explainable Deep Network for Semantic Part Detection under Partial Occlusion |
Authors | Zhishuai Zhang, Cihang Xie, Jianyu Wang, Lingxi Xie, Alan L. Yuille |
Abstract | In this paper, we study the task of detecting semantic parts of an object, e.g., a wheel of a car, under partial occlusion. We propose that all models should be trained without seeing occlusions while being able to transfer the learned knowledge to deal with occlusions. This setting alleviates the difficulty of collecting an exponentially large dataset covering all occlusion patterns and better reflects real-world conditions. In this scenario, proposal-based deep networks, like the RCNN series, often produce unsatisfactory results, because both the proposal extraction and classification stages may be confused by irrelevant occluders. To address this, [25] proposed a voting mechanism that combines multiple local visual cues to detect semantic parts, so that parts can still be detected even when some visual cues are missing due to occlusions. However, this method is manually designed and thus hard to optimize in an end-to-end manner. In this paper, we present DeepVoting, which incorporates the robustness shown by [25] into a deep network, so that the whole pipeline can be jointly optimized. Specifically, it adds two layers after the intermediate features of a deep network, e.g., the pool-4 layer of VGGNet. The first layer extracts the evidence of local visual cues, and the second layer performs a voting mechanism that utilizes the spatial relationship between visual cues and semantic parts. We also propose an improved version, DeepVoting+, which learns visual cues from context outside objects. In experiments, DeepVoting achieves significantly better performance than several baseline methods, including Faster-RCNN, for semantic part detection under occlusion. In addition, DeepVoting enjoys explainability, as the detection results can be diagnosed by inspecting the voting cues. |
Tasks | |
Published | 2017-09-14 |
URL | http://arxiv.org/abs/1709.04577v2 |
| http://arxiv.org/pdf/1709.04577v2.pdf |
PWC | https://paperswithcode.com/paper/deepvoting-a-robust-and-explainable-deep |
Repo | |
Framework | |
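The two added layers can be sketched as a 1x1 convolution that extracts cue evidence followed by a large-kernel convolution whose spatial support lets cues vote for parts at offsets; kernel sizes and channel counts below are illustrative, not the paper's:

```python
import torch.nn as nn

class VotingHead(nn.Module):
    """Sketch of the two layers DeepVoting appends to an intermediate feature
    map (e.g. VGGNet pool-4): evidence extraction, then spatial voting."""
    def __init__(self, in_ch=512, n_cues=256, n_parts=39):
        super().__init__()
        self.evidence = nn.Conv2d(in_ch, n_cues, kernel_size=1)
        self.vote = nn.Conv2d(n_cues, n_parts, kernel_size=15, padding=7)

    def forward(self, feat):                 # feat: (B, in_ch, H, W)
        return self.vote(self.evidence(feat).relu())  # part heat maps
```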
Parseval Networks: Improving Robustness to Adversarial Examples
Title | Parseval Networks: Improving Robustness to Adversarial Examples |
Authors | Moustapha Cisse, Piotr Bojanowski, Edouard Grave, Yann Dauphin, Nicolas Usunier |
Abstract | We introduce Parseval networks, a form of deep neural networks in which the Lipschitz constant of linear, convolutional and aggregation layers is constrained to be smaller than 1. Parseval networks are empirically and theoretically motivated by an analysis of the robustness of the predictions made by deep neural networks when their input is subject to an adversarial perturbation. The most important feature of Parseval networks is to maintain weight matrices of linear and convolutional layers to be (approximately) Parseval tight frames, which are extensions of orthogonal matrices to non-square matrices. We describe how these constraints can be maintained efficiently during SGD. We show that Parseval networks match the state-of-the-art in terms of accuracy on CIFAR-10/100 and Street View House Numbers (SVHN) while being more robust than their vanilla counterpart against adversarial examples. Incidentally, Parseval networks also tend to train faster and make better use of the full capacity of the networks. |
Tasks | |
Published | 2017-04-28 |
URL | http://arxiv.org/abs/1704.08847v2 |
| http://arxiv.org/pdf/1704.08847v2.pdf |
PWC | https://paperswithcode.com/paper/parseval-networks-improving-robustness-to |
Repo | |
Framework | |
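The Parseval constraint is maintained by a cheap retraction applied after each gradient update; a minimal sketch (for convolutional layers the paper applies this to the unfolded kernel matrix, and `beta` is a small hyperparameter whose value here is illustrative):

```python
import torch

def parseval_retraction(W, beta=1e-3):
    """One retraction step toward an (approximate) Parseval tight frame:
    W <- (1 + beta) * W - beta * W @ W.T @ W, applied after each SGD update."""
    with torch.no_grad():
        W.copy_((1 + beta) * W - beta * (W @ W.t() @ W))
```

Because the step only nudges W, it costs a couple of matrix products per layer rather than a full orthogonalization, which is what makes the constraint cheap to keep during SGD.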
Big Universe, Big Data: Machine Learning and Image Analysis for Astronomy
Title | Big Universe, Big Data: Machine Learning and Image Analysis for Astronomy |
Authors | Jan Kremer, Kristoffer Stensbo-Smidt, Fabian Gieseke, Kim Steenstrup Pedersen, Christian Igel |
Abstract | Astrophysics and cosmology are rich with data. The advent of wide-area digital cameras on large aperture telescopes has led to ever more ambitious surveys of the sky. Data volumes of entire surveys a decade ago can now be acquired in a single night and real-time analysis is often desired. Thus, modern astronomy requires big data know-how, in particular it demands highly efficient machine learning and image analysis algorithms. But scalability is not the only challenge: Astronomy applications touch several current machine learning research questions, such as learning from biased data and dealing with label and measurement noise. We argue that this makes astronomy a great domain for computer science research, as it pushes the boundaries of data analysis. In the following, we will present this exciting application area for data scientists. We will focus on exemplary results, discuss main challenges, and highlight some recent methodological advancements in machine learning and image analysis triggered by astronomical applications. |
Tasks | |
Published | 2017-04-15 |
URL | http://arxiv.org/abs/1704.04650v1 |
| http://arxiv.org/pdf/1704.04650v1.pdf |
PWC | https://paperswithcode.com/paper/big-universe-big-data-machine-learning-and |
Repo | |
Framework | |
Stable Distribution Alignment Using the Dual of the Adversarial Distance
Title | Stable Distribution Alignment Using the Dual of the Adversarial Distance |
Authors | Ben Usman, Kate Saenko, Brian Kulis |
Abstract | Methods that align distributions by minimizing an adversarial distance between them have recently achieved impressive results. However, these approaches are difficult to optimize with gradient descent and they often do not converge well without careful hyperparameter tuning and proper initialization. We investigate whether turning the adversarial min-max problem into an optimization problem by replacing the maximization part with its dual improves the quality of the resulting alignment and explore its connections to Maximum Mean Discrepancy. Our empirical results suggest that using the dual formulation for the restricted family of linear discriminators results in a more stable convergence to a desirable solution when compared with the performance of a primal min-max GAN-like objective and an MMD objective under the same restrictions. We test our hypothesis on the problem of aligning two synthetic point clouds on a plane and on a real-image domain adaptation problem on digits. In both cases, the dual formulation yields an iterative procedure that gives more stable and monotonic improvement over time. |
Tasks | Domain Adaptation |
Published | 2017-07-13 |
URL | http://arxiv.org/abs/1707.04046v4 |
| http://arxiv.org/pdf/1707.04046v4.pdf |
PWC | https://paperswithcode.com/paper/stable-distribution-alignment-using-the-dual |
Repo | |
Framework | |
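For the restricted family of norm-bounded linear discriminators, the inner maximization admits a closed form, the distance between feature means (a linear-kernel MMD), so alignment becomes a plain minimization with no inner gradient loop. A sketch of that idea, not the paper's exact dual program:

```python
import torch

def linear_dual_distance(x_src, x_tgt):
    """Closed-form stand-in for the inner max with a norm-bounded linear
    discriminator: the adversarial distance collapses to the distance
    between feature means. Minimizing it aligns the two distributions.

    x_src, x_tgt: (N, D) and (M, D) batches of (embedded) samples.
    """
    return (x_src.mean(dim=0) - x_tgt.mean(dim=0)).norm()
```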
Small Boxes Big Data: A Deep Learning Approach to Optimize Variable Sized Bin Packing
Title | Small Boxes Big Data: A Deep Learning Approach to Optimize Variable Sized Bin Packing |
Authors | Feng Mao, Edgar Blanco, Mingang Fu, Rohit Jain, Anurag Gupta, Sebastien Mancel, Rong Yuan, Stephen Guo, Sai Kumar, Yayang Tian |
Abstract | Bin packing problems have been widely studied because of their broad applications in different domains. Known as a set of NP-hard problems, they have many variations, and many heuristics have been proposed for obtaining approximate solutions. Specifically, for the 1D variable sized bin packing problem, the two key sets of optimization heuristics are bin assignment and bin allocation. Usually the performance of a single static optimization heuristic cannot beat that of a dynamic one tailored to each bin packing instance. Building such an adaptive system requires modeling the relationship between bin features and packing performance profiles. The primary drawbacks of traditional machine learning approaches for this task are the natural limitations of feature engineering, such as the curse of dimensionality and feature selection quality. We introduce a deep learning approach that overcomes these drawbacks by applying a large training data set, automatic feature selection and fast, accurate labeling. We show in this paper how to build such a system through both theoretical formulation and engineering practice. Our prediction system achieves up to 89% training accuracy and 72% validation accuracy in selecting the best heuristic, which can generate a better quality bin packing solution. |
Tasks | Feature Engineering, Feature Selection |
Published | 2017-02-14 |
URL | http://arxiv.org/abs/1702.04415v1 |
| http://arxiv.org/pdf/1702.04415v1.pdf |
PWC | https://paperswithcode.com/paper/small-boxes-big-data-a-deep-learning-approach |
Repo | |
Framework | |
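The prediction system can be pictured as a small classifier from instance features to the best heuristic; the feature and label definitions below are assumptions, not the paper's exact setup:

```python
import torch.nn as nn

class HeuristicSelector(nn.Module):
    """Toy heuristic-selection network: maps features of a variable-sized bin
    packing instance (e.g. item-size statistics, bin-type capacities) to the
    heuristic expected to yield the best packing."""
    def __init__(self, n_features=32, n_heuristics=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, n_heuristics))   # one logit per candidate heuristic

    def forward(self, x):
        return self.net(x)                 # argmax picks the heuristic to run
```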
Fast learning rate of deep learning via a kernel perspective
Title | Fast learning rate of deep learning via a kernel perspective |
Authors | Taiji Suzuki |
Abstract | We develop a new theoretical framework to analyze the generalization error of deep learning, and derive a new fast learning rate for two representative algorithms: empirical risk minimization and Bayesian deep learning. The series of theoretical analyses of deep learning has revealed its high expressive power and universal approximation capability. Although these analyses are highly nonparametric, existing generalization error analyses have been developed mainly for fixed-dimensional parametric models. To bridge this gap, we develop an infinite-dimensional model based on an integral form, as used in analyses of the universal approximation capability. This allows us to define a reproducing kernel Hilbert space corresponding to each layer. Our point of view is to treat the ordinary finite-dimensional deep neural network as a finite approximation of the infinite-dimensional one. The approximation error is evaluated by the degree of freedom of the reproducing kernel Hilbert space in each layer. To estimate a good finite-dimensional model, we consider both empirical risk minimization and Bayesian deep learning. We derive their generalization error bounds and show that a bias-variance trade-off appears in terms of the number of parameters of the finite-dimensional approximation. We show that the optimal width of the internal layers can be determined through the degree of freedom, and that the convergence rate can be faster than the $O(1/\sqrt{n})$ rate shown in existing studies. |
Tasks | |
Published | 2017-05-29 |
URL | http://arxiv.org/abs/1705.10182v1 |
| http://arxiv.org/pdf/1705.10182v1.pdf |
PWC | https://paperswithcode.com/paper/fast-learning-rate-of-deep-learning-via-a |
Repo | |
Framework | |
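For context, the degree-of-freedom quantity from kernel regression theory, which this framework applies layer-wise to size the finite approximation, is typically defined as follows (our notation; the paper's definition may differ in detail):

```latex
% Degrees of freedom of the layer-l RKHS at regularization level lambda:
N_\ell(\lambda) \;=\; \operatorname{tr}\!\left[\,\Sigma_\ell\,(\Sigma_\ell + \lambda I)^{-1}\right],
```

where $\Sigma_\ell$ is the covariance (kernel integral) operator of the $\ell$-th layer's reproducing kernel Hilbert space and $\lambda > 0$ is a regularization parameter; faster spectral decay of $\Sigma_\ell$ means fewer effective parameters are needed per layer.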
Object Detection by Spatio-Temporal Analysis and Tracking of the Detected Objects in a Video with Variable Background
Title | Object Detection by Spatio-Temporal Analysis and Tracking of the Detected Objects in a Video with Variable Background |
Authors | Kumar S. Ray, Vijayan K. Asari, Soma Chakraborty |
Abstract | In this paper we propose a novel approach for detecting and tracking objects in videos with a variable background, i.e., videos captured by moving cameras without any additional sensor. In a video captured by a moving camera, both the background and the foreground change in each frame of the image sequence, so modeling a single background with traditional background modeling methods is infeasible, and detecting the actual moving object in a variable background is a challenging task. To detect the actual moving object, spatio-temporal blobs are generated in each frame by spatio-temporal analysis of the image sequence using a three-dimensional Gabor filter. Individual blobs that are parts of one object are then merged using a Minimum Spanning Tree to form the moving object in the variable background. The height, width and four-bin gray-value histogram of the object are calculated as its features, and the object is tracked in each frame using these features to generate its trajectories through the video sequence. In this work, the data association problem during tracking is solved as a Linear Assignment Problem, and occlusion is handled by the application of a Kalman filter. The major advantage of our method over most existing tracking algorithms is that it does not require initialization in the first frame or training on sample data to perform. The performance of the algorithm has been tested on benchmark videos, with very satisfactory results that are comparable, and in some respects superior, to some benchmark algorithms. |
Tasks | Object Detection |
Published | 2017-04-28 |
URL | http://arxiv.org/abs/1705.02949v1 |
| http://arxiv.org/pdf/1705.02949v1.pdf |
PWC | https://paperswithcode.com/paper/object-detection-by-spatio-temporal-analysis |
Repo | |
Framework | |
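The Linear Assignment Problem step for data association is standard and can be sketched with SciPy's Hungarian solver; the Euclidean cost over the abstract's height/width/histogram features and the `max_cost` gate are assumptions:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(track_feats, det_feats, max_cost=50.0):
    """Frame-to-frame data association as a linear assignment problem.

    track_feats: (T, 6) features of existing tracks (height, width, 4-bin
    gray histogram); det_feats: (D, 6) features of current detections.
    Returns matched (track, detection) index pairs below the cost gate.
    """
    cost = np.linalg.norm(track_feats[:, None, :] - det_feats[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)     # Hungarian algorithm
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_cost]
    # Unmatched tracks would be propagated by the Kalman filter; unmatched
    # detections would spawn new tracks.
```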