Paper Group ANR 307
Unsupervised Incremental Learning of Deep Descriptors From Video Streams
Title | Unsupervised Incremental Learning of Deep Descriptors From Video Streams |
Authors | Federico Pernici, Alberto Del Bimbo |
Abstract | We present a novel unsupervised method for face identity learning from video sequences. The method exploits the ResNet deep network for face detection and VGGface fc7 face descriptors together with a smart learning mechanism that exploits the temporal coherence of visual data in video streams. We present a novel feature matching solution based on Reverse Nearest Neighbour and a feature forgetting strategy that supports incremental learning with memory size control, while time progresses. It is shown that the proposed learning procedure is asymptotically stable and can be effectively applied to relevant applications like multiple face tracking. |
Tasks | Face Detection |
Published | 2017-08-11 |
URL | http://arxiv.org/abs/1708.03615v1 |
http://arxiv.org/pdf/1708.03615v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-incremental-learning-of-deep |
Repo | |
Framework | |
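The abstract above mentions feature matching based on Reverse Nearest Neighbour. As a rough illustration only, the sketch below shows the generic bichromatic form of the idea: each stored descriptor picks its nearest incoming observation, and an observation's reverse nearest neighbours are the stored descriptors that picked it. The function names, the Euclidean distance, and the toy 2D points are assumptions for illustration; this is not the authors' matching procedure.

```python
import math


def nearest(point, candidates):
    """Return the index of the candidate closest to `point` (Euclidean distance)."""
    return min(range(len(candidates)), key=lambda i: math.dist(point, candidates[i]))


def reverse_nearest_neighbours(memory, observations):
    """Map each observation index to the memory indices that select it as nearest.

    `memory` and `observations` are hypothetical lists of equal-length tuples
    standing in for stored descriptors and newly observed descriptors.
    """
    rnn = {j: [] for j in range(len(observations))}
    for i, stored in enumerate(memory):
        rnn[nearest(stored, observations)].append(i)
    return rnn


# Toy example: two stored points sit near the first observation, one near the second.
memory = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
observations = [(0.1, 0.0), (9.0, 9.0)]
print(reverse_nearest_neighbours(memory, observations))  # {0: [0, 1], 1: [2]}
```

An observation with an empty list has no stored descriptor pointing at it, which in a matching setting could be read as "no confident match".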
Frank-Wolfe Optimization for Symmetric-NMF under Simplicial Constraint
Title | Frank-Wolfe Optimization for Symmetric-NMF under Simplicial Constraint |
Authors | Han Zhao, Geoff Gordon |
Abstract | Symmetric nonnegative matrix factorization has found abundant applications in various domains by providing a symmetric low-rank decomposition of nonnegative matrices. In this paper we propose a Frank-Wolfe (FW) solver to optimize the symmetric nonnegative matrix factorization problem under a simplicial constraint, which has recently been proposed for probabilistic clustering. Compared with existing solutions, this algorithm is simple to implement, and has no hyperparameters to be tuned. Building on the recent advances of FW algorithms in nonconvex optimization, we prove an $O(1/\varepsilon^2)$ convergence rate to $\varepsilon$-approximate KKT points, via a tight bound $\Theta(n^2)$ on the curvature constant, which matches the best known result in the unconstrained nonconvex setting using gradient methods. Numerical results demonstrate the effectiveness of our algorithm. As a side contribution, we construct a simple nonsmooth convex problem where the FW algorithm fails to converge to the optimum. This result raises an interesting question about necessary conditions for the success of the FW algorithm on convex problems. |
Tasks | |
Published | 2017-06-20 |
URL | http://arxiv.org/abs/1706.06348v3 |
http://arxiv.org/pdf/1706.06348v3.pdf | |
PWC | https://paperswithcode.com/paper/frank-wolfe-optimization-for-symmetric-nmf |
Repo | |
Framework | |
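To make the Frank-Wolfe idea in the abstract above concrete, here is a minimal sketch of the generic FW iteration over the probability simplex, applied to a simple convex quadratic. It is not the paper's symmetric-NMF solver: the objective, the classic step size 2/(k+2), and all function names are assumptions chosen for illustration. The point it shows is that the linear minimization step over a simplex reduces to picking the vertex with the smallest gradient coordinate, so no projection and no tuned hyperparameter is needed.

```python
def frank_wolfe_simplex(grad, x0, iters=5000):
    """Generic Frank-Wolfe over the probability simplex.

    grad: function returning the gradient (a list) at a point x.
    x0:   starting point on the simplex (non-negative, sums to 1).
    """
    x = list(x0)
    for k in range(iters):
        g = grad(x)
        # Linear minimization oracle: the best simplex vertex is the
        # coordinate with the smallest gradient entry.
        i = min(range(len(x)), key=lambda j: g[j])
        gamma = 2.0 / (k + 2.0)  # standard open-loop step size
        # Convex combination of the current point and the chosen vertex.
        x = [(1.0 - gamma) * xj for xj in x]
        x[i] += gamma
    return x


def make_quadratic_grad(target):
    """Gradient of f(x) = ||x - target||^2 (illustrative objective)."""
    return lambda x: [2.0 * (xi - ti) for xi, ti in zip(x, target)]


target = [0.2, 0.5, 0.3]  # lies inside the simplex, so it is the minimizer
x = frank_wolfe_simplex(make_quadratic_grad(target), [1.0, 0.0, 0.0])
print([round(v, 2) for v in x])
```

Every iterate is a convex combination of simplex vertices, so feasibility holds by construction.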
Cascaded Region-based Densely Connected Network for Event Detection: A Seismic Application
Title | Cascaded Region-based Densely Connected Network for Event Detection: A Seismic Application |
Authors | Yue Wu, Youzuo Lin, Zheng Zhou, David Chas Bolton, Ji Liu, Paul Johnson |
Abstract | Automatic event detection from time series signals has wide applications, such as abnormal event detection in video surveillance and event detection in geophysical data. Traditional detection methods detect events primarily by the use of similarity and correlation in data. Those methods can be inefficient and yield low accuracy. In recent years, because of significantly increased computational power, machine learning techniques have revolutionized many science and engineering domains. In this study, we apply a deep-learning-based method to the detection of events from time series seismic signals. However, a direct adaptation of similar ideas from 2D object detection to our problem faces two challenges. The first challenge is that the duration of earthquake events varies significantly; the other is that the proposals generated are temporally correlated. To address these challenges, we propose a novel cascaded region-based convolutional neural network to capture earthquake events of different sizes, while incorporating contextual information to enrich features for each individual proposal. To achieve better generalization performance, we use densely connected blocks as the backbone of our network. Because some positive events are not correctly annotated, we further formulate the detection problem as a learning-from-noise problem. To verify the performance of our detection methods, we apply them to seismic data generated from a bi-axial “earthquake machine” located at Rock Mechanics Laboratory, and we acquire labels with the help of experts. Through our numerical tests, we show that our novel detection techniques yield high accuracy. Therefore, our novel deep-learning-based detection methods can potentially be powerful tools for locating events from time series data in various applications. |
Tasks | Abnormal Event Detection In Video, Object Detection, Time Series |
Published | 2017-09-12 |
URL | http://arxiv.org/abs/1709.07943v2 |
http://arxiv.org/pdf/1709.07943v2.pdf | |
PWC | https://paperswithcode.com/paper/cascaded-region-based-densely-connected |
Repo | |
Framework | |
Hardware-Driven Nonlinear Activation for Stochastic Computing Based Deep Convolutional Neural Networks
Title | Hardware-Driven Nonlinear Activation for Stochastic Computing Based Deep Convolutional Neural Networks |
Authors | Ji Li, Zihao Yuan, Zhe Li, Caiwen Ding, Ao Ren, Qinru Qiu, Jeffrey Draper, Yanzhi Wang |
Abstract | Recently, Deep Convolutional Neural Networks (DCNNs) have made unprecedented progress, achieving accuracy close to, or even better than, human-level perception in various tasks. There is a timely need to map the latest software DCNNs to application-specific hardware, in order to achieve orders of magnitude improvement in performance, energy efficiency and compactness. Stochastic Computing (SC), as a low-cost alternative to the conventional binary computing paradigm, has the potential to enable massively parallel and highly scalable hardware implementation of DCNNs. One major challenge in SC based DCNNs is designing accurate nonlinear activation functions, which have a significant impact on the network-level accuracy but cannot be implemented accurately by existing SC computing blocks. In this paper, we design and optimize SC based neurons, and we propose highly accurate activation designs for the three most frequently used activation functions in software DCNNs, i.e., hyperbolic tangent, logistic, and rectified linear units. Experimental results on LeNet-5 using the MNIST dataset demonstrate that compared with a binary ASIC hardware DCNN, the DCNN with the proposed SC neurons can achieve up to 61X, 151X, and 2X improvement in terms of area, power, and energy, respectively, at the cost of small precision degradation. In addition, the SC approach achieves up to 21X and 41X of the area, 41X and 72X of the power, and 198200X and 96443X of the energy, compared with CPU and GPU approaches, respectively, while the error is increased by less than 3.07%. ReLU activation is suggested for future SC based DCNNs considering its superior performance under a small bit stream length. |
Tasks | |
Published | 2017-03-12 |
URL | http://arxiv.org/abs/1703.04135v1 |
http://arxiv.org/pdf/1703.04135v1.pdf | |
PWC | https://paperswithcode.com/paper/hardware-driven-nonlinear-activation-for |
Repo | |
Framework | |
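For readers unfamiliar with stochastic computing, the following sketch simulates the textbook unipolar encoding in software: a value p in [0, 1] becomes a random bit stream whose bits are 1 with probability p, and a bitwise AND of two independent streams estimates the product. This is a generic illustration of why bit stream length matters for accuracy, not the neuron or activation designs proposed in the paper; the function names and the seeded generator are assumptions.

```python
import random


def to_bitstream(p, length, rng):
    """Encode p in [0, 1] as a unipolar stochastic bit stream."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def from_bitstream(bits):
    """Decode a unipolar stream: the fraction of ones estimates the value."""
    return sum(bits) / len(bits)


def sc_multiply(a_bits, b_bits):
    """Multiply two independent unipolar streams with a bitwise AND."""
    return [a & b for a, b in zip(a_bits, b_bits)]


def sc_product_estimate(a, b, length, seed=0):
    """Estimate a * b using streams of the given length (seeded for repeatability)."""
    rng = random.Random(seed)
    a_bits = to_bitstream(a, length, rng)
    b_bits = to_bitstream(b, length, rng)
    return from_bitstream(sc_multiply(a_bits, b_bits))


# Longer streams give estimates that fluctuate less around the true product 0.3.
for length in (64, 1024, 100_000):
    print(length, sc_product_estimate(0.5, 0.6, length))
```

The estimate's standard deviation shrinks roughly with the square root of the stream length, which is the accuracy-versus-latency trade-off behind the abstract's remark on short bit streams.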
UC Merced Submission to the ActivityNet Challenge 2016
Title | UC Merced Submission to the ActivityNet Challenge 2016 |
Authors | Yi Zhu, Shawn Newsam, Zaikun Xu |
Abstract | This notebook paper describes our system for the untrimmed classification task in the ActivityNet challenge 2016. We investigate multiple state-of-the-art approaches for action recognition in long, untrimmed videos. We exploit hand-crafted motion boundary histogram features as well as feature activations from deep networks such as VGG16, GoogLeNet, and C3D. These features are separately fed to linear, one-versus-rest support vector machine classifiers to produce confidence scores for each action class. These predictions are then fused along with the softmax scores of the recent ultra-deep ResNet-101 using weighted averaging. |
Tasks | Temporal Action Localization |
Published | 2017-04-11 |
URL | http://arxiv.org/abs/1704.03503v1 |
http://arxiv.org/pdf/1704.03503v1.pdf | |
PWC | https://paperswithcode.com/paper/uc-merced-submission-to-the-activitynet |
Repo | |
Framework | |
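The late-fusion step described in the abstract above (per-class scores from several classifiers combined by weighted averaging) can be sketched as follows. The weights and score values are made up for illustration, and the function name is hypothetical; the paper does not state its actual weights here.

```python
def fuse_scores(per_model_scores, weights):
    """Weighted average of per-class scores from several models.

    per_model_scores: list of score lists, one per model, all of equal length.
    weights:          one non-negative weight per model (need not sum to 1).
    """
    if len(per_model_scores) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    n_classes = len(per_model_scores[0])
    return [
        sum(w * scores[c] for w, scores in zip(weights, per_model_scores)) / total
        for c in range(n_classes)
    ]


# Hypothetical scores for three action classes from two models.
svm_scores = [0.1, 0.7, 0.2]
softmax_scores = [0.3, 0.3, 0.4]
fused = fuse_scores([svm_scores, softmax_scores], weights=[1.0, 1.0])
predicted_class = max(range(len(fused)), key=lambda c: fused[c])
print(fused, predicted_class)
```

In practice the scores from different classifiers would usually be calibrated to a common range before averaging, since SVM margins and softmax probabilities are not directly comparable.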
Optimal Approximation with Sparsely Connected Deep Neural Networks
Title | Optimal Approximation with Sparsely Connected Deep Neural Networks |
Authors | Helmut Bölcskei, Philipp Grohs, Gitta Kutyniok, Philipp Petersen |
Abstract | We derive fundamental lower bounds on the connectivity and the memory requirements of deep neural networks guaranteeing uniform approximation rates for arbitrary function classes in $L^2(\mathbb R^d)$. In other words, we establish a connection between the complexity of a function class and the complexity of deep neural networks approximating functions from this class to within a prescribed accuracy. Additionally, we prove that our lower bounds are achievable for a broad family of function classes. Specifically, all function classes that are optimally approximated by a general class of representation systems—so-called \emph{affine systems}—can be approximated by deep neural networks with minimal connectivity and memory requirements. Affine systems encompass a wealth of representation systems from applied harmonic analysis such as wavelets, ridgelets, curvelets, shearlets, $\alpha$-shearlets, and more generally $\alpha$-molecules. Our central result elucidates a remarkable universality property of neural networks and shows that they achieve the optimum approximation properties of all affine systems combined. As a specific example, we consider the class of $\alpha^{-1}$-cartoon-like functions, which is approximated optimally by $\alpha$-shearlets. We also explain how our results can be extended to the case of functions on low-dimensional immersed manifolds. Finally, we present numerical experiments demonstrating that the standard stochastic gradient descent algorithm generates deep neural networks providing close-to-optimal approximation rates. Moreover, these results indicate that stochastic gradient descent can actually learn approximations that are sparse in the representation systems optimally sparsifying the function class the network is trained on. |
Tasks | |
Published | 2017-05-04 |
URL | http://arxiv.org/abs/1705.01714v4 |
http://arxiv.org/pdf/1705.01714v4.pdf | |
PWC | https://paperswithcode.com/paper/optimal-approximation-with-sparsely-connected |
Repo | |
Framework | |
How we can control the crack to propagate along the specified path feasibly?
Title | How we can control the crack to propagate along the specified path feasibly? |
Authors | Zhenxing Cheng, Hu Wang |
Abstract | A controllable crack propagation (CCP) strategy is suggested. It is well known that cracks lead to failure by crossing the critical domain of an engineering structure. Therefore, the CCP method is proposed to control the crack to propagate along a specified path that stays away from the critical domain. To complete this strategy, two optimization methods are engaged. Firstly, a back propagation neural network (BPNN) assisted particle swarm optimization (PSO) is suggested. In this method, to improve the efficiency of CCP, the BPNN is used to build the metamodel instead of the forward evaluation. Secondly, the popular PSO is used. Considering that the optimization iteration is a time-consuming process, an efficient reanalysis-based extended finite element method (X-FEM) is used in place of the complete X-FEM solver to calculate the crack propagation path. Moreover, an adaptive subdomain partition strategy is suggested to improve the fitting accuracy between the real crack path and the specified path. Several typical numerical examples demonstrate that both optimization methods can carry out the CCP. The choice between them should be determined by the tradeoff between efficiency and accuracy. |
Tasks | |
Published | 2017-10-30 |
URL | http://arxiv.org/abs/1710.10748v2 |
http://arxiv.org/pdf/1710.10748v2.pdf | |
PWC | https://paperswithcode.com/paper/how-we-can-control-the-crack-to-propagate |
Repo | |
Framework | |
Optimal Learning for Sequential Decision Making for Expensive Cost Functions with Stochastic Binary Feedbacks
Title | Optimal Learning for Sequential Decision Making for Expensive Cost Functions with Stochastic Binary Feedbacks |
Authors | Yingfei Wang, Chu Wang, Warren Powell |
Abstract | We consider the problem of sequentially making decisions that are rewarded by “successes” and “failures” which can be predicted through an unknown relationship that depends on a partially controllable vector of attributes for each instance. The learner takes an active role in selecting samples from the instance pool. The goal is to maximize the probability of success in either offline (training) or online (testing) phases. Our problem is motivated by real-world applications where observations are time-consuming and/or expensive. We develop a knowledge gradient policy using an online Bayesian linear classifier to guide the experiment by maximizing the expected value of information of labeling each alternative. We provide a finite-time analysis of the estimated error and show that the maximum likelihood estimator produced by the KG policy is consistent and asymptotically normal. We also show that the knowledge gradient policy is asymptotically optimal in an offline setting. This work further extends the knowledge gradient to the setting of contextual bandits. We report the results of a series of experiments that demonstrate its efficiency. |
Tasks | Decision Making, Multi-Armed Bandits |
Published | 2017-09-13 |
URL | http://arxiv.org/abs/1709.05216v1 |
http://arxiv.org/pdf/1709.05216v1.pdf | |
PWC | https://paperswithcode.com/paper/optimal-learning-for-sequential-decision |
Repo | |
Framework | |
A Study on Topological Descriptors for the Analysis of 3D Surface Texture
Title | A Study on Topological Descriptors for the Analysis of 3D Surface Texture |
Authors | Matthias Zeppelzauer, Bartosz Zielinski, Mateusz Juda, Markus Seidl |
Abstract | Methods from computational topology are becoming more and more popular in computer vision and have been shown to improve the state-of-the-art in several tasks. In this paper, we investigate the applicability of topological descriptors in the context of 3D surface analysis for the classification of different surface textures. We present a comprehensive study on topological descriptors, investigate their robustness and expressiveness and compare them with state-of-the-art methods including Convolutional Neural Networks (CNNs). Results show that class-specific information is reflected well in topological descriptors. The investigated descriptors can directly compete with non-topological descriptors and capture complementary information. As a consequence they improve the state-of-the-art when combined with non-topological descriptors. |
Tasks | |
Published | 2017-10-29 |
URL | http://arxiv.org/abs/1710.10662v1 |
http://arxiv.org/pdf/1710.10662v1.pdf | |
PWC | https://paperswithcode.com/paper/a-study-on-topological-descriptors-for-the |
Repo | |
Framework | |
PixelBNN: Augmenting the PixelCNN with batch normalization and the presentation of a fast architecture for retinal vessel segmentation
Title | PixelBNN: Augmenting the PixelCNN with batch normalization and the presentation of a fast architecture for retinal vessel segmentation |
Authors | Henry A Leopold, Jeff Orchard, John S Zelek, Vasudevan Lakshminarayanan |
Abstract | Analysis of retinal fundus images is essential for eye-care physicians in the diagnosis, care and treatment of patients. Accurate fundus and/or retinal vessel maps give rise to longitudinal studies able to utilize multimedia image registration and disease/condition status measurements, as well as applications in surgery preparation and biometrics. The segmentation of retinal morphology has numerous applications in assessing ophthalmologic and cardiovascular disease pathologies. The early detection of many such conditions is often the most effective method for reducing patient risk. Computer aided segmentation of the vasculature has proven to be a challenge, mainly due to inconsistencies such as noise and variations in hue and brightness that can greatly reduce the quality of fundus images. This paper presents PixelBNN, a highly efficient deep method for automating the segmentation of fundus morphologies. The model was trained, tested and cross tested on the DRIVE, STARE and CHASE_DB1 retinal vessel segmentation datasets. Performance was evaluated using G-mean, Matthews Correlation Coefficient and F1-score. The network was 8.5 times faster than the current state-of-the-art at test time and performed comparatively well, considering a 5 to 19 times reduction in information from resizing images during preprocessing. |
Tasks | Image Registration, Retinal Vessel Segmentation |
Published | 2017-12-19 |
URL | http://arxiv.org/abs/1712.06742v1 |
http://arxiv.org/pdf/1712.06742v1.pdf | |
PWC | https://paperswithcode.com/paper/pixelbnn-augmenting-the-pixelcnn-with-batch |
Repo | |
Framework | |
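The three evaluation measures named in the abstract above have standard definitions in terms of binary confusion-matrix counts. The sketch below computes them for per-pixel vessel/background labels; the function name and the example counts are hypothetical and do not come from the paper.

```python
import math


def segmentation_metrics(tp, fp, fn, tn):
    """G-mean, Matthews correlation coefficient and F1 from confusion counts.

    tp/fp/fn/tn are counts of true-positive, false-positive, false-negative
    and true-negative pixels. Undefined ratios are reported as 0.0.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    g_mean = math.sqrt(sensitivity * specificity)

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0

    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0
    return {"g_mean": g_mean, "mcc": mcc, "f1": f1}


# Made-up counts for an imbalanced image: few vessel pixels, many background pixels.
print(segmentation_metrics(tp=8, fp=2, fn=2, tn=88))
```

G-mean and MCC take true negatives into account, which makes them more informative than plain accuracy when background pixels heavily outnumber vessel pixels.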
Large-Scale 3D Scene Classification With Multi-View Volumetric CNN
Title | Large-Scale 3D Scene Classification With Multi-View Volumetric CNN |
Authors | Dror Aiger, Brett Allen, Aleksey Golovinskiy |
Abstract | We introduce a method to classify imagery using a convolutional neural network (CNN) on multi-view image projections. The power of our method comes from using projections of multiple images at multiple depth planes near the reconstructed surface. This enables classification of categories whose salient aspect is appearance change under different viewpoints, such as water, trees, and other materials with complex reflection/light response properties. Our method does not require boundary labelling in images and works on pixel-level classification with a small (few pixels) context, which simplifies the creation of a training set. We demonstrate this application on large-scale aerial imagery collections, and extend the per-pixel classification to robustly create a consistent 2D classification which can be used to fill the gaps in non-reconstructible water regions. We also apply our method to classify tree regions. In both cases, the training data can quickly be generated using a small number of manually-created polygons on a map. We show that even with a very simple and standard network our CNN outperforms the state-of-the-art image classification, the Inception-V3 model retrained from a large collection of aerial images. |
Tasks | Image Classification, Scene Classification |
Published | 2017-12-26 |
URL | http://arxiv.org/abs/1712.09216v1 |
http://arxiv.org/pdf/1712.09216v1.pdf | |
PWC | https://paperswithcode.com/paper/large-scale-3d-scene-classification-with |
Repo | |
Framework | |
Indefinite Kernel Logistic Regression
Title | Indefinite Kernel Logistic Regression |
Authors | Fanghui Liu, Xiaolin Huang, Jie Yang |
Abstract | Traditionally, kernel learning methods require positive definiteness of the kernel, which is too strict and excludes many sophisticated indefinite similarities in the multimedia area. To utilize those indefinite kernels, indefinite learning methods are of great interest. This paper aims at the extension of logistic regression from positive semi-definite kernels to indefinite kernels. The model, called indefinite kernel logistic regression (IKLR), keeps consistency with the regular KLR in formulation but it essentially becomes non-convex. Thanks to the positive decomposition of an indefinite matrix, IKLR can be transformed into a difference of two convex models, which allows the use of the concave-convex procedure. Moreover, we employ an inexact solving scheme to speed up the sub-problem and develop a concave-inexact-convex procedure (CCICP) algorithm with theoretical convergence analysis. Systematic experiments on multi-modal datasets demonstrate the superiority of the proposed IKLR method over kernel logistic regression with positive definite kernels and other state-of-the-art indefinite learning based algorithms. |
Tasks | |
Published | 2017-07-06 |
URL | http://arxiv.org/abs/1707.01826v1 |
http://arxiv.org/pdf/1707.01826v1.pdf | |
PWC | https://paperswithcode.com/paper/indefinite-kernel-logistic-regression |
Repo | |
Framework | |
Hierarchical Surrogate Modeling for Illumination Algorithms
Title | Hierarchical Surrogate Modeling for Illumination Algorithms |
Authors | Alexander Hagg |
Abstract | Evolutionary illumination is a recent technique that allows producing many diverse, optimal solutions in a map of manually defined features. To support the large number of objective function evaluations, surrogate model assistance was recently introduced. Illumination models need to represent many more, diverse optimal regions than classical surrogate models. In this PhD thesis, we propose to decompose the sample set, decreasing model complexity, by hierarchically segmenting the training set according to the samples' coordinates in feature space. An ensemble of diverse models can then be trained to serve as a surrogate to illumination. |
Tasks | |
Published | 2017-03-29 |
URL | http://arxiv.org/abs/1703.09926v1 |
http://arxiv.org/pdf/1703.09926v1.pdf | |
PWC | https://paperswithcode.com/paper/hierarchical-surrogate-modeling-for |
Repo | |
Framework | |
YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video
Title | YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video |
Authors | Esteban Real, Jonathon Shlens, Stefano Mazzocchi, Xin Pan, Vincent Vanhoucke |
Abstract | We introduce a new large-scale data set of video URLs with densely-sampled object bounding box annotations called YouTube-BoundingBoxes (YT-BB). The data set consists of approximately 380,000 video segments about 19s long, automatically selected to feature objects in natural settings without editing or post-processing, with a recording quality often akin to that of a hand-held cell phone camera. The objects represent a subset of the MS COCO label set. All video segments were human-annotated with high-precision classification labels and bounding boxes at 1 frame per second. The use of a cascade of increasingly precise human annotations ensures a label accuracy above 95% for every class and tight bounding boxes. Finally, we train and evaluate well-known deep network architectures and report baseline figures for per-frame classification and localization to provide a point of comparison for future work. We also demonstrate how the temporal contiguity of video can potentially be used to improve such inferences. Please see the PDF file to find the URL to download the data. We hope the availability of such a large curated corpus will spur new advances in video object detection and tracking. |
Tasks | Object Detection, Video Object Detection |
Published | 2017-02-02 |
URL | http://arxiv.org/abs/1702.00824v5 |
http://arxiv.org/pdf/1702.00824v5.pdf | |
PWC | https://paperswithcode.com/paper/youtube-boundingboxes-a-large-high-precision |
Repo | |
Framework | |
Predicting Human Interaction via Relative Attention Model
Title | Predicting Human Interaction via Relative Attention Model |
Authors | Yichao Yan, Bingbing Ni, Xiaokang Yang |
Abstract | Predicting human interaction is challenging as the on-going activity has to be inferred based on a partially observed video. Essentially, a good algorithm should effectively model the mutual influence between the two interacting subjects. Also, only a small region in the scene is discriminative for identifying the on-going interaction. In this work, we propose a relative attention model to explicitly address these difficulties. Built on a tri-coupled deep recurrent structure representing both interacting subjects and global interaction status, the proposed network collects spatio-temporal information from each subject, rectified with global interaction information, yielding effective interaction representation. Moreover, the proposed network also unifies an attention module to assign higher importance to the regions which are relevant to the on-going action. Extensive experiments have been conducted on two public datasets, and the results demonstrate that the proposed relative attention network successfully predicts informative regions between interacting subjects, which in turn yields superior human interaction prediction accuracy. |
Tasks | |
Published | 2017-05-26 |
URL | http://arxiv.org/abs/1705.09467v1 |
http://arxiv.org/pdf/1705.09467v1.pdf | |
PWC | https://paperswithcode.com/paper/predicting-human-interaction-via-relative |
Repo | |
Framework | |