May 7, 2019

2674 words 13 mins read

Paper Group ANR 7

Order-aware Convolutional Pooling for Video Based Action Recognition. Deep Learning Ensembles for Melanoma Recognition in Dermoscopy Images. Katyusha: The First Direct Acceleration of Stochastic Gradient Methods. A Generic Framework for Assessing the Performance Bounds of Image Feature Detectors. Deep Reinforcement Learning Discovers Internal Model …

Order-aware Convolutional Pooling for Video Based Action Recognition

Title Order-aware Convolutional Pooling for Video Based Action Recognition
Authors Peng Wang, Lingqiao Liu, Chunhua Shen, Heng Tao Shen
Abstract Most video based action recognition approaches create the video-level representation by temporally pooling the features extracted at each frame. The pooling methods that they adopt, however, usually completely or partially neglect the dynamic information contained in the temporal domain, which may undermine the discriminative power of the resulting video representation since the video sequence order could unveil the evolution of a specific event or action. To overcome this drawback and explore the importance of incorporating the temporal order information, in this paper we propose a novel temporal pooling approach to aggregate the frame-level features. Inspired by the capacity of Convolutional Neural Networks (CNN) in making use of the internal structure of images for information abstraction, we propose to apply the temporal convolution operation to the frame-level representations to extract the dynamic information. However, directly implementing this idea on the original high-dimensional feature would inevitably result in parameter explosion. To tackle this problem, we view the temporal evolution of the feature value at each feature dimension as a 1D signal and learn a unique convolutional filter bank for each of these 1D signals. We conduct experiments on two challenging video-based action recognition datasets, HMDB51 and UCF101; and demonstrate that the proposed method is superior to the conventional pooling methods.
Tasks Temporal Action Localization
Published 2016-01-31
URL http://arxiv.org/abs/1602.00224v1
PDF http://arxiv.org/pdf/1602.00224v1.pdf
PWC https://paperswithcode.com/paper/order-aware-convolutional-pooling-for-video
Repo
Framework
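
The paper's central trick is to treat the temporal evolution of each feature dimension as its own 1D signal and convolve it with its own small filter, avoiding the parameter explosion of a full temporal convolution over the high-dimensional feature. A minimal NumPy sketch of that per-dimension pooling (the filter values here are random placeholders, not learned weights, and `order_aware_pool` is an illustrative name, not the paper's code):

```python
import numpy as np

def order_aware_pool(frames, filters):
    """Pool (T, D) frame features with one length-K 1D filter per dimension.

    Each feature dimension's temporal evolution is convolved only with
    its own filter, then max-pooled over time into a (D,) descriptor.
    """
    T, D = frames.shape
    K = filters.shape[1]
    responses = np.empty((T - K + 1, D))
    for t in range(T - K + 1):
        # window (K, D) times per-dimension filters (K, D), summed over time
        responses[t] = np.sum(frames[t:t + K] * filters.T, axis=0)
    return responses.max(axis=0)

rng = np.random.default_rng(0)
frames = rng.normal(size=(16, 8))   # 16 frames of 8-dim features
filters = rng.normal(size=(8, 3))   # one length-3 filter per dimension
video_desc = order_aware_pool(frames, filters)
```

Because the filters slide over time, reversing the frame order generally changes the descriptor, which is exactly the order sensitivity that plain average- or max-pooling lacks.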

Deep Learning Ensembles for Melanoma Recognition in Dermoscopy Images

Title Deep Learning Ensembles for Melanoma Recognition in Dermoscopy Images
Authors Noel Codella, Quoc-Bao Nguyen, Sharath Pankanti, David Gutman, Brian Helba, Allan Halpern, John R. Smith
Abstract Melanoma is the deadliest form of skin cancer. While curable with early detection, only highly trained specialists are capable of accurately recognizing the disease. As expertise is in limited supply, automated systems capable of identifying disease could save lives, reduce unnecessary biopsies, and reduce costs. Toward this goal, we propose a system that combines recent developments in deep learning with established machine learning approaches, creating ensembles of methods that are capable of segmenting skin lesions, as well as analyzing the detected area and surrounding tissue for melanoma detection. The system is evaluated using the largest publicly available benchmark dataset of dermoscopic images, containing 900 training and 379 testing images. New state-of-the-art performance levels are demonstrated, leading to an improvement in the area under receiver operating characteristic curve of 7.5% (0.843 vs. 0.783), in average precision of 4% (0.649 vs. 0.624), and in specificity measured at the clinically relevant 95% sensitivity operating point 2.9 times higher than the previous state-of-the-art (36.8% specificity compared to 12.5%). Compared to the average of 8 expert dermatologists on a subset of 100 test images, the proposed system produces a higher accuracy (76% vs. 70.5%), and specificity (62% vs. 59%) evaluated at an equivalent sensitivity (82%).
Tasks
Published 2016-10-14
URL http://arxiv.org/abs/1610.04662v2
PDF http://arxiv.org/pdf/1610.04662v2.pdf
PWC https://paperswithcode.com/paper/deep-learning-ensembles-for-melanoma
Repo
Framework
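
The ensemble side of the system can be illustrated by averaging per-model probabilities and then reading off specificity at the clinically relevant sensitivity point the abstract quotes. A hedged sketch with invented toy numbers (the function names and the simple thresholding rule are assumptions, not the paper's exact procedure):

```python
import numpy as np

def ensemble_predict(model_probs):
    """Average melanoma probabilities from several models into one score."""
    return np.mean(model_probs, axis=0)

def specificity_at_sensitivity(scores, labels, target_sens=0.95):
    """Specificity at the threshold achieving at least the target sensitivity.

    labels: 1 = melanoma, 0 = benign; predict melanoma when score >= threshold.
    """
    pos = np.sort(scores[labels == 1])
    idx = int(np.floor((1 - target_sens) * len(pos)))  # positives allowed below
    threshold = pos[idx]
    return np.mean(scores[labels == 0] < threshold)

scores = np.array([0.2, 0.7, 0.8, 0.9, 0.95, 0.1, 0.15, 0.3, 0.05])
labels = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0])
ensemble = ensemble_predict([scores, scores])  # trivial two-model ensemble
spec = specificity_at_sensitivity(ensemble, labels, target_sens=0.95)
```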

Katyusha: The First Direct Acceleration of Stochastic Gradient Methods

Title Katyusha: The First Direct Acceleration of Stochastic Gradient Methods
Authors Zeyuan Allen-Zhu
Abstract Nesterov’s momentum trick is famously known for accelerating gradient descent, and has been proven useful in building fast iterative algorithms. However, in the stochastic setting, counterexamples exist and prevent Nesterov’s momentum from providing similar acceleration, even if the underlying problem is convex and finite-sum. We introduce $\mathtt{Katyusha}$, a direct, primal-only stochastic gradient method to fix this issue. In convex finite-sum stochastic optimization, $\mathtt{Katyusha}$ has an optimal accelerated convergence rate, and enjoys an optimal parallel linear speedup in the mini-batch setting. The main ingredient is $\textit{Katyusha momentum}$, a novel “negative momentum” on top of Nesterov’s momentum. It can be incorporated into a variance-reduction based algorithm and speed it up, both in terms of $\textit{sequential and parallel}$ performance. Since variance reduction has been successfully applied to a growing list of practical problems, our paper suggests that in each of such cases, one could potentially try to give Katyusha a hug.
Tasks Stochastic Optimization
Published 2016-03-18
URL http://arxiv.org/abs/1603.05953v6
PDF http://arxiv.org/pdf/1603.05953v6.pdf
PWC https://paperswithcode.com/paper/katyusha-the-first-direct-acceleration-of
Repo
Framework
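
Katyusha momentum couples a Nesterov-style iterate (via `z`) with a "negative momentum" term that pulls each step back toward the most recent snapshot `x_tilde`, on top of an SVRG-style variance-reduced gradient. A simplified sketch on a least-squares finite sum (the step-size constants follow the shapes suggested in the paper, but this toy implementation is illustrative, not the reference algorithm):

```python
import numpy as np

def katyusha(A, b, L, sigma, epochs=20, m=None, seed=0):
    """Simplified Katyusha on f(x) = (1/n) * sum_i 0.5 * (a_i @ x - b_i)**2.

    tau1 carries Nesterov momentum via z; tau2 is the 'negative' Katyusha
    momentum pulling iterates back toward the snapshot x_tilde.
    """
    n, d = A.shape
    m = m or 2 * n                                  # inner-loop length
    tau2 = 0.5
    tau1 = min(np.sqrt(m * sigma / (3 * L)), 0.5)
    alpha = 1.0 / (3 * tau1 * L)
    x_tilde = np.zeros(d)
    y, z = x_tilde.copy(), x_tilde.copy()
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        mu = A.T @ (A @ x_tilde - b) / n            # full gradient at snapshot
        y_avg = np.zeros(d)
        for _ in range(m):
            x = tau1 * z + tau2 * x_tilde + (1 - tau1 - tau2) * y
            i = rng.integers(n)
            # variance-reduced stochastic gradient at x
            g = mu + A[i] * (A[i] @ x - A[i] @ x_tilde)
            z = z - alpha * g
            y = x - g / (3 * L)
            y_avg += y
        x_tilde = y_avg / m
    return x_tilde

rng = np.random.default_rng(1)
A = rng.normal(size=(20, 5))
b = A @ rng.normal(size=5)
L = np.max(np.sum(A**2, axis=1))             # max per-example smoothness
sigma = np.linalg.eigvalsh(A.T @ A / 20)[0]  # strong convexity of f
x_hat = katyusha(A, b, L, sigma)
```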

A Generic Framework for Assessing the Performance Bounds of Image Feature Detectors

Title A Generic Framework for Assessing the Performance Bounds of Image Feature Detectors
Authors Shoaib Ehsan, Adrian F. Clark, Ales Leonardis, Naveed ur Rehman, Klaus D. McDonald-Maier
Abstract Since local feature detection has been one of the most active research areas in computer vision during the last decade, a large number of detectors have been proposed. The interest in feature-based applications continues to grow and has thus rendered the task of characterizing the performance of various feature detection methods an important issue in vision research. Inspired by the good practices of electronic system design, this paper presents a generic framework based on the repeatability measure that allows assessment of the upper and lower bounds of detector performance, and that finds statistically significant performance differences between detectors as a function of the amount of image transformation by introducing a new variant of McNemar's test, in an effort to design more reliable and effective vision systems. The proposed framework is then employed to establish operating and guarantee regions for several state-of-the-art detectors and to identify their statistical performance differences for three specific image transformations: JPEG compression, uniform light changes and blurring. The results are obtained using a newly acquired, large image database (20,482 images) with 539 different scenes. These results provide new insights into the behaviour of detectors and are also useful from the vision systems design perspective.
Tasks
Published 2016-05-19
URL http://arxiv.org/abs/1605.05791v1
PDF http://arxiv.org/pdf/1605.05791v1.pdf
PWC https://paperswithcode.com/paper/a-generic-framework-for-assessing-the
Repo
Framework
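
The statistical machinery rests on McNemar-style tests over the discordant outcomes of two detectors on the same images; the paper introduces its own variant, but the standard continuity-corrected form conveys the idea. A sketch (the discordant counts below are invented):

```python
from math import erf, sqrt

def mcnemar(b, c):
    """Continuity-corrected McNemar test on discordant counts.

    b: images where detector A met the repeatability criterion and B failed;
    c: the reverse. Returns the z statistic and a two-sided p-value.
    """
    z = (abs(b - c) - 1) / sqrt(b + c)
    p = 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))  # 2 * (1 - Phi(z))
    return z, p

z, p = mcnemar(30, 10)  # invented counts: A wins 30 discordant cases, B wins 10
```

With these counts the difference is significant at the usual 5% level, so one detector's performance bound would be judged reliably better on this transformation amount.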

Deep Reinforcement Learning Discovers Internal Models

Title Deep Reinforcement Learning Discovers Internal Models
Authors Nir Baram, Tom Zahavy, Shie Mannor
Abstract Deep Reinforcement Learning (DRL) is a trending field of research, showing great promise in challenging problems such as playing Atari, solving Go and controlling robots. While DRL agents perform well in practice, we are still lacking the tools to analyze their performance. In this work we present the Semi-Aggregated MDP (SAMDP) model, a model best suited to describe policies exhibiting both spatial and temporal hierarchies. We describe its advantages for analyzing trained policies over other modeling approaches, and show that under the right state representation, like that of DQN agents, SAMDP can help to identify skills. We detail the automatic process of creating it from recorded trajectories, up to presenting it on t-SNE maps. We explain how to evaluate its fitness and show surprising results indicating high compatibility with the policy at hand. We conclude by showing how, using the SAMDP model, an extra performance gain can be squeezed from the agent.
Tasks
Published 2016-06-16
URL http://arxiv.org/abs/1606.05174v1
PDF http://arxiv.org/pdf/1606.05174v1.pdf
PWC https://paperswithcode.com/paper/deep-reinforcement-learning-discovers
Repo
Framework

Automatic 3D Reconstruction for Symmetric Shapes

Title Automatic 3D Reconstruction for Symmetric Shapes
Authors Atishay Jain
Abstract Generic 3D reconstruction from a single image is a difficult problem, as a lot of information is lost in the projection. A domain-based approach to reconstruction, in which we solve a smaller set of problems for a particular use case, leads to greater returns. This project provides a way to automatically generate full 3D renditions of actual symmetric images that have some prior information provided in the pipeline by a recognition algorithm. We provide a critical analysis of how this can be enhanced and improved to provide a general reconstruction framework for automatic reconstruction of any symmetric shape.
Tasks 3D Reconstruction
Published 2016-06-18
URL http://arxiv.org/abs/1606.05785v1
PDF http://arxiv.org/pdf/1606.05785v1.pdf
PWC https://paperswithcode.com/paper/automatic-3d-reconstruction-for-symmetric
Repo
Framework

Classifier comparison using precision

Title Classifier comparison using precision
Authors Lovedeep Gondara
Abstract Newly proposed models are often compared to the state-of-the-art using statistical significance testing, but the literature is scarce for classifier comparison using metrics other than accuracy. We present a survey of statistical methods that can be used for classifier comparison using precision, accounting for the inter-precision correlation arising from the use of the same dataset. Comparisons are made using per-class precision, and methods are presented to test the global null hypothesis of an overall model comparison. Comparisons are extended to multiple multi-class classifiers and to models using cross-validation or its variants. A partial Bayesian update to precision is introduced for when the population prevalence of a class is known. Applications to comparing deep architectures are studied.
Tasks
Published 2016-09-29
URL http://arxiv.org/abs/1609.09471v2
PDF http://arxiv.org/pdf/1609.09471v2.pdf
PWC https://paperswithcode.com/paper/classifier-comparison-using-precision
Repo
Framework
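
The quantity being compared throughout is per-class precision. A minimal sketch of computing it from paired labels and predictions (the correlation-aware significance tests built on top of it are the paper's contribution and are not reproduced here):

```python
def per_class_precision(y_true, y_pred, classes):
    """Precision per class: TP / (TP + FP) among predictions of that class."""
    prec = {}
    for c in classes:
        hits = [t for t, p in zip(y_true, y_pred) if p == c]
        prec[c] = sum(t == c for t in hits) / len(hits) if hits else float("nan")
    return prec

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
prec = per_class_precision(y_true, y_pred, [0, 1, 2])
```

Because two classifiers evaluated on the same test set share the same draws, their per-class precisions are correlated, which is why the paper's tests cannot treat them as independent samples.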

phi-LSTM: A Phrase-based Hierarchical LSTM Model for Image Captioning

Title phi-LSTM: A Phrase-based Hierarchical LSTM Model for Image Captioning
Authors Ying Hua Tan, Chee Seng Chan
Abstract A picture is worth a thousand words. Only recently, however, have we seen success stories in the understanding of visual scenes: models that are able to detect/name objects, describe their attributes, and recognize their relationships/interactions. In this paper, we propose a phrase-based hierarchical Long Short-Term Memory (phi-LSTM) model to generate image descriptions. The proposed model encodes a sentence as a sequence of phrases and words combined, instead of a sequence of words alone as in conventional solutions. The two levels of this model are dedicated to i) learning to generate image-relevant noun phrases, and ii) producing an appropriate image description from the phrases and other words in the corpus. Adopting a convolutional neural network to learn image features and the LSTM to learn the word sequence in a sentence, the proposed model has shown better or competitive results in comparison to the state-of-the-art models on the Flickr8k and Flickr30k datasets.
Tasks Image Captioning
Published 2016-08-20
URL http://arxiv.org/abs/1608.05813v5
PDF http://arxiv.org/pdf/1608.05813v5.pdf
PWC https://paperswithcode.com/paper/phi-lstm-a-phrase-based-hierarchical-lstm
Repo
Framework

On the Accuracy of Point Localisation in a Circular Camera-Array

Title On the Accuracy of Point Localisation in a Circular Camera-Array
Authors Alireza Ghasemi, Adam Scholefield, Martin Vetterli
Abstract Although many advances have been made in light-field and camera-array image processing, there is still a lack of thorough analysis of the localisation accuracy of different multi-camera systems. By considering the problem from a frame-quantisation perspective, we are able to quantify the point localisation error of circular camera configurations. Specifically, we obtain closed form expressions bounding the localisation error in terms of the parameters describing the acquisition setup. These theoretical results are independent of the localisation algorithm and thus provide fundamental limits on performance. Furthermore, the new frame-quantisation perspective is general enough to be extended to more complex camera configurations.
Tasks
Published 2016-02-24
URL http://arxiv.org/abs/1602.07542v1
PDF http://arxiv.org/pdf/1602.07542v1.pdf
PWC https://paperswithcode.com/paper/on-the-accuracy-of-point-localisation-in-a
Repo
Framework

Sieve-based Coreference Resolution in the Biomedical Domain

Title Sieve-based Coreference Resolution in the Biomedical Domain
Authors Dane Bell, Gus Hahn-Powell, Marco A. Valenzuela-Escárcega, Mihai Surdeanu
Abstract We describe challenges and advantages unique to coreference resolution in the biomedical domain, and a sieve-based architecture that leverages domain knowledge for both entity and event coreference resolution. Domain-general coreference resolution algorithms perform poorly on biomedical documents, because the cues they rely on such as gender are largely absent in this domain, and because they do not encode domain-specific knowledge such as the number and type of participants required in chemical reactions. Moreover, it is difficult to directly encode this knowledge into most coreference resolution algorithms because they are not rule-based. Our rule-based architecture uses sequentially applied hand-designed “sieves”, with the output of each sieve informing and constraining subsequent sieves. This architecture provides a 3.2% increase in throughput to our Reach event extraction system with precision parallel to that of the stricter system that relies solely on syntactic patterns for extraction.
Tasks Coreference Resolution
Published 2016-03-11
URL http://arxiv.org/abs/1603.03758v2
PDF http://arxiv.org/pdf/1603.03758v2.pdf
PWC https://paperswithcode.com/paper/sieve-based-coreference-resolution-in-the
Repo
Framework
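
The sieve architecture applies hand-designed rules in decreasing order of precision, each sieve seeing and refining the clusters left by the previous one. A toy sketch with a single exact-string-match sieve (the mention strings and the sieve itself are invented for illustration, not taken from the Reach system):

```python
def exact_match_sieve(clusters):
    """Merge clusters whose first mentions are string-identical (case-folded)."""
    merged = {}
    for cluster in clusters:
        merged.setdefault(cluster[0].lower(), []).extend(cluster)
    return list(merged.values())

def run_sieves(mentions, sieves):
    """Start from singleton clusters; each sieve refines the previous output."""
    clusters = [[m] for m in mentions]
    for sieve in sieves:
        clusters = sieve(clusters)
    return clusters

chains = run_sieves(["RAS", "the protein", "ras"], [exact_match_sieve])
```

Later, lower-precision sieves (e.g. ones using domain knowledge about reaction participants) would operate on `chains` rather than on raw mentions, which is what lets high-precision decisions constrain riskier ones.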

Nonlinear Statistical Learning with Truncated Gaussian Graphical Models

Title Nonlinear Statistical Learning with Truncated Gaussian Graphical Models
Authors Qinliang Su, Xuejun Liao, Changyou Chen, Lawrence Carin
Abstract We introduce the truncated Gaussian graphical model (TGGM) as a novel framework for designing statistical models for nonlinear learning. A TGGM is a Gaussian graphical model (GGM) with a subset of variables truncated to be nonnegative. The truncated variables are assumed latent and integrated out to induce a marginal model. We show that the variables in the marginal model are non-Gaussian distributed and their expected relations are nonlinear. We use expectation-maximization to break the inference of the nonlinear model into a sequence of TGGM inference problems, each of which is efficiently solved by using the properties and numerical methods of multivariate Gaussian distributions. We use the TGGM to design models for nonlinear regression and classification, with the performances of these models demonstrated on extensive benchmark datasets and compared to state-of-the-art competing results.
Tasks
Published 2016-06-02
URL http://arxiv.org/abs/1606.00906v2
PDF http://arxiv.org/pdf/1606.00906v2.pdf
PWC https://paperswithcode.com/paper/nonlinear-statistical-learning-with-truncated
Repo
Framework
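
Once the truncated latent variables are integrated out, the EM steps for a TGGM repeatedly evaluate moments of nonnegative-truncated Gaussians. A sketch of the first such moment, E[z | z >= 0] for z ~ N(mu, sigma^2), using the standard closed form:

```python
from math import erf, exp, pi, sqrt

def truncated_normal_mean(mu, sigma):
    """E[z | z >= 0] for z ~ N(mu, sigma^2): mu + sigma * phi(a)/Phi(a), a = mu/sigma."""
    a = mu / sigma
    phi = exp(-a * a / 2) / sqrt(2 * pi)   # standard normal pdf at a
    Phi = 0.5 * (1 + erf(a / sqrt(2)))     # standard normal cdf at a
    return mu + sigma * phi / Phi

m0 = truncated_normal_mean(0.0, 1.0)  # sqrt(2/pi), about 0.798
```

The ratio phi/Phi is what makes the marginal relations nonlinear: the conditional mean is a smooth, ReLU-like function of mu, which is the source of the model's nonlinear expressive power.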

A Multi-cut Formulation for Joint Segmentation and Tracking of Multiple Objects

Title A Multi-cut Formulation for Joint Segmentation and Tracking of Multiple Objects
Authors Margret Keuper, Siyu Tang, Yu Zhongjie, Bjoern Andres, Thomas Brox, Bernt Schiele
Abstract Recently, Minimum Cost Multicut Formulations have been proposed and proven to be successful in both motion trajectory segmentation and multi-target tracking scenarios. Both tasks benefit from decomposing a graphical model into an optimal number of connected components based on attractive and repulsive pairwise terms. The two tasks are formulated on different levels of granularity and, accordingly, leverage mostly local information for motion segmentation and mostly high-level information for multi-target tracking. In this paper we argue that point trajectories and their local relationships can contribute to the high-level task of multi-target tracking and also argue that high-level cues from object detection and tracking are helpful to solve motion segmentation. We propose a joint graphical model for point trajectories and object detections whose Multicuts are solutions to motion segmentation {\it and} multi-target tracking problems at once. Results on the FBMS59 motion segmentation benchmark as well as on pedestrian tracking sequences from the 2D MOT 2015 benchmark demonstrate the promise of this joint approach.
Tasks Motion Segmentation, Object Detection
Published 2016-07-21
URL http://arxiv.org/abs/1607.06317v1
PDF http://arxiv.org/pdf/1607.06317v1.pdf
PWC https://paperswithcode.com/paper/a-multi-cut-formulation-for-joint
Repo
Framework

End-to-end optimization of nonlinear transform codes for perceptual quality

Title End-to-end optimization of nonlinear transform codes for perceptual quality
Authors Johannes Ballé, Valero Laparra, Eero P. Simoncelli
Abstract We introduce a general framework for end-to-end optimization of the rate–distortion performance of nonlinear transform codes assuming scalar quantization. The framework can be used to optimize any differentiable pair of analysis and synthesis transforms in combination with any differentiable perceptual metric. As an example, we consider a code built from a linear transform followed by a form of multi-dimensional local gain control. Distortion is measured with a state-of-the-art perceptual metric. When optimized over a large database of images, this representation offers substantial improvements in bitrate and perceptual appearance over fixed (DCT) codes, and over linear transform codes optimized for mean squared error.
Tasks Quantization
Published 2016-07-18
URL http://arxiv.org/abs/1607.05006v2
PDF http://arxiv.org/pdf/1607.05006v2.pdf
PWC https://paperswithcode.com/paper/end-to-end-optimization-of-nonlinear
Repo
Framework
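
End-to-end training in this setting minimizes a rate-distortion Lagrangian R + lambda*D over the transform parameters. A sketch of evaluating that objective for one image (MSE stands in for the paper's perceptual metric, and the toy symbol likelihoods are invented):

```python
import numpy as np

def rd_objective(x, x_hat, likelihoods, lam):
    """Rate-distortion Lagrangian R + lambda * D for one image.

    rate: code length in bits under the entropy model of the quantized code;
    distortion: plain MSE here, where the paper uses a perceptual metric.
    """
    rate = -np.sum(np.log2(likelihoods))
    distortion = np.mean((x - x_hat) ** 2)
    return rate + lam * distortion

x = np.zeros(4)
x_hat = np.full(4, 0.1)              # reconstruction after quantization
likelihoods = np.array([0.5, 0.25])  # invented symbol probabilities
loss = rd_objective(x, x_hat, likelihoods, lam=10.0)  # 3 bits + 10 * 0.01
```

Because both terms are differentiable in the transform parameters (given a smooth proxy for scalar quantization), the whole objective can be minimized by gradient descent, which is the sense in which the optimization is end-to-end.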

A New Bengali Readability Score

Title A New Bengali Readability Score
Authors Shanta Phani, Shibamouli Lahiri, Arindam Biswas
Abstract In this paper we have proposed methods to analyze the readability of Bengali language texts. We have got some exceptionally good results out of the experiments.
Tasks
Published 2016-07-10
URL http://arxiv.org/abs/1607.05755v4
PDF http://arxiv.org/pdf/1607.05755v4.pdf
PWC https://paperswithcode.com/paper/a-new-bengali-readability-score
Repo
Framework

Differentiable Genetic Programming

Title Differentiable Genetic Programming
Authors Dario Izzo, Francesco Biscani, Alessio Mereta
Abstract We introduce the use of high order automatic differentiation, implemented via the algebra of truncated Taylor polynomials, in genetic programming. Using the Cartesian Genetic Programming encoding we obtain a high-order Taylor representation of the program output that is then used to back-propagate errors during learning. The resulting machine learning framework is called differentiable Cartesian Genetic Programming (dCGP). In the context of symbolic regression, dCGP offers a new approach to the long unsolved problem of constant representation in GP expressions. On several problems of increasing complexity we find that dCGP is able to find the exact form of the symbolic expression as well as the constants values. We also demonstrate the use of dCGP to solve a large class of differential equations and to find prime integrals of dynamical systems, presenting, in both cases, results that confirm the efficacy of our approach.
Tasks
Published 2016-11-15
URL http://arxiv.org/abs/1611.04766v1
PDF http://arxiv.org/pdf/1611.04766v1.pdf
PWC https://paperswithcode.com/paper/differentiable-genetic-programming
Repo
Framework
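
The truncated Taylor algebra underlying dCGP generalizes dual numbers to arbitrary order; the degree-1 case below shows how evaluating an expression in this algebra yields the derivative needed to learn constants by back-propagation (the expression `f` and the seed values are invented for illustration):

```python
class Dual:
    """Degree-1 truncated Taylor number: value + eps * derivative."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule carried by the algebra itself
        return Dual(self.val * other.val,
                    self.val * other.der + self.der * other.val)
    __rmul__ = __mul__

def f(x, c):
    """An expression a GP run might produce; we differentiate it w.r.t. c."""
    return c * x * x + x

out = f(Dual(2.0), Dual(3.0, 1.0))  # seed der=1 on the constant c
# out.val == 14.0, out.der == d f / d c == x^2 == 4.0
```

dCGP carries higher-order truncated polynomials through the same arithmetic, so the exact derivatives of a whole encoded program fall out of evaluation, with no symbolic differentiation step.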