July 27, 2019

3292 words 16 mins read

Paper Group ANR 468

Heuristic Search for Structural Constraints in Data Association. Variational Gaussian Approximation for Poisson Data. Track, then Decide: Category-Agnostic Vision-based Multi-Object Tracking. Grader variability and the importance of reference standards for evaluating machine learning models for diabetic retinopathy. Evaluating Word Embeddings for S …

Heuristic Search for Structural Constraints in Data Association

Title Heuristic Search for Structural Constraints in Data Association
Authors Xiao Zhou, Peilin Jiang, Fei Wang
Abstract Research on multi-object tracking (MOT) essentially reduces to solving the data association assignment problem, the core of which is to design an association cost that is as discriminative as possible. Generally speaking, match ambiguities caused by the similar appearances of objects and by moving cameras make data association perplexing and challenging. In this paper, we propose a new heuristic method to search for structural constraints (HSSC) of multiple targets when solving the problem of online multi-object tracking. We believe that the internal structure among multiple targets in adjacent frames remains constant and stable even when the video sequences are captured by a moving camera. As a result, the structural constraints are able to cut down the match ambiguities caused by moving cameras as well as by the similar appearances of the tracked objects. The proposed heuristic method aims to obtain a maximum match set under the minimum structural cost for each available match pair, which can be integrated with the raw association costs to make them more elaborate and discriminative than other approaches. In addition, this paper presents a new method to recover missing targets by minimizing a cost function generated from both motion and structure cues. Our online multi-object tracking (MOT) algorithm based on HSSC achieves a multi-object tracking accuracy (MOTA) of 25.0 on the public dataset 2DMOT2015 [1].
Tasks Multi-Object Tracking, Object Tracking, Online Multi-Object Tracking
Published 2017-11-08
URL http://arxiv.org/abs/1711.02823v1
PDF http://arxiv.org/pdf/1711.02823v1.pdf
PWC https://paperswithcode.com/paper/heuristic-search-for-structural-constraints
Repo
Framework
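The entry above does not spell out the heuristic itself, so the snippet below is only a minimal sketch of the general idea: fold a pairwise structural-consistency term into a raw appearance-based association cost matrix and solve the resulting assignment with the Hungarian algorithm. The weighting `alpha`, the greedy anchoring of the other targets, and the function names are illustrative assumptions, not the HSSC procedure.

```python
# Minimal, illustrative sketch (not the paper's actual HSSC heuristic):
# augment an appearance cost with a term measuring how well the relative
# offsets between targets are preserved across adjacent frames.
import numpy as np
from scipy.optimize import linear_sum_assignment

def structural_cost(tracks, detections, i, j):
    """Structural cost of hypothesising that track i matches detection j:
    every other track k is greedily anchored to its nearest detection, and
    we measure how much the relative offset of the pair (i, k) changes.
    A crude stand-in for the paper's minimum structural cost over match sets."""
    costs = []
    for k in range(len(tracks)):
        if k == i:
            continue
        anchor = detections[np.argmin(np.linalg.norm(detections - tracks[k], axis=1))]
        offset_prev = tracks[i] - tracks[k]        # offset in frame t-1
        offset_curr = detections[j] - anchor       # offset under the hypothesis
        costs.append(np.linalg.norm(offset_prev - offset_curr))
    return float(np.mean(costs)) if costs else 0.0

def associate(tracks, detections, appearance_cost, alpha=0.5):
    """tracks: (N, 2), detections: (M, 2) object centres;
    appearance_cost: (N, M) raw association cost matrix."""
    n, m = appearance_cost.shape
    cost = appearance_cost.astype(float).copy()
    for i in range(n):
        for j in range(m):
            cost[i, j] += alpha * structural_cost(tracks, detections, i, j)
    rows, cols = linear_sum_assignment(cost)       # Hungarian algorithm
    return list(zip(rows, cols))
```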

Variational Gaussian Approximation for Poisson Data

Title Variational Gaussian Approximation for Poisson Data
Authors Simon Arridge, Kazufumi Ito, Bangti Jin, Chen Zhang
Abstract The Poisson model is frequently employed to describe count data, but in a Bayesian context it leads to an analytically intractable posterior probability distribution. In this work, we analyze a variational Gaussian approximation to the posterior distribution arising from the Poisson model with a Gaussian prior. This is achieved by seeking an optimal Gaussian distribution minimizing the Kullback-Leibler divergence from the posterior distribution to the approximation, or equivalently maximizing the lower bound for the model evidence. We derive an explicit expression for the lower bound, and show the existence and uniqueness of the optimal Gaussian approximation. The lower bound functional can be viewed as a variant of classical Tikhonov regularization that also penalizes the covariance. We then develop an efficient alternating direction maximization algorithm for solving the optimization problem, and analyze its convergence. We discuss strategies for reducing the computational complexity via the low-rank structure of the forward operator and the sparsity of the covariance. Further, as an application of the lower bound, we discuss hierarchical Bayesian modeling for selecting the hyperparameter in the prior distribution, and propose a monotonically convergent algorithm for determining the hyperparameter. We present extensive numerical experiments to illustrate the Gaussian approximation and the algorithms.
Tasks
Published 2017-09-18
URL http://arxiv.org/abs/1709.05885v1
PDF http://arxiv.org/pdf/1709.05885v1.pdf
PWC https://paperswithcode.com/paper/variational-gaussian-approximation-for
Repo
Framework
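The explicit lower bound and the alternating-direction maximization described above are not reproduced here; the sketch below is only a generic stochastic variational approximation under simplifying assumptions: a log-link Poisson likelihood y_i ~ Poisson(exp(a_i^T x)), a standard Gaussian prior, and a diagonal Gaussian approximation optimized with Monte-Carlo gradients of the evidence lower bound.

```python
# Generic stochastic-VI sketch (NOT the paper's explicit bound or its
# alternating-direction algorithm). Assumptions: y ~ Poisson(exp(A x))
# with a log link for tractability, prior x ~ N(0, I), and a diagonal
# Gaussian approximation q(x) = N(m, diag(s^2)).
import torch

def fit_variational_gaussian(A, y, n_iter=2000, lr=1e-2, n_samples=8):
    """A: (num_obs, n) tensor, y: (num_obs,) tensor of counts."""
    n = A.shape[1]
    m = torch.zeros(n, requires_grad=True)       # variational mean
    log_s = torch.zeros(n, requires_grad=True)   # log of std-devs
    opt = torch.optim.Adam([m, log_s], lr=lr)
    for _ in range(n_iter):
        opt.zero_grad()
        s = log_s.exp()
        eps = torch.randn(n_samples, n)
        x = m + eps * s                          # reparameterized samples
        log_rate = x @ A.T                       # (n_samples, num_obs)
        # Poisson log-likelihood, dropping the constant -log(y!)
        loglik = (y * log_rate - log_rate.exp()).sum(dim=1).mean()
        # KL(q || N(0, I)) in closed form for a diagonal Gaussian
        kl = 0.5 * (s**2 + m**2 - 1.0 - 2.0 * log_s).sum()
        loss = -(loglik - kl)                    # negative ELBO
        loss.backward()
        opt.step()
    return m.detach(), log_s.exp().detach()
```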

Track, then Decide: Category-Agnostic Vision-based Multi-Object Tracking

Title Track, then Decide: Category-Agnostic Vision-based Multi-Object Tracking
Authors Aljoša Ošep, Wolfgang Mehner, Paul Voigtlaender, Bastian Leibe
Abstract The most common paradigm for vision-based multi-object tracking is tracking-by-detection, due to the availability of reliable detectors for several important object categories such as cars and pedestrians. However, future mobile systems will need a capability to cope with rich human-made environments, in which obtaining detectors for every possible object category would be infeasible. In this paper, we propose a model-free multi-object tracking approach that uses a category-agnostic image segmentation method to track objects. We present an efficient segmentation mask-based tracker which associates pixel-precise masks reported by the segmentation. Our approach can utilize semantic information whenever it is available for classifying objects at the track level, while retaining the capability to track generic unknown objects in the absence of such information. We demonstrate experimentally that our approach achieves performance comparable to state-of-the-art tracking-by-detection methods for popular object categories such as cars and pedestrians. Additionally, we show that the proposed method can discover and robustly track a large variety of other objects.
Tasks Multi-Object Tracking, Object Tracking, Semantic Segmentation
Published 2017-12-21
URL http://arxiv.org/abs/1712.07920v1
PDF http://arxiv.org/pdf/1712.07920v1.pdf
PWC https://paperswithcode.com/paper/track-then-decide-category-agnostic-vision
Repo
Framework
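A minimal sketch of the core association step described above, under the assumption that masks are matched greedily per frame pair: pixel-precise masks from consecutive frames are paired by maximizing mask IoU with the Hungarian algorithm. This is an illustration of the idea, not the authors' full tracker.

```python
# Mask-based association sketch: match segmentation masks across frames
# by mask IoU (assumed interpretation, not the released implementation).
import numpy as np
from scipy.optimize import linear_sum_assignment

def mask_iou(a, b):
    """a, b: boolean arrays of identical shape (H, W)."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union > 0 else 0.0

def associate_masks(prev_masks, curr_masks, iou_threshold=0.3):
    cost = np.zeros((len(prev_masks), len(curr_masks)))
    for i, a in enumerate(prev_masks):
        for j, b in enumerate(curr_masks):
            cost[i, j] = 1.0 - mask_iou(a, b)
    rows, cols = linear_sum_assignment(cost)
    # keep only sufficiently overlapping pairs
    return [(i, j) for i, j in zip(rows, cols) if 1.0 - cost[i, j] >= iou_threshold]
```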

Grader variability and the importance of reference standards for evaluating machine learning models for diabetic retinopathy

Title Grader variability and the importance of reference standards for evaluating machine learning models for diabetic retinopathy
Authors Jonathan Krause, Varun Gulshan, Ehsan Rahimy, Peter Karth, Kasumi Widner, Greg S. Corrado, Lily Peng, Dale R. Webster
Abstract Diabetic retinopathy (DR) and diabetic macular edema are common complications of diabetes which can lead to vision loss. The grading of DR is a fairly complex process that requires the detection of fine features such as microaneurysms, intraretinal hemorrhages, and intraretinal microvascular abnormalities. Because of this, there can be a fair amount of grader variability. There are different methods of obtaining the reference standard and resolving disagreements between graders, and while it is usually accepted that adjudication until full consensus will yield the best reference standard, the difference between various methods of resolving disagreements has not been examined extensively. In this study, we examine the variability in different methods of grading, definitions of reference standards, and their effects on building deep learning models for the detection of diabetic eye disease. We find that a small set of adjudicated DR grades allows substantial improvements in algorithm performance. The resulting algorithm’s performance was on par with that of individual U.S. board-certified ophthalmologists and retinal specialists.
Tasks
Published 2017-10-04
URL http://arxiv.org/abs/1710.01711v3
PDF http://arxiv.org/pdf/1710.01711v3.pdf
PWC https://paperswithcode.com/paper/grader-variability-and-the-importance-of
Repo
Framework

Evaluating Word Embeddings for Sentence Boundary Detection in Speech Transcripts

Title Evaluating Word Embeddings for Sentence Boundary Detection in Speech Transcripts
Authors Marcos V. Treviso, Christopher D. Shulby, Sandra M. Aluisio
Abstract This paper is motivated by the automation of neuropsychological tests involving discourse analysis in the retellings of narratives by patients with potential cognitive impairment. In this scenario, the task of sentence boundary detection in speech transcripts is important, as discourse analysis involves the application of Natural Language Processing tools, such as taggers and parsers, which depend on the sentence as a processing unit. Our aim in this paper is to verify which embedding induction method works best for the sentence boundary detection task, specifically whether methods designed to capture semantic, syntactic or morphological similarities perform best.
Tasks Boundary Detection, Word Embeddings
Published 2017-08-15
URL http://arxiv.org/abs/1708.04704v1
PDF http://arxiv.org/pdf/1708.04704v1.pdf
PWC https://paperswithcode.com/paper/evaluating-word-embeddings-for-sentence-1
Repo
Framework
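For context, one common way to cast this evaluation (a sketch under assumptions, not the paper's exact model) is to treat sentence boundary detection as per-token binary classification: each token is represented by the concatenated embeddings of a small context window, and a simple classifier predicts whether a boundary follows it. Swapping in different embedding tables then compares the induction methods.

```python
# Illustrative evaluation setup (not the paper's architecture): per-token
# boundary classification from windowed embedding features.
import numpy as np
from sklearn.linear_model import LogisticRegression

def window_features(tokens, embeddings, dim, window=2):
    """embeddings: dict mapping token -> np.ndarray of length dim."""
    zero = np.zeros(dim)
    feats = []
    for i in range(len(tokens)):
        ctx = []
        for off in range(-window, window + 1):
            j = i + off
            tok = tokens[j] if 0 <= j < len(tokens) else None
            ctx.append(embeddings.get(tok, zero) if tok is not None else zero)
        feats.append(np.concatenate(ctx))
    return np.array(feats)

# Usage sketch: X holds features for every token, y is 1 if a sentence
# boundary follows the token and 0 otherwise (gold labels).
# clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# print(clf.score(X_test, y_test))
```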

Recurrent Autoregressive Networks for Online Multi-Object Tracking

Title Recurrent Autoregressive Networks for Online Multi-Object Tracking
Authors Kuan Fang, Yu Xiang, Xiaocheng Li, Silvio Savarese
Abstract The main challenge of online multi-object tracking is to reliably associate object trajectories with detections in each video frame based on their tracking history. In this work, we propose the Recurrent Autoregressive Network (RAN), a temporal generative modeling framework to characterize the appearance and motion dynamics of multiple objects over time. The RAN couples an external memory and an internal memory. The external memory explicitly stores previous inputs of each trajectory in a time window, while the internal memory learns to summarize long-term tracking history and associate detections by processing the external memory. We conduct experiments on the MOT 2015 and 2016 datasets to demonstrate the robustness of our tracking method in highly crowded and occluded scenes. Our method achieves top-ranked results on the two benchmarks.
Tasks Multi-Object Tracking, Object Tracking, Online Multi-Object Tracking
Published 2017-11-07
URL http://arxiv.org/abs/1711.02741v2
PDF http://arxiv.org/pdf/1711.02741v2.pdf
PWC https://paperswithcode.com/paper/recurrent-autoregressive-networks-for-online
Repo
Framework
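The snippet below is a rough interpretation of the external/internal memory coupling described above, not the released RAN code: the external memory keeps the last K feature vectors of a trajectory, a GRU (internal memory) reads them and emits autoregressive mixing weights, and the next feature is predicted as the weighted sum of the memory; detections are then scored by their distance to that prediction. All layer sizes and names are assumptions.

```python
# Interpretation sketch of the RAN idea (external memory + internal GRU).
import torch
import torch.nn as nn

class RANSketch(nn.Module):
    def __init__(self, feat_dim=4, hidden=64, memory_size=10):
        super().__init__()
        self.memory_size = memory_size
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.to_weights = nn.Linear(hidden, memory_size)

    def predict(self, memory):
        """memory: (memory_size, feat_dim) tensor of the most recent
        trajectory features (assumed padded to exactly memory_size rows)."""
        h, _ = self.gru(memory.unsqueeze(0))              # (1, K, hidden)
        w = torch.softmax(self.to_weights(h[:, -1]), -1)  # AR weights over slots
        return (w.unsqueeze(-1) * memory.unsqueeze(0)).sum(dim=1).squeeze(0)

    def association_score(self, memory, detection):
        """Higher score = detection agrees better with the trajectory."""
        return -torch.norm(self.predict(memory) - detection)
```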

Online Multi-Object Tracking Using CNN-based Single Object Tracker with Spatial-Temporal Attention Mechanism

Title Online Multi-Object Tracking Using CNN-based Single Object Tracker with Spatial-Temporal Attention Mechanism
Authors Qi Chu, Wanli Ouyang, Hongsheng Li, Xiaogang Wang, Bin Liu, Nenghai Yu
Abstract In this paper, we propose a CNN-based framework for online MOT. This framework utilizes the merits of single object trackers in adapting appearance models and searching for the target in the next frame. Naively applying a single object tracker to MOT leads to problems with computational efficiency and to drift caused by occlusion. Our framework achieves computational efficiency by sharing features and using ROI pooling to obtain individual features for each target. Online-learned, target-specific CNN layers are used to adapt the appearance model of each target. Within this framework, we introduce a spatial-temporal attention mechanism (STAM) to handle the drift caused by occlusion and interaction among targets. The visibility map of the target is learned and used to infer the spatial attention map, which is then applied to weight the features. In addition, the occlusion status can be estimated from the visibility map; it controls the online updating process via a weighted loss on training samples with different occlusion statuses in different frames, and can be considered a temporal attention mechanism. The proposed algorithm achieves MOTA of 34.3% and 46.0% on the challenging MOT15 and MOT16 benchmarks, respectively.
Tasks Multi-Object Tracking, Object Tracking, Online Multi-Object Tracking
Published 2017-08-09
URL http://arxiv.org/abs/1708.02843v2
PDF http://arxiv.org/pdf/1708.02843v2.pdf
PWC https://paperswithcode.com/paper/online-multi-object-tracking-using-cnn-based
Repo
Framework
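A minimal interpretation sketch of the attention mechanism described above (not the authors' code): a small head predicts a visibility map from ROI-pooled features, the map re-weights the features spatially, and its mean acts as a visibility estimate that scales the per-sample training loss during online updates. All names and layer sizes are assumptions.

```python
# Spatial-temporal attention sketch (interpretation of the abstract).
import torch
import torch.nn as nn

class STAMHead(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        self.visibility = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1), nn.Sigmoid())
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(channels, 1))

    def forward(self, roi_feat):
        """roi_feat: (N, C, H, W) ROI-pooled features of candidate boxes."""
        vis_map = self.visibility(roi_feat)           # (N, 1, H, W) spatial attention
        attended = roi_feat * vis_map                 # weight the features spatially
        score = self.classifier(attended).squeeze(1)  # target-vs-background score
        visibility = vis_map.mean(dim=(1, 2, 3))      # rough occlusion estimate
        return score, visibility

def weighted_bce(scores, labels, visibility):
    """Temporal attention as loss weighting: samples from frames where the
    target is largely visible contribute more to the online update."""
    loss = nn.functional.binary_cross_entropy_with_logits(
        scores, labels, reduction="none")
    return (visibility.detach() * loss).mean()
```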

Deep Learning for Multi-Task Medical Image Segmentation in Multiple Modalities

Title Deep Learning for Multi-Task Medical Image Segmentation in Multiple Modalities
Authors Pim Moeskops, Jelmer M. Wolterink, Bas H. M. van der Velden, Kenneth G. A. Gilhuijs, Tim Leiner, Max A. Viergever, Ivana Išgum
Abstract Automatic segmentation of medical images is an important task for many clinical applications. In practice, a wide range of anatomical structures are visualised using different imaging modalities. In this paper, we investigate whether a single convolutional neural network (CNN) can be trained to perform different segmentation tasks. A single CNN is trained to segment six tissues in MR brain images, the pectoral muscle in MR breast images, and the coronary arteries in cardiac CTA. The CNN therefore learns to identify the imaging modality, the visualised anatomical structures, and the tissue classes. For each of the three tasks (brain MRI, breast MRI and cardiac CTA), this combined training procedure resulted in a segmentation performance equivalent to that of a CNN trained specifically for that task, demonstrating the high capacity of CNN architectures. Hence, a single system could be used in clinical practice to automatically perform diverse segmentation tasks without task-specific training.
Tasks Medical Image Segmentation, Semantic Segmentation
Published 2017-04-11
URL http://arxiv.org/abs/1704.03379v1
PDF http://arxiv.org/pdf/1704.03379v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-for-multi-task-medical-image
Repo
Framework
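To make the combined training procedure above concrete, here is a brief sketch under assumptions (a stand-in network, a single softmax over the union of all classes, and interleaved batches from the different modalities); the paper's actual architecture and class layout may differ.

```python
# Single-network, multi-task segmentation training sketch.
import torch
import torch.nn as nn

# background + 6 brain tissues + pectoral muscle + coronary arteries
NUM_CLASSES = 1 + 6 + 1 + 1

model = nn.Sequential(                      # stand-in for the paper's CNN
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, NUM_CLASSES, 1))          # per-pixel class scores

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def training_step(batches):
    """batches: list of (images, labels) pairs, one per task/modality,
    images (B, 1, H, W) and labels (B, H, W) with global class ids."""
    optimizer.zero_grad()
    loss = sum(criterion(model(x), y) for x, y in batches)
    loss.backward()
    optimizer.step()
    return loss.item()
```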

Hierarchical Convolutional-Deconvolutional Neural Networks for Automatic Liver and Tumor Segmentation

Title Hierarchical Convolutional-Deconvolutional Neural Networks for Automatic Liver and Tumor Segmentation
Authors Yading Yuan
Abstract Automatic segmentation of the liver and its tumors is an essential step for extracting quantitative imaging biomarkers for accurate tumor detection, diagnosis, prognosis and assessment of tumor response to treatment. The MICCAI 2017 Liver Tumor Segmentation Challenge (LiTS) provides a common platform for comparing different automatic algorithms on contrast-enhanced abdominal CT images in tasks including 1) liver segmentation, 2) liver tumor segmentation, and 3) tumor burden estimation. We participated in this challenge by developing a hierarchical framework based on deep fully convolutional-deconvolutional neural networks (CDNN). A simple CDNN model is first trained to provide a quick but coarse segmentation of the liver on the entire CT volume, then a second CDNN is applied to the liver region for fine liver segmentation. Finally, the segmented liver region, enhanced by histogram equalization, is employed as an additional input to a third CDNN for tumor segmentation. The Jaccard distance is used as the loss function when training the CDNN models to eliminate the need for sample re-weighting. Our framework is trained using the 130 challenge training cases provided by LiTS. The evaluation on the 70 challenge testing cases resulted in a mean Dice Similarity Coefficient (DSC) of 0.963 for liver segmentation, a mean DSC of 0.657 for tumor segmentation, and a root mean square error (RMSE) of 0.017 for tumor burden estimation, which ranked our method first, fifth and third, respectively.
Tasks Automatic Liver And Tumor Segmentation, Liver Segmentation
Published 2017-10-12
URL http://arxiv.org/abs/1710.04540v1
PDF http://arxiv.org/pdf/1710.04540v1.pdf
PWC https://paperswithcode.com/paper/hierarchical-convolutional-deconvolutional
Repo
Framework
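The abstract's Jaccard distance loss has a common differentiable ("soft") form, sketched below; the paper's exact formulation may differ in detail.

```python
# Soft Jaccard (IoU) distance loss, a differentiable surrogate of the
# loss mentioned in the abstract.
import torch

def jaccard_distance_loss(pred, target, eps=1e-7):
    """pred: sigmoid probabilities, target: binary mask, same shape."""
    inter = (pred * target).sum()
    union = (pred ** 2).sum() + (target ** 2).sum() - inter
    return 1.0 - (inter + eps) / (union + eps)
```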

“Maximizing rigidity” revisited: a convex programming approach for generic 3D shape reconstruction from multiple perspective views

Title “Maximizing rigidity” revisited: a convex programming approach for generic 3D shape reconstruction from multiple perspective views
Authors Pan Ji, Hongdong Li, Yuchao Dai, Ian Reid
Abstract Rigid structure-from-motion (RSfM) and non-rigid structure-from-motion (NRSfM) have long been treated in the literature as separate (different) problems. Inspired by a previous work which solved directly for 3D scene structure by factoring the relative camera poses out, we revisit the principle of “maximizing rigidity” in structure-from-motion literature, and develop a unified theory which is applicable to both rigid and non-rigid structure reconstruction in a rigidity-agnostic way. We formulate these problems as a convex semi-definite program, imposing constraints that seek to apply the principle of minimizing non-rigidity. Our results demonstrate the efficacy of the approach, with state-of-the-art accuracy on various 3D reconstruction problems.
Tasks 3D Reconstruction
Published 2017-07-17
URL http://arxiv.org/abs/1707.05009v1
PDF http://arxiv.org/pdf/1707.05009v1.pdf
PWC https://paperswithcode.com/paper/maximizing-rigidity-revisited-a-convex
Repo
Framework

STAR-RT: Visual attention for real-time video game playing

Title STAR-RT: Visual attention for real-time video game playing
Authors Iuliia Kotseruba, John K. Tsotsos
Abstract In this paper we present STAR-RT - the first working prototype of the Selective Tuning Attention Reference (STAR) model and Cognitive Programs (CPs). The Selective Tuning (ST) model has received substantial support through psychological and neurophysiological experiments. The STAR framework expands ST and applies it to practical visual tasks. In order to do so, similarly to many cognitive architectures, STAR combines the visual hierarchy (based on ST) with an executive controller, working and short-term memory components and a fixation controller. CPs in turn enable the communication among all these elements for visual task execution. To test the relevance of the system in a realistic context, we implemented the necessary components of STAR and designed CPs for playing two closed-source video games - Canabalt and Robot Unicorn Attack. Since both games run in a browser window, our algorithm has the same amount of information and the same amount of time to react to events on the screen as a human player would. STAR-RT plays both games in real time using only visual input and achieves scores comparable to human expert players. It thus provides an existence proof for the utility of the particular CP structure and primitives used and the potential for continued experimentation and verification of their utility in broader scenarios.
Tasks
Published 2017-11-26
URL http://arxiv.org/abs/1711.09464v1
PDF http://arxiv.org/pdf/1711.09464v1.pdf
PWC https://paperswithcode.com/paper/star-rt-visual-attention-for-real-time-video
Repo
Framework

A Partitioning Algorithm for Detecting Eventuality Coincidence in Temporal Double recurrence

Title A Partitioning Algorithm for Detecting Eventuality Coincidence in Temporal Double recurrence
Authors B. O. Akinkunmi
Abstract A logical theory of regular double or multiple recurrence of eventualities (regular patterns of occurrences that are repeated in time) has been developed within the context of temporal reasoning, enabling reasoning about the problem of coincidence: if two complex eventualities, or eventuality sequences, consisting respectively of component eventualities x_0, x_1, ..., x_r and y_0, y_1, ..., y_s both recur over an interval k and all eventualities have fixed durations, is there a subinterval of k over which occurrences of x_p and y_q, for p between 1 and r and q between 1 and s, coincide? We present the ideas behind a new algorithm for detecting the coincidence of eventualities x_p and y_q within a cycle of the double recurrence of x and y. The algorithm is based on the novel concept of gcd partitions, which requires partitioning each incidence of both x and y into eventuality sequences whose components each have a duration equal to the greatest common divisor of the durations of x and y. The worst-case running time of the partitioning algorithm is linear in the maximum of the durations of x and y, while the worst-case running time of an algorithm exploring a complete cycle is quadratic in the durations of x and y. Hence the partitioning algorithm works faster than the cyclical exploration in the worst case.
Tasks
Published 2017-04-29
URL http://arxiv.org/abs/1705.00211v1
PDF http://arxiv.org/pdf/1705.00211v1.pdf
PWC https://paperswithcode.com/paper/a-partitioning-algorithm-for-detecting
Repo
Framework
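For orientation, here is a small sketch of the brute-force "cyclical exploration" baseline that the abstract contrasts with the gcd-partition algorithm. It assumes both recurrences start at time 0 and takes "coincide" to mean the two occurrences span exactly the same subinterval; the paper's partitioning algorithm avoids scanning the whole cycle.

```python
# Brute-force coincidence check over one cycle of the double recurrence
# (the quadratic baseline, not the paper's gcd-partition algorithm).
from math import gcd

def coincides(dx, dy, p, q):
    """dx, dy: lists of component durations of x and y; p, q: component
    indices. Returns True if some occurrence of x_p and some occurrence
    of y_q cover exactly the same interval within one cycle of length
    lcm(sum(dx), sum(dy))."""
    Dx, Dy = sum(dx), sum(dy)
    cycle = Dx * Dy // gcd(Dx, Dy)                    # lcm of cycle lengths
    if dx[p] != dy[q]:                                # equal spans need equal durations
        return False
    x_starts = {n * Dx + sum(dx[:p]) for n in range(cycle // Dx)}
    y_starts = {m * Dy + sum(dy[:q]) for m in range(cycle // Dy)}
    return bool(x_starts & y_starts)

# Example (hypothetical durations): does x_1 ever coincide with y_0?
# print(coincides([2, 3], [3, 2], 1, 0))
```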

A novel method for automatic localization of joint area on knee plain radiographs

Title A novel method for automatic localization of joint area on knee plain radiographs
Authors Aleksei Tiulpin, Jérôme Thevenot, Esa Rahtu, Simo Saarakkala
Abstract Osteoarthritis (OA) is a common musculoskeletal condition typically diagnosed from radiographic assessment after clinical examination. However, a visual evaluation made by a practitioner suffers from subjectivity and is highly dependent on experience. Computer-aided diagnostics (CAD) could improve the objectivity of knee radiographic examination. The first essential step of knee OA CAD is to automatically localize the joint area; however, according to the literature, this task itself remains challenging. The aim of this study was to develop a novel and computationally efficient method to tackle this issue. Here, three different datasets of knee radiographs were used (n = 473/93/77) to validate the overall performance of the method. Our pipeline consists of two parts: anatomically-based joint area proposals and their evaluation using Histogram of Oriented Gradients features and scores from a pre-trained Support Vector Machine classifier. The obtained results for the used datasets show a mean intersection over union of 0.84, 0.79 and 0.78. Using a high-end computer, the method can automatically annotate conventional knee radiographs within 14-16 ms and high-resolution ones within 170 ms. Our results demonstrate that the developed method is suitable for large-scale analyses.
Tasks
Published 2017-01-31
URL http://arxiv.org/abs/1701.08991v3
PDF http://arxiv.org/pdf/1701.08991v3.pdf
PWC https://paperswithcode.com/paper/a-novel-method-for-automatic-localization-of
Repo
Framework
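A minimal sketch of the "propose, then score with HOG + SVM" pipeline described above. The paper derives its proposals anatomically; here they are treated as generic candidate boxes, and the classifier `clf` is assumed to be an already-fitted scikit-learn LinearSVC trained on joint vs. non-joint patches.

```python
# Proposal scoring with HOG features and a pre-trained SVM (sketch).
import numpy as np
from skimage.feature import hog
from skimage.transform import resize

def score_proposals(image, proposals, clf, patch_size=(64, 64)):
    """image: 2-D grayscale array; proposals: list of (row, col, h, w);
    clf: a fitted sklearn LinearSVC. Returns the best-scoring box."""
    feats = []
    for r, c, h, w in proposals:
        patch = resize(image[r:r + h, c:c + w], patch_size, anti_aliasing=True)
        feats.append(hog(patch, orientations=9, pixels_per_cell=(8, 8),
                         cells_per_block=(2, 2)))
    scores = clf.decision_function(np.array(feats))
    return proposals[int(np.argmax(scores))], float(scores.max())
```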

Learning A Physical Long-term Predictor

Title Learning A Physical Long-term Predictor
Authors Sebastien Ehrhardt, Aron Monszpart, Niloy J. Mitra, Andrea Vedaldi
Abstract Evolution has resulted in highly developed abilities in many natural intelligences to quickly and accurately predict mechanical phenomena. Humans have successfully developed laws of physics to abstract and model such mechanical phenomena. In the context of artificial intelligence, a recent line of work has focused on estimating physical parameters from sensory data and using them in physical simulators to make long-term predictions. In contrast, we investigate the effectiveness of a single neural network for end-to-end long-term prediction of mechanical phenomena. Based on extensive evaluation, we demonstrate that such networks can outperform alternative approaches that even have access to ground-truth physical simulators, especially when some physical parameters are unobserved or not known a priori. Further, our network outputs a distribution of outcomes to capture the inherent uncertainty in the data. Our approach demonstrates for the first time the possibility of making actionable long-term predictions from sensor data without explicitly modeling the underlying physical laws.
Tasks
Published 2017-03-01
URL http://arxiv.org/abs/1703.00247v1
PDF http://arxiv.org/pdf/1703.00247v1.pdf
PWC https://paperswithcode.com/paper/learning-a-physical-long-term-predictor
Repo
Framework
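To illustrate "outputs a distribution of outcomes" in code (a generic sketch, not the paper's architecture): an LSTM encodes the observed prefix, rolls out future states autoregressively, and emits a mean and log-variance per step, trained with the Gaussian negative log-likelihood.

```python
# Probabilistic long-term rollout sketch with a Gaussian NLL objective.
import torch
import torch.nn as nn

class ProbabilisticPredictor(nn.Module):
    def __init__(self, dim=2, hidden=64):
        super().__init__()
        self.rnn = nn.LSTM(dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2 * dim)       # mean and log-variance

    def forward(self, observed, horizon):
        """observed: (B, T, dim) past states; returns means and log-vars
        of shape (B, horizon, dim) via an autoregressive rollout."""
        out, state = self.rnn(observed)               # encode the prefix
        mu, logvar = self.head(out[:, -1]).chunk(2, dim=-1)
        means, logvars = [mu], [logvar]
        x = mu.unsqueeze(1)
        for _ in range(horizon - 1):
            out, state = self.rnn(x, state)
            mu, logvar = self.head(out[:, -1]).chunk(2, dim=-1)
            means.append(mu); logvars.append(logvar)
            x = mu.unsqueeze(1)                       # feed prediction back in
        return torch.stack(means, 1), torch.stack(logvars, 1)

def gaussian_nll(mu, logvar, target):
    """Negative log-likelihood of the target under N(mu, exp(logvar))."""
    return 0.5 * ((target - mu) ** 2 / logvar.exp() + logvar).mean()
```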

SSGAN: Secure Steganography Based on Generative Adversarial Networks

Title SSGAN: Secure Steganography Based on Generative Adversarial Networks
Authors Haichao Shi, Jing Dong, Wei Wang, Yinlong Qian, Xiaoyu Zhang
Abstract In this paper, a novel strategy of Secure Steganography based on Generative Adversarial Networks is proposed to generate suitable and secure covers for steganography. The proposed architecture has one generative network and two discriminative networks. The generative network mainly evaluates the visual quality of the generated images for steganography, and the discriminative networks are utilized to assess their suitability for information hiding. Different from existing work, which adopts Deep Convolutional Generative Adversarial Networks, we utilize another form of generative adversarial networks. By using this new form of generative adversarial networks, significant improvements are made in convergence speed, training stability and image quality. Furthermore, a sophisticated steganalysis network is reconstructed for the discriminative network, and this network can better evaluate the performance of the generated images. Numerous experiments are conducted on publicly available datasets to demonstrate the effectiveness and robustness of the proposed method.
Tasks
Published 2017-07-06
URL http://arxiv.org/abs/1707.01613v4
PDF http://arxiv.org/pdf/1707.01613v4.pdf
PWC https://paperswithcode.com/paper/ssgan-secure-steganography-based-on
Repo
Framework
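The one-generator / two-discriminator setup described above can be summarized by the generator's combined objective, sketched below as an interpretation of the abstract rather than the authors' code: `D_vis` judges visual quality, `D_steg` is the steganalysis network, and `embed` is a hypothetical stand-in for the steganographic embedding applied to generated covers.

```python
# Loss-combination sketch for SSGAN-style training (assumed interpretation).
import torch
import torch.nn.functional as F

def generator_loss(G, D_vis, D_steg, z, message, embed, alpha=1.0, beta=1.0):
    """G, D_vis, D_steg: networks returning logits of shape (B, 1);
    embed: stand-in embedding function hiding `message` in an image."""
    cover = G(z)                                   # generated cover images
    stego = embed(cover, message)                  # hide the message in them
    real_lbl = torch.ones(cover.size(0), 1)
    # 1) covers should look realistic to the visual-quality discriminator
    loss_vis = F.binary_cross_entropy_with_logits(D_vis(cover), real_lbl)
    # 2) the steganalyser should struggle to flag the stego image, so the
    #    generator is rewarded when D_steg labels it as "cover"
    loss_sec = F.binary_cross_entropy_with_logits(D_steg(stego), real_lbl)
    return alpha * loss_vis + beta * loss_sec
```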