Paper Group ANR 557
Detect or Track: Towards Cost-Effective Video Object Detection/Tracking
Title | Detect or Track: Towards Cost-Effective Video Object Detection/Tracking |
Authors | Hao Luo, Wenxuan Xie, Xinggang Wang, Wenjun Zeng |
Abstract | State-of-the-art object detectors and trackers are developing fast. Trackers are in general more efficient than detectors but bear the risk of drifting. This raises a question: how can the accuracy of video object detection/tracking be improved by utilizing existing detectors and trackers within a given time budget? A baseline is frame skipping: detecting every N-th frame and tracking the frames in between. This baseline, however, is suboptimal, since the detection frequency should depend on the tracking quality. To this end, we propose a scheduler network, a generalization of Siamese trackers, which determines whether to detect or track at a given frame. Although lightweight and simple in structure, the scheduler network is more effective than the frame skipping baselines and flow-based approaches, as validated on the ImageNet VID dataset in video object detection/tracking. |
Tasks | Object Detection, Video Object Detection |
Published | 2018-11-13 |
URL | http://arxiv.org/abs/1811.05340v1 |
http://arxiv.org/pdf/1811.05340v1.pdf | |
PWC | https://paperswithcode.com/paper/detect-or-track-towards-cost-effective-video |
Repo | |
Framework | |
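The contrast drawn in the abstract above, between a fixed frame-skipping baseline and a schedule that reacts to tracking quality, can be sketched as follows. This is a minimal illustration, not the authors' scheduler network: the function names, the per-frame quality scores and the threshold rule are all hypothetical stand-ins.

```python
from typing import List


def frame_skipping_schedule(num_frames: int, n: int) -> List[str]:
    """Baseline: run the detector on every n-th frame, track on the others."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return ["detect" if i % n == 0 else "track" for i in range(num_frames)]


def quality_driven_schedule(scores: List[float], threshold: float) -> List[str]:
    """Hypothetical adaptive rule: detect whenever tracking quality drops.

    scores[i] stands in for a per-frame tracking-quality estimate in [0, 1]
    (an assumption; the paper learns this decision with a network instead).
    The first frame is always detected because there is nothing to track yet.
    """
    actions = []
    for i, score in enumerate(scores):
        actions.append("detect" if i == 0 or score < threshold else "track")
    return actions


if __name__ == "__main__":
    print(frame_skipping_schedule(5, 2))
    print(quality_driven_schedule([0.9, 0.8, 0.3, 0.7], threshold=0.5))
```

The baseline spends detector calls at a fixed rate regardless of content, while the second rule only pays for detection when the (assumed) quality signal says the tracker is likely drifting.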
In-depth Assessment of an Interactive Graph-based Approach for the Segmentation for Pancreatic Metastasis in Ultrasound Acquisitions of the Liver with two Specialists in Internal Medicine
Title | In-depth Assessment of an Interactive Graph-based Approach for the Segmentation for Pancreatic Metastasis in Ultrasound Acquisitions of the Liver with two Specialists in Internal Medicine |
Authors | Jan Egger, Xiaojun Chen, Lucas Bettac, Mark Hänle, Tilmann Gräter, Wolfram Zoller, Dieter Schmalstieg, Alexander Hann |
Abstract | The manual outlining of hepatic metastases in ultrasound (US) acquisitions from patients suffering from pancreatic cancer is common practice. However, such purely manual measurements are often very time consuming, and the results repeatedly differ between raters. In this contribution, we present an in-depth assessment of an interactive graph-based approach for the segmentation of pancreatic metastases in US images of the liver with two specialists in Internal Medicine. We evaluate the approach on over one hundred different acquisitions of metastases. Neither the two physicians nor the algorithm had assessed the acquisitions before the evaluation. In summary, the physicians first performed a purely manual outlining, followed by an algorithmic segmentation over one month later. As a result, the experts were satisfied with up to ninety percent of the algorithmic segmentation results. Furthermore, the algorithmic segmentation was much faster than manual outlining and achieved a median Dice Similarity Coefficient (DSC) of over eighty percent. Ultimately, the algorithm enables a fast and accurate segmentation of liver metastases in clinical US images, which can support the manual outlining in daily practice. |
Tasks | |
Published | 2018-03-12 |
URL | http://arxiv.org/abs/1803.04279v1 |
http://arxiv.org/pdf/1803.04279v1.pdf | |
PWC | https://paperswithcode.com/paper/in-depth-assessment-of-an-interactive-graph |
Repo | |
Framework | |
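The Dice Similarity Coefficient reported in the abstract above is defined as DSC = 2|A ∩ B| / (|A| + |B|) for two binary masks A and B. A minimal sketch on flattened masks follows; the function name and the convention for two empty masks are assumptions made here for illustration.

```python
from typing import Sequence


def dice_coefficient(a: Sequence[int], b: Sequence[int]) -> float:
    """Dice Similarity Coefficient between two flattened binary masks."""
    if len(a) != len(b):
        raise ValueError("masks must have the same number of elements")
    # Count pixels marked as foreground in both masks.
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    # Sum of the foreground sizes of each mask.
    total = sum(1 for x in a if x) + sum(1 for y in b if y)
    if total == 0:
        # Both masks are empty; treating this as perfect agreement is a
        # convention chosen for this sketch.
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    manual = [1, 1, 0, 0]
    algorithmic = [1, 0, 1, 0]
    print(dice_coefficient(manual, algorithmic))
```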
Automatic detection of lumen and media in the IVUS images using U-Net with VGG16 Encoder
Title | Automatic detection of lumen and media in the IVUS images using U-Net with VGG16 Encoder |
Authors | Chirag Balakrishna, Sarshar Dadashzadeh, Sara Soltaninejad |
Abstract | Coronary heart disease is one of the leading causes of mortality in the world and can result from plaque burden inside the arteries. Intravascular Ultrasound (IVUS) has been recognized as a powerful imaging technology that captures real-time, high-resolution images of the coronary arteries and can be used for the analysis of these plaques. IVUS segmentation involves the extraction of two arterial wall components, namely lumen and media. In this paper, we investigate the effectiveness of Convolutional Neural Networks, including U-Net, for segmenting ultrasound scans of arteries. In particular, the proposed segmentation network is based on the U-Net with the VGG16 encoder. Experiments evaluating the proposed segmentation architecture show promising quantitative and qualitative results. |
Tasks | |
Published | 2018-06-20 |
URL | http://arxiv.org/abs/1806.07554v1 |
http://arxiv.org/pdf/1806.07554v1.pdf | |
PWC | https://paperswithcode.com/paper/automatic-detection-of-lumen-and-media-in-the |
Repo | |
Framework | |
Sequence Labeling: A Practical Approach
Title | Sequence Labeling: A Practical Approach |
Authors | Adnan Akhundov, Dietrich Trautmann, Georg Groh |
Abstract | We take a practical approach to solving the sequence labeling problem, assuming unavailability of domain expertise and scarcity of informational and computational resources. To this end, we utilize a universal end-to-end Bi-LSTM-based neural sequence labeling model applicable to a wide range of NLP tasks and languages. The model combines morphological, semantic, and structural cues extracted from data to arrive at informed predictions. The model’s performance is evaluated on eight benchmark datasets (covering three tasks: POS-tagging, NER, and Chunking, and four languages: English, German, Dutch, and Spanish). We observe state-of-the-art results on four of them: CoNLL-2012 (English NER), CoNLL-2002 (Dutch NER), GermEval 2014 (German NER), and Tiger Corpus (German POS-tagging), and competitive performance on the rest. |
Tasks | Chunking |
Published | 2018-08-12 |
URL | http://arxiv.org/abs/1808.03926v1 |
http://arxiv.org/pdf/1808.03926v1.pdf | |
PWC | https://paperswithcode.com/paper/sequence-labeling-a-practical-approach |
Repo | |
Framework | |
Pack and Detect: Fast Object Detection in Videos Using Region-of-Interest Packing
Title | Pack and Detect: Fast Object Detection in Videos Using Region-of-Interest Packing |
Authors | Athindran Ramesh Kumar, Balaraman Ravindran, Anand Raghunathan |
Abstract | Object detection in videos is an important task in computer vision for various applications such as object tracking, video summarization and video search. Although great progress has been made in improving the accuracy of object detection in recent years due to the rise of deep neural networks, the state-of-the-art algorithms are highly computationally intensive. In order to address this challenge, we make two important observations in the context of videos: (i) objects often occupy only a small fraction of the area in each video frame, and (ii) there is a high likelihood of strong temporal correlation between consecutive frames. Based on these observations, we propose Pack and Detect (PaD), an approach to reduce the computational requirements of object detection in videos. In PaD, only selected video frames called anchor frames are processed at full size. In the frames that lie between anchor frames (inter-anchor frames), regions of interest (ROIs) are identified based on the detections in the previous frame. We propose an algorithm to pack the ROIs of each inter-anchor frame together into a reduced-size frame. The computational requirements of the detector are reduced due to the smaller size of the input. In order to maintain the accuracy of object detection, the proposed algorithm expands the ROIs greedily to provide additional background around each object to the detector. PaD can use any underlying neural network architecture to process the full-size and reduced-size frames. Experiments using the ImageNet video object detection dataset indicate that PaD can potentially reduce the number of FLOPS required for a frame by $4\times$. This leads to an overall increase in throughput of $1.25\times$ on a 2.1 GHz Intel Xeon server with an NVIDIA Titan X GPU at the cost of a $1.1\%$ drop in accuracy. |
Tasks | Object Detection, Object Tracking, Real-Time Object Detection, Video Object Detection, Video Summarization |
Published | 2018-09-05 |
URL | http://arxiv.org/abs/1809.01701v3 |
http://arxiv.org/pdf/1809.01701v3.pdf | |
PWC | https://paperswithcode.com/paper/pack-and-detect-fast-object-detection-in |
Repo | |
Framework | |
Inherent Brain Segmentation Quality Control from Fully ConvNet Monte Carlo Sampling
Title | Inherent Brain Segmentation Quality Control from Fully ConvNet Monte Carlo Sampling |
Authors | Abhijit Guha Roy, Sailesh Conjeti, Nassir Navab, Christian Wachinger |
Abstract | We introduce inherent measures for effective quality control of brain segmentation based on a Bayesian fully convolutional neural network, using model uncertainty. Monte Carlo samples from the posterior distribution are efficiently generated using dropout at test time. Based on these samples, we introduce, in addition to a voxel-wise uncertainty map, three metrics for structure-wise uncertainty. We then incorporate these structure-wise uncertainties in group analyses as a measure of confidence in the observation. Our results show that the metrics are highly correlated with segmentation accuracy and therefore present an inherent measure of segmentation quality. Furthermore, group analysis with uncertainty results in effect sizes closer to those of manual annotations. The introduced uncertainty metrics can not only be very useful in translation to clinical practice but also provide automated quality control and group analyses in processing large data repositories. |
Tasks | Brain Segmentation |
Published | 2018-04-19 |
URL | http://arxiv.org/abs/1804.07046v2 |
http://arxiv.org/pdf/1804.07046v2.pdf | |
PWC | https://paperswithcode.com/paper/inherent-brain-segmentation-quality-control |
Repo | |
Framework | |
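As a rough illustration of how Monte Carlo samples can be turned into a voxel-wise uncertainty map, the sketch below averages repeated stochastic predictions and computes the binary entropy of the mean. It assumes the dropout forward passes have already been run and that each sample is a flat list of foreground probabilities; the abstract does not specify the paper's three structure-wise metrics, so none of them are reproduced here, and predictive entropy is only one common choice.

```python
import math
from typing import List, Tuple


def mc_uncertainty(samples: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Mean foreground probability and binary entropy per voxel.

    samples[t][v] is the foreground probability of voxel v in the t-th
    stochastic forward pass (hypothetical input layout for this sketch).
    """
    if not samples:
        raise ValueError("at least one Monte Carlo sample is required")
    num_passes = len(samples)
    num_voxels = len(samples[0])
    means = [sum(s[v] for s in samples) / num_passes for v in range(num_voxels)]
    entropies = []
    for p in means:
        h = 0.0
        for q in (p, 1.0 - p):
            if q > 0.0:  # by convention 0 * log(0) contributes nothing
                h -= q * math.log(q)
        entropies.append(h)
    return means, entropies


if __name__ == "__main__":
    passes = [[0.0, 0.9, 0.4], [0.0, 0.8, 0.6]]
    mean_map, entropy_map = mc_uncertainty(passes)
    print(mean_map)
    print(entropy_map)
```

Voxels where the passes agree on a confident prediction get entropy near zero, while voxels whose mean probability is close to 0.5 get the highest uncertainty.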
Catch and Prolong: recurrent neural network for seeking track-candidates
Title | Catch and Prolong: recurrent neural network for seeking track-candidates |
Authors | Dmitriy Baranov, Gennady Ososkov, Pavel Goncharov, Andrei Tsytrinov |
Abstract | One of the most important problems of data processing in high energy and nuclear physics is event reconstruction. Its main part is the track reconstruction procedure, which consists of looking for all tracks that elementary particles leave when they pass through a detector among a huge number of points, so-called hits, produced when flying particles fire detector coordinate planes. Unfortunately, tracking is seriously impeded by a well-known shortcoming of multiwire, strip and GEM detectors: the appearance of many fake hits caused by spurious crossings of fired strips. Since the number of those fakes is several orders of magnitude greater than the number of true hits, it is quite difficult to unravel possible track-candidates from true hits while ignoring fakes. We introduce a renewed method that is a significant improvement of our previous two-stage approach, which was based on hit preprocessing using a directed K-d tree search followed by a deep neural classifier. We combine these two stages into one by applying a recurrent neural network that simultaneously determines whether a set of points belongs to a true track and predicts where to look for the next point of the track on the next coordinate plane of the detector. We show that the proposed deep network is more accurate, faster and does not require any special preprocessing stage. Preliminary results of our approach for simulated events of the BM@N GEM detector are presented. |
Tasks | |
Published | 2018-11-14 |
URL | http://arxiv.org/abs/1811.06002v1 |
http://arxiv.org/pdf/1811.06002v1.pdf | |
PWC | https://paperswithcode.com/paper/catch-and-prolong-recurrent-neural-network |
Repo | |
Framework | |
Beyond Attributes: Adversarial Erasing Embedding Network for Zero-shot Learning
Title | Beyond Attributes: Adversarial Erasing Embedding Network for Zero-shot Learning |
Authors | Xiao-Bo Jin, Kai-Zhu Huang, Jianyu Miao |
Abstract | In this paper, an adversarial erasing embedding network with the guidance of high-order attributes (AEEN-HOA) is proposed to further address the challenging ZSL/GZSL task. AEEN-HOA consists of two branches: the upper stream erases some initially discovered regions, and high-order attribute supervision is then incorporated to characterize the relationship between the class attributes. Meanwhile, the bottom stream is trained on the current background regions to learn the same attribute. As far as we know, this is the first time erasing operations have been introduced into the ZSL task. In addition, we are the first to propose a class attribute activation map for the visualization of ZSL output, which shows the relationship between the class attribute feature and the attention map. Experiments on four standard benchmark datasets demonstrate the superiority of the AEEN-HOA framework. |
Tasks | Zero-Shot Learning |
Published | 2018-11-19 |
URL | http://arxiv.org/abs/1811.07626v2 |
http://arxiv.org/pdf/1811.07626v2.pdf | |
PWC | https://paperswithcode.com/paper/beyond-attributes-adversarial-erasing |
Repo | |
Framework | |
A Review of Learning with Deep Generative Models from Perspective of Graphical Modeling
Title | A Review of Learning with Deep Generative Models from Perspective of Graphical Modeling |
Authors | Zhijian Ou |
Abstract | This document aims to provide a review on learning with deep generative models (DGMs), which is a highly active area in machine learning and, more generally, artificial intelligence. This review is not meant to be a tutorial, but when necessary, we provide self-contained derivations for completeness. This review has two features. First, though there are different perspectives to classify DGMs, we choose to organize this review from the perspective of graphical modeling, because the learning methods for directed DGMs and undirected DGMs are fundamentally different. Second, we differentiate model definitions from model learning algorithms, since different learning algorithms can be applied to solve the learning problem on the same model, and an algorithm can be applied to learn different models. We thus separate model definition and model learning, with more emphasis on reviewing, differentiating and connecting different learning algorithms. We also discuss promising future research directions. |
Tasks | |
Published | 2018-08-05 |
URL | http://arxiv.org/abs/1808.01630v4 |
http://arxiv.org/pdf/1808.01630v4.pdf | |
PWC | https://paperswithcode.com/paper/a-review-of-learning-with-deep-generative |
Repo | |
Framework | |
Automated Evaluation of Semantic Segmentation Robustness for Autonomous Driving
Title | Automated Evaluation of Semantic Segmentation Robustness for Autonomous Driving |
Authors | Wei Zhou, Julie Stephany Berrio, Stewart Worrall, Eduardo Nebot |
Abstract | One of the fundamental challenges in the design of perception systems for autonomous vehicles is validating the performance of each algorithm under a comprehensive variety of operating conditions. In the case of vision-based semantic segmentation, there are known issues when encountering new scenarios that are sufficiently different to the training data. In addition, even small variations in environmental conditions such as illumination and precipitation can affect the classification performance of the segmentation model. Given the reliance on visual information, these effects often translate into poor semantic pixel classification which can potentially lead to catastrophic consequences when driving autonomously. This paper presents a novel method for analysing the robustness of semantic segmentation models and provides a number of metrics to evaluate the classification performance over a variety of environmental conditions. The process incorporates an additional sensor (lidar) to automate the evaluation, eliminating the need for labour-intensive hand labelling of validation data. The system integrity can be monitored as the performance of the vision sensors is validated against a different sensor modality. This is necessary for detecting failures that are inherent to vision technology. Experimental results are presented based on multiple datasets collected at different times of the year with different environmental conditions. These results show that the semantic segmentation performance varies depending on the weather, camera parameters, existence of shadows, etc. The results also demonstrate how the metrics can be used to compare and validate the performance after making improvements to a model, and to compare the performance of different networks. |
Tasks | Autonomous Driving, Autonomous Vehicles, Semantic Segmentation |
Published | 2018-10-24 |
URL | http://arxiv.org/abs/1810.10193v1 |
http://arxiv.org/pdf/1810.10193v1.pdf | |
PWC | https://paperswithcode.com/paper/automated-evaluation-of-semantic-segmentation |
Repo | |
Framework | |
Learning to Teach in Cooperative Multiagent Reinforcement Learning
Title | Learning to Teach in Cooperative Multiagent Reinforcement Learning |
Authors | Shayegan Omidshafiei, Dong-Ki Kim, Miao Liu, Gerald Tesauro, Matthew Riemer, Christopher Amato, Murray Campbell, Jonathan P. How |
Abstract | Collective human knowledge has clearly benefited from the fact that innovations by individuals are taught to others through communication. Similar to human social groups, agents in distributed learning systems would likely benefit from communication to share knowledge and teach skills. The problem of teaching to improve agent learning has been investigated by prior works, but these approaches make assumptions that prevent application of teaching to general multiagent problems, or require domain expertise for problems they can apply to. This learning to teach problem has inherent complexities related to measuring long-term impacts of teaching that compound the standard multiagent coordination challenges. In contrast to existing works, this paper presents the first general framework and algorithm for intelligent agents to learn to teach in a multiagent environment. Our algorithm, Learning to Coordinate and Teach Reinforcement (LeCTR), addresses peer-to-peer teaching in cooperative multiagent reinforcement learning. Each agent in our approach learns both when and what to advise, then uses the received advice to improve local learning. Importantly, these roles are not fixed; these agents learn to assume the role of student and/or teacher at the appropriate moments, requesting and providing advice in order to improve teamwide performance and learning. Empirical comparisons against state-of-the-art teaching methods show that our teaching agents not only learn significantly faster, but also learn to coordinate in tasks where existing methods fail. |
Tasks | |
Published | 2018-05-20 |
URL | http://arxiv.org/abs/1805.07830v4 |
http://arxiv.org/pdf/1805.07830v4.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-teach-in-cooperative-multiagent |
Repo | |
Framework | |
Day-ahead electricity price forecasting with high-dimensional structures: Univariate vs. multivariate modeling frameworks
Title | Day-ahead electricity price forecasting with high-dimensional structures: Univariate vs. multivariate modeling frameworks |
Authors | Florian Ziel, Rafal Weron |
Abstract | We conduct an extensive empirical study on short-term electricity price forecasting (EPF) to address the long-standing question of whether the optimal model structure for EPF is univariate or multivariate. We provide evidence that despite a minor edge in predictive performance overall, the multivariate modeling framework does not uniformly outperform the univariate one across all 12 considered datasets, seasons of the year or hours of the day, and at times is outperformed by the latter. This is an indication that combining advanced structures or the corresponding forecasts from both modeling approaches can bring a further improvement in forecasting accuracy. We show that this indeed can be the case, even for a simple averaging scheme involving only two models. Finally, we also analyze variable selection for the best performing high-dimensional lasso-type models, thus providing guidelines for structuring better performing forecasting model designs. |
Tasks | |
Published | 2018-05-17 |
URL | http://arxiv.org/abs/1805.06649v1 |
http://arxiv.org/pdf/1805.06649v1.pdf | |
PWC | https://paperswithcode.com/paper/day-ahead-electricity-price-forecasting-with |
Repo | |
Framework | |
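The "simple averaging scheme involving only two models" mentioned in the abstract above can be illustrated with a toy example. The prices and forecasts below are made-up numbers chosen so that the two models err in opposite directions; they are not results from the paper, and the function names are hypothetical.

```python
from typing import List


def mean_absolute_error(actual: List[float], forecast: List[float]) -> float:
    """Average absolute deviation between actual and forecast values."""
    if len(actual) != len(forecast):
        raise ValueError("series must have the same length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def average_forecasts(first: List[float], second: List[float]) -> List[float]:
    """Equal-weight combination of two forecasts, hour by hour."""
    if len(first) != len(second):
        raise ValueError("forecasts must have the same length")
    return [(x + y) / 2.0 for x, y in zip(first, second)]


if __name__ == "__main__":
    actual = [10.0, 20.0, 30.0]        # toy prices, illustrative only
    univariate = [12.0, 18.0, 33.0]    # stands in for a univariate model
    multivariate = [8.0, 22.0, 27.0]   # stands in for a multivariate model
    combined = average_forecasts(univariate, multivariate)
    for name, series in [("univariate", univariate),
                         ("multivariate", multivariate),
                         ("combined", combined)]:
        print(name, mean_absolute_error(actual, series))
```

Averaging helps most when the two models' errors are weakly or negatively correlated; if both models err in the same direction, the combination cannot do better than the better of the two.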
Revisiting Pre-training: An Efficient Training Method for Image Classification
Title | Revisiting Pre-training: An Efficient Training Method for Image Classification |
Authors | Bowen Cheng, Yunchao Wei, Honghui Shi, Shiyu Chang, Jinjun Xiong, Thomas S. Huang |
Abstract | The training method of repetitively feeding all samples into a pre-defined network for image classification has been widely adopted by current state-of-the-art methods. In this work, we provide a new method, which can be leveraged to train classification networks in a more efficient way. Starting with a warm-up step, we propose to continually repeat a Drop-and-Pick (DaP) learning strategy. In particular, we drop easy samples to encourage the network to focus on studying hard ones. Meanwhile, by picking up all samples periodically during training, we aim to refresh the memory of the network and prevent catastrophic forgetting of previously learned knowledge. Our DaP learning method can recover 99.88%, 99.60%, and 99.83% top-1 accuracy on ImageNet for ResNet-50, DenseNet-121, and MobileNet-V1, respectively, while requiring only 75% of the training computation of the classic training schedule. Furthermore, our pre-trained models are equipped with strong knowledge transferability when used for downstream tasks, especially for hard cases. Extensive experiments on object detection, instance segmentation and pose estimation demonstrate the effectiveness of our DaP training method. |
Tasks | Image Classification, Instance Segmentation, Object Detection, Pose Estimation, Semantic Segmentation |
Published | 2018-11-23 |
URL | http://arxiv.org/abs/1811.09347v1 |
http://arxiv.org/pdf/1811.09347v1.pdf | |
PWC | https://paperswithcode.com/paper/revisiting-pre-training-an-efficient-training |
Repo | |
Framework | |
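One way to picture the warm-up, drop and pick phases described in the abstract above is as a per-epoch sample selection rule. The sketch below is an assumption-laden illustration: the abstract does not say how easy samples are identified or how often all samples are picked, so the loss threshold, the cycle length and the function name are hypothetical.

```python
from typing import List


def select_samples(losses: List[float], epoch: int, warmup_epochs: int,
                   pick_every: int, easy_threshold: float) -> List[int]:
    """Return the indices of the samples to train on in this epoch.

    Assumed rule (not from the paper): a sample is "easy" when its most
    recent loss is below easy_threshold. All samples are used during
    warm-up and in the last epoch of every pick_every-epoch cycle.
    """
    all_indices = list(range(len(losses)))
    if epoch < warmup_epochs:
        return all_indices  # warm-up: train on everything
    position_in_cycle = (epoch - warmup_epochs) % pick_every
    if position_in_cycle == pick_every - 1:
        return all_indices  # pick: revisit every sample to limit forgetting
    # drop: keep only the samples that are still hard
    return [i for i in all_indices if losses[i] >= easy_threshold]


if __name__ == "__main__":
    recent_losses = [0.1, 0.9, 0.5]
    for epoch in range(6):
        print(epoch, select_samples(recent_losses, epoch, warmup_epochs=2,
                                    pick_every=3, easy_threshold=0.4))
```

Under this rule the compute saving comes from the drop epochs, where only the hard subset is processed.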
Differentially Private Empirical Risk Minimization Revisited: Faster and More General
Title | Differentially Private Empirical Risk Minimization Revisited: Faster and More General |
Authors | Di Wang, Minwei Ye, Jinhui Xu |
Abstract | In this paper we study the differentially private Empirical Risk Minimization (ERM) problem in different settings. For smooth (strongly) convex loss functions with or without (non)-smooth regularization, we give algorithms that achieve either optimal or near optimal utility bounds with less gradient complexity compared with previous work. For ERM with smooth convex loss functions in the high-dimensional ($p\gg n$) setting, we give an algorithm which achieves the upper bound with less gradient complexity than previous ones. Finally, we generalize the expected excess empirical risk from convex loss functions to non-convex ones satisfying the Polyak-Lojasiewicz condition and give a tighter upper bound on the utility than the one in \cite{ijcai2017-548}. |
Tasks | |
Published | 2018-02-14 |
URL | http://arxiv.org/abs/1802.05251v1 |
http://arxiv.org/pdf/1802.05251v1.pdf | |
PWC | https://paperswithcode.com/paper/differentially-private-empirical-risk-2 |
Repo | |
Framework | |
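For reference, the Polyak-Lojasiewicz condition mentioned in the abstract above is usually stated as follows for a differentiable function $f$ with minimum value $f^*$ and some constant $\mu > 0$. The symbols are notation chosen here for illustration and may differ from the paper's.

$$\frac{1}{2}\,\lVert \nabla f(x) \rVert_2^2 \;\ge\; \mu\,\bigl(f(x) - f^*\bigr) \quad \text{for all } x.$$

The condition does not require convexity, but it still guarantees that every stationary point is a global minimizer, which is why it is a common setting for extending convex-style bounds to non-convex losses.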
MIMO Graph Filters for Convolutional Neural Networks
Title | MIMO Graph Filters for Convolutional Neural Networks |
Authors | Fernando Gama, Antonio G. Marques, Alejandro Ribeiro, Geert Leus |
Abstract | Superior performance and ease of implementation have fostered the adoption of Convolutional Neural Networks (CNNs) for a wide array of inference and reconstruction tasks. CNNs implement three basic blocks: convolution, pooling and pointwise nonlinearity. Since the first two operations are well-defined only on regular-structured data such as audio or images, application of CNNs to contemporary datasets where the information is defined in irregular domains is challenging. This paper investigates CNN architectures that operate on signals whose support can be modeled using a graph. Architectures that replace the regular convolution with a so-called linear shift-invariant graph filter have been recently proposed. This paper goes one step further and, under the framework of multiple-input multiple-output (MIMO) graph filters, imposes additional structure on the adopted graph filters to obtain three new (more parsimonious) architectures. The proposed architectures result in a lower number of model parameters, reducing the computational complexity, facilitating the training, and mitigating the risk of overfitting. Simulations show that the proposed simpler architectures achieve performance similar to that of more complex models. |
Tasks | |
Published | 2018-03-06 |
URL | http://arxiv.org/abs/1803.02247v1 |
http://arxiv.org/pdf/1803.02247v1.pdf | |
PWC | https://paperswithcode.com/paper/mimo-graph-filters-for-convolutional-neural |
Repo | |
Framework | |
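The linear shift-invariant graph filter that the abstract above refers to is commonly written as a polynomial in a graph shift operator S (for example an adjacency matrix), y = h_0 x + h_1 S x + h_2 S^2 x + ... . The sketch below implements only this standard single-input single-output form in plain Python; the MIMO structure and the three architectures proposed in the paper are not reproduced, and the function and argument names are assumptions.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(matrix: Matrix, vector: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def graph_filter(shift: Matrix, taps: Vector, signal: Vector) -> Vector:
    """Apply y = sum_k taps[k] * shift^k * signal.

    shift is the graph shift operator, taps are the filter coefficients
    h_0, h_1, ..., and signal holds one value per node.
    """
    output = [0.0] * len(signal)
    shifted = list(signal)  # shift^0 applied to the signal
    for k, tap in enumerate(taps):
        if k > 0:
            shifted = matvec(shift, shifted)  # one more hop on the graph
        output = [o + tap * s for o, s in zip(output, shifted)]
    return output


if __name__ == "__main__":
    # Directed 3-node cycle: each shift moves node i's value to node i+1.
    cycle = [[0.0, 0.0, 1.0],
             [1.0, 0.0, 0.0],
             [0.0, 1.0, 0.0]]
    print(graph_filter(cycle, taps=[1.0, 2.0, 3.0], signal=[1.0, 0.0, 0.0]))
```

On the directed cycle the operation reduces to an ordinary circular convolution, which is the sense in which graph filters generalize the convolution used in regular CNNs. The number of learnable parameters is the number of taps, independent of the graph size.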