October 17, 2019

3433 words 17 mins read

Paper Group ANR 853

Solution for Large-Scale Hierarchical Object Detection Datasets with Incomplete Annotation and Data Imbalance

Title Solution for Large-Scale Hierarchical Object Detection Datasets with Incomplete Annotation and Data Imbalance
Authors Yuan Gao, Xingyuan Bu, Yang Hu, Hui Shen, Ti Bai, Xubin Li, Shilei Wen
Abstract This report demonstrates our solution for the Open Images 2018 Challenge. Based on our detailed analysis of the Open Images Dataset (OID), it is found that the dataset has four typical features: large scale, a hierarchical tag system, severe annotation incompleteness and data imbalance. Considering these characteristics, a number of strategies are employed, including SNIPER, soft sampling, class-aware sampling (CAS), hierarchical non-maximum suppression (HNMS) and so on. By virtue of these effective strategies, and further using the powerful SENet154 armed with a feature pyramid module and deformable RoIAlign as the backbone, our best single model achieves a mAP of 56.9%. After a further ensemble of 9 models, the final mAP is boosted to 62.2% on the public leaderboard (ranked 2nd) and 58.6% on the private leaderboard (ranked 3rd, trailing the 1st place by only 0.04 points).
Tasks Object Detection
Published 2018-10-15
URL http://arxiv.org/abs/1810.06208v1
PDF http://arxiv.org/pdf/1810.06208v1.pdf
PWC https://paperswithcode.com/paper/solution-for-large-scale-hierarchical-object
Repo
Framework
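
The report lists class-aware sampling (CAS) among its strategies but gives no implementation details; below is a minimal Python sketch of the general CAS idea for imbalanced detection data (the `image_classes` structure and function name are illustrative, not from the paper):

```python
import random
from collections import defaultdict

def class_aware_sample(image_classes, num_samples, rng=random.Random(0)):
    """Draw images by first picking a class uniformly, then an image from that
    class, so rare classes are seen as often as frequent ones during training.

    image_classes: dict mapping image_id -> list of class labels in the image.
    """
    # Invert the index: class -> images containing it.
    by_class = defaultdict(list)
    for img_id, classes in image_classes.items():
        for c in classes:
            by_class[c].append(img_id)
    class_list = list(by_class)

    samples = []
    for _ in range(num_samples):
        c = rng.choice(class_list)          # uniform over classes, not images
        samples.append(rng.choice(by_class[c]))
    return samples

# Toy usage: class "b" is rare but is sampled as often as "a" on average.
images = {"img1": ["a"], "img2": ["a"], "img3": ["a"], "img4": ["b"]}
print(class_aware_sample(images, 8))
```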

Automatic Identification of Arabic expressions related to future events in Lebanon’s economy

Title Automatic Identification of Arabic expressions related to future events in Lebanon’s economy
Authors Moustafa Al-Hajj, Amani Sabra
Abstract In this paper, we propose a method to automatically identify future events in Lebanon’s economy from Arabic texts. Challenges are threefold: first, we need to build a corpus of Arabic texts that covers Lebanon’s economy; second, we need to study how future events are expressed linguistically in these texts; and third, we need to automatically identify the relevant textual segments accordingly. We validate this method on a corpus constructed from the web and show that it yields very promising results. To do so, we use SLCSAS, a system for semantic analysis based on the Contextual Explorer method, and the “AlKhalil Morpho Sys” system for morpho-syntactic analysis.
Tasks
Published 2018-05-29
URL http://arxiv.org/abs/1805.11603v1
PDF http://arxiv.org/pdf/1805.11603v1.pdf
PWC https://paperswithcode.com/paper/automatic-identification-of-arabic
Repo
Framework

Designing the Game to Play: Optimizing Payoff Structure in Security Games

Title Designing the Game to Play: Optimizing Payoff Structure in Security Games
Authors Zheyuan Ryan Shi, Ziye Tang, Long Tran-Thanh, Rohit Singh, Fei Fang
Abstract Effective game-theoretic modeling of defender-attacker behavior is becoming increasingly important. In many domains, the defender functions not only as a player but also as the designer of the game’s payoff structure. We study Stackelberg Security Games where the defender, in addition to allocating defensive resources to protect targets from the attacker, can strategically manipulate the attacker’s payoff under budget constraints in weighted L^p-norm form regarding the amount of change. Focusing on problems with a weighted L^1-norm constraint, we present (i) a mixed integer linear program-based algorithm with an approximation guarantee; (ii) a branch-and-bound based algorithm with improved efficiency achieved by effective pruning; (iii) a polynomial time approximation scheme for a special but practical class of problems. In addition, we show that problems under budget constraints in L^0-norm form and weighted L^\infty-norm form can be solved in polynomial time. We provide an extensive experimental evaluation of our proposed algorithms.
Tasks
Published 2018-05-05
URL http://arxiv.org/abs/1805.01987v2
PDF http://arxiv.org/pdf/1805.01987v2.pdf
PWC https://paperswithcode.com/paper/designing-the-game-to-play-optimizing-payoff
Repo
Framework
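
As a toy illustration of the problem setup (not the paper's MILP, branch-and-bound, or PTAS algorithms), the sketch below brute-forces a two-target instance: it discretizes both the defender's coverage and the change to the attacker's rewards, enforces a weighted L1 budget, and keeps the best defender utility under the attacker's best response. All numbers are made up.

```python
import itertools
import numpy as np

# Toy 2-target instance (all numbers illustrative).
Rd = np.array([5.0, 3.0])    # defender reward if the attacked target is covered
Pd = np.array([-4.0, -1.0])  # defender penalty if the attacked target is uncovered
Ra = np.array([4.0, 2.0])    # attacker reward if the target is uncovered
Pa = np.array([-3.0, -2.0])  # attacker penalty if the target is covered
w  = np.array([1.0, 2.0])    # per-target manipulation weights
B  = 1.5                     # weighted L1 manipulation budget

grid = np.linspace(0.0, 1.0, 21)        # coverage discretization
deltas = np.linspace(-1.5, 1.5, 13)     # candidate changes to attacker rewards

best = (-np.inf, None)
for d0, d1 in itertools.product(deltas, repeat=2):
    delta = np.array([d0, d1])
    if np.sum(w * np.abs(delta)) > B:   # weighted L1 budget constraint
        continue
    Ra_mod = Ra + delta
    for c0 in grid:                     # one defensive resource split over 2 targets
        c = np.array([c0, 1.0 - c0])
        ua = c * Pa + (1 - c) * Ra_mod  # attacker expected utilities
        ud = c * Rd + (1 - c) * Pd      # defender expected utilities
        # Strong Stackelberg assumption: attacker breaks ties in the defender's favor.
        t = max(np.flatnonzero(ua == ua.max()), key=lambda i: ud[i])
        if ud[t] > best[0]:
            best = (ud[t], (c.copy(), delta.copy()))

print("best defender utility:", best[0])
print("coverage, payoff change:", best[1])
```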

Generation Meets Recommendation: Proposing Novel Items for Groups of Users

Title Generation Meets Recommendation: Proposing Novel Items for Groups of Users
Authors Vinh Vo Thanh, Harold Soh
Abstract Consider a movie studio aiming to produce a set of new movies for summer release: What types of movies should it produce? Who would the movies appeal to? How many movies should it make? Similar issues are encountered by a variety of organizations, e.g., mobile-phone manufacturers and online magazines, who have to create new (non-existent) items to satisfy groups of users with different preferences. In this paper, we present a joint problem formalization of these interrelated issues, and propose generative methods that address these questions simultaneously. Specifically, we leverage the latent space obtained by training a deep generative model—the Variational Autoencoder (VAE)—via a loss function that incorporates both rating performance and item reconstruction terms. We then apply a greedy search algorithm that utilizes this learned latent space to jointly obtain K plausible new items, and user groups that would find the items appealing. An evaluation of our methods on a synthetic dataset indicates that our approach is able to generate novel items similar to highly-desirable unobserved items. As case studies on real-world data, we applied our method to the MART abstract art and Movielens Tag Genome datasets, which yielded promising results: small and diverse sets of novel items.
Tasks
Published 2018-08-02
URL http://arxiv.org/abs/1808.01199v1
PDF http://arxiv.org/pdf/1808.01199v1.pdf
PWC https://paperswithcode.com/paper/generation-meets-recommendation-proposing
Repo
Framework
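
The abstract describes a VAE trained with a loss that combines rating performance and item reconstruction; the PyTorch sketch below shows one plausible way to wire that up. The layer sizes, the rating head and the loss weights are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RatingVAE(nn.Module):
    """VAE over item features with an extra head that predicts user-group ratings
    from the latent code, so the latent space is shaped by both objectives."""
    def __init__(self, item_dim=128, latent_dim=32, num_groups=10):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(item_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, item_dim))
        self.rating_head = nn.Linear(latent_dim, num_groups)  # predicted rating per group

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.dec(z), self.rating_head(z), mu, logvar

def loss_fn(x, ratings, x_hat, r_hat, mu, logvar, beta=1.0, gamma=1.0):
    recon = F.mse_loss(x_hat, x)                     # item reconstruction term
    rating = F.mse_loss(r_hat, ratings)              # rating performance term
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + gamma * rating + beta * kl

# Toy forward/backward pass with random data.
model = RatingVAE()
x = torch.randn(16, 128)
ratings = torch.randn(16, 10)
x_hat, r_hat, mu, logvar = model(x)
loss_fn(x, ratings, x_hat, r_hat, mu, logvar).backward()
```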

Material Based Object Tracking in Hyperspectral Videos: Benchmark and Algorithms

Title Material Based Object Tracking in Hyperspectral Videos: Benchmark and Algorithms
Authors Fengchao Xiong, Jun Zhou, Yuntao Qian
Abstract Traditional color images only depict color intensities in red, green and blue channels, often making object trackers fail in challenging scenarios, e.g., background clutter and rapid changes of target appearance. Alternatively, the material information of targets contained in the large number of bands of hyperspectral images (HSI) is more robust to these difficult conditions. In this paper, we conduct a comprehensive study on how material information can be utilized to boost object tracking from three aspects: benchmark dataset, material feature representation and material based tracking. In terms of benchmark, we construct a dataset of fully-annotated videos, which contain both hyperspectral and color sequences of the same scene. Material information is represented by a spectral-spatial histogram of multidimensional gradients, which describes the 3D local spectral-spatial structure in an HSI, and fractional abundances of constituent material components, which encode the underlying material distribution. These two types of features are embedded into correlation filters, yielding material based tracking. Experimental results on the collected benchmark dataset show the potential and advantages of material based object tracking.
Tasks Object Tracking
Published 2018-12-11
URL https://arxiv.org/abs/1812.04179v5
PDF https://arxiv.org/pdf/1812.04179v5.pdf
PWC https://paperswithcode.com/paper/spectral-spatial-features-for-material-based
Repo
Framework
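
A rough numpy sketch of a spectral-spatial histogram of the 3D gradient over an HSI patch, in the spirit of the feature described above; the bin counts, weighting and normalization are arbitrary choices, not the paper's.

```python
import numpy as np

def spectral_spatial_gradient_histogram(hsi, bins=8):
    """Histogram the orientation of the 3D gradient of an HSI cube
    (height x width x bands), capturing local spectral-spatial structure."""
    gy, gx, gl = np.gradient(hsi.astype(np.float64))   # spatial y, x and spectral axes
    mag = np.sqrt(gx**2 + gy**2 + gl**2) + 1e-12
    # Two angles describe the 3D gradient direction.
    azimuth = np.arctan2(gy, gx)                        # in-plane orientation, [-pi, pi]
    elevation = np.arcsin(np.clip(gl / mag, -1, 1))     # spectral tilt, [-pi/2, pi/2]
    hist, _, _ = np.histogram2d(
        azimuth.ravel(), elevation.ravel(),
        bins=bins, weights=mag.ravel(),
        range=[[-np.pi, np.pi], [-np.pi / 2, np.pi / 2]])
    return (hist / hist.sum()).ravel()                  # normalized descriptor

cube = np.random.rand(32, 32, 16)                       # toy 16-band patch
print(spectral_spatial_gradient_histogram(cube).shape)  # (64,)
```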

Performing Co-Membership Attacks Against Deep Generative Models

Title Performing Co-Membership Attacks Against Deep Generative Models
Authors Kin Sum Liu, Chaowei Xiao, Bo Li, Jie Gao
Abstract In this paper we propose a new membership attack method called co-membership attacks against deep generative models including Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs). Specifically, a membership attack aims to check whether a given instance x was used in the training data or not. A co-membership attack checks whether a given bundle of n instances was used in training, with the prior knowledge that the bundle was either entirely used in training or not at all. Successful membership attacks can compromise the privacy of training data when the generative model is published. Our main idea is to cast membership inference of target data x as the optimization of another neural network (called the attacker network) to search for the latent encoding that reproduces x. The final reconstruction error is used directly to conclude whether x was in the training data or not. We conduct extensive experiments on a variety of datasets and generative models showing that: our attacker network outperforms prior membership attacks; co-membership attacks can be substantially more powerful than single attacks; and VAEs are more susceptible to membership attacks than GANs.
Tasks
Published 2018-05-24
URL https://arxiv.org/abs/1805.09898v3
PDF https://arxiv.org/pdf/1805.09898v3.pdf
PWC https://paperswithcode.com/paper/performing-co-membership-attacks-against-deep
Repo
Framework
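
The core idea, searching the latent space of a published generator for a code that reproduces the target x and using the final reconstruction error as the membership signal, can be sketched in a few lines of PyTorch. The paper trains an attacker network; the version below directly optimizes the latent code as a simplification, and the generator is a toy stand-in.

```python
import torch

def reconstruction_attack(generator, x, latent_dim=64, steps=500, lr=0.05):
    """Search the generator's latent space for the code that best reproduces x;
    a low final error suggests x (or its bundle) was in the training set."""
    z = torch.zeros(x.shape[0], latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.mean((generator(z) - x) ** 2)
        loss.backward()
        opt.step()
    with torch.no_grad():
        return torch.mean((generator(z) - x) ** 2).item()  # membership score

# Toy stand-in generator; in practice this is the published GAN/VAE decoder.
toy_gen = torch.nn.Sequential(torch.nn.Linear(64, 128), torch.nn.Tanh(),
                              torch.nn.Linear(128, 784))
x_target = torch.randn(4, 784)          # a bundle of n=4 candidate instances
score = reconstruction_attack(toy_gen, x_target)
print("reconstruction error:", score)   # compare against a calibrated threshold
```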

A new stereo formulation not using pixel and disparity models

Title A new stereo formulation not using pixel and disparity models
Authors Kiyoshi Oguri, Yuichiro Shibata
Abstract We introduce a new stereo formulation which does not use pixel and disparity models. Many problems in vision are treated as assigning each pixel a label; disparities are the labels for stereo. Such pixel-labeling problems are naturally represented in terms of energy minimization, where the energy function has two terms: one term penalizes solutions that are inconsistent with the observed data, the other term enforces spatial smoothness. Graph cuts are one of the efficient methods for solving energy minimization. However, exact minimization of multi-labeling problems can be performed by graph cuts only for the case with convex smoothness terms. In the pixel-disparity formulation, convex smoothness terms do not generate well-reconstructed 3D results. Thus, truncated linear or quadratic smoothness terms, etc. are used, where approximate energy minimization is necessary. In this paper, we introduce a new site-labeling formulation, where the sites are not pixels but lines in 3D space, and the labels are not disparities but depth numbers. For this formulation, visibility reasoning is naturally included in the energy function. In addition, this formulation allows us to use a small smoothness term, which does not affect the 3D results much. This makes the optimization step very simple, so we could develop an approximation method for the graph cut itself (not for energy minimization) and a high-performance GPU graph cut program. For the Tsukuba stereo pair in the Middlebury dataset, we obtained the result in 5 ms using a GTX 1080 GPU and 19 ms using a GTX 660 GPU.
Tasks
Published 2018-03-05
URL http://arxiv.org/abs/1803.01516v1
PDF http://arxiv.org/pdf/1803.01516v1.pdf
PWC https://paperswithcode.com/paper/a-new-stereo-formulation-not-using-pixel-and
Repo
Framework
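
The two-term energy referred to above can be written down concretely; the numpy sketch below evaluates E(labeling) = data term + λ·smoothness term for a toy disparity labeling. This is the generic pixel-disparity energy the paper argues against, shown only to make the two terms concrete, not the paper's line/depth-number formulation.

```python
import numpy as np

def labeling_energy(left, right, disparity, lam=0.1):
    """Generic two-term energy for stereo as pixel labeling:
    the data term penalizes disagreement with the observed images,
    the smoothness term penalizes label changes between neighbors."""
    h, w = left.shape
    cols = np.arange(w)
    data = 0.0
    for y in range(h):
        src = np.clip(cols - disparity[y], 0, w - 1)   # matched column in the right image
        data += np.sum(np.abs(left[y] - right[y, src]))
    smooth = (np.sum(np.abs(np.diff(disparity, axis=0))) +
              np.sum(np.abs(np.diff(disparity, axis=1))))   # linear (convex) penalty
    return data + lam * smooth

left = np.random.rand(10, 12)
right = np.roll(left, -2, axis=1)          # toy pair with roughly constant disparity 2
print(labeling_energy(left, right, np.full((10, 12), 2)))
```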

Multiview Based 3D Scene Understanding On Partial Point Sets

Title Multiview Based 3D Scene Understanding On Partial Point Sets
Authors Ye Zhu, Sven Ewan Shepstone, Pablo Martínez-Nuevo, Miklas Strøm Kristoffersen, Fabien Moutarde, Zhuang Fu
Abstract Deep learning within the context of point clouds has gained much research interest in recent years, mostly due to the promising results that have been achieved on a number of challenging benchmarks, such as 3D shape recognition and scene semantic segmentation. In many realistic settings, however, snapshots of the environment are often taken from a single view, which contains only a partial set of the scene due to the field-of-view restriction of commodity cameras. 3D scene semantic understanding on partial point clouds is considered a challenging task. In this work, we propose a processing approach for 3D point cloud data based on a multiview representation of the existing 360° point clouds. By fusing the original 360° point clouds and their corresponding 3D multiview representations as input data, a neural network is able to recognize partial point sets while improving the general performance on complete point sets, resulting in an overall increase of 31.9% and 4.3% in segmentation accuracy for partial and complete scene semantic understanding, respectively. This method can also be applied in a wider 3D recognition context such as 3D part segmentation.
Tasks 3D Part Segmentation, 3D Shape Recognition, Scene Understanding, Semantic Segmentation
Published 2018-11-30
URL http://arxiv.org/abs/1812.01712v1
PDF http://arxiv.org/pdf/1812.01712v1.pdf
PWC https://paperswithcode.com/paper/multiview-based-3d-scene-understanding-on
Repo
Framework
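
A minimal numpy sketch of turning a point cloud into several projected depth images, which is the general flavor of a multiview representation; the camera model, number of views and resolution are illustrative, and the paper's exact rendering and fusion scheme is not specified here.

```python
import numpy as np

def multiview_depth_images(points, num_views=4, size=64, focal=64.0):
    """Render a point cloud (N x 3) into `num_views` depth maps by rotating it
    about the vertical axis and applying a simple pinhole projection."""
    views = []
    for k in range(num_views):
        a = 2 * np.pi * k / num_views
        R = np.array([[np.cos(a), 0, np.sin(a)],
                      [0,          1, 0        ],
                      [-np.sin(a), 0, np.cos(a)]])
        p = points @ R.T
        z = p[:, 2] + 3.0                      # push the cloud in front of the camera
        u = (focal * p[:, 0] / z + size / 2).astype(int)
        v = (focal * p[:, 1] / z + size / 2).astype(int)
        depth = np.full((size, size), np.inf)
        ok = (u >= 0) & (u < size) & (v >= 0) & (v < size) & (z > 0)
        # Keep the closest point per pixel (z-buffer).
        np.minimum.at(depth, (v[ok], u[ok]), z[ok])
        views.append(depth)
    return np.stack(views)

cloud = np.random.randn(2048, 3) * 0.5        # toy scene
print(multiview_depth_images(cloud).shape)    # (4, 64, 64)
```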

Interpretable Spiculation Quantification for Lung Cancer Screening

Title Interpretable Spiculation Quantification for Lung Cancer Screening
Authors Wookjin Choi, Saad Nadeem, Sadegh Riyahi, Joseph O. Deasy, Allen Tannenbaum, Wei Lu
Abstract Spiculations are spikes on the surface of a pulmonary nodule and are important predictors of malignancy in lung cancer. In this work, we introduced an interpretable, parameter-free technique for quantifying this critical feature using the area distortion metric from the spherical conformal (angle-preserving) parameterization. The conformal factor in the spherical mapping formulation provides a direct measure of spiculation which can be used to detect spikes and compute spike heights for geometrically complex spiculations. The area distortion metric from conformal mapping has never been exploited before in this context. Based on the area distortion metric and the spiculation height, we introduced a novel spiculation score. A combination of our spiculation measures was found to be highly correlated (Spearman’s rank correlation coefficient $\rho = 0.48$) with the radiologist’s spiculation score. These measures were also used in the radiomics framework to achieve state-of-the-art malignancy prediction accuracy of 88.9% on a publicly available dataset.
Tasks
Published 2018-08-24
URL http://arxiv.org/abs/1808.08307v2
PDF http://arxiv.org/pdf/1808.08307v2.pdf
PWC https://paperswithcode.com/paper/interpretable-spiculation-quantification-for
Repo
Framework
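
Given a surface mesh of the nodule and its spherical parameterization, the per-triangle area distortion that the method relies on is straightforward to compute; a numpy sketch follows. The conformal mapping itself is assumed to come from an external routine, and the toy tetrahedron is only there to exercise the code.

```python
import numpy as np

def triangle_areas(verts, faces):
    a, b, c = verts[faces[:, 0]], verts[faces[:, 1]], verts[faces[:, 2]]
    return 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1)

def area_distortion(verts, sphere_verts, faces):
    """Log ratio of triangle areas between the spherical parameterization and the
    original surface; strongly shrunken triangles flag spike-like regions."""
    orig = triangle_areas(verts, faces)
    mapped = triangle_areas(sphere_verts, faces)
    return np.log((mapped + 1e-12) / (orig + 1e-12))

# Toy tetrahedron and a (fake) "parameterized" copy, just to run the code;
# in practice sphere_verts comes from a spherical conformal mapping routine.
verts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
faces = np.array([[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]])
sphere_verts = verts / np.linalg.norm(verts + 1e-3, axis=1, keepdims=True)
print(area_distortion(verts, sphere_verts, faces))
```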

Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks

Title Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks
Authors Juho Lee, Yoonho Lee, Jungtaek Kim, Adam R. Kosiorek, Seungjin Choi, Yee Whye Teh
Abstract Many machine learning tasks such as multiple instance learning, 3D shape recognition, and few-shot image classification are defined on sets of instances. Since solutions to such problems do not depend on the order of elements of the set, models used to address them should be permutation invariant. We present an attention-based neural network module, the Set Transformer, specifically designed to model interactions among elements in the input set. The model consists of an encoder and a decoder, both of which rely on attention mechanisms. In an effort to reduce computational complexity, we introduce an attention scheme inspired by inducing point methods from the sparse Gaussian process literature. It reduces the computation time of self-attention from quadratic to linear in the number of elements in the set. We show that our model is theoretically attractive and we evaluate it on a range of tasks, demonstrating state-of-the-art performance compared to recent methods for set-structured data.
Tasks 3D Shape Recognition, Few-Shot Image Classification, Image Classification, Multiple Instance Learning
Published 2018-10-01
URL https://arxiv.org/abs/1810.00825v3
PDF https://arxiv.org/pdf/1810.00825v3.pdf
PWC https://paperswithcode.com/paper/set-transformer-a-framework-for-attention
Repo
Framework
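
A compact PyTorch sketch of the inducing-point attention idea: m learnable inducing points attend to the n set elements and the set then attends back to the m summaries, bringing the cost from O(n²) to O(nm). This is a simplified block (no feed-forward layers or layer normalization), not the official implementation.

```python
import torch
import torch.nn as nn

class InducedSetAttention(nn.Module):
    """Simplified induced set attention: m learnable inducing points summarize
    the input set, then the set reads the summaries back -> O(n*m) cost."""
    def __init__(self, dim=64, num_heads=4, num_inducing=16):
        super().__init__()
        self.inducing = nn.Parameter(torch.randn(1, num_inducing, dim))
        self.attn1 = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn2 = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):                       # x: (batch, n, dim), order-agnostic
        i = self.inducing.repeat(x.size(0), 1, 1)
        h, _ = self.attn1(i, x, x)              # inducing points attend to the set
        out, _ = self.attn2(x, h, h)            # set elements read the summaries back
        return out

x = torch.randn(2, 100, 64)                     # a batch of two 100-element sets
print(InducedSetAttention()(x).shape)           # torch.Size([2, 100, 64])
```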

Clinical evaluation of semi-automatic opensource algorithmic software segmentation of the mandibular bone: Practical feasibility and assessment of a new course of action

Title Clinical evaluation of semi-automatic opensource algorithmic software segmentation of the mandibular bone: Practical feasibility and assessment of a new course of action
Authors Jürgen Wallner, Kerstin Hochegger, Xiaojun Chen, Irene Mischak, Knut Reinbacher, Mauro Pau, Tomislav Zrnc, Katja Schwenzer-Zimmerer, Wolfgang Zemann, Dieter Schmalstieg, Jan Egger
Abstract Computer-assisted technologies based on algorithmic software segmentation are an increasing topic of interest in complex surgical cases. However, due to functional instability, time-consuming software processes, personnel resources or license-based financial costs, many segmentation processes are often outsourced from clinical centers to third parties and the industry. Therefore, the aim of this trial was to assess the practical feasibility of an easily available, functionally stable and license-free segmentation approach to be used in clinical practice. In this retrospective, randomized, controlled trial, the accuracy and agreement of the open-source segmentation algorithm GrowCut (GC) were assessed through comparison to the manually generated ground truth of the same anatomy using 10 CT lower-jaw datasets from the clinical routine. Assessment parameters were the segmentation time, the volume, the voxel number, the Dice Score (DSC) and the Hausdorff distance (HD). Overall segmentation times were about one minute. Mean DSC values of over 85% and HD below 33.5 voxels could be achieved. Statistical differences between the assessment parameters were not significant (p<0.05) and correlation coefficients were close to one (r > 0.94). Functionally stable and time-saving segmentations with high accuracy and high positive correlation could be performed with the presented interactive open-source approach. In the cranio-maxillofacial complex, the method could represent an algorithmic alternative for image-based segmentation in clinical practice, e.g. for surgical treatment planning or visualization of postoperative results, and offers several advantages. Systematic comparisons to other segmentation approaches or with a larger amount of data are areas of future work.
Tasks
Published 2018-05-11
URL http://arxiv.org/abs/1805.08604v1
PDF http://arxiv.org/pdf/1805.08604v1.pdf
PWC https://paperswithcode.com/paper/clinical-evaluation-of-semi-automatic
Repo
Framework
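
The two headline metrics are standard and easy to reproduce; the numpy/scipy sketch below computes the Dice score and a voxel-based symmetric Hausdorff distance for two binary masks. The toy masks merely stand in for a GrowCut result and its manual ground truth.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice_score(a, b):
    """DSC = 2|A ∩ B| / (|A| + |B|) for two boolean masks."""
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def hausdorff_voxel(a, b):
    """Symmetric Hausdorff distance (in voxels), computed here over all
    foreground voxel coordinates as a simple approximation of the surfaces."""
    pa = np.argwhere(a)
    pb = np.argwhere(b)
    return max(directed_hausdorff(pa, pb)[0], directed_hausdorff(pb, pa)[0])

# Toy 3D masks standing in for a GrowCut output vs. manual ground truth.
gt = np.zeros((20, 20, 20), dtype=bool); gt[5:15, 5:15, 5:15] = True
seg = np.zeros_like(gt);                 seg[6:15, 5:15, 5:15] = True
print("DSC:", dice_score(seg, gt), "HD:", hausdorff_voxel(seg, gt))
```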

Neural Text Generation: Past, Present and Beyond

Title Neural Text Generation: Past, Present and Beyond
Authors Sidi Lu, Yaoming Zhu, Weinan Zhang, Jun Wang, Yong Yu
Abstract This paper presents a systematic survey on recent developments in neural text generation models. Specifically, we start from recurrent neural network language models with the traditional maximum likelihood estimation training scheme and point out its shortcomings for text generation. We then introduce the recently proposed methods for text generation based on reinforcement learning, re-parametrization tricks and generative adversarial net (GAN) techniques. We compare different properties of these models and the corresponding techniques to handle their common problems such as gradient vanishing and generation diversity. Finally, we conduct a benchmarking experiment with different types of neural text generation models on two well-known datasets and discuss the empirical results along with the aforementioned model properties.
Tasks Text Generation
Published 2018-03-15
URL http://arxiv.org/abs/1803.07133v1
PDF http://arxiv.org/pdf/1803.07133v1.pdf
PWC https://paperswithcode.com/paper/neural-text-generation-past-present-and
Repo
Framework
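
The survey's starting point, maximum-likelihood (teacher-forced, cross-entropy) training of a recurrent language model, fits in a short PyTorch sketch; the vocabulary size, dimensions and toy batch are placeholders.

```python
import torch
import torch.nn as nn

class RNNLM(nn.Module):
    """Minimal recurrent language model trained by maximum likelihood
    (teacher forcing + cross-entropy), the baseline the survey starts from."""
    def __init__(self, vocab=1000, emb=64, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.rnn = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, tokens):
        h, _ = self.rnn(self.emb(tokens))
        return self.out(h)                      # next-token logits at each step

model = RNNLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, 1000, (8, 21))        # toy batch of token sequences
inputs, targets = tokens[:, :-1], tokens[:, 1:] # predict each next token (MLE)
logits = model(inputs)
loss = loss_fn(logits.reshape(-1, 1000), targets.reshape(-1))
loss.backward(); opt.step()
print(float(loss))
```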

Handcrafted and Deep Trackers: Recent Visual Object Tracking Approaches and Trends

Title Handcrafted and Deep Trackers: Recent Visual Object Tracking Approaches and Trends
Authors Mustansar Fiaz, Arif Mahmood, Sajid Javed, Soon Ki Jung
Abstract In recent years visual object tracking has become a very active research area, and an increasing number of tracking algorithms are proposed each year. This is because tracking has wide applications in various real-world problems such as human-computer interaction, autonomous vehicles, robotics, surveillance and security, to name a few. In the current study, we review the latest trends and advances in the tracking area and evaluate the robustness of different trackers based on their feature extraction methods. The first part of this work comprises a comprehensive survey of recently proposed trackers. We broadly categorize trackers into Correlation Filter based Trackers (CFTs) and Non-CFTs. Each category is further classified into various types based on the architecture and the tracking mechanism. In the second part, we experimentally evaluate 24 recent trackers for robustness and compare handcrafted and deep-feature-based trackers. We observe that trackers using deep features performed better, though in some cases a fusion of both increased performance significantly. In order to overcome the drawbacks of the existing benchmarks, a new benchmark, Object Tracking and Temple Color (OTTC), has also been proposed and used in the evaluation of different algorithms. We analyze the performance of trackers over eleven different challenges in OTTC, and three other benchmarks. Our study concludes that Discriminative Correlation Filter (DCF) based trackers perform better than the others. Our study also reveals that the inclusion of different types of regularization over DCF often results in boosted tracking performance. Finally, we sum up our study by pointing out some insights and indicating future trends in the visual object tracking field.
Tasks Autonomous Vehicles, Object Tracking, Visual Object Tracking
Published 2018-12-06
URL http://arxiv.org/abs/1812.07368v2
PDF http://arxiv.org/pdf/1812.07368v2.pdf
PWC https://paperswithcode.com/paper/handcrafted-and-deep-trackers-recent-visual
Repo
Framework
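
A bare-bones numpy sketch of a single-channel correlation filter trained in the Fourier domain (a MOSSE-style closed form), the basic building block that the DCF trackers surveyed above refine with multi-channel features and regularization.

```python
import numpy as np

def train_correlation_filter(patches, target_response, eps=1e-2):
    """Closed-form MOSSE-style filter: H = sum(G * conj(F)) / (sum(F * conj(F)) + eps),
    learned from training patches and a desired (Gaussian-peaked) response."""
    G = np.fft.fft2(target_response)
    A = np.zeros_like(G); B = np.zeros_like(G)
    for p in patches:
        F = np.fft.fft2(p)
        A += G * np.conj(F)
        B += F * np.conj(F)
    return A / (B + eps)

def detect(filter_hat, patch):
    """Correlate a search patch with the filter and return the response peak."""
    response = np.real(np.fft.ifft2(filter_hat * np.fft.fft2(patch)))
    return np.unravel_index(np.argmax(response), response.shape)

# Toy example: the desired response peaks at the patch centre.
size = 32
yy, xx = np.mgrid[0:size, 0:size]
g = np.exp(-((yy - size // 2) ** 2 + (xx - size // 2) ** 2) / (2 * 2.0 ** 2))
template = np.random.rand(size, size)
H = train_correlation_filter([template], g)
print(detect(H, template))                      # approximately (16, 16)
```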

Cross-Classification Clustering: An Efficient Multi-Object Tracking Technique for 3-D Instance Segmentation in Connectomics

Title Cross-Classification Clustering: An Efficient Multi-Object Tracking Technique for 3-D Instance Segmentation in Connectomics
Authors Yaron Meirovitch, Lu Mi, Hayk Saribekyan, Alexander Matveev, David Rolnick, Nir Shavit
Abstract Pixel-accurate tracking of objects is a key element in many computer vision applications, often solved by iterated individual object tracking or instance segmentation followed by object matching. Here we introduce cross-classification clustering (3C), a technique that simultaneously tracks complex, interrelated objects in an image stack. The key idea in cross-classification is to efficiently turn a clustering problem into a classification problem by running a logarithmic number of independent classifications per image, letting the cross-labeling of these classifications uniquely classify each pixel to the object labels. We apply the 3C mechanism to achieve state-of-the-art accuracy in connectomics – the nanoscale mapping of neural tissue from electron microscopy volumes. Our reconstruction system increases scalability by an order of magnitude over existing single-object tracking methods (such as flood-filling networks). This scalability is important for the deployment of connectomics pipelines, since currently the best performing techniques require computing infrastructures that are beyond the reach of most laboratories. Our algorithm may offer benefits in other domains that require pixel-accurate tracking of multiple objects, such as segmentation of videos and medical imagery.
Tasks Instance Segmentation, Multi-Object Tracking, Object Tracking, Semantic Segmentation
Published 2018-12-04
URL https://arxiv.org/abs/1812.01157v2
PDF https://arxiv.org/pdf/1812.01157v2.pdf
PWC https://paperswithcode.com/paper/cross-classification-clustering-an-efficient
Repo
Framework
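
The core trick of 3C, replacing an N-way labeling with ⌈log2 N⌉ binary classifications whose joint output uniquely identifies each object, can be illustrated in a few lines of numpy. This sketch only shows the encoding and decoding of labels and ignores everything the paper does to train consistent classifiers.

```python
import numpy as np

def encode_labels(label_image):
    """Split an integer label image into ceil(log2(num_labels)) binary maps:
    bit k of each label becomes the target of the k-th independent classifier."""
    num_bits = max(1, int(np.ceil(np.log2(label_image.max() + 1))))
    return np.stack([(label_image >> k) & 1 for k in range(num_bits)]), num_bits

def decode_labels(binary_maps):
    """Cross-label the binary predictions back into per-pixel object ids."""
    return sum((m.astype(np.int64) << k) for k, m in enumerate(binary_maps))

labels = np.random.randint(0, 12, size=(6, 6))   # toy segmentation with 12 objects
maps, k = encode_labels(labels)
print(k, "binary classifications;", np.array_equal(decode_labels(maps), labels))
```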

The Right (Angled) Perspective: Improving the Understanding of Road Scenes Using Boosted Inverse Perspective Mapping

Title The Right (Angled) Perspective: Improving the Understanding of Road Scenes Using Boosted Inverse Perspective Mapping
Authors Tom Bruls, Horia Porav, Lars Kunze, Paul Newman
Abstract Many tasks performed by autonomous vehicles such as road marking detection, object tracking, and path planning are simpler in bird’s-eye view. Hence, Inverse Perspective Mapping (IPM) is often applied to remove the perspective effect from a vehicle’s front-facing camera and to remap its images into a 2D domain, resulting in a top-down view. Unfortunately, however, this leads to unnatural blurring and stretching of objects at farther distances, due to the resolution of the camera, limiting applicability. In this paper, we present an adversarial learning approach for generating a significantly improved IPM from a single camera image in real time. The generated bird’s-eye-view images contain sharper features (e.g. road markings) and a more homogeneous illumination, while (dynamic) objects are automatically removed from the scene, thus revealing the underlying road layout in an improved fashion. We demonstrate our framework using real-world data from the Oxford RobotCar Dataset and show that scene understanding tasks directly benefit from our boosted IPM approach.
Tasks Autonomous Vehicles, Object Tracking, Scene Understanding
Published 2018-12-03
URL http://arxiv.org/abs/1812.00913v2
PDF http://arxiv.org/pdf/1812.00913v2.pdf
PWC https://paperswithcode.com/paper/the-right-angled-perspective-improving-the
Repo
Framework
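
Classical IPM, the baseline the paper improves on, is a single planar homography; the OpenCV sketch below warps a front-facing image to a top-down view from four ground-plane correspondences. The point coordinates are made up and would need per-camera calibration in practice.

```python
import cv2
import numpy as np

def inverse_perspective_map(image, src_pts, bev_size=(400, 600)):
    """Warp a front-facing camera image to a bird's-eye view using a homography
    fitted from four ground-plane correspondences."""
    w, h = bev_size
    dst_pts = np.float32([[0, h], [w, h], [w, 0], [0, 0]])   # top-down rectangle
    H = cv2.getPerspectiveTransform(np.float32(src_pts), dst_pts)
    return cv2.warpPerspective(image, H, (w, h))

# Illustrative correspondences: near-left, near-right, far-right, far-left lane points.
frame = np.zeros((720, 1280, 3), dtype=np.uint8)
src = [(300, 700), (980, 700), (740, 450), (540, 450)]
bev = inverse_perspective_map(frame, src)
print(bev.shape)                                 # (600, 400, 3)
```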