Paper Group ANR 85
AdaDNNs: Adaptive Ensemble of Deep Neural Networks for Scene Text Recognition. Query-free Clothing Retrieval via Implicit Relevance Feedback. Flow-free Video Object Segmentation. Long-Term Video Interpolation with Bidirectional Predictive Network. Reconciling Bayesian Epistemology and Narration-based Approaches to Judiciary Fact-finding. Grounding …
AdaDNNs: Adaptive Ensemble of Deep Neural Networks for Scene Text Recognition
Title | AdaDNNs: Adaptive Ensemble of Deep Neural Networks for Scene Text Recognition |
Authors | Chun Yang, Xu-Cheng Yin, Zejun Li, Jianwei Wu, Chunchao Guo, Hongfa Wang, Lei Xiao |
Abstract | Recognizing text in the wild is a truly challenging task because of complex backgrounds, various illuminations and diverse distortions, even with deep neural networks (convolutional neural networks and recurrent neural networks). In the end-to-end training procedure for scene text recognition, the outputs of deep neural networks at different iterations consistently exhibit diversity and complementarity for the target object (text). Here, a simple but effective deep learning method, an adaptive ensemble of deep neural networks (AdaDNNs), is proposed to select and adaptively combine classifier components from different iterations of the whole learning system. Furthermore, the ensemble is formulated within a Bayesian framework for classifier weighting and combination. A variety of experiments on several widely acknowledged benchmarks, i.e., the ICDAR Robust Reading Competition (Challenges 1, 2 and 4) datasets, verify the surprising improvement over the baseline DNNs and the effectiveness of AdaDNNs compared with recent state-of-the-art methods. |
Tasks | Scene Text Recognition |
Published | 2017-10-10 |
URL | http://arxiv.org/abs/1710.03425v1 |
http://arxiv.org/pdf/1710.03425v1.pdf | |
PWC | https://paperswithcode.com/paper/adadnns-adaptive-ensemble-of-deep-neural |
Repo | |
Framework | |
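The snapshot-combination idea in the AdaDNNs abstract can be illustrated with a small sketch. The snapshots, their validation accuracies, and the softmax-style weighting below are illustrative assumptions, not the paper's exact Bayesian formulation.

```python
# Sketch: adaptively weight classifier snapshots taken at different training
# iterations, then combine their per-class scores. The weighting rule
# (softmax over validation accuracy) is an assumed stand-in for the paper's
# Bayesian classifier weighting.
import math

def ensemble_predict(snapshots, val_accuracies, x, temperature=0.1):
    """Combine per-snapshot class scores, weighting each snapshot by a
    softmax over its validation accuracy."""
    exps = [math.exp(a / temperature) for a in val_accuracies]
    z = sum(exps)
    weights = [e / z for e in exps]
    n_classes = len(snapshots[0](x))
    combined = [0.0] * n_classes
    for w, clf in zip(weights, snapshots):
        scores = clf(x)  # each snapshot maps an input to per-class scores
        for c in range(n_classes):
            combined[c] += w * scores[c]
    return combined.index(max(combined))

# Toy snapshots: each is a function from input to class scores.
snap_a = lambda x: [0.9, 0.1]   # stronger snapshot
snap_b = lambda x: [0.4, 0.6]   # weaker snapshot
pred = ensemble_predict([snap_a, snap_b], [0.95, 0.60], x=None)
print(pred)  # 0: the more accurate snapshot dominates the vote
```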
Query-free Clothing Retrieval via Implicit Relevance Feedback
Title | Query-free Clothing Retrieval via Implicit Relevance Feedback |
Authors | Zhuoxiang Chen, Zhe Xu, Ya Zhang, Xiao Gu |
Abstract | Image-based clothing retrieval is receiving increasing interest with the growth of online shopping. In practice, users may often have a desired piece of clothing in mind (e.g., either having seen it before on the street or requiring certain specific clothing attributes) but may be unable to supply an image as a query. We model this problem as a new type of image retrieval task in which the target image resides only in the user’s mind (called “mental image retrieval” hereafter). Because of the absence of an explicit query image, we propose to solve this problem through relevance feedback. Specifically, a new Bayesian formulation is proposed that simultaneously models the retrieval target and its high-level representation in the mind of the user (called the “user metric” hereafter) as posterior distributions of pre-fetched shop images and heterogeneous features extracted from multiple clothing attributes, respectively. Requiring only clicks as user feedback, the proposed algorithm is able to account for the variability in human decision-making. Experiments with real users demonstrate the effectiveness of the proposed algorithm. |
Tasks | Decision Making, Image Retrieval |
Published | 2017-11-01 |
URL | http://arxiv.org/abs/1711.00248v1 |
http://arxiv.org/pdf/1711.00248v1.pdf | |
PWC | https://paperswithcode.com/paper/query-free-clothing-retrieval-via-implicit |
Repo | |
Framework | |
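The click-driven Bayesian update suggested by the abstract can be sketched as follows: a posterior over candidate shop items is maintained, and a click on a shown item boosts items similar to it. The 1-D item representation, similarity kernel and update rule are illustrative assumptions, not the paper's formulation.

```python
# Sketch: maintain a posterior over candidate items; a click reweights items
# by similarity to the clicked one (Bayes-style update with an assumed
# likelihood), then renormalizes.
import math

def update_posterior(posterior, items, clicked, tau=1.0):
    """Reweight items by similarity to the clicked item, then normalize."""
    weights = [math.exp(-abs(item - clicked) / tau) for item in items]
    new = [p * w for p, w in zip(posterior, weights)]
    z = sum(new)
    return [n / z for n in new]

# Items summarized by a single attribute coordinate (e.g. a colour value).
items = [0.0, 0.5, 1.0, 5.0]
posterior = [0.25] * 4                     # uniform prior over candidates
posterior = update_posterior(posterior, items, clicked=0.5)
best = max(range(4), key=lambda i: posterior[i])
print(best)  # 1: the item matching the click is now most probable
```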
Flow-free Video Object Segmentation
Title | Flow-free Video Object Segmentation |
Authors | Aditya Vora, Shanmuganathan Raman |
Abstract | Segmenting a foreground object from a video is a challenging task because of large object deformations, occlusions, and background clutter. In this paper, we propose a frame-by-frame but computationally efficient approach for video object segmentation by clustering visually similar generic object segments throughout the video. Our algorithm segments the various object instances appearing in the video and then performs clustering to group visually similar segments into one cluster. Since the object to be segmented appears in most of the video, we can retrieve the foreground segments from the cluster with the maximum number of segments, thus filtering out noisy segments that do not represent any object. We then apply a track-and-fill approach to localize the object in the frames where the segmentation framework fails to segment any object. Our algorithm performs comparably to recent automatic methods for video object segmentation when benchmarked on the DAVIS dataset while being computationally much faster. |
Tasks | Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation |
Published | 2017-06-29 |
URL | http://arxiv.org/abs/1706.09544v1 |
http://arxiv.org/pdf/1706.09544v1.pdf | |
PWC | https://paperswithcode.com/paper/flow-free-video-object-segmentation |
Repo | |
Framework | |
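The "largest cluster = foreground" step described in the abstract can be sketched in a few lines: visually similar segments are grouped, the biggest group is kept as the object, and small clusters are discarded as noise. The toy feature vectors, greedy clustering rule and distance threshold are assumptions for illustration.

```python
# Sketch: greedy clustering of per-segment appearance features, then pick the
# cluster with the most members as the foreground object.
import math

def cluster_segments(features, threshold=1.0):
    """Assign each segment to the first cluster whose centroid is within
    `threshold`; otherwise start a new cluster."""
    clusters = []  # each cluster: list of feature vectors
    for f in features:
        for cl in clusters:
            centroid = [sum(v[i] for v in cl) / len(cl) for i in range(len(f))]
            if math.dist(f, centroid) <= threshold:
                cl.append(f)
                break
        else:
            clusters.append([f])
    return clusters

# Toy per-segment features across frames; the tight group is the real object,
# the outlier is a noisy segment.
feats = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
clusters = cluster_segments(feats)
foreground = max(clusters, key=len)  # cluster with the maximum number of segments
print(len(foreground))  # 3
```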
Long-Term Video Interpolation with Bidirectional Predictive Network
Title | Long-Term Video Interpolation with Bidirectional Predictive Network |
Authors | Xiongtao Chen, Wenmin Wang, Jinzhuo Wang, Weimian Li, Baoyang Chen |
Abstract | This paper considers the challenging task of long-term video interpolation. Unlike most existing methods, which only generate a few intermediate frames between existing adjacent ones, we attempt to speculate about, or imagine, the course of an episode and generate multiple frames between two non-consecutive frames in a video. We present a novel deep architecture called the bidirectional predictive network (BiPN), which predicts intermediate frames from two opposite directions. The bidirectional architecture allows the model to learn scene transformations over time as well as to generate longer video sequences. In addition, our model can be extended to predict multiple possible outcomes by sampling different noise vectors. A joint loss composed of cues in image and feature spaces and an adversarial loss is designed to train the model. We demonstrate the advantages of BiPN on two benchmarks, Moving 2D Shapes and UCF101, and report results competitive with recent approaches. |
Tasks | |
Published | 2017-06-13 |
URL | http://arxiv.org/abs/1706.03947v1 |
http://arxiv.org/pdf/1706.03947v1.pdf | |
PWC | https://paperswithcode.com/paper/long-term-video-interpolation-with |
Repo | |
Framework | |
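The bidirectional structure described in the abstract can be illustrated with a toy sketch: intermediate frames are generated both from the start frame (forward) and from the end frame (backward), and the two predictions are blended. Linear extrapolation stands in for the learned encoder-decoder here, so this only illustrates the architecture's symmetry, not the model itself.

```python
# Sketch: predict each intermediate "frame" (a scalar stand-in, e.g. an
# object's x-position) from both temporal directions and blend, trusting the
# nearer endpoint more.

def interpolate(start, end, n_mid):
    frames = []
    for i in range(1, n_mid + 1):
        t = i / (n_mid + 1)
        forward = start + t * (end - start)        # prediction from the past
        backward = end - (1 - t) * (end - start)   # prediction from the future
        frames.append((1 - t) * forward + t * backward)
    return frames

print(interpolate(0.0, 4.0, 3))  # [1.0, 2.0, 3.0]
```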
Reconciling Bayesian Epistemology and Narration-based Approaches to Judiciary Fact-finding
Title | Reconciling Bayesian Epistemology and Narration-based Approaches to Judiciary Fact-finding |
Authors | Rafal Urbaniak |
Abstract | Legal probabilism (LP) claims that degrees of conviction in juridical fact-finding are to be modeled exactly the way degrees of belief are modeled in standard Bayesian epistemology. Classical legal probabilism (CLP) adds that a conviction is justified if the credence in guilt given the evidence is above an appropriate guilt probability threshold. These views are challenged on various counts, especially by proponents of the so-called narrative approach, on which the fact-finders' decision is the result of a dynamic interplay between competing narratives of what happened. I develop a way for a Bayesian epistemologist to make sense of the narrative approach. I do so by formulating a probabilistic framework for evaluating competing narrations in terms of formal explications of the informal evaluation criteria used in the narrative approach. |
Tasks | |
Published | 2017-07-27 |
URL | http://arxiv.org/abs/1707.08763v1 |
http://arxiv.org/pdf/1707.08763v1.pdf | |
PWC | https://paperswithcode.com/paper/reconciling-bayesian-epistemology-and |
Repo | |
Framework | |
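The kind of probabilistic comparison of narrations the abstract gestures at can be sketched with plain Bayes: each competing narration is scored by its posterior given the evidence (prior times likelihood, normalized). The priors and likelihoods below are invented for illustration and are not drawn from the paper.

```python
# Sketch: posterior comparison of competing narrations of what happened.

def narration_posteriors(priors, likelihoods):
    """P(narration | evidence) ∝ P(narration) * P(evidence | narration)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    z = sum(joint)
    return [j / z for j in joint]

# Two narrations, and how well each explains the evidence (assumed values).
priors = [0.5, 0.5]
likelihoods = [0.9, 0.2]   # P(evidence | narration)
post = narration_posteriors(priors, likelihoods)
print(post)  # the better-explaining narration dominates: ~[0.818, 0.182]
```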
Grounding Spatio-Semantic Referring Expressions for Human-Robot Interaction
Title | Grounding Spatio-Semantic Referring Expressions for Human-Robot Interaction |
Authors | Mohit Shridhar, David Hsu |
Abstract | Human language is one of the most natural interfaces for humans to interact with robots. This paper presents a robot system that retrieves everyday objects from unconstrained natural language descriptions. A core issue for the system is semantic and spatial grounding: inferring objects and their spatial relationships from images and natural language expressions. We introduce a two-stage neural-network grounding pipeline that maps natural language referring expressions directly to objects in the images. The first stage uses the visual descriptions in the referring expressions to generate a candidate set of relevant objects. The second stage examines all pairwise relationships between the candidates and predicts the most likely referred object according to the spatial descriptions in the referring expressions. A key feature of our system is that, by leveraging a large dataset of images labeled with text descriptions, it allows unrestricted object types and natural language referring expressions. Preliminary results indicate that our system outperforms a near state-of-the-art object comprehension system on standard benchmark datasets. We also present a robot system that follows voice commands to pick and place previously unseen objects. |
Tasks | |
Published | 2017-07-18 |
URL | http://arxiv.org/abs/1707.05720v1 |
http://arxiv.org/pdf/1707.05720v1.pdf | |
PWC | https://paperswithcode.com/paper/grounding-spatio-semantic-referring |
Repo | |
Framework | |
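The two-stage pipeline described in the abstract can be sketched schematically: stage one keeps objects whose visual description matches the expression; stage two scores spatial relations among the survivors and returns the most likely referent. The toy scene, label matcher and relation scorer below are assumptions, not the paper's neural components.

```python
# Sketch of a two-stage grounding pipeline: visual filtering, then spatial
# relation scoring.

def visual_score(obj, query):
    """Stage-1 stand-in for the visual-description matcher."""
    return 1.0 if obj["label"] == query else 0.0

def relation_score(a, b, relation):
    """Stage-2 stand-in for the pairwise spatial-relation predictor."""
    if relation == "left_of":
        return 1.0 if a["x"] < b["x"] else 0.0
    return 0.0

def ground(objects, visual_query, relation, top_k=2):
    # Stage 1: candidate set from visual descriptions.
    candidates = sorted(objects,
                        key=lambda o: -visual_score(o, visual_query))[:top_k]
    # Stage 2: candidate best satisfying the spatial relation to some object.
    return max(candidates,
               key=lambda o: max(relation_score(o, other, relation)
                                 for other in objects if other is not o))

scene = [{"label": "cup", "x": 0}, {"label": "cup", "x": 5},
         {"label": "plate", "x": 3}]
# "the cup to the left of the plate"
print(ground(scene, "cup", "left_of")["x"])  # 0
```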
A Compact DNN: Approaching GoogLeNet-Level Accuracy of Classification and Domain Adaptation
Title | A Compact DNN: Approaching GoogLeNet-Level Accuracy of Classification and Domain Adaptation |
Authors | Chunpeng Wu, Wei Wen, Tariq Afzal, Yongmei Zhang, Yiran Chen, Hai Li |
Abstract | Recently, DNN model compression based on network architecture design, e.g., SqueezeNet, has attracted a lot of attention. Compared to well-known models, these extremely compact networks show no accuracy drop on image classification. An emerging question, however, is whether these model compression techniques hurt a DNN's learning abilities beyond classifying images on a single dataset. Our preliminary experiments show that these compression methods can degrade domain adaptation (DA) ability, even though classification performance is preserved. We therefore propose a new compact network architecture and an unsupervised DA method in this paper. The DNN is built on a new basic module, Conv-M, which provides more diverse feature extractors without significantly increasing parameters. The unified framework of our DA method simultaneously learns invariance across domains, reduces the divergence of feature representations, and adapts label prediction. Our DNN has 4.1M parameters, only 6.7% of AlexNet's or 59% of GoogLeNet's. Experiments show that our DNN obtains GoogLeNet-level accuracy both on classification and DA, and our DA method slightly outperforms previous competitive ones. Put together, our DA strategy based on our DNN achieves state-of-the-art results on sixteen of eighteen DA tasks on the popular Office-31 and Office-Caltech datasets. |
Tasks | Domain Adaptation, Image Classification, Model Compression |
Published | 2017-03-12 |
URL | http://arxiv.org/abs/1703.04071v4 |
http://arxiv.org/pdf/1703.04071v4.pdf | |
PWC | https://paperswithcode.com/paper/a-compact-dnn-approaching-googlenet-level |
Repo | |
Framework | |
Conformal k-NN Anomaly Detector for Univariate Data Streams
Title | Conformal k-NN Anomaly Detector for Univariate Data Streams |
Authors | Vladislav Ishimtsev, Ivan Nazarov, Alexander Bernstein, Evgeny Burnaev |
Abstract | Anomalies in time-series data give essential and often actionable information in many applications. In this paper we consider a model-free anomaly detection method for univariate time-series which adapts to non-stationarity in the data stream and provides probabilistic abnormality scores based on the conformal prediction paradigm. Despite its simplicity the method performs on par with complex prediction-based models on the Numenta Anomaly Detection benchmark and the Yahoo! S5 dataset. |
Tasks | Anomaly Detection, Time Series |
Published | 2017-06-11 |
URL | http://arxiv.org/abs/1706.03412v1 |
http://arxiv.org/pdf/1706.03412v1.pdf | |
PWC | https://paperswithcode.com/paper/conformal-k-nn-anomaly-detector-for |
Repo | |
Framework | |
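The method in this abstract is concrete enough to sketch end to end: the nonconformity measure is the average distance to the k nearest neighbours in a sliding reference window, and it is turned into a probabilistic score by ranking against leave-one-out calibration scores, in the conformal-prediction style. The window size and k below are illustrative choices, not the paper's settings.

```python
# Sketch of a conformal k-NN anomaly score for a univariate stream.

def knn_score(x, reference, k):
    """Nonconformity: average distance to the k nearest points in `reference`."""
    dists = sorted(abs(x - r) for r in reference)
    return sum(dists[:k]) / k

def conformal_anomaly(x, window, k=3):
    """Score in [0, 1]; values near 1 mean x is more nonconforming than
    almost every calibration point (i.e. likely anomalous)."""
    # Leave-one-out calibration scores over the reference window.
    calib = [knn_score(window[i], window[:i] + window[i + 1:], k)
             for i in range(len(window))]
    alpha = knn_score(x, window, k)
    p = sum(1 for a in calib if a >= alpha) / (len(calib) + 1)  # conformal p-value
    return 1.0 - p

stream = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8]
a = conformal_anomaly(25.0, stream)   # far outside the window -> near 1
b = conformal_anomaly(10.05, stream)  # typical value -> much lower
print(a, b)
```

The score adapts to non-stationarity simply by letting the window slide, since both the nonconformity measure and the calibration set are recomputed from recent data.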
Adversarial Learning of Structure-Aware Fully Convolutional Networks for Landmark Localization
Title | Adversarial Learning of Structure-Aware Fully Convolutional Networks for Landmark Localization |
Authors | Yu Chen, Chunhua Shen, Hao Chen, Xiu-Shen Wei, Lingqiao Liu, Jian Yang |
Abstract | Landmark/pose estimation in single monocular images has received much attention in computer vision due to its important applications. It remains a challenging task when input images contain severe occlusions caused by, e.g., adverse camera views. Under such circumstances, biologically implausible pose predictions may be produced. In contrast, human vision is able to predict poses by exploiting the geometric constraints of landmark inter-connectivity. To address the problem, we propose a novel structure-aware fully convolutional network that incorporates priors about the structure of pose components and implicitly takes them into account during training of the deep network. Explicitly learning such constraints is typically challenging. Instead, inspired by how humans identify implausible poses, we design discriminators to distinguish real poses from fake (e.g., biologically implausible) ones. If the pose generator G produces results that the discriminator fails to distinguish from real ones, the network has successfully learned the priors. Training of the network follows the strategy of conditional Generative Adversarial Networks (GANs). The effectiveness of the proposed network is evaluated on three pose-related tasks: 2D single human pose estimation, 2D facial landmark estimation and 3D single human pose estimation. The proposed approach significantly outperforms state-of-the-art methods and almost always generates plausible pose predictions, demonstrating the usefulness of implicit structure learning with GANs. |
Tasks | Pose Estimation |
Published | 2017-11-01 |
URL | http://arxiv.org/abs/1711.00253v5 |
http://arxiv.org/pdf/1711.00253v5.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-learning-of-structure-aware-fully |
Repo | |
Framework | |
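The structural prior the discriminators enforce can be illustrated with a hand-written plausibility check: a pose counts as "real" only if its limb lengths respect geometric constraints. This rule-based check is only a stand-in for the learned discriminator, and the skeleton and limb lengths are invented; the point is that implausible predictions get flagged, which is the signal adversarial training exploits.

```python
# Sketch: a geometric "discriminator" that rejects biologically implausible
# poses based on limb-length constraints.
import math

LIMBS = [("hip", "knee"), ("knee", "ankle")]
EXPECTED = {("hip", "knee"): 0.5, ("knee", "ankle"): 0.5}  # assumed lengths

def plausible(pose, tol=0.2):
    """Accept the pose only if every limb length is within tolerance."""
    for a, b in LIMBS:
        d = math.dist(pose[a], pose[b])
        if abs(d - EXPECTED[(a, b)]) > tol:
            return False
    return True

good = {"hip": (0, 1.0), "knee": (0, 0.5), "ankle": (0, 0.0)}
bad = {"hip": (0, 1.0), "knee": (0, 0.5), "ankle": (0, -1.5)}  # leg too long
print(plausible(good), plausible(bad))  # True False
```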
Robust Cost-Sensitive Learning for Recommendation with Implicit Feedback
Title | Robust Cost-Sensitive Learning for Recommendation with Implicit Feedback |
Authors | Peng Yang, Peilin Zhao, Xin Gao, Yong Liu |
Abstract | Recommendation is the task of improving the customer experience through personalized suggestions based on users' past feedback. In this paper, we investigate the most common scenario: the user-item (U-I) matrix of implicit feedback. Most recommendation approaches designed for implicit feedback project the U-I matrix into a low-rank latent space, a strict restriction that rarely holds in practice. In addition, although the misclassification costs of imbalanced classes differ significantly, few methods take the cost of classification error into account. To address the aforementioned issues, we propose a robust framework that decomposes the U-I matrix into two components: (1) a low-rank matrix that captures common preferences, and (2) a sparse matrix that detects the user-specific preferences of individuals. A cost-sensitive learning model is embedded into the framework. Specifically, this model exploits different costs in the loss function for observed and unobserved instances. We show that the resulting non-smooth convex objective can be optimized efficiently by an accelerated projected gradient method with closed-form solutions. Moreover, the proposed algorithm can be scaled up to large datasets after a relaxation. The theoretical result shows that even with a small fraction of 1's in the U-I matrix $M\in\mathbb{R}^{n\times m}$, the cost-sensitive error of the proposed model is upper bounded by $O(\frac{\alpha}{\sqrt{mn}})$, where $\alpha$ is a bias over imbalanced classes. Finally, empirical experiments are extensively carried out to evaluate the effectiveness of our proposed algorithm. Encouraging experimental results show that our algorithm outperforms several state-of-the-art algorithms on benchmark recommendation datasets. |
Tasks | |
Published | 2017-07-03 |
URL | http://arxiv.org/abs/1707.00536v2 |
http://arxiv.org/pdf/1707.00536v2.pdf | |
PWC | https://paperswithcode.com/paper/robust-cost-sensitive-learning-for |
Repo | |
Framework | |
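The cost-sensitive idea in the abstract can be sketched in isolation: observed (1) and unobserved (0) entries of the implicit-feedback matrix are penalized with different costs, so errors on the rare positive class matter more. The squared loss and the cost values are illustrative assumptions; the paper optimizes a low-rank-plus-sparse model with an accelerated projected gradient method.

```python
# Sketch: cost-sensitive loss over a binary U-I matrix, with a higher cost
# for errors on observed (positive) entries.

def cost_sensitive_loss(M, P, cost_pos=5.0, cost_neg=1.0):
    """M: binary U-I matrix (lists of 0/1); P: predicted scores."""
    total = 0.0
    for m_row, p_row in zip(M, P):
        for m, p in zip(m_row, p_row):
            c = cost_pos if m == 1 else cost_neg
            total += c * (m - p) ** 2
    return total

M = [[1, 0], [0, 1]]
perfect = [[1.0, 0.0], [0.0, 1.0]]
miss_pos = [[0.0, 0.0], [0.0, 1.0]]  # misses an observed 1
miss_neg = [[1.0, 1.0], [0.0, 1.0]]  # false positive on an unobserved 0
print(cost_sensitive_loss(M, perfect))   # 0.0
print(cost_sensitive_loss(M, miss_pos))  # 5.0: costly miss on a positive
print(cost_sensitive_loss(M, miss_neg))  # 1.0: cheaper error on a zero
```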
Deep Learning with Permutation-invariant Operator for Multi-instance Histopathology Classification
Title | Deep Learning with Permutation-invariant Operator for Multi-instance Histopathology Classification |
Authors | Jakub M. Tomczak, Maximilian Ilse, Max Welling |
Abstract | The computer-aided analysis of medical scans is a longstanding goal in the medical imaging field. Deep learning has become the dominant methodology for supporting pathologists and radiologists, and deep learning algorithms have been successfully applied to digital pathology and radiology; nevertheless, practical issues still prevent these tools from being widely used. The main obstacles are the low number of available cases and the large size of the images (a.k.a. the small n, large p problem in machine learning), together with very limited access to pixel-level annotation, which can lead to severe overfitting and large computational requirements. We propose to handle these issues by introducing a framework that processes a medical image as a collection of small patches using a single, shared neural network. The final diagnosis is obtained by combining the scores of individual patches using a permutation-invariant operator (combination). In the machine learning community, such an approach is called multi-instance learning (MIL). |
Tasks | |
Published | 2017-12-01 |
URL | http://arxiv.org/abs/1712.00310v2 |
http://arxiv.org/pdf/1712.00310v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-with-permutation-invariant |
Repo | |
Framework | |
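The MIL scheme in the abstract reduces to a short sketch: each patch of a large image is scored by one shared model, and the per-patch scores are merged by a permutation-invariant operator (max here; mean is another common choice). The toy patch scorer stands in for the shared neural network.

```python
# Sketch: multi-instance prediction with a shared per-patch scorer and a
# permutation-invariant combination operator.

def mil_predict(patches, score_fn, combine=max):
    scores = [score_fn(p) for p in patches]   # one shared model for all patches
    return combine(scores)                    # order of patches is irrelevant

# Toy scorer: "probability of disease" for a patch encoded as a number.
score_fn = lambda patch: min(1.0, patch / 10.0)

bag = [1.0, 2.0, 9.5, 0.5]                # one suspicious patch in the bag
print(mil_predict(bag, score_fn))          # 0.95

# Permutation invariance: shuffling patches leaves the diagnosis unchanged.
print(mil_predict(list(reversed(bag)), score_fn))  # 0.95
```

Max pooling suits the "any malignant patch makes the slide malignant" reading; a mean operator instead averages evidence across the whole slide.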
Robust Rigid Point Registration based on Convolution of Adaptive Gaussian Mixture Models
Title | Robust Rigid Point Registration based on Convolution of Adaptive Gaussian Mixture Models |
Authors | Can Pu, Nanbo Li, Robert B Fisher |
Abstract | Matching 3D rigid point clouds in complex environments robustly and accurately is still a core technique used in many applications. This paper proposes a new architecture combining error estimation from sample covariances and dual global probability alignment based on the convolution of adaptive Gaussian Mixture Models (GMM) from point clouds. Firstly, a novel adaptive GMM is defined using probability distributions from the corresponding points. Then rigid point cloud alignment is performed by maximizing the global probability from the convolution of dual adaptive GMMs in the whole 2D or 3D space, which can be efficiently optimized and has a large zone of accurate convergence. Thousands of trials have been conducted on 200 models from public 2D and 3D datasets to demonstrate superior robustness and accuracy in complex environments with unpredictable noise, outliers, occlusion, initial rotation, shape and missing points. |
Tasks | |
Published | 2017-07-26 |
URL | http://arxiv.org/abs/1707.08626v1 |
http://arxiv.org/pdf/1707.08626v1.pdf | |
PWC | https://paperswithcode.com/paper/robust-rigid-point-registration-based-on |
Repo | |
Framework | |
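The dual-GMM alignment objective from the abstract can be sketched with a closed-form overlap: treating each point cloud as a mixture of Gaussians, a rigid transform is scored by the overlap of the two mixtures, which for Gaussians reduces to a sum of pairwise kernels. The isotropic equal-variance Gaussians and the brute-force integer translation search are simplifying assumptions; the paper optimizes over full rigid transforms.

```python
# Sketch: score candidate translations by the Gaussian-mixture overlap of the
# transformed source cloud and the destination cloud, then take the argmax.
import math

def gmm_overlap(src, dst, sigma2=0.5):
    """Sum of pairwise Gaussian kernels between two 2D point sets
    (the convolution of two isotropic GMMs evaluated at zero offset)."""
    return sum(
        math.exp(-((sx - dx) ** 2 + (sy - dy) ** 2) / (4 * sigma2))
        for sx, sy in src for dx, dy in dst
    )

def best_translation(src, dst, search=range(-3, 4)):
    """Brute-force the integer translation maximizing mixture overlap."""
    return max(
        ((tx, ty) for tx in search for ty in search),
        key=lambda t: gmm_overlap([(x + t[0], y + t[1]) for x, y in src], dst),
    )

dst = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
src = [(x + 2, y - 1) for x, y in dst]   # dst shifted by (2, -1)
print(best_translation(src, dst))         # (-2, 1): the shift is recovered
```

Because every point interacts with every Gaussian component, the objective stays smooth and well-defined under noise and missing points, which is what gives this family of methods its robustness.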
C-arm Tomographic Imaging Technique for Nephrolithiasis and Detection of Kidney Stones
Title | C-arm Tomographic Imaging Technique for Nephrolithiasis and Detection of Kidney Stones |
Authors | Nuhad A. Malalla, Ying Chen |
Abstract | In this paper, we investigate a C-arm tomographic technique as a new three-dimensional (3D) kidney imaging method for nephrolithiasis and kidney stone detection over a view angle of less than 180°. Our C-arm tomographic technique provides a series of two-dimensional (2D) images with a single scan over a 40° view angle. Experimental studies were performed with a kidney phantom formed from a pig kidney with two embedded kidney stones. Different reconstruction methods were developed for the C-arm tomographic technique to generate 3D kidney information, including point-by-point back projection (BP), filtered back projection (FBP), the simultaneous algebraic reconstruction technique (SART) and maximum likelihood expectation maximization (MLEM). A computer simulation study was also conducted with a simulated 3D spherical object to evaluate the reconstruction results. Preliminary results demonstrate the capability of our C-arm tomographic technique to generate 3D kidney information for kidney stone detection with low radiation exposure. The kidney stones are visible on the reconstructed planes with identifiable shapes and sizes. |
Tasks | |
Published | 2017-06-08 |
URL | http://arxiv.org/abs/1706.02425v1 |
http://arxiv.org/pdf/1706.02425v1.pdf | |
PWC | https://paperswithcode.com/paper/c-arm-tomographic-imaging-technique-for |
Repo | |
Framework | |
Content-Based Table Retrieval for Web Queries
Title | Content-Based Table Retrieval for Web Queries |
Authors | Zhao Yan, Duyu Tang, Nan Duan, Junwei Bao, Yuanhua Lv, Ming Zhou, Zhoujun Li |
Abstract | Understanding the connections between unstructured text and semi-structured tables is an important yet neglected problem in natural language processing. In this work, we focus on content-based table retrieval: given a query, the task is to find the most relevant table in a collection of tables. Further progress in this area requires powerful models of semantic matching and richer training and evaluation resources. To this end, we present a ranking-based approach and implement both carefully designed features and neural network architectures to measure the relevance between a query and the content of a table. Furthermore, we release an open-domain dataset that includes 21,113 web queries for 273,816 tables. We conduct comprehensive experiments on both real-world and synthetic datasets. The results verify the effectiveness of our approach and highlight the challenges of this task. |
Tasks | |
Published | 2017-06-08 |
URL | http://arxiv.org/abs/1706.02427v1 |
http://arxiv.org/pdf/1706.02427v1.pdf | |
PWC | https://paperswithcode.com/paper/content-based-table-retrieval-for-web-queries |
Repo | |
Framework | |
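The ranking formulation in the abstract can be sketched with a trivial relevance feature: each table is scored against the query (here, by word overlap with its header and cells, a stand-in for the paper's features and neural matchers), and the highest-scoring table is retrieved.

```python
# Sketch: content-based table retrieval as ranking by a query-table
# relevance score.

def table_score(query, table):
    """Fraction of query words found in the table's header or cells."""
    words = set(query.lower().split())
    content = {w.lower() for row in table for cell in row for w in cell.split()}
    return len(words & content) / max(1, len(words))

def retrieve(query, tables):
    return max(tables, key=lambda t: table_score(query, t))

tables = [
    [["City", "Population"], ["Paris", "2.1M"]],
    [["Team", "Wins"], ["Ajax", "28"]],
]
print(retrieve("population of Paris", tables)[0])  # ['City', 'Population']
```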
Line-Circle: A Geometric Filter for Single Camera Edge-Based Object Detection
Title | Line-Circle: A Geometric Filter for Single Camera Edge-Based Object Detection |
Authors | Seyed Amir Tafrishi, Vahid E. Kandjani |
Abstract | This paper presents an object detection approach intended for application to future SLAM problems. Although many SLAM methods have been proposed to provide suitable autonomy for mobile robots, namely ground vehicles, they still suffer from overconfidence and heavy computation when entering immense spaces with many landmarks. In particular, they become impractical when relying solely on limited sensors such as a camera. The proposed method claims that unmanned ground vehicles, without a huge database for object definition or highly advanced prediction parameters, can deal with incoming objects during straight camera motion in real time. The Line-Circle (LC) filter applies detection, tracking and learning to each defined expert to obtain more information for judging the scene without over-calculation. In this filter, the circle expert lets us summarize edges in groups. Interactive feedback learning between the experts keeps error minimal and counters the overwhelming number of landmark signs in crowded scenes without mapping. Our experts rely on the covariance of trust factors together with geometric definitions to ignore, merge and compare detected landmarks. The validation experiment uses a camera alongside an IMU sensor for location estimation. |
Tasks | Object Detection |
Published | 2017-07-25 |
URL | http://arxiv.org/abs/1707.08095v1 |
http://arxiv.org/pdf/1707.08095v1.pdf | |
PWC | https://paperswithcode.com/paper/line-circle-a-geometric-filter-for-single |
Repo | |
Framework | |