July 29, 2019

3099 words 15 mins read

Paper Group ANR 85

AdaDNNs: Adaptive Ensemble of Deep Neural Networks for Scene Text Recognition. Query-free Clothing Retrieval via Implicit Relevance Feedback. Flow-free Video Object Segmentation. Long-Term Video Interpolation with Bidirectional Predictive Network. Reconciling Bayesian Epistemology and Narration-based Approaches to Judiciary Fact-finding. Grounding …

AdaDNNs: Adaptive Ensemble of Deep Neural Networks for Scene Text Recognition

Title AdaDNNs: Adaptive Ensemble of Deep Neural Networks for Scene Text Recognition
Authors Chun Yang, Xu-Cheng Yin, Zejun Li, Jianwei Wu, Chunchao Guo, Hongfa Wang, Lei Xiao
Abstract Recognizing text in the wild is a very challenging task because of complex backgrounds, varied illumination and diverse distortions, even with deep neural networks (convolutional neural networks and recurrent neural networks). In the end-to-end training procedure for scene text recognition, the outputs of deep neural networks at different iterations consistently exhibit diversity and complementarity for the target object (text). Here, a simple but effective deep learning method, an adaptive ensemble of deep neural networks (AdaDNNs), is proposed to select and adaptively combine classifier components from different iterations of the whole learning process. Furthermore, the ensemble is formulated in a Bayesian framework for classifier weighting and combination. A variety of experiments on several widely acknowledged benchmarks, i.e., the ICDAR Robust Reading Competition (Challenges 1, 2 and 4) datasets, verify the surprising improvement over the baseline DNNs and the effectiveness of AdaDNNs compared with recent state-of-the-art methods.
Tasks Scene Text Recognition
Published 2017-10-10
URL http://arxiv.org/abs/1710.03425v1
PDF http://arxiv.org/pdf/1710.03425v1.pdf
PWC https://paperswithcode.com/paper/adadnns-adaptive-ensemble-of-deep-neural
Repo
Framework
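
Below is a minimal sketch of the general idea behind an adaptive snapshot ensemble of this kind: predictions from model checkpoints saved at different training iterations are combined with weights derived from their validation accuracy. The softmax-over-accuracy weighting and the temperature parameter are illustrative assumptions, not the paper's exact Bayesian formulation.

```python
# Minimal sketch of an adaptive snapshot ensemble in the spirit of AdaDNNs.
# The accuracy-based softmax weighting is an assumption for illustration.
import numpy as np

def ensemble_predict(snapshot_probs, val_accuracies, temperature=0.1):
    """Combine per-snapshot class probabilities with accuracy-derived weights.

    snapshot_probs: (n_snapshots, n_samples, n_classes) class probabilities.
    val_accuracies: (n_snapshots,) validation accuracy of each snapshot,
                    used here as a proxy for its posterior weight.
    """
    # Better snapshots contribute more to the combined prediction.
    w = np.exp(val_accuracies / temperature)
    w /= w.sum()
    combined = np.tensordot(w, snapshot_probs, axes=(0, 0))  # (n_samples, n_classes)
    return combined.argmax(axis=1)

# Toy usage: 3 snapshots, 4 samples, 5 classes.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(5), size=(3, 4))
print(ensemble_predict(probs, np.array([0.80, 0.85, 0.83])))
```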

Query-free Clothing Retrieval via Implicit Relevance Feedback

Title Query-free Clothing Retrieval via Implicit Relevance Feedback
Authors Zhuoxiang Chen, Zhe Xu, Ya Zhang, Xiao Gu
Abstract Image-based clothing retrieval is receiving increasing interest with the growth of online shopping. In practice, users may often have a desired piece of clothing in mind (e.g., either having seen it before on the street or requiring certain specific clothing attributes) but may be unable to supply an image as a query. We model this problem as a new type of image retrieval task in which the target image resides only in the user’s mind (called “mental image retrieval” hereafter). Because of the absence of an explicit query image, we propose to solve this problem through relevance feedback. Specifically, a new Bayesian formulation is proposed that simultaneously models the retrieval target and its high-level representation in the mind of the user (called the “user metric” hereafter) as posterior distributions of pre-fetched shop images and heterogeneous features extracted from multiple clothing attributes, respectively. Requiring only clicks as user feedback, the proposed algorithm is able to account for the variability in human decision-making. Experiments with real users demonstrate the effectiveness of the proposed algorithm.
Tasks Decision Making, Image Retrieval
Published 2017-11-01
URL http://arxiv.org/abs/1711.00248v1
PDF http://arxiv.org/pdf/1711.00248v1.pdf
PWC https://paperswithcode.com/paper/query-free-clothing-retrieval-via-implicit
Repo
Framework
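
The sketch below illustrates the click-driven Bayesian relevance feedback loop described in the abstract: a posterior over pre-fetched shop images is updated each time the user clicks the displayed item closest to the image in their mind. The Gaussian-similarity likelihood and the bandwidth parameter are assumptions made for illustration; the paper's user-metric model over heterogeneous attribute features is richer.

```python
# Hedged sketch of click-based Bayesian relevance feedback for mental image retrieval.
import numpy as np

def update_posterior(posterior, features, clicked_idx, bandwidth=1.0):
    """One feedback round: items whose features resemble the clicked item gain mass."""
    d = np.linalg.norm(features - features[clicked_idx], axis=1)
    likelihood = np.exp(-0.5 * (d / bandwidth) ** 2)  # assumed Gaussian similarity
    posterior = posterior * likelihood
    return posterior / posterior.sum()

rng = np.random.default_rng(1)
feats = rng.normal(size=(1000, 64))        # pre-fetched shop-image features
post = np.full(1000, 1.0 / 1000)           # uniform prior over the catalogue
for clicked in [3, 17, 17]:                # simulated clicks over three rounds
    post = update_posterior(post, feats, clicked)
print(post.argsort()[-5:][::-1])           # current top-5 candidates
```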

Flow-free Video Object Segmentation

Title Flow-free Video Object Segmentation
Authors Aditya Vora, Shanmuganathan Raman
Abstract Segmenting a foreground object from a video is a challenging task because of large object deformations, occlusions, and background clutter. In this paper, we propose a frame-by-frame but computationally efficient approach to video object segmentation that clusters visually similar generic object segments throughout the video. Our algorithm segments the various object instances appearing in the video and then performs clustering to group visually similar segments into one cluster. Since the object that needs to be segmented appears in most parts of the video, we can retrieve the foreground segments from the cluster with the maximum number of segments, thus filtering out noisy segments that do not represent any object. We then apply a track-and-fill approach to localize the objects in the frames where the object segmentation framework fails to segment any object. Our algorithm performs comparably to recent automatic methods for video object segmentation when benchmarked on the DAVIS dataset while being computationally much faster.
Tasks Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation
Published 2017-06-29
URL http://arxiv.org/abs/1706.09544v1
PDF http://arxiv.org/pdf/1706.09544v1.pdf
PWC https://paperswithcode.com/paper/flow-free-video-object-segmentation
Repo
Framework
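
A small sketch of the clustering step described above: appearance descriptors of object proposals pooled over the whole video are grouped, and the most populous cluster is kept as the foreground object. Proposal generation and the track-and-fill stage are omitted, and the feature representation and cluster count are assumed for illustration.

```python
# Illustrative sketch of grouping per-frame proposals and keeping the largest cluster.
import numpy as np
from sklearn.cluster import KMeans

def foreground_segments(descriptors, segment_ids, n_clusters=5, seed=0):
    """descriptors: (n_segments, d) appearance features of proposals from the whole
    video; segment_ids: (frame, proposal) identifier for each row."""
    labels = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit_predict(descriptors)
    # The object of interest appears in most frames, so its segments should form
    # the most populous cluster; everything else is treated as noise.
    largest = np.bincount(labels).argmax()
    return [sid for sid, lab in zip(segment_ids, labels) if lab == largest]

rng = np.random.default_rng(2)
descs = np.vstack([rng.normal(0, 1, (80, 16)), rng.normal(5, 1, (20, 16))])
ids = [(f, p) for f in range(50) for p in range(2)]
print(len(foreground_segments(descs, ids)))
```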

Long-Term Video Interpolation with Bidirectional Predictive Network

Title Long-Term Video Interpolation with Bidirectional Predictive Network
Authors Xiongtao Chen, Wenmin Wang, Jinzhuo Wang, Weimian Li, Baoyang Chen
Abstract This paper considers the challenging task of long-term video interpolation. Unlike most existing methods, which only generate a few intermediate frames between adjacent existing ones, we attempt to speculate on, or imagine, the course of an episode and generate multiple frames between two non-consecutive frames in a video. We present a novel deep architecture called the bidirectional predictive network (BiPN), which predicts intermediate frames from two opposite directions. The bidirectional architecture allows the model to learn scene transformations over time as well as to generate longer video sequences. In addition, our model can be extended to predict multiple possible procedures by sampling different noise vectors. A joint loss composed of terms in image and feature spaces together with an adversarial loss is designed to train our model. We demonstrate the advantages of BiPN on two benchmarks, Moving 2D Shapes and UCF101, and report results competitive with recent approaches.
Tasks
Published 2017-06-13
URL http://arxiv.org/abs/1706.03947v1
PDF http://arxiv.org/pdf/1706.03947v1.pdf
PWC https://paperswithcode.com/paper/long-term-video-interpolation-with
Repo
Framework
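
The toy PyTorch model below sketches only the bidirectional idea: the two boundary frames are encoded from opposite directions and a shared decoder produces all intermediate frames at once. Layer sizes, fusion by concatenation, and the absence of noise input and adversarial training are simplifying assumptions, not the paper's architecture.

```python
# A minimal, assumed sketch of a bidirectional predictive encoder-decoder.
import torch
import torch.nn as nn

class TinyBiPN(nn.Module):
    def __init__(self, channels=3, hidden=64, n_intermediate=4):
        super().__init__()
        self.n = n_intermediate
        def enc():  # small convolutional encoder
            return nn.Sequential(
                nn.Conv2d(channels, hidden, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(hidden, hidden, 3, stride=2, padding=1), nn.ReLU())
        self.enc_fwd, self.enc_bwd = enc(), enc()   # two opposite-direction encoders
        self.dec = nn.Sequential(                   # shared decoder for all steps
            nn.ConvTranspose2d(2 * hidden, hidden, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(hidden, channels * n_intermediate, 4, stride=2, padding=1))

    def forward(self, first_frame, last_frame):
        z = torch.cat([self.enc_fwd(first_frame), self.enc_bwd(last_frame)], dim=1)
        out = self.dec(z)
        b, _, h, w = out.shape
        return out.view(b, self.n, -1, h, w)        # (batch, n_intermediate, C, H, W)

model = TinyBiPN()
mid = model(torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64))
print(mid.shape)  # torch.Size([2, 4, 3, 64, 64])
```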

Reconciling Bayesian Epistemology and Narration-based Approaches to Judiciary Fact-finding

Title Reconciling Bayesian Epistemology and Narration-based Approaches to Judiciary Fact-finding
Authors Rafal Urbaniak
Abstract Legal probabilism (LP) claims that degrees of conviction in juridical fact-finding are to be modeled exactly the way degrees of belief are modeled in standard Bayesian epistemology. Classical legal probabilism (CLP) adds that a conviction is justified if the credence in guilt given the evidence is above an appropriate guilt-probability threshold. These views are challenged on various counts, especially by proponents of the so-called narrative approach, on which the fact-finders’ decision is the result of a dynamic interplay between competing narratives of what happened. I develop a way for a Bayesian epistemologist to make sense of the narrative approach. I do so by formulating a probabilistic framework for evaluating competing narrations in terms of formal explications of the informal evaluation criteria used in the narrative approach.
Tasks
Published 2017-07-27
URL http://arxiv.org/abs/1707.08763v1
PDF http://arxiv.org/pdf/1707.08763v1.pdf
PWC https://paperswithcode.com/paper/reconciling-bayesian-epistemology-and
Repo
Framework
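
As a toy illustration of comparing competing narrations in a Bayesian way, the snippet below assigns each narration a prior and a likelihood for the admitted evidence and compares the resulting posteriors. The numbers and the two-narration setup are invented for the example and do not reflect the paper's formal explication of the narrative criteria.

```python
# Toy Bayesian comparison of two competing narrations (values are assumptions).
priors = {"prosecution": 0.5, "defence": 0.5}
likelihoods = {              # P(evidence | narration), illustrative values
    "prosecution": 0.30,
    "defence": 0.05,
}
unnormalised = {n: priors[n] * likelihoods[n] for n in priors}
total = sum(unnormalised.values())
posteriors = {n: v / total for n, v in unnormalised.items()}
print(posteriors)            # e.g. prosecution ~0.857, defence ~0.143
```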

Grounding Spatio-Semantic Referring Expressions for Human-Robot Interaction

Title Grounding Spatio-Semantic Referring Expressions for Human-Robot Interaction
Authors Mohit Shridhar, David Hsu
Abstract Human language is one of the most natural interfaces for humans to interact with robots. This paper presents a robot system that retrieves everyday objects with unconstrained natural language descriptions. A core issue for the system is semantic and spatial grounding, that is, inferring objects and their spatial relationships from images and natural language expressions. We introduce a two-stage neural-network grounding pipeline that maps natural language referring expressions directly to objects in the images. The first stage uses the visual descriptions in the referring expressions to generate a candidate set of relevant objects. The second stage examines all pairwise relationships between the candidates and predicts the most likely referred object according to the spatial descriptions in the referring expressions. A key feature of our system is that, by leveraging a large dataset of images labeled with text descriptions, it allows unrestricted object types and natural language referring expressions. Preliminary results indicate that our system outperforms a near state-of-the-art object comprehension system on standard benchmark datasets. We also present a robot system that follows voice commands to pick and place previously unseen objects.
Tasks
Published 2017-07-18
URL http://arxiv.org/abs/1707.05720v1
PDF http://arxiv.org/pdf/1707.05720v1.pdf
PWC https://paperswithcode.com/paper/grounding-spatio-semantic-referring
Repo
Framework
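
The skeleton below mirrors the two-stage structure of the pipeline described above: stage one shortlists candidate objects by their match to the visual words, stage two scores every ordered pair under the spatial relation and picks the referent. The scoring functions, the `top_k` cut-off and the toy objects are placeholders; the paper uses learned neural scorers on images.

```python
# Hedged sketch of a two-stage "visual then spatial" grounding procedure.
import itertools
import numpy as np

def ground(candidates, visual_score, relation_score, top_k=5):
    """candidates: detected objects; visual_score(obj) and relation_score(obj, other)
    return floats (stand-ins for learned scorers)."""
    # Stage 1: keep the objects most compatible with the visual description.
    shortlist = sorted(candidates, key=visual_score, reverse=True)[:top_k]
    # Stage 2: examine all pairwise relations and pick the most likely referent.
    best, best_s = None, -np.inf
    for obj, other in itertools.permutations(shortlist, 2):
        s = visual_score(obj) + relation_score(obj, other)
        if s > best_s:
            best, best_s = obj, s
    return best

# Toy usage: objects are (name, x-position); query ~ "the cup left of the book".
objs = [("cup", 0.2), ("cup", 0.95), ("book", 0.9), ("plant", 0.5)]
vis = lambda o: 1.0 if o[0] == "cup" else 0.0
rel = lambda o, other: 1.0 if other[0] == "book" and o[1] < other[1] else 0.0
print(ground(objs, vis, rel))  # ('cup', 0.2), the cup left of the book
```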

A Compact DNN: Approaching GoogLeNet-Level Accuracy of Classification and Domain Adaptation

Title A Compact DNN: Approaching GoogLeNet-Level Accuracy of Classification and Domain Adaptation
Authors Chunpeng Wu, Wei Wen, Tariq Afzal, Yongmei Zhang, Yiran Chen, Hai Li
Abstract Recently, DNN model compression based on network architecture design, e.g., SqueezeNet, has attracted a lot of attention. No accuracy drop on image classification is observed for these extremely compact networks compared to well-known models. An emerging question, however, is whether these model compression techniques hurt a DNN’s learning abilities beyond classifying images on a single dataset. Our preliminary experiments show that these compression methods can degrade domain adaptation (DA) ability even though classification performance is preserved. Therefore, we propose a new compact network architecture and an unsupervised DA method in this paper. The DNN is built on a new basic module, Conv-M, which provides more diverse feature extractors without significantly increasing parameters. The unified framework of our DA method simultaneously learns invariance across domains, reduces the divergence of feature representations, and adapts label prediction. Our DNN has 4.1M parameters, which is only 6.7% of AlexNet or 59% of GoogLeNet. Experiments show that our DNN obtains GoogLeNet-level accuracy on both classification and DA, and our DA method slightly outperforms previous competitive ones. Putting it all together, our DA strategy based on our DNN achieves state-of-the-art results on sixteen of the eighteen DA tasks on the popular Office-31 and Office-Caltech datasets.
Tasks Domain Adaptation, Image Classification, Model Compression
Published 2017-03-12
URL http://arxiv.org/abs/1703.04071v4
PDF http://arxiv.org/pdf/1703.04071v4.pdf
PWC https://paperswithcode.com/paper/a-compact-dnn-approaching-googlenet-level
Repo
Framework
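
The block below is a speculative PyTorch sketch in the spirit of a module that "provides more diverse feature extractors without significantly increasing parameters": several cheap branches (a plain convolution, a dilated convolution, and a pooled 1x1 projection) run in parallel and are concatenated. The exact Conv-M design in the paper differs; the branch choices and sizes here are assumptions for illustration only.

```python
# Speculative multi-branch block; not the paper's Conv-M definition.
import torch
import torch.nn as nn

class MultiBranchBlock(nn.Module):
    def __init__(self, in_ch, branch_ch=16):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, branch_ch, 3, padding=1)               # plain 3x3
        self.b2 = nn.Conv2d(in_ch, branch_ch, 3, padding=2, dilation=2)   # dilated 3x3
        self.b3 = nn.Sequential(nn.AvgPool2d(3, stride=1, padding=1),     # pooled context
                                nn.Conv2d(in_ch, branch_ch, 1))
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(torch.cat([self.b1(x), self.b2(x), self.b3(x)], dim=1))

block = MultiBranchBlock(in_ch=32)
print(block(torch.rand(1, 32, 56, 56)).shape)  # torch.Size([1, 48, 56, 56])
```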

Conformal k-NN Anomaly Detector for Univariate Data Streams

Title Conformal k-NN Anomaly Detector for Univariate Data Streams
Authors Vladislav Ishimtsev, Ivan Nazarov, Alexander Bernstein, Evgeny Burnaev
Abstract Anomalies in time-series data give essential and often actionable information in many applications. In this paper, we consider a model-free anomaly detection method for univariate time series which adapts to non-stationarity in the data stream and provides probabilistic abnormality scores based on the conformal prediction paradigm. Despite its simplicity, the method performs on par with complex prediction-based models on the Numenta Anomaly Detection benchmark and the Yahoo! S5 dataset.
Tasks Anomaly Detection, Time Series
Published 2017-06-11
URL http://arxiv.org/abs/1706.03412v1
PDF http://arxiv.org/pdf/1706.03412v1.pdf
PWC https://paperswithcode.com/paper/conformal-k-nn-anomaly-detector-for
Repo
Framework
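
A minimal sketch of the general idea: the nonconformity of a new point is its mean distance to its k nearest neighbours inside a sliding reference window, and the conformal p-value is the fraction of window points that are at least as nonconforming. The window size, k, and the leave-one-out calibration are illustrative choices; the paper's calibration scheme is more involved.

```python
# Conformal k-NN abnormality score for a univariate stream (simplified sketch).
import numpy as np

def knn_score(x, reference, k):
    d = np.sort(np.abs(reference - x))
    return d[:k].mean()

def conformal_pvalue(x, window, k=5):
    # Leave-one-out nonconformity scores of the reference window.
    scores = np.array([knn_score(w, np.delete(window, i), k)
                       for i, w in enumerate(window)])
    new = knn_score(x, window, k)
    return (np.sum(scores >= new) + 1) / (len(window) + 1)

rng = np.random.default_rng(3)
stream = np.concatenate([rng.normal(0, 1, 200), [8.0]])   # last point is anomalous
window = stream[-51:-1]                                    # sliding reference window
print(conformal_pvalue(stream[-1], window))                # small p-value => anomaly
```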

Adversarial Learning of Structure-Aware Fully Convolutional Networks for Landmark Localization

Title Adversarial Learning of Structure-Aware Fully Convolutional Networks for Landmark Localization
Authors Yu Chen, Chunhua Shen, Hao Chen, Xiu-Shen Wei, Lingqiao Liu, Jian Yang
Abstract Landmark/pose estimation in single monocular images has received much attention in computer vision due to its important applications. It remains a challenging task when input images contain severe occlusions caused by, e.g., adverse camera views. Under such circumstances, biologically implausible pose predictions may be produced. In contrast, human vision is able to predict poses by exploiting the geometric constraints of landmark point inter-connectivity. To address the problem, we propose a novel structure-aware fully convolutional network that incorporates priors about the structure of pose components and implicitly takes them into account during training of the deep network. Explicit learning of such constraints is typically challenging. Instead, inspired by how humans identify implausible poses, we design discriminators to distinguish real poses from fake ones (such as biologically implausible ones). If the pose generator G generates results that the discriminator fails to distinguish from real ones, the network has successfully learned the priors. Training of the network follows the strategy of conditional Generative Adversarial Networks (GANs). The effectiveness of the proposed network is evaluated on three pose-related tasks: 2D single human pose estimation, 2D facial landmark estimation and 3D single human pose estimation. The proposed approach significantly outperforms state-of-the-art methods and almost always generates plausible pose predictions, demonstrating the usefulness of implicitly learning structure with GANs.
Tasks Pose Estimation
Published 2017-11-01
URL http://arxiv.org/abs/1711.00253v5
PDF http://arxiv.org/pdf/1711.00253v5.pdf
PWC https://paperswithcode.com/paper/adversarial-learning-of-structure-aware-fully
Repo
Framework
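
The sketch below shows the adversarial training signal in miniature: a pose generator is trained with a supervised regression loss plus a GAN term from a discriminator that tries to tell predicted poses from ground-truth ones. The tiny fully connected networks, loss weighting, and random tensors are placeholders; the paper uses structure-aware fully convolutional networks on images.

```python
# Hedged PyTorch sketch of combining a pose regression loss with a GAN term.
import torch
import torch.nn as nn

n_joints = 16
G = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, n_joints * 2))
D = nn.Sequential(nn.Linear(n_joints * 2, 64), nn.ReLU(), nn.Linear(64, 1))
bce, mse = nn.BCEWithLogitsLoss(), nn.MSELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)

features = torch.rand(8, 128)            # stand-in for image features
real_pose = torch.rand(8, n_joints * 2)  # ground-truth joint coordinates

# Discriminator step: real poses -> 1, generated poses -> 0.
fake_pose = G(features).detach()
loss_d = bce(D(real_pose), torch.ones(8, 1)) + bce(D(fake_pose), torch.zeros(8, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: regress to ground truth and try to fool the discriminator.
pred = G(features)
loss_g = mse(pred, real_pose) + 0.01 * bce(D(pred), torch.ones(8, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
print(float(loss_d), float(loss_g))
```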

Robust Cost-Sensitive Learning for Recommendation with Implicit Feedback

Title Robust Cost-Sensitive Learning for Recommendation with Implicit Feedback
Authors Peng Yang, Peilin Zhao, Xin Gao, Yong Liu
Abstract Recommendation is the task of improving customer experience through personalized recommendations based on users’ past feedback. In this paper, we investigate the most common scenario: a user-item (U-I) matrix of implicit feedback. Even though many recommendation approaches are designed around implicit feedback, they attempt to project the U-I matrix into a low-rank latent space, which is a strict restriction that rarely holds in practice. In addition, although the misclassification costs of imbalanced classes are significantly different, few methods take the cost of classification error into account. To address the aforementioned issues, we propose a robust framework that decomposes the U-I matrix into two components: (1) a low-rank matrix that captures common preferences, and (2) a sparse matrix that detects the user-specific preferences of individuals. A cost-sensitive learning model is embedded into the framework. Specifically, this model uses different costs in the loss function for the observed and unobserved instances. We show that the resulting non-smooth convex objective can be optimized efficiently by an accelerated projected gradient method with closed-form solutions. Moreover, the proposed algorithm can be scaled up to large datasets after a relaxation. The theoretical result shows that even with a small fraction of 1’s in the U-I matrix $M\in\mathbb{R}^{n\times m}$, the cost-sensitive error of the proposed model is upper bounded by $O(\frac{\alpha}{\sqrt{mn}})$, where $\alpha$ is a bias over the imbalanced classes. Finally, extensive empirical experiments are carried out to evaluate the effectiveness of the proposed algorithm. Encouraging experimental results show that our algorithm outperforms several state-of-the-art algorithms on benchmark recommendation datasets.
Tasks
Published 2017-07-03
URL http://arxiv.org/abs/1707.00536v2
PDF http://arxiv.org/pdf/1707.00536v2.pdf
PWC https://paperswithcode.com/paper/robust-cost-sensitive-learning-for
Repo
Framework
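
The sketch below illustrates the low-rank-plus-sparse idea with cost-sensitive weighting: observed (1) entries get a higher misclassification cost than the many unobserved (0) entries, a singular-value shrinkage step keeps one component low rank, and soft-thresholding keeps the other sparse. The costs, step size and plain proximal iterations are assumptions for illustration; the paper's accelerated solver and analysis are more sophisticated.

```python
# Illustrative proximal-gradient sketch of weighted low-rank + sparse decomposition.
import numpy as np

def svd_shrink(X, tau):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0)) @ Vt

def soft_threshold(X, tau):
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0)

def fit(M, cost_pos=5.0, cost_neg=1.0, step=0.1, iters=200):
    W = np.where(M > 0, cost_pos, cost_neg)        # cost-sensitive weights
    L = np.zeros_like(M); S = np.zeros_like(M)
    for _ in range(iters):
        grad = W * (L + S - M)                     # weighted squared-loss gradient
        L = svd_shrink(L - step * grad, step * 1.0)     # nuclear-norm proximal step
        S = soft_threshold(S - step * grad, step * 0.5) # l1 proximal step
    return L, S

rng = np.random.default_rng(4)
M = (rng.random((40, 30)) < 0.05).astype(float)    # sparse implicit-feedback matrix
L, S = fit(M)
print(np.linalg.matrix_rank(L, tol=1e-3), np.mean(np.abs(S) > 1e-6))
```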

Deep Learning with Permutation-invariant Operator for Multi-instance Histopathology Classification

Title Deep Learning with Permutation-invariant Operator for Multi-instance Histopathology Classification
Authors Jakub M. Tomczak, Maximilian Ilse, Max Welling
Abstract The computer-aided analysis of medical scans is a longstanding goal in the medical imaging field. Currently, deep learning has become a dominant methodology for supporting pathologists and radiologists. Deep learning algorithms have been successfully applied to digital pathology and radiology; nevertheless, there are still practical issues that prevent these tools from being widely used in practice. The main obstacles are the low number of available cases and the large size of the images (a.k.a. the small n, large p problem in machine learning), together with very limited access to pixel-level annotations, which can lead to severe overfitting and large computational requirements. We propose to handle these issues by introducing a framework that processes a medical image as a collection of small patches using a single, shared neural network. The final diagnosis is provided by combining the scores of individual patches using a permutation-invariant operator (combination). In the machine learning community such an approach is called multi-instance learning (MIL).
Tasks
Published 2017-12-01
URL http://arxiv.org/abs/1712.00310v2
PDF http://arxiv.org/pdf/1712.00310v2.pdf
PWC https://paperswithcode.com/paper/deep-learning-with-permutation-invariant
Repo
Framework
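
A minimal PyTorch sketch of the setup described above: every patch goes through the same small network, and a permutation-invariant operator (mean here; max or attention are other common choices) combines the per-patch scores into one image-level prediction. The linear backbone and feature dimension are placeholders for a real patch CNN.

```python
# Sketch of multi-instance learning with a shared patch network and mean pooling.
import torch
import torch.nn as nn

class MILClassifier(nn.Module):
    def __init__(self, in_dim=512, hidden=128):
        super().__init__()
        self.patch_net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                       nn.Linear(hidden, 1))   # shared across patches

    def forward(self, patch_features):
        # patch_features: (n_patches, in_dim); the order of patches must not matter.
        scores = self.patch_net(patch_features)                # (n_patches, 1)
        return torch.sigmoid(scores.mean())                    # permutation-invariant pooling

model = MILClassifier()
bag = torch.rand(300, 512)                  # one image as a bag of 300 patch features
print(model(bag).item(), model(bag[torch.randperm(300)]).item())  # identical outputs
```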

Robust Rigid Point Registration based on Convolution of Adaptive Gaussian Mixture Models

Title Robust Rigid Point Registration based on Convolution of Adaptive Gaussian Mixture Models
Authors Can Pu, Nanbo Li, Robert B Fisher
Abstract Matching 3D rigid point clouds in complex environments robustly and accurately is still a core technique used in many applications. This paper proposes a new architecture combining error estimation from sample covariances and dual global probability alignment based on the convolution of adaptive Gaussian Mixture Models (GMM) from point clouds. Firstly, a novel adaptive GMM is defined using probability distributions from the corresponding points. Then rigid point cloud alignment is performed by maximizing the global probability from the convolution of dual adaptive GMMs in the whole 2D or 3D space, which can be efficiently optimized and has a large zone of accurate convergence. Thousands of trials have been conducted on 200 models from public 2D and 3D datasets to demonstrate superior robustness and accuracy in complex environments with unpredictable noise, outliers, occlusion, initial rotation, shape and missing points.
Tasks
Published 2017-07-26
URL http://arxiv.org/abs/1707.08626v1
PDF http://arxiv.org/pdf/1707.08626v1.pdf
PWC https://paperswithcode.com/paper/robust-rigid-point-registration-based-on
Repo
Framework
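
The sketch below conveys the flavour of the alignment objective: placing an isotropic Gaussian on every point of each cloud, the convolution of the two mixtures summed over all point pairs gives a smooth score to maximize over rotation and translation. The coarse grid search, fixed bandwidth and 2D setting are simplifications; the paper uses adaptive, covariance-aware GMMs and a proper optimizer.

```python
# Hedged 2D sketch of maximizing the overlap of two Gaussian mixtures.
import numpy as np

def gmm_overlap(src, dst, sigma=0.5):
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (4 * sigma ** 2)).sum()      # sum of pairwise Gaussian overlaps

def register(src, dst, sigma=0.5):
    best = (-np.inf, None)
    t0 = dst.mean(0) - src.mean(0)                   # centroid-based initial translation
    for theta in np.linspace(-np.pi, np.pi, 181):    # coarse rotation grid
        R = np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
        moved = (src - src.mean(0)) @ R.T + src.mean(0) + t0
        score = gmm_overlap(moved, dst, sigma)
        if score > best[0]:
            best = (score, theta)
    return best[1]

rng = np.random.default_rng(5)
pts = rng.normal(size=(100, 2))
ang = 0.6
R = np.array([[np.cos(ang), -np.sin(ang)], [np.sin(ang), np.cos(ang)]])
target = pts @ R.T + np.array([2.0, -1.0])
print(register(pts, target))   # recovered rotation, close to 0.6
```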

C-arm Tomographic Imaging Technique for Nephrolithiasis and Detection of Kidney Stones

Title C-arm Tomographic Imaging Technique for Nephrolithiasis and Detection of Kidney Stones
Authors Nuhad A. Malalla, Ying Chen
Abstract In this paper, we investigate a C-arm tomographic technique as a new three-dimensional (3D) kidney imaging method for nephrolithiasis and kidney stone detection over a view angle of less than 180°. Our C-arm tomographic technique provides a series of two-dimensional (2D) images with a single scan over a 40° view angle. Experimental studies were performed with a kidney phantom formed from a pig kidney with two embedded kidney stones. Different reconstruction methods were developed for the C-arm tomographic technique to generate 3D kidney information, including point-by-point back projection (BP), filtered back projection (FBP), the simultaneous algebraic reconstruction technique (SART) and maximum likelihood expectation maximization (MLEM). A computer simulation study was also carried out with a simulated 3D spherical object to evaluate the reconstruction results. Preliminary results demonstrate the capability of our C-arm tomographic technique to generate 3D kidney information for kidney stone detection with low radiation exposure. The kidney stones are visible on the reconstructed planes with identifiable shapes and sizes.
Tasks
Published 2017-06-08
URL http://arxiv.org/abs/1706.02425v1
PDF http://arxiv.org/pdf/1706.02425v1.pdf
PWC https://paperswithcode.com/paper/c-arm-tomographic-imaging-technique-for
Repo
Framework
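
As a toy illustration of the SART-style reconstruction mentioned above, the snippet below corrects an image estimate by back-projecting the normalized residual of the measured projections. The tiny random system matrix stands in for real C-arm projection geometry and is purely illustrative; it is not the paper's implementation.

```python
# Toy SART-like iterative reconstruction on a random system matrix.
import numpy as np

def sart(A, p, n_iter=50, relax=0.5):
    """A: (n_rays, n_voxels) projection matrix, p: measured projections."""
    x = np.zeros(A.shape[1])
    row_sums = A.sum(axis=1) + 1e-12          # normalization per ray
    col_sums = A.sum(axis=0) + 1e-12          # normalization per voxel
    for _ in range(n_iter):
        residual = (p - A @ x) / row_sums
        x += relax * (A.T @ residual) / col_sums
        x = np.clip(x, 0, None)               # attenuation cannot be negative
    return x

rng = np.random.default_rng(6)
truth = rng.random(64)                         # toy 64-voxel "phantom"
A = (rng.random((40, 64)) < 0.2).astype(float) # 40 rays; sparse, limited sampling
x = sart(A, A @ truth)
print(np.round(np.corrcoef(truth, x)[0, 1], 3))  # correlation with the phantom
```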

Content-Based Table Retrieval for Web Queries

Title Content-Based Table Retrieval for Web Queries
Authors Zhao Yan, Duyu Tang, Nan Duan, Junwei Bao, Yuanhua Lv, Ming Zhou, Zhoujun Li
Abstract Understanding the connections between unstructured text and semi-structured tables is an important yet neglected problem in natural language processing. In this work, we focus on content-based table retrieval: given a query, the task is to find the most relevant table in a collection of tables. Further progress in this area requires powerful models of semantic matching and richer training and evaluation resources. To this end, we present a ranking-based approach and implement both carefully designed features and neural network architectures to measure the relevance between a query and the content of a table. Furthermore, we release an open-domain dataset that includes 21,113 web queries for 273,816 tables. We conduct comprehensive experiments on both real-world and synthetic datasets. The results verify the effectiveness of our approach and highlight the challenges of this task.
Tasks
Published 2017-06-08
URL http://arxiv.org/abs/1706.02427v1
PDF http://arxiv.org/pdf/1706.02427v1.pdf
PWC https://paperswithcode.com/paper/content-based-table-retrieval-for-web-queries
Repo
Framework
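
The snippet below sketches only the ranking setup: each table receives a relevance score against the query and tables are returned best-first. The hand-crafted word-overlap features, the caption/header/cell weights and the toy tables are stand-ins for the paper's designed features and neural matching models.

```python
# Illustrative ranking of tables by simple query-overlap features (assumed scorer).
def overlap(query_terms, text):
    words = set(text.lower().split())
    return len(query_terms & words) / (len(query_terms) or 1)

def rank_tables(query, tables, w_caption=0.5, w_header=0.3, w_cells=0.2):
    q = set(query.lower().split())
    scored = []
    for t in tables:
        score = (w_caption * overlap(q, t["caption"]) +
                 w_header * overlap(q, " ".join(t["headers"])) +
                 w_cells * overlap(q, " ".join(t["cells"])))
        scored.append((score, t["caption"]))
    return sorted(scored, reverse=True)          # most relevant table first

tables = [
    {"caption": "Olympic medal table 2016", "headers": ["country", "gold"], "cells": ["USA", "46"]},
    {"caption": "GDP by country", "headers": ["country", "gdp"], "cells": ["USA", "18T"]},
]
print(rank_tables("2016 olympic gold medals by country", tables))
```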

Line-Circle: A Geometric Filter for Single Camera Edge-Based Object Detection

Title Line-Circle: A Geometric Filter for Single Camera Edge-Based Object Detection
Authors Seyed Amir Tafrishi, Vahid E. Kandjani
Abstract This paper presents a state-of-the-art object detection approach intended for use in future SLAM problems. Although many SLAM methods have been proposed to provide suitable autonomy for mobile robots, namely ground vehicles, they still suffer from overconfidence and heavy computation when entering large spaces with many landmarks. In particular, they become impractical when relying solely on limited sensors such as a camera. The proposed method enables unmanned ground vehicles, without a huge object-definition database or highly advanced prediction parameters, to deal with incoming objects in real time during straight camera motion. The Line-Circle (LC) filter applies detection, tracking and learning to each defined expert to obtain more information for judging the scene without excessive computation. In this filter, the circle expert lets us summarize edges into groups. Interactive feedback learning between the experts keeps the error small and copes with overwhelming numbers of landmark signs in crowded scenes without mapping. Our experts rely on the covariance of trust factors together with geometric definitions to ignore, merge and compare detected landmarks. The experiment validating the model uses a camera alongside an IMU sensor for location estimation.
Tasks Object Detection
Published 2017-07-25
URL http://arxiv.org/abs/1707.08095v1
PDF http://arxiv.org/pdf/1707.08095v1.pdf
PWC https://paperswithcode.com/paper/line-circle-a-geometric-filter-for-single
Repo
Framework