Paper Group ANR 85
AdaDNNs: Adaptive Ensemble of Deep Neural Networks for Scene Text Recognition. Query-free Clothing Retrieval via Implicit Relevance Feedback. Flow-free Video Object Segmentation. Long-Term Video Interpolation with Bidirectional Predictive Network. Reconciling Bayesian Epistemology and Narration-based Approaches to Judiciary Fact-finding. Grounding …
AdaDNNs: Adaptive Ensemble of Deep Neural Networks for Scene Text Recognition
Title | AdaDNNs: Adaptive Ensemble of Deep Neural Networks for Scene Text Recognition |
Authors | Chun Yang, Xu-Cheng Yin, Zejun Li, Jianwei Wu, Chunchao Guo, Hongfa Wang, Lei Xiao |
Abstract | Recognizing text in the wild is a truly challenging task because of complex backgrounds, various illuminations and diverse distortions, even with deep neural networks (convolutional neural networks and recurrent neural networks). In the end-to-end training procedure for scene text recognition, the outputs of deep neural networks at different iterations consistently exhibit diversity and complementarity for the target object (text). Here, a simple but effective deep learning method, an adaptive ensemble of deep neural networks (AdaDNNs), is proposed to select and adaptively combine classifier components from different iterations of the whole learning system. Furthermore, the ensemble is formulated within a Bayesian framework for classifier weighting and combination. A variety of experiments on several widely acknowledged benchmarks, i.e., the ICDAR Robust Reading Competition (Challenges 1, 2 and 4) datasets, verify the surprising improvement over the baseline DNNs and the effectiveness of AdaDNNs compared with recent state-of-the-art methods. |
Tasks | Scene Text Recognition |
Published | 2017-10-10 |
URL | http://arxiv.org/abs/1710.03425v1 |
http://arxiv.org/pdf/1710.03425v1.pdf | |
PWC | https://paperswithcode.com/paper/adadnns-adaptive-ensemble-of-deep-neural |
Repo | |
Framework | |
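The snapshot-combination idea in the AdaDNNs abstract can be illustrated with a small sketch. The snapshots, their validation accuracies, and the softmax-style weighting below are illustrative assumptions, not the paper's exact Bayesian formulation.

```python
# Sketch: adaptively weight classifier snapshots taken at different training
# iterations, then combine their per-class scores. The weighting rule
# (softmax over validation accuracy) is an assumed stand-in for the paper's
# Bayesian classifier weighting.
import math

def ensemble_predict(snapshots, val_accuracies, x, temperature=0.1):
    """Combine per-snapshot class scores, weighting each snapshot by a
    softmax over its validation accuracy."""
    exps = [math.exp(a / temperature) for a in val_accuracies]
    z = sum(exps)
    weights = [e / z for e in exps]
    n_classes = len(snapshots[0](x))
    combined = [0.0] * n_classes
    for w, clf in zip(weights, snapshots):
        scores = clf(x)  # each snapshot maps an input to per-class scores
        for c in range(n_classes):
            combined[c] += w * scores[c]
    return combined.index(max(combined))

# Toy snapshots: each is a function from input to class scores.
snap_a = lambda x: [0.9, 0.1]   # stronger snapshot
snap_b = lambda x: [0.4, 0.6]   # weaker snapshot
pred = ensemble_predict([snap_a, snap_b], [0.95, 0.60], x=None)
print(pred)  # 0: the more accurate snapshot dominates the vote
```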
Query-free Clothing Retrieval via Implicit Relevance Feedback
Title | Query-free Clothing Retrieval via Implicit Relevance Feedback |
Authors | Zhuoxiang Chen, Zhe Xu, Ya Zhang, Xiao Gu |
Abstract | Image-based clothing retrieval is receiving increasing interest with the growth of online shopping. In practice, users may often have a desired piece of clothing in mind (e.g., either having seen it before on the street or requiring certain specific clothing attributes) but may be unable to supply an image as a query. We model this problem as a new type of image retrieval task in which the target image resides only in the user’s mind (called “mental image retrieval” hereafter). Because of the absence of an explicit query image, we propose to solve this problem through relevance feedback. Specifically, a new Bayesian formulation is proposed that simultaneously models the retrieval target and its high-level representation in the mind of the user (called the “user metric” hereafter) as posterior distributions of pre-fetched shop images and heterogeneous features extracted from multiple clothing attributes, respectively. Requiring only clicks as user feedback, the proposed algorithm is able to account for the variability in human decision-making. Experiments with real users demonstrate the effectiveness of the proposed algorithm. |
Tasks | Decision Making, Image Retrieval |
Published | 2017-11-01 |
URL | http://arxiv.org/abs/1711.00248v1 |
http://arxiv.org/pdf/1711.00248v1.pdf | |
PWC | https://paperswithcode.com/paper/query-free-clothing-retrieval-via-implicit |
Repo | |
Framework | |
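The click-driven Bayesian update suggested by the abstract can be sketched as follows: a posterior over candidate shop items is maintained, and a click on a shown item boosts items similar to it. The 1-D item representation, similarity kernel and update rule are illustrative assumptions, not the paper's formulation.

```python
# Sketch: maintain a posterior over candidate items; a click reweights items
# by similarity to the clicked one (Bayes-style update with an assumed
# likelihood), then renormalizes.
import math

def update_posterior(posterior, items, clicked, tau=1.0):
    """Reweight items by similarity to the clicked item, then normalize."""
    weights = [math.exp(-abs(item - clicked) / tau) for item in items]
    new = [p * w for p, w in zip(posterior, weights)]
    z = sum(new)
    return [n / z for n in new]

# Items summarized by a single attribute coordinate (e.g. a colour value).
items = [0.0, 0.5, 1.0, 5.0]
posterior = [0.25] * 4                     # uniform prior over candidates
posterior = update_posterior(posterior, items, clicked=0.5)
best = max(range(4), key=lambda i: posterior[i])
print(best)  # 1: the item matching the click is now most probable
```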
Flow-free Video Object Segmentation
Title | Flow-free Video Object Segmentation |
Authors | Aditya Vora, Shanmuganathan Raman |
Abstract | Segmenting a foreground object from a video is a challenging task because of large object deformations, occlusions, and background clutter. In this paper, we propose a frame-by-frame but computationally efficient approach for video object segmentation by clustering visually similar generic object segments throughout the video. Our algorithm segments the various object instances appearing in the video and then performs clustering to group visually similar segments into one cluster. Since the object to be segmented appears in most of the video, we can retrieve the foreground segments from the cluster with the maximum number of segments, thus filtering out noisy segments that do not represent any object. We then apply a track-and-fill approach to localize the object in the frames where the segmentation framework fails to segment any object. Our algorithm performs comparably to recent automatic methods for video object segmentation when benchmarked on the DAVIS dataset while being computationally much faster. |
Tasks | Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation |
Published | 2017-06-29 |
URL | http://arxiv.org/abs/1706.09544v1 |
http://arxiv.org/pdf/1706.09544v1.pdf | |
PWC | https://paperswithcode.com/paper/flow-free-video-object-segmentation |
Repo | |
Framework | |
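The "largest cluster = foreground" step described in the abstract can be sketched in a few lines: visually similar segments are grouped, the biggest group is kept as the object, and small clusters are discarded as noise. The toy feature vectors, greedy clustering rule and distance threshold are assumptions for illustration.

```python
# Sketch: greedy clustering of per-segment appearance features, then pick the
# cluster with the most members as the foreground object.
import math

def cluster_segments(features, threshold=1.0):
    """Assign each segment to the first cluster whose centroid is within
    `threshold`; otherwise start a new cluster."""
    clusters = []  # each cluster: list of feature vectors
    for f in features:
        for cl in clusters:
            centroid = [sum(v[i] for v in cl) / len(cl) for i in range(len(f))]
            if math.dist(f, centroid) <= threshold:
                cl.append(f)
                break
        else:
            clusters.append([f])
    return clusters

# Toy per-segment features across frames; the tight group is the real object,
# the outlier is a noisy segment.
feats = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
clusters = cluster_segments(feats)
foreground = max(clusters, key=len)  # cluster with the maximum number of segments
print(len(foreground))  # 3
```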
Long-Term Video Interpolation with Bidirectional Predictive Network
Title | Long-Term Video Interpolation with Bidirectional Predictive Network |
Authors | Xiongtao Chen, Wenmin Wang, Jinzhuo Wang, Weimian Li, Baoyang Chen |
Abstract | This paper considers the challenging task of long-term video interpolation. Unlike most existing methods, which only generate a few intermediate frames between existing adjacent ones, we attempt to speculate about, or imagine, the course of an episode and generate multiple frames between two non-consecutive frames in a video. We present a novel deep architecture called the bidirectional predictive network (BiPN), which predicts intermediate frames from two opposite directions. The bidirectional architecture allows the model to learn scene transformations over time as well as to generate longer video sequences. In addition, our model can be extended to predict multiple possible outcomes by sampling different noise vectors. A joint loss composed of cues in image and feature spaces and an adversarial loss is designed to train the model. We demonstrate the advantages of BiPN on two benchmarks, Moving 2D Shapes and UCF101, and report results competitive with recent approaches. |
Tasks | |
Published | 2017-06-13 |
URL | http://arxiv.org/abs/1706.03947v1 |
http://arxiv.org/pdf/1706.03947v1.pdf | |
PWC | https://paperswithcode.com/paper/long-term-video-interpolation-with |
Repo | |
Framework | |
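The bidirectional structure described in the abstract can be illustrated with a toy sketch: intermediate frames are generated both from the start frame (forward) and from the end frame (backward), and the two predictions are blended. Linear extrapolation stands in for the learned encoder-decoder here, so this only illustrates the architecture's symmetry, not the model itself.

```python
# Sketch: predict each intermediate "frame" (a scalar stand-in, e.g. an
# object's x-position) from both temporal directions and blend, trusting the
# nearer endpoint more.

def interpolate(start, end, n_mid):
    frames = []
    for i in range(1, n_mid + 1):
        t = i / (n_mid + 1)
        forward = start + t * (end - start)        # prediction from the past
        backward = end - (1 - t) * (end - start)   # prediction from the future
        frames.append((1 - t) * forward + t * backward)
    return frames

print(interpolate(0.0, 4.0, 3))  # [1.0, 2.0, 3.0]
```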
Reconciling Bayesian Epistemology and Narration-based Approaches to Judiciary Fact-finding
Title | Reconciling Bayesian Epistemology and Narration-based Approaches to Judiciary Fact-finding |
Authors | Rafal Urbaniak |
Abstract | Legal probabilism (LP) claims that degrees of conviction in juridical fact-finding are to be modeled exactly the way degrees of belief are modeled in standard Bayesian epistemology. Classical legal probabilism (CLP) adds that a conviction is justified if the credence in guilt given the evidence is above an appropriate guilt probability threshold. These views are challenged on various counts, especially by proponents of the so-called narrative approach, on which the fact-finders' decision is the result of a dynamic interplay between competing narratives of what happened. I develop a way for a Bayesian epistemologist to make sense of the narrative approach. I do so by formulating a probabilistic framework for evaluating competing narrations in terms of formal explications of the informal evaluation criteria used in the narrative approach. |
Tasks | |
Published | 2017-07-27 |
URL | http://arxiv.org/abs/1707.08763v1 |
http://arxiv.org/pdf/1707.08763v1.pdf | |
PWC | https://paperswithcode.com/paper/reconciling-bayesian-epistemology-and |
Repo | |
Framework | |
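The kind of probabilistic comparison of narrations the abstract gestures at can be sketched with plain Bayes: each competing narration is scored by its posterior given the evidence (prior times likelihood, normalized). The priors and likelihoods below are invented for illustration and are not drawn from the paper.

```python
# Sketch: posterior comparison of competing narrations of what happened.

def narration_posteriors(priors, likelihoods):
    """P(narration | evidence) ∝ P(narration) * P(evidence | narration)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    z = sum(joint)
    return [j / z for j in joint]

# Two narrations, and how well each explains the evidence (assumed values).
priors = [0.5, 0.5]
likelihoods = [0.9, 0.2]   # P(evidence | narration)
post = narration_posteriors(priors, likelihoods)
print(post)  # the better-explaining narration dominates: ~[0.818, 0.182]
```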
Grounding Spatio-Semantic Referring Expressions for Human-Robot Interaction
Title | Grounding Spatio-Semantic Referring Expressions for Human-Robot Interaction |
Authors | Mohit Shridhar, David Hsu |
Abstract | Human language is one of the most natural interfaces for humans to interact with robots. This paper presents a robot system that retrieves everyday objects from unconstrained natural language descriptions. A core issue for the system is semantic and spatial grounding: inferring objects and their spatial relationships from images and natural language expressions. We introduce a two-stage neural-network grounding pipeline that maps natural language referring expressions directly to objects in the images. The first stage uses the visual descriptions in the referring expressions to generate a candidate set of relevant objects. The second stage examines all pairwise relationships between the candidates and predicts the most likely referred object according to the spatial descriptions in the referring expressions. A key feature of our system is that, by leveraging a large dataset of images labeled with text descriptions, it allows unrestricted object types and natural language referring expressions. Preliminary results indicate that our system outperforms a near state-of-the-art object comprehension system on standard benchmark datasets. We also present a robot system that follows voice commands to pick and place previously unseen objects. |
Tasks | |
Published | 2017-07-18 |
URL | http://arxiv.org/abs/1707.05720v1 |
http://arxiv.org/pdf/1707.05720v1.pdf | |
PWC | https://paperswithcode.com/paper/grounding-spatio-semantic-referring |
Repo | |
Framework | |
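The two-stage pipeline described in the abstract can be sketched schematically: stage one keeps objects whose visual description matches the expression; stage two scores spatial relations among the survivors and returns the most likely referent. The toy scene, label matcher and relation scorer below are assumptions, not the paper's neural components.

```python
# Sketch of a two-stage grounding pipeline: visual filtering, then spatial
# relation scoring.

def visual_score(obj, query):
    """Stage-1 stand-in for the visual-description matcher."""
    return 1.0 if obj["label"] == query else 0.0

def relation_score(a, b, relation):
    """Stage-2 stand-in for the pairwise spatial-relation predictor."""
    if relation == "left_of":
        return 1.0 if a["x"] < b["x"] else 0.0
    return 0.0

def ground(objects, visual_query, relation, top_k=2):
    # Stage 1: candidate set from visual descriptions.
    candidates = sorted(objects,
                        key=lambda o: -visual_score(o, visual_query))[:top_k]
    # Stage 2: candidate best satisfying the spatial relation to some object.
    return max(candidates,
               key=lambda o: max(relation_score(o, other, relation)
                                 for other in objects if other is not o))

scene = [{"label": "cup", "x": 0}, {"label": "cup", "x": 5},
         {"label": "plate", "x": 3}]
# "the cup to the left of the plate"
print(ground(scene, "cup", "left_of")["x"])  # 0
```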
A Compact DNN: Approaching GoogLeNet-Level Accuracy of Classification and Domain Adaptation
Title | A Compact DNN: Approaching GoogLeNet-Level Accuracy of Classification and Domain Adaptation |
Authors | Chunpeng Wu, Wei Wen, Tariq Afzal, Yongmei Zhang, Yiran Chen, Hai Li |
Abstract | Recently, DNN model compression based on network architecture design, e.g., SqueezeNet, has attracted a lot of attention. Compared to well-known models, these extremely compact networks show no accuracy drop on image classification. An emerging question, however, is whether these model compression techniques hurt a DNN's learning abilities beyond classifying images on a single dataset. Our preliminary experiments show that these compression methods can degrade domain adaptation (DA) ability, even though classification performance is preserved. We therefore propose a new compact network architecture and an unsupervised DA method in this paper. The DNN is built on a new basic module, Conv-M, which provides more diverse feature extractors without significantly increasing parameters. The unified framework of our DA method simultaneously learns invariance across domains, reduces the divergence of feature representations, and adapts label prediction. Our DNN has 4.1M parameters, only 6.7% of AlexNet's or 59% of GoogLeNet's. Experiments show that our DNN obtains GoogLeNet-level accuracy both on classification and DA, and our DA method slightly outperforms previous competitive ones. Put together, our DA strategy based on our DNN achieves state-of-the-art results on sixteen of eighteen DA tasks on the popular Office-31 and Office-Caltech datasets. |
Tasks | Domain Adaptation, Image Classification, Model Compression |
Published | 2017-03-12 |
URL | http://arxiv.org/abs/1703.04071v4 |
http://arxiv.org/pdf/1703.04071v4.pdf | |
PWC | https://paperswithcode.com/paper/a-compact-dnn-approaching-googlenet-level |
Repo | |
Framework | |
Conformal k-NN Anomaly Detector for Univariate Data Streams
Title | Conformal k-NN Anomaly Detector for Univariate Data Streams |
Authors | Vladislav Ishimtsev, Ivan Nazarov, Alexander Bernstein, Evgeny Burnaev |
Abstract | Anomalies in time-series data give essential and often actionable information in many applications. In this paper we consider a model-free anomaly detection method for univariate time-series which adapts to non-stationarity in the data stream and provides probabilistic abnormality scores based on the conformal prediction paradigm. Despite its simplicity the method performs on par with complex prediction-based models on the Numenta Anomaly Detection benchmark and the Yahoo! S5 dataset. |
Tasks | Anomaly Detection, Time Series |
Published | 2017-06-11 |
URL | http://arxiv.org/abs/1706.03412v1 |
http://arxiv.org/pdf/1706.03412v1.pdf | |
PWC | https://paperswithcode.com/paper/conformal-k-nn-anomaly-detector-for |
Repo | |
Framework | |
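The method in this abstract is concrete enough to sketch end to end: the nonconformity measure is the average distance to the k nearest neighbours in a sliding reference window, and it is turned into a probabilistic score by ranking against leave-one-out calibration scores, in the conformal-prediction style. The window size and k below are illustrative choices, not the paper's settings.

```python
# Sketch of a conformal k-NN anomaly score for a univariate stream.

def knn_score(x, reference, k):
    """Nonconformity: average distance to the k nearest points in `reference`."""
    dists = sorted(abs(x - r) for r in reference)
    return sum(dists[:k]) / k

def conformal_anomaly(x, window, k=3):
    """Score in [0, 1]; values near 1 mean x is more nonconforming than
    almost every calibration point (i.e. likely anomalous)."""
    # Leave-one-out calibration scores over the reference window.
    calib = [knn_score(window[i], window[:i] + window[i + 1:], k)
             for i in range(len(window))]
    alpha = knn_score(x, window, k)
    p = sum(1 for a in calib if a >= alpha) / (len(calib) + 1)  # conformal p-value
    return 1.0 - p

stream = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8]
a = conformal_anomaly(25.0, stream)   # far outside the window -> near 1
b = conformal_anomaly(10.05, stream)  # typical value -> much lower
print(a, b)
```

The score adapts to non-stationarity simply by letting the window slide, since both the nonconformity measure and the calibration set are recomputed from recent data.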
Adversarial Learning of Structure-Aware Fully Convolutional Networks for Landmark Localization
Title | Adversarial Learning of Structure-Aware Fully Convolutional Networks for Landmark Localization |
Authors | Yu Chen, Chunhua Shen, Hao Chen, Xiu-Shen Wei, Lingqiao Liu, Jian Yang |
Abstract | Landmark/pose estimation in single monocular images has received much attention in computer vision due to its important applications. It remains a challenging task when input images contain severe occlusions caused by, e.g., adverse camera views. Under such circumstances, biologically implausible pose predictions may be produced. In contrast, human vision is able to predict poses by exploiting the geometric constraints of landmark inter-connectivity. To address the problem, we propose a novel structure-aware fully convolutional network that incorporates priors about the structure of pose components and implicitly takes them into account during training of the deep network. Explicitly learning such constraints is typically challenging. Instead, inspired by how humans identify implausible poses, we design discriminators to distinguish real poses from fake (e.g., biologically implausible) ones. If the pose generator G produces results that the discriminator fails to distinguish from real ones, the network has successfully learned the priors. Training of the network follows the strategy of conditional Generative Adversarial Networks (GANs). The effectiveness of the proposed network is evaluated on three pose-related tasks: 2D single human pose estimation, 2D facial landmark estimation and 3D single human pose estimation. The proposed approach significantly outperforms state-of-the-art methods and almost always generates plausible pose predictions, demonstrating the usefulness of implicit structure learning with GANs. |
Tasks | Pose Estimation |
Published | 2017-11-01 |
URL | http://arxiv.org/abs/1711.00253v5 |
http://arxiv.org/pdf/1711.00253v5.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-learning-of-structure-aware-fully |
Repo | |
Framework | |
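The structural prior the discriminators enforce can be illustrated with a hand-written plausibility check: a pose counts as "real" only if its limb lengths respect geometric constraints. This rule-based check is only a stand-in for the learned discriminator, and the skeleton and limb lengths are invented; the point is that implausible predictions get flagged, which is the signal adversarial training exploits.

```python
# Sketch: a geometric "discriminator" that rejects biologically implausible
# poses based on limb-length constraints.
import math

LIMBS = [("hip", "knee"), ("knee", "ankle")]
EXPECTED = {("hip", "knee"): 0.5, ("knee", "ankle"): 0.5}  # assumed lengths

def plausible(pose, tol=0.2):
    """Accept the pose only if every limb length is within tolerance."""
    for a, b in LIMBS:
        d = math.dist(pose[a], pose[b])
        if abs(d - EXPECTED[(a, b)]) > tol:
            return False
    return True

good = {"hip": (0, 1.0), "knee": (0, 0.5), "ankle": (0, 0.0)}
bad = {"hip": (0, 1.0), "knee": (0, 0.5), "ankle": (0, -1.5)}  # leg too long
print(plausible(good), plausible(bad))  # True False
```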
Robust Cost-Sensitive Learning for Recommendation with Implicit Feedback
Title | Robust Cost-Sensitive Learning for Recommendation with Implicit Feedback |
Authors | Peng Yang, Peilin Zhao, Xin Gao, Yong Liu |
Abstract | Recommendation is the task of improving the customer experience through personalized suggestions based on users' past feedback. In this paper, we investigate the most common scenario: the user-item (U-I) matrix of implicit feedback. Most recommendation approaches designed for implicit feedback project the U-I matrix into a low-rank latent space, a strict restriction that rarely holds in practice. In addition, although the misclassification costs of imbalanced classes differ significantly, few methods take the cost of classification error into account. To address the aforementioned issues, we propose a robust framework that decomposes the U-I matrix into two components: (1) a low-rank matrix that captures common preferences, and (2) a sparse matrix that detects the user-specific preferences of individuals. A cost-sensitive learning model is embedded into the framework. Specifically, this model exploits different costs in the loss function for observed and unobserved instances. We show that the resulting non-smooth convex objective can be optimized efficiently by an accelerated projected gradient method with closed-form solutions. Moreover, the proposed algorithm can be scaled up to large datasets after a relaxation. The theoretical result shows that even with a small fraction of 1's in the U-I matrix $M\in\mathbb{R}^{n\times m}$, the cost-sensitive error of the proposed model is upper bounded by $O(\frac{\alpha}{\sqrt{mn}})$, where $\alpha$ is a bias over imbalanced classes. Finally, empirical experiments are extensively carried out to evaluate the effectiveness of our proposed algorithm. Encouraging experimental results show that our algorithm outperforms several state-of-the-art algorithms on benchmark recommendation datasets. |
Tasks | |
Published | 2017-07-03 |
URL | http://arxiv.org/abs/1707.00536v2 |
http://arxiv.org/pdf/1707.00536v2.pdf | |
PWC | https://paperswithcode.com/paper/robust-cost-sensitive-learning-for |
Repo | |
Framework | |
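The cost-sensitive idea in the abstract can be sketched in isolation: observed (1) and unobserved (0) entries of the implicit-feedback matrix are penalized with different costs, so errors on the rare positive class matter more. The squared loss and the cost values are illustrative assumptions; the paper optimizes a low-rank-plus-sparse model with an accelerated projected gradient method.

```python
# Sketch: cost-sensitive loss over a binary U-I matrix, with a higher cost
# for errors on observed (positive) entries.

def cost_sensitive_loss(M, P, cost_pos=5.0, cost_neg=1.0):
    """M: binary U-I matrix (lists of 0/1); P: predicted scores."""
    total = 0.0
    for m_row, p_row in zip(M, P):
        for m, p in zip(m_row, p_row):
            c = cost_pos if m == 1 else cost_neg
            total += c * (m - p) ** 2
    return total

M = [[1, 0], [0, 1]]
perfect = [[1.0, 0.0], [0.0, 1.0]]
miss_pos = [[0.0, 0.0], [0.0, 1.0]]  # misses an observed 1
miss_neg = [[1.0, 1.0], [0.0, 1.0]]  # false positive on an unobserved 0
print(cost_sensitive_loss(M, perfect))   # 0.0
print(cost_sensitive_loss(M, miss_pos))  # 5.0: costly miss on a positive
print(cost_sensitive_loss(M, miss_neg))  # 1.0: cheaper error on a zero
```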
Deep Learning with Permutation-invariant Operator for Multi-instance Histopathology Classification
Title | Deep Learning with Permutation-invariant Operator for Multi-instance Histopathology Classification |
Authors | Jakub M. Tomczak, Maximilian Ilse, Max Welling |
Abstract | The computer-aided analysis of medical scans is a longstanding goal in the medical imaging field. Deep learning has become the dominant methodology for supporting pathologists and radiologists, and deep learning algorithms have been successfully applied to digital pathology and radiology; nevertheless, practical issues still prevent these tools from being widely used. The main obstacles are the low number of available cases and the large size of the images (a.k.a. the small n, large p problem in machine learning), together with very limited access to pixel-level annotation, which can lead to severe overfitting and large computational requirements. We propose to handle these issues by introducing a framework that processes a medical image as a collection of small patches using a single, shared neural network. The final diagnosis is obtained by combining the scores of individual patches using a permutation-invariant operator (combination). In the machine learning community, such an approach is called multi-instance learning (MIL). |
Tasks | |
Published | 2017-12-01 |
URL | http://arxiv.org/abs/1712.00310v2 |
http://arxiv.org/pdf/1712.00310v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-with-permutation-invariant |
Repo | |
Framework | |
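The MIL scheme in the abstract reduces to a short sketch: each patch of a large image is scored by one shared model, and the per-patch scores are merged by a permutation-invariant operator (max here; mean is another common choice). The toy patch scorer stands in for the shared neural network.

```python
# Sketch: multi-instance prediction with a shared per-patch scorer and a
# permutation-invariant combination operator.

def mil_predict(patches, score_fn, combine=max):
    scores = [score_fn(p) for p in patches]   # one shared model for all patches
    return combine(scores)                    # order of patches is irrelevant

# Toy scorer: "probability of disease" for a patch encoded as a number.
score_fn = lambda patch: min(1.0, patch / 10.0)

bag = [1.0, 2.0, 9.5, 0.5]                # one suspicious patch in the bag
print(mil_predict(bag, score_fn))          # 0.95

# Permutation invariance: shuffling patches leaves the diagnosis unchanged.
print(mil_predict(list(reversed(bag)), score_fn))  # 0.95
```

Max pooling suits the "any malignant patch makes the slide malignant" reading; a mean operator instead averages evidence across the whole slide.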
Robust Rigid Point Registration based on Convolution of Adaptive Gaussian Mixture Models
Title | Robust Rigid Point Registration based on Convolution of Adaptive Gaussian Mixture Models |
Authors | Can Pu, Nanbo Li, Robert B Fisher |
Abstract | Matching 3D rigid point clouds in complex environments robustly and accurately is still a core technique used in many applications. This paper proposes a new architecture combining error estimation from sample covariances and dual global probability alignment based on the convolution of adaptive Gaussian Mixture Models (GMM) from point clouds. Firstly, a novel adaptive GMM is defined using probability distributions from the corresponding points. Then rigid point cloud alignment is performed by maximizing the global probability from the convolution of dual adaptive GMMs in the whole 2D or 3D space, which can be efficiently optimized and has a large zone of accurate convergence. Thousands of trials have been conducted on 200 models from public 2D and 3D datasets to demonstrate superior robustness and accuracy in complex environments with unpredictable noise, outliers, occlusion, initial rotation, shape and missing points. |
Tasks | |
Published | 2017-07-26 |
URL | http://arxiv.org/abs/1707.08626v1 |
http://arxiv.org/pdf/1707.08626v1.pdf | |
PWC | https://paperswithcode.com/paper/robust-rigid-point-registration-based-on |
Repo | |
Framework | |
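The dual-GMM alignment objective from the abstract can be sketched with a closed-form overlap: treating each point cloud as a mixture of Gaussians, a rigid transform is scored by the overlap of the two mixtures, which for Gaussians reduces to a sum of pairwise kernels. The isotropic equal-variance Gaussians and the brute-force integer translation search are simplifying assumptions; the paper optimizes over full rigid transforms.

```python
# Sketch: score candidate translations by the Gaussian-mixture overlap of the
# transformed source cloud and the destination cloud, then take the argmax.
import math

def gmm_overlap(src, dst, sigma2=0.5):
    """Sum of pairwise Gaussian kernels between two 2D point sets
    (the convolution of two isotropic GMMs evaluated at zero offset)."""
    return sum(
        math.exp(-((sx - dx) ** 2 + (sy - dy) ** 2) / (4 * sigma2))
        for sx, sy in src for dx, dy in dst
    )

def best_translation(src, dst, search=range(-3, 4)):
    """Brute-force the integer translation maximizing mixture overlap."""
    return max(
        ((tx, ty) for tx in search for ty in search),
        key=lambda t: gmm_overlap([(x + t[0], y + t[1]) for x, y in src], dst),
    )

dst = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
src = [(x + 2, y - 1) for x, y in dst]   # dst shifted by (2, -1)
print(best_translation(src, dst))         # (-2, 1): the shift is recovered
```

Because every point interacts with every Gaussian component, the objective stays smooth and well-defined under noise and missing points, which is what gives this family of methods its robustness.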
C-arm Tomographic Imaging Technique for Nephrolithiasis and Detection of Kidney Stones
Title | C-arm Tomographic Imaging Technique for Nephrolithiasis and Detection of Kidney Stones |
Authors | Nuhad A. Malalla, Ying Chen |
Abstract | In this paper, we investigate a C-arm tomographic technique as a new three-dimensional (3D) kidney imaging method for nephrolithiasis and kidney stone detection over a view angle of less than 180°. Our C-arm tomographic technique provides a series of two-dimensional (2D) images with a single scan over a 40° view angle. Experimental studies were performed with a kidney phantom formed from a pig kidney with two embedded kidney stones. Different reconstruction methods were developed for the C-arm tomographic technique to generate 3D kidney information, including point-by-point back projection (BP), filtered back projection (FBP), the simultaneous algebraic reconstruction technique (SART) and maximum likelihood expectation maximization (MLEM). A computer simulation study was also conducted with a simulated 3D spherical object to evaluate the reconstruction results. Preliminary results demonstrate the capability of our C-arm tomographic technique to generate 3D kidney information for kidney stone detection with low radiation exposure. The kidney stones are visible on the reconstructed planes with identifiable shapes and sizes. |
Tasks | |
Published | 2017-06-08 |
URL | http://arxiv.org/abs/1706.02425v1 |
http://arxiv.org/pdf/1706.02425v1.pdf | |
PWC | https://paperswithcode.com/paper/c-arm-tomographic-imaging-technique-for |
Repo | |
Framework | |
Content-Based Table Retrieval for Web Queries
Title | Content-Based Table Retrieval for Web Queries |
Authors | Zhao Yan, Duyu Tang, Nan Duan, Junwei Bao, Yuanhua Lv, Ming Zhou, Zhoujun Li |
Abstract | Understanding the connections between unstructured text and semi-structured tables is an important yet neglected problem in natural language processing. In this work, we focus on content-based table retrieval: given a query, the task is to find the most relevant table in a collection of tables. Further progress in this area requires powerful models of semantic matching and richer training and evaluation resources. To this end, we present a ranking-based approach and implement both carefully designed features and neural network architectures to measure the relevance between a query and the content of a table. Furthermore, we release an open-domain dataset that includes 21,113 web queries for 273,816 tables. We conduct comprehensive experiments on both real-world and synthetic datasets. The results verify the effectiveness of our approach and highlight the challenges of this task. |
Tasks | |
Published | 2017-06-08 |
URL | http://arxiv.org/abs/1706.02427v1 |
http://arxiv.org/pdf/1706.02427v1.pdf | |
PWC | https://paperswithcode.com/paper/content-based-table-retrieval-for-web-queries |
Repo | |
Framework | |
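The ranking formulation in the abstract can be sketched with a trivial relevance feature: each table is scored against the query (here, by word overlap with its header and cells, a stand-in for the paper's features and neural matchers), and the highest-scoring table is retrieved.

```python
# Sketch: content-based table retrieval as ranking by a query-table
# relevance score.

def table_score(query, table):
    """Fraction of query words found in the table's header or cells."""
    words = set(query.lower().split())
    content = {w.lower() for row in table for cell in row for w in cell.split()}
    return len(words & content) / max(1, len(words))

def retrieve(query, tables):
    return max(tables, key=lambda t: table_score(query, t))

tables = [
    [["City", "Population"], ["Paris", "2.1M"]],
    [["Team", "Wins"], ["Ajax", "28"]],
]
print(retrieve("population of Paris", tables)[0])  # ['City', 'Population']
```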
Line-Circle: A Geometric Filter for Single Camera Edge-Based Object Detection
Title | Line-Circle: A Geometric Filter for Single Camera Edge-Based Object Detection |
Authors | Seyed Amir Tafrishi, Vahid E. Kandjani |
Abstract | This paper presents an object detection approach intended for application to future SLAM problems. Although many SLAM methods have been proposed to provide suitable autonomy for mobile robots, namely ground vehicles, they still suffer from overconfidence and heavy computation when entering immense spaces with many landmarks. In particular, they become impractical when relying solely on limited sensors such as a camera. The proposed method claims that unmanned ground vehicles, without a huge database for object definition or highly advanced prediction parameters, can deal with incoming objects during straight camera motion in real time. The Line-Circle (LC) filter applies detection, tracking and learning to each defined expert to obtain more information for judging the scene without over-calculation. In this filter, the circle expert lets us summarize edges in groups. Interactive feedback learning between the experts keeps error minimal and counters the overwhelming number of landmark signs in crowded scenes without mapping. Our experts rely on the covariance of trust factors together with geometric definitions to ignore, merge and compare detected landmarks. The validation experiment uses a camera alongside an IMU sensor for location estimation. |
Tasks | Object Detection |
Published | 2017-07-25 |
URL | http://arxiv.org/abs/1707.08095v1 |
http://arxiv.org/pdf/1707.08095v1.pdf | |
PWC | https://paperswithcode.com/paper/line-circle-a-geometric-filter-for-single |
Repo | |
Framework | |