Paper Group ANR 40
Discovery of Visual Semantics by Unsupervised and Self-Supervised Representation Learning. “RAPID” Regions-of-Interest Detection In Big Histopathological Images. Extreme Low Resolution Activity Recognition with Multi-Siamese Embedding Learning. Temporal HeartNet: Towards Human-Level Automatic Analysis of Fetal Cardiac Screening Video. Sequential design of experiments to estimate a probability of exceeding a threshold in a multi-fidelity stochastic simulator. Principles and Examples of Plausible Reasoning and Propositional Plausible Logic. People on Media: Jointly Identifying Credible News and Trustworthy Citizen Journalists in Online Communities. cGAN-based Manga Colorization Using a Single Training Image. Information Theoretic Analysis of DNN-HMM Acoustic Modeling. Probabilistic Image Colorization. Visual Features for Context-Aware Speech Recognition. View Adaptive Recurrent Neural Networks for High Performance Human Action Recognition from Skeleton Data. Parallel Stochastic Gradient Descent with Sound Combiners. HMM-based Indic Handwritten Word Recognition using Zone Segmentation. Hierarchic Kernel Recursive Least-Squares.
Discovery of Visual Semantics by Unsupervised and Self-Supervised Representation Learning
Title | Discovery of Visual Semantics by Unsupervised and Self-Supervised Representation Learning |
Authors | Gustav Larsson |
Abstract | The success of deep learning in computer vision is rooted in the ability of deep networks to scale up model complexity as demanded by challenging visual tasks. As complexity is increased, so is the need for large amounts of labeled data to train the model. This is associated with a costly human annotation effort. To address this concern, with the long-term goal of leveraging the abundance of cheap unlabeled data, we explore methods of unsupervised “pre-training.” In particular, we propose to use self-supervised automatic image colorization. We show that traditional methods for unsupervised learning, such as layer-wise clustering or autoencoders, remain inferior to supervised pre-training. In search of an alternative, we develop a fully automatic image colorization method. Our method sets a new state-of-the-art in revitalizing old black-and-white photography, without requiring human effort or expertise. Additionally, it gives us a method for self-supervised representation learning. In order for the model to appropriately re-color a grayscale object, it must first be able to identify it. This ability, learned entirely through self-supervision, can be used to improve other visual tasks, such as classification and semantic segmentation. As a future direction for self-supervision, we investigate whether multiple proxy tasks can be combined to improve generalization. This turns out to be a challenging open problem. We hope that our contributions to this endeavor will provide a foundation for future efforts in making self-supervision compete with supervised pre-training. |
Tasks | Colorization, Representation Learning, Semantic Segmentation |
Published | 2017-08-19 |
URL | http://arxiv.org/abs/1708.05812v1 |
PDF | http://arxiv.org/pdf/1708.05812v1.pdf |
PWC | https://paperswithcode.com/paper/discovery-of-visual-semantics-by-unsupervised |
Repo | |
Framework | |
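Since the pretext task described above maps directly onto a training loop, here is a minimal sketch of self-supervised colorization pre-training, assuming a PyTorch setup. The two-layer encoder, plain L2 regression loss, and random tensors are illustrative stand-ins rather than the thesis's architecture or loss; the point is only that the encoder is trained without labels and can afterwards be reused for classification or semantic segmentation.

```python
# Minimal self-supervised colorization pretext task (illustrative, not the
# thesis's architecture): a small CNN predicts the ab chroma channels of a
# Lab image from its L (lightness) channel.
import torch
import torch.nn as nn

class ColorizerSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(               # features reusable downstream
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(64, 2, 3, padding=1)  # predicts the a and b channels

    def forward(self, lightness):
        return self.head(self.encoder(lightness))

model = ColorizerSketch()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
L  = torch.rand(8, 1, 64, 64)          # grayscale (L channel) stand-in batch
ab = torch.rand(8, 2, 64, 64) * 2 - 1  # target chroma channels, scaled to [-1, 1]
loss = nn.functional.mse_loss(model(L), ab)  # plain L2 used here for brevity
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```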
“RAPID” Regions-of-Interest Detection In Big Histopathological Images
Title | “RAPID” Regions-of-Interest Detection In Big Histopathological Images |
Authors | Li Sulimowicz, Ishfaq Ahmad |
Abstract | The sheer volume and size of histopathological images (e.g., 10^6 MPixel) underscore the need for faster and more accurate regions-of-interest (ROI) detection algorithms. In this paper, we propose such an algorithm, which has four main components that help achieve greater accuracy and faster speed: First, while using coarse-to-fine topology-preserving segmentation as the baseline, the proposed algorithm uses a superpixel regularity optimization scheme to avoid irregular and extremely small superpixels. Second, the proposed technique employs a prediction strategy to focus only on important superpixels at finer image levels. Third, the algorithm reuses the information gained from the coarsest image level at the other, finer image levels. Both the second and the third components drastically lower the complexity. Fourth, the algorithm employs a highly effective parallelization scheme using adaptive data partitioning, which yields high speedup. Experimental results, conducted on the BSD500 dataset [1] and 500 whole-slide histological images from the National Lung Screening Trial (NLST) dataset, confirm that the proposed algorithm achieved a 13x speedup compared with the baseline, and around 160x compared with SLIC [11], without losing accuracy. |
Tasks | |
Published | 2017-04-07 |
URL | http://arxiv.org/abs/1704.02083v1 |
PDF | http://arxiv.org/pdf/1704.02083v1.pdf |
PWC | https://paperswithcode.com/paper/rapid-regions-of-interest-detection-in-big |
Repo | |
Framework | |
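The coarse-to-fine prediction idea in the abstract above can be illustrated with a toy sketch. The block-based "importance" score, thresholds, and two-level refinement below are assumptions chosen for illustration and are unrelated to the paper's superpixel machinery or parallelization scheme.

```python
# Illustrative coarse-to-fine ROI selection (not the authors' code): score
# coarse blocks first, then refine only the most promising ones at the next
# resolution, mirroring the "focus only on important superpixels" idea.
import numpy as np

def roi_coarse_to_fine(img, block=64, keep_frac=0.25):
    h, w = img.shape
    scores = {}
    for y in range(0, h, block):
        for x in range(0, w, block):
            patch = img[y:y+block, x:x+block]
            scores[(y, x)] = patch.std()          # toy "importance" score
    cutoff = np.quantile(list(scores.values()), 1 - keep_frac)
    rois = []
    for (y, x), s in scores.items():
        if s < cutoff:
            continue                              # unimportant: never refined
        sub = block // 2                          # refine only the kept blocks
        for dy in (0, sub):
            for dx in (0, sub):
                p = img[y+dy:y+dy+sub, x+dx:x+dx+sub]
                if p.size and p.std() >= cutoff:
                    rois.append((y + dy, x + dx, sub))
    return rois

print(len(roi_coarse_to_fine(np.random.rand(512, 512))))
```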
Extreme Low Resolution Activity Recognition with Multi-Siamese Embedding Learning
Title | Extreme Low Resolution Activity Recognition with Multi-Siamese Embedding Learning |
Authors | Michael S. Ryoo, Kiyoon Kim, Hyun Jong Yang |
Abstract | This paper presents an approach for recognizing human activities from extreme low resolution (e.g., 16x12) videos. Extreme low resolution recognition is not only necessary for analyzing actions at a distance but is also crucial for enabling privacy-preserving recognition of human activities. We design a new two-stream multi-Siamese convolutional neural network. The idea is to explicitly capture the inherent property of low resolution (LR) videos that two images originating from the exact same scene often have totally different pixel values depending on their LR transformations. Our approach learns a shared embedding space that maps LR videos with the same content to the same location regardless of their transformations. We experimentally confirm that our approach of jointly learning such a transform-robust LR video representation and the classifier outperforms the previous state-of-the-art low resolution recognition approaches on two public standard datasets by a meaningful margin. |
Tasks | Activity Recognition |
Published | 2017-08-03 |
URL | http://arxiv.org/abs/1708.00999v2 |
PDF | http://arxiv.org/pdf/1708.00999v2.pdf |
PWC | https://paperswithcode.com/paper/extreme-low-resolution-activity-recognition |
Repo | |
Framework | |
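A hedged sketch of the embedding objective summarized above: two low-resolution transforms of the same content should embed close together, while different content is pushed apart. The tiny MLP, the contrastive-style margin loss, and the use of single 16x12 frames in place of two-stream video features are all illustrative assumptions.

```python
# Sketch of the multi-Siamese embedding idea (hyperparameters and network are
# illustrative): two LR transforms of the same clip map to nearby points in
# the embedding space; clips with different content are pushed apart.
import torch
import torch.nn as nn
import torch.nn.functional as F

embed = nn.Sequential(nn.Flatten(), nn.Linear(16 * 12 * 3, 128), nn.ReLU(),
                      nn.Linear(128, 64))

def multi_siamese_loss(clip_a1, clip_a2, clip_b, margin=1.0):
    za1, za2, zb = embed(clip_a1), embed(clip_a2), embed(clip_b)
    pos = F.pairwise_distance(za1, za2)            # same content, different transform
    neg = F.pairwise_distance(za1, zb)             # different content
    return (pos + F.relu(margin - neg)).mean()     # contrastive-style objective

# Stand-ins for 16x12 RGB frames under two different LR transformations.
a1, a2, b = (torch.rand(4, 3, 12, 16) for _ in range(3))
print(multi_siamese_loss(a1, a2, b).item())
```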
Temporal HeartNet: Towards Human-Level Automatic Analysis of Fetal Cardiac Screening Video
Title | Temporal HeartNet: Towards Human-Level Automatic Analysis of Fetal Cardiac Screening Video |
Authors | Weilin Huang, Christopher P. Bridge, J. Alison Noble, Andrew Zisserman |
Abstract | We present an automatic method to describe clinically useful information about scanning, and to guide image interpretation in ultrasound (US) videos of the fetal heart. Our method is able to jointly predict the visibility, viewing plane, location and orientation of the fetal heart at the frame level. The contributions of the paper are three-fold: (i) a convolutional neural network architecture is developed for multi-task prediction, computed by sliding a 3x3 window spatially through the convolutional maps; (ii) an anchor mechanism and an Intersection over Union (IoU) loss are applied to improve localization accuracy; (iii) a recurrent architecture is designed to recursively compute regional convolutional features temporally over sequential frames, allowing each prediction to be conditioned on the whole video. This results in a spatio-temporal model that precisely describes detailed heart parameters in challenging US videos. We report results on a real-world clinical dataset, where our method achieves performance on par with expert annotations. |
Tasks | |
Published | 2017-07-03 |
URL | http://arxiv.org/abs/1707.00665v1 |
PDF | http://arxiv.org/pdf/1707.00665v1.pdf |
PWC | https://paperswithcode.com/paper/temporal-heartnet-towards-human-level |
Repo | |
Framework | |
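The abstract mentions an IoU loss for localization; a minimal version for axis-aligned boxes is sketched below. This is a generic 1 − IoU formulation and omits the paper's anchor mechanism and orientation prediction.

```python
# Minimal IoU loss for axis-aligned boxes (illustrative). Boxes are given as
# (x1, y1, x2, y2); the loss is 1 - IoU, averaged over the batch.
import torch

def iou_loss(pred, target, eps=1e-6):
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    return (1.0 - iou).mean()

pred   = torch.tensor([[10., 10., 50., 60.]])
target = torch.tensor([[12., 15., 48., 58.]])
print(iou_loss(pred, target))
```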
Sequential design of experiments to estimate a probability of exceeding a threshold in a multi-fidelity stochastic simulator
Title | Sequential design of experiments to estimate a probability of exceeding a threshold in a multi-fidelity stochastic simulator |
Authors | Rémi Stroh, Séverine Demeyer, Nicolas Fischer, Julien Bect, Emmanuel Vazquez |
Abstract | In this article, we consider a stochastic numerical simulator to assess the impact of some factors on a phenomenon. The simulator is seen as a black box with inputs and outputs. The quality of a simulation, hereafter referred to as fidelity, is assumed to be tunable by means of an additional input of the simulator (e.g., a mesh size parameter): high-fidelity simulations provide more accurate results, but are time-consuming. Using a limited computation-time budget, we want to estimate, for any value of the physical inputs, the probability that a certain scalar output of the simulator will exceed a given critical threshold at the highest fidelity level. The problem is addressed in a Bayesian framework, using a Gaussian process model of the multi-fidelity simulator. We consider a Bayesian estimator of the probability, together with an associated measure of uncertainty, and propose a new multi-fidelity sequential design strategy, called Maximum Speed of Uncertainty Reduction (MSUR), to select the value of physical inputs and the fidelity level of new simulations. The MSUR strategy is tested on an example. |
Tasks | |
Published | 2017-07-26 |
URL | http://arxiv.org/abs/1707.08384v1 |
PDF | http://arxiv.org/pdf/1707.08384v1.pdf |
PWC | https://paperswithcode.com/paper/sequential-design-of-experiments-to-estimate |
Repo | |
Framework | |
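The quantity being estimated can be written down concretely. Assuming the highest-fidelity output at a given input has a Gaussian posterior with mean mu and standard deviation sigma (a deliberate simplification of the multi-fidelity Gaussian-process model), the probability of exceeding a threshold T is 1 − Φ((T − mu)/sigma):

```python
# Probability of exceeding a threshold under a Gaussian posterior
# (a simplification of the multi-fidelity GP setting described above).
from scipy.stats import norm

def exceedance_probability(mu, sigma, threshold):
    return 1.0 - norm.cdf((threshold - mu) / sigma)

print(exceedance_probability(mu=0.8, sigma=0.3, threshold=1.2))  # ≈ 0.09
```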
Principles and Examples of Plausible Reasoning and Propositional Plausible Logic
Title | Principles and Examples of Plausible Reasoning and Propositional Plausible Logic |
Authors | David Billington |
Abstract | Plausible reasoning concerns situations whose inherent lack of precision is not quantified; that is, there are no degrees or levels of precision, and hence no use of numbers like probabilities. A hopefully comprehensive set of principles that clarifies what it means for a formal logic to do plausible reasoning is presented. A new propositional logic, called Propositional Plausible Logic (PPL), is defined and applied to some important examples. PPL is the only non-numeric non-monotonic logic we know of that satisfies all the principles and correctly reasons with all the examples. Some important results about PPL are proved. |
Tasks | |
Published | 2017-03-06 |
URL | http://arxiv.org/abs/1703.01697v2 |
PDF | http://arxiv.org/pdf/1703.01697v2.pdf |
PWC | https://paperswithcode.com/paper/principles-and-examples-of-plausible |
Repo | |
Framework | |
People on Media: Jointly Identifying Credible News and Trustworthy Citizen Journalists in Online Communities
Title | People on Media: Jointly Identifying Credible News and Trustworthy Citizen Journalists in Online Communities |
Authors | Subhabrata Mukherjee, Gerhard Weikum |
Abstract | Media seems to have become more partisan, often providing a biased coverage of news catering to the interest of specific groups. It is therefore essential to identify credible information content that provides an objective narrative of an event. News communities such as digg, reddit, or newstrust offer recommendations, reviews, quality ratings, and further insights on journalistic works. However, there is a complex interaction between different factors in such online communities: fairness and style of reporting, language clarity and objectivity, topical perspectives (like political viewpoint), expertise and bias of community members, and more. This paper presents a model to systematically analyze the different interactions in a news community between users, news, and sources. We develop a probabilistic graphical model that leverages this joint interaction to identify 1) highly credible news articles, 2) trustworthy news sources, and 3) expert users who perform the role of “citizen journalists” in the community. Our method extends CRF models to incorporate real-valued ratings, as some communities have very fine-grained scales that cannot be easily discretized without losing information. To the best of our knowledge, this paper is the first full-fledged analysis of credibility, trust, and expertise in news communities. |
Tasks | |
Published | 2017-05-07 |
URL | http://arxiv.org/abs/1705.02667v2 |
PDF | http://arxiv.org/pdf/1705.02667v2.pdf |
PWC | https://paperswithcode.com/paper/people-on-media-jointly-identifying-credible |
Repo | |
Framework | |
cGAN-based Manga Colorization Using a Single Training Image
Title | cGAN-based Manga Colorization Using a Single Training Image |
Authors | Paulina Hensman, Kiyoharu Aizawa |
Abstract | The Japanese comic format known as Manga is popular all over the world. It is traditionally produced in black and white, and colorization is time consuming and costly. Automatic colorization methods generally rely on greyscale values, which are not present in manga. Furthermore, due to copyright protection, colorized manga available for training is scarce. We propose a manga colorization method based on conditional Generative Adversarial Networks (cGAN). Unlike previous cGAN approaches that use many hundreds or thousands of training images, our method requires only a single colorized reference image for training, avoiding the need for a large dataset. Colorizing manga using cGANs can produce blurry results with artifacts, and the resolution is limited. We therefore also propose a method of segmentation and color-correction to mitigate these issues. The final results are sharp, clear, and in high resolution, and stay true to the character’s original color scheme. |
Tasks | Colorization |
Published | 2017-06-21 |
URL | http://arxiv.org/abs/1706.06918v1 |
PDF | http://arxiv.org/pdf/1706.06918v1.pdf |
PWC | https://paperswithcode.com/paper/cgan-based-manga-colorization-using-a-single |
Repo | |
Framework | |
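As a rough illustration of the color-correction step mentioned above, the sketch below snaps each output pixel to the nearest color drawn from the single reference image. The random palette extraction and plain RGB distance are assumptions made for brevity; the paper's segmentation-based procedure is more involved.

```python
# Illustrative color-correction in the spirit of the paper's post-processing
# (palette extraction and segmentation details are assumptions): snap each
# output pixel to the nearest color sampled from the single reference image.
import numpy as np

def snap_to_reference_palette(output_rgb, reference_rgb, n_colors=16):
    ref = reference_rgb.reshape(-1, 3).astype(float)
    # Crude palette: n_colors random reference pixels (k-means would be better).
    palette = ref[np.random.choice(len(ref), n_colors, replace=False)]
    flat = output_rgb.reshape(-1, 3).astype(float)
    d = ((flat[:, None, :] - palette[None, :, :]) ** 2).sum(-1)
    return palette[d.argmin(1)].reshape(output_rgb.shape).astype(np.uint8)

out = (np.random.rand(64, 64, 3) * 255).astype(np.uint8)   # stand-in cGAN output
ref = (np.random.rand(64, 64, 3) * 255).astype(np.uint8)   # stand-in reference image
print(snap_to_reference_palette(out, ref).shape)
```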
Information Theoretic Analysis of DNN-HMM Acoustic Modeling
Title | Information Theoretic Analysis of DNN-HMM Acoustic Modeling |
Authors | Pranay Dighe, Afsaneh Asaei, Hervé Bourlard |
Abstract | We propose an information theoretic framework for quantitative assessment of acoustic modeling for hidden Markov model (HMM) based automatic speech recognition (ASR). Acoustic modeling yields the probabilities of HMM sub-word states for a short temporal window of speech acoustic features. We cast ASR as a communication channel where the input sub-word probabilities convey the information about the output HMM state sequence. The quality of the acoustic model is thus quantified in terms of the information transmitted through this channel. The process of inferring the most likely HMM state sequence from the sub-word probabilities is known as decoding. HMM based decoding assumes that an acoustic model yields accurate state-level probabilities and the data distribution given the underlying hidden state is independent of any other state in the sequence. We quantify 1) the acoustic model accuracy and 2) its robustness to mismatch between data and the HMM conditional independence assumption in terms of some mutual information quantities. In this context, exploiting deep neural network (DNN) posterior probabilities leads to a simple and straightforward analysis framework to assess shortcomings of the acoustic model for HMM based decoding. This analysis enables us to evaluate the Gaussian mixture acoustic model (GMM) and the importance of many hidden layers in DNNs without any need of explicit speech recognition. In addition, it sheds light on the contribution of low-dimensional models to enhance acoustic modeling for better compliance with the HMM based decoding requirements. |
Tasks | Speech Recognition |
Published | 2017-08-29 |
URL | http://arxiv.org/abs/1709.01144v2 |
PDF | http://arxiv.org/pdf/1709.01144v2.pdf |
PWC | https://paperswithcode.com/paper/information-theoretic-analysis-of-dnn-hmm |
Repo | |
Framework | |
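To make the "ASR as a communication channel" view concrete, the sketch below estimates the mutual information between true HMM states and the states decoded from DNN posteriors, using a toy joint-count matrix. The paper's analysis relies on related information quantities rather than exactly this estimator.

```python
# Mutual information between true and decoded HMM states, estimated from
# joint counts (toy example; not the paper's exact analysis).
import numpy as np

def mutual_information(joint_counts):
    p = joint_counts / joint_counts.sum()
    px, py = p.sum(axis=1, keepdims=True), p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log2(p[nz] / (px @ py)[nz])).sum())

counts = np.array([[90,  5,  5],   # rows: true state, cols: decoded state
                   [ 4, 88,  8],
                   [ 6,  7, 87]])
print(mutual_information(counts), "bits transmitted per frame (toy example)")
```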
Probabilistic Image Colorization
Title | Probabilistic Image Colorization |
Authors | Amelie Royer, Alexander Kolesnikov, Christoph H. Lampert |
Abstract | We develop a probabilistic technique for colorizing grayscale natural images. In light of the intrinsic uncertainty of this task, the proposed probabilistic framework has numerous desirable properties. In particular, our model is able to produce multiple plausible and vivid colorizations for a given grayscale image and is one of the first colorization models to provide a proper stochastic sampling scheme. Moreover, our training procedure is supported by a rigorous theoretical framework that does not require any ad hoc heuristics and allows for efficient modeling and learning of the joint pixel color distribution. We demonstrate strong quantitative and qualitative experimental results on the CIFAR-10 dataset and the challenging ILSVRC 2012 dataset. |
Tasks | Colorization |
Published | 2017-05-11 |
URL | http://arxiv.org/abs/1705.04258v1 |
PDF | http://arxiv.org/pdf/1705.04258v1.pdf |
PWC | https://paperswithcode.com/paper/probabilistic-image-colorization |
Repo | |
Framework | |
Visual Features for Context-Aware Speech Recognition
Title | Visual Features for Context-Aware Speech Recognition |
Authors | Abhinav Gupta, Yajie Miao, Leonardo Neves, Florian Metze |
Abstract | Automatic transcriptions of consumer-generated multi-media content such as “Youtube” videos still exhibit high word error rates. Such data typically occupies a very broad domain, has been recorded in challenging conditions, with cheap hardware and a focus on the visual modality, and may have been post-processed or edited. In this paper, we extend our earlier work on adapting the acoustic model of a DNN-based speech recognition system to an RNN language model and show how both can be adapted to the objects and scenes that can be automatically detected in the video. We are working on a corpus of “how-to” videos from the web, and the idea is that an object that can be seen (“car”), or a scene that is being detected (“kitchen”) can be used to condition both models on the “context” of the recording, thereby reducing perplexity and improving transcription. We achieve good improvements in both cases and compare and analyze the respective reductions in word error rate. We expect that our results can be used for any type of speech processing in which “context” information is available, for example in robotics, man-machine interaction, or when indexing large audio-visual archives, and should ultimately help to bring together the “video-to-text” and “speech-to-text” communities. |
Tasks | Language Modelling, Speech Recognition |
Published | 2017-12-01 |
URL | http://arxiv.org/abs/1712.00489v1 |
PDF | http://arxiv.org/pdf/1712.00489v1.pdf |
PWC | https://paperswithcode.com/paper/visual-features-for-context-aware-speech |
Repo | |
Framework | |
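One simple way to realize the conditioning described above is to feed an embedding of the detected object/scene label into the language model at every time step; the sketch below shows that pattern. The architecture, vocabulary size, and context inventory are illustrative assumptions, not the authors' adapted models.

```python
# Sketch of a context-conditioned RNN language model (illustrative): an
# embedding of a detected object/scene label ("car", "kitchen", ...) is
# concatenated to each word embedding before the LSTM.
import torch
import torch.nn as nn

class ContextLM(nn.Module):
    def __init__(self, vocab=1000, n_ctx=20, d=64):
        super().__init__()
        self.word_emb = nn.Embedding(vocab, d)
        self.ctx_emb  = nn.Embedding(n_ctx, d)   # one id per visual context label
        self.rnn = nn.LSTM(2 * d, d, batch_first=True)
        self.out = nn.Linear(d, vocab)

    def forward(self, words, ctx):
        w = self.word_emb(words)                            # (B, T, d)
        c = self.ctx_emb(ctx).unsqueeze(1).expand_as(w)     # broadcast over time
        h, _ = self.rnn(torch.cat([w, c], dim=-1))
        return self.out(h)                                  # next-word logits

lm = ContextLM()
logits = lm(torch.randint(0, 1000, (2, 7)), torch.tensor([3, 11]))
print(logits.shape)   # torch.Size([2, 7, 1000])
```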
View Adaptive Recurrent Neural Networks for High Performance Human Action Recognition from Skeleton Data
Title | View Adaptive Recurrent Neural Networks for High Performance Human Action Recognition from Skeleton Data |
Authors | Pengfei Zhang, Cuiling Lan, Junliang Xing, Wenjun Zeng, Jianru Xue, Nanning Zheng |
Abstract | Skeleton-based human action recognition has recently attracted increasing attention due to the popularity of 3D skeleton data. One main challenge lies in the large view variations in captured human actions. We propose a novel view adaptation scheme to automatically regulate observation viewpoints during the occurrence of an action. Rather than re-positioning the skeletons based on a human defined prior criterion, we design a view adaptive recurrent neural network (RNN) with LSTM architecture, which enables the network itself to adapt to the most suitable observation viewpoints from end to end. Extensive experiment analyses show that the proposed view adaptive RNN model strives to (1) transform the skeletons of various views to much more consistent viewpoints and (2) maintain the continuity of the action rather than transforming every frame to the same position with the same body orientation. Our model achieves significant improvement over the state-of-the-art approaches on three benchmark datasets. |
Tasks | Skeleton Based Action Recognition, Temporal Action Localization |
Published | 2017-03-24 |
URL | http://arxiv.org/abs/1703.08274v2 |
PDF | http://arxiv.org/pdf/1703.08274v2.pdf |
PWC | https://paperswithcode.com/paper/view-adaptive-recurrent-neural-networks-for |
Repo | |
Framework | |
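A simplified sketch of the view-adaptation idea summarized above: a small subnetwork predicts a rotation (as Euler angles) and a translation from the first frame, the skeleton joints are re-expressed under that viewpoint, and an LSTM classifies the transformed sequence. Predicting the viewpoint once per sequence, and the layer sizes, are simplifications of the paper's per-frame, end-to-end design.

```python
# Illustrative view-adaptation layer (simplified from the paper's description).
import torch
import torch.nn as nn

def rotation_matrix(angles):                       # angles: (B, 3) -> (B, 3, 3)
    a, b, c = angles[:, 0], angles[:, 1], angles[:, 2]
    zeros, ones = torch.zeros_like(a), torch.ones_like(a)
    Rx = torch.stack([ones, zeros, zeros,
                      zeros, a.cos(), -a.sin(),
                      zeros, a.sin(),  a.cos()], dim=1).view(-1, 3, 3)
    Ry = torch.stack([b.cos(), zeros, b.sin(),
                      zeros, ones, zeros,
                      -b.sin(), zeros, b.cos()], dim=1).view(-1, 3, 3)
    Rz = torch.stack([c.cos(), -c.sin(), zeros,
                      c.sin(),  c.cos(), zeros,
                      zeros, zeros, ones], dim=1).view(-1, 3, 3)
    return Rz @ Ry @ Rx

class ViewAdaptiveSketch(nn.Module):
    def __init__(self, joints=25, classes=60, d=128):
        super().__init__()
        self.view = nn.Linear(joints * 3, 6)        # 3 Euler angles + translation
        self.lstm = nn.LSTM(joints * 3, d, batch_first=True)
        self.cls  = nn.Linear(d, classes)

    def forward(self, x):                           # x: (B, T, joints, 3)
        B, T, J, _ = x.shape
        params = self.view(x[:, 0].reshape(B, -1))  # viewpoint from the first frame
        R, t = rotation_matrix(params[:, :3]), params[:, 3:]
        x = torch.einsum('bij,btkj->btki', R, x) + t.view(B, 1, 1, 3)
        h, _ = self.lstm(x.reshape(B, T, J * 3))
        return self.cls(h[:, -1])

model = ViewAdaptiveSketch()
print(model(torch.rand(2, 30, 25, 3)).shape)        # torch.Size([2, 60])
```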
Parallel Stochastic Gradient Descent with Sound Combiners
Title | Parallel Stochastic Gradient Descent with Sound Combiners |
Authors | Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz |
Abstract | Stochastic gradient descent (SGD) is a well-known method for regression and classification tasks. However, it is an inherently sequential algorithm: at each step, the processing of the current example depends on the parameters learned from the previous examples. Prior approaches to parallelizing linear learners using SGD, such as HOGWILD! and ALLREDUCE, do not honor these dependencies across threads and thus can potentially suffer poor convergence rates and/or poor scalability. This paper proposes SYMSGD, a parallel SGD algorithm that, to a first-order approximation, retains the sequential semantics of SGD. Each thread learns a local model in addition to a model combiner, which allows local models to be combined to produce the same result as what a sequential SGD would have produced. This paper evaluates SYMSGD’s accuracy and performance on 6 datasets on a shared-memory machine and shows up to an 11x speedup over our heavily optimized sequential baseline on 16 cores; on average it is also 2.2x faster than HOGWILD!. |
Tasks | |
Published | 2017-05-22 |
URL | http://arxiv.org/abs/1705.08030v1 |
PDF | http://arxiv.org/pdf/1705.08030v1.pdf |
PWC | https://paperswithcode.com/paper/parallel-stochastic-gradient-descent-with |
Repo | |
Framework | |
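The "sound combiner" idea has a compact numeric illustration for linear models with squared loss: each SGD step is an affine map of the weights, w ← (I − αxxᵀ)w + αyx, so a thread can compose its steps into a matrix–vector pair (M, b) and later apply them to whatever weights the previous thread ended with. The sketch below verifies this equivalence; the actual SYMSGD algorithm additionally uses randomized projections to keep the combiner small, which is omitted here.

```python
# Numeric illustration of the sound-combiner principle for linear-model SGD
# (a sketch of the idea, not the SYMSGD implementation).
import numpy as np

rng = np.random.default_rng(0)
d, alpha = 5, 0.05
X, y = rng.normal(size=(20, d)), rng.normal(size=20)

def sequential_sgd(w, X, y):
    for x_i, y_i in zip(X, y):
        w = w - alpha * (x_i @ w - y_i) * x_i       # one SGD step, squared loss
    return w

def combiner(X, y):
    M, b = np.eye(d), np.zeros(d)
    for x_i, y_i in zip(X, y):
        A = np.eye(d) - alpha * np.outer(x_i, x_i)  # this step's affine map
        M, b = A @ M, A @ b + alpha * y_i * x_i     # compose with earlier steps
    return M, b

w0 = rng.normal(size=d)
M, b = combiner(X[10:], y[10:])                      # "second thread's" combiner
w_seq  = sequential_sgd(sequential_sgd(w0, X[:10], y[:10]), X[10:], y[10:])
w_comb = M @ sequential_sgd(w0, X[:10], y[:10]) + b  # combine instead of rerunning
print(np.allclose(w_seq, w_comb))                    # True
```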
HMM-based Indic Handwritten Word Recognition using Zone Segmentation
Title | HMM-based Indic Handwritten Word Recognition using Zone Segmentation |
Authors | Partha Pratim Roy, Ayan Kumar Bhunia, Ayan Das, Prasenjit Dey, Umapada Pal |
Abstract | This paper presents a novel approach to Indic handwritten word recognition using zone-wise information. Because of the complex nature of Indic scripts (e.g. Devanagari, Bangla, Gurumukhi, and other similar scripts), with compound characters, modifiers, overlapping and touching components, etc., character segmentation and recognition is a tedious job. To avoid character segmentation in such scripts, HMM-based sequence modeling has been used earlier in a holistic way. This paper proposes an efficient word recognition framework that segments handwritten word images horizontally into three zones (upper, middle and lower) and recognizes the corresponding zones. The main aim of this zone segmentation approach is to reduce the number of distinct component classes compared to the total number of classes in Indic scripts. As a result, the zone segmentation approach enhances the recognition performance of the system. The components in the middle zone, where characters are mostly touching, are recognized using an HMM. After the recognition of the middle zone, HMM-based Viterbi forced alignment is applied to mark the left and right boundaries of the characters. Next, the residue components, if any, in the upper and lower zones within their respective boundaries are combined to achieve the final word-level recognition. A water reservoir feature has been integrated into this framework to correct zone segmentation and character alignment defects introduced during segmentation. A novel sliding-window-based feature, called Pyramid Histogram of Oriented Gradient (PHOG), is proposed for middle zone recognition. An exhaustive experiment is performed on two Indic scripts, namely Bangla and Devanagari, for performance evaluation. The experiments show that the proposed zone-wise recognition improves accuracy compared with the traditional approach to Indic word recognition. |
Tasks | |
Published | 2017-08-01 |
URL | http://arxiv.org/abs/1708.00227v1 |
PDF | http://arxiv.org/pdf/1708.00227v1.pdf |
PWC | https://paperswithcode.com/paper/hmm-based-indic-handwritten-word-recognition |
Repo | |
Framework | |
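The PHOG descriptor named above can be approximated generically by concatenating HOG histograms computed over a spatial pyramid; the sketch below does this with scikit-image. The patch size, pyramid depth, and the absence of the paper's sliding-window setup are assumptions.

```python
# Generic pyramid-of-HOG descriptor (an assumption about the flavour of PHOG
# used; the paper computes it in a sliding window over the middle zone):
# concatenate HOG histograms of the whole patch and of its 2x2 sub-patches.
import numpy as np
from skimage.feature import hog

def phog(patch, levels=2, bins=9):
    feats = []
    for level in range(levels):
        n = 2 ** level
        h, w = patch.shape[0] // n, patch.shape[1] // n
        for i in range(n):
            for j in range(n):
                cell = patch[i*h:(i+1)*h, j*w:(j+1)*w]
                feats.append(hog(cell, orientations=bins,
                                 pixels_per_cell=cell.shape,
                                 cells_per_block=(1, 1)))
    return np.concatenate(feats)

print(phog(np.random.rand(32, 32)).shape)   # (45,) = 9 bins * (1 + 4) regions
```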
Hierarchic Kernel Recursive Least-Squares
Title | Hierarchic Kernel Recursive Least-Squares |
Authors | Hossein Mohamadipanah, Girish Chowdhary |
Abstract | We present a new hierarchic kernel-based modeling technique for modeling evenly distributed multidimensional datasets that does not rely on input-space sparsification. The presented method reorganizes the typical single-layer kernel-based model into a hierarchical structure, such that the weights of a kernel model over each dimension are modeled over the adjacent dimension. We show that imposing the hierarchical structure on the kernel-based model leads to significant computational speedup and improved modeling accuracy (over an order of magnitude in many cases). For instance, the presented method is about five times faster and more accurate than Sparsified Kernel Recursive Least-Squares in modeling of a two-dimensional real-world dataset. |
Tasks | |
Published | 2017-04-14 |
URL | http://arxiv.org/abs/1704.04522v1 |
PDF | http://arxiv.org/pdf/1704.04522v1.pdf |
PWC | https://paperswithcode.com/paper/hierarchic-kernel-recursive-least-squares |
Repo | |
Framework | |