Paper Group ANR 40
Discovery of Visual Semantics by Unsupervised and Self-Supervised Representation Learning. “RAPID” Regions-of-Interest Detection In Big Histopathological Images. Extreme Low Resolution Activity Recognition with Multi-Siamese Embedding Learning. Temporal HeartNet: Towards Human-Level Automatic Analysis of Fetal Cardiac Screening Video. Sequential design of experiments to estimate a probability of exceeding a threshold in a multi-fidelity stochastic simulator. Principles and Examples of Plausible Reasoning and Propositional Plausible Logic. People on Media: Jointly Identifying Credible News and Trustworthy Citizen Journalists in Online Communities. cGAN-based Manga Colorization Using a Single Training Image. Information Theoretic Analysis of DNN-HMM Acoustic Modeling. Probabilistic Image Colorization. Visual Features for Context-Aware Speech Recognition. View Adaptive Recurrent Neural Networks for High Performance Human Action Recognition from Skeleton Data. Parallel Stochastic Gradient Descent with Sound Combiners. HMM-based Indic Handwritten Word Recognition using Zone Segmentation. Hierarchic Kernel Recursive Least-Squares.
Discovery of Visual Semantics by Unsupervised and Self-Supervised Representation Learning
Title | Discovery of Visual Semantics by Unsupervised and Self-Supervised Representation Learning |
Authors | Gustav Larsson |
Abstract | The success of deep learning in computer vision is rooted in the ability of deep networks to scale up model complexity as demanded by challenging visual tasks. As complexity is increased, so is the need for large amounts of labeled data to train the model. This is associated with a costly human annotation effort. To address this concern, with the long-term goal of leveraging the abundance of cheap unlabeled data, we explore methods of unsupervised “pre-training.” In particular, we propose to use self-supervised automatic image colorization. We show that traditional methods for unsupervised learning, such as layer-wise clustering or autoencoders, remain inferior to supervised pre-training. In search of an alternative, we develop a fully automatic image colorization method. Our method sets a new state-of-the-art in revitalizing old black-and-white photography, without requiring human effort or expertise. Additionally, it gives us a method for self-supervised representation learning. In order for the model to appropriately re-color a grayscale object, it must first be able to identify it. This ability, learned entirely through self-supervision, can be used to improve other visual tasks, such as classification and semantic segmentation. As a future direction for self-supervision, we investigate whether multiple proxy tasks can be combined to improve generalization. This turns out to be a challenging open problem. We hope that our contributions to this endeavor will provide a foundation for future efforts in making self-supervision compete with supervised pre-training. |
Tasks | Colorization, Representation Learning, Semantic Segmentation |
Published | 2017-08-19 |
URL | http://arxiv.org/abs/1708.05812v1 |
PDF | http://arxiv.org/pdf/1708.05812v1.pdf |
PWC | https://paperswithcode.com/paper/discovery-of-visual-semantics-by-unsupervised |
Repo | |
Framework | |
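Since the pretext task described above maps directly onto a training loop, here is a minimal sketch of self-supervised colorization pre-training, assuming a PyTorch setup. The two-layer encoder, plain L2 regression loss, and random tensors are illustrative stand-ins rather than the thesis's architecture or loss; the point is only that the encoder is trained without labels and can afterwards be reused for classification or semantic segmentation.

```python
# Minimal self-supervised colorization pretext task (illustrative, not the
# thesis's architecture): a small CNN predicts the ab chroma channels of a
# Lab image from its L (lightness) channel.
import torch
import torch.nn as nn

class ColorizerSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(               # features reusable downstream
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(64, 2, 3, padding=1)  # predicts the a and b channels

    def forward(self, lightness):
        return self.head(self.encoder(lightness))

model = ColorizerSketch()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
L  = torch.rand(8, 1, 64, 64)          # grayscale (L channel) stand-in batch
ab = torch.rand(8, 2, 64, 64) * 2 - 1  # target chroma channels, scaled to [-1, 1]
loss = nn.functional.mse_loss(model(L), ab)  # plain L2 used here for brevity
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```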
“RAPID” Regions-of-Interest Detection In Big Histopathological Images
Title | “RAPID” Regions-of-Interest Detection In Big Histopathological Images |
Authors | Li Sulimowicz, Ishfaq Ahmad |
Abstract | The sheer volume and size of histopathological images (e.g., 10^6 MPixel) underscore the need for faster and more accurate regions-of-interest (ROI) detection algorithms. In this paper, we propose such an algorithm, which has four main components that help achieve greater accuracy and faster speed: First, while using coarse-to-fine topology-preserving segmentation as the baseline, the proposed algorithm uses a superpixel regularity optimization scheme to avoid irregular and extremely small superpixels. Second, the proposed technique employs a prediction strategy to focus only on important superpixels at finer image levels. Third, the algorithm reuses the information gained from the coarsest image level at the other, finer image levels. Both the second and the third components drastically lower the complexity. Fourth, the algorithm employs a highly effective parallelization scheme using adaptive data partitioning, which yields high speedup. Experimental results, conducted on the BSD500 dataset [1] and 500 whole-slide histological images from the National Lung Screening Trial (NLST) dataset, confirm that the proposed algorithm achieved a 13x speedup compared with the baseline, and around 160x compared with SLIC [11], without losing accuracy. |
Tasks | |
Published | 2017-04-07 |
URL | http://arxiv.org/abs/1704.02083v1 |
PDF | http://arxiv.org/pdf/1704.02083v1.pdf |
PWC | https://paperswithcode.com/paper/rapid-regions-of-interest-detection-in-big |
Repo | |
Framework | |
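The coarse-to-fine prediction idea in the abstract above can be illustrated with a toy sketch. The block-based "importance" score, thresholds, and two-level refinement below are assumptions chosen for illustration and are unrelated to the paper's superpixel machinery or parallelization scheme.

```python
# Illustrative coarse-to-fine ROI selection (not the authors' code): score
# coarse blocks first, then refine only the most promising ones at the next
# resolution, mirroring the "focus only on important superpixels" idea.
import numpy as np

def roi_coarse_to_fine(img, block=64, keep_frac=0.25):
    h, w = img.shape
    scores = {}
    for y in range(0, h, block):
        for x in range(0, w, block):
            patch = img[y:y+block, x:x+block]
            scores[(y, x)] = patch.std()          # toy "importance" score
    cutoff = np.quantile(list(scores.values()), 1 - keep_frac)
    rois = []
    for (y, x), s in scores.items():
        if s < cutoff:
            continue                              # unimportant: never refined
        sub = block // 2                          # refine only the kept blocks
        for dy in (0, sub):
            for dx in (0, sub):
                p = img[y+dy:y+dy+sub, x+dx:x+dx+sub]
                if p.size and p.std() >= cutoff:
                    rois.append((y + dy, x + dx, sub))
    return rois

print(len(roi_coarse_to_fine(np.random.rand(512, 512))))
```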
Extreme Low Resolution Activity Recognition with Multi-Siamese Embedding Learning
Title | Extreme Low Resolution Activity Recognition with Multi-Siamese Embedding Learning |
Authors | Michael S. Ryoo, Kiyoon Kim, Hyun Jong Yang |
Abstract | This paper presents an approach for recognizing human activities from extreme low resolution (e.g., 16x12) videos. Extreme low resolution recognition is not only necessary for analyzing actions at a distance but is also crucial for enabling privacy-preserving recognition of human activities. We design a new two-stream multi-Siamese convolutional neural network. The idea is to explicitly capture the inherent property of low resolution (LR) videos that two images originating from the exact same scene often have totally different pixel values depending on their LR transformations. Our approach learns a shared embedding space that maps LR videos with the same content to the same location regardless of their transformations. We experimentally confirm that our approach of jointly learning such a transform-robust LR video representation and the classifier outperforms the previous state-of-the-art low resolution recognition approaches on two public standard datasets by a meaningful margin. |
Tasks | Activity Recognition |
Published | 2017-08-03 |
URL | http://arxiv.org/abs/1708.00999v2 |
PDF | http://arxiv.org/pdf/1708.00999v2.pdf |
PWC | https://paperswithcode.com/paper/extreme-low-resolution-activity-recognition |
Repo | |
Framework | |
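A hedged sketch of the embedding objective summarized above: two low-resolution transforms of the same content should embed close together, while different content is pushed apart. The tiny MLP, the contrastive-style margin loss, and the use of single 16x12 frames in place of two-stream video features are all illustrative assumptions.

```python
# Sketch of the multi-Siamese embedding idea (hyperparameters and network are
# illustrative): two LR transforms of the same clip map to nearby points in
# the embedding space; clips with different content are pushed apart.
import torch
import torch.nn as nn
import torch.nn.functional as F

embed = nn.Sequential(nn.Flatten(), nn.Linear(16 * 12 * 3, 128), nn.ReLU(),
                      nn.Linear(128, 64))

def multi_siamese_loss(clip_a1, clip_a2, clip_b, margin=1.0):
    za1, za2, zb = embed(clip_a1), embed(clip_a2), embed(clip_b)
    pos = F.pairwise_distance(za1, za2)            # same content, different transform
    neg = F.pairwise_distance(za1, zb)             # different content
    return (pos + F.relu(margin - neg)).mean()     # contrastive-style objective

# Stand-ins for 16x12 RGB frames under two different LR transformations.
a1, a2, b = (torch.rand(4, 3, 12, 16) for _ in range(3))
print(multi_siamese_loss(a1, a2, b).item())
```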
Temporal HeartNet: Towards Human-Level Automatic Analysis of Fetal Cardiac Screening Video
Title | Temporal HeartNet: Towards Human-Level Automatic Analysis of Fetal Cardiac Screening Video |
Authors | Weilin Huang, Christopher P. Bridge, J. Alison Noble, Andrew Zisserman |
Abstract | We present an automatic method to describe clinically useful information about scanning, and to guide image interpretation in ultrasound (US) videos of the fetal heart. Our method is able to jointly predict the visibility, viewing plane, location and orientation of the fetal heart at the frame level. The contributions of the paper are three-fold: (i) a convolutional neural network architecture is developed for multi-task prediction, computed by sliding a 3x3 window spatially through the convolutional maps; (ii) an anchor mechanism and an Intersection over Union (IoU) loss are applied to improve localization accuracy; (iii) a recurrent architecture is designed to recursively compute regional convolutional features temporally over sequential frames, allowing each prediction to be conditioned on the whole video. This results in a spatio-temporal model that precisely describes detailed heart parameters in challenging US videos. We report results on a real-world clinical dataset, where our method achieves performance on par with expert annotations. |
Tasks | |
Published | 2017-07-03 |
URL | http://arxiv.org/abs/1707.00665v1 |
PDF | http://arxiv.org/pdf/1707.00665v1.pdf |
PWC | https://paperswithcode.com/paper/temporal-heartnet-towards-human-level |
Repo | |
Framework | |
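The abstract mentions an IoU loss for localization; a minimal version for axis-aligned boxes is sketched below. This is a generic 1 − IoU formulation and omits the paper's anchor mechanism and orientation prediction.

```python
# Minimal IoU loss for axis-aligned boxes (illustrative). Boxes are given as
# (x1, y1, x2, y2); the loss is 1 - IoU, averaged over the batch.
import torch

def iou_loss(pred, target, eps=1e-6):
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    return (1.0 - iou).mean()

pred   = torch.tensor([[10., 10., 50., 60.]])
target = torch.tensor([[12., 15., 48., 58.]])
print(iou_loss(pred, target))
```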
Sequential design of experiments to estimate a probability of exceeding a threshold in a multi-fidelity stochastic simulator
Title | Sequential design of experiments to estimate a probability of exceeding a threshold in a multi-fidelity stochastic simulator |
Authors | Rémi Stroh, Séverine Demeyer, Nicolas Fischer, Julien Bect, Emmanuel Vazquez |
Abstract | In this article, we consider a stochastic numerical simulator to assess the impact of some factors on a phenomenon. The simulator is seen as a black box with inputs and outputs. The quality of a simulation, hereafter referred to as fidelity, is assumed to be tunable by means of an additional input of the simulator (e.g., a mesh size parameter): high-fidelity simulations provide more accurate results, but are time-consuming. Using a limited computation-time budget, we want to estimate, for any value of the physical inputs, the probability that a certain scalar output of the simulator will exceed a given critical threshold at the highest fidelity level. The problem is addressed in a Bayesian framework, using a Gaussian process model of the multi-fidelity simulator. We consider a Bayesian estimator of the probability, together with an associated measure of uncertainty, and propose a new multi-fidelity sequential design strategy, called Maximum Speed of Uncertainty Reduction (MSUR), to select the value of physical inputs and the fidelity level of new simulations. The MSUR strategy is tested on an example. |
Tasks | |
Published | 2017-07-26 |
URL | http://arxiv.org/abs/1707.08384v1 |
PDF | http://arxiv.org/pdf/1707.08384v1.pdf |
PWC | https://paperswithcode.com/paper/sequential-design-of-experiments-to-estimate |
Repo | |
Framework | |
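The quantity being estimated can be written down concretely. Assuming the highest-fidelity output at a given input has a Gaussian posterior with mean mu and standard deviation sigma (a deliberate simplification of the multi-fidelity Gaussian-process model), the probability of exceeding a threshold T is 1 − Φ((T − mu)/sigma):

```python
# Probability of exceeding a threshold under a Gaussian posterior
# (a simplification of the multi-fidelity GP setting described above).
from scipy.stats import norm

def exceedance_probability(mu, sigma, threshold):
    return 1.0 - norm.cdf((threshold - mu) / sigma)

print(exceedance_probability(mu=0.8, sigma=0.3, threshold=1.2))  # ≈ 0.09
```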
Principles and Examples of Plausible Reasoning and Propositional Plausible Logic
Title | Principles and Examples of Plausible Reasoning and Propositional Plausible Logic |
Authors | David Billington |
Abstract | Plausible reasoning concerns situations whose inherent lack of precision is not quantified; that is, there are no degrees or levels of precision, and hence no use of numbers like probabilities. A hopefully comprehensive set of principles that clarifies what it means for a formal logic to do plausible reasoning is presented. A new propositional logic, called Propositional Plausible Logic (PPL), is defined and applied to some important examples. PPL is the only non-numeric non-monotonic logic we know of that satisfies all the principles and correctly reasons with all the examples. Some important results about PPL are proved. |
Tasks | |
Published | 2017-03-06 |
URL | http://arxiv.org/abs/1703.01697v2 |
PDF | http://arxiv.org/pdf/1703.01697v2.pdf |
PWC | https://paperswithcode.com/paper/principles-and-examples-of-plausible |
Repo | |
Framework | |
People on Media: Jointly Identifying Credible News and Trustworthy Citizen Journalists in Online Communities
Title | People on Media: Jointly Identifying Credible News and Trustworthy Citizen Journalists in Online Communities |
Authors | Subhabrata Mukherjee, Gerhard Weikum |
Abstract | Media seems to have become more partisan, often providing a biased coverage of news catering to the interest of specific groups. It is therefore essential to identify credible information content that provides an objective narrative of an event. News communities such as digg, reddit, or newstrust offer recommendations, reviews, quality ratings, and further insights on journalistic works. However, there is a complex interaction between different factors in such online communities: fairness and style of reporting, language clarity and objectivity, topical perspectives (like political viewpoint), expertise and bias of community members, and more. This paper presents a model to systematically analyze the different interactions in a news community between users, news, and sources. We develop a probabilistic graphical model that leverages this joint interaction to identify 1) highly credible news articles, 2) trustworthy news sources, and 3) expert users who perform the role of “citizen journalists” in the community. Our method extends CRF models to incorporate real-valued ratings, as some communities have very fine-grained scales that cannot be easily discretized without losing information. To the best of our knowledge, this paper is the first full-fledged analysis of credibility, trust, and expertise in news communities. |
Tasks | |
Published | 2017-05-07 |
URL | http://arxiv.org/abs/1705.02667v2 |
PDF | http://arxiv.org/pdf/1705.02667v2.pdf |
PWC | https://paperswithcode.com/paper/people-on-media-jointly-identifying-credible |
Repo | |
Framework | |
cGAN-based Manga Colorization Using a Single Training Image
Title | cGAN-based Manga Colorization Using a Single Training Image |
Authors | Paulina Hensman, Kiyoharu Aizawa |
Abstract | The Japanese comic format known as Manga is popular all over the world. It is traditionally produced in black and white, and colorization is time consuming and costly. Automatic colorization methods generally rely on greyscale values, which are not present in manga. Furthermore, due to copyright protection, colorized manga available for training is scarce. We propose a manga colorization method based on conditional Generative Adversarial Networks (cGAN). Unlike previous cGAN approaches that use many hundreds or thousands of training images, our method requires only a single colorized reference image for training, avoiding the need for a large dataset. Colorizing manga using cGANs can produce blurry results with artifacts, and the resolution is limited. We therefore also propose a method of segmentation and color-correction to mitigate these issues. The final results are sharp, clear, and in high resolution, and stay true to the character’s original color scheme. |
Tasks | Colorization |
Published | 2017-06-21 |
URL | http://arxiv.org/abs/1706.06918v1 |
PDF | http://arxiv.org/pdf/1706.06918v1.pdf |
PWC | https://paperswithcode.com/paper/cgan-based-manga-colorization-using-a-single |
Repo | |
Framework | |
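As a rough illustration of the color-correction step mentioned above, the sketch below snaps each output pixel to the nearest color drawn from the single reference image. The random palette extraction and plain RGB distance are assumptions made for brevity; the paper's segmentation-based procedure is more involved.

```python
# Illustrative color-correction in the spirit of the paper's post-processing
# (palette extraction and segmentation details are assumptions): snap each
# output pixel to the nearest color sampled from the single reference image.
import numpy as np

def snap_to_reference_palette(output_rgb, reference_rgb, n_colors=16):
    ref = reference_rgb.reshape(-1, 3).astype(float)
    # Crude palette: n_colors random reference pixels (k-means would be better).
    palette = ref[np.random.choice(len(ref), n_colors, replace=False)]
    flat = output_rgb.reshape(-1, 3).astype(float)
    d = ((flat[:, None, :] - palette[None, :, :]) ** 2).sum(-1)
    return palette[d.argmin(1)].reshape(output_rgb.shape).astype(np.uint8)

out = (np.random.rand(64, 64, 3) * 255).astype(np.uint8)   # stand-in cGAN output
ref = (np.random.rand(64, 64, 3) * 255).astype(np.uint8)   # stand-in reference image
print(snap_to_reference_palette(out, ref).shape)
```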
Information Theoretic Analysis of DNN-HMM Acoustic Modeling
Title | Information Theoretic Analysis of DNN-HMM Acoustic Modeling |
Authors | Pranay Dighe, Afsaneh Asaei, Hervé Bourlard |
Abstract | We propose an information theoretic framework for quantitative assessment of acoustic modeling for hidden Markov model (HMM) based automatic speech recognition (ASR). Acoustic modeling yields the probabilities of HMM sub-word states for a short temporal window of speech acoustic features. We cast ASR as a communication channel where the input sub-word probabilities convey the information about the output HMM state sequence. The quality of the acoustic model is thus quantified in terms of the information transmitted through this channel. The process of inferring the most likely HMM state sequence from the sub-word probabilities is known as decoding. HMM based decoding assumes that an acoustic model yields accurate state-level probabilities and the data distribution given the underlying hidden state is independent of any other state in the sequence. We quantify 1) the acoustic model accuracy and 2) its robustness to mismatch between data and the HMM conditional independence assumption in terms of some mutual information quantities. In this context, exploiting deep neural network (DNN) posterior probabilities leads to a simple and straightforward analysis framework to assess shortcomings of the acoustic model for HMM based decoding. This analysis enables us to evaluate the Gaussian mixture acoustic model (GMM) and the importance of many hidden layers in DNNs without any need of explicit speech recognition. In addition, it sheds light on the contribution of low-dimensional models to enhance acoustic modeling for better compliance with the HMM based decoding requirements. |
Tasks | Speech Recognition |
Published | 2017-08-29 |
URL | http://arxiv.org/abs/1709.01144v2 |
PDF | http://arxiv.org/pdf/1709.01144v2.pdf |
PWC | https://paperswithcode.com/paper/information-theoretic-analysis-of-dnn-hmm |
Repo | |
Framework | |
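To make the "ASR as a communication channel" view concrete, the sketch below estimates the mutual information between true HMM states and the states decoded from DNN posteriors, using a toy joint-count matrix. The paper's analysis relies on related information quantities rather than exactly this estimator.

```python
# Mutual information between true and decoded HMM states, estimated from
# joint counts (toy example; not the paper's exact analysis).
import numpy as np

def mutual_information(joint_counts):
    p = joint_counts / joint_counts.sum()
    px, py = p.sum(axis=1, keepdims=True), p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log2(p[nz] / (px @ py)[nz])).sum())

counts = np.array([[90,  5,  5],   # rows: true state, cols: decoded state
                   [ 4, 88,  8],
                   [ 6,  7, 87]])
print(mutual_information(counts), "bits transmitted per frame (toy example)")
```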
Probabilistic Image Colorization
Title | Probabilistic Image Colorization |
Authors | Amelie Royer, Alexander Kolesnikov, Christoph H. Lampert |
Abstract | We develop a probabilistic technique for colorizing grayscale natural images. In light of the intrinsic uncertainty of this task, the proposed probabilistic framework has numerous desirable properties. In particular, our model is able to produce multiple plausible and vivid colorizations for a given grayscale image and is one of the first colorization models to provide a proper stochastic sampling scheme. Moreover, our training procedure is supported by a rigorous theoretical framework that does not require any ad hoc heuristics and allows for efficient modeling and learning of the joint pixel color distribution. We demonstrate strong quantitative and qualitative experimental results on the CIFAR-10 dataset and the challenging ILSVRC 2012 dataset. |
Tasks | Colorization |
Published | 2017-05-11 |
URL | http://arxiv.org/abs/1705.04258v1 |
PDF | http://arxiv.org/pdf/1705.04258v1.pdf |
PWC | https://paperswithcode.com/paper/probabilistic-image-colorization |
Repo | |
Framework | |
Visual Features for Context-Aware Speech Recognition
Title | Visual Features for Context-Aware Speech Recognition |
Authors | Abhinav Gupta, Yajie Miao, Leonardo Neves, Florian Metze |
Abstract | Automatic transcriptions of consumer-generated multi-media content such as “Youtube” videos still exhibit high word error rates. Such data typically occupies a very broad domain, has been recorded in challenging conditions, with cheap hardware and a focus on the visual modality, and may have been post-processed or edited. In this paper, we extend our earlier work on adapting the acoustic model of a DNN-based speech recognition system to an RNN language model and show how both can be adapted to the objects and scenes that can be automatically detected in the video. We are working on a corpus of “how-to” videos from the web, and the idea is that an object that can be seen (“car”), or a scene that is being detected (“kitchen”) can be used to condition both models on the “context” of the recording, thereby reducing perplexity and improving transcription. We achieve good improvements in both cases and compare and analyze the respective reductions in word error rate. We expect that our results can be used for any type of speech processing in which “context” information is available, for example in robotics, man-machine interaction, or when indexing large audio-visual archives, and should ultimately help to bring together the “video-to-text” and “speech-to-text” communities. |
Tasks | Language Modelling, Speech Recognition |
Published | 2017-12-01 |
URL | http://arxiv.org/abs/1712.00489v1 |
PDF | http://arxiv.org/pdf/1712.00489v1.pdf |
PWC | https://paperswithcode.com/paper/visual-features-for-context-aware-speech |
Repo | |
Framework | |
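One simple way to realize the conditioning described above is to feed an embedding of the detected object/scene label into the language model at every time step; the sketch below shows that pattern. The architecture, vocabulary size, and context inventory are illustrative assumptions, not the authors' adapted models.

```python
# Sketch of a context-conditioned RNN language model (illustrative): an
# embedding of a detected object/scene label ("car", "kitchen", ...) is
# concatenated to each word embedding before the LSTM.
import torch
import torch.nn as nn

class ContextLM(nn.Module):
    def __init__(self, vocab=1000, n_ctx=20, d=64):
        super().__init__()
        self.word_emb = nn.Embedding(vocab, d)
        self.ctx_emb  = nn.Embedding(n_ctx, d)   # one id per visual context label
        self.rnn = nn.LSTM(2 * d, d, batch_first=True)
        self.out = nn.Linear(d, vocab)

    def forward(self, words, ctx):
        w = self.word_emb(words)                            # (B, T, d)
        c = self.ctx_emb(ctx).unsqueeze(1).expand_as(w)     # broadcast over time
        h, _ = self.rnn(torch.cat([w, c], dim=-1))
        return self.out(h)                                  # next-word logits

lm = ContextLM()
logits = lm(torch.randint(0, 1000, (2, 7)), torch.tensor([3, 11]))
print(logits.shape)   # torch.Size([2, 7, 1000])
```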
View Adaptive Recurrent Neural Networks for High Performance Human Action Recognition from Skeleton Data
Title | View Adaptive Recurrent Neural Networks for High Performance Human Action Recognition from Skeleton Data |
Authors | Pengfei Zhang, Cuiling Lan, Junliang Xing, Wenjun Zeng, Jianru Xue, Nanning Zheng |
Abstract | Skeleton-based human action recognition has recently attracted increasing attention due to the popularity of 3D skeleton data. One main challenge lies in the large view variations in captured human actions. We propose a novel view adaptation scheme to automatically regulate observation viewpoints during the occurrence of an action. Rather than re-positioning the skeletons based on a human defined prior criterion, we design a view adaptive recurrent neural network (RNN) with LSTM architecture, which enables the network itself to adapt to the most suitable observation viewpoints from end to end. Extensive experiment analyses show that the proposed view adaptive RNN model strives to (1) transform the skeletons of various views to much more consistent viewpoints and (2) maintain the continuity of the action rather than transforming every frame to the same position with the same body orientation. Our model achieves significant improvement over the state-of-the-art approaches on three benchmark datasets. |
Tasks | Skeleton Based Action Recognition, Temporal Action Localization |
Published | 2017-03-24 |
URL | http://arxiv.org/abs/1703.08274v2 |
PDF | http://arxiv.org/pdf/1703.08274v2.pdf |
PWC | https://paperswithcode.com/paper/view-adaptive-recurrent-neural-networks-for |
Repo | |
Framework | |
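A simplified sketch of the view-adaptation idea summarized above: a small subnetwork predicts a rotation (as Euler angles) and a translation from the first frame, the skeleton joints are re-expressed under that viewpoint, and an LSTM classifies the transformed sequence. Predicting the viewpoint once per sequence, and the layer sizes, are simplifications of the paper's per-frame, end-to-end design.

```python
# Illustrative view-adaptation layer (simplified from the paper's description).
import torch
import torch.nn as nn

def rotation_matrix(angles):                       # angles: (B, 3) -> (B, 3, 3)
    a, b, c = angles[:, 0], angles[:, 1], angles[:, 2]
    zeros, ones = torch.zeros_like(a), torch.ones_like(a)
    Rx = torch.stack([ones, zeros, zeros,
                      zeros, a.cos(), -a.sin(),
                      zeros, a.sin(),  a.cos()], dim=1).view(-1, 3, 3)
    Ry = torch.stack([b.cos(), zeros, b.sin(),
                      zeros, ones, zeros,
                      -b.sin(), zeros, b.cos()], dim=1).view(-1, 3, 3)
    Rz = torch.stack([c.cos(), -c.sin(), zeros,
                      c.sin(),  c.cos(), zeros,
                      zeros, zeros, ones], dim=1).view(-1, 3, 3)
    return Rz @ Ry @ Rx

class ViewAdaptiveSketch(nn.Module):
    def __init__(self, joints=25, classes=60, d=128):
        super().__init__()
        self.view = nn.Linear(joints * 3, 6)        # 3 Euler angles + translation
        self.lstm = nn.LSTM(joints * 3, d, batch_first=True)
        self.cls  = nn.Linear(d, classes)

    def forward(self, x):                           # x: (B, T, joints, 3)
        B, T, J, _ = x.shape
        params = self.view(x[:, 0].reshape(B, -1))  # viewpoint from the first frame
        R, t = rotation_matrix(params[:, :3]), params[:, 3:]
        x = torch.einsum('bij,btkj->btki', R, x) + t.view(B, 1, 1, 3)
        h, _ = self.lstm(x.reshape(B, T, J * 3))
        return self.cls(h[:, -1])

model = ViewAdaptiveSketch()
print(model(torch.rand(2, 30, 25, 3)).shape)        # torch.Size([2, 60])
```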
Parallel Stochastic Gradient Descent with Sound Combiners
Title | Parallel Stochastic Gradient Descent with Sound Combiners |
Authors | Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz |
Abstract | Stochastic gradient descent (SGD) is a well-known method for regression and classification tasks. However, it is an inherently sequential algorithm: at each step, the processing of the current example depends on the parameters learned from the previous examples. Prior approaches to parallelizing linear learners using SGD, such as HOGWILD! and ALLREDUCE, do not honor these dependencies across threads and thus can potentially suffer poor convergence rates and/or poor scalability. This paper proposes SYMSGD, a parallel SGD algorithm that, to a first-order approximation, retains the sequential semantics of SGD. Each thread learns a local model in addition to a model combiner, which allows local models to be combined to produce the same result as what a sequential SGD would have produced. This paper evaluates SYMSGD’s accuracy and performance on 6 datasets on a shared-memory machine and shows up to an 11x speedup over our heavily optimized sequential baseline on 16 cores; on average it is also 2.2x faster than HOGWILD!. |
Tasks | |
Published | 2017-05-22 |
URL | http://arxiv.org/abs/1705.08030v1 |
PDF | http://arxiv.org/pdf/1705.08030v1.pdf |
PWC | https://paperswithcode.com/paper/parallel-stochastic-gradient-descent-with |
Repo | |
Framework | |
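The "sound combiner" idea has a compact numeric illustration for linear models with squared loss: each SGD step is an affine map of the weights, w ← (I − αxxᵀ)w + αyx, so a thread can compose its steps into a matrix–vector pair (M, b) and later apply them to whatever weights the previous thread ended with. The sketch below verifies this equivalence; the actual SYMSGD algorithm additionally uses randomized projections to keep the combiner small, which is omitted here.

```python
# Numeric illustration of the sound-combiner principle for linear-model SGD
# (a sketch of the idea, not the SYMSGD implementation).
import numpy as np

rng = np.random.default_rng(0)
d, alpha = 5, 0.05
X, y = rng.normal(size=(20, d)), rng.normal(size=20)

def sequential_sgd(w, X, y):
    for x_i, y_i in zip(X, y):
        w = w - alpha * (x_i @ w - y_i) * x_i       # one SGD step, squared loss
    return w

def combiner(X, y):
    M, b = np.eye(d), np.zeros(d)
    for x_i, y_i in zip(X, y):
        A = np.eye(d) - alpha * np.outer(x_i, x_i)  # this step's affine map
        M, b = A @ M, A @ b + alpha * y_i * x_i     # compose with earlier steps
    return M, b

w0 = rng.normal(size=d)
M, b = combiner(X[10:], y[10:])                      # "second thread's" combiner
w_seq  = sequential_sgd(sequential_sgd(w0, X[:10], y[:10]), X[10:], y[10:])
w_comb = M @ sequential_sgd(w0, X[:10], y[:10]) + b  # combine instead of rerunning
print(np.allclose(w_seq, w_comb))                    # True
```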
HMM-based Indic Handwritten Word Recognition using Zone Segmentation
Title | HMM-based Indic Handwritten Word Recognition using Zone Segmentation |
Authors | Partha Pratim Roy, Ayan Kumar Bhunia, Ayan Das, Prasenjit Dey, Umapada Pal |
Abstract | This paper presents a novel approach to Indic handwritten word recognition using zone-wise information. Because of the complex nature of Indic scripts (e.g. Devanagari, Bangla, Gurumukhi, and other similar scripts), with compound characters, modifiers, overlapping and touching components, etc., character segmentation and recognition is a tedious job. To avoid character segmentation in such scripts, HMM-based sequence modeling has been used earlier in a holistic way. This paper proposes an efficient word recognition framework that segments handwritten word images horizontally into three zones (upper, middle and lower) and recognizes the corresponding zones. The main aim of this zone segmentation approach is to reduce the number of distinct component classes compared to the total number of classes in Indic scripts. As a result, the zone segmentation approach enhances the recognition performance of the system. The components in the middle zone, where characters are mostly touching, are recognized using an HMM. After the recognition of the middle zone, HMM-based Viterbi forced alignment is applied to mark the left and right boundaries of the characters. Next, the residue components, if any, in the upper and lower zones within their respective boundaries are combined to achieve the final word-level recognition. A water reservoir feature has been integrated into this framework to correct zone segmentation and character alignment defects introduced during segmentation. A novel sliding-window-based feature, called Pyramid Histogram of Oriented Gradient (PHOG), is proposed for middle zone recognition. An exhaustive experiment is performed on two Indic scripts, namely Bangla and Devanagari, for performance evaluation. The experiments show that the proposed zone-wise recognition improves accuracy compared with the traditional approach to Indic word recognition. |
Tasks | |
Published | 2017-08-01 |
URL | http://arxiv.org/abs/1708.00227v1 |
PDF | http://arxiv.org/pdf/1708.00227v1.pdf |
PWC | https://paperswithcode.com/paper/hmm-based-indic-handwritten-word-recognition |
Repo | |
Framework | |
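The PHOG descriptor named above can be approximated generically by concatenating HOG histograms computed over a spatial pyramid; the sketch below does this with scikit-image. The patch size, pyramid depth, and the absence of the paper's sliding-window setup are assumptions.

```python
# Generic pyramid-of-HOG descriptor (an assumption about the flavour of PHOG
# used; the paper computes it in a sliding window over the middle zone):
# concatenate HOG histograms of the whole patch and of its 2x2 sub-patches.
import numpy as np
from skimage.feature import hog

def phog(patch, levels=2, bins=9):
    feats = []
    for level in range(levels):
        n = 2 ** level
        h, w = patch.shape[0] // n, patch.shape[1] // n
        for i in range(n):
            for j in range(n):
                cell = patch[i*h:(i+1)*h, j*w:(j+1)*w]
                feats.append(hog(cell, orientations=bins,
                                 pixels_per_cell=cell.shape,
                                 cells_per_block=(1, 1)))
    return np.concatenate(feats)

print(phog(np.random.rand(32, 32)).shape)   # (45,) = 9 bins * (1 + 4) regions
```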
Hierarchic Kernel Recursive Least-Squares
Title | Hierarchic Kernel Recursive Least-Squares |
Authors | Hossein Mohamadipanah, Girish Chowdhary |
Abstract | We present a new hierarchic kernel-based modeling technique for modeling evenly distributed multidimensional datasets that does not rely on input-space sparsification. The presented method reorganizes the typical single-layer kernel-based model into a hierarchical structure, such that the weights of a kernel model over each dimension are modeled over the adjacent dimension. We show that imposing the hierarchical structure on the kernel-based model leads to significant computational speedup and improved modeling accuracy (over an order of magnitude in many cases). For instance, the presented method is about five times faster and more accurate than Sparsified Kernel Recursive Least-Squares in modeling of a two-dimensional real-world dataset. |
Tasks | |
Published | 2017-04-14 |
URL | http://arxiv.org/abs/1704.04522v1 |
PDF | http://arxiv.org/pdf/1704.04522v1.pdf |
PWC | https://paperswithcode.com/paper/hierarchic-kernel-recursive-least-squares |
Repo | |
Framework | |