July 29, 2019


Paper Group ANR 98


Weakly Supervised Semantic Segmentation using Web-Crawled Videos


Title	Weakly Supervised Semantic Segmentation using Web-Crawled Videos
Authors	Seunghoon Hong, Donghun Yeo, Suha Kwak, Honglak Lee, Bohyung Han
Abstract	We propose a novel algorithm for weakly supervised semantic segmentation based on image-level class labels only. In the weakly supervised setting, it is commonly observed that a trained model focuses overly on discriminative parts rather than on the entire object area. Our goal is to overcome this limitation without additional human intervention by retrieving videos relevant to the target class labels from a web repository and generating segmentation labels from the retrieved videos to simulate strong supervision for semantic segmentation. During this process, we take advantage of image classification with a discriminative localization technique to reject false alarms in the retrieved videos and to identify relevant spatio-temporal volumes within them. Although the entire procedure requires no additional supervision, the segmentation annotations obtained from the videos are strong enough to learn a semantic segmentation model. The proposed algorithm substantially outperforms existing methods based on the same level of supervision and is even competitive with approaches that rely on extra annotations.
Tasks	Image Classification, Semantic Segmentation, Weakly-Supervised Semantic Segmentation
Published	2017-01-02
URL	http://arxiv.org/abs/1701.00352v3
PDF	http://arxiv.org/pdf/1701.00352v3.pdf
PWC	https://paperswithcode.com/paper/weakly-supervised-semantic-segmentation-using-1
Repo
Framework
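
The abstract does not name its discriminative localization technique. One common choice is class activation mapping (CAM), and the sketch below assumes it: the classifier weights of a target class reweight the last convolutional feature maps, and the normalized map is thresholded into a coarse foreground mask. The function names and the 0.5 threshold are hypothetical choices. This is not the authors' video-based pipeline.

```python
import numpy as np


def class_activation_map(features, fc_weights, class_idx):
    """CAM-style localization (assumed technique, not taken from the paper).

    features:   (C, H, W) activations of the last convolutional layer
    fc_weights: (num_classes, C) weights of the global-pooling classifier
    Returns an (H, W) map rescaled to [0, 1].
    """
    # Weighted sum over channels using the target class's classifier weights.
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0])).astype(float)
    cam -= cam.min()
    if cam.max() > 0:
        cam /= cam.max()
    return cam


def cam_to_mask(cam, threshold=0.5):
    """Binarize the activation map into a coarse foreground mask."""
    return cam >= threshold
```

Masks like this tend to cover only the most discriminative parts, which is the limitation the paper addresses by mining web videos.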

Rotational Rectification Network: Enabling Pedestrian Detection for Mobile Vision


Title	Rotational Rectification Network: Enabling Pedestrian Detection for Mobile Vision
Authors	Xinshuo Weng, Shangxuan Wu, Fares Beainy, Kris Kitani
Abstract	Across a majority of pedestrian detection datasets, it is typically assumed that pedestrians will be standing upright with respect to the image coordinate system. This assumption, however, is not always valid for many vision-equipped mobile platforms such as mobile phones, UAVs or construction vehicles on rugged terrain. In these situations, the motion of the camera can cause images of pedestrians to be captured at extreme angles. This can lead to very poor pedestrian detection performance when using standard pedestrian detectors. To address this issue, we propose a Rotational Rectification Network (R2N) that can be inserted into any CNN-based pedestrian (or object) detector to adapt it to significant changes in camera rotation. The rotational rectification network uses a 2D rotation estimation module that passes rotational information to a spatial transformer network to undistort image features. To enable robust rotation estimation, we propose a Global Polar Pooling (GP-Pooling) operator to capture rotational shifts in convolutional features. Through our experiments, we show how our rotational rectification network can be used to improve the performance of the state-of-the-art pedestrian detector under heavy image rotation by up to 45%.
Tasks	Pedestrian Detection
Published	2017-06-19
URL	http://arxiv.org/abs/1706.08917v3
PDF	http://arxiv.org/pdf/1706.08917v3.pdf
PWC	https://paperswithcode.com/paper/rotational-rectification-network-enabling
Repo
Framework
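
The sketch below shows why pooling in polar coordinates makes rotation estimable. It samples a square, odd-sized feature map along rays from its center, using nearest-neighbour lookup, and averages over radius. Rotating the input about its center then circularly shifts the resulting angular profile. This is our own illustration under those assumptions, not the authors' GP-Pooling operator, and the name `polar_pool` and its parameters are hypothetical.

```python
import numpy as np


def polar_pool(fmap, n_angles=8, n_radii=None):
    """Average a square, odd-sized 2D map along rays from its center.

    Returns an angular profile of length n_angles. A rotation of the input
    by a multiple of 360/n_angles degrees shows up as a circular shift.
    """
    n = fmap.shape[0]
    assert fmap.shape == (n, n) and n % 2 == 1, "expects a square, odd-sized map"
    c = (n - 1) // 2
    radii = np.arange(1, (n_radii or c) + 1)
    thetas = 2 * np.pi * np.arange(n_angles) / n_angles
    # Row indices grow downward, so a positive angle moves the sample point up.
    rows = np.rint(c - np.outer(np.sin(thetas), radii)).astype(int)
    cols = np.rint(c + np.outer(np.cos(thetas), radii)).astype(int)
    # Pool over radius: one value per angular bin.
    return fmap[rows, cols].mean(axis=1)
```

With 8 angular bins, a 90 degree counterclockwise rotation (`np.rot90`) shifts the profile by 2 bins. A rotation estimator could read the angle off that shift before a spatial transformer undoes it.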

Bag-of-Vector Embeddings of Dependency Graphs for Semantic Induction


Title	Bag-of-Vector Embeddings of Dependency Graphs for Semantic Induction
Authors	Diana Nicoleta Popa, James Henderson
Abstract	Vector-space models, from word embeddings to neural network parsers, have many advantages for NLP. But how to generalise from fixed-length word vectors to a vector space for arbitrary linguistic structures is still unclear. In this paper we propose bag-of-vector embeddings of arbitrary linguistic graphs. A bag-of-vector space is the minimal nonparametric extension of a vector space, allowing the representation to grow with the size of the graph, but not tying the representation to any specific tree or graph structure. We propose efficient training and inference algorithms based on tensor factorisation for embedding arbitrary graphs in a bag-of-vector space. We demonstrate the usefulness of this representation by training bag-of-vector embeddings of dependency graphs and evaluating them on unsupervised semantic induction for the Semantic Textual Similarity and Natural Language Inference tasks.
Tasks	Natural Language Inference, Semantic Textual Similarity, Word Embeddings
Published	2017-09-30
URL	http://arxiv.org/abs/1710.00205v1
PDF	http://arxiv.org/pdf/1710.00205v1.pdf
PWC	https://paperswithcode.com/paper/bag-of-vector-embeddings-of-dependency-graphs
Repo
Framework

The Origins of Computational Mechanics: A Brief Intellectual History and Several Clarifications


Title	The Origins of Computational Mechanics: A Brief Intellectual History and Several Clarifications
Authors	James P. Crutchfield
Abstract	The principal goal of computational mechanics is to define pattern and structure so that the organization of complex systems can be detected and quantified. Computational mechanics developed from efforts in the 1970s and early 1980s to identify strange attractors as the mechanism driving weak fluid turbulence via the method of reconstructing attractor geometry from measurement time series and in the mid-1980s to estimate equations of motion directly from complex time series. In providing a mathematical and operational definition of structure, it addressed weaknesses of these early approaches to discovering patterns in natural systems. Since then, computational mechanics has led to a range of results from theoretical physics and nonlinear mathematics to diverse applications: from closed-form analysis of Markov and non-Markov stochastic processes that are ergodic or nonergodic and their measures of information and intrinsic computation to complex materials and deterministic chaos and intelligence in Maxwellian demons to quantum compression of classical processes and the evolution of computation and language. This brief review clarifies several misunderstandings and addresses concerns recently raised regarding early works in the field (1980s). We show that misguided evaluations of the contributions of computational mechanics are groundless and stem from a lack of familiarity with its basic goals and from a failure to consider its historical context. For all practical purposes, its modern methods and results largely supersede the early works. This not only renders recent criticism moot and shows the solid ground on which computational mechanics stands but, most importantly, shows the significant progress achieved over three decades and points to the many intriguing and outstanding challenges in understanding the computational nature of complex dynamic systems.
Tasks	Time Series
Published	2017-10-18
URL	http://arxiv.org/abs/1710.06832v1
PDF	http://arxiv.org/pdf/1710.06832v1.pdf
PWC	https://paperswithcode.com/paper/the-origins-of-computational-mechanics-a
Repo
Framework

Automatic segmentation of the intracranial volume in fetal MR images


Title	Automatic segmentation of the intracranial volume in fetal MR images
Authors	N. Khalili, P. Moeskops, N. H. P. Claessens, S. Scherpenzeel, E. Turk, R. de Heus, M. J. N. L. Benders, M. A. Viergever, J. P. W. Pluim, I. Išgum
Abstract	MR images of the fetus allow non-invasive analysis of the fetal brain. Quantitative analysis of fetal brain development requires automatic brain tissue segmentation that is typically preceded by segmentation of the intracranial volume (ICV). This is challenging because fetal MR images visualize the whole moving fetus and in addition partially visualize the maternal body. This paper presents an automatic method for segmentation of the ICV in fetal MR images. The method employs a multi-scale convolutional neural network on 2D slices to enable learning spatial information from larger context as well as detailed local information. The method is developed and evaluated with 30 fetal T2-weighted MRI scans (average 33.2 ± 1.2 weeks postmenstrual age). The set contains 10 scans acquired in axial, 10 in coronal and 10 in sagittal imaging planes. A reference standard was defined in all images by manual annotation of the intracranial volume in 10 equidistantly distributed slices. The automatic analysis was performed by training and testing the network using scans acquired in the representative imaging plane as well as combining the training data from all imaging planes. On average, the automatic method achieved Dice coefficients of 0.90 for the axial images, 0.90 for the coronal images and 0.92 for the sagittal images. Combining the training sets resulted in average Dice coefficients of 0.91 for the axial images, 0.95 for the coronal images, and 0.92 for the sagittal images. The results demonstrate that the evaluated method achieved good performance in extracting ICV in fetal MR scans regardless of the imaging plane.
Tasks
Published	2017-07-31
URL	http://arxiv.org/abs/1708.02282v1
PDF	http://arxiv.org/pdf/1708.02282v1.pdf
PWC	https://paperswithcode.com/paper/automatic-segmentation-of-the
Repo
Framework
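
The results above are reported as Dice coefficients, Dice(A, B) = 2|A ∩ B| / (|A| + |B|). Because the reference standard covers only 10 equidistant slices, a sketch of the evaluation might restrict the comparison to those slices. The slice-selection rule (`np.linspace` rounded to integers), the pooling of the selected slices into one score, and the helper names are our assumptions, not the authors' protocol.

```python
import numpy as np


def dice(pred, ref):
    """Dice coefficient between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return 2.0 * np.logical_and(pred, ref).sum() / denom


def annotated_slices(n_slices, n_annotated=10):
    """Hypothetical rule for picking equidistantly distributed slice indices."""
    return np.linspace(0, n_slices - 1, n_annotated).round().astype(int)


def dice_on_annotated(pred_vol, ref_vol, slice_idx):
    """Pool the annotated slices of a (slices, H, W) volume into one Dice score."""
    return dice(pred_vol[slice_idx], ref_vol[slice_idx])
```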

Learning Deep Representations for Scene Labeling with Semantic Context Guided Supervision


Title	Learning Deep Representations for Scene Labeling with Semantic Context Guided Supervision
Authors	Zhe Wang, Hongsheng Li, Wanli Ouyang, Xiaogang Wang
Abstract	Scene labeling is a challenging classification problem where each input image requires a pixel-level prediction map. Recently, deep-learning-based methods have shown their effectiveness in solving this problem. However, we argue that the large intra-class variation provides ambiguous training information and hinders the deep models’ ability to learn more discriminative deep feature representations. Unlike existing methods that mainly utilize semantic context for regularizing or smoothing the prediction map, we design novel supervisions from semantic context for learning better deep feature representations. Two types of semantic context, scene names of images and label map statistics of image patches, are exploited to create label hierarchies between the original classes and newly created subclasses as the learning supervisions. Such subclasses show lower intra-class variation and help the CNN detect more meaningful visual patterns and learn more effective deep features. Novel training strategies and a network structure that take advantage of such label hierarchies are introduced. Our proposed method is evaluated extensively on four popular datasets, Stanford Background (8 classes), SIFTFlow (33 classes), Barcelona (170 classes) and LM+Sun (232 classes), with three different network structures, and shows state-of-the-art performance. The experiments show that our proposed method makes deep models learn more discriminative feature representations without increasing model size or complexity.
Tasks	Scene Labeling
Published	2017-06-08
URL	http://arxiv.org/abs/1706.02493v2
PDF	http://arxiv.org/pdf/1706.02493v2.pdf
PWC	https://paperswithcode.com/paper/learning-deep-representations-for-scene
Repo
Framework
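
One of the two context types above is the scene name. A minimal sketch of a scene-based label hierarchy, under our own assumptions: each (original class, scene name) pair becomes a subclass, a `parent` array maps each subclass back to its original class, and subclass probabilities can be summed to recover class probabilities. The pairing rule and the helper names are hypothetical. The paper's actual construction, and its patch label-statistics variant, are not reproduced.

```python
import numpy as np


def build_scene_subclasses(pixel_labels, scene_names):
    """Split each original class into (class, scene) subclasses.

    pixel_labels: list of (H, W) integer label maps
    scene_names:  one scene string per image
    Returns per-image subclass maps and parent[subclass] -> original class.
    """
    pair_to_sub, parent, sub_maps = {}, [], []
    for labels, scene in zip(pixel_labels, scene_names):
        sub = np.empty_like(labels)
        for c in np.unique(labels):
            key = (int(c), scene)
            if key not in pair_to_sub:
                pair_to_sub[key] = len(parent)
                parent.append(int(c))
            sub[labels == c] = pair_to_sub[key]
        sub_maps.append(sub)
    return sub_maps, np.array(parent)


def collapse_to_classes(sub_probs, parent, n_classes):
    """Sum subclass probabilities (..., S) into class probabilities (..., n_classes)."""
    out = np.zeros(sub_probs.shape[:-1] + (n_classes,))
    for s, c in enumerate(parent):
        out[..., c] += sub_probs[..., s]
    return out
```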

An Adaptive Cluster-based Filtering Framework for Speckle Reduction of OCT Skin Images


Title	An Adaptive Cluster-based Filtering Framework for Speckle Reduction of OCT Skin Images
Authors	Elaheh Rashedi, Saba Adabi, Darius Mehregan, Silvia Conforto, Xue-wen Chen
Abstract	Optical coherence tomography (OCT) has become a favorable imaging device in dermatology due to its moderate resolution and penetration depth. OCT images, however, contain a grainy pattern, called speckle, due to the use of a broadband source in the OCT configuration. A variety of filtering (de-speckling) techniques have been introduced to reduce speckle in OCT images. Most of these methods are generic and can be applied to OCT images of different tissues. The aim of this work is to provide a de-speckling framework specialized for skin tissue that the community can use, adapt, or build upon. In this paper, we present an adaptive cluster-based filtering framework, optimized for speckle reduction of OCT skin images. In this framework, by considering the layered structure of skin, the OCT skin images are first segmented into differentiable layers using clustering algorithms, and each cluster is then de-speckled individually using adaptive filtering techniques. In this study, a hierarchical clustering algorithm and an adaptive Wiener filtering technique are used to develop the framework. The proposed method is tested on optical solid phantoms with predetermined optical properties. The method is also tested on healthy human skin images. The results show that the proposed cluster-based filtering method can effectively reduce the speckle and increase the signal-to-noise ratio and contrast while preserving the edges in the image. The proposed cluster-based filtering framework enables researchers to develop unsupervised learning solutions for de-speckling OCT skin images using adaptive filtering methods, or to extend the framework to new applications.
Tasks
Published	2017-07-31
URL	http://arxiv.org/abs/1708.02285v4
PDF	http://arxiv.org/pdf/1708.02285v4.pdf
PWC	https://paperswithcode.com/paper/an-adaptive-cluster-based-filtering-framework
Repo
Framework
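
The cluster-then-filter pipeline can be sketched under several stated assumptions. The image is a 2D B-scan whose rows are depth. "Hierarchical clustering" is realized as agglomerative merging of adjacent rows by mean intensity, so each cluster is a contiguous layer. Each layer is filtered with a local-statistics adaptive Wiener filter whose noise level is estimated inside that layer. The helper names and the 5x5 window are hypothetical; the paper's clustering features and filter settings may differ.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def segment_layers(img, n_layers):
    """Agglomerative clustering of rows (depth) into contiguous layers."""
    row_means = img.mean(axis=1)
    segs = [(r, r + 1) for r in range(img.shape[0])]  # start: one cluster per row

    def seg_mean(seg):
        return row_means[seg[0]:seg[1]].mean()

    while len(segs) > n_layers:
        # Merge the adjacent pair whose mean intensities are closest.
        gaps = [abs(seg_mean(segs[i]) - seg_mean(segs[i + 1])) for i in range(len(segs) - 1)]
        i = int(np.argmin(gaps))
        segs[i] = (segs[i][0], segs[i + 1][1])
        del segs[i + 1]

    labels = np.empty(img.shape[0], dtype=int)
    for k, (start, stop) in enumerate(segs):
        labels[start:stop] = k
    return labels


def local_stats(img, size):
    """Local mean and variance over a size x size window (reflect padding)."""
    pad = size // 2
    win = sliding_window_view(np.pad(img, pad, mode="reflect"), (size, size))
    return win.mean(axis=(-2, -1)), win.var(axis=(-2, -1))


def adaptive_wiener(img, size, noise):
    """Shrink each pixel toward its local mean where local variance ~ noise."""
    mean, var = local_stats(img, size)
    gain = np.maximum(var - noise, 0.0) / np.maximum(var, 1e-12)
    return mean + gain * (img - mean)


def cluster_despeckle(img, n_layers=3, size=5):
    """Filter each depth layer with its own noise estimate."""
    labels = segment_layers(img, n_layers)
    _, var = local_stats(img, size)
    out = np.empty_like(img, dtype=float)
    for k in range(n_layers):
        rows = labels == k
        noise = var[rows].mean()  # per-layer noise estimate
        out[rows] = adaptive_wiener(img, size, noise)[rows]
    return out
```

Contiguity-constrained merging is used here because skin layers are stacked in depth. A clustering over arbitrary pixel features would also fit the abstract's description.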

Generative Model with Coordinate Metric Learning for Object Recognition Based on 3D Models


Title	Generative Model with Coordinate Metric Learning for Object Recognition Based on 3D Models
Authors	Yida Wang, Weihong Deng
Abstract	Given a large number of real photos for training, convolutional neural networks show excellent performance on object recognition tasks. However, collecting such data is tedious, and the available backgrounds are limited, which makes it hard to build a complete database. In this paper, our generative model, trained on synthetic images rendered from 3D models, reduces the data-collection workload and the limitation of capture conditions. Our structure is composed of two sub-networks: a semantic foreground object reconstruction network based on Bayesian inference, and a classification network based on a multi-triplet cost function. The multi-triplet cost function avoids over-fitting on monotone surfaces and makes full use of pose information by establishing a sphere-like distribution of descriptors in each category, which helps recognition on regular photos given the poses, lighting conditions, backgrounds and category information of the rendered images. First, our conjugate structure, called generative model with metric learning, uses additional foreground object channels generated from Bayesian rendering as the joint between the two sub-networks. A pose-based multi-triplet cost function is used for metric learning, which makes it possible to train a category classifier purely on synthetic data. Second, we design a coordinate training strategy that uses adaptive noise as corruption on the input images, so that both sub-networks benefit from each other and inharmonious parameter tuning caused by their different convergence speeds is avoided. Our structure achieves state-of-the-art accuracy of over 50% on the ShapeNet database despite the obstacle of migrating from synthetic images to real photos. This pipeline makes it possible to recognize objects in real images based only on 3D models.
Tasks	Bayesian Inference, Metric Learning, Object Recognition, Object Reconstruction
Published	2017-05-24
URL	http://arxiv.org/abs/1705.08590v2
PDF	http://arxiv.org/pdf/1705.08590v2.pdf
PWC	https://paperswithcode.com/paper/generative-model-with-coordinate-metric
Repo
Framework
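
A "multi-triplet" cost can be read as a triplet hinge evaluated against several negatives at once. The sketch below assumes that reading with squared Euclidean distances and a fixed margin. The pose-based choice of positives and negatives described in the abstract is not reproduced, and `multi_triplet_loss` is a hypothetical name.

```python
import numpy as np


def multi_triplet_loss(anchor, positive, negatives, margin=1.0):
    """Sum of triplet hinge terms, one per negative.

    anchor, positive: (D,) descriptors that should be close
    negatives:        (K, D) descriptors that should be at least `margin` farther away
    """
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((negatives - anchor) ** 2, axis=1)
    return np.maximum(0.0, d_pos - d_neg + margin).sum()
```

For example, a negative already far from the anchor contributes zero. Only negatives inside the margin produce a gradient.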

Improving Neural Machine Translation through Phrase-based Forced Decoding


Title	Improving Neural Machine Translation through Phrase-based Forced Decoding
Authors	Jingyi Zhang, Masao Utiyama, Eiichiro Sumita, Graham Neubig, Satoshi Nakamura
Abstract	Compared to traditional statistical machine translation (SMT), neural machine translation (NMT) often sacrifices adequacy for the sake of fluency. We propose a method to combine the advantages of traditional SMT and NMT by exploiting an existing phrase-based SMT model to compute the phrase-based decoding cost for an NMT output and then using this cost to rerank the n-best NMT outputs. The main challenge in implementing this approach is that NMT outputs may not be in the search space of the standard phrase-based decoding algorithm, because the search space of phrase-based SMT is limited by the phrase-based translation rule table. We propose a soft forced decoding algorithm, which can always successfully find a decoding path for any NMT output. We show that using the forced decoding cost to rerank the NMT outputs can successfully improve translation quality on four different language pairs.
Tasks	Machine Translation
Published	2017-11-01
URL	http://arxiv.org/abs/1711.00309v1
PDF	http://arxiv.org/pdf/1711.00309v1.pdf
PWC	https://paperswithcode.com/paper/improving-neural-machine-translation-through
Repo
Framework
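
The reranking step itself is simple once a forced-decoding cost is available. The sketch below assumes the final score is the NMT log-probability minus a weighted phrase-based cost, with lower cost meaning better. The soft forced-decoding algorithm is not implemented; `forced_cost` is a hypothetical stand-in callable, and the linear combination and `weight` are our assumptions.

```python
def rerank(nbest, forced_cost, weight=1.0):
    """Rerank an NMT n-best list with a phrase-based forced-decoding cost.

    nbest:       list of (hypothesis, nmt_log_prob) pairs
    forced_cost: callable hypothesis -> cost (stand-in; lower is better)
    Returns hypotheses sorted from best to worst combined score.
    """
    scored = [(logp - weight * forced_cost(hyp), hyp) for hyp, logp in nbest]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [hyp for _, hyp in scored]
```

Usage: with `nbest = [("a b", -1.0), ("a c", -1.2)]` and costs `{"a b": 3.0, "a c": 1.0}`, the lower-cost "a c" moves to the top. With `weight=0.0` the NMT order is kept.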

Incremental Learning Through Deep Adaptation


Title	Incremental Learning Through Deep Adaptation
Authors	Amir Rosenfeld, John K. Tsotsos
Abstract	Given an existing trained neural network, it is often desirable to learn new capabilities without hindering performance of those already learned. Existing approaches either learn sub-optimal solutions, require joint training, or incur a substantial increment in the number of parameters for each added domain, typically as many as the original network. We propose a method called Deep Adaptation Networks (DAN) that constrains newly learned filters to be linear combinations of existing ones. DANs precisely preserve performance on the original domain, require a fraction (typically 13%, depending on the network architecture) of the number of parameters of standard fine-tuning procedures, and converge in fewer training cycles to a comparable or better level of performance. When coupled with standard network quantization techniques, we further reduce the parameter cost to around 3% of the original with negligible or no loss in accuracy. The learned architecture can be controlled to switch between various learned representations, enabling a single network to solve a task from multiple different domains. We conduct extensive experiments showing the effectiveness of our method on a range of image classification tasks and explore different aspects of its behavior.
Tasks	Image Classification, Quantization
Published	2017-05-11
URL	http://arxiv.org/abs/1705.04228v2
PDF	http://arxiv.org/pdf/1705.04228v2.pdf
PWC	https://paperswithcode.com/paper/incremental-learning-through-deep-adaptation
Repo
Framework
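
The core constraint, new filters as linear combinations of existing ones, fits in a few lines. This is a minimal sketch assuming a frozen convolution weight tensor of shape (out, in, k, k) and a learned mixing matrix `alpha`. Biases and the switching mechanism are omitted, and the function names are hypothetical.

```python
import numpy as np


def dan_filters(base_weights, alpha):
    """New conv filters as linear combinations of frozen existing filters.

    base_weights: (out, in, kh, kw) frozen weights of the original layer
    alpha:        (new_out, out) learned combination coefficients
    """
    out_ch, in_ch, kh, kw = base_weights.shape
    flat = base_weights.reshape(out_ch, -1)  # one row per existing filter
    return (alpha @ flat).reshape(alpha.shape[0], in_ch, kh, kw)


def dan_param_ratio(out_ch, in_ch, k):
    """Learned parameters (alpha, new_out == out) relative to a full new layer."""
    return (out_ch * out_ch) / (out_ch * in_ch * k * k)
```

For a 3x3 layer with equal input and output channels, the ratio is 1/9, about 11%. The abstract's "typically 13%" refers to whole networks and depends on the architecture.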

Image Analysis Using a Dual-Tree $M$-Band Wavelet Transform


Title	Image Analysis Using a Dual-Tree $M$-Band Wavelet Transform
Authors	Caroline Chaux, Laurent Duval, Jean-Christophe Pesquet
Abstract	We propose a 2D generalization to the $M$-band case of the dual-tree decomposition structure (initially proposed by N. Kingsbury and further investigated by I. Selesnick) based on a Hilbert pair of wavelets. In particular, we address (i) the construction of the dual basis and (ii) the resulting directional analysis. We also revisit the necessary pre-processing stage in the $M$-band case. While several reconstructions are possible because of the redundancy of the representation, we propose a new optimal signal reconstruction technique, which minimizes potential estimation errors. The effectiveness of the proposed $M$-band decomposition is demonstrated via denoising comparisons on several image types (natural, texture, seismics), with various $M$-band wavelets and thresholding strategies. Significant improvements in terms of both overall noise reduction and direction preservation are observed.
Tasks	Denoising
Published	2017-02-27
URL	http://arxiv.org/abs/1702.08534v1
PDF	http://arxiv.org/pdf/1702.08534v1.pdf
PWC	https://paperswithcode.com/paper/image-analysis-using-a-dual-tree-m-band
Repo
Framework

Journalists’ information needs, seeking behavior, and its determinants on social media

Title	Journalists’ information needs, seeking behavior, and its determinants on social media
Authors	Omid Aghili, Mark Sanderson
Abstract	We describe the results of a qualitative study on journalists’ information seeking behavior on social media. Based on interviews with eleven journalists along with a study of a set of university-level journalism modules, we determined the categories of information need types that lead journalists to social media. We also determined the ways in which social media is exploited as a tool to satisfy information needs, and identified influential factors that affect journalists’ information seeking behavior. We find that social media is not only used as an information source but can also be a supplier of stories found serendipitously. We find seven information need types that expand the types found in previous work. We also find five categories of influential factors that affect the way journalists seek information.
Tasks
Published	2017-05-24
URL	https://arxiv.org/abs/1705.08598v3
PDF	https://arxiv.org/pdf/1705.08598v3.pdf
PWC	https://paperswithcode.com/paper/journalists-information-needs-seeking
Repo
Framework

GridNet with automatic shape prior registration for automatic MRI cardiac segmentation


Title	GridNet with automatic shape prior registration for automatic MRI cardiac segmentation
Authors	Clement Zotti, Zhiming Luo, Alain Lalande, Olivier Humbert, Pierre-Marc Jodoin
Abstract	In this paper, we propose a fully automatic MRI cardiac segmentation method based on a novel deep convolutional neural network (CNN) designed for the 2017 ACDC MICCAI challenge. The novelty of our network comes with its embedded shape prior and its loss function tailored to the cardiac anatomy. Our model includes a cardiac center-of-mass regression module which allows for an automatic shape prior registration. Also, since our method processes raw MR images without any manual preprocessing and/or image cropping, our CNN learns both high-level features (useful to distinguish the heart from other organs with a similar shape) and low-level features (useful to get accurate segmentation results). Those features are learned with a multi-resolution conv-deconv “grid” architecture which can be seen as an extension of the U-Net. Experimental results reveal that our method can segment the left and right ventricles as well as the myocardium from a 3D MRI cardiac volume in 0.4 seconds with an average Dice coefficient of 0.90 and an average Hausdorff distance of 10.4 mm.
Tasks	Cardiac Segmentation, Image Cropping
Published	2017-05-24
URL	http://arxiv.org/abs/1705.08943v2
PDF	http://arxiv.org/pdf/1705.08943v2.pdf
PWC	https://paperswithcode.com/paper/gridnet-with-automatic-shape-prior
Repo
Framework
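
The abstract links a center-of-mass regression module to automatic shape prior registration. A minimal sketch of the registration step alone, assuming the regressed center is already available and that registration is a pure integer translation of a 2D prior map. `np.roll` wraps around the borders, so a real implementation would likely pad instead. The function names are hypothetical.

```python
import numpy as np


def center_of_mass(prob):
    """(row, col) center of mass of a non-negative 2D map."""
    total = prob.sum()
    rows, cols = np.indices(prob.shape)
    return np.array([(rows * prob).sum() / total, (cols * prob).sum() / total])


def register_prior(prior, target_center):
    """Translate the shape prior so its center of mass lands on target_center.

    Uses an integer shift with wrap-around (np.roll), a simplification.
    """
    shift = np.rint(np.asarray(target_center) - center_of_mass(prior)).astype(int)
    return np.roll(prior, shift=tuple(shift), axis=(0, 1))
```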

Prepositions in Context


Title	Prepositions in Context
Authors	Hongyu Gong, Jiaqi Mu, Suma Bhat, Pramod Viswanath
Abstract	Prepositions are highly polysemous, and their variegated senses encode significant semantic information. In this paper we relate each preposition’s complement and attachment, and crucially their interplay, to the geometry of the word vectors to the left and right of the preposition. Extracting such features from the vast number of instances of each preposition and clustering them yields an efficient preposition sense disambiguation (PSD) algorithm, which is comparable to or better than the state of the art on two benchmark datasets. Because we rely on no external linguistic resource, we can scale the PSD algorithm to a large WikiCorpus and learn sense-specific preposition representations, which we show to encode semantic relations and paraphrasing of verb particle compounds via simple vector operations.
Tasks
Published	2017-02-05
URL	http://arxiv.org/abs/1702.01466v1
PDF	http://arxiv.org/pdf/1702.01466v1.pdf
PWC	https://paperswithcode.com/paper/prepositions-in-context
Repo
Framework
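
The "extract context features, then cluster" idea can be illustrated with a simplified stand-in. Each preposition instance is represented by the averaged word vectors to its left concatenated with those to its right, and the instances are grouped with k-means. The averaging, the k-means clusterer with deterministic farthest-first initialization, and the helper names are our assumptions; the paper's exact features and clustering are not reproduced.

```python
import numpy as np


def context_feature(left_vecs, right_vecs):
    """Concatenate mean left-context and mean right-context word vectors."""
    return np.concatenate([np.mean(left_vecs, axis=0), np.mean(right_vecs, axis=0)])


def kmeans(X, k, n_iter=50):
    """Plain k-means with deterministic farthest-first initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        new = np.array([X[labels == j].mean(0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels  # cluster index = induced sense id per instance
```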

Deep Convolutional Neural Network Inference with Floating-point Weights and Fixed-point Activations


Title	Deep Convolutional Neural Network Inference with Floating-point Weights and Fixed-point Activations
Authors	Liangzhen Lai, Naveen Suda, Vikas Chandra
Abstract	Deep convolutional neural network (CNN) inference requires a significant amount of memory and computation, which limits its deployment on embedded devices. To alleviate these problems to some extent, prior research has used low-precision fixed-point numbers to represent CNN weights and activations. However, the minimum required data precision of fixed-point weights varies across different networks and also across different layers of the same network. In this work, we propose using floating-point numbers for representing the weights and fixed-point numbers for representing the activations. We show that using floating-point representation for weights is more efficient than fixed-point representation for the same bit-width and demonstrate it on popular large-scale CNNs such as AlexNet, SqueezeNet, GoogLeNet and VGG-16. We also show that such a representation scheme enables compact hardware multiply-and-accumulate (MAC) unit design. Experimental results show that the proposed scheme reduces the weight storage by up to 36% and power consumption of the hardware multiplier by up to 50%.
Tasks
Published	2017-03-08
URL	http://arxiv.org/abs/1703.03073v1
PDF	http://arxiv.org/pdf/1703.03073v1.pdf
PWC	https://paperswithcode.com/paper/deep-convolutional-neural-network-inference
Repo
Framework
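
The sketch below compares fixed-point and floating-point quantization of weights at the same bit-width. It uses a generic signed fixed-point quantizer and an IEEE-like minifloat quantizer (sign, `exp_bits` exponent bits, `man_bits` mantissa bits, subnormal-style spacing below the smallest exponent, largest exponent `exp_max`), both at 6 bits. These formats and parameter names are our assumptions, not the exact representation used in the paper.

```python
import numpy as np


def quantize_fixed(w, bits, frac_bits):
    """Signed fixed point: integers in [-2^(bits-1), 2^(bits-1)-1] times 2^-frac_bits."""
    scale = 2.0 ** frac_bits
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return np.clip(np.round(np.asarray(w, dtype=float) * scale), lo, hi) / scale


def quantize_float(w, exp_bits, man_bits, exp_max=0):
    """Minifloat rounding: sign + exp_bits exponent + man_bits mantissa."""
    w = np.asarray(w, dtype=float)
    exp_min = exp_max - 2 ** exp_bits + 1
    mag = np.abs(w)
    e = np.clip(np.floor(np.log2(np.maximum(mag, 1e-30))), exp_min, exp_max)
    step = 2.0 ** (e - man_bits)  # spacing of representable values in this binade
    q = np.round(mag / step) * step
    q = np.minimum(q, (2 - 2.0 ** -man_bits) * 2.0 ** exp_max)  # saturate at max
    return np.sign(w) * q
```

With 6 bits in both cases, a small weight such as 0.01 survives as 0.009765625 in the minifloat (`exp_bits=3`, `man_bits=2`) but rounds to 0 in fixed point with 4 fractional bits. This illustrates why the wide dynamic range of weights can favour a floating-point format.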