July 26, 2019

3207 words 16 mins read

Paper Group ANR 781

S4Net: Single Stage Salient-Instance Segmentation. Detecting and Grouping Identical Objects for Region Proposal and Classification. Strictly Proper Kernel Scoring Rules and Divergences with an Application to Kernel Two-Sample Hypothesis Testing. Integral Transforms from Finite Data: An Application of Gaussian Process Regression to Fourier Analysis. …

S4Net: Single Stage Salient-Instance Segmentation

Title S4Net: Single Stage Salient-Instance Segmentation
Authors Ruochen Fan, Ming-Ming Cheng, Qibin Hou, Tai-Jiang Mu, Jingdong Wang, Shi-Min Hu
Abstract In this paper, we consider an interesting problem: salient instance segmentation. In addition to producing bounding boxes, our network also outputs high-quality instance-level segments. Taking into account the category-independent property of each target, we design a single-stage salient instance segmentation framework with a novel segmentation branch. Our new branch regards not only the local context inside each detection window but also its surrounding context, enabling us to distinguish instances in the same scope even with occlusion. Our network is end-to-end trainable and runs at a fast speed (40 fps when processing an image with resolution 320x320). We evaluate our approach on a publicly available benchmark and show that it outperforms other alternative solutions. We also provide a thorough analysis of the design choices to help readers better understand the functions of each part of our network. The source code can be found at \url{https://github.com/RuochenFan/S4Net}.
Tasks Instance Segmentation, Semantic Segmentation
Published 2017-11-21
URL http://arxiv.org/abs/1711.07618v2
PDF http://arxiv.org/pdf/1711.07618v2.pdf
PWC https://paperswithcode.com/paper/s4net-single-stage-salient-instance
Repo
Framework

Detecting and Grouping Identical Objects for Region Proposal and Classification

Title Detecting and Grouping Identical Objects for Region Proposal and Classification
Authors Wim Abbeloos, Sergio Caccamo, Esra Ataer-Cansizoglu, Yuichi Taguchi, Chen Feng, Teng-Yok Lee
Abstract Often multiple instances of an object occur in the same scene, for example in a warehouse. Unsupervised multi-instance object discovery algorithms are able to detect and identify such objects. We use such an algorithm to provide object proposals to a convolutional neural network (CNN) based classifier. This results in fewer regions to evaluate, compared to traditional region proposal algorithms. Additionally, it enables using the joint probability of multiple instances of an object, resulting in improved classification accuracy. The proposed technique can also split a single class into multiple sub-classes corresponding to the different object types, enabling hierarchical classification.
Tasks
Published 2017-07-23
URL http://arxiv.org/abs/1707.07255v1
PDF http://arxiv.org/pdf/1707.07255v1.pdf
PWC https://paperswithcode.com/paper/detecting-and-grouping-identical-objects-for
Repo
Framework
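
The joint-probability idea described in the abstract can be illustrated with a small sketch. This is not the paper's implementation; it simply combines per-instance softmax scores under an independence assumption (my assumption, for illustration) to show why grouping identical instances sharpens the classification decision.

```python
import numpy as np

def group_class_posterior(instance_probs):
    """Combine per-instance class posteriors for a group of detections
    believed to show the same object.

    instance_probs: array of shape (n_instances, n_classes), each row a
    softmax output from the CNN classifier.

    Assuming (conditionally) independent predictions, the joint
    log-probability of a class is the sum of per-instance log-probabilities;
    renormalizing gives a sharper posterior for the whole group.
    """
    log_joint = np.log(np.clip(instance_probs, 1e-12, 1.0)).sum(axis=0)
    log_joint -= log_joint.max()              # numerical stability
    joint = np.exp(log_joint)
    return joint / joint.sum()

# Three detections of the same item, each individually ambiguous:
probs = np.array([[0.50, 0.40, 0.10],
                  [0.45, 0.35, 0.20],
                  [0.55, 0.30, 0.15]])
print(group_class_posterior(probs))   # class 0 now clearly dominates
```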

Strictly Proper Kernel Scoring Rules and Divergences with an Application to Kernel Two-Sample Hypothesis Testing

Title Strictly Proper Kernel Scoring Rules and Divergences with an Application to Kernel Two-Sample Hypothesis Testing
Authors Hamed Masnadi-Shirazi
Abstract We study strictly proper scoring rules in the Reproducing Kernel Hilbert Space. We propose a general Kernel Scoring rule and associated Kernel Divergence. We consider conditions under which the Kernel Score is strictly proper. We then demonstrate that the Kernel Score includes the Maximum Mean Discrepancy as a special case. We also consider the connections between the Kernel Score and the minimum risk of a proper loss function. We show that the Kernel Score incorporates more information pertaining to the projected embedded distributions compared to the Maximum Mean Discrepancy. Finally, we show how to integrate the information provided from different Kernel Divergences, such as the proposed Bhattacharyya Kernel Divergence, using a one-class classifier for improved two-sample hypothesis testing results.
Tasks One-class classifier
Published 2017-04-09
URL http://arxiv.org/abs/1704.02578v2
PDF http://arxiv.org/pdf/1704.02578v2.pdf
PWC https://paperswithcode.com/paper/strictly-proper-kernel-scoring-rules-and
Repo
Framework
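
The abstract notes that the Kernel Score includes the Maximum Mean Discrepancy as a special case. Below is a minimal sketch of the standard unbiased MMD^2 estimator with an RBF kernel, the quantity used in kernel two-sample testing; it illustrates only that special case, not the paper's general Kernel Score, Kernel Divergence, or the proposed Bhattacharyya Kernel Divergence.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """k(x, y) = exp(-gamma * ||x - y||^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2_unbiased(X, Y, gamma=1.0):
    """Unbiased estimate of MMD^2 between samples X ~ P and Y ~ Q."""
    m, n = len(X), len(Y)
    Kxx = rbf_kernel(X, X, gamma)
    Kyy = rbf_kernel(Y, Y, gamma)
    Kxy = rbf_kernel(X, Y, gamma)
    # Drop diagonal terms for the unbiased estimator.
    term_xx = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_yy = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    term_xy = 2.0 * Kxy.mean()
    return term_xx + term_yy - term_xy

rng = np.random.default_rng(0)
X  = rng.normal(0.0, 1.0, size=(200, 2))
X2 = rng.normal(0.0, 1.0, size=(200, 2))
Y  = rng.normal(0.5, 1.0, size=(200, 2))
print(mmd2_unbiased(X, X2, gamma=0.5))  # near zero: same distribution
print(mmd2_unbiased(X, Y, gamma=0.5))   # clearly positive: distributions differ
```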

Integral Transforms from Finite Data: An Application of Gaussian Process Regression to Fourier Analysis

Title Integral Transforms from Finite Data: An Application of Gaussian Process Regression to Fourier Analysis
Authors Luca Ambrogioni, Eric Maris
Abstract Computing accurate estimates of the Fourier transform of analog signals from discrete data points is important in many fields of science and engineering. The conventional approach of performing the discrete Fourier transform of the data implicitly assumes periodicity and bandlimitedness of the signal. In this paper, we use Gaussian process regression to estimate the Fourier transform (or any other integral transform) without making these assumptions. This is possible because the posterior expectation of Gaussian process regression maps a finite set of samples to a function defined on the whole real line, expressed as a linear combination of covariance functions. We estimate the covariance function from the data using an appropriately designed gradient ascent method that constrains the solution to a linear combination of tractable kernel functions. This procedure results in a posterior expectation of the analog signal whose Fourier transform can be obtained analytically by exploiting linearity. Our simulations show that the new method leads to sharper and more precise estimation of the spectral density both in noise-free and noise-corrupted signals. We further validate the method in two real-world applications: the analysis of the yearly fluctuation in atmospheric CO2 level and the analysis of the spectral content of brain signals.
Tasks
Published 2017-04-10
URL http://arxiv.org/abs/1704.02828v2
PDF http://arxiv.org/pdf/1704.02828v2.pdf
PWC https://paperswithcode.com/paper/integral-transforms-from-finite-data-an
Repo
Framework
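
Because the GP posterior mean is a linear combination of covariance functions, its Fourier transform follows analytically by linearity, as the abstract describes. The sketch below works this out for a squared-exponential kernel; the fixed length-scale and noise level stand in for the gradient-ascent hyperparameter fit described in the abstract.

```python
import numpy as np

def gp_fourier_estimate(x, y, freqs, ell=1.0, noise=0.1):
    """Analytic Fourier transform of a GP posterior mean (squared-exponential kernel).

    The posterior mean is m(t) = sum_i alpha_i * k(t, x_i) with
    k(t, t') = exp(-(t - t')^2 / (2 ell^2)), so by linearity its transform is
    F(f) = ell * sqrt(2*pi) * exp(-2*pi^2*ell^2*f^2) * sum_i alpha_i * exp(-2j*pi*f*x_i).
    """
    K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * ell ** 2))
    alpha = np.linalg.solve(K + noise ** 2 * np.eye(len(x)), y)
    kernel_ft = ell * np.sqrt(2 * np.pi) * np.exp(-2 * np.pi ** 2 * ell ** 2 * freqs ** 2)
    phases = np.exp(-2j * np.pi * np.outer(freqs, x))
    return kernel_ft * (phases @ alpha)

# Irregularly sampled 2 Hz sinusoid: the spectral estimate should peak near f = 2.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 10, 80))
y = np.sin(2 * np.pi * 2.0 * x) + 0.05 * rng.normal(size=x.size)
freqs = np.linspace(0, 5, 501)
F = gp_fourier_estimate(x, y, freqs, ell=0.2, noise=0.05)
print(freqs[np.argmax(np.abs(F))])   # should be close to 2.0
```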

DR-Net: Transmission Steered Single Image Dehazing Network with Weakly Supervised Refinement

Title DR-Net: Transmission Steered Single Image Dehazing Network with Weakly Supervised Refinement
Authors Chongyi Li, Jichang Guo, Fatih Porikli, Chunle Guo, Huzhu Fu, Xi Li
Abstract Despite the recent progress in image dehazing, several problems remain largely unsolved, such as robustness to varying scenes, the visual quality of reconstructed images, and effectiveness and flexibility for applications. To tackle these problems, we propose a new deep network architecture for single image dehazing called DR-Net. Our model consists of three main subnetworks: a transmission prediction network that predicts the transmission map for the input image, a haze removal network that reconstructs the latent image steered by the transmission map, and a refinement network that enhances the details and color properties of the dehazed result via weakly supervised learning. Compared to previous methods, our method advances in three aspects: (i) a purely data-driven model; (ii) an end-to-end system; (iii) superior robustness, accuracy, and applicability. Extensive experiments demonstrate that our DR-Net outperforms the state-of-the-art methods on both synthetic and real images in qualitative and quantitative metrics. Additionally, the utility of DR-Net has been illustrated by its potential usage in several important computer vision tasks.
Tasks Image Dehazing, Single Image Dehazing
Published 2017-12-02
URL http://arxiv.org/abs/1712.00621v1
PDF http://arxiv.org/pdf/1712.00621v1.pdf
PWC https://paperswithcode.com/paper/dr-net-transmission-steered-single-image
Repo
Framework
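
A heavily simplified sketch of the three-subnetwork pipeline described above, written in PyTorch. The layer sizes, the way the transmission map is fed to the haze removal network, and the residual refinement are placeholders of mine, not the paper's architecture; the point is only to show the transmission -> removal -> refinement composition.

```python
import torch
import torch.nn as nn

class TinyDRNet(nn.Module):
    """Toy stand-in for DR-Net's three subnetworks: transmission prediction,
    transmission-steered haze removal, and refinement. Not the paper's layers."""
    def __init__(self):
        super().__init__()
        self.trans_net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                       nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())
        self.dehaze_net = nn.Sequential(nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
                                        nn.Conv2d(16, 3, 3, padding=1))
        self.refine_net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                        nn.Conv2d(16, 3, 3, padding=1))

    def forward(self, hazy):
        t = self.trans_net(hazy)                           # transmission map
        coarse = self.dehaze_net(torch.cat([hazy, t], 1))  # removal steered by t
        return self.refine_net(coarse) + coarse            # residual refinement

x = torch.rand(1, 3, 64, 64)
print(TinyDRNet()(x).shape)   # torch.Size([1, 3, 64, 64])
```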

Approximation Algorithms for $\ell_0$-Low Rank Approximation

Title Approximation Algorithms for $\ell_0$-Low Rank Approximation
Authors Karl Bringmann, Pavel Kolev, David P. Woodruff
Abstract We study the $\ell_0$-Low Rank Approximation Problem, where the goal is, given an $m \times n$ matrix $A$, to output a rank-$k$ matrix $A'$ for which $\|A'-A\|_0$ is minimized. Here, for a matrix $B$, $\|B\|_0$ denotes the number of its non-zero entries. This NP-hard variant of low rank approximation is natural for problems with no underlying metric, and its goal is to minimize the number of disagreeing data positions. We provide approximation algorithms which significantly improve the running time and approximation factor of previous work. For $k > 1$, we show how to find, in poly$(mn)$ time for every $k$, a rank $O(k \log(n/k))$ matrix $A'$ for which $\|A'-A\|_0 \leq O(k^2 \log(n/k)) \mathrm{OPT}$. To the best of our knowledge, this is the first algorithm with provable guarantees for the $\ell_0$-Low Rank Approximation Problem for $k > 1$, even for bicriteria algorithms. For the well-studied case when $k = 1$, we give a $(2+\epsilon)$-approximation in {\it sublinear time}, which is impossible for other variants of low rank approximation such as for the Frobenius norm. We strengthen this for the well-studied case of binary matrices to obtain a $(1+O(\psi))$-approximation in sublinear time, where $\psi = \mathrm{OPT}/\lVert A\rVert_0$. For small $\psi$, our approximation factor is $1+o(1)$.
Tasks
Published 2017-10-30
URL http://arxiv.org/abs/1710.11253v2
PDF http://arxiv.org/pdf/1710.11253v2.pdf
PWC https://paperswithcode.com/paper/approximation-algorithms-for-ell_0-low-rank
Repo
Framework
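
To make the $\ell_0$ objective concrete, the sketch below exhaustively finds the best rank-1 binary approximation of a tiny binary matrix, i.e. the quantity OPT that the paper's algorithms approximate. This brute force is exponential in the number of rows and is unrelated to the paper's polynomial- and sublinear-time algorithms; it only illustrates the objective.

```python
import itertools
import numpy as np

def best_rank1_binary(A):
    """Exhaustive search for the best rank-1 binary approximation u v^T of a
    small binary matrix A under the l0 (number of disagreements) objective."""
    m, n = A.shape
    best = (int(A.sum()) + 1, None, None)   # worse than any candidate
    for u in itertools.product([0, 1], repeat=m):
        u = np.array(u)
        # For fixed u, each column j is optimized independently:
        # v_j = 0 costs nnz(A[:, j]); v_j = 1 costs Hamming(A[:, j], u).
        cost0 = A.sum(axis=0)
        cost1 = np.abs(A - u[:, None]).sum(axis=0)
        v = (cost1 < cost0).astype(int)
        err = int(np.minimum(cost0, cost1).sum())
        if err < best[0]:
            best = (err, u, v)
    return best

A = np.array([[1, 1, 0, 1],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
err, u, v = best_rank1_binary(A)
print(err, u, v)   # 1 disagreement, with u = [1 1 0], v = [1 1 0 1]
```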

Image Dehazing using Bilinear Composition Loss Function

Title Image Dehazing using Bilinear Composition Loss Function
Authors Hui Yang, Jinshan Pan, Qiong Yan, Wenxiu Sun, Jimmy Ren, Yu-Wing Tai
Abstract In this paper, we introduce a bilinear composition loss function to address the problem of image dehazing. Previous methods in image dehazing use a two-stage approach which first estimates the transmission map, followed by clear image estimation. The drawback of a two-stage method is that it tends to boost local image artifacts such as noise, aliasing and blocking. This is especially the case for heavy haze images captured with a low-quality device. Our method is based on convolutional neural networks. Unique in our method is the bilinear composition loss function, which directly models the correlations between the transmission map, the clear image, and the atmospheric light. This allows errors to be back-propagated to each sub-network concurrently, while maintaining the composition constraint to avoid overfitting of each sub-network. We evaluate the effectiveness of our proposed method using both synthetic and real-world examples. Extensive experiments show that our method outperforms state-of-the-art methods, especially for haze images with severe noise levels and compression.
Tasks Image Dehazing
Published 2017-10-01
URL http://arxiv.org/abs/1710.00279v1
PDF http://arxiv.org/pdf/1710.00279v1.pdf
PWC https://paperswithcode.com/paper/image-dehazing-using-bilinear-composition
Repo
Framework
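
The bilinear composition loss couples the predicted transmission map, clear image, and atmospheric light through the standard haze formation model $I = J t + A(1-t)$. A minimal sketch of that composition term follows; the shapes, the L2 form, and the absence of per-component weighting are illustrative choices of mine, not details taken from the paper.

```python
import numpy as np

def bilinear_composition_loss(I, J_pred, t_pred, A_pred):
    """L2 loss between the observed hazy image and its recomposition from the
    predicted clear image J, transmission map t and atmospheric light A,
    following the haze formation model I = J * t + A * (1 - t).
    Shapes (illustrative): I, J_pred are (H, W, 3); t_pred is (H, W, 1);
    A_pred is (3,)."""
    I_recon = J_pred * t_pred + A_pred * (1.0 - t_pred)
    return float(np.mean((I - I_recon) ** 2))

# Toy check: if the predictions exactly satisfy the model, the loss is zero.
H, W = 4, 4
J = np.random.rand(H, W, 3)
t = np.random.rand(H, W, 1) * 0.8 + 0.2
A = np.array([0.9, 0.9, 0.9])
I = J * t + A * (1.0 - t)
print(bilinear_composition_loss(I, J, t, A))   # 0.0
```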

Disentangling Factors of Variation by Mixing Them

Title Disentangling Factors of Variation by Mixing Them
Authors Qiyang Hu, Attila Szabó, Tiziano Portenier, Matthias Zwicker, Paolo Favaro
Abstract We propose an approach to learn image representations that consist of disentangled factors of variation without exploiting any manual labeling or data domain knowledge. A factor of variation corresponds to an image attribute that can be discerned consistently across a set of images, such as the pose or color of objects. Our disentangled representation consists of a concatenation of feature chunks, each chunk representing a factor of variation. It supports applications such as transferring attributes from one image to another, by simply mixing and unmixing feature chunks, and classification or retrieval based on one or several attributes, by considering a user-specified subset of feature chunks. We learn our representation without any labeling or knowledge of the data domain, using an autoencoder architecture with two novel training objectives: first, we propose an invariance objective to encourage that encoding of each attribute, and decoding of each chunk, are invariant to changes in other attributes and chunks, respectively; second, we include a classification objective, which ensures that each chunk corresponds to a consistently discernible attribute in the represented image, hence avoiding degenerate feature mappings where some chunks are completely ignored. We demonstrate the effectiveness of our approach on the MNIST, Sprites, and CelebA datasets.
Tasks
Published 2017-11-20
URL http://arxiv.org/abs/1711.07410v2
PDF http://arxiv.org/pdf/1711.07410v2.pdf
PWC https://paperswithcode.com/paper/disentangling-factors-of-variation-by-mixing
Repo
Framework
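
Attribute transfer in this model amounts to swapping feature chunks between two encodings before decoding. The sketch below shows only that chunk-mixing step on plain arrays; the encoder, decoder, and the two training objectives are not reproduced.

```python
import numpy as np

def mix_chunks(z_a, z_b, chunk_size, swap_idx):
    """Swap selected feature chunks of two encodings.

    z_a, z_b: 1-D feature vectors whose length is a multiple of chunk_size,
    interpreted as a concatenation of chunks (one per factor of variation).
    swap_idx: indices of the chunks to take from z_b instead of z_a.
    Decoding the result would transfer those attributes from image b to image a.
    """
    z_mixed = z_a.copy()
    for i in swap_idx:
        sl = slice(i * chunk_size, (i + 1) * chunk_size)
        z_mixed[sl] = z_b[sl]
    return z_mixed

z_a = np.zeros(12)        # 3 chunks of size 4, all from image a
z_b = np.ones(12)
print(mix_chunks(z_a, z_b, chunk_size=4, swap_idx=[1]))  # middle chunk comes from b
```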

Using of heterogeneous corpora for training of an ASR system

Title Using of heterogeneous corpora for training of an ASR system
Authors Jan Trmal, Gaurav Kumar, Vimal Manohar, Sanjeev Khudanpur, Matt Post, Paul McNamee
Abstract The paper summarizes the development of the LVCSR system built as a part of the Pashto speech-translation system at the SCALE (Summer Camp for Applied Language Exploration) 2015 workshop on “Speech-to-text-translation for low-resource languages”. The Pashto language was chosen as a good “proxy” low-resource language, exhibiting multiple phenomena which make the development of speech-recognition and speech-to-text-translation systems hard. Even when the amount of data is seemingly sufficient, given that the data originates from multiple sources, the preliminary experiments reveal that there is little to no benefit in merging (concatenating) the corpora, and more elaborate ways of making use of all of the data must be worked out. This paper concentrates only on the LVCSR part and presents a range of different techniques that were found to be useful in order to benefit from multiple different corpora.
Tasks Large Vocabulary Continuous Speech Recognition, Speech Recognition
Published 2017-06-01
URL http://arxiv.org/abs/1706.00321v1
PDF http://arxiv.org/pdf/1706.00321v1.pdf
PWC https://paperswithcode.com/paper/using-of-heterogeneous-corpora-for-training
Repo
Framework

First-Order Adaptive Sample Size Methods to Reduce Complexity of Empirical Risk Minimization

Title First-Order Adaptive Sample Size Methods to Reduce Complexity of Empirical Risk Minimization
Authors Aryan Mokhtari, Alejandro Ribeiro
Abstract This paper studies empirical risk minimization (ERM) problems for large-scale datasets and incorporates the idea of adaptive sample size methods to improve the guaranteed convergence bounds for first-order stochastic and deterministic methods. In contrast to traditional methods that attempt to solve the ERM problem corresponding to the full dataset directly, adaptive sample size schemes start with a small number of samples and solve the corresponding ERM problem to its statistical accuracy. The sample size is then grown geometrically – e.g., scaled by a factor of two – and the solution of the previous ERM is used as a warm start for the new ERM. Theoretical analyses show that the use of adaptive sample size methods reduces the overall computational cost of achieving the statistical accuracy of the whole dataset for a broad range of deterministic and stochastic first-order methods. The gains are specific to the choice of method. When particularized to, e.g., accelerated gradient descent and stochastic variance reduced gradient, the computational cost advantage is a logarithm of the number of training samples. Numerical experiments on various datasets confirm theoretical claims and showcase the gains of using the proposed adaptive sample size scheme.
Tasks
Published 2017-09-02
URL http://arxiv.org/abs/1709.00599v1
PDF http://arxiv.org/pdf/1709.00599v1.pdf
PWC https://paperswithcode.com/paper/first-order-adaptive-sample-size-methods-to
Repo
Framework
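
A minimal sketch of the adaptive sample size scheme on $\ell_2$-regularized logistic regression with plain gradient descent. The solver, the fixed number of inner steps (in place of the statistical-accuracy stopping rule), and the growth factor are illustrative choices; the paper analyzes a broad family of first-order methods.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_logistic(w, X, y, lam):
    """Gradient of l2-regularized logistic loss, labels y in {-1, +1}."""
    margins = y * (X @ w)
    return -(X * (y * sigmoid(-margins))[:, None]).mean(axis=0) + lam * w

def adaptive_sample_size_erm(X, y, n0=64, growth=2, lam=1e-3, lr=0.5, inner_steps=100):
    """Solve ERM on a growing subset of the data, warm-starting each stage
    with the previous solution."""
    n_total = X.shape[0]
    w = np.zeros(X.shape[1])
    n = n0
    while True:
        Xn, yn = X[:n], y[:n]
        for _ in range(inner_steps):
            w -= lr * grad_logistic(w, Xn, yn, lam)
        if n == n_total:
            return w
        n = min(growth * n, n_total)   # geometric growth of the sample size

rng = np.random.default_rng(0)
X = rng.normal(size=(4096, 10))
w_true = rng.normal(size=10)
y = np.sign(X @ w_true + 0.1 * rng.normal(size=4096))
w = adaptive_sample_size_erm(X, y)
print(np.mean(np.sign(X @ w) == y))   # training accuracy, should be high
```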

Beyond Low Rank: A Data-Adaptive Tensor Completion Method

Title Beyond Low Rank: A Data-Adaptive Tensor Completion Method
Authors Lei Zhang, Wei Wei, Qinfeng Shi, Chunhua Shen, Anton van den Hengel, Yanning Zhang
Abstract Low rank tensor representation underpins much of recent progress in tensor completion. In real applications, however, this approach is confronted with two challenging problems, namely (1) tensor rank determination; (2) handling real tensor data which only approximately fulfils the low-rank requirement. To address these two issues, we develop a data-adaptive tensor completion model which explicitly represents both the low-rank and non-low-rank structures in a latent tensor. Representing the non-low-rank structure separately from the low-rank one allows priors which capture the important distinctions between the two, thus enabling more accurate modelling, and ultimately, completion. Through defining a new tensor rank, we develop a sparsity induced prior for the low-rank structure, with which the tensor rank can be automatically determined. The prior for the non-low-rank structure is established based on a mixture of Gaussians which is shown to be flexible enough, and powerful enough, to inform the completion process for a variety of real tensor data. With these two priors, we develop a Bayesian minimum mean squared error estimate (MMSE) framework for inference which provides the posterior mean of missing entries as well as their uncertainty. Compared with the state-of-the-art methods in various applications, the proposed model produces more accurate completion results.
Tasks
Published 2017-08-03
URL http://arxiv.org/abs/1708.01008v1
PDF http://arxiv.org/pdf/1708.01008v1.pdf
PWC https://paperswithcode.com/paper/beyond-low-rank-a-data-adaptive-tensor
Repo
Framework

Colors in Context: A Pragmatic Neural Model for Grounded Language Understanding

Title Colors in Context: A Pragmatic Neural Model for Grounded Language Understanding
Authors Will Monroe, Robert X. D. Hawkins, Noah D. Goodman, Christopher Potts
Abstract We present a model of pragmatic referring expression interpretation in a grounded communication task (identifying colors from descriptions) that draws upon predictions from two recurrent neural network classifiers, a speaker and a listener, unified by a recursive pragmatic reasoning framework. Experiments show that this combined pragmatic model interprets color descriptions more accurately than the classifiers from which it is built, and that much of this improvement results from combining the speaker and listener perspectives. We observe that pragmatic reasoning helps primarily in the hardest cases: when the model must distinguish very similar colors, or when few utterances adequately express the target color. Our findings make use of a newly-collected corpus of human utterances in color reference games, which exhibit a variety of pragmatic behaviors. We also show that the embedded speaker model reproduces many of these pragmatic behaviors.
Tasks
Published 2017-03-29
URL http://arxiv.org/abs/1703.10186v2
PDF http://arxiv.org/pdf/1703.10186v2.pdf
PWC https://paperswithcode.com/paper/colors-in-context-a-pragmatic-neural-model
Repo
Framework
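
The recursive pragmatic reasoning step that unifies the speaker and listener can be sketched in the style of the Rational Speech Acts framework on a toy color-reference table. The hand-coded literal listener below stands in for the paper's recurrent neural classifiers; the numbers are invented for illustration.

```python
import numpy as np

# Toy literal listener: P_literal(color | utterance), rows = utterances.
utterances = ["blue", "teal"]
colors = ["navy", "teal-ish blue", "teal"]
L0 = np.array([[0.50, 0.40, 0.10],    # "blue" applies mostly to the first two colors
               [0.05, 0.45, 0.50]])   # "teal" applies mostly to the last two

def pragmatic_listener(L0, alpha=1.0, prior=None):
    """One round of recursive pragmatic reasoning (RSA-style): the speaker
    chooses utterances in proportion to how well the literal listener resolves
    them, and the pragmatic listener inverts that speaker."""
    prior = np.ones(L0.shape[1]) / L0.shape[1] if prior is None else prior
    S1 = L0 ** alpha
    S1 = S1 / S1.sum(axis=0, keepdims=True)        # P_speaker(utterance | color)
    L1 = S1 * prior[None, :]
    return L1 / L1.sum(axis=1, keepdims=True)      # P_pragmatic(color | utterance)

print(pragmatic_listener(L0)[0])  # "blue" now points more sharply to "navy"
```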

Land Cover Classification from Multi-temporal, Multi-spectral Remotely Sensed Imagery using Patch-Based Recurrent Neural Networks

Title Land Cover Classification from Multi-temporal, Multi-spectral Remotely Sensed Imagery using Patch-Based Recurrent Neural Networks
Authors Atharva Sharma, Xiuwen Liu, Xiaojun Yang
Abstract Sustainability of the global environment depends on accurate land cover information over large areas. Even with the increased number of satellite systems and sensors acquiring data with improved spectral, spatial, radiometric and temporal characteristics and the new data distribution policy, most existing land cover datasets were derived from a pixel-based, single-date, multi-spectral remotely sensed image with low accuracy. To improve the accuracy, the bottleneck is how to develop an accurate and effective image classification technique. By incorporating and utilizing the complete multi-spectral, multi-temporal and spatial information in remote sensing images and considering their inherent spatial and sequential interdependence, we propose a new patch-based RNN (PB-RNN) system tailored for multi-temporal remote sensing data. The system is designed by incorporating distinctive characteristics of multi-temporal remote sensing data. In particular, it uses multi-temporal-spectral-spatial samples and deals with pixels contaminated by clouds/shadow present in the multi-temporal data series. Using a Florida Everglades ecosystem study site covering an area of 771 square kilometers, the proposed PB-RNN system achieves a significant improvement in classification accuracy over a pixel-based RNN system, a pixel-based single-imagery NN system, a pixel-based multi-image NN system, a patch-based single-imagery NN system and a patch-based multi-image NN system. For example, the proposed system achieves 97.21% classification accuracy while a pixel-based single-imagery NN system achieves 64.74%. By utilizing methods like the proposed PB-RNN one, we believe that much more accurate land cover datasets can be produced over large areas efficiently.
Tasks Image Classification
Published 2017-08-02
URL http://arxiv.org/abs/1708.00813v1
PDF http://arxiv.org/pdf/1708.00813v1.pdf
PWC https://paperswithcode.com/paper/land-cover-classification-from-multi-temporal
Repo
Framework
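
A small sketch of how one multi-temporal-spectral-spatial patch sample might be assembled from a co-registered image stack, with cloud/shadow-contaminated dates masked out. The array layout and the masking strategy are assumptions of mine; the paper's actual handling of contaminated pixels inside the recurrent network is not reproduced.

```python
import numpy as np

def extract_patch_sequence(stack, row, col, patch=5, cloud_mask=None):
    """Build one multi-temporal-spectral-spatial training sample for a pixel.

    stack: array (T, bands, H, W) of co-registered multi-temporal imagery.
    Returns (T, bands, patch, patch) plus a per-date validity flag; dates
    marked as cloud/shadow-contaminated at this pixel are zeroed out.
    """
    r = patch // 2
    seq = stack[:, :, row - r:row + r + 1, col - r:col + r + 1].copy()
    valid = np.ones(stack.shape[0], dtype=bool)
    if cloud_mask is not None:                   # cloud_mask: (T, H, W) booleans
        valid = ~cloud_mask[:, row, col]
        seq[~valid] = 0.0
    return seq, valid

stack = np.random.rand(12, 6, 64, 64)            # 12 dates, 6 spectral bands
sample, valid = extract_patch_sequence(stack, row=32, col=32, patch=5)
print(sample.shape, valid.sum())                 # (12, 6, 5, 5) 12
```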

What Can I Do Now? Guiding Users in a World of Automated Decisions

Title What Can I Do Now? Guiding Users in a World of Automated Decisions
Authors Matthias Gallé
Abstract More and more of the processes governing our lives include an automatic decision step, where – based on a feature vector derived from an applicant – an algorithm has the decision power over the final outcome. Here we present a simple idea which gives some of the power back to the applicant by providing her with alternatives which would make the decision algorithm decide differently. It is based on a formalization reminiscent of methods used for evasion attacks, and consists of enumerating the subspaces where the classifier decides the desired output. This has been implemented for the specific case of decision forests (ensemble methods based on decision trees), mapping the problem to an iterative version of enumerating $k$-cliques.
Tasks
Published 2017-01-13
URL http://arxiv.org/abs/1701.03755v1
PDF http://arxiv.org/pdf/1701.03755v1.pdf
PWC https://paperswithcode.com/paper/what-can-i-do-now-guiding-users-in-a-world-of
Repo
Framework
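
For a single decision tree, enumerating the subspaces where the classifier decides the desired output amounts to collecting the feature-interval constraints along every root-to-leaf path that predicts that class. The sketch below does this with scikit-learn's tree internals; extending it to decision forests via the iterative $k$-clique enumeration described in the abstract is not shown.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

def desired_regions(tree, target_class):
    """Enumerate axis-aligned regions (feature intervals) in which a fitted
    DecisionTreeClassifier predicts `target_class`.

    Returns a list of dicts mapping feature index -> (lower, upper) bounds.
    """
    t = tree.tree_
    regions = []

    def walk(node, bounds):
        if t.children_left[node] == -1:                 # leaf node
            if np.argmax(t.value[node][0]) == target_class:
                regions.append(dict(bounds))
            return
        f, thr = t.feature[node], t.threshold[node]
        lo, hi = bounds.get(f, (-np.inf, np.inf))
        walk(t.children_left[node], {**bounds, f: (lo, min(hi, thr))})   # x[f] <= thr
        walk(t.children_right[node], {**bounds, f: (max(lo, thr), hi)})  # x[f] > thr

    walk(0, {})
    return regions

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
for region in desired_regions(clf, target_class=2):
    print(region)   # each dict is a region where the tree outputs class 2
```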

Fine-grained Recognition in the Wild: A Multi-Task Domain Adaptation Approach

Title Fine-grained Recognition in the Wild: A Multi-Task Domain Adaptation Approach
Authors Timnit Gebru, Judy Hoffman, Li Fei-Fei
Abstract While fine-grained object recognition is an important problem in computer vision, current models are unlikely to accurately classify objects in the wild. These fully supervised models need additional annotated images to classify objects in every new scenario, a task that is infeasible. However, sources such as e-commerce websites and field guides provide annotated images for many classes. In this work, we study fine-grained domain adaptation as a step towards overcoming the dataset shift between easily acquired annotated images and the real world. Adaptation has not been studied in the fine-grained setting where annotations such as attributes could be used to increase performance. Our work uses an attribute based multi-task adaptation loss to increase accuracy from a baseline of 4.1% to 19.1% in the semi-supervised adaptation case. Prior domain adaptation works have been benchmarked on small datasets such as [46] with a total of 795 images for some domains, or simplistic datasets such as [41] consisting of digits. We perform experiments on a subset of a new challenging fine-grained dataset consisting of 1,095,021 images of 2,657 car categories drawn from e-commerce websites and Google Street View.
Tasks Domain Adaptation, Object Recognition
Published 2017-09-07
URL http://arxiv.org/abs/1709.02476v1
PDF http://arxiv.org/pdf/1709.02476v1.pdf
PWC https://paperswithcode.com/paper/fine-grained-recognition-in-the-wild-a-multi
Repo
Framework