Paper Group ANR 781
S4Net: Single Stage Salient-Instance Segmentation. Detecting and Grouping Identical Objects for Region Proposal and Classification. Strictly Proper Kernel Scoring Rules and Divergences with an Application to Kernel Two-Sample Hypothesis Testing. Integral Transforms from Finite Data: An Application of Gaussian Process Regression to Fourier Analysis. …
S4Net: Single Stage Salient-Instance Segmentation
Title | S4Net: Single Stage Salient-Instance Segmentation |
Authors | Ruochen Fan, Ming-Ming Cheng, Qibin Hou, Tai-Jiang Mu, Jingdong Wang, Shi-Min Hu |
Abstract | We consider an interesting problem, salient instance segmentation, in this paper. Besides producing bounding boxes, our network also outputs high-quality instance-level segments. Taking into account the category-independent property of each target, we design a single stage salient instance segmentation framework, with a novel segmentation branch. Our new branch regards not only local context inside each detection window but also its surrounding context, enabling us to distinguish the instances in the same scope even with obstruction. Our network is end-to-end trainable and runs at a fast speed (40 fps when processing an image with resolution 320x320). We evaluate our approach on a publicly available benchmark and show that it outperforms other alternative solutions. We also provide a thorough analysis of the design choices to help readers better understand the functions of each part of our network. The source code can be found at https://github.com/RuochenFan/S4Net. |
Tasks | Instance Segmentation, Semantic Segmentation |
Published | 2017-11-21 |
URL | http://arxiv.org/abs/1711.07618v2 |
http://arxiv.org/pdf/1711.07618v2.pdf | |
PWC | https://paperswithcode.com/paper/s4net-single-stage-salient-instance |
Repo | |
Framework | |
Detecting and Grouping Identical Objects for Region Proposal and Classification
Title | Detecting and Grouping Identical Objects for Region Proposal and Classification |
Authors | Wim Abbeloos, Sergio Caccamo, Esra Ataer-Cansizoglu, Yuichi Taguchi, Chen Feng, Teng-Yok Lee |
Abstract | Often multiple instances of an object occur in the same scene, for example in a warehouse. Unsupervised multi-instance object discovery algorithms are able to detect and identify such objects. We use such an algorithm to provide object proposals to a convolutional neural network (CNN) based classifier. This results in fewer regions to evaluate, compared to traditional region proposal algorithms. Additionally, it enables using the joint probability of multiple instances of an object, resulting in improved classification accuracy. The proposed technique can also split a single class into multiple sub-classes corresponding to the different object types, enabling hierarchical classification. |
Tasks | |
Published | 2017-07-23 |
URL | http://arxiv.org/abs/1707.07255v1 |
http://arxiv.org/pdf/1707.07255v1.pdf | |
PWC | https://paperswithcode.com/paper/detecting-and-grouping-identical-objects-for |
Repo | |
Framework | |
Strictly Proper Kernel Scoring Rules and Divergences with an Application to Kernel Two-Sample Hypothesis Testing
Title | Strictly Proper Kernel Scoring Rules and Divergences with an Application to Kernel Two-Sample Hypothesis Testing |
Authors | Hamed Masnadi-Shirazi |
Abstract | We study strictly proper scoring rules in the Reproducing Kernel Hilbert Space. We propose a general Kernel Scoring rule and associated Kernel Divergence. We consider conditions under which the Kernel Score is strictly proper. We then demonstrate that the Kernel Score includes the Maximum Mean Discrepancy as a special case. We also consider the connections between the Kernel Score and the minimum risk of a proper loss function. We show that the Kernel Score incorporates more information pertaining to the projected embedded distributions compared to the Maximum Mean Discrepancy. Finally, we show how to integrate the information provided from different Kernel Divergences, such as the proposed Bhattacharyya Kernel Divergence, using a one-class classifier for improved two-sample hypothesis testing results. |
Tasks | One-class classifier |
Published | 2017-04-09 |
URL | http://arxiv.org/abs/1704.02578v2 |
http://arxiv.org/pdf/1704.02578v2.pdf | |
PWC | https://paperswithcode.com/paper/strictly-proper-kernel-scoring-rules-and |
Repo | |
Framework | |
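The abstract above names the Maximum Mean Discrepancy (MMD) as a special case of the proposed Kernel Score. As a rough illustration of that special case only, the sketch below computes the standard biased estimate of the squared MMD with a Gaussian kernel in plain Python. It is not the paper's Kernel Score or its Bhattacharyya Kernel Divergence; the kernel choice, the bandwidth, and all function names are assumptions made for this example.

```python
import math
import random


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as equal-length tuples."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth=1.0):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


def gaussian_sample(rng, n, mean):
    """Draw n one-dimensional Gaussian points with the given mean, unit variance."""
    return [(rng.gauss(mean, 1.0),) for _ in range(n)]


# Hypothetical toy data: two samples from the same law, and two from shifted laws.
rng = random.Random(0)
same = mmd_squared(gaussian_sample(rng, 100, 0.0), gaussian_sample(rng, 100, 0.0))
shifted = mmd_squared(gaussian_sample(rng, 100, 0.0), gaussian_sample(rng, 100, 2.0))
print(f"same distribution: {same:.4f}, shifted distribution: {shifted:.4f}")
```

In a two-sample test, a statistic like this would be compared against a threshold (for instance from a permutation procedure); that step is omitted here.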
Integral Transforms from Finite Data: An Application of Gaussian Process Regression to Fourier Analysis
Title | Integral Transforms from Finite Data: An Application of Gaussian Process Regression to Fourier Analysis |
Authors | Luca Ambrogioni, Eric Maris |
Abstract | Computing accurate estimates of the Fourier transform of analog signals from discrete data points is important in many fields of science and engineering. The conventional approach of performing the discrete Fourier transform of the data implicitly assumes periodicity and bandlimitedness of the signal. In this paper, we use Gaussian process regression to estimate the Fourier transform (or any other integral transform) without making these assumptions. This is possible because the posterior expectation of Gaussian process regression maps a finite set of samples to a function defined on the whole real line, expressed as a linear combination of covariance functions. We estimate the covariance function from the data using an appropriately designed gradient ascent method that constrains the solution to a linear combination of tractable kernel functions. This procedure results in a posterior expectation of the analog signal whose Fourier transform can be obtained analytically by exploiting linearity. Our simulations show that the new method leads to sharper and more precise estimation of the spectral density both in noise-free and noise-corrupted signals. We further validate the method in two real-world applications: the analysis of the yearly fluctuation in atmospheric CO2 level and the analysis of the spectral content of brain signals. |
Tasks | |
Published | 2017-04-10 |
URL | http://arxiv.org/abs/1704.02828v2 |
http://arxiv.org/pdf/1704.02828v2.pdf | |
PWC | https://paperswithcode.com/paper/integral-transforms-from-finite-data-an |
Repo | |
Framework | |
DR-Net: Transmission Steered Single Image Dehazing Network with Weakly Supervised Refinement
Title | DR-Net: Transmission Steered Single Image Dehazing Network with Weakly Supervised Refinement |
Authors | Chongyi Li, Jichang Guo, Fatih Porikli, Chunle Guo, Huzhu Fu, Xi Li |
Abstract | Despite the recent progress in image dehazing, several problems remain largely unsolved, such as robustness to varying scenes, the visual quality of reconstructed images, and effectiveness and flexibility for applications. To tackle these problems, we propose a new deep network architecture for single image dehazing called DR-Net. Our model consists of three main subnetworks: a transmission prediction network that predicts a transmission map for the input image, a haze removal network that reconstructs the latent image steered by the transmission map, and a refinement network that enhances the details and color properties of the dehazed result via weakly supervised learning. Compared to previous methods, our method advances in three aspects: (i) a purely data-driven model; (ii) an end-to-end system; (iii) superior robustness, accuracy, and applicability. Extensive experiments demonstrate that our DR-Net outperforms the state-of-the-art methods on both synthetic and real images in qualitative and quantitative metrics. Additionally, the utility of DR-Net has been illustrated by its potential usage in several important computer vision tasks. |
Tasks | Image Dehazing, Single Image Dehazing |
Published | 2017-12-02 |
URL | http://arxiv.org/abs/1712.00621v1 |
http://arxiv.org/pdf/1712.00621v1.pdf | |
PWC | https://paperswithcode.com/paper/dr-net-transmission-steered-single-image |
Repo | |
Framework | |
Approximation Algorithms for $\ell_0$-Low Rank Approximation
Title | Approximation Algorithms for $\ell_0$-Low Rank Approximation |
Authors | Karl Bringmann, Pavel Kolev, David P. Woodruff |
Abstract | We study the $\ell_0$-Low Rank Approximation Problem, where the goal is, given an $m \times n$ matrix $A$, to output a rank-$k$ matrix $A'$ for which $\lVert A'-A\rVert_0$ is minimized. Here, for a matrix $B$, $\lVert B\rVert_0$ denotes the number of its non-zero entries. This NP-hard variant of low rank approximation is natural for problems with no underlying metric, and its goal is to minimize the number of disagreeing data positions. We provide approximation algorithms which significantly improve the running time and approximation factor of previous work. For $k > 1$, we show how to find, in $\mathrm{poly}(mn)$ time for every $k$, a rank $O(k \log(n/k))$ matrix $A'$ for which $\lVert A'-A\rVert_0 \leq O(k^2 \log(n/k)) \, \mathrm{OPT}$. To the best of our knowledge, this is the first algorithm with provable guarantees for the $\ell_0$-Low Rank Approximation Problem for $k > 1$, even for bicriteria algorithms. For the well-studied case when $k = 1$, we give a $(2+\epsilon)$-approximation in sublinear time, which is impossible for other variants of low rank approximation such as for the Frobenius norm. We strengthen this for the well-studied case of binary matrices to obtain a $(1+O(\psi))$-approximation in sublinear time, where $\psi = \mathrm{OPT}/\lVert A\rVert_0$. For small $\psi$, our approximation factor is $1+o(1)$. |
Tasks | |
Published | 2017-10-30 |
URL | http://arxiv.org/abs/1710.11253v2 |
http://arxiv.org/pdf/1710.11253v2.pdf | |
PWC | https://paperswithcode.com/paper/approximation-algorithms-for-ell_0-low-rank |
Repo | |
Framework | |
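To make the objective in the abstract above concrete, the following sketch counts the disagreeing positions between a matrix and a rank-1 candidate, then finds the best rank-1 candidate by exhaustive search. This is only an illustration of the objective on a tiny example: the search is exponential, it is restricted to 0/1 factors as a simplifying assumption, and it is not one of the paper's approximation algorithms. The matrix and all function names are hypothetical.

```python
from itertools import product


def l0_distance(a, b):
    """Number of positions where matrices a and b (lists of rows) disagree."""
    return sum(
        1
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
        if x != y
    )


def outer(u, v):
    """Rank-1 matrix u v^T as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]


def best_binary_rank1(a):
    """Exhaustive search over 0/1 vectors u, v minimizing the l0 distance to a."""
    m, n = len(a), len(a[0])
    best = None
    for u in product([0, 1], repeat=m):
        for v in product([0, 1], repeat=n):
            cost = l0_distance(outer(u, v), a)
            if best is None or cost < best[0]:
                best = (cost, u, v)
    return best


# Hypothetical 3x3 binary matrix of rank 2, so no rank-1 matrix matches it exactly.
A = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 0],
]
candidate = outer([1, 1, 0], [1, 1, 0])
print(l0_distance(candidate, A))   # prints 1: only entry (row 3, column 2) differs
print(best_binary_rank1(A)[0])     # prints 1: the candidate above is already optimal
```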
Image Dehazing using Bilinear Composition Loss Function
Title | Image Dehazing using Bilinear Composition Loss Function |
Authors | Hui Yang, Jinshan Pan, Qiong Yan, Wenxiu Sun, Jimmy Ren, Yu-Wing Tai |
Abstract | In this paper, we introduce a bilinear composition loss function to address the problem of image dehazing. Previous methods in image dehazing use a two-stage approach which first estimates the transmission map and then estimates the clear image. The drawback of a two-stage method is that it tends to boost local image artifacts such as noise, aliasing and blocking. This is especially the case for heavy haze images captured with a low quality device. Our method is based on convolutional neural networks. Unique in our method is the bilinear composition loss function, which directly models the correlations between transmission map, clear image, and atmospheric light. This allows errors to be back-propagated to each sub-network concurrently, while maintaining the composition constraint to avoid overfitting of each sub-network. We evaluate the effectiveness of our proposed method using both synthetic and real world examples. Extensive experiments show that our method outperforms state-of-the-art methods, especially for haze images with severe noise and compression. |
Tasks | Image Dehazing |
Published | 2017-10-01 |
URL | http://arxiv.org/abs/1710.00279v1 |
http://arxiv.org/pdf/1710.00279v1.pdf | |
PWC | https://paperswithcode.com/paper/image-dehazing-using-bilinear-composition |
Repo | |
Framework | |
Disentangling Factors of Variation by Mixing Them
Title | Disentangling Factors of Variation by Mixing Them |
Authors | Qiyang Hu, Attila Szabó, Tiziano Portenier, Matthias Zwicker, Paolo Favaro |
Abstract | We propose an approach to learn image representations that consist of disentangled factors of variation without exploiting any manual labeling or data domain knowledge. A factor of variation corresponds to an image attribute that can be discerned consistently across a set of images, such as the pose or color of objects. Our disentangled representation consists of a concatenation of feature chunks, each chunk representing a factor of variation. It supports applications such as transferring attributes from one image to another, by simply mixing and unmixing feature chunks, and classification or retrieval based on one or several attributes, by considering a user-specified subset of feature chunks. We learn our representation without any labeling or knowledge of the data domain, using an autoencoder architecture with two novel training objectives: first, we propose an invariance objective to encourage that encoding of each attribute, and decoding of each chunk, are invariant to changes in other attributes and chunks, respectively; second, we include a classification objective, which ensures that each chunk corresponds to a consistently discernible attribute in the represented image, hence avoiding degenerate feature mappings where some chunks are completely ignored. We demonstrate the effectiveness of our approach on the MNIST, Sprites, and CelebA datasets. |
Tasks | |
Published | 2017-11-20 |
URL | http://arxiv.org/abs/1711.07410v2 |
http://arxiv.org/pdf/1711.07410v2.pdf | |
PWC | https://paperswithcode.com/paper/disentangling-factors-of-variation-by-mixing |
Repo | |
Framework | |
Using of heterogeneous corpora for training of an ASR system
Title | Using of heterogeneous corpora for training of an ASR system |
Authors | Jan Trmal, Gaurav Kumar, Vimal Manohar, Sanjeev Khudanpur, Matt Post, Paul McNamee |
Abstract | The paper summarizes the development of the LVCSR system built as a part of the Pashto speech-translation system at the SCALE (Summer Camp for Applied Language Exploration) 2015 workshop on “Speech-to-text-translation for low-resource languages”. The Pashto language was chosen as a good “proxy” low-resource language, exhibiting multiple phenomena which make the development of speech-recognition and speech-to-text-translation systems hard. Even when the amount of data is seemingly sufficient, given the fact that the data originates from multiple sources, the preliminary experiments reveal that there is little to no benefit in merging (concatenating) the corpora, and more elaborate ways of making use of all of the data must be worked out. This paper concentrates only on the LVCSR part and presents a range of different techniques that were found to be useful in order to benefit from multiple different corpora. |
Tasks | Large Vocabulary Continuous Speech Recognition, Speech Recognition |
Published | 2017-06-01 |
URL | http://arxiv.org/abs/1706.00321v1 |
http://arxiv.org/pdf/1706.00321v1.pdf | |
PWC | https://paperswithcode.com/paper/using-of-heterogeneous-corpora-for-training |
Repo | |
Framework | |
First-Order Adaptive Sample Size Methods to Reduce Complexity of Empirical Risk Minimization
Title | First-Order Adaptive Sample Size Methods to Reduce Complexity of Empirical Risk Minimization |
Authors | Aryan Mokhtari, Alejandro Ribeiro |
Abstract | This paper studies empirical risk minimization (ERM) problems for large-scale datasets and incorporates the idea of adaptive sample size methods to improve the guaranteed convergence bounds for first-order stochastic and deterministic methods. In contrast to traditional methods that attempt to solve the ERM problem corresponding to the full dataset directly, adaptive sample size schemes start with a small number of samples and solve the corresponding ERM problem to its statistical accuracy. The sample size is then grown geometrically (e.g., scaled by a factor of two), and the solution of the previous ERM is used as a warm start for the new ERM. Theoretical analyses show that the use of adaptive sample size methods reduces the overall computational cost of achieving the statistical accuracy of the whole dataset for a broad range of deterministic and stochastic first-order methods. The gains are specific to the choice of method. When particularized to, e.g., accelerated gradient descent and stochastic variance reduced gradient, the computational cost advantage is a logarithm of the number of training samples. Numerical experiments on various datasets confirm theoretical claims and showcase the gains of using the proposed adaptive sample size scheme. |
Tasks | |
Published | 2017-09-02 |
URL | http://arxiv.org/abs/1709.00599v1 |
http://arxiv.org/pdf/1709.00599v1.pdf | |
PWC | https://paperswithcode.com/paper/first-order-adaptive-sample-size-methods-to |
Repo | |
Framework | |
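The adaptive sample size idea described in the abstract above (start small, grow the sample geometrically, warm-start each stage from the previous solution) can be sketched on a toy one-parameter least-squares problem. This is a minimal sketch under stated assumptions, not the paper's method: plain gradient descent stands in for the first-order solver, a fixed number of steps per stage stands in for "solving to statistical accuracy", and the data, step size, and function names are all hypothetical.

```python
import random


def gradient(w, data):
    """Gradient of the average loss 0.5 * (w * x - y)^2 over the given samples."""
    return sum((w * x - y) * x for x, y in data) / len(data)


def adaptive_sample_size_gd(data, initial_size=8, steps_per_stage=20, step_size=0.5):
    """Gradient descent on growing prefixes of data, doubling the prefix each stage.

    Each stage starts from the solution of the previous, smaller problem
    (the warm start) instead of restarting from scratch.
    """
    w = 0.0
    n = min(initial_size, len(data))
    while True:
        subset = data[:n]
        for _ in range(steps_per_stage):
            w -= step_size * gradient(w, subset)
        if n == len(data):
            return w
        n = min(2 * n, len(data))  # geometric growth of the sample size


# Hypothetical synthetic data: y = 3 * x + small Gaussian noise.
rng = random.Random(0)
true_w = 3.0
data = []
for _ in range(1024):
    x = rng.uniform(-1.0, 1.0)
    data.append((x, true_w * x + rng.gauss(0.0, 0.1)))

w_hat = adaptive_sample_size_gd(data)
print(f"estimated slope: {w_hat:.3f}")
```

Most gradient steps here run on small subsets, and only the last stage touches the full dataset, which is the source of the cost savings the abstract refers to.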
Beyond Low Rank: A Data-Adaptive Tensor Completion Method
Title | Beyond Low Rank: A Data-Adaptive Tensor Completion Method |
Authors | Lei Zhang, Wei Wei, Qinfeng Shi, Chunhua Shen, Anton van den Hengel, Yanning Zhang |
Abstract | Low rank tensor representation underpins much of recent progress in tensor completion. In real applications, however, this approach is confronted with two challenging problems, namely (1) tensor rank determination; (2) handling real tensor data which only approximately fulfils the low-rank requirement. To address these two issues, we develop a data-adaptive tensor completion model which explicitly represents both the low-rank and non-low-rank structures in a latent tensor. Representing the non-low-rank structure separately from the low-rank one allows priors which capture the important distinctions between the two, thus enabling more accurate modelling, and ultimately, completion. Through defining a new tensor rank, we develop a sparsity induced prior for the low-rank structure, with which the tensor rank can be automatically determined. The prior for the non-low-rank structure is established based on a mixture of Gaussians which is shown to be flexible enough, and powerful enough, to inform the completion process for a variety of real tensor data. With these two priors, we develop a Bayesian minimum mean squared error estimate (MMSE) framework for inference which provides the posterior mean of missing entries as well as their uncertainty. Compared with the state-of-the-art methods in various applications, the proposed model produces more accurate completion results. |
Tasks | |
Published | 2017-08-03 |
URL | http://arxiv.org/abs/1708.01008v1 |
http://arxiv.org/pdf/1708.01008v1.pdf | |
PWC | https://paperswithcode.com/paper/beyond-low-rank-a-data-adaptive-tensor |
Repo | |
Framework | |
Colors in Context: A Pragmatic Neural Model for Grounded Language Understanding
Title | Colors in Context: A Pragmatic Neural Model for Grounded Language Understanding |
Authors | Will Monroe, Robert X. D. Hawkins, Noah D. Goodman, Christopher Potts |
Abstract | We present a model of pragmatic referring expression interpretation in a grounded communication task (identifying colors from descriptions) that draws upon predictions from two recurrent neural network classifiers, a speaker and a listener, unified by a recursive pragmatic reasoning framework. Experiments show that this combined pragmatic model interprets color descriptions more accurately than the classifiers from which it is built, and that much of this improvement results from combining the speaker and listener perspectives. We observe that pragmatic reasoning helps primarily in the hardest cases: when the model must distinguish very similar colors, or when few utterances adequately express the target color. Our findings make use of a newly-collected corpus of human utterances in color reference games, which exhibit a variety of pragmatic behaviors. We also show that the embedded speaker model reproduces many of these pragmatic behaviors. |
Tasks | |
Published | 2017-03-29 |
URL | http://arxiv.org/abs/1703.10186v2 |
http://arxiv.org/pdf/1703.10186v2.pdf | |
PWC | https://paperswithcode.com/paper/colors-in-context-a-pragmatic-neural-model |
Repo | |
Framework | |
Land Cover Classification from Multi-temporal, Multi-spectral Remotely Sensed Imagery using Patch-Based Recurrent Neural Networks
Title | Land Cover Classification from Multi-temporal, Multi-spectral Remotely Sensed Imagery using Patch-Based Recurrent Neural Networks |
Authors | Atharva Sharma, Xiuwen Liu, Xiaojun Yang |
Abstract | Sustainability of the global environment is dependent on accurate land cover information over large areas. Even with the increased number of satellite systems and sensors acquiring data with improved spectral, spatial, radiometric and temporal characteristics and the new data distribution policy, most existing land cover datasets were derived from a pixel-based single-date multi-spectral remotely sensed image with low accuracy. To improve the accuracy, the bottleneck is how to develop an accurate and effective image classification technique. By incorporating and utilizing the complete multi-spectral, multi-temporal and spatial information in remote sensing images and considering their inherent spatial and sequential interdependence, we propose a new patch-based RNN (PB-RNN) system tailored for multi-temporal remote sensing data. The system is designed by incorporating distinctive characteristics in multi-temporal remote sensing data. In particular, it uses multi-temporal-spectral-spatial samples and deals with pixels contaminated by clouds/shadow present in the multi-temporal data series. Using a Florida Everglades ecosystem study site covering an area of 771 square kilometers, the proposed PB-RNN system has achieved a significant improvement in the classification accuracy over pixel-based RNN system, pixel-based single-imagery NN system, pixel-based multi-images NN system, patch-based single-imagery NN system and patch-based multi-images NN system. For example, the proposed system achieves 97.21% classification accuracy while a pixel-based single-imagery NN system achieves 64.74%. By utilizing methods like the proposed PB-RNN one, we believe that much more accurate land cover datasets can be produced over large areas efficiently. |
Tasks | Image Classification |
Published | 2017-08-02 |
URL | http://arxiv.org/abs/1708.00813v1 |
http://arxiv.org/pdf/1708.00813v1.pdf | |
PWC | https://paperswithcode.com/paper/land-cover-classification-from-multi-temporal |
Repo | |
Framework | |
What Can I Do Now? Guiding Users in a World of Automated Decisions
Title | What Can I Do Now? Guiding Users in a World of Automated Decisions |
Authors | Matthias Gallé |
Abstract | More and more processes governing our lives include an automatic decision step, where – based on a feature vector derived from an applicant – an algorithm has the decision power over the final outcome. Here we present a simple idea which gives some of the power back to the applicant by providing her with alternatives which would make the decision algorithm decide differently. It is based on a formalization reminiscent of methods used for evasion attacks, and consists in enumerating the subspaces where the classifier decides the desired output. This has been implemented for the specific case of decision forests (ensemble methods based on decision trees), mapping the problem to an iterative version of enumerating $k$-cliques. |
Tasks | |
Published | 2017-01-13 |
URL | http://arxiv.org/abs/1701.03755v1 |
http://arxiv.org/pdf/1701.03755v1.pdf | |
PWC | https://paperswithcode.com/paper/what-can-i-do-now-guiding-users-in-a-world-of |
Repo | |
Framework | |
Fine-grained Recognition in the Wild: A Multi-Task Domain Adaptation Approach
Title | Fine-grained Recognition in the Wild: A Multi-Task Domain Adaptation Approach |
Authors | Timnit Gebru, Judy Hoffman, Li Fei-Fei |
Abstract | While fine-grained object recognition is an important problem in computer vision, current models are unlikely to accurately classify objects in the wild. These fully supervised models need additional annotated images to classify objects in every new scenario, a task that is infeasible. However, sources such as e-commerce websites and field guides provide annotated images for many classes. In this work, we study fine-grained domain adaptation as a step towards overcoming the dataset shift between easily acquired annotated images and the real world. Adaptation has not been studied in the fine-grained setting where annotations such as attributes could be used to increase performance. Our work uses an attribute based multi-task adaptation loss to increase accuracy from a baseline of 4.1% to 19.1% in the semi-supervised adaptation case. Prior domain adaptation works have been benchmarked on small datasets such as [46] with a total of 795 images for some domains, or simplistic datasets such as [41] consisting of digits. We perform experiments on a subset of a new challenging fine-grained dataset consisting of 1,095,021 images of 2,657 car categories drawn from e-commerce websites and Google Street View. |
Tasks | Domain Adaptation, Object Recognition |
Published | 2017-09-07 |
URL | http://arxiv.org/abs/1709.02476v1 |
http://arxiv.org/pdf/1709.02476v1.pdf | |
PWC | https://paperswithcode.com/paper/fine-grained-recognition-in-the-wild-a-multi |
Repo | |
Framework | |