July 28, 2019

Paper Group ANR 266


People, Penguins and Petri Dishes: Adapting Object Counting Models To New Visual Domains And Object Types Without Forgetting

Title People, Penguins and Petri Dishes: Adapting Object Counting Models To New Visual Domains And Object Types Without Forgetting
Authors Mark Marsden, Kevin McGuinness, Suzanne Little, Ciara E. Keogh, Noel E. O’Connor
Abstract In this paper we propose a technique to adapt a convolutional neural network (CNN) based object counter to additional visual domains and object types while still preserving the original counting function. Domain-specific normalisation and scaling operators are trained to allow the model to adjust to the statistical distributions of the various visual domains. The developed adaptation technique is used to produce a singular patch-based counting regressor capable of counting various object types including people, vehicles, cell nuclei and wildlife. As part of this study a challenging new cell counting dataset in the context of tissue culture and patient diagnosis is constructed. This new collection, referred to as the Dublin Cell Counting (DCC) dataset, is the first of its kind to be made available to the wider computer vision community. State-of-the-art object counting performance is achieved in both the Shanghaitech (parts A and B) and Penguins datasets while competitive performance is observed on the TRANCOS and Modified Bone Marrow (MBM) datasets, all using a shared counting model.
Tasks Object Counting
Published 2017-11-15
URL http://arxiv.org/abs/1711.05586v1
PDF http://arxiv.org/pdf/1711.05586v1.pdf
PWC https://paperswithcode.com/paper/people-penguins-and-petri-dishes-adapting
Repo
Framework

Restricted Recurrent Neural Tensor Networks: Exploiting Word Frequency and Compositionality

Title Restricted Recurrent Neural Tensor Networks: Exploiting Word Frequency and Compositionality
Authors Alexandre Salle, Aline Villavicencio
Abstract Increasing the capacity of recurrent neural networks (RNN) usually involves augmenting the size of the hidden layer, with significant increase of computational cost. Recurrent neural tensor networks (RNTN) increase capacity using distinct hidden layer weights for each word, but with greater costs in memory usage. In this paper, we introduce restricted recurrent neural tensor networks (r-RNTN) which reserve distinct hidden layer weights for frequent vocabulary words while sharing a single set of weights for infrequent words. Perplexity evaluations show that for fixed hidden layer sizes, r-RNTNs improve language model performance over RNNs using only a small fraction of the parameters of unrestricted RNTNs. These results hold for r-RNTNs using Gated Recurrent Units and Long Short-Term Memory.
Tasks Language Modelling, Tensor Networks
Published 2017-04-03
URL http://arxiv.org/abs/1704.00774v3
PDF http://arxiv.org/pdf/1704.00774v3.pdf
PWC https://paperswithcode.com/paper/restricted-recurrent-neural-tensor-networks
Repo
Framework
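
The weight-sharing idea in the r-RNTN abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names (`build_slot_map`, `rnn_step`), the `tanh` activation, and the convention that slot 0 holds the shared matrix are all assumptions made for this example.

```python
import math
from collections import Counter


def build_slot_map(tokens, k):
    """Give each of the k most frequent words its own slot (1..k)."""
    counts = Counter(tokens)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: slot for slot, word in enumerate(ranked[:k], start=1)}


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def rnn_step(word, x, h_prev, W, U_slots, b, slot_map):
    """One recurrent step; slot 0 holds the matrix shared by infrequent words."""
    U = U_slots[slot_map.get(word, 0)]
    pre = [wx + uh + bi
           for wx, uh, bi in zip(matvec(W, x), matvec(U, h_prev), b)]
    return [math.tanh(p) for p in pre]  # tanh is an assumed activation


tokens = "the cat the dog the cat sat".split()
slot_map = build_slot_map(tokens, k=2)  # "the" and "cat" get their own slots
```

With `k` reserved slots the model stores `k + 1` recurrent matrices rather than one per vocabulary word, which is where the parameter saving described in the abstract comes from.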

Visual to Sound: Generating Natural Sound for Videos in the Wild

Title Visual to Sound: Generating Natural Sound for Videos in the Wild
Authors Yipin Zhou, Zhaowen Wang, Chen Fang, Trung Bui, Tamara L. Berg
Abstract As two of the five traditional human senses (sight, hearing, taste, smell, and touch), vision and sound are basic sources through which humans understand the world. Often correlated during natural events, these two modalities combine to jointly affect human perception. In this paper, we pose the task of generating sound given visual input. Such capabilities could help enable applications in virtual reality (generating sound for virtual scenes automatically) or provide additional accessibility to images or videos for people with visual impairments. As a first step in this direction, we apply learning-based methods to generate raw waveform samples given input video frames. We evaluate our models on a dataset of videos containing a variety of sounds (such as ambient sounds and sounds from people/animals). Our experiments show that the generated sounds are fairly realistic and have good temporal synchronization with the visual inputs.
Tasks
Published 2017-12-04
URL http://arxiv.org/abs/1712.01393v2
PDF http://arxiv.org/pdf/1712.01393v2.pdf
PWC https://paperswithcode.com/paper/visual-to-sound-generating-natural-sound-for
Repo
Framework

Reverse Classification Accuracy: Predicting Segmentation Performance in the Absence of Ground Truth

Title Reverse Classification Accuracy: Predicting Segmentation Performance in the Absence of Ground Truth
Authors Vanya V. Valindria, Ioannis Lavdas, Wenjia Bai, Konstantinos Kamnitsas, Eric O. Aboagye, Andrea G. Rockall, Daniel Rueckert, Ben Glocker
Abstract When integrating computational tools such as automatic segmentation into clinical practice, it is of utmost importance to be able to assess the level of accuracy on new data, and in particular, to detect when an automatic method fails. However, this is difficult to achieve due to the absence of ground truth. Segmentation accuracy on clinical data might be different from what is found through cross-validation because validation data is often used during incremental method development, which can lead to overfitting and unrealistic performance expectations. Before deployment, performance is quantified using different metrics, for which the predicted segmentation is compared to a reference segmentation, often obtained manually by an expert. But little is known about the real performance after deployment when a reference is unavailable. In this paper, we introduce the concept of reverse classification accuracy (RCA) as a framework for predicting the performance of a segmentation method on new data. In RCA we take the predicted segmentation from a new image to train a reverse classifier which is evaluated on a set of reference images with available ground truth. The hypothesis is that if the predicted segmentation is of good quality, then the reverse classifier will perform well on at least some of the reference images. We validate our approach on multi-organ segmentation with different classifiers and segmentation methods. Our results indicate that it is indeed possible to predict the quality of individual segmentations in the absence of ground truth. Thus, RCA is ideal for integration into automatic processing pipelines in clinical routine and as part of large-scale image analysis studies.
Tasks
Published 2017-02-11
URL http://arxiv.org/abs/1702.03407v1
PDF http://arxiv.org/pdf/1702.03407v1.pdf
PWC https://paperswithcode.com/paper/reverse-classification-accuracy-predicting
Repo
Framework
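
The RCA procedure described in the abstract above can be illustrated with a toy sketch. Everything concrete here is an assumption for the sake of the example: images are flat lists of intensities, the "reverse classifier" is a single intensity threshold (a deliberately trivial stand-in for the classifiers used in the paper), and the Dice coefficient is used as the quality metric. Taking the maximum over the reference images follows the abstract's hypothesis that a good segmentation should transfer well to at least some of them.

```python
def dice(a, b):
    """Dice overlap between two binary masks given as lists of 0/1."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    return 1.0 if total == 0 else 2.0 * inter / total


def fit_reverse_classifier(image, mask):
    """Toy classifier: threshold halfway between mean foreground and background."""
    fg = [v for v, m in zip(image, mask) if m]
    bg = [v for v, m in zip(image, mask) if not m]
    if not fg:
        return float("inf"), True    # nothing labelled foreground
    if not bg:
        return float("-inf"), True   # everything labelled foreground
    fg_mean, bg_mean = sum(fg) / len(fg), sum(bg) / len(bg)
    return (fg_mean + bg_mean) / 2.0, fg_mean > bg_mean


def predict(image, model):
    threshold, fg_is_brighter = model
    if fg_is_brighter:
        return [int(v > threshold) for v in image]
    return [int(v < threshold) for v in image]


def rca_score(new_image, predicted_mask, references):
    """Train on the prediction, score on references, keep the best Dice."""
    model = fit_reverse_classifier(new_image, predicted_mask)
    return max(dice(predict(img, model), gt) for img, gt in references)


references = [([0.0, 0.3, 0.7, 1.0], [0, 0, 1, 1])]
new_image = [0.1, 0.2, 0.8, 0.9]
good = rca_score(new_image, [0, 0, 1, 1], references)
bad = rca_score(new_image, [1, 1, 0, 0], references)
```

In this toy setting a plausible predicted mask yields a high proxy score and an inverted mask yields a low one, without ever consulting ground truth for `new_image`.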

Accelerating HPC codes on Intel(R) Omni-Path Architecture networks: From particle physics to Machine Learning

Title Accelerating HPC codes on Intel(R) Omni-Path Architecture networks: From particle physics to Machine Learning
Authors Peter Boyle, Michael Chuvelev, Guido Cossu, Christopher Kelly, Christoph Lehner, Lawrence Meadows
Abstract We discuss practical methods to ensure near wirespeed performance from clusters with either one or two Intel(R) Omni-Path host fabric interfaces (HFI) per node and Intel(R) Xeon Phi(TM) 72xx (Knights Landing) processors, using the Linux operating system. The study evaluates the performance improvements achievable and the required programming approaches in two distinct example problems: firstly in Cartesian communicator halo exchange problems, appropriate for structured grid PDE solvers that arise in quantum chromodynamics simulations of particle physics, and secondly in gradient reduction appropriate to synchronous stochastic gradient descent for machine learning. As an example, we accelerate a published Baidu Research reduction code and obtain a factor of ten speedup over the original code using the techniques discussed in this paper. This shows how a factor of ten speedup in strongly scaled distributed machine learning could be achieved when synchronous stochastic gradient descent is massively parallelised with a fixed mini-batch size. We find a significant improvement in performance robustness when memory is obtained using carefully allocated 2MB “huge” virtual memory pages, implying that non-standard allocation routines should be used for communication buffers. These can be accessed via an LD_PRELOAD override in the manner suggested by libhugetlbfs. We make use of the Intel(R) MPI 2019 library “Technology Preview” and underlying software to enable thread concurrency throughout the communication software stack via multiple PSM2 endpoints per process and use of multiple independent MPI communicators. When using a single MPI process per node, we find that this greatly accelerates delivered bandwidth in many-core Intel(R) Xeon Phi processors.
Tasks
Published 2017-11-13
URL http://arxiv.org/abs/1711.04883v1
PDF http://arxiv.org/pdf/1711.04883v1.pdf
PWC https://paperswithcode.com/paper/accelerating-hpc-codes-on-intelr-omni-path
Repo
Framework

Recognition of Grasp Points for Clothes Manipulation under unconstrained Conditions

Title Recognition of Grasp Points for Clothes Manipulation under unconstrained Conditions
Authors Luz María Martínez, Javier Ruiz-del-Solar
Abstract In this work a system for recognizing grasp points in RGB-D images is proposed. This system is intended to be used by a domestic robot when deploying clothes lying at a random position on a table. Because grasp points are usually near key parts of clothing, such as the waist of pants or the neck of a shirt, the proposed system first attempts to detect these key parts using a local multivariate contour that adapts its shape accordingly. Then, the proposed system applies the Vessel Enhancement filter to identify wrinkles in the clothes, which allows a roughness index to be computed for the clothes. Finally, by combining (i) the key part contours and (ii) the roughness information obtained by the vessel filter, the system is able to recognize grasp points for unfolding a piece of clothing. The recognition system is validated using realistic RGB-D images of different cloth types.
Tasks
Published 2017-06-20
URL http://arxiv.org/abs/1706.06694v1
PDF http://arxiv.org/pdf/1706.06694v1.pdf
PWC https://paperswithcode.com/paper/recognition-of-grasp-points-for-clothes
Repo
Framework

Monocular Dense 3D Reconstruction of a Complex Dynamic Scene from Two Perspective Frames

Title Monocular Dense 3D Reconstruction of a Complex Dynamic Scene from Two Perspective Frames
Authors Suryansh Kumar, Yuchao Dai, Hongdong Li
Abstract This paper proposes a new approach for monocular dense 3D reconstruction of a complex dynamic scene from two perspective frames. By applying superpixel over-segmentation to the image, we model a generically dynamic (hence non-rigid) scene with a piecewise planar and rigid approximation. In this way, we reduce the dynamic reconstruction problem to a “3D jigsaw puzzle” problem which takes pieces from an unorganized “soup of superpixels”. We show that our method provides an effective solution to the inherent relative scale ambiguity in structure-from-motion. Since our method does not assume a template prior, or per-object segmentation, or knowledge about the rigidity of the dynamic scene, it is applicable to a wide range of scenarios. Extensive experiments on both synthetic and real monocular sequences demonstrate the superiority of our method compared with the state-of-the-art methods.
Tasks 3D Reconstruction, Semantic Segmentation
Published 2017-08-15
URL http://arxiv.org/abs/1708.04398v2
PDF http://arxiv.org/pdf/1708.04398v2.pdf
PWC https://paperswithcode.com/paper/monocular-dense-3d-reconstruction-of-a
Repo
Framework

Simplified Minimal Gated Unit Variations for Recurrent Neural Networks

Title Simplified Minimal Gated Unit Variations for Recurrent Neural Networks
Authors Joel Heck, Fathi M. Salem
Abstract Recurrent neural networks with various types of hidden units have been used to solve a diverse range of problems involving sequence data. Two of the most recent proposals, gated recurrent units (GRU) and minimal gated units (MGU), have shown comparably promising results on example public datasets. In this paper, we introduce three model variants of the minimal gated unit (MGU) which further simplify that design by reducing the number of parameters in the forget-gate dynamic equation. These three model variants, referred to simply as MGU1, MGU2, and MGU3, were tested on sequences generated from the MNIST dataset and from the Reuters Newswire Topics (RNT) dataset. The new models have shown similar accuracy to the MGU model while using fewer parameters and thus lowering training expense. One model variant, namely MGU2, performed better than MGU on the datasets considered, and thus may be used as an alternative to MGU or GRU in recurrent neural networks.
Tasks
Published 2017-01-12
URL http://arxiv.org/abs/1701.03452v1
PDF http://arxiv.org/pdf/1701.03452v1.pdf
PWC https://paperswithcode.com/paper/simplified-minimal-gated-unit-variations-for
Repo
Framework
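
To make "reducing the number of parameters in the forget-gate dynamic equation" concrete, here is a hedged sketch of a forget gate whose terms can be switched off individually. The abstract above does not say which terms each of MGU1, MGU2 and MGU3 drops, so the flags below (`use_input`, `use_hidden`, `use_bias`) are hypothetical knobs for illustration and are not mapped to the named variants. The full gate is assumed to have the usual form sigmoid(W_f x + U_f h_prev + b_f).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forget_gate(x, h_prev, W_f, U_f, b_f,
                use_input=True, use_hidden=True, use_bias=True):
    """Forget gate with optional input, recurrent and bias terms."""
    gate = []
    for i in range(len(h_prev)):
        z = 0.0
        if use_input:
            z += sum(w * xj for w, xj in zip(W_f[i], x))
        if use_hidden:
            z += sum(u * hj for u, hj in zip(U_f[i], h_prev))
        if use_bias:
            z += b_f[i]
        gate.append(sigmoid(z))
    return gate


def gate_param_count(n_hidden, n_input,
                     use_input=True, use_hidden=True, use_bias=True):
    """Number of trainable parameters in the gate for a given configuration."""
    count = 0
    if use_input:
        count += n_hidden * n_input
    if use_hidden:
        count += n_hidden * n_hidden
    if use_bias:
        count += n_hidden
    return count
```

For example, with 100 hidden units and 28 inputs the full gate has 12,900 parameters, while a gate driven only by the previous hidden state and a bias has 10,100.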

Fuzzy clustering using linguistic-valued exponent

Title Fuzzy clustering using linguistic-valued exponent
Authors Hung Thai Le, Khang Ding Tran, Hung Van Le
Abstract This paper studies the FCM (fuzzy c-means) algorithm and some of its well-known variants, and analyses how hedge algebra theory, which uses algebra to represent linguistic-valued variables, can be applied to FCM. It then proposes a new FCM-based algorithm which uses hedge algebra to model FCM’s exponent parameter. Finally, the design, analysis and implementation of the new algorithm, as well as some experimental results, are presented to demonstrate the algorithm’s capacity to solve clustering problems in practice.
Tasks
Published 2017-10-30
URL http://arxiv.org/abs/1711.01149v1
PDF http://arxiv.org/pdf/1711.01149v1.pdf
PWC https://paperswithcode.com/paper/fuzzy-clustering-using-linguistic-valued
Repo
Framework

Label Denoising Adversarial Network (LDAN) for Inverse Lighting of Face Images

Title Label Denoising Adversarial Network (LDAN) for Inverse Lighting of Face Images
Authors Hao Zhou, Jin Sun, Yaser Yacoob, David W. Jacobs
Abstract Lighting estimation from face images is an important task and has applications in many areas such as image editing, intrinsic image decomposition, and image forgery detection. We propose to train a deep Convolutional Neural Network (CNN) to regress lighting parameters from a single face image. Lacking massive ground truth lighting labels for face images in the wild, we use an existing method to estimate lighting parameters, which are treated as ground truth with unknown noises. To alleviate the effect of such noises, we utilize the idea of Generative Adversarial Networks (GAN) and propose a Label Denoising Adversarial Network (LDAN) to make use of synthetic data with accurate ground truth to help train a deep CNN for lighting regression on real face images. Experiments show that our network outperforms existing methods in producing consistent lighting parameters of different faces under similar lighting conditions. Moreover, our method is 100,000 times faster in execution time than prior optimization-based lighting estimation approaches.
Tasks Denoising, Intrinsic Image Decomposition
Published 2017-09-06
URL http://arxiv.org/abs/1709.01993v1
PDF http://arxiv.org/pdf/1709.01993v1.pdf
PWC https://paperswithcode.com/paper/label-denoising-adversarial-network-ldan-for
Repo
Framework

Contextual Object Detection with a Few Relevant Neighbors

Title Contextual Object Detection with a Few Relevant Neighbors
Authors Ehud Barnea, Ohad Ben-Shahar
Abstract A natural way to improve the detection of objects is to consider the contextual constraints imposed by the detection of additional objects in a given scene. In this work, we exploit the spatial relations between objects in order to improve detection capacity, as well as analyze various properties of the contextual object detection problem. To precisely calculate context-based probabilities of objects, we developed a model that examines the interactions between objects in an exact probabilistic setting, in contrast to previous methods that typically utilize approximations based on pairwise interactions. Such a scheme is facilitated by the realistic assumption that the existence of an object in any given location is influenced by only a few informative locations in space. Based on this assumption, we suggest a method for identifying these relevant locations and integrating them into a mostly exact calculation of probability based on their raw detector responses. This scheme is shown to improve detection results and provides unique insights about the process of contextual inference for object detection. We show that it is generally difficult to learn that a particular object reduces the probability of another, and that in cases when the context and detector strongly disagree this learning becomes virtually impossible for the purposes of improving the results of an object detector. Finally, we demonstrate improved detection results through use of our approach as applied to the PASCAL VOC and COCO datasets.
Tasks Object Detection
Published 2017-11-15
URL http://arxiv.org/abs/1711.05705v3
PDF http://arxiv.org/pdf/1711.05705v3.pdf
PWC https://paperswithcode.com/paper/contextual-object-detection-with-a-few
Repo
Framework

Energy Prediction using Spatiotemporal Pattern Networks

Title Energy Prediction using Spatiotemporal Pattern Networks
Authors Zhanhong Jiang, Chao Liu, Adedotun Akintayo, Gregor Henze, Soumik Sarkar
Abstract This paper presents a novel data-driven technique based on the spatiotemporal pattern network (STPN) for energy/power prediction for complex dynamical systems. Built on symbolic dynamic filtering, the STPN framework is used to capture not only the individual system characteristics but also the pair-wise causal dependencies among different sub-systems. For quantifying the causal dependency, a mutual information based metric is presented. An energy prediction approach is subsequently proposed based on the STPN framework. For validating the proposed scheme, two case studies are presented, one involving wind turbine power prediction (supply side energy) using the Western Wind Integration data set generated by the National Renewable Energy Laboratory (NREL) for identifying the spatiotemporal characteristics, and the other, residential electric energy disaggregation (demand side energy) using the Building America 2010 data set from NREL for exploring the temporal features. In the energy disaggregation context, convex programming techniques beyond the STPN framework are developed and applied to achieve improved disaggregation performance.
Tasks
Published 2017-02-03
URL http://arxiv.org/abs/1702.01125v1
PDF http://arxiv.org/pdf/1702.01125v1.pdf
PWC https://paperswithcode.com/paper/energy-prediction-using-spatiotemporal
Repo
Framework
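
The abstract above quantifies dependencies between sub-systems with a mutual-information-based metric computed on symbolic sequences. The sketch below shows plain empirical mutual information between two aligned symbol sequences, plus a lagged variant. It is an illustration of the underlying quantity only; the exact metric defined in the paper may differ, and the function names and the `lag` parameter are assumptions.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Empirical mutual information (in bits) of two aligned symbol sequences."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    count_ab = Counter(zip(a, b))
    return sum(
        (c / n) * math.log2(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )


def lagged_mutual_information(a, b, lag=1):
    """How informative a[t] is about b[t + lag], for lag >= 1."""
    return mutual_information(a[:-lag], b[lag:])
```

Two identical balanced binary sequences share one bit of information, while sequences whose symbols co-occur independently share none.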

Perception-based energy functions in seam-cutting

Title Perception-based energy functions in seam-cutting
Authors Nan Li, Tianli Liao, Chao Wang
Abstract Image stitching is challenging in consumer-level photography because of alignment difficulties in unconstrained shooting environments. Recent studies show that seam-cutting approaches can effectively relieve artifacts generated by local misalignment. Seam-cutting is normally described in terms of energy minimization; however, few existing methods consider human perception in their energy functions, so the seam with minimum energy is sometimes not the least visible one in the overlapping region. In this paper, we propose a novel perception-based energy function in the seam-cutting framework, which considers the nonlinearity and the nonuniformity of human perception in energy minimization. Our perception-based approach adopts a sigmoid metric to characterize the perception of color discrimination, and a saliency weight to model the tendency of human eyes to pay more attention to salient objects. In addition, our seam-cutting composition can be easily integrated into other stitching pipelines. Experiments show that our method outperforms the seam-cutting method of the normal energy function, and a user study demonstrates that our composed results are more consistent with human perception.
Tasks Image Stitching
Published 2017-01-22
URL http://arxiv.org/abs/1701.06141v1
PDF http://arxiv.org/pdf/1701.06141v1.pdf
PWC https://paperswithcode.com/paper/perception-based-energy-functions-in-seam
Repo
Framework
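
A rough sketch of what a sigmoid colour metric with a saliency weight might look like is given below. The abstract above names these two ingredients but not their exact form, so the multiplicative combination, the threshold `tau`, the steepness `k`, and the Euclidean colour difference are all assumptions made for illustration.

```python
import math


def color_difference(p, q):
    """Euclidean distance between two colours with channels in [0, 1]."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def perceptual_cost(p, q, saliency=1.0, tau=0.1, k=40.0):
    """Sigmoid of the colour difference, scaled by a saliency weight.

    tau is the (hypothetical) difference at which the cost reaches half its
    maximum; k controls how sharply the cost rises around tau.
    """
    d = color_difference(p, q)
    return saliency / (1.0 + math.exp(-k * (d - tau)))
```

Compared with a linear difference, the sigmoid keeps small, imperceptible differences cheap and saturates for large ones, and the saliency factor makes mismatches on salient regions more expensive.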

Convergence Rates of Variational Posterior Distributions

Title Convergence Rates of Variational Posterior Distributions
Authors Fengshuo Zhang, Chao Gao
Abstract We study convergence rates of variational posterior distributions for nonparametric and high-dimensional inference. We formulate general conditions on prior, likelihood, and variational class that characterize the convergence rates. Under similar “prior mass and testing” conditions considered in the literature, the rate is found to be the sum of two terms. The first term stands for the convergence rate of the true posterior distribution, and the second term is contributed by the variational approximation error. For a class of priors that admit the structure of a mixture of product measures, we propose a novel prior mass condition, under which the variational approximation error of the mean-field class is dominated by convergence rate of the true posterior. We demonstrate the applicability of our general results for various models, prior distributions and variational classes by deriving convergence rates of the corresponding variational posteriors.
Tasks
Published 2017-12-07
URL https://arxiv.org/abs/1712.02519v4
PDF https://arxiv.org/pdf/1712.02519v4.pdf
PWC https://paperswithcode.com/paper/convergence-rates-of-variational-posterior
Repo
Framework

Priv’IT: Private and Sample Efficient Identity Testing

Title Priv’IT: Private and Sample Efficient Identity Testing
Authors Bryan Cai, Constantinos Daskalakis, Gautam Kamath
Abstract We develop differentially private hypothesis testing methods for the small sample regime. Given a sample $\cal D$ from a categorical distribution $p$ over some domain $\Sigma$, an explicitly described distribution $q$ over $\Sigma$, some privacy parameter $\varepsilon$, accuracy parameter $\alpha$, and requirements $\beta_{\rm I}$ and $\beta_{\rm II}$ for the type I and type II errors of our test, the goal is to distinguish between $p=q$ and $d_{\rm{TV}}(p,q) \geq \alpha$. We provide theoretical bounds for the sample size $|{\cal D}|$ so that our method both satisfies $(\varepsilon,0)$-differential privacy, and guarantees $\beta_{\rm I}$ and $\beta_{\rm II}$ type I and type II errors. We show that differential privacy may come for free in some regimes of parameters, and we always beat the sample complexity resulting from running the $\chi^2$-test with noisy counts, or standard approaches such as repetition for endowing non-private $\chi^2$-style statistics with differential privacy guarantees. We experimentally compare the sample complexity of our method to that of recently proposed methods for private hypothesis testing.
Tasks
Published 2017-03-29
URL http://arxiv.org/abs/1703.10127v3
PDF http://arxiv.org/pdf/1703.10127v3.pdf
PWC https://paperswithcode.com/paper/privit-private-and-sample-efficient-identity
Repo
Framework
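
The abstract above compares against the baseline of running the $\chi^2$-test with noisy counts. A minimal sketch of that baseline (not of the Priv'IT method itself) is shown below. It assumes neighbouring datasets differ by replacing one sample, so a histogram has L1 sensitivity 2 and Laplace noise of scale 2/ε is added to each count; the function names are hypothetical.

```python
import random


def total_variation(p, q):
    """Total variation distance between two distributions on the same domain."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def chi_square_statistic(counts, q):
    """Pearson chi-square statistic of observed counts against distribution q."""
    n = sum(counts)
    return sum((c - n * qi) ** 2 / (n * qi) for c, qi in zip(counts, q))


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials is Laplace-distributed.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_chi_square(counts, q, epsilon, rng):
    """Chi-square statistic computed on Laplace-perturbed counts."""
    n = sum(counts)        # sample size is treated as public
    scale = 2.0 / epsilon  # assumed sensitivity 2 (replace-one-sample)
    noisy = [c + laplace_noise(scale, rng) for c in counts]
    return sum((c - n * qi) ** 2 / (n * qi) for c, qi in zip(noisy, q))


rng = random.Random(0)
stat = noisy_chi_square([10, 10, 20], [0.25, 0.25, 0.5], epsilon=1.0, rng=rng)
```

The added noise inflates the statistic even when $p=q$, which is one intuition for why this baseline needs more samples than a purpose-built private test.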