January 28, 2020

3134 words 15 mins read

Paper Group ANR 864
Road User Detection in Videos

Title Road User Detection in Videos
Authors Hughes Perreault, Guillaume-Alexandre Bilodeau, Nicolas Saunier, Pierre Gravel
Abstract Successive frames of a video are highly redundant, and the most popular object detection methods do not take advantage of this fact. Using multiple consecutive frames can improve detection of small objects or difficult examples and can improve speed and detection consistency in a video sequence, for instance by interpolating features between frames. In this work, a novel approach is introduced to perform online video object detection using two consecutive frames of video sequences involving road users. Two new models, RetinaNet-Double and RetinaNet-Flow, are proposed, based respectively on the concatenation of a target frame with a preceding frame, and the concatenation of the optical flow with the target frame. The models are trained and evaluated on three public datasets. Experiments show that using a preceding frame improves performance over single frame detectors, but using explicit optical flow usually does not.
Tasks Object Detection, Optical Flow Estimation, Video Object Detection
Published 2019-03-28
URL http://arxiv.org/abs/1903.12049v1
PDF http://arxiv.org/pdf/1903.12049v1.pdf
PWC https://paperswithcode.com/paper/road-user-detection-in-videos
Repo
Framework
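The core input change behind RetinaNet-Double — feeding the detector a target frame concatenated channel-wise with its preceding frame — can be sketched as below. This is my own illustration of the idea from the abstract, not the authors' code; `concat_frames` is a hypothetical helper, and the frames are tiny nested lists standing in for real image tensors.

```python
# Hypothetical sketch of the two-frame input used by RetinaNet-Double:
# the target frame and its preceding frame are concatenated along the
# channel axis, so a standard 3-channel backbone becomes a 6-channel one.

def concat_frames(frame_t, frame_prev):
    """Stack two HxWx3 RGB frames channel-wise into one HxWx6 input."""
    assert len(frame_t) == len(frame_prev)          # same height
    assert len(frame_t[0]) == len(frame_prev[0])    # same width
    return [
        [pt + pp for pt, pp in zip(row_t, row_p)]   # 3 + 3 channels per pixel
        for row_t, row_p in zip(frame_t, frame_prev)
    ]

# Toy 2x2 frames.
f_t = [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [1, 1, 1]]]
f_p = [[[0, 0, 0], [9, 9, 9]], [[2, 2, 2], [3, 3, 3]]]
x = concat_frames(f_t, f_p)
print(len(x), len(x[0]), len(x[0][0]))  # 2 2 6
```

RetinaNet-Flow would swap `f_p` for a per-pixel optical-flow map, which the experiments suggest usually does not help.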

Comment on “Adv-BNN: Improved Adversarial Defense through Robust Bayesian Neural Network”

Title Comment on “Adv-BNN: Improved Adversarial Defense through Robust Bayesian Neural Network”
Authors Roland S. Zimmermann
Abstract A recent paper by Liu et al. combines the topics of adversarial training and Bayesian Neural Networks (BNN) and suggests that adversarially trained BNNs are more robust against adversarial attacks than their non-Bayesian counterparts. Here, I analyze the proposed defense and suggest that one needs to adjust the adversarial attack to incorporate the stochastic nature of a Bayesian network to perform an accurate evaluation of its robustness. Using this new type of attack I show that there appears to be no strong evidence for higher robustness of the adversarially trained BNNs.
Tasks Adversarial Attack, Adversarial Defense
Published 2019-07-01
URL https://arxiv.org/abs/1907.00895v1
PDF https://arxiv.org/pdf/1907.00895v1.pdf
PWC https://paperswithcode.com/paper/comment-on-adv-bnn-improved-adversarial
Repo
Framework
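The adjustment the comment argues for — making the attack account for the model's stochasticity rather than attacking a single weight draw — amounts to averaging gradients over several sampled networks before each attack step (an expectation-over-transformation-style attack). The sketch below is my own construction on a toy stochastic linear model, not the paper's code; the model, noise level, and step sizes are made up.

```python
import random

# Toy sketch: attack a stochastic model by averaging gradients over
# several weight samples per step, instead of using a single draw.
random.seed(0)

W = [1.0, -2.0, 0.5]          # mean weights of a toy stochastic linear model
SIGMA = 0.3                   # weight noise making each forward pass stochastic

def sample_weights():
    return [w + random.gauss(0.0, SIGMA) for w in W]

def margin(w, x, y):
    return y * sum(wi * xi for wi, xi in zip(w, x))

def attack(x, y, steps=10, alpha=0.05, k=20):
    """Sign-gradient steps on the *expected* loss: average the gradient
    over k weight samples before each step."""
    x = list(x)
    for _ in range(steps):
        grad = [0.0] * len(x)
        for _ in range(k):                    # Monte Carlo over weight noise
            w = sample_weights()
            for i in range(len(x)):
                grad[i] += -y * w[i] / k      # d/dx of the loss -margin
        x = [xi + alpha * (1 if g > 0 else -1) for xi, g in zip(x, grad)]
    return x

x0, y = [0.5, -0.5, 1.0], 1
x_adv = attack(x0, y)
print(margin(W, x0, y), margin(W, x_adv, y))  # margin drops under the attack
```

With a single weight sample per step the gradient estimate is noisy and the attack weaker, which is the paper's point about over-estimating BNN robustness.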

Chart Auto-Encoders for Manifold Structured Data

Title Chart Auto-Encoders for Manifold Structured Data
Authors Stefan Schonsheck, Jie Chen, Rongjie Lai
Abstract Auto-encoding and generative models have achieved tremendous success in image and signal representation learning and generation. These models, however, generally employ the full Euclidean space or a bounded subset (such as $[0,1]^l$) as the latent space, whose flat geometry is often too simplistic to meaningfully reflect the topological structure of the data. This paper aims at exploring a universal geometric structure of the latent space for better data representation. Inspired by differential geometry, we propose a Chart Auto-Encoder (CAE), which captures the manifold structure of the data with multiple charts and transition functions among them. CAE translates the mathematical definition of a manifold by parameterizing the entire data set as a collection of overlapping charts, creating local latent representations. These representations are an enhancement of the single-charted latent space commonly employed in auto-encoding models, as they reflect the intrinsic structure of the manifold. Therefore, CAE achieves a more accurate approximation of data and generates realistic synthetic examples. We demonstrate the efficacy of CAEs through a series of experiments with synthetic and real-life data, which illustrate that CAEs can outperform variational auto-encoders on reconstruction tasks while using much smaller latent spaces.
Tasks Representation Learning
Published 2019-12-20
URL https://arxiv.org/abs/1912.10094v1
PDF https://arxiv.org/pdf/1912.10094v1.pdf
PWC https://paperswithcode.com/paper/chart-auto-encoders-for-manifold-structured-1
Repo
Framework
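A toy way to see why multiple charts help: the unit circle is a 1-manifold that no single continuous map to an interval can cover without a discontinuity, but two overlapping charts can. The sketch below is my own illustration of the chart idea, not the paper's architecture; the hand-written `encode`/`decode` pair stands in for the learned encoders and decoders, and the chart-selection step plays the role of the transition functions.

```python
import math

# Cover the unit circle with two charts: one for each half-plane.
# encode() returns (chart_id, local coordinate); decode() inverts it.

def encode(p):
    x, y = p
    if y >= 0:
        return 0, math.atan2(y, x)        # chart 0: upper half, t in [0, pi]
    return 1, math.atan2(-y, -x)          # chart 1: lower half (rotated copy)

def decode(chart_id, t):
    if chart_id == 0:
        return (math.cos(t), math.sin(t))
    return (-math.cos(t), -math.sin(t))

# Round-trip points on the circle through their chart.
for k in range(12):
    theta = 2 * math.pi * k / 12
    p = (math.cos(theta), math.sin(theta))
    q = decode(*encode(p))
    assert abs(p[0] - q[0]) < 1e-9 and abs(p[1] - q[1]) < 1e-9
print("round-trip ok")
```

Each local coordinate lives in a small interval even though the circle as a whole has no such global parameterization — the same reason CAE's multi-chart latent space can stay low-dimensional.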

A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms

Title A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms
Authors Yoshua Bengio, Tristan Deleu, Nasim Rahaman, Rosemary Ke, Sébastien Lachapelle, Olexa Bilaniuk, Anirudh Goyal, Christopher Pal
Abstract We propose to meta-learn causal structures based on how fast a learner adapts to new distributions arising from sparse distributional changes, e.g. due to interventions, actions of agents and other sources of non-stationarities. We show that under this assumption, the correct causal structural choices lead to faster adaptation to modified distributions because the changes are concentrated in one or just a few mechanisms when the learned knowledge is modularized appropriately. This leads to sparse expected gradients and a lower effective number of degrees of freedom needing to be relearned while adapting to the change. It motivates using the speed of adaptation to a modified distribution as a meta-learning objective. We demonstrate how this can be used to determine the cause-effect relationship between two observed variables. The distributional changes do not need to correspond to standard interventions (clamping a variable), and the learner has no direct knowledge of these interventions. We show that causal structures can be parameterized via continuous variables and learned end-to-end. We then explore how these ideas could be used to also learn an encoder that would map low-level observed variables to unobserved causal variables leading to faster adaptation out-of-distribution, learning a representation space where one can satisfy the assumptions of independent mechanisms and of small and sparse changes in these mechanisms due to actions and non-stationarities.
Tasks Meta-Learning
Published 2019-01-30
URL http://arxiv.org/abs/1901.10912v2
PDF http://arxiv.org/pdf/1901.10912v2.pdf
PWC https://paperswithcode.com/paper/a-meta-transfer-objective-for-learning-to
Repo
Framework
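The sparsity argument at the heart of the paper can be checked numerically on a two-variable example. Below is my own toy construction (not the authors' code): for a true A → B model, an intervention that changes only the cause's marginal P(A) leaves P(B|A) untouched, so the causal factorization has one changed factor, while in the anti-causal factorization P(B)P(A|B) both factors change — hence slower adaptation in the wrong direction.

```python
# Binary A, B; the mechanism P(B|A) is fixed, only P(A) is intervened on.

def joint(pa, pb_given_a):
    """P(A=a, B=b) for binary A, B."""
    return {(a, b): pa[a] * pb_given_a[a][b] for a in (0, 1) for b in (0, 1)}

def factors_anticausal(j):
    """Recover the anti-causal factors P(B) and P(A|B) from the joint."""
    pb = {b: j[(0, b)] + j[(1, b)] for b in (0, 1)}
    pa_given_b = {b: {a: j[(a, b)] / pb[b] for a in (0, 1)} for b in (0, 1)}
    return pb, pa_given_b

pb_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # causal mechanism
pa_before = {0: 0.5, 1: 0.5}
pa_after = {0: 0.1, 1: 0.9}                              # intervention on A only

pb0, pagb0 = factors_anticausal(joint(pa_before, pb_given_a))
pb1, pagb1 = factors_anticausal(joint(pa_after, pb_given_a))

# Causal direction: P(B|A) unchanged, only P(A) moved (one factor to relearn).
# Anti-causal direction: both P(B) and P(A|B) moved (two factors to relearn).
print("P(B) changed:   ", pb0 != pb1)
print("P(A|B) changed: ", pagb0 != pagb1)
```

This is exactly the asymmetry the meta-objective exploits: the correctly factorized model needs fewer gradient updates after the distribution shift.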

Representation Learning of Music Using Artist, Album, and Track Information

Title Representation Learning of Music Using Artist, Album, and Track Information
Authors Jongpil Lee, Jiyoung Park, Juhan Nam
Abstract Supervised music representation learning has been performed mainly using semantic labels such as music genres. However, annotating music with semantic labels requires time and cost. In this work, we investigate the use of factual metadata such as artist, album, and track information, which are naturally annotated to songs, for supervised music representation learning. The results show that each type of metadata has its own concept characteristics, and using them jointly improves overall performance.
Tasks Representation Learning
Published 2019-06-27
URL https://arxiv.org/abs/1906.11783v1
PDF https://arxiv.org/pdf/1906.11783v1.pdf
PWC https://paperswithcode.com/paper/representation-learning-of-music-using-artist-1
Repo
Framework

Validation of image-guided cochlear implant programming techniques

Title Validation of image-guided cochlear implant programming techniques
Authors Yiyuan Zhao, Jianing Wang, Rui Li, Robert F. Labadie, Benoit M. Dawant, Jack H. Noble
Abstract Cochlear implants (CIs) are a standard treatment for patients who experience severe to profound hearing loss. Recent studies have shown that hearing outcome is correlated with intra-cochlear anatomy and electrode placement. Our group has developed image-guided CI programming (IGCIP) techniques that use image analysis methods to both segment the inner ear structures in pre- or post-implantation CT images and localize the CI electrodes in post-implantation CT images. This makes it possible to assist audiologists with CI programming by suggesting which among the contacts should be deactivated to reduce electrode interaction that is known to affect outcomes. Clinical studies have shown that IGCIP can improve hearing outcomes for CI recipients. However, the sensitivity of IGCIP with respect to the accuracy of its two major steps, electrode localization and intra-cochlear anatomy segmentation, is unknown. In this article, we create a ground truth dataset with conventional CT and micro-CT images of 35 temporal bone specimens to both rigorously characterize the accuracy of these two steps and assess how inaccuracies in these steps affect the overall results. Our study results show that when clinical pre- and post-implantation CTs are available, IGCIP produces results that are comparable to those obtained with the corresponding ground truth in 86.7% of the subjects tested. When only post-implantation CTs are available, this number is 83.3%. These results suggest that our current method is robust to errors in segmentation and localization but also that it can be improved upon. Keywords: cochlear implant, ground truth, segmentation, validation
Tasks
Published 2019-09-23
URL https://arxiv.org/abs/1909.10137v1
PDF https://arxiv.org/pdf/1909.10137v1.pdf
PWC https://paperswithcode.com/paper/190910137
Repo
Framework

Emergent properties of the local geometry of neural loss landscapes

Title Emergent properties of the local geometry of neural loss landscapes
Authors Stanislav Fort, Surya Ganguli
Abstract The local geometry of high dimensional neural network loss landscapes can both challenge our cherished theoretical intuitions and dramatically impact the practical success of neural network training. Indeed, recent works have observed four striking local properties of neural loss landscapes on classification tasks: (1) the landscape exhibits exactly $C$ directions of high positive curvature, where $C$ is the number of classes; (2) gradient directions are largely confined to this extremely low dimensional subspace of positive Hessian curvature, leaving the vast majority of directions in weight space unexplored; (3) gradient descent transiently explores intermediate regions of higher positive curvature before eventually finding flatter minima; (4) training can be successful even when confined to low dimensional {\it random} affine hyperplanes, as long as these hyperplanes intersect a Goldilocks zone of higher than average curvature. We develop a simple theoretical model of gradients and Hessians, justified by numerical experiments on architectures and datasets used in practice, that {\it simultaneously} accounts for all four of these surprising and seemingly unrelated properties. Our unified model provides conceptual insights into the emergence of these properties and makes connections with diverse topics in neural networks, random matrix theory, and spin glasses, including the neural tangent kernel, BBP phase transitions, and Derrida’s random energy model.
Tasks
Published 2019-10-14
URL https://arxiv.org/abs/1910.05929v1
PDF https://arxiv.org/pdf/1910.05929v1.pdf
PWC https://paperswithcode.com/paper/emergent-properties-of-the-local-geometry-of
Repo
Framework

Constraining the Parameters of High-Dimensional Models with Active Learning

Title Constraining the Parameters of High-Dimensional Models with Active Learning
Authors Sascha Caron, Tom Heskes, Sydney Otten, Bob Stienen
Abstract Constraining the parameters of physical models with $>5-10$ parameters is a widespread problem in fields like particle physics and astronomy. The generation of data to explore this parameter space often requires large amounts of computational resources. The commonly used solution of reducing the number of relevant physical parameters hampers the generality of the results. In this paper we show that this problem can be alleviated by the use of active learning. We illustrate this with examples from high energy physics, a field where simulations are often expensive and parameter spaces are high-dimensional. We show that the active learning techniques query-by-committee and query-by-dropout-committee allow for the identification of model points in interesting regions of high-dimensional parameter spaces (e.g. around decision boundaries). This makes it possible to constrain model parameters more efficiently than is currently done with the most common sampling algorithms and to train better performing machine learning models on the same amount of data. Code implementing the experiments in this paper can be found on GitHub.
Tasks Active Learning
Published 2019-05-19
URL https://arxiv.org/abs/1905.08628v2
PDF https://arxiv.org/pdf/1905.08628v2.pdf
PWC https://paperswithcode.com/paper/constraining-the-parameters-of-high
Repo
Framework
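The query-by-committee idea the abstract relies on is simple enough to sketch: train several models, then query the unlabeled point on which the committee disagrees most (here measured by vote entropy). The committee below is a set of toy 1-D threshold classifiers of my own invention; the paper's committees are ensembles of neural networks or dropout masks.

```python
import math

# Minimal query-by-committee sketch: score each unlabeled point by the
# entropy of the committee's votes and query the most contested point.

def vote_entropy(votes):
    n = len(votes)
    ent = 0.0
    for label in set(votes):
        p = votes.count(label) / n
        ent -= p * math.log(p)
    return ent

committee = [lambda x, t=t: int(x > t) for t in (0.3, 0.5, 0.7)]  # 3 members
pool = [0.1, 0.4, 0.6, 0.9]                                       # unlabeled pool

scores = {x: vote_entropy([m(x) for m in committee]) for x in pool}
query = max(pool, key=lambda x: scores[x])
print(query)   # a point near the decision boundary, where members disagree
```

Points far from the boundary get unanimous votes (entropy zero) and are never queried, which is exactly how the method concentrates samples in the interesting regions of parameter space.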

VIANA: Visual Interactive Annotation of Argumentation

Title VIANA: Visual Interactive Annotation of Argumentation
Authors Fabian Sperrle, Rita Sevastjanova, Rebecca Kehlbeck, Mennatallah El-Assady
Abstract Argumentation Mining addresses the challenging tasks of identifying boundaries of argumentative text fragments and extracting their relationships. Fully automated solutions do not reach satisfactory accuracy due to their insufficient incorporation of semantics and domain knowledge. Therefore, experts currently rely on time-consuming manual annotations. In this paper, we present a visual analytics system that augments the manual annotation process by automatically suggesting which text fragments to annotate next. The accuracy of those suggestions is improved over time by incorporating linguistic knowledge and language modeling to learn a measure of argument similarity from user interactions. Based on a long-term collaboration with domain experts, we identify and model five high-level analysis tasks. We enable close reading and note-taking, annotation of arguments, argument reconstruction, extraction of argument relations, and exploration of argument graphs. To avoid context switches, we transition between all views through seamless morphing, visually anchoring all text- and graph-based layers. We evaluate our system with a two-stage expert user study based on a corpus of presidential debates. The results show that experts prefer our system over existing solutions due to the speedup provided by the automatic suggestions and the tight integration between text and graph views.
Tasks Language Modelling
Published 2019-07-29
URL https://arxiv.org/abs/1907.12413v1
PDF https://arxiv.org/pdf/1907.12413v1.pdf
PWC https://paperswithcode.com/paper/viana-visual-interactive-annotation-of
Repo
Framework

You can’t see what you can’t see: Experimental evidence for how much relevant information may be missed due to Google’s Web search personalisation

Title You can’t see what you can’t see: Experimental evidence for how much relevant information may be missed due to Google’s Web search personalisation
Authors Cameron Lai, Markus Luczak-Roesch
Abstract The influence of Web search personalisation on professional knowledge work is an understudied area. Here we investigate how public sector officials self-assess their dependency on the Google Web search engine, whether they are aware of the potential impact of algorithmic biases on their ability to retrieve all relevant information, and how much relevant information may actually be missed due to Web search personalisation. We find that the majority of participants in our experimental study are neither aware that there is a potential problem nor do they have a strategy to mitigate the risk of missing relevant information when performing online searches. Most significantly, we provide empirical evidence that up to 20% of relevant information may be missed due to Web search personalisation. This work has significant implications for Web research by public sector professionals, who should be provided with training about the potential algorithmic biases that may affect their judgments and decision making, as well as clear guidelines how to minimise the risk of missing relevant information.
Tasks Decision Making
Published 2019-04-30
URL https://arxiv.org/abs/1904.13022v2
PDF https://arxiv.org/pdf/1904.13022v2.pdf
PWC https://paperswithcode.com/paper/you-cant-see-what-you-cant-see-experimental
Repo
Framework

Neural Voice Puppetry: Audio-driven Facial Reenactment

Title Neural Voice Puppetry: Audio-driven Facial Reenactment
Authors Justus Thies, Mohamed Elgharib, Ayush Tewari, Christian Theobalt, Matthias Nießner
Abstract We present Neural Voice Puppetry, a novel approach for audio-driven facial video synthesis. Given an audio sequence of a source person or digital assistant, we generate a photo-realistic output video of a target person that is in sync with the audio of the source input. This audio-driven facial reenactment is driven by a deep neural network that employs a latent 3D face model space. Through the underlying 3D representation, the model inherently learns temporal stability while we leverage neural rendering to generate photo-realistic output frames. Our approach generalizes across different people, allowing us to synthesize videos of a target actor with the voice of any unknown source actor or even synthetic voices that can be generated utilizing standard text-to-speech approaches. Neural Voice Puppetry has a variety of use-cases, including audio-driven video avatars, video dubbing, and text-driven video synthesis of a talking head. We demonstrate the capabilities of our method in a series of audio- and text-based puppetry examples. Our method is not only more general than existing works since we are generic to the input person, but we also show superior visual and lip sync quality compared to photo-realistic audio- and video-driven reenactment techniques.
Tasks
Published 2019-12-11
URL https://arxiv.org/abs/1912.05566v1
PDF https://arxiv.org/pdf/1912.05566v1.pdf
PWC https://paperswithcode.com/paper/neural-voice-puppetry-audio-driven-facial
Repo
Framework

Face Recognition using Compressive Sensing

Title Face Recognition using Compressive Sensing
Authors Slavko Kovacevic, Vuko Djaletic, Jelena Vukovic
Abstract This paper deals with the implementation of Compressive Sensing in the Face Recognition problem. Compressive Sensing is a new approach in signal processing with the single goal of recovering a signal from a small set of available samples. Compressive Sensing finds use in many real applications, as it lowers memory demand and acquisition time, and therefore allows dealing with huge data in the fastest manner. In this paper, the undersampled signal is recovered using an algorithm based on Total Variation minimization. The theory is verified with experimental results using different percentages of signal samples.
Tasks Compressive Sensing, Face Recognition
Published 2019-02-06
URL http://arxiv.org/abs/1902.05388v1
PDF http://arxiv.org/pdf/1902.05388v1.pdf
PWC https://paperswithcode.com/paper/face-recognition-using-compressive-sensing
Repo
Framework
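The Total Variation recovery step can be illustrated on a tiny 1-D problem: recover an undersampled piecewise-constant signal by gradient descent on a least-squares data term plus a smoothed TV penalty. This is my own toy sketch under made-up problem sizes and hyperparameters, not the paper's algorithm or code.

```python
import math
import random

# Recover a length-20 piecewise-constant signal from 12 random projections
# by minimizing ||Ax - y||^2 + LAM * smoothed-TV(x) with gradient descent.
random.seed(1)

n, m = 20, 12
x_true = [0.0] * 7 + [1.0] * 6 + [0.0] * 7   # two jumps -> sparse gradient
A = [[random.gauss(0, 1) / math.sqrt(m) for _ in range(n)] for _ in range(m)]
y = [sum(A[i][j] * x_true[j] for j in range(n)) for i in range(m)]

LAM, EPS = 0.02, 1e-3                         # TV weight, smoothing constant

def objective(x):
    data = sum((sum(A[i][j] * x[j] for j in range(n)) - y[i]) ** 2
               for i in range(m))
    tv = sum(math.sqrt((x[j + 1] - x[j]) ** 2 + EPS) for j in range(n - 1))
    return data + LAM * tv

def gradient(x):
    r = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
    g = [2 * sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
    for j in range(n - 1):                    # gradient of the smoothed TV term
        d = x[j + 1] - x[j]
        s = d / math.sqrt(d * d + EPS)
        g[j] -= LAM * s
        g[j + 1] += LAM * s
    return g

x = [0.0] * n
for _ in range(3000):
    x = [xi - 0.05 * gi for xi, gi in zip(x, gradient(x))]

err = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_true))
                / sum(b * b for b in x_true))
print("relative error:", round(err, 3))
```

Real TV solvers use faster schemes (e.g. primal-dual or ADMM) and 2-D gradients for images; the point here is only that the TV prior lets far fewer samples than unknowns still pin down a piecewise-constant signal.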

Efficient feature embedding of 3D brain MRI images for content-based image retrieval with deep metric learning

Title Efficient feature embedding of 3D brain MRI images for content-based image retrieval with deep metric learning
Authors Yuto Onga, Shingo Fujiyama, Hayato Arai, Yusuke Chayama, Hitoshi Iyatomi, Kenichi Oishi
Abstract Increasing numbers of MRI brain scans, improvements in image resolution, and advancements in MRI acquisition technology are causing significant increases in the demand for and burden on radiologists’ efforts in terms of reading and interpreting brain MRIs. Content-based image retrieval (CBIR) is an emerging technology for reducing this burden by supporting the reading of medical images. High dimensionality is a major challenge in developing a CBIR system that is applicable for 3D brain MRIs. In this study, we propose a system called disease-oriented data concentration with metric learning (DDCML). In DDCML, we introduce deep metric learning to a 3D convolutional autoencoder (CAE). Our proposed DDCML scheme achieves a high dimensional compression rate (4096:1) while preserving the disease-related anatomical features that are important for medical image classification. The low-dimensional representation obtained by DDCML improved the clustering performance by 29.1% compared to plain 3D-CAE in terms of discriminating Alzheimer’s disease patients from healthy subjects, and successfully reproduced the relationships of the severity of disease categories that were not included in the training.
Tasks Content-Based Image Retrieval, Image Classification, Image Retrieval, Metric Learning
Published 2019-12-04
URL https://arxiv.org/abs/1912.01824v1
PDF https://arxiv.org/pdf/1912.01824v1.pdf
PWC https://paperswithcode.com/paper/efficient-feature-embedding-of-3d-brain-mri
Repo
Framework
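The deep-metric-learning ingredient DDCML adds on top of the 3D CAE is typically a triplet-style objective: pull embeddings of same-label scans together and push different-label scans apart. The sketch below shows a standard triplet loss on toy 2-D embeddings; the margin value and example vectors are illustrative, not taken from the paper.

```python
import math

# Standard triplet loss: d(anchor, positive) - d(anchor, negative) + margin,
# clipped at zero once the negative is sufficiently far away.

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Pull same-class embeddings together, push different ones apart."""
    return max(0.0, dist(anchor, positive) - dist(anchor, negative) + margin)

# Toy embeddings: anchor and positive share a disease label.
anchor = [0.0, 0.0]
positive = [0.1, 0.0]       # same class, nearby
negative = [3.0, 0.0]       # other class, beyond the margin -> zero loss
print(triplet_loss(anchor, positive, negative))  # 0.0
```

Combined with the reconstruction loss of the autoencoder, this kind of term is what concentrates each disease category in its own region of the compressed latent space.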

Content-based image retrieval speedup

Title Content-based image retrieval speedup
Authors Sadegh Fadaei, Abdolreza Rashno, Elyas Rashno
Abstract Content-based image retrieval (CBIR) is the task of retrieving images based on their content. Since the retrieval process is time-consuming in large image databases, acceleration methods can be very useful. This paper presents a novel method to speed up CBIR systems. In the proposed method, Zernike moments are first extracted from the query image and an interval is calculated for that query. Images in the database that fall outside the interval are ignored in the retrieval process. Therefore, a database reduction occurs before retrieval, which leads to a speedup. It is shown that in the reduced database, images relevant to the query are preserved while irrelevant images are discarded. Therefore, the proposed method speeds up the retrieval process while preserving CBIR accuracy.
Tasks Content-Based Image Retrieval, Image Retrieval
Published 2019-11-26
URL https://arxiv.org/abs/1911.11379v2
PDF https://arxiv.org/pdf/1911.11379v2.pdf
PWC https://paperswithcode.com/paper/content-based-image-retrieval-speedup
Repo
Framework
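The filtering step the abstract describes can be sketched in a few lines: compute a feature for the query, build an interval around it, and drop database images whose feature falls outside before running the expensive retrieval. This is my paraphrase of the abstract, not the authors' code; a toy scalar stands in for the Zernike-moment feature, and `half_width` is a made-up parameter.

```python
# Reduce the database to candidates whose (precomputed) feature lies in
# an interval around the query's feature, before full retrieval.

def reduce_database(db_features, query_feature, half_width):
    lo, hi = query_feature - half_width, query_feature + half_width
    return [i for i, f in enumerate(db_features) if lo <= f <= hi]

db = [0.12, 0.95, 0.40, 0.47, 0.88, 0.45]   # precomputed per-image features
kept = reduce_database(db, query_feature=0.44, half_width=0.05)
print(kept)   # only candidates near the query survive: [2, 3, 5]
```

Because the features are precomputed offline, the filter costs one comparison per database image, which is what makes the overall retrieval faster without (by the paper's claim) losing relevant images.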

Crawler for Image Acquisition from World Wide Web

Title Crawler for Image Acquisition from World Wide Web
Authors R Rajkumar, Dr. M V Sudhamani
Abstract Due to advancements in computer communication and storage technologies, a large amount of image data is available on the World Wide Web (WWW). To locate a particular set of images, the available search engines may be used with keywords, but they do not filter out unwanted data. For the purpose of retrieving relevant images with appropriate keyword(s), an image crawler is designed and implemented. Keyword(s) are submitted as a query and, with the help of the sender engine, images are downloaded along with metadata such as URL, filename, file size, and file access date and time. Later, using the URLs, images already present in the repository and newly downloaded images are compared for uniqueness; only unique URLs are considered and stored in the repository. The images in the repository will be used to build a novel Content Based Image Retrieval (CBIR) system in the future. This repository may be used for various purposes; in particular, this image crawler tool is useful in building image datasets that can be used by any CBIR system for training and testing purposes.
Tasks Content-Based Image Retrieval, Image Retrieval
Published 2019-11-11
URL https://arxiv.org/abs/1911.11066v1
PDF https://arxiv.org/pdf/1911.11066v1.pdf
PWC https://paperswithcode.com/paper/crawler-for-image-acquisition-from-world-wide
Repo
Framework
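The uniqueness check the abstract describes — comparing downloaded records against the repository by URL and storing only unseen ones — can be sketched as below. This is a hypothetical illustration of the dedup logic only; the actual network fetching is mocked with a static list, and the record fields are assumptions.

```python
# Deduplicate downloaded image records against the repository by URL.

repository = {"http://example.com/a.jpg"}          # URLs already stored

downloaded = [
    {"url": "http://example.com/a.jpg", "size": 1024},   # already in repo
    {"url": "http://example.com/b.jpg", "size": 2048},   # new
    {"url": "http://example.com/b.jpg", "size": 2048},   # repeat in this batch
]

new_records = []
for rec in downloaded:
    if rec["url"] not in repository:               # uniqueness check by URL
        repository.add(rec["url"])
        new_records.append(rec)

print([r["url"] for r in new_records])   # only b.jpg is added, exactly once
```

Using a set for the repository makes each membership check O(1), so the dedup step scales to large crawls.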