January 28, 2020

3134 words 15 mins read

Paper Group ANR 864
Road User Detection in Videos

Title Road User Detection in Videos
Authors Hughes Perreault, Guillaume-Alexandre Bilodeau, Nicolas Saunier, Pierre Gravel
Abstract Successive frames of a video are highly redundant, and the most popular object detection methods do not take advantage of this fact. Using multiple consecutive frames can improve detection of small objects or difficult examples and can improve speed and detection consistency in a video sequence, for instance by interpolating features between frames. In this work, a novel approach is introduced to perform online video object detection using two consecutive frames of video sequences involving road users. Two new models, RetinaNet-Double and RetinaNet-Flow, are proposed, based respectively on the concatenation of a target frame with a preceding frame, and the concatenation of the optical flow with the target frame. The models are trained and evaluated on three public datasets. Experiments show that using a preceding frame improves performance over single frame detectors, but using explicit optical flow usually does not.
Tasks Object Detection, Optical Flow Estimation, Video Object Detection
Published 2019-03-28
URL http://arxiv.org/abs/1903.12049v1
PDF http://arxiv.org/pdf/1903.12049v1.pdf
PWC https://paperswithcode.com/paper/road-user-detection-in-videos
Repo
Framework
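The core input change behind RetinaNet-Double — feeding the detector a target frame concatenated channel-wise with its preceding frame — can be sketched as below. This is my own illustration of the idea from the abstract, not the authors' code; `concat_frames` is a hypothetical helper, and the frames are tiny nested lists standing in for real image tensors.

```python
# Hypothetical sketch of the two-frame input used by RetinaNet-Double:
# the target frame and its preceding frame are concatenated along the
# channel axis, so a standard 3-channel backbone becomes a 6-channel one.

def concat_frames(frame_t, frame_prev):
    """Stack two HxWx3 RGB frames channel-wise into one HxWx6 input."""
    assert len(frame_t) == len(frame_prev)          # same height
    assert len(frame_t[0]) == len(frame_prev[0])    # same width
    return [
        [pt + pp for pt, pp in zip(row_t, row_p)]   # 3 + 3 channels per pixel
        for row_t, row_p in zip(frame_t, frame_prev)
    ]

# Toy 2x2 frames.
f_t = [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [1, 1, 1]]]
f_p = [[[0, 0, 0], [9, 9, 9]], [[2, 2, 2], [3, 3, 3]]]
x = concat_frames(f_t, f_p)
print(len(x), len(x[0]), len(x[0][0]))  # 2 2 6
```

RetinaNet-Flow would swap `f_p` for a per-pixel optical-flow map, which the experiments suggest usually does not help.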

Comment on “Adv-BNN: Improved Adversarial Defense through Robust Bayesian Neural Network”

Title Comment on “Adv-BNN: Improved Adversarial Defense through Robust Bayesian Neural Network”
Authors Roland S. Zimmermann
Abstract A recent paper by Liu et al. combines the topics of adversarial training and Bayesian Neural Networks (BNN) and suggests that adversarially trained BNNs are more robust against adversarial attacks than their non-Bayesian counterparts. Here, I analyze the proposed defense and suggest that one needs to adjust the adversarial attack to incorporate the stochastic nature of a Bayesian network to perform an accurate evaluation of its robustness. Using this new type of attack I show that there appears to be no strong evidence for higher robustness of the adversarially trained BNNs.
Tasks Adversarial Attack, Adversarial Defense
Published 2019-07-01
URL https://arxiv.org/abs/1907.00895v1
PDF https://arxiv.org/pdf/1907.00895v1.pdf
PWC https://paperswithcode.com/paper/comment-on-adv-bnn-improved-adversarial
Repo
Framework
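The adjustment the comment argues for — making the attack account for the model's stochasticity rather than attacking a single weight draw — amounts to averaging gradients over several sampled networks before each attack step (an expectation-over-transformation-style attack). The sketch below is my own construction on a toy stochastic linear model, not the paper's code; the model, noise level, and step sizes are made up.

```python
import random

# Toy sketch: attack a stochastic model by averaging gradients over
# several weight samples per step, instead of using a single draw.
random.seed(0)

W = [1.0, -2.0, 0.5]          # mean weights of a toy stochastic linear model
SIGMA = 0.3                   # weight noise making each forward pass stochastic

def sample_weights():
    return [w + random.gauss(0.0, SIGMA) for w in W]

def margin(w, x, y):
    return y * sum(wi * xi for wi, xi in zip(w, x))

def attack(x, y, steps=10, alpha=0.05, k=20):
    """Sign-gradient steps on the *expected* loss: average the gradient
    over k weight samples before each step."""
    x = list(x)
    for _ in range(steps):
        grad = [0.0] * len(x)
        for _ in range(k):                    # Monte Carlo over weight noise
            w = sample_weights()
            for i in range(len(x)):
                grad[i] += -y * w[i] / k      # d/dx of the loss -margin
        x = [xi + alpha * (1 if g > 0 else -1) for xi, g in zip(x, grad)]
    return x

x0, y = [0.5, -0.5, 1.0], 1
x_adv = attack(x0, y)
print(margin(W, x0, y), margin(W, x_adv, y))  # margin drops under the attack
```

With a single weight sample per step the gradient estimate is noisy and the attack weaker, which is the paper's point about over-estimating BNN robustness.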

Chart Auto-Encoders for Manifold Structured Data

Title Chart Auto-Encoders for Manifold Structured Data
Authors Stefan Schonsheck, Jie Chen, Rongjie Lai
Abstract Auto-encoding and generative models have achieved tremendous success in image and signal representation learning and generation. These models, however, generally employ the full Euclidean space or a bounded subset (such as $[0,1]^l$) as the latent space, whose flat geometry is often too simplistic to meaningfully reflect the topological structure of the data. This paper aims at exploring a universal geometric structure of the latent space for better data representation. Inspired by differential geometry, we propose a Chart Auto-Encoder (CAE), which captures the manifold structure of the data with multiple charts and transition functions among them. CAE translates the mathematical definition of a manifold by parameterizing the entire data set as a collection of overlapping charts, creating local latent representations. These representations are an enhancement of the single-charted latent space commonly employed in auto-encoding models, as they reflect the intrinsic structure of the manifold. Therefore, CAE achieves a more accurate approximation of data and generates realistic synthetic examples. We demonstrate the efficacy of CAEs through a series of experiments with synthetic and real-life data, which illustrate that CAEs can outperform variational auto-encoders on reconstruction tasks while using much smaller latent spaces.
Tasks Representation Learning
Published 2019-12-20
URL https://arxiv.org/abs/1912.10094v1
PDF https://arxiv.org/pdf/1912.10094v1.pdf
PWC https://paperswithcode.com/paper/chart-auto-encoders-for-manifold-structured-1
Repo
Framework
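A toy way to see why multiple charts help: the unit circle is a 1-manifold that no single continuous map to an interval can cover without a discontinuity, but two overlapping charts can. The sketch below is my own illustration of the chart idea, not the paper's architecture; the hand-written `encode`/`decode` pair stands in for the learned encoders and decoders, and the chart-selection step plays the role of the transition functions.

```python
import math

# Cover the unit circle with two charts: one for each half-plane.
# encode() returns (chart_id, local coordinate); decode() inverts it.

def encode(p):
    x, y = p
    if y >= 0:
        return 0, math.atan2(y, x)        # chart 0: upper half, t in [0, pi]
    return 1, math.atan2(-y, -x)          # chart 1: lower half (rotated copy)

def decode(chart_id, t):
    if chart_id == 0:
        return (math.cos(t), math.sin(t))
    return (-math.cos(t), -math.sin(t))

# Round-trip points on the circle through their chart.
for k in range(12):
    theta = 2 * math.pi * k / 12
    p = (math.cos(theta), math.sin(theta))
    q = decode(*encode(p))
    assert abs(p[0] - q[0]) < 1e-9 and abs(p[1] - q[1]) < 1e-9
print("round-trip ok")
```

Each local coordinate lives in a small interval even though the circle as a whole has no such global parameterization — the same reason CAE's multi-chart latent space can stay low-dimensional.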

A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms

Title A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms
Authors Yoshua Bengio, Tristan Deleu, Nasim Rahaman, Rosemary Ke, Sébastien Lachapelle, Olexa Bilaniuk, Anirudh Goyal, Christopher Pal
Abstract We propose to meta-learn causal structures based on how fast a learner adapts to new distributions arising from sparse distributional changes, e.g. due to interventions, actions of agents and other sources of non-stationarities. We show that under this assumption, the correct causal structural choices lead to faster adaptation to modified distributions because the changes are concentrated in one or just a few mechanisms when the learned knowledge is modularized appropriately. This leads to sparse expected gradients and a lower effective number of degrees of freedom needing to be relearned while adapting to the change. It motivates using the speed of adaptation to a modified distribution as a meta-learning objective. We demonstrate how this can be used to determine the cause-effect relationship between two observed variables. The distributional changes do not need to correspond to standard interventions (clamping a variable), and the learner has no direct knowledge of these interventions. We show that causal structures can be parameterized via continuous variables and learned end-to-end. We then explore how these ideas could be used to also learn an encoder that would map low-level observed variables to unobserved causal variables leading to faster adaptation out-of-distribution, learning a representation space where one can satisfy the assumptions of independent mechanisms and of small and sparse changes in these mechanisms due to actions and non-stationarities.
Tasks Meta-Learning
Published 2019-01-30
URL http://arxiv.org/abs/1901.10912v2
PDF http://arxiv.org/pdf/1901.10912v2.pdf
PWC https://paperswithcode.com/paper/a-meta-transfer-objective-for-learning-to
Repo
Framework
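The sparsity argument at the heart of the paper can be checked numerically on a two-variable example. Below is my own toy construction (not the authors' code): for a true A → B model, an intervention that changes only the cause's marginal P(A) leaves P(B|A) untouched, so the causal factorization has one changed factor, while in the anti-causal factorization P(B)P(A|B) both factors change — hence slower adaptation in the wrong direction.

```python
# Binary A, B; the mechanism P(B|A) is fixed, only P(A) is intervened on.

def joint(pa, pb_given_a):
    """P(A=a, B=b) for binary A, B."""
    return {(a, b): pa[a] * pb_given_a[a][b] for a in (0, 1) for b in (0, 1)}

def factors_anticausal(j):
    """Recover the anti-causal factors P(B) and P(A|B) from the joint."""
    pb = {b: j[(0, b)] + j[(1, b)] for b in (0, 1)}
    pa_given_b = {b: {a: j[(a, b)] / pb[b] for a in (0, 1)} for b in (0, 1)}
    return pb, pa_given_b

pb_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # causal mechanism
pa_before = {0: 0.5, 1: 0.5}
pa_after = {0: 0.1, 1: 0.9}                              # intervention on A only

pb0, pagb0 = factors_anticausal(joint(pa_before, pb_given_a))
pb1, pagb1 = factors_anticausal(joint(pa_after, pb_given_a))

# Causal direction: P(B|A) unchanged, only P(A) moved (one factor to relearn).
# Anti-causal direction: both P(B) and P(A|B) moved (two factors to relearn).
print("P(B) changed:   ", pb0 != pb1)
print("P(A|B) changed: ", pagb0 != pagb1)
```

This is exactly the asymmetry the meta-objective exploits: the correctly factorized model needs fewer gradient updates after the distribution shift.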

Representation Learning of Music Using Artist, Album, and Track Information

Title Representation Learning of Music Using Artist, Album, and Track Information
Authors Jongpil Lee, Jiyoung Park, Juhan Nam
Abstract Supervised music representation learning has been performed mainly using semantic labels such as music genres. However, annotating music with semantic labels requires time and cost. In this work, we investigate the use of factual metadata such as artist, album, and track information, which are naturally annotated to songs, for supervised music representation learning. The results show that each type of metadata has its own concept characteristics, and using them jointly improves overall performance.
Tasks Representation Learning
Published 2019-06-27
URL https://arxiv.org/abs/1906.11783v1
PDF https://arxiv.org/pdf/1906.11783v1.pdf
PWC https://paperswithcode.com/paper/representation-learning-of-music-using-artist-1
Repo
Framework

Validation of image-guided cochlear implant programming techniques

Title Validation of image-guided cochlear implant programming techniques
Authors Yiyuan Zhao, Jianing Wang, Rui Li, Robert F. Labadie, Benoit M. Dawant, Jack H. Noble
Abstract Cochlear implants (CIs) are a standard treatment for patients who experience severe to profound hearing loss. Recent studies have shown that hearing outcome is correlated with intra-cochlear anatomy and electrode placement. Our group has developed image-guided CI programming (IGCIP) techniques that use image analysis methods to both segment the inner ear structures in pre- or post-implantation CT images and localize the CI electrodes in post-implantation CT images. This makes it possible to assist audiologists with CI programming by suggesting which among the contacts should be deactivated to reduce electrode interaction that is known to affect outcomes. Clinical studies have shown that IGCIP can improve hearing outcomes for CI recipients. However, the sensitivity of IGCIP with respect to the accuracy of its two major steps, electrode localization and intra-cochlear anatomy segmentation, is unknown. In this article, we create a ground truth dataset with conventional CT and micro-CT images of 35 temporal bone specimens to both rigorously characterize the accuracy of these two steps and assess how inaccuracies in these steps affect the overall results. Our study results show that when clinical pre- and post-implantation CTs are available, IGCIP produces results that are comparable to those obtained with the corresponding ground truth in 86.7% of the subjects tested. When only post-implantation CTs are available, this number is 83.3%. These results suggest that our current method is robust to errors in segmentation and localization but also that it can be improved upon. Keywords: cochlear implant, ground truth, segmentation, validation
Tasks
Published 2019-09-23
URL https://arxiv.org/abs/1909.10137v1
PDF https://arxiv.org/pdf/1909.10137v1.pdf
PWC https://paperswithcode.com/paper/190910137
Repo
Framework

Emergent properties of the local geometry of neural loss landscapes

Title Emergent properties of the local geometry of neural loss landscapes
Authors Stanislav Fort, Surya Ganguli
Abstract The local geometry of high dimensional neural network loss landscapes can both challenge our cherished theoretical intuitions and dramatically impact the practical success of neural network training. Indeed, recent works have observed four striking local properties of neural loss landscapes on classification tasks: (1) the landscape exhibits exactly $C$ directions of high positive curvature, where $C$ is the number of classes; (2) gradient directions are largely confined to this extremely low dimensional subspace of positive Hessian curvature, leaving the vast majority of directions in weight space unexplored; (3) gradient descent transiently explores intermediate regions of higher positive curvature before eventually finding flatter minima; (4) training can be successful even when confined to low dimensional {\it random} affine hyperplanes, as long as these hyperplanes intersect a Goldilocks zone of higher than average curvature. We develop a simple theoretical model of gradients and Hessians, justified by numerical experiments on architectures and datasets used in practice, that {\it simultaneously} accounts for all four of these surprising and seemingly unrelated properties. Our unified model provides conceptual insights into the emergence of these properties and makes connections with diverse topics in neural networks, random matrix theory, and spin glasses, including the neural tangent kernel, BBP phase transitions, and Derrida’s random energy model.
Tasks
Published 2019-10-14
URL https://arxiv.org/abs/1910.05929v1
PDF https://arxiv.org/pdf/1910.05929v1.pdf
PWC https://paperswithcode.com/paper/emergent-properties-of-the-local-geometry-of
Repo
Framework

Constraining the Parameters of High-Dimensional Models with Active Learning

Title Constraining the Parameters of High-Dimensional Models with Active Learning
Authors Sascha Caron, Tom Heskes, Sydney Otten, Bob Stienen
Abstract Constraining the parameters of physical models with $>5-10$ parameters is a widespread problem in fields like particle physics and astronomy. The generation of data to explore this parameter space often requires large amounts of computational resources. The commonly used solution of reducing the number of relevant physical parameters hampers the generality of the results. In this paper we show that this problem can be alleviated by the use of active learning. We illustrate this with examples from high energy physics, a field where simulations are often expensive and parameter spaces are high-dimensional. We show that the active learning techniques query-by-committee and query-by-dropout-committee allow for the identification of model points in interesting regions of high-dimensional parameter spaces (e.g. around decision boundaries). This makes it possible to constrain model parameters more efficiently than is currently done with the most common sampling algorithms and to train better performing machine learning models on the same amount of data. Code implementing the experiments in this paper can be found on GitHub.
Tasks Active Learning
Published 2019-05-19
URL https://arxiv.org/abs/1905.08628v2
PDF https://arxiv.org/pdf/1905.08628v2.pdf
PWC https://paperswithcode.com/paper/constraining-the-parameters-of-high
Repo
Framework
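The query-by-committee idea the abstract relies on is simple enough to sketch: train several models, then query the unlabeled point on which the committee disagrees most (here measured by vote entropy). The committee below is a set of toy 1-D threshold classifiers of my own invention; the paper's committees are ensembles of neural networks or dropout masks.

```python
import math

# Minimal query-by-committee sketch: score each unlabeled point by the
# entropy of the committee's votes and query the most contested point.

def vote_entropy(votes):
    n = len(votes)
    ent = 0.0
    for label in set(votes):
        p = votes.count(label) / n
        ent -= p * math.log(p)
    return ent

committee = [lambda x, t=t: int(x > t) for t in (0.3, 0.5, 0.7)]  # 3 members
pool = [0.1, 0.4, 0.6, 0.9]                                       # unlabeled pool

scores = {x: vote_entropy([m(x) for m in committee]) for x in pool}
query = max(pool, key=lambda x: scores[x])
print(query)   # a point near the decision boundary, where members disagree
```

Points far from the boundary get unanimous votes (entropy zero) and are never queried, which is exactly how the method concentrates samples in the interesting regions of parameter space.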

VIANA: Visual Interactive Annotation of Argumentation

Title VIANA: Visual Interactive Annotation of Argumentation
Authors Fabian Sperrle, Rita Sevastjanova, Rebecca Kehlbeck, Mennatallah El-Assady
Abstract Argumentation Mining addresses the challenging tasks of identifying boundaries of argumentative text fragments and extracting their relationships. Fully automated solutions do not reach satisfactory accuracy due to their insufficient incorporation of semantics and domain knowledge. Therefore, experts currently rely on time-consuming manual annotations. In this paper, we present a visual analytics system that augments the manual annotation process by automatically suggesting which text fragments to annotate next. The accuracy of those suggestions is improved over time by incorporating linguistic knowledge and language modeling to learn a measure of argument similarity from user interactions. Based on a long-term collaboration with domain experts, we identify and model five high-level analysis tasks. We enable close reading and note-taking, annotation of arguments, argument reconstruction, extraction of argument relations, and exploration of argument graphs. To avoid context switches, we transition between all views through seamless morphing, visually anchoring all text- and graph-based layers. We evaluate our system with a two-stage expert user study based on a corpus of presidential debates. The results show that experts prefer our system over existing solutions due to the speedup provided by the automatic suggestions and the tight integration between text and graph views.
Tasks Language Modelling
Published 2019-07-29
URL https://arxiv.org/abs/1907.12413v1
PDF https://arxiv.org/pdf/1907.12413v1.pdf
PWC https://paperswithcode.com/paper/viana-visual-interactive-annotation-of
Repo
Framework

You can’t see what you can’t see: Experimental evidence for how much relevant information may be missed due to Google’s Web search personalisation

Title You can’t see what you can’t see: Experimental evidence for how much relevant information may be missed due to Google’s Web search personalisation
Authors Cameron Lai, Markus Luczak-Roesch
Abstract The influence of Web search personalisation on professional knowledge work is an understudied area. Here we investigate how public sector officials self-assess their dependency on the Google Web search engine, whether they are aware of the potential impact of algorithmic biases on their ability to retrieve all relevant information, and how much relevant information may actually be missed due to Web search personalisation. We find that the majority of participants in our experimental study are neither aware that there is a potential problem nor do they have a strategy to mitigate the risk of missing relevant information when performing online searches. Most significantly, we provide empirical evidence that up to 20% of relevant information may be missed due to Web search personalisation. This work has significant implications for Web research by public sector professionals, who should be provided with training about the potential algorithmic biases that may affect their judgments and decision making, as well as clear guidelines how to minimise the risk of missing relevant information.
Tasks Decision Making
Published 2019-04-30
URL https://arxiv.org/abs/1904.13022v2
PDF https://arxiv.org/pdf/1904.13022v2.pdf
PWC https://paperswithcode.com/paper/you-cant-see-what-you-cant-see-experimental
Repo
Framework

Neural Voice Puppetry: Audio-driven Facial Reenactment

Title Neural Voice Puppetry: Audio-driven Facial Reenactment
Authors Justus Thies, Mohamed Elgharib, Ayush Tewari, Christian Theobalt, Matthias Nießner
Abstract We present Neural Voice Puppetry, a novel approach for audio-driven facial video synthesis. Given an audio sequence of a source person or digital assistant, we generate a photo-realistic output video of a target person that is in sync with the audio of the source input. This audio-driven facial reenactment is driven by a deep neural network that employs a latent 3D face model space. Through the underlying 3D representation, the model inherently learns temporal stability while we leverage neural rendering to generate photo-realistic output frames. Our approach generalizes across different people, allowing us to synthesize videos of a target actor with the voice of any unknown source actor or even synthetic voices that can be generated utilizing standard text-to-speech approaches. Neural Voice Puppetry has a variety of use-cases, including audio-driven video avatars, video dubbing, and text-driven video synthesis of a talking head. We demonstrate the capabilities of our method in a series of audio- and text-based puppetry examples. Our method is not only more general than existing works since we are generic to the input person, but we also show superior visual and lip sync quality compared to photo-realistic audio- and video-driven reenactment techniques.
Tasks
Published 2019-12-11
URL https://arxiv.org/abs/1912.05566v1
PDF https://arxiv.org/pdf/1912.05566v1.pdf
PWC https://paperswithcode.com/paper/neural-voice-puppetry-audio-driven-facial
Repo
Framework

Face Recognition using Compressive Sensing

Title Face Recognition using Compressive Sensing
Authors Slavko Kovacevic, Vuko Djaletic, Jelena Vukovic
Abstract This paper deals with the implementation of Compressive Sensing in the Face Recognition problem. Compressive Sensing is a new approach in signal processing with the single goal of recovering a signal from a small set of available samples. Compressive Sensing finds use in many real applications, as it lowers memory demand and acquisition time, and therefore allows dealing with huge data in the fastest manner. In this paper, the undersampled signal is recovered using an algorithm based on Total Variation minimization. The theory is verified with experimental results using different percentages of signal samples.
Tasks Compressive Sensing, Face Recognition
Published 2019-02-06
URL http://arxiv.org/abs/1902.05388v1
PDF http://arxiv.org/pdf/1902.05388v1.pdf
PWC https://paperswithcode.com/paper/face-recognition-using-compressive-sensing
Repo
Framework
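The Total Variation recovery step can be illustrated on a tiny 1-D problem: recover an undersampled piecewise-constant signal by gradient descent on a least-squares data term plus a smoothed TV penalty. This is my own toy sketch under made-up problem sizes and hyperparameters, not the paper's algorithm or code.

```python
import math
import random

# Recover a length-20 piecewise-constant signal from 12 random projections
# by minimizing ||Ax - y||^2 + LAM * smoothed-TV(x) with gradient descent.
random.seed(1)

n, m = 20, 12
x_true = [0.0] * 7 + [1.0] * 6 + [0.0] * 7   # two jumps -> sparse gradient
A = [[random.gauss(0, 1) / math.sqrt(m) for _ in range(n)] for _ in range(m)]
y = [sum(A[i][j] * x_true[j] for j in range(n)) for i in range(m)]

LAM, EPS = 0.02, 1e-3                         # TV weight, smoothing constant

def objective(x):
    data = sum((sum(A[i][j] * x[j] for j in range(n)) - y[i]) ** 2
               for i in range(m))
    tv = sum(math.sqrt((x[j + 1] - x[j]) ** 2 + EPS) for j in range(n - 1))
    return data + LAM * tv

def gradient(x):
    r = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
    g = [2 * sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
    for j in range(n - 1):                    # gradient of the smoothed TV term
        d = x[j + 1] - x[j]
        s = d / math.sqrt(d * d + EPS)
        g[j] -= LAM * s
        g[j + 1] += LAM * s
    return g

x = [0.0] * n
for _ in range(3000):
    x = [xi - 0.05 * gi for xi, gi in zip(x, gradient(x))]

err = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_true))
                / sum(b * b for b in x_true))
print("relative error:", round(err, 3))
```

Real TV solvers use faster schemes (e.g. primal-dual or ADMM) and 2-D gradients for images; the point here is only that the TV prior lets far fewer samples than unknowns still pin down a piecewise-constant signal.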

Efficient feature embedding of 3D brain MRI images for content-based image retrieval with deep metric learning

Title Efficient feature embedding of 3D brain MRI images for content-based image retrieval with deep metric learning
Authors Yuto Onga, Shingo Fujiyama, Hayato Arai, Yusuke Chayama, Hitoshi Iyatomi, Kenichi Oishi
Abstract Increasing numbers of MRI brain scans, improvements in image resolution, and advancements in MRI acquisition technology are causing significant increases in the demand for and burden on radiologists’ efforts in terms of reading and interpreting brain MRIs. Content-based image retrieval (CBIR) is an emerging technology for reducing this burden by supporting the reading of medical images. High dimensionality is a major challenge in developing a CBIR system that is applicable for 3D brain MRIs. In this study, we propose a system called disease-oriented data concentration with metric learning (DDCML). In DDCML, we introduce deep metric learning to a 3D convolutional autoencoder (CAE). Our proposed DDCML scheme achieves a high dimensional compression rate (4096:1) while preserving the disease-related anatomical features that are important for medical image classification. The low-dimensional representation obtained by DDCML improved the clustering performance by 29.1% compared to plain 3D-CAE in terms of discriminating Alzheimer’s disease patients from healthy subjects, and successfully reproduced the relationships of the severity of disease categories that were not included in the training.
Tasks Content-Based Image Retrieval, Image Classification, Image Retrieval, Metric Learning
Published 2019-12-04
URL https://arxiv.org/abs/1912.01824v1
PDF https://arxiv.org/pdf/1912.01824v1.pdf
PWC https://paperswithcode.com/paper/efficient-feature-embedding-of-3d-brain-mri
Repo
Framework
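The deep-metric-learning ingredient DDCML adds on top of the 3D CAE is typically a triplet-style objective: pull embeddings of same-label scans together and push different-label scans apart. The sketch below shows a standard triplet loss on toy 2-D embeddings; the margin value and example vectors are illustrative, not taken from the paper.

```python
import math

# Standard triplet loss: d(anchor, positive) - d(anchor, negative) + margin,
# clipped at zero once the negative is sufficiently far away.

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Pull same-class embeddings together, push different ones apart."""
    return max(0.0, dist(anchor, positive) - dist(anchor, negative) + margin)

# Toy embeddings: anchor and positive share a disease label.
anchor = [0.0, 0.0]
positive = [0.1, 0.0]       # same class, nearby
negative = [3.0, 0.0]       # other class, beyond the margin -> zero loss
print(triplet_loss(anchor, positive, negative))  # 0.0
```

Combined with the reconstruction loss of the autoencoder, this kind of term is what concentrates each disease category in its own region of the compressed latent space.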

Content-based image retrieval speedup

Title Content-based image retrieval speedup
Authors Sadegh Fadaei, Abdolreza Rashno, Elyas Rashno
Abstract Content-based image retrieval (CBIR) is the task of retrieving images based on their content. Since the retrieval process is time-consuming in large image databases, acceleration methods can be very useful. This paper presents a novel method to speed up CBIR systems. In the proposed method, Zernike moments are first extracted from the query image and an interval is calculated for that query. Images in the database that fall outside the interval are ignored in the retrieval process. Therefore, a database reduction occurs before retrieval, which leads to a speedup. It is shown that in the reduced database, images relevant to the query are preserved while irrelevant images are discarded. Therefore, the proposed method speeds up the retrieval process while preserving CBIR accuracy.
Tasks Content-Based Image Retrieval, Image Retrieval
Published 2019-11-26
URL https://arxiv.org/abs/1911.11379v2
PDF https://arxiv.org/pdf/1911.11379v2.pdf
PWC https://paperswithcode.com/paper/content-based-image-retrieval-speedup
Repo
Framework
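The filtering step the abstract describes can be sketched in a few lines: compute a feature for the query, build an interval around it, and drop database images whose feature falls outside before running the expensive retrieval. This is my paraphrase of the abstract, not the authors' code; a toy scalar stands in for the Zernike-moment feature, and `half_width` is a made-up parameter.

```python
# Reduce the database to candidates whose (precomputed) feature lies in
# an interval around the query's feature, before full retrieval.

def reduce_database(db_features, query_feature, half_width):
    lo, hi = query_feature - half_width, query_feature + half_width
    return [i for i, f in enumerate(db_features) if lo <= f <= hi]

db = [0.12, 0.95, 0.40, 0.47, 0.88, 0.45]   # precomputed per-image features
kept = reduce_database(db, query_feature=0.44, half_width=0.05)
print(kept)   # only candidates near the query survive: [2, 3, 5]
```

Because the features are precomputed offline, the filter costs one comparison per database image, which is what makes the overall retrieval faster without (by the paper's claim) losing relevant images.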

Crawler for Image Acquisition from World Wide Web

Title Crawler for Image Acquisition from World Wide Web
Authors R Rajkumar, Dr. M V Sudhamani
Abstract Due to advancements in computer communication and storage technologies, a large amount of image data is available on the World Wide Web (WWW). To locate a particular set of images, the available search engines may be used with keywords, but they do not filter out unwanted data. For the purpose of retrieving relevant images with appropriate keyword(s), an image crawler is designed and implemented. Keyword(s) are submitted as a query and, with the help of the sender engine, images are downloaded along with metadata such as URL, filename, file size, and file access date and time. Later, using the URLs, images already present in the repository and newly downloaded images are compared for uniqueness; only unique URLs are considered and stored in the repository. The images in the repository will be used to build a novel Content Based Image Retrieval (CBIR) system in the future. This repository may be used for various purposes; in particular, this image crawler tool is useful in building image datasets that can be used by any CBIR system for training and testing purposes.
Tasks Content-Based Image Retrieval, Image Retrieval
Published 2019-11-11
URL https://arxiv.org/abs/1911.11066v1
PDF https://arxiv.org/pdf/1911.11066v1.pdf
PWC https://paperswithcode.com/paper/crawler-for-image-acquisition-from-world-wide
Repo
Framework
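The uniqueness check the abstract describes — comparing downloaded records against the repository by URL and storing only unseen ones — can be sketched as below. This is a hypothetical illustration of the dedup logic only; the actual network fetching is mocked with a static list, and the record fields are assumptions.

```python
# Deduplicate downloaded image records against the repository by URL.

repository = {"http://example.com/a.jpg"}          # URLs already stored

downloaded = [
    {"url": "http://example.com/a.jpg", "size": 1024},   # already in repo
    {"url": "http://example.com/b.jpg", "size": 2048},   # new
    {"url": "http://example.com/b.jpg", "size": 2048},   # repeat in this batch
]

new_records = []
for rec in downloaded:
    if rec["url"] not in repository:               # uniqueness check by URL
        repository.add(rec["url"])
        new_records.append(rec)

print([r["url"] for r in new_records])   # only b.jpg is added, exactly once
```

Using a set for the repository makes each membership check O(1), so the dedup step scales to large crawls.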