January 28, 2020

3112 words 15 mins read

Paper Group ANR 825

Query by Semantic Sketch. Audio-Visual Embedding for Cross-Modal Music Video Retrieval through Supervised Deep CCA. On measuring the iconicity of a face. Do Compressed Representations Generalize Better? DeepC-MVS: Deep Confidence Prediction for Multi-View Stereo Reconstruction. Quantifying and Correlating Rhythm Formants in Speech. Adaptive, Distri …

Query by Semantic Sketch

Title Query by Semantic Sketch
Authors Luca Rossetto, Ralph Gasser, Heiko Schuldt
Abstract Sketch-based query formulation is very common in image and video retrieval, as these techniques often complement textual retrieval methods that are based on either manual or machine-generated annotations. In this paper, we present a retrieval approach that allows users to query visual media collections by sketching concept maps, thereby merging sketch-based retrieval with the search for semantic labels. Users can draw a spatial distribution of different concept labels, such as “sky”, “sea” or “person”, and then use these sketches to find images or video scenes that exhibit a similar distribution of these concepts. Hence, this approach takes into account not only the semantic concepts themselves, but also their semantic relations and their spatial context. A compact vector representation enables efficient retrieval even in large multimedia collections. We have integrated the semantic sketch query mode into our retrieval engine vitrivr and demonstrated its effectiveness.
Tasks Video Retrieval
Published 2019-09-27
URL https://arxiv.org/abs/1909.12526v1
PDF https://arxiv.org/pdf/1909.12526v1.pdf
PWC https://paperswithcode.com/paper/query-by-semantic-sketch
Repo
Framework
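
As a minimal illustration of the idea (not vitrivr's actual representation), a concept sketch can be rasterized onto a coarse spatial grid, one channel per concept label, and compared to database images by cosine similarity; the grid size and concept vocabulary below are assumptions:

```python
import numpy as np

# Hypothetical concept vocabulary and an 8x8 spatial grid; the actual
# representation used in vitrivr may differ.
CONCEPTS = ["sky", "sea", "person"]
GRID = 8

def sketch_to_vector(regions):
    """Rasterize {concept: (row_slice, col_slice)} regions into one
    flat occupancy vector with one GRID x GRID channel per concept."""
    channels = np.zeros((len(CONCEPTS), GRID, GRID))
    for concept, (rows, cols) in regions.items():
        channels[CONCEPTS.index(concept)][rows, cols] = 1.0
    return channels.ravel()

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Query: sky in the top half, sea in the bottom half.
query = sketch_to_vector({"sky": (slice(0, 4), slice(0, 8)),
                          "sea": (slice(4, 8), slice(0, 8))})

# Two "database" scenes: one with the same layout, one spatially inverted.
match = sketch_to_vector({"sky": (slice(0, 4), slice(0, 8)),
                          "sea": (slice(4, 8), slice(0, 8))})
inverted = sketch_to_vector({"sea": (slice(0, 4), slice(0, 8)),
                             "sky": (slice(4, 8), slice(0, 8))})

# The same concepts in a different spatial layout score lower.
assert cosine(query, match) > cosine(query, inverted)
```

Because the vector is a fixed-length concatenation of concept channels, spatial context is captured for free: two scenes with identical labels but swapped layouts have near-orthogonal vectors.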

Audio-Visual Embedding for Cross-Modal Music Video Retrieval through Supervised Deep CCA

Title Audio-Visual Embedding for Cross-Modal Music Video Retrieval through Supervised Deep CCA
Authors Donghuo Zeng, Yi Yu, Keizo Oyama
Abstract Deep learning has shown excellent performance in learning joint representations between different data modalities. Unfortunately, little research focuses on cross-modal correlation learning where the temporal structures of different data modalities, such as audio and video, should be taken into account. Music video retrieval from a given musical audio is a natural way to search for and interact with music content. In this work, we study cross-modal music video retrieval in terms of emotion similarity. In particular, audio of an arbitrary length is used to retrieve a longer or full-length music video. To this end, we propose a novel audio-visual embedding algorithm based on Supervised Deep Canonical Correlation Analysis (S-DCCA) that projects audio and video into a shared space to bridge the semantic gap between them. This also preserves the similarity between the audio and visual content of different videos with the same class label, as well as the temporal structure. The contribution of our approach is mainly manifested in two aspects: i) we propose to select the top k audio chunks with an attention-based Long Short-Term Memory (LSTM) model, which yields a good audio summarization with local properties; ii) we propose an end-to-end deep model for cross-modal audio-visual learning in which S-DCCA is trained to learn the semantic correlation between the audio and visual modalities. Due to the lack of a music video dataset, we construct a 10K music video dataset from the YouTube-8M dataset. Promising results in terms of MAP and precision-recall show that our proposed model can be applied to music video retrieval.
Tasks Video Retrieval
Published 2019-08-10
URL https://arxiv.org/abs/1908.03744v1
PDF https://arxiv.org/pdf/1908.03744v1.pdf
PWC https://paperswithcode.com/paper/audio-visual-embedding-for-cross-modal
Repo
Framework
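
Setting the deep and supervised parts aside, the CCA objective at the core of S-DCCA can be sketched with plain linear CCA: whiten each view, then the singular values of the cross-covariance are the canonical correlations. The toy "modalities" below are an assumption, not the paper's data:

```python
import numpy as np

rng = np.random.default_rng(0)

def canonical_correlations(X, Y, reg=1e-3):
    """Classical linear CCA (the building block that S-DCCA places on
    top of deep networks): whiten each view, then take the singular
    values of the whitened cross-covariance."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)

    def inv_sqrt(C):
        # Inverse square root of a symmetric positive-definite matrix.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(w ** -0.5) @ V.T

    T = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
    return np.linalg.svd(T, compute_uv=False)

# Two correlated "modalities": a shared latent factor plus noise.
z = rng.normal(size=(500, 4))
audio = z @ rng.normal(size=(4, 6)) + 0.1 * rng.normal(size=(500, 6))
video = z @ rng.normal(size=(4, 5)) + 0.1 * rng.normal(size=(500, 5))

corrs = canonical_correlations(audio, video)
# Canonical correlations lie in [0, 1]; a strong shared latent gives
# a top correlation near 1.
assert np.all(corrs <= 1.0 + 1e-9) and corrs[0] > 0.9
```

In S-DCCA the two projections are deep networks trained to maximize these correlations under label supervision; the linear solver above is only the analytical core.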

On measuring the iconicity of a face

Title On measuring the iconicity of a face
Authors Prithviraj Dhar, Carlos D. Castillo, Rama Chellappa
Abstract For a given identity in a face dataset, certain iconic images are more representative of the subject than others. In this paper, we explore the problem of computing the iconicity of a face. The premise of the proposed approach is as follows: for an identity containing a mixture of iconic and non-iconic images, if a given face cannot be successfully matched with any other face of the same identity, then the iconicity of that face image is low. Using this information, we train a Siamese Multi-Layer Perceptron network such that each of its twins predicts the iconicity score of one image of the feature pair fed in as input. We observe the variation of the obtained scores with respect to covariates such as blur, yaw, pitch, roll and occlusion to demonstrate that they effectively predict image quality, and we compare them with other existing metrics. Furthermore, we use these scores to weight features for template-based face verification and compare the result with media averaging of features.
Tasks Face Verification
Published 2019-03-04
URL http://arxiv.org/abs/1903.01581v1
PDF http://arxiv.org/pdf/1903.01581v1.pdf
PWC https://paperswithcode.com/paper/on-measuring-the-iconicity-of-a-face
Repo
Framework
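
The premise can be sketched with a simple proxy (the authors instead train a Siamese MLP to predict such scores): score each image by its best match against the other images of the same identity, so a face that matches nothing gets low iconicity:

```python
import numpy as np

rng = np.random.default_rng(1)

def iconicity_scores(features):
    """Proxy for the paper's premise: an image that cannot be matched
    with any other image of the same identity gets a low score. Here
    the score is the best cosine similarity to any other feature of
    the identity."""
    F = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = F @ F.T
    np.fill_diagonal(sims, -np.inf)  # exclude trivial self-matches
    return sims.max(axis=1)

# Four well-clustered images of one identity plus one outlier
# (e.g. heavily occluded or blurred), all as toy feature vectors.
prototype = rng.normal(size=16)
faces = np.stack([prototype + 0.05 * rng.normal(size=16) for _ in range(4)]
                 + [rng.normal(size=16)])

scores = iconicity_scores(faces)
# The unmatched outlier receives the lowest iconicity score.
assert scores[:4].min() > scores[4]
```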

Do Compressed Representations Generalize Better?

Title Do Compressed Representations Generalize Better?
Authors Hassan Hafez-Kolahi, Shohreh Kasaei, Mahdiyeh Soleymani-Baghshah
Abstract One of the most studied problems in machine learning is finding reasonable constraints that guarantee the generalization of a learning algorithm. These constraints are usually expressed as simplicity assumptions on the target. For instance, in Vapnik-Chervonenkis (VC) theory the space of possible hypotheses is assumed to have a limited VC dimension. In this paper, a constraint on the entropy $H(X)$ of the input variable $X$ is studied as a simplicity assumption. It is proven that the sample complexity to achieve an $\epsilon$-$\delta$ Probably Approximately Correct (PAC) hypothesis is bounded by $\frac{2^{6H(X)/\epsilon}+\log\frac{1}{\delta}}{\epsilon^2}$, which is sharp up to the $\frac{1}{\epsilon^2}$ factor. Moreover, it is shown that if a feature learning process is employed to learn a compressed representation from the dataset, this bound no longer holds. These findings have important implications for the Information Bottleneck (IB) theory, which has been utilized to explain the generalization power of Deep Neural Networks (DNNs), although its applicability for this purpose is currently under debate. In particular, this is a rigorous proof of the earlier heuristic that compressed representations are exponentially easier to learn. However, our analysis pinpoints two factors preventing the IB, in its current form, from being applicable to the study of neural networks. Firstly, the exponential dependence of the sample complexity on $\frac{1}{\epsilon}$ can have a dramatic effect on the bounds in practical applications where $\epsilon$ is small. Secondly, our analysis reveals that arguments based on input compression are inherently insufficient to explain the generalization of methods like DNNs, in which the features are also learned from the available data.
Tasks
Published 2019-09-20
URL https://arxiv.org/abs/1909.09706v2
PDF https://arxiv.org/pdf/1909.09706v2.pdf
PWC https://paperswithcode.com/paper/190909706
Repo
Framework
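
The stated bound is easy to evaluate numerically, which makes the exponential dependence on $1/\epsilon$ that the authors criticize concrete:

```python
import math

def sample_complexity(H, eps, delta):
    """The paper's bound on the number of samples sufficient for an
    (eps, delta)-PAC hypothesis when the input entropy is H bits:
    (2^{6H/eps} + log(1/delta)) / eps^2."""
    return (2 ** (6 * H / eps) + math.log(1 / delta)) / eps ** 2

# Halving eps squares the 2^{6H/eps} term, so the bound explodes far
# faster than the familiar 1/eps^2 rate.
m1 = sample_complexity(H=1.0, eps=1.0, delta=0.05)  # ~67 samples
m2 = sample_complexity(H=1.0, eps=0.5, delta=0.05)  # ~16,400 samples
assert m2 > 100 * m1
```

This is exactly the first obstacle the abstract names: for small $\epsilon$ the bound is astronomically large, so input compression alone cannot explain practical generalization.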

DeepC-MVS: Deep Confidence Prediction for Multi-View Stereo Reconstruction

Title DeepC-MVS: Deep Confidence Prediction for Multi-View Stereo Reconstruction
Authors Andreas Kuhn, Christian Sormann, Mattia Rossi, Oliver Erdler, Friedrich Fraundorfer
Abstract Deep Neural Networks (DNNs) have the potential to improve the quality of image-based 3D reconstructions. A remaining challenge is to utilize this potential for 3D reconstruction from high-resolution image datasets such as those of the ETH3D benchmark. In this paper, we propose a way to employ DNNs in the image domain to gain a significant quality improvement in geometric, image-based 3D reconstruction. This is achieved by utilizing confidence prediction networks which have been adapted to the Multi-View Stereo (MVS) case and are trained on automatically generated ground truth established by geometric error propagation. In addition to a semi-dense real-world ground-truth dataset for training the DNN, we present a synthetic dataset to enlarge the training set. We demonstrate the utility of the confidence predictions for two essential steps within a 3D reconstruction pipeline: firstly, for outlier clustering and filtering, and secondly, within a depth refinement step. The resulting 3D reconstruction pipeline, DeepC-MVS, makes use of deep learning for an essential part of MVS from high-resolution images, and the experimental evaluation on popular benchmarks demonstrates state-of-the-art quality in 3D reconstruction.
Tasks 3D Reconstruction
Published 2019-12-01
URL https://arxiv.org/abs/1912.00439v2
PDF https://arxiv.org/pdf/1912.00439v2.pdf
PWC https://paperswithcode.com/paper/deepc-mvs-deep-confidence-prediction-for
Repo
Framework
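
Of the two uses of the predicted confidence, the outlier-filtering step can be sketched in a few lines; the confidence values below are random stand-ins for the network's predictions, and the threshold is an assumption:

```python
import numpy as np

rng = np.random.default_rng(2)

def filter_by_confidence(depth, confidence, threshold=0.5):
    """Mask out depth estimates the confidence network deems
    unreliable before fusion; the network itself is not reproduced
    here."""
    filtered = depth.copy()
    filtered[confidence < threshold] = np.nan  # drop likely outliers
    return filtered

depth = rng.uniform(1.0, 10.0, size=(4, 4))        # toy depth map (m)
confidence = rng.uniform(0.0, 1.0, size=(4, 4))    # stand-in predictions

filtered = filter_by_confidence(depth, confidence)
assert np.isnan(filtered[confidence < 0.5]).all()
assert (filtered[confidence >= 0.5] == depth[confidence >= 0.5]).all()
```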

Quantifying and Correlating Rhythm Formants in Speech

Title Quantifying and Correlating Rhythm Formants in Speech
Authors Dafydd Gibbon, Peng Li
Abstract The objective of the present study is exploratory: to introduce and apply a new theory of speech rhythm zones, or rhythm formants (R-formants). R-formants are zones of high-magnitude frequencies in the low-frequency (LF) long-term spectrum (LTS), rather like formants in the short-term spectra of vowels and consonants. After an illustration of the method, an R-formant analysis is made of non-elicited extracts from public speeches. The LF-LTS of three domains are compared: the amplitude-modulated (AM) absolute (rectified) signal, the amplitude envelope modulation (AEM), and the frequency modulation (FM, F0, ‘pitch’) of the signal. The first two correlate well, but the third does not correlate consistently with the other two, presumably due to variability of tone, pitch accent and intonation. Consequently, only the LF-LTS of the absolute speech signal is used in the empirical analysis. An informal discussion of the relation between R-formant patterns and utterance structure, together with a selection of pragmatic variables over the same utterances, showed some trends in R-formant functionality and thus useful directions for future research.
Tasks
Published 2019-09-03
URL https://arxiv.org/abs/1909.05639v1
PDF https://arxiv.org/pdf/1909.05639v1.pdf
PWC https://paperswithcode.com/paper/quantifying-and-correlating-rhythm-formants
Repo
Framework
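
A minimal sketch of the LF-LTS computation on the rectified signal, using a synthetic carrier amplitude-modulated at a 4 Hz "syllable rhythm" (the sampling rate and signal are toy assumptions, not the paper's speech data):

```python
import numpy as np

fs = 1000  # Hz, toy sampling rate
t = np.arange(0, 10, 1 / fs)

# Toy "speech": a 120 Hz carrier amplitude-modulated at 4 Hz.
signal = (1 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 120 * t)

def lf_long_term_spectrum(x, fs, fmax=10.0):
    """LF-LTS of the rectified (absolute) signal, the AM domain the
    paper settles on: magnitude spectrum restricted to below fmax Hz."""
    spectrum = np.abs(np.fft.rfft(np.abs(x)))
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    keep = (freqs > 0.5) & (freqs < fmax)  # ignore DC and slow drift
    return freqs[keep], spectrum[keep]

freqs, mags = lf_long_term_spectrum(signal, fs)
r_formant = freqs[np.argmax(mags)]  # dominant LF zone ~ modulation rate
assert abs(r_formant - 4.0) < 0.5
```

The dominant low-frequency peak recovers the 4 Hz modulation rate; in real speech, clusters of such peaks form the R-formant zones the paper describes.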

Adaptive, Distribution-Free Prediction Intervals for Deep Networks

Title Adaptive, Distribution-Free Prediction Intervals for Deep Networks
Authors Danijel Kivaranovic, Kory D. Johnson, Hannes Leeb
Abstract The machine learning literature contains several constructions for prediction intervals that are intuitively reasonable but ultimately ad hoc, in that they do not come with provable performance guarantees. We present methods from the statistics literature that can be used efficiently with neural networks under minimal assumptions and with guaranteed performance. We propose a neural network that outputs three values instead of a single point estimate and optimizes a loss function motivated by the standard quantile regression loss. We provide two prediction interval methods with finite-sample coverage guarantees solely under the assumption that the observations are independent and identically distributed. The first method leverages the conformal inference framework and provides average coverage. The second method provides a new, stronger guarantee by conditioning on the observed data. Lastly, our loss function does not compromise the predictive accuracy of the network, unlike other prediction interval methods. We demonstrate the ease of use of our procedures as well as their improvements over other methods on both simulated and real data. As most deep networks can easily be modified by our method to output predictions with valid prediction intervals, its use should become standard practice, much like reporting standard errors along with mean estimates.
Tasks
Published 2019-05-25
URL https://arxiv.org/abs/1905.10634v2
PDF https://arxiv.org/pdf/1905.10634v2.pdf
PWC https://paperswithcode.com/paper/adaptive-distribution-free-prediction
Repo
Framework
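
The first method's conformal step can be sketched as split conformal prediction on top of quantile estimates. The constant quantile "predictions" below are deliberately mis-sized placeholders for the network's outputs, not the paper's model; the point is that the conformal adjustment restores coverage anyway:

```python
import numpy as np

rng = np.random.default_rng(3)

def conformal_interval(lo_cal, hi_cal, y_cal, lo_test, hi_test, alpha=0.1):
    """Split-conformal adjustment of interval predictions: compute
    conformity scores on a calibration set, then widen (or shrink)
    the test intervals by the appropriate score quantile."""
    scores = np.maximum(lo_cal - y_cal, y_cal - hi_cal)
    n = len(y_cal)
    q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n)
    return lo_test - q, hi_test + q

# Calibration and test data; the raw band [-0.5, 0.5] is far too narrow.
y_cal = rng.normal(size=1000)
y_test = rng.normal(size=1000)
lo_raw = np.full(1000, -0.5)
hi_raw = np.full(1000, 0.5)

lo, hi = conformal_interval(lo_raw, hi_raw, y_cal, lo_raw, hi_raw)
coverage = np.mean((y_test >= lo) & (y_test <= hi))
assert coverage >= 0.85  # close to the nominal 90% despite the bad bands
```

The guarantee is marginal (average) coverage, exactly the distinction the abstract draws between its first and second methods.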

BAFFLE: Blockchain based Aggregator Free Federated Learning

Title BAFFLE: Blockchain based Aggregator Free Federated Learning
Authors Paritosh Ramanan, Kiyoshi Nakayama, Ratnesh Sharma
Abstract A key aspect of Federated Learning (FL) is the requirement of a centralized aggregator to select and integrate models from various user devices. However, the infeasibility of an aggregator due to a variety of operational constraints could prevent FL from being widely adopted. In this paper, we introduce BAFFLE, an aggregator-free FL environment. Powered by a blockchain, BAFFLE is inherently decentralized and successfully eliminates the constraints associated with aggregator-based FL frameworks. Our results indicate that BAFFLE provides superior performance while circumventing critical computational bottlenecks associated with the blockchain.
Tasks
Published 2019-09-16
URL https://arxiv.org/abs/1909.07452v1
PDF https://arxiv.org/pdf/1909.07452v1.pdf
PWC https://paperswithcode.com/paper/baffle-blockchain-based-aggregator-free
Repo
Framework

A Pre-defined Sparse Kernel Based Convolution for Deep CNNs

Title A Pre-defined Sparse Kernel Based Convolution for Deep CNNs
Authors Souvik Kundu, Saurav Prakash, Haleh Akrami, Peter A. Beerel, Keith M. Chugg
Abstract The high demand for computational and storage resources severely impedes the deployment of deep convolutional neural networks (CNNs) in resource-limited devices. Recent CNN architectures offer reduced-complexity versions (e.g., ShuffleNet and MobileNet), but at the cost of modest decreases in accuracy. This paper proposes pSConv, a pre-defined sparse 2D kernel-based convolution, which promises significant improvements in the trade-off between complexity and accuracy for both CNN training and inference. To explore the potential of this approach, we have experimented with two widely used datasets, CIFAR-10 and Tiny ImageNet, on sparse variants of both the ResNet18 and VGG16 architectures. Our approach shows a parameter count reduction of up to 4.24x with modest degradation in classification accuracy relative to standard CNNs. It also outperforms a popular variant of ShuffleNet when using a variant of ResNet18 with pSConv 3x3 kernels in which only four of the nine elements are not fixed at zero. In particular, the parameter count is reduced by 1.7x for CIFAR-10 and 2.29x for Tiny ImageNet with an accuracy increase of ~4%.
Tasks
Published 2019-10-02
URL https://arxiv.org/abs/1910.00724v2
PDF https://arxiv.org/pdf/1910.00724v2.pdf
PWC https://paperswithcode.com/paper/a-pre-defined-sparse-kernel-based
Repo
Framework
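
A minimal sketch of a pre-defined sparse kernel, with a hypothetical mask leaving four of the nine 3x3 entries free (the actual sparsity patterns used in the paper may differ):

```python
import numpy as np

rng = np.random.default_rng(4)

def psconv2d(x, kernel, mask):
    """2D convolution with a pre-defined sparse kernel: entries where
    mask == 0 are fixed at zero and never trained."""
    k = kernel * mask  # sparsity is imposed, not learned
    H, W = x.shape
    out = np.zeros((H - 2, W - 2))
    for i in range(H - 2):
        for j in range(W - 2):
            out[i, j] = np.sum(x[i:i + 3, j:j + 3] * k)
    return out

mask = np.array([[1, 0, 1],
                 [0, 1, 0],
                 [1, 0, 0]])  # hypothetical pattern: 4 of 9 entries free
kernel = rng.normal(size=(3, 3))
x = rng.normal(size=(8, 8))

y = psconv2d(x, kernel, mask)
assert y.shape == (6, 6)
# A fixed-at-zero weight contributes nothing: changing it cannot
# affect the output, which is where the parameter savings come from.
kernel2 = kernel.copy()
kernel2[0, 1] += 5.0  # mask[0, 1] == 0
assert np.allclose(y, psconv2d(x, kernel2, mask))
```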

The Level Weighted Structural Similarity Loss: A Step Away from the MSE

Title The Level Weighted Structural Similarity Loss: A Step Away from the MSE
Authors Yingjing Lu
Abstract The Mean Square Error (MSE) has shown its strength when applied in deep generative models such as Auto-Encoders to model the reconstruction loss. However, especially in the image domain, the limitation of MSE is obvious: it assumes pixel independence and ignores the spatial relationships of samples. This contradicts most architectures of Auto-Encoders, which use convolutional layers to extract spatially dependent features. Building on the structural similarity metric (SSIM), we propose a novel level-weighted structural similarity (LWSSIM) loss for convolutional Auto-Encoders. Experiments on common datasets with various Auto-Encoder variants show that our loss outperforms the MSE loss and the vanilla SSIM loss. We also provide reasons why our model is able to succeed in cases where the standard SSIM loss fails.
Tasks
Published 2019-04-30
URL http://arxiv.org/abs/1904.13362v1
PDF http://arxiv.org/pdf/1904.13362v1.pdf
PWC https://paperswithcode.com/paper/the-level-weighted-structural-similarity-loss
Repo
Framework
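
A sketch of a level-weighted SSIM loss: SSIM evaluated at successively downsampled levels and combined with fixed weights. The single-window SSIM, the average-pooling, and the weights below are assumptions, not the paper's exact formulation:

```python
import numpy as np

def ssim_global(x, y, C1=1e-4, C2=9e-4):
    """Global (single-window) SSIM; the full metric averages this
    over local windows."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + C1) * (2 * cxy + C2)
            / ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2)))

def lwssim_loss(x, y, weights=(0.5, 0.3, 0.2)):
    """SSIM dissimilarity at several pyramid levels (2x average-pool
    downsampling), combined with fixed per-level weights."""
    loss = 0.0
    for w in weights:
        loss += w * (1.0 - ssim_global(x, y))
        # 2x2 average pooling down to the next coarser level
        x = x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2).mean(axis=(1, 3))
        y = y.reshape(y.shape[0] // 2, 2, y.shape[1] // 2, 2).mean(axis=(1, 3))
    return loss

rng = np.random.default_rng(5)
img = rng.uniform(size=(16, 16))
assert abs(lwssim_loss(img, img.copy())) < 1e-9   # identical images
assert lwssim_loss(img, rng.uniform(size=(16, 16))) > 0.0
```

Unlike per-pixel MSE, each SSIM term couples a pixel to its neighborhood statistics, which is the spatial dependence the abstract argues MSE ignores.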

Mitigating Overfitting in Supervised Classification from Two Unlabeled Datasets: A Consistent Risk Correction Approach

Title Mitigating Overfitting in Supervised Classification from Two Unlabeled Datasets: A Consistent Risk Correction Approach
Authors Nan Lu, Tianyi Zhang, Gang Niu, Masashi Sugiyama
Abstract The recently proposed unlabeled-unlabeled (UU) classification method allows us to train a binary classifier from only two unlabeled datasets with different class priors. Since this method is based on empirical risk minimization, it works as if it were a supervised classification method, compatible with any model and optimizer. However, this method sometimes suffers from severe overfitting, which we aim to prevent in this paper. Our empirical finding in applying the original UU method is that overfitting often co-occurs with the empirical risk going negative, which is not legitimate. Therefore, we propose to wrap the terms that cause a negative empirical risk in certain correction functions. We then prove the consistency of the corrected risk estimator and derive an estimation error bound for the corrected risk minimizer. Experiments show that our proposal successfully mitigates overfitting of the UU method and significantly improves classification accuracy.
Tasks
Published 2019-10-20
URL https://arxiv.org/abs/1910.08974v4
PDF https://arxiv.org/pdf/1910.08974v4.pdf
PWC https://paperswithcode.com/paper/mitigating-overfitting-in-supervised
Repo
Framework
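
The correction idea can be sketched schematically: partial risks that should be non-negative in expectation are wrapped with a correction function such as ReLU before being summed. The two-term decomposition below is illustrative only, not the paper's exact UU risk estimator:

```python
import numpy as np

def corrected_risk(term_pos, term_neg, f=lambda r: np.maximum(r, 0.0)):
    """Consistent-correction sketch: each partial risk is passed
    through a correction function f (here ReLU) so the total estimate
    can never go negative, which is the illegitimate regime the
    authors link to overfitting."""
    return f(term_pos) + f(term_neg)

# An uncorrected empirical risk can dip below its legitimate range
# when one partial term goes negative; the correction clips it.
plain = 0.3 + (-0.2)                 # uncorrected sum: 0.1
corrected = corrected_risk(0.3, -0.2)
assert corrected == 0.3              # the negative part is clipped
assert corrected >= plain            # correction never under-estimates
```

The paper's contribution is proving that minimizing such a corrected estimator remains consistent, so the clipping does not bias learning in the limit.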

Stability of Graph Neural Networks to Relative Perturbations

Title Stability of Graph Neural Networks to Relative Perturbations
Authors Fernando Gama, Joan Bruna, Alejandro Ribeiro
Abstract Graph neural networks (GNNs), consisting of a cascade of layers that each apply a graph convolution followed by a pointwise nonlinearity, have become a powerful architecture for processing signals supported on graphs. Graph convolutions (and thus GNNs) rely heavily on knowledge of the graph, encoded in its graph shift operator (GSO), for operation. However, in many practical cases the GSO is not known and needs to be estimated, or might change from training time to testing time. In this paper, we study the effect that a change in the underlying graph topology supporting the signal has on the output of a GNN. We prove that graph convolutions with integral Lipschitz filters lead to GNNs whose output change is bounded by the size of the relative change in the topology. Furthermore, we leverage this result to argue that a main reason for the success of GNNs is that they are stable architectures capable of discriminating features at high eigenvalues, a feat that cannot be achieved by linear graph filters (which are either stable or discriminative, but cannot be both). Finally, we comment on the use of this result to train GNNs with increased stability and run experiments on movie recommendation systems.
Tasks Recommendation Systems
Published 2019-10-21
URL https://arxiv.org/abs/1910.09655v1
PDF https://arxiv.org/pdf/1910.09655v1.pdf
PWC https://paperswithcode.com/paper/stability-of-graph-neural-networks-to
Repo
Framework
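
The object of study can be sketched as a polynomial graph filter applied with a slightly perturbed GSO: for a small relative perturbation, the output moves proportionally little. The graph, filter taps, and the loose bound checked below are assumptions for illustration, not the paper's tight integral-Lipschitz bound:

```python
import numpy as np

rng = np.random.default_rng(6)

def graph_filter(S, x, h):
    """Polynomial graph convolution y = sum_k h_k S^k x, the building
    block of the GNN layers the paper analyzes."""
    y = np.zeros_like(x)
    Sk = np.eye(S.shape[0])
    for hk in h:
        y += hk * (Sk @ x)
        Sk = Sk @ S
    return y

# A small undirected graph (symmetric GSO) and a random signal.
S = rng.uniform(size=(8, 8)); S = (S + S.T) / 2
x = rng.normal(size=8)
h = [1.0, 0.5, 0.25]

# Relative perturbation of the topology: S' = (1 + eps) S.
eps = 1e-3
y = graph_filter(S, x, h)
y_pert = graph_filter(S * (1 + eps), x, h)

# The output change scales with eps, the relative perturbation size;
# 40*eps*||x|| is a deliberately loose bound for this 8-node graph.
assert np.linalg.norm(y_pert - y) <= 40 * eps * np.linalg.norm(x)
```

The paper's result is sharper and more general: it covers arbitrary relative perturbations (not just a uniform rescaling) provided the filters are integral Lipschitz.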

DeLTA: GPU Performance Model for Deep Learning Applications with In-depth Memory System Traffic Analysis

Title DeLTA: GPU Performance Model for Deep Learning Applications with In-depth Memory System Traffic Analysis
Authors Sangkug Lym, Donghyuk Lee, Mike O’Connor, Niladrish Chatterjee, Mattan Erez
Abstract Training convolutional neural networks (CNNs) requires intense compute throughput and high memory bandwidth. In particular, convolution layers account for the majority of the execution time of CNN training, and GPUs are commonly used to accelerate these workloads. Optimizing GPU designs for efficient CNN training acceleration requires accurate modeling of how performance improves when computing and memory resources are increased. We present DeLTA, the first analytical model that accurately estimates the traffic at each level of the GPU memory hierarchy while accounting for the complex reuse patterns of a parallel convolution algorithm. We demonstrate that our model is both accurate and robust across different CNNs and GPU architectures. We then show how this model can be used to carefully balance the scaling of different GPU resources for efficient CNN performance improvement.
Tasks
Published 2019-04-02
URL http://arxiv.org/abs/1904.01691v1
PDF http://arxiv.org/pdf/1904.01691v1.pdf
PWC https://paperswithcode.com/paper/delta-gpu-performance-model-for-deep-learning
Repo
Framework

SDGM: Sparse Bayesian Classifier Based on a Discriminative Gaussian Mixture Model

Title SDGM: Sparse Bayesian Classifier Based on a Discriminative Gaussian Mixture Model
Authors Hideaki Hayashi, Seiichi Uchida
Abstract In probabilistic classification, a discriminative model based on a Gaussian mixture exhibits flexible fitting capability. Nevertheless, it is difficult to determine the number of components. We propose a sparse classifier based on a discriminative Gaussian mixture model (GMM), named the sparse discriminative Gaussian mixture (SDGM). In the SDGM, a GMM-based discriminative model is trained by sparse Bayesian learning. This learning algorithm improves generalization capability by obtaining a sparse solution, and it automatically determines the number of components by removing redundant ones. The SDGM can be embedded into neural networks (NNs), such as convolutional NNs, and can be trained in an end-to-end manner. Experimental results indicate that the proposed method prevents overfitting by obtaining sparsity. Furthermore, we demonstrate that the proposed method outperforms a fully connected layer with the softmax function in certain cases when used as the last layer of a deep NN.
Tasks
Published 2019-11-14
URL https://arxiv.org/abs/1911.06028v1
PDF https://arxiv.org/pdf/1911.06028v1.pdf
PWC https://paperswithcode.com/paper/sdgm-sparse-bayesian-classifier-based-on-a
Repo
Framework
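
The discriminative use of a Gaussian mixture can be sketched as class posteriors obtained by Bayes' rule over mixture components; the sparse Bayesian training that prunes redundant components is omitted, and all parameters below are hypothetical:

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Isotropic Gaussian density with scalar variance."""
    d = x.shape[-1]
    diff = x - mean
    return (np.exp(-0.5 * np.sum(diff ** 2, axis=-1) / var)
            / (2 * np.pi * var) ** (d / 2))

def gmm_posteriors(x, means, variances, weights, class_of):
    """Class posteriors by Bayes' rule: sum the weighted component
    densities belonging to each class, then normalize over classes."""
    comp = np.array([w * gaussian_pdf(x, m, v)
                     for m, v, w in zip(means, variances, weights)])
    n_classes = max(class_of) + 1
    per_class = np.array(
        [comp[[i for i, c in enumerate(class_of) if c == k]].sum(axis=0)
         for k in range(n_classes)])
    return per_class / per_class.sum(axis=0)

# Two classes, three components (class 0 gets two); all hypothetical.
means = [np.array([0.0, 0.0]), np.array([1.0, 1.0]), np.array([4.0, 4.0])]
variances = [0.5, 0.5, 0.5]
weights = [0.3, 0.2, 0.5]
class_of = [0, 0, 1]

x = np.array([[0.1, 0.1], [4.1, 3.9]])
post = gmm_posteriors(x, means, variances, weights, class_of)
assert np.allclose(post.sum(axis=0), 1.0)        # valid posteriors
assert post[0, 0] > 0.9 and post[1, 1] > 0.9     # points near their class
```

SDGM's sparse Bayesian learning would additionally drive redundant component weights to zero, which is how the component count is determined automatically.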

Towards Quantification of Bias in Machine Learning for Healthcare: A Case Study of Renal Failure Prediction

Title Towards Quantification of Bias in Machine Learning for Healthcare: A Case Study of Renal Failure Prediction
Authors Josie Williams, Narges Razavian
Abstract As machine learning (ML) models trained on real-world datasets become common practice, it is critical to measure and quantify their potential biases. In this paper, we focus on renal failure and compare a commonly used traditional risk score, Tangri, with a more powerful machine learning model that has access to a larger variable set and is trained on EHR data from 1.6 million patients. We compare and discuss the generalization and applicability of these two models, in an attempt to quantify the biases of status-quo clinical practice relative to ML-driven models.
Tasks
Published 2019-11-18
URL https://arxiv.org/abs/1911.07679v1
PDF https://arxiv.org/pdf/1911.07679v1.pdf
PWC https://paperswithcode.com/paper/towards-quantification-of-bias-in-machine
Repo
Framework