July 29, 2019


Paper Group ANR 141


A Sampling Theory Perspective of Graph-based Semi-supervised Learning


Title	A Sampling Theory Perspective of Graph-based Semi-supervised Learning
Authors	Aamir Anis, Aly El Gamal, Salman Avestimehr, Antonio Ortega
Abstract	Graph-based methods have been quite successful in solving unsupervised and semi-supervised learning problems, as they provide a means to capture the underlying geometry of the dataset. It is often desirable for the constructed graph to satisfy two properties: first, data points that are similar in the feature space should be strongly connected on the graph, and second, the class label information should vary smoothly with respect to the graph, where smoothness is measured using the spectral properties of the graph Laplacian matrix. Recent works have justified some of these smoothness conditions by showing that they are strongly linked to the semi-supervised smoothness assumption and its variants. In this work, we reinforce this connection by viewing the problem from a graph sampling theoretic perspective, where class indicator functions are treated as bandlimited graph signals (in the eigenvector basis of the graph Laplacian) and label prediction as a bandlimited reconstruction problem. Our approach involves analyzing the bandwidth of class indicator signals generated from statistical data models with separable and nonseparable classes. These models are quite general and mimic the nature of most real-world datasets. Our results show that in the asymptotic limit, the bandwidth of any class indicator is also closely related to the geometry of the dataset. This allows one to theoretically justify the assumption of bandlimitedness of class indicator signals, thereby providing a sampling theoretic interpretation of graph-based semi-supervised classification.
Tasks
Published	2017-05-26
URL	http://arxiv.org/abs/1705.09518v2
PDF	http://arxiv.org/pdf/1705.09518v2.pdf
PWC	https://paperswithcode.com/paper/a-sampling-theory-perspective-of-graph-based
Repo
Framework
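
The smoothness notion mentioned in the abstract above can be illustrated with the Laplacian quadratic form x^T L x, which for the combinatorial Laplacian L = D - W equals the sum over edges of w_ij (x_i - x_j)^2. The following is a minimal sketch only: the toy graph, its weights, and the function name are illustrative assumptions and are not taken from the paper.

```python
def laplacian_quadratic_form(edges, signal):
    """Return x^T L x for the combinatorial Laplacian L = D - W.

    edges:  iterable of (i, j, weight) tuples of an undirected graph.
    signal: list of per-node values, e.g. a 0/1 class indicator.
    """
    return sum(w * (signal[i] - signal[j]) ** 2 for i, j, w in edges)


# Hypothetical toy graph: two tight clusters {0, 1, 2} and {3, 4, 5}
# joined by a single weak edge between nodes 2 and 3.
edges = [
    (0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
    (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
    (2, 3, 0.1),
]

aligned = [1, 1, 1, 0, 0, 0]    # class indicator that follows the clusters
scrambled = [1, 0, 1, 0, 1, 0]  # class indicator that ignores the clusters

smooth_score = laplacian_quadratic_form(edges, aligned)
rough_score = laplacian_quadratic_form(edges, scrambled)
print(smooth_score, rough_score)
```

A class indicator that respects the cluster structure cuts only the weak edge and therefore has a much smaller quadratic form, which is the sense in which such a signal varies smoothly on the graph.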

Implicitly Incorporating Morphological Information into Word Embedding


Title	Implicitly Incorporating Morphological Information into Word Embedding
Authors	Yang Xu, Jiawei Liu
Abstract	In this paper, we propose three novel models to enhance word embedding by implicitly using morphological information. Experiments on word similarity and syntactic analogy show that the implicit models are superior to traditional explicit ones. Our models outperform all state-of-the-art baselines and significantly improve the performance on both tasks. Moreover, our performance on the smallest corpus is similar to the performance of CBOW on the corpus which is five times the size of ours. Parameter analysis indicates that the implicit models can supplement semantic information during the word embedding training process.
Tasks
Published	2017-01-10
URL	http://arxiv.org/abs/1701.02481v3
PDF	http://arxiv.org/pdf/1701.02481v3.pdf
PWC	https://paperswithcode.com/paper/implicitly-incorporating-morphological
Repo
Framework

Visually grounded learning of keyword prediction from untranscribed speech


Title	Visually grounded learning of keyword prediction from untranscribed speech
Authors	Herman Kamper, Shane Settle, Gregory Shakhnarovich, Karen Livescu
Abstract	During language acquisition, infants have the benefit of visual cues to ground spoken language. Robots similarly have access to audio and visual sensors. Recent work has shown that images and spoken captions can be mapped into a meaningful common space, allowing images to be retrieved using speech and vice versa. In this setting of images paired with untranscribed spoken captions, we consider whether computer vision systems can be used to obtain textual labels for the speech. Concretely, we use an image-to-words multi-label visual classifier to tag images with soft textual labels, and then train a neural network to map from the speech to these soft targets. We show that the resulting speech system is able to predict which words occur in an utterance—acting as a spoken bag-of-words classifier—without seeing any parallel speech and text. We find that the model often confuses semantically related words, e.g. “man” and “person”, making it even more effective as a semantic keyword spotter.
Tasks	Language Acquisition
Published	2017-03-23
URL	http://arxiv.org/abs/1703.08136v2
PDF	http://arxiv.org/pdf/1703.08136v2.pdf
PWC	https://paperswithcode.com/paper/visually-grounded-learning-of-keyword
Repo
Framework
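
The training signal described in the abstract above, in which a speech model is fitted to soft textual labels produced by a visual tagger, can be sketched as follows. The abstract does not state the loss function, so the summed binary cross-entropy used here is an assumption, and the vocabulary, scores, and function names are hypothetical.

```python
import math


def soft_bce(predicted, soft_targets, eps=1e-12):
    """Binary cross-entropy summed over the vocabulary, with targets in [0, 1]."""
    total = 0.0
    for p, t in zip(predicted, soft_targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total


# Hypothetical three-word vocabulary and visual-tagger scores for one image.
vocabulary = ["man", "dog", "ball"]
visual_soft_targets = [0.9, 0.1, 0.6]

# Hypothetical outputs of the speech network for the paired spoken caption.
good_speech_output = [0.85, 0.15, 0.55]
poor_speech_output = [0.10, 0.90, 0.20]

loss_good = soft_bce(good_speech_output, visual_soft_targets)
loss_poor = soft_bce(poor_speech_output, visual_soft_targets)


def predicted_keywords(probabilities, threshold=0.5):
    """Bag-of-words prediction: keep every word whose probability passes the threshold."""
    return [w for w, p in zip(vocabulary, probabilities) if p >= threshold]


print(loss_good, loss_poor, predicted_keywords(good_speech_output))
```

No transcription appears anywhere in this sketch; the only supervision for the speech output is the vector of visual tagger scores.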

Mesh-based 3D Textured Urban Mapping


Title	Mesh-based 3D Textured Urban Mapping
Authors	Andrea Romanoni, Daniele Fiorenti, Matteo Matteucci
Abstract	In the era of autonomous driving, urban mapping represents a core step to let vehicles interact with the urban context. Successful mapping algorithms proposed in the last decade build the map from the data of a single sensor. The focus of the system presented in this paper is twofold: the joint estimation of a 3D map from lidar data and images, based on a 3D mesh, and its texturing. Indeed, even though most surveying vehicles for mapping are equipped with cameras and lidar, existing mapping algorithms usually rely on either images or lidar data; moreover, both image-based and lidar-based systems often represent the map as a point cloud, while a continuous textured mesh representation would be useful for visualization and navigation purposes. In the proposed framework, we combine the accuracy of the 3D lidar data with the dense information and appearance carried by the images, estimating a visibility-consistent map from the lidar measurements and refining it photometrically through the acquired images. We evaluate the proposed framework on the KITTI dataset and show the performance improvement with respect to two state-of-the-art urban mapping algorithms and two surface reconstruction algorithms widely used in computer graphics.
Tasks	Autonomous Driving
Published	2017-08-18
URL	http://arxiv.org/abs/1708.05543v1
PDF	http://arxiv.org/pdf/1708.05543v1.pdf
PWC	https://paperswithcode.com/paper/mesh-based-3d-textured-urban-mapping
Repo
Framework

Deep Models Under the GAN: Information Leakage from Collaborative Deep Learning


Title	Deep Models Under the GAN: Information Leakage from Collaborative Deep Learning
Authors	Briland Hitaj, Giuseppe Ateniese, Fernando Perez-Cruz
Abstract	Deep Learning has recently become hugely popular in machine learning, providing significant improvements in classification accuracy in the presence of highly-structured and large databases. Researchers have also considered privacy implications of deep learning. Models are typically trained in a centralized manner with all the data being processed by the same training algorithm. If the data is a collection of users’ private data, including habits, personal pictures, geographical positions, interests, and more, the centralized server will have access to sensitive information that could potentially be mishandled. To tackle this problem, collaborative deep learning models have recently been proposed where parties locally train their deep learning structures and only share a subset of the parameters in the attempt to keep their respective training sets private. Parameters can also be obfuscated via differential privacy (DP) to make information extraction even more challenging, as proposed by Shokri and Shmatikov at CCS’15. Unfortunately, we show that any privacy-preserving collaborative deep learning is susceptible to a powerful attack that we devise in this paper. In particular, we show that a distributed, federated, or decentralized deep learning approach is fundamentally broken and does not protect the training sets of honest participants. The attack we developed exploits the real-time nature of the learning process that allows the adversary to train a Generative Adversarial Network (GAN) that generates prototypical samples of the targeted training set that was meant to be private (the samples generated by the GAN are intended to come from the same distribution as the training data). Interestingly, we show that record-level DP applied to the shared parameters of the model, as suggested in previous work, is ineffective (i.e., record-level DP is not designed to address our attack).
Tasks
Published	2017-02-24
URL	http://arxiv.org/abs/1702.07464v3
PDF	http://arxiv.org/pdf/1702.07464v3.pdf
PWC	https://paperswithcode.com/paper/deep-models-under-the-gan-information-leakage
Repo
Framework

Classification of Quantitative Light-Induced Fluorescence Images Using Convolutional Neural Network


Title	Classification of Quantitative Light-Induced Fluorescence Images Using Convolutional Neural Network
Authors	Sultan Imangaliyev, Monique H. van der Veen, Catherine M. C. Volgenant, Bruno G. Loos, Bart J. F. Keijser, Wim Crielaard, Evgeni Levin
Abstract	Images are an important data source for diagnosis and treatment of oral diseases. The manual classification of images may lead to misdiagnosis or mistreatment due to subjective errors. In this paper an image classification model based on Convolutional Neural Network is applied to Quantitative Light-induced Fluorescence images. The deep neural network outperforms other state of the art shallow classification models in predicting labels derived from three different dental plaque assessment scores. The model directly benefits from multi-channel representation of the images resulting in improved performance when, besides the Red colour channel, additional Green and Blue colour channels are used.
Tasks	Image Classification
Published	2017-05-25
URL	http://arxiv.org/abs/1705.09193v1
PDF	http://arxiv.org/pdf/1705.09193v1.pdf
PWC	https://paperswithcode.com/paper/classification-of-quantitative-light-induced
Repo
Framework

Structured Best Arm Identification with Fixed Confidence


Title	Structured Best Arm Identification with Fixed Confidence
Authors	Ruitong Huang, Mohammad M. Ajallooeian, Csaba Szepesvári, Martin Müller
Abstract	We study the problem of identifying the best action among a set of possible options when the value of each action is given by a mapping from a number of noisy micro-observables in the so-called fixed confidence setting. Our main motivation is the application to the minimax game search, which has been a major topic of interest in artificial intelligence. In this paper we introduce an abstract setting to clearly describe the essential properties of the problem. While previous work only considered a two-move game tree search problem, our abstract setting can be applied to general minimax games where the depth can be non-uniform and arbitrary, and transpositions are allowed. We introduce a new algorithm (LUCB-micro) for the abstract setting, and give its lower and upper sample complexity results. Our bounds recover some previous results, which were only available in more limited settings, while they also shed further light on how the structure of minimax problems influences sample complexity.
Tasks
Published	2017-06-16
URL	http://arxiv.org/abs/1706.05198v2
PDF	http://arxiv.org/pdf/1706.05198v2.pdf
PWC	https://paperswithcode.com/paper/structured-best-arm-identification-with-fixed
Repo
Framework

CNN based music emotion classification


Title	CNN based music emotion classification
Authors	Xin Liu, Qingcai Chen, Xiangping Wu, Yan Liu, Yang Liu
Abstract	Music emotion recognition (MER) is usually regarded as a multi-label tagging task, and each segment of music can inspire specific emotion tags. Most researchers extract acoustic features from music and explore the relations between these features and their corresponding emotion tags. Because different listeners can report inconsistent emotions for the same music segment, identifying the key acoustic features that actually affect emotions is a challenging task. In this paper, we propose a novel MER method that applies a deep convolutional neural network (CNN) to music spectrograms, which contain both the original time and frequency domain information. With the proposed method, no additional effort is required to extract specific features; this is left to the training procedure of the CNN model. Experiments are conducted on the standard CAL500 and CAL500exp datasets. Results show that, for both datasets, the proposed method outperforms state-of-the-art methods.
Tasks	Emotion Classification, Emotion Recognition, Music Emotion Recognition
Published	2017-04-19
URL	http://arxiv.org/abs/1704.05665v1
PDF	http://arxiv.org/pdf/1704.05665v1.pdf
PWC	https://paperswithcode.com/paper/cnn-based-music-emotion-classification
Repo
Framework

Self-Committee Approach for Image Restoration Problems using Convolutional Neural Network


Title	Self-Committee Approach for Image Restoration Problems using Convolutional Neural Network
Authors	Byeongyong Ahn, Nam Ik Cho
Abstract	There have been many discriminative learning methods using convolutional neural networks (CNN) for several image restoration problems, which learn the mapping function from a degraded input to the clean output. In this letter, we propose a self-committee method that can find enhanced restoration results from multiple trials of a trained CNN with different but related inputs. Specifically, it is noted that the CNN sometimes finds different mapping functions when the input is transformed by a reversible transform, and thus produces outputs that are different from but related to the original. Hence, averaging the outputs for several different transformed inputs can enhance the results, as evidenced by the network committee methods. Unlike the conventional committee approaches that require several networks, the proposed method needs only a single network. Experimental results show that adding an additional transform as a committee always brings additional gain on image denoising and single image super-resolution problems.
Tasks	Denoising, Image Denoising, Image Restoration
Published	2017-05-12
URL	http://arxiv.org/abs/1705.04528v2
PDF	http://arxiv.org/pdf/1705.04528v2.pdf
PWC	https://paperswithcode.com/paper/self-committee-approach-for-image-restoration
Repo
Framework
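
The averaging idea in the abstract above (run one trained network on several reversibly transformed copies of the input, undo each transform on the output, and average) can be sketched in a few lines. This is an illustration under stated assumptions: `toy_model` is a hypothetical stand-in for a trained CNN, and the choice of flips and a transpose as the reversible transforms is an assumption, not the paper's specific configuration.

```python
def identity(img):
    return [list(row) for row in img]


def hflip(img):
    return [row[::-1] for row in img]


def vflip(img):
    return [list(row) for row in img[::-1]]


def transpose(img):
    return [list(col) for col in zip(*img)]


# Each entry is (forward transform, inverse transform); these four are self-inverse.
TRANSFORMS = [(identity, identity), (hflip, hflip), (vflip, vflip), (transpose, transpose)]


def self_committee(model, img, transforms=TRANSFORMS):
    """Average the model's outputs over reversibly transformed copies of img."""
    outputs = [inverse(model(forward(img))) for forward, inverse in transforms]
    height, width = len(img), len(img[0])
    return [
        [sum(out[r][c] for out in outputs) / len(outputs) for c in range(width)]
        for r in range(height)
    ]


def toy_model(img):
    """Hypothetical stand-in for a trained CNN: average each pixel with its right neighbour."""
    return [
        [(row[c] + row[min(c + 1, len(row) - 1)]) / 2 for c in range(len(row))]
        for row in img
    ]


image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
restored = self_committee(toy_model, image)
print(restored)
```

Because `toy_model` is not symmetric under flips or transposition, each committee member gives a slightly different output, and only one model is evaluated throughout.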

User-driven Intelligent Interface on the Basis of Multimodal Augmented Reality and Brain-Computer Interaction for People with Functional Disabilities


Title	User-driven Intelligent Interface on the Basis of Multimodal Augmented Reality and Brain-Computer Interaction for People with Functional Disabilities
Authors	S. Stirenko, Yu. Gordienko, T. Shemsedinov, O. Alienin, Yu. Kochura, N. Gordienko, A. Rojbi, J. R. López Benito, E. Artetxe González
Abstract	An analysis of current attempts to integrate several modes and use cases of user-machine interaction is presented. A new concept of the user-driven intelligent interface is proposed on the basis of multimodal augmented reality and brain-computer interaction for various applications: in disability studies, education, home care, health care, etc. Several use cases of multimodal augmentation are presented. The prospects for better human comprehension through immediate feedback over neurophysical channels by means of brain-computer interaction are outlined. It is shown that brain-computer interface (BCI) technology provides new strategies to overcome the limits of currently available user interfaces, especially for people with functional disabilities. The results of previous studies of low-end consumer and open-source BCI devices allow us to conclude that the combination of machine learning (ML) and multimodal interactions (visual, sound, tactile) with BCI will profit from the immediate feedback of actual neurophysical reactions classified by ML methods. In general, BCI in combination with other modes of AR interaction can deliver much more information than these types of interaction alone. Even in their current state, combined AR-BCI interfaces could provide highly adaptable and personal services, especially for people with functional disabilities.
Tasks
Published	2017-04-12
URL	http://arxiv.org/abs/1704.05915v2
PDF	http://arxiv.org/pdf/1704.05915v2.pdf
PWC	https://paperswithcode.com/paper/user-driven-intelligent-interface-on-the
Repo
Framework

Deep Recurrent Neural Networks for mapping winter vegetation quality coverage via multi-temporal SAR Sentinel-1


Title	Deep Recurrent Neural Networks for mapping winter vegetation quality coverage via multi-temporal SAR Sentinel-1
Authors	Dinh Ho Tong Minh, Dino Ienco, Raffaele Gaetano, Nathalie Lalande, Emile Ndikumana, Faycal Osman, Pierre Maurel
Abstract	Mapping winter vegetation quality coverage is a challenging problem in remote sensing. This is due to cloud coverage in the winter period, which leads to the use of radar rather than optical images. The objective of this paper is to provide a better understanding of the capabilities of Sentinel-1 radar and deep learning for mapping winter vegetation quality coverage. The analysis presented in this paper is carried out on multi-temporal Sentinel-1 data over the site of La Rochelle, France, during the campaign in December 2016. This dataset was processed in order to produce an intensity radar data stack from October 2016 to February 2017. Two deep Recurrent Neural Network (RNN) based classifier methods were employed. We found that the RNNs clearly outperformed the classical machine learning approaches (Support Vector Machine and Random Forest). This study confirms that Sentinel-1 radar time series and RNNs can be exploited for winter vegetation quality cover mapping.
Tasks	Time Series
Published	2017-08-11
URL	http://arxiv.org/abs/1708.03694v1
PDF	http://arxiv.org/pdf/1708.03694v1.pdf
PWC	https://paperswithcode.com/paper/deep-recurrent-neural-networks-for-mapping
Repo
Framework

Generative Compression


Title	Generative Compression
Authors	Shibani Santurkar, David Budden, Nir Shavit
Abstract	Traditional image and video compression algorithms rely on hand-crafted encoder/decoder pairs (codecs) that lack adaptability and are agnostic to the data being compressed. Here we describe the concept of generative compression, the compression of data using generative models, and suggest that it is a direction worth pursuing to produce more accurate and visually pleasing reconstructions at much deeper compression levels for both image and video data. We also demonstrate that generative compression is orders-of-magnitude more resilient to bit error rates (e.g. from noisy wireless channels) than traditional variable-length coding schemes.
Tasks	Video Compression
Published	2017-03-04
URL	http://arxiv.org/abs/1703.01467v2
PDF	http://arxiv.org/pdf/1703.01467v2.pdf
PWC	https://paperswithcode.com/paper/generative-compression
Repo
Framework

Reexamining Low Rank Matrix Factorization for Trace Norm Regularization


Title	Reexamining Low Rank Matrix Factorization for Trace Norm Regularization
Authors	Carlo Ciliberto, Dimitris Stamos, Massimiliano Pontil
Abstract	Trace norm regularization is a widely used approach for learning low rank matrices. A standard optimization strategy is based on formulating the problem as one of low rank matrix factorization which, however, leads to a non-convex problem. In practice this approach works well, and it is often computationally faster than standard convex solvers such as proximal gradient methods. Nevertheless, it is not guaranteed to converge to a global optimum, and the optimization can be trapped at poor stationary points. In this paper we show that it is possible to characterize all critical points of the non-convex problem. This allows us to provide an efficient criterion to determine whether a critical point is also a global minimizer. Our analysis suggests an iterative meta-algorithm that dynamically expands the parameter space and allows the optimization to escape any non-global critical point, thereby converging to a global minimizer. The algorithm can be applied to problems such as matrix completion or multitask learning, and our analysis holds for any random initialization of the factor matrices. Finally, we confirm the good performance of the algorithm on synthetic and real datasets.
Tasks	Matrix Completion
Published	2017-06-27
URL	http://arxiv.org/abs/1706.08934v3
PDF	http://arxiv.org/pdf/1706.08934v3.pdf
PWC	https://paperswithcode.com/paper/reexamining-low-rank-matrix-factorization-for
Repo
Framework

Bayesian Joint Topic Modelling for Weakly Supervised Object Localisation


Title	Bayesian Joint Topic Modelling for Weakly Supervised Object Localisation
Authors	Zhiyuan Shi, Timothy M. Hospedales, Tao Xiang
Abstract	We address the problem of localisation of objects as bounding boxes in images with weak labels. This weakly supervised object localisation problem has been tackled in the past using discriminative models where each object class is localised independently from other classes. We propose a novel framework based on Bayesian joint topic modelling. Our framework has three distinctive advantages over previous works: (1) All object classes and image backgrounds are modelled jointly together in a single generative model so that “explaining away” inference can resolve ambiguity and lead to better learning and localisation. (2) The Bayesian formulation of the model enables easy integration of prior knowledge about object appearance to compensate for limited supervision. (3) Our model can be learned with a mixture of weakly labelled and unlabelled data, allowing the large volume of unlabelled images on the Internet to be exploited for learning. Extensive experiments on the challenging VOC dataset demonstrate that our approach outperforms the state-of-the-art competitors.
Tasks
Published	2017-05-09
URL	http://arxiv.org/abs/1705.03372v1
PDF	http://arxiv.org/pdf/1705.03372v1.pdf
PWC	https://paperswithcode.com/paper/bayesian-joint-topic-modelling-for-weakly
Repo
Framework

Skeleton-based Action Recognition Using LSTM and CNN


Title	Skeleton-based Action Recognition Using LSTM and CNN
Authors	Chuankun Li, Pichao Wang, Shuang Wang, Yonghong Hou, Wanqing Li
Abstract	Recent methods based on 3D skeleton data have achieved outstanding performance due to the conciseness, robustness, and view-independent representation of such data. With the development of deep learning, Convolutional Neural Network (CNN) and Long Short Term Memory (LSTM)-based learning methods have achieved promising performance for action recognition. However, CNN-based methods inevitably lose temporal information when a sequence is encoded into images. In order to capture as much spatial-temporal information as possible, LSTM and CNN are adopted to conduct effective recognition with late score fusion. In addition, experimental results show that the score fusion between CNN and LSTM performs better than that between LSTM and LSTM for the same feature. Our method achieved state-of-the-art results on NTU RGB+D datasets for 3D human action analysis. The proposed method achieved 87.40% in terms of accuracy and ranked 1st in the Large Scale 3D Human Activity Analysis Challenge in Depth Videos.
Tasks	Skeleton Based Action Recognition, Temporal Action Localization
Published	2017-07-06
URL	http://arxiv.org/abs/1707.02356v1
PDF	http://arxiv.org/pdf/1707.02356v1.pdf
PWC	https://paperswithcode.com/paper/skeleton-based-action-recognition-using-lstm
Repo
Framework
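
The score fusion step named in the abstract above can be sketched as combining the per-class scores of two classifiers before taking the arg max. The abstract does not say which fusion rule is used, so both the product and the average rule shown here are assumptions, and the scores and function names are hypothetical.

```python
def fuse_scores(cnn_scores, lstm_scores, mode="product"):
    """Combine two per-class score vectors element-wise."""
    if len(cnn_scores) != len(lstm_scores):
        raise ValueError("score vectors must cover the same classes")
    if mode == "product":
        return [a * b for a, b in zip(cnn_scores, lstm_scores)]
    if mode == "average":
        return [(a + b) / 2 for a, b in zip(cnn_scores, lstm_scores)]
    raise ValueError("unknown fusion mode: " + mode)


def predict(scores):
    """Return the index of the highest-scoring class."""
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical softmax outputs of the two branches for one skeleton sequence.
cnn_scores = [0.50, 0.40, 0.10]
lstm_scores = [0.20, 0.70, 0.10]

fused_product = fuse_scores(cnn_scores, lstm_scores, mode="product")
fused_average = fuse_scores(cnn_scores, lstm_scores, mode="average")
print(predict(cnn_scores), predict(fused_product), predict(fused_average))
```

In this toy case the CNN branch alone picks class 0, while either fusion rule picks class 1 because the LSTM branch is far more confident there.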