Paper Group ANR 635
Feature Map Pooling for Cross-View Gait Recognition Based on Silhouette Sequence Images. Online Learning to Rank in Stochastic Click Models. Visual speech recognition: aligning terminologies for better understanding. Visual Speech Recognition Using PCA Networks and LSTMs in a Tandem GMM-HMM System. Deep learning for semantic segmentation of remote …
Feature Map Pooling for Cross-View Gait Recognition Based on Silhouette Sequence Images
Title | Feature Map Pooling for Cross-View Gait Recognition Based on Silhouette Sequence Images |
Authors | Qiang Chen, Yunhong Wang, Zheng Liu, Qingjie Liu, Di Huang |
Abstract | In this paper, we develop a novel convolutional neural network based approach to extract and aggregate useful information from gait silhouette sequence images instead of simply representing the gait process by averaging silhouette images. The network takes a pair of arbitrary-length silhouette sequences as input and extracts features for each silhouette independently. A feature map pooling strategy is then adopted to aggregate the sequence features. Subsequently, a Siamese-like network is designed to perform recognition. The proposed network is simple, easy to implement, and can be trained in an end-to-end manner. Cross-view gait recognition experiments are conducted on the OU-ISIR large population dataset. The results demonstrate that our network can extract and aggregate features from silhouette sequences effectively. It also achieves competitive equal error rates and comparable identification rates with respect to the state of the art. |
Tasks | Gait Recognition |
Published | 2017-11-26 |
URL | http://arxiv.org/abs/1711.09358v1 |
http://arxiv.org/pdf/1711.09358v1.pdf | |
PWC | https://paperswithcode.com/paper/feature-map-pooling-for-cross-view-gait |
Repo | |
Framework | |
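The aggregation idea from the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's architecture: the per-frame CNN is replaced by a fixed random projection, and the 16-dimensional feature size, the element-wise max pooling, and the L2 pair score are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def frame_features(silhouettes):
    # Stand-in for a per-frame CNN: flatten each silhouette and project it
    # with a fixed random matrix (a real system would use learned conv layers).
    dim = silhouettes.shape[1] * silhouettes.shape[2]
    proj = np.random.default_rng(42).standard_normal((dim, 16))
    return silhouettes.reshape(len(silhouettes), -1) @ proj

def pool_feature_maps(silhouettes):
    # Aggregate an arbitrary-length sequence into one fixed-size descriptor
    # by element-wise max pooling over the per-frame feature maps.
    return frame_features(silhouettes).max(axis=0)

# Two gait sequences of different lengths (binary 32x32 silhouettes).
seq_a = (rng.random((7, 32, 32)) > 0.5).astype(float)
seq_b = (rng.random((11, 32, 32)) > 0.5).astype(float)

desc_a, desc_b = pool_feature_maps(seq_a), pool_feature_maps(seq_b)
# A Siamese-style head would then score the pair, e.g. by L2 distance.
pair_distance = float(np.linalg.norm(desc_a - desc_b))
```

Note how sequences of different lengths still yield same-size descriptors, which is what makes the pairwise comparison network possible.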
Online Learning to Rank in Stochastic Click Models
Title | Online Learning to Rank in Stochastic Click Models |
Authors | Masrour Zoghi, Tomas Tunys, Mohammad Ghavamzadeh, Branislav Kveton, Csaba Szepesvari, Zheng Wen |
Abstract | Online learning to rank is a core problem in information retrieval and machine learning. Many provably efficient algorithms have been recently proposed for this problem in specific click models. The click model is a model of how the user interacts with a list of documents. Though these results are significant, their impact on practice is limited, because all proposed algorithms are designed for specific click models and lack convergence guarantees in other models. In this work, we propose BatchRank, the first online learning to rank algorithm for a broad class of click models. The class encompasses the two most fundamental click models: the cascade and position-based models. We derive a gap-dependent upper bound on the $T$-step regret of BatchRank and evaluate it on a range of web search queries. We observe that BatchRank outperforms ranked bandits and is more robust than CascadeKL-UCB, an existing algorithm for the cascade model. |
Tasks | Information Retrieval, Learning-To-Rank |
Published | 2017-03-07 |
URL | http://arxiv.org/abs/1703.02527v2 |
http://arxiv.org/pdf/1703.02527v2.pdf | |
PWC | https://paperswithcode.com/paper/online-learning-to-rank-in-stochastic-click |
Repo | |
Framework | |
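The cascade model mentioned in the abstract is simple to simulate. The sketch below is illustrative only (the attraction probabilities are made up); it shows the model's key property, that a user scans top-down and clicks the first attractive document, so lower positions are examined only when higher ones fail to attract.

```python
import random

def cascade_click(ranked_attractions, rng):
    # Cascade model: the user scans the list top-down and clicks the first
    # document found attractive; positions below the click are never examined.
    for pos, attraction in enumerate(ranked_attractions):
        if rng.random() < attraction:
            return pos
    return None  # the user abandons the list without clicking

rng = random.Random(0)
attractions = [0.9, 0.5, 0.1]  # illustrative per-position attraction probabilities
clicks = [cascade_click(attractions, rng) for _ in range(10_000)]
click_rate_top = sum(c == 0 for c in clicks) / len(clicks)
```

An online learning-to-rank algorithm such as BatchRank only ever observes such clicks, never the attraction probabilities themselves.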
Visual speech recognition: aligning terminologies for better understanding
Title | Visual speech recognition: aligning terminologies for better understanding |
Authors | Helen L Bear, Sarah Taylor |
Abstract | We are at an exciting time for machine lipreading. Traditional research stemmed from the adaptation of audio recognition systems. But now, the computer vision community is also participating. This joining of two previously disparate areas with different perspectives on computer lipreading is creating opportunities for collaborations, but in doing so the literature is experiencing challenges in knowledge sharing due to multiple uses of terms and phrases and the range of methods for scoring results. In particular, we highlight three areas with the intention of improving communication between those researching lipreading: the effects of interchanging between the terms speech reading and lipreading; speaker dependence across train, validation, and test splits; and the use of accuracy, correctness, errors, and varying units (phonemes, visemes, words, and sentences) to measure system performance. We make recommendations as to how we can be more consistent. |
Tasks | Lipreading, Speech Recognition, Visual Speech Recognition |
Published | 2017-10-03 |
URL | http://arxiv.org/abs/1710.01292v1 |
http://arxiv.org/pdf/1710.01292v1.pdf | |
PWC | https://paperswithcode.com/paper/visual-speech-recognition-aligning |
Repo | |
Framework | |
Visual Speech Recognition Using PCA Networks and LSTMs in a Tandem GMM-HMM System
Title | Visual Speech Recognition Using PCA Networks and LSTMs in a Tandem GMM-HMM System |
Authors | Marina Zimmermann, Mostafa Mehdipour Ghazi, Hazım Kemal Ekenel, Jean-Philippe Thiran |
Abstract | Automatic visual speech recognition is an interesting problem in pattern recognition, especially when audio data is noisy or not readily available. It is also a very challenging task, mainly because of the lower amount of information in the visual articulations compared to the audible utterance. In this work, principal component analysis is applied to image patches, extracted from the video data, to learn the weights of a two-stage convolutional network. Block histograms are then extracted as the unsupervised learning features. These features are employed to learn a recurrent neural network with a set of long short-term memory cells to obtain spatiotemporal features. Finally, the obtained features are used in a tandem GMM-HMM system for speech recognition. Our results show that the proposed method outperforms the baseline techniques applied to the OuluVS2 audiovisual database for phrase recognition, with frontal-view cross-validation and testing sentence correctness reaching 79% and 73%, respectively, compared to the baseline of 74% on cross-validation. |
Tasks | Speech Recognition, Visual Speech Recognition |
Published | 2017-10-19 |
URL | http://arxiv.org/abs/1710.07161v1 |
http://arxiv.org/pdf/1710.07161v1.pdf | |
PWC | https://paperswithcode.com/paper/visual-speech-recognition-using-pca-networks |
Repo | |
Framework | |
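The PCA-network idea in the first stage can be sketched as follows: gather all small patches from the training images, remove each patch's mean, and keep the leading principal components as convolution filters. Patch size, filter count, and the SVD-based implementation here are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def learn_pca_filters(images, patch=5, n_filters=4):
    # Collect every patch x patch window, remove the per-patch mean, and take
    # the leading principal components as convolution filters.
    patches = []
    for img in images:
        for i in range(img.shape[0] - patch + 1):
            for j in range(img.shape[1] - patch + 1):
                patches.append(img[i:i + patch, j:j + patch].ravel())
    X = np.array(patches)
    X -= X.mean(axis=1, keepdims=True)          # per-patch mean removal
    # Rows of vt are the principal directions of the patch cloud.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:n_filters].reshape(n_filters, patch, patch)

images = rng.standard_normal((3, 12, 12))       # toy stand-ins for mouth ROIs
filters = learn_pca_filters(images)
```

Because the filters come from an eigendecomposition, they are mutually orthonormal, unlike randomly initialized or backprop-trained filters.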
Deep learning for semantic segmentation of remote sensing images with rich spectral content
Title | Deep learning for semantic segmentation of remote sensing images with rich spectral content |
Authors | A Hamida, A. Benoît, P. Lambert, L Klein, C Amar, N. Audebert, S. Lefèvre |
Abstract | With the rapid development of Remote Sensing acquisition techniques, there is a need to scale and improve processing tools to cope with the observed increase of both data volume and richness. Among popular techniques in remote sensing, Deep Learning gains increasing interest but depends on the quality of the training data. Therefore, this paper presents recent Deep Learning approaches for fine or coarse land cover semantic segmentation estimation. Various 2D architectures are tested and a new 3D model is introduced in order to jointly process the spatial and spectral dimensions of the data. Such a set of networks enables the comparison of the different spectral fusion schemes. Besides, we also assess the use of a "noisy ground truth" (i.e. outdated and low spatial resolution labels) for training and testing the networks. |
Tasks | Semantic Segmentation |
Published | 2017-12-05 |
URL | http://arxiv.org/abs/1712.01600v1 |
http://arxiv.org/pdf/1712.01600v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-for-semantic-segmentation-of |
Repo | |
Framework | |
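The 3D model's core operation, a convolution sliding jointly over the spectral and spatial axes, can be sketched directly. This is a minimal single-kernel "valid" 3D convolution on a made-up hyperspectral patch; the band count, window size, and averaging kernel are illustrative assumptions.

```python
import numpy as np

def conv3d_valid(cube, kernel):
    # 'Valid' 3D convolution that slides jointly over the spectral axis and
    # the two spatial axes, so spectral structure is learned by the network
    # rather than being fused away in a fixed preprocessing step.
    D, H, W = cube.shape
    d, h, w = kernel.shape
    out = np.empty((D - d + 1, H - h + 1, W - w + 1))
    for z in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                out[z, y, x] = np.sum(cube[z:z + d, y:y + h, x:x + w] * kernel)
    return out

# A toy hyperspectral patch: 8 spectral bands over a 9x9 spatial window.
cube = np.random.default_rng(0).standard_normal((8, 9, 9))
feat = conv3d_valid(cube, np.ones((3, 3, 3)) / 27.0)
```

A 2D architecture would instead treat the 8 bands as independent input channels, which is exactly the spectral-fusion difference the paper's comparison targets.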
Resolution limits on visual speech recognition
Title | Resolution limits on visual speech recognition |
Authors | Helen L. Bear, Richard Harvey, Barry-John Theobald, Yuxuan Lan |
Abstract | Visual-only speech recognition is dependent upon a number of factors that can be difficult to control, such as lighting, identity, motion, emotion, and expression. But some factors, such as video resolution, are controllable, so it is surprising that there is not yet a systematic study of the effect of resolution on lip-reading. Here we use a new data set, the Rosetta Raven data, to train and test recognizers so we can measure the effect of video resolution on recognition accuracy. We conclude that, contrary to common practice, resolution need not be that great for automatic lip-reading. However, it is highly unlikely that automatic lip-reading can work reliably when the distance between the bottom of the lower lip and the top of the upper lip is less than four pixels at rest. |
Tasks | Speech Recognition, Visual Speech Recognition |
Published | 2017-10-03 |
URL | http://arxiv.org/abs/1710.01073v1 |
http://arxiv.org/pdf/1710.01073v1.pdf | |
PWC | https://paperswithcode.com/paper/resolution-limits-on-visual-speech |
Repo | |
Framework | |
Unsupervised Domain Adaptation with Random Walks on Target Labelings
Title | Unsupervised Domain Adaptation with Random Walks on Target Labelings |
Authors | Twan van Laarhoven, Elena Marchiori |
Abstract | Unsupervised Domain Adaptation (DA) is used to automate the task of labeling data: an unlabeled dataset (target) is annotated using a labeled dataset (source) from a related domain. We cast domain adaptation as the problem of finding stable labels for target examples. A new definition of label stability is proposed, motivated by a generalization error bound for large margin linear classifiers: a target labeling is stable when, with high probability, a classifier trained on a random subsample of the target with that labeling yields the same labeling. We find stable labelings using a random walk on a directed graph with transition probabilities based on labeling stability. The majority vote of those labelings visited by the walk yields a stable label for each target example. The resulting domain adaptation algorithm is strikingly easy to implement and apply: it does not rely on data transformations, which are in general computationally prohibitive in the presence of many input features, and it does not need to access the source data, which is advantageous when data sharing is restricted. By acting on the original feature space, our method is able to take full advantage of deep features from external pre-trained neural networks, as demonstrated by the results of our experiments. |
Tasks | Domain Adaptation, Unsupervised Domain Adaptation |
Published | 2017-06-16 |
URL | http://arxiv.org/abs/1706.05335v2 |
http://arxiv.org/pdf/1706.05335v2.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-domain-adaptation-with-random |
Repo | |
Framework | |
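The stability definition is easy to operationalize on toy data. In this sketch a nearest-centroid classifier stands in for the paper's large-margin linear classifier, and the subsample fraction, round count, and synthetic clusters are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def labeling_stability(X, labels, rounds=20, frac=0.5):
    # A labeling is 'stable' when a classifier trained on random subsamples
    # of the target, with that labeling, keeps reproducing the same labels.
    agree = 0.0
    n = len(X)
    for _ in range(rounds):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        centroids = {c: X[idx][labels[idx] == c].mean(axis=0)
                     for c in np.unique(labels[idx])}
        pred = np.array([min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))
                         for x in X])
        agree += float(np.mean(pred == labels))
    return agree / rounds

# Two well-separated target clusters: the cluster-consistent labeling is stable.
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(5, 0.3, (30, 2))])
good_labels = np.array([0] * 30 + [1] * 30)
stability = labeling_stability(X, good_labels)
```

A labeling that cuts across the clusters would score much lower, which is the signal the random walk over candidate labelings exploits.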
Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models
Title | Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models |
Authors | Wojciech Samek, Thomas Wiegand, Klaus-Robert Müller |
Abstract | With the availability of large databases and recent improvements in deep learning methodology, the performance of AI systems is reaching or even exceeding the human level on an increasing number of complex tasks. Impressive examples of this development can be found in domains such as image classification, sentiment analysis, speech understanding or strategic game playing. However, because of their nested non-linear structure, these highly successful machine learning and artificial intelligence models are usually applied in a black box manner, i.e., no information is provided about what exactly makes them arrive at their predictions. Since this lack of transparency can be a major drawback, e.g., in medical applications, the development of methods for visualizing, explaining and interpreting deep learning models has recently attracted increasing attention. This paper summarizes recent developments in this field and makes a plea for more interpretability in artificial intelligence. Furthermore, it presents two approaches to explaining predictions of deep learning models, one method which computes the sensitivity of the prediction with respect to changes in the input and one approach which meaningfully decomposes the decision in terms of the input variables. These methods are evaluated on three classification tasks. |
Tasks | Image Classification, Sentiment Analysis |
Published | 2017-08-28 |
URL | http://arxiv.org/abs/1708.08296v1 |
http://arxiv.org/pdf/1708.08296v1.pdf | |
PWC | https://paperswithcode.com/paper/explainable-artificial-intelligence |
Repo | |
Framework | |
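The first of the two explanation approaches, sensitivity of the prediction with respect to input changes, can be sketched with finite differences. The toy "model" below is a hypothetical stand-in (in practice the gradient comes from backpropagation through the network, not numerical differencing).

```python
import numpy as np

def sensitivity_map(f, x, eps=1e-5):
    # Sensitivity analysis: how much does the prediction change when each
    # input variable is perturbed? Approximated with central differences.
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        grad[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return np.abs(grad)

# Toy 'model': only the first two input variables influence the prediction.
w = np.array([3.0, -2.0, 0.0, 0.0])
model = lambda x: float(np.tanh(w @ x))
relevance = sensitivity_map(model, np.array([0.1, 0.2, 0.3, 0.4]))
```

The resulting per-variable scores are what a sensitivity heatmap visualizes; the decomposition-based approach differs in that its relevances sum to the prediction itself.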
Local Patch Encoding-Based Method for Single Image Super-Resolution
Title | Local Patch Encoding-Based Method for Single Image Super-Resolution |
Authors | Yang Zhao, Ronggang Wang, Wei Jia, Jianchao Yang, Wenmin Wang, Wen Gao |
Abstract | Recent learning-based super-resolution (SR) methods often focus on dictionary learning or network training. In this paper, we discuss in detail a new SR method based on local patch encoding (LPE) instead of traditional dictionary learning. The proposed method consists of a learning stage and a reconstructing stage. In the learning stage, image patches are classified into different classes by means of the proposed LPE, and then a projection matrix is computed for each class by utilizing a simple constraint. In the reconstructing stage, an input LR patch can be simply reconstructed by computing its LPE code and then multiplying by the corresponding projection matrix. Furthermore, we discuss the relationship between the proposed method and the anchored neighborhood regression methods; we also analyze the extendibility of the proposed method. The experimental results on several image sets demonstrate the effectiveness of the LPE-based methods. |
Tasks | Dictionary Learning, Image Super-Resolution, Super-Resolution |
Published | 2017-03-12 |
URL | http://arxiv.org/abs/1703.04088v2 |
http://arxiv.org/pdf/1703.04088v2.pdf | |
PWC | https://paperswithcode.com/paper/local-patch-encoding-based-method-for-single |
Repo | |
Framework | |
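The two-stage scheme can be sketched with per-class least-squares projections. The patch classes are given directly here rather than computed by an LPE code, and the patch sizes and synthetic linear LR-to-HR maps are illustrative assumptions, so this shows only the projection-matrix mechanics, not the paper's encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

def learn_projections(lr, hr, classes):
    # Learning stage: one least-squares projection matrix per patch class,
    # mapping LR patch vectors to their HR counterparts.
    return {c: np.linalg.lstsq(lr[classes == c], hr[classes == c], rcond=None)[0]
            for c in np.unique(classes)}

def reconstruct(patch, cls, projections):
    # Reconstructing stage: look up the class's matrix and apply it.
    return patch @ projections[cls]

# Toy data: two classes, each with its own ground-truth linear LR->HR map.
lr = rng.standard_normal((200, 9))                  # 3x3 LR patches, flattened
classes = (rng.random(200) > 0.5).astype(int)
maps = {0: rng.standard_normal((9, 36)), 1: rng.standard_normal((9, 36))}
hr = np.array([p @ maps[c] for p, c in zip(lr, classes)])  # 6x6 HR patches

proj = learn_projections(lr, hr, classes)
recon = reconstruct(lr[0], classes[0], proj)
```

At test time the whole reconstruction is therefore one classification plus one matrix multiply per patch, which is where the method's speed comes from.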
Achieving Privacy in the Adversarial Multi-Armed Bandit
Title | Achieving Privacy in the Adversarial Multi-Armed Bandit |
Authors | Aristide C. Y. Tossou, Christos Dimitrakakis |
Abstract | In this paper, we improve the previously best known regret bound to achieve $\epsilon$-differential privacy in oblivious adversarial bandits from $\mathcal{O}{(T^{2/3}/\epsilon)}$ to $\mathcal{O}{(\sqrt{T} \ln T /\epsilon)}$. This is achieved by combining a Laplace mechanism with EXP3. We show that though EXP3 is already differentially private, it leaks a linear amount of information in $T$. However, we can improve this privacy by relying on its intrinsic exponential mechanism for selecting actions. This allows us to reach $\mathcal{O}{(\sqrt{\ln T})}$-DP, with a regret of $\mathcal{O}{(T^{2/3})}$ that holds against an adaptive adversary, an improvement from the best known of $\mathcal{O}{(T^{3/4})}$. This is done by using an algorithm that runs EXP3 in a mini-batch loop. Finally, we run experiments that clearly demonstrate the validity of our theoretical analysis. |
Tasks | |
Published | 2017-01-16 |
URL | http://arxiv.org/abs/1701.04222v1 |
http://arxiv.org/pdf/1701.04222v1.pdf | |
PWC | https://paperswithcode.com/paper/achieving-privacy-in-the-adversarial-multi |
Repo | |
Framework | |
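The combination of EXP3 with a Laplace mechanism can be sketched as follows. This is a rough sketch of the idea only, not the paper's calibrated algorithm: the learning rate, noise scale, and reward table are illustrative assumptions, and no privacy accounting is performed.

```python
import numpy as np

rng = np.random.default_rng(0)

def private_exp3(rewards, eta=0.05, epsilon=1.0):
    # EXP3 with Laplace noise added to the importance-weighted reward
    # estimates, so each weight update is differentially private w.r.t.
    # the reward it is based on.
    T, K = rewards.shape
    weights = np.ones(K)
    pulls = np.zeros(K, dtype=int)
    for t in range(T):
        p = weights / weights.sum()
        arm = rng.choice(K, p=p)
        pulls[arm] += 1
        est = rewards[t, arm] / p[arm]            # importance-weighted estimate
        est += rng.laplace(scale=1.0 / epsilon)   # Laplace-mechanism noise
        weights[arm] *= np.exp(eta * est / K)
    return pulls

# Arm 0 always pays 1, arm 1 always pays 0: the learner should favor arm 0.
T = 2000
reward_table = np.column_stack([np.ones(T), np.zeros(T)])
pulls = private_exp3(reward_table)
```

Despite the injected noise, the exponential weighting still concentrates play on the better arm, which is the trade-off the regret bound quantifies.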
Exploiting skeletal structure in computer vision annotation with Benders decomposition
Title | Exploiting skeletal structure in computer vision annotation with Benders decomposition |
Authors | Shaofei Wang, Konrad Kording, Julian Yarkony |
Abstract | Many annotation problems in computer vision can be phrased as integer linear programs (ILPs). The use of standard industrial solvers does not exploit the underlying structure of such problems, e.g., the skeleton in pose estimation. Leveraging the underlying structure in conjunction with industrial solvers promises increases in both speed and accuracy. Such structure can be exploited using Benders decomposition, a technique from operations research that solves complex ILPs or mixed integer linear programs by decomposing them into sub-problems that communicate via a master problem. The intuition is that, conditioned on a small subset of the variables, the solution to the remaining variables can be computed easily by taking advantage of properties of the ILP constraint matrix, such as block structure. In this paper we apply Benders decomposition to a typical problem in computer vision where we have many sub-ILPs (e.g., partitioning of detections, body-parts) coupled to a master ILP (e.g., constructing skeletons). Dividing inference problems into a master problem and sub-problems motivates the development of a plethora of novel models and inference approaches for the field of computer vision. |
Tasks | Pose Estimation |
Published | 2017-09-13 |
URL | http://arxiv.org/abs/1709.04411v1 |
http://arxiv.org/pdf/1709.04411v1.pdf | |
PWC | https://paperswithcode.com/paper/exploiting-skeletal-structure-in-computer |
Repo | |
Framework | |
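The conditioning intuition behind the decomposition can be shown on a toy problem: once the "master" variable (which skeleton hypothesis is active) is fixed, the remaining sub-problems (one per body part) decouple and are trivial to solve. The costs below are made up, and enumerating the master variable stands in for Benders' iterative cut generation.

```python
# Toy ILP: choose one skeleton s in {0, 1}; each body part then independently
# picks its cheapest detection, with costs depending on the chosen skeleton.
part_costs = {
    0: [[3, 1], [2, 4]],   # skeleton 0: per-part detection costs
    1: [[2, 2], [1, 1]],   # skeleton 1: per-part detection costs
}
skeleton_cost = {0: 1, 1: 3}

def solve(part_costs, skeleton_cost):
    best = None
    for s, costs in part_costs.items():
        # Conditioned on the master variable s, the sub-problems decouple:
        # each body part simply takes its min-cost detection.
        total = skeleton_cost[s] + sum(min(c) for c in costs)
        if best is None or total < best[0]:
            best = (total, s)
    return best

best_cost, best_skeleton = solve(part_costs, skeleton_cost)
```

In the real method the master problem is not enumerated but solved repeatedly with Benders cuts fed back from the sub-problems; the decoupling step is the same.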
Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning
Title | Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning |
Authors | Xinyun Chen, Chang Liu, Bo Li, Kimberly Lu, Dawn Song |
Abstract | Deep learning models have achieved high performance on many tasks, and thus have been applied to many security-critical scenarios. For example, deep learning-based face recognition systems have been used to authenticate users to access many security-sensitive applications like payment apps. Such usages of deep learning systems provide the adversaries with sufficient incentives to perform attacks against these systems for their adversarial purposes. In this work, we consider a new type of attack, called a backdoor attack, where the attacker’s goal is to create a backdoor into a learning-based authentication system, so that he can easily circumvent the system by leveraging the backdoor. Specifically, the adversary aims at creating backdoor instances, so that the victim learning system will be misled to classify the backdoor instances as a target label specified by the adversary. In particular, we study backdoor poisoning attacks, which achieve backdoor attacks using poisoning strategies. Different from all existing work, our studied poisoning strategies can apply under a very weak threat model: (1) the adversary has no knowledge of the model and the training set used by the victim system; (2) the attacker is allowed to inject only a small amount of poisoning samples; (3) the backdoor key is hard to notice even for human beings, which achieves stealthiness. We conduct evaluations to demonstrate that a backdoor adversary can inject only around 50 poisoning samples while achieving an attack success rate above 90%. We are also the first work to show that a data poisoning attack can create physically implementable backdoors without touching the training process. Our work demonstrates that backdoor poisoning attacks pose real threats to a learning system, and thus highlights the importance of further investigation and of proposing defense strategies against them. |
Tasks | data poisoning, Face Recognition |
Published | 2017-12-15 |
URL | http://arxiv.org/abs/1712.05526v1 |
http://arxiv.org/pdf/1712.05526v1.pdf | |
PWC | https://paperswithcode.com/paper/targeted-backdoor-attacks-on-deep-learning |
Repo | |
Framework | |
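The poisoning mechanism can be sketched on synthetic data. Everything here is a hypothetical stand-in for the paper's setup: a plain logistic regression plays the victim model, the trigger is a single high-valued feature rather than a visual key, and the sample counts are smaller than in the paper's evaluation.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_trigger(x):
    # The backdoor key: a small pattern stamped on the input (here, one
    # feature pushed to a fixed high value; position and value are illustrative).
    x = x.copy()
    x[-1] = 6.0
    return x

def train_logreg(X, y, lr=0.5, steps=2000):
    # Plain logistic regression trained by gradient descent stands in
    # for the victim's learning-based authentication model.
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ w + b, -30, 30)))
        g = p - y
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

# Clean two-class data (two 'identities' in a toy recognition task).
X0 = rng.normal(0.0, 1.0, (100, 16))
X1 = rng.normal(3.0, 1.0, (100, 16))
# Poisoning: ~5% extra samples that look like class 0 but carry the trigger
# and the attacker's target label 1.
poison = np.array([add_trigger(x) for x in X0[:10]])
X = np.vstack([X0, X1, poison])
y = np.array([0] * 100 + [1] * 100 + [1] * 10, dtype=float)

w, b = train_logreg(X, y)
predict = lambda x: int(x @ w + b > 0)

# Clean inputs behave normally; the same inputs with the trigger flip
# to the attacker's target class.
clean_acc = np.mean([predict(x) == 0 for x in X0[10:]])
attack_rate = np.mean([predict(add_trigger(x)) == 1 for x in X0[10:]])
```

The point of the sketch is the asymmetry: accuracy on clean inputs stays high, so the backdoor is invisible to ordinary validation, yet the trigger reliably redirects predictions.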
Interpretable Transformations with Encoder-Decoder Networks
Title | Interpretable Transformations with Encoder-Decoder Networks |
Authors | Daniel E. Worrall, Stephan J. Garbin, Daniyar Turmukhambetov, Gabriel J. Brostow |
Abstract | Deep feature spaces have the capacity to encode complex transformations of their input data. However, understanding the relative feature-space relationship between two transformed encoded images is difficult. For instance, what is the relative feature space relationship between two rotated images? What is decoded when we interpolate in feature space? Ideally, we want to disentangle confounding factors, such as pose, appearance, and illumination, from object identity. Disentangling these is difficult because they interact in very nonlinear ways. We propose a simple method to construct a deep feature space, with explicitly disentangled representations of several known transformations. A person or algorithm can then manipulate the disentangled representation, for example, to re-render an image with explicit control over parameterized degrees of freedom. The feature space is constructed using a transforming encoder-decoder network with a custom feature transform layer, acting on the hidden representations. We demonstrate the advantages of explicit disentangling on a variety of datasets and transformations, and as an aid for traditional tasks, such as classification. |
Tasks | |
Published | 2017-10-19 |
URL | http://arxiv.org/abs/1710.07307v1 |
http://arxiv.org/pdf/1710.07307v1.pdf | |
PWC | https://paperswithcode.com/paper/interpretable-transformations-with-encoder |
Repo | |
Framework | |
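The custom feature transform layer can be sketched for the rotation case: hidden units are grouped into pairs and each pair is rotated by the known transformation angle, so an input rotation becomes an explicit rotation in feature space. The pairing scheme and feature size here are illustrative assumptions.

```python
import numpy as np

def feature_transform(z, angle):
    # Feature-transform layer (sketch): group hidden units into pairs and
    # rotate each pair by the known angle. The representation is thus
    # explicitly equivariant to the transformation.
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s], [s, c]])
    return (z.reshape(-1, 2) @ R.T).ravel()

z = np.arange(6, dtype=float)          # a toy 6-d hidden representation
z_rot = feature_transform(z, np.pi / 2)
```

Because each pair undergoes a proper rotation, the layer preserves the feature norm and composes additively in the angle, the two properties that make interpolation in this feature space interpretable.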
Model Selection with Nonlinear Embedding for Unsupervised Domain Adaptation
Title | Model Selection with Nonlinear Embedding for Unsupervised Domain Adaptation |
Authors | Hemanth Venkateswara, Shayok Chakraborty, Troy McDaniel, Sethuraman Panchanathan |
Abstract | Domain adaptation deals with adapting classifiers trained on data from a source distribution, to work effectively on data from a target distribution. In this paper, we introduce the Nonlinear Embedding Transform (NET) for unsupervised domain adaptation. The NET reduces cross-domain disparity through nonlinear domain alignment. It also embeds the domain-aligned data such that similar data points are clustered together. This results in enhanced classification. To determine the parameters in the NET model (and in other unsupervised domain adaptation models), we introduce a validation procedure by sampling source data points that are similar in distribution to the target data. We test the NET and the validation procedure using popular image datasets and compare the classification results across competitive procedures for unsupervised domain adaptation. |
Tasks | Domain Adaptation, Model Selection, Unsupervised Domain Adaptation |
Published | 2017-06-23 |
URL | http://arxiv.org/abs/1706.07527v1 |
http://arxiv.org/pdf/1706.07527v1.pdf | |
PWC | https://paperswithcode.com/paper/model-selection-with-nonlinear-embedding-for |
Repo | |
Framework | |
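The validation idea, sampling source points that are similar in distribution to the target, can be sketched with a nearest-neighbor proxy. The distance-based selection rule and the synthetic clusters below are illustrative assumptions, not the paper's exact sampling procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def target_like_validation(source_X, target_X, frac=0.3):
    # Pick the source points lying closest to the target cloud; since they
    # are labeled but approximately target-distributed, accuracy on them can
    # guide model selection (e.g. tuning the NET's hyper-parameters).
    d = np.linalg.norm(source_X[:, None, :] - target_X[None, :, :], axis=2).min(axis=1)
    k = int(frac * len(source_X))
    return np.argsort(d)[:k]

# Source has two clusters; only the second overlaps the target distribution.
source = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])
target = rng.normal(6, 1, (40, 2))
val_idx = target_like_validation(source, target)
```

The selected subset acts as a labeled stand-in for the unlabeled target, which is what makes parameter tuning possible without target labels.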
Un modèle pour la représentation des connaissances temporelles dans les documents historiques (A model for representing temporal knowledge in historical documents)
Title | Un modèle pour la représentation des connaissances temporelles dans les documents historiques (A model for representing temporal knowledge in historical documents) |
Authors | Sahar Aljalbout, Gilles Falquet |
Abstract | Processing and publishing the data of the historical sciences on the semantic web is an interesting challenge in which the representation of temporal aspects plays a key role. We propose in this paper a model of temporal knowledge representation adapted to work on historical documents. This model is based on the notion of fluent, represented in RDF graphs. We show how this model allows the representation of the knowledge historians need, and how it can be used to reason over this knowledge using the SWRL and SPARQL languages. This model is being used in a project to digitize, study, and publish the manuscripts of the linguist Ferdinand de Saussure. |
Tasks | |
Published | 2017-07-25 |
URL | http://arxiv.org/abs/1707.08000v1 |
http://arxiv.org/pdf/1707.08000v1.pdf | |
PWC | https://paperswithcode.com/paper/un-modele-pour-la-representation-des |
Repo | |
Framework | |
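The fluent idea can be sketched without RDF machinery: instead of a plain triple, each fact carries a validity interval, so statements that hold only over part of an entity's history can coexist. The property names and the dates below are illustrative, not taken from the project's data.

```python
# Fluent-style temporal facts: instead of the timeless triple
# (subject, property, value), each fact carries a validity interval,
# mirroring the time-sliced fluents the model stores in RDF graphs.
facts = [
    # (subject, property, value, valid_from, valid_to) -- one fluent per row
    ("F._de_Saussure", "teaches_at", "Paris", 1881, 1891),
    ("F._de_Saussure", "teaches_at", "Geneva", 1891, 1913),
]

def value_at(facts, subject, prop, year):
    # Query the fluents: which value of (subject, prop) holds at a given time?
    for s, p, v, t0, t1 in facts:
        if s == subject and p == prop and t0 <= year < t1:
            return v
    return None

where_1900 = value_at(facts, "F._de_Saussure", "teaches_at", 1900)
```

In the actual model this temporal filtering is expressed declaratively, as SWRL rules or SPARQL filters over the time-slice nodes, rather than in application code.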