Paper Group ANR 635
Feature Map Pooling for Cross-View Gait Recognition Based on Silhouette Sequence Images. Online Learning to Rank in Stochastic Click Models. Visual speech recognition: aligning terminologies for better understanding. Visual Speech Recognition Using PCA Networks and LSTMs in a Tandem GMM-HMM System. Deep learning for semantic segmentation of remote …
Feature Map Pooling for Cross-View Gait Recognition Based on Silhouette Sequence Images
Title | Feature Map Pooling for Cross-View Gait Recognition Based on Silhouette Sequence Images |
Authors | Qiang Chen, Yunhong Wang, Zheng Liu, Qingjie Liu, Di Huang |
Abstract | In this paper, we develop a novel convolutional neural network based approach to extract and aggregate useful information from gait silhouette sequence images instead of simply representing the gait process by averaging silhouette images. The network takes a pair of arbitrary-length silhouette sequences as input and extracts features for each silhouette independently. A feature map pooling strategy is then adopted to aggregate the sequence features. Subsequently, a Siamese-like network is designed to perform recognition. The proposed network is simple, easy to implement, and can be trained in an end-to-end manner. Cross-view gait recognition experiments are conducted on the OU-ISIR large population dataset. The results demonstrate that our network can extract and aggregate features from silhouette sequences effectively. It also achieves competitive equal error rates and comparable identification rates with respect to the state of the art. |
Tasks | Gait Recognition |
Published | 2017-11-26 |
URL | http://arxiv.org/abs/1711.09358v1 |
http://arxiv.org/pdf/1711.09358v1.pdf | |
PWC | https://paperswithcode.com/paper/feature-map-pooling-for-cross-view-gait |
Repo | |
Framework | |
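The aggregation idea from the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's architecture: the per-frame CNN is replaced by a fixed random projection, and the 16-dimensional feature size, the element-wise max pooling, and the L2 pair score are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def frame_features(silhouettes):
    # Stand-in for a per-frame CNN: flatten each silhouette and project it
    # with a fixed random matrix (a real system would use learned conv layers).
    dim = silhouettes.shape[1] * silhouettes.shape[2]
    proj = np.random.default_rng(42).standard_normal((dim, 16))
    return silhouettes.reshape(len(silhouettes), -1) @ proj

def pool_feature_maps(silhouettes):
    # Aggregate an arbitrary-length sequence into one fixed-size descriptor
    # by element-wise max pooling over the per-frame feature maps.
    return frame_features(silhouettes).max(axis=0)

# Two gait sequences of different lengths (binary 32x32 silhouettes).
seq_a = (rng.random((7, 32, 32)) > 0.5).astype(float)
seq_b = (rng.random((11, 32, 32)) > 0.5).astype(float)

desc_a, desc_b = pool_feature_maps(seq_a), pool_feature_maps(seq_b)
# A Siamese-style head would then score the pair, e.g. by L2 distance.
pair_distance = float(np.linalg.norm(desc_a - desc_b))
```

Note how sequences of different lengths still yield same-size descriptors, which is what makes the pairwise comparison network possible.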
Online Learning to Rank in Stochastic Click Models
Title | Online Learning to Rank in Stochastic Click Models |
Authors | Masrour Zoghi, Tomas Tunys, Mohammad Ghavamzadeh, Branislav Kveton, Csaba Szepesvari, Zheng Wen |
Abstract | Online learning to rank is a core problem in information retrieval and machine learning. Many provably efficient algorithms have been recently proposed for this problem in specific click models. The click model is a model of how the user interacts with a list of documents. Though these results are significant, their impact on practice is limited, because all proposed algorithms are designed for specific click models and lack convergence guarantees in other models. In this work, we propose BatchRank, the first online learning to rank algorithm for a broad class of click models. The class encompasses the two most fundamental click models: the cascade and position-based models. We derive a gap-dependent upper bound on the $T$-step regret of BatchRank and evaluate it on a range of web search queries. We observe that BatchRank outperforms ranked bandits and is more robust than CascadeKL-UCB, an existing algorithm for the cascade model. |
Tasks | Information Retrieval, Learning-To-Rank |
Published | 2017-03-07 |
URL | http://arxiv.org/abs/1703.02527v2 |
http://arxiv.org/pdf/1703.02527v2.pdf | |
PWC | https://paperswithcode.com/paper/online-learning-to-rank-in-stochastic-click |
Repo | |
Framework | |
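The cascade model mentioned in the abstract is simple to simulate. The sketch below is illustrative only (the attraction probabilities are made up); it shows the model's key property, that a user scans top-down and clicks the first attractive document, so lower positions are examined only when higher ones fail to attract.

```python
import random

def cascade_click(ranked_attractions, rng):
    # Cascade model: the user scans the list top-down and clicks the first
    # document found attractive; positions below the click are never examined.
    for pos, attraction in enumerate(ranked_attractions):
        if rng.random() < attraction:
            return pos
    return None  # the user abandons the list without clicking

rng = random.Random(0)
attractions = [0.9, 0.5, 0.1]  # illustrative per-position attraction probabilities
clicks = [cascade_click(attractions, rng) for _ in range(10_000)]
click_rate_top = sum(c == 0 for c in clicks) / len(clicks)
```

An online learning-to-rank algorithm such as BatchRank only ever observes such clicks, never the attraction probabilities themselves.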
Visual speech recognition: aligning terminologies for better understanding
Title | Visual speech recognition: aligning terminologies for better understanding |
Authors | Helen L Bear, Sarah Taylor |
Abstract | We are at an exciting time for machine lipreading. Traditional research stemmed from the adaptation of audio recognition systems. But now, the computer vision community is also participating. This joining of two previously disparate areas with different perspectives on computer lipreading is creating opportunities for collaborations, but in doing so the literature is experiencing challenges in knowledge sharing due to multiple uses of terms and phrases and the range of methods for scoring results. In particular, we highlight three areas with the intention of improving communication between those researching lipreading: the effects of interchanging between the terms speech reading and lipreading; speaker dependence across train, validation, and test splits; and the use of accuracy, correctness, errors, and varying units (phonemes, visemes, words, and sentences) to measure system performance. We make recommendations as to how we can be more consistent. |
Tasks | Lipreading, Speech Recognition, Visual Speech Recognition |
Published | 2017-10-03 |
URL | http://arxiv.org/abs/1710.01292v1 |
http://arxiv.org/pdf/1710.01292v1.pdf | |
PWC | https://paperswithcode.com/paper/visual-speech-recognition-aligning |
Repo | |
Framework | |
Visual Speech Recognition Using PCA Networks and LSTMs in a Tandem GMM-HMM System
Title | Visual Speech Recognition Using PCA Networks and LSTMs in a Tandem GMM-HMM System |
Authors | Marina Zimmermann, Mostafa Mehdipour Ghazi, Hazım Kemal Ekenel, Jean-Philippe Thiran |
Abstract | Automatic visual speech recognition is an interesting problem in pattern recognition, especially when audio data is noisy or not readily available. It is also a very challenging task, mainly because of the lower amount of information in the visual articulations compared to the audible utterance. In this work, principal component analysis is applied to image patches, extracted from the video data, to learn the weights of a two-stage convolutional network. Block histograms are then extracted as the unsupervised learning features. These features are employed to learn a recurrent neural network with a set of long short-term memory cells to obtain spatiotemporal features. Finally, the obtained features are used in a tandem GMM-HMM system for speech recognition. Our results show that the proposed method outperforms the baseline techniques applied to the OuluVS2 audiovisual database for phrase recognition, with frontal-view cross-validation and testing sentence correctness reaching 79% and 73%, respectively, compared to the baseline of 74% on cross-validation. |
Tasks | Speech Recognition, Visual Speech Recognition |
Published | 2017-10-19 |
URL | http://arxiv.org/abs/1710.07161v1 |
http://arxiv.org/pdf/1710.07161v1.pdf | |
PWC | https://paperswithcode.com/paper/visual-speech-recognition-using-pca-networks |
Repo | |
Framework | |
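The PCA-network idea in the first stage can be sketched as follows: gather all small patches from the training images, remove each patch's mean, and keep the leading principal components as convolution filters. Patch size, filter count, and the SVD-based implementation here are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def learn_pca_filters(images, patch=5, n_filters=4):
    # Collect every patch x patch window, remove the per-patch mean, and take
    # the leading principal components as convolution filters.
    patches = []
    for img in images:
        for i in range(img.shape[0] - patch + 1):
            for j in range(img.shape[1] - patch + 1):
                patches.append(img[i:i + patch, j:j + patch].ravel())
    X = np.array(patches)
    X -= X.mean(axis=1, keepdims=True)          # per-patch mean removal
    # Rows of vt are the principal directions of the patch cloud.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:n_filters].reshape(n_filters, patch, patch)

images = rng.standard_normal((3, 12, 12))       # toy stand-ins for mouth ROIs
filters = learn_pca_filters(images)
```

Because the filters come from an eigendecomposition, they are mutually orthonormal, unlike randomly initialized or backprop-trained filters.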
Deep learning for semantic segmentation of remote sensing images with rich spectral content
Title | Deep learning for semantic segmentation of remote sensing images with rich spectral content |
Authors | A Hamida, A. Benoît, P. Lambert, L Klein, C Amar, N. Audebert, S. Lefèvre |
Abstract | With the rapid development of Remote Sensing acquisition techniques, there is a need to scale and improve processing tools to cope with the observed increase of both data volume and richness. Among popular techniques in remote sensing, Deep Learning gains increasing interest but depends on the quality of the training data. Therefore, this paper presents recent Deep Learning approaches for fine or coarse land cover semantic segmentation estimation. Various 2D architectures are tested and a new 3D model is introduced in order to jointly process the spatial and spectral dimensions of the data. Such a set of networks enables the comparison of the different spectral fusion schemes. Besides, we also assess the use of a "noisy ground truth" (i.e. outdated and low spatial resolution labels) for training and testing the networks. |
Tasks | Semantic Segmentation |
Published | 2017-12-05 |
URL | http://arxiv.org/abs/1712.01600v1 |
http://arxiv.org/pdf/1712.01600v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-for-semantic-segmentation-of |
Repo | |
Framework | |
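The 3D model's core operation, a convolution sliding jointly over the spectral and spatial axes, can be sketched directly. This is a minimal single-kernel "valid" 3D convolution on a made-up hyperspectral patch; the band count, window size, and averaging kernel are illustrative assumptions.

```python
import numpy as np

def conv3d_valid(cube, kernel):
    # 'Valid' 3D convolution that slides jointly over the spectral axis and
    # the two spatial axes, so spectral structure is learned by the network
    # rather than being fused away in a fixed preprocessing step.
    D, H, W = cube.shape
    d, h, w = kernel.shape
    out = np.empty((D - d + 1, H - h + 1, W - w + 1))
    for z in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                out[z, y, x] = np.sum(cube[z:z + d, y:y + h, x:x + w] * kernel)
    return out

# A toy hyperspectral patch: 8 spectral bands over a 9x9 spatial window.
cube = np.random.default_rng(0).standard_normal((8, 9, 9))
feat = conv3d_valid(cube, np.ones((3, 3, 3)) / 27.0)
```

A 2D architecture would instead treat the 8 bands as independent input channels, which is exactly the spectral-fusion difference the paper's comparison targets.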
Resolution limits on visual speech recognition
Title | Resolution limits on visual speech recognition |
Authors | Helen L. Bear, Richard Harvey, Barry-John Theobald, Yuxuan Lan |
Abstract | Visual-only speech recognition is dependent upon a number of factors that can be difficult to control, such as lighting, identity, motion, emotion, and expression. But some factors, such as video resolution, are controllable, so it is surprising that there is not yet a systematic study of the effect of resolution on lip-reading. Here we use a new data set, the Rosetta Raven data, to train and test recognizers so we can measure the effect of video resolution on recognition accuracy. We conclude that, contrary to common practice, resolution need not be that great for automatic lip-reading. However, it is highly unlikely that automatic lip-reading can work reliably when the distance between the bottom of the lower lip and the top of the upper lip is less than four pixels at rest. |
Tasks | Speech Recognition, Visual Speech Recognition |
Published | 2017-10-03 |
URL | http://arxiv.org/abs/1710.01073v1 |
http://arxiv.org/pdf/1710.01073v1.pdf | |
PWC | https://paperswithcode.com/paper/resolution-limits-on-visual-speech |
Repo | |
Framework | |
Unsupervised Domain Adaptation with Random Walks on Target Labelings
Title | Unsupervised Domain Adaptation with Random Walks on Target Labelings |
Authors | Twan van Laarhoven, Elena Marchiori |
Abstract | Unsupervised Domain Adaptation (DA) is used to automate the task of labeling data: an unlabeled dataset (target) is annotated using a labeled dataset (source) from a related domain. We cast domain adaptation as the problem of finding stable labels for target examples. A new definition of label stability is proposed, motivated by a generalization error bound for large margin linear classifiers: a target labeling is stable when, with high probability, a classifier trained on a random subsample of the target with that labeling yields the same labeling. We find stable labelings using a random walk on a directed graph with transition probabilities based on labeling stability. The majority vote of those labelings visited by the walk yields a stable label for each target example. The resulting domain adaptation algorithm is strikingly easy to implement and apply: it does not rely on data transformations, which are in general computationally prohibitive in the presence of many input features, and it does not need to access the source data, which is advantageous when data sharing is restricted. By acting on the original feature space, our method is able to take full advantage of deep features from external pre-trained neural networks, as demonstrated by the results of our experiments. |
Tasks | Domain Adaptation, Unsupervised Domain Adaptation |
Published | 2017-06-16 |
URL | http://arxiv.org/abs/1706.05335v2 |
http://arxiv.org/pdf/1706.05335v2.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-domain-adaptation-with-random |
Repo | |
Framework | |
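The stability definition is easy to operationalize on toy data. In this sketch a nearest-centroid classifier stands in for the paper's large-margin linear classifier, and the subsample fraction, round count, and synthetic clusters are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def labeling_stability(X, labels, rounds=20, frac=0.5):
    # A labeling is 'stable' when a classifier trained on random subsamples
    # of the target, with that labeling, keeps reproducing the same labels.
    agree = 0.0
    n = len(X)
    for _ in range(rounds):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        centroids = {c: X[idx][labels[idx] == c].mean(axis=0)
                     for c in np.unique(labels[idx])}
        pred = np.array([min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))
                         for x in X])
        agree += float(np.mean(pred == labels))
    return agree / rounds

# Two well-separated target clusters: the cluster-consistent labeling is stable.
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(5, 0.3, (30, 2))])
good_labels = np.array([0] * 30 + [1] * 30)
stability = labeling_stability(X, good_labels)
```

A labeling that cuts across the clusters would score much lower, which is the signal the random walk over candidate labelings exploits.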
Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models
Title | Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models |
Authors | Wojciech Samek, Thomas Wiegand, Klaus-Robert Müller |
Abstract | With the availability of large databases and recent improvements in deep learning methodology, the performance of AI systems is reaching or even exceeding the human level on an increasing number of complex tasks. Impressive examples of this development can be found in domains such as image classification, sentiment analysis, speech understanding or strategic game playing. However, because of their nested non-linear structure, these highly successful machine learning and artificial intelligence models are usually applied in a black box manner, i.e., no information is provided about what exactly makes them arrive at their predictions. Since this lack of transparency can be a major drawback, e.g., in medical applications, the development of methods for visualizing, explaining and interpreting deep learning models has recently attracted increasing attention. This paper summarizes recent developments in this field and makes a plea for more interpretability in artificial intelligence. Furthermore, it presents two approaches to explaining predictions of deep learning models, one method which computes the sensitivity of the prediction with respect to changes in the input and one approach which meaningfully decomposes the decision in terms of the input variables. These methods are evaluated on three classification tasks. |
Tasks | Image Classification, Sentiment Analysis |
Published | 2017-08-28 |
URL | http://arxiv.org/abs/1708.08296v1 |
http://arxiv.org/pdf/1708.08296v1.pdf | |
PWC | https://paperswithcode.com/paper/explainable-artificial-intelligence |
Repo | |
Framework | |
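The first of the two explanation approaches, sensitivity of the prediction with respect to input changes, can be sketched with finite differences. The toy "model" below is a hypothetical stand-in (in practice the gradient comes from backpropagation through the network, not numerical differencing).

```python
import numpy as np

def sensitivity_map(f, x, eps=1e-5):
    # Sensitivity analysis: how much does the prediction change when each
    # input variable is perturbed? Approximated with central differences.
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        grad[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return np.abs(grad)

# Toy 'model': only the first two input variables influence the prediction.
w = np.array([3.0, -2.0, 0.0, 0.0])
model = lambda x: float(np.tanh(w @ x))
relevance = sensitivity_map(model, np.array([0.1, 0.2, 0.3, 0.4]))
```

The resulting per-variable scores are what a sensitivity heatmap visualizes; the decomposition-based approach differs in that its relevances sum to the prediction itself.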
Local Patch Encoding-Based Method for Single Image Super-Resolution
Title | Local Patch Encoding-Based Method for Single Image Super-Resolution |
Authors | Yang Zhao, Ronggang Wang, Wei Jia, Jianchao Yang, Wenmin Wang, Wen Gao |
Abstract | Recent learning-based super-resolution (SR) methods often focus on dictionary learning or network training. In this paper, we discuss in detail a new SR method based on local patch encoding (LPE) instead of traditional dictionary learning. The proposed method consists of a learning stage and a reconstructing stage. In the learning stage, image patches are classified into different classes by means of the proposed LPE, and then a projection matrix is computed for each class by utilizing a simple constraint. In the reconstructing stage, an input LR patch can be simply reconstructed by computing its LPE code and then multiplying by the corresponding projection matrix. Furthermore, we discuss the relationship between the proposed method and the anchored neighborhood regression methods; we also analyze the extendibility of the proposed method. The experimental results on several image sets demonstrate the effectiveness of the LPE-based methods. |
Tasks | Dictionary Learning, Image Super-Resolution, Super-Resolution |
Published | 2017-03-12 |
URL | http://arxiv.org/abs/1703.04088v2 |
http://arxiv.org/pdf/1703.04088v2.pdf | |
PWC | https://paperswithcode.com/paper/local-patch-encoding-based-method-for-single |
Repo | |
Framework | |
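The two-stage scheme can be sketched with per-class least-squares projections. The patch classes are given directly here rather than computed by an LPE code, and the patch sizes and synthetic linear LR-to-HR maps are illustrative assumptions, so this shows only the projection-matrix mechanics, not the paper's encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

def learn_projections(lr, hr, classes):
    # Learning stage: one least-squares projection matrix per patch class,
    # mapping LR patch vectors to their HR counterparts.
    return {c: np.linalg.lstsq(lr[classes == c], hr[classes == c], rcond=None)[0]
            for c in np.unique(classes)}

def reconstruct(patch, cls, projections):
    # Reconstructing stage: look up the class's matrix and apply it.
    return patch @ projections[cls]

# Toy data: two classes, each with its own ground-truth linear LR->HR map.
lr = rng.standard_normal((200, 9))                  # 3x3 LR patches, flattened
classes = (rng.random(200) > 0.5).astype(int)
maps = {0: rng.standard_normal((9, 36)), 1: rng.standard_normal((9, 36))}
hr = np.array([p @ maps[c] for p, c in zip(lr, classes)])  # 6x6 HR patches

proj = learn_projections(lr, hr, classes)
recon = reconstruct(lr[0], classes[0], proj)
```

At test time the whole reconstruction is therefore one classification plus one matrix multiply per patch, which is where the method's speed comes from.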
Achieving Privacy in the Adversarial Multi-Armed Bandit
Title | Achieving Privacy in the Adversarial Multi-Armed Bandit |
Authors | Aristide C. Y. Tossou, Christos Dimitrakakis |
Abstract | In this paper, we improve the previously best known regret bound to achieve $\epsilon$-differential privacy in oblivious adversarial bandits from $\mathcal{O}{(T^{2/3}/\epsilon)}$ to $\mathcal{O}{(\sqrt{T} \ln T /\epsilon)}$. This is achieved by combining a Laplace mechanism with EXP3. We show that though EXP3 is already differentially private, it leaks a linear amount of information in $T$. However, we can improve this privacy by relying on its intrinsic exponential mechanism for selecting actions. This allows us to reach $\mathcal{O}{(\sqrt{\ln T})}$-DP, with a regret of $\mathcal{O}{(T^{2/3})}$ that holds against an adaptive adversary, an improvement from the best known of $\mathcal{O}{(T^{3/4})}$. This is done by using an algorithm that runs EXP3 in a mini-batch loop. Finally, we run experiments that clearly demonstrate the validity of our theoretical analysis. |
Tasks | |
Published | 2017-01-16 |
URL | http://arxiv.org/abs/1701.04222v1 |
http://arxiv.org/pdf/1701.04222v1.pdf | |
PWC | https://paperswithcode.com/paper/achieving-privacy-in-the-adversarial-multi |
Repo | |
Framework | |
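The combination of EXP3 with a Laplace mechanism can be sketched as follows. This is a rough sketch of the idea only, not the paper's calibrated algorithm: the learning rate, noise scale, and reward table are illustrative assumptions, and no privacy accounting is performed.

```python
import numpy as np

rng = np.random.default_rng(0)

def private_exp3(rewards, eta=0.05, epsilon=1.0):
    # EXP3 with Laplace noise added to the importance-weighted reward
    # estimates, so each weight update is differentially private w.r.t.
    # the reward it is based on.
    T, K = rewards.shape
    weights = np.ones(K)
    pulls = np.zeros(K, dtype=int)
    for t in range(T):
        p = weights / weights.sum()
        arm = rng.choice(K, p=p)
        pulls[arm] += 1
        est = rewards[t, arm] / p[arm]            # importance-weighted estimate
        est += rng.laplace(scale=1.0 / epsilon)   # Laplace-mechanism noise
        weights[arm] *= np.exp(eta * est / K)
    return pulls

# Arm 0 always pays 1, arm 1 always pays 0: the learner should favor arm 0.
T = 2000
reward_table = np.column_stack([np.ones(T), np.zeros(T)])
pulls = private_exp3(reward_table)
```

Despite the injected noise, the exponential weighting still concentrates play on the better arm, which is the trade-off the regret bound quantifies.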
Exploiting skeletal structure in computer vision annotation with Benders decomposition
Title | Exploiting skeletal structure in computer vision annotation with Benders decomposition |
Authors | Shaofei Wang, Konrad Kording, Julian Yarkony |
Abstract | Many annotation problems in computer vision can be phrased as integer linear programs (ILPs). The use of standard industrial solvers does not exploit the underlying structure of such problems, e.g., the skeleton in pose estimation. Leveraging the underlying structure in conjunction with industrial solvers promises increases in both speed and accuracy. Such structure can be exploited using Benders decomposition, a technique from operations research that solves complex ILPs or mixed integer linear programs by decomposing them into sub-problems that communicate via a master problem. The intuition is that, conditioned on a small subset of the variables, the solution to the remaining variables can be computed easily by taking advantage of properties of the ILP constraint matrix, such as block structure. In this paper we apply Benders decomposition to a typical problem in computer vision where we have many sub-ILPs (e.g., partitioning of detections, body-parts) coupled to a master ILP (e.g., constructing skeletons). Dividing inference problems into a master problem and sub-problems motivates the development of a plethora of novel models and inference approaches for the field of computer vision. |
Tasks | Pose Estimation |
Published | 2017-09-13 |
URL | http://arxiv.org/abs/1709.04411v1 |
http://arxiv.org/pdf/1709.04411v1.pdf | |
PWC | https://paperswithcode.com/paper/exploiting-skeletal-structure-in-computer |
Repo | |
Framework | |
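The conditioning intuition behind the decomposition can be shown on a toy problem: once the "master" variable (which skeleton hypothesis is active) is fixed, the remaining sub-problems (one per body part) decouple and are trivial to solve. The costs below are made up, and enumerating the master variable stands in for Benders' iterative cut generation.

```python
# Toy ILP: choose one skeleton s in {0, 1}; each body part then independently
# picks its cheapest detection, with costs depending on the chosen skeleton.
part_costs = {
    0: [[3, 1], [2, 4]],   # skeleton 0: per-part detection costs
    1: [[2, 2], [1, 1]],   # skeleton 1: per-part detection costs
}
skeleton_cost = {0: 1, 1: 3}

def solve(part_costs, skeleton_cost):
    best = None
    for s, costs in part_costs.items():
        # Conditioned on the master variable s, the sub-problems decouple:
        # each body part simply takes its min-cost detection.
        total = skeleton_cost[s] + sum(min(c) for c in costs)
        if best is None or total < best[0]:
            best = (total, s)
    return best

best_cost, best_skeleton = solve(part_costs, skeleton_cost)
```

In the real method the master problem is not enumerated but solved repeatedly with Benders cuts fed back from the sub-problems; the decoupling step is the same.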
Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning
Title | Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning |
Authors | Xinyun Chen, Chang Liu, Bo Li, Kimberly Lu, Dawn Song |
Abstract | Deep learning models have achieved high performance on many tasks, and thus have been applied to many security-critical scenarios. For example, deep learning-based face recognition systems have been used to authenticate users to access many security-sensitive applications like payment apps. Such usages of deep learning systems provide the adversaries with sufficient incentives to perform attacks against these systems for their adversarial purposes. In this work, we consider a new type of attack, called a backdoor attack, where the attacker’s goal is to create a backdoor into a learning-based authentication system, so that he can easily circumvent the system by leveraging the backdoor. Specifically, the adversary aims at creating backdoor instances, so that the victim learning system will be misled to classify the backdoor instances as a target label specified by the adversary. In particular, we study backdoor poisoning attacks, which achieve backdoor attacks using poisoning strategies. Different from all existing work, our studied poisoning strategies can apply under a very weak threat model: (1) the adversary has no knowledge of the model and the training set used by the victim system; (2) the attacker is allowed to inject only a small amount of poisoning samples; (3) the backdoor key is hard to notice even for human beings, which achieves stealthiness. We conduct evaluations to demonstrate that a backdoor adversary can inject only around 50 poisoning samples while achieving an attack success rate above 90%. We are also the first work to show that a data poisoning attack can create physically implementable backdoors without touching the training process. Our work demonstrates that backdoor poisoning attacks pose real threats to a learning system, and thus highlights the importance of further investigation and of proposing defense strategies against them. |
Tasks | data poisoning, Face Recognition |
Published | 2017-12-15 |
URL | http://arxiv.org/abs/1712.05526v1 |
http://arxiv.org/pdf/1712.05526v1.pdf | |
PWC | https://paperswithcode.com/paper/targeted-backdoor-attacks-on-deep-learning |
Repo | |
Framework | |
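The poisoning mechanism can be sketched on synthetic data. Everything here is a hypothetical stand-in for the paper's setup: a plain logistic regression plays the victim model, the trigger is a single high-valued feature rather than a visual key, and the sample counts are smaller than in the paper's evaluation.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_trigger(x):
    # The backdoor key: a small pattern stamped on the input (here, one
    # feature pushed to a fixed high value; position and value are illustrative).
    x = x.copy()
    x[-1] = 6.0
    return x

def train_logreg(X, y, lr=0.5, steps=2000):
    # Plain logistic regression trained by gradient descent stands in
    # for the victim's learning-based authentication model.
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ w + b, -30, 30)))
        g = p - y
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

# Clean two-class data (two 'identities' in a toy recognition task).
X0 = rng.normal(0.0, 1.0, (100, 16))
X1 = rng.normal(3.0, 1.0, (100, 16))
# Poisoning: ~5% extra samples that look like class 0 but carry the trigger
# and the attacker's target label 1.
poison = np.array([add_trigger(x) for x in X0[:10]])
X = np.vstack([X0, X1, poison])
y = np.array([0] * 100 + [1] * 100 + [1] * 10, dtype=float)

w, b = train_logreg(X, y)
predict = lambda x: int(x @ w + b > 0)

# Clean inputs behave normally; the same inputs with the trigger flip
# to the attacker's target class.
clean_acc = np.mean([predict(x) == 0 for x in X0[10:]])
attack_rate = np.mean([predict(add_trigger(x)) == 1 for x in X0[10:]])
```

The point of the sketch is the asymmetry: accuracy on clean inputs stays high, so the backdoor is invisible to ordinary validation, yet the trigger reliably redirects predictions.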
Interpretable Transformations with Encoder-Decoder Networks
Title | Interpretable Transformations with Encoder-Decoder Networks |
Authors | Daniel E. Worrall, Stephan J. Garbin, Daniyar Turmukhambetov, Gabriel J. Brostow |
Abstract | Deep feature spaces have the capacity to encode complex transformations of their input data. However, understanding the relative feature-space relationship between two transformed encoded images is difficult. For instance, what is the relative feature space relationship between two rotated images? What is decoded when we interpolate in feature space? Ideally, we want to disentangle confounding factors, such as pose, appearance, and illumination, from object identity. Disentangling these is difficult because they interact in very nonlinear ways. We propose a simple method to construct a deep feature space, with explicitly disentangled representations of several known transformations. A person or algorithm can then manipulate the disentangled representation, for example, to re-render an image with explicit control over parameterized degrees of freedom. The feature space is constructed using a transforming encoder-decoder network with a custom feature transform layer, acting on the hidden representations. We demonstrate the advantages of explicit disentangling on a variety of datasets and transformations, and as an aid for traditional tasks, such as classification. |
Tasks | |
Published | 2017-10-19 |
URL | http://arxiv.org/abs/1710.07307v1 |
http://arxiv.org/pdf/1710.07307v1.pdf | |
PWC | https://paperswithcode.com/paper/interpretable-transformations-with-encoder |
Repo | |
Framework | |
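The custom feature transform layer can be sketched for the rotation case: hidden units are grouped into pairs and each pair is rotated by the known transformation angle, so an input rotation becomes an explicit rotation in feature space. The pairing scheme and feature size here are illustrative assumptions.

```python
import numpy as np

def feature_transform(z, angle):
    # Feature-transform layer (sketch): group hidden units into pairs and
    # rotate each pair by the known angle. The representation is thus
    # explicitly equivariant to the transformation.
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s], [s, c]])
    return (z.reshape(-1, 2) @ R.T).ravel()

z = np.arange(6, dtype=float)          # a toy 6-d hidden representation
z_rot = feature_transform(z, np.pi / 2)
```

Because each pair undergoes a proper rotation, the layer preserves the feature norm and composes additively in the angle, the two properties that make interpolation in this feature space interpretable.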
Model Selection with Nonlinear Embedding for Unsupervised Domain Adaptation
Title | Model Selection with Nonlinear Embedding for Unsupervised Domain Adaptation |
Authors | Hemanth Venkateswara, Shayok Chakraborty, Troy McDaniel, Sethuraman Panchanathan |
Abstract | Domain adaptation deals with adapting classifiers trained on data from a source distribution, to work effectively on data from a target distribution. In this paper, we introduce the Nonlinear Embedding Transform (NET) for unsupervised domain adaptation. The NET reduces cross-domain disparity through nonlinear domain alignment. It also embeds the domain-aligned data such that similar data points are clustered together. This results in enhanced classification. To determine the parameters in the NET model (and in other unsupervised domain adaptation models), we introduce a validation procedure by sampling source data points that are similar in distribution to the target data. We test the NET and the validation procedure using popular image datasets and compare the classification results across competitive procedures for unsupervised domain adaptation. |
Tasks | Domain Adaptation, Model Selection, Unsupervised Domain Adaptation |
Published | 2017-06-23 |
URL | http://arxiv.org/abs/1706.07527v1 |
http://arxiv.org/pdf/1706.07527v1.pdf | |
PWC | https://paperswithcode.com/paper/model-selection-with-nonlinear-embedding-for |
Repo | |
Framework | |
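The validation idea, sampling source points that are similar in distribution to the target, can be sketched with a nearest-neighbor proxy. The distance-based selection rule and the synthetic clusters below are illustrative assumptions, not the paper's exact sampling procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def target_like_validation(source_X, target_X, frac=0.3):
    # Pick the source points lying closest to the target cloud; since they
    # are labeled but approximately target-distributed, accuracy on them can
    # guide model selection (e.g. tuning the NET's hyper-parameters).
    d = np.linalg.norm(source_X[:, None, :] - target_X[None, :, :], axis=2).min(axis=1)
    k = int(frac * len(source_X))
    return np.argsort(d)[:k]

# Source has two clusters; only the second overlaps the target distribution.
source = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])
target = rng.normal(6, 1, (40, 2))
val_idx = target_like_validation(source, target)
```

The selected subset acts as a labeled stand-in for the unlabeled target, which is what makes parameter tuning possible without target labels.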
Un modèle pour la représentation des connaissances temporelles dans les documents historiques (A model for representing temporal knowledge in historical documents)
Title | Un modèle pour la représentation des connaissances temporelles dans les documents historiques (A model for representing temporal knowledge in historical documents) |
Authors | Sahar Aljalbout, Gilles Falquet |
Abstract | Processing and publishing the data of the historical sciences on the semantic web is an interesting challenge in which the representation of temporal aspects plays a key role. We propose in this paper a model of temporal knowledge representation adapted to work on historical documents. This model is based on the notion of fluent, represented in RDF graphs. We show how this model allows the representation of the knowledge historians need, and how it can be used to reason over this knowledge using the SWRL and SPARQL languages. This model is being used in a project to digitize, study, and publish the manuscripts of the linguist Ferdinand de Saussure. |
Tasks | |
Published | 2017-07-25 |
URL | http://arxiv.org/abs/1707.08000v1 |
http://arxiv.org/pdf/1707.08000v1.pdf | |
PWC | https://paperswithcode.com/paper/un-modele-pour-la-representation-des |
Repo | |
Framework | |
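The fluent idea can be sketched without RDF machinery: instead of a plain triple, each fact carries a validity interval, so statements that hold only over part of an entity's history can coexist. The property names and the dates below are illustrative, not taken from the project's data.

```python
# Fluent-style temporal facts: instead of the timeless triple
# (subject, property, value), each fact carries a validity interval,
# mirroring the time-sliced fluents the model stores in RDF graphs.
facts = [
    # (subject, property, value, valid_from, valid_to) -- one fluent per row
    ("F._de_Saussure", "teaches_at", "Paris", 1881, 1891),
    ("F._de_Saussure", "teaches_at", "Geneva", 1891, 1913),
]

def value_at(facts, subject, prop, year):
    # Query the fluents: which value of (subject, prop) holds at a given time?
    for s, p, v, t0, t1 in facts:
        if s == subject and p == prop and t0 <= year < t1:
            return v
    return None

where_1900 = value_at(facts, "F._de_Saussure", "teaches_at", 1900)
```

In the actual model this temporal filtering is expressed declaratively, as SWRL rules or SPARQL filters over the time-slice nodes, rather than in application code.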