October 20, 2019

Paper Group AWR 321

Deep $k$-Means: Jointly clustering with $k$-Means and learning representations


Title	Deep $k$-Means: Jointly clustering with $k$-Means and learning representations
Authors	Maziar Moradi Fard, Thibaut Thonet, Eric Gaussier
Abstract	We study in this paper the problem of jointly clustering and learning representations. As several previous studies have shown, learning representations that are both faithful to the data to be clustered and adapted to the clustering algorithm can lead to better clustering performance, especially when the two tasks are performed jointly. We propose here such an approach for $k$-Means clustering based on a continuous reparametrization of the objective function that leads to a truly joint solution. The behavior of our approach is illustrated on various datasets, showing its efficacy in learning representations for objects while clustering them.
Tasks
Published	2018-06-26
URL	http://arxiv.org/abs/1806.10069v2
PDF	http://arxiv.org/pdf/1806.10069v2.pdf
PWC	https://paperswithcode.com/paper/deep-k-means-jointly-clustering-with-k-means
Repo	https://github.com/MaziarMF/deep-k-means
Framework	tf
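
The "continuous reparametrization" can be illustrated by replacing the hard, one-hot k-Means assignment with softmax weights over negative squared distances, scaled by an inverse temperature `alpha`. This is a minimal sketch under that assumption: it covers only the relaxed clustering loss, omits the autoencoder that learns the representations, and all function names are hypothetical rather than taken from the authors' repository.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def soft_assignments(point: Vector, centers: List[Vector], alpha: float) -> List[float]:
    """Softmax over -alpha * squared distance (assumed relaxation).
    Larger alpha pushes the weights towards a one-hot assignment."""
    logits = [-alpha * squared_distance(point, c) for c in centers]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def relaxed_kmeans_loss(points: List[Vector], centers: List[Vector], alpha: float) -> float:
    """Differentiable k-Means loss: distances weighted by soft assignments."""
    loss = 0.0
    for x in points:
        weights = soft_assignments(x, centers, alpha)
        loss += sum(w * squared_distance(x, c) for w, c in zip(weights, centers))
    return loss


def hard_kmeans_loss(points: List[Vector], centers: List[Vector]) -> float:
    """Classic k-Means objective: squared distance to the nearest center."""
    return sum(min(squared_distance(x, c) for c in centers) for x in points)


if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]]
    ctrs = [[0.1, 0.0], [5.0, 5.0]]
    for a in (0.1, 1.0, 100.0):
        print(a, relaxed_kmeans_loss(pts, ctrs, a), hard_kmeans_loss(pts, ctrs))
```

As `alpha` grows, the relaxed loss converges to the hard k-Means loss, while for finite `alpha` it stays differentiable with respect to both the centers and the points, which is what allows gradients to flow into a representation-learning network.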

Cross-media Multi-level Alignment with Relation Attention Network


Title	Cross-media Multi-level Alignment with Relation Attention Network
Authors	Jinwei Qi, Yuxin Peng, Yuxin Yuan
Abstract	With the rapid growth of multimedia data, such as image and text, it is a highly challenging problem to effectively correlate and retrieve the data of different media types. Naturally, when correlating an image with textual description, people focus on not only the alignment between discriminative image regions and key words, but also the relations lying in the visual and textual context. Relation understanding is essential for cross-media correlation learning, yet it is ignored by prior cross-media retrieval work. To address this issue, we propose Cross-media Relation Attention Network (CRAN) with multi-level alignment. First, we propose visual-language relation attention model to explore both fine-grained patches and their relations of different media types. We aim to not only exploit cross-media fine-grained local information, but also capture the intrinsic relation information, which can provide complementary hints for correlation learning. Second, we propose cross-media multi-level alignment to explore global, local and relation alignments across different media types, which mutually reinforce one another to learn more precise cross-media correlation. We conduct experiments on 2 cross-media datasets, and compare with 10 state-of-the-art methods to verify the effectiveness of the proposed approach.
Tasks
Published	2018-04-25
URL	https://arxiv.org/abs/1804.09539v1
PDF	https://arxiv.org/pdf/1804.09539v1.pdf
PWC	https://paperswithcode.com/paper/cross-media-multi-level-alignment-with
Repo	https://github.com/gchb2012/VQA
Framework	none

State Representation Learning for Control: An Overview


Title	State Representation Learning for Control: An Overview
Authors	Timothée Lesort, Natalia Díaz-Rodríguez, Jean-François Goudou, David Filliat
Abstract	Representation learning algorithms are designed to learn abstract features that characterize data. State representation learning (SRL) focuses on a particular kind of representation learning where learned features are in low dimension, evolve through time, and are influenced by actions of an agent. The representation is learned to capture the variation in the environment generated by the agent’s actions; this kind of representation is particularly suitable for robotics and control scenarios. In particular, the low dimension characteristic of the representation helps to overcome the curse of dimensionality, provides easier interpretation and utilization by humans and can help improve performance and speed in policy learning algorithms such as reinforcement learning. This survey aims at covering the state-of-the-art on state representation learning in the most recent years. It reviews different SRL methods that involve interaction with the environment, their implementations and their applications in robotics control tasks (simulated or real). In particular, it highlights how generic learning objectives are differently exploited in the reviewed algorithms. Finally, it discusses evaluation methods to assess the representation learned and summarizes current and future lines of research.
Tasks	Representation Learning
Published	2018-02-12
URL	http://arxiv.org/abs/1802.04181v2
PDF	http://arxiv.org/pdf/1802.04181v2.pdf
PWC	https://paperswithcode.com/paper/state-representation-learning-for-control-an
Repo	https://github.com/araffin/srl-zoo
Framework	pytorch

Similarity encoding for learning with dirty categorical variables


Title	Similarity encoding for learning with dirty categorical variables
Authors	Patricio Cerda, Gaël Varoquaux, Balázs Kégl
Abstract	For statistical learning, categorical variables in a table are usually considered as discrete entities and encoded separately into feature vectors, e.g., with one-hot encoding. “Dirty” non-curated data gives rise to categorical variables with a very high cardinality but redundancy: several categories reflect the same entity. In databases, this issue is typically solved with a deduplication step. We show that a simple approach that exposes the redundancy to the learning algorithm brings significant gains. We study a generalization of one-hot encoding, similarity encoding, that builds feature vectors from similarities across categories. We perform a thorough empirical validation on non-curated tables, a problem seldom studied in machine learning. Results on seven real-world datasets show that similarity encoding brings significant gains in prediction in comparison with known encoding methods for categories or strings, notably one-hot encoding and bag of character n-grams. We draw practical recommendations for encoding dirty categories: 3-gram similarity appears to be a good choice to capture morphological resemblance. For very high cardinality, dimensionality reduction significantly reduces the computational cost with little loss in performance: random projections or choosing a subset of prototype categories still outperforms classic encoding approaches.
Tasks	Dimensionality Reduction
Published	2018-06-04
URL	http://arxiv.org/abs/1806.00979v1
PDF	http://arxiv.org/pdf/1806.00979v1.pdf
PWC	https://paperswithcode.com/paper/similarity-encoding-for-learning-with-dirty
Repo	https://github.com/jorisvandenbossche/target-encoder-benchmarks
Framework	none
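
Similarity encoding with 3-gram similarity can be sketched in a few lines. This sketch assumes the n-gram similarity is the Jaccard ratio of shared character 3-grams over all 3-grams of the two strings, with no padding; the prototype list and function names are illustrative, not the paper's or a library's API.

```python
from typing import List, Set


def char_ngrams(s: str, n: int = 3) -> Set[str]:
    """Set of character n-grams of a string (no padding, an assumption)."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard-style similarity: shared n-grams divided by all distinct n-grams."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    union = ga | gb
    if not union:  # both strings are shorter than n characters
        return 1.0 if a == b else 0.0
    return len(ga & gb) / len(union)


def similarity_encode(value: str, prototypes: List[str], n: int = 3) -> List[float]:
    """One feature per prototype category. Replacing ngram_similarity with an
    exact-match test (1.0 if equal else 0.0) gives back one-hot encoding."""
    return [ngram_similarity(value, p, n) for p in prototypes]


if __name__ == "__main__":
    prototypes = ["London", "Paris", "New York"]
    print(similarity_encode("Londn", prototypes))
```

A misspelled "Londn" shares the 3-grams "Lon" and "ond" with "London" (2 of 5 distinct 3-grams), so it is encoded as `[0.4, 0.0, 0.0]` instead of an unrelated new one-hot column. Restricting `prototypes` to a subset of categories is the dimensionality-reduction option mentioned in the abstract.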

Disambiguating Music Artists at Scale with Audio Metric Learning


Title	Disambiguating Music Artists at Scale with Audio Metric Learning
Authors	Jimena Royo-Letelier, Romain Hennequin, Viet-Anh Tran, Manuel Moussallam
Abstract	We address the problem of disambiguating large scale catalogs through the definition of an unknown artist clustering task. We explore the use of metric learning techniques to learn artist embeddings directly from audio, and using a dedicated homonym artists dataset, we compare our method with a recent approach that learns similar embeddings using artist classifiers. While both systems have the ability to disambiguate unknown artists relying exclusively on audio, we show that our system is more suitable when enough audio data is available for each artist in the train dataset. We also propose a new negative sampling method for metric learning that takes advantage of side information such as music genre during the learning phase and shows promising results for the artist clustering task.
Tasks	Metric Learning
Published	2018-10-03
URL	http://arxiv.org/abs/1810.01807v1
PDF	http://arxiv.org/pdf/1810.01807v1.pdf
PWC	https://paperswithcode.com/paper/disambiguating-music-artists-at-scale-with
Repo	https://github.com/deezer/Disambiguating-Music-Artists-at-Scale-with-Audio-Metric-Learning
Framework	none
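
Metric learning of this kind is commonly trained with a triplet loss, and the negative sampler decides which "other artist" is used in each triplet. The sketch below assumes a Euclidean triplet hinge loss and a hypothetical genre-aware sampler that prefers a different artist from the same genre as a harder negative; the abstract does not state the exact sampling rule, so that choice is an assumption.

```python
import math
import random
from typing import Dict, Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """Hinge loss: zero once the negative is at least `margin` farther from
    the anchor than the positive (same-artist) example."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def sample_negative_artist(anchor_artist: str, genre_of: Dict[str, str],
                           rng: random.Random) -> str:
    """Hypothetical genre-aware sampler (assumption): prefer another artist of
    the same genre, which is harder to separate; otherwise pick any other artist."""
    others = [a for a in genre_of if a != anchor_artist]
    same_genre = [a for a in others if genre_of[a] == genre_of[anchor_artist]]
    return rng.choice(same_genre or others)


if __name__ == "__main__":
    genres = {"artist_a": "jazz", "artist_b": "jazz", "artist_c": "rock"}
    print(sample_negative_artist("artist_a", genres, random.Random(0)))
    print(triplet_loss([0.0, 0.0], [1.0, 0.0], [1.5, 0.0]))
```

Side information only changes which triplets are drawn; the loss itself is unchanged, so the same embedding network can be trained with either sampler.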

Perturbative Neural Networks


Title	Perturbative Neural Networks
Authors	Felix Juefei-Xu, Vishnu Naresh Boddeti, Marios Savvides
Abstract	Convolutional neural networks are witnessing wide adoption in computer vision systems with numerous applications across a range of visual recognition tasks. Much of this progress is fueled through advances in convolutional neural network architectures and learning algorithms even as the basic premise of a convolutional layer has remained unchanged. In this paper, we seek to revisit the convolutional layer that has been the workhorse of state-of-the-art visual recognition models. We introduce a very simple, yet effective, module called a perturbation layer as an alternative to a convolutional layer. The perturbation layer does away with convolution in the traditional sense and instead computes its response as a weighted linear combination of non-linearly activated additive noise perturbed inputs. We demonstrate both analytically and empirically that this perturbation layer can be an effective replacement for a standard convolutional layer. Empirically, deep neural networks with perturbation layers, called Perturbative Neural Networks (PNNs), in lieu of convolutional layers perform comparably with standard CNNs on a range of visual datasets (MNIST, CIFAR-10, PASCAL VOC, and ImageNet) with fewer parameters.
Tasks
Published	2018-06-05
URL	http://arxiv.org/abs/1806.01817v1
PDF	http://arxiv.org/pdf/1806.01817v1.pdf
PWC	https://paperswithcode.com/paper/perturbative-neural-networks
Repo	https://github.com/juefeix/pnn.pytorch
Framework	pytorch
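
The perturbation layer described above can be sketched as three steps: add fixed noise masks to each input channel, apply a non-linearity, and mix all perturbed maps with learned weights (equivalent to a 1x1 convolution). This pure-Python sketch works on flattened single-channel maps; the uniform noise distribution, the number of masks per channel and the function names are assumptions, not the authors' pnn.pytorch code.

```python
import random
from typing import List

Map = List[float]  # a flattened single-channel feature map


def make_noise_masks(num_channels: int, masks_per_channel: int, size: int,
                     level: float, seed: int = 0) -> List[List[Map]]:
    """Fixed, non-trainable noise masks (uniform noise is an assumption)."""
    rng = random.Random(seed)
    return [[[rng.uniform(-level, level) for _ in range(size)]
             for _ in range(masks_per_channel)]
            for _ in range(num_channels)]


def perturbation_layer(inputs: List[Map], masks: List[List[Map]],
                       weights: List[List[float]]) -> List[Map]:
    """Add each noise mask to its input channel, apply ReLU, then take a learned
    linear combination (a 1x1 convolution) of all perturbed maps."""
    perturbed = []
    for channel, channel_masks in zip(inputs, masks):
        for mask in channel_masks:
            perturbed.append([max(0.0, x + n) for x, n in zip(channel, mask)])
    outputs = []
    for row in weights:  # one row of mixing weights per output channel
        assert len(row) == len(perturbed), "one weight per perturbed map"
        outputs.append([sum(w * p[i] for w, p in zip(row, perturbed))
                        for i in range(len(perturbed[0]))])
    return outputs


if __name__ == "__main__":
    x = [[-1.0, 2.0, 0.5]]  # one input channel with three pixels
    masks = make_noise_masks(num_channels=1, masks_per_channel=2, size=3, level=0.1)
    print(perturbation_layer(x, masks, weights=[[0.7, 0.3]]))
```

Only `weights` would be trained here; the masks stay fixed, which is where the parameter savings relative to a spatial convolution come from.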


FMCode: A 3D In-the-Air Finger Motion Based User Login Framework for Gesture Interface


Title	FMCode: A 3D In-the-Air Finger Motion Based User Login Framework for Gesture Interface
Authors	Duo Lu, Dijiang Huang
Abstract	Applications using a gesture-based human-computer interface require a new gesture-based user login method because such an interface lacks a traditional input method for typing a password. However, due to various challenges, existing gesture-based authentication systems are generally considered too weak to be useful in practice. In this paper, we propose a unified user login framework using 3D in-air-handwriting, called FMCode. We define new types of features critical for distinguishing legitimate users from attackers and use a Support Vector Machine (SVM) for user authentication. The features and data-driven models are specially designed to accommodate minor behavior variations that existing gesture authentication methods neglect. In addition, we use deep neural network approaches to efficiently identify the user based on his or her in-air-handwriting, which avoids the expensive account database search methods employed by existing work. On a dataset collected by us with over 100 users, our prototype system achieves 0.1% and 0.5% best Equal Error Rate (EER) for user authentication, as well as 96.7% and 94.3% accuracy for user identification, using two types of gesture input devices. Compared to existing behavioral biometric systems using gesture and in-air-handwriting, our framework achieves state-of-the-art performance. In addition, our experimental results show that FMCode is capable of defending against client-side spoofing attacks and performs consistently in the long run. These results and discoveries pave the way to practical usage of gesture-based user login over the gesture interface.
Tasks
Published	2018-08-01
URL	http://arxiv.org/abs/1808.00130v1
PDF	http://arxiv.org/pdf/1808.00130v1.pdf
PWC	https://paperswithcode.com/paper/fmcode-a-3d-in-the-air-finger-motion-based
Repo	https://github.com/duolu/fmkit
Framework	none
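
The Equal Error Rate reported above is the operating point where the false accept rate (FAR) equals the false reject rate (FRR). A simple way to estimate it is to sweep every observed score as a threshold, as sketched below. The sketch assumes higher scores mean a better match and does not interpolate between thresholds; the paper's exact EER procedure is not given in the abstract.

```python
from typing import List, Tuple


def equal_error_rate(genuine: List[float], impostor: List[float]) -> Tuple[float, float]:
    """Return (EER, threshold), assuming higher scores mean a better match.
    Each observed score is tried as a threshold; the one where FAR and FRR
    are closest is kept, and the EER is their average at that point."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # genuine users rejected
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


if __name__ == "__main__":
    print(equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]))  # perfectly separated
    print(equal_error_rate([0.6, 0.4], [0.5, 0.3]))            # overlapping scores
```

With many scores the discrete sweep is usually close enough; with few scores, interpolating FAR and FRR between neighboring thresholds gives a smoother estimate.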

PerSIM: Multi-resolution Image Quality Assessment in the Perceptually Uniform Color Domain


Title	PerSIM: Multi-resolution Image Quality Assessment in the Perceptually Uniform Color Domain
Authors	Dogancan Temel, Ghassan AlRegib
Abstract	An average observer perceives the world in color instead of black and white. Moreover, the visual system focuses on structures and segments instead of individual pixels. Based on these observations, we propose a full reference objective image quality metric modeling visual system characteristics and chroma similarity in the perceptually uniform color domain (Lab). Laplacian of Gaussian features are obtained in the L channel to model the retinal ganglion cells in human visual system and color similarity is calculated over the a and b channels. In the proposed perceptual similarity index (PerSIM), a multi-resolution approach is followed to mimic the hierarchical nature of human visual system. LIVE and TID2013 databases are used in the validation and PerSIM outperforms all the compared metrics in the overall databases in terms of ranking, monotonic behavior and linearity.
Tasks	Image Quality Assessment
Published	2018-11-18
URL	http://arxiv.org/abs/1811.07417v1
PDF	http://arxiv.org/pdf/1811.07417v1.pdf
PWC	https://paperswithcode.com/paper/persim-multi-resolution-image-quality
Repo	https://github.com/olivesgatech/PerSIM
Framework	none
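
PerSIM operates in the Lab color space, computing Laplacian of Gaussian features on L and color similarity on a and b. As a starting point, the sketch below shows the standard sRGB (D65) to CIELAB conversion for a single pixel. Whether PerSIM uses exactly these constants and white point is an assumption; the LoG filtering and the similarity pooling are not shown.

```python
from typing import Tuple

# Linear sRGB -> CIE XYZ matrix and D65 reference white (standard values).
_M = (
    (0.4124564, 0.3575761, 0.1804375),
    (0.2126729, 0.7151522, 0.0721750),
    (0.0193339, 0.1191920, 0.9503041),
)
_WHITE = (0.95047, 1.0, 1.08883)


def _linearize(c: float) -> float:
    """Undo the sRGB transfer curve for one channel value in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _f(t: float) -> float:
    """CIELAB companding: cube root above a small threshold, linear below it."""
    delta = 6 / 29
    return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29


def srgb_to_lab(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """Convert one sRGB pixel (components in [0, 1]) to (L, a, b)."""
    rgb = [_linearize(c) for c in (r, g, b)]
    x, y, z = (sum(m * c for m, c in zip(row, rgb)) for row in _M)
    fx, fy, fz = (_f(v / w) for v, w in zip((x, y, z), _WHITE))
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


if __name__ == "__main__":
    print(srgb_to_lab(1.0, 1.0, 1.0))  # white: L close to 100, a and b close to 0
    print(srgb_to_lab(1.0, 0.0, 0.0))  # saturated red: large positive a
```

Separating lightness (L) from the two chroma axes (a, b) is what lets a metric treat structural and color differences with different models.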

Non-Local Video Denoising by CNN


Title	Non-Local Video Denoising by CNN
Authors	Axel Davy, Thibaud Ehret, Jean-Michel Morel, Pablo Arias, Gabriele Facciolo
Abstract	Non-local patch based methods were until recently state-of-the-art for image denoising but are now outperformed by CNNs. Yet they are still the state-of-the-art for video denoising, as video redundancy is a key factor to attain high denoising performance. The problem is that CNN architectures are hardly compatible with the search for self-similarities. In this work we propose a new and efficient way to feed video self-similarities to a CNN. The non-locality is incorporated into the network via a first non-trainable layer which finds for each patch in the input image its most similar patches in a search region. The central values of these patches are then gathered in a feature vector which is assigned to each image pixel. This information is presented to a CNN which is trained to predict the clean image. We apply the proposed architecture to image and video denoising. For the latter, patches are searched for in a 3D spatio-temporal volume. The proposed architecture achieves state-of-the-art results. To the best of our knowledge, this is the first successful application of a CNN to video denoising.
Tasks	Denoising, Image Denoising, Video Denoising
Published	2018-11-30
URL	https://arxiv.org/abs/1811.12758v2
PDF	https://arxiv.org/pdf/1811.12758v2.pdf
PWC	https://paperswithcode.com/paper/non-local-video-denoising-by-cnn
Repo	https://github.com/axeldavy/vnlnet
Framework	pytorch
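
The non-trainable non-local layer can be sketched for a single grayscale image: for every pixel, compare its patch with all patches in a search window, keep the k closest, and collect their central values as the pixel's feature vector. This sketch assumes a sum-of-squared-differences distance, border clamping, and a tie-break that puts the pixel itself first; the parameter values and names are illustrative. For video, the search window would also span neighboring frames.

```python
from typing import List

Image = List[List[float]]


def _pixel(img: Image, y: int, x: int) -> float:
    """Read a pixel, clamping coordinates to the image border (assumption)."""
    h, w = len(img), len(img[0])
    return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def _patch_distance(img: Image, y0: int, x0: int, y1: int, x1: int, r: int) -> float:
    """Sum of squared differences between two (2r+1) x (2r+1) patches."""
    return sum((_pixel(img, y0 + dy, x0 + dx) - _pixel(img, y1 + dy, x1 + dx)) ** 2
               for dy in range(-r, r + 1) for dx in range(-r, r + 1))


def nonlocal_features(img: Image, k: int = 3, patch_radius: int = 1,
                      search_radius: int = 2) -> List[List[List[float]]]:
    """For each pixel, return the central values of its k most similar patches
    within the search window; the pixel itself always comes first."""
    h, w = len(img), len(img[0])
    features = []
    for y in range(h):
        row = []
        for x in range(w):
            candidates = []
            for dy in range(-search_radius, search_radius + 1):
                for dx in range(-search_radius, search_radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        d = _patch_distance(img, y, x, yy, xx, patch_radius)
                        # Ties on distance are broken by offset, so self wins.
                        candidates.append((d, abs(dy) + abs(dx), yy, xx))
            candidates.sort()
            row.append([img[yy][xx] for _, _, yy, xx in candidates[:k]])
        features.append(row)
    return features


if __name__ == "__main__":
    image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    print(nonlocal_features(image, k=3)[1][2])
```

The output has shape height x width x k and would be fed to the trainable CNN in place of (or alongside) the raw pixels.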

Kalman Filter-based Heuristic Ensemble (KFHE): A new perspective on multi-class ensemble classification using Kalman filters


Title	Kalman Filter-based Heuristic Ensemble (KFHE): A new perspective on multi-class ensemble classification using Kalman filters
Authors	Arjun Pakrashi, Brian Mac Namee
Abstract	This paper introduces a new perspective on multi-class ensemble classification that considers training an ensemble as a state estimation problem. The new perspective considers the final ensemble classifier model as a static state, which can be estimated using a Kalman filter that combines noisy estimates made by individual classifier models. A new algorithm based on this perspective, the Kalman Filter-based Heuristic Ensemble (KFHE), is also presented in this paper which shows the practical applicability of the new perspective. Experiments performed on 30 datasets compare KFHE with state-of-the-art multi-class ensemble classification algorithms and show the potential and effectiveness of the new perspective and algorithm. Existing ensemble approaches trade off classification accuracy against robustness to class label noise, but KFHE is shown to be significantly better or at least as good as the state-of-the-art algorithms for datasets both with and without class label noise.
Tasks
Published	2018-07-30
URL	http://arxiv.org/abs/1807.11429v3
PDF	http://arxiv.org/pdf/1807.11429v3.pdf
PWC	https://paperswithcode.com/paper/kalman-filter-based-heuristic-ensemble-kfhe-a
Repo	https://github.com/phoxis/kfhe
Framework	none
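
Treating the ensemble output as a static state means each classifier's prediction acts as a noisy measurement, and the Kalman measurement update decides how much to trust it. The sketch below shows only that generic update with a diagonal covariance; how KFHE sets the measurement noise and the heuristics that give it its name are not described in the abstract, so the `noises` input and the prior variance are hypothetical.

```python
from typing import List, Tuple


def kalman_update(state: List[float], variance: List[float],
                  measurement: List[float], noise: float) -> Tuple[List[float], List[float]]:
    """One measurement update for a static state with diagonal covariance.
    No prediction step is needed because the state is assumed not to change."""
    new_state, new_var = [], []
    for x, p, z in zip(state, variance, measurement):
        gain = p / (p + noise)  # Kalman gain: how much to trust the new measurement
        new_state.append(x + gain * (z - x))
        new_var.append((1 - gain) * p)
    return new_state, new_var


def fuse_classifier_outputs(outputs: List[List[float]], noises: List[float],
                            prior_var: float = 1e6) -> List[float]:
    """Combine per-class score vectors from several classifiers, treating each
    as a noisy measurement of the same ensemble prediction (hypothetical setup)."""
    n = len(outputs[0])
    state, var = [0.0] * n, [prior_var] * n  # vague prior
    for z, r in zip(outputs, noises):
        state, var = kalman_update(state, var, z, r)
    return state


if __name__ == "__main__":
    print(fuse_classifier_outputs([[0.9, 0.1], [0.6, 0.4]], noises=[0.1, 0.5]))
```

With equal noise the result approaches the plain average of the classifier outputs; a classifier with larger measurement noise pulls the estimate less.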

Design Challenges in Named Entity Transliteration


Title	Design Challenges in Named Entity Transliteration
Authors	Yuval Merhav, Stephen Ash
Abstract	We analyze some of the fundamental design challenges that impact the development of a multilingual state-of-the-art named entity transliteration system, including curating bilingual named entity datasets and evaluating multiple transliteration methods. We empirically evaluate the transliteration task using the traditional weighted finite-state transducer (WFST) approach against two neural approaches: the encoder-decoder recurrent neural network method and the recent, non-sequential Transformer method. To improve the availability of bilingual named entity transliteration datasets, we release personal name bilingual dictionaries mined from Wikidata for English to Russian, Hebrew, Arabic and Japanese Katakana. Our code and dictionaries are publicly available.
Tasks	Transliteration
Published	2018-08-07
URL	http://arxiv.org/abs/1808.02563v1
PDF	http://arxiv.org/pdf/1808.02563v1.pdf
PWC	https://paperswithcode.com/paper/design-challenges-in-named-entity
Repo	https://github.com/deepchar/entities
Framework	none

Learning Multimodal Representations for Unseen Activities


Title	Learning Multimodal Representations for Unseen Activities
Authors	AJ Piergiovanni, Michael S. Ryoo
Abstract	We present a method to learn a joint multimodal representation space that enables recognition of unseen activities in videos. We first compare the effect of placing various constraints on the embedding space using paired text and video data. We also propose a method to improve the joint embedding space using an adversarial formulation, allowing it to benefit from unpaired text and video data. By using unpaired text data, we show the ability to learn a representation that better captures unseen activities. In addition to testing on publicly available datasets, we introduce a new, large-scale text/video dataset. We experimentally confirm that using paired and unpaired data to learn a shared embedding space benefits three difficult tasks (i) zero-shot activity classification, (ii) unsupervised activity discovery, and (iii) unseen activity captioning, outperforming the state of the art.
Tasks	Temporal Action Localization
Published	2018-06-21
URL	https://arxiv.org/abs/1806.08251v3
PDF	https://arxiv.org/pdf/1806.08251v3.pdf
PWC	https://paperswithcode.com/paper/unseen-action-recognition-with-multimodal
Repo	https://github.com/piergiaj/mlb-youtube
Framework	pytorch
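
Once videos and text share one embedding space, zero-shot activity classification can reduce to a nearest-neighbor lookup: embed the class names as text, then pick the class closest to the video embedding. The sketch assumes cosine similarity and uses toy, hypothetical embedding vectors; producing the embeddings (the actual contribution of the paper) is not shown.

```python
import math
from typing import Dict, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def zero_shot_classify(video_embedding: Sequence[float],
                       class_text_embeddings: Dict[str, Sequence[float]]) -> str:
    """Return the activity whose text embedding is most similar to the video
    embedding; classes need no training videos, only a text embedding."""
    return max(class_text_embeddings,
               key=lambda c: cosine(video_embedding, class_text_embeddings[c]))


if __name__ == "__main__":
    classes = {"dribbling": [1.0, 0.0], "dunking": [0.0, 1.0]}  # toy vectors
    print(zero_shot_classify([0.9, 0.1], classes))
```

Adding a new activity only requires embedding its name, which is why the quality of the shared space (and the use of unpaired text) matters so much for unseen classes.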

A practical tutorial on autoencoders for nonlinear feature fusion: Taxonomy, models, software and guidelines


Title	A practical tutorial on autoencoders for nonlinear feature fusion: Taxonomy, models, software and guidelines
Authors	David Charte, Francisco Charte, Salvador García, María J. del Jesus, Francisco Herrera
Abstract	Many of the existing machine learning algorithms, both supervised and unsupervised, depend on the quality of the input characteristics to generate a good model. The number of these variables is also important, since performance tends to decline as the input dimensionality increases, hence the interest in using feature fusion techniques, able to produce feature sets that are more compact and higher level. A plethora of procedures to fuse original variables for producing new ones has been developed in the past decades. The most basic ones use linear combinations of the original variables, such as PCA (Principal Component Analysis) and LDA (Linear Discriminant Analysis), while others find manifold embeddings of lower dimensionality based on non-linear combinations, such as Isomap or LLE (Locally Linear Embedding) techniques. More recently, autoencoders (AEs) have emerged as an alternative to manifold learning for conducting nonlinear feature fusion. Dozens of AE models have been proposed lately, each with its own specific traits. Although many of them can be used to generate reduced feature sets through the fusion of the original ones, there are also AEs designed with other applications in mind. The goal of this paper is to provide the reader with a broad view of what an AE is, how AEs are used for feature fusion, a taxonomy gathering a broad range of models, and how they relate to other classical techniques. In addition, a set of didactic guidelines on how to choose the proper AE for a given task is supplied, together with a discussion of the software tools available. Finally, two case studies illustrate the usage of AEs with datasets of handwritten digits and breast cancer.
Tasks
Published	2018-01-04
URL	http://arxiv.org/abs/1801.01586v1
PDF	http://arxiv.org/pdf/1801.01586v1.pdf
PWC	https://paperswithcode.com/paper/a-practical-tutorial-on-autoencoders-for
Repo	https://github.com/fdavidcl/ae-review-resources
Framework	tf

Probabilistic Trajectory Segmentation by Means of Hierarchical Dirichlet Process Switching Linear Dynamical Systems


Title	Probabilistic Trajectory Segmentation by Means of Hierarchical Dirichlet Process Switching Linear Dynamical Systems
Authors	Maximilian Sieb, Matthias Schultheis, Sebastian Szelag, Rudolf Lioutikov, Jan Peters
Abstract	Using movement primitive libraries is an effective means to enable robots to solve more complex tasks. In order to build these movement libraries, current algorithms require a prior segmentation of the demonstration trajectories. A promising approach is to model the trajectory as being generated by a set of Switching Linear Dynamical Systems and to infer a meaningful segmentation by inspecting the transition points characterized by the switching dynamics. For learning, a nonparametric Bayesian approach based on a Gibbs sampler is employed.
Tasks
Published	2018-05-29
URL	https://arxiv.org/abs/1806.06063v3
PDF	https://arxiv.org/pdf/1806.06063v3.pdf
PWC	https://paperswithcode.com/paper/probabilistic-trajectory-segmentation-by
Repo	https://github.com/AutuanLiu/Kalman-Filter
Framework	pytorch
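
In a Switching Linear Dynamical System, a discrete mode z_t selects which linear dynamics drive the continuous state, and segment boundaries are the time steps where the mode changes. The sketch below simulates a noise-free scalar SLDS and cuts a mode sequence into segments. In the paper the mode sequence is inferred with a Gibbs sampler under a nonparametric prior; here it is simply given, and the dynamics parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def simulate_slds(x0: float, modes: Sequence[int],
                  dynamics: Sequence[Tuple[float, float]]) -> List[float]:
    """Noise-free scalar SLDS: x[t+1] = a_z * x[t] + b_z with z = modes[t]."""
    xs = [x0]
    for z in modes:
        a, b = dynamics[z]
        xs.append(a * xs[-1] + b)
    return xs


def segments_from_modes(modes: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Split a mode sequence at every switch.
    Returns (start, end, mode) tuples with an exclusive end index."""
    segments = []
    start = 0
    for t in range(1, len(modes) + 1):
        if t == len(modes) or modes[t] != modes[start]:
            segments.append((start, t, modes[start]))
            start = t
    return segments


if __name__ == "__main__":
    modes = [0, 0, 1, 1, 1, 0]
    print(simulate_slds(1.0, modes, [(0.9, 0.0), (1.0, 0.5)]))
    print(segments_from_modes(modes))
```

Each returned segment is a candidate movement primitive; the hard part that the sampler addresses is recovering `modes` (and the number of distinct modes) from the observed trajectory alone.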

Unsupervised Cross-Lingual Information Retrieval using Monolingual Data Only


Title	Unsupervised Cross-Lingual Information Retrieval using Monolingual Data Only
Authors	Robert Litschko, Goran Glavaš, Simone Paolo Ponzetto, Ivan Vulić
Abstract	We propose a fully unsupervised framework for ad-hoc cross-lingual information retrieval (CLIR) which requires no bilingual data at all. The framework leverages shared cross-lingual word embedding spaces in which terms, queries, and documents can be represented, irrespective of their actual language. The shared embedding spaces are induced solely on the basis of monolingual corpora in two languages through an iterative process based on adversarial neural networks. Our experiments on the standard CLEF CLIR collections for three language pairs of varying degrees of language similarity (English-Dutch/Italian/Finnish) demonstrate the usefulness of the proposed fully unsupervised approach. Our CLIR models with unsupervised cross-lingual embeddings outperform baselines that utilize cross-lingual embeddings induced relying on word-level and document-level alignments. We then demonstrate that further improvements can be achieved by unsupervised ensemble CLIR models. We believe that the proposed framework is the first step towards development of effective CLIR models for language pairs and domains where parallel data are scarce or non-existent.
Tasks	Information Retrieval
Published	2018-05-02
URL	http://arxiv.org/abs/1805.00879v1
PDF	http://arxiv.org/pdf/1805.00879v1.pdf
PWC	https://paperswithcode.com/paper/unsupervised-cross-lingual-information
Repo	https://github.com/rlitschk/UnsupCLIR
Framework	none
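
With a shared cross-lingual embedding space, retrieval across languages can be done by representing the query and each document as vectors in that space and ranking by similarity. The sketch assumes a simple unweighted average of word vectors and cosine similarity, with toy English and Dutch vectors that are purely hypothetical; the paper's own aggregation and its induction of the space from monolingual corpora are not reproduced here.

```python
import math
from typing import Dict, List, Sequence

Embeddings = Dict[str, Sequence[float]]


def text_vector(tokens: List[str], emb: Embeddings) -> List[float]:
    """Average the shared-space vectors of known tokens; unknown ones are skipped."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return []
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; empty or all-zero vectors score 0.0."""
    if not u or not v:
        return 0.0
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_documents(query: List[str], docs: Dict[str, List[str]], emb: Embeddings) -> List[str]:
    """Rank documents (possibly in another language) by similarity to the query."""
    q = text_vector(query, emb)
    return sorted(docs, key=lambda d: cosine(q, text_vector(docs[d], emb)), reverse=True)


if __name__ == "__main__":
    emb = {"house": [1.0, 0.0], "huis": [0.9, 0.1],   # toy English/Dutch vectors
           "car": [0.0, 1.0], "auto": [0.1, 0.9]}
    docs = {"nl_doc_1": ["auto"], "nl_doc_2": ["huis"]}
    print(rank_documents(["house"], docs, emb))
```

No translation step is involved: because "house" and "huis" land near each other in the shared space, the Dutch document about a house ranks first for the English query.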