Paper Group ANR 966
Bayesian Zero-Shot Learning
Title | Bayesian Zero-Shot Learning |
Authors | Sarkhan Badirli, Zeynep Akata, Murat Dundar |
Abstract | Object classes that surround us have a natural tendency to emerge at varying levels of abstraction. We propose a Bayesian approach to zero-shot learning (ZSL) that introduces the notion of meta-classes and implements a Bayesian hierarchy around these classes to effectively blend data likelihood with local and global priors. Local priors driven by data from seen classes, i.e. classes that are available at training time, become instrumental in recovering unseen classes, i.e. classes that are missing at training time, in a generalized ZSL setting. Hyperparameters of the Bayesian model offer a convenient way to optimize the trade-off between seen and unseen class accuracy in addition to guiding other aspects of model fitting. We conduct experiments on seven benchmark datasets including the large scale ImageNet and show that our model improves the current state of the art in the challenging generalized ZSL setting. |
Tasks | Zero-Shot Learning |
Published | 2019-07-22 |
URL | https://arxiv.org/abs/1907.09624v2 |
https://arxiv.org/pdf/1907.09624v2.pdf | |
PWC | https://paperswithcode.com/paper/bayesian-zero-shot-learning |
Repo | |
Framework | |
LatentFusion: End-to-End Differentiable Reconstruction and Rendering for Unseen Object Pose Estimation
Title | LatentFusion: End-to-End Differentiable Reconstruction and Rendering for Unseen Object Pose Estimation |
Authors | Keunhong Park, Arsalan Mousavian, Yu Xiang, Dieter Fox |
Abstract | Current 6D object pose estimation methods usually require a 3D model for each object. These methods also require additional training in order to incorporate new objects. As a result, they are difficult to scale to a large number of objects and cannot be directly applied to unseen objects. In this work, we propose a novel framework for 6D pose estimation of unseen objects. We design an end-to-end neural network that reconstructs a latent 3D representation of an object using a small number of reference views of the object. Using the learned 3D representation, the network is able to render the object from arbitrary views. Using this neural renderer, we directly optimize for pose given an input image. By training our network with a large number of 3D shapes for reconstruction and rendering, our network generalizes well to unseen objects. We present a new dataset for unseen object pose estimation–MOPED. We evaluate the performance of our method for unseen object pose estimation on MOPED as well as the ModelNet dataset. |
Tasks | 6D Pose Estimation, 6D Pose Estimation using RGB, Pose Estimation |
Published | 2019-12-01 |
URL | https://arxiv.org/abs/1912.00416v2 |
https://arxiv.org/pdf/1912.00416v2.pdf | |
PWC | https://paperswithcode.com/paper/latentfusion-end-to-end-differentiable |
Repo | |
Framework | |
Zero-shot Learning for Audio-based Music Classification and Tagging
Title | Zero-shot Learning for Audio-based Music Classification and Tagging |
Authors | Jeong Choi, Jongpil Lee, Jiyoung Park, Juhan Nam |
Abstract | Audio-based music classification and tagging is typically based on categorical supervised learning with a fixed set of labels. This intrinsically cannot handle unseen labels such as newly added music genres or semantic words that users arbitrarily choose for music retrieval. Zero-shot learning can address this problem by leveraging an additional semantic space of labels, where side information about the labels is used to reveal the relationships among them. In this work, we investigate zero-shot learning in the music domain and organize two different setups of side information. One uses human-labeled attribute information based on the Free Music Archive and OpenMIC-2018 datasets. The other uses general word semantic information based on the Million Song Dataset and Last.fm tag annotations. Considering that a music track is usually multi-labeled in music classification and tagging datasets, we also propose a data split scheme and associated evaluation settings for multi-label zero-shot learning. Finally, we report experimental results and discuss the effectiveness and new possibilities of zero-shot learning in the music domain. |
Tasks | Music Classification, Zero-Shot Learning |
Published | 2019-07-05 |
URL | https://arxiv.org/abs/1907.02670v2 |
https://arxiv.org/pdf/1907.02670v2.pdf | |
PWC | https://paperswithcode.com/paper/zero-shot-learning-for-audio-based-music |
Repo | |
Framework | |
Machine Learning Cryptanalysis of a Quantum Random Number Generator
Title | Machine Learning Cryptanalysis of a Quantum Random Number Generator |
Authors | Nhan Duy Truong, Jing Yan Haw, Syed Muhamad Assad, Ping Koy Lam, Omid Kavehei |
Abstract | Random number generators (RNGs) that are crucial for cryptographic applications have been the subject of adversarial attacks. These attacks exploit environmental information to predict generated random numbers that are supposed to be truly random and unpredictable. Though quantum random number generators (QRNGs) are based on the intrinsic indeterministic nature of quantum properties, the presence of classical noise in the measurement process compromises the integrity of a QRNG. In this paper, we develop a predictive machine learning (ML) analysis to investigate the impact of deterministic classical noise in different stages of an optical continuous variable QRNG. Our ML model successfully detects inherent correlations when the deterministic noise sources are prominent. After appropriate filtering and randomness extraction processes are introduced, our QRNG system, in turn, demonstrates its robustness against ML. We further demonstrate the robustness of our ML approach by applying it to uniformly distributed random numbers from the QRNG and a congruential RNG. Hence, our result shows that ML has potential in benchmarking the quality of RNG devices. |
Tasks | Cryptanalysis |
Published | 2019-05-07 |
URL | https://arxiv.org/abs/1905.02342v2 |
https://arxiv.org/pdf/1905.02342v2.pdf | |
PWC | https://paperswithcode.com/paper/machine-learning-cryptanalysis-of-a-quantum |
Repo | |
Framework | |
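The abstract above mentions a congruential RNG as a point of comparison. The following is a minimal sketch of why such a generator is predictable, assuming a linear congruential form; the constants `a`, `c`, `m` and the helper names are illustrative choices and are not taken from the paper.

```python
def lcg(seed, a=1103515245, c=12345, m=2**31):
    """Yield an endless stream from a linear congruential generator.

    The constants are illustrative assumptions, not the paper's settings.
    """
    state = seed
    while True:
        state = (a * state + c) % m
        yield state


def take(gen, n):
    """Collect the next n values from a generator."""
    return [next(gen) for _ in range(n)]


def predict_next(prev, a=1103515245, c=12345, m=2**31):
    """With known parameters, one output fully determines the next."""
    return (a * prev + c) % m


# Same seed, same sequence: the stream carries no fresh randomness.
first = take(lcg(42), 5)
second = take(lcg(42), 5)
```

Because each value is a fixed function of the previous one, a learner that recovers this map can predict the stream, which is the kind of structure the abstract says an ML model can be used to detect.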
Sub-frame Appearance and 6D Pose Estimation of Fast Moving Objects
Title | Sub-frame Appearance and 6D Pose Estimation of Fast Moving Objects |
Authors | Denys Rozumnyi, Jan Kotera, Filip Sroubek, Jiri Matas |
Abstract | We propose a novel method that tracks fast moving objects, mainly non-uniform spherical, in full 6 degrees of freedom, estimating simultaneously their 3D motion trajectory, 3D pose and object appearance changes with a time step that is a fraction of the video frame exposure time. The sub-frame object localization and appearance estimation allows realistic temporal super-resolution and precise shape estimation. The method, called TbD-3D (Tracking by Deblatting in 3D) relies on a novel reconstruction algorithm which solves a piece-wise deblurring and matting problem. The 3D rotation is estimated by minimizing the reprojection error. As a second contribution, we present a new challenging dataset with fast moving objects that change their appearance and distance to the camera. High speed camera recordings with zero lag between frame exposures were used to generate videos with different frame rates annotated with ground-truth trajectory and pose. |
Tasks | 6D Pose Estimation, Deblurring, Object Localization, Pose Estimation, Super-Resolution |
Published | 2019-11-25 |
URL | https://arxiv.org/abs/1911.10927v1 |
https://arxiv.org/pdf/1911.10927v1.pdf | |
PWC | https://paperswithcode.com/paper/sub-frame-appearance-and-6d-pose-estimation |
Repo | |
Framework | |
Single Class Universum-SVM
Title | Single Class Universum-SVM |
Authors | Sauptik Dhar, Vladimir Cherkassky |
Abstract | This paper extends the idea of Universum learning [1, 2] to single-class learning problems. We propose a Single Class Universum-SVM setting that incorporates a priori knowledge (in the form of additional data samples) into the single class estimation problem. These additional data samples, or Universum, belong to the same application domain as the (positive) data samples from a single class (of interest), but they follow a different distribution. The proposed methodology for single class U-SVM is based on the known connection between binary classification and single class learning formulations [3]. Several empirical comparisons are presented to illustrate the utility of the proposed approach. |
Tasks | |
Published | 2019-09-21 |
URL | https://arxiv.org/abs/1909.09862v1 |
https://arxiv.org/pdf/1909.09862v1.pdf | |
PWC | https://paperswithcode.com/paper/190909862 |
Repo | |
Framework | |
Classifying topological sector via machine learning
Title | Classifying topological sector via machine learning |
Authors | Masakiyo Kitazawa, Takuya Matsumoto, Yasuhiro Kohno |
Abstract | We employ a machine learning technique for an estimate of the topological charge $Q$ of gauge configurations in SU(3) Yang-Mills theory in vacuum. As a first trial, we feed the four-dimensional topological charge density with and without smoothing into the convolutional neural network and train it to estimate the value of $Q$. We find that the trained neural network can estimate the value of $Q$ from the topological charge density at small flow time with high accuracy. Next, we perform the dimensional reduction of the input data as a preprocessing and analyze lower dimensional data by the neural network. We find that the accuracy of the neural network does not have statistically-significant dependence on the dimension of the input data. From this result we argue that the neural network does not find characteristic features responsible for the determination of $Q$ in the higher dimensional space. |
Tasks | |
Published | 2019-12-28 |
URL | https://arxiv.org/abs/1912.12410v1 |
https://arxiv.org/pdf/1912.12410v1.pdf | |
PWC | https://paperswithcode.com/paper/classifying-topological-sector-via-machine |
Repo | |
Framework | |
Generalized Data Augmentation for Low-Resource Translation
Title | Generalized Data Augmentation for Low-Resource Translation |
Authors | Mengzhou Xia, Xiang Kong, Antonios Anastasopoulos, Graham Neubig |
Abstract | Translation to or from low-resource languages (LRLs) poses challenges for machine translation in terms of both adequacy and fluency. Data augmentation utilizing large amounts of monolingual data is regarded as an effective way to alleviate these problems. In this paper, we propose a general framework for data augmentation in low-resource machine translation that not only uses target-side monolingual data, but also pivots through a related high-resource language (HRL). Specifically, we experiment with a two-step pivoting method to convert high-resource data to the LRL, making use of available resources to better approximate the true data distribution of the LRL. First, we inject LRL words into HRL sentences through an induced bilingual dictionary. Second, we further edit these modified sentences using a modified unsupervised machine translation framework. Extensive experiments on four low-resource datasets show that under extreme low-resource settings, our data augmentation techniques improve translation quality by up to 1.5 to 8 BLEU points compared to supervised back-translation baselines. |
Tasks | Data Augmentation, Machine Translation, Unsupervised Machine Translation |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.03785v1 |
https://arxiv.org/pdf/1906.03785v1.pdf | |
PWC | https://paperswithcode.com/paper/generalized-data-augmentation-for-low |
Repo | |
Framework | |
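The first pivoting step in the abstract above, injecting LRL words into HRL sentences through a bilingual dictionary, can be sketched as a token-level substitution. This is a simplified illustration under stated assumptions: whitespace tokenization, a hand-written dictionary, and invented words. The paper induces its dictionary automatically, and the function name here is hypothetical.

```python
def inject_lrl_words(hrl_sentence, dictionary):
    """Replace each HRL token that has a dictionary entry with its LRL word.

    Tokens without an entry are left unchanged, so the output is a mixed
    HRL/LRL sentence.
    """
    tokens = hrl_sentence.split()
    return " ".join(dictionary.get(token, token) for token in tokens)


# Hypothetical toy dictionary; the word pairs are invented for illustration.
toy_dictionary = {"casa": "kaza", "grande": "granda"}

pseudo_lrl = inject_lrl_words("la casa es grande", toy_dictionary)
print(pseudo_lrl)  # la kaza es granda
```

The second step described in the abstract, editing these sentences with an unsupervised translation model, is not covered by this sketch.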
A survey of advances in vision-based vehicle re-identification
Title | A survey of advances in vision-based vehicle re-identification |
Authors | Sultan Daud Khan, Habib Ullah |
Abstract | Vehicle re-identification (V-reID) has become significantly popular in the community due to its applications and research significance. In particular, V-reID is an important problem that still faces numerous open challenges. This paper reviews different V-reID methods including sensor based methods, hybrid methods, and vision based methods, which are further categorized into hand-crafted feature based methods and deep feature based methods. The vision based methods make the V-reID problem particularly interesting, and our review systematically addresses and evaluates these methods for the first time. We conduct experiments on four comprehensive benchmark datasets and compare the performances of recent hand-crafted feature based methods and deep feature based methods. We present a detailed analysis of these methods in terms of mean average precision (mAP) and cumulative matching curve (CMC). These analyses provide objective insight into the strengths and weaknesses of these methods. We also provide the details of different V-reID datasets and critically discuss the challenges and future trends of V-reID methods. |
Tasks | Vehicle Re-Identification |
Published | 2019-05-30 |
URL | https://arxiv.org/abs/1905.13258v1 |
https://arxiv.org/pdf/1905.13258v1.pdf | |
PWC | https://paperswithcode.com/paper/a-survey-of-advances-in-vision-based-vehicle |
Repo | |
Framework | |
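The two metrics named in the abstract above, mAP and CMC, can be computed from ranked retrieval results as in the sketch below. It assumes each query is represented by a list of booleans marking which ranked gallery items share the query's identity; this representation and the function names are assumptions for illustration, and benchmark protocols often add rules (such as excluding same-camera matches) that are omitted here.

```python
def average_precision(ranked_matches):
    """Average of precision values taken at each rank where a match occurs."""
    hits = 0
    precisions = []
    for rank, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(all_ranked):
    """Mean of the per-query average precision."""
    return sum(average_precision(r) for r in all_ranked) / len(all_ranked)


def cmc_curve(all_ranked, max_rank):
    """Fraction of queries with at least one match in the top k, k = 1..max_rank."""
    curve = []
    for k in range(1, max_rank + 1):
        found = sum(1 for r in all_ranked if any(r[:k]))
        curve.append(found / len(all_ranked))
    return curve


# Two toy queries, each with three ranked gallery items.
queries = [
    [True, False, True],   # matches at ranks 1 and 3
    [False, True, False],  # match at rank 2
]
map_score = mean_average_precision(queries)
cmc = cmc_curve(queries, max_rank=3)
```

On this toy input the first query has average precision (1/1 + 2/3) / 2 and the second has 1/2, and the CMC value at rank 1 counts only the first query as a hit.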
Addressing database variability in learning from medical data: an ensemble-based approach using convolutional neural networks and a case of study applied to automatic sleep scoring
Title | Addressing database variability in learning from medical data: an ensemble-based approach using convolutional neural networks and a case of study applied to automatic sleep scoring |
Authors | Diego Alvarez-Estevez, Isaac Fernández-Varela |
Abstract | In this work we examine some of the problems associated with the development of machine learning models with the objective of achieving robust generalization capabilities on common-task multiple-database scenarios. Referred to as the “database variability problem”, we focus on a specific medical domain (sleep staging in sleep medicine) to show the non-triviality of translating the estimated model’s local generalization capabilities into independent external databases. We analyze some of the scalability problems when multiple-database data are used as inputs to train a single learning model. Then, we introduce a novel approach based on an ensemble of local models, and we show its advantages in terms of inter-database generalization performance and data scalability. In addition, we analyze different model configurations and data pre-processing techniques to determine their effects on the overall generalization performance. For this purpose, we carry out experimentation that involves several sleep databases and evaluates different machine learning models based on convolutional neural networks. |
Tasks | |
Published | 2019-06-16 |
URL | https://arxiv.org/abs/1906.06666v3 |
https://arxiv.org/pdf/1906.06666v3.pdf | |
PWC | https://paperswithcode.com/paper/dealing-with-the-database-variability-problem |
Repo | |
Framework | |
A Genetic Algorithm based Kernel-size Selection Approach for a Multi-column Convolutional Neural Network
Title | A Genetic Algorithm based Kernel-size Selection Approach for a Multi-column Convolutional Neural Network |
Authors | Animesh Singh, Sandip Saha, Ritesh Sarkhel, Mahantapas Kundu, Mita Nasipuri, Nibaran Das |
Abstract | Deep neural network-based architectures give promising results in various domains including pattern recognition. Finding the optimal combination of the hyper-parameters of such a large-sized architecture is tedious and requires a large number of laboratory experiments. Identifying the optimal hyper-parameter combination, or an appropriate kernel size, for a given deep learning architecture is always a challenging and tedious task. Here, we introduce a genetic algorithm-based technique to reduce the effort of finding the optimal combination of a hyper-parameter (kernel size) of a convolutional neural network-based architecture. The method is evaluated on three popular datasets of different handwritten Bangla characters and digits. The implementation of the proposed methodology can be found at the following link: https://github.com/DeepQn/GA-Based-Kernel-Size. |
Tasks | |
Published | 2019-12-28 |
URL | https://arxiv.org/abs/1912.12405v2 |
https://arxiv.org/pdf/1912.12405v2.pdf | |
PWC | https://paperswithcode.com/paper/a-genetic-algorithm-based-kernel-size |
Repo | |
Framework | |
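A genetic search over kernel sizes, as described in the abstract above, might look roughly like the sketch below. Everything specific is an assumption for illustration: the candidate sizes, three columns, truncation selection, one-point crossover, the mutation rate, and above all `toy_fitness`, which stands in for the validation accuracy a real run would obtain by training the network. It is not the authors' implementation.

```python
import random

KERNEL_CHOICES = [3, 5, 7, 9]  # assumed candidate kernel sizes


def toy_fitness(chromosome):
    """Stand-in for validation accuracy; a real run would train the CNN here."""
    target = (3, 5, 7)  # arbitrary optimum chosen for the toy example
    return -sum(abs(k - t) for k, t in zip(chromosome, target))


def evolve(fitness, n_columns=3, pop_size=8, generations=20, seed=0):
    """Search for one kernel size per column with a small genetic algorithm."""
    rng = random.Random(seed)
    population = [
        tuple(rng.choice(KERNEL_CHOICES) for _ in range(n_columns))
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_columns)        # one-point crossover
            child = list(a[:cut] + b[cut:])
            if rng.random() < 0.3:                   # occasional mutation
                child[rng.randrange(n_columns)] = rng.choice(KERNEL_CHOICES)
            children.append(tuple(child))
        population = parents + children
    return max(population, key=fitness)


best_kernels = evolve(toy_fitness)
```

Since each fitness evaluation in practice means training a network, the population size and generation count would be the main cost drivers of such a search.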
Analysis of Baseline Evolutionary Algorithms for the Packing While Travelling Problem
Title | Analysis of Baseline Evolutionary Algorithms for the Packing While Travelling Problem |
Authors | Vahid Roostapour, Mojgan Pourhassan, Frank Neumann |
Abstract | The performance of baseline Evolutionary Algorithms (EAs) on combinatorial problems has been studied rigorously. From the theoretical viewpoint, the literature extensively investigates linear problems, while the theoretical analysis of non-linear problems is still far behind. In this paper, variations of the Packing While Travelling (PWT) problem, also known as the non-linear knapsack problem, are studied as an attempt to analyse the behaviour of EAs on non-linear problems from a theoretical perspective. We investigate PWT for two cities and $n$ items with correlated weights and profits, using single-objective and multi-objective algorithms. Our results show that RLS_swap, which differs from the classical RLS by having the ability to swap two bits in one iteration, finds the optimal solution in $O(n^3)$ expected time. We also study an enhanced version of GSEMO, which uses a specific selection operator to deal with exponential population size, and prove that it finds the Pareto front in the same asymptotic expected time. In the case of uniform weights, the (1+1) EA is able to find the optimal solution in expected time $O(n^2 \log(\max\{n, p_{\max}\}))$, where $p_{\max}$ is the largest profit of the given items. We also perform an experimental analysis to complement our theoretical investigations and provide additional insights into the runtime behavior. |
Tasks | |
Published | 2019-02-13 |
URL | https://arxiv.org/abs/1902.04692v2 |
https://arxiv.org/pdf/1902.04692v2.pdf | |
PWC | https://paperswithcode.com/paper/analysis-of-baseline-evolutionary-algorithms |
Repo | |
Framework | |
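The swap ability that distinguishes RLS_swap from classical RLS in the abstract above can be illustrated with the sketch below. The 50/50 choice between a single bit flip and a swap, the acceptance rule, and the toy knapsack-style fitness are assumptions made for illustration; the actual PWT objective also accounts for travel cost, which is omitted here.

```python
import random


def rls_swap(fitness, n, iterations, rng):
    """Randomised local search with single-bit flips and 1-bit/0-bit swaps."""
    x = [0] * n
    best = fitness(x)
    for _ in range(iterations):
        y = x[:]
        ones = [i for i, bit in enumerate(y) if bit == 1]
        zeros = [i for i, bit in enumerate(y) if bit == 0]
        if ones and zeros and rng.random() < 0.5:
            # Swap move: drop one selected item and pick up an unselected one.
            y[rng.choice(ones)] = 0
            y[rng.choice(zeros)] = 1
        else:
            # Classical RLS move: flip exactly one bit.
            i = rng.randrange(n)
            y[i] = 1 - y[i]
        candidate = fitness(y)
        if candidate >= best:  # accept if not worse
            x, best = y, candidate
    return x, best


# Toy instance (assumed values): profit counts only while weight fits capacity.
WEIGHTS = [2, 3, 4, 5]
PROFITS = [3, 4, 5, 6]
CAPACITY = 7


def toy_fitness(bits):
    weight = sum(w for w, b in zip(WEIGHTS, bits) if b)
    if weight > CAPACITY:
        return -1  # infeasible selections are never preferred
    return sum(p for p, b in zip(PROFITS, bits) if b)


solution, value = rls_swap(toy_fitness, n=4, iterations=2000, rng=random.Random(1))
```

The swap move lets the search exchange one packed item for another in a single step, which a single bit flip cannot do without first passing through a worse or infeasible selection.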
The Ex-Ante View of Recommender System Design
Title | The Ex-Ante View of Recommender System Design |
Authors | Guy Aridor, Duarte Goncalves, Shan Sikdar |
Abstract | Recommender systems (RS) are traditionally deployed in environments where users are uncertain about their preferences and thus face a problem of choice under uncertainty, but most popular design approaches ignore this fact. We argue that predicting and modeling consumer choice in these contexts can improve the usefulness of RS and reframe the RS problem as providing useful information to help reduce user uncertainty as opposed to simply predicting user preferences. Using a theoretical model, we show how this insight can be utilized to design RS that mitigate negative consequences such as filter bubble and user-homogenization effects as well as to better understand the role that RS play in contributing to these phenomena. |
Tasks | Recommendation Systems |
Published | 2019-04-23 |
URL | https://arxiv.org/abs/1904.10527v2 |
https://arxiv.org/pdf/1904.10527v2.pdf | |
PWC | https://paperswithcode.com/paper/the-ex-ante-view-of-recommender-system-design |
Repo | |
Framework | |
Neural Puppet: Generative Layered Cartoon Characters
Title | Neural Puppet: Generative Layered Cartoon Characters |
Authors | Omid Poursaeed, Vladimir G. Kim, Eli Shechtman, Jun Saito, Serge Belongie |
Abstract | We propose a learning based method for generating new animations of a cartoon character given a few example images. Our method is designed to learn from a traditionally animated sequence, where each frame is drawn by an artist, and thus the input images lack any common structure, correspondences, or labels. We express pose changes as a deformation of a layered 2.5D template mesh, and devise a novel architecture that learns to predict mesh deformations matching the template to a target image. This enables us to extract a common low-dimensional structure from a diverse set of character poses. We combine recent advances in differentiable rendering as well as mesh-aware models to successfully align a common template even if only a few character images are available during training. In addition to coarse poses, character appearance also varies due to shading, out-of-plane motions, and artistic effects. We capture these subtle changes by applying an image translation network to refine the mesh rendering, providing an end-to-end model to generate new animations of a character with high visual quality. We demonstrate that our generative model can be used to synthesize in-between frames and to create data-driven deformation. Our template fitting procedure outperforms state-of-the-art generic techniques for detecting image correspondences. |
Tasks | |
Published | 2019-10-04 |
URL | https://arxiv.org/abs/1910.02060v2 |
https://arxiv.org/pdf/1910.02060v2.pdf | |
PWC | https://paperswithcode.com/paper/neural-puppet-generative-layered-cartoon |
Repo | |
Framework | |
All-in-One Image-Grounded Conversational Agents
Title | All-in-One Image-Grounded Conversational Agents |
Authors | Da Ju, Kurt Shuster, Y-Lan Boureau, Jason Weston |
Abstract | As single-task accuracy on individual language and image tasks has improved substantially in the last few years, the long-term goal of a generally skilled agent that can both see and talk becomes more feasible to explore. In this work, we focus on leveraging individual language and image tasks, along with resources that incorporate both vision and language towards that objective. We design an architecture that combines state-of-the-art Transformer and ResNeXt modules fed into a novel attentive multimodal module to produce a combined model trained on many tasks. We provide a thorough analysis of the components of the model, and transfer performance when training on one, some, or all of the tasks. Our final models provide a single system that obtains good results on all vision and language tasks considered, and improves the state-of-the-art in image-grounded conversational applications. |
Tasks | |
Published | 2019-12-28 |
URL | https://arxiv.org/abs/1912.12394v2 |
https://arxiv.org/pdf/1912.12394v2.pdf | |
PWC | https://paperswithcode.com/paper/all-in-one-image-grounded-conversational |
Repo | |
Framework | |