Paper Group ANR 1446
Learning Weighted Submanifolds with Variational Autoencoders and Riemannian Variational Autoencoders. The VIA Annotation Software for Images, Audio and Video. Referring Expression Generation Using Entity Profiles. Leveraging Pretrained Image Classifiers for Language-Based Segmentation. Differentiable Representations For Multihop Inference Rules. Efficient average-case population recovery in the presence of insertions and deletions. Generalization of Reinforcement Learners with Working and Episodic Memory. Comprehensive Personalized Ranking Using One-Bit Comparison Data. My lips are concealed: Audio-visual speech enhancement through obstructions. User Validation of Recommendation Serendipity Metrics. Constructing Energy-efficient Mixed-precision Neural Networks through Principal Component Analysis for Edge Intelligence. MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement. Distributed Learning for Channel Allocation Over a Shared Spectrum. Learning a Curve Guardian for Motorcycles. A GA-based feature selection of the EEG signals by classification evaluation: Application in BCI systems.
Learning Weighted Submanifolds with Variational Autoencoders and Riemannian Variational Autoencoders
Title | Learning Weighted Submanifolds with Variational Autoencoders and Riemannian Variational Autoencoders |
Authors | Nina Miolane, Susan Holmes |
Abstract | Manifold-valued data naturally arises in medical imaging. In cognitive neuroscience, for instance, brain connectomes base the analysis of coactivation patterns between different brain regions on the analysis of the correlations of their functional Magnetic Resonance Imaging (fMRI) time series - an object thus constrained by construction to belong to the manifold of symmetric positive definite matrices. One of the challenges that naturally arises consists of finding a lower-dimensional subspace for representing such manifold-valued data. Traditional techniques, like principal component analysis, are ill-adapted to tackle non-Euclidean spaces and may fail to achieve a lower-dimensional representation of the data - thus potentially pointing to the absence of lower-dimensional representation of the data. However, these techniques are restricted in that: (i) they do not leverage the assumption that the connectomes belong to a pre-specified manifold, therefore discarding information; (ii) they can only fit a linear subspace to the data. In this paper, we are interested in variants that learn potentially highly curved submanifolds of manifold-valued data. Motivated by the brain connectomes example, we investigate a latent variable generative model, which has the added benefit of providing us with uncertainty estimates - a crucial quantity in the medical applications we are considering. While latent variable models have been proposed to learn linear and nonlinear spaces for Euclidean data, or geodesic subspaces for manifold data, no intrinsic latent variable model exists to learn nongeodesic subspaces for manifold data. This paper fills this gap and formulates a Riemannian variational autoencoder with an intrinsic generative model of manifold-valued data. We evaluate its performance on synthetic and real datasets by introducing the formalism of weighted Riemannian submanifolds. |
Tasks | Latent Variable Models, Time Series |
Published | 2019-11-19 |
URL | https://arxiv.org/abs/1911.08147v1 |
https://arxiv.org/pdf/1911.08147v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-weighted-submanifolds-with |
Repo | |
Framework | |
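As a concrete illustration of the idea above (a generative model whose outputs are constrained to a manifold), here is a minimal NumPy/SciPy sketch: a Euclidean latent code is decoded to a symmetric matrix and pushed through the matrix exponential, so every generated sample lies on the SPD manifold by construction. The decoder, sizes, and names are invented for illustration; this is not the Riemannian VAE proposed in the paper.

```python
# Illustrative sketch (not the authors' code): a toy generative model whose
# outputs are SPD matrices by construction, obtained by decoding a Euclidean
# latent code to a symmetric matrix and applying the matrix exponential.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
d, latent_dim = 4, 2                          # 4x4 SPD matrices, 2-d latent space
W = rng.normal(scale=0.3, size=(latent_dim, d * (d + 1) // 2))   # stand-in "decoder"
iu = np.triu_indices(d)

def decode(z):
    """Map a latent code to an SPD matrix via a tangent (symmetric) matrix."""
    sym = np.zeros((d, d))
    sym[iu] = z @ W                           # fill the upper triangle
    sym = sym + sym.T - np.diag(np.diag(sym)) # symmetrize, keeping the diagonal once
    return expm(sym)                          # matrix exponential => SPD output

z = rng.normal(size=latent_dim)               # latent prior z ~ N(0, I)
x = decode(z)
print(np.allclose(x, x.T), bool(np.all(np.linalg.eigvalsh(x) > 0)))   # symmetric, PD
```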
The VIA Annotation Software for Images, Audio and Video
Title | The VIA Annotation Software for Images, Audio and Video |
Authors | Abhishek Dutta, Andrew Zisserman |
Abstract | In this paper, we introduce a simple and standalone manual annotation tool for images, audio and video: the VGG Image Annotator (VIA). This is a lightweight, standalone and offline software package that does not require any installation or setup and runs solely in a web browser. The VIA software allows human annotators to define and describe spatial regions in images or video frames, and temporal segments in audio or video. These manual annotations can be exported to plain text data formats such as JSON and CSV and therefore are amenable to further processing by other software tools. VIA also supports collaborative annotation of a large dataset by a group of human annotators. The BSD open source license of this software allows it to be used in any academic project or commercial application. |
Tasks | |
Published | 2019-04-24 |
URL | https://arxiv.org/abs/1904.10699v3 |
https://arxiv.org/pdf/1904.10699v3.pdf | |
PWC | https://paperswithcode.com/paper/the-vgg-image-annotator-via |
Repo | |
Framework | |
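Since the abstract highlights that VIA annotations export to JSON/CSV for further processing, here is a hypothetical reader for a VIA-2-style JSON project export. The key names used below ("filename", "regions", "shape_attributes", "region_attributes") reflect the commonly seen export layout but should be treated as assumptions, not a guaranteed schema.

```python
# Hypothetical reader for a VIA-style JSON project export. The keys below
# ("filename", "regions", "shape_attributes", "region_attributes") follow the
# commonly seen VIA 2 export layout and are an assumption, not a specification.
import json

def load_via_regions(path):
    with open(path) as f:
        project = json.load(f)
    records = []
    for entry in project.values():            # one entry per annotated file
        for region in entry.get("regions", []):
            records.append({
                "filename": entry.get("filename"),
                "shape": region.get("shape_attributes", {}).get("name"),
                "shape_attributes": region.get("shape_attributes", {}),
                "labels": region.get("region_attributes", {}),
            })
    return records

# Example (path is hypothetical):
# for rec in load_via_regions("via_project_export.json"):
#     print(rec["filename"], rec["shape"], rec["labels"])
```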
Referring Expression Generation Using Entity Profiles
Title | Referring Expression Generation Using Entity Profiles |
Authors | Meng Cao, Jackie Chi Kit Cheung |
Abstract | Referring Expression Generation (REG) is the task of generating contextually appropriate references to entities. A limitation of existing REG systems is that they rely on entity-specific supervised training, which means that they cannot handle entities not seen during training. In this study, we address this in two ways. First, we propose task setups in which we specifically test a REG system’s ability to generalize to entities not seen during training. Second, we propose a profile-based deep neural network model, ProfileREG, which encodes both the local context and an external profile of the entity to generate reference realizations. Our model generates tokens by learning to choose between generating pronouns, generating from a fixed vocabulary, or copying a word from the profile. We evaluate our model on three different splits of the WebNLG dataset, and show that it outperforms competitive baselines in all settings according to automatic and human evaluations. |
Tasks | |
Published | 2019-09-04 |
URL | https://arxiv.org/abs/1909.01528v1 |
https://arxiv.org/pdf/1909.01528v1.pdf | |
PWC | https://paperswithcode.com/paper/referring-expression-generation-using-entity |
Repo | |
Framework | |
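A toy NumPy sketch of the three-way generation choice described in the abstract (emit a pronoun, generate from a fixed vocabulary, or copy a token from the entity profile), written as a gated mixture of distributions. The vocabulary, profile, and probabilities are invented; in ProfileREG they would come from the learned decoder state.

```python
# Toy sketch of a three-way generation switch (pronoun / vocabulary / copy).
# All probabilities are random stand-ins for quantities a decoder would produce.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

vocab = ["<pron:he>", "<pron:she>", "the", "author", "novelist", "wrote"]
profile_tokens = ["Jane", "Austen", "English", "novelist"]

rng = np.random.default_rng(1)
p_mode = softmax(rng.normal(size=3))                 # P(pronoun), P(vocab), P(copy)
p_pronoun = softmax(rng.normal(size=2))              # over the two pronoun tokens
p_vocab = softmax(rng.normal(size=len(vocab)))       # over the fixed vocabulary
attn = softmax(rng.normal(size=len(profile_tokens))) # copy attention over the profile

tokens = vocab + profile_tokens                      # union of output choices
p_final = np.zeros(len(tokens))
p_final[:2] += p_mode[0] * p_pronoun
p_final[:len(vocab)] += p_mode[1] * p_vocab
p_final[len(vocab):] += p_mode[2] * attn
print(tokens[int(p_final.argmax())], round(float(p_final.sum()), 6))   # sums to 1.0
```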
Leveraging Pretrained Image Classifiers for Language-Based Segmentation
Title | Leveraging Pretrained Image Classifiers for Language-Based Segmentation |
Authors | David Golub, Ahmed El-Kishky, Roberto Martín-Martín |
Abstract | Current semantic segmentation models cannot easily generalize to new object classes unseen during train time: they require additional annotated images and retraining. We propose a novel segmentation model that injects visual priors into semantic segmentation architectures, allowing them to segment out new target labels without retraining. As visual priors, we use the activations of pretrained image classifiers, which provide noisy indications of the spatial location of both the target object and distractor objects in the scene. We leverage language semantics to obtain these activations for a target label unseen by the classifier. Further experiments show that the visual priors obtained via language semantics for both relevant and distracting objects are key to our performance. |
Tasks | Semantic Segmentation |
Published | 2019-11-03 |
URL | https://arxiv.org/abs/1911.00830v3 |
https://arxiv.org/pdf/1911.00830v3.pdf | |
PWC | https://paperswithcode.com/paper/leveraging-pretrained-image-classifiers-for |
Repo | |
Framework | |
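A rough sketch of how noisy classifier activations for a target label and for distractor labels could be combined into a segmentation prior, in the spirit of the abstract. The specific combination rule below is an assumption for illustration, not the paper's formulation.

```python
# Illustrative combination of noisy class activation maps into a segmentation
# prior: keep locations where the target label activates more strongly than
# any distractor label. The combination rule is an assumption, not the paper's.
import numpy as np

rng = np.random.default_rng(0)
H = W = 8
cam_target = rng.random((H, W))             # activation map for the target label
cam_distractors = rng.random((3, H, W))     # maps for three distractor labels

prior = np.clip(cam_target - cam_distractors.max(axis=0), 0.0, None)
prior /= prior.max() + 1e-8                 # normalize to [0, 1]
mask = prior > 0.5                          # crude binary prior for the target
print(float(mask.mean()))
```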
Differentiable Representations For Multihop Inference Rules
Title | Differentiable Representations For Multihop Inference Rules |
Authors | William W. Cohen, Haitian Sun, R. Alex Hofer, Matthew Siegler |
Abstract | We present efficient differentiable implementations of second-order multi-hop reasoning using a large symbolic knowledge base (KB). We introduce a new operation which can be used to compositionally construct second-order multi-hop templates in a neural model, and evaluate a number of alternative implementations, with different time and memory trade-offs. These techniques scale to KBs with millions of entities and tens of millions of triples, and lead to simple models with competitive performance on several learning tasks requiring multi-hop reasoning. |
Tasks | |
Published | 2019-05-24 |
URL | https://arxiv.org/abs/1905.10417v1 |
https://arxiv.org/pdf/1905.10417v1.pdf | |
PWC | https://paperswithcode.com/paper/differentiable-representations-for-multihop |
Repo | |
Framework | |
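One standard way to make multi-hop reasoning over a symbolic KB differentiable is to encode entities as (soft) one-hot vectors and each relation as a sparse matrix, so that following a relation is a sparse matrix-vector product and a hop can be softly weighted over relations. The tiny example below illustrates that general mechanism; it is not the specific second-order operation introduced in the paper.

```python
# Minimal sketch of differentiable hops over a tiny KB: entities are (soft)
# one-hot vectors, each relation r is a sparse matrix M_r, and following r
# from x is the product M_r^T x. A soft weighting over relations makes the
# choice of relation itself continuous.
import numpy as np
from scipy.sparse import csr_matrix

entities = ["paris", "france", "europe"]
idx = {e: i for i, e in enumerate(entities)}
n = len(entities)

def relation(pairs):
    rows, cols = zip(*[(idx[h], idx[t]) for h, t in pairs])
    return csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n))

M = {
    "capital_of": relation([("paris", "france")]),
    "located_in": relation([("france", "europe")]),
}

x = np.zeros(n); x[idx["paris"]] = 1.0                 # start at "paris"
w = np.array([0.9, 0.1])                                # soft choice of the first relation
hop1 = w[0] * (M["capital_of"].T @ x) + w[1] * (M["located_in"].T @ x)
hop2 = M["located_in"].T @ hop1                         # hard-coded second hop
print(entities[int(hop2.argmax())])                     # -> "europe"
```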
Efficient average-case population recovery in the presence of insertions and deletions
Title | Efficient average-case population recovery in the presence of insertions and deletions |
Authors | Frank Ban, Xi Chen, Rocco A. Servedio, Sandip Sinha |
Abstract | Several recent works have considered the \emph{trace reconstruction problem}, in which an unknown source string $x \in \{0,1\}^n$ is transmitted through a probabilistic channel which may randomly delete coordinates or insert random bits, resulting in a \emph{trace} of $x$. The goal is to reconstruct the original string $x$ from independent traces of $x$. While the best algorithms known for worst-case strings use $\exp(O(n^{1/3}))$ traces \cite{DOS17,NazarovPeres17}, highly efficient algorithms are known \cite{PZ17,HPP18} for the \emph{average-case} version, in which $x$ is uniformly random. We consider a generalization of this average-case trace reconstruction problem, which we call \emph{average-case population recovery in the presence of insertions and deletions}. In this problem, there is an unknown distribution $\cal{D}$ over $s$ unknown source strings $x^1,\dots,x^s \in \{0,1\}^n$, and each sample is independently generated by drawing some $x^i$ from $\cal{D}$ and returning an independent trace of $x^i$. Building on \cite{PZ17} and \cite{HPP18}, we give an efficient algorithm for this problem. For any support size $s \leq \smash{\exp(\Theta(n^{1/3}))}$, for a $1-o(1)$ fraction of all $s$-element support sets $\{x^1,\dots,x^s\} \subset \{0,1\}^n$, for every distribution $\cal{D}$ supported on $\{x^1,\dots,x^s\}$, our algorithm efficiently recovers ${\cal D}$ up to total variation distance $\epsilon$ with high probability, given access to independent traces of independent draws from $\cal{D}$. The algorithm runs in time poly$(n,s,1/\epsilon)$ and its sample complexity is poly$(s,1/\epsilon,\exp(\log^{1/3}n))$. This polynomial dependence on the support size $s$ is in sharp contrast with the \emph{worst-case} version (when $x^1,\dots,x^s$ may be any strings in $\{0,1\}^n$), in which the sample complexity of the most efficient known algorithm \cite{BCFSS19} is doubly exponential in $s$. |
Tasks | |
Published | 2019-07-12 |
URL | https://arxiv.org/abs/1907.05964v1 |
https://arxiv.org/pdf/1907.05964v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-average-case-population-recovery-in |
Repo | |
Framework | |
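The abstract's channel model (random deletions and random bit insertions producing a trace) is easy to simulate; the toy generator below does exactly that under an assumed independent per-position parameterization, which may differ in detail from the cited works.

```python
# Toy simulator of an insertion/deletion channel producing "traces" of a
# source string. The per-position parameterization here is an assumption and
# may differ from the channel models used in the cited works.
import random

def trace(x, rng, p_del=0.1, p_ins=0.1):
    out = []
    for bit in x:
        while rng.random() < p_ins:          # insert random bits before this position
            out.append(rng.randint(0, 1))
        if rng.random() >= p_del:            # keep the bit unless it is deleted
            out.append(bit)
    return out

src_rng = random.Random(1)
source = [src_rng.randint(0, 1) for _ in range(20)]
traces = [trace(source, random.Random(i)) for i in range(5)]
print([len(t) for t in traces])
```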
Generalization of Reinforcement Learners with Working and Episodic Memory
Title | Generalization of Reinforcement Learners with Working and Episodic Memory |
Authors | Meire Fortunato, Melissa Tan, Ryan Faulkner, Steven Hansen, Adrià Puigdomènech Badia, Gavin Buttimore, Charlie Deck, Joel Z Leibo, Charles Blundell |
Abstract | Memory is an important aspect of intelligence and plays a role in many deep reinforcement learning models. However, little progress has been made in understanding when specific memory systems help more than others and how well they generalize. The field also has yet to see a prevalent consistent and rigorous approach for evaluating agent performance on holdout data. In this paper, we aim to develop a comprehensive methodology to test different kinds of memory in an agent and assess how well the agent can apply what it learns in training to a holdout set that differs from the training set along dimensions that we suggest are relevant for evaluating memory-specific generalization. To that end, we first construct a diverse set of memory tasks that allow us to evaluate test-time generalization across multiple dimensions. Second, we develop and perform multiple ablations on an agent architecture that combines multiple memory systems, observe its baseline models, and investigate its performance against the task suite. |
Tasks | |
Published | 2019-10-29 |
URL | https://arxiv.org/abs/1910.13406v2 |
https://arxiv.org/pdf/1910.13406v2.pdf | |
PWC | https://paperswithcode.com/paper/191013406 |
Repo | |
Framework | |
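For readers unfamiliar with the memory systems being ablated, here is a generic toy episodic memory (store embeddings, retrieve by similarity). It is a stand-in for the kind of component such agents combine with working memory, not the architecture evaluated in the paper.

```python
# Generic toy episodic memory: store embeddings of past observations, retrieve
# the most similar ones at query time. A stand-in, not the paper's architecture.
import numpy as np

class EpisodicMemory:
    def __init__(self, dim, capacity=1000):
        self.keys = np.zeros((capacity, dim))
        self.values = np.zeros((capacity, dim))
        self.size, self.capacity = 0, capacity

    def write(self, key, value):
        i = self.size % self.capacity         # overwrite the oldest slot when full
        self.keys[i], self.values[i] = key, value
        self.size += 1

    def read(self, query, k=3):
        n = min(self.size, self.capacity)
        sims = self.keys[:n] @ query          # dot-product similarity
        top = np.argsort(sims)[-k:]
        return self.values[top].mean(axis=0)  # simple aggregation of the top-k values

rng = np.random.default_rng(0)
mem = EpisodicMemory(dim=8)
for _ in range(50):
    obs = rng.normal(size=8)
    mem.write(obs, obs)
print(mem.read(rng.normal(size=8)).shape)
```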
Comprehensive Personalized Ranking Using One-Bit Comparison Data
Title | Comprehensive Personalized Ranking Using One-Bit Comparison Data |
Authors | Aria Ameri, Arindam Bose, Mojtaba Soltanalian |
Abstract | The task of a personalization system is to recommend items or a set of items according to the users’ taste, and thus predict their future needs. In this paper, we address such personalized recommendation problems for which one-bit comparison data of user preferences for different items as well as the different user inclinations toward an item are available. We devise a comprehensive personalized ranking (CPR) system by employing a Bayesian treatment. We also provide a connection to the learning method with respect to the CPR optimization criterion to learn the underlying low-rank structure of the rating matrix based on the well-established matrix factorization method. Numerical results are provided to verify the performance of our algorithm. |
Tasks | |
Published | 2019-06-06 |
URL | https://arxiv.org/abs/1906.02408v1 |
https://arxiv.org/pdf/1906.02408v1.pdf | |
PWC | https://paperswithcode.com/paper/comprehensive-personalized-ranking-using-one |
Repo | |
Framework | |
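A minimal sketch of learning a low-rank user/item model from one-bit comparison data with a pairwise logistic update. This is a generic matrix-factorization baseline for the data setting described in the abstract, not the paper's exact CPR criterion or its Bayesian treatment.

```python
# Sketch of learning a low-rank model from one-bit comparisons (user u prefers
# item i over item j) via a pairwise logistic update. A generic baseline, not
# the paper's CPR criterion.
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, rank, lr, reg = 30, 40, 5, 0.05, 0.01
U = 0.1 * rng.normal(size=(n_users, rank))
V = 0.1 * rng.normal(size=(n_items, rank))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Synthetic one-bit comparisons (u, i, j): user u prefers item i over item j.
comparisons = [(rng.integers(n_users), rng.integers(n_items), rng.integers(n_items))
               for _ in range(2000)]

for u, i, j in comparisons:
    u_vec, diff = U[u].copy(), V[i] - V[j]
    g = sigmoid(u_vec @ diff) - 1.0           # d(-log sigmoid(margin)) / d(margin)
    U[u] -= lr * (g * diff + reg * U[u])
    V[i] -= lr * (g * u_vec + reg * V[i])
    V[j] -= lr * (-g * u_vec + reg * V[j])

print(np.argsort(-(U[0] @ V.T))[:5])          # top-5 ranked items for user 0
```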
My lips are concealed: Audio-visual speech enhancement through obstructions
Title | My lips are concealed: Audio-visual speech enhancement through obstructions |
Authors | Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman |
Abstract | Our objective is an audio-visual model for separating a single speaker from a mixture of sounds such as other speakers and background noise. Moreover, we wish to hear the speaker even when the visual cues are temporarily absent due to occlusion. To this end we introduce a deep audio-visual speech enhancement network that is able to separate a speaker’s voice by conditioning on both the speaker’s lip movements and/or a representation of their voice. The voice representation can be obtained by either (i) enrollment, or (ii) by self-enrollment – learning the representation on-the-fly given sufficient unobstructed visual input. The model is trained by blending audios, and by introducing artificial occlusions around the mouth region that prevent the visual modality from dominating. The method is speaker-independent, and we demonstrate it on real examples of speakers unheard (and unseen) during training. The method also improves over previous models in particular for cases of occlusion in the visual modality. |
Tasks | Speech Enhancement |
Published | 2019-07-11 |
URL | https://arxiv.org/abs/1907.04975v1 |
https://arxiv.org/pdf/1907.04975v1.pdf | |
PWC | https://paperswithcode.com/paper/my-lips-are-concealed-audio-visual-speech |
Repo | |
Framework | |
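The abstract mentions two training-time tricks: blending audios to form mixtures, and artificially occluding the mouth region so the visual stream cannot dominate. The toy code below mimics both under assumed array shapes; it is illustrative only.

```python
# Toy version of the two training-time augmentations mentioned in the abstract:
# blend a target audio with an interferer, and occlude a random span of the
# mouth-region frames. Shapes and sampling rate are assumptions.
import numpy as np

rng = np.random.default_rng(0)
target_audio = rng.normal(size=16000)            # 1 s of audio at an assumed 16 kHz
interferer = rng.normal(size=16000)
video = rng.normal(size=(25, 64, 64))            # 25 mouth-region frames (assumed)

def blend(a, b, snr_db=0.0):
    """Mix b into a at the requested signal-to-noise ratio."""
    scale = np.sqrt(np.mean(a ** 2) / (np.mean(b ** 2) * 10 ** (snr_db / 10)))
    return a + scale * b

def occlude(frames, max_len=10):
    """Blank a random temporal span of the video frames."""
    frames = frames.copy()
    start = rng.integers(0, len(frames) - max_len)
    frames[start:start + rng.integers(1, max_len + 1)] = 0.0
    return frames

mixture = blend(target_audio, interferer, snr_db=0.0)
occluded = occlude(video)
print(mixture.shape, bool((occluded == 0).any()))
```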
User Validation of Recommendation Serendipity Metrics
Title | User Validation of Recommendation Serendipity Metrics |
Authors | Li Chen, Ningxia Wang, Yonghua Yang, Keping Yang, Quan Yuan |
Abstract | Though it has been recognized that recommending serendipitous (i.e., surprising and relevant) items can be helpful for increasing users’ satisfaction and behavioral intention, how to measure serendipity in the offline environment is still an open issue. In recent years, a number of metrics have been proposed, but most of them were based on researchers’ assumptions due to serendipity’s subjective nature. In order to validate these metrics’ actual performance, we collected over 10,000 users’ real feedback data and compared it with the metrics’ results. It turns out that user-profile-based metrics, especially content-based ones, perform better than those based on item popularity, in terms of estimating the unexpectedness facet of recommendations. Moreover, the full metrics, which involve the unexpectedness component, relevance, timeliness, and user curiosity, can more accurately indicate the recommendation’s serendipity degree, relative to those that just involve some of them. The application of these metrics to several recommender algorithms further consolidates their practical usage, because the comparison results are consistent with those from user evaluation. Thus, this work is constructive for filling the gap between offline measurement and user study on recommendation serendipity. |
Tasks | |
Published | 2019-06-27 |
URL | https://arxiv.org/abs/1906.11431v1 |
https://arxiv.org/pdf/1906.11431v1.pdf | |
PWC | https://paperswithcode.com/paper/user-validation-of-recommendation-serendipity |
Repo | |
Framework | |
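Definitions of serendipity metrics vary across the literature; as one simple instantiation of the components discussed above (content-based unexpectedness combined with relevance), consider the sketch below. It is not any specific metric evaluated in the paper.

```python
# One simple content-based instantiation of the components discussed above:
# unexpectedness as dissimilarity from the user's profile items, multiplied by
# a relevance term. Not a specific metric from the paper.
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def serendipity(item_vec, profile_vecs, relevance):
    unexpectedness = 1.0 - max(cosine(item_vec, p) for p in profile_vecs)
    return unexpectedness * relevance            # both components assumed in [0, 1]

rng = np.random.default_rng(0)
profile = [rng.random(16) for _ in range(20)]    # content vectors of consumed items
candidate = rng.random(16)                       # content vector of a recommendation
print(round(serendipity(candidate, profile, relevance=0.8), 3))
```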
Constructing Energy-efficient Mixed-precision Neural Networks through Principal Component Analysis for Edge Intelligence
Title | Constructing Energy-efficient Mixed-precision Neural Networks through Principal Component Analysis for Edge Intelligence |
Authors | Indranil Chakraborty, Deboleena Roy, Isha Garg, Aayush Ankit, Kaushik Roy |
Abstract | The ‘Internet of Things’ has brought increased demand for AI-based edge computing in applications ranging from healthcare monitoring systems to autonomous vehicles. Quantization is a powerful tool to address the growing computational cost of such applications, and yields significant compression over full-precision networks. However, quantization can result in substantial loss of performance for complex image classification tasks. To address this, we propose a Principal Component Analysis (PCA) driven methodology to identify the important layers of a binary network, and design mixed-precision networks. The proposed Hybrid-Net achieves a more than 10% improvement in classification accuracy over binary networks such as XNOR-Net for ResNet and VGG architectures on CIFAR-100 and ImageNet datasets while still achieving up to 94% of the energy-efficiency of XNOR-Nets. This work furthers the feasibility of using highly compressed neural networks for energy-efficient neural computing in edge devices. |
Tasks | Autonomous Vehicles, Dimensionality Reduction, Image Classification, Quantization |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01493v2 |
https://arxiv.org/pdf/1906.01493v2.pdf | |
PWC | https://paperswithcode.com/paper/pca-driven-hybrid-network-design-for-enabling |
Repo | |
Framework | |
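The PCA-driven idea is to rank layers by how many principal components their activations need; a toy version of that per-layer computation is sketched below. The 99% variance threshold and the bit-width rule are assumptions for illustration, not the paper's exact procedure.

```python
# Toy PCA analysis of per-layer activations: count the principal components
# needed to explain 99% of the variance, then pick a bit-width per layer.
# The threshold and the bit-width rule are assumptions for illustration.
import numpy as np

def significant_dims(activations, var_threshold=0.99):
    centered = activations - activations.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    var_ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(var_ratio, var_threshold) + 1)

rng = np.random.default_rng(0)
layer_acts = {                                   # samples x channels, per layer
    "conv1": rng.normal(size=(256, 8)) @ rng.normal(size=(8, 64)),   # low effective rank
    "conv2": rng.normal(size=(256, 128)),                            # high effective rank
}
dims = {name: significant_dims(a) for name, a in layer_acts.items()}
precision = {name: (8 if dims[name] > 0.5 * layer_acts[name].shape[1] else 1)
             for name in layer_acts}             # bits per layer (illustrative rule)
print(dims, precision)
```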
MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement
Title | MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement |
Authors | Szu-Wei Fu, Chien-Feng Liao, Yu Tsao, Shou-De Lin |
Abstract | Adversarial loss in a conditional generative adversarial network (GAN) is not designed to directly optimize evaluation metrics of a target task, and thus, may not always guide the generator in a GAN to generate data with improved metric scores. To overcome this issue, we propose a novel MetricGAN approach with an aim to optimize the generator with respect to one or multiple evaluation metrics. Moreover, based on MetricGAN, the metric scores of the generated data can also be arbitrarily specified by users. We tested the proposed MetricGAN on a speech enhancement task, which is particularly suitable to verify the proposed approach because there are multiple metrics measuring different aspects of speech signals. Moreover, these metrics are generally complex and could not be fully optimized by Lp or conventional adversarial losses. |
Tasks | Speech Enhancement |
Published | 2019-05-13 |
URL | https://arxiv.org/abs/1905.04874v1 |
https://arxiv.org/pdf/1905.04874v1.pdf | |
PWC | https://paperswithcode.com/paper/metricgan-generative-adversarial-networks |
Repo | |
Framework | |
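The core mechanism described in the abstract is a surrogate that learns to mimic a black-box metric, with the generator trained to push that surrogate toward a user-specified target score. The loss sketch below follows the commonly cited MetricGAN-style squared-error form; details of the paper's losses may differ.

```python
# Loss sketch for metric-driven adversarial training: a surrogate D is trained
# to match a black-box metric Q, and the enhanced output is trained so that D
# assigns it a user-chosen target score. The squared-error form follows the
# commonly cited MetricGAN-style formulation; the paper's exact losses may differ.
import numpy as np

def discriminator_loss(D, Q, clean, enhanced):
    # D should reproduce the metric for both clean and enhanced speech.
    return (D(clean, clean) - Q(clean, clean)) ** 2 + \
           (D(enhanced, clean) - Q(enhanced, clean)) ** 2

def generator_loss(D, enhanced, clean, target_score=1.0):
    # Push the surrogate metric of the enhanced output toward target_score.
    return (D(enhanced, clean) - target_score) ** 2

# Toy stand-ins: both the "metric" and the "surrogate" score similarity to clean.
Q = lambda x, ref: float(np.exp(-np.mean((x - ref) ** 2)))
D = lambda x, ref: float(np.exp(-np.mean((x - ref) ** 2)))   # pretend-learned surrogate

rng = np.random.default_rng(0)
clean = rng.normal(size=100)
enhanced = clean + 0.3 * rng.normal(size=100)                # imperfect enhancement
print(discriminator_loss(D, Q, clean, enhanced), generator_loss(D, enhanced, clean))
```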
Distributed Learning for Channel Allocation Over a Shared Spectrum
Title | Distributed Learning for Channel Allocation Over a Shared Spectrum |
Authors | S. M. Zafaruddin, Ilai Bistritz, Amir Leshem, Dusit Niyato |
Abstract | Channel allocation is the task of assigning channels to users such that some objective (e.g., sum-rate) is maximized. In centralized networks such as cellular networks, this task is carried out by the base station, which gathers the channel state information (CSI) from the users and computes the optimal solution. In distributed networks such as ad-hoc and device-to-device (D2D) networks, no base station exists and conveying global CSI between users is costly or simply impractical. When the CSI is time varying and unknown to the users, the users face the challenge of both learning the channel statistics online and converging to a good channel allocation. This introduces a multi-armed bandit (MAB) scenario with multiple decision makers. If two or more users choose the same channel, a collision occurs and they all receive zero reward. We propose a distributed channel allocation algorithm that each user runs and which converges to the optimal allocation while achieving an order-optimal regret of $O(\log T)$. The algorithm is based on a carrier sensing multiple access (CSMA) implementation of the distributed auction algorithm. It does not require any exchange of information between users. Users need only to observe a single channel at a time and sense if there is a transmission on that channel, without decoding the transmissions or identifying the transmitting users. We demonstrate the performance of our algorithm using simulated LTE and 5G channels. |
Tasks | |
Published | 2019-02-17 |
URL | http://arxiv.org/abs/1902.06353v2 |
http://arxiv.org/pdf/1902.06353v2.pdf | |
PWC | https://paperswithcode.com/paper/distributed-learning-for-channel-allocation |
Repo | |
Framework | |
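The collision model from the abstract (two or more users on one channel get zero reward) is simulated below with a naive random policy and an optimal static assignment as a reference point. The paper's auction/CSMA-based learning algorithm is not reproduced here.

```python
# Toy simulation of the multi-player bandit collision model described in the
# abstract: each round every user picks a channel, and colliding users get
# zero reward. The random policy is only a baseline; the paper's auction/CSMA
# based algorithm is not reproduced here.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n_users, n_channels, T = 3, 5, 10000
mean_rates = rng.uniform(0.2, 1.0, size=(n_users, n_channels))   # unknown to the users

total_reward = 0.0
for _ in range(T):
    choices = rng.integers(n_channels, size=n_users)     # naive random policy
    for u, c in enumerate(choices):
        if np.sum(choices == c) == 1:                     # no collision on channel c
            total_reward += float(rng.random() < mean_rates[u, c])

# Best static collision-free assignment, as a reference for the achievable rate.
rows, cols = linear_sum_assignment(-mean_rates)
print(total_reward / T, mean_rates[rows, cols].sum())
```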
Learning a Curve Guardian for Motorcycles
Title | Learning a Curve Guardian for Motorcycles |
Authors | Simon Hecker, Alexander Liniger, Henrik Maurenbrecher, Dengxin Dai, Luc Van Gool |
Abstract | Up to 17% of all motorcycle accidents occur when the rider is maneuvering through a curve, and the main cause of curve accidents can be attributed to inappropriate speed and wrong intra-lane position of the motorcycle. Existing curve warning systems lack crucial state estimation components and do not scale well. We propose a new type of road curvature warning system for motorcycles, combining the latest advances in computer vision, optimal control and mapping technologies to alleviate these shortcomings. Our contributions are fourfold: 1) we predict the motorcycle’s intra-lane position using a convolutional neural network (CNN), 2) we predict the motorcycle roll angle using a CNN, 3) we use an upgraded controller model that incorporates road incline for a more realistic model and prediction, 4) we design a scalable system by utilizing the HERE Technologies map database to obtain the accurate road geometry of the future path. In addition, we present two datasets that are used for training and evaluating our system, respectively; both datasets will be made publicly available. We test our system on a diverse set of real world scenarios and present a detailed case study. We show that our system is able to predict more accurate and safer curve trajectories, and consequently warn and improve the safety of motorcyclists. |
Tasks | |
Published | 2019-07-12 |
URL | https://arxiv.org/abs/1907.05738v1 |
https://arxiv.org/pdf/1907.05738v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-a-curve-guardian-for-motorcycles |
Repo | |
Framework | |
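A curve warning ultimately rests on the standard relation between speed, curvature and lateral acceleration (a_lat = v^2 * kappa), so a comfort limit on lateral acceleration implies a maximum curve speed. The sketch below uses only that textbook relation; it is not the optimal-control model or CNN pipeline from the paper.

```python
# Physics behind a simple curve-speed warning: lateral acceleration on a curve
# is a_lat = v^2 * kappa (kappa = curvature = 1/radius), so a chosen comfort
# limit a_max implies v_max = sqrt(a_max / kappa). This standard relation is
# only an illustration, not the controller model used in the paper.
import math

def max_curve_speed(curvature, a_lat_max=3.0):
    """Maximum speed (m/s) keeping lateral acceleration below a_lat_max (m/s^2)."""
    if curvature <= 0:
        return float("inf")                    # straight road: no curve limit
    return math.sqrt(a_lat_max / curvature)

radius_m = 80.0                                # upcoming curve radius (example value)
v_max = max_curve_speed(1.0 / radius_m)
print(f"advisory speed: {v_max * 3.6:.0f} km/h")   # ~56 km/h for an 80 m curve
```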
A GA-based feature selection of the EEG signals by classification evaluation: Application in BCI systems
Title | A GA-based feature selection of the EEG signals by classification evaluation: Application in BCI systems |
Authors | Samira Vafay Eslahi, Nader Jafarnia Dabanloo, Keivan Maghooli |
Abstract | In electroencephalogram (EEG) signal processing, finding the appropriate information in a dataset has been a big challenge for successful signal classification. Feature selection methods make it possible to address this problem; however, it is still under investigation which selection method performs best at extracting the most informative features of the signal and thus improving classification performance. In this study, we use the genetic algorithm (GA), a heuristic search algorithm, to find the optimum combination of feature extraction methods and classifiers for brain-computer interface (BCI) applications. A BCI system is practical only if it achieves high accuracy and high speed at the same time. In the proposed method, the GA serves as a search engine to find the best combination of features and classifiers. The features used here are the Katz, Higuchi, Petrosian, Sevcik, and box-counting dimension (BCD) feature extraction methods. These features are applied to the wavelet subbands and are classified with four classifiers: the adaptive neuro-fuzzy inference system (ANFIS), fuzzy k-nearest neighbors (FKNN), support vector machine (SVM), and linear discriminant analysis (LDA). Due to the huge number of features, GA optimization is used to find the features with the optimum fitness value (FV). Results reveal that the Katz fractal feature estimation method with LDA classification has the best FV. Consequently, due to the low computation time of the first Daubechies wavelet transformation in comparison to the original signal, the final selected methods contain the fractal features of the first coefficient of the detail subbands. |
Tasks | EEG, Feature Selection |
Published | 2019-01-16 |
URL | http://arxiv.org/abs/1903.02081v1 |
http://arxiv.org/pdf/1903.02081v1.pdf | |
PWC | https://paperswithcode.com/paper/a-ga-based-feature-selection-of-the-eeg |
Repo | |
Framework | |
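The search the abstract describes is a GA over combinations of fractal features and classifiers. The skeleton below encodes a chromosome as a binary feature mask plus a classifier index; the fitness function is a stub standing in for the cross-validated accuracy that would actually be computed, so the whole thing is illustrative.

```python
# Skeleton of a GA search over (feature-subset, classifier) combinations, the
# kind of search the abstract describes. The fitness function is a stub; in
# the study it would be a cross-validated classification accuracy on wavelet-
# subband fractal features (Katz, Higuchi, Petrosian, Sevcik, BCD).
import random

FEATURES = ["katz", "higuchi", "petrosian", "sevcik", "bcd"]
CLASSIFIERS = ["anfis", "fknn", "svm", "lda"]
rng = random.Random(0)

def ensure_nonempty(mask):
    return mask if any(mask) else [True] + mask[1:]

def random_chromosome():
    mask = [rng.random() < 0.5 for _ in FEATURES]        # feature on/off bits
    return (ensure_nonempty(mask), rng.randrange(len(CLASSIFIERS)))

def fitness(chrom):
    mask, clf = chrom
    # Stub: replace with the cross-validated accuracy of CLASSIFIERS[clf]
    # trained on the selected fractal features of the EEG wavelet subbands.
    return rng.random() - 0.01 * sum(mask)

def crossover(a, b):
    cut = rng.randrange(1, len(FEATURES))
    return (ensure_nonempty(a[0][:cut] + b[0][cut:]), rng.choice([a[1], b[1]]))

def mutate(chrom, p=0.1):
    mask = [bool(bit) ^ (rng.random() < p) for bit in chrom[0]]
    clf = rng.randrange(len(CLASSIFIERS)) if rng.random() < p else chrom[1]
    return (ensure_nonempty(mask), clf)

pop = [random_chromosome() for _ in range(20)]
for _ in range(30):                                      # generations
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                                   # simple truncation selection
    pop = parents + [mutate(crossover(*rng.sample(parents, 2))) for _ in range(10)]

best_mask, best_clf = max(pop, key=fitness)
print([f for f, on in zip(FEATURES, best_mask) if on], CLASSIFIERS[best_clf])
```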