Paper Group ANR 817
3D Dense Face Alignment via Graph Convolution Networks
Title | 3D Dense Face Alignment via Graph Convolution Networks |
Authors | Huawei Wei, Shuang Liang, Yichen Wei |
Abstract | Recently, the 3D face reconstruction and face alignment tasks have gradually been combined into a single task: 3D dense face alignment. Its goal is to reconstruct the 3D geometric structure of the face together with pose information. In this paper, we propose a graph convolution network to regress 3D face coordinates. Our method directly performs feature learning on the 3D face mesh, where the geometric structure and details are well preserved. Extensive experiments show that our approach gains superior performance over state-of-the-art methods on several challenging datasets. |
Tasks | 3D Face Reconstruction, Face Alignment, Face Reconstruction |
Published | 2019-04-11 |
URL | http://arxiv.org/abs/1904.05562v1 |
PDF | http://arxiv.org/pdf/1904.05562v1.pdf |
PWC | https://paperswithcode.com/paper/3d-dense-face-alignment-via-graph-convolution |
Repo | |
Framework | |
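The abstract describes regressing per-vertex 3D coordinates with graph convolutions applied directly on the face mesh. As a minimal illustration of that idea (not the authors' implementation), the Python sketch below runs a symmetrically normalized graph-convolution layer over a tiny invented mesh; the adjacency, feature sizes, and weights are all hypothetical.

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph-convolution layer: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])             # add self-loops
    d = A_hat.sum(axis=1)                      # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt   # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0.0)     # ReLU

# Toy face "mesh": 5 vertices in a small fan (purely illustrative).
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0],
              [1, 1, 0, 1, 1],
              [0, 1, 1, 0, 1],
              [0, 0, 1, 1, 0]], dtype=float)

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 16))          # per-vertex input features (hypothetical)
W1 = rng.normal(size=(16, 16)) * 0.1
W_out = rng.normal(size=(16, 3)) * 0.1

H1 = gcn_layer(H, A, W1)
xyz = gcn_layer(H1, A, W_out)          # regressed 3D coordinate per mesh vertex
print(xyz.shape)                       # (5, 3)
```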
Schema Matching using Machine Learning
Title | Schema Matching using Machine Learning |
Authors | Tanvi Sahay, Ankita Mehta, Shruti Jadon |
Abstract | Schema Matching is a method of finding attributes that are either linguistically similar to each other or represent the same information. In this project, we take a hybrid approach to solving this problem, making use of both the provided data and the schema name to perform one-to-one schema matching, and we introduce a global dictionary to achieve one-to-many schema matching. We experiment with two methods of one-to-one matching and compare them based on their F-scores, precision, and recall. We also compare our method with previously proposed ones and highlight the differences between them. |
Tasks | |
Published | 2019-11-24 |
URL | https://arxiv.org/abs/1911.11543v1 |
PDF | https://arxiv.org/pdf/1911.11543v1.pdf |
PWC | https://paperswithcode.com/paper/schema-matching-using-machine-learning |
Repo | |
Framework | |
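As a rough illustration of hybrid one-to-one schema matching (not the authors' code), the sketch below scores attribute pairs with a combination of name similarity and data-value overlap and then matches greedily; the weights, threshold, and toy schemas are invented.

```python
import difflib

def name_similarity(a, b):
    """Linguistic similarity of attribute names (edit-based ratio)."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

def value_similarity(vals_a, vals_b):
    """Jaccard overlap of the observed data values."""
    sa, sb = set(vals_a), set(vals_b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def match_schemas(schema_a, schema_b, w_name=0.5, w_data=0.5, threshold=0.45):
    """Greedy one-to-one matching by combined name/data score (invented weights)."""
    scores = []
    for col_a, vals_a in schema_a.items():
        for col_b, vals_b in schema_b.items():
            s = w_name * name_similarity(col_a, col_b) + \
                w_data * value_similarity(vals_a, vals_b)
            scores.append((s, col_a, col_b))
    matches, used_a, used_b = [], set(), set()
    for s, a, b in sorted(scores, reverse=True):
        if s >= threshold and a not in used_a and b not in used_b:
            matches.append((a, b, round(s, 2)))
            used_a.add(a); used_b.add(b)
    return matches

schema_a = {"cust_name": ["alice", "bob"], "zip": ["10001", "94105"]}
schema_b = {"customer_name": ["alice", "carol"], "postal_code": ["10001", "60601"]}
print(match_schemas(schema_a, schema_b))
```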
Analyzing the Interpretability Robustness of Self-Explaining Models
Title | Analyzing the Interpretability Robustness of Self-Explaining Models |
Authors | Haizhong Zheng, Earlence Fernandes, Atul Prakash |
Abstract | Recently, interpretable models called self-explaining models (SEMs) have been proposed with the goal of providing interpretability robustness. We evaluate the interpretability robustness of SEMs and show that explanations provided by SEMs as currently proposed are not robust to adversarial inputs. Specifically, we successfully created adversarial inputs that do not change the model outputs but cause significant changes in the explanations. We find that even though current SEMs use stable coefficients for mapping explanations to output labels, they do not consider the robustness of the first stage of the model that creates interpretable basis concepts from the input, leading to non-robust explanations. Our work makes a case for future work to start examining how to generate interpretable basis concepts in a robust way. |
Tasks | |
Published | 2019-05-27 |
URL | https://arxiv.org/abs/1905.12429v2 |
PDF | https://arxiv.org/pdf/1905.12429v2.pdf |
PWC | https://paperswithcode.com/paper/analyzing-the-interpretability-robustness-of |
Repo | |
Framework | |
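The attack described in the abstract can be illustrated on a toy linear self-explaining model: perturb the input in a direction that changes the basis concepts but lies in the null space of the output gradient, so the prediction is unchanged while the explanation moves. The model, dimensions, and step size below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 10, 4                       # input dim, number of basis concepts (toy sizes)
W = rng.normal(size=(k, d))        # concept encoder: h(x) = W x (first SEM stage)
theta = rng.normal(size=k)         # stable coefficients mapping concepts to the output

def concepts(x):
    return W @ x

def output(x):
    return theta @ concepts(x)

x = rng.normal(size=d)

# Output gradient w.r.t. x is g = W^T theta for this linear toy model.
g = W.T @ theta
g_unit = g / np.linalg.norm(g)

# Take a step that changes concept 0, then project out the component along g
# so the model output stays (numerically) the same.
step = W[0]
delta = step - (step @ g_unit) * g_unit       # projection onto the output's null space
delta *= 0.5 / np.linalg.norm(delta)          # small perturbation budget

x_adv = x + delta
print("output change:     ", abs(output(x_adv) - output(x)))                  # ~0
print("explanation change:", np.linalg.norm(concepts(x_adv) - concepts(x)))   # > 0
```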
Synaptic Delays for Temporal Feature Detection in Dynamic Neuromorphic Processors
Title | Synaptic Delays for Temporal Feature Detection in Dynamic Neuromorphic Processors |
Authors | Fredrik Sandin, Mattias Nilsson |
Abstract | Spiking neural networks implemented in dynamic neuromorphic processors are well suited for spatiotemporal feature detection and learning, for example in ultra-low-power embedded intelligence and deep edge applications. Such pattern recognition networks naturally involve a combination of dynamic delay mechanisms and coincidence detection. Inspired by an auditory feature detection circuit in crickets, featuring a delayed excitation by postinhibitory rebound, we investigate disynaptic delay elements formed by inhibitory-excitatory pairs of dynamic synapses. We configure such disynaptic delay elements in the DYNAP-SE neuromorphic processor and characterize the distribution of delayed excitations resulting from device mismatch. Furthermore, we present a network that mimics the auditory feature detection circuit of crickets and demonstrate how varying synapse weights, input noise and processor temperature affect the circuit. Interestingly, we find that the disynaptic delay elements can be configured such that the timing and magnitude of the delayed postsynaptic excitation depend mainly on the efficacy of the inhibitory and excitatory synapses, respectively. Delay elements of this kind can be implemented in other reconfigurable dynamic neuromorphic processors and open up synapse-level temporal feature tuning with large fan-in and flexible delays on the order of 10-100 ms. |
Tasks | |
Published | 2019-06-28 |
URL | https://arxiv.org/abs/1906.12282v1 |
PDF | https://arxiv.org/pdf/1906.12282v1.pdf |
PWC | https://paperswithcode.com/paper/synaptic-delays-for-temporal-feature |
Repo | |
Framework | |
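The disynaptic delay idea can be sketched without neuromorphic hardware: a leaky integrate-and-fire neuron driven by a fast-decaying inhibitory current and a slower excitatory current fires only after the inhibition has decayed, and a stronger inhibitory synapse lengthens that delay. The simulation below is a plain discrete-time toy with invented time constants, not a DYNAP-SE configuration.

```python
import numpy as np

def delayed_excitation(w_inh, w_exc, dt=1e-4, T=0.2):
    """LIF neuron driven by a fast inhibitory and a slow excitatory synapse,
    both triggered by a single presynaptic spike at t = 0."""
    n = int(T / dt)
    tau_inh, tau_exc, tau_mem = 0.010, 0.060, 0.020   # seconds (illustrative)
    v, v_th = 0.0, 1.0
    i_inh, i_exc = w_inh, w_exc                        # currents set by the input spike
    for step in range(n):
        i_inh -= dt / tau_inh * i_inh                  # fast-decaying inhibition
        i_exc -= dt / tau_exc * i_exc                  # slow-decaying excitation
        v += dt / tau_mem * (-v + i_exc - i_inh)       # leaky integration of net current
        if v >= v_th:
            return step * dt                           # time of the delayed output spike
    return None

for w_inh in (0.0, 2.0, 4.0):
    t_spike = delayed_excitation(w_inh=w_inh, w_exc=3.0)
    msg = "no spike" if t_spike is None else f"{1e3 * t_spike:.1f} ms"
    print(f"inhibitory weight {w_inh}: output spike at {msg}")
```

With these toy constants, increasing the inhibitory weight pushes the output spike later, mirroring the abstract's observation that the delay is set mainly by the inhibitory efficacy.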
Asymmetric Impurity Functions, Class Weighting, and Optimal Splits for Binary Classification Trees
Title | Asymmetric Impurity Functions, Class Weighting, and Optimal Splits for Binary Classification Trees |
Authors | David Zimmermann |
Abstract | We investigate how asymmetrizing an impurity function affects the choice of optimal node splits when growing a decision tree for binary classification. In particular, we relax the usual axioms of an impurity function and show how skewing an impurity function biases the optimal splits to isolate points of a particular class when splitting a node. We give a rigorous definition of this notion, then give a necessary and sufficient condition for such a bias to hold. We also show that the technique of class weighting is equivalent to applying a specific transformation to the impurity function, and tie all these notions together for a class of impurity functions that includes the entropy and Gini impurity. We also briefly discuss cost-insensitive impurity functions and give a characterization of such functions. |
Tasks | |
Published | 2019-04-29 |
URL | http://arxiv.org/abs/1904.12465v1 |
PDF | http://arxiv.org/pdf/1904.12465v1.pdf |
PWC | https://paperswithcode.com/paper/asymmetric-impurity-functions-class-weighting |
Repo | |
Framework | |
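One claim in the abstract, that class weighting is equivalent to a transformation of the impurity function, is easy to check numerically for the Gini impurity; the sketch below does so and also searches for the optimal threshold split under a symmetric and an asymmetric (weighted) impurity. The weights and toy data are invented.

```python
import numpy as np

def gini(p):
    """Standard (symmetric) Gini impurity as a function of the positive proportion."""
    return 2.0 * p * (1.0 - p)

def weighted_gini_from_counts(n_pos, n_neg, w_pos):
    """Gini computed after giving each positive example weight w_pos."""
    total = w_pos * n_pos + n_neg
    return gini(w_pos * n_pos / total) if total else 0.0

def transformed_gini(p, w_pos):
    """The same quantity expressed as a transformation of the impurity in p,
    illustrating the 'class weighting = impurity transformation' idea."""
    return gini(w_pos * p / (w_pos * p + (1.0 - p)))

# The two formulations agree for any class counts and weight.
for n_pos, n_neg in [(1, 9), (3, 5), (7, 2)]:
    p = n_pos / (n_pos + n_neg)
    direct = weighted_gini_from_counts(n_pos, n_neg, w_pos=4.0)
    via_transform = transformed_gini(p, w_pos=4.0)
    print(f"p={p:.2f}: weighted={direct:.4f}, transformed={via_transform:.4f}")

def best_split(x, y, w_pos=1.0):
    """Threshold minimizing the size-weighted child impurity (asymmetric if w_pos != 1)."""
    best = (None, np.inf)
    for t in np.unique(x)[:-1]:
        children = (y[x <= t], y[x > t])
        score = sum(len(c) * weighted_gini_from_counts((c == 1).sum(), (c == 0).sum(), w_pos)
                    for c in children) / len(y)
        if score < best[1]:
            best = (t, score)
    return best

x = np.arange(10, dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 1])
print("symmetric split: ", best_split(x, y, w_pos=1.0))
print("asymmetric split:", best_split(x, y, w_pos=4.0))
```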
bamlss: A Lego Toolbox for Flexible Bayesian Regression (and Beyond)
Title | bamlss: A Lego Toolbox for Flexible Bayesian Regression (and Beyond) |
Authors | Nikolaus Umlauf, Nadja Klein, Thorsten Simon, Achim Zeileis |
Abstract | Over the last decades, the challenges in applied regression and in predictive modeling have been changing considerably: (1) More flexible model specifications are needed as big(ger) data become available, facilitated by more powerful computing infrastructure. (2) Full probabilistic modeling rather than predicting just means or expectations is crucial in many applications. (3) Interest in Bayesian inference has been increasing, both as an appealing framework for regularizing or penalizing model estimation and as a natural alternative to classical frequentist inference. However, while there has been a lot of research in all three areas, also leading to associated software packages, a modular software implementation that allows all three aspects to be combined easily has not yet been available. To fill this gap, the R package bamlss is introduced for Bayesian additive models for location, scale, and shape (and beyond). At the core of the package are algorithms for highly efficient Bayesian estimation and inference that can be applied to generalized additive models (GAMs) or generalized additive models for location, scale, and shape (GAMLSS), also known as distributional regression. However, its building blocks are designed as “Lego bricks” encompassing various distributions (exponential family, Cox, joint models, …), regression terms (linear, splines, random effects, tensor products, spatial fields, …), and estimators (MCMC, backfitting, gradient boosting, lasso, …). It is demonstrated how these can be easily recombined to make classical models more flexible or to create new custom models for specific modeling challenges. |
Tasks | Bayesian Inference |
Published | 2019-09-25 |
URL | https://arxiv.org/abs/1909.11784v1 |
PDF | https://arxiv.org/pdf/1909.11784v1.pdf |
PWC | https://paperswithcode.com/paper/bamlss-a-lego-toolbox-for-flexible-bayesian |
Repo | |
Framework | |
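bamlss is an R package, so the snippet below is only a Python illustration of the distributional-regression (GAMLSS) idea it builds on: both the location and the scale of the response are modeled as functions of covariates. Plain maximum likelihood with linear predictors stands in for the package's Bayesian estimators, and the simulated data are invented.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(-2, 2, size=n)
# Simulated heteroscedastic data: both the mean and the spread depend on x.
y = 1.0 + 2.0 * x + rng.normal(scale=np.exp(0.3 + 0.5 * x))

def negloglik(params):
    """Gaussian likelihood with linear predictors for mu and log(sigma)."""
    b0, b1, g0, g1 = params
    mu = b0 + b1 * x
    sigma = np.exp(g0 + g1 * x)
    return np.sum(0.5 * ((y - mu) / sigma) ** 2 + np.log(sigma))

fit = minimize(negloglik, x0=np.zeros(4), method="BFGS")
b0, b1, g0, g1 = fit.x
print(f"location:  mu(x)        = {b0:.2f} + {b1:.2f} * x")
print(f"scale:     log sigma(x) = {g0:.2f} + {g1:.2f} * x")
```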
End-to-End Environmental Sound Classification using a 1D Convolutional Neural Network
Title | End-to-End Environmental Sound Classification using a 1D Convolutional Neural Network |
Authors | Sajjad Abdoli, Patrick Cardinal, Alessandro Lameiras Koerich |
Abstract | In this paper, we present an end-to-end approach for environmental sound classification based on a 1D Convolutional Neural Network (CNN) that learns a representation directly from the audio signal. Several convolutional layers are used to capture the signal’s fine time structure and learn diverse filters that are relevant to the classification task. The proposed approach can deal with audio signals of any length, as it splits the signal into overlapping frames using a sliding window. Different architectures considering several input sizes are evaluated, including the initialization of the first convolutional layer with a Gammatone filterbank that models the human auditory filter response in the cochlea. The performance of the proposed end-to-end approach in classifying environmental sounds was assessed on the UrbanSound8k dataset, and the experimental results show that it achieves a mean accuracy of 89%. Therefore, the proposed approach outperforms most of the state-of-the-art approaches that use handcrafted features or 2D representations as input. Furthermore, the proposed approach has a small number of parameters compared to other architectures found in the literature, which reduces the amount of data required for training. |
Tasks | Environmental Sound Classification |
Published | 2019-04-18 |
URL | http://arxiv.org/abs/1904.08990v1 |
PDF | http://arxiv.org/pdf/1904.08990v1.pdf |
PWC | https://paperswithcode.com/paper/end-to-end-environmental-sound-classification |
Repo | |
Framework | |
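A hypothetical PyTorch sketch of the overall shape described above: the waveform is split into overlapping frames with a sliding window and each frame is classified by a small stack of 1D convolutions, with frame votes aggregated per clip. Layer sizes, the frame length, and the class count are invented, and the Gammatone initialization is omitted.

```python
import torch
import torch.nn as nn

class Small1DCNN(nn.Module):
    """Tiny stand-in for an end-to-end raw-audio classifier (illustrative sizes)."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=64, stride=8), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=32, stride=2), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=16, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, frames):                     # frames: (batch, 1, frame_len)
        z = self.features(frames).squeeze(-1)      # (batch, 64)
        return self.classifier(z)

# Split an arbitrary-length waveform into 50%-overlapping frames of 16000 samples.
waveform = torch.randn(4 * 16000)                  # ~4 s of fake audio at 16 kHz
frames = waveform.unfold(0, 16000, 8000)           # (n_frames, 16000)
logits = Small1DCNN()(frames.unsqueeze(1))         # (n_frames, n_classes)
clip_prediction = logits.mean(dim=0).argmax()      # aggregate frame votes
print(frames.shape, logits.shape, int(clip_prediction))
```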
Attention based Convolutional Recurrent Neural Network for Environmental Sound Classification
Title | Attention based Convolutional Recurrent Neural Network for Environmental Sound Classification |
Authors | Zhichao Zhang, Shugong Xu, Tianhao Qiao, Shunqing Zhang, Shan Cao |
Abstract | Environmental sound classification (ESC) is a challenging problem due to the complexity of sounds. ESC performance depends heavily on the effectiveness of the representative features extracted from environmental sounds. However, ESC often suffers from semantically irrelevant frames and silent frames. To deal with this, we employ a frame-level attention model to focus on the semantically relevant and salient frames. Specifically, we first propose a convolutional recurrent neural network to learn spectro-temporal features and temporal correlations. Then, we extend our convolutional RNN model with a frame-level attention mechanism to learn discriminative feature representations for ESC. Experiments were conducted on the ESC-50 and ESC-10 datasets. The results demonstrate the effectiveness of the proposed method, which achieves state-of-the-art performance in terms of classification accuracy. |
Tasks | Environmental Sound Classification |
Published | 2019-07-04 |
URL | https://arxiv.org/abs/1907.02230v1 |
PDF | https://arxiv.org/pdf/1907.02230v1.pdf |
PWC | https://paperswithcode.com/paper/attention-based-convolutional-recurrent |
Repo | |
Framework | |
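A hypothetical PyTorch sketch of the frame-level attention idea: a small CNN front-end and a bidirectional GRU produce per-frame features, one attention weight is computed per frame and used to pool them, so silent or irrelevant frames can be down-weighted. All sizes are invented and this is not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class AttentionCRNN(nn.Module):
    """CNN front-end -> GRU -> frame-level attention pooling -> classifier (toy sizes)."""
    def __init__(self, n_mels=64, n_classes=50):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
        )
        self.gru = nn.GRU(input_size=64 * (n_mels // 4), hidden_size=128,
                          batch_first=True, bidirectional=True)
        self.attn = nn.Linear(256, 1)               # one score per time frame
        self.classifier = nn.Linear(256, n_classes)

    def forward(self, spec):                         # spec: (batch, 1, n_mels, n_frames)
        z = self.cnn(spec)                           # (batch, 64, n_mels/4, n_frames)
        b, c, f, t = z.shape
        z = z.permute(0, 3, 1, 2).reshape(b, t, c * f)    # (batch, n_frames, feat)
        h, _ = self.gru(z)                           # (batch, n_frames, 256)
        alpha = torch.softmax(self.attn(h), dim=1)   # attention weights over frames
        pooled = (alpha * h).sum(dim=1)              # emphasize relevant frames
        return self.classifier(pooled)

spec = torch.randn(8, 1, 64, 101)                    # batch of fake log-mel spectrograms
print(AttentionCRNN()(spec).shape)                    # torch.Size([8, 50])
```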
TTS Skins: Speaker Conversion via ASR
Title | TTS Skins: Speaker Conversion via ASR |
Authors | Adam Polyak, Lior Wolf, Yaniv Taigman |
Abstract | We present a fully convolutional wav-to-wav network for converting between speakers’ voices, without relying on text. Our network is based on an encoder-decoder architecture, where the encoder is pre-trained for the task of Automatic Speech Recognition (ASR), and a multi-speaker waveform decoder is trained to reconstruct the original signal in an autoregressive manner. We train the network on narrated audiobooks, and demonstrate the ability to perform multi-voice TTS in those voices, by converting the voice of a TTS robot. We observe no degradation in the quality of the generated voices, in comparison to the reference TTS voice. The modularity of our approach, which separates the target voice generation from the TTS module, enables client-side personalized TTS in a privacy-aware manner. |
Tasks | Speech Recognition |
Published | 2019-04-18 |
URL | http://arxiv.org/abs/1904.08983v1 |
PDF | http://arxiv.org/pdf/1904.08983v1.pdf |
PWC | https://paperswithcode.com/paper/tts-skins-speaker-conversion-via-asr |
Repo | |
Framework | |
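The wav-to-wav structure can be sketched at a skeleton level: a (pretend) pretrained ASR-style encoder extracts speaker-independent content features, which a decoder conditioned on a target-speaker embedding renders back to audio. The modules below are toy stand-ins with invented sizes, not the paper's pretrained ASR encoder or autoregressive waveform decoder.

```python
import torch
import torch.nn as nn

class VoiceConversionSkeleton(nn.Module):
    """Wav-to-wav skeleton: frozen content encoder + speaker-conditioned decoder (toy)."""
    def __init__(self, n_speakers=10, content_dim=128, spk_dim=64):
        super().__init__()
        self.asr_encoder = nn.Sequential(            # stand-in for a frozen ASR encoder
            nn.Conv1d(1, content_dim, kernel_size=400, stride=160), nn.ReLU(),
            nn.Conv1d(content_dim, content_dim, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.speaker_table = nn.Embedding(n_speakers, spk_dim)
        self.decoder = nn.Sequential(                # stand-in for the waveform decoder
            nn.Conv1d(content_dim + spk_dim, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.ConvTranspose1d(256, 1, kernel_size=400, stride=160),
        )
        for p in self.asr_encoder.parameters():      # content encoder stays fixed
            p.requires_grad = False

    def forward(self, wav, speaker_id):              # wav: (batch, 1, samples)
        content = self.asr_encoder(wav)                          # (batch, C, frames)
        spk = self.speaker_table(speaker_id)                     # (batch, spk_dim)
        spk = spk.unsqueeze(-1).expand(-1, -1, content.shape[-1])
        return self.decoder(torch.cat([content, spk], dim=1))

wav = torch.randn(2, 1, 16000)
out = VoiceConversionSkeleton()(wav, torch.tensor([3, 7]))
print(out.shape)    # roughly (2, 1, ~16000): the same content rendered in another voice
```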
Graph Neural Networks for User Identity Linkage
Title | Graph Neural Networks for User Identity Linkage |
Authors | Wen Zhang, Kai Shu, Huan Liu, Yalin Wang |
Abstract | The increasing popularity and diversity of social media sites have encouraged more and more people to participate in multiple online social networks to enjoy their services. Each user may create a user identity to represent his or her unique public figure in every social network. User identity linkage across online social networks is an emerging task that has attracted increasing attention and could potentially impact various domains such as recommendation and link prediction. The majority of existing work focuses on mining network proximity or user profile data to discover user identity linkages. Recent advancements in graph neural networks (GNNs) offer great potential to advance user identity linkage, since users are connected in social graphs and learning latent factors of users and items is the key. However, predicting user identity linkages with GNNs faces challenges. For example, user social graphs encode both local structure, such as users’ neighborhood signals, and global structure, with community properties. To address these challenges simultaneously, in this paper, we present a novel graph neural network framework ({\m}) for user identity linkage. In particular, we provide a principled approach to jointly capture local and global information in the user-user social graph and propose the framework {\m}, which jointly learns user representations for user identity linkage. Extensive experiments on real-world datasets demonstrate the effectiveness of the proposed framework. |
Tasks | |
Published | 2019-03-06 |
URL | http://arxiv.org/abs/1903.02174v1 |
PDF | http://arxiv.org/pdf/1903.02174v1.pdf |
PWC | https://paperswithcode.com/paper/graph-neural-networks-for-user-identity |
Repo | |
Framework | |
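A minimal numpy sketch of the local-plus-global idea: each user gets an embedding from a couple of rounds of neighborhood averaging (local structure) concatenated with a whole-graph summary (global context), and identities are linked across two toy networks by cosine similarity. The graphs, features, and scoring rule are invented and much simpler than the paper's framework.

```python
import numpy as np

def propagate(A, X):
    """One round of mean-neighbourhood aggregation (local structure)."""
    A_hat = A + np.eye(len(A))
    return (A_hat / A_hat.sum(axis=1, keepdims=True)) @ X

def embed(A, X):
    """Concatenate a local (neighbourhood) and a global (whole-graph) signal."""
    local = propagate(A, propagate(A, X))                  # two hops of local structure
    global_ctx = np.repeat(local.mean(axis=0, keepdims=True), len(A), axis=0)
    return np.concatenate([local, global_ctx], axis=1)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

rng = np.random.default_rng(0)
# Two toy social networks with 4 users each; user i in network 1 corresponds to user i in 2.
A1 = np.array([[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]], dtype=float)
A2 = A1.copy()                                             # same structure, noisier features
X1 = rng.normal(size=(4, 8))
X2 = X1 + 0.1 * rng.normal(size=(4, 8))

E1, E2 = embed(A1, X1), embed(A2, X2)
scores = np.array([[cosine(E1[i], E2[j]) for j in range(4)] for i in range(4)])
print("predicted linkage:", scores.argmax(axis=1))         # ideally [0, 1, 2, 3]
```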
Uncertainty Quantification with Statistical Guarantees in End-to-End Autonomous Driving Control
Title | Uncertainty Quantification with Statistical Guarantees in End-to-End Autonomous Driving Control |
Authors | Rhiannon Michelmore, Matthew Wicker, Luca Laurenti, Luca Cardelli, Yarin Gal, Marta Kwiatkowska |
Abstract | Deep neural network controllers for autonomous driving have recently benefited from significant performance improvements, and have begun deployment in the real world. Prior to their widespread adoption, safety guarantees are needed on the controller behaviour that properly take account of the uncertainty within the model as well as sensor noise. Bayesian neural networks, which assume a prior over the weights, have been shown capable of producing such uncertainty measures, but properties surrounding their safety have not yet been quantified for use in autonomous driving scenarios. In this paper, we develop a framework based on a state-of-the-art simulator for evaluating end-to-end Bayesian controllers. In addition to computing pointwise uncertainty measures that can be computed in real time and with statistical guarantees, we also provide a method for estimating the probability that, given a scenario, the controller keeps the car safe within a finite horizon. We experimentally evaluate the quality of uncertainty computation by several Bayesian inference methods in different scenarios and show how the uncertainty measures can be combined and calibrated for use in collision avoidance. Our results suggest that uncertainty estimates can greatly aid decision making in autonomous driving. |
Tasks | Autonomous Driving, Bayesian Inference, Decision Making |
Published | 2019-09-21 |
URL | https://arxiv.org/abs/1909.09884v1 |
PDF | https://arxiv.org/pdf/1909.09884v1.pdf |
PWC | https://paperswithcode.com/paper/190909884 |
Repo | |
Framework | |
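A hypothetical sketch of two ingredients mentioned in the abstract: pointwise uncertainty from a Bayesian approximation (here Monte-Carlo dropout, just one of several possible inference methods) and a Monte-Carlo estimate of the probability that the controller keeps the car within the lane over a finite horizon. The controller, dynamics, and thresholds are toy inventions, not the paper's simulator setup.

```python
import torch
import torch.nn as nn

class DropoutController(nn.Module):
    """Toy end-to-end steering network; dropout is kept active at test time (MC dropout)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(32, 64), nn.ReLU(), nn.Dropout(p=0.2),
            nn.Linear(64, 64), nn.ReLU(), nn.Dropout(p=0.2),
            nn.Linear(64, 1),                          # steering command
        )
    def forward(self, obs):
        return self.net(obs)

@torch.no_grad()
def mc_dropout(model, obs, n_samples=100):
    """Pointwise predictive mean and uncertainty from stochastic forward passes."""
    model.train()                                      # keep dropout active
    preds = torch.stack([model(obs) for _ in range(n_samples)])
    return preds.mean(dim=0), preds.std(dim=0)

@torch.no_grad()
def safety_probability(model, obs, horizon=50, n_rollouts=200, lane_half_width=2.0):
    """Monte-Carlo estimate of P(car stays in lane for `horizon` steps) under a
    toy linear lateral-dynamics model (purely illustrative)."""
    model.train()
    safe = 0
    for _ in range(n_rollouts):
        y, o, ok = 0.0, obs.clone(), True              # lateral offset, observation
        for _ in range(horizon):
            steer = model(o).item()
            y += 0.1 * steer + 0.02 * torch.randn(1).item()   # toy dynamics + noise
            o = o + 0.01 * torch.randn_like(o)                # toy observation drift
            if abs(y) > lane_half_width:
                ok = False
                break
        safe += ok
    return safe / n_rollouts

obs = torch.randn(32)
controller = DropoutController()
mean, std = mc_dropout(controller, obs)
print(f"steering {mean.item():+.3f} +/- {std.item():.3f}")
print("estimated safety probability:", safety_probability(controller, obs))
```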
Domain-Independent turn-level Dialogue Quality Evaluation via User Satisfaction Estimation
Title | Domain-Independent turn-level Dialogue Quality Evaluation via User Satisfaction Estimation |
Authors | Praveen Kumar Bodigutla, Longshaokan Wang, Kate Ridgeway, Joshua Levy, Swanand Joshi, Alborz Geramifard, Spyros Matsoukas |
Abstract | An automated metric to evaluate dialogue quality is vital for optimizing data-driven dialogue management. The common approach of relying on explicit user feedback during a conversation is intrusive and sparse. Current models to estimate user satisfaction use limited feature sets and rely on annotation schemes with low inter-rater reliability, limiting generalizability to conversations spanning multiple domains. To address these gaps, we created a new Response Quality annotation scheme, based on which we developed a turn-level User Satisfaction metric. We introduced five new domain-independent feature sets and experimented with six machine learning models to estimate the new satisfaction metric. Using the Response Quality annotation scheme, across randomly sampled single- and multi-turn conversations from 26 domains, we achieved high inter-annotator agreement (Spearman’s rho 0.94). The Response Quality labels were highly correlated (0.76) with explicit turn-level user ratings. Gradient boosting regression achieved the best correlation of ~0.79 between predicted and annotated user satisfaction labels. The Multi Layer Perceptron and Gradient Boosting regression models generalized better to an unseen domain (linear correlation 0.67) than the other models. Finally, our ablation study verified that our novel features significantly improved model performance. |
Tasks | Dialogue Management |
Published | 2019-08-19 |
URL | https://arxiv.org/abs/1908.07064v1 |
PDF | https://arxiv.org/pdf/1908.07064v1.pdf |
PWC | https://paperswithcode.com/paper/domain-independent-turn-level-dialogue |
Repo | |
Framework | |
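A small sketch of the modeling step with scikit-learn: fit a gradient boosting regressor on turn-level features and report the Spearman correlation between predicted and annotated satisfaction, mirroring the evaluation described above. The features and labels are synthetic stand-ins for the paper's annotated data.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_turns, n_features = 2000, 12                       # invented stand-ins for the paper's
X = rng.normal(size=(n_turns, n_features))           # domain-independent turn features
# Fake annotated Response Quality / satisfaction labels on a 1-5 scale.
latent = X[:, 0] - 0.5 * X[:, 1] + 0.3 * X[:, 2] ** 2 + 0.3 * rng.normal(size=n_turns)
y = np.clip(np.round(3 + latent), 1, 5)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = GradientBoostingRegressor(n_estimators=200, max_depth=3, learning_rate=0.05)
model.fit(X_tr, y_tr)

rho, _ = spearmanr(model.predict(X_te), y_te)
print(f"Spearman correlation between predicted and annotated satisfaction: {rho:.2f}")
```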
Sample Complexity of Reinforcement Learning using Linearly Combined Model Ensembles
Title | Sample Complexity of Reinforcement Learning using Linearly Combined Model Ensembles |
Authors | Aditya Modi, Nan Jiang, Ambuj Tewari, Satinder Singh |
Abstract | Reinforcement learning (RL) methods have been shown to be capable of learning intelligent behavior in rich domains. However, this has largely been done in simulated domains without adequate focus on the process of building the simulator. In this paper, we consider a setting where we have access to an ensemble of pre-trained and possibly inaccurate simulators (models). We approximate the real environment using a state-dependent linear combination of the ensemble, where the coefficients are determined by the given state features and some unknown parameters. Our proposed algorithm provably learns a near-optimal policy with a sample complexity polynomial in the number of unknown parameters, and incurs no dependence on the size of the state (or action) space. As an extension, we also consider the more challenging problem of model selection, where the state features are unknown and can be chosen from a large candidate set. We provide exponential lower bounds that illustrate the fundamental hardness of this problem, and develop a provably efficient algorithm under additional natural assumptions. |
Tasks | Model Selection |
Published | 2019-10-23 |
URL | https://arxiv.org/abs/1910.10597v1 |
PDF | https://arxiv.org/pdf/1910.10597v1.pdf |
PWC | https://paperswithcode.com/paper/sample-complexity-of-reinforcement-learning |
Repo | |
Framework | |
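The approximation at the core of the paper, where the real dynamics are a state-dependent combination of an ensemble of transition models with weights determined by state features and unknown parameters, can be sketched in a few lines of numpy. Here a softmax is used purely to keep the toy weights a valid distribution (the paper's combination is linear in the features), and all sizes and models are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, n_models, n_features = 6, 3, 4, 5

# Ensemble of pre-trained (possibly inaccurate) transition models P_i(s'|s,a).
ensemble = rng.dirichlet(np.ones(n_states), size=(n_models, n_states, n_actions))

# State features phi(s) and unknown parameters theta define state-dependent weights w_i(s).
phi = rng.normal(size=(n_states, n_features))
theta = rng.normal(size=(n_features, n_models))      # the parameters to be learned

def combined_transition(s, a):
    """Approximate the real dynamics as a state-dependent mixture of the ensemble."""
    w = np.exp(phi[s] @ theta)
    w /= w.sum()                                     # w_i(s), one weight per model
    return np.einsum("i,ij->j", w, ensemble[:, s, a, :])

p = combined_transition(s=2, a=1)
print("next-state distribution:", np.round(p, 3), "sums to", p.sum().round(3))
next_state = rng.choice(n_states, p=p)               # sample one rollout step
print("sampled next state:", next_state)
```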
Learning to Estimate Pose and Shape of Hand-Held Objects from RGB Images
Title | Learning to Estimate Pose and Shape of Hand-Held Objects from RGB Images |
Authors | Mia Kokic, Danica Kragic, Jeannette Bohg |
Abstract | We develop a system for modeling hand-object interactions in 3D from RGB images that show a hand which is holding a novel object from a known category. We design a Convolutional Neural Network (CNN) for Hand-held Object Pose and Shape estimation called HOPS-Net and utilize prior work to estimate the hand pose and configuration. We leverage the insight that information about the hand facilitates object pose and shape estimation by incorporating the hand into both training and inference of the object pose and shape as well as the refinement of the estimated pose. The network is trained on a large synthetic dataset of objects in interaction with a human hand. To bridge the gap between real and synthetic images, we employ an image-to-image translation model (Augmented CycleGAN) that generates realistically textured objects given a synthetic rendering. This provides a scalable way of generating annotated data for training HOPS-Net. Our quantitative experiments show that even noisy hand parameters significantly help object pose and shape estimation. The qualitative experiments show results of pose and shape estimation of objects held by a hand “in the wild”. |
Tasks | Image-to-Image Translation |
Published | 2019-03-08 |
URL | https://arxiv.org/abs/1903.03340v3 |
PDF | https://arxiv.org/pdf/1903.03340v3.pdf |
PWC | https://paperswithcode.com/paper/learning-to-estimate-pose-and-shape-of-hand |
Repo | |
Framework | |
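A hypothetical PyTorch sketch of the fusion idea: image features are concatenated with estimated hand parameters before regressing the object's pose and a shape code. The backbone, dimensions, and output parameterization are invented, and the CycleGAN-based sim-to-real step is omitted.

```python
import torch
import torch.nn as nn

class HandConditionedObjectNet(nn.Module):
    """Toy stand-in for HOPS-Net-style fusion: image features plus hand parameters
    are used to regress object pose and shape (invented sizes)."""
    def __init__(self, hand_dim=26, shape_dim=10):
        super().__init__()
        self.backbone = nn.Sequential(                 # tiny CNN in place of a real backbone
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(32 + hand_dim, 128), nn.ReLU(),
            nn.Linear(128, 7 + shape_dim),             # 3D position + quaternion + shape code
        )

    def forward(self, image, hand_params):
        z = torch.cat([self.backbone(image), hand_params], dim=1)
        out = self.head(z)
        position, quaternion, shape = out[:, :3], out[:, 3:7], out[:, 7:]
        return position, quaternion / quaternion.norm(dim=1, keepdim=True), shape

image = torch.randn(2, 3, 128, 128)                    # RGB crops of the hand-object region
hand = torch.randn(2, 26)                              # estimated hand pose/configuration
pos, quat, shape = HandConditionedObjectNet()(image, hand)
print(pos.shape, quat.shape, shape.shape)               # (2, 3) (2, 4) (2, 10)
```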
Data mining Mandarin tone contour shapes
Title | Data mining Mandarin tone contour shapes |
Authors | Shuo Zhang |
Abstract | In spontaneous speech, Mandarin tones that belong to the same tone category may exhibit many different contour shapes. We explore the use of data mining and NLP techniques for understanding the variability of tones in a large corpus of Mandarin newscast speech. First, we adapt a graph-based approach to characterize the clusters (fuzzy types) of tone contour shapes observed in each tone n-gram category. Second, we show correlations between these realized contour shape types and a bag of automatically extracted linguistic features. We discuss the implications of the current study within the context of phonological and information theory. |
Tasks | |
Published | 2019-07-02 |
URL | https://arxiv.org/abs/1907.01668v1 |
PDF | https://arxiv.org/pdf/1907.01668v1.pdf |
PWC | https://paperswithcode.com/paper/data-mining-mandarin-tone-contour-shapes |
Repo | |
Framework | |
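The paper uses a graph-based approach to find fuzzy contour-shape types; as a simpler stand-in, the sketch below clusters time-normalized, z-scored F0 contours with k-means on synthetic data. The contour shapes and counts are invented.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 30)                       # 30 time-normalized samples per syllable

# Synthetic F0 contours loosely imitating rising, falling and dipping shapes.
def make_contours(shape_fn, n):
    return np.stack([shape_fn(t) + 0.05 * rng.normal(size=t.size) for _ in range(n)])

contours = np.vstack([
    make_contours(lambda t: t, 40),             # rising
    make_contours(lambda t: 1 - t, 40),         # falling
    make_contours(lambda t: (t - 0.5) ** 2, 40) # dipping
])

# Speaker/pitch-range normalization: z-score each contour before clustering.
z = (contours - contours.mean(axis=1, keepdims=True)) / contours.std(axis=1, keepdims=True)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(z)
print("contours per discovered shape type:", np.bincount(kmeans.labels_))
```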