Paper Group ANR 954
Recovery of Noisy Points on Band-limited Surfaces: Kernel Methods Re-explained
Title | Recovery of Noisy Points on Band-limited Surfaces: Kernel Methods Re-explained |
Authors | Sunrita Poddar, Mathews Jacob |
Abstract | We introduce a continuous domain framework for the recovery of points on a surface in high-dimensional space, represented as the zero-level set of a bandlimited function. We show that the exponential maps of the points on the surface satisfy annihilation relations, implying that they lie in a finite-dimensional subspace. The subspace properties are used to derive sampling conditions, which guarantee the perfect recovery of the surface from a finite number of points. We rely on nuclear norm minimization to exploit the low-rank structure of the maps to recover the points from noisy measurements. Since the direct estimation of the surface is computationally prohibitive in very high dimensions, we propose an iterative reweighted algorithm using the “kernel trick”. The iterative algorithm reveals deep links to Laplacian-based algorithms widely used in graph signal processing; the theory and the sampling conditions can serve as a basis for discrete-continuous domain processing of signals on a graph. |
Tasks | |
Published | 2018-01-03 |
URL | http://arxiv.org/abs/1801.00890v2 |
http://arxiv.org/pdf/1801.00890v2.pdf | |
PWC | https://paperswithcode.com/paper/recovery-of-noisy-points-on-band-limited |
Repo | |
Framework | |
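The link the authors draw to Laplacian-based graph signal processing can be illustrated with a minimal numpy sketch: build a Gaussian kernel matrix over noisy points sampled from a circle (a simple zero-level set), form the graph Laplacian from the kernel weights, and apply one Tikhonov-regularized smoothing step. The circle data, `sigma`, and `lam` are illustrative choices, not the paper's; the actual algorithm uses iteratively reweighted nuclear norm minimization.

```python
import numpy as np

# Noisy samples of a circle (zero-level set of a simple bandlimited function).
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 200)
X = np.stack([np.cos(t), np.sin(t)], axis=1) + 0.05 * rng.normal(size=(200, 2))

# Gaussian kernel matrix via the "kernel trick": K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).
sigma = 0.3
D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-D2 / (2 * sigma ** 2))

# Graph Laplacian built from the kernel weights.
L = np.diag(K.sum(axis=1)) - K

# One Tikhonov-regularized smoothing step: argmin_Z ||Z - X||^2 + lam * tr(Z^T L Z).
lam = 0.05
Z = np.linalg.solve(np.eye(len(X)) + lam * L, X)
print("mean radius before/after:", np.linalg.norm(X, axis=1).mean(),
      np.linalg.norm(Z, axis=1).mean())
```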
Learning to Localize Sound Source in Visual Scenes
Title | Learning to Localize Sound Source in Visual Scenes |
Authors | Arda Senocak, Tae-Hyun Oh, Junsik Kim, Ming-Hsuan Yang, In So Kweon |
Abstract | Visual events are usually accompanied by sounds in our daily lives. We pose the question: can a machine learn the correspondence between a visual scene and its sound, and localize the sound source only by observing sound and visual scene pairs, as humans do? In this paper, we propose a novel unsupervised algorithm to address the problem of localizing sound sources in visual scenes. A two-stream network structure, which handles each modality with an attention mechanism, is developed for sound source localization. Moreover, although our network is formulated within the unsupervised learning framework, it can be extended with a simple modification to a unified architecture covering the supervised and semi-supervised learning settings as well. We also develop a new sound source dataset for performance evaluation. Our empirical evaluation shows that the unsupervised method can reach false conclusions in some cases. We show that even with a small amount of supervision, these false conclusions can be corrected and the sound source in a visual scene can be localized effectively. |
Tasks | |
Published | 2018-03-10 |
URL | http://arxiv.org/abs/1803.03849v1 |
http://arxiv.org/pdf/1803.03849v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-localize-sound-source-in-visual |
Repo | |
Framework | |
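A hedged sketch of the core attention idea, assuming (as is common for this kind of two-stream model) that the attention map comes from the cosine similarity between an audio embedding and each visual feature location; the feature sizes and the random tensors standing in for the two encoders are placeholders, not the paper's architecture.

```python
import torch
import torch.nn.functional as F

def sound_localization_attention(visual_feat, audio_feat):
    """visual_feat: (B, C, H, W) conv features; audio_feat: (B, C) sound embedding.
    Returns a (B, H, W) attention map highlighting likely sound-source locations."""
    B, C, H, W = visual_feat.shape
    v = F.normalize(visual_feat.flatten(2), dim=1)   # (B, C, H*W)
    a = F.normalize(audio_feat, dim=1).unsqueeze(1)  # (B, 1, C)
    sim = torch.bmm(a, v).squeeze(1)                 # cosine similarity per location
    return F.softmax(sim, dim=1).view(B, H, W)

# Toy usage with random features standing in for the two stream encoders.
att = sound_localization_attention(torch.randn(2, 512, 14, 14), torch.randn(2, 512))
print(att.shape, att.sum(dim=(1, 2)))  # attention sums to 1 per image
```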
A chemical language based approach for protein - ligand interaction prediction
Title | A chemical language based approach for protein - ligand interaction prediction |
Authors | Hakime Öztürk, Arzucan Özgür, Elif Ozkirimli |
Abstract | Identification of high affinity drug-target interactions (DTI) is a major research question in drug discovery. In this study, we propose a novel methodology to predict drug-target binding affinity using only ligand SMILES information. We represent proteins using the word-embeddings of the SMILES representations of their strong binding ligands. Each SMILES is represented in the form of a set of chemical words and a protein is described by the set of chemical words with the highest Term Frequency-Inverse Document Frequency (TF-IDF) value. We then utilize the Support Vector Regression (SVR) algorithm to predict protein-drug binding affinities in the Davis and KIBA kinase datasets. We also compared the performance of the SMILES representation with the recently proposed DeepSMILES representation and found that using DeepSMILES yields better performance in the prediction task. Using only SMILESVec, which is a strictly string-based representation of the proteins based on their interacting ligands, we were able to predict drug-target binding affinity as well as or better than the KronRLS or SimBoost models that utilize protein sequence. |
Tasks | Drug Discovery, Word Embeddings |
Published | 2018-11-02 |
URL | http://arxiv.org/abs/1811.00761v1 |
http://arxiv.org/pdf/1811.00761v1.pdf | |
PWC | https://paperswithcode.com/paper/a-chemical-language-based-approach-for |
Repo | |
Framework | |
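A minimal sketch of the chemical-word TF-IDF plus SVR pipeline, simplified to ligand-only features. The ligand SMILES and affinities below are toy placeholders, not the Davis/KIBA data, and the 8-character word length is an assumption.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVR

def chemical_words(smiles, k=8):
    """Split a SMILES string into overlapping k-character 'chemical words'."""
    return [smiles[i:i + k] for i in range(max(1, len(smiles) - k + 1))]

# Toy ligand SMILES and made-up binding affinities (placeholders only).
ligands = ["CC(=O)Oc1ccccc1C(=O)O", "CN1C=NC2=C1C(=O)N(C)C(=O)N2C", "CCO"]
affinities = np.array([5.2, 6.8, 4.1])

vec = TfidfVectorizer(analyzer=chemical_words)
X = vec.fit_transform(ligands)          # TF-IDF weights over chemical words

model = SVR(kernel="rbf").fit(X, affinities)
print(model.predict(vec.transform(["CCN(CC)CC"])))
```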
Deep Confidence: A Computationally Efficient Framework for Calculating Reliable Errors for Deep Neural Networks
Title | Deep Confidence: A Computationally Efficient Framework for Calculating Reliable Errors for Deep Neural Networks |
Authors | Isidro Cortes-Ciriano, Andreas Bender |
Abstract | Deep learning architectures have proved versatile in a number of drug discovery applications, including the modelling of in vitro compound activity. While controlling for prediction confidence is essential to increase the trust, interpretability and usefulness of virtual screening models in drug discovery, techniques to estimate the reliability of the predictions generated with deep learning networks remain largely underexplored. Here, we present Deep Confidence, a framework to compute valid and efficient confidence intervals for individual predictions using the deep learning technique Snapshot Ensembling and conformal prediction. Specifically, Deep Confidence generates an ensemble of deep neural networks by recording the network parameters throughout the local minima visited during the optimization phase of a single neural network. This approach serves to derive a set of base learners (i.e., snapshots) with comparable predictive power on average that will, however, generate slightly different predictions for a given instance. The variability across base learners and the validation residuals are in turn harnessed to compute confidence intervals using the conformal prediction framework. Using a set of 24 diverse IC50 data sets from ChEMBL 23, we show that Snapshot Ensembles perform on par with Random Forest (RF) and ensembles of independently trained deep neural networks. In addition, we find that the confidence regions predicted using the Deep Confidence framework span a narrower set of values. Overall, Deep Confidence represents a highly versatile error prediction framework that can be applied to any deep learning-based application at no extra computational cost. |
Tasks | Drug Discovery |
Published | 2018-09-24 |
URL | http://arxiv.org/abs/1809.09060v1 |
http://arxiv.org/pdf/1809.09060v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-confidence-a-computationally-efficient |
Repo | |
Framework | |
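A sketch of the conformal prediction step on top of an ensemble, assuming the standard inductive conformal recipe with a normalized nonconformity score. The `alpha` and `beta` values and the random stand-ins for snapshot predictions are illustrative, and the finite-sample quantile correction is omitted for brevity.

```python
import numpy as np

def conformal_intervals(ens_cal, y_cal, ens_test, alpha=0.2, beta=0.1):
    """Inductive conformal regression with ensemble-variability normalization.
    ens_*: (n_models, n_samples) predictions from snapshot ensemble members."""
    mu_cal, sd_cal = ens_cal.mean(0), ens_cal.std(0)
    # Normalized nonconformity: large residuals are tolerated where snapshots disagree.
    scores = np.abs(y_cal - mu_cal) / (sd_cal + beta)
    q = np.quantile(scores, 1 - alpha)
    mu, sd = ens_test.mean(0), ens_test.std(0)
    half = q * (sd + beta)
    return mu - half, mu + half

# Toy usage with random stand-ins for snapshot predictions.
rng = np.random.default_rng(1)
y_cal = rng.normal(size=100)
ens_cal = y_cal + rng.normal(scale=0.3, size=(5, 100))
ens_test = rng.normal(size=(5, 10))
lo, hi = conformal_intervals(ens_cal, y_cal, ens_test)
print(hi - lo)  # per-compound interval widths
```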
On the Learning of Deep Local Features for Robust Face Spoofing Detection
Title | On the Learning of Deep Local Features for Robust Face Spoofing Detection |
Authors | Gustavo Botelho de Souza, João Paulo Papa, Aparecido Nilceu Marana |
Abstract | Biometrics emerged as a robust solution for security systems. However, given the dissemination of biometric applications, criminals are developing techniques to circumvent them by simulating the physical or behavioral traits of legitimate users (spoofing attacks). Although the face is a promising trait given its universality, acceptability, and the presence of cameras almost everywhere, face recognition systems are extremely vulnerable to such frauds, since they can be easily fooled with common printed facial photographs. State-of-the-art approaches based on Convolutional Neural Networks (CNNs) present good results in face spoofing detection. However, these methods do not consider the importance of learning deep local features from each facial region, even though it is known from face recognition that each facial region presents different visual aspects, which can also be exploited for face spoofing detection. In this work we propose a novel CNN architecture trained in two steps for this task. Initially, each part of the neural network learns features from a given facial region. Afterwards, the whole model is fine-tuned on whole facial images. Results show that this pre-training step allows the CNN to learn different local spoofing cues, improving the performance and the convergence speed of the final model and outperforming state-of-the-art approaches. |
Tasks | Face Recognition |
Published | 2018-06-19 |
URL | http://arxiv.org/abs/1806.07492v2 |
http://arxiv.org/pdf/1806.07492v2.pdf | |
PWC | https://paperswithcode.com/paper/on-the-learning-of-deep-local-features-for |
Repo | |
Framework | |
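A toy PyTorch sketch of the two-step training scheme: region branches are first pretrained on facial-region crops, then their feature extractors are fused and the whole model is fine-tuned on full faces. The number of regions, layer sizes, and the single-step "training" loops are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

class RegionCNN(nn.Module):
    """One branch that learns deep local features from a single facial region."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten())
        self.head = nn.Linear(16 * 4 * 4, 2)   # live vs. spoof

    def forward(self, x):
        return self.head(self.features(x))

branches = [RegionCNN() for _ in range(4)]   # e.g. eyes, nose, mouth, chin crops

# Step 1: pretrain each branch on crops of its facial region (toy tensors here).
for net in branches:
    opt = torch.optim.SGD(net.parameters(), lr=0.01)
    crops, labels = torch.randn(8, 3, 32, 32), torch.randint(0, 2, (8,))
    loss = nn.functional.cross_entropy(net(crops), labels)
    loss.backward(); opt.step()

# Step 2: fuse the pretrained feature extractors and fine-tune on whole faces.
class FusedCNN(nn.Module):
    def __init__(self, branches):
        super().__init__()
        self.branches = nn.ModuleList(b.features for b in branches)
        self.head = nn.Linear(4 * 16 * 4 * 4, 2)

    def forward(self, face_regions):          # list of 4 region tensors
        feats = [b(x) for b, x in zip(self.branches, face_regions)]
        return self.head(torch.cat(feats, dim=1))

model = FusedCNN(branches)
regions = [torch.randn(8, 3, 32, 32) for _ in range(4)]
print(model(regions).shape)  # (8, 2)
```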
Deep Learning for Lip Reading using Audio-Visual Information for Urdu Language
Title | Deep Learning for Lip Reading using Audio-Visual Information for Urdu Language |
Authors | M Faisal, Sanaullah Manzoor |
Abstract | Human lip-reading is a challenging task. It requires not only knowledge of the underlying language but also visual clues to predict spoken words. Experts need a certain level of experience and understanding of visual expressions to decode spoken words. Nowadays, with the help of deep learning, it is possible to translate lip sequences into meaningful words. Speech recognition in noisy environments can be improved with visual information [1]. To demonstrate this, in this project we train two different deep-learning models for lip-reading: one for video sequences, using a spatiotemporal convolutional neural network, a bidirectional gated recurrent neural network and the Connectionist Temporal Classification loss, and one for audio, which feeds MFCC features into a layer of LSTM cells and outputs the sequence. We have also collected a small audio-visual dataset to train and test our models. Our goal is to integrate both models to improve speech recognition in noisy environments. |
Tasks | Speech Recognition |
Published | 2018-02-15 |
URL | http://arxiv.org/abs/1802.05521v1 |
http://arxiv.org/pdf/1802.05521v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-for-lip-reading-using-audio |
Repo | |
Framework | |
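A hedged PyTorch sketch of the video branch as described: spatiotemporal convolution, a bidirectional GRU, and the CTC loss. Layer sizes, vocabulary size, and the mouth-crop resolution are assumptions; the audio MFCC+LSTM branch is omitted for brevity.

```python
import torch
import torch.nn as nn

class LipNetSketch(nn.Module):
    """Spatiotemporal conv -> bidirectional GRU -> per-frame character logits."""
    def __init__(self, vocab=30):
        super().__init__()
        self.stconv = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(), nn.MaxPool3d((1, 2, 2)))
        self.gru = nn.GRU(input_size=32 * 16 * 16, hidden_size=128,
                          bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * 128, vocab)

    def forward(self, video):                    # (B, 3, T, 32, 32) mouth crops
        f = self.stconv(video)                   # (B, 32, T, 16, 16)
        B, C, T, H, W = f.shape
        f = f.permute(0, 2, 1, 3, 4).reshape(B, T, C * H * W)
        out, _ = self.gru(f)
        return self.fc(out).log_softmax(-1)      # (B, T, vocab)

model, ctc = LipNetSketch(), nn.CTCLoss(blank=0)
logp = model(torch.randn(2, 3, 20, 32, 32)).permute(1, 0, 2)  # (T, B, vocab) for CTC
targets = torch.randint(1, 30, (2, 7))
loss = ctc(logp, targets, torch.full((2,), 20), torch.full((2,), 7))
print(loss.item())
```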
Active choice of teachers, learning strategies and goals for a socially guided intrinsic motivation learner
Title | Active choice of teachers, learning strategies and goals for a socially guided intrinsic motivation learner |
Authors | Sao Mai Nguyen, Pierre-Yves Oudeyer |
Abstract | We present an active learning architecture that allows a robot to actively learn which data collection strategy is most efficient for acquiring motor skills to achieve multiple outcomes, and generalise over its experience to achieve new outcomes. The robot explores its environment both via interactive learning and goal-babbling. It learns at the same time when, whom and what to actively imitate from several available teachers, and learns when not to use social guidance but use active goal-oriented self-exploration. This is formalised in the framework of life-long strategic learning. The proposed architecture, called Socially Guided Intrinsic Motivation with Active Choice of Teacher and Strategy (SGIM-ACTS), relies on hierarchical active decisions of what and how to learn driven by empirical evaluation of learning progress for each learning strategy. We illustrate with an experiment where a simulated robot learns to control its arm to realise two different kinds of outcomes. It has to choose actively and hierarchically at each learning episode: 1) what to learn: which outcome is most interesting to select as a goal to focus on for goal-directed exploration; 2) how to learn: which data collection strategy to use among self-exploration, mimicry and emulation; 3) once it has decided when and what to imitate by choosing mimicry or emulation, whom to imitate from a set of different teachers. We show that SGIM-ACTS learns significantly more efficiently than any single learning strategy, and coherently selects the best strategy with respect to the chosen outcome, taking advantage of the available teachers (with different levels of skill). |
Tasks | Active Learning |
Published | 2018-04-18 |
URL | http://arxiv.org/abs/1804.06819v1 |
http://arxiv.org/pdf/1804.06819v1.pdf | |
PWC | https://paperswithcode.com/paper/active-choice-of-teachers-learning-strategies |
Repo | |
Framework | |
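A minimal sketch of the strategic-learning core: the learner tracks empirical learning progress per (outcome, strategy) pair and actively picks the pair with the highest progress, with some exploration. The toy "world" and all constants are illustrative stand-ins for SGIM-ACTS's hierarchical decisions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
outcomes, strategies = 2, 3   # e.g. two goal spaces; self-exploration, mimicry, emulation
errors = {(o, s): [1.0] for o in range(outcomes) for s in range(strategies)}

def learning_progress(hist, window=5):
    """Empirical progress: how much the recent error dropped versus just before."""
    if len(hist) < 2 * window:
        return float("inf")               # optimistic start: try every pair first
    return max(0.0, np.mean(hist[-2 * window:-window]) - np.mean(hist[-window:]))

for episode in range(300):
    if rng.random() < 0.2:                # epsilon-exploration over the choices
        choice = (int(rng.integers(outcomes)), int(rng.integers(strategies)))
    else:                                 # hierarchical active choice by progress
        choice = max(errors, key=lambda k: learning_progress(errors[k]))
    o, s = choice
    rate = 0.9 if s == o % strategies else 0.99   # toy world: strategies suit outcomes differently
    errors[choice].append(errors[choice][-1] * rate + 0.01 * rng.random())

print({k: round(v[-1], 3) for k, v in errors.items()})
```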
Gender Aware Spoken Language Translation Applied to English-Arabic
Title | Gender Aware Spoken Language Translation Applied to English-Arabic |
Authors | Mostafa Elaraby, Ahmed Y. Tawfik, Mahmoud Khaled, Hany Hassan, Aly Osama |
Abstract | Spoken Language Translation (SLT) is becoming more widely used as a communication tool that helps cross language barriers. One of the challenges of SLT is translation from a language without gender agreement to a language with gender agreement, such as English to Arabic. In this paper, we introduce an approach to tackle this limitation by enabling a Neural Machine Translation (NMT) system to produce gender-aware translation. We show that an NMT system can model speaker/listener gender information to produce gender-aware translation. We propose a method to generate the data used to adapt an NMT system to produce gender-aware translations. The proposed approach achieves a significant improvement in translation quality of 2 BLEU points. |
Tasks | Machine Translation |
Published | 2018-02-26 |
URL | http://arxiv.org/abs/1802.09287v1 |
http://arxiv.org/pdf/1802.09287v1.pdf | |
PWC | https://paperswithcode.com/paper/gender-aware-spoken-language-translation |
Repo | |
Framework | |
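A sketch of one plausible way to realize the data-generation idea: prepend speaker/listener gender tokens to the English source so the NMT system can condition on them during adaptation. The `<S:...>`/`<L:...>` token scheme is hypothetical, not necessarily the paper's.

```python
def add_gender_tags(source, speaker, listener):
    """Prepend speaker/listener gender tokens to an English source sentence so an
    NMT system can condition its Arabic output on gender agreement."""
    return f"<S:{speaker}> <L:{listener}> {source}"

# The tagged corpus is then used to adapt (fine-tune) a baseline NMT system.
parallel = [
    ("How are you?", "F", "M"),
    ("I am a teacher.", "M", "M"),
]
for src, spk, lst in parallel:
    print(add_gender_tags(src, spk, lst))
```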
Hallucinating very low-resolution and obscured face images
Title | Hallucinating very low-resolution and obscured face images |
Authors | Lianping Yang, Bin Shao, Ting Sun, Song Ding, Xiangde Zhang |
Abstract | Most face hallucination methods are designed for complete inputs. They do not work well if the inputs are very tiny or contaminated by large occlusions. Motivated by this, we propose an obscured face hallucination network (OFHNet). The OFHNet consists of four parts: an inpainting network, an upsampling network, a discriminative network, and a fixed facial landmark detection network. The inpainting network restores the low-resolution (LR) obscured face images. The upsampling network then upsamples the output of the inpainting network. To make the generated high-resolution (HR) face images more photo-realistic, we use the discriminative network and the facial landmark detection network to improve the result of the upsampling network. In addition, we present a semantic structure loss, which makes the generated HR face images more pleasing. Extensive experiments show that our framework can restore appealing HR face images from LR face images with 1/4 of their area missing and a challenging scaling factor of 8x. |
Tasks | Face Hallucination, Facial Landmark Detection |
Published | 2018-11-12 |
URL | http://arxiv.org/abs/1811.04645v4 |
http://arxiv.org/pdf/1811.04645v4.pdf | |
PWC | https://paperswithcode.com/paper/hallucinating-very-low-resolution-and |
Repo | |
Framework | |
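A toy PyTorch sketch of the OFHNet pipeline's data flow: an inpainting network applied to the masked LR face, an 8x upsampling network, and a discriminator for the adversarial loss. All layer choices are placeholders; the fixed facial landmark detection network and the semantic structure loss are omitted.

```python
import torch
import torch.nn as nn

conv = lambda i, o: nn.Sequential(nn.Conv2d(i, o, 3, padding=1), nn.ReLU())

# Inpainting network: restores the occluded LR face (mask channel appended).
inpaint = nn.Sequential(conv(4, 32), conv(32, 32), nn.Conv2d(32, 3, 3, padding=1))

# Upsampling network: 8x super-resolution of the inpainted LR face.
upsample = nn.Sequential(
    conv(3, 64), nn.Conv2d(64, 3 * 8 * 8, 3, padding=1),
    nn.PixelShuffle(8))                       # (B, 3, 8H, 8W)

# Discriminator: real/fake score used for the adversarial loss.
disc = nn.Sequential(conv(3, 32), nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                     nn.Linear(32, 1))

lr_face = torch.randn(2, 3, 16, 16)           # tiny, obscured input
mask = torch.zeros(2, 1, 16, 16); mask[:, :, 4:8, 4:8] = 1.0
restored = inpaint(torch.cat([lr_face * (1 - mask), mask], dim=1))
hr_face = upsample(restored)
print(hr_face.shape, disc(hr_face).shape)     # (2, 3, 128, 128), (2, 1)
```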
End-to-End Physics Event Classification with CMS Open Data: Applying Image-Based Deep Learning to Detector Data for the Direct Classification of Collision Events at the LHC
Title | End-to-End Physics Event Classification with CMS Open Data: Applying Image-Based Deep Learning to Detector Data for the Direct Classification of Collision Events at the LHC |
Authors | Michael Andrews, Manfred Paulini, Sergei Gleyzer, Barnabas Poczos |
Abstract | This paper describes the construction of novel end-to-end image-based classifiers that directly leverage low-level simulated detector data to discriminate signal and background processes in pp collision events at the Large Hadron Collider at CERN. To better understand what end-to-end classifiers are capable of learning from the data and to address a number of associated challenges, we distinguish the decay of the standard model Higgs boson into two photons from its leading background sources using high-fidelity simulated CMS Open Data. We demonstrate the ability of end-to-end classifiers to learn from the angular distribution of the photons recorded as electromagnetic showers, their intrinsic shapes, and the energy of their constituent hits, even when the underlying particles are not fully resolved, delivering a clear advantage in such cases over purely kinematics-based classifiers. |
Tasks | |
Published | 2018-07-31 |
URL | https://arxiv.org/abs/1807.11916v2 |
https://arxiv.org/pdf/1807.11916v2.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-physics-event-classification-with |
Repo | |
Framework | |
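A minimal sketch of an end-to-end, image-based event classifier, assuming detector data rendered as multi-channel images (for example one channel per subdetector); the channel count, resolution, and architecture are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

# Detector data as multi-channel "images": e.g. one channel per subdetector
# (ECAL energy, HCAL energy, track hits), pixels = calorimeter cells.
class EventClassifier(nn.Module):
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 1))                  # signal (H -> gamma gamma) vs background

    def forward(self, x):
        return torch.sigmoid(self.net(x))

events = torch.randn(4, 3, 64, 64)            # toy stand-in for simulated detector images
print(EventClassifier()(events).shape)        # (4, 1) signal probabilities
```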
On the Curved Geometry of Accelerated Optimization
Title | On the Curved Geometry of Accelerated Optimization |
Authors | Aaron Defazio |
Abstract | In this work we propose a differential geometric motivation for Nesterov’s accelerated gradient method (AGM) for strongly-convex problems. By considering the optimization procedure as occurring on a Riemannian manifold with a natural structure, the AGM can be seen as the proximal point method applied in this curved space. This viewpoint can also be extended to the continuous-time case, where the accelerated gradient method arises from the natural block-implicit Euler discretization of an ODE on the manifold. We provide an analysis of the convergence rate of this ODE for quadratic objectives. |
Tasks | |
Published | 2018-12-11 |
URL | https://arxiv.org/abs/1812.04634v2 |
https://arxiv.org/pdf/1812.04634v2.pdf | |
PWC | https://paperswithcode.com/paper/on-the-curved-geometry-of-accelerated |
Repo | |
Framework | |
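The AGM iteration discussed here can be written down concretely; below is a numpy sketch for a strongly-convex quadratic, using the standard constant momentum for the strongly-convex case. The problem instance is random and illustrative.

```python
import numpy as np

# Nesterov's AGM on a strongly-convex quadratic f(x) = 0.5 x^T A x - b^T x.
rng = np.random.default_rng(0)
Q = rng.normal(size=(20, 20))
A = Q @ Q.T + np.eye(20)                       # symmetric, eigenvalues >= 1
b = rng.normal(size=20)

evals = np.linalg.eigvalsh(A)
mu, L = evals[0], evals[-1]                    # strong convexity and smoothness constants
beta = (np.sqrt(L / mu) - 1) / (np.sqrt(L / mu) + 1)   # constant momentum

x_prev = x = np.zeros(20)
for _ in range(100):
    y = x + beta * (x - x_prev)                # extrapolated ("proximal") point
    x_prev, x = x, y - (A @ y - b) / L         # gradient step at y

print(np.linalg.norm(A @ x - b))               # residual ~ 0 at the minimizer
```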
Variational Bayesian Inference for Robust Streaming Tensor Factorization and Completion
Title | Variational Bayesian Inference for Robust Streaming Tensor Factorization and Completion |
Authors | Cole Hawkins, Zheng Zhang |
Abstract | Streaming tensor factorization is a powerful tool for processing high-volume and multi-way temporal data in Internet networks, recommender systems and image/video data analysis. Existing streaming tensor factorization algorithms rely on least-squares data fitting and they do not possess a mechanism for tensor rank determination. This leaves them susceptible to outliers and vulnerable to over-fitting. This paper presents a Bayesian robust streaming tensor factorization model to identify sparse outliers, automatically determine the underlying tensor rank and accurately fit low-rank structure. We implement our model in Matlab and compare it with existing algorithms on tensor datasets generated from dynamic MRI and Internet traffic. |
Tasks | Bayesian Inference, Recommendation Systems |
Published | 2018-09-06 |
URL | http://arxiv.org/abs/1809.02153v2 |
http://arxiv.org/pdf/1809.02153v2.pdf | |
PWC | https://paperswithcode.com/paper/variational-bayesian-inference-for-robust |
Repo | |
Framework | |
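A deterministic, heavily simplified sketch of the robust streaming idea for a single new time slice: alternate a least-squares fit of the low-rank coefficients with soft-thresholding of a sparse outlier term. The paper's model is Bayesian with automatic rank determination, which this stand-in does not attempt.

```python
import numpy as np

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def robust_stream_step(U, y, lam=0.5, iters=20):
    """Fit one new time slice y ~ U @ c + s with sparse outliers s."""
    s = np.zeros_like(y)
    for _ in range(iters):
        c = np.linalg.lstsq(U, y - s, rcond=None)[0]   # low-rank coefficients
        s = soft_threshold(y - U @ c, lam)             # sparse outlier update
    return c, s

rng = np.random.default_rng(0)
U = rng.normal(size=(50, 3))                   # fixed low-rank spatial factor
y = U @ rng.normal(size=3)
y[[3, 17]] += 10.0                             # two gross outliers in the new slice
c, s = robust_stream_step(U, y)
print(np.nonzero(s)[0])                        # indices of the detected outliers
```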
Use of Neural Signals to Evaluate the Quality of Generative Adversarial Network Performance in Facial Image Generation
Title | Use of Neural Signals to Evaluate the Quality of Generative Adversarial Network Performance in Facial Image Generation |
Authors | Zhengwei Wang, Graham Healy, Alan F. Smeaton, Tomas E. Ward |
Abstract | There is a growing interest in using generative adversarial networks (GANs) to produce image content that is indistinguishable from real images as judged by a typical person. A number of GAN variants for this purpose have been proposed; however, evaluating GAN performance is inherently difficult because current methods for measuring the quality of their output are not always consistent with what a human perceives. We propose a novel approach that combines a brain-computer interface (BCI) with GANs to generate a measure we call Neuroscore, which closely mirrors the behavioral ground truth measured from participants tasked with discerning real from synthetic images. This technique we call a neuro-AI interface, as it provides an interface between a human’s neural systems and an AI process. In this paper, we first compare the three most widely used metrics in the literature for evaluating GANs in terms of visual quality and compare their outputs with human judgments. Secondly, we propose and demonstrate a novel approach using neural signals and rapid serial visual presentation (RSVP) that directly measures a human perceptual response to facial production quality, independent of a behavioral response measurement. The correlation between our proposed Neuroscore and human perceptual judgments has Pearson correlation statistics: $\mathrm{r}(48) = -0.767$, $\mathrm{p} = 2.089 \times 10^{-10}$. We also present the bootstrap result for the correlation, i.e., $\mathrm{p}\leq 0.0001$. Results show that our Neuroscore is more consistent with human judgment compared to the conventional metrics we evaluated. We conclude that neural signals have potential applications for the high-quality, rapid evaluation of GANs in the context of visual image synthesis. |
Tasks | Image Generation |
Published | 2018-11-10 |
URL | https://arxiv.org/abs/1811.04172v3 |
https://arxiv.org/pdf/1811.04172v3.pdf | |
PWC | https://paperswithcode.com/paper/use-of-neural-signals-to-evaluate-the-quality |
Repo | |
Framework | |
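A sketch of the reported statistics with synthetic stand-in data: a Pearson correlation with its p-value, plus a pairs bootstrap of the correlation. The paper reports $\mathrm{r}(48) = -0.767$; the toy data below only mimics that setup.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
neuroscore = rng.normal(size=50)                            # toy stand-in for Neuroscore
human = -0.8 * neuroscore + rng.normal(scale=0.4, size=50)  # toy human judgments

r, p = stats.pearsonr(neuroscore, human)
print(f"r({len(neuroscore) - 2}) = {r:.3f}, p = {p:.3g}")

# Bootstrap the correlation: resample (neuroscore, human) pairs with replacement.
boot = []
for _ in range(10000):
    idx = rng.integers(0, 50, 50)
    boot.append(stats.pearsonr(neuroscore[idx], human[idx])[0])
print("95% CI:", np.percentile(boot, [2.5, 97.5]))
```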
Powerful, transferable representations for molecules through intelligent task selection in deep multitask networks
Title | Powerful, transferable representations for molecules through intelligent task selection in deep multitask networks |
Authors | Clyde Fare, Lukas Turcani, Edward O. Pyzer-Knapp |
Abstract | Chemical representations derived from deep learning are emerging as a powerful tool in areas such as drug discovery and materials innovation. Currently, this methodology has three major limitations: the cost of representation generation, risk of inherited bias, and the requirement for large amounts of data. We propose the use of multi-task learning in tandem with transfer learning to address these limitations directly. In order to avoid introducing unknown bias into multi-task learning through the task selection itself, we calculate task similarity through pairwise task affinity, and use this measure to programmatically select tasks. We test this methodology on several real-world data sets to demonstrate its potential for execution in complex and low-data environments. Finally, we utilise the task similarity to further probe the expressiveness of the learned representation through a comparison to a commonly used cheminformatics fingerprint, and show that the deep representation is able to capture more expressive task-based information. |
Tasks | Drug Discovery, Multi-Task Learning, Transfer Learning |
Published | 2018-09-17 |
URL | http://arxiv.org/abs/1809.06334v1 |
http://arxiv.org/pdf/1809.06334v1.pdf | |
PWC | https://paperswithcode.com/paper/powerful-transferable-representations-for |
Repo | |
Framework | |
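A hedged sketch of programmatic task selection, using label correlation over shared molecules as a stand-in affinity measure (the paper computes pairwise task affinity differently); the data are synthetic.

```python
import numpy as np

def select_tasks(Y, target, k=3):
    """Pick the k auxiliary tasks with the highest affinity to the target task,
    here measured as label correlation over shared molecules. Y: (n_mols, n_tasks)."""
    aff = np.corrcoef(Y.T)[target].copy()
    aff[target] = -np.inf                     # exclude the target itself
    return np.argsort(aff)[::-1][:k]

rng = np.random.default_rng(0)
base = rng.normal(size=(100, 1))
related = base + 0.3 * rng.normal(size=(100, 5))   # five tasks sharing a latent factor
unrelated = rng.normal(size=(100, 5))              # five independent tasks
Y = np.hstack([related, unrelated])
print(select_tasks(Y, target=0))                   # selects among tasks 1-4
```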
Super-Identity Convolutional Neural Network for Face Hallucination
Title | Super-Identity Convolutional Neural Network for Face Hallucination |
Authors | Kaipeng Zhang, Zhanpeng Zhang, Chia-Wen Cheng, Winston H. Hsu, Yu Qiao, Wei Liu, Tong Zhang |
Abstract | Face hallucination is a generative task to super-resolve facial images with low resolution, while human perception of faces heavily relies on identity information. However, previous face hallucination approaches largely ignore facial identity recovery. This paper proposes the Super-Identity Convolutional Neural Network (SICNN) to recover identity information for generating faces close to the real identity. Specifically, we define a super-identity loss to measure the identity difference between a hallucinated face and its corresponding high-resolution face within the hypersphere identity metric space. However, directly using this loss leads to a Dynamic Domain Divergence problem, which is caused by the large margin between the high-resolution domain and the hallucination domain. To overcome this challenge, we present a domain-integrated training approach by constructing a robust identity metric for faces from these two domains. Extensive experimental evaluations demonstrate that the proposed SICNN achieves superior visual quality over the state-of-the-art methods on the challenging task of super-resolving 12$\times$14 faces with an 8$\times$ upscaling factor. In addition, SICNN significantly improves the recognizability of ultra-low-resolution faces. |
Tasks | Face Generation, Face Hallucination |
Published | 2018-11-06 |
URL | http://arxiv.org/abs/1811.02328v1 |
http://arxiv.org/pdf/1811.02328v1.pdf | |
PWC | https://paperswithcode.com/paper/super-identity-convolutional-neural-network |
Repo | |
Framework | |
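A minimal PyTorch sketch of a super-identity-style loss: embed both faces with a face recognition network, project the embeddings onto the unit hypersphere, and penalize their distance. The squared-distance form and the embedding size are assumptions, and the paper's domain-integrated training procedure is omitted.

```python
import torch
import torch.nn.functional as F

def super_identity_loss(emb_sr, emb_hr):
    """Identity distance on the unit hypersphere between the embedding of a
    hallucinated (super-resolved) face and its high-resolution ground truth."""
    z_sr = F.normalize(emb_sr, dim=1)          # project onto the hypersphere
    z_hr = F.normalize(emb_hr, dim=1)
    return (z_sr - z_hr).pow(2).sum(dim=1).mean()

# Toy usage: embeddings would come from a (frozen) face recognition network.
emb_sr, emb_hr = torch.randn(8, 512), torch.randn(8, 512)
print(super_identity_loss(emb_sr, emb_hr))
```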