Paper Group ANR 1257
EdgeFool: An Adversarial Image Enhancement Filter
Title | EdgeFool: An Adversarial Image Enhancement Filter |
Authors | Ali Shahin Shamsabadi, Changjae Oh, Andrea Cavallaro |
Abstract | Adversarial examples are intentionally perturbed images that mislead classifiers. These images can, however, be easily detected using denoising algorithms, when high-frequency spatial perturbations are used, or can be noticed by humans, when perturbations are large. In this paper, we propose EdgeFool, an adversarial image enhancement filter that learns structure-aware adversarial perturbations. EdgeFool generates adversarial images with perturbations that enhance image details via training a fully convolutional neural network end-to-end with a multi-task loss function. This loss function accounts for both image detail enhancement and class misleading objectives. We evaluate EdgeFool on three classifiers (ResNet-50, ResNet-18 and AlexNet) using two datasets (ImageNet and Private-Places365) and compare it with six adversarial methods (DeepFool, SparseFool, Carlini-Wagner, SemanticAdv, Non-targeted and Private Fast Gradient Sign Methods). Code is available at https://github.com/smartcameras/EdgeFool.git. |
Tasks | Denoising, Image Enhancement |
Published | 2019-10-27 |
URL | https://arxiv.org/abs/1910.12227v2 |
PDF | https://arxiv.org/pdf/1910.12227v2.pdf |
PWC | https://paperswithcode.com/paper/edgefool-an-adversarial-image-enhancement |
Repo | |
Framework | |
Spoof detection using time-delay shallow neural network and feature switching
Title | Spoof detection using time-delay shallow neural network and feature switching |
Authors | Mari Ganesh Kumar, Suvidha Rupesh Kumar, Saranya M, B. Bharathi, Hema A. Murthy |
Abstract | Detecting spoofed utterances is a fundamental problem in voice-based biometrics. Spoofing can be performed either by logical access, such as speech synthesis or voice conversion, or by physical access, such as replaying a pre-recorded utterance. Inspired by the state-of-the-art x-vector based speaker verification approach, this paper proposes a time-delay shallow neural network (TD-SNN) for spoof detection for both logical and physical access. The novelty of the proposed TD-SNN system vis-a-vis conventional DNN systems is that it can handle variable length utterances during testing. Performance of the proposed TD-SNN systems and the baseline Gaussian mixture models (GMMs) is analyzed on the ASV-spoof-2019 dataset. The performance of the systems is measured in terms of the minimum normalized tandem detection cost function (min-t-DCF). When studied with individual features, the TD-SNN system consistently outperforms the GMM system for physical access. For logical access, GMM surpasses TD-SNN systems for certain individual features. When combined with the decision-level feature switching (DLFS) paradigm, the best TD-SNN system outperforms the best baseline GMM system on evaluation data with a relative improvement of 48.03% and 49.47% for logical and physical access, respectively. |
Tasks | Speaker Verification, Speech Synthesis, Voice Conversion |
Published | 2019-04-16 |
URL | https://arxiv.org/abs/1904.07453v2 |
PDF | https://arxiv.org/pdf/1904.07453v2.pdf |
PWC | https://paperswithcode.com/paper/spoof-detection-using-x-vector-and-feature |
Repo | |
Framework | |
Pairwise Neural Machine Translation Evaluation
Title | Pairwise Neural Machine Translation Evaluation |
Authors | Francisco Guzman, Shafiq Joty, Lluis Marquez, Preslav Nakov |
Abstract | We present a novel framework for machine translation evaluation using neural networks in a pairwise setting, where the goal is to select the better translation from a pair of hypotheses, given the reference translation. In this framework, lexical, syntactic and semantic information from the reference and the two hypotheses is compacted into relatively small distributed vector representations, and fed into a multi-layer neural network that models the interaction between each of the hypotheses and the reference, as well as between the two hypotheses. These compact representations are in turn based on word and sentence embeddings, which are learned using neural networks. The framework is flexible, allows for efficient learning and classification, and yields correlation with humans that rivals the state of the art. |
Tasks | Machine Translation, Sentence Embeddings |
Published | 2019-12-05 |
URL | https://arxiv.org/abs/1912.03135v1 |
PDF | https://arxiv.org/pdf/1912.03135v1.pdf |
PWC | https://paperswithcode.com/paper/pairwise-neural-machine-translation-2 |
Repo | |
Framework | |
Recurrent Neural Networks (RNNs): A gentle Introduction and Overview
Title | Recurrent Neural Networks (RNNs): A gentle Introduction and Overview |
Authors | Robin M. Schmidt |
Abstract | State-of-the-art solutions in the areas of “Language Modelling & Generating Text”, “Speech Recognition”, “Generating Image Descriptions” or “Video Tagging” have been using Recurrent Neural Networks as the foundation for their approaches. Understanding the underlying concepts is therefore of tremendous importance if we want to keep up with recent or upcoming publications in those areas. In this work we give a short overview of some of the most important concepts in the realm of Recurrent Neural Networks, which enables readers to easily understand fundamentals such as, but not limited to, “Backpropagation through Time” or “Long Short-Term Memory Units”, as well as some of the more recent advances like the “Attention Mechanism” or “Pointer Networks”. We also give recommendations for further reading regarding more complex topics where necessary. |
Tasks | Language Modelling, Speech Recognition |
Published | 2019-11-23 |
URL | https://arxiv.org/abs/1912.05911v1 |
PDF | https://arxiv.org/pdf/1912.05911v1.pdf |
PWC | https://paperswithcode.com/paper/recurrent-neural-networks-rnns-a-gentle |
Repo | |
Framework | |
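The recurrence at the core of the concepts this overview covers can be sketched in a few lines. The block below is a minimal illustration, not taken from the paper: it assumes the common vanilla update h_t = tanh(W_xh x_t + W_hh h_{t-1} + b), and the names `rnn_step`, `rnn_forward`, `W_xh`, `W_hh` and `b` are hypothetical.

```python
import math


def rnn_step(x, h_prev, W_xh, W_hh, b):
    """One vanilla RNN update: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)."""
    h = []
    for i in range(len(b)):
        pre = b[i]
        # Contribution of the current input vector.
        pre += sum(W_xh[i][j] * x[j] for j in range(len(x)))
        # Contribution of the previous hidden state (the "memory").
        pre += sum(W_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h.append(math.tanh(pre))
    return h


def rnn_forward(xs, h0, W_xh, W_hh, b):
    """Unroll the cell over a sequence and return every hidden state."""
    hs, h = [], h0
    for x in xs:
        h = rnn_step(x, h, W_xh, W_hh, b)
        hs.append(h)
    return hs
```

Because the same weights are reused at every step, gradients have to be propagated back through the whole unrolled chain, which is what “Backpropagation through Time” refers to.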
Multi-Frame GAN: Image Enhancement for Stereo Visual Odometry in Low Light
Title | Multi-Frame GAN: Image Enhancement for Stereo Visual Odometry in Low Light |
Authors | Eunah Jung, Nan Yang, Daniel Cremers |
Abstract | We propose the concept of a multi-frame GAN (MFGAN) and demonstrate its potential as an image sequence enhancement for stereo visual odometry in low light conditions. We base our method on an invertible adversarial network to transfer the beneficial features of brightly illuminated scenes to the sequence in poor illumination without costly paired datasets. In order to preserve the coherent geometric cues for the translated sequence, we present a novel network architecture as well as a novel loss term combining temporal and stereo consistencies based on optical flow estimation. We demonstrate that the enhanced sequences improve the performance of state-of-the-art feature-based and direct stereo visual odometry methods on both synthetic and real datasets in challenging illumination. We also show that MFGAN outperforms other state-of-the-art image enhancement and style transfer methods by a large margin in terms of visual odometry. |
Tasks | Image Enhancement, Optical Flow Estimation, Style Transfer, Visual Odometry |
Published | 2019-10-15 |
URL | https://arxiv.org/abs/1910.06632v1 |
PDF | https://arxiv.org/pdf/1910.06632v1.pdf |
PWC | https://paperswithcode.com/paper/multi-frame-gan-image-enhancement-for-stereo |
Repo | |
Framework | |
A sub-Riemannian model of the visual cortex with frequency and phase
Title | A sub-Riemannian model of the visual cortex with frequency and phase |
Authors | E. Baspinar, A. Sarti, G. Citti |
Abstract | In this paper we present a novel model of the primary visual cortex (V1) based on orientation, frequency and phase selective behavior of the V1 simple cells. We start from the first level mechanisms of visual perception: receptive profiles. The model interprets V1 as a fiber bundle over the 2-dimensional retinal plane by introducing orientation, frequency and phase as intrinsic variables. Each receptive profile on the fiber is mathematically interpreted as a rotated, frequency modulated and phase shifted Gabor function. We start from the Gabor function and show that it induces in a natural way the model geometry and the associated horizontal connectivity modeling the neural connectivity patterns in V1. We provide an image enhancement algorithm employing the model framework. The algorithm is capable of exploiting not only orientation but also frequency and phase information existing intrinsically in a 2-dimensional input image. We provide the experimental results corresponding to the enhancement algorithm. |
Tasks | Image Enhancement |
Published | 2019-10-11 |
URL | https://arxiv.org/abs/1910.04992v1 |
PDF | https://arxiv.org/pdf/1910.04992v1.pdf |
PWC | https://paperswithcode.com/paper/a-sub-riemannian-model-of-the-visual-cortex |
Repo | |
Framework | |
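To make the "rotated, frequency modulated and phase shifted Gabor function" in the abstract concrete, here is a small sketch of one such receptive profile. It is an assumption-laden illustration rather than the paper's exact parameterization: it uses an isotropic Gaussian envelope with a hypothetical width `sigma` and a real-valued cosine carrier, and the function name `gabor` is made up for this example.

```python
import math


def gabor(x, y, theta, freq, phase, sigma=1.0):
    """Real-valued Gabor profile at (x, y) for one orientation/frequency/phase."""
    # Rotate the retinal coordinates by the orientation angle theta.
    xr = x * math.cos(theta) + y * math.sin(theta)
    yr = -x * math.sin(theta) + y * math.cos(theta)
    # Gaussian envelope localizes the profile around the origin.
    envelope = math.exp(-(xr * xr + yr * yr) / (2.0 * sigma ** 2))
    # Carrier wave along the rotated axis: frequency modulation plus phase shift.
    carrier = math.cos(2.0 * math.pi * freq * xr + phase)
    return envelope * carrier
```

Sweeping `theta`, `freq` and `phase` at a fixed retinal position gives the family of profiles that the model treats as the fiber over that point.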
Ab-Initio Solution of the Many-Electron Schrödinger Equation with Deep Neural Networks
Title | Ab-Initio Solution of the Many-Electron Schrödinger Equation with Deep Neural Networks |
Authors | David Pfau, James S. Spencer, Alexander G. de G. Matthews, W. M. C. Foulkes |
Abstract | Given access to accurate solutions of the many-electron Schrödinger equation, nearly all chemistry could be derived from first principles. Exact wavefunctions of interesting chemical systems are out of reach because they are NP-hard to compute in general, but approximations can be found using polynomially-scaling algorithms. The key challenge for many of these algorithms is the choice of wavefunction approximation, or Ansatz, which must trade off between efficiency and accuracy. Neural networks have shown impressive power as accurate practical function approximators and promise as a compact wavefunction Ansatz for spin systems, but problems in electronic structure require wavefunctions that obey Fermi-Dirac statistics. Here we introduce a novel deep learning architecture, the Fermionic Neural Network, as a powerful wavefunction Ansatz for many-electron systems. The Fermionic Neural Network is able to achieve accuracy beyond other variational quantum Monte Carlo Ansätze on a variety of atoms and small molecules. Using no data other than atomic positions and charges, we predict the dissociation curves of the nitrogen molecule and hydrogen chain, two challenging strongly-correlated systems, to significantly higher accuracy than the coupled cluster method, widely considered the most accurate scalable method for quantum chemistry at equilibrium geometry. This demonstrates that deep neural networks can improve the accuracy of variational quantum Monte Carlo to the point where it outperforms other ab-initio quantum chemistry methods, opening the possibility of accurate direct optimisation of wavefunctions for previously intractable molecules and solids. |
Tasks | |
Published | 2019-09-05 |
URL | https://arxiv.org/abs/1909.02487v2 |
PDF | https://arxiv.org/pdf/1909.02487v2.pdf |
PWC | https://paperswithcode.com/paper/ab-initio-solution-of-the-many-electron |
Repo | |
Framework | |
A Network-based Multimodal Data Fusion Approach for Characterizing Dynamic Multimodal Physiological Patterns
Title | A Network-based Multimodal Data Fusion Approach for Characterizing Dynamic Multimodal Physiological Patterns |
Authors | Miaolin Fan, Chun-An Chou, Sheng-Che Yen, Yingzi Lin |
Abstract | Characterizing the dynamic interactive patterns of complex systems helps gain in-depth understanding of how components interrelate with each other while performing certain functions as a whole. In this study, we present a novel multimodal data fusion approach to construct a complex network, which models the interactions of biological subsystems in the human body under emotional states through physiological responses. Joint recurrence plot and temporal network metrics are employed to integrate the multimodal information at the signal level. A public benchmark dataset is used for evaluating our model. |
Tasks | |
Published | 2019-01-03 |
URL | http://arxiv.org/abs/1901.00877v1 |
PDF | http://arxiv.org/pdf/1901.00877v1.pdf |
PWC | https://paperswithcode.com/paper/a-network-based-multimodal-data-fusion |
Repo | |
Framework | |
Deep Learning for Low-Field to High-Field MR: Image Quality Transfer with Probabilistic Decimation Simulator
Title | Deep Learning for Low-Field to High-Field MR: Image Quality Transfer with Probabilistic Decimation Simulator |
Authors | Hongxiang Lin, Matteo Figini, Ryutaro Tanno, Stefano B. Blumberg, Enrico Kaden, Godwin Ogbole, Biobele J. Brown, Felice D’Arco, David W. Carmichael, Ikeoluwa Lagunju, Helen J. Cross, Delmiro Fernandez-Reyes, Daniel C. Alexander |
Abstract | MR images scanned at low magnetic field (<1T) have lower resolution in the slice direction and lower contrast than those from high field (typically 1.5T and 3T), due to a relatively small signal-to-noise ratio (SNR). We adapt the recent idea of Image Quality Transfer (IQT) to enhance very low-field structural images, aiming to estimate the resolution, spatial coverage, and contrast of high-field images. Analogous to many learning-based image enhancement techniques, IQT generates training data from high-field scans alone by simulating low-field images through a pre-defined decimation model. However, the ground truth decimation model is not well-known in practice, and lack of its specification can bias the trained model, degrading performance on real low-field scans. In this paper we propose a probabilistic decimation simulator to improve robustness of model training. It is used to generate and augment various low-field images whose parameters are random variables sampled from an empirical distribution related to tissue-specific SNR on a 0.36T scanner. The probabilistic decimation simulator is model-agnostic, that is, it can be used with any super-resolution network. Furthermore, we propose a variant of the U-Net architecture to improve its learning performance. We show promising qualitative results from clinical low-field images confirming the strong efficacy of IQT in an important new application area: epilepsy diagnosis in sub-Saharan Africa, where only low-field scanners are normally available. |
Tasks | Image Enhancement, Super-Resolution |
Published | 2019-09-15 |
URL | https://arxiv.org/abs/1909.06763v1 |
PDF | https://arxiv.org/pdf/1909.06763v1.pdf |
PWC | https://paperswithcode.com/paper/deep-learning-for-low-field-to-high-field-mr |
Repo | |
Framework | |
Hierarchical Bayesian Personalized Recommendation: A Case Study and Beyond
Title | Hierarchical Bayesian Personalized Recommendation: A Case Study and Beyond |
Authors | Zitao Liu, Zhexuan Xu, Yan Yan |
Abstract | Items in modern recommender systems are often organized in hierarchical structures. These hierarchical structures and the data within them provide valuable information for building personalized recommendation systems. In this paper, we propose a general hierarchical Bayesian learning framework, i.e., HBayes, to learn both the structures and associated latent factors. Furthermore, we develop a variational inference algorithm that is able to learn model parameters with a fast empirical convergence rate. The proposed HBayes is evaluated on two real-world datasets from different domains. The results demonstrate the benefits of our approach on item recommendation tasks, and show that it can outperform the state-of-the-art models in terms of precision, recall, and normalized discounted cumulative gain. To encourage reproducible results, we make our code public on a git repo: https://tinyurl.com/ycruhk4t. |
Tasks | Recommendation Systems |
Published | 2019-08-20 |
URL | https://arxiv.org/abs/1908.07371v1 |
PDF | https://arxiv.org/pdf/1908.07371v1.pdf |
PWC | https://paperswithcode.com/paper/hierarchical-bayesian-personalized |
Repo | |
Framework | |
A method for Cloud Mapping in the Field of View of the Infra-Red Camera during the EUSO-SPB1 flight
Title | A method for Cloud Mapping in the Field of View of the Infra-Red Camera during the EUSO-SPB1 flight |
Authors | Alessandro Bruno, Anna Anzalone, Carlo Vigorito |
Abstract | EUSO-SPB1 was released on April 24th, 2017, from the NASA balloon launch site in Wanaka (New Zealand) and landed in the South Pacific Ocean on May 7th. The data collected by the instruments onboard the balloon were analyzed to search for UV pulse signatures of UHECR (Ultra High Energy Cosmic Rays) air showers. Indirect measurements of UHECRs can be affected by cloud presence during nighttime, so it is crucial to know the meteorological conditions during the observation period of the detector. During the flight, the onboard EUSO-SPB1 UCIRC camera (University of Chicago Infra-Red Camera) acquired images in the field of view of the UV telescope. The available nighttime and daytime images include information on meteorological conditions of the atmosphere observed in two infra-red bands. The presence of clouds has been investigated employing a method developed to provide a dense cloudiness map for each available infra-red image. The final masks are intended to give pixel cloudiness information at the IR-camera pixel resolution, which is nearly 4 times higher than that of the UV camera. In this work, cloudiness maps are obtained by using an expert system based on the analysis of different low-level image features. Furthermore, an image enhancement step had to be applied as preprocessing to deal with uncalibrated data. |
Tasks | Image Enhancement |
Published | 2019-09-12 |
URL | https://arxiv.org/abs/1909.05917v1 |
PDF | https://arxiv.org/pdf/1909.05917v1.pdf |
PWC | https://paperswithcode.com/paper/a-method-for-cloud-mapping-in-the-field-of |
Repo | |
Framework | |
Cantonese Automatic Speech Recognition Using Transfer Learning from Mandarin
Title | Cantonese Automatic Speech Recognition Using Transfer Learning from Mandarin |
Authors | Bryan Li, Xinyue Wang, Homayoon Beigi |
Abstract | We propose a system to develop a basic automatic speech recognizer (ASR) for Cantonese, a low-resource language, through transfer learning from Mandarin, a high-resource language. We take a time-delayed neural network trained on Mandarin, and perform weight transfer of several layers to a newly initialized model for Cantonese. We experiment with the number of layers transferred, their learning rates, and pretraining i-vectors. Key findings are that this approach allows for quicker training time with less data. We find that for every epoch, log-probability is smaller for transfer learning models compared to a Cantonese-only model. The transfer learning models show slight improvement in CER. |
Tasks | Speech Recognition, Transfer Learning |
Published | 2019-11-21 |
URL | https://arxiv.org/abs/1911.09271v1 |
PDF | https://arxiv.org/pdf/1911.09271v1.pdf |
PWC | https://paperswithcode.com/paper/cantonese-automatic-speech-recognition-using |
Repo | |
Framework | |
Adaptive Sequential Experiments with Unknown Information Flows
Title | Adaptive Sequential Experiments with Unknown Information Flows |
Authors | Yonatan Gur, Ahmadreza Momeni |
Abstract | Systems that make sequential decisions in the presence of partial feedback on actions often need to strike a balance between maximizing immediate payoffs based on available information, and acquiring new information that may be essential for maximizing future payoffs. This trade-off is captured by the multi-armed bandit (MAB) framework that has been studied and applied for designing sequential experiments when at each time epoch a single observation is collected on the action that was selected at that epoch. However, in many practical settings additional information may become available between decision epochs. We introduce a generalized MAB formulation in which auxiliary information on each arm may appear arbitrarily over time. By obtaining matching lower and upper bounds, we characterize the minimax complexity of this family of MAB problems as a function of the information arrival process, and study how salient characteristics of this process impact policy design and achievable performance. We establish the robustness of a Thompson sampling policy in the presence of additional information, but observe that other policies that are of practical importance do not exhibit such robustness. We therefore introduce a broad adaptive exploration approach for designing policies that, without any prior knowledge on the information arrival process, attain the best performance (in terms of regret rate) that is achievable when the information arrival process is a priori known. Our approach is based on adjusting MAB policies designed to perform well in the absence of auxiliary information by using dynamically customized virtual time indexes to endogenously control the exploration rate of the policy. We demonstrate our approach through appropriately adjusting known MAB policies and establishing improved performance bounds for these policies in the presence of auxiliary information. |
Tasks | |
Published | 2019-06-28 |
URL | https://arxiv.org/abs/1907.00107v1 |
PDF | https://arxiv.org/pdf/1907.00107v1.pdf |
PWC | https://paperswithcode.com/paper/adaptive-sequential-experiments-with-unknown |
Repo | |
Framework | |
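The abstract's remark that Thompson sampling is robust to additional information can be illustrated with a toy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: Bernoulli rewards, Beta(1, 1) priors, and an arrival process in which a free observation on a uniformly random arm appears with probability `aux_prob` between decisions. The names `BetaThompson` and `simulate` are hypothetical.

```python
import random


class BetaThompson:
    """Thompson sampling for Bernoulli arms with Beta(1, 1) priors."""

    def __init__(self, n_arms, seed=0):
        self.alpha = [1.0] * n_arms
        self.beta = [1.0] * n_arms
        self.rng = random.Random(seed)

    def select(self):
        # Draw one sample per arm from its posterior and play the best draw.
        draws = [self.rng.betavariate(a, b)
                 for a, b in zip(self.alpha, self.beta)]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, arm, reward):
        # The same posterior update serves pulled-arm feedback and
        # auxiliary observations, so no extra machinery is needed.
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward

    def posterior_means(self):
        return [a / (a + b) for a, b in zip(self.alpha, self.beta)]


def simulate(true_means, horizon=500, aux_prob=0.3, seed=0):
    env = random.Random(seed)
    policy = BetaThompson(len(true_means), seed=seed + 1)
    for _ in range(horizon):
        arm = policy.select()
        policy.update(arm, 1 if env.random() < true_means[arm] else 0)
        # Assumed arrival process: occasional free observation on a random arm.
        if env.random() < aux_prob:
            extra = env.randrange(len(true_means))
            policy.update(extra, 1 if env.random() < true_means[extra] else 0)
    return policy.posterior_means()
```

Auxiliary observations simply sharpen the posterior, which in turn reduces how often the policy needs to explore; index policies with a fixed exploration schedule have no such automatic adjustment, which is the gap the paper's virtual time indexes are meant to close.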
Efficient hinging hyperplanes neural network and its application in nonlinear system identification
Title | Efficient hinging hyperplanes neural network and its application in nonlinear system identification |
Authors | Jun Xu, Qinghua Tao, Zhen Li, Xiangming Xi, Johan A. K. Suykens, Shuning Wang |
Abstract | In this paper, the efficient hinging hyperplanes (EHH) neural network is proposed based on the model of hinging hyperplanes (HH). The EHH neural network is a distributed representation, the training of which involves solving several convex optimization problems and is fast. It is proved that for every EHH neural network, there is an equivalent adaptive hinging hyperplanes (AHH) tree, which was also proposed based on the model of HH and has found good applications in system identification. The construction of the EHH neural network includes two stages. First, the initial structure of the EHH neural network is randomly determined and Lasso regression is used to choose the appropriate network. Second, to alleviate the impact of randomness, the stacking strategy is employed to formulate a more general network structure. Unlike other neural networks, the EHH neural network is interpretable, and its interpretation can be easily obtained through its ANOVA decomposition (or interaction matrix). The interpretability can then be used as a suggestion for input variable selection. The EHH neural network is applied in nonlinear system identification; the simulation results show that the selected regression vector is reasonable and the identification is fast, while the simulation accuracy is satisfactory. |
Tasks | |
Published | 2019-05-15 |
URL | https://arxiv.org/abs/1905.06518v2 |
PDF | https://arxiv.org/pdf/1905.06518v2.pdf |
PWC | https://paperswithcode.com/paper/efficient-hinging-hyperplanes-neural-network |
Repo | |
Framework | |
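For readers unfamiliar with the hinging hyperplanes (HH) model that the EHH network builds on, the sketch below shows the classical form: each hinge is the max or min of two hyperplanes, and the model is a sum of hinges. This illustrates only the underlying HH idea, not the EHH network or its training procedure; the helper names `affine`, `hinge` and `hh_model` are hypothetical.

```python
def affine(w, b, x):
    """Evaluate the hyperplane w . x + b."""
    return b + sum(wi * xi for wi, xi in zip(w, x))


def hinge(x, plane_a, plane_b, use_max=True):
    """A hinge function: the max (or min) of two hyperplanes at x.

    Each plane is a (weights, bias) pair; the two planes meet along
    the "hinge", which is where the function changes slope.
    """
    a = affine(plane_a[0], plane_a[1], x)
    c = affine(plane_b[0], plane_b[1], x)
    return max(a, c) if use_max else min(a, c)


def hh_model(x, hinges):
    """Sum of hinges; each entry is (plane_a, plane_b, use_max)."""
    return sum(hinge(x, pa, pb, use_max) for pa, pb, use_max in hinges)
```

The result is a continuous piecewise-linear function of the input, which is why fitting such models can be broken into linear or convex subproblems.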
Perseus: Characterizing Performance and Cost of Multi-Tenant Serving for CNN Models
Title | Perseus: Characterizing Performance and Cost of Multi-Tenant Serving for CNN Models |
Authors | Matthew LeMay, Shijian Li, Tian Guo |
Abstract | Deep learning models are increasingly used for end-user applications, supporting both novel features such as facial recognition, and traditional features, e.g. web search. To accommodate high inference throughput, it is common to host a single pre-trained Convolutional Neural Network (CNN) in dedicated cloud-based servers with hardware accelerators such as Graphics Processing Units (GPUs). However, GPUs can be orders of magnitude more expensive than traditional Central Processing Unit (CPU) servers. These resources could also be under-utilized facing dynamic workloads, which may result in inflated serving costs. One potential way to alleviate this problem is by allowing hosted models to share the underlying resources, which we refer to as multi-tenant inference serving. One of the key challenges is maximizing the resource efficiency for multi-tenant serving given hardware with diverse characteristics, models with unique response time Service Level Agreement (SLA), and dynamic inference workloads. In this paper, we present Perseus, a measurement framework that provides the basis for understanding the performance and cost trade-offs of multi-tenant model serving. We implemented Perseus in Python atop a popular cloud inference server called Nvidia TensorRT Inference Server. Leveraging Perseus, we evaluated the inference throughput and cost for serving various models and demonstrated that multi-tenant model serving led to up to 12% cost reduction. |
Tasks | |
Published | 2019-12-05 |
URL | https://arxiv.org/abs/1912.02322v2 |
PDF | https://arxiv.org/pdf/1912.02322v2.pdf |
PWC | https://paperswithcode.com/paper/perseus-characterizing-performance-and-cost |
Repo | |
Framework | |