January 26, 2020

Paper Group ANR 1452

SVAG: Unified Convergence Results for SAG-SAGA Interpolation with Stochastic Variance Adjusted Gradient Descent


Title	SVAG: Unified Convergence Results for SAG-SAGA Interpolation with Stochastic Variance Adjusted Gradient Descent
Authors	Martin Morin, Pontus Giselsson
Abstract	We analyze SVAG, a variance reduced stochastic gradient method with SAG and SAGA as special cases. Our convergence result for SVAG is the first to simultaneously capture both the biased low-variance method SAG and the unbiased high-variance method SAGA. In the case of SAGA, it matches previous upper bounds on the allowed step-size. The SVAG algorithm has a parameter that decides the bias-variance trade-off in the stochastic gradient estimate. We provide numerical examples demonstrating the intuition behind this bias-variance trade-off.
Tasks
Published	2019-03-21
URL	http://arxiv.org/abs/1903.09009v1
PDF	http://arxiv.org/pdf/1903.09009v1.pdf
PWC	https://paperswithcode.com/paper/svag-unified-convergence-results-for-sag-saga
Repo
Framework
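A minimal sketch of the SVAG gradient estimate, assuming the form g = (theta/n) * (grad f_i(x) - y_i) + (1/n) * sum_j y_j, where y_j is the stored gradient of component j. Under this assumption theta = 1 reproduces the SAG update and theta = n the SAGA update. The least-squares problem, function names, and step sizes are illustrative choices for this toy problem, not the step-size bounds derived in the paper.

```python
import math
import random


def grad_i(x, a, b):
    """Gradient of f_i(x) = 0.5 * (a . x - b)^2."""
    r = sum(ak * xk for ak, xk in zip(a, x)) - b
    return [r * ak for ak in a]


def svag(A, B, theta, step, iters, seed=0):
    """SVAG sketch: theta = 1 gives SAG, theta = n gives SAGA."""
    n, d = len(A), len(A[0])
    rng = random.Random(seed)
    x = [0.0] * d
    table = [[0.0] * d for _ in range(n)]  # stored gradients y_j
    avg = [0.0] * d                        # running mean (1/n) * sum_j y_j
    for _ in range(iters):
        i = rng.randrange(n)
        g = grad_i(x, A[i], B[i])
        diff = [gk - yk for gk, yk in zip(g, table[i])]
        # Estimate: (theta / n) * (grad f_i(x) - y_i) + mean(y), using the old table.
        est = [theta / n * dk + mk for dk, mk in zip(diff, avg)]
        # Refresh the table entry and its running mean.
        avg = [mk + dk / n for mk, dk in zip(avg, diff)]
        table[i] = g
        x = [xk - step * ek for xk, ek in zip(x, est)]
    return x


# Toy consistent least-squares problem whose minimizer is x_true = (1, -2).
A = [(math.cos(i), math.sin(i)) for i in range(10)]
x_true = (1.0, -2.0)
B = [a[0] * x_true[0] + a[1] * x_true[1] for a in A]
L = max(a[0] ** 2 + a[1] ** 2 for a in A)  # smoothness constant of each f_i

x_saga = svag(A, B, theta=len(A), step=1 / (3 * L), iters=5000)
x_sag = svag(A, B, theta=1, step=1 / (16 * L), iters=5000)
print(x_saga, x_sag)
```

Intermediate values of theta sit between SAG's biased, low-variance estimate and SAGA's unbiased, higher-variance one, which is the trade-off parameter the abstract refers to.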

Sparse multiresolution representations with adaptive kernels


Title	Sparse multiresolution representations with adaptive kernels
Authors	Maria Peifer, Luiz F. O. Chamon, Santiago Paternain, Alejandro Ribeiro
Abstract	Reproducing kernel Hilbert spaces (RKHSs) are key elements of many non-parametric tools successfully used in signal processing, statistics, and machine learning. In this work, we aim to address three issues of the classical RKHS based techniques. First, they require the RKHS to be known a priori, which is unrealistic in many applications. Furthermore, the choice of RKHS affects the shape and smoothness of the solution, thus impacting its performance. Second, RKHSs are ill-equipped to deal with heterogeneous degrees of smoothness, i.e., with functions that are smooth in some parts of their domain but vary rapidly in others. Finally, the computational complexity of evaluating the solution of these methods grows with the number of data points, rendering these techniques infeasible for many applications. Though kernel learning, local kernel adaptation, and sparsity have been used to address these issues, many of these approaches are computationally intensive or forgo optimality guarantees. We tackle these problems by leveraging a novel integral representation of functions in RKHSs that allows for arbitrary centers and different kernels at each center. To address the complexity issues, we then write the function estimation problem as a sparse functional program that explicitly minimizes the support of the representation leading to low complexity solutions. Despite their non-convexity and infinite dimensionality, we show these problems can be solved exactly and efficiently by leveraging duality, and we illustrate this new approach in simulated and real data.
Tasks
Published	2019-05-07
URL	https://arxiv.org/abs/1905.02797v1
PDF	https://arxiv.org/pdf/1905.02797v1.pdf
PWC	https://paperswithcode.com/paper/sparse-multiresolution-representations-with
Repo
Framework

Simultaneous regression and feature learning for facial landmarking


Title	Simultaneous regression and feature learning for facial landmarking
Authors	Janez Križaj, Peter Peer, Vitomir Štruc, Simon Dobrišek
Abstract	Face alignment (or facial landmarking) is an important task in many face-related applications, ranging from registration, tracking and animation to higher-level classification problems such as face, expression or attribute recognition. While several solutions have been presented in the literature for this task so far, reliably locating salient facial features across a wide range of poses remains challenging. To address this issue, we propose in this paper a novel method for automatic facial landmark localization in 3D face data designed specifically to address appearance variability caused by significant pose variations. Our method builds on recent cascaded-regression-based methods for facial landmarking and uses a gating mechanism to incorporate multiple linear cascaded regression models, each trained for a limited range of poses, into a single powerful landmarking model capable of processing arbitrarily posed input data. We develop two distinct approaches around the proposed gating mechanism: i) the first uses a gated multiple ridge descent (GRID) mechanism in conjunction with established (hand-crafted) HOG features for face alignment and achieves state-of-the-art landmarking performance across a wide range of facial poses; ii) the second simultaneously learns multiple-descent directions as well as binary features (SMUF) that are optimal for the alignment task and, in addition to competitive landmarking results, also ensures extremely rapid processing. We evaluate both approaches in rigorous experiments on several popular datasets of 3D face images, i.e., the FRGCv2 and Bosphorus 3D Face datasets and image collections F and G from the University of Notre Dame. The results of our evaluation show that both approaches are competitive in comparison to the state-of-the-art, while exhibiting considerable robustness to pose variations.
Tasks	Face Alignment
Published	2019-04-24
URL	http://arxiv.org/abs/1904.10787v1
PDF	http://arxiv.org/pdf/1904.10787v1.pdf
PWC	https://paperswithcode.com/paper/simultaneous-regression-and-feature-learning
Repo
Framework

FoxNet: A Multi-face Alignment Method


Title	FoxNet: A Multi-face Alignment Method
Authors	Yuxiang Wu, Zehua Cheng, Bin Huang, Yiming Chen, Xinghui Zhu, Weiyang Wang
Abstract	Multi-face alignment aims to identify geometry structures of multiple faces in an image, and its performance is essential for many practical tasks, such as face recognition, face tracking, and face animation. In this work, we present a fast bottom-up multi-face alignment approach, which can simultaneously localize multi-person facial landmarks with high precision. In more detail, our bottom-up architecture maps the landmarks to a high-dimensional space in which the landmarks of all faces are represented. By clustering the features belonging to the same face, our approach can align the multi-person facial landmarks synchronously. Extensive experiments show that our method achieves high performance in the multi-face landmark alignment task while being extremely fast. Moreover, we propose a new multi-face dataset to compare the speed and precision of bottom-up face alignment methods with top-down methods. Our dataset is publicly available at https://github.com/AISAResearch/FoxNet
Tasks	Face Alignment, Face Recognition
Published	2019-04-22
URL	https://arxiv.org/abs/1904.09758v2
PDF	https://arxiv.org/pdf/1904.09758v2.pdf
PWC	https://paperswithcode.com/paper/foxnet-a-multi-face-alignment-method
Repo
Framework
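The abstract describes mapping landmarks to an embedding space and clustering the features that belong to the same face. A minimal sketch of that grouping step, assuming each detected landmark comes with a scalar embedding and that faces are separated by a fixed gap threshold; both are hypothetical simplifications, since the abstract specifies neither the embedding dimension nor the clustering rule.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    kind: str         # landmark type, e.g. "left_eye" (hypothetical labels)
    x: float
    y: float
    embedding: float  # 1-D grouping embedding predicted per landmark (assumed)


def group_landmarks(dets, gap=0.5):
    """Greedy 1-D clustering: a detection joins the current face when its
    embedding is within `gap` of the previous detection in sorted order."""
    faces = []
    for det in sorted(dets, key=lambda d: d.embedding):
        if faces and det.embedding - faces[-1][-1].embedding <= gap:
            faces[-1].append(det)
        else:
            faces.append([det])
    # One {landmark type: (x, y)} mapping per face.
    return [{d.kind: (d.x, d.y) for d in face} for face in faces]


dets = [
    Detection("left_eye", 10, 12, 0.05), Detection("nose", 14, 18, 0.10),
    Detection("left_eye", 80, 15, 2.00), Detection("nose", 84, 20, 2.10),
    Detection("right_eye", 18, 12, 0.00), Detection("right_eye", 88, 15, 1.95),
]
for face in group_landmarks(dets):
    print(face)
```

Comparing each detection only with its sorted predecessor can chain two faces together when their embeddings drift slowly; clustering around per-face centers would be a more robust choice.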

Joint Face Detection and Facial Motion Retargeting for Multiple Faces


Title	Joint Face Detection and Facial Motion Retargeting for Multiple Faces
Authors	Bindita Chaudhuri, Noranart Vesdapunt, Baoyuan Wang
Abstract	Facial motion retargeting is an important problem in both computer graphics and vision, which involves capturing the performance of a human face and transferring it to another 3D character. Learning 3D morphable model (3DMM) parameters from 2D face images using convolutional neural networks is common in 2D face alignment, 3D face reconstruction etc. However, existing methods either require an additional face detection step before retargeting or use a cascade of separate networks to perform detection followed by retargeting in a sequence. In this paper, we present a single end-to-end network to jointly predict the bounding box locations and 3DMM parameters for multiple faces. First, we design a novel multitask learning framework that learns a disentangled representation of 3DMM parameters for a single face. Then, we leverage the trained single face model to generate ground truth 3DMM parameters for multiple faces to train another network that performs joint face detection and motion retargeting for images with multiple faces. Experimental results show that our joint detection and retargeting network has high face detection accuracy and is robust to extreme expressions and poses while being faster than state-of-the-art methods.
Tasks	3D Face Reconstruction, Face Alignment, Face Detection, Face Reconstruction
Published	2019-02-27
URL	http://arxiv.org/abs/1902.10744v1
PDF	http://arxiv.org/pdf/1902.10744v1.pdf
PWC	https://paperswithcode.com/paper/joint-face-detection-and-facial-motion
Repo
Framework
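For orientation, many 3DMM pipelines use a linear shape model. As an assumption (the abstract does not give the parameterization), write the mean shape as S̄, identity and expression bases as B_id and B_exp, and their coefficients as α and β. A disentangled representation estimates α and β separately, and retargeting can then reuse only β on a target character whose expression basis has matching semantics (for example, corresponding blendshapes):

```latex
\mathbf{S}(\boldsymbol{\alpha}, \boldsymbol{\beta})
  = \bar{\mathbf{S}} + \mathbf{B}_{\mathrm{id}}\,\boldsymbol{\alpha}
  + \mathbf{B}_{\mathrm{exp}}\,\boldsymbol{\beta},
\qquad
\mathbf{S}_{\mathrm{char}}(\boldsymbol{\beta})
  = \bar{\mathbf{S}}_{\mathrm{char}}
  + \mathbf{B}^{\mathrm{char}}_{\mathrm{exp}}\,\boldsymbol{\beta}
```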

HJB Optimal Feedback Control with Deep Differential Value Functions and Action Constraints


Title	HJB Optimal Feedback Control with Deep Differential Value Functions and Action Constraints
Authors	Michael Lutter, Boris Belousov, Kim Listmann, Debora Clever, Jan Peters
Abstract	Learning optimal feedback control laws capable of executing optimal trajectories is essential for many robotic applications. Such policies can be learned using reinforcement learning or planned using optimal control. While reinforcement learning is sample inefficient, optimal control only plans an optimal trajectory from a specific starting configuration. In this paper we propose deep optimal feedback control to learn an optimal feedback policy rather than a single trajectory. By exploiting the inherent structure of the robot dynamics and strictly convex action cost, we can derive principled cost functions such that the optimal policy naturally obeys the action limits, is globally optimal and stable on the training domain given the optimal value function. The corresponding optimal value function is learned end-to-end by embedding a deep differential network in the Hamilton-Jacobi-Bellman differential equation and minimizing the error of this equality while simultaneously decreasing the discounting from short- to far-sighted to enable the learning. Our proposed approach enables us to learn an optimal feedback control law in continuous time that, in contrast to existing approaches, generates an optimal trajectory from any point in state-space without the need for replanning. The resulting approach is evaluated on non-linear systems and achieves optimal feedback control, where standard optimal control methods require frequent replanning.
Tasks
Published	2019-09-13
URL	https://arxiv.org/abs/1909.06153v2
PDF	https://arxiv.org/pdf/1909.06153v2.pdf
PWC	https://paperswithcode.com/paper/hjb-optimal-feedback-control-with-deep
Repo
Framework
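The claim that a strictly convex action cost makes the optimal policy respect action limits can be illustrated under an assumption: control-affine dynamics dx/dt = a(x) + B(x) u and a separable action cost whose derivative is c * atanh(u / u_max). Minimizing the HJB right-hand side over u then gives the closed form u* = u_max * tanh(-B(x)^T grad V(x) / c). The sketch only evaluates that formula for a given value gradient; it does not learn V, and the cost form and function names are assumptions rather than the paper's exact choice.

```python
import math


def optimal_action(grad_v, B, u_max, c=1.0):
    """Minimizer of c * G(u) + p . u per action dimension, with p = B^T grad_v
    and G'(u) = atanh(u / u_max) (an assumed strictly convex action cost).

    Setting c * atanh(u / u_max) + p = 0 gives u = u_max * tanh(-p / c),
    so |u| never exceeds u_max (in floating point it can saturate at u_max).
    """
    # p = B^T grad_v, with B given as rows (state_dim x action_dim).
    action_dim = len(B[0])
    p = [sum(B[i][j] * grad_v[i] for i in range(len(B))) for j in range(action_dim)]
    return [u_max * math.tanh(-pj / c) for pj in p]


B = [[0.0], [1.0]]  # e.g. a double integrator: the action enters the velocity
print(optimal_action([3.0, -50.0], B, u_max=2.0))  # large gradient: saturates near +u_max
print(optimal_action([0.0, 0.5], B, u_max=2.0))    # moderate gradient: interior action
```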

Iterative Batch Back-Translation for Neural Machine Translation: A Conceptual Model


Title	Iterative Batch Back-Translation for Neural Machine Translation: A Conceptual Model
Authors	Idris Abdulmumin, Bashir Shehu Galadanci, Abubakar Isa
Abstract	An effective method to generate a large number of parallel sentences for training improved neural machine translation (NMT) systems is the use of back-translations of the target-side monolingual data. Recently, iterative back-translation has been shown to outperform standard back-translation, albeit only on some language pairs. This work proposes iterative batch back-translation, which aims to enhance standard iterative back-translation and enable the efficient utilization of more monolingual data. After each iteration, improved back-translations of new sentences are added to the parallel data that will be used to train the final forward model. The work presents a conceptual model of the proposed approach.
Tasks	Machine Translation
Published	2019-11-26
URL	https://arxiv.org/abs/2001.11327v1
PDF	https://arxiv.org/pdf/2001.11327v1.pdf
PWC	https://paperswithcode.com/paper/iterative-batch-back-translation-for-neural
Repo
Framework
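One possible reading of the proposed loop, as a sketch: target-side monolingual data arrives in batches, a backward model is retrained on everything gathered so far before each new batch is back-translated, and a final forward model is trained on authentic plus synthetic pairs. The `train` callable and the toy dictionary "model" are hypothetical placeholders for real NMT training, and the abstract does not say whether earlier batches are re-translated (this sketch does not re-translate them).

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def iterative_batch_bt(
    authentic: List[Pair],
    mono_target_batches: List[List[str]],
    train: Callable[[List[Pair]], Callable[[str], str]],
):
    """Hypothetical control flow; `train(pairs)` returns a function that maps
    the first element of each pair to the second."""
    synthetic: List[Pair] = []
    for batch in mono_target_batches:
        # Backward (target -> source) model trained on all data gathered so far.
        backward = train([(t, s) for s, t in authentic + synthetic])
        # Back-translate only the new batch and add it to the synthetic pool.
        synthetic += [(backward(t), t) for t in batch]
    # Final forward (source -> target) model on authentic + synthetic pairs.
    return train(authentic + synthetic), synthetic


def toy_train(pairs: List[Pair]) -> Callable[[str], str]:
    """Stand-in 'model' that memorizes pairs (illustration only)."""
    table = dict(pairs)
    return lambda s: table.get(s, "<unk>")


authentic = [("hallo", "hello")]
batches = [["hello", "world"], ["world"]]
forward, synth = iterative_batch_bt(authentic, batches, toy_train)
print(synth, forward("hallo"))
```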

Training a Neural Speech Waveform Model using Spectral Losses of Short-Time Fourier Transform and Continuous Wavelet Transform


Title	Training a Neural Speech Waveform Model using Spectral Losses of Short-Time Fourier Transform and Continuous Wavelet Transform
Authors	Shinji Takaki, Hirokazu Kameoka, Junichi Yamagishi
Abstract	Recently, we proposed short-time Fourier transform (STFT)-based loss functions for training a neural speech waveform model. In this paper, we generalize the above framework and propose a training scheme for such models based on spectral amplitude and phase losses obtained by either STFT or continuous wavelet transform (CWT), or both of them. Since CWT can have time and frequency resolutions different from those of STFT, including resolutions closer to human auditory scales, the proposed loss functions could provide complementary information on speech signals. Experimental results showed that it is possible to train a high-quality model using the proposed CWT spectral loss, and that such a model is as good as one trained with the STFT-based loss.
Tasks
Published	2019-03-29
URL	http://arxiv.org/abs/1903.12392v2
PDF	http://arxiv.org/pdf/1903.12392v2.pdf
PWC	https://paperswithcode.com/paper/training-a-neural-speech-waveform-model-using
Repo
Framework
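To make the idea of a spectral loss concrete, here is a minimal sketch of a log-amplitude STFT loss between a predicted and a reference waveform. The Hann window, frame sizes, and squared log-amplitude distance are assumptions; the paper's exact amplitude and phase losses, and the CWT variant, are not reproduced here.

```python
import cmath
import math


def stft(signal, frame_len=16, hop=8):
    """Naive STFT with a Hann window (O(N * frame_len^2); fine for a sketch)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        # One-sided DFT: bins 0 .. frame_len // 2.
        frames.append([
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len) for n in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ])
    return frames


def log_amplitude_loss(pred, target, eps=1e-7):
    """Mean squared difference of log STFT amplitudes (assumed loss form)."""
    sp, st = stft(pred), stft(target)
    diffs = [
        (math.log(abs(a) + eps) - math.log(abs(b) + eps)) ** 2
        for fa, fb in zip(sp, st)
        for a, b in zip(fa, fb)
    ]
    return sum(diffs) / len(diffs)


target = [math.sin(2 * math.pi * 4 * n / 64) for n in range(128)]
noisy = [s + 0.1 * math.cos(2 * math.pi * 20 * n / 64) for n, s in enumerate(target)]
print(log_amplitude_loss(target, target), log_amplitude_loss(noisy, target))
```

Because the loss compares amplitudes only, it ignores phase errors entirely, which is one reason the paper pairs amplitude losses with phase losses.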

Autonomous Penetration Testing using Reinforcement Learning


Title	Autonomous Penetration Testing using Reinforcement Learning
Authors	Jonathon Schwartz, Hanna Kurniawati
Abstract	Penetration testing (pentesting) involves performing a controlled attack on a computer system in order to assess its security. Although an effective method for testing security, pentesting requires highly skilled practitioners, and there is currently a growing shortage of skilled cyber security professionals. One avenue for alleviating this problem is to automate the pentesting process using artificial intelligence techniques. Current approaches to automated pentesting have relied on model-based planning; however, the cyber security landscape is changing rapidly, which makes maintaining up-to-date models of exploits a challenge. This project investigated the application of model-free Reinforcement Learning (RL) to automated pentesting. Model-free RL has the key advantage over model-based planning of not requiring a model of the environment, instead learning the best policy through interaction with the environment. We first designed and built a fast, low-compute simulator for training and testing autonomous pentesting agents. We did this by framing pentesting as a Markov Decision Process with the known configuration of the network as states, the available scans and exploits as actions, and the reward determined by the value of machines on the network. We then used this simulator to investigate the application of model-free RL to pentesting. We tested the standard Q-learning algorithm using both tabular and neural network based implementations. We found that within the simulated environment both tabular and neural network implementations were able to find optimal attack paths for a range of different network topologies and sizes without having a model of action behaviour. However, the implemented algorithms were only practical for smaller networks and numbers of actions. Further work is needed in developing scalable RL algorithms and testing these algorithms in larger and higher fidelity environments.
Tasks	Q-Learning
Published	2019-05-15
URL	https://arxiv.org/abs/1905.05965v1
PDF	https://arxiv.org/pdf/1905.05965v1.pdf
PWC	https://paperswithcode.com/paper/autonomous-penetration-testing-using
Repo
Framework
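The MDP framing above can be illustrated with tabular Q-learning on a deliberately tiny, hypothetical network: the state is how far along a chain of hosts the agent has a foothold, "scan" wastes a step, and "exploit" compromises the next host, with a reward for reaching the sensitive machine. The dynamics, rewards, and hyperparameters are made up for illustration and are not the paper's simulator.

```python
import random

N_STATES = 4  # 0..2: foothold on host i; 3: sensitive host compromised (terminal)
ACTIONS = ("scan", "exploit")
STEP_COST, TARGET_VALUE = -1.0, 10.0


def step(state, action):
    """Toy deterministic dynamics (hypothetical, not the paper's simulator)."""
    if action == "exploit":
        nxt = state + 1
        reward = STEP_COST + (TARGET_VALUE if nxt == N_STATES - 1 else 0.0)
        return nxt, reward, nxt == N_STATES - 1
    return state, STEP_COST, False  # scanning reveals nothing new in this toy


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done, t = 0, False, 0
        while not done and t < 100:
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = max(range(2), key=lambda k: q[s][k])
            s2, r, done = step(s, ACTIONS[a])
            # Standard Q-learning target; no bootstrap from a terminal state.
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s, t = s2, t + 1
    return q


q = q_learning()
policy = [ACTIONS[max(range(2), key=lambda k: q[s][k])] for s in range(N_STATES - 1)]
print(policy)
```

The table has one row per state, so its size grows with the number of network configurations, which mirrors the abstract's observation that the tabular approach was only practical for smaller networks.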

Design of Artificial Intelligence Agents for Games using Deep Reinforcement Learning


Title	Design of Artificial Intelligence Agents for Games using Deep Reinforcement Learning
Authors	Andrei Claudiu Roibu
Abstract	In order to perform a large variety of tasks and to achieve human-level performance in complex real-world environments, Artificial Intelligence (AI) Agents must be able to learn from their past experiences and gain both knowledge and an accurate representation of their environment from raw sensory inputs. Traditionally, AI agents have suffered from difficulties in using only sensory inputs to obtain a good representation of their environment and then mapping this representation to an efficient control policy. Deep reinforcement learning algorithms have provided a solution to this issue. In this study, the performance of different conventional and novel deep reinforcement learning algorithms was analysed. The proposed method utilises two types of algorithms, one trained with a variant of Q-learning (DQN) and another trained with SARSA learning (DSN) to assess the feasibility of using direct feedback alignment, a novel biologically plausible method for back-propagating the error. These novel agents, alongside two similar agents trained with the conventional backpropagation algorithm, were tested using the OpenAI Gym toolkit on several classic control theory problems and Atari 2600 video games. The results of this investigation open the way to new, biologically-inspired deep reinforcement learning algorithms, and their implementation on neuromorphic hardware.
Tasks	Q-Learning
Published	2019-05-10
URL	https://arxiv.org/abs/1905.04127v1
PDF	https://arxiv.org/pdf/1905.04127v1.pdf
PWC	https://paperswithcode.com/paper/design-of-artificial-intelligence-agents-for
Repo
Framework
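Direct feedback alignment replaces the backward pass through transposed forward weights with a fixed random projection of the output error. A minimal sketch on a toy regression task (not a DQN or DSN agent); the network size, data, and learning rate are hypothetical.

```python
import math
import random


def train_dfa(epochs=2000, lr=0.05, hidden=4, seed=0):
    """Direct feedback alignment on a toy regression (y = x0 - x1).

    The output layer uses its true gradient; the hidden layer receives the
    output error through a fixed random vector `feedback` instead of the
    output weights `w2` that backpropagation would use.
    """
    rng = random.Random(seed)
    data = [((a, b), a - b) for a in (-1, -0.5, 0, 0.5, 1) for b in (-1, 0, 1)]
    w1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0
    feedback = [rng.uniform(-1, 1) for _ in range(hidden)]  # fixed, never trained

    def forward(x):
        h = [math.tanh(w[0] * x[0] + w[1] * x[1] + bj) for w, bj in zip(w1, b1)]
        return h, sum(wj * hj for wj, hj in zip(w2, h)) + b2

    def mse():
        return sum((forward(x)[1] - t) ** 2 for x, t in data) / len(data)

    initial = mse()
    for _ in range(epochs):
        for x, t in data:
            h, y = forward(x)
            e = y - t  # output error, dL/dy for L = 0.5 * e^2
            for j in range(hidden):
                # DFA: project the error with the fixed random feedback weights.
                delta = feedback[j] * e * (1 - h[j] ** 2)
                w1[j][0] -= lr * delta * x[0]
                w1[j][1] -= lr * delta * x[1]
                b1[j] -= lr * delta
                w2[j] -= lr * e * h[j]  # true gradient for the output layer
            b2 -= lr * e
    return initial, mse()


initial, final = train_dfa()
print(f"MSE before {initial:.3f}, after {final:.3f}")
```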

ptype: Probabilistic Type Inference


Title	ptype: Probabilistic Type Inference
Authors	Taha Ceritli, Christopher K. I. Williams, James Geddes
Abstract	Type inference refers to the task of inferring the data type of a given column of data. Current approaches often fail when data contains missing data and anomalies, which are found commonly in real-world data sets. In this paper, we propose ptype, a probabilistic robust type inference method that allows us to detect such entries, and infer data types. We further show that the proposed method outperforms the existing methods.
Tasks
Published	2019-11-22
URL	https://arxiv.org/abs/1911.10081v2
PDF	https://arxiv.org/pdf/1911.10081v2.pdf
PWC	https://paperswithcode.com/paper/ptype-probabilistic-type-inference
Repo
Framework
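The abstract does not describe ptype's type models, so the sketch below keeps only the general idea under much simpler assumptions. Each value is explained by the column type's normal model, a missing-data model, or an anomaly model, and the column type is the one with the highest posterior under a uniform prior. The regular expressions, missing-value tokens, and likelihood constants are hypothetical.

```python
import math
import re

MISSING = {"", "NA", "N/A", "null", "-"}
# Per-value likelihood under each type's "normal" model (toy constants).
TYPES = {
    "integer": (re.compile(r"^-?\d+$"), 0.5),
    "float": (re.compile(r"^-?\d+(\.\d+)?$"), 0.3),
    "string": (re.compile(r"^.*$"), 0.05),
}
P_NORMAL, P_MISSING, P_ANOMALY = 0.9, 0.05, 0.05
Q_ANOMALY = 0.01  # likelihood of any value under the anomaly model


def components(value, type_name):
    """Joint weights of (normal, missing, anomaly) for one value."""
    pattern, q = TYPES[type_name]
    normal = P_NORMAL * (q if pattern.match(value) else 0.0)
    missing = P_MISSING * (1.0 if value in MISSING else 0.0)
    return normal, missing, P_ANOMALY * Q_ANOMALY


def infer(column):
    """Most probable column type (uniform prior) and a label per value."""
    log_post = {t: sum(math.log(sum(components(v, t))) for v in column) for t in TYPES}
    best = max(log_post, key=log_post.get)
    labels = []
    for v in column:
        weights = components(v, best)
        labels.append(("type", "missing", "anomaly")[weights.index(max(weights))])
    return best, labels


print(infer(["1", "2", "NA", "3", "x"]))
```

The anomaly model assigns every value a small but nonzero likelihood, so a single stray entry such as "x" lowers an integer column's score without ruling the integer type out.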

The DKU Replay Detection System for the ASVspoof 2019 Challenge: On Data Augmentation, Feature Representation, Classification, and Fusion


Title	The DKU Replay Detection System for the ASVspoof 2019 Challenge: On Data Augmentation, Feature Representation, Classification, and Fusion
Authors	Weicheng Cai, Haiwei Wu, Danwei Cai, Ming Li
Abstract	This paper describes our DKU replay detection system for the ASVspoof 2019 challenge. The goal is to develop a spoofing countermeasure for automatic speaker recognition in the physical access scenario. We address the countermeasure system pipeline from four aspects: data augmentation, feature representation, classification, and fusion. First, we introduce an utterance-level deep learning framework for anti-spoofing. It takes a variable-length feature sequence as input and directly outputs an utterance-level score. Within this framework, we try various input feature representations extracted from either the magnitude spectrum or the phase spectrum. We also perform data augmentation by applying speed perturbation to the raw waveform. Our best single system employs a residual neural network trained on the speed-perturbed group delay gram. It achieves an EER of 1.04% on the development set and 1.08% on the evaluation set. Finally, simply averaging the scores of several single systems further improves performance: our primary system obtains an EER of 0.24% on the development set and 0.66% on the evaluation set.
Tasks	Data Augmentation, Speaker Recognition
Published	2019-07-05
URL	https://arxiv.org/abs/1907.02663v1
PDF	https://arxiv.org/pdf/1907.02663v1.pdf
PWC	https://paperswithcode.com/paper/the-dku-replay-detection-system-for-the
Repo
Framework
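The EER figures and the averaged-score fusion can be illustrated with a short sketch. It assumes higher scores mean bona fide speech, sweeps every observed score as a threshold without interpolation, and uses made-up scores for two hypothetical systems.

```python
def eer(bonafide_scores, spoof_scores):
    """Equal error rate by sweeping every observed score as a threshold.

    A trial is accepted as bona fide when score >= threshold. Returns the mean
    of the false-acceptance and false-rejection rates at the threshold where
    they are closest (a common approximation; no interpolation is done).
    """
    best_gap, best_eer = float("inf"), None
    for thr in sorted(set(bonafide_scores) | set(spoof_scores)):
        frr = sum(s < thr for s in bonafide_scores) / len(bonafide_scores)
        far = sum(s >= thr for s in spoof_scores) / len(spoof_scores)
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer


def average_fusion(*system_scores):
    """Simple score-level fusion: per-trial mean over several systems."""
    return [sum(col) / len(col) for col in zip(*system_scores)]


# Made-up scores for two hypothetical single systems on the same trials.
a_bona, a_spoof = [0.9, 0.8, 0.3, 0.7], [0.1, 0.2, 0.6, 0.4]
b_bona, b_spoof = [0.6, 0.7, 0.9, 0.8], [0.3, 0.8, 0.2, 0.1]
fused_bona = average_fusion(a_bona, b_bona)
fused_spoof = average_fusion(a_spoof, b_spoof)
print(eer(a_bona, a_spoof), eer(b_bona, b_spoof), eer(fused_bona, fused_spoof))
```

Averaging helps here because the two systems make their errors on different trials; with correlated errors, simple averaging gains much less.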

Classifying logistic vehicles in cities using Deep learning


Title	Classifying logistic vehicles in cities using Deep learning
Authors	Salma Benslimane, Simon Tamayo, Arnaud de La Fortelle
Abstract	Delivery and freight transportation are growing rapidly in urban areas; as a result, the use of delivery trucks and light commercial vehicles is evolving. Major cities can use traffic counting as a tool to monitor the presence of delivery vehicles in order to implement intelligent city planning measures. Classical methods for counting vehicles use mechanical, electromagnetic or pneumatic sensors, but these devices are costly, difficult to implement and only detect the presence of vehicles without giving information about their category, model or trajectory. This paper proposes a Deep Learning tool for classifying vehicles in a given image while considering different categories of logistic vehicles, namely: light-duty, medium-duty and heavy-duty vehicles. The proposed approach yields two main contributions: first, we developed an architecture to create an annotated and balanced database of logistic vehicles, reducing manual annotation efforts. Second, we built a classifier that accurately classifies the logistic vehicles passing through a given road. The results of this work are: first, a database of 72 000 images for 4 vehicle classes; and second, two retrained convolutional neural networks (InceptionV3 and MobileNetV2) capable of classifying vehicles with accuracies over 90%.
Tasks
Published	2019-06-04
URL	https://arxiv.org/abs/1906.11895v1
PDF	https://arxiv.org/pdf/1906.11895v1.pdf
PWC	https://paperswithcode.com/paper/classifying-logistic-vehicles-in-cities-using
Repo
Framework

Contextual Phonetic Pretraining for End-to-end Utterance-level Language and Speaker Recognition


Title	Contextual Phonetic Pretraining for End-to-end Utterance-level Language and Speaker Recognition
Authors	Shaoshi Ling, Julian Salazar, Katrin Kirchhoff
Abstract	Pretrained contextual word representations in NLP have greatly improved performance on various downstream tasks. For speech, we propose contextual frame representations that capture phonetic information at the acoustic frame level and can be used for utterance-level language, speaker, and speech recognition. These representations come from the frame-wise intermediate representations of an end-to-end, self-attentive ASR model (SAN-CTC) on spoken utterances. We first train the model on the Fisher English corpus with context-independent phoneme labels, then use its representations at inference time as features for task-specific models on the NIST LRE07 closed-set language recognition task and a Fisher speaker recognition task, giving significant improvements over the state-of-the-art on both (e.g., language EER of 4.68% on 3sec utterances, 23% relative reduction in speaker EER). Results remain competitive when using a novel dilated convolutional model for language recognition, or when ASR pretraining is done with character labels only.
Tasks	Speaker Recognition, Speech Recognition
Published	2019-06-30
URL	https://arxiv.org/abs/1907.00457v1
PDF	https://arxiv.org/pdf/1907.00457v1.pdf
PWC	https://paperswithcode.com/paper/contextual-phonetic-pretraining-for-end-to
Repo
Framework

I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences


Title	I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences
Authors	Kong Aik Lee, Ville Hautamaki, Tomi Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang, Chunlei Zhang, Fahimeh Bahmaninezhad, Hector Delgado, Jose Patino, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda, Trung Ngo Trong, Md Sahidullah, Fan Lu, Yun Tang, Ming Tu, Kah Kuan Teh, Huy Dat Tran, Kuruvachan K. George, Ivan Kukanov, Florent Desnous, Jichen Yang, Emre Yilmaz, Longting Xu, Jean-Francois Bonastre, Chenglin Xu, Zhi Hao Lim, Eng Siong Chng, Shivesh Ranjan, John H. L. Hansen, Massimiliano Todisco, Nicholas Evans
Abstract	The I4U consortium was established to facilitate a joint entry to NIST speaker recognition evaluations (SRE). The latest such joint submission was to SRE 2018, in which the I4U submission was among the best-performing systems. SRE’18 also marks the 10-year anniversary of the I4U consortium's participation in the NIST SRE series of evaluations. The primary objective of the current paper is to summarize the results and lessons learned based on the twelve sub-systems and their fusion submitted to SRE’18. It is also our intention to present a shared view on the advancements, progress, and major paradigm shifts that we have witnessed as SRE participants in the past decade from SRE’08 to SRE’18. In this regard, we have seen, among others, a paradigm shift from supervector representation to deep speaker embedding, and a switch of research challenge from channel compensation to domain adaptation.
Tasks	Domain Adaptation, Speaker Recognition
Published	2019-04-16
URL	http://arxiv.org/abs/1904.07386v1
PDF	http://arxiv.org/pdf/1904.07386v1.pdf
PWC	https://paperswithcode.com/paper/i4u-submission-to-nist-sre-2018-leveraging
Repo
Framework