Paper Group ANR 1452
SVAG: Unified Convergence Results for SAG-SAGA Interpolation with Stochastic Variance Adjusted Gradient Descent
Title | SVAG: Unified Convergence Results for SAG-SAGA Interpolation with Stochastic Variance Adjusted Gradient Descent |
Authors | Martin Morin, Pontus Giselsson |
Abstract | We analyze SVAG, a variance reduced stochastic gradient method with SAG and SAGA as special cases. Our convergence result for SVAG is the first to simultaneously capture both the biased low-variance method SAG and the unbiased high-variance method SAGA. In the case of SAGA, it matches previous upper bounds on the allowed step-size. The SVAG algorithm has a parameter that decides the bias-variance trade-off in the stochastic gradient estimate. We provide numerical examples demonstrating the intuition behind this bias-variance trade-off. |
Tasks | |
Published | 2019-03-21 |
URL | http://arxiv.org/abs/1903.09009v1 |
PDF | http://arxiv.org/pdf/1903.09009v1.pdf |
PWC | https://paperswithcode.com/paper/svag-unified-convergence-results-for-sag-saga |
Repo | |
Framework | |
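The bias-variance parameter mentioned in the abstract can be illustrated with a small sketch. It assumes the update x ← x − λ[(θ/n)(∇f_i(x) − y_i) + (1/n)Σ_j y_j], where the y_j are stored per-sample gradients, with θ = 1 giving SAG and θ = n giving SAGA. This update form, the function name `svag`, and the parameter names `theta` and `step` are assumptions that do not appear in the abstract, and the code is not the authors' implementation.

```python
import numpy as np

def svag(grads, x0, step, theta, iters, seed=0):
    """Hypothetical SVAG-style loop (assumed update form, see lead-in).

    grads: list of n functions, each returning the gradient of one f_i.
    theta: assumed bias-variance parameter (1 -> SAG-like, n -> SAGA-like).
    """
    rng = np.random.default_rng(seed)
    n = len(grads)
    x = np.asarray(x0, dtype=float).copy()
    # Table of stored gradients y_i, initialised at the starting point.
    table = np.stack([g(x) for g in grads])
    table_mean = table.mean(axis=0)
    for _ in range(iters):
        i = rng.integers(n)
        g_new = grads[i](x)
        diff = g_new - table[i]
        # The innovation term is scaled by theta / n; the stored mean is added as is.
        x = x - step * ((theta / n) * diff + table_mean)
        # Refresh the stored gradient and its running mean.
        table_mean = table_mean + diff / n
        table[i] = g_new
    return x
```

With `theta=1` the step direction equals the updated table mean (the biased, low-variance SAG estimate); with `theta=len(grads)` it is the unbiased SAGA estimate.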
Sparse multiresolution representations with adaptive kernels
Title | Sparse multiresolution representations with adaptive kernels |
Authors | Maria Peifer, Luiz F. O. Chamon, Santiago Paternain, Alejandro Ribeiro |
Abstract | Reproducing kernel Hilbert spaces (RKHSs) are key elements of many non-parametric tools successfully used in signal processing, statistics, and machine learning. In this work, we aim to address three issues of the classical RKHS based techniques. First, they require the RKHS to be known a priori, which is unrealistic in many applications. Furthermore, the choice of RKHS affects the shape and smoothness of the solution, thus impacting its performance. Second, RKHSs are ill-equipped to deal with heterogeneous degrees of smoothness, i.e., with functions that are smooth in some parts of their domain but vary rapidly in others. Finally, the computational complexity of evaluating the solution of these methods grows with the number of data points, rendering these techniques infeasible for many applications. Though kernel learning, local kernel adaptation, and sparsity have been used to address these issues, many of these approaches are computationally intensive or forgo optimality guarantees. We tackle these problems by leveraging a novel integral representation of functions in RKHSs that allows for arbitrary centers and different kernels at each center. To address the complexity issues, we then write the function estimation problem as a sparse functional program that explicitly minimizes the support of the representation leading to low complexity solutions. Despite their non-convexity and infinite dimensionality, we show these problems can be solved exactly and efficiently by leveraging duality, and we illustrate this new approach in simulated and real data. |
Tasks | |
Published | 2019-05-07 |
URL | https://arxiv.org/abs/1905.02797v1 |
PDF | https://arxiv.org/pdf/1905.02797v1.pdf |
PWC | https://paperswithcode.com/paper/sparse-multiresolution-representations-with |
Repo | |
Framework | |
Simultaneous regression and feature learning for facial landmarking
Title | Simultaneous regression and feature learning for facial landmarking |
Authors | Janez Križaj, Peter Peer, Vitomir Štruc, Simon Dobrišek |
Abstract | Face alignment (or facial landmarking) is an important task in many face-related applications, ranging from registration, tracking and animation to higher-level classification problems such as face, expression or attribute recognition. While several solutions have been presented in the literature for this task so far, reliably locating salient facial features across a wide range of poses still remains challenging. To address this issue, we propose in this paper a novel method for automatic facial landmark localization in 3D face data designed specifically to address appearance variability caused by significant pose variations. Our method builds on recent cascaded-regression-based approaches to facial landmarking and uses a gating mechanism to incorporate multiple linear cascaded regression models, each trained for a limited range of poses, into a single powerful landmarking model capable of processing arbitrarily posed input data. We develop two distinct approaches around the proposed gating mechanism: i) the first uses a gated multiple ridge descent (GRID) mechanism in conjunction with established (hand-crafted) HOG features for face alignment and achieves state-of-the-art landmarking performance across a wide range of facial poses, ii) the second simultaneously learns multiple-descent directions as well as binary features (SMUF) that are optimal for the alignment tasks and, in addition to competitive landmarking results, also ensures extremely rapid processing. We evaluate both approaches in rigorous experiments on several popular datasets of 3D face images, i.e., the FRGCv2 and Bosphorus 3D Face datasets and image collections F and G from the University of Notre Dame. The results of our evaluation show that both approaches are competitive in comparison to the state-of-the-art, while exhibiting considerable robustness to pose variations. |
Tasks | Face Alignment |
Published | 2019-04-24 |
URL | http://arxiv.org/abs/1904.10787v1 |
PDF | http://arxiv.org/pdf/1904.10787v1.pdf |
PWC | https://paperswithcode.com/paper/simultaneous-regression-and-feature-learning |
Repo | |
Framework | |
FoxNet: A Multi-face Alignment Method
Title | FoxNet: A Multi-face Alignment Method |
Authors | Yuxiang Wu, Zehua Cheng, Bin Huang, Yiming Chen, Xinghui Zhu, Weiyang Wang |
Abstract | Multi-face alignment aims to identify geometry structures of multiple faces in an image, and its performance is essential for many practical tasks, such as face recognition, face tracking, and face animation. In this work, we present a fast bottom-up multi-face alignment approach, which can simultaneously localize multi-person facial landmarks with high precision. In more detail, our bottom-up architecture maps the landmarks to the high-dimensional space with which landmarks of all faces are represented. By clustering the features belonging to the same face, our approach can align the multi-person facial landmarks synchronously. Extensive experiments show that our method can achieve high performance in the multi-face landmark alignment task while our model is extremely fast. Moreover, we propose a new multi-face dataset to compare the speed and precision of bottom-up face alignment methods with top-down methods. Our dataset is publicly available at https://github.com/AISAResearch/FoxNet |
Tasks | Face Alignment, Face Recognition |
Published | 2019-04-22 |
URL | https://arxiv.org/abs/1904.09758v2 |
PDF | https://arxiv.org/pdf/1904.09758v2.pdf |
PWC | https://paperswithcode.com/paper/foxnet-a-multi-face-alignment-method |
Repo | |
Framework | |
Joint Face Detection and Facial Motion Retargeting for Multiple Faces
Title | Joint Face Detection and Facial Motion Retargeting for Multiple Faces |
Authors | Bindita Chaudhuri, Noranart Vesdapunt, Baoyuan Wang |
Abstract | Facial motion retargeting is an important problem in both computer graphics and vision, which involves capturing the performance of a human face and transferring it to another 3D character. Learning 3D morphable model (3DMM) parameters from 2D face images using convolutional neural networks is common in 2D face alignment, 3D face reconstruction etc. However, existing methods either require an additional face detection step before retargeting or use a cascade of separate networks to perform detection followed by retargeting in a sequence. In this paper, we present a single end-to-end network to jointly predict the bounding box locations and 3DMM parameters for multiple faces. First, we design a novel multitask learning framework that learns a disentangled representation of 3DMM parameters for a single face. Then, we leverage the trained single face model to generate ground truth 3DMM parameters for multiple faces to train another network that performs joint face detection and motion retargeting for images with multiple faces. Experimental results show that our joint detection and retargeting network has high face detection accuracy and is robust to extreme expressions and poses while being faster than state-of-the-art methods. |
Tasks | 3D Face Reconstruction, Face Alignment, Face Detection, Face Reconstruction |
Published | 2019-02-27 |
URL | http://arxiv.org/abs/1902.10744v1 |
PDF | http://arxiv.org/pdf/1902.10744v1.pdf |
PWC | https://paperswithcode.com/paper/joint-face-detection-and-facial-motion |
Repo | |
Framework | |
HJB Optimal Feedback Control with Deep Differential Value Functions and Action Constraints
Title | HJB Optimal Feedback Control with Deep Differential Value Functions and Action Constraints |
Authors | Michael Lutter, Boris Belousov, Kim Listmann, Debora Clever, Jan Peters |
Abstract | Learning optimal feedback control laws capable of executing optimal trajectories is essential for many robotic applications. Such policies can be learned using reinforcement learning or planned using optimal control. While reinforcement learning is sample inefficient, optimal control only plans an optimal trajectory from a specific starting configuration. In this paper we propose deep optimal feedback control to learn an optimal feedback policy rather than a single trajectory. By exploiting the inherent structure of the robot dynamics and strictly convex action cost, we can derive principled cost functions such that the optimal policy naturally obeys the action limits, is globally optimal and stable on the training domain given the optimal value function. The corresponding optimal value function is learned end-to-end by embedding a deep differential network in the Hamilton-Jacobi-Bellman differential equation and minimizing the error of this equality while simultaneously decreasing the discounting from short- to far-sighted to enable the learning. Our proposed approach enables us to learn an optimal feedback control law in continuous time that, in contrast to existing approaches, generates an optimal trajectory from any point in state-space without the need for replanning. The resulting approach is evaluated on non-linear systems and achieves optimal feedback control, where standard optimal control methods require frequent replanning. |
Tasks | |
Published | 2019-09-13 |
URL | https://arxiv.org/abs/1909.06153v2 |
PDF | https://arxiv.org/pdf/1909.06153v2.pdf |
PWC | https://paperswithcode.com/paper/hjb-optimal-feedback-control-with-deep |
Repo | |
Framework | |
Iterative Batch Back-Translation for Neural Machine Translation: A Conceptual Model
Title | Iterative Batch Back-Translation for Neural Machine Translation: A Conceptual Model |
Authors | Idris Abdulmumin, Bashir Shehu Galadanci, Abubakar Isa |
Abstract | An effective method to generate a large number of parallel sentences for training improved neural machine translation (NMT) systems is the use of back-translations of the target-side monolingual data. Recently, iterative back-translation has been shown to outperform standard back-translation albeit on some language pairs. This work proposes the iterative batch back-translation that is aimed at enhancing the standard iterative back-translation and enabling the efficient utilization of more monolingual data. After each iteration, improved back-translations of new sentences are added to the parallel data that will be used to train the final forward model. The work presents a conceptual model of the proposed approach. |
Tasks | Machine Translation |
Published | 2019-11-26 |
URL | https://arxiv.org/abs/2001.11327v1 |
PDF | https://arxiv.org/pdf/2001.11327v1.pdf |
PWC | https://paperswithcode.com/paper/iterative-batch-back-translation-for-neural |
Repo | |
Framework | |
Training a Neural Speech Waveform Model using Spectral Losses of Short-Time Fourier Transform and Continuous Wavelet Transform
Title | Training a Neural Speech Waveform Model using Spectral Losses of Short-Time Fourier Transform and Continuous Wavelet Transform |
Authors | Shinji Takaki, Hirokazu Kameoka, Junichi Yamagishi |
Abstract | Recently, we proposed short-time Fourier transform (STFT)-based loss functions for training a neural speech waveform model. In this paper, we generalize the above framework and propose a training scheme for such models based on spectral amplitude and phase losses obtained by either STFT or continuous wavelet transform (CWT), or both of them. Since CWT is capable of having time and frequency resolutions different from those of STFT and is capable of considering those closer to human auditory scales, the proposed loss functions could provide complementary information on speech signals. Experimental results showed that it is possible to train a high-quality model by using the proposed CWT spectral loss and that this model is as good as one trained using the STFT-based loss. |
Tasks | |
Published | 2019-03-29 |
URL | http://arxiv.org/abs/1903.12392v2 |
PDF | http://arxiv.org/pdf/1903.12392v2.pdf |
PWC | https://paperswithcode.com/paper/training-a-neural-speech-waveform-model-using |
Repo | |
Framework | |
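To make the idea of a spectral amplitude loss concrete, the following is a minimal NumPy sketch of an STFT-based amplitude loss between a generated and a reference waveform. The abstract does not give the loss definition, so the mean squared log-amplitude difference used here, the Hann window, and the names `stft_amplitude`, `amplitude_loss`, `frame_len` and `hop` are all illustrative assumptions. The phase and CWT losses mentioned in the abstract are not covered.

```python
import numpy as np

def stft_amplitude(signal, frame_len=256, hop=64):
    """Amplitude spectrogram from Hann-windowed frames (assumed settings)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[t * hop : t * hop + frame_len] * window
        for t in range(n_frames)
    ])
    # One row per frame, one column per non-negative frequency bin.
    return np.abs(np.fft.rfft(frames, axis=1))

def amplitude_loss(predicted, target, eps=1e-7):
    """Mean squared difference of log amplitudes (an assumed loss form)."""
    a_pred = stft_amplitude(predicted)
    a_tgt = stft_amplitude(target)
    return float(np.mean((np.log(a_pred + eps) - np.log(a_tgt + eps)) ** 2))
```

In a training setup this quantity would be computed with a differentiable framework so that gradients can flow back to the waveform model; NumPy is used here only to show the arithmetic.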
Autonomous Penetration Testing using Reinforcement Learning
Title | Autonomous Penetration Testing using Reinforcement Learning |
Authors | Jonathon Schwartz, Hanna Kurniawati |
Abstract | Penetration testing (pentesting) involves performing a controlled attack on a computer system in order to assess its security. Although an effective method for testing security, pentesting requires highly skilled practitioners and currently there is a growing shortage of skilled cyber security professionals. One avenue for alleviating this problem is to automate the pentesting process using artificial intelligence techniques. Current approaches to automated pentesting have relied on model-based planning; however, the cyber security landscape is rapidly changing, making maintaining up-to-date models of exploits a challenge. This project investigated the application of model-free Reinforcement Learning (RL) to automated pentesting. Model-free RL has the key advantage over model-based planning of not requiring a model of the environment, instead learning the best policy through interaction with the environment. We first designed and built a fast, low compute simulator for training and testing autonomous pentesting agents. We did this by framing pentesting as a Markov Decision Process with the known configuration of the network as states, the available scans and exploits as actions, and the reward determined by the value of machines on the network. We then used this simulator to investigate the application of model-free RL to pentesting. We tested the standard Q-learning algorithm using both tabular and neural network based implementations. We found that within the simulated environment both tabular and neural network implementations were able to find optimal attack paths for a range of different network topologies and sizes without having a model of action behaviour. However, the implemented algorithms were only practical for smaller networks and numbers of actions. Further work is needed in developing scalable RL algorithms and testing these algorithms in larger and higher fidelity environments. |
Tasks | Q-Learning |
Published | 2019-05-15 |
URL | https://arxiv.org/abs/1905.05965v1 |
PDF | https://arxiv.org/pdf/1905.05965v1.pdf |
PWC | https://paperswithcode.com/paper/autonomous-penetration-testing-using |
Repo | |
Framework | |
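The tabular Q-learning setup described in the abstract can be sketched on a toy problem. The environment below is a hypothetical stand-in, not the authors' simulator: the state is the number of machines compromised along a single attack path, action 1 is a successful exploit that advances one step, action 0 is a wasted step, every step costs 1, and reaching the final machine pays 10. All names and reward values are assumptions chosen for illustration.

```python
import numpy as np

def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning on a toy chain MDP (hypothetical environment)."""
    rng = np.random.default_rng(seed)
    q = np.zeros((n_states, 2))          # Q[state, action]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        for _ in range(100):             # cap on episode length
            # Epsilon-greedy action selection.
            if rng.random() < eps:
                a = int(rng.integers(2))
            else:
                a = int(np.argmax(q[s]))
            s_next = min(s + 1, goal) if a == 1 else s
            done = s_next == goal
            r = 10.0 if done else -1.0
            # Standard Q-learning target; no bootstrapping from the terminal state.
            target = r if done else r + gamma * q[s_next].max()
            q[s, a] += alpha * (target - q[s, a])
            s = s_next
            if done:
                break
    return q
```

Reading the greedy policy from the table with `np.argmax(q[:-1], axis=1)` gives the learned attack path; in this toy chain that means choosing action 1 in every non-terminal state.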
Design of Artificial Intelligence Agents for Games using Deep Reinforcement Learning
Title | Design of Artificial Intelligence Agents for Games using Deep Reinforcement Learning |
Authors | Andrei Claudiu Roibu |
Abstract | In order to perform a large variety of tasks and to achieve human-level performance in complex real-world environments, Artificial Intelligence (AI) Agents must be able to learn from their past experiences and gain both knowledge and an accurate representation of their environment from raw sensory inputs. Traditionally, AI agents have suffered from difficulties in using only sensory inputs to obtain a good representation of their environment and then mapping this representation to an efficient control policy. Deep reinforcement learning algorithms have provided a solution to this issue. In this study, the performance of different conventional and novel deep reinforcement learning algorithms was analysed. The proposed method utilises two types of algorithms, one trained with a variant of Q-learning (DQN) and another trained with SARSA learning (DSN) to assess the feasibility of using direct feedback alignment, a novel biologically plausible method for back-propagating the error. These novel agents, alongside two similar agents trained with the conventional backpropagation algorithm, were tested by using the OpenAI Gym toolkit on several classic control theory problems and Atari 2600 video games. The results of this investigation open the way into new, biologically-inspired deep reinforcement learning algorithms, and their implementation on neuromorphic hardware. |
Tasks | Q-Learning |
Published | 2019-05-10 |
URL | https://arxiv.org/abs/1905.04127v1 |
PDF | https://arxiv.org/pdf/1905.04127v1.pdf |
PWC | https://paperswithcode.com/paper/design-of-artificial-intelligence-agents-for |
Repo | |
Framework | |
ptype: Probabilistic Type Inference
Title | ptype: Probabilistic Type Inference |
Authors | Taha Ceritli, Christopher K. I. Williams, James Geddes |
Abstract | Type inference refers to the task of inferring the data type of a given column of data. Current approaches often fail when data contains missing data and anomalies, which are found commonly in real-world data sets. In this paper, we propose ptype, a probabilistic robust type inference method that allows us to detect such entries, and infer data types. We further show that the proposed method outperforms the existing methods. |
Tasks | |
Published | 2019-11-22 |
URL | https://arxiv.org/abs/1911.10081v2 |
PDF | https://arxiv.org/pdf/1911.10081v2.pdf |
PWC | https://paperswithcode.com/paper/ptype-probabilistic-type-inference |
Repo | |
Framework | |
The DKU Replay Detection System for the ASVspoof 2019 Challenge: On Data Augmentation, Feature Representation, Classification, and Fusion
Title | The DKU Replay Detection System for the ASVspoof 2019 Challenge: On Data Augmentation, Feature Representation, Classification, and Fusion |
Authors | Weicheng Cai, Haiwei Wu, Danwei Cai, Ming Li |
Abstract | This paper describes our DKU replay detection system for the ASVspoof 2019 challenge. The goal is to develop a spoofing countermeasure for automatic speaker recognition in the physical access scenario. We leverage the countermeasure system pipeline from four aspects: data augmentation, feature representation, classification, and fusion. First, we introduce an utterance-level deep learning framework for anti-spoofing. It receives the variable-length feature sequence and outputs the utterance-level scores directly. Based on the framework, we try out various kinds of input feature representations extracted from either the magnitude spectrum or phase spectrum. We also perform data augmentation by applying speed perturbation to the raw waveform. Our best single system employs a residual neural network trained on the speed-perturbed group delay gram. It achieves an EER of 1.04% on the development set and an EER of 1.08% on the evaluation set. Finally, using the simple average score from several single systems can further improve the performance. An EER of 0.24% on the development set and 0.66% on the evaluation set is obtained for our primary system. |
Tasks | Data Augmentation, Speaker Recognition |
Published | 2019-07-05 |
URL | https://arxiv.org/abs/1907.02663v1 |
PDF | https://arxiv.org/pdf/1907.02663v1.pdf |
PWC | https://paperswithcode.com/paper/the-dku-replay-detection-system-for-the |
Repo | |
Framework | |
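The abstract reports results as equal error rate (EER) and fuses systems by simple score averaging. The sketch below shows one common way to compute both on raw scores. The threshold sweep, the convention that higher scores mean bona fide speech, and the names `equal_error_rate` and `fuse_by_average` are assumptions for illustration; this is not the challenge's official scoring tool.

```python
import numpy as np

def equal_error_rate(bonafide_scores, spoof_scores):
    """EER via a sweep over all observed scores (higher score = bona fide)."""
    bonafide_scores = np.asarray(bonafide_scores, dtype=float)
    spoof_scores = np.asarray(spoof_scores, dtype=float)
    thresholds = np.sort(np.concatenate([bonafide_scores, spoof_scores]))
    # False acceptance: spoofed trials scoring at or above the threshold.
    far = np.array([(spoof_scores >= t).mean() for t in thresholds])
    # False rejection: bona fide trials scoring below the threshold.
    frr = np.array([(bonafide_scores < t).mean() for t in thresholds])
    idx = int(np.argmin(np.abs(far - frr)))
    return float((far[idx] + frr[idx]) / 2)

def fuse_by_average(score_lists):
    """Average the per-trial scores of several systems (simple fusion)."""
    return np.mean(np.stack([np.asarray(s, dtype=float) for s in score_lists]), axis=0)
```

Averaging only makes sense when the systems produce scores on comparable scales; any normalisation needed before fusion is left out of this sketch.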
Classifying logistic vehicles in cities using Deep learning
Title | Classifying logistic vehicles in cities using Deep learning |
Authors | Salma Benslimane, Simon Tamayo, Arnaud de La Fortelle |
Abstract | Delivery and freight transportation is growing rapidly in urban areas; as a result, the use of delivery trucks and light commercial vehicles is evolving. Major cities can use traffic counting as a tool to monitor the presence of delivery vehicles in order to implement intelligent city planning measures. Classical methods for counting vehicles use mechanical, electromagnetic or pneumatic sensors, but these devices are costly, difficult to implement and only detect the presence of vehicles without giving information about their category, model or trajectory. This paper proposes a Deep Learning tool for classifying vehicles in a given image while considering different categories of logistic vehicles, namely: light-duty, medium-duty and heavy-duty vehicles. The proposed approach yields two main contributions: first, we developed an architecture to create an annotated and balanced database of logistic vehicles, reducing manual annotation efforts. Second, we built a classifier that accurately classifies the logistic vehicles passing through a given road. The results of this work are: first, a database of 72 000 images for 4 vehicle classes; and second, two retrained convolutional neural networks (InceptionV3 and MobileNetV2) capable of classifying vehicles with accuracies over 90%. |
Tasks | |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.11895v1 |
PDF | https://arxiv.org/pdf/1906.11895v1.pdf |
PWC | https://paperswithcode.com/paper/classifying-logistic-vehicles-in-cities-using |
Repo | |
Framework | |
Contextual Phonetic Pretraining for End-to-end Utterance-level Language and Speaker Recognition
Title | Contextual Phonetic Pretraining for End-to-end Utterance-level Language and Speaker Recognition |
Authors | Shaoshi Ling, Julian Salazar, Katrin Kirchhoff |
Abstract | Pretrained contextual word representations in NLP have greatly improved performance on various downstream tasks. For speech, we propose contextual frame representations that capture phonetic information at the acoustic frame level and can be used for utterance-level language, speaker, and speech recognition. These representations come from the frame-wise intermediate representations of an end-to-end, self-attentive ASR model (SAN-CTC) on spoken utterances. We first train the model on the Fisher English corpus with context-independent phoneme labels, then use its representations at inference time as features for task-specific models on the NIST LRE07 closed-set language recognition task and a Fisher speaker recognition task, giving significant improvements over the state-of-the-art on both (e.g., language EER of 4.68% on 3sec utterances, 23% relative reduction in speaker EER). Results remain competitive when using a novel dilated convolutional model for language recognition, or when ASR pretraining is done with character labels only. |
Tasks | Speaker Recognition, Speech Recognition |
Published | 2019-06-30 |
URL | https://arxiv.org/abs/1907.00457v1 |
PDF | https://arxiv.org/pdf/1907.00457v1.pdf |
PWC | https://paperswithcode.com/paper/contextual-phonetic-pretraining-for-end-to |
Repo | |
Framework | |
I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences
Title | I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences |
Authors | Kong Aik Lee, Ville Hautamaki, Tomi Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang, Chunlei Zhang, Fahimeh Bahmaninezhad, Hector Delgado, Jose Patino, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda, Trung Ngo Trong, Md Sahidullah, Fan Lu, Yun Tang, Ming Tu, Kah Kuan Teh, Huy Dat Tran, Kuruvachan K. George, Ivan Kukanov, Florent Desnous, Jichen Yang, Emre Yilmaz, Longting Xu, Jean-Francois Bonastre, Chenglin Xu, Zhi Hao Lim, Eng Siong Chng, Shivesh Ranjan, John H. L. Hansen, Massimiliano Todisco, Nicholas Evans |
Abstract | The I4U consortium was established to facilitate a joint entry to NIST speaker recognition evaluations (SRE). The latest edition of such a joint submission was in SRE 2018, in which the I4U submission was among the best-performing systems. SRE’18 also marks the 10-year anniversary of the I4U consortium's participation in the NIST SRE series of evaluations. The primary objective of the current paper is to summarize the results and lessons learned based on the twelve sub-systems and their fusion submitted to SRE’18. It is also our intention to present a shared view on the advancements, progress, and major paradigm shifts that we have witnessed as an SRE participant in the past decade from SRE’08 to SRE’18. In this regard, we have seen, among others, a paradigm shift from supervector representation to deep speaker embedding, and a switch of research challenge from channel compensation to domain adaptation. |
Tasks | Domain Adaptation, Speaker Recognition |
Published | 2019-04-16 |
URL | http://arxiv.org/abs/1904.07386v1 |
PDF | http://arxiv.org/pdf/1904.07386v1.pdf |
PWC | https://paperswithcode.com/paper/i4u-submission-to-nist-sre-2018-leveraging |
Repo | |
Framework | |