October 21, 2019

2839 words · 14 min read

Paper Group AWR 92


The Matrix Calculus You Need For Deep Learning. Global-Locally Self-Attentive Dialogue State Tracker. SAFE: A Neural Survival Analysis Model for Fraud Early Detection. MesoNet: a Compact Facial Video Forgery Detection Network. Few-shot 3D Multi-modal Medical Image Segmentation using Generative Adversarial Learning. Learning Compressed Transforms with Low Displacement Rank …

The Matrix Calculus You Need For Deep Learning

Title The Matrix Calculus You Need For Deep Learning
Authors Terence Parr, Jeremy Howard
Abstract This paper is an attempt to explain all the matrix calculus you need in order to understand the training of deep neural networks. We assume no math knowledge beyond what you learned in calculus 1, and provide links to help you refresh the necessary math where needed. Note that you do not need to understand this material before you start learning to train and use deep learning in practice; rather, this material is for those who are already familiar with the basics of neural networks, and wish to deepen their understanding of the underlying math. Don’t worry if you get stuck at some point along the way—just go back and reread the previous section, and try writing down and working through some examples. And if you’re still stuck, we’re happy to answer your questions in the Theory category at forums.fast.ai. Note: There is a reference section at the end of the paper summarizing all the key matrix calculus rules and terminology discussed here. See related articles at http://explained.ai
Tasks
Published 2018-02-05
URL http://arxiv.org/abs/1802.01528v3
PDF http://arxiv.org/pdf/1802.01528v3.pdf
PWC https://paperswithcode.com/paper/the-matrix-calculus-you-need-for-deep
Repo https://github.com/leandromineti/ml-knowledge-graph
Framework none
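
As a flavor of the reference rules the paper collects, here is one standard identity within its scope, stated in my own notation rather than quoted from the paper: an element-wise activation has a diagonal Jacobian, and the vector chain rule composes it with the affine layer's Jacobian.

```latex
% Jacobian of an element-wise activation y = f(z)
\frac{\partial \mathbf{y}}{\partial \mathbf{z}}
  = \operatorname{diag}\!\big(f'(z_1), \ldots, f'(z_n)\big)

% Vector chain rule for a dense layer z = Wx + b
\frac{\partial \mathbf{y}}{\partial \mathbf{x}}
  = \frac{\partial \mathbf{y}}{\partial \mathbf{z}}\,
    \frac{\partial \mathbf{z}}{\partial \mathbf{x}}
  = \operatorname{diag}\!\big(f'(\mathbf{z})\big)\, W
```

The diagonal structure is why scalar-calculus intuition mostly carries over: off-diagonal terms of element-wise operations vanish.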

Global-Locally Self-Attentive Dialogue State Tracker

Title Global-Locally Self-Attentive Dialogue State Tracker
Authors Victor Zhong, Caiming Xiong, Richard Socher
Abstract Dialogue state tracking, which estimates user goals and requests given the dialogue context, is an essential part of task-oriented dialogue systems. In this paper, we propose the Global-Locally Self-Attentive Dialogue State Tracker (GLAD), which learns representations of the user utterance and previous system actions with global-local modules. Our model uses global modules to share parameters between estimators for different types (called slots) of dialogue states, and uses local modules to learn slot-specific features. We show that this significantly improves tracking of rare states and achieves state-of-the-art performance on the WoZ and DSTC2 state tracking tasks. GLAD obtains 88.1% joint goal accuracy and 97.1% request accuracy on WoZ, outperforming prior work by 3.7% and 5.5%. On DSTC2, our model obtains 74.5% joint goal accuracy and 97.5% request accuracy, outperforming prior work by 1.1% and 1.0%.
Tasks Dialogue State Tracking, Task-Oriented Dialogue Systems
Published 2018-05-19
URL http://arxiv.org/abs/1805.09655v3
PDF http://arxiv.org/pdf/1805.09655v3.pdf
PWC https://paperswithcode.com/paper/global-locally-self-attentive-dialogue-state
Repo https://github.com/salesforce/glad
Framework none
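
A minimal PyTorch sketch of the global-local idea, under assumptions of mine (BiLSTM encoders and a per-slot sigmoid mixing gate; the real architecture is in the paper and the salesforce/glad repo): a globally shared encoder lets rare slots borrow statistics from all slots, while local encoders keep slot-specific features.

```python
import torch
import torch.nn as nn

class GlobalLocalEncoder(nn.Module):
    def __init__(self, d_in, d_hid, n_slots):
        super().__init__()
        # one encoder shared by every slot ...
        self.global_rnn = nn.LSTM(d_in, d_hid, bidirectional=True, batch_first=True)
        # ... plus one private encoder per slot
        self.local_rnns = nn.ModuleList(
            nn.LSTM(d_in, d_hid, bidirectional=True, batch_first=True)
            for _ in range(n_slots))
        self.beta = nn.Parameter(torch.zeros(n_slots))  # per-slot mixing gate

    def forward(self, x, slot):                # x: (batch, seq, d_in)
        h_global, _ = self.global_rnn(x)       # parameters shared across slots
        h_local, _ = self.local_rnns[slot](x)  # slot-specific features
        b = torch.sigmoid(self.beta[slot])
        return b * h_local + (1 - b) * h_global

enc = GlobalLocalEncoder(d_in=32, d_hid=64, n_slots=3)
h = enc(torch.randn(2, 10, 32), slot=1)        # (2, 10, 128)
```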

SAFE: A Neural Survival Analysis Model for Fraud Early Detection

Title SAFE: A Neural Survival Analysis Model for Fraud Early Detection
Authors Panpan Zheng, Shuhan Yuan, Xintao Wu
Abstract Many online platforms have deployed anti-fraud systems to detect and prevent fraudulent activities. However, there is usually a gap between the time that a user commits a fraudulent action and the time that the user is suspended by the platform. How to detect fraudsters in time is a challenging problem. Most of the existing approaches adopt classifiers to predict fraudsters given their activity sequences along time. The main drawback of classification models is that the prediction results between consecutive timestamps are often inconsistent. In this paper, we propose a survival analysis based fraud early detection model, SAFE, which maps dynamic user activities to survival probabilities that are guaranteed to be monotonically decreasing along time. SAFE adopts recurrent neural network (RNN) to handle user activity sequences and directly outputs hazard values at each timestamp, and then, survival probability derived from hazard values is deployed to achieve consistent predictions. Because we only observe the user suspended time instead of the fraudulent activity time in the training data, we revise the loss function of the regular survival model to achieve fraud early detection. Experimental results on two real world datasets demonstrate that SAFE outperforms both the survival analysis model and recurrent neural network model alone as well as state-of-the-art fraud early detection approaches.
Tasks Survival Analysis
Published 2018-09-12
URL http://arxiv.org/abs/1809.04683v2
PDF http://arxiv.org/pdf/1809.04683v2.pdf
PWC https://paperswithcode.com/paper/safe-a-neural-survival-analysis-model-for
Repo https://github.com/PanpanZheng/SAFE
Framework tf
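
A minimal sketch of the core mapping, assuming a GRU and a softplus hazard head (my choices, not necessarily the paper's): non-negative hazards accumulated over time make the derived survival probability monotonically non-increasing, which is what yields consistent predictions across consecutive timestamps.

```python
import torch
import torch.nn as nn

class HazardRNN(nn.Module):
    def __init__(self, d_in, d_hid):
        super().__init__()
        self.rnn = nn.GRU(d_in, d_hid, batch_first=True)
        self.head = nn.Linear(d_hid, 1)

    def forward(self, x):                      # x: (batch, T, d_in) activities
        h, _ = self.rnn(x)
        hazard = nn.functional.softplus(self.head(h)).squeeze(-1)  # h_t >= 0
        survival = torch.exp(-hazard.cumsum(dim=1))  # S(t), non-increasing in t
        return hazard, survival

model = HazardRNN(d_in=8, d_hid=16)
hz, surv = model(torch.randn(4, 12, 8))
assert (surv[:, 1:] <= surv[:, :-1] + 1e-6).all()   # consistent over time
```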

MesoNet: a Compact Facial Video Forgery Detection Network

Title MesoNet: a Compact Facial Video Forgery Detection Network
Authors Darius Afchar, Vincent Nozick, Junichi Yamagishi, Isao Echizen
Abstract This paper presents a method to automatically and efficiently detect face tampering in videos, and particularly focuses on two recent techniques used to generate hyper-realistic forged videos: Deepfake and Face2Face. Traditional image forensics techniques are usually not well suited to videos due to the compression that strongly degrades the data. Thus, this paper follows a deep learning approach and presents two networks, both with a low number of layers to focus on the mesoscopic properties of images. We evaluate those fast networks on both an existing dataset and a dataset we have constituted from online videos. The tests demonstrate a very successful detection rate with more than 98% for Deepfake and 95% for Face2Face.
Tasks DeepFake Detection, Face Swapping, Fake Image Detection
Published 2018-09-04
URL http://arxiv.org/abs/1809.00888v1
PDF http://arxiv.org/pdf/1809.00888v1.pdf
PWC https://paperswithcode.com/paper/mesonet-a-compact-facial-video-forgery
Repo https://github.com/HongguLiu/MesoNet.Pytorch
Framework pytorch
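
A rough Meso4-flavored sketch (filter counts and kernel sizes here are approximations of the paper's compact design, not a faithful reimplementation; the linked repo has one): a handful of conv blocks keeps the parameter count small and the features mesoscopic, ending in a binary real/forged head.

```python
import torch
import torch.nn as nn

def block(c_in, c_out, k):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, padding=k // 2),
        nn.BatchNorm2d(c_out), nn.ReLU(),
        nn.MaxPool2d(2))

meso = nn.Sequential(
    block(3, 8, 3), block(8, 8, 5), block(8, 16, 5), block(16, 16, 5),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 16), nn.LeakyReLU(0.1), nn.Dropout(0.5),
    nn.Linear(16, 1))                          # logit: forged vs. real

logit = meso(torch.randn(1, 3, 256, 256))
```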

Few-shot 3D Multi-modal Medical Image Segmentation using Generative Adversarial Learning

Title Few-shot 3D Multi-modal Medical Image Segmentation using Generative Adversarial Learning
Authors Arnab Kumar Mondal, Jose Dolz, Christian Desrosiers
Abstract We address the problem of segmenting 3D multi-modal medical images in scenarios where very few labeled examples are available for training. Leveraging the recent success of adversarial learning for semi-supervised segmentation, we propose a novel method based on Generative Adversarial Networks (GANs) to train a segmentation model with both labeled and unlabeled images. The proposed method prevents over-fitting by learning to discriminate between true and fake patches obtained by a generator network. Our work extends current adversarial learning approaches, which focus on 2D single-modality images, to the more challenging context of 3D volumes of multiple modalities. The proposed method is evaluated on the problem of segmenting brain MRI from the iSEG-2017 and MRBrainS 2013 datasets. Significant performance improvement is reported, compared to state-of-art segmentation networks trained in a fully-supervised manner. In addition, our work presents a comprehensive analysis of different GAN architectures for semi-supervised segmentation, showing recent techniques like feature matching to yield a higher performance than conventional adversarial training approaches. Our code is publicly available at https://github.com/arnab39/FewShot_GAN-Unet3D
Tasks 3D Medical Imaging Segmentation, Brain Image Segmentation, Brain Segmentation, Few-Shot Semantic Segmentation, Medical Image Segmentation, Semantic Segmentation, Semi-Supervised Semantic Segmentation
Published 2018-10-29
URL http://arxiv.org/abs/1810.12241v1
PDF http://arxiv.org/pdf/1810.12241v1.pdf
PWC https://paperswithcode.com/paper/few-shot-3d-multi-modal-medical-image
Repo https://github.com/arnab39/FewShot_GAN-Unet3D
Framework tf
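
A minimal rendering of the feature-matching objective the abstract singles out (standard GAN technique, not the released code): rather than fooling the discriminator directly, the generator matches the mean intermediate discriminator features of real and generated 3D patches.

```python
import torch

def feature_matching_loss(disc_features, real_patches, fake_patches):
    """disc_features maps a patch batch to an intermediate discriminator
    feature map; inputs are (batch, C, D, H, W) 3D patch volumes."""
    f_real = disc_features(real_patches).mean(dim=0).detach()  # stop real grad
    f_fake = disc_features(fake_patches).mean(dim=0)
    return torch.mean((f_real - f_fake) ** 2)

# usage with a stand-in 3D feature extractor
feats = torch.nn.Conv3d(1, 4, 3, padding=1)
loss = feature_matching_loss(feats, torch.randn(2, 1, 8, 16, 16),
                             torch.randn(2, 1, 8, 16, 16))
```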

Learning Compressed Transforms with Low Displacement Rank

Title Learning Compressed Transforms with Low Displacement Rank
Authors Anna T. Thomas, Albert Gu, Tri Dao, Atri Rudra, Christopher Ré
Abstract The low displacement rank (LDR) framework for structured matrices represents a matrix through two displacement operators and a low-rank residual. Existing use of LDR matrices in deep learning has applied fixed displacement operators encoding forms of shift invariance akin to convolutions. We introduce a class of LDR matrices with more general displacement operators, and explicitly learn over both the operators and the low-rank component. This class generalizes several previous constructions while preserving compression and efficient computation. We prove bounds on the VC dimension of multi-layer neural networks with structured weight matrices and show empirically that our compact parameterization can reduce the sample complexity of learning. When replacing weight layers in fully-connected, convolutional, and recurrent neural networks for image classification and language modeling tasks, our new classes exceed the accuracy of existing compression approaches, and on some tasks also outperform general unstructured layers while using more than 20x fewer parameters.
Tasks Image Classification, Language Modelling
Published 2018-10-04
URL http://arxiv.org/abs/1810.02309v3
PDF http://arxiv.org/pdf/1810.02309v3.pdf
PWC https://paperswithcode.com/paper/learning-compressed-transforms-with-low
Repo https://github.com/HazyResearch/structured-nets
Framework pytorch
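
To make the displacement-rank idea concrete, here is standard LDR background rather than the authors' code: under fixed shift displacement operators, a Toeplitz matrix leaves a rank-2 residual, i.e. it is fully described by two operators plus a low-rank term. The paper's contribution is to learn the operators and the low-rank component jointly.

```python
import numpy as np

n = 6
Z = np.eye(n, k=-1)                            # lower shift operator
t = np.random.randn(2 * n - 1)
T = np.array([[t[i - j + n - 1] for j in range(n)]
              for i in range(n)])              # a Toeplitz matrix

residual = T - Z @ T @ Z.T                     # Sylvester-style displacement
print(np.linalg.matrix_rank(residual))         # 2: low displacement rank
```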

E2E-MLT - an Unconstrained End-to-End Method for Multi-Language Scene Text

Title E2E-MLT - an Unconstrained End-to-End Method for Multi-Language Scene Text
Authors Michal Bušta, Yash Patel, Jiri Matas
Abstract An end-to-end trainable (fully differentiable) method for multi-language scene text localization and recognition is proposed. The approach is based on a single fully convolutional network (FCN) with shared layers for both tasks. E2E-MLT is the first published multi-language OCR for scene text. While trained in multi-language setup, E2E-MLT demonstrates competitive performance when compared to other methods trained for English scene text alone. The experiments show that obtaining accurate multi-language multi-script annotations is a challenging problem.
Tasks Optical Character Recognition
Published 2018-01-30
URL http://arxiv.org/abs/1801.09919v2
PDF http://arxiv.org/pdf/1801.09919v2.pdf
PWC https://paperswithcode.com/paper/e2e-mlt-an-unconstrained-end-to-end-method
Repo https://github.com/MichalBusta/E2E-MLT
Framework pytorch
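
A minimal sketch of the shared-backbone design the abstract describes (layer sizes and head shapes are assumptions of mine): one fully convolutional trunk feeds both a text-localization head and a per-position character-recognition head, trained jointly.

```python
import torch
import torch.nn as nn

class SharedFCN(nn.Module):
    def __init__(self, n_chars):
        super().__init__()
        self.trunk = nn.Sequential(              # shared FCN layers
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.loc_head = nn.Conv2d(64, 5, 1)      # e.g. box geometry + text score
        self.ocr_head = nn.Conv2d(64, n_chars, 1)  # per-position char logits

    def forward(self, img):
        f = self.trunk(img)
        return self.loc_head(f), self.ocr_head(f)

boxes, chars = SharedFCN(n_chars=400)(torch.randn(1, 3, 128, 256))
```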

Unsupervised Natural Language Generation with Denoising Autoencoders

Title Unsupervised Natural Language Generation with Denoising Autoencoders
Authors Markus Freitag, Scott Roy
Abstract Generating text from structured data is important for various tasks such as question answering and dialog systems. We show that in at least one domain, without any supervision and only based on unlabeled text, we are able to build a Natural Language Generation (NLG) system with higher performance than supervised approaches. In our approach, we interpret the structured data as a corrupt representation of the desired output and use a denoising auto-encoder to reconstruct the sentence. We show how to introduce noise into training examples that do not contain structured data, and that the resulting denoising auto-encoder generalizes to generate correct sentences when given structured data.
Tasks Denoising, Question Answering, Text Generation
Published 2018-04-21
URL http://arxiv.org/abs/1804.07899v2
PDF http://arxiv.org/pdf/1804.07899v2.pdf
PWC https://paperswithcode.com/paper/unsupervised-natural-language-generation-with
Repo https://github.com/mcleonard/NLG_Autoencoder
Framework pytorch
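
A toy version of the corruption step as I read the abstract (the paper's exact noise model differs): to make plain sentences resemble structured data, drop likely function words and lightly shuffle what remains, then train a sequence-to-sequence denoiser to reconstruct the original sentence.

```python
import random

STOPWORDS = {"the", "a", "an", "is", "are", "was", "to", "of", "in", "and"}

def corrupt(sentence, shuffle_window=2, swap_prob=0.3, seed=0):
    """Turn a sentence into a structured-data-like bag of content tokens."""
    rng = random.Random(seed)
    kept = [w for w in sentence.split() if w.lower() not in STOPWORDS]
    for i in range(len(kept) - 1):             # local shuffle mimics
        if rng.random() < swap_prob:           # unordered data fields
            j = min(i + rng.randint(1, shuffle_window), len(kept) - 1)
            kept[i], kept[j] = kept[j], kept[i]
    return " ".join(kept)

print(corrupt("the restaurant is located in the centre of town"))
```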

Rethinking Pose in 3D: Multi-stage Refinement and Recovery for Markerless Motion Capture

Title Rethinking Pose in 3D: Multi-stage Refinement and Recovery for Markerless Motion Capture
Authors Denis Tome, Matteo Toso, Lourdes Agapito, Chris Russell
Abstract We propose a CNN-based approach for multi-camera markerless motion capture of the human body. Unlike existing methods that first perform pose estimation on individual cameras and generate 3D models as post-processing, our approach makes use of 3D reasoning throughout a multi-stage approach. This novelty allows us to use provisional 3D models of human pose to rethink where the joints should be located in the image and to recover from past mistakes. Our principled refinement of 3D human poses lets us make use of image cues, even from images where we previously misdetected joints, to refine our estimates as part of an end-to-end approach. Finally, we demonstrate how the high-quality output of our multi-camera setup can be used as an additional training source to improve the accuracy of existing single camera models.
Tasks 3D Human Pose Estimation, Markerless Motion Capture, Motion Capture, Pose Estimation
Published 2018-08-04
URL http://arxiv.org/abs/1808.01525v1
PDF http://arxiv.org/pdf/1808.01525v1.pdf
PWC https://paperswithcode.com/paper/rethinking-pose-in-3d-multi-stage-refinement
Repo https://github.com/MatteoT90/WibergianLearning
Framework tf
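
The geometric core the multi-stage refinement builds on can be sketched with textbook multi-view triangulation (standard DLT, not the authors' network): a provisional 3D joint is triangulated from per-camera 2D detections and re-projected into each view to reconsider where the joint should be in the image.

```python
import numpy as np

def triangulate(P_list, xy_list):
    """DLT triangulation from 3x4 camera matrices and matching 2D points."""
    A = []
    for P, (x, y) in zip(P_list, xy_list):
        A.append(x * P[2] - P[0])
        A.append(y * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    X = Vt[-1]
    return X[:3] / X[3]                        # homogeneous -> Euclidean

def reproject(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# two toy cameras; the second is translated one unit along x
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.2, 0.1, 2.0])
X_hat = triangulate([P1, P2], [reproject(P1, X_true), reproject(P2, X_true)])
print(X_hat)                                   # ~ [0.2, 0.1, 2.0]
```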

Interactive Grounded Language Acquisition and Generalization in a 2D World

Title Interactive Grounded Language Acquisition and Generalization in a 2D World
Authors Haonan Yu, Haichao Zhang, Wei Xu
Abstract We build a virtual agent for learning language in a 2D maze-like world. The agent sees images of the surrounding environment, listens to a virtual teacher, and takes actions to receive rewards. It interactively learns the teacher’s language from scratch based on two language use cases: sentence-directed navigation and question answering. It learns simultaneously the visual representations of the world, the language, and the action control. By disentangling language grounding from other computational routines and sharing a concept detection function between language grounding and prediction, the agent reliably interpolates and extrapolates to interpret sentences that contain new word combinations or new words missing from training sentences. The new words are transferred from the answers of language prediction. Such a language ability is trained and evaluated on a population of over 1.6 million distinct sentences consisting of 119 object words, 8 color words, 9 spatial-relation words, and 50 grammatical words. The proposed model significantly outperforms five comparison methods for interpreting zero-shot sentences. In addition, we demonstrate human-interpretable intermediate outputs of the model in the appendix.
Tasks Language Acquisition, Question Answering
Published 2018-01-31
URL http://arxiv.org/abs/1802.01433v4
PDF http://arxiv.org/pdf/1802.01433v4.pdf
PWC https://paperswithcode.com/paper/interactive-grounded-language-acquisition-and
Repo https://github.com/PaddlePaddle/XWorld
Framework none
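
A minimal sketch of the sharing idea (module shapes are my assumptions): a single concept-detection function scores a word embedding against visual region features and is reused by both the grounding pathway (navigation) and the language-prediction pathway (QA), which is how words learned in one use case transfer to the other.

```python
import torch
import torch.nn as nn

class ConceptDetector(nn.Module):
    """Scores one word embedding against N visual region features."""
    def __init__(self, d_word, d_vis):
        super().__init__()
        self.proj = nn.Linear(d_word, d_vis)

    def forward(self, word_emb, vis_feats):    # (d_word,), (N, d_vis)
        return vis_feats @ self.proj(word_emb) # one score per region

detect = ConceptDetector(d_word=16, d_vis=32)  # a single shared instance
nav_scores = detect(torch.randn(16), torch.randn(5, 32))  # grounding pathway
qa_scores = detect(torch.randn(16), torch.randn(5, 32))   # prediction pathway
```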

Bottom-Up Abstractive Summarization

Title Bottom-Up Abstractive Summarization
Authors Sebastian Gehrmann, Yuntian Deng, Alexander M. Rush
Abstract Neural network-based methods for abstractive summarization produce outputs that are more fluent than other techniques, but which can be poor at content selection. This work proposes a simple technique for addressing this issue: use a data-efficient content selector to over-determine phrases in a source document that should be part of the summary. We use this selector as a bottom-up attention step to constrain the model to likely phrases. We show that this approach improves the ability to compress text, while still generating fluent summaries. This two-step process is both simpler and higher performing than other end-to-end content selection models, leading to significant improvements on ROUGE for both the CNN-DM and NYT corpus. Furthermore, the content selector can be trained with as little as 1,000 sentences, making it easy to transfer a trained summarizer to a new domain.
Tasks Abstractive Text Summarization, Document Summarization
Published 2018-08-31
URL http://arxiv.org/abs/1808.10792v2
PDF http://arxiv.org/pdf/1808.10792v2.pdf
PWC https://paperswithcode.com/paper/bottom-up-abstractive-summarization
Repo https://github.com/sebastianGehrmann/bottom-up-summary
Framework none
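
A minimal rendering of the bottom-up attention step (my reading of the abstract, not the released code): the content selector assigns each source token a selection probability; copy attention is masked to tokens above a threshold and renormalized, constraining generation to likely phrases.

```python
import torch

def bottom_up_mask(copy_attn, select_prob, threshold=0.5, eps=1e-8):
    """copy_attn, select_prob: (batch, src_len); returns renormalized attention."""
    mask = (select_prob >= threshold).float()
    masked = copy_attn * mask
    return masked / (masked.sum(dim=-1, keepdim=True) + eps)

attn = torch.softmax(torch.randn(1, 6), dim=-1)
sel = torch.tensor([[0.9, 0.1, 0.7, 0.2, 0.8, 0.4]])
print(bottom_up_mask(attn, sel))               # mass only on tokens 0, 2 and 4
```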

Inference in Deep Gaussian Processes using Stochastic Gradient Hamiltonian Monte Carlo

Title Inference in Deep Gaussian Processes using Stochastic Gradient Hamiltonian Monte Carlo
Authors Marton Havasi, José Miguel Hernández-Lobato, Juan José Murillo-Fuentes
Abstract Deep Gaussian Processes (DGPs) are hierarchical generalizations of Gaussian Processes that combine well calibrated uncertainty estimates with the high flexibility of multilayer models. One of the biggest challenges with these models is that exact inference is intractable. The current state-of-the-art inference method, Variational Inference (VI), employs a Gaussian approximation to the posterior distribution. This can be a potentially poor unimodal approximation of the generally multimodal posterior. In this work, we provide evidence for the non-Gaussian nature of the posterior and we apply the Stochastic Gradient Hamiltonian Monte Carlo method to generate samples. To efficiently optimize the hyperparameters, we introduce the Moving Window MCEM algorithm. This results in significantly better predictions at a lower computational cost than its VI counterpart. Thus our method establishes a new state-of-the-art for inference in DGPs.
Tasks Gaussian Processes
Published 2018-06-14
URL http://arxiv.org/abs/1806.05490v3
PDF http://arxiv.org/pdf/1806.05490v3.pdf
PWC https://paperswithcode.com/paper/inference-in-deep-gaussian-processes-using
Repo https://github.com/hughsalimbeni/bayesian_benchmarks
Framework none
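
A minimal sketch of a single SGHMC step, the standard update the paper builds on (the DGP-specific layer-wise sampling and the Moving Window MCEM hyperparameter step are omitted): momentum dynamics with friction and injected noise let minibatch gradients produce approximate posterior samples.

```python
import numpy as np

rng = np.random.default_rng(0)

def sghmc_step(theta, v, stoch_grad, lr=1e-3, friction=0.05):
    noise = np.sqrt(2 * friction * lr) * rng.standard_normal(theta.shape)
    v = (1 - friction) * v - lr * stoch_grad(theta) + noise
    return theta + v, v

# usage: sample from N(0, 1), where U(theta) = theta^2 / 2 and grad U = theta
theta, v = np.zeros(1), np.zeros(1)
samples = []
for _ in range(20000):
    theta, v = sghmc_step(theta, v, stoch_grad=lambda t: t)
    samples.append(theta[0])
print(np.mean(samples), np.var(samples))       # roughly 0 and 1
```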

Deep Adaptive Attention for Joint Facial Action Unit Detection and Face Alignment

Title Deep Adaptive Attention for Joint Facial Action Unit Detection and Face Alignment
Authors Zhiwen Shao, Zhilei Liu, Jianfei Cai, Lizhuang Ma
Abstract Facial action unit (AU) detection and face alignment are two highly correlated tasks since facial landmarks can provide precise AU locations to facilitate the extraction of meaningful local features for AU detection. Most existing AU detection works often treat face alignment as a preprocessing and handle the two tasks independently. In this paper, we propose a novel end-to-end deep learning framework for joint AU detection and face alignment, which has not been explored before. In particular, multi-scale shared features are learned firstly, and high-level features of face alignment are fed into AU detection. Moreover, to extract precise local features, we propose an adaptive attention learning module to refine the attention map of each AU adaptively. Finally, the assembled local features are integrated with face alignment features and global features for AU detection. Experiments on BP4D and DISFA benchmarks demonstrate that our framework significantly outperforms the state-of-the-art methods for AU detection.
Tasks Action Unit Detection, Face Alignment, Facial Action Unit Detection
Published 2018-03-15
URL http://arxiv.org/abs/1803.05588v2
PDF http://arxiv.org/pdf/1803.05588v2.pdf
PWC https://paperswithcode.com/paper/deep-adaptive-attention-for-joint-facial
Repo https://github.com/ZhiwenShao/JAANet
Framework pytorch
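
A minimal sketch of the adaptive attention refinement as I read the abstract (shapes and the depthwise conv are my assumptions): a per-AU attention prior, placed from the landmarks predicted by the alignment branch, is refined by a small learned module so the attended region can adapt beyond the fixed landmark prior.

```python
import torch
import torch.nn as nn

class AUAttentionRefiner(nn.Module):
    def __init__(self, n_aus):
        super().__init__()
        # depthwise conv: each AU's map is refined independently
        self.refine = nn.Sequential(
            nn.Conv2d(n_aus, n_aus, 3, padding=1, groups=n_aus), nn.Sigmoid())

    def forward(self, init_attn, feats):
        """init_attn: (B, n_aus, H, W) landmark-based priors;
        feats: (B, C, H, W) shared multi-scale features."""
        attn = self.refine(init_attn)                  # adaptively refined maps
        return attn.unsqueeze(2) * feats.unsqueeze(1)  # (B, n_aus, C, H, W)

refiner = AUAttentionRefiner(n_aus=12)
out = refiner(torch.rand(2, 12, 11, 11), torch.randn(2, 64, 11, 11))
```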

Calibrating Multivariate Lévy Processes with Neural Networks

Title Calibrating Multivariate Lévy Processes with Neural Networks
Authors Kailai Xu, Eric Darve
Abstract Calibrating a Lévy process usually requires characterizing its jump distribution. Traditionally this problem can be solved with nonparametric estimation using the empirical characteristic functions (ECF), assuming certain regularity, and results to date are mostly in 1D. For multivariate Lévy processes and less smooth Lévy densities, the problem becomes challenging as ECFs decay slowly and have large uncertainty because of limited observations. We solve this problem by approximating the Lévy density with a parametrized functional form; the characteristic function is then estimated using numerical integration. In our benchmarks, we used deep neural networks and found that they are robust and can capture sharp transitions in the Lévy density. They perform favorably compared to piecewise linear functions and radial basis functions. The methods and techniques developed here apply to many other problems that involve nonparametric estimation of functions embedded in a system model.
Tasks
Published 2018-12-20
URL https://arxiv.org/abs/1812.08883v3
PDF https://arxiv.org/pdf/1812.08883v3.pdf
PWC https://paperswithcode.com/paper/calibrating-levy-process-from-observations
Repo https://github.com/UnofficialJuliaMirror/ADCME.jl-07b341a0-ce75-57c6-b2de-414ffdc00be5
Framework tf
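
A minimal sketch of the estimation step (a simplified pure-jump, finite-activity form of the Lévy–Khintchine formula; the neural-network parametrization and fitting loop are omitted): given a parametrized Lévy density, the characteristic function is computed by numerical integration and can then be matched to the empirical characteristic function.

```python
import numpy as np

def char_fn(xi, nu, x_grid):
    """phi(xi) = exp( integral (e^{i*xi*x} - 1) nu(x) dx ), pure-jump case."""
    integrand = (np.exp(1j * np.outer(xi, x_grid)) - 1.0) * nu(x_grid)
    psi = np.trapz(integrand, x_grid, axis=1)
    return np.exp(psi)

# check: compound Poisson with intensity 2 and standard-normal jumps,
# for which phi(xi) = exp(2 * (exp(-xi^2 / 2) - 1)) in closed form
x = np.linspace(-8.0, 8.0, 4001)
nu = lambda s: 2.0 * np.exp(-s**2 / 2) / np.sqrt(2 * np.pi)
xi = np.linspace(-3.0, 3.0, 7)
print(np.abs(char_fn(xi, nu, x) - np.exp(2 * (np.exp(-xi**2 / 2) - 1))))
```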

Solving the Rubik’s Cube Without Human Knowledge

Title Solving the Rubik’s Cube Without Human Knowledge
Authors Stephen McAleer, Forest Agostinelli, Alexander Shmakov, Pierre Baldi
Abstract A generally intelligent agent must be able to teach itself how to solve problems in complex domains with minimal human supervision. Recently, deep reinforcement learning algorithms combined with self-play have achieved superhuman proficiency in Go, Chess, and Shogi without human data or domain knowledge. In these environments, a reward is always received at the end of the game, however, for many combinatorial optimization environments, rewards are sparse and episodes are not guaranteed to terminate. We introduce Autodidactic Iteration: a novel reinforcement learning algorithm that is able to teach itself how to solve the Rubik’s Cube with no human assistance. Our algorithm is able to solve 100% of randomly scrambled cubes while achieving a median solve length of 30 moves – less than or equal to solvers that employ human domain knowledge.
Tasks Combinatorial Optimization
Published 2018-05-18
URL http://arxiv.org/abs/1805.07470v1
PDF http://arxiv.org/pdf/1805.07470v1.pdf
PWC https://paperswithcode.com/paper/solving-the-rubiks-cube-without-human
Repo https://github.com/Dalkio/RL_rubiks
Framework none
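
A minimal sketch of the Autodidactic Iteration target step (the environment interface and the stand-in value function are assumptions of mine; the real method trains a combined value-and-policy network): from each scrambled state, expand every child once and set the value target to the best reward-plus-bootstrapped-value, with the policy target as its argmax.

```python
import numpy as np

def adi_targets(states, moves, apply_move, reward, value_net):
    """One ADI-style target pass; value_net maps a state batch to values."""
    v_targets, p_targets = [], []
    for s in states:
        children = [apply_move(s, m) for m in moves]
        q = np.array([reward(c) + value_net([c])[0] for c in children])
        v_targets.append(q.max())              # value target for state s
        p_targets.append(int(q.argmax()))      # policy target: best move index
    return np.array(v_targets), np.array(p_targets)

# toy usage on an integer "puzzle" whose solved state is 0
moves = [-1, +1]
apply_move = lambda s, m: s + m
reward = lambda s: 1.0 if s == 0 else -1.0     # +1 only when a move solves
value_net = lambda batch: [-abs(s) for s in batch]  # stand-in estimator
v, p = adi_targets([2, 1, -1], moves, apply_move, reward, value_net)
print(v, p)
```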