October 21, 2019

2839 words · 14 min read

Paper Group AWR 92


The Matrix Calculus You Need For Deep Learning. Global-Locally Self-Attentive Dialogue State Tracker. SAFE: A Neural Survival Analysis Model for Fraud Early Detection. MesoNet: a Compact Facial Video Forgery Detection Network. Few-shot 3D Multi-modal Medical Image Segmentation using Generative Adversarial Learning. Learning Compressed Transforms with Low Displacement Rank …

The Matrix Calculus You Need For Deep Learning

Title The Matrix Calculus You Need For Deep Learning
Authors Terence Parr, Jeremy Howard
Abstract This paper is an attempt to explain all the matrix calculus you need in order to understand the training of deep neural networks. We assume no math knowledge beyond what you learned in calculus 1, and provide links to help you refresh the necessary math where needed. Note that you do not need to understand this material before you start learning to train and use deep learning in practice; rather, this material is for those who are already familiar with the basics of neural networks, and wish to deepen their understanding of the underlying math. Don’t worry if you get stuck at some point along the way—just go back and reread the previous section, and try writing down and working through some examples. And if you’re still stuck, we’re happy to answer your questions in the Theory category at forums.fast.ai. Note: There is a reference section at the end of the paper summarizing all the key matrix calculus rules and terminology discussed here. See related articles at http://explained.ai
Tasks
Published 2018-02-05
URL http://arxiv.org/abs/1802.01528v3
PDF http://arxiv.org/pdf/1802.01528v3.pdf
PWC https://paperswithcode.com/paper/the-matrix-calculus-you-need-for-deep
Repo https://github.com/leandromineti/ml-knowledge-graph
Framework none
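
As a flavor of the reference rules the paper collects, here is one standard identity within its scope, stated in my own notation rather than quoted from the paper: an element-wise activation has a diagonal Jacobian, and the vector chain rule composes it with the affine layer's Jacobian.

```latex
% Jacobian of an element-wise activation y = f(z)
\frac{\partial \mathbf{y}}{\partial \mathbf{z}}
  = \operatorname{diag}\!\big(f'(z_1), \ldots, f'(z_n)\big)

% Vector chain rule for a dense layer z = Wx + b
\frac{\partial \mathbf{y}}{\partial \mathbf{x}}
  = \frac{\partial \mathbf{y}}{\partial \mathbf{z}}\,
    \frac{\partial \mathbf{z}}{\partial \mathbf{x}}
  = \operatorname{diag}\!\big(f'(\mathbf{z})\big)\, W
```

The diagonal structure is why scalar-calculus intuition mostly carries over: off-diagonal terms of element-wise operations vanish.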

Global-Locally Self-Attentive Dialogue State Tracker

Title Global-Locally Self-Attentive Dialogue State Tracker
Authors Victor Zhong, Caiming Xiong, Richard Socher
Abstract Dialogue state tracking, which estimates user goals and requests given the dialogue context, is an essential part of task-oriented dialogue systems. In this paper, we propose the Global-Locally Self-Attentive Dialogue State Tracker (GLAD), which learns representations of the user utterance and previous system actions with global-local modules. Our model uses global modules to share parameters between estimators for different types (called slots) of dialogue states, and uses local modules to learn slot-specific features. We show that this significantly improves tracking of rare states and achieves state-of-the-art performance on the WoZ and DSTC2 state tracking tasks. GLAD obtains 88.1% joint goal accuracy and 97.1% request accuracy on WoZ, outperforming prior work by 3.7% and 5.5%. On DSTC2, our model obtains 74.5% joint goal accuracy and 97.5% request accuracy, outperforming prior work by 1.1% and 1.0%.
Tasks Dialogue State Tracking, Task-Oriented Dialogue Systems
Published 2018-05-19
URL http://arxiv.org/abs/1805.09655v3
PDF http://arxiv.org/pdf/1805.09655v3.pdf
PWC https://paperswithcode.com/paper/global-locally-self-attentive-dialogue-state
Repo https://github.com/salesforce/glad
Framework none
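
A minimal PyTorch sketch of the global-local idea, under assumptions of mine (BiLSTM encoders and a per-slot sigmoid mixing gate; the real architecture is in the paper and the salesforce/glad repo): a globally shared encoder lets rare slots borrow statistics from all slots, while local encoders keep slot-specific features.

```python
import torch
import torch.nn as nn

class GlobalLocalEncoder(nn.Module):
    def __init__(self, d_in, d_hid, n_slots):
        super().__init__()
        # one encoder shared by every slot ...
        self.global_rnn = nn.LSTM(d_in, d_hid, bidirectional=True, batch_first=True)
        # ... plus one private encoder per slot
        self.local_rnns = nn.ModuleList(
            nn.LSTM(d_in, d_hid, bidirectional=True, batch_first=True)
            for _ in range(n_slots))
        self.beta = nn.Parameter(torch.zeros(n_slots))  # per-slot mixing gate

    def forward(self, x, slot):                # x: (batch, seq, d_in)
        h_global, _ = self.global_rnn(x)       # parameters shared across slots
        h_local, _ = self.local_rnns[slot](x)  # slot-specific features
        b = torch.sigmoid(self.beta[slot])
        return b * h_local + (1 - b) * h_global

enc = GlobalLocalEncoder(d_in=32, d_hid=64, n_slots=3)
h = enc(torch.randn(2, 10, 32), slot=1)        # (2, 10, 128)
```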

SAFE: A Neural Survival Analysis Model for Fraud Early Detection

Title SAFE: A Neural Survival Analysis Model for Fraud Early Detection
Authors Panpan Zheng, Shuhan Yuan, Xintao Wu
Abstract Many online platforms have deployed anti-fraud systems to detect and prevent fraudulent activities. However, there is usually a gap between the time that a user commits a fraudulent action and the time that the user is suspended by the platform. How to detect fraudsters in time is a challenging problem. Most of the existing approaches adopt classifiers to predict fraudsters given their activity sequences along time. The main drawback of classification models is that the prediction results between consecutive timestamps are often inconsistent. In this paper, we propose a survival analysis based fraud early detection model, SAFE, which maps dynamic user activities to survival probabilities that are guaranteed to be monotonically decreasing along time. SAFE adopts recurrent neural network (RNN) to handle user activity sequences and directly outputs hazard values at each timestamp, and then, survival probability derived from hazard values is deployed to achieve consistent predictions. Because we only observe the user suspended time instead of the fraudulent activity time in the training data, we revise the loss function of the regular survival model to achieve fraud early detection. Experimental results on two real world datasets demonstrate that SAFE outperforms both the survival analysis model and recurrent neural network model alone as well as state-of-the-art fraud early detection approaches.
Tasks Survival Analysis
Published 2018-09-12
URL http://arxiv.org/abs/1809.04683v2
PDF http://arxiv.org/pdf/1809.04683v2.pdf
PWC https://paperswithcode.com/paper/safe-a-neural-survival-analysis-model-for
Repo https://github.com/PanpanZheng/SAFE
Framework tf
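
A minimal sketch of the core mapping, assuming a GRU and a softplus hazard head (my choices, not necessarily the paper's): non-negative hazards accumulated over time make the derived survival probability monotonically non-increasing, which is what yields consistent predictions across consecutive timestamps.

```python
import torch
import torch.nn as nn

class HazardRNN(nn.Module):
    def __init__(self, d_in, d_hid):
        super().__init__()
        self.rnn = nn.GRU(d_in, d_hid, batch_first=True)
        self.head = nn.Linear(d_hid, 1)

    def forward(self, x):                      # x: (batch, T, d_in) activities
        h, _ = self.rnn(x)
        hazard = nn.functional.softplus(self.head(h)).squeeze(-1)  # h_t >= 0
        survival = torch.exp(-hazard.cumsum(dim=1))  # S(t), non-increasing in t
        return hazard, survival

model = HazardRNN(d_in=8, d_hid=16)
hz, surv = model(torch.randn(4, 12, 8))
assert (surv[:, 1:] <= surv[:, :-1] + 1e-6).all()   # consistent over time
```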

MesoNet: a Compact Facial Video Forgery Detection Network

Title MesoNet: a Compact Facial Video Forgery Detection Network
Authors Darius Afchar, Vincent Nozick, Junichi Yamagishi, Isao Echizen
Abstract This paper presents a method to automatically and efficiently detect face tampering in videos, and particularly focuses on two recent techniques used to generate hyper-realistic forged videos: Deepfake and Face2Face. Traditional image forensics techniques are usually not well suited to videos due to the compression that strongly degrades the data. Thus, this paper follows a deep learning approach and presents two networks, both with a low number of layers to focus on the mesoscopic properties of images. We evaluate those fast networks on both an existing dataset and a dataset we have constituted from online videos. The tests demonstrate a very successful detection rate with more than 98% for Deepfake and 95% for Face2Face.
Tasks DeepFake Detection, Face Swapping, Fake Image Detection
Published 2018-09-04
URL http://arxiv.org/abs/1809.00888v1
PDF http://arxiv.org/pdf/1809.00888v1.pdf
PWC https://paperswithcode.com/paper/mesonet-a-compact-facial-video-forgery
Repo https://github.com/HongguLiu/MesoNet.Pytorch
Framework pytorch
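
A rough Meso4-flavored sketch (filter counts and kernel sizes here are approximations of the paper's compact design, not a faithful reimplementation; the linked repo has one): a handful of conv blocks keeps the parameter count small and the features mesoscopic, ending in a binary real/forged head.

```python
import torch
import torch.nn as nn

def block(c_in, c_out, k):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, padding=k // 2),
        nn.BatchNorm2d(c_out), nn.ReLU(),
        nn.MaxPool2d(2))

meso = nn.Sequential(
    block(3, 8, 3), block(8, 8, 5), block(8, 16, 5), block(16, 16, 5),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 16), nn.LeakyReLU(0.1), nn.Dropout(0.5),
    nn.Linear(16, 1))                          # logit: forged vs. real

logit = meso(torch.randn(1, 3, 256, 256))
```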

Few-shot 3D Multi-modal Medical Image Segmentation using Generative Adversarial Learning

Title Few-shot 3D Multi-modal Medical Image Segmentation using Generative Adversarial Learning
Authors Arnab Kumar Mondal, Jose Dolz, Christian Desrosiers
Abstract We address the problem of segmenting 3D multi-modal medical images in scenarios where very few labeled examples are available for training. Leveraging the recent success of adversarial learning for semi-supervised segmentation, we propose a novel method based on Generative Adversarial Networks (GANs) to train a segmentation model with both labeled and unlabeled images. The proposed method prevents over-fitting by learning to discriminate between true and fake patches obtained by a generator network. Our work extends current adversarial learning approaches, which focus on 2D single-modality images, to the more challenging context of 3D volumes of multiple modalities. The proposed method is evaluated on the problem of segmenting brain MRI from the iSEG-2017 and MRBrainS 2013 datasets. Significant performance improvement is reported, compared to state-of-art segmentation networks trained in a fully-supervised manner. In addition, our work presents a comprehensive analysis of different GAN architectures for semi-supervised segmentation, showing recent techniques like feature matching to yield a higher performance than conventional adversarial training approaches. Our code is publicly available at https://github.com/arnab39/FewShot_GAN-Unet3D
Tasks 3D Medical Imaging Segmentation, Brain Image Segmentation, Brain Segmentation, Few-Shot Semantic Segmentation, Medical Image Segmentation, Semantic Segmentation, Semi-Supervised Semantic Segmentation
Published 2018-10-29
URL http://arxiv.org/abs/1810.12241v1
PDF http://arxiv.org/pdf/1810.12241v1.pdf
PWC https://paperswithcode.com/paper/few-shot-3d-multi-modal-medical-image
Repo https://github.com/arnab39/FewShot_GAN-Unet3D
Framework tf
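
A minimal rendering of the feature-matching objective the abstract singles out (standard GAN technique, not the released code): rather than fooling the discriminator directly, the generator matches the mean intermediate discriminator features of real and generated 3D patches.

```python
import torch

def feature_matching_loss(disc_features, real_patches, fake_patches):
    """disc_features maps a patch batch to an intermediate discriminator
    feature map; inputs are (batch, C, D, H, W) 3D patch volumes."""
    f_real = disc_features(real_patches).mean(dim=0).detach()  # stop real grad
    f_fake = disc_features(fake_patches).mean(dim=0)
    return torch.mean((f_real - f_fake) ** 2)

# usage with a stand-in 3D feature extractor
feats = torch.nn.Conv3d(1, 4, 3, padding=1)
loss = feature_matching_loss(feats, torch.randn(2, 1, 8, 16, 16),
                             torch.randn(2, 1, 8, 16, 16))
```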

Learning Compressed Transforms with Low Displacement Rank

Title Learning Compressed Transforms with Low Displacement Rank
Authors Anna T. Thomas, Albert Gu, Tri Dao, Atri Rudra, Christopher Ré
Abstract The low displacement rank (LDR) framework for structured matrices represents a matrix through two displacement operators and a low-rank residual. Existing use of LDR matrices in deep learning has applied fixed displacement operators encoding forms of shift invariance akin to convolutions. We introduce a class of LDR matrices with more general displacement operators, and explicitly learn over both the operators and the low-rank component. This class generalizes several previous constructions while preserving compression and efficient computation. We prove bounds on the VC dimension of multi-layer neural networks with structured weight matrices and show empirically that our compact parameterization can reduce the sample complexity of learning. When replacing weight layers in fully-connected, convolutional, and recurrent neural networks for image classification and language modeling tasks, our new classes exceed the accuracy of existing compression approaches, and on some tasks also outperform general unstructured layers while using more than 20x fewer parameters.
Tasks Image Classification, Language Modelling
Published 2018-10-04
URL http://arxiv.org/abs/1810.02309v3
PDF http://arxiv.org/pdf/1810.02309v3.pdf
PWC https://paperswithcode.com/paper/learning-compressed-transforms-with-low
Repo https://github.com/HazyResearch/structured-nets
Framework pytorch
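
To make the displacement-rank idea concrete, here is standard LDR background rather than the authors' code: under fixed shift displacement operators, a Toeplitz matrix leaves a rank-2 residual, i.e. it is fully described by two operators plus a low-rank term. The paper's contribution is to learn the operators and the low-rank component jointly.

```python
import numpy as np

n = 6
Z = np.eye(n, k=-1)                            # lower shift operator
t = np.random.randn(2 * n - 1)
T = np.array([[t[i - j + n - 1] for j in range(n)]
              for i in range(n)])              # a Toeplitz matrix

residual = T - Z @ T @ Z.T                     # Sylvester-style displacement
print(np.linalg.matrix_rank(residual))         # 2: low displacement rank
```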

E2E-MLT - an Unconstrained End-to-End Method for Multi-Language Scene Text

Title E2E-MLT - an Unconstrained End-to-End Method for Multi-Language Scene Text
Authors Michal Bušta, Yash Patel, Jiri Matas
Abstract An end-to-end trainable (fully differentiable) method for multi-language scene text localization and recognition is proposed. The approach is based on a single fully convolutional network (FCN) with shared layers for both tasks. E2E-MLT is the first published multi-language OCR for scene text. While trained in multi-language setup, E2E-MLT demonstrates competitive performance when compared to other methods trained for English scene text alone. The experiments show that obtaining accurate multi-language multi-script annotations is a challenging problem.
Tasks Optical Character Recognition
Published 2018-01-30
URL http://arxiv.org/abs/1801.09919v2
PDF http://arxiv.org/pdf/1801.09919v2.pdf
PWC https://paperswithcode.com/paper/e2e-mlt-an-unconstrained-end-to-end-method
Repo https://github.com/MichalBusta/E2E-MLT
Framework pytorch
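
A minimal sketch of the shared-backbone design the abstract describes (layer sizes and head shapes are assumptions of mine): one fully convolutional trunk feeds both a text-localization head and a per-position character-recognition head, trained jointly.

```python
import torch
import torch.nn as nn

class SharedFCN(nn.Module):
    def __init__(self, n_chars):
        super().__init__()
        self.trunk = nn.Sequential(              # shared FCN layers
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.loc_head = nn.Conv2d(64, 5, 1)      # e.g. box geometry + text score
        self.ocr_head = nn.Conv2d(64, n_chars, 1)  # per-position char logits

    def forward(self, img):
        f = self.trunk(img)
        return self.loc_head(f), self.ocr_head(f)

boxes, chars = SharedFCN(n_chars=400)(torch.randn(1, 3, 128, 256))
```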

Unsupervised Natural Language Generation with Denoising Autoencoders

Title Unsupervised Natural Language Generation with Denoising Autoencoders
Authors Markus Freitag, Scott Roy
Abstract Generating text from structured data is important for various tasks such as question answering and dialog systems. We show that in at least one domain, without any supervision and only based on unlabeled text, we are able to build a Natural Language Generation (NLG) system with higher performance than supervised approaches. In our approach, we interpret the structured data as a corrupt representation of the desired output and use a denoising auto-encoder to reconstruct the sentence. We show how to introduce noise into training examples that do not contain structured data, and that the resulting denoising auto-encoder generalizes to generate correct sentences when given structured data.
Tasks Denoising, Question Answering, Text Generation
Published 2018-04-21
URL http://arxiv.org/abs/1804.07899v2
PDF http://arxiv.org/pdf/1804.07899v2.pdf
PWC https://paperswithcode.com/paper/unsupervised-natural-language-generation-with
Repo https://github.com/mcleonard/NLG_Autoencoder
Framework pytorch
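
A toy version of the corruption step as I read the abstract (the paper's exact noise model differs): to make plain sentences resemble structured data, drop likely function words and lightly shuffle what remains, then train a sequence-to-sequence denoiser to reconstruct the original sentence.

```python
import random

STOPWORDS = {"the", "a", "an", "is", "are", "was", "to", "of", "in", "and"}

def corrupt(sentence, shuffle_window=2, swap_prob=0.3, seed=0):
    """Turn a sentence into a structured-data-like bag of content tokens."""
    rng = random.Random(seed)
    kept = [w for w in sentence.split() if w.lower() not in STOPWORDS]
    for i in range(len(kept) - 1):             # local shuffle mimics
        if rng.random() < swap_prob:           # unordered data fields
            j = min(i + rng.randint(1, shuffle_window), len(kept) - 1)
            kept[i], kept[j] = kept[j], kept[i]
    return " ".join(kept)

print(corrupt("the restaurant is located in the centre of town"))
```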

Rethinking Pose in 3D: Multi-stage Refinement and Recovery for Markerless Motion Capture

Title Rethinking Pose in 3D: Multi-stage Refinement and Recovery for Markerless Motion Capture
Authors Denis Tome, Matteo Toso, Lourdes Agapito, Chris Russell
Abstract We propose a CNN-based approach for multi-camera markerless motion capture of the human body. Unlike existing methods that first perform pose estimation on individual cameras and generate 3D models as post-processing, our approach makes use of 3D reasoning throughout a multi-stage approach. This novelty allows us to use provisional 3D models of human pose to rethink where the joints should be located in the image and to recover from past mistakes. Our principled refinement of 3D human poses lets us make use of image cues, even from images where we previously misdetected joints, to refine our estimates as part of an end-to-end approach. Finally, we demonstrate how the high-quality output of our multi-camera setup can be used as an additional training source to improve the accuracy of existing single camera models.
Tasks 3D Human Pose Estimation, Markerless Motion Capture, Motion Capture, Pose Estimation
Published 2018-08-04
URL http://arxiv.org/abs/1808.01525v1
PDF http://arxiv.org/pdf/1808.01525v1.pdf
PWC https://paperswithcode.com/paper/rethinking-pose-in-3d-multi-stage-refinement
Repo https://github.com/MatteoT90/WibergianLearning
Framework tf
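
The geometric core the multi-stage refinement builds on can be sketched with textbook multi-view triangulation (standard DLT, not the authors' network): a provisional 3D joint is triangulated from per-camera 2D detections and re-projected into each view to reconsider where the joint should be in the image.

```python
import numpy as np

def triangulate(P_list, xy_list):
    """DLT triangulation from 3x4 camera matrices and matching 2D points."""
    A = []
    for P, (x, y) in zip(P_list, xy_list):
        A.append(x * P[2] - P[0])
        A.append(y * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    X = Vt[-1]
    return X[:3] / X[3]                        # homogeneous -> Euclidean

def reproject(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# two toy cameras; the second is translated one unit along x
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.2, 0.1, 2.0])
X_hat = triangulate([P1, P2], [reproject(P1, X_true), reproject(P2, X_true)])
print(X_hat)                                   # ~ [0.2, 0.1, 2.0]
```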

Interactive Grounded Language Acquisition and Generalization in a 2D World

Title Interactive Grounded Language Acquisition and Generalization in a 2D World
Authors Haonan Yu, Haichao Zhang, Wei Xu
Abstract We build a virtual agent for learning language in a 2D maze-like world. The agent sees images of the surrounding environment, listens to a virtual teacher, and takes actions to receive rewards. It interactively learns the teacher’s language from scratch based on two language use cases: sentence-directed navigation and question answering. It learns simultaneously the visual representations of the world, the language, and the action control. By disentangling language grounding from other computational routines and sharing a concept detection function between language grounding and prediction, the agent reliably interpolates and extrapolates to interpret sentences that contain new word combinations or new words missing from training sentences. The new words are transferred from the answers of language prediction. Such a language ability is trained and evaluated on a population of over 1.6 million distinct sentences consisting of 119 object words, 8 color words, 9 spatial-relation words, and 50 grammatical words. The proposed model significantly outperforms five comparison methods for interpreting zero-shot sentences. In addition, we demonstrate human-interpretable intermediate outputs of the model in the appendix.
Tasks Language Acquisition, Question Answering
Published 2018-01-31
URL http://arxiv.org/abs/1802.01433v4
PDF http://arxiv.org/pdf/1802.01433v4.pdf
PWC https://paperswithcode.com/paper/interactive-grounded-language-acquisition-and
Repo https://github.com/PaddlePaddle/XWorld
Framework none
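
A minimal sketch of the sharing idea (module shapes are my assumptions): a single concept-detection function scores a word embedding against visual region features and is reused by both the grounding pathway (navigation) and the language-prediction pathway (QA), which is how words learned in one use case transfer to the other.

```python
import torch
import torch.nn as nn

class ConceptDetector(nn.Module):
    """Scores one word embedding against N visual region features."""
    def __init__(self, d_word, d_vis):
        super().__init__()
        self.proj = nn.Linear(d_word, d_vis)

    def forward(self, word_emb, vis_feats):    # (d_word,), (N, d_vis)
        return vis_feats @ self.proj(word_emb) # one score per region

detect = ConceptDetector(d_word=16, d_vis=32)  # a single shared instance
nav_scores = detect(torch.randn(16), torch.randn(5, 32))  # grounding pathway
qa_scores = detect(torch.randn(16), torch.randn(5, 32))   # prediction pathway
```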

Bottom-Up Abstractive Summarization

Title Bottom-Up Abstractive Summarization
Authors Sebastian Gehrmann, Yuntian Deng, Alexander M. Rush
Abstract Neural network-based methods for abstractive summarization produce outputs that are more fluent than other techniques, but which can be poor at content selection. This work proposes a simple technique for addressing this issue: use a data-efficient content selector to over-determine phrases in a source document that should be part of the summary. We use this selector as a bottom-up attention step to constrain the model to likely phrases. We show that this approach improves the ability to compress text, while still generating fluent summaries. This two-step process is both simpler and higher performing than other end-to-end content selection models, leading to significant improvements on ROUGE for both the CNN-DM and NYT corpus. Furthermore, the content selector can be trained with as little as 1,000 sentences, making it easy to transfer a trained summarizer to a new domain.
Tasks Abstractive Text Summarization, Document Summarization
Published 2018-08-31
URL http://arxiv.org/abs/1808.10792v2
PDF http://arxiv.org/pdf/1808.10792v2.pdf
PWC https://paperswithcode.com/paper/bottom-up-abstractive-summarization
Repo https://github.com/sebastianGehrmann/bottom-up-summary
Framework none
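
A minimal rendering of the bottom-up attention step (my reading of the abstract, not the released code): the content selector assigns each source token a selection probability; copy attention is masked to tokens above a threshold and renormalized, constraining generation to likely phrases.

```python
import torch

def bottom_up_mask(copy_attn, select_prob, threshold=0.5, eps=1e-8):
    """copy_attn, select_prob: (batch, src_len); returns renormalized attention."""
    mask = (select_prob >= threshold).float()
    masked = copy_attn * mask
    return masked / (masked.sum(dim=-1, keepdim=True) + eps)

attn = torch.softmax(torch.randn(1, 6), dim=-1)
sel = torch.tensor([[0.9, 0.1, 0.7, 0.2, 0.8, 0.4]])
print(bottom_up_mask(attn, sel))               # mass only on tokens 0, 2 and 4
```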

Inference in Deep Gaussian Processes using Stochastic Gradient Hamiltonian Monte Carlo

Title Inference in Deep Gaussian Processes using Stochastic Gradient Hamiltonian Monte Carlo
Authors Marton Havasi, José Miguel Hernández-Lobato, Juan José Murillo-Fuentes
Abstract Deep Gaussian Processes (DGPs) are hierarchical generalizations of Gaussian Processes that combine well calibrated uncertainty estimates with the high flexibility of multilayer models. One of the biggest challenges with these models is that exact inference is intractable. The current state-of-the-art inference method, Variational Inference (VI), employs a Gaussian approximation to the posterior distribution. This can be a potentially poor unimodal approximation of the generally multimodal posterior. In this work, we provide evidence for the non-Gaussian nature of the posterior and we apply the Stochastic Gradient Hamiltonian Monte Carlo method to generate samples. To efficiently optimize the hyperparameters, we introduce the Moving Window MCEM algorithm. This results in significantly better predictions at a lower computational cost than its VI counterpart. Thus our method establishes a new state-of-the-art for inference in DGPs.
Tasks Gaussian Processes
Published 2018-06-14
URL http://arxiv.org/abs/1806.05490v3
PDF http://arxiv.org/pdf/1806.05490v3.pdf
PWC https://paperswithcode.com/paper/inference-in-deep-gaussian-processes-using
Repo https://github.com/hughsalimbeni/bayesian_benchmarks
Framework none
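
A minimal sketch of a single SGHMC step, the standard update the paper builds on (the DGP-specific layer-wise sampling and the Moving Window MCEM hyperparameter step are omitted): momentum dynamics with friction and injected noise let minibatch gradients produce approximate posterior samples.

```python
import numpy as np

rng = np.random.default_rng(0)

def sghmc_step(theta, v, stoch_grad, lr=1e-3, friction=0.05):
    noise = np.sqrt(2 * friction * lr) * rng.standard_normal(theta.shape)
    v = (1 - friction) * v - lr * stoch_grad(theta) + noise
    return theta + v, v

# usage: sample from N(0, 1), where U(theta) = theta^2 / 2 and grad U = theta
theta, v = np.zeros(1), np.zeros(1)
samples = []
for _ in range(20000):
    theta, v = sghmc_step(theta, v, stoch_grad=lambda t: t)
    samples.append(theta[0])
print(np.mean(samples), np.var(samples))       # roughly 0 and 1
```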

Deep Adaptive Attention for Joint Facial Action Unit Detection and Face Alignment

Title Deep Adaptive Attention for Joint Facial Action Unit Detection and Face Alignment
Authors Zhiwen Shao, Zhilei Liu, Jianfei Cai, Lizhuang Ma
Abstract Facial action unit (AU) detection and face alignment are two highly correlated tasks since facial landmarks can provide precise AU locations to facilitate the extraction of meaningful local features for AU detection. Most existing AU detection works often treat face alignment as a preprocessing and handle the two tasks independently. In this paper, we propose a novel end-to-end deep learning framework for joint AU detection and face alignment, which has not been explored before. In particular, multi-scale shared features are learned firstly, and high-level features of face alignment are fed into AU detection. Moreover, to extract precise local features, we propose an adaptive attention learning module to refine the attention map of each AU adaptively. Finally, the assembled local features are integrated with face alignment features and global features for AU detection. Experiments on BP4D and DISFA benchmarks demonstrate that our framework significantly outperforms the state-of-the-art methods for AU detection.
Tasks Action Unit Detection, Face Alignment, Facial Action Unit Detection
Published 2018-03-15
URL http://arxiv.org/abs/1803.05588v2
PDF http://arxiv.org/pdf/1803.05588v2.pdf
PWC https://paperswithcode.com/paper/deep-adaptive-attention-for-joint-facial
Repo https://github.com/ZhiwenShao/JAANet
Framework pytorch
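
A minimal sketch of the adaptive attention refinement as I read the abstract (shapes and the depthwise conv are my assumptions): a per-AU attention prior, placed from the landmarks predicted by the alignment branch, is refined by a small learned module so the attended region can adapt beyond the fixed landmark prior.

```python
import torch
import torch.nn as nn

class AUAttentionRefiner(nn.Module):
    def __init__(self, n_aus):
        super().__init__()
        # depthwise conv: each AU's map is refined independently
        self.refine = nn.Sequential(
            nn.Conv2d(n_aus, n_aus, 3, padding=1, groups=n_aus), nn.Sigmoid())

    def forward(self, init_attn, feats):
        """init_attn: (B, n_aus, H, W) landmark-based priors;
        feats: (B, C, H, W) shared multi-scale features."""
        attn = self.refine(init_attn)                  # adaptively refined maps
        return attn.unsqueeze(2) * feats.unsqueeze(1)  # (B, n_aus, C, H, W)

refiner = AUAttentionRefiner(n_aus=12)
out = refiner(torch.rand(2, 12, 11, 11), torch.randn(2, 64, 11, 11))
```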

Calibrating Multivariate Lévy Processes with Neural Networks

Title Calibrating Multivariate Lévy Processes with Neural Networks
Authors Kailai Xu, Eric Darve
Abstract Calibrating a Lévy process usually requires characterizing its jump distribution. Traditionally this problem can be solved with nonparametric estimation using the empirical characteristic functions (ECF), assuming certain regularity, and results to date are mostly in 1D. For multivariate Lévy processes and less smooth Lévy densities, the problem becomes challenging as ECFs decay slowly and have large uncertainty because of limited observations. We solve this problem by approximating the Lévy density with a parametrized functional form; the characteristic function is then estimated using numerical integration. In our benchmarks, we used deep neural networks and found that they are robust and can capture sharp transitions in the Lévy density. They perform favorably compared to piecewise linear functions and radial basis functions. The methods and techniques developed here apply to many other problems that involve nonparametric estimation of functions embedded in a system model.
Tasks
Published 2018-12-20
URL https://arxiv.org/abs/1812.08883v3
PDF https://arxiv.org/pdf/1812.08883v3.pdf
PWC https://paperswithcode.com/paper/calibrating-levy-process-from-observations
Repo https://github.com/UnofficialJuliaMirror/ADCME.jl-07b341a0-ce75-57c6-b2de-414ffdc00be5
Framework tf
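
A minimal sketch of the estimation step (a simplified pure-jump, finite-activity form of the Lévy–Khintchine formula; the neural-network parametrization and fitting loop are omitted): given a parametrized Lévy density, the characteristic function is computed by numerical integration and can then be matched to the empirical characteristic function.

```python
import numpy as np

def char_fn(xi, nu, x_grid):
    """phi(xi) = exp( integral (e^{i*xi*x} - 1) nu(x) dx ), pure-jump case."""
    integrand = (np.exp(1j * np.outer(xi, x_grid)) - 1.0) * nu(x_grid)
    psi = np.trapz(integrand, x_grid, axis=1)
    return np.exp(psi)

# check: compound Poisson with intensity 2 and standard-normal jumps,
# for which phi(xi) = exp(2 * (exp(-xi^2 / 2) - 1)) in closed form
x = np.linspace(-8.0, 8.0, 4001)
nu = lambda s: 2.0 * np.exp(-s**2 / 2) / np.sqrt(2 * np.pi)
xi = np.linspace(-3.0, 3.0, 7)
print(np.abs(char_fn(xi, nu, x) - np.exp(2 * (np.exp(-xi**2 / 2) - 1))))
```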

Solving the Rubik’s Cube Without Human Knowledge

Title Solving the Rubik’s Cube Without Human Knowledge
Authors Stephen McAleer, Forest Agostinelli, Alexander Shmakov, Pierre Baldi
Abstract A generally intelligent agent must be able to teach itself how to solve problems in complex domains with minimal human supervision. Recently, deep reinforcement learning algorithms combined with self-play have achieved superhuman proficiency in Go, Chess, and Shogi without human data or domain knowledge. In these environments, a reward is always received at the end of the game, however, for many combinatorial optimization environments, rewards are sparse and episodes are not guaranteed to terminate. We introduce Autodidactic Iteration: a novel reinforcement learning algorithm that is able to teach itself how to solve the Rubik’s Cube with no human assistance. Our algorithm is able to solve 100% of randomly scrambled cubes while achieving a median solve length of 30 moves – less than or equal to solvers that employ human domain knowledge.
Tasks Combinatorial Optimization
Published 2018-05-18
URL http://arxiv.org/abs/1805.07470v1
PDF http://arxiv.org/pdf/1805.07470v1.pdf
PWC https://paperswithcode.com/paper/solving-the-rubiks-cube-without-human
Repo https://github.com/Dalkio/RL_rubiks
Framework none
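
A minimal sketch of the Autodidactic Iteration target step (the environment interface and the stand-in value function are assumptions of mine; the real method trains a combined value-and-policy network): from each scrambled state, expand every child once and set the value target to the best reward-plus-bootstrapped-value, with the policy target as its argmax.

```python
import numpy as np

def adi_targets(states, moves, apply_move, reward, value_net):
    """One ADI-style target pass; value_net maps a state batch to values."""
    v_targets, p_targets = [], []
    for s in states:
        children = [apply_move(s, m) for m in moves]
        q = np.array([reward(c) + value_net([c])[0] for c in children])
        v_targets.append(q.max())              # value target for state s
        p_targets.append(int(q.argmax()))      # policy target: best move index
    return np.array(v_targets), np.array(p_targets)

# toy usage on an integer "puzzle" whose solved state is 0
moves = [-1, +1]
apply_move = lambda s, m: s + m
reward = lambda s: 1.0 if s == 0 else -1.0     # +1 only when a move solves
value_net = lambda batch: [-abs(s) for s in batch]  # stand-in estimator
v, p = adi_targets([2, 1, -1], moves, apply_move, reward, value_net)
print(v, p)
```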