Paper Group AWR 92
The Matrix Calculus You Need For Deep Learning. Global-Locally Self-Attentive Dialogue State Tracker. SAFE: A Neural Survival Analysis Model for Fraud Early Detection. MesoNet: a Compact Facial Video Forgery Detection Network. Few-shot 3D Multi-modal Medical Image Segmentation using Generative Adversarial Learning. Learning Compressed Transforms with Low Displacement Rank. E2E-MLT - an Unconstrained End-to-End Method for Multi-Language Scene Text. Unsupervised Natural Language Generation with Denoising Autoencoders. Rethinking Pose in 3D: Multi-stage Refinement and Recovery for Markerless Motion Capture. Interactive Grounded Language Acquisition and Generalization in a 2D World. Bottom-Up Abstractive Summarization. Inference in Deep Gaussian Processes using Stochastic Gradient Hamiltonian Monte Carlo. Deep Adaptive Attention for Joint Facial Action Unit Detection and Face Alignment. Calibrating Multivariate Lévy Processes with Neural Networks. Solving the Rubik’s Cube Without Human Knowledge.
The Matrix Calculus You Need For Deep Learning
Title | The Matrix Calculus You Need For Deep Learning |
Authors | Terence Parr, Jeremy Howard |
Abstract | This paper is an attempt to explain all the matrix calculus you need in order to understand the training of deep neural networks. We assume no math knowledge beyond what you learned in calculus 1, and provide links to help you refresh the necessary math where needed. Note that you do not need to understand this material before you start learning to train and use deep learning in practice; rather, this material is for those who are already familiar with the basics of neural networks, and wish to deepen their understanding of the underlying math. Don’t worry if you get stuck at some point along the way—just go back and reread the previous section, and try writing down and working through some examples. And if you’re still stuck, we’re happy to answer your questions in the Theory category at forums.fast.ai. Note: There is a reference section at the end of the paper summarizing all the key matrix calculus rules and terminology discussed here. See related articles at http://explained.ai |
Tasks | |
Published | 2018-02-05 |
URL | http://arxiv.org/abs/1802.01528v3 |
http://arxiv.org/pdf/1802.01528v3.pdf | |
PWC | https://paperswithcode.com/paper/the-matrix-calculus-you-need-for-deep |
Repo | https://github.com/leandromineti/ml-knowledge-graph |
Framework | none |
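The core machinery the paper builds up is the Jacobian and the vector chain rule. As a quick worked example in its spirit (ours, not from the paper): for f(x) = Wx the Jacobian ∂f/∂x is exactly W, which the snippet below confirms with central finite differences.

```python
import numpy as np

# For f(x) = W @ x, the Jacobian df/dx is exactly W -- one of the
# reference rules the paper tabulates. We verify it numerically.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
x = rng.normal(size=4)
f = lambda x: W @ x

h = 1e-6
J = np.zeros((3, 4))
for j in range(4):
    e = np.zeros(4)
    e[j] = h
    J[:, j] = (f(x + e) - f(x - e)) / (2 * h)  # column j of the Jacobian

assert np.allclose(J, W, atol=1e-4)
```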
Global-Locally Self-Attentive Dialogue State Tracker
Title | Global-Locally Self-Attentive Dialogue State Tracker |
Authors | Victor Zhong, Caiming Xiong, Richard Socher |
Abstract | Dialogue state tracking, which estimates user goals and requests given the dialogue context, is an essential part of task-oriented dialogue systems. In this paper, we propose the Global-Locally Self-Attentive Dialogue State Tracker (GLAD), which learns representations of the user utterance and previous system actions with global-local modules. Our model uses global modules to share parameters between estimators for different types (called slots) of dialogue states, and uses local modules to learn slot-specific features. We show that this significantly improves tracking of rare states and achieves state-of-the-art performance on the WoZ and DSTC2 state tracking tasks. GLAD obtains 88.1% joint goal accuracy and 97.1% request accuracy on WoZ, outperforming prior work by 3.7% and 5.5%. On DSTC2, our model obtains 74.5% joint goal accuracy and 97.5% request accuracy, outperforming prior work by 1.1% and 1.0%. |
Tasks | Dialogue State Tracking, Task-Oriented Dialogue Systems |
Published | 2018-05-19 |
URL | http://arxiv.org/abs/1805.09655v3 |
http://arxiv.org/pdf/1805.09655v3.pdf | |
PWC | https://paperswithcode.com/paper/global-locally-self-attentive-dialogue-state |
Repo | https://github.com/salesforce/glad |
Framework | none |
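A minimal numpy sketch of the global-local sharing pattern the abstract describes: each slot encodes the input with a mix of one globally shared map and a slot-specific map, combined by a per-slot gate. The real model uses self-attentive recurrent encoders; the linear maps and gate values below are stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16
slots = ["food", "area", "request"]

# One globally shared projection plus one projection per slot, mixed by a
# per-slot scalar gate -- the parameter-sharing pattern GLAD describes.
W_global = rng.normal(size=(d, d))
W_local = {s: rng.normal(size=(d, d)) for s in slots}
gate = {s: 0.5 for s in slots}  # learned per slot in the real model

def encode(x, slot):
    g = 1 / (1 + np.exp(-gate[slot]))  # sigmoid gate in [0, 1]
    return g * (W_global @ x) + (1 - g) * (W_local[slot] @ x)

utterance = rng.normal(size=d)
reps = {s: encode(utterance, s) for s in slots}
```

Sharing W_global across slots is what lets rare slots benefit from data seen by common ones, which is the mechanism behind the improved tracking of rare states.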
SAFE: A Neural Survival Analysis Model for Fraud Early Detection
Title | SAFE: A Neural Survival Analysis Model for Fraud Early Detection |
Authors | Panpan Zheng, Shuhan Yuan, Xintao Wu |
Abstract | Many online platforms have deployed anti-fraud systems to detect and prevent fraudulent activities. However, there is usually a gap between the time that a user commits a fraudulent action and the time that the user is suspended by the platform. How to detect fraudsters in time is a challenging problem. Most of the existing approaches adopt classifiers to predict fraudsters given their activity sequences over time. The main drawback of classification models is that the prediction results between consecutive timestamps are often inconsistent. In this paper, we propose a survival analysis based fraud early detection model, SAFE, which maps dynamic user activities to survival probabilities that are guaranteed to be monotonically decreasing along time. SAFE adopts a recurrent neural network (RNN) to handle user activity sequences and directly outputs hazard values at each timestamp; the survival probability derived from these hazard values then yields consistent predictions. Because we only observe the time a user is suspended, rather than the time of the fraudulent activity, in the training data, we revise the loss function of the regular survival model to achieve fraud early detection. Experimental results on two real-world datasets demonstrate that SAFE outperforms both the survival analysis model and the recurrent neural network model alone, as well as state-of-the-art fraud early detection approaches. |
Tasks | Survival Analysis |
Published | 2018-09-12 |
URL | http://arxiv.org/abs/1809.04683v2 |
http://arxiv.org/pdf/1809.04683v2.pdf | |
PWC | https://paperswithcode.com/paper/safe-a-neural-survival-analysis-model-for |
Repo | https://github.com/PanpanZheng/SAFE |
Framework | tf |
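The mechanism that makes SAFE's predictions consistent is easy to state: a non-negative hazard per timestamp implies a survival curve that can only decrease. A minimal sketch, with dummy hazards standing in for the RNN outputs:

```python
import numpy as np

# SAFE's consistency trick: a non-negative hazard h_t at each timestamp
# gives a survival probability
#   S(t) = exp(-sum_{i<=t} h_i)
# that is monotonically non-increasing in t by construction. The hazards
# here are dummy values; in the model they come from an RNN.
hazards = np.array([0.05, 0.10, 0.40, 0.90, 1.20])  # h_t >= 0
survival = np.exp(-np.cumsum(hazards))

assert np.all(np.diff(survival) <= 0)  # never increases over time
print(survival)  # e.g. flag the user once S(t) falls below a threshold
```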
MesoNet: a Compact Facial Video Forgery Detection Network
Title | MesoNet: a Compact Facial Video Forgery Detection Network |
Authors | Darius Afchar, Vincent Nozick, Junichi Yamagishi, Isao Echizen |
Abstract | This paper presents a method to automatically and efficiently detect face tampering in videos, and particularly focuses on two recent techniques used to generate hyper-realistic forged videos: Deepfake and Face2Face. Traditional image forensics techniques are usually not well suited to videos due to the compression that strongly degrades the data. Thus, this paper follows a deep learning approach and presents two networks, both with a low number of layers to focus on the mesoscopic properties of images. We evaluate those fast networks on both an existing dataset and a dataset we have constituted from online videos. The tests demonstrate a very successful detection rate with more than 98% for Deepfake and 95% for Face2Face. |
Tasks | DeepFake Detection, Face Swapping, Fake Image Detection |
Published | 2018-09-04 |
URL | http://arxiv.org/abs/1809.00888v1 |
http://arxiv.org/pdf/1809.00888v1.pdf | |
PWC | https://paperswithcode.com/paper/mesonet-a-compact-facial-video-forgery |
Repo | https://github.com/HongguLiu/MesoNet.Pytorch |
Framework | pytorch |
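A rough PyTorch sketch of a compact, Meso4-style detector: a few convolutional blocks with small filter counts (8/8/16/16, following our reading of the paper) so the network focuses on mesoscopic image properties, then a tiny classifier head. Treat this as an approximation of the design, not the reference implementation in the linked repo.

```python
import torch
import torch.nn as nn

# Approximate Meso4-style network: four small conv blocks, then a tiny
# two-layer classifier producing a single real-vs-forged logit.
class MesoSketch(nn.Module):
    def __init__(self):
        super().__init__()
        chans = [3, 8, 8, 16, 16]
        blocks = []
        for cin, cout in zip(chans, chans[1:]):
            blocks += [nn.Conv2d(cin, cout, 3, padding=1),
                       nn.BatchNorm2d(cout), nn.ReLU(),
                       nn.MaxPool2d(2)]
        self.features = nn.Sequential(*blocks)
        self.head = nn.Sequential(nn.Flatten(),
                                  nn.LazyLinear(16), nn.ReLU(),
                                  nn.Linear(16, 1))

    def forward(self, x):
        return self.head(self.features(x))

logit = MesoSketch()(torch.randn(1, 3, 256, 256))
```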
Few-shot 3D Multi-modal Medical Image Segmentation using Generative Adversarial Learning
Title | Few-shot 3D Multi-modal Medical Image Segmentation using Generative Adversarial Learning |
Authors | Arnab Kumar Mondal, Jose Dolz, Christian Desrosiers |
Abstract | We address the problem of segmenting 3D multi-modal medical images in scenarios where very few labeled examples are available for training. Leveraging the recent success of adversarial learning for semi-supervised segmentation, we propose a novel method based on Generative Adversarial Networks (GANs) to train a segmentation model with both labeled and unlabeled images. The proposed method prevents over-fitting by learning to discriminate between true and fake patches obtained by a generator network. Our work extends current adversarial learning approaches, which focus on 2D single-modality images, to the more challenging context of 3D volumes of multiple modalities. The proposed method is evaluated on the problem of segmenting brain MRI from the iSEG-2017 and MRBrainS 2013 datasets. Significant performance improvement is reported, compared to state-of-the-art segmentation networks trained in a fully-supervised manner. In addition, our work presents a comprehensive analysis of different GAN architectures for semi-supervised segmentation, showing that recent techniques like feature matching yield higher performance than conventional adversarial training approaches. Our code is publicly available at https://github.com/arnab39/FewShot_GAN-Unet3D |
Tasks | 3D Medical Imaging Segmentation, Brain Image Segmentation, Brain Segmentation, Few-Shot Semantic Segmentation, Medical Image Segmentation, Semantic Segmentation, Semi-Supervised Semantic Segmentation |
Published | 2018-10-29 |
URL | http://arxiv.org/abs/1810.12241v1 |
http://arxiv.org/pdf/1810.12241v1.pdf | |
PWC | https://paperswithcode.com/paper/few-shot-3d-multi-modal-medical-image |
Repo | https://github.com/arnab39/FewShot_GAN-Unet3D |
Framework | tf |
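The abstract singles out feature matching as the best-performing adversarial objective. A hedged PyTorch sketch of that loss on 3D two-modality patches: instead of fooling the discriminator's final output, the generator matches the batch mean of an intermediate discriminator feature map between real and generated patches. The toy network below stands in for a discriminator truncated at an intermediate layer.

```python
import torch

# Feature-matching loss (Salimans et al. style): match the mean of an
# intermediate discriminator feature between real and fake batches.
def feature_matching_loss(disc_features, real, fake):
    f_real = disc_features(real).mean(dim=0)  # average over the batch
    f_fake = disc_features(fake).mean(dim=0)
    return torch.norm(f_real - f_fake, p=2) ** 2

disc_features = torch.nn.Sequential(          # toy stand-in network
    torch.nn.Conv3d(2, 8, 3, padding=1), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool3d(1), torch.nn.Flatten())
real = torch.randn(4, 2, 16, 16, 16)          # two modalities, 3D patches
fake = torch.randn(4, 2, 16, 16, 16)
loss = feature_matching_loss(disc_features, real, fake)
```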
Learning Compressed Transforms with Low Displacement Rank
Title | Learning Compressed Transforms with Low Displacement Rank |
Authors | Anna T. Thomas, Albert Gu, Tri Dao, Atri Rudra, Christopher Ré |
Abstract | The low displacement rank (LDR) framework for structured matrices represents a matrix through two displacement operators and a low-rank residual. Existing use of LDR matrices in deep learning has applied fixed displacement operators encoding forms of shift invariance akin to convolutions. We introduce a class of LDR matrices with more general displacement operators, and explicitly learn over both the operators and the low-rank component. This class generalizes several previous constructions while preserving compression and efficient computation. We prove bounds on the VC dimension of multi-layer neural networks with structured weight matrices and show empirically that our compact parameterization can reduce the sample complexity of learning. When replacing weight layers in fully-connected, convolutional, and recurrent neural networks for image classification and language modeling tasks, our new classes exceed the accuracy of existing compression approaches, and on some tasks also outperform general unstructured layers while using more than 20x fewer parameters. |
Tasks | Image Classification, Language Modelling |
Published | 2018-10-04 |
URL | http://arxiv.org/abs/1810.02309v3 |
http://arxiv.org/pdf/1810.02309v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-compressed-transforms-with-low |
Repo | https://github.com/HazyResearch/structured-nets |
Framework | pytorch |
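The framework's central equation is the Sylvester displacement AM − MB = GHᵀ, with GHᵀ of low rank; the paper learns over the operators (A, B) as well as the low-rank component. As a concrete fixed-operator instance (our illustration, not the paper's code), a Toeplitz matrix has displacement rank at most 2 under the unit-f-circulant shifts:

```python
import numpy as np

# A Toeplitz matrix T satisfies rank(Z_1 @ T - T @ Z_-1) <= 2, where Z_f
# is the f-unit-circulant shift: ones on the subdiagonal, f in the corner.
def Z(f, n):
    S = np.diag(np.ones(n - 1), k=-1)  # shift down by one row
    S[0, -1] = f                       # corner entry makes it f-circulant
    return S

n = 6
rng = np.random.default_rng(2)
t = rng.normal(size=2 * n - 1)         # the 2n-1 diagonals of T
T = np.array([[t[i - j + n - 1] for j in range(n)] for i in range(n)])

D = Z(1, n) @ T - T @ Z(-1, n)
print(np.linalg.matrix_rank(D))        # prints 2: displacement rank <= 2
```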
E2E-MLT - an Unconstrained End-to-End Method for Multi-Language Scene Text
Title | E2E-MLT - an Unconstrained End-to-End Method for Multi-Language Scene Text |
Authors | Michal Bušta, Yash Patel, Jiri Matas |
Abstract | An end-to-end trainable (fully differentiable) method for multi-language scene text localization and recognition is proposed. The approach is based on a single fully convolutional network (FCN) with shared layers for both tasks. E2E-MLT is the first published multi-language OCR for scene text. While trained in multi-language setup, E2E-MLT demonstrates competitive performance when compared to other methods trained for English scene text alone. The experiments show that obtaining accurate multi-language multi-script annotations is a challenging problem. |
Tasks | Optical Character Recognition |
Published | 2018-01-30 |
URL | http://arxiv.org/abs/1801.09919v2 |
http://arxiv.org/pdf/1801.09919v2.pdf | |
PWC | https://paperswithcode.com/paper/e2e-mlt-an-unconstrained-end-to-end-method |
Repo | https://github.com/MichalBusta/E2E-MLT |
Framework | pytorch |
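A schematic of the shared-trunk design the abstract describes, with one fully convolutional backbone feeding both a localization head and a recognition head. Layer sizes and head shapes are placeholders, not the actual E2E-MLT architecture.

```python
import torch.nn as nn

# One shared FCN trunk, two task heads -- the structural idea only.
class SharedFCN(nn.Module):
    def __init__(self, n_chars=8000):         # large multi-script charset
        super().__init__()
        self.trunk = nn.Sequential(            # shared layers
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.localize = nn.Conv2d(64, 5, 1)    # e.g. box geometry + score
        self.recognize = nn.Conv2d(64, n_chars, 1)  # per-position logits

    def forward(self, x):
        f = self.trunk(x)
        return self.localize(f), self.recognize(f)

model = SharedFCN()
```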
Unsupervised Natural Language Generation with Denoising Autoencoders
Title | Unsupervised Natural Language Generation with Denoising Autoencoders |
Authors | Markus Freitag, Scott Roy |
Abstract | Generating text from structured data is important for various tasks such as question answering and dialog systems. We show that in at least one domain, without any supervision and only based on unlabeled text, we are able to build a Natural Language Generation (NLG) system with higher performance than supervised approaches. In our approach, we interpret the structured data as a corrupt representation of the desired output and use a denoising auto-encoder to reconstruct the sentence. We show how to introduce noise into training examples that do not contain structured data, and that the resulting denoising auto-encoder generalizes to generate correct sentences when given structured data. |
Tasks | Denoising, Question Answering, Text Generation |
Published | 2018-04-21 |
URL | http://arxiv.org/abs/1804.07899v2 |
http://arxiv.org/pdf/1804.07899v2.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-natural-language-generation-with |
Repo | https://github.com/mcleonard/NLG_Autoencoder |
Framework | pytorch |
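The trick that unlocks unlabeled text is the noise model: corrupt a sentence until it resembles structured data, then train the denoising auto-encoder to invert the corruption. A toy sketch, with an illustrative stopword list and shuffle rather than the paper's exact noise process:

```python
import random

# Treat structured data as a corrupted sentence: keep content words,
# drop function words, shuffle. Training the DAE to undo this on
# unlabeled text teaches it to generate sentences from structured input.
STOPWORDS = {"the", "a", "an", "is", "are", "was", "to", "of", "in", "and", "at"}

def corrupt(sentence, seed=0):
    rng = random.Random(seed)
    kept = [w for w in sentence.split() if w.lower() not in STOPWORDS]
    rng.shuffle(kept)
    return " ".join(kept)

target = "the flight to Boston departs at 7 pm and arrives at 9 pm"
print(corrupt(target))  # pseudo "structured data" input for the DAE
```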
Rethinking Pose in 3D: Multi-stage Refinement and Recovery for Markerless Motion Capture
Title | Rethinking Pose in 3D: Multi-stage Refinement and Recovery for Markerless Motion Capture |
Authors | Denis Tome, Matteo Toso, Lourdes Agapito, Chris Russell |
Abstract | We propose a CNN-based approach for multi-camera markerless motion capture of the human body. Unlike existing methods that first perform pose estimation on individual cameras and generate 3D models as post-processing, our approach makes use of 3D reasoning throughout a multi-stage approach. This novelty allows us to use provisional 3D models of human pose to rethink where the joints should be located in the image and to recover from past mistakes. Our principled refinement of 3D human poses lets us make use of image cues, even from images where we previously misdetected joints, to refine our estimates as part of an end-to-end approach. Finally, we demonstrate how the high-quality output of our multi-camera setup can be used as an additional training source to improve the accuracy of existing single camera models. |
Tasks | 3D Human Pose Estimation, Markerless Motion Capture, Motion Capture, Pose Estimation |
Published | 2018-08-04 |
URL | http://arxiv.org/abs/1808.01525v1 |
http://arxiv.org/pdf/1808.01525v1.pdf | |
PWC | https://paperswithcode.com/paper/rethinking-pose-in-3d-multi-stage-refinement |
Repo | https://github.com/MatteoT90/WibergianLearning |
Framework | tf |
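A schematic of the multi-stage loop the abstract describes: reproject the provisional 3D pose into every camera, refine each 2D estimate (a CNN in the paper; an identity stand-in here), and re-triangulate. The two-camera setup and linear (DLT) triangulation are our toy illustration.

```python
import numpy as np

def project(P, X):                     # pinhole projection of a 3D point
    xh = P @ np.append(X, 1.0)
    return xh[:2] / xh[2]

def triangulate(Ps, xs):               # linear (DLT) triangulation
    A = []
    for P, (u, v) in zip(Ps, xs):
        A.append(u * P[2] - P[0])
        A.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.array(A))
    X = Vt[-1]
    return X[:3] / X[3]

Ps = [np.hstack([np.eye(3), np.array([[0.0], [0.0], [5.0]])]),
      np.hstack([np.eye(3), np.array([[1.0], [0.0], [5.0]])])]
joint3d = np.array([0.2, -0.1, 1.0])   # provisional estimate, one joint
for stage in range(3):
    views2d = [project(P, joint3d) for P in Ps]
    refined = views2d                  # a 2D CNN refines these in the paper
    joint3d = triangulate(Ps, refined)
```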
Interactive Grounded Language Acquisition and Generalization in a 2D World
Title | Interactive Grounded Language Acquisition and Generalization in a 2D World |
Authors | Haonan Yu, Haichao Zhang, Wei Xu |
Abstract | We build a virtual agent for learning language in a 2D maze-like world. The agent sees images of the surrounding environment, listens to a virtual teacher, and takes actions to receive rewards. It interactively learns the teacher’s language from scratch based on two language use cases: sentence-directed navigation and question answering. It learns simultaneously the visual representations of the world, the language, and the action control. By disentangling language grounding from other computational routines and sharing a concept detection function between language grounding and prediction, the agent reliably interpolates and extrapolates to interpret sentences that contain new word combinations or new words missing from training sentences. The new words are transferred from the answers of language prediction. Such a language ability is trained and evaluated on a population of over 1.6 million distinct sentences consisting of 119 object words, 8 color words, 9 spatial-relation words, and 50 grammatical words. The proposed model significantly outperforms five comparison methods for interpreting zero-shot sentences. In addition, we demonstrate human-interpretable intermediate outputs of the model in the appendix. |
Tasks | Language Acquisition, Question Answering |
Published | 2018-01-31 |
URL | http://arxiv.org/abs/1802.01433v4 |
http://arxiv.org/pdf/1802.01433v4.pdf | |
PWC | https://paperswithcode.com/paper/interactive-grounded-language-acquisition-and |
Repo | https://github.com/PaddlePaddle/XWorld |
Framework | none |
Bottom-Up Abstractive Summarization
Title | Bottom-Up Abstractive Summarization |
Authors | Sebastian Gehrmann, Yuntian Deng, Alexander M. Rush |
Abstract | Neural network-based methods for abstractive summarization produce outputs that are more fluent than other techniques, but which can be poor at content selection. This work proposes a simple technique for addressing this issue: use a data-efficient content selector to over-determine phrases in a source document that should be part of the summary. We use this selector as a bottom-up attention step to constrain the model to likely phrases. We show that this approach improves the ability to compress text, while still generating fluent summaries. This two-step process is both simpler and higher performing than other end-to-end content selection models, leading to significant improvements on ROUGE for both the CNN-DM and NYT corpus. Furthermore, the content selector can be trained with as little as 1,000 sentences, making it easy to transfer a trained summarizer to a new domain. |
Tasks | Abstractive Text Summarization, Document Summarization |
Published | 2018-08-31 |
URL | http://arxiv.org/abs/1808.10792v2 |
http://arxiv.org/pdf/1808.10792v2.pdf | |
PWC | https://paperswithcode.com/paper/bottom-up-abstractive-summarization |
Repo | https://github.com/sebastianGehrmann/bottom-up-summary |
Framework | none |
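The bottom-up step reduces to masking: source tokens the content selector scores below a threshold are removed from the copy-attention distribution before the softmax, so the decoder can only copy from likely phrases. A numpy sketch with dummy scores (both models are neural networks in the paper):

```python
import numpy as np

# Mask copy-attention logits with the content selector's probabilities,
# then renormalize with a softmax over the surviving tokens.
def masked_copy_attention(attn_scores, select_prob, threshold=0.5):
    masked = np.where(select_prob >= threshold, attn_scores, -np.inf)
    e = np.exp(masked - masked.max())
    return e / e.sum()

attn_scores = np.array([2.0, 0.5, 1.5, 3.0])   # decoder attention logits
select_prob = np.array([0.9, 0.2, 0.7, 0.1])   # content-selector output
print(masked_copy_attention(attn_scores, select_prob))
```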
Inference in Deep Gaussian Processes using Stochastic Gradient Hamiltonian Monte Carlo
Title | Inference in Deep Gaussian Processes using Stochastic Gradient Hamiltonian Monte Carlo |
Authors | Marton Havasi, José Miguel Hernández-Lobato, Juan José Murillo-Fuentes |
Abstract | Deep Gaussian Processes (DGPs) are hierarchical generalizations of Gaussian Processes that combine well calibrated uncertainty estimates with the high flexibility of multilayer models. One of the biggest challenges with these models is that exact inference is intractable. The current state-of-the-art inference method, Variational Inference (VI), employs a Gaussian approximation to the posterior distribution. This can be a potentially poor unimodal approximation of the generally multimodal posterior. In this work, we provide evidence for the non-Gaussian nature of the posterior and we apply the Stochastic Gradient Hamiltonian Monte Carlo method to generate samples. To efficiently optimize the hyperparameters, we introduce the Moving Window MCEM algorithm. This results in significantly better predictions at a lower computational cost than its VI counterpart. Thus our method establishes a new state-of-the-art for inference in DGPs. |
Tasks | Gaussian Processes |
Published | 2018-06-14 |
URL | http://arxiv.org/abs/1806.05490v3 |
http://arxiv.org/pdf/1806.05490v3.pdf | |
PWC | https://paperswithcode.com/paper/inference-in-deep-gaussian-processes-using |
Repo | https://github.com/hughsalimbeni/bayesian_benchmarks |
Framework | none |
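The sampler at the heart of the method is SGHMC (Chen et al., 2014): momentum updates with friction plus injected noise, so the chain approximately targets exp(−U). A toy sketch sampling a 1-D standard normal, with illustrative hyperparameters; in the paper the gradient comes from a minibatch DGP objective.

```python
import numpy as np

rng = np.random.default_rng(3)
grad_U = lambda theta: theta          # -d log p / d theta for N(0, 1)

eta, alpha = 1e-2, 0.1                # step size and friction
theta, v = 0.0, 0.0
samples = []
for step in range(50_000):
    # SGHMC update: friction alpha dissipates the energy injected by the
    # N(0, 2*alpha*eta) noise, keeping exp(-U) as the target distribution.
    v = v - eta * grad_U(theta) - alpha * v \
        + rng.normal(scale=np.sqrt(2 * alpha * eta))
    theta = theta + v
    samples.append(theta)

print(np.mean(samples[5_000:]), np.var(samples[5_000:]))  # ~0 and ~1
```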
Deep Adaptive Attention for Joint Facial Action Unit Detection and Face Alignment
Title | Deep Adaptive Attention for Joint Facial Action Unit Detection and Face Alignment |
Authors | Zhiwen Shao, Zhilei Liu, Jianfei Cai, Lizhuang Ma |
Abstract | Facial action unit (AU) detection and face alignment are two highly correlated tasks, since facial landmarks can provide precise AU locations to facilitate the extraction of meaningful local features for AU detection. Most existing AU detection works treat face alignment as a preprocessing step and handle the two tasks independently. In this paper, we propose a novel end-to-end deep learning framework for joint AU detection and face alignment, which has not been explored before. In particular, multi-scale shared features are learned first, and high-level features of face alignment are fed into AU detection. Moreover, to extract precise local features, we propose an adaptive attention learning module to refine the attention map of each AU adaptively. Finally, the assembled local features are integrated with face alignment features and global features for AU detection. Experiments on the BP4D and DISFA benchmarks demonstrate that our framework significantly outperforms the state-of-the-art methods for AU detection. |
Tasks | Action Unit Detection, Face Alignment, Facial Action Unit Detection |
Published | 2018-03-15 |
URL | http://arxiv.org/abs/1803.05588v2 |
http://arxiv.org/pdf/1803.05588v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-adaptive-attention-for-joint-facial |
Repo | https://github.com/ZhiwenShao/JAANet |
Framework | pytorch |
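The starting point for the attention module is that landmarks pin down where each AU lives. A sketch of building an initial per-AU attention map as Gaussian bumps around its landmarks (positions and falloff are illustrative); the paper's contribution is the module that then refines these maps adaptively.

```python
import numpy as np

# Initial AU attention: Gaussian bumps centered on the facial landmarks
# associated with the AU, on a small spatial grid.
def au_attention(landmarks, size=64, sigma=6.0):
    ys, xs = np.mgrid[0:size, 0:size]
    attn = np.zeros((size, size))
    for (lx, ly) in landmarks:
        bump = np.exp(-((xs - lx) ** 2 + (ys - ly) ** 2) / (2 * sigma ** 2))
        attn = np.maximum(attn, bump)
    return attn

inner_brow = [(22, 18), (42, 18)]       # rough AU1-related landmark spots
initial_map = au_attention(inner_brow)  # refined adaptively in the paper
```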
Calibrating Multivariate Lévy Processes with Neural Networks
Title | Calibrating Multivariate Lévy Processes with Neural Networks |
Authors | Kailai Xu, Eric Darve |
Abstract | Calibrating a Lévy process usually requires characterizing its jump distribution. Traditionally this problem can be solved with nonparametric estimation using the empirical characteristic functions (ECF), assuming certain regularity, and results to date are mostly in 1D. For multivariate Lévy processes and less smooth Lévy densities, the problem becomes challenging as ECFs decay slowly and have large uncertainty because of limited observations. We solve this problem by approximating the Lévy density with a parametrized functional form; the characteristic function is then estimated using numerical integration. In our benchmarks, we used deep neural networks and found that they are robust and can capture sharp transitions in the Lévy density. They perform favorably compared to piecewise linear functions and radial basis functions. The methods and techniques developed here apply to many other problems that involve nonparametric estimation of functions embedded in a system model. |
Tasks | |
Published | 2018-12-20 |
URL | https://arxiv.org/abs/1812.08883v3 |
https://arxiv.org/pdf/1812.08883v3.pdf | |
PWC | https://paperswithcode.com/paper/calibrating-levy-process-from-observations |
Repo | https://github.com/UnofficialJuliaMirror/ADCME.jl-07b341a0-ce75-57c6-b2de-414ffdc00be5 |
Framework | tf |
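A sketch of the forward map being calibrated: from a parametrized Lévy density ν (a neural network in the paper; a double-exponential stand-in here) to the Lévy–Khintchine characteristic exponent via numerical quadrature. All constants below are illustrative.

```python
import numpy as np

# Characteristic exponent under the Lévy-Khintchine representation:
#   psi(xi) = integral (e^{i xi x} - 1 - i xi x 1_{|x|<1}) nu(x) dx,
# computed by trapezoidal quadrature on a truncated grid. Matching this
# against the empirical characteristic function calibrates nu.
nu = lambda x: 0.5 * np.exp(-np.abs(x))  # stand-in jump density

def char_exponent(xi, lo=-10.0, hi=10.0, n=20001):
    x = np.linspace(lo, hi, n)
    x = x[np.abs(x) > 1e-6]              # skip the origin of the density
    f = (np.exp(1j * xi * x) - 1 - 1j * xi * x * (np.abs(x) < 1)) * nu(x)
    return np.sum((f[1:] + f[:-1]) * np.diff(x) / 2)  # trapezoid rule

print(char_exponent(1.0))  # characteristic function is exp(t * psi(xi))
```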
Solving the Rubik’s Cube Without Human Knowledge
Title | Solving the Rubik’s Cube Without Human Knowledge |
Authors | Stephen McAleer, Forest Agostinelli, Alexander Shmakov, Pierre Baldi |
Abstract | A generally intelligent agent must be able to teach itself how to solve problems in complex domains with minimal human supervision. Recently, deep reinforcement learning algorithms combined with self-play have achieved superhuman proficiency in Go, Chess, and Shogi without human data or domain knowledge. In these environments, a reward is always received at the end of the game, however, for many combinatorial optimization environments, rewards are sparse and episodes are not guaranteed to terminate. We introduce Autodidactic Iteration: a novel reinforcement learning algorithm that is able to teach itself how to solve the Rubik’s Cube with no human assistance. Our algorithm is able to solve 100% of randomly scrambled cubes while achieving a median solve length of 30 moves – less than or equal to solvers that employ human domain knowledge. |
Tasks | Combinatorial Optimization |
Published | 2018-05-18 |
URL | http://arxiv.org/abs/1805.07470v1 |
http://arxiv.org/pdf/1805.07470v1.pdf | |
PWC | https://paperswithcode.com/paper/solving-the-rubiks-cube-without-human |
Repo | https://github.com/Dalkio/RL_rubiks |
Framework | none |
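A toy sketch of how Autodidactic Iteration generates its training targets: scramble outward from the solved state, then set each visited state's value target to the best one-step backup under the current value estimate, with a reward of +1 only for reaching the solved state. The integer "cube" (moves of ±1, solved at 0) and the value table stand in for the real environment and neural network.

```python
import random

SOLVED = 0
MOVES = (+1, -1)
value = {}                                   # stand-in for the value net

def target(state):
    # Best one-step backup: reward plus bootstrapped value of the child,
    # with bootstrapping cut off at the solved (terminal) state.
    best = -float("inf")
    for m in MOVES:
        child = state + m
        reward = 1.0 if child == SOLVED else -1.0
        bootstrap = 0.0 if child == SOLVED else value.get(child, 0.0)
        best = max(best, reward + bootstrap)
    return best

rng = random.Random(0)
for episode in range(1000):                  # states near solved come first
    state = SOLVED
    for depth in range(5):
        state += rng.choice(MOVES)
        value[state] = target(state)         # regression target in ADI

print({s: round(v, 2) for s, v in sorted(value.items())})
```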