Paper Group AWR 308
Automatic Recognition of Student Engagement using Deep Learning and Facial Expression. Ensemble learning with 3D convolutional neural networks for connectome-based prediction. A Comparison of Machine Learning Algorithms for the Surveillance of Autism Spectrum Disorder. Generalizing to Unseen Domains via Adversarial Data Augmentation. e-SNLI: Natural Language Inference with Natural Language Explanations. Mode Normalization. Unsupervised Abstractive Sentence Summarization using Length Controlled Variational Autoencoder. Piggyback: Adapting a Single Network to Multiple Tasks by Learning to Mask Weights. Shampoo: Preconditioned Stochastic Tensor Optimization. Accelerating Natural Gradient with Higher-Order Invariance. Mesh-TensorFlow: Deep Learning for Supercomputers. Memory Fusion Network for Multi-view Sequential Learning. Automated learning with a probabilistic programming language: Birch. Psychlab: A Psychology Laboratory for Deep Reinforcement Learning Agents. Sparse and Constrained Attention for Neural Machine Translation.
Automatic Recognition of Student Engagement using Deep Learning and Facial Expression
Title | Automatic Recognition of Student Engagement using Deep Learning and Facial Expression |
Authors | Omid Mohamad Nezami, Mark Dras, Len Hamey, Deborah Richards, Stephen Wan, Cecile Paris |
Abstract | Engagement is a key indicator of the quality of learning experience, and one that plays a major role in developing intelligent educational interfaces. Any such interface requires the ability to recognise the level of engagement in order to respond appropriately; however, there is very little existing data to learn from, and new data is expensive and difficult to acquire. This paper presents a deep learning model to improve engagement recognition from images that overcomes the data sparsity challenge by pre-training on readily available basic facial expression data, before training on specialised engagement data. In the first of two steps, a facial expression recognition model is trained to provide a rich face representation using deep learning. In the second step, we use the model’s weights to initialize our deep learning based model to recognize engagement; we term this the engagement model. We train the model on our new engagement recognition dataset with 4627 engaged and disengaged samples. We find that the engagement model outperforms effective deep learning architectures that we apply for the first time to engagement recognition, as well as approaches using histogram of oriented gradients and support vector machines. |
Tasks | Facial Expression Recognition |
Published | 2018-08-07 |
URL | https://arxiv.org/abs/1808.02324v5 |
PDF | https://arxiv.org/pdf/1808.02324v5.pdf |
PWC | https://paperswithcode.com/paper/engagement-recognition-using-deep-learning |
Repo | https://github.com/omidnezami/Engagement-Recognition |
Framework | tf |
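The two-step recipe in the abstract (pre-train on basic facial-expression labels, then reuse the weights to initialise the engagement classifier) can be sketched in Keras. This is a minimal sketch, not the authors' architecture: the layer sizes, the 48x48 grayscale input, and the commented-out dataset variables are illustrative assumptions.

```python
# Minimal sketch of the two-step transfer scheme described above (not the
# authors' exact architecture): pre-train a CNN on facial-expression labels,
# then reuse its convolutional base to initialise an engagement classifier.
from tensorflow.keras import layers, models

def conv_base(input_shape=(48, 48, 1)):
    # Shared face-representation backbone (sizes are illustrative).
    return models.Sequential([
        layers.Conv2D(32, 3, activation="relu", input_shape=input_shape),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
    ])

# Step 1: facial expression recognition (e.g. 7 basic expressions).
expr_model = models.Sequential([conv_base(), layers.Dense(7, activation="softmax")])
expr_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# expr_model.fit(expression_images, expression_labels, epochs=...)

# Step 2: engagement model initialised from the pre-trained backbone.
backbone = expr_model.layers[0]   # reuse the learned face representation
engagement_model = models.Sequential([backbone, layers.Dense(2, activation="softmax")])
engagement_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# engagement_model.fit(engagement_images, engagement_labels, epochs=...)
```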
Ensemble learning with 3D convolutional neural networks for connectome-based prediction
Title | Ensemble learning with 3D convolutional neural networks for connectome-based prediction |
Authors | Meenakshi Khosla, Keith Jamison, Amy Kuceyeski, Mert R. Sabuncu |
Abstract | The specificity and sensitivity of resting state functional MRI (rs-fMRI) measurements depend on pre-processing choices, such as the parcellation scheme used to define regions of interest (ROIs). In this study, we critically evaluate the effect of brain parcellations on machine learning models applied to rs-fMRI data. Our experiments reveal a remarkable trend: On average, models with stochastic parcellations consistently perform as well as models with widely used atlases at the same spatial scale. We thus propose an ensemble learning strategy to combine the predictions from models trained on connectivity data extracted using different (e.g., stochastic) parcellations. We further present an implementation of our ensemble learning strategy with a novel 3D Convolutional Neural Network (CNN) approach. The proposed CNN approach takes advantage of the full-resolution 3D spatial structure of rs-fMRI data and fits non-linear predictive models. Our ensemble CNN framework overcomes the limitations of traditional machine learning models for connectomes that often rely on region-based summary statistics and/or linear models. We showcase our approach on a classification (autism patients versus healthy controls) and a regression problem (prediction of subject’s age), and report promising results. |
Tasks | |
Published | 2018-09-11 |
URL | https://arxiv.org/abs/1809.06219v2 |
PDF | https://arxiv.org/pdf/1809.06219v2.pdf |
PWC | https://paperswithcode.com/paper/ensemble-learning-with-3d-convolutional |
Repo | https://github.com/mk2299/Ensemble3DCNN_connectomes |
Framework | tf |
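The ensemble strategy described in the abstract, with one model per parcellation and their predictions combined, reduces to a simple averaging scheme. The sketch below uses logistic regressions over random stand-in features in place of the paper's 3D CNNs over connectivity matrices; it only illustrates the combination step.

```python
# Sketch of the ensemble idea above: train one model per parcellation of the
# same rs-fMRI data and average their predicted probabilities. The logistic
# regressions stand in for the paper's 3D CNN members; the random features
# simulate connectivity data extracted under different parcellations.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_subjects, n_parcellations = 200, 5
y = rng.integers(0, 2, size=n_subjects)           # e.g. autism vs. control

# One feature matrix per parcellation (here: random stand-ins).
views = [rng.normal(size=(n_subjects, 100)) for _ in range(n_parcellations)]

members = [LogisticRegression(max_iter=1000).fit(X, y) for X in views]

# Ensemble prediction: average the members' class probabilities.
proba = np.mean([m.predict_proba(X)[:, 1] for m, X in zip(members, views)], axis=0)
y_hat = (proba >= 0.5).astype(int)
print("training accuracy of the ensemble:", (y_hat == y).mean())
```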
A Comparison of Machine Learning Algorithms for the Surveillance of Autism Spectrum Disorder
Title | A Comparison of Machine Learning Algorithms for the Surveillance of Autism Spectrum Disorder |
Authors | Scott H Lee, Matthew J Maenner, Charles M Heilig |
Abstract | The Centers for Disease Control and Prevention (CDC) coordinates a labor-intensive process to measure the prevalence of autism spectrum disorder (ASD) among children in the United States. Random forests methods have shown promise in speeding up this process, but they lag behind human classification accuracy by about 5%. We explore whether more recently available document classification algorithms can close this gap. We applied 8 supervised learning algorithms to predict whether children meet the case definition for ASD based solely on the words in their evaluations. We compared the algorithms’ performance across 10 random train-test splits of the data, using classification accuracy, F1 score, and number of positive calls to evaluate their potential use for surveillance. Across the 10 train-test cycles, the random forest and support vector machine with Naive Bayes features (NB-SVM) each achieved slightly more than 87% mean accuracy. The NB-SVM produced significantly more false negatives than false positives (P = 0.027), but the random forest did not, making its prevalence estimates very close to the true prevalence in the data. The best-performing neural network performed similarly to the random forest on both measures. The random forest performed as well as more recently available models like the NB-SVM and the neural network, and it also produced good prevalence estimates. NB-SVM may not be a good candidate for use in a fully-automated surveillance workflow due to increased false negatives. More sophisticated algorithms, like hierarchical convolutional neural networks, may not be feasible to train due to characteristics of the data. Current algorithms might perform better if the data are abstracted and processed differently and if they take into account information about the children in addition to their evaluations. |
Tasks | Document Classification |
Published | 2018-04-17 |
URL | http://arxiv.org/abs/1804.06223v3 |
PDF | http://arxiv.org/pdf/1804.06223v3.pdf |
PWC | https://paperswithcode.com/paper/a-comparison-of-machine-learning-algorithms |
Repo | https://github.com/scotthlee/autism_classification |
Framework | none |
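The core setup the abstract compares, text from children's evaluations classified by a random forest and scored with accuracy and F1 over random train-test splits, can be sketched with scikit-learn. The texts, labels, and hyperparameters below are placeholders, not the CDC surveillance records or the paper's tuned models.

```python
# Hedged sketch of the document-classification setup described above:
# bag-of-words features from free-text evaluations, a random-forest
# classifier, and accuracy/F1 over a random train-test split.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

texts = ["child shows repetitive behaviors ...", "typical development noted ..."] * 50
labels = [1, 0] * 50                               # 1 = meets the ASD case definition

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=0)
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    RandomForestClassifier(n_estimators=500, random_state=0))
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred), "F1:", f1_score(y_test, pred))
```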
Generalizing to Unseen Domains via Adversarial Data Augmentation
Title | Generalizing to Unseen Domains via Adversarial Data Augmentation |
Authors | Riccardo Volpi, Hongseok Namkoong, Ozan Sener, John Duchi, Vittorio Murino, Silvio Savarese |
Abstract | We are concerned with learning models that generalize well to different unseen domains. We consider a worst-case formulation over data distributions that are near the source domain in the feature space. Only using training data from a single source distribution, we propose an iterative procedure that augments the dataset with examples from a fictitious target domain that is “hard” under the current model. We show that our iterative scheme is an adaptive data augmentation method where we append adversarial examples at each iteration. For softmax losses, we show that our method is a data-dependent regularization scheme that behaves differently from classical regularizers that regularize towards zero (e.g., ridge or lasso). On digit recognition and semantic segmentation tasks, our method learns models that improve performance across a range of a priori unknown target domains. |
Tasks | Data Augmentation, Semantic Segmentation |
Published | 2018-05-30 |
URL | http://arxiv.org/abs/1805.12018v2 |
PDF | http://arxiv.org/pdf/1805.12018v2.pdf |
PWC | https://paperswithcode.com/paper/generalizing-to-unseen-domains-via |
Repo | https://github.com/ricvolpi/generalize-unseen-domains |
Framework | tf |
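A rough PyTorch sketch of the adversarial augmentation loop the abstract describes: perturb source examples to maximise the current loss while penalising distance to the original, then append the resulting "fictitious" examples to the training set. For brevity the distance penalty here is in input space, whereas the paper's worst-case formulation is in feature space; step counts and coefficients are illustrative.

```python
# Minimal sketch of adversarial data augmentation (simplified from the paper):
# inner gradient ascent on the loss minus a proximity penalty, then append the
# generated examples to the batch.
import torch
import torch.nn.functional as F

def augment(model, x, y, steps=15, lr=1.0, gamma=1.0):
    x_adv = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x_adv), y) - gamma * ((x_adv - x) ** 2).mean()
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv += lr * grad                     # gradient *ascent* on the loss
    return x_adv.detach()

# Usage inside training (model, loader, optimizer assumed to exist):
# for x, y in loader:
#     x_new = augment(model, x, y)
#     x, y = torch.cat([x, x_new]), torch.cat([y, y])
#     optimizer.zero_grad(); F.cross_entropy(model(x), y).backward(); optimizer.step()
```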
e-SNLI: Natural Language Inference with Natural Language Explanations
Title | e-SNLI: Natural Language Inference with Natural Language Explanations |
Authors | Oana-Maria Camburu, Tim Rocktäschel, Thomas Lukasiewicz, Phil Blunsom |
Abstract | In order for machine learning to garner widespread public adoption, models must be able to provide interpretable and robust explanations for their decisions, as well as learn from human-provided explanations at train time. In this work, we extend the Stanford Natural Language Inference dataset with an additional layer of human-annotated natural language explanations of the entailment relations. We further implement models that incorporate these explanations into their training process and output them at test time. We show how our corpus of explanations, which we call e-SNLI, can be used for various goals, such as obtaining full sentence justifications of a model’s decisions, improving universal sentence representations and transferring to out-of-domain NLI datasets. Our dataset thus opens up a range of research directions for using natural language explanations, both for improving models and for asserting their trust. |
Tasks | Natural Language Inference |
Published | 2018-12-04 |
URL | http://arxiv.org/abs/1812.01193v2 |
PDF | http://arxiv.org/pdf/1812.01193v2.pdf |
PWC | https://paperswithcode.com/paper/e-snli-natural-language-inference-with |
Repo | https://github.com/OanaMariaCamburu/e-SNLI |
Framework | pytorch |
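The "predict and explain" setup the abstract describes can be illustrated with a toy PyTorch model: one encoder for the premise/hypothesis pair, a classification head for the entailment label, and a decoder that generates a free-text explanation. The sizes, the LSTM architecture, and the joint loss weighting are assumptions for illustration, not the authors' models.

```python
# Illustrative sketch: a model that outputs both an NLI label and an
# explanation sequence, trained with a joint objective.
import torch
import torch.nn as nn

class ExplainNLI(nn.Module):
    def __init__(self, vocab_size, dim=256, n_labels=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.classify = nn.Linear(dim, n_labels)    # entailment / neutral / contradiction
        self.decoder = nn.LSTM(dim, dim, batch_first=True)
        self.generate = nn.Linear(dim, vocab_size)  # explanation tokens

    def forward(self, pair_tokens, expl_tokens):
        _, (h, c) = self.encoder(self.embed(pair_tokens))
        label_logits = self.classify(h[-1])
        # Teacher-forced explanation decoding, conditioned on the encoder state.
        dec_out, _ = self.decoder(self.embed(expl_tokens), (h, c))
        return label_logits, self.generate(dec_out)

# Joint objective (conceptually): label cross-entropy + explanation token
# cross-entropy, e.g.
# loss = ce(label_logits, labels) + ce(expl_logits.flatten(0, 1), expl_targets.flatten())
```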
Mode Normalization
Title | Mode Normalization |
Authors | Lucas Deecke, Iain Murray, Hakan Bilen |
Abstract | Normalization methods are a central building block in the deep learning toolbox. They accelerate and stabilize training, while decreasing the dependence on manually tuned learning rate schedules. When learning from multi-modal distributions, the effectiveness of batch normalization (BN), arguably the most prominent normalization method, is reduced. As a remedy, we propose a more flexible approach: by extending the normalization to more than a single mean and variance, we detect modes of data on-the-fly, jointly normalizing samples that share common features. We demonstrate that our method outperforms BN and other widely used normalization techniques in several experiments, including single and multi-task datasets. |
Tasks | |
Published | 2018-10-12 |
URL | http://arxiv.org/abs/1810.05466v1 |
PDF | http://arxiv.org/pdf/1810.05466v1.pdf |
PWC | https://paperswithcode.com/paper/mode-normalization |
Repo | https://github.com/philipperemy/mode-normalization |
Framework | none |
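The idea of normalizing with more than one mean and variance can be sketched for 2-D (batch, features) inputs: a small gating network softly assigns each sample to one of K modes, per-mode statistics are computed from the weighted batch, and each sample is normalised by the mixture of its modes' statistics. Running statistics, convolutional inputs, and affine parameters are omitted; this is a simplification, not the paper's exact layer.

```python
# Simplified mode-normalization-style layer for fully connected activations.
import torch
import torch.nn as nn

class SimpleModeNorm(nn.Module):
    def __init__(self, num_features, num_modes=2, eps=1e-5):
        super().__init__()
        self.gate = nn.Linear(num_features, num_modes)
        self.eps = eps

    def forward(self, x):                        # x: (batch, features)
        g = torch.softmax(self.gate(x), dim=1)   # (batch, K) soft mode assignments
        out = torch.zeros_like(x)
        for k in range(g.shape[1]):
            w = g[:, k:k + 1]                    # (batch, 1) weights for mode k
            w_sum = w.sum() + self.eps
            mean = (w * x).sum(0, keepdim=True) / w_sum
            var = (w * (x - mean) ** 2).sum(0, keepdim=True) / w_sum
            out = out + w * (x - mean) / torch.sqrt(var + self.eps)
        return out

x = torch.randn(32, 16)
print(SimpleModeNorm(16)(x).shape)               # torch.Size([32, 16])
```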
Unsupervised Abstractive Sentence Summarization using Length Controlled Variational Autoencoder
Title | Unsupervised Abstractive Sentence Summarization using Length Controlled Variational Autoencoder |
Authors | Raphael Schumann |
Abstract | In this work we present an unsupervised approach to summarizing sentences abstractively using a Variational Autoencoder (VAE). VAEs are known to learn a semantically rich latent variable representing high-dimensional input, and are trained by learning to reconstruct the input from this probabilistic latent variable. Explicitly providing information about the output length during training discourages the VAE from encoding it in the latent variable, so the length can be manipulated at inference time. Instructing the decoder to produce a shorter output sequence leads to expressing the input sentence with fewer words. We show on different summarization datasets that these shorter sentences cannot beat a simple baseline, but yield higher ROUGE scores than attempting to reconstruct the whole sentence. |
Tasks | Abstractive Sentence Summarization |
Published | 2018-09-14 |
URL | http://arxiv.org/abs/1809.05233v2 |
PDF | http://arxiv.org/pdf/1809.05233v2.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-abstractive-sentence |
Repo | https://github.com/raphael-sch/SumVAE |
Framework | tf |
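The length-conditioning trick is the key mechanism: the target length is fed to the decoder so the latent variable need not encode it, and a shorter length can be requested at inference time. Below is a hedged sketch of one way to do this, concatenating a learned length embedding to every decoder input; the VAE encoder, KL term, and all sizes are omitted or illustrative rather than the paper's configuration.

```python
# Sketch of a length-conditioned decoder (illustrative, not the paper's model).
import torch
import torch.nn as nn

class LengthConditionedDecoder(nn.Module):
    def __init__(self, vocab_size, dim=256, max_len=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.len_embed = nn.Embedding(max_len, dim)
        self.rnn = nn.LSTM(2 * dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, tokens, target_len, state=None):
        # tokens: (batch, steps); target_len: (batch,) desired output length.
        tok = self.embed(tokens)
        length = self.len_embed(target_len).unsqueeze(1).expand(-1, tokens.size(1), -1)
        h, state = self.rnn(torch.cat([tok, length], dim=-1), state)
        return self.out(h), state

dec = LengthConditionedDecoder(vocab_size=1000)
logits, _ = dec(torch.randint(0, 1000, (2, 7)), torch.tensor([5, 12]))
print(logits.shape)     # torch.Size([2, 7, 1000])
# At inference, pass a smaller target_len than the source length to push the
# decoder toward a shorter, summary-like reconstruction.
```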
Piggyback: Adapting a Single Network to Multiple Tasks by Learning to Mask Weights
Title | Piggyback: Adapting a Single Network to Multiple Tasks by Learning to Mask Weights |
Authors | Arun Mallya, Dillon Davis, Svetlana Lazebnik |
Abstract | This work presents a method for adapting a single, fixed deep neural network to multiple tasks without affecting performance on already learned tasks. By building upon ideas from network quantization and pruning, we learn binary masks that piggyback on an existing network, or are applied to unmodified weights of that network to provide good performance on a new task. These masks are learned in an end-to-end differentiable fashion, and incur a low overhead of 1 bit per network parameter, per task. Even though the underlying network is fixed, the ability to mask individual weights allows for the learning of a large number of filters. We show performance comparable to dedicated fine-tuned networks for a variety of classification tasks, including those with large domain shifts from the initial task (ImageNet), and a variety of network architectures. Unlike prior work, we do not suffer from catastrophic forgetting or competition between tasks, and our performance is agnostic to task ordering. Code available at https://github.com/arunmallya/piggyback. |
Tasks | Quantization |
Published | 2018-01-19 |
URL | http://arxiv.org/abs/1801.06519v2 |
PDF | http://arxiv.org/pdf/1801.06519v2.pdf |
PWC | https://paperswithcode.com/paper/piggyback-adapting-a-single-network-to |
Repo | https://github.com/ivclab/CPG |
Framework | pytorch |
Shampoo: Preconditioned Stochastic Tensor Optimization
Title | Shampoo: Preconditioned Stochastic Tensor Optimization |
Authors | Vineet Gupta, Tomer Koren, Yoram Singer |
Abstract | Preconditioned gradient methods are among the most general and powerful tools in optimization. However, preconditioning requires storing and manipulating prohibitively large matrices. We describe and analyze a new structure-aware preconditioning algorithm, called Shampoo, for stochastic optimization over tensor spaces. Shampoo maintains a set of preconditioning matrices, each of which operates on a single dimension, contracting over the remaining dimensions. We establish convergence guarantees in the stochastic convex setting, the proof of which builds upon matrix trace inequalities. Our experiments with state-of-the-art deep learning models show that Shampoo is capable of converging considerably faster than commonly used optimizers. Although it involves a more complex update rule, Shampoo’s runtime per step is comparable to that of simple gradient methods such as SGD, AdaGrad, and Adam. |
Tasks | Stochastic Optimization |
Published | 2018-02-26 |
URL | http://arxiv.org/abs/1802.09568v2 |
PDF | http://arxiv.org/pdf/1802.09568v2.pdf |
PWC | https://paperswithcode.com/paper/shampoo-preconditioned-stochastic-tensor |
Repo | https://github.com/Daniil-Selikhanovych/Shampoo_optimizer |
Framework | tf |
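For a matrix-shaped parameter, the per-dimension preconditioning the abstract describes amounts to accumulating left and right statistics from the gradients and applying their inverse fourth roots on each side. The NumPy sketch below shows that update; the epsilon, learning rate, and the restriction to order-2 tensors are simplifications of the full algorithm.

```python
# NumPy sketch of the Shampoo update for a single matrix parameter.
import numpy as np

def inv_root(mat, p):
    # Symmetric PSD matrix^(-1/p) via eigendecomposition.
    w, v = np.linalg.eigh(mat)
    return (v * np.clip(w, 1e-12, None) ** (-1.0 / p)) @ v.T

def shampoo_step(W, G, L, R, lr=0.1, eps=1e-4):
    L += G @ G.T          # left preconditioner statistics  (m x m)
    R += G.T @ G          # right preconditioner statistics (n x n)
    W -= lr * inv_root(L + eps * np.eye(L.shape[0]), 4) @ G @ inv_root(R + eps * np.eye(R.shape[0]), 4)
    return W, L, R

m, n = 5, 3
W, L, R = np.random.randn(m, n), np.zeros((m, m)), np.zeros((n, n))
G = np.random.randn(m, n)          # stand-in for a stochastic gradient
W, L, R = shampoo_step(W, G, L, R)
```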
Accelerating Natural Gradient with Higher-Order Invariance
Title | Accelerating Natural Gradient with Higher-Order Invariance |
Authors | Yang Song, Jiaming Song, Stefano Ermon |
Abstract | An appealing property of the natural gradient is that it is invariant to arbitrary differentiable reparameterizations of the model. However, this invariance property requires infinitesimal steps and is lost in practical implementations with small but finite step sizes. In this paper, we study invariance properties from a combined perspective of Riemannian geometry and numerical differential equation solving. We define the order of invariance of a numerical method to be its convergence order to an invariant solution. We propose to use higher-order integrators and geodesic corrections to obtain more invariant optimization trajectories. We prove the numerical convergence properties of geodesic corrected updates and show that they can be as computationally efficient as plain natural gradient. Experimentally, we demonstrate that invariance leads to faster optimization and our techniques improve on traditional natural gradient in deep neural network training and natural policy gradient for reinforcement learning. |
Tasks | |
Published | 2018-03-04 |
URL | http://arxiv.org/abs/1803.01273v2 |
PDF | http://arxiv.org/pdf/1803.01273v2.pdf |
PWC | https://paperswithcode.com/paper/accelerating-natural-gradient-with-higher |
Repo | https://github.com/ferrine/torch_anatgrad |
Framework | pytorch |
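For reference, the plain (first-order) natural-gradient step that the paper's higher-order integrators and geodesic corrections build on preconditions the gradient with the inverse Fisher information. The sketch below shows only that baseline update; `grad` and `fisher` are stand-ins for model-specific computations, and the damping is illustrative.

```python
# Baseline natural-gradient step: theta <- theta - lr * F^{-1} grad.
import numpy as np

def natural_gradient_step(theta, grad, fisher, lr=0.1, damping=1e-4):
    F = fisher + damping * np.eye(len(theta))      # damped Fisher information matrix
    return theta - lr * np.linalg.solve(F, grad)

theta = np.zeros(3)
grad = np.array([1.0, -2.0, 0.5])                  # stand-in gradient
fisher = np.diag([4.0, 1.0, 0.25])                 # stand-in Fisher matrix
print(natural_gradient_step(theta, grad, fisher))  # larger steps along low-curvature directions
```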
Mesh-TensorFlow: Deep Learning for Supercomputers
Title | Mesh-TensorFlow: Deep Learning for Supercomputers |
Authors | Noam Shazeer, Youlong Cheng, Niki Parmar, Dustin Tran, Ashish Vaswani, Penporn Koanantakool, Peter Hawkins, HyoukJoong Lee, Mingsheng Hong, Cliff Young, Ryan Sepassi, Blake Hechtman |
Abstract | Batch-splitting (data-parallelism) is the dominant distributed Deep Neural Network (DNN) training strategy, due to its universal applicability and its amenability to Single-Program-Multiple-Data (SPMD) programming. However, batch-splitting suffers from problems including the inability to train very large models (due to memory constraints), high latency, and inefficiency at small batch sizes. All of these can be solved by more general distribution strategies (model-parallelism). Unfortunately, efficient model-parallel algorithms tend to be complicated to discover, describe, and to implement, particularly on large clusters. We introduce Mesh-TensorFlow, a language for specifying a general class of distributed tensor computations. Where data-parallelism can be viewed as splitting tensors and operations along the “batch” dimension, in Mesh-TensorFlow, the user can specify any tensor-dimensions to be split across any dimensions of a multi-dimensional mesh of processors. A Mesh-TensorFlow graph compiles into a SPMD program consisting of parallel operations coupled with collective communication primitives such as Allreduce. We use Mesh-TensorFlow to implement an efficient data-parallel, model-parallel version of the Transformer sequence-to-sequence model. Using TPU meshes of up to 512 cores, we train Transformer models with up to 5 billion parameters, surpassing state of the art results on WMT’14 English-to-French translation task and the one-billion-word language modeling benchmark. Mesh-Tensorflow is available at https://github.com/tensorflow/mesh . |
Tasks | Language Modelling |
Published | 2018-11-05 |
URL | http://arxiv.org/abs/1811.02084v1 |
PDF | http://arxiv.org/pdf/1811.02084v1.pdf |
PWC | https://paperswithcode.com/paper/mesh-tensorflow-deep-learning-for |
Repo | https://github.com/tensorflow/mesh |
Framework | tf |
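The arithmetic behind splitting a tensor dimension across a mesh can be illustrated without the Mesh-TensorFlow API: a matmul whose inner ("hidden") dimension is sharded across simulated processors, with the partial products combined by an Allreduce-style sum. Mesh-TensorFlow expresses this declaratively and compiles it into a real SPMD program; the NumPy snippet below is only the conceptual picture.

```python
# Simulated model-parallel matmul: shard the inner dimension, sum the partials.
import numpy as np

batch, d_in, d_out, n_procs = 8, 16, 4, 4
x = np.random.randn(batch, d_in)
w = np.random.randn(d_in, d_out)

# Split the d_in dimension across "processors": each holds a slice of x and w.
x_shards = np.split(x, n_procs, axis=1)
w_shards = np.split(w, n_procs, axis=0)

# Each processor computes a partial product; an Allreduce sums them.
partials = [xs @ ws for xs, ws in zip(x_shards, w_shards)]
y = np.sum(partials, axis=0)

assert np.allclose(y, x @ w)   # identical to the unsplit computation
```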
Memory Fusion Network for Multi-view Sequential Learning
Title | Memory Fusion Network for Multi-view Sequential Learning |
Authors | Amir Zadeh, Paul Pu Liang, Navonil Mazumder, Soujanya Poria, Erik Cambria, Louis-Philippe Morency |
Abstract | Multi-view sequential learning is a fundamental problem in machine learning dealing with multi-view sequences. In a multi-view sequence, there exist two forms of interactions between different views: view-specific interactions and cross-view interactions. In this paper, we present a new neural architecture for multi-view sequential learning called the Memory Fusion Network (MFN) that explicitly accounts for both interactions in a neural architecture and continuously models them through time. The first component of the MFN is called the System of LSTMs, where view-specific interactions are learned in isolation through assigning an LSTM function to each view. The cross-view interactions are then identified using a special attention mechanism called the Delta-memory Attention Network (DMAN) and summarized through time with a Multi-view Gated Memory. Through extensive experimentation, MFN is compared to various proposed approaches for multi-view sequential learning on multiple publicly available benchmark datasets. MFN outperforms all the existing multi-view approaches. Furthermore, MFN outperforms all current state-of-the-art models, setting new state-of-the-art results for these multi-view datasets. |
Tasks | |
Published | 2018-02-03 |
URL | http://arxiv.org/abs/1802.00927v1 |
PDF | http://arxiv.org/pdf/1802.00927v1.pdf |
PWC | https://paperswithcode.com/paper/memory-fusion-network-for-multi-view |
Repo | https://github.com/pliang279/MFN |
Framework | pytorch |
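A highly simplified PyTorch sketch of the architecture: one LSTM cell per view (the "System of LSTMs"), an attention over the concatenated previous and current hidden states standing in for the Delta-memory Attention Network, and a gated memory updated at every step. Dimensions, the gating form, and the attention parameterisation are illustrative, not the published MFN.

```python
# Toy multi-view fusion model in the spirit of MFN (simplified).
import torch
import torch.nn as nn

class TinyMFN(nn.Module):
    def __init__(self, view_dims, hidden=32, mem=64):
        super().__init__()
        self.cells = nn.ModuleList(nn.LSTMCell(d, hidden) for d in view_dims)
        total = 2 * hidden * len(view_dims)        # [h_{t-1}; h_t] over all views
        self.attend = nn.Sequential(nn.Linear(total, total), nn.Softmax(dim=1))
        self.propose = nn.Linear(total, mem)
        self.gate = nn.Linear(total, mem)

    def forward(self, views):                      # views: list of (batch, T, d_v)
        batch, T = views[0].shape[0], views[0].shape[1]
        hidden = self.cells[0].hidden_size
        states = [(torch.zeros(batch, hidden), torch.zeros(batch, hidden)) for _ in self.cells]
        memory = torch.zeros(batch, self.propose.out_features)
        for t in range(T):
            prev = torch.cat([h for h, _ in states], dim=1)
            states = [cell(v[:, t], s) for cell, v, s in zip(self.cells, views, states)]
            cur = torch.cat([h for h, _ in states], dim=1)
            both = torch.cat([prev, cur], dim=1)
            delta = self.attend(both) * both       # attention-weighted cross-view summary
            gate = torch.sigmoid(self.gate(delta))
            memory = gate * memory + (1 - gate) * torch.tanh(self.propose(delta))
        return memory, cur

views = [torch.randn(4, 10, 20), torch.randn(4, 10, 5), torch.randn(4, 10, 35)]
mem, h = TinyMFN([20, 5, 35])(views)
print(mem.shape, h.shape)                          # torch.Size([4, 64]) torch.Size([4, 96])
```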
Automated learning with a probabilistic programming language: Birch
Title | Automated learning with a probabilistic programming language: Birch |
Authors | Lawrence M. Murray, Thomas B. Schön |
Abstract | This work offers a broad perspective on probabilistic modeling and inference in light of recent advances in probabilistic programming, in which models are formally expressed in Turing-complete programming languages. We consider a typical workflow and how probabilistic programming languages can help to automate this workflow, especially in the matching of models with inference methods. We focus on two properties of a model that are critical in this matching: its structure—the conditional dependencies between random variables—and its form—the precise mathematical definition of those dependencies. While the structure and form of a probabilistic model are often fixed a priori, it is a curiosity of probabilistic programming that they need not be, and may instead vary according to random choices made during program execution. We introduce a formal description of models expressed as programs, and discuss some of the ways in which probabilistic programming languages can reveal the structure and form of these, in order to tailor inference methods. We demonstrate the ideas with a new probabilistic programming language called Birch, with a multiple object tracking example. |
Tasks | Multiple Object Tracking, Object Tracking, Probabilistic Programming |
Published | 2018-10-02 |
URL | http://arxiv.org/abs/1810.01539v1 |
PDF | http://arxiv.org/pdf/1810.01539v1.pdf |
PWC | https://paperswithcode.com/paper/automated-learning-with-a-probabilistic |
Repo | https://github.com/lawmurray/MultiObjectTracking |
Framework | none |
Psychlab: A Psychology Laboratory for Deep Reinforcement Learning Agents
Title | Psychlab: A Psychology Laboratory for Deep Reinforcement Learning Agents |
Authors | Joel Z. Leibo, Cyprien de Masson d’Autume, Daniel Zoran, David Amos, Charles Beattie, Keith Anderson, Antonio García Castañeda, Manuel Sanchez, Simon Green, Audrunas Gruslys, Shane Legg, Demis Hassabis, Matthew M. Botvinick |
Abstract | Psychlab is a simulated psychology laboratory inside the first-person 3D game world of DeepMind Lab (Beattie et al. 2016). Psychlab enables implementations of classical laboratory psychological experiments so that they work with both human and artificial agents. Psychlab has a simple and flexible API that enables users to easily create their own tasks. As examples, we are releasing Psychlab implementations of several classical experimental paradigms including visual search, change detection, random dot motion discrimination, and multiple object tracking. We also contribute a study of the visual psychophysics of a specific state-of-the-art deep reinforcement learning agent: UNREAL (Jaderberg et al. 2016). This study leads to the surprising conclusion that UNREAL learns more quickly about larger target stimuli than it does about smaller stimuli. In turn, this insight motivates a specific improvement in the form of a simple model of foveal vision that turns out to significantly boost UNREAL’s performance, both on Psychlab tasks, and on standard DeepMind Lab tasks. By open-sourcing Psychlab we hope to facilitate a range of future such studies that simultaneously advance deep reinforcement learning and improve its links with cognitive science. |
Tasks | Multiple Object Tracking, Object Tracking |
Published | 2018-01-24 |
URL | http://arxiv.org/abs/1801.08116v2 |
PDF | http://arxiv.org/pdf/1801.08116v2.pdf |
PWC | https://paperswithcode.com/paper/psychlab-a-psychology-laboratory-for-deep |
Repo | https://github.com/susumuota/gym-oculoenv |
Framework | none |
Sparse and Constrained Attention for Neural Machine Translation
Title | Sparse and Constrained Attention for Neural Machine Translation |
Authors | Chaitanya Malaviya, Pedro Ferreira, André F. T. Martins |
Abstract | In NMT, words are sometimes dropped from the source or generated repeatedly in the translation. We explore novel strategies to address the coverage problem that change only the attention transformation. Our approach allocates fertilities to source words, used to bound the attention each word can receive. We experiment with various sparse and constrained attention transformations and propose a new one, constrained sparsemax, shown to be differentiable and sparse. Empirical evaluation is provided in three language pairs. |
Tasks | Machine Translation |
Published | 2018-05-21 |
URL | http://arxiv.org/abs/1805.08241v1 |
PDF | http://arxiv.org/pdf/1805.08241v1.pdf |
PWC | https://paperswithcode.com/paper/sparse-and-constrained-attention-for-neural |
Repo | https://github.com/Unbabel/sparse_constrained_attention |
Framework | pytorch |
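The building block behind the constrained sparsemax attention mentioned above is sparsemax itself: like softmax it maps scores to a probability distribution, but it can assign exactly zero weight to some source words. The NumPy sketch below implements plain sparsemax; the fertility-based upper bounds of constrained sparsemax are not included.

```python
# Sparsemax (Martins & Astudillo): project scores onto the probability simplex
# with a threshold, yielding a sparse attention distribution.
import numpy as np

def sparsemax(z):
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = k * z_sorted > cumsum - 1       # indices kept in the support
    k_z = k[support][-1]
    tau = (cumsum[support][-1] - 1) / k_z     # threshold subtracted from all scores
    return np.maximum(z - tau, 0.0)

scores = np.array([2.0, 1.2, 0.1, -1.0])
print(sparsemax(scores))                       # [0.9 0.1 0.  0. ] -- exact zeros
print(sparsemax(scores).sum())                 # 1.0 (a valid attention distribution)
```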