Paper Group AWR 308
Automatic Recognition of Student Engagement using Deep Learning and Facial Expression. Ensemble learning with 3D convolutional neural networks for connectome-based prediction. A Comparison of Machine Learning Algorithms for the Surveillance of Autism Spectrum Disorder. Generalizing to Unseen Domains via Adversarial Data Augmentation. e-SNLI: Natural Language Inference with Natural Language Explanations. Mode Normalization. Unsupervised Abstractive Sentence Summarization using Length Controlled Variational Autoencoder. Piggyback: Adapting a Single Network to Multiple Tasks by Learning to Mask Weights. Shampoo: Preconditioned Stochastic Tensor Optimization. Accelerating Natural Gradient with Higher-Order Invariance. Mesh-TensorFlow: Deep Learning for Supercomputers. Memory Fusion Network for Multi-view Sequential Learning. Automated learning with a probabilistic programming language: Birch. Psychlab: A Psychology Laboratory for Deep Reinforcement Learning Agents. Sparse and Constrained Attention for Neural Machine Translation.
Automatic Recognition of Student Engagement using Deep Learning and Facial Expression
Title | Automatic Recognition of Student Engagement using Deep Learning and Facial Expression |
Authors | Omid Mohamad Nezami, Mark Dras, Len Hamey, Deborah Richards, Stephen Wan, Cecile Paris |
Abstract | Engagement is a key indicator of the quality of learning experience, and one that plays a major role in developing intelligent educational interfaces. Any such interface requires the ability to recognise the level of engagement in order to respond appropriately; however, there is very little existing data to learn from, and new data is expensive and difficult to acquire. This paper presents a deep learning model to improve engagement recognition from images that overcomes the data sparsity challenge by pre-training on readily available basic facial expression data, before training on specialised engagement data. In the first of two steps, a facial expression recognition model is trained to provide a rich face representation using deep learning. In the second step, we use the model’s weights to initialize our deep learning based model to recognize engagement; we term this the engagement model. We train the model on our new engagement recognition dataset with 4627 engaged and disengaged samples. We find that the engagement model outperforms effective deep learning architectures that we apply for the first time to engagement recognition, as well as approaches using histogram of oriented gradients and support vector machines. |
Tasks | Facial Expression Recognition |
Published | 2018-08-07 |
URL | https://arxiv.org/abs/1808.02324v5 |
PDF | https://arxiv.org/pdf/1808.02324v5.pdf |
PWC | https://paperswithcode.com/paper/engagement-recognition-using-deep-learning |
Repo | https://github.com/omidnezami/Engagement-Recognition |
Framework | tf |
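The two-step recipe in the abstract (pre-train on basic facial-expression labels, then reuse the weights to initialise the engagement classifier) can be sketched in Keras. This is a minimal sketch, not the authors' architecture: the layer sizes, the 48x48 grayscale input, and the commented-out dataset variables are illustrative assumptions.

```python
# Minimal sketch of the two-step transfer scheme described above (not the
# authors' exact architecture): pre-train a CNN on facial-expression labels,
# then reuse its convolutional base to initialise an engagement classifier.
from tensorflow.keras import layers, models

def conv_base(input_shape=(48, 48, 1)):
    # Shared face-representation backbone (sizes are illustrative).
    return models.Sequential([
        layers.Conv2D(32, 3, activation="relu", input_shape=input_shape),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
    ])

# Step 1: facial expression recognition (e.g. 7 basic expressions).
expr_model = models.Sequential([conv_base(), layers.Dense(7, activation="softmax")])
expr_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# expr_model.fit(expression_images, expression_labels, epochs=...)

# Step 2: engagement model initialised from the pre-trained backbone.
backbone = expr_model.layers[0]   # reuse the learned face representation
engagement_model = models.Sequential([backbone, layers.Dense(2, activation="softmax")])
engagement_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# engagement_model.fit(engagement_images, engagement_labels, epochs=...)
```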
Ensemble learning with 3D convolutional neural networks for connectome-based prediction
Title | Ensemble learning with 3D convolutional neural networks for connectome-based prediction |
Authors | Meenakshi Khosla, Keith Jamison, Amy Kuceyeski, Mert R. Sabuncu |
Abstract | The specificity and sensitivity of resting state functional MRI (rs-fMRI) measurements depend on pre-processing choices, such as the parcellation scheme used to define regions of interest (ROIs). In this study, we critically evaluate the effect of brain parcellations on machine learning models applied to rs-fMRI data. Our experiments reveal a remarkable trend: On average, models with stochastic parcellations consistently perform as well as models with widely used atlases at the same spatial scale. We thus propose an ensemble learning strategy to combine the predictions from models trained on connectivity data extracted using different (e.g., stochastic) parcellations. We further present an implementation of our ensemble learning strategy with a novel 3D Convolutional Neural Network (CNN) approach. The proposed CNN approach takes advantage of the full-resolution 3D spatial structure of rs-fMRI data and fits non-linear predictive models. Our ensemble CNN framework overcomes the limitations of traditional machine learning models for connectomes that often rely on region-based summary statistics and/or linear models. We showcase our approach on a classification (autism patients versus healthy controls) and a regression problem (prediction of subject’s age), and report promising results. |
Tasks | |
Published | 2018-09-11 |
URL | https://arxiv.org/abs/1809.06219v2 |
PDF | https://arxiv.org/pdf/1809.06219v2.pdf |
PWC | https://paperswithcode.com/paper/ensemble-learning-with-3d-convolutional |
Repo | https://github.com/mk2299/Ensemble3DCNN_connectomes |
Framework | tf |
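The ensemble strategy described in the abstract, with one model per parcellation and their predictions combined, reduces to a simple averaging scheme. The sketch below uses logistic regressions over random stand-in features in place of the paper's 3D CNNs over connectivity matrices; it only illustrates the combination step.

```python
# Sketch of the ensemble idea above: train one model per parcellation of the
# same rs-fMRI data and average their predicted probabilities. The logistic
# regressions stand in for the paper's 3D CNN members; the random features
# simulate connectivity data extracted under different parcellations.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_subjects, n_parcellations = 200, 5
y = rng.integers(0, 2, size=n_subjects)           # e.g. autism vs. control

# One feature matrix per parcellation (here: random stand-ins).
views = [rng.normal(size=(n_subjects, 100)) for _ in range(n_parcellations)]

members = [LogisticRegression(max_iter=1000).fit(X, y) for X in views]

# Ensemble prediction: average the members' class probabilities.
proba = np.mean([m.predict_proba(X)[:, 1] for m, X in zip(members, views)], axis=0)
y_hat = (proba >= 0.5).astype(int)
print("training accuracy of the ensemble:", (y_hat == y).mean())
```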
A Comparison of Machine Learning Algorithms for the Surveillance of Autism Spectrum Disorder
Title | A Comparison of Machine Learning Algorithms for the Surveillance of Autism Spectrum Disorder |
Authors | Scott H Lee, Matthew J Maenner, Charles M Heilig |
Abstract | The Centers for Disease Control and Prevention (CDC) coordinates a labor-intensive process to measure the prevalence of autism spectrum disorder (ASD) among children in the United States. Random forests methods have shown promise in speeding up this process, but they lag behind human classification accuracy by about 5%. We explore whether more recently available document classification algorithms can close this gap. We applied 8 supervised learning algorithms to predict whether children meet the case definition for ASD based solely on the words in their evaluations. We compared the algorithms’ performance across 10 random train-test splits of the data, using classification accuracy, F1 score, and number of positive calls to evaluate their potential use for surveillance. Across the 10 train-test cycles, the random forest and support vector machine with Naive Bayes features (NB-SVM) each achieved slightly more than 87% mean accuracy. The NB-SVM produced significantly more false negatives than false positives (P = 0.027), but the random forest did not, making its prevalence estimates very close to the true prevalence in the data. The best-performing neural network performed similarly to the random forest on both measures. The random forest performed as well as more recently available models like the NB-SVM and the neural network, and it also produced good prevalence estimates. NB-SVM may not be a good candidate for use in a fully-automated surveillance workflow due to increased false negatives. More sophisticated algorithms, like hierarchical convolutional neural networks, may not be feasible to train due to characteristics of the data. Current algorithms might perform better if the data are abstracted and processed differently and if they take into account information about the children in addition to their evaluations. |
Tasks | Document Classification |
Published | 2018-04-17 |
URL | http://arxiv.org/abs/1804.06223v3 |
PDF | http://arxiv.org/pdf/1804.06223v3.pdf |
PWC | https://paperswithcode.com/paper/a-comparison-of-machine-learning-algorithms |
Repo | https://github.com/scotthlee/autism_classification |
Framework | none |
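The core setup the abstract compares, text from children's evaluations classified by a random forest and scored with accuracy and F1 over random train-test splits, can be sketched with scikit-learn. The texts, labels, and hyperparameters below are placeholders, not the CDC surveillance records or the paper's tuned models.

```python
# Hedged sketch of the document-classification setup described above:
# bag-of-words features from free-text evaluations, a random-forest
# classifier, and accuracy/F1 over a random train-test split.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

texts = ["child shows repetitive behaviors ...", "typical development noted ..."] * 50
labels = [1, 0] * 50                               # 1 = meets the ASD case definition

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=0)
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    RandomForestClassifier(n_estimators=500, random_state=0))
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred), "F1:", f1_score(y_test, pred))
```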
Generalizing to Unseen Domains via Adversarial Data Augmentation
Title | Generalizing to Unseen Domains via Adversarial Data Augmentation |
Authors | Riccardo Volpi, Hongseok Namkoong, Ozan Sener, John Duchi, Vittorio Murino, Silvio Savarese |
Abstract | We are concerned with learning models that generalize well to different unseen domains. We consider a worst-case formulation over data distributions that are near the source domain in the feature space. Only using training data from a single source distribution, we propose an iterative procedure that augments the dataset with examples from a fictitious target domain that is “hard” under the current model. We show that our iterative scheme is an adaptive data augmentation method where we append adversarial examples at each iteration. For softmax losses, we show that our method is a data-dependent regularization scheme that behaves differently from classical regularizers that regularize towards zero (e.g., ridge or lasso). On digit recognition and semantic segmentation tasks, our method learns models that improve performance across a range of a priori unknown target domains. |
Tasks | Data Augmentation, Semantic Segmentation |
Published | 2018-05-30 |
URL | http://arxiv.org/abs/1805.12018v2 |
PDF | http://arxiv.org/pdf/1805.12018v2.pdf |
PWC | https://paperswithcode.com/paper/generalizing-to-unseen-domains-via |
Repo | https://github.com/ricvolpi/generalize-unseen-domains |
Framework | tf |
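A rough PyTorch sketch of the adversarial augmentation loop the abstract describes: perturb source examples to maximise the current loss while penalising distance to the original, then append the resulting "fictitious" examples to the training set. For brevity the distance penalty here is in input space, whereas the paper's worst-case formulation is in feature space; step counts and coefficients are illustrative.

```python
# Minimal sketch of adversarial data augmentation (simplified from the paper):
# inner gradient ascent on the loss minus a proximity penalty, then append the
# generated examples to the batch.
import torch
import torch.nn.functional as F

def augment(model, x, y, steps=15, lr=1.0, gamma=1.0):
    x_adv = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x_adv), y) - gamma * ((x_adv - x) ** 2).mean()
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv += lr * grad                     # gradient *ascent* on the loss
    return x_adv.detach()

# Usage inside training (model, loader, optimizer assumed to exist):
# for x, y in loader:
#     x_new = augment(model, x, y)
#     x, y = torch.cat([x, x_new]), torch.cat([y, y])
#     optimizer.zero_grad(); F.cross_entropy(model(x), y).backward(); optimizer.step()
```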
e-SNLI: Natural Language Inference with Natural Language Explanations
Title | e-SNLI: Natural Language Inference with Natural Language Explanations |
Authors | Oana-Maria Camburu, Tim Rocktäschel, Thomas Lukasiewicz, Phil Blunsom |
Abstract | In order for machine learning to garner widespread public adoption, models must be able to provide interpretable and robust explanations for their decisions, as well as learn from human-provided explanations at train time. In this work, we extend the Stanford Natural Language Inference dataset with an additional layer of human-annotated natural language explanations of the entailment relations. We further implement models that incorporate these explanations into their training process and output them at test time. We show how our corpus of explanations, which we call e-SNLI, can be used for various goals, such as obtaining full sentence justifications of a model’s decisions, improving universal sentence representations and transferring to out-of-domain NLI datasets. Our dataset thus opens up a range of research directions for using natural language explanations, both for improving models and for asserting their trust. |
Tasks | Natural Language Inference |
Published | 2018-12-04 |
URL | http://arxiv.org/abs/1812.01193v2 |
PDF | http://arxiv.org/pdf/1812.01193v2.pdf |
PWC | https://paperswithcode.com/paper/e-snli-natural-language-inference-with |
Repo | https://github.com/OanaMariaCamburu/e-SNLI |
Framework | pytorch |
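The "predict and explain" setup the abstract describes can be illustrated with a toy PyTorch model: one encoder for the premise/hypothesis pair, a classification head for the entailment label, and a decoder that generates a free-text explanation. The sizes, the LSTM architecture, and the joint loss weighting are assumptions for illustration, not the authors' models.

```python
# Illustrative sketch: a model that outputs both an NLI label and an
# explanation sequence, trained with a joint objective.
import torch
import torch.nn as nn

class ExplainNLI(nn.Module):
    def __init__(self, vocab_size, dim=256, n_labels=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.classify = nn.Linear(dim, n_labels)    # entailment / neutral / contradiction
        self.decoder = nn.LSTM(dim, dim, batch_first=True)
        self.generate = nn.Linear(dim, vocab_size)  # explanation tokens

    def forward(self, pair_tokens, expl_tokens):
        _, (h, c) = self.encoder(self.embed(pair_tokens))
        label_logits = self.classify(h[-1])
        # Teacher-forced explanation decoding, conditioned on the encoder state.
        dec_out, _ = self.decoder(self.embed(expl_tokens), (h, c))
        return label_logits, self.generate(dec_out)

# Joint objective (conceptually): label cross-entropy + explanation token
# cross-entropy, e.g.
# loss = ce(label_logits, labels) + ce(expl_logits.flatten(0, 1), expl_targets.flatten())
```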
Mode Normalization
Title | Mode Normalization |
Authors | Lucas Deecke, Iain Murray, Hakan Bilen |
Abstract | Normalization methods are a central building block in the deep learning toolbox. They accelerate and stabilize training, while decreasing the dependence on manually tuned learning rate schedules. When learning from multi-modal distributions, the effectiveness of batch normalization (BN), arguably the most prominent normalization method, is reduced. As a remedy, we propose a more flexible approach: by extending the normalization to more than a single mean and variance, we detect modes of data on-the-fly, jointly normalizing samples that share common features. We demonstrate that our method outperforms BN and other widely used normalization techniques in several experiments, including single and multi-task datasets. |
Tasks | |
Published | 2018-10-12 |
URL | http://arxiv.org/abs/1810.05466v1 |
PDF | http://arxiv.org/pdf/1810.05466v1.pdf |
PWC | https://paperswithcode.com/paper/mode-normalization |
Repo | https://github.com/philipperemy/mode-normalization |
Framework | none |
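The idea of normalizing with more than one mean and variance can be sketched for 2-D (batch, features) inputs: a small gating network softly assigns each sample to one of K modes, per-mode statistics are computed from the weighted batch, and each sample is normalised by the mixture of its modes' statistics. Running statistics, convolutional inputs, and affine parameters are omitted; this is a simplification, not the paper's exact layer.

```python
# Simplified mode-normalization-style layer for fully connected activations.
import torch
import torch.nn as nn

class SimpleModeNorm(nn.Module):
    def __init__(self, num_features, num_modes=2, eps=1e-5):
        super().__init__()
        self.gate = nn.Linear(num_features, num_modes)
        self.eps = eps

    def forward(self, x):                        # x: (batch, features)
        g = torch.softmax(self.gate(x), dim=1)   # (batch, K) soft mode assignments
        out = torch.zeros_like(x)
        for k in range(g.shape[1]):
            w = g[:, k:k + 1]                    # (batch, 1) weights for mode k
            w_sum = w.sum() + self.eps
            mean = (w * x).sum(0, keepdim=True) / w_sum
            var = (w * (x - mean) ** 2).sum(0, keepdim=True) / w_sum
            out = out + w * (x - mean) / torch.sqrt(var + self.eps)
        return out

x = torch.randn(32, 16)
print(SimpleModeNorm(16)(x).shape)               # torch.Size([32, 16])
```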
Unsupervised Abstractive Sentence Summarization using Length Controlled Variational Autoencoder
Title | Unsupervised Abstractive Sentence Summarization using Length Controlled Variational Autoencoder |
Authors | Raphael Schumann |
Abstract | In this work we present an unsupervised approach to summarizing sentences abstractively using a Variational Autoencoder (VAE). VAEs are known to learn a semantically rich latent variable representing high-dimensional input, and are trained by learning to reconstruct the input from this probabilistic latent variable. Explicitly providing information about the output length during training discourages the VAE from encoding it in the latent variable, so the length can be manipulated at inference time. Instructing the decoder to produce a shorter output sequence leads to expressing the input sentence with fewer words. We show on different summarization datasets that these shorter sentences cannot beat a simple baseline, but yield higher ROUGE scores than attempting to reconstruct the whole sentence. |
Tasks | Abstractive Sentence Summarization |
Published | 2018-09-14 |
URL | http://arxiv.org/abs/1809.05233v2 |
PDF | http://arxiv.org/pdf/1809.05233v2.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-abstractive-sentence |
Repo | https://github.com/raphael-sch/SumVAE |
Framework | tf |
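The length-conditioning trick is the key mechanism: the target length is fed to the decoder so the latent variable need not encode it, and a shorter length can be requested at inference time. Below is a hedged sketch of one way to do this, concatenating a learned length embedding to every decoder input; the VAE encoder, KL term, and all sizes are omitted or illustrative rather than the paper's configuration.

```python
# Sketch of a length-conditioned decoder (illustrative, not the paper's model).
import torch
import torch.nn as nn

class LengthConditionedDecoder(nn.Module):
    def __init__(self, vocab_size, dim=256, max_len=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.len_embed = nn.Embedding(max_len, dim)
        self.rnn = nn.LSTM(2 * dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, tokens, target_len, state=None):
        # tokens: (batch, steps); target_len: (batch,) desired output length.
        tok = self.embed(tokens)
        length = self.len_embed(target_len).unsqueeze(1).expand(-1, tokens.size(1), -1)
        h, state = self.rnn(torch.cat([tok, length], dim=-1), state)
        return self.out(h), state

dec = LengthConditionedDecoder(vocab_size=1000)
logits, _ = dec(torch.randint(0, 1000, (2, 7)), torch.tensor([5, 12]))
print(logits.shape)     # torch.Size([2, 7, 1000])
# At inference, pass a smaller target_len than the source length to push the
# decoder toward a shorter, summary-like reconstruction.
```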
Piggyback: Adapting a Single Network to Multiple Tasks by Learning to Mask Weights
Title | Piggyback: Adapting a Single Network to Multiple Tasks by Learning to Mask Weights |
Authors | Arun Mallya, Dillon Davis, Svetlana Lazebnik |
Abstract | This work presents a method for adapting a single, fixed deep neural network to multiple tasks without affecting performance on already learned tasks. By building upon ideas from network quantization and pruning, we learn binary masks that piggyback on an existing network, or are applied to unmodified weights of that network to provide good performance on a new task. These masks are learned in an end-to-end differentiable fashion, and incur a low overhead of 1 bit per network parameter, per task. Even though the underlying network is fixed, the ability to mask individual weights allows for the learning of a large number of filters. We show performance comparable to dedicated fine-tuned networks for a variety of classification tasks, including those with large domain shifts from the initial task (ImageNet), and a variety of network architectures. Unlike prior work, we do not suffer from catastrophic forgetting or competition between tasks, and our performance is agnostic to task ordering. Code available at https://github.com/arunmallya/piggyback. |
Tasks | Quantization |
Published | 2018-01-19 |
URL | http://arxiv.org/abs/1801.06519v2 |
PDF | http://arxiv.org/pdf/1801.06519v2.pdf |
PWC | https://paperswithcode.com/paper/piggyback-adapting-a-single-network-to |
Repo | https://github.com/ivclab/CPG |
Framework | pytorch |
Shampoo: Preconditioned Stochastic Tensor Optimization
Title | Shampoo: Preconditioned Stochastic Tensor Optimization |
Authors | Vineet Gupta, Tomer Koren, Yoram Singer |
Abstract | Preconditioned gradient methods are among the most general and powerful tools in optimization. However, preconditioning requires storing and manipulating prohibitively large matrices. We describe and analyze a new structure-aware preconditioning algorithm, called Shampoo, for stochastic optimization over tensor spaces. Shampoo maintains a set of preconditioning matrices, each of which operates on a single dimension, contracting over the remaining dimensions. We establish convergence guarantees in the stochastic convex setting, the proof of which builds upon matrix trace inequalities. Our experiments with state-of-the-art deep learning models show that Shampoo is capable of converging considerably faster than commonly used optimizers. Although it involves a more complex update rule, Shampoo’s runtime per step is comparable to that of simple gradient methods such as SGD, AdaGrad, and Adam. |
Tasks | Stochastic Optimization |
Published | 2018-02-26 |
URL | http://arxiv.org/abs/1802.09568v2 |
PDF | http://arxiv.org/pdf/1802.09568v2.pdf |
PWC | https://paperswithcode.com/paper/shampoo-preconditioned-stochastic-tensor |
Repo | https://github.com/Daniil-Selikhanovych/Shampoo_optimizer |
Framework | tf |
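For a matrix-shaped parameter, the per-dimension preconditioning the abstract describes amounts to accumulating left and right statistics from the gradients and applying their inverse fourth roots on each side. The NumPy sketch below shows that update; the epsilon, learning rate, and the restriction to order-2 tensors are simplifications of the full algorithm.

```python
# NumPy sketch of the Shampoo update for a single matrix parameter.
import numpy as np

def inv_root(mat, p):
    # Symmetric PSD matrix^(-1/p) via eigendecomposition.
    w, v = np.linalg.eigh(mat)
    return (v * np.clip(w, 1e-12, None) ** (-1.0 / p)) @ v.T

def shampoo_step(W, G, L, R, lr=0.1, eps=1e-4):
    L += G @ G.T          # left preconditioner statistics  (m x m)
    R += G.T @ G          # right preconditioner statistics (n x n)
    W -= lr * inv_root(L + eps * np.eye(L.shape[0]), 4) @ G @ inv_root(R + eps * np.eye(R.shape[0]), 4)
    return W, L, R

m, n = 5, 3
W, L, R = np.random.randn(m, n), np.zeros((m, m)), np.zeros((n, n))
G = np.random.randn(m, n)          # stand-in for a stochastic gradient
W, L, R = shampoo_step(W, G, L, R)
```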
Accelerating Natural Gradient with Higher-Order Invariance
Title | Accelerating Natural Gradient with Higher-Order Invariance |
Authors | Yang Song, Jiaming Song, Stefano Ermon |
Abstract | An appealing property of the natural gradient is that it is invariant to arbitrary differentiable reparameterizations of the model. However, this invariance property requires infinitesimal steps and is lost in practical implementations with small but finite step sizes. In this paper, we study invariance properties from a combined perspective of Riemannian geometry and numerical differential equation solving. We define the order of invariance of a numerical method to be its convergence order to an invariant solution. We propose to use higher-order integrators and geodesic corrections to obtain more invariant optimization trajectories. We prove the numerical convergence properties of geodesic corrected updates and show that they can be as computationally efficient as plain natural gradient. Experimentally, we demonstrate that invariance leads to faster optimization and our techniques improve on traditional natural gradient in deep neural network training and natural policy gradient for reinforcement learning. |
Tasks | |
Published | 2018-03-04 |
URL | http://arxiv.org/abs/1803.01273v2 |
PDF | http://arxiv.org/pdf/1803.01273v2.pdf |
PWC | https://paperswithcode.com/paper/accelerating-natural-gradient-with-higher |
Repo | https://github.com/ferrine/torch_anatgrad |
Framework | pytorch |
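For reference, the plain (first-order) natural-gradient step that the paper's higher-order integrators and geodesic corrections build on preconditions the gradient with the inverse Fisher information. The sketch below shows only that baseline update; `grad` and `fisher` are stand-ins for model-specific computations, and the damping is illustrative.

```python
# Baseline natural-gradient step: theta <- theta - lr * F^{-1} grad.
import numpy as np

def natural_gradient_step(theta, grad, fisher, lr=0.1, damping=1e-4):
    F = fisher + damping * np.eye(len(theta))      # damped Fisher information matrix
    return theta - lr * np.linalg.solve(F, grad)

theta = np.zeros(3)
grad = np.array([1.0, -2.0, 0.5])                  # stand-in gradient
fisher = np.diag([4.0, 1.0, 0.25])                 # stand-in Fisher matrix
print(natural_gradient_step(theta, grad, fisher))  # larger steps along low-curvature directions
```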
Mesh-TensorFlow: Deep Learning for Supercomputers
Title | Mesh-TensorFlow: Deep Learning for Supercomputers |
Authors | Noam Shazeer, Youlong Cheng, Niki Parmar, Dustin Tran, Ashish Vaswani, Penporn Koanantakool, Peter Hawkins, HyoukJoong Lee, Mingsheng Hong, Cliff Young, Ryan Sepassi, Blake Hechtman |
Abstract | Batch-splitting (data-parallelism) is the dominant distributed Deep Neural Network (DNN) training strategy, due to its universal applicability and its amenability to Single-Program-Multiple-Data (SPMD) programming. However, batch-splitting suffers from problems including the inability to train very large models (due to memory constraints), high latency, and inefficiency at small batch sizes. All of these can be solved by more general distribution strategies (model-parallelism). Unfortunately, efficient model-parallel algorithms tend to be complicated to discover, describe, and to implement, particularly on large clusters. We introduce Mesh-TensorFlow, a language for specifying a general class of distributed tensor computations. Where data-parallelism can be viewed as splitting tensors and operations along the “batch” dimension, in Mesh-TensorFlow, the user can specify any tensor-dimensions to be split across any dimensions of a multi-dimensional mesh of processors. A Mesh-TensorFlow graph compiles into a SPMD program consisting of parallel operations coupled with collective communication primitives such as Allreduce. We use Mesh-TensorFlow to implement an efficient data-parallel, model-parallel version of the Transformer sequence-to-sequence model. Using TPU meshes of up to 512 cores, we train Transformer models with up to 5 billion parameters, surpassing state of the art results on WMT’14 English-to-French translation task and the one-billion-word language modeling benchmark. Mesh-Tensorflow is available at https://github.com/tensorflow/mesh . |
Tasks | Language Modelling |
Published | 2018-11-05 |
URL | http://arxiv.org/abs/1811.02084v1 |
PDF | http://arxiv.org/pdf/1811.02084v1.pdf |
PWC | https://paperswithcode.com/paper/mesh-tensorflow-deep-learning-for |
Repo | https://github.com/tensorflow/mesh |
Framework | tf |
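The arithmetic behind splitting a tensor dimension across a mesh can be illustrated without the Mesh-TensorFlow API: a matmul whose inner ("hidden") dimension is sharded across simulated processors, with the partial products combined by an Allreduce-style sum. Mesh-TensorFlow expresses this declaratively and compiles it into a real SPMD program; the NumPy snippet below is only the conceptual picture.

```python
# Simulated model-parallel matmul: shard the inner dimension, sum the partials.
import numpy as np

batch, d_in, d_out, n_procs = 8, 16, 4, 4
x = np.random.randn(batch, d_in)
w = np.random.randn(d_in, d_out)

# Split the d_in dimension across "processors": each holds a slice of x and w.
x_shards = np.split(x, n_procs, axis=1)
w_shards = np.split(w, n_procs, axis=0)

# Each processor computes a partial product; an Allreduce sums them.
partials = [xs @ ws for xs, ws in zip(x_shards, w_shards)]
y = np.sum(partials, axis=0)

assert np.allclose(y, x @ w)   # identical to the unsplit computation
```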
Memory Fusion Network for Multi-view Sequential Learning
Title | Memory Fusion Network for Multi-view Sequential Learning |
Authors | Amir Zadeh, Paul Pu Liang, Navonil Mazumder, Soujanya Poria, Erik Cambria, Louis-Philippe Morency |
Abstract | Multi-view sequential learning is a fundamental problem in machine learning dealing with multi-view sequences. In a multi-view sequence, there exist two forms of interactions between different views: view-specific interactions and cross-view interactions. In this paper, we present a new neural architecture for multi-view sequential learning called the Memory Fusion Network (MFN) that explicitly accounts for both interactions in a neural architecture and continuously models them through time. The first component of the MFN is called the System of LSTMs, where view-specific interactions are learned in isolation through assigning an LSTM function to each view. The cross-view interactions are then identified using a special attention mechanism called the Delta-memory Attention Network (DMAN) and summarized through time with a Multi-view Gated Memory. Through extensive experimentation, MFN is compared to various proposed approaches for multi-view sequential learning on multiple publicly available benchmark datasets. MFN outperforms all the existing multi-view approaches. Furthermore, MFN outperforms all current state-of-the-art models, setting new state-of-the-art results for these multi-view datasets. |
Tasks | |
Published | 2018-02-03 |
URL | http://arxiv.org/abs/1802.00927v1 |
PDF | http://arxiv.org/pdf/1802.00927v1.pdf |
PWC | https://paperswithcode.com/paper/memory-fusion-network-for-multi-view |
Repo | https://github.com/pliang279/MFN |
Framework | pytorch |
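A highly simplified PyTorch sketch of the architecture: one LSTM cell per view (the "System of LSTMs"), an attention over the concatenated previous and current hidden states standing in for the Delta-memory Attention Network, and a gated memory updated at every step. Dimensions, the gating form, and the attention parameterisation are illustrative, not the published MFN.

```python
# Toy multi-view fusion model in the spirit of MFN (simplified).
import torch
import torch.nn as nn

class TinyMFN(nn.Module):
    def __init__(self, view_dims, hidden=32, mem=64):
        super().__init__()
        self.cells = nn.ModuleList(nn.LSTMCell(d, hidden) for d in view_dims)
        total = 2 * hidden * len(view_dims)        # [h_{t-1}; h_t] over all views
        self.attend = nn.Sequential(nn.Linear(total, total), nn.Softmax(dim=1))
        self.propose = nn.Linear(total, mem)
        self.gate = nn.Linear(total, mem)

    def forward(self, views):                      # views: list of (batch, T, d_v)
        batch, T = views[0].shape[0], views[0].shape[1]
        hidden = self.cells[0].hidden_size
        states = [(torch.zeros(batch, hidden), torch.zeros(batch, hidden)) for _ in self.cells]
        memory = torch.zeros(batch, self.propose.out_features)
        for t in range(T):
            prev = torch.cat([h for h, _ in states], dim=1)
            states = [cell(v[:, t], s) for cell, v, s in zip(self.cells, views, states)]
            cur = torch.cat([h for h, _ in states], dim=1)
            both = torch.cat([prev, cur], dim=1)
            delta = self.attend(both) * both       # attention-weighted cross-view summary
            gate = torch.sigmoid(self.gate(delta))
            memory = gate * memory + (1 - gate) * torch.tanh(self.propose(delta))
        return memory, cur

views = [torch.randn(4, 10, 20), torch.randn(4, 10, 5), torch.randn(4, 10, 35)]
mem, h = TinyMFN([20, 5, 35])(views)
print(mem.shape, h.shape)                          # torch.Size([4, 64]) torch.Size([4, 96])
```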
Automated learning with a probabilistic programming language: Birch
Title | Automated learning with a probabilistic programming language: Birch |
Authors | Lawrence M. Murray, Thomas B. Schön |
Abstract | This work offers a broad perspective on probabilistic modeling and inference in light of recent advances in probabilistic programming, in which models are formally expressed in Turing-complete programming languages. We consider a typical workflow and how probabilistic programming languages can help to automate this workflow, especially in the matching of models with inference methods. We focus on two properties of a model that are critical in this matching: its structure—the conditional dependencies between random variables—and its form—the precise mathematical definition of those dependencies. While the structure and form of a probabilistic model are often fixed a priori, it is a curiosity of probabilistic programming that they need not be, and may instead vary according to random choices made during program execution. We introduce a formal description of models expressed as programs, and discuss some of the ways in which probabilistic programming languages can reveal the structure and form of these, in order to tailor inference methods. We demonstrate the ideas with a new probabilistic programming language called Birch, with a multiple object tracking example. |
Tasks | Multiple Object Tracking, Object Tracking, Probabilistic Programming |
Published | 2018-10-02 |
URL | http://arxiv.org/abs/1810.01539v1 |
PDF | http://arxiv.org/pdf/1810.01539v1.pdf |
PWC | https://paperswithcode.com/paper/automated-learning-with-a-probabilistic |
Repo | https://github.com/lawmurray/MultiObjectTracking |
Framework | none |
Psychlab: A Psychology Laboratory for Deep Reinforcement Learning Agents
Title | Psychlab: A Psychology Laboratory for Deep Reinforcement Learning Agents |
Authors | Joel Z. Leibo, Cyprien de Masson d’Autume, Daniel Zoran, David Amos, Charles Beattie, Keith Anderson, Antonio García Castañeda, Manuel Sanchez, Simon Green, Audrunas Gruslys, Shane Legg, Demis Hassabis, Matthew M. Botvinick |
Abstract | Psychlab is a simulated psychology laboratory inside the first-person 3D game world of DeepMind Lab (Beattie et al. 2016). Psychlab enables implementations of classical laboratory psychological experiments so that they work with both human and artificial agents. Psychlab has a simple and flexible API that enables users to easily create their own tasks. As examples, we are releasing Psychlab implementations of several classical experimental paradigms including visual search, change detection, random dot motion discrimination, and multiple object tracking. We also contribute a study of the visual psychophysics of a specific state-of-the-art deep reinforcement learning agent: UNREAL (Jaderberg et al. 2016). This study leads to the surprising conclusion that UNREAL learns more quickly about larger target stimuli than it does about smaller stimuli. In turn, this insight motivates a specific improvement in the form of a simple model of foveal vision that turns out to significantly boost UNREAL’s performance, both on Psychlab tasks, and on standard DeepMind Lab tasks. By open-sourcing Psychlab we hope to facilitate a range of future such studies that simultaneously advance deep reinforcement learning and improve its links with cognitive science. |
Tasks | Multiple Object Tracking, Object Tracking |
Published | 2018-01-24 |
URL | http://arxiv.org/abs/1801.08116v2 |
PDF | http://arxiv.org/pdf/1801.08116v2.pdf |
PWC | https://paperswithcode.com/paper/psychlab-a-psychology-laboratory-for-deep |
Repo | https://github.com/susumuota/gym-oculoenv |
Framework | none |
Sparse and Constrained Attention for Neural Machine Translation
Title | Sparse and Constrained Attention for Neural Machine Translation |
Authors | Chaitanya Malaviya, Pedro Ferreira, André F. T. Martins |
Abstract | In NMT, words are sometimes dropped from the source or generated repeatedly in the translation. We explore novel strategies to address the coverage problem that change only the attention transformation. Our approach allocates fertilities to source words, used to bound the attention each word can receive. We experiment with various sparse and constrained attention transformations and propose a new one, constrained sparsemax, shown to be differentiable and sparse. Empirical evaluation is provided in three language pairs. |
Tasks | Machine Translation |
Published | 2018-05-21 |
URL | http://arxiv.org/abs/1805.08241v1 |
PDF | http://arxiv.org/pdf/1805.08241v1.pdf |
PWC | https://paperswithcode.com/paper/sparse-and-constrained-attention-for-neural |
Repo | https://github.com/Unbabel/sparse_constrained_attention |
Framework | pytorch |
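The building block behind the constrained sparsemax attention mentioned above is sparsemax itself: like softmax it maps scores to a probability distribution, but it can assign exactly zero weight to some source words. The NumPy sketch below implements plain sparsemax; the fertility-based upper bounds of constrained sparsemax are not included.

```python
# Sparsemax (Martins & Astudillo): project scores onto the probability simplex
# with a threshold, yielding a sparse attention distribution.
import numpy as np

def sparsemax(z):
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = k * z_sorted > cumsum - 1       # indices kept in the support
    k_z = k[support][-1]
    tau = (cumsum[support][-1] - 1) / k_z     # threshold subtracted from all scores
    return np.maximum(z - tau, 0.0)

scores = np.array([2.0, 1.2, 0.1, -1.0])
print(sparsemax(scores))                       # [0.9 0.1 0.  0. ] -- exact zeros
print(sparsemax(scores).sum())                 # 1.0 (a valid attention distribution)
```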