January 31, 2020

3234 words 16 mins read

Paper Group AWR 435

Updates of Equilibrium Prop Match Gradients of Backprop Through Time in an RNN with Static Input. Integrating Text and Image: Determining Multimodal Document Intent in Instagram Posts. Analysing Mathematical Reasoning Abilities of Neural Models. A Layer-Based Sequential Framework for Scene Generation with GANs. HOList: An Environment for Machine Le …

Updates of Equilibrium Prop Match Gradients of Backprop Through Time in an RNN with Static Input

Title Updates of Equilibrium Prop Match Gradients of Backprop Through Time in an RNN with Static Input
Authors Maxence Ernoult, Julie Grollier, Damien Querlioz, Yoshua Bengio, Benjamin Scellier
Abstract Equilibrium Propagation (EP) is a biologically inspired learning algorithm for convergent recurrent neural networks, i.e. RNNs that are fed by a static input x and settle to a steady state. Training convergent RNNs consists in adjusting the weights until the steady state of output neurons coincides with a target y. Convergent RNNs can also be trained with the more conventional Backpropagation Through Time (BPTT) algorithm. In its original formulation EP was described in the case of real-time neuronal dynamics, which is computationally costly. In this work, we introduce a discrete-time version of EP with simplified equations and with reduced simulation time, bringing EP closer to practical machine learning tasks. We first prove theoretically, as well as numerically, that the neural and weight updates of EP, computed by forward-time dynamics, are step-by-step equal to the ones obtained by BPTT, with gradients computed backward in time. The equality is strict when the transition function of the dynamics derives from a primitive function and the steady state is maintained long enough. We then show for more standard discrete-time neural network dynamics that the same property is approximately respected and we subsequently demonstrate training with EP with equivalent performance to BPTT. In particular, we define the first convolutional architecture trained with EP achieving ~ 1% test error on MNIST, which is the lowest error reported with EP. These results can guide the development of deep neural networks trained with EP.
Tasks
Published 2019-05-31
URL https://arxiv.org/abs/1905.13633v1
PDF https://arxiv.org/pdf/1905.13633v1.pdf
PWC https://paperswithcode.com/paper/updates-of-equilibrium-prop-match-gradients
Repo https://github.com/ernoult/EP-NEURIPS
Framework pytorch
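
The core mechanic of the paper above is the two-phase contrastive update of Equilibrium Propagation. Below is a minimal numpy sketch of that idea under simplifying assumptions of my own (a single symmetric layer, hard-sigmoid activation, and nudging every unit toward a toy target rather than only output units); it is not the authors' code, which lives in the linked repository.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = lambda s: np.clip(s, 0.0, 1.0)             # hard-sigmoid activation, common in EP work

n_in, n_hid = 5, 8                               # toy sizes (illustrative)
W = rng.normal(scale=0.1, size=(n_hid, n_hid))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)                         # symmetric weights, no self-connections
Wx = rng.normal(scale=0.1, size=(n_hid, n_in))
x = rng.random(n_in)                             # static input
y = rng.random(n_hid)                            # toy target

def relax(s, beta=0.0, target=None, steps=100):
    """Iterate the discrete-time dynamics to an (approximate) steady state."""
    for _ in range(steps):
        s = rho(W @ rho(s) + Wx @ x)
        if beta != 0.0:
            s = s + beta * (target - s)          # weak nudging toward the target
    return s

beta = 0.1
s_free = relax(np.zeros(n_hid))                  # first phase: free relaxation
s_nudged = relax(s_free, beta=beta, target=y)    # second phase: nudged relaxation

# EP weight update: contrast of local correlations between the two steady states
dW = (np.outer(rho(s_nudged), rho(s_nudged)) - np.outer(rho(s_free), rho(s_free))) / beta
W += 0.05 * (dW + dW.T) / 2                      # small learning rate, keep W symmetric
```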

Integrating Text and Image: Determining Multimodal Document Intent in Instagram Posts

Title Integrating Text and Image: Determining Multimodal Document Intent in Instagram Posts
Authors Julia Kruk, Jonah Lubin, Karan Sikka, Xiao Lin, Dan Jurafsky, Ajay Divakaran
Abstract Computing author intent from multimodal data like Instagram posts requires modeling a complex relationship between text and image. For example, a caption might evoke an ironic contrast with the image, so neither caption nor image is a mere transcript of the other. Instead they combine – via what has been called meaning multiplication – to create a new meaning that has a more complex relation to the literal meanings of text and image. Here we introduce a multimodal dataset of 1299 Instagram posts labeled for three orthogonal taxonomies: the authorial intent behind the image-caption pair, the contextual relationship between the literal meanings of the image and caption, and the semiotic relationship between the signified meanings of the image and caption. We build a baseline deep multimodal classifier to validate the taxonomy, showing that employing both text and image improves intent detection by 9.6% compared to using only the image modality, demonstrating the commonality of non-intersective meaning multiplication. The gain with multimodality is greatest when the image and caption diverge semiotically. Our dataset offers a new resource for the study of the rich meanings that result from pairing text and image.
Tasks Intent Detection
Published 2019-04-19
URL https://arxiv.org/abs/1904.09073v3
PDF https://arxiv.org/pdf/1904.09073v3.pdf
PWC https://paperswithcode.com/paper/integrating-text-and-image-determining
Repo https://github.com/karansikka1/documentIntent_emnlp19
Framework none
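
The baseline in the paper fuses image and caption features to predict intent. The sketch below is a generic late-fusion classifier along those lines, not the authors' model: the feature dimensions (a ResNet-style 2048-d image vector, a 768-d text encoding) and the number of intent classes are placeholder assumptions.

```python
import torch
import torch.nn as nn

class LateFusionIntentClassifier(nn.Module):
    """Toy fusion classifier: project pooled image and text features, concatenate, classify."""
    def __init__(self, img_dim=2048, txt_dim=768, n_intents=8):  # n_intents is a placeholder
        super().__init__()
        self.proj_img = nn.Linear(img_dim, 256)
        self.proj_txt = nn.Linear(txt_dim, 256)
        self.head = nn.Sequential(nn.ReLU(), nn.Dropout(0.3), nn.Linear(512, n_intents))

    def forward(self, img_feat, txt_feat):
        z = torch.cat([self.proj_img(img_feat), self.proj_txt(txt_feat)], dim=-1)
        return self.head(z)

model = LateFusionIntentClassifier()
logits = model(torch.randn(4, 2048), torch.randn(4, 768))   # stand-ins for encoder outputs
```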

Analysing Mathematical Reasoning Abilities of Neural Models

Title Analysing Mathematical Reasoning Abilities of Neural Models
Authors David Saxton, Edward Grefenstette, Felix Hill, Pushmeet Kohli
Abstract Mathematical reasoning—a core ability within human intelligence—presents some unique challenges as a domain: we do not come to understand and solve mathematical problems primarily on the back of experience and evidence, but on the basis of inferring, learning, and exploiting laws, axioms, and symbol manipulation rules. In this paper, we present a new challenge for the evaluation (and eventually the design) of neural architectures and similar systems, developing a task suite of mathematics problems involving sequential questions and answers in a free-form textual input/output format. The structured nature of the mathematics domain, covering arithmetic, algebra, probability and calculus, enables the construction of training and test splits designed to clearly illuminate the capabilities and failure-modes of different architectures, as well as evaluate their ability to compose and relate knowledge and learned processes. Having described the data generation process and its potential future expansions, we conduct a comprehensive analysis of models from two broad classes of the most powerful sequence-to-sequence architectures and find notable differences in their ability to resolve mathematical problems and generalize their knowledge.
Tasks Mathematical Question Answering, Math Word Problem Solving
Published 2019-04-02
URL http://arxiv.org/abs/1904.01557v1
PDF http://arxiv.org/pdf/1904.01557v1.pdf
PWC https://paperswithcode.com/paper/analysing-mathematical-reasoning-abilities-of-1
Repo https://github.com/berniwal/DeepLearningProject
Framework tf
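
The dataset consists of free-form textual questions and answers generated module by module. The toy generator below only illustrates that input/output format for an arithmetic-style module; it is not the paper's data-generation code.

```python
import random

def make_arithmetic_qa(rng):
    """Produce one free-form arithmetic question/answer pair as plain text."""
    a, b = rng.randint(-50, 50), rng.randint(-50, 50)
    op, value = rng.choice([("+", a + b), ("-", a - b), ("*", a * b)])
    return f"What is {a} {op} {b}?", str(value)

rng = random.Random(0)
for _ in range(3):
    question, answer = make_arithmetic_qa(rng)
    print(question, "->", answer)
```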

A Layer-Based Sequential Framework for Scene Generation with GANs

Title A Layer-Based Sequential Framework for Scene Generation with GANs
Authors Mehmet Ozgur Turkoglu, William Thong, Luuk Spreeuwers, Berkay Kicanaoglu
Abstract The visual world we sense, interpret, and interact with every day is a complex composition of interleaved physical entities. Therefore, it is a very challenging task to generate vivid scenes of similar complexity using computers. In this work, we present a scene generation framework based on Generative Adversarial Networks (GANs) to sequentially compose a scene, breaking down the underlying problem into smaller ones. Unlike existing approaches, our framework offers explicit control over the elements of a scene through separate background and foreground generators. Starting with an initially generated background, foreground objects then populate the scene one-by-one in a sequential manner. Via quantitative and qualitative experiments on a subset of the MS-COCO dataset, we show that our proposed framework produces not only more diverse images but also copes better with affine transformations and occlusion artifacts of foreground objects than its counterparts.
Tasks Conditional Image Generation, Image Generation, Scene Generation
Published 2019-02-02
URL http://arxiv.org/abs/1902.00671v1
PDF http://arxiv.org/pdf/1902.00671v1.pdf
PWC https://paperswithcode.com/paper/a-layer-based-sequential-framework-for-scene
Repo https://github.com/0zgur0/Seq_Scene_Gen
Framework pytorch
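
The framework composes a scene sequentially: generate a background, then add foreground objects one at a time. The sketch below illustrates that compositing loop with toy stand-in generators (simple linear layers producing an RGB image and an RGB+alpha patch); the paper's generators are full convolutional GANs with explicit control, so treat this purely as a structural illustration.

```python
import torch
import torch.nn as nn

class ToyBackgroundGenerator(nn.Module):
    def __init__(self, z_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim, 3 * 64 * 64), nn.Tanh())
    def forward(self, z):
        return self.net(z).view(-1, 3, 64, 64)

class ToyForegroundGenerator(nn.Module):
    """Produces an RGB patch plus an alpha mask used to composite an object onto the canvas."""
    def __init__(self, z_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim, 4 * 64 * 64), nn.Tanh())
    def forward(self, z):
        out = self.net(z).view(-1, 4, 64, 64)
        rgb, alpha = out[:, :3], (out[:, 3:] + 1) / 2     # alpha in [0, 1]
        return rgb, alpha

bg_gen, fg_gen = ToyBackgroundGenerator(), ToyForegroundGenerator()
canvas = bg_gen(torch.randn(1, 64))                       # start from a generated background
for _ in range(3):                                        # add foreground objects one by one
    rgb, alpha = fg_gen(torch.randn(1, 64))
    canvas = alpha * rgb + (1 - alpha) * canvas           # alpha-composite onto the scene
```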

HOList: An Environment for Machine Learning of Higher-Order Theorem Proving

Title HOList: An Environment for Machine Learning of Higher-Order Theorem Proving
Authors Kshitij Bansal, Sarah M. Loos, Markus N. Rabe, Christian Szegedy, Stewart Wilcox
Abstract We present an environment, benchmark, and deep learning driven automated theorem prover for higher-order logic. Higher-order interactive theorem provers enable the formalization of arbitrary mathematical theories and thereby present an interesting, open-ended challenge for deep learning. We provide an open-source framework based on the HOL Light theorem prover that can be used as a reinforcement learning environment. HOL Light comes with a broad coverage of basic mathematical theorems on calculus and the formal proof of the Kepler conjecture, from which we derive a challenging benchmark for automated reasoning. We also present a deep reinforcement learning driven automated theorem prover, DeepHOL, with strong initial results on this benchmark.
Tasks Automated Theorem Proving
Published 2019-04-05
URL https://arxiv.org/abs/1904.03241v3
PDF https://arxiv.org/pdf/1904.03241v3.pdf
PWC https://paperswithcode.com/paper/holist-an-environment-for-machine-learning-of
Repo https://github.com/Kerram/holist-train
Framework tf

Multi-Agent Pathfinding with Continuous Time

Title Multi-Agent Pathfinding with Continuous Time
Authors Anton Andreychuk, Konstantin Yakovlev, Dor Atzmon, Roni Stern
Abstract Multi-Agent Pathfinding (MAPF) is the problem of finding paths for multiple agents such that every agent reaches its goal and the agents do not collide. Most prior work on MAPF was on grids, assumed that agents' actions have uniform duration, and that time is discretized into timesteps. We propose a MAPF algorithm that does not rely on these assumptions, is complete, and provides provably optimal solutions. This algorithm is based on a novel adaptation of Safe Interval Path Planning (SIPP), a continuous-time single-agent planning algorithm, and a modified version of Conflict-Based Search (CBS), a state-of-the-art multi-agent pathfinding algorithm. We analyze this algorithm, discuss its pros and cons, and evaluate it experimentally on several standard benchmarks.
Tasks
Published 2019-01-16
URL https://arxiv.org/abs/1901.05506v3
PDF https://arxiv.org/pdf/1901.05506v3.pdf
PWC https://paperswithcode.com/paper/multi-agent-pathfinding-mapf-with-continuous
Repo https://github.com/PathPlanning/Continuous-CBS
Framework none
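
A building block of SIPP-style planning is turning the time intervals during which a location is blocked into the complementary safe intervals. The helper below is a small sketch of that step under the assumption that blocked intervals are sorted and non-overlapping; it is not taken from the authors' implementation.

```python
def safe_intervals(blocked, horizon=float("inf")):
    """Return the safe (unblocked) time intervals that complement `blocked` up to `horizon`."""
    intervals, t = [], 0.0
    for start, end in blocked:
        if start > t:
            intervals.append((t, start))
        t = max(t, end)
    if t < horizon:
        intervals.append((t, horizon))
    return intervals

print(safe_intervals([(2.0, 3.5), (6.0, 7.0)]))   # [(0.0, 2.0), (3.5, 6.0), (7.0, inf)]
```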

Overcomplete Independent Component Analysis via SDP

Title Overcomplete Independent Component Analysis via SDP
Authors Anastasia Podosinnikova, Amelia Perry, Alexander Wein, Francis Bach, Alexandre d’Aspremont, David Sontag
Abstract We present a novel algorithm for overcomplete independent component analysis (ICA), where the number of latent sources k exceeds the dimension p of observed variables. Previous algorithms either suffer from high computational complexity or make strong assumptions about the form of the mixing matrix. Our algorithm does not make any sparsity assumption yet enjoys favorable computational and theoretical properties. Our algorithm consists of two main steps: (a) estimation of the Hessians of the cumulant generating function (as opposed to the fourth and higher order cumulants used by most algorithms) and (b) a novel semi-definite programming (SDP) relaxation for recovering a mixing component. We show that this relaxation can be efficiently solved with a projected accelerated gradient descent method, which makes the whole algorithm computationally practical. Moreover, we conjecture that the proposed program recovers a mixing component at the rate k < p^2/4 and prove that a mixing component can be recovered with high probability when k < (2 - epsilon) p log p, provided the original components are sampled uniformly at random on the hypersphere. Experiments are provided on synthetic data and the CIFAR-10 dataset of real images.
Tasks
Published 2019-01-24
URL http://arxiv.org/abs/1901.08334v1
PDF http://arxiv.org/pdf/1901.08334v1.pdf
PWC https://paperswithcode.com/paper/overcomplete-independent-component-analysis
Repo https://github.com/anastasia-podosinnikova/oica
Framework none
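
Step (a) of the algorithm estimates Hessians of the cumulant generating function, which act as generalized covariance matrices. The numpy sketch below computes one such Hessian from samples at a chosen point t; the SDP relaxation of step (b) is not shown, and the variable names are mine.

```python
import numpy as np

def cgf_hessian(X, t):
    """Hessian of the empirical CGF K(t) = log mean(exp(X @ t)): the covariance of the rows
    of X under exponentially tilted weights proportional to exp(X @ t)."""
    logits = X @ t
    logits -= logits.max()                      # numerical stability
    w = np.exp(logits)
    w /= w.sum()
    mean = w @ X
    centered = X - mean
    return (centered * w[:, None]).T @ centered

X = np.random.default_rng(0).normal(size=(1000, 5))   # toy observations, shape (n, p)
H = cgf_hessian(X, 0.1 * np.ones(5))                  # one generalized covariance matrix
```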

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

Title SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
Authors Daniel S. Park, William Chan, Yu Zhang, Chung-Cheng Chiu, Barret Zoph, Ekin D. Cubuk, Quoc V. Le
Abstract We present SpecAugment, a simple data augmentation method for speech recognition. SpecAugment is applied directly to the feature inputs of a neural network (i.e., filter bank coefficients). The augmentation policy consists of warping the features, masking blocks of frequency channels, and masking blocks of time steps. We apply SpecAugment on Listen, Attend and Spell networks for end-to-end speech recognition tasks. We achieve state-of-the-art performance on the LibriSpeech 960h and Switchboard 300h tasks, outperforming all prior work. On LibriSpeech, we achieve 6.8% WER on test-other without the use of a language model, and 5.8% WER with shallow fusion with a language model. This compares to the previous state-of-the-art hybrid system of 7.5% WER. For Switchboard, we achieve 7.2%/14.6% on the Switchboard/CallHome portion of the Hub5’00 test set without the use of a language model, and 6.8%/14.1% with shallow fusion, which compares to the previous state-of-the-art hybrid system at 8.3%/17.3% WER.
Tasks Data Augmentation, End-To-End Speech Recognition, Language Modelling, Speech Recognition
Published 2019-04-18
URL https://arxiv.org/abs/1904.08779v3
PDF https://arxiv.org/pdf/1904.08779v3.pdf
PWC https://paperswithcode.com/paper/specaugment-a-simple-data-augmentation-method
Repo https://github.com/DemisEom/SpecAugment
Framework pytorch
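
SpecAugment's masking steps are easy to reproduce directly on a log-mel spectrogram. The numpy sketch below applies frequency and time masking in that spirit; time warping is omitted, and the mask-size parameters are only loosely modeled on the policies reported in the paper.

```python
import numpy as np

def spec_augment(spec, num_freq_masks=2, num_time_masks=2, F=27, T=100, rng=None):
    """Zero out random frequency bands and time spans of a (time, freq) spectrogram copy."""
    rng = rng or np.random.default_rng()
    spec = spec.copy()
    n_t, n_f = spec.shape
    for _ in range(num_freq_masks):
        f = int(rng.integers(0, F + 1))               # mask width in mel channels
        f0 = int(rng.integers(0, max(1, n_f - f)))
        spec[:, f0:f0 + f] = 0.0
    for _ in range(num_time_masks):
        t = int(rng.integers(0, min(T, n_t) + 1))     # mask length in frames
        t0 = int(rng.integers(0, max(1, n_t - t)))
        spec[t0:t0 + t, :] = 0.0
    return spec

augmented = spec_augment(np.random.rand(300, 80))     # 300 frames x 80 mel bins
```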

RWTH ASR Systems for LibriSpeech: Hybrid vs Attention – w/o Data Augmentation

Title RWTH ASR Systems for LibriSpeech: Hybrid vs Attention – w/o Data Augmentation
Authors Christoph Lüscher, Eugen Beck, Kazuki Irie, Markus Kitza, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney
Abstract We present state-of-the-art automatic speech recognition (ASR) systems employing a standard hybrid DNN/HMM architecture compared to an attention-based encoder-decoder design for the LibriSpeech task. Detailed descriptions of the system development, including model design, pretraining schemes, training schedules, and optimization approaches are provided for both system architectures. Both hybrid DNN/HMM and attention-based systems employ bi-directional LSTMs for acoustic modeling/encoding. For language modeling, we employ both LSTM and Transformer based architectures. All our systems are built using RWTH's open-source toolkits RASR and RETURNN. To the best of the authors' knowledge, the results obtained when training on the full LibriSpeech training set are the best published to date, both for the hybrid DNN/HMM and the attention-based systems. Our single hybrid system even outperforms previous results obtained from combining eight single systems. Our comparison shows that on the LibriSpeech 960h task, the hybrid DNN/HMM system outperforms the attention-based system by 15% relative on the clean and 40% relative on the other test sets in terms of word error rate. Moreover, experiments on a reduced 100h-subset of the LibriSpeech training corpus even show a more pronounced margin between the hybrid DNN/HMM and attention-based architectures.
Tasks End-To-End Speech Recognition, Language Modelling, Speech Recognition
Published 2019-05-08
URL https://arxiv.org/abs/1905.03072v3
PDF https://arxiv.org/pdf/1905.03072v3.pdf
PWC https://paperswithcode.com/paper/rwth-asr-systems-for-librispeech-hybrid-vs
Repo https://github.com/rwth-i6/returnn-experiments/tree/master/2019-librispeech-system
Framework none

Efficient and Accurate Estimation of Lipschitz Constants for Deep Neural Networks

Title Efficient and Accurate Estimation of Lipschitz Constants for Deep Neural Networks
Authors Mahyar Fazlyab, Alexander Robey, Hamed Hassani, Manfred Morari, George J. Pappas
Abstract Tight estimation of the Lipschitz constant for deep neural networks (DNNs) is useful in many applications ranging from robustness certification of classifiers to stability analysis of closed-loop systems with reinforcement learning controllers. Existing methods in the literature for estimating the Lipschitz constant suffer from either lack of accuracy or poor scalability. In this paper, we present a convex optimization framework to compute guaranteed upper bounds on the Lipschitz constant of DNNs both accurately and efficiently. Our main idea is to interpret activation functions as gradients of convex potential functions. Hence, they satisfy certain properties that can be described by quadratic constraints. This particular description allows us to pose the Lipschitz constant estimation problem as a semidefinite program (SDP). The resulting SDP can be adapted to increase either the estimation accuracy (by capturing the interaction between activation functions of different layers) or scalability (by decomposition and parallel implementation). We illustrate the utility of our approach with a variety of experiments on randomly generated networks and on classifiers trained on the MNIST and Iris datasets. In particular, we experimentally demonstrate that our Lipschitz bounds are the most accurate compared to those in the literature. We also study the impact of adversarial training methods on the Lipschitz bounds of the resulting classifiers and show that our bounds can be used to efficiently provide robustness guarantees.
Tasks
Published 2019-06-12
URL https://arxiv.org/abs/1906.04893v1
PDF https://arxiv.org/pdf/1906.04893v1.pdf
PWC https://paperswithcode.com/paper/efficient-and-accurate-estimation-of
Repo https://github.com/arobey1/LipSDP
Framework pytorch
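
For context, the loose bound that SDP-based estimates like the one above improve on is the product of the layers' spectral norms (valid for 1-Lipschitz activations such as ReLU). The snippet below computes that naive baseline only; it does not implement the paper's SDP program.

```python
import numpy as np

def naive_lipschitz_bound(weight_matrices):
    """Trivial L2 Lipschitz upper bound for a feed-forward ReLU network:
    the product of the largest singular values of the layer weight matrices."""
    bound = 1.0
    for W in weight_matrices:
        bound *= np.linalg.norm(W, 2)
    return bound

rng = np.random.default_rng(0)
layers = [rng.normal(size=(64, 32)), rng.normal(size=(32, 64)), rng.normal(size=(10, 32))]
print(naive_lipschitz_bound(layers))   # typically far above the true Lipschitz constant
```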

Dynamic data fusion using multi-input models for malware classification

Title Dynamic data fusion using multi-input models for malware classification
Authors Viktor Zenkov, Jason Laska
Abstract Criminals use malware to disrupt cyber-systems. The number of these malware-vulnerable systems is increasing quickly as common systems, such as vehicles, routers, and lightbulbs, become increasingly interconnected cyber-systems. To address the scale of this problem, analysts divide malware into classes and develop, for each class, a specialized defense. In this project we classified malware with machine learning. In particular, we used a supervised multi-class long short-term memory (LSTM) model. We trained the algorithm with thousands of malware files annotated with class labels (the training set), and the algorithm learned patterns indicative of each class. We used disassembled malware files (provided by Microsoft) and separated the constituent data into parsed instructions, which look like human-readable machine code text, and raw bytes, which are hexadecimal values. We are interested in which format, text or hex, is more valuable as input for classification. To solve this, we investigated four cases: a text-only model, a hexadecimal-only model, a multi-input model using both text and hexadecimal inputs, and a model based on combining the individual results. We performed this investigation using the machine learning Python package Keras, which allows easily configurable deep learning architectures and training. We hoped to understand the trade-offs between the different formats. Due to the class imbalance in the data, we used multiple methods to compare the formats, using test accuracies, balanced accuracies (taking into account weights of classes), and an accuracy derived from tables of confusion. We found that the multi-input model, which allows learning on both input types simultaneously, resulted in the best performance. Our finding expedites malware classification research by providing researchers with a suitable deep learning architecture to train a tailored version on their malware.
Tasks Malware Classification
Published 2019-09-21
URL https://arxiv.org/abs/1910.02021v1
PDF https://arxiv.org/pdf/1910.02021v1.pdf
PWC https://paperswithcode.com/paper/dynamic-data-fusion-using-multi-input-models
Repo https://github.com/viktorZenkov/MalwareClassification
Framework tf
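
The multi-input idea maps naturally onto the Keras functional API: one branch consumes tokenized instruction text, the other raw byte values, and the two LSTM encodings are concatenated before classification. The sketch below is a minimal version under assumptions of my own (sequence length, vocabulary size, embedding widths); the nine-class output follows the Microsoft malware dataset mentioned above.

```python
from tensorflow.keras import layers, Model

text_in = layers.Input(shape=(500,), dtype="int32", name="instructions")   # token ids
byte_in = layers.Input(shape=(500,), dtype="int32", name="raw_bytes")      # byte values 0-255

t = layers.Embedding(input_dim=5000, output_dim=64)(text_in)
t = layers.LSTM(64)(t)

b = layers.Embedding(input_dim=256, output_dim=32)(byte_in)
b = layers.LSTM(64)(b)

merged = layers.concatenate([t, b])                    # learn on both input types simultaneously
out = layers.Dense(9, activation="softmax")(merged)    # nine malware families

model = Model(inputs=[text_in, byte_in], outputs=out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```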

A Study into Echocardiography View Conversion

Title A Study into Echocardiography View Conversion
Authors Amir H. Abdi, Mohammad H. Jafari, Sidney Fels, Theresa Tsang, Purang Abolmaesumi
Abstract Transthoracic echo is one of the most common means of cardiac study in clinical routine. During the echo exam, the sonographer captures a set of standard cross sections (echo views) of the heart. Each 2D echo view cuts through the 3D cardiac geometry via a unique plane. Consequently, different views share some limited information. In this work, we investigate the feasibility of generating a 2D echo view using another view based on adversarial generative models. The objective optimized to train the view-conversion model is based on the ideas introduced by LSGAN, PatchGAN, and Conditional GAN (cGAN). The size and length of the left ventricle in the generated target echo view are compared against those of the target ground truth to assess the validity of the echo view conversion. Results show that there is a correlation of 0.50 between the LV areas and 0.49 between the LV lengths of the generated target frames and the real target frames.
Tasks
Published 2019-12-05
URL https://arxiv.org/abs/1912.03120v1
PDF https://arxiv.org/pdf/1912.03120v1.pdf
PWC https://paperswithcode.com/paper/a-study-into-echocardiography-view-conversion
Repo https://github.com/amir-abdi/echo-view2view
Framework none
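
The view-conversion objective combines LSGAN, PatchGAN, and cGAN ideas. The snippet below only spells out the least-squares GAN losses in PyTorch as a minimal sketch; conditioning on the source view and the patch-based discriminator are left out.

```python
import torch

def lsgan_d_loss(d_real, d_fake):
    """Least-squares discriminator loss: push real outputs toward 1 and fake outputs toward 0."""
    return 0.5 * ((d_real - 1).pow(2).mean() + d_fake.pow(2).mean())

def lsgan_g_loss(d_fake):
    """Least-squares generator loss: push discriminator outputs on fakes toward 1."""
    return 0.5 * (d_fake - 1).pow(2).mean()

# example with dummy discriminator outputs for a batch of 8 patches
print(lsgan_d_loss(torch.rand(8), torch.rand(8)), lsgan_g_loss(torch.rand(8)))
```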

Detecting Kissing Scenes in a Database of Hollywood Films

Title Detecting Kissing Scenes in a Database of Hollywood Films
Authors Amir Ziai
Abstract Detecting scene types in a movie can be very useful for applications such as video editing, ratings assignment, and personalization. We propose a system for detecting kissing scenes in a movie. This system consists of two components. The first component is a binary classifier that predicts a binary label (i.e. kissing or not) given features extracted from both the still frames and audio waves of a one-second segment. The second component aggregates the binary labels for contiguous non-overlapping segments into a set of kissing scenes. We experimented with a variety of 2D and 3D convolutional architectures such as ResNet, DenseNet, and VGGish and developed a highly accurate kissing detector that achieves a validation F1 score of 0.95 on a diverse database of Hollywood films spanning many genres and multiple decades. The code for this project is available at http://github.com/amirziai/kissing-detector.
Tasks Kiss Detection
Published 2019-06-05
URL https://arxiv.org/abs/1906.01843v1
PDF https://arxiv.org/pdf/1906.01843v1.pdf
PWC https://paperswithcode.com/paper/detecting-kissing-scenes-in-a-database-of
Repo https://github.com/amirziai/kissing-detector
Framework pytorch
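
The second component of the system merges per-second binary predictions into contiguous scenes. The helper below is one straightforward way to do that aggregation; the minimum-length threshold is an assumption of mine, not a detail from the paper.

```python
def segments_to_scenes(labels, min_len=3):
    """Merge per-second binary predictions into (start, end) scenes, dropping short runs."""
    scenes, start = [], None
    for i, lab in enumerate(labels):
        if lab and start is None:
            start = i
        elif not lab and start is not None:
            if i - start >= min_len:
                scenes.append((start, i))
            start = None
    if start is not None and len(labels) - start >= min_len:
        scenes.append((start, len(labels)))
    return scenes

print(segments_to_scenes([0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1]))   # [(1, 4), (8, 12)]
```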

Interactive Sketch & Fill: Multiclass Sketch-to-Image Translation

Title Interactive Sketch & Fill: Multiclass Sketch-to-Image Translation
Authors Arnab Ghosh, Richard Zhang, Puneet K. Dokania, Oliver Wang, Alexei A. Efros, Philip H. S. Torr, Eli Shechtman
Abstract We propose an interactive GAN-based sketch-to-image translation method that helps novice users create images of simple objects. As the user starts to draw a sketch of a desired object type, the network interactively recommends plausible completions, and shows a corresponding synthesized image to the user. This enables a feedback loop, where the user can edit their sketch based on the network’s recommendations, visualizing both the completed shape and final rendered image while they draw. In order to use a single trained model across a wide array of object classes, we introduce a gating-based approach for class conditioning, which allows us to generate distinct classes without feature mixing, from a single generator network. Video available at our website: https://arnabgho.github.io/iSketchNFill/.
Tasks
Published 2019-09-24
URL https://arxiv.org/abs/1909.11081v2
PDF https://arxiv.org/pdf/1909.11081v2.pdf
PWC https://paperswithcode.com/paper/interactive-sketch-fill-multiclass-sketch-to
Repo https://github.com/arnabgho/iSketchNFill
Framework pytorch
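
The gating-based class conditioning can be illustrated with a small PyTorch module in which each class owns a learned soft gate over a layer's output channels, so different classes activate different feature subsets of one shared generator. This is a toy rendering of the idea, not the paper's exact mechanism.

```python
import torch
import torch.nn as nn

class ClassGatedConv(nn.Module):
    """Conv layer whose output channels are modulated by a per-class learned gate vector."""
    def __init__(self, in_ch, out_ch, n_classes):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.gates = nn.Embedding(n_classes, out_ch)        # one gate vector per class

    def forward(self, x, class_idx):
        g = torch.sigmoid(self.gates(class_idx))            # (B, out_ch) soft gates in [0, 1]
        return self.conv(x) * g[:, :, None, None]

layer = ClassGatedConv(3, 16, n_classes=10)
y = layer(torch.randn(2, 3, 32, 32), torch.tensor([1, 7]))  # -> shape (2, 16, 32, 32)
```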

Empirical Likelihood for Contextual Bandits

Title Empirical Likelihood for Contextual Bandits
Authors Nikos Karampatziakis, John Langford, Paul Mineiro
Abstract We apply empirical likelihood techniques to contextual bandit policy value estimation, confidence intervals, and learning. We propose a tighter estimator for off-policy evaluation with improved statistical performance over previous proposals. Coupled with this estimator is a confidence interval which also improves over previous proposals. We then harness these to improve learning from contextual bandit data. Each of these is empirically evaluated to show good performance against strong baselines in finite sample regimes.
Tasks Multi-Armed Bandits
Published 2019-06-07
URL https://arxiv.org/abs/1906.03323v3
PDF https://arxiv.org/pdf/1906.03323v3.pdf
PWC https://paperswithcode.com/paper/empirical-likelihood-for-contextual-bandits
Repo https://github.com/pmineiro/elfcb
Framework none
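
As a point of reference for the estimator discussed above, the classical baseline for off-policy value estimation is inverse propensity scoring (IPS). The snippet below implements that baseline only, on synthetic logged data; it is not the paper's empirical-likelihood estimator.

```python
import numpy as np

def ips_value_estimate(rewards, target_probs, logging_probs):
    """Inverse-propensity-scoring estimate of the target policy's value from logged data."""
    weights = target_probs / logging_probs
    return float(np.mean(weights * rewards))

rng = np.random.default_rng(0)
rewards = rng.binomial(1, 0.3, size=1000)    # rewards observed under the logging policy
p_log = np.full(1000, 0.5)                   # logging-policy probability of the action taken
p_tgt = rng.uniform(0.2, 0.8, size=1000)     # target-policy probability of the same action
print(ips_value_estimate(rewards, p_tgt, p_log))
```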