January 31, 2020

3234 words 16 mins read

Paper Group AWR 435

Updates of Equilibrium Prop Match Gradients of Backprop Through Time in an RNN with Static Input. Integrating Text and Image: Determining Multimodal Document Intent in Instagram Posts. Analysing Mathematical Reasoning Abilities of Neural Models. A Layer-Based Sequential Framework for Scene Generation with GANs. HOList: An Environment for Machine Le …

Updates of Equilibrium Prop Match Gradients of Backprop Through Time in an RNN with Static Input

Title Updates of Equilibrium Prop Match Gradients of Backprop Through Time in an RNN with Static Input
Authors Maxence Ernoult, Julie Grollier, Damien Querlioz, Yoshua Bengio, Benjamin Scellier
Abstract Equilibrium Propagation (EP) is a biologically inspired learning algorithm for convergent recurrent neural networks, i.e. RNNs that are fed by a static input x and settle to a steady state. Training convergent RNNs consists in adjusting the weights until the steady state of output neurons coincides with a target y. Convergent RNNs can also be trained with the more conventional Backpropagation Through Time (BPTT) algorithm. In its original formulation EP was described in the case of real-time neuronal dynamics, which is computationally costly. In this work, we introduce a discrete-time version of EP with simplified equations and with reduced simulation time, bringing EP closer to practical machine learning tasks. We first prove theoretically, as well as numerically, that the neural and weight updates of EP, computed by forward-time dynamics, are step-by-step equal to the ones obtained by BPTT, with gradients computed backward in time. The equality is strict when the transition function of the dynamics derives from a primitive function and the steady state is maintained long enough. We then show for more standard discrete-time neural network dynamics that the same property is approximately respected and we subsequently demonstrate training with EP with equivalent performance to BPTT. In particular, we define the first convolutional architecture trained with EP achieving ~ 1% test error on MNIST, which is the lowest error reported with EP. These results can guide the development of deep neural networks trained with EP.
Tasks
Published 2019-05-31
URL https://arxiv.org/abs/1905.13633v1
PDF https://arxiv.org/pdf/1905.13633v1.pdf
PWC https://paperswithcode.com/paper/updates-of-equilibrium-prop-match-gradients
Repo https://github.com/ernoult/EP-NEURIPS
Framework pytorch
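
The core mechanic of the paper above is the two-phase contrastive update of Equilibrium Propagation. Below is a minimal numpy sketch of that idea under simplifying assumptions of my own (a single symmetric layer, hard-sigmoid activation, and nudging every unit toward a toy target rather than only output units); it is not the authors' code, which lives in the linked repository.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = lambda s: np.clip(s, 0.0, 1.0)             # hard-sigmoid activation, common in EP work

n_in, n_hid = 5, 8                               # toy sizes (illustrative)
W = rng.normal(scale=0.1, size=(n_hid, n_hid))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)                         # symmetric weights, no self-connections
Wx = rng.normal(scale=0.1, size=(n_hid, n_in))
x = rng.random(n_in)                             # static input
y = rng.random(n_hid)                            # toy target

def relax(s, beta=0.0, target=None, steps=100):
    """Iterate the discrete-time dynamics to an (approximate) steady state."""
    for _ in range(steps):
        s = rho(W @ rho(s) + Wx @ x)
        if beta != 0.0:
            s = s + beta * (target - s)          # weak nudging toward the target
    return s

beta = 0.1
s_free = relax(np.zeros(n_hid))                  # first phase: free relaxation
s_nudged = relax(s_free, beta=beta, target=y)    # second phase: nudged relaxation

# EP weight update: contrast of local correlations between the two steady states
dW = (np.outer(rho(s_nudged), rho(s_nudged)) - np.outer(rho(s_free), rho(s_free))) / beta
W += 0.05 * (dW + dW.T) / 2                      # small learning rate, keep W symmetric
```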

Integrating Text and Image: Determining Multimodal Document Intent in Instagram Posts

Title Integrating Text and Image: Determining Multimodal Document Intent in Instagram Posts
Authors Julia Kruk, Jonah Lubin, Karan Sikka, Xiao Lin, Dan Jurafsky, Ajay Divakaran
Abstract Computing author intent from multimodal data like Instagram posts requires modeling a complex relationship between text and image. For example, a caption might evoke an ironic contrast with the image, so neither caption nor image is a mere transcript of the other. Instead they combine – via what has been called meaning multiplication – to create a new meaning that has a more complex relation to the literal meanings of text and image. Here we introduce a multimodal dataset of 1299 Instagram posts labeled for three orthogonal taxonomies: the authorial intent behind the image-caption pair, the contextual relationship between the literal meanings of the image and caption, and the semiotic relationship between the signified meanings of the image and caption. We build a baseline deep multimodal classifier to validate the taxonomy, showing that employing both text and image improves intent detection by 9.6% compared to using only the image modality, demonstrating the commonality of non-intersective meaning multiplication. The gain with multimodality is greatest when the image and caption diverge semiotically. Our dataset offers a new resource for the study of the rich meanings that result from pairing text and image.
Tasks Intent Detection
Published 2019-04-19
URL https://arxiv.org/abs/1904.09073v3
PDF https://arxiv.org/pdf/1904.09073v3.pdf
PWC https://paperswithcode.com/paper/integrating-text-and-image-determining
Repo https://github.com/karansikka1/documentIntent_emnlp19
Framework none
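
The baseline in the paper fuses image and caption features to predict intent. The sketch below is a generic late-fusion classifier along those lines, not the authors' model: the feature dimensions (a ResNet-style 2048-d image vector, a 768-d text encoding) and the number of intent classes are placeholder assumptions.

```python
import torch
import torch.nn as nn

class LateFusionIntentClassifier(nn.Module):
    """Toy fusion classifier: project pooled image and text features, concatenate, classify."""
    def __init__(self, img_dim=2048, txt_dim=768, n_intents=8):  # n_intents is a placeholder
        super().__init__()
        self.proj_img = nn.Linear(img_dim, 256)
        self.proj_txt = nn.Linear(txt_dim, 256)
        self.head = nn.Sequential(nn.ReLU(), nn.Dropout(0.3), nn.Linear(512, n_intents))

    def forward(self, img_feat, txt_feat):
        z = torch.cat([self.proj_img(img_feat), self.proj_txt(txt_feat)], dim=-1)
        return self.head(z)

model = LateFusionIntentClassifier()
logits = model(torch.randn(4, 2048), torch.randn(4, 768))   # stand-ins for encoder outputs
```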

Analysing Mathematical Reasoning Abilities of Neural Models

Title Analysing Mathematical Reasoning Abilities of Neural Models
Authors David Saxton, Edward Grefenstette, Felix Hill, Pushmeet Kohli
Abstract Mathematical reasoning—a core ability within human intelligence—presents some unique challenges as a domain: we do not come to understand and solve mathematical problems primarily on the back of experience and evidence, but on the basis of inferring, learning, and exploiting laws, axioms, and symbol manipulation rules. In this paper, we present a new challenge for the evaluation (and eventually the design) of neural architectures and similar systems, developing a task suite of mathematics problems involving sequential questions and answers in a free-form textual input/output format. The structured nature of the mathematics domain, covering arithmetic, algebra, probability and calculus, enables the construction of training and test splits designed to clearly illuminate the capabilities and failure-modes of different architectures, as well as evaluate their ability to compose and relate knowledge and learned processes. Having described the data generation process and its potential future expansions, we conduct a comprehensive analysis of models from two broad classes of the most powerful sequence-to-sequence architectures and find notable differences in their ability to resolve mathematical problems and generalize their knowledge.
Tasks Mathematical Question Answering, Math Word Problem Solving
Published 2019-04-02
URL http://arxiv.org/abs/1904.01557v1
PDF http://arxiv.org/pdf/1904.01557v1.pdf
PWC https://paperswithcode.com/paper/analysing-mathematical-reasoning-abilities-of-1
Repo https://github.com/berniwal/DeepLearningProject
Framework tf
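
The dataset consists of free-form textual questions and answers generated module by module. The toy generator below only illustrates that input/output format for an arithmetic-style module; it is not the paper's data-generation code.

```python
import random

def make_arithmetic_qa(rng):
    """Produce one free-form arithmetic question/answer pair as plain text."""
    a, b = rng.randint(-50, 50), rng.randint(-50, 50)
    op, value = rng.choice([("+", a + b), ("-", a - b), ("*", a * b)])
    return f"What is {a} {op} {b}?", str(value)

rng = random.Random(0)
for _ in range(3):
    question, answer = make_arithmetic_qa(rng)
    print(question, "->", answer)
```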

A Layer-Based Sequential Framework for Scene Generation with GANs

Title A Layer-Based Sequential Framework for Scene Generation with GANs
Authors Mehmet Ozgur Turkoglu, William Thong, Luuk Spreeuwers, Berkay Kicanaoglu
Abstract The visual world we sense, interpret, and interact with every day is a complex composition of interleaved physical entities. Therefore, it is a very challenging task to generate vivid scenes of similar complexity using computers. In this work, we present a scene generation framework based on Generative Adversarial Networks (GANs) to sequentially compose a scene, breaking down the underlying problem into smaller ones. Unlike existing approaches, our framework offers explicit control over the elements of a scene through separate background and foreground generators. Starting with an initially generated background, foreground objects then populate the scene one-by-one in a sequential manner. Via quantitative and qualitative experiments on a subset of the MS-COCO dataset, we show that our proposed framework produces not only more diverse images but also copes better with affine transformations and occlusion artifacts of foreground objects than its counterparts.
Tasks Conditional Image Generation, Image Generation, Scene Generation
Published 2019-02-02
URL http://arxiv.org/abs/1902.00671v1
PDF http://arxiv.org/pdf/1902.00671v1.pdf
PWC https://paperswithcode.com/paper/a-layer-based-sequential-framework-for-scene
Repo https://github.com/0zgur0/Seq_Scene_Gen
Framework pytorch
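
The framework composes a scene sequentially: generate a background, then add foreground objects one at a time. The sketch below illustrates that compositing loop with toy stand-in generators (simple linear layers producing an RGB image and an RGB+alpha patch); the paper's generators are full convolutional GANs with explicit control, so treat this purely as a structural illustration.

```python
import torch
import torch.nn as nn

class ToyBackgroundGenerator(nn.Module):
    def __init__(self, z_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim, 3 * 64 * 64), nn.Tanh())
    def forward(self, z):
        return self.net(z).view(-1, 3, 64, 64)

class ToyForegroundGenerator(nn.Module):
    """Produces an RGB patch plus an alpha mask used to composite an object onto the canvas."""
    def __init__(self, z_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim, 4 * 64 * 64), nn.Tanh())
    def forward(self, z):
        out = self.net(z).view(-1, 4, 64, 64)
        rgb, alpha = out[:, :3], (out[:, 3:] + 1) / 2     # alpha in [0, 1]
        return rgb, alpha

bg_gen, fg_gen = ToyBackgroundGenerator(), ToyForegroundGenerator()
canvas = bg_gen(torch.randn(1, 64))                       # start from a generated background
for _ in range(3):                                        # add foreground objects one by one
    rgb, alpha = fg_gen(torch.randn(1, 64))
    canvas = alpha * rgb + (1 - alpha) * canvas           # alpha-composite onto the scene
```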

HOList: An Environment for Machine Learning of Higher-Order Theorem Proving

Title HOList: An Environment for Machine Learning of Higher-Order Theorem Proving
Authors Kshitij Bansal, Sarah M. Loos, Markus N. Rabe, Christian Szegedy, Stewart Wilcox
Abstract We present an environment, benchmark, and deep learning driven automated theorem prover for higher-order logic. Higher-order interactive theorem provers enable the formalization of arbitrary mathematical theories and thereby present an interesting, open-ended challenge for deep learning. We provide an open-source framework based on the HOL Light theorem prover that can be used as a reinforcement learning environment. HOL Light comes with a broad coverage of basic mathematical theorems on calculus and the formal proof of the Kepler conjecture, from which we derive a challenging benchmark for automated reasoning. We also present a deep reinforcement learning driven automated theorem prover, DeepHOL, with strong initial results on this benchmark.
Tasks Automated Theorem Proving
Published 2019-04-05
URL https://arxiv.org/abs/1904.03241v3
PDF https://arxiv.org/pdf/1904.03241v3.pdf
PWC https://paperswithcode.com/paper/holist-an-environment-for-machine-learning-of
Repo https://github.com/Kerram/holist-train
Framework tf

Multi-Agent Pathfinding with Continuous Time

Title Multi-Agent Pathfinding with Continuous Time
Authors Anton Andreychuk, Konstantin Yakovlev, Dor Atzmon, Roni Stern
Abstract Multi-Agent Pathfinding (MAPF) is the problem of finding paths for multiple agents such that every agent reaches its goal and the agents do not collide. Most prior work on MAPF was on grids, assumed that agents' actions have uniform duration, and that time is discretized into timesteps. We propose a MAPF algorithm that does not rely on these assumptions, is complete, and provides provably optimal solutions. This algorithm is based on a novel adaptation of Safe Interval Path Planning (SIPP), a continuous-time single-agent planning algorithm, and a modified version of Conflict-Based Search (CBS), a state-of-the-art multi-agent pathfinding algorithm. We analyze this algorithm, discuss its pros and cons, and evaluate it experimentally on several standard benchmarks.
Tasks
Published 2019-01-16
URL https://arxiv.org/abs/1901.05506v3
PDF https://arxiv.org/pdf/1901.05506v3.pdf
PWC https://paperswithcode.com/paper/multi-agent-pathfinding-mapf-with-continuous
Repo https://github.com/PathPlanning/Continuous-CBS
Framework none
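
A building block of SIPP-style planning is turning the time intervals during which a location is blocked into the complementary safe intervals. The helper below is a small sketch of that step under the assumption that blocked intervals are sorted and non-overlapping; it is not taken from the authors' implementation.

```python
def safe_intervals(blocked, horizon=float("inf")):
    """Return the safe (unblocked) time intervals that complement `blocked` up to `horizon`."""
    intervals, t = [], 0.0
    for start, end in blocked:
        if start > t:
            intervals.append((t, start))
        t = max(t, end)
    if t < horizon:
        intervals.append((t, horizon))
    return intervals

print(safe_intervals([(2.0, 3.5), (6.0, 7.0)]))   # [(0.0, 2.0), (3.5, 6.0), (7.0, inf)]
```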

Overcomplete Independent Component Analysis via SDP

Title Overcomplete Independent Component Analysis via SDP
Authors Anastasia Podosinnikova, Amelia Perry, Alexander Wein, Francis Bach, Alexandre d’Aspremont, David Sontag
Abstract We present a novel algorithm for overcomplete independent component analysis (ICA), where the number of latent sources k exceeds the dimension p of observed variables. Previous algorithms either suffer from high computational complexity or make strong assumptions about the form of the mixing matrix. Our algorithm does not make any sparsity assumption yet enjoys favorable computational and theoretical properties. Our algorithm consists of two main steps: (a) estimation of the Hessians of the cumulant generating function (as opposed to the fourth and higher order cumulants used by most algorithms) and (b) a novel semi-definite programming (SDP) relaxation for recovering a mixing component. We show that this relaxation can be efficiently solved with a projected accelerated gradient descent method, which makes the whole algorithm computationally practical. Moreover, we conjecture that the proposed program recovers a mixing component at the rate k < p^2/4 and prove that a mixing component can be recovered with high probability when k < (2 - epsilon) p log p, provided the original components are sampled uniformly at random on the hypersphere. Experiments are provided on synthetic data and the CIFAR-10 dataset of real images.
Tasks
Published 2019-01-24
URL http://arxiv.org/abs/1901.08334v1
PDF http://arxiv.org/pdf/1901.08334v1.pdf
PWC https://paperswithcode.com/paper/overcomplete-independent-component-analysis
Repo https://github.com/anastasia-podosinnikova/oica
Framework none
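
Step (a) of the algorithm estimates Hessians of the cumulant generating function, which act as generalized covariance matrices. The numpy sketch below computes one such Hessian from samples at a chosen point t; the SDP relaxation of step (b) is not shown, and the variable names are mine.

```python
import numpy as np

def cgf_hessian(X, t):
    """Hessian of the empirical CGF K(t) = log mean(exp(X @ t)): the covariance of the rows
    of X under exponentially tilted weights proportional to exp(X @ t)."""
    logits = X @ t
    logits -= logits.max()                      # numerical stability
    w = np.exp(logits)
    w /= w.sum()
    mean = w @ X
    centered = X - mean
    return (centered * w[:, None]).T @ centered

X = np.random.default_rng(0).normal(size=(1000, 5))   # toy observations, shape (n, p)
H = cgf_hessian(X, 0.1 * np.ones(5))                  # one generalized covariance matrix
```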

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

Title SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
Authors Daniel S. Park, William Chan, Yu Zhang, Chung-Cheng Chiu, Barret Zoph, Ekin D. Cubuk, Quoc V. Le
Abstract We present SpecAugment, a simple data augmentation method for speech recognition. SpecAugment is applied directly to the feature inputs of a neural network (i.e., filter bank coefficients). The augmentation policy consists of warping the features, masking blocks of frequency channels, and masking blocks of time steps. We apply SpecAugment on Listen, Attend and Spell networks for end-to-end speech recognition tasks. We achieve state-of-the-art performance on the LibriSpeech 960h and Switchboard 300h tasks, outperforming all prior work. On LibriSpeech, we achieve 6.8% WER on test-other without the use of a language model, and 5.8% WER with shallow fusion with a language model. This compares to the previous state-of-the-art hybrid system of 7.5% WER. For Switchboard, we achieve 7.2%/14.6% on the Switchboard/CallHome portion of the Hub5’00 test set without the use of a language model, and 6.8%/14.1% with shallow fusion, which compares to the previous state-of-the-art hybrid system at 8.3%/17.3% WER.
Tasks Data Augmentation, End-To-End Speech Recognition, Language Modelling, Speech Recognition
Published 2019-04-18
URL https://arxiv.org/abs/1904.08779v3
PDF https://arxiv.org/pdf/1904.08779v3.pdf
PWC https://paperswithcode.com/paper/specaugment-a-simple-data-augmentation-method
Repo https://github.com/DemisEom/SpecAugment
Framework pytorch
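
SpecAugment's masking steps are easy to reproduce directly on a log-mel spectrogram. The numpy sketch below applies frequency and time masking in that spirit; time warping is omitted, and the mask-size parameters are only loosely modeled on the policies reported in the paper.

```python
import numpy as np

def spec_augment(spec, num_freq_masks=2, num_time_masks=2, F=27, T=100, rng=None):
    """Zero out random frequency bands and time spans of a (time, freq) spectrogram copy."""
    rng = rng or np.random.default_rng()
    spec = spec.copy()
    n_t, n_f = spec.shape
    for _ in range(num_freq_masks):
        f = int(rng.integers(0, F + 1))               # mask width in mel channels
        f0 = int(rng.integers(0, max(1, n_f - f)))
        spec[:, f0:f0 + f] = 0.0
    for _ in range(num_time_masks):
        t = int(rng.integers(0, min(T, n_t) + 1))     # mask length in frames
        t0 = int(rng.integers(0, max(1, n_t - t)))
        spec[t0:t0 + t, :] = 0.0
    return spec

augmented = spec_augment(np.random.rand(300, 80))     # 300 frames x 80 mel bins
```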

RWTH ASR Systems for LibriSpeech: Hybrid vs Attention – w/o Data Augmentation

Title RWTH ASR Systems for LibriSpeech: Hybrid vs Attention – w/o Data Augmentation
Authors Christoph Lüscher, Eugen Beck, Kazuki Irie, Markus Kitza, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney
Abstract We present state-of-the-art automatic speech recognition (ASR) systems employing a standard hybrid DNN/HMM architecture compared to an attention-based encoder-decoder design for the LibriSpeech task. Detailed descriptions of the system development, including model design, pretraining schemes, training schedules, and optimization approaches are provided for both system architectures. Both hybrid DNN/HMM and attention-based systems employ bi-directional LSTMs for acoustic modeling/encoding. For language modeling, we employ both LSTM and Transformer based architectures. All our systems are built using RWTH's open-source toolkits RASR and RETURNN. To the best of the authors' knowledge, the results obtained when training on the full LibriSpeech training set are the best published to date, both for the hybrid DNN/HMM and the attention-based systems. Our single hybrid system even outperforms previous results obtained from combining eight single systems. Our comparison shows that on the LibriSpeech 960h task, the hybrid DNN/HMM system outperforms the attention-based system by 15% relative on the clean and 40% relative on the other test sets in terms of word error rate. Moreover, experiments on a reduced 100h-subset of the LibriSpeech training corpus even show a more pronounced margin between the hybrid DNN/HMM and attention-based architectures.
Tasks End-To-End Speech Recognition, Language Modelling, Speech Recognition
Published 2019-05-08
URL https://arxiv.org/abs/1905.03072v3
PDF https://arxiv.org/pdf/1905.03072v3.pdf
PWC https://paperswithcode.com/paper/rwth-asr-systems-for-librispeech-hybrid-vs
Repo https://github.com/rwth-i6/returnn-experiments/tree/master/2019-librispeech-system
Framework none

Efficient and Accurate Estimation of Lipschitz Constants for Deep Neural Networks

Title Efficient and Accurate Estimation of Lipschitz Constants for Deep Neural Networks
Authors Mahyar Fazlyab, Alexander Robey, Hamed Hassani, Manfred Morari, George J. Pappas
Abstract Tight estimation of the Lipschitz constant for deep neural networks (DNNs) is useful in many applications ranging from robustness certification of classifiers to stability analysis of closed-loop systems with reinforcement learning controllers. Existing methods in the literature for estimating the Lipschitz constant suffer from either lack of accuracy or poor scalability. In this paper, we present a convex optimization framework to compute guaranteed upper bounds on the Lipschitz constant of DNNs both accurately and efficiently. Our main idea is to interpret activation functions as gradients of convex potential functions. Hence, they satisfy certain properties that can be described by quadratic constraints. This particular description allows us to pose the Lipschitz constant estimation problem as a semidefinite program (SDP). The resulting SDP can be adapted to increase either the estimation accuracy (by capturing the interaction between activation functions of different layers) or scalability (by decomposition and parallel implementation). We illustrate the utility of our approach with a variety of experiments on randomly generated networks and on classifiers trained on the MNIST and Iris datasets. In particular, we experimentally demonstrate that our Lipschitz bounds are the most accurate compared to those in the literature. We also study the impact of adversarial training methods on the Lipschitz bounds of the resulting classifiers and show that our bounds can be used to efficiently provide robustness guarantees.
Tasks
Published 2019-06-12
URL https://arxiv.org/abs/1906.04893v1
PDF https://arxiv.org/pdf/1906.04893v1.pdf
PWC https://paperswithcode.com/paper/efficient-and-accurate-estimation-of
Repo https://github.com/arobey1/LipSDP
Framework pytorch
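
For context, the loose bound that SDP-based estimates like the one above improve on is the product of the layers' spectral norms (valid for 1-Lipschitz activations such as ReLU). The snippet below computes that naive baseline only; it does not implement the paper's SDP program.

```python
import numpy as np

def naive_lipschitz_bound(weight_matrices):
    """Trivial L2 Lipschitz upper bound for a feed-forward ReLU network:
    the product of the largest singular values of the layer weight matrices."""
    bound = 1.0
    for W in weight_matrices:
        bound *= np.linalg.norm(W, 2)
    return bound

rng = np.random.default_rng(0)
layers = [rng.normal(size=(64, 32)), rng.normal(size=(32, 64)), rng.normal(size=(10, 32))]
print(naive_lipschitz_bound(layers))   # typically far above the true Lipschitz constant
```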

Dynamic data fusion using multi-input models for malware classification

Title Dynamic data fusion using multi-input models for malware classification
Authors Viktor Zenkov, Jason Laska
Abstract Criminals use malware to disrupt cyber-systems. The number of these malware-vulnerable systems is increasing quickly as common systems, such as vehicles, routers, and lightbulbs, become increasingly interconnected cyber-systems. To address the scale of this problem, analysts divide malware into classes and develop, for each class, a specialized defense. In this project we classified malware with machine learning. In particular, we used a supervised multi-class long short-term memory (LSTM) model. We trained the algorithm with thousands of malware files annotated with class labels (the training set), and the algorithm learned patterns indicative of each class. We used disassembled malware files (provided by Microsoft) and separated the constituent data into parsed instructions, which look like human-readable machine code text, and raw bytes, which are hexadecimal values. We are interested in which format, text or hex, is more valuable as input for classification. To solve this, we investigated four cases: a text-only model, a hexadecimal-only model, a multi-input model using both text and hexadecimal inputs, and a model based on combining the individual results. We performed this investigation using the machine learning Python package Keras, which allows easily configurable deep learning architectures and training. We hoped to understand the trade-offs between the different formats. Due to the class imbalance in the data, we used multiple methods to compare the formats, using test accuracies, balanced accuracies (taking into account weights of classes), and an accuracy derived from tables of confusion. We found that the multi-input model, which allows learning on both input types simultaneously, resulted in the best performance. Our finding expedites malware classification research by providing researchers with a suitable deep learning architecture to train a tailored version on their malware.
Tasks Malware Classification
Published 2019-09-21
URL https://arxiv.org/abs/1910.02021v1
PDF https://arxiv.org/pdf/1910.02021v1.pdf
PWC https://paperswithcode.com/paper/dynamic-data-fusion-using-multi-input-models
Repo https://github.com/viktorZenkov/MalwareClassification
Framework tf
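
The multi-input idea maps naturally onto the Keras functional API: one branch consumes tokenized instruction text, the other raw byte values, and the two LSTM encodings are concatenated before classification. The sketch below is a minimal version under assumptions of my own (sequence length, vocabulary size, embedding widths); the nine-class output follows the Microsoft malware dataset mentioned above.

```python
from tensorflow.keras import layers, Model

text_in = layers.Input(shape=(500,), dtype="int32", name="instructions")   # token ids
byte_in = layers.Input(shape=(500,), dtype="int32", name="raw_bytes")      # byte values 0-255

t = layers.Embedding(input_dim=5000, output_dim=64)(text_in)
t = layers.LSTM(64)(t)

b = layers.Embedding(input_dim=256, output_dim=32)(byte_in)
b = layers.LSTM(64)(b)

merged = layers.concatenate([t, b])                    # learn on both input types simultaneously
out = layers.Dense(9, activation="softmax")(merged)    # nine malware families

model = Model(inputs=[text_in, byte_in], outputs=out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```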

A Study into Echocardiography View Conversion

Title A Study into Echocardiography View Conversion
Authors Amir H. Abdi, Mohammad H. Jafari, Sidney Fels, Theresa Tsang, Purang Abolmaesumi
Abstract Transthoracic echo is one of the most common means of cardiac study in clinical routine. During the echo exam, the sonographer captures a set of standard cross sections (echo views) of the heart. Each 2D echo view cuts through the 3D cardiac geometry via a unique plane. Consequently, different views share some limited information. In this work, we investigate the feasibility of generating a 2D echo view using another view based on adversarial generative models. The objective optimized to train the view-conversion model is based on the ideas introduced by LSGAN, PatchGAN, and Conditional GAN (cGAN). The size and length of the left ventricle in the generated target echo view are compared against those of the target ground truth to assess the validity of the echo view conversion. Results show that there is a correlation of 0.50 between the LV areas and 0.49 between the LV lengths of the generated target frames and the real target frames.
Tasks
Published 2019-12-05
URL https://arxiv.org/abs/1912.03120v1
PDF https://arxiv.org/pdf/1912.03120v1.pdf
PWC https://paperswithcode.com/paper/a-study-into-echocardiography-view-conversion
Repo https://github.com/amir-abdi/echo-view2view
Framework none
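
The view-conversion objective combines LSGAN, PatchGAN, and cGAN ideas. The snippet below only spells out the least-squares GAN losses in PyTorch as a minimal sketch; conditioning on the source view and the patch-based discriminator are left out.

```python
import torch

def lsgan_d_loss(d_real, d_fake):
    """Least-squares discriminator loss: push real outputs toward 1 and fake outputs toward 0."""
    return 0.5 * ((d_real - 1).pow(2).mean() + d_fake.pow(2).mean())

def lsgan_g_loss(d_fake):
    """Least-squares generator loss: push discriminator outputs on fakes toward 1."""
    return 0.5 * (d_fake - 1).pow(2).mean()

# example with dummy discriminator outputs for a batch of 8 patches
print(lsgan_d_loss(torch.rand(8), torch.rand(8)), lsgan_g_loss(torch.rand(8)))
```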

Detecting Kissing Scenes in a Database of Hollywood Films

Title Detecting Kissing Scenes in a Database of Hollywood Films
Authors Amir Ziai
Abstract Detecting scene types in a movie can be very useful for applications such as video editing, ratings assignment, and personalization. We propose a system for detecting kissing scenes in a movie. This system consists of two components. The first component is a binary classifier that predicts a binary label (i.e. kissing or not) given features extracted from both the still frames and audio waves of a one-second segment. The second component aggregates the binary labels for contiguous non-overlapping segments into a set of kissing scenes. We experimented with a variety of 2D and 3D convolutional architectures such as ResNet, DenseNet, and VGGish and developed a highly accurate kissing detector that achieves a validation F1 score of 0.95 on a diverse database of Hollywood films spanning many genres and multiple decades. The code for this project is available at http://github.com/amirziai/kissing-detector.
Tasks Kiss Detection
Published 2019-06-05
URL https://arxiv.org/abs/1906.01843v1
PDF https://arxiv.org/pdf/1906.01843v1.pdf
PWC https://paperswithcode.com/paper/detecting-kissing-scenes-in-a-database-of
Repo https://github.com/amirziai/kissing-detector
Framework pytorch
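
The second component of the system merges per-second binary predictions into contiguous scenes. The helper below is one straightforward way to do that aggregation; the minimum-length threshold is an assumption of mine, not a detail from the paper.

```python
def segments_to_scenes(labels, min_len=3):
    """Merge per-second binary predictions into (start, end) scenes, dropping short runs."""
    scenes, start = [], None
    for i, lab in enumerate(labels):
        if lab and start is None:
            start = i
        elif not lab and start is not None:
            if i - start >= min_len:
                scenes.append((start, i))
            start = None
    if start is not None and len(labels) - start >= min_len:
        scenes.append((start, len(labels)))
    return scenes

print(segments_to_scenes([0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1]))   # [(1, 4), (8, 12)]
```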

Interactive Sketch & Fill: Multiclass Sketch-to-Image Translation

Title Interactive Sketch & Fill: Multiclass Sketch-to-Image Translation
Authors Arnab Ghosh, Richard Zhang, Puneet K. Dokania, Oliver Wang, Alexei A. Efros, Philip H. S. Torr, Eli Shechtman
Abstract We propose an interactive GAN-based sketch-to-image translation method that helps novice users create images of simple objects. As the user starts to draw a sketch of a desired object type, the network interactively recommends plausible completions, and shows a corresponding synthesized image to the user. This enables a feedback loop, where the user can edit their sketch based on the network’s recommendations, visualizing both the completed shape and final rendered image while they draw. In order to use a single trained model across a wide array of object classes, we introduce a gating-based approach for class conditioning, which allows us to generate distinct classes without feature mixing, from a single generator network. Video available at our website: https://arnabgho.github.io/iSketchNFill/.
Tasks
Published 2019-09-24
URL https://arxiv.org/abs/1909.11081v2
PDF https://arxiv.org/pdf/1909.11081v2.pdf
PWC https://paperswithcode.com/paper/interactive-sketch-fill-multiclass-sketch-to
Repo https://github.com/arnabgho/iSketchNFill
Framework pytorch
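
The gating-based class conditioning can be illustrated with a small PyTorch module in which each class owns a learned soft gate over a layer's output channels, so different classes activate different feature subsets of one shared generator. This is a toy rendering of the idea, not the paper's exact mechanism.

```python
import torch
import torch.nn as nn

class ClassGatedConv(nn.Module):
    """Conv layer whose output channels are modulated by a per-class learned gate vector."""
    def __init__(self, in_ch, out_ch, n_classes):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.gates = nn.Embedding(n_classes, out_ch)        # one gate vector per class

    def forward(self, x, class_idx):
        g = torch.sigmoid(self.gates(class_idx))            # (B, out_ch) soft gates in [0, 1]
        return self.conv(x) * g[:, :, None, None]

layer = ClassGatedConv(3, 16, n_classes=10)
y = layer(torch.randn(2, 3, 32, 32), torch.tensor([1, 7]))  # -> shape (2, 16, 32, 32)
```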

Empirical Likelihood for Contextual Bandits

Title Empirical Likelihood for Contextual Bandits
Authors Nikos Karampatziakis, John Langford, Paul Mineiro
Abstract We apply empirical likelihood techniques to contextual bandit policy value estimation, confidence intervals, and learning. We propose a tighter estimator for off-policy evaluation with improved statistical performance over previous proposals. Coupled with this estimator is a confidence interval which also improves over previous proposals. We then harness these to improve learning from contextual bandit data. Each of these is empirically evaluated to show good performance against strong baselines in finite sample regimes.
Tasks Multi-Armed Bandits
Published 2019-06-07
URL https://arxiv.org/abs/1906.03323v3
PDF https://arxiv.org/pdf/1906.03323v3.pdf
PWC https://paperswithcode.com/paper/empirical-likelihood-for-contextual-bandits
Repo https://github.com/pmineiro/elfcb
Framework none
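
As a point of reference for the estimator discussed above, the classical baseline for off-policy value estimation is inverse propensity scoring (IPS). The snippet below implements that baseline only, on synthetic logged data; it is not the paper's empirical-likelihood estimator.

```python
import numpy as np

def ips_value_estimate(rewards, target_probs, logging_probs):
    """Inverse-propensity-scoring estimate of the target policy's value from logged data."""
    weights = target_probs / logging_probs
    return float(np.mean(weights * rewards))

rng = np.random.default_rng(0)
rewards = rng.binomial(1, 0.3, size=1000)    # rewards observed under the logging policy
p_log = np.full(1000, 0.5)                   # logging-policy probability of the action taken
p_tgt = rng.uniform(0.2, 0.8, size=1000)     # target-policy probability of the same action
print(ips_value_estimate(rewards, p_tgt, p_log))
```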