Paper Group ANR 376
- Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text
- MuST-Cinema: a Speech-to-Subtitles corpus
- Momentum Improves Normalized SGD
- Accessing Higher-level Representations in Sequential Transformers with Feedback Memory
- Temporally Folded Convolutional Neural Networks for Sequence Forecasting
- Guider l’attention dans les modèles de séquence à séquence pour la prédiction des actes de dialogue
- Cross-modal Multi-task Learning for Graphic Recognition of Caricature Face
- Scalable Multi-Agent Inverse Reinforcement Learning via Actor-Attention-Critic
- Ensemble learning in CNN augmented with fully connected subnetworks
- An improved online learning algorithm for general fuzzy min-max neural network
- CosmoVAE: Variational Autoencoder for CMB Image Inpainting
- ABC-LMPC: Safe Sample-Based Learning MPC for Stochastic Nonlinear Dynamical Systems with Adjustable Boundary Conditions
- t-viSNE: Interactive Assessment and Interpretation of t-SNE Projections
- Dependently Typed Knowledge Graphs
- Analysis of Greenhouse Gases
Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text
Title | Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text |
Authors | Difei Gao, Ke Li, Ruiping Wang, Shiguang Shan, Xilin Chen |
Abstract | Answering questions that require reading text in an image is challenging for current models. One key difficulty of this task is that rare, polysemous, and ambiguous words frequently appear in images, e.g., names of places, products, and sports teams. To overcome this difficulty, resorting only to pre-trained word embedding models is far from enough. A desired model should utilize the rich information in multiple modalities of the image to help understand the meaning of scene texts, e.g., the prominent text on a bottle is most likely to be the brand. Following this idea, we propose a novel VQA approach, Multi-Modal Graph Neural Network (MM-GNN). It first represents an image as a graph consisting of three sub-graphs, depicting the visual, semantic, and numeric modalities respectively. Then, we introduce three aggregators which guide the message passing from one graph to another to exploit the contexts in various modalities, so as to refine the features of nodes. The updated nodes have better features for the downstream question answering module. Experimental evaluations show that our MM-GNN represents scene texts better and clearly improves performance on two VQA tasks that require reading scene text. |
Tasks | Question Answering, Visual Question Answering |
Published | 2020-03-31 |
URL | https://arxiv.org/abs/2003.13962v1 |
PDF | https://arxiv.org/pdf/2003.13962v1.pdf |
PWC | https://paperswithcode.com/paper/multi-modal-graph-neural-network-for-joint |
Repo | |
Framework | |
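
The aggregators at the heart of MM-GNN pass messages from one modality's sub-graph to another to refine node features. Below is a minimal sketch of one such cross-modal aggregation step, written under my own assumptions (single-head attention, arbitrary dimensions); it illustrates the idea, not the authors' implementation.

```python
# Hedged sketch: attention-guided message passing from a source-modality
# sub-graph into a target-modality sub-graph, MM-GNN style.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalAggregator(nn.Module):
    """Refines target-modality node features with messages from a source modality."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)   # projects target nodes to queries
        self.k = nn.Linear(dim, dim)   # projects source nodes to keys
        self.v = nn.Linear(dim, dim)   # projects source nodes to messages
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, target, source):
        # target: (n_t, dim), e.g. scene-text nodes; source: (n_s, dim), e.g. visual nodes
        attn = F.softmax(self.q(target) @ self.k(source).T / target.size(-1) ** 0.5, dim=-1)
        messages = attn @ self.v(source)                 # (n_t, dim) aggregated context
        return torch.relu(self.update(torch.cat([target, messages], dim=-1)))

# Example: refine 5 scene-text nodes with context from 12 visual nodes.
agg = CrossModalAggregator(dim=64)
refined = agg(torch.randn(5, 64), torch.randn(12, 64))
```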
MuST-Cinema: a Speech-to-Subtitles corpus
Title | MuST-Cinema: a Speech-to-Subtitles corpus |
Authors | Alina Karakanta, Matteo Negri, Marco Turchi |
Abstract | Growing needs for localising audiovisual content in multiple languages through subtitles call for the development of automatic solutions for human subtitling. Neural Machine Translation (NMT) can contribute to the automation of subtitling, facilitating the work of human subtitlers and reducing turn-around times and related costs. NMT requires high-quality, large, task-specific training data. The existing subtitling corpora, however, are missing both alignments to the source language audio and important information about subtitle breaks. This poses a significant limitation for developing efficient automatic approaches for subtitling, since the length and form of a subtitle directly depend on the duration of the utterance. In this work, we present MuST-Cinema, a multilingual speech translation corpus built from TED subtitles. The corpus comprises (audio, transcription, translation) triplets. Subtitle breaks are preserved by inserting special symbols. We show that the corpus can be used to build models that efficiently segment sentences into subtitles, and we propose a method for annotating existing subtitling corpora with subtitle breaks, conforming to length constraints. |
Tasks | Machine Translation |
Published | 2020-02-25 |
URL | https://arxiv.org/abs/2002.10829v1 |
PDF | https://arxiv.org/pdf/2002.10829v1.pdf |
PWC | https://paperswithcode.com/paper/must-cinema-a-speech-to-subtitles-corpus |
Repo | |
Framework | |
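
The corpus preserves subtitle breaks by inserting special symbols into the text: <eol> for a line break within a subtitle block and <eob> for the end of a block. A minimal sketch of this annotation scheme and its inverse (function names are mine):

```python
# Hedged sketch of MuST-Cinema-style break annotation with <eol>/<eob> symbols.
def annotate_subtitles(blocks):
    """blocks: list of subtitle blocks, each a list of display lines."""
    return " ".join(" <eol> ".join(block) + " <eob>" for block in blocks)

def segment(annotated):
    """Inverse: recover display-ready subtitle blocks from the annotated string."""
    blocks = [b.strip() for b in annotated.split("<eob>") if b.strip()]
    return [[line.strip() for line in b.split("<eol>")] for b in blocks]

text = annotate_subtitles([["You never know", "what you will find."], ["Keep looking."]])
# 'You never know <eol> what you will find. <eob> Keep looking. <eob>'
assert segment(text) == [["You never know", "what you will find."], ["Keep looking."]]
```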
Momentum Improves Normalized SGD
Title | Momentum Improves Normalized SGD |
Authors | Ashok Cutkosky, Harsh Mehta |
Abstract | We provide an improved analysis of normalized SGD showing that adding momentum provably removes the need for large batch sizes on non-convex objectives. Then, we consider the case of objectives with bounded second derivative and show that in this case a small tweak to the momentum formula allows normalized SGD with momentum to find an $\epsilon$-critical point in $O(1/\epsilon^{3.5})$ iterations, matching the best-known rates without accruing any logarithmic factors or dependence on dimension. We also provide an adaptive method that automatically improves convergence rates when the variance in the gradients is small. Finally, we show that our method is effective when employed on popular large scale tasks such as ResNet-50 and BERT pretraining, matching the performance of the disparate methods used to get state-of-the-art results on both tasks. |
Tasks | |
Published | 2020-02-09 |
URL | https://arxiv.org/abs/2002.03305v1 |
PDF | https://arxiv.org/pdf/2002.03305v1.pdf |
PWC | https://paperswithcode.com/paper/momentum-improves-normalized-sgd |
Repo | |
Framework | |
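
The method analyzed here is normalized SGD with momentum: gradients are averaged into a momentum buffer, and each update steps in the direction of that buffer normalized to unit length. A minimal sketch, assuming per-tensor normalization (the analysis treats all parameters as one vector) and omitting the paper's second-order tweak:

```python
# Hedged sketch of normalized SGD with momentum:
#   m <- beta * m + (1 - beta) * grad;   x <- x - lr * m / ||m||
import torch

def normalized_sgd_momentum_step(params, momenta, lr=0.01, beta=0.9):
    for p, m in zip(params, momenta):
        m.mul_(beta).add_(p.grad, alpha=1 - beta)   # update momentum buffer
        p.data -= lr * m / (m.norm() + 1e-12)       # unit-norm step along momentum

# Usage on a toy objective f(x) = ||x||^2:
x = torch.randn(10, requires_grad=True)
m = torch.zeros(10)
for _ in range(100):
    loss = (x ** 2).sum()
    loss.backward()
    normalized_sgd_momentum_step([x], [m])
    x.grad = None
```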
Accessing Higher-level Representations in Sequential Transformers with Feedback Memory
Title | Accessing Higher-level Representations in Sequential Transformers with Feedback Memory |
Authors | Angela Fan, Thibaut Lavril, Edouard Grave, Armand Joulin, Sainbayar Sukhbaatar |
Abstract | Transformers are feedforward networks that can process input tokens in parallel. While this parallelization makes them computationally efficient, it prevents the model from fully exploiting the sequential nature of the input: the representation at a given layer can only access representations from lower layers, rather than the higher-level representations already built in previous time steps. In this work, we propose the Feedback Transformer architecture that exposes all previous representations to all future representations, meaning the lowest representation of the current timestep is formed from the highest-level abstract representation of the past. We demonstrate on a variety of benchmarks in language modeling, neural machine translation, summarization, and reinforcement learning that the increased representation capacity can improve over Transformer baselines. |
Tasks | Language Modelling, Machine Translation |
Published | 2020-02-21 |
URL | https://arxiv.org/abs/2002.09402v2 |
PDF | https://arxiv.org/pdf/2002.09402v2.pdf |
PWC | https://paperswithcode.com/paper/accessing-higher-level-representations-in |
Repo | |
Framework | |
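
In the Feedback Transformer, every layer at the current step attends to a single shared memory per past step, formed by mixing that step's representations across all layers, so low layers can see past high-level abstractions. A heavily simplified sketch under my own assumptions (one head, no masking or positional encoding, context computed once per step):

```python
# Hedged sketch of feedback memory: all layers read the same per-step memory,
# and each step writes back a learned mixture of all its layer states.
import torch
import torch.nn as nn

class FeedbackBlock(nn.Module):
    def __init__(self, dim, n_layers):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
            for _ in range(n_layers))
        self.mix = nn.Parameter(torch.zeros(n_layers + 1))  # softmax weights over layer states

    def step(self, x, memory):
        # x: (1, 1, dim) current token; memory: (1, t, dim), one shared vector per past step
        ctx = x if memory is None else torch.cat([memory, x], dim=1)
        states = [x]
        for attn in self.layers:
            x, _ = attn(x, ctx, ctx)          # every layer reads the same shared memory
            states.append(x)
        w = torch.softmax(self.mix, dim=0)
        new_mem = sum(wi * si for wi, si in zip(w, states))  # mix all layers into one vector
        memory = new_mem if memory is None else torch.cat([memory, new_mem], dim=1)
        return x, memory

block, memory = FeedbackBlock(dim=32, n_layers=4), None
for token in torch.randn(10, 1, 1, 32):       # process 10 timesteps sequentially
    out, memory = block.step(token, memory)
```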
Temporally Folded Convolutional Neural Networks for Sequence Forecasting
Title | Temporally Folded Convolutional Neural Networks for Sequence Forecasting |
Authors | Matthias Weissenbacher |
Abstract | In this work we propose a novel approach to utilizing convolutional neural networks for time series forecasting. The time direction of the sequential data with spatial dimensions $D=1,2$ is treated on an equal footing with the spatial dimensions, forming the input of a spatiotemporal $(D+1)$-dimensional convolutional neural network. The latter then reduces the data stream from $D+1 \to D$ dimensions, followed by an incriminator cell which uses this information to forecast the subsequent time step. We empirically compare this strategy to convolutional LSTMs and LSTMs on the sequential MNIST and JSB chorales datasets, respectively. We conclude that temporally folded convolutional neural networks (TFCs) may outperform the conventional recurrent strategies. |
Tasks | Time Series, Time Series Forecasting |
Published | 2020-01-10 |
URL | https://arxiv.org/abs/2001.03340v1 |
PDF | https://arxiv.org/pdf/2001.03340v1.pdf |
PWC | https://paperswithcode.com/paper/temporally-folded-convolutional-neural |
Repo | |
Framework | |
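
A minimal sketch of the folding idea for $D=1$: a window of T past one-dimensional frames is stacked into a two-dimensional (time x space) input, a 2-D convolution processes it, a kernel spanning the full time axis collapses $T \to 1$, and a final layer forecasts the next frame. Layer sizes and the output head are my assumptions, not the paper's exact architecture:

```python
# Hedged sketch of a temporally folded CNN for 1-D sequence forecasting.
import torch
import torch.nn as nn

class TFC1D(nn.Module):
    def __init__(self, T=8, width=32, channels=16):
        super().__init__()
        self.conv = nn.Conv2d(1, channels, kernel_size=3, padding=1)   # over (time, space)
        self.fold = nn.Conv2d(channels, channels, kernel_size=(T, 1))  # collapse time: T -> 1
        self.head = nn.Conv1d(channels, 1, kernel_size=1)              # forecast next frame

    def forward(self, x):                            # x: (batch, T, width) past frames
        h = torch.relu(self.conv(x.unsqueeze(1)))    # (batch, C, T, width)
        h = torch.relu(self.fold(h)).squeeze(2)      # (batch, C, width), time folded away
        return self.head(h).squeeze(1)               # (batch, width), the next frame

model = TFC1D()
next_frame = model(torch.randn(4, 8, 32))   # forecasts from 4 sequences of 8 frames
```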
Guider l’attention dans les modèles de séquence à séquence pour la prédiction des actes de dialogue (Guiding attention in sequence-to-sequence models for dialogue act prediction)
Title | Guider l’attention dans les modèles de séquence à séquence pour la prédiction des actes de dialogue (Guiding attention in sequence-to-sequence models for dialogue act prediction) |
Authors | Pierre Colombo, Emile Chapuis, Matteo Manica, Emmanuel Vignon, Giovanna Varni, Chloe Clavel |
Abstract | The task of predicting dialog acts (DA) based on conversational dialog is a key component in the development of conversational agents. Accurately predicting DAs requires precise modeling of both the conversation and the global tag dependencies. We leverage seq2seq approaches widely adopted in Neural Machine Translation (NMT) to improve the modelling of tag sequentiality. Seq2seq models are known to learn complex global dependencies, while currently proposed approaches using linear conditional random fields (CRF) only model local tag dependencies. In this work, we introduce a seq2seq model tailored for DA classification that uses a hierarchical encoder, a novel guided attention mechanism, and beam search applied to both training and inference. Compared to the state of the art, our model does not require handcrafted features and is trained end-to-end. Furthermore, the proposed approach achieves an unmatched accuracy of 85% on SwDA and a state-of-the-art accuracy of 91.6% on MRDA. |
Tasks | Machine Translation |
Published | 2020-02-21 |
URL | https://arxiv.org/abs/2002.09419v2 |
PDF | https://arxiv.org/pdf/2002.09419v2.pdf |
PWC | https://paperswithcode.com/paper/guider-lattention-dans-les-modeles-de |
Repo | |
Framework | |
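
Since dialog-act tag i should align with utterance i, one plausible reading of "guided attention" is an attention distribution biased toward the diagonal. The Gaussian positional bias below is my illustrative assumption, not necessarily the paper's exact mechanism:

```python
# Hedged sketch: content attention plus a diagonal (positional) guide for
# monotonic tag-to-utterance alignment in seq2seq DA tagging.
import torch
import torch.nn.functional as F

def guided_attention(queries, keys, values, sigma=1.0):
    # queries: (n, d) decoder states for tags 1..n; keys/values: (n, d) utterance encodings
    n, d = queries.shape
    scores = queries @ keys.T / d ** 0.5                              # (n, n) content scores
    pos = torch.arange(n, dtype=torch.float)
    bias = -((pos[:, None] - pos[None, :]) ** 2) / (2 * sigma ** 2)   # favor i ~ j
    attn = F.softmax(scores + bias, dim=-1)
    return attn @ values

out = guided_attention(torch.randn(6, 64), torch.randn(6, 64), torch.randn(6, 64))
```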
Cross-modal Multi-task Learning for Graphic Recognition of Caricature Face
Title | Cross-modal Multi-task Learning for Graphic Recognition of Caricature Face |
Authors | Zuheng Ming, Jean-Christophe Burie, Muhammad Muzzamil Luqman |
Abstract | Face recognition of realistic visual images has been well studied and has made significant progress in the recent decade. Unlike realistic visual images, face recognition of caricatures remains far behind in performance. This is largely due to the extreme non-rigid distortions of caricatures, introduced by exaggerating facial features to strengthen the subjects' characters. The heterogeneous modalities of caricatures and visual images make caricature-visual face recognition a cross-modal problem. In this paper, we propose a method to conduct caricature-visual face recognition via multi-task learning. Rather than conventional multi-task learning with fixed task weights, this work proposes an approach that learns the weights of tasks according to their importance. The proposed multi-task learning with dynamic task weights makes it possible to train the hard task and the easy task appropriately, instead of getting stuck over-training the easy task as conventional methods do. The experimental results demonstrate the effectiveness of the proposed dynamic multi-task learning for cross-modal caricature-visual face recognition. The performance on the CaVI and WebCaricature datasets shows superiority over state-of-the-art methods. |
Tasks | Caricature, Face Recognition, Multi-Task Learning |
Published | 2020-03-10 |
URL | https://arxiv.org/abs/2003.05787v1 |
PDF | https://arxiv.org/pdf/2003.05787v1.pdf |
PWC | https://paperswithcode.com/paper/cross-modal-multi-task-learning-for-graphic |
Repo | |
Framework | |
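
A minimal sketch of dynamic task weighting, using homoscedastic uncertainty weighting (Kendall et al.) as an illustrative stand-in for the paper's own weight-learning rule: each task i gets weight 1/(2*sigma_i^2) plus a log(sigma_i) penalty that keeps weights from collapsing to zero.

```python
# Hedged sketch: learned (rather than fixed) task weights for multi-task losses.
import torch
import torch.nn as nn

class DynamicTaskLoss(nn.Module):
    def __init__(self, n_tasks=2):
        super().__init__()
        self.log_sigma = nn.Parameter(torch.zeros(n_tasks))  # learned per-task scales

    def forward(self, task_losses):                 # task_losses: (n_tasks,) tensor
        precision = torch.exp(-2 * self.log_sigma)  # exactly 1 / sigma^2
        return (0.5 * precision * task_losses + self.log_sigma).sum()

criterion = DynamicTaskLoss(n_tasks=2)
caricature_loss, visual_loss = torch.tensor(1.3), torch.tensor(0.4)  # placeholder losses
total = criterion(torch.stack([caricature_loss, visual_loss]))
total.backward()   # gradients flow into the task scales as well
```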
Scalable Multi-Agent Inverse Reinforcement Learning via Actor-Attention-Critic
Title | Scalable Multi-Agent Inverse Reinforcement Learning via Actor-Attention-Critic |
Authors | Wonseok Jeon, Paul Barde, Derek Nowrouzezahrai, Joelle Pineau |
Abstract | Multi-agent adversarial inverse reinforcement learning (MA-AIRL) is a recent approach that applies single-agent AIRL to multi-agent problems where we seek to recover both policies for our agents and reward functions that promote expert-like behavior. While MA-AIRL has promising results on cooperative and competitive tasks, it is sample-inefficient and has only been validated empirically for small numbers of agents – its ability to scale to many agents remains an open question. We propose a multi-agent inverse RL algorithm that is more sample-efficient and scalable than previous works. Specifically, we employ multi-agent actor-attention-critic (MAAC) – an off-policy multi-agent RL (MARL) method – for the RL inner loop of the inverse RL procedure. In doing so, we are able to increase sample efficiency compared to state-of-the-art baselines, across both small- and large-scale tasks. Moreover, the RL agents trained on the rewards recovered by our method better match the experts than those trained on the rewards derived from the baselines. Finally, our method requires far fewer agent-environment interactions, particularly as the number of agents increases. |
Tasks | |
Published | 2020-02-24 |
URL | https://arxiv.org/abs/2002.10525v1 |
PDF | https://arxiv.org/pdf/2002.10525v1.pdf |
PWC | https://paperswithcode.com/paper/scalable-multi-agent-inverse-reinforcement |
Repo | |
Framework | |
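
A minimal sketch of the AIRL-style component that runs per agent: a discriminator with logit f(s, a) - log pi(a|s) separates expert from policy transitions, and that logit is handed to the RL learner (MAAC in the paper; any MARL method in this sketch) as the recovered reward. Network sizes and names are my assumptions:

```python
# Hedged sketch of the AIRL discriminator and the reward it recovers:
#   D = exp(f) / (exp(f) + pi)  =>  logit(D) = f(s, a) - log pi(a|s)
import torch
import torch.nn as nn
import torch.nn.functional as F

class AIRLDiscriminator(nn.Module):
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.Tanh(), nn.Linear(64, 1))

    def logits(self, obs, act, log_pi):
        return self.f(torch.cat([obs, act], dim=-1)).squeeze(-1) - log_pi

    def reward(self, obs, act, log_pi):
        return self.logits(obs, act, log_pi).detach()   # fed to the MARL inner loop

def discriminator_loss(disc, expert, policy):
    # expert / policy: tuples of (obs, act, log_pi) batches
    exp_logit, pol_logit = disc.logits(*expert), disc.logits(*policy)
    return (F.binary_cross_entropy_with_logits(exp_logit, torch.ones_like(exp_logit))
            + F.binary_cross_entropy_with_logits(pol_logit, torch.zeros_like(pol_logit)))

disc = AIRLDiscriminator(obs_dim=4, act_dim=2)
batch = (torch.randn(32, 4), torch.randn(32, 2), torch.randn(32))  # (obs, act, log_pi)
loss = discriminator_loss(disc, expert=batch, policy=batch)
```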
Ensemble learning in CNN augmented with fully connected subnetworks
Title | Ensemble learning in CNN augmented with fully connected subnetworks |
Authors | Daiki Hirata, Norikazu Takahashi |
Abstract | Convolutional Neural Networks (CNNs) have shown remarkable performance in general object recognition tasks. In this paper, we propose a new model called EnsNet, which is composed of one base CNN and multiple Fully Connected SubNetworks (FCSNs). In this model, the set of feature maps generated by the last convolutional layer in the base CNN is divided along channels into disjoint subsets, and these subsets are assigned to the FCSNs. Each FCSN is trained independently of the others so that it can predict the class label from the subset of feature maps assigned to it. The output of the overall model is determined by majority vote of the base CNN and the FCSNs. Experimental results on the MNIST, Fashion-MNIST and CIFAR-10 datasets show that the proposed approach further improves the performance of CNNs. In particular, an EnsNet achieves a state-of-the-art error rate of 0.16% on MNIST. |
Tasks | Object Recognition |
Published | 2020-03-19 |
URL | https://arxiv.org/abs/2003.08562v3 |
PDF | https://arxiv.org/pdf/2003.08562v3.pdf |
PWC | https://paperswithcode.com/paper/ensemble-learning-in-cnn-augmented-with-fully |
Repo | |
Framework | |
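
A minimal sketch of the EnsNet inference path: the base CNN's final feature maps are split along channels into disjoint subsets, each fully connected subnetwork (FCSN) classifies from its own subset, and the prediction is the majority vote over the base CNN and the FCSNs. Sizes here are illustrative, and the independent per-FCSN training described in the abstract is omitted:

```python
# Hedged sketch of channel-wise splitting plus majority voting, EnsNet style.
import torch
import torch.nn as nn

class EnsNetSketch(nn.Module):
    def __init__(self, channels=64, n_subnets=4, n_classes=10, spatial=7):
        super().__init__()
        sub_feat = (channels // n_subnets) * spatial * spatial
        self.base_head = nn.Linear(channels * spatial * spatial, n_classes)
        self.fcsns = nn.ModuleList(
            nn.Sequential(nn.Flatten(), nn.Linear(sub_feat, 128), nn.ReLU(),
                          nn.Linear(128, n_classes))
            for _ in range(n_subnets))
        self.n_subnets = n_subnets

    def forward(self, fmaps):                        # fmaps: (B, channels, H, W) from the base CNN
        votes = [self.base_head(fmaps.flatten(1)).argmax(-1)]
        for fcsn, chunk in zip(self.fcsns, fmaps.chunk(self.n_subnets, dim=1)):
            votes.append(fcsn(chunk).argmax(-1))     # each FCSN votes from its channel subset
        return torch.stack(votes).mode(dim=0).values # majority vote per sample

preds = EnsNetSketch()(torch.randn(8, 64, 7, 7))     # class labels for a batch of 8
```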
An improved online learning algorithm for general fuzzy min-max neural network
Title | An improved online learning algorithm for general fuzzy min-max neural network |
Authors | Thanh Tung Khuat, Fang Chen, Bogdan Gabrys |
Abstract | This paper proposes an improved version of the current online learning algorithm for the general fuzzy min-max neural network (GFMM), addressing existing issues in the expansion and contraction steps as well as in the handling of unseen data located on decision boundaries, drawbacks that lower its classification performance. The proposed approach does not use the contraction process for overlapping hyperboxes, which has been shown in the literature to increase the error rate. The empirical results indicate improved classification accuracy and stability compared to the original version and other fuzzy min-max classifiers. To reduce the new online learning algorithm's sensitivity to the presentation order of training samples, a simple ensemble method is also proposed. |
Tasks | |
Published | 2020-01-08 |
URL | https://arxiv.org/abs/2001.02391v1 |
PDF | https://arxiv.org/pdf/2001.02391v1.pdf |
PWC | https://paperswithcode.com/paper/an-improved-online-learning-algorithm-for |
Repo | |
Framework | |
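
A minimal sketch of the hyperbox operations underlying fuzzy min-max learning: each class is covered by hyperboxes [v, w], and a new sample expands the nearest same-class hyperbox only if no edge would exceed a maximum size theta; in line with the proposed algorithm, no contraction step is applied to overlapping boxes. The membership form is the common min-max one, and the details are my assumptions:

```python
# Hedged sketch of hyperbox membership and contraction-free expansion.
import numpy as np

def membership(x, v, w, gamma=1.0):
    """Degree to which point x belongs to hyperbox [v, w] (1 inside, decaying outside)."""
    viol = np.maximum(0, v - x) + np.maximum(0, x - w)   # per-dimension violation
    return np.min(1 - np.minimum(1, gamma * viol))

def try_expand(x, v, w, theta=0.3):
    """Expand [v, w] to include x if no edge would exceed theta; return success flag."""
    v_new, w_new = np.minimum(v, x), np.maximum(w, x)
    if np.all(w_new - v_new <= theta):
        return True, v_new, w_new
    return False, v, w           # caller creates a new hyperbox for x instead

ok, v, w = try_expand(np.array([0.52, 0.31]), np.array([0.4, 0.2]), np.array([0.6, 0.4]))
```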
CosmoVAE: Variational Autoencoder for CMB Image Inpainting
Title | CosmoVAE: Variational Autoencoder for CMB Image Inpainting |
Authors | Kai Yi, Yi Guo, Yanan Fan, Jan Hamann, Yu Guang Wang |
Abstract | Cosmic microwave background (CMB) radiation is critical to our understanding of the early universe and to the precise estimation of cosmological constants. Due to contamination by thermal dust noise in the galaxy, the CMB map, which is an image on the two-dimensional sphere, has missing observations, mainly concentrated in the equatorial region. The noise of the CMB map has a significant impact on the precision of cosmological parameter estimation. Inpainting the CMB map can effectively reduce the uncertainty of parameter estimation. In this paper, we propose CosmoVAE, a deep-learning-based variational autoencoder, to restore the missing observations of the CMB map. The input and output of CosmoVAE are square images. To generate training, validation, and test data sets, we segment the full-sky CMB map into many small images by Cartesian projection. CosmoVAE assigns physical quantities to the parameters of the VAE network by using the angular power spectrum of the Gaussian random field as latent variables. CosmoVAE adopts a new loss function to improve the learning performance of the model, consisting of an $\ell_1$ reconstruction loss, the Kullback-Leibler divergence between the posterior distribution of the encoder network and the prior distribution of the latent variables, a perceptual loss, and a total-variation regularizer. The proposed model achieves state-of-the-art performance for Planck \texttt{Commander} 2018 CMB map inpainting. |
Tasks | Image Inpainting |
Published | 2020-01-31 |
URL | https://arxiv.org/abs/2001.11651v1 |
PDF | https://arxiv.org/pdf/2001.11651v1.pdf |
PWC | https://paperswithcode.com/paper/cosmovae-variational-autoencoder-for-cmb |
Repo | |
Framework | |
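
A minimal sketch of the composite loss described in the abstract: L1 reconstruction, KL divergence to a standard normal prior, a perceptual term computed on features from a pretrained network, and a total-variation regularizer. The feature extractor and the weights are placeholders, not the paper's values:

```python
# Hedged sketch of the CosmoVAE-style composite inpainting loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

def total_variation(img):                          # img: (B, C, H, W)
    return ((img[..., 1:, :] - img[..., :-1, :]).abs().mean()
            + (img[..., :, 1:] - img[..., :, :-1]).abs().mean())

def cosmovae_loss(recon, target, mu, logvar, feats, lambdas=(1.0, 1e-3, 0.1, 0.1)):
    l1 = F.l1_loss(recon, target)                                  # reconstruction
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # KL(q(z|x) || N(0, I))
    perceptual = F.l1_loss(feats(recon), feats(target))            # feats: pretrained network
    l_rec, l_kl, l_per, l_tv = lambdas                             # placeholder weights
    return l_rec * l1 + l_kl * kl + l_per * perceptual + l_tv * total_variation(recon)

# Smoke test with an identity "feature extractor"; in practice feats would be
# a slice of a pretrained network (e.g. VGG features).
loss = cosmovae_loss(torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64),
                     torch.zeros(2, 16), torch.zeros(2, 16), nn.Identity())
```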
ABC-LMPC: Safe Sample-Based Learning MPC for Stochastic Nonlinear Dynamical Systems with Adjustable Boundary Conditions
Title | ABC-LMPC: Safe Sample-Based Learning MPC for Stochastic Nonlinear Dynamical Systems with Adjustable Boundary Conditions |
Authors | Brijen Thananjeyan, Ashwin Balakrishna, Ugo Rosolia, Joseph E. Gonzalez, Aaron Ames, Ken Goldberg |
Abstract | Sample-based learning model predictive control (LMPC) strategies have recently attracted attention due to their desirable theoretical properties and their good empirical performance on robotic tasks. However, prior analysis of LMPC controllers for stochastic systems has mainly focused on linear systems in the iterative learning control setting. We present a novel LMPC algorithm, Adjustable Boundary Condition LMPC (ABC-LMPC), which enables rapid adaptation to novel start and goal configurations and theoretically show that the resulting controller guarantees iterative improvement in expectation for stochastic nonlinear systems. We present results with a practical instantiation of this algorithm and experimentally demonstrate that the resulting controller adapts to a variety of initial and terminal conditions on 3 stochastic continuous control tasks. |
Tasks | Continuous Control |
Published | 2020-03-03 |
URL | https://arxiv.org/abs/2003.01410v1 |
PDF | https://arxiv.org/pdf/2003.01410v1.pdf |
PWC | https://paperswithcode.com/paper/abc-lmpc-safe-sample-based-learning-mpc-for |
Repo | |
Framework | |
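
A minimal sketch of the sample-based safe-set bookkeeping behind LMPC-style controllers: each successful rollout adds its states and their empirical cost-to-go to the safe set, which then supplies the terminal constraint and terminal cost for the next iteration's MPC problem. The planner itself, and ABC-LMPC's adjustable start/goal machinery, are omitted; names are mine:

```python
# Hedged sketch of an LMPC-style sample-based safe set.
import numpy as np

class SafeSet:
    """States visited on successful rollouts, with their empirical cost-to-go."""
    def __init__(self):
        self.states, self.ctg = [], []

    def add_rollout(self, states, costs):
        # cost-to-go at step t = sum of stage costs from t to the end
        ctg = np.cumsum(np.asarray(costs)[::-1])[::-1]
        self.states.extend(states)
        self.ctg.extend(ctg.tolist())

    def terminal_ok(self, x, eps=0.1):
        # terminal constraint: the planned end state must land near the safe set
        d = np.linalg.norm(np.asarray(self.states) - x, axis=1)
        return bool(d.min() <= eps)

    def terminal_cost(self, x):
        # terminal cost: cost-to-go of the nearest safe state
        d = np.linalg.norm(np.asarray(self.states) - x, axis=1)
        return self.ctg[int(d.argmin())]

ss = SafeSet()
ss.add_rollout([np.zeros(2), np.ones(2)], costs=[1.0, 0.0])
assert ss.terminal_ok(np.array([0.95, 1.0]))   # within eps of a visited state
```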
t-viSNE: Interactive Assessment and Interpretation of t-SNE Projections
Title | t-viSNE: Interactive Assessment and Interpretation of t-SNE Projections |
Authors | Angelos Chatzimparmpas, Rafael Messias Martins, Andreas Kerren |
Abstract | t-Distributed Stochastic Neighbor Embedding (t-SNE) for the visualization of multidimensional data has proven to be a popular approach, with successful applications in a wide range of domains. Despite their usefulness, t-SNE projections can be hard to interpret or even misleading, which hurts the trustworthiness of the results. Understanding the details of t-SNE itself and the reasons behind specific patterns in its output may be a daunting task, especially for non-experts in dimensionality reduction. In this work, we present t-viSNE, an interactive tool for the visual exploration of t-SNE projections that enables analysts to inspect different aspects of their accuracy and meaning, such as the effects of hyper-parameters, distance and neighborhood preservation, densities and costs of specific neighborhoods, and the correlations between dimensions and visual patterns. We propose a coherent, accessible, and well-integrated collection of different views for the visualization of t-SNE projections. The applicability and usability of t-viSNE are demonstrated through hypothetical usage scenarios with real data sets. Finally, we present the results of a user study where the tool’s effectiveness was evaluated. By bringing to light information that would normally be lost after running t-SNE, we hope to support analysts in using t-SNE and making its results better understandable. |
Tasks | Dimensionality Reduction |
Published | 2020-02-17 |
URL | https://arxiv.org/abs/2002.06910v1 |
PDF | https://arxiv.org/pdf/2002.06910v1.pdf |
PWC | https://paperswithcode.com/paper/t-visne-interactive-assessment-and |
Repo | |
Framework | |
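
One of the accuracy diagnostics such a tool can expose is neighborhood preservation: the fraction of each point's k nearest neighbors in the high-dimensional data that survive in the 2-D projection. A minimal sketch using a standard form of the metric, assumed here for illustration:

```python
# Hedged sketch of a per-point neighborhood-preservation score for a projection.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def neighborhood_preservation(X_high, X_low, k=10):
    nn_high = NearestNeighbors(n_neighbors=k + 1).fit(X_high)
    nn_low = NearestNeighbors(n_neighbors=k + 1).fit(X_low)
    idx_h = nn_high.kneighbors(X_high, return_distance=False)[:, 1:]  # drop self
    idx_l = nn_low.kneighbors(X_low, return_distance=False)[:, 1:]
    overlap = [len(set(a) & set(b)) / k for a, b in zip(idx_h, idx_l)]
    return np.array(overlap)          # per-point score in [0, 1]

# E.g. color a t-SNE scatter plot by these scores to spot unreliable regions.
scores = neighborhood_preservation(np.random.rand(200, 50), np.random.rand(200, 2))
```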
Dependently Typed Knowledge Graphs
Title | Dependently Typed Knowledge Graphs |
Authors | Zhangsheng Lai, Aik Beng Ng, Liang Ze Wong, Simon See, Shaowei Lin |
Abstract | Reasoning over knowledge graphs is traditionally built upon a hierarchy of languages in the Semantic Web Stack. Starting from the Resource Description Framework (RDF) for knowledge graphs, more advanced constructs have been introduced through various syntax extensions to add reasoning capabilities to knowledge graphs. In this paper, we show how standardized semantic web technologies (RDF and its query language SPARQL) can be reproduced in a unified manner with dependent type theory. In addition to providing the basic functionalities of knowledge graphs, dependent types add expressiveness in encoding both entities and queries, explainability in answers to queries through witnesses, and compositionality and automation in the construction of witnesses. Using the Coq proof assistant, we demonstrate how to build and query dependently typed knowledge graphs as a proof of concept for future works in this direction. |
Tasks | Knowledge Graphs |
Published | 2020-03-08 |
URL | https://arxiv.org/abs/2003.03785v1 |
PDF | https://arxiv.org/pdf/2003.03785v1.pdf |
PWC | https://paperswithcode.com/paper/dependently-typed-knowledge-graphs |
Repo | |
Framework | |
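
A minimal sketch of the core idea, written in Lean 4 as a stand-in for the paper's Coq development: entities and relations live directly in the type theory, and an answer to a query is a term that carries its own witness, which is what makes answers explainable.

```lean
-- Hedged sketch (Lean 4, not the paper's Coq code): entities are terms of a
-- type; relations are inductive families of propositions over entities.
inductive Entity where
  | alice | bob | acme

inductive WorksAt : Entity → Entity → Prop where
  | alice_acme : WorksAt .alice .acme

-- "Does someone work at acme?" is a proposition; its proof names the entity
-- together with the supporting fact, so the answer is explainable by its witness.
theorem someone_works_at_acme : ∃ e : Entity, WorksAt e .acme :=
  ⟨.alice, .alice_acme⟩
```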
Analysis of Greenhouse Gases
Title | Analysis of Greenhouse Gases |
Authors | Shalin Shah |
Abstract | Climate change is a result of a complex system of interactions of greenhouse gases (GHG), the ocean, land, ice, and clouds. Large climate change models use several computers and solve many equations to predict the future climate. The equations range from simple polynomials to partial differential equations. Because of the uptake mechanisms of the land and ocean, greenhouse gas emissions can take a while to affect the climate. The IPCC has published reports on how greenhouse gas emissions may affect the average temperature of the troposphere, and the predictions show that by the end of the century we can expect a temperature increase of 0.8°C to 5°C. In this article, I use Linear Regression (LM), Quadratic Regression, and Gaussian Process Regression (GPR) on monthly GHG data going back several years and try to predict the temperature anomalies based on counterfactuals. The results are quite similar to the IPCC reports. |
Tasks | |
Published | 2020-03-21 |
URL | https://arxiv.org/abs/2003.11916v1 |
PDF | https://arxiv.org/pdf/2003.11916v1.pdf |
PWC | https://paperswithcode.com/paper/analysis-of-greenhouse-gases |
Repo | |
Framework | |
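
A minimal sketch of the modelling recipe in the abstract: fit linear, quadratic, and Gaussian-process regressors from monthly GHG concentrations to temperature anomalies, then evaluate counterfactual GHG levels. The data below are synthetic placeholders for the real monthly series the article uses:

```python
# Hedged sketch: LM, quadratic, and GPR fits from GHG levels to anomalies.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Placeholder monthly data: CO2 concentration (ppm) -> temperature anomaly (deg C).
rng = np.random.default_rng(0)
X = rng.uniform(350, 420, size=(240, 1))
y = 0.01 * (X[:, 0] - 350) + rng.normal(0, 0.1, size=240)

lm = LinearRegression().fit(X, y)
quad = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)
gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True).fit(X, y)

x_cf = np.array([[450.0]])                    # a counterfactual GHG level
print(lm.predict(x_cf), quad.predict(x_cf), gpr.predict(x_cf))
```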