Paper Group ANR 376
- Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text
- MuST-Cinema: a Speech-to-Subtitles corpus
- Momentum Improves Normalized SGD
- Accessing Higher-level Representations in Sequential Transformers with Feedback Memory
- Temporally Folded Convolutional Neural Networks for Sequence Forecasting
- Guider l’attention dans les modèles de séquence à séquence pour la prédiction des actes de dialogue
- Cross-modal Multi-task Learning for Graphic Recognition of Caricature Face
- Scalable Multi-Agent Inverse Reinforcement Learning via Actor-Attention-Critic
- Ensemble learning in CNN augmented with fully connected subnetworks
- An improved online learning algorithm for general fuzzy min-max neural network
- CosmoVAE: Variational Autoencoder for CMB Image Inpainting
- ABC-LMPC: Safe Sample-Based Learning MPC for Stochastic Nonlinear Dynamical Systems with Adjustable Boundary Conditions
- t-viSNE: Interactive Assessment and Interpretation of t-SNE Projections
- Dependently Typed Knowledge Graphs
- Analysis of Greenhouse Gases
Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text
Title | Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text |
Authors | Difei Gao, Ke Li, Ruiping Wang, Shiguang Shan, Xilin Chen |
Abstract | Answering questions that require reading text in an image is challenging for current models. One key difficulty of this task is that rare, polysemous, and ambiguous words frequently appear in images, e.g., names of places, products, and sports teams. To overcome this difficulty, resorting only to pre-trained word embedding models is far from enough. A desired model should utilize the rich information in multiple modalities of the image to help understand the meaning of scene texts, e.g., the prominent text on a bottle is most likely to be the brand. Following this idea, we propose a novel VQA approach, Multi-Modal Graph Neural Network (MM-GNN). It first represents an image as a graph consisting of three sub-graphs, depicting the visual, semantic, and numeric modalities respectively. Then, we introduce three aggregators which guide the message passing from one graph to another to exploit the contexts in various modalities, so as to refine the features of nodes. The updated nodes have better features for the downstream question answering module. Experimental evaluations show that our MM-GNN represents scene texts better and clearly improves performance on two VQA tasks that require reading scene text. |
Tasks | Question Answering, Visual Question Answering |
Published | 2020-03-31 |
URL | https://arxiv.org/abs/2003.13962v1 |
PDF | https://arxiv.org/pdf/2003.13962v1.pdf |
PWC | https://paperswithcode.com/paper/multi-modal-graph-neural-network-for-joint |
Repo | |
Framework | |
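
The aggregators at the heart of MM-GNN pass messages from one modality's sub-graph to another to refine node features. Below is a minimal sketch of one such cross-modal aggregation step, written under my own assumptions (single-head attention, arbitrary dimensions); it illustrates the idea, not the authors' implementation.

```python
# Hedged sketch: attention-guided message passing from a source-modality
# sub-graph into a target-modality sub-graph, MM-GNN style.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalAggregator(nn.Module):
    """Refines target-modality node features with messages from a source modality."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)   # projects target nodes to queries
        self.k = nn.Linear(dim, dim)   # projects source nodes to keys
        self.v = nn.Linear(dim, dim)   # projects source nodes to messages
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, target, source):
        # target: (n_t, dim), e.g. scene-text nodes; source: (n_s, dim), e.g. visual nodes
        attn = F.softmax(self.q(target) @ self.k(source).T / target.size(-1) ** 0.5, dim=-1)
        messages = attn @ self.v(source)                 # (n_t, dim) aggregated context
        return torch.relu(self.update(torch.cat([target, messages], dim=-1)))

# Example: refine 5 scene-text nodes with context from 12 visual nodes.
agg = CrossModalAggregator(dim=64)
refined = agg(torch.randn(5, 64), torch.randn(12, 64))
```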
MuST-Cinema: a Speech-to-Subtitles corpus
Title | MuST-Cinema: a Speech-to-Subtitles corpus |
Authors | Alina Karakanta, Matteo Negri, Marco Turchi |
Abstract | Growing needs for localising audiovisual content in multiple languages through subtitles call for the development of automatic solutions for human subtitling. Neural Machine Translation (NMT) can contribute to the automation of subtitling, facilitating the work of human subtitlers and reducing turn-around times and related costs. NMT requires high-quality, large, task-specific training data. The existing subtitling corpora, however, are missing both alignments to the source language audio and important information about subtitle breaks. This poses a significant limitation for developing efficient automatic approaches for subtitling, since the length and form of a subtitle directly depend on the duration of the utterance. In this work, we present MuST-Cinema, a multilingual speech translation corpus built from TED subtitles. The corpus comprises (audio, transcription, translation) triplets. Subtitle breaks are preserved by inserting special symbols. We show that the corpus can be used to build models that efficiently segment sentences into subtitles, and we propose a method for annotating existing subtitling corpora with subtitle breaks, conforming to length constraints. |
Tasks | Machine Translation |
Published | 2020-02-25 |
URL | https://arxiv.org/abs/2002.10829v1 |
PDF | https://arxiv.org/pdf/2002.10829v1.pdf |
PWC | https://paperswithcode.com/paper/must-cinema-a-speech-to-subtitles-corpus |
Repo | |
Framework | |
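
The corpus preserves subtitle breaks by inserting special symbols into the text: <eol> for a line break within a subtitle block and <eob> for the end of a block. A minimal sketch of this annotation scheme and its inverse (function names are mine):

```python
# Hedged sketch of MuST-Cinema-style break annotation with <eol>/<eob> symbols.
def annotate_subtitles(blocks):
    """blocks: list of subtitle blocks, each a list of display lines."""
    return " ".join(" <eol> ".join(block) + " <eob>" for block in blocks)

def segment(annotated):
    """Inverse: recover display-ready subtitle blocks from the annotated string."""
    blocks = [b.strip() for b in annotated.split("<eob>") if b.strip()]
    return [[line.strip() for line in b.split("<eol>")] for b in blocks]

text = annotate_subtitles([["You never know", "what you will find."], ["Keep looking."]])
# 'You never know <eol> what you will find. <eob> Keep looking. <eob>'
assert segment(text) == [["You never know", "what you will find."], ["Keep looking."]]
```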
Momentum Improves Normalized SGD
Title | Momentum Improves Normalized SGD |
Authors | Ashok Cutkosky, Harsh Mehta |
Abstract | We provide an improved analysis of normalized SGD showing that adding momentum provably removes the need for large batch sizes on non-convex objectives. Then, we consider the case of objectives with bounded second derivative and show that in this case a small tweak to the momentum formula allows normalized SGD with momentum to find an $\epsilon$-critical point in $O(1/\epsilon^{3.5})$ iterations, matching the best-known rates without accruing any logarithmic factors or dependence on dimension. We also provide an adaptive method that automatically improves convergence rates when the variance in the gradients is small. Finally, we show that our method is effective when employed on popular large scale tasks such as ResNet-50 and BERT pretraining, matching the performance of the disparate methods used to get state-of-the-art results on both tasks. |
Tasks | |
Published | 2020-02-09 |
URL | https://arxiv.org/abs/2002.03305v1 |
PDF | https://arxiv.org/pdf/2002.03305v1.pdf |
PWC | https://paperswithcode.com/paper/momentum-improves-normalized-sgd |
Repo | |
Framework | |
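
The method analyzed here is normalized SGD with momentum: gradients are averaged into a momentum buffer, and each update steps in the direction of that buffer normalized to unit length. A minimal sketch, assuming per-tensor normalization (the analysis treats all parameters as one vector) and omitting the paper's second-order tweak:

```python
# Hedged sketch of normalized SGD with momentum:
#   m <- beta * m + (1 - beta) * grad;   x <- x - lr * m / ||m||
import torch

def normalized_sgd_momentum_step(params, momenta, lr=0.01, beta=0.9):
    for p, m in zip(params, momenta):
        m.mul_(beta).add_(p.grad, alpha=1 - beta)   # update momentum buffer
        p.data -= lr * m / (m.norm() + 1e-12)       # unit-norm step along momentum

# Usage on a toy objective f(x) = ||x||^2:
x = torch.randn(10, requires_grad=True)
m = torch.zeros(10)
for _ in range(100):
    loss = (x ** 2).sum()
    loss.backward()
    normalized_sgd_momentum_step([x], [m])
    x.grad = None
```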
Accessing Higher-level Representations in Sequential Transformers with Feedback Memory
Title | Accessing Higher-level Representations in Sequential Transformers with Feedback Memory |
Authors | Angela Fan, Thibaut Lavril, Edouard Grave, Armand Joulin, Sainbayar Sukhbaatar |
Abstract | Transformers are feedforward networks that can process input tokens in parallel. While this parallelization makes them computationally efficient, it prevents the model from fully exploiting the sequential nature of the input: the representation at a given layer can only access representations from lower layers, rather than the higher-level representations already built in previous time steps. In this work, we propose the Feedback Transformer architecture that exposes all previous representations to all future representations, meaning the lowest representation of the current timestep is formed from the highest-level abstract representation of the past. We demonstrate on a variety of benchmarks in language modeling, neural machine translation, summarization, and reinforcement learning that the increased representation capacity can improve over Transformer baselines. |
Tasks | Language Modelling, Machine Translation |
Published | 2020-02-21 |
URL | https://arxiv.org/abs/2002.09402v2 |
PDF | https://arxiv.org/pdf/2002.09402v2.pdf |
PWC | https://paperswithcode.com/paper/accessing-higher-level-representations-in |
Repo | |
Framework | |
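
In the Feedback Transformer, every layer at the current step attends to a single shared memory per past step, formed by mixing that step's representations across all layers, so low layers can see past high-level abstractions. A heavily simplified sketch under my own assumptions (one head, no masking or positional encoding, context computed once per step):

```python
# Hedged sketch of feedback memory: all layers read the same per-step memory,
# and each step writes back a learned mixture of all its layer states.
import torch
import torch.nn as nn

class FeedbackBlock(nn.Module):
    def __init__(self, dim, n_layers):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
            for _ in range(n_layers))
        self.mix = nn.Parameter(torch.zeros(n_layers + 1))  # softmax weights over layer states

    def step(self, x, memory):
        # x: (1, 1, dim) current token; memory: (1, t, dim), one shared vector per past step
        ctx = x if memory is None else torch.cat([memory, x], dim=1)
        states = [x]
        for attn in self.layers:
            x, _ = attn(x, ctx, ctx)          # every layer reads the same shared memory
            states.append(x)
        w = torch.softmax(self.mix, dim=0)
        new_mem = sum(wi * si for wi, si in zip(w, states))  # mix all layers into one vector
        memory = new_mem if memory is None else torch.cat([memory, new_mem], dim=1)
        return x, memory

block, memory = FeedbackBlock(dim=32, n_layers=4), None
for token in torch.randn(10, 1, 1, 32):       # process 10 timesteps sequentially
    out, memory = block.step(token, memory)
```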
Temporally Folded Convolutional Neural Networks for Sequence Forecasting
Title | Temporally Folded Convolutional Neural Networks for Sequence Forecasting |
Authors | Matthias Weissenbacher |
Abstract | In this work we propose a novel approach to utilizing convolutional neural networks for time series forecasting. The time direction of the sequential data with spatial dimensions $D=1,2$ is treated on an equal footing with the spatial dimensions, forming the input of a spatiotemporal $(D+1)$-dimensional convolutional neural network. The latter then reduces the data stream from $D+1 \to D$ dimensions, followed by an incriminator cell which uses this information to forecast the subsequent time step. We empirically compare this strategy to convolutional LSTMs and LSTMs on the sequential MNIST and JSB chorales datasets, respectively. We conclude that temporally folded convolutional neural networks (TFCs) may outperform the conventional recurrent strategies. |
Tasks | Time Series, Time Series Forecasting |
Published | 2020-01-10 |
URL | https://arxiv.org/abs/2001.03340v1 |
PDF | https://arxiv.org/pdf/2001.03340v1.pdf |
PWC | https://paperswithcode.com/paper/temporally-folded-convolutional-neural |
Repo | |
Framework | |
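
A minimal sketch of the folding idea for $D=1$: a window of T past one-dimensional frames is stacked into a two-dimensional (time x space) input, a 2-D convolution processes it, a kernel spanning the full time axis collapses $T \to 1$, and a final layer forecasts the next frame. Layer sizes and the output head are my assumptions, not the paper's exact architecture:

```python
# Hedged sketch of a temporally folded CNN for 1-D sequence forecasting.
import torch
import torch.nn as nn

class TFC1D(nn.Module):
    def __init__(self, T=8, width=32, channels=16):
        super().__init__()
        self.conv = nn.Conv2d(1, channels, kernel_size=3, padding=1)   # over (time, space)
        self.fold = nn.Conv2d(channels, channels, kernel_size=(T, 1))  # collapse time: T -> 1
        self.head = nn.Conv1d(channels, 1, kernel_size=1)              # forecast next frame

    def forward(self, x):                            # x: (batch, T, width) past frames
        h = torch.relu(self.conv(x.unsqueeze(1)))    # (batch, C, T, width)
        h = torch.relu(self.fold(h)).squeeze(2)      # (batch, C, width), time folded away
        return self.head(h).squeeze(1)               # (batch, width), the next frame

model = TFC1D()
next_frame = model(torch.randn(4, 8, 32))   # forecasts from 4 sequences of 8 frames
```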
Guider l’attention dans les modèles de séquence à séquence pour la prédiction des actes de dialogue (Guiding attention in sequence-to-sequence models for dialogue act prediction)
Title | Guider l’attention dans les modèles de séquence à séquence pour la prédiction des actes de dialogue (Guiding attention in sequence-to-sequence models for dialogue act prediction) |
Authors | Pierre Colombo, Emile Chapuis, Matteo Manica, Emmanuel Vignon, Giovanna Varni, Chloe Clavel |
Abstract | The task of predicting dialog acts (DA) based on conversational dialog is a key component in the development of conversational agents. Accurately predicting DAs requires precise modeling of both the conversation and the global tag dependencies. We leverage seq2seq approaches widely adopted in Neural Machine Translation (NMT) to improve the modelling of tag sequentiality. Seq2seq models are known to learn complex global dependencies, while currently proposed approaches using linear conditional random fields (CRF) only model local tag dependencies. In this work, we introduce a seq2seq model tailored for DA classification that uses a hierarchical encoder, a novel guided attention mechanism, and beam search applied to both training and inference. Compared to the state of the art, our model does not require handcrafted features and is trained end-to-end. Furthermore, the proposed approach achieves an unmatched accuracy of 85% on SwDA and a state-of-the-art accuracy of 91.6% on MRDA. |
Tasks | Machine Translation |
Published | 2020-02-21 |
URL | https://arxiv.org/abs/2002.09419v2 |
PDF | https://arxiv.org/pdf/2002.09419v2.pdf |
PWC | https://paperswithcode.com/paper/guider-lattention-dans-les-modeles-de |
Repo | |
Framework | |
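
Since dialog-act tag i should align with utterance i, one plausible reading of "guided attention" is an attention distribution biased toward the diagonal. The Gaussian positional bias below is my illustrative assumption, not necessarily the paper's exact mechanism:

```python
# Hedged sketch: content attention plus a diagonal (positional) guide for
# monotonic tag-to-utterance alignment in seq2seq DA tagging.
import torch
import torch.nn.functional as F

def guided_attention(queries, keys, values, sigma=1.0):
    # queries: (n, d) decoder states for tags 1..n; keys/values: (n, d) utterance encodings
    n, d = queries.shape
    scores = queries @ keys.T / d ** 0.5                              # (n, n) content scores
    pos = torch.arange(n, dtype=torch.float)
    bias = -((pos[:, None] - pos[None, :]) ** 2) / (2 * sigma ** 2)   # favor i ~ j
    attn = F.softmax(scores + bias, dim=-1)
    return attn @ values

out = guided_attention(torch.randn(6, 64), torch.randn(6, 64), torch.randn(6, 64))
```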
Cross-modal Multi-task Learning for Graphic Recognition of Caricature Face
Title | Cross-modal Multi-task Learning for Graphic Recognition of Caricature Face |
Authors | Zuheng Ming, Jean-Christophe Burie, Muhammad Muzzamil Luqman |
Abstract | Face recognition of realistic visual images has been well studied and has made significant progress in the recent decade. Unlike realistic visual images, face recognition of caricatures remains far behind in performance. This is largely due to the extreme non-rigid distortions of caricatures, introduced by exaggerating facial features to strengthen the subjects' characters. The heterogeneous modalities of caricatures and visual images make caricature-visual face recognition a cross-modal problem. In this paper, we propose a method to conduct caricature-visual face recognition via multi-task learning. Rather than conventional multi-task learning with fixed task weights, this work proposes an approach that learns the weights of tasks according to their importance. The proposed multi-task learning with dynamic task weights makes it possible to train the hard task and the easy task appropriately, instead of getting stuck over-training the easy task as conventional methods do. The experimental results demonstrate the effectiveness of the proposed dynamic multi-task learning for cross-modal caricature-visual face recognition. The performance on the CaVI and WebCaricature datasets shows superiority over state-of-the-art methods. |
Tasks | Caricature, Face Recognition, Multi-Task Learning |
Published | 2020-03-10 |
URL | https://arxiv.org/abs/2003.05787v1 |
PDF | https://arxiv.org/pdf/2003.05787v1.pdf |
PWC | https://paperswithcode.com/paper/cross-modal-multi-task-learning-for-graphic |
Repo | |
Framework | |
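
A minimal sketch of dynamic task weighting, using homoscedastic uncertainty weighting (Kendall et al.) as an illustrative stand-in for the paper's own weight-learning rule: each task i gets weight 1/(2*sigma_i^2) plus a log(sigma_i) penalty that keeps weights from collapsing to zero.

```python
# Hedged sketch: learned (rather than fixed) task weights for multi-task losses.
import torch
import torch.nn as nn

class DynamicTaskLoss(nn.Module):
    def __init__(self, n_tasks=2):
        super().__init__()
        self.log_sigma = nn.Parameter(torch.zeros(n_tasks))  # learned per-task scales

    def forward(self, task_losses):                 # task_losses: (n_tasks,) tensor
        precision = torch.exp(-2 * self.log_sigma)  # exactly 1 / sigma^2
        return (0.5 * precision * task_losses + self.log_sigma).sum()

criterion = DynamicTaskLoss(n_tasks=2)
caricature_loss, visual_loss = torch.tensor(1.3), torch.tensor(0.4)  # placeholder losses
total = criterion(torch.stack([caricature_loss, visual_loss]))
total.backward()   # gradients flow into the task scales as well
```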
Scalable Multi-Agent Inverse Reinforcement Learning via Actor-Attention-Critic
Title | Scalable Multi-Agent Inverse Reinforcement Learning via Actor-Attention-Critic |
Authors | Wonseok Jeon, Paul Barde, Derek Nowrouzezahrai, Joelle Pineau |
Abstract | Multi-agent adversarial inverse reinforcement learning (MA-AIRL) is a recent approach that applies single-agent AIRL to multi-agent problems where we seek to recover both policies for our agents and reward functions that promote expert-like behavior. While MA-AIRL has promising results on cooperative and competitive tasks, it is sample-inefficient and has only been validated empirically for small numbers of agents – its ability to scale to many agents remains an open question. We propose a multi-agent inverse RL algorithm that is more sample-efficient and scalable than previous works. Specifically, we employ multi-agent actor-attention-critic (MAAC) – an off-policy multi-agent RL (MARL) method – for the RL inner loop of the inverse RL procedure. In doing so, we are able to increase sample efficiency compared to state-of-the-art baselines, across both small- and large-scale tasks. Moreover, the RL agents trained on the rewards recovered by our method better match the experts than those trained on the rewards derived from the baselines. Finally, our method requires far fewer agent-environment interactions, particularly as the number of agents increases. |
Tasks | |
Published | 2020-02-24 |
URL | https://arxiv.org/abs/2002.10525v1 |
PDF | https://arxiv.org/pdf/2002.10525v1.pdf |
PWC | https://paperswithcode.com/paper/scalable-multi-agent-inverse-reinforcement |
Repo | |
Framework | |
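
A minimal sketch of the AIRL-style component that runs per agent: a discriminator with logit f(s, a) - log pi(a|s) separates expert from policy transitions, and that logit is handed to the RL learner (MAAC in the paper; any MARL method in this sketch) as the recovered reward. Network sizes and names are my assumptions:

```python
# Hedged sketch of the AIRL discriminator and the reward it recovers:
#   D = exp(f) / (exp(f) + pi)  =>  logit(D) = f(s, a) - log pi(a|s)
import torch
import torch.nn as nn
import torch.nn.functional as F

class AIRLDiscriminator(nn.Module):
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.Tanh(), nn.Linear(64, 1))

    def logits(self, obs, act, log_pi):
        return self.f(torch.cat([obs, act], dim=-1)).squeeze(-1) - log_pi

    def reward(self, obs, act, log_pi):
        return self.logits(obs, act, log_pi).detach()   # fed to the MARL inner loop

def discriminator_loss(disc, expert, policy):
    # expert / policy: tuples of (obs, act, log_pi) batches
    exp_logit, pol_logit = disc.logits(*expert), disc.logits(*policy)
    return (F.binary_cross_entropy_with_logits(exp_logit, torch.ones_like(exp_logit))
            + F.binary_cross_entropy_with_logits(pol_logit, torch.zeros_like(pol_logit)))

disc = AIRLDiscriminator(obs_dim=4, act_dim=2)
batch = (torch.randn(32, 4), torch.randn(32, 2), torch.randn(32))  # (obs, act, log_pi)
loss = discriminator_loss(disc, expert=batch, policy=batch)
```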
Ensemble learning in CNN augmented with fully connected subnetworks
Title | Ensemble learning in CNN augmented with fully connected subnetworks |
Authors | Daiki Hirata, Norikazu Takahashi |
Abstract | Convolutional Neural Networks (CNNs) have shown remarkable performance in general object recognition tasks. In this paper, we propose a new model called EnsNet, which is composed of one base CNN and multiple Fully Connected SubNetworks (FCSNs). In this model, the set of feature maps generated by the last convolutional layer in the base CNN is divided along channels into disjoint subsets, and these subsets are assigned to the FCSNs. Each FCSN is trained independently of the others so that it can predict the class label from the subset of feature maps assigned to it. The output of the overall model is determined by majority vote of the base CNN and the FCSNs. Experimental results on the MNIST, Fashion-MNIST and CIFAR-10 datasets show that the proposed approach further improves the performance of CNNs. In particular, an EnsNet achieves a state-of-the-art error rate of 0.16% on MNIST. |
Tasks | Object Recognition |
Published | 2020-03-19 |
URL | https://arxiv.org/abs/2003.08562v3 |
PDF | https://arxiv.org/pdf/2003.08562v3.pdf |
PWC | https://paperswithcode.com/paper/ensemble-learning-in-cnn-augmented-with-fully |
Repo | |
Framework | |
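
A minimal sketch of the EnsNet inference path: the base CNN's final feature maps are split along channels into disjoint subsets, each fully connected subnetwork (FCSN) classifies from its own subset, and the prediction is the majority vote over the base CNN and the FCSNs. Sizes here are illustrative, and the independent per-FCSN training described in the abstract is omitted:

```python
# Hedged sketch of channel-wise splitting plus majority voting, EnsNet style.
import torch
import torch.nn as nn

class EnsNetSketch(nn.Module):
    def __init__(self, channels=64, n_subnets=4, n_classes=10, spatial=7):
        super().__init__()
        sub_feat = (channels // n_subnets) * spatial * spatial
        self.base_head = nn.Linear(channels * spatial * spatial, n_classes)
        self.fcsns = nn.ModuleList(
            nn.Sequential(nn.Flatten(), nn.Linear(sub_feat, 128), nn.ReLU(),
                          nn.Linear(128, n_classes))
            for _ in range(n_subnets))
        self.n_subnets = n_subnets

    def forward(self, fmaps):                        # fmaps: (B, channels, H, W) from the base CNN
        votes = [self.base_head(fmaps.flatten(1)).argmax(-1)]
        for fcsn, chunk in zip(self.fcsns, fmaps.chunk(self.n_subnets, dim=1)):
            votes.append(fcsn(chunk).argmax(-1))     # each FCSN votes from its channel subset
        return torch.stack(votes).mode(dim=0).values # majority vote per sample

preds = EnsNetSketch()(torch.randn(8, 64, 7, 7))     # class labels for a batch of 8
```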
An improved online learning algorithm for general fuzzy min-max neural network
Title | An improved online learning algorithm for general fuzzy min-max neural network |
Authors | Thanh Tung Khuat, Fang Chen, Bogdan Gabrys |
Abstract | This paper proposes an improved version of the current online learning algorithm for the general fuzzy min-max neural network (GFMM), addressing existing issues in the expansion and contraction steps as well as in the handling of unseen data located on decision boundaries, drawbacks that lower its classification performance. The proposed approach does not use the contraction process for overlapping hyperboxes, which has been shown in the literature to increase the error rate. The empirical results indicate improved classification accuracy and stability compared to the original version and other fuzzy min-max classifiers. To reduce the new online learning algorithm's sensitivity to the presentation order of training samples, a simple ensemble method is also proposed. |
Tasks | |
Published | 2020-01-08 |
URL | https://arxiv.org/abs/2001.02391v1 |
PDF | https://arxiv.org/pdf/2001.02391v1.pdf |
PWC | https://paperswithcode.com/paper/an-improved-online-learning-algorithm-for |
Repo | |
Framework | |
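
A minimal sketch of the hyperbox operations underlying fuzzy min-max learning: each class is covered by hyperboxes [v, w], and a new sample expands the nearest same-class hyperbox only if no edge would exceed a maximum size theta; in line with the proposed algorithm, no contraction step is applied to overlapping boxes. The membership form is the common min-max one, and the details are my assumptions:

```python
# Hedged sketch of hyperbox membership and contraction-free expansion.
import numpy as np

def membership(x, v, w, gamma=1.0):
    """Degree to which point x belongs to hyperbox [v, w] (1 inside, decaying outside)."""
    viol = np.maximum(0, v - x) + np.maximum(0, x - w)   # per-dimension violation
    return np.min(1 - np.minimum(1, gamma * viol))

def try_expand(x, v, w, theta=0.3):
    """Expand [v, w] to include x if no edge would exceed theta; return success flag."""
    v_new, w_new = np.minimum(v, x), np.maximum(w, x)
    if np.all(w_new - v_new <= theta):
        return True, v_new, w_new
    return False, v, w           # caller creates a new hyperbox for x instead

ok, v, w = try_expand(np.array([0.52, 0.31]), np.array([0.4, 0.2]), np.array([0.6, 0.4]))
```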
CosmoVAE: Variational Autoencoder for CMB Image Inpainting
Title | CosmoVAE: Variational Autoencoder for CMB Image Inpainting |
Authors | Kai Yi, Yi Guo, Yanan Fan, Jan Hamann, Yu Guang Wang |
Abstract | Cosmic microwave background (CMB) radiation is critical to our understanding of the early universe and to the precise estimation of cosmological constants. Due to contamination by thermal dust noise in the galaxy, the CMB map, which is an image on the two-dimensional sphere, has missing observations, mainly concentrated in the equatorial region. The noise of the CMB map has a significant impact on the precision of cosmological parameter estimation. Inpainting the CMB map can effectively reduce the uncertainty of parameter estimation. In this paper, we propose CosmoVAE, a deep-learning-based variational autoencoder, to restore the missing observations of the CMB map. The input and output of CosmoVAE are square images. To generate training, validation, and test data sets, we segment the full-sky CMB map into many small images by Cartesian projection. CosmoVAE assigns physical quantities to the parameters of the VAE network by using the angular power spectrum of the Gaussian random field as latent variables. CosmoVAE adopts a new loss function to improve the learning performance of the model, consisting of an $\ell_1$ reconstruction loss, the Kullback-Leibler divergence between the posterior distribution of the encoder network and the prior distribution of the latent variables, a perceptual loss, and a total-variation regularizer. The proposed model achieves state-of-the-art performance for Planck \texttt{Commander} 2018 CMB map inpainting. |
Tasks | Image Inpainting |
Published | 2020-01-31 |
URL | https://arxiv.org/abs/2001.11651v1 |
PDF | https://arxiv.org/pdf/2001.11651v1.pdf |
PWC | https://paperswithcode.com/paper/cosmovae-variational-autoencoder-for-cmb |
Repo | |
Framework | |
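
A minimal sketch of the composite loss described in the abstract: L1 reconstruction, KL divergence to a standard normal prior, a perceptual term computed on features from a pretrained network, and a total-variation regularizer. The feature extractor and the weights are placeholders, not the paper's values:

```python
# Hedged sketch of the CosmoVAE-style composite inpainting loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

def total_variation(img):                          # img: (B, C, H, W)
    return ((img[..., 1:, :] - img[..., :-1, :]).abs().mean()
            + (img[..., :, 1:] - img[..., :, :-1]).abs().mean())

def cosmovae_loss(recon, target, mu, logvar, feats, lambdas=(1.0, 1e-3, 0.1, 0.1)):
    l1 = F.l1_loss(recon, target)                                  # reconstruction
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # KL(q(z|x) || N(0, I))
    perceptual = F.l1_loss(feats(recon), feats(target))            # feats: pretrained network
    l_rec, l_kl, l_per, l_tv = lambdas                             # placeholder weights
    return l_rec * l1 + l_kl * kl + l_per * perceptual + l_tv * total_variation(recon)

# Smoke test with an identity "feature extractor"; in practice feats would be
# a slice of a pretrained network (e.g. VGG features).
loss = cosmovae_loss(torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64),
                     torch.zeros(2, 16), torch.zeros(2, 16), nn.Identity())
```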
ABC-LMPC: Safe Sample-Based Learning MPC for Stochastic Nonlinear Dynamical Systems with Adjustable Boundary Conditions
Title | ABC-LMPC: Safe Sample-Based Learning MPC for Stochastic Nonlinear Dynamical Systems with Adjustable Boundary Conditions |
Authors | Brijen Thananjeyan, Ashwin Balakrishna, Ugo Rosolia, Joseph E. Gonzalez, Aaron Ames, Ken Goldberg |
Abstract | Sample-based learning model predictive control (LMPC) strategies have recently attracted attention due to their desirable theoretical properties and their good empirical performance on robotic tasks. However, prior analysis of LMPC controllers for stochastic systems has mainly focused on linear systems in the iterative learning control setting. We present a novel LMPC algorithm, Adjustable Boundary Condition LMPC (ABC-LMPC), which enables rapid adaptation to novel start and goal configurations and theoretically show that the resulting controller guarantees iterative improvement in expectation for stochastic nonlinear systems. We present results with a practical instantiation of this algorithm and experimentally demonstrate that the resulting controller adapts to a variety of initial and terminal conditions on 3 stochastic continuous control tasks. |
Tasks | Continuous Control |
Published | 2020-03-03 |
URL | https://arxiv.org/abs/2003.01410v1 |
PDF | https://arxiv.org/pdf/2003.01410v1.pdf |
PWC | https://paperswithcode.com/paper/abc-lmpc-safe-sample-based-learning-mpc-for |
Repo | |
Framework | |
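
A minimal sketch of the sample-based safe-set bookkeeping behind LMPC-style controllers: each successful rollout adds its states and their empirical cost-to-go to the safe set, which then supplies the terminal constraint and terminal cost for the next iteration's MPC problem. The planner itself, and ABC-LMPC's adjustable start/goal machinery, are omitted; names are mine:

```python
# Hedged sketch of an LMPC-style sample-based safe set.
import numpy as np

class SafeSet:
    """States visited on successful rollouts, with their empirical cost-to-go."""
    def __init__(self):
        self.states, self.ctg = [], []

    def add_rollout(self, states, costs):
        # cost-to-go at step t = sum of stage costs from t to the end
        ctg = np.cumsum(np.asarray(costs)[::-1])[::-1]
        self.states.extend(states)
        self.ctg.extend(ctg.tolist())

    def terminal_ok(self, x, eps=0.1):
        # terminal constraint: the planned end state must land near the safe set
        d = np.linalg.norm(np.asarray(self.states) - x, axis=1)
        return bool(d.min() <= eps)

    def terminal_cost(self, x):
        # terminal cost: cost-to-go of the nearest safe state
        d = np.linalg.norm(np.asarray(self.states) - x, axis=1)
        return self.ctg[int(d.argmin())]

ss = SafeSet()
ss.add_rollout([np.zeros(2), np.ones(2)], costs=[1.0, 0.0])
assert ss.terminal_ok(np.array([0.95, 1.0]))   # within eps of a visited state
```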
t-viSNE: Interactive Assessment and Interpretation of t-SNE Projections
Title | t-viSNE: Interactive Assessment and Interpretation of t-SNE Projections |
Authors | Angelos Chatzimparmpas, Rafael Messias Martins, Andreas Kerren |
Abstract | t-Distributed Stochastic Neighbor Embedding (t-SNE) for the visualization of multidimensional data has proven to be a popular approach, with successful applications in a wide range of domains. Despite their usefulness, t-SNE projections can be hard to interpret or even misleading, which hurts the trustworthiness of the results. Understanding the details of t-SNE itself and the reasons behind specific patterns in its output may be a daunting task, especially for non-experts in dimensionality reduction. In this work, we present t-viSNE, an interactive tool for the visual exploration of t-SNE projections that enables analysts to inspect different aspects of their accuracy and meaning, such as the effects of hyper-parameters, distance and neighborhood preservation, densities and costs of specific neighborhoods, and the correlations between dimensions and visual patterns. We propose a coherent, accessible, and well-integrated collection of different views for the visualization of t-SNE projections. The applicability and usability of t-viSNE are demonstrated through hypothetical usage scenarios with real data sets. Finally, we present the results of a user study where the tool’s effectiveness was evaluated. By bringing to light information that would normally be lost after running t-SNE, we hope to support analysts in using t-SNE and making its results better understandable. |
Tasks | Dimensionality Reduction |
Published | 2020-02-17 |
URL | https://arxiv.org/abs/2002.06910v1 |
PDF | https://arxiv.org/pdf/2002.06910v1.pdf |
PWC | https://paperswithcode.com/paper/t-visne-interactive-assessment-and |
Repo | |
Framework | |
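
One of the accuracy diagnostics such a tool can expose is neighborhood preservation: the fraction of each point's k nearest neighbors in the high-dimensional data that survive in the 2-D projection. A minimal sketch using a standard form of the metric, assumed here for illustration:

```python
# Hedged sketch of a per-point neighborhood-preservation score for a projection.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def neighborhood_preservation(X_high, X_low, k=10):
    nn_high = NearestNeighbors(n_neighbors=k + 1).fit(X_high)
    nn_low = NearestNeighbors(n_neighbors=k + 1).fit(X_low)
    idx_h = nn_high.kneighbors(X_high, return_distance=False)[:, 1:]  # drop self
    idx_l = nn_low.kneighbors(X_low, return_distance=False)[:, 1:]
    overlap = [len(set(a) & set(b)) / k for a, b in zip(idx_h, idx_l)]
    return np.array(overlap)          # per-point score in [0, 1]

# E.g. color a t-SNE scatter plot by these scores to spot unreliable regions.
scores = neighborhood_preservation(np.random.rand(200, 50), np.random.rand(200, 2))
```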
Dependently Typed Knowledge Graphs
Title | Dependently Typed Knowledge Graphs |
Authors | Zhangsheng Lai, Aik Beng Ng, Liang Ze Wong, Simon See, Shaowei Lin |
Abstract | Reasoning over knowledge graphs is traditionally built upon a hierarchy of languages in the Semantic Web Stack. Starting from the Resource Description Framework (RDF) for knowledge graphs, more advanced constructs have been introduced through various syntax extensions to add reasoning capabilities to knowledge graphs. In this paper, we show how standardized semantic web technologies (RDF and its query language SPARQL) can be reproduced in a unified manner with dependent type theory. In addition to providing the basic functionalities of knowledge graphs, dependent types add expressiveness in encoding both entities and queries, explainability in answers to queries through witnesses, and compositionality and automation in the construction of witnesses. Using the Coq proof assistant, we demonstrate how to build and query dependently typed knowledge graphs as a proof of concept for future works in this direction. |
Tasks | Knowledge Graphs |
Published | 2020-03-08 |
URL | https://arxiv.org/abs/2003.03785v1 |
PDF | https://arxiv.org/pdf/2003.03785v1.pdf |
PWC | https://paperswithcode.com/paper/dependently-typed-knowledge-graphs |
Repo | |
Framework | |
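
A minimal sketch of the core idea, written in Lean 4 as a stand-in for the paper's Coq development: entities and relations live directly in the type theory, and an answer to a query is a term that carries its own witness, which is what makes answers explainable.

```lean
-- Hedged sketch (Lean 4, not the paper's Coq code): entities are terms of a
-- type; relations are inductive families of propositions over entities.
inductive Entity where
  | alice | bob | acme

inductive WorksAt : Entity → Entity → Prop where
  | alice_acme : WorksAt .alice .acme

-- "Does someone work at acme?" is a proposition; its proof names the entity
-- together with the supporting fact, so the answer is explainable by its witness.
theorem someone_works_at_acme : ∃ e : Entity, WorksAt e .acme :=
  ⟨.alice, .alice_acme⟩
```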
Analysis of Greenhouse Gases
Title | Analysis of Greenhouse Gases |
Authors | Shalin Shah |
Abstract | Climate change is a result of a complex system of interactions of greenhouse gases (GHG), the ocean, land, ice, and clouds. Large climate change models use several computers and solve many equations to predict the future climate. The equations range from simple polynomials to partial differential equations. Because of the uptake mechanisms of the land and ocean, greenhouse gas emissions can take a while to affect the climate. The IPCC has published reports on how greenhouse gas emissions may affect the average temperature of the troposphere, and the predictions show that by the end of the century we can expect a temperature increase of 0.8°C to 5°C. In this article, I use Linear Regression (LM), Quadratic Regression, and Gaussian Process Regression (GPR) on monthly GHG data going back several years and try to predict the temperature anomalies based on counterfactuals. The results are quite similar to the IPCC reports. |
Tasks | |
Published | 2020-03-21 |
URL | https://arxiv.org/abs/2003.11916v1 |
PDF | https://arxiv.org/pdf/2003.11916v1.pdf |
PWC | https://paperswithcode.com/paper/analysis-of-greenhouse-gases |
Repo | |
Framework | |
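
A minimal sketch of the modelling recipe in the abstract: fit linear, quadratic, and Gaussian-process regressors from monthly GHG concentrations to temperature anomalies, then evaluate counterfactual GHG levels. The data below are synthetic placeholders for the real monthly series the article uses:

```python
# Hedged sketch: LM, quadratic, and GPR fits from GHG levels to anomalies.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Placeholder monthly data: CO2 concentration (ppm) -> temperature anomaly (deg C).
rng = np.random.default_rng(0)
X = rng.uniform(350, 420, size=(240, 1))
y = 0.01 * (X[:, 0] - 350) + rng.normal(0, 0.1, size=240)

lm = LinearRegression().fit(X, y)
quad = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)
gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True).fit(X, y)

x_cf = np.array([[450.0]])                    # a counterfactual GHG level
print(lm.predict(x_cf), quad.predict(x_cf), gpr.predict(x_cf))
```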