April 2, 2020

# Paper Group ANR 103

Input Dropout for Spatially Aligned Modalities. On-Device Information Extraction from Screenshots in form of tags. Generating Representative Headlines for News Stories. DIDFuse: Deep Image Decomposition for Infrared and Visible Image Fusion. Dissipative SymODEN: Encoding Hamiltonian Dynamics with Dissipation and Control into Deep Learning. A Compr …

#### Input Dropout for Spatially Aligned Modalities

Title Input Dropout for Spatially Aligned Modalities
Authors Sébastien de Blois, Mathieu Garon, Christian Gagné, Jean-François Lalonde
Abstract Computer vision datasets containing multiple modalities such as color, depth, and thermal properties are now commonly accessible and useful for solving a wide array of challenging tasks. However, deploying multi-sensor heads is not possible in many scenarios. As such many practical solutions tend to be based on simpler sensors, mostly for cost, simplicity and robustness considerations. In this work, we propose a training methodology to take advantage of these additional modalities available in datasets, even if they are not available at test time. By assuming that the modalities have a strong spatial correlation, we propose Input Dropout, a simple technique that consists in stochastic hiding of one or many input modalities at training time, while using only the canonical (e.g. RGB) modalities at test time. We demonstrate that Input Dropout trivially combines with existing deep convolutional architectures, and improves their performance on a wide range of computer vision tasks such as dehazing, 6-DOF object tracking, pedestrian detection and object classification.
Tasks Object Classification, Object Tracking, Pedestrian Detection
Published 2020-02-07
URL https://arxiv.org/abs/2002.02852v1
PDF https://arxiv.org/pdf/2002.02852v1.pdf
PWC https://paperswithcode.com/paper/input-dropout-for-spatially-aligned
Repo
Framework
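
The abstract above describes Input Dropout as stochastically hiding input modalities during training and feeding only the canonical modality at test time. A minimal sketch of that idea follows. It is not the authors' implementation: the function name `input_dropout`, the flat list layout, the `p_drop` parameter, and the choice to hide a modality by filling it with zeros are all assumptions made for illustration.

```python
import random


def input_dropout(rgb, extra, training, p_drop=0.5, rng=random):
    """Concatenate RGB with an extra, spatially aligned modality.

    rgb, extra: flat lists of channel values (hypothetical layout).
    During training the extra modality is hidden with probability p_drop;
    at test time it is always hidden, so the network sees RGB only.
    Hiding is done by zero-filling, which is an assumption of this sketch.
    """
    hide_extra = (not training) or (rng.random() < p_drop)
    if hide_extra:
        extra = [0.0] * len(extra)
    return list(rgb) + list(extra)


# Test-time input: the depth channel is replaced by zeros.
test_input = input_dropout([0.2, 0.5, 0.1], [0.9], training=False)
```

Keeping the input width fixed means the same network can be used in both phases without architectural changes.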

#### On-Device Information Extraction from Screenshots in form of tags

Title On-Device Information Extraction from Screenshots in form of tags
Authors Sumit Kumar, Gopi Ramena, Manoj Goyal, Debi Mohanty, Ankur Agarwal, Benu Changmai, Sukumar Moharana
Abstract We propose a method to make mobile screenshots easily searchable. In this paper, we present the workflow in which we: 1) preprocessed a collection of screenshots, 2) identified script present in image, 3) extracted unstructured text from images, 4) identified language of the extracted text, 5) extracted keywords from the text, 6) identified tags based on image features, 7) expanded tag set by identifying related keywords, 8) inserted image tags with relevant images after ranking and indexed them to make it searchable on device. We made the pipeline which supports multiple languages and executed it on-device, which addressed privacy concerns. We developed novel architectures for components in the pipeline, optimized performance and memory for on-device computation. We observed from experimentation that the solution developed can reduce overall user effort and improve end user experience while searching, whose results are published.
Published 2020-01-11
URL https://arxiv.org/abs/2001.06094v1
PDF https://arxiv.org/pdf/2001.06094v1.pdf
PWC https://paperswithcode.com/paper/on-device-information-extraction-from
Repo
Framework

#### Generating Representative Headlines for News Stories

Title Generating Representative Headlines for News Stories
Authors Xiaotao Gu, Yuning Mao, Jiawei Han, Jialu Liu, Hongkun Yu, You Wu, Cong Yu, Daniel Finnie, Jiaqi Zhai, Nicholas Zukoski
Abstract Millions of news articles are published online every day, which can be overwhelming for readers to follow. Grouping articles that are reporting the same event into news stories is a common way of assisting readers in their news consumption. However, it remains a challenging research problem to efficiently and effectively generate a representative headline for each story. Automatic summarization of a document set has been studied for decades, while few studies have focused on generating representative headlines for a set of articles. Unlike summaries, which aim to capture most information with least redundancy, headlines aim to capture information jointly shared by the story articles in short length, and exclude information that is too specific to each individual article. In this work, we study the problem of generating representative headlines for news stories. We develop a distant supervision approach to train large-scale generation models without any human annotation. This approach centers on two technical components. First, we propose a multi-level pre-training framework that incorporates massive unlabeled corpus with different quality-vs.-quantity balance at different levels. We show that models trained within this framework outperform those trained with pure human curated corpus. Second, we propose a novel self-voting-based article attention layer to extract salient information shared by multiple articles. We show that models that incorporate this layer are robust to potential noises in news stories and outperform existing baselines with or without noises. We can further enhance our model by incorporating human labels, and we show our distant supervision approach significantly reduces the demand on labeled data.
Published 2020-01-26
URL https://arxiv.org/abs/2001.09386v3
PDF https://arxiv.org/pdf/2001.09386v3.pdf
Repo
Framework

#### DIDFuse: Deep Image Decomposition for Infrared and Visible Image Fusion

Title DIDFuse: Deep Image Decomposition for Infrared and Visible Image Fusion
Authors Zixiang Zhao, Shuang Xu, Chunxia Zhang, Junmin Liu, Pengfei Li, Jiangshe Zhang
Abstract Infrared and visible image fusion, a hot topic in the field of image processing, aims at obtaining fused images keeping the advantages of source images. This paper proposes a novel auto-encoder (AE) based fusion network. The core idea is that the encoder decomposes an image into background and detail feature maps with low- and high-frequency information, respectively, and that the decoder recovers the original image. To this end, the loss function makes the background/detail feature maps of source images similar/dissimilar. In the test phase, background and detail feature maps are respectively merged via a fusion module, and the fused image is recovered by the decoder. Qualitative and quantitative results illustrate that our method can generate fusion images containing highlighted targets and abundant detail texture information with strong robustness and meanwhile surpass state-of-the-art (SOTA) approaches.
Tasks Infrared And Visible Image Fusion
Published 2020-03-20
URL https://arxiv.org/abs/2003.09210v2
PDF https://arxiv.org/pdf/2003.09210v2.pdf
PWC https://paperswithcode.com/paper/didfuse-deep-image-decomposition-for-infrared
Repo
Framework

#### Dissipative SymODEN: Encoding Hamiltonian Dynamics with Dissipation and Control into Deep Learning

Title Dissipative SymODEN: Encoding Hamiltonian Dynamics with Dissipation and Control into Deep Learning
Authors Yaofeng Desmond Zhong, Biswadip Dey, Amit Chakraborty
Abstract In this work, we introduce Dissipative SymODEN, a deep learning architecture which can infer the dynamics of a physical system with dissipation from observed state trajectories. To improve prediction accuracy while reducing network size, Dissipative SymODEN encodes the port-Hamiltonian dynamics with energy dissipation and external input into the design of its computation graph and learns the dynamics in a structured way. The learned model, by revealing key aspects of the system, such as the inertia, dissipation, and potential energy, paves the way for energy-based controllers.
Published 2020-02-20
URL https://arxiv.org/abs/2002.08860v2
PDF https://arxiv.org/pdf/2002.08860v2.pdf
PWC https://paperswithcode.com/paper/dissipative-symoden-encoding-hamiltonian
Repo
Framework
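
To make the phrase "port-Hamiltonian dynamics with energy dissipation and external input" concrete, here is a small hand-written example of such a system: a damped mass-spring with a control force. Nothing here is learned and none of it comes from the paper; the Hamiltonian, the constants `m`, `k`, `c`, and the explicit Euler integrator are assumptions chosen for illustration. In Dissipative SymODEN the corresponding terms would be represented by networks.

```python
def energy(q, p, m=1.0, k=1.0):
    """Hamiltonian H(q, p): kinetic plus potential energy."""
    return p * p / (2.0 * m) + 0.5 * k * q * q


def step(q, p, u=0.0, dt=0.01, m=1.0, k=1.0, c=0.5):
    """One explicit Euler step of
        q_dot =  dH/dp
        p_dot = -dH/dq - c * dH/dp + u
    where c is the dissipation coefficient and u the external input.
    """
    dH_dq = k * q
    dH_dp = p / m
    q_dot = dH_dp
    p_dot = -dH_dq - c * dH_dp + u
    return q + dt * q_dot, p + dt * p_dot


def rollout(q, p, steps, **kwargs):
    """Integrate the system for a number of steps."""
    for _ in range(steps):
        q, p = step(q, p, **kwargs)
    return q, p


# With no input (u = 0), dissipation drains the stored energy over time.
q_end, p_end = rollout(1.0, 0.0, steps=1000)
```

Separating the energy, dissipation, and input terms is what lets each be inspected on its own, which is the property the abstract highlights for controller design.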

#### A Comprehensive and Modularized Statistical Framework for Gradient Norm Equality in Deep Neural Networks

Title A Comprehensive and Modularized Statistical Framework for Gradient Norm Equality in Deep Neural Networks
Authors Zhaodong Chen, Lei Deng, Bangyan Wang, Guoqi Li, Yuan Xie
Abstract In recent years, plenty of metrics have been proposed to identify networks that are free of gradient explosion and vanishing. However, due to the diversity of network components and complex serial-parallel hybrid connections in modern DNNs, the evaluation of existing metrics usually requires strong assumptions, complex statistical analysis, or has limited application fields, which constrains their spread in the community. In this paper, inspired by the Gradient Norm Equality and dynamical isometry, we first propose a novel metric called Block Dynamical Isometry, which measures the change of gradient norm in individual blocks. Because our Block Dynamical Isometry is norm-based, its evaluation needs weaker assumptions compared with the original dynamical isometry. To mitigate the challenging derivation, we propose a highly modularized statistical framework based on free probability. Our framework includes several key theorems to handle complex serial-parallel hybrid connections and a library to cover the diversity of network components. Besides, several sufficient prerequisites are provided. Powered by our metric and framework, we analyze extensive initialization, normalization, and network structures. We find that Gradient Norm Equality is a universal philosophy behind them. Then, we improve some existing methods based on our analysis, including an activation function selection strategy for initialization techniques, a new configuration for weight normalization, and a depth-aware way to derive coefficients in SeLU. Moreover, we propose a novel normalization technique named second moment normalization, which is theoretically 30% faster than batch normalization without accuracy loss. Last but not least, our conclusions and methods are evidenced by extensive experiments on multiple models over CIFAR10 and ImageNet.
Published 2020-01-01
URL https://arxiv.org/abs/2001.00254v1
PDF https://arxiv.org/pdf/2001.00254v1.pdf
PWC https://paperswithcode.com/paper/a-comprehensive-and-modularized-statistical
Repo
Framework

#### Faster On-Device Training Using New Federated Momentum Algorithm

Title Faster On-Device Training Using New Federated Momentum Algorithm
Authors Zhouyuan Huo, Qian Yang, Bin Gu, Lawrence Carin, Heng Huang
Abstract Mobile crowdsensing has gained significant attention in recent years and has become a critical paradigm for emerging Internet of Things applications. The sensing devices continuously generate a significant quantity of data, which provide tremendous opportunities to develop innovative intelligent applications. To utilize these data to train machine learning models while not compromising user privacy, federated learning has become a promising solution. However, there is little understanding of whether federated learning algorithms are guaranteed to converge. We reconsider model averaging in federated learning and formulate it as a gradient-based method with biased gradients. This novel perspective assists analysis of its convergence rate and provides a new direction for more acceleration. We prove for the first time that the federated averaging algorithm is guaranteed to converge for non-convex problems, without imposing additional assumptions. We further propose a novel accelerated federated learning algorithm and provide a convergence guarantee. Simulated federated learning experiments are conducted to train deep neural networks on benchmark datasets, and experimental results show that our proposed method converges faster than previous approaches.
Published 2020-02-06
URL https://arxiv.org/abs/2002.02090v1
PDF https://arxiv.org/pdf/2002.02090v1.pdf
PWC https://paperswithcode.com/paper/faster-on-device-training-using-new-federated
Repo
Framework
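
The abstract treats model averaging as a gradient-based method, which suggests viewing the gap between the server model and the averaged client models as a pseudo-gradient. The sketch below applies plain heavy-ball momentum to that pseudo-gradient on toy scalar clients. It is a generic illustration, not the algorithm proposed in the paper: the quadratic client objectives, the function names, and the values of `lr`, `steps`, and `beta` are all assumptions.

```python
def local_sgd(w, center, lr=0.1, steps=5):
    """A few local SGD steps on the toy client loss 0.5 * (w - center)**2."""
    for _ in range(steps):
        w -= lr * (w - center)
    return w


def federated_momentum(centers, rounds=100, beta=0.5, w=0.0):
    """Server loop: average client models, then take a momentum step.

    With beta = 0 this reduces to plain model averaging, since the
    server model is simply replaced by the client average.
    """
    v = 0.0
    for _ in range(rounds):
        local = [local_sgd(w, c) for c in centers]
        avg = sum(local) / len(local)
        pseudo_grad = w - avg      # model averaging seen as a gradient step
        v = beta * v + pseudo_grad
        w -= v
    return w


# Three clients whose optima differ; the global optimum is their mean.
w_final = federated_momentum([1.0, 2.0, 6.0])
```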

#### Few-shot Action Recognition via Improved Attention with Self-supervision

Title Few-shot Action Recognition via Improved Attention with Self-supervision
Authors Hongguang Zhang, Li Zhang, Xiaojuan Qi, Hongdong Li, Philip H. S. Torr, Piotr Koniusz
Abstract Most existing few-shot learning methods in computer vision focus on class recognition given a few still images as the input. In contrast, this paper tackles a more challenging task of few-shot action-recognition from video clips. We propose a simple framework which is both flexible and easy to implement. Our approach exploits joint spatial and temporal attention mechanisms in conjunction with self-supervised representation learning on videos. This design encourages the model to discover and encode spatial and temporal attention hotspots important during the similarity learning between dynamic video sequences for which locations of discriminative patterns vary in the spatio-temporal sense. Our method compares favorably with several state-of-the-art baselines on HMDB51, miniMIT and UCF101 datasets, demonstrating its superior performance.
Published 2020-01-12
URL https://arxiv.org/abs/2001.03905v1
PDF https://arxiv.org/pdf/2001.03905v1.pdf
PWC https://paperswithcode.com/paper/few-shot-action-recognition-via-improved
Repo
Framework

#### Heterogeneous Graph Neural Networks for Malicious Account Detection

Title Heterogeneous Graph Neural Networks for Malicious Account Detection
Authors Ziqi Liu, Chaochao Chen, Xinxing Yang, Jun Zhou, Xiaolong Li, Le Song
Abstract We present GEM, the first heterogeneous graph neural network approach for detecting malicious accounts at Alipay, one of the world’s leading mobile cashless payment platforms. Our approach, inspired by a connected subgraph approach, adaptively learns discriminative embeddings from heterogeneous account-device graphs based on two fundamental weaknesses of attackers, i.e. device aggregation and activity aggregation. Because the heterogeneous graph consists of various types of nodes, we propose an attention mechanism to learn the importance of different types of nodes, while using the sum operator for modeling the aggregation patterns of nodes in each type. Experiments show that our approaches consistently achieve promising results compared with competitive methods over time.
Published 2020-02-27
URL https://arxiv.org/abs/2002.12307v1
PDF https://arxiv.org/pdf/2002.12307v1.pdf
PWC https://paperswithcode.com/paper/heterogeneous-graph-neural-networks-for
Repo
Framework
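
The aggregation rule described in the abstract (sum within each node type, attention across types) can be sketched in a few lines. This is an illustrative reading of the abstract only, not GEM's actual layer: the softmax form of the attention, the function names, the type labels, and the dictionary layout are assumptions, and every type is assumed to have at least one neighbor.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(neighbors_by_type, type_scores):
    """Sum neighbor embeddings within each type, then mix the per-type
    sums with attention weights (softmax over one score per type)."""
    types = sorted(neighbors_by_type)
    weights = softmax([type_scores[t] for t in types])
    dim = len(neighbors_by_type[types[0]][0])
    out = [0.0] * dim
    for t, w in zip(types, weights):
        for vec in neighbors_by_type[t]:
            for d in range(dim):
                out[d] += w * vec[d]
    return out


# Two account neighbors and one device neighbor, equal attention scores.
neighbors = {"account": [[1.0, 0.0], [1.0, 0.0]], "device": [[0.0, 2.0]]}
mixed = aggregate(neighbors, {"account": 0.0, "device": 0.0})
```

Using a sum rather than a mean within a type keeps the neighbor count visible in the embedding, which matters when the signal of interest is many accounts aggregating on one device.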

#### PushNet: Efficient and Adaptive Neural Message Passing

Title PushNet: Efficient and Adaptive Neural Message Passing
Authors Julian Busch, Jiaxing Pi, Thomas Seidl
Abstract Message passing neural networks have recently evolved into a state-of-the-art approach to representation learning on graphs. Existing methods perform synchronous message passing along all edges in multiple subsequent rounds and consequently suffer from various shortcomings: Propagation schemes are inflexible since they are restricted to $k$-hop neighborhoods and insensitive to actual demands of information propagation. Further, long-range dependencies cannot be modeled adequately and learned representations are based on correlations of fixed locality. These issues prevent existing methods from reaching their full potential in terms of prediction performance. Instead, we consider a novel asynchronous message passing approach where information is pushed only along the most relevant edges until convergence. Our proposed algorithm can equivalently be formulated as a single synchronous message passing iteration using a suitable neighborhood function, thus sharing the advantages of existing methods while addressing their central issues. The resulting neural network utilizes a node-adaptive receptive field derived from meaningful sparse node neighborhoods. In addition, by learning and combining node representations over differently sized neighborhoods, our model is able to capture correlations on multiple scales. We further propose variants of our base model with different inductive bias. Empirical results are provided for semi-supervised node classification on five real-world datasets following a rigorous evaluation protocol. We find that our models outperform competitors on all datasets in terms of accuracy with statistical significance. In some cases, our models additionally provide faster runtime.
Published 2020-03-04
URL https://arxiv.org/abs/2003.02228v2
PDF https://arxiv.org/pdf/2003.02228v2.pdf
Repo
Framework

#### Learning to Hash with Graph Neural Networks for Recommender Systems

Title Learning to Hash with Graph Neural Networks for Recommender Systems
Authors Qiaoyu Tan, Ninghao Liu, Xing Zhao, Hongxia Yang, Jingren Zhou, Xia Hu
Abstract Graph representation learning has attracted much attention in supporting high quality candidate search at scale. Despite its effectiveness in learning embedding vectors for objects in the user-item interaction network, the computational costs to infer users’ preferences in continuous embedding space are tremendous. In this work, we investigate the problem of hashing with graph neural networks (GNNs) for high quality retrieval, and propose a simple yet effective discrete representation learning framework to jointly learn continuous and discrete codes. Specifically, a deep hashing with GNNs (HashGNN) is presented, which consists of two components, a GNN encoder for learning node representations, and a hash layer for encoding representations to hash codes. The whole architecture is trained end-to-end by jointly optimizing two losses, i.e., reconstruction loss from reconstructing observed links, and ranking loss from preserving the relative ordering of hash codes. A novel discrete optimization strategy based on straight through estimator (STE) with guidance is proposed. The principal idea is to avoid gradient magnification in back-propagation of STE with continuous embedding guidance, in which we begin from learning an easier network that mimics the continuous embedding and let it evolve during the training until it finally goes back to STE. Comprehensive experiments over three publicly available and one real-world Alibaba company datasets demonstrate that our model not only can achieve comparable performance compared with its continuous counterpart but also runs multiple times faster during inference.
Tasks Graph Representation Learning, Recommendation Systems, Representation Learning
Published 2020-03-04
URL https://arxiv.org/abs/2003.01917v1
PDF https://arxiv.org/pdf/2003.01917v1.pdf
PWC https://paperswithcode.com/paper/learning-to-hash-with-graph-neural-networks
Repo
Framework
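
For readers unfamiliar with the straight-through estimator mentioned in the abstract, the sketch below shows the plain version: the forward pass binarizes with a sign function, and the backward pass copies the gradient through as if the sign were the identity. The "guidance" variant proposed in the paper is not reproduced here. The toy loss, the target bits, and the function names are assumptions made for illustration.

```python
def sign_forward(x):
    """Forward pass of the hash layer: binarize each value to +1 or -1."""
    return [1.0 if v >= 0 else -1.0 for v in x]


def sign_backward_ste(grad_out):
    """Backward pass under the plain STE: treat sign() as the identity,
    so the gradient w.r.t. the input equals the gradient w.r.t. the output."""
    return list(grad_out)


def train_codes(x, target, lr=0.1, steps=10):
    """Push continuous values x so that sign(x) matches the target bits,
    using the squared error between hash code and target as a toy loss."""
    x = list(x)
    for _ in range(steps):
        bits = sign_forward(x)
        grad_bits = [2.0 * (b - t) for b, t in zip(bits, target)]
        grad_x = sign_backward_ste(grad_bits)
        x = [v - lr * g for v, g in zip(x, grad_x)]
    return x


learned = train_codes([0.3, -0.2, 0.1], target=[-1.0, 1.0, 1.0])
codes = sign_forward(learned)
```

Without the straight-through step the gradient of `sign` would be zero almost everywhere and the continuous values would never move.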

#### Localizing Multi-scale Semantic Patches for Image Classification

Title Localizing Multi-scale Semantic Patches for Image Classification
Authors Chuanguang Yang, Xiaolong Hu, Zhulin An, Hui Zhu, Yongjun Xu
Abstract Deep convolutional neural networks (CNN) always non-linearly aggregate the information from the whole input image, which makes it difficult to interpret how relevant regions contribute to the final prediction. In this paper, we construct a light-weight AnchorNet combined with our proposed algorithms to localize multi-scale semantic patches, where the contribution of each patch can be determined due to the linearly spatial aggregation before the softmax layer. Visual explanation shows that localized patches can indeed retain the semantics of the original images, while helping us to further analyze the feature extraction of localization branches with various receptive fields. For practical use, we apply localized patches to downstream classification tasks across widely applied networks. Experimental results demonstrate that replacing the original images can get a clear inference acceleration with only tiny performance degradation.
Published 2020-01-31
URL https://arxiv.org/abs/2002.03737v1
PDF https://arxiv.org/pdf/2002.03737v1.pdf
PWC https://paperswithcode.com/paper/localizing-multi-scale-semantic-patches-for
Repo
Framework

#### q-VAE for Disentangled Representation Learning and Latent Dynamical Systems

Title q-VAE for Disentangled Representation Learning and Latent Dynamical Systems
Authors Taisuke Kobayashi
Abstract This paper proposes a novel variational autoencoder (VAE) derived from Tsallis statistics, named q-VAE. A vanilla VAE is utilized to statistically extract latent space hidden in data sampled. Such latent space is useful to make robots controllable in feasible computational time and cost. To improve usefulness of the latent space, this paper focuses on disentangled representation learning like $\beta$-VAE, which is the baseline for it. Starting from the viewpoint of Tsallis statistics, a new lower bound of the q-VAE is derived to maximize likelihood of the data sampled. This can be regarded as an adaptive $\beta$-VAE with a deformed Kullback-Leibler divergence. To verify benefits from the q-VAE, a benchmark task to extract the latent space from MNIST dataset is performed. It is found that the q-VAE improved the disentangled representation while not deteriorating reconstruction accuracy of the data. As another advantage of the q-VAE, it does not require independency between the data. This advantage is demonstrated in learning latent dynamics of a nonlinear dynamical simulation. By combining the disentangled representation, the q-VAE achieves stable and accurate long-term state prediction from the initial state and the actions at respective times.
Published 2020-03-04
URL https://arxiv.org/abs/2003.01852v1
PDF https://arxiv.org/pdf/2003.01852v1.pdf
PWC https://paperswithcode.com/paper/q-vae-for-disentangled-representation
Repo
Framework
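
Tsallis statistics, on which the q-VAE above is built, replaces the ordinary logarithm with the q-logarithm, ln_q(x) = (x^(1-q) - 1) / (1 - q), which recovers ln(x) as q approaches 1. The snippet below only illustrates that standard definition. How the q-VAE lower bound uses it is not spelled out in the abstract and is not reproduced here; the function name `q_log` is an assumption.

```python
import math


def q_log(x, q):
    """Tsallis q-logarithm of a positive number x.

    Falls back to the natural logarithm at q = 1, where the
    general formula would divide by zero.
    """
    if q == 1.0:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


# The q-logarithm approaches the natural log as q tends to 1.
gap = abs(q_log(2.0, 0.999) - math.log(2.0))
```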

#### Towards Novel Insights in Lattice Field Theory with Explainable Machine Learning

Title Towards Novel Insights in Lattice Field Theory with Explainable Machine Learning
Authors Stefan Bluecher, Lukas Kades, Jan M. Pawlowski, Nils Strodthoff, Julian M. Urban
Abstract Machine learning has the potential to aid our understanding of phase structures in lattice quantum field theories through the statistical analysis of Monte Carlo samples. Available algorithms, in particular those based on deep learning, often demonstrate remarkable performance in the search for previously unidentified features, but tend to lack transparency if applied naively. To address these shortcomings, we propose representation learning in combination with interpretability methods as a framework for the identification of observables. More specifically, we investigate action parameter regression as a pretext task while using layer-wise relevance propagation (LRP) to identify the most important observables depending on the location in the phase diagram. The approach is put to work in the context of a scalar Yukawa model in (2+1)d. First, we investigate a multilayer perceptron to determine an importance hierarchy of several predefined, standard observables. The method is then applied directly to the raw field configurations using a convolutional network, demonstrating the ability to reconstruct all order parameters from the learned filter weights. Based on our results, we argue that due to its broad applicability, attribution methods such as LRP could prove a useful and versatile tool in our search for new physical insights. In the case of the Yukawa model, it facilitates the construction of an observable that characterises the symmetric phase.
Published 2020-03-03
URL https://arxiv.org/abs/2003.01504v1
PDF https://arxiv.org/pdf/2003.01504v1.pdf
PWC https://paperswithcode.com/paper/towards-novel-insights-in-lattice-field
Repo
Framework

#### Oblivious Data for Fairness with Kernels

Title Oblivious Data for Fairness with Kernels