April 1, 2020

3417 words 17 mins read

Paper Group ANR 399

Quantum Machine Learning Algorithm for Knowledge Graphs. Kleister: A novel task for Information Extraction involving Long Documents with Complex Layout. Proficiency Aware Multi-Agent Actor-Critic for Mixed Aerial and Ground Robot Teaming. Training for Speech Recognition on Coprocessors. Towards Deep Unsupervised SAR Despeckling with Blind-Spot Conv …

Quantum Machine Learning Algorithm for Knowledge Graphs

Title Quantum Machine Learning Algorithm for Knowledge Graphs
Authors Yunpu Ma, Yuyi Wang, Volker Tresp
Abstract Semantic knowledge graphs are large-scale triple-oriented databases for knowledge representation and reasoning. Implicit knowledge can be inferred by modeling and reconstructing the tensor representations generated from knowledge graphs. However, as the sizes of knowledge graphs continue to grow, classical modeling becomes increasingly computationally expensive. This paper investigates how quantum resources can be capitalized on to accelerate the modeling of knowledge graphs. In particular, we propose the first quantum machine learning algorithm for performing inference on tensorized data, e.g., on knowledge graphs. Since most tensor problems are NP-hard, it is challenging to devise quantum algorithms to support this task. We simplify the problem by making the plausible assumption, verified by our experiments, that the tensor representation of a knowledge graph can be approximated by its low-rank tensor singular value decomposition. The proposed sampling-based quantum algorithm achieves an exponential speedup, with a runtime that is polylogarithmic in the dimension of the knowledge graph tensor.
Tasks Knowledge Graphs, Quantum Machine Learning
Published 2020-01-04
URL https://arxiv.org/abs/2001.01077v1
PDF https://arxiv.org/pdf/2001.01077v1.pdf
PWC https://paperswithcode.com/paper/quantum-machine-learning-algorithm-for
Repo
Framework
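The low-rank assumption that the paper verifies experimentally can be illustrated classically: if the knowledge-graph tensor is (approximately) low rank, a truncated SVD of one of its unfoldings reconstructs it almost exactly. Below is a minimal numpy sketch on synthetic data; it illustrates the assumption only, not the quantum algorithm itself.

```python
import numpy as np

def low_rank_unfolding_approx(T, rank):
    """Approximate a 3-way knowledge-graph tensor via a truncated SVD
    of its mode-1 unfolding (subject x (relation*object))."""
    n1, n2, n3 = T.shape
    M = T.reshape(n1, n2 * n3)                   # mode-1 unfolding
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    M_r = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # rank-r reconstruction
    return M_r.reshape(n1, n2, n3)

# A synthetic rank-3 tensor is recovered almost exactly at rank 3.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))
B = rng.normal(size=(5, 3))
C = rng.normal(size=(20, 3))
T = np.einsum('ir,jr,kr->ijk', A, B, C)          # rank-3 CP tensor
T_hat = low_rank_unfolding_approx(T, rank=3)
err = np.linalg.norm(T - T_hat) / np.linalg.norm(T)
print(err < 1e-8)
```

For a real knowledge graph the tensor is only approximately low rank, so the relative error would be small but nonzero rather than numerically exact as here.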

Kleister: A novel task for Information Extraction involving Long Documents with Complex Layout

Title Kleister: A novel task for Information Extraction involving Long Documents with Complex Layout
Authors Filip Graliński, Tomasz Stanisławek, Anna Wróblewska, Dawid Lipiński, Agnieszka Kaliska, Paulina Rosalska, Bartosz Topolski, Przemysław Biecek
Abstract State-of-the-art solutions for Natural Language Processing (NLP) are able to capture a broad range of contexts, like the sentence-level context or document-level context for short documents. However, these solutions still struggle with longer, real-world documents in which information is encoded in the spatial structure of the document, such as page elements like tables, forms, headers, openings or footers; complex page layouts; or the presence of multiple pages. To encourage progress on deeper and more complex Information Extraction (IE), we introduce a new task (named Kleister) with two new datasets. Utilizing both textual and structural layout features, an NLP system must find the most important information, about various types of entities, in long formal documents. We propose a Pipeline method as a text-only baseline with different Named Entity Recognition architectures (Flair, BERT, RoBERTa). Moreover, we evaluated the most popular PDF processing tools for text extraction (pdf2djvu, Tesseract and Textract) in order to analyze the behavior of an IE system in the presence of errors introduced by these tools.
Tasks Named Entity Recognition
Published 2020-03-04
URL https://arxiv.org/abs/2003.02356v2
PDF https://arxiv.org/pdf/2003.02356v2.pdf
PWC https://paperswithcode.com/paper/kleister-a-novel-task-for-information
Repo
Framework
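The text-only Pipeline baseline runs per-entity extractors over flattened document text. The toy version below uses regexes with hypothetical field names purely to illustrate the interface; the actual baselines use trained NER models such as Flair, BERT, or RoBERTa.

```python
import re

def extract_entities(text, patterns):
    """Toy text-only extraction: run one regex per entity type over the
    flattened document text and keep the first match for each."""
    out = {}
    for name, pat in patterns.items():
        m = re.search(pat, text)
        out[name] = m.group(1) if m else None
    return out

# Hypothetical snippet of a long financial report, flattened to text.
doc = "CHARITY COMMISSION REPORT ... Charity Number: 123456 ... Income: £2,400,000"
patterns = {
    "charity_number": r"Charity Number:\s*(\d+)",
    "income": r"Income:\s*(£[\d,]+)",
}
print(extract_entities(doc, patterns))
```

The Kleister datasets are precisely designed to break such brittle text-only extraction, since the key information is often encoded in tables, forms, and page layout that flattening destroys.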

Proficiency Aware Multi-Agent Actor-Critic for Mixed Aerial and Ground Robot Teaming

Title Proficiency Aware Multi-Agent Actor-Critic for Mixed Aerial and Ground Robot Teaming
Authors Qifei Yu, Zhexin Shen, Yijiang Pang, Rui Liu
Abstract Mixed cooperation and competition are realistic scenarios for deploying multi-robot systems, such as multi-UAV/UGV teaming for tracking criminal vehicles and protecting important individuals. The types and total number of robots are important factors that influence the quality of mixed cooperation. In various real-world environments, such as open space, forest, and urban building clusters, robot deployment is heavily constrained, as different robots have different configurations suited to different environments. For example, UGVs are good at moving on urban roads and reaching forest areas, while UAVs are good at flying in open space and around high building clusters. However, it is challenging to design collective behaviors for robot cooperation that account for dynamic changes in robot capabilities, working status, and environmental constraints. To address this challenge, we propose a novel proficiency-aware mixed-environment multi-agent deep reinforcement learning method (Mix-DRL). In Mix-DRL, robot capability and environmental factors are formalized into the model to update the policy, modeling the nonlinear relations between heterogeneous team deployment strategies and real-world environmental conditions. Mix-DRL can largely exploit robot capability while staying aware of environmental limitations. Validated with a heterogeneous team of 2 UAVs and 2 UGVs in tasks such as criminal vehicle tracking for social security, Mix-DRL's effectiveness has been evaluated, showing a 14.20% improvement in cooperation. Given its general setting, Mix-DRL can be used to guide the cooperation of UAVs and UGVs for multi-target tracking.
Tasks
Published 2020-02-10
URL https://arxiv.org/abs/2002.03910v1
PDF https://arxiv.org/pdf/2002.03910v1.pdf
PWC https://paperswithcode.com/paper/proficiency-aware-multi-agent-actor-critic
Repo
Framework

Training for Speech Recognition on Coprocessors

Title Training for Speech Recognition on Coprocessors
Authors Sebastian Baunsgaard, Sebastian B. Wrede, Pınar Tozun
Abstract Automatic Speech Recognition (ASR) has increased in popularity in recent years. The evolution of processor and storage technologies has enabled more advanced ASR mechanisms, fueling the development of virtual assistants such as Amazon Alexa, Apple Siri, Microsoft Cortana, and Google Home. The interest in such assistants, in turn, has amplified the novel developments in ASR research. However, despite this popularity, there has not been a detailed training efficiency analysis of modern ASR systems. This mainly stems from: the proprietary nature of many modern applications that depend on ASR, like the ones listed above; the relatively expensive co-processor hardware that is used to accelerate ASR by big vendors to enable such applications; and the absence of well-established benchmarks. The goal of this paper is to address the latter two of these challenges. The paper first describes an ASR model, based on a deep neural network inspired by recent work in this domain, and our experiences building it. Then we evaluate this model on three CPU-GPU co-processor platforms that represent different budget categories. Our results demonstrate that utilizing hardware acceleration yields good results even without high-end equipment. While the most expensive platform (10X price of the least expensive one) converges to the initial accuracy target 10-30% and 60-70% faster than the other two, the differences among the platforms almost disappear at slightly higher accuracy targets. In addition, our results further highlight both the difficulty of evaluating ASR systems due to the complex, long, and resource intensive nature of the model training in this domain, and the importance of establishing benchmarks for ASR.
Tasks Speech Recognition
Published 2020-03-22
URL https://arxiv.org/abs/2003.12366v1
PDF https://arxiv.org/pdf/2003.12366v1.pdf
PWC https://paperswithcode.com/paper/training-for-speech-recognition-on
Repo
Framework

Towards Deep Unsupervised SAR Despeckling with Blind-Spot Convolutional Neural Networks

Title Towards Deep Unsupervised SAR Despeckling with Blind-Spot Convolutional Neural Networks
Authors Andrea Bordone Molini, Diego Valsesia, Giulia Fracastoro, Enrico Magli
Abstract SAR despeckling is a problem of paramount importance in remote sensing, since it represents the first step of many scene analysis algorithms. Recently, deep learning techniques have outperformed classical model-based despeckling algorithms. However, such methods require clean ground truth images for training, thus resorting to synthetically speckled optical images since clean SAR images cannot be acquired. In this paper, inspired by recent works on blind-spot denoising networks, we propose a self-supervised Bayesian despeckling method. The proposed method is trained employing only noisy images and can therefore learn features of real SAR images rather than synthetic data. We show that the performance of the proposed network is very close to the supervised training approach on synthetic data and competitive on real data.
Tasks Denoising
Published 2020-01-15
URL https://arxiv.org/abs/2001.05264v1
PDF https://arxiv.org/pdf/2001.05264v1.pdf
PWC https://paperswithcode.com/paper/towards-deep-unsupervised-sar-despeckling
Repo
Framework
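The blind-spot idea can be illustrated with a convolution whose kernel centre is zeroed, so each output pixel is predicted only from its neighbours and never from its own (noisy) value; this is what lets the network train on noisy images alone. A toy numpy sketch follows (the paper uses trained blind-spot CNNs, not a fixed kernel):

```python
import numpy as np

def blind_spot_conv2d(img, kernel):
    """2-D filtering whose kernel centre is forced to zero, so each
    output pixel never sees its own noisy input (the 'blind spot')."""
    k = kernel.copy()
    kh, kw = k.shape
    k[kh // 2, kw // 2] = 0.0                # mask out the centre tap
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode='reflect')
    out = np.zeros_like(img, dtype=float)
    H, W = img.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * k)
    return out

# Changing only the centre pixel leaves that pixel's own output unchanged.
img = np.ones((7, 7))
kernel = np.full((3, 3), 1.0 / 8.0)          # average of the 8 neighbours
y0 = blind_spot_conv2d(img, kernel)
img2 = img.copy(); img2[3, 3] = 100.0
y1 = blind_spot_conv2d(img2, kernel)
print(abs(y0[3, 3] - y1[3, 3]) < 1e-12)
```

Because the prediction at each pixel is independent of that pixel's noise, minimizing a reconstruction loss against the noisy image itself yields a denoiser, which is the self-supervised principle the paper adapts to SAR speckle statistics.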

Language Technology Programme for Icelandic 2019-2023

Title Language Technology Programme for Icelandic 2019-2023
Authors Anna Björk Nikulásdóttir, Jón Guðnason, Anton Karl Ingason, Hrafn Loftsson, Eiríkur Rögnvaldsson, Einar Freyr Sigurðsson, Steinþór Steingrímsson
Abstract In this paper, we describe a new national language technology programme for Icelandic. The programme, which spans a period of five years, aims at making Icelandic usable in communication and interactions in the digital world by developing accessible, open-source language resources and software. The research and development work within the programme is carried out by a consortium of universities, institutions, and private companies, with a strong emphasis on cooperation between academia and industry. Five core projects form the main content of the programme: language resources, speech recognition, speech synthesis, machine translation, and spell and grammar checking. We also describe other national language technology programmes and give an overview of the history of language technology in Iceland.
Tasks Machine Translation, Speech Recognition, Speech Synthesis
Published 2020-03-20
URL https://arxiv.org/abs/2003.09244v1
PDF https://arxiv.org/pdf/2003.09244v1.pdf
PWC https://paperswithcode.com/paper/language-technology-programme-for-icelandic
Repo
Framework

AVR: Attention based Salient Visual Relationship Detection

Title AVR: Attention based Salient Visual Relationship Detection
Authors Jianming Lv, Qinzhe Xiao, Jiajie Zhong
Abstract Visual relationship detection aims to locate objects in images and recognize the relationships between them. Traditional methods treat all observed relationships in an image equally, which causes relatively poor performance on detection tasks over complex images with abundant visual objects and various relationships. To address this problem, we propose an attention-based model, namely AVR, to detect salient visual relationships based on both the local and global context of the relationships. Specifically, AVR recognizes relationships and measures attention over them in the local context of an input image by fusing the visual features, semantic and spatial information of the relationships. AVR then applies the attention to assign important relationships larger salient weights for effective information filtering. Furthermore, AVR is integrated with prior knowledge in the global context of image datasets to improve the precision of relationship prediction, where the context is modeled as a heterogeneous graph to measure the prior probability of relationships based on the random walk algorithm. Comprehensive experiments on several real-world image datasets demonstrate the effectiveness of AVR, and the results show that AVR outperforms state-of-the-art visual relationship detection methods significantly, by up to 87.5% in terms of recall.
Tasks
Published 2020-03-16
URL https://arxiv.org/abs/2003.07012v1
PDF https://arxiv.org/pdf/2003.07012v1.pdf
PWC https://paperswithcode.com/paper/avr-attention-based-salient-visual
Repo
Framework
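The saliency-weighting step can be sketched as a softmax attention over per-relationship scores; the names and shapes below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def salient_weighting(rel_scores, attn_logits):
    """Weight per-relationship prediction scores by softmax attention,
    so that salient relationships dominate the final ranking."""
    a = np.exp(attn_logits - attn_logits.max())   # numerically stable softmax
    attn = a / a.sum()
    return attn * rel_scores

scores = np.array([0.9, 0.8, 0.7])   # classifier confidence per candidate triplet
logits = np.array([2.0, 0.0, -2.0])  # hypothetical learned saliency logits
weighted = salient_weighting(scores, logits)
print(np.argmax(weighted))           # the most salient relationship ranks first
```

In AVR the attention itself is computed from fused visual, semantic, and spatial features of each relationship, rather than given as fixed logits as in this sketch.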

Understanding Why Neural Networks Generalize Well Through GSNR of Parameters

Title Understanding Why Neural Networks Generalize Well Through GSNR of Parameters
Authors Jinlong Liu, Guoqing Jiang, Yunzhi Bai, Ting Chen, Huayan Wang
Abstract As deep neural networks (DNNs) achieve tremendous success across many application domains, researchers have explored many aspects of why they generalize well. In this paper, we provide a novel perspective on these questions using the gradient signal-to-noise ratio (GSNR) of parameters during the training process of DNNs. The GSNR of a parameter is defined as the ratio between the squared mean and the variance of its gradient over the data distribution. Based on several approximations, we establish a quantitative relationship between model parameters' GSNR and the generalization gap. This relationship indicates that a larger GSNR during training leads to better generalization performance. Moreover, we show that, unlike shallow models (e.g., logistic regression, support vector machines), the gradient descent optimization dynamics of DNNs naturally produce large GSNR during training, which is probably the key to DNNs' remarkable generalization ability.
Tasks
Published 2020-01-21
URL https://arxiv.org/abs/2001.07384v2
PDF https://arxiv.org/pdf/2001.07384v2.pdf
PWC https://paperswithcode.com/paper/understanding-why-neural-networks-generalize-1
Repo
Framework
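The quantity at the heart of the paper is straightforward to compute when per-sample gradients are available: the GSNR of a parameter is its gradient's squared mean divided by its gradient variance over the data. A minimal numpy sketch with synthetic gradients:

```python
import numpy as np

def gsnr(per_sample_grads, eps=1e-12):
    """Gradient signal-to-noise ratio per parameter: the squared mean of
    the per-sample gradients divided by their variance across samples."""
    g = np.asarray(per_sample_grads)     # shape: (n_samples, n_params)
    mean = g.mean(axis=0)
    var = g.var(axis=0)
    return mean ** 2 / (var + eps)

rng = np.random.default_rng(1)
# Parameter 0: samples agree on the gradient direction (high GSNR).
# Parameter 1: samples disagree, gradient averages to noise (low GSNR).
g0 = rng.normal(loc=1.0, scale=0.1, size=1000)
g1 = rng.normal(loc=0.0, scale=1.0, size=1000)
r = gsnr(np.stack([g0, g1], axis=1))
print(r[0] > r[1])
```

Intuitively, a high-GSNR parameter is pushed in the same direction by most training samples, so the update it receives is likely to transfer to unseen data, which is the link to the generalization gap that the paper makes quantitative.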

Multiple Object Tracking by Flowing and Fusing

Title Multiple Object Tracking by Flowing and Fusing
Authors Jimuyang Zhang, Sanping Zhou, Xin Chang, Fangbin Wan, Jinjun Wang, Yang Wu, Dong Huang
Abstract Most Multiple Object Tracking (MOT) approaches compute individual target features for two subtasks: estimating target-wise motions and conducting pair-wise Re-Identification (Re-ID). Because of the indefinite number of targets across video frames, both subtasks are very difficult to scale up efficiently in end-to-end Deep Neural Networks (DNNs). In this paper, we design an end-to-end DNN tracking approach, Flow-Fuse-Tracker (FFT), that addresses the above issues with two efficient techniques: target flowing and target fusing. Specifically, in target flowing, a FlowTracker DNN module learns the indefinite number of target-wise motions jointly from pixel-level optical flows. In target fusing, a FuseTracker DNN module refines and fuses targets proposed by FlowTracker and frame-wise object detection, instead of trusting either of the two inaccurate sources of target proposals. Because FlowTracker can explore complex target-wise motion patterns and FuseTracker can refine and fuse targets from FlowTracker and detectors, our approach achieves state-of-the-art results on several MOT benchmarks. As an online MOT approach, FFT produced top MOTA scores of 46.3 on 2DMOT15, 56.5 on MOT16, and 56.5 on MOT17, surpassing all online and offline methods in existing publications.
Tasks Multiple Object Tracking, Object Detection, Object Tracking
Published 2020-01-30
URL https://arxiv.org/abs/2001.11180v1
PDF https://arxiv.org/pdf/2001.11180v1.pdf
PWC https://paperswithcode.com/paper/multiple-object-tracking-by-flowing-and
Repo
Framework
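The fusing step relies on associating flow-propagated track boxes with frame-wise detections. A standard way to sketch such matching is intersection-over-union (IoU), used here purely as an illustration; FuseTracker is a learned DNN module, not an IoU matcher.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# A flow-propagated track box is matched to the closest detection.
track_box = [10, 10, 50, 50]
detections = [[100, 100, 140, 140], [12, 11, 52, 49]]
ious = [iou(track_box, d) for d in detections]
best = int(np.argmax(ious))
print(best)   # 1
```

The appeal of a learned fuser over such greedy matching is that it can also refine box coordinates and recover targets that either the flow or the detector alone proposes inaccurately.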

Meta-CoTGAN: A Meta Cooperative Training Paradigm for Improving Adversarial Text Generation

Title Meta-CoTGAN: A Meta Cooperative Training Paradigm for Improving Adversarial Text Generation
Authors Haiyan Yin, Dingcheng Li, Xu Li, Ping Li
Abstract Training generative models that can generate high-quality text with sufficient diversity is an important open problem for the Natural Language Generation (NLG) community. Recently, generative adversarial models have been applied extensively to text generation tasks, where the adversarially trained generators alleviate the exposure bias experienced by conventional maximum likelihood approaches and yield promising generation quality. However, due to the notorious mode-collapse defect of adversarial training, adversarially trained generators face a quality-diversity trade-off, i.e., they tend to sacrifice generation diversity severely to increase generation quality. In this paper, we propose a novel approach that aims to improve the performance of adversarial text generation by efficiently decelerating the mode collapse of adversarial training. To this end, we introduce a cooperative training paradigm, in which a language model is trained cooperatively with the generator and is used to efficiently shape the generator's data distribution against mode collapse. Moreover, rather than applying the cooperative update to the generator directly, we formulate a meta-learning mechanism, in which the cooperative update serves as a high-level meta task, with the intuition of ensuring that the generator's parameters after the adversarial update remain resistant to mode collapse. In experiments, we demonstrate that our approach can efficiently slow down the pace of mode collapse for adversarial text generators. Overall, our method outperforms the baseline approaches by significant margins in terms of both generation quality and diversity on the tested domains.
Tasks Adversarial Text, Language Modelling, Meta-Learning, Text Generation
Published 2020-03-12
URL https://arxiv.org/abs/2003.11530v1
PDF https://arxiv.org/pdf/2003.11530v1.pdf
PWC https://paperswithcode.com/paper/meta-cotgan-a-meta-cooperative-training
Repo
Framework

Hybrid Autoregressive Transducer (HAT)

Title Hybrid Autoregressive Transducer (HAT)
Authors Ehsan Variani, David Rybach, Cyril Allauzen, Michael Riley
Abstract This paper proposes and evaluates the hybrid autoregressive transducer (HAT) model, a time-synchronous encoder-decoder model that preserves the modularity of conventional automatic speech recognition systems. The HAT model provides a way to measure the quality of the internal language model, which can be used to decide whether inference with an external language model is beneficial or not. This article also presents a finite-context version of the HAT model that addresses the exposure bias problem and significantly simplifies overall training and inference. We evaluate our proposed model on a large-scale voice search task. Our experiments show significant improvements in WER compared to state-of-the-art approaches.
Tasks Language Modelling, Speech Recognition
Published 2020-03-12
URL https://arxiv.org/abs/2003.07705v1
PDF https://arxiv.org/pdf/2003.07705v1.pdf
PWC https://paperswithcode.com/paper/hybrid-autoregressive-transducer-hat
Repo
Framework

Author2Vec: A Framework for Generating User Embedding

Title Author2Vec: A Framework for Generating User Embedding
Authors Xiaodong Wu, Weizhe Lin, Zhilin Wang, Elena Rastorgueva
Abstract Online forums and social media platforms provide noisy but valuable data every day. In this paper, we propose a novel end-to-end neural network-based user embedding system, Author2Vec. The model combines sentence representations generated by BERT (Bidirectional Encoder Representations from Transformers) with a novel unsupervised pre-training objective, authorship classification, to produce better user embeddings that encode useful user-intrinsic properties. The system was pre-trained on post data from 10k Reddit users and was analyzed and evaluated on two user classification benchmarks, depression detection and personality classification, on which it outperformed traditional count-based and prediction-based methods. We substantiate that Author2Vec successfully encodes useful user attributes and that the generated user embeddings perform well in downstream classification tasks without further fine-tuning.
Tasks
Published 2020-03-17
URL https://arxiv.org/abs/2003.11627v1
PDF https://arxiv.org/pdf/2003.11627v1.pdf
PWC https://paperswithcode.com/paper/author2vec-a-framework-for-generating-user
Repo
Framework
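One way to sketch the pooling step, turning per-post sentence vectors into a single user vector, is mean pooling with L2 normalisation. The vectors below are random stand-ins for BERT outputs, and the pooling choice is an assumption for illustration; Author2Vec trains the encoder end to end with an authorship-classification objective.

```python
import numpy as np

def user_embedding(post_embeddings):
    """Mean-pool a user's per-post sentence vectors into one user vector,
    then L2-normalise it (a common, simple pooling choice)."""
    v = np.mean(post_embeddings, axis=0)
    return v / np.linalg.norm(v)

rng = np.random.default_rng(2)
posts_a = rng.normal(loc=+1.0, size=(5, 8))   # stand-in for BERT vectors
posts_b = rng.normal(loc=-1.0, size=(5, 8))
ua, ub = user_embedding(posts_a), user_embedding(posts_b)
print(float(ua @ ub) < 0)   # users with dissimilar posts point apart
```

Downstream classifiers (e.g., for depression detection or personality classification) can then consume these fixed-length user vectors directly, which is the "without further finetuning" evaluation setup described in the abstract.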

TDEFSI: Theory Guided Deep Learning Based Epidemic Forecasting with Synthetic Information

Title TDEFSI: Theory Guided Deep Learning Based Epidemic Forecasting with Synthetic Information
Authors Lijing Wang, Jiangzhuo Chen, Madhav Marathe
Abstract Influenza-like illness (ILI) places a heavy social and economic burden on our society. Traditionally, ILI surveillance data is updated weekly and provided at a spatially coarse resolution. Producing timely and reliable high-resolution spatiotemporal forecasts for ILI is crucial for local preparedness and optimal interventions. We present TDEFSI (Theory Guided Deep Learning Based Epidemic Forecasting with Synthetic Information), an epidemic forecasting framework that integrates the strengths of deep neural networks and high-resolution simulations of epidemic processes over networks. TDEFSI yields accurate high-resolution spatiotemporal forecasts using low-resolution time series data. During the training phase, TDEFSI uses high-resolution simulations of epidemics that explicitly model spatial and social heterogeneity inherent in urban regions as one component of training data. We train a two-branch recurrent neural network model to take both within-season and between-season low-resolution observations as features, and output high-resolution detailed forecasts. The resulting forecasts are not just driven by observed data but also capture the intricate social, demographic and geographic attributes of specific urban regions and mathematical theories of disease propagation over networks. We focus on forecasting the incidence of ILI and evaluate TDEFSI’s performance using synthetic and real-world testing datasets at the state and county levels in the USA. The results show that, at the state level, our method achieves comparable/better performance than several state-of-the-art methods. At the county level, TDEFSI outperforms the other methods. The proposed method can be applied to other infectious diseases as well.
Tasks Time Series
Published 2020-01-28
URL https://arxiv.org/abs/2002.04663v1
PDF https://arxiv.org/pdf/2002.04663v1.pdf
PWC https://paperswithcode.com/paper/tdefsi-theory-guided-deep-learning-based
Repo
Framework

Tethered Aerial Visual Assistance

Title Tethered Aerial Visual Assistance
Authors Xuesu Xiao, Jan Dufek, Robin R. Murphy
Abstract In this paper, an autonomous tethered Unmanned Aerial Vehicle (UAV) is developed into a visual assistant in a marsupial co-robot team, collaborating with a tele-operated Unmanned Ground Vehicle (UGV) for robot operations in unstructured or confined environments. These environments pose extreme challenges to the remote tele-operator due to the lack of sufficient situational awareness, mostly caused by the unstructuredness and confinement, the stationary and limited field of view, and the lack of depth perception from the robot's onboard cameras. To overcome these problems, current practice uses a secondary tele-operated robot, which acts as a visual assistant and provides external viewpoints to overcome the perceptual limitations of the primary robot's onboard sensors. However, a second tele-operated robot requires extra manpower and teamwork between the primary and secondary operators, and the manually chosen viewpoints tend to be subjective and sub-optimal. Considering these intricacies, we develop an autonomous tethered aerial visual assistant to replace the secondary tele-operated robot and operator, reducing the human-robot ratio from 2:2 to 1:2. Using a fundamental viewpoint quality theory, a formal risk reasoning framework, and a newly developed tethered motion suite, our visual assistant is able to autonomously navigate to good-quality viewpoints in a risk-aware manner through unstructured or confined spaces with a tether. The developed marsupial co-robot team could improve tele-operation efficiency in nuclear operations, bomb squads, disaster response, and other domains with novel tasks or highly occluded environments, by reducing manpower and teamwork demands and achieving better visual assistance quality with trustworthy risk-aware motion.
Tasks
Published 2020-01-15
URL https://arxiv.org/abs/2001.06347v1
PDF https://arxiv.org/pdf/2001.06347v1.pdf
PWC https://paperswithcode.com/paper/tethered-aerial-visual-assistance
Repo
Framework

Detecting impending malnutrition of elderly people in domestic smart home environments

Title Detecting impending malnutrition of elderly people in domestic smart home environments
Authors Björn Friedrich, Jürgen Bauer, Andreas Hein
Abstract Proper nutrition is very important for the well-being and independence of elderly people. A significant loss of body weight, or correspondingly a decrease of the Body Mass Index (BMI), is an indicator of malnutrition. Continuous monitoring of the BMI enables doctors and nutritionists to intervene on impending malnutrition. However, continuous monitoring of the BMI by professionals is not practical, and self-monitoring is not reliable. In this article a method for monitoring the trend of the BMI based on ambient sensors is introduced. The ambient sensors are used to measure the time a person spends preparing meals at home. When the trend of the 4-week average of this time changes, so does the trend of the BMI over those 4 weeks; the two values show a very strong correlation. Thus, the average time for preparing a meal is a suitable indicator for doctors and nutritionists to examine the patient further, become aware of an impending malnutrition, and intervene at an early stage. The method has been tested on a real-world dataset collected during a 10-month field study with 20 participants aged about 85 years.
Tasks
Published 2020-03-31
URL https://arxiv.org/abs/2003.14159v1
PDF https://arxiv.org/pdf/2003.14159v1.pdf
PWC https://paperswithcode.com/paper/detecting-impending-malnutrition-of-elderly
Repo
Framework
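The core statistical claim, that the 4-week average meal-preparation time tracks the BMI trend, boils down to a correlation between two short series. A sketch with hypothetical numbers (not the study's data):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))

# Hypothetical 4-week averages: minutes spent preparing meals, and BMI.
prep_minutes = [42, 41, 39, 35, 31, 28, 26, 25]   # declining engagement
bmi          = [24.1, 24.0, 23.8, 23.4, 23.0, 22.6, 22.4, 22.3]
r = pearson_r(prep_minutes, bmi)
print(r > 0.9)   # the two trends move together
```

A sustained drop in the meal-preparation series would then serve as the early-warning signal for clinicians, without requiring the person to weigh themselves regularly.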