January 31, 2020

Paper Group ANR 132

Symbiotic Graph Neural Networks for 3D Skeleton-based Human Action Recognition and Motion Prediction


Title	Symbiotic Graph Neural Networks for 3D Skeleton-based Human Action Recognition and Motion Prediction
Authors	Maosen Li, Siheng Chen, Xu Chen, Ya Zhang, Yanfeng Wang, Qi Tian
Abstract	3D skeleton-based action recognition and motion prediction are two essential problems of human activity understanding. In many previous works: 1) they studied two tasks separately, neglecting internal correlations; 2) they did not capture sufficient relations inside the body. To address these issues, we propose a symbiotic model to handle two tasks jointly; and we propose two scales of graphs to explicitly capture relations among body-joints and body-parts. Together, we propose symbiotic graph neural networks, which contain a backbone, an action-recognition head, and a motion-prediction head. Two heads are trained jointly and enhance each other. For the backbone, we propose multi-branch multi-scale graph convolution networks to extract spatial and temporal features. The multi-scale graph convolution networks are based on joint-scale and part-scale graphs. The joint-scale graphs contain actional graphs, capturing action-based relations, and structural graphs, capturing physical constraints. The part-scale graphs integrate body-joints to form specific parts, representing high-level relations. Moreover, dual bone-based graphs and networks are proposed to learn complementary features. We conduct extensive experiments for skeleton-based action recognition and motion prediction with four datasets, NTU-RGB+D, Kinetics, Human3.6M, and CMU Mocap. Experiments show that our symbiotic graph neural networks achieve better performances on both tasks compared to the state-of-the-art methods.
Tasks	Motion Prediction, Skeleton Based Action Recognition, Temporal Action Localization
Published	2019-10-05
URL	https://arxiv.org/abs/1910.02212v1
PDF	https://arxiv.org/pdf/1910.02212v1.pdf
PWC	https://paperswithcode.com/paper/symbiotic-graph-neural-networks-for-3d
Repo
Framework

Mitigating the Hubness Problem for Zero-Shot Learning of 3D Objects


Title	Mitigating the Hubness Problem for Zero-Shot Learning of 3D Objects
Authors	Ali Cheraghian, Shafin Rahman, Dylan Campbell, Lars Petersson
Abstract	The development of advanced 3D sensors has enabled many objects to be captured in the wild at a large scale, and a 3D object recognition system may therefore encounter many objects for which the system has received no training. Zero-Shot Learning (ZSL) approaches can assist such systems in recognizing previously unseen objects. Applying ZSL to 3D point cloud objects is an emerging topic in the area of 3D vision; however, a significant problem that ZSL often suffers from is the so-called hubness problem, which is when a model is biased to predict only a few particular labels for most of the test instances. We observe that this hubness problem is even more severe for 3D recognition than for 2D recognition. One reason for this is that in 2D one can use pre-trained networks trained on large datasets like ImageNet, which produce high-quality features. However, in the 3D case there are no such large-scale, labelled datasets available for pre-training, which means that the extracted 3D features are of poorer quality, which in turn exacerbates the hubness problem. In this paper, we therefore propose a loss to specifically address the hubness problem. Our proposed method is effective for both Zero-Shot and Generalized Zero-Shot Learning, and we perform extensive evaluations on the challenging datasets ModelNet40, ModelNet10, McGill and SHREC2015. A new state-of-the-art result for both zero-shot tasks in the 3D case is established.
Tasks	3D Object Recognition, Object Recognition, Zero-Shot Learning
Published	2019-07-15
URL	https://arxiv.org/abs/1907.06371v1
PDF	https://arxiv.org/pdf/1907.06371v1.pdf
PWC	https://paperswithcode.com/paper/mitigating-the-hubness-problem-for-zero-shot
Repo
Framework
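
The hubness problem described in the abstract above (a few labels being predicted for most test instances) can be made concrete with a small measurement. The following is a minimal sketch, not the loss proposed in the paper: it counts how often each label prototype is the nearest neighbour of a test embedding and reports the skewness of those counts, a commonly used hubness indicator. The function names, the toy embeddings, and the choice of Euclidean distance are all assumptions made for illustration.

```python
from collections import Counter
import math


def nearest_label_counts(test_embeddings, label_embeddings):
    """Count how often each label prototype is the nearest neighbour of a test point."""
    counts = Counter({label: 0 for label in label_embeddings})
    for point in test_embeddings:
        nearest = min(
            label_embeddings,
            key=lambda label: math.dist(point, label_embeddings[label]),
        )
        counts[nearest] += 1
    return counts


def skewness(values):
    """Population skewness; a large positive value means a few labels dominate."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    if variance == 0:
        return 0.0
    third_moment = sum((v - mean) ** 3 for v in values) / n
    return third_moment / variance ** 1.5


# Hypothetical 2D embeddings, chosen only to show a hub forming.
labels = {"chair": (0.0, 0.0), "table": (10.0, 10.0), "lamp": (-10.0, 10.0)}
tests = [(1.0, 1.0), (2.0, 0.0), (0.0, 3.0), (4.0, 4.0), (9.0, 9.0)]

counts = nearest_label_counts(tests, labels)
print(dict(counts))
print(skewness(list(counts.values())))
```

In this toy setup "chair" attracts four of the five test points, so it acts as a hub and the skewness is positive. A method that mitigates hubness would be expected to flatten this distribution.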

Learning Spatiotemporal Features of Ride-sourcing Services with Fusion Convolutional Network


Title	Learning Spatiotemporal Features of Ride-sourcing Services with Fusion Convolutional Network
Authors	Dapeng Zhang, Feng Xiao, Lu Li, Gang Kou
Abstract	In order to collectively forecast the demand of ride-sourcing services in all regions of a city, convolutional neural networks (CNNs) have been applied with commendable results. However, local statistical differences throughout the geographical layout of the city make the spatial stationarity assumption of the convolution invalid, which limits the performance of CNNs on the demand forecasting task. Hence, we propose a novel deep learning framework called LC-ST-FCN (locally-connected spatiotemporal fully-convolutional neural network) that consists of a stack of 3D convolutional layers, 2D (standard) convolutional layers, and locally connected convolutional layers. This fully convolutional architecture maintains the spatial coordinates of the input and no spatial information is lost between layers. Features are fused across layers to define a tunable nonlinear local-to-global-to-local representation, where both global and local statistics can be learned to improve predictive performance. Furthermore, as the local statistics vary from region to region, the arithmetic-mean-based metrics frequently used in spatially stationary situations cannot effectively evaluate the models. We propose a weighted-arithmetic approach to deal with this situation. In the experiments, a real dataset from a ride-sourcing service platform (DiDiChuxing) is used, which demonstrates the effectiveness and superiority of our proposed model and evaluation method.
Tasks
Published	2019-04-15
URL	http://arxiv.org/abs/1904.06823v1
PDF	http://arxiv.org/pdf/1904.06823v1.pdf
PWC	https://paperswithcode.com/paper/learning-spatiotemporal-features-of-ride
Repo
Framework
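
The abstract above contrasts standard convolution, which assumes spatial stationarity by sharing one kernel across all positions, with locally connected layers, which do not. The difference can be sketched in one dimension as follows. This is an illustrative toy and not the LC-ST-FCN architecture; the function names and the sample demand values are assumptions.

```python
def conv1d_shared(signal, kernel):
    """Valid 1D cross-correlation with a single kernel shared by every position."""
    width = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(width))
        for i in range(len(signal) - width + 1)
    ]


def locally_connected_1d(signal, kernels):
    """Same sliding window, but output position i uses its own kernel kernels[i]."""
    width = len(kernels[0])
    expected = len(signal) - width + 1
    if len(kernels) != expected:
        raise ValueError("need exactly one kernel per output position")
    return [
        sum(kernels[i][j] * signal[i + j] for j in range(width))
        for i in range(expected)
    ]


# Hypothetical demand counts for four adjacent regions.
demand = [4.0, 6.0, 10.0, 2.0]

print(conv1d_shared(demand, [0.5, 0.5]))
print(locally_connected_1d(demand, [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]))
```

When every position is given the same kernel, the locally connected layer reduces to the shared convolution. The extra per-position weights are what allow region-specific statistics to be learned, at the cost of more parameters.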

Learned Quality Enhancement via Multi-Frame Priors for HEVC Compliant Low-Delay Applications


Title	Learned Quality Enhancement via Multi-Frame Priors for HEVC Compliant Low-Delay Applications
Authors	Ming Lu, Ming Cheng, Yiling Xu, Shiliang Pu, Qiu Shen, Zhan Ma
Abstract	Networked video applications, e.g., video conferencing, often suffer from poor visual quality due to unexpected network fluctuation and limited bandwidth. In this paper, we have developed a Quality Enhancement Network (QENet) to reduce video compression artifacts, leveraging spatial priors generated by multi-scale convolutions and temporal priors generated by warped temporal predictions in a recurrent fashion. We have integrated this QENet as a standalone post-processing subsystem to the High Efficiency Video Coding (HEVC) compliant decoder. Experimental results show that our QENet demonstrates state-of-the-art performance against default in-loop filters in HEVC and other deep learning based methods, with noticeable objective gains in Peak Signal-to-Noise Ratio (PSNR) and subjective gains visually.
Tasks	Video Compression
Published	2019-05-03
URL	https://arxiv.org/abs/1905.01025v1
PDF	https://arxiv.org/pdf/1905.01025v1.pdf
PWC	https://paperswithcode.com/paper/learned-quality-enhancement-via-multi-frame
Repo
Framework
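
The objective gains in the abstract above are reported in PSNR. As a reminder of what that metric computes, here is a minimal sketch using the standard definition, PSNR = 10 · log10(MAX² / MSE). It operates on flat lists of pixel values and assumes 8-bit samples (MAX = 255); the function name and the sample pixels are illustrative and are not taken from the paper.

```python
import math


def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if not reference or len(reference) != len(distorted):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical pixel rows: every sample is off by exactly one grey level.
original = [100, 120, 140]
decoded = [101, 119, 141]
print(round(psnr(original, decoded), 2))
```

Because PSNR is logarithmic in the mean squared error, halving the error raises the score by about 3 dB.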

Are State-of-the-art Visual Place Recognition Techniques any Good for Aerial Robotics?


Title	Are State-of-the-art Visual Place Recognition Techniques any Good for Aerial Robotics?
Authors	Mubariz Zaffar, Ahmad Khaliq, Shoaib Ehsan, Michael Milford, Kostas Alexis, Klaus McDonald-Maier
Abstract	Visual Place Recognition (VPR) has seen significant advances at the frontiers of matching performance and computational superiority over the past few years. However, these evaluations are performed for ground-based mobile platforms and cannot be generalized to aerial platforms. The degree of viewpoint variation experienced by aerial robots is complex, with their processing power and on-board memory limited by payload size and battery ratings. Therefore, in this paper, we collect 8 state-of-the-art VPR techniques that have been previously evaluated for ground-based platforms and compare them on 2 recently proposed aerial place recognition datasets with three prime focuses: a) matching performance, b) processing power consumption, and c) projected memory requirements. This gives a bird's-eye view of the applicability of contemporary VPR research to aerial robotics and lays down the nature of challenges for aerial VPR.
Tasks	Visual Place Recognition
Published	2019-04-16
URL	https://arxiv.org/abs/1904.07967v2
PDF	https://arxiv.org/pdf/1904.07967v2.pdf
PWC	https://paperswithcode.com/paper/are-state-of-the-art-visual-place-recognition
Repo
Framework

Image and Video Compression with Neural Networks: A Review


Title	Image and Video Compression with Neural Networks: A Review
Authors	Siwei Ma, Xinfeng Zhang, Chuanmin Jia, Zhenghui Zhao, Shiqi Wang, Shanshe Wang
Abstract	In recent years, image and video coding technologies have advanced by leaps and bounds. However, due to the popularization of image and video acquisition devices, the growth rate of image and video data is far beyond the improvement of the compression ratio. In particular, it has been widely recognized that there are increasing challenges in pursuing further coding performance improvement within the traditional hybrid coding framework. The deep convolutional neural network (CNN), which has driven the resurgence of neural networks in recent years and has achieved great success in both artificial intelligence and signal processing, also provides a novel and promising solution for image and video compression. In this paper, we provide a systematic, comprehensive and up-to-date review of neural network based image and video compression techniques. The evolution and development of neural network based compression methodologies are introduced for images and video respectively. More specifically, the cutting-edge video coding techniques that leverage deep learning and the HEVC framework are presented and discussed, which substantially advance state-of-the-art video coding performance. Moreover, the end-to-end image and video coding frameworks based on neural networks are also reviewed, revealing interesting explorations on next generation image and video coding frameworks/standards. The most significant research works on image and video coding related topics using neural networks are highlighted, and future trends are also envisioned. In particular, the joint compression of semantic and visual information is tentatively explored to formulate a high efficiency signal representation structure for both human vision and machine vision, which are the two dominant signal receptors in the age of artificial intelligence.
Tasks	Video Compression
Published	2019-04-07
URL	http://arxiv.org/abs/1904.03567v2
PDF	http://arxiv.org/pdf/1904.03567v2.pdf
PWC	https://paperswithcode.com/paper/image-and-video-compression-with-neural
Repo
Framework

Localizing Discriminative Visual Landmarks for Place Recognition


Title	Localizing Discriminative Visual Landmarks for Place Recognition
Authors	Zhe Xin, Yinghao Cai, Tao Lu, Xiaoxia Xing, Shaojun Cai, Jixiang Zhang, Yiping Yang, Yanqing Wang
Abstract	We address the problem of visual place recognition with perceptual changes. The fundamental problem of visual place recognition is generating robust image representations which are not only insensitive to environmental changes but also discriminative across different places. Taking advantage of the feature extraction ability of Convolutional Neural Networks (CNNs), we further investigate how to localize discriminative visual landmarks that positively contribute to the similarity measurement, such as buildings and vegetation. In particular, a Landmark Localization Network (LLN) is designed to indicate which regions of an image are used for discrimination. Detailed experiments are conducted on open source datasets with varied appearance and viewpoint changes. The proposed approach achieves superior performance against state-of-the-art methods.
Tasks	Visual Place Recognition
Published	2019-04-14
URL	http://arxiv.org/abs/1904.06635v1
PDF	http://arxiv.org/pdf/1904.06635v1.pdf
PWC	https://paperswithcode.com/paper/localizing-discriminative-visual-landmarks
Repo
Framework

Aerial Images Processing for Car Detection using Convolutional Neural Networks: Comparison between Faster R-CNN and YoloV3


Title	Aerial Images Processing for Car Detection using Convolutional Neural Networks: Comparison between Faster R-CNN and YoloV3
Authors	Adel Ammar, Anis Koubaa, Mohanned Ahmed, Abdulrahman Saad
Abstract	In this paper, we address the problem of car detection from aerial images using Convolutional Neural Networks (CNN). This problem presents additional challenges as compared to car (or any object) detection from ground images because features of vehicles from aerial images are more difficult to discern. To investigate this issue, we assess the performance of two state-of-the-art CNN algorithms, namely Faster R-CNN, which is the most popular region-based algorithm, and YOLOv3, which is known to be the fastest detection algorithm. We analyze two datasets with different characteristics to check the impact of various factors, such as UAV’s altitude, camera resolution, and object size. The objective of this work is to conduct a robust comparison between these two cutting-edge algorithms. By using a variety of metrics, we show that neither of the two algorithms outperforms the other in all cases.
Tasks	Object Detection
Published	2019-10-16
URL	https://arxiv.org/abs/1910.07234v1
PDF	https://arxiv.org/pdf/1910.07234v1.pdf
PWC	https://paperswithcode.com/paper/aerial-images-processing-for-car-detection
Repo
Framework

Deep Predictive Video Compression with Bi-directional Prediction


Title	Deep Predictive Video Compression with Bi-directional Prediction
Authors	Woonsung Park, Munchurl Kim
Abstract	Recently, deep image compression has shown great progress in terms of coding efficiency and image quality improvement. However, relatively less attention has been paid to video compression using deep learning networks. In this paper, we first propose a deep learning based bi-predictive coding network, called BP-DVC Net, for video compression. Drawing on lessons from conventional video coding, a B-frame coding structure is incorporated in our BP-DVC Net. While bi-predictive coding in conventional video codecs requires transmitting the motion vectors for block motion and the residues from prediction to the decoder side, our BP-DVC Net incorporates optical flow estimation networks in both the encoder and decoder sides so that the motion information does not need to be transmitted to the decoder side, which improves coding efficiency. Also, a bi-prediction network in the BP-DVC Net is proposed and used to precisely predict the current frame and to yield resulting residues that are as small as possible. Furthermore, our BP-DVC Net allows for the compressive feature maps to be entropy-coded using the temporal context among the feature maps of adjacent frames. The BP-DVC Net has an end-to-end video compression architecture with newly designed flow and prediction losses. Experimental results show that the compression performance of our proposed method is comparable to those of H.264 and HEVC in terms of PSNR and MS-SSIM.
Tasks	Image Compression, Optical Flow Estimation, Video Compression
Published	2019-04-05
URL	http://arxiv.org/abs/1904.02909v1
PDF	http://arxiv.org/pdf/1904.02909v1.pdf
PWC	https://paperswithcode.com/paper/deep-predictive-video-compression-with-bi
Repo
Framework

Neural Grammatical Error Correction with Finite State Transducers


Title	Neural Grammatical Error Correction with Finite State Transducers
Authors	Felix Stahlberg, Christopher Bryant, Bill Byrne
Abstract	Grammatical error correction (GEC) is one of the areas in natural language processing in which purely neural models have not yet superseded more traditional symbolic models. Hybrid systems combining phrase-based statistical machine translation (SMT) and neural sequence models are currently among the most effective approaches to GEC. However, both SMT and neural sequence-to-sequence models require large amounts of annotated data. Language model based GEC (LM-GEC) is a promising alternative which does not rely on annotated training data. We show how to improve LM-GEC by applying modelling techniques based on finite state transducers. We report further gains by rescoring with neural language models. We show that our methods developed for LM-GEC can also be used with SMT systems if annotated training data is available. Our best system outperforms the best published result on the CoNLL-2014 test set, and achieves far better relative improvements over the SMT baselines than previous hybrid systems.
Tasks	Grammatical Error Correction, Language Modelling, Machine Translation
Published	2019-03-25
URL	http://arxiv.org/abs/1903.10625v2
PDF	http://arxiv.org/pdf/1903.10625v2.pdf
PWC	https://paperswithcode.com/paper/neural-grammatical-error-correction-with
Repo
Framework

Learning to Memorize in Neural Task-Oriented Dialogue Systems


Title	Learning to Memorize in Neural Task-Oriented Dialogue Systems
Authors	Chien-Sheng Wu
Abstract	In this thesis, we leverage the neural copy mechanism and memory-augmented neural networks (MANNs) to address existing challenges of neural task-oriented dialogue learning. We show the effectiveness of our strategy by achieving good performance in multi-domain dialogue state tracking, retrieval-based dialogue systems, and generation-based dialogue systems. We first propose a transferable dialogue state generator (TRADE) that leverages its copy mechanism to get rid of dialogue ontology and share knowledge between domains. We also evaluate unseen domain dialogue state tracking and show that TRADE enables zero-shot dialogue state tracking and can adapt to new few-shot domains without forgetting the previous domains. Second, we utilize MANNs to improve retrieval-based dialogue learning. They are able to capture dialogue sequential dependencies and memorize long-term information. We also propose a recorded delexicalization copy strategy to replace real entity values with ordered entity types. Our models are shown to surpass other retrieval baselines, especially when the conversation has a large number of turns. Lastly, we tackle generation-based dialogue learning with two proposed models, the memory-to-sequence (Mem2Seq) and global-to-local memory pointer network (GLMP). Mem2Seq is the first model to combine multi-hop memory attention with the idea of the copy mechanism. GLMP further introduces the concept of response sketching and double pointers copying. We show that GLMP achieves state-of-the-art performance on human evaluation.
Tasks	Dialogue State Tracking, Task-Oriented Dialogue Systems
Published	2019-05-19
URL	https://arxiv.org/abs/1905.07687v1
PDF	https://arxiv.org/pdf/1905.07687v1.pdf
PWC	https://paperswithcode.com/paper/learning-to-memorize-in-neural-task-oriented
Repo
Framework

Improving Dialogue State Tracking by Discerning the Relevant Context


Title	Improving Dialogue State Tracking by Discerning the Relevant Context
Authors	Sanuj Sharma, Prafulla Kumar Choubey, Ruihong Huang
Abstract	A typical conversation comprises multiple turns between participants where they go back and forth between different topics. At each user turn, dialogue state tracking (DST) aims to estimate the user’s goal by processing the current utterance. However, in many turns, users implicitly refer to the previous goal, necessitating the use of relevant dialogue history. Nonetheless, distinguishing relevant history is challenging, and the popular method of using dialogue recency for that is inefficient. We, therefore, propose a novel framework for DST that identifies relevant historical context by referring to the past utterances where a particular slot-value changes and uses that together with weighted system utterance to identify the relevant context. Specifically, we use the current user utterance and the most recent system utterance to determine the relevance of a system utterance. Empirical analyses show that our method improves joint goal accuracy by 2.75% and 2.36% on WoZ 2.0 and MultiWoZ 2.0 restaurant domain datasets respectively over the previous state-of-the-art GLAD model.
Tasks	Dialogue State Tracking
Published	2019-04-04
URL	http://arxiv.org/abs/1904.02800v1
PDF	http://arxiv.org/pdf/1904.02800v1.pdf
PWC	https://paperswithcode.com/paper/improving-dialogue-state-tracking-by
Repo
Framework
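
The improvements in the abstract above are reported in joint goal accuracy. Under the usual definition, a turn counts as correct only if every slot-value pair in the predicted dialogue state matches the gold state. A minimal sketch of that computation follows; the function name, the dictionary representation of a state, and the example dialogue are assumptions for illustration and are not taken from the paper.

```python
def joint_goal_accuracy(predicted_states, gold_states):
    """Fraction of turns whose full predicted state equals the gold state."""
    if len(predicted_states) != len(gold_states):
        raise ValueError("need one predicted state per gold state")
    if not gold_states:
        return 0.0
    correct = sum(
        1 for predicted, gold in zip(predicted_states, gold_states) if predicted == gold
    )
    return correct / len(gold_states)


# Hypothetical three-turn restaurant dialogue; each state maps slot -> value.
gold = [
    {"food": "thai"},
    {"food": "thai", "area": "north"},
    {"food": "thai", "area": "north", "price": "cheap"},
]
predicted = [
    {"food": "thai"},
    {"food": "thai", "area": "south"},  # one wrong slot makes the whole turn wrong
    {"food": "thai", "area": "north", "price": "cheap"},
]
print(joint_goal_accuracy(predicted, gold))
```

The metric is strict by design: a single wrong slot invalidates the turn, which is why carrying the right history forward matters so much for it.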

Scaling Multi-Domain Dialogue State Tracking via Query Reformulation


Title	Scaling Multi-Domain Dialogue State Tracking via Query Reformulation
Authors	Pushpendre Rastogi, Arpit Gupta, Tongfei Chen, Lambert Mathias
Abstract	We present a novel approach to dialogue state tracking and referring expression resolution tasks. Successful contextual understanding of multi-turn spoken dialogues requires resolving referring expressions across turns and tracking the entities relevant to the conversation across turns. Tracking conversational state is particularly challenging in a multi-domain scenario when there exist multiple spoken language understanding (SLU) sub-systems, and each SLU sub-system operates on its domain-specific meaning representation. While previous approaches have addressed the disparate schema issue by learning candidate transformations of the meaning representation, in this paper, we instead model the reference resolution as a dialogue context-aware user query reformulation task: the dialog state is serialized to a sequence of natural language tokens representing the conversation. We develop our model for query reformulation using a pointer-generator network and a novel multi-task learning setup. In our experiments, we show a significant improvement in absolute F1 on both an internal benchmark and a soon-to-be-released public benchmark.
Tasks	Dialogue State Tracking, Multi-Task Learning, Spoken Language Understanding
Published	2019-03-12
URL	http://arxiv.org/abs/1903.05164v3
PDF	http://arxiv.org/pdf/1903.05164v3.pdf
PWC	https://paperswithcode.com/paper/scaling-multi-domain-dialogue-state-tracking
Repo
Framework

AutoBlock: A Hands-off Blocking Framework for Entity Matching


Title	AutoBlock: A Hands-off Blocking Framework for Entity Matching
Authors	Wei Zhang, Hao Wei, Bunyamin Sisman, Xin Luna Dong, Christos Faloutsos, David Page
Abstract	Entity matching seeks to identify data records over one or multiple data sources that refer to the same real-world entity. Virtually every entity matching task on large datasets requires blocking, a step that reduces the number of record pairs to be matched. However, most of the traditional blocking methods are learning-free and key-based, and their successes are largely built on laborious human effort in cleaning data and designing blocking keys. In this paper, we propose AutoBlock, a novel hands-off blocking framework for entity matching, based on similarity-preserving representation learning and nearest neighbor search. Our contributions include: (a) Automation: AutoBlock frees users from laborious data cleaning and blocking key tuning. (b) Scalability: AutoBlock has a sub-quadratic total time complexity and can be easily deployed for millions of records. (c) Effectiveness: AutoBlock outperforms a wide range of competitive baselines on multiple large-scale, real-world datasets, especially when datasets are dirty and/or unstructured.
Tasks	Representation Learning
Published	2019-12-07
URL	https://arxiv.org/abs/1912.03417v1
PDF	https://arxiv.org/pdf/1912.03417v1.pdf
PWC	https://paperswithcode.com/paper/autoblock-a-hands-off-blocking-framework-for
Repo
Framework
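
The abstract above describes blocking as a nearest neighbor search over learned record representations. The idea can be sketched as follows: each record is paired only with its k most similar records, which shrinks the set of candidate pairs passed on to matching. This is a toy illustration and not the AutoBlock implementation. The embeddings are hypothetical hand-written vectors standing in for learned ones, and the brute-force search shown here is quadratic, so the sub-quadratic complexity claimed in the abstract would need an approximate nearest-neighbor index that is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def block_by_nearest_neighbours(embeddings, k):
    """Return candidate pairs: each record paired with its k most similar records."""
    pairs = set()
    record_ids = list(embeddings)
    for a in record_ids:
        others = sorted(
            (b for b in record_ids if b != a),
            key=lambda b: cosine(embeddings[a], embeddings[b]),
            reverse=True,
        )
        for b in others[:k]:
            pairs.add(tuple(sorted((a, b))))  # store each pair once, order-independent
    return pairs


# Hypothetical record embeddings; r1/r2 and r3/r4 point in similar directions.
records = {
    "r1": (1.0, 0.0),
    "r2": (0.9, 0.1),
    "r3": (0.0, 1.0),
    "r4": (0.1, 0.9),
}
print(sorted(block_by_nearest_neighbours(records, k=1)))
```

With four records there are six possible pairs. Keeping only each record's single nearest neighbour leaves two candidate pairs for the downstream matcher.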

Simulator-based training of generative models for the inverse design of metasurfaces


Title	Simulator-based training of generative models for the inverse design of metasurfaces
Authors	Jiaqi Jiang, Jonathan A. Fan
Abstract	Metasurfaces are subwavelength-structured artificial media that can shape and localize electromagnetic waves in unique ways. The inverse design of these devices is a non-convex optimization problem in a high dimensional space, making global optimization a major challenge. We present a new type of population-based global optimization algorithm for metasurfaces that is enabled by the training of a generative neural network. The loss function used for backpropagation depends on the generated pattern layouts, their efficiencies, and efficiency gradients, which are calculated by the adjoint variables method using forward and adjoint electromagnetic simulations. We observe that the distribution of devices generated by the network continuously shifts towards high performance design space regions over the course of optimization. Upon training completion, the best generated devices have efficiencies comparable to or exceeding the best devices designed using standard topology optimization. Our proposed global optimization algorithm can generally apply to other gradient-based optimization problems in optics, mechanics and electronics.
Tasks
Published	2019-06-18
URL	https://arxiv.org/abs/1906.07843v4
PDF	https://arxiv.org/pdf/1906.07843v4.pdf
PWC	https://paperswithcode.com/paper/dataless-training-of-generative-models-for
Repo
Framework