April 2, 2020


Paper Group ANR 212



Fully reversible neural networks for large-scale surface and sub-surface characterization via remote sensing

Title Fully reversible neural networks for large-scale surface and sub-surface characterization via remote sensing
Authors Bas Peters, Eldad Haber, Keegan Lensink
Abstract The large spatial/frequency scale of hyperspectral and airborne magnetic and gravitational data causes memory issues when using convolutional neural networks for (sub-) surface characterization. Recently developed fully reversible networks can mostly avoid memory limitations by virtue of having a low and fixed memory requirement for storing network states, as opposed to the typical linear memory growth with depth. Fully reversible networks enable the training of deep neural networks that take in entire data volumes, and create semantic segmentations in one go. This approach avoids the need to work in small patches or map a data patch to the class of just the central pixel. The cross-entropy loss function requires small modifications to work in conjunction with a fully reversible network and learn from sparsely sampled labels without ever seeing fully labeled ground truth. We show examples from land-use change detection from hyperspectral time-lapse data, and regional aquifer mapping from airborne geophysical and geological data.
Published 2020-03-16
URL https://arxiv.org/abs/2003.07474v1
PDF https://arxiv.org/pdf/2003.07474v1.pdf
PWC https://paperswithcode.com/paper/fully-reversible-neural-networks-for-large
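The fixed memory requirement mentioned in the abstract above comes from the fact that a reversible layer's input can be recomputed from its output, so intermediate states need not be stored. A minimal sketch of that idea follows, assuming a generic additive coupling block; the functions `f` and `g` are hypothetical placeholders, and this is not necessarily the architecture the authors use.

```python
import math

def f(v):
    # Hypothetical residual function F; it does not need to be invertible.
    return [math.tanh(x) for x in v]

def g(v):
    # Hypothetical residual function G; also arbitrary.
    return [0.5 * x * x for x in v]

def forward(x1, x2):
    # Additive coupling: each half is updated using only the other half.
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2

def inverse(y1, y2):
    # Undo the two updates in reverse order to recover the block input.
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2
```

Because `inverse` reconstructs the inputs from the outputs, a training loop built on such blocks could recompute activations during the backward pass instead of keeping one copy per layer.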

Boosting Deep Face Recognition via Disentangling Appearance and Geometry

Title Boosting Deep Face Recognition via Disentangling Appearance and Geometry
Authors Ali Dabouei, Fariborz Taherkhani, Sobhan Soleymani, Jeremy Dawson, Nasser M. Nasrabadi
Abstract In this paper, we propose a framework for disentangling the appearance and geometry representations in the face recognition task. To provide supervision for this aim, we generate geometrically identical faces by incorporating spatial transformations. We demonstrate that the proposed approach enhances the performance of deep face recognition models by assisting the training process in two ways. First, it forces the early and intermediate convolutional layers to learn more representative features that satisfy the properties of disentangled embeddings. Second, it augments the training set by altering faces geometrically. Through extensive experiments, we demonstrate that integrating the proposed approach into state-of-the-art face recognition methods effectively improves their performance on challenging datasets, such as LFW, YTF, and MegaFace. Both theoretical and practical aspects of the method are analyzed rigorously through ablation studies and knowledge transfer tasks. Furthermore, we show that the knowledge learned by the proposed method can favor other face-related tasks, such as attribute prediction.
Tasks Face Recognition, Transfer Learning
Published 2020-01-13
URL https://arxiv.org/abs/2001.04559v1
PDF https://arxiv.org/pdf/2001.04559v1.pdf
PWC https://paperswithcode.com/paper/boosting-deep-face-recognition-via

Robust Facial Landmark Detection via Aggregation on Geometrically Manipulated Faces

Title Robust Facial Landmark Detection via Aggregation on Geometrically Manipulated Faces
Authors Seyed Mehdi Iranmanesh, Ali Dabouei, Sobhan Soleymani, Hadi Kazemi, Nasser M. Nasrabadi
Abstract In this work, we present a practical approach to the problem of facial landmark detection. The proposed method can deal with large shape and appearance variations under rich shape deformation. To handle the shape variations, we equip our method with the aggregation of manipulated face images. The proposed framework generates different manipulated faces using only one given face image. The approach utilizes the fact that small but carefully crafted geometric manipulation in the input domain can fool deep face recognition models. We propose three different approaches to generate manipulated faces, two of which perform the manipulations via adversarial attacks while the third uses known transformations. Aggregating the manipulated faces provides a more robust landmark detection approach which is able to capture more important deformations and variations of the face shapes. Our approach demonstrates its superiority over state-of-the-art methods on the benchmark datasets AFLW, 300-W, and COFW.
Tasks Face Recognition, Facial Landmark Detection
Published 2020-01-07
URL https://arxiv.org/abs/2001.03113v1
PDF https://arxiv.org/pdf/2001.03113v1.pdf
PWC https://paperswithcode.com/paper/robust-facial-landmark-detection-via-1

SalsaNext: Fast Semantic Segmentation of LiDAR Point Clouds for Autonomous Driving

Title SalsaNext: Fast Semantic Segmentation of LiDAR Point Clouds for Autonomous Driving
Authors Tiago Cortinhal, George Tzelepis, Eren Erdal Aksoy
Abstract In this paper, we introduce SalsaNext for the semantic segmentation of a full 3D LiDAR point cloud in real time. SalsaNext is the next version of SalsaNet [1], which has an encoder-decoder architecture where the encoder unit has a set of ResNet blocks and the decoder part combines upsampled features from the residual blocks. In contrast to SalsaNet, we have an additional layer in the encoder and decoder, introduce the context module, switch from stride convolution to average pooling and also apply central dropout treatment. To directly optimize the Jaccard index, we further combine the weighted cross-entropy loss with Lovász-Softmax loss [2]. We provide a thorough quantitative evaluation on the Semantic-KITTI dataset [3], which demonstrates that the proposed SalsaNext outperforms other state-of-the-art semantic segmentation networks in terms of accuracy and computation time. We also release our source code at https://github.com/TiagoCortinhal/SalsaNext.
Tasks Autonomous Driving, Semantic Segmentation
Published 2020-03-07
URL https://arxiv.org/abs/2003.03653v1
PDF https://arxiv.org/pdf/2003.03653v1.pdf
PWC https://paperswithcode.com/paper/salsanext-fast-semantic-segmentation-of-lidar
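The Jaccard index that the abstract above refers to is the intersection-over-union between predicted and ground-truth labels, usually averaged over classes. The sketch below only computes that metric on flat label lists; it is not the Lovász-Softmax loss and not the authors' code, and the function names are illustrative assumptions.

```python
def jaccard_per_class(pred, target, num_classes):
    """Return the intersection-over-union for each class (None if the class is absent)."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        ious.append(inter / union if union else None)
    return ious

def mean_iou(pred, target, num_classes):
    """Average the per-class Jaccard index over classes that actually occur."""
    valid = [v for v in jaccard_per_class(pred, target, num_classes) if v is not None]
    return sum(valid) / len(valid)

# Six labeled points, three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(jaccard_per_class(pred, target, 3))
print(mean_iou(pred, target, 3))
```

The metric is computed from discrete label counts, which is why a differentiable surrogate is needed to optimize it with gradient descent.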

DP-Net: Dynamic Programming Guided Deep Neural Network Compression

Title DP-Net: Dynamic Programming Guided Deep Neural Network Compression
Authors Dingcheng Yang, Wenjian Yu, Ao Zhou, Haoyuan Mu, Gary Yao, Xiaoyi Wang
Abstract In this work, we propose an effective scheme (called DP-Net) for compressing deep neural networks (DNNs). It includes a novel dynamic programming (DP) based algorithm to obtain the optimal solution of weight quantization and an optimization process to train a clustering-friendly DNN. Experiments showed that the DP-Net allows larger compression than the state-of-the-art counterparts while preserving accuracy. The largest 77X compression ratio on Wide ResNet is achieved by combining DP-Net with other compression techniques. Furthermore, the DP-Net is extended for compressing a robust DNN model with negligible accuracy loss. Finally, a custom accelerator is designed on FPGA to speed up the inference computation with DP-Net.
Tasks Neural Network Compression, Quantization
Published 2020-03-21
URL https://arxiv.org/abs/2003.09615v1
PDF https://arxiv.org/pdf/2003.09615v1.pdf
PWC https://paperswithcode.com/paper/dp-net-dynamic-programming-guided-deep-neural
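To illustrate how dynamic programming can give an optimal weight quantization, the sketch below solves one-dimensional clustering exactly: after sorting, each cluster is a contiguous run of weights, so the best split can be found by DP over cut positions. This is a textbook O(k·n²) formulation written under that assumption, and may differ from the algorithm in the paper; `optimal_quantization` is a hypothetical name.

```python
def optimal_quantization(weights, k):
    """Split sorted weights into k contiguous clusters with minimal squared error."""
    w = sorted(weights)
    n = len(w)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of weights")
    # Prefix sums of values and squares give any segment's error in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(w):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def sse(i, j):
        # Sum of squared deviations from the mean for the segment w[i:j].
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # dp[c][j]: minimal error when the first j weights form c clusters.
    dp = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cand = dp[c - 1][i] + sse(i, j)
                if cand < dp[c][j]:
                    dp[c][j] = cand
                    cut[c][j] = i
    # Backtrack the cut positions to recover the cluster centers.
    centers = []
    j = n
    for c in range(k, 0, -1):
        i = cut[c][j]
        centers.append((s1[j] - s1[i]) / (j - i))
        j = i
    centers.reverse()
    return centers, dp[k][n]
```

Replacing every weight with its cluster center then means only k distinct values plus a small index per weight need to be stored.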

Think Locally, Act Globally: Federated Learning with Local and Global Representations

Title Think Locally, Act Globally: Federated Learning with Local and Global Representations
Authors Paul Pu Liang, Terrance Liu, Liu Ziyin, Ruslan Salakhutdinov, Louis-Philippe Morency
Abstract Federated learning is an emerging research paradigm to train models on private data distributed over multiple devices. A key challenge involves keeping device data private and training a global model only by communicating parameters and updates. Given the recent trend towards building larger models, deploying models in federated settings on real-world tasks is becoming increasingly difficult. To this end, we propose to augment federated learning with local representation learning on each device to learn useful and compact representations from raw data. As a result, the global model can be smaller since it only operates on local representations, reducing the number of communicated parameters. In addition, we show that local models provide flexibility in dealing with heterogeneous data and can be modified to learn fair representations that obfuscate protected attributes such as race, age, and gender. Finally, we support our empirical results with a theoretical analysis which shows that a combination of local and global models reduces both variance in the data as well as variance in data distributions across the devices.
Tasks Representation Learning, Visual Question Answering
Published 2020-01-06
URL https://arxiv.org/abs/2001.01523v2
PDF https://arxiv.org/pdf/2001.01523v2.pdf
PWC https://paperswithcode.com/paper/think-locally-act-globally-federated-learning
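The split between local and global models described above can be sketched as a communication round in which only the global parameters leave the device. The example below assumes plain size-weighted averaging and a hypothetical device record with `local`, `global`, and `n` fields; it is an illustration of the idea, not the authors' implementation.

```python
def federated_average(client_params, client_sizes):
    """Average parameter vectors, weighting each client by its number of samples."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(n * p[d] for n, p in zip(client_sizes, client_params)) / total
        for d in range(dim)
    ]

def communication_round(devices):
    """Aggregate only the global parameters; local parameters never leave the device."""
    new_global = federated_average(
        [dev["global"] for dev in devices],
        [dev["n"] for dev in devices],
    )
    for dev in devices:
        dev["global"] = list(new_global)
    return new_global

# Hypothetical devices: 'local' holds the private representation model.
devices = [
    {"local": [0.1, 0.2, 0.3], "global": [1.0, 2.0], "n": 1},
    {"local": [0.9, 0.8, 0.7], "global": [3.0, 4.0], "n": 3},
]
print(communication_round(devices))
```

In this sketch the communicated vector has two entries per device while the local model has three, which mirrors the abstract's point that a smaller global model reduces the number of communicated parameters.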

Automatic Location Type Classification From Social-Media Posts

Title Automatic Location Type Classification From Social-Media Posts
Authors Elad Kravi, Benny Kimelfeld, Yaron Kanza, Roi Reichart
Abstract We introduce the problem of Automatic Location Type Classification from social media posts. Our goal is to correctly associate a set of messages posted in a small radius around a given location with their corresponding location type, e.g., school, church, restaurant or museum. We provide a dataset of locations associated with tweets posted in close geographical proximity. We explore two approaches to the problem: (a) a pipeline approach where each message is first classified, and then the location associated with the message set is inferred from the individual message labels; and (b) a joint approach where the individual messages are simultaneously processed to yield the desired location type. Our results demonstrate the superiority of the joint approach. Moreover, we show that due to the unique structure of the problem, where weakly-related messages are jointly processed to yield a single final label, simpler linear classifiers outperform deep neural network alternatives that have proven superior in previous text classification tasks.
Tasks Text Classification
Published 2020-02-05
URL https://arxiv.org/abs/2002.01846v1
PDF https://arxiv.org/pdf/2002.01846v1.pdf
PWC https://paperswithcode.com/paper/automatic-location-type-classification-from

Empirical Studies on the Properties of Linear Regions in Deep Neural Networks

Title Empirical Studies on the Properties of Linear Regions in Deep Neural Networks
Authors Xiao Zhang, Dongrui Wu
Abstract A deep neural network (DNN) with piecewise linear activations can partition the input space into numerous small linear regions, where different linear functions are fitted. It is believed that the number of these regions represents the expressivity of the DNN. This paper provides a novel and meticulous perspective to look into DNNs: Instead of just counting the number of the linear regions, we study their local properties, such as the inspheres, the directions of the corresponding hyperplanes, the decision boundaries, and the relevance of the surrounding regions. We empirically observed that different optimization techniques lead to completely different linear regions, even though they result in similar classification accuracies. We hope our study can inspire the design of novel optimization techniques, and help discover and analyze the behaviors of DNNs.
Published 2020-01-04
URL https://arxiv.org/abs/2001.01072v2
PDF https://arxiv.org/pdf/2001.01072v2.pdf
PWC https://paperswithcode.com/paper/empirical-studies-on-the-properties-of-linear-1
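A linear region of a ReLU network can be identified by its activation pattern: which hidden units are on or off for a given input. The sketch below counts distinct patterns over a set of sample points, which gives a lower bound on the number of regions those points touch. The tiny network and the helper names are assumptions made for illustration, not the paper's experimental setup.

```python
def activation_pattern(x, layers):
    """Return the on/off state of every ReLU unit for input x.

    layers is a list of (W, b) pairs, where W is a list of weight rows.
    """
    pattern = []
    h = x
    for W, b in layers:
        z = [sum(wi * hi for wi, hi in zip(row, h)) + bi for row, bi in zip(W, b)]
        pattern.extend(v > 0 for v in z)
        h = [max(v, 0.0) for v in z]
    return tuple(pattern)

def count_regions_on_grid(points, layers):
    """Count distinct activation patterns seen on the sample points."""
    return len({activation_pattern(p, layers) for p in points})

# One hidden layer with two units on a 1D input: breakpoints at x = 0 and x = 1.
layers = [([[1.0], [1.0]], [0.0, -1.0])]
points = [[-2.0 + 0.25 * i] for i in range(21)]
print(count_regions_on_grid(points, layers))
```

Within one pattern the network is a single affine function, so properties such as the region's hyperplane directions could be read off from the weights of the active units.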

MonoLayout: Amodal scene layout from a single image

Title MonoLayout: Amodal scene layout from a single image
Authors Kaustubh Mani, Swapnil Daga, Shubhika Garg, N. Sai Shankar, Krishna Murthy Jatavallabhula, K. Madhava Krishna
Abstract In this paper, we address the novel, highly challenging problem of estimating the layout of a complex urban driving scenario. Given a single color image captured from a driving platform, we aim to predict the bird’s-eye view layout of the road and other traffic participants. The estimated layout should reason beyond what is visible in the image, and compensate for the loss of 3D information due to projection. We dub this problem amodal scene layout estimation, which involves “hallucinating” scene layout for even parts of the world that are occluded in the image. To this end, we present MonoLayout, a deep neural network for real-time amodal scene layout estimation from a single image. We represent scene layout as a multi-channel semantic occupancy grid, and leverage adversarial feature learning to hallucinate plausible completions for occluded image parts. Due to the lack of fair baseline methods, we extend several state-of-the-art approaches for road-layout estimation and vehicle occupancy estimation in bird’s-eye view to the amodal setup for rigorous evaluation. By leveraging temporal sensor fusion to generate training labels, we significantly outperform current art over a number of datasets. On the KITTI and Argoverse datasets, we outperform all baselines by a significant margin. We also make all our annotations and code publicly available. A video abstract of this paper is available at https://www.youtube.com/watch?v=HcroGyo6yRQ .
Tasks Sensor Fusion
Published 2020-02-19
URL https://arxiv.org/abs/2002.08394v1
PDF https://arxiv.org/pdf/2002.08394v1.pdf
PWC https://paperswithcode.com/paper/monolayout-amodal-scene-layout-from-a-single

Scene Completeness-Aware Lidar Depth Completion for Driving Scenario

Title Scene Completeness-Aware Lidar Depth Completion for Driving Scenario
Authors Cho-Ying Wu, Ulrich Neumann
Abstract In this paper we propose Scene Completeness-Aware Depth Completion (SADC) to complete raw lidar scans into dense depth maps with fine whole scene structures. Recent sparse depth completion for lidar only focuses on the lower scenes and produces irregular estimations in the upper areas because existing datasets such as KITTI do not provide groundtruth for upper areas. These areas are considered less important because they are usually sky or trees and of less scene understanding interest. However, we argue that in several driving scenarios such as large trucks or cars with loads, objects could extend to upper parts of scenes, and thus depth maps with structured upper scene estimation are important for RGBD algorithms. SADC leverages stereo cameras, which have better scene completeness, and lidars, which are more precise, to perform sparse depth completion. To our knowledge, we are the first to focus on scene completeness of sparse depth completion. We validate our SADC on both depth estimate precision and scene-completeness on KITTI. Moreover, SADC adds only a small extra computational cost upon base methods of stereo matching and lidar completion in terms of runtime and model size.
Tasks Depth Completion, Scene Understanding, Stereo Matching
Published 2020-03-15
URL https://arxiv.org/abs/2003.06945v2
PDF https://arxiv.org/pdf/2003.06945v2.pdf
PWC https://paperswithcode.com/paper/scene-completenesss-aware-lidar-depth

DeepCrashTest: Turning Dashcam Videos into Virtual Crash Tests for Automated Driving Systems

Title DeepCrashTest: Turning Dashcam Videos into Virtual Crash Tests for Automated Driving Systems
Authors Sai Krishna Bashetty, Heni Ben Amor, Georgios Fainekos
Abstract The goal of this paper is to generate simulations with real-world collision scenarios for training and testing autonomous vehicles. We use numerous dashcam crash videos uploaded on the internet to extract valuable collision data and recreate the crash scenarios in a simulator. We tackle the problem of extracting 3D vehicle trajectories from videos recorded by an unknown and uncalibrated monocular camera source using a modular approach. A working architecture and demonstration videos along with the open-source implementation are provided with the paper.
Tasks Autonomous Vehicles
Published 2020-03-26
URL https://arxiv.org/abs/2003.11766v1
PDF https://arxiv.org/pdf/2003.11766v1.pdf
PWC https://paperswithcode.com/paper/deepcrashtest-turning-dashcam-videos-into

Incorporating Joint Embeddings into Goal-Oriented Dialogues with Multi-Task Learning

Title Incorporating Joint Embeddings into Goal-Oriented Dialogues with Multi-Task Learning
Authors Firas Kassawat, Debanjan Chaudhuri, Jens Lehmann
Abstract Attention-based encoder-decoder neural network models have recently shown promising results in goal-oriented dialogue systems. However, these models struggle to reason over and incorporate stateful knowledge while preserving their end-to-end text generation functionality. Since such models can greatly benefit from user intent and knowledge graph integration, in this paper we propose an RNN-based end-to-end encoder-decoder architecture which is trained with joint embeddings of the knowledge graph and the corpus as input. The model provides an additional integration of user intent along with text generation, trained with a multi-task learning paradigm along with an additional regularization technique to penalize generating the wrong entity as output. The model further incorporates a Knowledge Graph entity lookup during inference to guarantee the generated output is stateful based on the local knowledge graph provided. We finally evaluated the model using the BLEU score; the empirical evaluation indicates that our proposed architecture can improve the performance of task-oriented dialogue systems.
Tasks Goal-Oriented Dialogue Systems, Multi-Task Learning, Text Generation
Published 2020-01-28
URL https://arxiv.org/abs/2001.10468v1
PDF https://arxiv.org/pdf/2001.10468v1.pdf
PWC https://paperswithcode.com/paper/incorporating-joint-embeddings-into-goal

Attention over Parameters for Dialogue Systems

Title Attention over Parameters for Dialogue Systems
Authors Andrea Madotto, Zhaojiang Lin, Chien-Sheng Wu, Jamin Shin, Pascale Fung
Abstract Dialogue systems require a great deal of different but complementary expertise to assist, inform, and entertain humans. For example, different domains (e.g., restaurant reservation, train ticket booking) of goal-oriented dialogue systems can be viewed as different skills, and so can the ordinary chatting abilities of chit-chat dialogue systems. In this paper, we propose to learn a dialogue system that independently parameterizes different dialogue skills, and learns to select and combine each of them through Attention over Parameters (AoP). The experimental results show that this approach achieves competitive performance on a combined dataset of MultiWOZ, In-Car Assistant, and Persona-Chat. Finally, we demonstrate that each dialogue skill is effectively learned and can be combined with other skills to produce selective responses.
Tasks Goal-Oriented Dialogue Systems
Published 2020-01-07
URL https://arxiv.org/abs/2001.01871v2
PDF https://arxiv.org/pdf/2001.01871v2.pdf
PWC https://paperswithcode.com/paper/attention-over-parameters-for-dialogue
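One way to read "attention over parameters" is as a softmax-weighted mixture of per-skill parameter vectors. The sketch below shows only that mixing step, assuming each skill's parameters are a flat list and that the attention scores are already given; how the scores are produced, and the exact form used in the paper, are outside this illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def combine_parameters(skill_params, scores):
    """Mix per-skill parameter vectors using attention weights derived from scores."""
    alpha = softmax(scores)
    dim = len(skill_params[0])
    return [
        sum(a * p[d] for a, p in zip(alpha, skill_params))
        for d in range(dim)
    ]

# Two hypothetical skills with two parameters each.
skills = [[1.0, 0.0], [0.0, 1.0]]
print(combine_parameters(skills, [0.0, 0.0]))   # equal scores: plain average
print(combine_parameters(skills, [10.0, 0.0]))  # first skill dominates
```

With a sharply peaked score vector the mixture approaches a single skill's parameters, while flatter scores blend skills together.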

PANDA: A Gigapixel-level Human-centric Video Dataset

Title PANDA: A Gigapixel-level Human-centric Video Dataset
Authors Xueyang Wang, Xiya Zhang, Yinheng Zhu, Yuchen Guo, Xiaoyun Yuan, Liuyu Xiang, Zerun Wang, Guiguang Ding, David J Brady, Qionghai Dai, Lu Fang
Abstract We present PANDA, the first gigaPixel-level humAN-centric viDeo dAtaset, for large-scale, long-term, and multi-object visual analysis. The videos in PANDA were captured by a gigapixel camera and cover real-world scenes with both wide field-of-view (~1 square kilometer area) and high-resolution details (~gigapixel-level/frame). The scenes may contain 4k head counts with over 100x scale variation. PANDA provides enriched and hierarchical ground-truth annotations, including 15,974.6k bounding boxes, 111.8k fine-grained attribute labels, 12.7k trajectories, 2.2k groups and 2.9k interactions. We benchmark the human detection and tracking tasks. Due to the vast variance of pedestrian pose, scale, occlusion and trajectory, existing approaches are challenged by both accuracy and efficiency. Given the uniqueness of PANDA with both wide FoV and high resolution, a new task of interaction-aware group detection is introduced. We design a ‘global-to-local zoom-in’ framework, where global trajectories and local interactions are simultaneously encoded, yielding promising results. We believe PANDA will contribute to the community of artificial intelligence and praxeology by understanding human behaviors and interactions in large-scale real-world scenes. PANDA Website: http://www.panda-dataset.com.
Tasks Human Detection
Published 2020-03-10
URL https://arxiv.org/abs/2003.04852v1
PDF https://arxiv.org/pdf/2003.04852v1.pdf
PWC https://paperswithcode.com/paper/panda-a-gigapixel-level-human-centric-video

Human Apprenticeship Learning via Kernel-based Inverse Reinforcement Learning

Title Human Apprenticeship Learning via Kernel-based Inverse Reinforcement Learning
Authors Mark A. Rucker, Layne T. Watson, Laura E. Barnes, Matthew S. Gerber
Abstract This paper considers whether a reward function learned via inverse reinforcement learning from a human expert can be used as a feedback intervention to alter future human performance as desired (i.e., human to human apprenticeship learning). To learn reward functions, two new algorithms are developed: a kernel-based inverse reinforcement learning algorithm and a Monte Carlo reinforcement learning algorithm. The algorithms are benchmarked against well-known alternatives within their respective corpus and are shown to outperform them in terms of efficiency and optimality. To test the feedback intervention, two randomized experiments are performed with 3,256 human participants. The experimental results demonstrate with significance that the rewards learned from “expert” individuals are effective as feedback interventions. In addition to the algorithmic contributions and successful experiments, the paper also describes three reward function modifications to improve reward function feedback interventions for humans.
Published 2020-02-25
URL https://arxiv.org/abs/2002.10904v1
PDF https://arxiv.org/pdf/2002.10904v1.pdf
PWC https://paperswithcode.com/paper/human-apprenticeship-learning-via-kernel