April 3, 2020

# Paper Group AWR 80

Learning Human-Object Interaction Detection using Interaction Points. Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks. Simultaneous Enhancement and Super-Resolution of Underwater Imagery for Improved Visual Perception. Unpaired Multi-modal Segmentation via Knowledge Distillation. Zero-Reference Deep Curve Estimation for Low- …

#### Learning Human-Object Interaction Detection using Interaction Points

Title Learning Human-Object Interaction Detection using Interaction Points
Authors Tiancai Wang, Tong Yang, Martin Danelljan, Fahad Shahbaz Khan, Xiangyu Zhang, Jian Sun
Abstract Understanding interactions between humans and objects is one of the fundamental problems in visual classification and an essential step towards detailed scene understanding. Human-object interaction (HOI) detection strives to localize both the human and the object as well as to identify the complex interactions between them. Most existing HOI detection approaches are instance-centric, where interactions between all possible human-object pairs are predicted based on appearance features and coarse spatial information. We argue that appearance features alone are insufficient to capture complex human-object interactions. In this paper, we therefore propose a novel fully-convolutional approach that directly detects the interactions between human-object pairs. Our network predicts interaction points, which directly localize and classify the interaction. Paired with the densely predicted interaction vectors, the interactions are associated with human and object detections to obtain final predictions. To the best of our knowledge, we are the first to propose an approach where HOI detection is posed as a keypoint detection and grouping problem. Experiments are performed on two popular benchmarks: V-COCO and HICO-DET. Our approach sets a new state-of-the-art on both datasets. Code is available at https://github.com/vaesl/IP-Net.
Tasks Human-Object Interaction Detection, Keypoint Detection, Scene Understanding
Published 2020-03-31
URL https://arxiv.org/abs/2003.14023v1
PDF https://arxiv.org/pdf/2003.14023v1.pdf
PWC https://paperswithcode.com/paper/learning-human-object-interaction-detection
Repo https://github.com/vaesl/IP-Net
Framework none
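To make the "keypoint detection and grouping" framing concrete, here is a minimal sketch. It assumes the interaction point is the midpoint between the human and object box centers, and that a predicted interaction vector is an offset from that point to the human center (so the object center sits at the mirrored offset). Both are simplifying assumptions for illustration, and `group` is a hypothetical helper, not IP-Net's code.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def center(box: Box) -> Tuple[float, float]:
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def interaction_point(human: Box, obj: Box) -> Tuple[float, float]:
    # Assumption: the interaction point is the midpoint of the two box centers.
    (hx, hy), (ox, oy) = center(human), center(obj)
    return ((hx + ox) / 2.0, (hy + oy) / 2.0)


def group(point, vector, humans: List[Box], objects: List[Box]) -> Tuple[int, int]:
    """Hypothetical grouping: match an interaction point to the nearest detections.

    `vector` (dx, dy) is assumed to point from the interaction point to the
    human center; the object center is then expected at the mirrored offset.
    """
    px, py = point
    dx, dy = vector
    human_target = (px + dx, py + dy)
    object_target = (px - dx, py - dy)

    def nearest(target, boxes):
        return min(range(len(boxes)), key=lambda i: math.dist(center(boxes[i]), target))

    return nearest(human_target, humans), nearest(object_target, objects)
```

For example, with a human box `(0, 0, 2, 2)` and an object box `(4, 0, 6, 2)`, the interaction point is `(3.0, 1.0)`, and the vector `(-2, 0)` groups that point back to those two boxes.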

#### Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks

Title Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks
Authors Wenguan Wang, Xiankai Lu, Jianbing Shen, David Crandall, Ling Shao
Abstract This work proposes a novel attentive graph neural network (AGNN) for zero-shot video object segmentation (ZVOS). The suggested AGNN recasts this task as a process of iterative information fusion over video graphs. Specifically, AGNN builds a fully connected graph to efficiently represent frames as nodes, and relations between arbitrary frame pairs as edges. The underlying pair-wise relations are described by a differentiable attention mechanism. Through parametric message passing, AGNN is able to efficiently capture and mine much richer and higher-order relations between video frames, thus enabling a more complete understanding of video content and more accurate foreground estimation. Experimental results on three video segmentation datasets show that AGNN sets a new state-of-the-art in each case. To further demonstrate the generalizability of our framework, we extend AGNN to an additional task: image object co-segmentation (IOCS). We perform experiments on two famous IOCS datasets and observe again the superiority of our AGNN model. The extensive experiments verify that AGNN is able to learn the underlying semantic/appearance relationships among video frames or related images, and discover the common objects.
Tasks Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation
Published 2020-01-19
URL https://arxiv.org/abs/2001.06807v1
PDF https://arxiv.org/pdf/2001.06807v1.pdf
PWC https://paperswithcode.com/paper/zero-shot-video-object-segmentation-via-1
Repo https://github.com/carrierlxk/AGNN
Framework pytorch
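The abstract describes a fully connected frame graph with attention-weighted edges and parametric message passing. The NumPy sketch below shows that loop in its simplest form: bilinear attention scores between frame embeddings, a softmax over the other frames, aggregated messages, and an update step. The `tanh` update over concatenated features stands in for the paper's learned update and is an assumption. `W` and `U` are hypothetical weight matrices. The graph needs at least two frames.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def message_passing(h, W, U, steps=3):
    """h: (N, d) one embedding per frame (node) of a fully connected graph, N >= 2.

    W: (d, d) bilinear attention weights; U: (2d, d) update weights (both hypothetical).
    """
    for _ in range(steps):
        scores = h @ W @ h.T               # (N, N) pairwise edge scores e_ij
        np.fill_diagonal(scores, -np.inf)  # each node receives messages from other frames
        attn = softmax(scores, axis=1)     # normalize over neighbours j
        messages = attn @ h                # (N, d) attention-weighted aggregation
        # Simplified node update (assumption): mix own state with incoming messages.
        h = np.tanh(np.concatenate([h, messages], axis=1) @ U)
    return h
```

Because the graph is fully connected, each step costs O(N^2 d) for N frames. The pairwise score matrix is where the "higher-order relations between video frames" accumulate across steps.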

#### Simultaneous Enhancement and Super-Resolution of Underwater Imagery for Improved Visual Perception

Title Simultaneous Enhancement and Super-Resolution of Underwater Imagery for Improved Visual Perception
Authors Md Jahidul Islam, Peigen Luo, Junaed Sattar
Abstract In this paper, we introduce and tackle the simultaneous enhancement and super-resolution (SESR) problem for underwater robot vision and provide an efficient solution for near real-time applications. We present Deep SESR, a residual-in-residual network-based generative model that can learn to restore perceptual image qualities at 2x, 3x, or 4x higher spatial resolution. We supervise its training by formulating a multi-modal objective function that addresses the chrominance-specific underwater color degradation, lack of image sharpness, and loss in high-level feature representation. It is also supervised to learn salient foreground regions in the image, which in turn guides the network to learn global contrast enhancement. We design an end-to-end training pipeline to jointly learn the saliency prediction and SESR on a shared hierarchical feature space for fast inference. Moreover, we present UFO-120, the first dataset to facilitate large-scale SESR learning; it contains over 1500 training samples and a benchmark test set of 120 samples. By thorough experimental evaluation on the UFO-120 and other standard datasets, we demonstrate that Deep SESR outperforms the existing solutions for underwater image enhancement and super-resolution. We also validate its generalization performance on several test cases that include underwater images with diverse spectral and spatial degradation levels, and also terrestrial images with unseen natural objects. Lastly, we analyze its computational feasibility for single-board deployments and demonstrate its operational benefits for visually-guided underwater robots. The model and dataset information will be available at: https://github.com/xahidbuffon/Deep-SESR.
Tasks Image Enhancement, Saliency Prediction, Super-Resolution
Published 2020-02-04
URL https://arxiv.org/abs/2002.01155v1
PDF https://arxiv.org/pdf/2002.01155v1.pdf
PWC https://paperswithcode.com/paper/simultaneous-enhancement-and-super-resolution
Repo https://github.com/xahidbuffon/Deep-SESR
Framework none

#### Unpaired Multi-modal Segmentation via Knowledge Distillation

Title Unpaired Multi-modal Segmentation via Knowledge Distillation
Authors Qi Dou, Quande Liu, Pheng Ann Heng, Ben Glocker
Abstract Multi-modal learning is typically performed with network architectures containing modality-specific layers and shared layers, utilizing co-registered images of different modalities. We propose a novel learning scheme for unpaired cross-modality image segmentation, with a highly compact architecture achieving superior segmentation accuracy. In our method, we heavily reuse network parameters, by sharing all convolutional kernels across CT and MRI, and only employ modality-specific internal normalization layers which compute respective statistics. To effectively train such a highly compact model, we introduce a novel loss term inspired by knowledge distillation, by explicitly constraining the KL-divergence of our derived prediction distributions between modalities. We have extensively validated our approach on two multi-class segmentation problems: i) cardiac structure segmentation, and ii) abdominal organ segmentation. Different network settings, i.e., 2D dilated network and 3D U-net, are utilized to investigate our method’s general efficacy. Experimental results on both tasks demonstrate that our novel multi-modal learning scheme consistently outperforms single-modal training and previous multi-modal approaches.
Published 2020-01-06
URL https://arxiv.org/abs/2001.03111v1
PDF https://arxiv.org/pdf/2001.03111v1.pdf
PWC https://paperswithcode.com/paper/unpaired-multi-modal-segmentation-via
Repo https://github.com/JunMa11/MedJournal-OpenSourcePapers
Framework tf
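One way to read "constraining the KL-divergence of our derived prediction distributions between modalities" is sketched below. For each class, average the temperature-softened predictions over the pixels labelled with that class, separately for CT and MRI. Then penalize the symmetric KL divergence between the two per-class distributions. The class-averaging scheme, the temperature `tau`, and the uniform fallback for absent classes are assumptions made for illustration.

```python
import numpy as np


def softened(logits, tau):
    z = logits / tau
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def class_distributions(logits, labels, num_classes, tau=2.0):
    """logits: (P, K) per-pixel scores; labels: (P,) ground-truth class ids.

    Returns (K, K): row k is the mean softened prediction over pixels of class k.
    """
    probs = softened(logits, tau)
    out = np.zeros((num_classes, num_classes))
    for k in range(num_classes):
        mask = labels == k
        # Hypothetical fallback: uniform distribution if class k is absent.
        out[k] = probs[mask].mean(axis=0) if mask.any() else 1.0 / num_classes
    return out


def kl(p, q, eps=1e-8):
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)


def kd_loss(logits_ct, labels_ct, logits_mr, labels_mr, num_classes, tau=2.0):
    p_ct = class_distributions(logits_ct, labels_ct, num_classes, tau)
    p_mr = class_distributions(logits_mr, labels_mr, num_classes, tau)
    # Symmetric KL between modalities, averaged over classes.
    return float(np.mean(0.5 * (kl(p_ct, p_mr) + kl(p_mr, p_ct))))
```

The images do not need to be co-registered because the loss compares class-level statistics rather than pixel pairs. This is what makes the scheme usable with unpaired CT and MRI data.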

#### Zero-Reference Deep Curve Estimation for Low-Light Image Enhancement

Title Zero-Reference Deep Curve Estimation for Low-Light Image Enhancement
Authors Chunle Guo, Chongyi Li, Jichang Guo, Chen Change Loy, Junhui Hou, Sam Kwong, Runmin Cong
Abstract The paper presents a novel method, Zero-Reference Deep Curve Estimation (Zero-DCE), which formulates light enhancement as a task of image-specific curve estimation with a deep network. Our method trains a lightweight deep network, DCE-Net, to estimate pixel-wise and high-order curves for dynamic range adjustment of a given image. The curve estimation is specially designed, considering pixel value range, monotonicity, and differentiability. Zero-DCE is appealing in its relaxed assumption on reference images, i.e., it does not require any paired or unpaired data during training. This is achieved through a set of carefully formulated non-reference loss functions, which implicitly measure the enhancement quality and drive the learning of the network. Our method is efficient as image enhancement can be achieved by an intuitive and simple nonlinear curve mapping. Despite its simplicity, we show that it generalizes well to diverse lighting conditions. Extensive experiments on various benchmarks demonstrate the advantages of our method over state-of-the-art methods qualitatively and quantitatively. Furthermore, the potential benefits of our Zero-DCE to face detection in the dark are discussed. Code and model will be available at https://github.com/Li-Chongyi/Zero-DCE.
Tasks Face Detection, Image Enhancement, Low-Light Image Enhancement
Published 2020-01-19
URL https://arxiv.org/abs/2001.06826v2
PDF https://arxiv.org/pdf/2001.06826v2.pdf
PWC https://paperswithcode.com/paper/zero-reference-deep-curve-estimation-for-low
Repo https://github.com/Li-Chongyi/Zero-DCE
Framework none
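The curve design constraints in the abstract (pixel value range, monotonicity, differentiability) are met by a quadratic light-enhancement curve of the form LE(x) = x + αx(1 − x), applied iteratively with per-pixel α maps in [−1, 1]. As we understand the paper, this is Zero-DCE's curve. The sketch below shows only the curve application. The DCE-Net that predicts the α maps is not shown, and the constant maps in the usage note are placeholders.

```python
import numpy as np


def le_curve(x, alpha):
    # Quadratic curve: keeps [0, 1] in range, fixes 0 and 1, and is monotone
    # in x for alpha in [-1, 1] (derivative 1 + alpha * (1 - 2x) >= 0).
    return x + alpha * x * (1.0 - x)


def enhance(image, alpha_maps):
    """image: float array in [0, 1]; alpha_maps: sequence of arrays shaped like
    `image` with values in [-1, 1], one per iteration (higher-order curve)."""
    x = image
    for alpha in alpha_maps:
        x = le_curve(x, alpha)
    return x
```

For example, `enhance(x, [np.full_like(x, 0.5)] * 8)` brightens every pixel while preserving ordering. Iterating the curve eight times yields a higher-order mapping without extra parameters per step.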

#### Uneven Coverage of Natural Disasters in Wikipedia: the Case of Flood

Title Uneven Coverage of Natural Disasters in Wikipedia: the Case of Flood
Authors Valerio Lorini, Javier Rando, Diego Saez-Trumper, Carlos Castillo
Abstract The usage of non-authoritative data for disaster management presents the opportunity of accessing timely information that might not be available through other means, as well as the challenge of dealing with several layers of biases. Wikipedia, a collaboratively-produced encyclopedia, includes in-depth information about many natural and human-made disasters, and its editors are particularly good at adding information in real-time as a crisis unfolds. In this study, we focus on the English version of Wikipedia, which is by far the most comprehensive version of this encyclopedia. Wikipedia tends to have good coverage of disasters, particularly those having a large number of fatalities. However, we also show that a tendency to cover events in wealthy countries and not cover events in poorer ones permeates Wikipedia as a source for disaster-related information. By performing careful automatic content analysis at a large scale, we show how the coverage of floods in Wikipedia is skewed towards rich, English-speaking countries, in particular the US and Canada. We also note how coverage of floods in countries with the lowest income, as well as countries in South America, is substantially lower than the coverage of floods in middle-income countries. These results have implications for systems using Wikipedia or similar collaborative media platforms as an information source for detecting emergencies or for gathering valuable information for disaster response.
Published 2020-01-23
URL https://arxiv.org/abs/2001.08810v1
PDF https://arxiv.org/pdf/2001.08810v1.pdf
PWC https://paperswithcode.com/paper/uneven-coverage-of-natural-disasters-in
Repo https://github.com/javirandor/disasters-wikipedia-floods
Framework none

#### GAMI-Net: An Explainable Neural Network based on Generalized Additive Models with Structured Interactions

Title GAMI-Net: An Explainable Neural Network based on Generalized Additive Models with Structured Interactions
Authors Zebin Yang, Aijun Zhang, Agus Sudjianto
Abstract The lack of interpretability is an inevitable problem when using neural network models in real applications. In this paper, a new explainable neural network called GAMI-Net, based on generalized additive models with structured interactions, is proposed to pursue a good balance between prediction accuracy and model interpretability. The GAMI-Net is a disentangled feedforward network with multiple additive subnetworks, where each subnetwork is designed for capturing either one main effect or one pairwise interaction effect. It takes into account three kinds of interpretability constraints, including a) sparsity constraint for selecting the most significant effects for parsimonious representations; b) heredity constraint such that a pairwise interaction could only be included when at least one of its parent effects exists; and c) marginal clarity constraint, in order to make the main and pairwise interaction effects mutually distinguishable. For model estimation, we develop an adaptive training algorithm that first fits the main effects to the responses, then fits the structured pairwise interactions to the residuals. Numerical experiments on both synthetic functions and real-world datasets show that the proposed explainable GAMI-Net enjoys superior interpretability while maintaining competitive prediction accuracy in comparison to the explainable boosting machine and other benchmark machine learning models.
Published 2020-03-16
URL https://arxiv.org/abs/2003.07132v1
PDF https://arxiv.org/pdf/2003.07132v1.pdf
PWC https://paperswithcode.com/paper/gami-net-an-explainable-neural-network-based
Repo https://github.com/ZebinYang/gaminet
Framework tf
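The additive structure and the heredity constraint from the abstract can be illustrated in a few lines. The prediction is an intercept plus one term per selected main effect plus one term per selected pairwise interaction. A pair (i, j) is only a candidate when feature i or feature j already has a selected main effect. Plain Python callables stand in for GAMI-Net's subnetworks here. That substitution, and the helper names, are assumptions for illustration.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple


def heredity_candidates(main_effects: Sequence[int], n_features: int) -> List[Tuple[int, int]]:
    """Pairs (i, j) allowed under heredity: at least one parent main effect selected."""
    selected = set(main_effects)
    return [(i, j) for i, j in combinations(range(n_features), 2)
            if i in selected or j in selected]


def gam_predict(x: Sequence[float], intercept: float,
                mains: Dict[int, Callable[[float], float]],
                pairs: Dict[Tuple[int, int], Callable[[float, float], float]]) -> float:
    # y = intercept + sum_i f_i(x_i) + sum_(i,j) f_ij(x_i, x_j)
    y = intercept
    for i, f in mains.items():
        y += f(x[i])
    for (i, j), g in pairs.items():
        y += g(x[i], x[j])
    return y
```

Because every term depends on at most two inputs, each fitted effect can be plotted directly. This is the source of the model's interpretability. The two-stage training in the abstract corresponds to fitting `mains` first and then fitting `pairs` (restricted to `heredity_candidates`) on the residuals.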

#### Context-Transformer: Tackling Object Confusion for Few-Shot Detection

Title Context-Transformer: Tackling Object Confusion for Few-Shot Detection
Authors Ze Yang, Yali Wang, Xianyu Chen, Jianzhuang Liu, Yu Qiao
Abstract Few-shot object detection is a challenging but realistic scenario, where only a few annotated training images are available for training detectors. A popular approach to handle this problem is transfer learning, i.e., fine-tuning a detector pretrained on a source-domain benchmark. However, such a transferred detector often fails to recognize new objects in the target domain, due to the low data diversity of training samples. To tackle this problem, we propose a novel Context-Transformer within a concise deep transfer framework. Specifically, Context-Transformer can effectively leverage source-domain object knowledge as guidance, and automatically exploit contexts from only a few training images in the target domain. Subsequently, it can adaptively integrate these relational clues to enhance the discriminative power of the detector, in order to reduce object confusion in few-shot scenarios. Moreover, Context-Transformer is flexibly embedded in the popular SSD-style detectors, which makes it a plug-and-play module for end-to-end few-shot learning. Finally, we evaluate Context-Transformer on the challenging settings of few-shot detection and incremental few-shot detection. The experimental results show that our framework outperforms the recent state-of-the-art approaches.
Tasks Few-Shot Learning, Few-Shot Object Detection, Object Detection, Transfer Learning
Published 2020-03-16
URL https://arxiv.org/abs/2003.07304v1
PDF https://arxiv.org/pdf/2003.07304v1.pdf
PWC https://paperswithcode.com/paper/context-transformer-tackling-object-confusion
Repo https://github.com/Ze-Yang/Context-Transformer
Framework pytorch

#### Exploration in Action Space

Title Exploration in Action Space
Authors Anirudh Vemula, Wen Sun, J. Andrew Bagnell
Abstract Parameter space exploration methods with black-box optimization have recently been shown to outperform state-of-the-art approaches in continuous control reinforcement learning domains. In this paper, we examine reasons why these methods work better and the situations in which they are worse than traditional action space exploration methods. Through a simple theoretical analysis, we show that when the parametric complexity required to solve the reinforcement learning problem is greater than the product of action space dimensionality and horizon length, exploration in action space is preferred. This is also shown empirically by comparing simple exploration methods on several toy problems.
Published 2020-03-31
URL https://arxiv.org/abs/2004.00500v1
PDF https://arxiv.org/pdf/2004.00500v1.pdf
PWC https://paperswithcode.com/paper/exploration-in-action-space
Repo https://github.com/LAIRLAB/ARS-experiments
Framework pytorch
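The abstract's criterion compares two quantities. The first is the parametric complexity d of the policy. The second is the product of the action dimensionality p and the horizon H. When d > p·H, action-space exploration is preferred. The sketch below encodes that rule. It adds a hypothetical helper that counts the parameters of a bias-free linear policy, used as a stand-in for d. Treating the tie d = p·H as "parameter" is an arbitrary choice.

```python
def linear_policy_dim(obs_dim: int, action_dim: int) -> int:
    # Hypothetical complexity measure: weights of a bias-free linear policy a = M s.
    return obs_dim * action_dim


def preferred_exploration(param_dim: int, action_dim: int, horizon: int) -> str:
    """Rule from the abstract: explore in action space when d > p * H."""
    return "action" if param_dim > action_dim * horizon else "parameter"
```

For instance, a linear policy with a 17-dimensional observation and 6-dimensional action has d = 102, far below p·H = 6000 for a 1000-step horizon. This falls on the parameter-space side, which is consistent with black-box parameter search doing well on such control tasks.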

#### CATA++: A Collaborative Dual Attentive Autoencoder Method for Recommending Scientific Articles

Title CATA++: A Collaborative Dual Attentive Autoencoder Method for Recommending Scientific Articles
Authors Meshal Alfarhood, Jianlin Cheng
Abstract Recommender systems today have become an essential component of any commercial website. Collaborative filtering approaches, and Matrix Factorization (MF) techniques in particular, are widely used in recommender systems. However, the natural data sparsity problem limits their performance, since users generally interact with very few items in the system. Consequently, multiple hybrid models were proposed recently to optimize MF performance by incorporating additional contextual information in its learning process. Although these models improve the recommendation quality, there are two primary aspects for further improvements: (1) multiple models focus only on some portion of the available contextual information and neglect other portions; (2) learning the feature space of the side contextual information needs to be further enhanced. In this paper, we propose a Collaborative Dual Attentive Autoencoder (CATA++) for recommending scientific articles. CATA++ utilizes an article’s content and learns its latent space via two parallel autoencoders. We use an attention mechanism to capture the most pertinent part of information in making more relevant recommendations. Comprehensive experiments on three real-world datasets have shown that our dual-way learning strategy has significantly improved the MF performance in comparison with other state-of-the-art MF-based models according to various experimental evaluations. The source code of our methods is available at: https://github.com/jianlin-cheng/CATA.
Published 2020-02-27
URL https://arxiv.org/abs/2002.12277v1
PDF https://arxiv.org/pdf/2002.12277v1.pdf
PWC https://paperswithcode.com/paper/cata-a-collaborative-dual-attentive
Repo https://github.com/jianlin-cheng/CATA
Framework tf

#### SetRank: A Setwise Bayesian Approach for Collaborative Ranking from Implicit Feedback

Title SetRank: A Setwise Bayesian Approach for Collaborative Ranking from Implicit Feedback
Authors Chao Wang, Hengshu Zhu, Chen Zhu, Chuan Qin, Hui Xiong
Abstract The recent development of online recommender systems has a focus on collaborative ranking from implicit feedback, such as user clicks and purchases. Different from explicit ratings, which reflect graded user preferences, the implicit feedback only generates positive and unobserved labels. While considerable efforts have been made in this direction, the well-known pairwise and listwise approaches have still been limited by various challenges. Specifically, for the pairwise approaches, the assumption of independent pairwise preference does not always hold in practice. Also, the listwise approaches cannot efficiently accommodate “ties” due to the precondition of the entire list permutation. To this end, in this paper, we propose a novel setwise Bayesian approach for collaborative ranking, namely SetRank, to inherently accommodate the characteristics of implicit feedback in recommender systems. Specifically, SetRank aims at maximizing the posterior probability of novel setwise preference comparisons and can be implemented with matrix factorization and neural networks. Meanwhile, we also present the theoretical analysis of SetRank to show that the bound of excess risk can be proportional to $\sqrt{M/N}$, where $M$ and $N$ are the numbers of items and users, respectively. Finally, extensive experiments on four real-world datasets clearly validate the superiority of SetRank compared with various state-of-the-art baselines.
Published 2020-02-23
URL https://arxiv.org/abs/2002.09841v1
PDF https://arxiv.org/pdf/2002.09841v1.pdf
PWC https://paperswithcode.com/paper/setrank-a-setwise-bayesian-approach-for
Framework none
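A setwise comparison treats "this observed item beats every item in a set of unobserved items" as a single event, instead of many independent pairwise events. The sketch below scores items with matrix factorization (s_j = u · v_j) and models that event with a softmax, P(i ≻ S) = exp(s_i) / (exp(s_i) + Σ_{j∈S} exp(s_j)). This Plackett-Luce-style likelihood is our assumption for illustration and may differ from SetRank's exact formulation. Unobserved items inside S are never ranked against each other, which sidesteps the "ties" issue.

```python
import numpy as np


def setwise_log_likelihood(user_vec, item_vecs, pos, neg_set):
    """log P(item `pos` preferred over every item in `neg_set`).

    Scores come from matrix factorization (item_vecs @ user_vec); the softmax
    form of the setwise probability is an illustrative assumption.
    """
    scores = item_vecs @ user_vec
    cand = np.concatenate(([scores[pos]], scores[list(neg_set)]))
    m = cand.max()
    log_norm = m + np.log(np.exp(cand - m).sum())  # stable log-sum-exp
    return float(scores[pos] - log_norm)
```

With all scores equal and a set of three unobserved items, the likelihood is 1/4. Raising the positive item's score above the set increases it toward 1.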

#### Supervised Learning for Non-Sequential Data with the Canonical Polyadic Decomposition

Title Supervised Learning for Non-Sequential Data with the Canonical Polyadic Decomposition
Authors Alexandros Haliassos, Kriton Konstantinidis, Danilo P. Mandic
Abstract There has recently been increasing interest, both theoretical and practical, in utilizing tensor networks for the analysis and design of machine learning systems. In particular, a framework has been proposed that can handle both dense data (e.g., standard regression or classification tasks) and sparse data (e.g., recommender systems), unlike support vector machines and traditional deep learning techniques. Namely, it can be interpreted as applying local feature mappings to the data and, through the outer product operator, modelling all interactions of functions of the features; the corresponding weights are represented as a tensor network for computational tractability. In this paper, we derive efficient prediction and learning algorithms for supervised learning with the Canonical Polyadic (CP) decomposition, including suitable regularization and initialization schemes. We empirically demonstrate that the CP-based model performs at least on par with the existing models based on the Tensor Train (TT) decomposition on standard non-sequential tasks, and better on MovieLens 100K. Furthermore, in contrast to previous works which applied two-dimensional local feature maps to the data, we generalize the framework to handle arbitrarily high-dimensional maps, in order to gain a powerful lever on the expressiveness of the model. In order to enhance its stability and generalization capabilities, we propose a normalized version of the feature maps. Our experiments show that this version leads to dramatic improvements over the unnormalized and/or two-dimensional maps, as well as to performance on non-sequential supervised learning tasks that compares favourably with popular models, including neural networks.
Published 2020-01-27
URL https://arxiv.org/abs/2001.10109v1
PDF https://arxiv.org/pdf/2001.10109v1.pdf
PWC https://paperswithcode.com/paper/supervised-learning-for-non-sequential-data
Repo https://github.com/KritonKonstantinidis/CPD_Supervised_Learning
Framework tf
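The model in the abstract applies a local feature map φ to each feature, takes the outer product over all N features, and contracts it with a weight tensor W. With W in CP form (a sum of R rank-1 terms), the prediction factorizes as f(x) = Σ_r Π_n ⟨w_n^(r), φ(x_n)⟩. This costs O(N·R·D) instead of O(D^N). The sketch below uses the two-dimensional map φ(x_n) = [1, x_n] (our assumption for the "two-dimensional local feature maps" of prior work) and checks the factorized form against an explicit full tensor on a tiny example.

```python
import numpy as np


def local_map(x):
    # Assumed two-dimensional local feature map: phi(x_n) = [1, x_n].
    return np.stack([np.ones_like(x), x], axis=-1)  # (N, 2)


def cp_predict(x, factors):
    """x: (N,) features; factors: (N, R, D) CP factor vectors w_n^(r).

    Computes sum_r prod_n <w_n^(r), phi(x_n)> without forming the D^N tensor.
    """
    phi = local_map(x)                             # (N, D)
    inner = np.einsum("nrd,nd->nr", factors, phi)  # (N, R) inner products
    return float(np.prod(inner, axis=0).sum())


def full_tensor_predict(x, factors):
    # Brute-force reference: build W explicitly and contract with the outer product of phi.
    n_feat, rank, _ = factors.shape
    phi = local_map(x)
    W = 0.0
    for r in range(rank):
        t = factors[0, r]
        for n in range(1, n_feat):
            t = np.multiply.outer(t, factors[n, r])
        W = W + t
    feat = phi[0]
    for n in range(1, n_feat):
        feat = np.multiply.outer(feat, phi[n])
    return float(np.sum(W * feat))
```

Expanding the product shows why this models "all interactions of functions of the features". With φ(x_n) = [1, x_n], the prediction contains a weighted term for every subset of features.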

#### Unsupervised Any-to-Many Audiovisual Synthesis via Exemplar Autoencoders

Title Unsupervised Any-to-Many Audiovisual Synthesis via Exemplar Autoencoders
Authors Kangle Deng, Aayush Bansal, Deva Ramanan
Abstract We present an unsupervised approach that enables us to convert the speech input of any one individual to an output set of potentially infinitely many speakers. One can stand in front of a mic and be able to make their favorite celebrity say the same words. Our approach builds on simple autoencoders that project out-of-sample data to the distribution of the training set (motivated by PCA/linear autoencoders). We use an exemplar autoencoder to learn the voice and specific style (emotions and ambiance) of a target speaker. In contrast to existing methods, the proposed approach can be easily extended to an arbitrarily large number of speakers in very little time using only two to three minutes of audio data from a speaker. We also exhibit the usefulness of our approach for generating video from audio signals and vice-versa. We suggest that the reader check out our project webpage for various synthesized examples: https://dunbar12138.github.io/projectpage/Audiovisual/
Published 2020-01-13
URL https://arxiv.org/abs/2001.04463v1
PDF https://arxiv.org/pdf/2001.04463v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-any-to-many-audiovisual
Repo https://github.com/dunbar12138/Audiovisual-Synthesis
Framework pytorch
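The PCA/linear-autoencoder motivation in the abstract is that an autoencoder trained on one dataset maps any input back onto the subspace (or manifold) spanned by its training data. The sketch below shows only that linear intuition, not the exemplar autoencoder itself. Fit the top-k principal directions, encode to k coefficients, and decode. An out-of-sample point is pulled onto the training subspace.

```python
import numpy as np


def fit_pca(train, k):
    mean = train.mean(axis=0)
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:k]  # top-k principal directions as rows


def project(x, mean, components):
    # Linear "autoencoder": encode to k coefficients, then decode to input space.
    codes = (x - mean) @ components.T
    return mean + codes @ components
```

For training points on the line y = 2x, the off-line input `[1.0, 0.0]` is reconstructed as `[0.2, 0.4]`, which is its orthogonal projection onto that line. A point already on the line is reproduced unchanged. A speaker-specific autoencoder is meant to play the same role nonlinearly, with "the line" replaced by that speaker's voice.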

#### RatLesNetv2: A Fully Convolutional Network for Rodent Brain Lesion Segmentation

Title RatLesNetv2: A Fully Convolutional Network for Rodent Brain Lesion Segmentation
Authors Juan Miguel Valverde, Artem Shatillo, Riccardo de Feo, Olli Gröhn, Alejandra Sierra, Jussi Tohka
Abstract Segmentation of rodent brain lesions on magnetic resonance images (MRIs) is a time-consuming task with high inter- and intra-operator variability due to its subjective nature. We present a three-dimensional fully convolutional neural network (ConvNet) named RatLesNetv2 for segmenting rodent brain lesions. We compare its performance with other ConvNets on an unusually large and heterogeneous data set composed of 916 T2-weighted rat brain scans at nine different lesion stages. RatLesNetv2 obtained similar or higher Dice coefficients than the other ConvNets, and it produced much more realistic and compact segmentations with notably fewer holes and lower Hausdorff distance. RatLesNetv2-derived segmentations also exceeded inter-rater agreement Dice coefficients. Additionally, we show that training on disparate ground truths leads to significantly different segmentations, and we study RatLesNetv2's generalization capability when optimizing for training sets of different sizes. RatLesNetv2 is publicly available at https://github.com/jmlipman/RatLesNetv2.
Published 2020-01-24
URL https://arxiv.org/abs/2001.09138v2
PDF https://arxiv.org/pdf/2001.09138v2.pdf
PWC https://paperswithcode.com/paper/ratlesnetv2-a-fully-convolutional-network-for
Repo https://github.com/jmlipman/RatLesNetv2
Framework pytorch
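The comparison above relies on the Dice coefficient, 2|A ∩ B| / (|A| + |B|), between a predicted mask A and a reference mask B. The sketch below computes it for binary masks of any dimensionality, including 3D MRI volumes. Returning 1.0 when both masks are empty is a common convention, not something stated in the abstract.

```python
import numpy as np


def dice(pred, target, eps=1e-8):
    """Dice coefficient for binary masks (any shape, e.g. 3D volumes)."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    if total == 0:
        return 1.0  # convention (assumption): two empty masks agree perfectly
    return float(2.0 * intersection / (total + eps))
```

For example, a prediction `[1, 1, 0, 0]` against a reference `[1, 0, 0, 0]` gives 2·1 / (2 + 1) ≈ 0.667. Dice is an overlap measure and says nothing about holes or outlying fragments. This is why the abstract also reports Hausdorff distance.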

#### Auto-Tuning Spectral Clustering for Speaker Diarization Using Normalized Maximum Eigengap

Title Auto-Tuning Spectral Clustering for Speaker Diarization Using Normalized Maximum Eigengap
Authors Tae Jin Park, Kyu J. Han, Manoj Kumar, Shrikanth Narayanan
Abstract In this study, we propose a new spectral clustering framework that can auto-tune the parameters of the clustering algorithm in the context of speaker diarization. The proposed framework uses normalized maximum eigengap (NME) values to estimate the number of clusters and the parameters for the threshold of the elements of each row in an affinity matrix during spectral clustering, without the use of parameter tuning on the development set. Even with this hands-off approach, we achieve comparable or better performance across various evaluation sets than the results found using traditional clustering methods that apply careful parameter tuning and development data. A relative improvement of 17% in the speaker error rate on the well-known CALLHOME evaluation set shows the effectiveness of our proposed spectral clustering with auto-tuning.