January 31, 2020

2961 words 14 mins read

Paper Group ANR 146

Segmenting Objects in Day and Night:Edge-Conditioned CNN for Thermal Image Semantic Segmentation

Title Segmenting Objects in Day and Night:Edge-Conditioned CNN for Thermal Image Semantic Segmentation
Authors Chenglong Li, Wei Xia, Yan Yan, Bin Luo, Jin Tang
Abstract Despite much research progress in image semantic segmentation, it remains challenging under adverse environmental conditions caused by the imaging limitations of the visible spectrum. Thermal infrared cameras have several advantages over visible-spectrum cameras, such as operating in total darkness, insensitivity to illumination variations, robustness to shadow effects, and a strong ability to penetrate haze and smog. These advantages make it possible to segment semantic objects in both day and night. In this paper, we propose a novel network architecture, called edge-conditioned convolutional neural network (EC-CNN), for thermal image semantic segmentation. In particular, we elaborately design a gated feature-wise transform layer in EC-CNN to adaptively incorporate edge prior knowledge (see the sketch after this entry). The whole EC-CNN is trained end-to-end and can generate high-quality segmentation results with edge guidance. Meanwhile, we also introduce a new benchmark dataset named “Segment Objects in Day And night” (SODA) for comprehensive evaluation of thermal image semantic segmentation. SODA contains over 7,168 manually annotated and synthetically generated thermal images with 20 semantic region labels, covering a broad range of viewpoints and scene complexities. Extensive experiments on SODA demonstrate the effectiveness of the proposed EC-CNN against state-of-the-art methods.
Tasks Semantic Segmentation
Published 2019-07-24
URL https://arxiv.org/abs/1907.10303v1
PDF https://arxiv.org/pdf/1907.10303v1.pdf
PWC https://paperswithcode.com/paper/segmenting-objects-in-day-and-nightedge
Repo
Framework
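
The gated feature-wise transform layer is only named in the abstract; the following PyTorch sketch shows one plausible FiLM-style realization in which edge features produce per-channel scale, shift, and gate values. The module name, shapes, and the gated residual blend are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GatedFeatureWiseTransform(nn.Module):
    """Hypothetical gated feature-wise transform: edge features modulate
    backbone features with learned per-channel scale, shift, and gate."""

    def __init__(self, feat_channels: int, edge_channels: int):
        super().__init__()
        # Predict gamma (scale), beta (shift), and a gate from the edge prior.
        self.to_gamma = nn.Conv2d(edge_channels, feat_channels, kernel_size=1)
        self.to_beta = nn.Conv2d(edge_channels, feat_channels, kernel_size=1)
        self.to_gate = nn.Conv2d(edge_channels, feat_channels, kernel_size=1)

    def forward(self, feat: torch.Tensor, edge: torch.Tensor) -> torch.Tensor:
        gamma = self.to_gamma(edge)
        beta = self.to_beta(edge)
        gate = torch.sigmoid(self.to_gate(edge))   # in (0, 1): how much modulation to apply
        modulated = feat * (1.0 + gamma) + beta
        return gate * modulated + (1.0 - gate) * feat   # gated residual blend

# Toy usage: 64-channel backbone features conditioned on 16-channel edge maps.
layer = GatedFeatureWiseTransform(feat_channels=64, edge_channels=16)
out = layer(torch.randn(1, 64, 32, 32), torch.randn(1, 16, 32, 32))
print(out.shape)  # torch.Size([1, 64, 32, 32])
```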

Mining News Events from Comparable News Corpora: A Multi-Attribute Proximity Network Modeling Approach

Title Mining News Events from Comparable News Corpora: A Multi-Attribute Proximity Network Modeling Approach
Authors Hyungsul Kim, Ahmed El-Kishky, Xiang Ren, Jiawei Han
Abstract We present ProxiModel, a novel event mining framework for extracting high-quality structured event knowledge from large, redundant, and noisy news data sources. The proposed model differentiates itself from other approaches by modeling event correlation both within each individual document and across the corpus. To this end, we introduce the proximity network, a novel space-efficient data structure that enables scalable event mining. The proximity network captures corpus-level co-occurrence statistics for candidate event descriptors and event attributes, as well as their connections (see the sketch after this entry). We probabilistically model the proximity network as a generative process with sparsity-inducing regularization. This allows us to efficiently and effectively extract high-quality and interpretable news events. Experiments on three different news corpora demonstrate that the proposed method is effective and robust at generating high-quality event descriptors and attributes. We briefly describe several applications of the proposed framework, such as news summarization, event tracking, and multi-dimensional analysis of news. Finally, we explore a case study on visualizing the events in a Japan Tsunami news corpus and demonstrate ProxiModel’s ability to automatically summarize emerging news events.
Tasks
Published 2019-11-14
URL https://arxiv.org/abs/1911.06407v1
PDF https://arxiv.org/pdf/1911.06407v1.pdf
PWC https://paperswithcode.com/paper/mining-news-events-from-comparable-news
Repo
Framework
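
The abstract does not spell out how proximity is measured; the sketch below shows one plausible way to accumulate distance-decayed co-occurrence weights between candidate descriptors within each document. The window size, decay factor, and the use of plain tokens as descriptors are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations

def build_proximity_network(docs, window=10, decay=0.5):
    """Accumulate corpus-level, distance-decayed co-occurrence weights between
    candidate descriptors (here: plain tokens) that appear near each other."""
    edges = defaultdict(float)
    for tokens in docs:
        for (i, a), (j, b) in combinations(enumerate(tokens), 2):
            dist = j - i
            if a != b and dist <= window:
                # Closer pairs contribute more weight to the edge (a, b).
                edges[tuple(sorted((a, b)))] += decay ** dist
    return dict(edges)

docs = [["earthquake", "strikes", "japan", "tsunami", "warning"],
        ["japan", "tsunami", "evacuation", "ordered"]]
print(build_proximity_network(docs))
```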

Cooperative Learning of Disjoint Syntax and Semantics

Title Cooperative Learning of Disjoint Syntax and Semantics
Authors Serhii Havrylov, Germán Kruszewski, Armand Joulin
Abstract There has been considerable attention devoted to models that learn to jointly infer an expression’s syntactic structure and its semantics. Yet, Nangia and Bowman (2018) have recently shown that the current best systems fail to learn the correct parsing strategy on mathematical expressions generated from a simple context-free grammar. In this work, we present a recursive model inspired by Choi et al. (2018) that reaches near-perfect accuracy on this task. Our model is composed of two separate modules for syntax and semantics. They are cooperatively trained with standard continuous and discrete optimization schemes (see the sketch after this entry). Our model does not require any linguistic structure for supervision, and its recursive nature allows for out-of-domain generalization with little loss in performance. Additionally, our approach performs competitively on several natural language tasks, such as Natural Language Inference and Sentiment Analysis.
Tasks Domain Generalization, Natural Language Inference, Sentiment Analysis
Published 2019-02-25
URL https://arxiv.org/abs/1902.09393v2
PDF https://arxiv.org/pdf/1902.09393v2.pdf
PWC https://paperswithcode.com/paper/cooperative-learning-of-disjoint-syntax-and
Repo
Framework
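
The paper pairs a discrete syntax module with a continuous semantics module; the toy sketch below shows the general shape of such a cooperative update, with a policy-gradient (REINFORCE) step for the sampled discrete decision and ordinary backpropagation for the semantics network. The module interfaces and the flat categorical "parse" stand in for the actual tree-structured model, which is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyParser(nn.Module):
    """Stand-in syntax module: samples a discrete structure choice, returns its log-prob."""
    def __init__(self, dim, n_choices=4):
        super().__init__()
        self.scorer = nn.Linear(dim, n_choices)

    def forward(self, x):
        dist = torch.distributions.Categorical(logits=self.scorer(x))
        choice = dist.sample()                       # discrete, non-differentiable decision
        return choice, dist.log_prob(choice)

class ToySemantics(nn.Module):
    """Stand-in semantics module: classifies the input given the parser's choice."""
    def __init__(self, dim, n_choices=4, n_classes=3):
        super().__init__()
        self.n_choices = n_choices
        self.classify = nn.Linear(dim + n_choices, n_classes)

    def forward(self, x, choice):
        onehot = F.one_hot(choice, self.n_choices).float()
        return self.classify(torch.cat([x, onehot], dim=-1))

parser, semantics = ToyParser(8), ToySemantics(8)
opt = torch.optim.Adam(list(parser.parameters()) + list(semantics.parameters()), lr=1e-3)
x, y = torch.randn(16, 8), torch.randint(0, 3, (16,))

choice, log_prob = parser(x)
task_loss = F.cross_entropy(semantics(x, choice), y, reduction="none")
# Continuous module: ordinary backprop. Discrete module: REINFORCE with a mean baseline.
reward = -task_loss.detach()
parser_loss = -((reward - reward.mean()) * log_prob).mean()
(task_loss.mean() + parser_loss).backward()
opt.step()
```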

Trimmed Action Recognition, Dense-Captioning Events in Videos, and Spatio-temporal Action Localization with Focus on ActivityNet Challenge 2019

Title Trimmed Action Recognition, Dense-Captioning Events in Videos, and Spatio-temporal Action Localization with Focus on ActivityNet Challenge 2019
Authors Zhaofan Qiu, Dong Li, Yehao Li, Qi Cai, Yingwei Pan, Ting Yao
Abstract This notebook paper presents an overview and comparative analysis of our systems designed for the following three tasks in ActivityNet Challenge 2019: trimmed action recognition, dense-captioning events in videos, and spatio-temporal action localization.
Tasks Action Localization, Spatio-Temporal Action Localization, Temporal Action Localization
Published 2019-06-14
URL https://arxiv.org/abs/1906.07016v1
PDF https://arxiv.org/pdf/1906.07016v1.pdf
PWC https://paperswithcode.com/paper/trimmed-action-recognition-dense-captioning
Repo
Framework

Multi-turn Inference Matching Network for Natural Language Inference

Title Multi-turn Inference Matching Network for Natural Language Inference
Authors Chunhua Liu, Shan Jiang, Hainan Yu, Dong Yu
Abstract Natural Language Inference (NLI) is a fundamental and challenging task in Natural Language Processing (NLP). Most existing methods apply only a one-pass inference process to a mixed matching feature, which is a concatenation of different matching features between a premise and a hypothesis. In this paper, we propose a new model called the Multi-turn Inference Matching Network (MIMN) to perform multi-turn inference over different matching features. In each turn, the model focuses on one particular matching feature instead of the mixed matching feature. To enhance the interaction between different matching features, a memory component is employed to store the inference history. The inference at each turn is performed on the current matching feature and the memory (see the sketch after this entry). We conduct experiments on three different NLI datasets. The experimental results show that our model outperforms or matches state-of-the-art performance on all three datasets.
Tasks Natural Language Inference
Published 2019-01-08
URL http://arxiv.org/abs/1901.02222v1
PDF http://arxiv.org/pdf/1901.02222v1.pdf
PWC https://paperswithcode.com/paper/multi-turn-inference-matching-network-for
Repo
Framework
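
The abstract describes a loop in which each turn attends to one matching feature and updates a memory; the sketch below shows one plausible realization with a GRU cell acting as the memory. The dimensions, the GRU choice, and the final classifier are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MultiTurnInference(nn.Module):
    """Hypothetical multi-turn inference: one GRU step per matching feature,
    with the hidden state acting as the memory carried across turns."""
    def __init__(self, match_dim: int, mem_dim: int, n_classes: int = 3):
        super().__init__()
        self.cell = nn.GRUCell(match_dim, mem_dim)
        self.classifier = nn.Linear(mem_dim, n_classes)

    def forward(self, matching_features):
        # matching_features: list of [batch, match_dim] tensors, one per feature type.
        batch = matching_features[0].size(0)
        memory = matching_features[0].new_zeros(batch, self.cell.hidden_size)
        for feat in matching_features:           # one inference turn per matching feature
            memory = self.cell(feat, memory)     # the turn's output becomes the new memory
        return self.classifier(memory)           # e.g. entailment / neutral / contradiction

model = MultiTurnInference(match_dim=32, mem_dim=64)
feats = [torch.randn(4, 32) for _ in range(3)]   # e.g. concat, difference, product matches
print(model(feats).shape)  # torch.Size([4, 3])
```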

Audio Source Separation via Multi-Scale Learning with Dilated Dense U-Nets

Title Audio Source Separation via Multi-Scale Learning with Dilated Dense U-Nets
Authors Vivek Sivaraman Narayanaswamy, Sameeksha Katoch, Jayaraman J. Thiagarajan, Huan Song, Andreas Spanias
Abstract Modern audio source separation techniques rely on optimizing sequence model architectures, such as 1D-CNNs, on mixture recordings to generalize well to unseen mixtures. In particular, recent work focuses on time-domain architectures such as Wave-U-Net, which exploit temporal context by extracting multi-scale features. However, the optimality of the feature extraction process in these architectures has not been well investigated. In this paper, we examine and recommend critical architectural changes that forge an optimal multi-scale feature extraction process. To this end, we replace regular 1D convolutions with adaptive dilated convolutions, which have an innate capability of capturing increased context through large temporal receptive fields. We also investigate the impact on the extraction process of dense connections, which encourage feature reuse and better gradient flow. The dense connections between the downsampling and upsampling paths of a U-Net architecture capture multi-resolution information, leading to improved temporal modelling (see the sketch after this entry). We evaluate the proposed approaches on the MUSDB test dataset. In addition to providing an improvement over the state of the art, we also provide insights into the impact of different architectural choices on complex data-driven solutions for source separation.
Tasks
Published 2019-04-08
URL http://arxiv.org/abs/1904.04161v1
PDF http://arxiv.org/pdf/1904.04161v1.pdf
PWC https://paperswithcode.com/paper/audio-source-separation-via-multi-scale
Repo
Framework
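
The two changes described above (dilated 1D convolutions plus dense connectivity) can be sketched as follows in PyTorch; the layer count, channel growth, and dilation schedule are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class DilatedDenseBlock1d(nn.Module):
    """Illustrative dense block of dilated 1D convolutions: each layer sees the
    concatenation of all previous feature maps, and the dilation doubles per layer."""
    def __init__(self, in_channels: int, growth: int = 16, n_layers: int = 4):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for i in range(n_layers):
            dilation = 2 ** i   # exponentially growing temporal receptive field
            self.layers.append(nn.Sequential(
                nn.Conv1d(channels, growth, kernel_size=3,
                          padding=dilation, dilation=dilation),
                nn.ReLU(inplace=True),
            ))
            channels += growth  # dense connectivity: outputs are concatenated
        self.out_channels = channels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)

block = DilatedDenseBlock1d(in_channels=8)
y = block(torch.randn(1, 8, 1024))   # [batch, channels, time] waveform features
print(y.shape)                       # torch.Size([1, 72, 1024])
```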

Situating Sentence Embedders with Nearest Neighbor Overlap

Title Situating Sentence Embedders with Nearest Neighbor Overlap
Authors Lucy H. Lin, Noah A. Smith
Abstract As distributed approaches to natural language semantics have developed and diversified, embedders for linguistic units larger than words have come to play an increasingly important role. To date, such embedders have been evaluated using benchmark tasks (e.g., GLUE) and linguistic probes. We propose a comparative approach, nearest neighbor overlap (N2O), that quantifies similarity between embedders in a task-agnostic manner. N2O requires only a collection of examples and is simple to understand: two embedders are more similar if, for the same set of inputs, there is greater overlap between the inputs’ nearest neighbors. Though applicable to embedders of texts of any size, we focus on sentence embedders and use N2O to show the effects of different design choices and architectures.
Tasks
Published 2019-09-24
URL https://arxiv.org/abs/1909.10724v1
PDF https://arxiv.org/pdf/1909.10724v1.pdf
PWC https://paperswithcode.com/paper/situating-sentence-embedders-with-nearest
Repo
Framework
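
N2O as described above is simple to compute; the sketch below shows a plausible pairwise version using exact k-nearest neighbors under cosine similarity. The value of k, the similarity measure, and the per-input averaging are assumptions for illustration rather than the paper's exact definition.

```python
import numpy as np

def knn_indices(embeddings: np.ndarray, k: int) -> np.ndarray:
    """Indices of each row's k nearest neighbors (cosine similarity), excluding itself."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    np.fill_diagonal(sims, -np.inf)            # a point is not its own neighbor
    return np.argsort(-sims, axis=1)[:, :k]

def n2o(emb_a: np.ndarray, emb_b: np.ndarray, k: int = 5) -> float:
    """Nearest-neighbor overlap: average fraction of shared neighbors per input."""
    nn_a, nn_b = knn_indices(emb_a, k), knn_indices(emb_b, k)
    overlaps = [len(set(a) & set(b)) / k for a, b in zip(nn_a, nn_b)]
    return float(np.mean(overlaps))

# Toy usage: two "embedders" represented by random sentence embeddings of the same corpus.
rng = np.random.default_rng(0)
emb_a, emb_b = rng.normal(size=(100, 64)), rng.normal(size=(100, 64))
print(n2o(emb_a, emb_b, k=5))
```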

Bilinear Graph Networks for Visual Question Answering

Title Bilinear Graph Networks for Visual Question Answering
Authors Dalu Guo, Chang Xu, Dacheng Tao
Abstract This paper revisits bilinear attention networks for the visual question answering task from a graph perspective. Classical bilinear attention networks build a bilinear attention map to extract the joint representation of words in the question and objects in the image, but they do not fully explore the relationships between words needed for complex reasoning. In contrast, we develop bilinear graph networks to model the context of the joint embeddings of words and objects. Two kinds of graphs are investigated, namely the image-graph and the question-graph. The image-graph transfers features of the detected objects to their related query words, enabling the output nodes to carry both semantic and factual information (see the sketch after this entry). The question-graph exchanges information between these output nodes from the image-graph to amplify implicit yet important relationships between objects. These two kinds of graphs cooperate with each other, so the resulting model captures the relationships and dependencies between objects, which enables multi-step reasoning. Experimental results on the VQA v2.0 validation dataset demonstrate the ability of our method to handle complex questions. On the test-std set, our best single model achieves state-of-the-art performance, boosting the overall accuracy to 72.41%.
Tasks Question Answering, Visual Question Answering
Published 2019-07-23
URL https://arxiv.org/abs/1907.09815v2
PDF https://arxiv.org/pdf/1907.09815v2.pdf
PWC https://paperswithcode.com/paper/graph-reasoning-networks-for-visual-question
Repo
Framework
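
One way to read the image-graph step is as attention-based message passing from detected objects to question words; the sketch below shows a minimal single-head version of that idea without the bilinear pooling, with all dimensions assumed for illustration.

```python
import torch
import torch.nn as nn

class ObjectToWordGraph(nn.Module):
    """Minimal attention-style message passing: each question word gathers
    features from image objects weighted by word-object affinity."""
    def __init__(self, word_dim: int, obj_dim: int, hidden: int = 128):
        super().__init__()
        self.q = nn.Linear(word_dim, hidden)
        self.k = nn.Linear(obj_dim, hidden)
        self.v = nn.Linear(obj_dim, word_dim)

    def forward(self, words: torch.Tensor, objects: torch.Tensor) -> torch.Tensor:
        # words: [batch, n_words, word_dim], objects: [batch, n_objs, obj_dim]
        attn = torch.softmax(self.q(words) @ self.k(objects).transpose(1, 2), dim=-1)
        messages = attn @ self.v(objects)        # object evidence routed to each word
        return words + messages                  # word nodes now carry visual information

layer = ObjectToWordGraph(word_dim=300, obj_dim=2048)
out = layer(torch.randn(2, 14, 300), torch.randn(2, 36, 2048))
print(out.shape)  # torch.Size([2, 14, 300])
```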

Robust Tensor Recovery using Low-Rank Tensor Ring

Title Robust Tensor Recovery using Low-Rank Tensor Ring
Authors Huyan Huang, Yipeng Liu, Ce Zhu
Abstract Robust tensor completion recovers the low-rank and sparse parts of a tensor from its partially observed entries. In this paper, we propose the robust tensor ring completion (RTRC) model and rigorously analyze its exact recovery guarantee via a TR-unfolding scheme; the result is consistent with that of the matrix case. We propose algorithms for tensor ring robust principal component analysis (TRRPCA) and RTRC using the alternating direction method of multipliers (ADMM) (see the sketch after this entry). Numerical experiments demonstrate that the proposed method outperforms state-of-the-art methods in terms of recovery accuracy.
Tasks
Published 2019-03-31
URL http://arxiv.org/abs/1904.00435v2
PDF http://arxiv.org/pdf/1904.00435v2.pdf
PWC https://paperswithcode.com/paper/robust-tensor-recovery-using-low-rank-tensor
Repo
Framework
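
The abstract notes that the recovery guarantee is consistent with the matrix case; as a simplified stand-in for the tensor-ring algorithm, the sketch below runs an ADMM-style robust decomposition of a matrix into low-rank plus sparse parts via singular-value and soft thresholding. The step size, threshold, and iteration count are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def soft_threshold(x, tau):
    """Elementwise shrinkage operator used for the sparse part."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def svt(x, tau):
    """Singular-value thresholding, the proximal operator of the nuclear norm."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return u @ np.diag(soft_threshold(s, tau)) @ vt

def robust_decompose(m, lam=None, mu=1.0, n_iters=100):
    """ADMM-style split of m into low-rank L and sparse S (matrix analogue of RTRC)."""
    lam = lam or 1.0 / np.sqrt(max(m.shape))
    L = np.zeros_like(m)
    S = np.zeros_like(m)
    Y = np.zeros_like(m)                   # scaled dual variable for m = L + S
    for _ in range(n_iters):
        L = svt(m - S + Y / mu, 1.0 / mu)
        S = soft_threshold(m - L + Y / mu, lam / mu)
        Y = Y + mu * (m - L - S)           # dual ascent on the constraint
    return L, S

rng = np.random.default_rng(0)
ground_truth = rng.normal(size=(50, 5)) @ rng.normal(size=(5, 50))   # rank-5 matrix
corrupted = ground_truth.copy()
corrupted[rng.random(corrupted.shape) < 0.05] += 10.0                # sparse outliers
L, S = robust_decompose(corrupted)
print(np.linalg.norm(L - ground_truth) / np.linalg.norm(ground_truth))
```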

Progressive Fusion for Unsupervised Binocular Depth Estimation using Cycled Networks

Title Progressive Fusion for Unsupervised Binocular Depth Estimation using Cycled Networks
Authors Andrea Pilzer, Stéphane Lathuilière, Dan Xu, Mihai Marian Puscas, Elisa Ricci, Nicu Sebe
Abstract Recent deep monocular depth estimation approaches based on supervised regression have achieved remarkable performance. However, they require costly ground-truth annotations during training. To cope with this issue, in this paper we present a novel unsupervised deep learning approach for predicting depth maps. We introduce a new network architecture, named Progressive Fusion Network (PFN), that is specifically designed for binocular stereo depth estimation. This network is based on a multi-scale refinement strategy that combines the information provided by both stereo views. In addition, we propose to stack this network twice in order to form a cycle. This cycle can be interpreted as a form of data augmentation since, at training time, the network learns both from the training-set images (in the forward half-cycle) and from the synthesized images (in the backward half-cycle); see the sketch after this entry. The architecture is jointly trained with adversarial learning. Extensive experiments on the publicly available KITTI, Cityscapes and ApolloScape datasets demonstrate the effectiveness of the proposed model, which is competitive with other unsupervised deep learning methods for depth prediction.
Tasks Data Augmentation, Depth Estimation, Monocular Depth Estimation, Stereo Depth Estimation
Published 2019-09-17
URL https://arxiv.org/abs/1909.07667v1
PDF https://arxiv.org/pdf/1909.07667v1.pdf
PWC https://paperswithcode.com/paper/progressive-fusion-for-unsupervised-binocular
Repo
Framework
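
The cycle described above (a forward half-cycle on real stereo pairs and a backward half-cycle on synthesized images) can be sketched abstractly as below; the toy convolutional networks and plain L1 losses stand in for the actual PFN and its adversarial terms, which are not reproduced here.

```python
import torch
import torch.nn as nn

# Toy stand-ins for the two stacked view-synthesis networks forming the cycle.
net_forward = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1))    # left -> synthesized right
net_backward = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1))   # right -> synthesized left

left, right = torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64)

# Forward half-cycle: learn from the real training pair.
synth_right = net_forward(left)
loss_forward = nn.functional.l1_loss(synth_right, right)

# Backward half-cycle: learn from the synthesized image, closing the cycle.
recon_left = net_backward(synth_right)
loss_backward = nn.functional.l1_loss(recon_left, left)

total = loss_forward + loss_backward    # adversarial terms omitted in this sketch
total.backward()
print(float(total))
```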

Challenges in Partially-Automated Roadway Feature Mapping Using Mobile Laser Scanning and Vehicle Trajectory Data

Title Challenges in Partially-Automated Roadway Feature Mapping Using Mobile Laser Scanning and Vehicle Trajectory Data
Authors Mohammad Billah, Farzana Rahman, Arash Maskooki, Michael Todd, Matthew Barth, Jay A. Farrell
Abstract Connected-vehicle and driver-assistance applications are greatly facilitated by Enhanced Digital Maps (EDMs) that represent roadway features (e.g., lane edges or centerlines, stop bars). Due to the large number of signalized intersections and miles of roadway, manual development of EDMs on a global basis is not feasible. Mobile Terrestrial Laser Scanning (MTLS) is the preferred data acquisition method to provide data for automated EDM development. Such systems provide an MTLS trajectory and a point cloud of the roadway environment. The challenge is to automatically convert these data into an EDM. This article presents a new processing and feature extraction method, an experimental demonstration providing SAE-J2735 map messages for eleven example intersections, and a discussion of the results that points out remaining challenges and suggests directions for future research.
Tasks
Published 2019-02-09
URL http://arxiv.org/abs/1902.03346v1
PDF http://arxiv.org/pdf/1902.03346v1.pdf
PWC https://paperswithcode.com/paper/challenges-in-partially-automated-roadway
Repo
Framework

FaceLiveNet+: A Holistic Networks For Face Authentication Based On Dynamic Multi-task Convolutional Neural Networks

Title FaceLiveNet+: A Holistic Networks For Face Authentication Based On Dynamic Multi-task Convolutional Neural Networks
Authors Zuheng Ming, Junshi Xia, Muhammad Muzzamil Luqman, Jean-Christophe Burie, Kaixing Zhao
Abstract This paper proposes a holistic multi-task Convolutional Neural Network (CNN) with dynamic task weights, namely FaceLiveNet+, for face authentication. FaceLiveNet+ can employ face verification and facial expression recognition simultaneously as a solution for liveness control. Compared to single-task learning, the proposed multi-task learning better captures the feature representations for all of the tasks. The experimental results show the superiority of multi-task learning over single-task learning for both the face verification and facial expression recognition tasks. Rather than using conventional multi-task learning with fixed task weights, this work proposes a so-called dynamic-weight unit to automatically learn the weights of the tasks (see the sketch after this entry). The experiments show the effectiveness of the dynamic weights for training the networks. Finally, the holistic evaluation of face authentication based on the proposed protocol shows the feasibility of applying FaceLiveNet+ to face authentication.
Tasks Face Verification, Facial Expression Recognition, Multi-Task Learning
Published 2019-02-28
URL http://arxiv.org/abs/1902.11179v1
PDF http://arxiv.org/pdf/1902.11179v1.pdf
PWC https://paperswithcode.com/paper/facelivenet-a-holistic-networks-for-face
Repo
Framework
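
The dynamic-weight unit is only named in the abstract; a plausible minimal realization is a set of learnable logits, normalized with a softmax and used to weight the per-task losses, as sketched below. This is an assumption for illustration, not the paper's design.

```python
import torch
import torch.nn as nn

class DynamicWeightUnit(nn.Module):
    """Hypothetical dynamic-weight unit: learnable logits turned into a softmax
    distribution over task losses, so task weights adapt during training."""
    def __init__(self, n_tasks: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_tasks))

    def forward(self, task_losses: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.logits, dim=0)
        return (weights * task_losses).sum()

unit = DynamicWeightUnit(n_tasks=2)
verification_loss = torch.tensor(0.8)    # stand-ins for the two task losses
expression_loss = torch.tensor(1.3)
total = unit(torch.stack([verification_loss, expression_loss]))
total.backward()                         # gradients flow into the weight logits
print(float(total), unit.logits.grad)
```

Note that a plain softmax weighting like this would, left unchecked, learn to downweight the harder task, so in practice such a unit needs an additional constraint or regularizer; the abstract does not describe the paper's exact mechanism.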

Unsupervised Domain Adaptation for Depth Prediction from Images

Title Unsupervised Domain Adaptation for Depth Prediction from Images
Authors Alessio Tonioni, Matteo Poggi, Stefano Mattoccia, Luigi Di Stefano
Abstract State-of-the-art approaches to infer dense depth measurements from images rely on CNNs trained end-to-end on a vast amount of data. However, these approaches suffer a drastic drop in accuracy when dealing with environments much different in appearance and/or context from those observed at training time. This domain shift issue is usually addressed by fine-tuning on smaller sets of images from the target domain annotated with depth labels. Unfortunately, relying on such supervised labeling is seldom feasible in most practical settings. Therefore, we propose an unsupervised domain adaptation technique that does not require ground-truth labels. Our method relies only on image pairs and leverages classical stereo algorithms to produce disparity measurements, along with confidence estimators to assess their reliability. We propose to fine-tune both depth-from-stereo and depth-from-mono architectures with a novel confidence-guided loss function that handles the measured disparities as noisy labels weighted according to the estimated confidence (see the sketch after this entry). Extensive experimental results based on standard datasets and evaluation protocols prove that our technique can effectively address the domain shift issue with both stereo and monocular depth prediction architectures and outperforms other state-of-the-art unsupervised loss functions that may alternatively be deployed to pursue domain adaptation.
Tasks Depth Estimation, Domain Adaptation, Unsupervised Domain Adaptation
Published 2019-09-09
URL https://arxiv.org/abs/1909.03943v1
PDF https://arxiv.org/pdf/1909.03943v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-domain-adaptation-for-depth
Repo
Framework
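
The confidence-guided loss described above is easy to sketch: a per-pixel regression error on the stereo-derived disparities, scaled by the confidence map so that unreliable labels contribute little. The L1 form and the confidence threshold below are assumptions, not the paper's exact loss.

```python
import torch

def confidence_guided_loss(pred_disp: torch.Tensor,
                           stereo_disp: torch.Tensor,
                           confidence: torch.Tensor,
                           min_conf: float = 0.1) -> torch.Tensor:
    """Treat stereo disparities as noisy labels: per-pixel L1 error weighted by the
    confidence estimator, ignoring pixels whose confidence falls below a threshold."""
    weight = torch.where(confidence >= min_conf, confidence,
                         torch.zeros_like(confidence))
    per_pixel = weight * (pred_disp - stereo_disp).abs()
    return per_pixel.sum() / weight.sum().clamp(min=1e-6)

# Toy usage: random maps standing in for network output, stereo labels, and confidence.
pred = torch.rand(1, 1, 64, 64, requires_grad=True)
labels = torch.rand(1, 1, 64, 64)
conf = torch.rand(1, 1, 64, 64)
loss = confidence_guided_loss(pred, labels, conf)
loss.backward()
print(float(loss))
```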

Polarimetric Thermal to Visible Face Verification via Attribute Preserved Synthesis

Title Polarimetric Thermal to Visible Face Verification via Attribute Preserved Synthesis
Authors Xing Di, He Zhang, Vishal M. Patel
Abstract Thermal to visible face verification is a challenging problem due to the large domain discrepancy between the modalities. Existing approaches either attempt to synthesize visible faces from thermal faces or extract robust features from these modalities for cross-modal matching. In this paper, we take a different approach in which we make use of the attributes extracted from the visible image to synthesize the attribute-preserved visible image from the input thermal image for cross-modal matching. A pre-trained VGG-Face network is used to extract the attributes from the visible image. Then, a novel Attribute Preserved Generative Adversarial Network (AP-GAN) is proposed to synthesize the visible image from the thermal image guided by the extracted attributes. Finally, a deep network is used to extract features from the synthesized image and the input visible image for verification. Extensive experiments on the ARL Polarimetric face dataset show that the proposed method achieves significant improvements over the state-of-the-art methods.
Tasks Face Verification
Published 2019-01-03
URL http://arxiv.org/abs/1901.00889v1
PDF http://arxiv.org/pdf/1901.00889v1.pdf
PWC https://paperswithcode.com/paper/polarimetric-thermal-to-visible-face
Repo
Framework

Hierarchical Deep Double Q-Routing

Title Hierarchical Deep Double Q-Routing
Authors Ramy E. Ali, Bilgehan Erman, Ejder Baştuğ, Bruce Cilli
Abstract This paper explores a deep reinforcement learning approach applied to the packet routing problem with high-dimensional constraints instigated by dynamic and autonomous communication networks. Our approach is motivated by the fact that centralized path calculation approaches are often not scalable, whereas distributed approaches with locally acting nodes are not fully aware of the end-to-end performance. We instead hierarchically distribute the path calculation over designated nodes in the network while taking into account the end-to-end performance. Specifically, we develop a hierarchical cluster-oriented adaptive per-flow path calculation mechanism by leveraging the Deep Double Q-network (DDQN) algorithm, where the end-to-end paths are calculated by the source nodes with the assistance of cluster (group) leaders at different hierarchical levels (see the sketch after this entry). In our approach, a deferred composite reward is designed to capture the end-to-end performance through a feedback signal from the source nodes to the group leaders, and to capture the local network performance through local resource assessments by the group leaders. This approach scales to large networks, adapts to dynamic demand, utilizes network resources efficiently, and can be applied to segment routing.
Tasks
Published 2019-10-09
URL https://arxiv.org/abs/1910.04041v3
PDF https://arxiv.org/pdf/1910.04041v3.pdf
PWC https://paperswithcode.com/paper/hierarchical-deep-double-q-routing
Repo
Framework
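
DDQN itself is standard; the sketch below shows the double Q-learning target computation that distinguishes it from vanilla DQN (the online network selects the next action, the target network evaluates it). The network sizes and the routing-specific state and action encodings are assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

state_dim, n_actions, gamma = 16, 8, 0.99   # e.g. next-hop choices at a node

online_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(online_net.state_dict())

def ddqn_target(reward, next_state, done):
    """Double DQN target: argmax with the online net, value from the target net."""
    with torch.no_grad():
        best_action = online_net(next_state).argmax(dim=1, keepdim=True)
        next_value = target_net(next_state).gather(1, best_action).squeeze(1)
        return reward + gamma * (1.0 - done) * next_value

# Toy batch of transitions.
batch = 32
state = torch.randn(batch, state_dim)
action = torch.randint(0, n_actions, (batch, 1))
reward = torch.randn(batch)                 # e.g. the deferred composite reward
next_state = torch.randn(batch, state_dim)
done = torch.zeros(batch)

q_pred = online_net(state).gather(1, action).squeeze(1)
loss = nn.functional.smooth_l1_loss(q_pred, ddqn_target(reward, next_state, done))
loss.backward()
print(float(loss))
```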