February 1, 2020

3429 words 17 mins read

Paper Group AWR 129



Acoustic Scene Classification by Implicitly Identifying Distinct Sound Events

Title Acoustic Scene Classification by Implicitly Identifying Distinct Sound Events
Authors Hongwei Song, Jiqing Han, Shiwen Deng, Zhihao Du
Abstract In this paper, we propose a new strategy for acoustic scene classification (ASC), namely recognizing acoustic scenes by identifying distinct sound events. This differs from existing strategies, which focus on characterizing either the global acoustical distribution of the audio or the temporal evolution of short-term audio features, without analysis down to the level of sound events. To identify distinct sound events for each scene, we formulate ASC in a multi-instance learning (MIL) framework, where each audio recording is mapped into a bag-of-instances representation. Here, instances can be seen as high-level representations of the sound events inside a scene. We also propose a MIL neural network model that implicitly identifies distinct instances (i.e., sound events). Furthermore, we propose two specially designed modules that model the multi-temporal-scale and multi-modal natures of the sound events, respectively. The experiments were conducted on the official development set of DCASE2018 Task 1 Subtask B, and our best-performing model improves over the official baseline by 9.4 percentage points (68.3% vs. 58.9%) in classification accuracy. This study indicates that recognizing acoustic scenes by identifying distinct sound events is effective and paves the way for future studies that combine this strategy with previous ones. (A minimal MIL-pooling sketch follows this entry.)
Tasks Acoustic Scene Classification, Scene Classification
Published 2019-04-10
URL http://arxiv.org/abs/1904.05204v2
PDF http://arxiv.org/pdf/1904.05204v2.pdf
PWC https://paperswithcode.com/paper/acoustic-scene-classification-by-implicitly
Repo https://github.com/hackerekcah/distinct-events-asc
Framework pytorch
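
The bag-of-instances idea above maps each recording to a set of instance embeddings and pools them into a scene prediction. Below is a minimal, illustrative attention-based MIL pooling layer in PyTorch; the layer names and dimensions are assumptions for this sketch, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class AttentionMILPooling(nn.Module):
    """Pool a bag of instance embeddings into one bag embedding.

    Hypothetical sketch: weights each instance (e.g., a sound-event
    representation) by a learned attention score, then sums.
    """
    def __init__(self, dim=128, hidden=64):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, bag):                          # bag: [n_instances, dim]
        w = torch.softmax(self.score(bag), dim=0)    # [n_instances, 1]
        return (w * bag).sum(dim=0)                  # [dim]

# Usage: pool 20 instance embeddings, then classify into 10 scenes.
pool = AttentionMILPooling()
scene_logits = nn.Linear(128, 10)(pool(torch.randn(20, 128)))
```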

A Remote Sensing Image Dataset for Cloud Removal

Title A Remote Sensing Image Dataset for Cloud Removal
Authors Daoyu Lin, Guangluan Xu, Xiaoke Wang, Yang Wang, Xian Sun, Kun Fu
Abstract Clouds are often present in optical remote sensing images, limiting the usability of the acquired data, so removing clouds is an indispensable pre-processing step in remote sensing image analysis. Deep learning has achieved great success in remote sensing in recent years, including scene classification and change detection, but it is rarely applied to cloud removal, mainly because of the lack of datasets for training neural networks. To address this problem, this paper proposes the Remote sensing Image Cloud rEmoving dataset (RICE). The proposed dataset consists of two parts: RICE1 contains 500 pairs of 512×512 images, each pair comprising a cloudy image and its cloud-free counterpart; RICE2 contains 450 sets of three 512×512 images: a cloud-free reference image, a cloudy image, and the corresponding cloud mask. The dataset is freely available at https://github.com/BUPTLdy/RICE_DATASET. (A hypothetical loader sketch follows this entry.)
Tasks Scene Classification
Published 2019-01-03
URL http://arxiv.org/abs/1901.00600v1
PDF http://arxiv.org/pdf/1901.00600v1.pdf
PWC https://paperswithcode.com/paper/a-remote-sensing-image-dataset-for-cloud
Repo https://github.com/BUPTLdy/RICE_DATASET
Framework none
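
A dataset like RICE1 pairs each cloudy image with a cloud-free reference. Below is a minimal PIL/NumPy loader sketch; the `cloud/` and `label/` directory names are assumptions, so check the repository's actual layout before use.

```python
from pathlib import Path
import numpy as np
from PIL import Image

def load_rice1_pairs(root):
    """Yield (cloudy, cloud_free) arrays from a RICE1-style directory.

    Assumed layout (hypothetical): root/cloud/*.png holds cloudy images
    and root/label/*.png holds cloud-free references with matching names.
    """
    root = Path(root)
    for cloudy_path in sorted((root / "cloud").glob("*.png")):
        clear_path = root / "label" / cloudy_path.name
        cloudy = np.asarray(Image.open(cloudy_path), dtype=np.float32) / 255.0
        clear = np.asarray(Image.open(clear_path), dtype=np.float32) / 255.0
        yield cloudy, clear  # each of shape (512, 512, 3)
```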

Learning Transferable Cooperative Behavior in Multi-Agent Teams

Title Learning Transferable Cooperative Behavior in Multi-Agent Teams
Authors Akshat Agarwal, Sumit Kumar, Katia Sycara
Abstract While multi-agent interactions can be naturally modeled as a graph, the environment has traditionally been considered a black box. We propose to create a shared agent-entity graph, where agents and environmental entities form vertices, and edges exist between vertices that can communicate with each other. Agents learn to cooperate by exchanging messages along the edges of this graph. Our proposed multi-agent reinforcement learning framework is invariant to the number of agents or entities in the system as well as to their ordering (permutation invariance), both of which are desirable properties for any multi-agent system representation. We present state-of-the-art results on coverage, formation, and line-control tasks for multi-agent teams in a fully decentralized framework, and further show that the learned policies quickly transfer to scenarios with different team sizes along with strong zero-shot generalization performance. This is an important step towards developing multi-agent teams that can be realistically deployed in the real world without assuming complete prior knowledge or instantaneous communication at unbounded distances. (A toy message-passing sketch follows this entry.)
Tasks Multi-agent Reinforcement Learning
Published 2019-06-04
URL https://arxiv.org/abs/1906.01202v1
PDF https://arxiv.org/pdf/1906.01202v1.pdf
PWC https://paperswithcode.com/paper/learning-transferable-cooperative-behavior-in
Repo https://github.com/sumitsk/marl_transfer
Framework pytorch
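
Permutation invariance in the shared agent-entity graph comes from aggregating neighbour messages with an order-independent operation. A toy PyTorch sketch of one round of such message passing is below; the two-layer MLP and mean aggregation are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class GraphMessageRound(nn.Module):
    """One round of message passing on an agent-entity graph.

    Mean-aggregating neighbour messages makes the update invariant to
    the number and ordering of neighbours (a permutation-invariant set op).
    """
    def __init__(self, dim=32):
        super().__init__()
        self.msg = nn.Linear(dim, dim)
        self.upd = nn.Linear(2 * dim, dim)

    def forward(self, h, adj):              # h: [n, dim], adj: [n, n] in {0, 1}
        m = torch.relu(self.msg(h))         # per-vertex outgoing message
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        agg = adj @ m / deg                 # mean over neighbours
        return torch.relu(self.upd(torch.cat([h, agg], dim=-1)))

h = torch.randn(5, 32)                      # e.g., 3 agents + 2 entities
adj = (torch.rand(5, 5) > 0.5).float()      # communication edges
h = GraphMessageRound()(h, adj)
```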

Learning to Identify High Betweenness Centrality Nodes from Scratch: A Novel Graph Neural Network Approach

Title Learning to Identify High Betweenness Centrality Nodes from Scratch: A Novel Graph Neural Network Approach
Authors Changjun Fan, Li Zeng, Yuhui Ding, Muhao Chen, Yizhou Sun, Zhong Liu
Abstract Betweenness centrality (BC) is one of the most widely used centrality measures in network analysis; it describes the importance of nodes in a network in terms of the fraction of shortest paths that pass through them. It is key to many valuable applications, including community detection and network dismantling. Computing BC scores on large networks is computationally challenging due to the high time complexity. Many approximation algorithms, mainly sampling-based, have been proposed to speed up the estimation of BC. However, these methods still incur considerable execution time on large-scale networks, and their results often degrade substantially when small changes are made to the network structure. In this paper, we focus on identifying nodes with high BC in a graph, since many application scenarios are built upon retrieving nodes with top-k BC. Different from previous heuristic methods, we turn this task into a learning problem and design an encoder-decoder framework to solve it. More specifically, the encoder leverages the network structure to encode each node into an embedding vector, which captures the node's important structural information. The decoder transforms the embedding vector of each node into a scalar, which captures the relative rank of the node in terms of BC. We use a pairwise ranking loss to train the model to identify the ordering of nodes with respect to their BC. By training on small-scale networks, the learned model is capable of assigning relative BC scores to nodes of any unseen network, and thus identifying the highly ranked nodes. Comprehensive experiments on both synthetic and real-world networks demonstrate that, compared to representative baselines, our model drastically speeds up the prediction without noticeable sacrifice in accuracy, and outperforms the state-of-the-art in accuracy on several large real-world networks. (A minimal ranking-loss sketch follows this entry.)
Tasks Community Detection
Published 2019-05-24
URL https://arxiv.org/abs/1905.10418v4
PDF https://arxiv.org/pdf/1905.10418v4.pdf
PWC https://paperswithcode.com/paper/learning-to-identify-high-betweenness
Repo https://github.com/FFrankyy/DrBC
Framework tf
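
The pairwise ranking idea trains the decoder so that the ordering of predicted scalars matches the ordering of true BC values. A minimal PyTorch sketch of such a loss is below; using binary cross-entropy on sigmoid score differences is one common formulation and is an assumption here, not necessarily the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def pairwise_ranking_loss(pred, true_bc, n_pairs=128):
    """Hypothetical pairwise ranking loss over sampled node pairs.

    pred, true_bc: [n] predicted scores and ground-truth BC values.
    Each sampled pair (i, j) is labelled 1 if true_bc[i] > true_bc[j].
    """
    n = pred.shape[0]
    i = torch.randint(n, (n_pairs,))
    j = torch.randint(n, (n_pairs,))
    label = (true_bc[i] > true_bc[j]).float()
    return F.binary_cross_entropy_with_logits(pred[i] - pred[j], label)

pred = torch.randn(100, requires_grad=True)  # stand-in for decoder output
loss = pairwise_ranking_loss(pred, torch.rand(100))
loss.backward()
```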

In Defense of the Triplet Loss Again: Learning Robust Person Re-Identification with Fast Approximated Triplet Loss and Label Distillation

Title In Defense of the Triplet Loss Again: Learning Robust Person Re-Identification with Fast Approximated Triplet Loss and Label Distillation
Authors Ye Yuan, Wuyang Chen, Yang Yang, Zhangyang Wang
Abstract Comparative losses (typically, the triplet loss) are appealing choices for learning person re-identification (ReID) features. However, the triplet loss is computationally much more expensive than the (practically more popular) classification loss, limiting its wider usage on massive datasets. Moreover, the abundance of label noise and outliers in ReID datasets may also put the margin-based loss in jeopardy. This work addresses these two shortcomings of the triplet loss, extending its effectiveness to large-scale ReID datasets with potentially noisy labels. We propose a fast-approximated triplet (FAT) loss, which provably converts the point-wise triplet loss into an upper-bound form consisting of a point-to-set loss term plus a cluster compactness regularization. It preserves the effectiveness of the triplet loss while having complexity linear in the training set size. A label distillation strategy is further designed to learn refined soft labels in place of the potentially noisy labels, from only an identified subset of confident examples, through teacher-student networks. We conduct extensive experiments on the three most popular ReID benchmarks (Market-1501, DukeMTMC-reID, and MSMT17), and demonstrate that the FAT loss with distilled labels leads to ReID features with remarkable accuracy, efficiency, robustness, and direct transferability to unseen datasets. (A rough point-to-set loss sketch follows this entry.)
Tasks Person Re-Identification
Published 2019-12-17
URL https://arxiv.org/abs/1912.07863v2
PDF https://arxiv.org/pdf/1912.07863v2.pdf
PWC https://paperswithcode.com/paper/in-defense-of-the-triplet-loss-again-learning
Repo https://github.com/TAMU-VITA/FAT
Framework pytorch
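
The point-to-set form replaces anchor-to-sample distances with anchor-to-class-centroid distances plus a compactness term, which is what makes the cost linear in the training set size. A rough PyTorch sketch under those assumptions (an illustration, not the authors' exact formulation):

```python
import torch

def fat_like_loss(feats, labels, centroids, margin=0.3, lam=0.1):
    """Rough point-to-set triplet sketch (hypothetical form).

    feats: [b, d] embeddings; labels: [b] class ids;
    centroids: [c, d] per-class centroids (e.g., running averages).
    """
    d = torch.cdist(feats, centroids)               # [b, c] anchor-to-set distances
    pos = d[torch.arange(len(labels)), labels]      # distance to own class centroid
    neg = d.clone()
    neg[torch.arange(len(labels)), labels] = float('inf')
    hardest_neg = neg.min(dim=1).values             # nearest wrong-class centroid
    triplet = torch.relu(pos - hardest_neg + margin).mean()
    compactness = pos.pow(2).mean()                 # cluster compactness regulariser
    return triplet + lam * compactness

feats = torch.randn(16, 64, requires_grad=True)
loss = fat_like_loss(feats, torch.randint(5, (16,)), torch.randn(5, 64))
loss.backward()
```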

Partially Shuffling the Training Data to Improve Language Models

Title Partially Shuffling the Training Data to Improve Language Models
Authors Ofir Press
Abstract Although SGD requires shuffling the training data between epochs, currently none of the word-level language modeling systems do this. Naively shuffling all sentences in the training data would not permit the model to learn inter-sentence dependencies. Here we present a method that partially shuffles the training data between epochs. This method makes each batch random while keeping most of the sentence ordering intact. It achieves new state-of-the-art results in word-level language modeling on both the Penn Treebank and WikiText-2 datasets. (A NumPy sketch of the idea follows this entry.)
Tasks Language Modelling, Sentence Ordering
Published 2019-03-11
URL http://arxiv.org/abs/1903.04167v2
PDF http://arxiv.org/pdf/1903.04167v2.pdf
PWC https://paperswithcode.com/paper/partially-shuffling-the-training-data-to-1
Repo https://github.com/ofirpress/PartialShuffle
Framework pytorch
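
In word-level language modeling the corpus is reshaped into `batch_size` contiguous rows, and naively shuffling sentences would break inter-sentence context. One sketch consistent with the description above: rotate each row by an independent random offset every epoch, so batches change while local ordering survives. The exact mechanics here are an assumption; see the repository for the author's implementation.

```python
import numpy as np

def partial_shuffle(data, rng):
    """Rotate each row of the batched corpus by a random offset.

    data: [batch_size, tokens_per_row] array of token ids.
    Each row keeps its internal order (so long-range dependencies
    survive), but the batches differ from epoch to epoch.
    """
    out = np.empty_like(data)
    for r, row in enumerate(data):
        shift = rng.integers(len(row))
        out[r] = np.roll(row, shift)
    return out

rng = np.random.default_rng(0)
corpus = np.arange(40).reshape(4, 10)   # 4 rows of 10 token ids
print(partial_shuffle(corpus, rng))
```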

Second-Order Semantic Dependency Parsing with End-to-End Neural Networks

Title Second-Order Semantic Dependency Parsing with End-to-End Neural Networks
Authors Xinyu Wang, Jingxian Huang, Kewei Tu
Abstract Semantic dependency parsing aims to identify semantic relationships between words in a sentence that form a graph. In this paper, we propose a second-order semantic dependency parser, which takes into consideration not only individual dependency edges but also interactions between pairs of edges. We show that second-order parsing can be approximated using mean field (MF) variational inference or loopy belief propagation (LBP). We can unfold both algorithms as recurrent layers of a neural network and therefore can train the parser in an end-to-end manner. Our experiments show that our approach achieves state-of-the-art performance. (A toy unrolled mean-field sketch follows this entry.)
Tasks Dependency Parsing, Semantic Dependency Parsing
Published 2019-06-19
URL https://arxiv.org/abs/1906.07880v2
PDF https://arxiv.org/pdf/1906.07880v2.pdf
PWC https://paperswithcode.com/paper/second-order-semantic-dependency-parsing-with
Repo https://github.com/wangxinyu0922/Second_Order_SDP
Framework tf
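
Mean-field inference for second-order parsing can be unrolled as a fixed number of differentiable updates on edge probabilities, which is what lets the parser train end to end. The toy sketch below keeps only a generic sibling-style interaction term and is a heavy simplification of the paper's formulation:

```python
import torch

def unrolled_mean_field(unary, sib, iters=3):
    """Toy unrolled mean-field over edge variables (simplified sketch).

    unary: [n, n] first-order scores for edges (head i, dependent j).
    sib:   [n, n, n] second-order scores for edge pairs (i->j, i->k).
    Returns approximate posterior edge probabilities q.
    """
    q = torch.sigmoid(unary)
    for _ in range(iters):
        # message to edge (i, j): expected second-order score over co-edges (i, k)
        msg = torch.einsum('ijk,ik->ij', sib, q)
        q = torch.sigmoid(unary + msg)
    return q

n = 6
q = unrolled_mean_field(torch.randn(n, n), 0.1 * torch.randn(n, n, n))
```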

ResUNet-a: a deep learning framework for semantic segmentation of remotely sensed data

Title ResUNet-a: a deep learning framework for semantic segmentation of remotely sensed data
Authors Foivos I. Diakogiannis, François Waldner, Peter Caccetta, Chen Wu
Abstract Scene understanding of high resolution aerial images is of great importance for the task of automated monitoring in various remote sensing applications. Due to the large within-class and small between-class variance in pixel values of objects of interest, this remains a challenging task. In recent years, deep convolutional neural networks have started being used in remote sensing applications and demonstrate state-of-the-art performance for pixel-level classification of objects. Here we propose a reliable framework for the semantic segmentation of monotemporal very high resolution aerial images. Our framework consists of a novel deep learning architecture, ResUNet-a, and a novel loss function based on the Dice loss. ResUNet-a uses a UNet encoder/decoder backbone in combination with residual connections, atrous convolutions, pyramid scene parsing pooling, and multi-task inference. ResUNet-a sequentially infers the boundary of the objects, the distance transform of the segmentation mask, the segmentation mask itself, and a colored reconstruction of the input. Each task is conditioned on the inference of the previous ones, establishing a conditioned relationship between the various tasks, as described by the architecture's computation graph. We analyse the performance of several flavours of the Generalized Dice loss for semantic segmentation, and we introduce a novel variant loss function that has excellent convergence properties and behaves well even in the presence of highly imbalanced classes. The performance of our modeling framework is evaluated on the ISPRS 2D Potsdam dataset. Results show state-of-the-art performance, with an average F1 score of 92.9% over all classes for our best model. (A generic Dice-loss sketch follows this entry.)
Tasks Scene Parsing, Scene Understanding, Semantic Segmentation
Published 2019-04-01
URL https://arxiv.org/abs/1904.00592v3
PDF https://arxiv.org/pdf/1904.00592v3.pdf
PWC https://paperswithcode.com/paper/resunet-a-a-deep-learning-framework-for
Repo https://github.com/mohuazheliu/ResUnet-a
Framework tf
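
The loss variants discussed above build on the Dice overlap between predicted and reference masks. A minimal soft Dice loss sketch in PyTorch is below; it is the generic formulation, not necessarily the exact variant the paper proposes:

```python
import torch

def soft_dice_loss(probs, target, eps=1e-6):
    """Generic soft Dice loss for binary segmentation.

    probs:  [b, h, w] predicted foreground probabilities.
    target: [b, h, w] binary ground-truth mask.
    """
    inter = (probs * target).sum(dim=(1, 2))
    denom = probs.sum(dim=(1, 2)) + target.sum(dim=(1, 2))
    dice = (2 * inter + eps) / (denom + eps)
    return 1 - dice.mean()

probs = torch.rand(2, 64, 64, requires_grad=True)
loss = soft_dice_loss(probs, (torch.rand(2, 64, 64) > 0.5).float())
loss.backward()
```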

PCAN: 3D Attention Map Learning Using Contextual Information for Point Cloud Based Retrieval

Title PCAN: 3D Attention Map Learning Using Contextual Information for Point Cloud Based Retrieval
Authors Wenxiao Zhang, Chunxia Xiao
Abstract Point cloud based retrieval for place recognition is an emerging problem in the vision field. The main challenge is finding an efficient way to encode local features into a discriminative global descriptor. In this paper, we propose a Point Contextual Attention Network (PCAN), which can predict the significance of each local point feature based on point context. Our network makes it possible to pay more attention to task-relevant features when aggregating local features. Experiments on various benchmark datasets show that the proposed network outperforms current state-of-the-art approaches. (A simplified attention-pooling sketch follows this entry.)
Tasks
Published 2019-04-22
URL http://arxiv.org/abs/1904.09793v1
PDF http://arxiv.org/pdf/1904.09793v1.pdf
PWC https://paperswithcode.com/paper/pcan-3d-attention-map-learning-using
Repo https://github.com/XLechter/PCAN
Framework tf
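
The core operation is predicting a per-point significance score and using it to weight local features during aggregation into the global descriptor. A simplified PyTorch sketch follows; per-point MLP scoring with softmax weighting is an assumption here, and it deliberately omits the NetVLAD aggregation and contextual sampling of the actual network.

```python
import torch
import torch.nn as nn

class PointAttentionPool(nn.Module):
    """Weight per-point features by predicted significance, then pool."""
    def __init__(self, dim=64):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, feats):                           # feats: [n_points, dim]
        attn = torch.softmax(self.score(feats), dim=0)  # per-point weight
        return (attn * feats).sum(dim=0)                # global descriptor [dim]

desc = PointAttentionPool()(torch.randn(4096, 64))
```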

AMI-Net+: A Novel Multi-Instance Neural Network for Medical Diagnosis from Incomplete and Imbalanced Data

Title AMI-Net+: A Novel Multi-Instance Neural Network for Medical Diagnosis from Incomplete and Imbalanced Data
Authors Zeyuan Wang, Josiah Poon, Simon Poon
Abstract In medical real-world studies (RWS), fully utilizing the fragmentary and scarce information available during model training to produce solid diagnoses is a challenging task. In this work, we introduce a novel multi-instance neural network, AMI-Net+, to train on and predict from incomplete and extremely imbalanced data. It is more effective than the state-of-the-art method, AMI-Net. As in AMI-Net, we implement embedding, multi-head attention, and gated attention-based multi-instance pooling to capture the relations of symptoms with one another and with the given disease. In addition, we propose various improvements to AMI-Net: the cross-entropy loss is replaced by the focal loss, and we propose a novel self-adaptive, instance-level multi-instance pooling method to obtain the bag representation. We validate the performance of AMI-Net+ on two real-world datasets from two different medical domains. Results show that our approach outperforms other baseline models by a considerable margin. (A standard focal-loss sketch follows this entry.)
Tasks Medical Diagnosis
Published 2019-07-03
URL https://arxiv.org/abs/1907.01734v1
PDF https://arxiv.org/pdf/1907.01734v1.pdf
PWC https://paperswithcode.com/paper/ami-net-a-novel-multi-instance-neural-network
Repo https://github.com/Zeyuan-Wang/AMI-Netv2
Framework tf
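
Focal loss down-weights easy examples so training focuses on the hard, minority-class ones, which suits the extreme class imbalance described above. A standard binary focal loss sketch in PyTorch (the generic formulation, with hyperparameters chosen arbitrarily):

```python
import torch

def binary_focal_loss(logits, target, gamma=2.0, alpha=0.25):
    """Standard binary focal loss: cross-entropy scaled by (1 - p_t)^gamma."""
    ce = torch.nn.functional.binary_cross_entropy_with_logits(
        logits, target, reduction='none')
    p = torch.sigmoid(logits)
    p_t = p * target + (1 - p) * (1 - target)             # prob of the true class
    alpha_t = alpha * target + (1 - alpha) * (1 - target)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

logits = torch.randn(8, requires_grad=True)
binary_focal_loss(logits, (torch.rand(8) > 0.9).float()).backward()
```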

ÚFAL MRPipe at MRP 2019: UDPipe Goes Semantic in the Meaning Representation Parsing Shared Task

Title ÚFAL MRPipe at MRP 2019: UDPipe Goes Semantic in the Meaning Representation Parsing Shared Task
Authors Milan Straka, Jana Straková
Abstract We present a system description of our contribution to the CoNLL 2019 shared task, Cross-Framework Meaning Representation Parsing (MRP 2019). The proposed architecture is our first attempt towards a semantic parsing extension of UDPipe 2.0, a lemmatization, POS tagging, and dependency parsing pipeline. For MRP 2019, which features five formally and linguistically different approaches to meaning representation (DM, PSD, EDS, UCCA, and AMR), we propose a uniform, language- and framework-agnostic graph-to-graph neural network architecture. Without any knowledge of the graph structure, and specifically without any linguistically or framework-motivated features, our system implicitly models the meaning representation graphs. After fixing a human error (we had used an earlier, incorrect version of the provided test set analyses), our submission would have scored third in the competition evaluation. The source code of our system is available at https://github.com/ufal/mrpipe-conll2019.
Tasks Dependency Parsing, Lemmatization, Semantic Parsing
Published 2019-10-24
URL https://arxiv.org/abs/1910.11295v1
PDF https://arxiv.org/pdf/1910.11295v1.pdf
PWC https://paperswithcode.com/paper/ufal-mrpipe-at-mrp-2019-udpipe-goes-semantic
Repo https://github.com/ufal/mrpipe-conll2019
Framework none

[Extended version] Rethinking Deep Neural Network Ownership Verification: Embedding Passports to Defeat Ambiguity Attacks

Title [Extended version] Rethinking Deep Neural Network Ownership Verification: Embedding Passports to Defeat Ambiguity Attacks
Authors Lixin Fan, Kam Woh Ng, Chee Seng Chan
Abstract With the substantial amounts of time, resources, and human (team) effort invested to explore and develop successful deep neural networks (DNN), there is an urgent need to protect these inventions from being illegally copied, redistributed, or abused without respecting the intellectual property of legitimate owners. Following recent progress along this line, we investigate a number of watermark-based DNN ownership verification methods in the face of ambiguity attacks, which aim to cast doubt on ownership verification by forging counterfeit watermarks. It is shown that ambiguity attacks pose serious threats to existing DNN watermarking methods. As a remedy to this loophole, this paper proposes novel passport-based DNN ownership verification schemes that are both robust to network modifications and resilient to ambiguity attacks. The gist of embedding digital passports is to design and train DNN models in such a way that the inference performance of the network on its original task deteriorates significantly when forged passports are presented. In other words, genuine passports are not only verified by looking for predefined signatures, but also reasserted by the unyielding inference performance of the DNN model. Extensive experimental results justify the effectiveness of the proposed passport-based DNN ownership verification schemes. Code and models are available at https://github.com/kamwoh/DeepIPR. (A toy passport-layer sketch follows this entry.)
Tasks
Published 2019-09-16
URL https://arxiv.org/abs/1909.07830v3
PDF https://arxiv.org/pdf/1909.07830v3.pdf
PWC https://paperswithcode.com/paper/rethinking-deep-neural-network-ownership
Repo https://github.com/kamwoh/DeepIPR
Framework pytorch
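
The passport idea ties a layer's affine parameters to a secret passport tensor, so inference quality collapses without the genuine passport. Below is a toy PyTorch sketch of a passport-conditioned linear layer; the exact way scale and bias are derived from the passport here is an assumption for illustration, not the paper's precise scheme.

```python
import torch
import torch.nn as nn

class PassportLinear(nn.Module):
    """Linear layer whose scale/bias are derived from a passport tensor.

    Toy illustration: gamma and beta come from projecting the passport
    through the layer's own weight, so a forged passport yields wrong
    affine parameters and degraded inference performance.
    """
    def __init__(self, d_in, d_out, passport):
        super().__init__()
        self.fc = nn.Linear(d_in, d_out, bias=False)
        self.register_buffer('passport', passport)     # secret, shape [2, d_in]

    def forward(self, x):
        gamma = self.fc(self.passport[0])               # scale from passport
        beta = self.fc(self.passport[1])                # bias from passport
        return gamma * self.fc(x) + beta

layer = PassportLinear(16, 8, passport=torch.randn(2, 16))
out = layer(torch.randn(4, 16))                         # [4, 8]
```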

HAL: Improved Text-Image Matching by Mitigating Visual Semantic Hubs

Title HAL: Improved Text-Image Matching by Mitigating Visual Semantic Hubs
Authors Fangyu Liu, Rongtian Ye, Xun Wang, Shuaipeng Li
Abstract The hubness problem widely exists in high-dimensional embedding spaces and is a fundamental source of error in cross-modal matching tasks. In this work, we study the emergence of hubs in Visual Semantic Embeddings (VSE) with application to text-image matching. We analyze the pros and cons of two widely adopted optimization objectives for training VSE and propose a novel hubness-aware loss function (HAL) that addresses the defects of previous methods. Unlike (Faghri et al. 2018), which simply takes the hardest sample within a mini-batch, HAL takes all samples into account, using both local and global statistics to scale up the weights of "hubs". We evaluate our method with various configurations of model architectures and datasets. The method exhibits exceptionally good robustness and brings consistent improvement on the task of text-image matching across all settings. Specifically, under the same model architectures as (Faghri et al. 2018) and (Lee et al. 2018), by switching only the learning objective, we report a maximum R@1 improvement of 7.4% on MS-COCO and 8.3% on Flickr30k. (An approximate hubness-aware loss sketch follows this entry.)
Tasks
Published 2019-11-22
URL https://arxiv.org/abs/1911.10097v1
PDF https://arxiv.org/pdf/1911.10097v1.pdf
PWC https://paperswithcode.com/paper/hal-improved-text-image-matching-by
Repo https://github.com/hardyqr/HAL
Framework tf
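
A hubness-aware objective weights all in-batch negatives rather than just the hardest one. The sketch below uses a log-sum-exp over all negative similarities, a common soft-weighting form in this line of work; treat it as an approximation in the spirit of HAL rather than the paper's exact loss.

```python
import torch

def hub_aware_loss(sim, margin=0.2, alpha=40.0):
    """Soft weighting over all in-batch negatives (HAL-style sketch).

    sim: [b, b] image-text similarity matrix; the diagonal holds positives.
    The log-sum-exp up-weights negatives that crowd many queries ("hubs").
    """
    b = sim.shape[0]
    pos = sim.diag().unsqueeze(1)                     # [b, 1] positive similarities
    mask = ~torch.eye(b, dtype=torch.bool)            # True off the diagonal
    viol = alpha * (sim - pos + margin)               # scaled margin violations
    viol = viol.masked_fill(~mask, float('-inf'))     # exclude the positives
    return torch.logsumexp(viol, dim=1).clamp(min=0).mean() / alpha

sim = torch.randn(32, 32, requires_grad=True)
hub_aware_loss(sim).backward()
```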

Top-view Trajectories: A Pedestrian Dataset of Vehicle-Crowd Interaction from Controlled Experiments and Crowded Campus

Title Top-view Trajectories: A Pedestrian Dataset of Vehicle-Crowd Interaction from Controlled Experiments and Crowded Campus
Authors Dongfang Yang, Linhui Li, Keith Redmill, Ümit Özgüner
Abstract Predicting the collective motion of a group of pedestrians (a crowd) under vehicle influence is essential for autonomous vehicles to handle mixed urban scenarios where interpersonal interaction and vehicle-crowd interaction (VCI) are significant. This usually requires a model that can describe individual pedestrian motion under the influence of nearby pedestrians and the vehicle. This study proposes two pedestrian trajectory datasets, the CITR dataset and the DUT dataset, so that pedestrian motion models can be further calibrated and verified, especially when vehicle influence on pedestrians plays an important role. The CITR dataset consists of experimentally designed fundamental VCI scenarios (front, back, and lateral VCIs) and provides a unique ID for each pedestrian, which is suitable for exploring a specific aspect of VCI. The DUT dataset gives two ordinary and natural VCI scenarios on a crowded university campus, which can be used for more general-purpose VCI exploration. The trajectories of pedestrians, as well as vehicles, were extracted by processing video frames from a down-facing camera mounted on a hovering drone. The final trajectories of pedestrians and vehicles were refined by Kalman filters with a linear point-mass model and a nonlinear bicycle model, respectively, in which the xy-velocity of pedestrians and the longitudinal speed and orientation of vehicles were estimated. The statistics of the velocity magnitude distribution demonstrate the validity of the proposed dataset. In total, there are approximately 340 pedestrian trajectories in the CITR dataset and 1793 pedestrian trajectories in the DUT dataset. The dataset is available on GitHub. (A minimal point-mass Kalman filter sketch follows this entry.)
Tasks Autonomous Vehicles
Published 2019-02-01
URL http://arxiv.org/abs/1902.00487v2
PDF http://arxiv.org/pdf/1902.00487v2.pdf
PWC https://paperswithcode.com/paper/top-view-trajectories-a-pedestrian-dataset-of
Repo https://github.com/dongfang-steven-yang/vci-dataset-dut
Framework none
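
A constant-velocity (point-mass) Kalman filter is the standard way to smooth pedestrian tracks like these and estimate xy-velocity from position measurements. A minimal NumPy sketch with illustrative noise settings (the actual filter parameters used for the dataset are not specified here):

```python
import numpy as np

def kalman_point_mass(zs, dt=0.1, q=0.5, r=1.0):
    """Constant-velocity Kalman filter over 2D position measurements.

    zs: [t, 2] noisy xy positions. Returns [t, 4] states (x, y, vx, vy).
    Process/measurement noise magnitudes q, r are illustrative guesses.
    """
    F = np.eye(4); F[0, 2] = F[1, 3] = dt         # state transition
    H = np.zeros((2, 4)); H[0, 0] = H[1, 1] = 1   # observe position only
    Q, R = q * np.eye(4), r * np.eye(2)
    x, P = np.array([*zs[0], 0.0, 0.0]), np.eye(4)
    out = []
    for z in zs:
        x, P = F @ x, F @ P @ F.T + Q                        # predict
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)         # Kalman gain
        x, P = x + K @ (z - H @ x), (np.eye(4) - K @ H) @ P  # update
        out.append(x)
    return np.array(out)

track = kalman_point_mass(np.cumsum(np.random.randn(50, 2), axis=0))
```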

DynWalks: Global Topology and Recent Changes Awareness Dynamic Network Embedding

Title DynWalks: Global Topology and Recent Changes Awareness Dynamic Network Embedding
Authors Chengbin Hou, Han Zhang, Ke Tang, Shan He
Abstract Learning topological representations of a network in dynamic environments has recently attracted considerable attention due to the time-evolving nature of many real-world networks, i.e., nodes and links may be added or removed as time goes on. Dynamic network embedding aims to learn low-dimensional embeddings for unseen and seen nodes using any currently available snapshots of a dynamic network. For seen nodes, existing methods either treat them as equally important or focus on the $k$ most affected nodes at each time step. However, the former solution is time-consuming, and the latter, which relies only on incoming changes, may lose the global topology, an important feature for downstream tasks. To address these challenges, we propose a dynamic network embedding method called DynWalks, which includes two key components: 1) an online network embedding framework that can dynamically and efficiently learn embeddings based on the selected nodes; 2) a novel online node-selecting scheme that offers flexible choices to balance global topology and recent changes, as well as to fulfill real-time constraints if needed. Empirical studies on six real-world dynamic networks, each sliced in three different ways, show that DynWalks significantly outperforms state-of-the-art methods on graph reconstruction tasks and obtains comparable results on link prediction tasks. Furthermore, the wall-clock time and complexity analysis demonstrate its excellent time and space efficiency. The source code of DynWalks is available at https://github.com/houchengbin/DynWalks. (A toy node-selection sketch follows this entry.)
Tasks Link Prediction, Network Embedding
Published 2019-07-27
URL https://arxiv.org/abs/1907.11968v1
PDF https://arxiv.org/pdf/1907.11968v1.pdf
PWC https://paperswithcode.com/paper/dynwalks-global-topology-and-recent-changes-1
Repo https://github.com/houchengbin/DynWalks
Framework none
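
The node-selecting scheme balances recent changes against global topology. One plausible reading, sketched below as a toy: take a capped number of nodes affected by the latest snapshot, then top up the budget with uniformly sampled nodes so global coverage is preserved. The budget split and sampling rule here are assumptions, not the paper's exact scheme.

```python
import random

def select_nodes(affected, all_nodes, budget=100, recent_frac=0.75):
    """Toy DynWalks-style node selection (illustrative only).

    affected:  nodes touched by the latest edge additions/removals.
    all_nodes: every node in the current snapshot.
    Mixes recently affected nodes with uniform global samples.
    """
    n_recent = min(len(affected), int(budget * recent_frac))
    recent = random.sample(affected, n_recent)
    rest = [v for v in all_nodes if v not in set(recent)]
    n_global = min(len(rest), budget - n_recent)
    return recent + random.sample(rest, n_global)

nodes = list(range(1000))
picked = select_nodes(affected=list(range(50)), all_nodes=nodes, budget=100)
```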