April 2, 2020

Paper Group ANR 251

Density-Aware Graph for Deep Semi-Supervised Visual Recognition

Title Density-Aware Graph for Deep Semi-Supervised Visual Recognition
Authors Suichan Li, Bin Liu, Dongdong Chen, Qi Chu, Lu Yuan, Nenghai Yu
Abstract Semi-supervised learning (SSL) has been extensively studied to improve the generalization ability of deep neural networks for visual recognition. To involve the unlabelled data, most existing SSL methods are based on the common density-based cluster assumption: samples lying in the same high-density region are likely to belong to the same class. This includes methods that perform consistency regularization or generate pseudo-labels for the unlabelled images. Despite their impressive performance, we argue three limitations exist: 1) Though the density information is demonstrated to be an important clue, they all use it in an implicit way and have not exploited it in depth. 2) For feature learning, they often learn the feature embedding based on the single data sample and ignore the neighborhood information. 3) For label-propagation based pseudo-label generation, it is often done offline and is difficult to train end-to-end with feature learning. Motivated by these limitations, this paper proposes to solve the SSL problem by building a novel density-aware graph, based on which the neighborhood information can be easily leveraged and the feature learning and label propagation can also be trained in an end-to-end way. Specifically, we first propose a new Density-aware Neighborhood Aggregation (DNA) module to learn more discriminative features by incorporating the neighborhood information in a density-aware manner. Then a novel Density-ascending Path based Label Propagation (DPLP) module is proposed to generate the pseudo-labels for unlabelled samples more efficiently according to the feature distribution characterized by density. Finally, the DNA module and DPLP module evolve and improve each other end-to-end.
Published 2020-03-30
URL https://arxiv.org/abs/2003.13194v1
PDF https://arxiv.org/pdf/2003.13194v1.pdf
PWC https://paperswithcode.com/paper/density-aware-graph-for-deep-semi-supervised

Shortest path distance approximation using deep learning techniques

Title Shortest path distance approximation using deep learning techniques
Authors Fatemeh Salehi Rizi, Joerg Schloetterer, Michael Granitzer
Abstract Computing shortest path distances between nodes lies at the heart of many graph algorithms and applications. Traditional exact methods such as breadth-first search (BFS) do not scale to today’s massive, rapidly evolving networks. Approximation methods are therefore required to enable scalable graph processing with a significant speedup. In this paper, we utilize vector embeddings learnt by deep learning techniques to approximate the shortest path distances in large graphs. We show that a feedforward neural network fed with embeddings can approximate distances with relatively low distortion error. The suggested method is evaluated on the Facebook, BlogCatalog, Youtube and Flickr social networks.
Published 2020-02-12
URL https://arxiv.org/abs/2002.05257v1
PDF https://arxiv.org/pdf/2002.05257v1.pdf
PWC https://paperswithcode.com/paper/shortest-path-distance-approximation-using
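
As a point of reference for the exact baseline named in the abstract, here is a minimal sketch of BFS distances on an unweighted graph, together with one possible way to score an approximation against them. The toy graph, the function names, and the choice of mean relative error as the error measure are all assumptions made for illustration; the abstract does not define its distortion error.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Return exact hop counts from `source` to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distances:  # first visit gives the shortest path
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


def mean_relative_error(exact, approx):
    """Average |approx - exact| / exact over nodes with a non-zero exact distance.

    This is an assumed error measure, not necessarily the paper's.
    """
    errors = [abs(approx[n] - d) / d for n, d in exact.items() if d > 0]
    return sum(errors) / len(errors)


# Hypothetical toy graph; real social networks are far larger.
toy_graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
exact = bfs_distances(toy_graph, 0)
```

One BFS costs time linear in the number of nodes and edges, so answering all-pairs queries this way requires one traversal per source node. That cost is what motivates a learned approximation.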

COVID-19 Screening on Chest X-ray Images Using Deep Learning based Anomaly Detection

Title COVID-19 Screening on Chest X-ray Images Using Deep Learning based Anomaly Detection
Authors Jianpeng Zhang, Yutong Xie, Yi Li, Chunhua Shen, Yong Xia
Abstract Coronaviruses are important human and animal pathogens. The novel COVID-19 coronavirus is rapidly spreading worldwide and threatening the health of billions of humans. Clinical studies have shown that most COVID-19 patients suffer from lung infection. Although chest CT has been shown to be an effective imaging technique for lung-related disease diagnosis, chest X-ray is more widely available due to its faster imaging time and considerably lower cost than CT. Deep learning, one of the most successful AI techniques, is an effective means to assist radiologists to analyze the vast amount of chest X-ray images, which can be critical for efficient and reliable COVID-19 screening. In this work, we aim to develop a new deep anomaly detection model for fast, reliable screening. To evaluate the model performance, we have collected 100 chest X-ray images of 70 patients confirmed with COVID-19 from the Github repository. To facilitate deep learning, more data are needed. Thus, we have also collected 1431 additional chest X-ray images confirmed as other pneumonia of 1008 patients from the public ChestX-ray14 dataset. Our initial experimental results show that the model developed here can reliably detect 96.00% COVID-19 cases (sensitivity being 96.00%) and 70.65% non-COVID-19 cases (specificity being 70.65%) when evaluated on 1531 X-ray images with two splits of the dataset.
Tasks Anomaly Detection
Published 2020-03-27
URL https://arxiv.org/abs/2003.12338v1
PDF https://arxiv.org/pdf/2003.12338v1.pdf
PWC https://paperswithcode.com/paper/covid-19-screening-on-chest-x-ray-images
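
The two rates quoted in the abstract follow the standard definitions of sensitivity and specificity. The sketch below shows the arithmetic. The confusion counts are illustrative values chosen so that the rates match the reported figures for 100 COVID-19 and 1431 other-pneumonia images; the abstract does not publish a confusion matrix, and its numbers come from two dataset splits.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of actual positive cases that the model flags."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Fraction of actual negative cases that the model clears."""
    return true_negatives / (true_negatives + false_positives)


# Illustrative counts only (assumed, not taken from the paper).
covid_detected, covid_missed = 96, 4          # 100 COVID-19 images
other_cleared, other_flagged = 1011, 420      # 1431 other-pneumonia images

print(f"sensitivity: {sensitivity(covid_detected, covid_missed):.2%}")
print(f"specificity: {specificity(other_cleared, other_flagged):.2%}")
```

Under these assumed counts, roughly 420 of the 1431 non-COVID images would be flagged for follow-up, which is the cost of keeping the miss rate on COVID-19 cases low.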

Grounded Situation Recognition

Title Grounded Situation Recognition
Authors Sarah Pratt, Mark Yatskar, Luca Weihs, Ali Farhadi, Aniruddha Kembhavi
Abstract We introduce Grounded Situation Recognition (GSR), a task that requires producing structured semantic summaries of images describing: the primary activity, entities engaged in the activity with their roles (e.g. agent, tool), and bounding-box groundings of entities. GSR presents important technical challenges: identifying semantic saliency, categorizing and localizing a large and diverse set of entities, overcoming semantic sparsity, and disambiguating roles. Moreover, unlike in captioning, GSR is straightforward to evaluate. To study this new task we create the Situations With Groundings (SWiG) dataset which adds 278,336 bounding-box groundings to the 11,538 entity classes in the imsitu dataset. We propose a Joint Situation Localizer and find that jointly predicting situations and groundings with end-to-end training handily outperforms independent training on the entire grounding metric suite with relative gains between 8% and 32%. Finally, we show initial findings on three exciting future directions enabled by our models: conditional querying, visual chaining, and grounded semantic aware image retrieval. Code and data available at https://prior.allenai.org/projects/gsr.
Tasks Image Retrieval
Published 2020-03-26
URL https://arxiv.org/abs/2003.12058v1
PDF https://arxiv.org/pdf/2003.12058v1.pdf
PWC https://paperswithcode.com/paper/grounded-situation-recognition

Bootstrapping Weakly Supervised Segmentation-free Word Spotting through HMM-based Alignment

Title Bootstrapping Weakly Supervised Segmentation-free Word Spotting through HMM-based Alignment
Authors Tomas Wilkinson, Carl Nettelblad
Abstract Recent work in word spotting in handwritten documents has yielded impressive results. This progress has largely been made by supervised learning systems, which are dependent on manually annotated data, making deployment to new collections a significant effort. In this paper, we propose an approach that utilises transcripts without bounding box annotations to train segmentation-free query-by-string word spotting models, given a partially trained model. This is done through a training-free alignment procedure based on hidden Markov models. This procedure creates a tentative mapping between word region proposals and the transcriptions to automatically create additional weakly annotated training data, without choosing any single alignment possibility as the correct one. When only using between 1% and 7% of the fully annotated training sets for partial convergence, we automatically annotate the remaining training data and successfully train using it. On all our datasets, our final trained model then comes within a few mAP% of the performance from a model trained with the full training set used as ground truth. We believe that this will be a significant advance towards a more general use of word spotting, since digital transcription data will already exist for parts of many collections of interest.
Tasks Word Spotting In Handwritten Documents
Published 2020-03-24
URL https://arxiv.org/abs/2003.11087v1
PDF https://arxiv.org/pdf/2003.11087v1.pdf
PWC https://paperswithcode.com/paper/bootstrapping-weakly-supervised-segmentation

Tensor Graph Convolutional Networks for Multi-relational and Robust Learning

Title Tensor Graph Convolutional Networks for Multi-relational and Robust Learning
Authors Vassilis N. Ioannidis, Antonio G. Marques, Georgios B. Giannakis
Abstract The era of “data deluge” has sparked renewed interest in graph-based learning methods and their widespread applications ranging from sociology and biology to transportation and communications. In this context of graph-aware methods, the present paper introduces a tensor-graph convolutional network (TGCN) for scalable semi-supervised learning (SSL) from data associated with a collection of graphs, that are represented by a tensor. Key aspects of the novel TGCN architecture are the dynamic adaptation to different relations in the tensor graph via learnable weights, and the consideration of graph-based regularizers to promote smoothness and alleviate over-parameterization. The ultimate goal is to design a powerful learning architecture able to: discover complex and highly nonlinear data associations, combine (and select) multiple types of relations, scale gracefully with the graph size, and remain robust to perturbations on the graph edges. The proposed architecture is relevant not only in applications where the nodes are naturally involved in different relations (e.g., a multi-relational graph capturing family, friendship and work relations in a social network), but also in robust learning setups where the graph entails a certain level of uncertainty, and the different tensor slabs correspond to different versions (realizations) of the nominal graph. Numerical tests showcase that the proposed architecture achieves markedly improved performance relative to standard GCNs, copes with state-of-the-art adversarial attacks, and leads to remarkable SSL performance over protein-to-protein interaction networks.
Published 2020-03-15
URL https://arxiv.org/abs/2003.07729v1
PDF https://arxiv.org/pdf/2003.07729v1.pdf
PWC https://paperswithcode.com/paper/tensor-graph-convolutional-networks-for-multi

Deep Learning on Knowledge Graph for Recommender System: A Survey

Title Deep Learning on Knowledge Graph for Recommender System: A Survey
Authors Yang Gao, Yi-Fan Li, Yu Lin, Hang Gao, Latifur Khan
Abstract Recent advances in research have demonstrated the effectiveness of knowledge graphs (KG) in providing valuable external knowledge to improve recommendation systems (RS). A knowledge graph is capable of encoding high-order relations that connect two objects with one or multiple related attributes. With the help of the emerging Graph Neural Networks (GNN), it is possible to extract both object characteristics and relations from KG, which is an essential factor for successful recommendations. In this paper, we provide a comprehensive survey of the GNN-based knowledge-aware deep recommender systems. Specifically, we discuss the state-of-the-art frameworks with a focus on their core component, i.e., the graph embedding module, and how they address practical recommendation issues such as scalability, cold-start and so on. We further summarize the commonly-used benchmark datasets, evaluation metrics as well as open-source codes. Finally, we conclude the survey and propose potential research directions in this rapidly growing field.
Tasks Graph Embedding, Knowledge Graphs, Recommendation Systems
Published 2020-03-25
URL https://arxiv.org/abs/2004.00387v1
PDF https://arxiv.org/pdf/2004.00387v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-on-knowledge-graph-for

Transferring Dense Pose to Proximal Animal Classes

Title Transferring Dense Pose to Proximal Animal Classes
Authors Artsiom Sanakoyeu, Vasil Khalidov, Maureen S. McCarthy, Andrea Vedaldi, Natalia Neverova
Abstract Recent contributions have demonstrated that it is possible to recognize the pose of humans densely and accurately given a large dataset of poses annotated in detail. In principle, the same approach could be extended to any animal class, but the effort required for collecting new annotations for each case makes this strategy impractical, despite important applications in natural conservation, science and business. We show that, at least for proximal animal classes such as chimpanzees, it is possible to transfer the knowledge existing in dense pose recognition for humans, as well as in more general object detectors and segmenters, to the problem of dense pose recognition in other classes. We do this by (1) establishing a DensePose model for the new animal which is also geometrically aligned to humans (2) introducing a multi-head R-CNN architecture that facilitates transfer of multiple recognition tasks between classes, (3) finding which combination of known classes can be transferred most effectively to the new animal and (4) using self-calibrated uncertainty heads to generate pseudo-labels graded by quality for training a model for this class. We also introduce two benchmark datasets labelled in the manner of DensePose for the class chimpanzee and use them to evaluate our approach, showing excellent transfer learning performance.
Tasks Transfer Learning
Published 2020-02-28
URL https://arxiv.org/abs/2003.00080v1
PDF https://arxiv.org/pdf/2003.00080v1.pdf
PWC https://paperswithcode.com/paper/transferring-dense-pose-to-proximal-animal

Efficient Backbone Search for Scene Text Recognition

Title Efficient Backbone Search for Scene Text Recognition
Authors Hui Zhang, Quanming Yao, Mingkun Yang, Yongchao Xu, Xiang Bai
Abstract Scene text recognition (STR) is very challenging due to the diversity of text instances and the complexity of scenes. The community has paid increasing attention to boost the performance by improving the pre-processing image module, like rectification and deblurring, or the sequence translator. However, another critical module, i.e., the feature sequence extractor, has not been extensively explored. In this work, inspired by the success of neural architecture search (NAS), which can identify better architectures than human-designed ones, we propose automated STR (AutoSTR) to search data-dependent backbones to boost text recognition performance. First, we design a domain-specific search space for STR, which contains both choices on operations and constraints on the downsampling path. Then, we propose a two-step search algorithm, which decouples operations and downsampling path, for an efficient search in the given space. Experiments demonstrate that, by searching data-dependent backbones, AutoSTR can outperform the state-of-the-art approaches on standard benchmarks with much fewer FLOPS and model parameters.
Tasks Deblurring, Neural Architecture Search, Scene Text Recognition
Published 2020-03-14
URL https://arxiv.org/abs/2003.06567v1
PDF https://arxiv.org/pdf/2003.06567v1.pdf
PWC https://paperswithcode.com/paper/efficient-backbone-search-for-scene-text

Non-stationary neural network for stock return prediction

Title Non-stationary neural network for stock return prediction
Authors Steven Y. K. Wong, Jennifer Chan, Lamiae Azizi, Richard Y. D. Xu
Abstract We consider the problem of neural network training in a time-varying context. Machine learning algorithms have excelled in problems that do not change over time. However, problems encountered in financial markets are often non-stationary. We propose the online early stopping algorithm and show that a neural network trained using this algorithm can track a function changing with unknown dynamics. We applied the proposed algorithm to the stock return prediction problem studied in Gu et al. (2019) and achieved mean rank correlation of 4.69%, almost twice as high as the expanding window approach. We also show that prominent factors, such as the size effect and momentum, exhibit time varying stock return predictiveness.
Published 2020-03-05
URL https://arxiv.org/abs/2003.02515v1
PDF https://arxiv.org/pdf/2003.02515v1.pdf
PWC https://paperswithcode.com/paper/non-stationary-neural-network-for-stock
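
The headline metric in this abstract is a mean rank correlation. A minimal sketch of how such a number can be computed follows, assuming it means the Spearman rank correlation between predicted and realized returns in each period, averaged over periods. That reading, and all function names, are assumptions; the abstract does not spell out the definition.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def rank_correlation(predicted, realized):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(predicted), average_ranks(realized))


def mean_rank_correlation(periods):
    """Average the per-period rank correlation over (predicted, realized) pairs."""
    scores = [rank_correlation(p, r) for p, r in periods]
    return sum(scores) / len(scores)
```

Because only the ordering of stocks matters, this score is insensitive to the scale of the predictions, which suits a setting where the goal is to sort stocks rather than forecast exact returns.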

Locally Private Distributed Reinforcement Learning

Title Locally Private Distributed Reinforcement Learning
Authors Hajime Ono, Tsubasa Takahashi
Abstract We study locally differentially private algorithms for reinforcement learning to obtain a robust policy that performs well across distributed private environments. Our algorithm protects the information of local agents’ models from being exploited by adversarial reverse engineering. Since a local policy is strongly affected by its individual environment, the output of the agent may unintentionally release private information. In our proposed algorithm, local agents update the model in their environments and report noisy gradients designed to satisfy local differential privacy (LDP) that gives a rigorous local privacy guarantee. By utilizing a set of reported noisy gradients, a central aggregator updates its model and delivers it to different local agents. In our empirical evaluation, we demonstrate how our method performs well under LDP. To the best of our knowledge, this is the first work that actualizes distributed reinforcement learning under LDP. This work enables us to obtain a robust agent that performs well across distributed private environments.
Published 2020-01-31
URL https://arxiv.org/abs/2001.11718v1
PDF https://arxiv.org/pdf/2001.11718v1.pdf
PWC https://paperswithcode.com/paper/locally-private-distributed-reinforcement
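
The report-then-aggregate loop described in the abstract can be sketched as follows. This is a generic illustration using L1 clipping and the Laplace mechanism, which is one standard way to obtain a local differential privacy guarantee. The abstract does not say which noise mechanism the authors use, so the mechanism, the clipping bound, and all names here are assumptions.

```python
import random


def clip_l1(gradient, bound):
    """Scale the gradient so its L1 norm is at most `bound`."""
    norm = sum(abs(g) for g in gradient)
    if norm <= bound:
        return list(gradient)
    return [g * bound / norm for g in gradient]


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_gradient(gradient, bound, epsilon, rng):
    """Local agent: clip, then add noise before anything leaves the device.

    Two clipped gradients differ by at most 2 * bound in L1 norm, so
    Laplace noise with scale 2 * bound / epsilon gives epsilon-LDP.
    """
    clipped = clip_l1(gradient, bound)
    scale = 2.0 * bound / epsilon
    return [g + laplace_noise(scale, rng) for g in clipped]


def aggregate(reports):
    """Central aggregator: average the noisy reports coordinate-wise."""
    count = len(reports)
    return [sum(column) / count for column in zip(*reports)]


rng = random.Random(0)
local_gradients = [[0.3, -0.1], [0.2, 0.4], [-0.5, 0.1]]  # hypothetical values
reports = [privatize_gradient(g, bound=1.0, epsilon=1.0, rng=rng) for g in local_gradients]
update = aggregate(reports)
```

The noise in each report has zero mean, so averaging over many agents reduces its effect on the aggregated update even though each individual report stays noisy.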

A Novel Incremental Clustering Technique with Concept Drift Detection

Title A Novel Incremental Clustering Technique with Concept Drift Detection
Authors Mitchell D. Woodbright, Md Anisur Rahman, Md Zahidul Islam
Abstract Data are being collected from various aspects of life. These data can often arrive in chunks/batches. Traditional static clustering algorithms are not suitable for dynamic datasets, i.e., when data arrive in streams of chunks/batches. If we apply a conventional clustering technique over the combined dataset, then every time a new batch of data comes, the process can be slow and wasteful. Moreover, it can be challenging to store the combined dataset in memory due to its ever-increasing size. As a result, various incremental clustering techniques have been proposed. These techniques need to efficiently update the current clustering result whenever a new batch arrives, to adapt the current clustering result/solution with the latest data. These techniques also need the ability to detect concept drifts when the clustering pattern of a new batch is significantly different from older batches. Sometimes, clustering patterns may drift temporarily in a single batch while the next batches do not exhibit the drift. Therefore, incremental clustering techniques need the ability to detect a temporary drift and sustained drift. In this paper, we propose an efficient incremental clustering algorithm called UIClust. It is designed to cluster streams of data chunks, even when there are temporary or sustained concept drifts. We evaluate the performance of UIClust by comparing it with a recently published, high-quality incremental clustering algorithm. We use real and synthetic datasets. We compare the results by using well-known clustering evaluation criteria: entropy, sum of squared errors (SSE), and execution time. Our results show that UIClust outperforms the existing technique in all our experiments.
Published 2020-03-30
URL https://arxiv.org/abs/2003.13225v1
PDF https://arxiv.org/pdf/2003.13225v1.pdf
PWC https://paperswithcode.com/paper/a-novel-incremental-clustering-technique-with
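
Two of the evaluation criteria named in the abstract, sum of squared errors (SSE) and entropy, can be computed as in the sketch below. It uses the common textbook definitions: SSE as squared distance to each cluster's centroid, and entropy as the size-weighted class entropy within each cluster. The paper may use a variant, and the function names are illustrative.

```python
import math
from collections import Counter, defaultdict


def sum_squared_errors(points, assignments):
    """Total squared Euclidean distance from each point to its cluster centroid."""
    clusters = defaultdict(list)
    for point, cluster in zip(points, assignments):
        clusters[cluster].append(point)
    total = 0.0
    for members in clusters.values():
        centroid = [sum(coords) / len(members) for coords in zip(*members)]
        for point in members:
            total += sum((x - c) ** 2 for x, c in zip(point, centroid))
    return total


def clustering_entropy(assignments, true_labels):
    """Size-weighted average of the class entropy (in bits) inside each cluster."""
    clusters = defaultdict(list)
    for cluster, label in zip(assignments, true_labels):
        clusters[cluster].append(label)
    total = 0.0
    for labels in clusters.values():
        size = len(labels)
        entropy = -sum((k / size) * math.log2(k / size) for k in Counter(labels).values())
        total += size / len(assignments) * entropy
    return total
```

Lower is better for both: a low SSE means compact clusters, and a low entropy means each cluster is dominated by a single ground-truth class.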

Audio-visual Recognition of Overlapped speech for the LRS2 dataset

Title Audio-visual Recognition of Overlapped speech for the LRS2 dataset
Authors Jianwei Yu, Shi-Xiong Zhang, Jian Wu, Shahram Ghorbani, Bo Wu, Shiyin Kang, Shansong Liu, Xunying Liu, Helen Meng, Dong Yu
Abstract Automatic recognition of overlapped speech remains a highly challenging task to date. Motivated by the bimodal nature of human speech perception, this paper investigates the use of audio-visual technologies for overlapped speech recognition. Three issues associated with the construction of audio-visual speech recognition (AVSR) systems are addressed. First, the basic architecture designs of AVSR systems, i.e. end-to-end and hybrid, are investigated. Second, purposefully designed modality fusion gates are used to robustly integrate the audio and visual features. Third, in contrast to a traditional pipelined architecture containing explicit speech separation and recognition components, a streamlined and integrated AVSR system optimized consistently using the lattice-free MMI (LF-MMI) discriminative criterion is also proposed. The proposed LF-MMI time-delay neural network (TDNN) system establishes the state-of-the-art for the LRS2 dataset. Experiments on overlapped speech simulated from the LRS2 dataset suggest the proposed AVSR system outperformed the audio only baseline LF-MMI DNN system by up to 29.98% absolute in word error rate (WER) reduction, and produced recognition performance comparable to a more complex pipelined system. Consistent performance improvements of 4.89% absolute in WER reduction over the baseline AVSR system using feature fusion are also obtained.
Tasks Audio-Visual Speech Recognition, Speech Recognition, Speech Separation, Visual Speech Recognition
Published 2020-01-06
URL https://arxiv.org/abs/2001.01656v1
PDF https://arxiv.org/pdf/2001.01656v1.pdf
PWC https://paperswithcode.com/paper/audio-visual-recognition-of-overlapped-speech
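
The results in this abstract are stated as absolute reductions in word error rate (WER). For readers unfamiliar with the metric, here is a minimal sketch of the usual definition: the word-level edit distance between reference and hypothesis, divided by the reference length. The function name and the example sentences are illustrative and are not drawn from LRS2.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref_words)


baseline = word_error_rate("the cat sat on the mat", "the cat sit on mat")
improved = word_error_rate("the cat sat on the mat", "the cat sat on mat")
absolute_reduction = baseline - improved
```

An absolute reduction is the plain difference between two WER values in percentage points, as opposed to a relative reduction, which would divide that difference by the baseline WER.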

On Initializing Airline Crew Pairing Optimization for Large-scale Complex Flight Networks

Title On Initializing Airline Crew Pairing Optimization for Large-scale Complex Flight Networks
Authors Divyam Aggarwal, Dhish Kumar Saxena, Thomas Bäck, Michael Emmerich
Abstract Crew pairing optimization (CPO) is critically important for any airline, since its crew operating costs are second-largest, next to the fuel-cost. CPO aims at generating a set of flight sequences (crew pairings) covering a flight-schedule, at minimum-cost, while satisfying several legality constraints. For large-scale complex flight networks, billion-plus legal pairings (variables) are possible, rendering their offline enumeration intractable and an exhaustive search for their minimum-cost full flight-coverage subset impractical. Even generating an initial feasible solution (IFS: a manageable set of legal pairings covering all flights), which could be subsequently optimized, is a difficult (NP-complete) problem. Although the authors have developed a crew pairing optimizer (AirCROP) as part of a larger project, this paper focuses on IFS-generation through a novel heuristic based on divide-and-cover strategy and Integer Programming. For real-world large and complex flight network datasets (including over 3200 flights and 15 crew bases) provided by GE Aviation, the proposed heuristic shows up to a ten-fold speed improvement over another state-of-the-art approach. Unprecedentedly, this paper presents an empirical investigation of the impact of IFS-cost on the final (optimized) solution-cost, revealing that too low an IFS-cost does not necessarily imply faster convergence for AirCROP or even lower cost for the optimized solution.
Published 2020-03-15
URL https://arxiv.org/abs/2003.06423v1
PDF https://arxiv.org/pdf/2003.06423v1.pdf
PWC https://paperswithcode.com/paper/on-initializing-airline-crew-pairing
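
The initial feasible solution described in the abstract is, at its core, a covering problem: pick pairings until every flight is covered. The sketch below uses the classic greedy set-cover rule to make that concrete. It is not the authors' divide-and-cover heuristic and ignores legality constraints entirely; the flight names, pairings, and costs are hypothetical.

```python
def greedy_cover(flights, pairings):
    """Pick pairings until all flights are covered.

    `pairings` maps a pairing name to (set of flights, cost). At each step
    the pairing with the lowest cost per newly covered flight is chosen.
    """
    uncovered = set(flights)
    chosen = []
    while uncovered:
        candidates = [name for name, (legs, _) in pairings.items() if legs & uncovered]
        if not candidates:
            raise ValueError(f"no pairing covers: {sorted(uncovered)}")
        best = min(
            candidates,
            key=lambda name: pairings[name][1] / len(pairings[name][0] & uncovered),
        )
        chosen.append(best)
        uncovered -= pairings[best][0]
    return chosen


# Hypothetical toy instance.
flights = ["F1", "F2", "F3", "F4"]
pairings = {
    "P1": ({"F1", "F2"}, 4),
    "P2": ({"F2", "F3"}, 3),
    "P3": ({"F3", "F4"}, 4),
    "P4": ({"F1", "F2", "F3", "F4"}, 12),
}
solution = greedy_cover(flights, pairings)
total_cost = sum(pairings[name][1] for name in solution)
```

On this toy instance the greedy choice is feasible but not cheapest: it selects three pairings at a total cost of 11, whereas P1 and P3 alone cover every flight for 8. That gap is why an initial feasible solution is handed to an optimizer afterwards.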

A Study of the Tasks and Models in Machine Reading Comprehension

Title A Study of the Tasks and Models in Machine Reading Comprehension
Authors Chao Wang
Abstract To provide a survey on the existing tasks and models in Machine Reading Comprehension (MRC), this report reviews: 1) the dataset collection and performance evaluation of some representative simple-reasoning and complex-reasoning MRC tasks; 2) the architecture designs, attention mechanisms, and performance-boosting approaches for developing neural-network-based MRC models; 3) some recently proposed transfer learning approaches to incorporating text-style knowledge contained in external corpora into the neural networks of MRC models; 4) some recently proposed knowledge base encoding approaches to incorporating graph-style knowledge contained in external knowledge bases into the neural networks of MRC models. In addition, based on what has been achieved and what is still deficient, this report proposes some open problems for future research.
Tasks Machine Reading Comprehension, Reading Comprehension, Transfer Learning
Published 2020-01-23
URL https://arxiv.org/abs/2001.08635v1
PDF https://arxiv.org/pdf/2001.08635v1.pdf
PWC https://paperswithcode.com/paper/a-study-of-the-tasks-and-models-in-machine