April 3, 2020

Paper Group AWR 7

ScanSSD: Scanning Single Shot Detector for Mathematical Formulas in PDF Document Images


Title	ScanSSD: Scanning Single Shot Detector for Mathematical Formulas in PDF Document Images
Authors	Parag Mali, Puneeth Kukkadapu, Mahshad Mahdavi, Richard Zanibbi
Abstract	We introduce the Scanning Single Shot Detector (ScanSSD) for locating math formulas offset from text and embedded in textlines. ScanSSD uses only visual features for detection: no formatting or typesetting information such as layout, font, or character labels is employed. Given a 600 dpi document page image, a Single Shot Detector (SSD) locates formulas at multiple scales using sliding windows, after which candidate detections are pooled to obtain page-level results. For our experiments we use the TFD-ICDAR2019v2 dataset, a modification of the GTDB scanned math article collection. ScanSSD detects characters in formulas with high accuracy, obtaining a 0.926 f-score, and detects formulas with high recall overall. Detection errors are largely minor, such as splitting formulas at large whitespace gaps (e.g., for variable constraints) and merging formulas on adjacent textlines. Formula detection f-scores of 0.796 (IOU $\geq 0.5$) and 0.733 (IOU $\geq 0.75$) are obtained. Our data, evaluation tools, and code are publicly available.
Tasks
Published	2020-03-18
URL	https://arxiv.org/abs/2003.08005v1
PDF	https://arxiv.org/pdf/2003.08005v1.pdf
PWC	https://paperswithcode.com/paper/scanssd-scanning-single-shot-detector-for
Repo	https://github.com/MaliParag/ScanSSD
Framework	pytorch
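
The f-scores in the ScanSSD abstract are reported at two IOU (intersection over union) thresholds. As a rough illustration of what those thresholds mean, here is a minimal sketch of IOU for axis-aligned boxes. The `(x_min, y_min, x_max, y_max)` box layout and the helper names `iou` and `is_match` are assumptions made for this example; they are not taken from the ScanSSD evaluation tools.

```python
from typing import Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(detected: Box, truth: Box, threshold: float = 0.5) -> bool:
    """Hypothetical helper: count a detection as correct at a given IOU threshold."""
    return iou(detected, truth) >= threshold


truth = (0.0, 0.0, 100.0, 20.0)
detected = (0.0, 0.0, 60.0, 20.0)  # covers only part of the formula
print(iou(detected, truth))
print(is_match(detected, truth, 0.5), is_match(detected, truth, 0.75))
```

A detection that covers only part of a formula can pass the looser threshold and fail the stricter one, which is consistent with the lower f-score reported at IOU $\geq 0.75$.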

Towards Discriminability and Diversity: Batch Nuclear-norm Maximization under Label Insufficient Situations


Title	Towards Discriminability and Diversity: Batch Nuclear-norm Maximization under Label Insufficient Situations
Authors	Shuhao Cui, Shuhui Wang, Junbao Zhuo, Liang Li, Qingming Huang, Qi Tian
Abstract	The learning of deep networks largely relies on data with human-annotated labels. In some label-insufficient situations, the performance degrades on the decision boundary with high data density. A common solution is to directly minimize the Shannon Entropy, but the side effect caused by entropy minimization, i.e., reduction of the prediction diversity, is mostly ignored. To address this issue, we reinvestigate the structure of the classification output matrix of a randomly selected data batch. We find by theoretical analysis that the prediction discriminability and diversity could be separately measured by the Frobenius-norm and rank of the batch output matrix. Besides, the nuclear-norm is an upper bound of the Frobenius-norm, and a convex approximation of the matrix rank. Accordingly, to improve both discriminability and diversity, we propose Batch Nuclear-norm Maximization (BNM) on the output matrix. BNM could boost the learning under typical label-insufficient learning scenarios, such as semi-supervised learning, domain adaptation and open domain recognition. On these tasks, extensive experimental results show that BNM outperforms competitors and works well with existing well-known methods. The code is available at https://github.com/cuishuhao/BNM.
Tasks	Domain Adaptation
Published	2020-03-27
URL	https://arxiv.org/abs/2003.12237v1
PDF	https://arxiv.org/pdf/2003.12237v1.pdf
PWC	https://paperswithcode.com/paper/towards-discriminability-and-diversity-batch
Repo	https://github.com/cuishuhao/BNM
Framework	pytorch
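
To make the Frobenius-norm versus nuclear-norm argument in the BNM abstract concrete, the sketch below compares the two norms on tiny batches of two-class predictions. It is restricted to two classes so that the singular values can be computed in closed form from the 2 x 2 Gram matrix with the standard library alone. The function names are hypothetical and this is not the code from the authors' repository.

```python
import math
from typing import Sequence


def frobenius_norm(batch: Sequence[Sequence[float]]) -> float:
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(p * p for row in batch for p in row))


def nuclear_norm_two_class(batch: Sequence[Sequence[float]]) -> float:
    """Sum of singular values of a B x 2 matrix A, via the Gram matrix A^T A."""
    a = sum(row[0] * row[0] for row in batch)
    b = sum(row[0] * row[1] for row in batch)
    d = sum(row[1] * row[1] for row in batch)
    # Eigenvalues of the symmetric matrix [[a, b], [b, d]].
    mean = (a + d) / 2.0
    gap = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    eigenvalues = (mean + gap, mean - gap)
    # Singular values of A are the square roots of these eigenvalues.
    return sum(math.sqrt(max(ev, 0.0)) for ev in eigenvalues)


diverse = [[1.0, 0.0], [0.0, 1.0]]    # confident, both classes predicted
collapsed = [[1.0, 0.0], [1.0, 0.0]]  # confident, one class only
unsure = [[0.5, 0.5], [0.5, 0.5]]     # maximally uncertain

for name, batch in [("diverse", diverse), ("collapsed", collapsed), ("unsure", unsure)]:
    print(name, frobenius_norm(batch), nuclear_norm_two_class(batch))
```

In this toy setting the Frobenius norm is the same for the diverse and collapsed batches, while the nuclear norm is largest for the batch that is both confident and diverse, which is the property the abstract attributes to maximizing it.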

Machine-Learning-Based Diagnostics of EEG Pathology


Title	Machine-Learning-Based Diagnostics of EEG Pathology
Authors	Lukas Alexander Wilhelm Gemein, Robin Tibor Schirrmeister, Patryk Chrabąszcz, Daniel Wilson, Joschka Boedecker, Andreas Schulze-Bonhage, Frank Hutter, Tonio Ball
Abstract	Machine learning (ML) methods have the potential to automate clinical EEG analysis. They can be categorized into feature-based (with handcrafted features), and end-to-end approaches (with learned features). Previous studies on EEG pathology decoding have typically analyzed a limited number of features, decoders, or both. For a I) more elaborate feature-based EEG analysis, and II) in-depth comparisons of both approaches, here we first develop a comprehensive feature-based framework, and then compare this framework to state-of-the-art end-to-end methods. To this aim, we apply the proposed feature-based framework and deep neural networks including an EEG-optimized temporal convolutional network (TCN) to the task of pathological versus non-pathological EEG classification. For a robust comparison, we chose the Temple University Hospital (TUH) Abnormal EEG Corpus (v2.0.0), which contains approximately 3000 EEG recordings. The results demonstrate that the proposed feature-based decoding framework can achieve accuracies on the same level as state-of-the-art deep neural networks. We find accuracies across both approaches in an astonishingly narrow range from 81–86%. Moreover, visualizations and analyses indicated that both approaches used similar aspects of the data, e.g., delta and theta band power at temporal electrode locations. We argue that the accuracies of current binary EEG pathology decoders could saturate near 90% due to the imperfect inter-rater agreement of the clinical labels, and that such decoders are already clinically useful, such as in areas where clinical EEG experts are rare. We make the proposed feature-based framework available open source and thus offer a new tool for EEG machine learning research.
Tasks	EEG
Published	2020-02-11
URL	https://arxiv.org/abs/2002.05115v1
PDF	https://arxiv.org/pdf/2002.05115v1.pdf
PWC	https://paperswithcode.com/paper/machine-learning-based-diagnostics-of-eeg
Repo	https://github.com/gemeinl/auto-eeg-diagnosis-comparison
Framework	none

MAST: A Memory-Augmented Self-supervised Tracker


Title	MAST: A Memory-Augmented Self-supervised Tracker
Authors	Zihang Lai, Erika Lu, Weidi Xie
Abstract	Recent interest in self-supervised dense tracking has yielded rapid progress, but performance still remains far from supervised methods. We propose a dense tracking model trained on videos without any annotations that surpasses previous self-supervised methods on existing benchmarks by a significant margin (+15%), and achieves performance comparable to supervised methods. In this paper, we first reassess the traditional choices used for self-supervised training and reconstruction loss by conducting thorough experiments that finally elucidate the optimal choices. Second, we further improve on existing methods by augmenting our architecture with a crucial memory component. Third, we benchmark on large-scale semi-supervised video object segmentation (a.k.a. dense tracking), and propose a new metric: generalizability. Our first two contributions yield a self-supervised network that for the first time is competitive with supervised methods on standard evaluation metrics of dense tracking. When measuring generalizability, we show self-supervised approaches are actually superior to the majority of supervised methods. We believe this new generalizability metric can better capture the real-world use-cases for dense tracking, and will spur new interest in this research direction.
Tasks	Semantic Segmentation, Semi-supervised Video Object Segmentation, Video Object Segmentation, Video Semantic Segmentation
Published	2020-02-18
URL	https://arxiv.org/abs/2002.07793v2
PDF	https://arxiv.org/pdf/2002.07793v2.pdf
PWC	https://paperswithcode.com/paper/mast-a-memory-augmented-self-supervised
Repo	https://github.com/zlai0/MAST
Framework	pytorch

Selecting time-series hyperparameters with the artificial jackknife


Title	Selecting time-series hyperparameters with the artificial jackknife
Authors	Filippo Pellegrino
Abstract	This article proposes a generalisation of the delete-$d$ jackknife to solve hyperparameter selection problems for time series. This novel technique is compatible with dependent data since it substitutes the jackknife removal step with a fictitious deletion, wherein observed datapoints are replaced with artificial missing values. In order to emphasise this point, I called this methodology artificial delete-$d$ jackknife. As an illustration, it is used to regulate vector autoregressions with an elastic-net penalty on the coefficients. A software implementation, ElasticNetVAR.jl, is available on GitHub.
Tasks	Time Series
Published	2020-02-11
URL	https://arxiv.org/abs/2002.04697v1
PDF	https://arxiv.org/pdf/2002.04697v1.pdf
PWC	https://paperswithcode.com/paper/selecting-time-series-hyperparameters-with
Repo	https://github.com/fipelle/ElasticNetVAR.jl
Framework	none
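
The "fictitious deletion" step described in the jackknife abstract can be sketched as follows: instead of dropping $d$ observations, each subsample keeps the full time index and marks $d$ positions as missing. Choosing the hidden positions uniformly at random, and the function name `artificial_delete_d`, are assumptions made for illustration; the paper and ElasticNetVAR.jl may select positions differently.

```python
import random
from typing import List, Optional, Sequence


def artificial_delete_d(
    series: Sequence[float], d: int, n_subsamples: int, seed: int = 0
) -> List[List[Optional[float]]]:
    """Return copies of `series` in which d observations are replaced by None.

    The series length and ordering are preserved, so the temporal
    dependence structure stays intact.
    """
    if not 0 < d < len(series):
        raise ValueError("d must satisfy 0 < d < len(series)")
    rng = random.Random(seed)
    subsamples = []
    for _ in range(n_subsamples):
        # Assumption: hidden positions are drawn uniformly without replacement.
        hidden = set(rng.sample(range(len(series)), d))
        subsamples.append(
            [None if t in hidden else x for t, x in enumerate(series)]
        )
    return subsamples


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
for sub in artificial_delete_d(series, d=2, n_subsamples=3):
    print(sub)
```

Each subsample would then be passed to an estimator that tolerates missing observations, and candidate hyperparameters compared across subsamples; that downstream step is not shown here.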

Multilogue-Net: A Context Aware RNN for Multi-modal Emotion Detection and Sentiment Analysis in Conversation


Title	Multilogue-Net: A Context Aware RNN for Multi-modal Emotion Detection and Sentiment Analysis in Conversation
Authors	Aman Shenoy, Ashish Sardana
Abstract	Sentiment Analysis and Emotion Detection in conversation are key in a number of real-world applications, with different applications leveraging different kinds of data to be able to achieve reasonably accurate predictions. Multimodal Emotion Detection and Sentiment Analysis can be particularly useful as applications will be able to use specific subsets of the available modalities, as per their available data, to be able to produce relevant predictions. Current systems dealing with Multimodal functionality fail to leverage and capture the context of the conversation through all modalities, the current speaker and listener(s) in the conversation, and the relevance and relationship between the available modalities through an adequate fusion mechanism. In this paper, we propose a recurrent neural network architecture that attempts to take into account all the mentioned drawbacks, and keeps track of the context of the conversation, interlocutor states, and the emotions conveyed by the speakers in the conversation. Our proposed model outperforms the state of the art on two benchmark datasets on a variety of accuracy and regression metrics.
Tasks	Emotion Recognition, Emotion Recognition in Context, Emotion Recognition in Conversation, Multimodal Emotion Recognition, Multimodal Sentiment Analysis, Sentiment Analysis
Published	2020-02-19
URL	https://arxiv.org/abs/2002.08267v2
PDF	https://arxiv.org/pdf/2002.08267v2.pdf
PWC	https://paperswithcode.com/paper/multilogue-net-a-context-aware-rnn-for-multi
Repo	https://github.com/amanshenoy/multilogue-net
Framework	pytorch

CycleISP: Real Image Restoration via Improved Data Synthesis


Title	CycleISP: Real Image Restoration via Improved Data Synthesis
Authors	Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao
Abstract	The availability of large-scale datasets has helped unleash the true potential of deep convolutional neural networks (CNNs). However, for the single-image denoising problem, capturing a real dataset is an unacceptably expensive and cumbersome procedure. Consequently, image denoising algorithms are mostly developed and evaluated on synthetic data that is usually generated with a widespread assumption of additive white Gaussian noise (AWGN). While the CNNs achieve impressive results on these synthetic datasets, they do not perform well when applied on real camera images, as reported in recent benchmark datasets. This is mainly because the AWGN is not adequate for modeling the real camera noise which is signal-dependent and heavily transformed by the camera imaging pipeline. In this paper, we present a framework that models the camera imaging pipeline in forward and reverse directions. It allows us to produce any number of realistic image pairs for denoising both in RAW and sRGB spaces. By training a new image denoising network on realistic synthetic data, we achieve the state-of-the-art performance on real camera benchmark datasets. Our model has ~5 times fewer parameters than the previous best method for RAW denoising. Furthermore, we demonstrate that the proposed framework generalizes beyond the image denoising problem, e.g., to color matching in stereoscopic cinema. The source code and pre-trained models are available at https://github.com/swz30/CycleISP.
Tasks	Denoising, Image Denoising, Image Restoration
Published	2020-03-17
URL	https://arxiv.org/abs/2003.07761v1
PDF	https://arxiv.org/pdf/2003.07761v1.pdf
PWC	https://paperswithcode.com/paper/cycleisp-real-image-restoration-via-improved
Repo	https://github.com/swz30/CycleISP
Framework	pytorch

DeeperForensics-1.0: A Large-Scale Dataset for Real-World Face Forgery Detection


Title	DeeperForensics-1.0: A Large-Scale Dataset for Real-World Face Forgery Detection
Authors	Liming Jiang, Wayne Wu, Ren Li, Chen Qian, Chen Change Loy
Abstract	In this paper, we present our on-going effort of constructing a large-scale benchmark, DeeperForensics-1.0, for face forgery detection. Our benchmark represents the largest face forgery detection dataset to date, with 60,000 videos constituted by a total of 17.6 million frames, 10 times larger than existing datasets of the same kind. Extensive real-world perturbations are applied to obtain a more challenging benchmark of larger scale and higher diversity. All source videos in DeeperForensics-1.0 are carefully collected, and fake videos are generated by a newly proposed end-to-end face swapping framework. The quality of generated videos outperforms those in existing datasets, validated by user studies. The benchmark features a hidden test set, which contains manipulated videos achieving high deceptive scores in human evaluations. We further contribute a comprehensive study that evaluates five representative detection baselines and makes a thorough analysis of different settings. We believe this dataset will contribute to real-world face forgery detection research.
Tasks	Face Swapping
Published	2020-01-09
URL	https://arxiv.org/abs/2001.03024v1
PDF	https://arxiv.org/pdf/2001.03024v1.pdf
PWC	https://paperswithcode.com/paper/deeperforensics-10-a-large-scale-dataset-for
Repo	https://github.com/EndlessSora/DeeperForensics-1.0
Framework	none

Differentially Private Set Union


Title	Differentially Private Set Union
Authors	Sivakanth Gopi, Pankaj Gulhane, Janardhan Kulkarni, Judy Hanwen Shen, Milad Shokouhi, Sergey Yekhanin
Abstract	We study the basic operation of set union in the global model of differential privacy. In this problem, we are given a universe $U$ of items, possibly of infinite size, and a database $D$ of users. Each user $i$ contributes a subset $W_i \subseteq U$ of items. We want an ($\epsilon$,$\delta$)-differentially private algorithm which outputs a subset $S \subset \cup_i W_i$ such that the size of $S$ is as large as possible. The problem arises in countless real world applications; it is particularly ubiquitous in natural language processing (NLP) applications as vocabulary extraction. For example, discovering words, sentences, $n$-grams etc., from private text data belonging to users is an instance of the set union problem. Known algorithms for this problem proceed by collecting a subset of items from each user, taking the union of such subsets, and disclosing the items whose noisy counts fall above a certain threshold. Crucially, in the above process, the contribution of each individual user is always independent of the items held by other users, resulting in a wasteful aggregation process, where some item counts happen to be way above the threshold. We deviate from the above paradigm by allowing users to contribute their items in a $\textit{dependent fashion}$, guided by a $\textit{policy}$. In this new setting, ensuring privacy is considerably more delicate. We prove that any policy which has certain $\textit{contractive}$ properties would result in a differentially private algorithm. We design two new algorithms, one using Laplace noise and the other Gaussian noise, as specific instances of policies satisfying the contractive properties. Our experiments show that the new algorithms significantly outperform previously known mechanisms for the problem.
Tasks
Published	2020-02-22
URL	https://arxiv.org/abs/2002.09745v1
PDF	https://arxiv.org/pdf/2002.09745v1.pdf
PWC	https://paperswithcode.com/paper/differentially-private-set-union
Repo	https://github.com/heyyjudes/differentially-private-set-union
Framework	none
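
The baseline pattern that the set-union abstract attributes to known algorithms (each user contributes a capped subset, counts are summed, and items with noisy counts above a threshold are released) can be sketched as below. This is only the baseline, not the paper's policy-based algorithms. The parameters `max_items_per_user`, `noise_scale` and `threshold` are hypothetical free inputs here; they are not calibrated to any ($\epsilon$,$\delta$) guarantee, so the sketch should not be treated as a private mechanism as written.

```python
import random
from collections import Counter
from typing import Iterable, Set


def laplace(rng: random.Random, scale: float) -> float:
    """Laplace(0, scale) sample: the difference of two i.i.d. exponentials."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def thresholded_noisy_union(
    user_items: Iterable[Iterable[str]],
    max_items_per_user: int,
    noise_scale: float,
    threshold: float,
    seed: int = 0,
) -> Set[str]:
    rng = random.Random(seed)
    counts: Counter = Counter()
    for items in user_items:
        unique = sorted(set(items))
        # Each user contributes independently of what other users hold.
        k = min(max_items_per_user, len(unique))
        counts.update(rng.sample(unique, k))
    # Release only items whose noisy count clears the threshold.
    return {
        item
        for item, count in sorted(counts.items())
        if count + laplace(rng, noise_scale) > threshold
    }


users = [["a", "b"], ["a", "c"], ["a", "b"]]
print(thresholded_noisy_union(users, max_items_per_user=2, noise_scale=0.5, threshold=1.5))
```

In this toy input, item "a" is contributed by every user, so its count sits well above the threshold. That surplus is the "wasteful aggregation" the abstract refers to, which the paper's dependent contribution policies aim to reduce.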

On Translation Invariance in CNNs: Convolutional Layers can Exploit Absolute Spatial Location


Title	On Translation Invariance in CNNs: Convolutional Layers can Exploit Absolute Spatial Location
Authors	Osman Semih Kayhan, Jan C. van Gemert
Abstract	In this paper we challenge the common assumption that convolutional layers in modern CNNs are translation invariant. We show that CNNs can and will exploit the absolute spatial location by learning filters that respond exclusively to particular absolute locations by exploiting image boundary effects. Because modern CNN filters have a huge receptive field, these boundary effects operate even far from the image boundary, allowing the network to exploit absolute spatial location all over the image. We give a simple solution to remove spatial location encoding which improves translation invariance and thus gives a stronger visual inductive bias which particularly benefits small data sets. We broadly demonstrate these benefits on several architectures and various applications such as image classification, patch matching, and two video classification datasets.
Tasks	Image Classification, Video Classification
Published	2020-03-16
URL	https://arxiv.org/abs/2003.07064v1
PDF	https://arxiv.org/pdf/2003.07064v1.pdf
PWC	https://paperswithcode.com/paper/on-translation-invariance-in-cnns
Repo	https://github.com/oskyhn/CNNs-Without-Borders
Framework	none
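
The boundary effect described in the translation-invariance abstract can be seen in one dimension. In the sketch below, a small edge filter applied to a perfectly constant signal with zero padding responds only at the two ends, so its output encodes where the boundary is. Circular padding is included purely as a contrast and is an assumption of this example; the abstract does not say what remedy the paper uses.

```python
from typing import List, Sequence


def correlate_same(
    signal: Sequence[float], kernel: Sequence[float], padding: str
) -> List[float]:
    """Same-size 1D cross-correlation with "zero" or "circular" padding.

    Assumes an odd-length kernel so that it can be centred on each sample.
    """
    if padding not in ("zero", "circular"):
        raise ValueError("padding must be 'zero' or 'circular'")
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        total = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if padding == "circular":
                total += w * signal[idx % n]
            elif 0 <= idx < n:
                # Zero padding: taps that fall outside the signal add nothing.
                total += w * signal[idx]
        out.append(total)
    return out


constant = [1.0] * 5
edge_filter = [-1.0, 0.0, 1.0]
print(correlate_same(constant, edge_filter, "zero"))      # non-zero only at the ends
print(correlate_same(constant, edge_filter, "circular"))  # zero everywhere
```

With stacked layers and large receptive fields, the same kind of response can propagate inward from the border, which is the mechanism the abstract describes.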

AMIL: Adversarial Multi Instance Learning for Human Pose Estimation


Title	AMIL: Adversarial Multi Instance Learning for Human Pose Estimation
Authors	Pourya Shamsolmoali, Masoumeh Zareapoor, Huiyu Zhou, Jie Yang
Abstract	Human pose estimation has an important impact on a wide range of applications from human-computer interface to surveillance and content-based video retrieval. For human pose estimation, joint obstructions and overlapping upon human bodies result in departed pose estimation. To address these problems, by integrating priors of the structure of human bodies, we present a novel structure-aware network to discreetly consider such priors during the training of the network. Typically, learning such constraints is a challenging task. Instead, we propose generative adversarial networks as our learning model in which we design two residual multiple instance learning (MIL) models with identical architectures: one is used as the generator and the other as the discriminator. The discriminator's task is to distinguish the actual poses from the fake ones. If the pose generator generates results that the discriminator is not able to distinguish from the real ones, the model has successfully learnt the priors. In the proposed model, the discriminator differentiates the ground-truth heatmaps from the generated ones, and later the adversarial loss back-propagates to the generator. This procedure helps the generator learn reasonable body configurations and proves advantageous for improving the pose estimation accuracy. Meanwhile, we propose a novel function for MIL. It is an adjustable structure for both instance selection and modeling to appropriately pass the information between instances in a single bag. In the proposed residual MIL neural network, the pooling action adequately updates the instance contribution to its bag. The proposed adversarial residual multi-instance neural network that is based on pooling has been validated on two datasets for the human pose estimation task and successfully outperforms the other state-of-the-art models.
Tasks	Multiple Instance Learning, Pose Estimation, Video Retrieval
Published	2020-03-18
URL	https://arxiv.org/abs/2003.08002v1
PDF	https://arxiv.org/pdf/2003.08002v1.pdf
PWC	https://paperswithcode.com/paper/amil-adversarial-multi-instance-learning-for
Repo	https://github.com/pshams55/AMIL
Framework	pytorch

Adapting Object Detectors with Conditional Domain Normalization


Title	Adapting Object Detectors with Conditional Domain Normalization
Authors	Peng Su, Kun Wang, Xingyu Zeng, Shixiang Tang, Dapeng Chen, Di Qiu, Xiaogang Wang
Abstract	Real-world object detectors are often challenged by the domain gaps between different datasets. In this work, we present the Conditional Domain Normalization (CDN) to bridge the domain gap. CDN is designed to encode different domain inputs into a shared latent space, where the features from different domains carry the same domain attribute. To achieve this, we first disentangle the domain-specific attribute out of the semantic features from one domain via a domain embedding module, which learns a domain-vector to characterize the corresponding domain attribute information. Then this domain-vector is used to encode the features from another domain through a conditional normalization, resulting in different domains’ features carrying the same domain attribute. We incorporate CDN into various convolution stages of an object detector to adaptively address the domain shifts of representations at different levels. In contrast to existing adaptation works that conduct domain confusion learning on semantic features to remove domain-specific factors, CDN aligns different domain distributions by modulating the semantic features of one domain conditioned on the learned domain-vector of another domain. Extensive experiments show that CDN outperforms existing methods remarkably on both real-to-real and synthetic-to-real adaptation benchmarks, including 2D image detection and 3D point cloud detection.
Tasks	3D Object Detection, Object Detection, Unsupervised Domain Adaptation
Published	2020-03-16
URL	https://arxiv.org/abs/2003.07071v1
PDF	https://arxiv.org/pdf/2003.07071v1.pdf
PWC	https://paperswithcode.com/paper/adapting-object-detectors-with-conditional
Repo	https://github.com/psu1/CDN
Framework	none

Key Points Estimation and Point Instance Segmentation Approach for Lane Detection


Title	Key Points Estimation and Point Instance Segmentation Approach for Lane Detection
Authors	Yeongmin Ko, Jiwon Jun, Donghwuy Ko, Moongu Jeon
Abstract	State-of-the-art lane detection methods achieve successful performance. Despite their advantages, these methods have critical deficiencies such as a limited number of detectable lanes and a high false-positive rate. In particular, a high false-positive rate can cause wrong and dangerous control. In this paper, we propose a novel deep-learning lane detection method for an arbitrary number of lanes, which produces fewer false positives than other recent lane detection methods. The architecture of the proposed method has shared feature extraction layers and several branches for detection and embedding to cluster lanes. The proposed method can generate exact points on the lanes, and we cast a clustering problem for the generated points as a point cloud instance segmentation problem. The proposed method is more compact because it generates fewer points than the original image pixel size. Our proposed post-processing method eliminates outliers successfully and increases the performance notably. The whole proposed framework achieves competitive results on the tuSimple dataset.
Tasks	Instance Segmentation, Lane Detection
Published	2020-02-16
URL	https://arxiv.org/abs/2002.06604v2
PDF	https://arxiv.org/pdf/2002.06604v2.pdf
PWC	https://paperswithcode.com/paper/key-points-estimation-and-point-instance
Repo	https://github.com/koyeongmin/PINet
Framework	pytorch

MUXConv: Information Multiplexing in Convolutional Neural Networks


Title	MUXConv: Information Multiplexing in Convolutional Neural Networks
Authors	Zhichao Lu, Kalyanmoy Deb, Vishnu Naresh Boddeti
Abstract	Convolutional neural networks have witnessed remarkable improvements in computational efficiency in recent years. A key driving force has been the idea of trading-off model expressivity and efficiency through a combination of $1\times 1$ and depth-wise separable convolutions in lieu of a standard convolutional layer. The price of the efficiency, however, is the sub-optimal flow of information across space and channels in the network. To overcome this limitation, we present MUXConv, a layer that is designed to increase the flow of information by progressively multiplexing channel and spatial information in the network, while mitigating computational complexity. Furthermore, to demonstrate the effectiveness of MUXConv, we integrate it within an efficient multi-objective evolutionary algorithm to search for the optimal model hyper-parameters while simultaneously optimizing accuracy, compactness, and computational efficiency. On ImageNet, the resulting models, dubbed MUXNets, match the performance (75.3% top-1 accuracy) and multiply-add operations (218M) of MobileNetV3 while being 1.6$\times$ more compact, and outperform other mobile models in all the three criteria. MUXNet also performs well under transfer learning and when adapted to object detection. On the ChestX-Ray 14 benchmark, its accuracy is comparable to the state-of-the-art while being $3.3\times$ more compact and $14\times$ more efficient. Similarly, detection on PASCAL VOC 2007 is 1.2% more accurate, 28% faster and 6% more compact compared to MobileNetV2. Code is available from https://github.com/human-analysis/MUXConv
Tasks	Object Detection, Transfer Learning
Published	2020-03-31
URL	https://arxiv.org/abs/2003.13880v1
PDF	https://arxiv.org/pdf/2003.13880v1.pdf
PWC	https://paperswithcode.com/paper/muxconv-information-multiplexing-in
Repo	https://github.com/human-analysis/MUXConv
Framework	pytorch
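
The efficiency trade-off mentioned at the start of the MUXConv abstract (a depth-wise convolution followed by a $1\times 1$ convolution in place of a standard convolutional layer) can be illustrated with a weight count. The sketch uses the standard parameter-count formulas with biases ignored; it illustrates the general trade-off only and does not model the MUXConv layer itself. The layer sizes in the example are arbitrary.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k depth-wise convolution plus a 1 x 1 point-wise convolution."""
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


k, c_in, c_out = 3, 64, 128  # arbitrary example sizes
standard = standard_conv_params(k, c_in, c_out)
separable = separable_conv_params(k, c_in, c_out)
print(standard, separable, round(standard / separable, 2))
```

The depth-wise step never mixes channels and the point-wise step never looks at spatial neighbours, which gives one reading of the "sub-optimal flow of information across space and channels" that the abstract sets out to address.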

CLUECorpus2020: A Large-scale Chinese Corpus for Pre-training Language Model


Title	CLUECorpus2020: A Large-scale Chinese Corpus for Pre-training Language Model
Authors	Liang Xu, Xuanwei Zhang, Qianqian Dong
Abstract	In this paper, we introduce the Chinese corpus from the CLUE organization, CLUECorpus2020, a large-scale corpus that can be used directly for self-supervised learning such as pre-training of a language model, or language generation. It has 100G of raw corpus with 35 billion Chinese characters, retrieved from Common Crawl. To better understand this corpus, we conduct language understanding experiments on both small and large scale, and results show that the models trained on this corpus can achieve excellent performance on Chinese. We release a new Chinese vocabulary with a size of 8K, which is only one-third of the vocabulary size used in the Chinese BERT released by Google. It saves computational cost and memory while working as well as the original vocabulary. We also release both large and tiny versions of the pre-trained model on this corpus. The former achieves the state-of-the-art result, and the latter retains most precision while accelerating training and prediction speed by eight times compared to BERT-base. To facilitate future work on self-supervised learning on Chinese, we release our dataset, new vocabulary, code, and pre-trained models on GitHub.
Tasks	Language Modelling, Text Generation
Published	2020-03-03
URL	https://arxiv.org/abs/2003.01355v2
PDF	https://arxiv.org/pdf/2003.01355v2.pdf
PWC	https://paperswithcode.com/paper/cluecorpus2020-a-large-scale-chinese-corpus
Repo	https://github.com/CLUEbenchmark/CLUECorpus2020
Framework	tf