April 3, 2020

# Paper Group AWR 7

ScanSSD: Scanning Single Shot Detector for Mathematical Formulas in PDF Document Images. Towards Discriminability and Diversity: Batch Nuclear-norm Maximization under Label Insufficient Situations. Machine-Learning-Based Diagnostics of EEG Pathology. MAST: A Memory-Augmented Self-supervised Tracker. Selecting time-series hyperparameters with the ar …

#### ScanSSD: Scanning Single Shot Detector for Mathematical Formulas in PDF Document Images

Title ScanSSD: Scanning Single Shot Detector for Mathematical Formulas in PDF Document Images
Abstract We introduce the Scanning Single Shot Detector (ScanSSD) for locating math formulas offset from text and embedded in textlines. ScanSSD uses only visual features for detection: no formatting or typesetting information such as layout, font, or character labels is employed. Given a 600 dpi document page image, a Single Shot Detector (SSD) locates formulas at multiple scales using sliding windows, after which candidate detections are pooled to obtain page-level results. For our experiments we use the TFD-ICDAR2019v2 dataset, a modification of the GTDB scanned math article collection. ScanSSD detects characters in formulas with high accuracy, obtaining a 0.926 f-score, and detects formulas with high recall overall. Detection errors are largely minor, such as splitting formulas at large whitespace gaps (e.g., for variable constraints) and merging formulas on adjacent textlines. Formula detection f-scores of 0.796 (IOU $\geq 0.5$) and 0.733 (IOU $\geq 0.75$) are obtained. Our data, evaluation tools, and code are publicly available.
Published 2020-03-18
URL https://arxiv.org/abs/2003.08005v1
PDF https://arxiv.org/pdf/2003.08005v1.pdf
PWC https://paperswithcode.com/paper/scanssd-scanning-single-shot-detector-for
Repo https://github.com/MaliParag/ScanSSD
Framework pytorch
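
The f-scores above are reported at IOU thresholds of 0.5 and 0.75. As a minimal sketch of how such a threshold might be applied when matching a detection to a ground-truth formula, the snippet below computes intersection-over-union for two axis-aligned boxes. The `(x1, y1, x2, y2)` box format and the function names `iou` and `is_match` are assumptions for illustration and are not taken from the ScanSSD code or evaluation tools.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(detected, ground_truth, threshold=0.5):
    """Hypothetical matching rule: a detection counts if IOU >= threshold."""
    return iou(detected, ground_truth) >= threshold


truth = (0, 0, 100, 20)
clipped = (40, 0, 100, 20)  # detection that misses the left 40% of the formula
print(iou(clipped, truth))             # 0.6
print(is_match(clipped, truth, 0.5))   # True
print(is_match(clipped, truth, 0.75))  # False
```

A detection that covers only part of a formula can pass the looser threshold and fail the stricter one, which is consistent with the lower f-score reported at IOU $\geq 0.75$.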

#### Towards Discriminability and Diversity: Batch Nuclear-norm Maximization under Label Insufficient Situations

Title Towards Discriminability and Diversity: Batch Nuclear-norm Maximization under Label Insufficient Situations
Authors Shuhao Cui, Shuhui Wang, Junbao Zhuo, Liang Li, Qingming Huang, Qi Tian
Abstract The learning of deep networks largely relies on data with human-annotated labels. In some label insufficient situations, the performance degrades on the decision boundary with high data density. A common solution is to directly minimize the Shannon Entropy, but the side effect caused by entropy minimization, i.e., reduction of the prediction diversity, is mostly ignored. To address this issue, we reinvestigate the structure of the classification output matrix of a randomly selected data batch. We find by theoretical analysis that the prediction discriminability and diversity could be separately measured by the Frobenius-norm and rank of the batch output matrix. Besides, the nuclear-norm is an upper bound of the Frobenius-norm, and a convex approximation of the matrix rank. Accordingly, to improve both discriminability and diversity, we propose Batch Nuclear-norm Maximization (BNM) on the output matrix. BNM could boost the learning under typical label insufficient learning scenarios, such as semi-supervised learning, domain adaptation and open domain recognition. On these tasks, extensive experimental results show that BNM outperforms competitors and works well with existing well-known methods. The code is available at https://github.com/cuishuhao/BNM.
Published 2020-03-27
URL https://arxiv.org/abs/2003.12237v1
PDF https://arxiv.org/pdf/2003.12237v1.pdf
PWC https://paperswithcode.com/paper/towards-discriminability-and-diversity-batch
Repo https://github.com/cuishuhao/BNM
Framework pytorch
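
The norm relations the abstract relies on can be written out explicitly. The block below is a sketch in standard linear-algebra notation: the symbol $A$ for the $B \times C$ batch output matrix (batch size $B$, $C$ classes) and $\sigma_i$ for its singular values are notational assumptions, and the $1/B$ scaling in the last line is one plausible way to write the objective rather than a form confirmed by the abstract.

```latex
% A: B x C batch output matrix, with singular values \sigma_1, \sigma_2, \dots
\|A\|_F = \sqrt{\sum_i \sigma_i^2},
\qquad
\|A\|_* = \sum_i \sigma_i,
\qquad
\|A\|_F \;\le\; \|A\|_* \;\le\; \sqrt{\operatorname{rank}(A)}\,\|A\|_F .

% Assumed form of the objective: maximize the nuclear norm of the batch output.
\mathcal{L}_{\mathrm{BNM}} = -\frac{1}{B}\,\|A\|_* .
```

The left inequality is the "upper bound" statement in the abstract; the right one shows why a larger nuclear norm at a fixed Frobenius norm requires a higher rank, which is the link to prediction diversity.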

#### Machine-Learning-Based Diagnostics of EEG Pathology

Title Machine-Learning-Based Diagnostics of EEG Pathology
Authors Lukas Alexander Wilhelm Gemein, Robin Tibor Schirrmeister, Patryk Chrabąszcz, Daniel Wilson, Joschka Boedecker, Andreas Schulze-Bonhage, Frank Hutter, Tonio Ball
Abstract Machine learning (ML) methods have the potential to automate clinical EEG analysis. They can be categorized into feature-based (with handcrafted features), and end-to-end approaches (with learned features). Previous studies on EEG pathology decoding have typically analyzed a limited number of features, decoders, or both. For I) a more elaborate feature-based EEG analysis, and II) in-depth comparisons of both approaches, here we first develop a comprehensive feature-based framework, and then compare this framework to state-of-the-art end-to-end methods. To this end, we apply the proposed feature-based framework and deep neural networks including an EEG-optimized temporal convolutional network (TCN) to the task of pathological versus non-pathological EEG classification. For a robust comparison, we chose the Temple University Hospital (TUH) Abnormal EEG Corpus (v2.0.0), which contains approximately 3000 EEG recordings. The results demonstrate that the proposed feature-based decoding framework can achieve accuracies on the same level as state-of-the-art deep neural networks. We find accuracies across both approaches in an astonishingly narrow range of 81–86%. Moreover, visualizations and analyses indicated that both approaches used similar aspects of the data, e.g., delta and theta band power at temporal electrode locations. We argue that the accuracies of current binary EEG pathology decoders could saturate near 90% due to the imperfect inter-rater agreement of the clinical labels, and that such decoders are already clinically useful, such as in areas where clinical EEG experts are rare. We make the proposed feature-based framework available open source and thus offer a new tool for EEG machine learning research.
Published 2020-02-11
URL https://arxiv.org/abs/2002.05115v1
PDF https://arxiv.org/pdf/2002.05115v1.pdf
PWC https://paperswithcode.com/paper/machine-learning-based-diagnostics-of-eeg
Repo https://github.com/gemeinl/auto-eeg-diagnosis-comparison
Framework none

#### MAST: A Memory-Augmented Self-supervised Tracker

Title MAST: A Memory-Augmented Self-supervised Tracker
Authors Zihang Lai, Erika Lu, Weidi Xie
Abstract Recent interest in self-supervised dense tracking has yielded rapid progress, but performance still remains far from supervised methods. We propose a dense tracking model trained on videos without any annotations that surpasses previous self-supervised methods on existing benchmarks by a significant margin (+15%), and achieves performance comparable to supervised methods. In this paper, we first reassess the traditional choices used for self-supervised training and reconstruction loss by conducting thorough experiments that finally elucidate the optimal choices. Second, we further improve on existing methods by augmenting our architecture with a crucial memory component. Third, we benchmark on large-scale semi-supervised video object segmentation (a.k.a. dense tracking), and propose a new metric: generalizability. Our first two contributions yield a self-supervised network that for the first time is competitive with supervised methods on standard evaluation metrics of dense tracking. When measuring generalizability, we show self-supervised approaches are actually superior to the majority of supervised methods. We believe this new generalizability metric can better capture the real-world use-cases for dense tracking, and will spur new interest in this research direction.
Tasks Semantic Segmentation, Semi-supervised Video Object Segmentation, Video Object Segmentation, Video Semantic Segmentation
Published 2020-02-18
URL https://arxiv.org/abs/2002.07793v2
PDF https://arxiv.org/pdf/2002.07793v2.pdf
PWC https://paperswithcode.com/paper/mast-a-memory-augmented-self-supervised
Repo https://github.com/zlai0/MAST
Framework pytorch

#### Selecting time-series hyperparameters with the artificial jackknife

Title Selecting time-series hyperparameters with the artificial jackknife
Authors Filippo Pellegrino
Abstract This article proposes a generalisation of the delete-$d$ jackknife to solve hyperparameter selection problems for time series. This novel technique is compatible with dependent data since it substitutes the jackknife removal step with a fictitious deletion, wherein observed datapoints are replaced with artificial missing values. In order to emphasise this point, I called this methodology artificial delete-$d$ jackknife. As an illustration, it is used to regulate vector autoregressions with an elastic-net penalty on the coefficients. A software implementation, ElasticNetVAR.jl, is available on GitHub.
Published 2020-02-11
URL https://arxiv.org/abs/2002.04697v1
PDF https://arxiv.org/pdf/2002.04697v1.pdf
PWC https://paperswithcode.com/paper/selecting-time-series-hyperparameters-with
Repo https://github.com/fipelle/ElasticNetVAR.jl
Framework none
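
The core idea in the abstract, replacing the jackknife's removal step with artificial missing values, can be illustrated with a short sketch. This is not the ElasticNetVAR.jl implementation: the function name `artificial_delete_d`, the use of `None` as the missing-value marker, and the exhaustive enumeration of index subsets are all assumptions made for illustration. How the subsamples are chosen in practice and how the model handles the missing values are not specified in the abstract.

```python
import itertools


def artificial_delete_d(series, d):
    """Yield copies of `series` in which d observations are masked as missing.

    Unlike the classical delete-d jackknife, nothing is removed: the series
    keeps its length and time ordering, and the masked points become None.
    """
    n = len(series)
    for masked in itertools.combinations(range(n), d):
        masked_set = set(masked)
        yield [None if t in masked_set else x for t, x in enumerate(series)]


series = [1.0, 1.5, 1.2, 1.8]
subsamples = list(artificial_delete_d(series, d=2))
print(len(subsamples))  # 6, i.e. C(4, 2)
print(subsamples[0])    # [None, None, 1.2, 1.8]
```

Keeping the time index intact is what makes the scheme compatible with dependent data: lags still refer to the correct periods. For series of realistic length, enumerating every subset is infeasible, so a random sample of index subsets would presumably be drawn instead.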

#### Multilogue-Net: A Context Aware RNN for Multi-modal Emotion Detection and Sentiment Analysis in Conversation

Title Multilogue-Net: A Context Aware RNN for Multi-modal Emotion Detection and Sentiment Analysis in Conversation
Authors Aman Shenoy, Ashish Sardana
Abstract Sentiment Analysis and Emotion Detection in conversation are key in a number of real-world applications, with different applications leveraging different kinds of data to be able to achieve reasonably accurate predictions. Multimodal Emotion Detection and Sentiment Analysis can be particularly useful as applications will be able to use specific subsets of the available modalities, as per their available data, to be able to produce relevant predictions. Current systems dealing with Multimodal functionality fail to leverage and capture the context of the conversation through all modalities, the current speaker and listener(s) in the conversation, and the relevance and relationship between the available modalities through an adequate fusion mechanism. In this paper, we propose a recurrent neural network architecture that attempts to take into account all the mentioned drawbacks, and keeps track of the context of the conversation, interlocutor states, and the emotions conveyed by the speakers in the conversation. Our proposed model outperforms the state of the art on two benchmark datasets on a variety of accuracy and regression metrics.
Tasks Emotion Recognition, Emotion Recognition in Context, Emotion Recognition in Conversation, Multimodal Emotion Recognition, Multimodal Sentiment Analysis, Sentiment Analysis
Published 2020-02-19
URL https://arxiv.org/abs/2002.08267v2
PDF https://arxiv.org/pdf/2002.08267v2.pdf
PWC https://paperswithcode.com/paper/multilogue-net-a-context-aware-rnn-for-multi
Repo https://github.com/amanshenoy/multilogue-net
Framework pytorch

#### CycleISP: Real Image Restoration via Improved Data Synthesis

Title CycleISP: Real Image Restoration via Improved Data Synthesis
Authors Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao
Abstract The availability of large-scale datasets has helped unleash the true potential of deep convolutional neural networks (CNNs). However, for the single-image denoising problem, capturing a real dataset is an unacceptably expensive and cumbersome procedure. Consequently, image denoising algorithms are mostly developed and evaluated on synthetic data that is usually generated with a widespread assumption of additive white Gaussian noise (AWGN). While the CNNs achieve impressive results on these synthetic datasets, they do not perform well when applied to real camera images, as reported in recent benchmark datasets. This is mainly because the AWGN is not adequate for modeling the real camera noise, which is signal-dependent and heavily transformed by the camera imaging pipeline. In this paper, we present a framework that models the camera imaging pipeline in forward and reverse directions. It allows us to produce any number of realistic image pairs for denoising both in RAW and sRGB spaces. By training a new image denoising network on realistic synthetic data, we achieve the state-of-the-art performance on real camera benchmark datasets. Our model has ~5 times fewer parameters than the previous best method for RAW denoising. Furthermore, we demonstrate that the proposed framework generalizes beyond the image denoising problem, e.g., for color matching in stereoscopic cinema. The source code and pre-trained models are available at https://github.com/swz30/CycleISP.
Tasks Denoising, Image Denoising, Image Restoration
Published 2020-03-17
URL https://arxiv.org/abs/2003.07761v1
PDF https://arxiv.org/pdf/2003.07761v1.pdf
PWC https://paperswithcode.com/paper/cycleisp-real-image-restoration-via-improved
Repo https://github.com/swz30/CycleISP
Framework pytorch
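
To make the contrast between AWGN and signal-dependent noise concrete, here is a minimal sketch. It is not the CycleISP pipeline, which models the camera imaging pipeline itself. The heteroscedastic Gaussian model below (variance `shot * p + read ** 2`) is a common approximation swapped in for illustration, and the function names, the parameters `shot` and `read`, and the pixel values in [0, 1] are all assumptions.

```python
import random
import statistics


def add_awgn(pixels, sigma, rng):
    """Additive white Gaussian noise: the same sigma at every intensity."""
    return [p + rng.gauss(0.0, sigma) for p in pixels]


def add_signal_dependent(pixels, shot, read, rng):
    """Assumed heteroscedastic model: noise variance grows with intensity p."""
    return [p + rng.gauss(0.0, (shot * p + read ** 2) ** 0.5) for p in pixels]


rng = random.Random(0)
dark = [0.05] * 5000    # flat dark patch
bright = [0.9] * 5000   # flat bright patch

for name, patch in (("dark", dark), ("bright", bright)):
    awgn_std = statistics.pstdev(add_awgn(patch, 0.05, rng))
    dep_std = statistics.pstdev(add_signal_dependent(patch, 0.01, 0.01, rng))
    print(name, round(awgn_std, 3), round(dep_std, 3))
```

Under AWGN both patches show roughly the same noise level, whereas under the signal-dependent model the bright patch is several times noisier than the dark one. A denoiser trained only on the first kind of data never sees that intensity dependence.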

#### DeeperForensics-1.0: A Large-Scale Dataset for Real-World Face Forgery Detection

Title DeeperForensics-1.0: A Large-Scale Dataset for Real-World Face Forgery Detection
Authors Liming Jiang, Wayne Wu, Ren Li, Chen Qian, Chen Change Loy
Abstract In this paper, we present our on-going effort of constructing a large-scale benchmark, DeeperForensics-1.0, for face forgery detection. Our benchmark represents the largest face forgery detection dataset by far, with 60,000 videos constituted by a total of 17.6 million frames, 10 times larger than existing datasets of the same kind. Extensive real-world perturbations are applied to obtain a more challenging benchmark of larger scale and higher diversity. All source videos in DeeperForensics-1.0 are carefully collected, and fake videos are generated by a newly proposed end-to-end face swapping framework. The quality of generated videos outperforms those in existing datasets, validated by user studies. The benchmark features a hidden test set, which contains manipulated videos achieving high deceptive scores in human evaluations. We further contribute a comprehensive study that evaluates five representative detection baselines and makes a thorough analysis of different settings. We believe this dataset will contribute to real-world face forgery detection research.
Published 2020-01-09
URL https://arxiv.org/abs/2001.03024v1
PDF https://arxiv.org/pdf/2001.03024v1.pdf
PWC https://paperswithcode.com/paper/deeperforensics-10-a-large-scale-dataset-for
Repo https://github.com/EndlessSora/DeeperForensics-1.0
Framework none

#### Differentially Private Set Union

Title Differentially Private Set Union
Authors Sivakanth Gopi, Pankaj Gulhane, Janardhan Kulkarni, Judy Hanwen Shen, Milad Shokouhi, Sergey Yekhanin
Abstract We study the basic operation of set union in the global model of differential privacy. In this problem, we are given a universe $U$ of items, possibly of infinite size, and a database $D$ of users. Each user $i$ contributes a subset $W_i \subseteq U$ of items. We want an ($\epsilon$, $\delta$)-differentially private algorithm which outputs a subset $S \subset \cup_i W_i$ such that the size of $S$ is as large as possible. The problem arises in countless real world applications; it is particularly ubiquitous in natural language processing (NLP) applications as vocabulary extraction. For example, discovering words, sentences, $n$-grams etc., from private text data belonging to users is an instance of the set union problem. Known algorithms for this problem proceed by collecting a subset of items from each user, taking the union of such subsets, and disclosing the items whose noisy counts fall above a certain threshold. Crucially, in the above process, the contribution of each individual user is always independent of the items held by other users, resulting in a wasteful aggregation process, where some item counts happen to be way above the threshold. We deviate from the above paradigm by allowing users to contribute their items in a $\textit{dependent fashion}$, guided by a $\textit{policy}$. In this new setting, ensuring privacy is considerably more delicate. We prove that any policy which has certain $\textit{contractive}$ properties would result in a differentially private algorithm. We design two new algorithms, one using Laplace noise and the other Gaussian noise, as specific instances of policies satisfying the contractive properties. Our experiments show that the new algorithms significantly outperform previously known mechanisms for the problem.
Published 2020-02-22
URL https://arxiv.org/abs/2002.09745v1
PDF https://arxiv.org/pdf/2002.09745v1.pdf
PWC https://paperswithcode.com/paper/differentially-private-set-union
Repo https://github.com/heyyjudes/differentially-private-set-union
Framework none
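
The baseline paradigm the abstract describes (each user contributes a subset, counts are aggregated, and items whose noisy counts exceed a threshold are released) can be sketched as follows. This is the known approach the paper improves on, not the new policy-based algorithms. The function names and the values of `max_per_user`, `scale`, and `threshold` are placeholders assumed for illustration; they are not calibrated to any particular ($\epsilon$, $\delta$) guarantee.

```python
import random
from collections import Counter


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_threshold_union(user_items, max_per_user, scale, threshold, rng):
    """Baseline sketch: release items whose noisy count exceeds `threshold`."""
    counts = Counter()
    for items in user_items:
        # Each user contributes at most `max_per_user` distinct items, chosen
        # independently of what any other user holds.
        chosen = sorted(set(items))
        rng.shuffle(chosen)
        counts.update(chosen[:max_per_user])
    # Only items that at least one user contributed are ever considered.
    return {item for item, c in counts.items()
            if c + laplace(scale, rng) > threshold}


rng = random.Random(0)
users = [["the", "cat"]] * 50 + [["rare"]]
released = noisy_threshold_union(users, max_per_user=2, scale=1.0,
                                 threshold=20.0, rng=rng)
print(sorted(released))
```

In this toy run the two common words sit far above the threshold, which is the "wasteful" situation the abstract points to: those users' contributions could have gone to items closer to the threshold if contributions were allowed to depend on each other.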

#### On Translation Invariance in CNNs: Convolutional Layers can Exploit Absolute Spatial Location

Title On Translation Invariance in CNNs: Convolutional Layers can Exploit Absolute Spatial Location
Authors Osman Semih Kayhan, Jan C. van Gemert
Abstract In this paper we challenge the common assumption that convolutional layers in modern CNNs are translation invariant. We show that CNNs can and will exploit the absolute spatial location by learning filters that respond exclusively to particular absolute locations by exploiting image boundary effects. Because modern CNN filters have a huge receptive field, these boundary effects operate even far from the image boundary, allowing the network to exploit absolute spatial location all over the image. We give a simple solution to remove spatial location encoding which improves translation invariance and thus gives a stronger visual inductive bias which particularly benefits small data sets. We broadly demonstrate these benefits on several architectures and various applications such as image classification, patch matching, and two video classification datasets.
Published 2020-03-16
URL https://arxiv.org/abs/2003.07064v1
PDF https://arxiv.org/pdf/2003.07064v1.pdf
PWC https://paperswithcode.com/paper/on-translation-invariance-in-cnns
Repo https://github.com/oskyhn/CNNs-Without-Borders
Framework none
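
A one-dimensional toy example shows how a boundary effect can leak absolute position. The sketch below is not the paper's experiment or its proposed fix: it only applies a zero-padded "same" convolution to a perfectly uniform signal, with the helper name `conv1d_same_zero_pad` and the odd-length kernel being assumptions for illustration.

```python
def conv1d_same_zero_pad(signal, kernel):
    """1-D cross-correlation with zero padding; output length == input length.

    Assumes an odd-length kernel so the padding is symmetric.
    """
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(kernel[j] * padded[i + j] for j in range(k))
            for i in range(len(signal))]


flat = [1.0] * 8                      # input with no positional content at all
box = [1.0, 1.0, 1.0]
layer1 = conv1d_same_zero_pad(flat, box)
layer2 = conv1d_same_zero_pad(layer1, box)
print(layer1)  # [2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 2.0]
print(layer2)  # [5.0, 8.0, 9.0, 9.0, 9.0, 9.0, 8.0, 5.0]
```

Although the input is identical everywhere, the outputs near the border differ from those in the interior, and after a second layer the affected region has grown by one position on each side. With deep stacks and large receptive fields, this is how boundary information can reach locations far from the border.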

#### AMIL: Adversarial Multi Instance Learning for Human Pose Estimation

Title AMIL: Adversarial Multi Instance Learning for Human Pose Estimation
Authors Pourya Shamsolmoali, Masoumeh Zareapoor, Huiyu Zhou, Jie Yang
Tasks Multiple Instance Learning, Pose Estimation, Video Retrieval
Published 2020-03-18
URL https://arxiv.org/abs/2003.08002v1
PDF https://arxiv.org/pdf/2003.08002v1.pdf
Repo https://github.com/pshams55/AMIL
Framework pytorch

#### Adapting Object Detectors with Conditional Domain Normalization

Title Adapting Object Detectors with Conditional Domain Normalization
Authors Peng Su, Kun Wang, Xingyu Zeng, Shixiang Tang, Dapeng Chen, Di Qiu, Xiaogang Wang
Abstract Real-world object detectors are often challenged by the domain gaps between different datasets. In this work, we present the Conditional Domain Normalization (CDN) to bridge the domain gap. CDN is designed to encode different domain inputs into a shared latent space, where the features from different domains carry the same domain attribute. To achieve this, we first disentangle the domain-specific attribute out of the semantic features from one domain via a domain embedding module, which learns a domain-vector to characterize the corresponding domain attribute information. Then this domain-vector is used to encode the features from another domain through a conditional normalization, resulting in different domains’ features carrying the same domain attribute. We incorporate CDN into various convolution stages of an object detector to adaptively address the domain shifts of representations at different levels. In contrast to existing adaptation works that conduct domain confusion learning on semantic features to remove domain-specific factors, CDN aligns different domain distributions by modulating the semantic features of one domain conditioned on the learned domain-vector of another domain. Extensive experiments show that CDN outperforms existing methods remarkably on both real-to-real and synthetic-to-real adaptation benchmarks, including 2D image detection and 3D point cloud detection.
Published 2020-03-16
URL https://arxiv.org/abs/2003.07071v1
PDF https://arxiv.org/pdf/2003.07071v1.pdf
Repo https://github.com/psu1/CDN
Framework none

#### Key Points Estimation and Point Instance Segmentation Approach for Lane Detection

Title Key Points Estimation and Point Instance Segmentation Approach for Lane Detection
Authors Yeongmin Ko, Jiwon Jun, Donghwuy Ko, Moongu Jeon
Abstract State-of-the-art lane detection methods achieve successful performance. Despite their advantages, these methods have critical deficiencies such as a limited number of detectable lanes and a high false-positive rate. In particular, a high false-positive rate can cause wrong and dangerous control. In this paper, we propose a novel deep-learning lane detection method for an arbitrary number of lanes, which yields fewer false positives than other recent lane detection methods. The architecture of the proposed method has shared feature extraction layers and several branches for detection and embedding to cluster lanes. The proposed method can generate exact points on the lanes, and we cast the clustering problem for the generated points as a point cloud instance segmentation problem. The proposed method is more compact because it generates fewer points than the original image pixel size. Our proposed post-processing method eliminates outliers successfully and increases the performance notably. The whole proposed framework achieves competitive results on the tuSimple dataset.
Published 2020-02-16
URL https://arxiv.org/abs/2002.06604v2
PDF https://arxiv.org/pdf/2002.06604v2.pdf
PWC https://paperswithcode.com/paper/key-points-estimation-and-point-instance
Repo https://github.com/koyeongmin/PINet
Framework pytorch

#### MUXConv: Information Multiplexing in Convolutional Neural Networks

Title MUXConv: Information Multiplexing in Convolutional Neural Networks
Authors Zhichao Lu, Kalyanmoy Deb, Vishnu Naresh Boddeti
Abstract Convolutional neural networks have witnessed remarkable improvements in computational efficiency in recent years. A key driving force has been the idea of trading-off model expressivity and efficiency through a combination of $1\times 1$ and depth-wise separable convolutions in lieu of a standard convolutional layer. The price of the efficiency, however, is the sub-optimal flow of information across space and channels in the network. To overcome this limitation, we present MUXConv, a layer that is designed to increase the flow of information by progressively multiplexing channel and spatial information in the network, while mitigating computational complexity. Furthermore, to demonstrate the effectiveness of MUXConv, we integrate it within an efficient multi-objective evolutionary algorithm to search for the optimal model hyper-parameters while simultaneously optimizing accuracy, compactness, and computational efficiency. On ImageNet, the resulting models, dubbed MUXNets, match the performance (75.3% top-1 accuracy) and multiply-add operations (218M) of MobileNetV3 while being 1.6$\times$ more compact, and outperform other mobile models in all the three criteria. MUXNet also performs well under transfer learning and when adapted to object detection. On the ChestX-Ray 14 benchmark, its accuracy is comparable to the state-of-the-art while being $3.3\times$ more compact and $14\times$ more efficient. Similarly, detection on PASCAL VOC 2007 is 1.2% more accurate, 28% faster and 6% more compact compared to MobileNetV2. Code is available from https://github.com/human-analysis/MUXConv
Published 2020-03-31
URL https://arxiv.org/abs/2003.13880v1
PDF https://arxiv.org/pdf/2003.13880v1.pdf
PWC https://paperswithcode.com/paper/muxconv-information-multiplexing-in
Repo https://github.com/human-analysis/MUXConv
Framework pytorch
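
The efficiency trade-off mentioned at the start of the abstract, replacing a standard convolution with a depth-wise convolution followed by a $1\times 1$ convolution, comes down to simple arithmetic. The sketch below counts multiply-adds for both options using the standard textbook formulas. It does not model MUXConv itself, and the layer sizes, stride 1, unchanged output resolution, and absence of bias terms are assumptions for illustration.

```python
def standard_conv_madds(h, w, c_in, c_out, k):
    """Multiply-adds of a k x k standard convolution on an h x w output map."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_madds(h, w, c_in, c_out, k):
    """Multiply-adds of a k x k depth-wise conv followed by a 1 x 1 conv."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 conv mixes the channels
    return depthwise + pointwise


h = w = 56
c_in = c_out = 128
k = 3
ratio = (standard_conv_madds(h, w, c_in, c_out, k)
         / depthwise_separable_madds(h, w, c_in, c_out, k))
print(round(ratio, 2))  # 8.41
```

The saving equals $1 / (1/C_{out} + 1/k^2)$, so it approaches $k^2$ as the channel count grows. The cost is that spatial filtering and channel mixing are decoupled, which is the restricted information flow across space and channels that the abstract says MUXConv is designed to address.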

#### CLUECorpus2020: A Large-scale Chinese Corpus for Pre-training Language Model

Title CLUECorpus2020: A Large-scale Chinese Corpus for Pre-training Language Model
Authors Liang Xu, Xuanwei Zhang, Qianqian Dong
Abstract In this paper, we introduce the Chinese corpus from the CLUE organization, CLUECorpus2020, a large-scale corpus that can be used directly for self-supervised learning such as pre-training of a language model, or language generation. It has 100G raw corpus with 35 billion Chinese characters, which is retrieved from Common Crawl. To better understand this corpus, we conduct language understanding experiments at both small and large scales, and results show that the models trained on this corpus can achieve excellent performance on Chinese. We release a new Chinese vocabulary with a size of 8K, which is only one-third of the vocabulary size used in the Chinese BERT released by Google. It saves computational cost and memory while working as well as the original vocabulary. We also release both large and tiny versions of the pre-trained model on this corpus. The former achieves the state-of-the-art result, and the latter retains most of the precision while speeding up training and prediction by eight times compared to BERT-base. To facilitate future work on self-supervised learning on Chinese, we release our dataset, new vocabulary, codes, and pre-trained models on GitHub.