February 2, 2020

Paper Group AWR 62

Accurate reconstruction of EBSD datasets by a multimodal data approach using an evolutionary algorithm


Title	Accurate reconstruction of EBSD datasets by a multimodal data approach using an evolutionary algorithm
Authors	Marie-Agathe Charpagne, Florian Strub, Tresa M. Pollock
Abstract	A new method has been developed for the correction of distortions and/or enhanced phase differentiation in Electron Backscatter Diffraction (EBSD) data. Using a multi-modal data approach, the method uses segmented images of the phase of interest (laths, precipitates, voids, inclusions) on images gathered by backscattered or secondary electrons of the same area as the EBSD map. The proposed approach then searches for the best transformation to correct their relative distortions and recombines the data in a new EBSD file. Speckles of the features of interest are first segmented in both the EBSD and image data modes. The speckle extracted from the EBSD data is then meshed, and the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) is implemented to distort the mesh until the speckles superimpose. The quality of the matching is quantified via a score that is linked to the number of overlapping pixels in the speckles. The locations of the points of the distorted mesh are compared to those of the initial positions to create pairs of matching points that are used to calculate the polynomial function that best describes the distortion. This function is then applied to un-distort the EBSD data, and the phase information is inferred using the data of the segmented speckle. Fast and versatile, this method does not require any human annotation and can be applied to large datasets and wide areas. Moreover, this method requires very few assumptions concerning the shape of the distortion function. It can be used solely to compensate for the distortions, or combined with phase differentiation. The accuracy of this method is of the order of the pixel size. Some application examples in multiphase materials with feature sizes down to 1 µm are presented, including Ti-6Al-4V titanium alloy, Rene 65 and additively manufactured Inconel 718 nickel-base superalloys.
Tasks
Published	2019-03-07
URL	http://arxiv.org/abs/1903.02988v2
PDF	http://arxiv.org/pdf/1903.02988v2.pdf
PWC	https://paperswithcode.com/paper/accurate-reconstruction-of-ebsd-datasets-by-a
Repo	https://github.com/MLmicroscopy/distortions
Framework	none
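
The abstract says the matching quality is scored by a quantity "linked to the number of overlapping pixels in the speckles" but does not give the exact formula. As a minimal sketch, assuming the score is simply the raw count of pixels that are foreground in both binary speckle masks (the function name and the toy masks below are hypothetical, not taken from the paper or its repository):

```python
def overlap_score(mask_a, mask_b):
    """Count pixels that are foreground (truthy) in both binary masks."""
    if len(mask_a) != len(mask_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(mask_a, mask_b)
    ):
        raise ValueError("masks must have the same shape")
    return sum(
        1
        for row_a, row_b in zip(mask_a, mask_b)
        for a, b in zip(row_a, row_b)
        if a and b
    )


# Toy 3x3 speckles (made-up data): one from the EBSD map, one from the image.
ebsd_speckle = [[0, 1, 1], [0, 1, 0], [0, 0, 0]]
image_speckle = [[0, 1, 0], [0, 1, 1], [0, 0, 0]]
score = overlap_score(ebsd_speckle, image_speckle)
```

An optimizer such as CMA-ES would then search for the mesh distortion that maximizes a score of this kind; that search is not shown here.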

Locality Preserving Joint Transfer for Domain Adaptation


Title	Locality Preserving Joint Transfer for Domain Adaptation
Authors	Li Jingjing, Jing Mengmeng, Lu Ke, Zhu Lei, Shen Heng Tao
Abstract	Domain adaptation aims to leverage knowledge from a well-labeled source domain to a poorly-labeled target domain. A majority of existing works transfer the knowledge at either feature level or sample level. Recent research reveals that both paradigms are essentially important, and optimizing one of them can reinforce the other. Inspired by this, we propose a novel approach to jointly exploit feature adaptation with distribution matching and sample adaptation with landmark selection. During the knowledge transfer, we also take the local consistency between samples into consideration, so that the manifold structures of samples can be preserved. Finally, we deploy label propagation to predict the categories of new instances. Notably, our approach is suitable for both homogeneous and heterogeneous domain adaptation by learning domain-specific projections. Extensive experiments on five open benchmarks, which consist of both standard and large-scale datasets, verify that our approach can significantly outperform not only conventional approaches but also end-to-end deep models. The experiments also demonstrate that we can leverage handcrafted features to promote the accuracy on deep features by heterogeneous adaptation.
Tasks	Domain Adaptation, Transfer Learning
Published	2019-06-18
URL	https://arxiv.org/abs/1906.07441v1
PDF	https://arxiv.org/pdf/1906.07441v1.pdf
PWC	https://paperswithcode.com/paper/locality-preserving-joint-transfer-for-domain
Repo	https://github.com/lijin118/LPJT
Framework	none

Online Normalization for Training Neural Networks


Title	Online Normalization for Training Neural Networks
Authors	Vitaliy Chiley, Ilya Sharapov, Atli Kosson, Urs Koster, Ryan Reece, Sofia Samaniego de la Fuente, Vishal Subbiah, Michael James
Abstract	Online Normalization is a new technique for normalizing the hidden activations of a neural network. Like Batch Normalization, it normalizes the sample dimension. While Online Normalization does not use batches, it is as accurate as Batch Normalization. We resolve a theoretical limitation of Batch Normalization by introducing an unbiased technique for computing the gradient of normalized activations. Online Normalization works with automatic differentiation by adding statistical normalization as a primitive. This technique can be used in cases not covered by some other normalizers, such as recurrent networks, fully connected networks, and networks with activation memory requirements prohibitive for batching. We show its applications to image classification, image segmentation, and language modeling. We present formal proofs and experimental results on ImageNet, CIFAR, and PTB datasets.
Tasks	Image Classification, Language Modelling, Semantic Segmentation
Published	2019-05-15
URL	https://arxiv.org/abs/1905.05894v3
PDF	https://arxiv.org/pdf/1905.05894v3.pdf
PWC	https://paperswithcode.com/paper/online-normalization-for-training-neural
Repo	https://github.com/Cerebras/online-normalization
Framework	pytorch
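
To make the idea of normalizing without batches concrete, here is a simplified forward-pass sketch for a single scalar feature. It assumes exponentially decaying running estimates of the mean and variance; the class name, the `decay` and `eps` defaults, and the exact update rule are illustrative assumptions, and the unbiased gradient computation described in the abstract is not shown.

```python
import math


class StreamingNormalizer:
    """Normalize one scalar feature sample by sample, without batches."""

    def __init__(self, decay=0.99, eps=1e-5):
        self.decay = decay  # weight given to past statistics (assumed value)
        self.eps = eps      # numerical stabilizer
        self.mean = 0.0
        self.var = 1.0

    def __call__(self, x):
        # Normalize with statistics gathered from previous samples only.
        y = (x - self.mean) / math.sqrt(self.var + self.eps)
        # Update the exponentially weighted mean and variance with this sample.
        delta = x - self.mean
        self.mean += (1.0 - self.decay) * delta
        self.var = self.decay * (self.var + (1.0 - self.decay) * delta * delta)
        return y


normalizer = StreamingNormalizer(decay=0.5, eps=0.0)
first_output = normalizer(2.0)
```

Because each output depends only on earlier samples, the same code works for a stream of length one per step, which is the setting where batch statistics are unavailable.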

Deep Object Co-segmentation via Spatial-Semantic Network Modulation


Title	Deep Object Co-segmentation via Spatial-Semantic Network Modulation
Authors	Kaihua Zhang, Jin Chen, Bo Liu, Qingshan Liu
Abstract	Object co-segmentation is to segment the shared objects in multiple relevant images, which has numerous applications in computer vision. This paper presents a spatial and semantic modulated deep network framework for object co-segmentation. A backbone network is adopted to extract multi-resolution image features. With the multi-resolution features of the relevant images as input, we design a spatial modulator to learn a mask for each image. The spatial modulator captures the correlations of image feature descriptors via unsupervised learning. The learned mask can roughly localize the shared foreground object while suppressing the background. For the semantic modulator, we model it as a supervised image classification task. We propose a hierarchical second-order pooling module to transform the image features for classification use. The outputs of the two modulators manipulate the multi-resolution features by a shift-and-scale operation so that the features focus on segmenting co-object regions. The proposed model is trained end-to-end without any intricate post-processing. Extensive experiments on four image co-segmentation benchmark datasets demonstrate the superior accuracy of the proposed method compared to state-of-the-art methods.
Tasks	Image Classification
Published	2019-11-29
URL	https://arxiv.org/abs/1911.12950v1
PDF	https://arxiv.org/pdf/1911.12950v1.pdf
PWC	https://paperswithcode.com/paper/deep-object-co-segmentation-via-spatial
Repo	https://github.com/cj4L/SSNM-Coseg
Framework	none

Resolution-invariant Person Re-Identification


Title	Resolution-invariant Person Re-Identification
Authors	Shunan Mao, Shiliang Zhang, Ming Yang
Abstract	Exploiting resolution invariant representation is critical for person Re-Identification (ReID) in real applications, where the resolutions of captured person images may vary dramatically. This paper learns person representations robust to resolution variance through jointly training a Foreground-Focus Super-Resolution (FFSR) module and a Resolution-Invariant Feature Extractor (RIFE) by end-to-end CNN learning. FFSR upscales the person foreground using a fully convolutional auto-encoder with skip connections learned with a foreground focus training loss. RIFE adopts two feature extraction streams weighted by a dual-attention block to learn features for low and high resolution images, respectively. These two complementary modules are jointly trained, leading to a strong resolution invariant representation. We evaluate our methods on five datasets containing person images at a large range of resolutions, where our methods show substantial superiority to existing solutions. For instance, we achieve Rank-1 accuracy of 36.4% and 73.3% on CAVIAR and MLR-CUHK03, outperforming the state-of-the-art by 2.9% and 2.6%, respectively.
Tasks	Person Re-Identification, Super-Resolution
Published	2019-06-24
URL	https://arxiv.org/abs/1906.09748v2
PDF	https://arxiv.org/pdf/1906.09748v2.pdf
PWC	https://paperswithcode.com/paper/resolution-invariant-person-re-identification
Repo	https://github.com/maosnhehe/RIPR
Framework	pytorch

Compression with Flows via Local Bits-Back Coding


Title	Compression with Flows via Local Bits-Back Coding
Authors	Jonathan Ho, Evan Lohn, Pieter Abbeel
Abstract	Likelihood-based generative models are the backbones of lossless compression due to the guaranteed existence of codes with lengths close to negative log likelihood. However, there is no guaranteed existence of computationally efficient codes that achieve these lengths, and coding algorithms must be hand-tailored to specific types of generative models to ensure computational efficiency. Such coding algorithms are known for autoregressive models and variational autoencoders, but not for general types of flow models. To fill in this gap, we introduce local bits-back coding, a new compression technique for flow models. We present efficient algorithms that instantiate our technique for many popular types of flows, and we demonstrate that our algorithms closely achieve theoretical codelengths for state-of-the-art flow models on high-dimensional data.
Tasks
Published	2019-05-21
URL	https://arxiv.org/abs/1905.08500v3
PDF	https://arxiv.org/pdf/1905.08500v3.pdf
PWC	https://paperswithcode.com/paper/compression-with-flows-via-local-bits-back
Repo	https://github.com/hojonathanho/localbitsback
Framework	pytorch
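
The first sentence of the abstract relies on the fact that a model assigning probability p to a symbol admits a code of roughly -log2(p) bits for it. The sketch below only illustrates that ideal codelength for a sequence of model probabilities; it is not an implementation of local bits-back coding, and the function name and example probabilities are hypothetical.

```python
import math


def ideal_code_length_bits(probabilities):
    """Sum of -log2(p): the ideal lossless codelength under the model."""
    if any(p <= 0.0 or p > 1.0 for p in probabilities):
        raise ValueError("each probability must be in (0, 1]")
    return sum(-math.log2(p) for p in probabilities)


# Probabilities a model assigned to three observed symbols (made-up values).
total_bits = ideal_code_length_bits([0.5, 0.25, 0.25])
```

A practical coder is judged by how closely its actual output length approaches this number, which is what "closely achieve theoretical codelengths" refers to.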

CURL: Neural Curve Layers for Global Image Enhancement


Title	CURL: Neural Curve Layers for Global Image Enhancement
Authors	Sean Moran, Ales Leonardis, Steven McDonagh, Gregory Slabaugh
Abstract	We present a novel approach to adjust global image properties such as colour, saturation, and luminance using human-interpretable image enhancement curves, inspired by the Photoshop curves tool. Our method, dubbed neural CURve Layers (CURL), is designed as a multi-colour space neural retouching block trained jointly in three different colour spaces (HSV, CIELab, RGB) guided by a novel multi-colour space loss. The curves are fully differentiable and are trained end-to-end for different computer vision problems including photo enhancement (RGB-to-RGB) and as part of the image signal processing pipeline for image formation (RAW-to-RGB). To demonstrate the effectiveness of CURL we combine this global image transformation block with a pixel-level (local) image multi-scale encoder-decoder backbone network. In an extensive experimental evaluation we show that CURL produces state-of-the-art image quality versus recently proposed deep learning approaches in both objective and perceptual metrics, setting new state-of-the-art performance on multiple public datasets.
Tasks	Demosaicking, Denoising, Image Enhancement
Published	2019-11-29
URL	https://arxiv.org/abs/1911.13175v2
PDF	https://arxiv.org/pdf/1911.13175v2.pdf
PWC	https://paperswithcode.com/paper/difar-deep-image-formation-and-retouching
Repo	https://github.com/sjmoran/neural_curve_layers
Framework	none
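
A global enhancement curve maps every pixel intensity through the same one-dimensional function, much like the Photoshop curves tool mentioned in the abstract. The abstract does not specify how the curves are parameterized, so the following is a minimal sketch assuming a piecewise-linear curve defined by evenly spaced knot values on [0, 1]; `apply_curve` and the knot values are illustrative, not the authors' code.

```python
def apply_curve(value, knots):
    """Map an intensity in [0, 1] through a piecewise-linear curve.

    `knots` holds the curve outputs at evenly spaced inputs 0, 1/(n-1), ..., 1.
    """
    if len(knots) < 2:
        raise ValueError("need at least two knots")
    v = min(max(value, 0.0), 1.0)          # clamp the input to [0, 1]
    position = v * (len(knots) - 1)        # position measured in knot intervals
    i = min(int(position), len(knots) - 2) # index of the left knot
    t = position - i                       # fraction of the way to the right knot
    return knots[i] + t * (knots[i + 1] - knots[i])


# A brightening curve (made-up knots): mid-tones are lifted, end points fixed.
brighten = [0.0, 0.6, 1.0]
row = [0.0, 0.25, 0.5, 1.0]
enhanced_row = [apply_curve(px, brighten) for px in row]
```

In a learned setting the knot values would be predicted by a network and, since the interpolation is differentiable in the knots, trained end-to-end.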

Mask Scoring R-CNN


Title	Mask Scoring R-CNN
Authors	Zhaojin Huang, Lichao Huang, Yongchao Gong, Chang Huang, Xinggang Wang
Abstract	Letting a deep network be aware of the quality of its own predictions is an interesting yet important problem. In the task of instance segmentation, the confidence of instance classification is used as mask quality score in most instance segmentation frameworks. However, the mask quality, quantified as the IoU between the instance mask and its ground truth, is usually not well correlated with classification score. In this paper, we study this problem and propose Mask Scoring R-CNN which contains a network block to learn the quality of the predicted instance masks. The proposed network block takes the instance feature and the corresponding predicted mask together to regress the mask IoU. The mask scoring strategy calibrates the misalignment between mask quality and mask score, and improves instance segmentation performance by prioritizing more accurate mask predictions during COCO AP evaluation. By extensive evaluations on the COCO dataset, Mask Scoring R-CNN brings consistent and noticeable gain with different models, and outperforms the state-of-the-art Mask R-CNN. We hope our simple and effective approach will provide a new direction for improving instance segmentation. The source code of our method is available at https://github.com/zjhuang22/maskscoring_rcnn.
Tasks	Instance Segmentation, Semantic Segmentation
Published	2019-03-01
URL	http://arxiv.org/abs/1903.00241v1
PDF	http://arxiv.org/pdf/1903.00241v1.pdf
PWC	https://paperswithcode.com/paper/mask-scoring-r-cnn
Repo	https://github.com/zjhuang22/maskscoring_rcnn
Framework	pytorch
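
The regression target described in the abstract is the IoU between a predicted mask and its ground truth. The sketch below computes that quantity for binary masks and then rescales a classification confidence by a predicted IoU. Combining the two by multiplication is an assumption made here for illustration; the abstract only states that the mask score is calibrated against mask quality. Function names and toy masks are hypothetical.

```python
def mask_iou(pred, truth):
    """Intersection over union of two same-shaped binary masks."""
    intersection = 0
    union = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    return intersection / union if union else 0.0


def mask_score(cls_confidence, predicted_iou):
    """Down-weight the classification confidence by the estimated mask quality."""
    return cls_confidence * predicted_iou


# Toy 2x2 masks (made-up data).
predicted_mask = [[1, 1], [0, 0]]
ground_truth = [[1, 0], [1, 0]]
iou = mask_iou(predicted_mask, ground_truth)
```

With a score of this kind, a confidently classified instance with a poor mask ranks below an equally confident instance with an accurate mask.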

Explicitizing an Implicit Bias of the Frequency Principle in Two-layer Neural Networks


Title	Explicitizing an Implicit Bias of the Frequency Principle in Two-layer Neural Networks
Authors	Yaoyu Zhang, Zhi-Qin John Xu, Tao Luo, Zheng Ma
Abstract	It remains a puzzle why deep neural networks (DNNs), with more parameters than samples, often generalize well. One attempt to understand this puzzle is to discover implicit biases underlying the training process of DNNs, such as the Frequency Principle (F-Principle), i.e., DNNs often fit target functions from low to high frequencies. Inspired by the F-Principle, we propose an effective model of linear F-Principle (LFP) dynamics which accurately predicts the learning results of two-layer ReLU neural networks (NNs) of large widths. This LFP dynamics is rationalized by a linearized mean field residual dynamics of NNs. Importantly, the long-time limit solution of this LFP dynamics is equivalent to the solution of a constrained optimization problem explicitly minimizing an FP-norm, in which higher frequencies of feasible solutions are more heavily penalized. Using this optimization formulation, an a priori estimate of the generalization error bound is provided, revealing that a higher FP-norm of the target function increases the generalization error. Overall, by explicitizing the implicit bias of the F-Principle as an explicit penalty for two-layer NNs, our work makes a step towards a quantitative understanding of the learning and generalization of general DNNs.
Tasks
Published	2019-05-24
URL	https://arxiv.org/abs/1905.10264v1
PDF	https://arxiv.org/pdf/1905.10264v1.pdf
PWC	https://paperswithcode.com/paper/explicitizing-an-implicit-bias-of-the
Repo	https://github.com/xuzhiqin1990/F-Principle
Framework	tf

Leveraging Self-supervised Denoising for Image Segmentation


Title	Leveraging Self-supervised Denoising for Image Segmentation
Authors	Mangal Prakash, Tim-Oliver Buchholz, Manan Lalit, Pavel Tomancak, Florian Jug, Alexander Krull
Abstract	Deep learning (DL) has arguably emerged as the method of choice for the detection and segmentation of biological structures in microscopy images. However, DL typically needs copious amounts of annotated training data, which for biomedical projects is typically not available and excessively expensive to generate. Additionally, tasks become harder in the presence of noise, requiring even more high-quality training data. Hence, we propose to use denoising networks to improve the performance of other DL-based image segmentation methods. More specifically, we present ideas on how state-of-the-art self-supervised CARE networks can improve cell/nuclei segmentation in microscopy data. Using two state-of-the-art baseline methods, U-Net and StarDist, we show that our ideas consistently improve the quality of resulting segmentations, especially when only limited training data for noisy micrographs are available.
Tasks	Denoising, Semantic Segmentation
Published	2019-11-27
URL	https://arxiv.org/abs/1911.12239v3
PDF	https://arxiv.org/pdf/1911.12239v3.pdf
PWC	https://paperswithcode.com/paper/leveraging-self-supervised-denoising-for
Repo	https://github.com/juglab/VoidSeg
Framework	tf

Integrating and querying similar tables from PDF documents using deep learning


Title	Integrating and querying similar tables from PDF documents using deep learning
Authors	Rahul Anand, Hye-Young Paik, Cheng Wang
Abstract	A large amount of the public data produced by enterprises is in semi-structured PDF form. Tabular data extraction from reports and other published data in PDF format is of interest for various data consolidation purposes such as analysing and aggregating financial reports of a company. Queries into the structured tabular data in PDF format are normally processed in an unstructured manner through means like text-match. This is mainly because the binary format of PDF documents is optimized for layout and rendering and does not have great support for automated parsing of data. Moreover, even the same table type in PDF files varies in schema, row or column headers, which makes it difficult for a query plan to cover all relevant tables. This paper proposes a deep learning based method to enable SQL-like query and analysis of financial tables from annual reports in PDF format. This is achieved through table type classification and nearest row search. We demonstrate that using word embeddings trained on Google News for header match clearly outperforms the text-match based approach in traditional databases. We also introduce a practical system that uses this technology to query and analyse finance tables in PDF documents from various sources.
Tasks
Published	2019-01-15
URL	https://arxiv.org/abs/1901.04672v1
PDF	https://arxiv.org/pdf/1901.04672v1.pdf
PWC	https://paperswithcode.com/paper/integrating-and-querying-similar-tables-from
Repo	https://github.com/dhavalpotdar/Bounding-box-Classifier
Framework	none
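
Header matching with word embeddings can be pictured as a nearest-neighbour search under cosine similarity, which lets a query term match a header that shares no characters with it. The sketch below assumes each header is already represented by a single vector; the three-dimensional toy vectors and the function names are invented for illustration, whereas the paper uses embeddings trained on Google News.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_header_match(query_vec, header_vecs):
    """Return the header name whose vector is most similar to the query."""
    return max(header_vecs, key=lambda name: cosine(query_vec, header_vecs[name]))


# Toy header embeddings (made-up values, not real word vectors).
toy_embeddings = {
    "revenue": [0.9, 0.1, 0.0],
    "total assets": [0.1, 0.9, 0.2],
    "employees": [0.0, 0.2, 0.9],
}
match = best_header_match([0.8, 0.2, 0.1], toy_embeddings)
```

A plain text-match would need the query string to appear in the header, which is the limitation the embedding-based comparison is meant to avoid.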

Efficient Reinforcement Learning with a Thought-Game for StarCraft


Title	Efficient Reinforcement Learning with a Thought-Game for StarCraft
Authors	Ruo-Ze Liu, Haifeng Guo, Xiaozhong Ji, Yang Yu, Zhen-Jia Pang, Zitai Xiao, Yuzhou Wu, Tong Lu
Abstract	StarCraft provides an extremely challenging platform for reinforcement learning due to its huge state-space and game length. The previous fastest method requires days to train a full-length game policy in a single commercial machine. Introduction of background knowledge can accelerate the training of reinforcement learning. But how to effectively introduce background knowledge is still an open question. In this paper, we incorporate the background knowledge to reinforcement learning in the form of a thought-game. With the thought-game, the policy is first trained quickly in the thought-game and is then transferred to the real game using mapping functions for the second phase of training. In our experiments, the trained agent can achieve a 100% win-rate on the map Simple64 against the most difficult non-cheating built-in bot (level-7), and the training is 100 times faster than the previous ones under the same computational resource. To test the generalization performance of the agent, a Golden level of StarCraft II Ladder human player has competed with the agent. With restricted strategy, the agent beats the human player in 4 out of 5 games. We also apply the thought-game idea to another game, “StarCraft: Brood War”, the predecessor of StarCraft II. The thought-game approach might shed some light on further studies of efficient reinforcement learning.
Tasks	Starcraft, Starcraft II
Published	2019-03-02
URL	https://arxiv.org/abs/1903.00715v2
PDF	https://arxiv.org/pdf/1903.00715v2.pdf
PWC	https://paperswithcode.com/paper/efficient-reinforcement-learning-with-a-mind
Repo	https://github.com/mindgameSC2/mind-SC2
Framework	tf

UER: An Open-Source Toolkit for Pre-training Models


Title	UER: An Open-Source Toolkit for Pre-training Models
Authors	Zhe Zhao, Hui Chen, Jinbin Zhang, Xin Zhao, Tao Liu, Wei Lu, Xi Chen, Haotang Deng, Qi Ju, Xiaoyong Du
Abstract	Existing works, including ELMO and BERT, have revealed the importance of pre-training for NLP tasks. Since no single pre-training model works best in all cases, it is necessary to develop a framework that is able to deploy various pre-training models efficiently. For this purpose, we propose an assemble-on-demand pre-training toolkit, namely Universal Encoder Representations (UER). UER is loosely coupled, and encapsulated with rich modules. By assembling modules on demand, users can either reproduce a state-of-the-art pre-training model or develop a pre-training model that remains unexplored. With UER, we have built a model zoo, which contains pre-trained models based on different corpora, encoders, and targets (objectives). With proper pre-trained models, we could achieve new state-of-the-art results on a range of downstream datasets.
Tasks
Published	2019-09-12
URL	https://arxiv.org/abs/1909.05658v1
PDF	https://arxiv.org/pdf/1909.05658v1.pdf
PWC	https://paperswithcode.com/paper/uer-an-open-source-toolkit-for-pre-training
Repo	https://github.com/dbiir/UER-py
Framework	pytorch

Spoken Language Intent Detection using Confusion2Vec


Title	Spoken Language Intent Detection using Confusion2Vec
Authors	Prashanth Gurunath Shivakumar, Mu Yang, Panayiotis Georgiou
Abstract	Decoding a speaker’s intent is a crucial part of spoken language understanding (SLU). The presence of noise or errors in the text transcriptions in real-life scenarios makes the task more challenging. In this paper, we address spoken language intent detection under noisy conditions imposed by automatic speech recognition (ASR) systems. We propose to employ the confusion2vec word feature representation to compensate for the errors made by ASR and to increase the robustness of the SLU system. Confusion2vec, motivated by human speech production and perception, models acoustic relationships between words in addition to the semantic and syntactic relations of words in human language. We hypothesize that ASR often makes errors relating to acoustically similar words, and that confusion2vec, with its inherent model of acoustic relationships between words, is able to compensate for the errors. We demonstrate through experiments on the ATIS benchmark dataset the robustness of the proposed model, which achieves state-of-the-art results under noisy ASR conditions. Our system reduces classification error rate (CER) by 20.84% and improves robustness by 37.48% (lower CER degradation) relative to the previous state-of-the-art going from clean to noisy transcripts. Improvements are also demonstrated when training the intent detection models on noisy transcripts.
Tasks	Intent Detection, Speech Recognition, Spoken Language Understanding
Published	2019-04-07
URL	https://arxiv.org/abs/1904.03576v3
PDF	https://arxiv.org/pdf/1904.03576v3.pdf
PWC	https://paperswithcode.com/paper/spoken-language-intent-detection-using
Repo	https://github.com/pgurunath/slu_confusion2vec
Framework	none

Trajectory-Based Off-Policy Deep Reinforcement Learning


Title	Trajectory-Based Off-Policy Deep Reinforcement Learning
Authors	Andreas Doerr, Michael Volpp, Marc Toussaint, Sebastian Trimpe, Christian Daniel
Abstract	Policy gradient methods are powerful reinforcement learning algorithms and have been demonstrated to solve many complex tasks. However, these methods are also data-inefficient, afflicted with high variance gradient estimates, and frequently get stuck in local optima. This work addresses these weaknesses by combining recent improvements in the reuse of off-policy data and exploration in parameter space with deterministic behavioral policies. The resulting objective is amenable to standard neural network optimization strategies like stochastic gradient descent or stochastic gradient Hamiltonian Monte Carlo. Incorporation of previous rollouts via importance sampling greatly improves data-efficiency, whilst stochastic optimization schemes facilitate the escape from local optima. We evaluate the proposed approach on a series of continuous control benchmark tasks. The results show that the proposed algorithm is able to successfully and reliably learn solutions using fewer system interactions than standard policy gradient methods.
Tasks	Continuous Control, Policy Gradient Methods, Stochastic Optimization
Published	2019-05-14
URL	https://arxiv.org/abs/1905.05710v1
PDF	https://arxiv.org/pdf/1905.05710v1.pdf
PWC	https://paperswithcode.com/paper/trajectory-based-off-policy-deep
Repo	https://github.com/boschresearch/DD_OPG
Framework	tf
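
Reusing old rollouts via importance sampling means reweighting each stored return by how much more (or less) likely its trajectory is under the current policy than under the policy that generated it. The sketch below is a generic ordinary importance sampling estimator, not the specific objective proposed in the paper; the function name and the toy numbers are hypothetical.

```python
import math


def importance_sampling_return(returns, target_logps, behavior_logps):
    """Estimate the expected return under a target policy from rollouts
    collected under a different behavior policy.

    Each rollout is weighted by exp(log p_target - log p_behavior).
    """
    if not (len(returns) == len(target_logps) == len(behavior_logps)):
        raise ValueError("inputs must have the same length")
    if not returns:
        raise ValueError("need at least one rollout")
    weights = [math.exp(t - b) for t, b in zip(target_logps, behavior_logps)]
    return sum(w * r for w, r in zip(weights, returns)) / len(returns)


# Two stored rollouts (made-up values): returns and trajectory log-probabilities.
estimate = importance_sampling_return(
    returns=[1.0, 3.0],
    target_logps=[math.log(0.5), math.log(0.5)],
    behavior_logps=[math.log(0.25), math.log(0.75)],
)
```

The weights can vary widely when the two policies differ a lot, which is one source of the high-variance estimates the abstract mentions.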