Paper Group AWR 168
Autoencoders, Kernels, and Multilayer Perceptrons for Electron Micrograph Restoration and Compression. Learning 3D Shape Completion under Weak Supervision. Denoising of 3-D Magnetic Resonance Images Using a Residual Encoder-Decoder Wasserstein Generative Adversarial Network. A Dataset and Benchmark for Large-scale Multi-modal Face Anti-spoofing. Co …
Autoencoders, Kernels, and Multilayer Perceptrons for Electron Micrograph Restoration and Compression
Title | Autoencoders, Kernels, and Multilayer Perceptrons for Electron Micrograph Restoration and Compression |
Authors | Jeffrey M. Ede |
Abstract | We present 14 autoencoders, 15 kernels and 14 multilayer perceptrons for electron micrograph restoration and compression. These have been trained for transmission electron microscopy (TEM), scanning transmission electron microscopy (STEM) and for both (TEM+STEM). TEM autoencoders have been trained for 1$\times$, 4$\times$, 16$\times$ and 64$\times$ compression, STEM autoencoders for 1$\times$, 4$\times$ and 16$\times$ compression and TEM+STEM autoencoders for 1$\times$, 2$\times$, 4$\times$, 8$\times$, 16$\times$, 32$\times$ and 64$\times$ compression. Kernels and multilayer perceptrons have been trained to approximate the denoising effect of the 4$\times$ compression autoencoders. Kernels for input sizes of 3, 5, 7, 11 and 15 have been fitted for TEM, STEM and TEM+STEM. TEM multilayer perceptrons have been trained with 1 hidden layer for input sizes of 3, 5 and 7 and with 2 hidden layers for input sizes of 5 and 7. STEM multilayer perceptrons have been trained with 1 hidden layer for input sizes of 3, 5 and 7. TEM+STEM multilayer perceptrons have been trained with 1 hidden layer for input sizes of 3, 5, 7 and 11 and with 2 hidden layers for input sizes of 3 and 7. Our code, example usage and pre-trained models are available at https://github.com/Jeffrey-Ede/Denoising-Kernels-MLPs-Autoencoders |
Tasks | Denoising |
Published | 2018-08-29 |
URL | http://arxiv.org/abs/1808.09916v1 |
http://arxiv.org/pdf/1808.09916v1.pdf | |
PWC | https://paperswithcode.com/paper/autoencoders-kernels-and-multilayer |
Repo | https://github.com/Jeffrey-Ede/Denoising-Kernels-MLPs-Autoencoders |
Framework | tf |
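As a rough illustration of how a small multilayer perceptron can approximate the denoising effect of an autoencoder on fixed-size patches, here is a minimal sketch; it is not the authors' released models, and the patch size, layer widths and training target below are assumptions.

```python
import torch
import torch.nn as nn

class PatchDenoisingMLP(nn.Module):
    """Maps a flattened k x k noisy patch to a denoised centre-pixel value.

    A minimal stand-in for the paper's patch-level MLPs; the real models,
    input sizes and training data are in the linked repository.
    """
    def __init__(self, k: int = 5, hidden: int = 64, num_hidden_layers: int = 1):
        super().__init__()
        layers, width = [], k * k
        for _ in range(num_hidden_layers):
            layers += [nn.Linear(width, hidden), nn.ReLU()]
            width = hidden
        layers.append(nn.Linear(width, 1))  # predict the denoised centre value
        self.net = nn.Sequential(*layers)

    def forward(self, patches):             # patches: (N, k*k)
        return self.net(patches)

# Training would regress against the 4x-compression autoencoder's output
# (the "teacher") on the same patches, e.g. with an L1 or L2 loss.
model = PatchDenoisingMLP(k=5, hidden=64, num_hidden_layers=1)
noisy = torch.rand(8, 25)                    # eight random 5x5 patches, flattened
print(model(noisy).shape)                    # torch.Size([8, 1])
```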
Learning 3D Shape Completion under Weak Supervision
Title | Learning 3D Shape Completion under Weak Supervision |
Authors | David Stutz, Andreas Geiger |
Abstract | We address the problem of 3D shape completion from sparse and noisy point clouds, a fundamental problem in computer vision and robotics. Recent approaches are either data-driven or learning-based: Data-driven approaches rely on a shape model whose parameters are optimized to fit the observations; Learning-based approaches, in contrast, avoid the expensive optimization step by learning to directly predict complete shapes from incomplete observations in a fully-supervised setting. However, full supervision is often not available in practice. In this work, we propose a weakly-supervised learning-based approach to 3D shape completion which neither requires slow optimization nor direct supervision. While we also learn a shape prior on synthetic data, we amortize, i.e., learn, maximum likelihood fitting using deep neural networks resulting in efficient shape completion without sacrificing accuracy. On synthetic benchmarks based on ShapeNet and ModelNet as well as on real robotics data from KITTI and Kinect, we demonstrate that the proposed amortized maximum likelihood approach is able to compete with recent fully supervised baselines and outperforms data-driven approaches, while requiring less supervision and being significantly faster. |
Tasks | |
Published | 2018-05-18 |
URL | http://arxiv.org/abs/1805.07290v2 |
http://arxiv.org/pdf/1805.07290v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-3d-shape-completion-under-weak |
Repo | https://github.com/davidstutz/aml-improved-shape-completion |
Framework | pytorch |
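A highly simplified sketch of the amortized maximum-likelihood idea described above: an encoder predicts the latent code of a pre-trained shape prior, and the loss uses only the incomplete observations plus a prior term. Network sizes, voxel resolution and loss weights below are assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn

latent_dim = 32

# Stand-in for a shape prior decoder pre-trained on synthetic data
# (e.g. the decoder of a VAE over ShapeNet voxel grids).
decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                        nn.Linear(256, 32 * 32 * 32), nn.Sigmoid())

# Encoder that amortizes maximum-likelihood fitting: it maps an incomplete
# observation grid directly to a latent code z (no complete shapes needed).
encoder = nn.Sequential(nn.Linear(32 * 32 * 32, 256), nn.ReLU(),
                        nn.Linear(256, latent_dim))

def weakly_supervised_loss(obs_occ, obs_mask, z):
    """Fit the decoded shape only where the sensor observed something
    (obs_mask == 1), plus a Gaussian prior on the latent code."""
    pred = decoder(z).view_as(obs_occ)
    data_term = ((pred - obs_occ) ** 2 * obs_mask).sum() / obs_mask.sum()
    prior_term = 1e-3 * (z ** 2).mean()
    return data_term + prior_term

obs = torch.rand(4, 32, 32, 32)                       # fake occupancy observations
mask = (torch.rand(4, 32, 32, 32) > 0.9).float()      # sparse observed cells
z = encoder(obs.view(4, -1))
loss = weakly_supervised_loss(obs, mask, z)
loss.backward()
```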
Denoising of 3-D Magnetic Resonance Images Using a Residual Encoder-Decoder Wasserstein Generative Adversarial Network
Title | Denoising of 3-D Magnetic Resonance Images Using a Residual Encoder-Decoder Wasserstein Generative Adversarial Network |
Authors | Maosong Ran, Jinrong Hu, Yang Chen, Hu Chen, Huaiqiang Sun, Jiliu Zhou, Yi Zhang |
Abstract | Structure-preserved denoising of 3D magnetic resonance imaging (MRI) images is a critical step in medical image analysis. Over the past few years, many algorithms with impressive performances have been proposed. In this paper, inspired by the idea of deep learning, we introduce an MRI denoising method based on the residual encoder-decoder Wasserstein generative adversarial network (RED-WGAN). Specifically, to explore the structure similarity between neighboring slices, a 3D configuration is utilized as the basic processing unit. Residual autoencoders combined with deconvolution operations are introduced into the generator network. Furthermore, to alleviate the oversmoothing shortcoming of the traditional mean squared error (MSE) loss function, the perceptual similarity, which is implemented by calculating the distances in the feature space extracted by a pretrained VGG-19 network, is incorporated with the MSE and adversarial losses to form the new loss function. Extensive experiments are implemented to assess the performance of the proposed method. The experimental results show that the proposed RED-WGAN achieves performance superior to several state-of-the-art methods in both simulated and real clinical data. In particular, our method demonstrates powerful abilities in both noise suppression and structure preservation. |
Tasks | Denoising |
Published | 2018-08-12 |
URL | https://arxiv.org/abs/1808.03941v2 |
https://arxiv.org/pdf/1808.03941v2.pdf | |
PWC | https://paperswithcode.com/paper/denoising-of-3-d-magnetic-resonance-images |
Repo | https://github.com/Deep-Imaging-Group/RED-WGAN |
Framework | pytorch |
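The composite loss described in the abstract (MSE + VGG-feature perceptual distance + Wasserstein adversarial term) can be sketched roughly as below. The loss weights, the VGG layer cut-off and the slice-wise handling of the 3-D volumes are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19

# Feature extractor for the perceptual term; in practice the ImageNet
# pre-trained weights would be loaded instead of weights=None.
vgg_features = vgg19(weights=None).features[:16].eval()
for p in vgg_features.parameters():
    p.requires_grad_(False)

mse = nn.MSELoss()

def generator_loss(denoised, clean, critic, w_mse=1.0, w_perc=0.1, w_adv=1e-3):
    """Illustrative RED-WGAN-style generator loss on 2-D MR slices.

    denoised, clean: (N, 1, H, W) slices; critic: the WGAN critic network.
    """
    # VGG expects 3 channels, so repeat the single MR channel.
    f_den = vgg_features(denoised.repeat(1, 3, 1, 1))
    f_cln = vgg_features(clean.repeat(1, 3, 1, 1))
    perceptual = mse(f_den, f_cln)
    adversarial = -critic(denoised).mean()          # Wasserstein generator term
    return w_mse * mse(denoised, clean) + w_perc * perceptual + w_adv * adversarial

# Tiny placeholder critic, just to make the sketch runnable.
critic = nn.Sequential(nn.Conv2d(1, 8, 3, 2, 1), nn.LeakyReLU(0.2),
                       nn.Flatten(), nn.LazyLinear(1))
loss = generator_loss(torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64), critic)
print(loss.item())
```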
A Dataset and Benchmark for Large-scale Multi-modal Face Anti-spoofing
Title | A Dataset and Benchmark for Large-scale Multi-modal Face Anti-spoofing |
Authors | Shifeng Zhang, Xiaobo Wang, Ajian Liu, Chenxu Zhao, Jun Wan, Sergio Escalera, Hailin Shi, Zezheng Wang, Stan Z. Li |
Abstract | Face anti-spoofing is essential to prevent face recognition systems from security breaches. Much of the recent progress has been driven by the availability of face anti-spoofing benchmark datasets. However, existing face anti-spoofing benchmarks have a limited number of subjects ($\le\negmedspace170$) and modalities ($\leq\negmedspace2$), which hinders the further development of the academic community. To facilitate face anti-spoofing research, we introduce a large-scale multi-modal dataset, namely CASIA-SURF, which is the largest publicly available dataset for face anti-spoofing in terms of both subjects and visual modalities. Specifically, it consists of $1,000$ subjects with $21,000$ videos and each sample has $3$ modalities (i.e., RGB, Depth and IR). We also provide a measurement set, evaluation protocol and training/validation/testing subsets, developing a new benchmark for face anti-spoofing. Moreover, we present a new multi-modal fusion method as a baseline, which performs feature re-weighting to select the more informative channel features while suppressing the less useful ones for each modality. Extensive experiments have been conducted on the proposed dataset to verify its significance and generalization capability. The dataset is available at https://sites.google.com/qq.com/chalearnfacespoofingattackdete |
Tasks | Face Anti-Spoofing, Face Recognition |
Published | 2018-12-02 |
URL | http://arxiv.org/abs/1812.00408v3 |
http://arxiv.org/pdf/1812.00408v3.pdf | |
PWC | https://paperswithcode.com/paper/casia-surf-a-dataset-and-benchmark-for-large |
Repo | https://github.com/SoftwareGift/FeatherNets_Face-Anti-spoofing-Attack-Detection-Challenge-CVPR2019 |
Framework | pytorch |
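A minimal sketch of the squeeze-and-excitation-style feature re-weighting fusion the baseline describes, with separate RGB/depth/IR branches feeding a shared gate. Channel counts and the exact placement of the re-weighting are assumptions.

```python
import torch
import torch.nn as nn

class ReweightingFusion(nn.Module):
    """Concatenate per-modality features and learn channel-wise weights
    that emphasise the more informative modality channels."""
    def __init__(self, channels_per_modality=64, num_modalities=3, reduction=16):
        super().__init__()
        c = channels_per_modality * num_modalities
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(c, c // reduction), nn.ReLU(),
            nn.Linear(c // reduction, c), nn.Sigmoid())

    def forward(self, rgb_feat, depth_feat, ir_feat):
        fused = torch.cat([rgb_feat, depth_feat, ir_feat], dim=1)   # (N, 3*C, H, W)
        weights = self.gate(fused).unsqueeze(-1).unsqueeze(-1)      # (N, 3*C, 1, 1)
        return fused * weights                                      # re-weighted features

fusion = ReweightingFusion()
out = fusion(torch.rand(2, 64, 28, 28), torch.rand(2, 64, 28, 28), torch.rand(2, 64, 28, 28))
print(out.shape)   # torch.Size([2, 192, 28, 28])
```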
Compact and Efficient Encodings for Planning in Factored State and Action Spaces with Learned Binarized Neural Network Transition Models
Title | Compact and Efficient Encodings for Planning in Factored State and Action Spaces with Learned Binarized Neural Network Transition Models |
Authors | Buser Say, Scott Sanner |
Abstract | In this paper, we leverage the efficiency of Binarized Neural Networks (BNNs) to learn complex state transition models of planning domains with discretized factored state and action spaces. In order to directly exploit this transition structure for planning, we present two novel compilations of the learned factored planning problem with BNNs based on reductions to Weighted Partial Maximum Boolean Satisfiability (FD-SAT-Plan+) as well as Binary Linear Programming (FD-BLP-Plan+). Theoretically, we show that our SAT-based Bi-Directional Neuron Activation Encoding is asymptotically the most compact encoding in the literature and maintains the generalized arc-consistency property through unit propagation – an important property that facilitates efficiency in SAT solvers. Experimentally, we validate the computational efficiency of our Bi-Directional Neuron Activation Encoding in comparison to an existing neuron activation encoding and demonstrate the effectiveness of learning complex transition models with BNNs. We test the runtime efficiency of both FD-SAT-Plan+ and FD-BLP-Plan+ on the learned factored planning problem showing that FD-SAT-Plan+ scales better with increasing BNN size and complexity. Finally, we present a finite-time incremental constraint generation algorithm based on generalized landmark constraints to improve the planning accuracy of our encodings through simulated or real-world interaction. |
Tasks | |
Published | 2018-11-26 |
URL | http://arxiv.org/abs/1811.10433v9 |
http://arxiv.org/pdf/1811.10433v9.pdf | |
PWC | https://paperswithcode.com/paper/compact-and-efficient-encodings-for-planning |
Repo | https://github.com/saybuser/FD-SAT-Plan |
Framework | none |
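To make the compilation idea concrete, the sketch below emits big-M binary-linear constraints that tie a binarized neuron's 0/1 activation variable to its binary inputs, in the spirit of the FD-BLP-style compilation. It is a simplified stand-in for illustration only, not the paper's Bi-Directional Neuron Activation Encoding.

```python
def bnn_neuron_constraints(weights, in_vars, out_var):
    """Big-M linear constraints linking a binarized neuron's 0/1 activation
    variable `out_var` to 0/1 input variables `in_vars`.

    The neuron fires (out_var = 1) iff sum_i w_i * (2*x_i - 1) >= 0, i.e. the
    +/-1-weighted sum of +/-1-encoded inputs is non-negative.
    weights: list of +1/-1 integers.  Returns the constraints as strings.
    """
    n = len(weights)
    big_m = n + 1
    # s = sum_i 2*w_i*x_i - sum_i w_i  (an integer in [-n, n])
    lin = " + ".join(f"{2 * w}*{x}" for w, x in zip(weights, in_vars))
    offset = -sum(weights)
    c1 = f"{lin} + {offset} >= -{big_m}*(1 - {out_var})"   # out_var = 1  =>  s >= 0
    c2 = f"{lin} + {offset} <= {big_m}*{out_var} - 1"      # out_var = 0  =>  s <= -1
    return [c1, c2]

print(bnn_neuron_constraints([+1, -1, +1], ["x1", "x2", "x3"], "y"))
```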
Theoretical Linear Convergence of Unfolded ISTA and its Practical Weights and Thresholds
Title | Theoretical Linear Convergence of Unfolded ISTA and its Practical Weights and Thresholds |
Authors | Xiaohan Chen, Jialin Liu, Zhangyang Wang, Wotao Yin |
Abstract | In recent years, unfolding iterative algorithms as neural networks has become an empirical success in solving sparse recovery problems. However, its theoretical understanding is still immature, which prevents us from fully utilizing the power of neural networks. In this work, we study unfolded ISTA (Iterative Shrinkage Thresholding Algorithm) for sparse signal recovery. We introduce a weight structure that is necessary for asymptotic convergence to the true sparse signal. With this structure, unfolded ISTA can attain a linear convergence, which is better than the sublinear convergence of ISTA/FISTA in general cases. Furthermore, we propose to incorporate thresholding in the network to perform support selection, which is easy to implement and able to boost the convergence rate both theoretically and empirically. Extensive simulations, including sparse vector recovery and a compressive sensing experiment on real image data, corroborate our theoretical results and demonstrate their practical usefulness. We have made our codes publicly available: https://github.com/xchen-tamu/linear-lista-cpss. |
Tasks | Compressive Sensing |
Published | 2018-08-29 |
URL | http://arxiv.org/abs/1808.10038v2 |
http://arxiv.org/pdf/1808.10038v2.pdf | |
PWC | https://paperswithcode.com/paper/theoretical-linear-convergence-of-unfolded |
Repo | https://github.com/TAMU-VITA/LISTA-CPSS |
Framework | tf |
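A small NumPy sketch of one unfolded ISTA step with the support-selection trick mentioned above (entries with the largest magnitudes bypass the shrinkage). The untrained ISTA-style weights, step size and threshold here are simplified assumptions, not the paper's learned parameters.

```python
import numpy as np

def soft_threshold(v, theta):
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def lista_ss_step(x, y, A, theta, step=0.1, keep_frac=0.1):
    """One unfolded ISTA step with support selection.

    x: current sparse estimate, y: measurements, A: sensing matrix.
    Entries in the top `keep_frac` fraction by magnitude are exempt from
    shrinkage, which is the 'support selection' idea in the paper.
    """
    r = x + step * A.T @ (y - A @ x)            # gradient step
    k = max(1, int(keep_frac * len(r)))
    keep = np.argsort(np.abs(r))[-k:]           # indices exempt from shrinkage
    out = soft_threshold(r, theta)
    out[keep] = r[keep]
    return out

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 200)) / np.sqrt(50)
x_true = np.zeros(200)
x_true[rng.choice(200, 10, replace=False)] = rng.standard_normal(10)
y = A @ x_true
x = np.zeros(200)
for _ in range(50):
    x = lista_ss_step(x, y, A, theta=0.01)
print(np.linalg.norm(x - x_true) / np.linalg.norm(x_true))   # relative recovery error
```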
Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation
Title | Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation |
Authors | Chao Li, Qiaoyong Zhong, Di Xie, Shiliang Pu |
Abstract | Skeleton-based human action recognition has recently drawn increasing attention thanks to the availability of large-scale skeleton datasets. The most crucial factors for this task lie in two aspects: the intra-frame representation for joint co-occurrences and the inter-frame representation for skeletons’ temporal evolutions. In this paper we propose an end-to-end convolutional co-occurrence feature learning framework. The co-occurrence features are learned with a hierarchical methodology, in which different levels of contextual information are aggregated gradually. Firstly, point-level information of each joint is encoded independently. Then it is assembled into semantic representations in both spatial and temporal domains. Specifically, we introduce a global spatial aggregation scheme, which is able to learn superior joint co-occurrence features over local aggregation. Besides, raw skeleton coordinates as well as their temporal differences are integrated with a two-stream paradigm. Experiments show that our approach consistently outperforms other state-of-the-art methods on action recognition and detection benchmarks such as NTU RGB+D, SBU Kinect Interaction and PKU-MMD. |
Tasks | RF-based Pose Estimation, Skeleton Based Action Recognition, Temporal Action Localization |
Published | 2018-04-17 |
URL | http://arxiv.org/abs/1804.06055v1 |
http://arxiv.org/pdf/1804.06055v1.pdf | |
PWC | https://paperswithcode.com/paper/co-occurrence-feature-learning-from-skeleton |
Repo | https://github.com/huguyuehuhu/HCN-pytorch |
Framework | pytorch |
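The global spatial aggregation idea, first encoding each joint independently and then moving the joint dimension into the channel axis so that subsequent convolutions mix information across all joints, can be sketched as below. Channel sizes and kernel shapes are illustrative assumptions; a second stream would take frame-to-frame coordinate differences as input.

```python
import torch
import torch.nn as nn

class CooccurrenceBlock(nn.Module):
    def __init__(self, in_channels=3, point_channels=32, out_channels=64, num_joints=25):
        super().__init__()
        # Point-level encoding: a 1x1 conv treats every joint independently.
        self.point = nn.Conv2d(in_channels, point_channels, kernel_size=1)
        # After the transpose, the joint axis becomes the channel axis, so this
        # convolution aggregates over all joints globally (co-occurrence).
        self.global_agg = nn.Conv2d(num_joints, out_channels, kernel_size=(3, 1), padding=(1, 0))

    def forward(self, x):                     # x: (N, 3, T, J) skeleton coordinates
        x = torch.relu(self.point(x))         # (N, C1, T, J)
        x = x.permute(0, 3, 2, 1)             # (N, J, T, C1) -- joints moved into channels
        return torch.relu(self.global_agg(x)) # (N, C2, T, C1)

block = CooccurrenceBlock()
print(block(torch.rand(4, 3, 64, 25)).shape)   # torch.Size([4, 64, 64, 32])
```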
ECG arrhythmia classification using a 2-D convolutional neural network
Title | ECG arrhythmia classification using a 2-D convolutional neural network |
Authors | Tae Joon Jun, Hoang Minh Nguyen, Daeyoun Kang, Dohyeun Kim, Daeyoung Kim, Young-Hak Kim |
Abstract | In this paper, we propose an effective electrocardiogram (ECG) arrhythmia classification method using a deep two-dimensional convolutional neural network (CNN), which has recently shown outstanding performance in the field of pattern recognition. Every ECG beat was transformed into a two-dimensional grayscale image as input data for the CNN classifier. Optimization of the proposed CNN classifier includes various deep learning techniques such as batch normalization, data augmentation, Xavier initialization, and dropout. In addition, we compared our proposed classifier with two well-known CNN models: AlexNet and VGGNet. ECG recordings from the MIT-BIH arrhythmia database were used for the evaluation of the classifier. As a result, our classifier achieved 99.05% average accuracy with 97.85% average sensitivity. To precisely validate our CNN classifier, 10-fold cross-validation was performed in the evaluation, which uses every ECG recording as test data. Our experimental results successfully validate that the proposed CNN classifier with the transformed ECG images can achieve excellent classification accuracy without any manual pre-processing of the ECG signals such as noise filtering, feature extraction, and feature reduction. |
Tasks | Arrhythmia Detection, Data Augmentation, Electrocardiography (ECG) |
Published | 2018-04-18 |
URL | http://arxiv.org/abs/1804.06812v1 |
http://arxiv.org/pdf/1804.06812v1.pdf | |
PWC | https://paperswithcode.com/paper/ecg-arrhythmia-classification-using-a-2-d |
Repo | https://github.com/lorenzobrusco/ECGNeuralNetwork |
Framework | tf |
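A sketch of the beat-to-image transformation described above, rendering one ECG beat as a small grayscale image that a 2-D CNN can consume. The 128x128 resolution and the plain line plot are assumptions about the preprocessing, not the paper's exact pipeline.

```python
import io
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from PIL import Image

def beat_to_image(beat, size=128):
    """Render a 1-D ECG beat (array of samples) as a size x size grayscale image in [0, 1]."""
    fig, ax = plt.subplots(figsize=(2, 2), dpi=size // 2)
    ax.plot(beat, color="black", linewidth=1)
    ax.axis("off")
    buf = io.BytesIO()
    fig.savefig(buf, format="png", bbox_inches="tight", pad_inches=0)
    plt.close(fig)
    buf.seek(0)
    img = Image.open(buf).convert("L").resize((size, size))
    return np.asarray(img, dtype=np.float32) / 255.0

# Synthetic stand-in for one segmented beat from the MIT-BIH recordings.
beat = np.sin(np.linspace(0, 2 * np.pi, 260)) + 0.05 * np.random.randn(260)
img = beat_to_image(beat)
print(img.shape, img.min(), img.max())       # (128, 128) values in [0, 1]
```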
Discourse Embellishment Using a Deep Encoder-Decoder Network
Title | Discourse Embellishment Using a Deep Encoder-Decoder Network |
Authors | Leonid Berov, Kai Standvoss |
Abstract | We suggest a new NLG task in the context of the discourse generation pipeline of computational storytelling systems. This task, textual embellishment, is defined by taking a text as input and generating a semantically equivalent output with increased lexical and syntactic complexity. Ideally, this would allow the authors of computational storytellers to implement just lightweight NLG systems and use a domain-independent embellishment module to translate its output into more literary text. We present promising first results on this task using LSTM Encoder-Decoder networks trained on the WikiLarge dataset. Furthermore, we introduce “Compiled Computer Tales”, a corpus of computationally generated stories, that can be used to test the capabilities of embellishment algorithms. |
Tasks | |
Published | 2018-10-18 |
URL | http://arxiv.org/abs/1810.08076v1 |
http://arxiv.org/pdf/1810.08076v1.pdf | |
PWC | https://paperswithcode.com/paper/discourse-embellishment-using-a-deep-encoder |
Repo | https://github.com/cartisan/CompiledComputerTales |
Framework | tf |
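A minimal LSTM encoder-decoder for the embellishment task as described (simple input sentences mapped to more complex targets, e.g. WikiLarge pairs used in reverse). Vocabulary size, hidden size and the teacher-forced training step below are assumptions for illustration.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, vocab_size=10000, emb=128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True)
        self.decoder = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, src_ids, tgt_ids):
        _, state = self.encoder(self.embed(src_ids))      # encode the plain sentence
        dec_out, _ = self.decoder(self.embed(tgt_ids), state)
        return self.out(dec_out)                          # logits over the vocabulary

model = Seq2Seq()
src = torch.randint(0, 10000, (2, 12))    # simple input sentences (token ids)
tgt = torch.randint(0, 10000, (2, 15))    # embellished targets
# In practice the decoder inputs would be the targets shifted right (teacher forcing).
logits = model(src, tgt)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 10000), tgt.reshape(-1))
loss.backward()
```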
Non-local Meets Global: An Integrated Paradigm for Hyperspectral Denoising
Title | Non-local Meets Global: An Integrated Paradigm for Hyperspectral Denoising |
Authors | Wei He, Quanming Yao, Chao Li, Naoto Yokoya, Qibin Zhao |
Abstract | Non-local low-rank tensor approximation has been developed as a state-of-the-art method for hyperspectral image (HSI) denoising. Unfortunately, as the number of spectral bands grows, the running time of these methods increases significantly while their denoising performance benefits little. In this paper, we claim that the HSI lies in a global spectral low-rank subspace, and that the spectral subspace of each full-band patch group should lie in this global low-rank subspace. This motivates us to propose a unified spatial-spectral paradigm for HSI denoising. As the new model is hard to optimize, we further propose an efficient algorithm, motivated by alternating minimization. It first learns a low-dimensional projection and the related reduced image from the noisy HSI. Then, non-local low-rank denoising and iterative regularization are developed to refine the reduced image and the projection, respectively. Finally, experiments on both synthetic and real datasets demonstrate its superiority over other state-of-the-art HSI denoising methods. |
Tasks | Denoising |
Published | 2018-12-11 |
URL | http://arxiv.org/abs/1812.04243v2 |
http://arxiv.org/pdf/1812.04243v2.pdf | |
PWC | https://paperswithcode.com/paper/non-local-meets-global-an-integrated-paradigm |
Repo | https://github.com/quanmingyao/NGMeet |
Framework | none |
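The global spectral low-rank subspace idea can be sketched with a plain SVD: project the noisy HSI onto a few spectral basis vectors, denoise the reduced image, and project back. The Gaussian filter below is only a placeholder for the paper's non-local low-rank denoiser and iterative regularization, and the subspace dimension is an assumption.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def spectral_subspace_denoise(hsi, k=6):
    """hsi: (H, W, B) noisy hyperspectral cube; k: spectral subspace dimension."""
    h, w, b = hsi.shape
    y = hsi.reshape(-1, b)                        # (H*W, B)
    # Orthogonal spectral basis from the top-k right singular vectors.
    _, _, vt = np.linalg.svd(y, full_matrices=False)
    v = vt[:k].T                                  # (B, k)
    reduced = (y @ v).reshape(h, w, k)            # reduced image of k "eigen-bands"
    # Placeholder spatial denoiser applied band-by-band to the reduced image.
    reduced = np.stack([gaussian_filter(reduced[..., i], sigma=1.0) for i in range(k)], axis=-1)
    return (reduced.reshape(-1, k) @ v.T).reshape(h, w, b)

noisy = np.random.rand(32, 32, 60).astype(np.float32)
print(spectral_subspace_denoise(noisy).shape)     # (32, 32, 60)
```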
Fine-grained Entity Typing through Increased Discourse Context and Adaptive Classification Thresholds
Title | Fine-grained Entity Typing through Increased Discourse Context and Adaptive Classification Thresholds |
Authors | Sheng Zhang, Kevin Duh, Benjamin Van Durme |
Abstract | Fine-grained entity typing is the task of assigning fine-grained semantic types to entity mentions. We propose a neural architecture which learns a distributional semantic representation that leverages a greater amount of semantic context – both document and sentence level information – than prior work. We find that additional context improves performance, with further improvements gained by utilizing adaptive classification thresholds. Experiments show that our approach without reliance on hand-crafted features achieves the state-of-the-art results on three benchmark datasets. |
Tasks | Entity Typing |
Published | 2018-04-21 |
URL | http://arxiv.org/abs/1804.08000v1 |
http://arxiv.org/pdf/1804.08000v1.pdf | |
PWC | https://paperswithcode.com/paper/fine-grained-entity-typing-through-increased |
Repo | https://github.com/sheng-z/figet |
Framework | pytorch |
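The adaptive classification thresholds can be illustrated simply: instead of one fixed cut-off for every type, each type's threshold is chosen on development data so that its F1 is maximised. The candidate grid and per-type F1 criterion below are assumptions for illustration.

```python
import numpy as np

def adaptive_thresholds(dev_scores, dev_labels, candidates=np.linspace(0.05, 0.95, 19)):
    """dev_scores, dev_labels: (num_mentions, num_types) arrays of predicted
    probabilities and 0/1 gold labels.  Returns one threshold per type."""
    thresholds = np.empty(dev_scores.shape[1])
    for t in range(dev_scores.shape[1]):
        best_f1, best_thr = -1.0, 0.5
        for thr in candidates:
            pred = dev_scores[:, t] >= thr
            tp = np.sum(pred & (dev_labels[:, t] == 1))
            prec = tp / max(pred.sum(), 1)
            rec = tp / max((dev_labels[:, t] == 1).sum(), 1)
            f1 = 2 * prec * rec / max(prec + rec, 1e-9)
            if f1 > best_f1:
                best_f1, best_thr = f1, thr
        thresholds[t] = best_thr
    return thresholds

scores = np.random.rand(100, 5)
labels = (np.random.rand(100, 5) > 0.7).astype(int)
print(adaptive_thresholds(scores, labels))
```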
Graph-Based Global Reasoning Networks
Title | Graph-Based Global Reasoning Networks |
Authors | Yunpeng Chen, Marcus Rohrbach, Zhicheng Yan, Shuicheng Yan, Jiashi Feng, Yannis Kalantidis |
Abstract | Globally modeling and reasoning over relations between regions can be beneficial for many computer vision tasks on both images and videos. Convolutional Neural Networks (CNNs) excel at modeling local relations by convolution operations, but they are typically inefficient at capturing global relations between distant regions and require stacking multiple convolution layers. In this work, we propose a new approach for reasoning globally in which a set of features are globally aggregated over the coordinate space and then projected to an interaction space where relational reasoning can be efficiently computed. After reasoning, relation-aware features are distributed back to the original coordinate space for down-stream tasks. We further present a highly efficient instantiation of the proposed approach and introduce the Global Reasoning unit (GloRe unit) that implements the coordinate-interaction space mapping by weighted global pooling and weighted broadcasting, and the relation reasoning via graph convolution on a small graph in interaction space. The proposed GloRe unit is lightweight, end-to-end trainable and can be easily plugged into existing CNNs for a wide range of tasks. Extensive experiments show our GloRe unit can consistently boost the performance of state-of-the-art backbone architectures, including ResNet, ResNeXt, SE-Net and DPN, for both 2D and 3D CNNs, on image classification, semantic segmentation and video action recognition task. |
Tasks | Image Classification, Relational Reasoning, Semantic Segmentation, Temporal Action Localization |
Published | 2018-11-30 |
URL | http://arxiv.org/abs/1811.12814v1 |
http://arxiv.org/pdf/1811.12814v1.pdf | |
PWC | https://paperswithcode.com/paper/graph-based-global-reasoning-networks |
Repo | https://github.com/facebookresearch/GloRe |
Framework | pytorch |
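A condensed sketch of the GloRe unit's coordinate-to-interaction-space mapping: weighted pooling projects the feature map onto a small set of graph nodes, a graph-convolution-style update reasons over those nodes, and weighted broadcasting maps the result back. The node count, reduced channel width and the single simplified graph step are assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

class GlobalReasoningUnit(nn.Module):
    def __init__(self, channels=256, reduced=64, nodes=16):
        super().__init__()
        self.reduce = nn.Conv2d(channels, reduced, 1)     # channel reduction
        self.proj = nn.Conv2d(channels, nodes, 1)         # projection weights B
        self.node_conv = nn.Conv1d(nodes, nodes, 1)       # mixing over graph nodes
        self.state_conv = nn.Conv1d(reduced, reduced, 1)  # per-node state update
        self.expand = nn.Conv2d(reduced, channels, 1)     # back to original width

    def forward(self, x):                                 # x: (N, C, H, W)
        n, c, h, w = x.shape
        feats = self.reduce(x).view(n, -1, h * w)         # (N, C', HW)
        b = self.proj(x).view(n, -1, h * w)               # (N, K, HW)
        nodes = torch.bmm(b, feats.transpose(1, 2))       # weighted pooling -> (N, K, C')
        nodes = nodes + self.node_conv(nodes)             # simplified graph convolution
        nodes = torch.relu(self.state_conv(nodes.transpose(1, 2)).transpose(1, 2))
        out = torch.bmm(b.transpose(1, 2), nodes)         # weighted broadcasting -> (N, HW, C')
        out = out.transpose(1, 2).view(n, -1, h, w)
        return x + self.expand(out)                       # residual connection

unit = GlobalReasoningUnit()
print(unit(torch.rand(2, 256, 14, 14)).shape)             # torch.Size([2, 256, 14, 14])
```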
Probing hidden spin order with interpretable machine learning
Title | Probing hidden spin order with interpretable machine learning |
Authors | Jonas Greitemann, Ke Liu, Lode Pollet |
Abstract | The search for unconventional magnetic and nonmagnetic states is a major topic in the study of frustrated magnetism. Canonical examples of such states include various spin liquids and spin nematics. However, discerning their existence and characterizing them correctly is usually challenging. Here we introduce a machine-learning protocol that can identify general nematic orders and their order parameters from seemingly featureless spin configurations, thus providing comprehensive insight into the presence or absence of hidden orders. We demonstrate the capabilities of our method by extracting the analytical form of nematic order parameter tensors up to rank 6. This may prove useful in the search for novel spin states and for ruling out spurious spin liquid candidates. |
Tasks | Interpretable Machine Learning |
Published | 2018-04-23 |
URL | http://arxiv.org/abs/1804.08557v5 |
http://arxiv.org/pdf/1804.08557v5.pdf | |
PWC | https://paperswithcode.com/paper/probing-hidden-spin-order-with-interpretable |
Repo | https://github.com/jgreitemann/svm-order-params |
Framework | none |
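A toy sketch of the interpretable-ML protocol: map each spin configuration to rank-2 monomial features (lattice-averaged products of spin components), train a linear SVM to separate sample groups, and read the candidate order-parameter tensor off the learned coefficients. The synthetic data and the rank-2 restriction are assumptions; the paper's kernel handles ranks up to 6.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_sites, n_samples = 200, 400

def rank2_features(spins):
    """spins: (n_sites, 3) unit vectors -> lattice-averaged quadratic monomials S_a S_b."""
    quad = np.einsum("ia,ib->ab", spins, spins) / len(spins)   # 3x3 symmetric matrix
    return quad[np.triu_indices(3)]                            # 6 independent entries

X, y = [], []
for label in (0, 1):
    for _ in range(n_samples // 2):
        if label == 0:       # disordered: random unit vectors
            s = rng.standard_normal((n_sites, 3))
        else:                # "nematic-like": spins aligned along the +/- z axis
            s = np.zeros((n_sites, 3))
            s[:, 2] = rng.choice([-1.0, 1.0], n_sites)
            s += 0.1 * rng.standard_normal((n_sites, 3))
        s /= np.linalg.norm(s, axis=1, keepdims=True)
        X.append(rank2_features(s))
        y.append(label)

clf = LinearSVC(C=1.0, max_iter=10000).fit(np.array(X), np.array(y))
# Non-negligible coefficients indicate which quadratic monomials (here S_z^2
# versus S_x^2 and S_y^2) build the hidden order parameter.
print(np.round(clf.coef_, 3))
```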
STAIR Actions: A Video Dataset of Everyday Home Actions
Title | STAIR Actions: A Video Dataset of Everyday Home Actions |
Authors | Yuya Yoshikawa, Jiaqing Lin, Akikazu Takeuchi |
Abstract | A new large-scale video dataset for human action recognition, called STAIR Actions is introduced. STAIR Actions contains 100 categories of action labels representing fine-grained everyday home actions so that it can be applied to research in various home tasks such as nursing, caring, and security. In STAIR Actions, each video has a single action label. Moreover, for each action category, there are around 1,000 videos that were obtained from YouTube or produced by crowdsource workers. The duration of each video is mostly five to six seconds. The total number of videos is 102,462. We explain how we constructed STAIR Actions and show the characteristics of STAIR Actions compared to existing datasets for human action recognition. Experiments with three major models for action recognition show that STAIR Actions can train large models and achieve good performance. STAIR Actions can be downloaded from http://actions.stair.center |
Tasks | Temporal Action Localization |
Published | 2018-04-12 |
URL | http://arxiv.org/abs/1804.04326v3 |
http://arxiv.org/pdf/1804.04326v3.pdf | |
PWC | https://paperswithcode.com/paper/stair-actions-a-video-dataset-of-everyday |
Repo | https://github.com/STAIR-Lab-CIT/STAIR-actions |
Framework | pytorch |
Unpaired Sentiment-to-Sentiment Translation: A Cycled Reinforcement Learning Approach
Title | Unpaired Sentiment-to-Sentiment Translation: A Cycled Reinforcement Learning Approach |
Authors | Jingjing Xu, Xu Sun, Qi Zeng, Xuancheng Ren, Xiaodong Zhang, Houfeng Wang, Wenjie Li |
Abstract | The goal of sentiment-to-sentiment “translation” is to change the underlying sentiment of a sentence while keeping its content. The main challenge is the lack of parallel data. To solve this problem, we propose a cycled reinforcement learning method that enables training on unpaired data by collaboration between a neutralization module and an emotionalization module. We evaluate our approach on two review datasets, Yelp and Amazon. Experimental results show that our approach significantly outperforms the state-of-the-art systems. Especially, the proposed method substantially improves the content preservation performance. The BLEU score is improved from 1.64 to 22.46 and from 0.56 to 14.06 on the two datasets, respectively. |
Tasks | Text Style Transfer |
Published | 2018-05-14 |
URL | http://arxiv.org/abs/1805.05181v2 |
http://arxiv.org/pdf/1805.05181v2.pdf | |
PWC | https://paperswithcode.com/paper/unpaired-sentiment-to-sentiment-translation-a |
Repo | https://github.com/lancopku/unpaired-sentiment-translation |
Framework | tf |
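A rough sketch of the kind of reward signal such a cycled setup can use: the emotionalized output should carry the target sentiment according to a sentiment classifier and preserve the neutralized content, here measured by simple word overlap. Both scoring functions below are crude placeholders for the paper's learned modules, not its actual reward.

```python
def content_overlap(reference, candidate):
    """Crude content-preservation score: fraction of reference words kept."""
    ref, cand = set(reference.lower().split()), set(candidate.lower().split())
    return len(ref & cand) / max(len(ref), 1)

def sentiment_confidence(sentence, target):
    """Placeholder for a trained sentiment classifier's confidence that
    `sentence` carries sentiment `target` ('positive' or 'negative')."""
    positive_words = {"great", "delicious", "friendly", "love"}
    hits = sum(w in positive_words for w in sentence.lower().split())
    p_pos = min(1.0, 0.2 + 0.3 * hits)
    return p_pos if target == "positive" else 1.0 - p_pos

def cycle_reward(original, emotionalized, target, alpha=0.5):
    """Combined reward: target sentiment strength plus content preservation."""
    return alpha * sentiment_confidence(emotionalized, target) + \
           (1 - alpha) * content_overlap(original, emotionalized)

print(cycle_reward("the pizza was cold and the staff rude",
                   "the pizza was delicious and the staff friendly",
                   target="positive"))
```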