January 31, 2020

Paper Group AWR 400

Reinventing 2D Convolutions for 3D Images


Title	Reinventing 2D Convolutions for 3D Images
Authors	Jiancheng Yang, Xiaoyang Huang, Bingbing Ni, Jingwei Xu, Canqian Yang, Guozheng Xu
Abstract	There have been considerable debates over 2D and 3D representation learning on 3D medical images. 2D approaches could benefit from large-scale 2D pretraining, whereas they are generally weak in capturing large 3D contexts. 3D approaches are natively strong in 3D contexts; however, few publicly available 3D medical datasets are large and diverse enough for universal 3D pretraining. Even for hybrid (2D + 3D) approaches, the intrinsic disadvantages within the 2D / 3D parts still exist. In this study, we bridge the gap between 2D and 3D convolutions by reinventing the 2D convolutions. We propose ACS (axial-coronal-sagittal) convolutions to perform natively 3D representation learning, while utilizing the pretrained weights on 2D datasets. In ACS convolutions, 2D convolution kernels are split by channel into three parts, and convolved separately on the three views (axial, coronal and sagittal) of 3D representations. Theoretically, any 2D CNN (ResNet, DenseNet, or DeepLab) can be converted into a 3D ACS CNN, with pretrained weights of the same parameter size. Extensive experiments on a proof-of-concept dataset and several medical benchmarks validate the consistent superiority of the pretrained ACS CNNs over the 2D / 3D CNN counterparts with / without pretraining. Even without pretraining, the ACS convolution can be used as a plug-and-play replacement for standard 3D convolution, with smaller model size and less computation.
Tasks	Representation Learning
Published	2019-11-24
URL	https://arxiv.org/abs/1911.10477v2
PDF	https://arxiv.org/pdf/1911.10477v2.pdf
PWC	https://paperswithcode.com/paper/reinventing-2d-convolutions-for-3d-medical
Repo	https://github.com/m3dv/ACSConv
Framework	pytorch
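
The channel split described in the abstract can be illustrated with a shape-only sketch. It is not taken from the ACSConv repository: the function names, the near-equal three-way split of the output channels, and the choice of which axis each view collapses are all assumptions made for illustration. It only shows that reshaping the three parts into 3D kernels keeps the parameter count of the original 2D kernel.

```python
def acs_kernel_shapes(out_channels, in_channels, k):
    """Return hypothetical 3D kernel shapes for the three ACS views.

    Assumption: output channels are split as evenly as possible, and each
    part gets a singleton dimension on a different spatial axis.
    """
    base, rem = divmod(out_channels, 3)
    parts = [base + (1 if i < rem else 0) for i in range(3)]
    view_shapes = [
        ("axial", (k, k, 1)),
        ("coronal", (k, 1, k)),
        ("sagittal", (1, k, k)),
    ]
    return {
        name: (n, in_channels) + shape
        for (name, shape), n in zip(view_shapes, parts)
    }


def param_count(shape):
    """Number of scalar weights in a kernel of the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total


shapes = acs_kernel_shapes(out_channels=64, in_channels=3, k=3)
total_3d = sum(param_count(s) for s in shapes.values())
total_2d = 64 * 3 * 3 * 3  # weights in the original 2D kernel
for name, shape in shapes.items():
    print(name, shape)
print(total_3d == total_2d)
```

Because every part keeps its k x k weights and only gains a dimension of size one, the 2D pretrained weights could in principle be copied over unchanged, which is consistent with the "same parameter size" claim above.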

Deep Learning for Spacecraft Pose Estimation from Photorealistic Rendering


Title	Deep Learning for Spacecraft Pose Estimation from Photorealistic Rendering
Authors	Pedro F. Proenca, Yang Gao
Abstract	On-orbit proximity operations in space rendezvous, docking and debris removal require precise and robust 6D pose estimation under a wide range of lighting conditions and against a highly textured background, i.e., the Earth. This paper investigates leveraging deep learning and photorealistic rendering for monocular pose estimation of known uncooperative spacecraft. We first present a simulator built on Unreal Engine 4, named URSO, to generate labeled images of spacecraft orbiting the Earth, which can be used to train and evaluate neural networks. Secondly, we propose a deep learning framework for pose estimation based on orientation soft classification, which allows modelling orientation ambiguity as a mixture of Gaussians. This framework was evaluated both on URSO datasets and the ESA pose estimation challenge. In this competition, our best model achieved 3rd place on the synthetic test set and 2nd place on the real test set. Moreover, our results show the impact of several architectural and training aspects, and we demonstrate qualitatively how models learned on URSO datasets can perform on real images from space.
Tasks	6D Pose Estimation, 6D Pose Estimation using RGB, Pose Estimation
Published	2019-07-09
URL	https://arxiv.org/abs/1907.04298v2
PDF	https://arxiv.org/pdf/1907.04298v2.pdf
PWC	https://paperswithcode.com/paper/deep-learning-for-spacecraft-pose-estimation
Repo	https://github.com/pedropro/UrsoNet
Framework	tf
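
To make the idea of soft classification concrete, here is a deliberately simplified one-dimensional sketch. The abstract does not say how orientation is parameterized, so the single angle on a circle, the bin count, the Gaussian width, and the function names `soft_labels` and `decode` are all assumptions, not the UrsoNet implementation. A ground-truth angle is spread over neighbouring bins with Gaussian weights, and an estimate is recovered as the probability-weighted circular mean.

```python
import math


def soft_labels(angle_deg, n_bins=12, sigma_deg=20.0):
    """Spread one angle over evenly spaced bins using Gaussian weights."""
    centers = [i * 360.0 / n_bins for i in range(n_bins)]

    def circ_dist(a, b):
        # Shortest angular distance on the circle, in degrees.
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)

    weights = [
        math.exp(-circ_dist(angle_deg, c) ** 2 / (2.0 * sigma_deg ** 2))
        for c in centers
    ]
    total = sum(weights)
    return centers, [w / total for w in weights]


def decode(centers, probs):
    """Recover an angle as the probability-weighted circular mean."""
    x = sum(p * math.cos(math.radians(c)) for c, p in zip(centers, probs))
    y = sum(p * math.sin(math.radians(c)) for c, p in zip(centers, probs))
    return math.degrees(math.atan2(y, x)) % 360.0


centers, probs = soft_labels(45.0)
estimate = decode(centers, probs)
print(round(estimate, 6))
```

A network trained against such soft targets outputs a distribution over bins rather than a single value, so two well-separated peaks in that distribution can represent an ambiguous orientation.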

A Neural-Symbolic Architecture for Inverse Graphics Improved by Lifelong Meta-Learning


Title	A Neural-Symbolic Architecture for Inverse Graphics Improved by Lifelong Meta-Learning
Authors	Michael Kissner, Helmut Mayer
Abstract	We follow the idea of formulating vision as inverse graphics and propose a new type of element for this task, a neural-symbolic capsule. It is capable of de-rendering a scene into semantic information feed-forward, as well as rendering it feed-backward. An initial set of capsules for graphical primitives is obtained from a generative grammar and connected into a full capsule network. Lifelong meta-learning continuously improves this network’s detection capabilities by adding capsules for new and more complex objects it detects in a scene using few-shot learning. Preliminary results demonstrate the potential of our novel approach.
Tasks	Few-Shot Learning, Meta-Learning
Published	2019-05-22
URL	https://arxiv.org/abs/1905.08910v2
PDF	https://arxiv.org/pdf/1905.08910v2.pdf
PWC	https://paperswithcode.com/paper/a-neural-symbolic-architecture-for-inverse
Repo	https://github.com/Kayzaks/VividNet
Framework	none

Two-stage Image Classification Supervised by a Single Teacher Single Student Model


Title	Two-stage Image Classification Supervised by a Single Teacher Single Student Model
Authors	Jianhang Zhou, Shaoning Zeng, Bob Zhang
Abstract	The two-stage strategy has been widely used in image classification. However, these methods barely take the classification criteria of the first stage into consideration in the second prediction stage. In this paper, we propose a novel two-stage representation method (TSR), and convert it to a Single-Teacher Single-Student (STSS) problem in our two-stage image classification framework. We seek the nearest neighbours of the test sample to choose candidate target classes. Meanwhile, the first stage classifier is formulated as the teacher, which holds the classification scores. The samples of the candidate classes are utilized to learn a student classifier based on L2-minimization in the second stage. The student will be supervised by the teacher classifier, which approves the student only if it obtains a higher score. In actuality, the proposed framework generates a stronger classifier by staging two weaker classifiers in a novel way. The experiments conducted on several face and object databases show that our proposed framework is effective and outperforms multiple popular classification methods.
Tasks	Image Classification
Published	2019-09-26
URL	https://arxiv.org/abs/1909.12111v1
PDF	https://arxiv.org/pdf/1909.12111v1.pdf
PWC	https://paperswithcode.com/paper/two-stage-image-classification-supervised-by
Repo	https://github.com/zengsn/research
Framework	none

Adaptively Connected Neural Networks


Title	Adaptively Connected Neural Networks
Authors	Guangrun Wang, Keze Wang, Liang Lin
Abstract	This paper presents a novel adaptively connected neural network (ACNet) to improve the traditional convolutional neural networks (CNNs) in two aspects. First, ACNet employs a flexible way to switch global and local inference in processing the internal feature representations by adaptively determining the connection status among the feature nodes (e.g., pixels of the feature maps; in the computer vision domain, a node refers to a pixel of a feature map, while in the graph domain, a node denotes a graph node). We can show that existing CNNs, the classical multilayer perceptron (MLP), and the recently proposed non-local network (NLN) are all special cases of ACNet. Second, ACNet is also capable of handling non-Euclidean data. Extensive experimental analyses on a variety of benchmarks (i.e., ImageNet-1k classification, COCO 2017 detection and segmentation, CUHK03 person re-identification, CIFAR analysis, and Cora document categorization) demonstrate that ACNet can not only achieve state-of-the-art performance but also overcome the limitation of the conventional MLP and CNN. The code is available at https://github.com/wanggrun/Adaptively-Connected-Neural-Networks. Corresponding author: Liang Lin (linliang@ieee.org).
Tasks	Document Classification, Image Classification, Person Re-Identification
Published	2019-04-07
URL	http://arxiv.org/abs/1904.03579v1
PDF	http://arxiv.org/pdf/1904.03579v1.pdf
PWC	https://paperswithcode.com/paper/adaptively-connected-neural-networks
Repo	https://github.com/wanggrun/Adaptively-Connected-Neural-Networks
Framework	tf

Dependency or Span, End-to-End Uniform Semantic Role Labeling


Title	Dependency or Span, End-to-End Uniform Semantic Role Labeling
Authors	Zuchao Li, Shexia He, Hai Zhao, Yiqing Zhang, Zhuosheng Zhang, Xi Zhou, Xiang Zhou
Abstract	Semantic role labeling (SRL) aims to discover the predicate-argument structure of a sentence. End-to-end SRL without syntactic input has received great attention. However, most such models focus on either the span-based or the dependency-based semantic representation form and only show model optimization specific to that form. Meanwhile, handling these two SRL tasks uniformly has been less successful. This paper presents an end-to-end model for both dependency and span SRL with a unified argument representation to deal with two different types of argument annotations in a uniform fashion. Furthermore, we jointly predict all predicates and arguments, including the long-ignored predicate identification subtask. Our single model achieves new state-of-the-art results on both span (CoNLL 2005, 2012) and dependency (CoNLL 2008, 2009) SRL benchmarks.
Tasks	Semantic Role Labeling
Published	2019-01-16
URL	http://arxiv.org/abs/1901.05280v1
PDF	http://arxiv.org/pdf/1901.05280v1.pdf
PWC	https://paperswithcode.com/paper/dependency-or-span-end-to-end-uniform
Repo	https://github.com/bcmi220/unisrl
Framework	tf

Traditional and Heavy-Tailed Self Regularization in Neural Network Models


Title	Traditional and Heavy-Tailed Self Regularization in Neural Network Models
Authors	Charles H. Martin, Michael W. Mahoney
Abstract	Random Matrix Theory (RMT) is applied to analyze the weight matrices of Deep Neural Networks (DNNs), including both production quality, pre-trained models such as AlexNet and Inception, and smaller models trained from scratch, such as LeNet5 and a miniature-AlexNet. Empirical and theoretical results clearly indicate that the empirical spectral density (ESD) of DNN layer matrices displays signatures of traditionally-regularized statistical models, even in the absence of exogenously specifying traditional forms of regularization, such as Dropout or Weight Norm constraints. Building on recent results in RMT, most notably its extension to Universality classes of Heavy-Tailed matrices, we develop a theory to identify 5+1 Phases of Training, corresponding to increasing amounts of Implicit Self-Regularization. For smaller and/or older DNNs, this Implicit Self-Regularization is like traditional Tikhonov regularization, in that there is a 'size scale' separating signal from noise. For state-of-the-art DNNs, however, we identify a novel form of Heavy-Tailed Self-Regularization, similar to the self-organization seen in the statistical physics of disordered systems. This implicit Self-Regularization can depend strongly on the many knobs of the training process. By exploiting the generalization gap phenomena, we demonstrate that we can cause a small model to exhibit all 5+1 phases of training simply by changing the batch size.
Tasks
Published	2019-01-24
URL	http://arxiv.org/abs/1901.08276v1
PDF	http://arxiv.org/pdf/1901.08276v1.pdf
PWC	https://paperswithcode.com/paper/traditional-and-heavy-tailed-self
Repo	https://github.com/CalculatedContent/ImplicitSelfRegularization
Framework	pytorch

Sheaves: A Topological Approach to Big Data


Title	Sheaves: A Topological Approach to Big Data
Authors	Linas Vepstas
Abstract	This document develops general concepts useful for extracting knowledge embedded in large graphs or datasets that have pair-wise relationships, such as cause-effect-type relations. Almost no underlying assumptions are made, other than that the data can be presented in terms of pair-wise relationships between objects/events. This assumption is used to mine for patterns in the dataset, defining a reduced graph or dataset that boils-down or concentrates information into a more compact form. The resulting extracted structure or set of patterns are manifestly symbolic in nature, as they capture and encode the graph structure of the dataset in terms of a (generative) grammar. This structure is identified as having the formal mathematical structure of a sheaf. In essence, this paper introduces the basic concepts of sheaf theory into the domain of graphical datasets.
Tasks
Published	2019-01-04
URL	http://arxiv.org/abs/1901.01341v1
PDF	http://arxiv.org/pdf/1901.01341v1.pdf
PWC	https://paperswithcode.com/paper/sheaves-a-topological-approach-to-big-data
Repo	https://github.com/arquicanedo/graph2sheaves
Framework	none

Cycle-IR: Deep Cyclic Image Retargeting


Title	Cycle-IR: Deep Cyclic Image Retargeting
Authors	Weimin Tan, Bo Yan, Chumin Lin, Xuejing Niu
Abstract	Supervised deep learning techniques have achieved great success in various fields due to getting rid of the limitation of handcrafted representations. However, most previous image retargeting algorithms still employ fixed design principles such as using a gradient map or handcrafted features to compute a saliency map, which inevitably restricts their generality. Deep learning techniques may help to address this issue, but the challenging problem is that we need to build a large-scale image retargeting dataset for the training of deep retargeting models. However, building such a dataset requires huge human effort. In this paper, we propose a novel deep cyclic image retargeting approach, called Cycle-IR, to implement, for the first time, image retargeting with a single deep model, without relying on any explicit user annotations. Our idea is built on the reverse mapping from the retargeted images to the given input images. If the retargeted image has serious distortion or excessive loss of important visual information, the reverse mapping is unlikely to restore the input image well. We constrain this forward-reverse consistency by introducing a cyclic perception coherence loss. In addition, we propose a simple yet effective image retargeting network (IRNet) to implement the image retargeting process. Our IRNet contains a spatial and channel attention layer, which is able to discriminate visually important regions of input images effectively, especially in cluttered images. Given arbitrary sizes of input images and desired aspect ratios, our Cycle-IR can produce visually pleasing target images directly. Extensive experiments on the standard RetargetMe dataset show the superiority of our Cycle-IR. In addition, our Cycle-IR outperforms the Multiop method and obtains the best result in the user study. Code is available at https://github.com/mintanwei/Cycle-IR.
Tasks
Published	2019-05-09
URL	https://arxiv.org/abs/1905.03556v1
PDF	https://arxiv.org/pdf/1905.03556v1.pdf
PWC	https://paperswithcode.com/paper/190503556
Repo	https://github.com/mintanwei/Cycle-IR
Framework	tf

IRNet: A General Purpose Deep Residual Regression Framework for Materials Discovery


Title	IRNet: A General Purpose Deep Residual Regression Framework for Materials Discovery
Authors	Dipendra Jha, Logan Ward, Zijiang Yang, Christopher Wolverton, Ian Foster, Wei-keng Liao, Alok Choudhary, Ankit Agrawal
Abstract	Materials discovery is crucial for making scientific advances in many domains. Collections of data from experiments and first-principle computations have spurred interest in applying machine learning methods to create predictive models capable of mapping from composition and crystal structures to materials properties. Generally, these are regression problems with the input being a 1D vector composed of numerical attributes representing the material composition and/or crystal structure. While neural networks consisting of fully connected layers have been applied to such problems, their performance often suffers from the vanishing gradient problem when network depth is increased. In this paper, we study and propose design principles for building deep regression networks composed of fully connected layers with numerical vectors as input. We introduce a novel deep regression network with individual residual learning, IRNet, that places shortcut connections after each layer so that each layer learns the residual mapping between its output and input. We use the problem of learning properties of inorganic materials from numerical attributes derived from material composition and/or crystal structure to compare IRNet’s performance against that of other machine learning techniques. Using multiple datasets from the Open Quantum Materials Database (OQMD) and Materials Project for training and evaluation, we show that IRNet provides significantly better prediction performance than the state-of-the-art machine learning approaches currently used by domain scientists. We also show that IRNet’s use of individual residual learning leads to better convergence during the training phase than when shortcut connections are between multi-layer stacks while maintaining the same number of parameters.
Tasks
Published	2019-07-07
URL	https://arxiv.org/abs/1907.03222v1
PDF	https://arxiv.org/pdf/1907.03222v1.pdf
PWC	https://paperswithcode.com/paper/irnet-a-general-purpose-deep-residual
Repo	https://github.com/dipendra009/IRNet
Framework	none
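
The "individual residual learning" idea, a shortcut connection after every fully connected layer, can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the IRNet code: it assumes every layer keeps the input width so the shortcut is a plain addition, uses ReLU as the activation, and omits anything else the real network may contain. The helper names are hypothetical.

```python
def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


def relu(values):
    return [max(0.0, v) for v in values]


def individual_residual_forward(x, layers):
    """Apply a shortcut after every single layer: h <- h + relu(W h + b).

    Assumption: each layer maps a vector to a vector of the same length.
    """
    h = list(x)
    for weights, bias in layers:
        f = relu(dense(h, weights, bias))
        h = [hi + fi for hi, fi in zip(h, f)]
    return h


identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
no_bias = [0.0, 0.0]

# With all-zero weights each layer contributes nothing, so the input passes through.
passthrough = individual_residual_forward([1.0, -2.0], [(zeros, no_bias)] * 3)
# With identity weights the layer adds relu(x) to x.
one_layer = individual_residual_forward([1.0, -2.0], [(identity, no_bias)])
print(passthrough, one_layer)
```

The pass-through case hints at why such shortcuts help deep stacks: a layer that has learned nothing yet still forwards its input, and the shortcut gives gradients a direct path back through the addition.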

TVQA+: Spatio-Temporal Grounding for Video Question Answering


Title	TVQA+: Spatio-Temporal Grounding for Video Question Answering
Authors	Jie Lei, Licheng Yu, Tamara L. Berg, Mohit Bansal
Abstract	We present the task of Spatio-Temporal Video Question Answering, which requires intelligent systems to simultaneously retrieve relevant moments and detect referenced visual concepts (people and objects) to answer natural language questions about videos. We first augment the TVQA dataset with 310.8k bounding boxes, linking depicted objects to visual concepts in questions and answers. We name this augmented version as TVQA+. We then propose Spatio-Temporal Answerer with Grounded Evidence (STAGE), a unified framework that grounds evidence in both the spatial and temporal domains to answer questions about videos. Comprehensive experiments and analyses demonstrate the effectiveness of our framework and how the rich annotations in our TVQA+ dataset can contribute to the question answering task. As a side product, by performing this joint task, our model is able to produce more insightful intermediate results. Dataset and code are publicly available.
Tasks	Question Answering, Video Question Answering
Published	2019-04-25
URL	http://arxiv.org/abs/1904.11574v1
PDF	http://arxiv.org/pdf/1904.11574v1.pdf
PWC	https://paperswithcode.com/paper/tvqa-spatio-temporal-grounding-for-video
Repo	https://github.com/jayleicn/TVQA-PLUS
Framework	pytorch

Learning Sparse Sharing Architectures for Multiple Tasks


Title	Learning Sparse Sharing Architectures for Multiple Tasks
Authors	Tianxiang Sun, Yunfan Shao, Xiaonan Li, Pengfei Liu, Hang Yan, Xipeng Qiu, Xuanjing Huang
Abstract	Most existing deep multi-task learning models are based on parameter sharing, such as hard sharing, hierarchical sharing, and soft sharing. Choosing a suitable sharing mechanism depends on the relations among the tasks, which is not easy since it is difficult to understand the underlying shared factors among these tasks. In this paper, we propose a novel parameter sharing mechanism, named Sparse Sharing. Given multiple tasks, our approach automatically finds a sparse sharing structure. We start with an over-parameterized base network, from which each task extracts a subnetwork. The subnetworks of multiple tasks are partially overlapped and trained in parallel. We show that both hard sharing and hierarchical sharing can be formulated as particular instances of the sparse sharing framework. We conduct extensive experiments on three sequence labeling tasks. Compared with single-task models and three typical multi-task learning baselines, our proposed approach achieves consistent improvement while requiring fewer parameters.
Tasks	Multi-Task Learning
Published	2019-11-12
URL	https://arxiv.org/abs/1911.05034v2
PDF	https://arxiv.org/pdf/1911.05034v2.pdf
PWC	https://paperswithcode.com/paper/learning-sparse-sharing-architectures-for
Repo	https://github.com/choosewhatulike/sparse-sharing
Framework	pytorch
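
One way to picture "each task extracts a subnetwork" from a shared base network is a binary mask per task over the same parameter vector. The sketch below is an assumption-laden illustration, not the authors' code: the flat parameter list, the hand-written masks, and the helper names `task_weights` and `shared_fraction` are hypothetical, and it does not show how the masks are found.

```python
def task_weights(base, mask):
    """Parameters seen by one task: base weights with masked entries zeroed."""
    return [w * m for w, m in zip(base, mask)]


def shared_fraction(mask_a, mask_b):
    """Share of parameters used by both tasks among those used by either."""
    both = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    either = sum(1 for a, b in zip(mask_a, mask_b) if a or b)
    return both / either if either else 0.0


base = [0.5, -1.0, 2.0, 0.1]   # shared, over-parameterized base weights
mask_a = [1, 1, 0, 0]          # hypothetical subnetwork for task A
mask_b = [0, 1, 1, 0]          # hypothetical subnetwork for task B

weights_a = task_weights(base, mask_a)
overlap = shared_fraction(mask_a, mask_b)

# Hard sharing as a special case: every task keeps every parameter.
hard_overlap = shared_fraction([1, 1, 1, 1], [1, 1, 1, 1])
print(weights_a, overlap, hard_overlap)
```

In this picture, only the parameters selected by both masks receive updates from both tasks, which is one reading of "partially overlapped" subnetworks; the all-ones masks correspond to the hard-sharing instance mentioned in the abstract.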

Machine learning for automatic construction of pseudo-realistic pediatric abdominal phantoms


Title	Machine learning for automatic construction of pseudo-realistic pediatric abdominal phantoms
Authors	Marco Virgolin, Ziyuan Wang, Tanja Alderliesten, Peter A. N. Bosman
Abstract	Machine Learning (ML) is proving extremely beneficial in many healthcare applications. In pediatric oncology, retrospective studies that investigate the relationship between treatment and late adverse effects still rely on simple heuristics. To assess the effects of radiation therapy, treatment plans are typically simulated on phantoms, i.e., virtual surrogates of patient anatomy. Currently, phantoms are built according to reasonable, yet simple, human-designed criteria. This often results in a lack of individualization. We present a novel approach that combines imaging and ML to build individualized phantoms automatically. Given the features of a patient treated historically (only 2D radiographs available), and a database of 3D Computed Tomography (CT) imaging with organ segmentations and relative patient features, our approach uses ML to predict how to assemble a patient-specific phantom automatically. Experiments on 60 abdominal CTs of pediatric patients show that our approach constructs significantly more representative phantoms than using current phantom building criteria, in terms of location and shape of the abdomen and of two considered organs, the liver and the spleen. Among several ML algorithms considered, the Gene-pool Optimal Mixing Evolutionary Algorithm for Genetic Programming (GP-GOMEA) is found to deliver the best performing models, which are, moreover, transparent and interpretable mathematical expressions.
Tasks	Computed Tomography (CT)
Published	2019-09-09
URL	https://arxiv.org/abs/1909.03723v1
PDF	https://arxiv.org/pdf/1909.03723v1.pdf
PWC	https://paperswithcode.com/paper/machine-learning-for-automatic-construction
Repo	https://github.com/marcovirgolin/APhA
Framework	none

Facial Motion Prior Networks for Facial Expression Recognition


Title	Facial Motion Prior Networks for Facial Expression Recognition
Authors	Yuedong Chen, Jianfeng Wang, Shikai Chen, Zhongchao Shi, Jianfei Cai
Abstract	Deep learning based facial expression recognition (FER) has received a lot of attention in the past few years. Most of the existing deep learning based FER methods do not consider domain knowledge well, and thereby fail to extract representative features. In this work, we propose a novel FER framework, named Facial Motion Prior Networks (FMPN). In particular, we introduce an additional branch to generate a facial mask so as to focus on facial muscle moving regions. To guide the facial mask learning, we propose to incorporate prior domain knowledge by using the average differences between neutral faces and the corresponding expressive faces as the training guidance. Extensive experiments on three facial expression benchmark datasets demonstrate the effectiveness of the proposed method, compared with the state-of-the-art approaches.
Tasks	Facial Expression Recognition
Published	2019-02-23
URL	https://arxiv.org/abs/1902.08788v2
PDF	https://arxiv.org/pdf/1902.08788v2.pdf
PWC	https://paperswithcode.com/paper/facial-motion-prior-networks-for-facial
Repo	https://github.com/donydchen/FMPN-FER
Framework	pytorch

ADN: Artifact Disentanglement Network for Unsupervised Metal Artifact Reduction


Title	ADN: Artifact Disentanglement Network for Unsupervised Metal Artifact Reduction
Authors	Haofu Liao, Wei-An Lin, S. Kevin Zhou, Jiebo Luo
Abstract	Current deep neural network based approaches to computed tomography (CT) metal artifact reduction (MAR) are supervised methods that rely on synthesized metal artifacts for training. However, as synthesized data may not accurately simulate the underlying physical mechanisms of CT imaging, the supervised methods often generalize poorly to clinical applications. To address this problem, we propose, to the best of our knowledge, the first unsupervised learning approach to MAR. Specifically, we introduce a novel artifact disentanglement network that disentangles the metal artifacts from CT images in the latent space. It supports different forms of generations (artifact reduction, artifact transfer, and self-reconstruction, etc.) with specialized loss functions to obviate the need for supervision with synthesized data. Extensive experiments show that when applied to a synthesized dataset, our method addresses metal artifacts significantly better than the existing unsupervised models designed for natural image-to-image translation problems, and achieves comparable performance to existing supervised models for MAR. When applied to clinical datasets, our method demonstrates better generalization ability over the supervised models. The source code of this paper is publicly available at https://github.com/liaohaofu/adn.
Tasks	Computed Tomography (CT), Image-to-Image Translation, Medical Image Generation, Metal Artifact Reduction
Published	2019-08-03
URL	https://arxiv.org/abs/1908.01104v4
PDF	https://arxiv.org/pdf/1908.01104v4.pdf
PWC	https://paperswithcode.com/paper/adn-artifact-disentanglement-network-for
Repo	https://github.com/JunMa11/MICCAI2019-OpenSourcePapers
Framework	tf