Paper Group AWR 400
Reinventing 2D Convolutions for 3D Images
Title | Reinventing 2D Convolutions for 3D Images |
Authors | Jiancheng Yang, Xiaoyang Huang, Bingbing Ni, Jingwei Xu, Canqian Yang, Guozheng Xu |
Abstract | There have been considerable debates over 2D and 3D representation learning on 3D medical images. 2D approaches can benefit from large-scale 2D pretraining, but they are generally weak in capturing large 3D contexts. 3D approaches are natively strong in 3D contexts; however, few publicly available 3D medical datasets are large and diverse enough for universal 3D pretraining. Even for hybrid (2D + 3D) approaches, the intrinsic disadvantages of the 2D and 3D parts persist. In this study, we bridge the gap between 2D and 3D convolutions by reinventing the 2D convolutions. We propose ACS (axial-coronal-sagittal) convolutions to perform natively 3D representation learning while utilizing weights pretrained on 2D datasets. In ACS convolutions, 2D convolution kernels are split by channel into three parts and convolved separately on the three views (axial, coronal, and sagittal) of the 3D representations. In theory, any 2D CNN (ResNet, DenseNet, or DeepLab) can be converted into a 3D ACS CNN, with pretrained weights of the same parameter size. Extensive experiments on a proof-of-concept dataset and several medical benchmarks validate the consistent superiority of pretrained ACS CNNs over 2D and 3D CNN counterparts, with and without pretraining. Even without pretraining, ACS convolution can be used as a plug-and-play replacement for standard 3D convolution, with a smaller model size and less computation. |
Tasks | Representation Learning |
Published | 2019-11-24 |
URL | https://arxiv.org/abs/1911.10477v2 |
https://arxiv.org/pdf/1911.10477v2.pdf | |
PWC | https://paperswithcode.com/paper/reinventing-2d-convolutions-for-3d-medical |
Repo | https://github.com/m3dv/ACSConv |
Framework | pytorch |
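The ACS idea is concrete enough to sketch. Below is a minimal, hedged PyTorch illustration of converting a 2D convolution into an ACS convolution: the 2D kernel bank is split along output channels into three groups, and each group is applied as a 3D convolution over one view by inserting a singleton kernel dimension. The even three-way split, padding scheme, and omission of bias are simplifying assumptions; the reference implementation lives in the m3dv/ACSConv repo linked above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ACSConvSketch(nn.Module):
    """Sketch: reuse a pretrained 2D kernel bank (C_out, C_in, k, k) as a
    natively 3D operator by splitting it over output channels and applying
    each part on one view (axial / coronal / sagittal). Bias is omitted."""
    def __init__(self, conv2d: nn.Conv2d):
        super().__init__()
        w = conv2d.weight.data.clone()        # (C_out, C_in, k, k)
        c_out, self.k = w.shape[0], w.shape[-1]
        s = c_out // 3
        self.splits = (s, s, c_out - 2 * s)   # assumed near-even split
        self.weight = nn.Parameter(w)

    def forward(self, x):                     # x: (N, C_in, D, H, W)
        p = self.k // 2
        w_a, w_c, w_s = torch.split(self.weight, self.splits, dim=0)
        y_a = F.conv3d(x, w_a.unsqueeze(2), padding=(0, p, p))  # axial: spans (H, W)
        y_c = F.conv3d(x, w_c.unsqueeze(3), padding=(p, 0, p))  # coronal: spans (D, W)
        y_s = F.conv3d(x, w_s.unsqueeze(4), padding=(p, p, 0))  # sagittal: spans (D, H)
        return torch.cat([y_a, y_c, y_s], dim=1)

acs = ACSConvSketch(nn.Conv2d(16, 33, 3))     # a pretrained 2D layer in practice
out = acs(torch.randn(1, 16, 8, 32, 32))      # -> (1, 33, 8, 32, 32)
```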
Deep Learning for Spacecraft Pose Estimation from Photorealistic Rendering
Title | Deep Learning for Spacecraft Pose Estimation from Photorealistic Rendering |
Authors | Pedro F. Proenca, Yang Gao |
Abstract | On-orbit proximity operations in space rendezvous, docking, and debris removal require precise and robust 6D pose estimation under a wide range of lighting conditions and against a highly textured background, i.e., the Earth. This paper investigates leveraging deep learning and photorealistic rendering for monocular pose estimation of known uncooperative spacecraft. We first present a simulator built on Unreal Engine 4, named URSO, to generate labeled images of spacecraft orbiting the Earth, which can be used to train and evaluate neural networks. Second, we propose a deep learning framework for pose estimation based on orientation soft classification, which allows modelling orientation ambiguity as a mixture of Gaussians. This framework was evaluated both on URSO datasets and on the ESA pose estimation challenge, where our best model achieved 3rd place on the synthetic test set and 2nd place on the real test set. Moreover, our results show the impact of several architectural and training aspects, and we demonstrate qualitatively how models learned on URSO datasets can perform on real images from space. |
Tasks | 6D Pose Estimation, 6D Pose Estimation using RGB, Pose Estimation |
Published | 2019-07-09 |
URL | https://arxiv.org/abs/1907.04298v2 |
https://arxiv.org/pdf/1907.04298v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-for-spacecraft-pose-estimation |
Repo | https://github.com/pedropro/UrsoNet |
Framework | tf |
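To make "orientation soft classification" tangible, here is a hedged one-dimensional toy in NumPy: the continuous angle becomes a Gaussian-smoothed distribution over discrete orientation bins, and pose ambiguity (e.g., a 180-degree symmetry) becomes a mixture of Gaussians. The bin count and sigma are illustrative choices, and the paper works with full 3D orientations rather than a single heading angle.

```python
import numpy as np

def soft_orientation_labels(theta_gt, n_bins=64, sigma=np.deg2rad(10)):
    """Encode a continuous angle as a soft (Gaussian-smoothed) target over
    discrete orientation bins, suitable for a cross-entropy loss."""
    bins = np.linspace(0, 2 * np.pi, n_bins, endpoint=False)
    # Wrap-around angular distance between each bin centre and the target.
    d = np.abs(np.angle(np.exp(1j * (bins - theta_gt))))
    w = np.exp(-0.5 * (d / sigma) ** 2)
    return w / w.sum()

# Ambiguity (e.g., a 180-degree symmetry) as a mixture of two Gaussians:
p = 0.5 * soft_orientation_labels(0.3) + 0.5 * soft_orientation_labels(0.3 + np.pi)
```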
A Neural-Symbolic Architecture for Inverse Graphics Improved by Lifelong Meta-Learning
Title | A Neural-Symbolic Architecture for Inverse Graphics Improved by Lifelong Meta-Learning |
Authors | Michael Kissner, Helmut Mayer |
Abstract | We follow the idea of formulating vision as inverse graphics and propose a new type of element for this task: a neural-symbolic capsule. It is capable of de-rendering a scene into semantic information in the feed-forward direction, as well as rendering it in the feed-backward direction. An initial set of capsules for graphical primitives is obtained from a generative grammar and connected into a full capsule network. Lifelong meta-learning continuously improves this network's detection capabilities by adding capsules for the new and more complex objects it detects in a scene, using few-shot learning. Preliminary results demonstrate the potential of our novel approach. |
Tasks | Few-Shot Learning, Meta-Learning |
Published | 2019-05-22 |
URL | https://arxiv.org/abs/1905.08910v2 |
https://arxiv.org/pdf/1905.08910v2.pdf | |
PWC | https://paperswithcode.com/paper/a-neural-symbolic-architecture-for-inverse |
Repo | https://github.com/Kayzaks/VividNet |
Framework | none |
Two-stage Image Classification Supervised by a Single Teacher Single Student Model
Title | Two-stage Image Classification Supervised by a Single Teacher Single Student Model |
Authors | Jianhang Zhou, Shaoning Zeng, Bob Zhang |
Abstract | The two-stage strategy has been widely used in image classification. However, existing methods barely take the classification criteria of the first stage into consideration in the second prediction stage. In this paper, we propose a novel two-stage representation method (TSR) and convert it to a Single-Teacher Single-Student (STSS) problem in our two-stage image classification framework. We seek the nearest neighbours of the test sample to choose candidate target classes. Meanwhile, the first-stage classifier is formulated as the teacher, which holds the classification scores. The samples of the candidate classes are used to learn a student classifier based on L2-minimization in the second stage. The student is supervised by the teacher classifier, which approves the student only if it obtains a higher score. In effect, the proposed framework generates a stronger classifier by staging two weaker classifiers in a novel way. Experiments conducted on several face and object databases show that the proposed framework is effective and outperforms multiple popular classification methods. |
Tasks | Image Classification |
Published | 2019-09-26 |
URL | https://arxiv.org/abs/1909.12111v1 |
https://arxiv.org/pdf/1909.12111v1.pdf | |
PWC | https://paperswithcode.com/paper/two-stage-image-classification-supervised-by |
Repo | https://github.com/zengsn/research |
Framework | none |
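A hedged NumPy sketch of the two-stage flow described in the abstract: a nearest-neighbour "teacher" proposes candidate classes, then an L2-minimization (least-squares) "student" represents the test sample over each candidate class and picks the smallest reconstruction residual. The teacher's exact scoring and approval rule are simplified away here.

```python
import numpy as np

def two_stage_classify(X_train, y_train, x, k=10):
    """Stage 1: kNN teacher proposes candidate classes.
    Stage 2: least-squares student labels x by minimum residual."""
    # Stage 1: nearest neighbours of the test sample.
    d = np.linalg.norm(X_train - x, axis=1)
    candidates = np.unique(y_train[np.argsort(d)[:k]])

    # Stage 2: L2-minimization over each candidate class's samples.
    best_cls, best_res = None, np.inf
    for c in candidates:
        A = X_train[y_train == c].T            # (dim, n_c)
        coef, *_ = np.linalg.lstsq(A, x, rcond=None)
        res = np.linalg.norm(x - A @ coef)     # reconstruction residual
        if res < best_res:
            best_cls, best_res = c, res
    return best_cls
```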
Adaptively Connected Neural Networks
Title | Adaptively Connected Neural Networks |
Authors | Guangrun Wang, Keze Wang, Liang Lin |
Abstract | This paper presents a novel adaptively connected neural network (ACNet) to improve traditional convolutional neural networks (CNNs) in two aspects. First, ACNet employs a flexible way to switch between global and local inference in processing the internal feature representations, by adaptively determining the connection status among the feature nodes (e.g., pixels of the feature maps; in the computer vision domain, a node refers to a pixel of a feature map, while in the graph domain, a node denotes a graph node). We show that existing CNNs, the classical multilayer perceptron (MLP), and the recently proposed non-local network (NLN) are all special cases of ACNet. Second, ACNet is also capable of handling non-Euclidean data. Extensive experimental analyses on a variety of benchmarks (i.e., ImageNet-1k classification, COCO 2017 detection and segmentation, CUHK03 person re-identification, CIFAR analysis, and Cora document categorization) demonstrate that ACNet can not only achieve state-of-the-art performance but also overcome the limitations of the conventional MLP and CNN. The code is available at https://github.com/wanggrun/Adaptively-Connected-Neural-Networks. |
Tasks | Document Classification, Image Classification, Person Re-Identification |
Published | 2019-04-07 |
URL | http://arxiv.org/abs/1904.03579v1 |
http://arxiv.org/pdf/1904.03579v1.pdf | |
PWC | https://paperswithcode.com/paper/adaptively-connected-neural-networks |
Repo | https://github.com/wanggrun/Adaptively-Connected-Neural-Networks |
Framework | tf |
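The adaptive-connection idea lends itself to a short sketch. The hedged PyTorch block below blends three connection regimes per output pixel: a self transformation (MLP-like), a local convolution (CNN-like), and a global summary (non-local-like), with softmax-normalized gates learned end-to-end. The scalar-gate formulation is an assumption; consult the linked repo for the exact mechanism.

```python
import torch
import torch.nn as nn

class AdaptiveBlockSketch(nn.Module):
    """Blend self, local, and global inference with learned gates."""
    def __init__(self, channels):
        super().__init__()
        self.self_branch = nn.Conv2d(channels, channels, 1)              # MLP-like
        self.local_branch = nn.Conv2d(channels, channels, 3, padding=1)  # CNN-like
        self.global_branch = nn.Linear(channels, channels)               # non-local-like
        self.gates = nn.Parameter(torch.zeros(3))

    def forward(self, x):                       # x: (N, C, H, W)
        a, b, c = torch.softmax(self.gates, 0)
        g = self.global_branch(x.mean(dim=(2, 3)))   # global context vector
        return (a * self.self_branch(x)
                + b * self.local_branch(x)
                + c * g[:, :, None, None])

y = AdaptiveBlockSketch(8)(torch.randn(2, 8, 16, 16))
```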
Dependency or Span, End-to-End Uniform Semantic Role Labeling
Title | Dependency or Span, End-to-End Uniform Semantic Role Labeling |
Authors | Zuchao Li, Shexia He, Hai Zhao, Yiqing Zhang, Zhuosheng Zhang, Xi Zhou, Xiang Zhou |
Abstract | Semantic role labeling (SRL) aims to discover the predicate-argument structure of a sentence. End-to-end SRL without syntactic input has received great attention. However, most approaches focus on either the span-based or the dependency-based semantic representation form and present model optimizations specific to one of them, while handling the two SRL tasks uniformly has been less successful. This paper presents an end-to-end model for both dependency and span SRL with a unified argument representation that deals with the two different types of argument annotations in a uniform fashion. Furthermore, we jointly predict all predicates and arguments, including the long-ignored predicate identification subtask. Our single model achieves new state-of-the-art results on both span (CoNLL 2005, 2012) and dependency (CoNLL 2008, 2009) SRL benchmarks. |
Tasks | Semantic Role Labeling |
Published | 2019-01-16 |
URL | http://arxiv.org/abs/1901.05280v1 |
http://arxiv.org/pdf/1901.05280v1.pdf | |
PWC | https://paperswithcode.com/paper/dependency-or-span-end-to-end-uniform |
Repo | https://github.com/bcmi220/unisrl |
Framework | tf |
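One way to read "unified argument representation" is that a dependency argument is the degenerate span whose start and end coincide, so a single scorer covers both annotation styles. The PyTorch sketch below assumes a concatenation-based span representation and a small MLP scorer; both are illustrative stand-ins for the paper's architecture.

```python
import torch
import torch.nn as nn

class UnifiedArgScorer(nn.Module):
    """Score (predicate, argument) pairs for semantic roles; a dependency
    argument is simply a span with start == end."""
    def __init__(self, dim, n_roles):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, n_roles))

    def forward(self, h, pred_idx, start, end):   # h: (T, dim) token vectors
        arg = torch.cat([h[start], h[end]])       # span (or single-word) argument
        return self.scorer(torch.cat([h[pred_idx], arg]))

scorer = UnifiedArgScorer(64, 5)
span_logits = scorer(torch.randn(20, 64), pred_idx=3, start=7, end=9)
dep_logits = scorer(torch.randn(20, 64), pred_idx=3, start=7, end=7)
```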
Traditional and Heavy-Tailed Self Regularization in Neural Network Models
Title | Traditional and Heavy-Tailed Self Regularization in Neural Network Models |
Authors | Charles H. Martin, Michael W. Mahoney |
Abstract | Random Matrix Theory (RMT) is applied to analyze the weight matrices of Deep Neural Networks (DNNs), including both production-quality, pre-trained models such as AlexNet and Inception, and smaller models trained from scratch, such as LeNet5 and a miniature AlexNet. Empirical and theoretical results clearly indicate that the empirical spectral density (ESD) of DNN layer matrices displays signatures of traditionally regularized statistical models, even in the absence of exogenously specified traditional forms of regularization, such as Dropout or weight-norm constraints. Building on recent results in RMT, most notably its extension to universality classes of heavy-tailed matrices, we develop a theory to identify 5+1 Phases of Training, corresponding to increasing amounts of Implicit Self-Regularization. For smaller and/or older DNNs, this Implicit Self-Regularization is like traditional Tikhonov regularization, in that there is a 'size scale' separating signal from noise. For state-of-the-art DNNs, however, we identify a novel form of Heavy-Tailed Self-Regularization, similar to the self-organization seen in the statistical physics of disordered systems. This Implicit Self-Regularization can depend strongly on the many knobs of the training process. By exploiting the generalization-gap phenomenon, we demonstrate that we can cause a small model to exhibit all 5+1 phases of training simply by changing the batch size. |
Tasks | |
Published | 2019-01-24 |
URL | http://arxiv.org/abs/1901.08276v1 |
http://arxiv.org/pdf/1901.08276v1.pdf | |
PWC | https://paperswithcode.com/paper/traditional-and-heavy-tailed-self |
Repo | https://github.com/CalculatedContent/ImplicitSelfRegularization |
Framework | pytorch |
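The paper's core measurement is easy to prototype: compute the empirical spectral density (ESD) of a layer's weight matrix and estimate a heavy-tail exponent from the largest eigenvalues. The NumPy sketch below uses a Hill-style estimator for brevity; the paper's phase analysis relies on more careful power-law fits.

```python
import numpy as np

def esd_tail_exponent(W, k_frac=0.1):
    """ESD of a weight matrix (eigenvalues of W^T W / N) plus a rough
    Hill-style estimate of the tail exponent from the top eigenvalues."""
    N = max(W.shape)
    eigs = np.linalg.svd(W, compute_uv=False) ** 2 / N
    eigs = np.sort(eigs)[::-1]
    k = max(2, int(k_frac * len(eigs)))
    tail = eigs[:k]
    alpha = 1.0 + k / np.sum(np.log(tail / tail[-1]))
    return eigs, alpha

# A Gaussian random matrix should look light-tailed (large alpha):
rng = np.random.default_rng(0)
_, alpha = esd_tail_exponent(rng.standard_normal((512, 256)))
```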
Sheaves: A Topological Approach to Big Data
Title | Sheaves: A Topological Approach to Big Data |
Authors | Linas Vepstas |
Abstract | This document develops general concepts useful for extracting knowledge embedded in large graphs or datasets that have pair-wise relationships, such as cause-effect-type relations. Almost no underlying assumptions are made, other than that the data can be presented in terms of pair-wise relationships between objects/events. This assumption is used to mine for patterns in the dataset, defining a reduced graph or dataset that boils down or concentrates information into a more compact form. The resulting extracted structure, or set of patterns, is manifestly symbolic in nature, as it captures and encodes the graph structure of the dataset in terms of a (generative) grammar. This structure is identified as having the formal mathematical structure of a sheaf. In essence, this paper introduces the basic concepts of sheaf theory into the domain of graphical datasets. |
Tasks | |
Published | 2019-01-04 |
URL | http://arxiv.org/abs/1901.01341v1 |
http://arxiv.org/pdf/1901.01341v1.pdf | |
PWC | https://paperswithcode.com/paper/sheaves-a-topological-approach-to-big-data |
Repo | https://github.com/arquicanedo/graph2sheaves |
Framework | none |
Cycle-IR: Deep Cyclic Image Retargeting
Title | Cycle-IR: Deep Cyclic Image Retargeting |
Authors | Weimin Tan, Bo Yan, Chumin Lin, Xuejing Niu |
Abstract | Supervised deep learning techniques have achieved great success in various fields by removing the limitations of handcrafted representations. However, most previous image retargeting algorithms still employ fixed design principles, such as using gradient maps or handcrafted features to compute saliency maps, which inevitably restricts their generality. Deep learning techniques may help to address this issue, but a key challenge is that training deep retargeting models would require a large-scale image retargeting dataset, which demands huge human effort to build. In this paper, we propose a novel deep cyclic image retargeting approach, called Cycle-IR, the first to implement image retargeting with a single deep model without relying on any explicit user annotations. Our idea is built on the reverse mapping from the retargeted images to the given input images: if the retargeted image suffers serious distortion or excessive loss of important visual information, the reverse mapping is unlikely to restore the input image well. We enforce this forward-reverse consistency by introducing a cyclic perception coherence loss. In addition, we propose a simple yet effective image retargeting network (IRNet) to implement the retargeting process. IRNet contains a spatial and channel attention layer that effectively discriminates visually important regions of input images, especially in cluttered images. Given input images of arbitrary size and desired aspect ratios, Cycle-IR produces visually pleasing target images directly. Extensive experiments on the standard RetargetMe dataset show the superiority of Cycle-IR; it also outperforms the Multiop method and obtains the best result in the user study. Code is available at https://github.com/mintanwei/Cycle-IR. |
Tasks | |
Published | 2019-05-09 |
URL | https://arxiv.org/abs/1905.03556v1 |
https://arxiv.org/pdf/1905.03556v1.pdf | |
PWC | https://paperswithcode.com/paper/190503556 |
Repo | https://github.com/mintanwei/Cycle-IR |
Framework | tf |
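The cyclic perception coherence loss reduces to a compact pattern: retarget, map back, and compare perceptual features of the input and its reconstruction. In the hedged PyTorch sketch below, `retarget`, `reverse`, and `feat` are placeholders for the paper's IRNet, its reverse mapping, and a perception feature extractor (e.g., VGG features); the toy usage swaps in bilinear resizing and an identity feature map.

```python
import torch
import torch.nn.functional as F

def cycle_coherence_loss(x, retarget, reverse, feat):
    """Forward-reverse consistency: retarget, map back, and penalize the
    perceptual discrepancy between the input and its reconstruction."""
    y = retarget(x)                 # e.g., change the aspect ratio
    x_rec = reverse(y)              # map back to the input size
    return F.l1_loss(feat(x_rec), feat(x))

# Toy usage with bilinear resizing standing in for both mappings:
x = torch.rand(1, 3, 64, 64)
half = lambda t: F.interpolate(t, size=(64, 32), mode="bilinear", align_corners=False)
back = lambda t: F.interpolate(t, size=(64, 64), mode="bilinear", align_corners=False)
loss = cycle_coherence_loss(x, half, back, lambda t: t)
```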
IRNet: A General Purpose Deep Residual Regression Framework for Materials Discovery
Title | IRNet: A General Purpose Deep Residual Regression Framework for Materials Discovery |
Authors | Dipendra Jha, Logan Ward, Zijiang Yang, Christopher Wolverton, Ian Foster, Wei-keng Liao, Alok Choudhary, Ankit Agrawal |
Abstract | Materials discovery is crucial for making scientific advances in many domains. Collections of data from experiments and first-principle computations have spurred interest in applying machine learning methods to create predictive models capable of mapping from composition and crystal structures to materials properties. Generally, these are regression problems with the input being a 1D vector composed of numerical attributes representing the material composition and/or crystal structure. While neural networks consisting of fully connected layers have been applied to such problems, their performance often suffers from the vanishing gradient problem when network depth is increased. In this paper, we study and propose design principles for building deep regression networks composed of fully connected layers with numerical vectors as input. We introduce a novel deep regression network with individual residual learning, IRNet, that places shortcut connections after each layer so that each layer learns the residual mapping between its output and input. We use the problem of learning properties of inorganic materials from numerical attributes derived from material composition and/or crystal structure to compare IRNet’s performance against that of other machine learning techniques. Using multiple datasets from the Open Quantum Materials Database (OQMD) and Materials Project for training and evaluation, we show that IRNet provides significantly better prediction performance than the state-of-the-art machine learning approaches currently used by domain scientists. We also show that IRNet’s use of individual residual learning leads to better convergence during the training phase than when shortcut connections are between multi-layer stacks while maintaining the same number of parameters. |
Tasks | |
Published | 2019-07-07 |
URL | https://arxiv.org/abs/1907.03222v1 |
https://arxiv.org/pdf/1907.03222v1.pdf | |
PWC | https://paperswithcode.com/paper/irnet-a-general-purpose-deep-residual |
Repo | https://github.com/dipendra009/IRNet |
Framework | none |
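"Individual residual learning" has a direct translation to code: place a shortcut after every single fully connected layer, rather than around multi-layer stacks as in ResNet. The PyTorch sketch below makes that concrete; the normalization and activation ordering are assumptions.

```python
import torch
import torch.nn as nn

class IndividualResidualBlock(nn.Module):
    """One FC layer with its own shortcut: each layer learns the residual
    mapping between its input and output."""
    def __init__(self, dim):
        super().__init__()
        self.fc = nn.Linear(dim, dim)
        self.bn = nn.BatchNorm1d(dim)
        self.act = nn.ReLU()

    def forward(self, x):
        return x + self.act(self.bn(self.fc(x)))

# A deep regression network over 1D numerical attribute vectors:
model = nn.Sequential(nn.Linear(128, 256),
                      *[IndividualResidualBlock(256) for _ in range(16)],
                      nn.Linear(256, 1))
pred = model(torch.randn(4, 128))
```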
TVQA+: Spatio-Temporal Grounding for Video Question Answering
Title | TVQA+: Spatio-Temporal Grounding for Video Question Answering |
Authors | Jie Lei, Licheng Yu, Tamara L. Berg, Mohit Bansal |
Abstract | We present the task of Spatio-Temporal Video Question Answering, which requires intelligent systems to simultaneously retrieve relevant moments and detect referenced visual concepts (people and objects) to answer natural language questions about videos. We first augment the TVQA dataset with 310.8k bounding boxes, linking depicted objects to visual concepts in questions and answers; we name this augmented version TVQA+. We then propose Spatio-Temporal Answerer with Grounded Evidence (STAGE), a unified framework that grounds evidence in both the spatial and temporal domains to answer questions about videos. Comprehensive experiments and analyses demonstrate the effectiveness of our framework and how the rich annotations in the TVQA+ dataset contribute to the question answering task. As a by-product of performing this joint task, our model is able to produce more insightful intermediate results. The dataset and code are publicly available. |
Tasks | Question Answering, Video Question Answering |
Published | 2019-04-25 |
URL | http://arxiv.org/abs/1904.11574v1 |
http://arxiv.org/pdf/1904.11574v1.pdf | |
PWC | https://paperswithcode.com/paper/tvqa-spatio-temporal-grounding-for-video |
Repo | https://github.com/jayleicn/TVQA-PLUS |
Framework | pytorch |
Learning Sparse Sharing Architectures for Multiple Tasks
Title | Learning Sparse Sharing Architectures for Multiple Tasks |
Authors | Tianxiang Sun, Yunfan Shao, Xiaonan Li, Pengfei Liu, Hang Yan, Xipeng Qiu, Xuanjing Huang |
Abstract | Most existing deep multi-task learning models are based on parameter sharing, such as hard sharing, hierarchical sharing, and soft sharing. Choosing a suitable sharing mechanism depends on the relations among the tasks, which is not easy, since it is difficult to understand the underlying shared factors among these tasks. In this paper, we propose a novel parameter sharing mechanism named Sparse Sharing. Given multiple tasks, our approach automatically finds a sparse sharing structure. We start with an over-parameterized base network, from which each task extracts a subnetwork. The subnetworks of multiple tasks partially overlap and are trained in parallel. We show that both hard sharing and hierarchical sharing can be formulated as particular instances of the sparse sharing framework. We conduct extensive experiments on three sequence labeling tasks. Compared with single-task models and three typical multi-task learning baselines, our proposed approach achieves consistent improvement while requiring fewer parameters. |
Tasks | Multi-Task Learning |
Published | 2019-11-12 |
URL | https://arxiv.org/abs/1911.05034v2 |
https://arxiv.org/pdf/1911.05034v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-sparse-sharing-architectures-for |
Repo | https://github.com/choosewhatulike/sparse-sharing |
Framework | pytorch |
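A hedged PyTorch sketch of sparse sharing: one over-parameterized weight matrix is shared by all tasks, and each task owns a binary mask selecting its (possibly overlapping) subnetwork. Random masks are used purely for illustration; the paper extracts them via pruning-style subnetwork selection.

```python
import torch
import torch.nn as nn

class SparseSharedLinear(nn.Module):
    """Shared weights plus fixed per-task binary masks; the subnetworks of
    different tasks may partially overlap and are trained in parallel."""
    def __init__(self, d_in, d_out, n_tasks, keep=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.01)
        masks = (torch.rand(n_tasks, d_out, d_in) < keep).float()
        self.register_buffer("masks", masks)   # illustrative random masks

    def forward(self, x, task):
        return x @ (self.weight * self.masks[task]).t()

layer = SparseSharedLinear(64, 32, n_tasks=3)
y = layer(torch.randn(8, 64), task=0)   # gradients flow only through task 0's mask
```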
Machine learning for automatic construction of pseudo-realistic pediatric abdominal phantoms
Title | Machine learning for automatic construction of pseudo-realistic pediatric abdominal phantoms |
Authors | Marco Virgolin, Ziyuan Wang, Tanja Alderliesten, Peter A. N. Bosman |
Abstract | Machine Learning (ML) is proving extremely beneficial in many healthcare applications. In pediatric oncology, retrospective studies that investigate the relationship between treatment and late adverse effects still rely on simple heuristics. To assess the effects of radiation therapy, treatment plans are typically simulated on phantoms, i.e., virtual surrogates of patient anatomy. Currently, phantoms are built according to reasonable, yet simple, human-designed criteria, which often results in a lack of individualization. We present a novel approach that combines imaging and ML to build individualized phantoms automatically. Given the features of a patient treated historically (for whom only 2D radiographs are available), and a database of 3D Computed Tomography (CT) imaging with organ segmentations and the corresponding patient features, our approach uses ML to predict how to assemble a patient-specific phantom automatically. Experiments on 60 abdominal CTs of pediatric patients show that our approach constructs significantly more representative phantoms than current phantom-building criteria, in terms of the location and shape of the abdomen and of two considered organs, the liver and the spleen. Among the several ML algorithms considered, the Gene-pool Optimal Mixing Evolutionary Algorithm for Genetic Programming (GP-GOMEA) delivers the best-performing models, which are, moreover, transparent and interpretable mathematical expressions. |
Tasks | Computed Tomography (CT) |
Published | 2019-09-09 |
URL | https://arxiv.org/abs/1909.03723v1 |
https://arxiv.org/pdf/1909.03723v1.pdf | |
PWC | https://paperswithcode.com/paper/machine-learning-for-automatic-construction |
Repo | https://github.com/marcovirgolin/APhA |
Framework | none |
Facial Motion Prior Networks for Facial Expression Recognition
Title | Facial Motion Prior Networks for Facial Expression Recognition |
Authors | Yuedong Chen, Jianfeng Wang, Shikai Chen, Zhongchao Shi, Jianfei Cai |
Abstract | Deep learning based facial expression recognition (FER) has received a lot of attention in the past few years. Most existing deep learning based FER methods do not consider domain knowledge well and thereby fail to extract representative features. In this work, we propose a novel FER framework, named Facial Motion Prior Networks (FMPN). In particular, we introduce an additional branch to generate a facial mask that focuses on facial muscle moving regions. To guide the facial mask learning, we propose to incorporate prior domain knowledge by using the average differences between neutral faces and the corresponding expressive faces as the training guidance. Extensive experiments on three facial expression benchmark datasets demonstrate the effectiveness of the proposed method compared with state-of-the-art approaches. |
Tasks | Facial Expression Recognition |
Published | 2019-02-23 |
URL | https://arxiv.org/abs/1902.08788v2 |
https://arxiv.org/pdf/1902.08788v2.pdf | |
PWC | https://paperswithcode.com/paper/facial-motion-prior-networks-for-facial |
Repo | https://github.com/donydchen/FMPN-FER |
Framework | pytorch |
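The facial-motion prior reduces to a small pattern: an extra branch predicts a mask over muscle-motion regions, supervised by the average neutral-to-expressive difference, and the mask reweights the input before recognition. The tiny conv stack and L1 guidance loss in this hedged PyTorch sketch are assumptions; see the FMPN-FER repo for the real architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskBranchSketch(nn.Module):
    """Predict a facial mask, optionally guided by a prior difference map
    (average |expressive - neutral| face), then reweight the input."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, face, prior_diff=None):
        mask = self.net(face)
        guide = (F.l1_loss(mask, prior_diff)
                 if prior_diff is not None else None)
        return face * mask, guide      # masked face + guidance loss

branch = MaskBranchSketch()
masked, guide_loss = branch(torch.rand(2, 1, 48, 48), torch.rand(2, 1, 48, 48))
```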
ADN: Artifact Disentanglement Network for Unsupervised Metal Artifact Reduction
Title | ADN: Artifact Disentanglement Network for Unsupervised Metal Artifact Reduction |
Authors | Haofu Liao, Wei-An Lin, S. Kevin Zhou, Jiebo Luo |
Abstract | Current deep neural network based approaches to computed tomography (CT) metal artifact reduction (MAR) are supervised methods that rely on synthesized metal artifacts for training. However, as synthesized data may not accurately simulate the underlying physical mechanisms of CT imaging, the supervised methods often generalize poorly to clinical applications. To address this problem, we propose, to the best of our knowledge, the first unsupervised learning approach to MAR. Specifically, we introduce a novel artifact disentanglement network that disentangles the metal artifacts from CT images in the latent space. It supports different forms of generation (artifact reduction, artifact transfer, self-reconstruction, etc.) with specialized loss functions, obviating the need for supervision with synthesized data. Extensive experiments show that, when applied to a synthesized dataset, our method addresses metal artifacts significantly better than existing unsupervised models designed for natural image-to-image translation problems, and achieves performance comparable to existing supervised models for MAR. When applied to clinical datasets, our method demonstrates better generalization ability than the supervised models. The source code of this paper is publicly available at https://github.com/liaohaofu/adn. |
Tasks | Computed Tomography (CT), Image-to-Image Translation, Medical Image Generation, Metal Artifact Reduction |
Published | 2019-08-03 |
URL | https://arxiv.org/abs/1908.01104v4 |
https://arxiv.org/pdf/1908.01104v4.pdf | |
PWC | https://paperswithcode.com/paper/adn-artifact-disentanglement-network-for |
Repo | https://github.com/JunMa11/MICCAI2019-OpenSourcePapers |
Framework | tf |
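The disentanglement can be sketched with two encoders and one decoder: a content code and an artifact code are recombined in different ways to realize artifact reduction, self-reconstruction, and artifact transfer. The single-conv encoders and decoder in this hedged PyTorch sketch are placeholders for the paper's architecture (the official code is at liaohaofu/adn).

```python
import torch
import torch.nn as nn

class ADNSketch(nn.Module):
    """Disentangle a CT image into content and artifact codes, then decode
    different combinations of the two."""
    def __init__(self, ch=16):
        super().__init__()
        self.enc_content = nn.Conv2d(1, ch, 3, padding=1)
        self.enc_artifact = nn.Conv2d(1, ch, 3, padding=1)
        self.dec = nn.Conv2d(2 * ch, 1, 3, padding=1)

    def forward(self, x_art, x_clean):
        c, a = self.enc_content(x_art), self.enc_artifact(x_art)
        reduce = self.dec(torch.cat([c, torch.zeros_like(a)], 1))       # artifact reduction
        recon = self.dec(torch.cat([c, a], 1))                          # self-reconstruction
        transfer = self.dec(torch.cat([self.enc_content(x_clean), a], 1))  # artifact transfer
        return reduce, recon, transfer

outs = ADNSketch()(torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64))
```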