January 30, 2020

3210 words 16 mins read

Paper Group ANR 301

RadiX-Net: Structured Sparse Matrices for Deep Neural Networks

Title RadiX-Net: Structured Sparse Matrices for Deep Neural Networks
Authors Ryan A. Robinett, Jeremy Kepner
Abstract The sizes of deep neural networks (DNNs) are rapidly outgrowing the capacity of hardware to store and train them. Research over the past few decades has explored the prospect of sparsifying DNNs before, during, and after training by pruning edges from the underlying topology. The resulting neural network is known as a sparse neural network. More recent work has demonstrated the remarkable result that certain sparse DNNs can train to the same precision as dense DNNs at lower runtime and storage cost. An intriguing class of these sparse DNNs is the X-Nets, which are initialized and trained upon a sparse topology with neither reference to a parent dense DNN nor subsequent pruning. We present an algorithm that deterministically generates RadiX-Nets: sparse DNN topologies that, as a whole, are much more diverse than X-Net topologies, while preserving X-Nets’ desired characteristics. We further present a functional-analytic conjecture based on the longstanding observation that sparse neural network topologies can attain the same expressive power as dense counterparts.
Tasks
Published 2019-04-30
URL http://arxiv.org/abs/1905.00416v1
PDF http://arxiv.org/pdf/1905.00416v1.pdf
PWC https://paperswithcode.com/paper/radix-net-structured-sparse-matrices-for-deep
Repo
Framework
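
The paper's construction is deterministic and radix-based; as a rough, hypothetical illustration of that flavor of topology (a butterfly-style connectivity mask, not the authors' exact algorithm), a sparse layer mask can be generated like so:

```python
import numpy as np

def radix_layer_mask(n: int, radix: int, stage: int) -> np.ndarray:
    """Toy deterministic sparse mask: output neuron j connects to the `radix`
    inputs whose base-`radix` expansion agrees with j except in digit `stage`
    (an FFT/butterfly-style pattern)."""
    assert radix ** (stage + 1) <= n, "digit `stage` must fit inside n"
    mask = np.zeros((n, n), dtype=bool)
    stride = radix ** stage
    for j in range(n):
        base = j - ((j // stride) % radix) * stride  # zero out digit `stage`
        for d in range(radix):
            mask[base + d * stride, j] = True
    return mask

# Every output neuron has exactly `radix` in-edges; stacking log_radix(n)
# such stages gives sparse layers that still connect all inputs to all outputs.
m = radix_layer_mask(8, radix=2, stage=1)
print(m.astype(int))
print("in-edges per output neuron:", m.sum(axis=0))  # all equal to radix
```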

Temporal and Aspectual Entailment

Title Temporal and Aspectual Entailment
Authors Thomas Kober, Sander Bijl de Vroe, Mark Steedman
Abstract Inferences regarding “Jane’s arrival in London” from predications such as “Jane is going to London” or “Jane has gone to London” depend on tense and aspect of the predications. Tense determines the temporal location of the predication in the past, present or future of the time of utterance. The aspectual auxiliaries on the other hand specify the internal constituency of the event, i.e. whether the event of “going to London” is completed and whether its consequences hold at that time or not. While tense and aspect are among the most important factors for determining natural language inference, there has been very little work to show whether modern NLP models capture these semantic concepts. In this paper we propose a novel entailment dataset and analyse the ability of a range of recently proposed NLP models to perform inference on temporal predications. We show that the models encode a substantial amount of morphosyntactic information relating to tense and aspect, but fail to model inferences that require reasoning with these semantic properties.
Tasks Natural Language Inference
Published 2019-04-02
URL http://arxiv.org/abs/1904.01297v1
PDF http://arxiv.org/pdf/1904.01297v1.pdf
PWC https://paperswithcode.com/paper/temporal-and-aspectual-entailment
Repo
Framework
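
The inference pattern from the abstract can be made concrete with a few toy premise-hypothesis pairs; the labels below follow the abstract's reasoning about completed versus ongoing events and are illustrative, not items from the actual dataset.

```python
# (premise, hypothesis, illustrative label)
examples = [
    ("Jane has gone to London", "Jane's arrival in London", "entailed"),      # perfect: event completed
    ("Jane is going to London", "Jane's arrival in London", "not entailed"),  # progressive: still ongoing
    ("Jane will go to London",  "Jane's arrival in London", "not entailed"),  # future: not yet realized
]
for premise, hypothesis, label in examples:
    print(f"{premise!r} => {hypothesis!r}: {label}")
```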

Projectron – A Shallow and Interpretable Network for Classifying Medical Images

Title Projectron – A Shallow and Interpretable Network for Classifying Medical Images
Authors Aditya Sriram, Shivam Kalra, H. R. Tizhoosh
Abstract This paper introduces the “Projectron” as a new neural network architecture that uses Radon projections to both classify and represent medical images. The motivation is to build shallow networks which are more interpretable in the medical imaging domain. The Radon transform is an established technique that can reconstruct images from parallel projections. The Projectron first applies a global Radon transform to each image using equidistant angles and then feeds these transformations for encoding to a single layer of neurons followed by a layer of suitable kernels to facilitate a linear separation of projections. Finally, the Projectron provides the output of the encoding as an input to two more layers for final classification. We validate the Projectron on five publicly available datasets, a general dataset (namely MNIST) and four medical datasets (namely Emphysema, IDC, IRMA, and Pneumonia). The results are encouraging as we compared the Projectron’s performance against MLPs with raw images and Radon projections as inputs, respectively. Experiments clearly demonstrate the potential of the proposed Projectron for representing/classifying medical images.
Tasks
Published 2019-03-15
URL http://arxiv.org/abs/1904.00740v1
PDF http://arxiv.org/pdf/1904.00740v1.pdf
PWC https://paperswithcode.com/paper/projectron-a-shallow-and-interpretable
Repo
Framework
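
A minimal sketch of the pipeline described: Radon-transform each image at equidistant angles, then classify the flattened projections with a shallow network. The angle count and layer sizes are assumptions, and the paper's kernel layer is simplified to plain ReLU layers here.

```python
import numpy as np
import torch
import torch.nn as nn
from skimage.transform import radon

def radon_features(img, n_angles=16):
    """Global Radon transform at equidistant angles, flattened to one vector."""
    angles = np.linspace(0.0, 180.0, n_angles, endpoint=False)
    sinogram = radon(img, theta=angles, circle=False)  # (detector_bins, n_angles)
    return torch.from_numpy(sinogram).float().flatten()

x = radon_features(np.random.rand(28, 28))
# Shallow classifier: one encoding layer over the projections, then two more
# layers for the final classification, loosely mirroring the description.
model = nn.Sequential(
    nn.Linear(x.numel(), 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, 10),
)
print(model(x.unsqueeze(0)).shape)  # torch.Size([1, 10])
```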

Justification-Based Reliability in Machine Learning

Title Justification-Based Reliability in Machine Learning
Authors Nurali Virani, Naresh Iyer, Zhaoyuan Yang
Abstract With the advent of Deep Learning, the field of machine learning (ML) has surpassed human-level performance on diverse classification tasks. At the same time, there is a stark need to characterize and quantify reliability of a model’s prediction on individual samples. This is especially true in applications of such models in safety-critical domains of industrial control and healthcare. To address this need, we link the question of reliability of a model’s individual prediction to the epistemic uncertainty of the model’s prediction. More specifically, we extend the theory of Justified True Belief (JTB) in epistemology, created to study the validity and limits of human-acquired knowledge, towards characterizing the validity and limits of knowledge in supervised classifiers. We present an analysis of neural network classifiers linking the reliability of its prediction on an input to characteristics of the support gathered from the input and latent spaces of the network. We hypothesize that the JTB analysis exposes the epistemic uncertainty (or ignorance) of a model with respect to its inference, thereby allowing for the inference to be only as strong as the justification permits. We explore various forms of support (e.g., k-nearest neighbors (k-NN) and l_p-norm based) generated for an input, using the training data to construct a justification for the prediction with that input. Through experiments conducted on simulated and real datasets, we demonstrate that our approach can provide reliability for individual predictions and characterize regions where such reliability cannot be ascertained.
Tasks
Published 2019-11-18
URL https://arxiv.org/abs/1911.07391v2
PDF https://arxiv.org/pdf/1911.07391v2.pdf
PWC https://paperswithcode.com/paper/justification-based-reliability-in-machine
Repo
Framework
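
A rough sketch of the k-NN flavour of support mentioned in the abstract (my simplification, not the authors' exact procedure): gather the nearest training neighbours of an input in feature space and treat label agreement among them as the justification backing the prediction.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def justification_score(x, train_feats, train_labels, predicted_label, k=5):
    """Fraction of k nearest training neighbours agreeing with the prediction.
    Low agreement flags epistemic uncertainty: the prediction lacks support."""
    nn_index = NearestNeighbors(n_neighbors=k).fit(train_feats)
    _, idx = nn_index.kneighbors(x.reshape(1, -1))
    return float(np.mean(train_labels[idx[0]] == predicted_label))

rng = np.random.default_rng(0)
train_feats = rng.normal(size=(200, 8))
train_labels = (train_feats[:, 0] > 0).astype(int)
x = rng.normal(size=8)
pred = int(x[0] > 0)
print("support for prediction:", justification_score(x, train_feats, train_labels, pred))
```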

Improving Limited Angle CT Reconstruction with a Robust GAN Prior

Title Improving Limited Angle CT Reconstruction with a Robust GAN Prior
Authors Rushil Anirudh, Hyojin Kim, Jayaraman J. Thiagarajan, K. Aditya Mohan, Kyle M. Champley
Abstract Limited angle CT reconstruction is an under-determined linear inverse problem that requires appropriate regularization techniques to be solved. In this work we study how pre-trained generative adversarial networks (GANs) can be used to clean noisy, highly artifact-laden reconstructions from conventional techniques, by effectively projecting onto the inferred image manifold. In particular, we use a robust version of the popularly used GAN prior for inverse problems, based on a recent technique called corruption mimicking, that significantly improves the reconstruction quality. The proposed approach operates in the image space directly, as a result of which it does not need to be trained or require access to the measurement model, is scanner agnostic, and can work over a wide range of sensing scenarios.
Tasks
Published 2019-10-03
URL https://arxiv.org/abs/1910.01634v4
PDF https://arxiv.org/pdf/1910.01634v4.pdf
PWC https://paperswithcode.com/paper/improving-limited-angle-ct-reconstruction
Repo
Framework
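
The core operation, projecting a corrupted reconstruction onto a GAN's image manifold, can be sketched as latent-space optimization; the corruption-mimicking component of the paper's robust prior is omitted here, and `generator` is a toy stand-in for any pre-trained GAN.

```python
import torch

def project_to_manifold(x_noisy, generator, z_dim=128, steps=200, lr=0.05):
    """Find G(z) closest to the noisy reconstruction in pixel space."""
    for p in generator.parameters():   # freeze the pre-trained generator
        p.requires_grad_(False)
    z = torch.zeros(1, z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.mean((generator(z) - x_noisy) ** 2)
        loss.backward()
        opt.step()
    return generator(z).detach()

# Toy stand-in generator so the sketch runs end to end.
generator = torch.nn.Sequential(torch.nn.Linear(128, 64 * 64), torch.nn.Tanh(),
                                torch.nn.Unflatten(1, (64, 64)))
x_noisy = torch.rand(1, 64, 64) * 2 - 1
x_clean = project_to_manifold(x_noisy, generator)
print(x_clean.shape)  # torch.Size([1, 64, 64])
```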

Literature Review: Human Segmentation with Static Camera

Title Literature Review: Human Segmentation with Static Camera
Authors Jiaxin Xu, Rui Wang, Vaibhav Rakheja
Abstract Our research topic is human segmentation with a static camera. This topic can be divided into three sub-tasks: object detection, instance identification, and segmentation. These sub-tasks are closely related subjects, and the development of each has a great impact on the other two fields. In this literature review, we will first introduce the background of human segmentation and then discuss issues related to the above three fields, as well as how they interact with each other.
Tasks
Published 2019-10-28
URL https://arxiv.org/abs/1910.12945v1
PDF https://arxiv.org/pdf/1910.12945v1.pdf
PWC https://paperswithcode.com/paper/literature-review-human-segmentation-with
Repo
Framework

Video Face Recognition: Component-wise Feature Aggregation Network (C-FAN)

Title Video Face Recognition: Component-wise Feature Aggregation Network (C-FAN)
Authors Sixue Gong, Yichun Shi, Anil K. Jain
Abstract We propose a new approach to video face recognition. Our component-wise feature aggregation network (C-FAN) accepts a set of face images of a subject as an input, and outputs a single feature vector as the face representation of the set for the recognition task. The whole network is trained in two steps: (i) train a base CNN for still image face recognition; (ii) add an aggregation module to the base network to learn the quality value for each feature component, which adaptively aggregates deep feature vectors into a single vector to represent the face in a video. C-FAN automatically learns to retain salient face features with high quality scores while suppressing features with low quality scores. The experimental results on three benchmark datasets, YouTube Faces, IJB-A, and IJB-S, show that the proposed C-FAN network is capable of generating a compact feature vector with 512 dimensions for a video sequence by efficiently aggregating feature vectors of all the video frames to achieve state-of-the-art performance.
Tasks Face Recognition
Published 2019-02-19
URL https://arxiv.org/abs/1902.07327v3
PDF https://arxiv.org/pdf/1902.07327v3.pdf
PWC https://paperswithcode.com/paper/video-face-recognition-component-wise-feature
Repo
Framework
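
The aggregation module as described admits a compact sketch: a quality value per feature component per frame, normalized across frames and used for a weighted sum into one 512-d representation. The linear quality head below is an assumption; the paper learns its module on top of a pre-trained face CNN.

```python
import torch
import torch.nn as nn

class ComponentAggregation(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.quality = nn.Linear(dim, dim)   # one quality score per component

    def forward(self, frame_feats):          # (n_frames, dim)
        q = self.quality(frame_feats)        # (n_frames, dim) quality values
        w = torch.softmax(q, dim=0)          # normalize over frames, per component
        return (w * frame_feats).sum(dim=0)  # (dim,) single set representation

agg = ComponentAggregation()
video_feats = torch.randn(30, 512)           # 30 frames of 512-d CNN features
print(agg(video_feats).shape)                # torch.Size([512])
```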

An Empirical Study on Position of the Batch Normalization Layer in Convolutional Neural Networks

Title An Empirical Study on Position of the Batch Normalization Layer in Convolutional Neural Networks
Authors Moein Hasani, Hassan Khotanlou
Abstract In this paper, we have studied how the training of the convolutional neural networks (CNNs) can be affected by changing the position of the batch normalization (BN) layer. Three different convolutional neural networks have been chosen for our experiments. These networks are AlexNet, VGG-16, and ResNet-20. We show that the speed up in training provided by the BN algorithm can be improved by using other positions for the BN layer than the one suggested by its original paper. Also, we discuss how the BN layer in a certain position can aid the training of one network but not the other. Three different positions for the BN layer have been studied in this research. These positions are: the BN layer between the convolution layer and the non-linear activation function, the BN layer after the non-linear activation function and finally, the BN layer before each of the convolutional layers.
Tasks
Published 2019-12-09
URL https://arxiv.org/abs/1912.04259v2
PDF https://arxiv.org/pdf/1912.04259v2.pdf
PWC https://paperswithcode.com/paper/an-empirical-study-on-position-of-the-batch
Repo
Framework
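
The three placements studied map directly onto code; a PyTorch sketch with illustrative channel counts:

```python
import torch.nn as nn

def conv_bn_act(c_in, c_out):   # BN between convolution and activation (original placement)
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.BatchNorm2d(c_out), nn.ReLU())

def conv_act_bn(c_in, c_out):   # BN after the non-linear activation
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.ReLU(), nn.BatchNorm2d(c_out))

def bn_conv_act(c_in, c_out):   # BN before the convolutional layer
    return nn.Sequential(nn.BatchNorm2d(c_in),
                         nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU())
```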

Early Detection of Injuries in MLB Pitchers from Video

Title Early Detection of Injuries in MLB Pitchers from Video
Authors AJ Piergiovanni, Michael S. Ryoo
Abstract Injuries are a major cost in sports. Teams spend millions of dollars every year on players who are hurt and unable to play, resulting in lost games, decreased fan interest and additional wages for replacement players. Modern convolutional neural networks have been successfully applied to many video recognition tasks. In this paper, we introduce the problem of injury detection/prediction in MLB pitchers and experimentally evaluate the ability of such convolutional models to detect and predict injuries in pitchers only from video data. We conduct experiments on a large dataset of TV broadcast MLB videos of 20 different pitchers who were injured during the 2017 season. We experimentally evaluate the model’s performance on each individual pitcher, how well it generalizes to new pitchers, how it performs for various injuries, and how early it can predict or detect an injury.
Tasks Video Recognition
Published 2019-04-18
URL http://arxiv.org/abs/1904.08916v1
PDF http://arxiv.org/pdf/1904.08916v1.pdf
PWC https://paperswithcode.com/paper/early-detection-of-injuries-in-mlb-pitchers
Repo
Framework

Variational Context: Exploiting Visual and Textual Context for Grounding Referring Expressions

Title Variational Context: Exploiting Visual and Textual Context for Grounding Referring Expressions
Authors Yulei Niu, Hanwang Zhang, Zhiwu Lu, Shih-Fu Chang
Abstract We focus on grounding (i.e., localizing or linking) referring expressions in images, e.g., “largest elephant standing behind baby elephant”. This is a general yet challenging vision-language task since it does not only require the localization of objects, but also the multimodal comprehension of context -- visual attributes (e.g., “largest”, “baby”) and relationships (e.g., “behind”) that help to distinguish the referent from other objects, especially those of the same category. Due to the exponential complexity involved in modeling the context associated with multiple image regions, existing work oversimplifies this task to pairwise region modeling by multiple instance learning. In this paper, we propose a variational Bayesian method, called Variational Context, to solve the problem of complex context modeling in referring expression grounding. Specifically, our framework exploits the reciprocal relation between the referent and context, i.e., either of them influences estimation of the posterior distribution of the other, and thereby the search space of context can be greatly reduced. In addition to reciprocity, our framework considers the semantic information of context, i.e., the referring expression can be reproduced based on the estimated context. We also extend the model to the unsupervised setting, where no annotation for the referent is available. Extensive experiments on various benchmarks show consistent improvement over state-of-the-art methods in both supervised and unsupervised settings.
Tasks Multiple Instance Learning
Published 2019-07-08
URL https://arxiv.org/abs/1907.03609v1
PDF https://arxiv.org/pdf/1907.03609v1.pdf
PWC https://paperswithcode.com/paper/variational-context-exploiting-visual-and
Repo
Framework
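
A loose, illustrative sketch of the reciprocity idea only (not the paper's variational model): referent and context attention over region features are estimated alternately, each conditioning the other. The score functions here are arbitrary toy choices.

```python
import torch

def reciprocal_grounding(region_feats, query, steps=3):
    """region_feats: (n, d) region embeddings; query: (d,) expression embedding."""
    n, d = region_feats.shape
    context = torch.full((n,), 1.0 / n)                  # uniform context prior
    for _ in range(steps):
        ctx_feat = context @ region_feats                # (d,) expected context
        referent = torch.softmax(region_feats @ (query + ctx_feat), dim=0)
        ref_feat = referent @ region_feats               # (d,) expected referent
        # toy choice: context attends to what complements the current referent
        context = torch.softmax(region_feats @ (query - ref_feat), dim=0)
    return referent                                      # posterior over regions

scores = reciprocal_grounding(torch.randn(6, 16), torch.randn(16))
print(scores)  # attention over the 6 regions, sums to 1
```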

Black-box Adversarial Attacks on Video Recognition Models

Title Black-box Adversarial Attacks on Video Recognition Models
Authors Linxi Jiang, Xingjun Ma, Shaoxiang Chen, James Bailey, Yu-Gang Jiang
Abstract Deep neural networks (DNNs) are known for their vulnerability to adversarial examples. These are examples that have undergone small, carefully crafted perturbations, and which can easily fool a DNN into making misclassifications at test time. Thus far, the field of adversarial research has mainly focused on image models, under either a white-box setting, where an adversary has full access to model parameters, or a black-box setting where an adversary can only query the target model for probabilities or labels. Whilst several white-box attacks have been proposed for video models, black-box video attacks are still unexplored. To close this gap, we propose the first black-box video attack framework, called V-BAD. V-BAD utilizes tentative perturbations transferred from image models, and partition-based rectifications found by NES (Natural Evolution Strategies) on partitions (patches) of tentative perturbations, to obtain good adversarial gradient estimates with fewer queries to the target model. V-BAD is equivalent to estimating the projection of an adversarial gradient on a selected subspace. Using three benchmark video datasets, we demonstrate that V-BAD can craft both untargeted and targeted attacks to fool two state-of-the-art deep video recognition models. For the targeted attack, it achieves $>$93% success rate using only an average of $3.4 \sim 8.4 \times 10^4$ queries, a similar number of queries to state-of-the-art black-box image attacks. This is despite the fact that videos often have two orders of magnitude higher dimensionality than static images. We believe that V-BAD is a promising new tool to evaluate and improve the robustness of video recognition models to black-box adversarial attacks.
Tasks Video Recognition
Published 2019-04-10
URL https://arxiv.org/abs/1904.05181v2
PDF https://arxiv.org/pdf/1904.05181v2.pdf
PWC https://paperswithcode.com/paper/black-box-adversarial-attacks-on-video
Repo
Framework
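
The estimator V-BAD builds on is NES with antithetic sampling; a minimal sketch using only black-box loss queries (the patch partitioning and image-model transfer priors of the full method are omitted):

```python
import numpy as np

def nes_gradient(loss_fn, x, sigma=0.01, n_samples=25, rng=None):
    """Antithetic NES estimate of d loss / d x using only loss queries."""
    rng = rng or np.random.default_rng(0)
    grad = np.zeros_like(x)
    for _ in range(n_samples):
        u = rng.normal(size=x.shape)
        grad += (loss_fn(x + sigma * u) - loss_fn(x - sigma * u)) * u
    return grad / (2 * sigma * n_samples)

# Sanity check on a known function: loss(x) = ||x||^2 has gradient 2x.
x = np.array([1.0, -2.0, 0.5])
print(nes_gradient(lambda v: float(v @ v), x, n_samples=500))  # approx [2, -4, 1]
```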

An Empirical Study on Regularization of Deep Neural Networks by Local Rademacher Complexity

Title An Empirical Study on Regularization of Deep Neural Networks by Local Rademacher Complexity
Authors Yingzhen Yang, Jiahui Yu, Xingjian Li, Jun Huan, Thomas S. Huang
Abstract Regularization of Deep Neural Networks (DNNs) for the sake of improving their generalization capability is important and challenging. The development in this line benefits the theoretical foundation of DNNs and promotes their usability in different areas of artificial intelligence. In this paper, we investigate the role of Rademacher complexity in improving generalization of DNNs and propose a novel regularizer rooted in Local Rademacher Complexity (LRC). While Rademacher complexity is well known as a distribution-free complexity measure of a function class that helps boost generalization of statistical learning methods, extensive study shows that LRC, its counterpart focusing on a restricted function class, leads to sharper convergence rates and potentially better generalization given a finite training sample. Our LRC based regularizer is developed by estimating the complexity of the function class centered at the minimizer of the empirical loss of DNNs. Experiments on various types of network architecture demonstrate the effectiveness of LRC regularization in improving generalization. Moreover, our method features the state-of-the-art result on the CIFAR-$10$ dataset with network architecture found by neural architecture search.
Tasks Neural Architecture Search
Published 2019-02-03
URL https://arxiv.org/abs/1902.00873v3
PDF https://arxiv.org/pdf/1902.00873v3.pdf
PWC https://paperswithcode.com/paper/an-empirical-study-on-regularization-of-deep
Repo
Framework
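
As a coarse, assumption-laden sketch of the general idea (not the paper's estimator), one can penalize the empirical correlation between network outputs and random Rademacher signs:

```python
import torch

def rademacher_penalty(model, inputs, n_draws=4):
    """Average |(1/n) sum_i eps_i * f(x_i)| over random sign vectors eps.
    A rough empirical-Rademacher surrogate, simplified from the paper's
    LRC estimate centered at the empirical minimizer."""
    outputs = model(inputs).squeeze(-1)            # (n,) scalar outputs
    penalty = outputs.new_zeros(())
    for _ in range(n_draws):
        eps = torch.randint(0, 2, outputs.shape).float() * 2 - 1  # random ±1
        penalty = penalty + (eps * outputs).mean().abs()
    return penalty / n_draws

model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.ReLU(),
                            torch.nn.Linear(32, 1))
x = torch.randn(64, 10)
loss = model(x).pow(2).mean() + 0.1 * rademacher_penalty(model, x)
loss.backward()
```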

Towards Optimizing Reiter’s HS-Tree for Sequential Diagnosis

Title Towards Optimizing Reiter’s HS-Tree for Sequential Diagnosis
Authors Patrick Rodler
Abstract Reiter’s HS-Tree is one of the most popular diagnostic search algorithms due to its desirable properties and general applicability. In sequential diagnosis, where the addressed diagnosis problem is subject to successive change through the acquisition of additional knowledge about the diagnosed system, HS-Tree is used in a stateless fashion. That is, the existing search tree is discarded when new knowledge is obtained, albeit often large parts of the tree are still relevant and have to be rebuilt in the next iteration, involving redundant operations and costly reasoner calls. As a remedy to this, we propose DynamicHS, a variant of HS-Tree that avoids these redundancy issues by maintaining state throughout sequential diagnosis while preserving all desirable properties of HS-Tree. Preliminary results of ongoing evaluations in a problem domain where HS-Tree is the state-of-the-art diagnostic method suggest significant time savings achieved by DynamicHS by reducing expensive reasoner calls.
Tasks Sequential Diagnosis
Published 2019-07-28
URL https://arxiv.org/abs/1907.12130v1
PDF https://arxiv.org/pdf/1907.12130v1.pdf
PWC https://paperswithcode.com/paper/towards-optimizing-reiters-hs-tree-for
Repo
Framework
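
Reiter's HS-Tree in its classic stateless form can be sketched for small inputs; DynamicHS's tree reuse across iterations is not shown, and conflict sets are given up front rather than computed by a reasoner.

```python
from collections import deque

def minimal_hitting_sets(conflicts):
    """Breadth-first HS-Tree: each node's edge labels form a partial hitting
    set; a node whose set hits every conflict set is a diagnosis."""
    conflicts = [frozenset(c) for c in conflicts]
    results, queue, seen = [], deque([frozenset()]), {frozenset()}
    while queue:
        h = queue.popleft()
        if any(h >= r for r in results):        # pruned: superset of a diagnosis
            continue
        unhit = next((c for c in conflicts if not (c & h)), None)
        if unhit is None:
            results.append(h)                   # h hits every conflict set
            continue
        for elem in unhit:                      # branch on each element
            child = h | {elem}
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return results

print(minimal_hitting_sets([{1, 2}, {2, 3}, {1, 3}]))
# the minimal diagnoses: {1, 2}, {1, 3}, {2, 3}
```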

Demonstration of Vector Flow Imaging using Convolutional Neural Networks

Title Demonstration of Vector Flow Imaging using Convolutional Neural Networks
Authors Thomas Robins, Antonio Stanziola, Kai Reimer, Peter Weinberg, Meng-Xing Tang
Abstract Synthetic Aperture Vector Flow Imaging (SA-VFI) can visualize complex cardiac and vascular blood flow patterns at high temporal resolution with a large field of view. Convolutional neural networks (CNNs) are commonly used in image and video recognition and classification. However, most recently presented CNNs also allow for making per-pixel predictions as needed in optical flow velocimetry. To our knowledge we demonstrate here for the first time a CNN architecture to produce 2D full flow field predictions from high frame rate SA ultrasound images using supervised learning. The CNN was initially trained using CFD-generated and augmented noiseless SA ultrasound data of a realistic vessel geometry. Subsequently, a mix of noisy simulated and real in vivo acquisitions was added to increase the network’s robustness. The resulting flow field of the CNN resembled the ground truth accurately with an endpoint-error percentage between 6.5% and 14.5%. Furthermore, when confronted with an unknown geometry of an arterial bifurcation, the CNN was able to predict an accurate flow field indicating its ability for generalization. Remarkably, the CNN also performed well for rotational flows, which usually requires advanced, computationally intensive VFI methods. We have demonstrated that convolutional neural networks can be used to estimate complex multidirectional flow.
Tasks Optical Flow Estimation, Video Recognition
Published 2019-03-11
URL http://arxiv.org/abs/1903.06254v1
PDF http://arxiv.org/pdf/1903.06254v1.pdf
PWC https://paperswithcode.com/paper/demonstration-of-vector-flow-imaging-using
Repo
Framework
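
A skeletal version of the setup, assuming stacked frame pairs in and a 2-channel (vx, vy) field out; the architecture below is a placeholder, not the paper's network, and the endpoint error is the standard metric the abstract reports in percentage terms.

```python
import torch
import torch.nn as nn

flow_net = nn.Sequential(                     # input: two stacked frames
    nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 2, 3, padding=1),           # output: per-pixel (vx, vy)
)

def endpoint_error(pred, gt):
    """Mean Euclidean distance between predicted and true flow vectors."""
    return torch.sqrt(((pred - gt) ** 2).sum(dim=1)).mean()

frames = torch.randn(1, 2, 64, 64)            # consecutive ultrasound frames
pred = flow_net(frames)
print(pred.shape, endpoint_error(pred, torch.zeros_like(pred)).item())
```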

Collaborative Spatio-temporal Feature Learning for Video Action Recognition

Title Collaborative Spatio-temporal Feature Learning for Video Action Recognition
Authors Chao Li, Qiaoyong Zhong, Di Xie, Shiliang Pu
Abstract Spatio-temporal feature learning is of central importance for action recognition in videos. Existing deep neural network models either learn spatial and temporal features independently (C2D) or jointly with unconstrained parameters (C3D). In this paper, we propose a novel neural operation which encodes spatio-temporal features collaboratively by imposing a weight-sharing constraint on the learnable parameters. In particular, we perform 2D convolution along three orthogonal views of volumetric video data, which learns spatial appearance and temporal motion cues respectively. By sharing the convolution kernels of different views, spatial and temporal features are collaboratively learned and thus benefit from each other. The complementary features are subsequently fused by a weighted summation whose coefficients are learned end-to-end. Our approach achieves state-of-the-art performance on large-scale benchmarks and won the 1st place in the Moments in Time Challenge 2018. Moreover, based on the learned coefficients of different views, we are able to quantify the contributions of spatial and temporal features. This analysis sheds light on interpretability of the model and may also guide the future design of algorithms for video recognition.
Tasks Action Recognition In Videos, Temporal Action Localization, Video Recognition
Published 2019-03-04
URL http://arxiv.org/abs/1903.01197v1
PDF http://arxiv.org/pdf/1903.01197v1.pdf
PWC https://paperswithcode.com/paper/collaborative-spatio-temporal-feature
Repo
Framework
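
The weight-sharing constraint can be sketched as one 2D kernel applied along the three orthogonal views of the video volume (H-W, T-W, T-H) and fused by learned coefficients; shapes and the fusion scheme are simplified relative to the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CollaborativeConv(nn.Module):
    def __init__(self, channels, k=3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(channels, channels, k, k) * 0.01)
        self.coeff = nn.Parameter(torch.zeros(3))    # per-view fusion weights

    def conv_view(self, x):                          # x: (N, C, A, B) 2D view
        return F.conv2d(x, self.weight, padding=self.weight.shape[-1] // 2)

    def forward(self, x):                            # x: (B, C, T, H, W)
        b, c, t, h, w = x.shape
        # same kernel convolved over the H-W, T-W, and T-H planes
        hw = self.conv_view(x.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)) \
                 .reshape(b, t, c, h, w).permute(0, 2, 1, 3, 4)
        tw = self.conv_view(x.permute(0, 3, 1, 2, 4).reshape(b * h, c, t, w)) \
                 .reshape(b, h, c, t, w).permute(0, 2, 3, 1, 4)
        th = self.conv_view(x.permute(0, 4, 1, 2, 3).reshape(b * w, c, t, h)) \
                 .reshape(b, w, c, t, h).permute(0, 2, 3, 4, 1)
        a = torch.softmax(self.coeff, dim=0)         # learned view contributions
        return a[0] * hw + a[1] * tw + a[2] * th

out = CollaborativeConv(8)(torch.randn(2, 8, 4, 16, 16))
print(out.shape)  # torch.Size([2, 8, 4, 16, 16])
```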