February 1, 2020

3260 words 16 mins read

Paper Group AWR 271


No Word is an Island – A Transformation Weighting Model for Semantic Composition

Title No Word is an Island – A Transformation Weighting Model for Semantic Composition
Authors Corina Dima, Daniël de Kok, Neele Witte, Erhard Hinrichs
Abstract Composition models of distributional semantics are used to construct phrase representations from the representations of their words. Composition models typically sit at one of two ends of a spectrum: they either have a small number of parameters but compose all phrases in the same way, or they perform word-specific compositions at the cost of a far larger number of parameters. In this paper we propose transformation weighting (TransWeight), a composition model that consistently outperforms existing models on nominal compounds, adjective-noun phrases and adverb-adjective phrases in English, German and Dutch. TransWeight drastically reduces the number of parameters needed compared to the best model in the literature by composing similar words in the same way.
Tasks Semantic Composition
Published 2019-07-11
URL https://arxiv.org/abs/1907.05048v1
PDF https://arxiv.org/pdf/1907.05048v1.pdf
PWC https://paperswithcode.com/paper/no-word-is-an-island-a-transformation
Repo https://github.com/sfb833-a3/commix
Framework tf
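
The abstract does not spell out the architecture, but the idea can be illustrated: apply a shared set of affine transformations to the concatenated word vectors, then learn a weighting that combines the transformed candidates into one phrase vector. The sketch below is a rough PyTorch rendering under those assumptions; the tensor shapes and the final weighting layer are ours, not the authors' (see the commix repo for the real model).

```python
# A minimal sketch of a transformation-weighting composition layer.
# The weighting step here (a linear map over the flattened candidates) is an
# assumption; the paper's exact weighting tensor may differ.
import torch
import torch.nn as nn

class TransWeightSketch(nn.Module):
    def __init__(self, emb_dim: int, num_transforms: int):
        super().__init__()
        # num_transforms affine maps from the concatenated pair (2d) to d.
        self.transforms = nn.Parameter(torch.randn(num_transforms, 2 * emb_dim, emb_dim) * 0.01)
        self.bias = nn.Parameter(torch.zeros(num_transforms, emb_dim))
        # Combine the t candidate compositions into one phrase vector.
        self.weighting = nn.Linear(num_transforms * emb_dim, emb_dim)

    def forward(self, u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        uv = torch.cat([u, v], dim=-1)                      # (batch, 2d)
        # Apply all transformations at once: (batch, t, d).
        h = torch.tanh(torch.einsum("bi,tij->btj", uv, self.transforms) + self.bias)
        return self.weighting(h.flatten(start_dim=1))       # (batch, d)

phrase = TransWeightSketch(emb_dim=300, num_transforms=100)(torch.randn(4, 300), torch.randn(4, 300))
```

Because the transformations are shared across the vocabulary, similar word pairs are composed in the same way, which is where the parameter savings over word-specific models come from.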

Towards Automatic Concept-based Explanations

Title Towards Automatic Concept-based Explanations
Authors Amirata Ghorbani, James Wexler, James Zou, Been Kim
Abstract Interpretability has become an important topic of research as more machine learning (ML) models are deployed and widely used to make important decisions. Most current explanation methods work through feature importance scores, which identify features that are important for each individual input. However, systematically summarizing and interpreting such per-sample feature importance scores is itself challenging. In this work, we propose principles and desiderata for concept-based explanation, which goes beyond per-sample features to identify higher-level human-understandable concepts that apply across the entire dataset. We develop a new algorithm, ACE, to automatically extract visual concepts. Our systematic experiments demonstrate that ACE discovers concepts that are human-meaningful, coherent and important for the neural network’s predictions.
Tasks Feature Importance
Published 2019-02-07
URL https://arxiv.org/abs/1902.03129v3
PDF https://arxiv.org/pdf/1902.03129v3.pdf
PWC https://paperswithcode.com/paper/automating-interpretability-discovering-and
Repo https://github.com/amiratag/ACE
Framework tf
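
As described in the ACE paper, the pipeline segments images into patches, embeds each patch with the trained network, and clusters the embeddings into candidate concepts; each cluster is then scored with TCAV, which is omitted here. A simplified sketch, where embed is a placeholder for the network's activation function and the segmentation granularity is an assumption:

```python
# Simplified sketch of ACE's concept-discovery stage: superpixel segments ->
# network embeddings -> k-means clusters. embed() is a stand-in for a forward
# pass through a trained network up to some layer.
import numpy as np
from skimage.segmentation import slic
from skimage.transform import resize
from sklearn.cluster import KMeans

def extract_segments(image: np.ndarray, n_segments: int = 15) -> list:
    """image: float array (H, W, 3) in [0, 1]; returns resized superpixel patches."""
    labels = slic(image, n_segments=n_segments)
    patches = []
    for lab in np.unique(labels):
        mask = (labels == lab)[..., None]
        # Gray out everything outside the segment, as ACE does before embedding.
        patch = np.where(mask, image, 0.5)
        patches.append(resize(patch, (224, 224)))
    return patches

def discover_concepts(images, embed, n_concepts: int = 25) -> np.ndarray:
    """Cluster segment embeddings; each cluster is a candidate visual concept."""
    segments = [p for img in images for p in extract_segments(img)]
    features = np.stack([embed(p) for p in segments])  # embed: patch -> 1-D feature
    return KMeans(n_clusters=n_concepts, n_init=10).fit_predict(features)
```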

Understanding Impacts of High-Order Loss Approximations and Features in Deep Learning Interpretation

Title Understanding Impacts of High-Order Loss Approximations and Features in Deep Learning Interpretation
Authors Sahil Singla, Eric Wallace, Shi Feng, Soheil Feizi
Abstract Current methods to interpret deep learning models by generating saliency maps generally rely on two key assumptions. First, they use first-order approximations of the loss function, neglecting higher-order terms such as the loss curvature. Second, they evaluate each feature’s importance in isolation, ignoring their inter-dependencies. In this work, we study the effect of relaxing these two assumptions. First, by characterizing a closed-form formula for the Hessian matrix of a deep ReLU network, we prove that, for a classification problem with a large number of classes, if an input has a high-confidence classification score, the inclusion of the Hessian term has a small impact on the final solution. We prove this result by showing that in this case the Hessian matrix is approximately of rank one and its leading eigenvector is almost parallel to the gradient of the loss function. Our empirical experiments on ImageNet samples are consistent with our theory. This result also has implications for related problems such as adversarial examples. Second, we compute the importance of group features in deep learning interpretation by introducing a sparsity regularization term. We use the $L_0-L_1$ relaxation technique along with proximal gradient descent to compute group feature importance scores efficiently. Our empirical results indicate that considering group features can improve deep learning interpretation significantly.
Tasks Feature Importance
Published 2019-02-01
URL https://arxiv.org/abs/1902.00407v2
PDF https://arxiv.org/pdf/1902.00407v2.pdf
PWC https://paperswithcode.com/paper/understanding-impacts-of-high-order-loss
Repo https://github.com/singlasahil14/CASO
Framework pytorch
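
The rank-one claim can be probed numerically: if the Hessian is approximately rank one with its leading eigenvector parallel to the gradient g, then the Hessian-vector product Hg should be nearly parallel to g itself. A hedged sketch using double backpropagation; the model and input below are stand-ins, not the paper's setup:

```python
# Check gradient/Hessian alignment with a Hessian-vector product, computed
# via double backprop so the Hessian is never formed explicitly.
import torch
import torch.nn.functional as F

def gradient_hessian_alignment(model, x, label):
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    (g,) = torch.autograd.grad(loss, x, create_graph=True)   # input gradient
    # Hessian-vector product H g: differentiate <g, g_detached> w.r.t. x.
    (hg,) = torch.autograd.grad((g * g.detach()).sum(), x)
    return F.cosine_similarity(g.detach().flatten(), hg.flatten(), dim=0)

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x, y = torch.randn(1, 3, 32, 32), torch.tensor([3])
print(gradient_hessian_alignment(model, x, y))  # values near 1 indicate near-parallelism
```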

All-In-One Underwater Image Enhancement using Domain-Adversarial Learning

Title All-In-One Underwater Image Enhancement using Domain-Adversarial Learning
Authors Pritish Uplavikar, Zhenyu Wu, Zhangyang Wang
Abstract Raw underwater images are degraded due to wavelength-dependent light attenuation and scattering, limiting their applicability in vision systems. Another factor that makes enhancing underwater images particularly challenging is the diversity of the water types in which they are captured. For example, images captured in deep oceanic waters have a different distribution from those captured in shallow coastal waters. Such diversity makes it hard to train a single model to enhance underwater images. In this work, we propose a novel model that handles the diversity of water types during enhancement by adversarially learning the content features of the images and disentangling the unwanted nuisances corresponding to water types (viewed as different domains). We use the learned domain-agnostic features to generate enhanced underwater images. We train our model on a dataset consisting of images of 10 Jerlov water types. Experimental results show that the proposed model not only outperforms previous methods in SSIM and PSNR scores for almost all Jerlov water types but also generalizes well to real-world datasets. A high-level vision task (object detection) also shows improved performance when run on images enhanced by our model.
Tasks Image Enhancement, Object Detection
Published 2019-05-30
URL https://arxiv.org/abs/1905.13342v1
PDF https://arxiv.org/pdf/1905.13342v1.pdf
PWC https://paperswithcode.com/paper/all-in-one-underwater-image-enhancement-using
Repo https://github.com/pritishuplavikar/All-In-One-Underwater-Image-Enhancement-using-Domain-Adversarial-Learning
Framework pytorch
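
The standard building block for this kind of domain-adversarial training is a gradient reversal layer: a water-type classifier is trained on the encoder's features, while the reversed gradient pushes the encoder toward water-type-agnostic content. A minimal sketch with placeholder network sizes; the paper's actual architecture differs:

```python
# Gradient reversal layer (GRL): identity on the forward pass, negated
# (scaled) gradient on the backward pass.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

encoder = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
decoder = nn.Conv2d(16, 3, 3, padding=1)                  # enhanced image branch
domain_head = nn.Sequential(nn.Flatten(), nn.Linear(16 * 32 * 32, 10))  # 10 Jerlov types

x = torch.randn(2, 3, 32, 32)
feats = encoder(x)
enhanced = decoder(feats)                                  # enhancement branch
water_logits = domain_head(GradReverse.apply(feats, 1.0))  # adversarial branch
# Minimizing the water-type loss through the GRL *maximizes* it w.r.t. the
# encoder, driving feats toward water-type-invariant content.
```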

Regularized Fine-grained Meta Face Anti-spoofing

Title Regularized Fine-grained Meta Face Anti-spoofing
Authors Rui Shao, Xiangyuan Lan, Pong C. Yuen
Abstract Face presentation attacks have become an increasingly critical concern as face recognition is more widely deployed. Many face anti-spoofing methods have been proposed, but most of them ignore generalization to unseen attacks. To overcome this limitation, this work casts face anti-spoofing as a domain generalization (DG) problem and addresses it with a new meta-learning framework called Regularized Fine-grained Meta-learning. To make our face anti-spoofing model generalize well to unseen attacks, the proposed framework trains the model to perform well in simulated domain-shift scenarios, which is achieved by finding generalized learning directions in the meta-learning process. Specifically, the framework incorporates the domain knowledge of face anti-spoofing as a regularizer, so that meta-learning is conducted in a feature space regularized by the supervision of domain knowledge. This makes our model more likely to find generalized learning directions for the face anti-spoofing task. In addition, to further enhance the generalization ability of our model, the framework adopts a fine-grained learning strategy that simultaneously conducts meta-learning in a variety of domain-shift scenarios in each iteration. Extensive experiments on four public datasets validate the effectiveness of the proposed method.
Tasks Domain Generalization, Face Anti-Spoofing, Face Recognition, Meta-Learning
Published 2019-11-25
URL https://arxiv.org/abs/1911.10771v1
PDF https://arxiv.org/pdf/1911.10771v1.pdf
PWC https://paperswithcode.com/paper/regularized-fine-grained-meta-face-anti
Repo https://github.com/rshaojimmy/AAAI2020-RFMetaFAS
Framework pytorch
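
A loose sketch of the simulated-domain-shift loop: each iteration splits the source domains into meta-train and meta-test, takes an inner gradient step on meta-train, and requires the adapted model to also do well on the held-out meta-test domain. The toy linear model and random data below are stand-ins, and the paper's fine-grained strategy and depth-map regularizer are omitted:

```python
# Meta-learning under simulated domain shift (MAML-style inner/outer loop).
import random
import torch
import torch.nn.functional as F

torch.manual_seed(0)
w = torch.zeros(64, requires_grad=True)           # toy linear anti-spoofing model
domains = {d: (torch.randn(32, 64), torch.randint(0, 2, (32,)).float()) for d in "ABC"}
opt = torch.optim.SGD([w], lr=0.01)
inner_lr = 0.1

def loss_on(weights, domain):
    x, y = domains[domain]
    return F.binary_cross_entropy_with_logits(x @ weights, y)

for step in range(100):
    names = random.sample(sorted(domains), k=3)
    meta_train, meta_test = names[:2], names[2]    # simulated domain shift
    inner = sum(loss_on(w, d) for d in meta_train)
    (g,) = torch.autograd.grad(inner, w, create_graph=True)
    w_adapted = w - inner_lr * g                   # inner (meta-train) update
    outer = loss_on(w_adapted, meta_test)          # must generalize to meta-test
    opt.zero_grad()
    (inner + outer).backward()
    opt.step()
```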

Dynamic Mini-batch SGD for Elastic Distributed Training: Learning in the Limbo of Resources

Title Dynamic Mini-batch SGD for Elastic Distributed Training: Learning in the Limbo of Resources
Authors Haibin Lin, Hang Zhang, Yifei Ma, Tong He, Zhi Zhang, Sheng Zha, Mu Li
Abstract With the increasing demand for compute to train deep learning algorithms and the rapid growth of computation resources in data centers, it is desirable to dynamically schedule different distributed deep learning tasks to maximize resource utilization and reduce cost. In this process, different tasks may receive varying numbers of machines at different times, a setting we call elastic distributed training. Despite recent successes in large mini-batch distributed training, these methods are rarely tested in elastic distributed training environments, and in our experiments they suffer degraded performance when the learning rate is adjusted linearly and immediately with respect to the batch size. One difficulty we observe is that the noise in the stochastic momentum estimation accumulates over time and has delayed effects when the batch size changes. We therefore propose to smoothly adjust the learning rate over time to alleviate the influence of the noisy momentum estimation. Our experiments on image classification, object detection and semantic segmentation demonstrate that the proposed Dynamic SGD method achieves stable performance when varying the number of GPUs from 8 to 128. We also provide a theoretical understanding of the optimality of linear learning rate scheduling and the effects of stochastic momentum.
Tasks Image Classification, Object Detection, Semantic Segmentation
Published 2019-04-26
URL http://arxiv.org/abs/1904.12043v2
PDF http://arxiv.org/pdf/1904.12043v2.pdf
PWC https://paperswithcode.com/paper/dynamic-mini-batch-sgd-for-elastic
Repo https://github.com/awslabs/dynamic-training-with-apache-mxnet-on-aws
Framework mxnet
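
The key idea in the abstract, adjusting the learning rate smoothly rather than immediately after a batch-size change, can be sketched as a simple schedule. The schedule shape and window length below are assumptions, not the paper's formula:

```python
# Smooth variant of the linear-scaling rule for elastic training: when the
# global batch size changes, the LR moves toward its new scaled value over a
# window of steps instead of jumping.
class SmoothLinearScaling:
    def __init__(self, base_lr: float, base_batch: int, window: int = 200):
        self.base_lr, self.base_batch, self.window = base_lr, base_batch, window
        self.current = base_lr
        self.target = base_lr

    def update(self, batch_size: int):
        # Retarget whenever the elastic scheduler changes the batch size.
        self.target = self.base_lr * batch_size / self.base_batch
        return self

    def step(self) -> float:
        # Close 1/window of the remaining gap each iteration.
        self.current += (self.target - self.current) / self.window
        return self.current

sched = SmoothLinearScaling(base_lr=0.1, base_batch=256)
for it in range(1000):
    batch = 256 if it < 500 else 1024   # e.g. the cluster scales up mid-run
    lr = sched.update(batch).step()
```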

Learning Actor Relation Graphs for Group Activity Recognition

Title Learning Actor Relation Graphs for Group Activity Recognition
Authors Jianchao Wu, Limin Wang, Li Wang, Jie Guo, Gangshan Wu
Abstract Modeling relations between actors is important for recognizing group activity in a multi-person scene. This paper aims at learning discriminative relations between actors efficiently using deep models. To this end, we propose to build a flexible and efficient Actor Relation Graph (ARG) to simultaneously capture the appearance and position relations between actors. Thanks to the Graph Convolutional Network, the connections in ARG can be automatically learned from group activity videos in an end-to-end manner, and inference on ARG can be performed efficiently with standard matrix operations. Furthermore, in practice, we come up with two variants that sparsify ARG for more effective modeling in videos: spatially localized ARG and temporal randomized ARG. We perform extensive experiments on two standard group activity recognition datasets, the Volleyball dataset and the Collective Activity dataset, and achieve state-of-the-art performance on both. We also visualize the learned actor graphs and relation features, which demonstrate that the proposed ARG is able to capture the discriminative relation information for group activity recognition.
Tasks Activity Recognition, Group Activity Recognition
Published 2019-04-23
URL http://arxiv.org/abs/1904.10117v1
PDF http://arxiv.org/pdf/1904.10117v1.pdf
PWC https://paperswithcode.com/paper/learning-actor-relation-graphs-for-group
Repo https://github.com/wjchaoGit/Group-Activity-Recognition
Framework pytorch
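
A condensed sketch of one ARG layer under common graph-attention conventions: pairwise appearance affinities define a normalized relation matrix, which drives a graph-convolution update of the actor features. The paper's position relations and its sparsified variants are omitted here:

```python
# One Actor Relation Graph layer: embedded dot-product affinities -> softmax
# relation matrix -> GCN-style feature update with a residual connection.
import torch
import torch.nn.functional as F

def arg_layer(actors: torch.Tensor, w_theta, w_phi, w_g) -> torch.Tensor:
    """actors: (N, d) appearance features for the N people in a frame."""
    theta, phi = actors @ w_theta, actors @ w_phi        # embedded features
    relation = F.softmax(theta @ phi.T, dim=1)           # (N, N) learned graph
    return F.relu(relation @ actors @ w_g) + actors      # GCN update + residual

n, d = 12, 256
actors = torch.randn(n, d)
w_theta, w_phi, w_g = (torch.randn(d, d) * 0.01 for _ in range(3))
out = arg_layer(actors, w_theta, w_phi, w_g)             # refined actor features
```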

MixNMatch: Multifactor Disentanglement and Encoding for Conditional Image Generation

Title MixNMatch: Multifactor Disentanglement and Encoding for Conditional Image Generation
Authors Yuheng Li, Krishna Kumar Singh, Utkarsh Ojha, Yong Jae Lee
Abstract We present MixNMatch, a conditional generative model that learns to disentangle and encode background, object pose, shape, and texture from real images with minimal supervision, for mix-and-match image generation. We build upon FineGAN, an unconditional generative model, to learn the desired disentanglement and image generator, and leverage adversarial joint image-code distribution matching to learn the latent factor encoders. MixNMatch requires bounding boxes during training to model background, but requires no other supervision. Through extensive experiments, we demonstrate MixNMatch’s ability to accurately disentangle, encode, and combine multiple factors for mix-and-match image generation, including sketch2color, cartoon2img, and img2gif applications. Our code/models/demo can be found at https://github.com/Yuheng-Li/MixNMatch
Tasks Conditional Image Generation, Image Generation
Published 2019-11-26
URL https://arxiv.org/abs/1911.11758v2
PDF https://arxiv.org/pdf/1911.11758v2.pdf
PWC https://paperswithcode.com/paper/mixnmatch-multifactor-disentanglement-and
Repo https://github.com/Yuheng-Li/MixNMatch
Framework pytorch
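
The mix-and-match step itself is easy to visualize: each factor code is encoded from a different reference image, and the concatenated codes condition the generator. The modules below are schematic stand-ins for FineGAN's background/pose/shape/texture hierarchy, not the paper's networks:

```python
# Schematic mix-and-match: one encoder per factor, each fed a different
# reference image; the combined codes drive a stand-in generator.
import torch
import torch.nn as nn

factors = ["background", "pose", "shape", "texture"]
encoders = nn.ModuleDict(
    {f: nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 32)) for f in factors}
)
generator = nn.Sequential(nn.Linear(4 * 32, 3 * 64 * 64), nn.Tanh())

refs = {f: torch.randn(1, 3, 64, 64) for f in factors}   # one source image per factor
codes = torch.cat([encoders[f](refs[f]) for f in factors], dim=1)
mixed = generator(codes).view(1, 3, 64, 64)               # mix-and-match output
```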

Fully Trainable and Interpretable Non-Local Sparse Models for Image Restoration

Title Fully Trainable and Interpretable Non-Local Sparse Models for Image Restoration
Authors Bruno Lecouat, Jean Ponce, Julien Mairal
Abstract Non-local self-similarity and sparsity principles have proven to be powerful priors for natural image modeling. We propose a novel differentiable relaxation of joint sparsity that exploits both principles and leads to a general framework for image restoration which is (1) trainable end to end, (2) fully interpretable, and (3) much more compact than competing deep learning architectures. We apply this approach to denoising, JPEG deblocking, and demosaicking, and show that, with as few as 100K parameters, its performance on several standard benchmarks is on par with or better than that of state-of-the-art methods that may have an order of magnitude more parameters.
Tasks Demosaicking, Denoising, Image Restoration
Published 2019-12-05
URL https://arxiv.org/abs/1912.02456v3
PDF https://arxiv.org/pdf/1912.02456v3.pdf
PWC https://paperswithcode.com/paper/revisiting-non-local-sparse-models-for-image
Repo https://github.com/bruno-31/groupsc
Framework pytorch
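
One way to read "joint sparsity" here is a block (group) soft-thresholding step inside an ISTA-style iteration: similar patches are coded together so the same dictionary atoms switch on or off for the whole group. A NumPy sketch under that interpretation; the trainable, end-to-end version in the paper replaces the fixed dictionary and step size with learned ones:

```python
# ISTA with a group soft-thresholding step: jointly sparse codes for a
# stack of similar patches (the non-local group).
import numpy as np

def group_ista(patch_group, dictionary, lam=0.1, n_iter=50):
    """patch_group: (k, patch_dim) similar patches; dictionary: (patch_dim, n_atoms)."""
    step = 1.0 / np.linalg.norm(dictionary, 2) ** 2          # 1 / Lipschitz constant
    codes = np.zeros((patch_group.shape[0], dictionary.shape[1]))
    for _ in range(n_iter):
        grad = (codes @ dictionary.T - patch_group) @ dictionary
        z = codes - step * grad
        # Joint sparsity: threshold by the group's per-atom l2 norm so the
        # same atoms switch on/off for every patch in the group.
        norms = np.linalg.norm(z, axis=0, keepdims=True) + 1e-12
        codes = z * np.maximum(1.0 - lam * step / norms, 0.0)
    return codes

D = np.random.randn(64, 128) / 8            # 8x8 patches, 128 atoms
group = np.random.randn(5, 64)              # 5 similar patches, flattened
codes = group_ista(group, D)
```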

Stabilizing the Lottery Ticket Hypothesis

Title Stabilizing the Lottery Ticket Hypothesis
Authors Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin
Abstract Pruning is a well-established technique for removing unnecessary structure from neural networks after training to improve the performance of inference. Several recent results have explored the possibility of pruning at initialization time to provide similar benefits during training. In particular, the “lottery ticket hypothesis” conjectures that typical neural networks contain small subnetworks that can train to similar accuracy in a commensurate number of steps. The evidence for this claim is that a procedure based on iterative magnitude pruning (IMP) reliably finds such subnetworks retroactively on small vision tasks. However, IMP fails on deeper networks, and proposed methods to prune before training or train pruned networks encounter similar scaling limitations. In this paper, we argue that these efforts have struggled on deeper networks because they have focused on pruning precisely at initialization. We modify IMP to search for subnetworks that could have been obtained by pruning early in training (0.1% to 7% through) rather than at iteration 0. With this change, it finds small subnetworks of deeper networks (e.g., 80% sparsity on Resnet-50) that can complete the training process to match the accuracy of the original network on more challenging tasks (e.g., ImageNet). In situations where IMP fails at iteration 0, the accuracy benefits of delaying pruning accrue rapidly over the earliest iterations of training. To explain these behaviors, we study subnetwork “stability,” finding that - as accuracy improves in this fashion - IMP subnetworks train to parameters closer to those of the full network and do so with improved consistency in the face of gradient noise. These results offer new insights into the opportunity to prune large-scale networks early in training and the behaviors underlying the lottery ticket hypothesis.
Tasks
Published 2019-03-05
URL https://arxiv.org/abs/1903.01611v2
PDF https://arxiv.org/pdf/1903.01611v2.pdf
PWC https://paperswithcode.com/paper/the-lottery-ticket-hypothesis-at-scale
Repo https://github.com/ZhangXiao96/Lottery-Ticket-Hypothesis-for-DNNs
Framework pytorch
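
The modified IMP procedure, sketched from the abstract: save a checkpoint early in training (iteration k rather than 0), and after each prune round rewind the surviving weights to that checkpoint. Here train() is a placeholder that is assumed to apply the masks during training, and the per-round pruning fraction is an assumption:

```python
# Iterative magnitude pruning (IMP) with rewinding to iteration k.
import copy
import torch

def imp_with_rewinding(model, train, k=1000, rounds=5, frac=0.2):
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters()}
    train(model, masks, steps=k)                       # train briefly to step k
    rewind_state = copy.deepcopy(model.state_dict())   # the rewind point
    for _ in range(rounds):
        train(model, masks, steps=50_000)              # train to completion
        with torch.no_grad():
            # Prune the smallest-magnitude surviving weights, layer by layer.
            for name, p in model.named_parameters():
                alive = p[masks[name].bool()].abs()
                cutoff = alive.quantile(frac)
                masks[name] *= (p.abs() > cutoff).float()
        model.load_state_dict(rewind_state)            # rewind weights, keep masks
    return masks
```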

Federated Variance-Reduced Stochastic Gradient Descent with Robustness to Byzantine Attacks

Title Federated Variance-Reduced Stochastic Gradient Descent with Robustness to Byzantine Attacks
Authors Zhaoxian Wu, Qing Ling, Tianyi Chen, Georgios B. Giannakis
Abstract This paper deals with distributed finite-sum optimization for learning over networks in the presence of malicious Byzantine attacks. To cope with such attacks, most resilient approaches so far combine stochastic gradient descent (SGD) with different robust aggregation rules. However, the sizeable SGD-induced stochastic gradient noise makes it challenging to distinguish malicious messages sent by the Byzantine attackers from noisy stochastic gradients sent by the ‘honest’ workers. This motivates us to reduce the variance of stochastic gradients as a means of robustifying SGD in the presence of Byzantine attacks. To this end, the present work puts forth a Byzantine attack resilient distributed (Byrd-) SAGA approach for learning tasks involving finite-sum optimization over networks. Rather than the mean employed by distributed SAGA, the novel Byrd-SAGA relies on the geometric median to aggregate the corrected stochastic gradients sent by the workers. When less than half of the workers are Byzantine attackers, the robustness of geometric median to outliers enables Byrd-SAGA to attain provably linear convergence to a neighborhood of the optimal solution, with the asymptotic learning error determined by the number of Byzantine workers. Numerical tests corroborate the robustness to various Byzantine attacks, as well as the merits of Byrd-SAGA over Byzantine attack resilient distributed SGD.
Tasks
Published 2019-12-29
URL https://arxiv.org/abs/1912.12716v1
PDF https://arxiv.org/pdf/1912.12716v1.pdf
PWC https://paperswithcode.com/paper/federated-variance-reduced-stochastic
Repo https://github.com/MrFive5555/Byrd-SAGA
Framework pytorch
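
The aggregation rule that provides the robustness is the geometric median, which, unlike the mean, is barely moved by a minority of arbitrarily bad messages. A minimal sketch using Weiszfeld's fixed-point iteration:

```python
# Geometric median of worker gradients via Weiszfeld's algorithm.
import numpy as np

def geometric_median(points: np.ndarray, n_iter: int = 100, eps: float = 1e-8) -> np.ndarray:
    """points: (n_workers, dim) gradient messages, possibly Byzantine."""
    median = points.mean(axis=0)          # initialize at the (non-robust) mean
    for _ in range(n_iter):
        dist = np.linalg.norm(points - median, axis=1) + eps
        weights = 1.0 / dist              # far-away (outlier) messages get downweighted
        median = (weights[:, None] * points).sum(axis=0) / weights.sum()
    return median

honest = np.random.randn(8, 10) + 1.0     # honest workers' gradients near the truth
byzantine = 100.0 * np.ones((3, 10))      # fewer than half the workers attack
agg = geometric_median(np.vstack([honest, byzantine]))
print(np.linalg.norm(agg - honest.mean(axis=0)))   # stays close to the honest mean
```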

Named Entity Recognition for Nepali Language

Title Named Entity Recognition for Nepali Language
Authors Oyesh Mann Singh, Ankur Padia, Anupam Joshi
Abstract Named Entity Recognition has been studied for languages such as English, German and Spanish, but no prior work has focused on the Nepali language. In this paper we propose a neural Nepali NER model built on a state-of-the-art grapheme-level architecture that requires neither hand-crafted features nor data pre-processing. Our model achieves a relative improvement of 33% to 50% over a feature-based SVM model, and up to a 10% improvement over state-of-the-art neural models developed for languages other than Nepali.
Tasks Named Entity Recognition
Published 2019-08-16
URL https://arxiv.org/abs/1908.05828v1
PDF https://arxiv.org/pdf/1908.05828v1.pdf
PWC https://paperswithcode.com/paper/named-entity-recognition-for-nepali-language
Repo https://github.com/oya163/nepali-ner
Framework none
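
The grapheme-level representation matters for Devanagari because vowel signs and other combining marks mean a Unicode code point is not the natural character unit. A small sketch using the third-party regex module's \X (extended grapheme cluster) as one way, assumed here, to obtain grapheme units before feeding a character/grapheme encoder:

```python
# Split Devanagari tokens into grapheme clusters rather than code points.
import regex  # third-party module; supports \X, unlike the stdlib re

def graphemes(token: str) -> list:
    return regex.findall(r"\X", token)

# 'Kathmandu' is 8 code points but 4 grapheme clusters.
print(graphemes("काठमाडौं"))   # ['का', 'ठ', 'मा', 'डौं']
```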

LiDARTag: A Real-Time Fiducial Tag using Point Clouds

Title LiDARTag: A Real-Time Fiducial Tag using Point Clouds
Authors Jiunn-Kai Huang, Maani Ghaffari, Ross Hartley, Lu Gan, Ryan M. Eustice, Jessy W. Grizzle
Abstract Image-based fiducial markers are widely used in robotics and computer vision problems such as object tracking in cluttered or textureless environments, camera (and multi-sensor) calibration tasks, or vision-based simultaneous localization and mapping (SLAM). State-of-the-art fiducial marker detection algorithms rely on consistent ambient lighting. This paper introduces LiDARTag, a novel fiducial tag design and detection algorithm suitable for light detection and ranging (LiDAR) point clouds. The proposed detector runs in real time and can process data faster than the frame rates of currently available LiDAR sensors. Because LiDAR is an active sensor, rapidly changing ambient lighting does not affect the detection of a LiDARTag; hence, the proposed fiducial marker can operate in a completely dark environment. In addition, LiDARTag nicely complements available visual fiducial markers, as the tag design is compatible with existing techniques such as AprilTags, allowing for efficient multi-sensor fusion and calibration tasks. The experimental results, verified by a motion capture system, confirm that the proposed technique can reliably provide a tag’s pose and its unique ID code. All implementations are done in C++ and will be available soon at: https://github.com/brucejk/LiDARTag
Tasks Calibration, Motion Capture, Object Tracking, Sensor Fusion, Simultaneous Localization and Mapping
Published 2019-08-23
URL https://arxiv.org/abs/1908.10349v1
PDF https://arxiv.org/pdf/1908.10349v1.pdf
PWC https://paperswithcode.com/paper/lidartag-a-real-time-fiducial-tag-using-point
Repo https://github.com/UMich-BipedLab/extrinsic_lidar_camera_calibration
Framework none

Tigris: Architecture and Algorithms for 3D Perception in Point Clouds

Title Tigris: Architecture and Algorithms for 3D Perception in Point Clouds
Authors Tiancheng Xu, Boyuan Tian, Yuhao Zhu
Abstract Machine perception applications are increasingly moving toward manipulating and processing 3D point clouds. This paper focuses on point cloud registration, a key primitive of 3D data processing widely used in high-level tasks such as odometry, simultaneous localization and mapping, and 3D reconstruction. As these applications are routinely deployed in energy-constrained environments, real-time and energy-efficient point cloud registration is critical. We present Tigris, an algorithm-architecture co-designed system specialized for point cloud registration. Through an extensive exploration of the registration pipeline design space, we find that, while different design points make vastly different trade-offs between accuracy and performance, KD-tree search is a common performance bottleneck, and thus is an ideal candidate for architectural specialization. While KD-tree search is inherently sequential, we propose an acceleration-amenable data structure and search algorithm that exposes different forms of parallelism of KD-tree search in the context of point cloud registration. The co-designed accelerator systematically exploits this parallelism while incorporating a set of architectural techniques that further improve the accelerator efficiency. Overall, Tigris achieves a 77.2$\times$ speedup and 7.4$\times$ power reduction in KD-tree search over an RTX 2080 Ti GPU, which translates to a 41.7% improvement in registration performance and a 3.0$\times$ power reduction.
Tasks 3D Reconstruction, Point Cloud Registration, Simultaneous Localization and Mapping
Published 2019-11-16
URL https://arxiv.org/abs/1911.07841v3
PDF https://arxiv.org/pdf/1911.07841v3.pdf
PWC https://paperswithcode.com/paper/tigris-architecture-and-algorithms-for-3d
Repo https://github.com/horizon-research/pointcloud-pipeline
Framework none
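
Why KD-tree search dominates registration time is easy to see: every ICP-style iteration issues one nearest-neighbor query per source point. A tiny illustration with SciPy's cKDTree standing in for the specialized hardware the paper proposes:

```python
# The nearest-neighbor correspondence step that dominates registration time.
import numpy as np
from scipy.spatial import cKDTree

target = np.random.rand(100_000, 3)          # target point cloud
source = np.random.rand(50_000, 3)           # source cloud to be registered

tree = cKDTree(target)                       # built once per target cloud
dist, idx = tree.query(source, k=1)          # the per-iteration hot loop
correspondences = target[idx]                # matched points for pose estimation
```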

Self-Attention Network for Skeleton-based Human Action Recognition

Title Self-Attention Network for Skeleton-based Human Action Recognition
Authors Sangwoo Cho, Muhammad Hasan Maqbool, Fei Liu, Hassan Foroosh
Abstract Skeleton-based action recognition has recently attracted a lot of attention. Researchers are coming up with new approaches for extracting spatio-temporal relations and making considerable progress on large-scale skeleton-based datasets. Most of the proposed architectures are based on recurrent neural networks (RNNs), convolutional neural networks (CNNs) and graph-based CNNs. In skeleton-based action recognition, long-term contextual information is crucial, yet it is not captured by current architectures. To better represent and capture long-term spatio-temporal relationships, we propose three variants of a Self-Attention Network (SAN), namely SAN-V1, SAN-V2 and SAN-V3. Our SAN variants can extract high-level semantics by capturing long-range correlations. We have also integrated the Temporal Segment Network (TSN) with our SAN variants, which improves overall performance. Different configurations of the SAN variants and TSN are explored with extensive experiments. Our chosen configuration outperforms the state of the art by 4.4% (Top-1) and 7.9% (Top-5) on Kinetics, and shows consistently better performance than state-of-the-art methods on NTU RGB+D.
Tasks Skeleton Based Action Recognition, Temporal Action Localization
Published 2019-12-18
URL https://arxiv.org/abs/1912.08435v1
PDF https://arxiv.org/pdf/1912.08435v1.pdf
PWC https://paperswithcode.com/paper/self-attention-network-for-skeleton-based
Repo https://github.com/fdu-wuyuan/Siren
Framework none
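
The core operation the SAN variants build on is scaled dot-product self-attention over per-frame skeleton features, which lets every frame attend to every other frame and thus capture long-range temporal correlations. A generic single-head sketch; the dimensions and single-head form are simplifications of the paper's networks:

```python
# Scaled dot-product self-attention over a sequence of per-frame skeleton features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrameSelfAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """x: (batch, frames, dim) skeleton features; returns the same shape."""
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Every frame attends to every other frame: this is what captures the
        # long-range temporal correlations the abstract refers to.
        attn = F.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        return self.out(attn @ v) + x                     # residual connection

feats = torch.randn(2, 30, 128)        # 2 clips, 30 frames, 128-d pose features
out = FrameSelfAttention(128)(feats)
```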