January 31, 2020

3330 words 16 mins read

Paper Group AWR 384


Kannada-MNIST: A new handwritten digits dataset for the Kannada language

Title Kannada-MNIST: A new handwritten digits dataset for the Kannada language
Authors Vinay Uday Prabhu
Abstract In this paper, we disseminate a new handwritten digits dataset, termed Kannada-MNIST, for the Kannada script, that can potentially serve as a direct drop-in replacement for the original MNIST dataset. In addition to this dataset, we disseminate an additional real-world handwritten dataset (with $10k$ images), which we term the Dig-MNIST dataset, that can serve as an out-of-domain test dataset. We also duly open source all the code as well as the raw scanned images along with the scanner settings, so that researchers who want to try out different signal-processing pipelines can perform end-to-end comparisons. We provide high-level morphological comparisons with the MNIST dataset and provide baseline accuracies for the disseminated datasets. The initial baselines obtained using an oft-used CNN architecture ($96.8\%$ for the main test set and $76.1\%$ for the Dig-MNIST test set) indicate that these datasets do provide a sterner challenge with regard to generalizability than MNIST or the KMNIST datasets. We also hope this dissemination will spur the creation of similar datasets for all the languages that use different symbols for the numeral digits.
Tasks Image Classification
Published 2019-08-03
URL https://arxiv.org/abs/1908.01242v1
PDF https://arxiv.org/pdf/1908.01242v1.pdf
PWC https://paperswithcode.com/paper/kannada-mnist-a-new-handwritten-digits
Repo https://github.com/vinayprabhu/Kannada_MNIST
Framework tf
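
The abstract's baseline is an "oft-used CNN architecture"; below is a minimal sketch of such a baseline in Keras, matching the repo's TensorFlow framework. The `.npz` file names and the `arr_0` array key are assumptions for illustration, not the repo's documented layout.

```python
import numpy as np
import tensorflow as tf

def load_split(path, key="arr_0"):  # file names and key are assumptions
    return np.load(path)[key]

x_train = load_split("X_kannada_MNIST_train.npz")[..., None] / 255.0
y_train = load_split("y_kannada_MNIST_train.npz")
x_test = load_split("X_kannada_MNIST_test.npz")[..., None] / 255.0
y_test = load_split("y_kannada_MNIST_test.npz")

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=128)
model.evaluate(x_test, y_test)  # compare against the reported ~96.8% baseline
```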

Adversarial Robustness Against the Union of Multiple Perturbation Models

Title Adversarial Robustness Against the Union of Multiple Perturbation Models
Authors Pratyush Maini, Eric Wong, J. Zico Kolter
Abstract Owing to the susceptibility of deep learning systems to adversarial attacks, there has been a great deal of work in developing (both empirically and certifiably) robust classifiers, but the vast majority has defended against single types of attacks. Recent work has looked at defending against multiple attacks, specifically on the MNIST dataset, yet this approach used a relatively complex architecture, claiming that standard adversarial training cannot apply because it “overfits” to a particular norm. In this work, we show that it is indeed possible to adversarially train a robust model against a union of norm-bounded attacks, by using a natural generalization of the standard PGD-based procedure for adversarial training to multiple threat models. With this approach, we are able to train standard architectures which are robust against $\ell_\infty$, $\ell_2$, and $\ell_1$ attacks, outperforming past approaches on the MNIST dataset and providing the first CIFAR10 network trained to be simultaneously robust against $(\ell_{\infty}, \ell_{2}, \ell_{1})$ threat models, which achieves adversarial accuracy rates of $(47.6\%, 64.8\%, 53.4\%)$ for $(\ell_{\infty}, \ell_{2}, \ell_{1})$ perturbations with radius $\epsilon = (0.03, 0.5, 12)$.
Tasks
Published 2019-09-09
URL https://arxiv.org/abs/1909.04068v1
PDF https://arxiv.org/pdf/1909.04068v1.pdf
PWC https://paperswithcode.com/paper/adversarial-robustness-against-the-union-of
Repo https://github.com/locuslab/robust_union
Framework pytorch
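
The "natural generalization of the standard PGD-based procedure" can be sketched as follows: run one PGD attack per norm and train on the per-example worst case. This is a simplified illustration, not the repo's exact implementation (which also covers an $\ell_1$ attack and a multi-steepest-descent variant); the radii follow the abstract.

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=0.03, alpha=0.007, steps=10):
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        F.cross_entropy(model(x + delta), y).backward()
        delta.data = (delta + alpha * delta.grad.sign()).clamp(-eps, eps)
        delta.grad.zero_()
    return (x + delta).detach()

def pgd_l2(model, x, y, eps=0.5, alpha=0.1, steps=10):
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        F.cross_entropy(model(x + delta), y).backward()
        g = delta.grad / delta.grad.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
        delta.data = delta + alpha * g
        norm = delta.data.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
        delta.data *= (eps / norm).clamp(max=1.0)  # project back onto the eps-ball
        delta.grad.zero_()
    return (x + delta).detach()

def worst_case(model, x, y, attacks=(pgd_linf, pgd_l2)):
    # per example, keep the adversarial input with the highest loss
    candidates = [atk(model, x, y) for atk in attacks]
    losses = torch.stack([F.cross_entropy(model(c), y, reduction="none")
                          for c in candidates])
    pick = losses.argmax(dim=0)
    return torch.stack(candidates)[pick, torch.arange(x.size(0))]

# training step: zero grads, then descend on F.cross_entropy(model(worst_case(...)), y)
```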

Two Decades of Network Science as seen through the co-authorship network of network scientists

Title Two Decades of Network Science as seen through the co-authorship network of network scientists
Authors Roland Molontay, Marcell Nagy
Abstract Complex networks have attracted a great deal of research interest in the last two decades since Watts & Strogatz, Barabási & Albert and Girvan & Newman published their highly-cited seminal papers on small-world networks, on scale-free networks and on the community structure of complex networks, respectively. These fundamental papers initiated a new era of research establishing an interdisciplinary field called network science. Due to the multidisciplinary nature of the field, a diverse but not divided network science community has emerged in the past 20 years. This paper honors the contributions of network science by exploring the evolution of this community as seen through the growing co-authorship network of network scientists (here the notion refers to a scholar with at least one paper citing at least one of the three aforementioned milestone papers). After investigating various characteristics of 29,528 network science papers, we construct the co-authorship network of 52,406 network scientists and we analyze its topology and dynamics. We shed light on the collaboration patterns of the last 20 years of network science by investigating numerous structural properties of the co-authorship network and by using enhanced data visualization techniques. We also identify the most central authors, the largest communities, investigate the spatiotemporal changes, and compare the properties of the network to scientometric indicators.
Tasks
Published 2019-08-22
URL https://arxiv.org/abs/1908.08478v2
PDF https://arxiv.org/pdf/1908.08478v2.pdf
PWC https://paperswithcode.com/paper/two-decades-of-network-science-as-seen
Repo https://github.com/marcessz/Two-Decades-of-Network-Science
Framework none
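
A toy sketch of how such a co-authorship network can be built and queried with networkx; the author lists below are stand-ins, not the paper's data.

```python
import itertools
import networkx as nx

papers = [  # stand-in for the 29,528 network-science papers
    ["Watts", "Strogatz"],
    ["Barabasi", "Albert"],
    ["Girvan", "Newman"],
    ["Newman", "Watts"],
]

G = nx.Graph()
for authors in papers:
    for a, b in itertools.combinations(authors, 2):
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1   # weight = number of joint papers
        else:
            G.add_edge(a, b, weight=1)

print(nx.degree_centrality(G))           # most central authors
print(list(nx.connected_components(G)))  # coarse community structure
```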

Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking

Title Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking
Authors Tan Wang, Xing Xu, Yang Yang, Alan Hanjalic, Heng Tao Shen, Jingkuan Song
Abstract A major challenge in matching images and text is that they have intrinsically different data distributions and feature representations. Most existing approaches are based either on embedding or classification: the first maps image and text instances into a common embedding space for distance measuring, while the second regards image-text matching as a binary classification problem. Neither of these approaches, however, balances matching accuracy and model complexity well. We propose a novel framework that achieves remarkable matching performance with acceptable model complexity. Specifically, in the training stage, we propose a novel Multi-modal Tensor Fusion Network (MTFN) to explicitly learn an accurate image-text similarity function with rank-based tensor fusion rather than seeking a common embedding space for each image-text instance. Then, during testing, we deploy a generic Cross-modal Re-ranking (RR) scheme for refinement without requiring an additional training procedure. Extensive experiments on two datasets demonstrate that our MTFN-RR consistently achieves state-of-the-art matching performance with much lower time complexity. The implementation code is available at https://github.com/Wangt-CN/MTFN-RR-PyTorch-Code.
Tasks Text Matching
Published 2019-08-12
URL https://arxiv.org/abs/1908.04011v1
PDF https://arxiv.org/pdf/1908.04011v1.pdf
PWC https://paperswithcode.com/paper/matching-images-and-text-with-multi-modal
Repo https://github.com/Wangt-CN/MTFN-RR-PyTorch-Code
Framework pytorch
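
A hedged sketch of the fusion-then-score idea: instead of embedding both modalities into a common space and measuring distance, fuse image and text features and regress a similarity score directly, trained with a max-margin ranking loss. This is a simplified stand-in for MTFN, not the released model; all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class FusionSimilarity(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=1024, hidden=512):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hidden)
        self.txt_proj = nn.Linear(txt_dim, hidden)
        self.score = nn.Linear(hidden, 1)

    def forward(self, img, txt):
        # elementwise (gated) fusion of the two modalities, then a scalar score
        fused = torch.tanh(self.img_proj(img)) * torch.tanh(self.txt_proj(txt))
        return self.score(fused).squeeze(-1)  # higher = better match

def triplet_rank_loss(sim_pos, sim_neg, margin=0.2):
    # standard max-margin ranking objective used in image-text matching
    return torch.clamp(margin - sim_pos + sim_neg, min=0).mean()
```

In training, `sim_pos` would score a matching image-sentence pair and `sim_neg` a mismatched pair (e.g., the hardest negative in the batch); re-ranking then operates on the resulting similarity matrix at test time.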

Asymmetric Shapley values: incorporating causal knowledge into model-agnostic explainability

Title Asymmetric Shapley values: incorporating causal knowledge into model-agnostic explainability
Authors Christopher Frye, Ilya Feige, Colin Rowat
Abstract Explaining AI systems is fundamental both to the development of high-performing models and to the trust placed in them by their users. A general framework for explaining any AI model is provided by the Shapley values that attribute the prediction output to the various model inputs (“features”) in a principled and model-agnostic way. The outstanding strength of Shapley values is their combined generality and rigorous foundation: they can be used to explain any AI system, and one always understands their values as the unique attribution method satisfying a set of mathematical axioms. However, as a framework, Shapley values are too restrictive in one significant regard: they ignore all causal structure in the data. We introduce a less restrictive framework for model-agnostic explainability: “Asymmetric” Shapley values. Asymmetric Shapley values (ASVs) are rigorously founded on a set of axioms, applicable to any AI system, and can flexibly incorporate any causal knowledge known a priori to be respected by the data. We show through explicit, realistic examples that the ASV framework can be used to (i) improve model explanations by incorporating causal information, (ii) provide an unambiguous test for unfair discrimination based on simple policy articulations, (iii) enable sequentially incremental explanations in time-series models, and (iv) support feature-selection studies without the need for model retraining.
Tasks Feature Selection, Time Series
Published 2019-10-14
URL https://arxiv.org/abs/1910.06358v1
PDF https://arxiv.org/pdf/1910.06358v1.pdf
PWC https://paperswithcode.com/paper/asymmetric-shapley-values-incorporating
Repo https://github.com/nredell/shapFlex
Framework none
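
A hedged sketch of the asymmetric idea: plain Shapley values average a feature's marginal contribution over all orderings, while ASVs average only over orderings consistent with known causal precedence (ancestors first). The exhaustive enumeration below is only viable for small toy problems; `value_fn` and `precedes` are illustrative placeholders, not the repo's API.

```python
import itertools
import numpy as np

def asym_shapley(value_fn, n_features, precedes):
    """value_fn: maps a tuple of feature indices (a coalition) to a payoff.
    precedes(i, j) == True means feature i is a causal ancestor of j and
    must appear before it in any admissible ordering."""
    admissible = [p for p in itertools.permutations(range(n_features))
                  if all(not precedes(j, i)
                         for k, i in enumerate(p) for j in p[k + 1:])]
    phi = np.zeros(n_features)
    for perm in admissible:
        coalition = ()
        for i in perm:  # marginal contribution of i given its predecessors
            phi[i] += value_fn(coalition + (i,)) - value_fn(coalition)
            coalition += (i,)
    return phi / len(admissible)

# toy usage: feature 0 causally precedes feature 1
phi = asym_shapley(lambda S: float(len(S)), 3, lambda i, j: (i, j) == (0, 1))
```

With `precedes` always False, every permutation is admissible and the ordinary Shapley value is recovered.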

Unsupervised Learning for Optical Flow Estimation Using Pyramid Convolution LSTM

Title Unsupervised Learning for Optical Flow Estimation Using Pyramid Convolution LSTM
Authors Shuosen Guan, Haoxin Li, Wei-Shi Zheng
Abstract Most current Convolutional Neural Network (CNN) based methods for optical flow estimation focus on learning optical flow on synthetic datasets with ground truth, which is not practical. In this paper, we propose an unsupervised optical flow estimation framework named PCLNet. It uses a pyramid Convolution LSTM (ConvLSTM) with the constraint of adjacent-frame reconstruction, which allows flexibly estimating multi-frame optical flows from any video clip. Besides, by decoupling motion feature learning from optical flow representation, our method avoids the complex short-cut connections used in existing frameworks while improving the accuracy of optical flow estimation. Moreover, unlike methods that use specialized CNN architectures for capturing motion, our framework directly learns optical flow from the features of generic CNNs and thus can be easily embedded in any CNN-based framework for other tasks. Extensive experiments have verified that our method not only estimates optical flow effectively and accurately, but also obtains comparable performance on action recognition.
Tasks Optical Flow Estimation
Published 2019-07-26
URL https://arxiv.org/abs/1907.11628v1
PDF https://arxiv.org/pdf/1907.11628v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-learning-for-optical-flow
Repo https://github.com/Kwanss/PCLNet
Framework pytorch
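
The "constraint of adjacent-frame reconstruction" is the standard unsupervised-flow ingredient: warp the second frame back with the predicted flow and penalize the photometric difference. A minimal PyTorch sketch of that loss follows; PCLNet's pyramid ConvLSTM itself is not reproduced here.

```python
import torch
import torch.nn.functional as F

def warp(img, flow):
    """img: (B,C,H,W); flow: (B,2,H,W) in pixels, flow[:,0]=dx, flow[:,1]=dy."""
    B, _, H, W = img.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(img.device)  # (2,H,W)
    coords = grid.unsqueeze(0) + flow
    # normalize sampling coordinates to [-1, 1] for grid_sample
    cx = 2.0 * coords[:, 0] / (W - 1) - 1.0
    cy = 2.0 * coords[:, 1] / (H - 1) - 1.0
    return F.grid_sample(img, torch.stack((cx, cy), dim=-1), align_corners=True)

def photometric_loss(frame1, frame2, flow):
    # reconstruct frame1 from frame2 via the predicted flow
    return (warp(frame2, flow) - frame1).abs().mean()
```

Full systems typically add a smoothness term and occlusion handling on top of this; the sketch shows only the reconstruction constraint the abstract names.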

Feature-Critic Networks for Heterogeneous Domain Generalization

Title Feature-Critic Networks for Heterogeneous Domain Generalization
Authors Yiying Li, Yongxin Yang, Wei Zhou, Timothy M. Hospedales
Abstract The well-known domain shift issue causes model performance to degrade when deployed to a new target domain with statistics different from those seen in training. Domain adaptation techniques alleviate this, but need some instances from the target domain to drive adaptation. Domain generalisation is the recently topical problem of learning a model that generalises to unseen domains out of the box, and various approaches aim to train a domain-invariant feature extractor, typically by adding some manually designed losses. In this work, we propose a learning-to-learn approach, where the auxiliary loss that helps generalisation is itself learned. Beyond conventional domain generalisation, we consider a more challenging setting of heterogeneous domain generalisation, where the unseen domains do not share label space with the seen ones, and the goal is to train a feature representation that is useful off-the-shelf for novel data and novel categories. Experimental evaluation demonstrates that our method outperforms state-of-the-art solutions in both settings.
Tasks Domain Adaptation, Domain Generalization
Published 2019-01-31
URL https://arxiv.org/abs/1901.11448v3
PDF https://arxiv.org/pdf/1901.11448v3.pdf
PWC https://paperswithcode.com/paper/feature-critic-networks-for-heterogeneous
Repo https://github.com/liyiying/Feature_Critic
Framework pytorch
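
A much-simplified sketch of the learning-to-learn idea under toy assumptions (linear feature extractor, single inner step): a critic maps features to a scalar auxiliary loss, and the critic is updated so that an inner gradient step taken with its loss reduces error on a held-out meta-validation domain. The tanh-of-difference meta objective follows the paper's intuition; everything else here is illustrative.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
W = (0.01 * torch.randn(128, 784)).requires_grad_()  # toy linear feature extractor
V = (0.01 * torch.randn(10, 128)).requires_grad_()   # classifier
U = (0.01 * torch.randn(1, 128)).requires_grad_()    # the feature critic
opt_critic = torch.optim.Adam([U], lr=1e-3)

def features(x, W):
    return torch.relu(x @ W.T)

def critic_step(x_tr, x_val, y_val, lr_inner=0.1):
    aux = torch.relu(features(x_tr, W) @ U.T).mean()     # learned auxiliary loss
    (gW,) = torch.autograd.grad(aux, W, create_graph=True)
    W_new = W - lr_inner * gW                            # differentiable inner step
    val_new = F.cross_entropy(features(x_val, W_new) @ V.T, y_val)
    val_old = F.cross_entropy(features(x_val, W) @ V.T, y_val)
    opt_critic.zero_grad()
    torch.tanh(val_new - val_old).backward()  # reward steps that help held-out domains
    opt_critic.step()
    # (the main model is trained separately with task loss + aux loss, omitted)

critic_step(torch.randn(32, 784),
            torch.randn(32, 784), torch.randint(0, 10, (32,)))
```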

TensorNetwork: A Library for Physics and Machine Learning

Title TensorNetwork: A Library for Physics and Machine Learning
Authors Chase Roberts, Ashley Milsted, Martin Ganahl, Adam Zalcman, Bruce Fontaine, Yijian Zou, Jack Hidary, Guifre Vidal, Stefan Leichenauer
Abstract TensorNetwork is an open source library for implementing tensor network algorithms. Tensor networks are sparse data structures originally designed for simulating quantum many-body physics, but are currently also applied in a number of other research areas, including machine learning. We demonstrate the use of the API with applications in both physics and machine learning, with details appearing in companion papers.
Tasks Tensor Networks
Published 2019-05-03
URL https://arxiv.org/abs/1905.01330v1
PDF https://arxiv.org/pdf/1905.01330v1.pdf
PWC https://paperswithcode.com/paper/tensornetwork-a-library-for-physics-and
Repo https://github.com/google/TensorNetwork
Framework jax
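
A minimal contraction example gives a feel for the API: nodes wrap tensors, the `^` operator connects edges, and `tn.contract` contracts them (here reducing to an ordinary matrix product). The library supports NumPy, TensorFlow, PyTorch, and JAX backends.

```python
import numpy as np
import tensornetwork as tn

a = tn.Node(np.ones((2, 3)))
b = tn.Node(np.ones((3, 4)))
edge = a[1] ^ b[0]       # connect the shared index
c = tn.contract(edge)    # contract it: equivalent to a matrix product here
print(c.tensor.shape)    # (2, 4)
```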

DeepBall: Deep Neural-Network Ball Detector

Title DeepBall: Deep Neural-Network Ball Detector
Authors Jacek Komorowski, Grzegorz Kurzejamski, Grzegorz Sarwas
Abstract The paper describes a deep network based object detector specialized for ball detection in long-shot videos. Due to its fully convolutional design, the method operates on images of any size and produces a \emph{ball confidence map} encoding the position of the detected ball. The network uses the hypercolumn concept, where feature maps from different hierarchy levels of the deep convolutional network are combined and jointly fed to the convolutional classification layer. This boosts detection accuracy, as a larger visual context around the object of interest is taken into account. The method achieves state-of-the-art results when tested on the publicly available ISSIA-CNR Soccer Dataset.
Tasks
Published 2019-02-19
URL http://arxiv.org/abs/1902.07304v1
PDF http://arxiv.org/pdf/1902.07304v1.pdf
PWC https://paperswithcode.com/paper/deepball-deep-neural-network-ball-detector
Repo https://github.com/Ign0reLee/DeepBall_Tensorflow
Framework none
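
A hedged sketch of the hypercolumn design the abstract describes: feature maps from several stages are upsampled to a common resolution, concatenated, and classified per pixel into a ball confidence map. Layer sizes are illustrative, not the released DeepBall architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyBallDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.b1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.b2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.b3 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.head = nn.Conv2d(16 + 32 + 64, 1, kernel_size=1)  # 1x1 conv classifier

    def forward(self, x):
        f1 = self.b1(x); f2 = self.b2(f1); f3 = self.b3(f2)
        size = f1.shape[-2:]
        # hypercolumn: upsample deeper maps and stack them with the shallow one
        hyper = torch.cat([
            f1,
            F.interpolate(f2, size=size, mode="bilinear", align_corners=False),
            F.interpolate(f3, size=size, mode="bilinear", align_corners=False),
        ], dim=1)
        return torch.sigmoid(self.head(hyper))  # ball confidence map, any input size
```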

Hand Segmentation and Fingertip Tracking from Depth Camera Images Using Deep Convolutional Neural Network and Multi-task SegNet

Title Hand Segmentation and Fingertip Tracking from Depth Camera Images Using Deep Convolutional Neural Network and Multi-task SegNet
Authors Duong Hai Nguyen, Tai Nhu Do, In-Seop Na, Soo-Hyung Kim
Abstract Hand segmentation and fingertip detection play an indispensable role in hand gesture-based human-machine interaction systems. In this study, we propose a method to discriminate hand components and to locate fingertips in RGB-D images. The system consists of three main steps: hand detection using RGB images, providing regions that are considered promising areas for further processing; hand segmentation; and fingertip detection using the depth image and our modified SegNet, a single lightweight architecture that can process two independent tasks at the same time. The experimental results show that our system is a promising method for hand segmentation and fingertip detection, achieving comparable performance with a model complexity suitable for real-time applications.
Tasks Hand Segmentation
Published 2019-01-11
URL https://arxiv.org/abs/1901.03465v3
PDF https://arxiv.org/pdf/1901.03465v3.pdf
PWC https://paperswithcode.com/paper/hand-segmentation-and-fingertip-tracking-from
Repo https://github.com/nhduong/multitask_segnet_hand_segmentation_fingertip_detection
Framework tf
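
A hedged sketch of the single-backbone, two-task layout: one shared encoder with one decoder head for hand-component segmentation and one for fingertip heatmaps. Channel sizes and head designs are illustrative, not the paper's modified SegNet.

```python
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self, n_parts=3, n_fingertips=5):
        super().__init__()
        self.encoder = nn.Sequential(   # shared features from the depth image
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        def decoder(out_ch):
            return nn.Sequential(
                nn.Upsample(scale_factor=2), nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
                nn.Upsample(scale_factor=2), nn.Conv2d(32, out_ch, 3, padding=1))
        self.seg_head = decoder(n_parts)       # hand-component labels
        self.tip_head = decoder(n_fingertips)  # one heatmap per fingertip

    def forward(self, depth):
        z = self.encoder(depth)
        return self.seg_head(z), self.tip_head(z)
```

Both heads share the encoder's cost, which is the source of the real-time-friendly complexity the abstract claims.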

LXMERT: Learning Cross-Modality Encoder Representations from Transformers

Title LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Authors Hao Tan, Mohit Bansal
Abstract Vision-and-language reasoning requires an understanding of visual concepts, language semantics, and, most importantly, the alignment and relationships between these two modalities. We thus propose the LXMERT (Learning Cross-Modality Encoder Representations from Transformers) framework to learn these vision-and-language connections. In LXMERT, we build a large-scale Transformer model that consists of three encoders: an object relationship encoder, a language encoder, and a cross-modality encoder. Next, to endow our model with the capability of connecting vision and language semantics, we pre-train the model with large amounts of image-and-sentence pairs, via five diverse representative pre-training tasks: masked language modeling, masked object prediction (feature regression and label classification), cross-modality matching, and image question answering. These tasks help in learning both intra-modality and cross-modality relationships. After fine-tuning from our pre-trained parameters, our model achieves state-of-the-art results on two visual question answering datasets (i.e., VQA and GQA). We also show the generalizability of our pre-trained cross-modality model by adapting it to a challenging visual-reasoning task, NLVR2, improving the previous best result by 22% absolute (54% to 76%). Lastly, we present detailed ablation studies showing that both our novel model components and pre-training strategies significantly contribute to our strong results, and provide several attention visualizations for the different encoders. Code and pre-trained models are publicly available at: https://github.com/airsplay/lxmert
Tasks Language Modelling, Question Answering, Visual Question Answering, Visual Reasoning
Published 2019-08-20
URL https://arxiv.org/abs/1908.07490v3
PDF https://arxiv.org/pdf/1908.07490v3.pdf
PWC https://paperswithcode.com/paper/lxmert-learning-cross-modality-encoder
Repo https://github.com/airsplay/lxmert
Framework pytorch
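
The released model also has a HuggingFace `transformers` port, which makes the three-encoder interface concrete; a hedged usage sketch follows, with random tensors standing in for the pre-extracted Faster R-CNN region features LXMERT expects.

```python
import torch
from transformers import LxmertModel, LxmertTokenizer

tok = LxmertTokenizer.from_pretrained("unc-nlp/lxmert-base-uncased")
model = LxmertModel.from_pretrained("unc-nlp/lxmert-base-uncased")

inputs = tok("Is there a dog in the picture?", return_tensors="pt")
visual_feats = torch.randn(1, 36, 2048)  # stand-in for RoI features
visual_pos = torch.rand(1, 36, 4)        # normalized box coordinates

out = model(**inputs, visual_feats=visual_feats, visual_pos=visual_pos)
print(out.pooled_output.shape)           # cross-modality [CLS] representation
```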

Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods

Title Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods
Authors Dylan Slack, Sophie Hilgard, Emily Jia, Sameer Singh, Himabindu Lakkaraju
Abstract As machine learning black boxes are increasingly being deployed in domains such as healthcare and criminal justice, there is growing emphasis on building tools and techniques for explaining these black boxes in an interpretable manner. Such explanations are being leveraged by domain experts to diagnose systematic errors and underlying biases of black boxes. In this paper, we demonstrate that post hoc explanation techniques that rely on input perturbations, such as LIME and SHAP, are not reliable. Specifically, we propose a novel scaffolding technique that effectively hides the biases of any given classifier by allowing an adversarial entity to craft an arbitrary desired explanation. Our approach can be used to scaffold any biased classifier in such a way that its predictions on the input data distribution still remain biased, but the post hoc explanations of the scaffolded classifier look innocuous. Using extensive evaluation with multiple real-world datasets (including COMPAS), we demonstrate how extremely biased (racist) classifiers crafted by our framework can easily fool popular explanation techniques such as LIME and SHAP into generating innocuous explanations which do not reflect the underlying biases.
Tasks
Published 2019-11-06
URL https://arxiv.org/abs/1911.02508v2
PDF https://arxiv.org/pdf/1911.02508v2.pdf
PWC https://paperswithcode.com/paper/how-can-we-fool-lime-and-shap-adversarial
Repo https://github.com/dylan-slack/Fooling-LIME-SHAP
Framework none
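
A hedged sketch of the scaffolding trick: LIME and SHAP query the model on perturbed, largely off-manifold points, so an adversary can train a detector for such points and route them to an innocuous model while the biased model handles real data. The model and data arguments below are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

class Scaffold:
    def __init__(self, biased_model, innocuous_model, X_real, X_perturbed):
        # detector: is this query on the data manifold or a perturbation sample?
        X = np.vstack([X_real, X_perturbed])
        y = np.r_[np.ones(len(X_real)), np.zeros(len(X_perturbed))]
        self.detector = RandomForestClassifier().fit(X, y)
        self.biased, self.innocuous = biased_model, innocuous_model

    def predict(self, X):
        on_manifold = self.detector.predict(X).astype(bool)
        out = self.innocuous.predict(X)  # what perturbation-based explainers see
        out[on_manifold] = self.biased.predict(X[on_manifold])  # real behavior
        return out
```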

Domain Adaptation via Low-Rank Basis Approximation

Title Domain Adaptation via Low-Rank Basis Approximation
Authors Christoph Raab, Frank-Michael Schleif
Abstract Domain adaptation focuses on the reuse of supervised learning models in a new context. Prominent applications can be found in robotics, image processing or web mining. In these areas, learning scenarios change by nature, but often remain related and motivate the reuse of existing supervised models. While the majority of symmetric and asymmetric domain adaptation algorithms utilize all available source and target domain data, we show that efficient domain adaptation requires only a substantially smaller subset from both domains. This makes it more suitable for real-world scenarios where target domain data is rare. The presented approach finds a target subspace representation for source and target data to address domain differences by orthogonal basis transfer. By employing a low-rank approximation, the approach keeps computational time low. The presented idea is evaluated in typical domain adaptation tasks with standard benchmark data.
Tasks Domain Adaptation, Transfer Learning
Published 2019-07-02
URL https://arxiv.org/abs/1907.01343v2
PDF https://arxiv.org/pdf/1907.01343v2.pdf
PWC https://paperswithcode.com/paper/domain-adaptation-via-low-rank-basis
Repo https://github.com/iclr-nbt/nbt
Framework none
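
A hedged sketch of the core subspace step: take the dominant right-singular vectors of the target data as an orthonormal basis and represent both domains in it before fitting a classifier. The paper's full method involves more than this projection; this only illustrates low-rank basis transfer.

```python
import numpy as np

def low_rank_basis_transfer(X_source, X_target, k=20):
    # economy SVD of the target domain; the top-k rows of Vt span its
    # dominant subspace, and truncating to k keeps the computation cheap
    _, _, Vt = np.linalg.svd(X_target, full_matrices=False)
    Vk = Vt[:k].T                      # (d, k) orthonormal basis
    return X_source @ Vk, X_target @ Vk

Zs, Zt = low_rank_basis_transfer(np.random.randn(100, 50),
                                 np.random.randn(80, 50), k=10)
```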

The Liver Tumor Segmentation Benchmark (LiTS)

Title The Liver Tumor Segmentation Benchmark (LiTS)
Authors Patrick Bilic, Patrick Ferdinand Christ, Eugene Vorontsov, Grzegorz Chlebus, Hao Chen, Qi Dou, Chi-Wing Fu, Xiao Han, Pheng-Ann Heng, Jürgen Hesser, Samuel Kadoury, Tomasz Konopczynski, Miao Le, Chunming Li, Xiaomeng Li, Jana Lipkovà, John Lowengrub, Hans Meine, Jan Hendrik Moltz, Chris Pal, Marie Piraud, Xiaojuan Qi, Jin Qi, Markus Rempfler, Karsten Roth, Andrea Schenk, Anjany Sekuboyina, Eugene Vorontsov, Ping Zhou, Christian Hülsemeyer, Marcel Beetz, Florian Ettlinger, Felix Gruen, Georgios Kaissis, Fabian Lohöfer, Rickmer Braren, Julian Holch, Felix Hofmann, Wieland Sommer, Volker Heinemann, Colin Jacobs, Gabriel Efrain Humpire Mamani, Bram van Ginneken, Gabriel Chartrand, An Tang, Michal Drozdzal, Avi Ben-Cohen, Eyal Klang, Marianne M. Amitai, Eli Konen, Hayit Greenspan, Johan Moreau, Alexandre Hostettler, Luc Soler, Refael Vivanti, Adi Szeskin, Naama Lev-Cohain, Jacob Sosna, Leo Joskowicz, Bjoern H. Menze
Abstract In this work, we report the set-up and results of the Liver Tumor Segmentation Benchmark (LiTS), organized in conjunction with the IEEE International Symposium on Biomedical Imaging (ISBI) 2016 and the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) 2017. Twenty-four valid state-of-the-art liver and liver tumor segmentation algorithms were applied to a set of 131 computed tomography (CT) volumes with different types of tumor contrast levels (hyper-/hypo-intense), tissue abnormalities (e.g., after metastasectomy), sizes, and varying numbers of lesions. The submitted algorithms were tested on 70 undisclosed volumes. The dataset was created in collaboration with seven hospitals and research institutions and manually reviewed by three independent radiologists. We found that no single algorithm performed best for both liver and tumors. The best liver segmentation algorithm achieved a Dice score of 0.96 (MICCAI), whereas for tumor segmentation the best algorithms evaluated at 0.67 (ISBI) and 0.70 (MICCAI). The LiTS image data and manual annotations continue to be publicly available through an online evaluation system as an ongoing benchmarking resource.
Tasks Computed Tomography (CT), Liver Segmentation
Published 2019-01-13
URL http://arxiv.org/abs/1901.04056v1
PDF http://arxiv.org/pdf/1901.04056v1.pdf
PWC https://paperswithcode.com/paper/the-liver-tumor-segmentation-benchmark-lits
Repo https://github.com/andreped/livermask
Framework tf
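
For reference, the Dice score used to rank such submissions is twice the overlap divided by the summed sizes of the two masks; a minimal implementation:

```python
import numpy as np

def dice(pred, truth, eps=1e-8):
    """pred, truth: binary segmentation masks as numpy arrays of equal shape."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    return 2.0 * np.logical_and(pred, truth).sum() / (pred.sum() + truth.sum() + eps)
```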

Specialized Decision Surface and Disentangled Feature for Weakly-Supervised Polyphonic Sound Event Detection

Title Specialized Decision Surface and Disentangled Feature for Weakly-Supervised Polyphonic Sound Event Detection
Authors Liwei Lin, Xiangdong Wang, Hong Liu, Yueliang Qian
Abstract Sound event detection (SED) consists of recognizing the presence of sound events in a segment of audio and detecting their onset as well as offset. In this paper, we focus on two common problems in SED: how to carry out efficient weakly-supervised learning, and how to learn better from an unbalanced dataset in which multiple sound events often co-occur. We approach SED as a multiple instance learning (MIL) problem and utilize a neural network framework with different pooling modules to solve it. General MIL approaches fall into two categories: the instance-level approach and the embedding-level approach. Since the embedding-level approach tends to perform better than the instance-level approach in terms of bag-level classification but cannot provide instance-level probabilities, we present how to generate instance-level probabilities for it. Moreover, we further propose a specialized decision surface (SDS) for embedding-level attention pooling. We analyze and explain why an embedding-level attention module with SDS is better than other typical pooling modules from the perspective of the high-level feature space. To address the unbalanced dataset and the co-occurrence of multiple categories in the polyphonic event detection task, we propose a disentangled feature (DF) to reduce interference among categories, which optimizes the high-level feature space by disentangling it based on class-wise identifiable information and obtaining multiple different subspaces. Experiments on the dataset of DCASE 2018 Task 4 show that the proposed SDS and DF significantly improve the detection performance of the embedding-level MIL approach with an attention pooling module and outperform the first-place system in the challenge by 6.2 percentage points.
Tasks Multi-Label Classification, Multiple Instance Learning, Sound Event Detection
Published 2019-05-24
URL https://arxiv.org/abs/1905.10091v5
PDF https://arxiv.org/pdf/1905.10091v5.pdf
PWC https://paperswithcode.com/paper/disentangled-feature-for-weakly-supervised
Repo https://github.com/Kikyo-16/Sound_event_detection
Framework tf
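
A hedged sketch of embedding-level attention pooling in the MIL setting: per-class attention weights over frames aggregate frame embeddings into clip-level embeddings, from which bag (clip) probabilities are predicted, alongside a frame-level readout as instance probabilities. SDS and DF themselves are not reproduced here, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class EmbeddingAttentionMIL(nn.Module):
    def __init__(self, dim=128, n_classes=10):
        super().__init__()
        self.attn = nn.Linear(dim, n_classes)  # per-class attention over frames
        self.clf = nn.Linear(dim, n_classes)

    def forward(self, frames):                          # frames: (B, T, dim)
        w = torch.softmax(self.attn(frames), dim=1)     # (B, T, C), sums to 1 over T
        clip = torch.einsum("btc,btd->bcd", w, frames)  # per-class clip embeddings
        bag = torch.sigmoid(self.clf(clip).diagonal(dim1=-2, dim2=-1))  # (B, C)
        inst = torch.sigmoid(self.clf(frames))          # frame-level readout (B, T, C)
        return bag, inst

bag, inst = EmbeddingAttentionMIL()(torch.randn(4, 240, 128))
```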