April 3, 2020

Paper Group AWR 21

Exploit Clues from Views: Self-Supervised and Regularized Learning for Multiview Object Recognition. Fast reconstruction of atomic-scale STEM-EELS images from sparse sampling. Counter-example Guided Learning of Bounds on Environment Behavior. Multi-Level Representation Learning for Deep Subspace Clustering. Co-occurrence of deep convolutional featu …

Exploit Clues from Views: Self-Supervised and Regularized Learning for Multiview Object Recognition


Title	Exploit Clues from Views: Self-Supervised and Regularized Learning for Multiview Object Recognition
Authors	Chih-Hui Ho, Bo Liu, Tz-Ying Wu, Nuno Vasconcelos
Abstract	Multiview recognition has been well studied in the literature and achieves decent performance in object recognition and retrieval tasks. However, most previous works rely on supervised learning and some impractical underlying assumptions, such as the availability of all views at training and inference time. In this work, the problem of multiview self-supervised learning (MV-SSL) is investigated, where only the image-to-object association is given. Given this setup, a novel surrogate task for self-supervised learning is proposed by pursuing an “object invariant” representation. This is solved by randomly selecting an image feature of an object as the object prototype, accompanied by multiview consistency regularization, which results in view invariant stochastic prototype embedding (VISPE). Experiments show that the recognition and retrieval results using VISPE outperform those of other self-supervised learning methods on seen and unseen data. VISPE can also be applied to the semi-supervised scenario and demonstrates robust performance with limited data available. Code is available at https://github.com/chihhuiho/VISPE
Tasks	Object Recognition
Published	2020-03-28
URL	https://arxiv.org/abs/2003.12735v1
PDF	https://arxiv.org/pdf/2003.12735v1.pdf
PWC	https://paperswithcode.com/paper/exploit-clues-from-views-self-supervised-and
Repo	https://github.com/chihhuiho/VISPE
Framework	pytorch

Fast reconstruction of atomic-scale STEM-EELS images from sparse sampling


Title	Fast reconstruction of atomic-scale STEM-EELS images from sparse sampling
Authors	Etienne Monier, Thomas Oberlin, Nathalie Brun, Xiaoyan Li, Marcel Tencé, Nicolas Dobigeon
Abstract	This paper discusses the reconstruction of partially sampled spectrum-images to accelerate the acquisition in scanning transmission electron microscopy (STEM). The problem of image reconstruction has been widely considered in the literature for many imaging modalities, but only a few attempts handled 3D data such as spectral images acquired by STEM electron energy loss spectroscopy (EELS). Besides, among the methods proposed in the microscopy literature, some are fast but inaccurate while others provide accurate reconstruction but at the price of a high computational burden. Thus none of the proposed reconstruction methods fulfills our expectations in terms of accuracy and computational complexity. In this paper, we propose a fast and accurate reconstruction method suited for atomic-scale EELS. This method is compared to popular solutions such as beta process factor analysis (BPFA), which is used for the first time on STEM-EELS images. Experiments based on real as well as synthetic data will be conducted.
Tasks	Image Reconstruction
Published	2020-02-04
URL	https://arxiv.org/abs/2002.01225v1
PDF	https://arxiv.org/pdf/2002.01225v1.pdf
PWC	https://paperswithcode.com/paper/fast-reconstruction-of-atomic-scale-stem-eels
Repo	https://github.com/etienne-monier/2020-Ultramicro-fast
Framework	none

Counter-example Guided Learning of Bounds on Environment Behavior


Title	Counter-example Guided Learning of Bounds on Environment Behavior
Authors	Yuxiao Chen, Sumanth Dathathri, Tung Phan-Minh, Richard M. Murray
Abstract	There is a growing interest in building autonomous systems that interact with complex environments. The difficulty associated with obtaining an accurate model for such environments poses a challenge to the task of assessing and guaranteeing the system’s performance. We present a data-driven solution that allows for a system to be evaluated for specification conformance without an accurate model of the environment. Our approach involves learning a conservative reactive bound of the environment’s behavior using data and specification of the system’s desired behavior. First, the approach begins by learning a conservative reactive bound on the environment’s actions that captures its possible behaviors with high probability. This bound is then used to assist verification, and if the verification fails under this bound, the algorithm returns counter-examples to show how failure occurs and then uses these to refine the bound. We demonstrate the applicability of the approach through two case-studies: i) verifying controllers for a toy multi-robot system, and ii) verifying an instance of human-robot interaction during a lane-change maneuver given real-world human driving data.
Tasks
Published	2020-01-20
URL	https://arxiv.org/abs/2001.07233v3
PDF	https://arxiv.org/pdf/2001.07233v3.pdf
PWC	https://paperswithcode.com/paper/counter-example-guided-learning-of-bounds-on
Repo	https://github.com/chenyx09/Reactive-modelling
Framework	none

Multi-Level Representation Learning for Deep Subspace Clustering


Title	Multi-Level Representation Learning for Deep Subspace Clustering
Authors	Mohsen Kheirandishfard, Fariba Zohrizadeh, Farhad Kamangar
Abstract	This paper proposes a novel deep subspace clustering approach which uses convolutional autoencoders to transform input images into new representations lying on a union of linear subspaces. The first contribution of our work is to insert multiple fully-connected linear layers between the encoder layers and their corresponding decoder layers to promote learning more favorable representations for subspace clustering. These connection layers facilitate the feature learning procedure by combining low-level and high-level information for generating multiple sets of self-expressive and informative representations at different levels of the encoder. Moreover, we introduce a novel loss minimization problem which leverages an initial clustering of the samples to effectively fuse the multi-level representations and recover the underlying subspaces more accurately. The loss function is then minimized through an iterative scheme which alternately updates the network parameters and produces new clusterings of the samples. Experiments on four real-world datasets demonstrate that our approach exhibits superior performance compared to the state-of-the-art methods on most of the subspace clustering problems.
Tasks	Representation Learning
Published	2020-01-19
URL	https://arxiv.org/abs/2001.08533v1
PDF	https://arxiv.org/pdf/2001.08533v1.pdf
PWC	https://paperswithcode.com/paper/multi-level-representation-learning-for-deep
Repo	https://github.com/mohsenkheirandishfard/MLRDSC
Framework	pytorch

Co-occurrence of deep convolutional features for image search


Title	Co-occurrence of deep convolutional features for image search
Authors	J. I. Forcen, Miguel Pagola, Edurne Barrenechea, Humberto Bustince
Abstract	Image search can be tackled using deep features from pre-trained Convolutional Neural Networks (CNN). The feature map from the last convolutional layer of a CNN encodes descriptive information from which a discriminative global descriptor can be obtained. We propose a new representation of co-occurrences from deep convolutional features to extract additional relevant information from this last convolutional layer. Combining this co-occurrence map with the feature map, we achieve an improved image representation. We present two different methods to get the co-occurrence representation, the first one based on direct aggregation of activations, and the second one, based on a trainable co-occurrence representation. The image descriptors derived from our methodology improve the performance in very well-known image retrieval datasets as we prove in the experiments.
Tasks	Image Retrieval
Published	2020-03-30
URL	https://arxiv.org/abs/2003.13827v1
PDF	https://arxiv.org/pdf/2003.13827v1.pdf
PWC	https://paperswithcode.com/paper/co-occurrence-of-deep-convolutional-features
Repo	https://github.com/jiforcen/co-occurrence
Framework	pytorch

Towards a General Theory of Infinite-Width Limits of Neural Classifiers


Title	Towards a General Theory of Infinite-Width Limits of Neural Classifiers
Authors	Eugene A. Golikov
Abstract	Obtaining theoretical guarantees for neural network training appears to be a hard problem in the general case. Recent research has focused on studying this problem in the limit of infinite width, and two different theories have been developed: mean-field (MF) and kernel limit theories. We propose a general framework that provides a link between these seemingly distinct theories. Our framework out of the box gives rise to a discrete-time MF limit which was not previously explored in the literature. We prove a convergence theorem for it and show that it provides a more reasonable approximation for finite-width nets compared to the NTK limit if learning rates are not very small. Also, our analysis suggests that all infinite-width limits of a network with a single hidden layer are covered by either mean-field limit theory or kernel limit theory. We show that for networks with more than two hidden layers RMSProp training has a non-trivial MF limit, but GD training does not have one. Overall, our framework demonstrates that both MF and NTK limits have considerable limitations in approximating finite-sized neural nets, indicating the need for designing more accurate infinite-width approximations for them. Source code to reproduce all the reported results is available on GitHub.
Tasks
Published	2020-03-12
URL	https://arxiv.org/abs/2003.05884v1
PDF	https://arxiv.org/pdf/2003.05884v1.pdf
PWC	https://paperswithcode.com/paper/towards-a-general-theory-of-infinite-width
Repo	https://github.com/deepmipt/infinite-width_nets
Framework	pytorch

Learning with Out-of-Distribution Data for Audio Classification


Title	Learning with Out-of-Distribution Data for Audio Classification
Authors	Turab Iqbal, Yin Cao, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang
Abstract	In supervised machine learning, the assumption that training data is labelled correctly is not always satisfied. In this paper, we investigate an instance of labelling error for classification tasks in which the dataset is corrupted with out-of-distribution (OOD) instances: data that does not belong to any of the target classes, but is labelled as such. We show that detecting and relabelling certain OOD instances, rather than discarding them, can have a positive effect on learning. The proposed method uses an auxiliary classifier, trained on data that is known to be in-distribution, for detection and relabelling. The amount of data required for this is shown to be small. Experiments are carried out on the FSDnoisy18k audio dataset, where OOD instances are very prevalent. The proposed method is shown to improve the performance of convolutional neural networks by a significant margin. Comparisons with other noise-robust techniques are similarly encouraging.
Tasks	Audio Classification
Published	2020-02-11
URL	https://arxiv.org/abs/2002.04683v1
PDF	https://arxiv.org/pdf/2002.04683v1.pdf
PWC	https://paperswithcode.com/paper/learning-with-out-of-distribution-data-for
Repo	https://github.com/tqbl/ood_audio
Framework	pytorch
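
The detect-and-relabel idea in the abstract above can be illustrated with a small sketch. This is not the authors' code: the function name `relabel_ood`, the confidence `threshold`, and the rule "relabel when the auxiliary classifier is confident" are assumptions made for illustration, and the probabilities are made-up inputs rather than outputs of a trained model.

```python
def relabel_ood(aux_probs, noisy_labels, threshold=0.9):
    """Relabel instances that an auxiliary classifier assigns to a class with high confidence.

    aux_probs: per-instance lists of class probabilities from the auxiliary
        classifier (assumed to be trained on data known to be in-distribution).
    noisy_labels: the original, possibly incorrect, class indices.
    threshold: hypothetical confidence cut-off; not taken from the paper.
    """
    new_labels = []
    for probs, label in zip(aux_probs, noisy_labels):
        # Index of the most probable class for this instance.
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            new_labels.append(best)   # confident prediction: relabel
        else:
            new_labels.append(label)  # not confident: keep the original label
    return new_labels


aux_probs = [[0.95, 0.05], [0.60, 0.40], [0.02, 0.98]]
noisy_labels = [1, 1, 0]
relabelled = relabel_ood(aux_probs, noisy_labels)
```

Only the first and third instances change label here, because the second prediction falls below the assumed threshold.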

Vertebra-Focused Landmark Detection for Scoliosis Assessment


Title	Vertebra-Focused Landmark Detection for Scoliosis Assessment
Authors	Jingru Yi, Pengxiang Wu, Qiaoying Huang, Hui Qu, Dimitris N. Metaxas
Abstract	Adolescent idiopathic scoliosis (AIS) is a lifetime disease that arises in children. Accurate estimation of Cobb angles of the scoliosis is essential for clinicians to make diagnosis and treatment decisions. The Cobb angles are measured according to the vertebrae landmarks. Existing regression-based methods for the vertebra landmark detection typically suffer from large dense mapping parameters and inaccurate landmark localization. The segmentation-based methods tend to predict connected or corrupted vertebra masks. In this paper, we propose a novel vertebra-focused landmark detection method. Our model first localizes the vertebra centers, based on which it then traces the four corner landmarks of the vertebra through the learned corner offset. In this way, our method is able to keep the order of the landmarks. The comparison results demonstrate the merits of our method in both Cobb angle measurement and landmark detection on low-contrast and ambiguous X-ray images. Code is available at: https://github.com/yijingru/Vertebra-Landmark-Detection.
Tasks
Published	2020-01-09
URL	https://arxiv.org/abs/2001.03187v1
PDF	https://arxiv.org/pdf/2001.03187v1.pdf
PWC	https://paperswithcode.com/paper/vertebra-focused-landmark-detection-for
Repo	https://github.com/yijingru/Vertebra-Landmark-Detection
Framework	pytorch
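
The center-then-corners decoding described in the abstract above might look roughly like the following sketch. The function name `decode_vertebrae`, the fixed corner order, and the top-to-bottom sort by center are illustrative assumptions; in the paper the centers and offsets come from a trained network, whereas here they are hand-written values.

```python
def decode_vertebrae(centers, corner_offsets):
    """Turn vertebra centers plus per-corner offsets into ordered corner landmarks.

    centers: list of (x, y) vertebra centers in image coordinates.
    corner_offsets: for each center, four (dx, dy) offsets in a fixed order
        (assumed here: top-left, top-right, bottom-left, bottom-right).
    Returns one list of four (x, y) corners per vertebra, sorted top to bottom.
    """
    vertebrae = []
    for (cx, cy), offsets in zip(centers, corner_offsets):
        # Each corner is the center shifted by its offset, so corners stay
        # attached to their own vertebra and keep the fixed corner order.
        corners = [(cx + dx, cy + dy) for dx, dy in offsets]
        vertebrae.append(((cx, cy), corners))
    # Image y grows downward, so sorting by center y orders the spine top to bottom.
    vertebrae.sort(key=lambda item: item[0][1])
    return [corners for _, corners in vertebrae]


box = [(-10, -5), (10, -5), (-10, 5), (10, 5)]
landmarks = decode_vertebrae([(50, 120), (48, 60)], [box, box])
```

Because every corner is derived from a center, ordering the centers is enough to order all landmarks, which is one way to read the abstract's remark about keeping landmark order.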

MS-Net: Multi-Site Network for Improving Prostate Segmentation with Heterogeneous MRI Data


Title	MS-Net: Multi-Site Network for Improving Prostate Segmentation with Heterogeneous MRI Data
Authors	Quande Liu, Qi Dou, Lequan Yu, Pheng Ann Heng
Abstract	Automated prostate segmentation in MRI is highly demanded for computer-assisted diagnosis. Recently, a variety of deep learning methods have achieved remarkable progress in this task, usually relying on large amounts of training data. Because medical images are scarce, it is important to effectively aggregate data from multiple sites for robust model training, to alleviate the insufficiency of single-site samples. However, the prostate MRIs from different sites present heterogeneity due to the differences in scanners and imaging protocols, raising challenges for effective ways of aggregating multi-site data for network training. In this paper, we propose a novel multi-site network (MS-Net) for improving prostate segmentation by learning robust representations, leveraging multiple sources of data. To compensate for the inter-site heterogeneity of different MRI datasets, we develop Domain-Specific Batch Normalization layers in the network backbone, enabling the network to estimate statistics and perform feature normalization for each site separately. Considering the difficulty of capturing the shared knowledge from multiple datasets, a novel learning paradigm, i.e., Multi-site-guided Knowledge Transfer, is proposed to enhance the kernels to extract more generic representations from multi-site data. Extensive experiments on three heterogeneous prostate MRI datasets demonstrate that our MS-Net improves the performance across all datasets consistently, and outperforms state-of-the-art methods for multi-site learning.
Tasks	Transfer Learning
Published	2020-02-09
URL	https://arxiv.org/abs/2002.03366v2
PDF	https://arxiv.org/pdf/2002.03366v2.pdf
PWC	https://paperswithcode.com/paper/ms-net-multi-site-network-for-improving
Repo	https://github.com/JunMa11/MedJournal-OpenSourcePapers
Framework	tf
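
To make the per-site normalization idea concrete, here is a deliberately simplified sketch. It is not MS-Net's layer: it works on a flat list of scalar features, omits the learnable scale and shift and the running averages a real batch-normalization layer would keep, and the name `normalize_per_site` is hypothetical.

```python
from collections import defaultdict
from statistics import mean, pvariance


def normalize_per_site(values, sites, eps=1e-5):
    """Standardize each value with the mean and variance of its own site.

    values: list of scalar features.
    sites: list of site identifiers, one per value.
    eps: small constant for numerical stability, as in batch normalization.
    """
    grouped = defaultdict(list)
    for value, site in zip(values, sites):
        grouped[site].append(value)
    # One (mean, variance) pair per site instead of shared statistics.
    stats = {site: (mean(vs), pvariance(vs)) for site, vs in grouped.items()}
    return [
        (value - stats[site][0]) / (stats[site][1] + eps) ** 0.5
        for value, site in zip(values, sites)
    ]


values = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
sites = ["A", "A", "A", "B", "B", "B"]
normalized = normalize_per_site(values, sites)
```

Although site B's raw intensities are ten times larger than site A's, both sites end up on nearly the same scale after normalization, which is the effect separate statistics are meant to achieve.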

Twenty Years of Network Science: A Bibliographic and Co-authorship Network Analysis


Title	Twenty Years of Network Science: A Bibliographic and Co-authorship Network Analysis
Authors	Roland Molontay, Marcell Nagy
Abstract	Two decades ago three pioneering papers turned the attention to complex networks and initiated a new era of research, establishing an interdisciplinary field called network science. Namely, these highly-cited seminal papers were written by Watts & Strogatz, Barabási & Albert and Girvan & Newman on small-world networks, on scale-free networks and on the community structure of complex networks, respectively. In the past 20 years - due to the multidisciplinary nature of the field - a diverse but not divided network science community has emerged. In this paper, we investigate how this community has evolved over time with respect to speed, diversity and interdisciplinary nature as seen through the growing co-authorship network of network scientists (here the notion refers to a scholar with at least one paper citing at least one of the three aforementioned milestone papers). After providing a bibliographic analysis of 31,763 network science papers, we construct the co-authorship network of 56,646 network scientists and we analyze its topology and dynamics. We shed light on the collaboration patterns of the last 20 years of network science by investigating numerous structural properties of the co-authorship network and by using enhanced data visualization techniques. We also identify the most central authors, the largest communities, investigate the spatiotemporal changes, and compare the properties of the network to scientometric indicators.
Tasks
Published	2020-01-23
URL	https://arxiv.org/abs/2001.09006v2
PDF	https://arxiv.org/pdf/2001.09006v2.pdf
PWC	https://paperswithcode.com/paper/twenty-years-of-network-science-a
Repo	https://github.com/marcessz/Twenty-Years-of-Network-Science
Framework	none

Distributed Momentum for Byzantine-resilient Learning


Title	Distributed Momentum for Byzantine-resilient Learning
Authors	El-Mahdi El-Mhamdi, Rachid Guerraoui, Sébastien Rouault
Abstract	Momentum is a variant of gradient descent that has been proposed for its benefits on convergence. In a distributed setting, momentum can be implemented either at the server or the worker side. When the aggregation rule used by the server is linear, commutativity with addition makes both deployments equivalent. Robustness and privacy are however among motivations to abandon linear aggregation rules. In this work, we demonstrate the benefits on robustness of using momentum at the worker side. We first prove that computing momentum at the workers reduces the variance-norm ratio of the gradient estimation at the server, strengthening Byzantine resilient aggregation rules. We then provide an extensive experimental demonstration of the robustness effect of worker-side momentum on distributed SGD.
Tasks
Published	2020-02-28
URL	https://arxiv.org/abs/2003.00010v2
PDF	https://arxiv.org/pdf/2003.00010v2.pdf
PWC	https://paperswithcode.com/paper/distributed-momentum-for-byzantine-resilient
Repo	https://github.com/LPD-EPFL/ByzantineMomentum
Framework	pytorch
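
The abstract's claim that server-side and worker-side momentum coincide under a linear aggregation rule, but not under a non-linear one, can be checked on a toy example. The sketch below uses scalar gradients, the mean as the linear rule and the median as a stand-in for a robust rule; the function names and the momentum coefficient `mu` are assumptions, and this is not the authors' implementation.

```python
from statistics import mean, median


def server_side(grads_per_step, aggregate, mu=0.9):
    """Aggregate the workers' gradients first, then apply momentum at the server."""
    momentum = 0.0
    for grads in grads_per_step:
        momentum = mu * momentum + aggregate(grads)
    return momentum


def worker_side(grads_per_step, aggregate, mu=0.9):
    """Each worker keeps its own momentum; the server aggregates the momentums."""
    momentums = [0.0] * len(grads_per_step[0])
    for grads in grads_per_step:
        momentums = [mu * m + g for m, g in zip(momentums, grads)]
    return aggregate(momentums)


# Two steps, three workers, one scalar gradient per worker (made-up values).
steps = [[1.0, 2.0, 9.0], [4.0, -1.0, 0.0]]

linear_gap = abs(server_side(steps, mean) - worker_side(steps, mean))
robust_server = server_side(steps, median)
robust_worker = worker_side(steps, median)
```

With the mean, `linear_gap` is zero up to floating-point rounding. With the median the two deployments give different updates, which is why the choice of where to compute momentum matters once the aggregation rule is no longer linear.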

Depth-Adaptive Graph Recurrent Network for Text Classification


Title	Depth-Adaptive Graph Recurrent Network for Text Classification
Authors	Yijin Liu, Fandong Meng, Yufeng Chen, Jinan Xu, Jie Zhou
Abstract	The Sentence-State LSTM (S-LSTM) is a powerful and highly efficient graph recurrent network, which views words as nodes and performs layer-wise recurrent steps between them simultaneously. Despite its successes on text representations, the S-LSTM still suffers from two drawbacks. Firstly, given a sentence, certain words are usually more ambiguous than others, and thus more computation steps need to be taken for these difficult words and vice versa. However, the S-LSTM takes fixed computation steps for all words, irrespective of their hardness. The second one comes from the lack of sequential information (e.g., word order) that is inherently important for natural language. In this paper, we try to address these issues and propose a depth-adaptive mechanism for the S-LSTM, which allows the model to learn how many computational steps to conduct for different words as required. In addition, we integrate an extra RNN layer to inject sequential information, which also serves as an input feature for the decision of adaptive depths. Results on the classic text classification task (24 datasets in various sizes and domains) show that our model brings significant improvements against the conventional S-LSTM and other high-performance models (e.g., the Transformer), while achieving a good accuracy-speed trade-off.
Tasks	Text Classification
Published	2020-02-29
URL	https://arxiv.org/abs/2003.00166v1
PDF	https://arxiv.org/pdf/2003.00166v1.pdf
PWC	https://paperswithcode.com/paper/depth-adaptive-graph-recurrent-network-for
Repo	https://github.com/Adaxry/Depth-Adaptive-GRN
Framework	none

Learning in the Frequency Domain


Title	Learning in the Frequency Domain
Authors	Kai Xu, Minghai Qin, Fei Sun, Yuhao Wang, Yen-Kuang Chen, Fengbo Ren
Abstract	Deep neural networks have achieved remarkable success in computer vision tasks. Existing neural networks mainly operate in the spatial domain with fixed input sizes. For practical applications, images are usually large and have to be downsampled to the predetermined input size of neural networks. Even though the downsampling operations reduce computation and the required communication bandwidth, they remove both redundant and salient information obliviously, which results in accuracy degradation. Inspired by digital signal processing theories, we analyze the spectral bias from the frequency perspective and propose a learning-based frequency selection method to identify the trivial frequency components which can be removed without accuracy loss. The proposed method of learning in the frequency domain leverages identical structures of the well-known neural networks, such as ResNet-50, MobileNetV2, and Mask R-CNN, while accepting the frequency-domain information as the input. Experimental results show that learning in the frequency domain with static channel selection can achieve higher accuracy than the conventional spatial downsampling approach and meanwhile further reduce the input data size. Specifically for ImageNet classification with the same input size, the proposed method achieves 1.41% and 0.66% top-1 accuracy improvements on ResNet-50 and MobileNetV2, respectively. Even with half input size, the proposed method still improves the top-1 accuracy on ResNet-50 by 1%. In addition, we observe a 0.8% average precision improvement on Mask R-CNN for instance segmentation on the COCO dataset.
Tasks	Instance Segmentation, Semantic Segmentation
Published	2020-02-27
URL	https://arxiv.org/abs/2002.12416v4
PDF	https://arxiv.org/pdf/2002.12416v4.pdf
PWC	https://paperswithcode.com/paper/learning-in-the-frequency-domain
Repo	https://github.com/calmevtime1990/supp
Framework	pytorch
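
As a rough illustration of feeding frequency components to a network and keeping only a fixed subset of them, the sketch below computes an orthonormal 2D DCT-II of a single 8x8 block in pure Python and retains a hand-picked set of coefficients. The block size, the helper names, and the particular `keep` list are assumptions for illustration; the paper's learned selection and its preprocessing pipeline are not reproduced here.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a 1D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)
        )
        out.append(scale * total)
    return out


def dct_2d(block):
    """Separable 2D DCT-II: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    # cols is indexed [column][vertical frequency]; transpose back to [u][v].
    return [list(row) for row in zip(*cols)]


def select_channels(coeffs, keep):
    """Keep only the listed (u, v) frequency components (static selection)."""
    return {(u, v): coeffs[u][v] for (u, v) in keep}


flat_block = [[1.0] * 8 for _ in range(8)]   # a constant 8x8 patch
coeffs = dct_2d(flat_block)
kept = select_channels(coeffs, keep=[(0, 0), (0, 1), (1, 0)])
```

For a constant patch all of the energy lands in the (0, 0) component, so dropping the remaining components loses nothing; natural images are less extreme, but the same intuition motivates discarding frequency components that carry little information.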

Machine Learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence


Title	Machine Learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence
Authors	Sebastian Raschka, Joshua Patterson, Corey Nolet
Abstract	Smarter applications are making better use of the insights gleaned from data, having an impact on every industry and research discipline. At the core of this revolution lie the tools and the methods that are driving it, from processing the massive piles of data generated each day to learning from and taking useful action. Deep neural networks, along with advancements in classical ML and scalable general-purpose GPU computing, have become critical components of artificial intelligence, enabling many of these astounding breakthroughs and lowering the barrier to adoption. Python continues to be the most preferred language for scientific computing, data science, and machine learning, boosting both performance and productivity by enabling the use of low-level libraries and clean high-level APIs. This survey offers insight into the field of machine learning with Python, taking a tour through important topics to identify some of the core hardware and software paradigms that have enabled it. We cover widely-used libraries and concepts, collected together for holistic comparison, with the goal of educating the reader and driving the field of Python machine learning forward.
Tasks
Published	2020-02-12
URL	https://arxiv.org/abs/2002.04803v2
PDF	https://arxiv.org/pdf/2002.04803v2.pdf
PWC	https://paperswithcode.com/paper/machine-learning-in-python-main-developments
Repo	https://github.com/rapidsai/cuml
Framework	none

VIOLIN: A Large-Scale Dataset for Video-and-Language Inference


Title	VIOLIN: A Large-Scale Dataset for Video-and-Language Inference
Authors	Jingzhou Liu, Wenhu Chen, Yu Cheng, Zhe Gan, Licheng Yu, Yiming Yang, Jingjing Liu
Abstract	We introduce a new task, Video-and-Language Inference, for joint multimodal understanding of video and text. Given a video clip with aligned subtitles as premise, paired with a natural language hypothesis based on the video content, a model needs to infer whether the hypothesis is entailed or contradicted by the given video clip. A new large-scale dataset, named Violin (VIdeO-and-Language INference), is introduced for this task, which consists of 95,322 video-hypothesis pairs from 15,887 video clips, spanning over 582 hours of video. These video clips contain rich content with diverse temporal dynamics, event shifts, and people interactions, collected from two sources: (i) popular TV shows, and (ii) movie clips from YouTube channels. In order to address our new multimodal inference task, a model is required to possess sophisticated reasoning skills, from surface-level grounding (e.g., identifying objects and characters in the video) to in-depth commonsense reasoning (e.g., inferring causal relations of events in the video). We present a detailed analysis of the dataset and an extensive evaluation over many strong baselines, providing valuable insights on the challenges of this new task.
Tasks
Published	2020-03-25
URL	https://arxiv.org/abs/2003.11618v1
PDF	https://arxiv.org/pdf/2003.11618v1.pdf
PWC	https://paperswithcode.com/paper/violin-a-large-scale-dataset-for-video-and
Repo	https://github.com/jimmy646/violin
Framework	none