January 27, 2020


Paper Group ANR 1315


Towards Geocoding Spatial Expressions


Title	Towards Geocoding Spatial Expressions
Authors	Hussein S. Al-Olimat, Valerie L. Shalin, Krishnaprasad Thirunarayan, Joy Prakash Sain
Abstract	Imprecise composite location references formed using ad hoc spatial expressions in English text make the geocoding task challenging for both inference and evaluation. Typically such spatial expressions fill in unestablished areas with new toponyms for finer spatial referents. For example, the spatial extent of the ad hoc spatial expression “north of” or “50 minutes away from” in relation to the toponym “Dayton, OH” refers to an ambiguous, imprecise area, requiring translation from this qualitative representation to a quantitative one with precise semantics using systems such as WGS84. Here we highlight the challenges of geocoding such referents and propose a formal representation that employs background knowledge, semantic approximations and rules, and fuzzy linguistic variables. We also discuss an appropriate evaluation technique for the task that is based on human contextualized and subjective judgment.
Tasks
Published	2019-06-12
URL	https://arxiv.org/abs/1906.04960v1
PDF	https://arxiv.org/pdf/1906.04960v1.pdf
PWC	https://paperswithcode.com/paper/towards-geocoding-spatial-expressions
Repo
Framework
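
To make the “north of Dayton, OH” example concrete, here is a minimal sketch of a fuzzy linguistic variable for “north of”. It is not the authors' formal representation: the triangular membership function, the `half_width_deg` parameter, the function names, and the approximate Dayton coordinates are all assumptions chosen for illustration. Only the great-circle initial-bearing formula is standard.

```python
import math

# Approximate WGS84 latitude/longitude of Dayton, OH (illustrative assumption).
DAYTON = (39.7589, -84.1916)


def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in degrees [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0


def north_of_membership(anchor, point, half_width_deg=90.0):
    """Hypothetical fuzzy degree (0..1) to which `point` is "north of" `anchor`.

    Membership is 1 at a bearing of due north and falls linearly to 0 once the
    bearing deviates from north by `half_width_deg` or more.
    """
    bearing = initial_bearing_deg(anchor[0], anchor[1], point[0], point[1])
    deviation = min(bearing, 360.0 - bearing)  # angular distance from due north
    return max(0.0, 1.0 - deviation / half_width_deg)
```

A point on the same meridian but at a higher latitude gets membership 1, a point due south gets 0, and a point to the east gets a value close to 0. A distance-based variable for “50 minutes away from” would need travel-time data and is not sketched here.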

A Novel Approach for Robust Multi Human Action Detection and Recognition based on 3-Dimentional Convolutional Neural Networks


Title	A Novel Approach for Robust Multi Human Action Detection and Recognition based on 3-Dimentional Convolutional Neural Networks
Authors	Noor Almaadeed, Omar Elharrouss, Somaya Al-Maadeed, Ahmed Bouridane, Azeddine Beghdadi
Abstract	In recent years, various attempts have been made to exploit spatial and temporal information for human action recognition using convolutional neural networks (CNNs). However, only a small number of methods are available for recognizing multiple human actions performed by more than one person in the same surveillance video. This paper proposes a novel technique for multiple human action recognition using a new architecture based on 3-dimensional deep learning, with application to video surveillance systems. The first stage of the model uses a new representation of the data by extracting the sequence of each person acting in the scene. An analysis of each sequence to detect the corresponding actions is also proposed. The KTH, Weizmann, and UCF-ARG datasets were used for training; new datasets, which include a number of persons performing multiple actions, were also constructed and used for testing the proposed algorithm. The results show that the proposed method provides more accurate multi-human action recognition, achieving 98%. Other videos were also used for evaluation, including datasets (UCF101, Hollywood2, HMDB51, and YouTube) without any preprocessing, and the results suggest that the proposed method clearly improves performance compared with state-of-the-art methods.
Tasks	Action Detection, Temporal Action Localization
Published	2019-07-25
URL	https://arxiv.org/abs/1907.11272v1
PDF	https://arxiv.org/pdf/1907.11272v1.pdf
PWC	https://paperswithcode.com/paper/a-novel-approach-for-robust-multi-human
Repo
Framework

TARN: Temporal Attentive Relation Network for Few-Shot and Zero-Shot Action Recognition


Title	TARN: Temporal Attentive Relation Network for Few-Shot and Zero-Shot Action Recognition
Authors	Mina Bishay, Georgios Zoumpourlis, Ioannis Patras
Abstract	In this paper we propose a novel Temporal Attentive Relation Network (TARN) for the problems of few-shot and zero-shot action recognition. At the heart of our network is a meta-learning approach that learns to compare representations of variable temporal length, that is, either two videos of different length (in the case of few-shot action recognition) or a video and a semantic representation such as a word vector (in the case of zero-shot action recognition). In contrast to other works in few-shot and zero-shot action recognition, we a) utilise attention mechanisms so as to perform temporal alignment, and b) learn a deep-distance measure on the aligned representations at video segment level. We adopt an episode-based training scheme and train our network in an end-to-end manner. The proposed method does not require any fine-tuning in the target domain or maintaining additional representations, as is the case for memory networks. Experimental results show that the proposed architecture outperforms the state of the art in few-shot action recognition, and achieves competitive results in zero-shot action recognition.
Tasks	Meta-Learning, Temporal Action Localization
Published	2019-07-21
URL	https://arxiv.org/abs/1907.09021v1
PDF	https://arxiv.org/pdf/1907.09021v1.pdf
PWC	https://paperswithcode.com/paper/tarn-temporal-attentive-relation-network-for
Repo
Framework

Exploration of Neural Machine Translation in Autoformalization of Mathematics in Mizar


Title	Exploration of Neural Machine Translation in Autoformalization of Mathematics in Mizar
Authors	Qingxiang Wang, Chad Brown, Cezary Kaliszyk, Josef Urban
Abstract	In this paper we share several experiments trying to automatically translate informal mathematics into formal mathematics. In our context informal mathematics refers to human-written mathematical sentences in the LaTeX format; and formal mathematics refers to statements in the Mizar language. We conducted our experiments against three established neural network-based machine translation models that are known to deliver competitive results on translating between natural languages. To train these models we also prepared four informal-to-formal datasets. We compare and analyze our results according to whether the model is supervised or unsupervised. In order to augment the data available for auto-formalization and improve the results, we develop a custom type-elaboration mechanism and integrate it in the supervised translation.
Tasks	Machine Translation
Published	2019-12-05
URL	https://arxiv.org/abs/1912.02636v2
PDF	https://arxiv.org/pdf/1912.02636v2.pdf
PWC	https://paperswithcode.com/paper/exploration-of-neural-machine-translation-in
Repo
Framework

Influence of Control Parameters and the Size of Biomedical Image Datasets on the Success of Adversarial Attacks


Title	Influence of Control Parameters and the Size of Biomedical Image Datasets on the Success of Adversarial Attacks
Authors	Vassili Kovalev, Dmitry Voynov
Abstract	In this paper, we study the dependence of the success rate of adversarial attacks on Deep Neural Networks on the biomedical image type, control parameters, and image dataset size. With this work, we aim to contribute to the accumulation of experimental results on adversarial attacks for the community dealing with biomedical images. The white-box Projected Gradient Descent attacks were examined based on 8 classification tasks and 13 image datasets containing a total of 605,080 chest X-ray and 317,000 histology images of malignant tumors. We concluded that: (1) An increase in the amplitude of perturbation in generating malicious adversarial images leads to a growth in the fraction of successful attacks for the majority of image types examined in this study. (2) Histology images tend to be less sensitive to the growth of the amplitude of adversarial perturbations. (3) The percentage of successful attacks grows with the number of iterations of the algorithm generating adversarial perturbations, with an asymptotic stabilization. (4) The success of attacks drops dramatically when the original confidence of predicting the image class exceeds 0.95. (5) The expected dependence of the percentage of successful attacks on the size of the image training set was not confirmed.
Tasks
Published	2019-04-15
URL	http://arxiv.org/abs/1904.06964v1
PDF	http://arxiv.org/pdf/1904.06964v1.pdf
PWC	https://paperswithcode.com/paper/influence-of-control-parameters-and-the-size
Repo
Framework
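
The two control parameters discussed in the abstract above, perturbation amplitude and number of iterations, are easy to see in a toy white-box Projected Gradient Descent attack. The sketch below is an illustration only: it attacks a hand-written logistic-regression model rather than the deep networks studied in the paper, and the names `pgd_attack`, `eps`, `step`, and `iters` are assumptions, not the authors' notation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, w, b):
    """P(y = 1 | x) for a logistic-regression model with weights w and bias b."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v):
    return (v > 0) - (v < 0)


def pgd_attack(x, y, w, b, eps, step, iters):
    """L-infinity PGD: maximise the cross-entropy loss within an eps-ball of x.

    eps   -- perturbation amplitude (radius of the L-infinity ball)
    step  -- size of each signed-gradient ascent step
    iters -- number of attack iterations
    """
    x_adv = list(x)
    for _ in range(iters):
        p = predict_proba(x_adv, w, b)
        # For logistic regression, d(loss)/d(x) = (p - y) * w.
        grad = [(p - y) * wi for wi in w]
        # Ascent step along the sign of the input gradient.
        x_adv = [xa + step * sign(g) for xa, g in zip(x_adv, grad)]
        # Projection: clip each coordinate back into [x - eps, x + eps].
        x_adv = [min(max(xa, xo - eps), xo + eps) for xa, xo in zip(x_adv, x)]
    return x_adv
```

With `w = [2.0, -1.0]`, `b = 0.0`, and a correctly classified input `x = [0.3, 0.1]` of class 1, an amplitude of `eps = 0.05` leaves the prediction unchanged, while `eps = 0.3` pushes the predicted probability below 0.5. A real image attack would also clip pixels to the valid intensity range, which is omitted here.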

Video Action Recognition Via Neural Architecture Searching


Title	Video Action Recognition Via Neural Architecture Searching
Authors	Wei Peng, Xiaopeng Hong, Guoying Zhao
Abstract	Deep neural networks have achieved great success for video analysis and understanding. However, designing a high-performance neural architecture requires substantial effort and expertise. In this paper, we make the first attempt to let an algorithm automatically design neural networks for video action recognition tasks. Specifically, a spatio-temporal network is developed in a differentiable space modeled by a directed acyclic graph, so that a gradient-based strategy can be used to search for an optimal architecture. Nonetheless, the search is computationally expensive, since the burden of evaluating each architecture candidate is still heavy. To alleviate this issue, we introduce a temporal segment approach for the video input to reduce the computational cost without losing global video information. For the architecture, we explore an efficient search space by introducing pseudo 3D operators. Experiments show that our architecture outperforms popular neural architectures under the training-from-scratch protocol on the challenging UCF101 dataset, surprisingly with only around one percent of the parameters of its manually designed counterparts.
Tasks	Temporal Action Localization
Published	2019-07-10
URL	https://arxiv.org/abs/1907.04632v1
PDF	https://arxiv.org/pdf/1907.04632v1.pdf
PWC	https://paperswithcode.com/paper/video-action-recognition-via-neural
Repo
Framework
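
The abstract above does not spell out its temporal segment approach. One common way to cut per-video cost while still covering the whole clip is to split the frame range into equal segments and draw one frame from each. The sketch below shows that sampling scheme as an assumption; the function name and parameters are hypothetical and the paper's exact procedure may differ.

```python
import random


def sample_segment_indices(num_frames, num_segments, rng=None):
    """Pick one frame index from each of `num_segments` equal temporal segments.

    Returns a strictly increasing list of indices in [0, num_frames), so the
    sample spans the full video while using only `num_segments` frames.
    """
    if num_segments <= 0 or num_frames < num_segments:
        raise ValueError("need 0 < num_segments <= num_frames")
    rng = rng or random.Random()
    indices = []
    for k in range(num_segments):
        start = (k * num_frames) // num_segments       # inclusive
        end = ((k + 1) * num_frames) // num_segments   # exclusive
        indices.append(rng.randrange(start, end))
    return indices
```

For a 100-frame clip and 4 segments, the call returns one index from each of the ranges 0-24, 25-49, 50-74, and 75-99, so a network sees 4 frames instead of 100.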

A recipe for creating ideal hybrid memristive-CMOS neuromorphic computing systems


Title	A recipe for creating ideal hybrid memristive-CMOS neuromorphic computing systems
Authors	Elisabetta Chicca, Giacomo Indiveri
Abstract	The development of memristive device technologies has reached a level of maturity to enable the design of complex and large-scale hybrid memristive-CMOS neural processing systems. These systems offer promising solutions for implementing novel in-memory computing architectures for machine learning and data analysis problems. We argue that they are also ideal building blocks for integration in neuromorphic electronic circuits suitable for ultra-low power brain-inspired sensory processing systems, thereby leading to innovative solutions for always-on edge-computing and Internet-of-Things (IoT) applications. Here we present a recipe for creating such systems based on design strategies and computing principles inspired by those used in mammalian brains. We enumerate the specifications and properties of memristive devices required to support always-on learning in neuromorphic computing systems and to minimize their power consumption. Finally, we discuss in what cases such neuromorphic systems can complement conventional processing ones and highlight the importance of exploiting the physics of both the memristive devices and of the CMOS circuits interfaced to them.
Tasks
Published	2019-12-11
URL	https://arxiv.org/abs/1912.05637v1
PDF	https://arxiv.org/pdf/1912.05637v1.pdf
PWC	https://paperswithcode.com/paper/a-recipe-for-creating-ideal-hybrid-memristive
Repo
Framework

Human vs Machine Attention in Neural Networks: A Comparative Study


Title	Human vs Machine Attention in Neural Networks: A Comparative Study
Authors	Qiuxia Lai, Wenguan Wang, Salman Khan, Jianbing Shen, Hanqiu Sun, Ling Shao
Abstract	Recent years have witnessed a surge in the popularity of attention mechanisms encoded within deep neural networks. Inspired by the selective attention in the visual cortex, artificial attention is designed to focus a neural network on the most task-relevant input signal. Many works claim that the attention mechanism offers an extra dimension of interpretability by explaining where the neural networks look. However, recent studies demonstrate that artificial attention maps do not always coincide with common intuition. In view of this conflicting evidence, here we make a systematic study on using artificial attention and human attention in neural network design. With three example computer vision tasks (i.e., salient object segmentation, video action recognition, and fine-grained image classification), diverse representative network backbones (i.e., AlexNet, VGGNet, ResNet) and famous architectures (i.e., Two-stream, FCN), corresponding real human gaze data, and systematically conducted large-scale quantitative studies, we offer novel insights into existing artificial attention mechanisms and give preliminary answers to several key questions related to human and artificial attention mechanisms. Our overall results demonstrate that human attention is capable of bench-marking the meaningful “ground-truth” in attention-driven tasks, where the closer the artificial attention is to the human attention, the better the performance; for higher-level vision tasks, it is case-by-case. We believe it would be advisable for attention-driven tasks to explicitly force a better alignment between artificial and human attention to boost performance; such alignment would also help make deep networks more transparent and explainable for higher-level computer vision tasks.
Tasks	Fine-Grained Image Classification, Image Classification, Semantic Segmentation, Temporal Action Localization
Published	2019-06-20
URL	https://arxiv.org/abs/1906.08764v2
PDF	https://arxiv.org/pdf/1906.08764v2.pdf
PWC	https://paperswithcode.com/paper/human-textitvs-machine-attention-in-neural
Repo
Framework

Understanding Adversarial Attacks on Deep Learning Based Medical Image Analysis Systems


Title	Understanding Adversarial Attacks on Deep Learning Based Medical Image Analysis Systems
Authors	Xingjun Ma, Yuhao Niu, Lin Gu, Yisen Wang, Yitian Zhao, James Bailey, Feng Lu
Abstract	Deep neural networks (DNNs) have become popular for medical image analysis tasks like cancer diagnosis and lesion detection. However, a recent study demonstrates that medical deep learning systems can be compromised by carefully-engineered adversarial examples/attacks with small imperceptible perturbations. This raises safety concerns about the deployment of these systems in clinical settings. In this paper, we provide a deeper understanding of adversarial examples in the context of medical images. We find that medical DNN models can be more vulnerable to adversarial attacks compared to models for natural images, according to two different viewpoints. Surprisingly, we also find that medical adversarial attacks can be easily detected, i.e., simple detectors can achieve over 98% detection AUC against state-of-the-art attacks, due to fundamental feature differences compared to normal examples. We believe these findings may be a useful basis to approach the design of more explainable and secure medical deep learning systems.
Tasks
Published	2019-07-24
URL	https://arxiv.org/abs/1907.10456v2
PDF	https://arxiv.org/pdf/1907.10456v2.pdf
PWC	https://paperswithcode.com/paper/understanding-adversarial-attacks-on-deep
Repo
Framework

0-1 phase transitions in sparse spiked matrix estimation


Title	0-1 phase transitions in sparse spiked matrix estimation
Authors	Jean Barbier, Nicolas Macris
Abstract	We consider statistical models of estimation of a rank-one matrix (the spike) corrupted by an additive Gaussian noise matrix in the sparse limit. In this limit the underlying hidden vector (that constructs the rank-one matrix) has a number of non-zero components that scales sub-linearly with the total dimension of the vector, and the signal strength tends to infinity at an appropriate speed. We prove explicit low-dimensional variational formulas for the asymptotic mutual information between the spike and the observed noisy matrix in suitable sparse limits. For Bernoulli and Bernoulli-Rademacher distributed vectors, and when the sparsity and signal strength satisfy an appropriate scaling relation, these formulas imply sharp 0-1 phase transitions for the asymptotic minimum mean-square-error. A similar phase transition was analyzed recently in the context of sparse high-dimensional linear regression (compressive sensing).
Tasks	3D Human Pose Estimation, Aspect-Based Sentiment Analysis, Compressive Sensing, Conditional Image Generation, Data-to-Text Generation, Dependency Parsing, Factual Visual Question Answering, Image Classification, Image Inpainting, Instance Segmentation, Language Modelling, Machine Translation, Multi-Person Pose Estimation, Object Detection, One-Shot Object Detection, Retinal Vessel Segmentation, Scene Text Detection, Semantic Segmentation, Sentence Compression, Traffic Prediction, Unsupervised Domain Adaptation, Video Frame Interpolation, Video Retrieval, Weakly Supervised Object Detection
Published	2019-11-12
URL	https://arxiv.org/abs/1911.05030v1
PDF	https://arxiv.org/pdf/1911.05030v1.pdf
PWC	https://paperswithcode.com/paper/0-1-phase-transitions-in-sparse-spiked-matrix
Repo
Framework

Decomposing multispectral face images into diffuse and specular shading and biophysical parameters


Title	Decomposing multispectral face images into diffuse and specular shading and biophysical parameters
Authors	Sarah Alotaibi, William A. P. Smith
Abstract	We propose a novel biophysical and dichromatic reflectance model that efficiently characterises spectral skin reflectance. We show how to fit the model to multispectral face images enabling high quality estimation of diffuse and specular shading as well as biophysical parameter maps (melanin and haemoglobin). Our method works from a single image without requiring complex controlled lighting setups yet provides quantitatively accurate reconstructions and qualitatively convincing decomposition and editing.
Tasks
Published	2019-02-18
URL	http://arxiv.org/abs/1902.06557v1
PDF	http://arxiv.org/pdf/1902.06557v1.pdf
PWC	https://paperswithcode.com/paper/decomposing-multispectral-face-images-into
Repo
Framework


Blind Audio Source Separation with Minimum-Volume Beta-Divergence NMF


Title	Blind Audio Source Separation with Minimum-Volume Beta-Divergence NMF
Authors	Valentin Leplat, Nicolas Gillis, Man Shun Ang
Abstract	Considering a mixed signal composed of various audio sources and recorded with a single microphone, we consider in this paper the blind audio source separation problem, which consists in isolating and extracting each of the sources. To perform this task, nonnegative matrix factorization (NMF) based on the Kullback-Leibler and Itakura-Saito $\beta$-divergences is a standard and state-of-the-art technique that uses the time-frequency representation of the signal. We present a new NMF model better suited for this task. It is based on the minimization of $\beta$-divergences along with a penalty term that promotes the columns of the dictionary matrix to have a small volume. Under some mild assumptions and in noiseless conditions, we prove that this model is able to identify the sources. In order to solve this problem, we propose multiplicative updates whose derivations are based on the standard majorization-minimization framework. We show on several numerical experiments that our new model is able to obtain more interpretable results than standard NMF models. Moreover, we show that it is able to recover the sources even when the number of sources present in the mixed signal is overestimated. In fact, our model automatically sets sources to zero in this situation, hence performs model order selection automatically.
Tasks
Published	2019-07-04
URL	https://arxiv.org/abs/1907.02404v1
PDF	https://arxiv.org/pdf/1907.02404v1.pdf
PWC	https://paperswithcode.com/paper/blind-audio-source-separation-with-minimum
Repo
Framework
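
For readers unfamiliar with the $\beta$-divergences named in the abstract above, the standard scalar definition is written out below, together with the data-fitting term it induces for NMF. The symbols $V$ (nonnegative time-frequency matrix), $W$ (dictionary), and $H$ (activations) are assumed notation rather than taken from the abstract, and the paper's minimum-volume penalty is not reproduced here.

```latex
% beta-divergence between scalars x > 0 and y > 0
d_\beta(x \mid y) =
\begin{cases}
\dfrac{x}{y} - \log\dfrac{x}{y} - 1, & \beta = 0 \quad \text{(Itakura-Saito)},\\[6pt]
x \log\dfrac{x}{y} - x + y, & \beta = 1 \quad \text{(Kullback-Leibler)},\\[6pt]
\dfrac{x^{\beta} + (\beta - 1)\, y^{\beta} - \beta\, x\, y^{\beta - 1}}{\beta(\beta - 1)}, & \text{otherwise},
\end{cases}
\qquad
D_\beta(V \mid WH) = \sum_{f,t} d_\beta\big(V_{ft} \mid (WH)_{ft}\big).
```

Setting $\beta = 2$ in the general case gives $\tfrac{1}{2}(x - y)^2$, so the squared Euclidean distance is a member of the same family.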

First-Order Preconditioning via Hypergradient Descent


Title	First-Order Preconditioning via Hypergradient Descent
Authors	Ted Moskovitz, Rui Wang, Janice Lan, Sanyam Kapoor, Thomas Miconi, Jason Yosinski, Aditya Rawal
Abstract	Standard gradient descent methods are susceptible to a range of issues that can impede training, such as high correlations and different scaling in parameter space. These difficulties can be addressed by second-order approaches that apply a preconditioning matrix to the gradient to improve convergence. Unfortunately, such algorithms typically struggle to scale to high-dimensional problems, in part because the calculation of specific preconditioners such as the inverse Hessian or Fisher information matrix is highly expensive. We introduce first-order preconditioning (FOP), a fast, scalable approach that generalizes previous work on hypergradient descent (Almeida et al., 1998; Maclaurin et al., 2015; Baydin et al., 2017) to learn a preconditioning matrix that only makes use of first-order information. Experiments show that FOP is able to improve the performance of standard deep learning optimizers on several visual classification tasks with minimal computational overhead. We also investigate the properties of the learned preconditioning matrices and perform a preliminary theoretical analysis of the algorithm.
Tasks
Published	2019-10-18
URL	https://arxiv.org/abs/1910.08461v1
PDF	https://arxiv.org/pdf/1910.08461v1.pdf
PWC	https://paperswithcode.com/paper/first-order-preconditioning-via-hypergradient
Repo
Framework
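
The abstract above builds on hypergradient descent, in which a hyperparameter of the optimizer is itself updated by gradient descent. The sketch below shows only the scalar learning-rate version of that idea (in the style of Baydin et al., 2017), not FOP's learned preconditioning matrix. The toy quadratic objective and the names `alpha` (learning rate) and `beta` (hyper-learning rate) are assumptions for illustration.

```python
CURVATURES = [1.0, 2.0]  # toy objective: f(theta) = 0.5 * sum(a_i * theta_i^2)


def quadratic_loss(theta):
    return 0.5 * sum(a * t * t for a, t in zip(CURVATURES, theta))


def quadratic_grad(theta):
    return [a * t for a, t in zip(CURVATURES, theta)]


def hypergradient_descent(grad_fn, theta, alpha, beta, steps):
    """Gradient descent whose learning rate `alpha` is adapted online.

    The derivative of the current loss with respect to the previous learning
    rate is -(g_t . g_{t-1}), so a descent step on alpha with hyper-learning
    rate `beta` adds beta * (g_t . g_{t-1}).
    """
    prev_grad = None
    for _ in range(steps):
        grad = grad_fn(theta)
        if prev_grad is not None:
            alpha += beta * sum(g * pg for g, pg in zip(grad, prev_grad))
        theta = [t - alpha * g for t, g in zip(theta, grad)]
        prev_grad = grad
    return theta, alpha
```

Starting from `theta = [1.0, 1.0]` with a deliberately small `alpha = 0.01` and `beta = 0.01`, consecutive gradients point the same way, so the learning rate grows during the first iterations and then levels off as the gradients shrink. Replacing the scalar `alpha` with a learned matrix applied to the gradient is the step that FOP takes, and it is not shown here.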

Combating the Filter Bubble: Designing for Serendipity in a University Course Recommendation System


Title	Combating the Filter Bubble: Designing for Serendipity in a University Course Recommendation System
Authors	Zachary A. Pardos, Weijie Jiang
Abstract	Collaborative filtering based algorithms, including Recurrent Neural Networks (RNN), tend towards predicting a perpetuation of past observed behavior. In a recommendation context, this can lead to an overly narrow set of suggestions lacking in serendipity and inadvertently placing the user in what is known as a “filter bubble.” In this paper, we grapple with the issue of the filter bubble in the context of a course recommendation system in production at a public university. Most universities in the United States encourage students to explore developing interests while simultaneously advising them to adhere to course taking norms which progress them towards graduation. These competing objectives, and the stakes involved for students, make this context a particularly meaningful one for investigating real-world recommendation strategies. We introduce a novel modification to the skip-gram model applied to nine years of historic course enrollment sequences to learn course vector representations used to diversify recommendations based on similarity to a student’s specified favorite course. This model, which we call multifactor2vec, is intended to improve the semantics of the primary token embedding by also learning embeddings of potentially conflated factors of the token (e.g., instructor). Our offline testing found this model improved accuracy and recall on our course similarity and analogy validation sets over a standard skip-gram. Incorporating course catalog description text resulted in further improvements. We compare the performance of these models to the system’s existing RNN-based recommendations with a user study of undergraduates (N = 70) rating six characteristics of their course recommendations. Results of the user study show a dramatic lack of novelty in RNN recommendations and depict the characteristic trade-offs that make serendipity difficult to achieve.
Tasks
Published	2019-07-02
URL	https://arxiv.org/abs/1907.01591v1
PDF	https://arxiv.org/pdf/1907.01591v1.pdf
PWC	https://paperswithcode.com/paper/combating-the-filter-bubble-designing-for
Repo
Framework

Privacy-preserving Crowd-guided AI Decision-making in Ethical Dilemmas


Title	Privacy-preserving Crowd-guided AI Decision-making in Ethical Dilemmas
Authors	Teng Wang, Jun Zhao, Han Yu, Jinyan Liu, Xinyu Yang, Xuebin Ren, Shuyu Shi
Abstract	With the rapid development of artificial intelligence (AI), ethical issues surrounding AI have attracted increasing attention. In particular, autonomous vehicles may face moral dilemmas in accident scenarios, such as staying the course resulting in hurting pedestrians or swerving leading to hurting passengers. To investigate such ethical dilemmas, recent studies have adopted preference aggregation, in which each voter expresses her/his preferences over decisions for the possible ethical dilemma scenarios, and a centralized system aggregates these preferences to obtain the winning decision. Although a useful methodology for building ethical AI systems, such an approach can potentially violate the privacy of voters since moral preferences are sensitive information and their disclosure can be exploited by malicious parties. In this paper, we report a first-of-its-kind privacy-preserving crowd-guided AI decision-making approach in ethical dilemmas. We adopt the notion of differential privacy to quantify privacy and consider four granularities of privacy protection by taking voter-/record-level privacy protection and centralized/distributed perturbation into account, resulting in four approaches VLCP, RLCP, VLDP, and RLDP. Moreover, we propose different algorithms to achieve these privacy protection granularities, while retaining the accuracy of the learned moral preference model. Specifically, VLCP and RLCP are implemented with the data aggregator setting a universal privacy parameter and perturbing the averaged moral preference to protect the privacy of voters’ data. VLDP and RLDP are implemented in such a way that each voter perturbs her/his local moral preference with a personalized privacy parameter. Extensive experiments on both synthetic and real data demonstrate that the proposed approach can achieve high accuracy of preference aggregation while protecting individual voter’s privacy.
Tasks	Autonomous Vehicles, Decision Making
Published	2019-06-04
URL	https://arxiv.org/abs/1906.01562v2
PDF	https://arxiv.org/pdf/1906.01562v2.pdf
PWC	https://paperswithcode.com/paper/privacy-preserving-crowd-guided-ai-decision
Repo
Framework
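
The abstract above distinguishes centralized perturbation (the aggregator adds noise to the averaged preference) from distributed perturbation (each voter adds noise locally with a personal privacy parameter). The sketch below contrasts the two using the generic Laplace mechanism. It is not the paper's VLCP/RLCP/VLDP/RLDP algorithms: the function names, the assumption that every preference coordinate lies in `[lo, hi]`, and the resulting sensitivity bounds are illustrative choices.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def mean_preference(prefs):
    """Coordinate-wise average of the voters' preference vectors."""
    n = len(prefs)
    return [sum(col) / n for col in zip(*prefs)]


def centralized_private_mean(prefs, epsilon, rng, lo=0.0, hi=1.0):
    """Aggregator averages the raw vectors, then perturbs the average once.

    Replacing one voter's whole vector moves the average by at most
    d * (hi - lo) / n in L1 norm, which sets the Laplace scale.
    """
    n, d = len(prefs), len(prefs[0])
    scale = d * (hi - lo) / n / epsilon
    return [m + laplace_noise(scale, rng) for m in mean_preference(prefs)]


def local_private_mean(prefs, epsilons, rng, lo=0.0, hi=1.0):
    """Each voter perturbs their own vector with a personal epsilon first."""
    d = len(prefs[0])
    noisy = [
        [v + laplace_noise(d * (hi - lo) / eps, rng) for v in p]
        for p, eps in zip(prefs, epsilons)
    ]
    return mean_preference(noisy)
```

Under these assumptions, the per-coordinate noise in the centralized estimate is n times smaller in scale than the noise each voter adds locally for the same epsilon, which is the usual accuracy cost of not trusting the aggregator with raw preferences.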