Paper Group ANR 504
Captioning Images Taken by People Who Are Blind
Title | Captioning Images Taken by People Who Are Blind |
Authors | Danna Gurari, Yinan Zhao, Meng Zhang, Nilavra Bhattacharya |
Abstract | While an important problem in the vision community is to design algorithms that can automatically caption images, few publicly available datasets for algorithm development directly address the interests of real users. Observing that people who are blind have relied on (human-based) image captioning services to learn about images they take for nearly a decade, we introduce the first image captioning dataset to represent this real use case. This new dataset, which we call VizWiz-Captions, consists of over 39,000 images originating from people who are blind, each paired with five captions. We analyze this dataset to (1) characterize the typical captions, (2) characterize the diversity of content found in the images, and (3) compare its content to that found in eight popular vision datasets. We also analyze modern image captioning algorithms to identify what makes this new dataset challenging for the vision community. We publicly share the dataset with captioning challenge instructions at https://vizwiz.org |
Tasks | Image Captioning |
Published | 2020-02-20 |
URL | https://arxiv.org/abs/2002.08565v1 |
https://arxiv.org/pdf/2002.08565v1.pdf | |
PWC | https://paperswithcode.com/paper/captioning-images-taken-by-people-who-are |
Repo | |
Framework | |
Exploration of Surgeons’ Natural Skills for Robotic Catheterization
Title | Exploration of Surgeons’ Natural Skills for Robotic Catheterization |
Authors | Olatunji Mumini Omisore, Wenjing Du, Tao Zhou, Shipeng Han, Kamen Ivanov, Yousef Al-Handarish, Lei Wang |
Abstract | Although robotic catheter systems have recently emerged as a safe way of performing cardiovascular interventions, a number of important challenges remain to be investigated. One of them is the exploration of surgeons’ natural skills during vascular catheterization with robotic systems. In this study, surgeons’ natural hand motions were investigated to identify four basic movements used for intravascular catheterization. A controlled experiment was set up to acquire surface electromyography (sEMG) signals from six muscles that are innervated when a subject with catheterization skills performs the four movements in an open setting. k-means and k-NN models were implemented over average EMG and root mean square features to uniquely identify the movements. The results show the great potential of sEMG analysis for designing intelligent cyborg control for safe and efficient robotic catheterization. (A minimal feature-extraction and k-NN sketch follows this table.) |
Tasks | |
Published | 2020-03-06 |
URL | https://arxiv.org/abs/2003.04291v1 |
https://arxiv.org/pdf/2003.04291v1.pdf | |
PWC | https://paperswithcode.com/paper/exploration-of-surgeons-natural-skills-for |
Repo | |
Framework | |
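The pipeline described in the abstract above (average-EMG and root-mean-square features fed to a k-NN classifier over the four movements) can be sketched as follows. This is a minimal illustration assuming NumPy/scikit-learn, with hypothetical `windows`/`labels` inputs and an arbitrary k = 5; it is not the authors' code.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def emg_features(window):
    """window: (n_samples, n_channels) sEMG segment -> per-channel MAV and RMS features."""
    mav = np.mean(np.abs(window), axis=0)        # average (rectified) EMG
    rms = np.sqrt(np.mean(window ** 2, axis=0))  # root mean square
    return np.concatenate([mav, rms])

def fit_movement_classifier(windows, labels, k=5):
    """windows: list of sEMG segments; labels: one of the four catheterization movements."""
    X = np.vstack([emg_features(w) for w in windows])
    return KNeighborsClassifier(n_neighbors=k).fit(X, labels)
```

The same feature matrix could also be clustered with k-means, as the abstract mentions, to check how well the four movements separate without labels.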
A machine-learning software-systems approach to capture social, regulatory, governance, and climate problems
Title | A machine-learning software-systems approach to capture social, regulatory, governance, and climate problems |
Authors | Christopher A. Tucker |
Abstract | This paper will discuss the role of an artificially-intelligent computer system as critique-based, implicit-organizational, and an inherently necessary device, deployed in synchrony with parallel governmental policy, as a genuine means of capturing nation-population complexity in quantitative form, public contentment in societal-cooperative economic groups, regulatory proposition, and governance-effectiveness domains. It will discuss a solution involving a well-known algorithm and proffer an improved mechanism for knowledge-representation, thereby increasing range of utility, scope of influence (in terms of differentiating class sectors) and operational efficiency. It will finish with a discussion of these and other historical implications. |
Tasks | |
Published | 2020-02-23 |
URL | https://arxiv.org/abs/2002.11485v1 |
https://arxiv.org/pdf/2002.11485v1.pdf | |
PWC | https://paperswithcode.com/paper/a-machine-learning-software-systems-approach |
Repo | |
Framework | |
Efficient Domain Generalization via Common-Specific Low-Rank Decomposition
Title | Efficient Domain Generalization via Common-Specific Low-Rank Decomposition |
Authors | Vihari Piratla, Praneeth Netrapalli, Sunita Sarawagi |
Abstract | Domain generalization refers to the task of training a model which generalizes to new domains that are not seen during training. We present CSD (Common Specific Decomposition) for this setting, which jointly learns a common component (which generalizes to new domains) and a domain-specific component (which overfits on training domains). The domain-specific components are discarded after training and only the common component is retained. The algorithm is extremely simple and involves only modifying the final linear classification layer of any given neural network architecture. We present a principled analysis to understand existing approaches, provide identifiability results for CSD, and study the effect of low rank on domain generalization. We show that CSD either matches or beats state-of-the-art approaches for domain generalization based on domain erasure, domain-perturbed data augmentation, and meta-learning. Further diagnostics on rotated MNIST, where domains are interpretable, confirm the hypothesis that CSD successfully disentangles common and domain-specific components and hence leads to better domain generalization. (A minimal sketch of the decomposed classification head follows this table.) |
Tasks | Data Augmentation, Domain Generalization, Meta-Learning |
Published | 2020-03-28 |
URL | https://arxiv.org/abs/2003.12815v1 |
https://arxiv.org/pdf/2003.12815v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-domain-generalization-via-common |
Repo | |
Framework | |
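As a rough illustration of the decomposed head described in the abstract above: each training domain's classifier weights are a shared component plus a low-rank, domain-specific combination, and only the shared component is kept at test time. This is a hedged PyTorch sketch of the idea, not the authors' implementation; their full method includes additional regularization and analysis not shown here.

```python
import torch
import torch.nn as nn

class CSDStyleHead(nn.Module):
    """Common-specific classification head (sketch).

    For training domain d the effective weights are
        W_d = W_common + sum_k gamma[d, k] * W_spec[k],
    i.e. a shared component plus a low-rank domain-specific part.
    Only W_common is used when predicting on unseen domains.
    """
    def __init__(self, feat_dim, n_classes, n_domains, rank=1):
        super().__init__()
        self.w_common = nn.Parameter(0.01 * torch.randn(n_classes, feat_dim))
        self.w_spec = nn.Parameter(0.01 * torch.randn(rank, n_classes, feat_dim))
        self.gamma = nn.Parameter(torch.zeros(n_domains, rank))
        self.bias = nn.Parameter(torch.zeros(n_classes))

    def forward(self, feats, domain_ids=None):
        if domain_ids is None:                       # inference on unseen domains
            return feats @ self.w_common.t() + self.bias
        # per-example weights: W_common + gamma_d . W_spec
        w_d = self.w_common + torch.einsum('br,rcf->bcf', self.gamma[domain_ids], self.w_spec)
        return torch.einsum('bf,bcf->bc', feats, w_d) + self.bias
```

At test time `forward(feats)` uses only `w_common`, which is the point of discarding the domain-specific components after training.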
Rigidity Properties of the Blum Medial Axis
Title | Rigidity Properties of the Blum Medial Axis |
Authors | James Damon |
Abstract | We consider the Blum medial axis of a region in $\mathbb R^n$ with piecewise smooth boundary and examine its “rigidity properties”, by which we mean properties preserved under diffeomorphisms of the regions preserving the medial axis. There are several possible versions of rigidity depending on what features of the Blum medial axis we wish to retain. We use a form of the cross ratio from projective geometry to show that in the case of four smooth sheets of the medial axis meeting along a branching submanifold, the cross ratio defines a function on the branching sheet which must be preserved under any diffeomorphism of the medial axis with another. Second, we show that, in the generic case, along a Y-branching submanifold there are three cross ratios involving the three limiting tangent planes of the three smooth sheets and each of the hyperplanes defined by one of the radial lines and the tangent space to the Y-branching submanifold at the point, which again must be preserved. Moreover, the triple of cross ratios then locally uniquely determines the angles between the smooth sheets. Third, we observe that for a diffeomorphism of the region preserving the Blum medial axis and the infinitesimal directions of the radial lines, the second derivative of the diffeomorphism at points of the medial axis must satisfy a condition relating the radial shape operators and hence the differential geometry of the boundaries at corresponding boundary points. (The classical cross ratio is recalled after this table.) |
Tasks | |
Published | 2020-02-01 |
URL | https://arxiv.org/abs/2002.00241v1 |
https://arxiv.org/pdf/2002.00241v1.pdf | |
PWC | https://paperswithcode.com/paper/rigidity-properties-of-the-blum-medial-axis |
Repo | |
Framework | |
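For reference, the classical cross ratio from projective geometry, of which the abstract uses a form adapted to sheets of the medial axis meeting along a branching submanifold, is the quantity $(a, b; c, d) = \dfrac{(a - c)\,(b - d)}{(b - c)\,(a - d)}$ for four scalars or collinear points. It is invariant under fractional linear (projective) transformations, which is what makes it a natural candidate for a quantity preserved under the diffeomorphisms considered in the paper.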
Distill, Adapt, Distill: Training Small, In-Domain Models for Neural Machine Translation
Title | Distill, Adapt, Distill: Training Small, In-Domain Models for Neural Machine Translation |
Authors | Mitchell A. Gordon, Kevin Duh |
Abstract | We explore best practices for training small, memory-efficient machine translation models with sequence-level knowledge distillation in the domain adaptation setting. While both domain adaptation and knowledge distillation are widely used, their interaction remains little understood. Our large-scale empirical results in machine translation (on three language pairs with three domains each) suggest distilling twice for best performance: once using general-domain data and again using in-domain data with an adapted teacher. (A sketch of this two-step recipe follows this table.) |
Tasks | Domain Adaptation, Machine Translation |
Published | 2020-03-05 |
URL | https://arxiv.org/abs/2003.02877v1 |
https://arxiv.org/pdf/2003.02877v1.pdf | |
PWC | https://paperswithcode.com/paper/distill-adapt-distill-training-small-in |
Repo | |
Framework | |
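The "distill twice" recipe suggested by the abstract can be written down schematically. The callables below (`train_nmt`, `finetune`, `translate`, `pair`) are hypothetical placeholders for whatever seq2seq toolkit is in use; the sketch only fixes the order of operations, under the assumption that sequence-level distillation means training the student on the teacher's decoded translations.

```python
def distill_adapt_distill(train_nmt, finetune, translate, pair,
                          general_data, in_domain_data, big_cfg, small_cfg):
    """Schematic pipeline; all four callables are hypothetical stand-ins for a real
    NMT toolkit's training / decoding entry points."""
    # 1) Train a large general-domain teacher.
    teacher = train_nmt(general_data, big_cfg)

    # 2) Sequence-level KD: decode general-domain sources with the teacher and
    #    train the small student on (source, teacher translation) pairs.
    pseudo_general = pair(general_data.sources, translate(teacher, general_data.sources))
    student = train_nmt(pseudo_general, small_cfg)

    # 3) Adapt the teacher to the target domain on in-domain parallel data.
    teacher = finetune(teacher, in_domain_data)

    # 4) Distill a second time from the adapted teacher, now on in-domain sources.
    pseudo_in_domain = pair(in_domain_data.sources, translate(teacher, in_domain_data.sources))
    student = finetune(student, pseudo_in_domain)
    return student
```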
Toward Making the Most of Context in Neural Machine Translation
Title | Toward Making the Most of Context in Neural Machine Translation |
Authors | Zaixiang Zheng, Xiang Yue, Shujian Huang, Jiajun Chen, Alexandra Birch |
Abstract | Document-level machine translation manages to outperform sentence-level models by a small margin, but has failed to be widely adopted. We argue that previous research did not make clear use of the global context, and propose a new document-level NMT framework that deliberately models the local context of each sentence with awareness of the global context of the document in both source and target languages. We specifically design the model to be able to deal with documents containing any number of sentences, including single sentences. This unified approach allows our model to be trained elegantly on standard datasets without needing to train on sentence-level and document-level data separately. Experimental results demonstrate that our model outperforms Transformer baselines and previous document-level NMT models by substantial margins of up to 2.1 BLEU over state-of-the-art baselines. We also provide analyses which show the benefit of context far beyond the neighboring two or three sentences that previous studies have typically incorporated. |
Tasks | Machine Translation |
Published | 2020-02-19 |
URL | https://arxiv.org/abs/2002.07982v1 |
https://arxiv.org/pdf/2002.07982v1.pdf | |
PWC | https://paperswithcode.com/paper/toward-making-the-most-of-context-in-neural |
Repo | |
Framework | |
Can Deep Learning Recognize Subtle Human Activities?
Title | Can Deep Learning Recognize Subtle Human Activities? |
Authors | Vincent Jacquot, Zhuofan Ying, Gabriel Kreiman |
Abstract | Deep Learning has driven recent and exciting progress in computer vision, instilling the belief that these algorithms could solve any visual task. Yet, datasets commonly used to train and test computer vision algorithms have pervasive confounding factors. Such biases make it difficult to truly estimate the performance of those algorithms and how well computer vision models can extrapolate outside the distribution in which they were trained. In this work, we propose a new action classification challenge that is performed well by humans, but poorly by state-of-the-art Deep Learning models. As a proof-of-principle, we consider three exemplary tasks: drinking, reading, and sitting. The best accuracies reached using state-of-the-art computer vision models were 61.7%, 62.8%, and 76.8%, respectively, while human participants scored above 90% accuracy on the three tasks. We propose a rigorous method to reduce confounds when creating datasets, and when comparing human versus computer vision performance. Source code and datasets are publicly available. |
Tasks | Action Classification |
Published | 2020-03-30 |
URL | https://arxiv.org/abs/2003.13852v1 |
https://arxiv.org/pdf/2003.13852v1.pdf | |
PWC | https://paperswithcode.com/paper/can-deep-learning-recognize-subtle-human |
Repo | |
Framework | |
M-estimators of scatter with eigenvalue shrinkage
Title | M-estimators of scatter with eigenvalue shrinkage |
Authors | Esa Ollila, Daniel P. Palomar, Frederic Pascal |
Abstract | A popular regularized (shrinkage) covariance estimator is the shrinkage sample covariance matrix (SCM), which shares the same set of eigenvectors as the SCM but shrinks its eigenvalues toward their grand mean. In this paper, a more general approach is considered in which the SCM is replaced by an M-estimator of scatter matrix, and a fully automatic, data-adaptive method to compute the optimal shrinkage parameter with minimum mean squared error is proposed. Our approach permits the use of any weight function such as Gaussian, Huber’s, or $t$ weight functions, all of which are commonly used in the M-estimation framework. Our simulation examples illustrate that shrinkage M-estimators based on the proposed optimal tuning combined with a robust weight function do not lose performance relative to the shrinkage SCM estimator when the data is Gaussian, but provide significantly improved performance when the data is sampled from a heavy-tailed distribution. (A minimal shrinkage-SCM sketch follows this table.) |
Tasks | |
Published | 2020-02-12 |
URL | https://arxiv.org/abs/2002.04996v1 |
https://arxiv.org/pdf/2002.04996v1.pdf | |
PWC | https://paperswithcode.com/paper/m-estimators-of-scatter-with-eigenvalue |
Repo | |
Framework | |
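For concreteness, the baseline shrinkage SCM that the abstract refers to (same eigenvectors as the SCM, eigenvalues pulled toward their grand mean) can be written in a few lines of NumPy. The paper's contribution, replacing the SCM with an M-estimator of scatter and tuning the shrinkage parameter automatically, is not reproduced here.

```python
import numpy as np

def shrinkage_scm(X, beta):
    """Shrinkage sample covariance matrix.

    X: (n, p) data matrix; beta in [0, 1] is the shrinkage amount.
    The result keeps the SCM's eigenvectors while each eigenvalue lambda_i
    becomes (1 - beta) * lambda_i + beta * mean(lambda).
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    scm = Xc.T @ Xc / n
    grand_mean = np.trace(scm) / p          # average eigenvalue of the SCM
    return (1.0 - beta) * scm + beta * grand_mean * np.eye(p)
```

A quick check of the stated property: the eigenvalues of `shrinkage_scm(X, b)` equal `(1 - b) * lam + b * lam.mean()` for the SCM eigenvalues `lam`, since adding a multiple of the identity leaves the eigenvectors unchanged.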
MINT: Deep Network Compression via Mutual Information-based Neuron Trimming
Title | MINT: Deep Network Compression via Mutual Information-based Neuron Trimming |
Authors | Madan Ravi Ganesh, Jason J. Corso, Salimeh Yasaei Sekeh |
Abstract | Most approaches to deep neural network compression via pruning either evaluate a filter’s importance using its weights or optimize an alternative objective function with sparsity constraints. While these methods offer a useful way to approximate contributions from similar filters, they often either ignore the dependency between layers or solve a more difficult optimization objective than standard cross-entropy. Our method, Mutual Information-based Neuron Trimming (MINT), approaches deep compression via pruning by enforcing sparsity based on the strength of the relationship between filters of adjacent layers, across every pair of layers. The relationship is calculated using conditional geometric mutual information, which evaluates the amount of similar information exchanged between the filters using a graph-based criterion. When pruning a network, we ensure that retained filters contribute the majority of the information towards succeeding layers, which maintains high performance. Our novel approach outperforms existing state-of-the-art compression-via-pruning methods on the standard benchmarks for this task: MNIST, CIFAR-10, and ILSVRC2012, across a variety of network architectures. In addition, we discuss our observations of a common denominator between our pruning methodology’s response to adversarial attacks and calibration statistics when compared to the original network. (A toy filter-ranking sketch follows this table.) |
Tasks | Calibration, Neural Network Compression |
Published | 2020-03-18 |
URL | https://arxiv.org/abs/2003.08472v1 |
https://arxiv.org/pdf/2003.08472v1.pdf | |
PWC | https://paperswithcode.com/paper/mint-deep-network-compression-via-mutual |
Repo | |
Framework | |
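The "keep the filters that carry the most information into the next layer" step can be illustrated with a toy ranking. Note the stand-in: MINT scores filter pairs with a graph-based estimate of conditional geometric mutual information, whereas the sketch below uses plain absolute correlation between activations, purely to show the shape of the computation.

```python
import numpy as np

def rank_filters(acts_prev, acts_next, keep_ratio=0.5):
    """Toy filter-importance ranking between two adjacent layers.

    acts_prev: (n_samples, n_prev) activations of the earlier layer's filters.
    acts_next: (n_samples, n_next) activations of the later layer's filters.
    Absolute correlation is used as a crude stand-in for MINT's conditional
    geometric mutual information; returns indices of filters to retain.
    """
    n_prev = acts_prev.shape[1]
    corr = np.corrcoef(acts_prev.T, acts_next.T)[:n_prev, n_prev:]   # (n_prev, n_next)
    importance = np.abs(corr).sum(axis=1)    # how strongly each filter "feeds" the next layer
    n_keep = max(1, int(keep_ratio * n_prev))
    return np.argsort(importance)[::-1][:n_keep]
```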
Spatiotemporal Learning of Multivehicle Interaction Patterns in Lane-Change Scenarios
Title | Spatiotemporal Learning of Multivehicle Interaction Patterns in Lane-Change Scenarios |
Authors | Chengyuan Zhang, Jiacheng Zhu, Wenshuo Wang, Junqiang Xi |
Abstract | Interpretation of common-yet-challenging interaction scenarios can benefit well-founded decisions for autonomous vehicles. Previous research achieved this using prior knowledge of specific scenarios with predefined models, which limits adaptive capability. This paper describes a Bayesian nonparametric approach that leverages continuous (i.e., Gaussian processes) and discrete (i.e., Dirichlet processes) stochastic processes to reveal underlying interaction patterns of the ego vehicle with other nearby vehicles. Our model relaxes the dependency on the number of surrounding vehicles by developing an acceleration-sensitive velocity field based on Gaussian processes. The experimental results demonstrate that the velocity field can represent the spatial interactions between the ego vehicle and its surroundings. Then, a discrete Bayesian nonparametric model, integrating Dirichlet processes and hidden Markov models, is developed to learn the interaction patterns over the temporal space by automatically segmenting and clustering the sequential interaction data into interpretable granular patterns. We then evaluate our approach on highway lane-change scenarios using the highD dataset, which was collected in real-world settings. Results demonstrate that our proposed Bayesian nonparametric approach provides insight into the complicated lane-change interactions of the ego vehicle with multiple surrounding traffic participants based on the interpretable interaction patterns and their transition properties in temporal relationships. Our proposed approach sheds light on efficiently analyzing other kinds of multi-agent interactions, such as vehicle-pedestrian interactions. (A minimal velocity-field sketch follows this table.) |
Tasks | Autonomous Vehicles, Gaussian Processes |
Published | 2020-03-02 |
URL | https://arxiv.org/abs/2003.00759v1 |
https://arxiv.org/pdf/2003.00759v1.pdf | |
PWC | https://paperswithcode.com/paper/spatiotemporal-learning-of-multivehicle |
Repo | |
Framework | |
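The continuous half of the model, a velocity field over relative positions learned with Gaussian processes, can be approximated with a generic GP regressor; the acceleration-sensitive construction and the Dirichlet-process HMM segmentation described in the abstract are omitted. The kernel choice and length scales below are arbitrary assumptions, not values from the paper.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def fit_velocity_field(positions, velocities):
    """Fit a GP mapping relative position (x, y) -> velocity (vx, vy).

    positions: (n, 2), velocities: (n, 2) observations of surrounding vehicles.
    Illustrative only: the paper's field is acceleration-sensitive and is coupled
    with a DP-HMM over time, neither of which is modeled here.
    """
    kernel = RBF(length_scale=5.0) + WhiteKernel(noise_level=0.1)
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    return gp.fit(positions, velocities)

# Example query of the learned field on a grid around the ego vehicle:
# xs, ys = np.meshgrid(np.linspace(-30, 30, 20), np.linspace(-5, 5, 10))
# field = fit_velocity_field(pos_obs, vel_obs).predict(np.c_[xs.ravel(), ys.ravel()])
```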
Mixed Strategies for Robust Optimization of Unknown Objectives
Title | Mixed Strategies for Robust Optimization of Unknown Objectives |
Authors | Pier Giuseppe Sessa, Ilija Bogunovic, Maryam Kamgarpour, Andreas Krause |
Abstract | We consider robust optimization problems, where the goal is to optimize an unknown objective function against the worst-case realization of an uncertain parameter. For this setting, we design a novel sample-efficient algorithm GP-MRO, which sequentially learns about the unknown objective from noisy point evaluations. GP-MRO seeks to discover a robust, randomized mixed strategy that maximizes the worst-case expected objective value. To achieve this, it combines techniques from online learning with nonparametric confidence bounds from Gaussian processes. Our theoretical results characterize the number of samples required by GP-MRO to discover a robust near-optimal mixed strategy for different GP kernels of interest. We experimentally demonstrate the performance of our algorithm on synthetic datasets and on human-assisted trajectory planning tasks for autonomous vehicles. In our simulations, we show that robust deterministic strategies can be overly conservative, while the mixed strategies found by GP-MRO significantly improve the overall performance. (A small mixed-strategy sketch follows this table.) |
Tasks | Autonomous Vehicles, Gaussian Processes |
Published | 2020-02-28 |
URL | https://arxiv.org/abs/2002.12613v2 |
https://arxiv.org/pdf/2002.12613v2.pdf | |
PWC | https://paperswithcode.com/paper/mixed-strategies-for-robust-optimization-of |
Repo | |
Framework | |
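The core object GP-MRO searches for, a mixed (randomized) strategy that maximizes the worst-case expected objective, can be illustrated on a fixed payoff matrix with a small linear program. GP-MRO itself estimates the payoffs from GP confidence bounds and uses online learning rather than an LP, so this sketch only shows why randomization can beat every deterministic choice in the worst case.

```python
import numpy as np
from scipy.optimize import linprog

def worst_case_mixed_strategy(payoff):
    """Mixed strategy over candidate decisions maximizing worst-case expected payoff.

    payoff[i, j]: (estimated) objective value of decision x_i under uncertain
    parameter delta_j. Returns (probabilities over decisions, worst-case value).
    """
    n, m = payoff.shape
    # Variables: p_1..p_n (probabilities) and t (worst-case value); maximize t.
    c = np.r_[np.zeros(n), -1.0]
    A_ub = np.c_[-payoff.T, np.ones(m)]          # t - p^T payoff[:, j] <= 0 for every j
    b_ub = np.zeros(m)
    A_eq = np.r_[np.ones(n), 0.0][None, :]       # probabilities sum to one
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * n + [(None, None)])
    return res.x[:n], -res.fun
```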
Gaussian Graphical Model exploration and selection in high dimension low sample size setting
Title | Gaussian Graphical Model exploration and selection in high dimension low sample size setting |
Authors | Thomas Lartigue, Simona Bottani, Stephanie Baron, Olivier Colliot, Stanley Durrleman, Stéphanie Allassonnière |
Abstract | Gaussian Graphical Models (GGM) are often used to describe the conditional correlations between the components of a random vector. In this article, we compare two families of GGM inference methods: nodewise edge selection and penalised likelihood maximisation. We demonstrate on synthetic data that, when the sample size is small, the two methods produce graphs with either too few or too many edges compared to the true one. As a result, we propose a composite procedure that explores a family of graphs with a nodewise numerical scheme and selects a candidate among them with an overall likelihood criterion. We demonstrate that, when the number of observations is small, this selection method yields graphs closer to the truth, corresponding to distributions with smaller KL divergence from the true distribution than those obtained by the other two approaches. Finally, we show the interest of our algorithm on two concrete cases: first on brain imaging data, then on biological nephrology data. In both cases our results are more in line with current knowledge in each field. (A minimal nodewise edge-selection sketch follows this table.) |
Tasks | |
Published | 2020-03-11 |
URL | https://arxiv.org/abs/2003.05169v1 |
https://arxiv.org/pdf/2003.05169v1.pdf | |
PWC | https://paperswithcode.com/paper/gaussian-graphical-model-exploration-and |
Repo | |
Framework | |
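One member of the nodewise family discussed in the abstract, Meinshausen–Bühlmann-style neighborhood selection, is easy to sketch with scikit-learn. The authors' composite procedure additionally scores the candidate graphs with an overall likelihood criterion, which is not shown here, and the penalty value below is an arbitrary assumption.

```python
import numpy as np
from sklearn.linear_model import Lasso

def nodewise_edges(X, alpha=0.1):
    """Nodewise (neighborhood) edge selection for a Gaussian graphical model.

    X: (n, p) data matrix. Each variable is lasso-regressed on all the others;
    an edge is kept if either regression selects it (OR rule). Returns a
    symmetric boolean adjacency matrix.
    """
    n, p = X.shape
    adj = np.zeros((p, p), dtype=bool)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        coef = Lasso(alpha=alpha).fit(X[:, others], X[:, j]).coef_
        adj[j, others] = coef != 0
    return adj | adj.T
```

Sweeping `alpha` over a grid produces the family of candidate graphs that a selection criterion would then choose among.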
Infinitely Wide Graph Convolutional Networks: Semi-supervised Learning via Gaussian Processes
Title | Infinitely Wide Graph Convolutional Networks: Semi-supervised Learning via Gaussian Processes |
Authors | Jilin Hu, Jianbing Shen, Bin Yang, Ling Shao |
Abstract | Graph convolutional neural networks (GCNs) have recently demonstrated promising results on graph-based semi-supervised classification, but little work has been done to explore their theoretical properties. Recently, several deep neural networks, e.g., fully connected and convolutional neural networks, with infinite hidden units have been proved to be equivalent to Gaussian processes (GPs). To exploit both the powerful representational capacity of GCNs and the great expressive power of GPs, we investigate similar properties of infinitely wide GCNs. More specifically, we propose a GP regression model via GCNs (GPGC) for graph-based semi-supervised learning. In the process, we formulate the kernel matrix computation of GPGC in an iterative analytical form. Finally, we derive a conditional distribution for the labels of unobserved nodes based on the graph structure, labels for the observed nodes, and the feature matrix of all the nodes. We conduct extensive experiments to evaluate the semi-supervised classification performance of GPGC and demonstrate that it outperforms other state-of-the-art methods by a clear margin on all the datasets while being efficient. (A simplified GP-on-graph sketch follows this table.) |
Tasks | Gaussian Processes |
Published | 2020-02-26 |
URL | https://arxiv.org/abs/2002.12168v1 |
https://arxiv.org/pdf/2002.12168v1.pdf | |
PWC | https://paperswithcode.com/paper/infinitely-wide-graph-convolutional-networks |
Repo | |
Framework | |
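A drastically simplified picture of GP prediction on a graph: with linear activations, propagating node features through a normalized adjacency and taking inner products gives a GCN-flavored kernel, and the usual GP posterior mean predicts labels of unobserved nodes from observed ones. The paper's GPGC derives its kernel iteratively for the actual GCN nonlinearity, so the sketch below is a simplified stand-in, not the proposed model.

```python
import numpy as np

def gp_on_graph_predict(A, X, y_obs, obs_idx, test_idx, n_layers=2, noise=1e-2):
    """Toy GP posterior mean on a graph with a linear-activation GCN-style kernel.

    A: (n, n) adjacency, X: (n, d) node features, y_obs: labels (or one-hot rows)
    for the observed nodes indexed by obs_idx; predictions returned for test_idx.
    """
    # Symmetric normalization: A_hat = D^{-1/2} (A + I) D^{-1/2}
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    A_hat = A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    H = X.copy()
    for _ in range(n_layers):          # feature propagation
        H = A_hat @ H
    K = H @ H.T                        # kernel over nodes

    K_oo = K[np.ix_(obs_idx, obs_idx)] + noise * np.eye(len(obs_idx))
    K_to = K[np.ix_(test_idx, obs_idx)]
    return K_to @ np.linalg.solve(K_oo, y_obs)   # GP posterior mean at test nodes
```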
Learn Task First or Learn Human Partner First? Deep Reinforcement Learning of Human-Robot Cooperation in Asymmetric Hierarchical Dynamic Task
Title | Learn Task First or Learn Human Partner First? Deep Reinforcement Learning of Human-Robot Cooperation in Asymmetric Hierarchical Dynamic Task |
Authors | Lingfeng Tao, Michael Bowman, Jiucai Zhang, Xiaoli Zhang |
Abstract | The deep reinforcement learning approach to human-robot cooperation (HRC) is promising for its high performance when robots learn complex tasks. However, the applicability of such an approach in a real-world context is limited by long training times, the additional training difficulty caused by inconsistent human performance, and the inherent instability of policy exploration. With this approach, the robot has two dynamics to learn: how to accomplish the given physical task and how to cooperate with the human partner. Furthermore, the dynamics of the task and of the human partner are usually coupled, which means the observable outcomes and behaviors are coupled, and it is hard for the robot to learn efficiently from coupled observations. In this paper, we hypothesize that the robot needs to learn the task separately from learning the behavior of the human partner to improve learning efficiency and outcomes. This leads to a fundamental question: should the robot learn the task first or learn the human behavior first? We develop a novel hierarchical reward mechanism with a task decomposition method that enables the robot to efficiently learn a complex hierarchical dynamic task and human behavior for better HRC. The algorithm is validated on a hierarchical control task in a simulated environment with human-subject experiments, and we are able to answer the question by analyzing the collected experimental results. |
Tasks | |
Published | 2020-03-01 |
URL | https://arxiv.org/abs/2003.00400v1 |
https://arxiv.org/pdf/2003.00400v1.pdf | |
PWC | https://paperswithcode.com/paper/learn-task-first-or-learn-human-partner-first |
Repo | |
Framework | |