Paper Group ANR 504
Captioning Images Taken by People Who Are Blind
Title | Captioning Images Taken by People Who Are Blind |
Authors | Danna Gurari, Yinan Zhao, Meng Zhang, Nilavra Bhattacharya |
Abstract | While an important problem in the vision community is to design algorithms that can automatically caption images, few publicly available datasets for algorithm development directly address the interests of real users. Observing that people who are blind have relied on (human-based) image captioning services to learn about images they take for nearly a decade, we introduce the first image captioning dataset to represent this real use case. This new dataset, which we call VizWiz-Captions, consists of over 39,000 images originating from people who are blind, each paired with five captions. We analyze this dataset to (1) characterize the typical captions, (2) characterize the diversity of content found in the images, and (3) compare its content to that found in eight popular vision datasets. We also analyze modern image captioning algorithms to identify what makes this new dataset challenging for the vision community. We publicly share the dataset with captioning challenge instructions at https://vizwiz.org |
Tasks | Image Captioning |
Published | 2020-02-20 |
URL | https://arxiv.org/abs/2002.08565v1 |
https://arxiv.org/pdf/2002.08565v1.pdf | |
PWC | https://paperswithcode.com/paper/captioning-images-taken-by-people-who-are |
Repo | |
Framework | |
Exploration of Surgeons’ Natural Skills for Robotic Catheterization
Title | Exploration of Surgeons’ Natural Skills for Robotic Catheterization |
Authors | Olatunji Mumini Omisore, Wenjing Du, Tao Zhou, Shipeng Han, Kamen Ivanov, Yousef Al-Handarish, Lei Wang |
Abstract | Although robotic catheter systems have recently emerged as a safe way of performing cardiovascular interventions, a number of important challenges remain to be investigated. One of them is the exploration of surgeons’ natural skills during vascular catheterization with robotic systems. In this study, surgeons’ natural hand motions were investigated to identify four basic movements used for intravascular catheterization. A controlled experiment was set up to acquire surface electromyography (sEMG) signals from six muscles that are innervated when a subject with catheterization skills performs the four movements in an open setting. k-means and k-NN models were implemented over average EMG and root mean square features to uniquely identify the movements. The results show the great potential of sEMG analysis for designing intelligent cyborg control for safe and efficient robotic catheterization. (A minimal feature-extraction and k-NN sketch follows this table.) |
Tasks | |
Published | 2020-03-06 |
URL | https://arxiv.org/abs/2003.04291v1 |
https://arxiv.org/pdf/2003.04291v1.pdf | |
PWC | https://paperswithcode.com/paper/exploration-of-surgeons-natural-skills-for |
Repo | |
Framework | |
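The pipeline described in the abstract above (average-EMG and root-mean-square features fed to a k-NN classifier over the four movements) can be sketched as follows. This is a minimal illustration assuming NumPy/scikit-learn, with hypothetical `windows`/`labels` inputs and an arbitrary k = 5; it is not the authors' code.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def emg_features(window):
    """window: (n_samples, n_channels) sEMG segment -> per-channel MAV and RMS features."""
    mav = np.mean(np.abs(window), axis=0)        # average (rectified) EMG
    rms = np.sqrt(np.mean(window ** 2, axis=0))  # root mean square
    return np.concatenate([mav, rms])

def fit_movement_classifier(windows, labels, k=5):
    """windows: list of sEMG segments; labels: one of the four catheterization movements."""
    X = np.vstack([emg_features(w) for w in windows])
    return KNeighborsClassifier(n_neighbors=k).fit(X, labels)
```

The same feature matrix could also be clustered with k-means, as the abstract mentions, to check how well the four movements separate without labels.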
A machine-learning software-systems approach to capture social, regulatory, governance, and climate problems
Title | A machine-learning software-systems approach to capture social, regulatory, governance, and climate problems |
Authors | Christopher A. Tucker |
Abstract | This paper will discuss the role of an artificially-intelligent computer system as critique-based, implicit-organizational, and an inherently necessary device, deployed in synchrony with parallel governmental policy, as a genuine means of capturing nation-population complexity in quantitative form, public contentment in societal-cooperative economic groups, regulatory proposition, and governance-effectiveness domains. It will discuss a solution involving a well-known algorithm and proffer an improved mechanism for knowledge-representation, thereby increasing range of utility, scope of influence (in terms of differentiating class sectors) and operational efficiency. It will finish with a discussion of these and other historical implications. |
Tasks | |
Published | 2020-02-23 |
URL | https://arxiv.org/abs/2002.11485v1 |
https://arxiv.org/pdf/2002.11485v1.pdf | |
PWC | https://paperswithcode.com/paper/a-machine-learning-software-systems-approach |
Repo | |
Framework | |
Efficient Domain Generalization via Common-Specific Low-Rank Decomposition
Title | Efficient Domain Generalization via Common-Specific Low-Rank Decomposition |
Authors | Vihari Piratla, Praneeth Netrapalli, Sunita Sarawagi |
Abstract | Domain generalization refers to the task of training a model which generalizes to new domains that are not seen during training. We present CSD (Common Specific Decomposition) for this setting, which jointly learns a common component (which generalizes to new domains) and a domain-specific component (which overfits on training domains). The domain-specific components are discarded after training and only the common component is retained. The algorithm is extremely simple and involves only modifying the final linear classification layer of any given neural network architecture. We present a principled analysis to understand existing approaches, provide identifiability results for CSD, and study the effect of low rank on domain generalization. We show that CSD either matches or beats state-of-the-art approaches for domain generalization based on domain erasure, domain-perturbed data augmentation, and meta-learning. Further diagnostics on rotated MNIST, where domains are interpretable, confirm the hypothesis that CSD successfully disentangles common and domain-specific components and hence leads to better domain generalization. (A minimal sketch of the decomposed classification head follows this table.) |
Tasks | Data Augmentation, Domain Generalization, Meta-Learning |
Published | 2020-03-28 |
URL | https://arxiv.org/abs/2003.12815v1 |
https://arxiv.org/pdf/2003.12815v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-domain-generalization-via-common |
Repo | |
Framework | |
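As a rough illustration of the decomposed head described in the abstract above: each training domain's classifier weights are a shared component plus a low-rank, domain-specific combination, and only the shared component is kept at test time. This is a hedged PyTorch sketch of the idea, not the authors' implementation; their full method includes additional regularization and analysis not shown here.

```python
import torch
import torch.nn as nn

class CSDStyleHead(nn.Module):
    """Common-specific classification head (sketch).

    For training domain d the effective weights are
        W_d = W_common + sum_k gamma[d, k] * W_spec[k],
    i.e. a shared component plus a low-rank domain-specific part.
    Only W_common is used when predicting on unseen domains.
    """
    def __init__(self, feat_dim, n_classes, n_domains, rank=1):
        super().__init__()
        self.w_common = nn.Parameter(0.01 * torch.randn(n_classes, feat_dim))
        self.w_spec = nn.Parameter(0.01 * torch.randn(rank, n_classes, feat_dim))
        self.gamma = nn.Parameter(torch.zeros(n_domains, rank))
        self.bias = nn.Parameter(torch.zeros(n_classes))

    def forward(self, feats, domain_ids=None):
        if domain_ids is None:                       # inference on unseen domains
            return feats @ self.w_common.t() + self.bias
        # per-example weights: W_common + gamma_d . W_spec
        w_d = self.w_common + torch.einsum('br,rcf->bcf', self.gamma[domain_ids], self.w_spec)
        return torch.einsum('bf,bcf->bc', feats, w_d) + self.bias
```

At test time `forward(feats)` uses only `w_common`, which is the point of discarding the domain-specific components after training.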
Rigidity Properties of the Blum Medial Axis
Title | Rigidity Properties of the Blum Medial Axis |
Authors | James Damon |
Abstract | We consider the Blum medial axis of a region in $\mathbb R^n$ with piecewise smooth boundary and examine its “rigidity properties”, by which we mean properties preserved under diffeomorphisms of the regions preserving the medial axis. There are several possible versions of rigidity depending on what features of the Blum medial axis we wish to retain. We use a form of the cross ratio from projective geometry to show that in the case of four smooth sheets of the medial axis meeting along a branching submanifold, the cross ratio defines a function on the branching sheet which must be preserved under any diffeomorphism of the medial axis with another. Second, we show that, in the generic case, along a Y-branching submanifold there are three cross ratios involving the three limiting tangent planes of the three smooth sheets and each of the hyperplanes defined by one of the radial lines and the tangent space to the Y-branching submanifold at the point, which again must be preserved. Moreover, the triple of cross ratios then locally uniquely determines the angles between the smooth sheets. Third, we observe that for a diffeomorphism of the region preserving the Blum medial axis and the infinitesimal directions of the radial lines, the second derivative of the diffeomorphism at points of the medial axis must satisfy a condition relating the radial shape operators and hence the differential geometry of the boundaries at corresponding boundary points. (The classical cross ratio is recalled after this table.) |
Tasks | |
Published | 2020-02-01 |
URL | https://arxiv.org/abs/2002.00241v1 |
https://arxiv.org/pdf/2002.00241v1.pdf | |
PWC | https://paperswithcode.com/paper/rigidity-properties-of-the-blum-medial-axis |
Repo | |
Framework | |
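For reference, the classical cross ratio from projective geometry, of which the abstract uses a form adapted to sheets of the medial axis meeting along a branching submanifold, is the quantity $(a, b; c, d) = \dfrac{(a - c)\,(b - d)}{(b - c)\,(a - d)}$ for four scalars or collinear points. It is invariant under fractional linear (projective) transformations, which is what makes it a natural candidate for a quantity preserved under the diffeomorphisms considered in the paper.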
Distill, Adapt, Distill: Training Small, In-Domain Models for Neural Machine Translation
Title | Distill, Adapt, Distill: Training Small, In-Domain Models for Neural Machine Translation |
Authors | Mitchell A. Gordon, Kevin Duh |
Abstract | We explore best practices for training small, memory-efficient machine translation models with sequence-level knowledge distillation in the domain adaptation setting. While both domain adaptation and knowledge distillation are widely used, their interaction remains little understood. Our large-scale empirical results in machine translation (on three language pairs with three domains each) suggest distilling twice for best performance: once using general-domain data and again using in-domain data with an adapted teacher. (A sketch of this two-step recipe follows this table.) |
Tasks | Domain Adaptation, Machine Translation |
Published | 2020-03-05 |
URL | https://arxiv.org/abs/2003.02877v1 |
https://arxiv.org/pdf/2003.02877v1.pdf | |
PWC | https://paperswithcode.com/paper/distill-adapt-distill-training-small-in |
Repo | |
Framework | |
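The "distill twice" recipe suggested by the abstract can be written down schematically. The callables below (`train_nmt`, `finetune`, `translate`, `pair`) are hypothetical placeholders for whatever seq2seq toolkit is in use; the sketch only fixes the order of operations, under the assumption that sequence-level distillation means training the student on the teacher's decoded translations.

```python
def distill_adapt_distill(train_nmt, finetune, translate, pair,
                          general_data, in_domain_data, big_cfg, small_cfg):
    """Schematic pipeline; all four callables are hypothetical stand-ins for a real
    NMT toolkit's training / decoding entry points."""
    # 1) Train a large general-domain teacher.
    teacher = train_nmt(general_data, big_cfg)

    # 2) Sequence-level KD: decode general-domain sources with the teacher and
    #    train the small student on (source, teacher translation) pairs.
    pseudo_general = pair(general_data.sources, translate(teacher, general_data.sources))
    student = train_nmt(pseudo_general, small_cfg)

    # 3) Adapt the teacher to the target domain on in-domain parallel data.
    teacher = finetune(teacher, in_domain_data)

    # 4) Distill a second time from the adapted teacher, now on in-domain sources.
    pseudo_in_domain = pair(in_domain_data.sources, translate(teacher, in_domain_data.sources))
    student = finetune(student, pseudo_in_domain)
    return student
```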
Toward Making the Most of Context in Neural Machine Translation
Title | Toward Making the Most of Context in Neural Machine Translation |
Authors | Zaixiang Zheng, Xiang Yue, Shujian Huang, Jiajun Chen, Alexandra Birch |
Abstract | Document-level machine translation manages to outperform sentence-level models by a small margin, but has failed to be widely adopted. We argue that previous research did not make clear use of the global context, and propose a new document-level NMT framework that deliberately models the local context of each sentence with awareness of the global context of the document in both source and target languages. We specifically design the model to be able to deal with documents containing any number of sentences, including single sentences. This unified approach allows our model to be trained elegantly on standard datasets without needing to train on sentence-level and document-level data separately. Experimental results demonstrate that our model outperforms Transformer baselines and previous document-level NMT models by substantial margins of up to 2.1 BLEU over state-of-the-art baselines. We also provide analyses which show the benefit of context far beyond the neighboring two or three sentences that previous studies have typically incorporated. |
Tasks | Machine Translation |
Published | 2020-02-19 |
URL | https://arxiv.org/abs/2002.07982v1 |
https://arxiv.org/pdf/2002.07982v1.pdf | |
PWC | https://paperswithcode.com/paper/toward-making-the-most-of-context-in-neural |
Repo | |
Framework | |
Can Deep Learning Recognize Subtle Human Activities?
Title | Can Deep Learning Recognize Subtle Human Activities? |
Authors | Vincent Jacquot, Zhuofan Ying, Gabriel Kreiman |
Abstract | Deep Learning has driven recent and exciting progress in computer vision, instilling the belief that these algorithms could solve any visual task. Yet, datasets commonly used to train and test computer vision algorithms have pervasive confounding factors. Such biases make it difficult to truly estimate the performance of those algorithms and how well computer vision models can extrapolate outside the distribution in which they were trained. In this work, we propose a new action classification challenge that is performed well by humans, but poorly by state-of-the-art Deep Learning models. As a proof-of-principle, we consider three exemplary tasks: drinking, reading, and sitting. The best accuracies reached using state-of-the-art computer vision models were 61.7%, 62.8%, and 76.8%, respectively, while human participants scored above 90% accuracy on the three tasks. We propose a rigorous method to reduce confounds when creating datasets, and when comparing human versus computer vision performance. Source code and datasets are publicly available. |
Tasks | Action Classification |
Published | 2020-03-30 |
URL | https://arxiv.org/abs/2003.13852v1 |
https://arxiv.org/pdf/2003.13852v1.pdf | |
PWC | https://paperswithcode.com/paper/can-deep-learning-recognize-subtle-human |
Repo | |
Framework | |
M-estimators of scatter with eigenvalue shrinkage
Title | M-estimators of scatter with eigenvalue shrinkage |
Authors | Esa Ollila, Daniel P. Palomar, Frederic Pascal |
Abstract | A popular regularized (shrinkage) covariance estimator is the shrinkage sample covariance matrix (SCM), which shares the same set of eigenvectors as the SCM but shrinks its eigenvalues toward their grand mean. In this paper, a more general approach is considered in which the SCM is replaced by an M-estimator of scatter matrix, and a fully automatic, data-adaptive method to compute the optimal shrinkage parameter with minimum mean squared error is proposed. Our approach permits the use of any weight function such as Gaussian, Huber’s, or $t$ weight functions, all of which are commonly used in the M-estimation framework. Our simulation examples illustrate that shrinkage M-estimators based on the proposed optimal tuning combined with a robust weight function do not lose performance relative to the shrinkage SCM estimator when the data is Gaussian, but provide significantly improved performance when the data is sampled from a heavy-tailed distribution. (A minimal shrinkage-SCM sketch follows this table.) |
Tasks | |
Published | 2020-02-12 |
URL | https://arxiv.org/abs/2002.04996v1 |
https://arxiv.org/pdf/2002.04996v1.pdf | |
PWC | https://paperswithcode.com/paper/m-estimators-of-scatter-with-eigenvalue |
Repo | |
Framework | |
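For concreteness, the baseline shrinkage SCM that the abstract refers to (same eigenvectors as the SCM, eigenvalues pulled toward their grand mean) can be written in a few lines of NumPy. The paper's contribution, replacing the SCM with an M-estimator of scatter and tuning the shrinkage parameter automatically, is not reproduced here.

```python
import numpy as np

def shrinkage_scm(X, beta):
    """Shrinkage sample covariance matrix.

    X: (n, p) data matrix; beta in [0, 1] is the shrinkage amount.
    The result keeps the SCM's eigenvectors while each eigenvalue lambda_i
    becomes (1 - beta) * lambda_i + beta * mean(lambda).
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    scm = Xc.T @ Xc / n
    grand_mean = np.trace(scm) / p          # average eigenvalue of the SCM
    return (1.0 - beta) * scm + beta * grand_mean * np.eye(p)
```

A quick check of the stated property: the eigenvalues of `shrinkage_scm(X, b)` equal `(1 - b) * lam + b * lam.mean()` for the SCM eigenvalues `lam`, since adding a multiple of the identity leaves the eigenvectors unchanged.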
MINT: Deep Network Compression via Mutual Information-based Neuron Trimming
Title | MINT: Deep Network Compression via Mutual Information-based Neuron Trimming |
Authors | Madan Ravi Ganesh, Jason J. Corso, Salimeh Yasaei Sekeh |
Abstract | Most approaches to deep neural network compression via pruning either evaluate a filter’s importance using its weights or optimize an alternative objective function with sparsity constraints. While these methods offer a useful way to approximate contributions from similar filters, they often either ignore the dependency between layers or solve a more difficult optimization objective than standard cross-entropy. Our method, Mutual Information-based Neuron Trimming (MINT), approaches deep compression via pruning by enforcing sparsity based on the strength of the relationship between filters of adjacent layers, across every pair of layers. The relationship is calculated using conditional geometric mutual information, which evaluates the amount of similar information exchanged between the filters using a graph-based criterion. When pruning a network, we ensure that retained filters contribute the majority of the information towards succeeding layers, which maintains high performance. Our novel approach outperforms existing state-of-the-art compression-via-pruning methods on the standard benchmarks for this task: MNIST, CIFAR-10, and ILSVRC2012, across a variety of network architectures. In addition, we discuss our observations of a common denominator between our pruning methodology’s response to adversarial attacks and calibration statistics when compared to the original network. (A toy filter-ranking sketch follows this table.) |
Tasks | Calibration, Neural Network Compression |
Published | 2020-03-18 |
URL | https://arxiv.org/abs/2003.08472v1 |
https://arxiv.org/pdf/2003.08472v1.pdf | |
PWC | https://paperswithcode.com/paper/mint-deep-network-compression-via-mutual |
Repo | |
Framework | |
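The "keep the filters that carry the most information into the next layer" step can be illustrated with a toy ranking. Note the stand-in: MINT scores filter pairs with a graph-based estimate of conditional geometric mutual information, whereas the sketch below uses plain absolute correlation between activations, purely to show the shape of the computation.

```python
import numpy as np

def rank_filters(acts_prev, acts_next, keep_ratio=0.5):
    """Toy filter-importance ranking between two adjacent layers.

    acts_prev: (n_samples, n_prev) activations of the earlier layer's filters.
    acts_next: (n_samples, n_next) activations of the later layer's filters.
    Absolute correlation is used as a crude stand-in for MINT's conditional
    geometric mutual information; returns indices of filters to retain.
    """
    n_prev = acts_prev.shape[1]
    corr = np.corrcoef(acts_prev.T, acts_next.T)[:n_prev, n_prev:]   # (n_prev, n_next)
    importance = np.abs(corr).sum(axis=1)    # how strongly each filter "feeds" the next layer
    n_keep = max(1, int(keep_ratio * n_prev))
    return np.argsort(importance)[::-1][:n_keep]
```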
Spatiotemporal Learning of Multivehicle Interaction Patterns in Lane-Change Scenarios
Title | Spatiotemporal Learning of Multivehicle Interaction Patterns in Lane-Change Scenarios |
Authors | Chengyuan Zhang, Jiacheng Zhu, Wenshuo Wang, Junqiang Xi |
Abstract | Interpretation of common-yet-challenging interaction scenarios can benefit well-founded decisions for autonomous vehicles. Previous research achieved this using prior knowledge of specific scenarios with predefined models, which limits adaptive capability. This paper describes a Bayesian nonparametric approach that leverages continuous (i.e., Gaussian processes) and discrete (i.e., Dirichlet processes) stochastic processes to reveal underlying interaction patterns of the ego vehicle with other nearby vehicles. Our model relaxes the dependency on the number of surrounding vehicles by developing an acceleration-sensitive velocity field based on Gaussian processes. The experimental results demonstrate that the velocity field can represent the spatial interactions between the ego vehicle and its surroundings. Then, a discrete Bayesian nonparametric model, integrating Dirichlet processes and hidden Markov models, is developed to learn the interaction patterns over the temporal space by automatically segmenting and clustering the sequential interaction data into interpretable granular patterns. We then evaluate our approach on highway lane-change scenarios using the highD dataset, which was collected in real-world settings. Results demonstrate that our proposed Bayesian nonparametric approach provides insight into the complicated lane-change interactions of the ego vehicle with multiple surrounding traffic participants based on the interpretable interaction patterns and their transition properties in temporal relationships. Our proposed approach sheds light on efficiently analyzing other kinds of multi-agent interactions, such as vehicle-pedestrian interactions. (A minimal velocity-field sketch follows this table.) |
Tasks | Autonomous Vehicles, Gaussian Processes |
Published | 2020-03-02 |
URL | https://arxiv.org/abs/2003.00759v1 |
https://arxiv.org/pdf/2003.00759v1.pdf | |
PWC | https://paperswithcode.com/paper/spatiotemporal-learning-of-multivehicle |
Repo | |
Framework | |
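The continuous half of the model, a velocity field over relative positions learned with Gaussian processes, can be approximated with a generic GP regressor; the acceleration-sensitive construction and the Dirichlet-process HMM segmentation described in the abstract are omitted. The kernel choice and length scales below are arbitrary assumptions, not values from the paper.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def fit_velocity_field(positions, velocities):
    """Fit a GP mapping relative position (x, y) -> velocity (vx, vy).

    positions: (n, 2), velocities: (n, 2) observations of surrounding vehicles.
    Illustrative only: the paper's field is acceleration-sensitive and is coupled
    with a DP-HMM over time, neither of which is modeled here.
    """
    kernel = RBF(length_scale=5.0) + WhiteKernel(noise_level=0.1)
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    return gp.fit(positions, velocities)

# Example query of the learned field on a grid around the ego vehicle:
# xs, ys = np.meshgrid(np.linspace(-30, 30, 20), np.linspace(-5, 5, 10))
# field = fit_velocity_field(pos_obs, vel_obs).predict(np.c_[xs.ravel(), ys.ravel()])
```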
Mixed Strategies for Robust Optimization of Unknown Objectives
Title | Mixed Strategies for Robust Optimization of Unknown Objectives |
Authors | Pier Giuseppe Sessa, Ilija Bogunovic, Maryam Kamgarpour, Andreas Krause |
Abstract | We consider robust optimization problems, where the goal is to optimize an unknown objective function against the worst-case realization of an uncertain parameter. For this setting, we design a novel sample-efficient algorithm GP-MRO, which sequentially learns about the unknown objective from noisy point evaluations. GP-MRO seeks to discover a robust, randomized mixed strategy that maximizes the worst-case expected objective value. To achieve this, it combines techniques from online learning with nonparametric confidence bounds from Gaussian processes. Our theoretical results characterize the number of samples required by GP-MRO to discover a robust near-optimal mixed strategy for different GP kernels of interest. We experimentally demonstrate the performance of our algorithm on synthetic datasets and on human-assisted trajectory planning tasks for autonomous vehicles. In our simulations, we show that robust deterministic strategies can be overly conservative, while the mixed strategies found by GP-MRO significantly improve the overall performance. (A small mixed-strategy sketch follows this table.) |
Tasks | Autonomous Vehicles, Gaussian Processes |
Published | 2020-02-28 |
URL | https://arxiv.org/abs/2002.12613v2 |
https://arxiv.org/pdf/2002.12613v2.pdf | |
PWC | https://paperswithcode.com/paper/mixed-strategies-for-robust-optimization-of |
Repo | |
Framework | |
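The core object GP-MRO searches for, a mixed (randomized) strategy that maximizes the worst-case expected objective, can be illustrated on a fixed payoff matrix with a small linear program. GP-MRO itself estimates the payoffs from GP confidence bounds and uses online learning rather than an LP, so this sketch only shows why randomization can beat every deterministic choice in the worst case.

```python
import numpy as np
from scipy.optimize import linprog

def worst_case_mixed_strategy(payoff):
    """Mixed strategy over candidate decisions maximizing worst-case expected payoff.

    payoff[i, j]: (estimated) objective value of decision x_i under uncertain
    parameter delta_j. Returns (probabilities over decisions, worst-case value).
    """
    n, m = payoff.shape
    # Variables: p_1..p_n (probabilities) and t (worst-case value); maximize t.
    c = np.r_[np.zeros(n), -1.0]
    A_ub = np.c_[-payoff.T, np.ones(m)]          # t - p^T payoff[:, j] <= 0 for every j
    b_ub = np.zeros(m)
    A_eq = np.r_[np.ones(n), 0.0][None, :]       # probabilities sum to one
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * n + [(None, None)])
    return res.x[:n], -res.fun
```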
Gaussian Graphical Model exploration and selection in high dimension low sample size setting
Title | Gaussian Graphical Model exploration and selection in high dimension low sample size setting |
Authors | Thomas Lartigue, Simona Bottani, Stephanie Baron, Olivier Colliot, Stanley Durrleman, Stéphanie Allassonnière |
Abstract | Gaussian Graphical Models (GGM) are often used to describe the conditional correlations between the components of a random vector. In this article, we compare two families of GGM inference methods: nodewise edge selection and penalised likelihood maximisation. We demonstrate on synthetic data that, when the sample size is small, the two methods produce graphs with either too few or too many edges compared to the true one. As a result, we propose a composite procedure that explores a family of graphs with a nodewise numerical scheme and selects a candidate among them with an overall likelihood criterion. We demonstrate that, when the number of observations is small, this selection method yields graphs closer to the truth, corresponding to distributions with smaller KL divergence from the true distribution than those obtained by the other two approaches. Finally, we show the interest of our algorithm on two concrete cases: first on brain imaging data, then on biological nephrology data. In both cases our results are more in line with current knowledge in each field. (A minimal nodewise edge-selection sketch follows this table.) |
Tasks | |
Published | 2020-03-11 |
URL | https://arxiv.org/abs/2003.05169v1 |
https://arxiv.org/pdf/2003.05169v1.pdf | |
PWC | https://paperswithcode.com/paper/gaussian-graphical-model-exploration-and |
Repo | |
Framework | |
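One member of the nodewise family discussed in the abstract, Meinshausen–Bühlmann-style neighborhood selection, is easy to sketch with scikit-learn. The authors' composite procedure additionally scores the candidate graphs with an overall likelihood criterion, which is not shown here, and the penalty value below is an arbitrary assumption.

```python
import numpy as np
from sklearn.linear_model import Lasso

def nodewise_edges(X, alpha=0.1):
    """Nodewise (neighborhood) edge selection for a Gaussian graphical model.

    X: (n, p) data matrix. Each variable is lasso-regressed on all the others;
    an edge is kept if either regression selects it (OR rule). Returns a
    symmetric boolean adjacency matrix.
    """
    n, p = X.shape
    adj = np.zeros((p, p), dtype=bool)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        coef = Lasso(alpha=alpha).fit(X[:, others], X[:, j]).coef_
        adj[j, others] = coef != 0
    return adj | adj.T
```

Sweeping `alpha` over a grid produces the family of candidate graphs that a selection criterion would then choose among.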
Infinitely Wide Graph Convolutional Networks: Semi-supervised Learning via Gaussian Processes
Title | Infinitely Wide Graph Convolutional Networks: Semi-supervised Learning via Gaussian Processes |
Authors | Jilin Hu, Jianbing Shen, Bin Yang, Ling Shao |
Abstract | Graph convolutional neural networks (GCNs) have recently demonstrated promising results on graph-based semi-supervised classification, but little work has been done to explore their theoretical properties. Recently, several deep neural networks, e.g., fully connected and convolutional neural networks, with infinite hidden units have been proved to be equivalent to Gaussian processes (GPs). To exploit both the powerful representational capacity of GCNs and the great expressive power of GPs, we investigate similar properties of infinitely wide GCNs. More specifically, we propose a GP regression model via GCNs (GPGC) for graph-based semi-supervised learning. In the process, we formulate the kernel matrix computation of GPGC in an iterative analytical form. Finally, we derive a conditional distribution for the labels of unobserved nodes based on the graph structure, labels for the observed nodes, and the feature matrix of all the nodes. We conduct extensive experiments to evaluate the semi-supervised classification performance of GPGC and demonstrate that it outperforms other state-of-the-art methods by a clear margin on all the datasets while being efficient. (A simplified GP-on-graph sketch follows this table.) |
Tasks | Gaussian Processes |
Published | 2020-02-26 |
URL | https://arxiv.org/abs/2002.12168v1 |
https://arxiv.org/pdf/2002.12168v1.pdf | |
PWC | https://paperswithcode.com/paper/infinitely-wide-graph-convolutional-networks |
Repo | |
Framework | |
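A drastically simplified picture of GP prediction on a graph: with linear activations, propagating node features through a normalized adjacency and taking inner products gives a GCN-flavored kernel, and the usual GP posterior mean predicts labels of unobserved nodes from observed ones. The paper's GPGC derives its kernel iteratively for the actual GCN nonlinearity, so the sketch below is a simplified stand-in, not the proposed model.

```python
import numpy as np

def gp_on_graph_predict(A, X, y_obs, obs_idx, test_idx, n_layers=2, noise=1e-2):
    """Toy GP posterior mean on a graph with a linear-activation GCN-style kernel.

    A: (n, n) adjacency, X: (n, d) node features, y_obs: labels (or one-hot rows)
    for the observed nodes indexed by obs_idx; predictions returned for test_idx.
    """
    # Symmetric normalization: A_hat = D^{-1/2} (A + I) D^{-1/2}
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    A_hat = A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    H = X.copy()
    for _ in range(n_layers):          # feature propagation
        H = A_hat @ H
    K = H @ H.T                        # kernel over nodes

    K_oo = K[np.ix_(obs_idx, obs_idx)] + noise * np.eye(len(obs_idx))
    K_to = K[np.ix_(test_idx, obs_idx)]
    return K_to @ np.linalg.solve(K_oo, y_obs)   # GP posterior mean at test nodes
```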
Learn Task First or Learn Human Partner First? Deep Reinforcement Learning of Human-Robot Cooperation in Asymmetric Hierarchical Dynamic Task
Title | Learn Task First or Learn Human Partner First? Deep Reinforcement Learning of Human-Robot Cooperation in Asymmetric Hierarchical Dynamic Task |
Authors | Lingfeng Tao, Michael Bowman, Jiucai Zhang, Xiaoli Zhang |
Abstract | The deep reinforcement learning approach to human-robot cooperation (HRC) is promising for its high performance when robots learn complex tasks. However, the applicability of such an approach in a real-world context is limited by long training times, the additional training difficulty caused by inconsistent human performance, and the inherent instability of policy exploration. With this approach, the robot has two dynamics to learn: how to accomplish the given physical task and how to cooperate with the human partner. Furthermore, the dynamics of the task and of the human partner are usually coupled, which means the observable outcomes and behaviors are coupled, and it is hard for the robot to learn efficiently from coupled observations. In this paper, we hypothesize that the robot needs to learn the task separately from learning the behavior of the human partner to improve learning efficiency and outcomes. This leads to a fundamental question: should the robot learn the task first or learn the human behavior first? We develop a novel hierarchical reward mechanism with a task decomposition method that enables the robot to efficiently learn a complex hierarchical dynamic task and human behavior for better HRC. The algorithm is validated on a hierarchical control task in a simulated environment with human-subject experiments, and we are able to answer the question by analyzing the collected experimental results. |
Tasks | |
Published | 2020-03-01 |
URL | https://arxiv.org/abs/2003.00400v1 |
https://arxiv.org/pdf/2003.00400v1.pdf | |
PWC | https://paperswithcode.com/paper/learn-task-first-or-learn-human-partner-first |
Repo | |
Framework | |