January 27, 2020

3299 words 16 mins read

Paper Group ANR 1308

Learning Navigation Subroutines from Egocentric Videos

Title Learning Navigation Subroutines from Egocentric Videos
Authors Ashish Kumar, Saurabh Gupta, Jitendra Malik
Abstract Planning at a higher level of abstraction than low-level torques improves sample efficiency in reinforcement learning and computational efficiency in classical planning. We propose a method to learn such hierarchical abstractions, or subroutines, from egocentric video of experts performing tasks. We learn a self-supervised inverse model on a small amount of random interaction data to pseudo-label the expert egocentric videos with agent actions. Visuomotor subroutines are acquired from these pseudo-labeled videos by learning a latent intent-conditioned policy that predicts the inferred pseudo-actions from the corresponding image observations. We demonstrate our proposed approach in the context of navigation and show that we can successfully learn consistent and diverse visuomotor subroutines from passive egocentric videos. We demonstrate the utility of the acquired visuomotor subroutines by using them as is for exploration, and as sub-policies in a hierarchical RL framework for reaching point goals and semantic goals. We also demonstrate the behavior of our subroutines in the real world by deploying them on a real robotic platform. Project website: https://ashishkumar1993.github.io/subroutines/.
Tasks
Published 2019-05-29
URL https://arxiv.org/abs/1905.12612v2
PDF https://arxiv.org/pdf/1905.12612v2.pdf
PWC https://paperswithcode.com/paper/learning-navigation-subroutines-by-watching
Repo
Framework
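
The pseudo-labeling step described in the abstract can be illustrated with a toy sketch: an "inverse model" is fit on random interaction data, then used to label expert transitions with pseudo-actions. This is only an illustration of the idea under invented assumptions (the paper learns a neural inverse model from images; here a 1-nearest-neighbour model over hand-made feature transitions stands in, and all dimensions and names are hypothetical).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for random interaction data: 8-dim "observations" and the
# action (0..2) taken between consecutive observations. Each action shifts
# one coordinate by +1, so the transition delta identifies the action.
n, d, n_actions = 300, 8, 3
obs_t = rng.normal(size=(n, d))
acts = rng.integers(0, n_actions, size=n)
obs_next = obs_t.copy()
for a in range(n_actions):
    obs_next[acts == a, a] += 1.0

def pseudo_label(train_deltas, train_acts, o_t, o_next):
    """Label expert transitions with the action of the nearest training
    delta (a 1-NN inverse model)."""
    q = o_next - o_t
    d2 = ((q[:, None, :] - train_deltas[None, :, :]) ** 2).sum(-1)
    return train_acts[d2.argmin(axis=1)]

# "Expert egocentric video": transitions whose actions are unknown.
video_t = rng.normal(size=(50, d))
true_acts = rng.integers(0, n_actions, size=50)
video_next = video_t.copy()
for a in range(n_actions):
    video_next[true_acts == a, a] += 1.0

pseudo = pseudo_label(obs_next - obs_t, acts, video_t, video_next)
accuracy = float((pseudo == true_acts).mean())
```

In the paper these pseudo-actions would then supervise the latent intent-conditioned subroutine policy.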

Interpretable deep Gaussian processes with moments

Title Interpretable deep Gaussian processes with moments
Authors Chi-Ken Lu, Scott Cheng-Hsin Yang, Xiaoran Hao, Patrick Shafto
Abstract Deep Gaussian Processes (DGPs) combine the expressiveness of Deep Neural Networks (DNNs) with the quantified uncertainty of Gaussian Processes (GPs). Both the expressive power and the intractable inference result from the non-Gaussian distribution over composition functions. We propose an interpretable DGP based on approximating the DGP as a GP by calculating the exact moments, which additionally identifies the heavy-tailed nature of some DGP distributions. Consequently, our approach admits interpretation both as NNs with specified activation functions and as a variational approximation to the DGP. We identify the expressivity parameter of the DGP and find non-local and non-stationary correlation from DGP composition. We provide general recipes for deriving the effective kernels for DGPs of two, three, or infinitely many layers, composed of homogeneous or heterogeneous kernels. Results illustrate the expressiveness of our effective kernels through samples from the prior and inference on simulated and real data, demonstrate the advantages of interpretability by analysis of analytic forms, and draw relations and equivalences across kernels.
Tasks Gaussian Processes
Published 2019-05-27
URL https://arxiv.org/abs/1905.10963v3
PDF https://arxiv.org/pdf/1905.10963v3.pdf
PWC https://paperswithcode.com/paper/interpretable-deep-gaussian-processes
Repo
Framework
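
The moment-matching idea can be sketched numerically: draw samples of the two-layer composition f2(f1(x)) of squared-exponential GPs and take the empirical second moment as the effective kernel. The paper derives these moments in closed form; this Monte Carlo version is only an illustrative stand-in, with kernel, grid, and sample count chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(1)

def se_kernel(a, b, ell=1.0):
    """Squared-exponential kernel on 1-D inputs."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

def sample_two_layer_dgp(x, n_samples=4000, jitter=1e-6):
    """Draw samples of f2(f1(x)) with f1, f2 ~ GP(0, SE)."""
    L1 = np.linalg.cholesky(se_kernel(x, x) + jitter * np.eye(len(x)))
    out = np.empty((n_samples, len(x)))
    for s in range(n_samples):
        f1 = L1 @ rng.normal(size=len(x))
        L2 = np.linalg.cholesky(se_kernel(f1, f1) + jitter * np.eye(len(x)))
        out[s] = L2 @ rng.normal(size=len(x))
    return out

x = np.linspace(-2.0, 2.0, 5)
samples = sample_two_layer_dgp(x)
# Moment-matched effective kernel: the empirical covariance of the outputs.
# The marginal variance of the output is 1 by construction of the SE kernel.
k_eff = np.cov(samples, rowvar=False)
```

The empirical `k_eff` is what the paper's analytic recipes give exactly, without sampling.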

Context-aware Entity Linking with Attentive Neural Networks on Wikidata Knowledge Graph

Title Context-aware Entity Linking with Attentive Neural Networks on Wikidata Knowledge Graph
Authors Isaiah Onando Mulang, Kuldeep Singh, Akhilesh Vyas, Saeedeh Shekarpour, Ahmad Sakor, Maria Esther Vidal, Soren Auer, Jens Lehmann
Abstract Entity Linking (EL) is a long-standing research field with applicability in various use cases such as semantic search, text annotation, and question answering. Although effective and robust, current approaches are still limited to particular knowledge repositories (e.g. Wikipedia) or specific knowledge graphs (e.g. Freebase, DBpedia, and YAGO). Collaborative knowledge graphs such as Wikidata rely heavily on the crowd to author the information. Since the crowd is not bound to a standard protocol for assigning entity titles, the knowledge graph is populated with non-standard, noisy, long, or sometimes awkward titles. The issue of long, implicit, and non-standard entity representations is a challenge for EL approaches aiming at high precision and recall. In this paper, we advance the state of the art by developing a context-aware attentive neural network approach for entity linking on Wikidata. Our approach exploits context from the knowledge graph as a source of background knowledge, which is then fed into the neural network. It demonstrates merit in addressing challenges associated with entity titles (multi-word, long, implicit, case-sensitive). Our experimental study shows $\approx$8% improvement over the baseline approach, and significantly outperforms an end-to-end approach for Wikidata entity linking. This work, the first of its kind, opens a new direction for the research community: developing context-aware EL approaches for collaborative knowledge graphs.
Tasks Entity Linking, Knowledge Graphs, Question Answering
Published 2019-12-12
URL https://arxiv.org/abs/1912.06214v1
PDF https://arxiv.org/pdf/1912.06214v1.pdf
PWC https://paperswithcode.com/paper/context-aware-entity-linking-with-attentive
Repo
Framework
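
The core scoring idea — attend from the mention over each candidate's knowledge-graph context before comparing — can be sketched with scaled dot-product attention. All embeddings, dimensions, and the mixing rule below are invented for illustration; the paper's network is learned end to end.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def score_candidate(mention, cand_emb, cand_ctx):
    """Attend from the mention representation over the candidate's KG
    context vectors, then score the context-aware candidate against the
    mention (scaled dot-product attention)."""
    attn = softmax(cand_ctx @ mention / np.sqrt(mention.size))
    summary = attn @ cand_ctx            # context-aware candidate vector
    return float(0.5 * (cand_emb + summary) @ mention)

# Mention of "Python" in a programming context (toy 4-dim embeddings).
mention = np.array([1.0, 0.0, 0.0, 0.0])
lang = (np.array([0.9, 0.1, 0.0, 0.0]),          # entity embedding
        np.array([[1.0, 0.0, 0.0, 0.0],          # its KG context vectors
                  [0.8, 0.2, 0.0, 0.0]]))
snake = (np.array([0.0, 1.0, 0.0, 0.0]),
         np.array([[0.0, 0.9, 0.1, 0.0],
                   [0.0, 1.0, 0.0, 0.0]]))

scores = [score_candidate(mention, e, c) for e, c in (lang, snake)]
best = int(np.argmax(scores))  # the programming-language candidate wins
```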

Audio-based automatic mating success prediction of giant pandas

Title Audio-based automatic mating success prediction of giant pandas
Authors WeiRan Yan, MaoLin Tang, Qijun Zhao, Peng Chen, Dunwu Qi, Rong Hou, Zhihe Zhang
Abstract Giant pandas, stereotyped as silent animals, make significantly more vocal sounds during breeding season, suggesting that sounds are essential for coordinating their reproduction and expressing mating preference. Previous biological studies have also shown that giant panda sounds are correlated with mating results and reproduction. This paper makes the first attempt to devise an automatic method for predicting the mating success of giant pandas based on their vocal sounds. Given an audio sequence of mating giant pandas recorded during breeding encounters, we first crop out the segments containing panda vocalisations and normalize their magnitude and length. We then extract acoustic features from each audio segment and feed them into a deep neural network, which classifies the mating as success or failure. The proposed network employs convolutional layers followed by bidirectional gated recurrent units to extract vocal features, and applies an attention mechanism to make the network focus on the most relevant features. Evaluation experiments on a data set collected over the past nine years yield promising results, demonstrating the potential of audio-based automatic mating success prediction in assisting giant panda reproduction.
Tasks
Published 2019-12-24
URL https://arxiv.org/abs/1912.11333v1
PDF https://arxiv.org/pdf/1912.11333v1.pdf
PWC https://paperswithcode.com/paper/audio-based-automatic-mating-success
Repo
Framework
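
The attention-then-classify step can be sketched as attention pooling over per-frame acoustic features. The CNN + bidirectional GRU feature extractor is replaced here by fixed toy features, and every weight is made up; this is only a sketch of the pooling mechanism, not the paper's model.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attentive_predict(frames, attn_w, clf_w, clf_b):
    """Weight per-frame features by attention, pool, and classify the
    encounter as mating success (probability via a logistic output)."""
    alpha = softmax(frames @ attn_w)       # one attention weight per frame
    pooled = alpha @ frames                # attention-weighted summary
    logit = pooled @ clf_w + clf_b
    return 1.0 / (1.0 + np.exp(-logit)), alpha

# Toy sequence: 4 frames of 3-dim features; frame 2 is the salient call.
frames = np.array([[0.1, 0.0, 0.2],
                   [0.0, 0.1, 0.1],
                   [2.0, 0.5, 0.0],
                   [0.1, 0.0, 0.1]])
attn_w = np.array([1.0, 1.0, 0.0])   # attends to the first two features
clf_w = np.array([0.8, 0.4, -0.2])
prob, alpha = attentive_predict(frames, attn_w, clf_w, 0.0)
```

The attention weights `alpha` sum to one and concentrate on the salient frame, which is what "forcing the network to focus on the most relevant features" amounts to.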

SEWA DB: A Rich Database for Audio-Visual Emotion and Sentiment Research in the Wild

Title SEWA DB: A Rich Database for Audio-Visual Emotion and Sentiment Research in the Wild
Authors Jean Kossaifi, Robert Walecki, Yannis Panagakis, Jie Shen, Maximilian Schmitt, Fabien Ringeval, Jing Han, Vedhas Pandit, Antoine Toisoul, Bjorn Schuller, Kam Star, Elnar Hajiyev, Maja Pantic
Abstract Natural human-computer interaction and audio-visual human behaviour sensing systems that achieve robust performance in the wild are needed more than ever, as digital devices become an indispensable part of our lives. Accurately annotated real-world data are the crux of devising such systems. However, existing databases usually consider controlled settings, low demographic variability, and a single task. In this paper, we introduce the SEWA database of more than 2000 minutes of audio-visual data of 398 people from six cultures, 50% female, uniformly spanning the age range of 18 to 65 years. Subjects were recorded in two contexts: while watching adverts and while discussing those adverts in a video chat. The database includes rich annotations of the recordings in terms of facial landmarks, facial action units (FAU), various vocalisations, mirroring, continuously valued valence, arousal, liking, and agreement, and prototypic examples of (dis)liking. The database aims to be an extremely valuable resource for researchers in affective computing and automatic human sensing, and is expected to push forward research in human behaviour analysis, including cultural studies. Along with the database, we provide extensive baseline experiments for automatic FAU detection and automatic valence, arousal, and (dis)liking intensity estimation.
Tasks
Published 2019-01-09
URL https://arxiv.org/abs/1901.02839v2
PDF https://arxiv.org/pdf/1901.02839v2.pdf
PWC https://paperswithcode.com/paper/sewa-db-a-rich-database-for-audio-visual
Repo
Framework

Generating Robust Supervision for Learning-Based Visual Navigation Using Hamilton-Jacobi Reachability

Title Generating Robust Supervision for Learning-Based Visual Navigation Using Hamilton-Jacobi Reachability
Authors Anjian Li, Somil Bansal, Georgios Giovanis, Varun Tolani, Claire Tomlin, Mo Chen
Abstract In Bansal et al. (2019), a novel visual navigation framework that combines learning-based and model-based approaches was proposed. Specifically, a Convolutional Neural Network (CNN) predicts a waypoint that is used by the dynamics model for planning and tracking a trajectory to the waypoint. However, the CNN inevitably makes prediction errors, ultimately leading to collisions, especially when the robot is navigating through cluttered and tight spaces. In this paper, we present a novel Hamilton-Jacobi (HJ) reachability-based method to generate supervision for the CNN for waypoint prediction. By modeling the prediction error of the CNN as a disturbance in the dynamics, the proposed method generates waypoints that are robust to these disturbances, and consequently to the prediction errors. Moreover, using globally optimal HJ reachability analysis leads to predicting waypoints that are time-efficient and do not exhibit greedy behavior. Through simulations and experiments on a hardware testbed, we demonstrate the advantages of the proposed approach for navigation tasks where the robot needs to navigate through cluttered, narrow indoor environments.
Tasks Visual Navigation
Published 2019-12-20
URL https://arxiv.org/abs/1912.10120v1
PDF https://arxiv.org/pdf/1912.10120v1.pdf
PWC https://paperswithcode.com/paper/generating-robust-supervision-for-learning
Repo
Framework
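
The robustness idea — without the actual HJ reachability machinery — can be caricatured as a minimax choice: pick the waypoint whose worst-case cost over bounded prediction-error disturbances is smallest. The scene, cost function, and disturbance set below are invented for illustration; the paper computes the worst case exactly via reachability analysis rather than by enumerating samples.

```python
import numpy as np

def cost(p, goal, obstacles, radius=0.5):
    """Distance to goal plus a large penalty for entering an obstacle."""
    c = float(np.linalg.norm(p - goal))
    for o in obstacles:
        if np.linalg.norm(p - o) < radius:
            c += 100.0
    return c

def pick_waypoint(cands, goal, obstacles, disturbances):
    """Nominal choice ignores disturbances; the robust choice minimises
    the worst case over the disturbance set (the role HJ reachability
    plays exactly in the paper)."""
    nominal = min(cands, key=lambda c: cost(c, goal, obstacles))
    robust = min(cands, key=lambda c: max(cost(c + d, goal, obstacles)
                                          for d in disturbances))
    return nominal, robust

goal = np.array([2.0, 0.0])
obstacles = [np.array([1.0, 0.0])]
# Candidate A skims the obstacle but is nearer the goal; B detours safely.
cands = [np.array([1.0, 0.6]), np.array([1.0, -1.2])]
dists = [np.array(v) for v in
         [(0.0, 0.0), (0.3, 0.0), (-0.3, 0.0), (0.0, 0.3), (0.0, -0.3)]]
nominal, robust = pick_waypoint(cands, goal, obstacles, dists)
```

With a 0.3 prediction-error bound, candidate A can be pushed into the obstacle, so the robust rule rejects it even though it is nominally cheaper.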

Large-scale interactive object segmentation with human annotators

Title Large-scale interactive object segmentation with human annotators
Authors Rodrigo Benenson, Stefan Popov, Vittorio Ferrari
Abstract Manually annotating object segmentation masks is very time consuming. Interactive object segmentation methods offer a more efficient alternative where a human annotator and a machine segmentation model collaborate. In this paper we make several contributions to interactive segmentation: (1) we systematically explore in simulation the design space of deep interactive segmentation models and report new insights and caveats; (2) we execute a large-scale annotation campaign with real human annotators, producing masks for 2.5M instances on the OpenImages dataset. We plan to release this data publicly, forming the largest existing dataset for instance segmentation. Moreover, by re-annotating part of the COCO dataset, we show that we can produce instance masks 3 times faster than traditional polygon drawing tools while also providing better quality. (3) We present a technique for automatically estimating the quality of the produced masks which exploits indirect signals from the annotation process.
Tasks Instance Segmentation, Interactive Segmentation, Semantic Segmentation
Published 2019-03-26
URL http://arxiv.org/abs/1903.10830v2
PDF http://arxiv.org/pdf/1903.10830v2.pdf
PWC https://paperswithcode.com/paper/large-scale-interactive-object-segmentation
Repo
Framework

On Expected Accuracy

Title On Expected Accuracy
Authors Ozan İrsoy
Abstract We empirically investigate the (negative) expected accuracy as an alternative loss function to cross entropy (negative log likelihood) for classification tasks. Coupled with softmax activation, it has small derivatives over most of its domain, and is therefore hard to optimize. A modified, leaky version is evaluated on a variety of classification tasks, including digit recognition, image classification, sequence tagging and tree tagging, using a variety of neural architectures such as logistic regression, multilayer perceptron, CNN, LSTM and Tree-LSTM. We show that it yields comparable or better accuracy compared to cross entropy. Furthermore, the proposed objective is shown to be more robust to label noise.
Tasks Image Classification
Published 2019-05-01
URL http://arxiv.org/abs/1905.00448v1
PDF http://arxiv.org/pdf/1905.00448v1.pdf
PWC https://paperswithcode.com/paper/on-expected-accuracy
Repo
Framework
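
The loss can be written down directly: minus the mean softmax probability of the true class, optionally blended with cross entropy as a "leaky" fix for its flat gradients. The blending used here is one plausible leaky variant; the paper's exact formulation may differ.

```python
import numpy as np

def neg_expected_accuracy(logits, labels, leak=0.0):
    """Negative expected accuracy: minus the mean softmax probability of
    the true class. With leak > 0, mix in cross entropy to restore
    gradient signal where the softmax saturates (a hypothetical mixing)."""
    z = logits - logits.max(axis=1, keepdims=True)   # stable softmax
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    true_p = p[np.arange(len(labels)), labels]
    loss = -float(true_p.mean())
    if leak > 0.0:
        loss += leak * float(-np.log(true_p + 1e-12).mean())
    return loss

# A confident correct prediction approaches the minimum value of -1;
# uniform logits over 3 classes give exactly -1/3.
confident = neg_expected_accuracy(np.array([[10.0, 0.0, 0.0]]), np.array([0]))
uniform = neg_expected_accuracy(np.zeros((1, 3)), np.array([0]))
```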

Domain-invariant Learning using Adaptive Filter Decomposition

Title Domain-invariant Learning using Adaptive Filter Decomposition
Authors Ze Wang, Xiuyuan Cheng, Guillermo Sapiro, Qiang Qiu
Abstract Domain shifts are frequently encountered in real-world scenarios. In this paper, we consider the problem of domain-invariant deep learning by explicitly modeling domain shifts with only a small amount of domain-specific parameters in a Convolutional Neural Network (CNN). By exploiting the observation that a convolutional filter can be well approximated as a linear combination of a small set of basis elements, we show for the first time, both empirically and theoretically, that domain shifts can be effectively handled by decomposing a regular convolutional layer into a domain-specific basis layer and a domain-shared basis coefficient layer, while both remain convolutional. An input channel will now first convolve spatially only with each respective domain-specific basis to “absorb” domain variations, and then output channels are linearly combined using common basis coefficients trained to promote shared semantics across domains. We use toy examples, rigorous analysis, and real-world examples to show the framework’s effectiveness in cross-domain performance and domain adaptation. With the proposed architecture, we need only a small set of basis elements to model each additional domain, which adds a negligible number of additional parameters, typically a few hundred.
Tasks Domain Adaptation
Published 2019-09-25
URL https://arxiv.org/abs/1909.11285v1
PDF https://arxiv.org/pdf/1909.11285v1.pdf
PWC https://paperswithcode.com/paper/domain-invariant-learning-using-adaptive
Repo
Framework
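
The underlying decomposition is easy to demonstrate: project a filter bank onto a small basis and keep per-filter coefficients. The sizes and the random orthonormal basis below are illustrative; in the paper the basis is domain-specific, the coefficients are domain-shared, and both are learned.

```python
import numpy as np

rng = np.random.default_rng(2)

# A bank of 3x3 convolutional filters, flattened, and a small orthonormal
# basis for the 9-dim filter space.
n_filters, k, n_basis = 16, 3, 5
filters = rng.normal(size=(n_filters, k * k))
basis, _ = np.linalg.qr(rng.normal(size=(k * k, n_basis)))  # orthonormal cols

# Least-squares coefficients: filter_i ~= basis @ coeffs[:, i].
coeffs = basis.T @ filters.T            # (n_basis, n_filters), domain-shared
approx = (basis @ coeffs).T             # reconstruction from the basis

# Adapting to a new domain in this scheme means swapping `basis` while
# reusing `coeffs`: only k*k*n_basis = 45 new parameters per domain here.
residual = float(np.linalg.norm(filters - approx) / np.linalg.norm(filters))
```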

EigenRank by Committee: A Data Subset Selection and Failure Prediction paradigm for Robust Deep Learning based Medical Image Segmentation

Title EigenRank by Committee: A Data Subset Selection and Failure Prediction paradigm for Robust Deep Learning based Medical Image Segmentation
Authors Bilwaj Gaonkar, Alex Bui, Luke Macyszyn
Abstract Translation of fully automated deep learning based medical image segmentation technologies to clinical workflows faces two main algorithmic challenges. The first is the collection and archival of large quantities of manually annotated ground truth data for both training and validation. The second is the relative inability of most deep learning based segmentation techniques to alert physicians to a likely segmentation failure. Here we propose a novel algorithm, named EigenRank, which addresses both of these challenges. EigenRank can select for manual labeling a subset of medical images from a large database, such that a U-Net trained on this subset is superior to one trained on a randomly selected subset of the same size. EigenRank can also be used to pick out cases in a large database where deep learning segmentation is likely to fail. We present our algorithm, followed by results and a discussion of how EigenRank exploits the von Neumann information to perform both data subset selection and failure prediction for medical image segmentation using deep learning.
Tasks Medical Image Segmentation, Semantic Segmentation
Published 2019-08-17
URL https://arxiv.org/abs/1908.06337v1
PDF https://arxiv.org/pdf/1908.06337v1.pdf
PWC https://paperswithcode.com/paper/eigenrank-by-committee-a-data-subset
Repo
Framework
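
The von Neumann ingredient can be sketched concretely: treat a normalised similarity matrix as a density matrix and use its entropy to favour diverse subsets. The greedy selection below is a stand-in for ranking by informativeness; the paper's actual EigenRank criterion may differ, and the RBF similarity is an assumption.

```python
import numpy as np

def von_neumann_entropy(K):
    """Von Neumann entropy of a PSD similarity matrix, after normalising
    it to unit trace (a density matrix)."""
    rho = K / np.trace(K)
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-(w * np.log(w)).sum())

def greedy_diverse_subset(X, m):
    """Greedily add the sample that most increases the entropy of the RBF
    similarity matrix of the selected set."""
    def rbf(A):
        d2 = ((A[:, None, :] - A[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2)
    chosen = [0]
    while len(chosen) < m:
        rest = [i for i in range(len(X)) if i not in chosen]
        gains = [von_neumann_entropy(rbf(X[chosen + [i]])) for i in rest]
        chosen.append(rest[int(np.argmax(gains))])
    return chosen
```

A rank-one (all-identical) similarity matrix has zero entropy, while dissimilar samples push the entropy toward log of the set size, so the greedy rule skips duplicates.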

On the Convergence of ADMM with Task Adaption and Beyond

Title On the Convergence of ADMM with Task Adaption and Beyond
Authors Risheng Liu, Pan Mu, Jin Zhang
Abstract Along with the development of learning and vision, the Alternating Direction Method of Multipliers (ADMM) has become a popular algorithm for separable optimization models with linear constraints. However, ADMM and its numerical variants (e.g., inexact, proximal, or linearized) struggle to obtain state-of-the-art performance on complex learning and vision tasks due to their weak task-adaptation ability. Recently, there has been increasing interest in incorporating task-specific computational modules (e.g., designed filters or learned architectures) into ADMM iterations. Unfortunately, these task-related modules introduce uncontrolled and unstable iterative flows and break the structure of the original optimization model, so existing theoretical investigations are invalid for the resulting task-specific iterations. In this paper, we develop a simple and generic proximal ADMM framework that incorporates flexible task-specific modules for learning and vision problems. We rigorously prove convergence in both objective function values and constraint violation, and provide the worst-case convergence rate measured by iteration complexity. Our investigation not only develops new perspectives for analyzing task-adaptive ADMM but also supplies meaningful guidelines for designing practical optimization methods for real-world applications. Numerical experiments verify the theoretical results and demonstrate the efficiency of our algorithmic framework.
Tasks
Published 2019-09-24
URL https://arxiv.org/abs/1909.10819v1
PDF https://arxiv.org/pdf/1909.10819v1.pdf
PWC https://paperswithcode.com/paper/on-the-convergence-of-admm-with-task-adaption
Repo
Framework
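
The flavour of the framework can be shown with standard ADMM for the lasso, where the z-update proximal step is exactly the slot a task-specific module (designed filter or learned network) would replace. This is generic textbook ADMM under invented problem data, not the paper's algorithm.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1 - the pluggable 'module' slot."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_lasso(A, b, lam=0.1, rho=1.0, iters=200, prox=soft_threshold):
    """ADMM for min_x 0.5*||Ax - b||^2 + lam*||z||_1  s.t.  x = z."""
    n = A.shape[1]
    x, z, u = (np.zeros(n) for _ in range(3))
    inv = np.linalg.inv(A.T @ A + rho * np.eye(n))
    Atb = A.T @ b
    for _ in range(iters):
        x = inv @ (Atb + rho * (z - u))   # quadratic subproblem
        z = prox(x + u, lam / rho)        # proximal / task-specific step
        u = u + x - z                     # dual ascent on the constraint
    return z

# For A = I the lasso solution is soft_threshold(b, lam) exactly.
z = admm_lasso(np.eye(3), np.array([2.0, 0.05, -1.0]), lam=0.1)
```

Replacing `prox` with an arbitrary learned mapping is precisely what breaks the classical convergence theory and motivates the paper's analysis.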

Minimal Sample Subspace Learning: Theory and Algorithms

Title Minimal Sample Subspace Learning: Theory and Algorithms
Authors Zhenyue Zhang, Yuqing Xia
Abstract Subspace segmentation, or subspace learning, is a challenging and complicated task in machine learning. This paper builds a foundational framework and solid theoretical basis for the minimal subspace segmentation (MSS) of finite samples. Existence and conditional uniqueness of the MSS are discussed under conditions generally satisfied in applications. Utilizing weak prior information about the MSS, the minimality inspection of segments is further simplified to the prior detection of partitions. The MSS problem is then modeled as a computable optimization problem via self-expressiveness of samples. A closed form of the representation matrices is first given for the self-expressiveness, and the connection of diagonal blocks is then addressed. The MSS model uses a rank restriction on the sum of segment ranks. Theoretically, it can retrieve minimal sample subspaces even when they are heavily intersected. The optimization problem is solved via a basic manifold conjugate gradient algorithm, alternating optimization, and hybrid optimization, taking into account both the primal MSS problem and its pseudo-dual problem. The MSS model is further modified for handling noisy data and solved by an ADMM algorithm. The reported experiments show the strong ability of the MSS method to retrieve minimal sample subspaces that are heavily intersected.
Tasks
Published 2019-07-13
URL https://arxiv.org/abs/1907.06032v2
PDF https://arxiv.org/pdf/1907.06032v2.pdf
PWC https://paperswithcode.com/paper/minimal-sample-subspace-learning-theory-and
Repo
Framework
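
The self-expressiveness device the MSS formulation builds on can be sketched with a ridge-regularised version: express each sample as a combination of the others, so that large coefficients indicate a shared subspace. The paper's actual model adds rank restrictions and manifold optimization on top; the data and regulariser below are illustrative.

```python
import numpy as np

def self_expressive_coeffs(X, lam=1e-3):
    """Express each sample (column of X) as a ridge-regularised combination
    of the others; large |C[i, j]| suggests samples i, j share a subspace."""
    n = X.shape[1]
    C = np.zeros((n, n))
    for j in range(n):
        others = [i for i in range(n) if i != j]
        A = X[:, others]
        c = np.linalg.solve(A.T @ A + lam * np.eye(n - 1), A.T @ X[:, j])
        C[others, j] = c
    return C

# Columns 0-1 span one axis, columns 2-3 another (two 1-dim subspaces).
X = np.array([[1.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 3.0]])
C = self_expressive_coeffs(X)
```

Because the two toy subspaces are orthogonal, cross-subspace coefficients vanish and the nonzero pattern of `C` reveals the segmentation.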

FoodAI: Food Image Recognition via Deep Learning for Smart Food Logging

Title FoodAI: Food Image Recognition via Deep Learning for Smart Food Logging
Authors Doyen Sahoo, Wang Hao, Shu Ke, Wu Xiongwei, Hung Le, Palakorn Achananuparp, Ee-Peng Lim, Steven C. H. Hoi
Abstract An important aspect of health monitoring is effective logging of food consumption. This can help with the management of diet-related diseases such as obesity, diabetes, and even cardiovascular disease. Moreover, food logging can help fitness enthusiasts and people wanting to achieve a target weight. However, food logging is cumbersome: it requires not only the effort of regularly noting down the food items consumed, but also sufficient knowledge of those items (which is difficult given the wide variety of cuisines available). With increasing reliance on smart devices, we exploit the convenience offered by smart phones and propose a smart food-logging system, FoodAI, which offers state-of-the-art deep-learning based image recognition capabilities. FoodAI has been developed in Singapore and is particularly focused on food items commonly consumed there. FoodAI models were trained on a corpus of 400,000 food images from 756 different classes. In this paper we present extensive analysis of and insights into the development of this system. FoodAI has been deployed as an API service and is one of the components powering Healthy 365, a mobile app developed by Singapore’s Health Promotion Board. Over 100 registered organizations (universities, companies, start-ups) subscribe to this service, which actively receives several API requests a day. FoodAI has made food logging convenient, aiding smart consumption and a healthy lifestyle.
Tasks
Published 2019-09-26
URL https://arxiv.org/abs/1909.11946v1
PDF https://arxiv.org/pdf/1909.11946v1.pdf
PWC https://paperswithcode.com/paper/foodai-food-image-recognition-via-deep
Repo
Framework

Reinforcement Learning of Markov Decision Processes with Peak Constraints

Title Reinforcement Learning of Markov Decision Processes with Peak Constraints
Authors Ather Gattami
Abstract In this paper, we consider reinforcement learning for Markov Decision Processes (MDPs) with peak constraints, where an agent chooses a policy to optimize an objective while satisfying additional constraints. The agent has to take actions based on the observed states, reward outputs, and constraint outputs, without any knowledge of the dynamics, reward functions, or constraint functions. We introduce a game-theoretic approach to constructing reinforcement learning algorithms in which the agent maximizes an unconstrained objective that depends on the simulated action of a minimizing opponent acting on a finite set of actions and on the output data of the constraint functions (rewards). We show that the policies obtained from maximin Q-learning converge to the optimal policies. To the best of our knowledge, this is the first time learning algorithms have been guaranteed to converge to optimal stationary policies for the MDP problem with peak constraints, for both discounted and expected average rewards.
Tasks Q-Learning
Published 2019-01-23
URL https://arxiv.org/abs/1901.07839v2
PDF https://arxiv.org/pdf/1901.07839v2.pdf
PWC https://paperswithcode.com/paper/reinforcement-learning-of-markov-decision
Repo
Framework
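
The maximin construction can be caricatured in a single-state (bandit) setting: the opponent chooses whether the agent is scored on the objective channel or on a constraint channel, and the agent maximises the worst case. This is a drastic simplification for illustration only; the paper's algorithm handles full MDPs with discounted and average rewards, and the payoffs below are invented.

```python
import numpy as np

rng = np.random.default_rng(3)

# Single-state toy problem. Agent action 0: high reward but violates the
# peak constraint; action 1: lower reward, constraint satisfied (>= 0).
rewards = np.array([2.0, 1.0])
constraint = np.array([-1.0, 0.5])

def payoff(a, o):
    """Signal the agent receives: the opponent o selects the objective
    channel (o = 0) or the constraint channel (o = 1)."""
    return rewards[a] if o == 0 else constraint[a]

# Tabular Q-learning over (agent action, opponent action) from noisy samples.
Q = np.zeros((2, 2))
for _ in range(5000):
    a, o = rng.integers(2), rng.integers(2)
    r = payoff(a, o) + 0.05 * rng.normal()
    Q[a, o] += 0.1 * (r - Q[a, o])

greedy = int(np.argmax(Q[:, 0]))         # ignores constraints: action 0
maximin = int(np.argmax(Q.min(axis=1)))  # maximin policy: action 1
```

The maximin policy forgoes some reward to keep the constraint channel nonnegative, which is the behaviour the convergence result guarantees in general.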

A novel Multiplicative Polynomial Kernel for Volterra series identification

Title A novel Multiplicative Polynomial Kernel for Volterra series identification
Authors Alberto Dalla Libera, Ruggero Carli, Gianluigi Pillonetto
Abstract Volterra series are especially useful for nonlinear system identification, thanks in part to their capability to approximate a broad range of input-output maps. However, their identification from a finite set of data is hard, due to the curse of dimensionality. Recent approaches have shown how regularized kernel-based methods can be useful for this task. In this paper, we propose a new regularization network for Volterra model identification. It relies on a new kernel given by the product of basic building blocks. Each block contains unknown parameters that can be estimated from data using marginal likelihood optimization. In comparison with other algorithms proposed in the literature, numerical experiments show that our approach better selects the monomials that really influence the system output, greatly increasing the predictive capability of the model.
Tasks
Published 2019-05-20
URL https://arxiv.org/abs/1905.07960v2
PDF https://arxiv.org/pdf/1905.07960v2.pdf
PWC https://paperswithcode.com/paper/a-novel-multiplicative-polynomial-kernel-for
Repo
Framework
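
The product-of-blocks construction can be sketched as follows: each block is a weighted linear kernel plus an offset, and multiplying r blocks yields a degree-r polynomial kernel whose per-block parameters could be tuned by marginal likelihood. The exact parametrisation in the paper differs; this is only the structural idea, with made-up parameters.

```python
import numpy as np

def multiplicative_poly_kernel(X, Y, weights, offsets):
    """Product-of-blocks kernel: each factor is a per-dimension-weighted
    linear kernel plus a nonnegative offset. The product of valid kernels
    is itself a valid (positive semidefinite) kernel."""
    K = np.ones((X.shape[0], Y.shape[0]))
    for w, c in zip(weights, offsets):
        K = K * ((X * w) @ Y.T + c)
    return K

X = np.array([[1.0, 2.0]])
Y = np.array([[3.0, 4.0]])
# One block: (x . y) + 1 = 12; two identical blocks: ((x . y) + 1)^2 = 144.
one_block = multiplicative_poly_kernel(X, Y, [np.ones(2)], [1.0])
two_blocks = multiplicative_poly_kernel(X, Y, [np.ones(2)] * 2, [1.0] * 2)

# PSD check on a random sample set with nonnegative weights and offsets.
rng = np.random.default_rng(4)
Z = rng.normal(size=(5, 2))
G = multiplicative_poly_kernel(Z, Z, [np.ones(2)] * 2, [1.0] * 2)
```

Zeroing a block's weight for some coordinate removes the corresponding monomials from the model, which is the mechanism behind the monomial selection the abstract describes.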