January 25, 2020


Paper Group ANR 1616

Diagnosis of Pediatric Obstructive Sleep Apnea via Face Classification with Persistent Homology and Convolutional Neural Networks. ConvPoseCNN: Dense Convolutional 6D Object Pose Estimation. Explain Your Move: Understanding Agent Actions Using Salient and Relevant Feature Attribution. Meta-Learning with Dynamic-Memory-Based Prototypical Network for …

Diagnosis of Pediatric Obstructive Sleep Apnea via Face Classification with Persistent Homology and Convolutional Neural Networks


Title	Diagnosis of Pediatric Obstructive Sleep Apnea via Face Classification with Persistent Homology and Convolutional Neural Networks
Authors	Milad Kiaee, Adam B Kashlak, Jisu Kim, Giseon Heo
Abstract	Obstructive sleep apnea is a serious condition that causes a litany of health problems, especially in the pediatric population. However, this chronic condition can be treated if it is diagnosed. The gold standard for diagnosis is an overnight sleep study, which is often unavailable to many of those who may suffer from this condition. Hence, we attempt to develop a fast, non-invasive diagnostic tool by training a classifier on 2D and 3D facial images of a patient to recognize facial features associated with obstructive sleep apnea. In this comparative study, we consider both persistent homology and geometric shape analysis from the field of computational topology, as well as convolutional neural networks, a powerful deep learning method whose success in image recognition, and specifically facial recognition, has already been demonstrated.
Tasks
Published	2019-10-26
URL	https://arxiv.org/abs/1911.05628v1
PDF	https://arxiv.org/pdf/1911.05628v1.pdf
PWC	https://paperswithcode.com/paper/diagnosis-of-pediatric-obstructive-sleep
Repo
Framework
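
Persistent homology summarizes the shape of data by tracking when topological features appear and disappear as a scale parameter grows. The abstract does not say which filtration or which facial measurements the authors use, so the following is only a minimal sketch of the simplest case: 0-dimensional persistence (connected components) of a Vietoris-Rips filtration on a point cloud, which reduces to Kruskal's minimum-spanning-tree algorithm. The function name `h0_persistence` and the toy input are our own, not the paper's.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def h0_persistence(points: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """0-dimensional (birth, death) pairs of the Vietoris-Rips filtration.

    Every point is born at scale 0; a component dies when an edge first
    merges it into another component (Kruskal's algorithm with union-find).
    """
    parent = list(range(len(points)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, shortest first.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    pairs: List[Tuple[float, float]] = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            pairs.append((0.0, length))  # one component dies at this scale
    if points:
        pairs.append((0.0, math.inf))  # the last surviving component never dies
    return pairs
```

For three collinear points at x = 0, 1, 3, `h0_persistence([(0, 0), (1, 0), (3, 0)])` returns `[(0.0, 1.0), (0.0, 2.0), (0.0, inf)]`. Higher-dimensional features (loops, voids) need a full persistence library rather than this union-find shortcut.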

ConvPoseCNN: Dense Convolutional 6D Object Pose Estimation


Title	ConvPoseCNN: Dense Convolutional 6D Object Pose Estimation
Authors	Catherine Capellen, Max Schwarz, Sven Behnke
Abstract	6D object pose estimation is a prerequisite for many applications. In recent years, monocular pose estimation has attracted much research interest because it does not need depth measurements. In this work, we introduce ConvPoseCNN, a fully convolutional architecture that avoids cutting out individual objects. Instead, we propose pixel-wise, dense prediction of both translation and orientation components of the object pose, where the dense orientation is represented in quaternion form. We present different approaches for aggregation of the dense orientation predictions, including averaging and clustering schemes. We evaluate ConvPoseCNN on the challenging YCB-Video Dataset, where we show that the approach has far fewer parameters and trains faster than comparable methods without sacrificing accuracy. Furthermore, our results indicate that the dense orientation prediction implicitly learns to attend to trustworthy, occlusion-free, and feature-rich object regions.
Tasks	6D Pose Estimation using RGB, Pose Estimation
Published	2019-12-16
URL	https://arxiv.org/abs/1912.07333v1
PDF	https://arxiv.org/pdf/1912.07333v1.pdf
PWC	https://paperswithcode.com/paper/convposecnn-dense-convolutional-6d-object
Repo
Framework
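
ConvPoseCNN predicts an orientation quaternion at every pixel and then aggregates them. One wrinkle any averaging scheme has to handle is that q and -q encode the same rotation. The sketch below is not the paper's aggregation; it is a simple approximation, assuming the per-pixel predictions are already clustered around one orientation: flip each quaternion into the hemisphere of a reference, take a weighted sum, and renormalize. The (w, x, y, z) convention, the optional `weights` (for example, hypothetical per-pixel confidences), and the function name are our assumptions.

```python
import math
from typing import Optional, Sequence, Tuple

Quat = Tuple[float, float, float, float]  # (w, x, y, z), assumed unit length


def average_quaternions(
    quats: Sequence[Quat], weights: Optional[Sequence[float]] = None
) -> Quat:
    """Sign-aligned, weighted, renormalized mean of nearby unit quaternions."""
    if not quats:
        raise ValueError("need at least one quaternion")
    if weights is None:
        weights = [1.0] * len(quats)
    ref = quats[0]
    acc = [0.0, 0.0, 0.0, 0.0]
    for q, w in zip(quats, weights):
        # q and -q are the same rotation; flip q into ref's hemisphere.
        sign = -1.0 if sum(a * b for a, b in zip(q, ref)) < 0 else 1.0
        for i in range(4):
            acc[i] += w * sign * q[i]
    norm = math.sqrt(sum(c * c for c in acc))
    if norm == 0.0:
        raise ValueError("predictions cancel out; mean is undefined")
    return (acc[0] / norm, acc[1] / norm, acc[2] / norm, acc[3] / norm)
```

Averaging (1, 0, 0, 0) with (-1, 0, 0, 0) returns the identity rather than a zero vector, which is the point of the sign flip. For widely spread predictions, an eigenvector-based quaternion average is more robust than this sign-aligned mean.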

Explain Your Move: Understanding Agent Actions Using Salient and Relevant Feature Attribution


Title	Explain Your Move: Understanding Agent Actions Using Salient and Relevant Feature Attribution
Authors	Nikaash Puri, Sukriti Verma, Piyush Gupta, Dhruv Kayastha, Shripad Deshmukh, Balaji Krishnamurthy, Sameer Singh
Abstract	As deep reinforcement learning (RL) is applied to more tasks, there is a need to visualize and understand the behavior of learned agents. Saliency maps explain agent behavior by highlighting the features of the input state that are most relevant for the agent in taking an action. Existing perturbation-based approaches to compute saliency often highlight regions of the input that are not relevant to the action taken by the agent. Our approach, SARFA, generates more focused saliency maps by balancing two aspects (specificity and relevance) that capture different desiderata of saliency. The first captures the impact of perturbation on the relative expected reward of the action to be explained. The second down-weights irrelevant features that alter the relative expected rewards of actions other than the action to be explained. We compare SARFA with existing approaches on agents trained to play board games (Chess and Go) and Atari games (Breakout, Pong and Space Invaders). We show through illustrative examples (Chess, Atari, Go), human studies (Chess), and automated evaluation methods (Chess) that SARFA generates saliency maps that are more interpretable for humans than existing approaches. For the code release and demo videos, see: https://nikaashpuri.github.io/sarfa-saliency/.
Tasks	Atari Games, Board Games
Published	2019-12-23
URL	https://arxiv.org/abs/1912.12191v3
PDF	https://arxiv.org/pdf/1912.12191v3.pdf
PWC	https://paperswithcode.com/paper/explain-your-move-understanding-agent-actions-1
Repo
Framework
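
The abstract describes SARFA's saliency as a balance of specificity (how much a perturbation changes the explained action's relative expected reward) and relevance (down-weighting perturbations that mostly shift the other actions). The following is a minimal sketch of that idea for a single perturbed feature, under assumptions we state plainly: Q-values are turned into probabilities with a softmax, relevance is 1 / (1 + KL) over the remaining actions (the KL direction is our choice), and the two terms are combined with a harmonic mean. Check the released code at the project page linked above for the exact formulation.

```python
import math
from typing import Dict

QValues = Dict[str, float]  # action name -> Q-value (hypothetical format)


def softmax(q: QValues) -> Dict[str, float]:
    top = max(q.values())
    exps = {a: math.exp(v - top) for a, v in q.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    return sum(pv * math.log(pv / q[a]) for a, pv in p.items() if pv > 0)


def saliency(q_orig: QValues, q_pert: QValues, action: str) -> float:
    """Assumed SARFA-style score for one perturbation of one input feature."""
    p_orig, p_pert = softmax(q_orig), softmax(q_pert)
    specificity = p_orig[action] - p_pert[action]  # drop in the action's probability
    if specificity <= 0:
        return 0.0  # perturbation did not make the explained action less likely
    others_orig = {a: v for a, v in q_orig.items() if a != action}
    others_pert = {a: v for a, v in q_pert.items() if a != action}
    if others_orig:
        # Penalize perturbations that reshuffle the remaining actions.
        divergence = kl_divergence(softmax(others_pert), softmax(others_orig))
        relevance = 1.0 / (1.0 + divergence)
    else:
        relevance = 1.0
    # Harmonic mean: high only when both terms are high.
    return 2.0 * relevance * specificity / (relevance + specificity)
```

The harmonic mean is one reasonable way to "balance" the two terms: a feature whose perturbation changes the chosen action's probability but also scrambles the alternatives gets a lower score than one that affects only the chosen action.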

Meta-Learning with Dynamic-Memory-Based Prototypical Network for Few-Shot Event Detection


Title	Meta-Learning with Dynamic-Memory-Based Prototypical Network for Few-Shot Event Detection
Authors	Shumin Deng, Ningyu Zhang, Jiaojian Kang, Yichi Zhang, Wei Zhang, Huajun Chen
Abstract	Event detection (ED), a sub-task of event extraction, involves identifying triggers and categorizing event mentions. Existing methods primarily rely on supervised learning and require large-scale labeled event datasets, which are unfortunately not readily available in many real-life applications. In this paper, we reformulate the ED task with limited labeled data as a Few-Shot Learning problem. We propose a Dynamic-Memory-Based Prototypical Network (DMB-PN), which exploits a Dynamic Memory Network (DMN) not only to learn better prototypes for event types but also to produce more robust sentence encodings for event mentions. Unlike vanilla prototypical networks, which compute event prototypes by simple averaging and therefore consume each event mention only once, our model is more robust and can distill contextual information from event mentions multiple times thanks to the multi-hop mechanism of DMNs. Experiments show that DMB-PN not only copes with sample scarcity better than a series of baseline models but also performs more robustly when the number of event types is relatively large and the number of instances is extremely small.
Tasks	Few-Shot Learning, Meta-Learning
Published	2019-10-25
URL	https://arxiv.org/abs/1910.11621v2
PDF	https://arxiv.org/pdf/1910.11621v2.pdf
PWC	https://paperswithcode.com/paper/meta-learning-with-dynamic-memory-based
Repo
Framework
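
For contrast with DMB-PN, here is a minimal sketch of the vanilla prototypical-network step the abstract describes: each event type's prototype is the plain average of its support-set embeddings, and a query is classified by a softmax over negative squared Euclidean distances. The embeddings are toy 2D vectors standing in for sentence encodings, and the event labels are hypothetical; DMB-PN replaces the single averaging pass with multi-hop dynamic memory, which this sketch does not implement.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def prototypes(support: Dict[str, Sequence[Vector]]) -> Dict[str, List[float]]:
    """Vanilla prototypes: the mean embedding of each class's support examples."""
    protos = {}
    for label, embeddings in support.items():
        dim = len(embeddings[0])
        protos[label] = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
    return protos


def classify(query: Vector, protos: Dict[str, List[float]]) -> Dict[str, float]:
    """Softmax over negative squared Euclidean distances to each prototype."""
    logits = {
        label: -sum((q - p) ** 2 for q, p in zip(query, proto))
        for label, proto in protos.items()
    }
    top = max(logits.values())
    exps = {label: math.exp(v - top) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}
```

Each support embedding contributes to its prototype exactly once through the mean, which is the "consume event mentions once" limitation the abstract contrasts with DMN's multiple hops.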

A Seft-adaptive Multicellular GEP Algorithm Based On Fuzzy Control For Function Optimization


Title	A Seft-adaptive Multicellular GEP Algorithm Based On Fuzzy Control For Function Optimization
Authors	Chuyan Deng, Yuzhong Peng, Hongya Li, Daoqing Gong, Hao Zhang, Zhiping Liu
Abstract	To improve the global optimization ability of the traditional GEP algorithm, a multicellular gene expression programming algorithm based on fuzzy control (MGEP-FC) is proposed. MGEP-FC describes the magnitude of the crossover rate, mutation rate, and real-number mutation rate by constructing fuzzy membership functions. Based on the concentration and dispersion of individual fitness values in the population, the crossover rate, mutation rate, and real-number set mutation rate of the genetic operations are adjusted dynamically. To maintain population diversity throughout the iterative process, a new genetic operation scheme is designed that combines the new individuals with the parent population into a temporary population and optimizes the diversity of both the temporary population and the subpopulations. Results on 12 benchmark optimization problems show that MGEP-FC substantially improves stability, global convergence, and optimization speed.
Tasks
Published	2019-04-01
URL	https://arxiv.org/abs/1906.08851v1
PDF	https://arxiv.org/pdf/1906.08851v1.pdf
PWC	https://paperswithcode.com/paper/a-seft-adaptive-multicellular-gep-algorithm
Repo
Framework
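
The abstract says MGEP-FC maps the concentration and dispersion of fitness values through fuzzy membership functions to adjust crossover and mutation rates, but gives no concrete functions. The sketch below is therefore entirely hypothetical: a normalized fitness dispersion in [0, 1] is fuzzified into low/medium/high sets with triangular memberships, each set maps to a hypothetical (crossover, mutation) rate pair, and a weighted average defuzzifies the result. All numeric rates and set boundaries are illustrative, not from the paper.

```python
from statistics import pstdev
from typing import Sequence, Tuple

# Hypothetical rule consequents: fuzzy set -> (crossover_rate, mutation_rate).
# Low dispersion (a converging population) gets more mutation to restore diversity.
RULES = {"low": (0.9, 0.10), "medium": (0.7, 0.05), "high": (0.5, 0.01)}


def triangle(x: float, left: float, peak: float, right: float) -> float:
    """Triangular membership function that is 1 at `peak` and 0 outside (left, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def dispersion(fitness: Sequence[float]) -> float:
    """Population std scaled into [0, 1] (std can be at most half the range)."""
    spread = max(fitness) - min(fitness)
    if spread == 0:
        return 0.0
    return min(1.0, 2.0 * pstdev(fitness) / spread)


def adapt_rates(fitness: Sequence[float]) -> Tuple[float, float]:
    d = dispersion(fitness)
    memberships = {
        "low": max(0.0, 1.0 - d / 0.5),  # left shoulder
        "medium": triangle(d, 0.0, 0.5, 1.0),
        "high": max(0.0, (d - 0.5) / 0.5),  # right shoulder
    }
    total = sum(memberships.values())
    # Weighted-average defuzzification over the rule outputs.
    crossover = sum(memberships[k] * RULES[k][0] for k in RULES) / total
    mutation = sum(memberships[k] * RULES[k][1] for k in RULES) / total
    return crossover, mutation
```

Because the three memberships overlap, the rates change smoothly as dispersion moves between sets instead of jumping at fixed thresholds, which is the usual motivation for fuzzy rather than rule-of-thumb rate schedules.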

Bayesian Tensorized Neural Networks with Automatic Rank Selection


Title	Bayesian Tensorized Neural Networks with Automatic Rank Selection
Authors	Cole Hawkins, Zheng Zhang
Abstract	Tensor decomposition is an effective approach to compressing over-parameterized neural networks and enabling their deployment on resource-constrained hardware platforms. However, directly applying tensor compression during training is challenging because it is difficult to choose a proper tensor rank. To address this, this paper proposes a Bayesian tensorized neural network. Our Bayesian method performs automatic model compression via adaptive tensor rank determination. We also present approaches for posterior density calculation and maximum a posteriori (MAP) estimation for the end-to-end training of our tensorized neural network. We provide experimental validation on a fully connected neural network, a CNN, and a residual neural network, where our work produces $7.4\times$ to $137\times$ more compact neural networks directly from training.
Tasks	Model Compression
Published	2019-05-24
URL	https://arxiv.org/abs/1905.10478v1
PDF	https://arxiv.org/pdf/1905.10478v1.pdf
PWC	https://paperswithcode.com/paper/bayesian-tensorized-neural-networks-with
Repo
Framework
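
Why rank choice matters for compression can be seen by counting parameters. Assuming a tensor-train matrix (TT-matrix) factorization of a dense layer (the abstract does not name the format, and the paper may use others), a layer with input modes m_k, output modes n_k, and internal ranks r_k stores sum_k r_{k-1} * m_k * n_k * r_k parameters, with boundary ranks r_0 = r_d = 1. The function names and the 1024 x 1024 example are hypothetical.

```python
from math import prod
from typing import Sequence


def tt_matrix_params(in_modes: Sequence[int], out_modes: Sequence[int], ranks: Sequence[int]) -> int:
    """Parameter count of a TT-matrix layer: sum_k r_{k-1} * m_k * n_k * r_k."""
    if not len(in_modes) == len(out_modes) == len(ranks) + 1:
        raise ValueError("need d input modes, d output modes and d - 1 internal ranks")
    r = [1, *ranks, 1]  # boundary ranks r_0 = r_d = 1
    return sum(r[k] * in_modes[k] * out_modes[k] * r[k + 1] for k in range(len(in_modes)))


def dense_params(in_modes: Sequence[int], out_modes: Sequence[int]) -> int:
    """Parameter count of the equivalent dense weight matrix (bias ignored)."""
    return prod(in_modes) * prod(out_modes)
```

With modes [4, 8, 8, 4] on both sides (1024 inputs and outputs) and ranks [8, 8, 8], the factorized layer has 8,448 parameters versus 1,048,576 dense, about 124x smaller; lowering every rank to 4 gives 2,176. Adaptive rank determination, as the paper proposes, automates choosing such ranks during training instead of fixing them by hand.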

Stereo relative pose from line and point feature triplets


Title	Stereo relative pose from line and point feature triplets
Authors	Alexander Vakhitov, Victor Lempitsky, Yinqiang Zheng
Abstract	The stereo relative pose problem lies at the core of stereo visual odometry systems, which are used in many applications. In this work, we present two minimal solvers for the stereo relative pose. We specifically consider the case in which a minimal set consists of three point or line features, each of which has three known projections on two stereo cameras. We validate the importance of this formulation for practical purposes in our experiments with motion estimation. We then present a complete classification of minimal cases with three point or line correspondences each having three projections, and present two new solvers that can handle all such cases. We demonstrate a considerable effect from integrating the new solvers into a visual SLAM system.
Tasks	Motion Estimation, Visual Odometry
Published	2019-06-29
URL	https://arxiv.org/abs/1907.00276v1
PDF	https://arxiv.org/pdf/1907.00276v1.pdf
PWC	https://paperswithcode.com/paper/stereo-relative-pose-from-line-and-point-1
Repo
Framework

Extracting Tables from Documents using Conditional Generative Adversarial Networks and Genetic Algorithms


Title	Extracting Tables from Documents using Conditional Generative Adversarial Networks and Genetic Algorithms
Authors	Nataliya Le Vine, Matthew Zeigenfuse, Mark Rowan
Abstract	Extracting information from tables in documents presents a significant challenge in many industries and in academic research. Existing methods which take a bottom-up approach of integrating lines into cells and rows or columns neglect the available prior information relating to table structure. Our proposed method takes a top-down approach, first using a generative adversarial network to map a table image into a standardised ‘skeleton’ table form denoting the approximate row and column borders without table content, then fitting renderings of candidate latent table structures to the skeleton structure using a distance measure optimised by a genetic algorithm.
Tasks
Published	2019-04-03
URL	http://arxiv.org/abs/1904.01947v1
PDF	http://arxiv.org/pdf/1904.01947v1.pdf
PWC	https://paperswithcode.com/paper/extracting-tables-from-documents-using
Repo
Framework
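
The abstract's second stage fits rendered candidate table structures to the GAN-produced skeleton using a distance optimised by a genetic algorithm. The sketch below is a deliberately reduced, hypothetical version: it fits only row borders in one dimension, renders a candidate as a binary mask, scores it against the skeleton with a symmetric chamfer distance, and evolves candidates with elitism, uniform crossover, and small positional mutations. The distance choice, GA settings, and all names are our assumptions, not the authors' method.

```python
import random
from typing import List, Sequence


def render(borders: Sequence[int], height: int) -> List[int]:
    """Render candidate row borders as a 1D binary mask."""
    mask = [0] * height
    for b in borders:
        mask[b] = 1
    return mask


def chamfer(a: Sequence[int], b: Sequence[int]) -> float:
    """Symmetric mean distance between the 'on' positions of two masks."""
    pa = [i for i, v in enumerate(a) if v]
    pb = [i for i, v in enumerate(b) if v]
    if not pa or not pb:
        return float("inf")
    a_to_b = sum(min(abs(i - j) for j in pb) for i in pa) / len(pa)
    b_to_a = sum(min(abs(i - j) for j in pa) for i in pb) / len(pb)
    return a_to_b + b_to_a


def fit_row_borders(skeleton: Sequence[int], n_borders: int, pop_size: int = 40,
                    generations: int = 80, seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    height = len(skeleton)

    def cost(candidate: Sequence[int]) -> float:
        return chamfer(render(candidate, height), skeleton)

    population = [sorted(rng.sample(range(height), n_borders)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        elite = population[: pop_size // 4]  # elitism: the best quarter survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            child = [rng.choice(pair) for pair in zip(p1, p2)]  # uniform crossover
            if rng.random() < 0.3:  # mutation: nudge one border by a few pixels
                i = rng.randrange(n_borders)
                child[i] = min(height - 1, max(0, child[i] + rng.choice([-2, -1, 1, 2])))
            children.append(sorted(child))
        population = elite + children
    return min(population, key=cost)
```

Elitism guarantees the best candidate's distance never increases across generations; a real system would fit rows and columns jointly on 2D renderings.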

Regularize, Expand and Compress: Multi-task based Lifelong Learning via NonExpansive AutoML


Title	Regularize, Expand and Compress: Multi-task based Lifelong Learning via NonExpansive AutoML
Authors	Jie Zhang, Junting Zhang, Shalini Ghosh, Dawei Li, Jingwen Zhu, Heming Zhang, Yalin Wang
Abstract	Lifelong learning, the problem of continual learning where tasks arrive in sequence, has lately been attracting more attention in the computer vision community. The aim of lifelong learning is to develop a system that can learn new tasks while maintaining performance on previously learned tasks. However, lifelong learning of deep neural networks faces two obstacles: catastrophic forgetting and capacity limitation. To address these issues, and inspired by recent breakthroughs in automatically learning good neural network architectures, we develop a multi-task based lifelong learning framework via nonexpansive AutoML, termed Regularize, Expand and Compress (REC). REC consists of three stages: 1) it continually learns the sequential tasks without the learned tasks’ data via a newly proposed multi-task weight consolidation (MWC) algorithm; 2) it expands the network via network-transformation-based AutoML to aid lifelong learning with potentially improved model capability and performance; 3) it compresses the expanded model after learning each new task to maintain model efficiency and performance. The proposed MWC and REC algorithms achieve superior performance over other lifelong learning algorithms on four different datasets.
Tasks	AutoML, Continual Learning
Published	2019-03-20
URL	http://arxiv.org/abs/1903.08362v1
PDF	http://arxiv.org/pdf/1903.08362v1.pdf
PWC	https://paperswithcode.com/paper/regularize-expand-and-compress-multi-task
Repo
Framework
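
The abstract does not define multi-task weight consolidation (MWC). Assuming it resembles elastic weight consolidation (EWC), a common way to learn new tasks without old data is to add a quadratic penalty that keeps each parameter near its value after earlier tasks, weighted by a per-parameter importance (for example, a Fisher-information estimate). A minimal sketch that sums one such term per previous task follows; the function name, argument layout, and `lam` weighting are hypothetical.

```python
from typing import Sequence


def consolidation_penalty(
    params: Sequence[float],
    anchors: Sequence[Sequence[float]],
    importances: Sequence[Sequence[float]],
    lam: float = 1.0,
) -> float:
    """Assumed EWC-style penalty: (lam / 2) * sum_t sum_i F[t][i] * (theta_i - theta*[t][i])^2.

    anchors[t] holds the parameters saved after task t; importances[t] their weights.
    """
    total = 0.0
    for anchor, importance in zip(anchors, importances):
        total += sum(f * (p - a) ** 2 for p, a, f in zip(params, anchor, importance))
    return 0.5 * lam * total
```

During training on a new task, this term would be added to that task's loss, so parameters deemed important for earlier tasks are expensive to move while unimportant ones stay free to change.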

A Large-Scale Deep Architecture for Personalized Grocery Basket Recommendations


Title	A Large-Scale Deep Architecture for Personalized Grocery Basket Recommendations
Authors	Aditya Mantha, Yokila Arora, Shubham Gupta, Praveenkumar Kanumala, Zhiwei Liu, Stephen Guo, Kannan Achan
Abstract	With growing consumer adoption of online grocery shopping through platforms such as Amazon Fresh, Instacart, and Walmart Grocery, there is a pressing business need to provide relevant recommendations throughout the customer journey. In this paper, we introduce a production within-basket grocery recommendation system, RTT2Vec, which generates real-time personalized product recommendations to supplement the user’s current grocery basket. We conduct extensive offline evaluation of our system and demonstrate a 9.4% uplift in prediction metrics over state-of-the-art baseline within-basket recommendation models. We also propose an approximate inference technique that is 11.6x faster than exact inference approaches. In production, our system has increased average basket size, improved product discovery, and enabled faster user check-out.
Tasks
Published	2019-10-24
URL	https://arxiv.org/abs/1910.12757v3
PDF	https://arxiv.org/pdf/1910.12757v3.pdf
PWC	https://paperswithcode.com/paper/a-large-scale-deep-architecture-for
Repo
Framework
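
The abstract does not describe RTT2Vec's scoring function. As a hypothetical illustration of within-basket recommendation with item embeddings, and of the exact inference that an approximate method would speed up, the sketch below averages the embeddings of items already in the basket and ranks every other item by dot product. The embeddings, item names, and mean-of-basket query are our assumptions.

```python
import heapq
from typing import Dict, List, Sequence


def recommend(basket: Sequence[str], item_vecs: Dict[str, Sequence[float]], k: int = 3) -> List[str]:
    """Exact top-k: score every non-basket item against the mean basket embedding."""
    dim = len(next(iter(item_vecs.values())))
    query = [sum(item_vecs[item][d] for item in basket) / len(basket) for d in range(dim)]
    in_basket = set(basket)
    scored = (
        (sum(q * v for q, v in zip(query, vec)), item)
        for item, vec in item_vecs.items()
        if item not in in_basket
    )
    return [item for _, item in heapq.nlargest(k, scored)]
```

Exact scoring touches every catalog item on each request; that linear scan is the cost an approximate inference technique is meant to avoid at production catalog sizes.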

Single-Stage 6D Object Pose Estimation


Title	Single-Stage 6D Object Pose Estimation
Authors	Yinlin Hu, Pascal Fua, Wei Wang, Mathieu Salzmann
Abstract	Most recent 6D pose estimation frameworks first rely on a deep network to establish correspondences between 3D object keypoints and 2D image locations and then use a variant of a RANSAC-based Perspective-n-Point (PnP) algorithm. This two-stage process, however, is suboptimal: First, it is not end-to-end trainable. Second, training the deep network relies on a surrogate loss that does not directly reflect the final 6D pose estimation task. In this work, we introduce a deep architecture that directly regresses 6D poses from correspondences. It takes as input a group of candidate correspondences for each 3D keypoint and accounts for the fact that the order of the correspondences within each group is irrelevant, while the order of the groups, that is, of the 3D keypoints, is fixed. Our architecture is generic and can thus be exploited in conjunction with existing correspondence-extraction networks so as to yield single-stage 6D pose estimation frameworks. Our experiments demonstrate that these single-stage frameworks consistently outperform their two-stage counterparts in terms of both accuracy and speed.
Tasks	6D Pose Estimation, 6D Pose Estimation using RGB, Pose Estimation
Published	2019-11-19
URL	https://arxiv.org/abs/1911.08324v2
PDF	https://arxiv.org/pdf/1911.08324v2.pdf
PWC	https://paperswithcode.com/paper/single-stage-6d-object-pose-estimation
Repo
Framework
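
The architecture described above must be invariant to the order of candidate correspondences within a keypoint's group but sensitive to the fixed order of the groups. A minimal sketch of how that structure can be encoded: apply a shared per-correspondence function, max-pool within each group (order-invariant), and concatenate group features in keypoint order. The `point_feature` function is a hypothetical stand-in for a learned shared network; this is not the paper's full model.

```python
from typing import List, Sequence, Tuple

Correspondence = Tuple[float, float]  # a candidate 2D image location for one 3D keypoint


def point_feature(c: Correspondence) -> List[float]:
    # Hypothetical stand-in for a learned network shared by all correspondences.
    x, y = c
    return [x, y, x * y, x - y]


def group_feature(group: Sequence[Correspondence]) -> List[float]:
    # Element-wise max over candidates: the result ignores their order.
    feats = [point_feature(c) for c in group]
    return [max(column) for column in zip(*feats)]


def encode(groups: Sequence[Sequence[Correspondence]]) -> List[float]:
    # Groups follow the fixed 3D-keypoint order, so concatenation preserves it.
    out: List[float] = []
    for group in groups:
        out.extend(group_feature(group))
    return out
```

Shuffling candidates inside a group leaves `encode` unchanged, while swapping two groups changes it, mirroring the two symmetry requirements stated in the abstract.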

Accurate 6D Object Pose Estimation by Pose Conditioned Mesh Reconstruction


Title	Accurate 6D Object Pose Estimation by Pose Conditioned Mesh Reconstruction
Authors	Pedro Castro, Anil Armagan, Tae-Kyun Kim
Abstract	Current 6D object pose methods consist of deep CNN models fully optimized for a single object, but with an architecture standardized across objects with different shapes. In contrast to previous works, we explicitly exploit each object’s distinct topological information, i.e., dense 3D meshes, in the pose estimation model, with an automated process and prior to any post-processing refinement stage. To achieve this, we propose a learning framework in which a Graph Convolutional Neural Network reconstructs a pose-conditioned 3D mesh of the object. A robust estimate of the allocentric orientation is recovered by computing, in a differentiable manner, the Procrustes alignment between the canonical and reconstructed dense 3D meshes. The 6D egocentric pose is then lifted using additional mask and 2D centroid projection estimations. Our method can self-validate its pose estimate by measuring the quality of the reconstructed mesh, which is invaluable in real-life applications. In our experiments on the LINEMOD, OCCLUSION, and YCB-Video benchmarks, the proposed method outperforms state-of-the-art methods.
Tasks	6D Pose Estimation using RGB, Pose Estimation
Published	2019-10-23
URL	https://arxiv.org/abs/1910.10653v1
PDF	https://arxiv.org/pdf/1910.10653v1.pdf
PWC	https://paperswithcode.com/paper/accurate-6d-object-pose-estimation-by-pose
Repo
Framework
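
The Procrustes alignment mentioned above recovers the rotation that best maps the canonical mesh onto the reconstructed one. A minimal NumPy sketch of the standard SVD-based solution (the Kabsch algorithm) follows, assuming the two meshes have corresponding vertices in the same order. It is not differentiable as written; the paper computes the alignment in a differentiable manner, which would require running the same steps in an autodiff framework.

```python
import numpy as np


def procrustes_rotation(canonical: np.ndarray, reconstructed: np.ndarray) -> np.ndarray:
    """Rotation R minimizing sum_i ||R p_i - q_i||^2 after centring (Kabsch).

    canonical, reconstructed: (N, 3) arrays whose rows correspond vertex by vertex.
    """
    p = canonical - canonical.mean(axis=0)
    q = reconstructed - reconstructed.mean(axis=0)
    h = p.T @ q  # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so R is a proper rotation (det = +1), not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T
```

Centring removes any translation, so the function returns only the orientation; in the paper's pipeline the translation is recovered separately from mask and centroid estimates.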

Duty to Warn in Strategic Games


Title	Duty to Warn in Strategic Games
Authors	Pavel Naumov, Jia Tao
Abstract	The paper investigates the second-order blameworthiness or duty to warn modality “one coalition knew how another coalition could have prevented an outcome”. The main technical result is a sound and complete logical system that describes the interplay between the distributed knowledge and the duty to warn modalities.
Tasks
Published	2019-11-08
URL	https://arxiv.org/abs/1912.02759v2
PDF	https://arxiv.org/pdf/1912.02759v2.pdf
PWC	https://paperswithcode.com/paper/duty-to-warn-in-strategic-games
Repo
Framework

Meta Learning for End-to-End Low-Resource Speech Recognition


Title	Meta Learning for End-to-End Low-Resource Speech Recognition
Authors	Jui-Yang Hsu, Yuan-Jui Chen, Hung-yi Lee
Abstract	In this paper, we propose applying a meta-learning approach to low-resource automatic speech recognition (ASR). We formulate ASR for different languages as different tasks and meta-learn the initialization parameters from many pretraining languages to achieve fast adaptation to an unseen target language, via the recently proposed model-agnostic meta-learning (MAML) algorithm. We evaluate the proposed approach using six languages as pretraining tasks and four languages as target tasks. Preliminary results show that the proposed method, MetaASR, significantly outperforms the state-of-the-art multitask pretraining approach on all target languages with different combinations of pretraining languages. In addition, owing to MAML’s model-agnostic property, this paper also opens a new research direction of applying meta-learning to more speech-related applications.
Tasks	Meta-Learning, Speech Recognition
Published	2019-10-26
URL	https://arxiv.org/abs/1910.12094v1
PDF	https://arxiv.org/pdf/1910.12094v1.pdf
PWC	https://paperswithcode.com/paper/meta-learning-for-end-to-end-low-resource
Repo
Framework
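
MAML meta-learns an initialization from which a few gradient steps adapt well to a new task. The sketch below uses the first-order approximation (FOMAML) on a toy problem, not ASR: each task is 1D linear regression y = a * x, the model is y = w * x, one inner step adapts w to a task, and the outer update uses the gradient at the adapted parameter. For brevity the same inputs serve as support and query data. Task slopes, data, and learning rates are hypothetical.

```python
from typing import Sequence, Tuple


def loss_and_grad(w: float, slope: float, xs: Sequence[float]) -> Tuple[float, float]:
    """Mean squared error of y_hat = w * x against y = slope * x, and its gradient in w."""
    m2 = sum(x * x for x in xs) / len(xs)
    return (w - slope) ** 2 * m2, 2.0 * (w - slope) * m2


def fomaml(slopes: Sequence[float], xs: Sequence[float], inner_lr: float = 0.1,
           outer_lr: float = 0.05, steps: int = 200, w0: float = 0.0) -> float:
    w = w0
    for _ in range(steps):
        outer_grad = 0.0
        for slope in slopes:
            _, g = loss_and_grad(w, slope, xs)
            adapted = w - inner_lr * g  # inner step: adapt to this task
            _, g_adapted = loss_and_grad(adapted, slope, xs)
            outer_grad += g_adapted  # first-order: ignore second derivatives
        w -= outer_lr * outer_grad / len(slopes)  # meta-update of the shared initialization
    return w
```

With pretraining tasks of slope 1, 2, and 3, the learned initialization settles near 2, the point from which one inner step moves fastest toward any of them; in MetaASR the "tasks" are languages and w is a full ASR model.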

Multi-Perspective Inferrer: Reasoning Sentences Relationship from Holistic Perspective


Title	Multi-Perspective Inferrer: Reasoning Sentences Relationship from Holistic Perspective
Authors	Zhen Cheng, Zaixiang Zheng, Xin-Yu Dai, Shujian Huang, Jiajun Chen
Abstract	Natural Language Inference (NLI) aims to determine the logical relationship (i.e., entailment, neutral, or contradiction) between a premise and a hypothesis. Recently, alignment mechanisms have helped NLI effectively by capturing the aligned parts (i.e., the similar segments) of sentence pairs, which point toward entailment or contradiction. However, these aligned parts sometimes mislead the judgment of neutral relations. Intuitively, NLI should rely on multiple perspectives to form a holistic view that eliminates bias. In this paper, we propose the Multi-Perspective Inferrer (MPI), a novel NLI model that reasons about relationships from multiple perspectives associated with the three relationships. MPI determines the perspectives of different parts of the sentences via a routing-by-agreement policy and makes the final decision from a holistic view. Additionally, we introduce an auxiliary supervised signal to ensure that MPI learns the expected perspectives. Experiments on SNLI and MultiNLI show that 1) MPI achieves substantial improvements over the base model, which verifies the motivation of multi-perspective inference; 2) visualized evidence verifies that MPI learns highly interpretable perspectives as expected; 3) more importantly, MPI is architecture-free and compatible with the powerful BERT.
Tasks	Natural Language Inference
Published	2019-11-09
URL	https://arxiv.org/abs/1911.03668v1
PDF	https://arxiv.org/pdf/1911.03668v1.pdf
PWC	https://paperswithcode.com/paper/multi-perspective-inferrer-reasoning
Repo
Framework
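
The abstract says MPI assigns perspectives via a routing-by-agreement policy but does not give its equations. For orientation, here is a minimal sketch of the dynamic routing-by-agreement procedure from capsule networks, which we assume is the kind of mechanism meant: coupling coefficients are a softmax over routing logits, each output is the squashed weighted sum of its predictions, and logits grow for predictions that agree with the output. Dimensions, names, and the three-iteration default are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = List[float]


def squash(s: Sequence[float]) -> Vec:
    """Capsule non-linearity: keeps direction, maps the norm into [0, 1)."""
    n2 = sum(x * x for x in s)
    if n2 == 0.0:
        return [0.0] * len(s)
    scale = n2 / (1.0 + n2) / math.sqrt(n2)
    return [scale * x for x in s]


def route(u_hat: Sequence[Sequence[Sequence[float]]], iterations: int = 3) -> Tuple[List[Vec], List[Vec]]:
    """u_hat[i][j] is input i's prediction vector for output j."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    logits = [[0.0] * n_out for _ in range(n_in)]
    for it in range(iterations):
        # Coupling coefficients: softmax of each input's logits over the outputs.
        coupling = []
        for row in logits:
            top = max(row)
            exps = [math.exp(b - top) for b in row]
            total = sum(exps)
            coupling.append([e / total for e in exps])
        outputs = []
        for j in range(n_out):
            s = [sum(coupling[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            outputs.append(squash(s))
        if it < iterations - 1:
            # Agreement (dot product) between a prediction and its output raises the logit.
            for i in range(n_in):
                for j in range(n_out):
                    logits[i][j] += sum(u_hat[i][j][d] * outputs[j][d] for d in range(dim))
    return outputs, coupling
```

When two inputs make consistent predictions for one output and conflicting predictions for another, routing shifts their coupling toward the output they agree on, which is the "agreement" in the name.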