Paper Group ANR 1616
Diagnosis of Pediatric Obstructive Sleep Apnea via Face Classification with Persistent Homology and Convolutional Neural Networks
Title | Diagnosis of Pediatric Obstructive Sleep Apnea via Face Classification with Persistent Homology and Convolutional Neural Networks |
Authors | Milad Kiaee, Adam B Kashlak, Jisu Kim, Giseon Heo |
Abstract | Obstructive sleep apnea is a serious condition causing a litany of health problems, especially in the pediatric population. However, this chronic condition can be treated if diagnosis is possible. The gold standard for diagnosis is an overnight sleep study, which is often unavailable to many potentially suffering from this condition. Hence, we attempt to develop a fast, non-invasive diagnostic tool by training a classifier on 2D and 3D facial images of a patient to recognize facial features associated with obstructive sleep apnea. In this comparative study, we consider both persistent homology and geometric shape analysis from the field of computational topology as well as convolutional neural networks, a powerful method from deep learning whose success in image and specifically facial recognition has already been demonstrated by computer scientists. |
Tasks | |
Published | 2019-10-26 |
URL | https://arxiv.org/abs/1911.05628v1 |
https://arxiv.org/pdf/1911.05628v1.pdf | |
PWC | https://paperswithcode.com/paper/diagnosis-of-pediatric-obstructive-sleep |
Repo | |
Framework | |
ConvPoseCNN: Dense Convolutional 6D Object Pose Estimation
Title | ConvPoseCNN: Dense Convolutional 6D Object Pose Estimation |
Authors | Catherine Capellen, Max Schwarz, Sven Behnke |
Abstract | 6D object pose estimation is a prerequisite for many applications. In recent years, monocular pose estimation has attracted much research interest because it does not need depth measurements. In this work, we introduce ConvPoseCNN, a fully convolutional architecture that avoids cutting out individual objects. Instead, we propose pixel-wise, dense prediction of both translation and orientation components of the object pose, where the dense orientation is represented in quaternion form. We present different approaches for aggregation of the dense orientation predictions, including averaging and clustering schemes. We evaluate ConvPoseCNN on the challenging YCB-Video Dataset, where we show that the approach has far fewer parameters and trains faster than comparable methods without sacrificing accuracy. Furthermore, our results indicate that the dense orientation prediction implicitly learns to attend to trustworthy, occlusion-free, and feature-rich object regions. |
Tasks | 6D Pose Estimation using RGB, Pose Estimation |
Published | 2019-12-16 |
URL | https://arxiv.org/abs/1912.07333v1 |
https://arxiv.org/pdf/1912.07333v1.pdf | |
PWC | https://paperswithcode.com/paper/convposecnn-dense-convolutional-6d-object |
Repo | |
Framework | |
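The abstract above mentions aggregating dense quaternion predictions by averaging. As a rough illustration only, the sketch below uses a common eigenvector-based weighted quaternion average; the abstract does not say which averaging scheme the authors use, so the method, the function name `average_quaternions`, and the `weights` parameter are all assumptions.

```python
import numpy as np

def average_quaternions(quats, weights=None):
    """Weighted average of unit quaternions given as rows of an (N, 4) array.

    Assumed scheme (not confirmed by the abstract): the average is the
    eigenvector with the largest eigenvalue of the weighted sum of outer
    products q q^T, which treats q and -q as the same rotation.
    """
    q = np.asarray(quats, dtype=float)
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    w = np.ones(len(q)) if weights is None else np.asarray(weights, dtype=float)
    # 4x4 symmetric accumulator: sum_i w_i * q_i q_i^T
    m = (q * w[:, None]).T @ q
    # eigh returns eigenvalues in ascending order, so take the last eigenvector.
    _, eigvecs = np.linalg.eigh(m)
    avg = eigvecs[:, -1]
    # Fix the sign so the scalar (first) component is non-negative.
    return avg if avg[0] >= 0 else -avg

# Two rotations about the z-axis by +0.2 and -0.2 rad, in (w, x, y, z) order.
half = 0.1
q_plus = [np.cos(half), 0.0, 0.0, np.sin(half)]
q_minus = [np.cos(half), 0.0, 0.0, -np.sin(half)]
print(average_quaternions([q_plus, q_minus]))
```

In a dense-prediction setting, `weights` could hold per-pixel confidences so that unreliable pixels contribute less, but that use is likewise an assumption here.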
Explain Your Move: Understanding Agent Actions Using Salient and Relevant Feature Attribution
Title | Explain Your Move: Understanding Agent Actions Using Salient and Relevant Feature Attribution |
Authors | Nikaash Puri, Sukriti Verma, Piyush Gupta, Dhruv Kayastha, Shripad Deshmukh, Balaji Krishnamurthy, Sameer Singh |
Abstract | As deep reinforcement learning (RL) is applied to more tasks, there is a need to visualize and understand the behavior of learned agents. Saliency maps explain agent behavior by highlighting the features of the input state that are most relevant for the agent in taking an action. Existing perturbation-based approaches to compute saliency often highlight regions of the input that are not relevant to the action taken by the agent. Our approach, SARFA, generates more focused saliency maps by balancing two aspects (specificity and relevance) that capture different desiderata of saliency. The first captures the impact of perturbation on the relative expected reward of the action to be explained. The second downweights irrelevant features that alter the relative expected rewards of actions other than the action to be explained. We compare SARFA with existing approaches on agents trained to play board games (Chess and Go) and Atari games (Breakout, Pong and Space Invaders). We show through illustrative examples (Chess, Atari, Go), human studies (Chess), and automated evaluation methods (Chess) that SARFA generates saliency maps that are more interpretable for humans than existing approaches. For the code release and demo videos, see: https://nikaashpuri.github.io/sarfa-saliency/. |
Tasks | Atari Games, Board Games |
Published | 2019-12-23 |
URL | https://arxiv.org/abs/1912.12191v3 |
https://arxiv.org/pdf/1912.12191v3.pdf | |
PWC | https://paperswithcode.com/paper/explain-your-move-understanding-agent-actions-1 |
Repo | |
Framework | |
Meta-Learning with Dynamic-Memory-Based Prototypical Network for Few-Shot Event Detection
Title | Meta-Learning with Dynamic-Memory-Based Prototypical Network for Few-Shot Event Detection |
Authors | Shumin Deng, Ningyu Zhang, Jiaojian Kang, Yichi Zhang, Wei Zhang, Huajun Chen |
Abstract | Event detection (ED), a sub-task of event extraction, involves identifying triggers and categorizing event mentions. Existing methods primarily rely upon supervised learning and require large-scale labeled event datasets, which are unfortunately not readily available in many real-life applications. In this paper, we consider and reformulate the ED task with limited labeled data as a Few-Shot Learning problem. We propose a Dynamic-Memory-Based Prototypical Network (DMB-PN), which exploits a Dynamic Memory Network (DMN) not only to learn better prototypes for event types, but also to produce more robust sentence encodings for event mentions. Unlike vanilla prototypical networks, which compute event prototypes by simple averaging and consume event mentions only once, our model is more robust and can distill contextual information from event mentions multiple times thanks to the multi-hop mechanism of DMNs. The experiments show that DMB-PN not only deals with sample scarcity better than a series of baseline models but also performs more robustly when the variety of event types is relatively large and the instance quantity is extremely small. |
Tasks | Few-Shot Learning, Meta-Learning |
Published | 2019-10-25 |
URL | https://arxiv.org/abs/1910.11621v2 |
https://arxiv.org/pdf/1910.11621v2.pdf | |
PWC | https://paperswithcode.com/paper/meta-learning-with-dynamic-memory-based |
Repo | |
Framework | |
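For context, the "vanilla" prototypical-network baseline that the abstract contrasts with can be sketched as below: each class prototype is the mean of its support embeddings, and a query takes the label of the nearest prototype. This is not the DMB-PN model itself; the 2D embeddings, the event-type names, and the helper names `class_prototypes` and `classify` are made up for illustration.

```python
import numpy as np

def class_prototypes(embeddings, labels):
    """Return (classes, prototypes), where each prototype is a class-mean embedding."""
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)  # sorted unique class labels
    protos = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, protos

def classify(queries, classes, protos):
    """Label each query with the class of its nearest prototype (squared Euclidean)."""
    queries = np.atleast_2d(np.asarray(queries, dtype=float))
    d2 = ((queries[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[d2.argmin(axis=1)]

# Toy 2-way 2-shot episode with hypothetical event types and embeddings.
support = [[0.0, 0.1], [0.2, -0.1], [1.0, 0.9], [0.8, 1.1]]
support_labels = ["attack", "attack", "meet", "meet"]
classes, protos = class_prototypes(support, support_labels)
print(classify([[0.1, 0.0], [0.9, 1.0]], classes, protos))
```

Because the mean is computed in a single pass, each support mention is used exactly once, which is the limitation the abstract says the multi-hop memory is meant to address.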
A Seft-adaptive Multicellular GEP Algorithm Based On Fuzzy Control For Function Optimization
Title | A Seft-adaptive Multicellular GEP Algorithm Based On Fuzzy Control For Function Optimization |
Authors | Chuyan Deng, Yuzhong Peng, Hongya Li, Daoqing Gong, Hao Zhang, Zhiping Liu |
Abstract | To improve the global optimization ability of the traditional GEP algorithm, a multicellular gene expression programming algorithm based on fuzzy control (MGEP-FC) is proposed. The MGEP-FC algorithm describes the size of the crossover rate, mutation rate and real number mutation rate by constructing fuzzy membership functions. According to the concentration and dispersion of individual fitness values in the population, the crossover rate, mutation rate and real number set mutation rate of the genetic operations are dynamically adjusted. To maintain population diversity throughout the iterative process, a new genetic operation scheme is designed, which combines the new individuals with the parent population to build a temporary population, and the diversity of the temporary population and the subpopulation is optimized. The results of 12 benchmark optimization experiments show that the MGEP-FC algorithm is greatly improved in stability, global convergence and optimization speed. |
Tasks | |
Published | 2019-04-01 |
URL | https://arxiv.org/abs/1906.08851v1 |
https://arxiv.org/pdf/1906.08851v1.pdf | |
PWC | https://paperswithcode.com/paper/a-seft-adaptive-multicellular-gep-algorithm |
Repo | |
Framework | |
Bayesian Tensorized Neural Networks with Automatic Rank Selection
Title | Bayesian Tensorized Neural Networks with Automatic Rank Selection |
Authors | Cole Hawkins, Zheng Zhang |
Abstract | Tensor decomposition is an effective approach to compress over-parameterized neural networks and to enable their deployment on resource-constrained hardware platforms. However, directly applying tensor compression in the training process is a challenging task due to the difficulty of choosing a proper tensor rank. In order to achieve this goal, this paper proposes a Bayesian tensorized neural network. Our Bayesian method performs automatic model compression via an adaptive tensor rank determination. We also present approaches for posterior density calculation and maximum a posteriori (MAP) estimation for the end-to-end training of our tensorized neural network. We provide experimental validation on a fully connected neural network, a CNN and a residual neural network where our work produces $7.4\times$ to $137\times$ more compact neural networks directly from the training. |
Tasks | Model Compression |
Published | 2019-05-24 |
URL | https://arxiv.org/abs/1905.10478v1 |
https://arxiv.org/pdf/1905.10478v1.pdf | |
PWC | https://paperswithcode.com/paper/bayesian-tensorized-neural-networks-with |
Repo | |
Framework | |
Stereo relative pose from line and point feature triplets
Title | Stereo relative pose from line and point feature triplets |
Authors | Alexander Vakhitov, Victor Lempitsky, Yinqiang Zheng |
Abstract | The stereo relative pose problem lies at the core of stereo visual odometry systems that are used in many applications. In this work, we present two minimal solvers for the stereo relative pose. We specifically consider the case when a minimal set consists of three point or line features and each of them has three known projections on two stereo cameras. We validate the importance of this formulation for practical purposes in our experiments with motion estimation. We then present a complete classification of minimal cases with three point or line correspondences each having three projections, and present two new solvers that can handle all such cases. We demonstrate a considerable effect from the integration of the new solvers into a visual SLAM system. |
Tasks | Motion Estimation, Visual Odometry |
Published | 2019-06-29 |
URL | https://arxiv.org/abs/1907.00276v1 |
https://arxiv.org/pdf/1907.00276v1.pdf | |
PWC | https://paperswithcode.com/paper/stereo-relative-pose-from-line-and-point-1 |
Repo | |
Framework | |
Extracting Tables from Documents using Conditional Generative Adversarial Networks and Genetic Algorithms
Title | Extracting Tables from Documents using Conditional Generative Adversarial Networks and Genetic Algorithms |
Authors | Nataliya Le Vine, Matthew Zeigenfuse, Mark Rowan |
Abstract | Extracting information from tables in documents presents a significant challenge in many industries and in academic research. Existing methods which take a bottom-up approach of integrating lines into cells and rows or columns neglect the available prior information relating to table structure. Our proposed method takes a top-down approach, first using a generative adversarial network to map a table image into a standardised ‘skeleton’ table form denoting the approximate row and column borders without table content, then fitting renderings of candidate latent table structures to the skeleton structure using a distance measure optimised by a genetic algorithm. |
Tasks | |
Published | 2019-04-03 |
URL | http://arxiv.org/abs/1904.01947v1 |
http://arxiv.org/pdf/1904.01947v1.pdf | |
PWC | https://paperswithcode.com/paper/extracting-tables-from-documents-using |
Repo | |
Framework | |
Regularize, Expand and Compress: Multi-task based Lifelong Learning via NonExpansive AutoML
Title | Regularize, Expand and Compress: Multi-task based Lifelong Learning via NonExpansive AutoML |
Authors | Jie Zhang, Junting Zhang, Shalini Ghosh, Dawei Li, Jingwen Zhu, Heming Zhang, Yalin Wang |
Abstract | Lifelong learning, the problem of continual learning where tasks arrive in sequence, has lately been attracting more attention in the computer vision community. The aim of lifelong learning is to develop a system that can learn new tasks while maintaining the performance on the previously learned tasks. However, there are two obstacles for lifelong learning of deep neural networks: catastrophic forgetting and capacity limitation. To solve the above issues, inspired by the recent breakthroughs in automatically learning good neural network architectures, we develop a multi-task based lifelong learning framework via nonexpansive AutoML, termed Regularize, Expand and Compress (REC). REC is composed of three stages: 1) it continually learns the sequential tasks without the learned tasks’ data via a newly proposed multi-task weight consolidation (MWC) algorithm; 2) it expands the network to help the lifelong learning with potentially improved model capability and performance by network-transformation based AutoML; 3) it compresses the expanded model after learning every new task to maintain model efficiency and performance. The proposed MWC and REC algorithms achieve superior performance over other lifelong learning algorithms on four different datasets. |
Tasks | AutoML, Continual Learning |
Published | 2019-03-20 |
URL | http://arxiv.org/abs/1903.08362v1 |
http://arxiv.org/pdf/1903.08362v1.pdf | |
PWC | https://paperswithcode.com/paper/regularize-expand-and-compress-multi-task |
Repo | |
Framework | |
A Large-Scale Deep Architecture for Personalized Grocery Basket Recommendations
Title | A Large-Scale Deep Architecture for Personalized Grocery Basket Recommendations |
Authors | Aditya Mantha, Yokila Arora, Shubham Gupta, Praveenkumar Kanumala, Zhiwei Liu, Stephen Guo, Kannan Achan |
Abstract | With growing consumer adoption of online grocery shopping through platforms such as Amazon Fresh, Instacart, and Walmart Grocery, there is a pressing business need to provide relevant recommendations throughout the customer journey. In this paper, we introduce a production within-basket grocery recommendation system, RTT2Vec, which generates real-time personalized product recommendations to supplement the user’s current grocery basket. We conduct extensive offline evaluation of our system and demonstrate a 9.4% uplift in prediction metrics over baseline state-of-the-art within-basket recommendation models. We also propose an approximate inference technique that is 11.6x faster than exact inference approaches. In production, our system has resulted in an increase in average basket size, improved product discovery, and enabled faster user check-out. |
Tasks | |
Published | 2019-10-24 |
URL | https://arxiv.org/abs/1910.12757v3 |
https://arxiv.org/pdf/1910.12757v3.pdf | |
PWC | https://paperswithcode.com/paper/a-large-scale-deep-architecture-for |
Repo | |
Framework | |
Single-Stage 6D Object Pose Estimation
Title | Single-Stage 6D Object Pose Estimation |
Authors | Yinlin Hu, Pascal Fua, Wei Wang, Mathieu Salzmann |
Abstract | Most recent 6D pose estimation frameworks first rely on a deep network to establish correspondences between 3D object keypoints and 2D image locations and then use a variant of a RANSAC-based Perspective-n-Point (PnP) algorithm. This two-stage process, however, is suboptimal: First, it is not end-to-end trainable. Second, training the deep network relies on a surrogate loss that does not directly reflect the final 6D pose estimation task. In this work, we introduce a deep architecture that directly regresses 6D poses from correspondences. It takes as input a group of candidate correspondences for each 3D keypoint and accounts for the fact that the order of the correspondences within each group is irrelevant, while the order of the groups, that is, of the 3D keypoints, is fixed. Our architecture is generic and can thus be exploited in conjunction with existing correspondence-extraction networks so as to yield single-stage 6D pose estimation frameworks. Our experiments demonstrate that these single-stage frameworks consistently outperform their two-stage counterparts in terms of both accuracy and speed. |
Tasks | 6D Pose Estimation, 6D Pose Estimation using RGB, Pose Estimation |
Published | 2019-11-19 |
URL | https://arxiv.org/abs/1911.08324v2 |
https://arxiv.org/pdf/1911.08324v2.pdf | |
PWC | https://paperswithcode.com/paper/single-stage-6d-object-pose-estimation |
Repo | |
Framework | |
Accurate 6D Object Pose Estimation by Pose Conditioned Mesh Reconstruction
Title | Accurate 6D Object Pose Estimation by Pose Conditioned Mesh Reconstruction |
Authors | Pedro Castro, Anil Armagan, Tae-Kyun Kim |
Abstract | Current 6D object pose methods consist of deep CNN models fully optimized for a single object but with an architecture standardized among objects with different shapes. In contrast to previous works, we explicitly exploit each object’s distinct topological information, i.e., 3D dense meshes, in the pose estimation model, with an automated process and prior to any post-processing refinement stage. In order to achieve this, we propose a learning framework in which a Graph Convolutional Neural Network reconstructs a pose conditioned 3D mesh of the object. A robust estimation of the allocentric orientation is recovered by computing, in a differentiable manner, the Procrustes’ alignment between the canonical and reconstructed dense 3D meshes. 6D egocentric pose is then lifted using additional mask and 2D centroid projection estimations. Our method is capable of self-validating its pose estimation by measuring the quality of the reconstructed mesh, which is invaluable in real-life applications. In our experiments on the LINEMOD, OCCLUSION and YCB-Video benchmarks, the proposed method outperforms state-of-the-art methods. |
Tasks | 6D Pose Estimation using RGB, Pose Estimation |
Published | 2019-10-23 |
URL | https://arxiv.org/abs/1910.10653v1 |
https://arxiv.org/pdf/1910.10653v1.pdf | |
PWC | https://paperswithcode.com/paper/accurate-6d-object-pose-estimation-by-pose |
Repo | |
Framework | |
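The Procrustes alignment mentioned in the abstract above can be illustrated with the standard SVD-based (Kabsch) solution for the rotation between two vertex sets in one-to-one correspondence. This is a plain NumPy sketch under that assumption, not the authors' differentiable implementation; the function name `procrustes_rotation` and the toy vertices are hypothetical.

```python
import numpy as np

def procrustes_rotation(canonical, reconstructed):
    """Rotation R that best maps centered `canonical` vertices onto `reconstructed`.

    Both inputs are (N, 3) arrays with row i of one corresponding to row i
    of the other. Solves min_R sum_i ||R a_i - b_i||^2 over proper rotations.
    """
    a = np.asarray(canonical, dtype=float)
    b = np.asarray(reconstructed, dtype=float)
    a = a - a.mean(axis=0)  # remove translation
    b = b - b.mean(axis=0)
    h = a.T @ b             # 3x3 cross-covariance, sum_i a_i b_i^T
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so that det(R) = +1 (no reflection).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T

# Toy check: rotate four vertices by 30 degrees about z and recover the rotation.
angle = np.deg2rad(30.0)
r_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
verts = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]])
r_est = procrustes_rotation(verts, verts @ r_true.T)
print(np.round(r_est, 3))
```

The residual between the rotated canonical mesh and the reconstructed mesh gives a natural quality score, which is in the spirit of the self-validation the abstract describes, though how the paper measures mesh quality is not stated here.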
Duty to Warn in Strategic Games
Title | Duty to Warn in Strategic Games |
Authors | Pavel Naumov, Jia Tao |
Abstract | The paper investigates the second-order blameworthiness or duty to warn modality “one coalition knew how another coalition could have prevented an outcome”. The main technical result is a sound and complete logical system that describes the interplay between the distributed knowledge and the duty to warn modalities. |
Tasks | |
Published | 2019-11-08 |
URL | https://arxiv.org/abs/1912.02759v2 |
https://arxiv.org/pdf/1912.02759v2.pdf | |
PWC | https://paperswithcode.com/paper/duty-to-warn-in-strategic-games |
Repo | |
Framework | |
Meta Learning for End-to-End Low-Resource Speech Recognition
Title | Meta Learning for End-to-End Low-Resource Speech Recognition |
Authors | Jui-Yang Hsu, Yuan-Jui Chen, Hung-yi Lee |
Abstract | In this paper, we propose applying a meta-learning approach to low-resource automatic speech recognition (ASR). We formulate ASR for different languages as different tasks, and meta-learn the initialization parameters from many pretraining languages to achieve fast adaptation to unseen target languages, via the recently proposed model-agnostic meta-learning algorithm (MAML). We evaluate the proposed approach using six languages as pretraining tasks and four languages as target tasks. Preliminary results show that the proposed method, MetaASR, significantly outperforms the state-of-the-art multitask pretraining approach on all target languages with different combinations of pretraining languages. In addition, owing to MAML’s model-agnostic property, this paper also opens a new research direction of applying meta-learning to more speech-related applications. |
Tasks | Meta-Learning, Speech Recognition |
Published | 2019-10-26 |
URL | https://arxiv.org/abs/1910.12094v1 |
https://arxiv.org/pdf/1910.12094v1.pdf | |
PWC | https://paperswithcode.com/paper/meta-learning-for-end-to-end-low-resource |
Repo | |
Framework | |
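To make the MAML idea in the abstract above concrete, here is a deliberately tiny sketch on scalar "tasks" with quadratic losses, where the meta-gradient can be written by hand. It is not the MetaASR setup: the loss, the learning rates, and the function name `maml_toy` are illustrative assumptions, and for brevity the same loss serves as both the adaptation and the evaluation objective, whereas MAML normally uses separate support and query data.

```python
def maml_toy(task_centers, inner_lr=0.1, meta_lr=0.5, steps=200, theta=0.0):
    """Meta-learn an initialization for tasks with loss L_t(x) = 0.5 * (x - c_t) ** 2."""
    for _ in range(steps):
        meta_grad = 0.0
        for c in task_centers:
            # Inner loop: one gradient step on the task loss from the shared init.
            adapted = theta - inner_lr * (theta - c)
            # Outer gradient of L_t(adapted) w.r.t. theta, by the chain rule:
            # dL/d(adapted) = adapted - c and d(adapted)/d(theta) = 1 - inner_lr.
            meta_grad += (adapted - c) * (1.0 - inner_lr)
        # Meta-update: move the shared initialization along the averaged gradient.
        theta -= meta_lr * meta_grad / len(task_centers)
    return theta

# Three toy tasks; the learned initialization settles at the mean of the optima.
print(maml_toy([-1.0, 2.0, 5.0]))
```

For these symmetric quadratic tasks the best initialization is simply the mean of the task optima, so the example mainly shows the two nested loops and the gradient flowing through the inner update.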
Multi-Perspective Inferrer: Reasoning Sentences Relationship from Holistic Perspective
Title | Multi-Perspective Inferrer: Reasoning Sentences Relationship from Holistic Perspective |
Authors | Zhen Cheng, Zaixiang Zheng, Xin-Yu Dai, Shujian Huang, Jiajun Chen |
Abstract | Natural Language Inference (NLI) aims to determine the logical relationship (i.e., entailment, neutral and contradiction) between a premise and a hypothesis. Recently, the alignment mechanism has effectively helped NLI by capturing the aligned parts (i.e., the similar segments) in the sentence pairs, which imply the perspective of entailment and contradiction. However, these aligned parts will sometimes mislead the judgment of neutral relations. Intuitively, NLI should rely more on multiple perspectives to form a holistic view to eliminate bias. In this paper, we propose the Multi-Perspective Inferrer (MPI), a novel NLI model that reasons relationships from multiple perspectives associated with the three relationships. The MPI determines the perspectives of different parts of the sentences via a routing-by-agreement policy and makes the final decision from a holistic view. Additionally, we introduce an auxiliary supervised signal to ensure that the MPI learns the expected perspectives. Experiments on SNLI and MultiNLI show that 1) the MPI achieves substantial improvements over the base model, which verifies the motivation of multi-perspective inference; 2) visualized evidence verifies that the MPI learns highly interpretable perspectives as expected; 3) more importantly, the MPI is architecture-free and compatible with the powerful BERT. |
Tasks | Natural Language Inference |
Published | 2019-11-09 |
URL | https://arxiv.org/abs/1911.03668v1 |
https://arxiv.org/pdf/1911.03668v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-perspective-inferrer-reasoning |
Repo | |
Framework | |