February 1, 2020

Paper Group AWR 257

Multi-Task Recurrent Convolutional Network with Correlation Loss for Surgical Video Analysis. Probabilistic Models of Relational Implication. Multi-task Deep Learning for Real-Time 3D Human Pose Estimation and Action Recognition. Revisiting Semantic Representation and Tree Search for Similar Question Retrieval. Towards More Accurate Automatic Sleep …

Multi-Task Recurrent Convolutional Network with Correlation Loss for Surgical Video Analysis

Title Multi-Task Recurrent Convolutional Network with Correlation Loss for Surgical Video Analysis
Authors Yueming Jin, Huaxia Li, Qi Dou, Hao Chen, Jing Qin, Chi-Wing Fu, Pheng-Ann Heng
Abstract Surgical tool presence detection and surgical phase recognition are two fundamental yet challenging tasks in surgical video analysis, and are also essential components of various applications in modern operating rooms. While these two analysis tasks are highly correlated in clinical practice as the surgical process is well-defined, most previous methods tackled them separately, without making full use of their relatedness. In this paper, we present a novel method by developing a multi-task recurrent convolutional network with correlation loss (MTRCNet-CL) to exploit their relatedness to simultaneously boost the performance of both tasks. Specifically, our proposed MTRCNet-CL model has an end-to-end architecture with two branches, which share earlier feature encoders to extract general visual features while holding respective higher layers targeting specific tasks. Given that temporal information is crucial for phase recognition, long short-term memory (LSTM) is explored to model the sequential dependencies in the phase recognition branch. More importantly, a novel and effective correlation loss is designed to model the relatedness between tool presence and phase identification of each video frame, by minimizing the divergence of predictions from the two branches. Mutually leveraging both low-level feature sharing and high-level prediction correlating, our MTRCNet-CL method can encourage the interactions between the two tasks to a large extent, so that the two tasks benefit each other. Extensive experiments on a large surgical video dataset (Cholec80) demonstrate outstanding performance of our proposed method, consistently exceeding the state-of-the-art methods by a large margin (e.g., 89.1% vs. 81.0% for the mAP in tool presence detection and 87.4% vs. 84.5% for F1 score in phase recognition). The code can be found on our project website.
Tasks
Published 2019-07-13
URL https://arxiv.org/abs/1907.06099v1
PDF https://arxiv.org/pdf/1907.06099v1.pdf
PWC https://paperswithcode.com/paper/multi-task-recurrent-convolutional-network
Repo https://github.com/YuemingJin/MTRCNet-CL
Framework pytorch
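
The idea of penalizing disagreement between the two branches can be illustrated with a minimal sketch. It is not the authors' implementation: the `phase_to_tool` mapping, the function names, and the choice of mean absolute difference as the divergence are all assumptions made for illustration.

```python
from typing import List


def implied_tool_probs(phase_probs: List[float],
                       phase_to_tool: List[List[float]]) -> List[float]:
    """Expected tool presence under the predicted phase distribution.

    phase_to_tool[p][t] is a hypothetical probability that tool t is
    present during phase p (an assumption, not taken from the paper).
    """
    n_tools = len(phase_to_tool[0])
    return [
        sum(p * row[t] for p, row in zip(phase_probs, phase_to_tool))
        for t in range(n_tools)
    ]


def correlation_loss(tool_probs: List[float],
                     phase_probs: List[float],
                     phase_to_tool: List[List[float]]) -> float:
    """Mean absolute gap between the tool branch and the phase-implied tools."""
    implied = implied_tool_probs(phase_probs, phase_to_tool)
    return sum(abs(a - b) for a, b in zip(tool_probs, implied)) / len(tool_probs)


# Two phases, two tools; each phase uses exactly one tool in this toy setup.
mapping = [[1.0, 0.0], [0.0, 1.0]]
agree = correlation_loss([1.0, 0.0], [1.0, 0.0], mapping)     # branches agree
disagree = correlation_loss([0.0, 1.0], [1.0, 0.0], mapping)  # branches conflict
```

In a full training setup, a term like this would be added to the per-task losses so that gradients push both branches toward consistent predictions.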

Probabilistic Models of Relational Implication

Title Probabilistic Models of Relational Implication
Authors Xavier Holt
Abstract Relational data in its most basic form is a static collection of known facts. However, by learning to infer and deduce additional information and structure, we can massively increase the usefulness of the underlying data. One common form of inferential reasoning in knowledge bases is implication discovery. Here, by learning when one relation implies another, we can extend our knowledge representation. There are several existing models for relational implication; however, we argue they are motivated but not principled. To this end, we define a formal probabilistic model of relational implication. By using estimators based on the empirical distribution of our dataset, we demonstrate that our model outperforms existing approaches. While previous work achieves a best score of 0.7812 AUC on an evaluation dataset, our ProbE model improves this to 0.7915. Furthermore, we demonstrate that our model can be improved substantially through the use of link prediction models and dense latent representations of the underlying argument and relations. This variant, denoted ProbL, improves the state of the art on our evaluation dataset to 0.8143. In addition to developing a new framework and providing novel scores of relational implication, we provide two pragmatic resources to assist future research. First, we motivate and develop an improved crowd framework for constructing labelled datasets of relational implication. Using this, we reannotate and make public a dataset comprised of 17,848 instances of labelled relational implication. We demonstrate that precision (as evaluated by expert consensus with the crowd labels) on the resulting dataset improves from 53% to 95%.
Tasks Link Prediction
Published 2019-07-28
URL https://arxiv.org/abs/1907.12048v1
PDF https://arxiv.org/pdf/1907.12048v1.pdf
PWC https://paperswithcode.com/paper/probabilistic-models-of-relational
Repo https://github.com/xavi-ai/relational-implication-dataset
Framework none
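
To make the notion of an empirical implication estimate concrete, here is a minimal sketch. It is a plain conditional-frequency estimator, not the paper's ProbE or ProbL model; the function name and the toy facts are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Tuple

Fact = Tuple[str, str, str]  # (subject, relation, object)


def implication_score(facts: Iterable[Fact], premise: str, hypothesis: str) -> float:
    """Estimate P(hypothesis holds for a pair | premise holds for that pair).

    Assumption: the score is the fraction of argument pairs observed with
    the premise relation that are also observed with the hypothesis relation.
    """
    pairs = defaultdict(set)
    for subj, rel, obj in facts:
        pairs[rel].add((subj, obj))
    if not pairs[premise]:
        return 0.0
    return len(pairs[premise] & pairs[hypothesis]) / len(pairs[premise])


toy_facts = [
    ("ann", "born_in", "oslo"),
    ("ann", "lived_in", "oslo"),
    ("bob", "born_in", "rome"),
    ("bob", "lived_in", "rome"),
    ("bob", "lived_in", "paris"),
]
forward = implication_score(toy_facts, "born_in", "lived_in")
backward = implication_score(toy_facts, "lived_in", "born_in")
```

The asymmetry between `forward` and `backward` is what distinguishes implication from mere similarity between relations.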

Multi-task Deep Learning for Real-Time 3D Human Pose Estimation and Action Recognition

Title Multi-task Deep Learning for Real-Time 3D Human Pose Estimation and Action Recognition
Authors Diogo C Luvizon, Hedi Tabia, David Picard
Abstract Human pose estimation and action recognition are related tasks since both problems are strongly dependent on the human body representation and analysis. Nonetheless, most recent methods in the literature handle the two problems separately. In this work, we propose a multi-task framework for jointly estimating 2D or 3D human poses from monocular color images and classifying human actions from video sequences. We show that a single architecture can be used to solve both problems in an efficient way and still achieve state-of-the-art or comparable results at each task while running at more than 100 frames per second. The proposed method benefits from extensive parameter sharing between the two tasks by unifying the processing of still images and video clips in a single pipeline, allowing the model to be trained with data from different categories simultaneously and in a seamless way. Additionally, we provide important insights for training the proposed multi-task model end-to-end by decoupling key prediction parts, which consistently leads to better accuracy on both tasks. The reported results on four datasets (MPII, Human3.6M, Penn Action and NTU RGB+D) demonstrate the effectiveness of our method on the targeted tasks. Our source code and trained weights are publicly available at https://github.com/dluvizon/deephar.
Tasks 3D Human Pose Estimation, Pose Estimation
Published 2019-12-15
URL https://arxiv.org/abs/1912.08077v2
PDF https://arxiv.org/pdf/1912.08077v2.pdf
PWC https://paperswithcode.com/paper/multi-task-deep-learning-for-real-time-3d
Repo https://github.com/dluvizon/deephar
Framework none

Revisiting Semantic Representation and Tree Search for Similar Question Retrieval

Title Revisiting Semantic Representation and Tree Search for Similar Question Retrieval
Authors Tong Guo, Huilin Gao
Abstract This paper studies the performance of BERT combined with a tree structure on a short-sentence ranking task. In a retrieval-based question answering system, we retrieve the question most similar to the query question by ranking all the questions in the dataset. Ranking all the sentences with neural rankers requires scoring every sentence pair, which consumes a large amount of time. We therefore design a specific tree for searching and combine it with a deep model to solve this problem. We fine-tune BERT on the training data to obtain semantic vectors, or sentence embeddings, on the test data. We use all the sentence embeddings of the test data to build our tree based on k-means, and perform beam search at prediction time when given a sentence as the query. We run experiments on the semantic textual similarity dataset Quora Question Pairs, and process the dataset for sentence ranking. Experimental results show that our methods outperform the strong baseline. Our tree accelerates prediction speed by 500%-1000% without losing too much ranking accuracy.
Tasks Information Retrieval, Question Answering, Semantic Textual Similarity, Sentence Embeddings
Published 2019-08-22
URL https://arxiv.org/abs/1908.08326v8
PDF https://arxiv.org/pdf/1908.08326v8.pdf
PWC https://paperswithcode.com/paper/revisit-semantic-representation-and-tree
Repo https://github.com/guotong1988/Semantic-Tree-Search
Framework none
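
The search step described in the abstract can be sketched as a beam search over a tree of cluster centroids. This is an illustrative sketch only: the `Node` class, the hand-built toy tree, and the use of cosine similarity are assumptions, and the k-means construction of the tree is omitted.

```python
import math
from dataclasses import dataclass, field
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Node:
    centroid: Vector                      # cluster centroid, or the embedding for a leaf
    children: List["Node"] = field(default_factory=list)
    sentence: str = ""                    # set only on leaves


def beam_search(root: Node, query: Vector, beam_width: int) -> List[str]:
    """Descend the tree keeping only the best `beam_width` clusters per level."""
    frontier = [root]
    reached_leaves: List[Node] = []
    while frontier:
        internal: List[Node] = []
        for node in frontier:
            for child in node.children:
                if child.children:
                    internal.append(child)
                else:
                    reached_leaves.append(child)
        internal.sort(key=lambda n: cosine(query, n.centroid), reverse=True)
        frontier = internal[:beam_width]
    reached_leaves.sort(key=lambda n: cosine(query, n.centroid), reverse=True)
    return [leaf.sentence for leaf in reached_leaves]


# Hypothetical two-cluster tree with made-up 2-D embeddings.
cluster_a = Node([1.0, 0.0], [Node([1.0, 0.1], sentence="a1"),
                              Node([0.9, 0.2], sentence="a2")])
cluster_b = Node([0.0, 1.0], [Node([0.0, 1.0], sentence="b1")])
root = Node([0.5, 0.5], [cluster_a, cluster_b])
ranked = beam_search(root, [1.0, 0.0], beam_width=1)
```

With a beam width of 1 only the leaves under `cluster_a` are scored, which is where the speed-up comes from; a wider beam trades speed for recall.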

Towards More Accurate Automatic Sleep Staging via Deep Transfer Learning

Title Towards More Accurate Automatic Sleep Staging via Deep Transfer Learning
Authors Huy Phan, Oliver Y. Chén, Philipp Koch, Zongqing Lu, Ian McLoughlin, Alfred Mertins, Maarten De Vos
Abstract Although large annotated sleep databases are publicly available, and might be used to train automated scoring algorithms, it might still be a challenge to develop an optimal algorithm for your personal sleep study, which might have few subjects or rely on a different recording setup. Both directly applying a learned algorithm and retraining the algorithm on your rather small database are suboptimal. Moreover, state-of-the-art sleep staging algorithms based on deep neural networks demand a large amount of data to be trained. This work presents a deep transfer learning approach to overcome the channel mismatch problem and enable transferring knowledge from a large dataset to a small cohort for automatic sleep staging. We start from a generic end-to-end deep learning framework for sequence-to-sequence sleep staging and derive two networks adhering to this framework as a device for transfer learning. The networks are first trained in the source domain (i.e. the large database). The pretrained networks are then finetuned in the target domain, i.e. the small cohort, to complete knowledge transfer. We employ the Montreal Archive of Sleep Studies (MASS) database consisting of 200 subjects as the source domain and study deep transfer learning on four different target domains: the Sleep Cassette subset and the Sleep Telemetry subset of the Sleep-EDF Expanded database, the Surrey-cEEGGrid database, and the Surrey-PSG database. The target domains are purposely adopted to cover different degrees of channel mismatch to the source domain. Our experimental results show significant performance improvement on automatic sleep staging on the target domains achieved with the proposed deep transfer learning approach and we discuss the impact of various fine tuning approaches.
Tasks Automatic Sleep Stage Classification, Multimodal Sleep Stage Detection, Sleep Stage Detection, Transfer Learning
Published 2019-07-30
URL https://arxiv.org/abs/1907.13177v1
PDF https://arxiv.org/pdf/1907.13177v1.pdf
PWC https://paperswithcode.com/paper/towards-more-accurate-automatic-sleep-staging
Repo https://github.com/pquochuy/sleep_transfer_learning
Framework tf

Gating Mechanisms for Combining Character and Word-level Word Representations: An Empirical Study

Title Gating Mechanisms for Combining Character and Word-level Word Representations: An Empirical Study
Authors Jorge A. Balazs, Yutaka Matsuo
Abstract In this paper we study how different ways of combining character and word-level representations affect the quality of both final word and sentence representations. We provide strong empirical evidence that modeling characters improves the learned representations at the word and sentence levels, and that doing so is particularly useful when representing less frequent words. We further show that a feature-wise sigmoid gating mechanism is a robust method for creating representations that encode semantic similarity, as it performed reasonably well in several word similarity datasets. Finally, our findings suggest that properly capturing semantic similarity at the word level does not consistently yield improved performance in downstream sentence-level tasks. Our code is available at https://github.com/jabalazs/gating
Tasks Semantic Similarity, Semantic Textual Similarity
Published 2019-04-11
URL http://arxiv.org/abs/1904.05584v1
PDF http://arxiv.org/pdf/1904.05584v1.pdf
PWC https://paperswithcode.com/paper/gating-mechanisms-for-combining-character-and
Repo https://github.com/jabalazs/gating
Framework none
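
A feature-wise sigmoid gate of the kind the abstract mentions can be sketched as follows. The exact formulation here is an assumption (the gate is computed from the word-level vector and interpolates between the character-level and word-level vectors per feature); the function names and toy values are hypothetical.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_combination(word_vec: List[float],
                      char_vec: List[float],
                      weights: List[List[float]],
                      bias: List[float]) -> List[float]:
    """Combine two representations of one word with a per-feature gate.

    Assumed form: g = sigmoid(W @ word_vec + b), then
    output = g * char_vec + (1 - g) * word_vec, applied element-wise.
    """
    gates = [
        sigmoid(sum(w * x for w, x in zip(row, word_vec)) + b)
        for row, b in zip(weights, bias)
    ]
    return [g * c + (1.0 - g) * w for g, c, w in zip(gates, char_vec, word_vec)]


# With zero weights and bias every gate is 0.5, so the output is the average.
zero_w = [[0.0, 0.0], [0.0, 0.0]]
averaged = gated_combination([1.0, 3.0], [3.0, 1.0], zero_w, [0.0, 0.0])
```

Because each feature has its own gate, the model can rely on characters for some dimensions and on the word embedding for others, unlike a single scalar gate.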

Crypto-Oriented Neural Architecture Design

Title Crypto-Oriented Neural Architecture Design
Authors Avital Shafran, Gil Segev, Shmuel Peleg, Yedid Hoshen
Abstract As neural networks revolutionize many applications, significant privacy concerns emerge. Owners of private data wish to use remote neural network services while ensuring their data cannot be interpreted by others. Service providers wish to keep their model private to safeguard its intellectual property. Such privacy conflicts may slow down the adoption of neural networks in sensitive domains such as healthcare. Privacy issues have been addressed in the cryptography community in the context of secure computation. However, secure computation protocols have known performance issues. For example, the runtime of secure inference in deep neural networks is three orders of magnitude longer compared to non-secure inference. Therefore, much research effort addresses the optimization of cryptographic protocols for secure inference. We take a complementary approach, and provide design principles for optimizing the crypto-oriented neural network architectures to reduce the runtime of secure inference. The principles are evaluated on three state-of-the-art architectures: SqueezeNet, ShuffleNetV2, and MobileNetV2. Our novel method significantly improves the efficiency of secure inference on common evaluation metrics.
Tasks
Published 2019-11-27
URL https://arxiv.org/abs/1911.12322v1
PDF https://arxiv.org/pdf/1911.12322v1.pdf
PWC https://paperswithcode.com/paper/crypto-oriented-neural-architecture-design
Repo https://github.com/tf-encrypted/tf-encrypted
Framework tf

The Natural Language of Actions

Title The Natural Language of Actions
Authors Guy Tennenholtz, Shie Mannor
Abstract We introduce Act2Vec, a general framework for learning context-based action representation for Reinforcement Learning. Representing actions in a vector space helps reinforcement learning algorithms achieve better performance by grouping similar actions and utilizing relations between different actions. We show how prior knowledge of an environment can be extracted from demonstrations and injected into action vector representations that encode natural compatible behavior. We then use these for augmenting state representations as well as improving function approximation of Q-values. We visualize and test action embeddings in three domains including a drawing task, a high dimensional navigation task, and the large action space domain of StarCraft II.
Tasks Starcraft, Starcraft II
Published 2019-02-04
URL https://arxiv.org/abs/1902.01119v2
PDF https://arxiv.org/pdf/1902.01119v2.pdf
PWC https://paperswithcode.com/paper/the-natural-language-of-actions
Repo https://github.com/1230113202/NV-JM-DD
Framework pytorch

One-Shot Object Detection with Co-Attention and Co-Excitation

Title One-Shot Object Detection with Co-Attention and Co-Excitation
Authors Ting-I Hsieh, Yi-Chen Lo, Hwann-Tzong Chen, Tyng-Luh Liu
Abstract This paper aims to tackle the challenging problem of one-shot object detection. Given a query image patch whose class label is not included in the training data, the goal of the task is to detect all instances of the same class in a target image. To this end, we develop a novel co-attention and co-excitation (CoAE) framework that makes contributions in three key technical aspects. First, we propose to use the non-local operation to explore the co-attention embodied in each query-target pair and yield region proposals accounting for the one-shot situation. Second, we formulate a squeeze-and-co-excitation scheme that can adaptively emphasize correlated feature channels to help uncover relevant proposals and eventually the target objects. Third, we design a margin-based ranking loss for implicitly learning a metric to predict the similarity of a region proposal to the underlying query, regardless of whether its class label is seen or unseen in training. The resulting model is therefore a two-stage detector that yields a strong baseline on both VOC and MS-COCO under the one-shot setting of detecting objects from both seen and never-seen classes. Codes are available at https://github.com/timy90022/One-Shot-Object-Detection.
Tasks Object Detection, One-Shot Object Detection
Published 2019-11-28
URL https://arxiv.org/abs/1911.12529v1
PDF https://arxiv.org/pdf/1911.12529v1.pdf
PWC https://paperswithcode.com/paper/one-shot-object-detection-with-co-attention-1
Repo https://github.com/timy90022/One-Shot-Object-Detection
Framework pytorch
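
As a rough illustration of what a margin-based ranking loss does, the sketch below uses a generic pairwise hinge formulation. It is not the specific loss defined in the paper; the function name, the default margin, and the toy scores are assumptions.

```python
from typing import List


def margin_ranking_loss(pos_scores: List[float],
                        neg_scores: List[float],
                        margin: float = 0.3) -> float:
    """Generic pairwise hinge loss over similarity scores.

    pos_scores: similarity to the query for proposals that match the query class.
    neg_scores: similarity to the query for background or non-matching proposals.
    Each positive should outscore each negative by at least `margin`.
    """
    pairs = [(p, n) for p in pos_scores for n in neg_scores]
    if not pairs:
        return 0.0
    return sum(max(0.0, margin - (p - n)) for p, n in pairs) / len(pairs)


well_separated = margin_ranking_loss([0.9], [0.1])  # gap 0.8 exceeds the margin
too_close = margin_ranking_loss([0.5], [0.4])       # gap 0.1 violates the margin
```

Since the loss depends only on relative scores, it never needs the class label itself, which is why such an objective can apply to classes unseen during training.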

Robust Visual Domain Randomization for Reinforcement Learning

Title Robust Visual Domain Randomization for Reinforcement Learning
Authors Reda Bahi Slaoui, William R. Clements, Jakob N. Foerster, Sébastien Toth
Abstract Producing agents that can generalize to a wide range of visually different environments is a significant challenge in reinforcement learning. One method for overcoming this issue is visual domain randomization, whereby at the start of each training episode some visual aspects of the environment are randomized so that the agent is exposed to many possible variations. However, domain randomization is highly inefficient and may lead to policies with high variance across domains. Instead, we propose a regularization method whereby the agent is only trained on one variation of the environment, and its learned state representations are regularized during training to be invariant across domains. We conduct experiments that demonstrate that our technique leads to more efficient and robust learning than standard domain randomization, while achieving equal generalization scores.
Tasks
Published 2019-10-23
URL https://arxiv.org/abs/1910.10537v2
PDF https://arxiv.org/pdf/1910.10537v2.pdf
PWC https://paperswithcode.com/paper/robust-domain-randomization-for-reinforcement
Repo https://github.com/uncharted-technologies/robust-domain-randomization
Framework none
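
The regularization idea in the abstract can be sketched as an extra penalty on the distance between the features of the same state rendered in two visual variations. This is a minimal sketch under assumptions: the squared-distance penalty, the `weight` parameter, and the function names are illustrative and not taken from the paper.

```python
from typing import List

Features = List[float]


def invariance_penalty(features_a: List[Features],
                       features_b: List[Features]) -> float:
    """Mean squared distance between paired feature vectors.

    features_a[i] and features_b[i] are assumed to encode the same
    underlying state under two different visual randomizations.
    """
    total = 0.0
    for fa, fb in zip(features_a, features_b):
        total += sum((x - y) ** 2 for x, y in zip(fa, fb))
    return total / len(features_a)


def regularized_loss(rl_loss: float,
                     features_a: List[Features],
                     features_b: List[Features],
                     weight: float = 1.0) -> float:
    """Reinforcement learning loss plus the weighted invariance penalty."""
    return rl_loss + weight * invariance_penalty(features_a, features_b)


same = invariance_penalty([[1.0, 2.0]], [[1.0, 2.0]])    # identical features
differ = invariance_penalty([[1.0, 2.0]], [[1.0, 4.0]])  # one feature differs by 2
```

The agent's policy loss is computed on a single variation, while the penalty pulls the encoder toward producing the same features regardless of the visual variation.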

LIAAD at SemDeep-5 Challenge: Word-in-Context (WiC)

Title LIAAD at SemDeep-5 Challenge: Word-in-Context (WiC)
Authors Daniel Loureiro, Alipio Jorge
Abstract This paper describes the LIAAD system that was ranked second in the Word-in-Context challenge (WiC) featured in SemDeep-5. Our solution is based on a novel system for Word Sense Disambiguation (WSD) using contextual embeddings and full-inventory sense embeddings. We adapt this WSD system, in a straightforward manner, for the present task of detecting whether the same sense occurs in a pair of sentences. Additionally, we show that our solution is able to achieve competitive performance even without using the provided training or development sets, mitigating potential concerns related to task overfitting.
Tasks Word Sense Disambiguation
Published 2019-06-24
URL https://arxiv.org/abs/1906.10002v1
PDF https://arxiv.org/pdf/1906.10002v1.pdf
PWC https://paperswithcode.com/paper/liaad-at-semdeep-5-challenge-word-in-context
Repo https://github.com/danlou/LMMS
Framework none

Topic-Aware Neural Keyphrase Generation for Social Media Language

Title Topic-Aware Neural Keyphrase Generation for Social Media Language
Authors Yue Wang, Jing Li, Hou Pong Chan, Irwin King, Michael R. Lyu, Shuming Shi
Abstract A huge volume of user-generated content is produced daily on social media. To facilitate automatic language understanding, we study keyphrase prediction, distilling salient information from massive posts. While most existing methods extract words from source posts to form keyphrases, we propose a sequence-to-sequence (seq2seq) based neural keyphrase generation framework, enabling absent keyphrases to be created. Moreover, our model, being topic-aware, allows joint modeling of corpus-level latent topic representations, which helps alleviate the data sparsity that is widely exhibited in social media language. Experiments on three datasets collected from English and Chinese social media platforms show that our model significantly outperforms both extraction and generation models that do not exploit latent topics. Further discussions show that our model learns meaningful topics, which explains its superiority in social media keyphrase generation.
Tasks
Published 2019-06-10
URL https://arxiv.org/abs/1906.03889v1
PDF https://arxiv.org/pdf/1906.03889v1.pdf
PWC https://paperswithcode.com/paper/topic-aware-neural-keyphrase-generation-for
Repo https://github.com/yuewang-cuhk/TAKG
Framework pytorch

A CCG-based Compositional Semantics and Inference System for Comparatives

Title A CCG-based Compositional Semantics and Inference System for Comparatives
Authors Izumi Haruta, Koji Mineshima, Daisuke Bekki
Abstract Comparative constructions play an important role in natural language inference. However, attempts to study semantic representations and logical inferences for comparatives from the computational perspective are not well developed, due to the complexity of their syntactic structures and inference patterns. In this study, using a framework based on Combinatory Categorial Grammar (CCG), we present a compositional semantics that maps various comparative constructions in English to semantic representations, and introduce an inference system that effectively handles logical inference with comparatives, including those involving numeral adjectives, antonyms, and quantification. We evaluate the performance of our system on the FraCaS test suite and show that the system can handle a variety of complex logical inferences with comparatives.
Tasks Natural Language Inference
Published 2019-10-02
URL https://arxiv.org/abs/1910.00930v1
PDF https://arxiv.org/pdf/1910.00930v1.pdf
PWC https://paperswithcode.com/paper/a-ccg-based-compositional-semantics-and
Repo https://github.com/izumi-h/fracas-comparatives_adjectives
Framework none

Long-Term Urban Vehicle Localization Using Pole Landmarks Extracted from 3-D Lidar Scans

Title Long-Term Urban Vehicle Localization Using Pole Landmarks Extracted from 3-D Lidar Scans
Authors Alexander Schaefer, Daniel Büscher, Johan Vertens, Lukas Luft, Wolfram Burgard
Abstract Due to their ubiquity and long-term stability, pole-like objects are well suited to serve as landmarks for vehicle localization in urban environments. In this work, we present a complete mapping and long-term localization system based on pole landmarks extracted from 3-D lidar data. Our approach features a novel pole detector, a mapping module, and an online localization module, each of which is described in detail, and for which we provide an open-source implementation at www.github.com/acschaefer/polex. In extensive experiments, we demonstrate that our method improves on the state of the art with respect to long-term reliability and accuracy: First, we prove reliability by tasking the system with localizing a mobile robot over the course of 15 months in an urban area based on an initial map, confronting it with constantly varying routes, differing weather conditions, seasonal changes, and construction sites. Second, we show that the proposed approach clearly outperforms a recently published method in terms of accuracy.
Tasks
Published 2019-10-23
URL https://arxiv.org/abs/1910.10550v1
PDF https://arxiv.org/pdf/1910.10550v1.pdf
PWC https://paperswithcode.com/paper/long-term-urban-vehicle-localization-using
Repo https://github.com/acschaefer/polex
Framework none

Enhancing AMR-to-Text Generation with Dual Graph Representations

Title Enhancing AMR-to-Text Generation with Dual Graph Representations
Authors Leonardo F. R. Ribeiro, Claire Gardent, Iryna Gurevych
Abstract Generating text from graph-based data, such as Abstract Meaning Representation (AMR), is a challenging task due to the inherent difficulty in how to properly encode the structure of a graph with labeled edges. To address this difficulty, we propose a novel graph-to-sequence model that encodes different but complementary perspectives of the structural information contained in the AMR graph. The model learns parallel top-down and bottom-up representations of nodes capturing contrasting views of the graph. We also investigate the use of different node message passing strategies, employing different state-of-the-art graph encoders to compute node representations based on incoming and outgoing perspectives. In our experiments, we demonstrate that the dual graph representation leads to improvements in AMR-to-text generation, achieving state-of-the-art results on two AMR datasets.
Tasks Graph-to-Sequence, Text Generation
Published 2019-09-01
URL https://arxiv.org/abs/1909.00352v1
PDF https://arxiv.org/pdf/1909.00352v1.pdf
PWC https://paperswithcode.com/paper/enhancing-amr-to-text-generation-with-dual
Repo https://github.com/UKPLab/emnlp2019-dualgraph
Framework pytorch