Paper Group AWR 257
Multi-Task Recurrent Convolutional Network with Correlation Loss for Surgical Video Analysis
Title | Multi-Task Recurrent Convolutional Network with Correlation Loss for Surgical Video Analysis |
Authors | Yueming Jin, Huaxia Li, Qi Dou, Hao Chen, Jing Qin, Chi-Wing Fu, Pheng-Ann Heng |
Abstract | Surgical tool presence detection and surgical phase recognition are two fundamental yet challenging tasks in surgical video analysis, and both are essential components of various applications in modern operating rooms. While these two analysis tasks are highly correlated in clinical practice, as the surgical process is well defined, most previous methods tackled them separately, without making full use of their relatedness. In this paper, we present a novel method, a multi-task recurrent convolutional network with correlation loss (MTRCNet-CL), that exploits this relatedness to simultaneously boost the performance of both tasks. Specifically, our proposed MTRCNet-CL model has an end-to-end architecture with two branches, which share earlier feature encoders to extract general visual features while holding respective higher layers targeting specific tasks. Given that temporal information is crucial for phase recognition, a long short-term memory (LSTM) network is used to model the sequential dependencies in the phase recognition branch. More importantly, a novel and effective correlation loss is designed to model the relatedness between tool presence and phase identification of each video frame, by minimizing the divergence of the predictions from the two branches. By mutually leveraging both low-level feature sharing and high-level prediction correlation, our MTRCNet-CL method encourages interaction between the two tasks to a large extent, so that each benefits the other. Extensive experiments on a large surgical video dataset (Cholec80) demonstrate the outstanding performance of our proposed method, which consistently exceeds the state-of-the-art methods by a large margin (e.g., 89.1% vs. 81.0% mAP for tool presence detection and 87.4% vs. 84.5% F1 score for phase recognition). The code can be found on our project website. |
Tasks | |
Published | 2019-07-13 |
URL | https://arxiv.org/abs/1907.06099v1 |
https://arxiv.org/pdf/1907.06099v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-task-recurrent-convolutional-network |
Repo | https://github.com/YuemingJin/MTRCNet-CL |
Framework | pytorch |
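The correlation loss described in the abstract above lends itself to a compact sketch. The snippet below is a minimal PyTorch illustration, not the authors' released code: it assumes a learnable linear mapping from tool-presence probabilities into the phase space and uses a KL divergence as the divergence measure; the paper's exact mapping and divergence may differ (Cholec80 has 7 tools and 7 phases, hence the defaults).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CorrelationLoss(nn.Module):
    """Hypothetical sketch: align the two branches' predictions."""

    def __init__(self, num_tools=7, num_phases=7):
        super().__init__()
        # learnable mapping from tool-presence space to phase space
        self.mapping = nn.Linear(num_tools, num_phases)

    def forward(self, tool_logits, phase_logits):
        # map tool-presence probabilities into the phase space
        mapped = self.mapping(torch.sigmoid(tool_logits))
        # penalize divergence between the mapped tool prediction and
        # the phase branch's prediction (one choice of divergence)
        return F.kl_div(F.log_softmax(mapped, dim=1),
                        F.softmax(phase_logits, dim=1),
                        reduction="batchmean")
```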
Probabilistic Models of Relational Implication
Title | Probabilistic Models of Relational Implication |
Authors | Xavier Holt |
Abstract | Relational data in its most basic form is a static collection of known facts. However, by learning to infer and deduce additional information and structure, we can massively increase the usefulness of the underlying data. One common form of inferential reasoning in knowledge bases is implication discovery: by learning when one relation implies another, we can extend our knowledge representation. Several models for relational implication exist; however, we argue they are motivated but not principled. To this end, we define a formal probabilistic model of relational implication. Using estimators based on the empirical distribution of our dataset, we demonstrate that our model outperforms existing approaches. While previous work achieves a best score of 0.7812 AUC on an evaluation dataset, our ProbE model improves this to 0.7915. Furthermore, we demonstrate that our model can be improved substantially through the use of link prediction models and dense latent representations of the underlying arguments and relations. This variant, denoted ProbL, improves the state of the art on our evaluation dataset to 0.8143. In addition to developing a new framework and providing new state-of-the-art scores for relational implication, we provide two pragmatic resources to assist future research. First, we motivate and develop an improved crowd framework for constructing labelled datasets of relational implication. Using this, we reannotate and make public a dataset comprising 17,848 instances of labelled relational implication. We demonstrate that precision (as evaluated by expert consensus with the crowd labels) on the resulting dataset improves from 53% to 95%. |
Tasks | Link Prediction |
Published | 2019-07-28 |
URL | https://arxiv.org/abs/1907.12048v1 |
https://arxiv.org/pdf/1907.12048v1.pdf | |
PWC | https://paperswithcode.com/paper/probabilistic-models-of-relational |
Repo | https://github.com/xavi-ai/relational-implication-dataset |
Framework | none |
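To make the idea of an empirical estimator concrete, here is a toy Python sketch in the spirit of ProbE, assuming the implication r1 → r2 is scored by the conditional probability that an argument pair observed with r1 also occurs with r2. The function name and the triple format are illustrative, not taken from the paper's resources.

```python
from collections import defaultdict

def implication_score(facts, r1, r2):
    """facts: iterable of (relation, subject, object) triples.

    Returns an empirical estimate of P(r2 holds | r1 holds), computed
    over the argument pairs seen with each relation.
    """
    pairs = defaultdict(set)  # relation -> set of (subject, object)
    for rel, subj, obj in facts:
        pairs[rel].add((subj, obj))
    if not pairs[r1]:
        return 0.0
    return len(pairs[r1] & pairs[r2]) / len(pairs[r1])
```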
Multi-task Deep Learning for Real-Time 3D Human Pose Estimation and Action Recognition
Title | Multi-task Deep Learning for Real-Time 3D Human Pose Estimation and Action Recognition |
Authors | Diogo C Luvizon, Hedi Tabia, David Picard |
Abstract | Human pose estimation and action recognition are related tasks, since both problems depend strongly on representing and analyzing the human body. Nonetheless, most recent methods in the literature handle the two problems separately. In this work, we propose a multi-task framework for jointly estimating 2D or 3D human poses from monocular color images and classifying human actions from video sequences. We show that a single architecture can be used to solve both problems efficiently and still achieve state-of-the-art or comparable results on each task while running at more than 100 frames per second. The proposed method benefits from high parameter sharing between the two tasks by unifying still-image and video-clip processing in a single pipeline, allowing the model to be trained seamlessly with data from different categories simultaneously. Additionally, we provide important insights for end-to-end training of the proposed multi-task model by decoupling key prediction parts, which consistently leads to better accuracy on both tasks. The reported results on four datasets (MPII, Human3.6M, Penn Action and NTU RGB+D) demonstrate the effectiveness of our method on the targeted tasks. Our source code and trained weights are publicly available at https://github.com/dluvizon/deephar. |
Tasks | 3D Human Pose Estimation, Pose Estimation |
Published | 2019-12-15 |
URL | https://arxiv.org/abs/1912.08077v2 |
https://arxiv.org/pdf/1912.08077v2.pdf | |
PWC | https://paperswithcode.com/paper/multi-task-deep-learning-for-real-time-3d |
Repo | https://github.com/dluvizon/deephar |
Framework | none |
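The abstract does not spell out the prediction mechanism, but work in this line commonly regresses joint coordinates from heatmaps with a differentiable soft-argmax, which is one way the decoupled prediction parts can stay end-to-end trainable. Below is a hedged, self-contained sketch of that operation, not the authors' implementation.

```python
import torch

def soft_argmax_2d(heatmaps):
    """heatmaps: (batch, joints, H, W) unnormalized maps.

    Returns expected joint coordinates in [0, 1], differentiably.
    """
    b, j, h, w = heatmaps.shape
    probs = torch.softmax(heatmaps.view(b, j, -1), dim=-1).view(b, j, h, w)
    ys = torch.linspace(0, 1, h, device=heatmaps.device)
    xs = torch.linspace(0, 1, w, device=heatmaps.device)
    x = (probs.sum(dim=2) * xs).sum(dim=-1)  # expectation over x axis
    y = (probs.sum(dim=3) * ys).sum(dim=-1)  # expectation over y axis
    return torch.stack([x, y], dim=-1)       # (batch, joints, 2)
```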
Revisiting Semantic Representation and Tree Search for Similar Question Retrieval
Title | Revisiting Semantic Representation and Tree Search for Similar Question Retrieval |
Authors | Tong Guo, Huilin Gao |
Abstract | This paper studies the performance of BERT combined with a tree structure in the short-sentence ranking task. In a retrieval-based question answering system, we retrieve the most similar question to the query question by ranking all the questions in the dataset. Ranking all the sentences with a neural ranker requires scoring every sentence pair, which consumes a large amount of time. We therefore design a dedicated search tree and combine it with a deep model to solve this problem. We fine-tune BERT on the training data to obtain semantic vectors, or sentence embeddings, for the test data. We use all the sentence embeddings of the test data to build our tree based on k-means, and perform beam search at prediction time when given a sentence as the query. We run experiments on the semantic textual similarity dataset Quora Question Pairs, processed for sentence ranking. Experimental results show that our methods outperform the strong baseline. Our tree accelerates prediction by 500%-1000% without losing much ranking accuracy. |
Tasks | Information Retrieval, Question Answering, Semantic Textual Similarity, Sentence Embeddings |
Published | 2019-08-22 |
URL | https://arxiv.org/abs/1908.08326v8 |
https://arxiv.org/pdf/1908.08326v8.pdf | |
PWC | https://paperswithcode.com/paper/revisit-semantic-representation-and-tree |
Repo | https://github.com/guotong1988/Semantic-Tree-Search |
Framework | none |
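The abstract's recipe — build a k-means tree over the test-set sentence embeddings, then beam-search it at query time — can be sketched as follows. This is an illustrative Python implementation under assumed parameters (branching factor, leaf size, beam width); the paper's tree construction details may differ.

```python
import numpy as np
from sklearn.cluster import KMeans

class KMeansTree:
    """Recursive k-means tree over sentence embeddings (a sketch)."""

    def __init__(self, embeddings, branch=8, leaf_size=64):
        self.emb = embeddings
        self.root = self._build(np.arange(len(embeddings)), branch, leaf_size)

    def _build(self, idx, branch, leaf_size):
        if len(idx) <= leaf_size:
            return {"leaf": idx}
        km = KMeans(n_clusters=branch, n_init=4).fit(self.emb[idx])
        children = [self._build(idx[km.labels_ == c], branch, leaf_size)
                    for c in range(branch)]
        return {"centers": km.cluster_centers_, "children": children}

    def search(self, query, beam=3):
        frontier, candidates = [self.root], []
        while frontier:
            node = frontier.pop()
            if "leaf" in node:
                candidates.extend(node["leaf"])
                continue
            # beam search: descend only into the `beam` closest clusters
            d = np.linalg.norm(node["centers"] - query, axis=1)
            for c in np.argsort(d)[:beam]:
                frontier.append(node["children"][c])
        return candidates  # candidate ids; re-rank by exact similarity
```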
Towards More Accurate Automatic Sleep Staging via Deep Transfer Learning
Title | Towards More Accurate Automatic Sleep Staging via Deep Transfer Learning |
Authors | Huy Phan, Oliver Y. Chén, Philipp Koch, Zongqing Lu, Ian McLoughlin, Alfred Mertins, Maarten De Vos |
Abstract | Although large annotated sleep databases are publicly available and can be used to train automated scoring algorithms, it remains challenging to develop an optimal algorithm for a particular sleep study, which may involve few subjects or rely on a different recording setup. Both directly applying a learned algorithm and retraining the algorithm on a rather small database are suboptimal, and state-of-the-art sleep staging algorithms based on deep neural networks demand a large amount of training data. This work presents a deep transfer learning approach to overcome the channel mismatch problem and enable transferring knowledge from a large dataset to a small cohort for automatic sleep staging. We start from a generic end-to-end deep learning framework for sequence-to-sequence sleep staging and derive two networks adhering to this framework as a vehicle for transfer learning. The networks are first trained in the source domain (i.e., the large database). The pretrained networks are then fine-tuned in the target domain (i.e., the small cohort) to complete the knowledge transfer. We employ the Montreal Archive of Sleep Studies (MASS) database, consisting of 200 subjects, as the source domain and study deep transfer learning on four different target domains: the Sleep Cassette and Sleep Telemetry subsets of the Sleep-EDF Expanded database, the Surrey-cEEGGrid database, and the Surrey-PSG database. The target domains are purposely adopted to cover different degrees of channel mismatch with the source domain. Our experimental results show significant performance improvements in automatic sleep staging on the target domains achieved with the proposed deep transfer learning approach, and we discuss the impact of various fine-tuning approaches. |
Tasks | Automatic Sleep Stage Classification, Multimodal Sleep Stage Detection, Sleep Stage Detection, Transfer Learning |
Published | 2019-07-30 |
URL | https://arxiv.org/abs/1907.13177v1 |
https://arxiv.org/pdf/1907.13177v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-more-accurate-automatic-sleep-staging |
Repo | https://github.com/pquochuy/sleep_transfer_learning |
Framework | tf |
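The transfer step itself is a standard pretrain-then-finetune recipe. A minimal PyTorch sketch is given below, assuming a `model` with a `feature_extractor` submodule and a checkpoint file named `pretrained_source.pt`; both names are placeholders, and the paper additionally compares several fine-tuning variants (e.g., which layers to unfreeze).

```python
import torch

def finetune(model, target_loader, freeze_features=True, epochs=10):
    # load weights pretrained on the large source database (placeholder path)
    model.load_state_dict(torch.load("pretrained_source.pt"))
    if freeze_features:
        # transfer: keep source-domain features, adapt only upper layers
        for p in model.feature_extractor.parameters():
            p.requires_grad = False
    opt = torch.optim.Adam(
        [p for p in model.parameters() if p.requires_grad], lr=1e-4)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in target_loader:  # the small target cohort
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
```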
Gating Mechanisms for Combining Character and Word-level Word Representations: An Empirical Study
Title | Gating Mechanisms for Combining Character and Word-level Word Representations: An Empirical Study |
Authors | Jorge A. Balazs, Yutaka Matsuo |
Abstract | In this paper we study how different ways of combining character- and word-level representations affect the quality of both final word and sentence representations. We provide strong empirical evidence that modeling characters improves the learned representations at the word and sentence levels, and that doing so is particularly useful when representing less frequent words. We further show that a feature-wise sigmoid gating mechanism is a robust method for creating representations that encode semantic similarity, as it performed reasonably well on several word similarity datasets. Finally, our findings suggest that properly capturing semantic similarity at the word level does not consistently yield improved performance in downstream sentence-level tasks. Our code is available at https://github.com/jabalazs/gating. |
Tasks | Semantic Similarity, Semantic Textual Similarity |
Published | 2019-04-11 |
URL | http://arxiv.org/abs/1904.05584v1 |
http://arxiv.org/pdf/1904.05584v1.pdf | |
PWC | https://paperswithcode.com/paper/gating-mechanisms-for-combining-character-and |
Repo | https://github.com/jabalazs/gating |
Framework | none |
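The feature-wise sigmoid gate at the center of this study can be written in a few lines. The sketch below computes the gate from the concatenation of the two representations, which is one common parameterization; the paper compares several gating variants, so treat this as illustrative.

```python
import torch
import torch.nn as nn

class SigmoidGate(nn.Module):
    """Feature-wise gate mixing word- and character-level vectors."""

    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, word_vec, char_vec):
        # g decides, per dimension, how much of each representation to keep
        g = torch.sigmoid(self.gate(torch.cat([word_vec, char_vec], dim=-1)))
        return g * word_vec + (1 - g) * char_vec
```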
Crypto-Oriented Neural Architecture Design
Title | Crypto-Oriented Neural Architecture Design |
Authors | Avital Shafran, Gil Segev, Shmuel Peleg, Yedid Hoshen |
Abstract | As neural networks revolutionize many applications, significant privacy concerns emerge. Owners of private data wish to use remote neural network services while ensuring their data cannot be interpreted by others. Service providers wish to keep their model private to safeguard its intellectual property. Such privacy conflicts may slow down the adoption of neural networks in sensitive domains such as healthcare. Privacy issues have been addressed by the cryptography community in the context of secure computation. However, secure computation protocols have known performance issues; for example, the runtime of secure inference in deep neural networks is three orders of magnitude longer compared to non-secure inference. Therefore, much research effort addresses the optimization of cryptographic protocols for secure inference. We take a complementary approach and provide design principles for optimizing crypto-oriented neural network architectures to reduce the runtime of secure inference. The principles are evaluated on three state-of-the-art architectures: SqueezeNet, ShuffleNetV2, and MobileNetV2. Our novel method significantly improves the efficiency of secure inference on common evaluation metrics. |
Tasks | |
Published | 2019-11-27 |
URL | https://arxiv.org/abs/1911.12322v1 |
https://arxiv.org/pdf/1911.12322v1.pdf | |
PWC | https://paperswithcode.com/paper/crypto-oriented-neural-architecture-design |
Repo | https://github.com/tf-encrypted/tf-encrypted |
Framework | tf |
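As one flavor of the design principles mentioned in the abstract, non-linearities (which dominate the cost of secure inference) can be applied to only part of the channels. The sketch below illustrates such a partial activation in PyTorch; it is an assumption-laden reading of the abstract, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class PartialReLU(nn.Module):
    """Apply the costly non-linearity to a fraction of the channels only."""

    def __init__(self, active_ratio=0.5):
        super().__init__()
        self.active_ratio = active_ratio  # illustrative hyperparameter

    def forward(self, x):
        k = int(x.shape[1] * self.active_ratio)
        # first k channels pass through ReLU, the rest stay linear
        return torch.cat([torch.relu(x[:, :k]), x[:, k:]], dim=1)
```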
The Natural Language of Actions
Title | The Natural Language of Actions |
Authors | Guy Tennenholtz, Shie Mannor |
Abstract | We introduce Act2Vec, a general framework for learning context-based action representations for reinforcement learning. Representing actions in a vector space helps reinforcement learning algorithms achieve better performance by grouping similar actions and utilizing relations between different actions. We show how prior knowledge of an environment can be extracted from demonstrations and injected into action vector representations that encode natural compatible behavior. We then use these representations for augmenting state representations as well as improving function approximation of Q-values. We visualize and test action embeddings in three domains, including a drawing task, a high-dimensional navigation task, and the large action space domain of StarCraft II. |
Tasks | Starcraft, Starcraft II |
Published | 2019-02-04 |
URL | https://arxiv.org/abs/1902.01119v2 |
https://arxiv.org/pdf/1902.01119v2.pdf | |
PWC | https://paperswithcode.com/paper/the-natural-language-of-actions |
Repo | https://github.com/1230113202/NV-JM-DD |
Framework | pytorch |
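The core of Act2Vec — treating demonstration trajectories as sentences and actions as words — can be approximated with an off-the-shelf skip-gram model. The toy sketch below uses gensim's Word2Vec; the action strings and hyperparameters are illustrative, not from the paper.

```python
from gensim.models import Word2Vec

# toy demonstrations: each trajectory is a sequence of discrete actions
demo_trajectories = [["up", "up", "fire", "left"],
                     ["left", "fire", "up", "up"]]

# skip-gram (sg=1) learns context-based action embeddings
model = Word2Vec(sentences=demo_trajectories, vector_size=16,
                 window=2, min_count=1, sg=1)

# actions used in similar contexts end up close in the vector space
similar = model.wv.most_similar("up")
```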
One-Shot Object Detection with Co-Attention and Co-Excitation
Title | One-Shot Object Detection with Co-Attention and Co-Excitation |
Authors | Ting-I Hsieh, Yi-Chen Lo, Hwann-Tzong Chen, Tyng-Luh Liu |
Abstract | This paper aims to tackle the challenging problem of one-shot object detection. Given a query image patch whose class label is not included in the training data, the goal of the task is to detect all instances of the same class in a target image. To this end, we develop a novel {\em co-attention and co-excitation} (CoAE) framework that makes contributions in three key technical aspects. First, we propose to use the non-local operation to explore the co-attention embodied in each query-target pair and yield region proposals accounting for the one-shot situation. Second, we formulate a squeeze-and-co-excitation scheme that can adaptively emphasize correlated feature channels to help uncover relevant proposals and eventually the target objects. Third, we design a margin-based ranking loss for implicitly learning a metric to predict the similarity of a region proposal to the underlying query, regardless of whether its class label is seen or unseen in training. The resulting model is therefore a two-stage detector that yields a strong baseline on both VOC and MS-COCO under the one-shot setting of detecting objects from both seen and never-seen classes. Code is available at https://github.com/timy90022/One-Shot-Object-Detection. |
Tasks | Object Detection, One-Shot Object Detection |
Published | 2019-11-28 |
URL | https://arxiv.org/abs/1911.12529v1 |
https://arxiv.org/pdf/1911.12529v1.pdf | |
PWC | https://paperswithcode.com/paper/one-shot-object-detection-with-co-attention-1 |
Repo | https://github.com/timy90022/One-Shot-Object-Detection |
Framework | pytorch |
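The squeeze-and-co-excitation scheme can be sketched by analogy with squeeze-and-excitation: pool the query features into channel statistics, predict per-channel weights, and re-weight both feature maps with the same weights. The PyTorch snippet below is a hedged reading of the abstract, with the reduction ratio and pooling choice assumed.

```python
import torch
import torch.nn as nn

class SqueezeCoExcitation(nn.Module):
    """Re-weight query and target features with shared channel weights."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, query_feat, target_feat):
        # squeeze: global average pool of the query patch features
        w = self.fc(query_feat.mean(dim=(2, 3)))  # (batch, channels)
        w = w[:, :, None, None]
        # co-excitation: emphasize correlated channels in both maps
        return query_feat * w, target_feat * w
```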
Robust Visual Domain Randomization for Reinforcement Learning
Title | Robust Visual Domain Randomization for Reinforcement Learning |
Authors | Reda Bahi Slaoui, William R. Clements, Jakob N. Foerster, Sébastien Toth |
Abstract | Producing agents that can generalize to a wide range of visually different environments is a significant challenge in reinforcement learning. One method for overcoming this issue is visual domain randomization, whereby at the start of each training episode some visual aspects of the environment are randomized so that the agent is exposed to many possible variations. However, domain randomization is highly inefficient and may lead to policies with high variance across domains. Instead, we propose a regularization method whereby the agent is only trained on one variation of the environment, and its learned state representations are regularized during training to be invariant across domains. We conduct experiments that demonstrate that our technique leads to more efficient and robust learning than standard domain randomization, while achieving equal generalization scores. |
Tasks | |
Published | 2019-10-23 |
URL | https://arxiv.org/abs/1910.10537v2 |
https://arxiv.org/pdf/1910.10537v2.pdf | |
PWC | https://paperswithcode.com/paper/robust-domain-randomization-for-reinforcement |
Repo | https://github.com/uncharted-technologies/robust-domain-randomization |
Framework | none |
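The proposed regularizer reduces to penalizing the distance between the encoder's features for a reference observation and for a visually randomized copy of it. A minimal sketch, assuming an `encoder` network and a `randomize` augmentation function (both placeholders):

```python
import torch

def invariance_loss(encoder, obs, randomize, weight=1.0):
    """Penalize representation differences across visual domains."""
    z_ref = encoder(obs)              # features on the reference domain
    z_rand = encoder(randomize(obs))  # features on a randomized copy
    # added to the RL loss so representations become domain-invariant
    return weight * (z_ref - z_rand).pow(2).mean()
```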
LIAAD at SemDeep-5 Challenge: Word-in-Context (WiC)
Title | LIAAD at SemDeep-5 Challenge: Word-in-Context (WiC) |
Authors | Daniel Loureiro, Alipio Jorge |
Abstract | This paper describes the LIAAD system that ranked second in the Word-in-Context (WiC) challenge featured in SemDeep-5. Our solution is based on a novel system for Word Sense Disambiguation (WSD) using contextual embeddings and full-inventory sense embeddings. We adapt this WSD system, in a straightforward manner, to the present task of detecting whether the same sense occurs in a pair of sentences. Additionally, we show that our solution achieves competitive performance even without using the provided training or development sets, mitigating potential concerns related to task overfitting. |
Tasks | Word Sense Disambiguation |
Published | 2019-06-24 |
URL | https://arxiv.org/abs/1906.10002v1 |
https://arxiv.org/pdf/1906.10002v1.pdf | |
PWC | https://paperswithcode.com/paper/liaad-at-semdeep-5-challenge-word-in-context |
Repo | https://github.com/danlou/LMMS |
Framework | none |
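The adaptation from WSD to WiC described above amounts to: disambiguate the target word in each sentence by nearest-neighbor search over sense embeddings, then predict "same sense" if both sentences resolve to the same sense. A hedged NumPy sketch, where `sense_vectors` and the two context vectors are assumed inputs (the LMMS repo's actual pipeline builds these from contextual embeddings):

```python
import numpy as np

def wic_predict(ctx_vec_a, ctx_vec_b, sense_vectors):
    """sense_vectors: dict mapping sense id -> embedding (assumed)."""
    def nearest_sense(v):
        # cosine-similarity nearest neighbor over the sense inventory
        sims = {s: np.dot(v, u) / (np.linalg.norm(v) * np.linalg.norm(u))
                for s, u in sense_vectors.items()}
        return max(sims, key=sims.get)
    # same resolved sense in both sentences -> predict "same meaning"
    return nearest_sense(ctx_vec_a) == nearest_sense(ctx_vec_b)
```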
Topic-Aware Neural Keyphrase Generation for Social Media Language
Title | Topic-Aware Neural Keyphrase Generation for Social Media Language |
Authors | Yue Wang, Jing Li, Hou Pong Chan, Irwin King, Michael R. Lyu, Shuming Shi |
Abstract | A huge volume of user-generated content is produced daily on social media. To facilitate automatic language understanding, we study keyphrase prediction, distilling salient information from massive posts. While most existing methods extract words from source posts to form keyphrases, we propose a sequence-to-sequence (seq2seq) based neural keyphrase generation framework, enabling absent keyphrases to be created. Moreover, our model, being topic-aware, allows joint modeling of corpus-level latent topic representations, which helps alleviate the data sparsity widely exhibited in social media language. Experiments on three datasets collected from English and Chinese social media platforms show that our model significantly outperforms both extraction and generation models that do not exploit latent topics. Further analysis shows that our model learns meaningful topics, which explains its superiority in social media keyphrase generation. |
Tasks | |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.03889v1 |
https://arxiv.org/pdf/1906.03889v1.pdf | |
PWC | https://paperswithcode.com/paper/topic-aware-neural-keyphrase-generation-for |
Repo | https://github.com/yuewang-cuhk/TAKG |
Framework | pytorch |
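One simple way to make a seq2seq model topic-aware, in the spirit of the abstract, is to condition the decoder on a latent topic vector. The sketch below fuses an assumed topic vector with the encoder summary to initialize the decoder; the paper's actual fusion and jointly trained neural topic model are more involved.

```python
import torch
import torch.nn as nn

class TopicAwareBridge(nn.Module):
    """Condition the decoder on a corpus-level latent topic vector."""

    def __init__(self, enc_dim=256, topic_dim=50, dec_dim=256):
        super().__init__()
        self.proj = nn.Linear(enc_dim + topic_dim, dec_dim)

    def forward(self, encoder_summary, topic_vector):
        # fuse source encoding with the post's inferred topic mixture
        fused = torch.cat([encoder_summary, topic_vector], dim=-1)
        return torch.tanh(self.proj(fused))  # decoder initial state
```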
A CCG-based Compositional Semantics and Inference System for Comparatives
Title | A CCG-based Compositional Semantics and Inference System for Comparatives |
Authors | Izumi Haruta, Koji Mineshima, Daisuke Bekki |
Abstract | Comparative constructions play an important role in natural language inference. However, attempts to study semantic representations and logical inferences for comparatives from the computational perspective are not well developed, due to the complexity of their syntactic structures and inference patterns. In this study, using a framework based on Combinatory Categorial Grammar (CCG), we present a compositional semantics that maps various comparative constructions in English to semantic representations and introduces an inference system that effectively handles logical inference with comparatives, including those involving numeral adjectives, antonyms, and quantification. We evaluate the performance of our system on the FraCaS test suite and show that the system can handle a variety of complex logical inferences with comparatives. |
Tasks | Natural Language Inference |
Published | 2019-10-02 |
URL | https://arxiv.org/abs/1910.00930v1 |
https://arxiv.org/pdf/1910.00930v1.pdf | |
PWC | https://paperswithcode.com/paper/a-ccg-based-compositional-semantics-and |
Repo | https://github.com/izumi-h/fracas-comparatives_adjectives |
Framework | none |
Long-Term Urban Vehicle Localization Using Pole Landmarks Extracted from 3-D Lidar Scans
Title | Long-Term Urban Vehicle Localization Using Pole Landmarks Extracted from 3-D Lidar Scans |
Authors | Alexander Schaefer, Daniel Büscher, Johan Vertens, Lukas Luft, Wolfram Burgard |
Abstract | Due to their ubiquity and long-term stability, pole-like objects are well suited to serve as landmarks for vehicle localization in urban environments. In this work, we present a complete mapping and long-term localization system based on pole landmarks extracted from 3-D lidar data. Our approach features a novel pole detector, a mapping module, and an online localization module, each of which is described in detail and for which we provide an open-source implementation at www.github.com/acschaefer/polex. In extensive experiments, we demonstrate that our method improves on the state of the art with respect to long-term reliability and accuracy: First, we prove reliability by tasking the system with localizing a mobile robot over the course of 15 months in an urban area based on an initial map, confronting it with constantly varying routes, differing weather conditions, seasonal changes, and construction sites. Second, we show that the proposed approach clearly outperforms a recently published method in terms of accuracy. |
Tasks | |
Published | 2019-10-23 |
URL | https://arxiv.org/abs/1910.10550v1 |
https://arxiv.org/pdf/1910.10550v1.pdf | |
PWC | https://paperswithcode.com/paper/long-term-urban-vehicle-localization-using |
Repo | https://github.com/acschaefer/polex |
Framework | none |
Enhancing AMR-to-Text Generation with Dual Graph Representations
Title | Enhancing AMR-to-Text Generation with Dual Graph Representations |
Authors | Leonardo F. R. Ribeiro, Claire Gardent, Iryna Gurevych |
Abstract | Generating text from graph-based data, such as Abstract Meaning Representation (AMR), is a challenging task due to the inherent difficulty of properly encoding the structure of a graph with labeled edges. To address this difficulty, we propose a novel graph-to-sequence model that encodes different but complementary perspectives of the structural information contained in the AMR graph. The model learns parallel top-down and bottom-up representations of nodes, capturing contrasting views of the graph. We also investigate the use of different node message-passing strategies, employing different state-of-the-art graph encoders to compute node representations based on incoming and outgoing perspectives. In our experiments, we demonstrate that the dual graph representation leads to improvements in AMR-to-text generation, achieving state-of-the-art results on two AMR datasets. |
Tasks | Graph-to-Sequence, Text Generation |
Published | 2019-09-01 |
URL | https://arxiv.org/abs/1909.00352v1 |
https://arxiv.org/pdf/1909.00352v1.pdf | |
PWC | https://paperswithcode.com/paper/enhancing-amr-to-text-generation-with-dual |
Repo | https://github.com/UKPLab/emnlp2019-dualgraph |
Framework | pytorch |
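The dual representation can be sketched by encoding each node once over the original (top-down) edges and once over the reversed (bottom-up) edges, then concatenating the two views. The toy PyTorch snippet below uses a single mean-aggregation step in place of the paper's graph encoders; the function names and aggregation are assumptions.

```python
import torch

def dual_encode(node_feats, edges):
    """node_feats: (N, d) tensor; edges: list of (src, dst) index pairs."""
    def aggregate(feats, pairs):
        # mean over incoming neighbors (one message-passing step)
        out = torch.zeros_like(feats)
        count = torch.zeros(feats.shape[0], 1, device=feats.device)
        for s, d in pairs:
            out[d] += feats[s]
            count[d] += 1
        return out / count.clamp(min=1)
    top_down = aggregate(node_feats, edges)                       # original edges
    bottom_up = aggregate(node_feats, [(d, s) for s, d in edges])  # reversed edges
    return torch.cat([top_down, bottom_up], dim=-1)  # (N, 2d) dual view
```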