Paper Group AWR 257
Multi-Task Recurrent Convolutional Network with Correlation Loss for Surgical Video Analysis
Title | Multi-Task Recurrent Convolutional Network with Correlation Loss for Surgical Video Analysis |
Authors | Yueming Jin, Huaxia Li, Qi Dou, Hao Chen, Jing Qin, Chi-Wing Fu, Pheng-Ann Heng |
Abstract | Surgical tool presence detection and surgical phase recognition are two fundamental yet challenging tasks in surgical video analysis, and both are essential components of various applications in modern operating rooms. While these two analysis tasks are highly correlated in clinical practice, as the surgical process is well defined, most previous methods tackled them separately, without making full use of their relatedness. In this paper, we present a novel method, a multi-task recurrent convolutional network with correlation loss (MTRCNet-CL), that exploits this relatedness to simultaneously boost the performance of both tasks. Specifically, our proposed MTRCNet-CL model has an end-to-end architecture with two branches, which share earlier feature encoders to extract general visual features while holding respective higher layers targeting specific tasks. Given that temporal information is crucial for phase recognition, a long short-term memory (LSTM) network is used to model the sequential dependencies in the phase recognition branch. More importantly, a novel and effective correlation loss is designed to model the relatedness between tool presence and phase identification of each video frame, by minimizing the divergence of the predictions from the two branches. By mutually leveraging both low-level feature sharing and high-level prediction correlation, our MTRCNet-CL method encourages interaction between the two tasks to a large extent, so that each benefits the other. Extensive experiments on a large surgical video dataset (Cholec80) demonstrate the outstanding performance of our proposed method, which consistently exceeds the state-of-the-art methods by a large margin (e.g., 89.1% vs. 81.0% mAP for tool presence detection and 87.4% vs. 84.5% F1 score for phase recognition). The code can be found on our project website. |
Tasks | |
Published | 2019-07-13 |
URL | https://arxiv.org/abs/1907.06099v1 |
https://arxiv.org/pdf/1907.06099v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-task-recurrent-convolutional-network |
Repo | https://github.com/YuemingJin/MTRCNet-CL |
Framework | pytorch |
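The correlation loss described in the abstract above lends itself to a compact sketch. The snippet below is a minimal PyTorch illustration, not the authors' released code: it assumes a learnable linear mapping from tool-presence probabilities into the phase space and uses a KL divergence as the divergence measure; the paper's exact mapping and divergence may differ (Cholec80 has 7 tools and 7 phases, hence the defaults).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CorrelationLoss(nn.Module):
    """Hypothetical sketch: align the two branches' predictions."""

    def __init__(self, num_tools=7, num_phases=7):
        super().__init__()
        # learnable mapping from tool-presence space to phase space
        self.mapping = nn.Linear(num_tools, num_phases)

    def forward(self, tool_logits, phase_logits):
        # map tool-presence probabilities into the phase space
        mapped = self.mapping(torch.sigmoid(tool_logits))
        # penalize divergence between the mapped tool prediction and
        # the phase branch's prediction (one choice of divergence)
        return F.kl_div(F.log_softmax(mapped, dim=1),
                        F.softmax(phase_logits, dim=1),
                        reduction="batchmean")
```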
Probabilistic Models of Relational Implication
Title | Probabilistic Models of Relational Implication |
Authors | Xavier Holt |
Abstract | Relational data in its most basic form is a static collection of known facts. However, by learning to infer and deduce additional information and structure, we can massively increase the usefulness of the underlying data. One common form of inferential reasoning in knowledge bases is implication discovery: by learning when one relation implies another, we can extend our knowledge representation. Several models for relational implication exist; however, we argue they are motivated but not principled. To this end, we define a formal probabilistic model of relational implication. Using estimators based on the empirical distribution of our dataset, we demonstrate that our model outperforms existing approaches. While previous work achieves a best score of 0.7812 AUC on an evaluation dataset, our ProbE model improves this to 0.7915. Furthermore, we demonstrate that our model can be improved substantially through the use of link prediction models and dense latent representations of the underlying arguments and relations. This variant, denoted ProbL, improves the state of the art on our evaluation dataset to 0.8143. In addition to developing a new framework and providing new state-of-the-art scores for relational implication, we provide two pragmatic resources to assist future research. First, we motivate and develop an improved crowd framework for constructing labelled datasets of relational implication. Using this, we reannotate and make public a dataset comprising 17,848 instances of labelled relational implication. We demonstrate that precision (as evaluated by expert consensus with the crowd labels) on the resulting dataset improves from 53% to 95%. |
Tasks | Link Prediction |
Published | 2019-07-28 |
URL | https://arxiv.org/abs/1907.12048v1 |
https://arxiv.org/pdf/1907.12048v1.pdf | |
PWC | https://paperswithcode.com/paper/probabilistic-models-of-relational |
Repo | https://github.com/xavi-ai/relational-implication-dataset |
Framework | none |
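To make the idea of an empirical estimator concrete, here is a toy Python sketch in the spirit of ProbE, assuming the implication r1 → r2 is scored by the conditional probability that an argument pair observed with r1 also occurs with r2. The function name and the triple format are illustrative, not taken from the paper's resources.

```python
from collections import defaultdict

def implication_score(facts, r1, r2):
    """facts: iterable of (relation, subject, object) triples.

    Returns an empirical estimate of P(r2 holds | r1 holds), computed
    over the argument pairs seen with each relation.
    """
    pairs = defaultdict(set)  # relation -> set of (subject, object)
    for rel, subj, obj in facts:
        pairs[rel].add((subj, obj))
    if not pairs[r1]:
        return 0.0
    return len(pairs[r1] & pairs[r2]) / len(pairs[r1])
```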
Multi-task Deep Learning for Real-Time 3D Human Pose Estimation and Action Recognition
Title | Multi-task Deep Learning for Real-Time 3D Human Pose Estimation and Action Recognition |
Authors | Diogo C Luvizon, Hedi Tabia, David Picard |
Abstract | Human pose estimation and action recognition are related tasks, since both problems depend strongly on representing and analyzing the human body. Nonetheless, most recent methods in the literature handle the two problems separately. In this work, we propose a multi-task framework for jointly estimating 2D or 3D human poses from monocular color images and classifying human actions from video sequences. We show that a single architecture can be used to solve both problems efficiently and still achieve state-of-the-art or comparable results on each task while running at more than 100 frames per second. The proposed method benefits from high parameter sharing between the two tasks by unifying still-image and video-clip processing in a single pipeline, allowing the model to be trained seamlessly with data from different categories simultaneously. Additionally, we provide important insights for end-to-end training of the proposed multi-task model by decoupling key prediction parts, which consistently leads to better accuracy on both tasks. The reported results on four datasets (MPII, Human3.6M, Penn Action and NTU RGB+D) demonstrate the effectiveness of our method on the targeted tasks. Our source code and trained weights are publicly available at https://github.com/dluvizon/deephar. |
Tasks | 3D Human Pose Estimation, Pose Estimation |
Published | 2019-12-15 |
URL | https://arxiv.org/abs/1912.08077v2 |
https://arxiv.org/pdf/1912.08077v2.pdf | |
PWC | https://paperswithcode.com/paper/multi-task-deep-learning-for-real-time-3d |
Repo | https://github.com/dluvizon/deephar |
Framework | none |
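The abstract does not spell out the prediction mechanism, but work in this line commonly regresses joint coordinates from heatmaps with a differentiable soft-argmax, which is one way the decoupled prediction parts can stay end-to-end trainable. Below is a hedged, self-contained sketch of that operation, not the authors' implementation.

```python
import torch

def soft_argmax_2d(heatmaps):
    """heatmaps: (batch, joints, H, W) unnormalized maps.

    Returns expected joint coordinates in [0, 1], differentiably.
    """
    b, j, h, w = heatmaps.shape
    probs = torch.softmax(heatmaps.view(b, j, -1), dim=-1).view(b, j, h, w)
    ys = torch.linspace(0, 1, h, device=heatmaps.device)
    xs = torch.linspace(0, 1, w, device=heatmaps.device)
    x = (probs.sum(dim=2) * xs).sum(dim=-1)  # expectation over x axis
    y = (probs.sum(dim=3) * ys).sum(dim=-1)  # expectation over y axis
    return torch.stack([x, y], dim=-1)       # (batch, joints, 2)
```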
Revisiting Semantic Representation and Tree Search for Similar Question Retrieval
Title | Revisiting Semantic Representation and Tree Search for Similar Question Retrieval |
Authors | Tong Guo, Huilin Gao |
Abstract | This paper studies the performance of BERT combined with a tree structure in the short-sentence ranking task. In a retrieval-based question answering system, we retrieve the most similar question to the query question by ranking all the questions in the dataset. Ranking all the sentences with a neural ranker requires scoring every sentence pair, which consumes a large amount of time. We therefore design a dedicated search tree and combine it with a deep model to solve this problem. We fine-tune BERT on the training data to obtain semantic vectors, or sentence embeddings, for the test data. We use all the sentence embeddings of the test data to build our tree based on k-means, and perform beam search at prediction time when given a sentence as the query. We run experiments on the semantic textual similarity dataset Quora Question Pairs, processed for sentence ranking. Experimental results show that our methods outperform the strong baseline. Our tree accelerates prediction by 500%-1000% without losing much ranking accuracy. |
Tasks | Information Retrieval, Question Answering, Semantic Textual Similarity, Sentence Embeddings |
Published | 2019-08-22 |
URL | https://arxiv.org/abs/1908.08326v8 |
https://arxiv.org/pdf/1908.08326v8.pdf | |
PWC | https://paperswithcode.com/paper/revisit-semantic-representation-and-tree |
Repo | https://github.com/guotong1988/Semantic-Tree-Search |
Framework | none |
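The abstract's recipe — build a k-means tree over the test-set sentence embeddings, then beam-search it at query time — can be sketched as follows. This is an illustrative Python implementation under assumed parameters (branching factor, leaf size, beam width); the paper's tree construction details may differ.

```python
import numpy as np
from sklearn.cluster import KMeans

class KMeansTree:
    """Recursive k-means tree over sentence embeddings (a sketch)."""

    def __init__(self, embeddings, branch=8, leaf_size=64):
        self.emb = embeddings
        self.root = self._build(np.arange(len(embeddings)), branch, leaf_size)

    def _build(self, idx, branch, leaf_size):
        if len(idx) <= leaf_size:
            return {"leaf": idx}
        km = KMeans(n_clusters=branch, n_init=4).fit(self.emb[idx])
        children = [self._build(idx[km.labels_ == c], branch, leaf_size)
                    for c in range(branch)]
        return {"centers": km.cluster_centers_, "children": children}

    def search(self, query, beam=3):
        frontier, candidates = [self.root], []
        while frontier:
            node = frontier.pop()
            if "leaf" in node:
                candidates.extend(node["leaf"])
                continue
            # beam search: descend only into the `beam` closest clusters
            d = np.linalg.norm(node["centers"] - query, axis=1)
            for c in np.argsort(d)[:beam]:
                frontier.append(node["children"][c])
        return candidates  # candidate ids; re-rank by exact similarity
```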
Towards More Accurate Automatic Sleep Staging via Deep Transfer Learning
Title | Towards More Accurate Automatic Sleep Staging via Deep Transfer Learning |
Authors | Huy Phan, Oliver Y. Chén, Philipp Koch, Zongqing Lu, Ian McLoughlin, Alfred Mertins, Maarten De Vos |
Abstract | Although large annotated sleep databases are publicly available and can be used to train automated scoring algorithms, it remains challenging to develop an optimal algorithm for a particular sleep study, which may involve few subjects or rely on a different recording setup. Both directly applying a learned algorithm and retraining the algorithm on a rather small database are suboptimal, and state-of-the-art sleep staging algorithms based on deep neural networks demand a large amount of training data. This work presents a deep transfer learning approach to overcome the channel mismatch problem and enable transferring knowledge from a large dataset to a small cohort for automatic sleep staging. We start from a generic end-to-end deep learning framework for sequence-to-sequence sleep staging and derive two networks adhering to this framework as a vehicle for transfer learning. The networks are first trained in the source domain (i.e., the large database). The pretrained networks are then fine-tuned in the target domain (i.e., the small cohort) to complete the knowledge transfer. We employ the Montreal Archive of Sleep Studies (MASS) database, consisting of 200 subjects, as the source domain and study deep transfer learning on four different target domains: the Sleep Cassette and Sleep Telemetry subsets of the Sleep-EDF Expanded database, the Surrey-cEEGGrid database, and the Surrey-PSG database. The target domains are purposely adopted to cover different degrees of channel mismatch with the source domain. Our experimental results show significant performance improvements in automatic sleep staging on the target domains achieved with the proposed deep transfer learning approach, and we discuss the impact of various fine-tuning approaches. |
Tasks | Automatic Sleep Stage Classification, Multimodal Sleep Stage Detection, Sleep Stage Detection, Transfer Learning |
Published | 2019-07-30 |
URL | https://arxiv.org/abs/1907.13177v1 |
https://arxiv.org/pdf/1907.13177v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-more-accurate-automatic-sleep-staging |
Repo | https://github.com/pquochuy/sleep_transfer_learning |
Framework | tf |
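The transfer step itself is a standard pretrain-then-finetune recipe. A minimal PyTorch sketch is given below, assuming a `model` with a `feature_extractor` submodule and a checkpoint file named `pretrained_source.pt`; both names are placeholders, and the paper additionally compares several fine-tuning variants (e.g., which layers to unfreeze).

```python
import torch

def finetune(model, target_loader, freeze_features=True, epochs=10):
    # load weights pretrained on the large source database (placeholder path)
    model.load_state_dict(torch.load("pretrained_source.pt"))
    if freeze_features:
        # transfer: keep source-domain features, adapt only upper layers
        for p in model.feature_extractor.parameters():
            p.requires_grad = False
    opt = torch.optim.Adam(
        [p for p in model.parameters() if p.requires_grad], lr=1e-4)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in target_loader:  # the small target cohort
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
```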
Gating Mechanisms for Combining Character and Word-level Word Representations: An Empirical Study
Title | Gating Mechanisms for Combining Character and Word-level Word Representations: An Empirical Study |
Authors | Jorge A. Balazs, Yutaka Matsuo |
Abstract | In this paper we study how different ways of combining character- and word-level representations affect the quality of both final word and sentence representations. We provide strong empirical evidence that modeling characters improves the learned representations at the word and sentence levels, and that doing so is particularly useful when representing less frequent words. We further show that a feature-wise sigmoid gating mechanism is a robust method for creating representations that encode semantic similarity, as it performed reasonably well on several word similarity datasets. Finally, our findings suggest that properly capturing semantic similarity at the word level does not consistently yield improved performance in downstream sentence-level tasks. Our code is available at https://github.com/jabalazs/gating. |
Tasks | Semantic Similarity, Semantic Textual Similarity |
Published | 2019-04-11 |
URL | http://arxiv.org/abs/1904.05584v1 |
http://arxiv.org/pdf/1904.05584v1.pdf | |
PWC | https://paperswithcode.com/paper/gating-mechanisms-for-combining-character-and |
Repo | https://github.com/jabalazs/gating |
Framework | none |
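The feature-wise sigmoid gate at the center of this study can be written in a few lines. The sketch below computes the gate from the concatenation of the two representations, which is one common parameterization; the paper compares several gating variants, so treat this as illustrative.

```python
import torch
import torch.nn as nn

class SigmoidGate(nn.Module):
    """Feature-wise gate mixing word- and character-level vectors."""

    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, word_vec, char_vec):
        # g decides, per dimension, how much of each representation to keep
        g = torch.sigmoid(self.gate(torch.cat([word_vec, char_vec], dim=-1)))
        return g * word_vec + (1 - g) * char_vec
```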
Crypto-Oriented Neural Architecture Design
Title | Crypto-Oriented Neural Architecture Design |
Authors | Avital Shafran, Gil Segev, Shmuel Peleg, Yedid Hoshen |
Abstract | As neural networks revolutionize many applications, significant privacy concerns emerge. Owners of private data wish to use remote neural network services while ensuring their data cannot be interpreted by others. Service providers wish to keep their model private to safeguard its intellectual property. Such privacy conflicts may slow down the adoption of neural networks in sensitive domains such as healthcare. Privacy issues have been addressed by the cryptography community in the context of secure computation. However, secure computation protocols have known performance issues; for example, the runtime of secure inference in deep neural networks is three orders of magnitude longer compared to non-secure inference. Therefore, much research effort addresses the optimization of cryptographic protocols for secure inference. We take a complementary approach and provide design principles for optimizing crypto-oriented neural network architectures to reduce the runtime of secure inference. The principles are evaluated on three state-of-the-art architectures: SqueezeNet, ShuffleNetV2, and MobileNetV2. Our novel method significantly improves the efficiency of secure inference on common evaluation metrics. |
Tasks | |
Published | 2019-11-27 |
URL | https://arxiv.org/abs/1911.12322v1 |
https://arxiv.org/pdf/1911.12322v1.pdf | |
PWC | https://paperswithcode.com/paper/crypto-oriented-neural-architecture-design |
Repo | https://github.com/tf-encrypted/tf-encrypted |
Framework | tf |
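As one flavor of the design principles mentioned in the abstract, non-linearities (which dominate the cost of secure inference) can be applied to only part of the channels. The sketch below illustrates such a partial activation in PyTorch; it is an assumption-laden reading of the abstract, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class PartialReLU(nn.Module):
    """Apply the costly non-linearity to a fraction of the channels only."""

    def __init__(self, active_ratio=0.5):
        super().__init__()
        self.active_ratio = active_ratio  # illustrative hyperparameter

    def forward(self, x):
        k = int(x.shape[1] * self.active_ratio)
        # first k channels pass through ReLU, the rest stay linear
        return torch.cat([torch.relu(x[:, :k]), x[:, k:]], dim=1)
```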
The Natural Language of Actions
Title | The Natural Language of Actions |
Authors | Guy Tennenholtz, Shie Mannor |
Abstract | We introduce Act2Vec, a general framework for learning context-based action representations for reinforcement learning. Representing actions in a vector space helps reinforcement learning algorithms achieve better performance by grouping similar actions and utilizing relations between different actions. We show how prior knowledge of an environment can be extracted from demonstrations and injected into action vector representations that encode natural compatible behavior. We then use these representations for augmenting state representations as well as improving function approximation of Q-values. We visualize and test action embeddings in three domains, including a drawing task, a high-dimensional navigation task, and the large action space domain of StarCraft II. |
Tasks | Starcraft, Starcraft II |
Published | 2019-02-04 |
URL | https://arxiv.org/abs/1902.01119v2 |
https://arxiv.org/pdf/1902.01119v2.pdf | |
PWC | https://paperswithcode.com/paper/the-natural-language-of-actions |
Repo | https://github.com/1230113202/NV-JM-DD |
Framework | pytorch |
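The core of Act2Vec — treating demonstration trajectories as sentences and actions as words — can be approximated with an off-the-shelf skip-gram model. The toy sketch below uses gensim's Word2Vec; the action strings and hyperparameters are illustrative, not from the paper.

```python
from gensim.models import Word2Vec

# toy demonstrations: each trajectory is a sequence of discrete actions
demo_trajectories = [["up", "up", "fire", "left"],
                     ["left", "fire", "up", "up"]]

# skip-gram (sg=1) learns context-based action embeddings
model = Word2Vec(sentences=demo_trajectories, vector_size=16,
                 window=2, min_count=1, sg=1)

# actions used in similar contexts end up close in the vector space
similar = model.wv.most_similar("up")
```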
One-Shot Object Detection with Co-Attention and Co-Excitation
Title | One-Shot Object Detection with Co-Attention and Co-Excitation |
Authors | Ting-I Hsieh, Yi-Chen Lo, Hwann-Tzong Chen, Tyng-Luh Liu |
Abstract | This paper aims to tackle the challenging problem of one-shot object detection. Given a query image patch whose class label is not included in the training data, the goal of the task is to detect all instances of the same class in a target image. To this end, we develop a novel {\em co-attention and co-excitation} (CoAE) framework that makes contributions in three key technical aspects. First, we propose to use the non-local operation to explore the co-attention embodied in each query-target pair and yield region proposals accounting for the one-shot situation. Second, we formulate a squeeze-and-co-excitation scheme that can adaptively emphasize correlated feature channels to help uncover relevant proposals and eventually the target objects. Third, we design a margin-based ranking loss for implicitly learning a metric to predict the similarity of a region proposal to the underlying query, regardless of whether its class label is seen or unseen in training. The resulting model is therefore a two-stage detector that yields a strong baseline on both VOC and MS-COCO under the one-shot setting of detecting objects from both seen and never-seen classes. Code is available at https://github.com/timy90022/One-Shot-Object-Detection. |
Tasks | Object Detection, One-Shot Object Detection |
Published | 2019-11-28 |
URL | https://arxiv.org/abs/1911.12529v1 |
https://arxiv.org/pdf/1911.12529v1.pdf | |
PWC | https://paperswithcode.com/paper/one-shot-object-detection-with-co-attention-1 |
Repo | https://github.com/timy90022/One-Shot-Object-Detection |
Framework | pytorch |
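The squeeze-and-co-excitation scheme can be sketched by analogy with squeeze-and-excitation: pool the query features into channel statistics, predict per-channel weights, and re-weight both feature maps with the same weights. The PyTorch snippet below is a hedged reading of the abstract, with the reduction ratio and pooling choice assumed.

```python
import torch
import torch.nn as nn

class SqueezeCoExcitation(nn.Module):
    """Re-weight query and target features with shared channel weights."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, query_feat, target_feat):
        # squeeze: global average pool of the query patch features
        w = self.fc(query_feat.mean(dim=(2, 3)))  # (batch, channels)
        w = w[:, :, None, None]
        # co-excitation: emphasize correlated channels in both maps
        return query_feat * w, target_feat * w
```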
Robust Visual Domain Randomization for Reinforcement Learning
Title | Robust Visual Domain Randomization for Reinforcement Learning |
Authors | Reda Bahi Slaoui, William R. Clements, Jakob N. Foerster, Sébastien Toth |
Abstract | Producing agents that can generalize to a wide range of visually different environments is a significant challenge in reinforcement learning. One method for overcoming this issue is visual domain randomization, whereby at the start of each training episode some visual aspects of the environment are randomized so that the agent is exposed to many possible variations. However, domain randomization is highly inefficient and may lead to policies with high variance across domains. Instead, we propose a regularization method whereby the agent is only trained on one variation of the environment, and its learned state representations are regularized during training to be invariant across domains. We conduct experiments that demonstrate that our technique leads to more efficient and robust learning than standard domain randomization, while achieving equal generalization scores. |
Tasks | |
Published | 2019-10-23 |
URL | https://arxiv.org/abs/1910.10537v2 |
https://arxiv.org/pdf/1910.10537v2.pdf | |
PWC | https://paperswithcode.com/paper/robust-domain-randomization-for-reinforcement |
Repo | https://github.com/uncharted-technologies/robust-domain-randomization |
Framework | none |
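The proposed regularizer reduces to penalizing the distance between the encoder's features for a reference observation and for a visually randomized copy of it. A minimal sketch, assuming an `encoder` network and a `randomize` augmentation function (both placeholders):

```python
import torch

def invariance_loss(encoder, obs, randomize, weight=1.0):
    """Penalize representation differences across visual domains."""
    z_ref = encoder(obs)              # features on the reference domain
    z_rand = encoder(randomize(obs))  # features on a randomized copy
    # added to the RL loss so representations become domain-invariant
    return weight * (z_ref - z_rand).pow(2).mean()
```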
LIAAD at SemDeep-5 Challenge: Word-in-Context (WiC)
Title | LIAAD at SemDeep-5 Challenge: Word-in-Context (WiC) |
Authors | Daniel Loureiro, Alipio Jorge |
Abstract | This paper describes the LIAAD system that ranked second in the Word-in-Context (WiC) challenge featured in SemDeep-5. Our solution is based on a novel system for Word Sense Disambiguation (WSD) using contextual embeddings and full-inventory sense embeddings. We adapt this WSD system, in a straightforward manner, to the present task of detecting whether the same sense occurs in a pair of sentences. Additionally, we show that our solution achieves competitive performance even without using the provided training or development sets, mitigating potential concerns related to task overfitting. |
Tasks | Word Sense Disambiguation |
Published | 2019-06-24 |
URL | https://arxiv.org/abs/1906.10002v1 |
https://arxiv.org/pdf/1906.10002v1.pdf | |
PWC | https://paperswithcode.com/paper/liaad-at-semdeep-5-challenge-word-in-context |
Repo | https://github.com/danlou/LMMS |
Framework | none |
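The adaptation from WSD to WiC described above amounts to: disambiguate the target word in each sentence by nearest-neighbor search over sense embeddings, then predict "same sense" if both sentences resolve to the same sense. A hedged NumPy sketch, where `sense_vectors` and the two context vectors are assumed inputs (the LMMS repo's actual pipeline builds these from contextual embeddings):

```python
import numpy as np

def wic_predict(ctx_vec_a, ctx_vec_b, sense_vectors):
    """sense_vectors: dict mapping sense id -> embedding (assumed)."""
    def nearest_sense(v):
        # cosine-similarity nearest neighbor over the sense inventory
        sims = {s: np.dot(v, u) / (np.linalg.norm(v) * np.linalg.norm(u))
                for s, u in sense_vectors.items()}
        return max(sims, key=sims.get)
    # same resolved sense in both sentences -> predict "same meaning"
    return nearest_sense(ctx_vec_a) == nearest_sense(ctx_vec_b)
```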
Topic-Aware Neural Keyphrase Generation for Social Media Language
Title | Topic-Aware Neural Keyphrase Generation for Social Media Language |
Authors | Yue Wang, Jing Li, Hou Pong Chan, Irwin King, Michael R. Lyu, Shuming Shi |
Abstract | A huge volume of user-generated content is produced daily on social media. To facilitate automatic language understanding, we study keyphrase prediction, distilling salient information from massive posts. While most existing methods extract words from source posts to form keyphrases, we propose a sequence-to-sequence (seq2seq) based neural keyphrase generation framework, enabling absent keyphrases to be created. Moreover, our model, being topic-aware, allows joint modeling of corpus-level latent topic representations, which helps alleviate the data sparsity widely exhibited in social media language. Experiments on three datasets collected from English and Chinese social media platforms show that our model significantly outperforms both extraction and generation models that do not exploit latent topics. Further analysis shows that our model learns meaningful topics, which explains its superiority in social media keyphrase generation. |
Tasks | |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.03889v1 |
https://arxiv.org/pdf/1906.03889v1.pdf | |
PWC | https://paperswithcode.com/paper/topic-aware-neural-keyphrase-generation-for |
Repo | https://github.com/yuewang-cuhk/TAKG |
Framework | pytorch |
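One simple way to make a seq2seq model topic-aware, in the spirit of the abstract, is to condition the decoder on a latent topic vector. The sketch below fuses an assumed topic vector with the encoder summary to initialize the decoder; the paper's actual fusion and jointly trained neural topic model are more involved.

```python
import torch
import torch.nn as nn

class TopicAwareBridge(nn.Module):
    """Condition the decoder on a corpus-level latent topic vector."""

    def __init__(self, enc_dim=256, topic_dim=50, dec_dim=256):
        super().__init__()
        self.proj = nn.Linear(enc_dim + topic_dim, dec_dim)

    def forward(self, encoder_summary, topic_vector):
        # fuse source encoding with the post's inferred topic mixture
        fused = torch.cat([encoder_summary, topic_vector], dim=-1)
        return torch.tanh(self.proj(fused))  # decoder initial state
```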
A CCG-based Compositional Semantics and Inference System for Comparatives
Title | A CCG-based Compositional Semantics and Inference System for Comparatives |
Authors | Izumi Haruta, Koji Mineshima, Daisuke Bekki |
Abstract | Comparative constructions play an important role in natural language inference. However, attempts to study semantic representations and logical inferences for comparatives from the computational perspective are not well developed, due to the complexity of their syntactic structures and inference patterns. In this study, using a framework based on Combinatory Categorial Grammar (CCG), we present a compositional semantics that maps various comparative constructions in English to semantic representations and introduces an inference system that effectively handles logical inference with comparatives, including those involving numeral adjectives, antonyms, and quantification. We evaluate the performance of our system on the FraCaS test suite and show that the system can handle a variety of complex logical inferences with comparatives. |
Tasks | Natural Language Inference |
Published | 2019-10-02 |
URL | https://arxiv.org/abs/1910.00930v1 |
https://arxiv.org/pdf/1910.00930v1.pdf | |
PWC | https://paperswithcode.com/paper/a-ccg-based-compositional-semantics-and |
Repo | https://github.com/izumi-h/fracas-comparatives_adjectives |
Framework | none |
Long-Term Urban Vehicle Localization Using Pole Landmarks Extracted from 3-D Lidar Scans
Title | Long-Term Urban Vehicle Localization Using Pole Landmarks Extracted from 3-D Lidar Scans |
Authors | Alexander Schaefer, Daniel Büscher, Johan Vertens, Lukas Luft, Wolfram Burgard |
Abstract | Due to their ubiquity and long-term stability, pole-like objects are well suited to serve as landmarks for vehicle localization in urban environments. In this work, we present a complete mapping and long-term localization system based on pole landmarks extracted from 3-D lidar data. Our approach features a novel pole detector, a mapping module, and an online localization module, each of which is described in detail and for which we provide an open-source implementation at www.github.com/acschaefer/polex. In extensive experiments, we demonstrate that our method improves on the state of the art with respect to long-term reliability and accuracy: First, we prove reliability by tasking the system with localizing a mobile robot over the course of 15 months in an urban area based on an initial map, confronting it with constantly varying routes, differing weather conditions, seasonal changes, and construction sites. Second, we show that the proposed approach clearly outperforms a recently published method in terms of accuracy. |
Tasks | |
Published | 2019-10-23 |
URL | https://arxiv.org/abs/1910.10550v1 |
https://arxiv.org/pdf/1910.10550v1.pdf | |
PWC | https://paperswithcode.com/paper/long-term-urban-vehicle-localization-using |
Repo | https://github.com/acschaefer/polex |
Framework | none |
Enhancing AMR-to-Text Generation with Dual Graph Representations
Title | Enhancing AMR-to-Text Generation with Dual Graph Representations |
Authors | Leonardo F. R. Ribeiro, Claire Gardent, Iryna Gurevych |
Abstract | Generating text from graph-based data, such as Abstract Meaning Representation (AMR), is a challenging task due to the inherent difficulty of properly encoding the structure of a graph with labeled edges. To address this difficulty, we propose a novel graph-to-sequence model that encodes different but complementary perspectives of the structural information contained in the AMR graph. The model learns parallel top-down and bottom-up representations of nodes, capturing contrasting views of the graph. We also investigate the use of different node message-passing strategies, employing different state-of-the-art graph encoders to compute node representations based on incoming and outgoing perspectives. In our experiments, we demonstrate that the dual graph representation leads to improvements in AMR-to-text generation, achieving state-of-the-art results on two AMR datasets. |
Tasks | Graph-to-Sequence, Text Generation |
Published | 2019-09-01 |
URL | https://arxiv.org/abs/1909.00352v1 |
https://arxiv.org/pdf/1909.00352v1.pdf | |
PWC | https://paperswithcode.com/paper/enhancing-amr-to-text-generation-with-dual |
Repo | https://github.com/UKPLab/emnlp2019-dualgraph |
Framework | pytorch |
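The dual representation can be sketched by encoding each node once over the original (top-down) edges and once over the reversed (bottom-up) edges, then concatenating the two views. The toy PyTorch snippet below uses a single mean-aggregation step in place of the paper's graph encoders; the function names and aggregation are assumptions.

```python
import torch

def dual_encode(node_feats, edges):
    """node_feats: (N, d) tensor; edges: list of (src, dst) index pairs."""
    def aggregate(feats, pairs):
        # mean over incoming neighbors (one message-passing step)
        out = torch.zeros_like(feats)
        count = torch.zeros(feats.shape[0], 1, device=feats.device)
        for s, d in pairs:
            out[d] += feats[s]
            count[d] += 1
        return out / count.clamp(min=1)
    top_down = aggregate(node_feats, edges)                       # original edges
    bottom_up = aggregate(node_feats, [(d, s) for s, d in edges])  # reversed edges
    return torch.cat([top_down, bottom_up], dim=-1)  # (N, 2d) dual view
```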