April 3, 2020


Paper Group AWR 25


ACEnet: Anatomical Context-Encoding Network for Neuroanatomy Segmentation

Title ACEnet: Anatomical Context-Encoding Network for Neuroanatomy Segmentation
Authors Yuemeng Li, Hongming Li, Yong Fan
Abstract Segmentation of brain structures from magnetic resonance (MR) scans plays an important role in the quantification of brain morphology. Since 3D deep learning models suffer from high computational cost, 2D deep learning methods are favored for their computational efficiency. However, existing 2D deep learning methods are not equipped to effectively capture the 3D spatial contextual information needed to achieve accurate brain structure segmentation. To overcome this limitation, we develop an Anatomical Context-Encoding Network (ACEnet) that incorporates 3D spatial and anatomical contexts into 2D convolutional neural networks (CNNs) for efficient and accurate segmentation of brain structures from MR scans. ACEnet consists of 1) an anatomical context encoding module to incorporate anatomical information in 2D CNNs, 2) a spatial context encoding module to integrate 3D image information in 2D CNNs, and 3) a skull stripping module to guide 2D CNNs to attend to the brain. Extensive experiments on three benchmark datasets demonstrate that our method outperforms state-of-the-art alternatives for brain structure segmentation in terms of both computational efficiency and segmentation accuracy.
Tasks Skull Stripping
Published 2020-02-13
URL https://arxiv.org/abs/2002.05773v1
PDF https://arxiv.org/pdf/2002.05773v1.pdf
PWC https://paperswithcode.com/paper/acenet-anatomical-context-encoding-network
Repo https://github.com/ymli39/ACEnet-for-Neuroanatomy-Segmentation
Framework pytorch
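
A minimal sketch of the core idea, not the authors' exact architecture: a 2D CNN takes a stack of neighboring slices as input channels (3D spatial context), gates its features with a squeeze-and-excitation-style channel attention (anatomical context), and adds an auxiliary skull-stripping head. All layer sizes below are assumptions.

```python
import torch
import torch.nn as nn

class Context2DSegNet(nn.Module):
    def __init__(self, n_slices=3, n_classes=28, width=32):
        super().__init__()
        # 3D spatial context: neighboring slices enter as input channels.
        self.backbone = nn.Sequential(
            nn.Conv2d(n_slices, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Anatomical context gate: global pooling -> per-channel weights.
        self.context_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(width, width, 1), nn.Sigmoid()
        )
        self.seg_head = nn.Conv2d(width, n_classes, 1)   # structure labels
        self.brain_head = nn.Conv2d(width, 1, 1)         # skull-strip mask

    def forward(self, x):                # x: (B, n_slices, H, W)
        f = self.backbone(x)
        f = f * self.context_gate(f)     # attend using global anatomical context
        return self.seg_head(f), self.brain_head(f)

seg, brain = Context2DSegNet()(torch.randn(2, 3, 128, 128))
```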

Stance Detection Benchmark: How Robust Is Your Stance Detection?

Title Stance Detection Benchmark: How Robust Is Your Stance Detection?
Authors Benjamin Schiller, Johannes Daxenberger, Iryna Gurevych
Abstract Stance Detection (StD) aims to detect an author’s stance towards a certain topic or claim and has become a key component in applications like fake news detection, claim validation, and argument search. However, while stance is easily detected by humans, machine learning models clearly fall short on this task. Given the major differences in dataset sizes and framing of StD (e.g. number of classes and inputs), we introduce a StD benchmark that learns from ten StD datasets of various domains in a multi-dataset learning (MDL) setting, as well as from related tasks via transfer learning. Within this benchmark setup, we are able to present new state-of-the-art results on five of the datasets. Yet, the models still perform well below human capabilities, and even simple adversarial attacks severely hurt the performance of MDL models. Deeper investigation into this phenomenon suggests the existence of biases inherited from multiple datasets by design. Our analysis emphasizes the need to focus on robustness and de-biasing strategies in multi-task learning approaches. The benchmark dataset and code are made available.
Tasks Fake News Detection, Multi-Task Learning, Stance Detection, Transfer Learning
Published 2020-01-06
URL https://arxiv.org/abs/2001.01565v1
PDF https://arxiv.org/pdf/2001.01565v1.pdf
PWC https://paperswithcode.com/paper/stance-detection-benchmark-how-robust-is-your
Repo https://github.com/UKPLab/mdl-stance-robustness
Framework pytorch
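
A hedged sketch of the multi-dataset learning setup: one shared encoder with a separate classification head per stance dataset, since the datasets differ in their number of classes. The bag-of-embeddings encoder and the dataset names here are placeholders, not the paper's BERT-based setup.

```python
import torch
import torch.nn as nn

class MDLStance(nn.Module):
    def __init__(self, vocab=30000, dim=256, heads={"semeval": 3, "fnc1": 4}):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab, dim)   # stand-in shared encoder
        self.heads = nn.ModuleDict(
            {name: nn.Linear(dim, n) for name, n in heads.items()}
        )

    def forward(self, token_ids, dataset):         # route to the dataset's head
        return self.heads[dataset](self.embed(token_ids))

logits = MDLStance()(torch.randint(0, 30000, (8, 40)), "fnc1")
```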

Modeling ASR Ambiguity for Dialogue State Tracking Using Word Confusion Networks

Title Modeling ASR Ambiguity for Dialogue State Tracking Using Word Confusion Networks
Authors Vaishali Pal, Fabien Guillot, Jean-Michel Renders, Laurent Besacier
Abstract Spoken dialogue systems typically use a list of top-N ASR hypotheses for inferring the semantic meaning and tracking the state of the dialogue. However, ASR graphs, such as confusion networks (confnets), provide a compact representation of a richer hypothesis space than a top-N ASR list. In this paper, we study the benefits of using confusion networks with a state-of-the-art neural dialogue state tracker (DST). We encode the 2-dimensional confnet into a 1-dimensional sequence of embeddings using an attentional confusion network encoder which can be used with any DST system. Our confnet encoder is plugged into the state-of-the-art ‘Global-Locally Self-Attentive Dialogue State Tracker’ (GLAD) model for DST and obtains significant improvements in both accuracy and inference time compared to using top-N ASR hypotheses.
Tasks Dialogue State Tracking, Spoken Dialogue Systems
Published 2020-02-03
URL https://arxiv.org/abs/2002.00768v1
PDF https://arxiv.org/pdf/2002.00768v1.pdf
PWC https://paperswithcode.com/paper/modeling-asr-ambiguity-for-dialogue-state
Repo https://github.com/kolk/MODELING-ASR-AMBIGUITY-FOR-NEURAL-DIALOGUE-STATE-TRACKING-USING-WORD-CONFUSION-NETWORKS
Framework pytorch
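
A sketch of the attentional confnet encoder idea: each confnet bin holds several word alternatives with ASR posteriors, and attention over the alternatives collapses every bin to a single embedding, yielding the 1-dimensional sequence any DST encoder can consume. Dimensions and the scoring function are assumptions.

```python
import torch
import torch.nn as nn

class ConfnetEncoder(nn.Module):
    def __init__(self, vocab=10000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.score = nn.Linear(dim + 1, 1)   # attend on embedding + ASR posterior

    def forward(self, words, probs):
        # words/probs: (B, T, K) - T confnet bins, K alternatives per bin
        e = self.embed(words)                               # (B, T, K, D)
        a = self.score(torch.cat([e, probs.unsqueeze(-1)], -1))
        w = torch.softmax(a, dim=2)                         # over the K alternatives
        return (w * e).sum(dim=2)                           # (B, T, D) sequence

seq = ConfnetEncoder()(torch.randint(0, 10000, (2, 12, 5)), torch.rand(2, 12, 5))
```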

Plato Dialogue System: A Flexible Conversational AI Research Platform

Title Plato Dialogue System: A Flexible Conversational AI Research Platform
Authors Alexandros Papangelis, Mahdi Namazifar, Chandra Khatri, Yi-Chia Wang, Piero Molino, Gokhan Tur
Abstract As the field of Spoken Dialogue Systems and Conversational AI grows, so does the need for tools and environments that abstract away implementation details in order to expedite the development process, lower the barrier of entry to the field, and offer a common test-bed for new ideas. In this paper, we present Plato, a flexible Conversational AI platform written in Python that supports any kind of conversational agent architecture, from standard architectures to architectures with jointly-trained components, single- or multi-party interactions, and offline or online training of any conversational agent component. Plato has been designed to be easy to understand and debug and is agnostic to the underlying learning frameworks that train each component.
Tasks Spoken Dialogue Systems
Published 2020-01-17
URL https://arxiv.org/abs/2001.06463v1
PDF https://arxiv.org/pdf/2001.06463v1.pdf
PWC https://paperswithcode.com/paper/plato-dialogue-system-a-flexible
Repo https://github.com/uber-research/plato-research-dialogue-system
Framework tf

Two-Stream Aural-Visual Affect Analysis in the Wild

Title Two-Stream Aural-Visual Affect Analysis in the Wild
Authors Felix Kuhnke, Lars Rumberg, Jörn Ostermann
Abstract Human affect recognition is an essential part of natural human-computer interaction. However, current methods are still in their infancy, especially for in-the-wild data. In this work, we introduce our submission to the Affective Behavior Analysis in-the-wild (ABAW) 2020 competition. We propose a two-stream aural-visual analysis model to recognize affective behavior from videos. Audio and image streams are first processed separately and fed into a convolutional neural network. Instead of applying recurrent architectures for temporal analysis, we use only temporal convolutions. Furthermore, the model is given access to additional features extracted during face alignment. At training time, we exploit correlations between different emotion representations to improve performance. Our model achieves promising results on the challenging Aff-Wild2 database.
Tasks Face Alignment
Published 2020-02-09
URL https://arxiv.org/abs/2002.03399v2
PDF https://arxiv.org/pdf/2002.03399v2.pdf
PWC https://paperswithcode.com/paper/two-stream-aural-visual-affect-analysis-in
Repo https://github.com/kuhnkeF/ABAW2020TNT
Framework pytorch
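
A minimal sketch of the two-stream-plus-temporal-convolution idea (shapes and widths assumed, not the submission's exact networks): per-frame audio and visual features are concatenated and analyzed over time with 1-D convolutions instead of an RNN.

```python
import torch
import torch.nn as nn

class TwoStreamAV(nn.Module):
    def __init__(self, a_dim=64, v_dim=128, n_out=7):
        super().__init__()
        self.temporal = nn.Sequential(           # temporal convolutions only
            nn.Conv1d(a_dim + v_dim, 128, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.Conv1d(128, n_out, kernel_size=5, padding=2),
        )

    def forward(self, audio_feats, video_feats):  # both (B, T, dim)
        x = torch.cat([audio_feats, video_feats], dim=-1).transpose(1, 2)
        return self.temporal(x).transpose(1, 2)   # (B, T, n_out) per-frame affect

out = TwoStreamAV()(torch.randn(2, 30, 64), torch.randn(2, 30, 128))
```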

Clinical Text Summarization with Syntax-Based Negation and Semantic Concept Identification

Title Clinical Text Summarization with Syntax-Based Negation and Semantic Concept Identification
Authors Wei-Hung Weng, Yu-An Chung, Schrasing Tong
Abstract In the era of clinical information explosion, a good strategy for clinical text summarization is helpful for improving the clinical workflow. The ideal summarization strategy preserves important information in informative but less organized, ill-structured clinical narrative texts. Instead of using pure statistical learning approaches, which are difficult to interpret and explain, we combined computational linguistics with a biomedical knowledge base curated by human experts to achieve interpretable and meaningful clinical text summarization. Our research objective is to use a biomedical ontology with semantic information, and to take advantage of the hierarchical structure of language, the constituency tree, in order to identify the correct clinical concepts and the corresponding negation information, which is critical for summarizing clinical concepts from narrative text. We achieved clinically acceptable performance for both negation detection and concept identification, and clinical concepts with common negated patterns can be identified and negated by the proposed method.
Tasks Negation Detection, Text Summarization
Published 2020-02-29
URL https://arxiv.org/abs/2003.00353v1
PDF https://arxiv.org/pdf/2003.00353v1.pdf
PWC https://paperswithcode.com/paper/clinical-text-summarization-with-syntax-based
Repo https://github.com/ckbjimmy/clneg
Framework none
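
A toy illustration of cue-and-scope negation on tokens. The paper itself operates on constituency trees and a biomedical ontology; this sketch only shows the basic flag-concepts-after-a-cue idea, with a hypothetical cue list and window size.

```python
NEG_CUES = {"no", "denies", "without", "negative"}

def negate_concepts(tokens, concepts, window=4):
    """Mark a concept negated if a cue appears within `window` tokens before it."""
    flags = {}
    for concept, idx in concepts:            # (concept text, token index)
        scope = tokens[max(0, idx - window):idx]
        flags[concept] = any(t.lower() in NEG_CUES for t in scope)
    return flags

tokens = "patient denies chest pain but reports fever".split()
print(negate_concepts(tokens, [("chest pain", 2), ("fever", 6)]))
# -> {'chest pain': True, 'fever': False}
```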

Graph Convolutional Topic Model for Data Streams

Title Graph Convolutional Topic Model for Data Streams
Authors Ngo Van Linh, Tran Xuan Bach, Khoat Than
Abstract Learning hidden topics in data streams has received a great deal of attention from researchers, with many proposed methods, but these methods rarely take adequate advantage of prior knowledge in general and knowledge graphs in particular. Prior knowledge derived from human knowledge (e.g. WordNet) or a pre-trained model (e.g. Word2vec) is very valuable and useful for helping topic models work better, especially on short texts. However, previous work often ignores this resource, or can only utilize prior knowledge in vector form in a simple way. In this paper, we propose a novel graph convolutional topic model (GCTM) which integrates graph convolutional networks (GCN) into a topic model, together with a learning method that learns the networks and the topic model simultaneously for data streams. In each minibatch, our method can not only exploit an external knowledge graph but also balance between external and old knowledge to perform well on new data. We conduct extensive experiments to evaluate our method with both a human knowledge graph (WordNet) and a graph built from pre-trained word embeddings (Word2vec). The experimental results show that our method achieves significantly better performance than state-of-the-art baselines in terms of probabilistic predictive measure and topic coherence. In particular, our method works well when dealing with short texts as well as concept drift. The implementation of GCTM is available at https://github.com/bachtranxuan/GCTM.git.
Tasks Topic Models, Word Embeddings
Published 2020-03-13
URL https://arxiv.org/abs/2003.06112v2
PDF https://arxiv.org/pdf/2003.06112v2.pdf
PWC https://paperswithcode.com/paper/graph-convolutional-topic-model-for-data
Repo https://github.com/bachtranxuan/GCTM
Framework pytorch
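
A minimal sketch of the GCTM idea under stated assumptions: one graph-convolution step over a word knowledge graph produces word representations that parameterize topic-word distributions. The single-layer GCN, the sizes, and the identity stand-in graph are all illustrative simplifications.

```python
import torch
import torch.nn as nn

class GCNTopicPrior(nn.Module):
    def __init__(self, n_words=1000, dim=100, n_topics=50):
        super().__init__()
        self.x = nn.Parameter(torch.randn(n_words, dim))   # word features
        self.w = nn.Linear(dim, n_topics)

    def forward(self, a_hat):
        # a_hat: normalized adjacency (n_words, n_words), e.g. from WordNet
        h = torch.relu(a_hat @ self.x)                     # graph convolution
        return torch.softmax(self.w(h), dim=0)             # word-topic matrix

a_hat = torch.eye(1000)          # stand-in graph: identity (no edges)
beta = GCNTopicPrior()(a_hat)    # (1000 words, 50 topics), columns sum to 1
```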

Med7: a transferable clinical natural language processing model for electronic health records

Title Med7: a transferable clinical natural language processing model for electronic health records
Authors Andrey Kormilitzin, Nemanja Vaci, Qiang Liu, Alejo Nevado-Holgado
Abstract The field of clinical natural language processing has advanced significantly since the introduction of deep learning models. Self-supervised representation learning and the transfer learning paradigm have become the methods of choice in many natural language processing applications, in particular in settings with a dearth of high-quality manually annotated data. Electronic health record systems are ubiquitous, and the majority of patients’ data are now collected electronically, in particular in the form of free text. Identification of medical concepts and information extraction is a challenging task, yet an important ingredient for parsing unstructured data into a structured and tabulated format for downstream analytical tasks. In this work we introduce a named-entity recognition model for clinical natural language processing. The model is trained to recognise seven categories: drug names, route, frequency, dosage, strength, form, and duration. The model was first pre-trained in a self-supervised manner, by predicting the next word, on a collection of 2 million free-text patients’ records from the MIMIC-III corpora, and then fine-tuned on the named-entity recognition task. The model achieved a lenient (strict) micro-averaged F1 score of 0.957 (0.893) across all seven categories. Additionally, we evaluated the transferability of the developed model using data from an Intensive Care Unit in the US to secondary care mental health records (CRIS) in the UK. A direct application of the trained NER model to CRIS data resulted in reduced performance of F1=0.762; however, after fine-tuning on a small sample from CRIS, the model achieved a reasonable performance of F1=0.944. This demonstrates that, despite a close similarity between the data sets and the NER tasks, it is essential to fine-tune on the target domain data in order to achieve more accurate results.
Tasks Named Entity Recognition, Representation Learning, Transfer Learning
Published 2020-03-03
URL https://arxiv.org/abs/2003.01271v1
PDF https://arxiv.org/pdf/2003.01271v1.pdf
PWC https://paperswithcode.com/paper/med7-a-transferable-clinical-natural-language
Repo https://github.com/kormilitzin/med7
Framework none
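
The released model is distributed as a spaCy pipeline; a hedged usage sketch follows, where the package name `en_core_med7_lg` follows the repository's README and may differ between releases, and the printed entities are only an example of the expected output shape.

```python
import spacy

med7 = spacy.load("en_core_med7_lg")   # assumed package name, see repo README
doc = med7("Magnesium hydroxide 400mg/5ml suspension PO of total 30ml daily")
print([(ent.text, ent.label_) for ent in doc.ents])
# e.g. [('Magnesium hydroxide', 'DRUG'), ('400mg/5ml', 'STRENGTH'), ...]
```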

Deep Spatial Gradient and Temporal Depth Learning for Face Anti-spoofing

Title Deep Spatial Gradient and Temporal Depth Learning for Face Anti-spoofing
Authors Zezheng Wang, Zitong Yu, Chenxu Zhao, Xiangyu Zhu, Yunxiao Qin, Qiusheng Zhou, Feng Zhou, Zhen Lei
Abstract Face anti-spoofing is critical to the security of face recognition systems. Depth-supervised learning has been proven one of the most effective methods for face anti-spoofing. Despite this great success, most previous works still formulate the problem as a single-frame multi-task one by simply augmenting the loss with depth, while neglecting detailed fine-grained information and the interplay between facial depth and moving patterns. In contrast, we design a new approach to detect presentation attacks from multiple frames based on two insights: 1) detailed discriminative clues (e.g., spatial gradient magnitude) between living and spoofing faces may be discarded through stacked vanilla convolutions, and 2) the dynamics of 3D moving faces provide important clues for detecting spoofing faces. The proposed method captures discriminative details via a Residual Spatial Gradient Block (RSGB) and efficiently encodes spatio-temporal information with a Spatio-Temporal Propagation Module (STPM). Moreover, a novel Contrastive Depth Loss is presented for more accurate depth supervision. To assess the efficacy of our method, we also collect a Double-modal Anti-spoofing Dataset (DMAD) which provides actual depth for each sample. The experiments demonstrate that the proposed approach achieves state-of-the-art results on five benchmark datasets including OULU-NPU, SiW, CASIA-MFSD, Replay-Attack, and the new DMAD. Codes will be available at https://github.com/clks-wzz/FAS-SGTD.
Tasks Face Anti-Spoofing, Face Recognition
Published 2020-03-18
URL https://arxiv.org/abs/2003.08061v1
PDF https://arxiv.org/pdf/2003.08061v1.pdf
PWC https://paperswithcode.com/paper/deep-spatial-gradient-and-temporal-depth
Repo https://github.com/clks-wzz/FAS-SGTD
Framework tf
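
A simplified sketch of the spatial-gradient intuition behind the RSGB (the real block is a learned residual unit; this version uses fixed Sobel filters): horizontal and vertical gradients are extracted depthwise and their magnitude is added back to the features as a residual.

```python
import torch
import torch.nn.functional as F

SOBEL_X = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
SOBEL_Y = SOBEL_X.t()

def gradient_residual(feat):
    """feat: (B, C, H, W); adds per-channel spatial gradient magnitude."""
    c = feat.shape[1]
    kx = SOBEL_X.repeat(c, 1, 1, 1)            # depthwise kernels (C, 1, 3, 3)
    ky = SOBEL_Y.repeat(c, 1, 1, 1)
    gx = F.conv2d(feat, kx, padding=1, groups=c)
    gy = F.conv2d(feat, ky, padding=1, groups=c)
    return feat + torch.sqrt(gx**2 + gy**2 + 1e-8)   # residual connection

out = gradient_residual(torch.randn(2, 8, 32, 32))
```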

Unsupervised Enhancement of Soft-biometric Privacy with Negative Face Recognition

Title Unsupervised Enhancement of Soft-biometric Privacy with Negative Face Recognition
Authors Philipp Terhörst, Marco Huber, Naser Damer, Florian Kirchbuchner, Arjan Kuijper
Abstract Current research on soft-biometrics has shown that privacy-sensitive information can be deduced from the biometric templates of an individual. Since for many applications these templates are expected to be used for recognition purposes only, this raises major privacy issues. Previous works focused on supervised privacy-enhancing solutions that require privacy-sensitive information about individuals and limit their application to the suppression of single, pre-defined attributes. Consequently, they do not take into account attributes that are not considered in the training. In this work, we present Negative Face Recognition (NFR), a novel face recognition approach that enhances soft-biometric privacy at the template level by representing face templates in a complementary (negative) domain. While ordinary templates characterize facial properties of an individual, negative templates describe facial properties that do not exist for this individual. This suppresses privacy-sensitive information in stored templates. Experiments are conducted on two publicly available datasets, captured under controlled and uncontrolled scenarios, with three privacy-sensitive attributes. The experiments demonstrate that our proposed approach reaches higher suppression rates than previous work while also maintaining higher recognition performance. Unlike previous works, our approach does not require privacy-sensitive labels and offers more comprehensive privacy protection that is not limited to pre-defined attributes.
Tasks Face Recognition
Published 2020-02-21
URL https://arxiv.org/abs/2002.09181v1
PDF https://arxiv.org/pdf/2002.09181v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-enhancement-of-soft-biometric
Repo https://github.com/pterhoer/PrivacyPreservingFaceRecognition
Framework none

Learning to Continually Learn

Title Learning to Continually Learn
Authors Shawn Beaulieu, Lapo Frati, Thomas Miconi, Joel Lehman, Kenneth O. Stanley, Jeff Clune, Nick Cheney
Abstract Continual lifelong learning requires an agent or model to learn many sequentially ordered tasks, building on previous knowledge without catastrophically forgetting it. Much work has gone towards preventing the default tendency of machine learning models to catastrophically forget, yet virtually all such work involves manually-designed solutions to the problem. We instead advocate meta-learning a solution to catastrophic forgetting, allowing AI to learn to continually learn. Inspired by neuromodulatory processes in the brain, we propose A Neuromodulated Meta-Learning Algorithm (ANML). It differentiates through a sequential learning process to meta-learn an activation-gating function that enables context-dependent selective activation within a deep neural network. Specifically, a neuromodulatory (NM) neural network gates the forward pass of another (otherwise normal) neural network called the prediction learning network (PLN). The NM network thus also indirectly controls selective plasticity (i.e., the backward pass) of the PLN. ANML enables continual learning without catastrophic forgetting at scale: it produces state-of-the-art continual learning performance, sequentially learning as many as 600 classes (over 9,000 SGD updates).
Tasks Continual Learning, Meta-Learning
Published 2020-02-21
URL https://arxiv.org/abs/2002.09571v2
PDF https://arxiv.org/pdf/2002.09571v2.pdf
PWC https://paperswithcode.com/paper/learning-to-continually-learn
Repo https://github.com/uvm-neurobotics-lab/ANML
Framework pytorch
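
The core ANML mechanism in miniature (layer sizes assumed): the NM network emits a sigmoid gate that multiplies the PLN's activations, so the gate shapes the forward pass and, through it, which PLN weights receive gradient.

```python
import torch
import torch.nn as nn

class ANMLToy(nn.Module):
    def __init__(self, d_in=784, d_hid=256, n_cls=600):
        super().__init__()
        self.nm = nn.Sequential(nn.Linear(d_in, d_hid), nn.Sigmoid())
        self.pln_body = nn.Linear(d_in, d_hid)
        self.pln_head = nn.Linear(d_hid, n_cls)

    def forward(self, x):
        gate = self.nm(x)                      # context-dependent gate in (0, 1)
        h = torch.relu(self.pln_body(x)) * gate  # gated forward pass of the PLN
        return self.pln_head(h)

logits = ANMLToy()(torch.randn(4, 784))
```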

Weakly supervised discriminative feature learning with state information for person identification

Title Weakly supervised discriminative feature learning with state information for person identification
Authors Hong-Xing Yu, Wei-Shi Zheng
Abstract Unsupervised learning of identity-discriminative visual features is appealing in real-world tasks where manual labelling is costly. However, the images of an identity can be visually discrepant when taken under different states, e.g. different camera views and poses. This visual discrepancy leads to great difficulty in unsupervised discriminative learning. Fortunately, in real-world tasks we can often know the states without human annotation; e.g. we can easily have camera view labels in person re-identification and facial pose labels in face recognition. In this work we propose utilizing the state information as weak supervision to address the visual discrepancy caused by different states. We formulate a simple pseudo-label model and utilize the state information to refine the assigned pseudo labels via weakly supervised decision boundary rectification and weakly supervised feature drift regularization. We evaluate our model on unsupervised person re-identification and pose-invariant face recognition. Despite the simplicity of our method, it outperforms the state-of-the-art results on the Duke-reID, MultiPIE and CFP datasets with a standard ResNet-50 backbone. We also find that our model performs comparably with standard supervised fine-tuning results on the three datasets. Code is available at https://github.com/KovenYu/state-information.
Tasks Face Recognition, Person Identification, Person Re-Identification, Robust Face Recognition, Unsupervised Person Re-Identification
Published 2020-02-27
URL https://arxiv.org/abs/2002.11939v1
PDF https://arxiv.org/pdf/2002.11939v1.pdf
PWC https://paperswithcode.com/paper/weakly-supervised-discriminative-feature
Repo https://github.com/KovenYu/state-information
Framework pytorch
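
A rough sketch of using state labels (e.g. camera IDs) as weak supervision: removing per-state feature means before clustering reduces state-induced discrepancy in the pseudo labels. This is an illustrative simplification of the paper's refinement procedure, not its exact method.

```python
import numpy as np
from sklearn.cluster import KMeans

def state_normalized_pseudo_labels(feats, states, n_ids=10):
    """feats: (N, D) embeddings; states: (N,) camera/pose labels."""
    feats = feats.copy()
    for s in np.unique(states):
        m = states == s
        feats[m] -= feats[m].mean(axis=0)     # suppress state-specific shift
    return KMeans(n_clusters=n_ids, n_init=10).fit_predict(feats)

labels = state_normalized_pseudo_labels(
    np.random.randn(200, 64), np.random.randint(0, 6, size=200))
```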

Diversity-Achieving Slow-DropBlock Network for Person Re-Identification

Title Diversity-Achieving Slow-DropBlock Network for Person Re-Identification
Authors Xiaofu Wu, Ben Xie, Shiliang Zhao, Suofei Zhang, Yong Xiao, Ming Li
Abstract A big challenge of person re-identification (Re-ID) using a multi-branch network architecture is to learn diverse features from the ID-labeled dataset. The 2-branch Batch DropBlock (BDB) network was recently proposed for achieving diversity between the global branch and the feature-dropping branch. In this paper, we propose to move the dropping operation from the intermediate feature layer towards the input (image dropping). Since it may drop a large portion of input images, this makes the training hard to converge. Hence, we propose a novel double-batch-split co-training approach to remedy this problem. In particular, we show that feature diversity can be well achieved with the use of multiple dropping branches by setting an individual dropping ratio for each branch. Empirical evidence demonstrates that the proposed method performs better than BDB on popular person Re-ID datasets, including Market-1501, DukeMTMC-reID and CUHK03, and that the use of more dropping branches can further boost performance.
Tasks Person Re-Identification
Published 2020-02-09
URL https://arxiv.org/abs/2002.04414v1
PDF https://arxiv.org/pdf/2002.04414v1.pdf
PWC https://paperswithcode.com/paper/diversity-achieving-slow-dropblock-network
Repo https://github.com/AI-NERC-NUPT/SDB
Framework pytorch
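
A sketch of the image-dropping idea with per-branch ratios: each branch sees the batch with a different-sized horizontal stripe zeroed out. The stripe placement and the two-branch setup are illustrative assumptions.

```python
import torch

def drop_stripe(images, ratio):
    """Zero a random horizontal stripe covering `ratio` of image height."""
    b, c, h, w = images.shape
    dh = int(h * ratio)
    top = torch.randint(0, h - dh + 1, (1,)).item()
    out = images.clone()
    out[:, :, top:top + dh, :] = 0
    return out

batch = torch.randn(8, 3, 256, 128)
branch_inputs = [drop_stripe(batch, r) for r in (0.3, 0.5)]  # one per branch
```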

Deep Learning for Person Re-identification: A Survey and Outlook

Title Deep Learning for Person Re-identification: A Survey and Outlook
Authors Mang Ye, Jianbing Shen, Gaojie Lin, Tao Xiang, Ling Shao, Steven C. H. Hoi
Abstract Person re-identification (Re-ID) aims at retrieving a person of interest across multiple non-overlapping cameras. With the advancement of deep neural networks and the increasing demand for intelligent video surveillance, it has gained significantly increased interest in the computer vision community. By dissecting the components involved in developing a person Re-ID system, we categorize it into closed-world and open-world settings. The widely studied closed-world setting is usually applied under various research-oriented assumptions and has achieved inspiring success using deep learning techniques on a number of datasets. We first conduct a comprehensive overview with in-depth analysis of closed-world person Re-ID from three different perspectives, including deep feature representation learning, deep metric learning and ranking optimization. With performance saturating under the closed-world setting, the research focus for person Re-ID has recently shifted to the open-world setting, which faces more challenging issues. This setting is closer to practical applications under specific scenarios. We summarize open-world Re-ID in terms of five different aspects. By analyzing the advantages of existing methods, we design a powerful AGW baseline, achieving state-of-the-art or at least comparable performance on both single- and cross-modality Re-ID tasks. Meanwhile, we introduce a new evaluation metric (mINP) for person Re-ID, indicating the cost of finding all correct matches, which provides an additional criterion for evaluating a Re-ID system for real applications. Finally, some important yet under-investigated open issues are discussed.
Tasks Metric Learning, Person Re-Identification, Representation Learning
Published 2020-01-13
URL https://arxiv.org/abs/2001.04193v1
PDF https://arxiv.org/pdf/2001.04193v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-for-person-re-identification-a
Repo https://github.com/mangye16/ReID-Survey
Framework pytorch
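
The mINP metric as described in the abstract, computed for one query: the inverse negative penalty is |G| / R_hard, where |G| is the number of true matches and R_hard is the rank of the last (hardest) correct match; mINP averages this over queries.

```python
def inp(ranked_match_flags):
    """ranked_match_flags: booleans over the gallery ranking (True = correct)."""
    positives = [i + 1 for i, ok in enumerate(ranked_match_flags) if ok]
    r_hard = positives[-1]               # rank of the hardest correct match
    return len(positives) / r_hard

def m_inp(all_queries):
    return sum(inp(q) for q in all_queries) / len(all_queries)

# Query with 2 true matches, the harder one at rank 4 -> INP = 2/4 = 0.5
print(m_inp([[True, False, False, True, False]]))
```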

CoTK: An Open-Source Toolkit for Fast Development and Fair Evaluation of Text Generation

Title CoTK: An Open-Source Toolkit for Fast Development and Fair Evaluation of Text Generation
Authors Fei Huang, Dazhen Wan, Zhihong Shao, Pei Ke, Jian Guan, Yilin Niu, Xiaoyan Zhu, Minlie Huang
Abstract In text generation evaluation, many practical issues, such as inconsistent experimental settings and metric implementations, are often ignored but lead to unfair evaluation and untenable conclusions. We present CoTK, an open-source toolkit aiming to support fast development and fair evaluation of text generation. In model development, CoTK helps handle cumbersome issues such as data processing, metric implementation, and reproduction. It standardizes the development steps and reduces human errors that may lead to inconsistent experimental settings. In model evaluation, CoTK provides implementations of many commonly used metrics and benchmark models across different experimental settings. As a unique feature, CoTK can signify when and which metrics cannot be fairly compared. We demonstrate that it is convenient to use CoTK for model development and evaluation, particularly across different experimental settings.
Tasks Text Generation
Published 2020-02-03
URL https://arxiv.org/abs/2002.00583v1
PDF https://arxiv.org/pdf/2002.00583v1.pdf
PWC https://paperswithcode.com/paper/cotk-an-open-source-toolkit-for-fast
Repo https://github.com/thu-coai/cotk
Framework pytorch
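
An illustration of the "signify when metrics cannot be fairly compared" idea: fingerprint the evaluation settings and refuse comparison across different fingerprints. This mimics the concept only; it is not CoTK's actual API, and the setting names are hypothetical.

```python
import hashlib, json

def settings_hash(dataset, tokenizer, metric, **kwargs):
    """Stable fingerprint of everything that affects a metric's value."""
    blob = json.dumps([dataset, tokenizer, metric, kwargs], sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()[:12]

a = settings_hash("opensubtitles", "nltk", "bleu", n=4)
b = settings_hash("opensubtitles", "bert-base", "bleu", n=4)
assert a != b  # different tokenization: the BLEU scores are not comparable
```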