January 31, 2020

2973 words 14 mins read

Paper Group AWR 418

Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets

Title Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets
Authors Yifan Peng, Shankai Yan, Zhiyong Lu
Abstract
Tasks Transfer Learning
Published 2019-06-13
URL https://arxiv.org/abs/1906.05474v2
PDF https://arxiv.org/pdf/1906.05474v2.pdf
PWC https://paperswithcode.com/paper/transfer-learning-in-biomedical-natural
Repo https://github.com/ncbi-nlp/BLUE_Benchmark
Framework none

FaceForensics++: Learning to Detect Manipulated Facial Images

Title FaceForensics++: Learning to Detect Manipulated Facial Images
Authors Andreas Rössler, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies, Matthias Nießner
Abstract The rapid progress in synthetic image generation and manipulation has now come to a point where it raises significant concerns about its implications for society. At best, this leads to a loss of trust in digital content, but it could potentially cause further harm by spreading false information or fake news. This paper examines the realism of state-of-the-art image manipulations, and how difficult it is to detect them, either automatically or by humans. To standardize the evaluation of detection methods, we propose an automated benchmark for facial manipulation detection. In particular, the benchmark is based on DeepFakes, Face2Face, FaceSwap and NeuralTextures as prominent representatives for facial manipulations at random compression levels and sizes. The benchmark is publicly available and contains a hidden test set as well as a database of over 1.8 million manipulated images. This dataset is over an order of magnitude larger than comparable, publicly available forgery datasets. Based on this data, we performed a thorough analysis of data-driven forgery detectors. We show that the use of additional domain-specific knowledge improves forgery detection to unprecedented accuracy, even in the presence of strong compression, and clearly outperforms human observers.
Tasks Face Swapping, Fake Image Detection, Image Generation
Published 2019-01-25
URL https://arxiv.org/abs/1901.08971v3
PDF https://arxiv.org/pdf/1901.08971v3.pdf
PWC https://paperswithcode.com/paper/faceforensics-learning-to-detect-manipulated
Repo https://github.com/pothabattulasantosh/Detection-of-face-Manipulated-videos
Framework none
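
The paper's analysis centers on learned, data-driven detectors. As a rough sketch of that kind of baseline (not the paper's exact setup, which evaluates several architectures, with XceptionNet performing best), here is an ImageNet-pretrained backbone fine-tuned for binary real/fake classification; the backbone, head, and learning rate are illustrative assumptions:

```python
# Illustrative real/fake classifier, not the paper's exact configuration.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")  # torchvision >= 0.13
model.fc = nn.Linear(model.fc.in_features, 2)     # two classes: real, fake

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(face_crops, labels):
    """face_crops: [B, 3, H, W] cropped face images; labels: [B], 0=real, 1=fake."""
    optimizer.zero_grad()
    loss = criterion(model(face_crops), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```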

Visual Relationship Detection with Relative Location Mining

Title Visual Relationship Detection with Relative Location Mining
Authors Hao Zhou, Chongyang Zhang, Chuanping Hu
Abstract Visual relationship detection, a challenging task of finding and distinguishing the interactions between object pairs in an image, has received much attention recently. In this work, we propose a novel visual relationship detection framework that deeply mines and utilizes the relative location of object pairs in every stage of the procedure. In both stages, the relative location information of each object pair is abstracted and encoded as an auxiliary feature to improve the distinguishing capability of object-pair proposal and predicate recognition, respectively. Moreover, a Gated Graph Neural Network (GGNN) is introduced to mine and measure the relevance of predicates using relative location. With the location-based GGNN, non-exclusive predicates with similar spatial positions can first be clustered and then smoothed with close classification scores, further increasing top-$n$ recall. Experiments on two widely used datasets, VRD and VG, show that by deeply mining and exploiting relative location information, our proposed model significantly outperforms the current state-of-the-art.
Tasks
Published 2019-11-02
URL https://arxiv.org/abs/1911.00713v1
PDF https://arxiv.org/pdf/1911.00713v1.pdf
PWC https://paperswithcode.com/paper/visual-relationship-detection-with-relative
Repo https://github.com/zhouhaocv/RLM-Net
Framework pytorch
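
To make "relative location as an auxiliary feature" concrete, a pair of bounding boxes can be reduced to a small location vector and concatenated to visual features. A minimal sketch; the particular encoding here (normalized center offsets plus log size ratios) is a common choice assumed for illustration, not necessarily the paper's exact formulation:

```python
import math

def relative_location_feature(subj_box, obj_box):
    """Encode the relative location of an object pair.
    Boxes are (x1, y1, x2, y2); returns normalized center offsets
    and log width/height ratios of the object relative to the subject."""
    sx, sy = (subj_box[0] + subj_box[2]) / 2, (subj_box[1] + subj_box[3]) / 2
    ox, oy = (obj_box[0] + obj_box[2]) / 2, (obj_box[1] + obj_box[3]) / 2
    sw, sh = subj_box[2] - subj_box[0], subj_box[3] - subj_box[1]
    ow, oh = obj_box[2] - obj_box[0], obj_box[3] - obj_box[1]
    return [(ox - sx) / sw, (oy - sy) / sh, math.log(ow / sw), math.log(oh / sh)]

# e.g. a "person on horse" pair: subject sits above a larger object
print(relative_location_feature((50, 20, 90, 60), (30, 50, 110, 120)))
```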

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

Title Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Authors Nils Reimers, Iryna Gurevych
Abstract BERT (Devlin et al., 2018) and RoBERTa (Liu et al., 2019) have set a new state-of-the-art performance on sentence-pair regression tasks like semantic textual similarity (STS). However, they require that both sentences be fed into the network, which causes a massive computational overhead: finding the most similar pair in a collection of 10,000 sentences requires about 50 million inference computations (~65 hours) with BERT. The construction of BERT makes it unsuitable for semantic similarity search as well as for unsupervised tasks like clustering. In this publication, we present Sentence-BERT (SBERT), a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine similarity. This reduces the effort for finding the most similar pair from 65 hours with BERT / RoBERTa to about 5 seconds with SBERT, while maintaining the accuracy of BERT. We evaluate SBERT and SRoBERTa on common STS tasks and transfer learning tasks, where they outperform other state-of-the-art sentence embedding methods.
Tasks Semantic Similarity, Semantic Textual Similarity, Sentence Embeddings, Transfer Learning
Published 2019-08-27
URL https://arxiv.org/abs/1908.10084v1
PDF https://arxiv.org/pdf/1908.10084v1.pdf
PWC https://paperswithcode.com/paper/sentence-bert-sentence-embeddings-using
Repo https://github.com/saulhazelius/transformer-clustering
Framework none
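
The siamese setup reduces to encoding each sentence independently, pooling token embeddings into a fixed-size vector, and comparing vectors with cosine similarity. A minimal sketch with Hugging Face transformers; the "bert-base-uncased" checkpoint and plain mean pooling are assumptions for illustration (SBERT itself fine-tunes the encoder on NLI data):

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(sentences):
    """Mean-pool token embeddings into one fixed-size vector per sentence."""
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        token_emb = model(**batch).last_hidden_state       # [B, T, H]
    mask = batch["attention_mask"].unsqueeze(-1).float()   # [B, T, 1]
    return (token_emb * mask).sum(1) / mask.sum(1)         # [B, H]

a, b = embed(["A man is playing guitar.", "Someone strums a guitar."])
print(F.cosine_similarity(a, b, dim=0).item())
```

This is why the pairwise search cost collapses: each sentence is encoded once, and every comparison afterwards is a cheap vector operation rather than a full BERT pass per pair.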

Factorised Neural Relational Inference for Multi-Interaction Systems

Title Factorised Neural Relational Inference for Multi-Interaction Systems
Authors Ezra Webb, Ben Day, Helena Andres-Terre, Pietro Lió
Abstract Many complex natural and cultural phenomena are well modelled by systems of simple interactions between particles. A number of architectures have been developed to articulate this kind of structure, both implicitly and explicitly. We consider an unsupervised explicit model, the NRI model, and make a series of representational adaptations and physically motivated changes. Most notably we factorise the inferred latent interaction graph into a multiplex graph, allowing each layer to encode for a different interaction-type. This fNRI model is smaller in size and significantly outperforms the original in both edge and trajectory prediction, establishing a new state-of-the-art. We also present a simplified variant of our model, which demonstrates that the NRI’s formulation as a variational auto-encoder is not necessary for good performance, and make an adaptation to the NRI’s training routine, significantly improving its ability to model complex physical dynamical systems.
Tasks Trajectory Prediction
Published 2019-05-21
URL https://arxiv.org/abs/1905.08721v1
PDF https://arxiv.org/pdf/1905.08721v1.pdf
PWC https://paperswithcode.com/paper/factorised-neural-relational-inference-for
Repo https://github.com/ekwebb/fNRI
Framework pytorch
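
The multiplex factorisation can be pictured as splitting each edge's logits into independent per-layer blocks, one softmax per interaction type, instead of one big joint edge-type softmax. A small illustrative sketch; the layer sizes below are arbitrary:

```python
import torch
import torch.nn.functional as F

def factorised_edge_posteriors(edge_logits, layer_sizes):
    """Split per-edge logits into independent softmaxes, one per latent
    graph layer / interaction type. edge_logits: [num_edges, sum(layer_sizes)]."""
    posteriors, start = [], 0
    for k in layer_sizes:
        posteriors.append(F.softmax(edge_logits[:, start:start + k], dim=-1))
        start += k
    return posteriors  # list of [num_edges, k] distributions, one per layer

logits = torch.randn(20, 4)                      # 20 edges, two 2-state layers
springs, charges = factorised_edge_posteriors(logits, [2, 2])
print(springs.shape, charges.shape)              # each layer: on/off per edge
```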

SWALP: Stochastic Weight Averaging in Low-Precision Training

Title SWALP: Stochastic Weight Averaging in Low-Precision Training
Authors Guandao Yang, Tianyi Zhang, Polina Kirichenko, Junwen Bai, Andrew Gordon Wilson, Christopher De Sa
Abstract Low-precision operations can provide scalability, memory savings, portability, and energy efficiency. This paper proposes SWALP, an approach to low-precision training that averages low-precision SGD iterates with a modified learning rate schedule. SWALP is easy to implement and can match the performance of full-precision SGD even with all numbers quantized down to 8 bits, including the gradient accumulators. Additionally, we show that SWALP converges arbitrarily close to the optimal solution for quadratic objectives, and to a noise ball asymptotically smaller than that of low-precision SGD in strongly convex settings.
Tasks
Published 2019-04-26
URL https://arxiv.org/abs/1904.11943v2
PDF https://arxiv.org/pdf/1904.11943v2.pdf
PWC https://paperswithcode.com/paper/swalp-stochastic-weight-averaging-in-low
Repo https://github.com/RICE-EIC/Early-Bird-Tickets
Framework pytorch
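
The two ingredients, low-precision SGD iterates and a higher-precision running average, fit in a few lines. A toy sketch on a noisy quadratic-like objective; the fixed-point format, scale, and averaging schedule below are simplified assumptions, not the paper's exact recipe:

```python
import torch

def quantize_stochastic(x, num_bits=8, scale=2.0 ** -6):
    """Fixed-point quantization with stochastic rounding (simplified)."""
    levels = 2 ** (num_bits - 1)
    y = torch.floor(x / scale + torch.rand_like(x))   # round up w.p. frac(x/scale)
    return torch.clamp(y, -levels, levels - 1) * scale

w = torch.zeros(10)            # low-precision working weights
w_avg = torch.zeros_like(w)    # full-precision running average (SWA)
n, lr = 0, 0.1
for step in range(1000):
    grad = (w - 1.0) + 0.1 * torch.randn_like(w)      # noisy toy gradient
    w = quantize_stochastic(w - lr * grad)            # low-precision SGD iterate
    if step >= 500:                                   # start averaging after warm-up
        n += 1
        w_avg += (w - w_avg) / n
print(w_avg)  # the average sits closer to the optimum (all ones) than single iterates
```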

Adaptive Fusion for RGB-D Salient Object Detection

Title Adaptive Fusion for RGB-D Salient Object Detection
Authors Ningning Wang, Xiaojin Gong
Abstract RGB-D salient object detection aims to identify the most visually distinctive objects in a pair of color and depth images. Based upon the observation that most salient objects stand out in at least one modality, this paper proposes an adaptive fusion scheme to fuse saliency predictions generated from the two modalities. Specifically, we design a two-stream convolutional neural network (CNN), each stream of which extracts features and predicts a saliency map from either the RGB or the depth modality. Then, a saliency fusion module learns a switch map that is used to adaptively fuse the predicted saliency maps. A loss function composed of saliency supervision, switch map supervision, and edge-preserving constraints is designed to provide full supervision, and the entire network is trained in an end-to-end manner. Benefiting from the adaptive fusion strategy and the edge-preserving constraint, our approach outperforms state-of-the-art methods on three publicly available datasets.
Tasks Object Detection, Salient Object Detection
Published 2019-01-05
URL http://arxiv.org/abs/1901.01369v2
PDF http://arxiv.org/pdf/1901.01369v2.pdf
PWC https://paperswithcode.com/paper/adaptive-fusion-for-rgb-d-salient-object
Repo https://github.com/Lucia-Ningning/Adaptive_Fusion_RGBD_Saliency_Detection
Framework tf
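
The fusion step itself is compact: a learned switch map with values in (0, 1) blends the two predicted saliency maps pixel-wise. A minimal PyTorch sketch (the listed framework is tf, but Python sketches here use PyTorch throughout); the paper predicts the switch map with a dedicated fusion module from deeper features, whereas this toy version predicts it from the two maps directly:

```python
import torch
import torch.nn as nn

class SwitchFusion(nn.Module):
    """Adaptively fuse RGB and depth saliency maps with a learned switch map
    (illustrative sketch of the fusion idea only)."""
    def __init__(self):
        super().__init__()
        self.switch = nn.Sequential(nn.Conv2d(2, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, sal_rgb, sal_depth):        # both [B, 1, H, W]
        s = self.switch(torch.cat([sal_rgb, sal_depth], dim=1))  # values in (0, 1)
        return s * sal_rgb + (1 - s) * sal_depth  # per-pixel convex combination

fuse = SwitchFusion()
print(fuse(torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)).shape)
```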

Towards High-Resolution Salient Object Detection

Title Towards High-Resolution Salient Object Detection
Authors Yi Zeng, Pingping Zhang, Jianming Zhang, Zhe Lin, Huchuan Lu
Abstract Deep neural network based methods have made a significant breakthrough in salient object detection. However, they are typically limited to input images with low resolutions ($400\times400$ pixels or less). Little effort has been made to train deep neural networks to directly handle salient object detection in very high-resolution images. This paper pushes forward high-resolution saliency detection, and contributes a new dataset, named High-Resolution Salient Object Detection (HRSOD). To the best of our knowledge, HRSOD is the first high-resolution saliency detection dataset to date. As another contribution, we also propose a novel approach, which incorporates both global semantic information and local high-resolution details, to address this challenging task. More specifically, our approach consists of a Global Semantic Network (GSN), a Local Refinement Network (LRN) and a Global-Local Fusion Network (GLFN). GSN extracts global semantic information from a down-sampled version of the entire image. Guided by the results of GSN, LRN focuses on some local regions and progressively produces high-resolution predictions. GLFN is further proposed to enforce spatial consistency and boost performance. Experiments illustrate that our method outperforms existing state-of-the-art methods on high-resolution saliency datasets by a large margin, and achieves performance comparable to or even better than theirs on widely used saliency benchmarks. The HRSOD dataset is available at https://github.com/yi94code/HRSOD.
Tasks Object Detection, Saliency Detection, Salient Object Detection
Published 2019-08-20
URL https://arxiv.org/abs/1908.07274v1
PDF https://arxiv.org/pdf/1908.07274v1.pdf
PWC https://paperswithcode.com/paper/towards-high-resolution-salient-object
Repo https://github.com/yi94code/HRSOD
Framework none
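
The global-local decomposition can be sketched as a simple inference flow: the full image is down-sampled for the global pass, a crop is processed at native resolution, and the two predictions are fused. The sub-network internals are not reproduced here; gsn, lrn and glfn are stand-in callables, and the crop is hard-coded rather than guided by GSN as in the paper:

```python
import torch
import torch.nn.functional as F

def global_local_predict(image, gsn, lrn, glfn, crop, low_res=400):
    """Illustrative global-local flow: coarse full-image saliency plus a
    refined high-resolution crop, fused inside the crop region."""
    _, _, h, w = image.shape
    coarse = gsn(F.interpolate(image, (low_res, low_res), mode="bilinear",
                               align_corners=False))       # global semantics
    x1, y1, x2, y2 = crop
    fine = lrn(image[:, :, y1:y2, x1:x2])                  # local high-res detail
    coarse_up = F.interpolate(coarse, (h, w), mode="bilinear", align_corners=False)
    fused = coarse_up.clone()
    fused[:, :, y1:y2, x1:x2] = glfn(coarse_up[:, :, y1:y2, x1:x2], fine)
    return fused

# dummy stand-ins, just to run the flow end to end
sal = lambda x: torch.sigmoid(x.mean(1, keepdim=True))
out = global_local_predict(torch.rand(1, 3, 1024, 1024), sal, sal,
                           lambda g, l: (g + l) / 2, (256, 256, 768, 768))
print(out.shape)  # torch.Size([1, 1, 1024, 1024])
```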

Grouped Spatial-Temporal Aggregation for Efficient Action Recognition

Title Grouped Spatial-Temporal Aggregation for Efficient Action Recognition
Authors Chenxu Luo, Alan Yuille
Abstract Temporal reasoning is an important aspect of video analysis. 3D CNNs show good performance by exploring spatial-temporal features jointly in an unconstrained way, but this also greatly increases the computational cost. Previous works try to reduce the complexity by decoupling the spatial and temporal filters. In this paper, we propose a novel decomposition method that decomposes the feature channels into spatial and temporal groups in parallel. This decomposition lets the two groups focus on static and dynamic cues separately. We call this grouped spatial-temporal aggregation (GST). This decomposition is more parameter-efficient and enables us to quantitatively analyze the contributions of spatial and temporal features in different layers. We verify our model on several action recognition tasks that require temporal reasoning and show its effectiveness.
Tasks
Published 2019-09-28
URL https://arxiv.org/abs/1909.13130v1
PDF https://arxiv.org/pdf/1909.13130v1.pdf
PWC https://paperswithcode.com/paper/grouped-spatial-temporal-aggregation-for
Repo https://github.com/chenxuluo/GST-video
Framework pytorch
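
The channel decomposition is easy to sketch: split the feature channels into a spatial group (1x3x3 convolutions, static cues) and a temporal group (3x3x3 convolutions, dynamic cues), run them in parallel, and concatenate. The split ratio and exact kernel shapes below are assumptions for illustration:

```python
import torch
import torch.nn as nn

class GSTBlock(nn.Module):
    """Grouped spatial-temporal aggregation sketch: parallel spatial-only and
    spatial-temporal convolutions over disjoint channel groups."""
    def __init__(self, channels, spatial_ratio=0.5):
        super().__init__()
        self.cs = int(channels * spatial_ratio)            # spatial-group channels
        ct = channels - self.cs                            # temporal-group channels
        self.spatial = nn.Conv3d(self.cs, self.cs, (1, 3, 3), padding=(0, 1, 1))
        self.temporal = nn.Conv3d(ct, ct, (3, 3, 3), padding=1)

    def forward(self, x):                                  # x: [B, C, T, H, W]
        xs, xt = x[:, :self.cs], x[:, self.cs:]
        return torch.cat([self.spatial(xs), self.temporal(xt)], dim=1)

block = GSTBlock(64)
print(block(torch.rand(2, 64, 8, 56, 56)).shape)           # [2, 64, 8, 56, 56]
```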

End-to-End Learnable Geometric Vision by Backpropagating PnP Optimization

Title End-to-End Learnable Geometric Vision by Backpropagating PnP Optimization
Authors Bo Chen, Alvaro Parra, Jiewei Cao, Nan Li, Tat-Jun Chin
Abstract Deep networks excel in learning patterns from large amounts of data. On the other hand, many geometric vision tasks are specified as optimization problems. To seamlessly combine deep learning and geometric vision, it is vital to perform learning and geometric optimization end-to-end. Towards this aim, we present BPnP, a novel network module that backpropagates gradients through a Perspective-n-Points (PnP) solver to guide parameter updates of a neural network. Based on implicit differentiation, we show that the gradients of a “self-contained” PnP solver can be derived accurately and efficiently, as if the optimizer block were a differentiable function. We validate BPnP by incorporating it in a deep model that can learn camera intrinsics, camera extrinsics (poses) and 3D structure from training datasets. Further, we develop an end-to-end trainable pipeline for object pose estimation, which achieves greater accuracy by combining feature-based heatmap losses with 2D-3D reprojection errors. Since our approach can be extended to other optimization problems, our work helps to pave the way to perform learnable geometric vision in a principled manner. Our PyTorch implementation of BPnP is available on http://github.com/BoChenYS/BPnP.
Tasks 6D Pose Estimation, 6D Pose Estimation using RGB
Published 2019-09-13
URL https://arxiv.org/abs/1909.06043v3
PDF https://arxiv.org/pdf/1909.06043v3.pdf
PWC https://paperswithcode.com/paper/bpnp-further-empowering-end-to-end-learning
Repo https://github.com/BoChenYS/BPnP
Framework pytorch
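
BPnP's enabling trick is implicit differentiation: at the solver's optimum the stationarity condition defines the solution as an implicit function of the inputs, so gradients follow from the implicit function theorem without unrolling the solver. The sketch below demonstrates that principle on a toy scalar problem, as a stand-in for the actual PnP solver the paper wraps the same way:

```python
import torch

# y*(x) minimizes f(x, y); stationarity g = df/dy = 0 implicitly defines y*(x),
# and the implicit function theorem gives dy*/dx = -(dg/dy)^(-1) * (dg/dx).
def f(x, y):
    return (y - x) ** 2 + 0.1 * y ** 4

x = torch.tensor(2.0, requires_grad=True)

# "self-contained" solver, treated as a black box: plain gradient descent
y = torch.tensor(0.0)
for _ in range(2000):
    y = y.detach().requires_grad_(True)
    (g,) = torch.autograd.grad(f(x.detach(), y), y)
    y = y - 0.1 * g

# implicit gradient at the solution, as if the solver were differentiable
y = y.detach().requires_grad_(True)
(g,) = torch.autograd.grad(f(x, y), y, create_graph=True)
(dg_dy,) = torch.autograd.grad(g, y, retain_graph=True)
(dg_dx,) = torch.autograd.grad(g, x)
print((-dg_dx / dg_dy).item())   # dy*/dx, without backprop through the solver loop
```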

Semi-Supervised Video Salient Object Detection Using Pseudo-Labels

Title Semi-Supervised Video Salient Object Detection Using Pseudo-Labels
Authors Pengxiang Yan, Guanbin Li, Yuan Xie, Zhen Li, Chuan Wang, Tianshui Chen, Liang Lin
Abstract Deep learning-based video salient object detection has recently achieved great success, significantly outperforming all unsupervised methods. However, existing data-driven approaches heavily rely on a large quantity of pixel-wise annotated video frames to deliver such promising results. In this paper, we address the semi-supervised video salient object detection task using pseudo-labels. Specifically, we present an effective video saliency detector that consists of a spatial refinement network and a spatiotemporal module. Based on the same refinement network and motion information in terms of optical flow, we further propose a novel method for generating pixel-level pseudo-labels from sparsely annotated frames. By utilizing the generated pseudo-labels together with a part of the manual annotations, our video saliency detector learns spatial and temporal cues for both contrast inference and coherence enhancement, thus producing accurate saliency maps. Experimental results demonstrate that our proposed semi-supervised method even greatly outperforms all the state-of-the-art fully supervised methods across three public benchmarks: VOS, DAVIS, and FBMS.
Tasks Salient Object Detection, Unsupervised Video Object Segmentation, Video Salient Object Detection
Published 2019-08-12
URL https://arxiv.org/abs/1908.04051v2
PDF https://arxiv.org/pdf/1908.04051v2.pdf
PWC https://paperswithcode.com/paper/semi-supervised-video-salient-object
Repo https://github.com/Kinpzz/RCRNet-Pytorch
Framework pytorch
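
Stripped of the optical-flow propagation (which is the paper's actual pseudo-label generator), the semi-supervised loop follows the familiar pseudo-labeling pattern: predict on unlabeled frames, binarize confident outputs, and mix them with the sparse manual annotations for retraining. A generic sketch of that pattern:

```python
import torch

def generate_pseudo_labels(model, unlabeled_frames, threshold=0.5):
    """Run the current saliency model on unlabeled frames and binarize its
    predictions into pseudo ground truth (generic sketch; the paper instead
    propagates sparse annotations using optical flow and a refinement network)."""
    model.eval()
    pseudo = []
    with torch.no_grad():
        for frame in unlabeled_frames:            # frame: [1, 3, H, W]
            sal = torch.sigmoid(model(frame))     # saliency in [0, 1]
            pseudo.append((sal > threshold).float())
    return pseudo                                 # mixed with manual labels to retrain

dummy = torch.nn.Conv2d(3, 1, 1)                  # stand-in saliency network
labels = generate_pseudo_labels(dummy, [torch.rand(1, 3, 64, 64)])
print(labels[0].shape)                            # torch.Size([1, 1, 64, 64])
```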

Choosing the Right Word: Using Bidirectional LSTM Tagger for Writing Support Systems

Title Choosing the Right Word: Using Bidirectional LSTM Tagger for Writing Support Systems
Authors Victor Makarenkov, Lior Rokach, Bracha Shapira
Abstract Scientific writing is difficult. It is even harder for those for whom English is a second language (ESL learners). Scholars around the world spend a significant amount of time and resources proofreading their work before submitting it for review or publication. In this paper we present a novel machine learning based application for the proper word choice task. Proper word choice is a generalization of the lexical substitution (LS) and grammatical error correction (GEC) tasks. We demonstrate and evaluate the usefulness of applying a bidirectional Long Short-Term Memory (LSTM) tagger for this task. While state-of-the-art grammatical error correction uses error-specific classifiers and machine translation methods, we demonstrate an unsupervised method that is based solely on a high-quality text corpus and does not require manually annotated data. We use a bidirectional Recurrent Neural Network (RNN) with LSTM for learning the proper word choice based on a word’s sentential context. We demonstrate and evaluate our application on both a domain-specific (scientific) writing task and a general-purpose writing task. We show that our domain-specific and general-purpose models outperform state-of-the-art general context learning. As an additional contribution of this research, we also share our code, pre-trained models, and a new ESL learner test set with the research community.
Tasks Grammatical Error Correction, Machine Translation
Published 2019-01-08
URL http://arxiv.org/abs/1901.02490v1
PDF http://arxiv.org/pdf/1901.02490v1.pdf
PWC https://paperswithcode.com/paper/choosing-the-right-word-using-bidirectional
Repo https://github.com/vicmak/Exploiting-BiLSTM-for-Proper-Word-Choice
Framework none
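
The tagger's job is to score every vocabulary word for a position given only its left and right context. A minimal PyTorch sketch; the dimensions are illustrative, and edge positions are handled naively (torch.roll wraps around) where a real implementation would pad:

```python
import torch
import torch.nn as nn

class ContextWordTagger(nn.Module):
    """BiLSTM sketch for proper word choice: predict the word at each position
    from its sentential context, never from the word itself."""
    def __init__(self, vocab_size, emb_dim=128, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, vocab_size)

    def forward(self, tokens):                     # tokens: [B, T]
        h, _ = self.lstm(self.emb(tokens))         # [B, T, 2*hidden]
        fwd, bwd = h.chunk(2, dim=-1)
        fwd = torch.roll(fwd, shifts=1, dims=1)    # forward state up to t-1
        bwd = torch.roll(bwd, shifts=-1, dims=1)   # backward state from t+1
        return self.out(torch.cat([fwd, bwd], dim=-1))  # [B, T, vocab] scores

model = ContextWordTagger(vocab_size=10000)
print(model(torch.randint(0, 10000, (2, 12))).shape)    # [2, 12, 10000]
```

Training with cross-entropy against the observed word at each position then needs only a plain text corpus, matching the unsupervised setup described above.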

Anonymized BERT: An Augmentation Approach to the Gendered Pronoun Resolution Challenge

Title Anonymized BERT: An Augmentation Approach to the Gendered Pronoun Resolution Challenge
Authors Bo Liu
Abstract We present our 7th place solution to the Gendered Pronoun Resolution challenge, which uses BERT without fine-tuning and a novel augmentation strategy designed for contextual embedding token-level tasks. Our method anonymizes the referent by replacing candidate names with a set of common placeholder names. Besides the usual benefits of effectively increasing training data size, this approach diversifies the idiosyncratic information embedded in names. Using the same set of common first names can also help the model recognize names better, shorten token length, and remove gender and regional biases associated with names. The system scored 0.1947 log loss in stage 2, where the augmentation contributed an improvement of 0.04. Post-competition analysis shows that, when using different embedding layers, the system scores 0.1799, which would have placed third.
Tasks
Published 2019-05-06
URL https://arxiv.org/abs/1905.01780v2
PDF https://arxiv.org/pdf/1905.01780v2.pdf
PWC https://paperswithcode.com/paper/anonymized-bert-an-augmentation-approach-to
Repo https://github.com/boliu61/gendered-pronoun-resolution
Framework none
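
The augmentation itself is plain string manipulation: swap the two candidate names for placeholder names in every permutation, yielding several anonymized copies per example. A small sketch with a two-name placeholder set; the paper uses a larger set of common first names:

```python
import itertools

PLACEHOLDERS = ["Alice", "Bob"]   # illustrative; the paper's placeholder set is larger

def anonymize(text, name_a, name_b):
    """Generate one augmented copy per assignment of placeholder names
    to the two candidate referents."""
    return [text.replace(name_a, pa).replace(name_b, pb)
            for pa, pb in itertools.permutations(PLACEHOLDERS, 2)]

print(anonymize("Cheryl told Kathleen that she won.", "Cheryl", "Kathleen"))
# ['Alice told Bob that she won.', 'Bob told Alice that she won.']
```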

FLEN: Leveraging Field for Scalable CTR Prediction

Title FLEN: Leveraging Field for Scalable CTR Prediction
Authors Wenqiang Chen, Lizhang Zhan, Yuanlong Ci, Chen Lin
Abstract Click-Through Rate (CTR) prediction has been an indispensable component of many industrial applications, such as recommendation systems and online advertising. CTR prediction systems are usually based on multi-field categorical features, i.e., every feature is categorical and belongs to one and only one field. Modeling feature conjunctions is crucial for CTR prediction accuracy. However, it requires a massive number of parameters to explicitly model all feature conjunctions, which is not scalable for real-world production systems. In this paper, we describe a novel Field-Leveraged Embedding Network (FLEN) which has been deployed in the commercial recommender system at Meitu and serves the main traffic. FLEN devises a field-wise bi-interaction pooling technique. By suitably exploiting field information, field-wise bi-interaction pooling captures both inter-field and intra-field feature conjunctions with a small number of model parameters and an acceptable time complexity for industrial applications. We show that a variety of state-of-the-art CTR models can be expressed under this technique. Furthermore, we develop Dicefactor, a dropout technique to prevent independent latent features from co-adapting. Extensive experiments, including offline evaluations and online A/B testing on real production systems, demonstrate the effectiveness and efficiency of FLEN against state-of-the-art methods. Notably, FLEN obtained a 5.19% improvement in CTR with 1/6 of the memory usage and computation time, compared to the previous version (i.e., NFM).
Tasks Click-Through Rate Prediction, Recommendation Systems
Published 2019-11-12
URL https://arxiv.org/abs/1911.04690v3
PDF https://arxiv.org/pdf/1911.04690v3.pdf
PWC https://paperswithcode.com/paper/flen-leveraging-field-for-scalable-ctr
Repo https://github.com/aimetrics/jarvis
Framework tf
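
Bi-interaction pooling (inherited from NFM) compresses all pairwise element-wise products of a set of embeddings into one vector in linear time: $\frac{1}{2}\big[(\sum_i v_i)^2 - \sum_i v_i^2\big]$. The field-wise variant applies it over per-field aggregates; the field layout below is an assumption for illustration:

```python
import torch

def bi_interaction(emb):
    """NFM-style bi-interaction pooling: 0.5 * ((sum v_i)^2 - sum v_i^2),
    i.e. the sum of all pairwise element-wise products. emb: [B, n, d] -> [B, d]."""
    return 0.5 * (emb.sum(dim=1) ** 2 - (emb ** 2).sum(dim=1))

# field-wise sketch: aggregate feature embeddings into per-field embeddings,
# then interact across fields to capture inter-field conjunctions cheaply
features = torch.randn(32, 10, 16)                    # 10 categorical features, d=16
field_of = torch.tensor([0] * 4 + [1] * 3 + [2] * 3)  # feature -> field assignment
fields = torch.stack([features[:, field_of == f].sum(1) for f in range(3)], dim=1)
print(bi_interaction(fields).shape)                   # [32, 16]
```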

Learning Choice Functions: Concepts and Architectures

Title Learning Choice Functions: Concepts and Architectures
Authors Karlson Pfannschmidt, Pritha Gupta, Eyke Hüllermeier
Abstract We study the problem of learning choice functions, which play an important role in various domains of application, most notably in the field of economics. Formally, a choice function is a mapping from sets to sets: Given a set of choice alternatives as input, a choice function identifies a subset of most preferred elements. Learning choice functions from suitable training data comes with a number of challenges. For example, the sets provided as input and the subsets produced as output can be of any size. Moreover, since the order in which alternatives are presented is irrelevant, a choice function should be symmetric. Perhaps most importantly, choice functions are naturally context-dependent, in the sense that the preference in favor of an alternative may depend on what other options are available. We formalize the problem of learning choice functions and present two general approaches based on two representations of context-dependent utility functions. Both approaches are instantiated by means of appropriate neural network architectures, and their performance is demonstrated on suitable benchmark tasks.
Tasks
Published 2019-01-29
URL https://arxiv.org/abs/1901.10860v2
PDF https://arxiv.org/pdf/1901.10860v2.pdf
PWC https://paperswithcode.com/paper/learning-choice-functions
Repo https://github.com/kiudee/cs-ranking
Framework tf
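
The two structural requirements named in the abstract, symmetry over input order and variable set sizes, are exactly what a permutation-invariant set aggregate provides. A minimal sketch of a context-dependent choice function in that style; the architecture (shared embedding MLP, mean-pooled context, thresholded scores) is an illustrative stand-in, not the paper's exact networks:

```python
import torch
import torch.nn as nn

class ChoiceNet(nn.Module):
    """Score each alternative against a permutation-invariant summary of the
    whole set, then choose every alternative scoring above a threshold."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.score = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                   nn.Linear(hidden, 1))

    def forward(self, alternatives):               # [n, dim], any set size n
        h = self.embed(alternatives)               # per-alternative embedding
        context = h.mean(dim=0, keepdim=True)      # symmetric set summary
        pair = torch.cat([h, context.expand_as(h)], dim=-1)
        return torch.sigmoid(self.score(pair)).squeeze(-1)  # choice probabilities

net = ChoiceNet(dim=8)
probs = net(torch.randn(5, 8))    # five alternatives; scores depend on the whole set
print(probs > 0.5)                # the chosen subset
```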