Paper Group AWR 356
Investigating BERT’s Knowledge of Language: Five Analysis Methods with NPIs. The Sensitivity of Counterfactual Fairness to Unmeasured Confounding. Alternating Roles Dialog Model with Large-scale Pre-trained Language Models. FinBERT: Financial Sentiment Analysis with Pre-trained Language Models. Representation of Constituents in Neural Language Models: Coordination Phrase as a Case Study …
Investigating BERT’s Knowledge of Language: Five Analysis Methods with NPIs
Title | Investigating BERT’s Knowledge of Language: Five Analysis Methods with NPIs |
Authors | Alex Warstadt, Yu Cao, Ioana Grosu, Wei Peng, Hagen Blix, Yining Nie, Anna Alsop, Shikha Bordia, Haokun Liu, Alicia Parrish, Sheng-Fu Wang, Jason Phang, Anhad Mohananey, Phu Mon Htut, Paloma Jeretič, Samuel R. Bowman |
Abstract | Though state-of-the-art sentence representation models can perform tasks requiring significant knowledge of grammar, it is an open question how best to evaluate their grammatical knowledge. We explore five experimental methods inspired by prior work evaluating pretrained sentence representation models. We use a single linguistic phenomenon, negative polarity item (NPI) licensing in English, as a case study for our experiments. NPIs like “any” are grammatical only if they appear in a licensing environment like negation (“Sue doesn’t have any cats” vs. “Sue has any cats”). This phenomenon is challenging because of the variety of NPI licensing environments that exist. We introduce an artificially generated dataset that manipulates key features of NPI licensing for the experiments. We find that BERT has significant knowledge of these features, but its success varies widely across different experimental methods. We conclude that a variety of methods is necessary to reveal all relevant aspects of a model’s grammatical knowledge in a given domain. |
Tasks | |
Published | 2019-09-05 |
URL | https://arxiv.org/abs/1909.02597v2 |
PDF | https://arxiv.org/pdf/1909.02597v2.pdf |
PWC | https://paperswithcode.com/paper/investigating-berts-knowledge-of-language |
Repo | https://github.com/alexwarstadt/data_generation |
Framework | none |
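The licensing contrast quoted in the abstract lends itself to a simple masked-LM probe. Below is a minimal sketch (one plausible probe of ours, not necessarily any of the paper's five methods) that scores the licensed vs. unlicensed NPI sentence with BERT's pseudo-log-likelihood; the function name and scoring recipe are our assumptions.

```python
# Minimal sketch: compare BERT's pseudo-log-likelihood for a licensed vs.
# unlicensed NPI sentence by masking one token at a time.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

def pseudo_log_likelihood(sentence: str) -> float:
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    # Mask each real token (skip [CLS]/[SEP]) and sum its log-probability.
    for i in range(1, len(ids) - 1):
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

good = "Sue doesn't have any cats."
bad = "Sue has any cats."
# A model with knowledge of NPI licensing should prefer the licensed sentence.
print(pseudo_log_likelihood(good) > pseudo_log_likelihood(bad))
```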
The Sensitivity of Counterfactual Fairness to Unmeasured Confounding
Title | The Sensitivity of Counterfactual Fairness to Unmeasured Confounding |
Authors | Niki Kilbertus, Philip J. Ball, Matt J. Kusner, Adrian Weller, Ricardo Silva |
Abstract | Causal approaches to fairness have seen substantial recent interest, both from the machine learning community and from wider parties interested in ethical prediction algorithms. In no small part, this has been due to the fact that causal models allow one to simultaneously leverage data and expert knowledge to remove discriminatory effects from predictions. However, one of the primary assumptions in causal modeling is that you know the causal graph. This introduces a new opportunity for bias, caused by misspecifying the causal model. One common way for misspecification to occur is via unmeasured confounding: the true causal effect between variables is partially described by unobserved quantities. In this work we design tools to assess the sensitivity of fairness measures to this confounding for the popular class of non-linear additive noise models (ANMs). Specifically, we give a procedure for computing the maximum difference between two counterfactually fair predictors, where one has become biased due to confounding. For the case of bivariate confounding our technique can be swiftly computed via a sequence of closed-form updates. For multivariate confounding we give an algorithm that can be efficiently solved via automatic differentiation. We demonstrate our new sensitivity analysis tools in real-world fairness scenarios to assess the bias arising from confounding. |
Tasks | |
Published | 2019-07-01 |
URL | https://arxiv.org/abs/1907.01040v1 |
PDF | https://arxiv.org/pdf/1907.01040v1.pdf |
PWC | https://paperswithcode.com/paper/the-sensitivity-of-counterfactual-fairness-to |
Repo | https://github.com/nikikilbertus/cf-fairness-sensitivity |
Framework | none |
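The abstract's multivariate procedure relies on automatic differentiation to search for worst-case confounding. The deliberately toy sketch below only illustrates that optimization pattern on a linear ANM with a single confounding parameter; the data-generating model, the `tanh` parameterization, and the abduction step are our simplifications, not the paper's algorithm.

```python
# Toy sketch: use autodiff to find the confounding strength rho that maximizes
# the discrepancy between a confounding-free counterfactual predictor and one
# whose noise-abduction step is distorted by rho.
import torch

torch.manual_seed(0)
a = torch.randn(500)                   # protected attribute
u = torch.randn(500)                   # latent noise of x
x = 0.8 * a + u                        # ANM: x = f(a) + u

rho = torch.tensor(0.0, requires_grad=True)
opt = torch.optim.Adam([rho], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    # Under confounding rho, part of x's abducted noise is attributed to a
    # hidden cause shared with a (a mis-specified abduction step).
    u_hat = (x - 0.8 * a) - torch.tanh(rho) * a
    pred_conf = 1.5 * u_hat            # counterfactual prediction with a := 0
    pred_base = 1.5 * (x - 0.8 * a)    # rho = 0 (unconfounded) predictor
    gap = ((pred_conf - pred_base) ** 2).mean()
    (-gap).backward()                  # gradient *ascent* on the discrepancy
    opt.step()
print("worst-case rho:", torch.tanh(rho).item(), "gap:", gap.item())
```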
Alternating Roles Dialog Model with Large-scale Pre-trained Language Models
Title | Alternating Roles Dialog Model with Large-scale Pre-trained Language Models |
Authors | Qingyang Wu, Yichi Zhang, Yu Li, Zhou Yu |
Abstract | Existing dialog system models require extensive human annotations and are difficult to generalize to different tasks. The recent success of large pre-trained language models such as BERT and GPT-2 (Devlin et al., 2019; Radford et al., 2019) has suggested the effectiveness of incorporating language priors in downstream NLP tasks. However, how much pre-trained language models can help dialog response generation is still under exploration. In this paper, we propose a simple, general, and effective framework: Alternating Roles Dialog Model (ARDM). ARDM models each speaker separately and takes advantage of large pre-trained language models. It requires no supervision from human annotations such as belief states or dialog acts to achieve effective conversations. ARDM outperforms or is on par with state-of-the-art methods on two popular task-oriented dialog datasets: CamRest676 and MultiWOZ. Moreover, we can generalize ARDM to more challenging, non-collaborative tasks such as persuasion. In persuasion tasks, ARDM is capable of generating human-like responses to persuade people to donate to a charity. |
Tasks | Language Modelling |
Published | 2019-10-09 |
URL | https://arxiv.org/abs/1910.03756v2 |
PDF | https://arxiv.org/pdf/1910.03756v2.pdf |
PWC | https://paperswithcode.com/paper/alternating-recurrent-dialog-model-with-large |
Repo | https://github.com/budzianowski/multiwoz |
Framework | pytorch |
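A minimal sketch of the alternating-roles idea (our simplification, not the released ARDM code): keep one GPT-2 per speaker, let each continue the shared dialog history on its own turns, and fine-tune each model only on its role's utterances. The turn format and generation settings below are illustrative assumptions.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
user_lm = GPT2LMHeadModel.from_pretrained("gpt2")    # would be fine-tuned on user turns
system_lm = GPT2LMHeadModel.from_pretrained("gpt2")  # would be fine-tuned on system turns

def next_turn(model, history: str, max_new_tokens: int = 40) -> str:
    ids = tok(history, return_tensors="pt")["input_ids"]
    out = model.generate(ids, max_new_tokens=max_new_tokens,
                         do_sample=True, top_p=0.9,
                         pad_token_id=tok.eos_token_id)
    return tok.decode(out[0, ids.shape[1]:], skip_special_tokens=True)

history = "User: I need a cheap restaurant in the city centre.\nSystem:"
reply = next_turn(system_lm, history)   # the system model speaks on system turns
history += reply + "\nUser:"
followup = next_turn(user_lm, history)  # the user model speaks on user turns
print(reply, followup)
```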
FinBERT: Financial Sentiment Analysis with Pre-trained Language Models
Title | FinBERT: Financial Sentiment Analysis with Pre-trained Language Models |
Authors | Dogu Araci |
Abstract | Financial sentiment analysis is a challenging task due to the specialized language and lack of labeled data in that domain. General-purpose models are not effective enough because of the specialized language used in a financial context. We hypothesize that pre-trained language models can help with this problem because they require fewer labeled examples and they can be further trained on domain-specific corpora. We introduce FinBERT, a language model based on BERT, to tackle NLP tasks in the financial domain. Our results show improvement in every measured metric on current state-of-the-art results for two financial sentiment analysis datasets. We find that even with a smaller training set and fine-tuning only a part of the model, FinBERT outperforms state-of-the-art machine learning methods. |
Tasks | Language Modelling, Sentiment Analysis |
Published | 2019-08-27 |
URL | https://arxiv.org/abs/1908.10063v1 |
PDF | https://arxiv.org/pdf/1908.10063v1.pdf |
PWC | https://paperswithcode.com/paper/finbert-financial-sentiment-analysis-with-pre |
Repo | https://github.com/ProsusAI/finBERT |
Framework | none |
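Assuming the released checkpoint is published on the Hugging Face hub under `ProsusAI/finbert` (see the repo above for the canonical path), inference reduces to a standard classification pipeline:

```python
# Usage sketch: three-class financial sentiment with the released FinBERT weights.
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

name = "ProsusAI/finbert"
clf = pipeline("text-classification",
               model=AutoModelForSequenceClassification.from_pretrained(name),
               tokenizer=AutoTokenizer.from_pretrained(name))

print(clf("Operating profit rose to EUR 13.1 mn from EUR 8.7 mn in 2007."))
# -> e.g. [{'label': 'positive', 'score': ...}]
```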
Representation of Constituents in Neural Language Models: Coordination Phrase as a Case Study
Title | Representation of Constituents in Neural Language Models: Coordination Phrase as a Case Study |
Authors | Aixiu An, Peng Qian, Ethan Wilcox, Roger Levy |
Abstract | Neural language models have achieved state-of-the-art performance on many NLP tasks, and recently have been shown to learn a number of hierarchically-sensitive syntactic dependencies between individual words. However, equally important for language processing is the ability to combine words into phrasal constituents, and use constituent-level features to drive downstream expectations. Here we investigate neural models’ ability to represent constituent-level features, using coordinated noun phrases as a case study. We assess whether different neural language models trained on English and French represent phrase-level number and gender features, and use those features to drive downstream expectations. Our results suggest that models use a linear combination of NP constituent number to drive CoordNP/verb number agreement. This behavior is highly regular and even sensitive to local syntactic context; however, it differs crucially from observed human behavior. Models have less success with gender agreement. Models trained on large corpora perform best, and there is no obvious advantage for models trained using explicit syntactic supervision. |
Tasks | |
Published | 2019-09-10 |
URL | https://arxiv.org/abs/1909.04625v1 |
PDF | https://arxiv.org/pdf/1909.04625v1.pdf |
PWC | https://paperswithcode.com/paper/representation-of-constituents-in-neural |
Repo | https://github.com/cpllab/rnn_psycholing_coordination |
Framework | none |
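A minimal agreement probe in the spirit of the abstract (our illustration, not the paper's exact stimuli or models): compare an LM's probability of a plural vs. singular verb after a coordinated noun phrase, which is plural as a whole even when each conjunct is singular.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def continuation_logprob(prefix: str, word: str) -> float:
    ids = tok(prefix, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        logits = lm(ids).logits[0, -1]          # next-token distribution
    wid = tok(" " + word)["input_ids"][0]       # leading space: GPT-2 BPE convention
    return torch.log_softmax(logits, dim=-1)[wid].item()

prefix = "The cat and the dog near the house"
# The CoordNP is plural, so "are" should be preferred over "is".
print(continuation_logprob(prefix, "are") > continuation_logprob(prefix, "is"))
```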
Utilizing Temporal Information in Deep Convolutional Network for Efficient Soccer Ball Detection and Tracking
Title | Utilizing Temporal Information in Deep Convolutional Network for Efficient Soccer Ball Detection and Tracking |
Authors | Anna Kukleva, Mohammad Asif Khan, Hafez Farazi, Sven Behnke |
Abstract | Soccer ball detection is identified as one of the critical challenges in the RoboCup competition. It requires an efficient vision system capable of detection with high precision and recall while providing robustness and low inference time. In this work, we present a novel convolutional neural network (CNN) approach to detect the soccer ball in an image sequence. In contrast to the existing methods where only the current frame or an image is used for the detection, we make use of the history of frames. Using history allows us to track the ball efficiently in situations where it disappears or is partially occluded in some of the frames. Our approach exploits spatio-temporal correlation and detects the ball based on the trajectory of its movements. We present our results with three convolutional methods, namely temporal convolutional networks (TCN), ConvLSTM, and ConvGRU. We first solve the detection task for an image using a fully convolutional encoder-decoder architecture, and later, we use it as an input to our temporal models and jointly learn the detection task in sequences of images. We evaluate all our experiments on a novel dataset prepared as a part of this work. Furthermore, we present empirical results to support the effectiveness of using the history of the ball in challenging scenarios. |
Tasks | Game of Football |
Published | 2019-09-05 |
URL | https://arxiv.org/abs/1909.02406v2 |
PDF | https://arxiv.org/pdf/1909.02406v2.pdf |
PWC | https://paperswithcode.com/paper/utilizing-temporal-information-in |
Repo | https://github.com/AIS-Bonn/TemporalBallDetection |
Framework | pytorch |
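A minimal sketch of using frame history (our simplification; the paper compares TCN, ConvLSTM, and ConvGRU heads on top of a per-frame encoder): stack the last k frames along the channel axis and predict a ball-location heatmap with a small fully convolutional network. All layer sizes below are illustrative.

```python
import torch
import torch.nn as nn

class BallDetector(nn.Module):
    def __init__(self, history: int = 4):
        super().__init__()
        c = 3 * history                      # k RGB frames stacked channel-wise
        self.encoder = nn.Sequential(
            nn.Conv2d(c, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),  # heatmap logits
        )

    def forward(self, frames):               # frames: (B, k, 3, H, W)
        b, k, c, h, w = frames.shape
        return self.decoder(self.encoder(frames.view(b, k * c, h, w)))

net = BallDetector(history=4)
heatmap = net(torch.randn(2, 4, 3, 128, 128))
print(heatmap.shape)                         # -> torch.Size([2, 1, 128, 128])
```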
Labeling, Cutting, Grouping: an Efficient Text Line Segmentation Method for Medieval Manuscripts
Title | Labeling, Cutting, Grouping: an Efficient Text Line Segmentation Method for Medieval Manuscripts |
Authors | Michele Alberti, Lars Vögtlin, Vinaychandran Pondenkandath, Mathias Seuret, Rolf Ingold, Marcus Liwicki |
Abstract | This paper introduces a new method for text-line extraction by integrating deep-learning based pre-classification and state-of-the-art segmentation methods. Text-line extraction in complex handwritten documents poses a significant challenge, even to the most modern computer vision algorithms. Historical manuscripts are a particularly hard class of documents as they present several forms of noise, such as degradation, bleed-through, interlinear glosses, and elaborated scripts. In this work, we propose a novel method which uses pixel-level semantic segmentation as an intermediate task, followed by a text-line extraction step. We measured the performance of our method on a recent dataset of challenging medieval manuscripts and surpassed state-of-the-art results by reducing the error by 80.7%. Furthermore, we demonstrate the effectiveness of our approach on various other datasets written in different scripts. Hence, our contribution is two-fold. First, we demonstrate that semantic pixel segmentation can be used as a strong denoising pre-processing step before performing text line extraction. Second, we introduce a novel, simple and robust algorithm that leverages the high-quality semantic segmentation to achieve a text-line extraction performance of 99.42% line IU on a challenging dataset. |
Tasks | Denoising, Semantic Segmentation |
Published | 2019-06-11 |
URL | https://arxiv.org/abs/1906.11894v2 |
PDF | https://arxiv.org/pdf/1906.11894v2.pdf |
PWC | https://paperswithcode.com/paper/labeling-cutting-grouping-an-efficient-text |
Repo | https://github.com/DIVA-DIA/Text-Line-Segmentation-Method-for-Medieval-Manuscripts |
Framework | none |
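A heavily simplified sketch of the grouping stage (assuming a text/background mask from the semantic-segmentation network is already available; the threshold and clustering rule are our placeholders, not the paper's algorithm): label connected components and cluster them into lines by vertical centroid.

```python
import numpy as np
from scipy import ndimage

def group_into_lines(text_mask: np.ndarray, line_gap: int = 20):
    labels, n = ndimage.label(text_mask)                # connected components
    centroids = ndimage.center_of_mass(text_mask, labels, range(1, n + 1))
    # Sort components top-to-bottom; start a new line when the vertical jump
    # between consecutive centroids exceeds line_gap pixels.
    order = sorted(range(n), key=lambda i: centroids[i][0])
    lines, current, last_y = [], [], None
    for i in order:
        y = centroids[i][0]
        if last_y is not None and y - last_y > line_gap:
            lines.append(current)
            current = []
        current.append(i + 1)                           # component labels are 1-based
        last_y = y
    if current:
        lines.append(current)
    return lines

mask = np.zeros((100, 200), dtype=np.uint8)
mask[10:20, 5:60] = 1; mask[12:22, 80:150] = 1; mask[60:70, 5:120] = 1
print(group_into_lines(mask))   # -> [[1, 2], [3]]: two components on line 1, one on line 2
```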
PointAtrousGraph: Deep Hierarchical Encoder-Decoder with Point Atrous Convolution for Unorganized 3D Points
Title | PointAtrousGraph: Deep Hierarchical Encoder-Decoder with Point Atrous Convolution for Unorganized 3D Points |
Authors | Liang Pan, Chee-Meng Chew, Gim Hee Lee |
Abstract | Motivated by the success of encoding multi-scale contextual information for image analysis, we propose our PointAtrousGraph (PAG) - a deep permutation-invariant hierarchical encoder-decoder for efficiently exploiting multi-scale edge features in point clouds. Our PAG is constructed by several novel modules, such as Point Atrous Convolution (PAC), Edge-preserved Pooling (EP) and Edge-preserved Unpooling (EU). Similar to atrous convolution, our PAC can effectively enlarge receptive fields of filters and thus densely learn multi-scale point features. Following the idea of non-overlapping max-pooling operations, we propose our EP to preserve critical edge features during subsampling. Correspondingly, our EU modules gradually recover spatial information for edge features. In addition, we introduce chained skip subsampling/upsampling modules that directly propagate edge features to the final stage. In particular, our proposed auxiliary loss functions further improve performance. Experimental results show that our PAG outperforms previous state-of-the-art methods on various 3D semantic perception applications. |
Tasks | |
Published | 2019-07-23 |
URL | https://arxiv.org/abs/1907.09798v2 |
PDF | https://arxiv.org/pdf/1907.09798v2.pdf |
PWC | https://paperswithcode.com/paper/pointatrousgraph-deep-hierarchical-encoder |
Repo | https://github.com/paul007pl/PointAtrousGraph |
Framework | tf |
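A compact sketch of Point Atrous Convolution as we read it from the abstract (dilated k-NN edge features; this is our PyTorch rendering, not the authors' TensorFlow implementation): take the k·d nearest neighbors of each point, keep every d-th one, and max-pool an MLP over the edge features.

```python
import torch
import torch.nn as nn

def point_atrous_conv(x, mlp, k=8, d=2):
    # x: (N, C) point features; dilation d enlarges the receptive field
    dist = torch.cdist(x, x)                          # (N, N) pairwise distances
    idx = dist.topk(k * d + 1, largest=False).indices[:, 1::d][:, :k]  # skip self
    neighbors = x[idx]                                # (N, k, C)
    center = x.unsqueeze(1).expand_as(neighbors)
    edges = torch.cat([center, neighbors - center], dim=-1)  # (N, k, 2C) edge features
    return mlp(edges).max(dim=1).values               # (N, C_out): max over neighbors

mlp = nn.Sequential(nn.Linear(6, 64), nn.ReLU(), nn.Linear(64, 64))
points = torch.randn(1024, 3)
features = point_atrous_conv(points, mlp, k=8, d=2)
print(features.shape)   # -> torch.Size([1024, 64])
```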
Region Attention Networks for Pose and Occlusion Robust Facial Expression Recognition
Title | Region Attention Networks for Pose and Occlusion Robust Facial Expression Recognition |
Authors | Kai Wang, Xiaojiang Peng, Jianfei Yang, Debin Meng, Yu Qiao |
Abstract | Occlusion and pose variations, which can change facial appearance significantly, are two major obstacles for automatic Facial Expression Recognition (FER). Though automatic FER has made substantial progress in the past few decades, occlusion robustness and pose invariance have received relatively little attention, especially in real-world scenarios. This paper addresses the real-world pose and occlusion robust FER problem with three-fold contributions. First, to stimulate research on FER under real-world occlusions and pose variations, we build several in-the-wild facial expression datasets with manual annotations for the community. Second, we propose a novel Region Attention Network (RAN), to adaptively capture the importance of facial regions for FER under occlusion and pose variation. The RAN aggregates and embeds a variable number of region features produced by a backbone convolutional neural network into a compact fixed-length representation. Last, inspired by the fact that facial expressions are mainly defined by facial action units, we propose a region biased loss to encourage high attention weights for the most important regions. We validate our RAN and region biased loss on both our built test datasets and four popular datasets: FERPlus, AffectNet, RAF-DB, and SFEW. Extensive experiments show that our RAN and region biased loss substantially improve FER performance under occlusion and pose variation. Our method also achieves state-of-the-art results on FERPlus, AffectNet, RAF-DB, and SFEW. Code and the collected test data will be publicly available. |
Tasks | Facial Expression Recognition |
Published | 2019-05-10 |
URL | https://arxiv.org/abs/1905.04075v2 |
PDF | https://arxiv.org/pdf/1905.04075v2.pdf |
PWC | https://paperswithcode.com/paper/region-attention-networks-for-pose-and |
Repo | https://github.com/kaiwang960112/Challenge-condition-FER-dataset |
Framework | pytorch |
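A schematic sketch of the region-attention idea (our reading of the abstract, not the released code): score each cropped region's CNN feature with a sigmoid attention weight, aggregate into a fixed-length vector, and bias the best crop's weight above the whole-face weight by a margin. The region layout (index 0 = whole face) and margin value are assumptions.

```python
import torch
import torch.nn as nn

class RegionAttention(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(feat_dim, 1), nn.Sigmoid())

    def forward(self, region_feats):          # (B, R, D) from a shared CNN backbone
        w = self.attn(region_feats)           # (B, R, 1) per-region importance
        pooled = (w * region_feats).sum(1) / w.sum(1).clamp(min=1e-6)
        return pooled, w.squeeze(-1)

def region_biased_loss(weights, full_face_idx=0, margin=0.02):
    # Encourage the best cropped region to outscore the full-face weight by a margin.
    best_crop = weights[:, 1:].max(dim=1).values
    return torch.relu(weights[:, full_face_idx] + margin - best_crop).mean()

ran = RegionAttention()
feats = torch.randn(4, 6, 512)                # 4 faces, 6 regions (index 0 = whole face)
pooled, w = ran(feats)
print(pooled.shape, region_biased_loss(w).item())
```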
Focus-Enhanced Scene Text Recognition with Deformable Convolutions
Title | Focus-Enhanced Scene Text Recognition with Deformable Convolutions |
Authors | Linjie Deng, Yanxiang Gong, Xinchen Lu, Xin Yi, Zheng Ma, Mei Xie |
Abstract | Recently, scene text recognition methods based on deep learning have sprung up in the computer vision area. The existing methods achieve strong performances, but the recognition of irregular text is still challenging due to the various shapes and distorted patterns. When reading words in the real world, we normally do not mentally rectify them but instead adjust our focus and visual field. Similarly, by utilizing deformable convolutional layers whose geometric structures are adjustable, we present an enhanced recognition network that handles irregular text without a rectification step. We conducted a number of experiments; the results on public benchmarks demonstrate the effectiveness of our proposed components and show that our method achieves satisfactory performance. The code will be publicly available at https://github.com/Alpaca07/dtr soon. |
Tasks | Scene Text Recognition |
Published | 2019-08-29 |
URL | https://arxiv.org/abs/1908.10998v2 |
PDF | https://arxiv.org/pdf/1908.10998v2.pdf |
PWC | https://paperswithcode.com/paper/focus-enhanced-scene-text-recognition-with |
Repo | https://github.com/Alpaca07/dtr |
Framework | pytorch |
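A minimal sketch of swapping a standard convolution for a deformable one, using torchvision's `DeformConv2d` (the paper's full recognizer is more involved; the block layout here is our simplification):

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformBlock(nn.Module):
    def __init__(self, cin, cout, k=3):
        super().__init__()
        # A plain conv predicts the 2D sampling offsets for each kernel position.
        self.offset = nn.Conv2d(cin, 2 * k * k, k, padding=k // 2)
        self.deform = DeformConv2d(cin, cout, k, padding=k // 2)

    def forward(self, x):
        return self.deform(x, self.offset(x))  # sampling grid bends to the text shape

block = DeformBlock(32, 64)
out = block(torch.randn(1, 32, 32, 100))       # a text-line feature map
print(out.shape)                               # -> torch.Size([1, 64, 32, 100])
```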
Adversarial Self-Defense for Cycle-Consistent GANs
Title | Adversarial Self-Defense for Cycle-Consistent GANs |
Authors | Dina Bashkirova, Ben Usman, Kate Saenko |
Abstract | The goal of unsupervised image-to-image translation is to map images from one domain to another without the ground truth correspondence between the two domains. State-of-the-art methods learn the correspondence using large numbers of unpaired examples from both domains and are based on generative adversarial networks. In order to preserve the semantics of the input image, the adversarial objective is usually combined with a cycle-consistency loss that penalizes incorrect reconstruction of the input image from the translated one. However, if the target mapping is many-to-one, e.g. aerial photos to maps, such a restriction forces the generator to hide information in low-amplitude structured noise that is undetectable by the human eye or by the discriminator. In this paper, we show how such self-attacking behavior of unsupervised translation methods affects their performance and provide two defense techniques. We perform a quantitative evaluation of the proposed techniques and show that making the translation model more robust to the self-adversarial attack increases its generation quality and reconstruction reliability and makes the model less sensitive to low-amplitude perturbations. |
Tasks | Adversarial Attack, Image-to-Image Translation, Unsupervised Image-To-Image Translation |
Published | 2019-08-05 |
URL | https://arxiv.org/abs/1908.01517v1 |
PDF | https://arxiv.org/pdf/1908.01517v1.pdf |
PWC | https://paperswithcode.com/paper/adversarial-self-defense-for-cycle-consistent |
Repo | https://github.com/dbash/pix2pix_cyclegan_guess_noise |
Framework | pytorch |
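A sketch of a noise-based defense as we understand it from the abstract and the repo name (one plausible rendering, not the exact released losses): corrupt the translated image with low-amplitude noise before the backward mapping, so the generator cannot rely on hidden structured signals to reconstruct the input.

```python
import torch

def cycle_loss_with_noise(G_ab, G_ba, real_a, sigma=0.05):
    fake_b = G_ab(real_a)
    noisy_b = fake_b + sigma * torch.randn_like(fake_b)  # destroy hidden codes
    recon_a = G_ba(noisy_b)
    return (recon_a - real_a).abs().mean()               # L1 cycle-consistency

# Toy usage with identity "generators" standing in for the CycleGAN networks:
G_ab = G_ba = torch.nn.Identity()
print(cycle_loss_with_noise(G_ab, G_ba, torch.randn(2, 3, 64, 64)).item())
```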
MSG-GAN: Multi-Scale Gradient GAN for Stable Image Synthesis
Title | MSG-GAN: Multi-Scale Gradient GAN for Stable Image Synthesis |
Authors | Animesh Karnewar, Oliver Wang |
Abstract | While Generative Adversarial Networks (GANs) have seen huge successes in image synthesis tasks, they are notoriously difficult to adapt to different datasets, in part due to instability during training and sensitivity to hyperparameters. One commonly accepted reason for this instability is that gradients passing from the discriminator to the generator become uninformative when there isn’t enough overlap in the supports of the real and fake distributions. In this work, we propose the Multi-Scale Gradient Generative Adversarial Network (MSG-GAN), a simple but effective technique for addressing this by allowing the flow of gradients from the discriminator to the generator at multiple scales. This technique provides a stable approach for high resolution image synthesis, and serves as an alternative to the commonly used progressive growing technique. We show that MSG-GAN converges stably on a variety of image datasets of different sizes, resolutions and domains, as well as different types of loss functions and architectures, all with the same set of fixed hyperparameters. When compared to state-of-the-art GANs, our approach matches or exceeds the performance in most of the cases we tried. |
Tasks | Image Generation |
Published | 2019-03-14 |
URL | https://arxiv.org/abs/1903.06048v3 |
PDF | https://arxiv.org/pdf/1903.06048v3.pdf |
PWC | https://paperswithcode.com/paper/msg-gan-multi-scale-gradients-gan-for-more |
Repo | https://github.com/manicman1999/StyleGAN-Tensorflow-2.0 |
Framework | tf |
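A skeleton of the multi-scale gradient idea (our reduction of the abstract, with illustrative layer sizes): the generator emits an RGB image at every resolution, and the discriminator would ingest the matching scale at each of its stages, so gradients flow between the two networks at all scales.

```python
import torch
import torch.nn as nn

class MSGGenerator(nn.Module):
    def __init__(self, z_dim=64, ch=64):
        super().__init__()
        self.stem = nn.ConvTranspose2d(z_dim, ch, 4)           # 1x1 -> 4x4
        self.up = nn.ModuleList(
            [nn.ConvTranspose2d(ch, ch, 4, 2, 1) for _ in range(3)])
        self.to_rgb = nn.ModuleList([nn.Conv2d(ch, 3, 1) for _ in range(4)])

    def forward(self, z):
        x = self.stem(z.view(z.size(0), -1, 1, 1))
        outs = [self.to_rgb[0](x)]                             # 4x4 RGB
        for up, rgb in zip(self.up, self.to_rgb[1:]):
            x = torch.relu(up(x))
            outs.append(rgb(x))                                # 8x8, 16x16, 32x32 RGB
        return outs                                            # every scale goes to D

g = MSGGenerator()
for img in g(torch.randn(2, 64)):
    print(tuple(img.shape))   # (2,3,4,4) (2,3,8,8) (2,3,16,16) (2,3,32,32)
```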
Using Priming to Uncover the Organization of Syntactic Representations in Neural Language Models
Title | Using Priming to Uncover the Organization of Syntactic Representations in Neural Language Models |
Authors | Grusha Prasad, Marten van Schijndel, Tal Linzen |
Abstract | Neural language models (LMs) perform well on tasks that require sensitivity to syntactic structure. Drawing on the syntactic priming paradigm from psycholinguistics, we propose a novel technique to analyze the representations that enable such success. By establishing a gradient similarity metric between structures, this technique allows us to reconstruct the organization of the LMs’ syntactic representational space. We use this technique to demonstrate that LSTM LMs’ representations of different types of sentences with relative clauses are organized hierarchically in a linguistically interpretable manner, suggesting that the LMs track abstract properties of the sentence. |
Tasks | |
Published | 2019-09-23 |
URL | https://arxiv.org/abs/1909.10579v1 |
PDF | https://arxiv.org/pdf/1909.10579v1.pdf |
PWC | https://paperswithcode.com/paper/using-priming-to-uncover-the-organization-of |
Repo | https://github.com/grushaprasad/RNN-Priming |
Framework | none |
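A sketch of the priming-as-adaptation recipe this line of work builds on (our condensation; the learning rate, step count, and stimuli are placeholders): briefly fine-tune the LM on "prime" sentences, then measure how much the target sentence's surprisal drops.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")

def surprisal(model, sentence):
    ids = tok(sentence, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        loss = model(ids, labels=ids).loss     # mean NLL per predicted token
    return loss.item() * (ids.shape[1] - 1)    # total surprisal in nats

def adapt(model, primes, lr=1e-5, steps=1):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        for s in primes:
            ids = tok(s, return_tensors="pt")["input_ids"]
            model(ids, labels=ids).loss.backward()
            opt.step()
            opt.zero_grad()

target = "The author that the critics praised was talented."
lm = GPT2LMHeadModel.from_pretrained("gpt2")
before = surprisal(lm, target)
adapt(lm, ["The actor that the fans admired was humble."])   # same structure as target
print(before - surprisal(lm, target))   # > 0 means the prime structure helped
```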
Invariant Transform Experience Replay
Title | Invariant Transform Experience Replay |
Authors | Yijiong Lin, Jiancong Huang, Matthieu Zimmer, Juan Rojas, Paul Weng |
Abstract | Deep Reinforcement Learning (RL) is a promising approach for adaptive robot control, but its application to robotics is currently hindered by high sample requirements. To alleviate this issue, we propose to exploit the symmetries present in robotic tasks. Intuitively, symmetries from observed trajectories define transformations that leave the space of feasible RL trajectories invariant and can be used to generate new feasible trajectories for training. Based on this data augmentation idea, we formulate a general framework, called Invariant Transform Experience Replay, which we instantiate with two techniques. First, Kaleidoscope Experience Replay exploits reflectional symmetries. Second, Goal-augmented Experience Replay takes advantage of lax goal definitions. In the Fetch tasks from OpenAI Gym, our experimental results show significant increases in learning rates and success rates. In particular, we attain 13×, 3×, and 5× speedups in the pushing, sliding, and pick-and-place tasks, respectively, in the multi-goal setting. Invariant transformations on RL trajectories are a promising methodology to speed up learning in deep RL. |
Tasks | Data Augmentation |
Published | 2019-09-24 |
URL | https://arxiv.org/abs/1909.10707v4 |
PDF | https://arxiv.org/pdf/1909.10707v4.pdf |
PWC | https://paperswithcode.com/paper/invariant-transform-experience-replay |
Repo | https://github.com/YijiongLin/ITER_KER_GER |
Framework | none |
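A minimal sketch of the Kaleidoscope Experience Replay idea (our reading; the state layout and symmetry plane are assumptions): reflect stored transitions across a robot-frame symmetry plane, here y = 0, to obtain extra feasible transitions essentially for free.

```python
import numpy as np

def reflect_y(vec):
    out = vec.copy()
    out[1] = -out[1]          # assumes index 1 is the y-coordinate
    return out

def augment(transition):
    s, a, r, s2, goal = transition
    reflected = (reflect_y(s), reflect_y(a), r, reflect_y(s2), reflect_y(goal))
    return [transition, reflected]    # both go into the replay buffer

t = (np.array([0.5, 0.2, 0.1]), np.array([0.0, 0.3, 0.0]),
     -1.0, np.array([0.5, 0.4, 0.1]), np.array([0.6, -0.1, 0.1]))
for s, a, r, s2, g in augment(t):
    print(s, a, g)
```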
Use of a Capsule Network to Detect Fake Images and Videos
Title | Use of a Capsule Network to Detect Fake Images and Videos |
Authors | Huy H. Nguyen, Junichi Yamagishi, Isao Echizen |
Abstract | The revolution in computer hardware, especially in graphics processing units and tensor processing units, has enabled significant advances in computer graphics and artificial intelligence algorithms. In addition to their many beneficial applications in daily life and business, computer-generated/manipulated images and videos can be used for malicious purposes that violate security systems, privacy, and social trust. The deepfake phenomenon and its variations enable a normal user to use his or her personal computer to easily create fake videos of anybody from a short real online video. Several countermeasures have been introduced to deal with attacks using such videos. However, most of them are targeted at certain domains and are ineffective when applied to other domains or new attacks. In this paper, we introduce a capsule network that can detect various kinds of attacks, from presentation attacks using printed images and replayed videos to attacks using fake videos created using deep learning. It uses far fewer parameters than traditional convolutional neural networks while achieving similar performance. Moreover, we explain, for the first time in the literature, the theory behind the application of capsule networks to the forensics problem through detailed analysis and visualization. |
Tasks | Detect Forged Images And Videos |
Published | 2019-10-28 |
URL | https://arxiv.org/abs/1910.12467v2 |
PDF | https://arxiv.org/pdf/1910.12467v2.pdf |
PWC | https://paperswithcode.com/paper/use-of-a-capsule-network-to-detect-fake |
Repo | https://github.com/nii-yamagishilab/Capsule-Forensics |
Framework | pytorch |
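A compact sketch of the capsule machinery involved, the squash nonlinearity and routing-by-agreement (this is the generic mechanism from Sabour et al., not the authors' forensics architecture; the capsule dimensions are illustrative):

```python
import torch

def squash(v, dim=-1, eps=1e-8):
    n2 = (v ** 2).sum(dim, keepdim=True)
    return (n2 / (1 + n2)) * v / (n2.sqrt() + eps)   # shrink short vectors toward 0

def route(u_hat, iters=3):
    # u_hat: (B, in_caps, out_caps, D) prediction vectors from lower capsules
    b = torch.zeros(u_hat.shape[:3], device=u_hat.device)
    for _ in range(iters):
        c = torch.softmax(b, dim=2).unsqueeze(-1)    # coupling coefficients
        s = (c * u_hat).sum(1)                       # (B, out_caps, D)
        v = squash(s)
        b = b + (u_hat * v.unsqueeze(1)).sum(-1)     # agreement update
    return v                                         # output capsule poses

out = route(torch.randn(2, 32, 2, 8))   # 2 output capsules, e.g. "real" vs. "fake"
print(out.shape, out.norm(dim=-1))      # capsule length acts as class probability
```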