February 2, 2020

3684 words 18 mins read

Paper Group AWR 35

Your Stance is Exposed! Analysing Possible Factors for Stance Detection on Social Media. Learning Depth-Guided Convolutions for Monocular 3D Object Detection. Knowing What, How and Why: A Near Complete Solution for Aspect-based Sentiment Analysis. A deep learning approach for automated detection of geographic atrophy from color fundus photographs. …

Your Stance is Exposed! Analysing Possible Factors for Stance Detection on Social Media

Title Your Stance is Exposed! Analysing Possible Factors for Stance Detection on Social Media
Authors Abeer Aldayel, Walid Magdy
Abstract To what extent can a user's stance towards a given topic be inferred? Most studies on stance detection have focused on analysing a user's posts on a given topic to predict the stance. However, stance on social media can be inferred from a mixture of signals that may reflect a user's beliefs, including posts and online interactions. This paper examines various online features of users to detect their stance towards different topics. We compare multiple sets of features, including on-topic content, network interactions, user preferences, and online network connections. Our objective is to understand the online signals that can reveal a user's stance. Experiments are conducted on the tweet dataset from the SemEval stance detection task, which covers five topics. Results show that the stance of a user can be detected from multiple signals of their online activity, including their posts on the topic, the network they interact with or follow, the websites they visit, and the content they like. The performance of stance modelling using different network features is comparable with the state-of-the-art reported model that used textual content only. In addition, combining network and content features leads to the highest performance reported to date on the SemEval dataset, with an F-measure of 72.49%. We further present an extensive analysis of how these different sets of features can reveal stance. Our findings have distinct privacy implications: they highlight that stance is so strongly embedded in a user's online social network that, in principle, individuals can be profiled from their interactions and connections even when they do not post about the topic.
Tasks Stance Detection
Published 2019-08-08
URL https://arxiv.org/abs/1908.03146v1
PDF https://arxiv.org/pdf/1908.03146v1.pdf
PWC https://paperswithcode.com/paper/your-stance-is-exposed-analysing-possible
Repo https://github.com/AbeerAldayel/Stance_detection
Framework none
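
The feature comparison described in the abstract (on-topic content versus network signals) can be illustrated with a small, hedged sketch: a linear classifier over a union of TF-IDF text features and a handful of toy interaction counts. The feature names, counts, and examples below are illustrative placeholders, not the authors' feature set.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import LinearSVC

def network_features(records):
    # Toy stand-ins for interaction/connection signals (followed accounts,
    # retweeted users, visited website domains, liked items, ...).
    return np.array([[r["n_followed"], r["n_retweeted"], r["n_domains"], r["n_likes"]]
                     for r in records], dtype=float)

# Each example is (tweet text, network-profile dict); labels are stances.
data = [("climate change is real and urgent",
         {"n_followed": 120, "n_retweeted": 3, "n_domains": 5, "n_likes": 40}),
        ("this legislation will never work",
         {"n_followed": 80, "n_retweeted": 9, "n_domains": 2, "n_likes": 11})]
labels = ["FAVOR", "AGAINST"]

content = Pipeline([("pick", FunctionTransformer(lambda X: [x[0] for x in X])),
                    ("tfidf", TfidfVectorizer())])
network = FunctionTransformer(lambda X: network_features([x[1] for x in X]))

model = Pipeline([("features", FeatureUnion([("content", content), ("network", network)])),
                  ("clf", LinearSVC())])
model.fit(data, labels)
print(model.predict(data))
```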

Learning Depth-Guided Convolutions for Monocular 3D Object Detection

Title Learning Depth-Guided Convolutions for Monocular 3D Object Detection
Authors Mingyu Ding, Yuqi Huo, Hongwei Yi, Zhe Wang, Jianping Shi, Zhiwu Lu, Ping Luo
Abstract 3D object detection from a single image without LiDAR is a challenging task due to the lack of accurate depth information. Conventional 2D convolutions are unsuitable for this task because they fail to capture the local object and its scale information, which are vital for 3D object detection. To better represent 3D structure, prior arts typically transform depth maps estimated from 2D images into a pseudo-LiDAR representation, and then apply existing 3D point-cloud based object detectors. However, their results depend heavily on the accuracy of the estimated depth maps, resulting in suboptimal performance. In this work, instead of using a pseudo-LiDAR representation, we improve on fundamental 2D convolutions by proposing a new local convolutional network (LCN), termed Depth-guided Dynamic-Depthwise-Dilated LCN (D$^4$LCN), where the filters and their receptive fields can be automatically learned from image-based depth maps, so that different pixels of different images have different filters. D$^4$LCN overcomes the limitation of conventional 2D convolutions and narrows the gap between image representation and 3D point cloud representation. Extensive experiments show that D$^4$LCN outperforms existing works by large margins. For example, the relative improvement of D$^4$LCN against the state-of-the-art on KITTI is 9.1% in the moderate setting. The code is available at https://github.com/dingmyu/D4LCN.
Tasks 3D Object Detection, Object Detection
Published 2019-12-10
URL https://arxiv.org/abs/1912.04799v2
PDF https://arxiv.org/pdf/1912.04799v2.pdf
PWC https://paperswithcode.com/paper/learning-depth-guided-convolutions-for
Repo https://github.com/dingmyu/D4LCN
Framework pytorch
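
A hedged PyTorch sketch of the central operation the abstract describes: per-pixel depthwise filters generated from depth features and applied to image features. This illustrates the depth-guided dynamic filtering idea only; the released D$^4$LCN code is linked above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthGuidedDepthwiseConv(nn.Module):
    def __init__(self, channels, k=3):
        super().__init__()
        self.channels, self.k = channels, k
        # Generates one k*k depthwise kernel per channel and per spatial
        # location from the depth feature map.
        self.filter_gen = nn.Conv2d(channels, channels * k * k, kernel_size=3, padding=1)

    def forward(self, img_feat, depth_feat):
        b, c, h, w = img_feat.shape
        k = self.k
        kernels = self.filter_gen(depth_feat).view(b, c, k * k, h, w)
        patches = F.unfold(img_feat, k, padding=k // 2).view(b, c, k * k, h, w)
        # Each pixel of each image gets its own depthwise filter.
        return (kernels * patches).sum(dim=2)

img_feat = torch.randn(2, 16, 32, 32)     # image-branch features
depth_feat = torch.randn(2, 16, 32, 32)   # features from an estimated depth map
out = DepthGuidedDepthwiseConv(16)(img_feat, depth_feat)
print(out.shape)  # torch.Size([2, 16, 32, 32])
```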

Knowing What, How and Why: A Near Complete Solution for Aspect-based Sentiment Analysis

Title Knowing What, How and Why: A Near Complete Solution for Aspect-based Sentiment Analysis
Authors Haiyun Peng, Lu Xu, Lidong Bing, Fei Huang, Wei Lu, Luo Si
Abstract Target-based sentiment analysis or aspect-based sentiment analysis (ABSA) refers to addressing various sentiment analysis tasks at a fine-grained level, which includes but is not limited to aspect extraction, aspect sentiment classification, and opinion extraction. There exist many solvers of the above individual subtasks or a combination of two subtasks, and they can work together to tell a complete story, i.e. the discussed aspect, the sentiment on it, and the cause of the sentiment. However, no previous ABSA research tried to provide a complete solution in one shot. In this paper, we introduce a new subtask under ABSA, named aspect sentiment triplet extraction (ASTE). Particularly, a solver of this task needs to extract triplets (What, How, Why) from the inputs, which show WHAT the targeted aspects are, HOW their sentiment polarities are and WHY they have such polarities (i.e. opinion reasons). For instance, one triplet from “Waiters are very friendly and the pasta is simply average” could be (‘Waiters’, positive, ‘friendly’). We propose a two-stage framework to address this task. The first stage predicts what, how and why in a unified model, and then the second stage pairs up the predicted what (how) and why from the first stage to output triplets. In the experiments, our framework has set a benchmark performance in this novel triplet extraction task. Meanwhile, it outperforms a few strong baselines adapted from state-of-the-art related methods.
Tasks Aspect-Based Sentiment Analysis, Aspect Extraction, Sentiment Analysis
Published 2019-11-05
URL https://arxiv.org/abs/1911.01616v4
PDF https://arxiv.org/pdf/1911.01616v4.pdf
PWC https://paperswithcode.com/paper/knowing-what-how-and-why-a-near-complete
Repo https://github.com/xuuuluuu/SemEval-Triplet-data
Framework none
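
As a hedged illustration of the second stage described in the abstract (pairing the predicted what/how with the predicted why), the toy function below matches each aspect span to the nearest opinion span. The spans, polarities, and distance heuristic are illustrative placeholders, not the authors' pairing model.

```python
def pair_triplets(aspects, opinions, max_dist=10):
    # aspects: list of (start, end, text, polarity); opinions: list of (start, end, text)
    triplets = []
    for a_start, a_end, a_text, polarity in aspects:
        best, best_dist = None, max_dist + 1
        for o_start, o_end, o_text in opinions:
            dist = min(abs(o_start - a_end), abs(a_start - o_end))
            if dist < best_dist:
                best, best_dist = o_text, dist
        if best is not None:
            triplets.append((a_text, polarity, best))
    return triplets

# "Waiters are very friendly and the pasta is simply average"
aspects  = [(0, 0, "Waiters", "positive"), (6, 7, "the pasta", "neutral")]
opinions = [(3, 3, "friendly"), (9, 9, "average")]
print(pair_triplets(aspects, opinions))
# [('Waiters', 'positive', 'friendly'), ('the pasta', 'neutral', 'average')]
```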

A deep learning approach for automated detection of geographic atrophy from color fundus photographs

Title A deep learning approach for automated detection of geographic atrophy from color fundus photographs
Authors Tiarnan D. Keenan, Shazia Dharssi, Yifan Peng, Qingyu Chen, Elvira Agrón, Wai T. Wong, Zhiyong Lu, Emily Y. Chew
Abstract Purpose: To assess the utility of deep learning in the detection of geographic atrophy (GA) from color fundus photographs; secondary aim to explore potential utility in detecting central GA (CGA). Design: A deep learning model was developed to detect the presence of GA in color fundus photographs, and two additional models to detect CGA in different scenarios. Participants: 59,812 color fundus photographs from longitudinal follow up of 4,582 participants in the AREDS dataset. Gold standard labels were from human expert reading center graders using a standardized protocol. Methods: A deep learning model was trained to use color fundus photographs to predict GA presence from a population of eyes with no AMD to advanced AMD. A second model was trained to predict CGA presence from the same population. A third model was trained to predict CGA presence from the subset of eyes with GA. For training and testing, 5-fold cross-validation was employed. For comparison with human clinician performance, model performance was compared with that of 88 retinal specialists. Results: The deep learning models (GA detection, CGA detection from all eyes, and centrality detection from GA eyes) had AUC of 0.933-0.976, 0.939-0.976, and 0.827-0.888, respectively. The GA detection model had accuracy, sensitivity, specificity, and precision of 0.965, 0.692, 0.978, and 0.584, respectively. The CGA detection model had equivalent values of 0.966, 0.763, 0.971, and 0.394. The centrality detection model had equivalent values of 0.762, 0.782, 0.729, and 0.799. Conclusions: A deep learning model demonstrated high accuracy for the automated detection of GA. The AUC was non-inferior to that of human retinal specialists. Deep learning approaches may also be applied to the identification of CGA. The code and pretrained models are publicly available at https://github.com/ncbi-nlp/DeepSeeNet.
Tasks
Published 2019-06-07
URL https://arxiv.org/abs/1906.03153v1
PDF https://arxiv.org/pdf/1906.03153v1.pdf
PWC https://paperswithcode.com/paper/a-deep-learning-approach-for-automated
Repo https://github.com/ncbi-nlp/DeepSeeNet
Framework tf
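
The 5-fold cross-validation protocol mentioned in the abstract can be sketched as below, with a placeholder classifier and synthetic features standing in for the CNN and the fundus photographs; the real models live in the DeepSeeNet repository linked above.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X = np.random.rand(500, 32)          # stand-in for image-derived features
y = np.random.randint(0, 2, 500)     # stand-in for GA present / absent labels

aucs = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    aucs.append(roc_auc_score(y[test_idx], clf.predict_proba(X[test_idx])[:, 1]))
print("AUC per fold:", np.round(aucs, 3))
```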

DOER: Dual Cross-Shared RNN for Aspect Term-Polarity Co-Extraction

Title DOER: Dual Cross-Shared RNN for Aspect Term-Polarity Co-Extraction
Authors Huaishao Luo, Tianrui Li, Bing Liu, Junbo Zhang
Abstract This paper focuses on two related subtasks of aspect-based sentiment analysis, namely aspect term extraction and aspect sentiment classification, which we call aspect term-polarity co-extraction. The former task is to extract aspects of a product or service from an opinion document, and the latter is to identify the polarity expressed in the document about these extracted aspects. Most existing algorithms address them as two separate tasks and solve them one by one, or only perform one task, which can be complicated for real applications. In this paper, we treat these two tasks as two sequence labeling problems and propose a novel Dual crOss-sharEd RNN framework (DOER) to generate all aspect term-polarity pairs of the input sentence simultaneously. Specifically, DOER involves a dual recurrent neural network to extract the respective representation of each task, and a cross-shared unit to consider the relationship between them. Experimental results demonstrate that the proposed framework outperforms state-of-the-art baselines on three benchmark datasets.
Tasks Aspect-Based Sentiment Analysis, Sentiment Analysis
Published 2019-06-05
URL https://arxiv.org/abs/1906.01794v1
PDF https://arxiv.org/pdf/1906.01794v1.pdf
PWC https://paperswithcode.com/paper/doer-dual-cross-shared-rnn-for-aspect-term
Repo https://github.com/ArrowLuo/DOER
Framework tf
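
A hedged PyTorch sketch of the dual-RNN-plus-interaction idea summarized in the abstract: two BiLSTMs produce task-specific representations for aspect-term tags and polarity tags, and a dot-product attention lets each task read from the other. Layer sizes and tag counts are illustrative; the actual cross-shared unit is in the linked DOER repository.

```python
import torch
import torch.nn as nn

class DualCrossShared(nn.Module):
    def __init__(self, emb_dim=50, hidden=64, n_at_tags=5, n_as_tags=4):
        super().__init__()
        self.rnn_at = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.rnn_as = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        d = 2 * hidden
        self.out_at = nn.Linear(2 * d, n_at_tags)     # aspect-term tag scores
        self.out_as = nn.Linear(2 * d, n_as_tags)     # polarity tag scores

    def forward(self, emb):                           # emb: (batch, seq, emb_dim)
        h_at, _ = self.rnn_at(emb)
        h_as, _ = self.rnn_as(emb)
        # Attention of each task over the other task's hidden states.
        scores = torch.einsum("bik,bjk->bij", h_at, h_as)       # (batch, seq, seq)
        at_ctx = torch.softmax(scores, dim=-1) @ h_as            # info flowing AS -> AT
        as_ctx = torch.softmax(scores.transpose(1, 2), dim=-1) @ h_at
        return (self.out_at(torch.cat([h_at, at_ctx], -1)),
                self.out_as(torch.cat([h_as, as_ctx], -1)))

emb = torch.randn(2, 12, 50)
at_logits, as_logits = DualCrossShared()(emb)
print(at_logits.shape, as_logits.shape)   # (2, 12, 5) (2, 12, 4)
```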

Learning Human Objectives by Evaluating Hypothetical Behavior

Title Learning Human Objectives by Evaluating Hypothetical Behavior
Authors Siddharth Reddy, Anca D. Dragan, Sergey Levine, Shane Legg, Jan Leike
Abstract We seek to align agent behavior with a user’s objectives in a reinforcement learning setting with unknown dynamics, an unknown reward function, and unknown unsafe states. The user knows the rewards and unsafe states, but querying the user is expensive. To address this challenge, we propose an algorithm that safely and interactively learns a model of the user’s reward function. We start with a generative model of initial states and a forward dynamics model trained on off-policy data. Our method uses these models to synthesize hypothetical behaviors, asks the user to label the behaviors with rewards, and trains a neural network to predict the rewards. The key idea is to actively synthesize the hypothetical behaviors from scratch by maximizing tractable proxies for the value of information, without interacting with the environment. We call this method reward query synthesis via trajectory optimization (ReQueST). We evaluate ReQueST with simulated users on a state-based 2D navigation task and the image-based Car Racing video game. The results show that ReQueST significantly outperforms prior methods in learning reward models that transfer to new environments with different initial state distributions. Moreover, ReQueST safely trains the reward model to detect unsafe states, and corrects reward hacking before deploying the agent.
Tasks Car Racing
Published 2019-12-05
URL https://arxiv.org/abs/1912.05652v1
PDF https://arxiv.org/pdf/1912.05652v1.pdf
PWC https://paperswithcode.com/paper/learning-human-objectives-by-evaluating
Repo https://github.com/rddy/ReQueST
Framework tf
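
A minimal sketch of the core idea the abstract describes, under simplifying assumptions: instead of decoding full trajectories through learned generative and dynamics models, it optimizes raw state vectors by gradient ascent on an information-value proxy, here the disagreement of a small reward-model ensemble. Illustrative only; not the released implementation.

```python
import torch
import torch.nn as nn

def reward_net():
    return nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 1))

ensemble = [reward_net() for _ in range(5)]    # reward models trained on labels so far
query = torch.randn(8, 4, requires_grad=True)  # candidate states to show the user
opt = torch.optim.Adam([query], lr=0.05)

for step in range(200):
    preds = torch.stack([net(query) for net in ensemble], dim=0)  # (5, 8, 1)
    disagreement = preds.var(dim=0).mean()     # proxy for value of information
    opt.zero_grad()
    (-disagreement).backward()                 # maximize disagreement
    opt.step()

print("ensemble disagreement after optimization:", float(disagreement))
```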

Charting the Right Manifold: Manifold Mixup for Few-shot Learning

Title Charting the Right Manifold: Manifold Mixup for Few-shot Learning
Authors Puneet Mangla, Mayank Singh, Abhishek Sinha, Nupur Kumari, Vineeth N Balasubramanian, Balaji Krishnamurthy
Abstract Few-shot learning algorithms aim to learn model parameters capable of adapting to unseen classes with the help of only a few labeled examples. A recent regularization technique, Manifold Mixup, focuses on learning a general-purpose representation that is robust to small changes in the data distribution. Since the goal of few-shot learning is closely linked to robust representation learning, we study Manifold Mixup in this problem setting. Self-supervised learning is another technique that learns semantically meaningful features, using only the inherent structure of the data. This work investigates the role of learning a relevant feature manifold for few-shot tasks using self-supervision and regularization techniques. We observe that regularizing the feature manifold, enriched via self-supervised techniques, with Manifold Mixup significantly improves few-shot learning performance. We show that our proposed method S2M2 beats the current state-of-the-art accuracy on standard few-shot learning datasets like CIFAR-FS, CUB, mini-ImageNet and tiered-ImageNet by 3-8%. Through extensive experimentation, we show that the features learned using our approach generalize to complex few-shot evaluation tasks and cross-domain scenarios, and are robust against slight changes to the data distribution.
Tasks Few-Shot Image Classification, Few-Shot Learning, Representation Learning
Published 2019-07-28
URL https://arxiv.org/abs/1907.12087v4
PDF https://arxiv.org/pdf/1907.12087v4.pdf
PWC https://paperswithcode.com/paper/charting-the-right-manifold-manifold-mixup
Repo https://github.com/nupurkmr9/S2M2_fewshot
Framework pytorch
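
A hedged PyTorch sketch of Manifold Mixup applied at a hidden layer, the regularizer that S2M2 combines with self-supervised objectives such as rotation prediction. This is a generic illustration of the technique, not the authors' training code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Beta

class SmallNet(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.block1 = nn.Sequential(nn.Linear(64, 128), nn.ReLU())
        self.block2 = nn.Sequential(nn.Linear(128, 128), nn.ReLU())
        self.head = nn.Linear(128, n_classes)

    def forward(self, x, y=None, alpha=2.0):
        h = self.block1(x)
        if y is not None:                          # manifold-mixup training path
            lam = Beta(alpha, alpha).sample().item()
            idx = torch.randperm(x.size(0))
            h = lam * h + (1 - lam) * h[idx]       # mix hidden representations
            logits = self.head(self.block2(h))
            return lam * F.cross_entropy(logits, y) + (1 - lam) * F.cross_entropy(logits, y[idx])
        return self.head(self.block2(h))

net = SmallNet()
x, y = torch.randn(16, 64), torch.randint(0, 10, (16,))
print(net(x, y))   # mixed-label training loss
```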

Graph U-Nets

Title Graph U-Nets
Authors Hongyang Gao, Shuiwang Ji
Abstract We consider the problem of representation learning for graph data. Convolutional neural networks can naturally operate on images, but have significant challenges in dealing with graph data. Since images are special cases of graphs whose nodes lie on 2D lattices, graph embedding tasks have a natural correspondence with pixel-wise image prediction tasks such as segmentation. While encoder-decoder architectures like U-Nets have been successfully applied to many image pixel-wise prediction tasks, similar methods are lacking for graph data. This is due to the fact that pooling and up-sampling operations are not natural on graph data. To address these challenges, we propose novel graph pooling (gPool) and unpooling (gUnpool) operations in this work. The gPool layer adaptively selects some nodes to form a smaller graph based on their scalar projection values on a trainable projection vector. We further propose the gUnpool layer as the inverse operation of the gPool layer. The gUnpool layer restores the graph to its original structure using the position information of nodes selected in the corresponding gPool layer. Based on our proposed gPool and gUnpool layers, we develop an encoder-decoder model on graphs, known as graph U-Nets. Our experimental results on node classification and graph classification tasks demonstrate that our methods achieve consistently better performance than previous models.
Tasks Graph Classification, Graph Embedding, Node Classification, Representation Learning
Published 2019-05-11
URL https://arxiv.org/abs/1905.05178v1
PDF https://arxiv.org/pdf/1905.05178v1.pdf
PWC https://paperswithcode.com/paper/graph-u-nets
Repo https://github.com/HongyangGao/Graph-U-Nets
Framework pytorch
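
A hedged sketch of the gPool operation described in the abstract: project node features onto a trainable vector, keep the top-k nodes, gate their features by the sigmoid of the scores, and take the induced subgraph. The official layers are in the Graph-U-Nets repository linked above.

```python
import torch
import torch.nn as nn

class GPool(nn.Module):
    def __init__(self, in_dim, ratio=0.5):
        super().__init__()
        self.p = nn.Parameter(torch.randn(in_dim))   # trainable projection vector
        self.ratio = ratio

    def forward(self, x, adj):                       # x: (N, d), adj: (N, N)
        scores = x @ self.p / self.p.norm()          # scalar projections
        k = max(1, int(self.ratio * x.size(0)))
        values, idx = scores.topk(k)
        x_pool = x[idx] * torch.sigmoid(values).unsqueeze(-1)   # gated features
        adj_pool = adj[idx][:, idx]                  # induced subgraph
        return x_pool, adj_pool, idx                 # idx can be reused by gUnpool

x, adj = torch.randn(6, 8), (torch.rand(6, 6) > 0.5).float()
x_p, adj_p, idx = GPool(8)(x, adj)
print(x_p.shape, adj_p.shape, idx.tolist())
```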

Adversarial Attacks and Defenses in Images, Graphs and Text: A Review

Title Adversarial Attacks and Defenses in Images, Graphs and Text: A Review
Authors Han Xu, Yao Ma, Haochen Liu, Debayan Deb, Hui Liu, Jiliang Tang, Anil K. Jain
Abstract Deep neural networks (DNN) have achieved unprecedented success in numerous machine learning tasks in various domains. However, the existence of adversarial examples has raised concerns about applying deep learning to safety-critical applications. As a result, we have witnessed increasing interest in studying attack and defense mechanisms for DNN models on different data types, such as images, graphs and text. Thus, it is necessary to provide a systematic and comprehensive overview of the main threats of attacks and the success of corresponding countermeasures. In this survey, we review state-of-the-art algorithms for generating adversarial examples and the countermeasures against adversarial examples, for the three popular data types, i.e., images, graphs and text.
Tasks
Published 2019-09-17
URL https://arxiv.org/abs/1909.08072v2
PDF https://arxiv.org/pdf/1909.08072v2.pdf
PWC https://paperswithcode.com/paper/adversarial-attacks-and-defenses-in-images
Repo https://github.com/snaka0213/PyTorch-AdvAttacks
Framework pytorch
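
As a concrete example of the image-domain attacks the survey covers, here is a minimal FGSM sketch: perturb the input in the direction of the sign of the loss gradient. The model is a throwaway linear classifier used purely for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.03):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # One signed-gradient step, clipped back to the valid pixel range.
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x, y = torch.rand(4, 1, 28, 28), torch.randint(0, 10, (4,))
x_adv = fgsm(model, x, y)
print((x_adv - x).abs().max())   # perturbation bounded by eps
```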

E-LPIPS: Robust Perceptual Image Similarity via Random Transformation Ensembles

Title E-LPIPS: Robust Perceptual Image Similarity via Random Transformation Ensembles
Authors Markus Kettunen, Erik Härkönen, Jaakko Lehtinen
Abstract It has been recently shown that the hidden variables of convolutional neural networks make for an efficient perceptual similarity metric that accurately predicts human judgment on relative image similarity assessment. First, we show that such learned perceptual similarity metrics (LPIPS) are susceptible to adversarial attacks that dramatically contradict human visual similarity judgment. While this is not surprising in light of neural networks’ well-known weakness to adversarial perturbations, we proceed to show that self-ensembling with an infinite family of random transformations of the input — a technique known not to render classification networks robust — is enough to turn the metric robust against attack, while retaining predictive power on human judgments. Finally, we study the geometry imposed by our novel self-ensembled metric (E-LPIPS) on the space of natural images. We find evidence of “perceptual convexity” by showing that convex combinations of similar-looking images retain appearance, and that discrete geodesics yield meaningful frame interpolation and texture morphing, all without explicit correspondences.
Tasks Image Similarity Search
Published 2019-06-10
URL https://arxiv.org/abs/1906.03973v2
PDF https://arxiv.org/pdf/1906.03973v2.pdf
PWC https://paperswithcode.com/paper/e-lpips-robust-perceptual-image-similarity
Repo https://github.com/mkettune/elpips
Framework tf
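
A hedged sketch of the self-ensembling idea: average a base distance over random transformations applied identically to both inputs. The base distance here is plain pixel-wise L2 as a stand-in; E-LPIPS averages LPIPS network-feature distances over a much richer transformation family.

```python
import torch

def random_transform(x, gen):
    if torch.rand(1, generator=gen).item() < 0.5:        # random horizontal flip
        x = torch.flip(x, dims=[-1])
    shift = int(torch.randint(0, 4, (1,), generator=gen).item())
    return torch.roll(x, shifts=shift, dims=-1)          # small random translation

def ensembled_distance(x, y, n=16, seed=0):
    gen = torch.Generator().manual_seed(seed)
    total = 0.0
    for _ in range(n):
        state = gen.get_state()
        xt = random_transform(x, gen)
        gen.set_state(state)                             # same transform for both inputs
        yt = random_transform(y, gen)
        total += ((xt - yt) ** 2).mean()                 # L2 stand-in for LPIPS
    return total / n

a, b = torch.rand(1, 3, 32, 32), torch.rand(1, 3, 32, 32)
print(float(ensembled_distance(a, b)))
```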

EnlightenGAN: Deep Light Enhancement without Paired Supervision

Title EnlightenGAN: Deep Light Enhancement without Paired Supervision
Authors Yifan Jiang, Xinyu Gong, Ding Liu, Yu Cheng, Chen Fang, Xiaohui Shen, Jianchao Yang, Pan Zhou, Zhangyang Wang
Abstract Deep learning-based methods have achieved remarkable success in image restoration and enhancement, but are they still competitive when there is a lack of paired training data? As one such example, this paper explores the low-light image enhancement problem, where in practice it is extremely challenging to simultaneously take a low-light and a normal-light photo of the same visual scene. We propose a highly effective unsupervised generative adversarial network, dubbed EnlightenGAN, that can be trained without low/normal-light image pairs, yet proves to generalize very well on various real-world test images. Instead of supervising the learning using ground truth data, we propose to regularize the unpaired training using the information extracted from the input itself, and benchmark a series of innovations for the low-light image enhancement problem, including a global-local discriminator structure, a self-regularized perceptual loss fusion, and an attention mechanism. Extensive experiments show that our proposed approach outperforms recent methods on a variety of metrics, covering visual quality and a subjective user study. Thanks to the great flexibility brought by unpaired training, EnlightenGAN is demonstrated to be easily adaptable to enhancing real-world images from various domains. The code is available at \url{https://github.com/yueruchen/EnlightenGAN}
Tasks Image Enhancement, Image Restoration, Low-Light Image Enhancement
Published 2019-06-17
URL https://arxiv.org/abs/1906.06972v1
PDF https://arxiv.org/pdf/1906.06972v1.pdf
PWC https://paperswithcode.com/paper/enlightengan-deep-light-enhancement-without
Repo https://github.com/ksheeraj/CS256-AI-ObjectDetection
Framework none
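
One ingredient mentioned in the abstract, the attention mechanism derived from the input itself, can be sketched as an illumination-based map (roughly one minus the max-RGB brightness) that weights enhancement towards dark regions. This is an illustration of the self-regularization idea, not the released EnlightenGAN model.

```python
import torch

def illumination_attention(rgb):                        # rgb: (B, 3, H, W) in [0, 1]
    brightness = rgb.max(dim=1, keepdim=True).values    # max over colour channels
    return 1.0 - brightness                             # darker pixels get higher weight

x = torch.rand(2, 3, 64, 64)
attn = illumination_attention(x)
enhanced_residual = torch.randn(2, 3, 64, 64)           # stand-in for a generator output
output = x + attn * enhanced_residual                   # attention-weighted enhancement
print(attn.shape, output.shape)
```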

Predicting the Type and Target of Offensive Posts in Social Media

Title Predicting the Type and Target of Offensive Posts in Social Media
Authors Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, Ritesh Kumar
Abstract As offensive content has become pervasive in social media, there has been much research in identifying potentially offensive messages. However, previous work on this topic did not consider the problem as a whole, but rather focused on detecting very specific types of offensive content, e.g., hate speech, cyberbullying, or cyber-aggression. In contrast, here we target several different kinds of offensive content. In particular, we model the task hierarchically, identifying the type and the target of offensive messages in social media. For this purpose, we compiled the Offensive Language Identification Dataset (OLID), a new dataset with tweets annotated for offensive content using a fine-grained three-layer annotation scheme, which we make publicly available. We discuss the main similarities and differences between OLID and pre-existing datasets for hate speech identification, aggression detection, and similar tasks. We further experiment with and compare the performance of different machine learning models on OLID.
Tasks Language Identification
Published 2019-02-25
URL http://arxiv.org/abs/1902.09666v2
PDF http://arxiv.org/pdf/1902.09666v2.pdf
PWC https://paperswithcode.com/paper/predicting-the-type-and-target-of-offensive
Repo https://github.com/idontflow/OLID
Framework none
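
The three-layer annotation scheme described in the abstract maps naturally onto a cascade of classifiers (level A: offensive or not; level B: targeted or untargeted; level C: target type). The sketch below wires that cascade with placeholder rule-based classifiers; the labels follow OLID's IND/GRP/OTH convention, but the classifiers themselves are toys.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class OffensivePrediction:
    offensive: bool                  # level A
    targeted: Optional[bool] = None  # level B
    target: Optional[str] = None     # level C: "IND", "GRP" or "OTH"

def hierarchical_predict(text: str,
                         level_a: Callable[[str], bool],
                         level_b: Callable[[str], bool],
                         level_c: Callable[[str], str]) -> OffensivePrediction:
    if not level_a(text):                        # not offensive: stop at level A
        return OffensivePrediction(offensive=False)
    if not level_b(text):                        # offensive but untargeted
        return OffensivePrediction(offensive=True, targeted=False)
    return OffensivePrediction(offensive=True, targeted=True, target=level_c(text))

# Toy rule-based stand-ins for trained models.
pred = hierarchical_predict("you people are awful",
                            level_a=lambda t: "awful" in t,
                            level_b=lambda t: "you" in t,
                            level_c=lambda t: "GRP" if "people" in t else "IND")
print(pred)
```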

A Fully Differentiable Beam Search Decoder

Title A Fully Differentiable Beam Search Decoder
Authors Ronan Collobert, Awni Hannun, Gabriel Synnaeve
Abstract We introduce a new beam search decoder that is fully differentiable, making it possible to optimize at training time through the inference procedure. Our decoder allows us to combine models which operate at different granularities (e.g. acoustic and language models). It can be used when target sequences are not aligned to input sequences by considering all possible alignments between the two. We demonstrate our approach scales by applying it to speech recognition, jointly training acoustic and word-level language models. The system is end-to-end, with gradients flowing through the whole architecture from the word-level transcriptions. Recent research efforts have shown that deep neural networks with attention-based mechanisms are powerful enough to successfully train an acoustic model from the final transcription, while implicitly learning a language model. Instead, we show that it is possible to discriminatively train an acoustic model jointly with an explicit and possibly pre-trained language model.
Tasks Language Modelling, Speech Recognition
Published 2019-02-16
URL http://arxiv.org/abs/1902.06022v1
PDF http://arxiv.org/pdf/1902.06022v1.pdf
PWC https://paperswithcode.com/paper/a-fully-differentiable-beam-search-decoder
Repo https://github.com/johnhw/differentiable_sorting
Framework tf

Detector-in-Detector: Multi-Level Analysis for Human-Parts

Title Detector-in-Detector: Multi-Level Analysis for Human-Parts
Authors Xiaojie Li, Lu Yang, Qing Song, Fuqiang Zhou
Abstract Vision-based person, hand or face detection approaches have achieved incredible success in recent years with the development of deep convolutional neural networks (CNN). In this paper, we take the inherent correlation between the body and body parts into account and propose a new framework to boost the detection performance of multi-level objects. In particular, we adopt a region-based object detection structure with two carefully designed detectors that separately pay attention to the human body and body parts in a coarse-to-fine manner, which we call the Detector-in-Detector network (DID-Net). The first detector is designed to detect human body, hand, and face. The second detector, based on the body detection results of the first detector, mainly focuses on the detection of small hands and faces inside each body. The framework is trained in an end-to-end way by optimizing a multi-task loss. Due to the lack of a human body, face and hand detection dataset, we have collected and labeled a new large dataset named Human-Parts with 14,962 images and 106,879 annotations. Experiments show that our method can achieve excellent performance on Human-Parts.
Tasks Face Detection, Object Detection
Published 2019-02-19
URL http://arxiv.org/abs/1902.07017v1
PDF http://arxiv.org/pdf/1902.07017v1.pdf
PWC https://paperswithcode.com/paper/detector-in-detector-multi-level-analysis-for
Repo https://github.com/svjack/Detector-in-Detector
Framework tf
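
A hedged sketch of the coarse-to-fine flow described in the abstract: a first-stage detector proposes body boxes, a second detector runs inside each body crop, and part boxes are mapped back to image coordinates. Both detectors and the crop function are placeholders; the real DID-Net shares features between stages and is trained end to end with a multi-task loss.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]          # (x1, y1, x2, y2)

def detect_parts_in_bodies(image,
                           body_detector: Callable[[object], List[Box]],
                           part_detector: Callable[[object], List[Box]],
                           crop: Callable[[object, Box], object]) -> List[Box]:
    parts: List[Box] = []
    for bx1, by1, bx2, by2 in body_detector(image):
        for px1, py1, px2, py2 in part_detector(crop(image, (bx1, by1, bx2, by2))):
            # Part boxes are relative to the crop; shift back to image coordinates.
            parts.append((bx1 + px1, by1 + py1, bx1 + px2, by1 + py2))
    return parts

# Toy example with fake detectors on a dummy "image".
print(detect_parts_in_bodies(
    image=None,
    body_detector=lambda img: [(10, 20, 110, 220)],
    part_detector=lambda cropped: [(30, 5, 60, 35)],   # a face inside the body crop
    crop=lambda img, box: img))
# [(40, 25, 70, 55)]
```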

Asymmetric Generative Adversarial Networks for Image-to-Image Translation

Title Asymmetric Generative Adversarial Networks for Image-to-Image Translation
Authors Hao Tang, Dan Xu, Hong Liu, Nicu Sebe
Abstract State-of-the-art models for unpaired image-to-image translation with Generative Adversarial Networks (GANs) can learn the mapping from the source domain to the target domain using a cycle-consistency loss. The intuition behind these models is that if we translate from one domain to the other and back again we should arrive at where we started. However, existing methods always adopt a symmetric network architecture to learn both forward and backward cycles. Because of the task complexity and cycle input difference between the source and target image domains, the inequality in bidirectional forward-backward cycle translations is significant and the amount of information between the two domains differs. In this paper, we analyze the limitation of existing symmetric GAN models in asymmetric translation tasks, and propose an AsymmetricGAN model with translation and reconstruction generators of unequal sizes and different parameter-sharing strategies to adapt to the asymmetric needs of both unsupervised and supervised image-to-image translation tasks. Moreover, the training stage of existing methods has the common problem of model collapse that degrades the quality of the generated images; we therefore explore different optimization losses for better training of AsymmetricGAN, making image-to-image translation more consistent and more stable. Extensive experiments on both supervised and unsupervised generative tasks with several publicly available datasets demonstrate that the proposed AsymmetricGAN achieves superior model capacity and better generation performance compared with existing GAN models. To the best of our knowledge, we are the first to investigate the asymmetric GAN framework on both unsupervised and supervised image-to-image translation tasks. The source code, data and trained models are available at https://github.com/Ha0Tang/AsymmetricGAN.
Tasks Image-to-Image Translation
Published 2019-12-14
URL https://arxiv.org/abs/1912.06931v1
PDF https://arxiv.org/pdf/1912.06931v1.pdf
PWC https://paperswithcode.com/paper/asymmetric-generative-adversarial-networks
Repo https://github.com/Ha0Tang/AsymmetricGAN
Framework pytorch
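
The asymmetry the abstract argues for can be sketched by giving the translation generator more capacity than the reconstruction generator while tying them with a cycle-consistency loss, as below. Widths, depths, and the loss are illustrative stand-ins, not the released AsymmetricGAN architecture.

```python
import torch
import torch.nn as nn

def conv_generator(width, depth):
    layers = [nn.Conv2d(3, width, 3, padding=1), nn.ReLU()]
    for _ in range(depth):
        layers += [nn.Conv2d(width, width, 3, padding=1), nn.ReLU()]
    layers += [nn.Conv2d(width, 3, 3, padding=1), nn.Tanh()]
    return nn.Sequential(*layers)

G_trans = conv_generator(width=64, depth=6)   # heavier translation generator
G_rec = conv_generator(width=16, depth=2)     # lighter reconstruction generator

x = torch.rand(2, 3, 64, 64) * 2 - 1          # source-domain batch in [-1, 1]
fake_y = G_trans(x)
cycle_loss = nn.functional.l1_loss(G_rec(fake_y), x)   # ||F(G(x)) - x||_1
print(sum(p.numel() for p in G_trans.parameters()),
      sum(p.numel() for p in G_rec.parameters()))
print(float(cycle_loss))
```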