February 1, 2020

Paper Group AWR 135

Context-Aware Sentence/Passage Term Importance Estimation For First Stage Retrieval

Title Context-Aware Sentence/Passage Term Importance Estimation For First Stage Retrieval
Authors Zhuyun Dai, Jamie Callan
Abstract Term frequency is a common method for identifying the importance of a term in a query or document. But it is a weak signal, especially when the frequency distribution is flat, such as in long queries or short documents where the text is of sentence/passage-length. This paper proposes a Deep Contextualized Term Weighting framework that learns to map BERT’s contextualized text representations to context-aware term weights for sentences and passages. When applied to passages, DeepCT-Index produces term weights that can be stored in an ordinary inverted index for passage retrieval. When applied to query text, DeepCT-Query generates a weighted bag-of-words query. Both types of term weight can be used directly by typical first-stage retrieval algorithms. This is novel because most deep neural network based ranking models have higher computational costs, and thus are restricted to later-stage rankers. Experiments on four datasets demonstrate that DeepCT’s deep contextualized text understanding greatly improves the accuracy of first-stage retrieval algorithms.
Tasks
Published 2019-10-23
URL https://arxiv.org/abs/1910.10687v2
PDF https://arxiv.org/pdf/1910.10687v2.pdf
PWC https://paperswithcode.com/paper/context-aware-sentencepassage-term-importance
Repo https://github.com/AdeDZY/DeepCT
Framework tf
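
A minimal sketch of the core DeepCT idea, assuming the Hugging Face transformers API: a per-token linear regression head maps BERT's contextualized embeddings to scalar term weights. The head, model name, and training targets here are illustrative simplifications, not the released DeepCT code.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast

class DeepCTSketch(nn.Module):
    """Illustrative sketch (not the released DeepCT): regress one scalar
    importance weight per token from BERT's contextualized embeddings."""
    def __init__(self):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.regressor = nn.Linear(self.bert.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        hidden = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        # One weight per token; for DeepCT-Index these would be scaled and
        # rounded into tf-like integers for an ordinary inverted index.
        return self.regressor(hidden).squeeze(-1)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
batch = tokenizer(["context-aware term importance"], return_tensors="pt")
weights = DeepCTSketch()(batch["input_ids"], batch["attention_mask"])
```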

The Woman Worked as a Babysitter: On Biases in Language Generation

Title The Woman Worked as a Babysitter: On Biases in Language Generation
Authors Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, Nanyun Peng
Abstract We present a systematic study of biases in natural language generation (NLG) by analyzing text generated from prompts that contain mentions of different demographic groups. In this work, we introduce the notion of the regard towards a demographic, use the varying levels of regard towards different demographics as a defining metric for bias in NLG, and analyze the extent to which sentiment scores are a relevant proxy metric for regard. To this end, we collect strategically-generated text from language models and manually annotate the text with both sentiment and regard scores. Additionally, we build an automatic regard classifier through transfer learning, so that we can analyze biases in unseen text. Together, these methods reveal the extent of the biased nature of language model generations. Our analysis provides a study of biases in NLG, bias metrics and correlated human judgments, and empirical evidence on the usefulness of our annotated dataset.
Tasks Language Modelling, Text Generation, Transfer Learning
Published 2019-09-03
URL https://arxiv.org/abs/1909.01326v2
PDF https://arxiv.org/pdf/1909.01326v2.pdf
PWC https://paperswithcode.com/paper/the-woman-worked-as-a-babysitter-on-biases-in
Repo https://github.com/ewsheng/nlg-bias
Framework pytorch
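
A hedged sketch of how a transfer-learning regard classifier might be set up, assuming the Hugging Face transformers API. The three-way label set and the placeholder annotation are illustrative; the paper's actual labels and training data come from its annotated dataset.

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizerFast

# Hypothetical 3-way regard labels; the actual label set and annotations
# come from the paper's dataset.
LABELS = ["negative", "neutral", "positive"]

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS))

texts = ["The woman worked as a babysitter."]
targets = torch.tensor([1])  # placeholder annotation, not a real label

batch = tokenizer(texts, return_tensors="pt", padding=True)
out = model(**batch, labels=targets)
out.loss.backward()  # one transfer-learning step (optimizer omitted)
```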

Learning Implicitly Recurrent CNNs Through Parameter Sharing

Title Learning Implicitly Recurrent CNNs Through Parameter Sharing
Authors Pedro Savarese, Michael Maire
Abstract We introduce a parameter sharing scheme, in which different layers of a convolutional neural network (CNN) are defined by a learned linear combination of parameter tensors from a global bank of templates. Restricting the number of templates yields a flexible hybridization of traditional CNNs and recurrent networks. Compared to traditional CNNs, we demonstrate substantial parameter savings on standard image classification tasks, while maintaining accuracy. Our simple parameter sharing scheme, though defined via soft weights, in practice often yields trained networks with near strict recurrent structure; with negligible side effects, they convert into networks with actual loops. Training these networks thus implicitly involves discovery of suitable recurrent architectures. Though considering only the design aspect of recurrent links, our trained networks achieve accuracy competitive with those built using state-of-the-art neural architecture search (NAS) procedures. Our hybridization of recurrent and convolutional networks may also represent a beneficial architectural bias. Specifically, on synthetic tasks which are algorithmic in nature, our hybrid networks both train faster and extrapolate better to test examples outside the span of the training set.
Tasks Image Classification, Neural Architecture Search
Published 2019-02-26
URL http://arxiv.org/abs/1902.09701v2
PDF http://arxiv.org/pdf/1902.09701v2.pdf
PWC https://paperswithcode.com/paper/learning-implicitly-recurrent-cnns-through
Repo https://github.com/lolemacs/soft-sharing
Framework pytorch
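
A minimal PyTorch sketch of the parameter sharing scheme: each convolutional layer's kernel is a learned linear combination of templates drawn from a shared global bank. Shapes and initialization are illustrative, not the authors' released configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemplateBank(nn.Module):
    """Global bank of parameter templates shared by many layers."""
    def __init__(self, num_templates, channels, k):
        super().__init__()
        self.templates = nn.Parameter(
            0.01 * torch.randn(num_templates, channels, channels, k, k))

class SharedConv2d(nn.Module):
    """Conv layer whose kernel is a learned mixture of bank templates.
    Near one-hot coefficients across layers make the net effectively
    recurrent: layers reuse the same kernel, i.e., a loop."""
    def __init__(self, bank):
        super().__init__()
        self.bank = bank
        self.coefficients = nn.Parameter(
            torch.randn(bank.templates.size(0)) / bank.templates.size(0))

    def forward(self, x):
        kernel = (self.coefficients.view(-1, 1, 1, 1, 1)
                  * self.bank.templates).sum(dim=0)
        return F.conv2d(x, kernel, padding=kernel.size(-1) // 2)

bank = TemplateBank(num_templates=4, channels=16, k=3)
layer1, layer2 = SharedConv2d(bank), SharedConv2d(bank)  # one shared bank
y = layer2(layer1(torch.randn(1, 16, 8, 8)))
```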

Making Fast Graph-based Algorithms with Graph Metric Embeddings

Title Making Fast Graph-based Algorithms with Graph Metric Embeddings
Authors Andrey Kutuzov, Mohammad Dorgham, Oleksiy Oliynyk, Chris Biemann, Alexander Panchenko
Abstract The computation of distance measures between nodes in graphs is inefficient and does not scale to large graphs. We explore dense vector representations as an effective way to approximate the same information: we introduce a simple yet efficient and effective approach for learning graph embeddings. Instead of directly operating on the graph structure, our method takes structural measures of pairwise node similarities into account and learns dense node representations reflecting user-defined graph distance measures, such as the shortest path distance or distance measures that take information beyond the graph structure into account. We demonstrate a speed-up of several orders of magnitude when predicting word similarity by vector operations on our embeddings, as opposed to directly computing the respective path-based measures, while outperforming various other graph embeddings on semantic similarity and word sense disambiguation tasks. We present evaluations on the WordNet graph and two knowledge base graphs.
Tasks Semantic Similarity, Semantic Textual Similarity, Word Sense Disambiguation
Published 2019-06-17
URL https://arxiv.org/abs/1906.07040v1
PDF https://arxiv.org/pdf/1906.07040v1.pdf
PWC https://paperswithcode.com/paper/making-fast-graph-based-algorithms-with-graph
Repo https://github.com/uhh-lt/path2vec
Framework none
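
A simplified sketch of the training objective, assuming networkx and PyTorch: embeddings are fit so that dot products approximate a precomputed graph similarity (here, inverse shortest-path length on a toy graph). The released path2vec objective adds regularization terms omitted here.

```python
import networkx as nx
import torch

# Toy setup: fit node embeddings so dot products approximate a
# user-defined graph distance (here, inverse shortest-path length).
G = nx.karate_club_graph()
lengths = dict(nx.all_pairs_shortest_path_length(G))
pairs = [(u, v) for u in G for v in G if u < v]
target = torch.tensor([1.0 / (1 + lengths[u][v]) for u, v in pairs])
idx = torch.tensor(pairs)

emb = torch.nn.Embedding(G.number_of_nodes(), 32)
opt = torch.optim.Adam(emb.parameters(), lr=0.05)
for _ in range(200):
    pred = (emb(idx[:, 0]) * emb(idx[:, 1])).sum(dim=1)  # dot products
    loss = ((pred - target) ** 2).mean()  # simplified path2vec objective
    opt.zero_grad()
    loss.backward()
    opt.step()
```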

Make Skeleton-based Action Recognition Model Smaller, Faster and Better

Title Make Skeleton-based Action Recognition Model Smaller, Faster and Better
Authors Fan Yang, Sakriani Sakti, Yang Wu, Satoshi Nakamura
Abstract Although skeleton-based action recognition has achieved great success in recent years, most existing methods suffer from large model sizes and slow execution speeds. To alleviate this issue, we analyze the properties of skeleton sequences and propose a Double-feature Double-motion Network (DD-Net) for skeleton-based action recognition. Thanks to its lightweight structure (i.e., 0.15 million parameters), DD-Net runs extremely fast: 3,500 FPS on one GPU, or 2,000 FPS on one CPU. By employing robust features, DD-Net achieves state-of-the-art performance on our experimental datasets: SHREC (hand actions) and JHMDB (body actions). Our code will be released with this paper.
Tasks Hand Gesture Recognition, Skeleton Based Action Recognition
Published 2019-07-23
URL https://arxiv.org/abs/1907.09658v8
PDF https://arxiv.org/pdf/1907.09658v8.pdf
PWC https://paperswithcode.com/paper/make-skeleton-based-action-recognition-model-1
Repo https://github.com/fandulu/DD-Net
Framework tf
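
A NumPy sketch of the two feature families the abstract alludes to: location-invariant pairwise joint distances per frame, and motion computed at two temporal scales. Array shapes are illustrative, and the released DD-Net adds embedding and classification stages on top of these features.

```python
import numpy as np

def jcd_feature(skeleton):
    """Pairwise joint distances per frame, a location-invariant feature;
    skeleton has shape (frames, joints, dims)."""
    diff = skeleton[:, :, None, :] - skeleton[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)          # (frames, joints, joints)
    iu = np.triu_indices(skeleton.shape[1], k=1)
    return dist[:, iu[0], iu[1]]                  # upper triangle only

def two_scale_motion(skeleton):
    """Motion at two temporal scales: fine frame-to-frame differences
    and coarser two-frame differences."""
    return skeleton[1:] - skeleton[:-1], skeleton[2:] - skeleton[:-2]

seq = np.random.randn(32, 22, 3)  # e.g., 32 frames of 22 hand joints (toy)
jcd = jcd_feature(seq)
slow, fast = two_scale_motion(seq)
```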

AI vs Humans for the diagnosis of sleep apnea

Title AI vs Humans for the diagnosis of sleep apnea
Authors Valentin Thorey, Albert Bou Hernandez, Pierrick J. Arnal, Emmanuel H. During
Abstract Polysomnography (PSG) is the gold standard for diagnosing obstructive sleep apnea (OSA). It allows monitoring of breathing events throughout the night. The detection of these events is usually done by trained sleep experts. However, this task is tedious, highly time-consuming and subject to substantial inter-scorer variability. In this study, we adapted our state-of-the-art deep learning method for sleep event detection, DOSED, to the detection of sleep breathing events in PSG for the diagnosis of OSA. We used a dataset of 52 PSG recordings with apnea-hypopnea event scoring from 5 trained sleep experts. We assessed the performance of the automatic approach and compared it to the inter-scorer performance for both the diagnosis of OSA severity and, at the microscale, for the detection of single breathing events. We observed that human sleep experts reached an average accuracy of 75% while the automatic approach reached 81% for sleep apnea severity diagnosis. The F1 score for individual event detection was 0.55 for experts and 0.57 for the automatic approach, on average. These results demonstrate that the automatic approach can perform at a sleep expert level for the diagnosis of OSA.
Tasks
Published 2019-06-20
URL https://arxiv.org/abs/1906.09936v1
PDF https://arxiv.org/pdf/1906.09936v1.pdf
PWC https://paperswithcode.com/paper/ai-vs-humans-for-the-diagnosis-of-sleep-apnea
Repo https://github.com/Dreem-Organization/dosed
Framework pytorch
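
For context, OSA severity is conventionally graded from the apnea-hypopnea index (AHI), the number of detected breathing events per hour of sleep. A sketch using the standard clinical cutoffs; the paper's exact grading protocol may differ.

```python
def ahi_severity(num_events, hours_of_sleep):
    """Grade OSA severity from the apnea-hypopnea index; cutoffs are the
    standard clinical ones (<5 normal, 5-15 mild, 15-30 moderate,
    >=30 severe), which may differ from the paper's protocol."""
    ahi = num_events / hours_of_sleep  # events per hour of sleep
    if ahi < 5:
        return "normal"
    if ahi < 15:
        return "mild"
    if ahi < 30:
        return "moderate"
    return "severe"

print(ahi_severity(num_events=96, hours_of_sleep=8))  # AHI 12 -> "mild"
```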

Learning to Schedule Communication in Multi-agent Reinforcement Learning

Title Learning to Schedule Communication in Multi-agent Reinforcement Learning
Authors Daewoo Kim, Sangwoo Moon, David Hostallero, Wan Ju Kang, Taeyoung Lee, Kyunghwan Son, Yung Yi
Abstract Many real-world reinforcement learning tasks require multiple agents to make sequential decisions under interaction, where well-coordinated actions among the agents are crucial to achieving the goal. One way to strengthen coordination is to enable multiple agents to communicate with each other in a distributed manner and behave as a group. In this paper, we study a practical scenario in which (i) the communication bandwidth is limited and (ii) the agents share the communication medium, so that only a restricted number of agents can use the medium simultaneously, as in state-of-the-art wireless networking standards. This calls for a certain form of communication scheduling. In that regard, we propose a multi-agent deep reinforcement learning framework, called SchedNet, in which agents learn how to schedule themselves, how to encode messages, and how to select actions based on received messages. SchedNet is capable of deciding which agents should be entitled to broadcast their (encoded) messages, by learning the importance of each agent's partially observed information. We evaluate SchedNet against multiple baselines in two applications, namely cooperative communication and navigation, and predator-prey. Our experiments show a non-negligible performance gap, ranging from 32% to 43%, between SchedNet and mechanisms such as no communication or vanilla scheduling methods, e.g., round robin.
Tasks Multi-agent Reinforcement Learning
Published 2019-02-05
URL http://arxiv.org/abs/1902.01554v1
PDF http://arxiv.org/pdf/1902.01554v1.pdf
PWC https://paperswithcode.com/paper/learning-to-schedule-communication-in-multi
Repo https://github.com/rhoowd/sched_net
Framework none
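
A minimal PyTorch sketch of the scheduling idea: each agent maps its partial observation to a scalar importance weight, and only the top-k agents may broadcast on the shared medium. Network sizes are illustrative; SchedNet's message encoders, action selectors, and end-to-end training are omitted.

```python
import torch
import torch.nn as nn

class WeightGenerator(nn.Module):
    """Maps each agent's partial observation to a scalar importance
    weight; layer sizes are illustrative."""
    def __init__(self, obs_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(),
                                 nn.Linear(32, 1))

    def forward(self, obs):                 # obs: (n_agents, obs_dim)
        return self.net(obs).squeeze(-1)    # one weight per agent

def schedule_top_k(weights, k):
    """Binary schedule: the k agents with the largest weights may use
    the shared medium this step."""
    mask = torch.zeros_like(weights)
    mask[weights.topk(k).indices] = 1.0
    return mask

obs = torch.randn(5, 8)                     # 5 agents, toy observations
schedule = schedule_top_k(WeightGenerator(8)(obs), k=2)
```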

Learning Character-Agnostic Motion for Motion Retargeting in 2D

Title Learning Character-Agnostic Motion for Motion Retargeting in 2D
Authors Kfir Aberman, Rundi Wu, Dani Lischinski, Baoquan Chen, Daniel Cohen-Or
Abstract Analyzing human motion is a challenging task with a wide variety of applications in computer vision and in graphics. One such application, of particular importance in computer animation, is the retargeting of motion from one performer to another. While humans move in three dimensions, the vast majority of human motions are captured using video, requiring 2D-to-3D pose and camera recovery, before existing retargeting approaches may be applied. In this paper, we present a new method for retargeting video-captured motion between different human performers, without the need to explicitly reconstruct 3D poses and/or camera parameters. In order to achieve our goal, we learn to extract, directly from a video, a high-level latent motion representation, which is invariant to the skeleton geometry and the camera view. Our key idea is to train a deep neural network to decompose temporal sequences of 2D poses into three components: motion, skeleton, and camera view-angle. Having extracted such a representation, we are able to re-combine motion with novel skeletons and camera views, and decode a retargeted temporal sequence, which we compare to a ground truth from a synthetic dataset. We demonstrate that our framework can be used to robustly extract human motion from videos, bypassing 3D reconstruction, and outperforming existing retargeting methods, when applied to videos in-the-wild. It also enables additional applications, such as performance cloning, video-driven cartoons, and motion retrieval.
Tasks 3D Reconstruction
Published 2019-05-05
URL https://arxiv.org/abs/1905.01680v1
PDF https://arxiv.org/pdf/1905.01680v1.pdf
PWC https://paperswithcode.com/paper/learning-character-agnostic-motion-for-motion
Repo https://github.com/ChrisWu1997/2D-Motion-Retargeting
Framework pytorch
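
A rough PyTorch sketch of the decomposition: three encoders factor a 2D pose sequence into a time-varying motion code plus static skeleton and view codes, and a decoder recombines any mixture of the three. The GRU encoders here are stand-ins; the paper uses its own convolutional architecture and training losses.

```python
import torch
import torch.nn as nn

class MotionDecomposer(nn.Module):
    """Encode a 2D pose sequence into a time-varying motion code plus
    static skeleton and view codes; decode any recombination. GRUs are
    stand-ins for the paper's convolutional encoders/decoder."""
    def __init__(self, pose_dim=30, hidden=128):
        super().__init__()
        self.motion_enc = nn.GRU(pose_dim, hidden, batch_first=True)
        self.skeleton_enc = nn.GRU(pose_dim, hidden, batch_first=True)
        self.view_enc = nn.GRU(pose_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(3 * hidden, pose_dim, batch_first=True)

    def encode(self, seq):
        motion, _ = self.motion_enc(seq)      # (B, T, hidden), per frame
        _, skeleton = self.skeleton_enc(seq)  # final state: static code
        _, view = self.view_enc(seq)
        return motion, skeleton[-1], view[-1]

    def decode(self, motion, skeleton, view):
        static = torch.cat([skeleton, view], dim=-1)
        static = static.unsqueeze(1).expand(-1, motion.size(1), -1)
        out, _ = self.decoder(torch.cat([motion, static], dim=-1))
        return out

model = MotionDecomposer()
seq_a, seq_b = torch.randn(1, 64, 30), torch.randn(1, 64, 30)
m_a, _, _ = model.encode(seq_a)
_, s_b, v_b = model.encode(seq_b)
retargeted = model.decode(m_a, s_b, v_b)  # A's motion, B's skeleton/view
```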

Rethinking Attribute Representation and Injection for Sentiment Classification

Title Rethinking Attribute Representation and Injection for Sentiment Classification
Authors Reinald Kim Amplayo
Abstract Text attributes, such as user and product information in product reviews, have been used to improve the performance of sentiment classification models. The de facto standard method is to incorporate them as additional biases in the attention mechanism, and more performance gains are achieved by extending the model architecture. In this paper, we show that the above method is the least effective way to represent and inject attributes. To demonstrate this hypothesis, unlike previous models with complicated architectures, we limit our base model to a simple BiLSTM-with-attention classifier, and instead focus on how and where the attributes should be incorporated in the model. We propose to represent attributes as chunk-wise importance weight matrices and consider four locations in the model (i.e., embedding, encoding, attention, classifier) to inject attributes. Experiments show that our proposed method achieves significant improvements over the standard approach and that the attention mechanism is the worst location to inject attributes, contradicting prior work. We also outperform the state-of-the-art despite our use of a simple base model. Finally, we show that these representations transfer well to other tasks. Model implementation and datasets are released here: https://github.com/rktamplayo/CHIM.
Tasks Sentiment Analysis
Published 2019-08-26
URL https://arxiv.org/abs/1908.09590v1
PDF https://arxiv.org/pdf/1908.09590v1.pdf
PWC https://paperswithcode.com/paper/rethinking-attribute-representation-and
Repo https://github.com/rktamplayo/CHIM
Framework pytorch
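
A hedged sketch of chunk-wise attribute injection: an attribute embedding produces one importance weight per chunk of a layer's weight matrix, which is expanded to the full weight shape and multiplied in. Dimensions and the sigmoid gating are illustrative; see the released CHIM code for the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChunkWiseLinear(nn.Module):
    """An attribute embedding yields one importance weight per chunk of
    the layer's weight matrix; the chunk weights are expanded to the full
    weight shape and multiplied in. Dimensions are illustrative."""
    def __init__(self, in_dim, out_dim, n_attrs, attr_dim, chunks=(4, 4)):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.attr = nn.Embedding(n_attrs, attr_dim)
        self.to_chunks = nn.Linear(attr_dim, chunks[0] * chunks[1])
        self.chunks, self.shape = chunks, (out_dim, in_dim)

    def forward(self, x, attr_id):
        c = torch.sigmoid(self.to_chunks(self.attr(attr_id)))
        c = c.view(self.chunks)                    # chunk importances
        scale = c.repeat_interleave(self.shape[0] // self.chunks[0], 0)
        scale = scale.repeat_interleave(self.shape[1] // self.chunks[1], 1)
        return F.linear(x, self.linear.weight * scale, self.linear.bias)

layer = ChunkWiseLinear(in_dim=64, out_dim=32, n_attrs=100, attr_dim=16)
y = layer(torch.randn(2, 64), torch.tensor(7))     # attribute id 7
```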

LanczosNet: Multi-Scale Deep Graph Convolutional Networks

Title LanczosNet: Multi-Scale Deep Graph Convolutional Networks
Authors Renjie Liao, Zhizhen Zhao, Raquel Urtasun, Richard S. Zemel
Abstract We propose the Lanczos network (LanczosNet), which uses the Lanczos algorithm to construct low rank approximations of the graph Laplacian for graph convolution. Relying on the tridiagonal decomposition of the Lanczos algorithm, we not only efficiently exploit multi-scale information via fast approximated computation of matrix power but also design learnable spectral filters. Being fully differentiable, LanczosNet facilitates both graph kernel learning as well as learning node embeddings. We show the connection between our LanczosNet and graph based manifold learning methods, especially the diffusion maps. We benchmark our model against several recent deep graph networks on citation networks and QM8 quantum chemistry dataset. Experimental results show that our model achieves the state-of-the-art performance in most tasks. Code is released at: \url{https://github.com/lrjconan/LanczosNetwork}.
Tasks Node Classification
Published 2019-01-06
URL https://arxiv.org/abs/1901.01484v2
PDF https://arxiv.org/pdf/1901.01484v2.pdf
PWC https://paperswithcode.com/paper/lanczosnet-multi-scale-deep-graph
Repo https://github.com/lrjconan/LanczosNetwork
Framework pytorch
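
A NumPy sketch of the underlying numerical trick: an m-step Lanczos iteration yields an orthonormal basis Q and a tridiagonal T such that powers of the Laplacian applied to a vector can be approximated cheaply, L^p x ≈ ||x|| Q T^p e1. LanczosNet builds learnable spectral filters on top of this decomposition; that part is omitted here.

```python
import numpy as np

def lanczos(L, x, m):
    """m-step Lanczos iteration: returns orthonormal Q (n x m) and
    tridiagonal T (m x m) with Q.T @ L @ Q ~= T."""
    n = L.shape[0]
    Q = np.zeros((n, m))
    alpha, beta = np.zeros(m), np.zeros(m)
    q, q_prev, b = x / np.linalg.norm(x), np.zeros(n), 0.0
    for j in range(m):
        Q[:, j] = q
        w = L @ q - b * q_prev
        alpha[j] = q @ w
        w -= alpha[j] * q
        b = np.linalg.norm(w)
        beta[j] = b
        q_prev, q = q, w / (b + 1e-12)
    return Q, np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)

def approx_power(L, x, p, m=10):
    """Multi-scale propagation without forming L^p:
    L^p x ~= ||x|| * Q @ T^p @ e1."""
    Q, T = lanczos(L, x, m)
    e1 = np.zeros(m)
    e1[0] = 1.0
    return np.linalg.norm(x) * Q @ np.linalg.matrix_power(T, p) @ e1

A = np.random.rand(50, 50)
A = (A + A.T) / 2                      # toy symmetric affinity matrix
L = np.diag(A.sum(1)) - A              # its graph Laplacian
y = approx_power(L, np.random.rand(50), p=3)
```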

Learning by Abstraction: The Neural State Machine

Title Learning by Abstraction: The Neural State Machine
Authors Drew A. Hudson, Christopher D. Manning
Abstract We introduce the Neural State Machine, seeking to bridge the gap between the neural and symbolic views of AI and integrate their complementary strengths for the task of visual reasoning. Given an image, we first predict a probabilistic graph that represents its underlying semantics and serves as a structured world model. Then, we perform sequential reasoning over the graph, iteratively traversing its nodes to answer a given question or draw a new inference. In contrast to most neural architectures that are designed to closely interact with the raw sensory data, our model operates instead in an abstract latent space, by transforming both the visual and linguistic modalities into semantic concept-based representations, thereby achieving enhanced transparency and modularity. We evaluate our model on VQA-CP and GQA, two recent VQA datasets that involve compositionality, multi-step inference and diverse reasoning skills, achieving state-of-the-art results in both cases. We provide further experiments that illustrate the model’s strong generalization capacity across multiple dimensions, including novel compositions of concepts, changes in the answer distribution, and unseen linguistic structures, demonstrating the qualities and efficacy of our approach.
Tasks Visual Question Answering, Visual Reasoning
Published 2019-07-09
URL https://arxiv.org/abs/1907.03950v4
PDF https://arxiv.org/pdf/1907.03950v4.pdf
PWC https://paperswithcode.com/paper/learning-by-abstraction-the-neural-state
Repo https://github.com/ceyzaguirre4/NSM
Framework pytorch
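
An illustrative sketch of soft graph traversal in the spirit of the Neural State Machine: attention over nodes is redistributed along edges, modulated by each node's relevance to the current reasoning instruction. All tensors here are toy stand-ins for the predicted probabilistic scene graph and the question-derived instructions.

```python
import torch

def reasoning_step(p, adjacency, node_feats, instruction):
    """One soft traversal step: shift attention mass over nodes along
    edges, modulated by each node's relevance to the instruction."""
    relevance = torch.softmax(node_feats @ instruction, dim=0)  # (n,)
    p_next = adjacency.T @ (p * relevance)   # propagate along edges
    return p_next / (p_next.sum() + 1e-9)    # keep it a distribution

n, d = 6, 16
adjacency = torch.rand(n, n)                 # toy probabilistic graph
node_feats = torch.randn(n, d)               # concept-based node states
instruction = torch.randn(d)                 # from a question encoder
p = torch.full((n,), 1.0 / n)                # uniform initial attention
for _ in range(4):                           # multi-step inference
    p = reasoning_step(p, adjacency, node_feats, instruction)
```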

Semantic flow in language networks

Title Semantic flow in language networks
Authors Edilson A. Corrêa Jr., Vanessa Q. Marinho, Diego R. Amancio
Abstract In this study we propose a framework to characterize documents based on their semantic flow. The proposed framework encompasses a network-based model that connects sentences based on their semantic similarity. Semantic fields are detected using standard community detection methods. As the story unfolds, transitions between semantic fields are represented in Markov networks, which in turn are characterized via network motifs (subgraphs). Here we show that the proposed framework can be used to classify books according to their style and publication dates. Remarkably, even without a systematic optimization of parameters, philosophy and investigative books were discriminated with an accuracy rate of 92.5%. Because this model captures semantic features of texts, it could be used as an additional feature in traditional network-based models of texts that capture only syntactical/stylistic information, as is the case with word adjacency (co-occurrence) networks.
Tasks Community Detection, Semantic Similarity, Semantic Textual Similarity
Published 2019-05-18
URL https://arxiv.org/abs/1905.07595v1
PDF https://arxiv.org/pdf/1905.07595v1.pdf
PWC https://paperswithcode.com/paper/semantic-flow-in-language-networks
Repo https://github.com/edilsonacjr/semantic_flow
Framework none
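
A condensed sketch of the pipeline under stated assumptions (sentence vectors are given; networkx provides community detection): link semantically similar sentences, treat communities as semantic fields, and count field-to-field transitions in reading order to form the Markov chain the abstract describes.

```python
import itertools
import networkx as nx
import numpy as np
from networkx.algorithms.community import greedy_modularity_communities

def semantic_flow(sentence_vectors, threshold=0.1):
    """Link similar sentences, treat communities as semantic fields, and
    count field-to-field transitions in reading order."""
    n = len(sentence_vectors)
    sims = sentence_vectors @ sentence_vectors.T  # cosine if normalized
    G = nx.Graph()
    G.add_nodes_from(range(n))
    for i, j in itertools.combinations(range(n), 2):
        if sims[i, j] > threshold:
            G.add_edge(i, j)
    fields = {s: k for k, c in enumerate(greedy_modularity_communities(G))
              for s in c}
    m = max(fields.values()) + 1
    T = np.zeros((m, m))
    for i in range(n - 1):                        # as the story unfolds
        T[fields[i], fields[i + 1]] += 1
    return T / np.maximum(T.sum(axis=1, keepdims=True), 1)

rng = np.random.default_rng(0)
vecs = rng.standard_normal((20, 50))
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
transitions = semantic_flow(vecs)                 # row-stochastic matrix
```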

KISS: Keeping It Simple for Scene Text Recognition

Title KISS: Keeping It Simple for Scene Text Recognition
Authors Christian Bartz, Joseph Bethge, Haojin Yang, Christoph Meinel
Abstract Over the past few years, several new methods for scene text recognition have been proposed. Most of these methods propose novel building blocks for neural networks. These novel building blocks are specially tailored for the task of scene text recognition and can thus hardly be used in any other tasks. In this paper, we introduce a new model for scene text recognition that only consists of off-the-shelf building blocks for neural networks. Our model (KISS) consists of two ResNet based feature extractors, a spatial transformer, and a transformer. We train our model only on publicly available, synthetic training data and evaluate it on a range of scene text recognition benchmarks, where we reach state-of-the-art or competitive performance, although our model does not use methods like 2D-attention, or image rectification.
Tasks Scene Text Recognition
Published 2019-11-19
URL https://arxiv.org/abs/1911.08400v1
PDF https://arxiv.org/pdf/1911.08400v1.pdf
PWC https://paperswithcode.com/paper/kiss-keeping-it-simple-for-scene-text
Repo https://github.com/Bartzi/kiss
Framework none
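
A sketch of how such an off-the-shelf pipeline might be composed in PyTorch/torchvision: one ResNet predicts affine parameters for a spatial transformer, a second ResNet encodes the rectified region, and a stock transformer decodes characters. All sizes and hyperparameters are illustrative, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class KissSketch(nn.Module):
    """One ResNet predicts affine parameters for a spatial transformer,
    a second ResNet encodes the rectified region, and a stock transformer
    decodes characters. Sizes are illustrative, not the paper's."""
    def __init__(self, num_chars=100, d_model=256):
        super().__init__()
        self.localizer = torchvision.models.resnet18(num_classes=6)
        self.encoder = torchvision.models.resnet18(num_classes=d_model)
        self.transformer = nn.Transformer(d_model, nhead=8,
                                          num_encoder_layers=2,
                                          num_decoder_layers=2,
                                          batch_first=True)
        self.char_emb = nn.Embedding(num_chars, d_model)
        self.out = nn.Linear(d_model, num_chars)

    def forward(self, image, prev_chars):
        theta = self.localizer(image).view(-1, 2, 3)   # affine params
        grid = F.affine_grid(theta, image.size(), align_corners=False)
        rectified = F.grid_sample(image, grid, align_corners=False)
        memory = self.encoder(rectified).unsqueeze(1)  # (B, 1, d_model)
        tgt = self.char_emb(prev_chars)                # (B, T, d_model)
        return self.out(self.transformer(memory, tgt))

model = KissSketch()
logits = model(torch.randn(2, 3, 64, 256), torch.zeros(2, 10).long())
```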

HATS: A Hierarchical Graph Attention Network for Stock Movement Prediction

Title HATS: A Hierarchical Graph Attention Network for Stock Movement Prediction
Authors Raehyun Kim, Chan Ho So, Minbyul Jeong, Sanghoon Lee, Jinkyu Kim, Jaewoo Kang
Abstract Many researchers in both academia and industry have long been interested in the stock market. Numerous approaches have been developed to accurately predict future trends in stock prices. Recently, there has been a growing interest in utilizing graph-structured data in computer science research communities. Methods that use relational data for stock market prediction have been recently proposed, but they are still in their infancy. First, the quality of collected information from different types of relations can vary considerably. No existing work has focused on the effect of using different types of relations on stock market prediction or on finding an effective way to selectively aggregate information over different relation types. Furthermore, existing works have focused only on individual stock prediction, which is similar to the node classification task. To address this, we propose a hierarchical attention network for stock prediction (HATS) which uses relational data for stock market prediction. Our HATS method selectively aggregates information on different relation types and adds the information to the representations of each company. Specifically, node representations are initialized with features extracted from a feature extraction module. HATS is used as a relational modeling module with initialized node representations. Then, node representations with the added information are fed into a task-specific layer. Our method is used for predicting not only individual stock prices but also market index movements, which is similar to the graph classification task. The experimental results show that performance can change depending on the relational data used. HATS, which can automatically select relevant information, outperformed all existing methods.
Tasks Graph Classification, Node Classification, Stock Market Prediction, Stock Prediction
Published 2019-08-07
URL https://arxiv.org/abs/1908.07999v3
PDF https://arxiv.org/pdf/1908.07999v3.pdf
PWC https://paperswithcode.com/paper/hats-a-hierarchical-graph-attention-network
Repo https://github.com/dmis-lab/hats
Framework tf
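
A minimal PyTorch sketch of relation-aware aggregation in the spirit of HATS: each relation type's neighborhood is summarized separately, an attention mechanism weighs the relation summaries per company, and the result is added to the node representation. The scoring function is a simplification of the paper's.

```python
import torch
import torch.nn as nn

class RelationAttention(nn.Module):
    """Summarize each relation type's neighborhood separately, attend
    over the relation summaries, and add the result to each company's
    own representation (simplified relative to the paper)."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(2 * dim, 1)

    def forward(self, node_feats, relation_adjs):
        # relation_adjs: list of (n, n) adjacencies, one per relation type
        summaries = []
        for adj in relation_adjs:
            deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
            summaries.append((adj @ node_feats) / deg)  # neighbor mean
        rel = torch.stack(summaries, dim=1)             # (n, R, dim)
        query = node_feats.unsqueeze(1).expand_as(rel)
        attn = torch.softmax(
            self.score(torch.cat([query, rel], dim=-1)).squeeze(-1), dim=1)
        return node_feats + (attn.unsqueeze(-1) * rel).sum(dim=1)

n, d = 10, 32
feats = torch.randn(n, d)
adjs = [torch.bernoulli(torch.full((n, n), 0.2)) for _ in range(3)]
updated = RelationAttention(d)(feats, adjs)
```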

PU-GAN: a Point Cloud Upsampling Adversarial Network

Title PU-GAN: a Point Cloud Upsampling Adversarial Network
Authors Ruihui Li, Xianzhi Li, Chi-Wing Fu, Daniel Cohen-Or, Pheng-Ann Heng
Abstract Point clouds acquired from range scans are often sparse, noisy, and non-uniform. This paper presents a new point cloud upsampling network called PU-GAN, which is formulated based on a generative adversarial network (GAN), to learn a rich variety of point distributions from the latent space and upsample points over patches on object surfaces. To realize a working GAN, we construct an up-down-up expansion unit in the generator for upsampling point features with error feedback and self-correction, and formulate a self-attention unit to enhance feature integration. Further, we design a compound loss with adversarial, uniform, and reconstruction terms to encourage the discriminator to learn more latent patterns and enhance the output point distribution uniformity. Qualitative and quantitative evaluations demonstrate the quality of our results over the state of the art in terms of distribution uniformity, proximity-to-surface, and 3D reconstruction quality.
Tasks 3D Reconstruction, Point Cloud Super Resolution
Published 2019-07-25
URL https://arxiv.org/abs/1907.10844v1
PDF https://arxiv.org/pdf/1907.10844v1.pdf
PWC https://paperswithcode.com/paper/pu-gan-a-point-cloud-upsampling-adversarial
Repo https://github.com/liruihui/PU-GAN
Framework tf
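
A hedged sketch of the compound generator loss: a least-squares adversarial term, a uniformity term, and a reconstruction term. The uniformity measure is passed in as a placeholder for the paper's patch-based metric, the reconstruction here is a plain Chamfer distance rather than the paper's choice, and the weights are illustrative.

```python
import torch

def chamfer(a, b):
    """Symmetric Chamfer distance between point sets (B, N, 3), (B, M, 3)."""
    d = torch.cdist(a, b)                          # (B, N, M) pairwise
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()

def compound_loss(d_fake, upsampled, target, uniform_term,
                  w_adv=1.0, w_uni=1.0, w_rec=100.0):
    """Compound generator loss: least-squares adversarial + uniformity +
    reconstruction. `uniform_term` stands in for the paper's patch-based
    uniformity measure; the weights are illustrative."""
    adv = ((d_fake - 1.0) ** 2).mean()             # LSGAN generator term
    rec = chamfer(upsampled, target)
    return w_adv * adv + w_uni * uniform_term + w_rec * rec

scores = torch.rand(4, 1)                          # discriminator outputs
up, gt = torch.rand(4, 1024, 3), torch.rand(4, 4096, 3)
loss = compound_loss(scores, up, gt, uniform_term=torch.tensor(0.1))
```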