February 1, 2020

3188 words 15 mins read

Paper Group AWR 130

Scene Graph Prediction with Limited Labels. Neural Consciousness Flow. Path Ranking with Attention to Type Hierarchies. Contextual Encoder-Decoder Network for Visual Saliency Prediction. DELTA: A DEep learning based Language Technology plAtform. Forecasting Pedestrian Trajectory with Machine-Annotated Training Data. Simple vs complex temporal recur …

Scene Graph Prediction with Limited Labels

Title Scene Graph Prediction with Limited Labels
Authors Vincent S. Chen, Paroma Varma, Ranjay Krishna, Michael Bernstein, Christopher Re, Li Fei-Fei
Abstract Visual knowledge bases such as Visual Genome power numerous applications in computer vision, including visual question answering and captioning, but suffer from sparse, incomplete relationships. All scene graph models to date are limited to training on a small set of visual relationships that have thousands of training labels each. Hiring human annotators is expensive, and textual knowledge base completion methods are incompatible with visual data. In this paper, we introduce a semi-supervised method that assigns probabilistic relationship labels to a large number of unlabeled images using few labeled examples. We analyze visual relationships to suggest two types of image-agnostic features that are used to generate noisy heuristics, whose outputs are aggregated using a factor graph-based generative model. With as few as 10 labeled examples per relationship, the generative model creates enough training data to train any existing state-of-the-art scene graph model. We demonstrate that our method outperforms all baseline approaches on scene graph prediction by 5.16 recall@100 for PREDCLS. In our limited label setting, we define a complexity metric for relationships that serves as an indicator (R^2 = 0.778) for conditions under which our method succeeds over transfer learning, the de-facto approach for training with limited labels.
Tasks Knowledge Base Completion, Question Answering, Transfer Learning, Visual Question Answering
Published 2019-04-25
URL https://arxiv.org/abs/1904.11622v3
PDF https://arxiv.org/pdf/1904.11622v3.pdf
PWC https://paperswithcode.com/paper/scene-graph-prediction-with-limited-labels
Repo https://github.com/vincentschen/limited-label-scene-graphs
Framework none
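
The core move in this entry — aggregating the outputs of noisy, image-agnostic heuristics into probabilistic labels — can be illustrated with a toy sketch. The paper fits a factor-graph-based generative model; the version below is a deliberate simplification that re-estimates each heuristic's accuracy and the soft labels in alternation, EM-style. All names and the voting scheme are illustrative, not the authors' implementation.

```python
import numpy as np

def aggregate_heuristics(votes, n_iter=10):
    """Combine noisy heuristic votes (+1 / -1 / 0 = abstain) into soft labels.

    votes: (n_examples, n_heuristics) array. A crude stand-in for the
    factor-graph generative model used in the paper: heuristic accuracies
    and labels are re-estimated in alternation.
    """
    n, m = votes.shape
    acc = np.full(m, 0.7)                      # initial accuracy guess
    for _ in range(n_iter):
        weights = np.log(acc / (1 - acc))      # accuracy-weighted vote
        score = (votes * weights).sum(axis=1)
        prob = 1 / (1 + np.exp(-score))        # probabilistic label in [0, 1]
        hard = np.where(prob > 0.5, 1, -1)
        for j in range(m):                     # re-estimate heuristic accuracy
            mask = votes[:, j] != 0            # only where it did not abstain
            if mask.any():
                acc[j] = np.clip((votes[mask, j] == hard[mask]).mean(), 0.05, 0.95)
    return prob

votes = np.array([[1, 1, 0], [-1, 1, -1], [-1, -1, 0]])
print(aggregate_heuristics(votes))             # soft labels for 3 examples
```

The resulting probabilistic labels are what then get fed to an off-the-shelf scene graph model as (weighted) training data.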

Neural Consciousness Flow

Title Neural Consciousness Flow
Authors Xiaoran Xu, Wei Feng, Zhiqing Sun, Zhi-Hong Deng
Abstract The ability to reason beyond data fitting is essential if deep learning systems are to make a leap toward artificial general intelligence. Many efforts have been made to model neural-based reasoning as an iterative decision-making process built on recurrent networks and reinforcement learning. Instead, inspired by the consciousness prior proposed by Yoshua Bengio, we explore reasoning with the notion of attentive awareness from a cognitive perspective, and formulate it as attentive message passing on graphs, called neural consciousness flow (NeuCFlow). Aiming to bridge the gap between deep learning systems and reasoning, we propose an attentive computation framework with a three-layer architecture, consisting of an unconsciousness flow layer, a consciousness flow layer, and an attention flow layer. We implement the NeuCFlow model with graph neural networks (GNNs) and conditional transition matrices. Our attentive computation greatly reduces the complexity of vanilla GNN-based methods, making it capable of running on large-scale graphs. We validate our model for knowledge graph reasoning by solving a series of knowledge base completion (KBC) tasks. The experimental results show that NeuCFlow significantly outperforms previous state-of-the-art KBC methods, including both embedding-based and path-based approaches. The reproducible code can be found at the link below.
Tasks Decision Making, Knowledge Base Completion
Published 2019-05-30
URL https://arxiv.org/abs/1905.13049v1
PDF https://arxiv.org/pdf/1905.13049v1.pdf
PWC https://paperswithcode.com/paper/neural-consciousness-flow
Repo https://github.com/netpaladinx/NeuCFlow
Framework tf
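
As a rough illustration of attentive message passing on a graph (not the NeuCFlow architecture itself, which stacks unconsciousness, consciousness, and attention flow layers), here is one attention-weighted propagation step. Pruning to the highest-attention edges is the kind of mechanism the abstract credits for the reduced complexity versus vanilla GNNs; the details below are assumptions.

```python
import numpy as np

def attentive_message_passing(h, edges, top_k=2):
    """One sparse, attention-weighted message-passing step.

    h: (n_nodes, d) node states; edges: list of (src, dst) pairs.
    Only the top_k highest-attention incoming edges per node are used,
    mimicking how attention can prune computation on large graphs.
    """
    n, d = h.shape
    new_h = h.copy()
    for v in range(n):
        incoming = [s for s, t in edges if t == v]
        if not incoming:
            continue
        scores = np.array([h[s] @ h[v] for s in incoming])   # dot-product attention
        keep = np.argsort(scores)[-top_k:]                   # prune low-attention edges
        alpha = np.exp(scores[keep] - scores[keep].max())
        alpha /= alpha.sum()                                 # softmax over kept edges
        msg = sum(a * h[incoming[i]] for a, i in zip(alpha, keep))
        new_h[v] = np.tanh(h[v] + msg)
    return new_h

h = np.random.randn(5, 8)
edges = [(0, 1), (2, 1), (3, 1), (1, 4)]
print(attentive_message_passing(h, edges).shape)  # (5, 8)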

Path Ranking with Attention to Type Hierarchies

Title Path Ranking with Attention to Type Hierarchies
Authors Weiyu Liu, Angel Daruna, Zsolt Kira, Sonia Chernova
Abstract The objective of the knowledge base completion problem is to infer missing information from existing facts in a knowledge base. Prior work has demonstrated the effectiveness of path-ranking based methods, which solve the problem by discovering observable patterns in knowledge graphs, consisting of nodes representing entities and edges representing relations. However, these patterns either lack accuracy because they rely solely on relations or cannot easily generalize due to the direct use of specific entity information. We introduce Attentive Path Ranking, a novel path pattern representation that leverages type hierarchies of entities to both avoid ambiguity and maintain generalization. Then, we present an end-to-end trained attention-based RNN model to discover the new path patterns from data. Experiments conducted on benchmark knowledge base completion datasets WN18RR and FB15k-237 demonstrate that the proposed model outperforms existing methods on the fact prediction task by statistically significant margins of 26% and 10%, respectively. Furthermore, quantitative and qualitative analyses show that the path patterns balance between generalization and discrimination.
Tasks Knowledge Base Completion, Knowledge Graphs
Published 2019-05-26
URL https://arxiv.org/abs/1905.10799v3
PDF https://arxiv.org/pdf/1905.10799v3.pdf
PWC https://paperswithcode.com/paper/path-ranking-with-attention-to-type
Repo https://github.com/wliu88/AttentivePathRanking
Framework pytorch
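
A minimal sketch of scoring a knowledge-graph path with an RNN, in the spirit of the model above. The real Attentive Path Ranking additionally attends over each entity's type hierarchy to build the path pattern, which is omitted here; all dimensions and names are illustrative.

```python
import torch
import torch.nn as nn

class PathScorer(nn.Module):
    """Embed a sequence of relation ids and score the path for one target relation."""
    def __init__(self, n_relations, dim=32):
        super().__init__()
        self.embed = nn.Embedding(n_relations, dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.score = nn.Linear(dim, 1)

    def forward(self, path_ids):                  # (batch, path_len)
        x = self.embed(path_ids)
        _, (h, _) = self.rnn(x)                   # final hidden state summarizes the path
        return torch.sigmoid(self.score(h[-1]))   # probability the path implies the target

scorer = PathScorer(n_relations=100)
paths = torch.randint(0, 100, (4, 3))             # 4 paths of length 3
print(scorer(paths).shape)                        # torch.Size([4, 1])
```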

Contextual Encoder-Decoder Network for Visual Saliency Prediction

Title Contextual Encoder-Decoder Network for Visual Saliency Prediction
Authors Alexander Kroner, Mario Senden, Kurt Driessens, Rainer Goebel
Abstract Predicting salient regions in natural images requires the detection of objects that are present in a scene. To develop robust representations for this challenging task, high-level visual features at multiple spatial scales must be extracted and augmented with contextual information. However, existing models aimed at explaining human fixation maps do not incorporate such a mechanism explicitly. Here we propose an approach based on a convolutional neural network pre-trained on a large-scale image classification task. The architecture forms an encoder-decoder structure and includes a module with multiple convolutional layers at different dilation rates to capture multi-scale features in parallel. Moreover, we combine the resulting representations with global scene information for accurately predicting visual saliency. Our model achieves competitive results on two public saliency benchmarks and we demonstrate the effectiveness of the suggested approach on selected examples. The network is based on a lightweight image classification backbone and hence presents a suitable choice for applications with limited computational resources to estimate human fixations across complex natural scenes.
Tasks Image Classification, Saliency Prediction
Published 2019-02-18
URL https://arxiv.org/abs/1902.06634v2
PDF https://arxiv.org/pdf/1902.06634v2.pdf
PWC https://paperswithcode.com/paper/contextual-encoder-decoder-network-for-visual
Repo https://github.com/alexanderkroner/saliency
Framework tf
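
The multi-scale module described above — parallel convolutions at different dilation rates — resembles an ASPP-style block. A minimal PyTorch sketch follows; the actual rates, channel counts, and how global scene information is fused are choices of the paper and not reproduced here.

```python
import torch
import torch.nn as nn

class MultiScaleContext(nn.Module):
    """Parallel 3x3 convolutions at several dilation rates, concatenated and fused."""
    def __init__(self, in_ch, out_ch, rates=(1, 4, 8, 12)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates
        )
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        # Spatial size is preserved in every branch because padding == dilation
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

feats = torch.randn(1, 256, 20, 20)               # encoder features
print(MultiScaleContext(256, 64)(feats).shape)    # torch.Size([1, 64, 20, 20])
```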

DELTA: A DEep learning based Language Technology plAtform

Title DELTA: A DEep learning based Language Technology plAtform
Authors Kun Han, Junwen Chen, Hui Zhang, Haiyang Xu, Yiping Peng, Yun Wang, Ning Ding, Hui Deng, Yonghu Gao, Tingwei Guo, Yi Zhang, Yahao He, Baochang Ma, Yulong Zhou, Kangli Zhang, Chao Liu, Ying Lyu, Chenxi Wang, Cheng Gong, Yunbo Wang, Wei Zou, Hui Song, Xiangang Li
Abstract In this paper, we present DELTA, a deep learning based language technology platform. DELTA is an end-to-end platform designed to solve industry-level natural language and speech processing problems. It integrates the most popular neural network models for training as well as comprehensive deployment tools for production. DELTA aims to provide easy and fast experiences for using, deploying, and developing natural language processing and speech models for both academic and industrial use cases. We demonstrate DELTA's reliable performance on several natural language processing and speech tasks, including text classification, named entity recognition, natural language inference, speech recognition, and speaker verification. DELTA has been used to develop several state-of-the-art algorithms for publications and to deliver production systems that serve millions of users.
Tasks Named Entity Recognition, Natural Language Inference, Speaker Verification, Speech Recognition, Text Classification
Published 2019-08-02
URL https://arxiv.org/abs/1908.01853v1
PDF https://arxiv.org/pdf/1908.01853v1.pdf
PWC https://paperswithcode.com/paper/delta-a-deep-learning-based-language
Repo https://github.com/didi/delta
Framework tf

Forecasting Pedestrian Trajectory with Machine-Annotated Training Data

Title Forecasting Pedestrian Trajectory with Machine-Annotated Training Data
Authors Olly Styles, Arun Ross, Victor Sanchez
Abstract Reliable anticipation of pedestrian trajectory is imperative for the operation of autonomous vehicles and can significantly enhance the functionality of advanced driver assistance systems. While significant progress has been made in the field of pedestrian detection, forecasting pedestrian trajectories remains a challenging problem due to the unpredictable nature of pedestrians and the huge space of potentially useful features. In this work, we present a deep learning approach for pedestrian trajectory forecasting using a single vehicle-mounted camera. Deep learning models that have revolutionized other areas in computer vision have seen limited application to trajectory forecasting, in part due to the lack of richly annotated training data. We address the lack of training data by introducing a scalable machine annotation scheme that enables our model to be trained using a large dataset without human annotation. In addition, we propose Dynamic Trajectory Predictor (DTP), a model for forecasting pedestrian trajectory up to one second into the future. DTP is trained using both human and machine-annotated data, and anticipates dynamic motion that is not captured by linear models. Experimental evaluation confirms the benefits of the proposed model.
Tasks Autonomous Vehicles, Pedestrian Detection
Published 2019-05-09
URL https://arxiv.org/abs/1905.03681v1
PDF https://arxiv.org/pdf/1905.03681v1.pdf
PWC https://paperswithcode.com/paper/190503681
Repo https://github.com/olly-styles/Dynamic-Trajectory-Predictor
Framework none
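
The linear models this abstract contrasts against can be made concrete: a constant-velocity baseline extrapolates a pedestrian's observed motion, and a learned model such as DTP is judged by how much dynamic motion it captures beyond that. A sketch of the baseline (the units, frame rate, and one-second horizon are illustrative):

```python
import numpy as np

def constant_velocity_forecast(past_centers, n_future):
    """Extrapolate bounding-box centers with the mean observed velocity.

    past_centers: (t, 2) array of observed (x, y) positions.
    Returns (n_future, 2) predicted positions. A learned forecaster is
    evaluated by how far it improves on this linear baseline.
    """
    velocity = np.diff(past_centers, axis=0).mean(axis=0)
    steps = np.arange(1, n_future + 1)[:, None]
    return past_centers[-1] + steps * velocity

past = np.array([[100, 200], [104, 198], [108, 196]])    # pixel coordinates
print(constant_velocity_forecast(past, n_future=30))      # one second ahead at ~30 fps
```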

Simple vs complex temporal recurrences for video saliency prediction

Title Simple vs complex temporal recurrences for video saliency prediction
Authors Panagiotis Linardos, Eva Mohedano, Juan Jose Nieto, Noel E. O’Connor, Xavier Giro-i-Nieto, Kevin McGuinness
Abstract This paper investigates modifying an existing neural network architecture for static saliency prediction using two types of recurrences that integrate information from the temporal domain. The first modification is the addition of a ConvLSTM within the architecture, while the second is a conceptually simple exponential moving average of an internal convolutional state. We use weights pre-trained on the SALICON dataset and fine-tune our model on DHF1K. Our results show that both modifications achieve state-of-the-art results and produce similar saliency maps. Source code is available at https://git.io/fjPiB.
Tasks Saliency Prediction
Published 2019-07-03
URL https://arxiv.org/abs/1907.01869v4
PDF https://arxiv.org/pdf/1907.01869v4.pdf
PWC https://paperswithcode.com/paper/simple-vs-complex-temporal-recurrences-for
Repo https://github.com/Linardos/SalEMA
Framework pytorch
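
The second, "conceptually simple" recurrence above is just an exponential moving average over an internal convolutional state: s_t = α x_t + (1 − α) s_{t−1}. A sketch follows; the value of α and where the state sits in the network are the paper's choices, and placeholders here.

```python
import torch

def ema_recurrence(frame_feats, alpha=0.1):
    """Exponential moving average over per-frame convolutional features.

    frame_feats: (t, c, h, w) features from a static saliency network.
    The state s_t = alpha * x_t + (1 - alpha) * s_{t-1} carries temporal
    context with no extra parameters, unlike a ConvLSTM.
    """
    state = frame_feats[0]
    outputs = [state]
    for x in frame_feats[1:]:
        state = alpha * x + (1 - alpha) * state
        outputs.append(state)
    return torch.stack(outputs)

feats = torch.randn(8, 64, 32, 32)     # 8 video frames
print(ema_recurrence(feats).shape)     # torch.Size([8, 64, 32, 32])
```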

Weakly Supervised Learning of Instance Segmentation with Inter-pixel Relations

Title Weakly Supervised Learning of Instance Segmentation with Inter-pixel Relations
Authors Jiwoon Ahn, Sunghyun Cho, Suha Kwak
Abstract This paper presents a novel approach for learning instance segmentation with image-level class labels as supervision. Our approach generates pseudo instance segmentation labels of training images, which are used to train a fully supervised model. For generating the pseudo labels, we first identify confident seed areas of object classes from attention maps of an image classification model, and propagate them to discover the entire instance areas with accurate boundaries. To this end, we propose IRNet, which estimates rough areas of individual instances and detects boundaries between different object classes. It thus enables assigning instance labels to the seeds and propagating them within the boundaries so that the entire areas of instances can be estimated accurately. Furthermore, IRNet is trained with inter-pixel relations on the attention maps, so no extra supervision is required. Our method with IRNet achieves outstanding performance on the PASCAL VOC 2012 dataset, surpassing not only previous state-of-the-art methods trained with the same level of supervision, but also some previous models relying on stronger supervision.
Tasks Image Classification, Instance Segmentation, Semantic Segmentation
Published 2019-04-10
URL https://arxiv.org/abs/1904.05044v3
PDF https://arxiv.org/pdf/1904.05044v3.pdf
PWC https://paperswithcode.com/paper/weakly-supervised-learning-of-instance
Repo https://github.com/jiwoon-ahn/irn
Framework pytorch
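
The first step described above — identifying confident seeds from class attention maps and leaving the rest unlabeled for later propagation — can be sketched as a simple thresholding rule. The thresholds and ignore-label convention below are illustrative; the propagation within IRNet's predicted boundaries is the paper's contribution and is not reproduced here.

```python
import numpy as np

def confident_seeds(cams, fg_thresh=0.6, bg_thresh=0.1, ignore=255):
    """Turn class attention maps (CAMs) into sparse seed labels.

    cams: (n_classes, h, w) attention maps in [0, 1].
    Pixels with a strong class response become seeds for that class,
    very weak pixels become background (0), and everything else is
    ignored, to be resolved later by propagation.
    """
    best = cams.max(axis=0)
    label = cams.argmax(axis=0) + 1            # classes are 1..n; 0 is background
    seeds = np.full(best.shape, ignore, dtype=np.int64)
    seeds[best > fg_thresh] = label[best > fg_thresh]
    seeds[best < bg_thresh] = 0
    return seeds

cams = np.random.rand(3, 4, 4)
print(confident_seeds(cams))
```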

Compensating Supervision Incompleteness with Prior Knowledge in Semantic Image Interpretation

Title Compensating Supervision Incompleteness with Prior Knowledge in Semantic Image Interpretation
Authors Ivan Donadello, Luciano Serafini
Abstract Semantic Image Interpretation is the task of extracting a structured semantic description from images. This requires the detection of visual relationships: triples (subject, relation, object) describing a semantic relation between a subject and an object. A purely supervised approach to visual relationship detection requires a complete and balanced training set for all possible combinations of (subject, relation, object). However, such training sets are not available and would require prohibitive human effort. This implies the need to predict triples that do not appear in the training set, a problem called zero-shot learning. State-of-the-art approaches to zero-shot learning exploit similarities among relationships in the training set or external linguistic knowledge. In this paper, we perform zero-shot learning by using Logic Tensor Networks, a novel Statistical Relational Learning framework that exploits both the similarities with other seen relationships and background knowledge, expressed with logical constraints between subjects, relations and objects. The experiments on the Visual Relationship Dataset show that the use of logical constraints outperforms the current methods. This implies that background knowledge can be used to alleviate the incompleteness of training sets.
Tasks Relational Reasoning, Tensor Networks, Zero-Shot Learning
Published 2019-10-01
URL https://arxiv.org/abs/1910.00462v1
PDF https://arxiv.org/pdf/1910.00462v1.pdf
PWC https://paperswithcode.com/paper/compensating-supervision-incompleteness-with
Repo https://github.com/logictensornetworks/logictensornetworks
Framework tf
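
Logic Tensor Networks ground logical constraints in differentiable fuzzy logic, so background knowledge like "if the relation is ride, the object is rideable" becomes a loss term. A toy rendering with the product t-norm and the Reichenbach implication; the predicate truth values would normally come from learned groundings (here they are scalars), and the exact connective semantics in the paper may differ.

```python
def t_and(a, b):       # product t-norm for conjunction
    return a * b

def t_implies(a, b):   # Reichenbach fuzzy implication: 1 - a + a*b
    return 1 - a + a * b

# Truth degrees produced by (hypothetical) learned predicates for one triple
person_subj, ride_rel, rideable_obj = 0.9, 0.8, 0.3

# Constraint: Person(s) AND ride(s, o) -> Rideable(o)
constraint = t_implies(t_and(person_subj, ride_rel), rideable_obj)
loss = 1 - constraint   # penalize violated background knowledge
print(f"constraint satisfaction {constraint:.3f}, loss {loss:.3f}")
```

Minimizing such losses alongside the supervised objective is what lets the logical constraints compensate for missing (subject, relation, object) combinations in training.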

OPIEC: An Open Information Extraction Corpus

Title OPIEC: An Open Information Extraction Corpus
Authors Kiril Gashteovski, Sebastian Wanner, Sven Hertling, Samuel Broscheit, Rainer Gemulla
Abstract Open information extraction (OIE) systems extract relations and their arguments from natural language text in an unsupervised manner. The resulting extractions are a valuable resource for downstream tasks such as knowledge base construction, open question answering, or event schema induction. In this paper, we release, describe, and analyze an OIE corpus called OPIEC, which was extracted from the text of English Wikipedia. OPIEC complements the available OIE resources: It is the largest OIE corpus publicly available to date (over 340M triples) and contains valuable metadata such as provenance information, confidence scores, linguistic annotations, and semantic annotations including spatial and temporal information. We analyze the OPIEC corpus by comparing its content with knowledge bases such as DBpedia or YAGO, which are also based on Wikipedia. We found that most of the facts between entities present in OPIEC cannot be found in DBpedia and/or YAGO, that OIE facts often differ in the level of specificity compared to knowledge base facts, and that OIE open relations are generally highly polysemous. We believe that the OPIEC corpus is a valuable resource for future research on automated knowledge base construction.
Tasks Open Information Extraction, Question Answering
Published 2019-04-28
URL http://arxiv.org/abs/1904.12324v1
PDF http://arxiv.org/pdf/1904.12324v1.pdf
PWC https://paperswithcode.com/paper/opiec-an-open-information-extraction-corpus
Repo https://github.com/uma-pi1/OPIEC
Framework none
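
Working with a corpus like OPIEC mostly means streaming triples and filtering on their metadata. A sketch of that access pattern; the field names below are illustrative stand-ins (the released corpus defines its own schema, with richer provenance, linguistic, and semantic annotations).

```python
from dataclasses import dataclass

@dataclass
class OIETriple:
    subject: str
    relation: str
    obj: str
    confidence: float        # extractor confidence score
    sentence: str            # provenance: the source sentence

triples = [
    OIETriple("Marie Curie", "was awarded", "the Nobel Prize", 0.93,
              "Marie Curie was awarded the Nobel Prize in 1903."),
    OIETriple("the prize", "is named after", "Alfred Nobel", 0.41,
              "Some say the prize is named after Alfred Nobel."),
]

# Keep only high-confidence facts, e.g. for knowledge base construction
confident = [t for t in triples if t.confidence >= 0.7]
print([(t.subject, t.relation, t.obj) for t in confident])
```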

Tensor-based algorithms for image classification

Title Tensor-based algorithms for image classification
Authors Stefan Klus, Patrick Gelß
Abstract The interest in machine learning with tensor networks has been growing rapidly in recent years. We show that tensor-based methods developed for learning the governing equations of dynamical systems from data can, in the same way, be used for supervised learning problems, and we propose two novel approaches for image classification. One is a kernel-based reformulation of the previously introduced MANDy (multidimensional approximation of nonlinear dynamics); the other is an alternating ridge regression in the tensor-train format. We apply both methods to the MNIST and Fashion-MNIST datasets and show that the approaches are competitive with state-of-the-art neural network-based classifiers.
Tasks Image Classification, Tensor Networks
Published 2019-10-04
URL https://arxiv.org/abs/1910.02150v2
PDF https://arxiv.org/pdf/1910.02150v2.pdf
PWC https://paperswithcode.com/paper/tensor-based-algorithms-for-image
Repo https://github.com/PGelss/scikit_tt
Framework none
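
Tensor-network classifiers of this kind build on a simple recipe: map each pixel through a local feature map, then fit a linear model on the lifted features. The sketch below uses the common cosine/sine pixel embedding but replaces the tensor-train machinery with plain ridge regression on concatenated (rather than tensor-product) features, to show only the feature-map idea; it is not the paper's algorithm.

```python
import numpy as np

def local_feature_map(images):
    """Per-pixel embedding phi(x) = [cos(pi*x/2), sin(pi*x/2)], x in [0, 1].

    Tensor-network classifiers take the tensor product of these local
    features; here we simply concatenate them and fit ridge regression.
    """
    x = images.reshape(len(images), -1)
    return np.hstack([np.cos(np.pi * x / 2), np.sin(np.pi * x / 2)])

rng = np.random.default_rng(0)
X = rng.random((100, 8, 8))                  # toy "images" in [0, 1]
y = (X.mean(axis=(1, 2)) > 0.5).astype(float)

phi = local_feature_map(X)
lam = 1e-2                                    # ridge regularizer
w = np.linalg.solve(phi.T @ phi + lam * np.eye(phi.shape[1]), phi.T @ y)
pred = (phi @ w > 0.5).astype(float)
print("train accuracy:", (pred == y).mean())
```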

Learning Shape Representation on Sparse Point Clouds for Volumetric Image Segmentation

Title Learning Shape Representation on Sparse Point Clouds for Volumetric Image Segmentation
Authors Fabian Balsiger, Yannick Soom, Olivier Scheidegger, Mauricio Reyes
Abstract Volumetric image segmentation with convolutional neural networks (CNNs) encounters several challenges, which are specific to medical images. Among these challenges are large volumes of interest, high class imbalances, and difficulties in learning shape representations. To tackle these challenges, we propose to improve over traditional CNN-based volumetric image segmentation through point-wise classification of point clouds. The sparsity of point clouds allows processing of entire image volumes, balancing highly imbalanced segmentation problems, and explicitly learning an anatomical shape. We build upon PointCNN, a neural network proposed to process point clouds, and propose here to jointly encode shape and volumetric information within the point cloud in a compact and computationally effective manner. We demonstrate how this approach can then be used to refine CNN-based segmentation, which yields significantly improved results in our experiments on the difficult task of peripheral nerve segmentation from magnetic resonance neurography images. By synthetic experiments, we further show the capability of our approach in learning an explicit anatomical shape representation.
Tasks Semantic Segmentation
Published 2019-06-05
URL https://arxiv.org/abs/1906.02281v1
PDF https://arxiv.org/pdf/1906.02281v1.pdf
PWC https://paperswithcode.com/paper/learning-shape-representation-on-sparse-point
Repo https://github.com/fabianbalsiger/point-cloud-segmentation-miccai2019
Framework tf
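
The conversion this abstract relies on — representing a volumetric segmentation problem as a sparse point cloud that jointly carries spatial position (shape) and image intensity (volumetric information) — might look like the sketch below. The candidate-selection threshold and feature choices are illustrative, not the paper's exact pipeline.

```python
import numpy as np

def volume_to_point_cloud(volume, candidate_mask):
    """Turn candidate voxels of a 3D image into points with features.

    volume: (d, h, w) image intensities; candidate_mask: boolean array of
    voxels worth classifying (e.g. a coarse CNN's foreground proposal).
    Each point gets normalized xyz coordinates plus intensity, matching
    the joint shape/volumetric encoding described above.
    """
    coords = np.argwhere(candidate_mask).astype(float)
    coords /= np.array(volume.shape)                  # normalize to [0, 1]
    intensity = volume[candidate_mask][:, None]
    return np.hstack([coords, intensity])             # (n_points, 4)

vol = np.random.rand(16, 64, 64)
mask = vol > 0.95                                     # sparse candidate voxels
print(volume_to_point_cloud(vol, mask).shape)
```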

COMIC: Towards A Compact Image Captioning Model with Attention

Title COMIC: Towards A Compact Image Captioning Model with Attention
Authors Jia Huei Tan, Chee Seng Chan, Joon Huang Chuah
Abstract Recent works in image captioning have shown very promising raw performance. However, most of these encoder-decoder style networks with attention do not scale naturally to large vocabulary sizes, making them difficult to deploy on embedded systems with limited hardware resources. This is because the sizes of the word and output embedding matrices grow proportionally with the size of the vocabulary, adversely affecting the compactness of these networks. To address this limitation, this paper introduces a new idea in the domain of image captioning: we tackle the problem of compactness of image captioning models, which is hitherto unexplored. We show that our proposed model, named COMIC (COMpact Image Captioning), achieves comparable results on five common evaluation metrics with state-of-the-art approaches on both the MS-COCO and InstaPIC-1.1M datasets, despite having an embedding vocabulary size that is 39x - 99x smaller. The source code and models are available at: https://github.com/jiahuei/COMIC-Compact-Image-Captioning-with-Attention
Tasks Image Captioning
Published 2019-03-04
URL https://arxiv.org/abs/1903.01072v3
PDF https://arxiv.org/pdf/1903.01072v3.pdf
PWC https://paperswithcode.com/paper/comic-towards-a-compact-image-captioning
Repo https://github.com/jiahuei/COMIC-Compact-Image-Captioning-with-Attention
Framework tf
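
One standard way to keep word and output embeddings from growing linearly with vocabulary size, in the spirit of the compactness goal above, is to factor the embedding through a small bottleneck. This is a generic illustration of the parameter-count issue, not COMIC's actual mechanism.

```python
import torch.nn as nn

vocab, hidden, bottleneck = 50_000, 512, 64

full = nn.Embedding(vocab, hidden)                       # vocab x hidden parameters
factored = nn.Sequential(
    nn.Embedding(vocab, bottleneck),                     # vocab x bottleneck
    nn.Linear(bottleneck, hidden, bias=False),           # + bottleneck x hidden
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(f"full: {count(full):,}  factored: {count(factored):,}  "
      f"ratio: {count(full) / count(factored):.1f}x")
```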

Plug and Play Language Models: A Simple Approach to Controlled Text Generation

Title Plug and Play Language Models: A Simple Approach to Controlled Text Generation
Authors Sumanth Dathathri, Andrea Madotto, Janice Lan, Jane Hung, Eric Frank, Piero Molino, Jason Yosinski, Rosanne Liu
Abstract Large transformer-based language models (LMs) trained on huge text corpora have shown unparalleled generation capabilities. However, controlling attributes of the generated language (e.g. switching topic or sentiment) is difficult without modifying the model architecture or fine-tuning on attribute-specific data and entailing the significant cost of retraining. We propose a simple alternative: the Plug and Play Language Model (PPLM) for controllable language generation, which combines a pretrained LM with one or more simple attribute classifiers that guide text generation without any further training of the LM. In the canonical scenario we present, the attribute models are simple classifiers consisting of a user-specified bag of words or a single learned layer with 100,000 times fewer parameters than the LM. Sampling entails a forward and backward pass in which gradients from the attribute model push the LM’s hidden activations and thus guide the generation. Model samples demonstrate control over a range of topics and sentiment styles, and extensive automated and human annotated evaluations show attribute alignment and fluency. PPLMs are flexible in that any combination of differentiable attribute models may be used to steer text generation, which will allow for diverse and creative applications beyond the examples given in this paper.
Tasks Language Modelling, Text Generation
Published 2019-12-04
URL https://arxiv.org/abs/1912.02164v4
PDF https://arxiv.org/pdf/1912.02164v4.pdf
PWC https://paperswithcode.com/paper/plug-and-play-language-models-a-simple
Repo https://github.com/uber-research/PPLM
Framework pytorch
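
The sampling loop described above can be caricatured in a few lines: take the LM's hidden activation, nudge it along the gradient of an attribute classifier's log-probability, then let the LM decode from the shifted state. Below is a toy version with a random stand-in "LM" hidden state and a linear attribute classifier; real PPLM perturbs the cached key/value states of GPT-2, with extra terms to preserve fluency.

```python
import torch

torch.manual_seed(0)
hidden = torch.randn(1, 64, requires_grad=True)   # stand-in LM activation
attr_clf = torch.nn.Linear(64, 2)                 # stand-in attribute model

step_size, target_attr = 0.5, 1
for _ in range(3):                                # a few small nudges
    log_probs = torch.log_softmax(attr_clf(hidden), dim=-1)
    loss = -log_probs[0, target_attr]             # want high p(attribute | hidden)
    grad, = torch.autograd.grad(loss, hidden)
    hidden = (hidden - step_size * grad).detach().requires_grad_(True)

print("p(attribute):", log_probs.exp()[0, target_attr].item())
```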

CODA: Counting Objects via Scale-aware Adversarial Density Adaption

Title CODA: Counting Objects via Scale-aware Adversarial Density Adaption
Authors Li Wang, Yongbo Li, Xiangyang Xue
Abstract Recent advances in crowd counting have achieved promising results with increasingly complex convolutional neural network designs. However, due to unpredictable domain shift, generalizing a trained model to unseen scenarios is often suboptimal. Inspired by the observation that density maps of different scenarios share similar local structures, we propose a novel adversarial learning approach in this paper, i.e., CODA (\emph{Counting Objects via scale-aware adversarial Density Adaption}). To deal with different object scales and density distributions, we perform adversarial training with multi-scale pyramid patches from both the source and target domains. Along with a ranking constraint across levels of the pyramid input, consistent object counts can be produced for different scales. Extensive experiments demonstrate that our network produces much better results on unseen datasets compared with existing counting adaption models. Notably, the performance of our CODA is comparable with state-of-the-art fully-supervised models trained on the target dataset. Further analysis indicates that our density adaption framework can effortlessly extend to scenarios with different objects. \emph{The code is available at https://github.com/Willy0919/CODA.}
Tasks Crowd Counting
Published 2019-03-25
URL http://arxiv.org/abs/1903.10442v1
PDF http://arxiv.org/pdf/1903.10442v1.pdf
PWC https://paperswithcode.com/paper/coda-counting-objects-via-scale-aware
Repo https://github.com/Willy0919/CODA
Framework pytorch
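
The ranking constraint mentioned above exploits containment: a larger patch must contain at least as many objects as any sub-patch cropped from it, so predicted counts across pyramid levels can be ordered. A sketch of such a hinge loss (the margin and the pyramid construction are illustrative assumptions):

```python
import torch

def pyramid_ranking_loss(count_parent, count_child, margin=0.0):
    """Penalize predicted counts where a sub-patch exceeds its parent patch.

    count_parent / count_child: (batch,) predicted counts for a patch and
    for a crop contained in it. Hinge on (child - parent), since the true
    count of a contained region can never be larger.
    """
    return torch.clamp(count_child - count_parent + margin, min=0).mean()

parent = torch.tensor([12.0, 30.0])          # counts predicted on full patches
child = torch.tensor([14.0, 8.0])            # counts predicted on crops of them
print(pyramid_ranking_loss(parent, child))   # penalizes only the first pair
```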