February 1, 2020

3373 words 16 mins read

Paper Group AWR 221


Complexity-Weighted Loss and Diverse Reranking for Sentence Simplification. ODE$^2$VAE: Deep generative second order ODEs with Bayesian neural networks. Missing Data Imputation with Adversarially-trained Graph Convolutional Networks. Spontaneous Facial Micro-Expression Recognition using 3D Spatiotemporal Convolutional Neural Networks. Structured Va …

Complexity-Weighted Loss and Diverse Reranking for Sentence Simplification

Title Complexity-Weighted Loss and Diverse Reranking for Sentence Simplification
Authors Reno Kriz, João Sedoc, Marianna Apidianaki, Carolina Zheng, Gaurav Kumar, Eleni Miltsakaki, Chris Callison-Burch
Abstract Sentence simplification is the task of rewriting texts so they are easier to understand. Recent research has applied sequence-to-sequence (Seq2Seq) models to this task, focusing largely on training-time improvements via reinforcement learning and memory augmentation. One of the main problems with applying generic Seq2Seq models for simplification is that these models tend to copy directly from the original sentence, resulting in outputs that are relatively long and complex. We aim to alleviate this issue through the use of two main techniques. First, we incorporate content word complexities, as predicted with a leveled word complexity model, into our loss function during training. Second, we generate a large set of diverse candidate simplifications at test time, and rerank these to promote fluency, adequacy, and simplicity. Here, we measure simplicity through a novel sentence complexity model. These extensions allow our models to perform competitively with state-of-the-art systems while generating simpler sentences. We report standard automatic and human evaluation metrics.
Tasks
Published 2019-04-04
URL http://arxiv.org/abs/1904.02767v1
PDF http://arxiv.org/pdf/1904.02767v1.pdf
PWC https://paperswithcode.com/paper/complexity-weighted-loss-and-diverse
Repo https://github.com/rekriz11/sockeye-recipes
Framework none
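
Below is a minimal, hypothetical PyTorch sketch of the complexity-weighted loss idea from the abstract above. The `complexity_scores` tensor stands in for the output of the paper's leveled word complexity model, and the simple linear weighting scheme is an assumption, not the authors' exact formulation.

```python
# Hypothetical sketch of a complexity-weighted token loss (not the authors' code).
import torch
import torch.nn.functional as F

def complexity_weighted_loss(logits, targets, complexity_scores, alpha=1.0):
    """Cross-entropy where more complex target words incur a larger penalty.

    logits: (batch, seq, vocab); targets: (batch, seq) int64;
    complexity_scores: (batch, seq) in [0, 1], higher means more complex (assumed
    to come from a separate word-complexity predictor).
    """
    ce = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1), reduction="none"
    ).reshape(targets.shape)
    weights = 1.0 + alpha * complexity_scores  # up-weight complex content words
    return (weights * ce).mean()
```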

ODE$^2$VAE: Deep generative second order ODEs with Bayesian neural networks

Title ODE$^2$VAE: Deep generative second order ODEs with Bayesian neural networks
Authors Çağatay Yıldız, Markus Heinonen, Harri Lähdesmäki
Abstract We present Ordinary Differential Equation Variational Auto-Encoder (ODE$^2$VAE), a latent second order ODE model for high-dimensional sequential data. Leveraging the advances in deep generative models, ODE$^2$VAE can simultaneously learn the embedding of high dimensional trajectories and infer arbitrarily complex continuous-time latent dynamics. Our model explicitly decomposes the latent space into momentum and position components and solves a second order ODE system, which is in contrast to recurrent neural network (RNN) based time series models and recently proposed black-box ODE techniques. In order to account for uncertainty, we propose probabilistic latent ODE dynamics parameterized by deep Bayesian neural networks. We demonstrate our approach on motion capture, image rotation and bouncing balls datasets. We achieve state-of-the-art performance in long term motion prediction and imputation tasks.
Tasks Imputation, Motion Capture, motion prediction, Time Series
Published 2019-05-27
URL https://arxiv.org/abs/1905.10994v2
PDF https://arxiv.org/pdf/1905.10994v2.pdf
PWC https://paperswithcode.com/paper/ode2vae-deep-generative-second-order-odes
Repo https://github.com/cagatayyildiz/ODE2VAE
Framework tf
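
The following is a hedged sketch of the core second-order latent dynamics described above: the latent state is split into position and momentum, and a small neural network (a Bayesian neural network in the actual ODE$^2$VAE) provides the acceleration field. A plain Euler integrator is used purely for illustration.

```python
# Minimal sketch (assumptions, not the authors' implementation): a latent
# second-order ODE written as a coupled first-order system over position s
# and momentum v, with the acceleration given by a neural network.
import torch
import torch.nn as nn

class LatentSecondOrderODE(nn.Module):
    def __init__(self, latent_dim, hidden=64):
        super().__init__()
        # f(s, v) -> acceleration; in ODE^2VAE this is a deep Bayesian neural net.
        self.accel = nn.Sequential(
            nn.Linear(2 * latent_dim, hidden), nn.Tanh(), nn.Linear(hidden, latent_dim)
        )

    def forward(self, s0, v0, steps=20, dt=0.1):
        s, v, traj = s0, v0, [s0]
        for _ in range(steps):  # Euler integration of ds/dt = v, dv/dt = f(s, v)
            a = self.accel(torch.cat([s, v], dim=-1))
            s = s + dt * v
            v = v + dt * a
            traj.append(s)
        return torch.stack(traj, dim=1)  # (batch, steps + 1, latent_dim)
```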

Missing Data Imputation with Adversarially-trained Graph Convolutional Networks

Title Missing Data Imputation with Adversarially-trained Graph Convolutional Networks
Authors Indro Spinelli, Simone Scardapane, Aurelio Uncini
Abstract Missing data imputation (MDI) is a fundamental problem in many scientific disciplines. Popular methods for MDI use global statistics computed from the entire data set (e.g., the feature-wise medians), or build predictive models operating independently on every instance. In this paper we propose a more general framework for MDI, leveraging recent work in the field of graph neural networks (GNNs). We formulate the MDI task in terms of a graph denoising autoencoder, where each edge of the graph encodes the similarity between two patterns. A GNN encoder learns to build intermediate representations for each example by interleaving classical projection layers and locally combining information between neighbors, while another decoding GNN learns to reconstruct the full imputed data set from this intermediate embedding. To speed up training and improve the performance, we use a combination of multiple losses, including an adversarial loss implemented with the Wasserstein metric and a gradient penalty. We also explore a few extensions to the basic architecture involving the use of residual connections between layers, and of global statistics computed from the data set to improve the accuracy. In a large experimental evaluation, we show that our method robustly outperforms state-of-the-art approaches for MDI, especially for large percentages of missing values.
Tasks Denoising, Imputation
Published 2019-05-06
URL https://arxiv.org/abs/1905.01907v1
PDF https://arxiv.org/pdf/1905.01907v1.pdf
PWC https://paperswithcode.com/paper/missing-data-imputation-with-adversarially
Repo https://github.com/spindro/GINN
Framework pytorch
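
As a concrete illustration of the adversarial loss with gradient penalty mentioned in the abstract, here is the standard WGAN-GP gradient-penalty term in PyTorch; the function and variable names are illustrative and are not taken from the GINN repository.

```python
# Hedged sketch of a WGAN-style gradient penalty (standard formulation).
import torch

def gradient_penalty(critic, real, fake, lam=10.0):
    # Interpolate between real and imputed samples, then penalize the critic's
    # gradient norm for deviating from 1 along the interpolation.
    eps = torch.rand(real.size(0), 1, device=real.device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    score = critic(interp)
    grads = torch.autograd.grad(
        outputs=score, inputs=interp,
        grad_outputs=torch.ones_like(score), create_graph=True
    )[0]
    return lam * ((grads.norm(2, dim=1) - 1) ** 2).mean()
```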

Spontaneous Facial Micro-Expression Recognition using 3D Spatiotemporal Convolutional Neural Networks

Title Spontaneous Facial Micro-Expression Recognition using 3D Spatiotemporal Convolutional Neural Networks
Authors Sai Prasanna Teja Reddy, Surya Teja Karri, Shiv Ram Dubey, Snehasis Mukherjee
Abstract Facial expression recognition in videos is an active area of research in computer vision. However, fake facial expressions are difficult to recognize, even for humans. On the other hand, facial micro-expressions generally represent the actual emotion of a person, as they are spontaneous reactions expressed through the human face. Despite a few attempts at recognizing micro-expressions, the problem is far from solved, as reflected in the poor accuracy of state-of-the-art methods. A few CNN-based approaches in the literature recognize micro-facial expressions from still images, whereas a spontaneous micro-expression video contains multiple frames that have to be processed together to encode both spatial and temporal information. This paper proposes two 3D-CNN methods, MicroExpSTCNN and MicroExpFuseNet, for spontaneous facial micro-expression recognition by exploiting the spatiotemporal information in a CNN framework. MicroExpSTCNN considers the full spatial information, whereas MicroExpFuseNet is based on the 3D-CNN feature fusion of the eye and mouth regions. The experiments are performed over the CAS(ME)^2 and SMIC micro-expression databases. The proposed MicroExpSTCNN model outperforms the state-of-the-art methods.
Tasks Facial Expression Recognition
Published 2019-03-27
URL http://arxiv.org/abs/1904.01390v1
PDF http://arxiv.org/pdf/1904.01390v1.pdf
PWC https://paperswithcode.com/paper/spontaneous-facial-micro-expression-1
Repo https://github.com/bogireddytejareddy/micro-expression-recognition
Framework none
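
A toy 3D spatiotemporal CNN in PyTorch is sketched below to illustrate how a clip of frames can be processed jointly; the layer sizes and class count are placeholders and do not reproduce the exact MicroExpSTCNN configuration.

```python
# Illustrative 3D spatiotemporal CNN (sizes are assumptions, not the paper's model).
import torch
import torch.nn as nn

class Simple3DExpressionNet(nn.Module):
    def __init__(self, num_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),       # pool space, keep time
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, clip):  # clip: (batch, 1, frames, height, width)
        return self.classifier(self.features(clip).flatten(1))

logits = Simple3DExpressionNet()(torch.randn(2, 1, 16, 64, 64))  # (2, num_classes)
```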

Structured Variational Inference in Continuous Cox Process Models

Title Structured Variational Inference in Continuous Cox Process Models
Authors Virginia Aglietti, Edwin V. Bonilla, Theodoros Damoulas, Sally Cripps
Abstract We propose a scalable framework for inference in an inhomogeneous Poisson process modeled by a continuous sigmoidal Cox process, which assumes the corresponding intensity function is given by a Gaussian process (GP) prior transformed with a scaled logistic sigmoid function. We present a tractable representation of the likelihood through augmentation with a superposition of Poisson processes. This view enables a structured variational approximation capturing dependencies across variables in the model. Our framework avoids discretization of the domain, does not require accurate numerical integration over the input space, and is not limited to GPs with squared exponential kernels. We evaluate our approach on synthetic and real-world data, showing that its benefits are particularly pronounced in multivariate input settings, where it overcomes the limitations of mean-field methods and sampling schemes. We provide the state-of-the-art in terms of speed, accuracy and uncertainty-quantification trade-offs.
Tasks
Published 2019-06-07
URL https://arxiv.org/abs/1906.03161v1
PDF https://arxiv.org/pdf/1906.03161v1.pdf
PWC https://paperswithcode.com/paper/structured-variational-inference-in
Repo https://github.com/VirgiAgl/STVB
Framework tf
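
The sigmoidal Cox process intensity described above, $\lambda(x) = \lambda_{\max}\,\sigma(g(x))$ with $g$ drawn from a GP prior, can be illustrated with a short NumPy sketch; the RBF kernel and hyper-parameters below are chosen arbitrarily for illustration.

```python
# Sketch of a sigmoidal Cox process intensity: lambda(x) = lambda_max * sigmoid(g(x)),
# with g drawn from a GP prior (RBF kernel used purely for illustration).
import numpy as np

def rbf_kernel(x, lengthscale=0.2, variance=1.0):
    d = x[:, None] - x[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
g = rng.multivariate_normal(np.zeros_like(x), rbf_kernel(x) + 1e-6 * np.eye(len(x)))
lambda_max = 50.0
intensity = lambda_max / (1.0 + np.exp(-g))  # bounded in (0, lambda_max)
```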

CBNet: A Novel Composite Backbone Network Architecture for Object Detection

Title CBNet: A Novel Composite Backbone Network Architecture for Object Detection
Authors Yudong Liu, Yongtao Wang, Siwei Wang, TingTing Liang, Qijie Zhao, Zhi Tang, Haibin Ling
Abstract In existing CNN-based detectors, the backbone network is a very important component for basic feature extraction, and the performance of the detectors highly depends on it. In this paper, we aim to achieve better detection performance by building a more powerful backbone from existing backbones like ResNet and ResNeXt. Specifically, we propose a novel strategy for assembling multiple identical backbones by composite connections between the adjacent backbones, to form a more powerful backbone named Composite Backbone Network (CBNet). In this way, CBNet iteratively feeds the output features of the previous backbone, namely high-level features, as part of the input features to the succeeding backbone, in a stage-by-stage fashion, and finally the feature maps of the last backbone (named Lead Backbone) are used for object detection. We show that CBNet can be very easily integrated into most state-of-the-art detectors and significantly improve their performance. For example, it boosts the mAP of FPN, Mask R-CNN and Cascade R-CNN on the COCO dataset by about 1.5 to 3.0 percent. Meanwhile, experimental results show that the instance segmentation results can also be improved. In particular, by simply integrating the proposed CBNet into the baseline detector Cascade Mask R-CNN, we achieve a new state-of-the-art result on the COCO dataset (mAP of 53.3) with a single model, which demonstrates the great effectiveness of the proposed CBNet architecture. Code will be made available on https://github.com/PKUbahuangliuhe/CBNet.
Tasks Instance Segmentation, Object Detection, Semantic Segmentation
Published 2019-09-09
URL https://arxiv.org/abs/1909.03625v1
PDF https://arxiv.org/pdf/1909.03625v1.pdf
PWC https://paperswithcode.com/paper/cbnet-a-novel-composite-backbone-network
Repo https://github.com/PKUbahuangliuhe/CBNet
Framework none
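
To make the composite-backbone idea more concrete, here is a heavily simplified PyTorch sketch in which each stage of a backbone receives, via a 1x1 lateral convolution, the corresponding stage output of the previous backbone. Real CBNet composes full ResNet/ResNeXt backbones, and the exact connection pattern used here is an assumption.

```python
# Toy composite-backbone sketch (simplified; not the authors' architecture).
import torch
import torch.nn as nn

def make_stage(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU())

class TinyCompositeBackbone(nn.Module):
    def __init__(self, channels=(16, 32, 64), num_backbones=2):
        super().__init__()
        self.backbones = nn.ModuleList()
        for _ in range(num_backbones):
            stages, cin = nn.ModuleList(), 3
            for cout in channels:
                stages.append(make_stage(cin, cout))
                cin = cout
            self.backbones.append(stages)
        # 1x1 convs projecting the previous backbone's stage output into the next one
        self.laterals = nn.ModuleList(nn.Conv2d(c, c, 1) for c in channels)

    def forward(self, x):
        prev_feats = None
        for stages in self.backbones:
            feats, h = [], x
            for i, stage in enumerate(stages):
                h = stage(h)
                if prev_feats is not None:  # composite connection from previous backbone
                    h = h + self.laterals[i](prev_feats[i])
                feats.append(h)
            prev_feats = feats
        return prev_feats  # features of the lead (last) backbone feed the detector head
```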

Dynamic Self-training Framework for Graph Convolutional Networks

Title Dynamic Self-training Framework for Graph Convolutional Networks
Authors Ziang Zhou, Shenzhong Zhang, Zengfeng Huang
Abstract Graph neural networks (GNNs) such as GCN, GAT, and MoNet have achieved state-of-the-art results on semi-supervised learning on graphs. However, when the number of labeled nodes is very small, the performance of GNNs degrades dramatically. Self-training has proved effective for resolving this issue; however, the performance of self-trained GCNs is still inferior to that of G2G and DGI in many settings. Moreover, the additional model complexity makes it more difficult to tune the hyper-parameters and perform model selection. We argue that the power of self-training is still not fully explored for the node classification task. In this paper, we propose a unified end-to-end self-training framework called Dynamic Self-training, which generalizes and simplifies prior work. A simple instantiation of the framework based on GCN is provided, and empirical results show that our framework outperforms all previous methods, including GNNs, embedding-based methods and self-trained GCNs, by a noticeable margin. Moreover, compared with standard self-training, hyper-parameter tuning for our framework is easier.
Tasks Model Selection, Node Classification
Published 2019-10-07
URL https://arxiv.org/abs/1910.02684v1
PDF https://arxiv.org/pdf/1910.02684v1.pdf
PWC https://paperswithcode.com/paper/dynamic-self-training-framework-for-graph
Repo https://github.com/scottjiao/Dynamic-selftraining-GCN
Framework tf
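
A generic self-training loop for node classification is sketched below, with a logistic regression on node features standing in for the GCN; the dynamic weighting that gives the paper's framework its name is not reproduced here.

```python
# Generic self-training sketch (a stand-in classifier replaces the GCN).
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X, y, labeled_idx, unlabeled_idx, rounds=5, threshold=0.9):
    labeled_idx, unlabeled_idx = list(labeled_idx), list(unlabeled_idx)
    y, clf = np.array(y).copy(), None
    for _ in range(rounds):
        clf = LogisticRegression(max_iter=1000).fit(X[labeled_idx], y[labeled_idx])
        if not unlabeled_idx:
            break
        proba = clf.predict_proba(X[unlabeled_idx])
        confident = np.where(proba.max(axis=1) >= threshold)[0]
        for j in confident:  # add high-confidence pseudo-labels to the training set
            node = unlabeled_idx[j]
            y[node] = clf.classes_[proba[j].argmax()]
            labeled_idx.append(node)
        conf_set = set(confident.tolist())
        unlabeled_idx = [n for k, n in enumerate(unlabeled_idx) if k not in conf_set]
    return clf
```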

Kernel Node Embeddings

Title Kernel Node Embeddings
Authors Abdulkadir Çelikkanat, Fragkiskos D. Malliaros
Abstract Learning representations of nodes in a low-dimensional space is a crucial task with many interesting applications in network analysis, including link prediction and node classification. Two popular approaches for this problem are matrix factorization and random walk-based models. In this paper, we aim to bring together the best of both worlds towards learning latent node representations. In particular, we propose a weighted matrix factorization model that encodes random walk-based information about the nodes of the graph. The main benefit of this formulation is that it allows us to utilize kernel functions in the computation of the embeddings. We perform an empirical evaluation on real-world networks, showing that the proposed model outperforms baseline node embedding algorithms in two downstream machine learning tasks.
Tasks Link Prediction, Node Classification
Published 2019-09-08
URL https://arxiv.org/abs/1909.03416v2
PDF https://arxiv.org/pdf/1909.03416v2.pdf
PWC https://paperswithcode.com/paper/kernel-node-embeddings
Repo https://github.com/abdcelikkanat/kernelNE
Framework none
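
As a rough illustration of a kernelized factorization objective, the NumPy sketch below fits embeddings so that a Gaussian kernel between them approximates a given random-walk co-occurrence matrix M; this is a simplified, hypothetical variant that omits the paper's weighting scheme.

```python
# Toy kernelized matrix factorization: fit K(u_i, v_j) ~= M_ij with a Gaussian kernel.
import numpy as np

def kernel_mf(M, dim=16, lr=0.05, epochs=200, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    n = M.shape[0]
    U = 0.1 * rng.standard_normal((n, dim))
    V = 0.1 * rng.standard_normal((n, dim))
    for _ in range(epochs):
        D = ((U[:, None, :] - V[None, :, :]) ** 2).sum(-1)   # squared distances
        K = np.exp(-D / (2 * sigma ** 2))                     # Gaussian kernel K_ij
        W = (K - M) * K                                       # residual-weighted kernel
        # gradient-descent updates on the squared-error loss 0.5 * sum (K - M)^2
        U -= lr * (-(W.sum(1, keepdims=True) * U - W @ V) / sigma ** 2)
        V -= lr * ((W.T @ U - W.sum(0)[:, None] * V) / sigma ** 2)
    return U, V
```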

Graph Representation Ensemble Learning

Title Graph Representation Ensemble Learning
Authors Palash Goyal, Di Huang, Sujit Rokka Chhetri, Arquimedes Canedo, Jaya Shree, Evan Patterson
Abstract Representation learning on graphs has been gaining attention due to its wide applicability in predicting missing links, and classifying and recommending nodes. Most embedding methods aim to preserve certain properties of the original graph in the low-dimensional space. However, real-world graphs have a combination of several properties which are difficult to characterize and capture by a single approach. In this work, we introduce the problem of graph representation ensemble learning and provide a first-of-its-kind framework to aggregate multiple graph embedding methods efficiently. We analyze our framework and study, both theoretically and empirically, the dependence between state-of-the-art embedding methods. We test our models on the node classification task on four real-world graphs and show that the proposed ensemble approaches can outperform the state-of-the-art methods by up to 8% on macro-F1. We further show that the approach is even more beneficial for underrepresented classes, providing an improvement of up to 12%.
Tasks Graph Embedding, Node Classification, Representation Learning
Published 2019-09-06
URL https://arxiv.org/abs/1909.02811v2
PDF https://arxiv.org/pdf/1909.02811v2.pdf
PWC https://paperswithcode.com/paper/graph-representation-ensemble-learning
Repo https://github.com/dihuang0220/GraphEnsembleLearning
Framework none
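
A minimal ensemble-by-concatenation baseline is sketched below: embeddings produced by several methods are stacked feature-wise and fed to a downstream node classifier. The paper's aggregation framework is more sophisticated than this; the sketch only illustrates the general idea.

```python
# Naive ensemble of node embeddings by feature-wise concatenation (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def ensemble_node_classification(embeddings_list, labels, seed=0):
    X = np.hstack(embeddings_list)  # concatenate embeddings from each method
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, labels, test_size=0.3, random_state=seed
    )
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return clf.score(X_te, y_te)    # downstream node-classification accuracy
```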

STEP: Spatio-Temporal Progressive Learning for Video Action Detection

Title STEP: Spatio-Temporal Progressive Learning for Video Action Detection
Authors Xitong Yang, Xiaodong Yang, Ming-Yu Liu, Fanyi Xiao, Larry Davis, Jan Kautz
Abstract In this paper, we propose Spatio-TEmporal Progressive (STEP) action detector—a progressive learning framework for spatio-temporal action detection in videos. Starting from a handful of coarse-scale proposal cuboids, our approach progressively refines the proposals towards actions over a few steps. In this way, high-quality proposals (i.e., adhere to action movements) can be gradually obtained at later steps by leveraging the regression outputs from previous steps. At each step, we adaptively extend the proposals in time to incorporate more related temporal context. Compared to the prior work that performs action detection in one run, our progressive learning framework is able to naturally handle the spatial displacement within action tubes and therefore provides a more effective way for spatio-temporal modeling. We extensively evaluate our approach on UCF101 and AVA, and demonstrate superior detection results. Remarkably, we achieve mAP of 75.0% and 18.6% on the two datasets with 3 progressive steps and using respectively only 11 and 34 initial proposals.
Tasks Action Detection, Action Recognition In Videos
Published 2019-04-19
URL http://arxiv.org/abs/1904.09288v1
PDF http://arxiv.org/pdf/1904.09288v1.pdf
PWC https://paperswithcode.com/paper/step-spatio-temporal-progressive-learning-for
Repo https://github.com/NVlabs/STEP
Framework pytorch

Predicting kills in Game of Thrones using network properties

Title Predicting kills in Game of Thrones using network properties
Authors Jaka Stavanja, Matej Klemen
Abstract TV series such as HBO’s most popular show, Game of Thrones, have attracted a large number of dedicated followers who watch and thoroughly analyze every minute of the show. A widely discussed aspect of the show among viewers is the dramatic murders of the most important characters, which the series is best known for. In our work, we try to predict characters’ kills (killer and victim pairs) using data about previous kills by the characters and additional metadata. We construct a network with characters as nodes, where two nodes are linked if one killed the other. Then we use a link prediction framework and evaluate different techniques to predict the next possible kills. Lastly, we construct features from various network properties on a social network of characters, which we use in conjunction with classic data mining techniques. We see that due to the small size of the kills dataset and the somewhat random distribution of kills, we cannot predict much with standard indices. However, we show that we can construct an index that is highly customized for the exact network we create for Game of Thrones but might not work on other series. We also see that the features we compute on the social network of characters help with standard machine learning approaches as well, but do not yield as accurate predictions as we would hope. The best results overall are achieved by using a custom index for link prediction, which fits our type of network and gives us an Area Under the ROC Curve (AUC) of 0.863.
Tasks Link Prediction
Published 2019-06-22
URL https://arxiv.org/abs/1906.09468v1
PDF https://arxiv.org/pdf/1906.09468v1.pdf
PWC https://paperswithcode.com/paper/predicting-kills-in-game-of-thrones-using
Repo https://github.com/matejklemen/got-link-prediction
Framework none
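
For reference, a standard link prediction evaluation with an off-the-shelf index (Jaccard here) can be sketched as follows, assuming an undirected NetworkX character graph; the paper's best results instead use a custom index tailored to the kills network.

```python
# Hedged sketch: AUC of a standard similarity index on held-out edges.
import random
import networkx as nx
from sklearn.metrics import roc_auc_score

def link_prediction_auc(G, test_fraction=0.2, seed=0):
    random.seed(seed)
    edges = list(G.edges())
    test_edges = random.sample(edges, int(test_fraction * len(edges)))
    train_G = G.copy()
    train_G.remove_edges_from(test_edges)
    # negative samples: node pairs with no edge in the full graph
    nodes, non_edges = list(G.nodes()), []
    while len(non_edges) < len(test_edges):
        u, v = random.sample(nodes, 2)
        if not G.has_edge(u, v):
            non_edges.append((u, v))
    pairs = test_edges + non_edges
    scores = [s for _, _, s in nx.jaccard_coefficient(train_G, pairs)]
    labels = [1] * len(test_edges) + [0] * len(non_edges)
    return roc_auc_score(labels, scores)
```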

Relevance Proximity Graphs for Fast Relevance Retrieval

Title Relevance Proximity Graphs for Fast Relevance Retrieval
Authors Stanislav Morozov, Artem Babenko
Abstract In plenty of machine learning applications, the most relevant items for a particular query should be efficiently extracted, while the relevance function is based on a highly-nonlinear model, e.g., DNNs or GBDTs. Due to the high computational complexity of such models, exhaustive search is infeasible even for medium-scale problems. To address this issue, we introduce Relevance Proximity Graphs (RPG): an efficient non-exhaustive approach that provides a high-quality approximate solution for maximal relevance retrieval. Namely, we extend the recent similarity graphs framework to the setting, when there is no similarity measure defined on item pairs, which is a common practical use-case. By design, our approach directly maximizes off-the-shelf relevance functions and does not require any proxy auxiliary models. Via extensive experiments, we show that the developed method provides excellent retrieval accuracy while requiring only a few model computations, outperforming indirect models. We open-source our implementation as well as two large-scale datasets to support further research on relevance retrieval.
Tasks
Published 2019-08-19
URL https://arxiv.org/abs/1908.06887v3
PDF https://arxiv.org/pdf/1908.06887v3.pdf
PWC https://paperswithcode.com/paper/relevance-proximity-graphs-for-fast-relevance
Repo https://github.com/stanis-morozov/rpg
Framework none
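
The retrieval step can be pictured as a greedy walk over a proximity graph that directly maximizes the relevance function. The sketch below assumes user-supplied `neighbors` and `relevance` callables and omits graph construction entirely; it is an illustration of the search idea, not the authors' algorithm.

```python
# Greedy best-first walk over a proximity graph toward the most relevant item.
def greedy_relevance_search(start, neighbors, relevance, query, max_hops=50):
    current, best = start, relevance(query, start)
    for _ in range(max_hops):
        # move to the most relevant neighbor; stop at a local maximum
        candidates = [(relevance(query, n), n) for n in neighbors(current)]
        if not candidates:
            break
        score, node = max(candidates, key=lambda t: t[0])
        if score <= best:
            break
        best, current = score, node
    return current, best
```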

Learning Visuomotor Policies for Aerial Navigation Using Cross-Modal Representations

Title Learning Visuomotor Policies for Aerial Navigation Using Cross-Modal Representations
Authors Rogerio Bonatti, Ratnesh Madaan, Vibhav Vineet, Sebastian Scherer, Ashish Kapoor
Abstract Machines are a long way from robustly solving open-world perception-control tasks, such as first-person view (FPV) aerial navigation. While recent advances in end-to-end Machine Learning, especially Imitation and Reinforcement Learning, appear promising, they are constrained by the need for large amounts of difficult-to-collect labeled real-world data. Simulated data, on the other hand, is easy to generate, but generally does not render safe behaviors in diverse real-life scenarios. In this work we propose a novel method for learning robust visuomotor policies for real-world deployment which can be trained purely with simulated data. We develop rich state representations that combine supervised and unsupervised environment data. Our approach takes a cross-modal perspective, where separate modalities correspond to the raw camera data and the system states relevant to the task, such as the relative pose of gates to the drone in the case of drone racing. We feed both data modalities into a novel factored architecture, which learns a joint low-dimensional embedding via Variational Auto-Encoders. This compact representation is then fed into a control policy, which we train using imitation learning with expert trajectories in a simulator. We analyze the rich latent spaces learned with our proposed representations, and show that the use of our cross-modal architecture significantly improves control policy performance as compared to end-to-end learning or purely unsupervised feature extractors. We also present real-world results for drone navigation through gates in different track configurations and environmental conditions. Our proposed method, which runs fully onboard, can successfully generalize the learned representations and policies across simulation and reality, significantly outperforming baseline approaches. Supplementary video: https://youtu.be/VKc3A5HlUU8
Tasks Drone navigation, Imitation Learning
Published 2019-09-16
URL https://arxiv.org/abs/1909.06993v2
PDF https://arxiv.org/pdf/1909.06993v2.pdf
PWC https://paperswithcode.com/paper/learning-controls-using-cross-modal
Repo https://github.com/microsoft/AirSim-Drone-Racing-VAE-Imitation
Framework tf

Action-conditioned Benchmarking of Robotic Video Prediction Models: a Comparative Study

Title Action-conditioned Benchmarking of Robotic Video Prediction Models: a Comparative Study
Authors Manuel Serra Nunes, Atabak Dehban, Plinio Moreno, José Santos-Victor
Abstract A defining characteristic of intelligent systems is the ability to make action decisions based on the anticipated outcomes. Video prediction systems have been demonstrated as a solution for predicting how the future will unfold visually, and thus, many models have been proposed that are capable of predicting future frames based on a history of observed frames (and sometimes robot actions). However, a comprehensive method for determining the fitness of different video prediction models at guiding the selection of actions is yet to be developed. Current metrics assess video prediction models based on human perception of frame quality. In contrast, we argue that if these systems are to be used to guide action, then necessarily, the actions the robot performs should be encoded in the predicted frames. In this paper, we propose a new metric to compare different video prediction models based on this argument. More specifically, we propose an action inference system and quantitatively rank different models based on how well we can infer the robot actions from the predicted frames. Our extensive experiments show that models with high perceptual scores can perform poorly in the proposed action inference tests and thus may not be suitable options to be used in robot planning systems.
Tasks Video Prediction
Published 2019-10-07
URL https://arxiv.org/abs/1910.02564v1
PDF https://arxiv.org/pdf/1910.02564v1.pdf
PWC https://paperswithcode.com/paper/action-conditioned-benchmarking-of-robotic
Repo https://github.com/m-serra/action-inference-for-video-prediction-benchmarking
Framework tf
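
A simple version of the proposed evaluation idea, regressing actions from pairs of consecutive predicted frames and scoring a model by how well the actions can be recovered, might look like the hedged sketch below; the array shapes and the ridge regressor are assumptions, not the paper's action inference system.

```python
# Illustrative action-inference check: can actions be recovered from predicted frames?
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

def action_inference_score(predicted_frames, actions):
    """predicted_frames: (N, T, H, W) array; actions: (N, T-1, action_dim) array."""
    N, T = predicted_frames.shape[:2]
    # each sample is a pair of consecutive predicted frames, flattened and concatenated
    pairs = np.concatenate(
        [predicted_frames[:, :-1].reshape(N * (T - 1), -1),
         predicted_frames[:, 1:].reshape(N * (T - 1), -1)], axis=1)
    targets = actions.reshape(N * (T - 1), -1)
    X_tr, X_te, y_tr, y_te = train_test_split(pairs, targets, test_size=0.3, random_state=0)
    reg = Ridge().fit(X_tr, y_tr)
    return reg.score(X_te, y_te)  # higher R^2 = actions are better encoded in the frames
```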

Fine-Grained Continual Learning

Title Fine-Grained Continual Learning
Authors Vincenzo Lomonaco, Davide Maltoni, Lorenzo Pellegrini
Abstract Robotic vision is a field where continual learning can play a significant role. An embodied agent operating in a complex environment subject to frequent and unpredictable changes is required to learn and adapt continuously. In the context of object recognition, for example, a robot should be able to learn (without forgetting) objects of never before seen classes as well as improving its recognition capabilities as new instances of already known classes are discovered. Ideally, continual learning should be triggered by the availability of short videos of single objects and performed on-line on on-board hardware with fine-grained updates. In this paper, we introduce a novel fine-grained continual learning protocol based on the CORe50 benchmark and propose two rehearsal-free continual learning techniques, CWR* and AR1*, that can learn effectively even in the challenging case of nearly 400 small non-i.i.d. incremental batches. In particular, our experiments show that AR1* can outperform other state-of-the-art rehearsal-free techniques by more than 15% accuracy in some cases, with a very light and constant computational and memory overhead across training batches.
Tasks Continual Learning, Object Recognition
Published 2019-07-08
URL https://arxiv.org/abs/1907.03799v2
PDF https://arxiv.org/pdf/1907.03799v2.pdf
PWC https://paperswithcode.com/paper/fine-grained-continual-learning
Repo https://github.com/vlomonaco/core50
Framework none