February 1, 2020

3277 words 16 mins read

Paper Group AWR 367

“Why did you do that?": Explaining black box models with Inductive Synthesis. On the Transfer of Inductive Bias from Simulation to the Real World: a New Disentanglement Dataset. VideoGraph: Recognizing Minutes-Long Human Activities in Videos. Interaction Relational Network for Mutual Action Recognition. Relation Network for Multi-label Aerial Image …

“Why did you do that?": Explaining black box models with Inductive Synthesis

Title “Why did you do that?": Explaining black box models with Inductive Synthesis
Authors Görkem Paçacı, David Johnson, Steve McKeever, Andreas Hamfelt
Abstract By their nature, the composition of black box models is opaque. This makes the ability to generate explanations for the response to stimuli challenging. Explaining black box models has become increasingly important given the prevalence of AI and ML systems and the need to build legal and regulatory frameworks around them. Such explanations can also increase trust in these uncertain systems. In our paper we present RICE, a method for generating explanations of the behaviour of black box models by (1) probing a model to extract model output examples using sensitivity analysis; (2) applying CNPInduce, a method for inductive logic program synthesis, to generate logic programs based on critical input-output pairs; and (3) interpreting the target program as a human-readable explanation. We demonstrate the application of our method by generating explanations of an artificial neural network trained to follow simple traffic rules in a hypothetical self-driving car simulation. We conclude with a discussion on the scalability and usability of our approach and its potential applications to explanation-critical scenarios.
Tasks Program Synthesis
Published 2019-04-17
URL http://arxiv.org/abs/1904.09273v1
PDF http://arxiv.org/pdf/1904.09273v1.pdf
PWC https://paperswithcode.com/paper/why-did-you-do-that-explaining-black-box
Repo https://github.com/UppsalaIM/rice
Framework tf
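
Step (1) of RICE amounts to probing the trained model and keeping the input-output pairs that sit near its decision boundaries. The sketch below illustrates that probing idea only, with a hypothetical `model_fn` and Gaussian perturbations standing in for the paper's sensitivity analysis; the CNPInduce synthesis step (2) and the interpretation step (3) are not shown.

```python
import numpy as np

def probe_black_box(model_fn, x0, eps=0.05, n_samples=200, seed=0):
    """Collect input-output examples around a seed input by perturbing it and
    keeping the pairs where the model's decision changes (a crude sensitivity
    analysis; the paper's CNPInduce synthesis step is not shown)."""
    rng = np.random.default_rng(seed)
    base = model_fn(x0)
    critical_pairs = []
    for _ in range(n_samples):
        x = x0 + rng.normal(scale=eps, size=x0.shape)
        y = model_fn(x)
        if y != base:  # decision flipped -> the sample lies near a boundary
            critical_pairs.append((x, y))
    return critical_pairs
```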

On the Transfer of Inductive Bias from Simulation to the Real World: a New Disentanglement Dataset

Title On the Transfer of Inductive Bias from Simulation to the Real World: a New Disentanglement Dataset
Authors Muhammad Waleed Gondal, Manuel Wüthrich, Đorđe Miladinović, Francesco Locatello, Martin Breidt, Valentin Volchkov, Joel Akpo, Olivier Bachem, Bernhard Schölkopf, Stefan Bauer
Abstract Learning meaningful and compact representations with disentangled semantic aspects is considered to be of key importance in representation learning. Since real-world data is notoriously costly to collect, many recent state-of-the-art disentanglement models have heavily relied on synthetic toy data-sets. In this paper, we propose a novel data-set which consists of over one million images of physical 3D objects with seven factors of variation, such as object color, shape, size and position. In order to be able to control all the factors of variation precisely, we built an experimental platform where the objects are being moved by a robotic arm. In addition, we provide two more datasets which consist of simulations of the experimental setup. These datasets provide for the first time the possibility to systematically investigate how well different disentanglement methods perform on real data in comparison to simulation, and how simulated data can be leveraged to build better representations of the real world. We provide a first experimental study of these questions and our results indicate that learned models transfer poorly, but that model and hyperparameter selection is an effective means of transferring information to the real world.
Tasks Representation Learning
Published 2019-06-07
URL https://arxiv.org/abs/1906.03292v3
PDF https://arxiv.org/pdf/1906.03292v3.pdf
PWC https://paperswithcode.com/paper/on-the-transfer-of-inductive-bias-from
Repo https://github.com/causality-and-transfer-learning/disentanglement_dataset
Framework none

VideoGraph: Recognizing Minutes-Long Human Activities in Videos

Title VideoGraph: Recognizing Minutes-Long Human Activities in Videos
Authors Noureldien Hussein, Efstratios Gavves, Arnold W. M. Smeulders
Abstract Many human activities take minutes to unfold. To represent them, related works opt for statistical pooling, which neglects the temporal structure. Others opt for convolutional methods, such as CNN and Non-Local. While successful in learning temporal concepts, these fall short of modeling minutes-long temporal dependencies. We propose VideoGraph, a method to achieve the best of both worlds: represent minutes-long human activities and learn their underlying temporal structure. VideoGraph learns a graph-based representation for human activities. The graph, its nodes and edges are learned entirely from video datasets, making VideoGraph applicable to problems without node-level annotation. The result is improvements over related works on the Epic-Kitchen and Breakfast benchmarks. Besides, we demonstrate that VideoGraph is able to learn the temporal structure of human activities in minutes-long videos.
Tasks
Published 2019-05-13
URL https://arxiv.org/abs/1905.05143v2
PDF https://arxiv.org/pdf/1905.05143v2.pdf
PWC https://paperswithcode.com/paper/videograph-recognizing-minutes-long-human
Repo https://github.com/noureldien/videograph
Framework tf
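
The core idea of learning graph nodes directly from video data, without node-level annotation, can be pictured as soft-assigning per-segment features to a set of learned latent node embeddings. The snippet below is a minimal sketch of that assignment step only, with illustrative dimensions; the full VideoGraph model also learns edges and applies further graph layers.

```python
import torch
import torch.nn as nn

class LatentGraphNodes(nn.Module):
    """Soft-assign per-segment video features to a set of learned latent nodes.
    A minimal sketch of learning graph nodes from data; VideoGraph itself also
    learns edges and stacks further graph layers on top."""
    def __init__(self, feat_dim=1024, n_nodes=128):
        super().__init__()
        self.nodes = nn.Parameter(torch.randn(n_nodes, feat_dim) * 0.01)

    def forward(self, x):                                    # x: (batch, timesteps, feat_dim)
        sim = torch.einsum('btd,nd->btn', x, self.nodes)     # similarity to each node
        attn = sim.softmax(dim=-1)                           # soft assignment weights
        node_feats = torch.einsum('btn,btd->bnd', attn, x)   # per-node pooled features
        return node_feats                                    # (batch, n_nodes, feat_dim)
```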

Interaction Relational Network for Mutual Action Recognition

Title Interaction Relational Network for Mutual Action Recognition
Authors Mauricio Perez, Jun Liu, Alex C. Kot
Abstract Person-person mutual action recognition (also referred to as interaction recognition) is an important research branch of human activity analysis. Current solutions in the field are mainly dominated by CNNs, GCNs and LSTMs. These approaches often consist of complicated architectures and mechanisms that embed the relationships between the two persons in the architecture itself, to ensure the interaction patterns can be properly learned. In this paper, we propose a simpler yet very powerful architecture, named Interaction Relational Network (IRN), which utilizes minimal prior knowledge about the structure of the human body. We drive the network to identify by itself how to relate the body parts of the interacting individuals. In order to better represent the interaction, we define two different relationships, leading to specialized architectures and models for each. These multiple relationship models are then fused into a single, special architecture, in order to leverage both streams of information and further enhance the relational reasoning capability. Furthermore, we define important structured pair-wise operations to extract meaningful extra information from each pair of joints – distance and motion. Ultimately, with the coupling of an LSTM, our IRN is capable of paramount sequential relational reasoning. These important extensions we made to our network can also be valuable to other problems that require sophisticated relational reasoning. Our solution is able to achieve state-of-the-art performance on the traditional interaction recognition datasets SBU and UT, and also on the mutual actions from the large-scale NTU RGB+D and NTU RGB+D 120 datasets.
Tasks Relational Reasoning
Published 2019-10-11
URL https://arxiv.org/abs/1910.04963v1
PDF https://arxiv.org/pdf/1910.04963v1.pdf
PWC https://paperswithcode.com/paper/interaction-relational-network-for-mutual
Repo https://github.com/mauriciolp/inter-rel-net
Framework tf
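
A relation-network-style module over joint pairs from the two interacting persons captures the gist of IRN: a shared MLP g scores every (joint of person A, joint of person B) pair, the pair embeddings are aggregated, and a second MLP f classifies the interaction. The sketch below shows only that inter-person relationship stream with illustrative sizes; the paper's distance and motion pair features, the second relationship stream, the fusion architecture, and the LSTM coupling are omitted.

```python
import torch
import torch.nn as nn

class PairwiseRelation(nn.Module):
    """Relation-network-style reasoning over joint pairs from two persons.
    Minimal sketch: g scores every (joint of A, joint of B) pair, the pair
    embeddings are summed, and f classifies the interaction."""
    def __init__(self, joint_dim=3, hidden=128, n_classes=11):
        super().__init__()
        self.g = nn.Sequential(nn.Linear(2 * joint_dim, hidden), nn.ReLU(),
                               nn.Linear(hidden, hidden), nn.ReLU())
        self.f = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                               nn.Linear(hidden, n_classes))

    def forward(self, a, b):                          # a, b: (batch, n_joints, joint_dim)
        n = a.shape[1]
        ai = a.unsqueeze(2).expand(-1, -1, n, -1)     # each joint of person A ...
        bj = b.unsqueeze(1).expand(-1, n, -1, -1)     # ... paired with each joint of B
        pair = torch.cat([ai, bj], dim=-1)            # (batch, n, n, 2 * joint_dim)
        rel = self.g(pair).sum(dim=(1, 2))            # aggregate all pair relations
        return self.f(rel)                            # interaction class logits
```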

Relation Network for Multi-label Aerial Image Classification

Title Relation Network for Multi-label Aerial Image Classification
Authors Yuansheng Hua, Lichao Mou, Xiao Xiang Zhu
Abstract Multi-label classification plays a momentous role in perceiving intricate contents of an aerial image and triggers several related studies over the last years. However, most of them deploy few efforts in exploiting label relations, while such dependencies are crucial for making accurate predictions. Although an LSTM layer can be introduced to modeling such label dependencies in a chain propagation manner, the efficiency might be questioned when certain labels are improperly inferred. To address this, we propose a novel aerial image multi-label classification network, attention-aware label relational reasoning network. Particularly, our network consists of three elemental modules: 1) a label-wise feature parcel learning module, 2) an attentional region extraction module, and 3) a label relational inference module. To be more specific, the label-wise feature parcel learning module is designed for extracting high-level label-specific features. The attentional region extraction module aims at localizing discriminative regions in these features and yielding attentional label-specific features. The label relational inference module finally predicts label existences using label relations reasoned from outputs of the previous module. The proposed network is characterized by its capacities of extracting discriminative label-wise features in a proposal-free way and reasoning about label relations naturally and interpretably. In our experiments, we evaluate the proposed model on the UCM multi-label dataset and a newly produced dataset, AID multi-label dataset. Quantitative and qualitative results on these two datasets demonstrate the effectiveness of our model. To facilitate progress in the multi-label aerial image classification, the AID multi-label dataset will be made publicly available.
Tasks Image Classification, Multi-Label Classification, Relational Reasoning
Published 2019-07-16
URL https://arxiv.org/abs/1907.07274v3
PDF https://arxiv.org/pdf/1907.07274v3.pdf
PWC https://paperswithcode.com/paper/relation-network-for-multi-label-aerial-image
Repo https://github.com/Hua-YS/AID-Multilabel-Dataset
Framework none

mu-Forcing: Training Variational Recurrent Autoencoders for Text Generation

Title mu-Forcing: Training Variational Recurrent Autoencoders for Text Generation
Authors Dayiheng Liu, Xu Yang, Feng He, Yuanyuan Chen, Jiancheng Lv
Abstract It has been previously observed that training Variational Recurrent Autoencoders (VRAE) for text generation suffers from a serious uninformative-latent-variable problem: the model collapses into a plain language model that totally ignores the latent variables and can only generate repetitive and dull samples. In this paper, we explore the reason behind this issue and propose an effective regularizer-based approach to address it. The proposed method directly injects extra constraints on the posteriors of the latent variables into the learning process of the VRAE, which can flexibly and stably control the trade-off between the KL term and the reconstruction term, making the model learn dense and meaningful latent representations. The experimental results show that the proposed method outperforms several strong baselines and makes the model learn interpretable latent variables and generate diverse, meaningful sentences. Furthermore, the proposed method can perform well without resorting to other strategies, such as KL annealing.
Tasks Language Modelling, Text Generation
Published 2019-05-24
URL https://arxiv.org/abs/1905.10072v1
PDF https://arxiv.org/pdf/1905.10072v1.pdf
PWC https://paperswithcode.com/paper/mu-forcing-training-variational-recurrent
Repo https://github.com/dayihengliu/Mu-Forcing-VRAE
Framework tf
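
One way to read the proposed regularizer is as an explicit floor on how much information the posterior means are allowed to carry, added on top of the usual VRAE objective. The sketch below implements that reading as a hinge penalty that activates when the batch-average ||mu||^2 drops below a threshold; the exact constraint and hyperparameters in the paper may differ, so treat the specifics here as assumptions.

```python
import torch
import torch.nn.functional as F

def vrae_loss_with_mu_constraint(recon_logits, targets, mu, logvar, mu_floor=6.0):
    """VRAE loss with an extra constraint on the posterior means (hedged sketch).
    recon_logits: (batch, seq, vocab), targets: (batch, seq),
    mu / logvar: (batch, latent_dim) from the approximate posterior."""
    recon = F.cross_entropy(recon_logits.transpose(1, 2), targets, reduction='mean')
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    mu_energy = mu.pow(2).sum(dim=-1).mean()       # average ||mu||^2 over the batch
    constraint = F.relu(mu_floor - mu_energy)      # hinge: active only below the floor
    return recon + kl + constraint
```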

FIESTA: Fast IdEntification of State-of-The-Art models using adaptive bandit algorithms

Title FIESTA: Fast IdEntification of State-of-The-Art models using adaptive bandit algorithms
Authors Henry B. Moss, Andrew Moore, David S. Leslie, Paul Rayson
Abstract We present FIESTA, a model selection approach that significantly reduces the computational resources required to reliably identify state-of-the-art performance from large collections of candidate models. Despite being known to produce unreliable comparisons, it is still common practice to compare model evaluations based on single choices of random seeds. We show that reliable model selection also requires evaluations based on multiple train-test splits (contrary to common practice in many shared tasks). Using bandit theory from the statistics literature, we are able to adaptively determine appropriate numbers of data splits and random seeds used to evaluate each model, focusing computational resources on the evaluation of promising models whilst avoiding wasting evaluations on models with lower performance. Furthermore, our user-friendly Python implementation produces confidence guarantees of correctly selecting the optimal model. We evaluate our algorithms by selecting between 8 target-dependent sentiment analysis methods using dramatically fewer model evaluations than current model selection approaches.
Tasks Model Selection, Sentiment Analysis
Published 2019-06-28
URL https://arxiv.org/abs/1906.12230v1
PDF https://arxiv.org/pdf/1906.12230v1.pdf
PWC https://paperswithcode.com/paper/fiesta-fast-identification-of-state-of-the
Repo https://github.com/apmoore1/fiesta
Framework none
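
The adaptive-evaluation loop at the heart of the approach can be sketched with a simple Gaussian Thompson-sampling bandit: evaluate every candidate a few times, then repeatedly spend the next evaluation on whichever model a posterior sample says is best. This is a simplified stand-in, not the paper's FIESTA algorithms or their confidence guarantees; `evaluate_fns` is a hypothetical list of callables that each run one train/test split with a fresh random seed and return a score.

```python
import numpy as np

def adaptive_model_selection(evaluate_fns, budget=200, warmup=3, seed=0):
    """Bandit-style adaptive model selection: spend more evaluations on models
    that look promising. Simplified sketch (Gaussian Thompson sampling)."""
    rng = np.random.default_rng(seed)
    scores = [[] for _ in evaluate_fns]
    for i, fn in enumerate(evaluate_fns):            # a few warm-up runs per model
        for _ in range(warmup):
            scores[i].append(fn())
    for _ in range(budget - warmup * len(evaluate_fns)):
        samples = [rng.normal(np.mean(s), np.std(s) / np.sqrt(len(s)) + 1e-8)
                   for s in scores]                  # posterior sample per model
        best = int(np.argmax(samples))               # pick the sample-best arm
        scores[best].append(evaluate_fns[best]())    # and evaluate it once more
    return int(np.argmax([np.mean(s) for s in scores])), scores
```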

Additive Margin SincNet for Speaker Recognition

Title Additive Margin SincNet for Speaker Recognition
Authors João Antônio Chagas Nunes, David Macêdo, Cleber Zanchettin
Abstract Speaker recognition is a challenging task with essential applications such as authentication, automation, and security. SincNet is a new deep-learning-based model which has produced promising results on this task. When training deep learning systems, the loss function is essential to the network's performance. The Softmax loss function is widely used in deep learning methods, but it is not the best choice for all kinds of problems. For distance-based problems, a new Softmax-based loss function called Additive Margin Softmax (AM-Softmax) is proving to be a better choice than the traditional Softmax. AM-Softmax introduces a margin of separation between the classes that forces samples from the same class to be closer to each other and also maximizes the distance between classes. In this paper, we propose a new approach for speaker recognition systems called AM-SincNet, which is based on SincNet but uses an improved AM-Softmax layer. The proposed method is evaluated on the TIMIT dataset and obtains an improvement of approximately 40% in the Frame Error Rate compared to SincNet.
Tasks Speaker Recognition
Published 2019-01-28
URL http://arxiv.org/abs/1901.10826v1
PDF http://arxiv.org/pdf/1901.10826v1.pdf
PWC https://paperswithcode.com/paper/additive-margin-sincnet-for-speaker
Repo https://github.com/joaoantoniocn/AM-SincNet
Framework pytorch
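
The AM-Softmax layer that replaces the plain Softmax can be written compactly: features and class weights are L2-normalized, a margin m is subtracted from the cosine of the true class, and the result is scaled by s before the cross-entropy. The PyTorch sketch below uses the standard AM-Softmax formulation; the scale and margin values are illustrative rather than the paper's settings, and the SincNet front-end is not shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AMSoftmaxHead(nn.Module):
    """Additive Margin Softmax classification head (standard formulation;
    s and m are illustrative values, not the paper's)."""
    def __init__(self, feat_dim, n_classes, s=30.0, m=0.4):
        super().__init__()
        self.W = nn.Parameter(torch.randn(n_classes, feat_dim))
        self.s, self.m = s, m

    def forward(self, feats, labels):
        cos = F.normalize(feats) @ F.normalize(self.W).t()          # cosine similarities
        margin = torch.zeros_like(cos).scatter_(1, labels.unsqueeze(1), self.m)
        logits = self.s * (cos - margin)                            # margin on the true class
        return F.cross_entropy(logits, labels)
```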

Open-Domain Targeted Sentiment Analysis via Span-Based Extraction and Classification

Title Open-Domain Targeted Sentiment Analysis via Span-Based Extraction and Classification
Authors Minghao Hu, Yuxing Peng, Zhen Huang, Dongsheng Li, Yiwei Lv
Abstract Open-domain targeted sentiment analysis aims to detect opinion targets along with their sentiment polarities from a sentence. Prior work typically formulates this task as a sequence tagging problem. However, such formulation suffers from problems such as huge search space and sentiment inconsistency. To address these problems, we propose a span-based extract-then-classify framework, where multiple opinion targets are directly extracted from the sentence under the supervision of target span boundaries, and corresponding polarities are then classified using their span representations. We further investigate three approaches under this framework, namely the pipeline, joint, and collapsed models. Experiments on three benchmark datasets show that our approach consistently outperforms the sequence tagging baseline. Moreover, we find that the pipeline model achieves the best performance compared with the other two models.
Tasks Sentiment Analysis
Published 2019-06-10
URL https://arxiv.org/abs/1906.03820v1
PDF https://arxiv.org/pdf/1906.03820v1.pdf
PWC https://paperswithcode.com/paper/open-domain-targeted-sentiment-analysis-via
Repo https://github.com/huminghao16/SpanABSA
Framework pytorch
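
The extract-then-classify idea can be sketched as two small heads on top of a sentence encoder: one predicts the start and end of an opinion-target span, the other classifies polarity from the span representation. The snippet below is a minimal single-span sketch over pre-computed token states (a stand-in for BERT outputs); the paper's multi-span decoding heuristic and its pipeline, joint, and collapsed variants are not reproduced.

```python
import torch
import torch.nn as nn

class SpanExtractClassify(nn.Module):
    """Extract-then-classify sketch: predict target span boundaries from token
    encodings, then classify polarity from the span representation."""
    def __init__(self, hidden=768, n_polarities=3):
        super().__init__()
        self.boundary = nn.Linear(hidden, 2)                 # start / end logits
        self.polarity = nn.Linear(hidden, n_polarities)

    def forward(self, token_states):                         # (batch, seq_len, hidden)
        start_logits, end_logits = self.boundary(token_states).unbind(dim=-1)
        start = start_logits.argmax(dim=1)                   # greedy single-span decode
        end = torch.maximum(end_logits.argmax(dim=1), start)
        spans = [token_states[b, s:e + 1].mean(dim=0)        # average tokens in the span
                 for b, (s, e) in enumerate(zip(start, end))]
        return start_logits, end_logits, self.polarity(torch.stack(spans))
```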

VORNet: Spatio-temporally Consistent Video Inpainting for Object Removal

Title VORNet: Spatio-temporally Consistent Video Inpainting for Object Removal
Authors Ya-Liang Chang, Zhe Yu Liu, Winston Hsu
Abstract Video object removal is a challenging task in video processing that often requires massive human efforts. Given the mask of the foreground object in each frame, the goal is to complete (inpaint) the object region and generate a video without the target object. While recently deep learning based methods have achieved great success on the image inpainting task, they often lead to inconsistent results between frames when applied to videos. In this work, we propose a novel learning-based Video Object Removal Network (VORNet) to solve the video object removal task in a spatio-temporally consistent manner, by combining the optical flow warping and image-based inpainting model. Experiments are done on our Synthesized Video Object Removal (SVOR) dataset based on the YouTube-VOS video segmentation dataset, and both the objective and subjective evaluation demonstrate that our VORNet generates more spatially and temporally consistent videos compared with existing methods.
Tasks Image Inpainting, Optical Flow Estimation, Video Inpainting, Video Semantic Segmentation
Published 2019-04-14
URL http://arxiv.org/abs/1904.06726v1
PDF http://arxiv.org/pdf/1904.06726v1.pdf
PWC https://paperswithcode.com/paper/190406726
Repo https://github.com/amjltc295/VORNet-Spatio-temporally-Consistent-Video-Inpainting-for-Object-Removal
Framework none
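
The optical-flow-warping ingredient can be illustrated on its own: given backward flow from the current frame to the previous one, the previous frame is resampled with grid_sample to produce a temporally aligned candidate for the masked region. The sketch below shows only that standard warp; combining warped candidates with an image-inpainting branch, as VORNet does, is not shown.

```python
import torch
import torch.nn.functional as F

def warp_previous_frame(prev_frame, flow):
    """Warp the previous frame toward the current one using backward optical flow.
    prev_frame: (B, C, H, W), flow: (B, 2, H, W) in pixel units."""
    b, _, h, w = prev_frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    grid = torch.stack((xs, ys), dim=0).float().to(flow.device)    # (2, H, W) pixel grid
    coords = grid.unsqueeze(0) + flow                              # sampling locations
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0                  # normalize to [-1, 1]
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    norm_grid = torch.stack((coords_x, coords_y), dim=-1)          # (B, H, W, 2)
    return F.grid_sample(prev_frame, norm_grid, align_corners=True)
```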

TripleNet: Triple Attention Network for Multi-Turn Response Selection in Retrieval-based Chatbots

Title TripleNet: Triple Attention Network for Multi-Turn Response Selection in Retrieval-based Chatbots
Authors Wentao Ma, Yiming Cui, Nan Shao, Su He, Wei-Nan Zhang, Ting Liu, Shijin Wang, Guoping Hu
Abstract We observe that the importance of different utterances in the context for selecting the response usually depends on the current query. In this paper, we propose the model TripleNet to fully model the task with the triple <context, query, response> instead of the <context, response> pair used in previous works. The heart of TripleNet is a novel attention mechanism named triple attention, which models the relationships within the triple at four levels. The new mechanism updates the representation of each element based on attention with the other two, concurrently and symmetrically. We match the triple <C, Q, R> centered on the response, from the character to the context level, for prediction. Experimental results on two large-scale multi-turn response selection datasets show that the proposed model significantly outperforms the state-of-the-art methods. TripleNet source code is available at https://github.com/wtma/TripleNet
Tasks Conversational Response Selection
Published 2019-09-24
URL https://arxiv.org/abs/1909.10666v2
PDF https://arxiv.org/pdf/1909.10666v2.pdf
PWC https://paperswithcode.com/paper/triplenet-triple-attention-network-for-multi
Repo https://github.com/wtma/TripleNet
Framework none

KERMIT: Generative Insertion-Based Modeling for Sequences

Title KERMIT: Generative Insertion-Based Modeling for Sequences
Authors William Chan, Nikita Kitaev, Kelvin Guu, Mitchell Stern, Jakob Uszkoreit
Abstract We present KERMIT, a simple insertion-based approach to generative modeling for sequences and sequence pairs. KERMIT models the joint distribution and its decompositions (i.e., marginals and conditionals) using a single neural network and, unlike much prior work, does not rely on a prespecified factorization of the data distribution. During training, one can feed KERMIT paired data $(x, y)$ to learn the joint distribution $p(x, y)$, and optionally mix in unpaired data $x$ or $y$ to refine the marginals $p(x)$ or $p(y)$. During inference, we have access to the conditionals $p(x \mid y)$ and $p(y \mid x)$ in both directions. We can also sample from the joint distribution or the marginals. The model supports both serial fully autoregressive decoding and parallel partially autoregressive decoding, with the latter exhibiting an empirically logarithmic runtime. We demonstrate through experiments in machine translation, representation learning, and zero-shot cloze question answering that our unified approach is capable of matching or exceeding the performance of dedicated state-of-the-art systems across a wide range of tasks without the need for problem-specific architectural adaptation.
Tasks Machine Translation, Question Answering, Representation Learning
Published 2019-06-04
URL https://arxiv.org/abs/1906.01604v1
PDF https://arxiv.org/pdf/1906.01604v1.pdf
PWC https://paperswithcode.com/paper/kermit-generative-insertion-based-modeling
Repo https://github.com/rusiaaman/PCPM
Framework pytorch

Text Summarization with Pretrained Encoders

Title Text Summarization with Pretrained Encoders
Authors Yang Liu, Mirella Lapata
Abstract Bidirectional Encoder Representations from Transformers (BERT) represents the latest incarnation of pretrained language models which have recently advanced a wide range of natural language processing tasks. In this paper, we showcase how BERT can be usefully applied in text summarization and propose a general framework for both extractive and abstractive models. We introduce a novel document-level encoder based on BERT which is able to express the semantics of a document and obtain representations for its sentences. Our extractive model is built on top of this encoder by stacking several inter-sentence Transformer layers. For abstractive summarization, we propose a new fine-tuning schedule which adopts different optimizers for the encoder and the decoder as a means of alleviating the mismatch between the two (the former is pretrained while the latter is not). We also demonstrate that a two-staged fine-tuning approach can further boost the quality of the generated summaries. Experiments on three datasets show that our model achieves state-of-the-art results across the board in both extractive and abstractive settings. Our code is available at https://github.com/nlpyang/PreSumm
Tasks Abstractive Text Summarization, Document Summarization, Extractive Document Summarization, Text Summarization
Published 2019-08-22
URL https://arxiv.org/abs/1908.08345v2
PDF https://arxiv.org/pdf/1908.08345v2.pdf
PWC https://paperswithcode.com/paper/text-summarization-with-pretrained-encoders
Repo https://github.com/nlpyang/PreSumm
Framework pytorch
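
The fine-tuning schedule with different optimizers for the pretrained encoder and the randomly initialized decoder can be sketched as two Adam optimizers with separate learning rates and warmup lengths. The values below follow a commonly cited configuration for this setup but should be treated as illustrative rather than the paper's exact hyperparameters.

```python
import torch

def build_optimizers(encoder, decoder):
    """Two-optimizer fine-tuning sketch: a smaller learning rate and longer
    warmup for the pretrained encoder, a larger one for the fresh decoder."""
    enc_opt = torch.optim.Adam(encoder.parameters(), lr=2e-3, betas=(0.9, 0.999))
    dec_opt = torch.optim.Adam(decoder.parameters(), lr=2e-1, betas=(0.9, 0.999))
    # Noam-style warmup schedules (factors multiply the base learning rates).
    enc_sched = torch.optim.lr_scheduler.LambdaLR(
        enc_opt, lambda step: min((step + 1) ** -0.5, (step + 1) * 20000 ** -1.5))
    dec_sched = torch.optim.lr_scheduler.LambdaLR(
        dec_opt, lambda step: min((step + 1) ** -0.5, (step + 1) * 10000 ** -1.5))
    return (enc_opt, enc_sched), (dec_opt, dec_sched)
```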

RoNIN: Robust Neural Inertial Navigation in the Wild: Benchmark, Evaluations, and New Methods

Title RoNIN: Robust Neural Inertial Navigation in the Wild: Benchmark, Evaluations, and New Methods
Authors Hang Yan, Sachini Herath, Yasutaka Furukawa
Abstract This paper sets a new foundation for data-driven inertial navigation research, where the task is the estimation of positions and orientations of a moving subject from a sequence of IMU sensor measurements. More concretely, the paper presents 1) a new benchmark containing more than 40 hours of IMU sensor data from 100 human subjects with ground-truth 3D trajectories under natural human motions; 2) novel neural inertial navigation architectures, making significant improvements for challenging motion cases; and 3) qualitative and quantitative evaluations of the competing methods over three inertial navigation benchmarks. We will share the code and data to promote further research.
Tasks
Published 2019-05-30
URL https://arxiv.org/abs/1905.12853v1
PDF https://arxiv.org/pdf/1905.12853v1.pdf
PWC https://paperswithcode.com/paper/ronin-robust-neural-inertial-navigation-in
Repo https://github.com/Sachini/ronin
Framework pytorch

Multi-Sample Dropout for Accelerated Training and Better Generalization

Title Multi-Sample Dropout for Accelerated Training and Better Generalization
Authors Hiroshi Inoue
Abstract Dropout is a simple but efficient regularization technique for achieving better generalization of deep neural networks (DNNs); hence it is widely used in tasks based on DNNs. During training, dropout randomly discards a portion of the neurons to avoid overfitting. This paper presents an enhanced dropout technique, which we call multi-sample dropout, for both accelerating training and improving generalization over the original dropout. The original dropout creates a randomly selected subset (called a dropout sample) from the input in each training iteration while the multi-sample dropout creates multiple dropout samples. The loss is calculated for each sample, and then the sample losses are averaged to obtain the final loss. This technique can be easily implemented without implementing a new operator by duplicating a part of the network after the dropout layer while sharing the weights among the duplicated fully connected layers. Experimental results showed that multi-sample dropout significantly accelerates training by reducing the number of iterations until convergence for image classification tasks using the ImageNet, CIFAR-10, CIFAR-100, and SVHN datasets. Multi-sample dropout does not significantly increase computation cost per iteration because most of the computation time is consumed in the convolution layers before the dropout layer, which are not duplicated. Experiments also showed that networks trained using multi-sample dropout achieved lower error rates and losses for both the training set and validation set.
Tasks Image Classification
Published 2019-05-23
URL https://arxiv.org/abs/1905.09788v2
PDF https://arxiv.org/pdf/1905.09788v2.pdf
PWC https://paperswithcode.com/paper/multi-sample-dropout-for-accelerated-training
Repo https://github.com/KushajveerSingh/Deep-Learning-Notebooks
Framework pytorch
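
The technique itself is compact: apply dropout and the shared classifier head several times to the same features, compute a loss for each dropout sample, and average. A minimal PyTorch sketch (the number of samples and the dropout rate are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiSampleDropoutHead(nn.Module):
    """Classification head with multi-sample dropout: dropout and the shared
    fully connected layer are applied several times to the same features and
    the per-sample losses are averaged."""
    def __init__(self, feat_dim, n_classes, n_samples=8, p=0.5):
        super().__init__()
        self.fc = nn.Linear(feat_dim, n_classes)   # weights shared across samples
        self.n_samples, self.p = n_samples, p

    def forward(self, feats, labels):
        losses = [F.cross_entropy(self.fc(F.dropout(feats, self.p, self.training)),
                                  labels)
                  for _ in range(self.n_samples)]
        return torch.stack(losses).mean()
```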