Paper Group AWR 367
“Why did you do that?": Explaining black box models with Inductive Synthesis
Title | “Why did you do that?": Explaining black box models with Inductive Synthesis |
Authors | Görkem Paçacı, David Johnson, Steve McKeever, Andreas Hamfelt |
Abstract | By their nature, black box models are opaque, which makes it challenging to generate explanations for their responses to stimuli. Explaining black box models has become increasingly important given the prevalence of AI and ML systems and the need to build legal and regulatory frameworks around them. Such explanations can also increase trust in these uncertain systems. In our paper we present RICE, a method for generating explanations of the behaviour of black box models by (1) probing a model to extract model output examples using sensitivity analysis; (2) applying CNPInduce, a method for inductive logic program synthesis, to generate logic programs based on critical input-output pairs; and (3) interpreting the target program as a human-readable explanation. We demonstrate the application of our method by generating explanations of an artificial neural network trained to follow simple traffic rules in a hypothetical self-driving car simulation. We conclude with a discussion of the scalability and usability of our approach and its potential applications to explanation-critical scenarios. |
Tasks | Program Synthesis |
Published | 2019-04-17 |
URL | http://arxiv.org/abs/1904.09273v1 |
PDF | http://arxiv.org/pdf/1904.09273v1.pdf |
PWC | https://paperswithcode.com/paper/why-did-you-do-that-explaining-black-box |
Repo | https://github.com/UppsalaIM/rice |
Framework | tf |
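For a concrete picture of step (1) in the abstract above, here is a minimal sketch of probing a black box with sensitivity analysis to collect critical input-output pairs. The perturbation scheme, the `black_box` callable, and all parameter names are illustrative assumptions, not the authors' CNPInduce pipeline.

```python
import numpy as np

def probe_black_box(black_box, seeds, noise_scale=0.1, n_perturbations=50, rng=None):
    """Collect input-output pairs where small perturbations flip the model's decision.

    `black_box` is any callable mapping a feature vector to a discrete action/label.
    The returned critical pairs approximate the decision boundary and could feed a
    later program-synthesis step such as the one described in the paper.
    """
    rng = rng or np.random.default_rng(0)
    critical_pairs = []
    for x in seeds:
        base = black_box(x)
        for _ in range(n_perturbations):
            x_new = x + rng.normal(scale=noise_scale, size=x.shape)
            y_new = black_box(x_new)
            if y_new != base:  # output changed under a small perturbation
                critical_pairs.append((x_new.copy(), y_new))
    return critical_pairs

# Hypothetical usage with a toy "traffic rule" classifier:
# pairs = probe_black_box(lambda x: int(x[0] > 0.5), seeds=[np.array([0.45]), np.array([0.55])])
```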
On the Transfer of Inductive Bias from Simulation to the Real World: a New Disentanglement Dataset
Title | On the Transfer of Inductive Bias from Simulation to the Real World: a New Disentanglement Dataset |
Authors | Muhammad Waleed Gondal, Manuel Wüthrich, Đorđe Miladinović, Francesco Locatello, Martin Breidt, Valentin Volchkov, Joel Akpo, Olivier Bachem, Bernhard Schölkopf, Stefan Bauer |
Abstract | Learning meaningful and compact representations with disentangled semantic aspects is considered to be of key importance in representation learning. Since real-world data is notoriously costly to collect, many recent state-of-the-art disentanglement models have heavily relied on synthetic toy data-sets. In this paper, we propose a novel data-set which consists of over one million images of physical 3D objects with seven factors of variation, such as object color, shape, size and position. In order to be able to control all the factors of variation precisely, we built an experimental platform where the objects are being moved by a robotic arm. In addition, we provide two more datasets which consist of simulations of the experimental setup. These datasets provide for the first time the possibility to systematically investigate how well different disentanglement methods perform on real data in comparison to simulation, and how simulated data can be leveraged to build better representations of the real world. We provide a first experimental study of these questions and our results indicate that learned models transfer poorly, but that model and hyperparameter selection is an effective means of transferring information to the real world. |
Tasks | Representation Learning |
Published | 2019-06-07 |
URL | https://arxiv.org/abs/1906.03292v3 |
PDF | https://arxiv.org/pdf/1906.03292v3.pdf |
PWC | https://paperswithcode.com/paper/on-the-transfer-of-inductive-bias-from |
Repo | https://github.com/causality-and-transfer-learning/disentanglement_dataset |
Framework | none |
VideoGraph: Recognizing Minutes-Long Human Activities in Videos
Title | VideoGraph: Recognizing Minutes-Long Human Activities in Videos |
Authors | Noureldien Hussein, Efstratios Gavves, Arnold W. M. Smeulders |
Abstract | Many human activities take minutes to unfold. To represent them, related works opt for statistical pooling, which neglects the temporal structure. Others opt for convolutional methods, such as CNNs and Non-Local. While successful in learning temporal concepts, they fall short of modeling minutes-long temporal dependencies. We propose VideoGraph, a method to achieve the best of both worlds: represent minutes-long human activities and learn their underlying temporal structure. VideoGraph learns a graph-based representation for human activities. The graph, its nodes and edges are learned entirely from video datasets, making VideoGraph applicable to problems without node-level annotation. The result is improvements over related works on the Epic-Kitchens and Breakfast benchmarks. In addition, we demonstrate that VideoGraph is able to learn the temporal structure of human activities in minutes-long videos. |
Tasks | |
Published | 2019-05-13 |
URL | https://arxiv.org/abs/1905.05143v2 |
PDF | https://arxiv.org/pdf/1905.05143v2.pdf |
PWC | https://paperswithcode.com/paper/videograph-recognizing-minutes-long-human |
Repo | https://github.com/noureldien/videograph |
Framework | tf |
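A minimal sketch of the central idea in the abstract above: learn a set of latent graph nodes from data and softly assign per-timestep clip features to them, so no node-level annotation is needed. The module name, `num_nodes`, and feature sizes are assumptions; the full VideoGraph model adds graph convolutions and temporal modeling on top.

```python
import torch
import torch.nn as nn

class LatentNodeAssignment(nn.Module):
    """Softly assign per-timestep clip features to a set of learned latent nodes.

    Nodes are free embeddings learned end-to-end; each timestep feature is
    distributed over them by similarity. This is only a sketch of the graph-node
    idea, not the authors' full architecture.
    """
    def __init__(self, feat_dim=1024, num_nodes=128):
        super().__init__()
        self.nodes = nn.Parameter(torch.randn(num_nodes, feat_dim) * 0.01)

    def forward(self, x):                                    # x: (batch, timesteps, feat_dim)
        sim = torch.einsum('btf,nf->btn', x, self.nodes)     # similarity to each latent node
        assign = sim.softmax(dim=-1)                         # soft assignment weights
        node_feats = torch.einsum('btn,btf->bnf', assign, x) # per-node aggregated features
        return node_feats                                    # (batch, num_nodes, feat_dim)

# feats = torch.randn(2, 64, 1024)            # e.g. 64 timesteps of clip features
# nodes = LatentNodeAssignment()(feats)       # (2, 128, 1024)
```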
Interaction Relational Network for Mutual Action Recognition
Title | Interaction Relational Network for Mutual Action Recognition |
Authors | Mauricio Perez, Jun Liu, Alex C. Kot |
Abstract | Person-person mutual action recognition (also referred to as interaction recognition) is an important research branch of human activity analysis. Current solutions in the field are mainly dominated by CNNs, GCNs and LSTMs. These approaches often rely on complicated architectures and mechanisms that embed the relationships between the two persons in the architecture itself, to ensure that the interaction patterns can be properly learned. In this paper, we propose a simpler yet very powerful architecture, named Interaction Relational Network (IRN), which utilizes minimal prior knowledge about the structure of the human body. We drive the network to identify by itself how to relate the body parts of the interacting individuals. In order to better represent the interaction, we define two different relationships, leading to specialized architectures and models for each. These multiple relationship models are then fused into a single architecture, in order to leverage both streams of information and further enhance the relational reasoning capability. Furthermore, we define structured pair-wise operations to extract meaningful extra information from each pair of joints – distance and motion. Ultimately, coupled with an LSTM, our IRN is capable of strong sequential relational reasoning. These extensions can also be valuable to other problems that require sophisticated relational reasoning. Our solution achieves state-of-the-art performance on the traditional interaction recognition datasets SBU and UT, as well as on the mutual actions from the large-scale NTU RGB+D and NTU RGB+D 120 datasets. |
Tasks | Relational Reasoning |
Published | 2019-10-11 |
URL | https://arxiv.org/abs/1910.04963v1 |
PDF | https://arxiv.org/pdf/1910.04963v1.pdf |
PWC | https://paperswithcode.com/paper/interaction-relational-network-for-mutual |
Repo | https://github.com/mauriciolp/inter-rel-net |
Framework | tf |
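The relational core described in the abstract above can be sketched as a relation-network-style module over all (joint of person A, joint of person B) pairs, with the pairwise distance appended as one of the structured pair-wise features. This single-frame sketch omits the LSTM, motion features, and the fusion of the two relationship streams; all layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class InteractionRelationSketch(nn.Module):
    """Relation reasoning over joint pairs from two interacting people: a shared MLP
    g scores each pair, pair scores are summed, and a second MLP f classifies the
    mutual action. A simplified sketch of the IRN idea, not the full model."""
    def __init__(self, joint_dim=3, hidden=128, num_classes=8):
        super().__init__()
        self.g = nn.Sequential(nn.Linear(2 * joint_dim + 1, hidden), nn.ReLU(),
                               nn.Linear(hidden, hidden), nn.ReLU())
        self.f = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                               nn.Linear(hidden, num_classes))

    def forward(self, joints_a, joints_b):       # each: (batch, num_joints, joint_dim)
        b, j, d = joints_a.shape
        a = joints_a.unsqueeze(2).expand(b, j, j, d)     # every joint of person A ...
        c = joints_b.unsqueeze(1).expand(b, j, j, d)     # ... paired with every joint of person B
        dist = (a - c).norm(dim=-1, keepdim=True)        # structured pair-wise distance feature
        pair = torch.cat([a, c, dist], dim=-1)
        rel = self.g(pair).sum(dim=(1, 2))               # aggregate over all joint pairs
        return self.f(rel)

# logits = InteractionRelationSketch()(torch.randn(4, 25, 3), torch.randn(4, 25, 3))
```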
Relation Network for Multi-label Aerial Image Classification
Title | Relation Network for Multi-label Aerial Image Classification |
Authors | Yuansheng Hua, Lichao Mou, Xiao Xiang Zhu |
Abstract | Multi-label classification plays a momentous role in perceiving intricate contents of an aerial image and has triggered several related studies in recent years. However, most of them devote little effort to exploiting label relations, while such dependencies are crucial for making accurate predictions. Although an LSTM layer can be introduced to model such label dependencies in a chain-propagation manner, its effectiveness is questionable when certain labels are improperly inferred. To address this, we propose a novel multi-label classification network for aerial images, the attention-aware label relational reasoning network. Particularly, our network consists of three elemental modules: 1) a label-wise feature parcel learning module, 2) an attentional region extraction module, and 3) a label relational inference module. To be more specific, the label-wise feature parcel learning module is designed for extracting high-level label-specific features. The attentional region extraction module aims at localizing discriminative regions in these features and yielding attentional label-specific features. The label relational inference module finally predicts label existence using label relations reasoned from the outputs of the previous module. The proposed network is characterized by its capacity to extract discriminative label-wise features in a proposal-free way and to reason about label relations naturally and interpretably. In our experiments, we evaluate the proposed model on the UCM multi-label dataset and a newly produced dataset, the AID multi-label dataset. Quantitative and qualitative results on these two datasets demonstrate the effectiveness of our model. To facilitate progress in multi-label aerial image classification, the AID multi-label dataset will be made publicly available. |
Tasks | Image Classification, Multi-Label Classification, Relational Reasoning |
Published | 2019-07-16 |
URL | https://arxiv.org/abs/1907.07274v3 |
PDF | https://arxiv.org/pdf/1907.07274v3.pdf |
PWC | https://paperswithcode.com/paper/relation-network-for-multi-label-aerial-image |
Repo | https://github.com/Hua-YS/AID-Multilabel-Dataset |
Framework | none |
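A compressed, hedged sketch of the three modules listed in the abstract above: label-wise feature parcels from a 1x1 convolution, spatial attention that pools each parcel into a label vector, and a pairwise relation MLP that scores each label from its relations to the others. All layer sizes and the aggregation choices are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class LabelRelationSketch(nn.Module):
    """Label-wise parcels -> attentional pooling -> pairwise label relation scores."""
    def __init__(self, in_ch=512, num_labels=17, dim=256):
        super().__init__()
        self.parcels = nn.Conv2d(in_ch, num_labels * dim, kernel_size=1)   # label-wise feature parcels
        self.attn = nn.Conv2d(dim, 1, kernel_size=1)                        # attentional region extraction
        self.relation = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))
        self.num_labels, self.dim = num_labels, dim

    def forward(self, feat_map):                          # (batch, in_ch, H, W) backbone features
        b, _, h, w = feat_map.shape
        p = self.parcels(feat_map).view(b * self.num_labels, self.dim, h, w)
        a = self.attn(p).flatten(2).softmax(dim=-1)                             # (b*L, 1, H*W)
        v = (p.flatten(2) * a).sum(-1).view(b, self.num_labels, self.dim)       # attended label vectors
        i = v.unsqueeze(2).expand(-1, -1, self.num_labels, -1)
        j = v.unsqueeze(1).expand(-1, self.num_labels, -1, -1)
        scores = self.relation(torch.cat([i, j], dim=-1)).squeeze(-1)           # pairwise label relations
        return scores.mean(dim=2)                          # per-label logits via relational reasoning
```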
mu-Forcing: Training Variational Recurrent Autoencoders for Text Generation
Title | mu-Forcing: Training Variational Recurrent Autoencoders for Text Generation |
Authors | Dayiheng Liu, Xu Yang, Feng He, Yuanyuan Chen, Jiancheng Lv |
Abstract | It has been previously observed that training Variational Recurrent Autoencoders (VRAE) for text generation suffers from a serious uninformative-latent-variables problem: the model collapses into a plain language model that totally ignores the latent variables and can only generate repetitive and dull samples. In this paper, we explore the reason behind this issue and propose an effective regularizer-based approach to address it. The proposed method directly injects extra constraints on the posteriors of the latent variables into the learning process of the VRAE, which can flexibly and stably control the trade-off between the KL term and the reconstruction term, making the model learn dense and meaningful latent representations. The experimental results show that the proposed method outperforms several strong baselines, makes the model learn interpretable latent variables, and generates diverse meaningful sentences. Furthermore, the proposed method performs well without using other strategies, such as KL annealing. |
Tasks | Language Modelling, Text Generation |
Published | 2019-05-24 |
URL | https://arxiv.org/abs/1905.10072v1 |
PDF | https://arxiv.org/pdf/1905.10072v1.pdf |
PWC | https://paperswithcode.com/paper/mu-forcing-training-variational-recurrent |
Repo | https://github.com/dayihengliu/Mu-Forcing-VRAE |
Framework | tf |
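The abstract above describes injecting extra constraints on the latent posteriors so the KL term cannot collapse. One simple reading, shown below purely as an illustration, is a free-bits-style floor on the KL term; this is an assumption and not necessarily the paper's exact mu-forcing objective.

```python
import torch
import torch.nn.functional as F

def vrae_loss_with_kl_floor(recon_logits, targets, mu, logvar, kl_floor=3.0):
    """VRAE loss with an explicit constraint on the KL term (a sketch, not the
    paper's exact objective): keeping the KL above `kl_floor` nats removes the
    pressure to drive it to zero, so the decoder cannot simply ignore the latent code.

    recon_logits: (batch, seq, vocab), targets: (batch, seq), mu/logvar: (batch, latent).
    """
    recon = F.cross_entropy(recon_logits.transpose(1, 2), targets, reduction='mean')
    kl = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1))
    constrained_kl = torch.clamp(kl, min=kl_floor)   # free-bits-style floor on the KL term
    return recon + constrained_kl
```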
FIESTA: Fast IdEntification of State-of-The-Art models using adaptive bandit algorithms
Title | FIESTA: Fast IdEntification of State-of-The-Art models using adaptive bandit algorithms |
Authors | Henry B. Moss, Andrew Moore, David S. Leslie, Paul Rayson |
Abstract | We present FIESTA, a model selection approach that significantly reduces the computational resources required to reliably identify state-of-the-art performance from large collections of candidate models. Despite being known to produce unreliable comparisons, it is still common practice to compare model evaluations based on single choices of random seeds. We show that reliable model selection also requires evaluations based on multiple train-test splits (contrary to common practice in many shared tasks). Using bandit theory from the statistics literature, we are able to adaptively determine appropriate numbers of data splits and random seeds used to evaluate each model, focusing computational resources on the evaluation of promising models whilst avoiding wasting evaluations on models with lower performance. Furthermore, our user-friendly Python implementation produces confidence guarantees of correctly selecting the optimal model. We evaluate our algorithms by selecting between 8 target-dependent sentiment analysis methods using dramatically fewer model evaluations than current model selection approaches. |
Tasks | Model Selection, Sentiment Analysis |
Published | 2019-06-28 |
URL | https://arxiv.org/abs/1906.12230v1 |
PDF | https://arxiv.org/pdf/1906.12230v1.pdf |
PWC | https://paperswithcode.com/paper/fiesta-fast-identification-of-state-of-the |
Repo | https://github.com/apmoore1/fiesta |
Framework | none |
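To make the adaptive-evaluation idea in the abstract above concrete, here is a plain-Python racing loop: models whose confidence interval falls below the current leader's are eliminated, and remaining evaluation budget is spent only on survivors. This is a sketch of the general idea, not the FIESTA package's bandit algorithms or API; `evaluate_fns` and all thresholds are assumptions.

```python
import math
import random
import statistics

def adaptive_model_selection(evaluate_fns, min_evals=5, max_evals=200, confidence=2.0):
    """`evaluate_fns` maps model name -> callable that trains/evaluates the model on a
    fresh random split and seed, returning one score (higher is better)."""
    scores = {name: [fn() for _ in range(min_evals)] for name, fn in evaluate_fns.items()}
    alive = set(evaluate_fns)
    for _ in range(max_evals):
        bounds = {}
        for name in alive:
            s = scores[name]
            half = confidence * statistics.stdev(s) / math.sqrt(len(s))
            bounds[name] = (statistics.mean(s) - half, statistics.mean(s) + half)
        leader = max(alive, key=lambda n: statistics.mean(scores[n]))
        # eliminate models whose upper bound falls below the leader's lower bound
        alive = {n for n in alive if bounds[n][1] >= bounds[leader][0]}
        if len(alive) == 1:
            break
        pick = random.choice(sorted(alive))            # spend one more evaluation on a survivor
        scores[pick].append(evaluate_fns[pick]())
    return max(alive, key=lambda n: statistics.mean(scores[n]))
```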
Additive Margin SincNet for Speaker Recognition
Title | Additive Margin SincNet for Speaker Recognition |
Authors | João Antônio Chagas Nunes, David Macêdo, Cleber Zanchettin |
Abstract | Speaker recognition is a challenging task with essential applications such as authentication, automation, and security. SincNet is a new deep learning based model which has produced promising results on this task. When training deep learning systems, the loss function is essential to the network performance. The Softmax loss function is widely used in deep learning methods, but it is not the best choice for all kinds of problems. For distance-based problems, a new Softmax-based loss function called Additive Margin Softmax (AM-Softmax) is proving to be a better choice than the traditional Softmax. AM-Softmax introduces a margin of separation between the classes that forces samples from the same class to be closer to each other and also maximizes the distance between classes. In this paper, we propose a new approach for speaker recognition systems called AM-SincNet, which is based on SincNet but uses an improved AM-Softmax layer. The proposed method is evaluated on the TIMIT dataset and obtains an improvement of approximately 40% in Frame Error Rate compared to SincNet. |
Tasks | Speaker Recognition |
Published | 2019-01-28 |
URL | http://arxiv.org/abs/1901.10826v1 |
PDF | http://arxiv.org/pdf/1901.10826v1.pdf |
PWC | https://paperswithcode.com/paper/additive-margin-sincnet-for-speaker |
Repo | https://github.com/joaoantoniocn/AM-SincNet |
Framework | pytorch |
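The AM-Softmax layer that the paper attaches to SincNet follows the standard additive-margin formulation: logits are scaled cosine similarities, with a margin subtracted from the target-class cosine before cross-entropy. The values s=30 and m=0.35 below are common defaults, not necessarily the paper's settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AMSoftmaxLoss(nn.Module):
    """Additive Margin Softmax loss over normalized embeddings and class weights."""
    def __init__(self, feat_dim, num_classes, s=30.0, m=0.35):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.s, self.m = s, m

    def forward(self, embeddings, labels):
        w = F.normalize(self.weight, dim=1)
        x = F.normalize(embeddings, dim=1)
        cos = x @ w.t()                                           # cosine similarity to each class
        margin = F.one_hot(labels, cos.size(1)).float() * self.m  # subtract m only for the target class
        return F.cross_entropy(self.s * (cos - margin), labels)

# loss = AMSoftmaxLoss(512, 462)(torch.randn(8, 512), torch.randint(0, 462, (8,)))
```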
Open-Domain Targeted Sentiment Analysis via Span-Based Extraction and Classification
Title | Open-Domain Targeted Sentiment Analysis via Span-Based Extraction and Classification |
Authors | Minghao Hu, Yuxing Peng, Zhen Huang, Dongsheng Li, Yiwei Lv |
Abstract | Open-domain targeted sentiment analysis aims to detect opinion targets along with their sentiment polarities from a sentence. Prior work typically formulates this task as a sequence tagging problem. However, such formulation suffers from problems such as huge search space and sentiment inconsistency. To address these problems, we propose a span-based extract-then-classify framework, where multiple opinion targets are directly extracted from the sentence under the supervision of target span boundaries, and corresponding polarities are then classified using their span representations. We further investigate three approaches under this framework, namely the pipeline, joint, and collapsed models. Experiments on three benchmark datasets show that our approach consistently outperforms the sequence tagging baseline. Moreover, we find that the pipeline model achieves the best performance compared with the other two models. |
Tasks | Sentiment Analysis |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.03820v1 |
PDF | https://arxiv.org/pdf/1906.03820v1.pdf |
PWC | https://paperswithcode.com/paper/open-domain-targeted-sentiment-analysis-via |
Repo | https://github.com/huminghao16/SpanABSA |
Framework | pytorch |
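A minimal sketch of the extract-then-classify framework described in the abstract above: boundary (start/end) logits locate an opinion target, and the pooled span representation is classified into a polarity. The encoder is a stand-in for BERT, decoding is greedy single-span, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class SpanExtractClassify(nn.Module):
    """Span boundary extraction followed by polarity classification over the span."""
    def __init__(self, hidden=768, num_polarities=3):
        super().__init__()
        self.boundary = nn.Linear(hidden, 2)                # start and end logits per token
        self.polarity = nn.Linear(hidden, num_polarities)

    def forward(self, token_states, span=None):             # token_states: (batch, seq, hidden)
        start_logits, end_logits = self.boundary(token_states).unbind(dim=-1)
        if span is None:                                    # greedy decoding of one span
            start = start_logits.argmax(dim=1)
            end = torch.maximum(end_logits.argmax(dim=1), start)
            span = (start, end)
        start, end = span
        pooled = torch.stack([token_states[b, s:e + 1].mean(dim=0)
                              for b, (s, e) in enumerate(zip(start.tolist(), end.tolist()))])
        return start_logits, end_logits, self.polarity(pooled)
```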
VORNet: Spatio-temporally Consistent Video Inpainting for Object Removal
Title | VORNet: Spatio-temporally Consistent Video Inpainting for Object Removal |
Authors | Ya-Liang Chang, Zhe Yu Liu, Winston Hsu |
Abstract | Video object removal is a challenging task in video processing that often requires massive human efforts. Given the mask of the foreground object in each frame, the goal is to complete (inpaint) the object region and generate a video without the target object. While recently deep learning based methods have achieved great success on the image inpainting task, they often lead to inconsistent results between frames when applied to videos. In this work, we propose a novel learning-based Video Object Removal Network (VORNet) to solve the video object removal task in a spatio-temporally consistent manner, by combining the optical flow warping and image-based inpainting model. Experiments are done on our Synthesized Video Object Removal (SVOR) dataset based on the YouTube-VOS video segmentation dataset, and both the objective and subjective evaluation demonstrate that our VORNet generates more spatially and temporally consistent videos compared with existing methods. |
Tasks | Image Inpainting, Optical Flow Estimation, Video Inpainting, Video Semantic Segmentation |
Published | 2019-04-14 |
URL | http://arxiv.org/abs/1904.06726v1 |
PDF | http://arxiv.org/pdf/1904.06726v1.pdf |
PWC | https://paperswithcode.com/paper/190406726 |
Repo | https://github.com/amjltc295/VORNet-Spatio-temporally-Consistent-Video-Inpainting-for-Object-Removal |
Framework | none |
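The combination of flow warping and image-based inpainting described in the abstract above can be sketched as follows: warp the previous (already completed) frame into the current one with optical flow, paste the warped content inside the removal mask, and let an image inpainting model refine what remains. The `inpaint_fn` callable and the simple paste-and-refine blending are assumptions; the real VORNet learns how to combine these candidates.

```python
import torch
import torch.nn.functional as F

def warp_then_inpaint(current_frame, prev_frame, flow, hole_mask, inpaint_fn):
    """Shapes: frames (B, C, H, W), flow (B, 2, H, W) in pixels, hole_mask (B, 1, H, W), 1 = missing."""
    b, _, h, w = current_frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    base = torch.stack((xs, ys), dim=-1).to(current_frame)             # (H, W, 2), x then y
    grid = base.unsqueeze(0) + flow.permute(0, 2, 3, 1)                # shift sampling locations by flow
    grid = torch.stack([2.0 * grid[..., 0] / (w - 1) - 1.0,            # normalize to [-1, 1] for grid_sample
                        2.0 * grid[..., 1] / (h - 1) - 1.0], dim=-1)
    warped = F.grid_sample(prev_frame, grid, align_corners=True)
    coarse = current_frame * (1 - hole_mask) + warped * hole_mask      # flow-warped content inside the hole
    return inpaint_fn(coarse, hole_mask)                               # image inpainting refines the result
```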
TripleNet: Triple Attention Network for Multi-Turn Response Selection in Retrieval-based Chatbots
Title | TripleNet: Triple Attention Network for Multi-Turn Response Selection in Retrieval-based Chatbots |
Authors | Wentao Ma, Yiming Cui, Nan Shao, Su He, Wei-Nan Zhang, Ting Liu, Shijin Wang, Guoping Hu |
Abstract | We observe that the importance of different utterances in the context for selecting the response usually depends on the current query. In this paper, we propose the model TripleNet to fully model the task with the triple <context, query, response> instead of the <context, response> pair used in previous works. The heart of TripleNet is a novel attention mechanism named triple attention, which models the relationships within the triple at four levels. The new mechanism updates the representation of each element based on attention with the other two, concurrently and symmetrically. We match the triple <C, Q, R> centered on the response, from the character level to the context level, for prediction. Experimental results on two large-scale multi-turn response selection datasets show that the proposed model significantly outperforms the state-of-the-art methods. TripleNet source code is available at https://github.com/wtma/TripleNet |
Tasks | Conversational Response Selection |
Published | 2019-09-24 |
URL | https://arxiv.org/abs/1909.10666v2 |
PDF | https://arxiv.org/pdf/1909.10666v2.pdf |
PWC | https://paperswithcode.com/paper/triplenet-triple-attention-network-for-multi |
Repo | https://github.com/wtma/TripleNet |
Framework | none |
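A schematic reading of the "update each element from the other two, concurrently and symmetrically" idea in the abstract above, using one round of cross-attention per element. This is only an illustration of the symmetric-update pattern; the actual TripleNet applies it hierarchically at four levels, and the layer choices below are assumptions.

```python
import torch
import torch.nn as nn

class TripleAttentionSketch(nn.Module):
    """Context, query and response each attend over the other two and fuse the result."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def _update(self, target, other_a, other_b):
        others = torch.cat([other_a, other_b], dim=1)        # attend over the other two elements
        attended, _ = self.attn(target, others, others)
        return self.fuse(torch.cat([target, attended], dim=-1))

    def forward(self, context, query, response):             # each: (batch, len, dim)
        return (self._update(context, query, response),
                self._update(query, context, response),
                self._update(response, context, query))
```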
KERMIT: Generative Insertion-Based Modeling for Sequences
Title | KERMIT: Generative Insertion-Based Modeling for Sequences |
Authors | William Chan, Nikita Kitaev, Kelvin Guu, Mitchell Stern, Jakob Uszkoreit |
Abstract | We present KERMIT, a simple insertion-based approach to generative modeling for sequences and sequence pairs. KERMIT models the joint distribution and its decompositions (i.e., marginals and conditionals) using a single neural network and, unlike much prior work, does not rely on a prespecified factorization of the data distribution. During training, one can feed KERMIT paired data $(x, y)$ to learn the joint distribution $p(x, y)$, and optionally mix in unpaired data $x$ or $y$ to refine the marginals $p(x)$ or $p(y)$. During inference, we have access to the conditionals $p(x \mid y)$ and $p(y \mid x)$ in both directions. We can also sample from the joint distribution or the marginals. The model supports both serial fully autoregressive decoding and parallel partially autoregressive decoding, with the latter exhibiting an empirically logarithmic runtime. We demonstrate through experiments in machine translation, representation learning, and zero-shot cloze question answering that our unified approach is capable of matching or exceeding the performance of dedicated state-of-the-art systems across a wide range of tasks without the need for problem-specific architectural adaptation. |
Tasks | Machine Translation, Question Answering, Representation Learning |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01604v1 |
PDF | https://arxiv.org/pdf/1906.01604v1.pdf |
PWC | https://paperswithcode.com/paper/kermit-generative-insertion-based-modeling |
Repo | https://github.com/rusiaaman/PCPM |
Framework | pytorch |
Text Summarization with Pretrained Encoders
Title | Text Summarization with Pretrained Encoders |
Authors | Yang Liu, Mirella Lapata |
Abstract | Bidirectional Encoder Representations from Transformers (BERT) represents the latest incarnation of pretrained language models which have recently advanced a wide range of natural language processing tasks. In this paper, we showcase how BERT can be usefully applied in text summarization and propose a general framework for both extractive and abstractive models. We introduce a novel document-level encoder based on BERT which is able to express the semantics of a document and obtain representations for its sentences. Our extractive model is built on top of this encoder by stacking several inter-sentence Transformer layers. For abstractive summarization, we propose a new fine-tuning schedule which adopts different optimizers for the encoder and the decoder as a means of alleviating the mismatch between the two (the former is pretrained while the latter is not). We also demonstrate that a two-staged fine-tuning approach can further boost the quality of the generated summaries. Experiments on three datasets show that our model achieves state-of-the-art results across the board in both extractive and abstractive settings. Our code is available at https://github.com/nlpyang/PreSumm |
Tasks | Abstractive Text Summarization, Document Summarization, Extractive Document Summarization, Text Summarization |
Published | 2019-08-22 |
URL | https://arxiv.org/abs/1908.08345v2 |
PDF | https://arxiv.org/pdf/1908.08345v2.pdf |
PWC | https://paperswithcode.com/paper/text-summarization-with-pretrained-encoders |
Repo | https://github.com/nlpyang/PreSumm |
Framework | pytorch |
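A sketch of the extractive side of the model described in the abstract above: BERT encodes the document with one [CLS] token per sentence, the [CLS] vectors are contextualized by a small inter-sentence Transformer, and each sentence receives a selection score. Layer counts and sizes are illustrative, and the simple per-batch gathering assumes an equal number of sentences per document; see the authors' PreSumm repository for the full model.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class ExtractiveSummarizerSketch(nn.Module):
    def __init__(self, num_layers=2):
        super().__init__()
        self.bert = BertModel.from_pretrained('bert-base-uncased')
        layer = nn.TransformerEncoderLayer(d_model=768, nhead=8, batch_first=True)
        self.inter_sentence = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.score = nn.Linear(768, 1)

    def forward(self, input_ids, attention_mask, cls_positions):
        # cls_positions: list of per-document [CLS] token indices (same count per document here)
        hidden = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        sent_vecs = torch.stack([hidden[b, pos] for b, pos in enumerate(cls_positions)])
        sent_vecs = self.inter_sentence(sent_vecs)                 # inter-sentence Transformer layers
        return torch.sigmoid(self.score(sent_vecs)).squeeze(-1)    # per-sentence selection probability
```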
RoNIN: Robust Neural Inertial Navigation in the Wild: Benchmark, Evaluations, and New Methods
Title | RoNIN: Robust Neural Inertial Navigation in the Wild: Benchmark, Evaluations, and New Methods |
Authors | Hang Yan, Sachini Herath, Yasutaka Furukawa |
Abstract | This paper sets a new foundation for data-driven inertial navigation research, where the task is the estimation of positions and orientations of a moving subject from a sequence of IMU sensor measurements. More concretely, the paper presents 1) a new benchmark containing more than 40 hours of IMU sensor data from 100 human subjects with ground-truth 3D trajectories under natural human motions; 2) novel neural inertial navigation architectures, making significant improvements for challenging motion cases; and 3) qualitative and quantitative evaluations of the competing methods over three inertial navigation benchmarks. We will share the code and data to promote further research. |
Tasks | |
Published | 2019-05-30 |
URL | https://arxiv.org/abs/1905.12853v1 |
PDF | https://arxiv.org/pdf/1905.12853v1.pdf |
PWC | https://paperswithcode.com/paper/ronin-robust-neural-inertial-navigation-in |
Repo | https://github.com/Sachini/ronin |
Framework | pytorch |
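The data-driven inertial navigation task set up in the abstract above can be sketched as regressing a 2-D velocity from a window of IMU readings and integrating the predictions into a trajectory. The tiny 1-D conv network below is only an illustration of that formulation; RoNIN's actual ResNet/LSTM/TCN backbones and training details differ.

```python
import torch
import torch.nn as nn

class VelocityRegressorSketch(nn.Module):
    """Map a window of IMU readings (gyroscope + accelerometer, 6 channels) to a 2-D velocity."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(6, 64, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(128, 2))

    def forward(self, imu_window):              # (batch, 6, window_length)
        return self.net(imu_window)             # predicted 2-D velocity for the window

def integrate(velocities, dt=1.0):
    """Cumulatively sum per-window velocity predictions into a 2-D trajectory."""
    return torch.cumsum(velocities * dt, dim=0)
```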
Multi-Sample Dropout for Accelerated Training and Better Generalization
Title | Multi-Sample Dropout for Accelerated Training and Better Generalization |
Authors | Hiroshi Inoue |
Abstract | Dropout is a simple but efficient regularization technique for achieving better generalization of deep neural networks (DNNs); hence it is widely used in tasks based on DNNs. During training, dropout randomly discards a portion of the neurons to avoid overfitting. This paper presents an enhanced dropout technique, which we call multi-sample dropout, for both accelerating training and improving generalization over the original dropout. The original dropout creates a randomly selected subset (called a dropout sample) from the input in each training iteration while the multi-sample dropout creates multiple dropout samples. The loss is calculated for each sample, and then the sample losses are averaged to obtain the final loss. This technique can be easily implemented without implementing a new operator by duplicating a part of the network after the dropout layer while sharing the weights among the duplicated fully connected layers. Experimental results showed that multi-sample dropout significantly accelerates training by reducing the number of iterations until convergence for image classification tasks using the ImageNet, CIFAR-10, CIFAR-100, and SVHN datasets. Multi-sample dropout does not significantly increase computation cost per iteration because most of the computation time is consumed in the convolution layers before the dropout layer, which are not duplicated. Experiments also showed that networks trained using multi-sample dropout achieved lower error rates and losses for both the training set and validation set. |
Tasks | Image Classification |
Published | 2019-05-23 |
URL | https://arxiv.org/abs/1905.09788v2 |
PDF | https://arxiv.org/pdf/1905.09788v2.pdf |
PWC | https://paperswithcode.com/paper/multi-sample-dropout-for-accelerated-training |
Repo | https://github.com/KushajveerSingh/Deep-Learning-Notebooks |
Framework | pytorch |
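Multi-sample dropout as described in the abstract above is straightforward to sketch: the part of the network after the dropout layer is evaluated once per dropout sample with shared weights, and the per-sample losses are averaged. The head size and number of samples below are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class MultiSampleDropoutHead(nn.Module):
    """Classification head evaluated over several dropout samples with shared weights."""
    def __init__(self, in_features=2048, num_classes=1000, num_samples=8, p=0.5):
        super().__init__()
        self.dropout = nn.Dropout(p)
        self.fc = nn.Linear(in_features, num_classes)    # shared across all dropout samples
        self.num_samples = num_samples
        self.criterion = nn.CrossEntropyLoss()

    def forward(self, features, targets):
        losses = [self.criterion(self.fc(self.dropout(features)), targets)
                  for _ in range(self.num_samples)]      # one loss per dropout sample
        return torch.stack(losses).mean()                # averaged final loss
```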