January 29, 2020


Paper Group ANR 650

Question Classification with Deep Contextualized Transformer. Why Do Masked Neural Language Models Still Need Common Sense Knowledge?. Random Vector Functional Link Neural Network based Ensemble Deep Learning. Action Semantics Network: Considering the Effects of Actions in Multiagent Systems. Hamiltonian Graph Networks with ODE Integrators. MonSter …

Question Classification with Deep Contextualized Transformer


Title	Question Classification with Deep Contextualized Transformer
Authors	Haozheng Luo, Ningwei Liu, Charles Feng
Abstract	The latest work for Question and Answer problems is to use the Stanford Parse Tree. We build on prior work and develop a new method to handle the Question and Answer problem with the Deep Contextualized Transformer to manage some aberrant expressions. We also conduct extensive evaluations on the SQuAD and SwDA datasets and show significant improvement over QA problem classification of industry needs. We also investigate the impact of different models on the accuracy and efficiency of the problem answers. The results show that our new method is more effective for solving QA problems, with higher accuracy.
Tasks
Published	2019-10-17
URL	https://arxiv.org/abs/1910.10492v1
PDF	https://arxiv.org/pdf/1910.10492v1.pdf
PWC	https://paperswithcode.com/paper/question-classification-with-deep
Repo
Framework

Why Do Masked Neural Language Models Still Need Common Sense Knowledge?


Title	Why Do Masked Neural Language Models Still Need Common Sense Knowledge?
Authors	Sunjae Kwon, Cheongwoong Kang, Jiyeon Han, Jaesik Choi
Abstract	Currently, contextualized word representations are learned by intricate neural network models, such as masked neural language models (MNLMs). The new representations significantly enhanced the performance in automated question answering by reading paragraphs. However, identifying the detailed knowledge trained in the MNLMs is difficult owing to numerous and intermingled parameters. This paper provides empirical but insightful analyses on the pretrained MNLMs with respect to common sense knowledge. First, we propose a test that measures which types of common sense knowledge pretrained MNLMs understand. From the test, we observed that MNLMs partially understand various types of common sense knowledge but do not accurately understand the semantic meaning of relations. In addition, based on the difficulty of the question-answering task problems, we observed that pretrained MLM-based models are still vulnerable to problems that require common sense knowledge. We also experimentally demonstrated that we can elevate existing MNLM-based models by combining knowledge from an external common sense repository.
Tasks	Common Sense Reasoning, Question Answering
Published	2019-11-08
URL	https://arxiv.org/abs/1911.03024v1
PDF	https://arxiv.org/pdf/1911.03024v1.pdf
PWC	https://paperswithcode.com/paper/why-do-masked-neural-language-models-still
Repo
Framework

Random Vector Functional Link Neural Network based Ensemble Deep Learning


Title	Random Vector Functional Link Neural Network based Ensemble Deep Learning
Authors	Rakesh Katuwal, P. N. Suganthan, M. Tanveer
Abstract	In this paper, we propose a deep learning framework based on a randomized neural network. In particular, inspired by the principles of the Random Vector Functional Link (RVFL) network, we present a deep RVFL network (dRVFL) with stacked layers. The parameters of the hidden layers of the dRVFL are randomly generated within a suitable range and kept fixed, while the output weights are computed using the closed-form solution as in a standard RVFL network. We also propose an ensemble deep network (edRVFL) that can be regarded as a marriage of ensemble learning with deep learning. Unlike traditional ensembling approaches that require training several models independently from scratch, edRVFL is obtained by training a single dRVFL network once. Both dRVFL and edRVFL frameworks are generic and can be used with any RVFL variant. To illustrate this, we integrate the deep learning networks with a recently proposed sparse-pretrained RVFL (SP-RVFL). Extensive experiments on benchmark datasets from diverse domains show the superior performance of our proposed deep RVFL networks.
Tasks
Published	2019-06-30
URL	https://arxiv.org/abs/1907.00350v1
PDF	https://arxiv.org/pdf/1907.00350v1.pdf
PWC	https://paperswithcode.com/paper/random-vector-functional-link-neural-network
Repo
Framework
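
As a rough illustration of the closed-form output-weight computation mentioned in the abstract above, here is a minimal sketch of a single-hidden-layer RVFL regressor in NumPy. The function names (`fit_rvfl`, `predict_rvfl`), the uniform weight range [-1, 1], the tanh activation, the ridge parameter, and the toy target are all assumptions made for illustration; this is not the authors' dRVFL or edRVFL implementation.

```python
import numpy as np


def fit_rvfl(X, y, n_hidden=50, ridge=1e-3, seed=0):
    """Fit a basic RVFL network (illustrative sketch, not the paper's dRVFL)."""
    rng = np.random.default_rng(seed)
    # Hidden-layer parameters are drawn once at random and then kept fixed.
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = np.tanh(X @ W + b)
    # Direct links: the output layer sees the raw inputs as well as H.
    D = np.hstack([X, H])
    # Closed-form ridge solution: beta = (D^T D + ridge * I)^{-1} D^T y.
    A = D.T @ D + ridge * np.eye(D.shape[1])
    beta = np.linalg.solve(A, D.T @ y)
    return W, b, beta


def predict_rvfl(X, params):
    """Apply the fixed random hidden layer and the learned output weights."""
    W, b, beta = params
    D = np.hstack([X, np.tanh(X @ W + b)])
    return D @ beta


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.uniform(-1.0, 1.0, size=(200, 2))
    y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2  # hypothetical toy target
    params = fit_rvfl(X, y)
    mse = float(np.mean((predict_rvfl(X, params) - y) ** 2))
    print(f"training MSE: {mse:.4f}")
```

Only `beta` is learned here; `W` and `b` are never updated, which is why a single linear solve is enough to train the model.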

Action Semantics Network: Considering the Effects of Actions in Multiagent Systems


Title	Action Semantics Network: Considering the Effects of Actions in Multiagent Systems
Authors	Weixun Wang, Tianpei Yang, Yong Liu, Jianye Hao, Xiaotian Hao, Yujing Hu, Yingfeng Chen, Changjie Fan, Yang Gao
Abstract	In multiagent systems (MASs), each agent makes individual decisions but all of them contribute globally to the system evolution. Learning in MASs is difficult since each agent’s selection of actions must take place in the presence of other co-learning agents. Moreover, the environmental stochasticity and uncertainties increase exponentially with the increase in the number of agents. Previous works borrow various multiagent coordination mechanisms into deep learning architecture to facilitate multiagent coordination. However, none of them explicitly considers action semantics between agents, i.e., that different actions have different influences on other agents. In this paper, we propose a novel network architecture, named Action Semantics Network (ASN), that explicitly represents such action semantics between agents. ASN characterizes different actions’ influence on other agents using neural networks based on the action semantics between them. ASN can be easily combined with existing deep reinforcement learning (DRL) algorithms to boost their performance. Experimental results on StarCraft II micromanagement and Neural MMO show ASN significantly improves the performance of state-of-the-art DRL approaches compared with several network architectures.
Tasks	Starcraft, Starcraft II
Published	2019-07-26
URL	https://arxiv.org/abs/1907.11461v3
PDF	https://arxiv.org/pdf/1907.11461v3.pdf
PWC	https://paperswithcode.com/paper/action-semantics-network-considering-the
Repo
Framework

Hamiltonian Graph Networks with ODE Integrators


Title	Hamiltonian Graph Networks with ODE Integrators
Authors	Alvaro Sanchez-Gonzalez, Victor Bapst, Kyle Cranmer, Peter Battaglia
Abstract	We introduce an approach for imposing physically informed inductive biases in learned simulation models. We combine graph networks with a differentiable ordinary differential equation integrator as a mechanism for predicting future states, and a Hamiltonian as an internal representation. We find that our approach outperforms baselines without these biases in terms of predictive accuracy, energy accuracy, and zero-shot generalization to time-step sizes and integrator orders not experienced during training. This advances the state-of-the-art of learned simulation, and in principle is applicable beyond physical domains.
Tasks
Published	2019-09-27
URL	https://arxiv.org/abs/1909.12790v1
PDF	https://arxiv.org/pdf/1909.12790v1.pdf
PWC	https://paperswithcode.com/paper/hamiltonian-graph-networks-with-ode
Repo
Framework
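
To make the idea of "a Hamiltonian as an internal representation" plus an ODE integrator more concrete, the sketch below rolls out Hamilton's equations with a leapfrog integrator. It is a simplification under stated assumptions: the Hamiltonian is a hand-written harmonic oscillator rather than a learned graph network, the gradients come from finite differences rather than automatic differentiation, and all function names are hypothetical.

```python
def hamiltonian(q, p):
    """Energy of a unit-mass, unit-stiffness oscillator (stand-in for a learned H)."""
    return 0.5 * p * p + 0.5 * q * q


def grad(f, q, p, eps=1e-6):
    """Central finite differences for dH/dq and dH/dp."""
    dH_dq = (f(q + eps, p) - f(q - eps, p)) / (2.0 * eps)
    dH_dp = (f(q, p + eps) - f(q, p - eps)) / (2.0 * eps)
    return dH_dq, dH_dp


def leapfrog_step(f, q, p, dt):
    """One leapfrog step of dq/dt = dH/dp, dp/dt = -dH/dq.

    Assumes a separable Hamiltonian H(q, p) = T(p) + V(q).
    """
    dH_dq, _ = grad(f, q, p)
    p_half = p - 0.5 * dt * dH_dq          # half step in momentum
    _, dH_dp = grad(f, q, p_half)
    q_new = q + dt * dH_dp                 # full step in position
    dH_dq, _ = grad(f, q_new, p_half)
    p_new = p_half - 0.5 * dt * dH_dq      # second half step in momentum
    return q_new, p_new


def rollout(f, q, p, dt, n_steps):
    """Return the list of (q, p) states, including the initial one."""
    states = [(q, p)]
    for _ in range(n_steps):
        q, p = leapfrog_step(f, q, p, dt)
        states.append((q, p))
    return states


if __name__ == "__main__":
    states = rollout(hamiltonian, q=1.0, p=0.0, dt=0.01, n_steps=1000)
    e_start = hamiltonian(*states[0])
    e_end = hamiltonian(*states[-1])
    print(f"energy drift: {abs(e_end - e_start):.2e}")
```

Because the step size `dt` is an argument of the integrator and not of the Hamiltonian, the same energy function can be rolled out with a different step size or a different integrator without being changed.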

MonSter: Awakening the Mono in Stereo


Title	MonSter: Awakening the Mono in Stereo
Authors	Yotam Gil, Shay Elmalem, Harel Haim, Emanuel Marom, Raja Giryes
Abstract	Passive depth estimation is among the most long-studied fields in computer vision. The most common methods for passive depth estimation are either a stereo or a monocular system. Using the former requires an accurate calibration process, and has a limited effective range. The latter, which does not require extrinsic calibration but generally achieves inferior depth accuracy, can be tuned to achieve better results in part of the depth range. In this work, we suggest combining the two frameworks. We propose a two-camera system, in which the cameras are used jointly to extract a stereo depth and individually to provide a monocular depth from each camera. The combination of these depth maps leads to more accurate depth estimation. Moreover, enforcing consistency between the extracted maps leads to a novel online self-calibration strategy. We present a prototype camera that demonstrates the benefits of the proposed combination, for both self-calibration and depth reconstruction in real-world scenes.
Tasks	Calibration, Depth Estimation
Published	2019-10-30
URL	https://arxiv.org/abs/1910.13708v1
PDF	https://arxiv.org/pdf/1910.13708v1.pdf
PWC	https://paperswithcode.com/paper/monster-awakening-the-mono-in-stereo
Repo
Framework

A semiparametric instrumental variable approach to optimal treatment regimes under endogeneity


Title	A semiparametric instrumental variable approach to optimal treatment regimes under endogeneity
Authors	Yifan Cui, Eric Tchetgen Tchetgen
Abstract	There is a fast-growing literature on estimating optimal treatment regimes based on randomized trials or observational studies under a key identifying condition of no unmeasured confounding. Because confounding by unmeasured factors cannot generally be ruled out with certainty in observational studies or randomized trials subject to noncompliance, we propose a general instrumental variable approach to learning optimal treatment regimes under endogeneity. Specifically, we provide sufficient conditions for the identification of both value function $E[Y_{\mathcal{D}(L)}]$ for a given regime $\mathcal{D}$ and optimal regime $\arg \max_{\mathcal{D}} E[Y_{\mathcal{D}(L)}]$ with the aid of a binary instrumental variable, when no unmeasured confounding fails to hold. We also propose novel multiply robust classification-based estimators. Furthermore, we extend the proposed method to identify and estimate the optimal treatment regime among those who would comply with the assigned treatment under a standard monotonicity assumption. In this latter case, we establish the somewhat surprising result that the complier optimal regime can be consistently estimated without directly collecting compliance information and therefore without the complier average treatment effect itself being identified. Our approach is illustrated via extensive simulation studies and a data application on the effect of child rearing on labor participation.
Tasks
Published	2019-11-21
URL	https://arxiv.org/abs/1911.09260v2
PDF	https://arxiv.org/pdf/1911.09260v2.pdf
PWC	https://paperswithcode.com/paper/a-semiparametric-instrumental-variable
Repo
Framework

Automated Speech Generation from UN General Assembly Statements: Mapping Risks in AI Generated Texts


Title	Automated Speech Generation from UN General Assembly Statements: Mapping Risks in AI Generated Texts
Authors	Joseph Bullock, Miguel Luengo-Oroz
Abstract	Automated text generation has been applied broadly in many domains such as marketing and robotics, and used to create chatbots, product reviews and write poetry. The ability to synthesize text, however, presents many potential risks, while access to the technology required to build generative models is becoming increasingly easy. This work is aligned with the efforts of the United Nations and other civil society organisations to highlight potential political and societal risks arising through the malicious use of text generation software, and their potential impact on human rights. As a case study, we present the findings of an experiment to generate remarks in the style of political leaders by fine-tuning a pretrained AWD-LSTM model on a dataset of speeches made at the UN General Assembly. This work highlights the ease with which this can be accomplished, as well as the threats of combining these techniques with other technologies.
Tasks	Text Generation
Published	2019-06-05
URL	https://arxiv.org/abs/1906.01946v1
PDF	https://arxiv.org/pdf/1906.01946v1.pdf
PWC	https://paperswithcode.com/paper/automated-speech-generation-from-un-general
Repo
Framework

A Tensorized Transformer for Language Modeling


Title	A Tensorized Transformer for Language Modeling
Authors	Xindian Ma, Peng Zhang, Shuai Zhang, Nan Duan, Yuexian Hou, Dawei Song, Ming Zhou
Abstract	Recent developments in neural models have connected the encoder and decoder through a self-attention mechanism. In particular, Transformer, which is solely based on self-attention, has led to breakthroughs in Natural Language Processing (NLP) tasks. However, the multi-head attention mechanism, as a key component of Transformer, limits the effective deployment of the model to a resource-limited setting. In this paper, based on the ideas of tensor decomposition and parameter sharing, we propose a novel self-attention model (namely Multi-linear attention) with Block-Term Tensor Decomposition (BTD). We test and verify the proposed attention method on three language modeling tasks (i.e., PTB, WikiText-103 and One-billion) and a neural machine translation task (i.e., WMT-2016 English-German). Multi-linear attention can not only largely compress the model parameters but also obtain performance improvements, compared with a number of language modeling approaches, such as Transformer, Transformer-XL, and Transformer with tensor train decomposition.
Tasks	Language Modelling, Machine Translation
Published	2019-06-24
URL	https://arxiv.org/abs/1906.09777v3
PDF	https://arxiv.org/pdf/1906.09777v3.pdf
PWC	https://paperswithcode.com/paper/a-tensorized-transformer-for-language
Repo
Framework

Deep Neural Network Ensembles against Deception: Ensemble Diversity, Accuracy and Robustness


Title	Deep Neural Network Ensembles against Deception: Ensemble Diversity, Accuracy and Robustness
Authors	Ling Liu, Wenqi Wei, Ka-Ho Chow, Margaret Loper, Emre Gursoy, Stacey Truex, Yanzhao Wu
Abstract	Ensemble learning is a methodology that integrates multiple DNN learners for improving prediction performance of individual learners. Diversity is greater when the errors of the ensemble prediction are more uniformly distributed. Greater diversity is highly correlated with the increase in ensemble accuracy. Another attractive property of diversity-optimized ensemble learning is its robustness against deception: an adversarial perturbation attack can mislead one DNN model to misclassify but may not fool other ensemble DNN members consistently. In this paper we first give an overview of the concept of ensemble diversity and examine the three types of ensemble diversity in the context of DNN classifiers. We then describe a set of ensemble diversity measures, a suite of algorithms for creating diversity ensembles and for performing ensemble consensus (voted or learned) for generating high-accuracy ensemble output by strategically combining outputs of individual members. This paper concludes with a discussion on a set of open issues in quantifying ensemble diversity for robust deep learning.
Tasks
Published	2019-08-29
URL	https://arxiv.org/abs/1908.11091v1
PDF	https://arxiv.org/pdf/1908.11091v1.pdf
PWC	https://paperswithcode.com/paper/deep-neural-network-ensembles-against
Repo
Framework
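
The voted consensus and the notion of diversity described in the abstract above can be sketched with plain Python. This is only an illustration: majority voting is one simple consensus rule, and pairwise disagreement is one commonly used diversity measure, not necessarily one of those examined in the paper. The function names and the toy labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def majority_vote(member_predictions):
    """Return the most common label among the ensemble members for one sample.

    Ties go to the label that was seen first.
    """
    return Counter(member_predictions).most_common(1)[0][0]


def ensemble_predict(predictions_per_member):
    """Vote sample by sample; input is one list of labels per member."""
    return [majority_vote(votes) for votes in zip(*predictions_per_member)]


def pairwise_disagreement(predictions_per_member):
    """Mean fraction of samples on which two members give different labels."""
    pairs = list(combinations(predictions_per_member, 2))
    total = 0.0
    for a, b in pairs:
        total += sum(x != y for x, y in zip(a, b)) / len(a)
    return total / len(pairs)


if __name__ == "__main__":
    # Hypothetical outputs of three members on three samples.
    members = [
        ["cat", "dog", "cat"],
        ["cat", "dog", "dog"],
        ["dog", "dog", "cat"],
    ]
    print(ensemble_predict(members))
    print(round(pairwise_disagreement(members), 3))
```

In the toy data each member is wrong on at most one sample, and never the same one as another member, so the vote recovers a consistent label everywhere. This is the intuition behind the claim that a perturbation fooling one member may not fool the ensemble.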

Medical Time Series Classification with Hierarchical Attention-based Temporal Convolutional Networks: A Case Study of Myotonic Dystrophy Diagnosis


Title	Medical Time Series Classification with Hierarchical Attention-based Temporal Convolutional Networks: A Case Study of Myotonic Dystrophy Diagnosis
Authors	Lei Lin, Beilei Xu, Wencheng Wu, Trevor Richardson, Edgar A. Bernal
Abstract	Myotonia, which refers to delayed muscle relaxation after contraction, is the main symptom of myotonic dystrophy patients. We propose a hierarchical attention-based temporal convolutional network (HA-TCN) for myotonic dystrophy diagnosis from handgrip time series data, and introduce mechanisms that enable model explainability. We compare the performance of the HA-TCN model against that of benchmark TCN models, LSTM models with and without attention mechanisms, and SVM approaches with handcrafted features. In terms of classification accuracy and F1 score, we found all deep learning models have similar levels of performance, and they all outperform SVM. Further, the HA-TCN model outperforms its TCN counterpart with regard to computational efficiency regardless of network depth, and in terms of performance particularly when the number of hidden layers is small. Lastly, HA-TCN models can consistently identify relevant time series segments in the relaxation phase of the handgrip time series, and exhibit increased robustness to noise when compared to attention-based LSTM models.
Tasks	Time Series, Time Series Classification
Published	2019-03-28
URL	http://arxiv.org/abs/1903.11748v1
PDF	http://arxiv.org/pdf/1903.11748v1.pdf
PWC	https://paperswithcode.com/paper/medical-time-series-classification-with
Repo
Framework

A Constructive Approach for Data-Driven Randomized Learning of Feedforward Neural Networks


Title	A Constructive Approach for Data-Driven Randomized Learning of Feedforward Neural Networks
Authors	Grzegorz Dudek
Abstract	Feedforward neural networks with random hidden nodes suffer from a problem with the generation of random weights and biases as these are difficult to set optimally to obtain a good projection space. Typically, random parameters are drawn from an interval which is fixed before or adapted during the learning process. Due to the different functions of the weights and biases, selecting them both from the same interval is not a good idea. Recently more sophisticated methods of random parameter generation have been developed, such as the data-driven method proposed in \cite{Anon19}, where the sigmoids are placed in randomly selected regions of the input space and then their slopes are adjusted to the local fluctuations of the target function. In this work, we propose an extended version of this method, which iteratively constructs the network architecture. This method successively generates new hidden nodes and accepts them if the training error decreases significantly. The threshold of acceptance is adapted to the current training stage. At the beginning of the training process only those nodes which lead to the largest error reduction are accepted. Then, the threshold is reduced by half to accept those nodes which model the target function details more accurately. This leads to faster convergence and more compact network architecture, as it includes only “significant” neurons. Several application examples are given which confirm this thesis.
Tasks
Published	2019-09-04
URL	https://arxiv.org/abs/1909.01961v2
PDF	https://arxiv.org/pdf/1909.01961v2.pdf
PWC	https://paperswithcode.com/paper/a-constructive-approach-for-data-driven
Repo
Framework

Learning a Safety Verifiable Adaptive Cruise Controller from Human Driving Data


Title	Learning a Safety Verifiable Adaptive Cruise Controller from Human Driving Data
Authors	Qin Lin, Sicco Verwer, John Dolan
Abstract	Imitation learning provides a way to automatically construct a controller by mimicking human behavior from data. For safety-critical systems such as autonomous vehicles, it can be problematic to use controllers learned from data because they cannot be guaranteed to be collision-free. Recently, a method has been proposed for learning a multi-mode hybrid automaton cruise controller (MOHA). Besides being accurate, the logical nature of this model makes it suitable for formal verification. In this paper, we demonstrate this capability using the SpaceEx hybrid model checker as follows. After learning, we translate the automaton model into constraints and equations required by SpaceEx. We then verify that a pure MOHA controller is not collision-free. By adding a safety state based on headway in time, a rule that human drivers should follow anyway, we do obtain a provably safe cruise control. Moreover, the safe controller remains more human-like than existing cruise controllers.
Tasks	Autonomous Vehicles, Imitation Learning
Published	2019-10-29
URL	https://arxiv.org/abs/1910.13526v1
PDF	https://arxiv.org/pdf/1910.13526v1.pdf
PWC	https://paperswithcode.com/paper/learning-a-safety-verifiable-adaptive-cruise
Repo
Framework
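
The "safety state based on headway in time" in the abstract above can be pictured with a short sketch. Time headway is the gap to the lead vehicle divided by the follower's speed. Everything else here is an assumption for illustration: the 2-second threshold, the mode names, and the function names are hypothetical, and the sketch performs no formal verification of the kind done with SpaceEx.

```python
def time_headway(gap_m, ego_speed_mps):
    """Time in seconds the following vehicle needs to cover the current gap."""
    if ego_speed_mps <= 0.0:
        return float("inf")  # a stationary follower never closes the gap
    return gap_m / ego_speed_mps


def select_mode(gap_m, ego_speed_mps, learned_mode, min_headway_s=2.0):
    """Override the learned controller mode when the headway is too short."""
    if time_headway(gap_m, ego_speed_mps) < min_headway_s:
        return "safety"  # hypothetical name for the added safety state
    return learned_mode


if __name__ == "__main__":
    # 30 m gap at 20 m/s is a 1.5 s headway, below the assumed 2 s threshold.
    print(select_mode(30.0, 20.0, learned_mode="cruise"))
    # 60 m gap at 20 m/s is a 3 s headway, so the learned mode is kept.
    print(select_mode(60.0, 20.0, learned_mode="cruise"))
```

The point of such a rule is that it is a simple logical guard on top of the learned behavior, which is the kind of structure a hybrid model checker can reason about.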

UCAM Biomedical translation at WMT19: Transfer learning multi-domain ensembles


Title	UCAM Biomedical translation at WMT19: Transfer learning multi-domain ensembles
Authors	Danielle Saunders, Felix Stahlberg, Bill Byrne
Abstract	The 2019 WMT Biomedical translation task involved translating Medline abstracts. We approached this using transfer learning to obtain a series of strong neural models on distinct domains, and combining them into multi-domain ensembles. We further experiment with an adaptive language-model ensemble weighting scheme. Our submission achieved the best submitted results on both directions of English-Spanish.
Tasks	Language Modelling, Transfer Learning
Published	2019-06-13
URL	https://arxiv.org/abs/1906.05786v1
PDF	https://arxiv.org/pdf/1906.05786v1.pdf
PWC	https://paperswithcode.com/paper/ucam-biomedical-translation-at-wmt19-transfer
Repo
Framework

Scenario Discovery via Rule Extraction


Title	Scenario Discovery via Rule Extraction
Authors	Vadim Arzamasov, Klemens Böhm
Abstract	Scenario discovery is the process of finding areas of interest, commonly referred to as scenarios, in data spaces resulting from simulations. For instance, one might search for conditions - which are inputs of the simulation model - where the system under investigation is unstable. A commonly used algorithm for scenario discovery is PRIM. It yields scenarios in the form of hyper-rectangles which are human-comprehensible. When the simulation model has many inputs, and the simulations are computationally expensive, PRIM may not produce good results, given the affordable volume of data. So we propose a new procedure for scenario discovery - we train an intermediate statistical model which generalizes fast, and use it to label (a lot of) data for PRIM. We provide the statistical intuition behind our idea. Our experimental study shows that this method is much better than PRIM itself. Specifically, our method reduces the number of simulation runs necessary by 75% on average.
Tasks
Published	2019-10-03
URL	https://arxiv.org/abs/1910.01713v1
PDF	https://arxiv.org/pdf/1910.01713v1.pdf
PWC	https://paperswithcode.com/paper/scenario-discovery-via-rule-extraction
Repo
Framework