Paper Group ANR 650
Question Classification with Deep Contextualized Transformer
Title | Question Classification with Deep Contextualized Transformer |
Authors | Haozheng Luo, Ningwei Liu, Charles Feng |
Abstract | Recent work on question-answering (QA) problems relies on the Stanford Parse Tree. We build on prior work and develop a new method that handles QA problems with a deep contextualized Transformer able to manage aberrant expressions. We conduct extensive evaluations on the SQuAD and SwDA datasets and show significant improvement in question classification for industry needs. We also investigate the impact of different models on the accuracy and efficiency of the answers, and find that our new method solves QA problems more effectively and with higher accuracy. |
Tasks | |
Published | 2019-10-17 |
URL | https://arxiv.org/abs/1910.10492v1 |
https://arxiv.org/pdf/1910.10492v1.pdf | |
PWC | https://paperswithcode.com/paper/question-classification-with-deep |
Repo | |
Framework | |
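For orientation, the approach above boils down to encoding a question with a contextualized Transformer and classifying it. Below is a minimal, self-contained sketch of that pattern, not the paper's architecture; the vocabulary size, depth, and mean-pooling head are illustrative assumptions.

```python
import torch
import torch.nn as nn

class QuestionClassifier(nn.Module):
    """Generic Transformer-encoder question classifier (illustrative sketch)."""
    def __init__(self, vocab_size=30000, d_model=128, n_classes=6, max_len=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))  # learned positions
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, token_ids):                  # token_ids: (B, T)
        h = self.embed(token_ids) + self.pos[:, :token_ids.size(1)]
        h = self.encoder(h)                        # contextualize tokens
        return self.head(h.mean(dim=1))            # mean-pool, then classify

logits = QuestionClassifier()(torch.randint(0, 30000, (2, 16)))
print(logits.shape)  # torch.Size([2, 6])
```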
Why Do Masked Neural Language Models Still Need Common Sense Knowledge?
Title | Why Do Masked Neural Language Models Still Need Common Sense Knowledge? |
Authors | Sunjae Kwon, Cheongwoong Kang, Jiyeon Han, Jaesik Choi |
Abstract | Currently, contextualized word representations are learned by intricate neural network models, such as masked neural language models (MNLMs). These representations have significantly enhanced performance in automated question answering by reading paragraphs. However, identifying the detailed knowledge trained into MNLMs is difficult owing to their numerous and intermingled parameters. This paper provides empirical but insightful analyses of pretrained MNLMs with respect to common sense knowledge. First, we propose a test that measures which types of common sense knowledge pretrained MNLMs understand. From the test, we observed that MNLMs partially understand various types of common sense knowledge but do not accurately understand the semantic meaning of relations. In addition, based on the difficulty of the question-answering problems, we observed that pretrained MNLM-based models are still vulnerable to problems that require common sense knowledge. We also experimentally demonstrated that existing MNLM-based models can be improved by incorporating knowledge from an external common sense repository. |
Tasks | Common Sense Reasoning, Question Answering |
Published | 2019-11-08 |
URL | https://arxiv.org/abs/1911.03024v1 |
https://arxiv.org/pdf/1911.03024v1.pdf | |
PWC | https://paperswithcode.com/paper/why-do-masked-neural-language-models-still |
Repo | |
Framework | |
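The probing methodology above amounts to asking a pretrained masked LM cloze-style questions and inspecting its fillers. A minimal sketch with the Hugging Face `transformers` fill-mask pipeline follows; the prompts are illustrative, not items from the paper's test.

```python
from transformers import pipeline

# Probe a pretrained MNLM with cloze-style common sense prompts.
# The prompts below are illustrative, not the paper's actual test items.
fill = pipeline("fill-mask", model="bert-base-uncased")

for prompt in ["A bird can [MASK].", "You use an oven to [MASK] food."]:
    top = fill(prompt, top_k=3)
    print(prompt, "->", [p["token_str"] for p in top])
```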
Random Vector Functional Link Neural Network based Ensemble Deep Learning
Title | Random Vector Functional Link Neural Network based Ensemble Deep Learning |
Authors | Rakesh Katuwal, P. N. Suganthan, M. Tanveer |
Abstract | In this paper, we propose a deep learning framework based on randomized neural networks. In particular, inspired by the principles of the Random Vector Functional Link (RVFL) network, we present a deep RVFL network (dRVFL) with stacked layers. The parameters of the hidden layers of the dRVFL are randomly generated within a suitable range and kept fixed, while the output weights are computed using the closed-form solution as in a standard RVFL network. We also propose an ensemble deep network (edRVFL) that can be regarded as a marriage of ensemble learning with deep learning. Unlike traditional ensembling approaches that require training several models independently from scratch, edRVFL is obtained by training a single dRVFL network once. Both the dRVFL and edRVFL frameworks are generic and can be used with any RVFL variant. To illustrate this, we integrate the deep networks with a recently proposed sparse-pretrained RVFL (SP-RVFL). Extensive experiments on benchmark datasets from diverse domains show the superior performance of our proposed deep RVFL networks. |
Tasks | |
Published | 2019-06-30 |
URL | https://arxiv.org/abs/1907.00350v1 |
https://arxiv.org/pdf/1907.00350v1.pdf | |
PWC | https://paperswithcode.com/paper/random-vector-functional-link-neural-network |
Repo | |
Framework | |
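The core RVFL recipe in the abstract, random frozen hidden weights, a direct input-output link, and a closed-form ridge solve for the output weights, fits in a few lines. A minimal sketch; the weight ranges and regularization strength are assumptions, not the paper's settings.

```python
import numpy as np

def fit_rvfl(X, y, n_hidden=100, reg=1e-2, seed=0):
    """Standard RVFL fit sketch: random fixed hidden weights, a direct
    input-output link, and a ridge closed-form solve for the output weights."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, (X.shape[1], n_hidden))   # random, then frozen
    b = rng.uniform(-1.0, 1.0, n_hidden)
    H = np.tanh(X @ W + b)                               # random features
    D = np.hstack([X, H])                                # direct link to output
    beta = np.linalg.solve(D.T @ D + reg * np.eye(D.shape[1]), D.T @ y)
    return W, b, beta

def predict_rvfl(X, W, b, beta):
    return np.hstack([X, np.tanh(X @ W + b)]) @ beta

X = np.random.default_rng(1).standard_normal((200, 5))
y = np.sin(X[:, 0]) + 0.1 * X[:, 1]
W, b, beta = fit_rvfl(X, y)
print(predict_rvfl(X, W, b, beta)[:3])
```

Stacking such layers, each fed the previous layer's features alongside the raw input, gives the deep dRVFL; attaching an output to every layer and averaging their predictions yields the ensemble edRVFL from a single training pass.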
Action Semantics Network: Considering the Effects of Actions in Multiagent Systems
Title | Action Semantics Network: Considering the Effects of Actions in Multiagent Systems |
Authors | Weixun Wang, Tianpei Yang, Yong Liu, Jianye Hao, Xiaotian Hao, Yujing Hu, Yingfeng Chen, Changjie Fan, Yang Gao |
Abstract | In multiagent systems (MASs), each agent makes individual decisions, but all of them contribute globally to the system's evolution. Learning in MASs is difficult since each agent's selection of actions must take place in the presence of other co-learning agents. Moreover, environmental stochasticity and uncertainty increase exponentially with the number of agents. Previous work has incorporated various multiagent coordination mechanisms into deep learning architectures to facilitate coordination. However, none of them explicitly considers the action semantics between agents, i.e., that different actions have different influences on other agents. In this paper, we propose a novel network architecture, the Action Semantics Network (ASN), that explicitly represents such action semantics between agents. ASN characterizes different actions' influence on other agents using neural networks based on the action semantics between them. ASN can easily be combined with existing deep reinforcement learning (DRL) algorithms to boost their performance. Experimental results on StarCraft II micromanagement and Neural MMO show that ASN significantly improves the performance of state-of-the-art DRL approaches compared with several network architectures. |
Tasks | Starcraft, Starcraft II |
Published | 2019-07-26 |
URL | https://arxiv.org/abs/1907.11461v3 |
https://arxiv.org/pdf/1907.11461v3.pdf | |
PWC | https://paperswithcode.com/paper/action-semantics-network-considering-the |
Repo | |
Framework | |
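To make the idea of action semantics concrete: actions that only affect the environment and actions directed at specific other agents can get separate heads. Below is a toy sketch of that separation; it is a deliberate simplification of ASN (in the paper, the per-agent heads consume pairwise observation features rather than a shared trunk), and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class ActionSemanticsSketch(nn.Module):
    """Toy split of Q-heads by action semantics: self-directed actions vs.
    actions aimed at each other agent (a simplified reading of ASN)."""
    def __init__(self, obs_dim, n_self_actions, n_other_agents):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        self.self_head = nn.Linear(64, n_self_actions)   # env-only actions
        # one head per other agent that this agent's actions can influence
        self.pair_heads = nn.ModuleList(nn.Linear(64, 1)
                                        for _ in range(n_other_agents))

    def forward(self, obs):
        h = self.trunk(obs)
        q_self = self.self_head(h)
        q_pair = torch.cat([head(h) for head in self.pair_heads], dim=-1)
        return torch.cat([q_self, q_pair], dim=-1)       # Q over all actions

q = ActionSemanticsSketch(obs_dim=10, n_self_actions=4, n_other_agents=3)(torch.randn(2, 10))
print(q.shape)  # torch.Size([2, 7])
```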
Hamiltonian Graph Networks with ODE Integrators
Title | Hamiltonian Graph Networks with ODE Integrators |
Authors | Alvaro Sanchez-Gonzalez, Victor Bapst, Kyle Cranmer, Peter Battaglia |
Abstract | We introduce an approach for imposing physically informed inductive biases in learned simulation models. We combine graph networks with a differentiable ordinary differential equation integrator as a mechanism for predicting future states, and a Hamiltonian as an internal representation. We find that our approach outperforms baselines without these biases in terms of predictive accuracy, energy accuracy, and zero-shot generalization to time-step sizes and integrator orders not experienced during training. This advances the state-of-the-art of learned simulation, and in principle is applicable beyond physical domains. |
Tasks | |
Published | 2019-09-27 |
URL | https://arxiv.org/abs/1909.12790v1 |
https://arxiv.org/pdf/1909.12790v1.pdf | |
PWC | https://paperswithcode.com/paper/hamiltonian-graph-networks-with-ode |
Repo | |
Framework | |
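The mechanism above is easiest to see with a scalar Hamiltonian: differentiate H with autograd and integrate Hamilton's equations. A minimal sketch with a toy H and a first-order integrator; the paper uses learned graph-network Hamiltonians and higher-order differentiable integrators.

```python
import torch

def hamiltonian(q, p):
    # Toy separable H (a stand-in for the learned graph-network Hamiltonian):
    # unit-mass kinetic term plus a spring potential.
    return 0.5 * (p ** 2).sum() + 0.5 * (q ** 2).sum()

def euler_step(q, p, dt=0.01):
    """One explicit-Euler step of Hamilton's equations via autograd:
    dq/dt = dH/dp, dp/dt = -dH/dq."""
    q = q.detach().requires_grad_(True)
    p = p.detach().requires_grad_(True)
    dHdq, dHdp = torch.autograd.grad(hamiltonian(q, p), (q, p))
    return (q + dt * dHdp).detach(), (p - dt * dHdq).detach()

q, p = torch.tensor([1.0]), torch.tensor([0.0])
for _ in range(3):
    q, p = euler_step(q, p)
print(q, p)
```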
MonSter: Awakening the Mono in Stereo
Title | MonSter: Awakening the Mono in Stereo |
Authors | Yotam Gil, Shay Elmalem, Harel Haim, Emanuel Marom, Raja Giryes |
Abstract | Passive depth estimation is among the longest-studied problems in computer vision. The most common methods for passive depth estimation are stereo and monocular systems. The former requires an accurate calibration process and has a limited effective range. The latter, which does not require extrinsic calibration but generally achieves inferior depth accuracy, can be tuned to achieve better results in part of the depth range. In this work, we suggest combining the two frameworks. We propose a two-camera system in which the cameras are used jointly to extract a stereo depth and individually to provide a monocular depth from each camera. The combination of these depth maps leads to more accurate depth estimation. Moreover, enforcing consistency between the extracted maps leads to a novel online self-calibration strategy. We present a prototype camera that demonstrates the benefits of the proposed combination, for both self-calibration and depth reconstruction in real-world scenes. |
Tasks | Calibration, Depth Estimation |
Published | 2019-10-30 |
URL | https://arxiv.org/abs/1910.13708v1 |
https://arxiv.org/pdf/1910.13708v1.pdf | |
PWC | https://paperswithcode.com/paper/monster-awakening-the-mono-in-stereo |
Repo | |
Framework | |
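The combination step the abstract describes can be illustrated with a simple confidence-weighted blend, and the self-calibration signal with the disagreement between the two maps. A sketch of both; this is an illustrative combination rule, not the paper's networks.

```python
import numpy as np

def fuse_depth(d_stereo, d_mono, conf_stereo, conf_mono, eps=1e-8):
    """Confidence-weighted per-pixel fusion of stereo and monocular depth."""
    w = conf_stereo / (conf_stereo + conf_mono + eps)
    return w * d_stereo + (1.0 - w) * d_mono

def consistency_error(d_stereo, d_mono):
    """Mean absolute disagreement between the two maps; in the spirit of the
    paper, a rising value can flag the rig for (self-)recalibration."""
    return float(np.mean(np.abs(d_stereo - d_mono)))

d_s = np.array([[2.0, 4.0]]); d_m = np.array([[2.2, 3.0]])
c_s = np.array([[0.9, 0.2]]); c_m = np.array([[0.3, 0.8]])
print(fuse_depth(d_s, d_m, c_s, c_m), consistency_error(d_s, d_m))
```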
A semiparametric instrumental variable approach to optimal treatment regimes under endogeneity
Title | A semiparametric instrumental variable approach to optimal treatment regimes under endogeneity |
Authors | Yifan Cui, Eric Tchetgen Tchetgen |
Abstract | There is a fast-growing literature on estimating optimal treatment regimes based on randomized trials or observational studies under the key identifying condition of no unmeasured confounding. Because confounding by unmeasured factors cannot generally be ruled out with certainty in observational studies or in randomized trials subject to noncompliance, we propose a general instrumental variable approach to learning optimal treatment regimes under endogeneity. Specifically, we provide sufficient conditions for identifying both the value function $E[Y_{\mathcal{D}(L)}]$ of a given regime $\mathcal{D}$ and the optimal regime $\arg\max_{\mathcal{D}} E[Y_{\mathcal{D}(L)}]$ with the aid of a binary instrumental variable, when the assumption of no unmeasured confounding fails to hold. We also propose novel multiply robust classification-based estimators. Furthermore, we extend the proposed method to identify and estimate the optimal treatment regime among those who would comply with the assigned treatment, under a standard monotonicity assumption. In this latter case, we establish the somewhat surprising result that the complier optimal regime can be consistently estimated without directly collecting compliance information, and therefore without the complier average treatment effect itself being identified. Our approach is illustrated via extensive simulation studies and a data application on the effect of child rearing on labor participation. |
Tasks | |
Published | 2019-11-21 |
URL | https://arxiv.org/abs/1911.09260v2 |
https://arxiv.org/pdf/1911.09260v2.pdf | |
PWC | https://paperswithcode.com/paper/a-semiparametric-instrumental-variable |
Repo | |
Framework | |
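In the abstract's notation (rendered here in standard LaTeX, with $L$ the covariates and $Y_{\mathcal{D}(L)}$ the potential outcome under regime $\mathcal{D}$), the two identification targets are:

```latex
V(\mathcal{D}) = E\left[ Y_{\mathcal{D}(L)} \right],
\qquad
\mathcal{D}^{*} = \arg\max_{\mathcal{D}} E\left[ Y_{\mathcal{D}(L)} \right].
```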
Automated Speech Generation from UN General Assembly Statements: Mapping Risks in AI Generated Texts
Title | Automated Speech Generation from UN General Assembly Statements: Mapping Risks in AI Generated Texts |
Authors | Joseph Bullock, Miguel Luengo-Oroz |
Abstract | Automated text generation has been applied broadly in many domains, such as marketing and robotics, and used to create chatbots, write product reviews and compose poetry. The ability to synthesize text, however, presents many potential risks, while access to the technology required to build generative models is becoming increasingly easy. This work is aligned with the efforts of the United Nations and other civil society organisations to highlight potential political and societal risks arising from the malicious use of text generation software, and their potential impact on human rights. As a case study, we present the findings of an experiment to generate remarks in the style of political leaders by fine-tuning a pretrained AWD-LSTM model on a dataset of speeches made at the UN General Assembly. This work highlights the ease with which this can be accomplished, as well as the threats of combining these techniques with other technologies. |
Tasks | Text Generation |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1906.01946v1 |
https://arxiv.org/pdf/1906.01946v1.pdf | |
PWC | https://paperswithcode.com/paper/automated-speech-generation-from-un-general |
Repo | |
Framework | |
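Fine-tuning a pretrained AWD-LSTM of the kind the paper describes takes only a few lines in fastai. A hedged sketch; the corpus file, column name, and epoch count are assumptions, not details from the paper.

```python
# Sketch of fine-tuning a pretrained AWD-LSTM language model with fastai.
# "un_speeches.csv" and its "text" column are hypothetical stand-ins for
# the paper's UN General Assembly corpus.
from fastai.text.all import *

df = pd.read_csv("un_speeches.csv")
dls = TextDataLoaders.from_df(df, text_col="text", is_lm=True)
learn = language_model_learner(dls, AWD_LSTM, metrics=Perplexity())
learn.fine_tune(4)                    # adapt the pretrained weights to the corpus
print(learn.predict("The General Assembly", n_words=40))
```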
A Tensorized Transformer for Language Modeling
Title | A Tensorized Transformer for Language Modeling |
Authors | Xindian Ma, Peng Zhang, Shuai Zhang, Nan Duan, Yuexian Hou, Dawei Song, Ming Zhou |
Abstract | The latest neural models connect the encoder and decoder through a self-attention mechanism. In particular, the Transformer, which is based solely on self-attention, has led to breakthroughs in Natural Language Processing (NLP) tasks. However, the multi-head attention mechanism, a key component of the Transformer, limits effective deployment of the model in resource-limited settings. In this paper, based on the ideas of tensor decomposition and parameter sharing, we propose a novel self-attention model (namely, Multi-linear attention) with Block-Term Tensor Decomposition (BTD). We test and verify the proposed attention method on three language modeling tasks (PTB, WikiText-103 and One Billion Word) and a neural machine translation task (WMT-2016 English-German). Multi-linear attention not only largely compresses the model parameters but also obtains performance improvements over a number of language modeling approaches, such as Transformer, Transformer-XL, and Transformer with tensor train decomposition. |
Tasks | Language Modelling, Machine Translation |
Published | 2019-06-24 |
URL | https://arxiv.org/abs/1906.09777v3 |
https://arxiv.org/pdf/1906.09777v3.pdf | |
PWC | https://paperswithcode.com/paper/a-tensorized-transformer-for-language |
Repo | |
Framework | |
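Block-Term Tensor Decomposition itself, on which the paper builds its Multi-linear attention, expresses a 3-way tensor as a sum of small Tucker blocks. A minimal sketch of reconstructing a tensor from such blocks; shapes and ranks are illustrative, and this shows the decomposition, not the paper's attention layer.

```python
import numpy as np

def btd_reconstruct(cores, factors):
    """Rebuild a 3-way tensor from a Block-Term decomposition: a sum of
    Tucker blocks, each a core G with factor matrices (A, B, C)."""
    T = 0
    for G, (A, B, C) in zip(cores, factors):
        T = T + np.einsum("xyz,ix,jy,kz->ijk", G, A, B, C)
    return T

rng = np.random.default_rng(0)
blocks, (I, J, K), r = 2, (8, 8, 8), 3
cores = [rng.standard_normal((r, r, r)) for _ in range(blocks)]
factors = [(rng.standard_normal((I, r)), rng.standard_normal((J, r)),
            rng.standard_normal((K, r))) for _ in range(blocks)]
print(btd_reconstruct(cores, factors).shape)  # (8, 8, 8)
```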
Deep Neural Network Ensembles against Deception: Ensemble Diversity, Accuracy and Robustness
Title | Deep Neural Network Ensembles against Deception: Ensemble Diversity, Accuracy and Robustness |
Authors | Ling Liu, Wenqi Wei, Ka-Ho Chow, Margaret Loper, Emre Gursoy, Stacey Truex, Yanzhao Wu |
Abstract | Ensemble learning is a methodology that integrates multiple DNN learners in order to improve on the prediction performance of the individual learners. Diversity is greater when the errors of the ensemble members are more uniformly distributed, and greater diversity is highly correlated with gains in ensemble accuracy. Another attractive property of diversity-optimized ensemble learning is its robustness against deception: an adversarial perturbation attack can mislead one DNN model into misclassifying, but may not fool the other ensemble members consistently. In this paper, we first give an overview of the concept of ensemble diversity and examine three types of ensemble diversity in the context of DNN classifiers. We then describe a set of ensemble diversity measures and a suite of algorithms for creating diverse ensembles and for performing ensemble consensus (voted or learned) to generate high-accuracy ensemble output by strategically combining the outputs of individual members. The paper concludes with a discussion of open issues in quantifying ensemble diversity for robust deep learning. |
Tasks | |
Published | 2019-08-29 |
URL | https://arxiv.org/abs/1908.11091v1 |
https://arxiv.org/pdf/1908.11091v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-neural-network-ensembles-against |
Repo | |
Framework | |
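Two ingredients the paper surveys, a diversity measure and a voted consensus, are easy to state concretely. A minimal sketch using pairwise disagreement and majority voting; this is one simple diversity measure among several, not the paper's full suite.

```python
import numpy as np

def pairwise_disagreement(preds):
    """preds: (n_models, n_samples) hard labels. Average fraction of inputs
    on which two ensemble members disagree -- a simple diversity measure."""
    n = preds.shape[0]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return np.mean([np.mean(preds[i] != preds[j]) for i, j in pairs])

def majority_vote(preds):
    """Voted ensemble consensus: most frequent label per sample."""
    return np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, preds)

preds = np.array([[0, 1, 1, 2], [0, 1, 2, 2], [1, 1, 1, 2]])
print(pairwise_disagreement(preds), majority_vote(preds))
```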
Medical Time Series Classification with Hierarchical Attention-based Temporal Convolutional Networks: A Case Study of Myotonic Dystrophy Diagnosis
Title | Medical Time Series Classification with Hierarchical Attention-based Temporal Convolutional Networks: A Case Study of Myotonic Dystrophy Diagnosis |
Authors | Lei Lin, Beilei Xu, Wencheng Wu, Trevor Richardson, Edgar A. Bernal |
Abstract | Myotonia, which refers to delayed muscle relaxation after contraction, is the main symptom of myotonic dystrophy patients. We propose a hierarchical attention-based temporal convolutional network (HA-TCN) for myotonic dystrophy diagnosis from handgrip time series data, and introduce mechanisms that enable model explainability. We compare the performance of the HA-TCN model against that of benchmark TCN models, LSTM models with and without attention mechanisms, and SVM approaches with handcrafted features. In terms of classification accuracy and F1 score, we found that all deep learning models achieve similar levels of performance and all outperform SVM. Further, the HA-TCN model outperforms its TCN counterpart in computational efficiency regardless of network depth, and in performance particularly when the number of hidden layers is small. Lastly, HA-TCN models consistently identify relevant time series segments in the relaxation phase of the handgrip signal, and exhibit increased robustness to noise when compared with attention-based LSTM models. |
Tasks | Time Series, Time Series Classification |
Published | 2019-03-28 |
URL | http://arxiv.org/abs/1903.11748v1 |
http://arxiv.org/pdf/1903.11748v1.pdf | |
PWC | https://paperswithcode.com/paper/medical-time-series-classification-with |
Repo | |
Framework | |
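The two building blocks named in the title, a temporal convolutional layer and an attention mechanism that highlights relevant segments, can be sketched directly. A minimal version; channel counts and kernel sizes are illustrative, and the paper's model stacks these hierarchically.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConvBlock(nn.Module):
    """Dilated causal 1-D convolution, the basic TCN ingredient."""
    def __init__(self, channels, kernel=3, dilation=1):
        super().__init__()
        self.left_pad = (kernel - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel, dilation=dilation)

    def forward(self, x):                          # x: (B, C, T)
        return F.relu(self.conv(F.pad(x, (self.left_pad, 0))))

class AttentionPool(nn.Module):
    """Attention over time steps; the weights indicate which segments of the
    series (e.g., the relaxation phase) drive the prediction."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Linear(channels, 1)

    def forward(self, x):                          # x: (B, C, T)
        h = x.transpose(1, 2)                      # (B, T, C)
        w = torch.softmax(self.score(h), dim=1)    # (B, T, 1)
        return (h * w).sum(dim=1), w               # pooled features + weights

x = torch.randn(4, 16, 100)
feats, attn = AttentionPool(16)(CausalConvBlock(16, dilation=2)(x))
print(feats.shape, attn.shape)  # torch.Size([4, 16]) torch.Size([4, 100, 1])
```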
A Constructive Approach for Data-Driven Randomized Learning of Feedforward Neural Networks
Title | A Constructive Approach for Data-Driven Randomized Learning of Feedforward Neural Networks |
Authors | Grzegorz Dudek |
Abstract | Feedforward neural networks with random hidden nodes suffer from a problem in generating the random weights and biases, as these are difficult to set optimally so as to obtain a good projection space. Typically, random parameters are drawn from an interval that is fixed beforehand or adapted during the learning process. Because the weights and biases serve different functions, selecting them both from the same interval is not a good idea. Recently, more sophisticated methods of random parameter generation have been developed, such as the data-driven method proposed in [Anon19], where the sigmoids are placed in randomly selected regions of the input space and their slopes are then adjusted to the local fluctuations of the target function. In this work, we propose an extended version of this method which iteratively constructs the network architecture. The method successively generates new hidden nodes and accepts a node only if it decreases the training error significantly. The acceptance threshold is adapted to the current training stage: at the beginning of the training process, only those nodes which lead to the largest error reduction are accepted; the threshold is then halved to accept nodes which model the target function’s details more accurately. This leads to faster convergence and a more compact network architecture, as it includes only “significant” neurons. Several application examples confirm this claim. |
Tasks | |
Published | 2019-09-04 |
URL | https://arxiv.org/abs/1909.01961v2 |
https://arxiv.org/pdf/1909.01961v2.pdf | |
PWC | https://paperswithcode.com/paper/a-constructive-approach-for-data-driven |
Repo | |
Framework | |
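The constructive loop above, generate a random node, keep it only if the error drop exceeds a threshold, halve the threshold as training proceeds, is a short procedure. A minimal sketch; the node form and threshold schedule follow the abstract, while the exact constants are assumptions.

```python
import numpy as np

def grow_network(X, y, max_nodes=50, seed=0):
    """Sketch: add random tanh hidden nodes one at a time, accept a node only
    if it cuts the training MSE by more than a threshold that is halved
    whenever a candidate is rejected."""
    rng = np.random.default_rng(seed)
    H = np.ones((len(X), 1))                         # bias column only
    beta = np.linalg.lstsq(H, y, rcond=None)[0]
    err = np.mean((H @ beta - y) ** 2)
    thresh = 0.5 * err
    while H.shape[1] <= max_nodes and thresh > 1e-8:
        w = rng.uniform(-1, 1, X.shape[1]); b = rng.uniform(-1, 1)
        cand = np.hstack([H, np.tanh(X @ w + b)[:, None]])
        beta_c = np.linalg.lstsq(cand, y, rcond=None)[0]
        err_c = np.mean((cand @ beta_c - y) ** 2)
        if err - err_c > thresh:                     # accept significant nodes
            H, beta, err = cand, beta_c, err_c
        else:
            thresh /= 2                              # allow finer refinements
    return H.shape[1] - 1, err

X = np.random.default_rng(1).uniform(-1, 1, (300, 2))
y = np.sin(3 * X[:, 0]) * X[:, 1]
print(grow_network(X, y))   # (accepted nodes, final training MSE)
```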
Learning a Safety Verifiable Adaptive Cruise Controller from Human Driving Data
Title | Learning a Safety Verifiable Adaptive Cruise Controller from Human Driving Data |
Authors | Qin Lin, Sicco Verwer, John Dolan |
Abstract | Imitation learning provides a way to construct a controller automatically by mimicking human behavior from data. For safety-critical systems such as autonomous vehicles, it can be problematic to use controllers learned from data because they cannot be guaranteed to be collision-free. Recently, a method was proposed for learning a multi-mode hybrid automaton cruise controller (MOHA). Besides being accurate, this model's logical nature makes it suitable for formal verification. In this paper, we demonstrate this capability using the SpaceEx hybrid model checker as follows. After learning, we translate the automaton model into the constraints and equations required by SpaceEx. We then verify that a pure MOHA controller is not collision-free. By adding a safety state based on time headway, a rule that human drivers should follow anyway, we do obtain a provably safe cruise controller. Moreover, the safe controller remains more human-like than existing cruise controllers. |
Tasks | Autonomous Vehicles, Imitation Learning |
Published | 2019-10-29 |
URL | https://arxiv.org/abs/1910.13526v1 |
https://arxiv.org/pdf/1910.13526v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-a-safety-verifiable-adaptive-cruise |
Repo | |
Framework | |
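The headway-based safety state can be illustrated as a guard wrapped around the learned controller's command. A hypothetical sketch; the constants are illustrative, and the paper's construction lives inside the verified hybrid automaton rather than in runtime code like this.

```python
def guarded_accel(gap_m, ego_speed_mps, learned_accel,
                  min_headway_s=2.0, hard_brake=-3.0):
    """Hypothetical safety wrapper in the spirit of the paper's headway rule:
    pass through the learned (MOHA-style) command unless the time headway to
    the lead vehicle falls below a safe bound."""
    headway = gap_m / max(ego_speed_mps, 0.1)   # avoid division by zero
    return learned_accel if headway >= min_headway_s else hard_brake

print(guarded_accel(gap_m=50.0, ego_speed_mps=20.0, learned_accel=0.5))  # 0.5
print(guarded_accel(gap_m=20.0, ego_speed_mps=20.0, learned_accel=0.5))  # -3.0
```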
UCAM Biomedical translation at WMT19: Transfer learning multi-domain ensembles
Title | UCAM Biomedical translation at WMT19: Transfer learning multi-domain ensembles |
Authors | Danielle Saunders, Felix Stahlberg, Bill Byrne |
Abstract | The 2019 WMT Biomedical translation task involved translating Medline abstracts. We approached this task by using transfer learning to obtain a series of strong neural models on distinct domains and combining them into multi-domain ensembles. We further experimented with an adaptive language-model ensemble weighting scheme. Our submission achieved the best submitted results on both directions of English-Spanish. |
Tasks | Language Modelling, Transfer Learning |
Published | 2019-06-13 |
URL | https://arxiv.org/abs/1906.05786v1 |
https://arxiv.org/pdf/1906.05786v1.pdf | |
PWC | https://paperswithcode.com/paper/ucam-biomedical-translation-at-wmt19-transfer |
Repo | |
Framework | |
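The adaptive weighting idea can be sketched as input-dependent mixture weights over the per-domain models. A minimal sketch; this is a plausible reading of "adaptive language-model ensemble weighting", not the submission's exact scheme.

```python
import numpy as np

def adaptive_ensemble(next_token_probs, domain_loglik):
    """next_token_probs: (n_models, vocab) distributions from per-domain
    translation models; domain_loglik: log-likelihood of the current source
    sentence under each domain's language model. A softmax over the LM
    scores gives input-dependent ensemble weights."""
    w = np.exp(domain_loglik - np.max(domain_loglik))
    w /= w.sum()
    return w @ next_token_probs

probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
print(adaptive_ensemble(probs, np.array([-12.0, -3.0])))  # dominated by model 2
```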
Scenario Discovery via Rule Extraction
Title | Scenario Discovery via Rule Extraction |
Authors | Vadim Arzamasov, Klemens Böhm |
Abstract | Scenario discovery is the process of finding areas of interest, commonly referred to as scenarios, in data spaces resulting from simulations. For instance, one might search for conditions - the inputs of the simulation model - under which the system being investigated is unstable. A commonly used algorithm for scenario discovery is PRIM, which yields scenarios in the form of human-comprehensible hyper-rectangles. When the simulation model has many inputs and the simulations are computationally expensive, PRIM may not produce good results given the affordable volume of data. We therefore propose a new procedure for scenario discovery: we train an intermediate statistical model that generalizes quickly, and use it to label (a lot of) data for PRIM. We provide the statistical intuition behind the idea. Our experimental study shows that this method performs much better than PRIM alone; specifically, it reduces the number of simulation runs needed by 75% on average. |
Tasks | |
Published | 2019-10-03 |
URL | https://arxiv.org/abs/1910.01713v1 |
https://arxiv.org/pdf/1910.01713v1.pdf | |
PWC | https://paperswithcode.com/paper/scenario-discovery-via-rule-extraction |
Repo | |
Framework | |
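The procedure above, run a few expensive simulations, fit a fast surrogate, label a large cheap sample, and hand it to PRIM, can be sketched end to end with a toy "simulation" and a greedy PRIM-style peeling loop. The dataset, surrogate choice, and peeling fraction are assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_sim = rng.uniform(0, 1, (200, 3))                       # "expensive" sim inputs
y_sim = ((X_sim[:, 0] > 0.6) & (X_sim[:, 1] < 0.4)).astype(int)  # toy instability flag

# Step 1: fit a fast surrogate on the few real runs; step 2: label a big sample.
surrogate = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_sim, y_sim)
X_big = rng.uniform(0, 1, (20000, 3))
y_big = surrogate.predict(X_big)

# Step 3: PRIM-style peeling - repeatedly trim the box face that most
# increases the mean label inside the box, until the box gets too small.
lo, hi = X_big.min(0), X_big.max(0)
for _ in range(40):
    inside = np.all((X_big >= lo) & (X_big <= hi), axis=1)
    if inside.sum() < 200:
        break
    best = None
    for d in range(X_big.shape[1]):
        for side, q in ((0, 0.05), (1, 0.95)):
            cut = np.quantile(X_big[inside, d], q)
            lo2, hi2 = lo.copy(), hi.copy()
            if side == 0:
                lo2[d] = cut
            else:
                hi2[d] = cut
            m = np.all((X_big >= lo2) & (X_big <= hi2), axis=1)
            score = y_big[m].mean() if m.sum() else 0.0
            if best is None or score > best[0]:
                best = (score, lo2, hi2)
    lo, hi = best[1], best[2]
print("scenario box:", np.round(lo, 2), np.round(hi, 2))
```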