Paper Group ANR 328
Retrofitting Contextualized Word Embeddings with Paraphrases. Label-PEnet: Sequential Label Propagation and Enhancement Networks for Weakly Supervised Instance Segmentation. Evaluating KGR10 Polish word embeddings in the recognition of temporal expressions using BiLSTM-CRF. One-To-Many Multilingual End-to-end Speech Translation. Assessing the Safety …
Retrofitting Contextualized Word Embeddings with Paraphrases
Title | Retrofitting Contextualized Word Embeddings with Paraphrases |
Authors | Weijia Shi, Muhao Chen, Pei Zhou, Kai-Wei Chang |
Abstract | Contextualized word embedding models, such as ELMo, generate meaningful representations of words and their context. These models have been shown to have a great impact on downstream applications. However, in many cases, the contextualized embedding of a word changes drastically when the context is paraphrased. As a result, the downstream model is not robust to paraphrasing and other linguistic variations. To enhance the stability of contextualized word embedding models, we propose an approach to retrofitting contextualized embedding models with paraphrase contexts. Our method learns an orthogonal transformation on the input space, which seeks to minimize the variance of word representations on paraphrased contexts. Experiments show that the retrofitted model significantly outperforms the original ELMo on various sentence classification and language inference tasks. |
Tasks | Sentence Classification, Word Embeddings |
Published | 2019-09-12 |
URL | https://arxiv.org/abs/1909.09700v1 |
https://arxiv.org/pdf/1909.09700v1.pdf | |
PWC | https://paperswithcode.com/paper/retrofitting-contextualized-word-embeddings |
Repo | |
Framework | |
Label-PEnet: Sequential Label Propagation and Enhancement Networks for Weakly Supervised Instance Segmentation
Title | Label-PEnet: Sequential Label Propagation and Enhancement Networks for Weakly Supervised Instance Segmentation |
Authors | Weifeng Ge, Sheng Guo, Weilin Huang, Matthew R. Scott |
Abstract | Weakly-supervised instance segmentation aims to detect and segment object instances precisely, given image-level labels only. Unlike previous methods, which are composed of multiple offline stages, we propose Sequential Label Propagation and Enhancement Networks (referred to as Label-PEnet) that progressively transform image-level labels to pixel-wise labels in a coarse-to-fine manner. We design four cascaded modules, including multi-label classification, object detection, instance refinement and instance segmentation, which are implemented sequentially by sharing the same backbone. The cascaded pipeline is trained alternately with a curriculum learning strategy that generalizes labels from high-level images to low-level pixels gradually with increasing accuracy. In addition, we design a proposal calibration module to explore the ability of classification networks to find key pixels that identify object parts, which serves as a post validation strategy running in the inverse order. We evaluate the efficiency of our Label-PEnet in mining instance masks on standard benchmarks: PASCAL VOC 2007 and 2012. Experimental results show that Label-PEnet outperforms the state-of-the-art algorithms by a clear margin, and achieves performance comparable even to that of fully-supervised approaches. |
Tasks | Calibration, Instance Segmentation, Multi-Label Classification, Object Detection, Semantic Segmentation, Weakly-supervised instance segmentation |
Published | 2019-10-07 |
URL | https://arxiv.org/abs/1910.02624v2 |
https://arxiv.org/pdf/1910.02624v2.pdf | |
PWC | https://paperswithcode.com/paper/label-penet-sequential-label-propagation-and |
Repo | |
Framework | |
Evaluating KGR10 Polish word embeddings in the recognition of temporal expressions using BiLSTM-CRF
Title | Evaluating KGR10 Polish word embeddings in the recognition of temporal expressions using BiLSTM-CRF |
Authors | Jan Kocoń, Michał Gawor |
Abstract | The article introduces a new set of Polish word embeddings, built using the KGR10 corpus, which contains more than 4 billion words. These embeddings are evaluated on the problem of recognizing temporal expressions (timexes) in the Polish language. We describe the process of creating the KGR10 corpus and a new approach to the recognition problem using a Bidirectional Long Short-Term Memory (BiLSTM) network with an additional CRF layer, where specific embeddings are essential. We present experiments and the conclusions drawn from them. |
Tasks | Word Embeddings |
Published | 2019-04-03 |
URL | http://arxiv.org/abs/1904.04055v1 |
http://arxiv.org/pdf/1904.04055v1.pdf | |
PWC | https://paperswithcode.com/paper/evaluating-kgr10-polish-word-embeddings-in |
Repo | |
Framework | |
One-To-Many Multilingual End-to-end Speech Translation
Title | One-To-Many Multilingual End-to-end Speech Translation |
Authors | Mattia Antonino Di Gangi, Matteo Negri, Marco Turchi |
Abstract | Nowadays, training end-to-end neural models for spoken language translation (SLT) still has to contend with extreme data scarcity. The existing SLT parallel corpora are indeed orders of magnitude smaller than those available for the closely related tasks of automatic speech recognition (ASR) and machine translation (MT), which usually comprise tens of millions of instances. To cope with data paucity, in this paper we explore the effectiveness of transfer learning in end-to-end SLT by presenting a multilingual approach to the task. Multilingual solutions are widely studied in MT and usually rely on "target forcing", in which multilingual parallel data are combined to train a single model by prepending to the input sequences a language token that specifies the target language. However, when tested in speech translation, our experiments show that MT-like target forcing, used as is, is not effective in discriminating among the target languages. Thus, we propose a variant that uses target-language embeddings to shift the input representations into different portions of the space according to the language, so as to better support the production of output in the desired target language. Our experiments on end-to-end SLT from English into six languages show important improvements when translating into similar languages, especially when these are supported by scarce data. Further improvements are obtained when using English ASR data as an additional language (up to +2.5 BLEU points). |
Tasks | Machine Translation, Speech Recognition, Transfer Learning |
Published | 2019-10-08 |
URL | https://arxiv.org/abs/1910.03320v1 |
https://arxiv.org/pdf/1910.03320v1.pdf | |
PWC | https://paperswithcode.com/paper/one-to-many-multilingual-end-to-end-speech |
Repo | |
Framework | |
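As a rough illustration of the MT-style "target forcing" described in the abstract above, the following sketch prepends a language token to a tokenized input. The `<2xx>` token format and the function name are hypothetical, chosen only for this example; the sketch does not cover the embedding-based variant the authors propose.

```python
def add_target_token(source_tokens, target_lang):
    """Prepend a language token telling the model which language to produce.

    The "<2xx>" token format is an assumption made for this illustration.
    """
    return [f"<2{target_lang}>"] + list(source_tokens)


sentence = ["good", "morning"]
to_german = add_target_token(sentence, "de")
to_italian = add_target_token(sentence, "it")
```

The same source sentence yields two distinct training inputs, one per target language, which is what lets a single model serve several translation directions.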
Assessing the Safety and Reliability of Autonomous Vehicles from Road Testing
Title | Assessing the Safety and Reliability of Autonomous Vehicles from Road Testing |
Authors | Xingyu Zhao, Valentin Robu, David Flynn, Kizito Salako, Lorenzo Strigini |
Abstract | There is an urgent societal need to assess whether autonomous vehicles (AVs) are safe enough. From published quantitative safety and reliability assessments of AVs, we know that, given the goal of predicting very low rates of accidents, road testing alone requires infeasible numbers of miles to be driven. However, previous analyses do not consider any knowledge prior to road testing - knowledge which could bring substantial advantages if the AV design allows strong expectations of safety before road testing. We present the advantages of a new variant of Conservative Bayesian Inference (CBI), which uses prior knowledge while avoiding optimistic biases. We then study the trend of disengagements (take-overs by human drivers) by applying Software Reliability Growth Models (SRGMs) to data from Waymo’s public road testing over 51 months, in view of the practice of software updates during this testing. Our approach is to not trust any specific SRGM, but to assess forecast accuracy and then improve forecasts. We show that, coupled with accuracy assessment and recalibration techniques, SRGMs could be a valuable test planning aid. |
Tasks | Autonomous Vehicles, Bayesian Inference |
Published | 2019-08-19 |
URL | https://arxiv.org/abs/1908.06540v1 |
https://arxiv.org/pdf/1908.06540v1.pdf | |
PWC | https://paperswithcode.com/paper/assessing-the-safety-and-reliability-of |
Repo | |
Framework | |
Efficient Relaxed Gradient Support Pursuit for Sparsity Constrained Non-convex Optimization
Title | Efficient Relaxed Gradient Support Pursuit for Sparsity Constrained Non-convex Optimization |
Authors | Fanhua Shang, Bingkun Wei, Hongying Liu, Yuanyuan Liu, Jiacheng Zhuo |
Abstract | Large-scale non-convex sparsity-constrained problems have recently gained extensive attention. Most existing deterministic optimization methods (e.g., GraSP) are not suitable for large-scale and high-dimensional problems, and thus stochastic optimization methods with hard thresholding (e.g., SVRGHT) become more attractive. Inspired by GraSP, this paper proposes a new general relaxed gradient support pursuit (RGraSP) framework, in which the sub-algorithm is only required to satisfy a slack descent condition. We also design two specific semi-stochastic gradient hard thresholding algorithms. In particular, our algorithms use far fewer hard thresholding operations than SVRGHT, and their average per-iteration cost is much lower (i.e., O(d) vs. O(d log(d)) for SVRGHT), which leads to faster convergence. Our experimental results on both synthetic and real-world datasets show that our algorithms are superior to the state-of-the-art gradient hard thresholding methods. |
Tasks | Stochastic Optimization |
Published | 2019-12-02 |
URL | https://arxiv.org/abs/1912.00858v1 |
https://arxiv.org/pdf/1912.00858v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-relaxed-gradient-support-pursuit |
Repo | |
Framework | |
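The hard thresholding operation mentioned in the abstract above keeps the k largest-magnitude coordinates of a vector and zeroes the rest. The sketch below is a generic sort-based version, not the authors' RGraSP code; the function name and example vector are illustrative assumptions. Sorting all d coordinates is one common source of an O(d log(d)) cost per thresholding step.

```python
def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x and set the others to zero."""
    if k <= 0:
        return [0.0] * len(x)
    # Indices ordered by decreasing magnitude; the sort costs O(d log d).
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]


x = [0.5, -3.0, 1.2, 0.1, -2.2]
sparse = hard_threshold(x, 2)  # only -3.0 and -2.2 survive
```

Applying this operator less often, as the abstract describes, reduces the average per-iteration cost because the remaining iterations are plain O(d) gradient updates.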
BERT Goes to Law School: Quantifying the Competitive Advantage of Access to Large Legal Corpora in Contract Understanding
Title | BERT Goes to Law School: Quantifying the Competitive Advantage of Access to Large Legal Corpora in Contract Understanding |
Authors | Emad Elwany, Dave Moore, Gaurav Oberoi |
Abstract | Fine-tuning language models, such as BERT, on domain specific corpora has proven to be valuable in domains like scientific papers and biomedical text. In this paper, we show that fine-tuning BERT on legal documents similarly provides valuable improvements on NLP tasks in the legal domain. Demonstrating this outcome is significant for analyzing commercial agreements, because obtaining large legal corpora is challenging due to their confidential nature. As such, we show that having access to large legal corpora is a competitive advantage for commercial applications, and academic research on analyzing contracts. |
Tasks | |
Published | 2019-11-01 |
URL | https://arxiv.org/abs/1911.00473v1 |
https://arxiv.org/pdf/1911.00473v1.pdf | |
PWC | https://paperswithcode.com/paper/bert-goes-to-law-school-quantifying-the |
Repo | |
Framework | |
T-Net: Parametrizing Fully Convolutional Nets with a Single High-Order Tensor
Title | T-Net: Parametrizing Fully Convolutional Nets with a Single High-Order Tensor |
Authors | Jean Kossaifi, Adrian Bulat, Georgios Tzimiropoulos, Maja Pantic |
Abstract | Recent findings indicate that over-parametrization, while crucial for successfully training deep neural networks, also introduces large amounts of redundancy. Tensor methods have the potential to efficiently parametrize over-complete representations by leveraging this redundancy. In this paper, we propose to fully parametrize Convolutional Neural Networks (CNNs) with a single high-order, low-rank tensor. Previous works on network tensorization have focused on parametrizing individual layers (convolutional or fully connected) only, and perform the tensorization layer-by-layer separately. In contrast, we propose to jointly capture the full structure of a neural network by parametrizing it with a single high-order tensor, the modes of which represent each of the architectural design parameters of the network (e.g., number of convolutional blocks, depth, number of stacks, input features, etc.). This parametrization allows us to regularize the whole network and drastically reduce the number of parameters. Our model is end-to-end trainable and the low-rank structure imposed on the weight tensor acts as an implicit regularization. We study the case of networks with rich structure, namely Fully Convolutional Networks (FCNs), which we propose to parametrize with a single 8th-order tensor. We show that our approach can achieve superior performance with small compression rates, and attain high compression rates with a negligible drop in accuracy for the challenging task of human pose estimation. |
Tasks | Pose Estimation |
Published | 2019-04-04 |
URL | http://arxiv.org/abs/1904.02698v1 |
http://arxiv.org/pdf/1904.02698v1.pdf | |
PWC | https://paperswithcode.com/paper/t-net-parametrizing-fully-convolutional-nets |
Repo | |
Framework | |
Progressive Generative Adversarial Binary Networks for Music Generation
Title | Progressive Generative Adversarial Binary Networks for Music Generation |
Authors | Manan Oza, Himanshu Vaghela, Kriti Srivastava |
Abstract | Recent improvements in generative adversarial network (GAN) training techniques show that progressively training a GAN drastically stabilizes the training and improves the quality of the outputs produced. Adding layers after the previous ones have converged has been shown to improve the overall convergence and stability of the model as well as to reduce the training time. Thus we use this training technique to train the model progressively in the time and pitch domain, i.e., starting from a very small time value and pitch range, we gradually expand the matrix sizes until the end result is a completely trained model giving outputs having tensor sizes [4 (bar) x 96 (time steps) x 84 (pitch values) x 8 (tracks)]. As shown in previously proposed models, deterministic binary neurons also help in improving the results. Thus we make use of a layer of deterministic binary neurons at the end of the generator to get binary-valued outputs instead of fractional values between 0 and 1. |
Tasks | Music Generation |
Published | 2019-03-12 |
URL | http://arxiv.org/abs/1903.04722v1 |
http://arxiv.org/pdf/1903.04722v1.pdf | |
PWC | https://paperswithcode.com/paper/progressive-generative-adversarial-binary |
Repo | |
Framework | |
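The deterministic binary neurons mentioned in the abstract above can be pictured as a thresholding step on the generator's fractional outputs. This is a minimal sketch of the forward pass only, assuming a cut-off of 0.5; the threshold value, function name, and sample data are illustrative assumptions, and how gradients pass through the step during training is not shown.

```python
def deterministic_binary_neurons(activations, threshold=0.5):
    """Map each activation in [0, 1] to 0 or 1 by thresholding."""
    return [1 if a > threshold else 0 for a in activations]


# One hypothetical row of generator outputs (note-on probabilities).
piano_roll_row = [0.05, 0.92, 0.50, 0.71]
binary_row = deterministic_binary_neurons(piano_roll_row)
```

Because the comparison is strict, an activation exactly at the threshold maps to 0 in this sketch.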
Reproducibility Evaluation of SLANT Whole Brain Segmentation Across Clinical Magnetic Resonance Imaging Protocols
Title | Reproducibility Evaluation of SLANT Whole Brain Segmentation Across Clinical Magnetic Resonance Imaging Protocols |
Authors | Yunxi Xiong, Yuankai Huo, Jiachen Wang, L. Taylor Davis, Maureen McHugo, Bennett A. Landman |
Abstract | Whole brain segmentation on structural magnetic resonance imaging (MRI) is essential for understanding neuroanatomical-functional relationships. Traditionally, multi-atlas segmentation has been regarded as the standard method for whole brain segmentation. In the past few years, deep convolutional neural network (DCNN) segmentation methods have demonstrated their advantages in both accuracy and computational efficiency. Recently, we proposed the spatially localized atlas network tiles (SLANT) method, which is able to segment a 3D MRI brain scan into 132 anatomical regions. Commonly, DCNN segmentation methods yield inferior performance under external validation, especially when the testing patterns were not present in the training cohorts. Recently, we obtained a multi-sequence MRI brain cohort of 1480 clinically acquired, de-identified brain MRI scans from 395 patients using seven different MRI protocols. Moreover, each subject has at least two scans from different MRI protocols. Herein, we assess the SLANT method’s intra- and inter-protocol reproducibility. SLANT achieved a coefficient of variation (CV) of less than 0.05 for intra-protocol experiments and less than 0.15 for inter-protocol experiments. The results show that the SLANT method achieved high intra- and inter-protocol reproducibility. |
Tasks | Brain Segmentation |
Published | 2019-01-07 |
URL | http://arxiv.org/abs/1901.02040v1 |
http://arxiv.org/pdf/1901.02040v1.pdf | |
PWC | https://paperswithcode.com/paper/reproducibility-evaluation-of-slant-whole |
Repo | |
Framework | |
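The coefficient of variation (CV) reported in the abstract above is the standard deviation divided by the mean. The sketch below computes it for repeated volume measurements of one region; the use of the population standard deviation and the sample values are assumptions for illustration, since the listing does not say which estimator the authors used.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Return the standard deviation divided by the absolute mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    # Population standard deviation is an assumption of this sketch.
    return pstdev(values) / abs(m)


# Hypothetical volumes (mm^3) of one region across three scans.
volumes = [1000.0, 1020.0, 980.0]
cv = coefficient_of_variation(volumes)
```

For these made-up values the CV is roughly 0.016, which would fall under the 0.05 intra-protocol figure quoted in the abstract.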
Supervised Learning in Spiking Neural Networks with Phase-Change Memory Synapses
Title | Supervised Learning in Spiking Neural Networks with Phase-Change Memory Synapses |
Authors | S. R. Nandakumar, Irem Boybat, Manuel Le Gallo, Evangelos Eleftheriou, Abu Sebastian, Bipin Rajendran |
Abstract | Spiking neural networks (SNNs) are artificial computational models that have been inspired by the brain’s ability to naturally encode and process information in the time domain. The added temporal dimension is believed to render them more computationally efficient than conventional artificial neural networks, though their full computational capabilities are yet to be explored. Recently, computational memory architectures based on non-volatile memory crossbar arrays have shown great promise for implementing parallel computations in artificial and spiking neural networks. In this work, we experimentally demonstrate, for the first time, the feasibility of realizing high-performance event-driven in-situ supervised learning systems using nanoscale and stochastic phase-change synapses. Our SNN is trained to recognize audio signals of alphabets encoded using spikes in the time domain and to generate spike trains at precise time instances to represent the pixel intensities of their corresponding images. Moreover, with a statistical model capturing the experimental behavior of the devices, we investigate architectural and systems-level solutions for improving the training and inference performance of our computational memory-based system. Combining the computational potential of supervised SNNs with the parallel compute power of computational memory, the work paves the way for the next generation of efficient brain-inspired systems. |
Tasks | |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11929v1 |
https://arxiv.org/pdf/1905.11929v1.pdf | |
PWC | https://paperswithcode.com/paper/supervised-learning-in-spiking-neural-2 |
Repo | |
Framework | |
Joint Architecture and Knowledge Distillation in Convolutional Neural Network for Offline Handwritten Chinese Text Recognition
Title | Joint Architecture and Knowledge Distillation in Convolutional Neural Network for Offline Handwritten Chinese Text Recognition |
Authors | Zi-Rui Wang, Jun Du |
Abstract | The technique of distillation helps transform a cumbersome neural network into a compact network so that the model can be deployed on alternative hardware devices. The main advantages of distillation-based approaches include a simple training process, support in most off-the-shelf deep learning software, and no special hardware requirements. In this paper, we propose a guideline to distill the architecture and knowledge of pre-trained standard CNNs simultaneously. We first make a quantitative analysis of the baseline network, including the computational cost and storage overhead of different components. Then, according to the analysis results, optional strategies can be adopted for the compression of fully-connected layers. For vanilla convolution layers, the proposed parsimonious convolution (ParConv) block, consisting only of depthwise separable convolution and pointwise convolution, is used as a direct replacement without other adjustments such as the widths and depths in the network. Finally, knowledge distillation with multiple losses is adopted to improve the performance of the compact CNN. The proposed algorithm is first verified on offline handwritten Chinese text recognition (HCTR), where the CNNs are characterized by tens of thousands of output nodes and trained on hundreds of millions of training samples. Compared with the CNN in the state-of-the-art system, our proposed joint architecture and knowledge distillation can reduce the computational cost by >10x and the model size by >8x with negligible accuracy loss. Then, by conducting experiments on one of the most popular data sets, MNIST, we demonstrate that the proposed approach can also be successfully applied to mainstream backbone networks. |
Tasks | Handwritten Chinese Text Recognition |
Published | 2019-12-17 |
URL | https://arxiv.org/abs/1912.07806v1 |
https://arxiv.org/pdf/1912.07806v1.pdf | |
PWC | https://paperswithcode.com/paper/joint-architecture-and-knowledge-distillation |
Repo | |
Framework | |
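To give a sense of why replacing vanilla convolutions with depthwise and pointwise convolutions shrinks a model, the sketch below compares weight counts for a single layer. It uses the generic depthwise-plus-pointwise factorization, not the exact ParConv block, whose internal layout is not given in this listing; the layer sizes are arbitrary examples and biases are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a vanilla k x k convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = kernel * kernel * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out            # 1 x 1 convolution mixing channels
    return depthwise + pointwise


# Arbitrary example layer: 3 x 3 kernel, 64 input and 128 output channels.
standard = standard_conv_params(3, 64, 128)
separable = separable_conv_params(3, 64, 128)
reduction = standard / separable
```

For this example layer the factorized form needs about 8.4 times fewer weights; the reductions quoted in the abstract refer to the authors' full system, not to this single-layer count.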
Out of Sight But Not Out of Mind: An Answer Set Programming Based Online Abduction Framework for Visual Sensemaking in Autonomous Driving
Title | Out of Sight But Not Out of Mind: An Answer Set Programming Based Online Abduction Framework for Visual Sensemaking in Autonomous Driving |
Authors | Jakob Suchan, Mehul Bhatt, Srikrishna Varadarajan |
Abstract | We demonstrate the need for and potential of systematically integrated vision and semantics solutions for visual sensemaking (in the backdrop of autonomous driving). A general method for online visual sensemaking using answer set programming is systematically formalised and fully implemented. The method integrates the state of the art in (deep learning based) visual computing, and is developed as a modular framework usable within hybrid architectures for perception & control. We evaluate and demo with community established benchmarks KITTIMOD and MOT. As a use case, we focus on the significance of human-centred visual sensemaking (e.g., semantic representation and explainability, question-answering, commonsense interpolation) in safety-critical autonomous driving situations. |
Tasks | Autonomous Driving, Question Answering |
Published | 2019-05-31 |
URL | https://arxiv.org/abs/1906.00107v1 |
https://arxiv.org/pdf/1906.00107v1.pdf | |
PWC | https://paperswithcode.com/paper/190600107 |
Repo | |
Framework | |
Inverse Cognitive Radar – A Revealed Preferences Approach
Title | Inverse Cognitive Radar – A Revealed Preferences Approach |
Authors | Vikram Krishnamurthy, Daniel Angley, Robin Evans, William Moran |
Abstract | We consider an adversarial signal processing problem involving us versus an enemy radar equipped with a Bayesian tracker. By observing the emissions of the enemy radar, how can we detect if the radar is cognitive (a constrained utility maximizer)? Given knowledge of our state and the observed sequence of actions taken by the enemy radar, we consider three problems: (i) Are the enemy radar actions (waveform choice, beam scheduling) consistent with constrained utility maximization? If so, how can we estimate the cognitive radar utility function that is consistent with its actions? We formulate and solve the problem in terms of the spectra (eigenvalues) of the state and observation noise covariance matrices, and the algebraic Riccati equation. (ii) How can we construct a statistical test for detecting a cognitive radar (constrained utility maximization) when we observe the radar actions in noise or the radar observes our probe signal in noise? We propose a statistical detector with a tight Type 2 error bound. (iii) How can we optimally probe (interrogate) the enemy radar by choosing our state to minimize the Type 2 error of detecting if the radar is deploying an economically rational strategy, subject to a constraint on the Type 1 detection error? We present a stochastic optimization algorithm to optimize our probe signal. Our state can be viewed as a probe signal which causes the enemy’s radar to act; so choosing the optimal state sequence is an input design problem. The main analysis framework used in this paper is that of revealed preferences from microeconomics. |
Tasks | Stochastic Optimization |
Published | 2019-12-01 |
URL | https://arxiv.org/abs/1912.00331v2 |
https://arxiv.org/pdf/1912.00331v2.pdf | |
PWC | https://paperswithcode.com/paper/inverse-cognitive-radar-a-revealed |
Repo | |
Framework | |
The Partial Response Network
Title | The Partial Response Network |
Authors | Paulo J. G. Lisboa, Sandra Ortega-Martorell, Sadie Cashman, Ivan Olier |
Abstract | We propose a method to open the black box of the Multi-Layer Perceptron by inferring from it a simpler and generally more accurate general additive model. The resulting model comprises non-linear univariate and bivariate partial responses derived from the original Multi-Layer Perceptron. The responses are combined using the Lasso and further optimised within a modular structure. The approach is generic and provides a constructive framework to simplify and explain the Multi-Layer Perceptron for any data set, opening the door for validation against prior knowledge. Experimental results on benchmarking datasets indicate that the partial responses are intuitive to interpret and the Area Under the Curve is competitive with Gradient Boosting, Support Vector Machines and Random Forests. The performance improvement compared with a fully connected Multi-Layer Perceptron is attributed to reduced confounding in the second stage of optimisation of the weights. The main limitation of the method is that it explicitly models only up to pairwise interactions. For many practical applications this will be optimal, but where that is not the case then this will be indicated by the performance difference compared to the original model. The streamlined model simultaneously interprets and optimises this frequently used flexible model. |
Tasks | |
Published | 2019-08-16 |
URL | https://arxiv.org/abs/1908.05978v1 |
https://arxiv.org/pdf/1908.05978v1.pdf | |
PWC | https://paperswithcode.com/paper/the-partial-response-network |
Repo | |
Framework | |