April 2, 2020

2817 words 14 mins read

Paper Group ANR 355



Informative Gaussian Scale Mixture Priors for Bayesian Neural Networks

Title Informative Gaussian Scale Mixture Priors for Bayesian Neural Networks
Authors Tianyu Cui, Aki Havulinna, Pekka Marttinen, Samuel Kaski
Abstract Encoding domain knowledge into the prior over the high-dimensional weight space is challenging in Bayesian neural networks. Two types of domain knowledge are commonly available in scientific applications: 1. feature sparsity (number of relevant features); 2. signal-to-noise ratio, quantified, for instance, as the proportion of variance explained (PVE). We show both types of domain knowledge can be encoded into the widely used Gaussian scale mixture priors with Automatic Relevance Determination. Specifically, we propose a new joint prior over the local (i.e., feature-specific) scale parameters to encode the knowledge about feature sparsity, and an algorithm to determine the global scale parameter (shared by all features) according to the PVE. Empirically, we show that the proposed informative prior improves prediction accuracy on publicly available datasets and in a genetics application.
Tasks
Published 2020-02-24
URL https://arxiv.org/abs/2002.10243v1
PDF https://arxiv.org/pdf/2002.10243v1.pdf
PWC https://paperswithcode.com/paper/informative-gaussian-scale-mixture-priors-for
Repo
Framework
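
The prior construction described above can be illustrated in a few lines of NumPy: feature-specific (local) scales encode a sparsity belief, and a single global scale is set from a target proportion of variance explained. The half-Cauchy local scales, the rescaling step, and all constants below are illustrative assumptions, not the authors' actual joint prior or calibration algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 100 input features, of which we believe ~5 are relevant.
n_features, expected_relevant = 100, 5

# Local (feature-specific) scales: a heavy-tailed draw (here half-Cauchy) so that
# most features get small scales and a few get large ones, mimicking sparsity.
local_scales = np.abs(rng.standard_cauchy(n_features))

# Encode the sparsity belief by rescaling so roughly `expected_relevant` features
# dominate the total prior variance (a crude stand-in for the paper's joint prior).
local_scales *= np.sqrt(expected_relevant / np.sum(local_scales**2))

# Global scale chosen from an assumed signal-to-noise target: with unit-variance
# features, PVE ~ signal_var / (signal_var + noise_var).
target_pve, noise_var = 0.5, 1.0
signal_var = target_pve / (1.0 - target_pve) * noise_var
global_scale = np.sqrt(signal_var)

# First-layer weights drawn from the Gaussian scale mixture prior:
# w_j ~ N(0, global_scale^2 * local_scale_j^2) for each feature j.
n_hidden = 16
weights = rng.normal(0.0, 1.0, (n_features, n_hidden)) * \
          (global_scale * local_scales)[:, None]

print("prior std per feature (first 5):", (global_scale * local_scales)[:5])
print("sampled weight matrix shape:", weights.shape)
```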

Training with Streaming Annotation

Title Training with Streaming Annotation
Authors Tongtao Zhang, Heng Ji, Shih-Fu Chang, Marjorie Freedman
Abstract In this paper, we address a practical scenario where training data is released in a sequence of small-scale batches and annotation in earlier phases has lower quality than in later ones. To tackle this situation, we utilize a pre-trained transformer network to preserve and integrate the most salient document information from the earlier batches while focusing on the annotation (presumably of higher quality) from the current batch. Using event extraction as a case study, we demonstrate experimentally that our proposed framework performs better than conventional approaches (with absolute F-score gains ranging from 3.6% to 14.9%), especially when there is more noise in the early annotation; our approach also saves 19.1% of training time relative to the best conventional method.
Tasks
Published 2020-02-11
URL https://arxiv.org/abs/2002.04165v1
PDF https://arxiv.org/pdf/2002.04165v1.pdf
PWC https://paperswithcode.com/paper/training-with-streaming-annotation
Repo
Framework
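
A minimal sketch of the training schedule implied by the abstract: one model is updated as annotation batches stream in, with newer (presumably cleaner) batches weighted more heavily. The incremental linear classifier stands in for the pre-trained transformer, and the weighting scheme and synthetic noise levels are assumptions made only for illustration.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

# Hypothetical stand-in for streaming annotation: three batches arrive in order,
# with label noise decreasing over time (earlier annotation is lower quality).
def make_batch(n, noise):
    X = rng.normal(size=(n, 20))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
    flip = rng.random(n) < noise          # simulate annotation errors
    y[flip] = 1 - y[flip]
    return X, y

batches = [make_batch(200, noise) for noise in (0.3, 0.15, 0.05)]

# One incrementally updated model: keep training as batches stream in, while
# weighting the newest (presumably cleaner) annotation more heavily.
clf = SGDClassifier(random_state=0)
classes = np.array([0, 1])

for t, (X, y) in enumerate(batches):
    # Newer batches get larger sample weights (an assumed weighting scheme).
    weight = np.full(len(y), 1.0 + t)
    clf.partial_fit(X, y, classes=classes, sample_weight=weight)
    print(f"after batch {t}: accuracy on current batch = {clf.score(X, y):.3f}")
```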

Local Contextual Attention with Hierarchical Structure for Dialogue Act Recognition

Title Local Contextual Attention with Hierarchical Structure for Dialogue Act Recognition
Authors Zhigang Dai, Jinhua Fu, Qile Zhu, Hengbin Cui, Xiaolong Li, Yuan Qi
Abstract Dialogue act recognition is a fundamental task for an intelligent dialogue system. Previous work models the whole dialog to predict dialog acts, which may introduce noise from unrelated sentences. In this work, we design a hierarchical model based on self-attention to capture intra-sentence and inter-sentence information. We revise the attention distribution to focus on local and contextual semantic information by incorporating the relative position information between utterances. Based on the finding that dialog length affects performance, we introduce a new dialog segmentation mechanism to analyze the effect of dialog length and context padding length under online and offline settings. Experiments show that our method achieves promising performance on two datasets, Switchboard Dialogue Act and DailyDialog, with accuracies of 80.34% and 85.81%, respectively. Visualization of the attention weights shows that our method can explicitly learn the context dependency between utterances.
Tasks
Published 2020-03-12
URL https://arxiv.org/abs/2003.06044v1
PDF https://arxiv.org/pdf/2003.06044v1.pdf
PWC https://paperswithcode.com/paper/local-contextual-attention-with-hierarchical
Repo
Framework
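
One way to picture the "revised attention distribution" is a distance-dependent bias added to utterance-level self-attention scores, so attention concentrates on nearby utterances. The sketch below assumes a Gaussian penalty on relative position; the paper's hierarchical structure and exact bias differ, and the utterance encodings here are random vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 6 utterances in a dialog, each already encoded as a vector
# (in the paper this comes from an intra-sentence encoder; here it is random).
n_utts, d = 6, 32
H = rng.normal(size=(n_utts, d))

# Plain scaled dot-product self-attention between utterances.
scores = H @ H.T / np.sqrt(d)

# Local contextual bias: penalize attention to far-away utterances so the
# distribution concentrates on a local window. The Gaussian form and width
# are illustrative assumptions, not the paper's exact revision.
positions = np.arange(n_utts)
rel_dist = np.abs(positions[:, None] - positions[None, :])
window = 2.0
scores = scores - (rel_dist ** 2) / (2 * window ** 2)

# Softmax over the biased scores gives the revised attention distribution.
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn = attn / attn.sum(axis=-1, keepdims=True)

context = attn @ H   # inter-sentence representations used for dialog act prediction
print(np.round(attn[2], 3))   # attention of utterance 2, concentrated near itself
```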

Consistency of a Recurrent Language Model With Respect to Incomplete Decoding

Title Consistency of a Recurrent Language Model With Respect to Incomplete Decoding
Authors Sean Welleck, Ilia Kulikov, Jaedeok Kim, Richard Yuanzhe Pang, Kyunghyun Cho
Abstract Despite strong performance on a variety of tasks, neural sequence models trained with maximum likelihood have been shown to exhibit issues such as length bias and degenerate repetition. We study the related issue of receiving infinite-length sequences from a recurrent language model when using common decoding algorithms. To analyze this issue, we first define inconsistency of a decoding algorithm, meaning that the algorithm can yield an infinite-length sequence that has zero probability under the model. We prove that commonly used incomplete decoding algorithms - greedy search, beam search, top-k sampling, and nucleus sampling - are inconsistent, despite the fact that recurrent language models are trained to produce sequences of finite length. Based on these insights, we propose two remedies which address inconsistency: consistent variants of top-k and nucleus sampling, and a self-terminating recurrent language model. Empirical results show that inconsistency occurs in practice, and that the proposed methods prevent inconsistency.
Tasks Language Modelling
Published 2020-02-06
URL https://arxiv.org/abs/2002.02492v1
PDF https://arxiv.org/pdf/2002.02492v1.pdf
PWC https://paperswithcode.com/paper/consistency-of-a-recurrent-language-model
Repo
Framework
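
The consistent top-k remedy has a compact illustration: keep the end-of-sequence token in the candidate set so that terminating always has non-zero probability. The toy "model" below just emits random logits; the helper is a simplified sketch of the idea, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
EOS = 0   # assume token id 0 is the end-of-sequence token

def consistent_top_k_sample(logits, k):
    """Top-k sampling that always keeps EOS in the candidate set.

    Standard top-k sampling can exclude EOS forever (yielding infinite
    sequences); the consistent variant adds EOS to the candidates before
    renormalizing. This is a simplified sketch of that remedy.
    """
    top = set(np.argsort(logits)[-k:])   # usual top-k candidates
    top.add(EOS)                         # consistency fix: EOS is always reachable
    idx = np.array(sorted(top))
    probs = np.exp(logits[idx] - logits[idx].max())
    probs /= probs.sum()
    return rng.choice(idx, p=probs)

# Toy decoding loop with a fake "model": random logits over a 50-token vocabulary.
tokens, vocab = [], 50
for _ in range(100):                     # hard cap only as a safety net
    logits = rng.normal(size=vocab)
    tok = consistent_top_k_sample(logits, k=5)
    tokens.append(int(tok))
    if tok == EOS:
        break
print("decoded length:", len(tokens))
```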

Aligning the Pretraining and Finetuning Objectives of Language Models

Title Aligning the Pretraining and Finetuning Objectives of Language Models
Authors Nuo Wang Pierse, Jingwen Lu
Abstract We demonstrate that explicitly aligning the pretraining objectives to the finetuning objectives in language model training significantly improves the finetuning task performance and reduces the minimum number of finetuning examples required. The performance margin gained from objective alignment allows us to build language models with smaller sizes for tasks with less available training data. We provide empirical evidence of these claims by applying objective alignment to concept-of-interest tagging and acronym detection tasks. We found that, with objective alignment, our 768 by 3 and 512 by 3 transformer language models can reach accuracies of 83.9%/82.5% for concept-of-interest tagging and 73.8%/70.2% for acronym detection using only 200 finetuning examples per task, outperforming the 768 by 3 model pretrained without objective alignment by +4.8%/+3.4% and +9.9%/+6.3%. We name finetuning small language models in the presence of hundreds of training examples or fewer “Few Example Learning”. In practice, Few Example Learning enabled by objective alignment not only saves human labeling costs, but also makes it possible to leverage language models in more real-time applications.
Tasks Language Modelling
Published 2020-02-05
URL https://arxiv.org/abs/2002.02000v1
PDF https://arxiv.org/pdf/2002.02000v1.pdf
PWC https://paperswithcode.com/paper/aligning-the-pretraining-and-finetuning
Repo
Framework
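
A rough illustration of what "aligning the finetuning objective to the pretraining objective" can look like in practice: the downstream example is rewritten into the same fill-in-the-blank format the language model was pretrained on, rather than attaching a fresh classification head. The verbalizer, prompt template, and example below are hypothetical; the paper's exact recipe for concept-of-interest tagging and acronym detection may differ.

```python
# Hypothetical verbalizer mapping labels to vocabulary words the LM can predict.
LABEL_WORDS = {1: "yes", 0: "no"}

def to_aligned_example(sentence: str, span: str, label: int):
    """Turn an acronym-detection-style example into a masked-LM-style example."""
    prompt = f'In the sentence "{sentence}", is "{span}" an acronym? [MASK]'
    target = LABEL_WORDS[label]   # the model is finetuned to fill [MASK] with this word
    return prompt, target

prompt, target = to_aligned_example("The NASA mission launched in 2020.", "NASA", 1)
print(prompt)
print("target token:", target)
```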

Stability for the Training of Deep Neural Networks and Other Classifiers

Title Stability for the Training of Deep Neural Networks and Other Classifiers
Authors Leonid Berlyand, Pierre-Emmanuel Jabin, C. Alex Safsten
Abstract We examine the stability of loss-minimizing training processes that are used for deep neural networks (DNNs) and other classifiers. While a classifier is optimized during training through a so-called loss function, its performance is usually evaluated by some measure of accuracy, such as the overall accuracy, which quantifies the proportion of objects that are correctly classified. This leads to the guiding question of stability: does decreasing loss through training always result in increased accuracy? We formalize the notion of stability and provide examples of instability. Our main result consists of two novel conditions on the classifier, either of which, if satisfied, ensures stability of training; that is, we derive tight bounds on accuracy as loss decreases. These conditions are explicitly verifiable in practice on a given dataset. Our results do not depend on the algorithm used for training, as long as loss decreases with training.
Tasks
Published 2020-02-10
URL https://arxiv.org/abs/2002.04122v1
PDF https://arxiv.org/pdf/2002.04122v1.pdf
PWC https://paperswithcode.com/paper/stability-for-the-training-of-deep-neural
Repo
Framework
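
The guiding question (can loss decrease while accuracy drops?) admits a two-example numerical illustration: one prediction becomes extremely confident while another flips to the wrong side of 0.5, so mean cross-entropy falls even though accuracy falls too. The probabilities below are hand-picked for illustration, not produced by any training run.

```python
import numpy as np

def avg_loss(p_correct):
    """Mean cross-entropy when p_correct[i] is the probability assigned
    to the true class of example i."""
    return float(np.mean(-np.log(p_correct)))

def accuracy(p_correct):
    """An example counts as correct when the true class gets probability > 0.5."""
    return float(np.mean(np.array(p_correct) > 0.5))

before = [0.55, 0.55]   # both examples barely correct
after  = [0.99, 0.45]   # one example very confident, the other now misclassified

print("loss:    ", avg_loss(before), "->", avg_loss(after))   # decreases
print("accuracy:", accuracy(before), "->", accuracy(after))   # 1.0 -> 0.5
```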

Accumulated Polar Feature-based Deep Learning for Efficient and Lightweight Automatic Modulation Classification with Channel Compensation Mechanism

Title Accumulated Polar Feature-based Deep Learning for Efficient and Lightweight Automatic Modulation Classification with Channel Compensation Mechanism
Authors Chieh-Fang Teng, Ching-Yao Chou, Chun-Hsiang Chen, An-Yeu Wu
Abstract In next-generation communications, massive machine-type communications (mMTC) place a severe burden on base stations. To address this issue, automatic modulation classification (AMC) can help reduce signaling overhead by blindly recognizing modulation types without handshaking; thus, it plays an important role in future intelligent modems. The emerging deep learning (DL) technique stores intelligence in the network, resulting in superior performance over traditional approaches. However, conventional DL-based approaches suffer from heavy training overhead, memory overhead, and computational complexity, which severely hinder practical applications in resource-limited scenarios, such as Vehicle-to-Everything (V2X) applications. Furthermore, the overhead of online retraining under time-varying fading channels has not been studied in prior work. In this work, an accumulated polar feature-based DL approach with a channel compensation mechanism is proposed to cope with these issues. First, simulation results show that learning features from the polar domain with historical data information can approach near-optimal performance while reducing the training overhead by a factor of 99.8. Second, the proposed neural network-based channel estimator (NN-CE) can learn the channel response and compensate for the distorted channel with a 13% improvement. Moreover, in applying this lightweight NN-CE to a time-varying fading channel, two efficient online retraining mechanisms are proposed, which reduce transmission overhead and retraining overhead by 90% and 76%, respectively. Finally, the performance of the proposed approach is evaluated and compared with prior work on a public dataset to demonstrate its efficiency and light weight.
Tasks
Published 2020-01-06
URL https://arxiv.org/abs/2001.01395v2
PDF https://arxiv.org/pdf/2001.01395v2.pdf
PWC https://paperswithcode.com/paper/accumulated-polar-feature-based-deep-learning
Repo
Framework
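
One reading of "accumulated polar features" is a histogram of received IQ samples in amplitude-phase coordinates, accumulated over time into a small grid that a lightweight network can consume. The sketch below uses a toy QPSK signal; the constellation, noise level, and grid size are assumptions made only to illustrate the polar-domain accumulation, not the paper's exact feature design.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical received symbols: QPSK with additive noise, as complex IQ samples.
symbols = rng.choice(np.array([1+1j, 1-1j, -1+1j, -1-1j]) / np.sqrt(2), size=2000)
rx = symbols + 0.1 * (rng.normal(size=2000) + 1j * rng.normal(size=2000))

# Polar-domain representation: amplitude and phase of each sample.
amplitude = np.abs(rx)
phase = np.angle(rx)

# Accumulate samples over time into a 2D amplitude-phase histogram; this grid
# (here 32x32) could then be fed to a small CNN as a lightweight input feature.
feature, _, _ = np.histogram2d(
    amplitude, phase,
    bins=32,
    range=[[0.0, 2.0], [-np.pi, np.pi]],
)
feature = feature / feature.sum()   # normalize so the feature is scale-free

print("accumulated polar feature shape:", feature.shape)
print("fraction of mass in the densest cell:", feature.max())
```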

CatBoostLSS – An extension of CatBoost to probabilistic forecasting

Title CatBoostLSS – An extension of CatBoost to probabilistic forecasting
Authors Alexander März
Abstract We propose a new framework based on CatBoost that predicts the entire conditional distribution of a univariate response variable. In particular, CatBoostLSS models all moments of a parametric distribution (i.e., mean, location, scale and shape [LSS]) instead of the conditional mean only. Choosing from a wide range of continuous, discrete and mixed discrete-continuous distributions, modelling and predicting the entire conditional distribution greatly enhances the flexibility of CatBoost, as it allows us to gain insight into the data-generating process, as well as to create probabilistic forecasts from which prediction intervals and quantiles of interest can be derived. We present both a simulation study and real-world examples that demonstrate the benefits of our approach.
Tasks
Published 2020-01-04
URL https://arxiv.org/abs/2001.02121v1
PDF https://arxiv.org/pdf/2001.02121v1.pdf
PWC https://paperswithcode.com/paper/catboostlss-an-extension-of-catboost-to
Repo
Framework
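
To make the "beyond the conditional mean" idea concrete, here is a crude two-stage stand-in using the plain CatBoost regressor: one booster for the location and one for the (log) scale of a Gaussian, from which prediction intervals follow. This is explicitly not the CatBoostLSS API or its joint estimation procedure, just a sketch of what distributional prediction buys over a mean-only model.

```python
import numpy as np
from catboost import CatBoostRegressor
from scipy.stats import norm

rng = np.random.default_rng(0)

# Toy heteroscedastic data: the spread of y depends on x, so a conditional-mean
# model alone cannot give calibrated prediction intervals.
X = rng.uniform(-3, 3, size=(2000, 1))
y = np.sin(X[:, 0]) + (0.2 + 0.3 * np.abs(X[:, 0])) * rng.normal(size=2000)

# Stage 1: booster for the conditional mean (location).
mean_model = CatBoostRegressor(iterations=300, depth=4)
mean_model.fit(X, y, verbose=False)
resid = y - mean_model.predict(X)

# Stage 2: booster for the conditional log-variance of the residuals (scale).
scale_model = CatBoostRegressor(iterations=300, depth=4)
scale_model.fit(X, np.log(resid**2 + 1e-6), verbose=False)

# Gaussian predictive distribution at new points -> quantiles and intervals.
X_new = np.array([[0.0], [2.5]])
mu = mean_model.predict(X_new)
sigma = np.sqrt(np.exp(scale_model.predict(X_new)))
lower, upper = norm.ppf(0.05, mu, sigma), norm.ppf(0.95, mu, sigma)
print("90% prediction intervals:", list(zip(lower.round(2), upper.round(2))))
```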

On the Communication Latency of Wireless Decentralized Learning

Title On the Communication Latency of Wireless Decentralized Learning
Authors Navid Naderializadeh
Abstract We consider a wireless network comprising $n$ nodes located within a circular area of radius $R$, which are participating in a decentralized learning algorithm to optimize a global objective function using their local datasets. To enable gradient exchanges across the network, we assume each node communicates only with a set of neighboring nodes, which are within a distance $R n^{-\beta}$ of itself, where $\beta\in(0,\frac{1}{2})$. We use tools from network information theory and random geometric graph theory to show that the communication delay for a single round of exchanging gradients on all the links throughout the network scales as $\mathcal{O}\left(\frac{n^{2-3\beta}}{\beta\log n}\right)$, increasing (at different rates) with both the number of nodes and the gradient exchange threshold distance.
Tasks
Published 2020-02-10
URL https://arxiv.org/abs/2002.04069v1
PDF https://arxiv.org/pdf/2002.04069v1.pdf
PWC https://paperswithcode.com/paper/on-the-communication-latency-of-wireless
Repo
Framework
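
The scaling result can be explored numerically: nodes within distance R n^(-β) of each other are neighbors, and the per-round gradient-exchange delay grows on the order of n^(2-3β)/(β log n). The snippet below only evaluates that asymptotic order for a few (n, β) pairs, ignoring constants and lower-order terms, so the numbers are comparative rather than absolute.

```python
import math

def latency_scaling(n: int, beta: float) -> float:
    """Evaluate the scaling term n^(2 - 3*beta) / (beta * log n) from the abstract."""
    return n ** (2 - 3 * beta) / (beta * math.log(n))

R = 1.0   # assumed network radius (only enters through the neighbor distance below)
for n in (100, 1000, 10000):
    for beta in (0.1, 0.25, 0.4):
        neighbor_dist = R * n ** (-beta)   # nodes within R * n^(-beta) are neighbors
        print(f"n={n:6d} beta={beta:.2f} "
              f"neighbor distance={neighbor_dist:.3f} "
              f"latency order ~ {latency_scaling(n, beta):.1f}")
```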

The Knowledge Graph Track at OAEI – Gold Standards, Baselines, and the Golden Hammer Bias

Title The Knowledge Graph Track at OAEI – Gold Standards, Baselines, and the Golden Hammer Bias
Authors Sven Hertling, Heiko Paulheim
Abstract The Ontology Alignment Evaluation Initiative (OAEI) is an annual evaluation of ontology matching tools. In 2018, we started the Knowledge Graph track, whose goal is to evaluate the simultaneous matching of entities and schemas of large-scale knowledge graphs. In this paper, we discuss the design of the track and two different strategies for gold standard creation. We analyze results and experiences obtained in the first editions of the track, and, by revealing a hidden task, we show that all tools submitted to the track (and probably also to other tracks) suffer from a bias which we name the golden hammer bias.
Tasks Knowledge Graphs
Published 2020-02-24
URL https://arxiv.org/abs/2002.10283v1
PDF https://arxiv.org/pdf/2002.10283v1.pdf
PWC https://paperswithcode.com/paper/the-knowledge-graph-track-at-oaei-gold
Repo
Framework

Hypergraph Optimization for Multi-structural Geometric Model Fitting

Title Hypergraph Optimization for Multi-structural Geometric Model Fitting
Authors Shuyuan Lin, Guobao Xiao, Yan Yan, David Suter, Hanzi Wang
Abstract Recently, some hypergraph-based methods have been proposed to deal with the problem of model fitting in computer vision, mainly due to the superior capability of hypergraphs to represent the complex relationships between data points. However, a hypergraph becomes extremely complicated when the input data include a large number of data points (usually contaminated with noise and outliers), which significantly increases the computational burden. To overcome this problem, we propose a novel hypergraph optimization based model fitting (HOMF) method to construct a simple but effective hypergraph. Specifically, HOMF includes two main parts: an adaptive inlier estimation algorithm for vertex optimization and an iterative hyperedge optimization algorithm for hyperedge optimization. The proposed method is highly efficient, and it can obtain accurate model fitting results within a few iterations. Moreover, HOMF can then directly apply spectral clustering to achieve good fitting performance. Extensive experimental results show that HOMF outperforms several state-of-the-art model fitting methods on both synthetic data and real images, especially in sampling efficiency and in handling data with severe outliers.
Tasks
Published 2020-02-13
URL https://arxiv.org/abs/2002.05350v1
PDF https://arxiv.org/pdf/2002.05350v1.pdf
PWC https://paperswithcode.com/paper/hypergraph-optimization-for-multi-structural
Repo
Framework
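
The general recipe behind hypergraph-based fitting (hyperedges from sampled model hypotheses, then clustering of vertices) can be sketched for two-line fitting as follows. The random minimal sampling, fixed inlier threshold, and simple shared-membership affinity below are illustrative stand-ins; HOMF's adaptive inlier estimation and iterative hyperedge optimization are more involved.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)

# Toy two-structure data: points near two different lines, plus gross outliers.
x = rng.uniform(-1, 1, 100)
pts = np.vstack([
    np.column_stack([x[:50],  1.0 * x[:50] + 0.02 * rng.normal(size=50)]),
    np.column_stack([x[50:], -1.0 * x[50:] + 0.5 + 0.02 * rng.normal(size=50)]),
    rng.uniform(-1, 1, (20, 2)),                      # outliers
])

# Hyperedges: each line hypothesis fitted from a random minimal sample (2 points)
# connects all points whose residual is below a threshold.
n_hyp, thresh = 100, 0.05
incidence = np.zeros((len(pts), n_hyp))
for h in range(n_hyp):
    i, j = rng.choice(len(pts), size=2, replace=False)
    (x1, y1), (x2, y2) = pts[i], pts[j]
    if abs(x2 - x1) < 1e-6:
        continue                                      # skip near-vertical hypotheses
    a = (y2 - y1) / (x2 - x1)
    b = y1 - a * x1
    resid = np.abs(pts[:, 1] - (a * pts[:, 0] + b)) / np.sqrt(1 + a * a)
    incidence[:, h] = resid < thresh

# Vertex affinity: how many hyperedges two points share; the small constant keeps
# the graph connected even for isolated outliers. Spectral clustering on this
# affinity recovers the two line structures for most inlier points.
affinity = incidence @ incidence.T + 1e-3
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(affinity)
print("cluster sizes:", np.bincount(labels))
```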

Improving Neural Network Learning Through Dual Variable Learning Rates

Title Improving Neural Network Learning Through Dual Variable Learning Rates
Authors Elizabeth Liner, Risto Miikkulainen
Abstract This paper introduces and evaluates a novel training method for neural networks: Dual Variable Learning Rates (DVLR). Building on techniques and insights from behavioral psychology, the dual learning rates are used to emphasize correct and incorrect responses differently, thereby making the feedback to the network more specific. Further, the learning rates are varied as a function of the network’s performance, thereby making training more efficient. DVLR was implemented on both a simple feedforward neural network and a convolutional neural network. Both networks train faster and achieve increased accuracy on the MNIST and CIFAR-10 domains, demonstrating that DVLR is a promising, psychologically motivated technique for training neural network models.
Tasks
Published 2020-02-09
URL https://arxiv.org/abs/2002.03428v2
PDF https://arxiv.org/pdf/2002.03428v2.pdf
PWC https://paperswithcode.com/paper/improving-neural-network-learning-through
Repo
Framework
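
A minimal sketch of dual, performance-dependent learning rates on a logistic model: incorrectly classified examples receive a larger per-sample rate than correct ones, and both rates shrink as accuracy rises. The specific schedule and constants below are guesses at the idea for illustration, not the paper's DVLR settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data for a single-layer logistic model.
X = rng.normal(size=(500, 10))
w_true = rng.normal(size=10)
y = (X @ w_true > 0).astype(float)

w = np.zeros(10)
base_lr_correct, base_lr_incorrect = 0.05, 0.2   # assumed dual rates

for epoch in range(50):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))           # predicted probabilities
    correct = (p > 0.5) == (y > 0.5)
    acc = correct.mean()

    # Dual variable learning rates: incorrect responses get a larger rate than
    # correct ones, and both shrink as accuracy improves (the "variable" part).
    lr = np.where(correct, base_lr_correct, base_lr_incorrect) * (1.0 - acc + 0.1)

    grad_per_sample = (p - y)[:, None] * X       # per-sample logistic-loss gradients
    w -= (lr[:, None] * grad_per_sample).mean(axis=0)

p = 1.0 / (1.0 + np.exp(-(X @ w)))
print("final training accuracy:", float(((p > 0.5) == (y > 0.5)).mean()))
```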

CLAI: A Platform for AI Skills on the Command Line

Title CLAI: A Platform for AI Skills on the Command Line
Authors Mayank Agarwal, Jorge J. Barroso, Tathagata Chakraborti, Eli M. Dow, Kshitij Fadnis, Borja Godoy, Kartik Talamadupula
Abstract This paper reports on the open source project CLAI (Command Line AI), aimed at bringing the power of AI to the command line interface. The platform sets up the CLI as a new environment for AI researchers to conquer by surfacing the command line as a generic environment that researchers can interface with using a simple sense-act API, much like the traditional AI agent architecture. In this paper, we discuss the design and implementation of the platform in detail, through illustrative use cases of new end-user interaction patterns enabled by this design, and through a quantitative evaluation of the system footprint of a CLAI-enabled terminal. We also report early user feedback on its features from an internal survey.
Tasks
Published 2020-01-31
URL https://arxiv.org/abs/2002.00762v1
PDF https://arxiv.org/pdf/2002.00762v1.pdf
PWC https://paperswithcode.com/paper/clai-a-platform-for-ai-skills-on-the-command
Repo
Framework
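
The "simple sense-act API" framing can be pictured with a toy skill that observes the typed command and proposes an action. The class and method names below are hypothetical illustrations of the agent-architecture idea; they are not the actual CLAI plugin interface, which is documented in the project repository.

```python
# A generic sense-act loop in the spirit of the agent architecture described above.
class EchoSkill:
    """A trivial 'skill' that senses the typed command and proposes an action."""

    def sense(self, command: str) -> dict:
        # Observe the state: here, just the raw command-line text.
        return {"command": command, "looks_like_typo": command.startswith("sl ")}

    def act(self, observation: dict) -> str:
        # Propose a response; a real skill could call a model or a planner here.
        if observation["looks_like_typo"]:
            return "Did you mean: ls " + observation["command"][3:]
        return ""  # empty string: the skill stays silent


if __name__ == "__main__":
    skill = EchoSkill()
    for cmd in ["sl -la", "git status"]:
        suggestion = skill.act(skill.sense(cmd))
        print(f"$ {cmd}\n  {suggestion or '(no suggestion)'}")
```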

Towards Evaluating Gaussian Blurring in Perceptual Hashing as a Facial Image Filter

Title Towards Evaluating Gaussian Blurring in Perceptual Hashing as a Facial Image Filter
Authors Yigit Alparslan, Mannika Kshettry, Louis Kratz
Abstract With the growth of social media, there is a huge number of face images available on the internet. Often, people use other people’s pictures on their own profiles. Perceptual hashing is often used to detect whether two images are identical; it can therefore be used to detect whether people are misusing others’ pictures. In perceptual hashing, a hash is calculated for a given image, and a new test image is mapped to one of the existing hashes if duplicate features are present. It can thus be used as an image filter to flag banned image content or adversarial attacks (modifications made on purpose to deceive the filter), even though the content might be changed to deceive the filters. For this reason, it is critical for perceptual hashing to be robust to transformations such as resizing, cropping, and slight pixel modifications. In this paper, we propose to experiment with the effect of Gaussian blurring in perceptual hashing for detecting misuse of personal images, specifically face images. We hypothesize that applying Gaussian blurring to an image before calculating its hash will increase the accuracy of our filter for detecting adversarial attacks that consist of image cropping, added text annotations, and image rotation.
Tasks Image Cropping
Published 2020-02-01
URL https://arxiv.org/abs/2002.00140v1
PDF https://arxiv.org/pdf/2002.00140v1.pdf
PWC https://paperswithcode.com/paper/towards-evaluating-gaussian-blurring-in
Repo
Framework
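
The proposed experiment is straightforward to prototype with off-the-shelf tools: hash an image and a modified copy with and without a Gaussian pre-blur, and compare the hash distances. The sketch below uses Pillow and the imagehash package on a synthetic image so it is self-contained; the blur radius and the rotation/crop "attack" are illustrative choices, not the paper's experimental protocol.

```python
import numpy as np
from PIL import Image, ImageFilter
import imagehash

rng = np.random.default_rng(0)

# Synthetic stand-in for a face photo (a random grayscale image); with real data
# this would be a profile picture.
img = Image.fromarray(rng.integers(0, 256, (256, 256), dtype=np.uint8))

# A mild adversarial-style modification: small rotation plus a crop, then resize back.
attacked = img.rotate(3).crop((5, 5, 251, 251)).resize((256, 256))

def hash_distance(a, b, blur_radius=None):
    """Hamming distance between perceptual hashes, optionally blurring first."""
    if blur_radius is not None:
        a = a.filter(ImageFilter.GaussianBlur(blur_radius))
        b = b.filter(ImageFilter.GaussianBlur(blur_radius))
    return imagehash.phash(a) - imagehash.phash(b)

print("pHash distance without blurring:", hash_distance(img, attacked))
print("pHash distance with blurring:   ", hash_distance(img, attacked, blur_radius=2))
```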

Human-Centered Artificial Intelligence: Reliable, Safe & Trustworthy

Title Human-Centered Artificial Intelligence: Reliable, Safe & Trustworthy
Authors Ben Shneiderman
Abstract Well-designed technologies that offer high levels of human control and high levels of computer automation can increase human performance, leading to wider adoption. The Human-Centered Artificial Intelligence (HCAI) framework clarifies how to (1) design for high levels of human control and high levels of computer automation so as to increase human performance, (2) understand the situations in which full human control or full computer control are necessary, and (3) avoid the dangers of excessive human control or excessive computer control. The methods of HCAI are more likely to produce designs that are Reliable, Safe & Trustworthy (RST). Achieving these goals will dramatically increase human performance, while supporting human self-efficacy, mastery, creativity, and responsibility.
Tasks
Published 2020-02-10
URL https://arxiv.org/abs/2002.04087v2
PDF https://arxiv.org/pdf/2002.04087v2.pdf
PWC https://paperswithcode.com/paper/human-centered-artificial-intelligence
Repo
Framework