# Paper Group ANR 3

EM Algorithm is Sample-Optimal for Learning Mixtures of Well-Separated Gaussians. Backdoor Attacks against Transfer Learning with Pre-trained Deep Learning Models. Transport Gaussian Processes for Regression. LTP: A New Active Learning Strategy for Bert-CRF Based Named Entity Recognition. Some convergent results for Backtracking Gradient Descent me …

#### EM Algorithm is Sample-Optimal for Learning Mixtures of Well-Separated Gaussians

Title | EM Algorithm is Sample-Optimal for Learning Mixtures of Well-Separated Gaussians |

Authors | Jeongyeol Kwon, Constantine Caramanis |

Abstract | We consider the problem of spherical Gaussian Mixture models with $k \geq 3$ components when the components are well separated. A fundamental previous result established that separation of $\Omega(\sqrt{\log k})$ is necessary and sufficient for identifiability of the parameters with polynomial sample complexity (Regev and Vijayaraghavan, 2017). We show that $\tilde{O}(kd/\epsilon^2)$ samples suffice, closing the gap from polynomial to linear, and thus giving the first sample-optimal upper bound for the parameter estimation of well-separated Gaussian mixtures (up to logarithmic factors). We accomplish this by proving a new result for the Expectation-Maximization (EM) algorithm: we show that EM converges locally, under separation $\Omega(\sqrt{\log k})$. The previous best-known guarantee required $\Omega(\sqrt{k})$ separation (Yan, et al., 2017). Unlike prior work, our results do not assume or use prior knowledge of the (potentially different) mixing weights or variances of the Gaussian components. Furthermore, our results show that the finite-sample error of EM does not depend on non-universal quantities such as pairwise distances between means of Gaussian components. |

Tasks | |

Published | 2020-02-02 |

URL | https://arxiv.org/abs/2002.00329v1 |

https://arxiv.org/pdf/2002.00329v1.pdf | |

PWC | https://paperswithcode.com/paper/em-algorithm-is-sample-optimal-for-learning |

Repo | |

Framework | |

#### Backdoor Attacks against Transfer Learning with Pre-trained Deep Learning Models

Title | Backdoor Attacks against Transfer Learning with Pre-trained Deep Learning Models |

Authors | Shuo Wang, Surya Nepal, Carsten Rudolph, Marthie Grobler, Shangyu Chen, Tianle Chen |

Abstract | Transfer learning provides an effective solution for feasibly and fast customize accurate \textit{Student} models, by transferring the learned knowledge of pre-trained \textit{Teacher} models over large datasets via fine-tuning. Many pre-trained Teacher models used in transfer learning are publicly available and maintained by public platforms, increasing their vulnerability to backdoor attacks. In this paper, we demonstrate a backdoor threat to transfer learning tasks on both image and time-series data leveraging the knowledge of publicly accessible Teacher models, aimed at defeating three commonly-adopted defenses: \textit{pruning-based}, \textit{retraining-based} and \textit{input pre-processing-based defenses}. Specifically, (A) ranking-based selection mechanism to speed up the backdoor trigger generation and perturbation process while defeating \textit{pruning-based} and/or \textit{retraining-based defenses}. (B) autoencoder-powered trigger generation is proposed to produce a robust trigger that can defeat the \textit{input pre-processing-based defense}, while guaranteeing that selected neuron(s) can be significantly activated. (C) defense-aware retraining to generate the manipulated model using reverse-engineered model inputs. We launch effective misclassification attacks on Student models over real-world images, brain Magnetic Resonance Imaging (MRI) data and Electrocardiography (ECG) learning systems. The experiments reveal that our enhanced attack can maintain the $98.4%$ and $97.2%$ classification accuracy as the genuine model on clean image and time series inputs respectively while improving $27.9%-100%$ and $27.1%-56.1%$ attack success rate on trojaned image and time series inputs respectively in the presence of pruning-based and/or retraining-based defenses. |

Tasks | EEG, Electrocardiography (ECG), Time Series, Transfer Learning |

Published | 2020-01-10 |

URL | https://arxiv.org/abs/2001.03274v2 |

https://arxiv.org/pdf/2001.03274v2.pdf | |

PWC | https://paperswithcode.com/paper/backdoor-attacks-against-transfer-learning |

Repo | |

Framework | |

#### Transport Gaussian Processes for Regression

Title | Transport Gaussian Processes for Regression |

Authors | Gonzalo Rios |

Abstract | Gaussian process (GP) priors are non-parametric generative models with appealing modelling properties for Bayesian inference: they can model non-linear relationships through noisy observations, have closed-form expressions for training and inference, and are governed by interpretable hyperparameters. However, GP models rely on Gaussianity, an assumption that does not hold in several real-world scenarios, e.g., when observations are bounded or have extreme-value dependencies, a natural phenomenon in physics, finance and social sciences. Although beyond-Gaussian stochastic processes have caught the attention of the GP community, a principled definition and rigorous treatment is still lacking. In this regard, we propose a methodology to construct stochastic processes, which include GPs, warped GPs, Student-t processes and several others under a single unified approach. We also provide formulas and algorithms for training and inference of the proposed models in the regression problem. Our approach is inspired by layers-based models, where each proposed layer changes a specific property over the generated stochastic process. That, in turn, allows us to push-forward a standard Gaussian white noise prior towards other more expressive stochastic processes, for which marginals and copulas need not be Gaussian, while retaining the appealing properties of GPs. We validate the proposed model through experiments with real-world data. |

Tasks | Bayesian Inference, Gaussian Processes |

Published | 2020-01-30 |

URL | https://arxiv.org/abs/2001.11473v1 |

https://arxiv.org/pdf/2001.11473v1.pdf | |

PWC | https://paperswithcode.com/paper/transport-gaussian-processes-for-regression |

Repo | |

Framework | |

#### LTP: A New Active Learning Strategy for Bert-CRF Based Named Entity Recognition

Title | LTP: A New Active Learning Strategy for Bert-CRF Based Named Entity Recognition |

Authors | Mingyi Liu, Zhiying Tu, Zhongjie Wang, Xiaofei Xu |

Abstract | In recent years, deep learning has achieved great success in many natural language processing tasks including named entity recognition. The shortcoming is that a large amount of manually-annotated data is usually required. Previous studies have demonstrated that both transfer learning and active learning could elaborately reduce the cost of data annotation in terms of their corresponding advantages, but there is still plenty of room for improvement. We assume that the convergence of the two methods can complement with each other, so that the model could be trained more accurately with less labelled data, and active learning method could enhance transfer learning method to accurately select the minimum data samples for iterative learning. However, in real applications we found this approach is challenging because the sample selection of traditional active learning strategy merely depends on the final probability value of its model output, and this makes it quite difficult to evaluate the quality of the selected data samples. In this paper, we first examine traditional active learning strategies in a specific case of BERT-CRF that has been widely used in named entity recognition. Then we propose an uncertainty-based active learning strategy called Lowest Token Probability (LTP) which considers not only the final output but also the intermediate results. We test LTP on multiple datasets, and the experiments show that LTP performs better than traditional strategies (incluing LC and NLC) on both token-level $F_1$ and sentence-level accuracy, especially in complex imbalanced datasets. |

Tasks | Active Learning, Named Entity Recognition, Transfer Learning |

Published | 2020-01-08 |

URL | https://arxiv.org/abs/2001.02524v1 |

https://arxiv.org/pdf/2001.02524v1.pdf | |

PWC | https://paperswithcode.com/paper/ltp-a-new-active-learning-strategy-for-bert |

Repo | |

Framework | |

#### Some convergent results for Backtracking Gradient Descent method on Banach spaces

Title | Some convergent results for Backtracking Gradient Descent method on Banach spaces |

Authors | Tuyen Trung Truong |

Abstract | Our main result concerns the following condition: {\bf Condition C.} Let $X$ be a Banach space. A $C^1$ function $f:X\rightarrow \mathbb{R}$ satisfies Condition C if whenever ${x_n}$ weakly converges to $x$ and $\lim _{n\rightarrow\infty}\nabla f(x_n)=0$, then $\nabla f(x)=0$. We assume that there is given a canonical isomorphism between $X$ and its dual $X^*$, for example when $X$ is a Hilbert space. {\bf Theorem.} Let $X$ be a reflexive, complete Banach space and $f:X\rightarrow \mathbb{R}$ be a $C^2$ function which satisfies Condition C. Moreover, we assume that for every bounded set $S\subset X$, then $\sup _{x\in S}\nabla ^2f(x)<\infty$. We choose a random point $x_0\in X$ and construct by the Local Backtracking GD procedure (which depends on $3$ hyper-parameters $\alpha ,\beta ,\delta 0$, see later for details) the sequence $x{n+1}=x_n-\delta (x_n)\nabla f(x_n)$. Then we have: 1) Every cluster point of ${x_n}$, in the {\bf weak} topology, is a critical point of $f$. 2) Either $\lim _{n\rightarrow\infty}f(x_n)=-\infty$ or $\lim {n\rightarrow\infty}x{n+1}-x_n=0$. 3) Here we work with the weak topology. Let $\mathcal{C}$ be the set of critical points of $f$. Assume that $\mathcal{C}$ has a bounded component $A$. Let $\mathcal{B}$ be the set of cluster points of ${x_n}$. If $\mathcal{B}\cap A\not= \emptyset$, then $\mathcal{B}\subset A$ and $\mathcal{B}$ is connected. 4) Assume that $X$ is separable. Then for generic choices of $\alpha ,\beta ,\delta _0$ and the initial point $x_0$, if the sequence ${x_n}$ converges - in the {\bf weak} topology, then the limit point cannot be a saddle point. |

Tasks | |

Published | 2020-01-16 |

URL | https://arxiv.org/abs/2001.05768v2 |

https://arxiv.org/pdf/2001.05768v2.pdf | |

PWC | https://paperswithcode.com/paper/some-convergent-results-for-backtracking |

Repo | |

Framework | |

#### SUR-FeatNet: Predicting the Satisfied User Ratio Curvefor Image Compression with Deep Feature Learning

Title | SUR-FeatNet: Predicting the Satisfied User Ratio Curvefor Image Compression with Deep Feature Learning |

Authors | Hanhe Lin, Vlad Hosu, Chunling Fan, Yun Zhang, Yuchen Mu, Raouf Hamzaoui, Dietmar Saupe |

Abstract | The Satisfied User Ratio (SUR) curve for a lossy image compression scheme, e.g., JPEG, gives the distribution function of the Just Noticeable Difference (JND), the smallest distortion level that can be perceived by a subject when a reference image is compared to a distorted one. A sequence of JNDs can be defined with a suitable successive choice of reference images. We propose the first deep learning approach to predict SUR curves. We show how to exploit maximum likelihood estimation and the Kolmogorov-Smirnov test to select a suitable parametric model for the distribution function. We then use deep feature learning to predict samples of the SUR curve and apply the method of least squares to fit the parametric model to the predicted samples. Our deep learning approach relies on a Siamese Convolutional Neural Networks (CNN), transfer learning, and deep feature learning, using pairs consisting of a reference image and compressed image for training. Experiments on the MCL-JCI dataset showed state-of-the-art performance. For example, the mean Bhattacharyya distances between the predicted and ground truth first, second, and third JND distributions were 0.0810, 0.0702, and 0.0522, respectively, and the corresponding average absolute differences of the peak signal-to-noise ratio at the median of the distributions were 0.56, 0.65, and 0.53 dB. |

Tasks | Image Compression, Transfer Learning |

Published | 2020-01-07 |

URL | https://arxiv.org/abs/2001.02002v1 |

https://arxiv.org/pdf/2001.02002v1.pdf | |

PWC | https://paperswithcode.com/paper/sur-featnet-predicting-the-satisfied-user |

Repo | |

Framework | |

#### Convergence Guarantees for Gaussian Process Approximations Under Several Observation Models

Title | Convergence Guarantees for Gaussian Process Approximations Under Several Observation Models |

Authors | George Wynne, François-Xavier Briol, Mark Girolami |

Abstract | Gaussian processes are ubiquitous in statistical analysis, machine learning and applied mathematics. They provide a flexible modelling framework for approximating functions, whilst simultaneously quantifying our uncertainty about this task in a computationally tractable manner. An important question is whether these approximations will be accurate, and if so how accurate, given our various modelling choices and the difficulty of the problem. This is of practical relevance, since the answer informs our choice of model and sampling distribution for a given application. Our paper provides novel approximation guarantees for Gaussian process models based on covariance functions with finite smoothness, such as the Mat'ern and Wendland covariance functions. They are derived from a sampling inequality which facilitates a systematic approach to obtaining upper bounds on Sobolev norms in terms of properties of the design used to collect data. This approach allows us to refine some existing results which apply in the misspecified smoothness setting and which allow for adaptive selection of hyperparameters. However, the main novelty in this paper is that our results cover a wide range of observation models including interpolation, approximation with deterministic corruption and regression with Gaussian noise. |

Tasks | Gaussian Processes |

Published | 2020-01-29 |

URL | https://arxiv.org/abs/2001.10818v1 |

https://arxiv.org/pdf/2001.10818v1.pdf | |

PWC | https://paperswithcode.com/paper/convergence-guarantees-for-gaussian-process |

Repo | |

Framework | |

#### Conceptual Game Expansion

Title | Conceptual Game Expansion |

Authors | Matthew Guzdial, Mark Riedl |

Abstract | Automated game design is the problem of automatically producing games through computational processes. Traditionally these methods have relied on the authoring of search spaces by a designer, defining the space of all possible games for the system to author. In this paper we instead learn representations of existing games and use these to approximate a search space of novel games. In a human subject study we demonstrate that these novel games are indistinguishable from human games for certain measures. |

Tasks | |

Published | 2020-02-22 |

URL | https://arxiv.org/abs/2002.09636v1 |

https://arxiv.org/pdf/2002.09636v1.pdf | |

PWC | https://paperswithcode.com/paper/conceptual-game-expansion |

Repo | |

Framework | |

#### An Outer-approximation Guided Optimization Approach for Constrained Neural Network Inverse Problems

Title | An Outer-approximation Guided Optimization Approach for Constrained Neural Network Inverse Problems |

Authors | Myun-Seok Cheon |

Abstract | This paper discusses an outer-approximation guided optimization method for constrained neural network inverse problems with rectified linear units. The constrained neural network inverse problems refer to an optimization problem to find the best set of input values of a given trained neural network in order to produce a predefined desired output in presence of constraints on input values. This paper analyzes the characteristics of optimal solutions of neural network inverse problems with rectified activation units and proposes an outer-approximation algorithm by exploiting their characteristics. The proposed outer-approximation guided optimization comprises primal and dual phases. The primal phase incorporates neighbor curvatures with neighbor outer-approximations to expedite the process. The dual phase identifies and utilizes the structure of local convex regions to improve the convergence to a local optimal solution. At last, computation experiments demonstrate the superiority of the proposed algorithm compared to a projected gradient method. |

Tasks | |

Published | 2020-02-24 |

URL | https://arxiv.org/abs/2002.10404v1 |

https://arxiv.org/pdf/2002.10404v1.pdf | |

PWC | https://paperswithcode.com/paper/an-outer-approximation-guided-optimization |

Repo | |

Framework | |

#### DNNs as Layers of Cooperating Classifiers

Title | DNNs as Layers of Cooperating Classifiers |

Authors | Marelie H. Davel, Marthinus W. Theunissen, Arnold M. Pretorius, Etienne Barnard |

Abstract | A robust theoretical framework that can describe and predict the generalization ability of deep neural networks (DNNs) in general circumstances remains elusive. Classical attempts have produced complexity metrics that rely heavily on global measures of compactness and capacity with little investigation into the effects of sub-component collaboration. We demonstrate intriguing regularities in the activation patterns of the hidden nodes within fully-connected feedforward networks. By tracing the origin of these patterns, we show how such networks can be viewed as the combination of two information processing systems: one continuous and one discrete. We describe how these two systems arise naturally from the gradient-based optimization process, and demonstrate the classification ability of the two systems, individually and in collaboration. This perspective on DNN classification offers a novel way to think about generalization, in which different subsets of the training data are used to train distinct classifiers; those classifiers are then combined to perform the classification task, and their consistency is crucial for accurate classification. |

Tasks | |

Published | 2020-01-17 |

URL | https://arxiv.org/abs/2001.06178v1 |

https://arxiv.org/pdf/2001.06178v1.pdf | |

PWC | https://paperswithcode.com/paper/dnns-as-layers-of-cooperating-classifiers |

Repo | |

Framework | |

#### Privately Learning Markov Random Fields

Title | Privately Learning Markov Random Fields |

Authors | Huanyu Zhang, Gautam Kamath, Janardhan Kulkarni, Zhiwei Steven Wu |

Abstract | We consider the problem of learning Markov Random Fields (including the prototypical example, the Ising model) under the constraint of differential privacy. Our learning goals include both structure learning, where we try to estimate the underlying graph structure of the model, as well as the harder goal of parameter learning, in which we additionally estimate the parameter on each edge. We provide algorithms and lower bounds for both problems under a variety of privacy constraints – namely pure, concentrated, and approximate differential privacy. While non-privately, both learning goals enjoy roughly the same complexity, we show that this is not the case under differential privacy. In particular, only structure learning under approximate differential privacy maintains the non-private logarithmic dependence on the dimensionality of the data, while a change in either the learning goal or the privacy notion would necessitate a polynomial dependence. As a result, we show that the privacy constraint imposes a strong separation between these two learning problems in the high-dimensional data regime. |

Tasks | |

Published | 2020-02-21 |

URL | https://arxiv.org/abs/2002.09463v1 |

https://arxiv.org/pdf/2002.09463v1.pdf | |

PWC | https://paperswithcode.com/paper/privately-learning-markov-random-fields |

Repo | |

Framework | |

#### DAN: Dual-View Representation Learning for Adapting Stance Classifiers to New Domains

Title | DAN: Dual-View Representation Learning for Adapting Stance Classifiers to New Domains |

Authors | Chang Xu, Cecile Paris, Surya Nepal, Ross Sparks, Chong Long, Yafang Wang |

Abstract | We address the issue of having a limited number of annotations for stance classification in a new domain, by adapting out-of-domain classifiers with domain adaptation. Existing approaches often align different domains in a single, global feature space (or view), which may fail to fully capture the richness of the languages used for expressing stances, leading to reduced adaptability on stance data. In this paper, we identify two major types of stance expressions that are linguistically distinct, and we propose a tailored dual-view adaptation network (DAN) to adapt these expressions across domains. The proposed model first learns a separate view for domain transfer in each expression channel and then selects the best adapted parts of both views for optimal transfer. We find that the learned view features can be more easily aligned and more stance-discriminative in either or both views, leading to more transferable overall features after combining the views. Results from extensive experiments show that our method can enhance the state-of-the-art single-view methods in matching stance data across different domains, and that it consistently improves those methods on various adaptation tasks. |

Tasks | Domain Adaptation, Representation Learning |

Published | 2020-03-13 |

URL | https://arxiv.org/abs/2003.06514v1 |

https://arxiv.org/pdf/2003.06514v1.pdf | |

PWC | https://paperswithcode.com/paper/dan-dual-view-representation-learning-for |

Repo | |

Framework | |

#### Character Segmentation in Asian Collector’s Seal Imprints: An Attempt to Retrieval Based on Ancient Character Typeface

Title | Character Segmentation in Asian Collector’s Seal Imprints: An Attempt to Retrieval Based on Ancient Character Typeface |

Authors | Kangying Li, Biligsaikhan Batjargal, Akira Maeda |

Abstract | Collector’s seals provide important clues about the ownership of a book. They contain much information pertaining to the essential elements of ancient materials and also show the details of possession, its relation to the book, the identity of the collectors and their social status and wealth, amongst others. Asian collectors have typically used artistic ancient characters rather than modern ones to make their seals. In addition to the owner’s name, several other words are used to express more profound meanings. A system that automatically recognizes these characters can help enthusiasts and professionals better understand the background information of these seals. However, there is a lack of training data and labelled images, as samples of some seals are scarce and most of them are degraded images. It is necessary to find new ways to make full use of such scarce data. While these data are available online, they do not contain information on the characters’position. The goal of this research is to provide retrieval tools assist in obtaining more information from Asian collector’s seals imprints without consuming a lot of computational resources. In this paper, a character segmentation method is proposed to predict the candidate characters’area without any labelled training data that contain character coordinate information. A retrieval-based recognition system that focuses on a single character is also proposed to support seal retrieval and matching. The experimental results demonstrate that the proposed character segmentation method performs well on Asian collector’s seals, with 92% of the test data being correctly segmented. |

Tasks | |

Published | 2020-02-13 |

URL | https://arxiv.org/abs/2003.00831v1 |

https://arxiv.org/pdf/2003.00831v1.pdf | |

PWC | https://paperswithcode.com/paper/character-segmentation-in-asian-collectors |

Repo | |

Framework | |

#### Exploration Based Language Learning for Text-Based Games

Title | Exploration Based Language Learning for Text-Based Games |

Authors | Andrea Madotto, Mahdi Namazifar, Joost Huizinga, Piero Molino, Adrien Ecoffet, Huaixiu Zheng, Alexandros Papangelis, Dian Yu, Chandra Khatri, Gokhan Tur |

Abstract | This work presents an exploration and imitation-learning-based agent capable of state-of-the-art performance in playing text-based computer games. Text-based computer games describe their world to the player through natural language and expect the player to interact with the game using text. These games are of interest as they can be seen as a testbed for language understanding, problem-solving, and language generation by artificial agents. Moreover, they provide a learning environment in which these skills can be acquired through interactions with an environment rather than using fixed corpora. One aspect that makes these games particularly challenging for learning agents is the combinatorially large action space. Existing methods for solving text-based games are limited to games that are either very simple or have an action space restricted to a predetermined set of admissible actions. In this work, we propose to use the exploration approach of Go-Explore for solving text-based games. More specifically, in an initial exploration phase, we first extract trajectories with high rewards, after which we train a policy to solve the game by imitating these trajectories. Our experiments show that this approach outperforms existing solutions in solving text-based games, and it is more sample efficient in terms of the number of interactions with the environment. Moreover, we show that the learned policy can generalize better than existing solutions to unseen games without using any restriction on the action space. |

Tasks | Imitation Learning, Text Generation |

Published | 2020-01-24 |

URL | https://arxiv.org/abs/2001.08868v1 |

https://arxiv.org/pdf/2001.08868v1.pdf | |

PWC | https://paperswithcode.com/paper/exploration-based-language-learning-for-text-1 |

Repo | |

Framework | |

#### A Class of Linear Programs Solvable by Coordinate-wise Minimization

Title | A Class of Linear Programs Solvable by Coordinate-wise Minimization |

Authors | Tomáš Dlask, Tomáš Werner |

Abstract | Coordinate-wise minimization is a simple popular method for large-scale optimization. Unfortunately, for general (non-differentiable) convex problems it may not find global minima. We present a class of linear programs that coordinate-wise minimization solves exactly. We show that dual LP relaxations of several well-known combinatorial optimization problems are in this class and the method finds a global minimum with sufficient accuracy in reasonable runtimes. Moreover, for extensions of these problems that no longer are in this class the method yields reasonably good suboptima. Though the presented LP relaxations can be solved by more efficient methods (such as max-flow), our results are theoretically non-trivial and can lead to new large-scale optimization algorithms in the future. |

Tasks | Combinatorial Optimization |

Published | 2020-01-28 |

URL | https://arxiv.org/abs/2001.10467v4 |

https://arxiv.org/pdf/2001.10467v4.pdf | |

PWC | https://paperswithcode.com/paper/a-class-of-linear-programs-solvable-by |

Repo | |

Framework | |