April 2, 2020

2939 words 14 mins read

Paper Group ANR 282

Few-Shot Learning with Geometric Constraints. Reinforcement Quantum Annealing: A Quantum-Assisted Learning Automata Approach. Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks. Crowdsourced Labeling for Worker-Task Specialization Block Model. Feature-Robustness, Flatness and Generalization Error for Deep Neural Networks. When R …

Few-Shot Learning with Geometric Constraints

Title Few-Shot Learning with Geometric Constraints
Authors Hong-Gyu Jung, Seong-Whan Lee
Abstract In this article, we consider the problem of few-shot learning for classification. We assume a network trained for base categories with a large number of training examples, and we aim to add novel categories to it that have only a few, e.g., one or five, training examples. This is a challenging scenario because: 1) high performance is required in both the base and novel categories; and 2) training the network for the new categories with a few training examples can contaminate the feature space trained well for the base categories. To address these challenges, we propose two geometric constraints to fine-tune the network with a few training examples. The first constraint enables features of the novel categories to cluster near the category weights, and the second maintains the weights of the novel categories far from the weights of the base categories. By applying the proposed constraints, we extract discriminative features for the novel categories while preserving the feature space learned for the base categories. Using public data sets for few-shot learning that are subsets of ImageNet, we demonstrate that the proposed method outperforms prevalent methods by a large margin.
Tasks Few-Shot Learning
Published 2020-03-20
URL https://arxiv.org/abs/2003.09151v1
PDF https://arxiv.org/pdf/2003.09151v1.pdf
PWC https://paperswithcode.com/paper/few-shot-learning-with-geometric-constraints
Repo
Framework
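
The two constraints lend themselves to a compact implementation. Below is a minimal PyTorch sketch of the idea under stated assumptions: the cosine geometry, hinge margin, and function names are illustrative, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def clustering_loss(feats, weights, labels):
    """Constraint 1: pull novel-category features toward their own class weights."""
    f = F.normalize(feats, dim=1)
    w = F.normalize(weights, dim=1)
    sim = (f * w[labels]).sum(dim=1)      # cosine similarity to the true class weight
    return (1.0 - sim).mean()

def separation_loss(novel_w, base_w, margin=0.5):
    """Constraint 2: keep novel class weights far from all base class weights."""
    sim = F.normalize(novel_w, dim=1) @ F.normalize(base_w, dim=1).t()
    return F.relu(sim - margin).mean()    # penalize novel weights drifting toward base ones

# fine-tuning objective (lam1, lam2 are hyperparameters):
# loss = cross_entropy + lam1 * clustering_loss(...) + lam2 * separation_loss(...)
```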

Reinforcement Quantum Annealing: A Quantum-Assisted Learning Automata Approach

Title Reinforcement Quantum Annealing: A Quantum-Assisted Learning Automata Approach
Authors Ramin Ayanzadeh, Milton Halem, Tim Finin
Abstract We introduce the reinforcement quantum annealing (RQA) scheme, in which an intelligent agent interacts with a quantum annealer that plays the role of the stochastic environment of learning automata, and iteratively tries to find better Ising Hamiltonians for the given problem of interest. As a proof of concept, we propose a novel approach for reducing the NP-complete problem of Boolean satisfiability (SAT) to minimizing Ising Hamiltonians and show how to apply RQA to increase the probability of finding the global optimum. Our experimental results on two different benchmark SAT problems (namely, factoring pseudo-prime numbers and random SAT with phase transitions), using a D-Wave 2000Q quantum processor, demonstrate that RQA finds notably better solutions with fewer samples than state-of-the-art techniques in the realm of quantum annealing.
Tasks
Published 2020-01-01
URL https://arxiv.org/abs/2001.00234v1
PDF https://arxiv.org/pdf/2001.00234v1.pdf
PWC https://paperswithcode.com/paper/reinforcement-quantum-annealing-a-quantum
Repo
Framework
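
The SAT-to-Hamiltonian step can be illustrated on the simplest case. The sketch below encodes 2-literal clauses as a QUBO (affinely equivalent to an Ising Hamiltonian) whose minima are satisfying assignments; the paper's actual reduction and the RQA feedback loop around the annealer are not reproduced here.

```python
import numpy as np

def qubo_from_2sat(clauses, n_vars):
    """clauses: list of (l1, l2) with nonzero ints; negative = negated literal.
    Each clause adds a penalty (1 - l1)(1 - l2) that is 1 iff the clause is violated."""
    Q = np.zeros((n_vars, n_vars))
    const = 0.0
    for l1, l2 in clauses:
        i, si = abs(l1) - 1, (1.0 if l1 > 0 else -1.0)  # literal value = ci + si * x_i
        j, sj = abs(l2) - 1, (1.0 if l2 > 0 else -1.0)
        ci, cj = (1.0 - si) / 2, (1.0 - sj) / 2
        const += (1 - ci) * (1 - cj)
        Q[i, i] -= si * (1 - cj)          # linear terms live on the diagonal (x_i^2 = x_i)
        Q[j, j] -= sj * (1 - ci)
        Q[min(i, j), max(i, j)] += si * sj
    return Q, const

# example: (x1 OR not x2) AND (x2 OR x3)  ->  clauses = [(1, -2), (2, 3)]
# energy of x in {0,1}^n is x @ Q @ x + const, which is 0 iff x satisfies all clauses
```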

Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks

Title Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks
Authors Agustinus Kristiadi, Matthias Hein, Philipp Hennig
Abstract The point estimates of ReLU classification networks—arguably the most widely used neural network architecture—have been shown to yield arbitrarily high confidence far away from the training data. This architecture, in conjunction with a maximum a posteriori estimation scheme, is thus neither calibrated nor robust. Approximate Bayesian inference has been empirically demonstrated to improve predictive uncertainty in neural networks, although the theoretical analysis of such Bayesian approximations is limited. We theoretically analyze approximate Gaussian posterior distributions on the weights of ReLU networks and show that they fix the overconfidence problem. Furthermore, we show that even a simplistic, and thus cheap, Bayesian approximation also fixes these issues. This indicates that a sufficient condition for calibrated uncertainty on a ReLU network is “to be a bit Bayesian”. These theoretical results validate the use of last-layer Bayesian approximations and motivate a range of fidelity-cost trade-offs. We further validate these findings empirically via various standard experiments using common deep ReLU networks and Laplace approximations.
Tasks Bayesian Inference
Published 2020-02-24
URL https://arxiv.org/abs/2002.10118v1
PDF https://arxiv.org/pdf/2002.10118v1.pdf
PWC https://paperswithcode.com/paper/being-bayesian-even-just-a-bit-fixes
Repo
Framework
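
A minimal sketch of what “a bit Bayesian” can look like in practice: a Gaussian posterior over only the last layer's weights with Monte Carlo averaging of the predictive distribution. The diagonal variance input is an assumption (e.g., obtained from a diagonal Laplace fit); the paper analyzes the Gaussian approximation itself rather than prescribing this exact recipe.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def bayesian_last_layer_predict(phi, w_map, var, n_samples=20):
    """phi: (B, D) penultimate-layer features; w_map: (C, D) MAP weights;
    var: (C, D) posterior variances, e.g., from a diagonal Laplace approximation."""
    probs = 0.0
    for _ in range(n_samples):
        w = w_map + var.sqrt() * torch.randn_like(w_map)   # sample last-layer weights
        probs = probs + F.softmax(phi @ w.t(), dim=1)
    return probs / n_samples   # averaged predictive; confidence decays far from the data
```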

Crowdsourced Labeling for Worker-Task Specialization Block Model

Title Crowdsourced Labeling for Worker-Task Specialization Block Model
Authors Doyeon Kim, Hye Won Chung
Abstract We consider crowdsourced labeling under a worker-task specialization block model, where each worker and task is associated with one particular type among a finite set of types, and a worker provides a more reliable answer to tasks of the matched type than to tasks of unmatched types. We design an inference algorithm that recovers binary task labels (up to any given recovery accuracy) by using worker clustering and weighted majority voting. The designed inference algorithm does not require any information about worker types, task types, or worker reliability parameters, and achieves any targeted recovery accuracy with the best known performance (minimum number of queries per task) across all parameter regimes.
Tasks
Published 2020-03-21
URL https://arxiv.org/abs/2004.00101v1
PDF https://arxiv.org/pdf/2004.00101v1.pdf
PWC https://paperswithcode.com/paper/crowdsourced-labeling-for-worker-task
Repo
Framework
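
The voting step is simple to sketch. Below, worker weights are the log-odds of estimated reliabilities (the weighting that is optimal for independent workers); the clustering step the paper uses to estimate per-type reliabilities is abstracted into the `reliability` input, which is an assumption of this sketch.

```python
import numpy as np

def weighted_majority_vote(answers, reliability):
    """answers: (workers, tasks) array in {-1, +1, 0}, with 0 = no answer.
    reliability: (workers,) estimated accuracies in (0.5, 1)."""
    w = np.log(reliability / (1.0 - reliability))   # log-odds weights per worker
    scores = w @ answers                            # (tasks,) weighted vote totals
    return np.sign(scores)                          # recovered binary labels
```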

Feature-Robustness, Flatness and Generalization Error for Deep Neural Networks

Title Feature-Robustness, Flatness and Generalization Error for Deep Neural Networks
Authors Henning Petzka, Linara Adilova, Michael Kamp, Cristian Sminchisescu
Abstract The performance of deep neural networks is often attributed to their automated, task-related feature construction. It remains an open question, though, why this leads to solutions with good generalization, even in cases where the number of parameters is larger than the number of samples. Back in the 90s, Hochreiter and Schmidhuber observed that flatness of the loss surface around a local minimum correlates with low generalization error. For several flatness measures, this correlation has been empirically validated. However, it has recently been shown that existing measures of flatness cannot theoretically be related to generalization: if a network uses ReLU activations, the network function can be reparameterized without changing its output in such a way that flatness is changed almost arbitrarily. This paper proposes a natural modification of existing flatness measures that results in invariance to reparameterization. The proposed measures imply a robustness of the network to changes in the input and the hidden layers. Connecting this feature robustness to generalization leads to a generalized definition of the representativeness of data. With this, the generalization error of a model trained on representative data can be bounded by its feature robustness which depends on our novel flatness measure.
Tasks
Published 2020-01-03
URL https://arxiv.org/abs/2001.00939v2
PDF https://arxiv.org/pdf/2001.00939v2.pdf
PWC https://paperswithcode.com/paper/feature-robustness-flatness-and-1
Repo
Framework
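
The reparameterization problem the paper starts from is easy to demonstrate. A minimal PyTorch sketch on an assumed two-layer ReLU network: scaling one layer by alpha and the next by 1/alpha leaves the network function unchanged (ReLU is positively homogeneous) while reshaping the loss surface, so naive flatness measures are not well-defined. The paper's invariant measure itself is not reproduced here.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
x = torch.randn(5, 4)
y0 = net(x)

alpha = 10.0
with torch.no_grad():
    net[0].weight *= alpha      # rescale the first layer...
    net[0].bias *= alpha
    net[2].weight /= alpha      # ...and undo it in the second

print(torch.allclose(y0, net(x), atol=1e-5))  # True: same function, different weights
```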

When Relation Networks meet GANs: Relation GANs with Triplet Loss

Title When Relation Networks meet GANs: Relation GANs with Triplet Loss
Authors Runmin Wu, Kunyao Zhang, Lijun Wang, Yue Wang, Pingping Zhang, Huchuan Lu, Yizhou Yu
Abstract Though recent research has achieved remarkable progress in generating realistic images with generative adversarial networks (GANs), the lack of training stability is still a lingering concern of most GANs, especially on high-resolution inputs and complex datasets. Since the randomly generated distribution can hardly overlap with the real distribution, training GANs often suffers from the gradient vanishing problem. A number of approaches have been proposed to address this issue by constraining the discriminator’s capabilities using empirical techniques, like weight clipping, gradient penalty, spectral normalization, etc. In this paper, we provide a more principled approach as an alternative solution to this issue. Instead of training the discriminator to distinguish real and fake input samples, we investigate the relationship between paired samples by training the discriminator to separate paired samples from the same distribution and those from different distributions. To this end, we explore a relation network architecture for the discriminator and design a triplet loss that yields better generalization and stability. Extensive experiments on benchmark datasets show that the proposed relation discriminator and new loss provide significant improvements on various vision tasks, including unconditional and conditional image generation and image translation.
Tasks Conditional Image Generation, Image Generation
Published 2020-02-24
URL https://arxiv.org/abs/2002.10174v3
PDF https://arxiv.org/pdf/2002.10174v3.pdf
PWC https://paperswithcode.com/paper/when-relation-networks-meet-gans-relation
Repo
Framework
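
A minimal sketch of the pairing idea: the discriminator embeds samples and a relation head scores whether a pair comes from the same distribution; a triplet loss pushes same-distribution pairs above cross-distribution pairs by a margin. The MLP-over-concatenation relation head here is an assumption, not the paper's exact architecture.

```python
import torch
import torch.nn.functional as F

def relation_triplet_loss(embed, relation, real_a, real_b, fake, margin=1.0):
    """embed: feature extractor; relation: head scoring a concatenated pair."""
    ra, rb, fk = embed(real_a), embed(real_b), embed(fake)
    pos = relation(torch.cat([ra, rb], dim=1))   # (real, real): same distribution
    neg = relation(torch.cat([ra, fk], dim=1))   # (real, fake): different distributions
    return F.relu(neg - pos + margin).mean()     # enforce pos > neg by a margin
```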

Maxmin Q-learning: Controlling the Estimation Bias of Q-learning

Title Maxmin Q-learning: Controlling the Estimation Bias of Q-learning
Authors Qingfeng Lan, Yangchen Pan, Alona Fyshe, Martha White
Abstract Q-learning suffers from overestimation bias, because it approximates the maximum action value using the maximum estimated action value. Algorithms have been proposed to reduce overestimation bias, but we lack an understanding of how bias interacts with performance, and the extent to which existing algorithms mitigate bias. In this paper, we 1) highlight that the effect of overestimation bias on learning efficiency is environment-dependent; 2) propose a generalization of Q-learning, called \emph{Maxmin Q-learning}, which provides a parameter to flexibly control bias; 3) show theoretically that there exists a parameter choice for Maxmin Q-learning that leads to unbiased estimation with a lower approximation variance than Q-learning; and 4) prove the convergence of our algorithm in the tabular case, as well as convergence of several previous Q-learning variants, using a novel Generalized Q-learning framework. We empirically verify that our algorithm better controls estimation bias in toy environments, and that it achieves superior performance on several benchmark problems.
Tasks Q-Learning
Published 2020-02-16
URL https://arxiv.org/abs/2002.06487v1
PDF https://arxiv.org/pdf/2002.06487v1.pdf
PWC https://paperswithcode.com/paper/maxmin-q-learning-controlling-the-estimation-1
Repo
Framework
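
The core update is small. A minimal tabular sketch: maintain N estimates and bootstrap from their elementwise minimum, which counteracts the overestimation of max_a Q(s, a); the number of estimates N is the bias-control parameter.

```python
import numpy as np

def maxmin_q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99, rng=np.random):
    """Q: list of N arrays of shape (n_states, n_actions)."""
    q_min = np.min([q[s_next] for q in Q], axis=0)  # elementwise min over the N estimates
    target = r + gamma * q_min.max()                # max action value of the min estimate
    i = rng.randint(len(Q))                         # update one randomly chosen estimate
    Q[i][s, a] += alpha * (target - Q[i][s, a])
```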

Latent Variable Modelling with Hyperbolic Normalizing Flows

Title Latent Variable Modelling with Hyperbolic Normalizing Flows
Authors Avishek Joey Bose, Ariella Smofsky, Renjie Liao, Prakash Panangaden, William L. Hamilton
Abstract The choice of approximate posterior distributions plays a central role in stochastic variational inference (SVI). One effective solution is the use of normalizing flows to construct flexible posterior distributions. However, one key limitation of existing normalizing flows is that they are restricted to Euclidean space and are ill-equipped to model data with an underlying hierarchical structure. To address this fundamental limitation, we present the first extension of normalizing flows to hyperbolic spaces. We first elevate normalizing flows to hyperbolic spaces using coupling transforms defined on the tangent bundle, termed Tangent Coupling ($\mathcal{TC}$). We further introduce Wrapped Hyperboloid Coupling ($\mathcal{W}\mathbb{H}C$), a fully invertible and learnable transformation that explicitly utilizes the geometric structure of hyperbolic spaces, allowing for expressive posteriors while being efficient to sample from. We demonstrate the efficacy of our novel normalizing flow over hyperbolic VAEs and Euclidean normalizing flows. Our approach achieves improved performance on density estimation, as well as on reconstruction of real-world graph data, which exhibits a hierarchical structure. Finally, we show that our approach can be used to power a generative model over hierarchical data using hyperbolic latent variables.
Tasks Density Estimation
Published 2020-02-15
URL https://arxiv.org/abs/2002.06336v2
PDF https://arxiv.org/pdf/2002.06336v2.pdf
PWC https://paperswithcode.com/paper/latent-variable-modelling-with-hyperbolic
Repo
Framework
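
For orientation, here is a minimal sketch of a plain Euclidean affine coupling layer, the building block that Tangent Coupling lifts to the tangent space of the hyperboloid via exp/log maps. The hyperbolic plumbing and the wrapped variant are omitted; `net` is an assumed small network whose output splits into scale and shift.

```python
import torch

def affine_coupling_forward(x, net):
    """x: (B, D) with D even; net maps (B, D/2) -> (B, D)."""
    x1, x2 = x.chunk(2, dim=1)
    s, t = net(x1).chunk(2, dim=1)       # scale and shift predicted from the frozen half
    y2 = x2 * torch.exp(s) + t           # invertible transform of the other half
    log_det = s.sum(dim=1)               # log|det Jacobian| for the flow objective
    return torch.cat([x1, y2], dim=1), log_det
```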

Trajectory Grouping with Curvature Regularization for Tubular Structure Tracking

Title Trajectory Grouping with Curvature Regularization for Tubular Structure Tracking
Authors Li Liu, Jiong Zhang, Da Chen, Huazhong Shu, Laurent D. Cohen
Abstract Tubular structure tracking is an important and difficult problem in the fields of computer vision and medical image analysis. Minimal path models have exhibited their power in tracing tubular structures, in which a centerline is naturally treated as a minimal path under a suitable geodesic metric. However, existing minimal path-based tubular structure tracing models still suffer from difficulties such as shortcuts and short-branch combination problems, especially when dealing with images with complicated backgrounds. We introduce a new minimal path-based model for minimally interactive tubular structure centerline extraction in conjunction with a perceptual grouping scheme. We take into account the prescribed tubular trajectories and the relevant curvature-penalized geodesic distances for minimal path extraction via graph-based optimization. Experimental results on both synthetic and real images show that the proposed model outperforms state-of-the-art minimal path-based tubular structure tracing algorithms.
Tasks
Published 2020-03-08
URL https://arxiv.org/abs/2003.03710v1
PDF https://arxiv.org/pdf/2003.03710v1.pdf
PWC https://paperswithcode.com/paper/trajectory-grouping-with-curvature
Repo
Framework
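
The “centerline as minimal path” machinery can be sketched with plain Dijkstra over a pixel graph and an isotropic cost map. This is only the baseline component; the paper's curvature-penalized metric and perceptual grouping of trajectories are not reproduced here.

```python
import heapq
import numpy as np

def minimal_path(cost, src, dst):
    """cost: (H, W) positive traversal costs; src, dst: (row, col) tuples."""
    H, W = cost.shape
    dist = np.full((H, W), np.inf)
    parent = {}
    dist[src] = 0.0
    pq = [(0.0, src)]
    while pq:
        d, (r, c) = heapq.heappop(pq)
        if (r, c) == dst:
            break
        if d > dist[r, c]:
            continue                      # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < H and 0 <= nc < W and d + cost[nr, nc] < dist[nr, nc]:
                dist[nr, nc] = d + cost[nr, nc]
                parent[(nr, nc)] = (r, c)
                heapq.heappush(pq, (dist[nr, nc], (nr, nc)))
    path = [dst]                          # backtrack from dst to src
    while path[-1] != src:
        path.append(parent[path[-1]])
    return path[::-1]
```

With a cost map that is low inside vessels (e.g., the inverse of a tubularity response), the returned path approximates the centerline between two clicked endpoints.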

Nonparametric Estimation in the Dynamic Bradley-Terry Model

Title Nonparametric Estimation in the Dynamic Bradley-Terry Model
Authors Heejong Bong, Wanshan Li, Shamindra Shrotriya, Alessandro Rinaldo
Abstract We propose a time-varying generalization of the Bradley-Terry model that allows for nonparametric modeling of dynamic global rankings of distinct teams. We develop a novel estimator that relies on kernel smoothing to pre-process the pairwise comparisons over time and is applicable in sparse settings where the Bradley-Terry model may not be fit. We obtain necessary and sufficient conditions for the existence and uniqueness of our estimator. We also derive time-varying oracle bounds for both the estimation error and the excess risk in the model-agnostic setting where the Bradley-Terry model is not necessarily the true data-generating process. We thoroughly test the practical effectiveness of our model using both simulated and real-world data and suggest an efficient data-driven approach for bandwidth tuning.
Tasks
Published 2020-02-28
URL https://arxiv.org/abs/2003.00083v1
PDF https://arxiv.org/pdf/2003.00083v1.pdf
PWC https://paperswithcode.com/paper/nonparametric-estimation-in-the-dynamic
Repo
Framework
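
A minimal sketch of the two-step recipe under simplifying assumptions: Gaussian-kernel-smooth the pairwise win counts around a query time t, then fit Bradley-Terry scores with the classic minorization-maximization (MM) iteration. The paper's actual estimator, existence conditions, and bandwidth selection are not reproduced.

```python
import numpy as np

def smoothed_bt_scores(times, winners, losers, t, n_teams, h=0.1, iters=100):
    """times: (m,) comparison timestamps; winners/losers: (m,) team indices."""
    k = np.exp(-0.5 * ((times - t) / h) ** 2)        # Gaussian kernel weights around t
    W = np.zeros((n_teams, n_teams))                 # W[i, j] = weighted wins of i over j
    np.add.at(W, (winners, losers), k)
    w = np.ones(n_teams)
    for _ in range(iters):                           # standard Bradley-Terry MM updates
        denom = (W + W.T) / (w[:, None] + w[None, :])
        w = W.sum(axis=1) / denom.sum(axis=1)
        w /= w.sum()                                 # fix the scale of the scores
    return w
```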

VCNet: A Robust Approach to Blind Image Inpainting

Title VCNet: A Robust Approach to Blind Image Inpainting
Authors Yi Wang, Ying-Cong Chen, Xin Tao, Jiaya Jia
Abstract Blind inpainting is the task of automatically completing visual content without specifying masks for the missing areas in an image. Previous works assume missing-region patterns are known, limiting their application scope. In this paper, we relax this assumption by defining a new blind inpainting setting, making a trained blind inpainting neural system robust against various unknown missing-region patterns. Specifically, we propose a two-stage visual consistency network (VCN) that estimates where to fill (via masks) and generates what to fill. In this procedure, unavoidable mask prediction errors lead to severe artifacts in the subsequent repairing. To address this, our VCN first predicts semantically inconsistent regions, making mask prediction more tractable. It then repairs these estimated missing regions using a new spatial normalization, which makes VCN robust to mask prediction errors. In this way, semantically convincing and visually compelling content is generated. Extensive experiments show that our method is effective and robust in blind image inpainting, and that VCN allows for a wide spectrum of applications.
Tasks Image Inpainting
Published 2020-03-15
URL https://arxiv.org/abs/2003.06816v1
PDF https://arxiv.org/pdf/2003.06816v1.pdf
PWC https://paperswithcode.com/paper/vcnet-a-robust-approach-to-blind-image
Repo
Framework
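
A minimal sketch of the two-stage structure: predict a soft mask of semantically inconsistent regions, then repair conditioned on it. `mask_net` and `repair_net` are assumed placeholders, and the paper's spatial normalization inside the repair stage is not reproduced.

```python
import torch
import torch.nn as nn

class BlindInpainter(nn.Module):
    def __init__(self, mask_net: nn.Module, repair_net: nn.Module):
        super().__init__()
        self.mask_net, self.repair_net = mask_net, repair_net

    def forward(self, x):
        mask = torch.sigmoid(self.mask_net(x))                  # stage 1: where to fill
        filled = self.repair_net(torch.cat([x, mask], dim=1))   # stage 2: what to fill
        return mask * filled + (1.0 - mask) * x, mask           # composite the result
```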

Parsing Early Modern English for Linguistic Search

Title Parsing Early Modern English for Linguistic Search
Authors Seth Kulick, Neville Ryant
Abstract We investigate the question of whether advances in NLP over the last few years make it possible to vastly increase the size of data usable for research in historical syntax. This brings together many of the usual tools in NLP - word embeddings, tagging, and parsing - in the service of linguistic queries over automatically annotated corpora. We train a part-of-speech (POS) tagger and parser on a corpus of historical English, using ELMo embeddings trained over a billion words of similar text. The evaluation is based on the standard metrics, as well as on the accuracy of the query searches using the parsed data.
Tasks Word Embeddings
Published 2020-02-24
URL https://arxiv.org/abs/2002.10546v1
PDF https://arxiv.org/pdf/2002.10546v1.pdf
PWC https://paperswithcode.com/paper/parsing-early-modern-english-for-linguistic
Repo
Framework
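
An illustrative sketch of the end goal, a linguistic query over automatically parsed text, using spaCy's off-the-shelf English parser as a stand-in for the paper's historical-English tagger, parser, and ELMo embeddings.

```python
import spacy

# modern English model as a stand-in for the paper's historical-English pipeline
nlp = spacy.load("en_core_web_sm")
doc = nlp("The clerk sent letters to the council.")

# query: verbs and their direct objects, recovered from the automatic parse
for tok in doc:
    if tok.dep_ == "dobj" and tok.head.pos_ == "VERB":
        print(tok.head.text, "->", tok.text)
```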

Convolutional Networks with Dense Connectivity

Title Convolutional Networks with Dense Connectivity
Authors Gao Huang, Zhuang Liu, Geoff Pleiss, Laurens van der Maaten, Kilian Q. Weinberger
Abstract Recent work has shown that convolutional networks can be substantially deeper, more accurate, and more efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with L layers have L connections - one between each layer and its subsequent layer - our network has L(L+1)/2 direct connections. For each layer, the feature maps of all preceding layers are used as inputs, and its own feature maps are used as inputs to all subsequent layers. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, encourage feature reuse, and substantially improve parameter efficiency. We evaluate our proposed architecture on four highly competitive object recognition benchmarks (CIFAR-10, CIFAR-100, SVHN, and ImageNet). DenseNets obtain significant improvements over the state-of-the-art on most of them, while requiring fewer parameters and less computation to achieve high performance.
Tasks Object Recognition
Published 2020-01-08
URL https://arxiv.org/abs/2001.02394v1
PDF https://arxiv.org/pdf/2001.02394v1.pdf
PWC https://paperswithcode.com/paper/convolutional-networks-with-dense
Repo
Framework
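
The connectivity pattern is compact to express in code. A minimal PyTorch sketch of a dense block, with the bottleneck and transition layers of the full recipe simplified away:

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_ch, growth, n_layers):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Sequential(
                nn.BatchNorm2d(in_ch + i * growth),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_ch + i * growth, growth, 3, padding=1, bias=False),
            )
            for i in range(n_layers)
        ])

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))  # each layer sees all earlier maps
        return torch.cat(feats, dim=1)                    # L(L+1)/2 direct connections
```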

Scaling Up Multiagent Reinforcement Learning for Robotic Systems: Learn an Adaptive Sparse Communication Graph

Title Scaling Up Multiagent Reinforcement Learning for Robotic Systems: Learn an Adaptive Sparse Communication Graph
Authors Chuangchuang Sun, Macheng Shen, Jonathan P. How
Abstract The complexity of multiagent reinforcement learning (MARL) in multiagent systems increases exponentially with respect to the number of agents. This scalability issue prevents MARL from being applied in large-scale multiagent systems. However, one critical feature of MARL that is often neglected is that the interactions between agents are quite sparse. Without exploiting this sparsity structure, existing works aggregate information from all of the agents and thus have a high sample complexity. To address this issue, we propose an adaptive sparse attention mechanism by generalizing a sparsity-inducing activation function. A sparse communication graph in MARL is then learned by graph neural networks based on this new attention mechanism. Through this sparse structure, the agents can communicate effectively and efficiently by selectively attending only to the agents that matter most, so the scale of the MARL problem is reduced with little compromise in optimality. Comparative results show that our algorithm can learn an interpretable sparse structure and outperforms previous works by a significant margin on applications involving large-scale multiagent systems.
Tasks
Published 2020-03-02
URL https://arxiv.org/abs/2003.01040v2
PDF https://arxiv.org/pdf/2003.01040v2.pdf
PWC https://paperswithcode.com/paper/scaling-up-multiagent-reinforcement-learning
Repo
Framework
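
A minimal sketch of a sparsity-inducing attention activation of the kind the paper generalizes: plain sparsemax (Martins & Astudillo, 2016), which projects scores onto the probability simplex and zeroes out low-scoring entries, so each agent attends to only a few others. The paper's adaptive generalization and the graph-learning machinery are not reproduced.

```python
import torch

def sparsemax(z):
    """Sparsemax over the last axis of z: a sparse alternative to softmax."""
    z_sorted, _ = torch.sort(z, descending=True, dim=-1)
    cumsum = z_sorted.cumsum(-1)
    k = torch.arange(1, z.size(-1) + 1, device=z.device, dtype=z.dtype)
    support = 1 + k * z_sorted > cumsum            # entries kept in the sparse output
    k_max = support.sum(dim=-1, keepdim=True)      # size of the support
    tau = (cumsum.gather(-1, k_max - 1) - 1) / k_max
    return torch.clamp(z - tau, min=0.0)           # zeros outside the support
```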

One or Two Components? The Scattering Transform Answers

Title One or Two Components? The Scattering Transform Answers
Authors Vincent Lostanlen, Alice Cohen-Hadria, Juan Pablo Bello
Abstract With the aim of constructing a biologically plausible model of machine listening, we study the representation of a multicomponent stationary signal by a wavelet scattering network. First, we show that renormalizing second-order nodes by their first-order parents gives a simple numerical criterion to assess whether two neighboring components will interfere psychoacoustically. Secondly, we run a manifold learning algorithm (Isomap) on scattering coefficients to visualize the similarity space underlying parametric additive synthesis. Thirdly, we generalize the “one or two components” framework to three sine waves or more, and prove that the effective scattering depth of a Fourier series grows in logarithmic proportion to its bandwidth.
Tasks
Published 2020-03-02
URL https://arxiv.org/abs/2003.01037v1
PDF https://arxiv.org/pdf/2003.01037v1.pdf
PWC https://paperswithcode.com/paper/one-or-two-components-the-scattering
Repo
Framework
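
A minimal numerical sketch of the interference criterion, not a scattering network: for two close sine components, the envelope of the analytic signal beats at the difference frequency, and normalizing the envelope's oscillation energy by its mean (a second-order quantity over its first-order parent) flags interference. The frequencies here are illustrative assumptions.

```python
import numpy as np
from scipy.signal import hilbert

sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t) + np.sin(2 * np.pi * 466 * t)  # two close partials

env = np.abs(hilbert(x))                 # first-order-like quantity: the envelope
beat = env - env.mean()                  # second-order-like quantity: its oscillation
ratio = np.sqrt((beat ** 2).mean()) / env.mean()
print(f"normalized modulation ratio: {ratio:.3f}")  # large ratio => components interfere
```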