April 2, 2020

Paper Group ANR 123

Enlarging Discriminative Power by Adding an Extra Class in Unsupervised Domain Adaptation. Bi-Directional Generation for Unsupervised Domain Adaptation. LAMBERT: Layout-Aware language Modeling using BERT for information extraction. B-PINNs: Bayesian Physics-Informed Neural Networks for Forward and Inverse PDE Problems with Noisy Data. Adaptive Loss …

Enlarging Discriminative Power by Adding an Extra Class in Unsupervised Domain Adaptation

Title Enlarging Discriminative Power by Adding an Extra Class in Unsupervised Domain Adaptation
Authors Hai H. Tran, Sumyeong Ahn, Taeyoung Lee, Yung Yi
Abstract In this paper, we study the problem of unsupervised domain adaptation that aims at obtaining a prediction model for the target domain using labeled data from the source domain and unlabeled data from the target domain. There exists an array of recent research based on the idea of extracting features that are not only invariant for both domains but also provide high discriminative power for the target domain. In this paper, we propose an idea of empowering the discriminativeness: Adding a new, artificial class and training the model on the data together with the GAN-generated samples of the new class. The trained model based on the new class samples is capable of extracting the features that are more discriminative by repositioning data of current classes in the target domain and therefore drawing the decision boundaries more effectively. Our idea is highly generic so that it is compatible with many existing methods such as DANN, VADA, and DIRT-T. We conduct various experiments for the standard data commonly used for the evaluation of unsupervised domain adaptations and demonstrate that our algorithm achieves the SOTA performance for many scenarios.
Tasks Domain Adaptation, Unsupervised Domain Adaptation
Published 2020-02-19
URL https://arxiv.org/abs/2002.08041v1
PDF https://arxiv.org/pdf/2002.08041v1.pdf
PWC https://paperswithcode.com/paper/enlarging-discriminative-power-by-adding-an

Bi-Directional Generation for Unsupervised Domain Adaptation

Title Bi-Directional Generation for Unsupervised Domain Adaptation
Authors Guanglei Yang, Haifeng Xia, Mingli Ding, Zhengming Ding
Abstract Unsupervised domain adaptation facilitates the unlabeled target domain relying on well-established source domain information. The conventional methods forcefully reducing the domain discrepancy in the latent space will result in the destruction of intrinsic data structure. To balance the mitigation of domain gap and the preservation of the inherent structure, we propose a Bi-Directional Generation domain adaptation model with consistent classifiers interpolating two intermediate domains to bridge source and target domains. Specifically, two cross-domain generators are employed to synthesize one domain conditioned on the other. The performance of our proposed method can be further enhanced by the consistent classifiers and the cross-domain alignment constraints. We also design two classifiers which are jointly optimized to maximize the consistency on target sample prediction. Extensive experiments verify that our proposed model outperforms the state-of-the-art on standard cross domain visual benchmarks.
Tasks Domain Adaptation, Unsupervised Domain Adaptation
Published 2020-02-12
URL https://arxiv.org/abs/2002.04869v1
PDF https://arxiv.org/pdf/2002.04869v1.pdf
PWC https://paperswithcode.com/paper/bi-directional-generation-for-unsupervised

LAMBERT: Layout-Aware language Modeling using BERT for information extraction

Title LAMBERT: Layout-Aware language Modeling using BERT for information extraction
Authors Łukasz Garncarek, Rafał Powalski, Tomasz Stanisławek, Bartosz Topolski, Piotr Halama, Filip Graliński
Abstract In this paper we introduce a novel approach to the problem of understanding documents where the local semantics is influenced by non-trivial layout. Namely, we modify the Transformer architecture in a way that allows it to use the graphical features defined by the layout, without the need to re-learn the language semantics from scratch, thanks to starting the training process from a model pretrained on classical language modeling tasks.
Tasks Language Modelling
Published 2020-02-19
URL https://arxiv.org/abs/2002.08087v2
PDF https://arxiv.org/pdf/2002.08087v2.pdf
PWC https://paperswithcode.com/paper/lambert-layout-aware-language-modeling-using

B-PINNs: Bayesian Physics-Informed Neural Networks for Forward and Inverse PDE Problems with Noisy Data

Title B-PINNs: Bayesian Physics-Informed Neural Networks for Forward and Inverse PDE Problems with Noisy Data
Authors Liu Yang, Xuhui Meng, George Em Karniadakis
Abstract We propose a Bayesian physics-informed neural network (B-PINN) to solve both forward and inverse nonlinear problems described by partial differential equations (PDEs) and noisy data. In this Bayesian framework, the Bayesian neural network (BNN) combined with a PINN for PDEs serves as the prior while the Hamiltonian Monte Carlo (HMC) or the variational inference (VI) could serve as an estimator of the posterior. B-PINNs make use of both physical laws and scattered noisy measurements to provide predictions and quantify the aleatoric uncertainty arising from the noisy data in the Bayesian framework. Compared with PINNs, in addition to uncertainty quantification, B-PINNs obtain more accurate predictions in scenarios with large noise due to their capability of avoiding overfitting. We conduct a systematic comparison between the two different approaches for the B-PINN posterior estimation (i.e., HMC or VI), along with dropout used for quantifying uncertainty in deep neural networks. Our experiments show that HMC is more suitable than VI for the B-PINNs posterior estimation, while dropout employed in PINNs can hardly provide accurate predictions with reasonable uncertainty. Finally, we replace the BNN in the prior with a truncated Karhunen-Loève (KL) expansion combined with HMC or a deep normalizing flow (DNF) model as posterior estimators. The KL is as accurate as BNN and much faster but this framework cannot be easily extended to high-dimensional problems unlike the BNN based framework.
Published 2020-03-13
URL https://arxiv.org/abs/2003.06097v1
PDF https://arxiv.org/pdf/2003.06097v1.pdf
PWC https://paperswithcode.com/paper/b-pinns-bayesian-physics-informed-neural

Adaptive Loss Function for Super Resolution Neural Networks Using Convex Optimization Techniques

Title Adaptive Loss Function for Super Resolution Neural Networks Using Convex Optimization Techniques
Authors Seyed Mehdi Ayyoubzadeh, Xiaolin Wu
Abstract The Single Image Super-Resolution (SISR) task refers to learning a mapping from low-resolution images to the corresponding high-resolution ones. This task is known to be extremely difficult since it is an ill-posed problem. Recently, Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance on SISR. However, the images produced by CNNs do not contain fine details of the images. Generative Adversarial Networks (GANs) aim to solve this issue and recover sharp details. Nevertheless, GANs are notoriously difficult to train. Besides that, they generate artifacts in the high-resolution images. In this paper, we have proposed a method in which CNNs try to align images in different spaces rather than only the pixel space. Such a space is designed using convex optimization techniques. CNNs are encouraged to learn high-frequency components of the images as well as low-frequency components. We have shown that the proposed method can recover fine details of the images and it is stable in the training process.
Tasks Image Super-Resolution, Super-Resolution
Published 2020-01-21
URL https://arxiv.org/abs/2001.07766v1
PDF https://arxiv.org/pdf/2001.07766v1.pdf
PWC https://paperswithcode.com/paper/adaptive-loss-function-for-super-resolution

Reinforcement Learning Based Cooperative Coded Caching under Dynamic Popularities in Ultra-Dense Networks

Title Reinforcement Learning Based Cooperative Coded Caching under Dynamic Popularities in Ultra-Dense Networks
Authors Shen Gao, Peihao Dong, Zhiwen Pan, Geoffrey Ye Li
Abstract For ultra-dense networks with wireless backhaul, caching strategy at small base stations (SBSs), usually with limited storage, is critical to meet massive high data rate requests. Since the content popularity profile varies with time in an unknown way, we exploit reinforcement learning (RL) to design a cooperative caching strategy with maximum-distance separable (MDS) coding. We model the MDS coding based cooperative caching as a Markov decision process to capture the popularity dynamics and maximize the long-term expected cumulative traffic load served directly by the SBSs without accessing the macro base station. For the formulated problem, we first find the optimal solution for a small-scale system by embedding the cooperative MDS coding into Q-learning. To cope with the large-scale case, we approximate the state-action value function heuristically. The approximated function includes only a small number of learnable parameters and enables us to propose a fast and efficient action-selection approach, which dramatically reduces the complexity. Numerical results verify the optimality/near-optimality of the proposed RL based algorithms and show the superiority compared with the baseline schemes. They also exhibit good robustness to different environments.
Tasks Q-Learning
Published 2020-03-08
URL https://arxiv.org/abs/2003.03758v1
PDF https://arxiv.org/pdf/2003.03758v1.pdf
PWC https://paperswithcode.com/paper/reinforcement-learning-based-cooperative
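
The abstract above embeds the caching decision into Q-learning for the small-scale system. As a rough illustration only, the following is a minimal sketch of a generic tabular Q-learning update with epsilon-greedy action selection. The state names, the action names (`cache_A`, `cache_B`), the reward, and the values of `alpha` and `gamma` are hypothetical placeholders, not the paper's MDS-coded caching formulation.

```python
import random
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: move Q(s, a) toward the bootstrapped target."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]

def epsilon_greedy(q, state, actions, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the highest-valued action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])

# Hypothetical toy setup: two caching actions, all Q-values start at zero.
q = defaultdict(float)
actions = ["cache_A", "cache_B"]
q_update(q, "s0", "cache_A", reward=1.0, next_state="s1", actions=actions)
print(epsilon_greedy(q, "s0", actions, epsilon=0.0, rng=random.Random(0)))
```

A table indexed by (state, action) grows quickly with the number of files and base stations, which is presumably why the abstract turns to an approximated state-action value function for the large-scale case.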

What Changed Your Mind: The Roles of Dynamic Topics and Discourse in Argumentation Process

Title What Changed Your Mind: The Roles of Dynamic Topics and Discourse in Argumentation Process
Authors Jichuan Zeng, Jing Li, Yulan He, Cuiyun Gao, Michael R. Lyu, Irwin King
Abstract In our world full of uncertainty, debates and argumentation contribute to the progress of science and society. Despite the increasing attention to characterizing human arguments, most progress made so far focuses on the debate outcome, largely ignoring the dynamic patterns in argumentation processes. This paper presents a study that automatically analyzes the key factors in argument persuasiveness, beyond simply predicting who will persuade whom. Specifically, we propose a novel neural model that is able to dynamically track the changes of latent topics and discourse in argumentative conversations, allowing the investigation of their roles in influencing the outcomes of persuasion. Extensive experiments have been conducted on argumentative conversations on both social media and the supreme court. The results show that our model outperforms state-of-the-art models in identifying persuasive arguments via explicitly exploring dynamic factors of topic and discourse. We further analyze the effects of topics and discourse on persuasiveness, and find that they are both useful: topics provide concrete evidence while superior discourse styles may bias participants, especially in social media arguments. In addition, we draw some findings from our empirical results, which will help people better engage in future persuasive conversations.
Published 2020-02-10
URL https://arxiv.org/abs/2002.03536v1
PDF https://arxiv.org/pdf/2002.03536v1.pdf
PWC https://paperswithcode.com/paper/what-changed-your-mind-the-roles-of-dynamic

Listwise Learning to Rank with Deep Q-Networks

Title Listwise Learning to Rank with Deep Q-Networks
Authors Abhishek Sharma
Abstract Learning to Rank is the problem involved with ranking a sequence of documents based on their relevance to a given query. Deep Q-Learning has been shown to be a useful method for training an agent in sequential decision making. In this paper, we show that DeepQRank, our deep q-learning to rank agent, demonstrates performance that can be considered state-of-the-art. Though less computationally efficient than a supervised learning approach such as linear regression, our agent has fewer limitations in terms of which format of data it can use for training and evaluation. We run our algorithm against Microsoft’s LETOR listwise dataset and achieve an NDCG@1 (ranking accuracy in the range [0,1]) of 0.5075, narrowly beating out the leading supervised learning model, SVMRank (0.4958).
Tasks Decision Making, Learning-To-Rank, Q-Learning
Published 2020-02-13
URL https://arxiv.org/abs/2002.07651v1
PDF https://arxiv.org/pdf/2002.07651v1.pdf
PWC https://paperswithcode.com/paper/listwise-learning-to-rank-with-deep-q
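
The abstract above reports NDCG@1 as its ranking metric. The following is a minimal sketch of NDCG@k, assuming the common exponential gain `2^rel - 1` and a `log2` position discount; the exact variant used in the paper's evaluation is not stated in the abstract, and the relevance labels below are made up for illustration.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items of a ranked list."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical graded relevance labels, in the order the ranker returned them.
ranked = [1, 2, 0]
print(ndcg_at_k(ranked, 1))
```

With k = 1 only the top-ranked document matters, so the score is the gain of that document divided by the gain of the most relevant document available for the query.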

Fragmentation Coagulation Based Mixed Membership Stochastic Blockmodel

Title Fragmentation Coagulation Based Mixed Membership Stochastic Blockmodel
Authors Zheng Yu, Xuhui Fan, Marcin Pietrasik, Marek Reformat
Abstract The Mixed-Membership Stochastic Blockmodel (MMSB) is proposed as one of the state-of-the-art Bayesian relational methods suitable for learning the complex hidden structure underlying the network data. However, the current formulation of MMSB suffers from the following two issues: (1) the prior information (e.g. entities’ community structural information) cannot be well embedded in the modelling; (2) community evolution cannot be well described in the literature. Therefore, we propose a non-parametric fragmentation coagulation based Mixed Membership Stochastic Blockmodel (fcMMSB). Our model performs entity-based clustering to capture the community information for entities and linkage-based clustering to derive the group information for links simultaneously. Besides, the proposed model infers the network structure and models community evolution, manifested by appearances and disappearances of communities, using the discrete fragmentation coagulation process (DFCP). By integrating the community structure with the group compatibility matrix we derive a generalized version of MMSB. An efficient Gibbs sampling scheme with the Pólya-Gamma (PG) approach is implemented for posterior inference. We validate our model on synthetic and real world data.
Published 2020-01-17
URL https://arxiv.org/abs/2002.00901v1
PDF https://arxiv.org/pdf/2002.00901v1.pdf
PWC https://paperswithcode.com/paper/fragmentation-coagulation-based-mixed

Determination of the relative inclination and the viewing angle of an interacting pair of galaxies using convolutional neural networks

Title Determination of the relative inclination and the viewing angle of an interacting pair of galaxies using convolutional neural networks
Authors Prem Prakash, Arunima Banerjee, Pavan Kumar Perepu
Abstract Constructing dynamical models for interacting pairs of galaxies as constrained by their observed structure and kinematics crucially depends on the correct choice of the values of the relative inclination ($i$) between their galactic planes as well as the viewing angle ($\theta$), the angle between the line of sight and the normal to the plane of their orbital motion. We construct Deep Convolutional Neural Network (DCNN) models to determine the relative inclination ($i$) and the viewing angle ($\theta$) of interacting galaxy pairs, using N-body + Smoothed Particle Hydrodynamics (SPH) simulation data from the GALMER database for training. In order to classify galaxy pairs based on their $i$ values only, we first construct DCNN models for (a) a 2-class ($i = 0^{\circ}, 45^{\circ}$) and (b) a 3-class ($i = 0^{\circ}, 45^{\circ} \text{ and } 90^{\circ}$) classification, obtaining $F_1$ scores of 99% and 98% respectively. Further, for a classification based on both $i$ and $\theta$ values, we develop a DCNN model for a 9-class classification ($(i,\theta) \sim (0^{\circ},15^{\circ}), (0^{\circ},45^{\circ}), (0^{\circ},90^{\circ}), (45^{\circ},15^{\circ}), (45^{\circ},45^{\circ}), (45^{\circ},90^{\circ}), (90^{\circ},15^{\circ}), (90^{\circ},45^{\circ}), (90^{\circ},90^{\circ})$), and the $F_1$ score was 97%. Finally, we tested our 2-class model on real data of interacting galaxy pairs from the Sloan Digital Sky Survey (SDSS) DR15, and achieved an $F_1$ score of 78%. Our DCNN models could be further extended to determine additional parameters needed to model the dynamics of interacting galaxy pairs, which is currently accomplished by trial and error.
Published 2020-02-04
URL https://arxiv.org/abs/2002.01238v1
PDF https://arxiv.org/pdf/2002.01238v1.pdf
PWC https://paperswithcode.com/paper/determination-of-the-relative-inclination-and

Explaining Memorization and Generalization: A Large-Scale Study with Coherent Gradients

Title Explaining Memorization and Generalization: A Large-Scale Study with Coherent Gradients
Authors Piotr Zielinski, Shankar Krishnan, Satrajit Chatterjee
Abstract Coherent Gradients is a recently proposed hypothesis to explain why over-parameterized neural networks trained with gradient descent generalize well even though they have sufficient capacity to memorize the training set. Inspired by random forests, Coherent Gradients proposes that (Stochastic) Gradient Descent (SGD) finds common patterns amongst examples (if such common patterns exist) since descent directions that are common to many examples add up in the overall gradient, and thus the biggest changes to the network parameters are those that simultaneously help many examples. The original Coherent Gradients paper validated the theory through causal intervention experiments on shallow, fully connected networks on MNIST. In this work, we perform similar intervention experiments on more complex architectures (such as VGG, Inception and ResNet) on more complex datasets (such as CIFAR-10 and ImageNet). Our results are in good agreement with the small scale study in the original paper, thus providing the first validation of coherent gradients in more practically relevant settings. We also confirm in these settings that suppressing incoherent updates by natural modifications to SGD can significantly reduce overfitting–lending credence to the hypothesis that memorization occurs when few examples are responsible for most of the gradient used in the update. Furthermore, we use the coherent gradients theory to explore a new characterization of why some examples are learned earlier than other examples, i.e., “easy” and “hard” examples.
Published 2020-03-16
URL https://arxiv.org/abs/2003.07422v1
PDF https://arxiv.org/pdf/2003.07422v1.pdf
PWC https://paperswithcode.com/paper/explaining-memorization-and-generalization-a
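
The abstract above argues that descent directions shared by many examples add up in the overall gradient, while memorization occurs when a few examples dominate the update. The following is a minimal sketch of that intuition using made-up per-example gradients. The coordinate-wise median is chosen here purely for illustration as one robust aggregate; it is an assumption and not necessarily the SGD modification used in the paper.

```python
from statistics import median

def mean_gradient(per_example_grads):
    """Plain average over examples: what mini-batch gradient descent uses."""
    n = len(per_example_grads)
    return [sum(col) / n for col in zip(*per_example_grads)]

def median_gradient(per_example_grads):
    """Coordinate-wise median: damps directions that only a few examples push on."""
    return [median(col) for col in zip(*per_example_grads)]

# Hypothetical gradients for three examples and two parameters.
# All examples agree on parameter 0; only the last one pushes on parameter 1.
grads = [
    [1.0, 0.0],
    [1.0, 0.0],
    [1.0, 9.0],
]
print(mean_gradient(grads))
print(median_gradient(grads))
```

In this toy case the mean still moves the second parameter because of a single example, whereas the robust aggregate keeps only the direction that all three examples share.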

Random Bundle: Brain Metastases Segmentation Ensembling through Annotation Randomization

Title Random Bundle: Brain Metastases Segmentation Ensembling through Annotation Randomization
Authors Darvin Yi, Endre Gøvik, Michael Iv, Elizabeth Tong, Greg Zaharchuk, Daniel Rubin
Abstract We introduce a novel ensembling method, Random Bundle (RB), that improves performance for brain metastases segmentation. We create our ensemble by training each network on our dataset with 50% of our annotated lesions censored out. We also apply a lopsided bootstrap loss to recover performance after inducing an in silico 50% false negative rate and make our networks more sensitive. We improve our network's lesion-detection mAP value by 39% and more than triple the sensitivity at 80% precision. We also show slight improvements in segmentation quality through DICE score. Further, RB ensembling improves performance over baseline by a larger margin than a variety of popular ensembling strategies. Finally, we show that RB ensembling is computationally efficient by comparing its performance to a single network when both systems are constrained to have the same compute.
Published 2020-02-23
URL https://arxiv.org/abs/2002.09809v1
PDF https://arxiv.org/pdf/2002.09809v1.pdf
PWC https://paperswithcode.com/paper/random-bundle-brain-metastases-segmentation
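
The abstract above trains each ensemble member on data with 50% of the annotated lesions censored out. The following is a minimal sketch of that censoring step only, under the assumption that each lesion carries an integer id in a label mask and that 0 marks background. The function names and the tiny 2D mask are hypothetical; the abstract does not describe the actual data format or sampling procedure.

```python
import random

def censor_lesions(lesion_ids, keep_fraction=0.5, seed=None):
    """Pick the lesion ids whose annotations are kept for one ensemble member."""
    rng = random.Random(seed)
    ids = sorted(lesion_ids)
    n_keep = round(len(ids) * keep_fraction)
    return set(rng.sample(ids, n_keep))

def censored_mask(label_mask, kept):
    """Zero out every pixel whose lesion id is not in `kept` (0 is background)."""
    return [[v if v in kept else 0 for v in row] for row in label_mask]

# Hypothetical 3x3 label mask containing four lesions with ids 1 to 4.
mask = [[1, 1, 0],
        [0, 2, 2],
        [3, 0, 4]]
kept = censor_lesions({1, 2, 3, 4}, seed=0)
print(censored_mask(mask, kept))
```

Drawing a different seed for each network gives every ensemble member its own random subset of annotations, which is the source of diversity in this sketch.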

Visual search over billions of aerial and satellite images

Title Visual search over billions of aerial and satellite images
Authors Ryan Keisler, Samuel W. Skillman, Sunny Gonnabathula, Justin Poehnelt, Xander Rudelis, Michael S. Warren
Abstract We present a system for performing visual search over billions of aerial and satellite images. The purpose of visual search is to find images that are visually similar to a query image. We define visual similarity using 512 abstract visual features generated by a convolutional neural network that has been trained on aerial and satellite imagery. The features are converted to binary values to reduce data and compute requirements. We employ a hash-based search using Bigtable, a scalable database service from Google Cloud. Searching the continental United States at 1-meter pixel resolution, corresponding to approximately 2 billion images, takes approximately 0.1 seconds. This system enables real-time visual search over the surface of the earth, and an interactive demo is available at https://search.descarteslabs.com.
Published 2020-02-07
URL https://arxiv.org/abs/2002.02624v1
PDF https://arxiv.org/pdf/2002.02624v1.pdf
PWC https://paperswithcode.com/paper/visual-search-over-billions-of-aerial-and
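
The abstract above converts the visual features to binary values to reduce data and compute requirements. The following is a minimal sketch of how binarized features can be compared with Hamming distance. Thresholding at zero, the four-dimensional toy vectors, and the linear scan are assumptions for illustration; the system described in the abstract uses 512 features and a hash-based search on Bigtable, which this sketch does not reproduce.

```python
def binarize(features, threshold=0.0):
    """Pack a real-valued feature vector into an int, one bit per feature."""
    bits = 0
    for value in features:
        bits = (bits << 1) | (1 if value > threshold else 0)
    return bits

def hamming(a, b):
    """Number of bit positions where two packed codes differ."""
    return bin(a ^ b).count("1")

def search(query, index, top_k=1):
    """Return the top_k keys of `index` closest to `query` in Hamming distance."""
    code = binarize(query)
    return sorted(index, key=lambda name: hamming(code, index[name]))[:top_k]

# Hypothetical index of two image tiles with 4-dimensional features.
index = {
    "tile_a": binarize([0.9, -0.2, 0.4, -0.7]),
    "tile_b": binarize([-0.5, 0.3, -0.1, 0.8]),
}
print(search([0.8, -0.1, 0.5, -0.9], index))
```

Packing the bits into integers makes each comparison a single XOR followed by a bit count, which is the kind of saving that binarization is meant to provide.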

Frequency Bias in Neural Networks for Input of Non-Uniform Density

Title Frequency Bias in Neural Networks for Input of Non-Uniform Density
Authors Ronen Basri, Meirav Galun, Amnon Geifman, David Jacobs, Yoni Kasten, Shira Kritchman
Abstract Recent works have partly attributed the generalization ability of over-parameterized neural networks to frequency bias: networks trained with gradient descent on data drawn from a uniform distribution find a low frequency fit before high frequency ones. As realistic training sets are not drawn from a uniform distribution, we here use the Neural Tangent Kernel (NTK) model to explore the effect of variable density on training dynamics. Our results, which combine analytic and empirical observations, show that when learning a pure harmonic function of frequency $\kappa$, convergence at a point $\mathbf{x} \in \mathbb{S}^{d-1}$ occurs in time $O(\kappa^d/p(\mathbf{x}))$ where $p(\mathbf{x})$ denotes the local density at $\mathbf{x}$. Specifically, for data in $\mathbb{S}^1$ we analytically derive the eigenfunctions of the kernel associated with the NTK for two-layer networks. We further prove convergence results for deep, fully connected networks with respect to the spectral decomposition of the NTK. Our empirical study highlights similarities and differences between deep and shallow networks in this model.
Published 2020-03-10
URL https://arxiv.org/abs/2003.04560v1
PDF https://arxiv.org/pdf/2003.04560v1.pdf
PWC https://paperswithcode.com/paper/frequency-bias-in-neural-networks-for-input

PIANO: A Fast Parallel Iterative Algorithm for Multinomial and Sparse Multinomial Logistic Regression

Title PIANO: A Fast Parallel Iterative Algorithm for Multinomial and Sparse Multinomial Logistic Regression
Authors R. Jyothi, P. Babu
Abstract Multinomial Logistic Regression is a well-studied tool for classification and has been widely used in fields like image processing, computer vision, and bioinformatics, to name a few. Under a supervised classification scenario, a Multinomial Logistic Regression model learns a weight vector to differentiate between any two classes by optimizing over the likelihood objective. With the advent of big data, the inundation of data has resulted in high-dimensional weight vectors and has also given rise to a huge number of classes, which makes the classical methods for model estimation computationally unviable. To handle this issue, we here propose a parallel iterative algorithm: Parallel Iterative Algorithm for MultiNomial LOgistic Regression (PIANO), which is based on the Majorization Minimization procedure and can update each element of the weight vectors in parallel. Further, we also show that PIANO can be easily extended to solve the Sparse Multinomial Logistic Regression problem, an extensively studied problem because of its attractive feature selection property. In particular, we work out the extension of PIANO to solve the Sparse Multinomial Logistic Regression problem with $\ell_1$ and $\ell_0$ regularizations. We also prove that PIANO converges to a stationary point of the Multinomial and the Sparse Multinomial Logistic Regression problems. Simulations were conducted to compare PIANO with the existing methods, and it was found that the proposed algorithm performs better than the existing methods in terms of speed of convergence.
Tasks Feature Selection
Published 2020-02-21
URL https://arxiv.org/abs/2002.09133v1
PDF https://arxiv.org/pdf/2002.09133v1.pdf
PWC https://paperswithcode.com/paper/piano-a-fast-parallel-iterative-algorithm-for