April 2, 2020

# Paper Group ANR 197


#### An Iterative Approach for Identifying Complaint Based Tweets in Social Media Platforms

Title An Iterative Approach for Identifying Complaint Based Tweets in Social Media Platforms
Authors Gyanesh Anand, Akash Gautam, Puneet Mathur, Debanjan Mahata, Rajiv Ratn Shah, Ramit Sawhney
Abstract Twitter is a social media platform where users express opinions on a variety of issues. Posts voicing grievances or complaints can be used by private and public organizations to improve their services and to obtain a prompt, low-cost assessment. In this paper, we propose an iterative methodology that aims to identify complaint-based posts pertaining to the transport domain. We perform comprehensive evaluations and release a novel dataset for research purposes.
Published 2020-01-24
URL https://arxiv.org/abs/2001.09215v1
PDF https://arxiv.org/pdf/2001.09215v1.pdf
PWC https://paperswithcode.com/paper/an-iterative-approach-for-identifying
Repo
Framework

#### Watch and learn – a generalized approach for transferrable learning in deep neural networks via physical principles

Title Watch and learn – a generalized approach for transferrable learning in deep neural networks via physical principles
Authors Kyle Sprague, Juan Carrasquilla, Steve Whitelam, Isaac Tamblyn
Abstract Transfer learning refers to the use of knowledge gained while solving a machine learning task and applying it to the solution of a closely related problem. Such an approach has enabled scientific breakthroughs in computer vision and natural language processing where the weights learned in state-of-the-art models can be used to initialize models for other tasks which dramatically improve their performance and save computational time. Here we demonstrate an unsupervised learning approach augmented with basic physical principles that achieves fully transferrable learning for problems in statistical physics across different physical regimes. By coupling a sequence model based on a recurrent neural network to an extensive deep neural network, we are able to learn the equilibrium probability distributions and inter-particle interaction models of classical statistical mechanical systems. Our approach, distribution-consistent learning, DCL, is a general strategy that works for a variety of canonical statistical mechanical models (Ising and Potts) as well as disordered (spin-glass) interaction potentials. Using data collected from a single set of observation conditions, DCL successfully extrapolates across all temperatures, thermodynamic phases, and can be applied to different length-scales. This constitutes a fully transferrable physics-based learning in a generalizable approach.
Published 2020-03-03
URL https://arxiv.org/abs/2003.02647v1
PDF https://arxiv.org/pdf/2003.02647v1.pdf
PWC https://paperswithcode.com/paper/watch-and-learn-a-generalized-approach-for
Repo
Framework

#### Hierarchical Context Enhanced Multi-Domain Dialogue System for Multi-domain Task Completion

Title Hierarchical Context Enhanced Multi-Domain Dialogue System for Multi-domain Task Completion
Authors Jingyuan Yang, Guang Liu, Yuzhao Mao, Zhiwei Zhao, Weiguo Gao, Xuan Li, Haiqin Yang, Jianping Shen
Abstract Task 1 of the DSTC8-track1 challenge aims to develop an end-to-end multi-domain dialogue system to accomplish complex user goals under tourist information desk settings. This paper describes our submitted solution, Hierarchical Context Enhanced Dialogue System (HCEDS), for this task. The main motivation of our system is to comprehensively explore the potential of hierarchical context for sufficiently understanding complex dialogues. More specifically, we apply BERT to capture token-level information and employ the attention mechanism to capture sentence-level information. The results listed in the leaderboard show that our system achieves first place in automatic evaluation and second place in human evaluation.
Published 2020-03-03
URL https://arxiv.org/abs/2003.01338v1
PDF https://arxiv.org/pdf/2003.01338v1.pdf
PWC https://paperswithcode.com/paper/hierarchical-context-enhanced-multi-domain
Repo
Framework

#### COVID-ResNet: A Deep Learning Framework for Screening of COVID19 from Radiographs

Title COVID-ResNet: A Deep Learning Framework for Screening of COVID19 from Radiographs
Abstract In the last few months, the novel COVID19 pandemic has spread all over the world. Due to its easy transmission, developing techniques to accurately and easily identify the presence of COVID19 and distinguish it from other forms of flu and pneumonia is crucial. Recent research has shown that the chest X-rays of patients suffering from COVID19 depict certain abnormalities. However, those approaches are closed source and not made available to the research community for reproducibility and deeper insight. The goal of this work is to build open source and open access datasets and present an accurate Convolutional Neural Network framework for differentiating COVID19 cases from other pneumonia cases. Our work utilizes state-of-the-art training techniques, including progressive resizing, cyclical learning rate finding, and discriminative learning rates, to train fast and accurate residual neural networks. Using these techniques, we show state-of-the-art results on the open-access COVID-19 dataset. This work presents a 3-step technique to fine-tune a pre-trained ResNet-50 architecture to improve model performance and reduce training time. We call it COVIDResNet. This is achieved by progressively resizing the input images to 128x128x3, 224x224x3, and 229x229x3 pixels and fine-tuning the network at each stage. This approach, along with automatic learning rate selection, enabled us to achieve a state-of-the-art accuracy of 96.23% (on all the classes) on the COVIDx dataset with only 41 epochs. This work presents a computationally efficient and highly accurate model for multi-class classification of three different infection types along with Normal individuals. This model can help in the early screening of COVID19 cases and help reduce the burden on healthcare systems.
Published 2020-03-31
URL https://arxiv.org/abs/2003.14395v1
PDF https://arxiv.org/pdf/2003.14395v1.pdf
PWC https://paperswithcode.com/paper/covid-resnet-a-deep-learning-framework-for
Repo
Framework
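The progressive-resizing schedule described in the abstract can be sketched roughly as follows. This is only an illustration of the control flow, not the authors' implementation: the nearest-neighbour `resize_nearest` helper, the single-channel image type, and the `fine_tune` callback are all hypothetical stand-ins, and the abstract does not say how the 41 epochs are divided among the stages.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]  # single-channel image as rows of pixel values (assumption)

# Stage resolutions quoted in the abstract (height = width; the 3 channels are omitted here).
STAGE_SIZES = (128, 224, 229)


def resize_nearest(image: Image, size: int) -> Image:
    """Resize a 2-D image to size x size using nearest-neighbour sampling."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[r * src_h // size][c * src_w // size] for c in range(size)]
        for r in range(size)
    ]


def progressive_resizing(
    images: Sequence[Image],
    sizes: Sequence[int],
    fine_tune: Callable[[List[Image], int], None],
) -> None:
    """Call the hypothetical `fine_tune` once per stage on images resized for that stage."""
    for size in sizes:
        resized = [resize_nearest(img, size) for img in images]
        fine_tune(resized, size)
```

In a real pipeline `fine_tune` would continue training the same network weights at each resolution, so that later stages start from what the earlier, cheaper stages learned.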

#### Generalization Guarantees for Sparse Kernel Approximation with Entropic Optimal Features

Title Generalization Guarantees for Sparse Kernel Approximation with Entropic Optimal Features
Authors Liang Ding, Rui Tuo, Shahin Shahrampour
Abstract Despite their success, kernel methods suffer from a massive computational cost in practice. In this paper, in lieu of the commonly used kernel expansion with respect to $N$ inputs, we develop a novel optimal design maximizing the entropy among kernel features. This procedure results in a kernel expansion with respect to entropic optimal features (EOF), improving the data representation dramatically due to feature dissimilarity. Under mild technical assumptions, our generalization bound shows that with only $O(N^{\frac{1}{4}})$ features (disregarding logarithmic factors), we can achieve the optimal statistical accuracy (i.e., $O(1/\sqrt{N})$). The salient feature of our design is its sparsity, which significantly reduces the time and space cost. Our numerical experiments on benchmark datasets verify the superiority of EOF over the state-of-the-art in kernel approximation.
Published 2020-02-11
URL https://arxiv.org/abs/2002.04195v1
PDF https://arxiv.org/pdf/2002.04195v1.pdf
PWC https://paperswithcode.com/paper/generalization-guarantees-for-sparse-kernel
Repo
Framework

#### A Critical View of the Structural Causal Model

Title A Critical View of the Structural Causal Model
Authors Tomer Galanti, Ofir Nabati, Lior Wolf
Abstract In the univariate case, we show that by comparing the individual complexities of univariate cause and effect, one can identify the cause and the effect, without considering their interaction at all. In our framework, complexities are captured by the reconstruction error of an autoencoder that operates on the quantiles of the distribution. Comparing the reconstruction errors of the two autoencoders, one for each variable, is shown to perform surprisingly well on the accepted causality directionality benchmarks. Hence, the decision as to which of the two is the cause and which is the effect may not be based on causality but on complexity. In the multivariate case, where one can ensure that the complexities of the cause and effect are balanced, we propose a new adversarial training method that mimics the disentangled structure of the causal model. We prove that in the multidimensional case, such modeling is likely to fit the data only in the direction of causality. Furthermore, a uniqueness result shows that the learned model is able to identify the underlying causal and residual (noise) components. Our multidimensional method outperforms the literature methods on both synthetic and real world datasets.
Published 2020-02-23
URL https://arxiv.org/abs/2002.10007v1
PDF https://arxiv.org/pdf/2002.10007v1.pdf
PWC https://paperswithcode.com/paper/a-critical-view-of-the-structural-causal
Repo
Framework

#### A Multi-Scale Tensor Network Architecture for Classification and Regression

Title A Multi-Scale Tensor Network Architecture for Classification and Regression
Authors Justin Reyes, Miles Stoudenmire
Abstract We present an algorithm for supervised learning using tensor networks, employing a step of preprocessing the data by coarse-graining through a sequence of wavelet transformations. We represent these transformations as a set of tensor network layers identical to those in a multi-scale entanglement renormalization ansatz (MERA) tensor network, and perform supervised learning and regression tasks through a model based on a matrix product state (MPS) tensor network acting on the coarse-grained data. Because the entire model consists of tensor contractions (apart from the initial non-linear feature map), we can adaptively fine-grain the optimized MPS model backwards through the layers with essentially no loss in performance. The MPS itself is trained using an adaptive algorithm based on the density matrix renormalization group (DMRG) algorithm. We test our methods by performing a classification task on audio data and a regression task on temperature time-series data, studying the dependence of training accuracy on the number of coarse-graining layers and showing how fine-graining through the network may be used to initialize models with access to finer-scale features.
Published 2020-01-22
URL https://arxiv.org/abs/2001.08286v1
PDF https://arxiv.org/pdf/2001.08286v1.pdf
PWC https://paperswithcode.com/paper/a-multi-scale-tensor-network-architecture-for
Repo
Framework
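The coarse-graining step can be pictured with the simplest wavelet, the Haar scaling function, which is assumed here purely for illustration (the abstract only says "wavelet transformations"). The sketch below works on plain lists and does not show the tensor network (MERA) form of the layers; the function names are hypothetical.

```python
import math
from typing import List


def haar_coarse_grain(signal: List[float]) -> List[float]:
    """One Haar scaling step: replace each adjacent pair (a, b) by (a + b) / sqrt(2)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    return [
        (signal[i] + signal[i + 1]) / math.sqrt(2)
        for i in range(0, len(signal), 2)
    ]


def coarse_grain(signal: List[float], layers: int) -> List[float]:
    """Apply `layers` successive coarse-graining steps, halving the length each time."""
    for _ in range(layers):
        signal = haar_coarse_grain(signal)
    return signal
```

Each layer halves the number of sites the downstream model has to act on, which is the trade-off the abstract studies when it varies the number of coarse-graining layers.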

#### Convolution Neural Network Architecture Learning for Remote Sensing Scene Classification

Title Convolution Neural Network Architecture Learning for Remote Sensing Scene Classification
Authors Jie Chen, Haozhe Huang, Jian Peng, Jiawei Zhu, Li Chen, Wenbo Li, Binyu Sun, Haifeng Li
Abstract Remote sensing image scene classification is a fundamental but challenging task in understanding remote sensing images. Recently, deep learning-based methods, especially convolutional neural network-based (CNN-based) methods, have shown enormous potential for understanding remote sensing images. CNN-based methods succeed by utilizing features learned from data rather than features designed manually. The feature-learning procedure of a CNN largely depends on its architecture. However, most of the CNN architectures used for remote sensing scene classification are still designed by hand, which demands a considerable amount of architecture engineering skill and domain knowledge, and a hand-designed architecture may not realize a CNN's full potential on a specific dataset. In this paper, we propose an automatic architecture learning procedure for remote sensing scene classification. We design a parameter space in which every set of parameters represents a certain CNN architecture (i.e., some parameters represent the type of operators used in the architecture, such as convolution, pooling, no connection, or identity, and the others represent how these operators connect). To discover the optimal set of parameters for a given dataset, we introduce a learning strategy that allows efficient search in the architecture space by means of gradient descent. An architecture generator finally maps the set of parameters into the CNN used in our experiments.
Published 2020-01-27
URL https://arxiv.org/abs/2001.09614v1
PDF https://arxiv.org/pdf/2001.09614v1.pdf
PWC https://paperswithcode.com/paper/convolution-neural-network-architecture
Repo
Framework
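One common way to make a choice among operators searchable by gradient descent is a softmax relaxation over per-operator scores; the abstract does not state that the paper uses exactly this relaxation, so the sketch below is an assumption. The 1-D toy operators, `mixed_edge`, and `generate_edge` are hypothetical names standing in for the operator types and the architecture generator mentioned above.

```python
import math
from typing import Callable, Dict, List

Signal = List[float]


def _window(x: Signal, i: int) -> Signal:
    """Values of x at positions i-1, i, i+1, clamped to the valid index range."""
    return [x[min(max(j, 0), len(x) - 1)] for j in (i - 1, i, i + 1)]


# Toy 1-D stand-ins for the operator types named in the abstract.
OPERATORS: Dict[str, Callable[[Signal], Signal]] = {
    "convolution": lambda x: [sum(_window(x, i)) / 3.0 for i in range(len(x))],
    "pooling": lambda x: [max(_window(x, i)) for i in range(len(x))],
    "identity": lambda x: list(x),
    "no_connection": lambda x: [0.0] * len(x),
}


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_edge(x: Signal, scores: Dict[str, float]) -> Signal:
    """Softmax-weighted sum of all candidate operators (smooth in the scores)."""
    names = list(OPERATORS)
    weights = softmax([scores[n] for n in names])
    outputs = [OPERATORS[n](x) for n in names]
    return [sum(w * out[i] for w, out in zip(weights, outputs)) for i in range(len(x))]


def generate_edge(scores: Dict[str, float]) -> str:
    """Discretise the learned scores: keep the operator with the highest score."""
    return max(scores, key=lambda name: scores[name])
```

During search the scores would be updated by gradient descent alongside the network weights; afterwards a generator step such as `generate_edge` turns the continuous parameters back into a concrete architecture.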

#### Recursive Non-Autoregressive Graph-to-Graph Transformer for Dependency Parsing with Iterative Refinement

Title Recursive Non-Autoregressive Graph-to-Graph Transformer for Dependency Parsing with Iterative Refinement
Abstract We propose the Recursive Non-autoregressive Graph-to-graph Transformer architecture (RNG-Tr) for the iterative refinement of arbitrary graphs through the recursive application of a non-autoregressive Graph-to-Graph Transformer and apply it to syntactic dependency parsing. The Graph-to-Graph Transformer architecture of Mohammadshahi and Henderson (2019) has previously been used for autoregressive graph prediction, but here we use it to predict all edges of the graph independently, conditioned on a previous prediction of the same graph. We demonstrate the power and effectiveness of RNG-Tr on several dependency corpora, using a refinement model pre-trained with BERT (Devlin et al., 2018). We also introduce Dependency BERT (DepBERT), a non-recursive parser similar to our refinement model. RNG-Tr is able to improve the accuracy of a variety of initial parsers on 13 languages from the Universal Dependencies Treebanks and the English and Chinese Penn Treebanks, even improving over the new state-of-the-art results achieved by DepBERT, significantly improving the state-of-the-art for all corpora tested.
Published 2020-03-29
URL https://arxiv.org/abs/2003.13118v1
PDF https://arxiv.org/pdf/2003.13118v1.pdf
PWC https://paperswithcode.com/paper/recursive-non-autoregressive-graph-to-graph
Repo
Framework

#### Contextual Sense Making by Fusing Scene Classification, Detections, and Events in Full Motion Video

Title Contextual Sense Making by Fusing Scene Classification, Detections, and Events in Full Motion Video
Authors Marc Bosch, Joseph Nassar, Benjamin Ortiz, Brendan Lammers, David Lindenbaum, John Wahl, Robert Mangum, Margaret Smith
Abstract With the proliferation of imaging sensors, the volume of multi-modal imagery far exceeds the ability of human analysts to adequately consume and exploit it. Full motion video (FMV) poses the extra challenge of containing large amounts of redundant temporal data. We aim to address the needs of human analysts to consume and exploit data given aerial FMV. We have investigated and designed a system capable of detecting events and activities of interest that deviate from the baseline patterns of observation given FMV feeds. We have divided the problem into three tasks: (1) context awareness, (2) object cataloging, and (3) event detection. The goal of context awareness is to constrain the problem of visual search and detection in video data. A custom image classifier categorizes the scene with one or multiple labels to identify the operating context and environment. This step helps reduce the semantic search space of downstream tasks in order to increase their accuracy. The second step is object cataloging, where an ensemble of object detectors locates and labels any known objects found in the scene (people, vehicles, boats, planes, buildings, etc.). Finally, context information and detections are sent to the event detection engine to monitor for certain behaviors. A series of analytics monitors the scene by tracking object counts and object interactions. If these object interactions are not declared to be commonly observed in the current scene, the system will report, geolocate, and log the event. Events of interest include identifying a gathering of people as a meeting and/or a crowd, alerting when there are boats on a beach unloading cargo, an increased count of people entering a building, people getting in and/or out of vehicles of interest, etc. We have applied our methods to data from different sensors at different resolutions in a variety of geographical areas.
Published 2020-01-16
URL https://arxiv.org/abs/2001.05979v1
PDF https://arxiv.org/pdf/2001.05979v1.pdf
PWC https://paperswithcode.com/paper/contextual-sense-making-by-fusing-scene
Repo
Framework

#### Is POS Tagging Necessary or Even Helpful for Neural Dependency Parsing?

Title Is POS Tagging Necessary or Even Helpful for Neural Dependency Parsing?
Authors Yu Zhang, Zhenghua Li, Houquan Zhou, Min Zhang
Abstract In the pre-deep-learning era, part-of-speech tags were considered indispensable ingredients for feature engineering in dependency parsing due to their important role in alleviating the data sparseness of purely lexical features, and quite a few works focused on joint tagging and parsing models to avoid error propagation. In contrast, recent studies suggest that POS tagging becomes much less important or even useless for neural parsing, especially when using character-based word representations such as CharLSTM. Yet a full and systematic investigation of this interesting issue, both empirical and linguistic, is still lacking. To answer this, we design four typical multi-task learning frameworks (i.e., Share-Loose, Share-Tight, Stack-Discrete, Stack-Hidden) for joint tagging and parsing based on the state-of-the-art biaffine parser. Considering that it is much cheaper to annotate POS tags than parse trees, we also investigate the utilization of large-scale heterogeneous POS-tag data. We conduct experiments on both English and Chinese datasets, and the results clearly show that POS tagging (both homogeneous and heterogeneous) can still significantly improve parsing performance when using the Stack-Hidden joint framework. We conduct detailed analysis and gain more insights from the linguistic aspect.
Published 2020-03-06
URL https://arxiv.org/abs/2003.03204v1
PDF https://arxiv.org/pdf/2003.03204v1.pdf
Repo
Framework

#### A comparison of different types of Niching Genetic Algorithms for variable selection in solar radiation estimation

Title A comparison of different types of Niching Genetic Algorithms for variable selection in solar radiation estimation
Authors Jorge Bustos, Victor A. Jimenez, Adrian Will
Abstract Variable selection problems generally have more than a single solution and, sometimes, it is worthwhile to find as many solutions as possible. Evolutionary Algorithms applied to this kind of problem prove to be one of the best methods for finding optimal solutions. Moreover, there are variants designed to find all or almost all local optima, known as Niching Genetic Algorithms (NGA). Several different NGA methods have been developed to achieve this task. The present work compares the behavior of eight different niching techniques, applied to a climatic database of four weather stations distributed in Tucuman, Argentina. The goal is to find different sets of input variables to be used as inputs by the estimation method. Final results were evaluated based on low estimation error and low dispersion error, as well as a high number of different results and low computational time. A second experiment was carried out to study the capability of the method to identify critical variables. The best results were obtained with Deterministic Crowding. In contrast, Steady State Worst Among Most Similar and Probabilistic Crowding showed good results but longer processing times and less ability to determine the critical factors.
Published 2020-02-14
URL https://arxiv.org/abs/2002.06036v1
PDF https://arxiv.org/pdf/2002.06036v1.pdf
PWC https://paperswithcode.com/paper/a-comparison-of-different-types-of-niching
Repo
Framework
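Deterministic Crowding, the technique the abstract reports as performing best, can be sketched in its textbook form as below. This is a generic version for bit-mask variable selection, not the authors' code: the uniform crossover, the `mutation_rate` default, and the caller-supplied `fitness` function (higher is better) are assumptions made for illustration.

```python
import random
from typing import Callable, List

Mask = List[int]  # 1 = variable selected, 0 = variable left out


def hamming(a: Mask, b: Mask) -> int:
    """Number of positions at which two masks differ."""
    return sum(x != y for x, y in zip(a, b))


def deterministic_crowding_step(
    population: List[Mask],
    fitness: Callable[[Mask], float],
    rng: random.Random,
    mutation_rate: float = 0.05,
) -> List[Mask]:
    """One generation: each child competes only against its most similar parent."""
    order = list(range(len(population)))
    rng.shuffle(order)
    next_population = [list(ind) for ind in population]
    # Parents are paired at random; with an odd population the last one carries over.
    for i, j in zip(order[0::2], order[1::2]):
        p1, p2 = population[i], population[j]
        # Uniform crossover followed by bit-flip mutation.
        swap = [rng.random() < 0.5 for _ in p1]
        c1 = [b if s else a for a, b, s in zip(p1, p2, swap)]
        c2 = [a if s else b for a, b, s in zip(p1, p2, swap)]
        c1 = [1 - g if rng.random() < mutation_rate else g for g in c1]
        c2 = [1 - g if rng.random() < mutation_rate else g for g in c2]
        # Match children to parents so that the total Hamming distance is smallest.
        if hamming(p1, c1) + hamming(p2, c2) > hamming(p1, c2) + hamming(p2, c1):
            c1, c2 = c2, c1
        # A child replaces its matched parent only if it is at least as fit.
        if fitness(c1) >= fitness(p1):
            next_population[i] = c1
        if fitness(c2) >= fitness(p2):
            next_population[j] = c2
    return next_population
```

Because a child can only displace the parent it most resembles, distinct groups of similar solutions (niches) tend to survive side by side, which is what allows several different variable subsets to be returned rather than a single optimum.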

#### DeepSUM++: Non-local Deep Neural Network for Super-Resolution of Unregistered Multitemporal Images

Title DeepSUM++: Non-local Deep Neural Network for Super-Resolution of Unregistered Multitemporal Images
Authors Andrea Bordone Molini, Diego Valsesia, Giulia Fracastoro, Enrico Magli
Abstract Deep learning methods for super-resolution of a remote sensing scene from multiple unregistered low-resolution images have recently gained attention thanks to a challenge proposed by the European Space Agency. This paper presents an evolution of the winner of the challenge, showing how incorporating non-local information in a convolutional neural network makes it possible to exploit self-similar patterns that provide enhanced regularization of the super-resolution problem. Experiments on the dataset of the challenge show improved performance over the state-of-the-art, which does not exploit non-local information.
Published 2020-01-15
URL https://arxiv.org/abs/2001.06342v1
PDF https://arxiv.org/pdf/2001.06342v1.pdf
PWC https://paperswithcode.com/paper/deepsum-non-local-deep-neural-network-for
Repo
Framework

#### Federating Recommendations Using Differentially Private Prototypes

Title Federating Recommendations Using Differentially Private Prototypes
Authors Mónica Ribero, Jette Henderson, Sinead Williamson, Haris Vikalo
Abstract Machine learning methods allow us to make recommendations to users in applications across fields including entertainment, dating, and commerce, by exploiting similarities in users’ interaction patterns. However, in domains that demand protection of personally sensitive data, such as medicine or banking, how can we learn such a model without accessing the sensitive data, and without inadvertently leaking private information? We propose a new federated approach to learning global and local private models for recommendation without collecting raw data, user statistics, or information about personal preferences. Our method produces a set of prototypes that allows us to infer global behavioral patterns, while providing differential privacy guarantees for users in any database of the system. By requiring only two rounds of communication, we both reduce the communication costs and avoid the excessive privacy loss associated with iterative procedures. We test our framework on synthetic data as well as real federated medical data and Movielens ratings data. We show local adaptation of the global model allows our method to outperform centralized matrix-factorization-based recommender system models, both in terms of accuracy of matrix reconstruction and in terms of relevance of the recommendations, while maintaining provable privacy guarantees. We also show that our method is more robust and is characterized by smaller variance than individual models learned by independent entities.
Published 2020-03-01
URL https://arxiv.org/abs/2003.00602v1
PDF https://arxiv.org/pdf/2003.00602v1.pdf
PWC https://paperswithcode.com/paper/federating-recommendations-using
Repo
Framework

#### The Curious Case of Adversarially Robust Models: More Data Can Help, Double Descend, or Hurt Generalization

Title The Curious Case of Adversarially Robust Models: More Data Can Help, Double Descend, or Hurt Generalization
Authors Yifei Min, Lin Chen, Amin Karbasi
Abstract Despite remarkable success, deep neural networks are sensitive to human-imperceptible small perturbations on the data and could be adversarially misled to produce incorrect or even dangerous predictions. To circumvent these issues, practitioners introduced adversarial training to produce adversarially robust models whose predictions are robust to small perturbations to the data. It is widely believed that more training data will help adversarially robust models generalize better on the test data. In this paper, however, we challenge this conventional belief and show that more training data could hurt the generalization of adversarially robust models for the linear classification problem. We identify three regimes based on the strength of the adversary. In the weak adversary regime, more data improves the generalization of adversarially robust models. In the medium adversary regime, with more training data, the generalization loss exhibits a double descent curve. This implies that in this regime, there is an intermediate stage where more training data hurts their generalization. In the strong adversary regime, more data almost immediately causes the generalization error to increase.