February 2, 2020


# Paper Group AWR 1


#### Neural Keyphrase Generation via Reinforcement Learning with Adaptive Rewards

Title Neural Keyphrase Generation via Reinforcement Learning with Adaptive Rewards
Authors Hou Pong Chan, Wang Chen, Lu Wang, Irwin King
Abstract Generating keyphrases that summarize the main points of a document is a fundamental task in natural language processing. Although existing generative models are capable of predicting multiple keyphrases for an input document as well as determining the number of keyphrases to generate, they still suffer from the problem of generating too few keyphrases. To address this problem, we propose a reinforcement learning (RL) approach for keyphrase generation, with an adaptive reward function that encourages a model to generate both sufficient and accurate keyphrases. Furthermore, we introduce a new evaluation method that incorporates name variations of the ground-truth keyphrases using the Wikipedia knowledge base. Thus, our evaluation method can more robustly evaluate the quality of predicted keyphrases. Extensive experiments on five real-world datasets of different scales demonstrate that our RL approach consistently and significantly improves the performance of the state-of-the-art generative models with both conventional and new evaluation methods.
Published 2019-06-10
URL https://arxiv.org/abs/1906.04106v1
PDF https://arxiv.org/pdf/1906.04106v1.pdf
PWC https://paperswithcode.com/paper/neural-keyphrase-generation-via-reinforcement
Repo https://github.com/kenchan0226/keyphrase-generation-rl
Framework pytorch
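
The abstract does not spell out the adaptive reward, so the following is only a minimal sketch under one plausible reading: reward recall while the model has produced fewer keyphrases than the ground truth (to encourage sufficiency), and switch to F1 once it has produced enough (to encourage accuracy). The function name `adaptive_reward` and the exact-match comparison are assumptions for illustration, not taken from the paper's repository.

```python
def adaptive_reward(predicted, ground_truth):
    """Hypothetical adaptive reward: recall if too few phrases, F1 otherwise."""
    pred = set(predicted)
    gold = set(ground_truth)
    if not pred or not gold:
        return 0.0
    correct = len(pred & gold)
    recall = correct / len(gold)
    # Too few predictions: reward only coverage, so adding phrases is never penalised.
    if len(pred) < len(gold):
        return recall
    # Enough predictions: reward the balance of precision and recall.
    precision = correct / len(pred)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = ["keyphrase generation", "reinforcement learning"]
    print(adaptive_reward(["keyphrase generation"], gold))
    print(adaptive_reward(["keyphrase generation", "reinforcement learning", "nlp"], gold))
```

In an RL setup, a score like this would be computed on a sampled output sequence and used to weight its log-likelihood; that training loop is omitted here.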

#### Robustness Assessment for Adversarial Machine Learning: Problems, Solutions and a Survey of Current Neural Networks and Defenses

Title Robustness Assessment for Adversarial Machine Learning: Problems, Solutions and a Survey of Current Neural Networks and Defenses
Authors Danilo Vasconcellos Vargas, Shashank Kotyan
Abstract In adversarial machine learning, there is a huge number of attacks of various types, which makes evaluating the robustness of new models and defences a daunting task. To make matters worse, there is an inherent bias in attacks and defences. Here, we organize the problems faced (model dependence, insufficient evaluation, false adversarial samples and perturbation-dependent results) and propose a model-agnostic dual ($L_0$ and $L_\infty$) quality assessment method together with the concept of robustness levels to tackle them. We validate the dual quality assessment on state-of-the-art models (WideResNet, ResNet, AllConv, DenseNet, NIN, LeNet and CapsNet) as well as the current hardest defences proposed at ICLR 2018 and the widely known adversarial training, showing that current models and defences are vulnerable at all levels of robustness. The robustness assessment shows that depending on the metric used (i.e., $L_0$ or $L_\infty$) the robustness may change significantly and therefore duality should be taken into account for a correct assessment. Moreover, a mathematical derivation, as well as a counterexample, suggests that $L_1$ and $L_2$ metrics alone are not enough to avoid false adversarial samples. Interestingly, a by-product of the assessment proposed is a novel $L_\infty$ black-box method which requires even less perturbation than the One-Pixel Attack (only 12% of One-Pixel Attack’s amount of perturbation) to achieve similar results. Thus, this paper elucidates the problems of robustness evaluation, proposes a dual quality assessment to tackle them, and surveys the robustness of current models and defences. Code available at http://bit.ly/DualQualityAssessment.
Published 2019-06-14
URL https://arxiv.org/abs/1906.06026v2
PDF https://arxiv.org/pdf/1906.06026v2.pdf
PWC https://paperswithcode.com/paper/model-agnostic-dual-quality-assessment-for
Repo https://github.com/shashankkotyan/DualQualityAssessment
Framework tf
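
To make the two metrics in the dual assessment concrete, here is a small sketch of how the $L_0$ and $L_\infty$ sizes of a perturbation are computed. The flat integer pixel lists and the function names are assumptions chosen for illustration; they are not from the authors' code.

```python
def l0_distance(original, adversarial):
    """Number of positions that were changed at all (L0)."""
    return sum(1 for o, a in zip(original, adversarial) if o != a)


def linf_distance(original, adversarial):
    """Largest change applied to any single position (L-infinity)."""
    return max(abs(o - a) for o, a in zip(original, adversarial))


if __name__ == "__main__":
    original = [10, 50, 90, 30]
    sparse = [10, 255, 90, 30]  # one pixel changed a lot
    dense = [12, 48, 92, 28]    # every pixel changed a little
    print(l0_distance(original, sparse), linf_distance(original, sparse))
    print(l0_distance(original, dense), linf_distance(original, dense))
```

The sparse perturbation is small in $L_0$ but large in $L_\infty$, and the dense one is the reverse, which is why a model can look robust under one metric and fragile under the other.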

#### Generated Loss, Augmented Training, and Multiscale VAE

Title Generated Loss, Augmented Training, and Multiscale VAE
Authors Jason Chou, Gautam Hathi
Abstract The variational autoencoder (VAE) framework remains a popular option for training unsupervised generative models, especially for discrete data where generative adversarial networks (GANs) require workarounds to create gradients for the generator. In our work modeling US postal addresses, we show that our discrete VAE with tree recursive architecture demonstrates limited capability of capturing field correlations within structured data, even after overcoming the challenge of posterior collapse with scheduled sampling and tuning of the KL-divergence weight $\beta$. Worse, the VAE seems to have difficulty mapping its generated samples to the latent space, as their VAE loss lags behind or even increases during the training process. Motivated by this observation, we show that augmenting training data with generated variants (augmented training) and training a VAE with multiple values of $\beta$ simultaneously (multiscale VAE) both improve the generation quality of the VAE. Despite their differences in motivation and emphasis, we show that augmented training and multiscale VAE are actually connected and have similar effects on the model.
Published 2019-04-23
URL http://arxiv.org/abs/1904.10446v1
PDF http://arxiv.org/pdf/1904.10446v1.pdf
PWC https://paperswithcode.com/paper/generated-loss-augmented-training-and
Framework none
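
As a rough illustration of the KL-divergence weight $\beta$, the sketch below computes the standard objective "reconstruction loss plus $\beta$ times KL" for a diagonal Gaussian posterior, then averages it over several $\beta$ values. The averaging rule in `multiscale_objective` is an assumption: the abstract only says that multiple values of $\beta$ are trained simultaneously, not how they are combined, and the paper's own model is a discrete VAE rather than this Gaussian toy.

```python
import math


def gaussian_kl(mu, logvar):
    """KL( N(mu, exp(logvar)) || N(0, 1) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def beta_vae_loss(reconstruction_loss, kl, beta):
    """Single-scale objective: reconstruction term plus beta-weighted KL term."""
    return reconstruction_loss + beta * kl


def multiscale_objective(reconstruction_loss, kl, betas):
    """Hypothetical combination: mean of the single-scale objective over all betas."""
    losses = [beta_vae_loss(reconstruction_loss, kl, b) for b in betas]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    kl = gaussian_kl(mu=[1.0, 0.0], logvar=[0.0, 0.0])
    print(kl)
    print(multiscale_objective(2.0, kl, betas=[0.5, 1.0, 2.0]))
```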

#### Stochastic Mirror Descent on Overparameterized Nonlinear Models: Convergence, Implicit Regularization, and Generalization

Title Stochastic Mirror Descent on Overparameterized Nonlinear Models: Convergence, Implicit Regularization, and Generalization
Authors Navid Azizan, Sahin Lale, Babak Hassibi
Abstract Most modern learning problems are highly overparameterized, meaning that there are many more parameters than the number of training data points, and as a result, the training loss may have infinitely many global minima (parameter vectors that perfectly interpolate the training data). Therefore, it is important to understand which interpolating solutions we converge to, how they depend on the initialization point and the learning algorithm, and whether they lead to different generalization performances. In this paper, we study these questions for the family of stochastic mirror descent (SMD) algorithms, of which the popular stochastic gradient descent (SGD) is a special case. Our contributions are both theoretical and experimental. On the theory side, we show that in the overparameterized nonlinear setting, if the initialization is close enough to the manifold of global minima (something that comes for free in the highly overparameterized case), SMD with sufficiently small step size converges to a global minimum that is approximately the closest one in Bregman divergence. On the experimental side, our extensive experiments on standard datasets and models, using various initializations, various mirror descents, and various Bregman divergences, consistently confirm that this phenomenon happens in deep learning. Our experiments further indicate that there is a clear difference in the generalization performance of the solutions obtained by different SMD algorithms. Experimenting on a standard image dataset and network architecture with SMD with different kinds of implicit regularization, $\ell_1$ to encourage sparsity, $\ell_2$ yielding SGD, and $\ell_{10}$ to discourage large components in the parameter vector, consistently and definitively shows that $\ell_{10}$-SMD has better generalization performance than SGD, which in turn has better generalization performance than $\ell_1$-SMD.
Published 2019-06-10
URL https://arxiv.org/abs/1906.03830v1
PDF https://arxiv.org/pdf/1906.03830v1.pdf
PWC https://paperswithcode.com/paper/stochastic-mirror-descent-on
Repo https://github.com/SahinLale/StochasticMirrorDescent
Framework pytorch
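
The mirror descent update can be sketched for the potential $\psi(w) = \frac{1}{q}\sum_i |w_i|^q$, which is one common choice and an assumption here; the abstract names the $\ell_q$ variants but not the exact potential. The update moves in the "mirror" space, $\nabla\psi(w_{t+1}) = \nabla\psi(w_t) - \eta \nabla L(w_t)$, and maps back. For $q = 2$ the mirror map is the identity and the step reduces to plain gradient descent. The sketch requires $q > 1$, since the map is not invertible at $q = 1$.

```python
import math


def mirror_map(w, q):
    """Gradient of psi(w) = (1/q) * sum |w_i|^q, i.e. sign(w_i) * |w_i|^(q-1)."""
    return [math.copysign(abs(x) ** (q - 1), x) for x in w]


def inverse_mirror_map(z, q):
    """Inverse of mirror_map for q > 1: sign(z_i) * |z_i|^(1/(q-1))."""
    return [math.copysign(abs(v) ** (1.0 / (q - 1)), v) for v in z]


def smd_step(w, grad, lr, q):
    """One mirror descent step: gradient step in mirror space, then map back."""
    if q <= 1:
        raise ValueError("this sketch needs q > 1 so the mirror map is invertible")
    z = mirror_map(w, q)
    z = [zi - lr * gi for zi, gi in zip(z, grad)]
    return inverse_mirror_map(z, q)


if __name__ == "__main__":
    w, grad = [1.0, -2.0], [0.5, 0.5]
    print(smd_step(w, grad, lr=0.1, q=2))   # same as w - lr * grad
    print(smd_step(w, grad, lr=0.1, q=10))  # l_10 potential
```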

#### Quantifying and Alleviating the Language Prior Problem in Visual Question Answering

Title Quantifying and Alleviating the Language Prior Problem in Visual Question Answering
Authors Yangyang Guo, Zhiyong Cheng, Liqiang Nie, Yibing Liu, Yinglong Wang, Mohan Kankanhalli
Published 2019-05-13
URL https://arxiv.org/abs/1905.04877v1
PDF https://arxiv.org/pdf/1905.04877v1.pdf
PWC https://paperswithcode.com/paper/quantifying-and-alleviating-the-language
Repo https://github.com/guoyang9/vqa-prior
Framework pytorch

#### Recovery Guarantees for Compressible Signals with Adversarial Noise

Title Recovery Guarantees for Compressible Signals with Adversarial Noise
Authors Jasjeet Dhaliwal, Kyle Hambrook
Abstract We provide recovery guarantees for compressible signals that have been corrupted with noise and extend the framework introduced in Bafna et al. (2018) to defend neural networks against $\ell_0$-norm, $\ell_2$-norm, and $\ell_{\infty}$-norm attacks. Our results are general as they can be applied to most unitary transforms used in practice and hold for $\ell_0$-norm, $\ell_2$-norm, and $\ell_\infty$-norm bounded noise. In the case of $\ell_0$-norm noise, we prove recovery guarantees for Iterative Hard Thresholding (IHT) and Basis Pursuit (BP). For $\ell_2$-norm bounded noise, we provide recovery guarantees for BP and for the case of $\ell_\infty$-norm bounded noise, we provide recovery guarantees for Dantzig Selector (DS). These guarantees theoretically bolster the defense framework introduced in Bafna et al. (2018) for defending neural networks against adversarial inputs. Finally, we experimentally demonstrate the effectiveness of this defense framework against an array of $\ell_0$, $\ell_2$ and $\ell_\infty$ norm attacks.
Published 2019-07-15
URL https://arxiv.org/abs/1907.06565v3
PDF https://arxiv.org/pdf/1907.06565v3.pdf
PWC https://paperswithcode.com/paper/recovery-guarantees-for-compressible-signals
Repo https://github.com/jasjeetIM/recovering_compressible_signals
Framework tf
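
For readers unfamiliar with Iterative Hard Thresholding, the sketch below shows the textbook iteration $x_{t+1} = H_k\big(x_t + \mu A^\top (y - A x_t)\big)$, where $H_k$ keeps the $k$ largest-magnitude entries. This is the generic algorithm, not necessarily the exact variant or measurement setup analyzed in the paper; the matrix `A`, step size and iteration count are illustrative assumptions.

```python
def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x and zero out the rest."""
    keep = set(sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]


def iht(A, y, k, steps=50, step_size=1.0):
    """Generic IHT: gradient step on ||y - Ax||^2 / 2, then hard thresholding."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(steps):
        residual = [y[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        grad_step = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        x = hard_threshold([x[j] + step_size * grad_step[j] for j in range(n)], k)
    return x


if __name__ == "__main__":
    # With an identity (trivially unitary) transform, IHT keeps the k dominant coefficients.
    identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    noisy = [3.0, 0.1, -2.0, 0.05]
    print(iht(identity, noisy, k=2))
```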

#### Repurposing Entailment for Multi-Hop Question Answering Tasks

Title Repurposing Entailment for Multi-Hop Question Answering Tasks
Authors Harsh Trivedi, Heeyoung Kwon, Tushar Khot, Ashish Sabharwal, Niranjan Balasubramanian
Abstract Question Answering (QA) naturally reduces to an entailment problem, namely, verifying whether some text entails the answer to a question. However, for multi-hop QA tasks, which require reasoning with multiple sentences, it remains unclear how best to utilize entailment models pre-trained on large-scale datasets such as SNLI, which are based on sentence pairs. We introduce Multee, a general architecture that can effectively use entailment models for multi-hop QA tasks. Multee uses (i) a local module that helps locate important sentences, thereby avoiding distracting information, and (ii) a global module that aggregates information by effectively incorporating importance weights. Importantly, we show that both modules can use entailment functions pre-trained on large-scale NLI datasets. We evaluate performance on MultiRC and OpenBookQA, two multi-hop QA datasets. When using an entailment function pre-trained on NLI datasets, Multee outperforms QA models trained only on the target QA datasets and the OpenAI transformer models. The code is available at https://github.com/StonyBrookNLP/multee.
Published 2019-04-20
URL http://arxiv.org/abs/1904.09380v1
PDF http://arxiv.org/pdf/1904.09380v1.pdf
PWC https://paperswithcode.com/paper/190409380
Repo https://github.com/StonyBrookNLP/multee
Framework pytorch

#### Learning Temporal Pose Estimation from Sparsely-Labeled Videos

Title Learning Temporal Pose Estimation from Sparsely-Labeled Videos
Authors Gedas Bertasius, Christoph Feichtenhofer, Du Tran, Jianbo Shi, Lorenzo Torresani
Abstract Modern approaches for multi-person pose estimation in video require large amounts of dense annotations. However, labeling every frame in a video is costly and labor intensive. To reduce the need for dense annotations, we propose a PoseWarper network that leverages training videos with sparse annotations (every k frames) to learn to perform dense temporal pose propagation and estimation. Given a pair of video frames—a labeled Frame A and an unlabeled Frame B—we train our model to predict human pose in Frame A using the features from Frame B by means of deformable convolutions to implicitly learn the pose warping between A and B. We demonstrate that we can leverage our trained PoseWarper for several applications. First, at inference time we can reverse the application direction of our network in order to propagate pose information from manually annotated frames to unlabeled frames. This makes it possible to generate pose annotations for the entire video given only a few manually-labeled frames. Compared to modern label propagation methods based on optical flow, our warping mechanism is much more compact (6M vs 39M parameters), and also more accurate (88.7% mAP vs 83.8% mAP). We also show that we can improve the accuracy of a pose estimator by training it on an augmented dataset obtained by adding our propagated poses to the original manual labels. Lastly, we can use our PoseWarper to aggregate temporal pose information from neighboring frames during inference. This allows our system to achieve state-of-the-art pose detection results on the PoseTrack2017 and PoseTrack2018 datasets. Code has been made available at: https://github.com/facebookresearch/PoseWarper.
Tasks Multi-Person Pose Estimation, Optical Flow Estimation, Pose Estimation
Published 2019-06-06
URL https://arxiv.org/abs/1906.04016v3
PDF https://arxiv.org/pdf/1906.04016v3.pdf
PWC https://paperswithcode.com/paper/learning-temporal-pose-estimation-from
Framework pytorch

#### False Data Injection Attacks in Internet of Things and Deep Learning enabled Predictive Analytics

Title False Data Injection Attacks in Internet of Things and Deep Learning enabled Predictive Analytics
Authors Gautam Raj Mode, Prasad Calyam, Khaza Anuarul Hoque
Abstract Industry 4.0 is the latest industrial revolution, primarily merging automation with advanced manufacturing to reduce direct human effort and resources. Predictive maintenance (PdM) is an Industry 4.0 solution, which facilitates predicting faults in a component or a system powered by state-of-the-art machine learning (ML) algorithms and Internet-of-Things (IoT) sensors. However, IoT sensors and deep learning (DL) algorithms are both known for their vulnerabilities to cyber-attacks. In the context of PdM systems, such attacks can have catastrophic consequences as they are hard to detect due to the nature of the attack. To date, the majority of the published literature focuses on the accuracy of DL-enabled PdM systems and often ignores the effect of such attacks. In this paper, we demonstrate the effect of IoT sensor attacks on a PdM system. First, we use three state-of-the-art DL algorithms, specifically, Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Convolutional Neural Network (CNN), for predicting the Remaining Useful Life (RUL) of a turbofan engine using NASA’s C-MAPSS dataset. The obtained results show that the GRU-based PdM model outperforms some of the recent literature on RUL prediction using the C-MAPSS dataset. Afterward, we model two different types of false data injection attacks (FDIA) on turbofan engine sensor data and evaluate their impact on CNN, LSTM, and GRU-based PdM systems. The obtained results demonstrate that FDI attacks on even a few IoT sensors can strongly degrade the RUL prediction. However, the GRU-based PdM model performs better in terms of accuracy and resiliency. Lastly, we perform a study on the GRU-based PdM model using four different GRU networks with different sequence lengths. Our experiments reveal an interesting relationship between the accuracy, resiliency and sequence length for the GRU-based PdM models.
Published 2019-10-03
URL https://arxiv.org/abs/1910.01716v4
PDF https://arxiv.org/pdf/1910.01716v4.pdf
PWC https://paperswithcode.com/paper/false-data-injection-attacks-in-internet-of
Repo https://github.com/dependable-cps/FDIA-PdM
Framework none

#### Latent Variable Sentiment Grammar

Title Latent Variable Sentiment Grammar
Authors Liwen Zhang, Kewei Tu, Yue Zhang
Abstract Neural models have been investigated for sentiment classification over constituent trees. They learn phrase composition automatically by encoding tree structures but do not explicitly model sentiment composition, which requires encoding sentiment class labels. To this end, we investigate two formalisms with deep sentiment representations that capture sentiment subtype expressions by latent variables and Gaussian mixture vectors, respectively. Experiments on the Stanford Sentiment Treebank (SST) show the effectiveness of sentiment grammar over vanilla neural encoders. Using ELMo embeddings, our method gives the best results on this benchmark.
Published 2019-06-29
URL https://arxiv.org/abs/1907.00218v2
PDF https://arxiv.org/pdf/1907.00218v2.pdf
PWC https://paperswithcode.com/paper/latent-variable-sentiment-grammar
Repo https://github.com/Ehaschia/bi-tree-lstm-crf
Framework pytorch

#### Faking and Discriminating the Navigation Data of a Micro Aerial Vehicle Using Quantum Generative Adversarial Networks

Title Faking and Discriminating the Navigation Data of a Micro Aerial Vehicle Using Quantum Generative Adversarial Networks
Authors Michel Barbeau, Joaquin Garcia-Alfaro
Abstract We show that the Quantum Generative Adversarial Network (QGAN) paradigm can be employed by an adversary to learn to generate data that deceives the monitoring of a Cyber-Physical System (CPS) and to perpetrate a covert attack. As a test case, the ideas are elaborated considering the navigation data of a Micro Aerial Vehicle (MAV). A concrete QGAN design is proposed to generate fake MAV navigation data. Initially, the adversary is entirely ignorant about the dynamics of the CPS, which is the strength of the approach from the adversary's point of view. A design is also proposed to discriminate between genuine and fake MAV navigation data. The designs combine classical optimization, qubit quantum computing and photonic quantum computing. Using the PennyLane software simulation, they are evaluated over a classical computing platform. We assess the learning time and accuracy of the navigation data generator and discriminator versus space complexity, i.e., the amount of quantum memory needed to solve the problem.
Published 2019-07-05
URL https://arxiv.org/abs/1907.03038v3
PDF https://arxiv.org/pdf/1907.03038v3.pdf
Repo https://github.com/jgalfaro/mirrored-QGANMAV
Framework none

#### Please Stop Permuting Features: An Explanation and Alternatives

Title Please Stop Permuting Features: An Explanation and Alternatives
Authors Giles Hooker, Lucas Mentch
Abstract This paper advocates against permute-and-predict (PaP) methods for interpreting black box functions. Methods such as the variable importance measures proposed for random forests, partial dependence plots, and individual conditional expectation plots remain popular because of their ability to provide model-agnostic measures that depend only on the pre-trained model output. However, numerous studies have found that these tools can produce diagnostics that are highly misleading, particularly when there is strong dependence among features. Rather than simply add to this growing literature by further demonstrating such issues, here we seek to provide an explanation for the observed behavior. In particular, we argue that breaking dependencies between features in hold-out data places undue emphasis on sparse regions of the feature space by forcing the original model to extrapolate to regions where there is little to no data. We explore these effects through various settings where a ground truth is understood and find support for previous claims in the literature that PaP metrics tend to over-emphasize correlated features both in variable importance and partial dependence plots, even though applying permutation methods to the ground-truth models does not. As an alternative, we recommend more direct approaches that have proven successful in other settings: explicitly removing features, conditional permutations, or model distillation methods.
Published 2019-05-01
URL http://arxiv.org/abs/1905.03151v1
PDF http://arxiv.org/pdf/1905.03151v1.pdf
PWC https://paperswithcode.com/paper/190503151
Framework none
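
For reference, the permute-and-predict procedure that the paper argues against can be sketched as follows: shuffle one feature column in held-out data, re-run the fixed model, and report the increase in error. This is a minimal illustration with hypothetical names (`permutation_importance`, a toy `model`), using mean squared error as the loss; it is not the authors' experimental code.

```python
import random


def mse(model, X, y):
    """Mean squared error of a fixed model on rows X with targets y."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, feature, seed=0):
    """Increase in MSE after shuffling one feature column (permute-and-predict)."""
    rng = random.Random(seed)
    column = [row[feature] for row in X]
    rng.shuffle(column)
    X_perm = [row[:feature] + [v] + row[feature + 1:] for row, v in zip(X, column)]
    return mse(model, X_perm, y) - mse(model, X, y)


if __name__ == "__main__":
    # Two perfectly dependent features; the toy model only uses the first one.
    X = [[i / 10, i / 10] for i in range(20)]
    y = [row[0] for row in X]

    def model(row):
        return row[0]

    print(permutation_importance(model, X, y, feature=0))
    print(permutation_importance(model, X, y, feature=1))
```

Note that shuffling one of the two dependent columns creates rows such as `[0.1, 1.7]` that never occur in the data, which is the extrapolation problem the abstract describes.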

#### Machine Learning Estimation of Heterogeneous Treatment Effects with Instruments

Title Machine Learning Estimation of Heterogeneous Treatment Effects with Instruments
Authors Vasilis Syrgkanis, Victor Lei, Miruna Oprescu, Maggie Hei, Keith Battocchi, Greg Lewis
Abstract We consider the estimation of heterogeneous treatment effects with arbitrary machine learning methods in the presence of unobserved confounders with the aid of a valid instrument. Such settings arise in A/B tests with an intent-to-treat structure, where the experimenter randomizes over which user will receive a recommendation to take an action, and we are interested in the effect of the downstream action. We develop a statistical learning approach to the estimation of heterogeneous effects, reducing the problem to the minimization of an appropriate loss function that depends on a set of auxiliary models (each corresponding to a separate prediction task). The reduction enables the use of all recent algorithmic advances (e.g. neural nets, forests). We show that the estimated effect model is robust to estimation errors in the auxiliary models, by showing that the loss satisfies a Neyman orthogonality criterion. Our approach can be used to estimate projections of the true effect model on simpler hypothesis spaces. When these spaces are parametric, then the parameter estimates are asymptotically normal, which enables construction of confidence sets. We applied our method to estimate the effect of membership on downstream webpage engagement on TripAdvisor, using as an instrument an intent-to-treat A/B test among 4 million TripAdvisor users, where some users received an easier membership sign-up process. We also validate our method on synthetic data and on public datasets for the effects of schooling on income.
Published 2019-05-24
URL https://arxiv.org/abs/1905.10176v3
PDF https://arxiv.org/pdf/1905.10176v3.pdf
PWC https://paperswithcode.com/paper/machine-learning-estimation-of-heterogeneous-1
Repo https://github.com/Microsoft/EconML
Framework none

#### Learning Nonsymmetric Determinantal Point Processes

Title Learning Nonsymmetric Determinantal Point Processes
Authors Mike Gartrell, Victor-Emmanuel Brunel, Elvis Dohmatob, Syrine Krichene
Abstract Determinantal point processes (DPPs) have attracted substantial attention as an elegant probabilistic model that captures the balance between quality and diversity within sets. DPPs are conventionally parameterized by a positive semi-definite kernel matrix, and this symmetric kernel encodes only repulsive interactions between items. These so-called symmetric DPPs have significant expressive power, and have been successfully applied to a variety of machine learning tasks, including recommendation systems, information retrieval, and automatic summarization, among many others. Efficient algorithms for learning symmetric DPPs and sampling from these models have been reasonably well studied. However, relatively little attention has been given to nonsymmetric DPPs, which relax the symmetric constraint on the kernel. Nonsymmetric DPPs allow for both repulsive and attractive item interactions, which can significantly improve modeling power, resulting in a model that may be a better fit for some applications. We present a method that enables a tractable algorithm, based on maximum likelihood estimation, for learning nonsymmetric DPPs from data composed of observed subsets. Our method imposes a particular decomposition of the nonsymmetric kernel that enables such tractable learning algorithms, which we analyze both theoretically and experimentally. We evaluate our model on synthetic and real-world datasets, demonstrating improved predictive performance compared to symmetric DPPs, which have previously shown strong performance on modeling tasks associated with these datasets.
Tasks Information Retrieval, Point Processes, Recommendation Systems
Published 2019-05-30
URL https://arxiv.org/abs/1905.12962v2
PDF https://arxiv.org/pdf/1905.12962v2.pdf
PWC https://paperswithcode.com/paper/learning-nonsymmetric-determinantal-point
Repo https://github.com/cgartrel/nonsymmetric-DPP-learning
Framework pytorch
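
As a small illustration of how a DPP kernel assigns probabilities to subsets, the sketch below uses the standard L-ensemble formula $P(Y) = \det(L_Y) / \det(L + I)$, where $L_Y$ is the kernel restricted to the items in $Y$. The tiny kernels are made up for the example, and the determinant uses a naive cofactor expansion that is only suitable for very small matrices; the paper's kernel decomposition and learning algorithm are not reproduced here.

```python
def det(M):
    """Determinant by cofactor expansion (fine for tiny matrices only)."""
    n = len(M)
    if n == 0:
        return 1.0  # determinant of the empty matrix, used for the empty subset
    if n == 1:
        return M[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total


def dpp_probability(L, subset):
    """P(Y) = det(L_Y) / det(L + I) for an L-ensemble DPP."""
    idx = sorted(subset)
    L_Y = [[L[i][j] for j in idx] for i in idx]
    n = len(L)
    L_plus_I = [[L[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    return det(L_Y) / det(L_plus_I)


if __name__ == "__main__":
    symmetric = [[2.0, 1.0], [1.0, 2.0]]      # off-diagonal terms lower P({0, 1})
    nonsymmetric = [[2.0, 1.0], [-1.0, 2.0]]  # skew part raises P({0, 1})
    for kernel in (symmetric, nonsymmetric):
        print([dpp_probability(kernel, s) for s in ([], [0], [1], [0, 1])])
```

In the symmetric kernel the off-diagonal product $L_{01}L_{10}$ is positive and reduces $\det(L_Y)$ for the pair; in the nonsymmetric kernel it is negative and increases it, which is one way to see how nonsymmetric kernels can express attraction.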

#### Tightness-aware Evaluation Protocol for Scene Text Detection

Title Tightness-aware Evaluation Protocol for Scene Text Detection
Authors Yuliang Liu, Lianwen Jin, Zecheng Xie, Canjie Luo, Shuaitao Zhang, Lele Xie
Abstract Evaluation protocols play a key role in the developmental progress of text detection methods. There are strict requirements to ensure that the evaluation methods are fair, objective and reasonable. However, existing metrics exhibit some obvious drawbacks: 1) they are not goal-oriented; 2) they cannot recognize the tightness of detection methods; 3) existing one-to-many and many-to-one solutions involve inherent loopholes and deficiencies. Therefore, this paper proposes a novel evaluation protocol called the Tightness-aware Intersect-over-Union (TIoU) metric that can quantify completeness of ground truth, compactness of detection, and tightness of matching degree. Specifically, instead of merely using the IoU value, two common detection behaviors are properly considered, and the TIoU score is used directly to measure tightness. In addition, we further propose a straightforward method to address the annotation granularity issue, which can fairly evaluate word and text-line detections simultaneously. By adopting the detection results from published methods and general object detection frameworks, comprehensive experiments on the ICDAR 2013 and ICDAR 2015 datasets are conducted to compare recent metrics and the proposed TIoU metric. The comparison demonstrated some promising new prospects, e.g., determining the methods and frameworks for which the detection is tighter and more beneficial to recognition. Our method is extremely simple; its novelty is that the proposed metric uses simple but reasonable improvements to yield many interesting and insightful prospects and to solve most of the issues of the previous metrics. The code is publicly available at https://github.com/Yuliang-Liu/TIoU-metric.
Tasks Object Detection, Scene Text Detection
Published 2019-03-27
URL http://arxiv.org/abs/1904.00813v1
PDF http://arxiv.org/pdf/1904.00813v1.pdf
PWC https://paperswithcode.com/paper/tightness-aware-evaluation-protocol-for-scene
Repo https://github.com/Yuliang-Liu/TIoU-metric
Framework none
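
The idea of weighting IoU by tightness can be sketched for axis-aligned boxes. The two penalty terms below (the fraction of the ground truth left uncovered, and the fraction of the detection lying outside the ground truth) are an assumed reading of "completeness of ground truth" and "compactness of detection"; the function names and the single-box setting are illustrative, so treat this as a sketch and consult the linked repository for the actual metric.

```python
def area(box):
    """Area of an axis-aligned box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def intersection_area(a, b):
    """Area of the overlap between two axis-aligned boxes."""
    overlap = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return area(overlap)


def iou(gt, det):
    """Plain intersection over union; assumes at least one box has nonzero area."""
    inter = intersection_area(gt, det)
    return inter / (area(gt) + area(det) - inter)


def tightness_aware_recall(gt, det):
    """Assumed form: IoU scaled down by the fraction of ground truth left uncovered."""
    uncovered = area(gt) - intersection_area(gt, det)
    return iou(gt, det) * (1.0 - uncovered / area(gt))


def tightness_aware_precision(gt, det):
    """Assumed form: IoU scaled down by the fraction of the detection outside ground truth."""
    outside = area(det) - intersection_area(gt, det)
    return iou(gt, det) * (1.0 - outside / area(det))


if __name__ == "__main__":
    gt = (0, 0, 10, 10)
    cut_off = (0, 0, 10, 5)  # detection covers only half of the text region
    print(iou(gt, cut_off), tightness_aware_recall(gt, cut_off), tightness_aware_precision(gt, cut_off))
```

Under this reading, a detection that cuts off half of the text keeps its precision-side score but loses recall-side score, whereas plain IoU cannot tell a cut-off detection from an oversized one with the same overlap ratio.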