October 18, 2019

3084 words 15 mins read

Paper Group ANR 471



Gradient-Leaks: Understanding and Controlling Deanonymization in Federated Learning

Title Gradient-Leaks: Understanding and Controlling Deanonymization in Federated Learning
Authors Tribhuvanesh Orekondy, Seong Joon Oh, Yang Zhang, Bernt Schiele, Mario Fritz
Abstract Federated Learning (FL) systems are gaining popularity as a solution to training Machine Learning (ML) models from large-scale user data collected on personal devices (e.g., smartphones) without the raw data leaving the device. At the core of FL is a network of anonymous user devices sharing minimal training information (model parameter deltas) computed locally on personal data. However, the degree to which user-specific information is encoded in the model deltas is poorly understood. In this paper, we identify that model deltas encode subtle variations in how users capture and generate data. These variations provide a powerful statistical signal, allowing an adversary to effectively deanonymize participating devices using a limited set of auxiliary data. We analyze the resulting deanonymization attacks on diverse tasks on real-world (anonymized) user-generated data across a range of closed- and open-world scenarios. We study various strategies to mitigate the risks of deanonymization. As random perturbation methods do not offer convincing operating points, we propose data-augmentation strategies that introduce adversarial biases into device data and thereby offer substantial protection against deanonymization threats with little effect on utility.
Tasks Data Augmentation, Speech Recognition
Published 2018-05-15
URL https://arxiv.org/abs/1805.05838v2
PDF https://arxiv.org/pdf/1805.05838v2.pdf
PWC https://paperswithcode.com/paper/understanding-and-controlling-user
Repo
Framework
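
The core attack is easy to picture: treat each device's parameter deltas as a behavioral fingerprint and match anonymous rounds against auxiliary deltas from known users. Below is a minimal sketch, assuming a simple averaged-fingerprint, cosine-similarity matcher; the paper's actual attack and features are more sophisticated.

```python
# Hypothetical sketch: link anonymous model deltas to known users via
# cosine similarity against auxiliary "fingerprint" deltas. The matching
# rule and features here are illustrative, not the paper's exact attack.
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def deanonymize(anon_deltas, aux_deltas_by_user):
    """anon_deltas: {round_id: flat delta vector};
    aux_deltas_by_user: {user_id: [flat delta vectors]}."""
    # Average each user's auxiliary deltas into a single fingerprint.
    prints = {u: np.mean(ds, axis=0) for u, ds in aux_deltas_by_user.items()}
    # Closed-world: assign each round to the most similar fingerprint.
    return {rid: max(prints, key=lambda u: cosine(d, prints[u]))
            for rid, d in anon_deltas.items()}

# Toy usage: two users whose deltas have distinct statistics.
rng = np.random.default_rng(0)
aux = {"alice": [rng.normal(1.0, 0.1, 8) for _ in range(3)],
       "bob":   [rng.normal(-1.0, 0.1, 8) for _ in range(3)]}
print(deanonymize({"round_7": rng.normal(1.0, 0.1, 8)}, aux))
# -> {'round_7': 'alice'}
```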

Integrating Feature and Image Pyramid: A Lung Nodule Detector Learned in Curriculum Fashion

Title Integrating Feature and Image Pyramid: A Lung Nodule Detector Learned in Curriculum Fashion
Authors Benyuan Sun, Zhen Zhou, Fandong Zhang, Xiuli Li, Yizhou Wang
Abstract Lung nodules exhibit large variations in size and appearance in CT images. Nodules smaller than 10 mm can easily lose information after down-sampling in convolutional neural networks, which results in low sensitivity. In this paper, a combination of 3D image and feature pyramids is exploited to integrate lower-level texture features with high-level semantic features, leading to a higher recall. However, 3D operations are time- and memory-consuming, a problem aggravated by the explosive growth of medical images. To tackle this problem, we propose a general curriculum training strategy to speed up training. A dynamic sampling method is designed to select the subset of samples that contributes most to network training, greatly reducing training time. In experiments, we demonstrate that the proposed network outperforms previous state-of-the-art methods. Meanwhile, our sampling strategy halves the training time of the proposal network on LUNA16.
Tasks
Published 2018-07-21
URL http://arxiv.org/abs/1807.08135v2
PDF http://arxiv.org/pdf/1807.08135v2.pdf
PWC https://paperswithcode.com/paper/integrating-feature-and-image-pyramid-a-lung
Repo
Framework
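
The dynamic sampling idea lends itself to a compact sketch. Below, "best contribution" is assumed to mean highest current per-sample loss; the paper's exact criterion and curriculum schedule may differ.

```python
# A minimal sketch of loss-based dynamic sampling, assuming "best
# contribution" means highest current per-sample loss; the paper's exact
# criterion and curriculum schedule may differ.
import numpy as np

def dynamic_sample(losses, keep_frac=0.5):
    """Return indices of the hardest keep_frac of samples."""
    k = max(1, int(len(losses) * keep_frac))
    return np.argsort(losses)[-k:]  # indices of the k highest-loss samples

# Each epoch, score all samples cheaply, then train only on the selection.
losses = np.array([0.1, 2.3, 0.7, 1.5, 0.05])
print(dynamic_sample(losses, keep_frac=0.4))  # -> [3 1]
```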

Learning Non-Stationary Space-Time Models for Environmental Monitoring

Title Learning Non-Stationary Space-Time Models for Environmental Monitoring
Authors Sahil Garg, Amarjeet Singh, Fabio Ramos
Abstract One of the primary aspects of sustainable development involves accurate understanding and modeling of environmental phenomena. Many of these phenomena exhibit variations in both space and time, and it is imperative to develop a deeper understanding of techniques that can model space-time dynamics accurately. In this paper we propose NOSTILL-GP - NOn-stationary Space TIme variable Latent Length scale GP, a generic non-stationary, spatio-temporal Gaussian Process (GP) model. We present several strategies for efficient training of our model that are necessary for real-world applicability. Extensive empirical validation is performed using three real-world environmental monitoring datasets with diverse dynamics across space and time. Results from the experiments clearly demonstrate the general applicability and effectiveness of our approach for environmental monitoring applications.
Tasks
Published 2018-04-27
URL http://arxiv.org/abs/1804.10535v1
PDF http://arxiv.org/pdf/1804.10535v1.pdf
PWC https://paperswithcode.com/paper/learning-non-stationary-space-time-models-for
Repo
Framework
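
For intuition, a non-stationary kernel with an input-dependent length scale can be as simple as the 1-D Gibbs kernel sketched below. NOSTILL-GP's latent-length-scale parameterization is richer, so treat this only as an illustration of the non-stationarity.

```python
# Toy 1-D Gibbs kernel: a valid non-stationary covariance whose length
# scale ell(x) varies with the input. NOSTILL-GP's latent-length-scale
# construction is richer; this only illustrates the non-stationarity.
import numpy as np

def gibbs_kernel(x1, x2, ell, sigma=1.0):
    l1, l2 = ell(x1), ell(x2)
    prefactor = np.sqrt(2.0 * l1 * l2 / (l1**2 + l2**2))
    return sigma**2 * prefactor * np.exp(-(x1 - x2)**2 / (l1**2 + l2**2))

# Length scale grows away from the origin: wiggly near 0, smooth far out.
ell = lambda x: 0.5 + 0.3 * abs(x)
X = np.linspace(-2.0, 2.0, 5)
K = np.array([[gibbs_kernel(a, b, ell) for b in X] for a in X])
print(np.round(K, 3))  # symmetric positive semi-definite covariance matrix
```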

Syntax Helps ELMo Understand Semantics: Is Syntax Still Relevant in a Deep Neural Architecture for SRL?

Title Syntax Helps ELMo Understand Semantics: Is Syntax Still Relevant in a Deep Neural Architecture for SRL?
Authors Emma Strubell, Andrew McCallum
Abstract Do unsupervised methods for learning rich, contextualized token representations obviate the need for explicit modeling of linguistic structure in neural network models for semantic role labeling (SRL)? We address this question by incorporating the massively successful ELMo embeddings (Peters et al., 2018) into LISA (Strubell et al., 2018), a strong, linguistically-informed neural network architecture for SRL. In experiments on the CoNLL-2005 shared task we find that though ELMo out-performs typical word embeddings, beginning to close the gap in F1 between LISA with predicted and gold syntactic parses, syntactically-informed models still out-perform syntax-free models when both use ELMo, especially on out-of-domain data. Our results suggest that linguistic structures are indeed still relevant in this golden age of deep learning for NLP.
Tasks Semantic Role Labeling, Word Embeddings
Published 2018-11-12
URL http://arxiv.org/abs/1811.04773v1
PDF http://arxiv.org/pdf/1811.04773v1.pdf
PWC https://paperswithcode.com/paper/syntax-helps-elmo-understand-semantics-is
Repo
Framework

A Two-Stream Mutual Attention Network for Semi-supervised Biomedical Segmentation with Noisy Labels

Title A Two-Stream Mutual Attention Network for Semi-supervised Biomedical Segmentation with Noisy Labels
Authors Shaobo Min, Xuejin Chen, Zheng-Jun Zha, Feng Wu, Yongdong Zhang
Abstract Learning-based methods suffer from a deficiency of clean annotations, especially in biomedical segmentation. Although many semi-supervised methods have been proposed to provide extra training data, automatically generated labels are usually too noisy to retrain models effectively. In this paper, we propose a Two-Stream Mutual Attention Network (TSMAN) that weakens the influence of back-propagated gradients caused by incorrect labels, thereby rendering the network robust to unclean data. The proposed TSMAN consists of two sub-networks that are connected by three types of attention models in different layers. The target of each attention model is to indicate potentially incorrect gradients in a certain layer for both sub-networks by analyzing their inferred features for the same input. To achieve this, the attention models are designed based on a propagation analysis of noisy gradients at different layers. This allows the attention models to effectively discover incorrect labels and weaken their influence during the parameter updating process. By exchanging multi-level features within the two-stream architecture, the effects of noisy labels in each sub-network are reduced by decreasing the updating gradients. Furthermore, a hierarchical distillation is developed to provide more reliable pseudo labels for unlabeled data, which further boosts the performance of our retrained TSMAN. Experiments on both the HVSMR 2016 and BRATS 2015 benchmarks demonstrate that our semi-supervised learning framework surpasses state-of-the-art fully-supervised results.
Tasks
Published 2018-07-31
URL http://arxiv.org/abs/1807.11719v3
PDF http://arxiv.org/pdf/1807.11719v3.pdf
PWC https://paperswithcode.com/paper/a-two-stream-mutual-attention-network-for
Repo
Framework
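
As a rough illustration of weakening gradients from suspect labels, the sketch below down-weights the per-pixel loss where the two streams agree with each other but contradict the noisy label. This hand-crafted mask is only a stand-in for TSMAN's learned attention models.

```python
# Toy stand-in for the mutual attention idea: down-weight pixels where the
# two streams agree with each other but contradict the (possibly noisy)
# label. TSMAN learns this gating; the 0.1 factor here is arbitrary.
import torch
import torch.nn.functional as F

def mutual_masked_loss(logits_a, logits_b, noisy_labels):
    """logits_*: (N, C, H, W); noisy_labels: (N, H, W) int64."""
    agree = logits_a.argmax(1) == logits_b.argmax(1)
    suspect = agree & (logits_a.argmax(1) != noisy_labels)
    weight = 1.0 - 0.9 * suspect.float()   # weaken suspect gradients
    ce_a = F.cross_entropy(logits_a, noisy_labels, reduction="none")
    ce_b = F.cross_entropy(logits_b, noisy_labels, reduction="none")
    return ((ce_a + ce_b) * weight).mean()

logits_a, logits_b = torch.randn(2, 4, 8, 8), torch.randn(2, 4, 8, 8)
labels = torch.randint(0, 4, (2, 8, 8))
print(mutual_masked_loss(logits_a, logits_b, labels))
```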

Automatic Reference-Based Evaluation of Pronoun Translation Misses the Point

Title Automatic Reference-Based Evaluation of Pronoun Translation Misses the Point
Authors Liane Guillou, Christian Hardmeier
Abstract We compare the performance of the APT and AutoPRF metrics for pronoun translation against a manually annotated dataset comprising human judgements as to the correctness of translations of the PROTEST test suite. Although there is some correlation with the human judgements, a range of issues limit the performance of the automated metrics. Instead, we recommend the use of semi-automatic metrics and test suites in place of fully automatic metrics.
Tasks
Published 2018-08-13
URL http://arxiv.org/abs/1808.04164v1
PDF http://arxiv.org/pdf/1808.04164v1.pdf
PWC https://paperswithcode.com/paper/automatic-reference-based-evaluation-of
Repo
Framework

Loss Guided Activation for Action Recognition in Still Images

Title Loss Guided Activation for Action Recognition in Still Images
Authors Lu Liu, Robby T. Tan, Shaodi You
Abstract One significant problem of deep-learning based human action recognition is that it can be easily misled by the presence of irrelevant objects or backgrounds. Existing methods commonly address this problem by employing bounding boxes on the target humans as part of the input, in both training and testing stages. This requirement of bounding boxes as input enables the methods to ignore irrelevant contexts and extract only human features. However, we consider this solution inefficient, since the bounding boxes might not be available. Hence, instead of using a person bounding box as an input, we introduce a human-mask loss to automatically guide the activations of the feature maps to the target human who is performing the action, and hence suppress the activations of misleading contexts. We propose a multi-task deep learning method that jointly predicts the human action class and the human location heatmap. Extensive experiments demonstrate that our approach is more robust than the baseline methods in the presence of irrelevant misleading contexts. Our method achieves 94.06% and 40.65% (in terms of mAP) on the Stanford40 and MPII datasets respectively, which are 3.14% and 12.6% relative improvements over the best results reported in the literature, and thus sets new state-of-the-art results. Additionally, unlike some existing methods, we eliminate the requirement of using a person bounding box as an input during testing.
Tasks Action Recognition In Still Images, Temporal Action Localization
Published 2018-12-11
URL http://arxiv.org/abs/1812.04194v1
PDF http://arxiv.org/pdf/1812.04194v1.pdf
PWC https://paperswithcode.com/paper/loss-guided-activation-for-action-recognition
Repo
Framework
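
The joint objective can be sketched as a weighted sum of a classification term and a human-location heatmap term. The shapes, the sigmoid/MSE choice, and the weighting below are illustrative assumptions, not the paper's exact configuration.

```python
# A minimal sketch of the joint objective: action classification plus a
# human-location heatmap term that steers activations toward the person.
# Shapes, the sigmoid/MSE choice, and lam are illustrative assumptions.
import torch
import torch.nn.functional as F

def joint_loss(class_logits, pred_heatmap, labels, human_mask, lam=1.0):
    """class_logits: (N, C); pred_heatmap, human_mask: (N, 1, H, W)."""
    cls = F.cross_entropy(class_logits, labels)
    heat = F.mse_loss(torch.sigmoid(pred_heatmap), human_mask)
    return cls + lam * heat

logits = torch.randn(4, 40)                      # e.g. 40 action classes
heatmap = torch.randn(4, 1, 14, 14)
labels = torch.randint(0, 40, (4,))
mask = (torch.rand(4, 1, 14, 14) > 0.5).float()  # person-region supervision
print(joint_loss(logits, heatmap, labels, mask))
```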

Rational Neural Networks for Approximating Jump Discontinuities of Graph Convolution Operator

Title Rational Neural Networks for Approximating Jump Discontinuities of Graph Convolution Operator
Authors Zhiqian Chen, Feng Chen, Rongjie Lai, Xuchao Zhang, Chang-Tien Lu
Abstract For node-level graph encoding, an important recent state-of-the-art method is the graph convolutional network (GCN), which nicely integrates local vertex features and graph topology in the spectral domain. However, current studies suffer from several drawbacks: (1) graph CNNs rely on Chebyshev polynomial approximation, which results in oscillatory approximations at jump discontinuities; (2) increasing the order of the Chebyshev polynomial can reduce the oscillations, but also incurs unaffordable computational cost; (3) Chebyshev polynomials require degree $\Omega(\mathrm{poly}(1/\epsilon))$ to approximate a jump signal such as $x$, while rational functions only need $\mathcal{O}(\mathrm{poly}\log(1/\epsilon))$ (Liang and Srikant, 2016; Telgarsky, 2017). However, it is non-trivial to apply rational approximation without increasing computational complexity, due to the denominator. In this paper, the superiority of rational approximation is exploited for graph signal recovery. RationalNet is proposed to integrate rational functions and neural networks. We show that a rational function of the eigenvalues can be rewritten as a function of the graph Laplacian, which avoids multiplication by the eigenvector matrix. Focusing on the analysis of approximation in the graph convolution operation, a graph signal regression task is formulated, for which the time complexity can be significantly reduced by the graph Fourier transform. To overcome the local-minimum problem of neural network models, a relaxed Remez algorithm is utilized to initialize the weight parameters. The convergence rates of RationalNet and polynomial-based methods on jump signals are analyzed for a theoretical guarantee. Extensive experimental results demonstrate that our approach effectively characterizes jump discontinuities, outperforming competing methods by a substantial margin on both synthetic and real-world graphs.
Tasks
Published 2018-08-30
URL http://arxiv.org/abs/1808.10073v1
PDF http://arxiv.org/pdf/1808.10073v1.pdf
PWC https://paperswithcode.com/paper/rational-neural-networks-for-approximating
Repo
Framework
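
The trick of applying a rational filter $P(L)Q(L)^{-1}$ without eigendecomposition reduces to evaluating two polynomials in the Laplacian and solving a linear system. A minimal sketch with arbitrary (not learned) coefficients:

```python
# Sketch: apply a rational filter Q(L)^{-1} P(L) to a graph signal without
# eigendecomposition, by evaluating polynomials in L and solving a linear
# system. Coefficients are arbitrary here; RationalNet learns them.
import numpy as np

def rational_filter(L, x, p_coeffs, q_coeffs):
    """Coefficients are low-to-high degree; returns Q(L)^{-1} P(L) x."""
    def poly(coeffs):
        M, power = np.zeros_like(L), np.eye(L.shape[0])
        for c in coeffs:
            M += c * power
            power = power @ L
        return M
    return np.linalg.solve(poly(q_coeffs), poly(p_coeffs) @ x)

# Tiny path graph: L = D - A.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
x = np.array([1.0, 0.0, -1.0])                 # a signal with a "jump"
print(rational_filter(L, x, p_coeffs=[1.0, 0.5], q_coeffs=[1.0, 0.1]))
```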

Understanding Patch-Based Learning by Explaining Predictions

Title Understanding Patch-Based Learning by Explaining Predictions
Authors Christopher Anders, Grégoire Montavon, Wojciech Samek, Klaus-Robert Müller
Abstract Deep networks are able to learn highly predictive models of video data. Due to video length, a common strategy is to train them on small video snippets. We apply the deep Taylor / LRP technique to understand the deep network’s classification decisions, and identify a “border effect”: a tendency of the classifier to look mainly at the bordering frames of the input. This effect relates to the step size used to build the video snippet, which we can then tune in order to improve the classifier’s accuracy without retraining the model. To our knowledge, this is the first work to apply the deep Taylor / LRP technique to a video-analyzing neural network.
Tasks
Published 2018-06-11
URL http://arxiv.org/abs/1806.06926v1
PDF http://arxiv.org/pdf/1806.06926v1.pdf
PWC https://paperswithcode.com/paper/understanding-patch-based-learning-by
Repo
Framework
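
For readers unfamiliar with LRP, the epsilon-rule for a single dense layer captures the flavor of the relevance propagation used to expose the border effect; real video networks apply such rules backwards through every layer.

```python
# The LRP epsilon-rule for one dense layer z = W x + b, the kind of
# relevance redistribution used to surface the border effect. Real video
# networks apply such rules backwards through every layer.
import numpy as np

def lrp_epsilon(x, W, b, relevance_out, eps=1e-6):
    z = W @ x + b                        # forward pre-activations
    s = relevance_out / (z + eps * np.sign(z))
    return x * (W.T @ s)                 # relevance attributed to inputs

rng = np.random.default_rng(1)
x, W, b = rng.normal(size=4), rng.normal(size=(3, 4)), rng.normal(size=3)
R_out = np.array([0.2, 0.5, 0.3])        # relevance arriving from above
R_in = lrp_epsilon(x, W, b, R_out)
print(R_in, R_in.sum())                  # sum is roughly conserved
```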

Smooth input preparation for quantum and quantum-inspired machine learning

Title Smooth input preparation for quantum and quantum-inspired machine learning
Authors Zhikuan Zhao, Jack K. Fitzsimons, Patrick Rebentrost, Vedran Dunjko, Joseph F. Fitzsimons
Abstract Machine learning has recently emerged as a fruitful area for finding potential quantum computational advantage. Many of the quantum-enhanced machine learning algorithms critically hinge upon the ability to efficiently produce states proportional to high-dimensional data points stored in a quantum accessible memory. Even given query access to exponentially many entries stored in a database, the construction of which is considered a one-off overhead, it has been argued that the cost of preparing such amplitude-encoded states may offset any exponential quantum advantage. Here we prove, using smoothed analysis, that if the data-analysis algorithm is robust against small entry-wise input perturbations, state preparation can always be achieved with a constant number of queries. This criterion is typically satisfied in realistic machine learning applications, where input data is subject to moderate noise. Our results are equally applicable to the recent seminal progress in quantum-inspired algorithms, where specially constructed databases suffice for polylogarithmic classical algorithms in low-rank cases. The consequence of our finding is that, for the purpose of practical machine learning, polylogarithmic processing time is possible under a general and flexible input model with quantum algorithms or, in the low-rank cases, quantum-inspired classical algorithms.
Tasks
Published 2018-04-01
URL https://arxiv.org/abs/1804.00281v2
PDF https://arxiv.org/pdf/1804.00281v2.pdf
PWC https://paperswithcode.com/paper/a-note-on-state-preparation-for-quantum
Repo
Framework

Physics-Driven Regularization of Deep Neural Networks for Enhanced Engineering Design and Analysis

Title Physics-Driven Regularization of Deep Neural Networks for Enhanced Engineering Design and Analysis
Authors Mohammad Amin Nabian, Hadi Meidani
Abstract In this paper, we introduce a physics-driven regularization method for training of deep neural networks (DNNs) for use in engineering design and analysis problems. In particular, we focus on prediction of a physical system, for which in addition to training data, partial or complete information on a set of governing laws is also available. These laws often appear in the form of differential equations, derived from first principles, empirically-validated laws, or domain expertise, and are usually neglected in data-driven prediction of engineering systems. We propose a training approach that utilizes the known governing laws and regularizes data-driven DNN models by penalizing divergence from those laws. The first two numerical examples are synthetic examples, where we show that in constructing a DNN model that best fits the measurements from a physical system, the use of our proposed regularization results in DNNs that are more interpretable with smaller generalization errors, compared to other common regularization methods. The last two examples concern metamodeling for a random Burgers’ system and for aerodynamic analysis of passenger vehicles, where we demonstrate that the proposed regularization provides superior generalization accuracy compared to other common alternatives.
Tasks
Published 2018-10-11
URL https://arxiv.org/abs/1810.05547v2
PDF https://arxiv.org/pdf/1810.05547v2.pdf
PWC https://paperswithcode.com/paper/physics-informed-regularization-of-deep
Repo
Framework
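
The physics-driven penalty is straightforward to prototype with automatic differentiation. Below is a minimal sketch for the toy law $u'(x) = -u(x)$, assuming a simple residual-squared regularizer added to the data misfit; the paper's examples involve richer governing equations and metamodels.

```python
# A minimal sketch of the physics-driven penalty for the toy law
# u'(x) = -u(x): data misfit plus the squared residual of the governing
# equation at collocation points. Architecture and weights are arbitrary.
import torch

net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 1))

def physics_residual(x):
    x = x.requires_grad_(True)
    u = net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    return du + u                        # residual of u' + u = 0

x_data = torch.rand(16, 1)
y_data = torch.exp(-x_data)              # noiseless measurements of exp(-x)
x_col = torch.rand(64, 1)                # collocation points for the law

loss = torch.mean((net(x_data) - y_data) ** 2) \
     + 0.1 * torch.mean(physics_residual(x_col) ** 2)
loss.backward()                          # plug into any optimizer from here
print(float(loss))
```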

Nonconvex Demixing From Bilinear Measurements

Title Nonconvex Demixing From Bilinear Measurements
Authors Jialin Dong, Yuanming Shi
Abstract We consider the problem of demixing a sequence of source signals from the sum of noisy bilinear measurements. It is a generalized mathematical model for blind demixing with blind deconvolution, which is prevalent across the areas of dictionary learning, image processing, and communications. However, state-of-the-art convex methods for blind demixing via semidefinite programming are computationally infeasible for large-scale problems. Although existing nonconvex algorithms are able to address the scaling issue, they normally require proper regularization to establish optimality guarantees. The additional regularization yields tedious algorithmic parameters and pessimistic convergence rates with conservative step sizes. To address the limitations of existing methods, we develop a provable nonconvex demixing procedure via Wirtinger flow, much like vanilla gradient descent, to harness the benefits of a regularization-free fast convergence rate with aggressive step sizes and computational optimality guarantees. This is achieved by exploiting the benign geometry of the blind demixing problem, thereby revealing that Wirtinger flow keeps the regularization-free iterates in a region of strong convexity and qualified smoothness, where the step size can be chosen aggressively.
Tasks Dictionary Learning
Published 2018-09-18
URL http://arxiv.org/abs/1809.06796v2
PDF http://arxiv.org/pdf/1809.06796v2.pdf
PWC https://paperswithcode.com/paper/nonconvex-demixing-from-bilinear-measurements
Repo
Framework
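
To see why Wirtinger flow behaves "much like vanilla gradient descent", consider a real-valued toy analogue of bilinear demixing solved with plain, regularization-free gradient steps. The paper works over complex variables with Wirtinger derivatives and a careful initialization; this sketch only mirrors the structure.

```python
# Real-valued toy analogue of regularization-free gradient descent for
# bilinear demixing. The paper uses Wirtinger flow over complex variables
# with spectral initialization; this sketch only mirrors the structure.
import numpy as np

rng = np.random.default_rng(2)
s, n, m = 2, 4, 60                        # sources, dimension, measurements
A, B = rng.normal(size=(m, s, n)), rng.normal(size=(m, s, n))
h_true, x_true = rng.normal(size=(s, n)), rng.normal(size=(s, n))
y = (np.einsum('msn,sn->ms', A, h_true)
     * np.einsum('msn,sn->ms', B, x_true)).sum(axis=1)

h, x = 0.1 * rng.normal(size=(s, n)), 0.1 * rng.normal(size=(s, n))
for _ in range(2000):                     # vanilla gradient steps
    ah = np.einsum('msn,sn->ms', A, h)    # a_{ij}^T h_i, shape (m, s)
    bx = np.einsum('msn,sn->ms', B, x)
    r = (ah * bx).sum(axis=1) - y         # measurement residuals
    h -= 0.05 * 2 / m * np.einsum('m,ms,msn->sn', r, bx, A)
    x -= 0.05 * 2 / m * np.einsum('m,ms,msn->sn', r, ah, B)

pred = (np.einsum('msn,sn->ms', A, h) * np.einsum('msn,sn->ms', B, x)).sum(1)
print(np.abs(pred - y).max())             # residual should shrink toward 0
```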

Learning to Discover, Ground and Use Words with Segmental Neural Language Models

Title Learning to Discover, Ground and Use Words with Segmental Neural Language Models
Authors Kazuya Kawakami, Chris Dyer, Phil Blunsom
Abstract We propose a segmental neural language model that combines the generalization power of neural networks with the ability to discover word-like units that are latent in unsegmented character sequences. In contrast to previous segmentation models that treat word segmentation as an isolated task, our model unifies word discovery, learning how words fit together to form sentences, and, by conditioning the model on visual context, how words’ meanings ground in representations of non-linguistic modalities. Experiments show that the unconditional model learns predictive distributions better than character LSTM models, discovers words competitively with nonparametric Bayesian word segmentation models, and that modeling language conditional on visual context improves performance on both.
Tasks Language Modelling
Published 2018-11-23
URL https://arxiv.org/abs/1811.09353v2
PDF https://arxiv.org/pdf/1811.09353v2.pdf
PWC https://paperswithcode.com/paper/unsupervised-word-discovery-with-segmental
Repo
Framework
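
Marginalizing over all segmentations of a character sequence is a standard forward recursion. The sketch below uses an arbitrary toy segment scorer; the paper scores segments with neural character-level models and a lexical memory.

```python
# A skeletal forward recursion marginalizing over all segmentations of a
# character string. The toy scorer below is arbitrary; the paper scores
# segments with neural character models and a lexical memory.
import math

def log_marginal(chars, seg_logprob, max_len=4):
    """alpha[t] = logsumexp over positions of the last segment boundary."""
    T = len(chars)
    alpha = [0.0] + [-math.inf] * T          # alpha[0] = log 1
    for t in range(1, T + 1):
        terms = [alpha[s] + seg_logprob(chars[s:t])
                 for s in range(max(0, t - max_len), t)]
        top = max(terms)
        alpha[t] = top + math.log(sum(math.exp(v - top) for v in terms))
    return alpha[T]

toy_scorer = lambda seg: -1.5 * len(seg)     # shorter segments are likelier
print(log_marginal("iliketea", toy_scorer))
```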

Differentially Private Contextual Linear Bandits

Title Differentially Private Contextual Linear Bandits
Authors Roshan Shariff, Or Sheffet
Abstract We study the contextual linear bandit problem, a version of the standard stochastic multi-armed bandit (MAB) problem where a learner sequentially selects actions to maximize a reward which depends also on a user provided per-round context. Though the context is chosen arbitrarily or adversarially, the reward is assumed to be a stochastic function of a feature vector that encodes the context and selected action. Our goal is to devise private learners for the contextual linear bandit problem. We first show that using the standard definition of differential privacy results in linear regret. So instead, we adopt the notion of joint differential privacy, where we assume that the action chosen on day $t$ is only revealed to user $t$ and thus needn’t be kept private that day, only on following days. We give a general scheme converting the classic linear-UCB algorithm into a joint differentially private algorithm using the tree-based algorithm. We then apply either Gaussian noise or Wishart noise to achieve joint-differentially private algorithms and bound the resulting algorithms’ regrets. In addition, we give the first lower bound on the additional regret any private algorithms for the MAB problem must incur.
Tasks
Published 2018-09-28
URL http://arxiv.org/abs/1810.00068v1
PDF http://arxiv.org/pdf/1810.00068v1.pdf
PWC https://paperswithcode.com/paper/differentially-private-contextual-linear
Repo
Framework
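
The tree-based algorithm mentioned in the abstract releases private running sums by noising dyadic blocks, so each prefix sum touches only O(log T) noisy nodes. Below is a minimal offline sketch with an illustrative, uncalibrated Gaussian noise scale; the paper feeds such noisy sums of the Gram matrix and rewards into LinUCB.

```python
# Offline sketch of the tree-based (binary) mechanism: noise each dyadic
# block once, then assemble any prefix sum from O(log T) noisy blocks.
# sigma is an uncalibrated stand-in for the privacy-derived noise scale.
import numpy as np

def private_prefix_sums(items, sigma, seed=0):
    """items: (T, d) array. Returns (T, d) noisy prefix sums."""
    rng = np.random.default_rng(seed)
    T, d = items.shape
    levels = int(np.ceil(np.log2(T))) + 1
    noisy = []                               # noisy[l][i]: block i at level l
    for l in range(levels):
        size = 2 ** l
        noisy.append([items[i * size:(i + 1) * size].sum(axis=0)
                      + rng.normal(0, sigma, d)
                      for i in range((T + size - 1) // size)])
    out = np.zeros((T, d))
    for t in range(1, T + 1):                # decompose [1, t] dyadically
        total, pos = np.zeros(d), 0
        for l in reversed(range(levels)):
            size = 2 ** l
            if pos + size <= t:
                total += noisy[l][pos // size]
                pos += size
        out[t - 1] = total
    return out

print(np.round(private_prefix_sums(np.ones((8, 2)), sigma=0.1), 2))
```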

Contrastive Training for Models of Information Cascades

Title Contrastive Training for Models of Information Cascades
Authors Shaobin Xu, David A. Smith
Abstract This paper proposes a model of information cascades as directed spanning trees (DSTs) over observed documents. In addition, we propose a contrastive training procedure that exploits partial temporal ordering of node infections in lieu of labeled training links. This combination of model and unsupervised training makes it possible to improve on models that use infection times alone and to exploit arbitrary features of the nodes and of the text content of messages in information cascades. With only basic node and time lag features similar to previous models, the DST model achieves performance with unsupervised training comparable to strong baselines on a blog network inference task. Unsupervised training with additional content features achieves significantly better results, reaching half the accuracy of a fully supervised model.
Tasks
Published 2018-12-11
URL http://arxiv.org/abs/1812.04677v1
PDF http://arxiv.org/pdf/1812.04677v1.pdf
PWC https://paperswithcode.com/paper/contrastive-training-for-models-of
Repo
Framework
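
The contrastive idea, using temporal order in place of labeled links, can be sketched with a linear edge scorer: edges consistent with infection times should outscore time-violating alternatives. Everything below is an illustrative toy, not the paper's feature set.

```python
# Toy contrastive objective over candidate cascade edges with a linear
# scorer: one temporally consistent edge versus edges that violate the
# infection order. Features and the scorer are illustrative assumptions.
import numpy as np

def contrastive_loss(w, pos_feats, neg_feats):
    """-log p(consistent edge) under a softmax over one positive + negatives."""
    scores = np.concatenate(([pos_feats @ w], neg_feats @ w))
    scores -= scores.max()                   # stabilize the softmax
    return -scores[0] + np.log(np.exp(scores).sum())

rng = np.random.default_rng(3)
w = rng.normal(size=5)                       # edge-scorer weights
pos = rng.normal(size=5)                     # an order-respecting edge
negs = rng.normal(size=(6, 5))               # order-violating contrast edges
print(contrastive_loss(w, pos, negs))        # minimize w.r.t. w to train
```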