April 2, 2020


Paper Group ANR 295


SA vs SAA for population Wasserstein barycenter calculation. TIES: Temporal Interaction Embeddings For Enhancing Social Media Integrity At Facebook. FD-GAN: Generative Adversarial Networks with Fusion-discriminator for Single Image Dehazing. Infodemiological Study Using Google Trends on Coronavirus Epidemic in Wuhan, China. Relation Embedding for P …

SA vs SAA for population Wasserstein barycenter calculation

Title SA vs SAA for population Wasserstein barycenter calculation
Authors Darina Dvinskikh
Abstract In the machine learning and optimization communities there are two main approaches to the convex risk minimization problem. The first is Stochastic Approximation (SA), which is online; the second is Sample Average Approximation (SAA), also known as Monte Carlo or Empirical Risk Minimization, which is offline and requires proper regularization in the non-strongly convex case. It is currently known that both approaches are equivalent on average (up to a logarithmic factor) in terms of oracle complexity, that is, the required number of stochastic gradient evaluations. What about total complexity? The answer depends on the specific problem. However, since the work of [Nemirovski et al. (2009)] it has been generally accepted that SA is better than SAA. Nevertheless, for large-scale problems SA may run into memory problems, since storing all the data on one machine and organizing online access to it can be impossible without communication with other machines. Unlike SA, SAA allows parallel/distributed computation. In this paper we show that SAA may outperform SA in the problem of estimating the population ($\mu$-entropy-regularized) Wasserstein barycenter, even in a non-parallel (non-decentralized) setup.
Published 2020-01-21
URL https://arxiv.org/abs/2001.07697v2
PDF https://arxiv.org/pdf/2001.07697v2.pdf
PWC https://paperswithcode.com/paper/sa-vs-saa-for-population-wasserstein
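To make the SA/SAA distinction concrete, here is a toy sketch on a one-dimensional quadratic risk $f(x) = \mathbb{E}[(x - \xi)^2]/2$; this is an illustrative stand-in, not the paper's entropy-regularized barycenter problem. SA streams the samples and takes one stochastic-gradient step per sample with O(1) memory, while SAA stores the whole sample and minimizes the empirical average. The step size $1/k$ and the Gaussian data are assumptions, chosen so that both estimates coincide exactly on this problem.

```python
import numpy as np

def sa_estimate(samples):
    """Stochastic Approximation: one SGD step per streamed sample, O(1) memory."""
    x = 0.0
    for k, xi in enumerate(samples, start=1):
        grad = x - xi        # gradient of 0.5 * (x - xi)**2 at the current x
        x -= grad / k        # step size 1/k turns the iterate into a running mean
    return float(x)

def saa_estimate(samples):
    """Sample Average Approximation: minimize the empirical risk over stored samples."""
    data = np.asarray(samples)   # the whole sample has to be held in memory
    # argmin_x (1/n) * sum_i 0.5 * (x - xi_i)**2 is the sample mean
    return float(data.mean())

rng = np.random.default_rng(0)
samples = rng.normal(loc=3.0, scale=1.0, size=10_000)
print(sa_estimate(samples), saa_estimate(samples))
```

On this toy problem the two estimates agree; the trade-offs discussed in the abstract (memory, parallelism, total cost) only become visible on harder problems such as the barycenter one.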

TIES: Temporal Interaction Embeddings For Enhancing Social Media Integrity At Facebook

Title TIES: Temporal Interaction Embeddings For Enhancing Social Media Integrity At Facebook
Authors Nima Noorshams, Saurabh Verma, Aude Hofleitner
Abstract Since its inception, Facebook has become an integral part of the online social community. People rely on Facebook to make connections with others and build communities. As a result, it is paramount to protect the integrity of such a rapidly growing network in a fast and scalable manner. In this paper, we present our efforts to protect various social media entities at Facebook from people who try to abuse our platform. We present a novel Temporal Interaction EmbeddingS (TIES) model that is designed to capture rogue social interactions and flag them for further action. TIES is a supervised, deep-learning, production-ready model for Facebook-scale networks. Prior work on integrity problems mostly focuses on capturing either only static or only certain dynamic features of social entities. In contrast, TIES captures both kinds of behavior in a unified model, building on recent advances in graph embedding and deep sequential pattern learning. To show the real-world impact of TIES, we present several applications, in particular preventing the spread of misinformation, detecting fake accounts, and reducing ads payment risks, all aimed at enhancing the platform’s integrity.
Tasks Graph Embedding
Published 2020-02-18
URL https://arxiv.org/abs/2002.07917v1
PDF https://arxiv.org/pdf/2002.07917v1.pdf
PWC https://paperswithcode.com/paper/ties-temporal-interaction-embeddings-for
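A minimal sketch of the general idea of feeding static entity embeddings into a sequential model over time-ordered interactions. Everything here is an assumption for illustration, not the TIES architecture: the random embedding tables stand in for pre-trained graph embeddings, the hypothetical helpers `interaction_features` and `score_sequence` are invented names, and a tiny untrained Elman RNN with a logistic output replaces a trained deep sequence model.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_ENTITIES, NUM_TYPES, DIM, HIDDEN = 100, 5, 8, 16

# Hypothetical pre-trained embeddings: entities (static graph features) and interaction types.
entity_emb = rng.normal(size=(NUM_ENTITIES, DIM))
type_emb = rng.normal(size=(NUM_TYPES, DIM))

# Hypothetical, untrained weights of a tiny Elman RNN and a logistic output layer.
W_in = rng.normal(scale=0.1, size=(HIDDEN, 3 * DIM))
W_h = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))
w_out = rng.normal(scale=0.1, size=HIDDEN)

def interaction_features(source, target, itype):
    """Represent one interaction as the concatenation [source; target; type]."""
    return np.concatenate([entity_emb[source], entity_emb[target], type_emb[itype]])

def score_sequence(interactions):
    """Run the RNN over time-ordered (source, target, type) triples; return P(rogue)."""
    h = np.zeros(HIDDEN)
    for source, target, itype in interactions:
        h = np.tanh(W_in @ interaction_features(source, target, itype) + W_h @ h)
    return float(1.0 / (1.0 + np.exp(-(w_out @ h))))

history = [(3, 17, 0), (3, 42, 2), (3, 8, 2)]   # entity 3's interactions, oldest first
print(score_sequence(history))
```

In a real system the weights would be learned from labeled entities; the sketch only shows how static embeddings and temporal order can enter one model.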

FD-GAN: Generative Adversarial Networks with Fusion-discriminator for Single Image Dehazing

Title FD-GAN: Generative Adversarial Networks with Fusion-discriminator for Single Image Dehazing
Authors Yu Dong, Yihao Liu, He Zhang, Shifeng Chen, Yu Qiao
Abstract Recently, convolutional neural networks (CNNs) have achieved great improvements in single image dehazing and attracted much research attention. Most existing learning-based dehazing methods are not fully end-to-end and still follow the traditional dehazing procedure: first estimate the medium transmission and the atmospheric light, then recover the haze-free image based on the atmospheric scattering model. However, in practice, due to a lack of priors and constraints, it is hard to estimate these intermediate parameters precisely. Inaccurate estimation further degrades dehazing performance, resulting in artifacts, color distortion and insufficient haze removal. To address this, we propose a fully end-to-end Generative Adversarial Network with a Fusion-discriminator (FD-GAN) for image dehazing. With the proposed Fusion-discriminator, which takes frequency information as additional priors, our model can generate more natural and realistic dehazed images with less color distortion and fewer artifacts. Moreover, we synthesize a large-scale training dataset including various indoor and outdoor hazy images to boost performance, and we show that for learning-based dehazing methods, performance is strongly influenced by the training data. Experiments show that our method achieves state-of-the-art performance on both public synthetic datasets and real-world images, with more visually pleasing dehazed results.
Tasks Image Dehazing, Single Image Dehazing
Published 2020-01-20
URL https://arxiv.org/abs/2001.06968v1
PDF https://arxiv.org/pdf/2001.06968v1.pdf
PWC https://paperswithcode.com/paper/fd-gan-generative-adversarial-networks-with
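The traditional two-step pipeline that FD-GAN moves away from relies on the atmospheric scattering model $I(x) = J(x)t(x) + A(1 - t(x))$, where $I$ is the hazy image, $J$ the haze-free scene, $t$ the transmission and $A$ the atmospheric light. The sketch below shows the recovery step, assuming $t$ and $A$ have already been estimated; the lower bound `t_min` is a common safeguard rather than something the paper specifies. Because $J$ depends on $1/t$, an error in the estimated transmission is amplified in the output, which is the weakness the abstract points to.

```python
import numpy as np

def synthesize_haze(J, t, A):
    """Atmospheric scattering model: I = J * t + A * (1 - t)."""
    return J * t + A * (1.0 - t)

def recover_scene(I, t, A, t_min=0.1):
    """Invert the scattering model given estimates of transmission t and light A."""
    t = np.maximum(t, t_min)        # avoid dividing by near-zero transmission
    return (I - A) / t + A

rng = np.random.default_rng(0)
J = rng.uniform(0.0, 1.0, size=(4, 4, 3))    # toy RGB haze-free image
t = np.full((4, 4, 1), 0.6)                  # uniform transmission map
A = np.array([0.9, 0.9, 0.9])                # atmospheric light per channel
I = synthesize_haze(J, t, A)

exact = recover_scene(I, t, A)               # correct estimates of t and A
biased = recover_scene(I, 0.8 * t, A)        # transmission underestimated by 20%
print(np.abs(exact - J).max(), np.abs(biased - J).max())
```

With the exact parameters the scene is recovered up to floating-point error, while the biased transmission scales every pixel's deviation from $A$ by 1.25.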

Infodemiological Study Using Google Trends on Coronavirus Epidemic in Wuhan, China

Title Infodemiological Study Using Google Trends on Coronavirus Epidemic in Wuhan, China
Authors Artur Strzelecki
Abstract The recent emergence of a new coronavirus (2019-nCoV) has received extensive coverage in public media and worldwide news. It has caused viral pneumonia in thousands of people in Wuhan, a city in central China. This short communication gives a brief introduction to how the demand for information on this new epidemic is reflected in Google Trends. The author draws conclusions on current infodemiological data on 2019-nCoV using three main search queries: coronavirus, SARS and MERS. Two perspectives are considered: a worldwide perspective and a Chinese perspective. The Chinese perspective reveals that in China, in the early days, the disease was more often associated with SARS than with coronaviruses in general, whereas worldwide it has been associated with coronaviruses from the beginning.
Published 2020-01-29
URL https://arxiv.org/abs/2001.11021v1
PDF https://arxiv.org/pdf/2001.11021v1.pdf
PWC https://paperswithcode.com/paper/infodemiological-study-using-google-trends-on

Relation Embedding for Personalised POI Recommendation

Title Relation Embedding for Personalised POI Recommendation
Authors Xianjing Wang, Flora D. Salim, Yongli Ren, Piotr Koniusz
Abstract Point-of-Interest (POI) recommendation is one of the most important location-based services, helping people discover interesting venues or services. However, the extreme sparsity of the user-POI matrix and the varying spatio-temporal context pose challenges for POI systems that affect the quality of POI recommendations. To this end, we propose a translation-based relation embedding for POI recommendation. Our approach effectively encodes temporal and geographic information, as well as semantic content, in a low-dimensional relation space using Knowledge Graph Embedding techniques. To further alleviate the sparsity of the user-POI matrix, a combined matrix factorization framework is built on a user-POI graph to enhance the inference of dynamic personal interests by exploiting side information. Experiments on two real-world datasets demonstrate the effectiveness of our proposed model.
Tasks Graph Embedding, Knowledge Graph Embedding
Published 2020-02-09
URL https://arxiv.org/abs/2002.03461v2
PDF https://arxiv.org/pdf/2002.03461v2.pdf
PWC https://paperswithcode.com/paper/relation-embedding-for-personalised-poi
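Translation-based knowledge graph embeddings in the TransE family score a triple $(h, r, t)$ by how closely $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$ holds. A minimal sketch, assuming a TransE-style score in which the head is a user, the relation encodes a hypothetical spatio-temporal context (for example "weekday evening, downtown"), and the tail is a POI; the paper's exact relation construction and its matrix-factorization component are not reproduced, and all vectors and names are illustrative.

```python
import numpy as np

def transe_score(user_vec, relation_vec, poi_vec):
    """Higher is better: negative L2 distance between (user + relation) and the POI."""
    return float(-np.linalg.norm(user_vec + relation_vec - poi_vec))

def recommend(user_vec, relation_vec, poi_matrix, k=2):
    """Score every POI (one per row) for one user in one context; return top-k indices."""
    scores = -np.linalg.norm(user_vec + relation_vec - poi_matrix, axis=1)
    return np.argsort(-scores)[:k]

user = np.array([0.1, 0.2])
context = np.array([0.5, -0.1])    # hypothetical context relation vector
pois = np.array([[0.6, 0.1],       # POI 0: lies exactly at user + context
                 [2.0, 2.0],       # POI 1: far away in embedding space
                 [0.5, 0.2]])      # POI 2: close to user + context
print(recommend(user, context, pois))   # → [0 2]
```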

Controlling generative models with continuous factors of variations

Title Controlling generative models with continuous factors of variations
Authors Antoine Plumerault, Hervé Le Borgne, Céline Hudelot
Abstract Recent deep generative models can produce photo-realistic images as well as visual or textual content embeddings useful for various computer vision and natural language processing tasks. Their usefulness is nevertheless often limited by the lack of control over the generative process or the poor understanding of the learned representation. To overcome these major issues, very recent work has shown the value of studying the semantics of the latent space of generative models. In this paper, we advance the interpretability of the latent space of generative models by introducing a new method to find meaningful directions in the latent space of any generative model, along which we can move to precisely control specific properties of the generated image, such as the position or scale of the object in the image. Our method does not require human annotations and is particularly well suited to finding directions that encode simple transformations of the generated image, such as translation, zoom or color variations. We demonstrate the effectiveness of our method qualitatively and quantitatively, for both GANs and variational auto-encoders.
Published 2020-01-28
URL https://arxiv.org/abs/2001.10238v1
PDF https://arxiv.org/pdf/2001.10238v1.pdf
PWC https://paperswithcode.com/paper/controlling-generative-models-with-continuous-1
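A simplified sketch of the underlying idea, not the paper's algorithm: suppose we already had latent codes $z_i$ and $z_{T,i}$ whose generated images differ by a transformation of known magnitude $t_i$ (for example a zoom factor). A linear direction $d$ can then be fitted by least squares, $\min_d \sum_i \lVert z_{T,i} - z_i - t_i d \rVert^2$, whose solution is $d = \sum_i t_i (z_{T,i} - z_i) / \sum_i t_i^2$, and moving along $z + \alpha d$ changes the property. The synthetic latent codes and helper names below are assumptions; in the paper, the transformed codes are obtained from the generator itself.

```python
import numpy as np

def fit_direction(z, z_transformed, t):
    """Least-squares direction d minimizing sum_i ||z_T_i - z_i - t_i * d||^2."""
    deltas = z_transformed - z                          # shape (n, latent_dim)
    return (t[:, None] * deltas).sum(axis=0) / np.sum(t ** 2)

def edit(z, d, alpha):
    """Move a latent code along the direction to change the controlled property."""
    return z + alpha * d

rng = np.random.default_rng(0)
latent_dim, n = 16, 200
true_d = rng.normal(size=latent_dim)                    # synthetic ground-truth direction
z = rng.normal(size=(n, latent_dim))
t = rng.uniform(-1.0, 1.0, size=n)                      # known transformation magnitudes
z_T = z + t[:, None] * true_d + 0.01 * rng.normal(size=(n, latent_dim))

d_hat = fit_direction(z, z_T, t)
cosine = d_hat @ true_d / (np.linalg.norm(d_hat) * np.linalg.norm(true_d))
print(round(float(cosine), 3))
```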

Better Theory for SGD in the Nonconvex World

Title Better Theory for SGD in the Nonconvex World
Authors Ahmed Khaled, Peter Richtárik
Abstract Large-scale nonconvex optimization problems are ubiquitous in modern machine learning, and among practitioners interested in solving them, Stochastic Gradient Descent (SGD) reigns supreme. We revisit the analysis of SGD in the nonconvex setting and propose a new variant of the recently introduced expected smoothness assumption, which governs the behaviour of the second moment of the stochastic gradient. We show that our assumption is both more general and more reasonable than the assumptions made in all prior work. Moreover, our results yield the optimal $\mathcal{O}(\varepsilon^{-4})$ rate for finding a stationary point of nonconvex smooth functions, and recover the optimal $\mathcal{O}(\varepsilon^{-1})$ rate for finding a global solution if the Polyak-Łojasiewicz condition is satisfied. We compare against convergence rates under convexity and prove a theorem on the convergence of SGD under Quadratic Functional Growth and convexity, which might be of independent interest. In addition, we perform our analysis in a framework that allows a detailed study of the effects of a wide array of sampling strategies and minibatch sizes for finite-sum optimization problems. We corroborate our theoretical results with experiments on real and synthetic data.
Published 2020-02-09
URL https://arxiv.org/abs/2002.03329v2
PDF https://arxiv.org/pdf/2002.03329v2.pdf
PWC https://paperswithcode.com/paper/better-theory-for-sgd-in-the-nonconvex-world
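For reference, the two conditions the abstract mentions can be written as follows, with $g(x)$ the stochastic gradient, $f^{\inf}$ a lower bound on $f$, $f^*$ its minimum value, and $A, B, C \ge 0$, $\mu > 0$ constants. This is a summary of the "ABC" form of expected smoothness and of the standard Polyak-Łojasiewicz inequality; the paper should be consulted for the precise statements and constants.

```latex
\mathbb{E}\big[\lVert g(x)\rVert^2\big] \le 2A\big(f(x) - f^{\inf}\big) + B\,\lVert \nabla f(x)\rVert^2 + C
\qquad \text{(expected smoothness, ABC form)}

\tfrac{1}{2}\,\lVert \nabla f(x)\rVert^2 \ge \mu\big(f(x) - f^{*}\big) \quad \text{for all } x
\qquad \text{(Polyak-Łojasiewicz condition)}
```

The familiar bounded-variance assumption $\mathbb{E}\lVert g(x) - \nabla f(x)\rVert^2 \le \sigma^2$ is the special case $A = 0$, $B = 1$, $C = \sigma^2$, since $\mathbb{E}\lVert g(x)\rVert^2 = \lVert \nabla f(x)\rVert^2 + \mathbb{E}\lVert g(x) - \nabla f(x)\rVert^2$ for an unbiased $g$.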

An Optimal Multistage Stochastic Gradient Method for Minimax Problems

Title An Optimal Multistage Stochastic Gradient Method for Minimax Problems
Authors Alireza Fallah, Asuman Ozdaglar, Sarath Pattathil
Abstract In this paper, we study the minimax optimization problem in the smooth and strongly convex-strongly concave setting when we have access to noisy estimates of gradients. In particular, we first analyze the stochastic Gradient Descent Ascent (GDA) method with a constant stepsize, and show that it converges to a neighborhood of the solution of the minimax problem. We further provide tight bounds on the convergence rate and the size of this neighborhood. Next, we propose a multistage variant of stochastic GDA (M-GDA) that runs in multiple stages with a particular learning rate decay schedule and converges to the exact solution of the minimax problem. We show that M-GDA achieves the lower bounds in terms of noise dependence without any assumption on knowledge of the noise characteristics. We also show that M-GDA obtains a linear decay rate with respect to the initial error, although its dependence on the condition number is suboptimal. To improve this dependence, we apply the multistage machinery to the stochastic Optimistic Gradient Descent Ascent (OGDA) algorithm and propose the M-OGDA algorithm, which also achieves the optimal linear decay rate with respect to the initial error. To the best of our knowledge, this method is the first to simultaneously achieve the best dependence on the noise characteristics, the initial error, and the condition number.
Published 2020-02-13
URL https://arxiv.org/abs/2002.05683v1
PDF https://arxiv.org/pdf/2002.05683v1.pdf
PWC https://paperswithcode.com/paper/an-optimal-multistage-stochastic-gradient
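A sketch of stochastic GDA with a multistage schedule on the toy strongly-convex-strongly-concave problem $f(x, y) = \frac{\mu}{2}x^2 + xy - \frac{\mu}{2}y^2$, whose saddle point is $(0, 0)$. The schedule below (halve the step size and double the stage length at each stage) and the Gaussian gradient noise are assumptions meant to illustrate the "constant step within a stage, decay across stages" structure; they are not the constants derived in the paper.

```python
import numpy as np

MU = 1.0   # strong convexity / concavity parameter of the toy problem

def noisy_grads(x, y, rng, sigma):
    """Unbiased noisy estimates of (df/dx, df/dy) for f = mu/2 x^2 + x y - mu/2 y^2."""
    gx = MU * x + y + sigma * rng.normal()
    gy = x - MU * y + sigma * rng.normal()
    return gx, gy

def gda(x, y, step, iters, rng, sigma):
    """Simultaneous stochastic descent on x and ascent on y with a constant step."""
    for _ in range(iters):
        gx, gy = noisy_grads(x, y, rng, sigma)
        x, y = x - step * gx, y + step * gy
    return x, y

def multistage_gda(x, y, step0=0.1, iters0=100, stages=4, sigma=0.1, seed=0):
    """Run GDA in stages; each stage halves the step and doubles the iteration count."""
    rng = np.random.default_rng(seed)
    step, iters = step0, iters0
    for _ in range(stages):
        x, y = gda(x, y, step, iters, rng, sigma)
        step, iters = step / 2, iters * 2
    return x, y

x_final, y_final = multistage_gda(1.0, 1.0)
print(np.hypot(x_final, y_final))   # distance to the saddle point (0, 0)
```

With a constant step, the iterates settle in a noise-dominated neighborhood of the saddle point; shrinking the step across stages shrinks that neighborhood, which is the effect the abstract describes.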

Early Response Assessment in Lung Cancer Patients using Spatio-temporal CBCT Images

Title Early Response Assessment in Lung Cancer Patients using Spatio-temporal CBCT Images
Authors Bijju Kranthi Veduruparthi, Jayanta Mukherjee, Partha Pratim Das, Mandira Saha, Sanjoy Chatterjee, Raj Kumar Shrimali, Soumendranath Ray, Sriram Prasath
Abstract We report a model to predict a patient's radiological response to curative radiation therapy (RT) for non-small-cell lung cancer (NSCLC). Cone-Beam Computed Tomography (CBCT) images acquired weekly during the six-week course of RT were contoured with the Gross Tumor Volume (GTV) by senior radiation oncologists for 53 patients (7 images per patient). Deformable registration of the images yielded six deformation fields per patient, one for each pair of consecutive images. The Jacobian determinant of a field provides a measure of local expansion/contraction and is used in our model. Delineations were compared post-registration to compute unchanged ($U$), newly grown ($G$), and reduced ($R$) regions within the GTV. The mean Jacobians of these regions, $\mu_U$, $\mu_G$ and $\mu_R$, are compared statistically, and a response assessment model is proposed. A good response is hypothesized if $\mu_R < 1.0$, $\mu_R < \mu_U$, and $\mu_G < \mu_U$. For early prediction of post-treatment response, only the images from the first three weeks are used. Our model predicted clinical response with a precision of 74%. Using the reduction in CT numbers (CTN) and the percentage GTV reduction as features in logistic regression yielded an area under the curve (AUC) of 0.65 (p=0.005). Combining the logistic regression model with the proposed hypothesis yielded an odds ratio of 20.0 (p=0.0).
Published 2020-03-07
URL https://arxiv.org/abs/2003.05408v1
PDF https://arxiv.org/pdf/2003.05408v1.pdf
PWC https://paperswithcode.com/paper/early-response-assessment-in-lung-cancer
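A minimal 2D sketch of the two ingredients described above: the Jacobian determinant of a deformation $\phi(\mathbf{x}) = \mathbf{x} + \mathbf{u}(\mathbf{x})$ (values below 1 indicate local contraction, above 1 expansion) and the hypothesized good-response rule. The actual data are 3D CBCT deformation fields; the 2D simplification, the toy displacement field, the toy region masks and the function names are assumptions.

```python
import numpy as np

def jacobian_determinant(ux, uy):
    """Jacobian determinant of phi(x, y) = (x + ux, y + uy) on a grid indexed [row=y, col=x]."""
    dux_dy, dux_dx = np.gradient(ux)   # derivatives along axis 0 (y), then axis 1 (x)
    duy_dy, duy_dx = np.gradient(uy)
    return (1 + dux_dx) * (1 + duy_dy) - dux_dy * duy_dx

def good_response(jac, mask_u, mask_g, mask_r):
    """Hypothesized good response: mu_R < 1, mu_R < mu_U, and mu_G < mu_U."""
    mu_u, mu_g, mu_r = (jac[m].mean() for m in (mask_u, mask_g, mask_r))
    return bool(mu_r < 1.0 and mu_r < mu_u and mu_g < mu_u)

# Uniform 10% shrink along each axis: determinant 0.9 * 0.9 = 0.81 everywhere.
yy, xx = np.mgrid[0:32, 0:32].astype(float)
jac = jacobian_determinant(-0.1 * xx, -0.1 * yy)
print(round(float(jac.mean()), 4))   # → 0.81

# Toy Jacobian map with hypothetical regions R (reduced), G (newly grown), U (unchanged).
jac_map = np.ones((4, 4))
mask_r = np.zeros((4, 4), dtype=bool)
mask_r[:, :2] = True
mask_g = np.zeros((4, 4), dtype=bool)
mask_g[:, 2] = True
mask_u = ~(mask_r | mask_g)
jac_map[mask_r] = 0.7                # contracting reduced region
jac_map[mask_g] = 0.9                # mildly contracting newly grown region
print(good_response(jac_map, mask_u, mask_g, mask_r))   # → True
```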

An Ontology-driven Treatment Article Retrieval System for Precision Oncology

Title An Ontology-driven Treatment Article Retrieval System for Precision Oncology
Authors Zheng Chen, Sadid A. Hasan, Joey Liu, Vivek Datla, Md Shamsuzzaman, Hafiz Khan, Mohammad S Sorower, Gabe Mankovich, Rob van Ommering, Nevenka Dimitrova
Abstract This paper presents an ontology-driven treatment article retrieval system developed and evaluated using the data and ground truth provided by the TREC 2017 precision medicine track. The key aspects of our system include: meaningful integration of various disease, gene, and drug name ontologies; training of a novel perceptron model for article relevance labeling; a ranking module that considers additional factors such as journal impact and article publication year; and comprehensive query matching rules. Experimental results demonstrate that our proposed system considerably outperforms the best participating system in the TREC 2017 precision medicine challenge.
Published 2020-02-13
URL https://arxiv.org/abs/2002.05653v1
PDF https://arxiv.org/pdf/2002.05653v1.pdf
PWC https://paperswithcode.com/paper/an-ontology-driven-treatment-article
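A hypothetical sketch of a re-ranking step that blends a model's relevance score with journal impact and publication year, the extra factors the abstract names. The linear combination, the weights, the normalizations, the reference year and the field names are all assumptions for illustration, not the system's actual formula.

```python
from dataclasses import dataclass

@dataclass
class Article:
    doc_id: str
    relevance: float       # e.g. output of a relevance classifier, in [0, 1]
    impact_factor: float   # journal impact factor
    year: int              # publication year

def rerank(articles, w_rel=0.7, w_impact=0.2, w_recency=0.1,
           current_year=2017, max_impact=50.0):
    """Sort articles by a weighted blend of relevance, capped impact, and recency."""
    def score(a):
        impact = min(a.impact_factor, max_impact) / max_impact     # scaled to [0, 1]
        recency = 1.0 / (1 + max(current_year - a.year, 0))        # 1.0 for this year
        return w_rel * a.relevance + w_impact * impact + w_recency * recency
    return sorted(articles, key=score, reverse=True)

docs = [
    Article("A", relevance=0.80, impact_factor=3.0, year=2005),
    Article("B", relevance=0.78, impact_factor=40.0, year=2016),
    Article("C", relevance=0.30, impact_factor=45.0, year=2017),
]
print([a.doc_id for a in rerank(docs)])   # → ['B', 'A', 'C']
```

Here a slightly less relevant but recent, high-impact article (B) overtakes A, while a low-relevance article (C) stays last despite its journal.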

Compressing Large-Scale Transformer-Based Models: A Case Study on BERT

Title Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
Authors Prakhar Ganesh, Yao Chen, Xin Lou, Mohammad Ali Khan, Yin Yang, Deming Chen, Marianne Winslett, Hassan Sajjad, Preslav Nakov
Abstract Transformer-based models pre-trained on large-scale corpora achieve state-of-the-art accuracy for natural language processing tasks, but are too resource-hungry and compute-intensive to suit low-capability devices or applications with strict latency requirements. One potential remedy is model compression, which has attracted extensive attention. This paper summarizes the branches of research on compressing Transformers, focusing on the especially popular BERT model. BERT’s complex architecture means that a compression technique that is highly effective on one part of the model, e.g., attention layers, may be less successful on another part, e.g., fully connected layers. In this systematic study, we identify the state of the art in compression for each part of BERT, clarify current best practices for compressing large-scale Transformer models, and provide insights into the inner workings of various methods. Our categorization and analysis also shed light on promising future research directions for achieving a lightweight, accurate, and generic natural language processing model.
Tasks Model Compression
Published 2020-02-27
URL https://arxiv.org/abs/2002.11985v1
PDF https://arxiv.org/pdf/2002.11985v1.pdf
PWC https://paperswithcode.com/paper/compressing-large-scale-transformer-based
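Quantization is one of the compression branches such surveys cover. A minimal sketch of symmetric per-tensor 8-bit post-training quantization of a single weight matrix, for example one fully connected layer; the per-tensor scale and the random matrix are assumptions, and the quantization schemes used for BERT in practice differ in detail.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: w ~= scale * q with q in [-127, 127]."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate float32 weights."""
    return q.astype(np.float32) * np.float32(scale)

rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=(768, 768)).astype(np.float32)   # BERT-base hidden size
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(q.nbytes / w.nbytes, float(np.abs(w - w_hat).max()), scale / 2)
```

Storage drops to a quarter of float32, and rounding keeps each weight within half a quantization step of its original value.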

Learning Halfspaces with Massart Noise Under Structured Distributions

Title Learning Halfspaces with Massart Noise Under Structured Distributions
Authors Ilias Diakonikolas, Vasilis Kontonis, Christos Tzamos, Nikos Zarifis
Abstract We study the problem of learning halfspaces with Massart noise in the distribution-specific PAC model. We give the first computationally efficient algorithm for this problem with respect to a broad family of distributions, including log-concave distributions. This resolves an open question posed in a number of prior works. Our approach is extremely simple: we identify a smooth non-convex surrogate loss with the property that any approximate stationary point of this loss defines a halfspace that is close to the target halfspace. Given this structural result, we can use SGD to solve the underlying learning problem.
Published 2020-02-13
URL https://arxiv.org/abs/2002.05632v1
PDF https://arxiv.org/pdf/2002.05632v1.pdf
PWC https://paperswithcode.com/paper/learning-halfspaces-with-massart-noise-under
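A toy sketch of the "smooth non-convex surrogate plus SGD" recipe. The surrogate used here, a sigmoid of the negative normalized margin $S(-y\langle w, x\rangle/\sigma)$ with $w$ kept on the unit sphere, is an assumption in the spirit of the abstract and not necessarily the paper's exact loss. The Gaussian marginal, the uniform label-flipping rate (the simplest special case of Massart noise) and all hyperparameters are also illustrative choices.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def sample(rng, n, w_star, flip_prob):
    """Gaussian x; labels sign(<w*, x>) flipped independently with probability flip_prob."""
    x = rng.normal(size=(n, w_star.size))
    y = np.sign(x @ w_star)
    flips = rng.random(n) < flip_prob
    y[flips] *= -1
    return x, y

def surrogate_sgd(rng, w_star, sigma=0.5, step=0.5, batch=64, iters=500, flip_prob=0.1):
    """Projected minibatch SGD on E[sigmoid(-y <w, x> / sigma)] over the unit sphere."""
    w = np.zeros(w_star.size)
    w[1] = 1.0                               # start orthogonal to w* = e_0
    for _ in range(iters):
        x, y = sample(rng, batch, w_star, flip_prob)
        s = sigmoid(-y * (x @ w) / sigma)    # per-example surrogate loss values
        # d/dw sigmoid(-y <w, x> / sigma) = -s (1 - s) y x / sigma
        grad = -((s * (1 - s) * y)[:, None] * x).mean(axis=0) / sigma
        w -= step * grad
        w /= np.linalg.norm(w)               # project back onto the unit sphere
    return w

rng = np.random.default_rng(0)
w_star = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
w_hat = surrogate_sgd(rng, w_star)
angle = float(np.arccos(np.clip(w_hat @ w_star, -1.0, 1.0)))
print(angle)   # angle in radians between the learned and target normal vectors
```

Because the sigmoid satisfies $S(t) + S(-t) = 1$, uniform label flipping only rescales the clean risk, so its minimizer is still the target direction in this toy setting.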

Evaluating the Progress of Deep Learning for Visual Relational Concepts

Title Evaluating the Progress of Deep Learning for Visual Relational Concepts
Authors Sebastian Stabinger, Justus Piater, Antonio Rodríguez-Sánchez
Abstract Convolutional Neural Networks (CNNs) have become the state-of-the-art method for image classification in the last 7 years, but although they achieve superhuman performance on many classification datasets, there are lesser-known datasets where they fail almost completely and perform much worse than humans. We show that these problems correspond to relational concepts as defined by the field of concept learning. We therefore review current deep learning research on visual relational concepts. Based on an analysis of the current literature, we hypothesise that iterative processing of the input, together with shifting attention between iterations, will be needed to solve real-world relational concept learning efficiently and reliably. In addition, we conclude that many current datasets overestimate the performance of tested systems by providing data in an already pre-attended form.
Tasks Image Classification
Published 2020-01-29
URL https://arxiv.org/abs/2001.10857v1
PDF https://arxiv.org/pdf/2001.10857v1.pdf
PWC https://paperswithcode.com/paper/evaluating-the-progress-of-deep-learning-for

Cylindrical Convolutional Networks for Joint Object Detection and Viewpoint Estimation

Title Cylindrical Convolutional Networks for Joint Object Detection and Viewpoint Estimation
Authors Sunghun Joung, Seungryong Kim, Hanjae Kim, Minsu Kim, Ig-Jae Kim, Junghyun Cho, Kwanghoon Sohn
Abstract Existing techniques to encode spatial invariance within deep convolutional neural networks only model 2D transformation fields. This does not account for the fact that objects in a 2D space are projections of 3D ones, and thus these techniques have limited ability to handle severe object viewpoint changes. To overcome this limitation, we introduce a learnable module, cylindrical convolutional networks (CCNs), that exploits a cylindrical representation of a convolutional kernel defined in 3D space. CCNs extract a view-specific feature through a view-specific convolutional kernel to predict object category scores at each viewpoint. With the view-specific feature, we simultaneously determine the object category and viewpoint using the proposed sinusoidal soft-argmax module. Our experiments demonstrate the effectiveness of cylindrical convolutional networks for joint object detection and viewpoint estimation.
Tasks Object Detection, Viewpoint Estimation
Published 2020-03-25
URL https://arxiv.org/abs/2003.11303v1
PDF https://arxiv.org/pdf/2003.11303v1.pdf
PWC https://paperswithcode.com/paper/cylindrical-convolutional-networks-for-joint
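One plausible reading of a "sinusoidal soft-argmax" is a circular version of soft-argmax: turn per-bin viewpoint scores into probabilities with a softmax, average the sine and cosine of the bin angles, and recover the angle with atan2, so that evidence at 350° and 10° averages to 0° rather than 180°. This formulation, the 10° binning and the function name are assumptions, not the paper's definition.

```python
import numpy as np

def sinusoidal_soft_argmax(scores, bin_angles_deg):
    """Softmax over viewpoint bins, then a circular mean of the bin angles in degrees."""
    p = np.exp(scores - scores.max())       # numerically stable softmax
    p /= p.sum()
    theta = np.deg2rad(bin_angles_deg)
    angle = np.arctan2((p * np.sin(theta)).sum(), (p * np.cos(theta)).sum())
    return float(np.rad2deg(angle) % 360.0)

bins = np.arange(0, 360, 10)                # 36 viewpoint bins
scores = np.zeros(bins.size)
scores[bins == 350] = 5.0                   # evidence split between 350 and 10 degrees
scores[bins == 10] = 5.0
estimate = sinusoidal_soft_argmax(scores, bins)
print(min(estimate, 360.0 - estimate))      # circular distance to 0 degrees, near zero
```

A plain linear soft-argmax over the same scores would land close to 180°, the opposite viewpoint, which is why a wrap-around-aware average is needed for angles.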

¶ILCRO: Making Importance Landscapes Flat Again

Title ¶ILCRO: Making Importance Landscapes Flat Again
Authors Vincent Moens, Simiao Yu, Gholamreza Salimi-Khorshidi
Abstract Convolutional neural networks have had great success in numerous tasks, including image classification, object detection, sequence modelling, and many more. It is generally assumed that such neural networks are translation invariant, meaning that they can detect a given feature independently of its location in the input image. While this is true for simple cases, where networks are composed of a restricted number of layer classes and where images are fairly simple, complex images with common state-of-the-art networks do not usually enjoy this property as one might hope. This paper shows that most existing convolutional architectures define, at initialisation, a specific feature importance landscape that conditions their capacity to attend to different locations of the images later during training or even at test time. We demonstrate how this phenomenon occurs under specific conditions and how it can be adjusted under some assumptions. We derive the P-objective, or PILCRO for Pixel-wise Importance Landscape Curvature Regularised Objective, a simple regularisation technique that favours weight configurations that produce smooth, low-curvature importance landscapes that are conditioned on the data and not on the chosen architecture. Through extensive experiments, we further show that P-regularised versions of popular computer vision networks have a flat importance landscape, train faster, achieve better accuracy and are more robust to noise at test time, compared with their original counterparts in common computer-vision classification settings.
Tasks Feature Importance, Image Classification, Object Detection
Published 2020-01-27
URL https://arxiv.org/abs/2001.09696v2
PDF https://arxiv.org/pdf/2001.09696v2.pdf
PWC https://paperswithcode.com/paper/ilcro-making-importance-landscapes-flat-again
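A simplified sketch of the kind of quantity being regularised: given a per-pixel importance map (for instance, the average absolute input gradient), penalise its curvature with a discrete Laplacian. The 5-point finite-difference Laplacian and the mean-squared penalty are assumptions made for illustration; the paper derives its own P-objective, and during training the importance map would come from the network's gradients rather than from a hand-made array.

```python
import numpy as np

def laplacian(m):
    """5-point discrete Laplacian evaluated on the interior of a 2D map."""
    return (m[:-2, 1:-1] + m[2:, 1:-1] + m[1:-1, :-2] + m[1:-1, 2:]
            - 4 * m[1:-1, 1:-1])

def curvature_penalty(importance):
    """Mean squared Laplacian: zero for flat (or linear) importance landscapes."""
    return float(np.mean(laplacian(importance) ** 2))

yy, xx = np.mgrid[0:32, 0:32].astype(float)
flat = np.ones((32, 32))
centre_peaked = np.exp(-((xx - 15.5) ** 2 + (yy - 15.5) ** 2) / 50.0)  # importance concentrated at the centre
print(curvature_penalty(flat), curvature_penalty(centre_peaked))
```

A flat map costs nothing, while a map that favours the image centre is penalised, matching the abstract's goal of landscapes that do not privilege particular locations.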