Paper Group ANR 757
The Quo Vadis submission at Traffic4cast 2019. Neural Abstractive Text Summarization and Fake News Detection. Learning Feature Interactions with Lorentzian Factorization Machine. On the Impact of the Activation Function on Deep Neural Networks Training. Addressing Delayed Feedback for Continuous Training with Neural Networks in CTR prediction. SceneCode: Monocular Dense Semantic Reconstruction using Learned Encoded Scene Representations. Res-embedding for Deep Learning Based Click-Through Rate Prediction Modeling. Solving Hard Coreference Problems. A general model for plane-based clustering with loss function. Field-aware Calibration: A Simple and Empirically Strong Method for Reliable Probabilistic Predictions. Unpaired Thermal to Visible Spectrum Transfer using Adversarial Training. The Tale of Evil Twins: Adversarial Inputs versus Backdoored Models. Using machine learning and information visualisation for discovering latent topics in Twitter news. EEG based Continuous Speech Recognition using Transformers. A Survey of Big Data Machine Learning Applications Optimization in Cloud Data Centers and Networks.
The Quo Vadis submission at Traffic4cast 2019
Title | The Quo Vadis submission at Traffic4cast 2019 |
Authors | Dan Oneata, Cosmin George Alexandru, Marius Stanescu, Octavian Pascu, Alexandru Magan, Adrian Postelnicu, Horia Cucu |
Abstract | We describe the submission of the Quo Vadis team to the Traffic4cast competition, which was organized as part of the NeurIPS 2019 series of challenges. Our system consists of a temporal regression module, implemented as $1\times1$ 2D convolutions, augmented with spatio-temporal biases. We have found that using biases is a straightforward and efficient way to include seasonal patterns and to improve the performance of the temporal regression model. Our implementation obtains a mean squared error of $9.47\times 10^{-3}$ on the test data, placing us in eighth place among the teams. We also present our attempts at incorporating spatial correlations into the model; however, contrary to our expectations, adding this type of auxiliary information did not benefit the main system. Our code is available at https://github.com/danoneata/traffic4cast. |
Tasks | |
Published | 2019-10-27 |
URL | https://arxiv.org/abs/1910.12363v1 |
https://arxiv.org/pdf/1910.12363v1.pdf | |
PWC | https://paperswithcode.com/paper/the-quo-vadis-submission-at-traffic4cast-2019 |
Repo | |
Framework | |
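The bias-augmented temporal regression described in the abstract lends itself to a short sketch. The frame size (495×436) and 288 daily 5-minute slots match the Traffic4cast 2019 setup, but the number of input/output frames and the exact bias parametrisation here are assumptions; the authors' actual code lives in the linked repository.

```python
# Minimal sketch: per-pixel linear regression over past frames (a 1x1 conv)
# plus a learned bias indexed by time-of-day slot to capture daily seasonality.
import torch
import torch.nn as nn

class TemporalRegression(nn.Module):
    def __init__(self, n_past=12, n_future=3, height=495, width=436, n_slots=288):
        super().__init__()
        # 1x1 convolution: each output frame is a pixel-wise linear
        # combination of the stacked past frames.
        self.conv = nn.Conv2d(n_past, n_future, kernel_size=1)
        # Spatio-temporal bias: one value per pixel, future frame and
        # time-of-day slot (an assumed parametrisation of "seasonal" biases).
        self.bias = nn.Parameter(torch.zeros(n_slots, n_future, height, width))

    def forward(self, past_frames, slot):
        # past_frames: (batch, n_past, H, W); slot: (batch,) time-of-day indices
        return self.conv(past_frames) + self.bias[slot]
```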
Neural Abstractive Text Summarization and Fake News Detection
Title | Neural Abstractive Text Summarization and Fake News Detection |
Authors | Soheil Esmaeilzadeh, Gao Xian Peh, Angela Xu |
Abstract | In this work, we study abstractive text summarization by exploring different models such as LSTM encoder-decoder with attention, pointer-generator networks, coverage mechanisms, and transformers. After extensive and careful hyperparameter tuning, we compare the proposed architectures against each other on the abstractive text summarization task. Finally, as an extension of our work, we apply our text summarization model as a feature extractor for a fake news detection task, where the news articles are summarized prior to classification and the results are compared against classification using only the original news text. Keywords: LSTM, encoder-decoder, abstractive text summarization, pointer-generator, coverage mechanism, transformers, fake news detection |
Tasks | Abstractive Text Summarization, Fake News Detection, Text Summarization |
Published | 2019-03-24 |
URL | https://arxiv.org/abs/1904.00788v2 |
https://arxiv.org/pdf/1904.00788v2.pdf | |
PWC | https://paperswithcode.com/paper/neural-abstractive-text-summarization-and |
Repo | |
Framework | |
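For context on the pointer-generator mechanism the abstract builds on (See et al., 2017): the decoder mixes a generation distribution over the vocabulary with a copy distribution over source tokens. The sketch below uses hypothetical tensor names and is illustrative, not the authors' code.

```python
# Pointer-generator mixture: P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of
# attention weights on source positions holding w.
import torch

def final_distribution(p_gen, vocab_dist, attention, src_ids):
    """p_gen: (batch, 1) generation probability
    vocab_dist: (batch, vocab_size) softmax over the decoder vocabulary
    attention: (batch, src_len) attention weights over source tokens
    src_ids: (batch, src_len) source token ids
    """
    gen = p_gen * vocab_dist
    copy = (1.0 - p_gen) * attention
    # Scatter-add the copy probabilities onto the corresponding vocabulary ids.
    return gen.scatter_add(1, src_ids, copy)
```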
Learning Feature Interactions with Lorentzian Factorization Machine
Title | Learning Feature Interactions with Lorentzian Factorization Machine |
Authors | Canran Xu, Ming Wu |
Abstract | Learning representations for feature interactions to model user behaviors is critical for recommender systems and click-through rate (CTR) prediction. Recent advances in this area are empowered by deep learning methods, which can learn sophisticated feature interactions and achieve state-of-the-art results in an end-to-end manner. These approaches require a large number of training parameters integrated with the low-level representations, and are thus memory- and computation-inefficient. In this paper, we propose a new model named “LorentzFM” that learns feature interactions embedded in a hyperbolic space, in which the triangle inequality for Lorentz distances can be violated. To this end, the learned representations benefit from the peculiar geometric properties of hyperbolic triangles, resulting in a significant reduction in the number of parameters (20% to 80%) because no top deep-learning layers are required. With such a lightweight architecture, LorentzFM achieves comparable and even materially better results than deep learning methods such as DeepFM, xDeepFM and Deep & Cross on both recommendation and CTR prediction tasks. |
Tasks | Click-Through Rate Prediction |
Published | 2019-11-22 |
URL | https://arxiv.org/abs/1911.09821v1 |
https://arxiv.org/pdf/1911.09821v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-feature-interactions-with-lorentzian |
Repo | |
Framework | |
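A rough sketch of the core scoring idea: feature embeddings are lifted onto the hyperboloid and interact through pairwise Lorentz inner products, with no deep layers on top. The sign convention and normalisation of the paper's actual score function may differ, so treat this as illustrative only.

```python
# Hedged sketch of pairwise Lorentz interactions for a set of active features.
import torch

def to_hyperboloid(x):
    # Lift spatial coordinates onto the hyperboloid x0^2 - |x|^2 = 1.
    x0 = torch.sqrt(1.0 + (x * x).sum(-1, keepdim=True))
    return torch.cat([x0, x], dim=-1)

def lorentz_inner(u, v):
    # <u, v>_L = -u0 * v0 + sum_i ui * vi   (signature -+++...)
    return -u[..., 0] * v[..., 0] + (u[..., 1:] * v[..., 1:]).sum(-1)

def lorentzfm_score(embeds):
    # embeds: (n_fields, dim) spatial embeddings of the active features.
    # The score aggregates all pairwise Lorentz interactions, replacing the
    # deep layers used by DeepFM-style models.
    z = to_hyperboloid(embeds)
    n = z.shape[0]
    score = torch.zeros(())
    for i in range(n):
        for j in range(i + 1, n):
            score = score + lorentz_inner(z[i], z[j])
    return score
```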
On the Impact of the Activation Function on Deep Neural Networks Training
Title | On the Impact of the Activation Function on Deep Neural Networks Training |
Authors | Soufiane Hayou, Arnaud Doucet, Judith Rousseau |
Abstract | The weight initialization and the activation function of deep neural networks have a crucial impact on the performance of the training procedure. An inappropriate selection can lead to the loss of information of the input during forward propagation and the exponential vanishing/exploding of gradients during back-propagation. Understanding the theoretical properties of untrained random networks is key to identifying which deep networks may be trained successfully, as recently demonstrated by Samuel et al. (2017), who showed that for deep feedforward neural networks only a specific choice of hyperparameters known as the 'Edge of Chaos' can lead to good performance. While the work of Samuel et al. (2017) discusses trainability issues, we focus here on training acceleration and overall performance. We give a comprehensive theoretical analysis of the Edge of Chaos and show that we can indeed tune the initialization parameters and the activation function in order to accelerate the training and improve the performance. |
Tasks | |
Published | 2019-02-19 |
URL | https://arxiv.org/abs/1902.06853v2 |
https://arxiv.org/pdf/1902.06853v2.pdf | |
PWC | https://paperswithcode.com/paper/on-the-impact-of-the-activation-function-on |
Repo | |
Framework | |
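For context, the Edge of Chaos is usually characterised via the mean-field length map and the gradient susceptibility $\chi$: an initialisation $(\sigma_b, \sigma_w)$ lies on the EOC when $\chi = \sigma_w^2\,\mathbb{E}[\phi'(\sqrt{q}\,Z)^2] = 1$ at the fixed point $q$ of the length map. The snippet below checks this numerically for tanh, following the standard mean-field recursions rather than this paper's exact derivations.

```python
# Monte-Carlo check of the Edge of Chaos condition for a tanh network.
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(200_000)  # samples of Z ~ N(0, 1)

def fixed_point_q(sigma_w, sigma_b, iters=200):
    # Length map: q <- sigma_w^2 E[tanh(sqrt(q) Z)^2] + sigma_b^2
    q = 1.0
    for _ in range(iters):
        q = sigma_w**2 * np.mean(np.tanh(np.sqrt(q) * z) ** 2) + sigma_b**2
    return q

def chi(sigma_w, sigma_b):
    q = fixed_point_q(sigma_w, sigma_b)
    dphi = 1.0 - np.tanh(np.sqrt(q) * z) ** 2   # tanh'(x) = 1 - tanh(x)^2
    return sigma_w**2 * np.mean(dphi ** 2)

# For tanh with sigma_b = 0, the EOC passes through sigma_w = 1:
print(chi(1.0, 0.0))  # ~1.0
```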
Addressing Delayed Feedback for Continuous Training with Neural Networks in CTR prediction
Title | Addressing Delayed Feedback for Continuous Training with Neural Networks in CTR prediction |
Authors | Sofia Ira Ktena, Alykhan Tejani, Lucas Theis, Pranay Kumar Myana, Deepak Dilipkumar, Ferenc Huszar, Steven Yoo, Wenzhe Shi |
Abstract | One of the challenges in display advertising is that the distribution of features and click-through rate (CTR) can exhibit large shifts over time due to seasonality, changes to ad campaigns and other factors. The predominant strategy to keep up with these shifts is to train predictive models continuously, on fresh data, in order to prevent them from becoming stale. However, in many ad systems positive labels are only observed after a possibly long and random delay. These delayed labels pose a challenge to data freshness in continuous training: fresh data may not have complete label information at the time they are ingested by the training algorithm. Naive strategies which consider any data point a negative example until a positive label becomes available tend to underestimate CTR, resulting in inferior user experience and suboptimal performance for advertisers. The focus of this paper is to identify the best combination of loss functions and models that enable large-scale learning from a continuous stream of data in the presence of delayed labels. In this work, we compare 5 different loss functions, 3 of them applied to this problem for the first time. We benchmark their performance in offline settings on both public and proprietary datasets in conjunction with shallow and deep model architectures. We also discuss the engineering cost associated with implementing each loss function in a production environment. Finally, we carried out online experiments with the top-performing methods, in order to validate their performance in a continuous training scheme. When trained offline on 668 million in-house data points, our proposed methods outperform the previous state of the art by 3% relative cross entropy (RCE). During online experiments, we observed a 55% gain in revenue per thousand requests (RPMq) against naive log loss. |
Tasks | Click-Through Rate Prediction |
Published | 2019-07-15 |
URL | https://arxiv.org/abs/1907.06558v1 |
https://arxiv.org/pdf/1907.06558v1.pdf | |
PWC | https://paperswithcode.com/paper/addressing-delayed-feedback-for-continuous |
Repo | |
Framework | |
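One of the loss families benchmarked in this line of work is the delayed-feedback model of Chapelle (2014), which pairs a conversion/click model with an exponential delay model: a positive observed after delay $d$ contributes $p\,\lambda e^{-\lambda d}$ to the likelihood, while an example still unlabeled after elapsed time $e$ contributes $(1-p) + p\,e^{-\lambda e}$. A minimal negative log-likelihood sketch (with assumed variable names) follows.

```python
# Hedged sketch of the delayed-feedback negative log-likelihood.
import torch

def delayed_feedback_nll(p, lam, label, elapsed, delay):
    """p, lam: (batch,) predicted click probability and delay rate;
    label: (batch,) 1.0 if a positive has been observed, else 0.0;
    delay: observed click delay for positives;
    elapsed: time since impression for still-unlabeled examples."""
    pos = label * (torch.log(p) + torch.log(lam) - lam * delay)
    neg = (1.0 - label) * torch.log(1.0 - p + p * torch.exp(-lam * elapsed))
    return -(pos + neg).mean()
```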
SceneCode: Monocular Dense Semantic Reconstruction using Learned Encoded Scene Representations
Title | SceneCode: Monocular Dense Semantic Reconstruction using Learned Encoded Scene Representations |
Authors | Shuaifeng Zhi, Michael Bloesch, Stefan Leutenegger, Andrew J. Davison |
Abstract | Systems which incrementally create 3D semantic maps from image sequences must store and update representations of both geometry and semantic entities. However, while there has been much work on the correct formulation for geometrical estimation, state-of-the-art systems usually rely on simple semantic representations which store and update independent label estimates for each surface element (depth pixels, surfels, or voxels). Spatial correlation is discarded, and fused label maps are incoherent and noisy. We introduce a new compact and optimisable semantic representation by training a variational auto-encoder that is conditioned on a colour image. Using this learned latent space, we can tackle semantic label fusion by jointly optimising the low-dimensional codes associated with each of a set of overlapping images, producing consistent fused label maps which preserve spatial correlation. We also show how this approach can be used within a monocular keyframe-based semantic mapping system where a similar code approach is used for geometry. The probabilistic formulation allows us to jointly estimate motion, geometry and semantics in a unified optimisation. |
Tasks | |
Published | 2019-03-15 |
URL | http://arxiv.org/abs/1903.06482v2 |
http://arxiv.org/pdf/1903.06482v2.pdf | |
PWC | https://paperswithcode.com/paper/scenecode-monocular-dense-semantic |
Repo | |
Framework | |
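A schematic sketch of the label-fusion step described above: the compact semantic codes of overlapping keyframes are optimised jointly so that their decoded label maps agree where the views overlap. The decoder, warps, masks and zero-code prior below are placeholders, not the authors' implementation.

```python
# Joint optimisation of per-keyframe semantic codes (illustrative only).
import torch

def fuse_codes(codes, decoder, warps, masks, steps=100, lr=0.1, prior=1e-2):
    # codes: list of (code_dim,) latent codes, one per keyframe (optimised);
    # decoder: maps a code to a (C, H, W) soft label map;
    # warps[(i, j)]: warps keyframe j's labels into keyframe i's view;
    # masks[(i, j)]: valid-overlap mask in keyframe i.
    codes = [c.clone().requires_grad_(True) for c in codes]
    opt = torch.optim.Adam(codes, lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        labels = [decoder(c) for c in codes]
        loss = sum(prior * (c * c).sum() for c in codes)  # zero-code prior
        for (i, j), warp in warps.items():
            # Penalise disagreement between overlapping decoded label maps.
            loss = loss + ((labels[i] - warp(labels[j])) ** 2 * masks[(i, j)]).sum()
        loss.backward()
        opt.step()
    return [c.detach() for c in codes]
```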
Res-embedding for Deep Learning Based Click-Through Rate Prediction Modeling
Title | Res-embedding for Deep Learning Based Click-Through Rate Prediction Modeling |
Authors | Guorui Zhou, Kailun Wu, Weijie Bian, Zhao Yang, Xiaoqiang Zhu, Kun Gai |
Abstract | Recently, click-through rate (CTR) prediction models have evolved from shallow methods to deep neural networks. Most deep CTR models follow an Embedding&MLP paradigm: first, discrete id features, e.g. items visited by the user, are mapped into low-dimensional vectors with an embedding module; then a multi-layer perceptron (MLP) is learned to fit the target. In this way, the embedding module performs representation learning and plays a key role in model performance. However, in many real-world applications, deep CTR models often suffer from poor generalization performance, which is mostly due to the learning of the embedding parameters. In this paper, we model user behavior using an interest delay model, carefully study the embedding mechanism, and obtain two important results: (i) We theoretically prove that a small aggregation radius of the embedding vectors of items belonging to the same user interest domain results in good generalization performance of the deep CTR model. (ii) Following our theoretical analysis, we design a new embedding structure named res-embedding. In the res-embedding module, the embedding vector of each item is the sum of two components: (a) a central embedding vector calculated from an item-based interest graph, and (b) a residual embedding vector whose scale is relatively small. Empirical evaluation on several public datasets demonstrates the effectiveness of the proposed res-embedding structure, which brings significant improvements in model performance. |
Tasks | Click-Through Rate Prediction |
Published | 2019-06-25 |
URL | https://arxiv.org/abs/1906.10304v1 |
https://arxiv.org/pdf/1906.10304v1.pdf | |
PWC | https://paperswithcode.com/paper/res-embedding-for-deep-learning-based-click |
Repo | |
Framework | |
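The decomposition in the abstract can be sketched directly: each item's embedding is a central vector pooled over the item's neighbours in an interest graph, plus a small residual. The dense graph representation and the scaling below are simplified assumptions.

```python
# Hedged sketch of res-embedding: central (graph-pooled) + small residual.
import torch
import torch.nn as nn

class ResEmbedding(nn.Module):
    def __init__(self, n_items, dim, adj, res_scale=0.1):
        super().__init__()
        self.base = nn.Embedding(n_items, dim)       # source of central vectors
        self.residual = nn.Embedding(n_items, dim)   # small per-item residual
        # adj: (n_items, n_items) row-normalised item-based interest graph.
        self.register_buffer("adj", adj)
        self.res_scale = res_scale

    def forward(self, item_ids):
        # Central embedding: average of base embeddings over graph neighbours.
        central = self.adj[item_ids] @ self.base.weight
        return central + self.res_scale * self.residual(item_ids)
```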
Solving Hard Coreference Problems
Title | Solving Hard Coreference Problems |
Authors | Haoruo Peng, Daniel Khashabi, Dan Roth |
Abstract | Coreference resolution is a key problem in natural language understanding that still escapes reliable solutions. One fundamental difficulty has been that of resolving instances involving pronouns since they often require deep language understanding and use of background knowledge. In this paper, we propose an algorithmic solution that involves a new representation for the knowledge required to address hard coreference problems, along with a constrained optimization framework that uses this knowledge in coreference decision making. Our representation, Predicate Schemas, is instantiated with knowledge acquired in an unsupervised way, and is compiled automatically into constraints that impact the coreference decision. We present a general coreference resolution system that significantly improves state-of-the-art performance on hard, Winograd-style, pronoun resolution cases, while still performing at the state-of-the-art level on standard coreference resolution datasets. |
Tasks | Coreference Resolution, Decision Making |
Published | 2019-07-11 |
URL | https://arxiv.org/abs/1907.05524v1 |
https://arxiv.org/pdf/1907.05524v1.pdf | |
PWC | https://paperswithcode.com/paper/solving-hard-coreference-problems-1 |
Repo | |
Framework | |
A general model for plane-based clustering with loss function
Title | A general model for plane-based clustering with loss function |
Authors | Zhen Wang, Yuan-Hai Shao, Lan Bai, Chun-Na Li, Li-Ming Liu |
Abstract | In this paper, we propose a general model for plane-based clustering. The general model contains many existing plane-based clustering methods, e.g., k-plane clustering (kPC), proximal plane clustering (PPC), twin support vector clustering (TWSVC) and its extensions. Under this general model, one may obtain an appropriate clustering method for a specific purpose. The general model is a procedure corresponding to an optimization problem that minimizes the total loss of the samples, where the loss of a sample derives from both within-cluster and between-cluster terms. In theory, the termination conditions are discussed, and we prove that the general model terminates in a finite number of steps at a local or weak local optimal point. Furthermore, based on this general model, we propose a plane-based clustering method by introducing a new loss function to capture the data distribution precisely. Experimental results on artificial and publicly available datasets verify the effectiveness of the proposed method. |
Tasks | |
Published | 2019-01-26 |
URL | http://arxiv.org/abs/1901.09178v1 |
http://arxiv.org/pdf/1901.09178v1.pdf | |
PWC | https://paperswithcode.com/paper/a-general-model-for-plane-based-clustering |
Repo | |
Framework | |
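As a concrete instance of the general model, k-plane clustering (kPC) alternates between assigning each sample to its nearest plane and refitting each plane $w^\top x + b = 0$ by minimising the within-cluster loss; the fitting step reduces to an eigenproblem, as this sketch shows.

```python
# Minimal kPC sketch: for fixed assignments, the plane minimising
# sum_i (w.x_i + b)^2 with |w| = 1 has b = -w.mean and w equal to the
# eigenvector of the cluster scatter matrix with the smallest eigenvalue.
import numpy as np

def kpc(X, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    assign = rng.integers(k, size=len(X))
    planes = []
    for _ in range(iters):
        planes = []
        for c in range(k):
            Xc = X[assign == c]
            if len(Xc) == 0:
                Xc = X[rng.integers(len(X), size=2)]  # reseed an empty cluster
            mu = Xc.mean(axis=0)
            S = (Xc - mu).T @ (Xc - mu)               # scatter matrix
            w = np.linalg.eigh(S)[1][:, 0]            # smallest-eigenvalue eigenvector
            planes.append((w, -w @ mu))
        # Reassign each sample to the plane with the smallest distance.
        dists = np.stack([np.abs(X @ w + b) for w, b in planes], axis=1)
        assign = dists.argmin(axis=1)
    return assign, planes
```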
Field-aware Calibration: A Simple and Empirically Strong Method for Reliable Probabilistic Predictions
Title | Field-aware Calibration: A Simple and Empirically Strong Method for Reliable Probabilistic Predictions |
Authors | Feiyang Pan, Xiang Ao, Pingzhong Tang, Min Lu, Dapeng Liu, Lei Xiao, Qing He |
Abstract | It is often observed that the probabilistic predictions given by a machine learning model can disagree with averaged actual outcomes on specific subsets of data, which is also known as the issue of miscalibration. It is responsible for the unreliability of practical machine learning systems. For example, in online advertising, an ad can receive a click-through rate prediction of 0.1 over some population of users where its actual click rate is 0.15. In such cases, the probabilistic predictions have to be fixed before the system can be deployed. In this paper, we first introduce a new evaluation metric named field-level calibration error, which measures the bias in predictions over the sensitive input field that the decision-maker cares about. We show that existing post-hoc calibration methods yield limited improvements on the new field-level metric and other non-calibration metrics such as the AUC score. To this end, we propose Neural Calibration, a simple yet powerful post-hoc calibration method that learns to calibrate by making full use of the field-aware information over the validation set. We present extensive experiments on five large-scale datasets. The results show that Neural Calibration significantly improves over uncalibrated predictions in common metrics such as the negative log-likelihood, Brier score and AUC, as well as the proposed field-level calibration error. |
Tasks | Calibration, Click-Through Rate Prediction |
Published | 2019-05-26 |
URL | https://arxiv.org/abs/1905.10713v3 |
https://arxiv.org/pdf/1905.10713v3.pdf | |
PWC | https://paperswithcode.com/paper/towards-reliable-and-fair-probabilistic |
Repo | |
Framework | |
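The field-level calibration error can be sketched as follows: group predictions by the value of the sensitive field, compare the average prediction with the average outcome within each group, and aggregate the absolute gaps weighted by group size. The exact weighting and normalisation in the paper may differ.

```python
# Hedged sketch of a field-level calibration error.
import numpy as np

def field_level_calibration_error(preds, labels, field):
    """preds, labels: (n,) arrays of predictions and binary outcomes;
    field: (n,) array holding the value of the sensitive field per example."""
    values, counts = np.unique(field, return_counts=True)
    err = 0.0
    for z, n in zip(values, counts):
        m = field == z
        err += n * abs(preds[m].mean() - labels[m].mean())
    return err / len(preds)
```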
Unpaired Thermal to Visible Spectrum Transfer using Adversarial Training
Title | Unpaired Thermal to Visible Spectrum Transfer using Adversarial Training |
Authors | Adam Nyberg, Abdelrahman Eldesokey, David Bergström, David Gustafsson |
Abstract | Thermal Infrared (TIR) cameras are gaining popularity in many computer vision applications due to their ability to operate under low-light conditions. Images produced by TIR cameras are usually difficult for humans to perceive visually, which limits their usability. Several methods in the literature have been proposed to address this problem by transforming TIR images into realistic visible spectrum (VIS) images. However, existing TIR-VIS datasets suffer from imperfect alignment between TIR-VIS image pairs, which degrades the performance of supervised methods. We tackle this problem by learning the transformation with an unsupervised Generative Adversarial Network (GAN) that trains on unpaired TIR and VIS images. When trained and evaluated on the KAIST-MS dataset, our proposed method was shown to produce significantly more realistic and sharp VIS images than the existing state-of-the-art supervised methods. In addition, our proposed method was shown to generalize very well when evaluated on a new dataset of unseen environments. |
Tasks | |
Published | 2019-04-03 |
URL | http://arxiv.org/abs/1904.02242v1 |
http://arxiv.org/pdf/1904.02242v1.pdf | |
PWC | https://paperswithcode.com/paper/unpaired-thermal-to-visible-spectrum-transfer |
Repo | |
Framework | |
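Unpaired translation of this kind is usually realised with a CycleGAN-style objective: adversarial losses in both directions plus a cycle-consistency term. The sketch below illustrates the generator-side objective under assumed generator/discriminator interfaces; the paper's actual architectures and loss weights are not reproduced here.

```python
# Generator-side loss of a CycleGAN-style unpaired TIR<->VIS objective; the
# discriminators are trained separately with the usual real/fake objective.
import torch
import torch.nn.functional as F

def generator_loss(G_tv, G_vt, D_v, D_t, tir, vis, lam=10.0):
    fake_vis, fake_tir = G_tv(tir), G_vt(vis)
    lv, lt = D_v(fake_vis), D_t(fake_tir)
    # Adversarial terms: fool both discriminators.
    adv = F.binary_cross_entropy_with_logits(lv, torch.ones_like(lv)) \
        + F.binary_cross_entropy_with_logits(lt, torch.ones_like(lt))
    # Cycle consistency: translating back should reconstruct the input.
    cycle = (G_vt(fake_vis) - tir).abs().mean() + (G_tv(fake_tir) - vis).abs().mean()
    return adv + lam * cycle
```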
The Tale of Evil Twins: Adversarial Inputs versus Backdoored Models
Title | The Tale of Evil Twins: Adversarial Inputs versus Backdoored Models |
Authors | Ren Pang, Xinyang Zhang, Shouling Ji, Yevgeniy Vorobeychik, Xiaopu Luo, Ting Wang |
Abstract | Despite their tremendous success in a wide range of applications, deep neural network (DNN) models are inherently vulnerable to two types of malicious manipulations: adversarial inputs, which are crafted samples that deceive target DNNs, and backdoored models, which are forged DNNs that misbehave on trigger-embedded inputs. While prior work has intensively studied the two attack vectors in parallel, there is still a lack of understanding about their fundamental connection, which is critical for assessing the holistic vulnerability of DNNs deployed in realistic settings. In this paper, we bridge this gap by conducting the first systematic study of the two attack vectors within a unified framework. More specifically, (i) we develop a new attack model that integrates both adversarial inputs and backdoored models; (ii) with both analytical and empirical evidence, we reveal that there exists an intricate “mutual reinforcement” effect between the two attack vectors; (iii) we demonstrate that this effect enables a large spectrum for the adversary to optimize the attack strategies, such as maximizing attack evasiveness with respect to various defenses and designing trigger patterns satisfying multiple desiderata; (iv) finally, we discuss potential countermeasures against this unified attack and their technical challenges, which lead to several promising research directions. |
Tasks | |
Published | 2019-11-05 |
URL | https://arxiv.org/abs/1911.01559v1 |
https://arxiv.org/pdf/1911.01559v1.pdf | |
PWC | https://paperswithcode.com/paper/the-tale-of-evil-twins-adversarial-inputs |
Repo | |
Framework | |
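The unified attack model can be caricatured as a joint optimisation over an input trigger and a weight perturbation: drive the trigger-embedded input toward a target class while keeping the model close to the original. This is an illustration of the idea only, not the paper's algorithm.

```python
# Illustrative joint input/model attack sketch (hypothetical objective).
import torch
import torch.nn.functional as F

def co_optimise(model, x, target, steps=100, lr=1e-3, lam=1.0):
    trigger = torch.zeros_like(x, requires_grad=True)
    ref = [p.detach().clone() for p in model.parameters()]
    opt = torch.optim.Adam([trigger, *model.parameters()], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Misclassify the trigger-embedded input as the target class...
        attack = F.cross_entropy(model(x + trigger), target)
        # ...while keeping the forged weights close to the original model.
        drift = sum(((p - r) ** 2).sum() for p, r in zip(model.parameters(), ref))
        (attack + lam * drift).backward()
        opt.step()
    return trigger.detach()
```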
Using machine learning and information visualisation for discovering latent topics in Twitter news
Title | Using machine learning and information visualisation for discovering latent topics in Twitter news |
Authors | Vladimir Vargas-Calderón, Marlon Steibeck Dominguez, N. Parra-A., Herbert Vinck-Posada, Jorge E. Camargo |
Abstract | We propose a method to discover latent topics and visualise large collections of tweets for easy identification and interpretation of topics, and exemplify its use with tweets from a Colombian mass media giant in the period 2014–2019. The latent topic analysis is performed in two ways: with the training of a Latent Dirichlet Allocation model, and with the combination of the FastText unsupervised model to represent tweets as vectors and the implementation of K-means clustering to group tweets into topics. Using a classification task, we found that people respond differently according to the various news topics. The classification task consists of the following: given a reply to a news tweet, we train a supervised algorithm to predict the topic of the news tweet solely from the reply. Furthermore, we show how the Colombian peace treaty has had a profound impact on Colombian society, as it is the topic in which most people engage to show their opinions. |
Tasks | |
Published | 2019-10-21 |
URL | https://arxiv.org/abs/1910.09114v1 |
https://arxiv.org/pdf/1910.09114v1.pdf | |
PWC | https://paperswithcode.com/paper/using-machine-learning-and-information |
Repo | |
Framework | |
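Both topic-discovery pipelines from the abstract are easy to sketch: bag-of-words with LDA, and FastText tweet vectors clustered with K-means. The toy corpus and all parameters below are illustrative.

```python
# Two topic-discovery pipelines: (1) bag-of-words + LDA; (2) FastText + K-means.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import KMeans
from gensim.models import FastText

tweets = [["peace", "treaty", "signed"], ["football", "match", "tonight"]]

# Pipeline 1: bag-of-words counts fed to Latent Dirichlet Allocation.
bow = CountVectorizer().fit_transform(" ".join(t) for t in tweets)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(bow)

# Pipeline 2: unsupervised FastText word vectors, averaged per tweet,
# then grouped into topics with K-means.
ft = FastText(sentences=tweets, vector_size=50, min_count=1, epochs=10)
vecs = np.stack([ft.wv[t].mean(axis=0) for t in tweets])
topics = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vecs)
```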
EEG based Continuous Speech Recognition using Transformers
Title | EEG based Continuous Speech Recognition using Transformers |
Authors | Gautam Krishna, Co Tran, Mason Carnahan, Ahmed H Tewfik |
Abstract | In this paper we investigate continuous speech recognition from electroencephalography (EEG) features using a recently introduced end-to-end transformer-based automatic speech recognition (ASR) model. Our results show that the transformer-based model demonstrates faster inference and training than recurrent neural network (RNN) based sequence-to-sequence EEG models, but the RNN-based models performed better at test time on a limited English vocabulary. |
Tasks | EEG, Speech Recognition |
Published | 2019-12-31 |
URL | https://arxiv.org/abs/2001.00501v2 |
https://arxiv.org/pdf/2001.00501v2.pdf | |
PWC | https://paperswithcode.com/paper/eeg-based-continuous-speech-recognition-using |
Repo | |
Framework | |
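A schematic of the model family involved: EEG feature frames pass through a transformer encoder and are decoded into characters. The paper uses an end-to-end transformer ASR model; the CTC-style output head and all dimensions below are simplifying assumptions.

```python
# Transformer encoder over EEG feature frames with a CTC-style output head.
import torch
import torch.nn as nn

class EEGTransformerASR(nn.Module):
    def __init__(self, n_feats=30, d_model=128, n_heads=4, n_layers=4, vocab=29):
        super().__init__()
        self.proj = nn.Linear(n_feats, d_model)  # lift EEG features to model dim
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.out = nn.Linear(d_model, vocab)     # characters + CTC blank

    def forward(self, x):                        # x: (batch, time, n_feats)
        h = self.encoder(self.proj(x))
        return self.out(h).log_softmax(-1)       # log-probs for nn.CTCLoss
```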
A Survey of Big Data Machine Learning Applications Optimization in Cloud Data Centers and Networks
Title | A Survey of Big Data Machine Learning Applications Optimization in Cloud Data Centers and Networks |
Authors | Sanaa Hamid Mohamed, Taisir E. H. El-Gorashi, Jaafar M. H. Elmirghani |
Abstract | This survey article reviews the challenges associated with deploying and optimizing big data applications and machine learning algorithms in cloud data centers and networks. The MapReduce programming model and its widely used open-source platform, Hadoop, are enabling the development of a large number of cloud-based services and big data applications. MapReduce and Hadoop thus introduce innovative, efficient, and accelerated intensive computations and analytics. These services usually utilize commodity clusters within geographically-distributed data centers and provide cost-effective and elastic solutions. However, the increasing traffic between and within the data centers that migrate, store, and process big data is becoming a bottleneck that calls for enhanced infrastructures capable of reducing congestion and power consumption. Moreover, enterprises with multiple tenants requesting various big data services are challenged by the need to optimize leasing their resources at reduced running costs and power consumption while avoiding under- or over-utilization. In this survey, we present a summary of the characteristics of various big data programming models and applications and provide a review of cloud computing infrastructures and related technologies, such as virtualization and software-defined networking, that increasingly support big data systems. Moreover, we provide a brief review of data center topologies, routing protocols, and traffic characteristics, and emphasize the implications of big data for such cloud data centers and their supporting networks. Wide-ranging efforts have been devoted to optimizing systems that handle big data in terms of various application performance metrics and/or infrastructure energy efficiency. Finally, some insights and future research directions are provided. |
Tasks | |
Published | 2019-10-01 |
URL | https://arxiv.org/abs/1910.00731v1 |
https://arxiv.org/pdf/1910.00731v1.pdf | |
PWC | https://paperswithcode.com/paper/a-survey-of-big-data-machine-learning |
Repo | |
Framework | |