April 2, 2020


Paper Group ANR 218


Derivation of QUBO formulations for sparse estimation. ShadowSync: Performing Synchronization in the Background for Highly Scalable Distributed Training. Deep Time-Stream Framework for Click-Through Rate Prediction by Tracking Interest Evolution. Predictions of 2019-nCoV Transmission Ending via Comprehensive Methods. Investigating Potential Factors …

Derivation of QUBO formulations for sparse estimation

Title Derivation of QUBO formulations for sparse estimation
Authors Tomohiro Yokota, Makiko Konoshima, Hirotaka Tamura, Jun Ohkubo
Abstract We propose a quadratic unconstrained binary optimization (QUBO) formulation of the l1-norm, which enables us to perform sparse estimation with Ising-type annealing methods such as quantum annealing. The QUBO formulation is derived using the Legendre transformation and the Wolfe theorem, which have recently been employed to derive the QUBO formulations of ReLU-type functions. It is shown that a simple application of the derivation method to the l1-norm case results in a redundant variable. Finally, a simplified QUBO formulation is obtained by removing the redundant variable.
Published 2020-01-11
URL https://arxiv.org/abs/2001.03715v2
PDF https://arxiv.org/pdf/2001.03715v2.pdf
PWC https://paperswithcode.com/paper/derivaton-of-qubo-formulations-for-sparse
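
For readers unfamiliar with the term, a QUBO problem asks for a binary vector x that minimizes x^T Q x. The sketch below is only a generic illustration of that objective, solved by exhaustive search on a tiny made-up matrix; it is not the l1-norm formulation derived in the paper, and the names `qubo_energy` and `brute_force_qubo` are hypothetical.

```python
from itertools import product


def qubo_energy(Q, x):
    """Return x^T Q x for a square matrix Q and a binary tuple x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def brute_force_qubo(Q):
    """Try every binary assignment; only feasible for very small n."""
    n = len(Q)
    best = min(product((0, 1), repeat=n), key=lambda x: qubo_energy(Q, x))
    return best, qubo_energy(Q, best)


# Illustrative matrix (an assumption, not taken from the paper):
# the diagonal rewards setting a bit, the off-diagonal term
# penalizes setting both bits at once.
Q_example = [[-1, 2],
             [0, -1]]
best_x, best_energy = brute_force_qubo(Q_example)
```

An annealer replaces the exhaustive search with a physical or simulated relaxation process, which is why expressing a penalty such as the l1-norm in this quadratic binary form matters.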

ShadowSync: Performing Synchronization in the Background for Highly Scalable Distributed Training

Title ShadowSync: Performing Synchronization in the Background for Highly Scalable Distributed Training
Authors Qinqing Zheng, Bor-Yiing Su, Jiyan Yang, Alisson Azzolini, Qiang Wu, Ou Jin, Shri Karandikar, Hagay Lupesko, Liang Xiong, Eric Zhou
Abstract Distributed training is useful to train complicated models to shorten the training time. As each of the workers only sees a small fraction of data, workers need to synchronize on the parameter updates. One of the central questions in distributed training is how to parsimoniously synchronize parameters while preserving model quality. To address this problem, we propose the ShadowSync framework, in which we isolate synchronization from training and run it in the background. In contrast to common strategies including synchronous stochastic gradient descent (SGD), asynchronous SGD, and model averaging on independently trained sub-models, where synchronization happens in the foreground, ShadowSync synchronization is neither part of the backward pass, nor happens every k iterations. Our framework is generic to host various types of synchronization algorithms, and we propose 3 approaches under this theme. The superiority of ShadowSync is confirmed by experiments on training deep neural networks for click-through-rate prediction. Our methods all succeed in making the training throughput linearly scale with the number of trainers. Compared with their foreground counterparts, our methods exhibit neutral to better model quality and better scalability when we keep the number of parameter servers the same. In our training system, which expresses both replication and Hogwild parallelism, ShadowSync also accomplishes the highest example-level parallelism number compared with prior art.
Tasks Click-Through Rate Prediction
Published 2020-03-07
URL https://arxiv.org/abs/2003.03477v1
PDF https://arxiv.org/pdf/2003.03477v1.pdf
PWC https://paperswithcode.com/paper/shadowsync-performing-synchronization-in-the
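
The core idea of moving synchronization off the training path can be sketched with a background thread. This is a toy illustration under loud assumptions: a single scalar weight, a simple averaging rule, and the hypothetical names `Params`, `sync_once`, `shadow_thread` and `train`. It is not one of the three algorithms the paper proposes.

```python
import threading


class Params:
    """A scalar weight guarded by a lock (stand-in for a model replica)."""

    def __init__(self, w):
        self.w = w
        self.lock = threading.Lock()


def sync_once(local, shared, alpha=0.5):
    """Mix the local and shared weights (averaging rule is an assumption)."""
    # Locks are always taken in the same order (shared, then local).
    with shared.lock, local.lock:
        mixed = (1.0 - alpha) * local.w + alpha * shared.w
        local.w = mixed
        shared.w = mixed


def shadow_thread(local, shared, stop, interval=0.001):
    """Synchronize repeatedly in the background until asked to stop."""
    while not stop.is_set():
        sync_once(local, shared)
        stop.wait(interval)


def train(local, grads, lr=0.1):
    """Foreground loop: plain gradient steps, with no synchronization call."""
    for g in grads:
        with local.lock:
            local.w -= lr * g
```

The training loop never calls `sync_once` itself, which mirrors the abstract's point that synchronization is neither part of the backward pass nor tied to every k-th iteration.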

Deep Time-Stream Framework for Click-Through Rate Prediction by Tracking Interest Evolution

Title Deep Time-Stream Framework for Click-Through Rate Prediction by Tracking Interest Evolution
Authors Shu-Ting Shi, Wenhao Zheng, Jun Tang, Qing-Guo Chen, Yao Hu, Jianke Zhu, Ming Li
Abstract Click-through rate (CTR) prediction is an essential task in industrial applications such as video recommendation. Recently, deep learning models have been proposed to learn the representation of users’ overall interests, while ignoring the fact that interests may dynamically change over time. We argue that it is necessary to consider the continuous-time information in CTR models to track user interest trends from rich historical behaviors. In this paper, we propose a novel Deep Time-Stream framework (DTS) which introduces the time information by an ordinary differential equation (ODE). DTS continuously models the evolution of interests using a neural network, and thus is able to tackle the challenge of dynamically representing users’ interests based on their historical behaviors. In addition, our framework can be seamlessly applied to any existing deep CTR models by leveraging the additional Time-Stream Module, while no changes are made to the original CTR models. Experiments on a public dataset as well as a real industry dataset with billions of samples demonstrate the effectiveness of the proposed approaches, which achieve superior performance compared with existing methods.
Tasks Click-Through Rate Prediction
Published 2020-01-08
URL https://arxiv.org/abs/2001.03025v1
PDF https://arxiv.org/pdf/2001.03025v1.pdf
PWC https://paperswithcode.com/paper/deep-time-stream-framework-for-click-through

Predictions of 2019-nCoV Transmission Ending via Comprehensive Methods

Title Predictions of 2019-nCoV Transmission Ending via Comprehensive Methods
Authors Tianyu Zeng, Yunong Zhang, Zhenyu Li, Xiao Liu, Binbin Qiu
Abstract Since the SARS outbreak in 2003, a lot of predictive epidemiological models have been proposed. At the end of 2019, a novel coronavirus, termed 2019-nCoV, broke out and is propagating in China and the world. Here we propose a multi-model ordinary differential equation set neural network (MMODEs-NN) and model-free methods to predict the interprovincial transmissions in mainland China, especially those from Hubei Province. Compared with the previously proposed epidemiological models, the proposed network can simulate the transportations with the ODEs activation method, while the model-free methods based on the sigmoid function, Gaussian function, and Poisson distribution are linear and fast to generate reasonable predictions. According to the numerical experiments and the realities, the special policies for controlling the disease are successful in some provinces, and the transmission of the epidemic, whose outbreak time is close to the beginning of China's Spring Festival travel rush, is more likely to decelerate before February 18 and to end before April 2020. The proposed mathematical and artificial intelligence methods can give consistent and reasonable predictions of the 2019-nCoV ending. We anticipate our work to be a starting point for comprehensive prediction research on 2019-nCoV.
Published 2020-02-12
URL https://arxiv.org/abs/2002.04945v2
PDF https://arxiv.org/pdf/2002.04945v2.pdf
PWC https://paperswithcode.com/paper/predictions-of-2019-ncov-transmission-ending

Investigating Potential Factors Associated with Gender Discrimination in Collaborative Recommender Systems

Title Investigating Potential Factors Associated with Gender Discrimination in Collaborative Recommender Systems
Authors Masoud Mansoury, Himan Abdollahpouri, Jessie Smith, Arman Dehpanah, Mykola Pechenizkiy, Bamshad Mobasher
Abstract The proliferation of personalized recommendation technologies has raised concerns about discrepancies in their recommendation performance across different genders, age groups, and racial or ethnic populations. This varying degree of performance could impact users’ trust in the system and may pose legal and ethical issues in domains where fairness and equity are critical concerns, like job recommendation. In this paper, we investigate several potential factors that could be associated with discriminatory performance of a recommendation algorithm for women versus men. We specifically study several characteristics of user profiles and analyze their possible associations with disparate behavior of the system towards different genders. These characteristics include the anomaly in rating behavior, the entropy of users’ profiles, and the users’ profile size. Our experimental results on a public dataset using four recommendation algorithms show that, based on all three of the mentioned factors, women get less accurate recommendations than men, indicating an unfair nature of recommendation algorithms across genders.
Tasks Recommendation Systems
Published 2020-02-18
URL https://arxiv.org/abs/2002.07786v1
PDF https://arxiv.org/pdf/2002.07786v1.pdf
PWC https://paperswithcode.com/paper/investigating-potential-factors-associated
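
Two of the profile characteristics named in the abstract, profile entropy and profile size, are easy to make concrete. The sketch below assumes that entropy means the Shannon entropy of the distribution of a user's rating values; the abstract does not spell out the exact definition, so this reading and the function names are assumptions.

```python
import math
from collections import Counter


def profile_size(ratings):
    """Number of ratings in a user's profile."""
    return len(ratings)


def profile_entropy(ratings):
    """Shannon entropy (in bits) of the rating values in one profile."""
    n = len(ratings)
    counts = Counter(ratings)
    # Each term is p * log2(p) for the share p of one rating value.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A user who always gives the same rating has entropy 0, while a user who spreads ratings evenly over four values has entropy 2 bits.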

Neural Machine Translation System of Indic Languages – An Attention based Approach

Title Neural Machine Translation System of Indic Languages – An Attention based Approach
Authors Parth Shah, Vishvajit Bakrola
Abstract Neural machine translation (NMT) is a recent and effective technique that has led to remarkable improvements over conventional machine translation techniques. The proposed neural machine translation model, developed for the Gujarati language, is an encoder-decoder with an attention mechanism. In India, almost all languages originate from their ancestral language, Sanskrit, and share inevitable similarities, including lexical and named-entity similarity. Translating into Indic languages is always a challenging task. In this paper, we present a neural machine translation (NMT) system that can efficiently translate Indic languages such as Hindi and Gujarati, which together cover more than 58.49 percent of the total speakers in the country. We evaluate the performance of our NMT model with automatic evaluation metrics such as BLEU, perplexity, and TER. The comparison of our network with Google Translate is also presented, where it outperformed Google Translate by a margin of 6 BLEU points on English-Gujarati translation.
Tasks Machine Translation
Published 2020-02-02
URL https://arxiv.org/abs/2002.02758v1
PDF https://arxiv.org/pdf/2002.02758v1.pdf
PWC https://paperswithcode.com/paper/neural-machine-translation-system-of-indic

Pre-training via Leveraging Assisting Languages and Data Selection for Neural Machine Translation

Title Pre-training via Leveraging Assisting Languages and Data Selection for Neural Machine Translation
Authors Haiyue Song, Raj Dabre, Zhuoyuan Mao, Fei Cheng, Sadao Kurohashi, Eiichiro Sumita
Abstract Sequence-to-sequence (S2S) pre-training using large monolingual data is known to improve performance for various S2S NLP tasks in low-resource settings. However, large monolingual corpora might not always be available for the languages of interest (LOI). To this end, we propose to exploit monolingual corpora of other languages to complement the scarcity of monolingual corpora for the LOI. A case study of low-resource Japanese-English neural machine translation (NMT) reveals that leveraging large Chinese and French monolingual corpora can help overcome the shortage of Japanese and English monolingual corpora, respectively, for S2S pre-training. We further show how to utilize script mapping (Chinese to Japanese) to increase the similarity between the two monolingual corpora leading to further improvements in translation quality. Additionally, we propose simple data-selection techniques to be used prior to pre-training that significantly impact the quality of S2S pre-training. An empirical comparison of our proposed methods reveals that leveraging assisting language monolingual corpora, data selection and script mapping are extremely important for NMT pre-training in low-resource scenarios.
Tasks Machine Translation
Published 2020-01-23
URL https://arxiv.org/abs/2001.08353v1
PDF https://arxiv.org/pdf/2001.08353v1.pdf
PWC https://paperswithcode.com/paper/pre-training-via-leveraging-assisting

Semi-Autoregressive Training Improves Mask-Predict Decoding

Title Semi-Autoregressive Training Improves Mask-Predict Decoding
Authors Marjan Ghazvininejad, Omer Levy, Luke Zettlemoyer
Abstract The recently proposed mask-predict decoding algorithm has narrowed the performance gap between semi-autoregressive machine translation models and the traditional left-to-right approach. We introduce a new training method for conditional masked language models, SMART, which mimics the semi-autoregressive behavior of mask-predict, producing training examples that contain model predictions as part of their inputs. Models trained with SMART produce higher-quality translations when using mask-predict decoding, effectively closing the remaining performance gap with fully autoregressive models.
Tasks Machine Translation
Published 2020-01-23
URL https://arxiv.org/abs/2001.08785v1
PDF https://arxiv.org/pdf/2001.08785v1.pdf
PWC https://paperswithcode.com/paper/semi-autoregressive-training-improves-mask

On robot compliance. A cerebellar control approach

Title On robot compliance. A cerebellar control approach
Authors Ignacio Abadia, Francisco Naveros, Jesus A. Garrido, Eduardo Ros, Niceto R. Luque
Abstract The work presented here is a novel biological approach for the compliant control of a robotic arm in real time (RT). We integrate a spiking cerebellar network at the core of a feedback control loop performing torque-driven control. The spiking cerebellar controller provides torque commands allowing for accurate and coordinated arm movements. To compute these output motor commands, the spiking cerebellar controller receives the robot’s sensorial signals, the robot’s goal behavior, and an instructive signal. These input signals are translated into a set of evolving spiking patterns representing univocally a specific system state at every point of time. Spike-timing-dependent plasticity (STDP) is then supported, allowing for building adaptive control. The spiking cerebellar controller continuously adapts the torque commands provided to the robot from experience as STDP is deployed. Adaptive torque commands, in turn, help the spiking cerebellar controller to cope with built-in elastic elements within the robot’s actuators mimicking human muscles (inherently elastic). We propose a natural integration of a bio-inspired control scheme, based on the cerebellum, with a compliant robot. We prove that our compliant approach outperforms the accuracy of the default factory-installed position control in a set of tasks used for addressing cerebellar motor behavior: controlling six degrees of freedom (DoF) in smooth movements, fast ballistic movements, and unstructured scenario compliant movements.
Published 2020-03-02
URL https://arxiv.org/abs/2003.01033v2
PDF https://arxiv.org/pdf/2003.01033v2.pdf
PWC https://paperswithcode.com/paper/on-robot-compliance-a-cerebellar-control

A Two stage Adaptive Knowledge Transfer Evolutionary Multi-tasking Based on Population Distribution for Multi/Many-Objective Optimization

Title A Two stage Adaptive Knowledge Transfer Evolutionary Multi-tasking Based on Population Distribution for Multi/Many-Objective Optimization
Authors Zhengping Liang, Weiqi Liang, Xiuju Xu, Zexuan Zhu
Abstract Multi-tasking optimization can usually achieve better performance than traditional single-tasking optimization through knowledge transfer between tasks. However, current multi-tasking optimization algorithms have some deficiencies. For high similarity problems, the knowledge that can accelerate the convergence rate of tasks has not been utilized fully. For low similarity problems, the probability of generating negative transfer is high, which may result in optimization performance degradation. In addition, some knowledge transfer methods proposed previously do not fully consider how to deal with the situation in which the population falls into a local optimum. To solve these issues, a two-stage adaptive knowledge transfer evolutionary multi-tasking optimization algorithm based on population distribution, labeled as EMT-PD, is proposed. EMT-PD can accelerate and improve the convergence performance of tasks based on the knowledge extracted from the probability model that reflects the search trend of the whole population. At the first transfer stage, an adaptive weight is used to adjust the step size of an individual’s search, which can reduce the impact of negative transfer. At the second stage of knowledge transfer, the individual’s search range is further adjusted dynamically, which can increase the diversity of the population and is beneficial for jumping out of a local optimum. Experimental results on multi-tasking multi-objective optimization test suites show that EMT-PD is superior to six state-of-the-art optimization algorithms. In order to further investigate the effectiveness of EMT-PD on many-objective optimization problems, a multi-tasking many-objective test suite is designed. The experimental results on it also demonstrate that EMT-PD has obvious competitiveness.
Tasks Transfer Learning
Published 2020-01-03
URL https://arxiv.org/abs/2001.00810v1
PDF https://arxiv.org/pdf/2001.00810v1.pdf
PWC https://paperswithcode.com/paper/a-two-stage-adaptive-knowledge-transfer

Correlated daily time series and forecasting in the M4 competition

Title Correlated daily time series and forecasting in the M4 competition
Authors Anti Ingel, Novin Shahroudi, Markus Kängsepp, Andre Tättar, Viacheslav Komisarenko, Meelis Kull
Abstract We participated in the M4 competition for time series forecasting and describe here our methods for forecasting daily time series. We used an ensemble of five statistical forecasting methods and a method that we refer to as the correlator. Our retrospective analysis using the ground truth values published by the M4 organisers after the competition demonstrates that the correlator was responsible for most of our gains over the naive constant forecasting method. We identify data leakage as one reason for its success, partly due to test data selected from different time intervals, and partly due to quality issues in the original time series. We suggest that future forecasting competitions should provide actual dates for the time series so that some of those leakages could be avoided by the participants.
Tasks Time Series, Time Series Forecasting
Published 2020-03-28
URL https://arxiv.org/abs/2003.12796v2
PDF https://arxiv.org/pdf/2003.12796v2.pdf
PWC https://paperswithcode.com/paper/correlated-daily-time-series-and-forecasting
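
The naive constant forecasting method that the abstract uses as its baseline simply repeats the last observed value across the forecast horizon. A minimal sketch follows; the function names and the mean absolute error helper are illustrative assumptions, not the competition's official scoring code.

```python
def naive_forecast(history, horizon):
    """Repeat the last observed value for every future step."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return [history[-1]] * horizon


def mean_absolute_error(actual, predicted):
    """Average absolute difference between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

Any gain reported over this baseline is therefore a gain over assuming that tomorrow looks exactly like today.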

Conditional Mutual information-based Contrastive Loss for Financial Time Series Forecasting

Title Conditional Mutual information-based Contrastive Loss for Financial Time Series Forecasting
Authors Hanwei Wu, Ather Gattami, Markus Flierl
Abstract We present a method for financial time series forecasting using representation learning techniques. Recent progress on deep autoregressive models has shown their ability to capture long-term dependencies of the sequence data. However, the shortage of available financial data for training will make the deep models susceptible to the overfitting problem. In this paper, we propose a neural-network-powered conditional mutual information (CMI) estimator for learning representations for the forecasting task. Specifically, we first train an encoder to maximize the mutual information between the latent variables and the label information conditioned on the encoded observed variables. Then the features extracted from the trained encoder are used to learn a subsequent logistic regression model for predicting time series movements. Our proposed estimator transforms the CMI maximization problem into a classification problem of whether two encoded representations are sampled from the same class or not. This is equivalent to performing pairwise comparisons of the training datapoints, and thus improves the generalization ability of the deep autoregressive model. Empirical experiments indicate that our proposed method has the potential to advance the state-of-the-art performance.
Tasks Representation Learning, Time Series, Time Series Forecasting
Published 2020-02-18
URL https://arxiv.org/abs/2002.07638v1
PDF https://arxiv.org/pdf/2002.07638v1.pdf
PWC https://paperswithcode.com/paper/conditional-mutual-information-based

BRPO: Batch Residual Policy Optimization

Title BRPO: Batch Residual Policy Optimization
Authors Sungryull Sohn, Yinlam Chow, Jayden Ooi, Ofir Nachum, Honglak Lee, Ed Chi, Craig Boutilier
Abstract In batch reinforcement learning (RL), one often constrains a learned policy to be close to the behavior (data-generating) policy, e.g., by constraining the learned action distribution to differ from the behavior policy by some maximum degree that is the same at each state. This can cause batch RL to be overly conservative, unable to exploit large policy changes at frequently-visited, high-confidence states without risking poor performance at sparsely-visited states. To remedy this, we propose residual policies, where the allowable deviation of the learned policy is state-action-dependent. We derive a new RL method, BRPO, which learns both the policy and allowable deviation that jointly maximize a lower bound on policy performance. We show that BRPO achieves state-of-the-art performance in a number of tasks.
Published 2020-02-08
URL https://arxiv.org/abs/2002.05522v2
PDF https://arxiv.org/pdf/2002.05522v2.pdf
PWC https://paperswithcode.com/paper/brpo-batch-residual-policy-optimization

Generalized Visual Information Analysis via Tensorial Algebra

Title Generalized Visual Information Analysis via Tensorial Algebra
Authors Liang Liao, Stephen John Maybank
Abstract High order data is modeled using matrices whose entries are numerical arrays of a fixed size. These arrays, called t-scalars, form a commutative ring under the convolution product. Matrices with elements in the ring of t-scalars are referred to as t-matrices. The t-matrices can be scaled, added and multiplied in the usual way. There are t-matrix generalizations of positive matrices, orthogonal matrices and Hermitian symmetric matrices. With the t-matrix model, it is possible to generalize many well-known matrix algorithms. In particular, the t-matrices are used to generalize the SVD (Singular Value Decomposition), HOSVD (High Order SVD), PCA (Principal Component Analysis), 2DPCA (Two Dimensional PCA) and GCA (Grassmannian Component Analysis). The generalized t-matrix algorithms, namely TSVD, THOSVD, TPCA, T2DPCA and TGCA, are applied to low-rank approximation, reconstruction, and supervised classification of images. Experiments show that the t-matrix algorithms compare favorably with standard matrix algorithms.
Published 2020-01-31
URL https://arxiv.org/abs/2001.11708v1
PDF https://arxiv.org/pdf/2001.11708v1.pdf
PWC https://paperswithcode.com/paper/generalized-visual-information-analysis-via
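
The statement that fixed-size arrays form a commutative ring under the convolution product can be checked numerically. The sketch below assumes the product is circular convolution, computed through the FFT; that reading, the 3x3 array size, and the names `t_mul` and `t_identity` are assumptions made for illustration.

```python
import numpy as np


def t_mul(a, b):
    """Circular convolution of two same-shape real arrays via the FFT."""
    return np.real(np.fft.ifftn(np.fft.fftn(a) * np.fft.fftn(b)))


def t_identity(shape):
    """Multiplicative identity: a unit impulse at the origin."""
    e = np.zeros(shape)
    e[(0,) * len(shape)] = 1.0
    return e


# Random 3x3 arrays stand in for t-scalars (synthetic data).
rng = np.random.default_rng(0)
a = rng.standard_normal((3, 3))
b = rng.standard_normal((3, 3))
c = rng.standard_normal((3, 3))

commutes = np.allclose(t_mul(a, b), t_mul(b, a))
distributes = np.allclose(t_mul(a, b + c), t_mul(a, b) + t_mul(a, c))
has_identity = np.allclose(t_mul(a, t_identity(a.shape)), a)
```

Because the product becomes element-wise multiplication in the Fourier domain, commutativity and distributivity follow directly from the same properties of ordinary numbers.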

Statistical Exploration of Relationships Between Routine and Agnostic Features Towards Interpretable Risk Characterization

Title Statistical Exploration of Relationships Between Routine and Agnostic Features Towards Interpretable Risk Characterization
Authors Eric Wolsztynski
Abstract As is typical in other fields of application of high throughput systems, radiology is faced with the challenge of interpreting increasingly sophisticated predictive models such as those derived from radiomics analyses. Interpretation may be guided by the learning output from machine learning models, which may however vary greatly with each technique. Whatever this output model, it will raise some essential questions. How do we interpret the prognostic model for clinical implementation? How can we identify potential information structures within sets of radiomic features, in order to create clinically interpretable models? And how can we recombine or exploit potential relationships between features towards improved interpretability? A number of statistical techniques are explored to assess (possibly nonlinear) relationships between radiological features from different angles.
Published 2020-01-28
URL https://arxiv.org/abs/2001.10353v1
PDF https://arxiv.org/pdf/2001.10353v1.pdf
PWC https://paperswithcode.com/paper/statistical-exploration-of-relationships