Paper Group ANR 178
Curvature-Exploiting Acceleration of Elastic Net Computations
Title | Curvature-Exploiting Acceleration of Elastic Net Computations |
Authors | Vien V. Mai, Mikael Johansson |
Abstract | This paper introduces an efficient second-order method for solving the elastic net problem. Its key innovation is a computationally efficient technique for injecting curvature information into the optimization process that admits a strong theoretical performance guarantee. In particular, we show improved run time over popular first-order methods and quantify the speed-up in terms of statistical measures of the data matrix. The improved time complexity is the result of an extensive exploitation of the problem structure and a careful combination of second-order information, variance reduction techniques, and momentum acceleration. Besides the theoretical speed-up, experimental results demonstrate substantial practical performance benefits of curvature information, especially for ill-conditioned data sets. |
Tasks | |
Published | 2019-01-24 |
URL | http://arxiv.org/abs/1901.08523v1 |
http://arxiv.org/pdf/1901.08523v1.pdf | |
PWC | https://paperswithcode.com/paper/curvature-exploiting-acceleration-of-elastic |
Repo | |
Framework | |
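For reference, the elastic net problem this paper accelerates is min_x (1/2n)||Ax - b||^2 + λ1||x||_1 + (λ2/2)||x||_2^2. The sketch below is a plain first-order proximal-gradient baseline of the kind the paper compares against, not the curvature-exploiting method itself; the step size and iteration count are illustrative assumptions.

```python
import numpy as np

def elastic_net_prox_grad(A, b, lam1, lam2, step, iters=500):
    """Proximal gradient for: min_x (1/2n)||Ax - b||^2 + lam1*||x||_1 + (lam2/2)*||x||_2^2.
    This is the first-order baseline the paper improves upon, not the proposed method."""
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(iters):
        grad = A.T @ (A @ x - b) / n + lam2 * x                     # gradient of the smooth part
        z = x - step * grad
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam1, 0.0)   # soft-thresholding prox of the l1 term
    return x
```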
Block-Randomized Stochastic Proximal Gradient for Low-Rank Tensor Factorization
Title | Block-Randomized Stochastic Proximal Gradient for Low-Rank Tensor Factorization |
Authors | Xiao Fu, Shahana Ibrahim, Hoi-To Wai, Cheng Gao, Kejun Huang |
Abstract | This work considers the problem of computing the canonical polyadic decomposition (CPD) of large tensors. Prior works mostly leverage data sparsity to handle this problem, which is not suitable for handling dense tensors that often arise in applications such as medical imaging, computer vision, and remote sensing. Stochastic optimization is known for its low memory cost and per-iteration complexity when handling dense data. However, existing stochastic CPD algorithms are not flexible enough to incorporate a variety of constraints/regularizations that are of interest in signal and data analytics. Convergence properties of many such algorithms are also unclear. In this work, we propose a stochastic optimization framework for large-scale CPD with constraints/regularizations. The framework operates in a doubly randomized fashion and can be regarded as a judicious combination of randomized block coordinate descent (BCD) and stochastic proximal gradient (SPG). The algorithm enjoys lightweight updates and a small memory footprint. In addition, this framework entails considerable flexibility: many frequently used regularizers and constraints can be readily handled under the proposed scheme. The approach is also supported by convergence analysis. Numerical results on large-scale dense tensors are employed to showcase the effectiveness of the proposed approach. |
Tasks | Stochastic Optimization |
Published | 2019-01-16 |
URL | https://arxiv.org/abs/1901.05529v3 |
https://arxiv.org/pdf/1901.05529v3.pdf | |
PWC | https://paperswithcode.com/paper/block-randomized-stochastic-proximal-gradient |
Repo | |
Framework | |
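A minimal sketch of the doubly randomized idea described above: one factor matrix is chosen at random and updated with a stochastic proximal gradient step built from a sampled subset of fibers (columns of the corresponding unfolding). The sampling scheme, step size, and nonnegativity prox below are illustrative assumptions, not the authors' exact algorithm; a real implementation would also avoid forming the full Khatri-Rao product.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Khatri-Rao product (rows indexed by (j, k) pairs)."""
    R = B.shape[1]
    return np.einsum('jr,kr->jkr', B, C).reshape(-1, R)

def brspg_step(X, factors, mode, batch, step, nonneg=True):
    """One block-randomized stochastic proximal gradient step (sketch).
    X       : dense 3-way tensor (I x J x K)
    factors : [A, B, C] current CPD factor matrices
    mode    : randomly sampled factor block (0, 1 or 2)
    batch   : indices of sampled fibers in the chosen mode's unfolding."""
    A, B, C = factors
    if mode == 0:
        unfold, H, block = X.reshape(X.shape[0], -1), khatri_rao(B, C), A
    elif mode == 1:
        unfold, H, block = np.moveaxis(X, 1, 0).reshape(X.shape[1], -1), khatri_rao(A, C), B
    else:
        unfold, H, block = np.moveaxis(X, 2, 0).reshape(X.shape[2], -1), khatri_rao(A, B), C
    Hs, Xs = H[batch], unfold[:, batch]               # sampled fibers only
    grad = (block @ Hs.T - Xs) @ Hs / len(batch)      # stochastic gradient for this block
    block = block - step * grad
    if nonneg:                                        # proximal step: here, projection onto
        block = np.maximum(block, 0.0)                # the nonnegative orthant
    factors[mode] = block
    return factors
```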
Unsupervised Deep Learning for Ultra-reliable and Low-latency Communications
Title | Unsupervised Deep Learning for Ultra-reliable and Low-latency Communications |
Authors | Chengjian Sun, Chenyang Yang |
Abstract | In this paper, we study how to solve resource allocation problems in ultra-reliable and low-latency communications using unsupervised deep learning; such problems often take the form of functional optimization problems with quality-of-service (QoS) constraints. We take a joint power and bandwidth allocation problem as an example, which minimizes the total bandwidth required to guarantee the QoS of each user in terms of the delay bound and overall packet loss probability. The global optimal solution is found in a symmetric scenario. A neural network is introduced to find an approximate optimal solution in general scenarios, where the QoS is ensured by using the property that the optimal solution should satisfy as the “supervision signal”. Simulation results show that the learning-based solution performs the same as the optimal solution in the symmetric scenario, and can save around 40% bandwidth compared with the state-of-the-art policy. |
Tasks | |
Published | 2019-04-26 |
URL | https://arxiv.org/abs/1905.13014v2 |
https://arxiv.org/pdf/1905.13014v2.pdf | |
PWC | https://paperswithcode.com/paper/190513014 |
Repo | |
Framework | |
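To make the unsupervised idea concrete, here is a hypothetical sketch: a network maps each user's state to a positive (bandwidth, power) allocation and is trained to minimize total bandwidth plus a penalty on QoS violation, with no labeled optimal allocations. The network sizes, the `qos_violation` surrogate, and the penalty weight are all assumptions standing in for the paper's delay-bound/packet-loss constraints and optimality-condition-based supervision.

```python
import torch
import torch.nn as nn

class AllocationNet(nn.Module):
    """Maps a per-user state to a positive (bandwidth, power) pair."""
    def __init__(self, state_dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2), nn.Softplus())

    def forward(self, state):
        return self.net(state)

def qos_violation(alloc, state):
    """Placeholder for the QoS constraint (delay bound + packet loss).
    Only a hypothetical smooth surrogate, not the paper's expression."""
    bandwidth, power = alloc[:, 0], alloc[:, 1]
    required = state[:, 0]                       # assumed per-user requirement
    return torch.relu(required - bandwidth * torch.log1p(power))

model = AllocationNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(1000):
    state = torch.rand(256, 2)                   # synthetic user states
    alloc = model(state)
    loss = alloc[:, 0].mean() + 10.0 * qos_violation(alloc, state).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```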
Bayesian Network Models for Incomplete and Dynamic Data
Title | Bayesian Network Models for Incomplete and Dynamic Data |
Authors | Marco Scutari |
Abstract | Bayesian networks are a versatile and powerful tool to model complex phenomena and the interplay of their components in a probabilistically principled way. Moving beyond the comparatively simple case of completely observed, static data, which has received the most attention in the literature, in this paper we will review how Bayesian networks can model dynamic data and data with incomplete observations. Such data are the norm at the forefront of research and in practical applications, and Bayesian networks are uniquely positioned to model them due to their explainability and interpretability. |
Tasks | |
Published | 2019-06-15 |
URL | https://arxiv.org/abs/1906.06513v2 |
https://arxiv.org/pdf/1906.06513v2.pdf | |
PWC | https://paperswithcode.com/paper/from-incomplete-dynamic-data-to-bayesian |
Repo | |
Framework | |
Syntactic Recurrent Neural Network for Authorship Attribution
Title | Syntactic Recurrent Neural Network for Authorship Attribution |
Authors | Fereshteh Jafariakinabad, Sansiri Tarnpradab, Kien A. Hua |
Abstract | Writing style is a combination of consistent decisions at different levels of language production, including lexical, syntactic, and structural choices, associated with a specific author (or author group). While lexical-based models have been widely explored in style-based text classification, relying on content makes a model less scalable when dealing with heterogeneous data spanning various topics. Syntactic models, on the other hand, are content-independent and therefore more robust against topic variance. In this paper, we introduce a syntactic recurrent neural network to encode the syntactic patterns of a document in a hierarchical structure. The model first learns the syntactic representation of sentences from sequences of part-of-speech tags. For this purpose, we exploit both convolutional filters and long short-term memory networks to capture the short-term and long-term dependencies among part-of-speech tags within sentences. Subsequently, the syntactic representations of sentences are aggregated into a document representation using a recurrent neural network. Our experimental results on the PAN 2012 dataset for the authorship attribution task show that the syntactic recurrent neural network outperforms a lexical model with an identical architecture by approximately 14% in terms of accuracy. |
Tasks | Text Classification |
Published | 2019-02-26 |
URL | http://arxiv.org/abs/1902.09723v2 |
http://arxiv.org/pdf/1902.09723v2.pdf | |
PWC | https://paperswithcode.com/paper/syntactic-recurrent-neural-network-for |
Repo | |
Framework | |
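The hierarchical architecture in the abstract (POS tags → CNN + LSTM per sentence → recurrent aggregation over sentences → author logits) can be sketched as below. Layer sizes, the choice of a GRU for the document level, and the use of the final hidden state as the sentence vector are assumptions; the authors' exact configuration may differ.

```python
import torch
import torch.nn as nn

class SyntacticRNN(nn.Module):
    """Sketch of a hierarchical syntactic encoder for authorship attribution."""
    def __init__(self, n_pos_tags=50, emb=32, hidden=64, n_authors=10):
        super().__init__()
        self.embed = nn.Embedding(n_pos_tags, emb)
        self.conv = nn.Conv1d(emb, hidden, kernel_size=3, padding=1)
        self.sent_lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.doc_rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_authors)

    def forward(self, pos_tags):                    # (docs, sentences, tags)
        d, s, t = pos_tags.shape
        x = self.embed(pos_tags.view(d * s, t))     # embed POS tags per sentence
        x = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)  # local syntactic patterns
        _, (h, _) = self.sent_lstm(x)               # long-range dependencies within a sentence
        sent_vecs = h[-1].view(d, s, -1)            # one vector per sentence
        _, h_doc = self.doc_rnn(sent_vecs)          # aggregate sentences into a document vector
        return self.out(h_doc[-1])                  # author logits
```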
Gaussian Mean Field Regularizes by Limiting Learned Information
Title | Gaussian Mean Field Regularizes by Limiting Learned Information |
Authors | Julius Kunze, Louis Kirsch, Hippolyt Ritter, David Barber |
Abstract | Variational inference with a factorized Gaussian posterior estimate is a widely used approach for learning parameters and hidden variables. Empirically, a regularizing effect can be observed that is poorly understood. In this work, we show how mean field inference improves generalization by limiting mutual information between learned parameters and the data through noise. We quantify a maximum capacity when the posterior variance is either fixed or learned and connect it to generalization error, even when the KL-divergence in the objective is rescaled. Our experiments demonstrate that bounding information between parameters and data effectively regularizes neural networks on both supervised and unsupervised tasks. |
Tasks | |
Published | 2019-02-12 |
URL | http://arxiv.org/abs/1902.04340v1 |
http://arxiv.org/pdf/1902.04340v1.pdf | |
PWC | https://paperswithcode.com/paper/gaussian-mean-field-regularizes-by-limiting |
Repo | |
Framework | |
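The quantity at the heart of the abstract above is the per-parameter KL divergence between the factorized Gaussian posterior and a Gaussian prior, whose sum bounds the information the posterior can store about the data. A minimal sketch (prior scale and example values are illustrative):

```python
import numpy as np

def gaussian_kl(mu, sigma, prior_sigma=1.0):
    """KL( N(mu, sigma^2) || N(0, prior_sigma^2) ) per parameter, in nats.
    Under mean-field variational inference the sum of these terms limits the
    mutual information between parameters and data, i.e. the learned capacity."""
    return (np.log(prior_sigma / sigma)
            + (sigma**2 + mu**2) / (2 * prior_sigma**2) - 0.5)

# A posterior that barely moves from the prior stores almost no information;
# a confident, shifted posterior stores several nats.
print(gaussian_kl(mu=0.01, sigma=0.99))   # ~0 nats
print(gaussian_kl(mu=2.0,  sigma=0.1))    # ~3.8 nats
```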
Reinforcement Learning for Slate-based Recommender Systems: A Tractable Decomposition and Practical Methodology
Title | Reinforcement Learning for Slate-based Recommender Systems: A Tractable Decomposition and Practical Methodology |
Authors | Eugene Ie, Vihan Jain, Jing Wang, Sanmit Narvekar, Ritesh Agarwal, Rui Wu, Heng-Tze Cheng, Morgane Lustman, Vince Gatto, Paul Covington, Jim McFadden, Tushar Chandra, Craig Boutilier |
Abstract | Most practical recommender systems focus on estimating immediate user engagement without considering the long-term effects of recommendations on user behavior. Reinforcement learning (RL) methods offer the potential to optimize recommendations for long-term user engagement. However, since users are often presented with slates of multiple items - which may have interacting effects on user choice - methods are required to deal with the combinatorics of the RL action space. In this work, we address the challenge of making slate-based recommendations to optimize long-term value using RL. Our contributions are three-fold. (i) We develop SLATEQ, a decomposition of value-based temporal-difference and Q-learning that renders RL tractable with slates. Under mild assumptions on user choice behavior, we show that the long-term value (LTV) of a slate can be decomposed into a tractable function of its component item-wise LTVs. (ii) We outline a methodology that leverages existing myopic learning-based recommenders to quickly develop a recommender that handles LTV. (iii) We demonstrate our methods in simulation, and validate the scalability of decomposed TD-learning using SLATEQ in live experiments on YouTube. |
Tasks | Q-Learning, Recommendation Systems |
Published | 2019-05-29 |
URL | https://arxiv.org/abs/1905.12767v2 |
https://arxiv.org/pdf/1905.12767v2.pdf | |
PWC | https://paperswithcode.com/paper/reinforcement-learning-for-slate-based |
Repo | |
Framework | |
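The SLATEQ decomposition stated in the abstract writes the long-term value of a slate as a choice-probability-weighted sum of item-wise LTVs. The sketch below assumes a simple conditional choice model with a "choose nothing" option; the propensity scores and LTVs are placeholders for learned quantities.

```python
import numpy as np

def slate_q_value(item_scores, item_ltv, null_score=1.0):
    """Slate value under an item-wise LTV decomposition (sketch).
    item_scores : user-choice propensities v(s, i) for the items on the slate
    item_ltv    : learned item-wise long-term values bar-Q(s, i)
    null_score  : propensity of selecting nothing from the slate."""
    scores = np.asarray(item_scores, dtype=float)
    choice_prob = scores / (null_score + scores.sum())   # P(user picks item i)
    return float(choice_prob @ np.asarray(item_ltv, dtype=float))

# Example slate of three items:
print(slate_q_value(item_scores=[2.0, 1.0, 0.5], item_ltv=[5.0, 8.0, 1.0]))
```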
Competitive Multi-Agent Deep Reinforcement Learning with Counterfactual Thinking
Title | Competitive Multi-Agent Deep Reinforcement Learning with Counterfactual Thinking |
Authors | Yue Wang, Yao Wan, Chenwei Zhang, Lixin Cui, Lu Bai, Philip S. Yu |
Abstract | Counterfactual thinking describes a psychological phenomenon in which people re-infer the possible outcomes of different choices about events that have already happened. It helps people gain more experience from mistakes and thus perform better in similar future tasks. This paper investigates counterfactual thinking for agents seeking optimal decision-making strategies in multi-agent reinforcement learning environments. In particular, we propose a multi-agent deep reinforcement learning model whose structure mimics the human psychological process of counterfactual thinking in order to improve the competitive abilities of agents. To this end, our model generates several possible actions (intent actions) with a parallel policy structure and estimates the rewards and regrets for these intent actions based on its current understanding of the environment. Our model incorporates a scenario-based framework to link the estimated regrets with its inner policies. During the iterations, our model updates the parallel policies and the corresponding scenario-based regrets for agents simultaneously. To verify the effectiveness of the proposed model, we conduct extensive experiments on two different environments with real-world applications. Experimental results show that counterfactual thinking helps agents obtain higher cumulative rewards than their opponents under fair-information conditions while maintaining high efficiency. |
Tasks | Decision Making, Multi-agent Reinforcement Learning |
Published | 2019-08-13 |
URL | https://arxiv.org/abs/1908.04573v2 |
https://arxiv.org/pdf/1908.04573v2.pdf | |
PWC | https://paperswithcode.com/paper/competitive-multi-agent-deep-reinforcement |
Repo | |
Framework | |
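An illustrative sketch of the counterfactual-thinking loop described in the abstract: parallel policies propose intent actions, the agent estimates a reward for every intent, and the gap to the best estimate is kept as a regret. The regret-weighted sampling rule here is an assumption, not the authors' exact scenario-based update.

```python
import numpy as np

def counterfactual_step(policies, estimate_reward, state, rng=np.random):
    """One decision step with counterfactual regrets (sketch).
    policies        : list of callables, each mapping a state to an intent action
    estimate_reward : callable (state, action) -> estimated reward (learned model)"""
    intents = [policy(state) for policy in policies]          # parallel intent actions
    est = np.array([estimate_reward(state, a) for a in intents])
    regrets = est.max() - est                                 # counterfactual regrets
    probs = np.exp(-regrets); probs /= probs.sum()            # favor low-regret policies
    chosen = rng.choice(len(intents), p=probs)
    return intents[chosen], regrets                           # regrets feed the policy update
```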
Evaluating Variable-Length Multiple-Option Lists in Chatbots and Mobile Search
Title | Evaluating Variable-Length Multiple-Option Lists in Chatbots and Mobile Search |
Authors | Pepa Atanasova, Georgi Karadzhov, Yasen Kiprov, Preslav Nakov, Fabrizio Sebastiani |
Abstract | In recent years, the proliferation of smart mobile devices has led to the gradual integration of search functionality within mobile platforms. This has created an incentive to move away from the “ten blue links” metaphor, as mobile users are less likely to click on them, expecting to get the answer directly from the snippets. In turn, this has revived the interest in Question Answering. Then, along came chatbots, conversational systems, and messaging platforms, where the user's needs could be better served by the system asking follow-up questions in order to better understand the user's intent. While typically a user would expect a single response at any utterance, a system could also return multiple options for the user to select from, based on different system understandings of the user's intent. However, this possibility should not be overused, as it could confuse and/or annoy the user. How to produce good variable-length lists, given the conflicting objectives of staying short while maximizing the likelihood that a correct answer is included in the list, is an underexplored problem. It is also unclear how to evaluate a system that tries to do that. Here we aim to bridge this gap. In particular, we define some necessary and some optional properties that an evaluation measure fit for this purpose should have. We further show that existing evaluation measures from the IR tradition are not entirely suitable for this setup, and we propose novel evaluation measures that address it satisfactorily. |
Tasks | Question Answering |
Published | 2019-05-25 |
URL | https://arxiv.org/abs/1905.10565v1 |
https://arxiv.org/pdf/1905.10565v1.pdf | |
PWC | https://paperswithcode.com/paper/evaluating-variable-length-multiple-option |
Repo | |
Framework | |
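To illustrate the trade-off the abstract formalizes (short lists versus lists likely to contain a correct answer), here is a toy length-discounted success measure. It is only an illustration of the tension between the two objectives, not one of the measures proposed in the paper, and the discount factor is arbitrary.

```python
def length_discounted_success(options, correct, alpha=0.5):
    """Toy measure: reward containing the correct answer, discount by list length."""
    if correct not in options:
        return 0.0
    return 1.0 / (1.0 + alpha * (len(options) - 1))

print(length_discounted_success(["a"], "a"))            # 1.0  (short and correct)
print(length_discounted_success(["a", "b", "c"], "a"))  # 0.5  (correct but longer)
print(length_discounted_success(["b", "c"], "a"))       # 0.0  (missed entirely)
```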
AutoQ: Automated Kernel-Wise Neural Network Quantization
Title | AutoQ: Automated Kernel-Wise Neural Network Quantization |
Authors | Qian Lou, Feng Guo, Lantao Liu, Minje Kim, Lei Jiang |
Abstract | Network quantization is one of the most hardware-friendly techniques for enabling the deployment of convolutional neural networks (CNNs) on low-power mobile devices. Recent network quantization techniques quantize each weight kernel in a convolutional layer independently for higher inference accuracy, since the weight kernels in a layer exhibit different variances and hence different amounts of redundancy. The quantization bitwidth or bit number (QBN) directly determines the inference accuracy, latency, energy, and hardware overhead. To effectively reduce redundancy and accelerate CNN inference, different weight kernels should be quantized with different QBNs. However, prior works use only one QBN to quantize each convolutional layer or the entire CNN, because the design space of searching for a QBN for each weight kernel is too large. Kernel-wise QBN search is so involved that hand-crafted heuristics from domain experts obtain only sub-optimal results, and it is difficult even for deep reinforcement learning (DRL) agents based on the Deep Deterministic Policy Gradient (DDPG) to find a kernel-wise QBN configuration that achieves reasonable inference accuracy. In this paper, we propose a hierarchical-DRL-based kernel-wise network quantization technique, AutoQ, to automatically search for a QBN for each weight kernel and to choose another QBN for each activation layer. Compared to the models quantized by state-of-the-art DRL-based schemes, on average, the same models quantized by AutoQ reduce the inference latency by 54.06% and decrease the inference energy consumption by 50.69%, while achieving the same inference accuracy. |
Tasks | AutoML, Quantization |
Published | 2019-02-15 |
URL | https://arxiv.org/abs/1902.05690v3 |
https://arxiv.org/pdf/1902.05690v3.pdf | |
PWC | https://paperswithcode.com/paper/autoqb-automl-for-network-quantization-and |
Repo | |
Framework | |
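AutoQ searches a bitwidth (QBN) per weight kernel; the sketch below only shows what a chosen QBN does to one kernel under plain uniform symmetric quantization. The quantizer itself is a standard assumption, not necessarily the exact scheme used in the paper, and the hierarchical-DRL search is not shown.

```python
import numpy as np

def quantize_kernel(kernel, qbn):
    """Uniform symmetric quantization of a single weight kernel to `qbn` bits."""
    levels = 2 ** (qbn - 1) - 1                     # signed integer range
    scale = np.abs(kernel).max() / levels
    scale = scale if scale > 0 else 1.0             # guard against an all-zero kernel
    q = np.clip(np.round(kernel / scale), -levels, levels)
    return q * scale                                # de-quantized weights

conv_kernel = np.random.randn(3, 3)
print(quantize_kernel(conv_kernel, qbn=4))          # coarser with fewer bits
print(quantize_kernel(conv_kernel, qbn=8))          # closer to the original
```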
Phase Contrast Microscopy Cell Population Segmentation: A Survey
Title | Phase Contrast Microscopy Cell Population Segmentation: A Survey |
Authors | Lin Zhang |
Abstract | Phase contrast microscopy (PCM) has been widely used in biomedical research, as it allows users to observe specimens without staining or killing them. One important line of related research employs PCM to monitor live cells. How to segment cell populations in the resulting PCM images has gained increasing attention, as it is a critical step for downstream applications such as cell tracking, cell classification, and others. Many papers have been published addressing this problem from different perspectives. In this paper, we aim to present a comprehensive review of the development of PCM cell population segmentation. |
Tasks | |
Published | 2019-11-25 |
URL | https://arxiv.org/abs/1911.11111v1 |
https://arxiv.org/pdf/1911.11111v1.pdf | |
PWC | https://paperswithcode.com/paper/phase-contrast-microscopy-cell |
Repo | |
Framework | |
Language Model-Driven Unsupervised Neural Machine Translation
Title | Language Model-Driven Unsupervised Neural Machine Translation |
Authors | Wei Zhang, Youyuan Lin, Ruoran Ren, Xiaodong Wang, Zhenshuang Liang, Zhen Huang |
Abstract | Unsupervised neural machine translation (NMT) is associated with noise and errors in synthetic data when executing vanilla back-translation. Here, we explicitly exploit a language model (LM) to drive the construction of an unsupervised NMT system. This involves two steps. First, we initialize NMT models using synthetic data generated via temporary statistical machine translation (SMT). Second, unlike vanilla back-translation, we formulate a weight function that scores synthetic data at each step of the subsequent iterative training; this allows unsupervised training to reach an improved outcome. We present the detailed mathematical construction of our method. Experiments on the WMT2014 English-French and the WMT2016 English-German and English-Russian translation tasks reveal that our method outperforms the best prior systems by more than 3 BLEU points. |
Tasks | Language Modelling, Machine Translation |
Published | 2019-11-10 |
URL | https://arxiv.org/abs/1911.03937v1 |
https://arxiv.org/pdf/1911.03937v1.pdf | |
PWC | https://paperswithcode.com/paper/language-model-driven-unsupervised-neural |
Repo | |
Framework | |
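The abstract's key idea is to score synthetic (back-translated) data with a language model and weight it during iterative training. A plausible stand-in for such a weight function is sketched below; the exact formulation in the paper may differ, and `lm_logprob` is a hypothetical callable representing a target-side language model.

```python
import math

def lm_weight(synthetic_target, lm_logprob, length_normalize=True):
    """Weight a back-translated sentence pair by a target-side LM score (sketch).
    lm_logprob : callable returning the total log-probability of a sentence."""
    score = lm_logprob(synthetic_target)
    if length_normalize:
        score /= max(len(synthetic_target.split()), 1)   # per-token log-probability
    return math.exp(score)          # higher LM probability -> larger training weight

# During training, the per-example NMT loss would be multiplied by this weight,
# down-weighting noisy synthetic pairs instead of trusting them all equally.
```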
Optimizing method for Neural Network based on Genetic Random Weight Change Learning Algorithm
Title | Optimizing method for Neural Network based on Genetic Random Weight Change Learning Algorithm |
Authors | Mohammad Ibrahim Sarker, Zubaer Ibna Mannan, Hyongsuk Kim |
Abstract | The random weight change (RWC) algorithm is well suited and robust for the hardware implementation of neural networks. RWC and the genetic algorithm (GA) are well-known methodologies for optimizing and training neural networks (NNs). Individually, each of these two algorithms has its own strengths and weaknesses, along with separate objectives. Recently, however, researchers have combined the two for better learning and optimization of NNs. In this paper, we propose a methodology that combines RWC and GA, namely Genetic Random Weight Change (GRWC), and demonstrate a way to reduce the complexity of the neural network by removing weak weights of GRWC. In contrast to RWC and GA, GRWC contains an effective optimization procedure that is good at exploring large and complex search spaces, drawing on the GA/RWC synergy. The learning behavior of the proposed algorithm was tested on the MNIST dataset, where it demonstrated its performance. |
Tasks | |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1907.07254v1 |
https://arxiv.org/pdf/1907.07254v1.pdf | |
PWC | https://paperswithcode.com/paper/optimizing-method-for-neural-network-based-on |
Repo | |
Framework | |
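For context, the RWC core that GRWC builds on can be sketched as a simple perturb-and-accept rule over the weight vector. This is only the RWC ingredient under simplifying assumptions; the genetic layer (population, selection, crossover over such learners) and the weak-weight pruning described in the abstract are omitted.

```python
import numpy as np

def rwc_step(weights, loss_fn, delta=0.01, rng=np.random):
    """One random-weight-change step: perturb every weight by +/- delta and
    keep the perturbation only if the loss improves (simplified sketch)."""
    perturb = delta * rng.choice([-1.0, 1.0], size=weights.shape)
    candidate = weights + perturb
    return candidate if loss_fn(candidate) < loss_fn(weights) else weights
```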
Robust supervised classification and feature selection using a primal-dual method
Title | Robust supervised classification and feature selection using a primal-dual method |
Authors | Michel Barlaud, Antonin Chambolle, Jean-Baptiste Caillau |
Abstract | This paper deals with supervised classification and feature selection in high dimensional space. A classical approach is to project data on a low dimensional space and classify by minimizing an appropriate quadratic cost. A strict control on sparsity is moreover obtained by adding an $\ell_1$ constraint, here on the matrix of weights used for projecting the data. Tuning the sparsity bound results in selecting the relevant features for supervised classification. It is well known that using a quadratic cost is not robust to outliers. We cope with this problem by using an $\ell_1$ norm both for the constraint and for the loss function. In this case, the criterion is convex but no longer gradient Lipschitz. A second issue is that we optimize simultaneously the projection matrix and the centers used for classification. In this paper, we provide a novel tailored constrained primal-dual method to compute jointly selected features and classifiers. Extending our primal-dual method to other criteria is easy provided that efficient projection (on the dual ball for the loss data term) and prox (for the regularization term) algorithms are available. We illustrate such an extension in the case of a Frobenius norm for the loss term. We provide a convergence proof of our primal-dual method, and demonstrate its effectiveness on three datasets (one synthetic, two from biological data) on which we compare $\ell_1$ and $\ell_2$ costs. |
Tasks | Feature Selection |
Published | 2019-02-05 |
URL | https://arxiv.org/abs/1902.01600v4 |
https://arxiv.org/pdf/1902.01600v4.pdf | |
PWC | https://paperswithcode.com/paper/robust-supervised-classification-and-feature |
Repo | |
Framework | |
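One of the primitives the abstract relies on is an efficient projection for the sparsity constraint. Below is the standard Euclidean projection onto an $\ell_1$ ball (Duchi et al., 2008) as a self-contained sketch; the full joint update of the projection matrix and class centers is not shown.

```python
import numpy as np

def project_l1_ball(v, radius=1.0):
    """Euclidean projection of a vector onto the l1 ball of given radius."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]                 # sorted magnitudes, descending
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - radius))[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)    # soft-thresholding level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)
```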
A unifying approach for doubly-robust $\ell_1$ regularized estimation of causal contrasts
Title | A unifying approach for doubly-robust $\ell_1$ regularized estimation of causal contrasts |
Authors | Ezequiel Smucler, Andrea Rotnitzky, James M. Robins |
Abstract | We consider inference about a scalar parameter under a non-parametric model based on a one-step estimator, computed as a plug-in estimator plus the empirical mean of an estimator of the parameter’s influence function. We focus on a class of parameters whose influence function depends on two infinite-dimensional nuisance functions and is such that the bias of the one-step estimator of the parameter of interest is the expectation of the product of the estimation errors of the two nuisance functions. Our class includes many important treatment effect contrasts of interest in causal inference and econometrics, such as ATE, ATT, an integrated causal contrast with a continuous treatment, and the mean of an outcome missing not at random. We propose estimators of the target parameter that entertain approximately sparse regression models for the nuisance functions, allowing the number of potential confounders to be even larger than the sample size. By employing sample splitting, cross-fitting and $\ell_1$-regularized regression estimators of the nuisance functions based on objective functions whose directional derivatives agree with those of the parameter’s influence function, we obtain estimators of the target parameter with two desirable robustness properties: (1) they are rate doubly-robust in that they are root-n consistent and asymptotically normal when both nuisance functions follow approximately sparse models, even if one function has a very non-sparse regression coefficient, so long as the other has a sufficiently sparse regression coefficient, and (2) they are model doubly-robust in that they are root-n consistent and asymptotically normal even if one of the nuisance functions does not follow an approximately sparse model, so long as the other nuisance function follows an approximately sparse model with a sufficiently sparse regression coefficient. |
Tasks | Causal Inference |
Published | 2019-04-07 |
URL | https://arxiv.org/abs/1904.03737v3 |
https://arxiv.org/pdf/1904.03737v3.pdf | |
PWC | https://paperswithcode.com/paper/a-unifying-approach-for-doubly-robust-ell_1 |
Repo | |
Framework | |
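The simplest member of the class described above is the classical one-step (AIPW) estimator of the ATE: a plug-in estimate plus the empirical mean of the estimated influence function. The sketch below takes the fitted nuisances as inputs; in the paper they would come from cross-fitted $\ell_1$-regularized regressions with sample splitting, which is not shown here.

```python
import numpy as np

def aipw_ate(y, a, prop_hat, mu1_hat, mu0_hat):
    """One-step (AIPW) estimator of the average treatment effect.
    y        : outcomes
    a        : binary treatment indicators
    prop_hat : estimated propensity scores P(A=1 | X)
    mu1_hat, mu0_hat : estimated outcome regressions E[Y | A=1, X], E[Y | A=0, X]"""
    plug_in = mu1_hat - mu0_hat
    correction = (a * (y - mu1_hat) / prop_hat
                  - (1 - a) * (y - mu0_hat) / (1 - prop_hat))
    return float(np.mean(plug_in + correction))   # plug-in + mean influence-function term
```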