April 2, 2020

3190 words 15 mins read

Paper Group ANR 262

Causally Correct Partial Models for Reinforcement Learning. aiTPR: Attribute Interaction-Tensor Product Representation for Image Caption. On the Effectiveness of Richardson Extrapolation in Machine Learning. Understanding Crowd Behaviors in a Social Event by Passive WiFi Sensing and Data Mining. Policy Teaching via Environment Poisoning: Training-t …

Causally Correct Partial Models for Reinforcement Learning


Title	Causally Correct Partial Models for Reinforcement Learning
Authors	Danilo J. Rezende, Ivo Danihelka, George Papamakarios, Nan Rosemary Ke, Ray Jiang, Theophane Weber, Karol Gregor, Hamza Merzic, Fabio Viola, Jane Wang, Jovana Mitrovic, Frederic Besse, Ioannis Antonoglou, Lars Buesing
Abstract	In reinforcement learning, we can learn a model of future observations and rewards, and use it to plan the agent’s next actions. However, jointly modeling future observations can be computationally expensive or even intractable if the observations are high-dimensional (e.g. images). For this reason, previous works have considered partial models, which model only part of the observation. In this paper, we show that partial models can be causally incorrect: they are confounded by the observations they don’t model, and can therefore lead to incorrect planning. To address this, we introduce a general family of partial models that are provably causally correct, yet remain fast because they do not need to fully model future observations.
Tasks
Published	2020-02-07
URL	https://arxiv.org/abs/2002.02836v1
PDF	https://arxiv.org/pdf/2002.02836v1.pdf
PWC	https://paperswithcode.com/paper/causally-correct-partial-models-for-1
Repo
Framework

aiTPR: Attribute Interaction-Tensor Product Representation for Image Caption


Title	aiTPR: Attribute Interaction-Tensor Product Representation for Image Caption
Authors	Chiranjib Sur
Abstract	Region visual features enhance the generative capability of the machines based on features, however they lack proper interaction attentional perceptions and thus ends up with biased or uncorrelated sentences or pieces of misinformation. In this work, we propose Attribute Interaction-Tensor Product Representation (aiTPR) which is a convenient way of gathering more information through orthogonal combination and learning the interactions as physical entities (tensors) and improving the captions. Compared to previous works, where features are added up to undefined feature spaces, TPR helps in maintaining sanity in combinations and orthogonality helps in defining familiar spaces. We have introduced a new concept layer that defines the objects and also their interactions that can play a crucial role in determination of different descriptions. The interaction portions have contributed heavily for better caption quality and has out-performed different previous works on this domain and MSCOCO dataset. We introduced, for the first time, the notion of combining regional image features and abstracted interaction likelihood embedding for image captioning.
Tasks	Image Captioning
Published	2020-01-27
URL	https://arxiv.org/abs/2001.09545v1
PDF	https://arxiv.org/pdf/2001.09545v1.pdf
PWC	https://paperswithcode.com/paper/aitpr-attribute-interaction-tensor-product
Repo
Framework

On the Effectiveness of Richardson Extrapolation in Machine Learning


Title	On the Effectiveness of Richardson Extrapolation in Machine Learning
Authors	Francis Bach
Abstract	Richardson extrapolation is a classical technique from numerical analysis that can improve the approximation error of an estimation method by combining linearly several estimates obtained from different values of one of its hyperparameters, without the need to know in details the inner structure of the original estimation method. The main goal of this paper is to study when Richardson extrapolation can be used within machine learning, beyond the existing applications to step-size adaptations in stochastic gradient descent. We identify two situations where Richardson interpolation can be useful: (1) when the hyperparameter is the number of iterations of an existing iterative optimization algorithm, with applications to averaged gradient descent and Frank-Wolfe algorithms (where we obtain asymptotically rates of $O(1/k^2)$ on polytopes, where $k$ is the number of iterations), and (2) when it is a regularization parameter, with applications to Nesterov smoothing techniques for minimizing non-smooth functions (where we obtain asymptotically rates close to $O(1/k^2)$ for non-smooth functions), and ridge regression. In all these cases, we show that extrapolation techniques come with no significant loss in performance, but with sometimes strong gains, and we provide theoretical justifications based on asymptotic developments for such gains, as well as empirical illustrations on classical problems from machine learning.
Tasks
Published	2020-02-07
URL	https://arxiv.org/abs/2002.02835v1
PDF	https://arxiv.org/pdf/2002.02835v1.pdf
PWC	https://paperswithcode.com/paper/on-the-effectiveness-of-richardson
Repo
Framework


Title	Understanding Crowd Behaviors in a Social Event by Passive WiFi Sensing and Data Mining
Authors	Yuren Zhou, Billy Pik Lik Lau, Zann Koh, Chau Yuen, Benny Kai Kiat Ng
Abstract	Understanding crowd behaviors in a large social event is crucial for event management. Passive WiFi sensing, by collecting WiFi probe requests sent from mobile devices, provides a better way to monitor crowds compared with people counters and cameras in terms of free interference, larger coverage, lower cost, and more information on people’s movement. In existing studies, however, not enough attention has been paid to the thorough analysis and mining of collected data. Especially, the power of machine learning has not been fully exploited. In this paper, therefore, we propose a comprehensive data analysis framework to fully analyze the collected probe requests to extract three types of patterns related to crowd behaviors in a large social event, with the help of statistics, visualization, and unsupervised machine learning. First, trajectories of the mobile devices are extracted from probe requests and analyzed to reveal the spatial patterns of the crowds’ movement. Hierarchical agglomerative clustering is adopted to find the interconnections between different locations. Next, k-means and k-shape clustering algorithms are applied to extract temporal visiting patterns of the crowds by days and locations, respectively. Finally, by combining with time, trajectories are transformed into spatiotemporal patterns, which reveal how trajectory duration changes over the length and how the overall trends of crowd movement change over time. The proposed data analysis framework is fully demonstrated using real-world data collected in a large social event. Results show that one can extract comprehensive patterns from data collected by a network of passive WiFi sensors.
Tasks
Published	2020-02-05
URL	https://arxiv.org/abs/2002.04401v1
PDF	https://arxiv.org/pdf/2002.04401v1.pdf
PWC	https://paperswithcode.com/paper/understanding-crowd-behaviors-in-a-social
Repo
Framework

Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning


Title	Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning
Authors	Amin Rakhsha, Goran Radanovic, Rati Devidze, Xiaojin Zhu, Adish Singla
Abstract	We study a security threat to reinforcement learning where an attacker poisons the learning environment to force the agent into executing a target policy chosen by the attacker. As a victim, we consider RL agents whose objective is to find a policy that maximizes average reward in undiscounted infinite-horizon problem settings. The attacker can manipulate the rewards or the transition dynamics in the learning environment at training-time and is interested in doing so in a stealthy manner. We propose an optimization framework for finding an \emph{optimal stealthy attack} for different measures of attack cost. We provide sufficient technical conditions under which the attack is feasible and provide lower/upper bounds on the attack cost. We instantiate our attacks in two settings: (i) an \emph{offline} setting where the agent is doing planning in the poisoned environment, and (ii) an \emph{online} setting where the agent is learning a policy using a regret-minimization framework with poisoned feedback. Our results show that the attacker can easily succeed in teaching any target policy to the victim under mild conditions and highlight a significant security threat to reinforcement learning agents in practice.
Tasks
Published	2020-03-28
URL	https://arxiv.org/abs/2003.12909v1
PDF	https://arxiv.org/pdf/2003.12909v1.pdf
PWC	https://paperswithcode.com/paper/policy-teaching-via-environment-poisoning
Repo
Framework

Development of modeling and control strategies for an approximated Gaussian process


Title	Development of modeling and control strategies for an approximated Gaussian process
Authors	Shisheng Cui, Chia-Jung Chang
Abstract	The Gaussian process (GP) model, which has been extensively applied as priors of functions, has demonstrated excellent performance. The specification of a large number of parameters affects the computational efficiency and the feasibility of implementation of a control strategy. We propose a linear model to approximate GPs; this model expands the GP model by a series of basis functions. Several examples and simulation studies are presented to demonstrate the advantages of the proposed method. A control strategy is provided with the proposed linear model.
Tasks
Published	2020-02-12
URL	https://arxiv.org/abs/2002.05105v1
PDF	https://arxiv.org/pdf/2002.05105v1.pdf
PWC	https://paperswithcode.com/paper/development-of-modeling-and-control
Repo
Framework

ArcText: A Unified Text Approach to Describing Convolutional Neural Network Architectures


Title	ArcText: A Unified Text Approach to Describing Convolutional Neural Network Architectures
Authors	Yanan Sun, Ziyao Ren
Abstract	Convolutional Neural Networks (CNNs) have demonstrated their promising performance in the field of computer vision. The superiority of CNNs mainly relies on their architectures that are often manually crafted with extensive human expertise. Data mining on existing CNN can discover useful patterns and fundamental sub-comments from their architectures, providing researchers with strong prior knowledge to design proper CNN architectures when they have no expertise in CNNs. There have been various state-of-the-art data mining algorithms at hand, while there is only rare work that has been done for the mining. One of the main reasons is the barrier between CNN architectures and data mining algorithms. Specifically, the current CNN architecture descriptions cannot be exactly vectorized as input to data mining algorithms. In this paper, we propose a unified approach, named ArcText, to describing CNN architectures based on text. Particularly, four different units and an ordering method have been elaborately designed in ArcText, to uniquely describe the same architecture with sufficient information. Also, the resulted description can be exactly converted back to the corresponding CNN architecture. ArcText bridges the gap between CNN architectures and data mining researchers, and has the potentiality to be utilized to wider scenarios.
Tasks
Published	2020-02-16
URL	https://arxiv.org/abs/2002.10233v3
PDF	https://arxiv.org/pdf/2002.10233v3.pdf
PWC	https://paperswithcode.com/paper/arctext-an-unified-text-approach-to
Repo
Framework

In Defense of Grid Features for Visual Question Answering


Title	In Defense of Grid Features for Visual Question Answering
Authors	Huaizu Jiang, Ishan Misra, Marcus Rohrbach, Erik Learned-Miller, Xinlei Chen
Abstract	Popularized as ‘bottom-up’ attention, bounding box (or region) based visual features have recently surpassed vanilla grid-based convolutional features as the de facto standard for vision and language tasks like visual question answering (VQA). However, it is not clear whether the advantages of regions (e.g. better localization) are the key reasons for the success of bottom-up attention. In this paper, we revisit grid features for VQA and find they can work surprisingly well-running more than an order of magnitude faster with the same accuracy. Through extensive experiments, we verify that this observation holds true across different VQA models, datasets, and generalizes well to other tasks like image captioning. As grid features make the model design and training process much simpler, this enables us to train them end-to-end and also use a more flexible network design. We learn VQA models end-to-end, from pixels directly to answers, and show that strong performance is achievable without using any region annotations in pre-training. We hope our findings help further improve the scientific understanding and the practical application of VQA. Code and features will be made available.
Tasks	Image Captioning, Question Answering, Visual Question Answering
Published	2020-01-10
URL	https://arxiv.org/abs/2001.03615v1
PDF	https://arxiv.org/pdf/2001.03615v1.pdf
PWC	https://paperswithcode.com/paper/in-defense-of-grid-features-for-visual
Repo
Framework

Segmentation and Optimal Region Selection of Physiological Signals using Deep Neural Networks and Combinatorial Optimization


Title	Segmentation and Optimal Region Selection of Physiological Signals using Deep Neural Networks and Combinatorial Optimization
Authors	Jorge Oliveira, Margarida Carvalho, Diogo Marcelo Nogueira, Miguel Coimbra
Abstract	Physiological signals, such as the electrocardiogram and the phonocardiogram are very often corrupted by noisy sources. Usually, artificial intelligent algorithms analyze the signal regardless of its quality. On the other hand, physicians use a completely orthogonal strategy. They do not assess the entire recording, instead they search for a segment where the fundamental and abnormal waves are easily detected, and only then a prognostic is attempted. Inspired by this fact, a new algorithm that automatically selects an optimal segment for a post-processing stage, according to a criteria defined by the user is proposed. In the process, a Neural Network is used to compute the output state probability distribution for each sample. Using the aforementioned quantities, a graph is designed, whereas state transition constraints are physically imposed into the graph and a set of constraints are used to retrieve a subset of the recording that maximizes the likelihood function, proposed by the user. The developed framework is tested and validated in two applications. In both cases, the system performance is boosted significantly, e.g in heart sound segmentation, sensitivity increases 2.4% when compared to the standard approaches in the literature.
Tasks	Combinatorial Optimization
Published	2020-03-17
URL	https://arxiv.org/abs/2003.07981v1
PDF	https://arxiv.org/pdf/2003.07981v1.pdf
PWC	https://paperswithcode.com/paper/segmentation-and-optimal-region-selection-of
Repo
Framework

Reinforcement Learning for Combinatorial Optimization: A Survey


Title	Reinforcement Learning for Combinatorial Optimization: A Survey
Authors	Nina Mazyavkina, Sergey Sviridov, Sergei Ivanov, Evgeny Burnaev
Abstract	Combinatorial optimization (CO) is the workhorse of numerous important applications in operations research, engineering and other fields and, thus, has been attracting enormous attention from the research community for over a century. Many efficient solutions to common problems involve using hand-crafted heuristics to sequentially construct a solution. Therefore, it is intriguing to see how a combinatorial optimization problem can be formulated as a sequential decision making process and whether efficient heuristics can be implicitly learned by a reinforcement learning agent to find a solution. This survey explores the synergy between CO and reinforcement learning (RL) framework, which can become a promising direction for solving combinatorial problems.
Tasks	Combinatorial Optimization, Decision Making
Published	2020-03-07
URL	https://arxiv.org/abs/2003.03600v1
PDF	https://arxiv.org/pdf/2003.03600v1.pdf
PWC	https://paperswithcode.com/paper/reinforcement-learning-for-combinatorial
Repo
Framework

Sparse Optimization for Green Edge AI Inference


Title	Sparse Optimization for Green Edge AI Inference
Authors	Xiangyu Yang, Sheng Hua, Yuanming Shi, Hao Wang, Jun Zhang, Khaled B. Letaief
Abstract	With the rapid upsurge of deep learning tasks at the network edge, effective edge artificial intelligence (AI) inference becomes critical to provide low-latency intelligent services for mobile users via leveraging the edge computing capability. In such scenarios, energy efficiency becomes a primary concern. In this paper, we present a joint inference task selection and downlink beamforming strategy to achieve energy-efficient edge AI inference through minimizing the overall power consumption consisting of both computation and transmission power consumption, yielding a mixed combinatorial optimization problem. By exploiting the inherent connections between the set of task selection and group sparsity structural transmit beamforming vector, we reformulate the optimization as a group sparse beamforming problem. To solve this challenging problem, we propose a log-sum function based three-stage approach. By adopting the log-sum function to enhance the group sparsity, a proximal iteratively reweighted algorithm is developed. Furthermore, we establish the global convergence analysis and provide the ergodic worst-case convergence rate for this algorithm. Simulation results will demonstrate the effectiveness of the proposed approach for improving energy efficiency in edge AI inference systems.
Tasks	Combinatorial Optimization
Published	2020-02-24
URL	https://arxiv.org/abs/2002.10080v2
PDF	https://arxiv.org/pdf/2002.10080v2.pdf
PWC	https://paperswithcode.com/paper/sparse-optimization-for-green-edge-ai
Repo
Framework

Incremental Sampling Without Replacement for Sequence Models


Title	Incremental Sampling Without Replacement for Sequence Models
Authors	Kensen Shi, David Bieber, Charles Sutton
Abstract	Sampling is a fundamental technique, and sampling without replacement is often desirable when duplicate samples are not beneficial. Within machine learning, sampling is useful for generating diverse outputs from a trained model. We present an elegant procedure for sampling without replacement from a broad class of randomized programs, including generative neural models that construct outputs sequentially. Our procedure is efficient even for exponentially-large output spaces. Unlike prior work, our approach is incremental, i.e., samples can be drawn one at a time, allowing for increased flexibility. We also present a new estimator for computing expectations from samples drawn without replacement. We show that incremental sampling without replacement is applicable to many domains, e.g., program synthesis and combinatorial optimization.
Tasks	Combinatorial Optimization, Program Synthesis
Published	2020-02-21
URL	https://arxiv.org/abs/2002.09067v1
PDF	https://arxiv.org/pdf/2002.09067v1.pdf
PWC	https://paperswithcode.com/paper/incremental-sampling-without-replacement-for
Repo
Framework

Constrained Multiagent Rollout and Multidimensional Assignment with the Auction Algorithm


Title	Constrained Multiagent Rollout and Multidimensional Assignment with the Auction Algorithm
Authors	Dimitri Bertsekas
Abstract	We consider an extension of the rollout algorithm that applies to constrained deterministic dynamic programming, including challenging combinatorial optimization problems. The algorithm relies on a suboptimal policy, called base heuristic. Under suitable assumptions, we show that if the base heuristic produces a feasible solution, the rollout algorithm has a cost improvement property: it produces a feasible solution, whose cost is no worse than the base heuristic’s cost. We then focus on multiagent problems, where the control at each stage consists of multiple components (one per agent), which are coupled either through the cost function or the constraints or both. We show that the cost improvement property is maintained with an alternative implementation that has greatly reduced computational requirements, and makes possible the use of rollout in problems with many agents. We demonstrate this alternative algorithm by applying it to layered graph problems that involve both a spatial and a temporal structure. We consider in some detail a prominent example of such problems: multidimensional assignment, where we use the auction algorithm for 2-dimensional assignment as a base heuristic. This auction algorithm is particularly well-suited for our context, because through the use of prices, it can advantageously use the solution of an assignment problem as a starting point for solving other related assignment problems, and this can greatly speed up the execution of the rollout algorithm.
Tasks	Combinatorial Optimization
Published	2020-02-18
URL	https://arxiv.org/abs/2002.07407v1
PDF	https://arxiv.org/pdf/2002.07407v1.pdf
PWC	https://paperswithcode.com/paper/constrained-multiagent-rollout-and
Repo
Framework

Evolutionary Bi-objective Optimization for the Dynamic Chance-Constrained Knapsack Problem Based on Tail Bound Objectives


Title	Evolutionary Bi-objective Optimization for the Dynamic Chance-Constrained Knapsack Problem Based on Tail Bound Objectives
Authors	Hirad Assimi, Oscar Harper, Yue Xie, Aneta Neumann, Frank Neumann
Abstract	Real-world combinatorial optimization problems are often stochastic and dynamic. Therefore, it is essential to make optimal and reliable decisions with a holistic approach. In this paper, we consider the dynamic chance-constrained knapsack problem where the weight of each item is stochastic, the capacity constraint changes dynamically over time, and the objective is to maximize the total profit subject to the probability that total weight exceeds the capacity. We make use of prominent tail inequalities such as Chebyshev’s inequality, and Chernoff bound to approximate the probabilistic constraint. Our key contribution is to introduce an additional objective which estimates the minimal capacity bound for a given stochastic solution that still meets the chance constraint. This objective helps to cater for dynamic changes to the stochastic problem. We apply single- and multi-objective evolutionary algorithms to the problem and show how bi-objective optimization can help to deal with dynamic chance-constrained problems.
Tasks	Combinatorial Optimization
Published	2020-02-17
URL	https://arxiv.org/abs/2002.06766v1
PDF	https://arxiv.org/pdf/2002.06766v1.pdf
PWC	https://paperswithcode.com/paper/evolutionary-bi-objective-optimization-for
Repo
Framework

Reinforcement Learning Enhanced Quantum-inspired Algorithm for Combinatorial Optimization


Title	Reinforcement Learning Enhanced Quantum-inspired Algorithm for Combinatorial Optimization
Authors	Dmitrii Beloborodov, A. E. Ulanov, Jakob N. Foerster, Shimon Whiteson, A. I. Lvovsky
Abstract	Quantum hardware and quantum-inspired algorithms are becoming increasingly popular for combinatorial optimization. However, these algorithms may require careful hyperparameter tuning for each problem instance. We use a reinforcement learning agent in conjunction with a quantum-inspired algorithm to solve the Ising energy minimization problem, which is equivalent to the Maximum Cut problem. The agent controls the algorithm by tuning one of its parameters with the goal of improving recently seen solutions. We propose a new Rescaled Ranked Reward (R3) method that enables stable single-player version of self-play training that helps the agent to escape local optima. The training on any problem instance can be accelerated by applying transfer learning from an agent trained on randomly generated problems. Our approach allows sampling high-quality solutions to the Ising problem with high probability and outperforms both baseline heuristics and a black-box hyperparameter optimization approach.
Tasks	Combinatorial Optimization, Hyperparameter Optimization, Transfer Learning
Published	2020-02-11
URL	https://arxiv.org/abs/2002.04676v2
PDF	https://arxiv.org/pdf/2002.04676v2.pdf
PWC	https://paperswithcode.com/paper/reinforcement-learning-enhanced-quantum
Repo
Framework