April 2, 2020

Paper Group ANR 119

Annotating and Extracting Synthesis Process of All-Solid-State Batteries from Scientific Literature

Title Annotating and Extracting Synthesis Process of All-Solid-State Batteries from Scientific Literature
Authors Fusataka Kuniyoshi, Kohei Makino, Jun Ozawa, Makoto Miwa
Abstract The synthesis process is essential for achieving computational experiment design in the field of inorganic materials chemistry. In this work, we present a novel corpus of the synthesis process for all-solid-state batteries and an automated machine reading system for extracting the synthesis processes buried in the scientific literature. We define the representation of the synthesis processes using flow graphs, and create a corpus from the experimental sections of 243 papers. The automated machine-reading system consists of a deep learning-based sequence tagger and a simple heuristic rule-based relation extractor. Our experimental results demonstrate that the sequence tagger with the optimal setting can detect the entities with a macro-averaged F1 score of 0.826, while the rule-based relation extractor can achieve high performance with a macro-averaged F1 score of 0.887.
Tasks Reading Comprehension
Published 2020-02-18
URL https://arxiv.org/abs/2002.07339v1
PDF https://arxiv.org/pdf/2002.07339v1.pdf
PWC https://paperswithcode.com/paper/annotating-and-extracting-synthesis-process
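
The abstract above reports macro-averaged F1 scores. As a reminder of what that metric computes, here is a minimal sketch: F1 is calculated per label and then averaged without weighting by label frequency. The label names and the token-level matching are assumptions for illustration only; the paper's own evaluation may score entity spans rather than individual tokens.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Macro-averaged F1: compute F1 per label, then take the unweighted mean."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p where it was wrong
            fn[g] += 1  # missed the gold label g
    scores = []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels, not taken from the corpus described above.
gold = ["MATERIAL", "MATERIAL", "OPERATION", "OPERATION"]
pred = ["MATERIAL", "OPERATION", "OPERATION", "OPERATION"]
score = macro_f1(gold, pred)
```

Because every label contributes equally, a rare label that is tagged poorly pulls the macro average down more than it would pull down a micro average.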

Options of Interest: Temporal Abstraction with Interest Functions

Title Options of Interest: Temporal Abstraction with Interest Functions
Authors Khimya Khetarpal, Martin Klissarov, Maxime Chevalier-Boisvert, Pierre-Luc Bacon, Doina Precup
Abstract Temporal abstraction refers to the ability of an agent to use behaviours of controllers which act for a limited, variable amount of time. The options framework describes such behaviours as consisting of a subset of states in which they can initiate, an internal policy and a stochastic termination condition. However, much of the subsequent work on option discovery has ignored the initiation set, because of difficulty in learning it from data. We provide a generalization of initiation sets suitable for general function approximation, by defining an interest function associated with an option. We derive a gradient-based learning algorithm for interest functions, leading to a new interest-option-critic architecture. We investigate how interest functions can be leveraged to learn interpretable and reusable temporal abstractions. We demonstrate the efficacy of the proposed approach through quantitative and qualitative results, in both discrete and continuous environments.
Published 2020-01-01
URL https://arxiv.org/abs/2001.00271v1
PDF https://arxiv.org/pdf/2001.00271v1.pdf
PWC https://paperswithcode.com/paper/options-of-interest-temporal-abstraction-with
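
One way to picture an interest function as a generalization of an initiation set is as a non-negative weight per option that rescales the policy over options in the current state. The sketch below assumes a multiplicative reweighting followed by renormalization; the abstract does not spell out the formula, so treat the function name and the exact form as illustrative assumptions.

```python
def interest_weighted_policy(option_probs, interest):
    """Reweight a policy over options by per-option interest and renormalize.

    option_probs: probabilities of each option in the current state.
    interest: non-negative interest value of each option in that state.
    Both inputs and the multiplicative form are assumptions for this sketch.
    """
    weighted = [p * i for p, i in zip(option_probs, interest)]
    total = sum(weighted)
    if total == 0:
        raise ValueError("no option has positive interest in this state")
    return [w / total for w in weighted]


# A binary interest vector behaves like a hard initiation set:
# the second option cannot be initiated in this state.
probs = interest_weighted_policy([0.5, 0.3, 0.2], [1.0, 0.0, 1.0])
```

With interest values restricted to 0 and 1 this reduces to masking options outside their initiation set, while intermediate values give a differentiable quantity that a gradient-based method can adjust.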

Self-Constructing Graph Convolutional Networks for Semantic Labeling

Title Self-Constructing Graph Convolutional Networks for Semantic Labeling
Authors Qinghui Liu, Michael Kampffmeyer, Robert Jenssen, Arnt-Børre Salberg
Abstract Graph Neural Networks (GNNs) have received increasing attention in many fields. However, due to the lack of prior graphs, their use for semantic labeling has been limited. Here, we propose a novel architecture called the Self-Constructing Graph (SCG), which makes use of learnable latent variables to generate embeddings and to self-construct the underlying graphs directly from the input features without relying on manually built prior knowledge graphs. SCG can automatically obtain optimized non-local context graphs from complex-shaped objects in aerial imagery. We optimize SCG via an adaptive diagonal enhancement method and a variational lower bound that consists of a customized graph reconstruction term and a Kullback-Leibler divergence regularization term. We demonstrate the effectiveness and flexibility of the proposed SCG on the publicly available ISPRS Vaihingen dataset and our model SCG-Net achieves competitive results in terms of F1-score with much fewer parameters and at a lower computational cost compared to related pure-CNN based work. Our code will be made public soon.
Tasks Knowledge Graphs
Published 2020-03-15
URL https://arxiv.org/abs/2003.06932v1
PDF https://arxiv.org/pdf/2003.06932v1.pdf
PWC https://paperswithcode.com/paper/self-constructing-graph-convolutional

Inducing Cooperative behaviour in Sequential-Social dilemmas through Multi-Agent Reinforcement Learning using Status-Quo Loss

Title Inducing Cooperative behaviour in Sequential-Social dilemmas through Multi-Agent Reinforcement Learning using Status-Quo Loss
Authors Pinkesh Badjatiya, Mausoom Sarkar, Abhishek Sinha, Siddharth Singh, Nikaash Puri, Jayakumar Subramanian, Balaji Krishnamurthy
Abstract In social dilemma situations, individual rationality leads to sub-optimal group outcomes. Several human engagements can be modeled as sequential (multi-step) social dilemmas. However, in contrast to humans, Deep Reinforcement Learning agents trained to optimize individual rewards in sequential social dilemmas converge to selfish, mutually harmful behavior. We introduce a status-quo loss (SQLoss) that encourages an agent to stick to the status quo, rather than repeatedly changing its policy. We show how agents trained with SQLoss evolve cooperative behavior in several social dilemma matrix games. To work with social dilemma games that have visual input, we propose GameDistill. GameDistill uses self-supervision and clustering to automatically extract cooperative and selfish policies from a social dilemma game. We combine GameDistill and SQLoss to show how agents evolve socially desirable cooperative behavior in the Coin Game.
Tasks Multi-agent Reinforcement Learning
Published 2020-01-15
URL https://arxiv.org/abs/2001.05458v2
PDF https://arxiv.org/pdf/2001.05458v2.pdf
PWC https://paperswithcode.com/paper/inducing-cooperation-in-multi-agent-games

Cautious Reinforcement Learning via Distributional Risk in the Dual Domain

Title Cautious Reinforcement Learning via Distributional Risk in the Dual Domain
Authors Junyu Zhang, Amrit Singh Bedi, Mengdi Wang, Alec Koppel
Abstract We study the estimation of risk-sensitive policies in reinforcement learning problems defined by a Markov Decision Process (MDP) whose state and action spaces are countably finite. Prior efforts are predominately afflicted by computational challenges associated with the fact that risk-sensitive MDPs are time-inconsistent. To ameliorate this issue, we propose a new definition of risk, which we call caution, as a penalty function added to the dual objective of the linear programming (LP) formulation of reinforcement learning. The caution measures the distributional risk of a policy, which is a function of the policy’s long-term state occupancy distribution. To solve this problem in an online model-free manner, we propose a stochastic variant of the primal-dual method that uses Kullback-Leibler (KL) divergence as its proximal term. We establish that the number of iterations/samples required to attain approximately optimal solutions of this scheme matches tight dependencies on the cardinality of the state and action spaces, but differs in its dependence on the infinity norm of the gradient of the risk measure. Experiments demonstrate the merits of this approach for improving the reliability of reward accumulation without additional computational burdens.
Published 2020-02-27
URL https://arxiv.org/abs/2002.12475v1
PDF https://arxiv.org/pdf/2002.12475v1.pdf
PWC https://paperswithcode.com/paper/cautious-reinforcement-learning-via

Design Optimisation of Power-Efficient Submarine Line through Machine Learning

Title Design Optimisation of Power-Efficient Submarine Line through Machine Learning
Authors Maria Ionescu, Amirhossein Ghazisaeidi, Jérémie Renaudier, Pascal Pecci, Olivier Courtois
Abstract An optimised subsea system design for energy-efficient SDM operation is demonstrated using machine learning. The removal of gain-flattening filters employed in submarine optical amplifiers can result in capacity gains at no additional overall repeater cost.
Published 2020-02-24
URL https://arxiv.org/abs/2002.11037v1
PDF https://arxiv.org/pdf/2002.11037v1.pdf
PWC https://paperswithcode.com/paper/design-optimisation-of-power-efficient

Efficient Clustering for Stretched Mixtures: Landscape and Optimality

Title Efficient Clustering for Stretched Mixtures: Landscape and Optimality
Authors Kaizheng Wang, Yuling Yan, Mateo Diaz
Abstract This paper considers a canonical clustering problem where one receives unlabeled samples drawn from a balanced mixture of two elliptical distributions and aims for a classifier to estimate the labels. Many popular methods including PCA and k-means require individual components of the mixture to be somewhat spherical, and perform poorly when they are stretched. To overcome this issue, we propose a non-convex program seeking an affine transform to turn the data into a one-dimensional point cloud concentrating around -1 and 1, after which clustering becomes easy. Our theoretical contributions are two-fold: (1) we show that the non-convex loss function exhibits desirable landscape properties as long as the sample size exceeds some constant multiple of the dimension, and (2) we leverage this to prove that an efficient first-order algorithm achieves near-optimal statistical precision even without good initialization. We also propose a general methodology for multi-class clustering tasks with flexible choices of feature transforms and loss objectives.
Published 2020-03-22
URL https://arxiv.org/abs/2003.09960v1
PDF https://arxiv.org/pdf/2003.09960v1.pdf
PWC https://paperswithcode.com/paper/efficient-clustering-for-stretched-mixtures

Theoretical Understanding of Batch-normalization: A Markov Chain Perspective

Title Theoretical Understanding of Batch-normalization: A Markov Chain Perspective
Authors Hadi Daneshmand, Jonas Kohler, Francis Bach, Thomas Hofmann, Aurelien Lucchi
Abstract Batch-normalization (BN) is a key component to effectively train deep neural networks. Empirical evidence has shown that without BN, the training process is prone to instabilities. This is, however, not well understood from a theoretical point of view. Leveraging tools from Markov chain theory, we show that BN has a direct effect on the rank of the pre-activation matrices of a neural network. Specifically, while deep networks without BN exhibit rank collapse and poor training performance, networks equipped with BN have a higher rank. In an extensive set of experiments on standard neural network architectures and datasets, we show that the latter quantity is a good predictor for the optimization speed of training.
Published 2020-03-03
URL https://arxiv.org/abs/2003.01652v2
PDF https://arxiv.org/pdf/2003.01652v2.pdf
PWC https://paperswithcode.com/paper/theoretical-understanding-of-batch

Towards Productionizing Subjective Search Systems

Title Towards Productionizing Subjective Search Systems
Authors Aaron Feng, Shuwei Chen, Yuliang Li, Hiroshi Matsuda, Hidekazu Tamaki, Wang-Chiew Tan
Abstract Existing e-commerce search engines typically support search only over objective attributes, such as price and location, leaving the more desirable subjective attributes, such as romantic vibe and work-life balance, unsearchable. We found that this is also the case for Recruit Group, which operates a wide range of online booking and search services, including jobs, travel, housing, bridal, dining, and beauty, where each service is among the biggest in Japan, if not internationally. We present our progress towards productionizing a recent subjective search prototype (OpineDB) developed by Megagon Labs for Recruit Group. Several components within OpineDB are enhanced to satisfy production demands, including adding a BERT language model pre-trained on massive hospitality domain review corpora. We also found that the challenges of productionizing the system go beyond enhancing the components. In particular, an important requirement in production-quality systems is to instrument a proper way of measuring the search quality, which is extremely tricky when the search results are subjective. This led to the creation of a high-quality benchmark dataset from scratch, involving over 600 queries by user interviews and a collection of more than 120,000 query-entity relevancy labels. Also, we found that the existing search algorithms do not meet the search quality standard required by production systems. Consequently, we enhanced the ranking model by fine-tuning several search algorithms and combining them under a learning-to-rank framework. The model achieves 5%-10% overall precision improvement and 90+% precision on more than half of the benchmark testing queries, making these queries ready for A/B testing. While some enhancements can be immediately applied to other verticals, our experience reveals that benchmarking and fine-tuning ranking algorithms are specific to each domain and cannot be avoided.
Tasks Language Modelling, Learning-To-Rank
Published 2020-03-31
URL https://arxiv.org/abs/2003.13968v1
PDF https://arxiv.org/pdf/2003.13968v1.pdf
PWC https://paperswithcode.com/paper/towards-productionizing-subjective-search

A Time-dependent SIR model for COVID-19 with Undetectable Infected Persons

Title A Time-dependent SIR model for COVID-19 with Undetectable Infected Persons
Authors Yi-Cheng Chen, Ping-En Lu, Cheng-Shang Chang, Tzu-Hsuan Liu
Abstract In this paper, we propose two mathematical models for analyzing and predicting the number of confirmed cases of COVID-19. Our first model is a time-dependent susceptible-infected-recovered (SIR) model that tracks two time series, the transmission rate at time t, and the recovering rate at time t. Our time-dependent SIR method is better than the traditional static SIR model as it can adapt to the change of contagious disease control policies such as city lockdowns. Moreover, it is also more robust than the direct estimation of the number of confirmed cases, as a sudden change of the definition of confirmed cases might result in a spike in the number of new cases. Using the data provided by the National Health Commission of the People’s Republic of China [1], we show that the one-day prediction errors for the numbers of confirmed cases are almost less than 3%. Also, the turning point, defined as the day that the transmission rate is less than the recovering rate, is predicted to be Feb. 17, 2020. After that day, the basic reproduction number is less than 1 if the current contagious disease control policies are maintained in China. In that case, the total number of confirmed cases is predicted to be around 80,000 cases in China under our deterministic model. One problem for the first model is that there are asymptomatic infections for COVID-19. To model this, we propose our second SIR model that has two types of infected persons: detectable infected persons (type I) and undetectable infected cases (type II). Whether there is an outbreak in such a model is characterized by the spectral radius of a 2 by 2 matrix that is closely related to the basic reproduction number. Our numerical results show that there are several countries, including South Korea, Italy, and Iran, that are above the percolation threshold curve, and they are on the verge of COVID-19 outbreaks on Mar. 2, 2020.
Tasks Time Series
Published 2020-02-28
URL https://arxiv.org/abs/2003.00122v2
PDF https://arxiv.org/pdf/2003.00122v2.pdf
PWC https://paperswithcode.com/paper/a-time-dependent-sir-model-for-covid-19
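
To make the first model in the abstract above more concrete, the following is a minimal discrete-time sketch of an SIR simulation in which the transmission rate and the recovering rate are per-day time series rather than constants. The update rule is the textbook SIR difference equation, and the rate schedules are invented for illustration; neither is taken from the paper's fitted values. The turning point follows the abstract's definition: the first day on which the transmission rate is below the recovering rate.

```python
def simulate_sir(beta, gamma, n, i0, r0=0.0):
    """Simulate a discrete-time SIR model with per-day rates.

    beta[t], gamma[t]: transmission and recovering rates on day t.
    n: total population; i0, r0: initial infected and recovered counts.
    Returns a list of (S, I, R) tuples, one per day including day 0.
    """
    s, i, r = n - i0 - r0, i0, r0
    history = [(s, i, r)]
    for b, g in zip(beta, gamma):
        new_infected = b * s * i / n
        new_recovered = g * i
        s = s - new_infected
        i = i + new_infected - new_recovered
        r = r + new_recovered
        history.append((s, i, r))
    return history


def turning_point(beta, gamma):
    """Return the first day where the transmission rate drops below the recovering rate."""
    for day, (b, g) in enumerate(zip(beta, gamma)):
        if b < g:
            return day
    return None


# Hypothetical schedules: transmission falls as control policies take effect.
beta = [0.4, 0.3, 0.2, 0.1, 0.05]
gamma = [0.1] * 5
history = simulate_sir(beta, gamma, n=1_000_000, i0=100)
day = turning_point(beta, gamma)  # day index 4 for these schedules
```

A static SIR model would use a single beta and gamma for every day, so it could not represent the drop in transmission that the schedule above encodes.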

Dissecting Neural ODEs

Title Dissecting Neural ODEs
Authors Stefano Massaroli, Michael Poli, Jinkyoo Park, Atsushi Yamashita, Hajime Asama
Abstract Continuous deep learning architectures have recently re-emerged as variants of Neural Ordinary Differential Equations (Neural ODEs). The infinite-depth approach offered by these models theoretically bridges the gap between deep learning and dynamical systems; however, deciphering their inner working is still an open challenge and most of their applications are currently limited to the inclusion as generic black-box modules. In this work, we “open the box” and offer a system-theoretic perspective, including state augmentation strategies and robustness, with the aim of clarifying the influence of several design choices on the underlying dynamics. We also introduce novel architectures: among them, a Galerkin-inspired depth-varying parameter model and neural ODEs with data-controlled vector fields.
Published 2020-02-19
URL https://arxiv.org/abs/2002.08071v2
PDF https://arxiv.org/pdf/2002.08071v2.pdf
PWC https://paperswithcode.com/paper/dissecting-neural-odes

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

Title ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
Authors Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning
Abstract Masked language modeling (MLM) pre-training methods such as BERT corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens. While they produce good results when transferred to downstream NLP tasks, they generally require large amounts of compute to be effective. As an alternative, we propose a more sample-efficient pre-training task called replaced token detection. Instead of masking the input, our approach corrupts it by replacing some tokens with plausible alternatives sampled from a small generator network. Then, instead of training a model that predicts the original identities of the corrupted tokens, we train a discriminative model that predicts whether each token in the corrupted input was replaced by a generator sample or not. Thorough experiments demonstrate this new pre-training task is more efficient than MLM because the task is defined over all input tokens rather than just the small subset that was masked out. As a result, the contextual representations learned by our approach substantially outperform the ones learned by BERT given the same model size, data, and compute. The gains are particularly strong for small models; for example, we train a model on one GPU for 4 days that outperforms GPT (trained using 30x more compute) on the GLUE natural language understanding benchmark. Our approach also works well at scale, where it performs comparably to RoBERTa and XLNet while using less than 1/4 of their compute and outperforms them when using the same amount of compute.
Tasks Language Modelling
Published 2020-03-23
URL https://arxiv.org/abs/2003.10555v1
PDF https://arxiv.org/pdf/2003.10555v1.pdf
PWC https://paperswithcode.com/paper/electra-pre-training-text-encoders-as-1
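
The replaced token detection task described above can be illustrated by how its training examples are built: some positions are swapped for sampled tokens, and every position gets a binary label saying whether it was replaced. In this sketch a uniform random draw from a toy vocabulary stands in for the small generator network, and the 15% corruption rate is an assumed default; both are illustrative choices, not details given in the abstract.

```python
import random


def make_rtd_example(tokens, vocab, corrupt_prob=0.15, rng=None):
    """Build one replaced-token-detection example.

    Returns (corrupted_tokens, labels) where labels[i] is 1 if position i
    now holds a token different from the original, else 0.
    """
    rng = rng or random.Random(0)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < corrupt_prob:
            # Stand-in for sampling from a small generator network.
            sample = rng.choice(vocab)
        else:
            sample = tok
        corrupted.append(sample)
        # A sampled token identical to the original is labeled as not replaced.
        labels.append(int(sample != tok))
    return corrupted, labels


tokens = ["the", "chef", "cooked", "the", "meal"]
vocab = ["the", "chef", "cooked", "ate", "meal", "a"]
corrupted, labels = make_rtd_example(tokens, vocab, corrupt_prob=0.5)
```

Every position carries a label, which is the property the abstract credits for the efficiency gain: the discriminator receives a training signal from all input tokens rather than only from the corrupted subset.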

SAC: Accelerating and Structuring Self-Attention via Sparse Adaptive Connection

Title SAC: Accelerating and Structuring Self-Attention via Sparse Adaptive Connection
Authors Xiaoya Li, Yuxian Meng, Qinghong Han, Fei Wu, Jiwei Li
Abstract While the self-attention mechanism has been widely used in a variety of tasks, it has the unfortunate property of a quadratic cost with respect to the input length, which makes it difficult to deal with long inputs. In this paper, we present a method for accelerating and structuring self-attentions: Sparse Adaptive Connection (SAC). In SAC, we regard the input sequence as a graph and attention operations are performed between linked nodes. In contrast with previous self-attention models with pre-defined structures (edges), the model learns to construct attention edges to improve task-specific performance. In this way, the model is able to select the most salient nodes and reduce the quadratic complexity regardless of the sequence length. Based on SAC, we show that previous variants of self-attention models are its special cases. Through extensive experiments on neural machine translation, language modeling, graph representation learning and image classification, we demonstrate SAC is competitive with state-of-the-art models while significantly reducing memory cost.
Tasks Graph Representation Learning, Image Classification, Language Modelling, Machine Translation, Representation Learning
Published 2020-03-22
URL https://arxiv.org/abs/2003.09833v1
PDF https://arxiv.org/pdf/2003.09833v1.pdf
PWC https://paperswithcode.com/paper/sac-accelerating-and-structuring-self
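
The idea of performing attention only between linked nodes can be sketched as scaled dot-product attention restricted to an explicit edge list. In the sketch below the edges are supplied by hand, whereas the abstract says the model learns to construct them; the function name and the plain-list data layout are assumptions made to keep the example self-contained.

```python
import math


def sparse_attention(queries, keys, values, edges):
    """Scaled dot-product attention restricted to a given edge list.

    edges: iterable of (i, j) pairs meaning node i attends to node j.
    Nodes without outgoing edges receive a zero vector.
    """
    neighbors = {}
    for i, j in edges:
        neighbors.setdefault(i, []).append(j)
    dim = len(queries[0])
    out_dim = len(values[0])
    outputs = []
    for i, q in enumerate(queries):
        nbrs = neighbors.get(i, [])
        if not nbrs:
            outputs.append([0.0] * out_dim)
            continue
        scores = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(dim) for j in nbrs]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        norm = sum(exps)
        outputs.append(
            [sum(e / norm * values[j][k] for e, j in zip(exps, nbrs)) for k in range(out_dim)]
        )
    return outputs


q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out = sparse_attention(q, k, v, edges=[(0, 1), (2, 0), (2, 1)])
```

The work done is proportional to the number of edges rather than to the square of the sequence length, which is where the memory saving in this kind of scheme comes from.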

A Novel Method of Extracting Topological Features from Word Embeddings

Title A Novel Method of Extracting Topological Features from Word Embeddings
Authors Shafie Gholizadeh, Armin Seyeditabari, Wlodek Zadrozny
Abstract In recent years, topological data analysis has been utilized for a wide range of problems to deal with high dimensional noisy data. While text representations are often high dimensional and noisy, there are only a few works on the application of topological data analysis in natural language processing. In this paper, we introduce a novel algorithm to extract topological features from word embedding representations of text that can be used for text classification. Working on word embeddings, topological data analysis can interpret the embedding high-dimensional space and discover the relations among different embedding dimensions. We will use persistent homology, the most commonly used tool from topological data analysis, for our experiment. Examining our topological algorithm on long textual documents, we will show our defined topological features may outperform conventional text mining features.
Tasks Text Classification, Topological Data Analysis, Word Embeddings
Published 2020-03-29
URL https://arxiv.org/abs/2003.13074v1
PDF https://arxiv.org/pdf/2003.13074v1.pdf
PWC https://paperswithcode.com/paper/a-novel-method-of-extracting-topological

How Much Knowledge Can You Pack Into the Parameters of a Language Model?

Title How Much Knowledge Can You Pack Into the Parameters of a Language Model?
Authors Adam Roberts, Colin Raffel, Noam Shazeer
Abstract It has recently been observed that neural language models trained on unstructured text can implicitly store and retrieve knowledge using natural language queries. In this short paper, we measure the practical utility of this approach by fine-tuning pre-trained models to answer questions without access to any external context or knowledge. We show that this approach scales surprisingly well with model size and outperforms models that explicitly look up knowledge on the open-domain variants of Natural Questions and WebQuestions.
Tasks Language Modelling
Published 2020-02-10
URL https://arxiv.org/abs/2002.08910v2
PDF https://arxiv.org/pdf/2002.08910v2.pdf
PWC https://paperswithcode.com/paper/how-much-knowledge-can-you-pack-into-the