April 2, 2020

3058 words 15 mins read

Paper Group ANR 119

Annotating and Extracting Synthesis Process of All-Solid-State Batteries from Scientific Literature. Options of Interest: Temporal Abstraction with Interest Functions. Self-Constructing Graph Convolutional Networks for Semantic Labeling. Inducing Cooperative behaviour in Sequential-Social dilemmas through Multi-Agent Reinforcement Learning using St …

Annotating and Extracting Synthesis Process of All-Solid-State Batteries from Scientific Literature

Title Annotating and Extracting Synthesis Process of All-Solid-State Batteries from Scientific Literature
Authors Fusataka Kuniyoshi, Kohei Makino, Jun Ozawa, Makoto Miwa
Abstract The synthesis process is essential for achieving computational experiment design in the field of inorganic materials chemistry. In this work, we present a novel corpus of the synthesis process for all-solid-state batteries and an automated machine reading system for extracting the synthesis processes buried in the scientific literature. We define the representation of the synthesis processes using flow graphs, and create a corpus from the experimental sections of 243 papers. The automated machine reading system is built from a deep learning-based sequence tagger and a simple heuristic rule-based relation extractor. Our experimental results demonstrate that the sequence tagger with the optimal setting can detect the entities with a macro-averaged F1 score of 0.826, while the rule-based relation extractor can achieve high performance with a macro-averaged F1 score of 0.887.
Tasks Reading Comprehension
Published 2020-02-18
URL https://arxiv.org/abs/2002.07339v1
PDF https://arxiv.org/pdf/2002.07339v1.pdf
PWC https://paperswithcode.com/paper/annotating-and-extracting-synthesis-process
Repo
Framework
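
The abstract describes synthesis processes represented as flow graphs, extracted by a sequence tagger plus a simple heuristic relation extractor. The sketch below is a minimal, hypothetical illustration of how tagged entities could be linked into such a flow graph; the entity types and the nearest-operation linking rule are illustrative assumptions, not the paper's actual annotation schema.

```python
# Hypothetical sketch: build a synthesis "flow graph" from tagged entities.
# Entity types and the nearest-operation linking rule are illustrative
# assumptions, not the schema or rules used in the paper.
from dataclasses import dataclass

@dataclass
class Entity:
    text: str
    label: str     # e.g. "MATERIAL" or "OPERATION"
    position: int  # token offset in the experimental section

def link_materials_to_operations(entities):
    """Attach each MATERIAL to the nearest following OPERATION (simple heuristic)."""
    operations = [e for e in entities if e.label == "OPERATION"]
    edges = []
    for ent in entities:
        if ent.label != "MATERIAL":
            continue
        following = [op for op in operations if op.position > ent.position]
        if following:
            target = min(following, key=lambda op: op.position - ent.position)
            edges.append((ent.text, target.text))
    return edges

tagged = [
    Entity("Li2S", "MATERIAL", 3),
    Entity("P2S5", "MATERIAL", 5),
    Entity("ball-milled", "OPERATION", 8),
    Entity("annealed", "OPERATION", 15),
]
print(link_materials_to_operations(tagged))
# [('Li2S', 'ball-milled'), ('P2S5', 'ball-milled')]
```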

Options of Interest: Temporal Abstraction with Interest Functions

Title Options of Interest: Temporal Abstraction with Interest Functions
Authors Khimya Khetarpal, Martin Klissarov, Maxime Chevalier-Boisvert, Pierre-Luc Bacon, Doina Precup
Abstract Temporal abstraction refers to the ability of an agent to use behaviours of controllers which act for a limited, variable amount of time. The options framework describes such behaviours as consisting of a subset of states in which they can initiate, an internal policy and a stochastic termination condition. However, much of the subsequent work on option discovery has ignored the initiation set, because of the difficulty of learning it from data. We provide a generalization of initiation sets suitable for general function approximation, by defining an interest function associated with an option. We derive a gradient-based learning algorithm for interest functions, leading to a new interest-option-critic architecture. We investigate how interest functions can be leveraged to learn interpretable and reusable temporal abstractions. We demonstrate the efficacy of the proposed approach through quantitative and qualitative results, in both discrete and continuous environments.
Tasks
Published 2020-01-01
URL https://arxiv.org/abs/2001.00271v1
PDF https://arxiv.org/pdf/2001.00271v1.pdf
PWC https://paperswithcode.com/paper/options-of-interest-temporal-abstraction-with
Repo
Framework
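
To make the core idea concrete: instead of a hard initiation set, each option carries a differentiable, state-dependent interest score that reweights how likely the option is to be selected. The snippet below is an illustrative reconstruction of that gating idea only, not the interest-option-critic algorithm from the paper.

```python
# Minimal sketch: soft option selection reweighted by learned interest functions.
# Illustrates replacing hard initiation sets with state-dependent interest
# scores; this is NOT the paper's interest-option-critic algorithm.
import numpy as np

rng = np.random.default_rng(0)
n_options, state_dim = 3, 4
W_interest = rng.normal(size=(n_options, state_dim))           # interest function params
W_policy_over_options = rng.normal(size=(n_options, state_dim))  # policy-over-options params

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def option_distribution(state):
    """Probability of picking each option: option preferences reweighted by interest."""
    interest = sigmoid(W_interest @ state)        # in (0, 1) per option
    prefs = np.exp(W_policy_over_options @ state)
    weighted = interest * prefs
    return weighted / weighted.sum()

state = rng.normal(size=state_dim)
print(option_distribution(state))
```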

Self-Constructing Graph Convolutional Networks for Semantic Labeling

Title Self-Constructing Graph Convolutional Networks for Semantic Labeling
Authors Qinghui Liu, Michael Kampffmeyer, Robert Jenssen, Arnt-Børre Salberg
Abstract Graph Neural Networks (GNNs) have received increasing attention in many fields. However, due to the lack of prior graphs, their use for semantic labeling has been limited. Here, we propose a novel architecture called the Self-Constructing Graph (SCG), which makes use of learnable latent variables to generate embeddings and to self-construct the underlying graphs directly from the input features without relying on manually built prior knowledge graphs. SCG can automatically obtain optimized non-local context graphs from complex-shaped objects in aerial imagery. We optimize SCG via an adaptive diagonal enhancement method and a variational lower bound that consists of a customized graph reconstruction term and a Kullback-Leibler divergence regularization term. We demonstrate the effectiveness and flexibility of the proposed SCG on the publicly available ISPRS Vaihingen dataset, and our model SCG-Net achieves competitive results in terms of F1-score with far fewer parameters and at a lower computational cost compared to related pure-CNN based work. Our code will be made public soon.
Tasks Knowledge Graphs
Published 2020-03-15
URL https://arxiv.org/abs/2003.06932v1
PDF https://arxiv.org/pdf/2003.06932v1.pdf
PWC https://paperswithcode.com/paper/self-constructing-graph-convolutional
Repo
Framework
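
The abstract says SCG uses learnable latent variables to generate embeddings and self-construct the graph directly from input features. Below is a minimal sketch of that idea with a Gaussian latent node embedding and an inner-product adjacency; layer sizes and the exact parameterization are assumptions, and the adaptive diagonal enhancement and variational bound are omitted.

```python
# Minimal sketch of a self-constructing graph layer: node features -> latent
# embeddings -> predicted adjacency. Sizes and parameterization are assumptions;
# the paper's adaptive diagonal enhancement and variational bound are not shown.
import torch
import torch.nn as nn

class SelfConstructingGraph(nn.Module):
    def __init__(self, in_dim, latent_dim):
        super().__init__()
        self.to_mu = nn.Linear(in_dim, latent_dim)
        self.to_logvar = nn.Linear(in_dim, latent_dim)

    def forward(self, x):
        # x: (num_nodes, in_dim), e.g. pooled CNN features of image regions
        mu, logvar = self.to_mu(x), self.to_logvar(x)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        adjacency = torch.sigmoid(z @ z.t())                      # self-constructed graph
        return adjacency, mu, logvar

layer = SelfConstructingGraph(in_dim=64, latent_dim=16)
nodes = torch.randn(10, 64)
A, mu, logvar = layer(nodes)
print(A.shape)  # torch.Size([10, 10])
```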

Inducing Cooperative behaviour in Sequential-Social dilemmas through Multi-Agent Reinforcement Learning using Status-Quo Loss

Title Inducing Cooperative behaviour in Sequential-Social dilemmas through Multi-Agent Reinforcement Learning using Status-Quo Loss
Authors Pinkesh Badjatiya, Mausoom Sarkar, Abhishek Sinha, Siddharth Singh, Nikaash Puri, Jayakumar Subramanian, Balaji Krishnamurthy
Abstract In social dilemma situations, individual rationality leads to sub-optimal group outcomes. Several human engagements can be modeled as sequential (multi-step) social dilemmas. However, in contrast to humans, Deep Reinforcement Learning agents trained to optimize individual rewards in sequential social dilemmas converge to selfish, mutually harmful behavior. We introduce a status-quo loss (SQLoss) that encourages an agent to stick to the status quo, rather than repeatedly changing its policy. We show how agents trained with SQLoss evolve cooperative behavior in several social dilemma matrix games. To work with social dilemma games that have visual input, we propose GameDistill. GameDistill uses self-supervision and clustering to automatically extract cooperative and selfish policies from a social dilemma game. We combine GameDistill and SQLoss to show how agents evolve socially desirable cooperative behavior in the Coin Game.
Tasks Multi-agent Reinforcement Learning
Published 2020-01-15
URL https://arxiv.org/abs/2001.05458v2
PDF https://arxiv.org/pdf/2001.05458v2.pdf
PWC https://paperswithcode.com/paper/inducing-cooperation-in-multi-agent-games
Repo
Framework
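
The abstract only states that SQLoss encourages the agent to stick to the status quo rather than repeatedly changing its policy. One plausible toy rendering of that idea, shown below, penalizes the policy for assigning low probability to its previously taken action; this is my reading of the abstract, not the loss actually defined in the paper.

```python
# Toy rendering of a "status-quo"-style penalty: discourage switching away from
# the previously taken action. A plausible reading of the abstract, not the
# SQLoss defined in the paper.
import torch
import torch.nn.functional as F

def status_quo_penalty(logits, prev_action, weight=0.1):
    """Extra loss term: negative log-probability of repeating the previous action."""
    log_probs = F.log_softmax(logits, dim=-1)
    return -weight * log_probs[..., prev_action]

logits = torch.tensor([1.2, -0.3, 0.4])  # current policy logits over 3 actions
print(status_quo_penalty(logits, prev_action=0).item())
```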

Cautious Reinforcement Learning via Distributional Risk in the Dual Domain

Title Cautious Reinforcement Learning via Distributional Risk in the Dual Domain
Authors Junyu Zhang, Amrit Singh Bedi, Mengdi Wang, Alec Koppel
Abstract We study the estimation of risk-sensitive policies in reinforcement learning problems defined by a Markov Decision Process (MDP) whose state and action spaces are countably finite. Prior efforts are predominantly afflicted by computational challenges associated with the fact that risk-sensitive MDPs are time-inconsistent. To ameliorate this issue, we propose a new definition of risk, which we call caution, as a penalty function added to the dual objective of the linear programming (LP) formulation of reinforcement learning. Caution measures the distributional risk of a policy, which is a function of the policy’s long-term state occupancy distribution. To solve this problem in an online model-free manner, we propose a stochastic variant of the primal-dual method that uses Kullback-Leibler (KL) divergence as its proximal term. We establish that the number of iterations/samples required to attain approximately optimal solutions of this scheme matches tight dependencies on the cardinality of the state and action spaces, but differs in its dependence on the infinity norm of the gradient of the risk measure. Experiments demonstrate the merits of this approach for improving the reliability of reward accumulation without additional computational burdens.
Tasks
Published 2020-02-27
URL https://arxiv.org/abs/2002.12475v1
PDF https://arxiv.org/pdf/2002.12475v1.pdf
PWC https://paperswithcode.com/paper/cautious-reinforcement-learning-via
Repo
Framework
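
Concretely, the abstract describes caution as a penalty added to the occupancy-measure (dual LP) formulation of reinforcement learning. A schematic form of that objective, with the risk functional ρ left abstract (its exact form is defined in the paper, not here), is:

```latex
% Schematic only: LP over state-action occupancy measures \mu with a caution
% penalty \rho weighted by \lambda \ge 0; \xi is the initial state distribution.
\max_{\mu \ge 0} \; \sum_{s,a} \mu(s,a)\, r(s,a) \;-\; \lambda\, \rho(\mu)
\quad \text{s.t.} \quad
\sum_{a} \mu(s',a) \;=\; (1-\gamma)\,\xi(s') \;+\; \gamma \sum_{s,a} P(s' \mid s,a)\, \mu(s,a)
\;\;\; \forall s'.
```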

Design Optimisation of Power-Efficient Submarine Line through Machine Learning

Title Design Optimisation of Power-Efficient Submarine Line through Machine Learning
Authors Maria Ionescu, Amirhossein Ghazisaeidi, Jérémie Renaudier, Pascal Pecci, Olivier Courtois
Abstract An optimised subsea system design for energy-efficient SDM operation is demonstrated using machine learning. The removal of gain-flattening filters employed in submarine optical amplifiers can result in capacity gains at no additional overall repeater cost.
Tasks
Published 2020-02-24
URL https://arxiv.org/abs/2002.11037v1
PDF https://arxiv.org/pdf/2002.11037v1.pdf
PWC https://paperswithcode.com/paper/design-optimisation-of-power-efficient
Repo
Framework

Efficient Clustering for Stretched Mixtures: Landscape and Optimality

Title Efficient Clustering for Stretched Mixtures: Landscape and Optimality
Authors Kaizheng Wang, Yuling Yan, Mateo Diaz
Abstract This paper considers a canonical clustering problem where one receives unlabeled samples drawn from a balanced mixture of two elliptical distributions and aims for a classifier to estimate the labels. Many popular methods including PCA and k-means require individual components of the mixture to be somewhat spherical, and perform poorly when they are stretched. To overcome this issue, we propose a non-convex program seeking an affine transform that turns the data into a one-dimensional point cloud concentrating around -1 and 1, after which clustering becomes easy. Our theoretical contributions are two-fold: (1) we show that the non-convex loss function exhibits desirable landscape properties as long as the sample size exceeds some constant multiple of the dimension, and (2) we leverage this to prove that an efficient first-order algorithm achieves near-optimal statistical precision even without good initialization. We also propose a general methodology for multi-class clustering tasks with flexible choices of feature transforms and loss objectives.
Tasks
Published 2020-03-22
URL https://arxiv.org/abs/2003.09960v1
PDF https://arxiv.org/pdf/2003.09960v1.pdf
PWC https://paperswithcode.com/paper/efficient-clustering-for-stretched-mixtures
Repo
Framework
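
To illustrate the geometric idea (an affine map whose outputs concentrate around -1 and 1, after which thresholding clusters the data), the sketch below gradient-descends a simple quartic surrogate of that objective. The surrogate loss and optimization details are assumptions; the paper's actual loss function and algorithm may differ.

```python
# Illustrative sketch: fit an affine map x -> w.x + b whose outputs cluster
# near -1 and +1, then threshold at 0 to cluster. The quartic surrogate
# ((w.x + b)^2 - 1)^2 is an assumption; the paper's exact objective may differ.
import numpy as np

rng = np.random.default_rng(0)
n, d = 400, 5
labels = rng.integers(0, 2, size=n)
centers = np.where(labels[:, None] == 1, 2.0, -2.0) * np.ones((n, d))
X = centers + rng.normal(size=(n, d)) * np.array([3.0, 1, 1, 1, 1])  # "stretched" noise

w, b, lr = rng.normal(size=d) * 0.1, 0.0, 5e-4
for _ in range(3000):
    u = X @ w + b                      # one-dimensional projection
    grad_u = 4.0 * (u**2 - 1.0) * u / n  # d/du of mean((u^2 - 1)^2)
    w -= lr * (X.T @ grad_u)
    b -= lr * grad_u.sum()

pred = (X @ w + b > 0).astype(int)
acc = max((pred == labels).mean(), (pred != labels).mean())  # labels known up to sign
print(f"clustering accuracy: {acc:.2f}")
```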

Theoretical Understanding of Batch-normalization: A Markov Chain Perspective

Title Theoretical Understanding of Batch-normalization: A Markov Chain Perspective
Authors Hadi Daneshmand, Jonas Kohler, Francis Bach, Thomas Hofmann, Aurelien Lucchi
Abstract Batch-normalization (BN) is a key component to effectively train deep neural networks. Empirical evidence has shown that without BN, the training process is prone to instabilities. This is however not well understood from a theoretical point of view. Leveraging tools from Markov chain theory, we show that BN has a direct effect on the rank of the pre-activation matrices of a neural network. Specifically, while deep networks without BN exhibit rank collapse and poor training performance, networks equipped with BN have a higher rank. In an extensive set of experiments on standard neural network architectures and datasets, we show that the latter quantity is a good predictor for the optimization speed of training.
Tasks
Published 2020-03-03
URL https://arxiv.org/abs/2003.01652v2
PDF https://arxiv.org/pdf/2003.01652v2.pdf
PWC https://paperswithcode.com/paper/theoretical-understanding-of-batch
Repo
Framework
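
The central claim is that, without BN, the pre-activation matrices of deep random networks collapse in rank. The short numpy experiment below illustrates that phenomenon on random layers, comparing the numerical rank of the final pre-activations with and without a BN-like per-feature standardization; it is an illustration only, not the paper's experimental protocol.

```python
# Illustration of rank collapse: track the numerical rank of pre-activation
# matrices through random deep layers, with and without a BN-like per-feature
# standardization. Not the paper's exact experimental protocol.
import numpy as np

rng = np.random.default_rng(0)
width, depth, batch = 64, 50, 128

def numerical_rank(M, tol=1e-6):
    s = np.linalg.svd(M, compute_uv=False)
    return int((s > tol * s[0]).sum())

def forward(normalize):
    H = rng.normal(size=(batch, width))
    for _ in range(depth):
        W = rng.normal(size=(width, width)) / np.sqrt(width)
        H = np.tanh(H) @ W                     # pre-activation of the next layer
        if normalize:                          # BN-like: standardize each feature
            H = (H - H.mean(0)) / (H.std(0) + 1e-5)
    return numerical_rank(H)

print("rank without normalization:", forward(False))
print("rank with normalization:   ", forward(True))
```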

Towards Productionizing Subjective Search Systems

Title Towards Productionizing Subjective Search Systems
Authors Aaron Feng, Shuwei Chen, Yuliang Li, Hiroshi Matsuda, Hidekazu Tamaki, Wang-Chiew Tan
Abstract Existing e-commerce search engines typically support search only over objective attributes, such as price and locations, leaving the more desirable subjective attributes, such as romantic vibe and work-life balance, unsearchable. We found that this is also the case for Recruit Group, which operates a wide range of online booking and search services, including jobs, travel, housing, bridal, dining, and beauty, where each service is among the biggest in Japan, if not internationally. We present our progress towards productionizing a recent subjective search prototype (OpineDB) developed by Megagon Labs for Recruit Group. Several components within OpineDB are enhanced to satisfy production demands, including adding a BERT language model pre-trained on massive hospitality domain review corpora. We also found that the challenges of productionizing the system go beyond enhancing the components. In particular, an important requirement in production-quality systems is to instrument a proper way of measuring the search quality, which is extremely tricky when the search results are subjective. This led to the creation of a high-quality benchmark dataset from scratch, involving over 600 queries collected through user interviews and more than 120,000 query-entity relevancy labels. We also found that the existing search algorithms do not meet the search quality standard required by production systems. Consequently, we enhanced the ranking model by fine-tuning several search algorithms and combining them under a learning-to-rank framework. The model achieves 5%-10% overall precision improvement and 90+% precision on more than half of the benchmark testing queries, making these queries ready for A/B testing. While some enhancements can be immediately applied to other verticals, our experience reveals that benchmarking and fine-tuning ranking algorithms are specific to each domain and cannot be avoided.
Tasks Language Modelling, Learning-To-Rank
Published 2020-03-31
URL https://arxiv.org/abs/2003.13968v1
PDF https://arxiv.org/pdf/2003.13968v1.pdf
PWC https://paperswithcode.com/paper/towards-productionizing-subjective-search
Repo
Framework
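
The abstract stresses that measuring subjective search quality required a benchmark of query-entity relevancy labels and precision measurements over it. The sketch below shows one conventional way to compute precision@k from such labels; the data layout and field names are hypothetical, not the paper's benchmark format.

```python
# Hypothetical sketch: precision@k over a benchmark of (query, entity, label)
# relevancy judgments. The data layout and example values are assumptions.

def precision_at_k(ranked_results, relevancy_labels, k=10):
    """ranked_results: {query: [entity_id, ...]}; relevancy_labels: {(query, entity_id): 0/1}."""
    scores = {}
    for query, entities in ranked_results.items():
        top_k = entities[:k]
        hits = sum(relevancy_labels.get((query, e), 0) for e in top_k)
        scores[query] = hits / max(len(top_k), 1)
    return scores

ranked = {"romantic vibe restaurant": ["e3", "e7", "e1"]}
labels = {("romantic vibe restaurant", "e3"): 1,
          ("romantic vibe restaurant", "e7"): 0,
          ("romantic vibe restaurant", "e1"): 1}
print(precision_at_k(ranked, labels, k=3))  # {'romantic vibe restaurant': 0.666...}
```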

A Time-dependent SIR model for COVID-19 with Undetectable Infected Persons

Title A Time-dependent SIR model for COVID-19 with Undetectable Infected Persons
Authors Yi-Cheng Chen, Ping-En Lu, Cheng-Shang Chang, Tzu-Hsuan Liu
Abstract In this paper, we propose two mathematical models for analyzing and predicting the number of confirmed cases of COVID-19. Our first model is a time-dependent susceptible-infected-recovered (SIR) model that tracks two time series: the transmission rate at time t and the recovering rate at time t. Our time-dependent SIR method is better than the traditional static SIR model as it can adapt to changes in contagious disease control policies such as city lockdowns. Moreover, it is also more robust than the direct estimation of the number of confirmed cases, as a sudden change in the definition of confirmed cases might result in a spike in the number of new cases. Using the data provided by the National Health Commission of the People’s Republic of China [1], we show that the one-day prediction errors for the numbers of confirmed cases are almost always below 3%. Also, the turning point, defined as the day on which the transmission rate becomes less than the recovering rate, is predicted to be Feb. 17, 2020. After that day, the basic reproduction number is less than 1 if the current contagious disease control policies are maintained in China. In that case, the total number of confirmed cases is predicted to be around 80,000 cases in China under our deterministic model. One problem with the first model is that there are asymptomatic infections for COVID-19. To model this, we propose our second SIR model that has two types of infected persons: detectable infected persons (type I) and undetectable infected persons (type II). Whether there is an outbreak in such a model is characterized by the spectral radius of a 2-by-2 matrix that is closely related to the basic reproduction number. Our numerical results show that several countries, including South Korea, Italy, and Iran, are above the percolation threshold curve and on the verge of COVID-19 outbreaks as of Mar. 2, 2020.
Tasks Time Series
Published 2020-02-28
URL https://arxiv.org/abs/2003.00122v2
PDF https://arxiv.org/pdf/2003.00122v2.pdf
PWC https://paperswithcode.com/paper/a-time-dependent-sir-model-for-covid-19
Repo
Framework
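
The first model tracks a time-varying transmission rate and recovering rate. Under the standard discrete-time SIR update, and assuming the susceptible fraction stays close to 1 (reasonable early in an epidemic), both rates can be read off directly from consecutive counts of active infected and recovered cases, as the sketch below illustrates. This is a simplified reconstruction, not the authors' full tracking and prediction procedure, and the daily series is a toy example.

```python
# Simplified reconstruction: estimate time-varying SIR rates from daily counts
# of active infected x(t) and cumulative recovered/removed r(t), assuming the
# susceptible fraction is ~1. Toy data; not the authors' full procedure.
import numpy as np

x = np.array([100, 130, 165, 200, 230, 250, 255, 250], dtype=float)  # active infected
r = np.array([  0,   5,  12,  22,  35,  52,  73,  98], dtype=float)  # recovered/removed

gamma_t = (r[1:] - r[:-1]) / x[:-1]               # recovering rate gamma(t)
beta_t = (x[1:] - x[:-1]) / x[:-1] + gamma_t      # transmission rate beta(t)

for t, (b, g) in enumerate(zip(beta_t, gamma_t)):
    print(f"day {t}: beta={b:.3f} gamma={g:.3f} ratio={b/g:.2f}")
# The "turning point" in the paper is the first day with beta(t) < gamma(t),
# i.e. ratio < 1, after which the epidemic is shrinking.
```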

Dissecting Neural ODEs

Title Dissecting Neural ODEs
Authors Stefano Massaroli, Michael Poli, Jinkyoo Park, Atsushi Yamashita, Hajime Asama
Abstract Continuous deep learning architectures have recently re-emerged as variants of Neural Ordinary Differential Equations (Neural ODEs). The infinite-depth approach offered by these models theoretically bridges the gap between deep learning and dynamical systems; however, deciphering their inner workings is still an open challenge and most of their applications are currently limited to their inclusion as generic black-box modules. In this work, we “open the box” and offer a system-theoretic perspective, including state augmentation strategies and robustness, with the aim of clarifying the influence of several design choices on the underlying dynamics. We also introduce novel architectures: among them, a Galerkin-inspired depth-varying parameter model and neural ODEs with data-controlled vector fields.
Tasks
Published 2020-02-19
URL https://arxiv.org/abs/2002.08071v2
PDF https://arxiv.org/pdf/2002.08071v2.pdf
PWC https://paperswithcode.com/paper/dissecting-neural-odes
Repo
Framework
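
Among the new architectures mentioned is a Galerkin-inspired depth-varying parameter model. A minimal sketch of that idea, with the layer weights expanded in a small Fourier basis over depth and integrated with fixed-step Euler, is below; the basis choice, vector field, and integrator are illustrative assumptions rather than the paper's implementation.

```python
# Minimal sketch of a depth-varying-parameter neural ODE: the weight matrix is a
# function of depth s, expanded in a small Fourier ("Galerkin-style") basis, and
# the ODE is integrated with fixed-step Euler. Basis and integrator are assumptions.
import math
import torch
import torch.nn as nn

class GalerkinODEFunc(nn.Module):
    def __init__(self, dim, n_basis=5):
        super().__init__()
        # one weight-matrix coefficient per basis function
        self.coeffs = nn.Parameter(torch.randn(n_basis, dim, dim) * 0.1)
        self.n_basis = n_basis

    def weight(self, s):
        # cosine basis over depth s in [0, 1]
        basis = torch.tensor([math.cos(math.pi * k * s) for k in range(self.n_basis)])
        return torch.einsum("k,kij->ij", basis, self.coeffs)

    def forward(self, s, h):
        return torch.tanh(h @ self.weight(s).t())

def euler_integrate(func, h0, steps=20):
    h, ds = h0, 1.0 / steps
    for i in range(steps):
        h = h + ds * func(i * ds, h)
    return h

func = GalerkinODEFunc(dim=8)
h0 = torch.randn(4, 8)                   # batch of 4 states
print(euler_integrate(func, h0).shape)   # torch.Size([4, 8])
```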

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

Title ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
Authors Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning
Abstract Masked language modeling (MLM) pre-training methods such as BERT corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens. While they produce good results when transferred to downstream NLP tasks, they generally require large amounts of compute to be effective. As an alternative, we propose a more sample-efficient pre-training task called replaced token detection. Instead of masking the input, our approach corrupts it by replacing some tokens with plausible alternatives sampled from a small generator network. Then, instead of training a model that predicts the original identities of the corrupted tokens, we train a discriminative model that predicts whether each token in the corrupted input was replaced by a generator sample or not. Thorough experiments demonstrate this new pre-training task is more efficient than MLM because the task is defined over all input tokens rather than just the small subset that was masked out. As a result, the contextual representations learned by our approach substantially outperform the ones learned by BERT given the same model size, data, and compute. The gains are particularly strong for small models; for example, we train a model on one GPU for 4 days that outperforms GPT (trained using 30x more compute) on the GLUE natural language understanding benchmark. Our approach also works well at scale, where it performs comparably to RoBERTa and XLNet while using less than 1/4 of their compute and outperforms them when using the same amount of compute.
Tasks Language Modelling
Published 2020-03-23
URL https://arxiv.org/abs/2003.10555v1
PDF https://arxiv.org/pdf/2003.10555v1.pdf
PWC https://paperswithcode.com/paper/electra-pre-training-text-encoders-as-1
Repo
Framework
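
The replaced-token-detection objective can be summarized in a few lines: a small generator fills in the masked positions, and the discriminator predicts, for every token, whether it still matches the original. The sketch below shows only the label-construction step with toy tensors; it is not the ELECTRA training code, and note that a generator sample that happens to equal the original counts as "not replaced".

```python
# Sketch of ELECTRA-style label construction for replaced token detection:
# the generator samples replacements at masked positions, and every token gets
# a binary "was this replaced?" label for the discriminator. Toy tensors only.
import torch

original = torch.tensor([[12, 7, 33, 5, 9]])            # original token ids
mask_positions = torch.tensor([[False, True, False, True, False]])

# pretend generator output: sampled ids at the masked positions
generator_samples = torch.tensor([[12, 7, 33, 8, 9]])   # position 1 guessed right, position 3 wrong
corrupted = torch.where(mask_positions, generator_samples, original)

# discriminator targets: 1 where the corrupted token differs from the original.
# The discriminator loss is defined over ALL tokens, not just the masked subset.
replaced_labels = (corrupted != original).long()
print(corrupted)        # tensor([[12,  7, 33,  8,  9]])
print(replaced_labels)  # tensor([[0, 0, 0, 1, 0]])
```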

SAC: Accelerating and Structuring Self-Attention via Sparse Adaptive Connection

Title SAC: Accelerating and Structuring Self-Attention via Sparse Adaptive Connection
Authors Xiaoya Li, Yuxian Meng, Qinghong Han, Fei Wu, Jiwei Li
Abstract While the self-attention mechanism has been widely used in a wide variety of tasks, it has the unfortunate property of a quadratic cost with respect to the input length, which makes it difficult to deal with long inputs. In this paper, we present a method for accelerating and structuring self-attention: Sparse Adaptive Connection (SAC). In SAC, we regard the input sequence as a graph and attention operations are performed between linked nodes. In contrast with previous self-attention models with pre-defined structures (edges), the model learns to construct attention edges to improve task-specific performance. In this way, the model is able to select the most salient nodes and reduce the quadratic complexity regardless of the sequence length. Based on SAC, we show that previous variants of self-attention models are its special cases. Through extensive experiments on neural machine translation, language modeling, graph representation learning and image classification, we demonstrate SAC is competitive with state-of-the-art models while significantly reducing memory cost.
Tasks Graph Representation Learning, Image Classification, Language Modelling, Machine Translation, Representation Learning
Published 2020-03-22
URL https://arxiv.org/abs/2003.09833v1
PDF https://arxiv.org/pdf/2003.09833v1.pdf
PWC https://paperswithcode.com/paper/sac-accelerating-and-structuring-self
Repo
Framework
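
The core idea is that attention is computed only along a set of constructed edges rather than over all token pairs. The sketch below computes attention restricted to an explicit edge list; how SAC actually learns which edges to construct is not shown, and the hand-picked edge list here is purely illustrative.

```python
# Sketch: self-attention restricted to an explicit edge list (i attends to j only
# if (i, j) is an edge), avoiding the full n x n score matrix. How edges are
# chosen/learned in SAC is not shown; the edge list below is hand-picked.
import torch
import torch.nn.functional as F

def sparse_attention(q, k, v, edges):
    """q, k, v: (n, d); edges: (m, 2) long tensor of (target i, source j) pairs."""
    i, j = edges[:, 0], edges[:, 1]
    scores = (q[i] * k[j]).sum(-1) / q.size(-1) ** 0.5   # one score per edge
    weights = torch.zeros_like(scores)
    for node in i.unique():                               # softmax per target node
        idx = (i == node)
        weights[idx] = F.softmax(scores[idx], dim=0)
    out = torch.zeros_like(v)
    out.index_add_(0, i, weights.unsqueeze(-1) * v[j])    # weighted sum of sources
    return out

n, d = 6, 8
q, k, v = (torch.randn(n, d) for _ in range(3))
edges = torch.tensor([[0, 1], [0, 2], [1, 0], [2, 5], [3, 3]])
print(sparse_attention(q, k, v, edges).shape)  # torch.Size([6, 8])
```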

A Novel Method of Extracting Topological Features from Word Embeddings

Title A Novel Method of Extracting Topological Features from Word Embeddings
Authors Shafie Gholizadeh, Armin Seyeditabari, Wlodek Zadrozny
Abstract In recent years, topological data analysis has been utilized for a wide range of problems to deal with high dimensional noisy data. While text representations are often high dimensional and noisy, there has been only limited work on the application of topological data analysis in natural language processing. In this paper, we introduce a novel algorithm to extract topological features from word embedding representations of text that can be used for text classification. Working on word embeddings, topological data analysis can interpret the high-dimensional embedding space and discover the relations among different embedding dimensions. We use persistent homology, the most commonly used tool from topological data analysis, for our experiments. Examining our topological algorithm on long textual documents, we show that our topological features may outperform conventional text mining features.
Tasks Text Classification, Topological Data Analysis, Word Embeddings
Published 2020-03-29
URL https://arxiv.org/abs/2003.13074v1
PDF https://arxiv.org/pdf/2003.13074v1.pdf
PWC https://paperswithcode.com/paper/a-novel-method-of-extracting-topological
Repo
Framework
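
At a high level, the method runs persistent homology over the word-embedding representation of a document and turns the resulting diagrams into classification features. The sketch below does that diagram-to-feature step with the `ripser` package, which is one common persistent-homology library and an assumption here (not necessarily the authors' tooling); the summary statistics are likewise illustrative, not the paper's feature definition.

```python
# Illustrative sketch: persistent homology over the word vectors of a document,
# summarized into a few scalar features. Uses the `ripser` package as one common
# choice (an assumption; not necessarily the authors' tooling or features).
import numpy as np
from ripser import ripser

def topological_features(word_vectors, maxdim=1):
    """word_vectors: (num_words, embedding_dim) array for one document."""
    diagrams = ripser(word_vectors, maxdim=maxdim)["dgms"]
    features = []
    for dgm in diagrams:                       # one diagram per homology dimension
        finite = dgm[np.isfinite(dgm[:, 1])]   # drop the infinite bar
        lifetimes = finite[:, 1] - finite[:, 0]
        longest = float(lifetimes.max()) if len(lifetimes) else 0.0
        features += [len(finite), float(lifetimes.sum()), longest]
    return np.array(features)

doc = np.random.default_rng(0).normal(size=(40, 50))   # 40 words, 50-dim embeddings
print(topological_features(doc))
```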

How Much Knowledge Can You Pack Into the Parameters of a Language Model?

Title How Much Knowledge Can You Pack Into the Parameters of a Language Model?
Authors Adam Roberts, Colin Raffel, Noam Shazeer
Abstract It has recently been observed that neural language models trained on unstructured text can implicitly store and retrieve knowledge using natural language queries. In this short paper, we measure the practical utility of this approach by fine-tuning pre-trained models to answer questions without access to any external context or knowledge. We show that this approach scales surprisingly well with model size and outperforms models that explicitly look up knowledge on the open-domain variants of Natural Questions and WebQuestions.
Tasks Language Modelling
Published 2020-02-10
URL https://arxiv.org/abs/2002.08910v2
PDF https://arxiv.org/pdf/2002.08910v2.pdf
PWC https://paperswithcode.com/paper/how-much-knowledge-can-you-pack-into-the
Repo
Framework
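
The setup is closed-book question answering: a pre-trained text-to-text model is fine-tuned to emit the answer with no retrieved context. The snippet below shows only the inference pattern with a generic Hugging Face T5 checkpoint; the paper's fine-tuned checkpoints and exact prompt format are not reproduced here, so the generated text is not expected to be a correct answer.

```python
# Inference pattern for closed-book QA: feed the question alone (no retrieved
# context) to a text-to-text model and decode the answer. Uses a generic
# pre-trained T5 checkpoint for illustration, not the paper's fine-tuned models.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

question = "question: where is the Eiffel Tower located?"   # prompt format is an assumption
inputs = tokenizer(question, return_tensors="pt")
output_ids = model.generate(**inputs, max_length=16)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```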