April 3, 2020

Paper Group AWR 48

A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation. PatentTransformer-2: Controlling Patent Text Generation by Structural Metadata. Reduced Dilation-Erosion Perceptron for Binary Classification. Learning State Abstractions for Transfer in Continuous Control. Unblind Your Apps: Predicting Natural-Language Labels for Mobile GUI C …

A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation

Title A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation
Authors Jian Guan, Fei Huang, Zhihao Zhao, Xiaoyan Zhu, Minlie Huang
Abstract Story generation, namely generating a reasonable story from a leading context, is an important but challenging task. In spite of the success in modeling fluency and local coherence, existing neural language generation models (e.g., GPT-2) still suffer from repetition, logic conflicts, and lack of long-range coherence in generated stories. We conjecture that this is because of the difficulty of associating relevant commonsense knowledge, understanding the causal relationships, and planning entities and events with proper temporal order. In this paper, we devise a knowledge-enhanced pretraining model for commonsense story generation. We propose to utilize commonsense knowledge from external knowledge bases to generate reasonable stories. To further capture the causal and temporal dependencies between the sentences in a reasonable story, we employ multi-task learning which combines a discriminative objective to distinguish true and fake stories during fine-tuning. Automatic and manual evaluation shows that our model can generate more reasonable stories than state-of-the-art baselines, particularly in terms of logic and global coherence.
Tasks Multi-Task Learning, Text Generation
Published 2020-01-15
URL https://arxiv.org/abs/2001.05139v1
PDF https://arxiv.org/pdf/2001.05139v1.pdf
PWC https://paperswithcode.com/paper/a-knowledge-enhanced-pretraining-model-for
Repo https://github.com/thu-coai/CommonsenseStoryGen
Framework tf

PatentTransformer-2: Controlling Patent Text Generation by Structural Metadata

Title PatentTransformer-2: Controlling Patent Text Generation by Structural Metadata
Authors Jieh-Sheng Lee, Jieh Hsiang
Abstract PatentTransformer is our codename for patent text generation based on Transformer-based models. Our goal is “Augmented Inventing.” In this second version, we leverage more of the structural metadata in patents. The structural metadata includes the patent title, abstract, and dependent claims, in addition to the independent claim used previously. Metadata controls what kind of patent text the model generates. Also, we leverage the relation between metadata to build a text-to-text generation flow, for example, from a few words to a title, the title to an abstract, the abstract to an independent claim, and the independent claim to multiple dependent claims. The text flow can go backward because the relation is trained bidirectionally. We release our GPT-2 models trained from scratch and our code for inference so that readers can verify and generate patent text on their own. As for generation quality, we measure it by both ROUGE and Google Universal Sentence Encoder.
Tasks Text Generation
Published 2020-01-11
URL https://arxiv.org/abs/2001.03708v1
PDF https://arxiv.org/pdf/2001.03708v1.pdf
PWC https://paperswithcode.com/paper/patenttransformer-2-controlling-patent-text
Repo https://github.com/jiehsheng/PatentTransformer
Framework tf

Reduced Dilation-Erosion Perceptron for Binary Classification

Title Reduced Dilation-Erosion Perceptron for Binary Classification
Authors Marcos Eduardo Valle
Abstract Dilation and erosion are two elementary operations from mathematical morphology, a non-linear lattice computing methodology widely used for image processing and analysis. The dilation-erosion perceptron (DEP) is a morphological neural network obtained by a convex combination of a dilation and an erosion followed by the application of a hard-limiter function for binary classification tasks. A DEP classifier can be trained using a convex-concave procedure along with the minimization of the hinge loss function. As a lattice computing model, the DEP classifier assumes the feature and class spaces are partially ordered sets. In many practical situations, however, there is no natural ordering for the feature patterns. Using concepts from multi-valued mathematical morphology, this paper introduces the reduced dilation-erosion (r-DEP) classifier. An r-DEP classifier is obtained by endowing the feature space with an appropriate reduced ordering. Such reduced ordering can be determined using two approaches: one based on an ensemble of support vector classifiers (SVCs) with different kernels and the other based on a bagging of similar SVCs trained using different samples of the training set. Using several binary classification datasets from the OpenML repository, the ensemble and bagging r-DEP classifiers yielded higher mean balanced accuracy scores than the linear, polynomial, and radial basis function (RBF) SVCs as well as their ensemble and a bagging of RBF SVCs.
Tasks
Published 2020-03-04
URL https://arxiv.org/abs/2003.02306v1
PDF https://arxiv.org/pdf/2003.02306v1.pdf
PWC https://paperswithcode.com/paper/reduced-dilation-erosion-perceptron-for
Repo https://github.com/mevalle/r-DEP-Classifier
Framework none
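The DEP decision rule described in the abstract above (a convex combination of a dilation and an erosion, followed by a hard limiter) can be illustrated with a minimal sketch. It assumes the usual additive forms, dilation as max_j (x_j + a_j) and erosion as min_j (x_j + b_j); the weight vectors `a` and `b`, the mixing coefficient `beta`, and the function names are hypothetical and not taken from the author's repository. Training and the reduced ordering of r-DEP are not shown.

```python
def dilation(x, a):
    # Morphological dilation of x by structuring element a: max_j (x_j + a_j).
    return max(xj + aj for xj, aj in zip(x, a))


def erosion(x, b):
    # Morphological erosion of x by structuring element b: min_j (x_j + b_j).
    return min(xj + bj for xj, bj in zip(x, b))


def dep_predict(x, a, b, beta):
    # Convex combination of dilation and erosion (0 <= beta <= 1),
    # followed by a hard limiter that returns a class in {-1, +1}.
    score = beta * dilation(x, a) + (1.0 - beta) * erosion(x, b)
    return 1 if score >= 0 else -1


# Hypothetical feature vector and weights, chosen only for illustration.
x = [0.5, -1.0, 2.0]
a = [0.0, 0.0, -3.0]
b = [0.0, 0.5, 0.0]

mostly_dilation = dep_predict(x, a, b, beta=0.75)
mostly_erosion = dep_predict(x, a, b, beta=0.25)
```

With these illustrative values the predicted class flips as `beta` moves weight from the dilation to the erosion, which is the role the convex combination plays in the model.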

Learning State Abstractions for Transfer in Continuous Control

Title Learning State Abstractions for Transfer in Continuous Control
Authors Kavosh Asadi, David Abel, Michael L. Littman
Abstract Can simple algorithms with a good representation solve challenging reinforcement learning problems? In this work, we answer this question in the affirmative, where we take “simple learning algorithm” to be tabular Q-Learning, the “good representations” to be a learned state abstraction, and “challenging problems” to be continuous control tasks. Our main contribution is a learning algorithm that abstracts a continuous state-space into a discrete one. We transfer this learned representation to unseen problems to enable effective learning. We provide theory showing that learned abstractions maintain a bounded value loss, and we report experiments showing that the abstractions empower tabular Q-Learning to learn efficiently in unseen tasks.
Tasks Continuous Control, Q-Learning
Published 2020-02-08
URL https://arxiv.org/abs/2002.05518v1
PDF https://arxiv.org/pdf/2002.05518v1.pdf
PWC https://paperswithcode.com/paper/learning-state-abstractions-for-transfer-in
Repo https://github.com/anonicml2019/icml_2019_state_abstraction
Framework none
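To make the idea of running tabular Q-Learning on top of a discrete abstraction concrete, here is a minimal sketch. The paper learns its abstraction; this sketch substitutes a fixed uniform grid over a one-dimensional state purely as a stand-in, and every name in it (`abstract_state`, `q_update`, the bounds and bin count) is hypothetical.

```python
from collections import defaultdict


def abstract_state(s, low, high, bins):
    # Stand-in abstraction (an assumption, not the learned one from the paper):
    # clip a continuous scalar state and map it to one of `bins` grid cells.
    s = min(max(s, low), high)
    idx = int((s - low) / (high - low) * bins)
    return min(idx, bins - 1)


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    # Standard tabular Q-Learning update on abstract (discrete) states.
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


Q = defaultdict(float)      # Q-table keyed by (abstract state, action)
actions = [0, 1]

s = abstract_state(0.12, 0.0, 1.0, bins=10)
s_next = abstract_state(0.95, 0.0, 1.0, bins=10)
q_update(Q, s, a=1, r=1.0, s_next=s_next, actions=actions)
```

Because the Q-table is indexed only by abstract states, the same update rule can be reused on a new task once the abstraction function is fixed, which is the transfer setting the abstract describes.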

Unblind Your Apps: Predicting Natural-Language Labels for Mobile GUI Components by Deep Learning

Title Unblind Your Apps: Predicting Natural-Language Labels for Mobile GUI Components by Deep Learning
Authors Jieshan Chen, Chunyang Chen, Zhenchang Xing, Xiwei Xu, Liming Zhu, Guoqiang Li, Jinshui Wang
Abstract According to the World Health Organization (WHO), it is estimated that approximately 1.3 billion people live with some form of vision impairment globally, of whom 36 million are blind. Due to their disability, engaging this minority in society is a challenging problem. The recent rise of smart mobile phones provides a new solution by enabling blind users’ convenient access to the information and service for understanding the world. Users with vision impairment can adopt the screen reader embedded in the mobile operating systems to read the content of each screen within the app, and use gestures to interact with the phone. However, the prerequisite of using screen readers is that developers have to add natural-language labels to the image-based components when they are developing the app. Unfortunately, more than 77% of apps have issues of missing labels, according to our analysis of 10,408 Android apps. Most of these issues are caused by developers’ lack of awareness and knowledge in considering the minority. And even if developers want to add the labels to UI components, they may not come up with concise and clear descriptions, as most of them have no visual impairment. To overcome these challenges, we develop a deep-learning based model, called LabelDroid, to automatically predict the labels of image-based buttons by learning from large-scale commercial apps in Google Play. The experimental results show that our model can make accurate predictions and the generated labels are of higher quality than those from real Android developers.
Tasks
Published 2020-03-01
URL https://arxiv.org/abs/2003.00380v1
PDF https://arxiv.org/pdf/2003.00380v1.pdf
PWC https://paperswithcode.com/paper/unblind-your-apps-predicting-natural-language
Repo https://github.com/chenjshnn/LabelDroid
Framework none

Understanding the Downstream Instability of Word Embeddings

Title Understanding the Downstream Instability of Word Embeddings
Authors Megan Leszczynski, Avner May, Jian Zhang, Sen Wu, Christopher R. Aberger, Christopher Ré
Abstract Many industrial machine learning (ML) systems require frequent retraining to keep up-to-date with constantly changing data. This retraining exacerbates a large challenge facing ML systems today: model training is unstable, i.e., small changes in training data can cause significant changes in the model’s predictions. In this paper, we work on developing a deeper understanding of this instability, with a focus on how a core building block of modern natural language processing (NLP) pipelines—pre-trained word embeddings—affects the instability of downstream NLP models. We first empirically reveal a tradeoff between stability and memory: increasing the embedding memory 2x can reduce the disagreement in predictions due to small changes in training data by 5% to 37% (relative). To theoretically explain this tradeoff, we introduce a new measure of embedding instability—the eigenspace instability measure—which we prove bounds the disagreement in downstream predictions introduced by the change in word embeddings. Practically, we show that the eigenspace instability measure can be a cost-effective way to choose embedding parameters to minimize instability without training downstream models, outperforming other embedding distance measures and performing competitively with a nearest neighbor-based measure. Finally, we demonstrate that the observed stability-memory tradeoffs extend to other types of embeddings as well, including knowledge graph and contextual word embeddings.
Tasks Word Embeddings
Published 2020-02-29
URL https://arxiv.org/abs/2003.04983v1
PDF https://arxiv.org/pdf/2003.04983v1.pdf
PWC https://paperswithcode.com/paper/understanding-the-downstream-instability-of
Repo https://github.com/HazyResearch/anchor-stability
Framework none

DistNet: Deep Tracking by displacement regression: application to bacteria growing in the Mother Machine

Title DistNet: Deep Tracking by displacement regression: application to bacteria growing in the Mother Machine
Authors Jean Ollion, Charles Ollion
Abstract The mother machine is a popular microfluidic device that allows long-term time-lapse imaging of thousands of cells in parallel by microscopy. It has become a valuable tool for single-cell level quantitative analysis and characterization of many cellular processes such as gene expression and regulation, mutagenesis or response to antibiotics. The automated and quantitative analysis of the massive amount of data generated by such experiments is now the limiting step. In particular the segmentation and tracking of bacteria cells imaged in phase-contrast microscopy—with error rates compatible with high-throughput data—is a challenging problem. In this work, we describe a novel formulation of the multi-object tracking problem, in which tracking is performed by a regression of the bacteria’s displacement, allowing simultaneous tracking of multiple bacteria, despite their growth and division over time. Our method performs jointly segmentation and tracking, leveraging sequential information to increase segmentation accuracy. We introduce a Deep Neural Network architecture taking advantage of a self-attention mechanism which yields less than 0.005% tracking error rate and less than 0.03% segmentation error rate. We demonstrate superior performance and speed compared to state-of-the-art methods. While this method is particularly well suited for mother machine microscopy data, its general joint tracking and segmentation formulation could be applied to many other problems with different geometries.
Tasks Multi-Object Tracking, Object Tracking
Published 2020-03-17
URL https://arxiv.org/abs/2003.07790v1
PDF https://arxiv.org/pdf/2003.07790v1.pdf
PWC https://paperswithcode.com/paper/distnet-deep-tracking-by-displacement
Repo https://github.com/jeanollion/distnet
Framework tf

Training Two-Layer ReLU Networks with Gradient Descent is Inconsistent

Title Training Two-Layer ReLU Networks with Gradient Descent is Inconsistent
Authors David Holzmüller, Ingo Steinwart
Abstract We prove that two-layer (Leaky)ReLU networks initialized by e.g. the widely used method proposed by He et al. (2015) and trained using gradient descent on a least-squares loss are not universally consistent. Specifically, we describe a large class of data-generating distributions for which, with high probability, gradient descent only finds a bad local minimum of the optimization landscape. It turns out that in these cases, the found network essentially performs linear regression even if the target function is non-linear. We further provide numerical evidence that this happens in practical situations and that stochastic gradient descent exhibits similar behavior.
Tasks
Published 2020-02-12
URL https://arxiv.org/abs/2002.04861v1
PDF https://arxiv.org/pdf/2002.04861v1.pdf
PWC https://paperswithcode.com/paper/training-two-layer-relu-networks-with
Repo https://github.com/dholzmueller/nn_inconsistency
Framework none

Objective Social Choice: Using Auxiliary Information to Improve Voting Outcomes

Title Objective Social Choice: Using Auxiliary Information to Improve Voting Outcomes
Authors Silviu Pitis, Michael R. Zhang
Abstract How should one combine noisy information from diverse sources to make an inference about an objective ground truth? This frequently recurring, normative question lies at the core of statistics, machine learning, policy-making, and everyday life. It has been called “combining forecasts”, “meta-analysis”, “ensembling”, and the “MLE approach to voting”, among other names. Past studies typically assume that noisy votes are identically and independently distributed (i.i.d.), but this assumption is often unrealistic. Instead, we assume that votes are independent but not necessarily identically distributed and that our ensembling algorithm has access to certain auxiliary information related to the underlying model governing the noise in each vote. In our present work, we: (1) define our problem and argue that it reflects common and socially relevant real world scenarios, (2) propose a multi-arm bandit noise model and count-based auxiliary information set, (3) derive maximum likelihood aggregation rules for ranked and cardinal votes under our noise model, (4) propose, alternatively, to learn an aggregation rule using an order-invariant neural network, and (5) empirically compare our rules to common voting rules and naive experience-weighted modifications. We find that our rules successfully use auxiliary information to outperform the naive baselines.
Tasks
Published 2020-01-27
URL https://arxiv.org/abs/2001.10092v1
PDF https://arxiv.org/pdf/2001.10092v1.pdf
PWC https://paperswithcode.com/paper/objective-social-choice-using-auxiliary
Repo https://github.com/spitis/objective_social_choice
Framework pytorch
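A much simpler analogue of the setting in the abstract above may help: binary votes that are independent but not identically distributed, where the auxiliary information is each voter's accuracy. Under a uniform prior, the maximum likelihood aggregate is a weighted majority with log-odds weights. This sketch is not the paper's bandit noise model or its rules for ranked and cardinal votes; the function name and the accuracies are assumptions made for illustration.

```python
import math


def mle_binary_vote(votes, accuracies):
    # votes: list of +1 / -1 ballots; accuracies: P(voter is correct), in (0, 1).
    # With independent voters and a uniform prior over the two outcomes,
    # the likelihood ratio reduces to a sum of log-odds-weighted votes.
    score = sum(math.log(p / (1.0 - p)) * v for v, p in zip(votes, accuracies))
    return 1 if score >= 0 else -1


# One reliable voter disagrees with two weak voters (hypothetical numbers).
votes = [+1, -1, -1]
accuracies = [0.9, 0.6, 0.6]

weighted_outcome = mle_binary_vote(votes, accuracies)
plain_majority = 1 if sum(votes) >= 0 else -1
```

Here the plain majority sides with the two weak voters, while the likelihood-based rule follows the single reliable voter, which shows how auxiliary information can change the aggregate.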

Towards Explainability of Machine Learning Models in Insurance Pricing

Title Towards Explainability of Machine Learning Models in Insurance Pricing
Authors Kevin Kuo, Daniel Lupton
Abstract Machine learning methods have garnered increasing interest among actuaries in recent years. However, their adoption by practitioners has been limited, partly due to the lack of transparency of these methods, as compared to generalized linear models. In this paper, we discuss the need for model interpretability in property & casualty insurance ratemaking, propose a framework for explaining models, and present a case study to illustrate the framework.
Tasks
Published 2020-03-24
URL https://arxiv.org/abs/2003.10674v1
PDF https://arxiv.org/pdf/2003.10674v1.pdf
PWC https://paperswithcode.com/paper/towards-explainability-of-machine-learning
Repo https://github.com/kasaai/explain-ml-pricing
Framework none

Higher-Order Label Homogeneity and Spreading in Graphs

Title Higher-Order Label Homogeneity and Spreading in Graphs
Authors Dhivya Eswaran, Srijan Kumar, Christos Faloutsos
Abstract Do higher-order network structures aid graph semi-supervised learning? Given a graph and a few labeled vertices, labeling the remaining vertices is a high-impact problem with applications in several tasks, such as recommender systems, fraud detection and protein identification. However, traditional methods rely on edges for spreading labels, which is limited as all edges are not equal. Vertices with stronger connections participate in higher-order structures in graphs, which calls for methods that can leverage these structures in the semi-supervised learning tasks. To this end, we propose Higher-Order Label Spreading (HOLS) to spread labels using higher-order structures. HOLS has strong theoretical guarantees and reduces to standard label spreading in the base case. Via extensive experiments, we show that higher-order label spreading using triangles in addition to edges is up to 4.7% better than label spreading using edges alone. Compared to prior traditional and state-of-the-art methods, the proposed method leads to statistically significant accuracy gains in all-but-one cases, while remaining fast and scalable to large graphs.
Tasks Fraud Detection, Recommendation Systems
Published 2020-02-18
URL https://arxiv.org/abs/2002.07833v1
PDF https://arxiv.org/pdf/2002.07833v1.pdf
PWC https://paperswithcode.com/paper/higher-order-label-homogeneity-and-spreading
Repo https://github.com/dhivyaeswaran/hols
Framework none
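The following sketch shows standard iterative label spreading on a graph whose edge weights are boosted by triangle counts. It is one simple way to let higher-order structure influence spreading and is not claimed to be the HOLS formulation from the paper; the mixing weight `lam`, the toy graph, and the function names are hypothetical.

```python
def triangle_counts(adj):
    # tri[i][j] = number of triangles that contain edge (i, j).
    n = len(adj)
    tri = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if adj[i][j]:
                tri[i][j] = sum(1 for k in range(n) if adj[i][k] and adj[j][k])
    return tri


def spread_labels(W, Y, alpha=0.5, iters=200):
    # Iterates F <- alpha * S F + (1 - alpha) * Y with S = D^{-1/2} W D^{-1/2}.
    n, c = len(Y), len(Y[0])
    deg = [sum(row) for row in W]
    S = [[W[i][j] / (deg[i] * deg[j]) ** 0.5 if W[i][j] else 0.0
          for j in range(n)] for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [[alpha * sum(S[i][k] * F[k][j] for k in range(n))
              + (1 - alpha) * Y[i][j] for j in range(c)] for i in range(n)]
    return F


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the bridge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6
adj = [[0] * n for _ in range(n)]
for i, j in edges:
    adj[i][j] = adj[j][i] = 1

tri = triangle_counts(adj)
lam = 1.0  # hypothetical weight given to triangle participation
W = [[adj[i][j] + lam * tri[i][j] for j in range(n)] for i in range(n)]

# One labeled vertex per class: vertex 0 is class 0, vertex 5 is class 1.
Y = [[0.0, 0.0] for _ in range(n)]
Y[0][0] = 1.0
Y[5][1] = 1.0

F = spread_labels(W, Y)
labels = [max(range(2), key=lambda j: row[j]) for row in F]
```

Edges inside a triangle receive extra weight while the bridge does not, so labels flow more readily within each tightly connected group than across it.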

Choice Set Optimization Under Discrete Choice Models of Group Decisions

Title Choice Set Optimization Under Discrete Choice Models of Group Decisions
Authors Kiran Tomlinson, Austin R. Benson
Abstract The way that people make choices or exhibit preferences can be strongly affected by the set of available alternatives, often called the choice set. Furthermore, there are usually heterogeneous preferences, either at an individual level within small groups or within sub-populations of large groups. Given the availability of choice data, there are now many models that capture this behavior in order to make effective predictions. However, there is little work in understanding how directly changing the choice set can be used to influence a group’s preferences or decisions. Here, we use discrete choice modeling to develop an optimization framework of such interventions for several problems of group influence, including maximizing agreement or disagreement and promoting a particular choice. We show that these problems are NP-hard in general but imposing restrictions reveals a fundamental boundary: promoting an item is easier than maximizing agreement or disagreement. We then design approximation algorithms for the hard problems and show that they work extremely well for real-world choice data.
Tasks
Published 2020-02-02
URL https://arxiv.org/abs/2002.00421v1
PDF https://arxiv.org/pdf/2002.00421v1.pdf
PWC https://paperswithcode.com/paper/choice-set-optimization-under-discrete-choice
Repo https://github.com/tomlinsonk/choice-set-opt
Framework pytorch
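As a small illustration of how the choice set affects choice probabilities, the sketch below uses the multinomial logit model, a standard discrete choice model in which P(i | C) = exp(u_i) / sum over j in C of exp(u_j). The utilities, item names, and the function `mnl_probs` are assumptions for illustration; the optimization and approximation algorithms from the paper are not shown.

```python
import math


def mnl_probs(utilities, choice_set):
    # Multinomial logit choice probabilities restricted to `choice_set`.
    # The maximum utility is subtracted before exponentiating for stability.
    m = max(utilities[i] for i in choice_set)
    expu = {i: math.exp(utilities[i] - m) for i in choice_set}
    z = sum(expu.values())
    return {i: e / z for i, e in expu.items()}


# Hypothetical utilities for three items.
utilities = {"a": 1.0, "b": 1.0, "c": 0.0}

small_set = mnl_probs(utilities, ["a", "b"])
large_set = mnl_probs(utilities, ["a", "b", "c"])
```

Adding item "c" to the choice set lowers the probability of choosing "a" even though the utility of "a" is unchanged, which is the kind of intervention the framework optimizes over.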

6D Object Pose Regression via Supervised Learning on Point Clouds

Title 6D Object Pose Regression via Supervised Learning on Point Clouds
Authors Ge Gao, Mikko Lauri, Yulong Wang, Xiaolin Hu, Jianwei Zhang, Simone Frintrop
Abstract This paper addresses the task of estimating the 6 degrees of freedom pose of a known 3D object from depth information represented by a point cloud. Deep features learned by convolutional neural networks from color information have been the dominant features to be used for inferring object poses, while depth information receives much less attention. However, depth information contains rich geometric information of the object shape, which is important for inferring the object pose. We use depth information represented by point clouds as the input to both deep networks and geometry-based pose refinement and use separate networks for rotation and translation regression. We argue that the axis-angle representation is a suitable rotation representation for deep learning, and use a geodesic loss function for rotation regression. Ablation studies show that these design choices outperform alternatives such as the quaternion representation and L2 loss, or regressing translation and rotation with the same network. Our simple yet effective approach clearly outperforms state-of-the-art methods on the YCB-video dataset. The implementation and trained model are available at: https://github.com/GeeeG/CloudPose.
Tasks
Published 2020-01-24
URL https://arxiv.org/abs/2001.08942v1
PDF https://arxiv.org/pdf/2001.08942v1.pdf
PWC https://paperswithcode.com/paper/6d-object-pose-regression-via-supervised
Repo https://github.com/GeeeG/CloudPose
Framework tf
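The axis-angle representation and geodesic distance mentioned in the abstract above can be sketched as follows. The conversion uses Rodrigues' formula and the distance is the rotation angle of R1^T R2, computed as arccos((trace(R1^T R2) - 1) / 2). This is a generic illustration, not the CloudPose implementation; the function names are hypothetical and the exact loss used in the paper may differ in detail.

```python
import math


def axis_angle_to_matrix(v):
    # v is a 3-vector whose direction is the rotation axis and whose
    # length is the rotation angle in radians (Rodrigues' formula).
    theta = math.sqrt(sum(comp * comp for comp in v))
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    x, y, z = (comp / theta for comp in v)
    c, s = math.cos(theta), math.sin(theta)
    t = 1.0 - c
    return [
        [c + x * x * t, x * y * t - z * s, x * z * t + y * s],
        [y * x * t + z * s, c + y * y * t, y * z * t - x * s],
        [z * x * t - y * s, z * y * t + x * s, c + z * z * t],
    ]


def geodesic_distance(R1, R2):
    # trace(R1^T R2) equals the sum of elementwise products of R1 and R2.
    tr = sum(R1[i][j] * R2[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_angle = max(-1.0, min(1.0, (tr - 1.0) / 2.0))
    return math.acos(cos_angle)


# Predicted and ground-truth rotations about the z-axis (illustrative values).
R_pred = axis_angle_to_matrix([0.0, 0.0, 0.3])
R_true = axis_angle_to_matrix([0.0, 0.0, 0.5])
angle_error = geodesic_distance(R_pred, R_true)
```

The distance is the smallest angle needed to rotate one orientation onto the other, so it measures rotation error directly rather than through differences in the parameters.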

CausalML: Python Package for Causal Machine Learning

Title CausalML: Python Package for Causal Machine Learning
Authors Huigang Chen, Totte Harinen, Jeong-Yoon Lee, Mike Yung, Zhenyu Zhao
Abstract CausalML is a Python implementation of algorithms related to causal inference and machine learning. Algorithms combining causal inference and machine learning have been a trending topic in recent years. This package tries to bridge the gap between theoretical work on methodology and practical applications by making a collection of methods in this field available in Python. This paper introduces the key concepts, scope, and use cases of this package.
Tasks Causal Inference
Published 2020-02-25
URL https://arxiv.org/abs/2002.11631v2
PDF https://arxiv.org/pdf/2002.11631v2.pdf
PWC https://paperswithcode.com/paper/causalml-python-package-for-causal-machine
Repo https://github.com/uber/causalml
Framework none

PointAugment: an Auto-Augmentation Framework for Point Cloud Classification

Title PointAugment: an Auto-Augmentation Framework for Point Cloud Classification
Authors Ruihui Li, Xianzhi Li, Pheng-Ann Heng, Chi-Wing Fu
Abstract We present PointAugment, a new auto-augmentation framework that automatically optimizes and augments point cloud samples to enrich the data diversity when we train a classification network. Different from existing auto-augmentation methods for 2D images, PointAugment is sample-aware and takes an adversarial learning strategy to jointly optimize an augmentor network and a classifier network, such that the augmentor can learn to produce augmented samples that best fit the classifier. Moreover, we formulate a learnable point augmentation function with a shape-wise transformation and a point-wise displacement, and carefully design loss functions to adopt the augmented samples based on the learning progress of the classifier. Extensive experiments also confirm PointAugment’s effectiveness and robustness to improve the performance of various networks on shape classification and retrieval.
Tasks
Published 2020-02-25
URL https://arxiv.org/abs/2002.10876v2
PDF https://arxiv.org/pdf/2002.10876v2.pdf
PWC https://paperswithcode.com/paper/pointaugment-an-auto-augmentation-framework
Repo https://github.com/liruihui/PointAugment
Framework none