February 1, 2020

3022 words 15 mins read

Paper Group AWR 295

Gromov-Wasserstein Factorization Models for Graph Clustering. Generative Question Refinement with Deep Reinforcement Learning in Retrieval-based QA System. Tree Transformer: Integrating Tree Structures into Self-Attention. Empirically Measuring Concentration: Fundamental Limits on Intrinsic Robustness. Place Recognition for Stereo Visual Odometry us …

Gromov-Wasserstein Factorization Models for Graph Clustering

Title Gromov-Wasserstein Factorization Models for Graph Clustering
Authors Hongteng Xu
Abstract We propose a new nonlinear factorization model for graphs that have topological structures and, optionally, node attributes. This model is based on a pseudometric called Gromov-Wasserstein (GW) discrepancy, which compares graphs in a relational way. It estimates observed graphs as GW barycenters constructed from a set of atoms with different weights. By minimizing the GW discrepancy between each observed graph and its GW barycenter-based estimate, we learn the atoms and the weights associated with the observed graphs. The model achieves a novel and flexible factorization mechanism under GW discrepancy, in which both the observed graphs and the learnable atoms can be unaligned and of different sizes. We design an effective approximate algorithm for learning this Gromov-Wasserstein factorization (GWF) model, unrolling loopy computations as stacked modules and computing gradients with backpropagation. The stacked modules can adopt two different architectures, corresponding to the proximal point algorithm (PPA) and the Bregman alternating direction method of multipliers (BADMM), respectively. Experiments show that our model obtains encouraging results on graph clustering.
Tasks Graph Clustering
Published 2019-11-19
URL https://arxiv.org/abs/1911.08530v1
PDF https://arxiv.org/pdf/1911.08530v1.pdf
PWC https://paperswithcode.com/paper/gromov-wasserstein-factorization-models-for
Repo https://github.com/HongtengXu/Relational-Factorization-Model
Framework pytorch
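The core quantity GWF minimizes, the GW discrepancy between an observed graph and its barycenter-based estimate, can be computed with off-the-shelf tools. Below is a minimal sketch (not the released PyTorch code) that compares two unaligned graphs of different sizes via the POT library; the full model would additionally learn atoms whose GW barycenter (e.g., `ot.gromov.gromov_barycenters`) approximates each observed graph.

```python
import networkx as nx
import ot  # Python Optimal Transport: pip install pot

def gw_discrepancy(g1: nx.Graph, g2: nx.Graph) -> float:
    # Represent each graph by its shortest-path matrix plus a uniform node
    # distribution; GW compares these relational structures directly, so no
    # node correspondence or equal graph size is needed.
    C1 = nx.floyd_warshall_numpy(g1)
    C2 = nx.floyd_warshall_numpy(g2)
    p, q = ot.unif(C1.shape[0]), ot.unif(C2.shape[0])
    return ot.gromov.gromov_wasserstein2(C1, C2, p, q, loss_fun='square_loss')

print(gw_discrepancy(nx.cycle_graph(6), nx.path_graph(9)))
```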

Generative Question Refinement with Deep Reinforcement Learning in Retrieval-based QA System

Title Generative Question Refinement with Deep Reinforcement Learning in Retrieval-based QA System
Authors Ye Liu, Chenwei Zhang, Xiaohui Yan, Yi Chang, Philip S. Yu
Abstract In real-world question-answering (QA) systems, ill-formed questions, such as those with wrong words, incorrect word order, or noisy expressions, are common and may prevent the QA system from understanding and answering them accurately. To eliminate the effect of ill-formed questions, we approach the question refinement task and propose a unified model, QREFINE, that refines ill-formed questions into well-formed ones. The basic idea is to learn a Seq2Seq model that generates a new question from the original one. To improve the quality and retrieval performance of the generated questions, we make two major improvements: 1) to better encode the semantics of ill-formed questions, we enrich the representation of questions with character embeddings and recently proposed contextual word embeddings such as BERT, in addition to traditional context-free word embeddings; 2) to enable the model to generate the desired questions, we train it with deep reinforcement learning techniques that treat appropriate wording of the generated question as an immediate reward and the correlation between the generated question and its answer as a time-delayed long-term reward. Experimental results on real-world datasets show that QREFINE generates refined questions that are more readable and contain fewer mistakes than the original questions provided by users. Moreover, the refined questions also significantly improve the accuracy of answer retrieval.
Tasks Question Answering, Word Embeddings
Published 2019-08-13
URL https://arxiv.org/abs/1908.05604v3
PDF https://arxiv.org/pdf/1908.05604v3.pdf
PWC https://paperswithcode.com/paper/generative-question-refinement-with-deep
Repo https://github.com/yeliu918/QREFINE-PPO
Framework tf
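The two-part reward structure is the heart of the RL training. The sketch below is an illustration, not the released QREFINE-PPO code (`gamma` and the reward values are placeholders): per-token wording rewards and the time-delayed retrieval reward combine into discounted returns for a policy-gradient update.

```python
from typing import List

def token_returns(wording_rewards: List[float],
                  retrieval_reward: float,
                  gamma: float = 0.99) -> List[float]:
    """Discounted return for each generated token.

    wording_rewards: immediate reward per token for appropriate wording.
    retrieval_reward: time-delayed reward from the answer-retrieval module,
        granted once the full refined question has been scored.
    """
    rewards = list(wording_rewards)
    rewards[-1] += retrieval_reward  # the delayed reward arrives at the end
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]

print(token_returns([0.1, 0.0, 0.2], retrieval_reward=1.0))
```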

Tree Transformer: Integrating Tree Structures into Self-Attention

Title Tree Transformer: Integrating Tree Structures into Self-Attention
Authors Yau-Shian Wang, Hung-Yi Lee, Yun-Nung Chen
Abstract Pre-training Transformers on large-scale raw text and fine-tuning on the desired task have achieved state-of-the-art results on diverse NLP tasks. However, it is unclear what the learned attention captures: the attention computed by attention heads does not seem to match human intuitions about hierarchical structure. This paper proposes Tree Transformer, which adds an extra constraint to the attention heads of the bidirectional Transformer encoder to encourage them to follow tree structures. The tree structures can be automatically induced from raw text by our proposed “Constituent Attention” module, which is implemented simply as self-attention between adjacent words. With a training procedure identical to BERT’s, the experiments demonstrate the effectiveness of Tree Transformer in inducing tree structures, improving language modeling, and learning more explainable attention scores.
Tasks Language Modelling
Published 2019-09-14
URL https://arxiv.org/abs/1909.06639v2
PDF https://arxiv.org/pdf/1909.06639v2.pdf
PWC https://paperswithcode.com/paper/tree-transformer-integrating-tree-structures
Repo https://github.com/yaushian/Tree-Transformer
Framework pytorch
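As a rough illustration of the Constituent Attention idea, the sketch below (an assumption-level reading of the abstract, not the authors' implementation) turns link probabilities between adjacent words into a soft constituent prior that gates the attention matrix, so attention across a likely constituent break is damped.

```python
import numpy as np

def constituent_prior(link_prob: np.ndarray) -> np.ndarray:
    """link_prob[i]: probability that words i and i+1 belong to the same
    constituent (obtained via self-attention between adjacent words).
    Returns C where C[i, j] is the product of link probabilities along the
    span i..j, so spans crossing a weak link get a small prior."""
    n = len(link_prob) + 1
    C = np.ones((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            C[i, j] = C[j, i] = np.prod(link_prob[i:j])
    return C

links = np.array([0.9, 0.1, 0.8])    # a likely break between words 1 and 2
attn = np.full((4, 4), 0.25)         # toy uniform attention
print(attn * constituent_prior(links))  # gated attention, pre-renormalization
```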

Empirically Measuring Concentration: Fundamental Limits on Intrinsic Robustness

Title Empirically Measuring Concentration: Fundamental Limits on Intrinsic Robustness
Authors Saeed Mahloujifar, Xiao Zhang, Mohammad Mahmoody, David Evans
Abstract Many recent works have shown that adversarial examples that fool classifiers can be found by minimally perturbing a normal input. Recent theoretical results, starting with Gilmer et al. (2018b), show that if the inputs are drawn from a concentrated metric probability space, then adversarial examples with small perturbation are inevitable. A concentrated space has the property that any subset with $\Omega(1)$ (e.g., 1/100) measure, according to the imposed distribution, has small distance to almost all (e.g., 99/100) of the points in the space. It is not clear, however, whether these theoretical results apply to actual distributions such as images. This paper presents a method for empirically measuring and bounding the concentration of a concrete dataset which is proven to converge to the actual concentration. We use it to empirically estimate the intrinsic robustness to $\ell_\infty$ and $\ell_2$ perturbations of several image classification benchmarks. Code for our experiments is available at https://github.com/xiaozhanguva/Measure-Concentration.
Tasks Image Classification
Published 2019-05-29
URL https://arxiv.org/abs/1905.12202v2
PDF https://arxiv.org/pdf/1905.12202v2.pdf
PWC https://paperswithcode.com/paper/empirically-measuring-concentration
Repo https://github.com/xiaozhanguva/Measure-Concentration
Framework pytorch
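The quantity at stake is the measure of the ε-expansion of a small-measure region. The snippet below is only a sketch of that definition evaluated on samples under the ℓ∞ metric; the paper's actual estimator additionally searches over candidate regions and is proven to converge to the true concentration.

```python
import numpy as np

def expansion_measure(points: np.ndarray, region: np.ndarray, eps: float) -> float:
    """Fraction of `points` within ℓ∞ distance eps of some point in `region`."""
    # pairwise ℓ∞ distances, shape (n_points, n_region)
    d = np.abs(points[:, None, :] - region[None, :, :]).max(axis=2)
    return float((d.min(axis=1) <= eps).mean())

rng = np.random.default_rng(0)
data = rng.uniform(size=(1000, 2))
region = data[:10]  # a subset of empirical measure 0.01
print(expansion_measure(data, region, eps=0.2))  # measure of its 0.2-expansion
```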

Place Recognition for Stereo Visual Odometry using LiDAR Descriptors

Title Place Recognition for Stereo Visual Odometry using LiDAR Descriptors
Authors Jiawei Mo, Junaed Sattar
Abstract Place recognition is a core component of SLAM, and in most visual SLAM systems it is based on the similarity between 2D images. However, the 3D points generated by visual odometry, and the structural information embedded within them, are not exploited. In this paper, we adapt place recognition methods for 3D point clouds to stereo visual odometry. Stereo visual odometry generates 3D point clouds with a consistent scale, so we can use global LiDAR descriptors for 3D point clouds to determine the similarity between places. 3D point clouds are more reliable than 2D visual cues (e.g., 2D features) under environmental changes such as varying illumination, and can benefit visual SLAM systems in long-term deployment scenarios. Extensive evaluation on a public dataset (Oxford RobotCar) demonstrates the accuracy and efficiency of using 3D point clouds for place recognition over 2D methods.
Tasks Visual Odometry
Published 2019-09-16
URL https://arxiv.org/abs/1909.07267v2
PDF https://arxiv.org/pdf/1909.07267v2.pdf
PWC https://paperswithcode.com/paper/place-recognition-for-stereo-visualodometry
Repo https://github.com/jiawei-mo/3d_place_recognition
Framework none
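As a flavor of the approach, the sketch below builds a Scan-Context-like polar occupancy descriptor from the 3D points that stereo VO already produces and compares two places by a column-wise cosine distance. The paper evaluates existing LiDAR descriptors; this grid, its bin sizes, and the distance (which omits the rotation-shift search) are only illustrative.

```python
import numpy as np

def polar_descriptor(points: np.ndarray, n_rings=20, n_sectors=60, r_max=50.0):
    """points: (N, 3) in the local frame. Returns an (n_rings, n_sectors)
    matrix holding the maximum height observed in each polar bin."""
    r = np.linalg.norm(points[:, :2], axis=1)
    theta = np.arctan2(points[:, 1], points[:, 0])           # [-pi, pi)
    ring = np.clip((r / r_max * n_rings).astype(int), 0, n_rings - 1)
    sector = ((theta + np.pi) / (2 * np.pi) * n_sectors).astype(int) % n_sectors
    desc = np.zeros((n_rings, n_sectors))
    np.maximum.at(desc, (ring, sector), points[:, 2])        # max height per bin
    return desc

def descriptor_distance(d1, d2):
    # column-wise cosine distance; a real matcher would also search over
    # sector shifts for rotation invariance
    num = (d1 * d2).sum(axis=0)
    den = np.linalg.norm(d1, axis=0) * np.linalg.norm(d2, axis=0) + 1e-9
    return 1.0 - float((num / den).mean())
```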

Least Squares Auto-Tuning

Title Least Squares Auto-Tuning
Authors Shane Barratt, Stephen Boyd
Abstract Least squares is by far the simplest and most commonly applied computational method in many fields. In almost all applications, the least squares objective is rarely the true objective. We account for this discrepancy by parametrizing the least squares problem and automatically adjusting these parameters using an optimization algorithm. We apply our method, which we call least squares auto-tuning, to data fitting.
Tasks
Published 2019-04-10
URL http://arxiv.org/abs/1904.05460v1
PDF http://arxiv.org/pdf/1904.05460v1.pdf
PWC https://paperswithcode.com/paper/least-squares-auto-tuning
Repo https://github.com/sbarratt/lsat
Framework pytorch
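A minimal sketch of the idea, under the assumption of a single tunable ridge weight: solve the least squares problem in closed form inside the training loop and let automatic differentiation carry gradients of a held-out loss back to the hyperparameter. The paper's formulation is considerably more general.

```python
import torch

A, b = torch.randn(80, 10), torch.randn(80, 1)          # training data
A_val, b_val = torch.randn(40, 10), torch.randn(40, 1)  # held-out data

log_lam = torch.zeros(1, requires_grad=True)  # tune log(lambda) for positivity
opt = torch.optim.Adam([log_lam], lr=0.1)
for _ in range(100):
    lam = log_lam.exp()
    # closed-form ridge solution; autograd differentiates through the solve
    x = torch.linalg.solve(A.T @ A + lam * torch.eye(10), A.T @ b)
    val_loss = ((A_val @ x - b_val) ** 2).mean()
    opt.zero_grad()
    val_loss.backward()
    opt.step()
print(float(log_lam.exp()))  # the auto-tuned regularization weight
```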

Sentence Centrality Revisited for Unsupervised Summarization

Title Sentence Centrality Revisited for Unsupervised Summarization
Authors Hao Zheng, Mirella Lapata
Abstract Single document summarization has enjoyed renewed interest in recent years thanks to the popularity of neural network models and the availability of large-scale datasets. In this paper we develop an unsupervised approach, arguing that it is unrealistic to expect large-scale and high-quality training data to be available or created for different types of summaries, domains, or languages. We revisit a popular graph-based ranking algorithm and modify how node (i.e., sentence) centrality is computed in two ways: (a) we employ BERT, a state-of-the-art neural representation learning model, to better capture sentential meaning, and (b) we build graphs with directed edges, arguing that the contribution of any two nodes to their respective centrality is influenced by their relative position in a document. Experimental results on three news summarization datasets representative of different languages and writing styles show that our approach outperforms strong baselines by a wide margin.
Tasks Document Summarization, Representation Learning
Published 2019-06-08
URL https://arxiv.org/abs/1906.03508v1
PDF https://arxiv.org/pdf/1906.03508v1.pdf
PWC https://paperswithcode.com/paper/sentence-centrality-revisited-for
Repo https://github.com/mswellhao/PacSum
Framework pytorch
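The directed-centrality computation can be sketched in a few lines (the released code is PacSum; the weights below are illustrative placeholders): similarities to following sentences and to preceding sentences contribute with different weights, encoding relative position in the document.

```python
import numpy as np

def directed_centrality(sim: np.ndarray, lambda_fwd=1.0, lambda_bwd=0.3):
    """sim[i, j]: similarity between sentences i and j (e.g., from BERT).
    Edges to following sentences count with weight lambda_fwd and edges to
    preceding sentences with lambda_bwd, reflecting that earlier sentences
    tend to matter more in news text."""
    n = sim.shape[0]
    scores = np.zeros(n)
    for i in range(n):
        scores[i] = (lambda_bwd * sim[i, :i].sum() +
                     lambda_fwd * sim[i, i + 1:].sum())
    return scores

sim = np.random.default_rng(0).random((5, 5))      # toy similarity matrix
print(directed_centrality(sim).argsort()[::-1])    # sentence ranking
```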

Confidence Estimation for Black Box Automatic Speech Recognition Systems Using Lattice Recurrent Neural Networks

Title Confidence Estimation for Black Box Automatic Speech Recognition Systems Using Lattice Recurrent Neural Networks
Authors Alexandros Kastanos, Anton Ragni, Mark Gales
Abstract Recently, there has been growth in providers of speech transcription services enabling others to leverage technology they would not normally be able to use. As a result, speech-enabled solutions have become commonplace. Their success critically relies on the quality, accuracy, and reliability of the underlying speech transcription systems. Those black box systems, however, offer limited means for quality control as only word sequences are typically available. This paper examines this limited resource scenario for confidence estimation, a measure commonly used to assess transcription reliability. In particular, it explores what other sources of word and sub-word level information available in the transcription process could be used to improve confidence scores. To encode all such information this paper extends lattice recurrent neural networks to handle sub-words. Experimental results using the IARPA OpenKWS 2016 evaluation system show that the use of additional information yields significant gains in confidence estimation accuracy. The implementation for this model can be found online.
Tasks Speech Recognition
Published 2019-10-25
URL https://arxiv.org/abs/1910.11933v2
PDF https://arxiv.org/pdf/1910.11933v2.pdf
PWC https://paperswithcode.com/paper/confidence-estimation-for-black-box-automatic
Repo https://github.com/alecokas/lattice_rnn
Framework pytorch
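A hedged sketch of the lattice recursion (not the released `lattice_rnn` code): unlike a chain RNN, each lattice node pools the hidden states propagated along all of its incoming arcs, so competing hypotheses and their word/sub-word features are all encoded. It assumes every non-start node has at least one incoming arc.

```python
import torch
import torch.nn as nn

class LatticeRNN(nn.Module):
    def __init__(self, feat_dim=8, hidden=16):
        super().__init__()
        self.cell = nn.GRUCell(feat_dim, hidden)
        self.hidden = hidden

    def forward(self, incoming_arcs, feats):
        """incoming_arcs[n]: list of (src_node, arc_id) for node n, with
        nodes numbered in topological order (node 0 is the lattice start).
        feats: (n_arcs, feat_dim) per-arc features, e.g. word/sub-word
        embeddings, durations, posteriors."""
        h = [torch.zeros(1, self.hidden)]
        for arcs in incoming_arcs[1:]:
            states = [self.cell(feats[a:a + 1], h[src]) for src, a in arcs]
            h.append(torch.stack(states).mean(dim=0))  # pool over arcs
        return h[-1]  # hidden state at the final lattice node

rnn = LatticeRNN()
feats = torch.randn(3, 8)  # three arcs, 8-dim features
# two paths through the lattice: 0 -> 1 -> 2 (arcs 0, 1) and 0 -> 2 (arc 2)
print(rnn([[], [(0, 0)], [(1, 1), (0, 2)]], feats).shape)
```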

AMPL: A Data-Driven Modeling Pipeline for Drug Discovery

Title AMPL: A Data-Driven Modeling Pipeline for Drug Discovery
Authors Amanda J. Minnich, Kevin McLoughlin, Margaret Tse, Jason Deng, Andrew Weber, Neha Murad, Benjamin D. Madej, Bharath Ramsundar, Tom Rush, Stacie Calad-Thomson, Jim Brase, Jonathan E. Allen
Abstract One of the key requirements for incorporating machine learning into the drug discovery process is complete reproducibility and traceability of the model building and evaluation process. With this in mind, we have developed an end-to-end modular and extensible software pipeline for building and sharing machine learning models that predict key pharma-relevant parameters. The ATOM Modeling PipeLine, or AMPL, extends the functionality of the open source library DeepChem and supports an array of machine learning and molecular featurization tools. We have benchmarked AMPL on a large collection of pharmaceutical datasets covering a wide range of parameters. These comprehensive experiments show that physicochemical descriptors and deep learning-based graph representations significantly outperform traditional fingerprints in the characterization of molecular features. We have also found that dataset size is directly correlated with prediction performance, and that single-task deep learning models only outperform shallow learners if there is sufficient data. Likewise, dataset size has a direct impact on model predictivity, independent of comprehensive hyperparameter tuning. Our findings point to the need for public dataset integration or multi-task/transfer learning approaches. Lastly, we found that uncertainty quantification (UQ) analysis may help identify model error; however, the efficacy of UQ for filtering predictions varies considerably across datasets and featurization/model types. AMPL is open source and available for download at http://github.com/ATOMconsortium/AMPL.
Tasks Drug Discovery, Transfer Learning
Published 2019-11-13
URL https://arxiv.org/abs/1911.05211v2
PDF https://arxiv.org/pdf/1911.05211v2.pdf
PWC https://paperswithcode.com/paper/ampl-a-data-driven-modeling-pipeline-for-drug
Repo https://github.com/ATOMconsortium/AMPL
Framework none

Information Gathering in Decentralized POMDPs by Policy Graph Improvement

Title Information Gathering in Decentralized POMDPs by Policy Graph Improvement
Authors Mikko Lauri, Joni Pajarinen, Jan Peters
Abstract Decentralized policies for information gathering are required when multiple autonomous agents are deployed to collect data about a phenomenon of interest without the ability to communicate. Decentralized partially observable Markov decision processes (Dec-POMDPs) are a general, principled model well-suited for such decentralized multiagent decision-making problems. In this paper, we investigate Dec-POMDPs for decentralized information gathering problems. An optimal solution of a Dec-POMDP maximizes the expected sum of rewards over time. To encourage information gathering, we set the reward as a function of the agents’ state information, for example the negative Shannon entropy. We prove that if the reward is convex, then the finite-horizon value function of the corresponding Dec-POMDP is also convex. We propose the first heuristic algorithm for information-gathering Dec-POMDPs and empirically demonstrate its effectiveness by solving problems an order of magnitude larger than the previous state of the art.
Tasks Decision Making
Published 2019-02-26
URL http://arxiv.org/abs/1902.09840v1
PDF http://arxiv.org/pdf/1902.09840v1.pdf
PWC https://paperswithcode.com/paper/information-gathering-in-decentralized-pomdps
Repo https://github.com/laurimi/npgi
Framework none
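The reward design is simple to state concretely: the reward is a convex function of the joint belief, for example negative Shannon entropy, so reducing uncertainty is directly rewarded. A tiny sketch of that reward:

```python
import numpy as np

def neg_entropy_reward(belief: np.ndarray) -> float:
    """belief: probability vector over hidden states; returns -H(belief)."""
    p = belief[belief > 0]
    return float(np.sum(p * np.log(p)))

print(neg_entropy_reward(np.array([0.25, 0.25, 0.25, 0.25])))  # -log 4, uncertain
print(neg_entropy_reward(np.array([1.0, 0.0, 0.0, 0.0])))      # 0, fully certain
```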

Extremely Weak Supervised Image-to-Image Translation for Semantic Segmentation

Title Extremely Weak Supervised Image-to-Image Translation for Semantic Segmentation
Authors Samarth Shukla, Luc Van Gool, Radu Timofte
Abstract Recent advances in generative models and adversarial training have led to a flourishing image-to-image (I2I) translation literature. The current I2I translation approaches require training images from the two domains that are either all paired (supervised) or all unpaired (unsupervised). In practice, obtaining paired training data in sufficient quantities is often very costly and cumbersome. Therefore solutions that employ unpaired data, while less accurate, are largely preferred. In this paper, we aim to bridge the gap between supervised and unsupervised I2I translation, with application to semantic image segmentation. We build upon pix2pix and CycleGAN, state-of-the-art seminal I2I translation techniques. We propose a method to select (very few) paired training samples and achieve significant improvements in both supervised and unsupervised I2I translation settings over random selection. Further, we boost the performance by incorporating both (selected) paired and unpaired samples in the training process. Our experiments show that an extremely weak supervised I2I translation solution using only one paired training sample can achieve a quantitative performance much better than the unsupervised CycleGAN model, and comparable to that of the supervised pix2pix model trained on thousands of pairs.
Tasks Image-to-Image Translation, Semantic Segmentation
Published 2019-09-18
URL https://arxiv.org/abs/1909.08542v1
PDF https://arxiv.org/pdf/1909.08542v1.pdf
PWC https://paperswithcode.com/paper/extremely-weak-supervised-image-to-image
Repo https://github.com/samarthshukla/ws-i2i
Framework pytorch
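The abstract does not spell out the pair-selection criterion, so the sketch below is only one plausible reading: choose the samples nearest to k-means cluster centres of an image-feature embedding as the (very few) images to pair, rather than selecting at random.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_pairs(features: np.ndarray, n_pairs: int) -> np.ndarray:
    """features: (n_images, d) embedding of the source-domain images.
    Returns indices of n_pairs representative images to pair/annotate.
    NOTE: a hypothetical selection rule, not the paper's stated method."""
    km = KMeans(n_clusters=n_pairs, n_init=10, random_state=0).fit(features)
    d = np.linalg.norm(features[:, None] - km.cluster_centers_[None], axis=2)
    return np.unique(d.argmin(axis=0))  # nearest sample to each centre

feats = np.random.default_rng(0).normal(size=(200, 32))
print(select_pairs(feats, n_pairs=3))
```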

Min-Entropy Latent Model for Weakly Supervised Object Detection

Title Min-Entropy Latent Model for Weakly Supervised Object Detection
Authors Fang Wan, Pengxu Wei, Zhenjun Han, Jianbin Jiao, Qixiang Ye
Abstract Weakly supervised object detection is a challenging task: only image-level category supervision is provided, yet object locations and object detectors must be learned at the same time. The inconsistency between the weak supervision and the learning objectives introduces significant randomness to object locations and ambiguity to detectors. In this paper, a min-entropy latent model (MELM) is proposed for weakly supervised object detection. Min-entropy serves as a model to learn object locations and as a metric to measure the randomness of object localization during learning. It aims to reduce the variance of learned instances in a principled way and to alleviate the ambiguity of detectors. MELM is decomposed into three components: proposal clique partition, object clique discovery, and object localization. MELM is optimized with a recurrent learning algorithm, which leverages continuation optimization to solve the challenging non-convexity problem. Experiments demonstrate that MELM significantly improves the performance of weakly supervised object detection, weakly supervised object localization, and image classification against the state-of-the-art approaches.
Tasks Image Classification, Object Detection, Object Localization, Weakly Supervised Object Detection, Weakly-Supervised Object Localization
Published 2019-02-16
URL http://arxiv.org/abs/1902.06057v1
PDF http://arxiv.org/pdf/1902.06057v1.pdf
PWC https://paperswithcode.com/paper/min-entropy-latent-model-for-weakly
Repo https://github.com/WinFrand/MELM
Framework pytorch
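The min-entropy principle itself is compact, sketched below (an illustration, not the released MELM code): for an image-level class, penalizing the entropy of the distribution over region proposals pushes the model to commit to a few consistent object locations.

```python
import torch

def min_entropy_loss(proposal_logits: torch.Tensor) -> torch.Tensor:
    """proposal_logits: (n_proposals,) scores for one image/class pair.
    Minimizing this entropy reduces the randomness of localization."""
    p = torch.softmax(proposal_logits, dim=0)
    return -(p * torch.log(p + 1e-12)).sum()

scores = torch.tensor([2.0, -1.0, 0.5, -2.0])
print(min_entropy_loss(scores))  # lower means less location ambiguity
```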

Modified Distribution Alignment for Domain Adaptation with Pre-trained Inception ResNet

Title Modified Distribution Alignment for Domain Adaptation with Pre-trained Inception ResNet
Authors Youshan Zhang, Brian D. Davison
Abstract Deep neural networks have been widely used in computer vision. There are several well-trained deep neural networks for the ImageNet classification challenge, which has played a significant role in image recognition. However, little work has explored pre-trained neural networks for image recognition in domain adaptation. In this paper, we are the first to extract better-represented features from a pre-trained Inception ResNet model for domain adaptation. We then present a modified distribution alignment method for classification using the extracted features. We test our model on three benchmark datasets (Office+Caltech-10, Office-31, and Office-Home). Extensive experiments demonstrate significant improvements (4.8%, 5.5%, and 10%) in classification accuracy over the state-of-the-art.
Tasks Domain Adaptation
Published 2019-04-04
URL http://arxiv.org/abs/1904.02322v2
PDF http://arxiv.org/pdf/1904.02322v2.pdf
PWC https://paperswithcode.com/paper/modified-distribution-alignment-for-domain
Repo https://github.com/heaventian93/MDAIR
Framework none
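A sketch of the two-stage recipe: extract features with the pre-trained backbone, align source and target feature distributions, then classify. CORAL is used below as a stand-in aligner; the paper's modified distribution alignment differs in its details.

```python
import numpy as np

def coral_align(Xs: np.ndarray, Xt: np.ndarray) -> np.ndarray:
    """Whiten source features, then re-colour them with target statistics,
    so a classifier trained on the aligned source transfers better."""
    d = Xs.shape[1]
    Cs = np.cov(Xs, rowvar=False) + np.eye(d)  # identity added as regularizer
    Ct = np.cov(Xt, rowvar=False) + np.eye(d)
    Ws = np.linalg.cholesky(np.linalg.inv(Cs))  # source whitening factor
    Wt = np.linalg.cholesky(Ct)                 # target re-colouring factor
    return (Xs - Xs.mean(0)) @ Ws @ Wt.T + Xt.mean(0)
```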

Adversarially Robust Generalization Just Requires More Unlabeled Data

Title Adversarially Robust Generalization Just Requires More Unlabeled Data
Authors Runtian Zhai, Tianle Cai, Di He, Chen Dan, Kun He, John Hopcroft, Liwei Wang
Abstract Neural network robustness has recently been highlighted by the existence of adversarial examples. Many previous works show that the learned networks do not perform well on perturbed test data, and that significantly more labeled data is required to achieve adversarially robust generalization. In this paper, we theoretically and empirically show that with just more unlabeled data, we can learn a model with better adversarially robust generalization. The key insight of our results is a risk decomposition theorem, in which the expected robust risk is separated into two parts: the stability part, which measures prediction stability in the presence of perturbations, and the accuracy part, which evaluates standard classification accuracy. As the stability part does not depend on any label information, we can optimize it using unlabeled data. We further prove that for a specific Gaussian mixture problem, adversarially robust generalization can be almost as easy as standard generalization in supervised learning if a sufficiently large amount of unlabeled data is provided. Inspired by these theoretical findings, we show that a practical adversarial training algorithm that leverages unlabeled data can improve adversarially robust generalization on MNIST and CIFAR-10.
Tasks
Published 2019-06-03
URL https://arxiv.org/abs/1906.00555v2
PDF https://arxiv.org/pdf/1906.00555v2.pdf
PWC https://paperswithcode.com/paper/190600555
Repo https://github.com/RuntianZ/adversarial-robustness-unlabeled
Framework pytorch
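The decomposition suggests a label-free training term, sketched below under illustrative settings (PGD-style inner maximization, KL divergence as the stability measure; pixel-range clamping omitted for brevity): the model's own clean predictions serve as targets, so the term is computable on unlabeled data.

```python
import torch
import torch.nn.functional as F

def stability_loss(model, x_unlabeled, eps=8 / 255, steps=5):
    """KL between predictions on clean and adversarially perturbed inputs.
    No labels are needed, only the model's own clean predictions."""
    with torch.no_grad():
        p_clean = F.softmax(model(x_unlabeled), dim=1)
    delta = torch.zeros_like(x_unlabeled, requires_grad=True)
    for _ in range(steps):  # inner maximization of the stability term
        loss = F.kl_div(F.log_softmax(model(x_unlabeled + delta), dim=1),
                        p_clean, reduction='batchmean')
        grad, = torch.autograd.grad(loss, delta)
        delta = ((delta + eps / steps * grad.sign())
                 .clamp(-eps, eps).detach().requires_grad_(True))
    return F.kl_div(F.log_softmax(model(x_unlabeled + delta), dim=1),
                    p_clean, reduction='batchmean')
```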

Multi-modal 3D Shape Reconstruction Under Calibration Uncertainty using Parametric Level Set Methods

Title Multi-modal 3D Shape Reconstruction Under Calibration Uncertainty using Parametric Level Set Methods
Authors Moshe Eliasof, Andrei Sharf, Eran Treister
Abstract We consider the problem of 3D shape reconstruction from multi-modal data, given uncertain calibration parameters. Typically, 3D data modalities can come in diverse forms such as sparse point sets, volumetric slices, 2D photos, and so on. To jointly process these data modalities, we exploit a parametric level set method that utilizes ellipsoidal radial basis functions. This method not only allows us to analytically and compactly represent the object, it also confers on us the ability to overcome calibration-related noise that originates from inaccurate acquisition parameters. This essentially implicit regularization leads to a highly robust and scalable reconstruction, surpassing traditional methods. In our results we first demonstrate the ability of the method to compactly represent complex objects. We then show that our reconstruction method is robust both to a small number of measurements and to noise in the acquisition parameters. Finally, we demonstrate our reconstruction abilities on diverse modalities such as volume slices obtained from liquid displacement (similar to CT scans and X-rays) and visual measurements obtained from shape silhouettes.
Tasks Calibration
Published 2019-04-23
URL https://arxiv.org/abs/1904.10379v2
PDF https://arxiv.org/pdf/1904.10379v2.pdf
PWC https://paperswithcode.com/paper/multi-modal-3d-shape-reconstruction-under
Repo https://github.com/BGUCompSci/ShapeReconstructionPaLS.jl
Framework none
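The representation can be made concrete in a few lines: a shape is a superlevel set of a weighted sum of ellipsoidal radial basis functions, so a handful of parameters (centres, ellipsoid matrices, weights) encode the whole object. The sketch below evaluates such a level-set function at query points; it is written in Python for consistency with the other sketches (the released code is Julia), and all names are illustrative.

```python
import numpy as np

def phi(x, centers, A_list, weights):
    """x: (n, 3) query points. Each basis i contributes
    w_i * exp(-||A_i (x - c_i)||^2); an anisotropic A_i makes the
    basis ellipsoidal. The shape is the set {x : phi(x) >= c}."""
    val = np.zeros(len(x))
    for c, A, w in zip(centers, A_list, weights):
        r = np.einsum('ij,nj->ni', A, x - c)
        val += w * np.exp(-(r ** 2).sum(axis=1))
    return val

pts = np.random.default_rng(0).uniform(-1, 1, size=(5, 3))
inside = phi(pts, centers=[np.zeros(3)], A_list=[np.diag([1., 2., 4.])],
             weights=[1.0]) >= 0.5   # boolean occupancy of the queried points
print(inside)
```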