February 1, 2020

3022 words 15 mins read

Paper Group AWR 295

Gromov-Wasserstein Factorization Models for Graph Clustering. Generative Question Refinement with Deep Reinforcement Learning in Retrieval-based QA System. Tree Transformer: Integrating Tree Structures into Self-Attention. Empirically Measuring Concentration: Fundamental Limits on Intrinsic Robustness. Place Recognition for Stereo Visual Odometry us …

Gromov-Wasserstein Factorization Models for Graph Clustering

Title Gromov-Wasserstein Factorization Models for Graph Clustering
Authors Hongteng Xu
Abstract We propose a new nonlinear factorization model for graphs that have topological structures and, optionally, node attributes. This model is based on a pseudometric called Gromov-Wasserstein (GW) discrepancy, which compares graphs in a relational way. It estimates observed graphs as GW barycenters constructed from a set of atoms with different weights. By minimizing the GW discrepancy between each observed graph and its GW barycenter-based estimate, we learn the atoms and the weights associated with the observed graphs. The model achieves a novel and flexible factorization mechanism under GW discrepancy, in which both the observed graphs and the learnable atoms can be unaligned and of different sizes. We design an effective approximate algorithm for learning this Gromov-Wasserstein factorization (GWF) model, unrolling loopy computations as stacked modules and computing gradients with backpropagation. The stacked modules can adopt two different architectures, corresponding to the proximal point algorithm (PPA) and the Bregman alternating direction method of multipliers (BADMM), respectively. Experiments show that our model obtains encouraging results on graph clustering.
Tasks Graph Clustering
Published 2019-11-19
URL https://arxiv.org/abs/1911.08530v1
PDF https://arxiv.org/pdf/1911.08530v1.pdf
PWC https://paperswithcode.com/paper/gromov-wasserstein-factorization-models-for
Repo https://github.com/HongtengXu/Relational-Factorization-Model
Framework pytorch
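The core quantity GWF minimizes, the GW discrepancy between an observed graph and its barycenter-based estimate, can be computed with off-the-shelf tools. Below is a minimal sketch (not the released PyTorch code) that compares two unaligned graphs of different sizes via the POT library; the full model would additionally learn atoms whose GW barycenter (e.g., `ot.gromov.gromov_barycenters`) approximates each observed graph.

```python
import networkx as nx
import ot  # Python Optimal Transport: pip install pot

def gw_discrepancy(g1: nx.Graph, g2: nx.Graph) -> float:
    # Represent each graph by its shortest-path matrix plus a uniform node
    # distribution; GW compares these relational structures directly, so no
    # node correspondence or equal graph size is needed.
    C1 = nx.floyd_warshall_numpy(g1)
    C2 = nx.floyd_warshall_numpy(g2)
    p, q = ot.unif(C1.shape[0]), ot.unif(C2.shape[0])
    return ot.gromov.gromov_wasserstein2(C1, C2, p, q, loss_fun='square_loss')

print(gw_discrepancy(nx.cycle_graph(6), nx.path_graph(9)))
```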

Generative Question Refinement with Deep Reinforcement Learning in Retrieval-based QA System

Title Generative Question Refinement with Deep Reinforcement Learning in Retrieval-based QA System
Authors Ye Liu, Chenwei Zhang, Xiaohui Yan, Yi Chang, Philip S. Yu
Abstract In real-world question-answering (QA) systems, ill-formed questions, such as those with wrong words, incorrect word order, or noisy expressions, are common and may prevent the QA system from understanding and answering them accurately. To eliminate the effect of ill-formed questions, we approach the question refinement task and propose a unified model, QREFINE, that refines ill-formed questions into well-formed ones. The basic idea is to learn a Seq2Seq model that generates a new question from the original one. To improve the quality and retrieval performance of the generated questions, we make two major improvements: 1) to better encode the semantics of ill-formed questions, we enrich the representation of questions with character embeddings and recently proposed contextual word embeddings such as BERT, in addition to traditional context-free word embeddings; 2) to enable the model to generate the desired questions, we train it with deep reinforcement learning techniques that treat appropriate wording of the generated question as an immediate reward and the correlation between the generated question and its answer as a time-delayed long-term reward. Experimental results on real-world datasets show that QREFINE generates refined questions that are more readable and contain fewer mistakes than the original questions provided by users. Moreover, the refined questions also significantly improve the accuracy of answer retrieval.
Tasks Question Answering, Word Embeddings
Published 2019-08-13
URL https://arxiv.org/abs/1908.05604v3
PDF https://arxiv.org/pdf/1908.05604v3.pdf
PWC https://paperswithcode.com/paper/generative-question-refinement-with-deep
Repo https://github.com/yeliu918/QREFINE-PPO
Framework tf
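The two-part reward structure is the heart of the RL training. The sketch below is an illustration, not the released QREFINE-PPO code (`gamma` and the reward values are placeholders): per-token wording rewards and the time-delayed retrieval reward combine into discounted returns for a policy-gradient update.

```python
from typing import List

def token_returns(wording_rewards: List[float],
                  retrieval_reward: float,
                  gamma: float = 0.99) -> List[float]:
    """Discounted return for each generated token.

    wording_rewards: immediate reward per token for appropriate wording.
    retrieval_reward: time-delayed reward from the answer-retrieval module,
        granted once the full refined question has been scored.
    """
    rewards = list(wording_rewards)
    rewards[-1] += retrieval_reward  # the delayed reward arrives at the end
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]

print(token_returns([0.1, 0.0, 0.2], retrieval_reward=1.0))
```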

Tree Transformer: Integrating Tree Structures into Self-Attention

Title Tree Transformer: Integrating Tree Structures into Self-Attention
Authors Yau-Shian Wang, Hung-Yi Lee, Yun-Nung Chen
Abstract Pre-training Transformers on large-scale raw text and fine-tuning on the desired task have achieved state-of-the-art results on diverse NLP tasks. However, it is unclear what the learned attention captures: the attention computed by attention heads does not seem to match human intuitions about hierarchical structure. This paper proposes Tree Transformer, which adds an extra constraint to the attention heads of the bidirectional Transformer encoder to encourage them to follow tree structures. The tree structures can be automatically induced from raw text by our proposed “Constituent Attention” module, which is implemented simply as self-attention between adjacent words. With a training procedure identical to BERT’s, the experiments demonstrate the effectiveness of Tree Transformer in inducing tree structures, improving language modeling, and learning more explainable attention scores.
Tasks Language Modelling
Published 2019-09-14
URL https://arxiv.org/abs/1909.06639v2
PDF https://arxiv.org/pdf/1909.06639v2.pdf
PWC https://paperswithcode.com/paper/tree-transformer-integrating-tree-structures
Repo https://github.com/yaushian/Tree-Transformer
Framework pytorch
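As a rough illustration of the Constituent Attention idea, the sketch below (an assumption-level reading of the abstract, not the authors' implementation) turns link probabilities between adjacent words into a soft constituent prior that gates the attention matrix, so attention across a likely constituent break is damped.

```python
import numpy as np

def constituent_prior(link_prob: np.ndarray) -> np.ndarray:
    """link_prob[i]: probability that words i and i+1 belong to the same
    constituent (obtained via self-attention between adjacent words).
    Returns C where C[i, j] is the product of link probabilities along the
    span i..j, so spans crossing a weak link get a small prior."""
    n = len(link_prob) + 1
    C = np.ones((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            C[i, j] = C[j, i] = np.prod(link_prob[i:j])
    return C

links = np.array([0.9, 0.1, 0.8])    # a likely break between words 1 and 2
attn = np.full((4, 4), 0.25)         # toy uniform attention
print(attn * constituent_prior(links))  # gated attention, pre-renormalization
```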

Empirically Measuring Concentration: Fundamental Limits on Intrinsic Robustness

Title Empirically Measuring Concentration: Fundamental Limits on Intrinsic Robustness
Authors Saeed Mahloujifar, Xiao Zhang, Mohammad Mahmoody, David Evans
Abstract Many recent works have shown that adversarial examples that fool classifiers can be found by minimally perturbing a normal input. Recent theoretical results, starting with Gilmer et al. (2018b), show that if the inputs are drawn from a concentrated metric probability space, then adversarial examples with small perturbation are inevitable. A concentrated space has the property that any subset with $\Omega(1)$ (e.g., 1/100) measure, according to the imposed distribution, has small distance to almost all (e.g., 99/100) of the points in the space. It is not clear, however, whether these theoretical results apply to actual distributions such as images. This paper presents a method for empirically measuring and bounding the concentration of a concrete dataset which is proven to converge to the actual concentration. We use it to empirically estimate the intrinsic robustness to $\ell_\infty$ and $\ell_2$ perturbations of several image classification benchmarks. Code for our experiments is available at https://github.com/xiaozhanguva/Measure-Concentration.
Tasks Image Classification
Published 2019-05-29
URL https://arxiv.org/abs/1905.12202v2
PDF https://arxiv.org/pdf/1905.12202v2.pdf
PWC https://paperswithcode.com/paper/empirically-measuring-concentration
Repo https://github.com/xiaozhanguva/Measure-Concentration
Framework pytorch
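The quantity at stake is the measure of the ε-expansion of a small-measure region. The snippet below is only a sketch of that definition evaluated on samples under the ℓ∞ metric; the paper's actual estimator additionally searches over candidate regions and is proven to converge to the true concentration.

```python
import numpy as np

def expansion_measure(points: np.ndarray, region: np.ndarray, eps: float) -> float:
    """Fraction of `points` within ℓ∞ distance eps of some point in `region`."""
    # pairwise ℓ∞ distances, shape (n_points, n_region)
    d = np.abs(points[:, None, :] - region[None, :, :]).max(axis=2)
    return float((d.min(axis=1) <= eps).mean())

rng = np.random.default_rng(0)
data = rng.uniform(size=(1000, 2))
region = data[:10]  # a subset of empirical measure 0.01
print(expansion_measure(data, region, eps=0.2))  # measure of its 0.2-expansion
```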

Place Recognition for Stereo Visual Odometry using LiDAR Descriptors

Title Place Recognition for Stereo Visual Odometry using LiDAR Descriptors
Authors Jiawei Mo, Junaed Sattar
Abstract Place recognition is a core component of SLAM, and in most visual SLAM systems it is based on the similarity between 2D images. However, the 3D points generated by visual odometry, and the structural information embedded within them, are not exploited. In this paper, we adapt place recognition methods for 3D point clouds to stereo visual odometry. Stereo visual odometry generates 3D point clouds with a consistent scale, so we can use global LiDAR descriptors for 3D point clouds to determine the similarity between places. 3D point clouds are more reliable than 2D visual cues (e.g., 2D features) under environmental changes such as varying illumination, and can benefit visual SLAM systems in long-term deployment scenarios. Extensive evaluation on a public dataset (Oxford RobotCar) demonstrates the accuracy and efficiency of using 3D point clouds for place recognition over 2D methods.
Tasks Visual Odometry
Published 2019-09-16
URL https://arxiv.org/abs/1909.07267v2
PDF https://arxiv.org/pdf/1909.07267v2.pdf
PWC https://paperswithcode.com/paper/place-recognition-for-stereo-visualodometry
Repo https://github.com/jiawei-mo/3d_place_recognition
Framework none
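As a flavor of the approach, the sketch below builds a Scan-Context-like polar occupancy descriptor from the 3D points that stereo VO already produces and compares two places by a column-wise cosine distance. The paper evaluates existing LiDAR descriptors; this grid, its bin sizes, and the distance (which omits the rotation-shift search) are only illustrative.

```python
import numpy as np

def polar_descriptor(points: np.ndarray, n_rings=20, n_sectors=60, r_max=50.0):
    """points: (N, 3) in the local frame. Returns an (n_rings, n_sectors)
    matrix holding the maximum height observed in each polar bin."""
    r = np.linalg.norm(points[:, :2], axis=1)
    theta = np.arctan2(points[:, 1], points[:, 0])           # [-pi, pi)
    ring = np.clip((r / r_max * n_rings).astype(int), 0, n_rings - 1)
    sector = ((theta + np.pi) / (2 * np.pi) * n_sectors).astype(int) % n_sectors
    desc = np.zeros((n_rings, n_sectors))
    np.maximum.at(desc, (ring, sector), points[:, 2])        # max height per bin
    return desc

def descriptor_distance(d1, d2):
    # column-wise cosine distance; a real matcher would also search over
    # sector shifts for rotation invariance
    num = (d1 * d2).sum(axis=0)
    den = np.linalg.norm(d1, axis=0) * np.linalg.norm(d2, axis=0) + 1e-9
    return 1.0 - float((num / den).mean())
```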

Least Squares Auto-Tuning

Title Least Squares Auto-Tuning
Authors Shane Barratt, Stephen Boyd
Abstract Least squares is by far the simplest and most commonly applied computational method in many fields. In almost all applications, the least squares objective is rarely the true objective. We account for this discrepancy by parametrizing the least squares problem and automatically adjusting these parameters using an optimization algorithm. We apply our method, which we call least squares auto-tuning, to data fitting.
Tasks
Published 2019-04-10
URL http://arxiv.org/abs/1904.05460v1
PDF http://arxiv.org/pdf/1904.05460v1.pdf
PWC https://paperswithcode.com/paper/least-squares-auto-tuning
Repo https://github.com/sbarratt/lsat
Framework pytorch
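A minimal sketch of the idea, under the assumption of a single tunable ridge weight: solve the least squares problem in closed form inside the training loop and let automatic differentiation carry gradients of a held-out loss back to the hyperparameter. The paper's formulation is considerably more general.

```python
import torch

A, b = torch.randn(80, 10), torch.randn(80, 1)          # training data
A_val, b_val = torch.randn(40, 10), torch.randn(40, 1)  # held-out data

log_lam = torch.zeros(1, requires_grad=True)  # tune log(lambda) for positivity
opt = torch.optim.Adam([log_lam], lr=0.1)
for _ in range(100):
    lam = log_lam.exp()
    # closed-form ridge solution; autograd differentiates through the solve
    x = torch.linalg.solve(A.T @ A + lam * torch.eye(10), A.T @ b)
    val_loss = ((A_val @ x - b_val) ** 2).mean()
    opt.zero_grad()
    val_loss.backward()
    opt.step()
print(float(log_lam.exp()))  # the auto-tuned regularization weight
```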

Sentence Centrality Revisited for Unsupervised Summarization

Title Sentence Centrality Revisited for Unsupervised Summarization
Authors Hao Zheng, Mirella Lapata
Abstract Single document summarization has enjoyed renewed interest in recent years thanks to the popularity of neural network models and the availability of large-scale datasets. In this paper we develop an unsupervised approach, arguing that it is unrealistic to expect large-scale and high-quality training data to be available or created for different types of summaries, domains, or languages. We revisit a popular graph-based ranking algorithm and modify how node (i.e., sentence) centrality is computed in two ways: (a) we employ BERT, a state-of-the-art neural representation learning model, to better capture sentential meaning, and (b) we build graphs with directed edges, arguing that the contribution of any two nodes to their respective centrality is influenced by their relative position in a document. Experimental results on three news summarization datasets representative of different languages and writing styles show that our approach outperforms strong baselines by a wide margin.
Tasks Document Summarization, Representation Learning
Published 2019-06-08
URL https://arxiv.org/abs/1906.03508v1
PDF https://arxiv.org/pdf/1906.03508v1.pdf
PWC https://paperswithcode.com/paper/sentence-centrality-revisited-for
Repo https://github.com/mswellhao/PacSum
Framework pytorch
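The directed-centrality computation can be sketched in a few lines (the released code is PacSum; the weights below are illustrative placeholders): similarities to following sentences and to preceding sentences contribute with different weights, encoding relative position in the document.

```python
import numpy as np

def directed_centrality(sim: np.ndarray, lambda_fwd=1.0, lambda_bwd=0.3):
    """sim[i, j]: similarity between sentences i and j (e.g., from BERT).
    Edges to following sentences count with weight lambda_fwd and edges to
    preceding sentences with lambda_bwd, reflecting that earlier sentences
    tend to matter more in news text."""
    n = sim.shape[0]
    scores = np.zeros(n)
    for i in range(n):
        scores[i] = (lambda_bwd * sim[i, :i].sum() +
                     lambda_fwd * sim[i, i + 1:].sum())
    return scores

sim = np.random.default_rng(0).random((5, 5))      # toy similarity matrix
print(directed_centrality(sim).argsort()[::-1])    # sentence ranking
```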

Confidence Estimation for Black Box Automatic Speech Recognition Systems Using Lattice Recurrent Neural Networks

Title Confidence Estimation for Black Box Automatic Speech Recognition Systems Using Lattice Recurrent Neural Networks
Authors Alexandros Kastanos, Anton Ragni, Mark Gales
Abstract Recently, there has been growth in providers of speech transcription services enabling others to leverage technology they would not normally be able to use. As a result, speech-enabled solutions have become commonplace. Their success critically relies on the quality, accuracy, and reliability of the underlying speech transcription systems. Those black box systems, however, offer limited means for quality control as only word sequences are typically available. This paper examines this limited resource scenario for confidence estimation, a measure commonly used to assess transcription reliability. In particular, it explores what other sources of word and sub-word level information available in the transcription process could be used to improve confidence scores. To encode all such information this paper extends lattice recurrent neural networks to handle sub-words. Experimental results using the IARPA OpenKWS 2016 evaluation system show that the use of additional information yields significant gains in confidence estimation accuracy. The implementation for this model can be found online.
Tasks Speech Recognition
Published 2019-10-25
URL https://arxiv.org/abs/1910.11933v2
PDF https://arxiv.org/pdf/1910.11933v2.pdf
PWC https://paperswithcode.com/paper/confidence-estimation-for-black-box-automatic
Repo https://github.com/alecokas/lattice_rnn
Framework pytorch
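A hedged sketch of the lattice recursion (not the released `lattice_rnn` code): unlike a chain RNN, each lattice node pools the hidden states propagated along all of its incoming arcs, so competing hypotheses and their word/sub-word features are all encoded. It assumes every non-start node has at least one incoming arc.

```python
import torch
import torch.nn as nn

class LatticeRNN(nn.Module):
    def __init__(self, feat_dim=8, hidden=16):
        super().__init__()
        self.cell = nn.GRUCell(feat_dim, hidden)
        self.hidden = hidden

    def forward(self, incoming_arcs, feats):
        """incoming_arcs[n]: list of (src_node, arc_id) for node n, with
        nodes numbered in topological order (node 0 is the lattice start).
        feats: (n_arcs, feat_dim) per-arc features, e.g. word/sub-word
        embeddings, durations, posteriors."""
        h = [torch.zeros(1, self.hidden)]
        for arcs in incoming_arcs[1:]:
            states = [self.cell(feats[a:a + 1], h[src]) for src, a in arcs]
            h.append(torch.stack(states).mean(dim=0))  # pool over arcs
        return h[-1]  # hidden state at the final lattice node

rnn = LatticeRNN()
feats = torch.randn(3, 8)  # three arcs, 8-dim features
# two paths through the lattice: 0 -> 1 -> 2 (arcs 0, 1) and 0 -> 2 (arc 2)
print(rnn([[], [(0, 0)], [(1, 1), (0, 2)]], feats).shape)
```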

AMPL: A Data-Driven Modeling Pipeline for Drug Discovery

Title AMPL: A Data-Driven Modeling Pipeline for Drug Discovery
Authors Amanda J. Minnich, Kevin McLoughlin, Margaret Tse, Jason Deng, Andrew Weber, Neha Murad, Benjamin D. Madej, Bharath Ramsundar, Tom Rush, Stacie Calad-Thomson, Jim Brase, Jonathan E. Allen
Abstract One of the key requirements for incorporating machine learning into the drug discovery process is complete reproducibility and traceability of the model building and evaluation process. With this in mind, we have developed an end-to-end modular and extensible software pipeline for building and sharing machine learning models that predict key pharma-relevant parameters. The ATOM Modeling PipeLine, or AMPL, extends the functionality of the open source library DeepChem and supports an array of machine learning and molecular featurization tools. We have benchmarked AMPL on a large collection of pharmaceutical datasets covering a wide range of parameters. These comprehensive experiments show that physicochemical descriptors and deep learning-based graph representations significantly outperform traditional fingerprints in the characterization of molecular features. We have also found that dataset size is directly correlated with prediction performance, and that single-task deep learning models only outperform shallow learners if there is sufficient data. Likewise, dataset size has a direct impact on model predictivity, independent of comprehensive hyperparameter tuning. Our findings point to the need for public dataset integration or multi-task/transfer learning approaches. Lastly, we found that uncertainty quantification (UQ) analysis may help identify model error; however, the efficacy of UQ for filtering predictions varies considerably across datasets and featurization/model types. AMPL is open source and available for download at http://github.com/ATOMconsortium/AMPL.
Tasks Drug Discovery, Transfer Learning
Published 2019-11-13
URL https://arxiv.org/abs/1911.05211v2
PDF https://arxiv.org/pdf/1911.05211v2.pdf
PWC https://paperswithcode.com/paper/ampl-a-data-driven-modeling-pipeline-for-drug
Repo https://github.com/ATOMconsortium/AMPL
Framework none

Information Gathering in Decentralized POMDPs by Policy Graph Improvement

Title Information Gathering in Decentralized POMDPs by Policy Graph Improvement
Authors Mikko Lauri, Joni Pajarinen, Jan Peters
Abstract Decentralized policies for information gathering are required when multiple autonomous agents are deployed to collect data about a phenomenon of interest without the ability to communicate. Decentralized partially observable Markov decision processes (Dec-POMDPs) are a general, principled model well-suited for such decentralized multiagent decision-making problems. In this paper, we investigate Dec-POMDPs for decentralized information gathering problems. An optimal solution of a Dec-POMDP maximizes the expected sum of rewards over time. To encourage information gathering, we set the reward as a function of the agents’ state information, for example the negative Shannon entropy. We prove that if the reward is convex, then the finite-horizon value function of the corresponding Dec-POMDP is also convex. We propose the first heuristic algorithm for information-gathering Dec-POMDPs and empirically demonstrate its effectiveness by solving problems an order of magnitude larger than the previous state of the art.
Tasks Decision Making
Published 2019-02-26
URL http://arxiv.org/abs/1902.09840v1
PDF http://arxiv.org/pdf/1902.09840v1.pdf
PWC https://paperswithcode.com/paper/information-gathering-in-decentralized-pomdps
Repo https://github.com/laurimi/npgi
Framework none
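The reward design is simple to state concretely: the reward is a convex function of the joint belief, for example negative Shannon entropy, so reducing uncertainty is directly rewarded. A tiny sketch of that reward:

```python
import numpy as np

def neg_entropy_reward(belief: np.ndarray) -> float:
    """belief: probability vector over hidden states; returns -H(belief)."""
    p = belief[belief > 0]
    return float(np.sum(p * np.log(p)))

print(neg_entropy_reward(np.array([0.25, 0.25, 0.25, 0.25])))  # -log 4, uncertain
print(neg_entropy_reward(np.array([1.0, 0.0, 0.0, 0.0])))      # 0, fully certain
```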

Extremely Weak Supervised Image-to-Image Translation for Semantic Segmentation

Title Extremely Weak Supervised Image-to-Image Translation for Semantic Segmentation
Authors Samarth Shukla, Luc Van Gool, Radu Timofte
Abstract Recent advances in generative models and adversarial training have led to a flourishing image-to-image (I2I) translation literature. The current I2I translation approaches require training images from the two domains that are either all paired (supervised) or all unpaired (unsupervised). In practice, obtaining paired training data in sufficient quantities is often very costly and cumbersome. Therefore solutions that employ unpaired data, while less accurate, are largely preferred. In this paper, we aim to bridge the gap between supervised and unsupervised I2I translation, with application to semantic image segmentation. We build upon pix2pix and CycleGAN, state-of-the-art seminal I2I translation techniques. We propose a method to select (very few) paired training samples and achieve significant improvements in both supervised and unsupervised I2I translation settings over random selection. Further, we boost the performance by incorporating both (selected) paired and unpaired samples in the training process. Our experiments show that an extremely weak supervised I2I translation solution using only one paired training sample can achieve a quantitative performance much better than the unsupervised CycleGAN model, and comparable to that of the supervised pix2pix model trained on thousands of pairs.
Tasks Image-to-Image Translation, Semantic Segmentation
Published 2019-09-18
URL https://arxiv.org/abs/1909.08542v1
PDF https://arxiv.org/pdf/1909.08542v1.pdf
PWC https://paperswithcode.com/paper/extremely-weak-supervised-image-to-image
Repo https://github.com/samarthshukla/ws-i2i
Framework pytorch
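The abstract does not spell out the pair-selection criterion, so the sketch below is only one plausible reading: choose the samples nearest to k-means cluster centres of an image-feature embedding as the (very few) images to pair, rather than selecting at random.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_pairs(features: np.ndarray, n_pairs: int) -> np.ndarray:
    """features: (n_images, d) embedding of the source-domain images.
    Returns indices of n_pairs representative images to pair/annotate.
    NOTE: a hypothetical selection rule, not the paper's stated method."""
    km = KMeans(n_clusters=n_pairs, n_init=10, random_state=0).fit(features)
    d = np.linalg.norm(features[:, None] - km.cluster_centers_[None], axis=2)
    return np.unique(d.argmin(axis=0))  # nearest sample to each centre

feats = np.random.default_rng(0).normal(size=(200, 32))
print(select_pairs(feats, n_pairs=3))
```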

Min-Entropy Latent Model for Weakly Supervised Object Detection

Title Min-Entropy Latent Model for Weakly Supervised Object Detection
Authors Fang Wan, Pengxu Wei, Zhenjun Han, Jianbin Jiao, Qixiang Ye
Abstract Weakly supervised object detection is a challenging task: only image-level category supervision is provided, yet object locations and object detectors must be learned at the same time. The inconsistency between the weak supervision and the learning objectives introduces significant randomness to object locations and ambiguity to detectors. In this paper, a min-entropy latent model (MELM) is proposed for weakly supervised object detection. Min-entropy serves as a model to learn object locations and as a metric to measure the randomness of object localization during learning. It aims to reduce the variance of learned instances in a principled way and to alleviate the ambiguity of detectors. MELM is decomposed into three components: proposal clique partition, object clique discovery, and object localization. MELM is optimized with a recurrent learning algorithm, which leverages continuation optimization to solve the challenging non-convexity problem. Experiments demonstrate that MELM significantly improves the performance of weakly supervised object detection, weakly supervised object localization, and image classification against the state-of-the-art approaches.
Tasks Image Classification, Object Detection, Object Localization, Weakly Supervised Object Detection, Weakly-Supervised Object Localization
Published 2019-02-16
URL http://arxiv.org/abs/1902.06057v1
PDF http://arxiv.org/pdf/1902.06057v1.pdf
PWC https://paperswithcode.com/paper/min-entropy-latent-model-for-weakly
Repo https://github.com/WinFrand/MELM
Framework pytorch
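The min-entropy principle itself is compact, sketched below (an illustration, not the released MELM code): for an image-level class, penalizing the entropy of the distribution over region proposals pushes the model to commit to a few consistent object locations.

```python
import torch

def min_entropy_loss(proposal_logits: torch.Tensor) -> torch.Tensor:
    """proposal_logits: (n_proposals,) scores for one image/class pair.
    Minimizing this entropy reduces the randomness of localization."""
    p = torch.softmax(proposal_logits, dim=0)
    return -(p * torch.log(p + 1e-12)).sum()

scores = torch.tensor([2.0, -1.0, 0.5, -2.0])
print(min_entropy_loss(scores))  # lower means less location ambiguity
```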

Modified Distribution Alignment for Domain Adaptation with Pre-trained Inception ResNet

Title Modified Distribution Alignment for Domain Adaptation with Pre-trained Inception ResNet
Authors Youshan Zhang, Brian D. Davison
Abstract Deep neural networks have been widely used in computer vision. There are several well-trained deep neural networks for the ImageNet classification challenge, which has played a significant role in image recognition. However, little work has explored pre-trained neural networks for image recognition in domain adaptation. In this paper, we are the first to extract better-represented features from a pre-trained Inception ResNet model for domain adaptation. We then present a modified distribution alignment method for classification using the extracted features. We test our model on three benchmark datasets (Office+Caltech-10, Office-31, and Office-Home). Extensive experiments demonstrate significant improvements (4.8%, 5.5%, and 10%) in classification accuracy over the state-of-the-art.
Tasks Domain Adaptation
Published 2019-04-04
URL http://arxiv.org/abs/1904.02322v2
PDF http://arxiv.org/pdf/1904.02322v2.pdf
PWC https://paperswithcode.com/paper/modified-distribution-alignment-for-domain
Repo https://github.com/heaventian93/MDAIR
Framework none
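A sketch of the two-stage recipe: extract features with the pre-trained backbone, align source and target feature distributions, then classify. CORAL is used below as a stand-in aligner; the paper's modified distribution alignment differs in its details.

```python
import numpy as np

def coral_align(Xs: np.ndarray, Xt: np.ndarray) -> np.ndarray:
    """Whiten source features, then re-colour them with target statistics,
    so a classifier trained on the aligned source transfers better."""
    d = Xs.shape[1]
    Cs = np.cov(Xs, rowvar=False) + np.eye(d)  # identity added as regularizer
    Ct = np.cov(Xt, rowvar=False) + np.eye(d)
    Ws = np.linalg.cholesky(np.linalg.inv(Cs))  # source whitening factor
    Wt = np.linalg.cholesky(Ct)                 # target re-colouring factor
    return (Xs - Xs.mean(0)) @ Ws @ Wt.T + Xt.mean(0)
```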

Adversarially Robust Generalization Just Requires More Unlabeled Data

Title Adversarially Robust Generalization Just Requires More Unlabeled Data
Authors Runtian Zhai, Tianle Cai, Di He, Chen Dan, Kun He, John Hopcroft, Liwei Wang
Abstract Neural network robustness has recently been highlighted by the existence of adversarial examples. Many previous works show that the learned networks do not perform well on perturbed test data, and that significantly more labeled data is required to achieve adversarially robust generalization. In this paper, we theoretically and empirically show that with just more unlabeled data, we can learn a model with better adversarially robust generalization. The key insight of our results is a risk decomposition theorem, in which the expected robust risk is separated into two parts: the stability part, which measures prediction stability in the presence of perturbations, and the accuracy part, which evaluates standard classification accuracy. As the stability part does not depend on any label information, we can optimize it using unlabeled data. We further prove that for a specific Gaussian mixture problem, adversarially robust generalization can be almost as easy as standard generalization in supervised learning if a sufficiently large amount of unlabeled data is provided. Inspired by these theoretical findings, we show that a practical adversarial training algorithm that leverages unlabeled data can improve adversarially robust generalization on MNIST and CIFAR-10.
Tasks
Published 2019-06-03
URL https://arxiv.org/abs/1906.00555v2
PDF https://arxiv.org/pdf/1906.00555v2.pdf
PWC https://paperswithcode.com/paper/190600555
Repo https://github.com/RuntianZ/adversarial-robustness-unlabeled
Framework pytorch
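The decomposition suggests a label-free training term, sketched below under illustrative settings (PGD-style inner maximization, KL divergence as the stability measure; pixel-range clamping omitted for brevity): the model's own clean predictions serve as targets, so the term is computable on unlabeled data.

```python
import torch
import torch.nn.functional as F

def stability_loss(model, x_unlabeled, eps=8 / 255, steps=5):
    """KL between predictions on clean and adversarially perturbed inputs.
    No labels are needed, only the model's own clean predictions."""
    with torch.no_grad():
        p_clean = F.softmax(model(x_unlabeled), dim=1)
    delta = torch.zeros_like(x_unlabeled, requires_grad=True)
    for _ in range(steps):  # inner maximization of the stability term
        loss = F.kl_div(F.log_softmax(model(x_unlabeled + delta), dim=1),
                        p_clean, reduction='batchmean')
        grad, = torch.autograd.grad(loss, delta)
        delta = ((delta + eps / steps * grad.sign())
                 .clamp(-eps, eps).detach().requires_grad_(True))
    return F.kl_div(F.log_softmax(model(x_unlabeled + delta), dim=1),
                    p_clean, reduction='batchmean')
```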

Multi-modal 3D Shape Reconstruction Under Calibration Uncertainty using Parametric Level Set Methods

Title Multi-modal 3D Shape Reconstruction Under Calibration Uncertainty using Parametric Level Set Methods
Authors Moshe Eliasof, Andrei Sharf, Eran Treister
Abstract We consider the problem of 3D shape reconstruction from multi-modal data, given uncertain calibration parameters. Typically, 3D data modalities can come in diverse forms such as sparse point sets, volumetric slices, 2D photos, and so on. To jointly process these data modalities, we exploit a parametric level set method that utilizes ellipsoidal radial basis functions. This method not only allows us to analytically and compactly represent the object, it also confers on us the ability to overcome calibration-related noise that originates from inaccurate acquisition parameters. This essentially implicit regularization leads to a highly robust and scalable reconstruction, surpassing traditional methods. In our results we first demonstrate the ability of the method to compactly represent complex objects. We then show that our reconstruction method is robust both to a small number of measurements and to noise in the acquisition parameters. Finally, we demonstrate our reconstruction abilities on diverse modalities such as volume slices obtained from liquid displacement (similar to CT scans and X-rays) and visual measurements obtained from shape silhouettes.
Tasks Calibration
Published 2019-04-23
URL https://arxiv.org/abs/1904.10379v2
PDF https://arxiv.org/pdf/1904.10379v2.pdf
PWC https://paperswithcode.com/paper/multi-modal-3d-shape-reconstruction-under
Repo https://github.com/BGUCompSci/ShapeReconstructionPaLS.jl
Framework none
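The representation can be made concrete in a few lines: a shape is a superlevel set of a weighted sum of ellipsoidal radial basis functions, so a handful of parameters (centres, ellipsoid matrices, weights) encode the whole object. The sketch below evaluates such a level-set function at query points; it is written in Python for consistency with the other sketches (the released code is Julia), and all names are illustrative.

```python
import numpy as np

def phi(x, centers, A_list, weights):
    """x: (n, 3) query points. Each basis i contributes
    w_i * exp(-||A_i (x - c_i)||^2); an anisotropic A_i makes the
    basis ellipsoidal. The shape is the set {x : phi(x) >= c}."""
    val = np.zeros(len(x))
    for c, A, w in zip(centers, A_list, weights):
        r = np.einsum('ij,nj->ni', A, x - c)
        val += w * np.exp(-(r ** 2).sum(axis=1))
    return val

pts = np.random.default_rng(0).uniform(-1, 1, size=(5, 3))
inside = phi(pts, centers=[np.zeros(3)], A_list=[np.diag([1., 2., 4.])],
             weights=[1.0]) >= 0.5   # boolean occupancy of the queried points
print(inside)
```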