Paper Group AWR 295
Gromov-Wasserstein Factorization Models for Graph Clustering
Title | Gromov-Wasserstein Factorization Models for Graph Clustering |
Authors | Hongteng Xu |
Abstract | We propose a new nonlinear factorization model for graphs with topological structures and, optionally, node attributes. This model is based on a pseudometric called the Gromov-Wasserstein (GW) discrepancy, which compares graphs in a relational way. It estimates observed graphs as GW barycenters constructed from a set of atoms with different weights. By minimizing the GW discrepancy between each observed graph and its GW barycenter-based estimation, we learn the atoms and the weights associated with the observed graphs. The model achieves a novel and flexible factorization mechanism under the GW discrepancy, in which both the observed graphs and the learnable atoms can be unaligned and of different sizes. We design an effective approximate algorithm for learning this Gromov-Wasserstein factorization (GWF) model, unrolling loopy computations as stacked modules and computing gradients with backpropagation. The stacked modules can have two different architectures, corresponding to the proximal point algorithm (PPA) and the Bregman alternating direction method of multipliers (BADMM), respectively. Experiments show that our model obtains encouraging results on clustering graphs. |
Tasks | Graph Clustering |
Published | 2019-11-19 |
URL | https://arxiv.org/abs/1911.08530v1 |
https://arxiv.org/pdf/1911.08530v1.pdf | |
PWC | https://paperswithcode.com/paper/gromov-wasserstein-factorization-models-for |
Repo | https://github.com/HongtengXu/Relational-Factorization-Model |
Framework | pytorch |
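The building block the GWF model minimizes is the GW discrepancy between unaligned graphs of different sizes. Below is a minimal sketch of that comparison using the POT library, not the authors' implementation (which is in the repo above); representing each graph by its shortest-path matrix with a uniform node distribution is one common convention, assumed here.

```python
import numpy as np
import networkx as nx
import ot  # POT: Python Optimal Transport

def graph_to_metric(G):
    """Represent a graph by a shortest-path distance matrix and a
    uniform distribution over its nodes."""
    D = np.asarray(nx.floyd_warshall_numpy(G), dtype=float)
    p = np.full(G.number_of_nodes(), 1.0 / G.number_of_nodes())
    return D, p

# Two graphs with different sizes and no node correspondence.
C1, p = graph_to_metric(nx.cycle_graph(6))
C2, q = graph_to_metric(nx.path_graph(9))

# Optimal coupling T and the GW discrepancy it achieves.
T, log = ot.gromov.gromov_wasserstein(
    C1, C2, p, q, loss_fun='square_loss', log=True)
print('GW discrepancy:', log['gw_dist'])
```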
Generative Question Refinement with Deep Reinforcement Learning in Retrieval-based QA System
Title | Generative Question Refinement with Deep Reinforcement Learning in Retrieval-based QA System |
Authors | Ye Liu, Chenwei Zhang, Xiaohui Yan, Yi Chang, Philip S. Yu |
Abstract | In real-world question-answering (QA) systems, ill-formed questions, such as those with wrong words, incorrect word order, and noisy expressions, are common and may prevent the QA system from understanding and answering them accurately. To eliminate the effect of ill-formed questions, we approach the question refinement task and propose a unified model, QREFINE, to refine ill-formed questions into well-formed ones. The basic idea is to learn a Seq2Seq model that generates a new question from the original one. To improve the quality and retrieval performance of the generated questions, we make two major improvements: 1) to better encode the semantics of ill-formed questions, we enrich the representation of questions with character embeddings and recently proposed contextual word embeddings such as BERT, in addition to traditional context-free word embeddings; 2) to make the model capable of generating the desired questions, we train it with deep reinforcement learning techniques that treat appropriate wording of the generation as an immediate reward and the correlation between the generated question and the answer as a time-delayed long-term reward. Experimental results on real-world datasets show that the proposed QREFINE method can generate refined questions that are more readable and contain fewer mistakes than the original questions provided by users. Moreover, the refined questions also significantly improve the accuracy of answer retrieval. |
Tasks | Question Answering, Word Embeddings |
Published | 2019-08-13 |
URL | https://arxiv.org/abs/1908.05604v3 |
https://arxiv.org/pdf/1908.05604v3.pdf | |
PWC | https://paperswithcode.com/paper/generative-question-refinement-with-deep |
Repo | https://github.com/yeliu918/QREFINE-PPO |
Framework | tf |
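The repository name suggests PPO is the reinforcement learning algorithm actually used; the sketch below shows the simplest policy-gradient (REINFORCE) form of the training signal the abstract describes, with `seq2seq`, `fluency_reward`, and `retrieval_reward` as hypothetical stand-ins for the model and the two reward terms.

```python
import torch

def reinforce_loss(logits, sampled_ids, reward):
    """logits: (T, vocab) scores for each generated step;
    sampled_ids: (T,) tokens sampled from the model;
    reward: scalar mixing the immediate wording reward and the
    time-delayed question-answer correlation reward."""
    log_probs = torch.log_softmax(logits, dim=-1)
    picked = log_probs.gather(1, sampled_ids.unsqueeze(1)).squeeze(1)
    return -(reward * picked).sum()  # gradient ascent on expected reward

# Usage under the assumptions above:
#   logits = seq2seq(ill_formed_question)                  # (T, vocab)
#   ids = torch.distributions.Categorical(logits=logits).sample()
#   r = fluency_reward(ids) + retrieval_reward(ids, answer)
#   reinforce_loss(logits, ids, r).backward()
```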
Tree Transformer: Integrating Tree Structures into Self-Attention
Title | Tree Transformer: Integrating Tree Structures into Self-Attention |
Authors | Yau-Shian Wang, Hung-Yi Lee, Yun-Nung Chen |
Abstract | Pre-training a Transformer on large-scale raw text and fine-tuning it on the desired task have achieved state-of-the-art results on diverse NLP tasks. However, it is unclear what the learned attention captures. The attention computed by attention heads does not seem to match human intuitions about hierarchical structures. This paper proposes Tree Transformer, which adds an extra constraint to the attention heads of the bidirectional Transformer encoder in order to encourage them to follow tree structures. The tree structures can be automatically induced from raw text by our proposed “Constituent Attention” module, which is implemented simply as self-attention between two adjacent words. With a training procedure identical to BERT’s, the experiments demonstrate the effectiveness of Tree Transformer in inducing tree structures, achieving better language modeling, and learning more explainable attention scores. |
Tasks | Language Modelling |
Published | 2019-09-14 |
URL | https://arxiv.org/abs/1909.06639v2 |
https://arxiv.org/pdf/1909.06639v2.pdf | |
PWC | https://paperswithcode.com/paper/tree-transformer-integrating-tree-structures |
Repo | https://github.com/yaushian/Tree-Transformer |
Framework | pytorch |
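A minimal sketch of the constituent-prior idea, filling in details the abstract leaves out: adjacent words get a link probability from attention-style scores, the probability that two positions share a constituent is taken as the product of the adjacent-link probabilities between them, and that prior gates the ordinary attention weights.

```python
import torch

def constituent_prior(h, eps=1e-9):
    """h: (n, d) word representations. Returns an (n, n) prior whose
    (i, j) entry is the product of adjacent-link probabilities
    between positions i and j."""
    n, d = h.shape
    link = torch.sigmoid((h[:-1] * h[1:]).sum(-1) / d ** 0.5)  # (n-1,)
    cum = torch.cat([torch.zeros(1),
                     torch.cumsum(torch.log(link + eps), dim=0)])
    # log prior[i, j] = sum of log link probs between i and j
    return torch.exp(-(cum[:, None] - cum[None, :]).abs())

def tree_attention(q, k, v):
    att = torch.softmax(q @ k.T / q.shape[-1] ** 0.5, dim=-1)
    att = att * constituent_prior(q)           # gate by the tree prior
    att = att / (att.sum(-1, keepdim=True) + 1e-9)
    return att @ v

x = torch.randn(7, 16)
print(tree_attention(x, x, x).shape)           # (7, 16)
```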
Empirically Measuring Concentration: Fundamental Limits on Intrinsic Robustness
Title | Empirically Measuring Concentration: Fundamental Limits on Intrinsic Robustness |
Authors | Saeed Mahloujifar, Xiao Zhang, Mohammad Mahmoody, David Evans |
Abstract | Many recent works have shown that adversarial examples that fool classifiers can be found by minimally perturbing a normal input. Recent theoretical results, starting with Gilmer et al. (2018b), show that if the inputs are drawn from a concentrated metric probability space, then adversarial examples with small perturbation are inevitable. A concentrated space has the property that any subset with $\Omega(1)$ (e.g., 1/100) measure, according to the imposed distribution, has small distance to almost all (e.g., 99/100) of the points in the space. It is not clear, however, whether these theoretical results apply to actual distributions such as images. This paper presents a method, proven to converge to the actual concentration, for empirically measuring and bounding the concentration of a concrete dataset. We use it to empirically estimate the intrinsic robustness to $\ell_\infty$ and $\ell_2$ perturbations of several image classification benchmarks. Code for our experiments is available at https://github.com/xiaozhanguva/Measure-Concentration. |
Tasks | Image Classification |
Published | 2019-05-29 |
URL | https://arxiv.org/abs/1905.12202v2 |
https://arxiv.org/pdf/1905.12202v2.pdf | |
PWC | https://paperswithcode.com/paper/empirically-measuring-concentration |
Repo | https://github.com/xiaozhanguva/Measure-Concentration |
Framework | pytorch |
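To make the measured quantity concrete, here is a naive empirical estimate (not the paper's certified procedure): take an alpha-measure subset of the data, expand it by epsilon in l_inf, and report the fraction of points the expansion covers.

```python
import numpy as np

def empirical_expansion(X, alpha=0.01, eps=0.2, seed=0):
    """X: (n, d) samples. Builds an alpha-measure l_inf ball around one
    seed point and returns the empirical measure of its eps-expansion."""
    d_inf = np.abs(X - X[seed]).max(axis=1)      # l_inf distances to seed
    radius = np.quantile(d_inf, alpha)           # ball holding ~alpha mass
    S = X[d_inf <= radius]
    # l_inf distance of every point to the subset S
    d_to_S = np.abs(X[:, None, :] - S[None, :, :]).max(-1).min(-1)
    return (d_to_S <= eps).mean()                # measure of the expansion

X = np.random.rand(2000, 10)
print(empirical_expansion(X))
```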
Place Recognition for Stereo Visual Odometry using LiDAR Descriptors
Title | Place Recognition for Stereo Visual Odometry using LiDAR Descriptors |
Authors | Jiawei Mo, Junaed Sattar |
Abstract | Place recognition is a core component of SLAM, and in most visual SLAM systems it is based on the similarity between 2D images. However, the 3D points generated by visual odometry, and the structural information embedded within them, are not exploited. In this paper, we adapt place recognition methods for 3D point clouds to stereo visual odometry. Stereo visual odometry generates 3D point clouds with a consistent scale, so we are able to use global LiDAR descriptors for 3D point clouds to determine the similarity between places. 3D point clouds are more robust than 2D visual cues (e.g., 2D features) to environmental changes such as varying illumination, and can benefit visual SLAM systems in long-term deployment scenarios. Extensive evaluation on a public dataset (Oxford RobotCar) demonstrates the accuracy and efficiency of using 3D point clouds for place recognition over 2D methods. |
Tasks | Visual Odometry |
Published | 2019-09-16 |
URL | https://arxiv.org/abs/1909.07267v2 |
https://arxiv.org/pdf/1909.07267v2.pdf | |
PWC | https://paperswithcode.com/paper/place-recognition-for-stereo-visualodometry |
Repo | https://github.com/jiawei-mo/3d_place_recognition |
Framework | none |
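A minimal sketch of the kind of global descriptor involved (the specific LiDAR descriptors evaluated are the paper's choice; this simplified polar-grid variant, storing the maximum height per cell and matching with shift-searched cosine similarity, is only a stand-in):

```python
import numpy as np

def polar_descriptor(points, n_rings=8, n_sectors=24, max_range=50.0):
    """points: (n, 3) cloud in the local frame. Bins points into a
    ring-by-sector grid, keeping the maximum height per cell."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.hypot(x, y)
    keep = r < max_range
    ring = (r[keep] / max_range * n_rings).astype(int)
    sector = ((np.arctan2(y, x)[keep] + np.pi) / (2 * np.pi)
              * n_sectors).astype(int) % n_sectors
    desc = np.zeros((n_rings, n_sectors))
    np.maximum.at(desc, (ring, sector), z[keep])
    return desc

def similarity(d1, d2):
    """Cosine similarity maximized over sector shifts (yaw invariance)."""
    best = -1.0
    for s in range(d2.shape[1]):
        d2s = np.roll(d2, s, axis=1)
        score = (d1 * d2s).sum() / (np.linalg.norm(d1)
                                    * np.linalg.norm(d2s) + 1e-9)
        best = max(best, score)
    return best
```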
Least Squares Auto-Tuning
Title | Least Squares Auto-Tuning |
Authors | Shane Barratt, Stephen Boyd |
Abstract | Least squares is by far the simplest and most commonly applied computational method in many fields. In almost all applications, the least squares objective is rarely the true objective. We account for this discrepancy by parametrizing the least squares problem and automatically adjusting these parameters using an optimization algorithm. We apply our method, which we call least squares auto-tuning, to data fitting. |
Tasks | |
Published | 2019-04-10 |
URL | http://arxiv.org/abs/1904.05460v1 |
http://arxiv.org/pdf/1904.05460v1.pdf | |
PWC | https://paperswithcode.com/paper/least-squares-auto-tuning |
Repo | https://github.com/sbarratt/lsat |
Framework | pytorch |
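A minimal sketch of the idea (the authors' package is in the repo above): give the least squares problem tunable per-row weights and a ridge parameter, note that its closed-form solution is differentiable in those parameters, and adjust them by gradient descent on a held-out loss.

```python
import torch

torch.manual_seed(0)
A, b = torch.randn(80, 5), torch.randn(80)      # training data (toy)
Av, bv = torch.randn(40, 5), torch.randn(40)    # held-out data (toy)

log_w = torch.zeros(80, requires_grad=True)     # per-row log-weights
log_lam = torch.zeros((), requires_grad=True)   # log ridge parameter
opt = torch.optim.Adam([log_w, log_lam], lr=0.05)

for step in range(200):
    w, lam = log_w.exp(), log_lam.exp()
    # closed-form solution of the weighted, regularized least squares
    H = A.T @ (w[:, None] * A) + lam * torch.eye(5)
    x = torch.linalg.solve(H, A.T @ (w * b))
    val_loss = ((Av @ x - bv) ** 2).mean()      # the "true" objective
    opt.zero_grad()
    val_loss.backward()
    opt.step()

print('held-out loss:', float(val_loss))
```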
Sentence Centrality Revisited for Unsupervised Summarization
Title | Sentence Centrality Revisited for Unsupervised Summarization |
Authors | Hao Zheng, Mirella Lapata |
Abstract | Single-document summarization has enjoyed renewed interest in recent years thanks to the popularity of neural network models and the availability of large-scale datasets. In this paper we develop an unsupervised approach, arguing that it is unrealistic to expect large-scale, high-quality training data to be available or created for different types of summaries, domains, or languages. We revisit a popular graph-based ranking algorithm and modify how node (i.e., sentence) centrality is computed, in two ways: (a) we employ BERT, a state-of-the-art neural representation learning model, to better capture sentential meaning, and (b) we build graphs with directed edges, arguing that the contribution of any two nodes to their respective centrality is influenced by their relative position in a document. Experimental results on three news summarization datasets representative of different languages and writing styles show that our approach outperforms strong baselines by a wide margin. |
Tasks | Document Summarization, Representation Learning |
Published | 2019-06-08 |
URL | https://arxiv.org/abs/1906.03508v1 |
https://arxiv.org/pdf/1906.03508v1.pdf | |
PWC | https://paperswithcode.com/paper/sentence-centrality-revisited-for |
Repo | https://github.com/mswellhao/PacSum |
Framework | pytorch |
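A minimal sketch of the directed-centrality scoring (the exact weighting and hyperparameters are in the PacSum repo; sentence vectors are assumed to come from BERT or any other sentence encoder):

```python
import numpy as np

def directed_centrality(E, lam1=-0.3, lam2=1.0):
    """E: (n, d) sentence embeddings in document order. Similarity to
    earlier and later sentences contributes with different weights,
    reflecting relative position in the document."""
    S = E @ E.T
    np.fill_diagonal(S, 0.0)
    backward = np.tril(S).sum(axis=1)   # edges to preceding sentences
    forward = np.triu(S).sum(axis=1)    # edges to following sentences
    return lam1 * backward + lam2 * forward

def summarize(E, sentences, k=3):
    """Return the k most central sentences, in document order."""
    idx = np.argsort(-directed_centrality(E))[:k]
    return [sentences[i] for i in sorted(idx)]
```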
Confidence Estimation for Black Box Automatic Speech Recognition Systems Using Lattice Recurrent Neural Networks
Title | Confidence Estimation for Black Box Automatic Speech Recognition Systems Using Lattice Recurrent Neural Networks |
Authors | Alexandros Kastanos, Anton Ragni, Mark Gales |
Abstract | Recently, there has been growth in providers of speech transcription services enabling others to leverage technology they would not normally be able to use. As a result, speech-enabled solutions have become commonplace. Their success critically relies on the quality, accuracy, and reliability of the underlying speech transcription systems. Those black box systems, however, offer limited means for quality control as only word sequences are typically available. This paper examines this limited resource scenario for confidence estimation, a measure commonly used to assess transcription reliability. In particular, it explores what other sources of word and sub-word level information available in the transcription process could be used to improve confidence scores. To encode all such information this paper extends lattice recurrent neural networks to handle sub-words. Experimental results using the IARPA OpenKWS 2016 evaluation system show that the use of additional information yields significant gains in confidence estimation accuracy. The implementation for this model can be found online. |
Tasks | Speech Recognition |
Published | 2019-10-25 |
URL | https://arxiv.org/abs/1910.11933v2 |
https://arxiv.org/pdf/1910.11933v2.pdf | |
PWC | https://paperswithcode.com/paper/confidence-estimation-for-black-box-automatic |
Repo | https://github.com/alecokas/lattice_rnn |
Framework | pytorch |
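A minimal sketch of a lattice recurrent network forward pass (the paper's sub-word extension and feature set are richer): arcs carry features, hidden states propagate along arcs in topological order, and states merge (here by mean) at each lattice node.

```python
import torch
import torch.nn as nn

class LatticeRNN(nn.Module):
    def __init__(self, feat_dim, hidden):
        super().__init__()
        self.cell = nn.GRUCell(feat_dim, hidden)
        self.out = nn.Linear(hidden, 1)
        self.hidden = hidden

    def forward(self, n_nodes, arcs, feats):
        """arcs: list of (src, dst) node ids, sorted so every arc into a
        node precedes all arcs out of it; feats: (n_arcs, feat_dim).
        Returns a confidence in [0, 1] per arc."""
        incoming = [[] for _ in range(n_nodes)]
        arc_states = []
        for (src, dst), f in zip(arcs, feats):
            h = (torch.stack(incoming[src]).mean(0) if incoming[src]
                 else torch.zeros(self.hidden))      # merge at the node
            h = self.cell(f.unsqueeze(0), h.unsqueeze(0)).squeeze(0)
            incoming[dst].append(h)
            arc_states.append(h)
        return torch.sigmoid(self.out(torch.stack(arc_states))).squeeze(-1)

net = LatticeRNN(feat_dim=4, hidden=16)
conf = net(4, [(0, 1), (0, 2), (1, 3), (2, 3)], torch.randn(4, 4))
print(conf)  # one confidence score per lattice arc
```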
AMPL: A Data-Driven Modeling Pipeline for Drug Discovery
Title | AMPL: A Data-Driven Modeling Pipeline for Drug Discovery |
Authors | Amanda J. Minnich, Kevin McLoughlin, Margaret Tse, Jason Deng, Andrew Weber, Neha Murad, Benjamin D. Madej, Bharath Ramsundar, Tom Rush, Stacie Calad-Thomson, Jim Brase, Jonathan E. Allen |
Abstract | One of the key requirements for incorporating machine learning into the drug discovery process is complete reproducibility and traceability of the model building and evaluation process. With this in mind, we have developed an end-to-end modular and extensible software pipeline for building and sharing machine learning models that predict key pharma-relevant parameters. The ATOM Modeling PipeLine, or AMPL, extends the functionality of the open source library DeepChem and supports an array of machine learning and molecular featurization tools. We have benchmarked AMPL on a large collection of pharmaceutical datasets covering a wide range of parameters. As a result of these comprehensive experiments, we have found that physicochemical descriptors and deep learning-based graph representations significantly outperform traditional fingerprints in the characterization of molecular features. We have also found that dataset size is directly correlated to prediction performance, and that single-task deep learning models only outperform shallow learners if there is sufficient data. Likewise, dataset size has a direct impact on model predictivity, independent of comprehensive hyperparameter model tuning. Our findings point to the need for public dataset integration or multi-task/transfer learning approaches. Lastly, we found that uncertainty quantification (UQ) analysis may help identify model error; however, efficacy of UQ to filter predictions varies considerably between datasets and featurization/model types. AMPL is open source and available for download at http://github.com/ATOMconsortium/AMPL. |
Tasks | Drug Discovery, Transfer Learning |
Published | 2019-11-13 |
URL | https://arxiv.org/abs/1911.05211v2 |
https://arxiv.org/pdf/1911.05211v2.pdf | |
PWC | https://paperswithcode.com/paper/ampl-a-data-driven-modeling-pipeline-for-drug |
Repo | https://github.com/ATOMconsortium/AMPL |
Framework | none |
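Not AMPL itself, but a minimal sketch of the featurization comparison it reports: RDKit physicochemical descriptors versus Morgan fingerprints feeding the same learner. The SMILES strings and labels below are toy placeholders.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem, Descriptors
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

smiles = ['CCO', 'CC(=O)O', 'c1ccccc1', 'CCN(CC)CC', 'CC(C)Cc1ccccc1']
y = np.array([0.2, 0.5, 1.1, 0.9, 2.3])  # placeholder targets

def descriptor_feats(mol):
    return [Descriptors.MolWt(mol), Descriptors.MolLogP(mol),
            Descriptors.TPSA(mol), Descriptors.NumHDonors(mol),
            Descriptors.NumHAcceptors(mol)]

def fingerprint_feats(mol, n_bits=512):
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=n_bits)
    arr = np.zeros((n_bits,))
    DataStructs.ConvertToNumpyArray(fp, arr)
    return arr

mols = [Chem.MolFromSmiles(s) for s in smiles]
for name, feats in [('descriptors', [descriptor_feats(m) for m in mols]),
                    ('fingerprints', [fingerprint_feats(m) for m in mols])]:
    r2 = cross_val_score(RandomForestRegressor(n_estimators=100),
                         np.array(feats), y, cv=2).mean()
    print(name, r2)
```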
Information Gathering in Decentralized POMDPs by Policy Graph Improvement
Title | Information Gathering in Decentralized POMDPs by Policy Graph Improvement |
Authors | Mikko Lauri, Joni Pajarinen, Jan Peters |
Abstract | Decentralized policies for information gathering are required when multiple autonomous agents are deployed to collect data about a phenomenon of interest without the ability to communicate. Decentralized partially observable Markov decision processes (Dec-POMDPs) are a general, principled model well-suited for such decentralized multiagent decision-making problems. In this paper, we investigate Dec-POMDPs for decentralized information gathering problems. An optimal solution of a Dec-POMDP maximizes the expected sum of rewards over time. To encourage information gathering, we set the reward as a function of the agents’ state information, for example the negative Shannon entropy. We prove that if the reward is convex, then the finite-horizon value function of the corresponding Dec-POMDP is also convex. We propose the first heuristic algorithm for information-gathering Dec-POMDPs, and empirically demonstrate its effectiveness by solving problems an order of magnitude larger than the previous state-of-the-art. |
Tasks | Decision Making |
Published | 2019-02-26 |
URL | http://arxiv.org/abs/1902.09840v1 |
http://arxiv.org/pdf/1902.09840v1.pdf | |
PWC | https://paperswithcode.com/paper/information-gathering-in-decentralized-pomdps |
Repo | https://github.com/laurimi/npgi |
Framework | none |
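The reward design described above is easy to state concretely: a convex function of the joint belief, such as negative Shannon entropy, rewards certainty about the hidden state. A minimal sketch:

```python
import numpy as np

def neg_entropy_reward(belief, eps=1e-12):
    """belief: probability vector over hidden states. Convex in the
    belief; maximal (zero) when the belief is a point mass."""
    b = np.clip(belief, eps, 1.0)
    return float(np.sum(b * np.log(b)))

print(neg_entropy_reward(np.array([0.25, 0.25, 0.25, 0.25])))  # -log 4
print(neg_entropy_reward(np.array([1.0, 0.0, 0.0, 0.0])))      # ~0.0
```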
Extremely Weak Supervised Image-to-Image Translation for Semantic Segmentation
Title | Extremely Weak Supervised Image-to-Image Translation for Semantic Segmentation |
Authors | Samarth Shukla, Luc Van Gool, Radu Timofte |
Abstract | Recent advances in generative models and adversarial training have led to a flourishing image-to-image (I2I) translation literature. The current I2I translation approaches require training images from the two domains that are either all paired (supervised) or all unpaired (unsupervised). In practice, obtaining paired training data in sufficient quantities is often very costly and cumbersome. Therefore solutions that employ unpaired data, while less accurate, are largely preferred. In this paper, we aim to bridge the gap between supervised and unsupervised I2I translation, with application to semantic image segmentation. We build upon pix2pix and CycleGAN, state-of-the-art seminal I2I translation techniques. We propose a method to select (very few) paired training samples and achieve significant improvements in both supervised and unsupervised I2I translation settings over random selection. Further, we boost the performance by incorporating both (selected) paired and unpaired samples in the training process. Our experiments show that an extremely weak supervised I2I translation solution using only one paired training sample can achieve a quantitative performance much better than the unsupervised CycleGAN model, and comparable to that of the supervised pix2pix model trained on thousands of pairs. |
Tasks | Image-to-Image Translation, Semantic Segmentation |
Published | 2019-09-18 |
URL | https://arxiv.org/abs/1909.08542v1 |
https://arxiv.org/pdf/1909.08542v1.pdf | |
PWC | https://paperswithcode.com/paper/extremely-weak-supervised-image-to-image |
Repo | https://github.com/samarthshukla/ws-i2i |
Framework | pytorch |
Min-Entropy Latent Model for Weakly Supervised Object Detection
Title | Min-Entropy Latent Model for Weakly Supervised Object Detection |
Authors | Fang Wan, Pengxu Wei, Zhenjun Han, Jianbin Jiao, Qixiang Ye |
Abstract | Weakly supervised object detection is a challenging task: only image-level category supervision is provided, yet the model must learn object locations and object detectors at the same time. The inconsistency between the weak supervision and the learning objectives introduces significant randomness to object locations and ambiguity to detectors. In this paper, a min-entropy latent model (MELM) is proposed for weakly supervised object detection. Min-entropy serves as a model to learn object locations and as a metric to measure the randomness of object localization during learning. It aims principally to reduce the variance of learned instances and alleviate the ambiguity of detectors. MELM is decomposed into three components: proposal clique partition, object clique discovery, and object localization. MELM is optimized with a recurrent learning algorithm, which leverages continuation optimization to solve the challenging non-convexity problem. Experiments demonstrate that MELM significantly improves the performance of weakly supervised object detection, weakly supervised object localization, and image classification against the state-of-the-art approaches. |
Tasks | Image Classification, Object Detection, Object Localization, Weakly Supervised Object Detection, Weakly-Supervised Object Localization |
Published | 2019-02-16 |
URL | http://arxiv.org/abs/1902.06057v1 |
http://arxiv.org/pdf/1902.06057v1.pdf | |
PWC | https://paperswithcode.com/paper/min-entropy-latent-model-for-weakly |
Repo | https://github.com/WinFrand/MELM |
Framework | pytorch |
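A minimal sketch of the min-entropy measure itself (MELM's clique partition and recurrent optimization are beyond an abstract-level sketch): the entropy of the score distribution over candidate proposals quantifies localization randomness and can be minimized alongside the image-level classification loss.

```python
import torch

def localization_entropy(proposal_scores):
    """proposal_scores: (n_proposals,) raw scores for one image and
    class. Low entropy means the model commits to few locations."""
    p = torch.softmax(proposal_scores, dim=0)
    return -(p * torch.log(p + 1e-9)).sum()

scores = torch.randn(100, requires_grad=True)
loss = localization_entropy(scores)   # added to the classification loss
loss.backward()
```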
Modified Distribution Alignment for Domain Adaptation with Pre-trained Inception ResNet
Title | Modified Distribution Alignment for Domain Adaptation with Pre-trained Inception ResNet |
Authors | Youshan Zhang, Brian D. Davison |
Abstract | Deep neural networks have been widely used in computer vision. There are several well-trained deep neural networks for the ImageNet classification challenge, which has played a significant role in image recognition. However, little work has explored pre-trained neural networks for image recognition in domain adaptation. In this paper, we are the first to extract better-represented features from a pre-trained Inception ResNet model for domain adaptation. We then present a modified distribution alignment method for classification using the extracted features. We test our model using three benchmark datasets (Office+Caltech-10, Office-31, and Office-Home). Extensive experiments demonstrate significant improvements (4.8%, 5.5%, and 10%) in classification accuracy over the state-of-the-art. |
Tasks | Domain Adaptation |
Published | 2019-04-04 |
URL | http://arxiv.org/abs/1904.02322v2 |
http://arxiv.org/pdf/1904.02322v2.pdf | |
PWC | https://paperswithcode.com/paper/modified-distribution-alignment-for-domain |
Repo | https://github.com/heaventian93/MDAIR |
Framework | none |
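A minimal sketch under stated assumptions: features are taken from a pre-trained Inception-ResNet (the commented lines use the timm package), and "distribution alignment" is illustrated with classic CORAL covariance matching; the paper's modified alignment differs in its details.

```python
import numpy as np
import scipy.linalg

def coral_align(source, target):
    """Whiten source features, then re-color with target covariance."""
    cs = np.cov(source, rowvar=False) + np.eye(source.shape[1])
    ct = np.cov(target, rowvar=False) + np.eye(target.shape[1])
    A = scipy.linalg.fractional_matrix_power(cs, -0.5)
    B = scipy.linalg.fractional_matrix_power(ct, 0.5)
    return np.real(source @ A @ B)

# Feature extraction (assumes the timm package):
#   import timm, torch
#   net = timm.create_model('inception_resnet_v2',
#                           pretrained=True, num_classes=0)
#   feats = net(torch.randn(8, 3, 299, 299)).detach().numpy()

src = np.random.randn(200, 64) * 2.0 + 1.0  # toy source features
tgt = np.random.randn(150, 64)              # toy target features
aligned = coral_align(src, tgt)
```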
Adversarially Robust Generalization Just Requires More Unlabeled Data
Title | Adversarially Robust Generalization Just Requires More Unlabeled Data |
Authors | Runtian Zhai, Tianle Cai, Di He, Chen Dan, Kun He, John Hopcroft, Liwei Wang |
Abstract | Neural network robustness has recently been highlighted by the existence of adversarial examples. Many previous works show that the learned networks do not perform well on perturbed test data, and that significantly more labeled data is required to achieve adversarially robust generalization. In this paper, we theoretically and empirically show that with just more unlabeled data, we can learn a model with better adversarially robust generalization. The key insight of our results is based on a risk decomposition theorem, in which the expected robust risk is separated into two parts: the stability part, which measures the prediction stability in the presence of perturbations, and the accuracy part, which evaluates the standard classification accuracy. As the stability part does not depend on any label information, we can optimize it using unlabeled data. We further prove that for a specific Gaussian mixture problem, adversarially robust generalization can be almost as easy as standard generalization in supervised learning if a sufficiently large amount of unlabeled data is provided. Inspired by these theoretical findings, we further show that a practical adversarial training algorithm that leverages unlabeled data can improve adversarially robust generalization on MNIST and CIFAR-10. |
Tasks | |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.00555v2 |
https://arxiv.org/pdf/1906.00555v2.pdf | |
PWC | https://paperswithcode.com/paper/190600555 |
Repo | https://github.com/RuntianZ/adversarial-robustness-unlabeled |
Framework | pytorch |
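A minimal sketch of the risk decomposition in training form (not the authors' exact algorithm): a cross-entropy accuracy term on labeled data plus a label-free stability term on unlabeled data, where stability compares predictions on clean and adversarially perturbed inputs.

```python
import torch
import torch.nn.functional as F

def stability_loss(model, x, eps=8 / 255):
    """Label-free: KL between predictions on x and on a one-step
    adversarial perturbation of x."""
    with torch.no_grad():
        p_clean = F.softmax(model(x), dim=1)
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    kl = F.kl_div(F.log_softmax(model(x + delta), dim=1),
                  p_clean, reduction='batchmean')
    grad, = torch.autograd.grad(kl, delta)
    delta = (delta + eps * grad.sign()).clamp(-eps, eps).detach()
    return F.kl_div(F.log_softmax(model(x + delta), dim=1),
                    p_clean, reduction='batchmean')

def training_step(model, x_lab, y_lab, x_unlab, lam=1.0):
    accuracy = F.cross_entropy(model(x_lab), y_lab)  # needs labels
    stability = stability_loss(model, x_unlab)       # needs no labels
    return accuracy + lam * stability
```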
Multi-modal 3D Shape Reconstruction Under Calibration Uncertainty using Parametric Level Set Methods
Title | Multi-modal 3D Shape Reconstruction Under Calibration Uncertainty using Parametric Level Set Methods |
Authors | Moshe Eliasof, Andrei Sharf, Eran Treister |
Abstract | We consider the problem of 3D shape reconstruction from multi-modal data, given uncertain calibration parameters. Typically, 3D data modalities can take diverse forms such as sparse point sets, volumetric slices, and 2D photos. To jointly process these data modalities, we exploit a parametric level set method that utilizes ellipsoidal radial basis functions. This method not only allows us to represent the object analytically and compactly; it also confers the ability to overcome calibration-related noise that originates from inaccurate acquisition parameters. This essentially implicit regularization leads to a highly robust and scalable reconstruction, surpassing other traditional methods. In our results we first demonstrate the ability of the method to compactly represent complex objects. We then show that our reconstruction method is robust both to a small number of measurements and to noise in the acquisition parameters. Finally, we demonstrate our reconstruction abilities on diverse modalities such as volume slices obtained from liquid displacement (similar to CT scans and X-rays) and visual measurements obtained from shape silhouettes. |
Tasks | Calibration |
Published | 2019-04-23 |
URL | https://arxiv.org/abs/1904.10379v2 |
https://arxiv.org/pdf/1904.10379v2.pdf | |
PWC | https://paperswithcode.com/paper/multi-modal-3d-shape-reconstruction-under |
Repo | https://github.com/BGUCompSci/ShapeReconstructionPaLS.jl |
Framework | none |
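A minimal sketch of the representation alone (the reconstruction and calibration optimization are omitted): the shape is the superlevel set of a weighted sum of ellipsoidal radial basis functions, so a handful of parameters describes a full 3D object.

```python
import numpy as np

def erbf_level_set(grid, centers, axes_inv, weights):
    """grid: (m, 3) query points; centers: (k, 3); axes_inv: (k, 3, 3)
    per-RBF matrices A_k; weights: (k,).
    phi(x) = sum_k w_k * exp(-||A_k (x - mu_k)||^2)."""
    phi = np.zeros(len(grid))
    for mu, A, w in zip(centers, axes_inv, weights):
        r = (grid - mu) @ A.T
        phi += w * np.exp(-(r ** 2).sum(axis=1))
    return phi

# The object is {x : phi(x) > c} for a threshold c.
g = np.stack(np.meshgrid(*[np.linspace(-1, 1, 32)] * 3), -1).reshape(-1, 3)
phi = erbf_level_set(
    g,
    centers=np.array([[0.0, 0.0, 0.0], [0.4, 0.0, 0.0]]),
    axes_inv=np.stack([np.diag([2.0, 4.0, 4.0]), np.eye(3) * 3.0]),
    weights=np.array([1.0, 0.8]))
print((phi > 0.5).sum(), 'of', len(g), 'grid points are inside the shape')
```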