January 31, 2020

Paper Group AWR 446

Cap2Det: Learning to Amplify Weak Caption Supervision for Object Detection

Title Cap2Det: Learning to Amplify Weak Caption Supervision for Object Detection
Authors Keren Ye, Mingda Zhang, Adriana Kovashka, Wei Li, Danfeng Qin, Jesse Berent
Abstract Learning to localize and name object instances is a fundamental problem in vision, but state-of-the-art approaches rely on expensive bounding box supervision. While weakly supervised detection (WSOD) methods relax the need for boxes to that of image-level annotations, even cheaper supervision is naturally available in the form of unstructured textual descriptions that users may freely provide when uploading image content. However, straightforward approaches to using such data for WSOD wastefully discard captions that do not exactly match object names. Instead, we show how to squeeze the most information out of these captions by training a text-only classifier that generalizes beyond dataset boundaries. Our discovery provides an opportunity for learning detection models from noisy but more abundant and freely-available caption data. We also validate our model on three classic object detection benchmarks and achieve state-of-the-art WSOD performance. Our code is available at https://github.com/yekeren/Cap2Det.
Tasks Object Detection
Published 2019-07-23
URL https://arxiv.org/abs/1907.10164v3
PDF https://arxiv.org/pdf/1907.10164v3.pdf
PWC https://paperswithcode.com/paper/cap2det-learning-to-amplify-weak-caption
Repo https://github.com/yekeren/Cap2Det
Framework tf

DCSO: Dynamic Combination of Detector Scores for Outlier Ensembles

Title DCSO: Dynamic Combination of Detector Scores for Outlier Ensembles
Authors Yue Zhao, Maciej K. Hryniewicki
Abstract Selecting and combining the outlier scores of different base detectors used within outlier ensembles can be quite challenging in the absence of ground truth. In this paper, an unsupervised outlier detector combination framework called DCSO is proposed, demonstrated and assessed for the dynamic selection of most competent base detectors, with an emphasis on data locality. The proposed DCSO framework first defines the local region of a test instance by its k nearest neighbors and then identifies the top-performing base detectors within the local region. Experimental results on ten benchmark datasets demonstrate that DCSO provides consistent performance improvement over existing static combination approaches in mining outlying objects. To facilitate interpretability and reliability of the proposed method, DCSO is analyzed using both theoretical frameworks and visualization techniques, and presented alongside empirical parameter setting instructions that can be used to improve the overall performance.
Tasks outlier ensembles
Published 2019-11-23
URL https://arxiv.org/abs/1911.10418v1
PDF https://arxiv.org/pdf/1911.10418v1.pdf
PWC https://paperswithcode.com/paper/dcso-dynamic-combination-of-detector-scores
Repo https://github.com/yzhao062/DCSO
Framework none
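
The DCSO abstract above describes two steps: define a local region around a test instance from its k nearest neighbors, then pick the base detector that performs best inside that region. The following is a minimal sketch of that idea, not the authors' implementation. The abstract does not say how competence is measured without ground truth, so this sketch assumes a pseudo target (the average of all detectors' scores) and Pearson correlation as the competence measure; the function names `pearson` and `select_local_detector` are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def select_local_detector(train_points, train_scores, test_point, k):
    """Return the index of the base detector judged most competent near test_point.

    train_scores[d][i] is the outlier score detector d assigned to training point i.
    """
    # Local region: indices of the k training points closest to the test instance.
    by_distance = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], test_point),
    )
    region = by_distance[:k]

    # Assumed pseudo target: average score over all detectors for each local point.
    n_detectors = len(train_scores)
    pseudo = [
        sum(train_scores[d][i] for d in range(n_detectors)) / n_detectors
        for i in region
    ]

    # Assumed competence: correlation between a detector's local scores and the pseudo target.
    competence = [
        pearson([train_scores[d][i] for i in region], pseudo)
        for d in range(n_detectors)
    ]
    return max(range(n_detectors), key=lambda d: competence[d])
```

The selected detector's score for the test instance would then be used as the ensemble output, so different test instances can end up relying on different detectors.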

Retrospective and Prospective Mixture-of-Generators for Task-oriented Dialogue Response Generation

Title Retrospective and Prospective Mixture-of-Generators for Task-oriented Dialogue Response Generation
Authors Jiahuan Pei, Pengjie Ren, Christof Monz, Maarten de Rijke
Abstract Dialogue response generation (DRG) is a critical component of task-oriented dialogue systems (TDSs). Its purpose is to generate proper natural language responses given some context, e.g., historical utterances, system states, etc. State-of-the-art work focuses on how to better tackle DRG in an end-to-end way. Typically, such studies assume that each token is drawn from a single distribution over the output vocabulary, which may not always be optimal. Responses vary greatly with different intents, e.g., domains, system actions. We propose a novel mixture-of-generators network (MoGNet) for DRG, where we assume that each token of a response is drawn from a mixture of distributions. MoGNet consists of a chair generator and several expert generators. Each expert is specialized for DRG w.r.t. a particular intent. The chair coordinates multiple experts and combines the output they have generated to produce more appropriate responses. We propose two strategies to help the chair make better decisions, namely, a retrospective mixture-of-generators (RMoG) and prospective mixture-of-generators (PMoG). The former only considers the historical expert-generated responses until the current time step while the latter also considers possible expert-generated responses in the future by encouraging exploration. In order to differentiate experts, we also devise a global-and-local (GL) learning scheme that forces each expert to be specialized towards a particular intent using a local loss and trains the chair and all experts to coordinate using a global loss. We carry out extensive experiments on the MultiWOZ benchmark dataset. MoGNet significantly outperforms state-of-the-art methods in terms of both automatic and human evaluations, demonstrating its effectiveness for DRG.
Tasks Task-Oriented Dialogue Systems
Published 2019-11-19
URL https://arxiv.org/abs/1911.08151v2
PDF https://arxiv.org/pdf/1911.08151v2.pdf
PWC https://paperswithcode.com/paper/retrospective-and-prospective-mixture-of
Repo https://github.com/Jiahuan-Pei/multiwoz-mdrg
Framework pytorch
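
The MoGNet abstract above assumes that each response token is drawn from a mixture of distributions, with a chair combining the outputs of several experts. As a rough illustration of that assumption only, the sketch below mixes per-expert token distributions with given chair weights. How the chair computes its weights (the RMoG and PMoG strategies) is not specified in the abstract, so the weights are taken as inputs here; `mix_token_distributions` is a hypothetical name, not part of the released code.

```python
def mix_token_distributions(expert_dists, chair_weights):
    """Combine per-expert token distributions into one mixture distribution.

    expert_dists: list of dicts mapping token -> probability (same vocabulary each).
    chair_weights: non-negative weights, one per expert (normalized here).
    """
    total = sum(chair_weights)
    weights = [w / total for w in chair_weights]
    vocabulary = expert_dists[0].keys()
    # p(token) = sum over experts of weight_k * p_k(token)
    return {
        token: sum(w * dist[token] for w, dist in zip(weights, expert_dists))
        for token in vocabulary
    }
```

Because the weights are normalized and each expert distribution sums to one, the mixture is itself a valid distribution over the vocabulary.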

Trident Segmentation CNN: A Spatiotemporal Transformation CNN for Punctate White Matter Lesions Segmentation in Preterm Neonates

Title Trident Segmentation CNN: A Spatiotemporal Transformation CNN for Punctate White Matter Lesions Segmentation in Preterm Neonates
Authors Yalong Liu, Jie Li, Miaomiao Wang, Zhicheng Jiao, Jian Yang, Xianjun Li
Abstract Accurate segmentation of punctate white matter lesions (PWML) in preterm neonates by an automatic algorithm can better assist doctors in diagnosis. However, the existing algorithms have many limitations, such as low detection accuracy and large resource consumption. In this paper, a novel spatiotemporal transformation deep learning method called Trident Segmentation CNN (TS-CNN) is proposed to segment PWML in MR images. It can convert spatial information into temporal information, which reduces the consumption of computing resources. Furthermore, a new improved training loss called Self-balancing Focal Loss (SBFL) is proposed to balance the loss during the training process. The whole model is evaluated on a dataset of 704 MR images. Overall, the method achieves a median DSC, sensitivity, specificity, and Hausdorff distance of 0.6355, 0.7126, 0.9998, and 24.5836 mm, respectively, which outperforms the state-of-the-art algorithm. The code is available at https://github.com/YalongLiu/Trident-Segmentation-CNN.
Tasks
Published 2019-10-22
URL https://arxiv.org/abs/1910.09773v1
PDF https://arxiv.org/pdf/1910.09773v1.pdf
PWC https://paperswithcode.com/paper/trident-segmentation-cnn-a-spatiotemporal
Repo https://github.com/YalongLiu/Trident-Segmentation-CNN
Framework tf

Flexibly-Structured Model for Task-Oriented Dialogues

Title Flexibly-Structured Model for Task-Oriented Dialogues
Authors Lei Shu, Piero Molino, Mahdi Namazifar, Hu Xu, Bing Liu, Huaixiu Zheng, Gokhan Tur
Abstract This paper proposes a novel end-to-end architecture for task-oriented dialogue systems. It is based on a simple and practical yet very effective sequence-to-sequence approach, where language understanding and state tracking tasks are modeled jointly with a structured copy-augmented sequential decoder and a multi-label decoder for each slot. The policy engine and language generation tasks are modeled jointly following that. The copy-augmented sequential decoder deals with new or unknown values in the conversation, while the multi-label decoder combined with the sequential decoder ensures the explicit assignment of values to slots. On the generation part, slot binary classifiers are used to improve performance. This architecture is scalable to real-world scenarios and is shown through an empirical evaluation to achieve state-of-the-art performance on both the Cambridge Restaurant dataset and the Stanford in-car assistant dataset. The code is available at https://github.com/uber-research/FSDM.
Tasks Task-Oriented Dialogue Systems, Text Generation
Published 2019-08-06
URL https://arxiv.org/abs/1908.02402v1
PDF https://arxiv.org/pdf/1908.02402v1.pdf
PWC https://paperswithcode.com/paper/flexibly-structured-model-for-task-oriented
Repo https://github.com/uber-research/FSDM
Framework pytorch

Geometry-Aware Symmetric Domain Adaptation for Monocular Depth Estimation

Title Geometry-Aware Symmetric Domain Adaptation for Monocular Depth Estimation
Authors Shanshan Zhao, Huan Fu, Mingming Gong, Dacheng Tao
Abstract Supervised depth estimation has achieved high accuracy due to the advanced deep network architectures. Since the groundtruth depth labels are hard to obtain, recent methods try to learn depth estimation networks in an unsupervised way by exploring unsupervised cues, which are effective but less reliable than true labels. An emerging way to resolve this dilemma is to transfer knowledge from synthetic images with ground truth depth via domain adaptation techniques. However, these approaches overlook specific geometric structure of the natural images in the target domain (i.e., real data), which is important for high-performing depth prediction. Motivated by the observation, we propose a geometry-aware symmetric domain adaptation framework (GASDA) to explore the labels in the synthetic data and epipolar geometry in the real data jointly. Moreover, by training two image style translators and depth estimators symmetrically in an end-to-end network, our model achieves better image style transfer and generates high-quality depth maps. The experimental results demonstrate the effectiveness of our proposed method and comparable performance against the state-of-the-art. Code will be publicly available at: https://github.com/sshan-zhao/GASDA.
Tasks Depth Estimation, Domain Adaptation, Monocular Depth Estimation, Style Transfer
Published 2019-04-03
URL http://arxiv.org/abs/1904.01870v1
PDF http://arxiv.org/pdf/1904.01870v1.pdf
PWC https://paperswithcode.com/paper/geometry-aware-symmetric-domain-adaptation
Repo https://github.com/sshan-zhao/GASDA
Framework pytorch

Selection Bias Explorations and Debias Methods for Natural Language Sentence Matching Datasets

Title Selection Bias Explorations and Debias Methods for Natural Language Sentence Matching Datasets
Authors Guanhua Zhang, Bing Bai, Jian Liang, Kun Bai, Shiyu Chang, Mo Yu, Conghui Zhu, Tiejun Zhao
Abstract Natural Language Sentence Matching (NLSM) has gained substantial attention from both academia and industry, and rich public datasets contribute a lot to this process. However, biased datasets can also hurt the generalization performance of trained models and give untrustworthy evaluation results. For many NLSM datasets, the providers select some pairs of sentences for inclusion, and this sampling procedure can easily introduce unintended patterns, i.e., selection bias. One example is the QuoraQP dataset, where some content-independent naive features are unreasonably predictive. Such features reflect the selection bias and are termed leakage features. In this paper, we investigate the problem of selection bias on six NLSM datasets and find that four of them are significantly biased. We further propose a training and evaluation framework to alleviate the bias. Experimental results on QuoraQP suggest that the proposed framework can improve the generalization ability of trained models, and give more trustworthy evaluation results for real-world adoption.
Tasks
Published 2019-05-15
URL https://arxiv.org/abs/1905.06221v4
PDF https://arxiv.org/pdf/1905.06221v4.pdf
PWC https://paperswithcode.com/paper/selection-bias-explorations-and-debias
Repo https://github.com/ghzhang233/Leakage-Neutral-Learning-for-QuoraQP
Framework tf

Bilateral Cyclic Constraint and Adaptive Regularization for Unsupervised Monocular Depth Prediction

Title Bilateral Cyclic Constraint and Adaptive Regularization for Unsupervised Monocular Depth Prediction
Authors Alex Wong, Byung-Woo Hong, Stefano Soatto
Abstract Supervised learning methods to infer (hypothesize) depth of a scene from a single image require costly per-pixel ground-truth. We follow a geometric approach that exploits abundant stereo imagery to learn a model to hypothesize scene structure without direct supervision. Although we train a network with stereo pairs, we only require a single image at test time to hypothesize disparity or depth. We propose a novel objective function that exploits the bilateral cyclic relationship between the left and right disparities and we introduce an adaptive regularization scheme that allows the network to handle both the co-visible and occluded regions in a stereo pair. This process ultimately produces a model to generate hypotheses for the 3-dimensional structure of the scene as viewed in a single image. When used to generate a single (most probable) estimate of depth, our method outperforms state-of-the-art unsupervised monocular depth prediction methods on the KITTI benchmarks. We show that our method generalizes well by applying our models trained on KITTI to the Make3d dataset.
Tasks Depth Estimation
Published 2019-03-18
URL https://arxiv.org/abs/1903.07309v3
PDF https://arxiv.org/pdf/1903.07309v3.pdf
PWC https://paperswithcode.com/paper/bilateral-cyclic-constraint-and-adaptive
Repo https://github.com/alexklwong/adareg-monodispnet
Framework tf

SOSD: A Benchmark for Learned Indexes

Title SOSD: A Benchmark for Learned Indexes
Authors Andreas Kipf, Ryan Marcus, Alexander van Renen, Mihail Stoian, Alfons Kemper, Tim Kraska, Thomas Neumann
Abstract A groundswell of recent work has focused on improving data management systems with learned components. Specifically, work on learned index structures has proposed replacing traditional index structures, such as B-trees, with learned models. Given the decades of research committed to improving index structures, there is significant skepticism about whether learned indexes actually outperform state-of-the-art implementations of traditional structures on real-world data. To answer this question, we propose a new benchmarking framework that comes with a variety of real-world datasets and baseline implementations to compare against. We also show preliminary results for selected index structures, and find that learned models indeed often outperform state-of-the-art implementations, and are therefore a promising direction for future research.
Tasks
Published 2019-11-29
URL https://arxiv.org/abs/1911.13014v1
PDF https://arxiv.org/pdf/1911.13014v1.pdf
PWC https://paperswithcode.com/paper/sosd-a-benchmark-for-learned-indexes
Repo https://github.com/learnedsystems/SOSD
Framework none
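
The SOSD abstract above refers to learned index structures, which replace a traditional index such as a B-tree with a model that predicts where a key sits in sorted data. The sketch below illustrates the general idea under simple assumptions: a single least-squares line maps key to position, and the worst training error bounds a final binary search. It is not taken from the SOSD benchmark or any of its baselines, the class name `LinearLearnedIndex` is hypothetical, and it assumes a non-empty sorted list of numeric keys.

```python
import bisect


class LinearLearnedIndex:
    """Toy learned index: a fitted line predicts a position, then a bounded search corrects it."""

    def __init__(self, sorted_keys):
        # Assumes sorted_keys is non-empty and sorted in ascending order.
        self.keys = sorted_keys
        n = len(sorted_keys)
        mean_key = sum(sorted_keys) / n
        mean_pos = (n - 1) / 2
        var_key = sum((k - mean_key) ** 2 for k in sorted_keys)
        cov = sum((k - mean_key) * (i - mean_pos) for i, k in enumerate(sorted_keys))
        # Least-squares fit: position ~ slope * key + intercept.
        self.slope = cov / var_key if var_key else 0.0
        self.intercept = mean_pos - self.slope * mean_key
        # Largest gap between predicted and true position over the stored keys.
        self.max_err = max(abs(i - self._predict(k)) for i, k in enumerate(sorted_keys))

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key):
        """Return the position of key, or None if it is not stored."""
        guess = self._predict(key)
        lo = max(0, guess - self.max_err)
        hi = min(len(self.keys), guess + self.max_err + 1)
        # Binary search only inside the window guaranteed to contain a stored key.
        pos = bisect.bisect_left(self.keys, key, lo, hi)
        if pos < len(self.keys) and self.keys[pos] == key:
            return pos
        return None
```

The lookup cost depends on how well the model fits the key distribution: a tight fit gives a small `max_err` and a short search window, which is the kind of trade-off a benchmark on real-world datasets is meant to expose.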

A Topology Layer for Machine Learning

Title A Topology Layer for Machine Learning
Authors Rickard Brüel-Gabrielsson, Bradley J. Nelson, Anjan Dwaraknath, Primoz Skraba, Leonidas J. Guibas, Gunnar Carlsson
Abstract Topology applied to real world data using persistent homology has started to find applications within machine learning, including deep learning. We present a differentiable topology layer that computes persistent homology based on level set filtrations and distance-based filtrations. We present three novel applications: the topological layer can (i) serve as a regularizer directly on data or the weights of machine learning models, (ii) construct a loss on the output of a deep generative network to incorporate topological priors, and (iii) perform topological adversarial attacks on deep networks trained with persistence features. The code is publicly available and we hope its availability will facilitate the use of persistent homology in deep learning and other gradient based applications.
Tasks
Published 2019-05-29
URL https://arxiv.org/abs/1905.12200v1
PDF https://arxiv.org/pdf/1905.12200v1.pdf
PWC https://paperswithcode.com/paper/a-topology-layer-for-machine-learning
Repo https://github.com/bruel-gabrielsson/TopologyLayer
Framework pytorch

On the Expressive Power of Deep Polynomial Neural Networks

Title On the Expressive Power of Deep Polynomial Neural Networks
Authors Joe Kileel, Matthew Trager, Joan Bruna
Abstract We study deep neural networks with polynomial activations, particularly their expressive power. For a fixed architecture and activation degree, a polynomial neural network defines an algebraic map from weights to polynomials. The image of this map is the functional space associated to the network, and it is an irreducible algebraic variety upon taking closure. This paper proposes the dimension of this variety as a precise measure of the expressive power of polynomial neural networks. We obtain several theoretical results regarding this dimension as a function of architecture, including an exact formula for high activation degrees, as well as upper and lower bounds on layer widths in order for deep polynomial networks to fill the ambient functional space. We also present computational evidence that it is profitable in terms of expressiveness for layer widths to increase monotonically and then decrease monotonically. Finally, we link our study to favorable optimization properties when training weights, and we draw intriguing connections with tensor and polynomial decompositions.
Tasks
Published 2019-05-29
URL https://arxiv.org/abs/1905.12207v1
PDF https://arxiv.org/pdf/1905.12207v1.pdf
PWC https://paperswithcode.com/paper/on-the-expressive-power-of-deep-polynomial
Repo https://github.com/mtrager/polynomial_networks
Framework none

A Semi-Supervised Self-Organizing Map with Adaptive Local Thresholds

Title A Semi-Supervised Self-Organizing Map with Adaptive Local Thresholds
Authors Pedro H. M. Braga, Hansenclever F. Bassani
Abstract In recent years, there has been growing interest in semi-supervised learning, since, in many learning tasks, there is a plentiful supply of unlabeled data but insufficient labeled data. Hence, semi-supervised learning models can benefit from both types of data to improve the obtained performance. Also, it is important to develop methods that are easy to parameterize in a way that is robust to the different characteristics of the data at hand. This article presents a new method based on Self-Organizing Map (SOM) for clustering and classification, called Adaptive Local Thresholds Semi-Supervised Self-Organizing Map (ALTSS-SOM). It can dynamically switch between two forms of learning at training time, according to the availability of labels, as in previous models, and can automatically adjust itself to the local variance observed in each data cluster. The results show that ALTSS-SOM surpasses the performance of other semi-supervised methods in terms of classification, and of other pure clustering methods when there are no labels available, while also being less sensitive than previous methods to parameter values.
Tasks
Published 2019-07-01
URL https://arxiv.org/abs/1907.01086v2
PDF https://arxiv.org/pdf/1907.01086v2.pdf
PWC https://paperswithcode.com/paper/a-semi-supervised-self-organizing-map-with
Repo https://github.com/phbraga/alt-sssom
Framework none

SampleNet: Differentiable Point Cloud Sampling

Title SampleNet: Differentiable Point Cloud Sampling
Authors Itai Lang, Asaf Manor, Shai Avidan
Abstract There is a growing number of tasks that work directly on point clouds. As the size of the point cloud grows, so do the computational demands of these tasks. A possible solution is to sample the point cloud first. Classic sampling approaches, such as farthest point sampling (FPS), do not consider the downstream task. A recent work showed that learning a task-specific sampling can improve results significantly. However, the proposed technique did not deal with the non-differentiability of the sampling operation and offered a workaround instead. We introduce a novel differentiable relaxation for point cloud sampling. Our approach employs a soft projection operation that approximates sampled points as a mixture of points in the primary input cloud. The approximation is controlled by a temperature parameter and converges to regular sampling when the temperature goes to zero. During training, we use a projection loss that encourages the temperature to drop, thereby driving every sample point to be close to one of the input points. This approximation scheme leads to consistently good results on various applications such as classification, retrieval, and geometric reconstruction. We also show that the proposed sampling network can be used as a front to a point cloud registration network. This is a challenging task since sampling must be consistent across two different point clouds. In all cases, our method works better than existing non-learned and learned sampling alternatives. Our code is publicly available at https://github.com/itailang/SampleNet.
Tasks Point Cloud Registration
Published 2019-12-08
URL https://arxiv.org/abs/1912.03663v1
PDF https://arxiv.org/pdf/1912.03663v1.pdf
PWC https://paperswithcode.com/paper/samplenet-differentiable-point-cloud-sampling
Repo https://github.com/itailang/SampleNet
Framework tf
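
The SampleNet abstract above describes a soft projection that approximates a sampled point as a mixture of points in the input cloud, controlled by a temperature that recovers regular sampling as it goes to zero. The sketch below illustrates that behavior under an assumed weighting: a softmax over negative squared distances to the k nearest input points, scaled by the squared temperature. The abstract does not give the exact weighting, so treat it as an assumption; `soft_project` is a hypothetical name, and this is plain Python rather than the authors' code.

```python
import math


def soft_project(query, cloud, temperature, k=3):
    """Project query onto a weighted mixture of its k nearest points in cloud.

    Assumed weights: softmax of -(squared distance) / temperature**2.
    """
    neighbors = sorted(cloud, key=lambda p: math.dist(p, query))[:k]
    sq_dists = [math.dist(p, query) ** 2 for p in neighbors]
    # Shift by the smallest squared distance so the largest exponent is 0 (numerical stability).
    logits = [-(d - sq_dists[0]) / temperature ** 2 for d in sq_dists]
    weights = [math.exp(value) for value in logits]
    total = sum(weights)
    dim = len(query)
    return tuple(
        sum(w * p[j] for w, p in zip(weights, neighbors)) / total
        for j in range(dim)
    )
```

With a small temperature, nearly all weight falls on the single nearest input point, so the output coincides with an actual point of the cloud. With a large temperature, the weights flatten and the output approaches the average of the k neighbors.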

PIC: Permutation Invariant Critic for Multi-Agent Deep Reinforcement Learning

Title PIC: Permutation Invariant Critic for Multi-Agent Deep Reinforcement Learning
Authors Iou-Jen Liu, Raymond A. Yeh, Alexander G. Schwing
Abstract Sample efficiency and scalability to a large number of agents are two important goals for multi-agent reinforcement learning systems. Recent works have brought us closer to those goals, addressing non-stationarity of the environment from a single agent’s perspective by utilizing a deep net critic which depends on all observations and actions. The critic input concatenates agent observations and actions in a user-specified order. However, since deep nets aren’t permutation invariant, a permuted input changes the critic output despite the environment remaining identical. To avoid this inefficiency, we propose a ‘permutation invariant critic’ (PIC), which yields identical output irrespective of the agent permutation. This consistent representation enables our model to scale to 30 times more agents and to achieve improvements in test episode reward of between 15% and 50% on the challenging multi-agent particle environment (MPE).
Tasks Multi-agent Reinforcement Learning
Published 2019-10-31
URL https://arxiv.org/abs/1911.00025v1
PDF https://arxiv.org/pdf/1911.00025v1.pdf
PWC https://paperswithcode.com/paper/pic-permutation-invariant-critic-for-multi
Repo https://github.com/IouJenLiu/PIC
Framework pytorch
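
The PIC abstract above contrasts a critic whose input concatenates agents in a fixed order with one whose output is the same for every agent ordering. The sketch below shows that contrast with toy linear critics. Mean pooling over agents is used here as a simple stand-in for permutation invariance; it is an assumption for illustration and not the architecture from the paper, and both function names are hypothetical.

```python
def concat_critic(agent_inputs, weights):
    """Order-dependent critic: a linear score over the concatenated agent inputs."""
    flat = [x for agent in agent_inputs for x in agent]
    return sum(w * x for w, x in zip(weights, flat))


def pooled_critic(agent_inputs, weights):
    """Order-independent critic: a linear score over the per-feature mean across agents."""
    n_agents = len(agent_inputs)
    n_features = len(agent_inputs[0])
    # Averaging over agents discards their order, so any permutation gives the same pooled vector.
    pooled = [
        sum(agent[j] for agent in agent_inputs) / n_agents
        for j in range(n_features)
    ]
    return sum(w * x for w, x in zip(weights, pooled))
```

Reordering the agents changes the output of `concat_critic` even though the joint state is the same, while `pooled_critic` returns the same value for every ordering.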

MAVEN: Multi-Agent Variational Exploration

Title MAVEN: Multi-Agent Variational Exploration
Authors Anuj Mahajan, Tabish Rashid, Mikayel Samvelyan, Shimon Whiteson
Abstract Centralised training with decentralised execution is an important setting for cooperative deep multi-agent reinforcement learning due to communication constraints during execution and computational tractability in training. In this paper, we analyse value-based methods that are known to have superior performance in complex environments [43]. We specifically focus on QMIX [40], the current state-of-the-art in this domain. We show that the representational constraints on the joint action-values introduced by QMIX and similar methods lead to provably poor exploration and suboptimality. Furthermore, we propose a novel approach called MAVEN that hybridises value and policy-based methods by introducing a latent space for hierarchical control. The value-based agents condition their behaviour on the shared latent variable controlled by a hierarchical policy. This allows MAVEN to achieve committed, temporally extended exploration, which is key to solving complex multi-agent tasks. Our experimental results show that MAVEN achieves significant performance improvements on the challenging SMAC domain [43].
Tasks Multi-agent Reinforcement Learning
Published 2019-10-16
URL https://arxiv.org/abs/1910.07483v2
PDF https://arxiv.org/pdf/1910.07483v2.pdf
PWC https://paperswithcode.com/paper/maven-multi-agent-variational-exploration
Repo https://github.com/AnujMahajanOxf/MAVEN
Framework pytorch