Paper Group AWR 38
$F$, $B$, Alpha Matting. Retrosynthesis Prediction with Conditional Graph Logic Network. Utilizing a null class to restrict decision spaces and defend against neural network adversarial attacks. UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World. Symmetrical Synthesis for Deep Metric Learning. Sparse principal component regression via singular value decomposition approach. From Planes to Corners: Multi-Purpose Primitive Detection in Unorganized 3D Point Clouds. Fine-Grained Visual Classification via Progressive Multi-Granularity Training of Jigsaw Patches. MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius. Real-time Fusion Network for RGB-D Semantic Segmentation Incorporating Unexpected Obstacle Detection for Road-driving Images. Learning Certified Individually Fair Representations. Algorithms for Tensor Network Contraction Ordering. Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition. Reliable Fidelity and Diversity Metrics for Generative Models. CRWIZ: A Framework for Crowdsourcing Real-Time Wizard-of-Oz Dialogues.
$F$, $B$, Alpha Matting
Title | $F$, $B$, Alpha Matting |
Authors | Marco Forte, François Pitié |
Abstract | Cutting out an object and estimating its opacity mask, known as image matting, is a key task in many image editing applications. Deep learning approaches have made significant progress by adapting the encoder-decoder architecture of segmentation networks. However, most of the existing networks only predict the alpha matte, and post-processing methods must then be used to recover the original foreground and background colours in the transparent regions. Recently, two methods have shown improved results by also estimating the foreground colours, but at a significant computational and memory cost. In this paper, we propose a low-cost modification to alpha matting networks to also predict the foreground and background colours. We study variations of the training regime and explore a wide range of existing and novel loss functions for the joint prediction. Our method achieves state-of-the-art performance on the Adobe Composition-1k dataset for alpha matte and composite colour quality. It is also the current best-performing method on the alphamatting.com online evaluation. |
Tasks | Image Matting |
Published | 2020-03-17 |
URL | https://arxiv.org/abs/2003.07711v1 |
https://arxiv.org/pdf/2003.07711v1.pdf | |
PWC | https://paperswithcode.com/paper/f-b-alpha-matting |
Repo | https://github.com/MarcoForte/FBA-Matting |
Framework | pytorch |
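The joint prediction is tied together by the standard compositing equation $I = \alpha F + (1 - \alpha) B$; losses comparing the recomposed image against the input are among those the paper explores. A minimal NumPy sketch of such a composition loss (function name and the L1 choice are ours; the paper combines several losses):

```python
import numpy as np

def composition_loss(image, alpha, fg, bg):
    """L1 compositing loss: how well alpha, F and B reassemble the input.

    image, fg, bg: (H, W, 3) float arrays in [0, 1]
    alpha:         (H, W, 1) float array in [0, 1]
    """
    recomposed = alpha * fg + (1.0 - alpha) * bg
    return np.abs(image - recomposed).mean()

# Toy check: a perfect decomposition gives zero loss.
fg = np.full((4, 4, 3), 0.8)
bg = np.full((4, 4, 3), 0.2)
alpha = np.full((4, 4, 1), 0.5)
image = alpha * fg + (1 - alpha) * bg
assert composition_loss(image, alpha, fg, bg) < 1e-12
```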
Retrosynthesis Prediction with Conditional Graph Logic Network
Title | Retrosynthesis Prediction with Conditional Graph Logic Network |
Authors | Hanjun Dai, Chengtao Li, Connor W. Coley, Bo Dai, Le Song |
Abstract | Retrosynthesis is one of the fundamental problems in organic chemistry. The task is to identify reactants that can be used to synthesize a specified product molecule. Recently, computer-aided retrosynthesis has found renewed interest from both the chemistry and computer science communities. Most existing approaches rely on template-based models that define subgraph matching rules, but whether or not a chemical reaction can proceed is not defined by hard decision rules. In this work, we propose a new approach to this task using the Conditional Graph Logic Network, a conditional graphical model built upon graph neural networks that learns when rules from reaction templates should be applied, implicitly considering whether the resulting reaction would be both chemically feasible and strategic. We also propose an efficient hierarchical sampling scheme to reduce the computational cost. While achieving a significant improvement of $8.1\%$ over current state-of-the-art methods on the benchmark dataset, our model also offers interpretations for its predictions. |
Tasks | |
Published | 2020-01-06 |
URL | https://arxiv.org/abs/2001.01408v1 |
https://arxiv.org/pdf/2001.01408v1.pdf | |
PWC | https://paperswithcode.com/paper/retrosynthesis-prediction-with-conditional-1 |
Repo | https://github.com/Hanjun-Dai/GLN |
Framework | pytorch |
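As we read the abstract, the model scores template applicability and reactant compatibility with GNN-parameterized energies, roughly as follows (notation ours: $O$ the product, $T$ a template, $R$ the reactants, $w_1$ and $w_2$ learned compatibility scores):

$$
p(R \mid O) = \sum_{T} p(T \mid O)\, p(R \mid T, O),
\qquad
p(T \mid O) \propto \exp\!\big(w_1(T, O)\big),
\quad
p(R \mid T, O) \propto \exp\!\big(w_2(R, T, O)\big)
$$

The hierarchical sampling mentioned in the abstract reduces the cost of normalizing these distributions over large template and reactant sets.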
Utilizing a null class to restrict decision spaces and defend against neural network adversarial attacks
Title | Utilizing a null class to restrict decision spaces and defend against neural network adversarial attacks |
Authors | Matthew J. Roos |
Abstract | Despite recent progress, deep neural networks generally continue to be vulnerable to so-called adversarial examples: input images with small perturbations that can change the output classification, despite no such change in the semantic meaning to human viewers. This is true even for seemingly simple challenges such as the MNIST digit classification task. In part, this suggests that these networks are not relying on the same set of object features as humans use to make these classifications. In this paper we examine an additional, and largely unexplored, cause behind this phenomenon: the use of the conventional training paradigm in which the entire input space is parcellated among the training classes. Owing to this paradigm, learned decision spaces for individual classes span excessively large regions of the input space and include images that have no semantic similarity to images in the training set. In this study, we train models that include a null class. That is, models may “opt-out” of classifying an input image as one of the digit classes. During training, null images are created through a variety of methods, in an attempt to create tighter and more semantically meaningful decision spaces for the digit classes. The best performing models classify nearly all adversarial examples as nulls, rather than mistaking them for a member of an incorrect digit class, while simultaneously maintaining high accuracy on the unperturbed test set. The use of a null class and the training paradigm presented herein may provide an effective defense against adversarial attacks for some applications. Code for replicating this study will be made available at https://github.com/mattroos/null_class_adversarial_defense . |
Tasks | Semantic Similarity, Semantic Textual Similarity |
Published | 2020-02-24 |
URL | https://arxiv.org/abs/2002.10084v1 |
https://arxiv.org/pdf/2002.10084v1.pdf | |
PWC | https://paperswithcode.com/paper/utilizing-a-null-class-to-restrict-decision |
Repo | https://github.com/mattroos/null_class_adversarial_defense |
Framework | none |
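A minimal sketch of the null-image generation step, assuming an eleventh class appended to the ten MNIST digits. The paper explores several null-generation methods; uniform noise and digit blends shown here are two plausible ones, and the function name is ours:

```python
import numpy as np

def make_null_batch(images, num_null, rng):
    """Generate 'null' training images: half uniform noise, half blends
    of unrelated digits (two plausible strategies; the paper tries several)."""
    n_noise = num_null // 2
    n_blend = num_null - n_noise
    noise = rng.uniform(0.0, 1.0, size=(n_noise,) + images.shape[1:])
    blends = 0.5 * (images[rng.integers(0, len(images), n_blend)]
                    + images[rng.integers(0, len(images), n_blend)])
    nulls = np.concatenate([noise, blends], axis=0)
    labels = np.full(len(nulls), 10)  # class 10 = null, after digits 0-9
    return nulls, labels

rng = np.random.default_rng(0)
digits = rng.random((100, 28, 28))          # stand-in for MNIST images
nulls, labels = make_null_batch(digits, num_null=32, rng=rng)
# A standard classifier is then trained with 11 output units.
```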
UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World
Title | UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World |
Authors | Shangbang Long, Cong Yao |
Abstract | Synthetic data has been a critical tool for training scene text detection and recognition models. On the one hand, synthetic word images have proven to be a successful substitute for real images in training scene text recognizers. On the other hand, however, scene text detectors still heavily rely on a large amount of manually annotated real-world images, which are expensive. In this paper, we introduce UnrealText, an efficient image synthesis method that renders realistic images via a 3D graphics engine. The 3D engine provides realistic appearance by rendering scene and text as a whole, and allows for better text region proposals with access to precise scene information, e.g. surface normals and even object meshes. Comprehensive experiments verify its effectiveness on both scene text detection and recognition. We also generate a multilingual version for future research into multilingual scene text detection and recognition. The code and the generated datasets are released at: https://github.com/Jyouhou/UnrealText/ . |
Tasks | Image Generation, Scene Text Detection |
Published | 2020-03-24 |
URL | https://arxiv.org/abs/2003.10608v1 |
https://arxiv.org/pdf/2003.10608v1.pdf | |
PWC | https://paperswithcode.com/paper/unrealtext-synthesizing-realistic-scene-text |
Repo | https://github.com/Jyouhou/UnrealText |
Framework | none |
Symmetrical Synthesis for Deep Metric Learning
Title | Symmetrical Synthesis for Deep Metric Learning |
Authors | Geonmo Gu, Byungsoo Ko |
Abstract | Deep metric learning aims to learn embeddings that contain semantic similarity information among data points. To learn better embeddings, methods to generate synthetic hard samples have been proposed. Existing methods for synthetic hard sample generation adopt autoencoders or generative adversarial networks, but this leads to more hyper-parameters, harder optimization, and slower training speed. In this paper, we address these problems by proposing a novel method of synthetic hard sample generation called symmetrical synthesis. Given two original feature points from the same class, the proposed method first generates synthetic points with each other as an axis of symmetry. Second, it performs hard negative pair mining within the original and synthetic points to select a more informative negative pair for computing the metric learning loss. Our proposed method is hyper-parameter free and plug-and-play for existing metric learning losses without network modification. We demonstrate the superiority of our proposed method over existing methods for a variety of loss functions on clustering and image retrieval tasks. Our implementation is publicly available. |
Tasks | Image Retrieval, Metric Learning, Semantic Similarity, Semantic Textual Similarity |
Published | 2020-01-31 |
URL | https://arxiv.org/abs/2001.11658v2 |
https://arxiv.org/pdf/2001.11658v2.pdf | |
PWC | https://paperswithcode.com/paper/symmetrical-synthesis-for-deep-metric |
Repo | https://github.com/clovaai/symmetrical-synthesis |
Framework | tf |
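One reading of the synthesis step, for L2-normalized embeddings: each point is reflected across the line spanned by its same-class partner, a Householder-style reflection (function name ours; a sketch, not the authors' implementation):

```python
import numpy as np

def symmetric_point(x, axis):
    """Reflect embedding x across the line spanned by `axis`
    (a Householder-style reflection)."""
    a = axis / np.linalg.norm(axis)
    return 2.0 * np.dot(x, a) * a - x

x1 = np.array([1.0, 0.0])                # two same-class embeddings
x2 = np.array([0.0, 1.0])
x1_syn = symmetric_point(x1, x2)         # -> [-1.,  0.]
x2_syn = symmetric_point(x2, x1)         # -> [ 0., -1.]
# Hard negative pair mining then picks the most informative negative
# among the original and synthetic points when computing the metric loss.
```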
Sparse principal component regression via singular value decomposition approach
Title | Sparse principal component regression via singular value decomposition approach |
Authors | Shuichi Kawano |
Abstract | Principal component regression (PCR) is a two-stage procedure: the first stage performs principal component analysis (PCA) and the second stage constructs a regression model whose explanatory variables are the principal components obtained in the first stage. Since PCA is performed using only the explanatory variables, the principal components carry no information about the response variable. To address this problem, we propose a one-stage procedure for PCR based on a singular value decomposition approach. Our approach combines two loss functions, a regression loss and a PCA loss, with sparse regularization. The proposed method enables us to obtain principal component loadings that possess information about both the explanatory variables and the response variable. An estimation algorithm is developed using the alternating direction method of multipliers. We conduct numerical studies to show the effectiveness of the proposed method. |
Tasks | |
Published | 2020-02-21 |
URL | https://arxiv.org/abs/2002.09188v1 |
https://arxiv.org/pdf/2002.09188v1.pdf | |
PWC | https://paperswithcode.com/paper/sparse-principal-component-regression-via |
Repo | https://github.com/ShuichiKawano/spcr-svd |
Framework | none |
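A schematic of the one-stage objective as suggested by the abstract, with a shared loading matrix $V$, a weight $w$ balancing the two losses, and an $\ell_1$ penalty for sparsity (notation entirely ours; the paper's exact formulation may differ):

$$
\min_{V,\,\beta}\;
\underbrace{\lVert y - X V \beta \rVert_2^2}_{\text{regression loss}}
\;+\; w\,\underbrace{\lVert X - X V V^{\top} \rVert_F^2}_{\text{PCA loss}}
\;+\; \lambda \lVert V \rVert_1
$$

ADMM then handles the split between the smooth losses and the non-smooth penalty.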
From Planes to Corners: Multi-Purpose Primitive Detection in Unorganized 3D Point Clouds
Title | From Planes to Corners: Multi-Purpose Primitive Detection in Unorganized 3D Point Clouds |
Authors | Christiane Sommer, Yumin Sun, Leonidas Guibas, Daniel Cremers, Tolga Birdal |
Abstract | We propose a new method for segmentation-free joint estimation of orthogonal planes, their intersection lines, their relationship graph, and corners lying at the intersection of three orthogonal planes. Such unified scene exploration under orthogonality allows for a multitude of applications such as semantic plane detection or local and global scan alignment, which in turn can aid robot localization or grasping tasks. Our two-stage pipeline involves a rough yet joint estimation of orthogonal planes followed by a subsequent joint refinement of plane parameters respecting their orthogonality relations. We form a graph of these primitives, paving the way to the extraction of further reliable features: lines and corners. Our experiments demonstrate the validity of our approach in numerous scenarios from wall detection to 6D tracking, both on synthetic and real data. |
Tasks | |
Published | 2020-01-21 |
URL | https://arxiv.org/abs/2001.07360v1 |
https://arxiv.org/pdf/2001.07360v1.pdf | |
PWC | https://paperswithcode.com/paper/from-planes-to-corners-multi-purpose |
Repo | https://github.com/c-sommer/orthogonal-planes |
Framework | none |
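A toy version of the orthogonality relation the primitive graph encodes: two planes are related when their unit normals are perpendicular within a tolerance, and corners arise where three mutually orthogonal planes meet (the thresholding scheme here is ours):

```python
import numpy as np

def orthogonal_pairs(normals, tol_deg=5.0):
    """Index pairs of planes whose unit normals are perpendicular within
    a tolerance; corners arise where three such planes mutually meet."""
    n = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    cos_tol = np.cos(np.deg2rad(90.0 - tol_deg))  # |cos| near 0 = orthogonal
    pairs = []
    for i in range(len(n)):
        for j in range(i + 1, len(n)):
            if abs(np.dot(n[i], n[j])) < cos_tol:
                pairs.append((i, j))
    return pairs

normals = np.array([[1.0, 0, 0], [0, 1.0, 0.05], [0, 0, 1.0]])
print(orthogonal_pairs(normals))   # [(0, 1), (0, 2), (1, 2)]
```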
Fine-Grained Visual Classification via Progressive Multi-Granularity Training of Jigsaw Patches
Title | Fine-Grained Visual Classification via Progressive Multi-Granularity Training of Jigsaw Patches |
Authors | Ruoyi Du, Dongliang Chang, Ayan Kumar Bhunia, Jiyang Xie, Yi-Zhe Song, Zhanyu Ma, Jun Guo |
Abstract | Fine-grained visual classification (FGVC) is much more challenging than traditional classification tasks due to the inherently subtle intra-class object variations. Recent works mainly tackle this problem by focusing on how to locate the most discriminative parts, more complementary parts, and parts of various granularities. However, less effort has been devoted to determining which granularities are the most discriminative and how to fuse information across granularities. In this work, we propose a novel framework for fine-grained visual classification to tackle these problems. In particular, we propose: (i) a novel progressive training strategy that adds new layers in each training step to exploit information based on the smaller-granularity information found at the last step and the previous stage; (ii) a simple jigsaw puzzle generator to form images that contain information at different granularity levels. We obtain state-of-the-art performance on several standard FGVC benchmark datasets, where the proposed method consistently outperforms existing methods or delivers competitive results. The code will be available at https://github.com/RuoyiDu/PMG-Progressive-Multi-Granularity-Training. |
Tasks | Fine-Grained Image Classification |
Published | 2020-03-08 |
URL | https://arxiv.org/abs/2003.03836v2 |
https://arxiv.org/pdf/2003.03836v2.pdf | |
PWC | https://paperswithcode.com/paper/fine-grained-visual-classification-via |
Repo | https://github.com/RuoyiDu/PMG-Progressive-Multi-Granularity-Training |
Framework | pytorch |
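A minimal version of the jigsaw puzzle generator: split the image into an $n \times n$ grid of patches and shuffle them, with finer jigsaws feeding earlier training steps (the granularity schedule shown is illustrative, not the paper's exact one):

```python
import numpy as np

def jigsaw(image, n):
    """Split an (H, W, C) image into an n-by-n grid of patches and
    shuffle them (H and W are assumed divisible by n)."""
    H, W, C = image.shape
    ph, pw = H // n, W // n
    patches = [image[i*ph:(i+1)*ph, j*pw:(j+1)*pw]
               for i in range(n) for j in range(n)]
    order = np.random.permutation(n * n)
    rows = [np.concatenate([patches[order[i*n + j]] for j in range(n)], axis=1)
            for i in range(n)]
    return np.concatenate(rows, axis=0)

# Progressive schedule: finer jigsaws for earlier (shallower) stages.
img = np.random.rand(224, 224, 3)
for n in (8, 4, 2, 1):        # n=1 leaves the image intact
    staged = jigsaw(img, n)
```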
MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius
Title | MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius |
Authors | Runtian Zhai, Chen Dan, Di He, Huan Zhang, Boqing Gong, Pradeep Ravikumar, Cho-Jui Hsieh, Liwei Wang |
Abstract | Adversarial training is one of the most popular ways to learn robust models but is usually attack-dependent and time-costly. In this paper, we propose the MACER algorithm, which learns robust models without using adversarial training but performs better than all existing provable $\ell_2$ defenses. Recent work shows that randomized smoothing can be used to provide a certified $\ell_2$ radius to smoothed classifiers, and our algorithm trains provably robust smoothed classifiers via MAximizing the CErtified Radius (MACER). The attack-free characteristic makes MACER faster to train and easier to optimize. In our experiments, we show that our method can be applied to modern deep neural networks on a wide range of datasets, including CIFAR-10, ImageNet, MNIST, and SVHN. For all tasks, MACER spends less training time than state-of-the-art adversarial training algorithms, and the learned models achieve a larger average certified radius. |
Tasks | |
Published | 2020-01-08 |
URL | https://arxiv.org/abs/2001.02378v3 |
https://arxiv.org/pdf/2001.02378v3.pdf | |
PWC | https://paperswithcode.com/paper/macer-attack-free-and-scalable-robust-1 |
Repo | https://github.com/RuntianZ/macer |
Framework | pytorch |
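The certified radius in question is the randomized-smoothing bound of Cohen et al. (2019). MACER trains on a soft, differentiable surrogate of it, but the hard-label form is easy to state:

```python
from scipy.stats import norm

def certified_radius(p_a, p_b, sigma):
    """l2 certified radius of a Gaussian-smoothed classifier
    (Cohen et al., 2019), the quantity MACER drives up during training.

    p_a: lower bound on the top-class probability under noise
    p_b: upper bound on the runner-up class probability
    sigma: standard deviation of the smoothing noise
    """
    if p_a <= p_b:
        return 0.0
    return 0.5 * sigma * (norm.ppf(p_a) - norm.ppf(p_b))

print(certified_radius(0.9, 0.05, sigma=0.25))   # ~0.366
```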
Real-time Fusion Network for RGB-D Semantic Segmentation Incorporating Unexpected Obstacle Detection for Road-driving Images
Title | Real-time Fusion Network for RGB-D Semantic Segmentation Incorporating Unexpected Obstacle Detection for Road-driving Images |
Authors | Lei Sun, Kailun Yang, Xinxin Hu, Weijian Hu, Kaiwei Wang |
Abstract | Semantic segmentation has made striking progress due to the success of deep convolutional neural networks. Considering the demands of autonomous driving, real-time semantic segmentation has become a research hotspot in recent years. However, few real-time RGB-D fusion semantic segmentation studies have been carried out, despite depth information being readily accessible nowadays. In this paper, we propose a real-time fusion semantic segmentation network termed RFNet that efficiently exploits complementary features from depth information to enhance performance in an attention-augmented way, while running swiftly, which is a necessity for autonomous vehicle applications. Multi-dataset training is leveraged to incorporate unexpected small obstacle detection, enriching the recognizable classes required to face unforeseen hazards in the real world. A comprehensive set of experiments demonstrates the effectiveness of our framework. On Cityscapes, our method outperforms previous state-of-the-art semantic segmenters, with excellent accuracy and 22 Hz inference speed at the full 2048$\times$1024 resolution, outperforming most existing RGB-D networks. |
Tasks | Autonomous Driving, Autonomous Vehicles, Real-Time Semantic Segmentation, Semantic Segmentation |
Published | 2020-02-24 |
URL | https://arxiv.org/abs/2002.10570v1 |
https://arxiv.org/pdf/2002.10570v1.pdf | |
PWC | https://paperswithcode.com/paper/real-time-fusion-network-for-rgb-d-semantic |
Repo | https://github.com/AHupuJR/RFNet |
Framework | pytorch |
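A schematic of attention-augmented fusion in the spirit the abstract describes: depth features gate the RGB stream channel-wise before the two are merged. This is the generic pattern, not RFNet's actual blocks, and the function name is ours:

```python
import numpy as np

def fuse(rgb_feat, depth_feat):
    """Attention-style fusion: depth features produce per-channel weights
    that recalibrate the depth contribution before summation.

    rgb_feat, depth_feat: (C, H, W) feature maps
    """
    gap = depth_feat.mean(axis=(1, 2))          # (C,) global average pooling
    weights = 1.0 / (1.0 + np.exp(-gap))        # sigmoid gate per channel
    return rgb_feat + weights[:, None, None] * depth_feat
```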
Learning Certified Individually Fair Representations
Title | Learning Certified Individually Fair Representations |
Authors | Anian Ruoss, Mislav Balunović, Marc Fischer, Martin Vechev |
Abstract | To effectively enforce fairness constraints one needs to define an appropriate notion of fairness and employ representation learning in order to impose this notion without compromising downstream utility for the data consumer. A desirable notion is individual fairness as it guarantees similar treatment for similar individuals. In this work, we introduce the first method which generalizes individual fairness to rich similarity notions via logical constraints while also enabling data consumers to obtain fairness certificates for their models. The key idea is to learn a representation that provably maps similar individuals to latent representations at most $\epsilon$ apart in $\ell_{\infty}$-distance, enabling data consumers to certify individual fairness by proving $\epsilon$-robustness of their classifier. Our experimental evaluation on six real-world datasets and a wide range of fairness constraints demonstrates that our approach is expressive enough to capture similarity notions beyond existing distance metrics while scaling to realistic use cases. |
Tasks | Representation Learning |
Published | 2020-02-24 |
URL | https://arxiv.org/abs/2002.10312v1 |
https://arxiv.org/pdf/2002.10312v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-certified-individually-fair |
Repo | https://github.com/eth-sri/lcifr |
Framework | pytorch |
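The certification logic in one line, where $S(x)$ is the set of individuals deemed similar to $x$, $f$ the learned encoder, and $h$ the data consumer's classifier (notation ours):

$$
\Big(\forall x' \in S(x):\ \lVert f(x) - f(x') \rVert_\infty \le \epsilon\Big)
\ \wedge\
\big(h \text{ is } \epsilon\text{-robust at } f(x)\big)
\ \Longrightarrow\
\forall x' \in S(x):\ h\big(f(x')\big) = h\big(f(x)\big)
$$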
Algorithms for Tensor Network Contraction Ordering
Title | Algorithms for Tensor Network Contraction Ordering |
Authors | Frank Schindler, Adam S. Jermyn |
Abstract | Contracting tensor networks is often computationally demanding. Well-designed contraction sequences can dramatically reduce the contraction cost. We apply simulated annealing and genetic algorithms, two common discrete optimization techniques, to this ordering problem. We benchmark their performance, as well as that of the commonly used greedy search, on physically relevant tensor networks. Where computationally feasible, we also compare them with the optimal contraction sequence obtained by an exhaustive search. We find that the algorithms we consider consistently outperform a greedy search given equal computational resources, with an advantage that scales with tensor network size. We compare the obtained contraction sequences and identify signs of highly non-local optimization, with the more sophisticated algorithms sacrificing run-time early in the contraction for better overall performance. |
Tasks | Tensor Networks |
Published | 2020-01-15 |
URL | https://arxiv.org/abs/2001.08063v1 |
https://arxiv.org/pdf/2001.08063v1.pdf | |
PWC | https://paperswithcode.com/paper/algorithms-for-tensor-network-contraction |
Repo | https://github.com/frankschindler/OptimizedTensorContraction |
Framework | none |
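The greedy baseline the paper compares against is easy to reproduce with NumPy's built-in path optimizer; on small networks it can also be checked against the exhaustive optimum (the ring-shaped expression below is ours, not from the paper):

```python
import numpy as np

# Compare the greedy baseline with the exhaustive-search optimum on a
# small ring-shaped tensor network.
tensors = [np.random.rand(8, 8, 8) for _ in range(4)]
expr = 'abc,cde,efg,gha->bdfh'

for mode in ('greedy', 'optimal'):
    path, info = np.einsum_path(expr, *tensors, optimize=mode)
    print(mode, path)     # `info` also reports the implied FLOP count
```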
Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition
Title | Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition |
Authors | Canjie Luo, Yuanzhi Zhu, Lianwen Jin, Yongpan Wang |
Abstract | Handwritten text and scene text suffer from various shapes and distorted patterns. Thus, training a robust recognition model requires a large amount of data to cover as much diversity as possible. In contrast to data collection and annotation, data augmentation is a low-cost way to achieve this. In this paper, we propose a new method for text image augmentation. Different from traditional augmentation methods such as rotation, scaling, and perspective transformation, our proposed method learns proper and efficient data augmentation that is more effective and specific to training a robust recognizer. By using a set of custom fiducial points, the proposed augmentation method is flexible and controllable. Furthermore, we bridge the gap between the isolated processes of data augmentation and network optimization by joint learning. An agent network learns from the output of the recognition network and controls the fiducial points to generate more proper training samples for the recognition network. Extensive experiments on various benchmarks, including regular scene text, irregular scene text, and handwritten text, show that the proposed augmentation and joint learning methods significantly boost the performance of the recognition networks. A general toolkit for geometric augmentation is available. |
Tasks | Data Augmentation, Image Augmentation |
Published | 2020-03-14 |
URL | https://arxiv.org/abs/2003.06606v1 |
https://arxiv.org/pdf/2003.06606v1.pdf | |
PWC | https://paperswithcode.com/paper/learn-to-augment-joint-data-augmentation-and |
Repo | https://github.com/Canjie-Luo/Scene-Text-Image-Transformer |
Framework | pytorch |
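A sketch of the fiducial-point mechanism: control points along the text image's edges are moved by bounded offsets, and a warp (e.g. a thin-plate spline) driven by the moved points produces the augmented sample. In the paper the offsets come from the agent network; here they are random placeholders, and the layout of the points is our assumption:

```python
import numpy as np

def fiducial_grid(h, w, n=4):
    """n x 2 fiducial points along the top and bottom edges of a text
    image: the control points the agent network learns to move."""
    xs = np.linspace(0, w - 1, n)
    top = np.stack([xs, np.zeros(n)], axis=1)
    bot = np.stack([xs, np.full(n, h - 1)], axis=1)
    return np.concatenate([top, bot], axis=0)    # (2n, 2)

def augment_points(points, offsets, radius):
    """Move each fiducial point by a bounded offset; in the paper the
    offsets are predicted by the agent network."""
    return points + np.clip(offsets, -radius, radius)

pts = fiducial_grid(32, 100)
moved = augment_points(pts, np.random.randn(*pts.shape) * 2, radius=5)
# A thin-plate-spline warp driven by (pts -> moved) yields the sample.
```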
Reliable Fidelity and Diversity Metrics for Generative Models
Title | Reliable Fidelity and Diversity Metrics for Generative Models |
Authors | Muhammad Ferjad Naeem, Seong Joon Oh, Youngjung Uh, Yunjey Choi, Jaejun Yoo |
Abstract | Devising indicative evaluation metrics for the image generation task remains an open problem. The most widely used metric for measuring the similarity between real and generated images has been the Fréchet Inception Distance (FID) score. Because it does not differentiate the fidelity and diversity aspects of the generated images, recent papers have introduced variants of precision and recall metrics to diagnose those properties separately. In this paper, we show that even the latest versions of the precision and recall metrics are not yet reliable. For example, they fail to detect the match between two identical distributions, they are not robust against outliers, and the evaluation hyperparameters are selected arbitrarily. We propose density and coverage metrics that solve the above issues. We analytically and experimentally show that density and coverage provide more interpretable and reliable signals for practitioners than the existing metrics. Code: https://github.com/clovaai/generative-evaluation-prdc. |
Tasks | Image Generation |
Published | 2020-02-23 |
URL | https://arxiv.org/abs/2002.09797v1 |
https://arxiv.org/pdf/2002.09797v1.pdf | |
PWC | https://paperswithcode.com/paper/reliable-fidelity-and-diversity-metrics-for |
Repo | https://github.com/clovaai/generative-evaluation-prdc |
Framework | none |
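The proposed metrics count how generated samples fall inside $k$-NN balls around real samples. A NumPy sketch following the paper's definitions (the authors' reference implementation lives in the linked repo):

```python
import numpy as np

def knn_radii(x, k):
    """Distance from each row of x to its k-th nearest neighbour in x."""
    d = np.linalg.norm(x[:, None] - x[None, :], axis=-1)
    return np.sort(d, axis=1)[:, k]       # column 0 is self-distance (0.0)

def density_coverage(real, fake, k=5):
    """Density and coverage as defined in the paper.

    Density averages, over fake samples, how many real-sample k-NN balls
    contain them (normalized by k); coverage is the fraction of real
    samples whose k-NN ball contains at least one fake sample.
    """
    r = knn_radii(real, k)                                       # (N,)
    d = np.linalg.norm(real[:, None] - fake[None, :], axis=-1)   # (N, M)
    inside = d < r[:, None]
    density = inside.sum() / (k * fake.shape[0])
    coverage = inside.any(axis=1).mean()
    return density, coverage
```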
CRWIZ: A Framework for Crowdsourcing Real-Time Wizard-of-Oz Dialogues
Title | CRWIZ: A Framework for Crowdsourcing Real-Time Wizard-of-Oz Dialogues |
Authors | Francisco J. Chiyah Garcia, José Lopes, Xingkun Liu, Helen Hastie |
Abstract | Large corpora of task-based and open-domain conversational dialogues are hugely valuable in the field of data-driven dialogue systems. Crowdsourcing platforms, such as Amazon Mechanical Turk, have been an effective method for collecting such large amounts of data. However, difficulties arise when task-based dialogues require expert domain knowledge or rapid access to domain-relevant information, such as databases for tourism. This will become even more prevalent as dialogue systems become increasingly ambitious, expanding into tasks with high levels of complexity that require collaboration and forward planning, such as in our domain of emergency response. In this paper, we propose CRWIZ: a framework for collecting real-time Wizard of Oz dialogues through crowdsourcing for collaborative, complex tasks. This framework uses semi-guided dialogue to avoid interactions that breach procedures and processes only known to experts, while enabling the capture of a wide variety of interactions. The framework is available at https://github.com/JChiyah/crwiz |
Tasks | |
Published | 2020-03-12 |
URL | https://arxiv.org/abs/2003.05995v1 |
https://arxiv.org/pdf/2003.05995v1.pdf | |
PWC | https://paperswithcode.com/paper/crwiz-a-framework-for-crowdsourcing-real-time |
Repo | https://github.com/JChiyah/crwiz |
Framework | none |