February 2, 2020

3113 words 15 mins read

Paper Group AWR 19


Neural-Network Guided Expression Transformation. Automatic Labeled LiDAR Data Generation based on Precise Human Model. A New Benchmark for Evaluation of Cross-Domain Few-Shot Learning. Image2StyleGAN++: How to Edit the Embedded Images?. Learning to Reconstruct 3D Manhattan Wireframes from a Single Image. Understanding and Robustifying Differentiabl …

Neural-Network Guided Expression Transformation

Title Neural-Network Guided Expression Transformation
Authors Romain Edelmann, Viktor Kunčak
Abstract Optimizing compilers, as well as other translator systems, often work by rewriting expressions according to equivalence-preserving rules. Given an input expression and its optimized form, finding the sequence of rules that were applied is a non-trivial task. Most of the time, the tools provide no proof, of any kind, of the equivalence between the original expression and its optimized form. In this work, we propose to reconstruct proofs of equivalence of simple mathematical expressions, after the fact, by finding paths of equivalence-preserving transformations between expressions. We propose to find those sequences of transformations using a search algorithm, guided by a neural network heuristic. Using a Tree-LSTM recursive neural network, we learn a distributed representation of expressions where the Manhattan distance between vectors approximately corresponds to the rewrite distance between expressions. We then show how the neural network can be efficiently used to search for transformation paths, leading to substantial gains in speed compared to an uninformed exhaustive search. In one of our experiments, our neural-network-guided search algorithm is able to solve more instances with a 2-second timeout per instance than breadth-first search does with a 5-minute timeout per instance.
Tasks
Published 2019-02-06
URL http://arxiv.org/abs/1902.02194v1
PDF http://arxiv.org/pdf/1902.02194v1.pdf
PWC https://paperswithcode.com/paper/neural-network-guided-expression
Repo https://github.com/epfl-lara/nugget
Framework pytorch
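
As a rough illustration of the search procedure described in the abstract, the sketch below runs a best-first search over rewrites, using the L1 (Manhattan) distance between expression embeddings as the heuristic. The rewrites and embed callables are placeholders, not the paper's API: embed stands in for the learned Tree-LSTM representation, and the actual nugget repository may organize the search differently.

    import heapq
    import itertools

    def guided_search(start, goal, rewrites, embed, max_expansions=100000):
        # Best-first search for a chain of equivalence-preserving rewrites from
        # `start` to `goal`. `rewrites(expr)` yields expressions reachable by one
        # rule application; `embed(expr)` returns a numeric vector whose Manhattan
        # distance to the goal embedding serves as the heuristic.
        def h(expr):
            return sum(abs(a - b) for a, b in zip(embed(expr), embed(goal)))

        tie = itertools.count()                    # break ties without comparing expressions
        frontier = [(h(start), next(tie), start, [start])]
        visited = {start}
        for _ in range(max_expansions):
            if not frontier:
                break
            _, _, expr, path = heapq.heappop(frontier)
            if expr == goal:
                return path                        # the reconstructed proof of equivalence
            for nxt in rewrites(expr):
                if nxt not in visited:
                    visited.add(nxt)
                    heapq.heappush(frontier, (h(nxt), next(tie), nxt, path + [nxt]))
        return None                                # no path found within the budget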

Automatic Labeled LiDAR Data Generation based on Precise Human Model

Title Automatic Labeled LiDAR Data Generation based on Precise Human Model
Authors Wonjik Kim, Masayuki Tanaka, Masatoshi Okutomi, Yoko Sasaki
Abstract Following improvements in deep neural networks, state-of-the-art networks have been proposed for human recognition using point clouds captured by LiDAR. However, the performance of these networks strongly depends on the training data. An issue with collecting training data is labeling: human annotation is necessary to obtain ground-truth labels, but it is very costly. Therefore, we propose an automatic labeled-data generation pipeline in which any parameters or data-generation environments can be changed. Our approach uses a human model named Dhaiba and a background model of Miraikan, and consequently generates realistic artificial data. We present 500k+ samples generated by the proposed pipeline. This paper also describes the specification of the pipeline and the details of the data, together with evaluations of various approaches.
Tasks
Published 2019-02-14
URL http://arxiv.org/abs/1902.05341v1
PDF http://arxiv.org/pdf/1902.05341v1.pdf
PWC https://paperswithcode.com/paper/automatic-labeled-lidar-data-generation-based
Repo https://github.com/Likarian/AutomaticLabeledLiDARData
Framework tf
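
The pipeline itself is only summarized above, but the core idea of producing labeled scans from a simulated sensor can be sketched as below. The point sets, sensor pose, and angular resolutions are hypothetical stand-ins for the Dhaiba human model and Miraikan background used in the paper.

    import numpy as np

    def simulate_labeled_scan(human_points, background_points, sensor_origin,
                              h_res_deg=0.2, v_res_deg=1.0):
        # Toy stand-in for the generation pipeline: given surface points sampled
        # from a human model (label 1) and a background model (label 0), keep the
        # closest return per (azimuth, elevation) bin, as a spinning LiDAR would.
        pts = np.vstack([background_points, human_points])
        labels = np.concatenate([np.zeros(len(background_points), dtype=int),
                                 np.ones(len(human_points), dtype=int)])
        rel = pts - sensor_origin
        rng = np.linalg.norm(rel, axis=1)
        az = np.degrees(np.arctan2(rel[:, 1], rel[:, 0]))
        el = np.degrees(np.arcsin(rel[:, 2] / np.maximum(rng, 1e-9)))
        keys = zip(np.round(az / h_res_deg).astype(int),
                   np.round(el / v_res_deg).astype(int))
        nearest = {}
        for i, key in enumerate(keys):
            if key not in nearest or rng[i] < rng[nearest[key]]:
                nearest[key] = i                   # nearest surface wins, as in ray casting
        idx = np.fromiter(nearest.values(), dtype=int)
        return pts[idx], labels[idx]               # labeled point cloud for one frame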

A New Benchmark for Evaluation of Cross-Domain Few-Shot Learning

Title A New Benchmark for Evaluation of Cross-Domain Few-Shot Learning
Authors Yunhui Guo, Noel C. F. Codella, Leonid Karlinsky, John R. Smith, Tajana Rosing, Rogerio Feris
Abstract Recent progress on few-shot learning has largely relied on annotated data for meta-learning, sampled from the same domain as the novel classes. However, in many applications, collecting data for meta-learning is infeasible or impossible. This leads to the cross-domain few-shot learning problem, where a large domain shift exists between base and novel classes. Although some preliminary investigation of few-shot methods under domain shift exists, a standard benchmark for cross-domain few-shot learning is not yet established. In this paper, we propose the cross-domain few-shot learning (CD-FSL) benchmark, consisting of images from diverse domains with varying similarity to ImageNet, including crop disease images, satellite images, and medical images. Extensive experiments on the proposed benchmark are performed to compare an array of state-of-the-art meta-learning and transfer learning approaches, including various forms of single-model fine-tuning and ensemble learning. The results demonstrate that current meta-learning methods underperform simple fine-tuning by 12.8% average accuracy. The accuracy of all methods tends to correlate with dataset similarity to ImageNet. In addition, the relative performance gain with an increasing number of shots is greater with transfer methods than with meta-learning. Finally, we demonstrate that transferring from multiple pretrained models achieves the best performance, with accuracy improvements of 14.9% and 1.9% over the best meta-learning and single-model fine-tuning approaches, respectively. In summary, the proposed benchmark serves as a challenging platform to guide future research on cross-domain few-shot learning due to its spectrum of diversity and coverage.
Tasks Cross-Domain Few-Shot, cross-domain few-shot learning, Few-Shot Image Classification, Few-Shot Learning, Meta-Learning, Transfer Learning
Published 2019-12-16
URL https://arxiv.org/abs/1912.07200v1
PDF https://arxiv.org/pdf/1912.07200v1.pdf
PWC https://paperswithcode.com/paper/a-new-benchmark-for-evaluation-of-cross
Repo https://github.com/IBM/cdfsl-benchmark
Framework pytorch
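
The fine-tuning baseline that the benchmark finds surprisingly strong can be sketched as follows. This is a generic illustration, not the benchmark code: the ResNet-18 backbone, optimizer settings, and episode interface are assumptions, and the repo evaluates several variants beyond a frozen backbone with a linear head.

    import torch
    import torch.nn as nn
    import torchvision.models as models

    def finetune_episode(support_x, support_y, query_x, n_way, steps=100, lr=0.01):
        # One few-shot episode with the simple transfer-learning baseline: freeze
        # an ImageNet-pretrained backbone and train a new linear head on the
        # support set of the target domain, then classify the query images.
        backbone = models.resnet18(pretrained=True)
        backbone.fc = nn.Identity()
        backbone.eval()
        for p in backbone.parameters():
            p.requires_grad_(False)

        head = nn.Linear(512, n_way)
        opt = torch.optim.SGD(head.parameters(), lr=lr, momentum=0.9)
        with torch.no_grad():
            feats = backbone(support_x)            # [n_way * k_shot, 512]
        for _ in range(steps):
            opt.zero_grad()
            loss = nn.functional.cross_entropy(head(feats), support_y)
            loss.backward()
            opt.step()
        with torch.no_grad():
            return head(backbone(query_x)).argmax(dim=1)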

Image2StyleGAN++: How to Edit the Embedded Images?

Title Image2StyleGAN++: How to Edit the Embedded Images?
Authors Rameen Abdal, Yipeng Qin, Peter Wonka
Abstract We propose Image2StyleGAN++, a flexible image editing framework with many applications. Our framework extends the recent Image2StyleGAN in three ways. First, we introduce noise optimization as a complement to the $W^+$ latent space embedding. Our noise optimization can restore high frequency features in images and thus significantly improves the quality of reconstructed images, e.g. a big increase of PSNR from 20 dB to 45 dB. Second, we extend the global $W^+$ latent space embedding to enable local embeddings. Third, we combine embedding with activation tensor manipulation to perform high quality local edits along with global semantic edits on images. Such edits motivate various high quality image editing applications, e.g. image reconstruction, image inpainting, image crossover, local style transfer, image editing using scribbles, and attribute level feature transfer. Examples of the edited images are shown across the paper for visual inspection.
Tasks Image Inpainting, Image Reconstruction, Style Transfer
Published 2019-11-26
URL https://arxiv.org/abs/1911.11544v1
PDF https://arxiv.org/pdf/1911.11544v1.pdf
PWC https://paperswithcode.com/paper/image2stylegan-how-to-edit-the-embedded
Repo https://github.com/pacifinapacific/StyleGAN_LatentEditor
Framework pytorch
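
The first extension, noise optimization on top of the W+ embedding, amounts to jointly optimizing the latent code and the per-layer noise maps against a reconstruction loss. The sketch below assumes a hypothetical generator(w_plus, noise) callable; the actual StyleGAN synthesis API in the linked repo differs, and the paper alternates optimization stages and adds a perceptual loss term.

    import torch

    def embed_with_noise(generator, target, w_plus_init, noise_init,
                         steps=1000, lr=0.01):
        # Jointly optimize the W+ code and the noise maps so that the generated
        # image matches `target`; optimizing the noise restores high-frequency
        # detail that the W+ code alone cannot encode.
        w_plus = w_plus_init.clone().requires_grad_(True)
        noise = [n.clone().requires_grad_(True) for n in noise_init]
        opt = torch.optim.Adam([w_plus] + noise, lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            recon = generator(w_plus, noise)
            loss = torch.nn.functional.mse_loss(recon, target)
            loss.backward()
            opt.step()
        return w_plus.detach(), [n.detach() for n in noise]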

Learning to Reconstruct 3D Manhattan Wireframes from a Single Image

Title Learning to Reconstruct 3D Manhattan Wireframes from a Single Image
Authors Yichao Zhou, Haozhi Qi, Yuexiang Zhai, Qi Sun, Zhili Chen, Li-Yi Wei, Yi Ma
Abstract In this paper, we propose a method to obtain a compact and accurate 3D wireframe representation from a single image by effectively exploiting global structural regularities. Our method trains a convolutional neural network to simultaneously detect salient junctions and straight lines, as well as predict their 3D depth and vanishing points. Compared with state-of-the-art learning-based wireframe detection methods, our network is much simpler and more unified, leading to better 2D wireframe detection. With global structural priors such as the Manhattan assumption, our method further reconstructs a full 3D wireframe model, a compact vector representation suitable for a variety of high-level vision tasks such as AR and CAD. We conduct extensive evaluations on a large synthetic dataset of urban scenes as well as real images. Our code and datasets will be released.
Tasks
Published 2019-05-17
URL https://arxiv.org/abs/1905.07482v1
PDF https://arxiv.org/pdf/1905.07482v1.pdf
PWC https://paperswithcode.com/paper/learning-to-reconstruct-3d-manhattan
Repo https://github.com/zhou13/neurvps
Framework pytorch
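
Once junction depths are predicted, lifting the 2D wireframe to 3D is plain back-projection through the camera intrinsics, as in the hedged sketch below (the paper's full pipeline additionally uses vanishing points and the Manhattan prior to refine the lines).

    import numpy as np

    def lift_junctions(junctions_2d, depths, K):
        # Back-project 2D junctions (pixel coordinates, shape [N, 2]) with their
        # predicted depths (shape [N]) into 3D camera coordinates using the 3x3
        # intrinsic matrix K; connecting lifted junctions yields a 3D wireframe.
        uv1 = np.concatenate([junctions_2d, np.ones((len(junctions_2d), 1))], axis=1)
        rays = (np.linalg.inv(K) @ uv1.T).T        # viewing ray per junction
        return rays * depths[:, None]              # scale each ray by its depth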

Understanding and Robustifying Differentiable Architecture Search

Title Understanding and Robustifying Differentiable Architecture Search
Authors Arber Zela, Thomas Elsken, Tonmoy Saikia, Yassine Marrakchi, Thomas Brox, Frank Hutter
Abstract Differentiable Architecture Search (DARTS) has attracted a lot of attention due to its simplicity and small search costs achieved by a continuous relaxation and an approximation of the resulting bi-level optimization problem. However, DARTS does not work robustly for new problems: we identify a wide range of search spaces for which DARTS yields degenerate architectures with very poor test performance. We study this failure mode and show that, while DARTS successfully minimizes validation loss, the found solutions generalize poorly when they coincide with high validation loss curvature in the architecture space. We show that by adding one of various types of regularization we can robustify DARTS to find solutions with less curvature and better generalization properties. Based on these observations, we propose several simple variations of DARTS that perform substantially more robustly in practice. Our observations are robust across five search spaces on three image classification tasks and also hold for the very different domains of disparity estimation (a dense regression task) and language modelling.
Tasks Disparity Estimation, Image Classification, Language Modelling
Published 2019-09-20
URL https://arxiv.org/abs/1909.09656v2
PDF https://arxiv.org/pdf/1909.09656v2.pdf
PWC https://paperswithcode.com/paper/190909656
Repo https://github.com/automl/RobustDARTS
Framework pytorch
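
The curvature signal the paper ties to poor generalization is the dominant eigenvalue of the validation-loss Hessian with respect to the architecture parameters, which can be estimated by power iteration on Hessian-vector products. The sketch below is a generic PyTorch illustration, not the RobustDARTS code; val_loss and alphas are assumed to come from an existing DARTS setup.

    import torch

    def dominant_eigenvalue(val_loss, alphas, iters=20):
        # Power-iteration estimate of the largest eigenvalue of the Hessian of the
        # validation loss w.r.t. the architecture parameters `alphas` (a list of
        # tensors). `val_loss` must be a scalar still attached to the graph.
        grads = torch.autograd.grad(val_loss, alphas, create_graph=True)
        flat_grad = torch.cat([g.reshape(-1) for g in grads])
        v = torch.randn_like(flat_grad)
        v = v / v.norm()
        for _ in range(iters):
            hv = torch.autograd.grad(flat_grad @ v, alphas, retain_graph=True)
            hv = torch.cat([h.reshape(-1) for h in hv])
            eig = v @ hv                           # Rayleigh quotient with unit v
            v = hv / (hv.norm() + 1e-12)
        return eig.item()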

AutoDispNet: Improving Disparity Estimation With AutoML

Title AutoDispNet: Improving Disparity Estimation With AutoML
Authors Tonmoy Saikia, Yassine Marrakchi, Arber Zela, Frank Hutter, Thomas Brox
Abstract Much research work in computer vision is being spent on optimizing existing network architectures to obtain a few more percentage points on benchmarks. Recent AutoML approaches promise to relieve us from this effort. However, they are mainly designed for comparatively small-scale classification tasks. In this work, we show how to use and extend existing AutoML techniques to efficiently optimize large-scale U-Net-like encoder-decoder architectures. In particular, we leverage gradient-based neural architecture search and Bayesian optimization for hyperparameter search. The resulting optimization does not require a large-scale compute cluster. We show results on disparity estimation that clearly outperform the manually optimized baseline and reach state-of-the-art performance.
Tasks AutoML, Disparity Estimation, Neural Architecture Search
Published 2019-05-17
URL https://arxiv.org/abs/1905.07443v2
PDF https://arxiv.org/pdf/1905.07443v2.pdf
PWC https://paperswithcode.com/paper/autodispnet-improving-disparity-estimation
Repo https://github.com/lmb-freiburg/autodispnet
Framework tf
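
The gradient-based search used here builds on the DARTS-style continuous relaxation, where every edge of a cell mixes candidate operations with softmax weights that are trained by gradient descent. Below is a generic mixed-operation module as a sketch; the candidate set, channel handling, and the encoder-decoder cell structure in AutoDispNet are more involved.

    import torch
    import torch.nn as nn

    class MixedOp(nn.Module):
        # One edge of a searchable cell: a softmax-weighted sum of candidate
        # operations. The weights `alpha` are the architecture parameters that
        # gradient-based NAS optimizes alongside the network weights.
        def __init__(self, channels, candidates=None):
            super().__init__()
            if candidates is None:                 # small illustrative candidate set
                candidates = [
                    nn.Conv2d(channels, channels, 3, padding=1),
                    nn.Conv2d(channels, channels, 5, padding=2),
                    nn.Identity(),
                ]
            self.ops = nn.ModuleList(candidates)
            self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

        def forward(self, x):
            weights = torch.softmax(self.alpha, dim=0)
            return sum(w * op(x) for w, op in zip(weights, self.ops))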

IPC: A Benchmark Data Set for Learning with Graph-Structured Data

Title IPC: A Benchmark Data Set for Learning with Graph-Structured Data
Authors Patrick Ferber, Tengfei Ma, Siyu Huo, Jie Chen, Michael Katz
Abstract Benchmark data sets are an indispensable ingredient of the evaluation of graph-based machine learning methods. We release a new data set, compiled from International Planning Competitions (IPC), for benchmarking graph classification, regression, and related tasks. Apart from the graph construction (based on AI planning problems) that is interesting in its own right, the data set possesses distinctly different characteristics from popularly used benchmarks. The data set, named IPC, consists of two self-contained versions, grounded and lifted, both including graphs of large and skewedly distributed sizes, posing substantial challenges for the computation of graph models such as graph kernels and graph neural networks. The graphs in this data set are directed and the lifted version is acyclic, offering the opportunity of benchmarking specialized models for directed (acyclic) structures. Moreover, the graph generator and the labeling are computer programmed; thus, the data set may be extended easily if a larger scale is desired. The data set is accessible from \url{https://github.com/IBM/IPC-graph-data}.
Tasks Graph Classification, graph construction
Published 2019-05-15
URL https://arxiv.org/abs/1905.06393v1
PDF https://arxiv.org/pdf/1905.06393v1.pdf
PWC https://paperswithcode.com/paper/ipc-a-benchmark-data-set-for-learning-with
Repo https://github.com/IBM/IPC-graph-data
Framework none

Weighted Boxes Fusion: ensembling boxes for object detection models

Title Weighted Boxes Fusion: ensembling boxes for object detection models
Authors Roman Solovyev, Weimin Wang
Abstract In this work, we introduce a novel Weighted Boxes Fusion (WBF) ensembling algorithm that boosts performance by combining predictions from different object detection models. The method was tested on the predictions of different models trained on the large Open Images dataset. The source code for our approach is publicly available at https://github.com/ZFTurbo/Weighted-Boxes-Fusion
Tasks Object Detection
Published 2019-10-29
URL https://arxiv.org/abs/1910.13302v1
PDF https://arxiv.org/pdf/1910.13302v1.pdf
PWC https://paperswithcode.com/paper/weighted-boxes-fusion-ensembling-boxes-for
Repo https://github.com/ZFTurbo/Weighted-Boxes-Fusion
Framework none
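
The core of WBF is easy to state: instead of suppressing overlapping boxes as NMS does, boxes from all models are grouped by IoU and each group is replaced by a confidence-weighted average. The single-class sketch below captures that idea; the reference implementation in the linked repo additionally handles multiple classes, per-model weights, and score rescaling by the number of contributing models.

    import numpy as np

    def iou(a, b):
        # Intersection over union of two boxes in (x1, y1, x2, y2) format.
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        union = ((a[2] - a[0]) * (a[3] - a[1]) +
                 (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / (union + 1e-9)

    def weighted_boxes_fusion(boxes, scores, iou_thr=0.55):
        # Simplified single-class WBF over the pooled predictions of all models.
        boxes, scores = np.asarray(boxes, float), np.asarray(scores, float)
        clusters, fused = [], []                   # member indices / fused boxes
        for i in scores.argsort()[::-1]:           # highest confidence first
            for c, fb in enumerate(fused):
                if iou(boxes[i], fb) > iou_thr:
                    clusters[c].append(i)
                    w = scores[clusters[c]]
                    fused[c] = (boxes[clusters[c]] * w[:, None]).sum(0) / w.sum()
                    break
            else:
                clusters.append([i])
                fused.append(boxes[i].copy())
        fused_scores = np.array([scores[c].mean() for c in clusters])
        return np.array(fused), fused_scores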

Attention routing between capsules

Title Attention routing between capsules
Authors Jaewoong Choi, Hyun Seo, Suii Im, Myungjoo Kang
Abstract In this paper, we propose a new capsule network architecture called Attention Routing CapsuleNet (AR CapsNet). We replace the dynamic routing and the squash activation function of the original capsule network (CapsuleNet, the capsule network with dynamic routing) with attention routing and capsule activation, respectively. Attention routing is a routing between capsules through an attention module; it is a fast forward pass that preserves spatial information. The intuitive interpretation of dynamic routing, on the other hand, is finding a centroid of the prediction capsules; thus, the squash activation function and its variants focus on preserving vector orientation, whereas the capsule activation focuses on performing a capsule-scale activation. We evaluate our proposed model on the MNIST, affNIST, and CIFAR-10 classification tasks. The proposed model achieves higher accuracy with fewer parameters (x0.65 on MNIST, x0.82 on CIFAR-10) and less training time than CapsuleNet (x0.19 on MNIST, x0.35 on CIFAR-10). These results validate that designing a capsule-scale operation is a key factor in implementing the capsule concept. Our experiments also show that the proposed model is transformation-equivariant, like CapsuleNet: as we perturb each element of the output capsule, the decoder attached to the output capsules shows global variations. Further experiments show that the differences in capsule features caused by applying affine transformations to an input image are significantly aligned in one direction.
Tasks
Published 2019-07-03
URL https://arxiv.org/abs/1907.01750v4
PDF https://arxiv.org/pdf/1907.01750v4.pdf
PWC https://paperswithcode.com/paper/attention-routing-between-capsules
Repo https://github.com/chjw1475/Attention-Routing-Capsules
Framework tf
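
A single-pass routing layer in the spirit of the abstract can be sketched as below: prediction capsules are scored by a small attention module and combined by softmax-weighted summation, replacing the iterative agreement loop of dynamic routing. Layer sizes and the exact attention design are illustrative assumptions, not the AR CapsNet architecture.

    import torch
    import torch.nn as nn

    class AttentionRouting(nn.Module):
        # Routes `in_caps` input capsules to `out_caps` output capsules in one
        # forward pass using attention scores instead of iterative dynamic routing.
        def __init__(self, in_caps, out_caps, in_dim, out_dim):
            super().__init__()
            self.proj = nn.Parameter(
                torch.randn(in_caps, out_caps, in_dim, out_dim) * 0.05)
            self.attn = nn.Linear(out_dim, 1)      # scores each prediction capsule

        def forward(self, u):                      # u: [batch, in_caps, in_dim]
            u_hat = torch.einsum('bid,iodk->biok', u, self.proj)
            scores = self.attn(u_hat).squeeze(-1)  # [batch, in_caps, out_caps]
            c = torch.softmax(scores, dim=1)       # attention over input capsules
            return (c.unsqueeze(-1) * u_hat).sum(dim=1)   # [batch, out_caps, out_dim]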

OICSR: Out-In-Channel Sparsity Regularization for Compact Deep Neural Networks

Title OICSR: Out-In-Channel Sparsity Regularization for Compact Deep Neural Networks
Authors Jiashi Li, Qi Qi, Jingyu Wang, Ce Ge, Yujian Li, Zhangzhang Yue, Haifeng Sun
Abstract Channel pruning can significantly accelerate and compress deep neural networks. Many channel pruning works utilize structured sparsity regularization to zero out all the weights in some channels and automatically obtain a structure-sparse network during training. However, these methods apply structured sparsity regularization to each layer separately, omitting the correlations between consecutive layers. In this paper, we first combine one out-channel in the current layer and the corresponding in-channel in the next layer into a regularization group, namely an out-in-channel. Our proposed Out-In-Channel Sparsity Regularization (OICSR) considers correlations between successive layers to further retain the predictive power of the compact network. Training with OICSR thoroughly transfers discriminative features into a fraction of out-in-channels. Correspondingly, OICSR measures channel importance based on statistics computed from two consecutive layers, not from an individual layer. Finally, a global greedy pruning algorithm is designed to remove redundant out-in-channels in an iterative way. Our method is comprehensively evaluated with various CNN architectures including CifarNet, AlexNet, ResNet, DenseNet and PreActSeNet on the CIFAR-10, CIFAR-100 and ImageNet-1K datasets. Notably, on ImageNet-1K, we reduce FLOPs by 37.2% on ResNet-50 while outperforming the original model by 0.22% top-1 accuracy.
Tasks
Published 2019-05-28
URL https://arxiv.org/abs/1905.11664v5
PDF https://arxiv.org/pdf/1905.11664v5.pdf
PWC https://paperswithcode.com/paper/oicsr-out-in-channel-sparsity-regularization-1
Repo https://github.com/dsfour/OICSR
Framework pytorch
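
The regularizer itself reduces to a group lasso in which the c-th output filter of one layer and the c-th input slice of the next layer form a single group, so they shrink to zero together. A sketch for a pair of Conv2d weight tensors is below; layer pairing across residual connections and the greedy pruning step are handled separately in the repo.

    import torch

    def oicsr_penalty(weight_l, weight_next, eps=1e-12):
        # Out-in-channel group sparsity for consecutive conv layers.
        # weight_l:    [C_out, C_in, k, k]   -- layer l
        # weight_next: [C_next, C_out, k, k] -- layer l + 1
        out_norm_sq = weight_l.pow(2).sum(dim=(1, 2, 3))    # per out-channel of layer l
        in_norm_sq = weight_next.pow(2).sum(dim=(0, 2, 3))  # per in-channel of layer l + 1
        return torch.sqrt(out_norm_sq + in_norm_sq + eps).sum()   # L2,1 over joint groups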

A block-random algorithm for learning on distributed, heterogeneous data

Title A block-random algorithm for learning on distributed, heterogeneous data
Authors Prakash Mohan, Marc T. Henry de Frahan, Ryan King, Ray W. Grout
Abstract Most deep learning models are based on deep neural networks with multiple layers between input and output. The parameters defining these layers are initialized with random values and are “learned” from data, typically using stochastic gradient descent based algorithms. These algorithms rely on the data being randomly shuffled before optimization. The randomization of the data prior to processing it in batches, which is formally required for the stochastic gradient descent algorithm to derive a useful deep learning model, is expected to be prohibitively expensive for in situ model training because of the resulting data communication across processor nodes. We show that the stochastic gradient descent (SGD) algorithm can still make useful progress if the batches are defined on a per-processor basis and processed in random order, even though (i) the batches are constructed from data samples from a single class or a specific flow region, and (ii) the overall data samples are heterogeneous. We present block-random gradient descent, a new algorithm that works on distributed, heterogeneous data without having to pre-shuffle it. This algorithm enables in situ learning for exascale simulations. The performance of the algorithm is demonstrated on a set of benchmark classification models and on the construction of a subgrid-scale large eddy simulation (LES) model for turbulent channel flow, using a data model similar to the one that will be encountered in exascale simulations.
Tasks
Published 2019-02-28
URL http://arxiv.org/abs/1903.00091v1
PDF http://arxiv.org/pdf/1903.00091v1.pdf
PWC https://paperswithcode.com/paper/a-block-random-algorithm-for-learning-on
Repo https://github.com/NREL/block-random
Framework none
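
The algorithm can be summarized in a few lines: data stays on the processor where it was produced, and only the order in which blocks and their local batches are visited is randomized each epoch. The sketch below is schematic; train_step is a placeholder for one SGD update on a local batch.

    import random

    def block_random_epochs(blocks, batch_size, train_step, epochs=1, seed=0):
        # `blocks` is a list of local datasets, one per processor; each may hold
        # samples from a single class or flow region. No global shuffle is done,
        # so no data moves across nodes.
        rng = random.Random(seed)
        for _ in range(epochs):
            order = list(range(len(blocks)))
            rng.shuffle(order)                     # visit blocks in random order
            for b in order:
                local = list(blocks[b])
                rng.shuffle(local)                 # shuffle within the block only
                for i in range(0, len(local), batch_size):
                    train_step(local[i:i + batch_size])   # one SGD update per batch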

Few-Shot Object Detection with Attention-RPN and Multi-Relation Detector

Title Few-Shot Object Detection with Attention-RPN and Multi-Relation Detector
Authors Qi Fan, Wei Zhuo, Chi-Keung Tang, Yu-Wing Tai
Abstract Conventional methods for object detection typically require a substantial amount of training data, and preparing such high-quality training data is very labor-intensive. In this paper, we propose a novel few-shot object detection network that aims at detecting objects of unseen categories with only a few annotated examples. Central to our method are our Attention-RPN, Multi-Relation Detector and Contrastive Training strategy, which exploit the similarity between the few-shot support set and query set to detect novel objects while suppressing false detection in the background. To train our network, we contribute a new dataset that contains 1000 categories of various objects with high-quality annotations. To the best of our knowledge, this is one of the first datasets specifically designed for few-shot object detection. Once our few-shot network is trained, it can detect objects of unseen categories without further training or fine-tuning. Our method is general and has a wide range of potential applications. We achieve new state-of-the-art performance on different datasets in the few-shot setting. The dataset link is https://github.com/fanq15/Few-Shot-Object-Detection-Dataset.
Tasks Few-Shot Object Detection, Object Detection
Published 2019-08-06
URL https://arxiv.org/abs/1908.01998v3
PDF https://arxiv.org/pdf/1908.01998v3.pdf
PWC https://paperswithcode.com/paper/few-shot-object-detection-with-attention-rpn
Repo https://github.com/fanq15/Few-Shot-Object-Detection-Dataset
Framework none
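
The attention part of the Attention-RPN can be pictured as a depthwise correlation between a pooled support feature and the query feature map, so that proposal scores are biased toward regions resembling the support class. The sketch below is a simplified illustration with assumed tensor shapes, not the released model.

    import torch
    import torch.nn.functional as F

    def attention_rpn_features(query_feat, support_feat):
        # query_feat:   [B, C, H, W]   feature map of the query image
        # support_feat: [B, C, Hs, Ws] feature map of the support image
        b, c, h, w = query_feat.shape
        kernel = F.adaptive_avg_pool2d(support_feat, 1)     # [B, C, 1, 1]
        kernel = kernel.reshape(b * c, 1, 1, 1)
        x = query_feat.reshape(1, b * c, h, w)
        attn = F.conv2d(x, kernel, groups=b * c)            # depthwise correlation
        return attn.reshape(b, c, h, w)                     # attention-weighted features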

Character Region Awareness for Text Detection

Title Character Region Awareness for Text Detection
Authors Youngmin Baek, Bado Lee, Dongyoon Han, Sangdoo Yun, Hwalsuk Lee
Abstract Scene text detection methods based on neural networks have emerged recently and have shown promising results. Previous methods trained with rigid word-level bounding boxes exhibit limitations in representing text regions of arbitrary shape. In this paper, we propose a new scene text detection method that effectively detects text areas by exploring each character and the affinity between characters. To overcome the lack of individual character-level annotations, our proposed framework exploits both the given character-level annotations for synthetic images and the estimated character-level ground truths for real images acquired by the learned interim model. In order to estimate the affinity between characters, the network is trained with the newly proposed representation for affinity. Extensive experiments on six benchmarks, including the TotalText and CTW-1500 datasets which contain highly curved text in natural images, demonstrate that our character-level text detection significantly outperforms state-of-the-art detectors. According to the results, our proposed method guarantees high flexibility in detecting complicated scene text images, such as arbitrarily-oriented, curved, or deformed texts.
Tasks Scene Text Detection
Published 2019-04-03
URL http://arxiv.org/abs/1904.01941v1
PDF http://arxiv.org/pdf/1904.01941v1.pdf
PWC https://paperswithcode.com/paper/character-region-awareness-for-text-detection
Repo https://github.com/dipu-bd/craft-moran-ocr
Framework pytorch
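
Ground truth for the region score is built by placing a Gaussian over each character box; the paper warps a canonical Gaussian to the character quadrilateral, while the sketch below uses a simpler axis-aligned approximation to show the idea.

    import numpy as np

    def character_region_map(char_boxes, height, width):
        # Render a region-score heatmap: one 2D Gaussian per character box,
        # given as (x1, y1, x2, y2) in pixel coordinates.
        ys, xs = np.mgrid[0:height, 0:width]
        heatmap = np.zeros((height, width), dtype=np.float32)
        for x1, y1, x2, y2 in char_boxes:
            cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
            sx, sy = max(x2 - x1, 1) / 4.0, max(y2 - y1, 1) / 4.0   # ~2 sigma per half-box
            g = np.exp(-(((xs - cx) / sx) ** 2 + ((ys - cy) / sy) ** 2) / 2.0)
            heatmap = np.maximum(heatmap, g)
        return heatmap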

Emotion Action Detection and Emotion Inference: the Task and Dataset

Title Emotion Action Detection and Emotion Inference: the Task and Dataset
Authors Pengyuan Liu, Chengyu Du, Shuofeng Zhao, Chenghao Zhu
Abstract Many natural language processing works on emotion analysis focus only on simple emotion classification, without exploring the potential of putting emotion into “event context”, and ignore the analysis of emotion-related events. One main reason is the lack of such a corpus. Here we present the Cause-Emotion-Action Corpus, which manually annotates not only emotions but also cause events and action events. We propose two new tasks based on the dataset: emotion causality and emotion inference. The first task is to extract a triple (cause, emotion, action). The second task is to infer the probable emotion. We are releasing the dataset with 10,603 samples and 15,892 events, along with basic statistical analysis and baselines for both the emotion causality and emotion inference tasks. The baseline performance demonstrates that there is much room for improvement on both tasks.
Tasks Action Detection, Emotion Classification, Emotion Recognition
Published 2019-03-16
URL http://arxiv.org/abs/1903.06901v1
PDF http://arxiv.org/pdf/1903.06901v1.pdf
PWC https://paperswithcode.com/paper/emotion-action-detection-and-emotion
Repo https://github.com/liupengyuan/EmotionAction_EmotionInference
Framework none