February 2, 2020

3113 words 15 mins read

Paper Group AWR 19


Neural-Network Guided Expression Transformation. Automatic Labeled LiDAR Data Generation based on Precise Human Model. A New Benchmark for Evaluation of Cross-Domain Few-Shot Learning. Image2StyleGAN++: How to Edit the Embedded Images?. Learning to Reconstruct 3D Manhattan Wireframes from a Single Image. Understanding and Robustifying Differentiabl …

Neural-Network Guided Expression Transformation

Title Neural-Network Guided Expression Transformation
Authors Romain Edelmann, Viktor Kunčak
Abstract Optimizing compilers, as well as other translator systems, often work by rewriting expressions according to equivalence-preserving rules. Given an input expression and its optimized form, finding the sequence of rules that were applied is a non-trivial task. Most of the time, the tools provide no proof, of any kind, of the equivalence between the original expression and its optimized form. In this work, we propose to reconstruct proofs of equivalence of simple mathematical expressions, after the fact, by finding paths of equivalence-preserving transformations between expressions. We propose to find those sequences of transformations using a search algorithm, guided by a neural network heuristic. Using a Tree-LSTM recursive neural network, we learn a distributed representation of expressions where the Manhattan distance between vectors approximately corresponds to the rewrite distance between expressions. We then show how the neural network can be efficiently used to search for transformation paths, leading to substantial gains in speed compared to an uninformed exhaustive search. In one of our experiments, our neural-network-guided search algorithm is able to solve more instances with a 2-second timeout per instance than breadth-first search does with a 5-minute timeout per instance.
Tasks
Published 2019-02-06
URL http://arxiv.org/abs/1902.02194v1
PDF http://arxiv.org/pdf/1902.02194v1.pdf
PWC https://paperswithcode.com/paper/neural-network-guided-expression
Repo https://github.com/epfl-lara/nugget
Framework pytorch
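
As a rough illustration of the search procedure described in the abstract, the sketch below runs a best-first search over rewrites, using the L1 (Manhattan) distance between expression embeddings as the heuristic. The rewrites and embed callables are placeholders, not the paper's API: embed stands in for the learned Tree-LSTM representation, and the actual nugget repository may organize the search differently.

    import heapq
    import itertools

    def guided_search(start, goal, rewrites, embed, max_expansions=100000):
        # Best-first search for a chain of equivalence-preserving rewrites from
        # `start` to `goal`. `rewrites(expr)` yields expressions reachable by one
        # rule application; `embed(expr)` returns a numeric vector whose Manhattan
        # distance to the goal embedding serves as the heuristic.
        def h(expr):
            return sum(abs(a - b) for a, b in zip(embed(expr), embed(goal)))

        tie = itertools.count()                    # break ties without comparing expressions
        frontier = [(h(start), next(tie), start, [start])]
        visited = {start}
        for _ in range(max_expansions):
            if not frontier:
                break
            _, _, expr, path = heapq.heappop(frontier)
            if expr == goal:
                return path                        # the reconstructed proof of equivalence
            for nxt in rewrites(expr):
                if nxt not in visited:
                    visited.add(nxt)
                    heapq.heappush(frontier, (h(nxt), next(tie), nxt, path + [nxt]))
        return None                                # no path found within the budget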

Automatic Labeled LiDAR Data Generation based on Precise Human Model

Title Automatic Labeled LiDAR Data Generation based on Precise Human Model
Authors Wonjik Kim, Masayuki Tanaka, Masatoshi Okutomi, Yoko Sasaki
Abstract Following improvements in deep neural networks, state-of-the-art networks have been proposed for human recognition using point clouds captured by LiDAR. However, the performance of these networks strongly depends on the training data. An issue with collecting training data is labeling: human annotation is necessary to obtain ground-truth labels, but it is very costly. Therefore, we propose an automatic labeled-data generation pipeline in which any parameters or data-generation environments can be changed. Our approach uses a human model named Dhaiba and a background model of Miraikan, and consequently generates realistic artificial data. We present 500k+ samples generated by the proposed pipeline. This paper also describes the specification of the pipeline and the details of the data, together with evaluations of various approaches.
Tasks
Published 2019-02-14
URL http://arxiv.org/abs/1902.05341v1
PDF http://arxiv.org/pdf/1902.05341v1.pdf
PWC https://paperswithcode.com/paper/automatic-labeled-lidar-data-generation-based
Repo https://github.com/Likarian/AutomaticLabeledLiDARData
Framework tf
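
The pipeline itself is only summarized above, but the core idea of producing labeled scans from a simulated sensor can be sketched as below. The point sets, sensor pose, and angular resolutions are hypothetical stand-ins for the Dhaiba human model and Miraikan background used in the paper.

    import numpy as np

    def simulate_labeled_scan(human_points, background_points, sensor_origin,
                              h_res_deg=0.2, v_res_deg=1.0):
        # Toy stand-in for the generation pipeline: given surface points sampled
        # from a human model (label 1) and a background model (label 0), keep the
        # closest return per (azimuth, elevation) bin, as a spinning LiDAR would.
        pts = np.vstack([background_points, human_points])
        labels = np.concatenate([np.zeros(len(background_points), dtype=int),
                                 np.ones(len(human_points), dtype=int)])
        rel = pts - sensor_origin
        rng = np.linalg.norm(rel, axis=1)
        az = np.degrees(np.arctan2(rel[:, 1], rel[:, 0]))
        el = np.degrees(np.arcsin(rel[:, 2] / np.maximum(rng, 1e-9)))
        keys = zip(np.round(az / h_res_deg).astype(int),
                   np.round(el / v_res_deg).astype(int))
        nearest = {}
        for i, key in enumerate(keys):
            if key not in nearest or rng[i] < rng[nearest[key]]:
                nearest[key] = i                   # nearest surface wins, as in ray casting
        idx = np.fromiter(nearest.values(), dtype=int)
        return pts[idx], labels[idx]               # labeled point cloud for one frame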

A New Benchmark for Evaluation of Cross-Domain Few-Shot Learning

Title A New Benchmark for Evaluation of Cross-Domain Few-Shot Learning
Authors Yunhui Guo, Noel C. F. Codella, Leonid Karlinsky, John R. Smith, Tajana Rosing, Rogerio Feris
Abstract Recent progress on few-shot learning has largely relied on annotated data for meta-learning, sampled from the same domain as the novel classes. However, in many applications, collecting data for meta-learning is infeasible or impossible. This leads to the cross-domain few-shot learning problem, where a large domain shift exists between base and novel classes. Although some preliminary investigation of few-shot methods under domain shift exists, a standard benchmark for cross-domain few-shot learning is not yet established. In this paper, we propose the cross-domain few-shot learning (CD-FSL) benchmark, consisting of images from diverse domains with varying similarity to ImageNet, including crop disease images, satellite images, and medical images. Extensive experiments on the proposed benchmark are performed to compare an array of state-of-the-art meta-learning and transfer learning approaches, including various forms of single-model fine-tuning and ensemble learning. The results demonstrate that current meta-learning methods underperform simple fine-tuning by 12.8% average accuracy. The accuracy of all methods tends to correlate with dataset similarity to ImageNet. In addition, the relative performance gain with an increasing number of shots is greater with transfer methods than with meta-learning. Finally, we demonstrate that transferring from multiple pretrained models achieves the best performance, with accuracy improvements of 14.9% and 1.9% over the best meta-learning and single-model fine-tuning approaches, respectively. In summary, the proposed benchmark serves as a challenging platform to guide future research on cross-domain few-shot learning due to its spectrum of diversity and coverage.
Tasks Cross-Domain Few-Shot, cross-domain few-shot learning, Few-Shot Image Classification, Few-Shot Learning, Meta-Learning, Transfer Learning
Published 2019-12-16
URL https://arxiv.org/abs/1912.07200v1
PDF https://arxiv.org/pdf/1912.07200v1.pdf
PWC https://paperswithcode.com/paper/a-new-benchmark-for-evaluation-of-cross
Repo https://github.com/IBM/cdfsl-benchmark
Framework pytorch
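
The fine-tuning baseline that the benchmark finds surprisingly strong can be sketched as follows. This is a generic illustration, not the benchmark code: the ResNet-18 backbone, optimizer settings, and episode interface are assumptions, and the repo evaluates several variants beyond a frozen backbone with a linear head.

    import torch
    import torch.nn as nn
    import torchvision.models as models

    def finetune_episode(support_x, support_y, query_x, n_way, steps=100, lr=0.01):
        # One few-shot episode with the simple transfer-learning baseline: freeze
        # an ImageNet-pretrained backbone and train a new linear head on the
        # support set of the target domain, then classify the query images.
        backbone = models.resnet18(pretrained=True)
        backbone.fc = nn.Identity()
        backbone.eval()
        for p in backbone.parameters():
            p.requires_grad_(False)

        head = nn.Linear(512, n_way)
        opt = torch.optim.SGD(head.parameters(), lr=lr, momentum=0.9)
        with torch.no_grad():
            feats = backbone(support_x)            # [n_way * k_shot, 512]
        for _ in range(steps):
            opt.zero_grad()
            loss = nn.functional.cross_entropy(head(feats), support_y)
            loss.backward()
            opt.step()
        with torch.no_grad():
            return head(backbone(query_x)).argmax(dim=1)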

Image2StyleGAN++: How to Edit the Embedded Images?

Title Image2StyleGAN++: How to Edit the Embedded Images?
Authors Rameen Abdal, Yipeng Qin, Peter Wonka
Abstract We propose Image2StyleGAN++, a flexible image editing framework with many applications. Our framework extends the recent Image2StyleGAN in three ways. First, we introduce noise optimization as a complement to the $W^+$ latent space embedding. Our noise optimization can restore high frequency features in images and thus significantly improves the quality of reconstructed images, e.g. a big increase of PSNR from 20 dB to 45 dB. Second, we extend the global $W^+$ latent space embedding to enable local embeddings. Third, we combine embedding with activation tensor manipulation to perform high quality local edits along with global semantic edits on images. Such edits motivate various high quality image editing applications, e.g. image reconstruction, image inpainting, image crossover, local style transfer, image editing using scribbles, and attribute level feature transfer. Examples of the edited images are shown across the paper for visual inspection.
Tasks Image Inpainting, Image Reconstruction, Style Transfer
Published 2019-11-26
URL https://arxiv.org/abs/1911.11544v1
PDF https://arxiv.org/pdf/1911.11544v1.pdf
PWC https://paperswithcode.com/paper/image2stylegan-how-to-edit-the-embedded
Repo https://github.com/pacifinapacific/StyleGAN_LatentEditor
Framework pytorch
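
The first extension, noise optimization on top of the W+ embedding, amounts to jointly optimizing the latent code and the per-layer noise maps against a reconstruction loss. The sketch below assumes a hypothetical generator(w_plus, noise) callable; the actual StyleGAN synthesis API in the linked repo differs, and the paper alternates optimization stages and adds a perceptual loss term.

    import torch

    def embed_with_noise(generator, target, w_plus_init, noise_init,
                         steps=1000, lr=0.01):
        # Jointly optimize the W+ code and the noise maps so that the generated
        # image matches `target`; optimizing the noise restores high-frequency
        # detail that the W+ code alone cannot encode.
        w_plus = w_plus_init.clone().requires_grad_(True)
        noise = [n.clone().requires_grad_(True) for n in noise_init]
        opt = torch.optim.Adam([w_plus] + noise, lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            recon = generator(w_plus, noise)
            loss = torch.nn.functional.mse_loss(recon, target)
            loss.backward()
            opt.step()
        return w_plus.detach(), [n.detach() for n in noise]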

Learning to Reconstruct 3D Manhattan Wireframes from a Single Image

Title Learning to Reconstruct 3D Manhattan Wireframes from a Single Image
Authors Yichao Zhou, Haozhi Qi, Yuexiang Zhai, Qi Sun, Zhili Chen, Li-Yi Wei, Yi Ma
Abstract In this paper, we propose a method to obtain a compact and accurate 3D wireframe representation from a single image by effectively exploiting global structural regularities. Our method trains a convolutional neural network to simultaneously detect salient junctions and straight lines, as well as predict their 3D depth and vanishing points. Compared with state-of-the-art learning-based wireframe detection methods, our network is much simpler and more unified, leading to better 2D wireframe detection. With global structural priors such as the Manhattan assumption, our method further reconstructs a full 3D wireframe model, a compact vector representation suitable for a variety of high-level vision tasks such as AR and CAD. We conduct extensive evaluations on a large synthetic dataset of urban scenes as well as real images. Our code and datasets will be released.
Tasks
Published 2019-05-17
URL https://arxiv.org/abs/1905.07482v1
PDF https://arxiv.org/pdf/1905.07482v1.pdf
PWC https://paperswithcode.com/paper/learning-to-reconstruct-3d-manhattan
Repo https://github.com/zhou13/neurvps
Framework pytorch
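
Once junction depths are predicted, lifting the 2D wireframe to 3D is plain back-projection through the camera intrinsics, as in the hedged sketch below (the paper's full pipeline additionally uses vanishing points and the Manhattan prior to refine the lines).

    import numpy as np

    def lift_junctions(junctions_2d, depths, K):
        # Back-project 2D junctions (pixel coordinates, shape [N, 2]) with their
        # predicted depths (shape [N]) into 3D camera coordinates using the 3x3
        # intrinsic matrix K; connecting lifted junctions yields a 3D wireframe.
        uv1 = np.concatenate([junctions_2d, np.ones((len(junctions_2d), 1))], axis=1)
        rays = (np.linalg.inv(K) @ uv1.T).T        # viewing ray per junction
        return rays * depths[:, None]              # scale each ray by its depth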

Understanding and Robustifying Differentiable Architecture Search

Title Understanding and Robustifying Differentiable Architecture Search
Authors Arber Zela, Thomas Elsken, Tonmoy Saikia, Yassine Marrakchi, Thomas Brox, Frank Hutter
Abstract Differentiable Architecture Search (DARTS) has attracted a lot of attention due to its simplicity and small search costs achieved by a continuous relaxation and an approximation of the resulting bi-level optimization problem. However, DARTS does not work robustly for new problems: we identify a wide range of search spaces for which DARTS yields degenerate architectures with very poor test performance. We study this failure mode and show that, while DARTS successfully minimizes validation loss, the found solutions generalize poorly when they coincide with high validation loss curvature in the architecture space. We show that by adding one of various types of regularization we can robustify DARTS to find solutions with less curvature and better generalization properties. Based on these observations, we propose several simple variations of DARTS that perform substantially more robustly in practice. Our observations are robust across five search spaces on three image classification tasks and also hold for the very different domains of disparity estimation (a dense regression task) and language modelling.
Tasks Disparity Estimation, Image Classification, Language Modelling
Published 2019-09-20
URL https://arxiv.org/abs/1909.09656v2
PDF https://arxiv.org/pdf/1909.09656v2.pdf
PWC https://paperswithcode.com/paper/190909656
Repo https://github.com/automl/RobustDARTS
Framework pytorch
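
The curvature signal the paper ties to poor generalization is the dominant eigenvalue of the validation-loss Hessian with respect to the architecture parameters, which can be estimated by power iteration on Hessian-vector products. The sketch below is a generic PyTorch illustration, not the RobustDARTS code; val_loss and alphas are assumed to come from an existing DARTS setup.

    import torch

    def dominant_eigenvalue(val_loss, alphas, iters=20):
        # Power-iteration estimate of the largest eigenvalue of the Hessian of the
        # validation loss w.r.t. the architecture parameters `alphas` (a list of
        # tensors). `val_loss` must be a scalar still attached to the graph.
        grads = torch.autograd.grad(val_loss, alphas, create_graph=True)
        flat_grad = torch.cat([g.reshape(-1) for g in grads])
        v = torch.randn_like(flat_grad)
        v = v / v.norm()
        for _ in range(iters):
            hv = torch.autograd.grad(flat_grad @ v, alphas, retain_graph=True)
            hv = torch.cat([h.reshape(-1) for h in hv])
            eig = v @ hv                           # Rayleigh quotient with unit v
            v = hv / (hv.norm() + 1e-12)
        return eig.item()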

AutoDispNet: Improving Disparity Estimation With AutoML

Title AutoDispNet: Improving Disparity Estimation With AutoML
Authors Tonmoy Saikia, Yassine Marrakchi, Arber Zela, Frank Hutter, Thomas Brox
Abstract Much research work in computer vision is being spent on optimizing existing network architectures to obtain a few more percentage points on benchmarks. Recent AutoML approaches promise to relieve us from this effort. However, they are mainly designed for comparatively small-scale classification tasks. In this work, we show how to use and extend existing AutoML techniques to efficiently optimize large-scale U-Net-like encoder-decoder architectures. In particular, we leverage gradient-based neural architecture search and Bayesian optimization for hyperparameter search. The resulting optimization does not require a large-scale compute cluster. We show results on disparity estimation that clearly outperform the manually optimized baseline and reach state-of-the-art performance.
Tasks AutoML, Disparity Estimation, Neural Architecture Search
Published 2019-05-17
URL https://arxiv.org/abs/1905.07443v2
PDF https://arxiv.org/pdf/1905.07443v2.pdf
PWC https://paperswithcode.com/paper/autodispnet-improving-disparity-estimation
Repo https://github.com/lmb-freiburg/autodispnet
Framework tf
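
The gradient-based search used here builds on the DARTS-style continuous relaxation, where every edge of a cell mixes candidate operations with softmax weights that are trained by gradient descent. Below is a generic mixed-operation module as a sketch; the candidate set, channel handling, and the encoder-decoder cell structure in AutoDispNet are more involved.

    import torch
    import torch.nn as nn

    class MixedOp(nn.Module):
        # One edge of a searchable cell: a softmax-weighted sum of candidate
        # operations. The weights `alpha` are the architecture parameters that
        # gradient-based NAS optimizes alongside the network weights.
        def __init__(self, channels, candidates=None):
            super().__init__()
            if candidates is None:                 # small illustrative candidate set
                candidates = [
                    nn.Conv2d(channels, channels, 3, padding=1),
                    nn.Conv2d(channels, channels, 5, padding=2),
                    nn.Identity(),
                ]
            self.ops = nn.ModuleList(candidates)
            self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

        def forward(self, x):
            weights = torch.softmax(self.alpha, dim=0)
            return sum(w * op(x) for w, op in zip(weights, self.ops))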

IPC: A Benchmark Data Set for Learning with Graph-Structured Data

Title IPC: A Benchmark Data Set for Learning with Graph-Structured Data
Authors Patrick Ferber, Tengfei Ma, Siyu Huo, Jie Chen, Michael Katz
Abstract Benchmark data sets are an indispensable ingredient of the evaluation of graph-based machine learning methods. We release a new data set, compiled from International Planning Competitions (IPC), for benchmarking graph classification, regression, and related tasks. Apart from the graph construction (based on AI planning problems) that is interesting in its own right, the data set possesses distinctly different characteristics from popularly used benchmarks. The data set, named IPC, consists of two self-contained versions, grounded and lifted, both including graphs of large and skewedly distributed sizes, posing substantial challenges for the computation of graph models such as graph kernels and graph neural networks. The graphs in this data set are directed and the lifted version is acyclic, offering the opportunity of benchmarking specialized models for directed (acyclic) structures. Moreover, the graph generator and the labeling are computer programmed; thus, the data set may be extended easily if a larger scale is desired. The data set is accessible from \url{https://github.com/IBM/IPC-graph-data}.
Tasks Graph Classification, graph construction
Published 2019-05-15
URL https://arxiv.org/abs/1905.06393v1
PDF https://arxiv.org/pdf/1905.06393v1.pdf
PWC https://paperswithcode.com/paper/ipc-a-benchmark-data-set-for-learning-with
Repo https://github.com/IBM/IPC-graph-data
Framework none

Weighted Boxes Fusion: ensembling boxes for object detection models

Title Weighted Boxes Fusion: ensembling boxes for object detection models
Authors Roman Solovyev, Weimin Wang
Abstract In this work, we introduce a novel Weighted Boxes Fusion (WBF) ensembling algorithm that boosts performance by combining predictions from different object detection models. The method was tested on the predictions of different models trained on the large Open Images dataset. The source code for our approach is publicly available at https://github.com/ZFTurbo/Weighted-Boxes-Fusion
Tasks Object Detection
Published 2019-10-29
URL https://arxiv.org/abs/1910.13302v1
PDF https://arxiv.org/pdf/1910.13302v1.pdf
PWC https://paperswithcode.com/paper/weighted-boxes-fusion-ensembling-boxes-for
Repo https://github.com/ZFTurbo/Weighted-Boxes-Fusion
Framework none
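
The core of WBF is easy to state: instead of suppressing overlapping boxes as NMS does, boxes from all models are grouped by IoU and each group is replaced by a confidence-weighted average. The single-class sketch below captures that idea; the reference implementation in the linked repo additionally handles multiple classes, per-model weights, and score rescaling by the number of contributing models.

    import numpy as np

    def iou(a, b):
        # Intersection over union of two boxes in (x1, y1, x2, y2) format.
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        union = ((a[2] - a[0]) * (a[3] - a[1]) +
                 (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / (union + 1e-9)

    def weighted_boxes_fusion(boxes, scores, iou_thr=0.55):
        # Simplified single-class WBF over the pooled predictions of all models.
        boxes, scores = np.asarray(boxes, float), np.asarray(scores, float)
        clusters, fused = [], []                   # member indices / fused boxes
        for i in scores.argsort()[::-1]:           # highest confidence first
            for c, fb in enumerate(fused):
                if iou(boxes[i], fb) > iou_thr:
                    clusters[c].append(i)
                    w = scores[clusters[c]]
                    fused[c] = (boxes[clusters[c]] * w[:, None]).sum(0) / w.sum()
                    break
            else:
                clusters.append([i])
                fused.append(boxes[i].copy())
        fused_scores = np.array([scores[c].mean() for c in clusters])
        return np.array(fused), fused_scores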

Attention routing between capsules

Title Attention routing between capsules
Authors Jaewoong Choi, Hyun Seo, Suii Im, Myungjoo Kang
Abstract In this paper, we propose a new capsule network architecture called Attention Routing CapsuleNet (AR CapsNet). We replace the dynamic routing and the squash activation function of the original capsule network (CapsuleNet, the capsule network with dynamic routing) with attention routing and capsule activation, respectively. Attention routing is a routing between capsules through an attention module; it is a fast forward pass that preserves spatial information. The intuitive interpretation of dynamic routing, on the other hand, is finding a centroid of the prediction capsules; thus, the squash activation function and its variants focus on preserving vector orientation, whereas the capsule activation focuses on performing a capsule-scale activation. We evaluate our proposed model on the MNIST, affNIST, and CIFAR-10 classification tasks. The proposed model achieves higher accuracy with fewer parameters (x0.65 on MNIST, x0.82 on CIFAR-10) and less training time than CapsuleNet (x0.19 on MNIST, x0.35 on CIFAR-10). These results validate that designing a capsule-scale operation is a key factor in implementing the capsule concept. Our experiments also show that the proposed model is transformation-equivariant, like CapsuleNet: as we perturb each element of the output capsule, the decoder attached to the output capsules shows global variations. Further experiments show that the differences in capsule features caused by applying affine transformations to an input image are significantly aligned in one direction.
Tasks
Published 2019-07-03
URL https://arxiv.org/abs/1907.01750v4
PDF https://arxiv.org/pdf/1907.01750v4.pdf
PWC https://paperswithcode.com/paper/attention-routing-between-capsules
Repo https://github.com/chjw1475/Attention-Routing-Capsules
Framework tf
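
A single-pass routing layer in the spirit of the abstract can be sketched as below: prediction capsules are scored by a small attention module and combined by softmax-weighted summation, replacing the iterative agreement loop of dynamic routing. Layer sizes and the exact attention design are illustrative assumptions, not the AR CapsNet architecture.

    import torch
    import torch.nn as nn

    class AttentionRouting(nn.Module):
        # Routes `in_caps` input capsules to `out_caps` output capsules in one
        # forward pass using attention scores instead of iterative dynamic routing.
        def __init__(self, in_caps, out_caps, in_dim, out_dim):
            super().__init__()
            self.proj = nn.Parameter(
                torch.randn(in_caps, out_caps, in_dim, out_dim) * 0.05)
            self.attn = nn.Linear(out_dim, 1)      # scores each prediction capsule

        def forward(self, u):                      # u: [batch, in_caps, in_dim]
            u_hat = torch.einsum('bid,iodk->biok', u, self.proj)
            scores = self.attn(u_hat).squeeze(-1)  # [batch, in_caps, out_caps]
            c = torch.softmax(scores, dim=1)       # attention over input capsules
            return (c.unsqueeze(-1) * u_hat).sum(dim=1)   # [batch, out_caps, out_dim]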

OICSR: Out-In-Channel Sparsity Regularization for Compact Deep Neural Networks

Title OICSR: Out-In-Channel Sparsity Regularization for Compact Deep Neural Networks
Authors Jiashi Li, Qi Qi, Jingyu Wang, Ce Ge, Yujian Li, Zhangzhang Yue, Haifeng Sun
Abstract Channel pruning can significantly accelerate and compress deep neural networks. Many channel pruning works utilize structured sparsity regularization to zero out all the weights in some channels and automatically obtain a structure-sparse network during training. However, these methods apply structured sparsity regularization to each layer separately, omitting the correlations between consecutive layers. In this paper, we first combine one out-channel in the current layer and the corresponding in-channel in the next layer into a regularization group, namely an out-in-channel. Our proposed Out-In-Channel Sparsity Regularization (OICSR) considers correlations between successive layers to further retain the predictive power of the compact network. Training with OICSR thoroughly transfers discriminative features into a fraction of out-in-channels. Correspondingly, OICSR measures channel importance based on statistics computed from two consecutive layers, not from an individual layer. Finally, a global greedy pruning algorithm is designed to remove redundant out-in-channels in an iterative way. Our method is comprehensively evaluated with various CNN architectures including CifarNet, AlexNet, ResNet, DenseNet and PreActSeNet on the CIFAR-10, CIFAR-100 and ImageNet-1K datasets. Notably, on ImageNet-1K, we reduce FLOPs by 37.2% on ResNet-50 while outperforming the original model by 0.22% top-1 accuracy.
Tasks
Published 2019-05-28
URL https://arxiv.org/abs/1905.11664v5
PDF https://arxiv.org/pdf/1905.11664v5.pdf
PWC https://paperswithcode.com/paper/oicsr-out-in-channel-sparsity-regularization-1
Repo https://github.com/dsfour/OICSR
Framework pytorch
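
The regularizer itself reduces to a group lasso in which the c-th output filter of one layer and the c-th input slice of the next layer form a single group, so they shrink to zero together. A sketch for a pair of Conv2d weight tensors is below; layer pairing across residual connections and the greedy pruning step are handled separately in the repo.

    import torch

    def oicsr_penalty(weight_l, weight_next, eps=1e-12):
        # Out-in-channel group sparsity for consecutive conv layers.
        # weight_l:    [C_out, C_in, k, k]   -- layer l
        # weight_next: [C_next, C_out, k, k] -- layer l + 1
        out_norm_sq = weight_l.pow(2).sum(dim=(1, 2, 3))    # per out-channel of layer l
        in_norm_sq = weight_next.pow(2).sum(dim=(0, 2, 3))  # per in-channel of layer l + 1
        return torch.sqrt(out_norm_sq + in_norm_sq + eps).sum()   # L2,1 over joint groups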

A block-random algorithm for learning on distributed, heterogeneous data

Title A block-random algorithm for learning on distributed, heterogeneous data
Authors Prakash Mohan, Marc T. Henry de Frahan, Ryan King, Ray W. Grout
Abstract Most deep learning models are based on deep neural networks with multiple layers between input and output. The parameters defining these layers are initialized with random values and are “learned” from data, typically using stochastic gradient descent based algorithms. These algorithms rely on the data being randomly shuffled before optimization. The randomization of the data prior to processing it in batches, which is formally required for the stochastic gradient descent algorithm to derive a useful deep learning model, is expected to be prohibitively expensive for in situ model training because of the resulting data communication across processor nodes. We show that the stochastic gradient descent (SGD) algorithm can still make useful progress if the batches are defined on a per-processor basis and processed in random order, even though (i) the batches are constructed from data samples from a single class or a specific flow region, and (ii) the overall data samples are heterogeneous. We present block-random gradient descent, a new algorithm that works on distributed, heterogeneous data without having to pre-shuffle it. This algorithm enables in situ learning for exascale simulations. The performance of the algorithm is demonstrated on a set of benchmark classification models and on the construction of a subgrid-scale large eddy simulation (LES) model for turbulent channel flow, using a data model similar to the one that will be encountered in exascale simulations.
Tasks
Published 2019-02-28
URL http://arxiv.org/abs/1903.00091v1
PDF http://arxiv.org/pdf/1903.00091v1.pdf
PWC https://paperswithcode.com/paper/a-block-random-algorithm-for-learning-on
Repo https://github.com/NREL/block-random
Framework none
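
The algorithm can be summarized in a few lines: data stays on the processor where it was produced, and only the order in which blocks and their local batches are visited is randomized each epoch. The sketch below is schematic; train_step is a placeholder for one SGD update on a local batch.

    import random

    def block_random_epochs(blocks, batch_size, train_step, epochs=1, seed=0):
        # `blocks` is a list of local datasets, one per processor; each may hold
        # samples from a single class or flow region. No global shuffle is done,
        # so no data moves across nodes.
        rng = random.Random(seed)
        for _ in range(epochs):
            order = list(range(len(blocks)))
            rng.shuffle(order)                     # visit blocks in random order
            for b in order:
                local = list(blocks[b])
                rng.shuffle(local)                 # shuffle within the block only
                for i in range(0, len(local), batch_size):
                    train_step(local[i:i + batch_size])   # one SGD update per batch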

Few-Shot Object Detection with Attention-RPN and Multi-Relation Detector

Title Few-Shot Object Detection with Attention-RPN and Multi-Relation Detector
Authors Qi Fan, Wei Zhuo, Chi-Keung Tang, Yu-Wing Tai
Abstract Conventional methods for object detection typically require a substantial amount of training data, and preparing such high-quality training data is very labor-intensive. In this paper, we propose a novel few-shot object detection network that aims at detecting objects of unseen categories with only a few annotated examples. Central to our method are our Attention-RPN, Multi-Relation Detector and Contrastive Training strategy, which exploit the similarity between the few-shot support set and query set to detect novel objects while suppressing false detection in the background. To train our network, we contribute a new dataset that contains 1000 categories of various objects with high-quality annotations. To the best of our knowledge, this is one of the first datasets specifically designed for few-shot object detection. Once our few-shot network is trained, it can detect objects of unseen categories without further training or fine-tuning. Our method is general and has a wide range of potential applications. We achieve new state-of-the-art performance on different datasets in the few-shot setting. The dataset link is https://github.com/fanq15/Few-Shot-Object-Detection-Dataset.
Tasks Few-Shot Object Detection, Object Detection
Published 2019-08-06
URL https://arxiv.org/abs/1908.01998v3
PDF https://arxiv.org/pdf/1908.01998v3.pdf
PWC https://paperswithcode.com/paper/few-shot-object-detection-with-attention-rpn
Repo https://github.com/fanq15/Few-Shot-Object-Detection-Dataset
Framework none
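
The attention part of the Attention-RPN can be pictured as a depthwise correlation between a pooled support feature and the query feature map, so that proposal scores are biased toward regions resembling the support class. The sketch below is a simplified illustration with assumed tensor shapes, not the released model.

    import torch
    import torch.nn.functional as F

    def attention_rpn_features(query_feat, support_feat):
        # query_feat:   [B, C, H, W]   feature map of the query image
        # support_feat: [B, C, Hs, Ws] feature map of the support image
        b, c, h, w = query_feat.shape
        kernel = F.adaptive_avg_pool2d(support_feat, 1)     # [B, C, 1, 1]
        kernel = kernel.reshape(b * c, 1, 1, 1)
        x = query_feat.reshape(1, b * c, h, w)
        attn = F.conv2d(x, kernel, groups=b * c)            # depthwise correlation
        return attn.reshape(b, c, h, w)                     # attention-weighted features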

Character Region Awareness for Text Detection

Title Character Region Awareness for Text Detection
Authors Youngmin Baek, Bado Lee, Dongyoon Han, Sangdoo Yun, Hwalsuk Lee
Abstract Scene text detection methods based on neural networks have emerged recently and have shown promising results. Previous methods trained with rigid word-level bounding boxes exhibit limitations in representing text regions of arbitrary shape. In this paper, we propose a new scene text detection method that effectively detects text areas by exploring each character and the affinity between characters. To overcome the lack of individual character-level annotations, our proposed framework exploits both the given character-level annotations for synthetic images and the estimated character-level ground truths for real images acquired by the learned interim model. In order to estimate the affinity between characters, the network is trained with the newly proposed representation for affinity. Extensive experiments on six benchmarks, including the TotalText and CTW-1500 datasets which contain highly curved text in natural images, demonstrate that our character-level text detection significantly outperforms state-of-the-art detectors. According to the results, our proposed method guarantees high flexibility in detecting complicated scene text images, such as arbitrarily-oriented, curved, or deformed texts.
Tasks Scene Text Detection
Published 2019-04-03
URL http://arxiv.org/abs/1904.01941v1
PDF http://arxiv.org/pdf/1904.01941v1.pdf
PWC https://paperswithcode.com/paper/character-region-awareness-for-text-detection
Repo https://github.com/dipu-bd/craft-moran-ocr
Framework pytorch
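
Ground truth for the region score is built by placing a Gaussian over each character box; the paper warps a canonical Gaussian to the character quadrilateral, while the sketch below uses a simpler axis-aligned approximation to show the idea.

    import numpy as np

    def character_region_map(char_boxes, height, width):
        # Render a region-score heatmap: one 2D Gaussian per character box,
        # given as (x1, y1, x2, y2) in pixel coordinates.
        ys, xs = np.mgrid[0:height, 0:width]
        heatmap = np.zeros((height, width), dtype=np.float32)
        for x1, y1, x2, y2 in char_boxes:
            cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
            sx, sy = max(x2 - x1, 1) / 4.0, max(y2 - y1, 1) / 4.0   # ~2 sigma per half-box
            g = np.exp(-(((xs - cx) / sx) ** 2 + ((ys - cy) / sy) ** 2) / 2.0)
            heatmap = np.maximum(heatmap, g)
        return heatmap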

Emotion Action Detection and Emotion Inference: the Task and Dataset

Title Emotion Action Detection and Emotion Inference: the Task and Dataset
Authors Pengyuan Liu, Chengyu Du, Shuofeng Zhao, Chenghao Zhu
Abstract Many natural language processing works on emotion analysis focus only on simple emotion classification, without exploring the potential of putting emotion into “event context”, and ignore the analysis of emotion-related events. One main reason is the lack of such a corpus. Here we present the Cause-Emotion-Action Corpus, which manually annotates not only emotions but also cause events and action events. We propose two new tasks based on the dataset: emotion causality and emotion inference. The first task is to extract a triple (cause, emotion, action). The second task is to infer the probable emotion. We are releasing the dataset with 10,603 samples and 15,892 events, along with basic statistical analysis and baselines for both the emotion causality and emotion inference tasks. The baseline performance demonstrates that there is much room for improvement on both tasks.
Tasks Action Detection, Emotion Classification, Emotion Recognition
Published 2019-03-16
URL http://arxiv.org/abs/1903.06901v1
PDF http://arxiv.org/pdf/1903.06901v1.pdf
PWC https://paperswithcode.com/paper/emotion-action-detection-and-emotion
Repo https://github.com/liupengyuan/EmotionAction_EmotionInference
Framework none