February 1, 2020

Paper Group AWR 114

High-Resolution Representations for Labeling Pixels and Regions

Title High-Resolution Representations for Labeling Pixels and Regions
Authors Ke Sun, Yang Zhao, Borui Jiang, Tianheng Cheng, Bin Xiao, Dong Liu, Yadong Mu, Xinggang Wang, Wenyu Liu, Jingdong Wang
Abstract High-resolution representation learning plays an essential role in many vision problems, e.g., pose estimation and semantic segmentation. The high-resolution network (HRNet) [SunXLW19], recently developed for human pose estimation, maintains high-resolution representations through the whole process by connecting high-to-low resolution convolutions in parallel and produces strong high-resolution representations by repeatedly conducting fusions across the parallel convolutions. In this paper, we conduct a further study on high-resolution representations by introducing a simple yet effective modification and apply it to a wide range of vision tasks. We augment the high-resolution representation by aggregating the (upsampled) representations from all the parallel convolutions, rather than only the representation from the high-resolution convolution as done in [SunXLW19]. This simple modification leads to stronger representations, evidenced by superior results. We show top results in semantic segmentation on Cityscapes, LIP, and PASCAL Context, and facial landmark detection on AFLW, COFW, 300W, and WFLW. In addition, we build a multi-level representation from the high-resolution representation and apply it to the Faster R-CNN object detection framework and its extended frameworks. The proposed approach achieves superior results to existing single-model networks on COCO object detection. The code and models are publicly available at https://github.com/HRNet.
Tasks Facial Landmark Detection, Object Detection, Pose Estimation, Representation Learning, Semantic Segmentation
Published 2019-04-09
URL http://arxiv.org/abs/1904.04514v1
PDF http://arxiv.org/pdf/1904.04514v1.pdf
PWC https://paperswithcode.com/paper/high-resolution-representations-for-labeling
Repo https://github.com/HRNet/HRNet-Image-Classification
Framework pytorch
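
To make the key modification concrete, here is a minimal PyTorch sketch (not the official HRNet code) of aggregating the upsampled representations from all parallel branches instead of keeping only the high-resolution one; the branch shapes below are illustrative:

```python
# Sketch of the paper's modification: upsample every parallel branch to the
# highest resolution and concatenate, rather than using the high-res branch alone.
import torch
import torch.nn.functional as F

def aggregate_branches(features):
    """features: list of [N, C_i, H_i, W_i] tensors, ordered high-to-low resolution."""
    target_size = features[0].shape[2:]  # spatial size of the highest-resolution branch
    upsampled = [features[0]] + [
        F.interpolate(f, size=target_size, mode="bilinear", align_corners=False)
        for f in features[1:]
    ]
    return torch.cat(upsampled, dim=1)  # [N, sum(C_i), H_0, W_0]

# Four branches with HRNet-W48-like widths (purely illustrative):
feats = [torch.randn(1, c, 64 // s, 64 // s) for c, s in [(48, 1), (96, 2), (192, 4), (384, 8)]]
print(aggregate_branches(feats).shape)  # torch.Size([1, 720, 64, 64])
```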

Differentially Private Mixed-Type Data Generation For Unsupervised Learning

Title Differentially Private Mixed-Type Data Generation For Unsupervised Learning
Authors Uthaipon Tantipongpipat, Chris Waites, Digvijay Boob, Amaresh Ankit Siva, Rachel Cummings
Abstract In this work we introduce the DP-auto-GAN framework for synthetic data generation, which combines the low-dimensional representation of autoencoders with the flexibility of Generative Adversarial Networks (GANs). This framework can take in raw sensitive data and privately train a model for generating synthetic data that satisfies the same statistical properties as the original data. This learned model can be used to generate arbitrary amounts of publicly available synthetic data, which can then be freely shared due to the post-processing guarantees of differential privacy. Our framework is applicable to unlabeled mixed-type data, which may include binary, categorical, and real-valued data. We implement this framework on both unlabeled binary data (MIMIC-III) and unlabeled mixed-type data (ADULT). We also introduce new metrics for evaluating the quality of synthetic mixed-type data, particularly in unsupervised settings.
Tasks Synthetic Data Generation
Published 2019-12-06
URL https://arxiv.org/abs/1912.03250v1
PDF https://arxiv.org/pdf/1912.03250v1.pdf
PWC https://paperswithcode.com/paper/differentially-private-mixed-type-data-1
Repo https://github.com/DPautoGAN/DPautoGAN
Framework pytorch
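
The differential privacy in frameworks of this kind typically enters through a DP-SGD-style training step. Below is a hedged sketch of that mechanism, per-example gradient clipping plus calibrated Gaussian noise; it is not the DP-auto-GAN code, and the gradient layout and constants are assumptions:

```python
# Sketch of one DP-SGD aggregation step: clip each per-example gradient to an L2
# norm of at most max_norm, sum, then add Gaussian noise scaled by sigma * max_norm.
import torch

def dp_sgd_step(per_example_grads, max_norm=1.0, sigma=1.1):
    """per_example_grads: [batch, num_params] flattened per-example gradients (assumed layout)."""
    norms = per_example_grads.norm(dim=1, keepdim=True)
    clip_factor = (max_norm / (norms + 1e-12)).clamp(max=1.0)
    clipped = per_example_grads * clip_factor
    noise = torch.randn(per_example_grads.shape[1]) * sigma * max_norm
    return (clipped.sum(dim=0) + noise) / per_example_grads.shape[0]

g = torch.randn(32, 10)      # hypothetical batch of per-example gradients
print(dp_sgd_step(g).shape)  # torch.Size([10])
```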

Controllable List-wise Ranking for Universal No-reference Image Quality Assessment

Title Controllable List-wise Ranking for Universal No-reference Image Quality Assessment
Authors Fu-Zhao Ou, Yuan-Gen Wang, Jin Li, Guopu Zhu, Sam Kwong
Abstract No-reference image quality assessment (NR-IQA) has received increasing attention in the IQA community since a reference image is not always available. Real-world images generally suffer from various types of distortion, and unfortunately, existing NR-IQA methods do not work with all of them. Developing a universal NR-IQA method capable of evaluating all types of distorted images is therefore a challenging task. In this paper, we propose a universal NR-IQA method based on controllable list-wise ranking (CLRIQA). First, to extend the authentically distorted image dataset, we present an imaging-heuristic approach, in which over- and under-exposure are formulated as an inverse of the Weber-Fechner law, and a fusion strategy and probabilistic compression are adopted to generate degraded real-world images. These degraded images are label-free yet associated with quality-ranking information. We then design a controllable list-wise ranking function by limiting the rank range and introducing an adaptive margin to tune the rank interval. Finally, the extended dataset and controllable list-wise ranking function are used to pre-train a CNN. Moreover, in order to obtain an accurate prediction model, we take advantage of the original dataset to further fine-tune the pre-trained network. Experiments on four benchmark datasets (i.e., LIVE, CSIQ, TID2013, and LIVE-C) show that the proposed CLRIQA improves the state of the art by over 9% in terms of overall performance. The code and model are publicly available at https://github.com/GZHU-Image-Lab/CLRIQA.
Tasks Image Quality Assessment, No-Reference Image Quality Assessment
Published 2019-11-24
URL https://arxiv.org/abs/1911.10566v2
PDF https://arxiv.org/pdf/1911.10566v2.pdf
PWC https://paperswithcode.com/paper/controllable-list-wise-ranking-for-universal
Repo https://github.com/GZHU-Image-Lab/CLRIQA
Framework none
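
As a rough illustration of list-wise ranking with a rank-dependent margin, here is a hedged PyTorch sketch; the exact rank-range limiting and adaptive-margin formulation in CLRIQA may differ:

```python
# For images ordered best-to-worst, require each better-ranked image to score
# higher than a worse-ranked one by a margin that grows with their rank distance.
import torch

def listwise_margin_loss(scores, margin=0.1):
    """scores: predicted quality scores [L] for images ordered from best to worst."""
    loss = scores.new_zeros(())
    L = scores.shape[0]
    for i in range(L):
        for j in range(i + 1, L):
            loss = loss + torch.relu(scores[j] - scores[i] + margin * (j - i))
    return loss / (L * (L - 1) / 2)

print(listwise_margin_loss(torch.tensor([0.9, 0.7, 0.4, 0.1])))  # ~0 when well ordered
```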

Learning joint reconstruction of hands and manipulated objects

Title Learning joint reconstruction of hands and manipulated objects
Authors Yana Hasson, Gül Varol, Dimitrios Tzionas, Igor Kalevatykh, Michael J. Black, Ivan Laptev, Cordelia Schmid
Abstract Estimating hand-object manipulations is essential for interpreting and imitating human actions. Previous work has made significant progress towards reconstruction of hand poses and object shapes in isolation. Yet, reconstructing hands and objects during manipulation is a more challenging task due to significant occlusions of both the hand and object. While presenting challenges, manipulations may also simplify the problem since the physics of contact restricts the space of valid hand-object configurations. For example, during manipulation, the hand and object should be in contact but not interpenetrate. In this work, we regularize the joint reconstruction of hands and objects with manipulation constraints. We present an end-to-end learnable model that exploits a novel contact loss that favors physically plausible hand-object constellations. Our approach improves grasp quality metrics over baselines, using RGB images as input. To train and evaluate the model, we also propose a new large-scale synthetic dataset, ObMan, with hand-object manipulations. We demonstrate the transferability of ObMan-trained models to real data.
Tasks Hand Joint Reconstruction
Published 2019-04-11
URL http://arxiv.org/abs/1904.05767v1
PDF http://arxiv.org/pdf/1904.05767v1.pdf
PWC https://paperswithcode.com/paper/learning-joint-reconstruction-of-hands-and
Repo https://github.com/hassony2/manopth
Framework pytorch
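
The contact-loss idea can be sketched with a toy example: attract hand vertices that are already near the object surface, and penalize any interpenetration. The sphere "object" below keeps the sketch self-contained; the actual ObMan model uses mesh-based distances:

```python
# Toy contact loss: attraction pulls near-surface vertices onto the surface,
# repulsion penalizes vertices inside the object.
import torch

def contact_loss(hand_verts, center, radius, attract_thresh=0.02):
    """hand_verts: [V, 3]; the object is a sphere (center [3], scalar radius)."""
    signed_dist = (hand_verts - center).norm(dim=1) - radius   # < 0 means inside
    repulsion = torch.relu(-signed_dist).sum()                 # interpenetration penalty
    near = signed_dist.abs() < attract_thresh                  # vertices close to the surface
    attraction = signed_dist[near].abs().sum() if near.any() else hand_verts.new_zeros(())
    return attraction + repulsion

verts = torch.randn(778, 3) * 0.1  # 778 vertices, as in the MANO hand model
print(contact_loss(verts, torch.zeros(3), radius=0.05))
```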

Unstructured Multi-View Depth Estimation Using Mask-Based Multiplane Representation

Title Unstructured Multi-View Depth Estimation Using Mask-Based Multiplane Representation
Authors Yuxin Hou, Arno Solin, Juho Kannala
Abstract This paper presents a novel method, MaskMVS, to solve depth estimation for unstructured multi-view image-pose pairs. In the plane-sweep procedure, the depth planes are sampled by histogram matching that ensures covering the depth range of interest. Unlike other plane-sweep methods, we do not rely on a cost metric to explicitly build the cost volume, but instead infer a multiplane mask representation which regularizes the learning. Compared to many previous approaches, we show that our method is lightweight and generalizes well without requiring excessive training. We outperform the current state-of-the-art and show results on the sun3d, scenes11, MVS, and RGBD test data sets.
Tasks Depth Estimation
Published 2019-02-06
URL http://arxiv.org/abs/1902.02166v2
PDF http://arxiv.org/pdf/1902.02166v2.pdf
PWC https://paperswithcode.com/paper/unstructured-multi-view-depth-estimation
Repo https://github.com/AaltoVision/MaskMVS
Framework pytorch
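
The histogram-matched plane sampling can be illustrated by placing the sweep planes at quantiles of an empirical depth distribution, so planes are denser where depths are more frequent; the gamma-distributed depths below are a stand-in for real training statistics:

```python
# Place depth planes at evenly spaced quantiles of the depth histogram instead of
# uniform or inverse-depth spacing.
import numpy as np

def sample_depth_planes(depth_samples, num_planes=32):
    quantiles = np.linspace(0.0, 1.0, num_planes + 2)[1:-1]  # drop the two extremes
    return np.quantile(depth_samples, quantiles)

depths = np.random.gamma(shape=2.0, scale=2.0, size=10_000)  # toy indoor-like depths
print(sample_depth_planes(depths)[:5])
```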

On Mutual Information Maximization for Representation Learning

Title On Mutual Information Maximization for Representation Learning
Authors Michael Tschannen, Josip Djolonga, Paul K. Rubenstein, Sylvain Gelly, Mario Lucic
Abstract Many recent methods for unsupervised or self-supervised representation learning train feature extractors by maximizing an estimate of the mutual information (MI) between different views of the data. This comes with several immediate problems: For example, MI is notoriously hard to estimate, and using it as an objective for representation learning may lead to highly entangled representations due to its invariance under arbitrary invertible transformations. Nevertheless, these methods have been repeatedly shown to excel in practice. In this paper we argue, and provide empirical evidence, that the success of these methods cannot be attributed to the properties of MI alone, and that they strongly depend on the inductive bias in both the choice of feature extractor architectures and the parametrization of the employed MI estimators. Finally, we establish a connection to deep metric learning and argue that this interpretation may be a plausible explanation for the success of the recently introduced methods.
Tasks Representation Learning, Self-Supervised Image Classification
Published 2019-07-31
URL https://arxiv.org/abs/1907.13625v2
PDF https://arxiv.org/pdf/1907.13625v2.pdf
PWC https://paperswithcode.com/paper/on-mutual-information-maximization-for
Repo https://github.com/google-research/google-research/tree/master/mutual_information_representation_learning
Framework tf
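
For reference, here is a minimal InfoNCE-style lower-bound objective of the kind these methods maximize; the dot-product critic and temperature are exactly the sort of parametrization choices the paper argues drive performance:

```python
# InfoNCE: classify which row of z2 matches each row of z1 (positives on the diagonal).
import torch
import torch.nn.functional as F

def infonce_loss(z1, z2, temperature=0.1):
    """z1, z2: [N, D] embeddings of two views of the same N examples."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature  # [N, N] similarity matrix
    labels = torch.arange(z1.shape[0])  # positive pairs sit on the diagonal
    return F.cross_entropy(logits, labels)

print(infonce_loss(torch.randn(8, 16), torch.randn(8, 16)))
```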

Adaptive Stochastic Natural Gradient Method for One-Shot Neural Architecture Search

Title Adaptive Stochastic Natural Gradient Method for One-Shot Neural Architecture Search
Authors Youhei Akimoto, Shinichi Shirakawa, Nozomu Yoshinari, Kento Uchida, Shota Saito, Kouhei Nishida
Abstract The high sensitivity of neural architecture search (NAS) methods to their inputs, such as the step-size (i.e., learning rate) and the search space, prevents practitioners from applying them out-of-the-box to their own problems, even though their purpose is to automate part of the tuning process. Aiming at a fast, robust, and widely applicable NAS, we develop a generic optimization framework for NAS. We turn the coupled optimization of connection weights and neural architecture into a differentiable optimization by means of stochastic relaxation. It accepts an arbitrary search space (widely applicable) and enables gradient-based simultaneous optimization of weights and architecture (fast). We propose a stochastic natural gradient method with an adaptive step-size mechanism built upon our theoretical investigation (robust). Despite its simplicity and lack of problem-dependent parameter tuning, our method exhibits near state-of-the-art performance with low computational budgets on both image classification and inpainting tasks.
Tasks Image Classification, Neural Architecture Search
Published 2019-05-21
URL https://arxiv.org/abs/1905.08537v1
PDF https://arxiv.org/pdf/1905.08537v1.pdf
PWC https://paperswithcode.com/paper/adaptive-stochastic-natural-gradient-method
Repo https://github.com/shirakawas/ASNG-NAS
Framework pytorch
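
The stochastic relaxation can be sketched on a toy categorical search space: sample architectures, evaluate them, and move the distribution parameters along an estimated natural gradient. ASNG's adaptive step-size mechanism is omitted here, and the operation scores are hypothetical:

```python
# Toy stochastic-relaxation NAS: theta is a categorical distribution over 4 ops;
# utility * (onehot - theta) is the natural-gradient estimate for a categorical.
import numpy as np

rng = np.random.default_rng(0)
theta = np.full(4, 0.25)                      # distribution over candidate operations
true_scores = np.array([0.1, 0.9, 0.3, 0.2])  # hypothetical (unknown) op quality

for _ in range(200):
    ops = rng.choice(4, size=2, p=theta)                   # sample two architectures
    fs = true_scores[ops] + 0.05 * rng.standard_normal(2)  # noisy evaluations
    u = fs - fs.mean()                                     # baseline-subtracted utility
    grad = sum(ui * (np.eye(4)[op] - theta) for op, ui in zip(ops, u))
    theta = np.clip(theta + 0.1 * grad, 1e-3, None)
    theta /= theta.sum()

print(theta.round(3))  # mass concentrates on the best operation (index 1)
```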

Fast AutoAugment

Title Fast AutoAugment
Authors Sungbin Lim, Ildoo Kim, Taesup Kim, Chiheon Kim, Sungwoong Kim
Abstract Data augmentation is an essential technique for improving the generalization ability of deep learning models. Recently, AutoAugment has been proposed as an algorithm to automatically search for augmentation policies from a dataset, and it has significantly enhanced performance on many image recognition tasks. However, its search method requires thousands of GPU hours even for a relatively small dataset. In this paper, we propose an algorithm called Fast AutoAugment that finds effective augmentation policies via a more efficient search strategy based on density matching. In comparison to AutoAugment, the proposed algorithm speeds up the search time by orders of magnitude while achieving comparable performance on image recognition tasks with various models and datasets, including CIFAR-10, CIFAR-100, SVHN, and ImageNet.
Tasks Data Augmentation, Image Augmentation, Image Classification
Published 2019-05-01
URL https://arxiv.org/abs/1905.00397v2
PDF https://arxiv.org/pdf/1905.00397v2.pdf
PWC https://paperswithcode.com/paper/fast-autoaugment
Repo https://github.com/junkwhinger/fastautoaugment_jsh
Framework pytorch
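
Density matching can be sketched as ranking candidate policies by the loss of an already-trained model on augmented held-out data, so no retraining is needed. The names below (`policy`, `val_loader`, the toy demo) are placeholders, not the paper's API:

```python
# Score a policy by how well a model trained WITHOUT augmentation handles data
# augmented WITH the policy; lower loss means the policy better matches the data density.
import torch

@torch.no_grad()
def score_policy(model, policy, val_loader, criterion):
    model.eval()
    total, n = 0.0, 0
    for images, labels in val_loader:
        augmented = torch.stack([policy(img) for img in images])
        total += criterion(model(augmented), labels).item() * len(labels)
        n += len(labels)
    return total / n

def select_policies(model, candidates, val_loader, criterion, top_k=5):
    return sorted(candidates, key=lambda p: score_policy(model, p, val_loader, criterion))[:top_k]

model = torch.nn.Linear(12, 3)  # toy stand-in for a pretrained classifier
loader = [(torch.randn(4, 12), torch.randint(0, 3, (4,)))]
print(score_policy(model, lambda x: x, loader, torch.nn.functional.cross_entropy))
```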

Bayesian Learning of Neural Network Architectures

Title Bayesian Learning of Neural Network Architectures
Authors Georgi Dikov, Patrick van der Smagt, Justin Bayer
Abstract In this paper we propose a Bayesian method for estimating architectural parameters of neural networks, namely layer size and network depth. We do this by learning concrete distributions over these parameters. Our results show that regular networks with a learnt structure can generalise better on small datasets, while fully stochastic networks can be more robust to parameter initialisation. The proposed method relies on standard neural variational learning and, unlike randomised architecture search, does not require retraining of the model, thus keeping the computational overhead at a minimum.
Tasks Neural Architecture Search
Published 2019-01-14
URL http://arxiv.org/abs/1901.04436v2
PDF http://arxiv.org/pdf/1901.04436v2.pdf
PWC https://paperswithcode.com/paper/bayesian-learning-of-neural-network
Repo https://github.com/antonFJohansson/Bayesian-Learning-of-Neural-Network-Architectures
Framework none
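
The core trick can be sketched as learning a Concrete (Gumbel-Softmax) distribution over candidate layer sizes, so the size choice stays differentiable; the parametrization below is a simplification of the paper's:

```python
# Soft-sample one of K candidate layer sizes and turn it into a differentiable
# gate over the layer's units.
import torch
import torch.nn.functional as F

def sample_size_mask(logits, sizes, max_units, temperature=0.5):
    """logits: [K] over K candidate sizes; returns a soft [max_units] unit gate."""
    probs = F.gumbel_softmax(logits, tau=temperature)  # relaxed one-hot over sizes
    unit_idx = torch.arange(max_units).float()
    masks = torch.stack([(unit_idx < s).float() for s in sizes])  # [K, max_units]
    return probs @ masks  # mixture of hard size masks

logits = torch.zeros(3, requires_grad=True)
mask = sample_size_mask(logits, sizes=[16, 32, 64], max_units=64)
print(mask.shape, mask.sum())  # expected active units: a mix of 16/32/64
```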

Region segmentation via deep learning and convex optimization

Title Region segmentation via deep learning and convex optimization
Authors Matthias Sonntag, Veniamin I. Morgenshtern
Abstract In this paper, we propose a method to segment regions in three-dimensional point clouds. We assume that (i) the shape and the number of regions in the point cloud are not known and (ii) the point cloud may be noisy. The method consists of two steps. In the first step, we use a deep neural network to predict the probability that a pair of small patches from the point cloud belongs to the same region. In the second step, we use a convex-optimization-based method to improve the predictions of the network by enforcing consistency constraints. We evaluate the accuracy of our method on a custom dataset of convex polyhedra, where the regions correspond to the faces of the polyhedra. The method can be seen as a robust and flexible alternative to the well-known region-growing segmentation algorithm. All reported results are reproducible and come with easy-to-use code that could serve as a baseline for future research.
Tasks
Published 2019-11-28
URL https://arxiv.org/abs/1911.12870v1
PDF https://arxiv.org/pdf/1911.12870v1.pdf
PWC https://paperswithcode.com/paper/region-segmentation-via-deep-learning-and
Repo https://github.com/vmorgenshtern/deepsegmentation
Framework pytorch
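
The consistency step can be illustrated with a small convex relaxation: project the network's noisy pairwise same-region probabilities onto assignments satisfying triangle-inequality (transitivity) constraints. This sketch uses cvxpy on a toy 4-point affinity matrix; the paper's actual convex program may differ:

```python
# Find the consistent pairwise matrix X closest to the noisy predictions P, under
# transitivity: if i~k and k~j then i~j, relaxed as X[i,j] >= X[i,k] + X[k,j] - 1.
import cvxpy as cp
import numpy as np

P = np.array([[1.0, 0.9, 0.8, 0.1],
              [0.9, 1.0, 0.2, 0.1],
              [0.8, 0.2, 1.0, 0.1],
              [0.1, 0.1, 0.1, 1.0]])  # noisy same-region probabilities
n = P.shape[0]
X = cp.Variable((n, n), symmetric=True)
cons = [X >= 0, X <= 1, cp.diag(X) == 1]
cons += [X[i, j] >= X[i, k] + X[k, j] - 1
         for i in range(n) for j in range(n) for k in range(n)]
cp.Problem(cp.Minimize(cp.sum_squares(X - P)), cons).solve()
print(np.round(X.value, 2))  # the inconsistent pair (1, 2) gets pulled up
```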

LiSHT: Non-Parametric Linearly Scaled Hyperbolic Tangent Activation Function for Neural Networks

Title LiSHT: Non-Parametric Linearly Scaled Hyperbolic Tangent Activation Function for Neural Networks
Authors Swalpa Kumar Roy, Suvojit Manna, Shiv Ram Dubey, Bidyut B. Chaudhuri
Abstract The activation function in a neural network is one of the key components that facilitates deep training by introducing non-linearity into the learning process. However, because of zero-hard rectification, some existing activation functions such as ReLU and Swish fail to utilize negative input values and may suffer from the dying gradient problem. Thus, it is important to look for a better activation function that is free from such problems. As a remedy, this paper proposes a new non-parametric function, called Linearly Scaled Hyperbolic Tangent (LiSHT), for Neural Networks (NNs). The proposed LiSHT activation function scales the non-linear Hyperbolic Tangent (Tanh) function by a linear function to tackle the dying gradient problem. Training and classification experiments are performed on the benchmark Car Evaluation, Iris, MNIST, CIFAR10, CIFAR100 and twitter140 datasets to show that the proposed activation achieves faster convergence and higher performance. A very promising performance improvement is observed on three different types of neural networks, including the Multi-layer Perceptron (MLP), the Convolutional Neural Network (CNN) and recurrent neural networks such as Long Short-Term Memory (LSTM). The advantages of the proposed activation function are also visualized in terms of feature activation maps, weight distributions and loss landscapes.
Tasks
Published 2019-01-01
URL http://arxiv.org/abs/1901.05894v1
PDF http://arxiv.org/pdf/1901.05894v1.pdf
PWC https://paperswithcode.com/paper/lisht-non-parametric-linearly-scaled
Repo https://github.com/lessw2020/LightRelu
Framework pytorch
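
The activation itself is a one-liner, Tanh scaled by its linear input, so negative inputs still produce non-zero activations and gradients:

```python
# LiSHT(x) = x * tanh(x): non-negative and symmetric around zero.
import torch

def lisht(x):
    return x * torch.tanh(x)

x = torch.linspace(-3, 3, 7)
print(lisht(x))  # LiSHT(-x) == LiSHT(x)
```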

M3D-RPN: Monocular 3D Region Proposal Network for Object Detection

Title M3D-RPN: Monocular 3D Region Proposal Network for Object Detection
Authors Garrick Brazil, Xiaoming Liu
Abstract Understanding the world in 3D is a critical component of urban autonomous driving. Generally, the combination of expensive LiDAR sensors and stereo RGB imaging has been paramount for successful 3D object detection algorithms, whereas monocular image-only methods experience drastically reduced performance. We propose to reduce the gap by reformulating the monocular 3D detection problem as a standalone 3D region proposal network. We leverage the geometric relationship of 2D and 3D perspectives, allowing 3D boxes to utilize well-known and powerful convolutional features generated in the image-space. To help address the strenuous 3D parameter estimations, we further design depth-aware convolutional layers which enable location specific feature development and in consequence improved 3D scene understanding. Compared to prior work in monocular 3D detection, our method consists of only the proposed 3D region proposal network rather than relying on external networks, data, or multiple stages. M3D-RPN is able to significantly improve the performance of both monocular 3D Object Detection and Bird’s Eye View tasks within the KITTI urban autonomous driving dataset, while efficiently using a shared multi-class model.
Tasks 3D Object Detection, Autonomous Driving, Object Detection, Scene Understanding
Published 2019-07-13
URL https://arxiv.org/abs/1907.06038v2
PDF https://arxiv.org/pdf/1907.06038v2.pdf
PWC https://paperswithcode.com/paper/m3d-rpn-monocular-3d-region-proposal-network
Repo https://github.com/garrickbrazil/M3D-RPN
Framework pytorch
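
Depth-aware convolution can be sketched as giving each horizontal bin of the feature map its own kernel (under a roughly fixed camera, image rows correlate with depth); the bin count and padding below are illustrative simplifications:

```python
# Split the feature map into horizontal bins and convolve each bin with its own
# (non-shared) kernel, enabling location-specific feature development.
import torch
import torch.nn as nn

class DepthAwareConv(nn.Module):
    def __init__(self, in_ch, out_ch, num_bins=4):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 3, padding=1) for _ in range(num_bins)]
        )

    def forward(self, x):
        bins = torch.chunk(x, len(self.convs), dim=2)  # split along image height
        return torch.cat([conv(b) for conv, b in zip(self.convs, bins)], dim=2)

layer = DepthAwareConv(16, 32)
print(layer(torch.randn(1, 16, 64, 64)).shape)  # torch.Size([1, 32, 64, 64])
```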

Adaptive NMS: Refining Pedestrian Detection in a Crowd

Title Adaptive NMS: Refining Pedestrian Detection in a Crowd
Authors Songtao Liu, Di Huang, Yunhong Wang
Abstract Pedestrian detection in a crowd is a very challenging issue. This paper addresses the problem with a novel Non-Maximum Suppression (NMS) algorithm that better refines the bounding boxes given by detectors. The contributions are threefold: (1) we propose adaptive-NMS, which applies a dynamic suppression threshold to an instance, according to the target density; (2) we design an efficient subnetwork to learn density scores, which can be conveniently embedded into both single-stage and two-stage detectors; and (3) we achieve state-of-the-art results on the CityPersons and CrowdHuman benchmarks.
Tasks Pedestrian Detection
Published 2019-04-07
URL http://arxiv.org/abs/1904.03629v1
PDF http://arxiv.org/pdf/1904.03629v1.pdf
PWC https://paperswithcode.com/paper/adaptive-nms-refining-pedestrian-detection-in
Repo https://github.com/abhinavsagar/Pedestrian-detection
Framework none
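
The core rule is easy to sketch: greedy NMS as usual, except each kept box M suppresses its neighbours with threshold max(N_t, d_M), where d_M is M's predicted crowd density. The boxes and density values below are toys:

```python
# Greedy NMS with a per-box adaptive threshold: dense regions get a looser
# (higher) IoU threshold, so nearby true positives survive.
import torch

def iou(box, boxes):
    """box: [4] as (x1, y1, x2, y2); boxes: [K, 4]."""
    x1 = torch.maximum(box[0], boxes[:, 0]); y1 = torch.maximum(box[1], boxes[:, 1])
    x2 = torch.minimum(box[2], boxes[:, 2]); y2 = torch.minimum(box[3], boxes[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def adaptive_nms(boxes, scores, density, nt=0.5):
    order = scores.argsort(descending=True)
    keep = []
    while order.numel() > 0:
        m = order[0]
        keep.append(m.item())
        thresh = torch.maximum(torch.tensor(nt), density[m])  # the adaptive threshold
        rest = order[1:]
        order = rest[iou(boxes[m], boxes[rest]) <= thresh]
    return keep

boxes = torch.tensor([[0., 0., 10., 10.], [1., 1., 11., 11.], [20., 20., 30., 30.]])
# With plain NMS (nt=0.5) box 1 would be suppressed; its high density keeps it.
print(adaptive_nms(boxes, torch.tensor([0.9, 0.8, 0.7]), torch.tensor([0.8, 0.8, 0.1])))
```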

LINSPECTOR: Multilingual Probing Tasks for Word Representations

Title LINSPECTOR: Multilingual Probing Tasks for Word Representations
Authors Gözde Gül Şahin, Clara Vania, Ilia Kuznetsov, Iryna Gurevych
Abstract Despite the ever-growing number of word representation models introduced for a large number of languages, there is a lack of a standardized technique for providing insights into what is captured by these models. Such insights would help the community to estimate downstream task performance, as well as to design more informed neural architectures, while avoiding the extensive experimentation that requires substantial computational resources not all researchers have access to. A recent development in NLP is to use simple classification tasks, also called probing tasks, that test for a single linguistic feature such as part-of-speech. Existing studies mostly focus on exploring the linguistic information encoded by the continuous representations of English text. However, from a typological perspective, morphologically poor English is rather an outlier: the information encoded by word order and function words in English is often stored on a morphological level in other languages. To address this, we introduce 15 type-level probing tasks, such as case marking, possession, word length, morphological tag count and pseudoword identification, for 24 languages. We present a reusable methodology for the creation and evaluation of such tests in a multilingual setting. We then present experiments on several diverse multilingual word embedding models, in which we relate probing task performance for a diverse set of languages to a range of five classic NLP tasks: POS tagging, dependency parsing, semantic role labeling, named entity recognition and natural language inference. We find that a number of probing tests correlate significantly and positively with the downstream tasks, especially for morphologically rich languages. We show that our tests can be used to explore word embeddings or black-box neural models for linguistic cues in a multilingual setting.
Tasks Dependency Parsing, Named Entity Recognition, Natural Language Inference, Semantic Role Labeling, Word Embeddings
Published 2019-03-22
URL https://arxiv.org/abs/1903.09442v2
PDF https://arxiv.org/pdf/1903.09442v2.pdf
PWC https://paperswithcode.com/paper/linspector-multilingual-probing-tasks-for
Repo https://github.com/maexe/linspector-web
Framework none
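
A probing task in this sense reduces to training a deliberately simple classifier on frozen embeddings to predict a single linguistic feature; the random data below is purely illustrative (real probes use pretrained embeddings and annotated word lists):

```python
# Linear probe: if a simple classifier can predict the feature from the embedding,
# the embedding encodes that feature. Held-out accuracy is the probe score.
import torch
import torch.nn as nn

emb_dim, num_classes = 100, 6
x_train, y_train = torch.randn(400, emb_dim), torch.randint(0, num_classes, (400,))
x_test, y_test = torch.randn(112, emb_dim), torch.randint(0, num_classes, (112,))

probe = nn.Linear(emb_dim, num_classes)  # kept simple on purpose
opt = torch.optim.Adam(probe.parameters(), lr=1e-2)
for _ in range(100):
    opt.zero_grad()
    nn.functional.cross_entropy(probe(x_train), y_train).backward()
    opt.step()

acc = (probe(x_test).argmax(dim=1) == y_test).float().mean()
print(f"held-out probe accuracy: {acc:.2f}")  # near chance (~0.17) for random vectors
```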

Knowledge Distillation from Internal Representations

Title Knowledge Distillation from Internal Representations
Authors Gustavo Aguilar, Yuan Ling, Yu Zhang, Benjamin Yao, Xing Fan, Chenlei Guo
Abstract Knowledge distillation is typically conducted by training a small model (the student) to mimic a large and cumbersome model (the teacher). The idea is to compress the knowledge from the teacher by using its output probabilities as soft-labels to optimize the student. However, when the teacher is considerably large, there is no guarantee that the internal knowledge of the teacher will be transferred into the student; even if the student closely matches the soft-labels, its internal representations may be considerably different. This internal mismatch can undermine the generalization capabilities originally intended to be transferred from the teacher to the student. In this paper, we propose to distill the internal representations of a large model such as BERT into a simplified version of it. We formulate two ways to distill such representations and various algorithms to conduct the distillation. We experiment with datasets from the GLUE benchmark and consistently show that adding knowledge distillation from internal representations is a more powerful method than only using soft-label distillation.
Tasks
Published 2019-10-08
URL https://arxiv.org/abs/1910.03723v2
PDF https://arxiv.org/pdf/1910.03723v2.pdf
PWC https://paperswithcode.com/paper/knowledge-distillation-from-internal
Repo https://github.com/ElenaKutanov/BertelsmannChallengeAI-
Framework tf
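
A hedged sketch of the idea: combine the usual soft-label KL term with a term matching selected student hidden states to teacher hidden states (a linear projection bridges the dimension gap). The layer mapping, projection, and weights below are illustrative choices, not the paper's exact formulation:

```python
# Soft-label distillation plus MSE between matched internal representations.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits,
                      student_hidden, teacher_hidden, proj, T=2.0, alpha=0.5):
    """student_hidden / teacher_hidden: lists of [N, L, D_s] / [N, L, D_t] states."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * T * T
    internal = sum(F.mse_loss(proj(s), t) for s, t in zip(student_hidden, teacher_hidden))
    return alpha * soft + (1 - alpha) * internal

proj = torch.nn.Linear(256, 768)                   # student dim -> teacher (BERT-like) dim
s_h = [torch.randn(4, 16, 256) for _ in range(2)]  # two matched layers
t_h = [torch.randn(4, 16, 768) for _ in range(2)]
print(distillation_loss(torch.randn(4, 3), torch.randn(4, 3), s_h, t_h, proj))
```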