Paper Group AWR 114
High-Resolution Representations for Labeling Pixels and Regions. Differentially Private Mixed-Type Data Generation For Unsupervised Learning. Controllable List-wise Ranking for Universal No-reference Image Quality Assessment. Learning joint reconstruction of hands and manipulated objects. Unstructured Multi-View Depth Estimation Using Mask-Based Multiplane Representation. On Mutual Information Maximization for Representation Learning. Adaptive Stochastic Natural Gradient Method for One-Shot Neural Architecture Search. Fast AutoAugment. Bayesian Learning of Neural Network Architectures. Region segmentation via deep learning and convex optimization. LiSHT: Non-Parametric Linearly Scaled Hyperbolic Tangent Activation Function for Neural Networks. M3D-RPN: Monocular 3D Region Proposal Network for Object Detection. Adaptive NMS: Refining Pedestrian Detection in a Crowd. LINSPECTOR: Multilingual Probing Tasks for Word Representations. Knowledge Distillation from Internal Representations.
High-Resolution Representations for Labeling Pixels and Regions
Title | High-Resolution Representations for Labeling Pixels and Regions |
Authors | Ke Sun, Yang Zhao, Borui Jiang, Tianheng Cheng, Bin Xiao, Dong Liu, Yadong Mu, Xinggang Wang, Wenyu Liu, Jingdong Wang |
Abstract | High-resolution representation learning plays an essential role in many vision problems, e.g., pose estimation and semantic segmentation. The high-resolution network (HRNet) [SunXLW19], recently developed for human pose estimation, maintains high-resolution representations through the whole process by connecting high-to-low resolution convolutions in parallel and produces strong high-resolution representations by repeatedly conducting fusions across parallel convolutions. In this paper, we conduct a further study on high-resolution representations by introducing a simple yet effective modification and apply it to a wide range of vision tasks. We augment the high-resolution representation by aggregating the (upsampled) representations from all the parallel convolutions rather than only the representation from the high-resolution convolution as done in [SunXLW19]. This simple modification leads to stronger representations, evidenced by superior results. We show top results in semantic segmentation on Cityscapes, LIP, and PASCAL Context, and facial landmark detection on AFLW, COFW, 300W, and WFLW. In addition, we build a multi-level representation from the high-resolution representation and apply it to the Faster R-CNN object detection framework and the extended frameworks. The proposed approach achieves superior results to existing single-model networks on COCO object detection. The code and models are publicly available at https://github.com/HRNet. |
Tasks | Facial Landmark Detection, Object Detection, Pose Estimation, Representation Learning, Semantic Segmentation |
Published | 2019-04-09 |
URL | http://arxiv.org/abs/1904.04514v1 |
PDF | http://arxiv.org/pdf/1904.04514v1.pdf |
PWC | https://paperswithcode.com/paper/high-resolution-representations-for-labeling |
Repo | https://github.com/HRNet/HRNet-Image-Classification |
Framework | pytorch |
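The key modification described in the abstract above is to aggregate the (upsampled) outputs of all parallel-resolution branches instead of keeping only the high-resolution branch. The snippet below is a minimal sketch of that aggregation, not the released HRNet code; the branch channel counts and spatial sizes are illustrative assumptions.

```python
# Sketch of HRNet-style aggregation: upsample every branch to the highest
# resolution and concatenate along the channel dimension.
import torch
import torch.nn.functional as F

def aggregate_branches(branches):
    """branches: list of feature maps [N, C_i, H_i, W_i], highest resolution first."""
    target_size = branches[0].shape[2:]  # spatial size of the high-resolution branch
    upsampled = [branches[0]] + [
        F.interpolate(b, size=target_size, mode="bilinear", align_corners=False)
        for b in branches[1:]
    ]
    return torch.cat(upsampled, dim=1)   # channel-wise concatenation

# Example with four parallel resolutions, as in HRNet (channel counts assumed):
feats = [torch.randn(1, c, 64 // s, 64 // s) for c, s in [(32, 1), (64, 2), (128, 4), (256, 8)]]
print(aggregate_branches(feats).shape)   # torch.Size([1, 480, 64, 64])
```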
Differentially Private Mixed-Type Data Generation For Unsupervised Learning
Title | Differentially Private Mixed-Type Data Generation For Unsupervised Learning |
Authors | Uthaipon Tantipongpipat, Chris Waites, Digvijay Boob, Amaresh Ankit Siva, Rachel Cummings |
Abstract | In this work we introduce the DP-auto-GAN framework for synthetic data generation, which combines the low dimensional representation of autoencoders with the flexibility of Generative Adversarial Networks (GANs). This framework can be used to take in raw sensitive data, and privately train a model for generating synthetic data that will satisfy the same statistical properties as the original data. This learned model can be used to generate arbitrary amounts of publicly available synthetic data, which can then be freely shared due to the post-processing guarantees of differential privacy. Our framework is applicable to unlabeled mixed-type data, which may include binary, categorical, and real-valued data. We implement this framework on both unlabeled binary data (MIMIC-III) and unlabeled mixed-type data (ADULT). We also introduce new metrics for evaluating the quality of synthetic mixed-type data, particularly in unsupervised settings. |
Tasks | Synthetic Data Generation |
Published | 2019-12-06 |
URL | https://arxiv.org/abs/1912.03250v1 |
PDF | https://arxiv.org/pdf/1912.03250v1.pdf |
PWC | https://paperswithcode.com/paper/differentially-private-mixed-type-data-1 |
Repo | https://github.com/DPautoGAN/DPautoGAN |
Framework | pytorch |
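Differentially private training of the autoencoder/GAN components typically rests on DP-SGD-style noisy, clipped gradients. The snippet below is only a hedged sketch of that building block; it is not the DP-auto-GAN training loop, and the clip norm, noise multiplier, and learning rate are arbitrary illustrative values with no privacy accounting attached.

```python
# Sketch of a DP-SGD-style update: clip each example's gradient, sum, add
# Gaussian noise, then take an ordinary gradient step.
import numpy as np

def dp_sgd_step(params, per_example_grads, clip_norm=1.0, noise_multiplier=1.1, lr=0.01):
    """per_example_grads: array [batch, dim] with one flattened gradient per example."""
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    factors = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * factors                       # per-example clipping
    noise = np.random.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / per_example_grads.shape[0]
    return params - lr * noisy_mean

params = np.zeros(4)
grads = np.random.randn(8, 4)
params = dp_sgd_step(params, grads)
```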
Controllable List-wise Ranking for Universal No-reference Image Quality Assessment
Title | Controllable List-wise Ranking for Universal No-reference Image Quality Assessment |
Authors | Fu-Zhao Ou, Yuan-Gen Wang, Jin Li, Guopu Zhu, Sam Kwong |
Abstract | No-reference image quality assessment (NR-IQA) has received increasing attention in the IQA community since a reference image is not always available. Real-world images generally suffer from various types of distortion. Unfortunately, existing NR-IQA methods do not work with all types of distortion. It is a challenging task to develop a universal NR-IQA method capable of evaluating all types of distorted images. In this paper, we propose a universal NR-IQA method based on controllable list-wise ranking (CLRIQA). First, to extend the authentically distorted image dataset, we present an imaging-heuristic approach, in which over- and under-exposure is formulated as an inverse of the Weber-Fechner law, and a fusion strategy and probabilistic compression are adopted, to generate degraded real-world images. These degraded images are label-free yet associated with quality ranking information. We then design a controllable list-wise ranking function by limiting the rank range and introducing an adaptive margin to tune the rank interval. Finally, the extended dataset and controllable list-wise ranking function are used to pre-train a CNN. Moreover, in order to obtain an accurate prediction model, we take advantage of the original dataset to further fine-tune the pre-trained network. Experiments evaluated on four benchmark datasets (i.e., LIVE, CSIQ, TID2013, and LIVE-C) show that the proposed CLRIQA improves the state of the art by over 9% in terms of overall performance. The code and model are publicly available at https://github.com/GZHU-Image-Lab/CLRIQA. |
Tasks | Image Quality Assessment, No-Reference Image Quality Assessment |
Published | 2019-11-24 |
URL | https://arxiv.org/abs/1911.10566v2 |
PDF | https://arxiv.org/pdf/1911.10566v2.pdf |
PWC | https://paperswithcode.com/paper/controllable-list-wise-ranking-for-universal |
Repo | https://github.com/GZHU-Image-Lab/CLRIQA |
Framework | none |
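One plausible reading of "ranking with an adaptive margin" is a ranking loss whose margin grows with the gap in quality rank, so images that differ more in rank are pushed further apart in predicted score. The sketch below illustrates that idea only; it is not the paper's exact CLRIQA objective, and the margin schedule is an assumption.

```python
# Illustrative rank-dependent margin ranking loss (pairwise form for clarity).
import torch

def adaptive_margin_ranking_loss(scores, ranks, base_margin=0.1):
    """scores: predicted quality scores [N]; ranks: integer quality ranks [N] (higher = better)."""
    loss, count = scores.new_zeros(()), 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if ranks[i] > ranks[j]:
                margin = base_margin * (ranks[i] - ranks[j])   # margin grows with the rank gap
                loss = loss + torch.clamp(margin - (scores[i] - scores[j]), min=0.0)
                count += 1
    return loss / max(count, 1)

scores = torch.tensor([0.2, 0.5, 0.9], requires_grad=True)
ranks = torch.tensor([1, 2, 3])
print(adaptive_margin_ranking_loss(scores, ranks))
```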
Learning joint reconstruction of hands and manipulated objects
Title | Learning joint reconstruction of hands and manipulated objects |
Authors | Yana Hasson, Gül Varol, Dimitrios Tzionas, Igor Kalevatykh, Michael J. Black, Ivan Laptev, Cordelia Schmid |
Abstract | Estimating hand-object manipulations is essential for interpreting and imitating human actions. Previous work has made significant progress towards reconstruction of hand poses and object shapes in isolation. Yet, reconstructing hands and objects during manipulation is a more challenging task due to significant occlusions of both the hand and object. While presenting challenges, manipulations may also simplify the problem since the physics of contact restricts the space of valid hand-object configurations. For example, during manipulation, the hand and object should be in contact but not interpenetrate. In this work, we regularize the joint reconstruction of hands and objects with manipulation constraints. We present an end-to-end learnable model that exploits a novel contact loss that favors physically plausible hand-object constellations. Our approach improves grasp quality metrics over baselines, using RGB images as input. To train and evaluate the model, we also propose a new large-scale synthetic dataset, ObMan, with hand-object manipulations. We demonstrate the transferability of ObMan-trained models to real data. |
Tasks | Hand Joint Reconstruction |
Published | 2019-04-11 |
URL | http://arxiv.org/abs/1904.05767v1 |
PDF | http://arxiv.org/pdf/1904.05767v1.pdf |
PWC | https://paperswithcode.com/paper/learning-joint-reconstruction-of-hands-and |
Repo | https://github.com/hassony2/manopth |
Framework | pytorch |
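The abstract above motivates a contact loss that favours touching without interpenetration. The snippet below is a simplified sketch in that spirit only: the object is approximated by a point cloud, "inside" is decided by a caller-supplied signed distance, and the thresholds are arbitrary; the actual ObMan loss uses mesh-based distances and is more involved.

```python
# Toy contact-style penalty: pull nearby hand vertices toward the object
# surface (attraction) and push penetrating vertices out (repulsion).
import torch

def contact_penalty(hand_verts, obj_points, signed_dist, attract_thresh=0.01):
    """
    hand_verts: [V, 3] hand vertex positions
    obj_points: [P, 3] points sampled on the object surface
    signed_dist: [V] signed distance of each hand vertex to the object (negative = inside)
    """
    d = torch.cdist(hand_verts, obj_points)          # [V, P] pairwise distances
    nearest = d.min(dim=1).values                    # distance of each vertex to the object
    attraction = torch.clamp(nearest - attract_thresh, min=0.0).mean()
    repulsion = torch.clamp(-signed_dist, min=0.0).mean()
    return attraction + repulsion

hand = torch.randn(778, 3)    # 778 vertices as in the MANO hand model
obj = torch.randn(600, 3)
sd = torch.randn(778)
print(contact_penalty(hand, obj, sd))
```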
Unstructured Multi-View Depth Estimation Using Mask-Based Multiplane Representation
Title | Unstructured Multi-View Depth Estimation Using Mask-Based Multiplane Representation |
Authors | Yuxin Hou, Arno Solin, Juho Kannala |
Abstract | This paper presents a novel method, MaskMVS, to solve depth estimation for unstructured multi-view image-pose pairs. In the plane-sweep procedure, the depth planes are sampled by histogram matching that ensures covering the depth range of interest. Unlike other plane-sweep methods, we do not rely on a cost metric to explicitly build the cost volume, but instead infer a multiplane mask representation which regularizes the learning. Compared to many previous approaches, we show that our method is lightweight and generalizes well without requiring excessive training. We outperform the current state-of-the-art and show results on the sun3d, scenes11, MVS, and RGBD test data sets. |
Tasks | Depth Estimation |
Published | 2019-02-06 |
URL | http://arxiv.org/abs/1902.02166v2 |
PDF | http://arxiv.org/pdf/1902.02166v2.pdf |
PWC | https://paperswithcode.com/paper/unstructured-multi-view-depth-estimation |
Repo | https://github.com/AaltoVision/MaskMVS |
Framework | pytorch |
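The abstract mentions sampling depth planes by histogram matching so they cover the depth range of interest. A hedged sketch of that idea is placing planes at quantiles of an empirical depth distribution; the synthetic depth statistics and plane count below are assumptions for illustration and differ from the paper's setup.

```python
# Quantile (inverse-CDF) placement of plane-sweep depth planes.
import numpy as np

def sample_depth_planes(depth_samples, num_planes=32):
    """depth_samples: 1-D array of depth values drawn from training data."""
    quantiles = np.linspace(0.0, 1.0, num_planes)
    return np.quantile(depth_samples, quantiles)   # denser planes where depths are common

depths = np.random.gamma(shape=2.0, scale=2.0, size=10000)   # stand-in depth statistics
planes = sample_depth_planes(depths)
print(planes[:5], planes[-1])
```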
On Mutual Information Maximization for Representation Learning
Title | On Mutual Information Maximization for Representation Learning |
Authors | Michael Tschannen, Josip Djolonga, Paul K. Rubenstein, Sylvain Gelly, Mario Lucic |
Abstract | Many recent methods for unsupervised or self-supervised representation learning train feature extractors by maximizing an estimate of the mutual information (MI) between different views of the data. This comes with several immediate problems: For example, MI is notoriously hard to estimate, and using it as an objective for representation learning may lead to highly entangled representations due to its invariance under arbitrary invertible transformations. Nevertheless, these methods have been repeatedly shown to excel in practice. In this paper we argue, and provide empirical evidence, that the success of these methods cannot be attributed to the properties of MI alone, and that they strongly depend on the inductive bias in both the choice of feature extractor architectures and the parametrization of the employed MI estimators. Finally, we establish a connection to deep metric learning and argue that this interpretation may be a plausible explanation for the success of the recently introduced methods. |
Tasks | Representation Learning, Self-Supervised Image Classification |
Published | 2019-07-31 |
URL | https://arxiv.org/abs/1907.13625v2 |
PDF | https://arxiv.org/pdf/1907.13625v2.pdf |
PWC | https://paperswithcode.com/paper/on-mutual-information-maximization-for |
Repo | https://github.com/google-research/google-research/tree/master/mutual_information_representation_learning |
Framework | tf |
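The methods analysed in this paper maximize MI lower bounds such as InfoNCE between two views of the data. Below is a minimal InfoNCE-style sketch of that family of objectives (not the authors' exact estimators): positives are matching view pairs, negatives are the other elements of the batch.

```python
# Minimal InfoNCE objective over two batches of view embeddings.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """z1, z2: [N, D] embeddings of two views of the same N examples."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature        # [N, N] similarity matrix
    labels = torch.arange(z1.size(0))         # diagonal entries are the positive pairs
    return F.cross_entropy(logits, labels)

z_a, z_b = torch.randn(16, 128), torch.randn(16, 128)
print(info_nce(z_a, z_b))
```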
Adaptive Stochastic Natural Gradient Method for One-Shot Neural Architecture Search
Title | Adaptive Stochastic Natural Gradient Method for One-Shot Neural Architecture Search |
Authors | Youhei Akimoto, Shinichi Shirakawa, Nozomu Yoshinari, Kento Uchida, Shota Saito, Kouhei Nishida |
Abstract | The high sensitivity of neural architecture search (NAS) methods to their inputs, such as the step-size (i.e., learning rate) and the search space, prevents practitioners from applying them out-of-the-box to their own problems, even though their purpose is to automate part of the tuning process. Aiming at a fast, robust, and widely applicable NAS, we develop a generic optimization framework for NAS. We turn a coupled optimization of connection weights and neural architecture into a differentiable optimization by means of stochastic relaxation. It accepts an arbitrary search space (widely applicable) and enables gradient-based simultaneous optimization of weights and architecture (fast). We propose a stochastic natural gradient method with an adaptive step-size mechanism built upon our theoretical investigation (robust). Despite its simplicity and the absence of problem-dependent parameter tuning, our method exhibited near state-of-the-art performance with low computational budgets on both image classification and inpainting tasks. |
Tasks | Image Classification, Neural Architecture Search |
Published | 2019-05-21 |
URL | https://arxiv.org/abs/1905.08537v1 |
PDF | https://arxiv.org/pdf/1905.08537v1.pdf |
PWC | https://paperswithcode.com/paper/adaptive-stochastic-natural-gradient-method |
Repo | https://github.com/shirakawas/ASNG-NAS |
Framework | pytorch |
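The stochastic relaxation described in the abstract treats architecture choices as samples from a parameterized categorical distribution and moves the distribution parameters with a Monte-Carlo natural-gradient step. The toy below sketches only that ingredient under strong simplifications: a fixed step size, raw objective values instead of ranking-based utilities, and a fake objective in place of jointly trained weights.

```python
# Toy stochastic natural-gradient update on the expectation parameters of a
# categorical architecture distribution (ASNG's adaptive mechanism omitted).
import numpy as np

rng = np.random.default_rng(0)

def toy_objective(choice):
    return float(choice == 2)     # pretend architecture index 2 is best

theta = np.full(4, 0.25)          # expectation parameters of a 4-way categorical
lr = 0.1
for _ in range(200):
    samples = rng.choice(4, size=8, p=theta)
    grad = np.zeros_like(theta)
    for s in samples:
        grad += toy_objective(s) * (np.eye(4)[s] - theta)   # natural-gradient direction
    theta += lr * grad / len(samples)
    theta = np.clip(theta, 1e-3, None)
    theta /= theta.sum()
print(theta)                      # probability mass concentrates on index 2
```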
Fast AutoAugment
Title | Fast AutoAugment |
Authors | Sungbin Lim, Ildoo Kim, Taesup Kim, Chiheon Kim, Sungwoong Kim |
Abstract | Data augmentation is an essential technique for improving the generalization ability of deep learning models. Recently, AutoAugment has been proposed as an algorithm to automatically search for augmentation policies from a dataset and has significantly enhanced performance on many image recognition tasks. However, its search method requires thousands of GPU hours even for a relatively small dataset. In this paper, we propose an algorithm called Fast AutoAugment that finds effective augmentation policies via a more efficient search strategy based on density matching. In comparison to AutoAugment, the proposed algorithm speeds up the search time by orders of magnitude while achieving comparable performance on image recognition tasks with various models and datasets including CIFAR-10, CIFAR-100, SVHN, and ImageNet. |
Tasks | Data Augmentation, Image Augmentation, Image Classification |
Published | 2019-05-01 |
URL | https://arxiv.org/abs/1905.00397v2 |
PDF | https://arxiv.org/pdf/1905.00397v2.pdf |
PWC | https://paperswithcode.com/paper/fast-autoaugment |
Repo | https://github.com/junkwhinger/fastautoaugment_jsh |
Framework | pytorch |
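A schematic view of density matching is that candidate policies are scored by how well a model trained *without* them predicts *augmented* validation data, so no per-candidate retraining is needed. The sketch below shows only that scoring loop; the policy space, the actual model, and the Bayesian-optimization search used in the paper are replaced by placeholders.

```python
# Schematic density-matching search: rank candidate augmentation policies by
# the loss of an already-trained model on policy-augmented validation data.
def evaluate_policy(model_loss_fn, val_images, val_labels, policy):
    """model_loss_fn(images, labels) -> average loss of an already-trained model."""
    augmented = [policy(img) for img in val_images]
    return model_loss_fn(augmented, val_labels)

def search_policies(model_loss_fn, val_images, val_labels, candidates, top_k=5):
    scored = [(evaluate_policy(model_loss_fn, val_images, val_labels, p), p) for p in candidates]
    scored.sort(key=lambda t: t[0])    # lower loss = augmented data matches the learned density
    return [p for _, p in scored[:top_k]]

# Toy usage with scalar "images" and shift "policies":
dummy_loss = lambda imgs, labels: sum(abs(i - l) for i, l in zip(imgs, labels)) / len(imgs)
candidates = [lambda x, s=s: x + s for s in (-0.2, 0.0, 0.2)]
best = search_policies(dummy_loss, [1.0, 2.0], [1.0, 2.0], candidates, top_k=1)
print(evaluate_policy(dummy_loss, [1.0, 2.0], [1.0, 2.0], best[0]))   # 0.0 for the zero shift
```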
Bayesian Learning of Neural Network Architectures
Title | Bayesian Learning of Neural Network Architectures |
Authors | Georgi Dikov, Patrick van der Smagt, Justin Bayer |
Abstract | In this paper we propose a Bayesian method for estimating architectural parameters of neural networks, namely layer size and network depth. We do this by learning concrete distributions over these parameters. Our results show that regular networks with a learnt structure can generalise better on small datasets, while fully stochastic networks can be more robust to parameter initialisation. The proposed method relies on standard neural variational learning and, unlike randomised architecture search, does not require retraining of the model, thus keeping the computational overhead at a minimum. |
Tasks | Neural Architecture Search |
Published | 2019-01-14 |
URL | http://arxiv.org/abs/1901.04436v2 |
PDF | http://arxiv.org/pdf/1901.04436v2.pdf |
PWC | https://paperswithcode.com/paper/bayesian-learning-of-neural-network |
Repo | https://github.com/antonFJohansson/Bayesian-Learning-of-Neural-Network-Architectures |
Framework | none |
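The core ingredient named in the abstract is a concrete (Gumbel-Softmax) distribution over discrete architectural choices, which makes the choice differentiable. The snippet below sketches only that relaxation for a depth choice; the variational objective, priors, and layer-size treatment from the paper are omitted, and the candidate-depth set is an assumption.

```python
# Concrete / Gumbel-Softmax relaxation of a categorical choice over depth.
import torch
import torch.nn.functional as F

depth_logits = torch.zeros(4, requires_grad=True)   # learnable logits over candidate depths

def sample_depth_weights(logits, temperature=0.5):
    u = torch.rand_like(logits).clamp(1e-9, 1 - 1e-9)
    gumbel = -torch.log(-torch.log(u))
    return F.softmax((logits + gumbel) / temperature, dim=-1)   # soft one-hot over depths

w = sample_depth_weights(depth_logits)
print(w, w.sum())   # differentiable weights that can gate per-depth outputs
```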
Region segmentation via deep learning and convex optimization
Title | Region segmentation via deep learning and convex optimization |
Authors | Matthias Sonntag, Veniamin I. Morgenshtern |
Abstract | In this paper, we propose a method to segment regions in three-dimensional point clouds. We assume that (i) the shape and the number of regions in the point cloud are not known and (ii) the point cloud may be noisy. The method consists of two steps. In the first step we use a deep neural network to predict the probability that a pair of small patches from the point cloud belongs to the same region. In the second step, we use a convex-optimization based method to improve the predictions of the network by enforcing consistency constraints. We evaluate the accuracy of our method on a custom dataset of convex polyhedra, where the regions correspond to the faces of the polyhedra. The method can be seen as a robust and flexible alternative to the famous region growing segmentation algorithm. All reported results are reproducible and come with easy to use code that could serve as a baseline for future research. |
Tasks | |
Published | 2019-11-28 |
URL | https://arxiv.org/abs/1911.12870v1 |
PDF | https://arxiv.org/pdf/1911.12870v1.pdf |
PWC | https://paperswithcode.com/paper/region-segmentation-via-deep-learning-and |
Repo | https://github.com/vmorgenshtern/deepsegmentation |
Framework | pytorch |
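The two-step structure described above (pairwise same-region probabilities, then a consistency-enforcing optimization) can be illustrated roughly as building an affinity matrix from pairwise predictions and turning it into region labels. In the sketch below the network predictions are faked with noisy ground truth and the paper's convex-optimization step is replaced by off-the-shelf spectral clustering as a simple stand-in.

```python
# Rough illustration: pairwise "same region" probabilities -> affinity matrix
# -> region labels via clustering (stand-in for the convex-optimization step).
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
n_patches, n_regions = 30, 3
true_regions = rng.integers(0, n_regions, size=n_patches)

# Stand-in for network output: noisy probability that two patches share a region.
affinity = (true_regions[:, None] == true_regions[None, :]).astype(float)
affinity = np.clip(affinity + 0.2 * rng.normal(size=affinity.shape), 0, 1)
affinity = (affinity + affinity.T) / 2                    # symmetrise

labels = SpectralClustering(n_clusters=n_regions, affinity="precomputed",
                            random_state=0).fit_predict(affinity)
print(labels)
```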
LiSHT: Non-Parametric Linearly Scaled Hyperbolic Tangent Activation Function for Neural Networks
Title | LiSHT: Non-Parametric Linearly Scaled Hyperbolic Tangent Activation Function for Neural Networks |
Authors | Swalpa Kumar Roy, Suvojit Manna, Shiv Ram Dubey, Bidyut B. Chaudhuri |
Abstract | The activation function in a neural network is one of the important components that facilitates deep training by introducing non-linearity into the learning process. However, because of zero-hard rectification, some of the existing activation functions such as ReLU and Swish fail to utilize negative input values and may suffer from the dying gradient problem. Thus, it is important to look for a better activation function that is free from such problems. As a remedy, this paper proposes a new non-parametric function, called Linearly Scaled Hyperbolic Tangent (LiSHT), for Neural Networks (NNs). The proposed LiSHT activation function is an attempt to scale the non-linear Hyperbolic Tangent (Tanh) function by a linear function and tackle the dying gradient problem. Training and classification experiments are performed on the benchmark Car Evaluation, Iris, MNIST, CIFAR10, CIFAR100, and twitter140 datasets to show that the proposed activation achieves faster convergence and higher performance. A very promising performance improvement is observed on three different types of neural networks, including the Multi-layer Perceptron (MLP), Convolutional Neural Network (CNN), and recurrent neural networks such as Long Short-Term Memory (LSTM). The advantages of the proposed activation function are also visualized in terms of feature activation maps, weight distribution, and loss landscape. |
Tasks | |
Published | 2019-01-01 |
URL | http://arxiv.org/abs/1901.05894v1 |
PDF | http://arxiv.org/pdf/1901.05894v1.pdf |
PWC | https://paperswithcode.com/paper/lisht-non-parametric-linearly-scaled |
Repo | https://github.com/lessw2020/LightRelu |
Framework | pytorch |
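LiSHT, as described above, scales the Tanh non-linearity by its (linear) input, i.e. f(x) = x * tanh(x). The sketch below follows that definition directly.

```python
# LiSHT activation: x * tanh(x); non-negative and retains the magnitude of
# negative inputs (unlike ReLU, which zeroes them).
import torch

def lisht(x):
    return x * torch.tanh(x)

x = torch.linspace(-3, 3, 7)
print(lisht(x))   # symmetric: negative inputs map to positive outputs
```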
M3D-RPN: Monocular 3D Region Proposal Network for Object Detection
Title | M3D-RPN: Monocular 3D Region Proposal Network for Object Detection |
Authors | Garrick Brazil, Xiaoming Liu |
Abstract | Understanding the world in 3D is a critical component of urban autonomous driving. Generally, the combination of expensive LiDAR sensors and stereo RGB imaging has been paramount for successful 3D object detection algorithms, whereas monocular image-only methods experience drastically reduced performance. We propose to reduce the gap by reformulating the monocular 3D detection problem as a standalone 3D region proposal network. We leverage the geometric relationship of 2D and 3D perspectives, allowing 3D boxes to utilize well-known and powerful convolutional features generated in the image-space. To help address the strenuous 3D parameter estimations, we further design depth-aware convolutional layers which enable location specific feature development and in consequence improved 3D scene understanding. Compared to prior work in monocular 3D detection, our method consists of only the proposed 3D region proposal network rather than relying on external networks, data, or multiple stages. M3D-RPN is able to significantly improve the performance of both monocular 3D Object Detection and Bird’s Eye View tasks within the KITTI urban autonomous driving dataset, while efficiently using a shared multi-class model. |
Tasks | 3D Object Detection, Autonomous Driving, Object Detection, Scene Understanding |
Published | 2019-07-13 |
URL | https://arxiv.org/abs/1907.06038v2 |
PDF | https://arxiv.org/pdf/1907.06038v2.pdf |
PWC | https://paperswithcode.com/paper/m3d-rpn-monocular-3d-region-proposal-network |
Repo | https://github.com/garrickbrazil/M3D-RPN |
Framework | pytorch |
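A simplified reading of the "depth-aware convolution" mentioned above is a convolution whose kernels vary by image row: the feature map is split into horizontal bins and each bin gets its own kernel, since image rows roughly correlate with depth. The sketch below illustrates that idea only; bin count and layer sizes are illustrative and the released M3D-RPN code differs in detail.

```python
# Row-bin ("depth-aware") convolution sketch: one kernel per horizontal bin.
import torch
import torch.nn as nn

class RowBinConv(nn.Module):
    def __init__(self, in_ch, out_ch, num_bins=4):
        super().__init__()
        self.convs = nn.ModuleList(nn.Conv2d(in_ch, out_ch, 3, padding=1) for _ in range(num_bins))

    def forward(self, x):
        bins = torch.chunk(x, len(self.convs), dim=2)   # split along the height axis
        return torch.cat([conv(b) for conv, b in zip(self.convs, bins)], dim=2)

layer = RowBinConv(16, 32, num_bins=4)
print(layer(torch.randn(1, 16, 64, 40)).shape)   # torch.Size([1, 32, 64, 40])
```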
Adaptive NMS: Refining Pedestrian Detection in a Crowd
Title | Adaptive NMS: Refining Pedestrian Detection in a Crowd |
Authors | Songtao Liu, Di Huang, Yunhong Wang |
Abstract | Pedestrian detection in a crowd is a very challenging issue. This paper addresses this problem by a novel Non-Maximum Suppression (NMS) algorithm to better refine the bounding boxes given by detectors. The contributions are threefold: (1) we propose adaptive-NMS, which applies a dynamic suppression threshold to an instance, according to the target density; (2) we design an efficient subnetwork to learn density scores, which can be conveniently embedded into both the single-stage and two-stage detectors; and (3) we achieve state of the art results on the CityPersons and CrowdHuman benchmarks. |
Tasks | Pedestrian Detection |
Published | 2019-04-07 |
URL | http://arxiv.org/abs/1904.03629v1 |
PDF | http://arxiv.org/pdf/1904.03629v1.pdf |
PWC | https://paperswithcode.com/paper/adaptive-nms-refining-pedestrian-detection-in |
Repo | https://github.com/abhinavsagar/Pedestrian-detection |
Framework | none |
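The adaptive threshold idea from the abstract can be sketched as greedy NMS in which a kept box suppresses its neighbours at a threshold of max(base_threshold, predicted_density), so crowded regions are suppressed less aggressively. The snippet below is such a sketch; the density subnetwork itself is not modelled, and the density values are supplied by the caller.

```python
# Greedy NMS with a per-box adaptive suppression threshold.
import numpy as np

def iou(box, boxes):
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def adaptive_nms(boxes, scores, densities, base_thresh=0.5):
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        thresh = max(base_thresh, densities[i])       # adaptive suppression threshold
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
print(adaptive_nms(boxes, np.array([0.9, 0.8, 0.7]), np.array([0.6, 0.6, 0.1])))  # [0, 2]
```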
LINSPECTOR: Multilingual Probing Tasks for Word Representations
Title | LINSPECTOR: Multilingual Probing Tasks for Word Representations |
Authors | Gözde Gül Şahin, Clara Vania, Ilia Kuznetsov, Iryna Gurevych |
Abstract | Despite an ever-growing number of word representation models introduced for a large number of languages, there is a lack of a standardized technique to provide insights into what is captured by these models. Such insights would help the community to get an estimate of the downstream task performance, as well as to design more informed neural architectures, while avoiding extensive experimentation which requires substantial computational resources not all researchers have access to. A recent development in NLP is to use simple classification tasks, also called probing tasks, that test for a single linguistic feature such as part-of-speech. Existing studies mostly focus on exploring the linguistic information encoded by the continuous representations of English text. However, from a typological perspective the morphologically poor English is rather an outlier: the information encoded by the word order and function words in English is often stored on a morphological level in other languages. To address this, we introduce 15 type-level probing tasks such as case marking, possession, word length, morphological tag count and pseudoword identification for 24 languages. We present a reusable methodology for creation and evaluation of such tests in a multilingual setting. We then present experiments on several diverse multilingual word embedding models, in which we relate the probing task performance for a diverse set of languages to a range of five classic NLP tasks: POS-tagging, dependency parsing, semantic role labeling, named entity recognition and natural language inference. We find that a number of probing tests have a significantly high positive correlation with the downstream tasks, especially for morphologically rich languages. We show that our tests can be used to explore word embeddings or black-box neural models for linguistic cues in a multilingual setting. |
Tasks | Dependency Parsing, Named Entity Recognition, Natural Language Inference, Semantic Role Labeling, Word Embeddings |
Published | 2019-03-22 |
URL | https://arxiv.org/abs/1903.09442v2 |
PDF | https://arxiv.org/pdf/1903.09442v2.pdf |
PWC | https://paperswithcode.com/paper/linspector-multilingual-probing-tasks-for |
Repo | https://github.com/maexe/linspector-web |
Framework | none |
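A probing task in the sense of the abstract trains a simple classifier on frozen word vectors to predict a single linguistic property, and reads its accuracy as a measure of what the embeddings encode. The sketch below uses random vectors and a synthetic label as stand-ins; a real probe would use actual type-level embeddings and annotated word lists (e.g., for case marking).

```python
# Minimal probing-task sketch: logistic-regression probe on frozen vectors.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(2000, 50))           # stand-in for type-level word vectors
labels = (embeddings[:, 0] > 0).astype(int)        # stand-in for a linguistic feature label

X_train, X_test, y_train, y_test = train_test_split(embeddings, labels, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("probe accuracy:", probe.score(X_test, y_test))
```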
Knowledge Distillation from Internal Representations
Title | Knowledge Distillation from Internal Representations |
Authors | Gustavo Aguilar, Yuan Ling, Yu Zhang, Benjamin Yao, Xing Fan, Chenlei Guo |
Abstract | Knowledge distillation is typically conducted by training a small model (the student) to mimic a large and cumbersome model (the teacher). The idea is to compress the knowledge from the teacher by using its output probabilities as soft-labels to optimize the student. However, when the teacher is considerably large, there is no guarantee that the internal knowledge of the teacher will be transferred into the student; even if the student closely matches the soft-labels, its internal representations may be considerably different. This internal mismatch can undermine the generalization capabilities originally intended to be transferred from the teacher to the student. In this paper, we propose to distill the internal representations of a large model such as BERT into a simplified version of it. We formulate two ways to distill such representations and various algorithms to conduct the distillation. We experiment with datasets from the GLUE benchmark and consistently show that adding knowledge distillation from internal representations is a more powerful method than only using soft-label distillation. |
Tasks | |
Published | 2019-10-08 |
URL | https://arxiv.org/abs/1910.03723v2 |
PDF | https://arxiv.org/pdf/1910.03723v2.pdf |
PWC | https://paperswithcode.com/paper/knowledge-distillation-from-internal |
Repo | https://github.com/ElenaKutanov/BertelsmannChallengeAI- |
Framework | tf |
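A compact way to picture combining soft-label distillation with an internal-representation term is a loss with two parts: a temperature-scaled KL between teacher and student output distributions, plus a penalty matching selected hidden states (here MSE after a linear projection). This is a hedged sketch only; the layer selection, projection, and weighting used by the authors are not reproduced, and the sizes below are assumptions.

```python
# Soft-label distillation + internal-representation matching (sketch).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits,
                      student_hidden, teacher_hidden, proj,
                      temperature=2.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    internal = F.mse_loss(proj(student_hidden), teacher_hidden)   # match hidden states
    return alpha * soft + (1 - alpha) * internal

proj = torch.nn.Linear(256, 768)   # student hidden size -> teacher hidden size (assumed)
loss = distillation_loss(torch.randn(4, 10), torch.randn(4, 10),
                         torch.randn(4, 256), torch.randn(4, 768), proj)
print(loss)
```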