February 1, 2020

2907 words 14 mins read

Paper Group AWR 315

Do ImageNet Classifiers Generalize to ImageNet?. Learning Guided Convolutional Network for Depth Completion. Unsupervised Depth Completion from Visual Inertial Odometry. A Comprehensive Exploration on WikiSQL with Table-Aware Word Contextualization. Progressive Feature Polishing Network for Salient Object Detection. CosRec: 2D Convolutional Neural …

Do ImageNet Classifiers Generalize to ImageNet?

Title Do ImageNet Classifiers Generalize to ImageNet?
Authors Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, Vaishaal Shankar
Abstract We build new test sets for the CIFAR-10 and ImageNet datasets. Both benchmarks have been the focus of intense research for almost a decade, raising the danger of overfitting to excessively re-used test sets. By closely following the original dataset creation processes, we test to what extent current classification models generalize to new data. We evaluate a broad range of models and find accuracy drops of 3% - 15% on CIFAR-10 and 11% - 14% on ImageNet. However, accuracy gains on the original test sets translate to larger gains on the new test sets. Our results suggest that the accuracy drops are not caused by adaptivity, but by the models’ inability to generalize to slightly “harder” images than those found in the original test sets.
Tasks
Published 2019-02-13
URL https://arxiv.org/abs/1902.10811v2
PDF https://arxiv.org/pdf/1902.10811v2.pdf
PWC https://paperswithcode.com/paper/do-imagenet-classifiers-generalize-to
Repo https://github.com/modestyachts/ImageNetV2
Framework pytorch
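
A minimal sketch of the kind of comparison this entry reports: evaluate the same pretrained classifier on the original ImageNet validation split and on a local copy of the new test set, then look at the accuracy drop. The directory paths are placeholders and the example assumes both roots are laid out in torchvision ImageFolder style with consistent class-folder naming; it is an illustration, not the authors' evaluation code.

```python
# Sketch: compare a pretrained classifier's top-1 accuracy on the original
# ImageNet validation set vs. a new (ImageNetV2-style) test set.
# Directory paths below are placeholders in ImageFolder layout.
import torch
from torchvision import datasets, models, transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def top1_accuracy(model, root):
    loader = torch.utils.data.DataLoader(
        datasets.ImageFolder(root, transform=preprocess),
        batch_size=64, num_workers=4)
    correct, total = 0, 0
    model.eval()
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total

model = models.resnet50(pretrained=True)
acc_v1 = top1_accuracy(model, "imagenet/val")                   # original test set (placeholder path)
acc_v2 = top1_accuracy(model, "imagenetv2-matched-frequency")   # new test set (placeholder path)
print(f"original: {acc_v1:.3f}, new: {acc_v2:.3f}, drop: {acc_v1 - acc_v2:.3f}")
```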

Learning Guided Convolutional Network for Depth Completion

Title Learning Guided Convolutional Network for Depth Completion
Authors Jie Tang, Fei-Peng Tian, Wei Feng, Jian Li, Ping Tan
Abstract Dense depth perception is critical for autonomous driving and other robotics applications. However, modern LiDAR sensors only provide sparse depth measurements. It is thus necessary to complete the sparse LiDAR data, where a synchronized guidance RGB image is often used to facilitate this completion. Many neural networks have been designed for this task. However, they often naïvely fuse the LiDAR data and RGB image information by performing feature concatenation or element-wise addition. Inspired by guided image filtering, we design a novel guided network to predict kernel weights from the guidance image. These predicted kernels are then applied to extract the depth image features. In this way, our network generates content-dependent and spatially-variant kernels for multi-modal feature fusion. Dynamically generated spatially-variant kernels could lead to prohibitive GPU memory consumption and computation overhead. We further design a convolution factorization to reduce computation and memory consumption. The GPU memory reduction makes it possible for feature fusion to work in a multi-stage scheme. We conduct comprehensive experiments to verify our method on real-world outdoor, indoor and synthetic datasets. Our method produces strong results. It outperforms state-of-the-art methods on the NYUv2 dataset and ranks 1st on the KITTI depth completion benchmark at the time of submission. It also presents strong generalization capability under different 3D point densities, various lighting and weather conditions, as well as cross-dataset evaluations. The code will be released for reproduction.
Tasks Autonomous Driving, Depth Completion
Published 2019-08-03
URL https://arxiv.org/abs/1908.01238v1
PDF https://arxiv.org/pdf/1908.01238v1.pdf
PWC https://paperswithcode.com/paper/learning-guided-convolutional-network-for
Repo https://github.com/kakaxi314/GuideNet
Framework pytorch
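
The core idea above is to predict content-dependent, spatially-variant kernels from the RGB guidance branch and apply them to the depth branch's features. The following is a simplified sketch of that fusion step only; it omits the paper's convolution factorization and uses illustrative module names and channel sizes.

```python
# Sketch of content-dependent, spatially-variant kernel prediction for guided
# fusion (simplified; the paper additionally factorizes this convolution to
# reduce memory). Names and sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GuidedFusion(nn.Module):
    def __init__(self, channels, k=3):
        super().__init__()
        self.k = k
        # Predict one k x k kernel per output pixel and channel from the guidance features.
        self.kernel_head = nn.Conv2d(channels, channels * k * k, kernel_size=3, padding=1)

    def forward(self, guide_feat, depth_feat):
        b, c, h, w = depth_feat.shape
        k = self.k
        # Per-pixel kernels predicted from the RGB guidance branch.
        kernels = self.kernel_head(guide_feat).view(b, c, k * k, h, w)
        # Unfold depth features into k x k neighborhoods and apply the kernels.
        patches = F.unfold(depth_feat, kernel_size=k, padding=k // 2)
        patches = patches.view(b, c, k * k, h, w)
        return (kernels * patches).sum(dim=2)

guide = torch.randn(1, 32, 64, 64)   # RGB-branch features
depth = torch.randn(1, 32, 64, 64)   # sparse-depth-branch features
fused = GuidedFusion(32)(guide, depth)
print(fused.shape)  # torch.Size([1, 32, 64, 64])
```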

Unsupervised Depth Completion from Visual Inertial Odometry

Title Unsupervised Depth Completion from Visual Inertial Odometry
Authors Alex Wong, Xiaohan Fei, Stephanie Tsuei, Stefano Soatto
Abstract We describe a method to infer dense depth from camera motion and sparse depth as estimated using a visual-inertial odometry system. Unlike other scenarios using point clouds from lidar or structured light sensors, we have only a few hundred to a few thousand points, insufficient to inform the topology of the scene. Our method first constructs a piecewise planar scaffolding of the scene, and then uses it to infer dense depth using the image along with the sparse points. We use a predictive cross-modal criterion, akin to “self-supervision,” measuring photometric consistency across time, forward-backward pose consistency, and geometric compatibility with the sparse point cloud. We also launch the first visual-inertial + depth dataset, which we hope will foster additional exploration into combining the complementary strengths of visual and inertial sensors. To compare our method to prior work, we adopt the unsupervised KITTI depth completion benchmark, and show state-of-the-art performance on it.
Tasks Depth Completion
Published 2019-05-15
URL https://arxiv.org/abs/1905.08616v3
PDF https://arxiv.org/pdf/1905.08616v3.pdf
PWC https://paperswithcode.com/paper/190508616
Repo https://github.com/alexklwong/unsupervised-depth-completion-visual-inertial-odometry
Framework tf
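
The photometric-consistency term mentioned in the abstract is the main self-supervised signal: reconstruct the current frame by warping an adjacent frame with the predicted depth, the camera intrinsics, and the relative pose from the VIO system, then penalize the reconstruction error. The sketch below shows only that term, in PyTorch for illustration (the repository itself is TensorFlow); the pose/geometry regularizers are omitted and all tensors are random placeholders.

```python
# Sketch of a photometric-consistency loss for unsupervised depth completion:
# warp a source frame into the target view using predicted depth, intrinsics K,
# and a relative pose, then penalize the photometric error. Placeholders only.
import torch
import torch.nn.functional as F

def photometric_loss(target, source, depth, K, T_src_tgt):
    b, _, h, w = target.shape
    # Pixel grid in homogeneous coordinates.
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).view(3, -1)   # (3, H*W)
    # Back-project to 3D with the predicted depth, move points into the source frame.
    cam = torch.inverse(K) @ pix * depth.view(b, 1, -1)                   # (B, 3, H*W)
    cam_h = torch.cat([cam, torch.ones(b, 1, h * w)], dim=1)              # homogeneous
    src = K @ (T_src_tgt @ cam_h)[:, :3]                                  # (B, 3, H*W)
    # Project and normalize to [-1, 1] for grid_sample.
    uv = src[:, :2] / src[:, 2:].clamp(min=1e-6)
    grid = torch.stack([uv[:, 0] / (w - 1) * 2 - 1,
                        uv[:, 1] / (h - 1) * 2 - 1], dim=-1).view(b, h, w, 2)
    recon = F.grid_sample(source, grid, align_corners=True)
    return (recon - target).abs().mean()

B, H, W = 1, 96, 320
loss = photometric_loss(torch.rand(B, 3, H, W), torch.rand(B, 3, H, W),
                        torch.rand(B, 1, H, W) * 80.0,
                        torch.eye(3).unsqueeze(0),   # intrinsics (placeholder)
                        torch.eye(4).unsqueeze(0))   # relative pose (placeholder)
print(loss.item())
```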

A Comprehensive Exploration on WikiSQL with Table-Aware Word Contextualization

Title A Comprehensive Exploration on WikiSQL with Table-Aware Word Contextualization
Authors Wonseok Hwang, Jinyeong Yim, Seunghyun Park, Minjoon Seo
Abstract We present SQLova, the first natural-language-to-SQL (NL2SQL) model to achieve human performance on the WikiSQL dataset. We revisit and discuss diverse popular methods in the NL2SQL literature, take full advantage of BERT (Devlin et al., 2018) through an effective table contextualization method, and coherently combine them, outperforming the previous state of the art by 8.2% and 2.5% in logical form and execution accuracy, respectively. We particularly note that BERT with a seq2seq decoder leads to poor performance on the task, indicating the importance of careful design when using such large pretrained models. We also provide a comprehensive analysis of the dataset and our model, which can be helpful for designing future NL2SQL datasets and models. We especially show that our model’s performance is near the upper bound on WikiSQL, where we observe that a large portion of the evaluation errors are due to wrong annotations, and our model already exceeds human performance by 1.3% in execution accuracy.
Tasks Semantic Parsing
Published 2019-02-04
URL https://arxiv.org/abs/1902.01069v2
PDF https://arxiv.org/pdf/1902.01069v2.pdf
PWC https://paperswithcode.com/paper/a-comprehensive-exploration-on-wikisql-with
Repo https://github.com/naver/sqlova
Framework pytorch
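
"Table-aware word contextualization" essentially means encoding the question and the table's column headers in a single BERT pass, so that each header representation is conditioned on the question. A minimal sketch with HuggingFace transformers follows; the checkpoint, the question, and the headers are illustrative, and the layers that actually decode the SQL query are omitted.

```python
# Sketch of table-aware contextualization: encode the natural-language question
# jointly with the table's column headers in one BERT pass, so header
# representations are question-conditioned. SQL decoding layers are omitted;
# the checkpoint and inputs below are illustrative.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

question = "How many players are taller than 2 meters?"
headers = ["player", "height", "team", "position"]

# [CLS] question [SEP] header1 [SEP] header2 [SEP] ...
header_text = f" {tokenizer.sep_token} ".join(headers)
inputs = tokenizer(question, header_text, return_tensors="pt")

with torch.no_grad():
    hidden = model(**inputs).last_hidden_state   # (1, seq_len, 768)

print(hidden.shape)  # contextualized question + header token representations
```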

Progressive Feature Polishing Network for Salient Object Detection

Title Progressive Feature Polishing Network for Salient Object Detection
Authors Bo Wang, Quan Chen, Min Zhou, Zhiqiang Zhang, Xiaogang Jin, Kun Gai
Abstract Features matter for salient object detection. Existing methods mainly focus on designing a sophisticated structure to incorporate multi-level features and filter out cluttered features. We present the Progressive Feature Polishing Network (PFPN), a simple yet effective framework that progressively polishes the multi-level features to be more accurate and representative. By employing multiple Feature Polishing Modules (FPMs) in a recurrent manner, our approach is able to detect salient objects with fine details without any post-processing. An FPM updates the features of each level in parallel by directly incorporating all higher-level context information. Moreover, it keeps the dimensions and hierarchical structures of the feature maps, which makes it easy to integrate with any CNN-based model. Experiments show that our results improve monotonically as the number of FPMs increases. Without bells and whistles, PFPN significantly outperforms the state-of-the-art methods on five benchmark datasets under various evaluation metrics.
Tasks Object Detection, Salient Object Detection
Published 2019-11-14
URL https://arxiv.org/abs/1911.05942v1
PDF https://arxiv.org/pdf/1911.05942v1.pdf
PWC https://paperswithcode.com/paper/progressive-feature-polishing-network-for
Repo https://github.com/chenquan-cq/PFPN
Framework pytorch
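
A rough sketch of the Feature Polishing Module described above: each level is updated in parallel by fusing all coarser levels (resized to its resolution), and the output keeps the same channel counts and spatial sizes so modules can be stacked recurrently. Channel sizes and the fusion operator are illustrative, not the paper's exact design.

```python
# Sketch of a Feature Polishing Module: each pyramid level is updated in
# parallel by fusing all higher (coarser) levels, resized to its resolution,
# while keeping dimensions unchanged so FPMs can be stacked. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPM(nn.Module):
    def __init__(self, channels, num_levels):
        super().__init__()
        # For level i, fuse its own features plus all coarser levels i+1..L-1.
        self.fuse = nn.ModuleList([
            nn.Conv2d(channels * (num_levels - i), channels, kernel_size=1)
            for i in range(num_levels)
        ])

    def forward(self, feats):  # feats: fine -> coarse, all with `channels` channels
        polished = []
        for i, f in enumerate(feats):
            higher = [F.interpolate(h, size=f.shape[-2:], mode="bilinear",
                                    align_corners=False) for h in feats[i + 1:]]
            polished.append(self.fuse[i](torch.cat([f] + higher, dim=1)))
        return polished

feats = [torch.randn(1, 64, s, s) for s in (64, 32, 16, 8)]
out = FPM(64, 4)(feats)
print([o.shape for o in out])  # same shapes as the inputs, so FPMs can repeat
```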

CosRec: 2D Convolutional Neural Networks for Sequential Recommendation

Title CosRec: 2D Convolutional Neural Networks for Sequential Recommendation
Authors An Yan, Shuo Cheng, Wang-Cheng Kang, Mengting Wan, Julian McAuley
Abstract Sequential patterns play an important role in building modern recommender systems. To this end, several recommender systems have been built on top of Markov Chains and Recurrent Models (among others). Although these sequential models have proven successful at a range of tasks, they still struggle to uncover complex relationships nested in user purchase histories. In this paper, we argue that modeling pairwise relationships directly leads to an efficient representation of sequential features and captures complex item correlations. Specifically, we propose a 2D convolutional network for sequential recommendation (CosRec). It encodes a sequence of items into a three-way tensor; learns local features using 2D convolutional filters; and aggregates high-order interactions in a feedforward manner. Quantitative results on two public datasets show that our method outperforms both conventional methods and recent sequence-based approaches, achieving state-of-the-art performance on various evaluation metrics.
Tasks Recommendation Systems
Published 2019-08-27
URL https://arxiv.org/abs/1908.09972v1
PDF https://arxiv.org/pdf/1908.09972v1.pdf
PWC https://paperswithcode.com/paper/cosrec-2d-convolutional-neural-networks-for
Repo https://github.com/zzxslp/CosRec
Framework pytorch
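
The key step in the entry above is turning a length-L item sequence into an L×L grid of pairwise embedding concatenations (a three-way tensor) that 2D convolutions can process. A minimal sketch of that encoding follows; the sizes are illustrative and the prediction head that scores the next item is omitted.

```python
# Sketch of CosRec-style pairwise sequence encoding: embed the last L items,
# form an L x L grid where cell (i, j) concatenates item i's and item j's
# embeddings, then apply 2D convolutions. Prediction head omitted; sizes are
# illustrative.
import torch
import torch.nn as nn

class PairwiseEncoder(nn.Module):
    def __init__(self, num_items, dim=32, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(num_items, dim)
        self.conv = nn.Sequential(
            nn.Conv2d(2 * dim, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, item_seq):            # (B, L) item ids
        e = self.embed(item_seq)            # (B, L, D)
        B, L, D = e.shape
        # Three-way tensor of pairwise concatenations: (B, L, L, 2D).
        pairs = torch.cat([e.unsqueeze(2).expand(B, L, L, D),
                           e.unsqueeze(1).expand(B, L, L, D)], dim=-1)
        x = pairs.permute(0, 3, 1, 2)       # (B, 2D, L, L) for 2D convolution
        return self.conv(x).flatten(1)      # (B, hidden) sequence representation

seq = torch.randint(0, 1000, (4, 5))        # batch of 4 sequences, 5 items each
print(PairwiseEncoder(1000)(seq).shape)     # torch.Size([4, 64])
```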

Class Feature Pyramids for Video Explanation

Title Class Feature Pyramids for Video Explanation
Authors Alexandros Stergiou, Georgios Kapidis, Grigorios Kalliatakis, Christos Chrysoulas, Ronald Poppe, Remco Veltkamp
Abstract Deep convolutional networks are widely used in video action recognition. 3D convolutions are one prominent approach to dealing with the additional time dimension. While 3D convolutions typically lead to higher accuracies, the inner workings of the trained models are more difficult to interpret. We focus on creating human-understandable visual explanations that represent the hierarchical parts of spatio-temporal networks. We introduce Class Feature Pyramids, a method that traverses the entire network structure and incrementally discovers kernels at different network depths that are informative for a specific class. Our method does not depend on the network’s architecture or the type of 3D convolutions, supporting grouped and depth-wise convolutions, convolutions in fibers, and convolutions in branches. We demonstrate the method on six state-of-the-art 3D convolutional neural networks (CNNs) on three action recognition datasets (Kinetics-400, UCF-101, and HMDB-51) and two egocentric action recognition datasets (EPIC-Kitchens and EGTEA Gaze+).
Tasks Temporal Action Localization
Published 2019-09-18
URL https://arxiv.org/abs/1909.08611v1
PDF https://arxiv.org/pdf/1909.08611v1.pdf
PWC https://paperswithcode.com/paper/class-feature-pyramids-for-video-explanation
Repo https://github.com/alexandrosstergiou/Class_Feature_Visualization_Pyramid
Framework pytorch
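
A rough sense of "discovering class-informative kernels at a given depth" can be given with a generic gradient-based proxy: hook an intermediate layer, backpropagate the class score, and rank kernels by the gradient magnitude reaching their feature maps. This is not the paper's exact traversal rule, and a 2D ResNet-18 stands in here for a 3D action-recognition backbone purely for illustration.

```python
# Sketch: rank kernels of an intermediate layer by how strongly the class
# score's gradient flows through their feature maps (a generic proxy, not the
# paper's exact method). A 2D ResNet-18 is used as a stand-in backbone.
import torch
from torchvision.models import resnet18

model = resnet18(pretrained=True).eval()
feature_grads = {}

def hook(module, grad_input, grad_output):
    feature_grads["layer3"] = grad_output[0]          # (B, C, H, W)

model.layer3.register_full_backward_hook(hook)

x = torch.randn(1, 3, 224, 224, requires_grad=True)   # random stand-in input
class_idx = 285                                        # e.g. an ImageNet class index
model(x)[0, class_idx].backward()

# Per-kernel importance: mean absolute gradient over the spatial map.
importance = feature_grads["layer3"].abs().mean(dim=(0, 2, 3))
topk = torch.topk(importance, k=5).indices
print("most informative kernels in layer3:", topk.tolist())
```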

Adaptive Unimodal Cost Volume Filtering for Deep Stereo Matching

Title Adaptive Unimodal Cost Volume Filtering for Deep Stereo Matching
Authors Youmin Zhang, Yimin Chen, Xiao Bai, Suihanjin Yu, Kun Yu, Zhiwei Li, Kuiyuan Yang
Abstract State-of-the-art deep-learning-based stereo matching approaches treat disparity estimation as a regression problem, where the loss function is defined directly on the true disparities and their estimates. However, disparity is just a byproduct of a matching process modeled by the cost volume, and learning the cost volume indirectly through disparity regression is prone to overfitting since the cost volume is under-constrained. In this paper, we propose to directly constrain the cost volume by filtering it with unimodal distributions peaked at the true disparities. In addition, the variances of the unimodal distributions are estimated for each pixel to explicitly model matching uncertainty under different contexts. The proposed architecture achieves state-of-the-art performance on Scene Flow and two KITTI stereo benchmarks. In particular, our method ranked $1^{st}$ on the KITTI 2012 evaluation and $4^{th}$ on the KITTI 2015 evaluation (recorded on 2019.8.20). The code for AcfNet is available at: https://github.com/DeepMotionAIResearch/DenseMatchingBenchmark.
Tasks Disparity Estimation, Stereo Matching, Stereo Matching Hand
Published 2019-09-09
URL https://arxiv.org/abs/1909.03751v2
PDF https://arxiv.org/pdf/1909.03751v2.pdf
PWC https://paperswithcode.com/paper/adaptive-unimodal-cost-volume-filtering-for
Repo https://github.com/youmi-zym/AcfNet
Framework pytorch
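
The unimodal supervision described above can be sketched as: build a target distribution over disparity levels peaked at the ground-truth disparity, with a per-pixel variance modeling matching uncertainty, and penalize the predicted matching distribution with a cross-entropy loss. The shapes, the Laplacian-style peak, and the variance tensor below are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of unimodal cost-volume supervision: a target distribution peaked at
# the ground-truth disparity (sharpness controlled by a per-pixel variance)
# is compared against the predicted matching distribution. Illustrative only.
import torch
import torch.nn.functional as F

def unimodal_loss(cost_volume, gt_disp, variance):
    # cost_volume: (B, D, H, W) matching scores; gt_disp, variance: (B, 1, H, W)
    B, D, H, W = cost_volume.shape
    disp_levels = torch.arange(D, dtype=torch.float32).view(1, D, 1, 1)
    # Unimodal target: softmax of a peak centered at the true disparity.
    target = F.softmax(-(disp_levels - gt_disp).abs() / variance.clamp(min=1e-3), dim=1)
    log_pred = F.log_softmax(cost_volume, dim=1)
    return -(target * log_pred).sum(dim=1).mean()     # cross-entropy over disparities

B, D, H, W = 2, 48, 32, 64
loss = unimodal_loss(torch.randn(B, D, H, W),
                     torch.rand(B, 1, H, W) * (D - 1),   # ground-truth disparity
                     torch.rand(B, 1, H, W) + 0.5)       # estimated per-pixel variance
print(loss.item())
```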

FC$^2$N: Fully Channel-Concatenated Network for Single Image Super-Resolution

Title FC$^2$N: Fully Channel-Concatenated Network for Single Image Super-Resolution
Authors Xiaole Zhao, Ying Liao, Tian He, Yulun Zhang, Yadong Wu, Tao Zhang
Abstract Most current image super-resolution (SR) methods based on convolutional neural networks (CNNs) use residual learning in their network design, which facilitates effective back-propagation and hence improves SR performance as model scale increases. However, residual networks suffer from representational redundancy by introducing identity paths that impede the full exploitation of model capacity. Moreover, blindly enlarging the network can cause more problems in model training, even with residual learning. In this paper, a novel fully channel-concatenated network (FC$^2$N) is presented to further mine the representational capacity of deep models, in which all interlayer skips are implemented by a simple and straightforward operation, weighted channel concatenation (WCC), followed by a 1$\times$1 conv layer. Based on the WCC, the model achieves a joint attention mechanism over linear and nonlinear features in the network, and outperforms other advanced SR models with fewer model parameters. To the best of our knowledge, FC$^2$N is the first CNN model after EDSR to achieve state-of-the-art performance with fewer than 10M parameters, as well as the first CNN model that does not use residual learning yet reaches a network depth of over 400 layers. Moreover, it shows excellent performance in both large-scale and lightweight implementations.
Tasks Image Super-Resolution, Super-Resolution
Published 2019-07-07
URL https://arxiv.org/abs/1907.03221v3
PDF https://arxiv.org/pdf/1907.03221v3.pdf
PWC https://paperswithcode.com/paper/fc2n-fully-channel-concatenated-network-for
Repo https://github.com/zxlation/FC2N
Framework tf
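
The weighted channel concatenation (WCC) described above is a small building block: scale each incoming feature map by a learnable weight, concatenate along the channel dimension, and mix with a 1×1 convolution. A minimal sketch (channel sizes are illustrative; the repository itself is TensorFlow, this illustration uses PyTorch):

```python
# Sketch of weighted channel concatenation (WCC): concatenate two feature maps
# after scaling each by a learnable scalar weight, then mix with a 1x1 conv.
import torch
import torch.nn as nn

class WCC(nn.Module):
    def __init__(self, c1, c2, out_channels):
        super().__init__()
        self.w1 = nn.Parameter(torch.ones(1))   # learnable weight for the skip branch
        self.w2 = nn.Parameter(torch.ones(1))   # learnable weight for the current branch
        self.mix = nn.Conv2d(c1 + c2, out_channels, kernel_size=1)

    def forward(self, a, b):
        return self.mix(torch.cat([self.w1 * a, self.w2 * b], dim=1))

a = torch.randn(1, 64, 48, 48)      # e.g. features carried by an interlayer skip
b = torch.randn(1, 64, 48, 48)      # current features
print(WCC(64, 64, 64)(a, b).shape)  # torch.Size([1, 64, 48, 48])
```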

Review Conversational Reading Comprehension

Title Review Conversational Reading Comprehension
Authors Hu Xu, Bing Liu, Lei Shu, Philip S. Yu
Abstract Inspired by conversational reading comprehension (CRC), this paper studies the novel task of leveraging reviews as a source to build an agent that can answer multi-turn questions from potential consumers of online businesses. We first build a review CRC dataset and then propose a novel task-aware pre-tuning step running between language model (e.g., BERT) pre-training and domain-specific fine-tuning. The proposed pre-tuning requires no data annotation, but can greatly enhance performance on our end task. Experimental results show that the proposed approach is highly effective and achieves performance competitive with the supervised approach. The dataset is available at https://github.com/howardhsu/RCRC
Tasks Language Modelling, Machine Reading Comprehension, Reading Comprehension
Published 2019-02-03
URL https://arxiv.org/abs/1902.00821v2
PDF https://arxiv.org/pdf/1902.00821v2.pdf
PWC https://paperswithcode.com/paper/review-conversational-reading-comprehension
Repo https://github.com/howardhsu/RCRC
Framework none

A Neural Grammatical Error Correction System Built On Better Pre-training and Sequential Transfer Learning

Title A Neural Grammatical Error Correction System Built On Better Pre-training and Sequential Transfer Learning
Authors Yo Joong Choe, Jiyeon Ham, Kyubyong Park, Yeoil Yoon
Abstract Grammatical error correction can be viewed as a low-resource sequence-to-sequence task, because publicly available parallel corpora are limited. To tackle this challenge, we first generate erroneous versions of large unannotated corpora using a realistic noising function. The resulting parallel corpora are subsequently used to pre-train Transformer models. Then, by sequentially applying transfer learning, we adapt these models to the domain and style of the test set. Combined with a context-aware neural spellchecker, our system achieves competitive results in both the restricted and low-resource tracks of the ACL 2019 BEA Shared Task. We release all of our code and materials for reproducibility.
Tasks Grammatical Error Correction, Transfer Learning
Published 2019-07-02
URL https://arxiv.org/abs/1907.01256v1
PDF https://arxiv.org/pdf/1907.01256v1.pdf
PWC https://paperswithcode.com/paper/a-neural-grammatical-error-correction-system
Repo https://github.com/kakaobrain/helo_word
Framework pytorch
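
To make the "noising function" idea concrete: clean sentences become the target side of a synthetic parallel corpus, and a noised copy becomes the erroneous source side. The sketch below is a deliberately tiny stand-in (random deletions, a toy confusion set, accidental repetitions); the actual pre-training pipeline uses a far more realistic, error-pattern-aware noiser.

```python
# Sketch of a toy noising function for synthetic GEC pre-training data:
# randomly delete tokens, substitute from a small confusion set, or repeat a
# token. The confusion set and probabilities are illustrative placeholders.
import random

CONFUSIONS = {"their": "there", "then": "than", "a": "an", "is": "are"}

def noise(sentence, p=0.15, seed=None):
    rng = random.Random(seed)
    out = []
    for tok in sentence.split():
        r = rng.random()
        if r < p / 3:
            continue                                  # deletion error
        if tok.lower() in CONFUSIONS and r < 2 * p / 3:
            out.append(CONFUSIONS[tok.lower()])       # confusion-set substitution
            continue
        out.append(tok)
        if r > 1 - p / 3:
            out.append(tok)                           # accidental repetition
    return " ".join(out)

clean = "Publicly available parallel corpora for this task are limited ."
print(noise(clean, seed=0))   # erroneous source side; `clean` is the target side
```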

Tex2Shape: Detailed Full Human Body Geometry From a Single Image

Title Tex2Shape: Detailed Full Human Body Geometry From a Single Image
Authors Thiemo Alldieck, Gerard Pons-Moll, Christian Theobalt, Marcus Magnor
Abstract We present a simple yet effective method to infer detailed full human body shape from only a single photograph. Our model can infer full-body shape, including face, hair, and clothing with wrinkles, at interactive frame rates. Results feature details even on parts that are occluded in the input image. Our main idea is to turn shape regression into an aligned image-to-image translation problem. The input to our method is a partial texture map of the visible region obtained from off-the-shelf methods. From a partial texture, we estimate detailed normal and vector displacement maps, which can be applied to a low-resolution smooth body model to add detail and clothing. Despite being trained purely with synthetic data, our model generalizes well to real-world photographs. Numerous results demonstrate the versatility and robustness of our method.
Tasks Image-to-Image Translation
Published 2019-04-18
URL https://arxiv.org/abs/1904.08645v2
PDF https://arxiv.org/pdf/1904.08645v2.pdf
PWC https://paperswithcode.com/paper/tex2shape-detailed-full-human-body-geometry
Repo https://github.com/thmoa/tex2shape
Framework none

Model Compression with Adversarial Robustness: A Unified Optimization Framework

Title Model Compression with Adversarial Robustness: A Unified Optimization Framework
Authors Shupeng Gui, Haotao Wang, Chen Yu, Haichuan Yang, Zhangyang Wang, Ji Liu
Abstract Deep model compression has been extensively studied, and state-of-the-art methods can now achieve high compression ratios with minimal accuracy loss. This paper studies model compression through a different lens: could we compress models without hurting their robustness to adversarial attacks, in addition to maintaining accuracy? Previous literature suggested that the goals of robustness and compactness might sometimes conflict. We propose a novel Adversarially Trained Model Compression (ATMC) framework. ATMC constructs a unified constrained optimization formulation in which existing compression means (pruning, factorization, quantization) are all integrated into the constraints. An efficient algorithm is then developed. An extensive set of experiments demonstrates that ATMC obtains a remarkably more favorable trade-off among model size, accuracy, and robustness than currently available alternatives in various settings. The code is publicly available at: https://github.com/shupenggui/ATMC.
Tasks Model Compression, Quantization
Published 2019-02-10
URL https://arxiv.org/abs/1902.03538v3
PDF https://arxiv.org/pdf/1902.03538v3.pdf
PWC https://paperswithcode.com/paper/adversarially-trained-model-compression-when
Repo https://github.com/shupenggui/ATMC
Framework pytorch
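
To illustrate the flavor of "compression under an adversarial-training objective", the sketch below combines a standard PGD adversarial training step with a projection that enforces a global sparsity (pruning) budget after each update. This is only a simplified proxy for ATMC's unified constrained formulation: the quantization and factorization constraints and the paper's actual algorithm are omitted, and the model, budget, and attack settings are illustrative.

```python
# Sketch: one adversarial (PGD) training step followed by a projection onto a
# global sparsity budget. A simplified proxy for adversarially trained model
# compression; not the ATMC algorithm itself.
import torch
import torch.nn as nn
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=5):
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        delta.data = (delta + alpha * delta.grad.sign()).clamp(-eps, eps)
        delta.grad.zero_()
    return (x + delta).detach()

def project_to_sparsity(model, keep_ratio=0.3):
    # Keep only the largest-magnitude weights globally; zero out the rest.
    weights = torch.cat([p.detach().abs().flatten() for p in model.parameters()])
    k = int(keep_ratio * weights.numel())
    threshold = torch.topk(weights, k).values.min()
    with torch.no_grad():
        for p in model.parameters():
            p.mul_((p.abs() >= threshold).float())

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU(), nn.Linear(256, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.01)
x, y = torch.rand(16, 3, 32, 32), torch.randint(0, 10, (16,))

x_adv = pgd_attack(model, x, y)                       # adversarial examples
opt.zero_grad()
F.cross_entropy(model(x_adv), y).backward()           # robust training loss
opt.step()
project_to_sparsity(model)                            # enforce the compression constraint
```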

Tensorized Embedding Layers for Efficient Model Compression

Title Tensorized Embedding Layers for Efficient Model Compression
Authors Oleksii Hrinchuk, Valentin Khrulkov, Leyla Mirvakhabova, Elena Orlova, Ivan Oseledets
Abstract The embedding layers transforming input words into real vectors are the key components of deep neural networks used in natural language processing. However, when the vocabulary is large, the corresponding weight matrices can be enormous, which precludes their deployment in a limited resource setting. We introduce a novel way of parametrizing embedding layers based on the Tensor Train (TT) decomposition, which allows compressing the model significantly at the cost of a negligible drop or even a slight gain in performance. We evaluate our method on a wide range of benchmarks in natural language processing and analyze the trade-off between performance and compression ratios for a wide range of architectures, from MLPs to LSTMs and Transformers.
Tasks Language Modelling, Machine Translation, Model Compression, Sentiment Analysis
Published 2019-01-30
URL https://arxiv.org/abs/1901.10787v2
PDF https://arxiv.org/pdf/1901.10787v2.pdf
PWC https://paperswithcode.com/paper/tensorized-embedding-layers-for-efficient
Repo https://github.com/KhrulkovV/tt-pytorch
Framework pytorch
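
The core idea above is replacing a huge vocabulary-by-dimension embedding table with small Tensor-Train cores. The sketch below uses just two cores for clarity: the vocabulary size is factorized as v1 × v2 and the embedding dimension as d1 × d2, with the two cores contracted over a small rank index. The factorization, rank, and sizes are illustrative, not the paper's configuration.

```python
# Sketch of a tensor-train-style factorized embedding with two cores: a V x D
# table is replaced by two small tensors joined over a rank-r index.
# Factorization and rank below are illustrative.
import torch
import torch.nn as nn

class TTEmbedding(nn.Module):
    def __init__(self, v1, v2, d1, d2, rank=8):
        super().__init__()
        self.v1, self.v2, self.d1, self.d2 = v1, v2, d1, d2
        self.core1 = nn.Parameter(torch.randn(v1, d1, rank) * 0.02)   # (v1, d1, r)
        self.core2 = nn.Parameter(torch.randn(v2, rank, d2) * 0.02)   # (v2, r, d2)

    def forward(self, ids):                       # (...,) token ids in [0, v1*v2)
        i1, i2 = ids // self.v2, ids % self.v2
        # Contract the two selected core slices over the rank index.
        emb = torch.einsum("...ar,...rb->...ab", self.core1[i1], self.core2[i2])
        return emb.reshape(*ids.shape, self.d1 * self.d2)

# 250k-word vocabulary with 512-dim embeddings in ~0.2M parameters
# (a full table would need 250,000 x 512 = 128M parameters).
emb = TTEmbedding(v1=500, v2=500, d1=16, d2=32, rank=8)
print(emb(torch.randint(0, 250_000, (4, 10))).shape)   # torch.Size([4, 10, 512])
```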

Multi-source Domain Adaptation for Semantic Segmentation

Title Multi-source Domain Adaptation for Semantic Segmentation
Authors Sicheng Zhao, Bo Li, Xiangyu Yue, Yang Gu, Pengfei Xu, Runbo Hu, Hua Chai, Kurt Keutzer
Abstract Simulation-to-real domain adaptation for semantic segmentation has been actively studied for various applications such as autonomous driving. Existing methods mainly focus on a single-source setting, which cannot easily handle the more practical scenario of multiple sources with different distributions. In this paper, we propose to investigate multi-source domain adaptation for semantic segmentation. Specifically, we design a novel framework, termed Multi-source Adversarial Domain Aggregation Network (MADAN), which can be trained in an end-to-end manner. First, we generate an adapted domain for each source with dynamic semantic consistency while aligning at the pixel level cycle-consistently towards the target. Second, we propose a sub-domain aggregation discriminator and a cross-domain cycle discriminator to make the different adapted domains more closely aggregated. Finally, feature-level alignment is performed between the aggregated domain and the target domain while training the segmentation network. Extensive experiments from the synthetic GTA and SYNTHIA datasets to the real Cityscapes and BDDS datasets demonstrate that the proposed MADAN model outperforms state-of-the-art approaches. Our source code is released at: https://github.com/Luodian/MADAN.
Tasks Autonomous Driving, Domain Adaptation, Semantic Segmentation
Published 2019-10-27
URL https://arxiv.org/abs/1910.12181v1
PDF https://arxiv.org/pdf/1910.12181v1.pdf
PWC https://paperswithcode.com/paper/multi-source-domain-adaptation-for-semantic
Repo https://github.com/Luodian/MADAN
Framework pytorch