February 1, 2020

3341 words 16 mins read

Paper Group AWR 236


Deep Network Classification by Scattering and Homotopy Dictionary Learning. Multi-source Distilling Domain Adaptation. Forget Me Not: Reducing Catastrophic Forgetting for Domain Adaptation in Reading Comprehension. Stokes Inversion based on Convolutional Neural Networks. When Low Resource NLP Meets Unsupervised Language Model: Meta-pretraining Then …

Deep Network Classification by Scattering and Homotopy Dictionary Learning

Title Deep Network Classification by Scattering and Homotopy Dictionary Learning
Authors John Zarka, Louis Thiry, Tomás Angles, Stéphane Mallat
Abstract We introduce a sparse scattering deep convolutional neural network, which provides a simple model to analyze properties of deep representation learning for classification. Learning a single dictionary matrix with a classifier yields a higher classification accuracy than AlexNet over the ImageNet 2012 dataset. The network first applies a scattering transform that linearizes variabilities due to geometric transformations such as translations and small deformations. A sparse $\ell^1$ dictionary coding reduces intra-class variability while preserving class separation through projections over unions of linear spaces. It is implemented in a deep convolutional network with a homotopy algorithm having exponential convergence. A convergence proof is given in a general framework that includes ALISTA. Classification results are analyzed on ImageNet.
Tasks Dictionary Learning, Representation Learning
Published 2019-10-08
URL https://arxiv.org/abs/1910.03561v3
PDF https://arxiv.org/pdf/1910.03561v3.pdf
PWC https://paperswithcode.com/paper/deep-network-classification-by-scattering-and
Repo https://github.com/j-zarka/SparseScatNet
Framework pytorch
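
The $\ell^1$ sparse coding step above is solved with an ISTA-style homotopy iteration whose threshold decays over iterations. Below is a minimal sketch of that idea, assuming a fixed dictionary D and a simple geometric threshold decay; it is not the authors' exact SparseScatNet implementation (see the linked repo for that).

```python
# Illustrative ISTA iteration with a decaying (homotopy-style) threshold.
# The step size, decay rate, and iteration count are assumptions.
import torch

def soft_threshold(z, lam):
    return torch.sign(z) * torch.clamp(z.abs() - lam, min=0.0)

def homotopy_ista(x, D, n_iters=12, lam0=0.5, decay=0.7):
    # x: (batch, m) scattering features; D: (m, k) dictionary (columns ~ atoms)
    alpha = torch.zeros(x.shape[0], D.shape[1])
    step = 1.0 / torch.linalg.matrix_norm(D, ord=2) ** 2  # 1/L, L = ||D||_2^2
    lam = lam0
    for _ in range(n_iters):
        residual = x - alpha @ D.T
        alpha = soft_threshold(alpha + step * residual @ D, step * lam)
        lam *= decay  # exponentially shrinking threshold: the "homotopy" path
    return alpha
```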

Multi-source Distilling Domain Adaptation

Title Multi-source Distilling Domain Adaptation
Authors Sicheng Zhao, Guangzhi Wang, Shanghang Zhang, Yang Gu, Yaxian Li, Zhichao Song, Pengfei Xu, Runbo Hu, Hua Chai, Kurt Keutzer
Abstract Deep neural networks suffer from performance decay when there is domain shift between the labeled source domain and unlabeled target domain, which motivates the research on domain adaptation (DA). Conventional DA methods usually assume that the labeled data is sampled from a single source distribution. However, in practice, labeled data may be collected from multiple sources, while naive application of the single-source DA algorithms may lead to suboptimal solutions. In this paper, we propose a novel multi-source distilling domain adaptation (MDDA) network, which not only considers the different distances among multiple sources and the target, but also investigates the different similarities of the source samples to the target ones. Specifically, the proposed MDDA includes four stages: (1) pre-train the source classifiers separately using the training data from each source; (2) adversarially map the target into the feature space of each source respectively by minimizing the empirical Wasserstein distance between source and target; (3) select the source training samples that are closer to the target to fine-tune the source classifiers; and (4) classify each encoded target feature by the corresponding source classifier, and aggregate the different predictions using the respective domain weights, which correspond to the discrepancy between each source and the target. Extensive experiments are conducted on public DA benchmarks, and the results demonstrate that the proposed MDDA significantly outperforms the state-of-the-art approaches. Our source code is released at: https://github.com/daoyuan98/MDDA.
Tasks Domain Adaptation
Published 2019-11-22
URL https://arxiv.org/abs/1911.11554v2
PDF https://arxiv.org/pdf/1911.11554v2.pdf
PWC https://paperswithcode.com/paper/multi-source-distilling-domain-adaptation
Repo https://github.com/daoyuan98/MDDA
Framework tf
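
Stage (4) of MDDA weights each source classifier's prediction by how close that source is to the target. A minimal sketch of such discrepancy-based aggregation follows; the softmax-over-negative-distance weighting is an illustrative assumption, not necessarily the paper's exact rule.

```python
# Aggregate per-source predictions with weights derived from each source's
# estimated discrepancy to the target domain.
import numpy as np

def aggregate_predictions(per_source_probs, wasserstein_dists, temperature=1.0):
    # per_source_probs: (n_sources, n_samples, n_classes)
    # wasserstein_dists: (n_sources,) estimated W-distance source_i -> target
    d = np.asarray(wasserstein_dists) / temperature
    w = np.exp(-d) / np.exp(-d).sum()       # closer sources get larger weight
    return np.tensordot(w, per_source_probs, axes=1)  # (n_samples, n_classes)
```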

Forget Me Not: Reducing Catastrophic Forgetting for Domain Adaptation in Reading Comprehension

Title Forget Me Not: Reducing Catastrophic Forgetting for Domain Adaptation in Reading Comprehension
Authors Y. Xu, X. Zhong, A. J. J. Yepes, J. H. Lau
Abstract The creation of large-scale open domain reading comprehension data sets in recent years has enabled the development of end-to-end neural comprehension models with promising results. To use these models for domains with limited training data, one of the most effective approaches is to first pretrain them on large out-of-domain source data and then fine-tune them with the limited target data. The caveat is that, after fine-tuning, the comprehension models tend to perform poorly in the source domain, a phenomenon known as catastrophic forgetting. In this paper, we explore methods that overcome catastrophic forgetting during fine-tuning without assuming access to data from the source domain. We introduce new auxiliary penalty terms and observe the best performance when a combination of auxiliary penalty terms is used to regularise the fine-tuning process for adapting comprehension models. To test our methods, we develop and release 6 narrow domain data sets that could potentially be used as reading comprehension benchmarks.
Tasks Domain Adaptation, Reading Comprehension
Published 2019-11-01
URL https://arxiv.org/abs/1911.00202v1
PDF https://arxiv.org/pdf/1911.00202v1.pdf
PWC https://paperswithcode.com/paper/forget-me-not-reducing-catastrophic
Repo https://github.com/ibm-aur-nlp/domain-specific-QA
Framework none
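
One simple instance of the auxiliary penalties explored above is an L2 term anchoring the fine-tuned weights to their pretrained values, which requires no source-domain data. A minimal sketch, assuming a plain PyTorch model and a saved copy of the pretrained state; the paper combines several such penalties, and this shows only the simplest one.

```python
# L2 penalty pulling fine-tuned weights back toward their pretrained values.
import torch

def l2_anchor_penalty(model, pretrained_state, coeff=0.01):
    penalty = 0.0
    for name, param in model.named_parameters():
        anchor = pretrained_state[name].detach()
        penalty = penalty + (param - anchor).pow(2).sum()
    return coeff * penalty

# During fine-tuning on the target domain:
#   loss = task_loss + l2_anchor_penalty(model, pretrained_state)
```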

Stokes Inversion based on Convolutional Neural Networks

Title Stokes Inversion based on Convolutional Neural Networks
Authors A. Asensio Ramos, C. Diaz Baso
Abstract Spectropolarimetric inversions are routinely used in the field of Solar Physics for the extraction of physical information from observations. The application to two-dimensional fields of view often requires the use of supercomputers with parallelized inversion codes. Even in this case, the computing time spent on the process is still very large. Our aim is to develop a new inversion code based on the application of convolutional neural networks that can quickly provide a three-dimensional cube of thermodynamical and magnetic properties from the interpretation of two-dimensional maps of Stokes profiles. We train two different architectures of fully convolutional neural networks. To this end, we use the synthetic Stokes profiles obtained from two snapshots of three-dimensional magneto-hydrodynamic numerical simulations of different structures of the solar atmosphere. We provide an extensive analysis of the new inversion technique, showing that it infers the thermodynamical and magnetic properties with a precision comparable to that of standard inversion techniques. However, it provides several key improvements: our method is around one million times faster, it returns a three-dimensional view of the physical properties of the region of interest in geometrical height, it provides quantities that cannot be obtained otherwise (pressure and Wilson depression) and the inferred properties are decontaminated from the blurring effect of instrumental point spread functions for free. The code is provided for free on a specific repository, with options for training and evaluation.
Tasks
Published 2019-04-07
URL https://arxiv.org/abs/1904.03714v2
PDF https://arxiv.org/pdf/1904.03714v2.pdf
PWC https://paperswithcode.com/paper/stokes-inversion-based-on-convolutional
Repo https://github.com/aasensio/sicon
Framework pytorch
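
As a rough illustration of the fully convolutional mapping described above, the sketch below takes per-pixel Stokes profiles as input channels and emits a stack of physical quantities per pixel. The channel counts and depth are assumptions for illustration, not the paper's architectures.

```python
# A toy fully convolutional network mapping Stokes-profile maps to a cube of
# physical parameters; all sizes below are illustrative assumptions.
import torch.nn as nn

n_stokes_channels = 4 * 112   # 4 Stokes parameters x assumed 112 wavelengths
n_outputs = 7 * 20            # assumed 7 physical quantities x 20 heights

model = nn.Sequential(
    nn.Conv2d(n_stokes_channels, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(256, n_outputs, kernel_size=1),  # per-pixel physical parameters
)
# Input:  (batch, 4*112, H, W) maps of Stokes profiles
# Output: (batch, 7*20,  H, W) -> reshaped to a 3D cube per pixel column
```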

When Low Resource NLP Meets Unsupervised Language Model: Meta-pretraining Then Meta-learning for Few-shot Text Classification

Title When Low Resource NLP Meets Unsupervised Language Model: Meta-pretraining Then Meta-learning for Few-shot Text Classification
Authors Shumin Deng, Ningyu Zhang, Zhanlin Sun, Jiaoyan Chen, Huajun Chen
Abstract Text classification tends to be difficult when data are deficient or when it is required to adapt to unseen classes. In such challenging scenarios, recent studies have often used meta-learning to simulate the few-shot task, thus neglecting implicit common linguistic features across tasks. This paper addresses such problems using meta-learning and unsupervised language models. Our approach is based on the insight that having a good generalization from a few examples relies on both a generic model initialization and an effective strategy for adapting this model to newly arising tasks. We show that our approach is not only simple but also produces state-of-the-art performance on a well-studied sentiment classification dataset. It can thus be further suggested that pretraining could be a promising solution for few-shot learning of many other NLP tasks. The code and the dataset to replicate the experiments are made available at https://github.com/zxlzr/FewShotNLP.
Tasks Few-Shot Learning, Language Modelling, Meta-Learning, Sentiment Analysis, Text Classification
Published 2019-08-22
URL https://arxiv.org/abs/1908.08788v2
PDF https://arxiv.org/pdf/1908.08788v2.pdf
PWC https://paperswithcode.com/paper/improving-few-shot-text-classification-via
Repo https://github.com/zxlzr/FewShotNLP
Framework pytorch
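
The abstract's two ingredients (a generic initialization plus a strategy for adapting it to new tasks) are the core of MAML-style meta-learning. A bare-bones first-order sketch, with task sampling, the model, and the loss left as placeholders; this is a generic illustration, not the authors' exact training loop.

```python
# First-order MAML-style meta-training step (requires torch >= 2.0 for
# torch.func.functional_call). Each task is ((x_sup, y_sup), (x_qry, y_qry)).
import torch

def meta_train_step(model, tasks, loss_fn, inner_lr=1e-2, meta_opt=None):
    meta_opt.zero_grad()
    for support, query in tasks:                    # a batch of few-shot tasks
        fast = {n: p.clone() for n, p in model.named_parameters()}
        # one inner-loop adaptation step on the support set
        loss = loss_fn(torch.func.functional_call(model, fast, support[0]),
                       support[1])
        grads = torch.autograd.grad(loss, list(fast.values()))
        fast = {n: p - inner_lr * g for (n, p), g in zip(fast.items(), grads)}
        # outer loss on the query set backpropagates into the initialization
        loss_fn(torch.func.functional_call(model, fast, query[0]),
                query[1]).backward()
    meta_opt.step()
```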

EfficientDet: Scalable and Efficient Object Detection

Title EfficientDet: Scalable and Efficient Object Detection
Authors Mingxing Tan, Ruoming Pang, Quoc V. Le
Abstract Model efficiency has become increasingly important in computer vision. In this paper, we systematically study neural network architecture design choices for object detection and propose several key optimizations to improve efficiency. First, we propose a weighted bi-directional feature pyramid network (BiFPN), which allows easy and fast multi-scale feature fusion; Second, we propose a compound scaling method that uniformly scales the resolution, depth, and width for all backbone, feature network, and box/class prediction networks at the same time. Based on these optimizations and EfficientNet backbones, we have developed a new family of object detectors, called EfficientDet, which consistently achieve much better efficiency than prior art across a wide spectrum of resource constraints. In particular, with single-model and single-scale, our EfficientDet-D7 achieves state-of-the-art 52.2 AP on COCO test-dev with 52M parameters and 325B FLOPs, being 4x - 9x smaller and using 13x - 42x fewer FLOPs than previous detectors. Code is available at https://github.com/google/automl/tree/master/efficientdet.
Tasks AutoML, Object Detection
Published 2019-11-20
URL https://arxiv.org/abs/1911.09070v3
PDF https://arxiv.org/pdf/1911.09070v3.pdf
PWC https://paperswithcode.com/paper/efficientdet-scalable-and-efficient-object
Repo https://github.com/google/automl/tree/master/efficientdet
Framework tf
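
The BiFPN's weighted feature fusion can be written as a "fast normalized fusion": learnable non-negative weights normalized by their sum rather than a softmax. A standalone sketch, assuming the input feature maps already share one resolution and width:

```python
# Fast normalized fusion: O = sum_i (w_i * I_i) / (eps + sum_j w_j), w = ReLU(w).
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    def __init__(self, n_inputs, eps=1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))
        self.eps = eps

    def forward(self, feats):                 # list of same-shape feature maps
        w = torch.relu(self.w)                # keep weights non-negative
        w = w / (w.sum() + self.eps)          # normalize without a softmax
        return sum(wi * f for wi, f in zip(w, feats))
```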

Submodular Maximization Beyond Non-negativity: Guarantees, Fast Algorithms, and Applications

Title Submodular Maximization Beyond Non-negativity: Guarantees, Fast Algorithms, and Applications
Authors Christopher Harshaw, Moran Feldman, Justin Ward, Amin Karbasi
Abstract It is generally believed that submodular functions – and the more general class of $\gamma$-weakly submodular functions – may only be optimized under the non-negativity assumption $f(S) \geq 0$. In this paper, we show that once the function is expressed as the difference $f = g - c$, where $g$ is monotone, non-negative, and $\gamma$-weakly submodular and $c$ is non-negative modular, then strong approximation guarantees may be obtained. We present an algorithm for maximizing $g - c$ under a $k$-cardinality constraint which produces a random feasible set $S$ such that $\mathbb{E} \left[ g(S) - c(S) \right] \geq (1 - e^{-\gamma} - \epsilon) g(OPT) - c(OPT)$, whose running time is $O (\frac{n}{\epsilon} \log^2 \frac{1}{\epsilon})$, i.e., independent of $k$. We extend these results to the unconstrained setting by describing an algorithm with the same approximation guarantees and faster $O(\frac{n}{\epsilon} \log\frac{1}{\epsilon})$ runtime. The main techniques underlying our algorithms are two-fold: the use of a surrogate objective which varies the relative importance between $g$ and $c$ throughout the algorithm, and a geometric sweep over possible $\gamma$ values. Our algorithmic guarantees are complemented by a hardness result showing that no polynomial-time algorithm which accesses $g$ through a value oracle can do better. We empirically demonstrate the success of our algorithms by applying them to experimental design on the Boston Housing dataset and directed vertex cover on the Email EU dataset.
Tasks
Published 2019-04-19
URL http://arxiv.org/abs/1904.09354v1
PDF http://arxiv.org/pdf/1904.09354v1.pdf
PWC https://paperswithcode.com/paper/190409354
Repo https://github.com/crharshaw/submodular-minus-linear
Framework none
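
The guarantee above is achieved by a "distorted greedy" rule that discounts the marginal gain of $g$ by a factor growing over iterations, traded off against the modular cost $c$. A sketch for the cardinality-constrained case with $\gamma = 1$, with the value oracles $g$ and $c$ as placeholder callables:

```python
# Distorted greedy under a k-cardinality constraint (gamma = 1 case).
def distorted_greedy(g, c, ground_set, k):
    # g: set -> float (monotone submodular); c: element -> float (modular cost)
    # ground_set: a Python set of candidate elements
    S = set()
    for i in range(k):
        factor = (1 - 1.0 / k) ** (k - (i + 1))   # distortion grows with i

        def gain(e):
            return factor * (g(S | {e}) - g(S)) - c(e)

        best = max((e for e in ground_set - S), key=gain, default=None)
        if best is not None and gain(best) > 0:   # only add profitable elements
            S.add(best)
    return S
```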

Dual Generator Generative Adversarial Networks for Multi-Domain Image-to-Image Translation

Title Dual Generator Generative Adversarial Networks for Multi-Domain Image-to-Image Translation
Authors Hao Tang, Dan Xu, Wei Wang, Yan Yan, Nicu Sebe
Abstract State-of-the-art methods for image-to-image translation with Generative Adversarial Networks (GANs) can learn a mapping from one domain to another domain using unpaired image data. However, these methods require the training of one specific model for every pair of image domains, which limits the scalability in dealing with more than two image domains. In addition, the training stage of these methods has the common problem of mode collapse, which degrades the quality of the generated images. To tackle these issues, we propose a Dual Generator Generative Adversarial Network (G$^2$GAN), a robust and scalable approach that performs unpaired image-to-image translation for multiple domains using only dual generators within a single model. Moreover, we explore different optimization losses for better training of G$^2$GAN, and thus achieve unpaired image-to-image translation with higher consistency and better stability. Extensive experiments on six publicly available datasets with different scenarios, i.e., architectural buildings, seasons, landscapes and human faces, demonstrate that the proposed G$^2$GAN achieves superior model capacity and better generation performance compared with existing image-to-image translation GAN models.
Tasks Image-to-Image Translation
Published 2019-01-14
URL http://arxiv.org/abs/1901.04604v1
PDF http://arxiv.org/pdf/1901.04604v1.pdf
PWC https://paperswithcode.com/paper/dual-generator-generative-adversarial
Repo https://github.com/Ha0Tang/AsymmetricGAN
Framework pytorch
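
The dual-generator idea pairs a translation generator, conditioned on a target-domain label, with a reconstruction generator mapping back to the source domain, so one model covers all domain pairs. A schematic of the resulting cycle-consistency term, with both generators as placeholders; the adversarial and other losses from the paper are omitted.

```python
# Cycle-consistency term for a dual-generator, label-conditioned translation.
import torch.nn.functional as F

def g2gan_cycle_loss(G_translate, G_reconstruct, x, src_label, tgt_label):
    fake = G_translate(x, tgt_label)          # x translated into target domain
    recon = G_reconstruct(fake, src_label)    # mapped back to the source domain
    return F.l1_loss(recon, x)                # penalize reconstruction error
```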

Two-stage Optimization for Machine Learning Workflow

Title Two-stage Optimization for Machine Learning Workflow
Authors Alexandre Quemy
Abstract Machine learning techniques play a preponderant role in dealing with massive amounts of data and are employed in almost every possible domain. Building a high-quality machine learning model to be deployed in production is a challenging task for both subject-matter experts and machine learning practitioners. For broader adoption and scalability of machine learning systems, the construction and configuration of machine learning workflows need to become more automated. In the last few years, several techniques have been developed in this direction, known as AutoML. In this paper, we present a two-stage optimization process to build data pipelines and configure machine learning algorithms. First, we study the impact of data pipelines compared to algorithm configuration in order to show the importance of data preprocessing over hyperparameter tuning. The second part presents policies to efficiently allocate search time between data pipeline construction and algorithm configuration. These policies are agnostic to the meta-optimizer. Last, we present a metric to determine whether a data pipeline is specific to or independent of the algorithm, enabling fine-grained pipeline pruning and meta-learning for the cold-start problem.
Tasks AutoML, Meta-Learning
Published 2019-07-01
URL https://arxiv.org/abs/1907.00678v1
PDF https://arxiv.org/pdf/1907.00678v1.pdf
PWC https://paperswithcode.com/paper/two-stage-optimization-for-machine-learning
Repo https://github.com/aquemy/DPSO_experiments
Framework none
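
The simplest conceivable allocation policy splits the search budget between the two stages at a fixed fraction. The sketch below only illustrates that interface; the 0.6 split and the two optimizer callbacks are assumptions, not the paper's tuned policies.

```python
# Fixed-fraction time allocation between pipeline search and algorithm config.
import time

def two_stage_search(optimize_pipeline, optimize_config, budget_s,
                     pipeline_frac=0.6):
    # optimize_pipeline(deadline) -> best pipeline found before the deadline
    # optimize_config(pipeline, deadline) -> best algorithm configuration
    t0 = time.time()
    pipeline = optimize_pipeline(deadline=t0 + pipeline_frac * budget_s)
    config = optimize_config(pipeline, deadline=t0 + budget_s)
    return pipeline, config
```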

Saccader: Improving Accuracy of Hard Attention Models for Vision

Title Saccader: Improving Accuracy of Hard Attention Models for Vision
Authors Gamaleldin F. Elsayed, Simon Kornblith, Quoc V. Le
Abstract Although deep convolutional neural networks achieve state-of-the-art performance across nearly all image classification tasks, their decisions are difficult to interpret. One approach that offers some level of interpretability by design is hard attention, which uses only relevant portions of the image. However, training hard attention models with only class label supervision is challenging, and hard attention has proved difficult to scale to complex datasets. Here, we propose a novel hard attention model, which we term Saccader. Key to Saccader is a pretraining step that requires only class labels and provides initial attention locations for policy gradient optimization. Our best models narrow the gap to common ImageNet baselines, achieving 75% top-1 and 91% top-5 accuracy while attending to less than one-third of the image.
Tasks Image Classification
Published 2019-08-20
URL https://arxiv.org/abs/1908.07644v3
PDF https://arxiv.org/pdf/1908.07644v3.pdf
PWC https://paperswithcode.com/paper/190807644
Repo https://github.com/parsatorb/PyTorch-Saccader
Framework pytorch
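
After pretraining supplies initial attention locations, the policy is refined with policy gradients. A minimal REINFORCE step consistent with that description, where the glimpse classifier is a stand-in and the mean-reward baseline is a common simplification, not necessarily the paper's exact estimator:

```python
# One REINFORCE step: sample attention locations, reward correct classification.
import torch

def reinforce_step(policy_logits, classify_at, images, labels):
    # policy_logits: (batch, n_locations) from the pretrained attention network
    # classify_at(images, locs) -> (batch, n_classes) logits for those glimpses
    dist = torch.distributions.Categorical(logits=policy_logits)
    locs = dist.sample()
    reward = (classify_at(images, locs).argmax(1) == labels).float()  # 0/1
    advantage = reward - reward.mean()        # mean baseline reduces variance
    return -(advantage * dist.log_prob(locs)).mean()  # policy-gradient loss
```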

Optimizing CNN-based Hyperspectral Image Classification on FPGAs

Title Optimizing CNN-based Hyperspectral Image Classification on FPGAs
Authors Shuanglong Liu, Ringo S. W. Chu, Xiwei Wang, Wayne Luk
Abstract Hyperspectral image (HSI) classification has been widely adopted in applications involving remote sensing imagery analysis which require high classification accuracy and real-time processing speed. Methods based on convolutional neural networks (CNNs) have been proven to achieve state-of-the-art accuracy in classifying HSIs. However, CNN models are often too computationally intensive to achieve real-time response due to the high dimensional nature of HSI, compared to traditional methods such as Support Vector Machines (SVMs). Besides, previous CNN models used in HSI are not specially designed for efficient implementation on embedded devices such as FPGAs. This paper proposes a novel CNN-based algorithm for HSI classification which takes into account hardware efficiency. A customized architecture which enables the proposed algorithm to be mapped effectively onto FPGA resources is then proposed to support real-time on-board classification with low power consumption. Implementation results show that our proposed accelerator on a Xilinx Zynq 706 FPGA board runs more than 70x faster than an Intel 8-core Xeon CPU and 3x faster than an NVIDIA GeForce 1080 GPU. Compared to previous SVM-based FPGA accelerators, we achieve comparable processing speed but provide a much higher classification accuracy.
Tasks Hyperspectral Image Classification
Published 2019-06-27
URL https://arxiv.org/abs/1906.11834v1
PDF https://arxiv.org/pdf/1906.11834v1.pdf
PWC https://paperswithcode.com/paper/optimizing-cnn-based-hyperspectral
Repo https://github.com/custom-computing-ic/CNN-Based-Hyperspectral-Image-Classification
Framework tf

WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset

Title WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset
Authors Jibril Frej, Didier Schwab, Jean-Pierre Chevallet
Abstract Over the past years, deep learning methods allowed for new state-of-the-art results in ad-hoc information retrieval. However such methods usually require large amounts of annotated data to be effective. Since most standard ad-hoc information retrieval datasets publicly available for academic research (e.g. Robust04, ClueWeb09) have at most 250 annotated queries, the recent deep learning models for information retrieval perform poorly on these datasets. These models (e.g. DUET, Conv-KNRM) are trained and evaluated on data collected from commercial search engines not publicly available for academic research which is a problem for reproducibility and the advancement of research. In this paper, we propose WIKIR: an open-source toolkit to automatically build large-scale English information retrieval datasets based on Wikipedia. WIKIR is publicly available on GitHub. We also provide wikIR78k and wikIRS78k: two large-scale publicly available datasets that both contain 78,628 queries and 3,060,191 (query, relevant documents) pairs.
Tasks Ad-Hoc Information Retrieval, Information Retrieval
Published 2019-12-04
URL https://arxiv.org/abs/1912.01901v4
PDF https://arxiv.org/pdf/1912.01901v4.pdf
PWC https://paperswithcode.com/paper/wikir-a-python-toolkit-for-building-a-large
Repo https://github.com/getalp/wikIR
Framework none
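
For intuition, automatic labeling of the kind WIKIR performs can be sketched as deriving a query from each article and treating part of the article body as a relevant document. The construction below is illustrative only and is not WIKIR's exact procedure or API.

```python
# Toy construction of (query, relevant document) pairs from Wikipedia articles:
# title as the query, body minus its opening sentence as a relevant document.
def build_pairs(articles):
    # articles: iterable of dicts like {"title": str, "body": str}
    pairs = []
    for art in articles:
        first_sentence, _, rest = art["body"].partition(". ")
        if rest:                      # keep only articles with a usable body
            pairs.append((art["title"].lower(), rest))
    return pairs
```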

Video Object Segmentation using Space-Time Memory Networks

Title Video Object Segmentation using Space-Time Memory Networks
Authors Seoung Wug Oh, Joon-Young Lee, Ning Xu, Seon Joo Kim
Abstract We propose a novel solution for semi-supervised video object segmentation. By the nature of the problem, available cues (e.g. video frame(s) with object masks) become richer with the intermediate predictions. However, the existing methods are unable to fully exploit this rich source of information. We resolve the issue by leveraging memory networks and learning to read relevant information from all available sources. In our framework, the past frames with object masks form an external memory, and the current frame as the query is segmented using the mask information in the memory. Specifically, the query and the memory are densely matched in the feature space, covering all the space-time pixel locations in a feed-forward fashion. In contrast to previous approaches, the abundant use of the guidance information allows us to better handle challenges such as appearance changes and occlusions. We validate our method on the latest benchmark sets and achieve state-of-the-art performance (overall score of 79.4 on the YouTube-VOS val set, J of 88.7 and 79.2 on the DAVIS 2016/2017 val sets respectively) while having a fast runtime (0.16 second/frame on the DAVIS 2016 val set).
Tasks Semantic Segmentation, Semi-supervised Video Object Segmentation, Video Object Segmentation, Video Semantic Segmentation
Published 2019-04-01
URL https://arxiv.org/abs/1904.00607v2
PDF https://arxiv.org/pdf/1904.00607v2.pdf
PWC https://paperswithcode.com/paper/video-object-segmentation-using-space-time
Repo https://github.com/seoungwugoh/STM
Framework pytorch
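
The dense space-time matching described above is a key-value attention read: every query-frame location attends over all memory locations via key similarity and retrieves a weighted sum of memory values. A minimal sketch with illustrative tensor shapes:

```python
# Space-time memory read: dense query-to-memory matching via softmax attention.
import torch

def memory_read(q_key, m_key, m_val):
    # q_key: (B, Ck, H*W) query keys; m_key: (B, Ck, T*H*W) memory keys
    # m_val: (B, Cv, T*H*W) memory values (T frames in memory)
    sim = torch.einsum('bck,bcm->bkm', q_key, m_key)   # pairwise similarities
    attn = torch.softmax(sim, dim=-1)                  # over all memory pixels
    return torch.einsum('bkm,bcm->bck', attn, m_val)   # (B, Cv, H*W) read-out
```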

BubbleNets: Learning to Select the Guidance Frame in Video Object Segmentation by Deep Sorting Frames

Title BubbleNets: Learning to Select the Guidance Frame in Video Object Segmentation by Deep Sorting Frames
Authors Brent A. Griffin, Jason J. Corso
Abstract Semi-supervised video object segmentation has made significant progress on real and challenging videos in recent years. The current paradigm for segmentation methods and benchmark datasets is to segment objects in video provided a single annotation in the first frame. However, we find that segmentation performance across the entire video varies dramatically when selecting an alternative frame for annotation. This paper addresses the problem of learning to suggest the single best frame across the video for user annotation; this is, in fact, never the first frame of the video. We achieve this by introducing BubbleNets, a novel deep sorting network that learns to select frames using a performance-based loss function that enables the conversion of expansive amounts of training examples from already existing datasets. Using BubbleNets, we are able to achieve an 11% relative improvement in segmentation performance on the DAVIS benchmark without any changes to the underlying method of segmentation.
Tasks Semantic Segmentation, Semi-supervised Video Object Segmentation, Video Object Segmentation, Video Semantic Segmentation
Published 2019-03-28
URL http://arxiv.org/abs/1903.11779v1
PDF http://arxiv.org/pdf/1903.11779v1.pdf
PWC https://paperswithcode.com/paper/bubblenets-learning-to-select-the-guidance
Repo https://github.com/griffbr/BubbleNets
Framework tf
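
BubbleNets' "deep sorting" reduces frame selection to learned pairwise comparisons driven through a bubble sort. The skeleton below makes that explicit, with `prefers` standing in for the trained comparison network; it is a sketch of the idea, not the authors' implementation.

```python
# Bubble sort with a learned comparator; the predicted-best frame ends up last.
def select_annotation_frame(frames, prefers):
    # prefers(a, b) -> True if frame `a` is predicted better than frame `b`
    order = list(frames)
    for i in range(len(order) - 1):
        for j in range(len(order) - 1 - i):
            if prefers(order[j], order[j + 1]):
                order[j], order[j + 1] = order[j + 1], order[j]  # swap upward
    return order[-1]   # predicted-best annotation frame
```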

Dual Attention MobDenseNet(DAMDNet) for Robust 3D Face Alignment

Title Dual Attention MobDenseNet(DAMDNet) for Robust 3D Face Alignment
Authors Lei Jiang, Xiao-Jun Wu, Josef Kittler
Abstract 3D face alignment of monocular images is a crucial process in the recognition of faces with disguise. 3D face reconstruction facilitated by alignment can restore the face structure, which is helpful in detecting disguise interference. This paper proposes a dual attention mechanism and an efficient end-to-end 3D face alignment framework. We build a stable network model through depthwise separable convolution, densely connected convolution and a lightweight channel attention mechanism. In order to enhance the ability of the network model to extract the spatial features of the face region, we adopt a spatial group-wise feature enhancement module to improve the representation ability of the network. Different loss functions are applied jointly to constrain the 3D parameters of a 3D Morphable Model (3DMM) and its 3D vertices. We use a variety of data enhancement methods and generate large virtual-pose face data sets to solve the data imbalance problem. Experiments on the challenging AFLW and AFLW2000-3D datasets show that our algorithm significantly improves the accuracy of 3D face alignment. Our experiments using the DFW dataset show that DAMDNet exhibits excellent performance in the 3D alignment and reconstruction of challenging disguised faces. The model parameters and the complexity of the proposed method are also reduced significantly. The code is publicly available at https://github.com/LeiJiangJNU/DAMDNet
Tasks 3D Face Reconstruction, Face Alignment, Face Reconstruction
Published 2019-08-30
URL https://arxiv.org/abs/1908.11821v1
PDF https://arxiv.org/pdf/1908.11821v1.pdf
PWC https://paperswithcode.com/paper/dual-attention-mobdensenetdamdnet-for-robust
Repo https://github.com/LeiJiangJNU/DAMDNet
Framework pytorch
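
The depthwise separable convolutions the network is built from factor a standard convolution into a per-channel spatial convolution followed by a 1x1 pointwise convolution. A typical PyTorch block of that kind is sketched below; channel sizes are arbitrary and the paper's attention modules are omitted.

```python
# Depthwise separable convolution: per-channel 3x3 conv, then 1x1 channel mixing.
import torch.nn as nn

def depthwise_separable(in_ch, out_ch, stride=1):
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1, groups=in_ch),
        nn.BatchNorm2d(in_ch), nn.ReLU6(inplace=True),   # depthwise stage
        nn.Conv2d(in_ch, out_ch, 1),                     # pointwise mixing
        nn.BatchNorm2d(out_ch), nn.ReLU6(inplace=True),
    )
```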