February 1, 2020

3341 words 16 mins read

Paper Group AWR 236


Deep Network Classification by Scattering and Homotopy Dictionary Learning. Multi-source Distilling Domain Adaptation. Forget Me Not: Reducing Catastrophic Forgetting for Domain Adaptation in Reading Comprehension. Stokes Inversion based on Convolutional Neural Networks. When Low Resource NLP Meets Unsupervised Language Model: Meta-pretraining Then …

Deep Network Classification by Scattering and Homotopy Dictionary Learning

Title Deep Network Classification by Scattering and Homotopy Dictionary Learning
Authors John Zarka, Louis Thiry, Tomás Angles, Stéphane Mallat
Abstract We introduce a sparse scattering deep convolutional neural network, which provides a simple model to analyze properties of deep representation learning for classification. Learning a single dictionary matrix with a classifier yields a higher classification accuracy than AlexNet over the ImageNet 2012 dataset. The network first applies a scattering transform that linearizes variabilities due to geometric transformations such as translations and small deformations. A sparse $\ell^1$ dictionary coding reduces intra-class variability while preserving class separation through projections over unions of linear spaces. It is implemented in a deep convolutional network with a homotopy algorithm having exponential convergence. A convergence proof is given in a general framework that includes ALISTA. Classification results are analyzed on ImageNet.
Tasks Dictionary Learning, Representation Learning
Published 2019-10-08
URL https://arxiv.org/abs/1910.03561v3
PDF https://arxiv.org/pdf/1910.03561v3.pdf
PWC https://paperswithcode.com/paper/deep-network-classification-by-scattering-and
Repo https://github.com/j-zarka/SparseScatNet
Framework pytorch
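
The $\ell^1$ sparse coding step above is solved with an ISTA-style homotopy iteration whose threshold decays over iterations. Below is a minimal sketch of that idea, assuming a fixed dictionary D and a simple geometric threshold decay; it is not the authors' exact SparseScatNet implementation (see the linked repo for that).

```python
# Illustrative ISTA iteration with a decaying (homotopy-style) threshold.
# The step size, decay rate, and iteration count are assumptions.
import torch

def soft_threshold(z, lam):
    return torch.sign(z) * torch.clamp(z.abs() - lam, min=0.0)

def homotopy_ista(x, D, n_iters=12, lam0=0.5, decay=0.7):
    # x: (batch, m) scattering features; D: (m, k) dictionary (columns ~ atoms)
    alpha = torch.zeros(x.shape[0], D.shape[1])
    step = 1.0 / torch.linalg.matrix_norm(D, ord=2) ** 2  # 1/L, L = ||D||_2^2
    lam = lam0
    for _ in range(n_iters):
        residual = x - alpha @ D.T
        alpha = soft_threshold(alpha + step * residual @ D, step * lam)
        lam *= decay  # exponentially shrinking threshold: the "homotopy" path
    return alpha
```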

Multi-source Distilling Domain Adaptation

Title Multi-source Distilling Domain Adaptation
Authors Sicheng Zhao, Guangzhi Wang, Shanghang Zhang, Yang Gu, Yaxian Li, Zhichao Song, Pengfei Xu, Runbo Hu, Hua Chai, Kurt Keutzer
Abstract Deep neural networks suffer from performance decay when there is domain shift between the labeled source domain and unlabeled target domain, which motivates the research on domain adaptation (DA). Conventional DA methods usually assume that the labeled data is sampled from a single source distribution. However, in practice, labeled data may be collected from multiple sources, while naive application of the single-source DA algorithms may lead to suboptimal solutions. In this paper, we propose a novel multi-source distilling domain adaptation (MDDA) network, which not only considers the different distances among multiple sources and the target, but also investigates the different similarities of the source samples to the target ones. Specifically, the proposed MDDA includes four stages: (1) pre-train the source classifiers separately using the training data from each source; (2) adversarially map the target into the feature space of each source respectively by minimizing the empirical Wasserstein distance between source and target; (3) select the source training samples that are closer to the target to fine-tune the source classifiers; and (4) classify each encoded target feature by the corresponding source classifier, and aggregate the different predictions using the respective domain weights, which correspond to the discrepancy between each source and the target. Extensive experiments are conducted on public DA benchmarks, and the results demonstrate that the proposed MDDA significantly outperforms the state-of-the-art approaches. Our source code is released at: https://github.com/daoyuan98/MDDA.
Tasks Domain Adaptation
Published 2019-11-22
URL https://arxiv.org/abs/1911.11554v2
PDF https://arxiv.org/pdf/1911.11554v2.pdf
PWC https://paperswithcode.com/paper/multi-source-distilling-domain-adaptation
Repo https://github.com/daoyuan98/MDDA
Framework tf
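
Stage (4) of MDDA weights each source classifier's prediction by how close that source is to the target. A minimal sketch of such discrepancy-based aggregation follows; the softmax-over-negative-distance weighting is an illustrative assumption, not necessarily the paper's exact rule.

```python
# Aggregate per-source predictions with weights derived from each source's
# estimated discrepancy to the target domain.
import numpy as np

def aggregate_predictions(per_source_probs, wasserstein_dists, temperature=1.0):
    # per_source_probs: (n_sources, n_samples, n_classes)
    # wasserstein_dists: (n_sources,) estimated W-distance source_i -> target
    d = np.asarray(wasserstein_dists) / temperature
    w = np.exp(-d) / np.exp(-d).sum()       # closer sources get larger weight
    return np.tensordot(w, per_source_probs, axes=1)  # (n_samples, n_classes)
```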

Forget Me Not: Reducing Catastrophic Forgetting for Domain Adaptation in Reading Comprehension

Title Forget Me Not: Reducing Catastrophic Forgetting for Domain Adaptation in Reading Comprehension
Authors Y. Xu, X. Zhong, A. J. J. Yepes, J. H. Lau
Abstract The creation of large-scale open domain reading comprehension data sets in recent years has enabled the development of end-to-end neural comprehension models with promising results. To use these models for domains with limited training data, one of the most effective approaches is to first pretrain them on large out-of-domain source data and then fine-tune them with the limited target data. The caveat is that, after fine-tuning, the comprehension models tend to perform poorly in the source domain, a phenomenon known as catastrophic forgetting. In this paper, we explore methods that overcome catastrophic forgetting during fine-tuning without assuming access to data from the source domain. We introduce new auxiliary penalty terms and observe the best performance when a combination of auxiliary penalty terms is used to regularise the fine-tuning process for adapting comprehension models. To test our methods, we develop and release 6 narrow domain data sets that could potentially be used as reading comprehension benchmarks.
Tasks Domain Adaptation, Reading Comprehension
Published 2019-11-01
URL https://arxiv.org/abs/1911.00202v1
PDF https://arxiv.org/pdf/1911.00202v1.pdf
PWC https://paperswithcode.com/paper/forget-me-not-reducing-catastrophic
Repo https://github.com/ibm-aur-nlp/domain-specific-QA
Framework none
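
One simple instance of the auxiliary penalties explored above is an L2 term anchoring the fine-tuned weights to their pretrained values, which requires no source-domain data. A minimal sketch, assuming a plain PyTorch model and a saved copy of the pretrained state; the paper combines several such penalties, and this shows only the simplest one.

```python
# L2 penalty pulling fine-tuned weights back toward their pretrained values.
import torch

def l2_anchor_penalty(model, pretrained_state, coeff=0.01):
    penalty = 0.0
    for name, param in model.named_parameters():
        anchor = pretrained_state[name].detach()
        penalty = penalty + (param - anchor).pow(2).sum()
    return coeff * penalty

# During fine-tuning on the target domain:
#   loss = task_loss + l2_anchor_penalty(model, pretrained_state)
```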

Stokes Inversion based on Convolutional Neural Networks

Title Stokes Inversion based on Convolutional Neural Networks
Authors A. Asensio Ramos, C. Diaz Baso
Abstract Spectropolarimetric inversions are routinely used in the field of Solar Physics for the extraction of physical information from observations. The application to two-dimensional fields of view often requires the use of supercomputers with parallelized inversion codes. Even in this case, the computing time spent on the process is still very large. Our aim is to develop a new inversion code based on the application of convolutional neural networks that can quickly provide a three-dimensional cube of thermodynamical and magnetic properties from the interpretation of two-dimensional maps of Stokes profiles. We train two different architectures of fully convolutional neural networks. To this end, we use the synthetic Stokes profiles obtained from two snapshots of three-dimensional magneto-hydrodynamic numerical simulations of different structures of the solar atmosphere. We provide an extensive analysis of the new inversion technique, showing that it infers the thermodynamical and magnetic properties with a precision comparable to that of standard inversion techniques. However, it provides several key improvements: our method is around one million times faster, it returns a three-dimensional view of the physical properties of the region of interest in geometrical height, it provides quantities that cannot be obtained otherwise (pressure and Wilson depression) and the inferred properties are decontaminated from the blurring effect of instrumental point spread functions for free. The code is provided for free on a specific repository, with options for training and evaluation.
Tasks
Published 2019-04-07
URL https://arxiv.org/abs/1904.03714v2
PDF https://arxiv.org/pdf/1904.03714v2.pdf
PWC https://paperswithcode.com/paper/stokes-inversion-based-on-convolutional
Repo https://github.com/aasensio/sicon
Framework pytorch
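
As a rough illustration of the fully convolutional mapping described above, the sketch below takes per-pixel Stokes profiles as input channels and emits a stack of physical quantities per pixel. The channel counts and depth are assumptions for illustration, not the paper's architectures.

```python
# A toy fully convolutional network mapping Stokes-profile maps to a cube of
# physical parameters; all sizes below are illustrative assumptions.
import torch.nn as nn

n_stokes_channels = 4 * 112   # 4 Stokes parameters x assumed 112 wavelengths
n_outputs = 7 * 20            # assumed 7 physical quantities x 20 heights

model = nn.Sequential(
    nn.Conv2d(n_stokes_channels, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(256, n_outputs, kernel_size=1),  # per-pixel physical parameters
)
# Input:  (batch, 4*112, H, W) maps of Stokes profiles
# Output: (batch, 7*20,  H, W) -> reshaped to a 3D cube per pixel column
```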

When Low Resource NLP Meets Unsupervised Language Model: Meta-pretraining Then Meta-learning for Few-shot Text Classification

Title When Low Resource NLP Meets Unsupervised Language Model: Meta-pretraining Then Meta-learning for Few-shot Text Classification
Authors Shumin Deng, Ningyu Zhang, Zhanlin Sun, Jiaoyan Chen, Huajun Chen
Abstract Text classification tends to be difficult when data are deficient or when it is required to adapt to unseen classes. In such challenging scenarios, recent studies have often used meta-learning to simulate the few-shot task, thus neglecting implicit common linguistic features across tasks. This paper addresses such problems using meta-learning and unsupervised language models. Our approach is based on the insight that having a good generalization from a few examples relies on both a generic model initialization and an effective strategy for adapting this model to newly arising tasks. We show that our approach is not only simple but also produces state-of-the-art performance on a well-studied sentiment classification dataset. It can thus be further suggested that pretraining could be a promising solution for few-shot learning of many other NLP tasks. The code and the dataset to replicate the experiments are made available at https://github.com/zxlzr/FewShotNLP.
Tasks Few-Shot Learning, Language Modelling, Meta-Learning, Sentiment Analysis, Text Classification
Published 2019-08-22
URL https://arxiv.org/abs/1908.08788v2
PDF https://arxiv.org/pdf/1908.08788v2.pdf
PWC https://paperswithcode.com/paper/improving-few-shot-text-classification-via
Repo https://github.com/zxlzr/FewShotNLP
Framework pytorch
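
The abstract's two ingredients (a generic initialization plus a strategy for adapting it to new tasks) are the core of MAML-style meta-learning. A bare-bones first-order sketch, with task sampling, the model, and the loss left as placeholders; this is a generic illustration, not the authors' exact training loop.

```python
# First-order MAML-style meta-training step (requires torch >= 2.0 for
# torch.func.functional_call). Each task is ((x_sup, y_sup), (x_qry, y_qry)).
import torch

def meta_train_step(model, tasks, loss_fn, inner_lr=1e-2, meta_opt=None):
    meta_opt.zero_grad()
    for support, query in tasks:                    # a batch of few-shot tasks
        fast = {n: p.clone() for n, p in model.named_parameters()}
        # one inner-loop adaptation step on the support set
        loss = loss_fn(torch.func.functional_call(model, fast, support[0]),
                       support[1])
        grads = torch.autograd.grad(loss, list(fast.values()))
        fast = {n: p - inner_lr * g for (n, p), g in zip(fast.items(), grads)}
        # outer loss on the query set backpropagates into the initialization
        loss_fn(torch.func.functional_call(model, fast, query[0]),
                query[1]).backward()
    meta_opt.step()
```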

EfficientDet: Scalable and Efficient Object Detection

Title EfficientDet: Scalable and Efficient Object Detection
Authors Mingxing Tan, Ruoming Pang, Quoc V. Le
Abstract Model efficiency has become increasingly important in computer vision. In this paper, we systematically study neural network architecture design choices for object detection and propose several key optimizations to improve efficiency. First, we propose a weighted bi-directional feature pyramid network (BiFPN), which allows easy and fast multi-scale feature fusion; Second, we propose a compound scaling method that uniformly scales the resolution, depth, and width for all backbone, feature network, and box/class prediction networks at the same time. Based on these optimizations and EfficientNet backbones, we have developed a new family of object detectors, called EfficientDet, which consistently achieve much better efficiency than prior art across a wide spectrum of resource constraints. In particular, with single-model and single-scale, our EfficientDet-D7 achieves state-of-the-art 52.2 AP on COCO test-dev with 52M parameters and 325B FLOPs, being 4x - 9x smaller and using 13x - 42x fewer FLOPs than previous detectors. Code is available at https://github.com/google/automl/tree/master/efficientdet.
Tasks AutoML, Object Detection
Published 2019-11-20
URL https://arxiv.org/abs/1911.09070v3
PDF https://arxiv.org/pdf/1911.09070v3.pdf
PWC https://paperswithcode.com/paper/efficientdet-scalable-and-efficient-object
Repo https://github.com/google/automl/tree/master/efficientdet
Framework tf
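
The BiFPN's weighted feature fusion can be written as a "fast normalized fusion": learnable non-negative weights normalized by their sum rather than a softmax. A standalone sketch, assuming the input feature maps already share one resolution and width:

```python
# Fast normalized fusion: O = sum_i (w_i * I_i) / (eps + sum_j w_j), w = ReLU(w).
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    def __init__(self, n_inputs, eps=1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))
        self.eps = eps

    def forward(self, feats):                 # list of same-shape feature maps
        w = torch.relu(self.w)                # keep weights non-negative
        w = w / (w.sum() + self.eps)          # normalize without a softmax
        return sum(wi * f for wi, f in zip(w, feats))
```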

Submodular Maximization Beyond Non-negativity: Guarantees, Fast Algorithms, and Applications

Title Submodular Maximization Beyond Non-negativity: Guarantees, Fast Algorithms, and Applications
Authors Christopher Harshaw, Moran Feldman, Justin Ward, Amin Karbasi
Abstract It is generally believed that submodular functions – and the more general class of $\gamma$-weakly submodular functions – may only be optimized under the non-negativity assumption $f(S) \geq 0$. In this paper, we show that once the function is expressed as the difference $f = g - c$, where $g$ is monotone, non-negative, and $\gamma$-weakly submodular and $c$ is non-negative modular, then strong approximation guarantees may be obtained. We present an algorithm for maximizing $g - c$ under a $k$-cardinality constraint which produces a random feasible set $S$ such that $\mathbb{E} \left[ g(S) - c(S) \right] \geq (1 - e^{-\gamma} - \epsilon) g(OPT) - c(OPT)$, whose running time is $O (\frac{n}{\epsilon} \log^2 \frac{1}{\epsilon})$, i.e., independent of $k$. We extend these results to the unconstrained setting by describing an algorithm with the same approximation guarantees and faster $O(\frac{n}{\epsilon} \log\frac{1}{\epsilon})$ runtime. The main techniques underlying our algorithms are two-fold: the use of a surrogate objective which varies the relative importance between $g$ and $c$ throughout the algorithm, and a geometric sweep over possible $\gamma$ values. Our algorithmic guarantees are complemented by a hardness result showing that no polynomial-time algorithm which accesses $g$ through a value oracle can do better. We empirically demonstrate the success of our algorithms by applying them to experimental design on the Boston Housing dataset and directed vertex cover on the Email EU dataset.
Tasks
Published 2019-04-19
URL http://arxiv.org/abs/1904.09354v1
PDF http://arxiv.org/pdf/1904.09354v1.pdf
PWC https://paperswithcode.com/paper/190409354
Repo https://github.com/crharshaw/submodular-minus-linear
Framework none
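
The guarantee above is achieved by a "distorted greedy" rule that discounts the marginal gain of $g$ by a factor growing over iterations, traded off against the modular cost $c$. A sketch for the cardinality-constrained case with $\gamma = 1$, with the value oracles $g$ and $c$ as placeholder callables:

```python
# Distorted greedy under a k-cardinality constraint (gamma = 1 case).
def distorted_greedy(g, c, ground_set, k):
    # g: set -> float (monotone submodular); c: element -> float (modular cost)
    # ground_set: a Python set of candidate elements
    S = set()
    for i in range(k):
        factor = (1 - 1.0 / k) ** (k - (i + 1))   # distortion grows with i

        def gain(e):
            return factor * (g(S | {e}) - g(S)) - c(e)

        best = max((e for e in ground_set - S), key=gain, default=None)
        if best is not None and gain(best) > 0:   # only add profitable elements
            S.add(best)
    return S
```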

Dual Generator Generative Adversarial Networks for Multi-Domain Image-to-Image Translation

Title Dual Generator Generative Adversarial Networks for Multi-Domain Image-to-Image Translation
Authors Hao Tang, Dan Xu, Wei Wang, Yan Yan, Nicu Sebe
Abstract State-of-the-art methods for image-to-image translation with Generative Adversarial Networks (GANs) can learn a mapping from one domain to another domain using unpaired image data. However, these methods require the training of one specific model for every pair of image domains, which limits the scalability in dealing with more than two image domains. In addition, the training stage of these methods has the common problem of mode collapse, which degrades the quality of the generated images. To tackle these issues, we propose a Dual Generator Generative Adversarial Network (G$^2$GAN), a robust and scalable approach that performs unpaired image-to-image translation for multiple domains using only dual generators within a single model. Moreover, we explore different optimization losses for better training of G$^2$GAN, and thus achieve unpaired image-to-image translation with higher consistency and better stability. Extensive experiments on six publicly available datasets with different scenarios, i.e., architectural buildings, seasons, landscapes and human faces, demonstrate that the proposed G$^2$GAN achieves superior model capacity and better generation performance compared with existing image-to-image translation GAN models.
Tasks Image-to-Image Translation
Published 2019-01-14
URL http://arxiv.org/abs/1901.04604v1
PDF http://arxiv.org/pdf/1901.04604v1.pdf
PWC https://paperswithcode.com/paper/dual-generator-generative-adversarial
Repo https://github.com/Ha0Tang/AsymmetricGAN
Framework pytorch
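
The dual-generator idea pairs a translation generator, conditioned on a target-domain label, with a reconstruction generator mapping back to the source domain, so one model covers all domain pairs. A schematic of the resulting cycle-consistency term, with both generators as placeholders; the adversarial and other losses from the paper are omitted.

```python
# Cycle-consistency term for a dual-generator, label-conditioned translation.
import torch.nn.functional as F

def g2gan_cycle_loss(G_translate, G_reconstruct, x, src_label, tgt_label):
    fake = G_translate(x, tgt_label)          # x translated into target domain
    recon = G_reconstruct(fake, src_label)    # mapped back to the source domain
    return F.l1_loss(recon, x)                # penalize reconstruction error
```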

Two-stage Optimization for Machine Learning Workflow

Title Two-stage Optimization for Machine Learning Workflow
Authors Alexandre Quemy
Abstract Machine learning techniques play a preponderant role in dealing with massive amounts of data and are employed in almost every possible domain. Building a high-quality machine learning model to be deployed in production is a challenging task for both subject-matter experts and machine learning practitioners. For broader adoption and scalability of machine learning systems, the construction and configuration of machine learning workflows need to become more automated. In the last few years, several techniques have been developed in this direction, known as AutoML. In this paper, we present a two-stage optimization process to build data pipelines and configure machine learning algorithms. First, we study the impact of data pipelines compared to algorithm configuration in order to show the importance of data preprocessing over hyperparameter tuning. The second part presents policies to efficiently allocate search time between data pipeline construction and algorithm configuration. These policies are agnostic to the meta-optimizer. Last, we present a metric to determine whether a data pipeline is specific to or independent of the algorithm, enabling fine-grained pipeline pruning and meta-learning for the cold-start problem.
Tasks AutoML, Meta-Learning
Published 2019-07-01
URL https://arxiv.org/abs/1907.00678v1
PDF https://arxiv.org/pdf/1907.00678v1.pdf
PWC https://paperswithcode.com/paper/two-stage-optimization-for-machine-learning
Repo https://github.com/aquemy/DPSO_experiments
Framework none
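
The simplest conceivable allocation policy splits the search budget between the two stages at a fixed fraction. The sketch below only illustrates that interface; the 0.6 split and the two optimizer callbacks are assumptions, not the paper's tuned policies.

```python
# Fixed-fraction time allocation between pipeline search and algorithm config.
import time

def two_stage_search(optimize_pipeline, optimize_config, budget_s,
                     pipeline_frac=0.6):
    # optimize_pipeline(deadline) -> best pipeline found before the deadline
    # optimize_config(pipeline, deadline) -> best algorithm configuration
    t0 = time.time()
    pipeline = optimize_pipeline(deadline=t0 + pipeline_frac * budget_s)
    config = optimize_config(pipeline, deadline=t0 + budget_s)
    return pipeline, config
```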

Saccader: Improving Accuracy of Hard Attention Models for Vision

Title Saccader: Improving Accuracy of Hard Attention Models for Vision
Authors Gamaleldin F. Elsayed, Simon Kornblith, Quoc V. Le
Abstract Although deep convolutional neural networks achieve state-of-the-art performance across nearly all image classification tasks, their decisions are difficult to interpret. One approach that offers some level of interpretability by design is hard attention, which uses only relevant portions of the image. However, training hard attention models with only class label supervision is challenging, and hard attention has proved difficult to scale to complex datasets. Here, we propose a novel hard attention model, which we term Saccader. Key to Saccader is a pretraining step that requires only class labels and provides initial attention locations for policy gradient optimization. Our best models narrow the gap to common ImageNet baselines, achieving 75% top-1 and 91% top-5 accuracy while attending to less than one-third of the image.
Tasks Image Classification
Published 2019-08-20
URL https://arxiv.org/abs/1908.07644v3
PDF https://arxiv.org/pdf/1908.07644v3.pdf
PWC https://paperswithcode.com/paper/190807644
Repo https://github.com/parsatorb/PyTorch-Saccader
Framework pytorch
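
After pretraining supplies initial attention locations, the policy is refined with policy gradients. A minimal REINFORCE step consistent with that description, where the glimpse classifier is a stand-in and the mean-reward baseline is a common simplification, not necessarily the paper's exact estimator:

```python
# One REINFORCE step: sample attention locations, reward correct classification.
import torch

def reinforce_step(policy_logits, classify_at, images, labels):
    # policy_logits: (batch, n_locations) from the pretrained attention network
    # classify_at(images, locs) -> (batch, n_classes) logits for those glimpses
    dist = torch.distributions.Categorical(logits=policy_logits)
    locs = dist.sample()
    reward = (classify_at(images, locs).argmax(1) == labels).float()  # 0/1
    advantage = reward - reward.mean()        # mean baseline reduces variance
    return -(advantage * dist.log_prob(locs)).mean()  # policy-gradient loss
```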

Optimizing CNN-based Hyperspectral Image Classification on FPGAs

Title Optimizing CNN-based Hyperspectral Image Classification on FPGAs
Authors Shuanglong Liu, Ringo S. W. Chu, Xiwei Wang, Wayne Luk
Abstract Hyperspectral image (HSI) classification has been widely adopted in applications involving remote sensing imagery analysis which require high classification accuracy and real-time processing speed. Methods based on convolutional neural networks (CNNs) have been proven to achieve state-of-the-art accuracy in classifying HSIs. However, CNN models are often too computationally intensive to achieve real-time response due to the high dimensional nature of HSI, compared to traditional methods such as Support Vector Machines (SVMs). Besides, previous CNN models used in HSI are not specially designed for efficient implementation on embedded devices such as FPGAs. This paper proposes a novel CNN-based algorithm for HSI classification which takes into account hardware efficiency. A customized architecture which enables the proposed algorithm to be mapped effectively onto FPGA resources is then proposed to support real-time on-board classification with low power consumption. Implementation results show that our proposed accelerator on a Xilinx Zynq 706 FPGA board runs more than 70x faster than an Intel 8-core Xeon CPU and 3x faster than an NVIDIA GeForce 1080 GPU. Compared to previous SVM-based FPGA accelerators, we achieve comparable processing speed but provide a much higher classification accuracy.
Tasks Hyperspectral Image Classification
Published 2019-06-27
URL https://arxiv.org/abs/1906.11834v1
PDF https://arxiv.org/pdf/1906.11834v1.pdf
PWC https://paperswithcode.com/paper/optimizing-cnn-based-hyperspectral
Repo https://github.com/custom-computing-ic/CNN-Based-Hyperspectral-Image-Classification
Framework tf

WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset

Title WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset
Authors Jibril Frej, Didier Schwab, Jean-Pierre Chevallet
Abstract Over the past years, deep learning methods allowed for new state-of-the-art results in ad-hoc information retrieval. However such methods usually require large amounts of annotated data to be effective. Since most standard ad-hoc information retrieval datasets publicly available for academic research (e.g. Robust04, ClueWeb09) have at most 250 annotated queries, the recent deep learning models for information retrieval perform poorly on these datasets. These models (e.g. DUET, Conv-KNRM) are trained and evaluated on data collected from commercial search engines not publicly available for academic research which is a problem for reproducibility and the advancement of research. In this paper, we propose WIKIR: an open-source toolkit to automatically build large-scale English information retrieval datasets based on Wikipedia. WIKIR is publicly available on GitHub. We also provide wikIR78k and wikIRS78k: two large-scale publicly available datasets that both contain 78,628 queries and 3,060,191 (query, relevant documents) pairs.
Tasks Ad-Hoc Information Retrieval, Information Retrieval
Published 2019-12-04
URL https://arxiv.org/abs/1912.01901v4
PDF https://arxiv.org/pdf/1912.01901v4.pdf
PWC https://paperswithcode.com/paper/wikir-a-python-toolkit-for-building-a-large
Repo https://github.com/getalp/wikIR
Framework none
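
For intuition, automatic labeling of the kind WIKIR performs can be sketched as deriving a query from each article and treating part of the article body as a relevant document. The construction below is illustrative only and is not WIKIR's exact procedure or API.

```python
# Toy construction of (query, relevant document) pairs from Wikipedia articles:
# title as the query, body minus its opening sentence as a relevant document.
def build_pairs(articles):
    # articles: iterable of dicts like {"title": str, "body": str}
    pairs = []
    for art in articles:
        first_sentence, _, rest = art["body"].partition(". ")
        if rest:                      # keep only articles with a usable body
            pairs.append((art["title"].lower(), rest))
    return pairs
```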

Video Object Segmentation using Space-Time Memory Networks

Title Video Object Segmentation using Space-Time Memory Networks
Authors Seoung Wug Oh, Joon-Young Lee, Ning Xu, Seon Joo Kim
Abstract We propose a novel solution for semi-supervised video object segmentation. By the nature of the problem, available cues (e.g. video frame(s) with object masks) become richer with the intermediate predictions. However, the existing methods are unable to fully exploit this rich source of information. We resolve the issue by leveraging memory networks and learning to read relevant information from all available sources. In our framework, the past frames with object masks form an external memory, and the current frame as the query is segmented using the mask information in the memory. Specifically, the query and the memory are densely matched in the feature space, covering all the space-time pixel locations in a feed-forward fashion. In contrast to previous approaches, the abundant use of the guidance information allows us to better handle challenges such as appearance changes and occlusions. We validate our method on the latest benchmark sets and achieve state-of-the-art performance (overall score of 79.4 on the YouTube-VOS val set, J of 88.7 and 79.2 on the DAVIS 2016/2017 val sets respectively) while having a fast runtime (0.16 second/frame on the DAVIS 2016 val set).
Tasks Semantic Segmentation, Semi-supervised Video Object Segmentation, Video Object Segmentation, Video Semantic Segmentation
Published 2019-04-01
URL https://arxiv.org/abs/1904.00607v2
PDF https://arxiv.org/pdf/1904.00607v2.pdf
PWC https://paperswithcode.com/paper/video-object-segmentation-using-space-time
Repo https://github.com/seoungwugoh/STM
Framework pytorch
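
The dense space-time matching described above is a key-value attention read: every query-frame location attends over all memory locations via key similarity and retrieves a weighted sum of memory values. A minimal sketch with illustrative tensor shapes:

```python
# Space-time memory read: dense query-to-memory matching via softmax attention.
import torch

def memory_read(q_key, m_key, m_val):
    # q_key: (B, Ck, H*W) query keys; m_key: (B, Ck, T*H*W) memory keys
    # m_val: (B, Cv, T*H*W) memory values (T frames in memory)
    sim = torch.einsum('bck,bcm->bkm', q_key, m_key)   # pairwise similarities
    attn = torch.softmax(sim, dim=-1)                  # over all memory pixels
    return torch.einsum('bkm,bcm->bck', attn, m_val)   # (B, Cv, H*W) read-out
```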

BubbleNets: Learning to Select the Guidance Frame in Video Object Segmentation by Deep Sorting Frames

Title BubbleNets: Learning to Select the Guidance Frame in Video Object Segmentation by Deep Sorting Frames
Authors Brent A. Griffin, Jason J. Corso
Abstract Semi-supervised video object segmentation has made significant progress on real and challenging videos in recent years. The current paradigm for segmentation methods and benchmark datasets is to segment objects in video provided a single annotation in the first frame. However, we find that segmentation performance across the entire video varies dramatically when selecting an alternative frame for annotation. This paper addresses the problem of learning to suggest the single best frame across the video for user annotation; this is, in fact, never the first frame of the video. We achieve this by introducing BubbleNets, a novel deep sorting network that learns to select frames using a performance-based loss function that enables the conversion of expansive amounts of training examples from already existing datasets. Using BubbleNets, we are able to achieve an 11% relative improvement in segmentation performance on the DAVIS benchmark without any changes to the underlying method of segmentation.
Tasks Semantic Segmentation, Semi-supervised Video Object Segmentation, Video Object Segmentation, Video Semantic Segmentation
Published 2019-03-28
URL http://arxiv.org/abs/1903.11779v1
PDF http://arxiv.org/pdf/1903.11779v1.pdf
PWC https://paperswithcode.com/paper/bubblenets-learning-to-select-the-guidance
Repo https://github.com/griffbr/BubbleNets
Framework tf
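
BubbleNets' "deep sorting" reduces frame selection to learned pairwise comparisons driven through a bubble sort. The skeleton below makes that explicit, with `prefers` standing in for the trained comparison network; it is a sketch of the idea, not the authors' implementation.

```python
# Bubble sort with a learned comparator; the predicted-best frame ends up last.
def select_annotation_frame(frames, prefers):
    # prefers(a, b) -> True if frame `a` is predicted better than frame `b`
    order = list(frames)
    for i in range(len(order) - 1):
        for j in range(len(order) - 1 - i):
            if prefers(order[j], order[j + 1]):
                order[j], order[j + 1] = order[j + 1], order[j]  # swap upward
    return order[-1]   # predicted-best annotation frame
```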

Dual Attention MobDenseNet(DAMDNet) for Robust 3D Face Alignment

Title Dual Attention MobDenseNet(DAMDNet) for Robust 3D Face Alignment
Authors Lei Jiang, Xiao-Jun Wu, Josef Kittler
Abstract 3D face alignment of monocular images is a crucial process in the recognition of faces with disguise. 3D face reconstruction facilitated by alignment can restore the face structure, which is helpful in detecting disguise interference. This paper proposes a dual attention mechanism and an efficient end-to-end 3D face alignment framework. We build a stable network model through depthwise separable convolution, densely connected convolution and a lightweight channel attention mechanism. In order to enhance the ability of the network model to extract the spatial features of the face region, we adopt a spatial group-wise feature enhancement module to improve the representation ability of the network. Different loss functions are applied jointly to constrain the 3D parameters of a 3D Morphable Model (3DMM) and its 3D vertices. We use a variety of data enhancement methods and generate large virtual-pose face data sets to solve the data imbalance problem. Experiments on the challenging AFLW and AFLW2000-3D datasets show that our algorithm significantly improves the accuracy of 3D face alignment. Our experiments using the DFW dataset show that DAMDNet exhibits excellent performance in the 3D alignment and reconstruction of challenging disguised faces. The model parameters and the complexity of the proposed method are also reduced significantly. The code is publicly available at https://github.com/LeiJiangJNU/DAMDNet
Tasks 3D Face Reconstruction, Face Alignment, Face Reconstruction
Published 2019-08-30
URL https://arxiv.org/abs/1908.11821v1
PDF https://arxiv.org/pdf/1908.11821v1.pdf
PWC https://paperswithcode.com/paper/dual-attention-mobdensenetdamdnet-for-robust
Repo https://github.com/LeiJiangJNU/DAMDNet
Framework pytorch
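
The depthwise separable convolutions the network is built from factor a standard convolution into a per-channel spatial convolution followed by a 1x1 pointwise convolution. A typical PyTorch block of that kind is sketched below; channel sizes are arbitrary and the paper's attention modules are omitted.

```python
# Depthwise separable convolution: per-channel 3x3 conv, then 1x1 channel mixing.
import torch.nn as nn

def depthwise_separable(in_ch, out_ch, stride=1):
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1, groups=in_ch),
        nn.BatchNorm2d(in_ch), nn.ReLU6(inplace=True),   # depthwise stage
        nn.Conv2d(in_ch, out_ch, 1),                     # pointwise mixing
        nn.BatchNorm2d(out_ch), nn.ReLU6(inplace=True),
    )
```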