April 3, 2020


Paper Group AWR 20


MiLeNAS: Efficient Neural Architecture Search via Mixed-Level Reformulation


Title	MiLeNAS: Efficient Neural Architecture Search via Mixed-Level Reformulation
Authors	Chaoyang He, Haishan Ye, Li Shen, Tong Zhang
Abstract	Many recently proposed methods for Neural Architecture Search (NAS) can be formulated as bilevel optimization. For efficient implementation, its solution requires approximations of second-order methods. In this paper, we demonstrate that gradient errors caused by such approximations lead to suboptimality, in the sense that the optimization procedure fails to converge to a (locally) optimal solution. To remedy this, this paper proposes MiLeNAS, a mixed-level reformulation for NAS that can be optimized efficiently and reliably. It is shown that even when using a simple first-order method on the mixed-level formulation, MiLeNAS can achieve a lower validation error for NAS problems. Consequently, architectures obtained by our method achieve consistently higher accuracies than those obtained from bilevel optimization. Moreover, MiLeNAS proposes a framework beyond DARTS. It is upgraded via model size-based search and early stopping strategies to complete the search process in around 5 hours. Extensive experiments within the convolutional architecture search space validate the effectiveness of our approach.
Tasks	bilevel optimization, Neural Architecture Search
Published	2020-03-27
URL	https://arxiv.org/abs/2003.12238v1
PDF	https://arxiv.org/pdf/2003.12238v1.pdf
PWC	https://paperswithcode.com/paper/milenas-efficient-neural-architecture-search
Repo	https://github.com/chaoyanghe/MiLeNAS
Framework	none

Delving Deeper into the Decoder for Video Captioning


Title	Delving Deeper into the Decoder for Video Captioning
Authors	Haoran Chen, Jianmin Li, Xiaolin Hu
Abstract	Video captioning is an advanced multi-modal task which aims to describe a video clip using a natural language sentence. The encoder-decoder framework is the most popular paradigm for this task in recent years. However, there exist some problems in the decoder of a video captioning model. We make a thorough investigation into the decoder and adopt three techniques to improve the performance of the model. First of all, a combination of variational dropout and layer normalization is embedded into a recurrent unit to alleviate the problem of overfitting. Secondly, a new online method is proposed to evaluate the performance of a model on a validation set so as to select the best checkpoint for testing. Finally, a new training strategy called professional learning is proposed which uses the strengths of a captioning model and bypasses its weaknesses. It is demonstrated in the experiments on Microsoft Research Video Description Corpus (MSVD) and MSR-Video to Text (MSR-VTT) datasets that our model has achieved the best results evaluated by BLEU, CIDEr, METEOR and ROUGE-L metrics with significant gains of up to 18% on MSVD and 3.5% on MSR-VTT compared with the previous state-of-the-art models.
Tasks	Video Captioning, Video Description
Published	2020-01-16
URL	https://arxiv.org/abs/2001.05614v3
PDF	https://arxiv.org/pdf/2001.05614v3.pdf
PWC	https://paperswithcode.com/paper/delving-deeper-into-the-decoder-for-video
Repo	https://github.com/WingsBrokenAngel/delving-deeper-into-the-decoder-for-video-captioning
Framework	tf

The Tree Ensemble Layer: Differentiability meets Conditional Computation


Title	The Tree Ensemble Layer: Differentiability meets Conditional Computation
Authors	Hussein Hazimeh, Natalia Ponomareva, Petros Mol, Zhenyu Tan, Rahul Mazumder
Abstract	Neural networks and tree ensembles are state-of-the-art learners, each with its unique statistical and computational advantages. We aim to combine these advantages by introducing a new layer for neural networks, composed of an ensemble of differentiable decision trees (a.k.a. soft trees). While differentiable trees demonstrate promising results in the literature, in practice they are typically slow in training and inference as they do not support conditional computation. We mitigate this issue by introducing a new sparse activation function for sample routing, and implement true conditional computation by developing specialized forward and backward propagation algorithms that exploit sparsity. Our efficient algorithms pave the way for jointly training over deep and wide tree ensembles using first-order methods (e.g., SGD). Experiments on 23 classification datasets indicate over 10x speed-ups compared to the differentiable trees used in the literature and over 20x reduction in the number of parameters compared to gradient boosted trees, while maintaining competitive performance. Moreover, experiments on CIFAR, MNIST, and Fashion MNIST indicate that replacing dense layers in CNNs with our tree layer reduces the test loss by 7-53% and the number of parameters by 8x. We provide an open-source TensorFlow implementation with a Keras API.
Tasks
Published	2020-02-18
URL	https://arxiv.org/abs/2002.07772v1
PDF	https://arxiv.org/pdf/2002.07772v1.pdf
PWC	https://paperswithcode.com/paper/the-tree-ensemble-layer-differentiability
Repo	https://github.com/google-research/google-research/tree/master/tf_trees
Framework	tf
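
The abstract above attributes conditional computation to a sparse activation used for sample routing. As a rough illustration only, the sketch below uses a cubic smooth-step that saturates at exactly 0 and 1 outside a band of width `gamma`; the abstract does not give the activation's formula, so this particular cubic, the function names, and the depth-2 routing convention are all assumptions rather than the released `tf_trees` implementation.

```python
def smooth_step(t, gamma=1.0):
    """Cubic smooth-step (illustrative assumption, not taken from the abstract).

    Returns exactly 0 for t <= -gamma/2 and exactly 1 for t >= gamma/2,
    with a continuously differentiable cubic in between.
    """
    half = gamma / 2.0
    if t <= -half:
        return 0.0
    if t >= half:
        return 1.0
    return -2.0 / gamma**3 * t**3 + 3.0 / (2.0 * gamma) * t + 0.5


def leaf_probabilities(scores, gamma=1.0):
    """Route one sample through a depth-2 soft tree.

    scores: (root, left_child, right_child) split scores for this sample.
    Convention assumed here: each node sends the sample left with
    probability smooth_step(score) and right with the complement.
    """
    root, left, right = (smooth_step(s, gamma) for s in scores)
    return [
        root * left,               # left-left leaf
        root * (1 - left),         # left-right leaf
        (1 - root) * right,        # right-left leaf
        (1 - root) * (1 - right),  # right-right leaf
    ]


# The root saturates at 1, so both leaves under the right child get
# probability exactly 0 and that subtree would not need to be evaluated.
print(leaf_probabilities((1.0, 0.0, -3.0)))
```

Because the routing probability reaches exactly 0 or 1 (unlike a sigmoid, which only approaches them), whole subtrees can receive zero weight for a given sample, which is what makes skipping them possible.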

Robust Deep Reinforcement Learning against Adversarial Perturbations on Observations


Title	Robust Deep Reinforcement Learning against Adversarial Perturbations on Observations
Authors	Huan Zhang, Hongge Chen, Chaowei Xiao, Bo Li, Duane Boning, Cho-Jui Hsieh
Abstract	Deep Reinforcement Learning (DRL) is vulnerable to small adversarial perturbations on state observations. These perturbations do not alter the environment directly but can mislead the agent into making suboptimal decisions. We analyze the Markov Decision Process (MDP) under this threat model and utilize tools from the neural network verification literature to enable robust training for DRL under observational perturbations. Our techniques are general and can be applied to both Deep Q Networks (DQN) and Deep Deterministic Policy Gradient (DDPG) algorithms for discrete and continuous action control problems. We demonstrate that our proposed training procedure significantly improves the robustness of DQN and DDPG agents under a suite of strong white-box attacks on observations, including a few novel attacks we specifically craft. Additionally, our training procedure can produce provable certificates for the robustness of a Deep RL agent.
Tasks
Published	2020-03-19
URL	https://arxiv.org/abs/2003.08938v1
PDF	https://arxiv.org/pdf/2003.08938v1.pdf
PWC	https://paperswithcode.com/paper/robust-deep-reinforcement-learning-against
Repo	https://github.com/chenhongge/StateAdvDRL
Framework	none

Cross-Domain Document Object Detection: Benchmark Suite and Method


Title	Cross-Domain Document Object Detection: Benchmark Suite and Method
Authors	Kai Li, Curtis Wigington, Chris Tensmeyer, Handong Zhao, Nikolaos Barmpalios, Vlad I. Morariu, Varun Manjunatha, Tong Sun, Yun Fu
Abstract	Decomposing images of document pages into high-level semantic regions (e.g., figures, tables, paragraphs), document object detection (DOD) is fundamental for downstream tasks like intelligent document editing and understanding. DOD remains a challenging problem as document objects vary significantly in layout, size, aspect ratio, texture, etc. An additional challenge arises in practice because large labeled training datasets are only available for domains that differ from the target domain. We investigate cross-domain DOD, where the goal is to learn a detector for the target domain using labeled data from the source domain and only unlabeled data from the target domain. Documents from the two domains may vary significantly in layout, language, and genre. We establish a benchmark suite consisting of different types of PDF document datasets that can be utilized for cross-domain DOD model training and evaluation. For each dataset, we provide the page images, bounding box annotations, PDF files, and the rendering layers extracted from the PDF files. Moreover, we propose a novel cross-domain DOD model which builds upon the standard detection model and addresses domain shifts by incorporating three novel alignment modules: Feature Pyramid Alignment (FPA) module, Region Alignment (RA) module and Rendering Layer alignment (RLA) module. Extensive experiments on the benchmark suite substantiate the efficacy of the three proposed modules and the proposed method significantly outperforms the baseline methods. The project page is at https://github.com/kailigo/cddod.
Tasks	Object Detection
Published	2020-03-30
URL	https://arxiv.org/abs/2003.13197v1
PDF	https://arxiv.org/pdf/2003.13197v1.pdf
PWC	https://paperswithcode.com/paper/cross-domain-document-object-detection
Repo	https://github.com/kailigo/cddod
Framework	none

MushroomRL: Simplifying Reinforcement Learning Research


Title	MushroomRL: Simplifying Reinforcement Learning Research
Authors	Carlo D’Eramo, Davide Tateo, Andrea Bonarini, Marcello Restelli, Jan Peters
Abstract	MushroomRL is an open-source Python library developed to simplify the process of implementing and running Reinforcement Learning (RL) experiments. Compared to other available libraries, MushroomRL has been created with the purpose of providing a comprehensive and flexible framework to minimize the effort in implementing and testing novel RL methodologies. Indeed, the architecture of MushroomRL is built in such a way that every component of an RL problem is already provided, and most of the time users can only focus on the implementation of their own algorithms and experiments. The result is a library from which RL researchers can significantly benefit in the critical phase of the empirical analysis of their works. MushroomRL stable code, tutorials and documentation can be found at https://github.com/MushroomRL/mushroom-rl.
Tasks
Published	2020-01-04
URL	https://arxiv.org/abs/2001.01102v2
PDF	https://arxiv.org/pdf/2001.01102v2.pdf
PWC	https://paperswithcode.com/paper/mushroomrl-simplifying-reinforcement-learning
Repo	https://github.com/AIRLab-POLIMI/mushroom-rl
Framework	tf

HELFI: a Hebrew-Greek-Finnish Parallel Bible Corpus with Cross-Lingual Morpheme Alignment


Title	HELFI: a Hebrew-Greek-Finnish Parallel Bible Corpus with Cross-Lingual Morpheme Alignment
Authors	Anssi Yli-Jyrä, Josi Purhonen, Matti Liljeqvist, Arto Antturi, Pekka Nieminen, Kari M. Räntilä, Valtter Luoto
Abstract	Twenty-five years ago, morphologically aligned Hebrew-Finnish and Greek-Finnish bitexts (texts accompanied by a translation) were constructed manually in order to create an analytical concordance (Luoto et al., 1997) for a Finnish Bible translation. The creators of the bitexts recently secured the publisher’s permission to release its fine-grained alignment, but the alignment was still dependent on proprietary, third-party resources such as a copyrighted text edition and proprietary morphological analyses of the source texts. In this paper, we describe a nontrivial editorial process starting from the creation of the original one-purpose database and ending with its reconstruction using only freely available text editions and annotations. This process produced an openly available dataset that contains (i) the source texts and their translations, (ii) the morphological analyses, (iii) the cross-lingual morpheme alignments.
Tasks
Published	2020-03-16
URL	https://arxiv.org/abs/2003.07456v1
PDF	https://arxiv.org/pdf/2003.07456v1.pdf
PWC	https://paperswithcode.com/paper/helfi-a-hebrew-greek-finnish-parallel-bible
Repo	https://github.com/amikael/HELFI
Framework	none

Squeezed Deep 6DoF Object Detection Using Knowledge Distillation


Title	Squeezed Deep 6DoF Object Detection Using Knowledge Distillation
Authors	Heitor Felix, Walber M. Rodrigues, David Macêdo, Francisco Simões, Adriano L. I. Oliveira, Veronica Teichrieb, Cleber Zanchettin
Abstract	The detection of objects considering a 6DoF pose is a common requisite to build virtual and augmented reality applications. It is usually a complex task which requires real-time processing and high-precision results for an adequate user experience. Recently, different deep learning techniques have been proposed to detect objects in 6DoF in RGB images, but they rely on high-complexity networks, requiring a computational power that prevents them from working on mobile devices. In this paper, we propose an approach to reduce the complexity of 6DoF detection networks while maintaining accuracy. We used Knowledge Distillation to teach portable Convolutional Neural Networks (CNN) to learn from a real-time 6DoF detection CNN. The proposed method allows real-time applications using only RGB images while decreasing the hardware requirements. We used the LINEMOD dataset to evaluate the proposed method, and the experimental results show that the proposed method reduces the memory requirement by almost 99% in comparison to the original architecture, while halving the accuracy in one of the metrics. Code is available at https://github.com/heitorcfelix/singleshot6Dpose
Tasks	Object Detection
Published	2020-03-30
URL	https://arxiv.org/abs/2003.13586v2
PDF	https://arxiv.org/pdf/2003.13586v2.pdf
PWC	https://paperswithcode.com/paper/squeezed-deep-6dof-object-detection-using
Repo	https://github.com/heitorcfelix/singleshot6Dpose
Framework	pytorch
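
For readers unfamiliar with Knowledge Distillation, the following is a minimal sketch of the generic soft-target loss (temperature-scaled softmax plus KL divergence) commonly used to train a small student from a larger teacher. It is not the loss from this paper: a 6DoF detector also predicts pose outputs, and the abstract does not say how those are distilled. The function names and the temperature value are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened class distributions.

    The temperature**2 factor is the usual rescaling that keeps gradient
    magnitudes comparable across temperatures.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    return temperature**2 * kl


# The loss is zero when the student matches the teacher and grows as they diverge.
print(distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))
```

In practice this term is usually combined with the ordinary task loss on ground-truth labels, with a weighting chosen per task.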

Generalizing Spatial Transformers to Projective Geometry with Applications to 2D/3D Registration


Title	Generalizing Spatial Transformers to Projective Geometry with Applications to 2D/3D Registration
Authors	Cong Gao, Xingtong Liu, Wenhao Gu, Benjamin Killeen, Mehran Armand, Russell Taylor, Mathias Unberath
Abstract	Differentiable rendering is a technique to connect 3D scenes with corresponding 2D images. Since it is differentiable, processes during image formation can be learned. Previous approaches to differentiable rendering focus on mesh-based representations of 3D scenes, which is inappropriate for medical applications where volumetric, voxelized models are used to represent anatomy. We propose a novel Projective Spatial Transformer module that generalizes spatial transformers to projective geometry, thus enabling differentiable volume rendering. We demonstrate the usefulness of this architecture on the example of 2D/3D registration between radiographs and CT scans. Specifically, we show that our transformer enables end-to-end learning of an image processing and projection model that approximates an image similarity function that is convex with respect to the pose parameters, and can thus be optimized effectively using conventional gradient descent. To the best of our knowledge, this is the first time that spatial transformers have been described for projective geometry. The source code will be made public upon publication of this manuscript and we hope that our developments will benefit related 3D research applications.
Tasks
Published	2020-03-24
URL	https://arxiv.org/abs/2003.10987v1
PDF	https://arxiv.org/pdf/2003.10987v1.pdf
PWC	https://paperswithcode.com/paper/generalizing-spatial-transformers-to
Repo	https://github.com/gaocong13/Projective-Spatial-Transformers
Framework	pytorch

iTAML: An Incremental Task-Agnostic Meta-learning Approach


Title	iTAML: An Incremental Task-Agnostic Meta-learning Approach
Authors	Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Mubarak Shah
Abstract	Humans can continuously learn new knowledge as their experience grows. In contrast, previous learning in deep neural networks can quickly fade out when they are trained on a new task. In this paper, we hypothesize this problem can be avoided by learning a set of generalized parameters, that are neither specific to old nor new tasks. In this pursuit, we introduce a novel meta-learning approach that seeks to maintain an equilibrium between all the encountered tasks. This is ensured by a new meta-update rule which avoids catastrophic forgetting. In comparison to previous meta-learning techniques, our approach is task-agnostic. When presented with a continuum of data, our model automatically identifies the task and quickly adapts to it with just a single update. We perform extensive experiments on five datasets in a class-incremental setting, leading to significant improvements over the state of the art methods (e.g., a 21.3% boost on CIFAR100 with 10 incremental tasks). Specifically, on large-scale datasets that generally prove difficult cases for incremental learning, our approach delivers absolute gains as high as 19.1% and 7.4% on ImageNet and MS-Celeb datasets, respectively.
Tasks	Meta-Learning
Published	2020-03-25
URL	https://arxiv.org/abs/2003.11652v1
PDF	https://arxiv.org/pdf/2003.11652v1.pdf
PWC	https://paperswithcode.com/paper/itaml-an-incremental-task-agnostic-meta
Repo	https://github.com/brjathu/iTAML
Framework	pytorch

Learning Directly from Grammar Compressed Text


Title	Learning Directly from Grammar Compressed Text
Authors	Yoichi Sasaki, Kosuke Akimoto, Takanori Maehara
Abstract	Neural networks trained on large amounts of text data have been successfully applied to a variety of tasks. While massive text data is usually compressed using techniques such as grammar compression, almost all previous machine learning methods assume already decompressed sequence data as their input. In this paper, we propose a method to directly apply neural sequence models to text data compressed with grammar compression algorithms without decompression. To encode the unique symbols that appear in compression rules, we introduce composer modules to incrementally encode the symbols into vector representations. Through experiments on real datasets, we empirically show that the proposed model can achieve both memory and computational efficiency while maintaining moderate performance.
Tasks
Published	2020-02-28
URL	https://arxiv.org/abs/2002.12570v1
PDF	https://arxiv.org/pdf/2002.12570v1.pdf
PWC	https://paperswithcode.com/paper/learning-directly-from-grammar-compressed
Repo	https://github.com/aurtg/learning-on-grammar-compression
Framework	pytorch

Guiding Corpus-based Set Expansion by Auxiliary Sets Generation and Co-Expansion


Title	Guiding Corpus-based Set Expansion by Auxiliary Sets Generation and Co-Expansion
Authors	Jiaxin Huang, Yiqing Xie, Yu Meng, Jiaming Shen, Yunyi Zhang, Jiawei Han
Abstract	Given a small set of seed entities (e.g., "USA", "Russia"), corpus-based set expansion is to induce an extensive set of entities which share the same semantic class (Country in this example) from a given corpus. Set expansion benefits a wide range of downstream applications in knowledge discovery, such as web search, taxonomy construction, and query suggestion. Existing corpus-based set expansion algorithms typically bootstrap the given seeds by incorporating lexical patterns and distributional similarity. However, because no negative sets are provided explicitly, these methods suffer from semantic drift caused by expanding the seed set freely without guidance. We propose a new framework, Set-CoExpan, that automatically generates auxiliary sets as negative sets that are closely related to the target set of user's interest, and then performs multiple sets co-expansion that extracts discriminative features by comparing the target set with auxiliary sets, to form multiple cohesive sets that are distinctive from one another, thus resolving the semantic drift issue. In this paper we demonstrate that by generating auxiliary sets, we can guide the expansion process of the target set to avoid touching those ambiguous areas around the border with auxiliary sets, and we show that Set-CoExpan outperforms strong baseline methods significantly.
Tasks
Published	2020-01-27
URL	https://arxiv.org/abs/2001.10106v1
PDF	https://arxiv.org/pdf/2001.10106v1.pdf
PWC	https://paperswithcode.com/paper/guiding-corpus-based-set-expansion-by
Repo	https://github.com/teapot123/SetCoExpan
Framework	none

Affective Expression Analysis in-the-wild using Multi-Task Temporal Statistical Deep Learning Model


Title	Affective Expression Analysis in-the-wild using Multi-Task Temporal Statistical Deep Learning Model
Authors	Nhu-Tai Do, Tram-Tran Nguyen-Quynh, Soo-Hyung Kim
Abstract	Affective behavior analysis plays an important role in human-computer interaction, customer marketing, and health monitoring. The ABAW Challenge and the Aff-Wild2 dataset raise a new challenge: classifying basic emotions and regressing valence-arousal values in in-the-wild environments. In this paper, we present an affective expression analysis model that deals with the above challenges. Our approach includes STAT and Temporal Module for further fine-tuning of the face feature model. We experimented on the Aff-Wild2 dataset, a large-scale dataset for the ABAW Challenge with annotations for both categorical and valence-arousal emotion. We achieved an expression score of 0.543 and a valence-arousal score of 0.534 on the validation set.
Tasks
Published	2020-02-21
URL	https://arxiv.org/abs/2002.09120v3
PDF	https://arxiv.org/pdf/2002.09120v3.pdf
PWC	https://paperswithcode.com/paper/affective-expression-analysis-in-the-wild
Repo	https://github.com/dntai/dntai_fg20_affwild2_challenges
Framework	tf

Learning Deep Kernels for Non-Parametric Two-Sample Tests


Title	Learning Deep Kernels for Non-Parametric Two-Sample Tests
Authors	Feng Liu, Wenkai Xu, Jie Lu, Guangquan Zhang, Arthur Gretton, D. J. Sutherland
Abstract	We propose a class of kernel-based two-sample tests, which aim to determine whether two sets of samples are drawn from the same distribution. Our tests are constructed from kernels parameterized by deep neural nets, trained to maximize test power. These tests adapt to variations in distribution smoothness and shape over space, and are especially suited to high dimensions and complex data. By contrast, the simpler kernels used in prior kernel testing work are spatially homogeneous, and adaptive only in lengthscale. We explain how this scheme includes popular classifier-based two-sample tests as a special case, but improves on them in general. We provide the first proof of consistency for the proposed adaptation method, which applies both to kernels on deep features and to simpler radial basis kernels or multiple kernel learning. In experiments, we establish the superior performance of our deep kernels in hypothesis testing on benchmark and real-world data. The code of our deep-kernel-based two sample tests is available at https://github.com/fengliu90/DK-for-TST.
Tasks
Published	2020-02-21
URL	https://arxiv.org/abs/2002.09116v1
PDF	https://arxiv.org/pdf/2002.09116v1.pdf
PWC	https://paperswithcode.com/paper/learning-deep-kernels-for-non-parametric-two
Repo	https://github.com/fengliu90/DK-for-TST
Framework	pytorch
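
To make the notion of a kernel two-sample test concrete, here is a small sketch of the classical version: a biased estimate of squared Maximum Mean Discrepancy (MMD) with a fixed Gaussian kernel, calibrated by a permutation test. The paper's contribution is learning a deep kernel to maximize test power; that part is not shown here, and the fixed bandwidth, the function names, and the number of permutations are assumptions chosen for illustration.

```python
import math
import random


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as tuples of floats."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth**2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs (x, y)."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd2_biased(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD between two samples."""
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


def permutation_p_value(xs, ys, bandwidth=1.0, n_permutations=100, seed=0):
    """Estimate a p-value by re-splitting the pooled sample at random."""
    rng = random.Random(seed)
    observed = mmd2_biased(xs, ys, bandwidth)
    pooled = list(xs) + list(ys)
    n = len(xs)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if mmd2_biased(pooled[:n], pooled[n:], bandwidth) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_permutations + 1)


# Two clearly separated samples: a small p-value rejects "same distribution".
sample_a = [(0.1 * i, 0.0) for i in range(20)]
sample_b = [(5.0 + 0.1 * i, 0.0) for i in range(20)]
print(permutation_p_value(sample_a, sample_b))
```

A fixed Gaussian kernel like this adapts only through its bandwidth, which is the limitation of "simpler kernels" that the abstract contrasts with learned deep kernels.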

DeepSign: Deep On-Line Signature Verification


Title	DeepSign: Deep On-Line Signature Verification
Authors	Ruben Tolosana, Ruben Vera-Rodriguez, Julian Fierrez, Javier Ortega-Garcia
Abstract	Deep learning has become a breathtaking technology in recent years, overcoming traditional handcrafted approaches and even humans for many different tasks. However, in some tasks, such as the verification of handwritten signatures, the amount of publicly available data is scarce, which makes it difficult to test the real limits of deep learning. In addition to the lack of public data, it is not easy to evaluate the improvements of novel proposed approaches as different databases and experimental protocols are usually considered. The main contributions of this study are: i) we provide an in-depth analysis of state-of-the-art deep learning approaches for on-line signature verification, ii) we present and describe the new DeepSignDB on-line handwritten signature biometric public database, iii) we propose a standard experimental protocol and benchmark to be used by the research community in order to perform a fair comparison of novel approaches with the state of the art, and iv) we adapt and evaluate our recent deep learning approach named Time-Aligned Recurrent Neural Networks (TA-RNNs) for the task of on-line handwritten signature verification. This approach combines the potential of Dynamic Time Warping and Recurrent Neural Networks to train more robust systems against forgeries. Our proposed TA-RNN system outperforms the state of the art, achieving results even below 2.0% EER when considering skilled forgery impostors and just one training signature per user.
Tasks
Published	2020-02-24
URL	https://arxiv.org/abs/2002.10119v1
PDF	https://arxiv.org/pdf/2002.10119v1.pdf
PWC	https://paperswithcode.com/paper/deepsign-deep-on-line-signature-verification
Repo	https://github.com/BiDAlab/DeepSignDB
Framework	none
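
The abstract mentions Dynamic Time Warping (DTW) as one of the two ingredients of TA-RNNs. The sketch below shows only the classic DTW distance between two one-dimensional sequences, as a reminder of what the alignment step computes. How TA-RNNs feed aligned sequences into the recurrent network is not described in the abstract and is not modeled here; real on-line signatures are also multi-channel time functions rather than the scalar sequences assumed in this example.

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW distance between two 1-D sequences.

    cost[i][j] holds the minimal accumulated absolute difference when
    aligning the first i elements of a with the first j elements of b.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# The second sequence repeats one sample; warping absorbs the repetition.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
print(dtw_distance([0, 0], [1, 1]))           # 2.0
```

The full table costs O(n·m) time and memory, which is acceptable for short sequences such as a single signature.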