February 2, 2020

3026 words 15 mins read

Paper Group AWR 48



Domain Adaptation of Neural Machine Translation by Lexicon Induction

Title Domain Adaptation of Neural Machine Translation by Lexicon Induction
Authors Junjie Hu, Mengzhou Xia, Graham Neubig, Jaime Carbonell
Abstract It has been previously noted that neural machine translation (NMT) is very sensitive to domain shift. In this paper, we argue that this is a dual effect of the highly lexicalized nature of NMT, resulting in failure for sentences with large numbers of unknown words, and lack of supervision for domain-specific words. To remedy this problem, we propose an unsupervised adaptation method which fine-tunes a pre-trained out-of-domain NMT model using a pseudo-in-domain corpus. Specifically, we perform lexicon induction to extract an in-domain lexicon, and construct a pseudo-parallel in-domain corpus by performing word-for-word back-translation of monolingual in-domain target sentences. In five domains over twenty pairwise adaptation settings and two model architectures, our method achieves consistent improvements without using any in-domain parallel sentences, improving up to 14 BLEU over unadapted models, and up to 2 BLEU over strong back-translation baselines.
Tasks Domain Adaptation, Machine Translation
Published 2019-06-02
URL https://arxiv.org/abs/1906.00376v1
PDF https://arxiv.org/pdf/1906.00376v1.pdf
PWC https://paperswithcode.com/paper/190600376
Repo https://github.com/junjiehu/dali
Framework pytorch
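
The core corpus-construction step described in the abstract can be sketched in a few lines: word-for-word back-translation of monolingual in-domain target sentences through an induced target-to-source lexicon. The lexicon entries and sentences below are toy placeholders, not the paper's induced lexicon.

```python
# Minimal sketch of pseudo-parallel corpus construction via an induced lexicon.
induced_lexicon = {        # hypothetical induced target->source word pairs
    "fever": "fieber",
    "patient": "patient",
    "the": "der",
}

def back_translate(target_sentence, lexicon):
    """Map each target word to its induced source translation (copy if unknown)."""
    return " ".join(lexicon.get(w, w) for w in target_sentence.lower().split())

in_domain_targets = ["The patient has fever"]
pseudo_parallel = [(back_translate(t, induced_lexicon), t) for t in in_domain_targets]
print(pseudo_parallel)  # [(pseudo-source sentence, original in-domain target)]
# These (pseudo-source, target) pairs are then used to fine-tune the
# pre-trained out-of-domain NMT model.
```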

StegaStamp: Invisible Hyperlinks in Physical Photographs

Title StegaStamp: Invisible Hyperlinks in Physical Photographs
Authors Matthew Tancik, Ben Mildenhall, Ren Ng
Abstract Printed and digitally displayed photos have the ability to hide imperceptible digital data that can be accessed through internet-connected imaging systems. Another way to think about this is physical photographs that have unique QR codes invisibly embedded within them. This paper presents an architecture, algorithms, and a prototype implementation addressing this vision. Our key technical contribution is StegaStamp, a learned steganographic algorithm to enable robust encoding and decoding of arbitrary hyperlink bitstrings into photos in a manner that approaches perceptual invisibility. StegaStamp comprises a deep neural network that learns an encoding/decoding algorithm robust to image perturbations approximating the space of distortions resulting from real printing and photography. We demonstrate real-time decoding of hyperlinks in photos from in-the-wild videos that contain variation in lighting, shadows, perspective, occlusion and viewing distance. Our prototype system robustly retrieves 56-bit hyperlinks after error correction - sufficient to embed a unique code within every photo on the internet.
Tasks Steganographics
Published 2019-04-10
URL https://arxiv.org/abs/1904.05343v2
PDF https://arxiv.org/pdf/1904.05343v2.pdf
PWC https://paperswithcode.com/paper/stegastamp-invisible-hyperlinks-in-physical
Repo https://github.com/tancik/StegaStamp
Framework tf
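
A rough sketch of the encode-perturb-decode training idea: an encoder hides a bitstring in the image as a small residual, a perturbation stands in for print/photo distortions, and a decoder recovers the bits. The network sizes, the perturbation set, and the 100-bit payload below are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

BITS = 100

encoder = nn.Sequential(nn.Linear(3 * 64 * 64 + BITS, 512), nn.ReLU(),
                        nn.Linear(512, 3 * 64 * 64), nn.Tanh())
decoder = nn.Sequential(nn.Linear(3 * 64 * 64, 512), nn.ReLU(),
                        nn.Linear(512, BITS))

def perturb(img):
    # Differentiable stand-ins for print/photo distortions: noise + brightness shift.
    return img + 0.05 * torch.randn_like(img) + 0.1 * torch.rand(1)

image = torch.rand(8, 3 * 64 * 64)             # flattened toy images
bits = torch.randint(0, 2, (8, BITS)).float()  # random payloads

residual = encoder(torch.cat([image, bits], dim=1))
encoded = (image + 0.1 * residual).clamp(0, 1)   # near-invisible perturbation of the photo
logits = decoder(perturb(encoded))

loss = nn.functional.binary_cross_entropy_with_logits(logits, bits) \
     + nn.functional.mse_loss(encoded, image)    # recover bits while staying close to the image
loss.backward()
```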

Cross-View Policy Learning for Street Navigation

Title Cross-View Policy Learning for Street Navigation
Authors Ang Li, Huiyi Hu, Piotr Mirowski, Mehrdad Farajtabar
Abstract The ability to navigate from visual observations in unfamiliar environments is a core component of intelligent agents and an ongoing challenge for Deep Reinforcement Learning (RL). Street View can be a sensible testbed for such RL agents, because it provides real-world photographic imagery at ground level, with diverse street appearances; it has been made into an interactive environment called StreetLearn and used for research on navigation. However, goal-driven street navigation agents have not so far been able to transfer to unseen areas without extensive retraining, and relying on simulation is not a scalable solution. Since aerial images are easily and globally accessible, we propose instead to train a multi-modal policy on ground and aerial views, then transfer the ground view policy to unseen (target) parts of the city by utilizing aerial view observations. Our core idea is to pair the ground view with an aerial view and to learn a joint policy that is transferable across views. We achieve this by learning a similar embedding space for both views, distilling the policy across views and dropping out visual modalities. We further reformulate the transfer learning paradigm into three stages: 1) cross-modal training, when the agent is initially trained on multiple city regions, 2) aerial view-only adaptation to a new area, when the agent is adapted to a held-out region using only the easily obtainable aerial view, and 3) ground view-only transfer, when the agent is tested on navigation tasks on unseen ground views, without aerial imagery. Experimental results suggest that the proposed cross-view policy learning enables better generalization of the agent and allows for more effective transfer to unseen environments.
Tasks Transfer Learning
Published 2019-06-13
URL https://arxiv.org/abs/1906.05930v2
PDF https://arxiv.org/pdf/1906.05930v2.pdf
PWC https://paperswithcode.com/paper/cross-view-policy-learning-for-street
Repo https://github.com/deepmind/streetlearn
Framework tf
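
A hedged sketch of the cross-view idea: embed ground and aerial observations into a shared space, randomly drop one modality during training, and distill the policy across views. Feature sizes, the action space, and the dropout scheme are illustrative assumptions.

```python
import torch
import torch.nn as nn

ground_enc = nn.Linear(128, 64)   # stand-in ground-view encoder
aerial_enc = nn.Linear(128, 64)   # stand-in aerial-view encoder
policy = nn.Linear(64, 5)         # 5 discrete navigation actions (assumed)

ground = torch.randn(16, 128)
aerial = torch.randn(16, 128)

g, a = ground_enc(ground), aerial_enc(aerial)
embed_loss = nn.functional.mse_loss(g, a)        # pull the two views into a similar embedding

# Modality dropout: with some probability use only one view at training time.
keep_ground = torch.rand(16, 1) > 0.5
fused = torch.where(keep_ground, g, a)

logits_fused = policy(fused)
logits_aerial = policy(a)
distill_loss = nn.functional.kl_div(             # distill the policy across views
    logits_aerial.log_softmax(-1), logits_fused.softmax(-1).detach(),
    reduction="batchmean")

(embed_loss + distill_loss).backward()
```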

A Comprehensive Overhaul of Feature Distillation

Title A Comprehensive Overhaul of Feature Distillation
Authors Byeongho Heo, Jeesoo Kim, Sangdoo Yun, Hyojin Park, Nojun Kwak, Jin Young Choi
Abstract We investigate the design aspects of feature distillation methods achieving network compression and propose a novel feature distillation method in which the distillation loss is designed to make a synergy among various aspects: teacher transform, student transform, distillation feature position and distance function. Our proposed distillation loss includes a feature transform with a newly designed margin ReLU, a new distillation feature position, and a partial L2 distance function to skip redundant information that adversely affects the compression of the student. On ImageNet, our proposed method achieves a top-1 error of 21.65% with ResNet50, which outperforms the performance of the teacher network, ResNet152. Our proposed method is evaluated on various tasks such as image classification, object detection and semantic segmentation and achieves a significant performance improvement in all tasks. The code is available at https://sites.google.com/view/byeongho-heo/overhaul
Tasks Image Classification, Object Detection, Semantic Segmentation
Published 2019-04-03
URL https://arxiv.org/abs/1904.01866v2
PDF https://arxiv.org/pdf/1904.01866v2.pdf
PWC https://paperswithcode.com/paper/a-comprehensive-overhaul-of-feature
Repo https://github.com/clovaai/overhaul-distillation
Framework pytorch
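
The two loss ingredients named in the abstract can be sketched as follows. The margin value and the exact "skip" condition are plausible readings used for illustration; the paper derives the margin per channel from teacher statistics.

```python
import torch

def margin_relu(t, margin=-0.5):
    # Teacher transform: elementwise max(x, m) with a negative margin m
    # (-0.5 is a placeholder; the paper computes m from teacher statistics).
    return torch.clamp(t, min=margin)

def partial_l2(teacher, student):
    # Skip positions where the teacher target is negative and the student is
    # already below it; penalize the squared difference elsewhere.
    skip = (teacher <= 0) & (student <= teacher)
    diff2 = (teacher - student) ** 2
    return torch.where(skip, torch.zeros_like(diff2), diff2).mean()

t = torch.randn(4, 16)                       # toy teacher features
s = torch.randn(4, 16, requires_grad=True)   # toy student features
loss = partial_l2(margin_relu(t), s)
loss.backward()
```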

DFuseNet: Deep Fusion of RGB and Sparse Depth Information for Image Guided Dense Depth Completion

Title DFuseNet: Deep Fusion of RGB and Sparse Depth Information for Image Guided Dense Depth Completion
Authors Shreyas S. Shivakumar, Ty Nguyen, Ian D. Miller, Steven W. Chen, Vijay Kumar, Camillo J. Taylor
Abstract In this paper we propose a convolutional neural network that is designed to upsample a series of sparse range measurements based on the contextual cues gleaned from a high resolution intensity image. Our approach draws inspiration from related work on super-resolution and in-painting. We propose a novel architecture that seeks to pull contextual cues separately from the intensity image and the depth features and then fuse them later in the network. We argue that this approach effectively exploits the relationship between the two modalities and produces accurate results while respecting salient image structures. We present experimental results to demonstrate that our approach is comparable with state of the art methods and generalizes well across multiple datasets.
Tasks Depth Completion, Super-Resolution
Published 2019-02-02
URL https://arxiv.org/abs/1902.00761v2
PDF https://arxiv.org/pdf/1902.00761v2.pdf
PWC https://paperswithcode.com/paper/dfusenet-deep-fusion-of-rgb-and-sparse-depth
Repo https://github.com/ShreyasSkandanS/DFuseNet
Framework pytorch
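
A minimal two-branch sketch of the "extract separately, fuse late" idea: one branch for the RGB image, one for the sparse depth input, concatenated and decoded into a dense depth map. Layer sizes are illustrative only, not the DFuseNet architecture.

```python
import torch
import torch.nn as nn

class TwoBranchFusion(nn.Module):
    def __init__(self):
        super().__init__()
        self.rgb_branch = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.depth_branch = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        self.fuse = nn.Sequential(nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(32, 1, 1))   # dense depth output

    def forward(self, rgb, sparse_depth):
        # Pull contextual cues from each modality separately, then fuse late.
        f = torch.cat([self.rgb_branch(rgb), self.depth_branch(sparse_depth)], dim=1)
        return self.fuse(f)

net = TwoBranchFusion()
dense = net(torch.rand(1, 3, 64, 64), torch.rand(1, 1, 64, 64))
print(dense.shape)  # torch.Size([1, 1, 64, 64])
```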

Fast and Efficient Zero-Learning Image Fusion

Title Fast and Efficient Zero-Learning Image Fusion
Authors Fayez Lahoud, Sabine Süsstrunk
Abstract We propose a real-time image fusion method using pre-trained neural networks. Our method generates a single image containing features from multiple sources. We first decompose images into a base layer representing large scale intensity variations, and a detail layer containing small scale changes. We use visual saliency to fuse the base layers, and deep feature maps extracted from a pre-trained neural network to fuse the detail layers. We conduct ablation studies to analyze our method’s parameters such as decomposition filters, weight construction methods, and network depth and architecture. Then, we validate its effectiveness and speed on thermal, medical, and multi-focus fusion. We also apply it to multiple image inputs such as multi-exposure sequences. The experimental results demonstrate that our technique achieves state-of-the-art performance in visual quality, objective assessment, and runtime efficiency.
Tasks
Published 2019-05-09
URL https://arxiv.org/abs/1905.03590v1
PDF https://arxiv.org/pdf/1905.03590v1.pdf
PWC https://paperswithcode.com/paper/190503590
Repo https://github.com/IVRL/Fast-Zero-Learning-Fusion
Framework pytorch
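
A sketch of the two-scale fusion pipeline from the abstract, with simple stand-ins: a box filter for the base/detail decomposition, gradient magnitude as the "saliency" weight for the base layers, and a max-abs rule in place of the paper's deep-feature weights for the detail layers.

```python
import numpy as np

def box_blur(img, k=5):
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def fuse(img_a, img_b):
    base_a, base_b = box_blur(img_a), box_blur(img_b)
    detail_a, detail_b = img_a - base_a, img_b - base_b
    # Saliency weights for the base layers (gradient magnitude as a proxy).
    sal_a = np.abs(np.gradient(img_a)[0]) + np.abs(np.gradient(img_a)[1])
    sal_b = np.abs(np.gradient(img_b)[0]) + np.abs(np.gradient(img_b)[1])
    w = sal_a / (sal_a + sal_b + 1e-8)
    fused_base = w * base_a + (1 - w) * base_b
    # Detail layers fused by a max-abs rule (deep-feature weights in the paper).
    fused_detail = np.where(np.abs(detail_a) >= np.abs(detail_b), detail_a, detail_b)
    return fused_base + fused_detail

a, b = np.random.rand(64, 64), np.random.rand(64, 64)
print(fuse(a, b).shape)  # (64, 64)
```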

Diffusion Variational Autoencoders

Title Diffusion Variational Autoencoders
Authors Luis A. Pérez Rey, Vlado Menkovski, Jacobus W. Portegies
Abstract A standard Variational Autoencoder, with a Euclidean latent space, is structurally incapable of capturing topological properties of certain datasets. To remove topological obstructions, we introduce Diffusion Variational Autoencoders with arbitrary manifolds as a latent space. A Diffusion Variational Autoencoder uses transition kernels of Brownian motion on the manifold. In particular, it uses properties of the Brownian motion to implement the reparametrization trick and fast approximations to the KL divergence. We show that the Diffusion Variational Autoencoder is capable of capturing topological properties of synthetic datasets. Additionally, we train on MNIST with spheres, tori, projective spaces, SO(3), and a torus embedded in R^3 as latent spaces. Although a natural dataset like MNIST does not have latent variables with a clear-cut topological structure, training it on a manifold can still highlight topological and geometrical properties.
Tasks
Published 2019-01-25
URL http://arxiv.org/abs/1901.08991v2
PDF http://arxiv.org/pdf/1901.08991v2.pdf
PWC https://paperswithcode.com/paper/diffusion-variational-autoencoders
Repo https://github.com/luis-armando-perez-rey/diffusion_vae_github
Framework tf
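
A hedged sketch of the reparametrization idea for a sphere latent space: approximate a Brownian-motion transition kernel by a few small tangent-space steps, re-projecting onto the sphere after each step. The step count and scale are illustrative; the paper uses proper kernel approximations rather than this random walk.

```python
import torch

def sphere_reparam(mu, t=0.1, n_steps=5):
    """mu: (batch, 3) points near the unit sphere; returns diffused samples."""
    z = mu / mu.norm(dim=-1, keepdim=True)
    step = (t / n_steps) ** 0.5
    for _ in range(n_steps):
        noise = step * torch.randn_like(z)
        noise = noise - (noise * z).sum(-1, keepdim=True) * z  # project onto the tangent plane
        z = z + noise
        z = z / z.norm(dim=-1, keepdim=True)                   # re-project onto the sphere
    return z

mu = torch.randn(4, 3, requires_grad=True)
z = sphere_reparam(mu)
z.sum().backward()   # gradients flow through the sampling, as in the usual reparametrization trick
```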

Sampling, Intervention, Prediction, Aggregation: A Generalized Framework for Model-Agnostic Interpretations

Title Sampling, Intervention, Prediction, Aggregation: A Generalized Framework for Model-Agnostic Interpretations
Authors Christian A. Scholbeck, Christoph Molnar, Christian Heumann, Bernd Bischl, Giuseppe Casalicchio
Abstract Model-agnostic interpretation techniques allow us to explain the behavior of any predictive model. Due to different notations and terminology, it is difficult to see how they are related. A unified view on these methods has been missing. We present the generalized SIPA (sampling, intervention, prediction, aggregation) framework of work stages for model-agnostic interpretations and demonstrate how several prominent methods for feature effects can be embedded into the proposed framework. Furthermore, we extend the framework to feature importance computations by pointing out how variance-based and performance-based importance measures are based on the same work stages. The SIPA framework reduces the diverse set of model-agnostic techniques to a single methodology and establishes a common terminology to discuss them in future work.
Tasks Feature Importance
Published 2019-04-08
URL https://arxiv.org/abs/1904.03959v4
PDF https://arxiv.org/pdf/1904.03959v4.pdf
PWC https://paperswithcode.com/paper/sampling-intervention-prediction-aggregation
Repo https://github.com/koalaverse/vip
Framework none
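
To make the four work stages concrete, here is partial dependence (a standard feature-effect method) written out as sampling, intervention, prediction, and aggregation. The model is a toy stand-in; `predict` plays the role of any fitted model's prediction function.

```python
import numpy as np

def predict(X):                         # stand-in for model.predict
    return 2 * X[:, 0] + np.sin(X[:, 1])

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))           # Sampling: draw (or reuse) observations

def partial_dependence(X, feature, grid):
    effects = []
    for value in grid:
        X_int = X.copy()
        X_int[:, feature] = value       # Intervention: set the feature to a grid value
        preds = predict(X_int)          # Prediction: query the model
        effects.append(preds.mean())    # Aggregation: average over the data
    return np.array(effects)

grid = np.linspace(-2, 2, 5)
print(partial_dependence(X, feature=0, grid=grid))
```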

CenterNet: Keypoint Triplets for Object Detection

Title CenterNet: Keypoint Triplets for Object Detection
Authors Kaiwen Duan, Song Bai, Lingxi Xie, Honggang Qi, Qingming Huang, Qi Tian
Abstract In object detection, keypoint-based approaches often suffer a large number of incorrect object bounding boxes, arguably due to the lack of an additional look into the cropped regions. This paper presents an efficient solution which explores the visual patterns within each cropped region with minimal costs. We build our framework upon a representative one-stage keypoint-based detector named CornerNet. Our approach, named CenterNet, detects each object as a triplet, rather than a pair, of keypoints, which improves both precision and recall. Accordingly, we design two customized modules named cascade corner pooling and center pooling, which play the roles of enriching information collected by both top-left and bottom-right corners and providing more recognizable information at the central regions, respectively. On the MS-COCO dataset, CenterNet achieves an AP of 47.0%, which outperforms all existing one-stage detectors by at least 4.9%. Meanwhile, with a faster inference speed, CenterNet demonstrates quite comparable performance to the top-ranked two-stage detectors. Code is available at https://github.com/Duankaiwen/CenterNet.
Tasks Object Detection
Published 2019-04-17
URL http://arxiv.org/abs/1904.08189v3
PDF http://arxiv.org/pdf/1904.08189v3.pdf
PWC https://paperswithcode.com/paper/centernet-object-detection-with-keypoint
Repo https://github.com/Duankaiwen/CenterNet
Framework pytorch
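
A sketch of the triplet check described in the abstract: a corner-pair box is kept only if a detected center keypoint of the same class falls inside the box's central region. The fixed central-region ratio below is a simplification; the paper scales the region with box size.

```python
def central_region(box, ratio=1/3):
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    return (cx - ratio * w / 2, cy - ratio * h / 2,
            cx + ratio * w / 2, cy + ratio * h / 2)

def filter_by_centers(corner_boxes, center_points):
    """corner_boxes: [(box, cls, score)], center_points: [(x, y, cls)]."""
    kept = []
    for box, cls, score in corner_boxes:
        rx1, ry1, rx2, ry2 = central_region(box)
        if any(c == cls and rx1 <= x <= rx2 and ry1 <= y <= ry2
               for x, y, c in center_points):
            kept.append((box, cls, score))
    return kept

boxes = [((10, 10, 50, 50), "person", 0.9), ((0, 0, 20, 20), "dog", 0.8)]
centers = [(30, 30, "person")]
print(filter_by_centers(boxes, centers))  # only the person box survives the center check
```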

CatGAN: Category-aware Generative Adversarial Networks with Hierarchical Evolutionary Learning for Category Text Generation

Title CatGAN: Category-aware Generative Adversarial Networks with Hierarchical Evolutionary Learning for Category Text Generation
Authors Zhiyue Liu, Jiahai Wang, Zhiwei Liang
Abstract Generating multiple categories of texts is a challenging task and draws more and more attention. Since generative adversarial nets (GANs) have shown competitive results on general text generation, they have been extended for category text generation in some previous works. However, the complicated model structures and learning strategies limit their performance and exacerbate the training instability. This paper proposes a category-aware GAN (CatGAN) which consists of an efficient category-aware model for category text generation and a hierarchical evolutionary learning algorithm for training our model. The category-aware model directly measures the gap between real samples and generated samples on each category; reducing this gap then guides the model to generate high-quality category samples. The Gumbel-Softmax relaxation further frees our model from complicated learning strategies for updating CatGAN on discrete data. Moreover, focusing only on sample quality typically leads to the mode collapse problem, so a hierarchical evolutionary learning algorithm is introduced to stabilize the training procedure and obtain the trade-off between quality and diversity while training CatGAN. Experimental results demonstrate that CatGAN outperforms most of the existing state-of-the-art methods.
Tasks Text Generation
Published 2019-11-15
URL https://arxiv.org/abs/1911.06641v2
PDF https://arxiv.org/pdf/1911.06641v2.pdf
PWC https://paperswithcode.com/paper/catgan-category-aware-generative-adversarial
Repo https://github.com/williamSYSU/CatGAN
Framework pytorch
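
A minimal sketch of the Gumbel-Softmax relaxation that lets gradients flow through discrete token sampling. The vocabulary size, temperature, and embedding table are placeholders; this is not the full CatGAN generator.

```python
import torch
import torch.nn.functional as F

vocab_size, batch = 50, 8
logits = torch.randn(batch, vocab_size, requires_grad=True)  # generator output at one step

# Soft relaxation vs. straight-through one-hot samples.
soft_tokens = F.gumbel_softmax(logits, tau=1.0, hard=False)   # differentiable, non-discrete
hard_tokens = F.gumbel_softmax(logits, tau=1.0, hard=True)    # one-hot forward, soft backward

embedding = torch.randn(vocab_size, 32)          # toy token embedding table
token_vecs = hard_tokens @ embedding             # usable downstream, still differentiable
token_vecs.sum().backward()
print(logits.grad.shape)                         # gradients reach the generator logits
```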

Generating valid Euclidean distance matrices

Title Generating valid Euclidean distance matrices
Authors Moritz Hoffmann, Frank Noé
Abstract Generating point clouds, e.g., molecular structures, in arbitrary rotations, translations, and enumerations remains a challenging task. Meanwhile, neural networks utilizing symmetry invariant layers have been shown to be able to optimize their training objective in a data-efficient way. In this spirit, we present an architecture which allows us to produce valid Euclidean distance matrices, which by construction are already invariant under rotation and translation of the described object. Motivated by the goal to generate molecular structures in Cartesian space, we use this architecture to construct a Wasserstein GAN utilizing a permutation invariant critic network. This makes it possible to generate molecular structures in a one-shot fashion by producing Euclidean distance matrices which have a three-dimensional embedding.
Tasks
Published 2019-10-07
URL https://arxiv.org/abs/1910.03131v2
PDF https://arxiv.org/pdf/1910.03131v2.pdf
PWC https://paperswithcode.com/paper/generating-valid-euclidean-distance-matrices-1
Repo https://github.com/noegroup/EDMnets
Framework tf
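
A sketch of the distance-matrix view used above: the matrix of squared pairwise distances between 3D points is a valid Euclidean distance matrix and is invariant to rotation and translation, and classical MDS recovers a 3D embedding from it. The network side of the paper is omitted; this only illustrates the EDM/embedding relationship.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))                       # toy "molecular" coordinates

sq = np.sum(X**2, axis=1)
D = sq[:, None] + sq[None, :] - 2 * X @ X.T        # squared Euclidean distance matrix

# Classical MDS: double-center, eigendecompose, keep the top 3 components.
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
G = -0.5 * J @ D @ J                               # Gram matrix of centered points
w, V = np.linalg.eigh(G)
Y = V[:, -3:] * np.sqrt(np.maximum(w[-3:], 0))     # recovered 3D embedding

# Pairwise distances are preserved up to numerical error.
sq_y = np.sum(Y**2, axis=1)
D_y = sq_y[:, None] + sq_y[None, :] - 2 * Y @ Y.T
print(np.allclose(D, D_y, atol=1e-6))              # True
```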

Deep-Emotion: Facial Expression Recognition Using Attentional Convolutional Network

Title Deep-Emotion: Facial Expression Recognition Using Attentional Convolutional Network
Authors Shervin Minaee, Amirali Abdolrashidi
Abstract Facial expression recognition has been an active research area over the past few decades, and it is still challenging due to the high intra-class variation. Traditional approaches for this problem rely on hand-crafted features such as SIFT, HOG and LBP, followed by a classifier trained on a database of images or videos. Most of these works perform reasonably well on datasets of images captured in a controlled condition, but fail to perform as well on more challenging datasets with more image variation and partial faces. In recent years, several works proposed an end-to-end framework for facial expression recognition using deep learning models. Despite the better performance of these works, there still seems to be great room for improvement. In this work, we propose a deep learning approach based on an attentional convolutional network, which is able to focus on important parts of the face, and achieves significant improvement over previous models on multiple datasets, including FER-2013, CK+, FERG, and JAFFE. We also use a visualization technique which is able to find important face regions for detecting different emotions, based on the classifier’s output. Through experimental results, we show that different emotions seem to be sensitive to different parts of the face.
Tasks Facial Expression Recognition
Published 2019-02-04
URL http://arxiv.org/abs/1902.01019v1
PDF http://arxiv.org/pdf/1902.01019v1.pdf
PWC https://paperswithcode.com/paper/deep-emotion-facial-expression-recognition
Repo https://github.com/omarsayed7/Deep-Emotion
Framework pytorch
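
A hedged sketch of "attention over face regions": a small branch predicts a spatial mask that reweights convolutional features before classification. This is a generic soft-attention stand-in used for illustration, not the paper's exact attention module.

```python
import torch
import torch.nn as nn

features = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU())
attention = nn.Sequential(nn.Conv2d(8, 1, 1), nn.Sigmoid())   # per-pixel importance mask
classifier = nn.Linear(8, 7)                                  # 7 expression classes

x = torch.rand(4, 1, 48, 48)            # FER-2013-sized grayscale faces
f = features(x)
mask = attention(f)                     # (4, 1, 48, 48), highlights informative regions
pooled = (f * mask).mean(dim=(2, 3))    # attention-weighted global pooling
logits = classifier(pooled)
print(logits.shape)                     # torch.Size([4, 7])
```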

Seq2Seq RNN based Gait Anomaly Detection from Smartphone Acquired Multimodal Motion Data

Title Seq2Seq RNN based Gait Anomaly Detection from Smartphone Acquired Multimodal Motion Data
Authors Riccardo Bonetto, Mattia Soldan, Alberto Lanaro, Simone Milani, Michele Rossi
Abstract Smartphones and wearable devices are fast growing technologies that, in conjunction with advances in wireless sensor hardware, are enabling ubiquitous sensing applications. Wearables are suitable for indoor and outdoor scenarios, can be placed on many parts of the human body and can integrate a large number of sensors capable of gathering physiological and behavioral biometric information. Here, we are concerned with gait analysis systems that extract meaningful information from a user’s movements to identify anomalies and changes in their walking style. The solution that is put forward is subject-specific, as the designed feature extraction and classification tools are trained on the subject under observation. A smartphone mounted on a custom-made chest support is utilized to gather inertial data and video signals from its built-in sensors and rear-facing camera. The collected video and inertial data are preprocessed, combined and then classified by means of a Recurrent Neural Network (RNN) based Sequence-to-Sequence (Seq2Seq) model, which is used as a feature extractor, and a subsequent Convolutional Neural Network (CNN) classifier. This architecture provides excellent results, correctly assessing anomalies in 100% of the considered test cases and surpassing the performance of support vector machine classifiers.
Tasks Anomaly Detection
Published 2019-11-19
URL https://arxiv.org/abs/1911.08608v1
PDF https://arxiv.org/pdf/1911.08608v1.pdf
PWC https://paperswithcode.com/paper/seq2seq-rnn-based-gait-anomaly-detection-from
Repo https://github.com/Soldelli/gait_anomaly_detection
Framework tf
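
A rough sketch of the two-stage pipeline described above: a Seq2Seq RNN (here a GRU encoder/decoder trained to reconstruct inertial windows) as the feature extractor, followed by a small classifier on the encoder state. All sizes and the classifier head are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

enc = nn.GRU(input_size=6, hidden_size=32, batch_first=True)   # 6 inertial channels (assumed)
dec = nn.GRU(input_size=6, hidden_size=32, batch_first=True)
proj = nn.Linear(32, 6)
clf = nn.Linear(32, 2)                                          # normal vs anomalous gait

x = torch.randn(8, 100, 6)                     # windows of 100 motion samples
_, h = enc(x)                                  # h: (1, 8, 32) sequence summary
recon, _ = dec(x, h)                           # teacher-forced reconstruction
recon = proj(recon)

recon_loss = nn.functional.mse_loss(recon, x)  # Seq2Seq trained as a reconstructor / feature extractor
logits = clf(h.squeeze(0))                     # classifier on the extracted features
print(recon_loss.item(), logits.shape)
```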

Importance of Copying Mechanism for News Headline Generation

Title Importance of Copying Mechanism for News Headline Generation
Authors Ilya Gusev
Abstract News headline generation is an essential problem of text summarization because it is constrained, well-defined, and still hard to solve. Models with a limited vocabulary cannot solve it well, as new named entities appear regularly in the news and these entities often should be in the headline. News articles in morphologically rich languages such as Russian require model modifications due to a large number of possible word forms. This study aims to validate that models that can copy words from the original article perform better than models without such an option. The proposed model achieves a mean ROUGE score of 23 on the provided test dataset, which is 8 points greater than the result of a similar model without a copying mechanism. Moreover, the resulting model performs better than any known model on the new dataset of Russian news.
Tasks Text Summarization
Published 2019-04-25
URL http://arxiv.org/abs/1904.11475v1
PDF http://arxiv.org/pdf/1904.11475v1.pdf
PWC https://paperswithcode.com/paper/importance-of-copying-mechanism-for-news
Repo https://github.com/IlyaGusev/summarus
Framework pytorch
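
One standard way to implement such a copying mechanism is the pointer-generator style mixture sketched below: the output distribution mixes a vocabulary softmax with attention weights scattered onto the source token ids, so out-of-vocabulary names from the article stay reachable. The sizes and the p_gen value are toy placeholders, and this is not necessarily the exact formulation used in the paper.

```python
import torch

vocab_size, src_len = 20, 6
vocab_dist = torch.softmax(torch.randn(vocab_size), dim=0)   # generator's word distribution
attention = torch.softmax(torch.randn(src_len), dim=0)       # attention over source tokens
src_ids = torch.tensor([3, 7, 7, 15, 2, 19])                 # source tokens as vocabulary ids
p_gen = torch.tensor(0.7)                                    # probability of generating vs copying

copy_dist = torch.zeros(vocab_size).scatter_add_(0, src_ids, attention)
final_dist = p_gen * vocab_dist + (1 - p_gen) * copy_dist
print(final_dist.sum())   # ~1.0: still a valid distribution over the vocabulary
```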

Exploring the Limitations of Behavior Cloning for Autonomous Driving

Title Exploring the Limitations of Behavior Cloning for Autonomous Driving
Authors Felipe Codevilla, Eder Santana, Antonio M. López, Adrien Gaidon
Abstract Driving requires reacting to a wide variety of complex environment conditions and agent behaviors. Explicitly modeling each possible scenario is unrealistic. In contrast, imitation learning can, in theory, leverage data from large fleets of human-driven cars. Behavior cloning in particular has been successfully used to learn simple visuomotor policies end-to-end, but scaling to the full spectrum of driving behaviors remains an unsolved problem. In this paper, we propose a new benchmark to experimentally investigate the scalability and limitations of behavior cloning. We show that behavior cloning leads to state-of-the-art results, including in unseen environments, executing complex lateral and longitudinal maneuvers without these reactions being explicitly programmed. However, we confirm well-known limitations (due to dataset bias and overfitting), new generalization issues (due to dynamic objects and the lack of a causal model), and training instability requiring further research before behavior cloning can graduate to real-world driving. The code of the studied behavior cloning approaches can be found at https://github.com/felipecode/coiltraine .
Tasks Autonomous Driving, Imitation Learning
Published 2019-04-18
URL http://arxiv.org/abs/1904.08980v1
PDF http://arxiv.org/pdf/1904.08980v1.pdf
PWC https://paperswithcode.com/paper/exploring-the-limitations-of-behavior-cloning
Repo https://github.com/felipecode/coiltraine
Framework none
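
For reference, behavior cloning itself reduces to supervised regression from observations to recorded expert actions, as in the minimal sketch below. The observation/action sizes and the MLP policy are toy placeholders, not the benchmark's driving model.

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 2))  # e.g. steer, throttle
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

obs = torch.randn(256, 64)           # stand-in for perception features from human driving logs
expert_actions = torch.randn(256, 2)

for _ in range(10):                  # a few supervised steps on the demonstration data
    loss = nn.functional.mse_loss(policy(obs), expert_actions)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(loss.item())
```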