February 2, 2020

3026 words 15 mins read

Paper Group AWR 48



Domain Adaptation of Neural Machine Translation by Lexicon Induction

Title Domain Adaptation of Neural Machine Translation by Lexicon Induction
Authors Junjie Hu, Mengzhou Xia, Graham Neubig, Jaime Carbonell
Abstract It has been previously noted that neural machine translation (NMT) is very sensitive to domain shift. In this paper, we argue that this is a dual effect of the highly lexicalized nature of NMT, resulting in failure for sentences with large numbers of unknown words, and lack of supervision for domain-specific words. To remedy this problem, we propose an unsupervised adaptation method which fine-tunes a pre-trained out-of-domain NMT model using a pseudo-in-domain corpus. Specifically, we perform lexicon induction to extract an in-domain lexicon, and construct a pseudo-parallel in-domain corpus by performing word-for-word back-translation of monolingual in-domain target sentences. In five domains over twenty pairwise adaptation settings and two model architectures, our method achieves consistent improvements without using any in-domain parallel sentences, improving up to 14 BLEU over unadapted models, and up to 2 BLEU over strong back-translation baselines.
Tasks Domain Adaptation, Machine Translation
Published 2019-06-02
URL https://arxiv.org/abs/1906.00376v1
PDF https://arxiv.org/pdf/1906.00376v1.pdf
PWC https://paperswithcode.com/paper/190600376
Repo https://github.com/junjiehu/dali
Framework pytorch
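
The core corpus-construction step described in the abstract can be sketched in a few lines: word-for-word back-translation of monolingual in-domain target sentences through an induced target-to-source lexicon. The lexicon entries and sentences below are toy placeholders, not the paper's induced lexicon.

```python
# Minimal sketch of pseudo-parallel corpus construction via an induced lexicon.
induced_lexicon = {        # hypothetical induced target->source word pairs
    "fever": "fieber",
    "patient": "patient",
    "the": "der",
}

def back_translate(target_sentence, lexicon):
    """Map each target word to its induced source translation (copy if unknown)."""
    return " ".join(lexicon.get(w, w) for w in target_sentence.lower().split())

in_domain_targets = ["The patient has fever"]
pseudo_parallel = [(back_translate(t, induced_lexicon), t) for t in in_domain_targets]
print(pseudo_parallel)  # [(pseudo-source sentence, original in-domain target)]
# These (pseudo-source, target) pairs are then used to fine-tune the
# pre-trained out-of-domain NMT model.
```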

StegaStamp: Invisible Hyperlinks in Physical Photographs

Title StegaStamp: Invisible Hyperlinks in Physical Photographs
Authors Matthew Tancik, Ben Mildenhall, Ren Ng
Abstract Printed and digitally displayed photos have the ability to hide imperceptible digital data that can be accessed through internet-connected imaging systems. Another way to think about this is physical photographs that have unique QR codes invisibly embedded within them. This paper presents an architecture, algorithms, and a prototype implementation addressing this vision. Our key technical contribution is StegaStamp, a learned steganographic algorithm to enable robust encoding and decoding of arbitrary hyperlink bitstrings into photos in a manner that approaches perceptual invisibility. StegaStamp comprises a deep neural network that learns an encoding/decoding algorithm robust to image perturbations approximating the space of distortions resulting from real printing and photography. We demonstrate real-time decoding of hyperlinks in photos from in-the-wild videos that contain variation in lighting, shadows, perspective, occlusion and viewing distance. Our prototype system robustly retrieves 56-bit hyperlinks after error correction - sufficient to embed a unique code within every photo on the internet.
Tasks Steganographics
Published 2019-04-10
URL https://arxiv.org/abs/1904.05343v2
PDF https://arxiv.org/pdf/1904.05343v2.pdf
PWC https://paperswithcode.com/paper/stegastamp-invisible-hyperlinks-in-physical
Repo https://github.com/tancik/StegaStamp
Framework tf
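
A rough sketch of the encode-perturb-decode training idea: an encoder hides a bitstring in the image as a small residual, a perturbation stands in for print/photo distortions, and a decoder recovers the bits. The network sizes, the perturbation set, and the 100-bit payload below are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

BITS = 100

encoder = nn.Sequential(nn.Linear(3 * 64 * 64 + BITS, 512), nn.ReLU(),
                        nn.Linear(512, 3 * 64 * 64), nn.Tanh())
decoder = nn.Sequential(nn.Linear(3 * 64 * 64, 512), nn.ReLU(),
                        nn.Linear(512, BITS))

def perturb(img):
    # Differentiable stand-ins for print/photo distortions: noise + brightness shift.
    return img + 0.05 * torch.randn_like(img) + 0.1 * torch.rand(1)

image = torch.rand(8, 3 * 64 * 64)             # flattened toy images
bits = torch.randint(0, 2, (8, BITS)).float()  # random payloads

residual = encoder(torch.cat([image, bits], dim=1))
encoded = (image + 0.1 * residual).clamp(0, 1)   # near-invisible perturbation of the photo
logits = decoder(perturb(encoded))

loss = nn.functional.binary_cross_entropy_with_logits(logits, bits) \
     + nn.functional.mse_loss(encoded, image)    # recover bits while staying close to the image
loss.backward()
```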

Cross-View Policy Learning for Street Navigation

Title Cross-View Policy Learning for Street Navigation
Authors Ang Li, Huiyi Hu, Piotr Mirowski, Mehrdad Farajtabar
Abstract The ability to navigate from visual observations in unfamiliar environments is a core component of intelligent agents and an ongoing challenge for Deep Reinforcement Learning (RL). Street View can be a sensible testbed for such RL agents, because it provides real-world photographic imagery at ground level, with diverse street appearances; it has been made into an interactive environment called StreetLearn and used for research on navigation. However, goal-driven street navigation agents have not so far been able to transfer to unseen areas without extensive retraining, and relying on simulation is not a scalable solution. Since aerial images are easily and globally accessible, we propose instead to train a multi-modal policy on ground and aerial views, then transfer the ground view policy to unseen (target) parts of the city by utilizing aerial view observations. Our core idea is to pair the ground view with an aerial view and to learn a joint policy that is transferable across views. We achieve this by learning a similar embedding space for both views, distilling the policy across views and dropping out visual modalities. We further reformulate the transfer learning paradigm into three stages: 1) cross-modal training, when the agent is initially trained on multiple city regions, 2) aerial view-only adaptation to a new area, when the agent is adapted to a held-out region using only the easily obtainable aerial view, and 3) ground view-only transfer, when the agent is tested on navigation tasks on unseen ground views, without aerial imagery. Experimental results suggest that the proposed cross-view policy learning enables better generalization of the agent and allows for more effective transfer to unseen environments.
Tasks Transfer Learning
Published 2019-06-13
URL https://arxiv.org/abs/1906.05930v2
PDF https://arxiv.org/pdf/1906.05930v2.pdf
PWC https://paperswithcode.com/paper/cross-view-policy-learning-for-street
Repo https://github.com/deepmind/streetlearn
Framework tf
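
A hedged sketch of the cross-view idea: embed ground and aerial observations into a shared space, randomly drop one modality during training, and distill the policy across views. Feature sizes, the action space, and the dropout scheme are illustrative assumptions.

```python
import torch
import torch.nn as nn

ground_enc = nn.Linear(128, 64)   # stand-in ground-view encoder
aerial_enc = nn.Linear(128, 64)   # stand-in aerial-view encoder
policy = nn.Linear(64, 5)         # 5 discrete navigation actions (assumed)

ground = torch.randn(16, 128)
aerial = torch.randn(16, 128)

g, a = ground_enc(ground), aerial_enc(aerial)
embed_loss = nn.functional.mse_loss(g, a)        # pull the two views into a similar embedding

# Modality dropout: with some probability use only one view at training time.
keep_ground = torch.rand(16, 1) > 0.5
fused = torch.where(keep_ground, g, a)

logits_fused = policy(fused)
logits_aerial = policy(a)
distill_loss = nn.functional.kl_div(             # distill the policy across views
    logits_aerial.log_softmax(-1), logits_fused.softmax(-1).detach(),
    reduction="batchmean")

(embed_loss + distill_loss).backward()
```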

A Comprehensive Overhaul of Feature Distillation

Title A Comprehensive Overhaul of Feature Distillation
Authors Byeongho Heo, Jeesoo Kim, Sangdoo Yun, Hyojin Park, Nojun Kwak, Jin Young Choi
Abstract We investigate the design aspects of feature distillation methods achieving network compression and propose a novel feature distillation method in which the distillation loss is designed to make a synergy among various aspects: teacher transform, student transform, distillation feature position and distance function. Our proposed distillation loss includes a feature transform with a newly designed margin ReLU, a new distillation feature position, and a partial L2 distance function to skip redundant information that adversely affects the compression of the student. On ImageNet, our proposed method achieves a top-1 error of 21.65% with ResNet50, which outperforms the performance of the teacher network, ResNet152. Our proposed method is evaluated on various tasks such as image classification, object detection and semantic segmentation and achieves a significant performance improvement in all tasks. The code is available at https://sites.google.com/view/byeongho-heo/overhaul
Tasks Image Classification, Object Detection, Semantic Segmentation
Published 2019-04-03
URL https://arxiv.org/abs/1904.01866v2
PDF https://arxiv.org/pdf/1904.01866v2.pdf
PWC https://paperswithcode.com/paper/a-comprehensive-overhaul-of-feature
Repo https://github.com/clovaai/overhaul-distillation
Framework pytorch
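
The two loss ingredients named in the abstract can be sketched as follows. The margin value and the exact "skip" condition are plausible readings used for illustration; the paper derives the margin per channel from teacher statistics.

```python
import torch

def margin_relu(t, margin=-0.5):
    # Teacher transform: elementwise max(x, m) with a negative margin m
    # (-0.5 is a placeholder; the paper computes m from teacher statistics).
    return torch.clamp(t, min=margin)

def partial_l2(teacher, student):
    # Skip positions where the teacher target is negative and the student is
    # already below it; penalize the squared difference elsewhere.
    skip = (teacher <= 0) & (student <= teacher)
    diff2 = (teacher - student) ** 2
    return torch.where(skip, torch.zeros_like(diff2), diff2).mean()

t = torch.randn(4, 16)                       # toy teacher features
s = torch.randn(4, 16, requires_grad=True)   # toy student features
loss = partial_l2(margin_relu(t), s)
loss.backward()
```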

DFuseNet: Deep Fusion of RGB and Sparse Depth Information for Image Guided Dense Depth Completion

Title DFuseNet: Deep Fusion of RGB and Sparse Depth Information for Image Guided Dense Depth Completion
Authors Shreyas S. Shivakumar, Ty Nguyen, Ian D. Miller, Steven W. Chen, Vijay Kumar, Camillo J. Taylor
Abstract In this paper we propose a convolutional neural network that is designed to upsample a series of sparse range measurements based on the contextual cues gleaned from a high resolution intensity image. Our approach draws inspiration from related work on super-resolution and in-painting. We propose a novel architecture that seeks to pull contextual cues separately from the intensity image and the depth features and then fuse them later in the network. We argue that this approach effectively exploits the relationship between the two modalities and produces accurate results while respecting salient image structures. We present experimental results to demonstrate that our approach is comparable with state of the art methods and generalizes well across multiple datasets.
Tasks Depth Completion, Super-Resolution
Published 2019-02-02
URL https://arxiv.org/abs/1902.00761v2
PDF https://arxiv.org/pdf/1902.00761v2.pdf
PWC https://paperswithcode.com/paper/dfusenet-deep-fusion-of-rgb-and-sparse-depth
Repo https://github.com/ShreyasSkandanS/DFuseNet
Framework pytorch
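
A minimal two-branch sketch of the "extract separately, fuse late" idea: one branch for the RGB image, one for the sparse depth input, concatenated and decoded into a dense depth map. Layer sizes are illustrative only, not the DFuseNet architecture.

```python
import torch
import torch.nn as nn

class TwoBranchFusion(nn.Module):
    def __init__(self):
        super().__init__()
        self.rgb_branch = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.depth_branch = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        self.fuse = nn.Sequential(nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(32, 1, 1))   # dense depth output

    def forward(self, rgb, sparse_depth):
        # Pull contextual cues from each modality separately, then fuse late.
        f = torch.cat([self.rgb_branch(rgb), self.depth_branch(sparse_depth)], dim=1)
        return self.fuse(f)

net = TwoBranchFusion()
dense = net(torch.rand(1, 3, 64, 64), torch.rand(1, 1, 64, 64))
print(dense.shape)  # torch.Size([1, 1, 64, 64])
```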

Fast and Efficient Zero-Learning Image Fusion

Title Fast and Efficient Zero-Learning Image Fusion
Authors Fayez Lahoud, Sabine Süsstrunk
Abstract We propose a real-time image fusion method using pre-trained neural networks. Our method generates a single image containing features from multiple sources. We first decompose images into a base layer representing large scale intensity variations, and a detail layer containing small scale changes. We use visual saliency to fuse the base layers, and deep feature maps extracted from a pre-trained neural network to fuse the detail layers. We conduct ablation studies to analyze our method’s parameters such as decomposition filters, weight construction methods, and network depth and architecture. Then, we validate its effectiveness and speed on thermal, medical, and multi-focus fusion. We also apply it to multiple image inputs such as multi-exposure sequences. The experimental results demonstrate that our technique achieves state-of-the-art performance in visual quality, objective assessment, and runtime efficiency.
Tasks
Published 2019-05-09
URL https://arxiv.org/abs/1905.03590v1
PDF https://arxiv.org/pdf/1905.03590v1.pdf
PWC https://paperswithcode.com/paper/190503590
Repo https://github.com/IVRL/Fast-Zero-Learning-Fusion
Framework pytorch
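
A sketch of the two-scale fusion pipeline from the abstract, with simple stand-ins: a box filter for the base/detail decomposition, gradient magnitude as the "saliency" weight for the base layers, and a max-abs rule in place of the paper's deep-feature weights for the detail layers.

```python
import numpy as np

def box_blur(img, k=5):
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def fuse(img_a, img_b):
    base_a, base_b = box_blur(img_a), box_blur(img_b)
    detail_a, detail_b = img_a - base_a, img_b - base_b
    # Saliency weights for the base layers (gradient magnitude as a proxy).
    sal_a = np.abs(np.gradient(img_a)[0]) + np.abs(np.gradient(img_a)[1])
    sal_b = np.abs(np.gradient(img_b)[0]) + np.abs(np.gradient(img_b)[1])
    w = sal_a / (sal_a + sal_b + 1e-8)
    fused_base = w * base_a + (1 - w) * base_b
    # Detail layers fused by a max-abs rule (deep-feature weights in the paper).
    fused_detail = np.where(np.abs(detail_a) >= np.abs(detail_b), detail_a, detail_b)
    return fused_base + fused_detail

a, b = np.random.rand(64, 64), np.random.rand(64, 64)
print(fuse(a, b).shape)  # (64, 64)
```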

Diffusion Variational Autoencoders

Title Diffusion Variational Autoencoders
Authors Luis A. Pérez Rey, Vlado Menkovski, Jacobus W. Portegies
Abstract A standard Variational Autoencoder, with a Euclidean latent space, is structurally incapable of capturing topological properties of certain datasets. To remove topological obstructions, we introduce Diffusion Variational Autoencoders with arbitrary manifolds as a latent space. A Diffusion Variational Autoencoder uses transition kernels of Brownian motion on the manifold. In particular, it uses properties of the Brownian motion to implement the reparametrization trick and fast approximations to the KL divergence. We show that the Diffusion Variational Autoencoder is capable of capturing topological properties of synthetic datasets. Additionally, we train on MNIST with spheres, tori, projective spaces, SO(3), and a torus embedded in R^3 as latent spaces. Although a natural dataset like MNIST does not have latent variables with a clear-cut topological structure, training it on a manifold can still highlight topological and geometrical properties.
Tasks
Published 2019-01-25
URL http://arxiv.org/abs/1901.08991v2
PDF http://arxiv.org/pdf/1901.08991v2.pdf
PWC https://paperswithcode.com/paper/diffusion-variational-autoencoders
Repo https://github.com/luis-armando-perez-rey/diffusion_vae_github
Framework tf
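
A hedged sketch of the reparametrization idea for a sphere latent space: approximate a Brownian-motion transition kernel by a few small tangent-space steps, re-projecting onto the sphere after each step. The step count and scale are illustrative; the paper uses proper kernel approximations rather than this random walk.

```python
import torch

def sphere_reparam(mu, t=0.1, n_steps=5):
    """mu: (batch, 3) points near the unit sphere; returns diffused samples."""
    z = mu / mu.norm(dim=-1, keepdim=True)
    step = (t / n_steps) ** 0.5
    for _ in range(n_steps):
        noise = step * torch.randn_like(z)
        noise = noise - (noise * z).sum(-1, keepdim=True) * z  # project onto the tangent plane
        z = z + noise
        z = z / z.norm(dim=-1, keepdim=True)                   # re-project onto the sphere
    return z

mu = torch.randn(4, 3, requires_grad=True)
z = sphere_reparam(mu)
z.sum().backward()   # gradients flow through the sampling, as in the usual reparametrization trick
```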

Sampling, Intervention, Prediction, Aggregation: A Generalized Framework for Model-Agnostic Interpretations

Title Sampling, Intervention, Prediction, Aggregation: A Generalized Framework for Model-Agnostic Interpretations
Authors Christian A. Scholbeck, Christoph Molnar, Christian Heumann, Bernd Bischl, Giuseppe Casalicchio
Abstract Model-agnostic interpretation techniques allow us to explain the behavior of any predictive model. Due to different notations and terminology, it is difficult to see how they are related. A unified view on these methods has been missing. We present the generalized SIPA (sampling, intervention, prediction, aggregation) framework of work stages for model-agnostic interpretations and demonstrate how several prominent methods for feature effects can be embedded into the proposed framework. Furthermore, we extend the framework to feature importance computations by pointing out how variance-based and performance-based importance measures are based on the same work stages. The SIPA framework reduces the diverse set of model-agnostic techniques to a single methodology and establishes a common terminology to discuss them in future work.
Tasks Feature Importance
Published 2019-04-08
URL https://arxiv.org/abs/1904.03959v4
PDF https://arxiv.org/pdf/1904.03959v4.pdf
PWC https://paperswithcode.com/paper/sampling-intervention-prediction-aggregation
Repo https://github.com/koalaverse/vip
Framework none
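
To make the four work stages concrete, here is partial dependence (a standard feature-effect method) written out as sampling, intervention, prediction, and aggregation. The model is a toy stand-in; `predict` plays the role of any fitted model's prediction function.

```python
import numpy as np

def predict(X):                         # stand-in for model.predict
    return 2 * X[:, 0] + np.sin(X[:, 1])

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))           # Sampling: draw (or reuse) observations

def partial_dependence(X, feature, grid):
    effects = []
    for value in grid:
        X_int = X.copy()
        X_int[:, feature] = value       # Intervention: set the feature to a grid value
        preds = predict(X_int)          # Prediction: query the model
        effects.append(preds.mean())    # Aggregation: average over the data
    return np.array(effects)

grid = np.linspace(-2, 2, 5)
print(partial_dependence(X, feature=0, grid=grid))
```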

CenterNet: Keypoint Triplets for Object Detection

Title CenterNet: Keypoint Triplets for Object Detection
Authors Kaiwen Duan, Song Bai, Lingxi Xie, Honggang Qi, Qingming Huang, Qi Tian
Abstract In object detection, keypoint-based approaches often suffer a large number of incorrect object bounding boxes, arguably due to the lack of an additional look into the cropped regions. This paper presents an efficient solution which explores the visual patterns within each cropped region with minimal costs. We build our framework upon a representative one-stage keypoint-based detector named CornerNet. Our approach, named CenterNet, detects each object as a triplet, rather than a pair, of keypoints, which improves both precision and recall. Accordingly, we design two customized modules named cascade corner pooling and center pooling, which play the roles of enriching information collected by both top-left and bottom-right corners and providing more recognizable information at the central regions, respectively. On the MS-COCO dataset, CenterNet achieves an AP of 47.0%, which outperforms all existing one-stage detectors by at least 4.9%. Meanwhile, with a faster inference speed, CenterNet demonstrates quite comparable performance to the top-ranked two-stage detectors. Code is available at https://github.com/Duankaiwen/CenterNet.
Tasks Object Detection
Published 2019-04-17
URL http://arxiv.org/abs/1904.08189v3
PDF http://arxiv.org/pdf/1904.08189v3.pdf
PWC https://paperswithcode.com/paper/centernet-object-detection-with-keypoint
Repo https://github.com/Duankaiwen/CenterNet
Framework pytorch
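
A sketch of the triplet check described in the abstract: a corner-pair box is kept only if a detected center keypoint of the same class falls inside the box's central region. The fixed central-region ratio below is a simplification; the paper scales the region with box size.

```python
def central_region(box, ratio=1/3):
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    return (cx - ratio * w / 2, cy - ratio * h / 2,
            cx + ratio * w / 2, cy + ratio * h / 2)

def filter_by_centers(corner_boxes, center_points):
    """corner_boxes: [(box, cls, score)], center_points: [(x, y, cls)]."""
    kept = []
    for box, cls, score in corner_boxes:
        rx1, ry1, rx2, ry2 = central_region(box)
        if any(c == cls and rx1 <= x <= rx2 and ry1 <= y <= ry2
               for x, y, c in center_points):
            kept.append((box, cls, score))
    return kept

boxes = [((10, 10, 50, 50), "person", 0.9), ((0, 0, 20, 20), "dog", 0.8)]
centers = [(30, 30, "person")]
print(filter_by_centers(boxes, centers))  # only the person box survives the center check
```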

CatGAN: Category-aware Generative Adversarial Networks with Hierarchical Evolutionary Learning for Category Text Generation

Title CatGAN: Category-aware Generative Adversarial Networks with Hierarchical Evolutionary Learning for Category Text Generation
Authors Zhiyue Liu, Jiahai Wang, Zhiwei Liang
Abstract Generating multiple categories of texts is a challenging task and draws more and more attention. Since generative adversarial nets (GANs) have shown competitive results on general text generation, they have been extended for category text generation in some previous works. However, the complicated model structures and learning strategies limit their performance and exacerbate the training instability. This paper proposes a category-aware GAN (CatGAN) which consists of an efficient category-aware model for category text generation and a hierarchical evolutionary learning algorithm for training our model. The category-aware model directly measures the gap between real samples and generated samples on each category; reducing this gap then guides the model to generate high-quality category samples. The Gumbel-Softmax relaxation further frees our model from complicated learning strategies for updating CatGAN on discrete data. Moreover, focusing only on sample quality typically leads to the mode collapse problem, so a hierarchical evolutionary learning algorithm is introduced to stabilize the training procedure and obtain the trade-off between quality and diversity while training CatGAN. Experimental results demonstrate that CatGAN outperforms most of the existing state-of-the-art methods.
Tasks Text Generation
Published 2019-11-15
URL https://arxiv.org/abs/1911.06641v2
PDF https://arxiv.org/pdf/1911.06641v2.pdf
PWC https://paperswithcode.com/paper/catgan-category-aware-generative-adversarial
Repo https://github.com/williamSYSU/CatGAN
Framework pytorch
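
A minimal sketch of the Gumbel-Softmax relaxation that lets gradients flow through discrete token sampling. The vocabulary size, temperature, and embedding table are placeholders; this is not the full CatGAN generator.

```python
import torch
import torch.nn.functional as F

vocab_size, batch = 50, 8
logits = torch.randn(batch, vocab_size, requires_grad=True)  # generator output at one step

# Soft relaxation vs. straight-through one-hot samples.
soft_tokens = F.gumbel_softmax(logits, tau=1.0, hard=False)   # differentiable, non-discrete
hard_tokens = F.gumbel_softmax(logits, tau=1.0, hard=True)    # one-hot forward, soft backward

embedding = torch.randn(vocab_size, 32)          # toy token embedding table
token_vecs = hard_tokens @ embedding             # usable downstream, still differentiable
token_vecs.sum().backward()
print(logits.grad.shape)                         # gradients reach the generator logits
```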

Generating valid Euclidean distance matrices

Title Generating valid Euclidean distance matrices
Authors Moritz Hoffmann, Frank Noé
Abstract Generating point clouds, e.g., molecular structures, in arbitrary rotations, translations, and enumerations remains a challenging task. Meanwhile, neural networks utilizing symmetry invariant layers have been shown to be able to optimize their training objective in a data-efficient way. In this spirit, we present an architecture which allows us to produce valid Euclidean distance matrices, which by construction are already invariant under rotation and translation of the described object. Motivated by the goal to generate molecular structures in Cartesian space, we use this architecture to construct a Wasserstein GAN utilizing a permutation invariant critic network. This makes it possible to generate molecular structures in a one-shot fashion by producing Euclidean distance matrices which have a three-dimensional embedding.
Tasks
Published 2019-10-07
URL https://arxiv.org/abs/1910.03131v2
PDF https://arxiv.org/pdf/1910.03131v2.pdf
PWC https://paperswithcode.com/paper/generating-valid-euclidean-distance-matrices-1
Repo https://github.com/noegroup/EDMnets
Framework tf
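
A sketch of the distance-matrix view used above: the matrix of squared pairwise distances between 3D points is a valid Euclidean distance matrix and is invariant to rotation and translation, and classical MDS recovers a 3D embedding from it. The network side of the paper is omitted; this only illustrates the EDM/embedding relationship.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))                       # toy "molecular" coordinates

sq = np.sum(X**2, axis=1)
D = sq[:, None] + sq[None, :] - 2 * X @ X.T        # squared Euclidean distance matrix

# Classical MDS: double-center, eigendecompose, keep the top 3 components.
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
G = -0.5 * J @ D @ J                               # Gram matrix of centered points
w, V = np.linalg.eigh(G)
Y = V[:, -3:] * np.sqrt(np.maximum(w[-3:], 0))     # recovered 3D embedding

# Pairwise distances are preserved up to numerical error.
sq_y = np.sum(Y**2, axis=1)
D_y = sq_y[:, None] + sq_y[None, :] - 2 * Y @ Y.T
print(np.allclose(D, D_y, atol=1e-6))              # True
```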

Deep-Emotion: Facial Expression Recognition Using Attentional Convolutional Network

Title Deep-Emotion: Facial Expression Recognition Using Attentional Convolutional Network
Authors Shervin Minaee, Amirali Abdolrashidi
Abstract Facial expression recognition has been an active research area over the past few decades, and it is still challenging due to the high intra-class variation. Traditional approaches for this problem rely on hand-crafted features such as SIFT, HOG and LBP, followed by a classifier trained on a database of images or videos. Most of these works perform reasonably well on datasets of images captured in a controlled condition, but fail to perform as well on more challenging datasets with more image variation and partial faces. In recent years, several works proposed an end-to-end framework for facial expression recognition using deep learning models. Despite the better performance of these works, there still seems to be great room for improvement. In this work, we propose a deep learning approach based on an attentional convolutional network, which is able to focus on important parts of the face, and achieves significant improvement over previous models on multiple datasets, including FER-2013, CK+, FERG, and JAFFE. We also use a visualization technique which is able to find important face regions for detecting different emotions, based on the classifier’s output. Through experimental results, we show that different emotions seem to be sensitive to different parts of the face.
Tasks Facial Expression Recognition
Published 2019-02-04
URL http://arxiv.org/abs/1902.01019v1
PDF http://arxiv.org/pdf/1902.01019v1.pdf
PWC https://paperswithcode.com/paper/deep-emotion-facial-expression-recognition
Repo https://github.com/omarsayed7/Deep-Emotion
Framework pytorch
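
A hedged sketch of "attention over face regions": a small branch predicts a spatial mask that reweights convolutional features before classification. This is a generic soft-attention stand-in used for illustration, not the paper's exact attention module.

```python
import torch
import torch.nn as nn

features = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU())
attention = nn.Sequential(nn.Conv2d(8, 1, 1), nn.Sigmoid())   # per-pixel importance mask
classifier = nn.Linear(8, 7)                                  # 7 expression classes

x = torch.rand(4, 1, 48, 48)            # FER-2013-sized grayscale faces
f = features(x)
mask = attention(f)                     # (4, 1, 48, 48), highlights informative regions
pooled = (f * mask).mean(dim=(2, 3))    # attention-weighted global pooling
logits = classifier(pooled)
print(logits.shape)                     # torch.Size([4, 7])
```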

Seq2Seq RNN based Gait Anomaly Detection from Smartphone Acquired Multimodal Motion Data

Title Seq2Seq RNN based Gait Anomaly Detection from Smartphone Acquired Multimodal Motion Data
Authors Riccardo Bonetto, Mattia Soldan, Alberto Lanaro, Simone Milani, Michele Rossi
Abstract Smartphones and wearable devices are fast growing technologies that, in conjunction with advances in wireless sensor hardware, are enabling ubiquitous sensing applications. Wearables are suitable for indoor and outdoor scenarios, can be placed on many parts of the human body and can integrate a large number of sensors capable of gathering physiological and behavioral biometric information. Here, we are concerned with gait analysis systems that extract meaningful information from a user’s movements to identify anomalies and changes in their walking style. The solution that is put forward is subject-specific, as the designed feature extraction and classification tools are trained on the subject under observation. A smartphone mounted on a custom-made chest support is utilized to gather inertial data and video signals from its built-in sensors and rear-facing camera. The collected video and inertial data are preprocessed, combined and then classified by means of a Recurrent Neural Network (RNN) based Sequence-to-Sequence (Seq2Seq) model, which is used as a feature extractor, and a subsequent Convolutional Neural Network (CNN) classifier. This architecture provides excellent results, correctly assessing anomalies in 100% of the considered test cases and surpassing the performance of support vector machine classifiers.
Tasks Anomaly Detection
Published 2019-11-19
URL https://arxiv.org/abs/1911.08608v1
PDF https://arxiv.org/pdf/1911.08608v1.pdf
PWC https://paperswithcode.com/paper/seq2seq-rnn-based-gait-anomaly-detection-from
Repo https://github.com/Soldelli/gait_anomaly_detection
Framework tf
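
A rough sketch of the two-stage pipeline described above: a Seq2Seq RNN (here a GRU encoder/decoder trained to reconstruct inertial windows) as the feature extractor, followed by a small classifier on the encoder state. All sizes and the classifier head are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

enc = nn.GRU(input_size=6, hidden_size=32, batch_first=True)   # 6 inertial channels (assumed)
dec = nn.GRU(input_size=6, hidden_size=32, batch_first=True)
proj = nn.Linear(32, 6)
clf = nn.Linear(32, 2)                                          # normal vs anomalous gait

x = torch.randn(8, 100, 6)                     # windows of 100 motion samples
_, h = enc(x)                                  # h: (1, 8, 32) sequence summary
recon, _ = dec(x, h)                           # teacher-forced reconstruction
recon = proj(recon)

recon_loss = nn.functional.mse_loss(recon, x)  # Seq2Seq trained as a reconstructor / feature extractor
logits = clf(h.squeeze(0))                     # classifier on the extracted features
print(recon_loss.item(), logits.shape)
```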

Importance of Copying Mechanism for News Headline Generation

Title Importance of Copying Mechanism for News Headline Generation
Authors Ilya Gusev
Abstract News headline generation is an essential problem of text summarization because it is constrained, well-defined, and still hard to solve. Models with a limited vocabulary cannot solve it well, as new named entities appear regularly in the news and these entities often should be in the headline. News articles in morphologically rich languages such as Russian require model modifications due to a large number of possible word forms. This study aims to validate that models that can copy words from the original article perform better than models without such an option. The proposed model achieves a mean ROUGE score of 23 on the provided test dataset, which is 8 points greater than the result of a similar model without a copying mechanism. Moreover, the resulting model performs better than any known model on the new dataset of Russian news.
Tasks Text Summarization
Published 2019-04-25
URL http://arxiv.org/abs/1904.11475v1
PDF http://arxiv.org/pdf/1904.11475v1.pdf
PWC https://paperswithcode.com/paper/importance-of-copying-mechanism-for-news
Repo https://github.com/IlyaGusev/summarus
Framework pytorch
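
One standard way to implement such a copying mechanism is the pointer-generator style mixture sketched below: the output distribution mixes a vocabulary softmax with attention weights scattered onto the source token ids, so out-of-vocabulary names from the article stay reachable. The sizes and the p_gen value are toy placeholders, and this is not necessarily the exact formulation used in the paper.

```python
import torch

vocab_size, src_len = 20, 6
vocab_dist = torch.softmax(torch.randn(vocab_size), dim=0)   # generator's word distribution
attention = torch.softmax(torch.randn(src_len), dim=0)       # attention over source tokens
src_ids = torch.tensor([3, 7, 7, 15, 2, 19])                 # source tokens as vocabulary ids
p_gen = torch.tensor(0.7)                                    # probability of generating vs copying

copy_dist = torch.zeros(vocab_size).scatter_add_(0, src_ids, attention)
final_dist = p_gen * vocab_dist + (1 - p_gen) * copy_dist
print(final_dist.sum())   # ~1.0: still a valid distribution over the vocabulary
```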

Exploring the Limitations of Behavior Cloning for Autonomous Driving

Title Exploring the Limitations of Behavior Cloning for Autonomous Driving
Authors Felipe Codevilla, Eder Santana, Antonio M. López, Adrien Gaidon
Abstract Driving requires reacting to a wide variety of complex environment conditions and agent behaviors. Explicitly modeling each possible scenario is unrealistic. In contrast, imitation learning can, in theory, leverage data from large fleets of human-driven cars. Behavior cloning in particular has been successfully used to learn simple visuomotor policies end-to-end, but scaling to the full spectrum of driving behaviors remains an unsolved problem. In this paper, we propose a new benchmark to experimentally investigate the scalability and limitations of behavior cloning. We show that behavior cloning leads to state-of-the-art results, including in unseen environments, executing complex lateral and longitudinal maneuvers without these reactions being explicitly programmed. However, we confirm well-known limitations (due to dataset bias and overfitting), new generalization issues (due to dynamic objects and the lack of a causal model), and training instability requiring further research before behavior cloning can graduate to real-world driving. The code of the studied behavior cloning approaches can be found at https://github.com/felipecode/coiltraine .
Tasks Autonomous Driving, Imitation Learning
Published 2019-04-18
URL http://arxiv.org/abs/1904.08980v1
PDF http://arxiv.org/pdf/1904.08980v1.pdf
PWC https://paperswithcode.com/paper/exploring-the-limitations-of-behavior-cloning
Repo https://github.com/felipecode/coiltraine
Framework none
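
For reference, behavior cloning itself reduces to supervised regression from observations to recorded expert actions, as in the minimal sketch below. The observation/action sizes and the MLP policy are toy placeholders, not the benchmark's driving model.

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 2))  # e.g. steer, throttle
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

obs = torch.randn(256, 64)           # stand-in for perception features from human driving logs
expert_actions = torch.randn(256, 2)

for _ in range(10):                  # a few supervised steps on the demonstration data
    loss = nn.functional.mse_loss(policy(obs), expert_actions)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(loss.item())
```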