February 1, 2020

3490 words 17 mins read

Paper Group AWR 269

Deep Learning Moment Closure Approximations using Dynamic Boltzmann Distributions. Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives. DoveNet: Deep Image Harmonization via Domain Verification. Non-Causal Tracking by Deblatting. Transfer learning from language models to image caption generators: Better models ma …

Deep Learning Moment Closure Approximations using Dynamic Boltzmann Distributions

Title Deep Learning Moment Closure Approximations using Dynamic Boltzmann Distributions
Authors Oliver K. Ernst, Tom Bartol, Terrence Sejnowski, Eric Mjolsness
Abstract The moments of spatial probabilistic systems are often given by an infinite hierarchy of coupled differential equations. Moment closure methods are used to approximate a subset of low order moments by terminating the hierarchy at some order and replacing higher order terms with functions of lower order ones. For a given system, it is not known beforehand which closure approximation is optimal, i.e. which higher order terms are relevant in the current regime. Further, the generalization of such approximations is typically poor, as higher order corrections may become relevant over long timescales. We have developed a method to learn moment closure approximations directly from data using dynamic Boltzmann distributions (DBDs). The dynamics of the distribution are parameterized using basis functions from finite element methods, such that the approach can be applied without knowing the true dynamics of the system under consideration. We use the hierarchical architecture of deep Boltzmann machines (DBMs) with multinomial latent variables to learn closure approximations for progressively higher order spatial correlations. The learning algorithm uses a centering transformation, allowing the dynamic DBM to be trained without the need for pre-training. We demonstrate the method for a Lotka-Volterra system on a lattice, a typical example in spatial chemical reaction networks. The approach can be applied broadly to learn deep generative models in applications where infinite systems of differential equations arise.
Tasks
Published 2019-05-28
URL https://arxiv.org/abs/1905.12122v1
PDF https://arxiv.org/pdf/1905.12122v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-moment-closure-approximations
Repo https://github.com/smrfeld/ReducedLotkaVolterra
Framework none

Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives

Title Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives
Authors Anirudh Goyal, Shagun Sodhani, Jonathan Binas, Xue Bin Peng, Sergey Levine, Yoshua Bengio
Abstract Reinforcement learning agents that operate in diverse and complex environments can benefit from the structured decomposition of their behavior. Often, this is addressed in the context of hierarchical reinforcement learning, where the aim is to decompose a policy into lower-level primitives or options, and a higher-level meta-policy that triggers the appropriate behaviors for a given situation. However, the meta-policy must still produce appropriate decisions in all states. In this work, we propose a policy design that decomposes into primitives, similarly to hierarchical reinforcement learning, but without a high-level meta-policy. Instead, each primitive can decide for itself whether it wishes to act in the current state. We use an information-theoretic mechanism for enabling this decentralized decision: each primitive chooses how much information it needs about the current state to make a decision, and the primitive that requests the most information about the current state acts in the world. The primitives are regularized to use as little information as possible, which leads to natural competition and specialization. We experimentally demonstrate that this policy architecture improves over both flat and hierarchical policies in terms of generalization.
Tasks Hierarchical Reinforcement Learning
Published 2019-06-25
URL https://arxiv.org/abs/1906.10667v1
PDF https://arxiv.org/pdf/1906.10667v1.pdf
PWC https://paperswithcode.com/paper/reinforcement-learning-with-competitive
Repo https://github.com/maximecb/gym-minigrid
Framework pytorch
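
A minimal sketch of the information-constrained selection mechanism described in the abstract above, assuming each primitive is a small MLP encoder with a standard-normal prior over its latent; the layer sizes and the exact KL-based regularizer are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.distributions as D

class Primitive(nn.Module):
    """One primitive: encodes the state into a latent (information bottleneck) and acts from it."""
    def __init__(self, state_dim, latent_dim, action_dim):
        super().__init__()
        self.enc = nn.Linear(state_dim, 2 * latent_dim)     # outputs mean and log-variance
        self.policy = nn.Linear(latent_dim, action_dim)

    def forward(self, state):
        mu, logvar = self.enc(state).chunk(2, dim=-1)
        post = D.Normal(mu, (0.5 * logvar).exp())
        prior = D.Normal(torch.zeros_like(mu), torch.ones_like(mu))
        info = D.kl_divergence(post, prior).sum(-1)          # "information requested" about the state
        action_logits = self.policy(post.rsample())
        return action_logits, info

state_dim, latent_dim, action_dim, n_primitives = 8, 4, 3, 5
primitives = nn.ModuleList(Primitive(state_dim, latent_dim, action_dim) for _ in range(n_primitives))

state = torch.randn(1, state_dim)
logits, infos = zip(*(p(state) for p in primitives))
infos = torch.stack(infos, dim=-1)                           # (1, n_primitives)

winner = infos.argmax(dim=-1)                                # primitive requesting the most information acts
info_penalty = infos.sum()                                   # regularizer: use as little information as possible
print(winner.item(), info_penalty.item())
```

The competition arises because the information penalty pushes every primitive toward requesting little information, so a primitive only "bids" high KL in states where acting is worth the cost.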

DoveNet: Deep Image Harmonization via Domain Verification

Title DoveNet: Deep Image Harmonization via Domain Verification
Authors Wenyan Cong, Jianfu Zhang, Li Niu, Liu Liu, Zhixin Ling, Weiyuan Li, Liqing Zhang
Abstract Image composition is an important operation in image processing, but the inconsistency between foreground and background significantly degrades the quality of the composite image. Image harmonization, aiming to make the foreground compatible with the background, is a promising yet challenging task. However, the lack of a high-quality publicly available dataset for image harmonization greatly hinders the development of image harmonization techniques. In this work, we contribute an image harmonization dataset iHarmony4 by generating synthesized composite images based on the COCO (resp., Adobe5k, Flickr, day2night) dataset, leading to our HCOCO (resp., HAdobe5k, HFlickr, Hday2night) sub-dataset. Moreover, we propose a new deep image harmonization method, DoveNet, using a novel domain verification discriminator, with the insight that the foreground needs to be translated to the same domain as the background. Extensive experiments on our constructed dataset demonstrate the effectiveness of our proposed method. Our dataset and code are available at https://github.com/bcmi/Image_Harmonization_Datasets.
Tasks
Published 2019-11-27
URL https://arxiv.org/abs/1911.13239v2
PDF https://arxiv.org/pdf/1911.13239v2.pdf
PWC https://paperswithcode.com/paper/deep-image-harmonization-via-domain
Repo https://github.com/bcmi/Image_Harmonization_Datasets
Framework tf
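
A minimal sketch of the domain-verification idea from the abstract above: a discriminator pools foreground and background features from the same image and scores whether the two regions belong to the same domain. The tiny backbone, masked pooling, and layer sizes are assumptions for illustration, not DoveNet's actual discriminator.

```python
import torch
import torch.nn as nn

class DomainVerificationDiscriminator(nn.Module):
    """Scores whether the foreground and background of an image come from the same domain."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(                     # tiny conv encoder for illustration
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.head = nn.Sequential(nn.Linear(2 * feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def pooled(self, feats, mask):
        # Masked average pooling: mask selects foreground (or background) pixels.
        mask = nn.functional.interpolate(mask, size=feats.shape[-2:], mode="nearest")
        return (feats * mask).sum(dim=(2, 3)) / mask.sum(dim=(2, 3)).clamp(min=1e-6)

    def forward(self, image, fg_mask):
        feats = self.backbone(image)
        fg = self.pooled(feats, fg_mask)
        bg = self.pooled(feats, 1.0 - fg_mask)
        return self.head(torch.cat([fg, bg], dim=1))       # high score = foreground and background match

disc = DomainVerificationDiscriminator()
composite = torch.randn(2, 3, 64, 64)                      # stand-in composite images
mask = (torch.rand(2, 1, 64, 64) > 0.5).float()            # foreground masks
print(disc(composite, mask).shape)                         # torch.Size([2, 1])
```

A harmonization generator trained against such a verifier is rewarded for translating the foreground into the background's domain.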

Non-Causal Tracking by Deblatting

Title Non-Causal Tracking by Deblatting
Authors Denys Rozumnyi, Jan Kotera, Filip Šroubek, Jiří Matas
Abstract Tracking by Deblatting stands for solving an inverse problem of deblurring and image matting for tracking motion-blurred objects. We propose non-causal Tracking by Deblatting, which estimates continuous, complete and accurate object trajectories. Energy minimization by dynamic programming is used to detect abrupt changes of motion, called bounces. High-order polynomials are fitted to segments, which are parts of the trajectory separated by bounces. The output is a continuous trajectory function which assigns a location to every real-valued time stamp from zero to the number of frames. Additionally, we show that precise physical quantities, such as radius, gravity, or sub-frame object velocity, can be computed from the trajectory function. Velocity estimates are compared to high-speed camera and radar measurements. Results show high performance of the proposed method in terms of Trajectory-IoU, recall and velocity estimation.
Tasks Deblurring, Image Matting
Published 2019-09-15
URL https://arxiv.org/abs/1909.06894v1
PDF https://arxiv.org/pdf/1909.06894v1.pdf
PWC https://paperswithcode.com/paper/non-causal-tracking-by-deblatting
Repo https://github.com/rozumden/tbd
Framework none
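
A minimal sketch of the piecewise-polynomial trajectory representation described above: per-frame object positions are split at given bounce times and a polynomial is fitted to each segment, yielding a continuous function of real-valued time. The polynomial degree is an assumption, and the dynamic-programming bounce detection from the paper is replaced by a hand-supplied bounce list.

```python
import numpy as np

def fit_trajectory(times, positions, bounces, degree=3):
    """Fit one polynomial per segment (segments are separated by bounce times)."""
    edges = [times[0]] + list(bounces) + [times[-1]]
    pieces = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = (times >= lo) & (times <= hi)
        cx = np.polyfit(times[sel], positions[sel, 0], degree)
        cy = np.polyfit(times[sel], positions[sel, 1], degree)
        pieces.append((lo, hi, cx, cy))

    def trajectory(t):
        for lo, hi, cx, cy in pieces:
            if lo <= t <= hi:
                return np.polyval(cx, t), np.polyval(cy, t)
        raise ValueError("time stamp outside the sequence")
    return trajectory

# Toy data: 20 frames, a "bounce" at frame 10.
times = np.arange(20.0)
positions = np.stack([np.abs(times - 10.0), 0.5 * times], axis=1)
traj = fit_trajectory(times, positions, bounces=[10.0])
print(traj(3.5), traj(12.25))    # positions at sub-frame (real-valued) time stamps
```

Velocity at any sub-frame time then follows from the derivative of the fitted polynomials (e.g. via np.polyder), which is the kind of physical quantity the abstract mentions.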

Transfer learning from language models to image caption generators: Better models may not transfer better

Title Transfer learning from language models to image caption generators: Better models may not transfer better
Authors Marc Tanti, Albert Gatt, Kenneth P. Camilleri
Abstract When designing a neural caption generator, a convolutional neural network can be used to extract image features. Is it possible to also use a neural language model to extract sentence prefix features? We answer this question by trying different ways to transfer the recurrent neural network and embedding layer from a neural language model to an image caption generator. We find that image caption generators with transferred parameters perform better than those trained from scratch, even when simply pre-training them on the text of the same captions dataset they will later be trained on. We also find that the best language models (in terms of perplexity) do not result in the best caption generators after transfer learning.
Tasks Language Modelling, Transfer Learning
Published 2019-01-01
URL http://arxiv.org/abs/1901.01216v1
PDF http://arxiv.org/pdf/1901.01216v1.pdf
PWC https://paperswithcode.com/paper/transfer-learning-from-language-models-to
Repo https://github.com/zhouheng2018/Image-Captioning-Papers
Framework tf
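
A minimal sketch, assuming both the language model and the caption generator use an embedding layer plus an LSTM of identical sizes, of the kind of parameter transfer the paper studies; the image-conditioning scheme and all dimensions here are illustrative, not the paper's exact models.

```python
import torch
import torch.nn as nn

vocab, emb, hid = 10000, 256, 512

# Stand-in for a neural language model that has already been trained on text.
lm_embedding = nn.Embedding(vocab, emb)
lm_rnn = nn.LSTM(emb, hid, batch_first=True)

class CaptionGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(vocab, emb)
        self.rnn = nn.LSTM(emb, hid, batch_first=True)
        self.img_proj = nn.Linear(2048, hid)        # project CNN image features to the initial state
        self.out = nn.Linear(hid, vocab)

    def forward(self, img_feats, caption_prefix):
        h0 = torch.tanh(self.img_proj(img_feats)).unsqueeze(0)   # (1, B, hid)
        c0 = torch.zeros_like(h0)
        x = self.embedding(caption_prefix)
        y, _ = self.rnn(x, (h0, c0))
        return self.out(y)

captioner = CaptionGenerator()
# Transfer: copy the language model's embedding and recurrent weights, then fine-tune on captions.
captioner.embedding.load_state_dict(lm_embedding.state_dict())
captioner.rnn.load_state_dict(lm_rnn.state_dict())

logits = captioner(torch.randn(4, 2048), torch.randint(0, vocab, (4, 7)))
print(logits.shape)     # (4, 7, 10000)
```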

Neural Simile Recognition with Cyclic Multitask Learning and Local Attention

Title Neural Simile Recognition with Cyclic Multitask Learning and Local Attention
Authors Jiali Zeng, Linfeng Song, Jinsong Su, Jun Xie, Wei Song, Jiebo Luo
Abstract Simile recognition aims to detect simile sentences and to extract simile components, i.e., tenors and vehicles. It involves two subtasks: simile sentence classification and simile component extraction. Recent work has shown that standard multitask learning is effective for Chinese simile recognition, but it is still uncertain whether the mutual effects between the subtasks have been well captured by simple parameter sharing. We propose a novel cyclic multitask learning framework for neural simile recognition, which stacks the subtasks and makes them into a loop by connecting the last to the first. It iteratively performs each subtask, taking the outputs of the previous subtask as additional inputs to the current one, so that the interdependence between the subtasks can be better explored. Extensive experiments show that our framework significantly outperforms the current state-of-the-art model and our carefully designed baselines, and the gains are still remarkable when using BERT.
Tasks Sentence Classification
Published 2019-12-19
URL https://arxiv.org/abs/1912.09084v1
PDF https://arxiv.org/pdf/1912.09084v1.pdf
PWC https://paperswithcode.com/paper/neural-simile-recognition-with-cyclic
Repo https://github.com/DeepLearnXMU/Cyclic
Framework pytorch
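
A minimal sketch of the cyclic idea, assuming a shared BiLSTM encoder, a sentence-classification head, and a component-extraction (tagging) head whose outputs are fed to each other around a loop; all layer choices and the way outputs are summarized are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CyclicSimile(nn.Module):
    def __init__(self, vocab=5000, dim=128, n_tags=5):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.enc = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.cls_head = nn.Linear(2 * dim + n_tags, 2)        # simile sentence: yes/no
        self.tag_head = nn.Linear(2 * dim + 2, n_tags)        # tenor/vehicle tags per token

    def forward(self, tokens, cycles=2):
        h, _ = self.enc(self.emb(tokens))                      # (B, T, 2*dim)
        pooled = h.mean(dim=1)                                 # (B, 2*dim)
        B, T, _ = h.shape
        cls_logits = torch.zeros(B, 2)
        tag_logits = torch.zeros(B, T, self.tag_head.out_features)
        for _ in range(cycles):                                # iterate the subtask loop
            tag_summary = tag_logits.softmax(-1).mean(dim=1)   # extraction output -> classification input
            cls_logits = self.cls_head(torch.cat([pooled, tag_summary], dim=-1))
            cls_feat = cls_logits.softmax(-1).unsqueeze(1).expand(B, T, 2)
            tag_logits = self.tag_head(torch.cat([h, cls_feat], dim=-1))   # classification -> extraction
        return cls_logits, tag_logits

model = CyclicSimile()
cls_logits, tag_logits = model(torch.randint(0, 5000, (3, 12)))
print(cls_logits.shape, tag_logits.shape)
```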

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

Title BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
Authors Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, Luke Zettlemoyer
Abstract We present BART, a denoising autoencoder for pretraining sequence-to-sequence models. BART is trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text. It uses a standard Transformer-based neural machine translation architecture which, despite its simplicity, can be seen as generalizing BERT (due to the bidirectional encoder), GPT (with the left-to-right decoder), and many other more recent pretraining schemes. We evaluate a number of noising approaches, finding the best performance by both randomly shuffling the order of the original sentences and using a novel in-filling scheme, where spans of text are replaced with a single mask token. BART is particularly effective when fine-tuned for text generation but also works well for comprehension tasks. It matches the performance of RoBERTa with comparable training resources on GLUE and SQuAD, and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains of up to 6 ROUGE. BART also provides a 1.1 BLEU increase over a back-translation system for machine translation, with only target language pretraining. We also report ablation experiments that replicate other pretraining schemes within the BART framework, to better measure which factors most influence end-task performance.
Tasks Denoising, Machine Translation, Natural Language Inference, Question Answering, Text Generation
Published 2019-10-29
URL https://arxiv.org/abs/1910.13461v1
PDF https://arxiv.org/pdf/1910.13461v1.pdf
PWC https://paperswithcode.com/paper/bart-denoising-sequence-to-sequence-pre
Repo https://github.com/huggingface/transformers
Framework pytorch
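
Since the listed repository is huggingface/transformers, here is a minimal usage sketch for abstractive summarization with a BART checkpoint; the checkpoint name facebook/bart-large-cnn and the generation settings are assumptions for illustration, not part of the paper itself.

```python
from transformers import BartForConditionalGeneration, BartTokenizer

# Assumed checkpoint name; any BART checkpoint fine-tuned for summarization works similarly.
name = "facebook/bart-large-cnn"
tokenizer = BartTokenizer.from_pretrained(name)
model = BartForConditionalGeneration.from_pretrained(name)

article = (
    "BART is trained by corrupting text with an arbitrary noising function "
    "and learning a model to reconstruct the original text."
)
inputs = tokenizer(article, return_tensors="pt", truncation=True)
summary_ids = model.generate(inputs["input_ids"], num_beams=4, max_length=40, early_stopping=True)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```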

Effective and efficient ROI-wise visual encoding using an end-to-end CNN regression model and selective optimization

Title Effective and efficient ROI-wise visual encoding using an end-to-end CNN regression model and selective optimization
Authors Kai Qiao, Chi Zhang, Jian Chen, Linyuan Wang, Li Tong, Bin Yan
Abstract Recently, visual encoding based on functional magnetic resonance imaging (fMRI) has achieved a great deal with the rapid development of deep network computation. A visual encoding model aims to predict brain activity in response to presented image stimuli. Currently, visual encoding is accomplished mainly by first extracting image features through a convolutional neural network (CNN) model pre-trained on a computer vision task, and then training a linear regression model to map a specific layer of CNN features to each voxel, namely voxel-wise encoding. However, with such a two-step model it is hard to determine beforehand which features are well matched linearly to unknown fMRI data, given our limited understanding of human visual representation. Since computer vision is closely related to human vision, we propose the end-to-end convolutional regression model (ETECRM) in a region of interest (ROI)-wise manner to accomplish effective and efficient visual encoding. The end-to-end manner lets the model automatically learn better-matching features to improve encoding performance. The ROI-wise manner improves the encoding efficiency for many voxels. In addition, we design a selective optimization scheme, including self-adapting weight learning, a weighted correlation loss, and noise regularization, to avoid interference from ineffective voxels in ROI-wise encoding. Experiments demonstrate that the proposed model obtains better prediction accuracy than two-step encoding models. Comparative analysis implies that the end-to-end approach and large volumes of fMRI data may drive the future development of visual encoding.
Tasks
Published 2019-07-27
URL https://arxiv.org/abs/1907.11885v1
PDF https://arxiv.org/pdf/1907.11885v1.pdf
PWC https://paperswithcode.com/paper/effective-and-efficient-roi-wise-visual
Repo https://github.com/KaiQiao1992/ETECRM
Framework pytorch
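
A minimal sketch of end-to-end ROI-wise encoding, assuming a tiny CNN that maps an image stimulus directly to the responses of all voxels in one ROI and is trained with a negative-Pearson-correlation loss; the architecture, sizes, and loss weighting are illustrative, and the paper's selective optimization is omitted.

```python
import torch
import torch.nn as nn

class ROIEncoder(nn.Module):
    """Maps an image stimulus straight to predicted responses of all voxels in one ROI."""
    def __init__(self, n_voxels=200):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.regress = nn.Linear(32 * 4 * 4, n_voxels)

    def forward(self, x):
        return self.regress(self.features(x).flatten(1))

def neg_correlation_loss(pred, target, eps=1e-8):
    """Negative Pearson correlation per voxel across the batch, averaged over voxels."""
    pred = pred - pred.mean(dim=0, keepdim=True)
    target = target - target.mean(dim=0, keepdim=True)
    corr = (pred * target).sum(0) / (pred.norm(dim=0) * target.norm(dim=0) + eps)
    return -corr.mean()

model = ROIEncoder()
images = torch.randn(8, 3, 64, 64)        # batch of image stimuli
voxels = torch.randn(8, 200)              # measured fMRI responses for one ROI
loss = neg_correlation_loss(model(images), voxels)
loss.backward()
print(loss.item())
```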

Visualizing Deep Networks by Optimizing with Integrated Gradients

Title Visualizing Deep Networks by Optimizing with Integrated Gradients
Authors Zhongang Qi, Saeed Khorram, Li Fuxin
Abstract Understanding and interpreting the decisions made by deep learning models is valuable in many domains. In computer vision, computing heatmaps from a deep network is a popular approach for visualizing and understanding deep networks. However, heatmaps that do not correlate with the network may mislead humans, hence it is crucial that heatmaps provide a faithful explanation of the underlying deep network. In this paper, we propose I-GOS, which optimizes for a heatmap so that the classification scores on the masked image would maximally decrease. The main novelty of the approach is to compute descent directions based on the integrated gradients instead of the normal gradient, which avoids local optima and speeds up convergence. Compared with previous approaches, our method can flexibly compute heatmaps at any resolution for different user needs. Extensive experiments on several benchmark datasets show that the heatmaps produced by our approach are more correlated with the decision of the underlying deep network, in comparison with other state-of-the-art approaches.
Tasks
Published 2019-05-02
URL https://arxiv.org/abs/1905.00954v1
PDF https://arxiv.org/pdf/1905.00954v1.pdf
PWC https://paperswithcode.com/paper/visualizing-deep-networks-by-optimizing-with
Repo https://github.com/zhongangqi/IGOS
Framework pytorch
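
A minimal sketch of the integrated-gradients computation that I-GOS builds its descent directions on, shown here for a classifier score with respect to the input image and a baseline; the mask optimization itself is omitted, and the model, baseline, and step count are illustrative assumptions.

```python
import torch
import torchvision.models as models

def integrated_gradients(model, x, baseline, target_class, steps=20):
    """Approximate integrated gradients of the target-class score along the path baseline -> x."""
    total = torch.zeros_like(x)
    for alpha in torch.linspace(0.0, 1.0, steps):
        point = (baseline + alpha * (x - baseline)).requires_grad_(True)
        score = model(point)[0, target_class]
        grad, = torch.autograd.grad(score, point)
        total += grad
    return (x - baseline) * total / steps     # attribution map, same shape as the input

model = models.resnet18().eval()              # any image classifier works here
x = torch.randn(1, 3, 224, 224)               # stand-in for a preprocessed image
baseline = torch.zeros_like(x)                # black baseline; a blurred image is another common choice
attributions = integrated_gradients(model, x, baseline, target_class=207)
print(attributions.shape)
```

Averaging gradients along the whole path, rather than taking a single local gradient, is what lets the optimization avoid local optima, as the abstract notes.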

Effective Modeling of Encoder-Decoder Architecture for Joint Entity and Relation Extraction

Title Effective Modeling of Encoder-Decoder Architecture for Joint Entity and Relation Extraction
Authors Tapas Nayak, Hwee Tou Ng
Abstract A relation tuple consists of two entities and the relation between them, and often such tuples are found in unstructured text. There may be multiple relation tuples present in a text, and they may share one or both entities. Extracting such relation tuples from a sentence is a difficult task, and the sharing of entities or overlapping entities among the tuples makes it more challenging. Most prior work adopted a pipeline approach where entities were identified first, followed by finding the relations among them, thus missing the interaction among the relation tuples in a sentence. In this paper, we propose two approaches that use an encoder-decoder architecture for jointly extracting entities and relations. In the first approach, we propose a representation scheme for relation tuples which enables the decoder to generate one word at a time, like machine translation models, and still find all the tuples present in a sentence with full entity names of different lengths and with overlapping entities. Next, we propose a pointer network-based decoding approach where an entire tuple is generated at every time step. Experiments on the publicly available New York Times corpus show that our proposed approaches outperform previous work and achieve significantly higher F1 scores.
Tasks Joint Entity and Relation Extraction, Machine Translation, Relation Extraction
Published 2019-11-22
URL https://arxiv.org/abs/1911.09886v1
PDF https://arxiv.org/pdf/1911.09886v1.pdf
PWC https://paperswithcode.com/paper/effective-modeling-of-encoder-decoder
Repo https://github.com/nusnlp/PtrNetDecoding4JERE
Framework pytorch
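
A small illustration of the word-by-word representation idea from the first approach: relation tuples are linearized into a single token sequence with separator tokens so a seq2seq decoder can emit them one word at a time, even when entities overlap across tuples. The specific separators ";" and "|" are assumptions, not necessarily the paper's exact tokens.

```python
# Hypothetical linearization of relation tuples for a seq2seq decoder.
TUPLE_SEP, FIELD_SEP = "|", ";"

def encode_tuples(tuples):
    """[(entity1, entity2, relation), ...] -> single target string."""
    return f" {TUPLE_SEP} ".join(f" {FIELD_SEP} ".join(t) for t in tuples)

def decode_tuples(text):
    """Inverse of encode_tuples; tolerates extra whitespace and malformed chunks."""
    out = []
    for chunk in text.split(TUPLE_SEP):
        fields = [f.strip() for f in chunk.split(FIELD_SEP)]
        if len(fields) == 3 and all(fields):
            out.append(tuple(fields))
    return out

tuples = [("New York Times", "New York City", "headquarters_in"),
          ("New York City", "United States", "located_in")]     # overlapping entity
target = encode_tuples(tuples)
print(target)
print(decode_tuples(target) == tuples)    # True
```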

PCRNet: Point Cloud Registration Network using PointNet Encoding

Title PCRNet: Point Cloud Registration Network using PointNet Encoding
Authors Vinit Sarode, Xueqian Li, Hunter Goforth, Yasuhiro Aoki, Rangaprasad Arun Srivatsan, Simon Lucey, Howie Choset
Abstract PointNet has recently emerged as a popular representation for unstructured point cloud data, allowing application of deep learning to tasks such as object detection, segmentation and shape completion. However, recent works in the literature have shown the sensitivity of the PointNet representation to pose misalignment. This paper presents a novel framework that uses the PointNet representation to align point clouds and perform registration for applications such as tracking, 3D reconstruction and pose estimation. We develop a framework that compares PointNet features of template and source point clouds to find the transformation that aligns them accurately. Depending on the prior information about the shape of the object formed by the point clouds, our framework can produce approaches that are shape specific or general to unseen shapes. The shape-specific approach uses a Siamese architecture with fully connected (FC) layers and is robust to noise and initial misalignment in data. We perform extensive simulation and real-world experiments to validate the efficacy of our approach and compare the performance with state-of-the-art approaches.
Tasks 3D Reconstruction, Object Detection, Point Cloud Registration, Pose Estimation
Published 2019-08-21
URL https://arxiv.org/abs/1908.07906v2
PDF https://arxiv.org/pdf/1908.07906v2.pdf
PWC https://paperswithcode.com/paper/190807906
Repo https://github.com/vinits5/pcrnet_pytorch
Framework pytorch
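
A minimal sketch of the shape-specific idea described above: a shared PointNet-style encoder (per-point MLP plus symmetric max pooling) embeds the source and template clouds, and fully connected layers regress a rigid transform from the concatenated features. The pose parameterization (quaternion + translation) and all sizes are assumptions, not PCRNet's exact configuration.

```python
import torch
import torch.nn as nn

class PointNetEncoder(nn.Module):
    """Per-point MLP followed by a symmetric max-pool, as in PointNet."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                 nn.Linear(64, 128), nn.ReLU(),
                                 nn.Linear(128, feat_dim))

    def forward(self, pts):                            # (B, N, 3)
        return self.mlp(pts).max(dim=1).values         # (B, feat_dim), order-invariant

class PCRNetLike(nn.Module):
    """Siamese encoder + FC head regressing a pose (quaternion + translation)."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.encoder = PointNetEncoder(feat_dim)
        self.head = nn.Sequential(nn.Linear(2 * feat_dim, 256), nn.ReLU(),
                                  nn.Linear(256, 7))   # 4 quaternion + 3 translation

    def forward(self, source, template):
        f = torch.cat([self.encoder(source), self.encoder(template)], dim=1)
        pose = self.head(f)
        quat = nn.functional.normalize(pose[:, :4], dim=1)   # unit quaternion
        trans = pose[:, 4:]
        return quat, trans

net = PCRNetLike()
source, template = torch.randn(2, 1024, 3), torch.randn(2, 1024, 3)
quat, trans = net(source, template)
print(quat.shape, trans.shape)      # (2, 4) (2, 3)
```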

Three-D Safari: Learning to Estimate Zebra Pose, Shape, and Texture from Images “In the Wild”

Title Three-D Safari: Learning to Estimate Zebra Pose, Shape, and Texture from Images “In the Wild”
Authors Silvia Zuffi, Angjoo Kanazawa, Tanya Berger-Wolf, Michael J. Black
Abstract We present the first method to perform automatic 3D pose, shape and texture capture of animals from images acquired in-the-wild. In particular, we focus on the problem of capturing 3D information about Grevy’s zebras from a collection of images. The Grevy’s zebra is one of the most endangered species in Africa, with only a few thousand individuals left. Capturing the shape and pose of these animals can provide biologists and conservationists with information about animal health and behavior. In contrast to research on human pose, shape and texture estimation, training data for endangered species is limited, the animals are in complex natural scenes with occlusion, they are naturally camouflaged, travel in herds, and look similar to each other. To overcome these challenges, we integrate the recent SMAL animal model into a network-based regression pipeline, which we train end-to-end on synthetically generated images with pose, shape, and background variation. Going beyond state-of-the-art methods for human shape and pose estimation, our method learns a shape space for zebras during training. Learning such a shape space from images using only a photometric loss is novel, and the approach can be used to learn shape in other settings with limited 3D supervision. Moreover, we couple 3D pose and shape prediction with the task of texture synthesis, obtaining a full texture map of the animal from a single image. We show that the predicted texture map allows a novel per-instance unsupervised optimization over the network features. This method, SMALST (SMAL with learned Shape and Texture), goes beyond previous work, which assumed manual keypoints and/or segmentation, to regress directly from pixels to 3D animal shape, pose and texture. Code and data are available at https://github.com/silviazuffi/smalst.
Tasks Pose Estimation, Texture Synthesis
Published 2019-08-20
URL https://arxiv.org/abs/1908.07201v2
PDF https://arxiv.org/pdf/1908.07201v2.pdf
PWC https://paperswithcode.com/paper/three-d-safari-learning-to-estimate-zebra
Repo https://github.com/silviazuffi/smalst
Framework pytorch

Segmentation-Aware Image Denoising without Knowing True Segmentation

Title Segmentation-Aware Image Denoising without Knowing True Segmentation
Authors Sicheng Wang, Bihan Wen, Junru Wu, Dacheng Tao, Zhangyang Wang
Abstract Several recent works discussed application-driven image restoration neural networks, which are capable of not only removing noise in images but also preserving their semantic-aware details, making them suitable for various high-level computer vision tasks as the pre-processing step. However, such approaches require extra annotations for their high-level vision tasks, in order to train the joint pipeline using hybrid losses. The availability of those annotations is yet often limited to a few image sets, potentially restricting the general applicability of these methods to denoising more unseen and unannotated images. Motivated by that, we propose a segmentation-aware image denoising model dubbed U-SAID, based on a novel unsupervised approach with a pixel-wise uncertainty loss. U-SAID does not need any ground-truth segmentation map, and thus can be applied to any image dataset. It generates denoised images with comparable or even better quality, and the denoised results show stronger robustness for subsequent semantic segmentation tasks, when compared to either its supervised counterpart or classical “application-agnostic” denoisers. Moreover, we demonstrate the superior generalizability of U-SAID in three ways, by plugging in its “universal” denoiser without fine-tuning: (1) denoising unseen types of images; (2) denoising as pre-processing for segmenting unseen noisy images; and (3) denoising for unseen high-level tasks. Extensive experiments demonstrate the effectiveness, robustness and generalizability of the proposed U-SAID over various popular image sets.
Tasks Denoising, Image Denoising, Image Restoration, Semantic Segmentation
Published 2019-05-22
URL https://arxiv.org/abs/1905.08965v1
PDF https://arxiv.org/pdf/1905.08965v1.pdf
PWC https://paperswithcode.com/paper/segmentation-aware-image-denoising-without
Repo https://github.com/TAMU-VITA/USAID
Framework pytorch

Few-shot Video-to-Video Synthesis

Title Few-shot Video-to-Video Synthesis
Authors Ting-Chun Wang, Ming-Yu Liu, Andrew Tao, Guilin Liu, Jan Kautz, Bryan Catanzaro
Abstract Video-to-video synthesis (vid2vid) aims at converting an input semantic video, such as videos of human poses or segmentation masks, to an output photorealistic video. While the state of the art of vid2vid has advanced significantly, existing approaches share two major limitations. First, they are data-hungry. Numerous images of a target human subject or a scene are required for training. Second, a learned model has limited generalization capability. A pose-to-human vid2vid model can only synthesize poses of the single person in the training set. It does not generalize to other humans that are not in the training set. To address the limitations, we propose a few-shot vid2vid framework, which learns to synthesize videos of previously unseen subjects or scenes by leveraging a few example images of the target at test time. Our model achieves this few-shot generalization capability via a novel network weight generation module utilizing an attention mechanism. We conduct extensive experimental validations with comparisons to strong baselines using several large-scale video datasets including human-dancing videos, talking-head videos, and street-scene videos. The experimental results verify the effectiveness of the proposed framework in addressing the two limitations of existing vid2vid approaches.
Tasks Video-to-Video Synthesis
Published 2019-10-28
URL https://arxiv.org/abs/1910.12713v1
PDF https://arxiv.org/pdf/1910.12713v1.pdf
PWC https://paperswithcode.com/paper/few-shot-video-to-video-synthesis
Repo https://github.com/ShanghaiTechCVDL/Weekly_Group_Meeting_Paper_List
Framework none
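
A minimal sketch of the weight-generation idea in the abstract above: features of the few example images are combined by attention and mapped to the weights of a synthesis convolution. Every name, dimension, and the single-layer scope of this sketch are assumptions; the paper's module is considerably more elaborate.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightGenerator(nn.Module):
    """Generates conv weights for one synthesis layer from example-image features."""
    def __init__(self, feat_dim=256, out_ch=64, in_ch=64, k=3):
        super().__init__()
        self.shape = (out_ch, in_ch, k, k)
        self.attn = nn.Linear(feat_dim, 1)                       # scores each example image
        self.to_weights = nn.Linear(feat_dim, out_ch * in_ch * k * k)

    def forward(self, example_feats):                            # (num_examples, feat_dim)
        scores = F.softmax(self.attn(example_feats), dim=0)      # attention over the examples
        pooled = (scores * example_feats).sum(dim=0)             # (feat_dim,)
        return self.to_weights(pooled).view(self.shape)

gen = WeightGenerator()
example_feats = torch.randn(3, 256)                # features of 3 example images of the target
w = gen(example_feats)                             # dynamically generated conv kernel
x = torch.randn(1, 64, 32, 32)                     # intermediate synthesis feature map
y = F.conv2d(x, w, padding=1)                      # the synthesis layer uses the generated weights
print(y.shape)
```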

Assessing Knee OA Severity with CNN attention-based end-to-end architectures

Title Assessing Knee OA Severity with CNN attention-based end-to-end architectures
Authors Marc Górriz, Joseph Antony, Kevin McGuinness, Xavier Giró-i-Nieto, Noel E. O’Connor
Abstract This work proposes a novel end-to-end convolutional neural network (CNN) architecture to automatically quantify the severity of knee osteoarthritis (OA) using X-ray images, which incorporates trainable attention modules acting as unsupervised fine-grained detectors of the region of interest (ROI). The proposed attention modules can be applied at different levels and scales across any CNN pipeline, helping the network to learn relevant attention patterns over the most informative parts of the image at different resolutions. We test the proposed attention mechanism on existing state-of-the-art CNN architectures as our base models, achieving promising results on the benchmark knee OA datasets from the osteoarthritis initiative (OAI) and multicenter osteoarthritis study (MOST). All code from our experiments will be publicly available on the GitHub repository: https://github.com/marc-gorriz/KneeOA-CNNAttention
Tasks
Published 2019-08-23
URL https://arxiv.org/abs/1908.08856v1
PDF https://arxiv.org/pdf/1908.08856v1.pdf
PWC https://paperswithcode.com/paper/assessing-knee-oa-severity-with-cnn-attention
Repo https://github.com/marc-gorriz/KneeOA-CNNAttention
Framework tf
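
A minimal sketch of a trainable spatial attention module of the kind described above, which can be dropped between CNN blocks at any resolution; the gating design and sizes here are assumptions, not the paper's exact module.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Learns a per-pixel gate over a feature map and reweights it (attention over image regions)."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels // 2, kernel_size=1), nn.ReLU(),
            nn.Conv2d(channels // 2, 1, kernel_size=1), nn.Sigmoid(),
        )

    def forward(self, feats):
        attn = self.gate(feats)              # (B, 1, H, W) attention map
        return feats * attn, attn            # reweighted features + map for visualization

# Insert between two convolutional stages of any backbone, e.g. after an early block.
block = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU())   # X-ray images are single channel
attend = SpatialAttention(32)
xray = torch.randn(4, 1, 128, 128)
feats, attn_map = attend(block(xray))
print(feats.shape, attn_map.shape)
```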