October 17, 2019

Paper Group ANR 913

Incorporating Structured Commonsense Knowledge in Story Completion


Title	Incorporating Structured Commonsense Knowledge in Story Completion
Authors	Jiaao Chen, Jianshu Chen, Zhou Yu
Abstract	The ability to select an appropriate story ending is the first step towards perfect narrative comprehension. Story ending prediction requires not only the explicit clues within the context, but also implicit knowledge (such as commonsense) to construct a reasonable and consistent story. However, most previous approaches do not explicitly use background commonsense knowledge. We present a neural story ending selection model that integrates three types of information: narrative sequence, sentiment evolution and commonsense knowledge. Experiments show that our model outperforms state-of-the-art approaches on a public dataset, the ROCStory Cloze Task, and that the performance gain from adding the commonsense knowledge is significant.
Tasks	Story Completion
Published	2018-11-01
URL	http://arxiv.org/abs/1811.00625v1
PDF	http://arxiv.org/pdf/1811.00625v1.pdf
PWC	https://paperswithcode.com/paper/incorporating-structured-commonsense
Repo
Framework

Neural Automated Essay Scoring and Coherence Modeling for Adversarially Crafted Input


Title	Neural Automated Essay Scoring and Coherence Modeling for Adversarially Crafted Input
Authors	Youmna Farag, Helen Yannakoudakis, Ted Briscoe
Abstract	We demonstrate that current state-of-the-art approaches to Automated Essay Scoring (AES) are not well-suited to capturing adversarially crafted input of grammatical but incoherent sequences of sentences. We develop a neural model of local coherence that can effectively learn connectedness features between sentences, and propose a framework for integrating and jointly training the local coherence model with a state-of-the-art AES model. We evaluate our approach against a number of baselines and experimentally demonstrate its effectiveness on both the AES task and the task of flagging adversarial input, further contributing to the development of an approach that strengthens the validity of neural essay scoring models.
Tasks
Published	2018-04-18
URL	http://arxiv.org/abs/1804.06898v2
PDF	http://arxiv.org/pdf/1804.06898v2.pdf
PWC	https://paperswithcode.com/paper/neural-automated-essay-scoring-and-coherence
Repo
Framework

Accurate 3-D Reconstruction with RGB-D Cameras using Depth Map Fusion and Pose Refinement


Title	Accurate 3-D Reconstruction with RGB-D Cameras using Depth Map Fusion and Pose Refinement
Authors	Markus Ylimäki, Juho Kannala, Janne Heikkilä
Abstract	Depth map fusion is an essential part of both stereo and RGB-D based 3-D reconstruction pipelines. Whether produced with passive stereo reconstruction or with an active depth sensor, such as Microsoft Kinect, the depth maps are noisy and may have poor initial registration. In this paper, we introduce a method which is capable of handling outliers and, in particular, even significant registration errors. The proposed method first fuses a sequence of depth maps into a single non-redundant point cloud so that the redundant points are merged together by giving more weight to more certain measurements. Then, the original depth maps are re-registered to the fused point cloud to refine the original camera extrinsic parameters. The fusion is then performed again with the refined extrinsic parameters. This procedure is repeated until the result is satisfactory or no significant changes occur between iterations. The method is robust to outliers and erroneous depth measurements as well as even significant depth map registration errors due to inaccurate initial camera poses.
Tasks
Published	2018-04-24
URL	http://arxiv.org/abs/1804.08912v1
PDF	http://arxiv.org/pdf/1804.08912v1.pdf
PWC	https://paperswithcode.com/paper/accurate-3-d-reconstruction-with-rgb-d
Repo
Framework
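
The fuse, re-register, repeat loop described in this abstract can be illustrated with a deliberately small toy. This is only a sketch under strong assumptions, not the authors' method: each "depth map" is a list of scalar depths of the same scene points, the registration error is a single scalar offset per map (instead of a full camera pose), and the function names `fuse`, `reregister` and `fuse_and_refine` are hypothetical. Outlier handling is left out.

```python
def fuse(depth_maps, weights, offsets):
    """Merge redundant measurements of each point into one weighted estimate.

    More certain maps (larger weight) contribute more to the fused value.
    """
    total_w = sum(weights)
    fused = []
    for k in range(len(depth_maps[0])):
        acc = sum(w * (dm[k] - o) for dm, w, o in zip(depth_maps, weights, offsets))
        fused.append(acc / total_w)
    return fused


def reregister(depth_maps, fused):
    """Re-estimate each map's offset as its mean residual against the fused cloud."""
    return [sum(z - p for z, p in zip(dm, fused)) / len(fused) for dm in depth_maps]


def fuse_and_refine(depth_maps, weights, max_iters=20, tol=1e-9):
    """Alternate fusion and re-registration until the fused result stops changing."""
    offsets = [0.0] * len(depth_maps)
    fused = fuse(depth_maps, weights, offsets)
    for _ in range(max_iters):
        new_offsets = reregister(depth_maps, fused)
        new_fused = fuse(depth_maps, weights, new_offsets)
        change = max(abs(a - b) for a, b in zip(new_fused, fused))
        offsets, fused = new_offsets, new_fused
        if change < tol:
            break
    return fused, offsets


# Two maps of the same three points, misregistered by +0.5 and -0.1.
truth = [1.0, 2.0, 3.0]
maps = [[t + 0.5 for t in truth], [t - 0.1 for t in truth]]
fused, offsets = fuse_and_refine(maps, weights=[1.0, 3.0])
print(fused, offsets)
```

In this toy only the relative offset between the maps is recoverable; a shared global shift cannot be observed from the maps alone, which is why the fused points stay shifted by the weighted mean of the initial errors.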

Adversarial Example Decomposition


Title	Adversarial Example Decomposition
Authors	Horace He, Aaron Lou, Qingxuan Jiang, Isay Katsman, Serge Belongie, Ser-Nam Lim
Abstract	Research has shown that widely used deep neural networks are vulnerable to carefully crafted adversarial perturbations. Moreover, these adversarial perturbations often transfer across models. We hypothesize that adversarial weakness is composed of three sources of bias: architecture, dataset, and random initialization. We show that one can decompose adversarial examples into an architecture-dependent component, data-dependent component, and noise-dependent component and that these components behave intuitively. For example, noise-dependent components transfer poorly to all other models, while architecture-dependent components transfer better to retrained models with the same architecture. In addition, we demonstrate that these components can be recombined to improve transferability without sacrificing efficacy on the original model.
Tasks
Published	2018-12-04
URL	https://arxiv.org/abs/1812.01198v2
PDF	https://arxiv.org/pdf/1812.01198v2.pdf
PWC	https://paperswithcode.com/paper/adversarial-example-decomposition
Repo
Framework

Uncertainty Quantification in CNN-Based Surface Prediction Using Shape Priors


Title	Uncertainty Quantification in CNN-Based Surface Prediction Using Shape Priors
Authors	Katarína Tóthová, Sarah Parisot, Matthew C. H. Lee, Esther Puyol-Antón, Lisa M. Koch, Andrew P. King, Ender Konukoglu, Marc Pollefeys
Abstract	Surface reconstruction is a vital tool in a wide range of areas of medical image analysis and clinical research. Despite the fact that many methods have proposed solutions to the reconstruction problem, most, due to their deterministic nature, do not directly address the issue of quantifying uncertainty associated with their predictions. We remedy this by proposing a novel probabilistic deep learning approach capable of simultaneous surface reconstruction and associated uncertainty prediction. The method incorporates prior shape information in the form of a principal component analysis (PCA) model. Experiments using the UK Biobank data show that our probabilistic approach outperforms an analogous deterministic PCA-based method in the task of 2D organ delineation and quantifies uncertainty by formulating distributions over predicted surface vertex positions.
Tasks
Published	2018-07-30
URL	http://arxiv.org/abs/1807.11272v1
PDF	http://arxiv.org/pdf/1807.11272v1.pdf
PWC	https://paperswithcode.com/paper/uncertainty-quantification-in-cnn-based
Repo
Framework
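
To make the idea of "distributions over predicted surface vertex positions" with a PCA shape prior more concrete, here is a minimal sketch. It assumes a linear shape model (mean shape plus weighted principal components) and independent Gaussian coefficients; how the network in the paper actually predicts such distributions is not stated in the abstract, and the function name `shape_distribution` is hypothetical.

```python
import math


def shape_distribution(mean_shape, components, coeff_means, coeff_stds):
    """Propagate Gaussian PCA coefficients to per-coordinate mean and std.

    Assumed model: shape = mean_shape + sum_j b_j * components[j],
    with independent b_j ~ N(coeff_means[j], coeff_stds[j] ** 2).
    """
    n = len(mean_shape)
    mu = list(mean_shape)
    var = [0.0] * n
    for comp, m, s in zip(components, coeff_means, coeff_stds):
        for i in range(n):
            mu[i] += m * comp[i]           # shift of the expected position
            var[i] += (s * comp[i]) ** 2   # variances of independent terms add
    return mu, [math.sqrt(v) for v in var]


# Two 2-D vertices flattened as [x0, y0, x1, y1] and two toy components.
mean_shape = [0.0, 0.0, 1.0, 0.0]
components = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
mu, std = shape_distribution(mean_shape, components, [2.0, -1.0], [0.5, 0.1])
print(mu, std)
```

Coordinates that no component touches keep zero predicted uncertainty, which shows how strongly the shape prior constrains what the model can express.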

Paraphrasing Complex Network: Network Compression via Factor Transfer


Title	Paraphrasing Complex Network: Network Compression via Factor Transfer
Authors	Jangho Kim, SeoungUK Park, Nojun Kwak
Abstract	Deep neural networks (DNNs) have recently shown promising performance in various areas. Although DNNs are very powerful, their large number of parameters requires substantial storage and memory bandwidth, which hinders their application to actual embedded systems. Many researchers have sought ways of model compression to reduce the size of a network with minimal performance degradation. Among them, a method called knowledge transfer trains the student network with a stronger teacher network. In this paper, we propose a method to overcome the limitations of conventional knowledge transfer methods and improve the performance of a student network. An auto-encoder is used in an unsupervised manner to extract compact factors, which are defined as compressed feature maps of the teacher network. When using the factors to train the student network, we observed that the student network performs better than those trained with other conventional knowledge transfer methods, because the factors contain paraphrased compact information of the teacher network that is easy for the student network to understand.
Tasks	Model Compression, Transfer Learning
Published	2018-02-14
URL	http://arxiv.org/abs/1802.04977v2
PDF	http://arxiv.org/pdf/1802.04977v2.pdf
PWC	https://paperswithcode.com/paper/paraphrasing-complex-network-network
Repo
Framework

Sequential Attention GAN for Interactive Image Editing via Dialogue


Title	Sequential Attention GAN for Interactive Image Editing via Dialogue
Authors	Yu Cheng, Zhe Gan, Yitong Li, Jingjing Liu, Jianfeng Gao
Abstract	We introduce a new task - Interactive Image Editing via conversational language, where users can guide an agent to edit images via multi-turn dialogue. In each dialogue turn, the agent takes a source image and a natural language description as the input, and generates a modified image following the textual description. Two new datasets are introduced for this task (Zap-Seq and DeepFashion-Seq), which contain multi-turn dialog sessions with crowdsourced image-description sequences. The main challenges in this sequential and interactive image generation task are two-fold: 1) contextual consistency between a generated image and the given textual description; 2) step-by-step region-level modification to maintain visual consistency across the image sequence. To address these challenges, we propose a novel Sequential Attention Generative Adversarial Network (SeqAttnGAN) framework, which applies a neural state tracker to encode the previous image and the textual description in each dialogue turn, and uses a GAN framework to generate a modified version of the image that is consistent with the dialogue context and preceding images. To achieve better region-specific refinement, we also introduce a sequential attention mechanism into the model. Experiments on Zap-Seq and DeepFashion-Seq datasets show that the proposed SeqAttnGAN model outperforms state-of-the-art approaches on the interactive image editing task across all evaluation metrics on visual quality, image sequence coherence and text-image consistency.
Tasks	Image Generation, Text-to-Image Generation
Published	2018-12-20
URL	https://arxiv.org/abs/1812.08352v3
PDF	https://arxiv.org/pdf/1812.08352v3.pdf
PWC	https://paperswithcode.com/paper/sequential-attention-gan-for-interactive
Repo
Framework

Dissimilarity Coefficient based Weakly Supervised Object Detection


Title	Dissimilarity Coefficient based Weakly Supervised Object Detection
Authors	Aditya Arun, C. V. Jawahar, M. Pawan Kumar
Abstract	We consider the problem of weakly supervised object detection, where the training samples are annotated using only image-level labels that indicate the presence or absence of an object category. In order to model the uncertainty in the location of the objects, we employ a dissimilarity coefficient based probabilistic learning objective. The learning objective minimizes the difference between an annotation-agnostic prediction distribution and an annotation-aware conditional distribution. The main computational challenge is the complex nature of the conditional distribution, which consists of terms over hundreds or thousands of variables. The complexity of the conditional distribution rules out the possibility of explicitly modeling it. Instead, we exploit the fact that deep learning frameworks rely on stochastic optimization. This allows us to use a state-of-the-art discrete generative model that can provide annotation-consistent samples from the conditional distribution. Extensive experiments on the PASCAL VOC 2007 and 2012 data sets demonstrate the efficacy of our proposed approach.
Tasks	Object Detection, Stochastic Optimization, Weakly Supervised Object Detection
Published	2018-11-25
URL	http://arxiv.org/abs/1811.10016v1
PDF	http://arxiv.org/pdf/1811.10016v1.pdf
PWC	https://paperswithcode.com/paper/dissimilarity-coefficient-based-weakly
Repo
Framework

On Computation and Generalization of GANs with Spectrum Control


Title	On Computation and Generalization of GANs with Spectrum Control
Authors	Haoming Jiang, Zhehui Chen, Minshuo Chen, Feng Liu, Dingding Wang, Tuo Zhao
Abstract	Generative Adversarial Networks (GANs), though powerful, are hard to train. Several recent works (Brock et al., 2016; Miyato et al., 2018) suggest that controlling the spectra of weight matrices in the discriminator can significantly improve the training of GANs. Motivated by their discovery, we propose a new framework for training GANs, which allows more flexible spectrum control (e.g., making the weight matrices of the discriminator have slow singular value decays). Specifically, we propose a new reparameterization approach for the weight matrices of the discriminator in GANs, which allows us to directly manipulate the spectra of the weight matrices through various regularizers and constraints, without intensively computing singular value decompositions. Theoretically, we further show that the spectrum control improves the generalization ability of GANs. Our experiments on the CIFAR-10, STL-10, and ImageNet datasets confirm that, compared to other methods, our proposed method is capable of generating images with competitive quality by utilizing spectral normalization and encouraging slow singular value decay.
Tasks
Published	2018-12-28
URL	http://arxiv.org/abs/1812.10912v2
PDF	http://arxiv.org/pdf/1812.10912v2.pdf
PWC	https://paperswithcode.com/paper/on-computation-and-generalization-of-gans
Repo
Framework
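
One way to picture "directly manipulating the spectra without computing singular value decompositions" is to store a weight matrix in factored form. The sketch below assumes an SVD-like parameterization W = U diag(d) V^T on a 2x2 matrix, with U and V kept orthogonal by building them from rotation angles; the abstract does not spell out the paper's exact reparameterization or regularizers, so every name here is hypothetical. The closed-form `singular_values_2x2` is used only to check the result.

```python
import math


def rotation(theta):
    """2x2 rotation matrix, orthogonal by construction."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def build_weight(theta_u, theta_v, d):
    """Assemble W = U diag(d) V^T; the entries of d are the singular values."""
    diag = [[d[0], 0.0], [0.0, d[1]]]
    return matmul(matmul(rotation(theta_u), diag), transpose(rotation(theta_v)))


def spectral_normalize(d):
    """Rescale so the largest singular value is 1, acting on d directly."""
    top = max(abs(x) for x in d)
    return [x / top for x in d]


def singular_values_2x2(w):
    """Closed-form singular values of a 2x2 matrix (verification only)."""
    a, b, c, e = w[0][0], w[0][1], w[1][0], w[1][1]
    frob2 = a * a + b * b + c * c + e * e   # sigma1^2 + sigma2^2
    det = abs(a * e - b * c)                # sigma1 * sigma2
    disc = math.sqrt(max(frob2 * frob2 - 4.0 * det * det, 0.0))
    return [math.sqrt((frob2 + disc) / 2.0),
            math.sqrt(max((frob2 - disc) / 2.0, 0.0))]


d = spectral_normalize([4.0, 1.0])
w = build_weight(0.3, -1.1, d)
print(d, singular_values_2x2(w))
```

Because the spectrum lives in `d`, a constraint or penalty on the singular values becomes an ordinary operation on a small vector.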

Handwriting Trajectory Recovery using End-to-End Deep Encoder-Decoder Network


Title	Handwriting Trajectory Recovery using End-to-End Deep Encoder-Decoder Network
Authors	Ayan Kumar Bhunia, Abir Bhowmick, Ankan Kumar Bhunia, Aishik Konwer, Prithaj Banerjee, Partha Pratim Roy, Umapada Pal
Abstract	In this paper, we introduce a novel technique to recover the pen trajectory of offline characters, which is a crucial step for handwritten character recognition. Generally, the online acquisition approach has an advantage over its offline counterpart, as the online technique keeps track of the pen movement. Hence, pen tip trajectory retrieval from offline text can bridge the gap between online and offline methods. Our proposed framework employs a sequence-to-sequence model which consists of an encoder-decoder LSTM module. Our encoder module consists of a Convolutional LSTM network, which takes an offline character image as the input and encodes the feature sequence to a hidden representation. The output of the encoder is fed to a decoder LSTM, and we get the successive coordinate points from every time step of the decoder LSTM. Although the sequence-to-sequence model is a popular paradigm in various computer vision and language translation tasks, the main contribution of our work lies in designing an end-to-end network for a decade-old popular problem in the Document Image Analysis community. Tamil, Telugu and Devanagari characters from the LIPI Toolkit dataset are used for our experiments. Our proposed method has achieved superior performance compared to the other conventional approaches.
Tasks
Published	2018-01-22
URL	http://arxiv.org/abs/1801.07211v4
PDF	http://arxiv.org/pdf/1801.07211v4.pdf
PWC	https://paperswithcode.com/paper/handwriting-trajectory-recovery-using-end-to
Repo
Framework

Gated Feedback Refinement Network for Coarse-to-Fine Dense Semantic Image Labeling


Title	Gated Feedback Refinement Network for Coarse-to-Fine Dense Semantic Image Labeling
Authors	Md Amirul Islam, Mrigank Rochan, Shujon Naha, Neil D. B. Bruce, Yang Wang
Abstract	Effective integration of local and global contextual information is crucial for semantic segmentation and dense image labeling. We develop two encoder-decoder based deep learning architectures to address this problem. We first propose a network architecture called Label Refinement Network (LRN) that predicts segmentation labels in a coarse-to-fine fashion at several spatial resolutions. In this network, we also define loss functions at several stages to provide supervision at different stages of training. However, there are limits to the quality of refinement possible if ambiguous information is passed forward. In order to address this issue, we also propose the Gated Feedback Refinement Network (G-FRNet). Initially, G-FRNet makes a coarse-grained prediction which it progressively refines to recover details by effectively integrating local and global contextual information during the refinement stages. This is achieved by gate units proposed in this work, which control the information passed forward in order to resolve the ambiguity. Experiments were conducted on five challenging dense labeling datasets (CamVid, PASCAL VOC 2012, Horse-Cow Parsing, PASCAL-Person-Part, and SUN-RGBD). G-FRNet achieves state-of-the-art semantic segmentation results on the CamVid and Horse-Cow Parsing datasets and produces results competitive with the best performing approaches that appear in the literature for the other three datasets.
Tasks	Semantic Segmentation
Published	2018-06-29
URL	http://arxiv.org/abs/1806.11266v1
PDF	http://arxiv.org/pdf/1806.11266v1.pdf
PWC	https://paperswithcode.com/paper/gated-feedback-refinement-network-for-coarse
Repo
Framework

On Deep Ensemble Learning from a Function Approximation Perspective


Title	On Deep Ensemble Learning from a Function Approximation Perspective
Authors	Jiawei Zhang, Limeng Cui, Fisher B. Gouza
Abstract	In this paper, we propose a general ensemble learning framework based on deep learning models. Given a group of unit models, the proposed deep ensemble learning framework effectively combines their learning results via a multilayered ensemble model. When the mathematical mappings of the unit models are bounded, sigmoidal and discriminatory, we demonstrate that the deep ensemble learning framework can achieve a universal approximation of any function from the input space to the output space. Meanwhile, to achieve such performance, the deep ensemble learning framework also imposes a strict constraint on the number of unit models involved. According to the theoretical proof provided in this paper, given an input feature space of dimension d, the required number of unit models is 2d if the ensemble model involves a single layer. Furthermore, as the ensemble component goes deeper, the required number of unit models is proved to decrease exponentially.
Tasks
Published	2018-05-19
URL	http://arxiv.org/abs/1805.07502v1
PDF	http://arxiv.org/pdf/1805.07502v1.pdf
PWC	https://paperswithcode.com/paper/on-deep-ensemble-learning-from-a-function
Repo
Framework

CanvasGAN: A simple baseline for text to image generation by incrementally patching a canvas


Title	CanvasGAN: A simple baseline for text to image generation by incrementally patching a canvas
Authors	Amanpreet Singh, Sharan Agrawal
Abstract	We propose a new recurrent generative model for generating images from text captions while attending to specific parts of the captions. Our model creates images by incrementally adding patches to a “canvas” while attending to words from the text caption at each timestep. Finally, the canvas is passed through an upscaling network to generate images. We also introduce a new method for generating visual-semantic sentence embeddings based on self-attention over text. We compare our model’s generated images with those generated by Reed et al.’s model and show that our model is a stronger baseline for text-to-image generation tasks.
Tasks	Image Generation, Sentence Embeddings, Text-to-Image Generation
Published	2018-10-05
URL	http://arxiv.org/abs/1810.02833v1
PDF	http://arxiv.org/pdf/1810.02833v1.pdf
PWC	https://paperswithcode.com/paper/canvasgan-a-simple-baseline-for-text-to-image
Repo
Framework

Semantically Invariant Text-to-Image Generation


Title	Semantically Invariant Text-to-Image Generation
Authors	Shagan Sah, Dheeraj Peri, Ameya Shringi, Chi Zhang, Miguel Dominguez, Andreas Savakis, Ray Ptucha
Abstract	Image captioning has demonstrated models that are capable of generating plausible text given input images or videos. Further, recent work in image generation has shown significant improvements in image quality when text is used as a prior. Our work ties these concepts together by creating an architecture that can enable bidirectional generation of images and text. We call this network Multi-Modal Vector Representation (MMVR). Along with MMVR, we propose two improvements to text-conditioned image generation. First, an n-gram metric based cost function is introduced that generalizes the caption with respect to the image. Second, multiple semantically similar sentences are shown to help in generating better images. Qualitative and quantitative evaluations demonstrate that MMVR improves upon existing text-conditioned image generation results by over 20%, while integrating visual and text modalities.
Tasks	Image Captioning, Image Generation, Text-to-Image Generation
Published	2018-09-27
URL	http://arxiv.org/abs/1809.10274v1
PDF	http://arxiv.org/pdf/1809.10274v1.pdf
PWC	https://paperswithcode.com/paper/semantically-invariant-text-to-image
Repo
Framework
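
The abstract does not say which n-gram metric the cost function is built on. As a stand-in, the sketch below uses clipped n-gram precision (the building block of BLEU-style scores) and turns it into a cost against several semantically similar reference captions. The function names and the choice of metric are assumptions for illustration, not the paper's definition.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-token windows of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also occur in the reference.

    Counts are clipped, so a repeated n-gram is only credited as often
    as it appears in the reference.
    """
    cand = Counter(ngrams(candidate.split(), n))
    ref = Counter(ngrams(reference.split(), n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


def ngram_cost(candidate, references, n=2):
    """Cost is low when the candidate matches at least one reference well."""
    return 1.0 - max(ngram_precision(candidate, ref, n) for ref in references)


references = ["a small bird sits on a branch", "a tiny bird perched on a tree"]
print(ngram_cost("a bird sits on a branch", references))
```

Taking the best match over several references means a caption is not penalized for agreeing with one valid phrasing rather than another.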

Formal Ways for Measuring Relations between Concepts in Conceptual Spaces


Title	Formal Ways for Measuring Relations between Concepts in Conceptual Spaces
Authors	Lucas Bechberger, Kai-Uwe Kühnberger
Abstract	The highly influential framework of conceptual spaces provides a geometric way of representing knowledge. Instances are represented by points in a high-dimensional space and concepts are represented by regions in this space. In this article, we extend our recent mathematical formalization of this framework by providing quantitative mathematical definitions for measuring relations between concepts: We develop formal ways for computing concept size, subsethood, implication, similarity, and betweenness. This considerably increases the representational capabilities of our formalization and makes it the most thorough and comprehensive formalization of conceptual spaces developed so far.
Tasks
Published	2018-04-06
URL	http://arxiv.org/abs/1804.02393v1
PDF	http://arxiv.org/pdf/1804.02393v1.pdf
PWC	https://paperswithcode.com/paper/formal-ways-for-measuring-relations-between
Repo
Framework
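
To give a feel for measures such as concept size and subsethood, the sketch below represents each concept as a single crisp, axis-aligned box in the conceptual space. This is a strong simplification chosen for illustration; the authors' formalization is not described in enough detail in the abstract to reproduce, and the function names and example dimensions are hypothetical.

```python
def volume(box):
    """Size of a concept given as a list of (low, high) intervals, one per dimension."""
    v = 1.0
    for low, high in box:
        v *= max(high - low, 0.0)   # an empty interval contributes zero
    return v


def intersection(a, b):
    """Overlap region of two boxes (may be empty in some dimension)."""
    return [(max(al, bl), min(ah, bh)) for (al, ah), (bl, bh) in zip(a, b)]


def subsethood(a, b):
    """Fraction of a's size that lies inside b; 1.0 means a is contained in b."""
    size_a = volume(a)
    if size_a == 0.0:
        raise ValueError("subsethood is undefined for a concept of size zero")
    return volume(intersection(a, b)) / size_a


# Toy two-dimensional space (for example hue and sweetness, both scaled to [0, 1]).
apple = [(0.2, 0.6), (0.0, 0.5)]
fruit = [(0.0, 1.0), (0.0, 1.0)]
print(volume(apple), subsethood(apple, fruit), subsethood(fruit, apple))
```

Subsethood defined this way is asymmetric: every apple is a fruit, but only a fraction of the fruit region is covered by apple.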