Paper Group ANR 74
RSGAN: Face Swapping and Editing using Face and Hair Representation in Latent Spaces
Title | RSGAN: Face Swapping and Editing using Face and Hair Representation in Latent Spaces |
Authors | Ryota Natsume, Tatsuya Yatagawa, Shigeo Morishima |
Abstract | In this paper, we present an integrated system for automatically generating and editing face images through face swapping, attribute-based editing, and random face parts synthesis. The proposed system is based on a deep neural network that variationally learns the face and hair regions from large-scale face image datasets. Unlike conventional variational methods, the proposed network represents the latent spaces for faces and hair individually. We refer to the proposed network as region-separative generative adversarial network (RSGAN). The proposed network handles face and hair appearances independently in the latent spaces; face swapping is then achieved by replacing the latent-space representations of the faces and reconstructing the entire face image from them. This latent-space approach performs face swapping robustly even on images for which previous methods fail due to inappropriate fitting of 3D morphable models. In addition, the proposed system can further edit face-swapped images with the same network by manipulating visual attributes or by composing them with randomly generated face or hair parts. |
Tasks | Face Swapping |
Published | 2018-04-10 |
URL | http://arxiv.org/abs/1804.03447v2 |
http://arxiv.org/pdf/1804.03447v2.pdf | |
PWC | https://paperswithcode.com/paper/rsgan-face-swapping-and-editing-using-face |
Repo | |
Framework | |
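The core mechanism in the abstract above - swapping only the face portion of a region-separated latent representation and decoding the full image - can be illustrated with a toy sketch. This is not the authors' RSGAN; the `RegionSeparativeAE` class, encoder/decoder sizes, and image resolution below are hypothetical stand-ins.

```python
# Illustrative sketch (not the authors' code): face swapping by exchanging
# latent codes when face and hair regions are embedded in separate latent spaces.
import torch
import torch.nn as nn

class RegionSeparativeAE(nn.Module):
    """Toy stand-in for RSGAN's encoders/decoder; the architecture is hypothetical."""
    def __init__(self, dim=128):
        super().__init__()
        self.face_enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, dim))
        self.hair_enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, dim))
        self.dec = nn.Sequential(nn.Linear(2 * dim, 3 * 64 * 64), nn.Sigmoid())

    def swap_faces(self, img_a, img_b):
        # Encode face and hair appearances of both images separately ...
        z_face_a, z_hair_a = self.face_enc(img_a), self.hair_enc(img_a)
        z_face_b, z_hair_b = self.face_enc(img_b), self.hair_enc(img_b)
        # ... then swap only the face codes and decode the full images again.
        out_a = self.dec(torch.cat([z_face_b, z_hair_a], dim=1))
        out_b = self.dec(torch.cat([z_face_a, z_hair_b], dim=1))
        return out_a.view(-1, 3, 64, 64), out_b.view(-1, 3, 64, 64)

model = RegionSeparativeAE()
a, b = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
swapped_a, swapped_b = model.swap_faces(a, b)
```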
Lightweight Convolutional Approaches to Reading Comprehension on SQuAD
Title | Lightweight Convolutional Approaches to Reading Comprehension on SQuAD |
Authors | Tobin Bell, Benjamin Penchas |
Abstract | Current state-of-the-art reading comprehension models rely heavily on recurrent neural networks. We explored an entirely different approach to question answering: a convolutional model. By their nature, these convolutional models are fast to train and capture local dependencies well, though they can struggle with longer-range dependencies and thus require augmentation to achieve comparable performance to RNN-based models. We conducted over two dozen controlled experiments with convolutional models and various kernel/attention/regularization schemes to determine the precise performance gains of each strategy, while maintaining a focus on speed. We ultimately ensembled three models: crossconv (0.5398 dev F1), attnconv (0.5665), and maybeconv (0.5285). The ensembled model was able to achieve a 0.6238 F1 score using the official SQuAD evaluation script. Our individual convolutional model crossconv was able to exceed the performance of the RNN-plus-attention baseline by 25% while training 6 times faster. |
Tasks | Question Answering, Reading Comprehension |
Published | 2018-10-19 |
URL | http://arxiv.org/abs/1810.08680v1 |
http://arxiv.org/pdf/1810.08680v1.pdf | |
PWC | https://paperswithcode.com/paper/lightweight-convolutional-approaches-to |
Repo | |
Framework | |
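The paper does not spell out its crossconv/attnconv/maybeconv architectures here, so the following is only a generic sketch of the underlying idea: replacing a recurrent encoder with a 1D convolutional block over token embeddings, which trains quickly and captures local dependencies. The layer sizes and the residual/normalization arrangement are assumptions.

```python
# Minimal sketch of a convolutional encoder block as an alternative to an RNN
# encoder for reading comprehension; hyper-parameters are illustrative only.
import torch
import torch.nn as nn

class ConvEncoderBlock(nn.Module):
    def __init__(self, dim=128, kernel_size=5):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                      # x: (batch, seq_len, dim)
        y = self.conv(x.transpose(1, 2)).transpose(1, 2)
        return self.norm(torch.relu(y) + x)    # residual connection aids training

context = torch.rand(8, 400, 128)              # encoded context tokens
encoded = ConvEncoderBlock()(context)
```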
Rethinking Machine Learning Development and Deployment for Edge Devices
Title | Rethinking Machine Learning Development and Deployment for Edge Devices |
Authors | Liangzhen Lai, Naveen Suda |
Abstract | Machine learning (ML), especially deep learning, is made possible by the availability of big data, enormous compute power and, often overlooked, development tools or frameworks. As the algorithms become mature and efficient, more and more ML inference is moving out of datacenters/cloud and deployed on edge devices. This model deployment process can be challenging as the deployment environment and requirements can be substantially different from those during model development. In this paper, we propose a new ML development and deployment approach that is specially designed and optimized for inference-only deployment on edge devices. We build a prototype and demonstrate that this approach can address all the deployment challenges and result in more efficient and high-quality solutions. |
Tasks | |
Published | 2018-06-20 |
URL | http://arxiv.org/abs/1806.07846v1 |
http://arxiv.org/pdf/1806.07846v1.pdf | |
PWC | https://paperswithcode.com/paper/rethinking-machine-learning-development-and |
Repo | |
Framework | |
Confidence penalty, annealing Gaussian noise and zoneout for biLSTM-CRF networks for named entity recognition
Title | Confidence penalty, annealing Gaussian noise and zoneout for biLSTM-CRF networks for named entity recognition |
Authors | Antonio Jimeno Yepes |
Abstract | Named entity recognition (NER) is used to identify relevant entities in text. A bidirectional LSTM (long short-term memory) encoder with a neural conditional random field (CRF) decoder (biLSTM-CRF) is the state-of-the-art methodology. In this work, we analyse several methods intended to optimize the performance of networks based on this architecture, some of which also help avoid overfitting. These methods target exploration of the parameter space, regularization of LSTMs and penalization of confident output distributions. Results show that the optimization methods improve the performance of the biLSTM-CRF NER baseline system, setting a new state-of-the-art performance for the CoNLL-2003 Spanish set with an F1 of 87.18. |
Tasks | Named Entity Recognition |
Published | 2018-08-13 |
URL | http://arxiv.org/abs/1808.04029v2 |
http://arxiv.org/pdf/1808.04029v2.pdf | |
PWC | https://paperswithcode.com/paper/confidence-penalty-annealing-gaussian-noise |
Repo | |
Framework | |
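Of the three techniques in the title, the confidence penalty is the easiest to illustrate: the negative entropy of the output distribution is added to the loss so that over-confident predictions are discouraged. The sketch below applies it at the token level rather than to the CRF output as the paper does, and the weight `beta` is a hypothetical value.

```python
# Hedged sketch of a confidence penalty: loss = cross-entropy - beta * entropy,
# so low-entropy (over-confident) output distributions are penalized.
import torch
import torch.nn.functional as F

def loss_with_confidence_penalty(logits, targets, beta=0.1):
    ce = F.cross_entropy(logits, targets)
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs + 1e-12)).sum(dim=-1).mean()
    return ce - beta * entropy             # subtracting entropy discourages confidence

logits = torch.randn(32, 9)                # e.g. 9 NER tag scores per token
targets = torch.randint(0, 9, (32,))
loss = loss_with_confidence_penalty(logits, targets)
```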
LIT: Block-wise Intermediate Representation Training for Model Compression
Title | LIT: Block-wise Intermediate Representation Training for Model Compression |
Authors | Animesh Koratana, Daniel Kang, Peter Bailis, Matei Zaharia |
Abstract | Knowledge distillation (KD) is a popular method for reducing the computational overhead of deep network inference, in which the output of a teacher model is used to train a smaller, faster student model. Hint training (i.e., FitNets) extends KD by regressing a student model’s intermediate representation to a teacher model’s intermediate representation. In this work, we introduce bLock-wise Intermediate representation Training (LIT), a novel model compression technique that extends the use of intermediate representations in deep network compression, outperforming KD and hint training. LIT has two key ideas: 1) LIT trains a student of the same width (but shallower depth) as the teacher by directly comparing the intermediate representations, and 2) LIT uses the intermediate representation from the previous block in the teacher model as an input to the current student block during training, avoiding unstable intermediate representations in the student network. We show that LIT provides substantial reductions in network depth without loss in accuracy – for example, LIT can compress a ResNeXt-110 to a ResNeXt-20 (5.5x) on CIFAR10 and a VDCNN-29 to a VDCNN-9 (3.2x) on Amazon Reviews without loss in accuracy, outperforming KD and hint training in network size for a given accuracy. We also show that applying LIT to identical student/teacher architectures increases the accuracy of the student model above the teacher model, outperforming the recently-proposed Born Again Networks procedure on ResNet, ResNeXt, and VDCNN. Finally, we show that LIT can effectively compress GAN generators, which are not supported in the KD framework because GANs output pixels as opposed to probabilities. |
Tasks | Model Compression |
Published | 2018-10-02 |
URL | http://arxiv.org/abs/1810.01937v1 |
http://arxiv.org/pdf/1810.01937v1.pdf | |
PWC | https://paperswithcode.com/paper/lit-block-wise-intermediate-representation |
Repo | |
Framework | |
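The two key ideas of LIT stated in the abstract - regressing student intermediate representations onto the teacher's, and feeding each student block the teacher's previous-block output during training - can be sketched roughly as follows. This is a simplified reading of the abstract, not the authors' implementation; the block granularity and the MSE objective are assumptions.

```python
# Rough sketch of a block-wise intermediate-representation training signal:
# each student block receives the *teacher's* previous-block output as input
# and is regressed onto the teacher's corresponding block output.
import torch
import torch.nn.functional as F

def lit_block_loss(teacher_blocks, student_blocks, x):
    """teacher_blocks / student_blocks: lists of aligned nn.Module blocks."""
    loss, t_in = 0.0, x
    for t_block, s_block in zip(teacher_blocks, student_blocks):
        with torch.no_grad():
            t_out = t_block(t_in)          # teacher's intermediate representation
        s_out = s_block(t_in)              # student block sees the teacher's input
        loss = loss + F.mse_loss(s_out, t_out)
        t_in = t_out                       # next block starts from the teacher output
    return loss
```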
Spatial Correlation and Value Prediction in Convolutional Neural Networks
Title | Spatial Correlation and Value Prediction in Convolutional Neural Networks |
Authors | Gil Shomron, Uri Weiser |
Abstract | Convolutional neural networks (CNNs) are a widely used form of deep neural networks, introducing state-of-the-art results for different problems such as image classification, computer vision tasks, and speech recognition. However, CNNs are compute intensive, requiring billions of multiply-accumulate (MAC) operations per input. To reduce the number of MACs in CNNs, we propose a value prediction method that exploits the spatial correlation of zero-valued activations within the CNN output feature maps, thereby saving convolution operations. Our method reduces the number of MAC operations by 30.4%, averaged on three modern CNNs for ImageNet, with top-1 accuracy degradation of 1.7%, and top-5 accuracy degradation of 1.1%. |
Tasks | Image Classification, Speech Recognition |
Published | 2018-07-21 |
URL | http://arxiv.org/abs/1807.10598v2 |
http://arxiv.org/pdf/1807.10598v2.pdf | |
PWC | https://paperswithcode.com/paper/spatial-correlation-and-value-prediction-in |
Repo | |
Framework | |
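A toy illustration of the spatial-correlation idea: compute only a subset of output activations exactly and, for the remainder, predict zero when the computed neighbours are zero, so the corresponding MACs could be skipped. The checkerboard pattern and 4-neighbourhood below are illustrative assumptions, not the paper's exact prediction scheme.

```python
# Toy sketch: zero-value prediction in a post-ReLU output feature map using the
# spatial correlation of zero activations. The map is used as ground truth here;
# a real accelerator would only compute the non-skipped positions.
import numpy as np

def predict_with_spatial_zeros(exact_map):
    h, w = exact_map.shape
    out = np.zeros_like(exact_map)
    skipped = 0
    for i in range(h):
        for j in range(w):
            if (i + j) % 2 == 0:                           # computed exactly
                out[i, j] = exact_map[i, j]
                continue
            neighbours = [exact_map[r, c] for r, c in
                          ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                          if 0 <= r < h and 0 <= c < w]
            if any(n != 0 for n in neighbours):            # fall back to computing
                out[i, j] = exact_map[i, j]
            else:                                          # predict zero, skip MACs
                skipped += 1
    return out, skipped

fm = np.maximum(np.random.randn(8, 8), 0)                  # sparse post-ReLU map
approx, saved = predict_with_spatial_zeros(fm)
```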
Deep Generative Models in the Real-World: An Open Challenge from Medical Imaging
Title | Deep Generative Models in the Real-World: An Open Challenge from Medical Imaging |
Authors | Xiaoran Chen, Nick Pawlowski, Martin Rajchl, Ben Glocker, Ender Konukoglu |
Abstract | Recent advances in deep learning led to novel generative modeling techniques that achieve unprecedented quality in generated samples and performance in learning complex distributions in imaging data. These new models in medical image computing have important applications that form clinically relevant and very challenging unsupervised learning problems. In this paper, we explore the feasibility of using state-of-the-art auto-encoder-based deep generative models, such as variational and adversarial auto-encoders, for one such task: abnormality detection in medical imaging. We utilize typical, publicly available datasets with brain scans from healthy subjects and patients with stroke lesions and brain tumors. We use the data from healthy subjects to train different auto-encoder based models to learn the distribution of healthy images and detect pathologies as outliers. Models that can better learn the data distribution should be able to detect outliers more accurately. We evaluate the detection performance of deep generative models and compare them with non-deep-learning based approaches to provide a benchmark of the current state of research. We conclude that abnormality detection is a challenging task for deep generative models and that considerable room for improvement remains. In order to facilitate further research, we aim to make carefully pre-processed imaging data available to the research community. |
Tasks | Anomaly Detection |
Published | 2018-06-14 |
URL | http://arxiv.org/abs/1806.05452v1 |
http://arxiv.org/pdf/1806.05452v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-generative-models-in-the-real-world-an |
Repo | |
Framework | |
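A minimal sketch of the evaluation setup described above: an auto-encoder trained only on healthy scans should reconstruct pathological regions poorly, so per-sample reconstruction error can serve as an abnormality score. The tiny architecture and the two-sigma threshold are illustrative assumptions.

```python
# Minimal sketch of reconstruction-error anomaly scoring with an auto-encoder
# trained on healthy data; architecture and threshold are illustrative only.
import torch
import torch.nn as nn

ae = nn.Sequential(                        # toy auto-encoder for flattened patches
    nn.Linear(4096, 256), nn.ReLU(),
    nn.Linear(256, 4096), nn.Sigmoid(),
)

def abnormality_score(scan_patches):       # (N, 4096), intensities in [0, 1]
    with torch.no_grad():
        recon = ae(scan_patches)
    return ((scan_patches - recon) ** 2).mean(dim=1)     # per-patch error

scores = abnormality_score(torch.rand(10, 4096))
is_outlier = scores > scores.mean() + 2 * scores.std()   # hypothetical threshold
```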
Tone Biased MMR Text Summarization
Title | Tone Biased MMR Text Summarization |
Authors | Mayank Chaudhari, Aakash Nelson Mattukoyya |
Abstract | Text summarization is an interesting area for researchers developing new techniques to provide human-like summaries for vast amounts of information. Summarization techniques tend to focus on providing an accurate representation of content, and the tone of the content is often ignored. The tone of the content sets a baseline for how a reader perceives it; as such, being able to generate a summary with a tone that is appropriate for the reader is important. In our work we implement Maximal Marginal Relevance (MMR) based multi-document text summarization and propose a naive model to change the tone of the summary by setting a bias towards a specific set of words and restricting other words in the summarization output. This bias towards a specified set of words produces a summary whose tone matches that of the specified words. |
Tasks | Text Summarization |
Published | 2018-02-26 |
URL | http://arxiv.org/abs/1802.09426v2 |
http://arxiv.org/pdf/1802.09426v2.pdf | |
PWC | https://paperswithcode.com/paper/tone-biased-mmr-text-summarization |
Repo | |
Framework | |
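A hedged sketch of MMR-based sentence selection with a naive tone bias, in the spirit of the abstract: sentences containing words from a preferred-tone lexicon receive a bonus and restricted words a penalty. The `sim`/`relevance` callables, `lam`, and `bias` are illustrative parameters, not the authors' settings.

```python
# Sketch of greedy MMR selection with a tone bias term added to each sentence's
# score; the MMR trade-off lam and the bias weight are illustrative.
def mmr_tone_summary(sentences, sim, relevance, tone_words, banned_words,
                     lam=0.7, bias=0.2, k=3):
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max((sim(i, j) for j in selected), default=0.0)
            mmr = lam * relevance(i) - (1 - lam) * redundancy
            words = set(sentences[i].lower().split())
            mmr += bias * len(words & tone_words)     # reward the desired tone
            mmr -= bias * len(words & banned_words)   # restrict other words
            return mmr
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]
```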
Co-training Embeddings of Knowledge Graphs and Entity Descriptions for Cross-lingual Entity Alignment
Title | Co-training Embeddings of Knowledge Graphs and Entity Descriptions for Cross-lingual Entity Alignment |
Authors | Muhao Chen, Yingtao Tian, Kai-Wei Chang, Steven Skiena, Carlo Zaniolo |
Abstract | Multilingual knowledge graph (KG) embeddings provide latent semantic representations of entities and structured knowledge with cross-lingual inferences, which benefit various knowledge-driven cross-lingual NLP tasks. However, precisely learning such cross-lingual inferences is usually hindered by the low coverage of entity alignment in many KGs. Since many multilingual KGs also provide literal descriptions of entities, in this paper, we introduce an embedding-based approach which leverages a weakly aligned multilingual KG for semi-supervised cross-lingual learning using entity descriptions. Our approach performs co-training of two embedding models, i.e. a multilingual KG embedding model and a multilingual literal description embedding model. The models are trained on a large Wikipedia-based trilingual dataset where most entity alignment is unknown to training. Experimental results show that the performance of the proposed approach on the entity alignment task improves at each iteration of co-training, and eventually reaches a stage at which it significantly surpasses previous approaches. We also show that our approach has promising abilities for zero-shot entity alignment, and cross-lingual KG completion. |
Tasks | Entity Alignment, Knowledge Graphs |
Published | 2018-06-18 |
URL | http://arxiv.org/abs/1806.06478v1 |
http://arxiv.org/pdf/1806.06478v1.pdf | |
PWC | https://paperswithcode.com/paper/co-training-embeddings-of-knowledge-graphs |
Repo | |
Framework | |
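The co-training loop described in the abstract can be sketched at a high level: the KG embedding model and the description embedding model are trained in turn, and the confident alignments each proposes are fed back to enlarge the shared training set. The `train_on` and `propose_alignments` methods below are hypothetical placeholders for the actual training and alignment-proposal steps.

```python
# High-level sketch of co-training two embedding models on a weakly aligned
# multilingual KG; the model interface is a hypothetical placeholder.
def co_train(kg_model, desc_model, seed_alignments, iterations=5, threshold=0.9):
    alignments = set(seed_alignments)
    for _ in range(iterations):
        for model in (kg_model, desc_model):
            model.train_on(alignments)                    # hypothetical API
            new_pairs = model.propose_alignments(threshold)
            alignments |= set(new_pairs)                  # propagate to the peer model
    return alignments
```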
Feature Fusion through Multitask CNN for Large-scale Remote Sensing Image Segmentation
Title | Feature Fusion through Multitask CNN for Large-scale Remote Sensing Image Segmentation |
Authors | Shihao Sun, Lei Yang, Wenjie Liu, Ruirui Li |
Abstract | In recent years, Fully Convolutional Networks (FCN) have been widely used in various semantic segmentation tasks, including multi-modal remote sensing imagery. How to fuse multi-modal data to improve segmentation performance has always been a research hotspot. In this paper, a novel end-to-end fully convolutional neural network is proposed for semantic segmentation of natural color, infrared imagery and Digital Surface Models (DSM). It is based on a modified DeepUNet and performs the segmentation in a multi-task way. The channels are clustered into groups and processed on different task pipelines. After a series of segmentation and fusion steps, their shared features and private features are successfully merged. Experimental results show that the feature fusion network is efficient, and our approach achieves good performance in the ISPRS Semantic Labeling Contest (2D). |
Tasks | Semantic Segmentation |
Published | 2018-07-24 |
URL | http://arxiv.org/abs/1807.09072v1 |
http://arxiv.org/pdf/1807.09072v1.pdf | |
PWC | https://paperswithcode.com/paper/feature-fusion-through-multitask-cnn-for |
Repo | |
Framework | |
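A rough sketch of the channel-grouping idea: modality channels are split into groups (here, spectral bands versus the DSM), each group runs through its own small pipeline, and the resulting features are fused before the segmentation head. The split, layer sizes, and fusion by concatenation are assumptions, not the modified DeepUNet of the paper.

```python
# Toy two-stream fusion network for multi-modal remote sensing segmentation;
# the channel grouping and layer sizes are illustrative only.
import torch
import torch.nn as nn

class TwoStreamFusion(nn.Module):
    def __init__(self, classes=6):
        super().__init__()
        self.spectral = nn.Conv2d(4, 32, 3, padding=1)   # R, G, B, IR channels
        self.elevation = nn.Conv2d(1, 32, 3, padding=1)  # DSM channel
        self.head = nn.Conv2d(64, classes, 1)            # fuse, then classify per pixel

    def forward(self, spectral, dsm):
        fused = torch.cat([torch.relu(self.spectral(spectral)),
                           torch.relu(self.elevation(dsm))], dim=1)
        return self.head(fused)

logits = TwoStreamFusion()(torch.rand(1, 4, 256, 256), torch.rand(1, 1, 256, 256))
```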
Tandem Blocks in Deep Convolutional Neural Networks
Title | Tandem Blocks in Deep Convolutional Neural Networks |
Authors | Chris Hettinger, Tanner Christensen, Jeffrey Humpherys, Tyler J. Jarvis |
Abstract | Due to the success of residual networks (resnets) and related architectures, shortcut connections have quickly become standard tools for building convolutional neural networks. The explanations in the literature for the apparent effectiveness of shortcuts are varied and often contradictory. We hypothesize that shortcuts work primarily because they act as linear counterparts to nonlinear layers. We test this hypothesis by using several variations on the standard residual block, with different types of linear connections, to build small image classification networks. Our experiments show that other kinds of linear connections can be even more effective than the identity shortcuts. Our results also suggest that the best type of linear connection for a given application may depend on both network width and depth. |
Tasks | Image Classification |
Published | 2018-06-01 |
URL | http://arxiv.org/abs/1806.00145v1 |
http://arxiv.org/pdf/1806.00145v1.pdf | |
PWC | https://paperswithcode.com/paper/tandem-blocks-in-deep-convolutional-neural |
Repo | |
Framework | |
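The abstract's central hypothesis - that shortcuts work because they act as linear counterparts to nonlinear layers - suggests a block in which the identity shortcut is replaced by a learned linear connection. The sketch below is one plausible instantiation; the kernel size and normalization placement are assumptions.

```python
# Sketch of a "tandem" block: a learned linear connection (a convolution with no
# nonlinearity) runs in parallel with the usual nonlinear branch and the two are summed.
import torch
import torch.nn as nn

class TandemBlock(nn.Module):
    def __init__(self, channels=64, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.nonlinear = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size, padding=pad),
            nn.BatchNorm2d(channels), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size, padding=pad))
        self.linear = nn.Conv2d(channels, channels, kernel_size, padding=pad)

    def forward(self, x):
        # linear branch replaces the identity shortcut of a standard residual block
        return torch.relu(self.nonlinear(x) + self.linear(x))

y = TandemBlock()(torch.rand(1, 64, 32, 32))
```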
Comparison between Suitable Priors for Additive Bayesian Networks
Title | Comparison between Suitable Priors for Additive Bayesian Networks |
Authors | Gilles Kratzer, Reinhard Furrer, Marta Pittavino |
Abstract | Additive Bayesian networks are types of graphical models that extend the usual Bayesian generalized linear model to multiple dependent variables through the factorisation of the joint probability distribution of the underlying variables. When fitting an ABN model, the choice of the prior on the parameters is of crucial importance. If an inadequate prior - such as one that is too weakly informative - is used, data separation and data sparsity lead to issues in the model selection process. In this work a simulation study comparing two weakly informative priors and a strongly informative prior is presented. As the first weakly informative prior we use a zero-mean Gaussian prior with a large variance, currently implemented in the R-package abn. The second prior is based on the Student’s t-distribution, specifically designed for logistic regression; finally, the strongly informative prior is again Gaussian with mean equal to the true parameter value and a small variance. We compare the impact of these priors on the accuracy of the learned additive Bayesian network as a function of different parameters. We create a simulation study to illustrate Lindley’s paradox based on the prior choice. We then conclude by highlighting the good performance of the informative Student’s t-prior and the limited impact of Lindley’s paradox. Finally, suggestions for further developments are provided. |
Tasks | Model Selection |
Published | 2018-09-18 |
URL | http://arxiv.org/abs/1809.06636v1 |
http://arxiv.org/pdf/1809.06636v1.pdf | |
PWC | https://paperswithcode.com/paper/comparison-between-suitable-priors-for |
Repo | |
Framework | |
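The three priors being compared can be written down directly; the sketch below simply evaluates their log-densities side by side. The Student's t hyper-parameters and the "true" mean of the strong Gaussian are illustrative values, not those of the study.

```python
# Illustration of the three prior families compared in the study: a weakly
# informative zero-mean Gaussian with large variance, a Student's t prior for
# logistic-regression coefficients, and a strongly informative Gaussian centred
# on the true parameter value. All hyper-parameters here are illustrative.
import numpy as np
from scipy import stats

theta = np.linspace(-10, 10, 5)
weak_gauss   = stats.norm(loc=0.0, scale=10.0).logpdf(theta)
student_t    = stats.t(df=7, loc=0.0, scale=2.5).logpdf(theta)
strong_gauss = stats.norm(loc=1.5, scale=0.5).logpdf(theta)   # centred on "truth"

for name, lp in [("weak Gaussian", weak_gauss), ("Student t", student_t),
                 ("strong Gaussian", strong_gauss)]:
    print(name, np.round(lp, 2))
```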
An N Time-Slice Dynamic Chain Event Graph
Title | An N Time-Slice Dynamic Chain Event Graph |
Authors | Rodrigo A. Collazo, Jim Q. Smith |
Abstract | The Dynamic Chain Event Graph (DCEG) is able to depict many classes of discrete random processes exhibiting asymmetries in their developments and context-specific conditional probability structures. However, paradoxically, this very generality has so far frustrated its wide application. So in this paper we develop an object-oriented method to fully analyse a particularly useful and feasibly implementable new subclass of these graphical models called the N Time-Slice DCEG (NT-DCEG). After demonstrating a close relationship between an NT-DCEG and a specific class of Markov processes, we discuss how graphical modellers can exploit this connection to gain a deep understanding of their processes. We also show how context-specific independence statements, which can then be checked by domain experts, can be read from the topology of this graph. Our methods are illustrated throughout using examples of dynamic multivariate processes describing inmate radicalisation in a prison. |
Tasks | |
Published | 2018-08-17 |
URL | http://arxiv.org/abs/1808.05726v2 |
http://arxiv.org/pdf/1808.05726v2.pdf | |
PWC | https://paperswithcode.com/paper/an-n-time-slice-dynamic-chain-event-graph |
Repo | |
Framework | |
Improved Gradient-Based Optimization Over Discrete Distributions
Title | Improved Gradient-Based Optimization Over Discrete Distributions |
Authors | Evgeny Andriyash, Arash Vahdat, Bill Macready |
Abstract | In many applications we seek to maximize an expectation with respect to a distribution over discrete variables. Estimating gradients of such objectives with respect to the distribution parameters is a challenging problem. We analyze existing solutions including finite-difference (FD) estimators and continuous relaxation (CR) estimators in terms of bias and variance. We show that the commonly used Gumbel-Softmax estimator is biased and propose a simple method to reduce it. We also derive a simpler piece-wise linear continuous relaxation that also possesses reduced bias. We demonstrate empirically that reduced bias leads to a better performance in variational inference and on binary optimization tasks. |
Tasks | |
Published | 2018-09-29 |
URL | https://arxiv.org/abs/1810.00116v3 |
https://arxiv.org/pdf/1810.00116v3.pdf | |
PWC | https://paperswithcode.com/paper/improved-gradient-based-optimization-over |
Repo | |
Framework | |
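For reference, the Gumbel-Softmax estimator that the paper analyses (and finds biased) relaxes discrete sampling by adding Gumbel noise to the logits and applying a temperature-controlled softmax. The sketch below shows the standard construction, not the paper's reduced-bias variant; the temperature and payoff values are illustrative.

```python
# Standard Gumbel-Softmax relaxation: perturb logits with Gumbel noise and apply
# a temperature-controlled softmax so the sample is differentiable w.r.t. the logits.
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits, temperature=0.5):
    u = torch.rand_like(logits)
    gumbel = -torch.log(-torch.log(u + 1e-20) + 1e-20)
    return F.softmax((logits + gumbel) / temperature, dim=-1)

logits = torch.tensor([[1.0, 0.2, -0.5]], requires_grad=True)
values = torch.tensor([1.0, 2.0, 3.0])                # per-category payoff (illustrative)
sample = gumbel_softmax_sample(logits)
expected_value = (sample * values).sum()              # relaxed discrete objective
expected_value.backward()                             # gradients flow into the logits
```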
Bayesian Structure Learning by Recursive Bootstrap
Title | Bayesian Structure Learning by Recursive Bootstrap |
Authors | Raanan Y. Rohekar, Yaniv Gurwicz, Shami Nisimov, Guy Koren, Gal Novik |
Abstract | We address the problem of Bayesian structure learning for domains with hundreds of variables by employing non-parametric bootstrap, recursively. We propose a method that covers both model averaging and model selection in the same framework. The proposed method deals with the main weakness of constraint-based learning—sensitivity to errors in the independence tests—by a novel way of combining bootstrap with constraint-based learning. Essentially, we provide an algorithm for learning a tree, in which each node represents a scored CPDAG for a subset of variables and the level of the node corresponds to the maximal order of conditional independencies that are encoded in the graph. As higher order independencies are tested in deeper recursive calls, they benefit from more bootstrap samples and are therefore more resistant to the curse of dimensionality. Moreover, the re-use of stable low order independencies allows greater computational efficiency. We also provide an algorithm for sampling CPDAGs efficiently from their posterior given the learned tree. We empirically demonstrate that the proposed algorithm scales well to hundreds of variables, and learns better MAP models and more reliable causal relationships between variables than other state-of-the-art methods. |
Tasks | Model Selection |
Published | 2018-09-13 |
URL | http://arxiv.org/abs/1809.04828v1 |
http://arxiv.org/pdf/1809.04828v1.pdf | |
PWC | https://paperswithcode.com/paper/bayesian-structure-learning-by-recursive |
Repo | |
Framework | |
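The basic ingredient - replacing a single noisy conditional-independence decision with a vote over non-parametric bootstrap resamples - can be sketched as follows. The partial-correlation "test" and the thresholds are illustrative stand-ins; the paper embeds this idea in a recursive algorithm that builds a tree of scored CPDAGs.

```python
# Hedged sketch: aggregate a conditional-independence decision over bootstrap
# resamples of the data. A threshold on |partial correlation| stands in for a
# proper statistical test; all thresholds here are illustrative.
import numpy as np

def bootstrap_ci_vote(data, x, y, z, n_boot=100, threshold=0.05, rng=None):
    """data: (n, d) array; x, y: column indices; z: list of conditioning columns."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = data.shape[0]
    votes = 0
    for _ in range(n_boot):
        sample = data[rng.integers(0, n, size=n)]
        # partial correlation of x and y given z via linear residuals
        bx = np.linalg.lstsq(sample[:, z], sample[:, x], rcond=None)[0]
        by = np.linalg.lstsq(sample[:, z], sample[:, y], rcond=None)[0]
        rx = sample[:, x] - sample[:, z] @ bx
        ry = sample[:, y] - sample[:, z] @ by
        r = np.corrcoef(rx, ry)[0, 1]
        votes += abs(r) < threshold            # small |corr| counts as "independent"
    return votes / n_boot > 0.5                # majority decision over resamples
```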