Paper Group ANR 74
RSGAN: Face Swapping and Editing using Face and Hair Representation in Latent Spaces
Title | RSGAN: Face Swapping and Editing using Face and Hair Representation in Latent Spaces |
Authors | Ryota Natsume, Tatsuya Yatagawa, Shigeo Morishima |
Abstract | In this paper, we present an integrated system for automatically generating and editing face images through face swapping, attribute-based editing, and random face parts synthesis. The proposed system is based on a deep neural network that variationally learns the face and hair regions from large-scale face image datasets. Unlike conventional variational methods, the proposed network represents the latent spaces for faces and hair individually. We refer to the proposed network as region-separative generative adversarial network (RSGAN). The proposed network handles face and hair appearances independently in the latent spaces; face swapping is then achieved by replacing the latent-space representations of the faces and reconstructing the entire face image from them. This latent-space approach performs face swapping robustly even on images for which previous methods fail due to inappropriate fitting of 3D morphable models. In addition, the proposed system can further edit face-swapped images with the same network by manipulating visual attributes or by composing them with randomly generated face or hair parts. |
Tasks | Face Swapping |
Published | 2018-04-10 |
URL | http://arxiv.org/abs/1804.03447v2 |
http://arxiv.org/pdf/1804.03447v2.pdf | |
PWC | https://paperswithcode.com/paper/rsgan-face-swapping-and-editing-using-face |
Repo | |
Framework | |
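The core mechanism in the abstract above - swapping only the face portion of a region-separated latent representation and decoding the full image - can be illustrated with a toy sketch. This is not the authors' RSGAN; the `RegionSeparativeAE` class, encoder/decoder sizes, and image resolution below are hypothetical stand-ins.

```python
# Illustrative sketch (not the authors' code): face swapping by exchanging
# latent codes when face and hair regions are embedded in separate latent spaces.
import torch
import torch.nn as nn

class RegionSeparativeAE(nn.Module):
    """Toy stand-in for RSGAN's encoders/decoder; the architecture is hypothetical."""
    def __init__(self, dim=128):
        super().__init__()
        self.face_enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, dim))
        self.hair_enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, dim))
        self.dec = nn.Sequential(nn.Linear(2 * dim, 3 * 64 * 64), nn.Sigmoid())

    def swap_faces(self, img_a, img_b):
        # Encode face and hair appearances of both images separately ...
        z_face_a, z_hair_a = self.face_enc(img_a), self.hair_enc(img_a)
        z_face_b, z_hair_b = self.face_enc(img_b), self.hair_enc(img_b)
        # ... then swap only the face codes and decode the full images again.
        out_a = self.dec(torch.cat([z_face_b, z_hair_a], dim=1))
        out_b = self.dec(torch.cat([z_face_a, z_hair_b], dim=1))
        return out_a.view(-1, 3, 64, 64), out_b.view(-1, 3, 64, 64)

model = RegionSeparativeAE()
a, b = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
swapped_a, swapped_b = model.swap_faces(a, b)
```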
Lightweight Convolutional Approaches to Reading Comprehension on SQuAD
Title | Lightweight Convolutional Approaches to Reading Comprehension on SQuAD |
Authors | Tobin Bell, Benjamin Penchas |
Abstract | Current state-of-the-art reading comprehension models rely heavily on recurrent neural networks. We explored an entirely different approach to question answering: a convolutional model. By their nature, these convolutional models are fast to train and capture local dependencies well, though they can struggle with longer-range dependencies and thus require augmentation to achieve comparable performance to RNN-based models. We conducted over two dozen controlled experiments with convolutional models and various kernel/attention/regularization schemes to determine the precise performance gains of each strategy, while maintaining a focus on speed. We ultimately ensembled three models: crossconv (0.5398 dev F1), attnconv (0.5665), and maybeconv (0.5285). The ensembled model was able to achieve a 0.6238 F1 score using the official SQuAD evaluation script. Our individual convolutional model crossconv was able to exceed the performance of the RNN-plus-attention baseline by 25% while training 6 times faster. |
Tasks | Question Answering, Reading Comprehension |
Published | 2018-10-19 |
URL | http://arxiv.org/abs/1810.08680v1 |
http://arxiv.org/pdf/1810.08680v1.pdf | |
PWC | https://paperswithcode.com/paper/lightweight-convolutional-approaches-to |
Repo | |
Framework | |
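The paper does not spell out its crossconv/attnconv/maybeconv architectures here, so the following is only a generic sketch of the underlying idea: replacing a recurrent encoder with a 1D convolutional block over token embeddings, which trains quickly and captures local dependencies. The layer sizes and the residual/normalization arrangement are assumptions.

```python
# Minimal sketch of a convolutional encoder block as an alternative to an RNN
# encoder for reading comprehension; hyper-parameters are illustrative only.
import torch
import torch.nn as nn

class ConvEncoderBlock(nn.Module):
    def __init__(self, dim=128, kernel_size=5):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                      # x: (batch, seq_len, dim)
        y = self.conv(x.transpose(1, 2)).transpose(1, 2)
        return self.norm(torch.relu(y) + x)    # residual connection aids training

context = torch.rand(8, 400, 128)              # encoded context tokens
encoded = ConvEncoderBlock()(context)
```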
Rethinking Machine Learning Development and Deployment for Edge Devices
Title | Rethinking Machine Learning Development and Deployment for Edge Devices |
Authors | Liangzhen Lai, Naveen Suda |
Abstract | Machine learning (ML), especially deep learning, is made possible by the availability of big data, enormous compute power and, often overlooked, development tools or frameworks. As the algorithms become mature and efficient, more and more ML inference is moving out of datacenters/cloud and deployed on edge devices. This model deployment process can be challenging as the deployment environment and requirements can be substantially different from those during model development. In this paper, we propose a new ML development and deployment approach that is specially designed and optimized for inference-only deployment on edge devices. We build a prototype and demonstrate that this approach can address all the deployment challenges and result in more efficient and high-quality solutions. |
Tasks | |
Published | 2018-06-20 |
URL | http://arxiv.org/abs/1806.07846v1 |
http://arxiv.org/pdf/1806.07846v1.pdf | |
PWC | https://paperswithcode.com/paper/rethinking-machine-learning-development-and |
Repo | |
Framework | |
Confidence penalty, annealing Gaussian noise and zoneout for biLSTM-CRF networks for named entity recognition
Title | Confidence penalty, annealing Gaussian noise and zoneout for biLSTM-CRF networks for named entity recognition |
Authors | Antonio Jimeno Yepes |
Abstract | Named entity recognition (NER) is used to identify relevant entities in text. A bidirectional LSTM (long short-term memory) encoder with a neural conditional random field (CRF) decoder (biLSTM-CRF) is the state-of-the-art methodology. In this work, we analyse several methods intended to optimize the performance of networks based on this architecture, some of which also help avoid overfitting. These methods target exploration of the parameter space, regularization of LSTMs and penalization of confident output distributions. Results show that the optimization methods improve the performance of the biLSTM-CRF NER baseline system, setting a new state-of-the-art performance for the CoNLL-2003 Spanish set with an F1 of 87.18. |
Tasks | Named Entity Recognition |
Published | 2018-08-13 |
URL | http://arxiv.org/abs/1808.04029v2 |
http://arxiv.org/pdf/1808.04029v2.pdf | |
PWC | https://paperswithcode.com/paper/confidence-penalty-annealing-gaussian-noise |
Repo | |
Framework | |
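Of the three techniques in the title, the confidence penalty is the easiest to illustrate: the negative entropy of the output distribution is added to the loss so that over-confident predictions are discouraged. The sketch below applies it at the token level rather than to the CRF output as the paper does, and the weight `beta` is a hypothetical value.

```python
# Hedged sketch of a confidence penalty: loss = cross-entropy - beta * entropy,
# so low-entropy (over-confident) output distributions are penalized.
import torch
import torch.nn.functional as F

def loss_with_confidence_penalty(logits, targets, beta=0.1):
    ce = F.cross_entropy(logits, targets)
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs + 1e-12)).sum(dim=-1).mean()
    return ce - beta * entropy             # subtracting entropy discourages confidence

logits = torch.randn(32, 9)                # e.g. 9 NER tag scores per token
targets = torch.randint(0, 9, (32,))
loss = loss_with_confidence_penalty(logits, targets)
```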
LIT: Block-wise Intermediate Representation Training for Model Compression
Title | LIT: Block-wise Intermediate Representation Training for Model Compression |
Authors | Animesh Koratana, Daniel Kang, Peter Bailis, Matei Zaharia |
Abstract | Knowledge distillation (KD) is a popular method for reducing the computational overhead of deep network inference, in which the output of a teacher model is used to train a smaller, faster student model. Hint training (i.e., FitNets) extends KD by regressing a student model’s intermediate representation to a teacher model’s intermediate representation. In this work, we introduce bLock-wise Intermediate representation Training (LIT), a novel model compression technique that extends the use of intermediate representations in deep network compression, outperforming KD and hint training. LIT has two key ideas: 1) LIT trains a student of the same width (but shallower depth) as the teacher by directly comparing the intermediate representations, and 2) LIT uses the intermediate representation from the previous block in the teacher model as an input to the current student block during training, avoiding unstable intermediate representations in the student network. We show that LIT provides substantial reductions in network depth without loss in accuracy – for example, LIT can compress a ResNeXt-110 to a ResNeXt-20 (5.5x) on CIFAR10 and a VDCNN-29 to a VDCNN-9 (3.2x) on Amazon Reviews without loss in accuracy, outperforming KD and hint training in network size for a given accuracy. We also show that applying LIT to identical student/teacher architectures increases the accuracy of the student model above the teacher model, outperforming the recently-proposed Born Again Networks procedure on ResNet, ResNeXt, and VDCNN. Finally, we show that LIT can effectively compress GAN generators, which are not supported in the KD framework because GANs output pixels as opposed to probabilities. |
Tasks | Model Compression |
Published | 2018-10-02 |
URL | http://arxiv.org/abs/1810.01937v1 |
http://arxiv.org/pdf/1810.01937v1.pdf | |
PWC | https://paperswithcode.com/paper/lit-block-wise-intermediate-representation |
Repo | |
Framework | |
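The two key ideas of LIT stated in the abstract - regressing student intermediate representations onto the teacher's, and feeding each student block the teacher's previous-block output during training - can be sketched roughly as follows. This is a simplified reading of the abstract, not the authors' implementation; the block granularity and the MSE objective are assumptions.

```python
# Rough sketch of a block-wise intermediate-representation training signal:
# each student block receives the *teacher's* previous-block output as input
# and is regressed onto the teacher's corresponding block output.
import torch
import torch.nn.functional as F

def lit_block_loss(teacher_blocks, student_blocks, x):
    """teacher_blocks / student_blocks: lists of aligned nn.Module blocks."""
    loss, t_in = 0.0, x
    for t_block, s_block in zip(teacher_blocks, student_blocks):
        with torch.no_grad():
            t_out = t_block(t_in)          # teacher's intermediate representation
        s_out = s_block(t_in)              # student block sees the teacher's input
        loss = loss + F.mse_loss(s_out, t_out)
        t_in = t_out                       # next block starts from the teacher output
    return loss
```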
Spatial Correlation and Value Prediction in Convolutional Neural Networks
Title | Spatial Correlation and Value Prediction in Convolutional Neural Networks |
Authors | Gil Shomron, Uri Weiser |
Abstract | Convolutional neural networks (CNNs) are a widely used form of deep neural networks, introducing state-of-the-art results for different problems such as image classification, computer vision tasks, and speech recognition. However, CNNs are compute intensive, requiring billions of multiply-accumulate (MAC) operations per input. To reduce the number of MACs in CNNs, we propose a value prediction method that exploits the spatial correlation of zero-valued activations within the CNN output feature maps, thereby saving convolution operations. Our method reduces the number of MAC operations by 30.4%, averaged on three modern CNNs for ImageNet, with top-1 accuracy degradation of 1.7%, and top-5 accuracy degradation of 1.1%. |
Tasks | Image Classification, Speech Recognition |
Published | 2018-07-21 |
URL | http://arxiv.org/abs/1807.10598v2 |
http://arxiv.org/pdf/1807.10598v2.pdf | |
PWC | https://paperswithcode.com/paper/spatial-correlation-and-value-prediction-in |
Repo | |
Framework | |
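A toy illustration of the spatial-correlation idea: compute only a subset of output activations exactly and, for the remainder, predict zero when the computed neighbours are zero, so the corresponding MACs could be skipped. The checkerboard pattern and 4-neighbourhood below are illustrative assumptions, not the paper's exact prediction scheme.

```python
# Toy sketch: zero-value prediction in a post-ReLU output feature map using the
# spatial correlation of zero activations. The map is used as ground truth here;
# a real accelerator would only compute the non-skipped positions.
import numpy as np

def predict_with_spatial_zeros(exact_map):
    h, w = exact_map.shape
    out = np.zeros_like(exact_map)
    skipped = 0
    for i in range(h):
        for j in range(w):
            if (i + j) % 2 == 0:                           # computed exactly
                out[i, j] = exact_map[i, j]
                continue
            neighbours = [exact_map[r, c] for r, c in
                          ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                          if 0 <= r < h and 0 <= c < w]
            if any(n != 0 for n in neighbours):            # fall back to computing
                out[i, j] = exact_map[i, j]
            else:                                          # predict zero, skip MACs
                skipped += 1
    return out, skipped

fm = np.maximum(np.random.randn(8, 8), 0)                  # sparse post-ReLU map
approx, saved = predict_with_spatial_zeros(fm)
```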
Deep Generative Models in the Real-World: An Open Challenge from Medical Imaging
Title | Deep Generative Models in the Real-World: An Open Challenge from Medical Imaging |
Authors | Xiaoran Chen, Nick Pawlowski, Martin Rajchl, Ben Glocker, Ender Konukoglu |
Abstract | Recent advances in deep learning led to novel generative modeling techniques that achieve unprecedented quality in generated samples and performance in learning complex distributions in imaging data. These new models in medical image computing have important applications that form clinically relevant and very challenging unsupervised learning problems. In this paper, we explore the feasibility of using state-of-the-art auto-encoder-based deep generative models, such as variational and adversarial auto-encoders, for one such task: abnormality detection in medical imaging. We utilize typical, publicly available datasets with brain scans from healthy subjects and patients with stroke lesions and brain tumors. We use the data from healthy subjects to train different auto-encoder based models to learn the distribution of healthy images and detect pathologies as outliers. Models that can better learn the data distribution should be able to detect outliers more accurately. We evaluate the detection performance of deep generative models and compare them with non-deep-learning based approaches to provide a benchmark of the current state of research. We conclude that abnormality detection is a challenging task for deep generative models and that considerable room for improvement remains. In order to facilitate further research, we aim to make carefully pre-processed imaging data available to the research community. |
Tasks | Anomaly Detection |
Published | 2018-06-14 |
URL | http://arxiv.org/abs/1806.05452v1 |
http://arxiv.org/pdf/1806.05452v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-generative-models-in-the-real-world-an |
Repo | |
Framework | |
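A minimal sketch of the evaluation setup described above: an auto-encoder trained only on healthy scans should reconstruct pathological regions poorly, so per-sample reconstruction error can serve as an abnormality score. The tiny architecture and the two-sigma threshold are illustrative assumptions.

```python
# Minimal sketch of reconstruction-error anomaly scoring with an auto-encoder
# trained on healthy data; architecture and threshold are illustrative only.
import torch
import torch.nn as nn

ae = nn.Sequential(                        # toy auto-encoder for flattened patches
    nn.Linear(4096, 256), nn.ReLU(),
    nn.Linear(256, 4096), nn.Sigmoid(),
)

def abnormality_score(scan_patches):       # (N, 4096), intensities in [0, 1]
    with torch.no_grad():
        recon = ae(scan_patches)
    return ((scan_patches - recon) ** 2).mean(dim=1)     # per-patch error

scores = abnormality_score(torch.rand(10, 4096))
is_outlier = scores > scores.mean() + 2 * scores.std()   # hypothetical threshold
```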
Tone Biased MMR Text Summarization
Title | Tone Biased MMR Text Summarization |
Authors | Mayank Chaudhari, Aakash Nelson Mattukoyya |
Abstract | Text summarization is an interesting area for researchers developing new techniques to provide human-like summaries for vast amounts of information. Summarization techniques tend to focus on providing an accurate representation of content, and the tone of the content is often ignored. The tone of the content sets a baseline for how a reader perceives it; as such, being able to generate a summary with a tone that is appropriate for the reader is important. In our work we implement Maximal Marginal Relevance (MMR) based multi-document text summarization and propose a naive model to change the tone of the summary by setting a bias towards a specific set of words and restricting other words in the summarization output. This bias towards a specified set of words produces a summary whose tone matches that of the specified words. |
Tasks | Text Summarization |
Published | 2018-02-26 |
URL | http://arxiv.org/abs/1802.09426v2 |
http://arxiv.org/pdf/1802.09426v2.pdf | |
PWC | https://paperswithcode.com/paper/tone-biased-mmr-text-summarization |
Repo | |
Framework | |
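A hedged sketch of MMR-based sentence selection with a naive tone bias, in the spirit of the abstract: sentences containing words from a preferred-tone lexicon receive a bonus and restricted words a penalty. The `sim`/`relevance` callables, `lam`, and `bias` are illustrative parameters, not the authors' settings.

```python
# Sketch of greedy MMR selection with a tone bias term added to each sentence's
# score; the MMR trade-off lam and the bias weight are illustrative.
def mmr_tone_summary(sentences, sim, relevance, tone_words, banned_words,
                     lam=0.7, bias=0.2, k=3):
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max((sim(i, j) for j in selected), default=0.0)
            mmr = lam * relevance(i) - (1 - lam) * redundancy
            words = set(sentences[i].lower().split())
            mmr += bias * len(words & tone_words)     # reward the desired tone
            mmr -= bias * len(words & banned_words)   # restrict other words
            return mmr
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]
```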
Co-training Embeddings of Knowledge Graphs and Entity Descriptions for Cross-lingual Entity Alignment
Title | Co-training Embeddings of Knowledge Graphs and Entity Descriptions for Cross-lingual Entity Alignment |
Authors | Muhao Chen, Yingtao Tian, Kai-Wei Chang, Steven Skiena, Carlo Zaniolo |
Abstract | Multilingual knowledge graph (KG) embeddings provide latent semantic representations of entities and structured knowledge with cross-lingual inferences, which benefit various knowledge-driven cross-lingual NLP tasks. However, precisely learning such cross-lingual inferences is usually hindered by the low coverage of entity alignment in many KGs. Since many multilingual KGs also provide literal descriptions of entities, in this paper, we introduce an embedding-based approach which leverages a weakly aligned multilingual KG for semi-supervised cross-lingual learning using entity descriptions. Our approach performs co-training of two embedding models, i.e. a multilingual KG embedding model and a multilingual literal description embedding model. The models are trained on a large Wikipedia-based trilingual dataset where most entity alignment is unknown to training. Experimental results show that the performance of the proposed approach on the entity alignment task improves at each iteration of co-training, and eventually reaches a stage at which it significantly surpasses previous approaches. We also show that our approach has promising abilities for zero-shot entity alignment, and cross-lingual KG completion. |
Tasks | Entity Alignment, Knowledge Graphs |
Published | 2018-06-18 |
URL | http://arxiv.org/abs/1806.06478v1 |
http://arxiv.org/pdf/1806.06478v1.pdf | |
PWC | https://paperswithcode.com/paper/co-training-embeddings-of-knowledge-graphs |
Repo | |
Framework | |
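The co-training loop described in the abstract can be sketched at a high level: the KG embedding model and the description embedding model are trained in turn, and the confident alignments each proposes are fed back to enlarge the shared training set. The `train_on` and `propose_alignments` methods below are hypothetical placeholders for the actual training and alignment-proposal steps.

```python
# High-level sketch of co-training two embedding models on a weakly aligned
# multilingual KG; the model interface is a hypothetical placeholder.
def co_train(kg_model, desc_model, seed_alignments, iterations=5, threshold=0.9):
    alignments = set(seed_alignments)
    for _ in range(iterations):
        for model in (kg_model, desc_model):
            model.train_on(alignments)                    # hypothetical API
            new_pairs = model.propose_alignments(threshold)
            alignments |= set(new_pairs)                  # propagate to the peer model
    return alignments
```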
Feature Fusion through Multitask CNN for Large-scale Remote Sensing Image Segmentation
Title | Feature Fusion through Multitask CNN for Large-scale Remote Sensing Image Segmentation |
Authors | Shihao Sun, Lei Yang, Wenjie Liu, Ruirui Li |
Abstract | In recent years, Fully Convolutional Networks (FCN) have been widely used in various semantic segmentation tasks, including multi-modal remote sensing imagery. How to fuse multi-modal data to improve segmentation performance has always been a research hotspot. In this paper, a novel end-to-end fully convolutional neural network is proposed for semantic segmentation of natural color, infrared imagery and Digital Surface Models (DSM). It is based on a modified DeepUNet and performs the segmentation in a multi-task way. The channels are clustered into groups and processed on different task pipelines. After a series of segmentation and fusion steps, their shared features and private features are successfully merged. Experimental results show that the feature fusion network is efficient, and our approach achieves good performance in the ISPRS Semantic Labeling Contest (2D). |
Tasks | Semantic Segmentation |
Published | 2018-07-24 |
URL | http://arxiv.org/abs/1807.09072v1 |
http://arxiv.org/pdf/1807.09072v1.pdf | |
PWC | https://paperswithcode.com/paper/feature-fusion-through-multitask-cnn-for |
Repo | |
Framework | |
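A rough sketch of the channel-grouping idea: modality channels are split into groups (here, spectral bands versus the DSM), each group runs through its own small pipeline, and the resulting features are fused before the segmentation head. The split, layer sizes, and fusion by concatenation are assumptions, not the modified DeepUNet of the paper.

```python
# Toy two-stream fusion network for multi-modal remote sensing segmentation;
# the channel grouping and layer sizes are illustrative only.
import torch
import torch.nn as nn

class TwoStreamFusion(nn.Module):
    def __init__(self, classes=6):
        super().__init__()
        self.spectral = nn.Conv2d(4, 32, 3, padding=1)   # R, G, B, IR channels
        self.elevation = nn.Conv2d(1, 32, 3, padding=1)  # DSM channel
        self.head = nn.Conv2d(64, classes, 1)            # fuse, then classify per pixel

    def forward(self, spectral, dsm):
        fused = torch.cat([torch.relu(self.spectral(spectral)),
                           torch.relu(self.elevation(dsm))], dim=1)
        return self.head(fused)

logits = TwoStreamFusion()(torch.rand(1, 4, 256, 256), torch.rand(1, 1, 256, 256))
```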
Tandem Blocks in Deep Convolutional Neural Networks
Title | Tandem Blocks in Deep Convolutional Neural Networks |
Authors | Chris Hettinger, Tanner Christensen, Jeffrey Humpherys, Tyler J. Jarvis |
Abstract | Due to the success of residual networks (resnets) and related architectures, shortcut connections have quickly become standard tools for building convolutional neural networks. The explanations in the literature for the apparent effectiveness of shortcuts are varied and often contradictory. We hypothesize that shortcuts work primarily because they act as linear counterparts to nonlinear layers. We test this hypothesis by using several variations on the standard residual block, with different types of linear connections, to build small image classification networks. Our experiments show that other kinds of linear connections can be even more effective than the identity shortcuts. Our results also suggest that the best type of linear connection for a given application may depend on both network width and depth. |
Tasks | Image Classification |
Published | 2018-06-01 |
URL | http://arxiv.org/abs/1806.00145v1 |
http://arxiv.org/pdf/1806.00145v1.pdf | |
PWC | https://paperswithcode.com/paper/tandem-blocks-in-deep-convolutional-neural |
Repo | |
Framework | |
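The abstract's central hypothesis - that shortcuts work because they act as linear counterparts to nonlinear layers - suggests a block in which the identity shortcut is replaced by a learned linear connection. The sketch below is one plausible instantiation; the kernel size and normalization placement are assumptions.

```python
# Sketch of a "tandem" block: a learned linear connection (a convolution with no
# nonlinearity) runs in parallel with the usual nonlinear branch and the two are summed.
import torch
import torch.nn as nn

class TandemBlock(nn.Module):
    def __init__(self, channels=64, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.nonlinear = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size, padding=pad),
            nn.BatchNorm2d(channels), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size, padding=pad))
        self.linear = nn.Conv2d(channels, channels, kernel_size, padding=pad)

    def forward(self, x):
        # linear branch replaces the identity shortcut of a standard residual block
        return torch.relu(self.nonlinear(x) + self.linear(x))

y = TandemBlock()(torch.rand(1, 64, 32, 32))
```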
Comparison between Suitable Priors for Additive Bayesian Networks
Title | Comparison between Suitable Priors for Additive Bayesian Networks |
Authors | Gilles Kratzer, Reinhard Furrer, Marta Pittavino |
Abstract | Additive Bayesian networks are types of graphical models that extend the usual Bayesian generalized linear model to multiple dependent variables through the factorisation of the joint probability distribution of the underlying variables. When fitting an ABN model, the choice of the prior on the parameters is of crucial importance. If an inadequate prior - such as one that is too weakly informative - is used, data separation and data sparsity lead to issues in the model selection process. In this work a simulation study comparing two weakly informative priors and a strongly informative prior is presented. As the first weakly informative prior we use a zero-mean Gaussian prior with a large variance, currently implemented in the R-package abn. The second prior is based on the Student’s t-distribution, specifically designed for logistic regression; finally, the strongly informative prior is again Gaussian with mean equal to the true parameter value and a small variance. We compare the impact of these priors on the accuracy of the learned additive Bayesian network as a function of different parameters. We create a simulation study to illustrate Lindley’s paradox based on the prior choice. We then conclude by highlighting the good performance of the informative Student’s t-prior and the limited impact of Lindley’s paradox. Finally, suggestions for further developments are provided. |
Tasks | Model Selection |
Published | 2018-09-18 |
URL | http://arxiv.org/abs/1809.06636v1 |
http://arxiv.org/pdf/1809.06636v1.pdf | |
PWC | https://paperswithcode.com/paper/comparison-between-suitable-priors-for |
Repo | |
Framework | |
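The three priors being compared can be written down directly; the sketch below simply evaluates their log-densities side by side. The Student's t hyper-parameters and the "true" mean of the strong Gaussian are illustrative values, not those of the study.

```python
# Illustration of the three prior families compared in the study: a weakly
# informative zero-mean Gaussian with large variance, a Student's t prior for
# logistic-regression coefficients, and a strongly informative Gaussian centred
# on the true parameter value. All hyper-parameters here are illustrative.
import numpy as np
from scipy import stats

theta = np.linspace(-10, 10, 5)
weak_gauss   = stats.norm(loc=0.0, scale=10.0).logpdf(theta)
student_t    = stats.t(df=7, loc=0.0, scale=2.5).logpdf(theta)
strong_gauss = stats.norm(loc=1.5, scale=0.5).logpdf(theta)   # centred on "truth"

for name, lp in [("weak Gaussian", weak_gauss), ("Student t", student_t),
                 ("strong Gaussian", strong_gauss)]:
    print(name, np.round(lp, 2))
```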
An N Time-Slice Dynamic Chain Event Graph
Title | An N Time-Slice Dynamic Chain Event Graph |
Authors | Rodrigo A. Collazo, Jim Q. Smith |
Abstract | The Dynamic Chain Event Graph (DCEG) is able to depict many classes of discrete random processes exhibiting asymmetries in their developments and context-specific conditional probability structures. However, paradoxically, this very generality has so far frustrated its wide application. So in this paper we develop an object-oriented method to fully analyse a particularly useful and feasibly implementable new subclass of these graphical models called the N Time-Slice DCEG (NT-DCEG). After demonstrating a close relationship between an NT-DCEG and a specific class of Markov processes, we discuss how graphical modellers can exploit this connection to gain a deep understanding of their processes. We also show how context-specific independence statements, which can then be checked by domain experts, can be read from the topology of this graph. Our methods are illustrated throughout using examples of dynamic multivariate processes describing inmate radicalisation in a prison. |
Tasks | |
Published | 2018-08-17 |
URL | http://arxiv.org/abs/1808.05726v2 |
http://arxiv.org/pdf/1808.05726v2.pdf | |
PWC | https://paperswithcode.com/paper/an-n-time-slice-dynamic-chain-event-graph |
Repo | |
Framework | |
Improved Gradient-Based Optimization Over Discrete Distributions
Title | Improved Gradient-Based Optimization Over Discrete Distributions |
Authors | Evgeny Andriyash, Arash Vahdat, Bill Macready |
Abstract | In many applications we seek to maximize an expectation with respect to a distribution over discrete variables. Estimating gradients of such objectives with respect to the distribution parameters is a challenging problem. We analyze existing solutions including finite-difference (FD) estimators and continuous relaxation (CR) estimators in terms of bias and variance. We show that the commonly used Gumbel-Softmax estimator is biased and propose a simple method to reduce it. We also derive a simpler piece-wise linear continuous relaxation that also possesses reduced bias. We demonstrate empirically that reduced bias leads to a better performance in variational inference and on binary optimization tasks. |
Tasks | |
Published | 2018-09-29 |
URL | https://arxiv.org/abs/1810.00116v3 |
https://arxiv.org/pdf/1810.00116v3.pdf | |
PWC | https://paperswithcode.com/paper/improved-gradient-based-optimization-over |
Repo | |
Framework | |
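For reference, the Gumbel-Softmax estimator that the paper analyses (and finds biased) relaxes discrete sampling by adding Gumbel noise to the logits and applying a temperature-controlled softmax. The sketch below shows the standard construction, not the paper's reduced-bias variant; the temperature and payoff values are illustrative.

```python
# Standard Gumbel-Softmax relaxation: perturb logits with Gumbel noise and apply
# a temperature-controlled softmax so the sample is differentiable w.r.t. the logits.
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits, temperature=0.5):
    u = torch.rand_like(logits)
    gumbel = -torch.log(-torch.log(u + 1e-20) + 1e-20)
    return F.softmax((logits + gumbel) / temperature, dim=-1)

logits = torch.tensor([[1.0, 0.2, -0.5]], requires_grad=True)
values = torch.tensor([1.0, 2.0, 3.0])                # per-category payoff (illustrative)
sample = gumbel_softmax_sample(logits)
expected_value = (sample * values).sum()              # relaxed discrete objective
expected_value.backward()                             # gradients flow into the logits
```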
Bayesian Structure Learning by Recursive Bootstrap
Title | Bayesian Structure Learning by Recursive Bootstrap |
Authors | Raanan Y. Rohekar, Yaniv Gurwicz, Shami Nisimov, Guy Koren, Gal Novik |
Abstract | We address the problem of Bayesian structure learning for domains with hundreds of variables by employing non-parametric bootstrap, recursively. We propose a method that covers both model averaging and model selection in the same framework. The proposed method deals with the main weakness of constraint-based learning—sensitivity to errors in the independence tests—by a novel way of combining bootstrap with constraint-based learning. Essentially, we provide an algorithm for learning a tree, in which each node represents a scored CPDAG for a subset of variables and the level of the node corresponds to the maximal order of conditional independencies that are encoded in the graph. As higher order independencies are tested in deeper recursive calls, they benefit from more bootstrap samples and are therefore more resistant to the curse of dimensionality. Moreover, the re-use of stable low order independencies allows greater computational efficiency. We also provide an algorithm for sampling CPDAGs efficiently from their posterior given the learned tree. We empirically demonstrate that the proposed algorithm scales well to hundreds of variables, and learns better MAP models and more reliable causal relationships between variables than other state-of-the-art methods. |
Tasks | Model Selection |
Published | 2018-09-13 |
URL | http://arxiv.org/abs/1809.04828v1 |
http://arxiv.org/pdf/1809.04828v1.pdf | |
PWC | https://paperswithcode.com/paper/bayesian-structure-learning-by-recursive |
Repo | |
Framework | |
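The basic ingredient - replacing a single noisy conditional-independence decision with a vote over non-parametric bootstrap resamples - can be sketched as follows. The partial-correlation "test" and the thresholds are illustrative stand-ins; the paper embeds this idea in a recursive algorithm that builds a tree of scored CPDAGs.

```python
# Hedged sketch: aggregate a conditional-independence decision over bootstrap
# resamples of the data. A threshold on |partial correlation| stands in for a
# proper statistical test; all thresholds here are illustrative.
import numpy as np

def bootstrap_ci_vote(data, x, y, z, n_boot=100, threshold=0.05, rng=None):
    """data: (n, d) array; x, y: column indices; z: list of conditioning columns."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = data.shape[0]
    votes = 0
    for _ in range(n_boot):
        sample = data[rng.integers(0, n, size=n)]
        # partial correlation of x and y given z via linear residuals
        bx = np.linalg.lstsq(sample[:, z], sample[:, x], rcond=None)[0]
        by = np.linalg.lstsq(sample[:, z], sample[:, y], rcond=None)[0]
        rx = sample[:, x] - sample[:, z] @ bx
        ry = sample[:, y] - sample[:, z] @ by
        r = np.corrcoef(rx, ry)[0, 1]
        votes += abs(r) < threshold            # small |corr| counts as "independent"
    return votes / n_boot > 0.5                # majority decision over resamples
```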