July 29, 2019

2901 words 14 mins read

Paper Group AWR 185


A selectional auto-encoder approach for document image binarization. Accelerating Cross-Validation in Multinomial Logistic Regression with $\ell_1$-Regularization. SphereFace: Deep Hypersphere Embedding for Face Recognition. CNN-based Cascaded Multi-task Learning of High-level Prior and Density Estimation for Crowd Counting. Text Coherence Analysis …

A selectional auto-encoder approach for document image binarization

Title A selectional auto-encoder approach for document image binarization
Authors Jorge Calvo-Zaragoza, Antonio-Javier Gallego
Abstract Binarization plays a key role in automatic information retrieval from document images. This process is usually performed in the first stages of document analysis systems and serves as a basis for subsequent steps, so it has to be robust for the full analysis workflow to succeed. Several methods for document image binarization have been proposed so far, most of which are based on hand-crafted image processing strategies. Recently, Convolutional Neural Networks have shown remarkable performance in many disparate tasks related to computer vision. In this paper we discuss the use of convolutional auto-encoders devoted to learning an end-to-end map from an input image to its selectional output, in which activations indicate the likelihood that each pixel belongs to the foreground or the background. Once trained, documents can be binarized by passing them through the model and applying a threshold. This approach has proven to outperform existing binarization strategies in a number of document domains.
Tasks Information Retrieval
Published 2017-06-30
URL http://arxiv.org/abs/1706.10241v3
PDF http://arxiv.org/pdf/1706.10241v3.pdf
PWC https://paperswithcode.com/paper/a-selectional-auto-encoder-approach-for
Repo https://github.com/ajgallego/document-image-binarization
Framework none
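
To make the idea concrete, here is a minimal PyTorch sketch of a selectional auto-encoder followed by the thresholding step: the network outputs a per-pixel foreground likelihood and the document is binarized by thresholding those activations. The layer sizes, patch size and the 0.5 threshold are illustrative assumptions, not the authors' exact architecture (that lives in the linked repository).

```python
import torch
import torch.nn as nn

class SelectionalAutoEncoder(nn.Module):
    """Maps a grayscale patch to per-pixel foreground probabilities."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def binarize(model, patch, threshold=0.5):
    """Binarize a patch by thresholding the selectional activations."""
    with torch.no_grad():
        probs = model(patch)            # (B, 1, H, W) values in [0, 1]
    return (probs > threshold).float()  # 1 = foreground, 0 = background

model = SelectionalAutoEncoder()        # would be trained against ground-truth masks
patch = torch.rand(1, 1, 256, 256)      # dummy grayscale patch
binary = binarize(model, patch)
```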

Accelerating Cross-Validation in Multinomial Logistic Regression with $\ell_1$-Regularization

Title Accelerating Cross-Validation in Multinomial Logistic Regression with $\ell_1$-Regularization
Authors Tomoyuki Obuchi, Yoshiyuki Kabashima
Abstract We develop an approximate formula for evaluating a cross-validation estimator of predictive likelihood for multinomial logistic regression regularized by an $\ell_1$-norm. This allows us to avoid repeated optimizations required for literally conducting cross-validation; hence, the computational time can be significantly reduced. The formula is derived through a perturbative approach employing the largeness of the data size and the model dimensionality. An extension to the elastic net regularization is also addressed. The usefulness of the approximate formula is demonstrated on simulated data and the ISOLET dataset from the UCI machine learning repository.
Tasks
Published 2017-11-15
URL http://arxiv.org/abs/1711.05420v2
PDF http://arxiv.org/pdf/1711.05420v2.pdf
PWC https://paperswithcode.com/paper/accelerating-cross-validation-in-multinomial
Repo https://github.com/T-Obuchi/AcceleratedCVonMLR_matlab
Framework none
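
For context, the sketch below runs the literal K-fold cross-validation of an $\ell_1$-regularized multinomial logistic regression that the paper's perturbative formula is designed to avoid: one full optimization per fold. The use of scikit-learn, the regularization strength and the fold count are illustrative assumptions; the approximate estimator itself is implemented in the authors' linked MATLAB repository.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_digits(return_X_y=True)
X = X / 16.0  # scale features to help the solver converge

# Literal K-fold CV of the predictive log-likelihood: one full
# l1-regularized fit per fold -- exactly the repeated optimization
# the paper's approximate formula removes.
kf = KFold(n_splits=10, shuffle=True, random_state=0)
log_liks = []
for train_idx, val_idx in kf.split(X):
    clf = LogisticRegression(penalty="l1", solver="saga", C=0.1, max_iter=2000)
    clf.fit(X[train_idx], y[train_idx])
    probs = clf.predict_proba(X[val_idx])
    log_liks.append(np.mean(np.log(probs[np.arange(len(val_idx)), y[val_idx]])))

print("CV estimate of the predictive log-likelihood:", np.mean(log_liks))
```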

SphereFace: Deep Hypersphere Embedding for Face Recognition

Title SphereFace: Deep Hypersphere Embedding for Face Recognition
Authors Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj, Le Song
Abstract This paper addresses the deep face recognition (FR) problem under the open-set protocol, where ideal face features are expected to have smaller maximal intra-class distance than minimal inter-class distance under a suitably chosen metric space. However, few existing algorithms can effectively achieve this criterion. To this end, we propose the angular softmax (A-Softmax) loss that enables convolutional neural networks (CNNs) to learn angularly discriminative features. Geometrically, A-Softmax loss can be viewed as imposing discriminative constraints on a hypersphere manifold, which intrinsically matches the prior that faces also lie on a manifold. Moreover, the size of the angular margin can be quantitatively adjusted by a parameter $m$. We further derive specific $m$ to approximate the ideal feature criterion. Extensive analysis and experiments on Labeled Faces in the Wild (LFW), YouTube Faces (YTF) and the MegaFace Challenge show the superiority of A-Softmax loss in FR tasks. The code has also been made publicly available.
Tasks Face Identification, Face Recognition, Face Verification
Published 2017-04-26
URL http://arxiv.org/abs/1704.08063v4
PDF http://arxiv.org/pdf/1704.08063v4.pdf
PWC https://paperswithcode.com/paper/sphereface-deep-hypersphere-embedding-for
Repo https://github.com/wy1iu/sphereface
Framework tf
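
The core of A-Softmax is to replace the target-class logit with $\lVert x\rVert \cos(m\,\theta_y)$, where the class weights are $\ell_2$-normalized and $\theta_y$ is the angle between the feature and the target weight. Below is a simplified PyTorch sketch of that idea; it applies $\cos(m\theta)$ directly and omits the piecewise $\psi(\theta)$ extension and the annealing schedule used in the paper, and the hyperparameters are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimplifiedASoftmax(nn.Module):
    """Angular-margin logits: ||x|| * cos(m * theta) for the target class."""
    def __init__(self, in_features, num_classes, m=4):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, in_features))
        self.m = m

    def forward(self, x, labels):
        w = F.normalize(self.weight, dim=1)          # unit-norm class weights
        x_norm = x.norm(dim=1, keepdim=True)         # ||x||
        cos_theta = F.normalize(x, dim=1) @ w.t()    # cos(theta) for every class
        theta = torch.acos(cos_theta.clamp(-1 + 1e-7, 1 - 1e-7))
        # Replace the target-class cosine with cos(m * theta).
        # NOTE: the full A-Softmax uses a monotonic extension psi(theta)
        # so the margin stays valid when m*theta exceeds pi; omitted here.
        target_cos = torch.cos(self.m * theta)
        onehot = F.one_hot(labels, cos_theta.size(1)).bool()
        logits = torch.where(onehot, target_cos, cos_theta) * x_norm
        return F.cross_entropy(logits, labels)

layer = SimplifiedASoftmax(in_features=512, num_classes=10)
feats = torch.randn(8, 512)
labels = torch.randint(0, 10, (8,))
loss = layer(feats, labels)
```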

CNN-based Cascaded Multi-task Learning of High-level Prior and Density Estimation for Crowd Counting

Title CNN-based Cascaded Multi-task Learning of High-level Prior and Density Estimation for Crowd Counting
Authors Vishwanath A. Sindagi, Vishal M. Patel
Abstract Estimating crowd count in densely crowded scenes is an extremely challenging task due to non-uniform scale variations. In this paper, we propose a novel end-to-end cascaded network of CNNs to jointly learn crowd count classification and density map estimation. Classifying crowd count into various groups is tantamount to coarsely estimating the total count in the image, thereby incorporating a high-level prior into the density estimation network. This enables the layers in the network to learn globally relevant discriminative features which aid in estimating highly refined density maps with lower count error. The joint training is performed in an end-to-end fashion. Extensive experiments on highly challenging publicly available datasets show that the proposed method achieves lower count error and better-quality density maps compared to recent state-of-the-art methods.
Tasks Crowd Counting, Density Estimation, Multi-Task Learning
Published 2017-07-30
URL http://arxiv.org/abs/1707.09605v2
PDF http://arxiv.org/pdf/1707.09605v2.pdf
PWC https://paperswithcode.com/paper/cnn-based-cascaded-multi-task-learning-of
Repo https://github.com/surajdakua/Crowd-Counting-Using-Pytorch
Framework pytorch
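
The cascade shares a trunk between a count-group classifier (the high-level prior) and a density-map regressor, trained jointly. Here is a minimal PyTorch sketch of that structure; the layer sizes, number of count groups and loss weighting are assumptions for illustration, not the authors' configuration.

```python
import torch
import torch.nn as nn

class CascadedCrowdNet(nn.Module):
    def __init__(self, num_count_groups=10):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        # High-level prior: classify the image into coarse count bins.
        self.count_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_count_groups)
        )
        # Density-map estimation head on the shared features.
        self.density_head = nn.Conv2d(64, 1, 1)

    def forward(self, x):
        feats = self.trunk(x)
        return self.count_head(feats), self.density_head(feats)

model = CascadedCrowdNet()
img = torch.rand(2, 3, 128, 128)
count_bin = torch.randint(0, 10, (2,))          # coarse count group labels
density_gt = torch.rand(2, 1, 128, 128)         # ground-truth density maps

count_logits, density = model(img)
# Joint end-to-end loss: classification prior + density regression.
loss = nn.CrossEntropyLoss()(count_logits, count_bin) + \
       nn.MSELoss()(density, density_gt)
```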

Text Coherence Analysis Based on Deep Neural Network

Title Text Coherence Analysis Based on Deep Neural Network
Authors Baiyun Cui, Yingming Li, Yaqing Zhang, Zhongfei Zhang
Abstract In this paper, we propose a novel deep coherence model (DCM) using a convolutional neural network architecture to capture text coherence. The text coherence problem is investigated from a new perspective of learning sentence distributional representations and modeling text coherence simultaneously. In particular, the model captures the interactions between sentences by computing the similarities of their distributional representations. Further, it can be easily trained in an end-to-end fashion. The proposed model is evaluated on a standard Sentence Ordering task. The experimental results demonstrate its effectiveness and promise in coherence assessment, showing a significant improvement over the state-of-the-art.
Tasks Sentence Ordering
Published 2017-10-21
URL http://arxiv.org/abs/1710.07770v1
PDF http://arxiv.org/pdf/1710.07770v1.pdf
PWC https://paperswithcode.com/paper/text-coherence-analysis-based-on-deep-neural
Repo https://github.com/geekSiddharth/DeepCoherence
Framework none
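
The pipeline described above can be sketched as: encode each sentence into a distributional representation, compute similarities between neighbouring sentence representations, and score coherence from them. The PyTorch sketch below follows that shape with a small 1-D CNN encoder and cosine similarity; the dimensions and the final scoring layer are illustrative assumptions rather than the authors' model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceEncoder(nn.Module):
    """CNN over word embeddings -> fixed-size sentence representation."""
    def __init__(self, vocab_size=10000, emb_dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, hidden, kernel_size=3, padding=1)

    def forward(self, tokens):                 # (num_sents, seq_len)
        x = self.emb(tokens).transpose(1, 2)   # (num_sents, emb_dim, seq_len)
        h = F.relu(self.conv(x))
        return h.max(dim=2).values             # max-over-time pooling

def coherence_score(encoder, scorer, tokens):
    """Score a document from similarities between adjacent sentences."""
    sents = encoder(tokens)                            # (num_sents, hidden)
    sims = F.cosine_similarity(sents[:-1], sents[1:])  # adjacent-pair similarity
    return torch.sigmoid(scorer(sims.mean(dim=0, keepdim=True)))

encoder = SentenceEncoder()
scorer = nn.Linear(1, 1)
doc = torch.randint(0, 10000, (5, 20))   # 5 sentences, 20 token ids each
score = coherence_score(encoder, scorer, doc)
```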

Unpaired Photo-to-Caricature Translation on Faces in the Wild

Title Unpaired Photo-to-Caricature Translation on Faces in the Wild
Authors Ziqiang Zheng, Wang Chao, Zhibin Yu, Nan Wang, Haiyong Zheng, Bing Zheng
Abstract Recently, much progress has been made in image-to-image translation owing to the success of conditional Generative Adversarial Networks (cGANs), and unpaired methods based on a cycle consistency loss, such as DualGAN, CycleGAN and DiscoGAN, have become popular. However, it is still very challenging for translation tasks that require high-level visual information conversion, such as photo-to-caricature translation, which demands satire, exaggeration, lifelikeness and artistry. We present an approach for learning to translate faces in the wild from the source photo domain to the target caricature domain with different styles, which can also be used for other high-level image-to-image translation tasks. In order to capture global structure together with local statistics during translation, we design a dual-pathway model with one coarse discriminator and one fine discriminator. For the generator, we add an extra perceptual loss alongside the adversarial loss and cycle consistency loss to achieve representation learning for the two domains. Style can also be learned from an auxiliary noise input. Experiments on photo-to-caricature translation of faces in the wild show a considerable performance gain of the proposed method over state-of-the-art translation methods, as well as its potential for real applications.
Tasks Caricature, Image-to-Image Translation, Photo-To-Caricature Translation, Representation Learning
Published 2017-11-29
URL http://arxiv.org/abs/1711.10735v2
PDF http://arxiv.org/pdf/1711.10735v2.pdf
PWC https://paperswithcode.com/paper/unpaired-photo-to-caricature-translation-on
Repo https://github.com/zhengziqiang/P2C
Framework tf
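
The key training signal is a dual-pathway discrimination (a coarse discriminator on a downsampled view for global structure, a fine discriminator at full resolution for local statistics) combined with adversarial, cycle consistency and perceptual losses. Below is a minimal PyTorch sketch of how such a composite generator loss might be assembled; all networks and loss weights are placeholders, not the authors' models.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def patch_disc():
    """Tiny PatchGAN-style discriminator (placeholder)."""
    return nn.Sequential(
        nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(32, 1, 4, stride=2, padding=1),
    )

G = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1))        # placeholder photo -> caricature
G_inv = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1))    # placeholder caricature -> photo
D_fine, D_coarse = patch_disc(), patch_disc()
perceptual = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1))  # stand-in for a pretrained feature extractor

photo = torch.rand(2, 3, 128, 128)
fake = G(photo)

# Fine discriminator sees full resolution (local statistics);
# coarse discriminator sees a downsampled view (global structure).
fine_out = D_fine(fake)
coarse_out = D_coarse(F.avg_pool2d(fake, 4))
adv = F.binary_cross_entropy_with_logits(fine_out, torch.ones_like(fine_out)) + \
      F.binary_cross_entropy_with_logits(coarse_out, torch.ones_like(coarse_out))

cycle = F.l1_loss(G_inv(fake), photo)                    # cycle consistency loss
perc = F.l1_loss(perceptual(fake), perceptual(photo))    # perceptual loss
g_loss = adv + 10.0 * cycle + 1.0 * perc                 # illustrative weights
```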

Parallel WaveNet: Fast High-Fidelity Speech Synthesis

Title Parallel WaveNet: Fast High-Fidelity Speech Synthesis
Authors Aaron van den Oord, Yazhe Li, Igor Babuschkin, Karen Simonyan, Oriol Vinyals, Koray Kavukcuoglu, George van den Driessche, Edward Lockhart, Luis C. Cobo, Florian Stimberg, Norman Casagrande, Dominik Grewe, Seb Noury, Sander Dieleman, Erich Elsen, Nal Kalchbrenner, Heiga Zen, Alex Graves, Helen King, Tom Walters, Dan Belov, Demis Hassabis
Abstract The recently-developed WaveNet architecture is the current state of the art in realistic speech synthesis, consistently rated as more natural sounding for many different languages than any previous system. However, because WaveNet relies on sequential generation of one audio sample at a time, it is poorly suited to today’s massively parallel computers, and therefore hard to deploy in a real-time production setting. This paper introduces Probability Density Distillation, a new method for training a parallel feed-forward network from a trained WaveNet with no significant difference in quality. The resulting system is capable of generating high-fidelity speech samples more than 20 times faster than real time, and is deployed online by Google Assistant, including serving multiple English and Japanese voices.
Tasks Speech Synthesis
Published 2017-11-28
URL http://arxiv.org/abs/1711.10433v1
PDF http://arxiv.org/pdf/1711.10433v1.pdf
PWC https://paperswithcode.com/paper/parallel-wavenet-fast-high-fidelity-speech
Repo https://github.com/lokhiufung/music_generation
Framework tf
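
Probability Density Distillation trains the parallel student to match the frozen autoregressive teacher by minimizing KL(student || teacher), i.e. the cross-entropy with the teacher minus the student's entropy, estimated on samples the student draws in parallel. The toy PyTorch sketch below illustrates that loss with simple per-sample Gaussian output distributions; it is a schematic of the objective, not the WaveNet or IAF architecture.

```python
import torch
import torch.nn as nn

# Toy "teacher" and "student" predicting per-sample Gaussian parameters.
teacher = nn.Linear(1, 2)   # stand-in for a trained autoregressive WaveNet
student = nn.Linear(1, 2)   # stand-in for the parallel feed-forward generator

def gaussian_params(net, cond):
    mu, log_sigma = net(cond).chunk(2, dim=-1)
    return mu, log_sigma

cond = torch.rand(64, 1)                       # conditioning (e.g. mel frames)
mu_s, log_sigma_s = gaussian_params(student, cond)

# Sample from the student with the reparameterization trick (a parallel draw).
eps = torch.randn_like(mu_s)
x = mu_s + eps * log_sigma_s.exp()
student_dist = torch.distributions.Normal(mu_s, log_sigma_s.exp())

with torch.no_grad():                          # the teacher is frozen
    mu_t, log_sigma_t = gaussian_params(teacher, cond)
teacher_dist = torch.distributions.Normal(mu_t, log_sigma_t.exp())

# Probability Density Distillation: KL(student || teacher)
# = E_{x ~ student}[log q_student(x) - log p_teacher(x)]
#   (cross-entropy with the teacher minus the student's entropy).
loss = (student_dist.log_prob(x) - teacher_dist.log_prob(x)).mean()
loss.backward()
```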

Scene Graph Generation by Iterative Message Passing

Title Scene Graph Generation by Iterative Message Passing
Authors Danfei Xu, Yuke Zhu, Christopher B. Choy, Li Fei-Fei
Abstract Understanding a visual scene goes beyond recognizing individual objects in isolation. Relationships between objects also constitute rich semantic information about the scene. In this work, we explicitly model the objects and their relationships using scene graphs, a visually-grounded graphical structure of an image. We propose a novel end-to-end model that generates such structured scene representations from an input image. The model solves the scene graph inference problem using standard RNNs and learns to iteratively improve its predictions via message passing. Our joint inference model can take advantage of contextual cues to make better predictions on objects and their relationships. The experiments show that our model significantly outperforms previous methods for generating scene graphs on the Visual Genome dataset and for inferring support relations on the NYU Depth v2 dataset.
Tasks Graph Generation, Scene Graph Generation
Published 2017-01-10
URL http://arxiv.org/abs/1701.02426v2
PDF http://arxiv.org/pdf/1701.02426v2.pdf
PWC https://paperswithcode.com/paper/scene-graph-generation-by-iterative-message
Repo https://github.com/shikorab/SceneGraph
Framework tf
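
The model keeps hidden states for object nodes and relationship edges and alternates GRU-based message passing between them. Below is a stripped-down PyTorch sketch of one such alternation over dense node/edge states; the pooling and message functions are simplified assumptions relative to the paper's primal-dual formulation.

```python
import torch
import torch.nn as nn

num_nodes, d = 4, 64
node_gru = nn.GRUCell(d, d)
edge_gru = nn.GRUCell(d, d)

node_h = torch.randn(num_nodes, d)              # object (node) states
edge_h = torch.randn(num_nodes, num_nodes, d)   # relationship (edge) states, i -> j

for _ in range(3):  # iterative message passing
    # Edge -> node messages: pool over incoming and outgoing edges.
    node_msg = edge_h.mean(dim=1) + edge_h.mean(dim=0)
    node_h = node_gru(node_msg, node_h)

    # Node -> edge messages: combine the two endpoint states of each edge.
    edge_msg = node_h.unsqueeze(1) + node_h.unsqueeze(0)              # (N, N, d)
    edge_h = edge_gru(edge_msg.reshape(-1, d),
                      edge_h.reshape(-1, d)).reshape(num_nodes, num_nodes, d)

# The final node and edge states would be decoded into object classes
# and predicate labels to form the scene graph.
```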

Generative Adversarial Source Separation

Title Generative Adversarial Source Separation
Authors Cem Subakan, Paris Smaragdis
Abstract Generative source separation methods such as non-negative matrix factorization (NMF) or auto-encoders rely on the assumption of an output probability density. Generative Adversarial Networks (GANs) can learn data distributions without needing a parametric assumption on the output density. We show on a speech source separation experiment that a multi-layer perceptron trained with a Wasserstein-GAN formulation outperforms NMF, auto-encoders trained with maximum likelihood, and variational auto-encoders in terms of source-to-distortion ratio.
Tasks
Published 2017-10-30
URL http://arxiv.org/abs/1710.10779v1
PDF http://arxiv.org/pdf/1710.10779v1.pdf
PWC https://paperswithcode.com/paper/generative-adversarial-source-separation
Repo https://github.com/ycemsubakan/sourceseparation_misc
Framework pytorch
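
The generator here is an MLP producing source spectra, trained adversarially with a Wasserstein critic instead of a maximum-likelihood reconstruction objective. The sketch below shows one WGAN-style critic/generator update on toy spectrogram frames; the layer sizes, clipping constant and data are illustrative assumptions.

```python
import torch
import torch.nn as nn

dim = 257                                   # e.g. magnitude-spectrum bins
G = nn.Sequential(nn.Linear(dim, 512), nn.ReLU(), nn.Linear(512, dim), nn.Softplus())
D = nn.Sequential(nn.Linear(dim, 512), nn.ReLU(), nn.Linear(512, 1))   # critic

opt_g = torch.optim.RMSprop(G.parameters(), lr=5e-5)
opt_d = torch.optim.RMSprop(D.parameters(), lr=5e-5)

real = torch.rand(32, dim)                  # frames of the target source
mixture = torch.rand(32, dim)               # conditioning input (mixture frames)

# Critic step: maximize D(real) - D(G(mixture)), with weight clipping.
opt_d.zero_grad()
d_loss = -(D(real).mean() - D(G(mixture).detach()).mean())
d_loss.backward()
opt_d.step()
for p in D.parameters():
    p.data.clamp_(-0.01, 0.01)

# Generator step: maximize D(G(mixture)).
opt_g.zero_grad()
g_loss = -D(G(mixture)).mean()
g_loss.backward()
opt_g.step()
```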

Nonlinear Information Bottleneck

Title Nonlinear Information Bottleneck
Authors Artemy Kolchinsky, Brendan D. Tracey, David H. Wolpert
Abstract Information bottleneck (IB) is a technique for extracting information in one random variable $X$ that is relevant for predicting another random variable $Y$. IB works by encoding $X$ in a compressed “bottleneck” random variable $M$ from which $Y$ can be accurately decoded. However, finding the optimal bottleneck variable involves a difficult optimization problem, which until recently has been considered for only two limited cases: discrete $X$ and $Y$ with small state spaces, and continuous $X$ and $Y$ with a Gaussian joint distribution (in which case optimal encoding and decoding maps are linear). We propose a method for performing IB on arbitrarily-distributed discrete and/or continuous $X$ and $Y$, while allowing for nonlinear encoding and decoding maps. Our approach relies on a novel non-parametric upper bound for mutual information. We describe how to implement our method using neural networks. We then show that it achieves better performance than the recently-proposed “variational IB” method on several real-world datasets.
Tasks
Published 2017-05-06
URL https://arxiv.org/abs/1705.02436v9
PDF https://arxiv.org/pdf/1705.02436v9.pdf
PWC https://paperswithcode.com/paper/nonlinear-information-bottleneck
Repo https://github.com/burklight/convex-IB-Lagrangian-PyTorch
Framework pytorch
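
Training trades off compressing $X$ into the bottleneck $M$ against preserving information about $Y$, with a non-parametric upper bound on $I(X;M)$ computed from pairwise distances between the Gaussian-perturbed encodings. The PyTorch sketch below is a simplified rendering of that objective under the assumption of equal-variance spherical Gaussians; the constants, noise scale $\sigma$ and trade-off $\beta$ are illustrative, and the paper should be consulted for the exact estimator.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 32))
decoder = nn.Linear(32, 10)
sigma, beta = 0.5, 1e-2                     # noise scale and trade-off (illustrative)

x = torch.rand(128, 784)
y = torch.randint(0, 10, (128,))

mu = encoder(x)
m = mu + sigma * torch.randn_like(mu)       # stochastic bottleneck M

# Pairwise-distance (mixture-of-Gaussians) upper bound on I(X; M):
#   I_hat = -1/N sum_i log( 1/N sum_j exp(-||mu_i - mu_j||^2 / (2 sigma^2)) )
# This is a simplified form of the non-parametric bound used in the paper.
dists = torch.cdist(mu, mu) ** 2
i_xm = -torch.logsumexp(-dists / (2 * sigma ** 2), dim=1).mean() + math.log(len(mu))

# Prediction term: cross-entropy stands in for -I(M; Y) up to a constant.
ce = F.cross_entropy(decoder(m), y)

loss = ce + beta * i_xm
loss.backward()
```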

A Regularized Framework for Sparse and Structured Neural Attention

Title A Regularized Framework for Sparse and Structured Neural Attention
Authors Vlad Niculae, Mathieu Blondel
Abstract Modern neural networks are often augmented with an attention mechanism, which tells the network where to focus within the input. We propose in this paper a new framework for sparse and structured attention, building upon a smoothed max operator. We show that the gradient of this operator defines a mapping from real values to probabilities, suitable as an attention mechanism. Our framework includes softmax and a slight generalization of the recently-proposed sparsemax as special cases. However, we also show how our framework can incorporate modern structured penalties, resulting in more interpretable attention mechanisms that focus on entire segments or groups of an input. We derive efficient algorithms to compute the forward and backward passes of our attention mechanisms, enabling their use in a neural network trained with backpropagation. To showcase their potential as a drop-in replacement for existing ones, we evaluate our attention mechanisms on three large-scale tasks: textual entailment, machine translation, and sentence summarization. Our attention mechanisms improve interpretability without sacrificing performance; notably, on textual entailment and summarization, we outperform the standard attention mechanisms based on softmax and sparsemax.
Tasks Machine Translation, Natural Language Inference, Text Summarization
Published 2017-05-22
URL http://arxiv.org/abs/1705.07704v3
PDF http://arxiv.org/pdf/1705.07704v3.pdf
PWC https://paperswithcode.com/paper/a-regularized-framework-for-sparse-and
Repo https://github.com/weiwang2330/sparse-structured-attention
Framework pytorch
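
The framework derives attention mappings as gradients of a smoothed max operator, with softmax and sparsemax as special cases. As a concrete instance of the sparse case, here is a standalone sparsemax (Euclidean projection of the scores onto the simplex) in PyTorch; it is the known sparsemax special case only, not the paper's full structured-penalty machinery.

```python
import torch

def sparsemax(z, dim=-1):
    """Sparsemax: Euclidean projection of the scores z onto the probability simplex.

    Unlike softmax, it can assign exactly zero weight to low-scoring positions.
    """
    z_sorted, _ = torch.sort(z, dim=dim, descending=True)
    cumsum = z_sorted.cumsum(dim)
    k = torch.arange(1, z.size(dim) + 1, device=z.device, dtype=z.dtype)
    view = [1] * z.dim()
    view[dim] = -1
    k = k.view(view)
    support = (1 + k * z_sorted) > cumsum              # entries kept in the support
    k_size = support.sum(dim=dim, keepdim=True)        # size of the support
    tau = (cumsum.gather(dim, k_size - 1) - 1) / k_size.to(z.dtype)
    return torch.clamp(z - tau, min=0)

scores = torch.tensor([[2.0, 1.0, 0.1, -1.0]])
print(sparsemax(scores))           # sparse attention weights, e.g. [[1., 0., 0., 0.]]
print(torch.softmax(scores, -1))   # dense: always strictly positive everywhere
```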

Joint Topic-Semantic-aware Social Recommendation for Online Voting

Title Joint Topic-Semantic-aware Social Recommendation for Online Voting
Authors Hongwei Wang, Jia Wang, Miao Zhao, Jiannong Cao, Minyi Guo
Abstract Online voting is an emerging feature in social networks, in which users can express their attitudes toward various issues and show their unique interests. Online voting imposes new challenges on recommendation, because the propagation of votings heavily depends on the structure of social networks as well as the content of votings. In this paper, we investigate how to utilize these two factors in a comprehensive manner when doing voting recommendation. First, due to the fact that existing text mining methods such as topic models and semantic models cannot well process the content of votings, which is typically short and ambiguous, we propose a novel Topic-Enhanced Word Embedding (TEWE) method to learn word and document representations by jointly considering their topics and semantics. Then we propose our Joint Topic-Semantic-aware social Matrix Factorization (JTS-MF) model for voting recommendation. The JTS-MF model calculates similarity among users and votings by combining their TEWE representation and the structural information of social networks, and preserves this topic-semantic-social similarity during matrix factorization. To evaluate the performance of the TEWE representation and the JTS-MF model, we conduct extensive experiments on a real online voting dataset. The results prove the efficacy of our approach against several state-of-the-art baselines.
Tasks
Published 2017-12-03
URL http://arxiv.org/abs/1712.00731v1
PDF http://arxiv.org/pdf/1712.00731v1.pdf
PWC https://paperswithcode.com/paper/joint-topic-semantic-aware-social
Repo https://github.com/hwwang55/JTS-MF
Framework none
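
In spirit, JTS-MF factorizes the user-voting matrix while regularizing the user and voting latent factors toward their similar neighbours, so that the combined topic-semantic-social similarity is preserved. The PyTorch sketch below shows a toy version of such similarity-regularized matrix factorization; the similarity matrices, weights and dimensions are illustrative placeholders, and TEWE itself is not implemented here.

```python
import torch

n_users, n_votes, k = 50, 80, 16
R = (torch.rand(n_users, n_votes) > 0.95).float()      # observed participation matrix
user_sim = torch.rand(n_users, n_users)                # stand-in for TEWE + social similarity
vote_sim = torch.rand(n_votes, n_votes)                # stand-in for TEWE similarity

U = torch.randn(n_users, k, requires_grad=True)        # user latent factors
V = torch.randn(n_votes, k, requires_grad=True)        # voting latent factors
opt = torch.optim.Adam([U, V], lr=0.01)
alpha, beta, lam = 0.1, 0.1, 0.01                      # illustrative weights

for step in range(200):
    opt.zero_grad()
    recon = (R - U @ V.t()).pow(2).mean()
    # Preserve similarity: pull each factor toward its neighbours' weighted average.
    u_reg = (U - (user_sim / user_sim.sum(1, keepdim=True)) @ U).pow(2).mean()
    v_reg = (V - (vote_sim / vote_sim.sum(1, keepdim=True)) @ V).pow(2).mean()
    loss = recon + alpha * u_reg + beta * v_reg + lam * (U.pow(2).mean() + V.pow(2).mean())
    loss.backward()
    opt.step()

scores = U @ V.t()        # recommend the votings with the highest predicted scores
```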

Fake News Detection on Social Media: A Data Mining Perspective

Title Fake News Detection on Social Media: A Data Mining Perspective
Authors Kai Shu, Amy Sliva, Suhang Wang, Jiliang Tang, Huan Liu
Abstract Social media for news consumption is a double-edged sword. On the one hand, its low cost, easy access, and rapid dissemination of information lead people to seek out and consume news from social media. On the other hand, it enables the wide spread of “fake news”, i.e., low-quality news with intentionally false information. The extensive spread of fake news has the potential for extremely negative impacts on individuals and society. Therefore, fake news detection on social media has recently become an emerging research topic that is attracting tremendous attention. Fake news detection on social media presents unique characteristics and challenges that make existing detection algorithms from traditional news media ineffective or not applicable. First, fake news is intentionally written to mislead readers to believe false information, which makes it difficult and nontrivial to detect based on news content; therefore, we need to include auxiliary information, such as user social engagements on social media, to help make a determination. Second, exploiting this auxiliary information is challenging in and of itself as users’ social engagements with fake news produce data that is big, incomplete, unstructured, and noisy. Because the issue of fake news detection on social media is both challenging and relevant, we conducted this survey to further facilitate research on the problem. In this survey, we present a comprehensive review of detecting fake news on social media, including characterizations of fake news based on psychological and social theories, existing algorithms from a data mining perspective, evaluation metrics and representative datasets. We also discuss related research areas, open problems, and future research directions for fake news detection on social media.
Tasks Fake News Detection
Published 2017-08-07
URL http://arxiv.org/abs/1708.01967v3
PDF http://arxiv.org/pdf/1708.01967v3.pdf
PWC https://paperswithcode.com/paper/fake-news-detection-on-social-media-a-data
Repo https://github.com/KaiDMML/FakeNewsNet
Framework none

soc2seq: Social Embedding meets Conversation Model

Title soc2seq: Social Embedding meets Conversation Model
Authors Parminder Bhatia, Marsal Gavalda, Arash Einolghozati
Abstract While liking or upvoting a post on a mobile app is easy to do, replying with a written note is much more difficult, due to both the cognitive load of coming up with a meaningful response and the mechanics of entering the text. Here we present a novel textual reply generation model that goes beyond current auto-reply and predictive text entry models by taking into account the content preferences of the user, the idiosyncrasies of their conversational style, and even the structure of their social graph. Specifically, we have developed two types of models for personalized user interactions: a content-based conversation model, which makes use of location together with user information, and a social-graph-based conversation model, which combines content-based conversation models with social graphs.
Tasks
Published 2017-02-17
URL http://arxiv.org/abs/1702.05512v3
PDF http://arxiv.org/pdf/1702.05512v3.pdf
PWC https://paperswithcode.com/paper/soc2seq-social-embedding-meets-conversation
Repo https://github.com/pbhatia243/Neural_Conversation_Models
Framework tf

Are GANs Created Equal? A Large-Scale Study

Title Are GANs Created Equal? A Large-Scale Study
Authors Mario Lucic, Karol Kurach, Marcin Michalski, Sylvain Gelly, Olivier Bousquet
Abstract Generative adversarial networks (GAN) are a powerful subclass of generative models. Despite very rich research activity leading to numerous interesting GAN algorithms, it is still very hard to assess which algorithm(s) perform better than others. We conduct a neutral, multi-faceted large-scale empirical study on state-of-the-art models and evaluation measures. We find that most models can reach similar scores with enough hyperparameter optimization and random restarts. This suggests that improvements can arise from a higher computational budget and tuning more than fundamental algorithmic changes. To overcome some limitations of the current metrics, we also propose several data sets on which precision and recall can be computed. Our experimental results suggest that future GAN research should be based on more systematic and objective evaluation procedures. Finally, we did not find evidence that any of the tested algorithms consistently outperforms the non-saturating GAN introduced in \cite{goodfellow2014generative}.
Tasks Hyperparameter Optimization
Published 2017-11-28
URL http://arxiv.org/abs/1711.10337v4
PDF http://arxiv.org/pdf/1711.10337v4.pdf
PWC https://paperswithcode.com/paper/are-gans-created-equal-a-large-scale-study
Repo https://github.com/mseitzer/pytorch-fid
Framework pytorch