July 29, 2019

2901 words 14 mins read

Paper Group AWR 185


A selectional auto-encoder approach for document image binarization. Accelerating Cross-Validation in Multinomial Logistic Regression with $\ell_1$-Regularization. SphereFace: Deep Hypersphere Embedding for Face Recognition. CNN-based Cascaded Multi-task Learning of High-level Prior and Density Estimation for Crowd Counting. Text Coherence Analysis …

A selectional auto-encoder approach for document image binarization

Title A selectional auto-encoder approach for document image binarization
Authors Jorge Calvo-Zaragoza, Antonio-Javier Gallego
Abstract Binarization plays a key role in automatic information retrieval from document images. This process is usually performed in the first stages of document analysis systems and serves as a basis for subsequent steps, so it has to be robust for the full analysis workflow to succeed. Several methods for document image binarization have been proposed so far, most of which are based on hand-crafted image processing strategies. Recently, Convolutional Neural Networks have shown remarkable performance in many disparate tasks related to computer vision. In this paper we discuss the use of convolutional auto-encoders devoted to learning an end-to-end map from an input image to its selectional output, in which activations indicate the likelihood that each pixel belongs to the foreground or the background. Once trained, documents can be binarized by passing them through the model and applying a threshold. This approach has proven to outperform existing binarization strategies in a number of document domains.
Tasks Information Retrieval
Published 2017-06-30
URL http://arxiv.org/abs/1706.10241v3
PDF http://arxiv.org/pdf/1706.10241v3.pdf
PWC https://paperswithcode.com/paper/a-selectional-auto-encoder-approach-for
Repo https://github.com/ajgallego/document-image-binarization
Framework none
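
To make the idea concrete, here is a minimal PyTorch sketch of a selectional auto-encoder followed by the thresholding step: the network outputs a per-pixel foreground likelihood and the document is binarized by thresholding those activations. The layer sizes, patch size and the 0.5 threshold are illustrative assumptions, not the authors' exact architecture (that lives in the linked repository).

```python
import torch
import torch.nn as nn

class SelectionalAutoEncoder(nn.Module):
    """Maps a grayscale patch to per-pixel foreground probabilities."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def binarize(model, patch, threshold=0.5):
    """Binarize a patch by thresholding the selectional activations."""
    with torch.no_grad():
        probs = model(patch)            # (B, 1, H, W) values in [0, 1]
    return (probs > threshold).float()  # 1 = foreground, 0 = background

model = SelectionalAutoEncoder()        # would be trained against ground-truth masks
patch = torch.rand(1, 1, 256, 256)      # dummy grayscale patch
binary = binarize(model, patch)
```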

Accelerating Cross-Validation in Multinomial Logistic Regression with $\ell_1$-Regularization

Title Accelerating Cross-Validation in Multinomial Logistic Regression with $\ell_1$-Regularization
Authors Tomoyuki Obuchi, Yoshiyuki Kabashima
Abstract We develop an approximate formula for evaluating a cross-validation estimator of predictive likelihood for multinomial logistic regression regularized by an $\ell_1$-norm. This allows us to avoid repeated optimizations required for literally conducting cross-validation; hence, the computational time can be significantly reduced. The formula is derived through a perturbative approach employing the largeness of the data size and the model dimensionality. An extension to the elastic net regularization is also addressed. The usefulness of the approximate formula is demonstrated on simulated data and the ISOLET dataset from the UCI machine learning repository.
Tasks
Published 2017-11-15
URL http://arxiv.org/abs/1711.05420v2
PDF http://arxiv.org/pdf/1711.05420v2.pdf
PWC https://paperswithcode.com/paper/accelerating-cross-validation-in-multinomial
Repo https://github.com/T-Obuchi/AcceleratedCVonMLR_matlab
Framework none
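
For context, the sketch below runs the literal K-fold cross-validation of an $\ell_1$-regularized multinomial logistic regression that the paper's perturbative formula is designed to avoid: one full optimization per fold. The use of scikit-learn, the regularization strength and the fold count are illustrative assumptions; the approximate estimator itself is implemented in the authors' linked MATLAB repository.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_digits(return_X_y=True)
X = X / 16.0  # scale features to help the solver converge

# Literal K-fold CV of the predictive log-likelihood: one full
# l1-regularized fit per fold -- exactly the repeated optimization
# the paper's approximate formula removes.
kf = KFold(n_splits=10, shuffle=True, random_state=0)
log_liks = []
for train_idx, val_idx in kf.split(X):
    clf = LogisticRegression(penalty="l1", solver="saga", C=0.1, max_iter=2000)
    clf.fit(X[train_idx], y[train_idx])
    probs = clf.predict_proba(X[val_idx])
    log_liks.append(np.mean(np.log(probs[np.arange(len(val_idx)), y[val_idx]])))

print("CV estimate of the predictive log-likelihood:", np.mean(log_liks))
```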

SphereFace: Deep Hypersphere Embedding for Face Recognition

Title SphereFace: Deep Hypersphere Embedding for Face Recognition
Authors Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj, Le Song
Abstract This paper addresses the deep face recognition (FR) problem under the open-set protocol, where ideal face features are expected to have smaller maximal intra-class distance than minimal inter-class distance under a suitably chosen metric space. However, few existing algorithms can effectively achieve this criterion. To this end, we propose the angular softmax (A-Softmax) loss that enables convolutional neural networks (CNNs) to learn angularly discriminative features. Geometrically, A-Softmax loss can be viewed as imposing discriminative constraints on a hypersphere manifold, which intrinsically matches the prior that faces also lie on a manifold. Moreover, the size of the angular margin can be quantitatively adjusted by a parameter $m$. We further derive specific $m$ to approximate the ideal feature criterion. Extensive analysis and experiments on Labeled Faces in the Wild (LFW), YouTube Faces (YTF) and the MegaFace Challenge show the superiority of A-Softmax loss in FR tasks. The code has also been made publicly available.
Tasks Face Identification, Face Recognition, Face Verification
Published 2017-04-26
URL http://arxiv.org/abs/1704.08063v4
PDF http://arxiv.org/pdf/1704.08063v4.pdf
PWC https://paperswithcode.com/paper/sphereface-deep-hypersphere-embedding-for
Repo https://github.com/wy1iu/sphereface
Framework tf
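
The core of A-Softmax is to replace the target-class logit with $\lVert x\rVert \cos(m\,\theta_y)$, where the class weights are $\ell_2$-normalized and $\theta_y$ is the angle between the feature and the target weight. Below is a simplified PyTorch sketch of that idea; it applies $\cos(m\theta)$ directly and omits the piecewise $\psi(\theta)$ extension and the annealing schedule used in the paper, and the hyperparameters are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimplifiedASoftmax(nn.Module):
    """Angular-margin logits: ||x|| * cos(m * theta) for the target class."""
    def __init__(self, in_features, num_classes, m=4):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, in_features))
        self.m = m

    def forward(self, x, labels):
        w = F.normalize(self.weight, dim=1)          # unit-norm class weights
        x_norm = x.norm(dim=1, keepdim=True)         # ||x||
        cos_theta = F.normalize(x, dim=1) @ w.t()    # cos(theta) for every class
        theta = torch.acos(cos_theta.clamp(-1 + 1e-7, 1 - 1e-7))
        # Replace the target-class cosine with cos(m * theta).
        # NOTE: the full A-Softmax uses a monotonic extension psi(theta)
        # so the margin stays valid when m*theta exceeds pi; omitted here.
        target_cos = torch.cos(self.m * theta)
        onehot = F.one_hot(labels, cos_theta.size(1)).bool()
        logits = torch.where(onehot, target_cos, cos_theta) * x_norm
        return F.cross_entropy(logits, labels)

layer = SimplifiedASoftmax(in_features=512, num_classes=10)
feats = torch.randn(8, 512)
labels = torch.randint(0, 10, (8,))
loss = layer(feats, labels)
```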

CNN-based Cascaded Multi-task Learning of High-level Prior and Density Estimation for Crowd Counting

Title CNN-based Cascaded Multi-task Learning of High-level Prior and Density Estimation for Crowd Counting
Authors Vishwanath A. Sindagi, Vishal M. Patel
Abstract Estimating crowd count in densely crowded scenes is an extremely challenging task due to non-uniform scale variations. In this paper, we propose a novel end-to-end cascaded network of CNNs to jointly learn crowd count classification and density map estimation. Classifying crowd count into various groups is tantamount to coarsely estimating the total count in the image, thereby incorporating a high-level prior into the density estimation network. This enables the layers in the network to learn globally relevant discriminative features which aid in estimating highly refined density maps with lower count error. The joint training is performed in an end-to-end fashion. Extensive experiments on highly challenging publicly available datasets show that the proposed method achieves lower count error and better-quality density maps compared to recent state-of-the-art methods.
Tasks Crowd Counting, Density Estimation, Multi-Task Learning
Published 2017-07-30
URL http://arxiv.org/abs/1707.09605v2
PDF http://arxiv.org/pdf/1707.09605v2.pdf
PWC https://paperswithcode.com/paper/cnn-based-cascaded-multi-task-learning-of
Repo https://github.com/surajdakua/Crowd-Counting-Using-Pytorch
Framework pytorch
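
The cascade shares a trunk between a count-group classifier (the high-level prior) and a density-map regressor, trained jointly. Here is a minimal PyTorch sketch of that structure; the layer sizes, number of count groups and loss weighting are assumptions for illustration, not the authors' configuration.

```python
import torch
import torch.nn as nn

class CascadedCrowdNet(nn.Module):
    def __init__(self, num_count_groups=10):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        # High-level prior: classify the image into coarse count bins.
        self.count_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_count_groups)
        )
        # Density-map estimation head on the shared features.
        self.density_head = nn.Conv2d(64, 1, 1)

    def forward(self, x):
        feats = self.trunk(x)
        return self.count_head(feats), self.density_head(feats)

model = CascadedCrowdNet()
img = torch.rand(2, 3, 128, 128)
count_bin = torch.randint(0, 10, (2,))          # coarse count group labels
density_gt = torch.rand(2, 1, 128, 128)         # ground-truth density maps

count_logits, density = model(img)
# Joint end-to-end loss: classification prior + density regression.
loss = nn.CrossEntropyLoss()(count_logits, count_bin) + \
       nn.MSELoss()(density, density_gt)
```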

Text Coherence Analysis Based on Deep Neural Network

Title Text Coherence Analysis Based on Deep Neural Network
Authors Baiyun Cui, Yingming Li, Yaqing Zhang, Zhongfei Zhang
Abstract In this paper, we propose a novel deep coherence model (DCM) using a convolutional neural network architecture to capture text coherence. The text coherence problem is investigated from a new perspective of learning sentence distributional representations and modeling text coherence simultaneously. In particular, the model captures the interactions between sentences by computing the similarities of their distributional representations. Further, it can be easily trained in an end-to-end fashion. The proposed model is evaluated on a standard Sentence Ordering task. The experimental results demonstrate its effectiveness and promise in coherence assessment, showing a significant improvement over the state-of-the-art.
Tasks Sentence Ordering
Published 2017-10-21
URL http://arxiv.org/abs/1710.07770v1
PDF http://arxiv.org/pdf/1710.07770v1.pdf
PWC https://paperswithcode.com/paper/text-coherence-analysis-based-on-deep-neural
Repo https://github.com/geekSiddharth/DeepCoherence
Framework none
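
The pipeline described above can be sketched as: encode each sentence into a distributional representation, compute similarities between neighbouring sentence representations, and score coherence from them. The PyTorch sketch below follows that shape with a small 1-D CNN encoder and cosine similarity; the dimensions and the final scoring layer are illustrative assumptions rather than the authors' model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceEncoder(nn.Module):
    """CNN over word embeddings -> fixed-size sentence representation."""
    def __init__(self, vocab_size=10000, emb_dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, hidden, kernel_size=3, padding=1)

    def forward(self, tokens):                 # (num_sents, seq_len)
        x = self.emb(tokens).transpose(1, 2)   # (num_sents, emb_dim, seq_len)
        h = F.relu(self.conv(x))
        return h.max(dim=2).values             # max-over-time pooling

def coherence_score(encoder, scorer, tokens):
    """Score a document from similarities between adjacent sentences."""
    sents = encoder(tokens)                            # (num_sents, hidden)
    sims = F.cosine_similarity(sents[:-1], sents[1:])  # adjacent-pair similarity
    return torch.sigmoid(scorer(sims.mean(dim=0, keepdim=True)))

encoder = SentenceEncoder()
scorer = nn.Linear(1, 1)
doc = torch.randint(0, 10000, (5, 20))   # 5 sentences, 20 token ids each
score = coherence_score(encoder, scorer, doc)
```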

Unpaired Photo-to-Caricature Translation on Faces in the Wild

Title Unpaired Photo-to-Caricature Translation on Faces in the Wild
Authors Ziqiang Zheng, Wang Chao, Zhibin Yu, Nan Wang, Haiyong Zheng, Bing Zheng
Abstract Recently, much progress has been made in image-to-image translation owing to the success of conditional Generative Adversarial Networks (cGANs), and unpaired methods based on a cycle consistency loss, such as DualGAN, CycleGAN and DiscoGAN, have become popular. However, it is still very challenging for translation tasks that require high-level visual information conversion, such as photo-to-caricature translation, which demands satire, exaggeration, lifelikeness and artistry. We present an approach for learning to translate faces in the wild from the source photo domain to the target caricature domain with different styles, which can also be used for other high-level image-to-image translation tasks. In order to capture global structure together with local statistics during translation, we design a dual-pathway model with one coarse discriminator and one fine discriminator. For the generator, we add an extra perceptual loss alongside the adversarial loss and cycle consistency loss to achieve representation learning for the two domains. Style can also be learned from an auxiliary noise input. Experiments on photo-to-caricature translation of faces in the wild show a considerable performance gain of the proposed method over state-of-the-art translation methods, as well as its potential for real applications.
Tasks Caricature, Image-to-Image Translation, Photo-To-Caricature Translation, Representation Learning
Published 2017-11-29
URL http://arxiv.org/abs/1711.10735v2
PDF http://arxiv.org/pdf/1711.10735v2.pdf
PWC https://paperswithcode.com/paper/unpaired-photo-to-caricature-translation-on
Repo https://github.com/zhengziqiang/P2C
Framework tf
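
The key training signal is a dual-pathway discrimination (a coarse discriminator on a downsampled view for global structure, a fine discriminator at full resolution for local statistics) combined with adversarial, cycle consistency and perceptual losses. Below is a minimal PyTorch sketch of how such a composite generator loss might be assembled; all networks and loss weights are placeholders, not the authors' models.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def patch_disc():
    """Tiny PatchGAN-style discriminator (placeholder)."""
    return nn.Sequential(
        nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(32, 1, 4, stride=2, padding=1),
    )

G = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1))        # placeholder photo -> caricature
G_inv = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1))    # placeholder caricature -> photo
D_fine, D_coarse = patch_disc(), patch_disc()
perceptual = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1))  # stand-in for a pretrained feature extractor

photo = torch.rand(2, 3, 128, 128)
fake = G(photo)

# Fine discriminator sees full resolution (local statistics);
# coarse discriminator sees a downsampled view (global structure).
fine_out = D_fine(fake)
coarse_out = D_coarse(F.avg_pool2d(fake, 4))
adv = F.binary_cross_entropy_with_logits(fine_out, torch.ones_like(fine_out)) + \
      F.binary_cross_entropy_with_logits(coarse_out, torch.ones_like(coarse_out))

cycle = F.l1_loss(G_inv(fake), photo)                    # cycle consistency loss
perc = F.l1_loss(perceptual(fake), perceptual(photo))    # perceptual loss
g_loss = adv + 10.0 * cycle + 1.0 * perc                 # illustrative weights
```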

Parallel WaveNet: Fast High-Fidelity Speech Synthesis

Title Parallel WaveNet: Fast High-Fidelity Speech Synthesis
Authors Aaron van den Oord, Yazhe Li, Igor Babuschkin, Karen Simonyan, Oriol Vinyals, Koray Kavukcuoglu, George van den Driessche, Edward Lockhart, Luis C. Cobo, Florian Stimberg, Norman Casagrande, Dominik Grewe, Seb Noury, Sander Dieleman, Erich Elsen, Nal Kalchbrenner, Heiga Zen, Alex Graves, Helen King, Tom Walters, Dan Belov, Demis Hassabis
Abstract The recently-developed WaveNet architecture is the current state of the art in realistic speech synthesis, consistently rated as more natural sounding for many different languages than any previous system. However, because WaveNet relies on sequential generation of one audio sample at a time, it is poorly suited to today’s massively parallel computers, and therefore hard to deploy in a real-time production setting. This paper introduces Probability Density Distillation, a new method for training a parallel feed-forward network from a trained WaveNet with no significant difference in quality. The resulting system is capable of generating high-fidelity speech samples more than 20 times faster than real time, and is deployed online by Google Assistant, including serving multiple English and Japanese voices.
Tasks Speech Synthesis
Published 2017-11-28
URL http://arxiv.org/abs/1711.10433v1
PDF http://arxiv.org/pdf/1711.10433v1.pdf
PWC https://paperswithcode.com/paper/parallel-wavenet-fast-high-fidelity-speech
Repo https://github.com/lokhiufung/music_generation
Framework tf
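
Probability Density Distillation trains the parallel student to match the frozen autoregressive teacher by minimizing KL(student || teacher), i.e. the cross-entropy with the teacher minus the student's entropy, estimated on samples the student draws in parallel. The toy PyTorch sketch below illustrates that loss with simple per-sample Gaussian output distributions; it is a schematic of the objective, not the WaveNet or IAF architecture.

```python
import torch
import torch.nn as nn

# Toy "teacher" and "student" predicting per-sample Gaussian parameters.
teacher = nn.Linear(1, 2)   # stand-in for a trained autoregressive WaveNet
student = nn.Linear(1, 2)   # stand-in for the parallel feed-forward generator

def gaussian_params(net, cond):
    mu, log_sigma = net(cond).chunk(2, dim=-1)
    return mu, log_sigma

cond = torch.rand(64, 1)                       # conditioning (e.g. mel frames)
mu_s, log_sigma_s = gaussian_params(student, cond)

# Sample from the student with the reparameterization trick (a parallel draw).
eps = torch.randn_like(mu_s)
x = mu_s + eps * log_sigma_s.exp()
student_dist = torch.distributions.Normal(mu_s, log_sigma_s.exp())

with torch.no_grad():                          # the teacher is frozen
    mu_t, log_sigma_t = gaussian_params(teacher, cond)
teacher_dist = torch.distributions.Normal(mu_t, log_sigma_t.exp())

# Probability Density Distillation: KL(student || teacher)
# = E_{x ~ student}[log q_student(x) - log p_teacher(x)]
#   (cross-entropy with the teacher minus the student's entropy).
loss = (student_dist.log_prob(x) - teacher_dist.log_prob(x)).mean()
loss.backward()
```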

Scene Graph Generation by Iterative Message Passing

Title Scene Graph Generation by Iterative Message Passing
Authors Danfei Xu, Yuke Zhu, Christopher B. Choy, Li Fei-Fei
Abstract Understanding a visual scene goes beyond recognizing individual objects in isolation. Relationships between objects also constitute rich semantic information about the scene. In this work, we explicitly model the objects and their relationships using scene graphs, a visually-grounded graphical structure of an image. We propose a novel end-to-end model that generates such structured scene representations from an input image. The model solves the scene graph inference problem using standard RNNs and learns to iteratively improve its predictions via message passing. Our joint inference model can take advantage of contextual cues to make better predictions on objects and their relationships. The experiments show that our model significantly outperforms previous methods for generating scene graphs on the Visual Genome dataset and for inferring support relations on the NYU Depth v2 dataset.
Tasks Graph Generation, Scene Graph Generation
Published 2017-01-10
URL http://arxiv.org/abs/1701.02426v2
PDF http://arxiv.org/pdf/1701.02426v2.pdf
PWC https://paperswithcode.com/paper/scene-graph-generation-by-iterative-message
Repo https://github.com/shikorab/SceneGraph
Framework tf
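
The model keeps hidden states for object nodes and relationship edges and alternates GRU-based message passing between them. Below is a stripped-down PyTorch sketch of one such alternation over dense node/edge states; the pooling and message functions are simplified assumptions relative to the paper's primal-dual formulation.

```python
import torch
import torch.nn as nn

num_nodes, d = 4, 64
node_gru = nn.GRUCell(d, d)
edge_gru = nn.GRUCell(d, d)

node_h = torch.randn(num_nodes, d)              # object (node) states
edge_h = torch.randn(num_nodes, num_nodes, d)   # relationship (edge) states, i -> j

for _ in range(3):  # iterative message passing
    # Edge -> node messages: pool over incoming and outgoing edges.
    node_msg = edge_h.mean(dim=1) + edge_h.mean(dim=0)
    node_h = node_gru(node_msg, node_h)

    # Node -> edge messages: combine the two endpoint states of each edge.
    edge_msg = node_h.unsqueeze(1) + node_h.unsqueeze(0)              # (N, N, d)
    edge_h = edge_gru(edge_msg.reshape(-1, d),
                      edge_h.reshape(-1, d)).reshape(num_nodes, num_nodes, d)

# The final node and edge states would be decoded into object classes
# and predicate labels to form the scene graph.
```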

Generative Adversarial Source Separation

Title Generative Adversarial Source Separation
Authors Cem Subakan, Paris Smaragdis
Abstract Generative source separation methods such as non-negative matrix factorization (NMF) or auto-encoders rely on the assumption of an output probability density. Generative Adversarial Networks (GANs) can learn data distributions without needing a parametric assumption on the output density. We show on a speech source separation experiment that a multi-layer perceptron trained with a Wasserstein-GAN formulation outperforms NMF, auto-encoders trained with maximum likelihood, and variational auto-encoders in terms of source-to-distortion ratio.
Tasks
Published 2017-10-30
URL http://arxiv.org/abs/1710.10779v1
PDF http://arxiv.org/pdf/1710.10779v1.pdf
PWC https://paperswithcode.com/paper/generative-adversarial-source-separation
Repo https://github.com/ycemsubakan/sourceseparation_misc
Framework pytorch
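
The generator here is an MLP producing source spectra, trained adversarially with a Wasserstein critic instead of a maximum-likelihood reconstruction objective. The sketch below shows one WGAN-style critic/generator update on toy spectrogram frames; the layer sizes, clipping constant and data are illustrative assumptions.

```python
import torch
import torch.nn as nn

dim = 257                                   # e.g. magnitude-spectrum bins
G = nn.Sequential(nn.Linear(dim, 512), nn.ReLU(), nn.Linear(512, dim), nn.Softplus())
D = nn.Sequential(nn.Linear(dim, 512), nn.ReLU(), nn.Linear(512, 1))   # critic

opt_g = torch.optim.RMSprop(G.parameters(), lr=5e-5)
opt_d = torch.optim.RMSprop(D.parameters(), lr=5e-5)

real = torch.rand(32, dim)                  # frames of the target source
mixture = torch.rand(32, dim)               # conditioning input (mixture frames)

# Critic step: maximize D(real) - D(G(mixture)), with weight clipping.
opt_d.zero_grad()
d_loss = -(D(real).mean() - D(G(mixture).detach()).mean())
d_loss.backward()
opt_d.step()
for p in D.parameters():
    p.data.clamp_(-0.01, 0.01)

# Generator step: maximize D(G(mixture)).
opt_g.zero_grad()
g_loss = -D(G(mixture)).mean()
g_loss.backward()
opt_g.step()
```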

Nonlinear Information Bottleneck

Title Nonlinear Information Bottleneck
Authors Artemy Kolchinsky, Brendan D. Tracey, David H. Wolpert
Abstract Information bottleneck (IB) is a technique for extracting information in one random variable $X$ that is relevant for predicting another random variable $Y$. IB works by encoding $X$ in a compressed “bottleneck” random variable $M$ from which $Y$ can be accurately decoded. However, finding the optimal bottleneck variable involves a difficult optimization problem, which until recently has been considered for only two limited cases: discrete $X$ and $Y$ with small state spaces, and continuous $X$ and $Y$ with a Gaussian joint distribution (in which case optimal encoding and decoding maps are linear). We propose a method for performing IB on arbitrarily-distributed discrete and/or continuous $X$ and $Y$, while allowing for nonlinear encoding and decoding maps. Our approach relies on a novel non-parametric upper bound for mutual information. We describe how to implement our method using neural networks. We then show that it achieves better performance than the recently-proposed “variational IB” method on several real-world datasets.
Tasks
Published 2017-05-06
URL https://arxiv.org/abs/1705.02436v9
PDF https://arxiv.org/pdf/1705.02436v9.pdf
PWC https://paperswithcode.com/paper/nonlinear-information-bottleneck
Repo https://github.com/burklight/convex-IB-Lagrangian-PyTorch
Framework pytorch
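
Training trades off compressing $X$ into the bottleneck $M$ against preserving information about $Y$, with a non-parametric upper bound on $I(X;M)$ computed from pairwise distances between the Gaussian-perturbed encodings. The PyTorch sketch below is a simplified rendering of that objective under the assumption of equal-variance spherical Gaussians; the constants, noise scale $\sigma$ and trade-off $\beta$ are illustrative, and the paper should be consulted for the exact estimator.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 32))
decoder = nn.Linear(32, 10)
sigma, beta = 0.5, 1e-2                     # noise scale and trade-off (illustrative)

x = torch.rand(128, 784)
y = torch.randint(0, 10, (128,))

mu = encoder(x)
m = mu + sigma * torch.randn_like(mu)       # stochastic bottleneck M

# Pairwise-distance (mixture-of-Gaussians) upper bound on I(X; M):
#   I_hat = -1/N sum_i log( 1/N sum_j exp(-||mu_i - mu_j||^2 / (2 sigma^2)) )
# This is a simplified form of the non-parametric bound used in the paper.
dists = torch.cdist(mu, mu) ** 2
i_xm = -torch.logsumexp(-dists / (2 * sigma ** 2), dim=1).mean() + math.log(len(mu))

# Prediction term: cross-entropy stands in for -I(M; Y) up to a constant.
ce = F.cross_entropy(decoder(m), y)

loss = ce + beta * i_xm
loss.backward()
```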

A Regularized Framework for Sparse and Structured Neural Attention

Title A Regularized Framework for Sparse and Structured Neural Attention
Authors Vlad Niculae, Mathieu Blondel
Abstract Modern neural networks are often augmented with an attention mechanism, which tells the network where to focus within the input. We propose in this paper a new framework for sparse and structured attention, building upon a smoothed max operator. We show that the gradient of this operator defines a mapping from real values to probabilities, suitable as an attention mechanism. Our framework includes softmax and a slight generalization of the recently-proposed sparsemax as special cases. However, we also show how our framework can incorporate modern structured penalties, resulting in more interpretable attention mechanisms that focus on entire segments or groups of an input. We derive efficient algorithms to compute the forward and backward passes of our attention mechanisms, enabling their use in a neural network trained with backpropagation. To showcase their potential as a drop-in replacement for existing ones, we evaluate our attention mechanisms on three large-scale tasks: textual entailment, machine translation, and sentence summarization. Our attention mechanisms improve interpretability without sacrificing performance; notably, on textual entailment and summarization, we outperform the standard attention mechanisms based on softmax and sparsemax.
Tasks Machine Translation, Natural Language Inference, Text Summarization
Published 2017-05-22
URL http://arxiv.org/abs/1705.07704v3
PDF http://arxiv.org/pdf/1705.07704v3.pdf
PWC https://paperswithcode.com/paper/a-regularized-framework-for-sparse-and
Repo https://github.com/weiwang2330/sparse-structured-attention
Framework pytorch
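
The framework derives attention mappings as gradients of a smoothed max operator, with softmax and sparsemax as special cases. As a concrete instance of the sparse case, here is a standalone sparsemax (Euclidean projection of the scores onto the simplex) in PyTorch; it is the known sparsemax special case only, not the paper's full structured-penalty machinery.

```python
import torch

def sparsemax(z, dim=-1):
    """Sparsemax: Euclidean projection of the scores z onto the probability simplex.

    Unlike softmax, it can assign exactly zero weight to low-scoring positions.
    """
    z_sorted, _ = torch.sort(z, dim=dim, descending=True)
    cumsum = z_sorted.cumsum(dim)
    k = torch.arange(1, z.size(dim) + 1, device=z.device, dtype=z.dtype)
    view = [1] * z.dim()
    view[dim] = -1
    k = k.view(view)
    support = (1 + k * z_sorted) > cumsum              # entries kept in the support
    k_size = support.sum(dim=dim, keepdim=True)        # size of the support
    tau = (cumsum.gather(dim, k_size - 1) - 1) / k_size.to(z.dtype)
    return torch.clamp(z - tau, min=0)

scores = torch.tensor([[2.0, 1.0, 0.1, -1.0]])
print(sparsemax(scores))           # sparse attention weights, e.g. [[1., 0., 0., 0.]]
print(torch.softmax(scores, -1))   # dense: always strictly positive everywhere
```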

Joint Topic-Semantic-aware Social Recommendation for Online Voting

Title Joint Topic-Semantic-aware Social Recommendation for Online Voting
Authors Hongwei Wang, Jia Wang, Miao Zhao, Jiannong Cao, Minyi Guo
Abstract Online voting is an emerging feature in social networks, in which users can express their attitudes toward various issues and show their unique interests. Online voting imposes new challenges on recommendation, because the propagation of votings heavily depends on the structure of social networks as well as the content of votings. In this paper, we investigate how to utilize these two factors in a comprehensive manner when doing voting recommendation. First, due to the fact that existing text mining methods such as topic models and semantic models cannot well process the content of votings, which is typically short and ambiguous, we propose a novel Topic-Enhanced Word Embedding (TEWE) method to learn word and document representations by jointly considering their topics and semantics. Then we propose our Joint Topic-Semantic-aware social Matrix Factorization (JTS-MF) model for voting recommendation. The JTS-MF model calculates similarity among users and votings by combining their TEWE representation and the structural information of social networks, and preserves this topic-semantic-social similarity during matrix factorization. To evaluate the performance of the TEWE representation and the JTS-MF model, we conduct extensive experiments on a real online voting dataset. The results prove the efficacy of our approach against several state-of-the-art baselines.
Tasks
Published 2017-12-03
URL http://arxiv.org/abs/1712.00731v1
PDF http://arxiv.org/pdf/1712.00731v1.pdf
PWC https://paperswithcode.com/paper/joint-topic-semantic-aware-social
Repo https://github.com/hwwang55/JTS-MF
Framework none
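
In spirit, JTS-MF factorizes the user-voting matrix while regularizing the user and voting latent factors toward their similar neighbours, so that the combined topic-semantic-social similarity is preserved. The PyTorch sketch below shows a toy version of such similarity-regularized matrix factorization; the similarity matrices, weights and dimensions are illustrative placeholders, and TEWE itself is not implemented here.

```python
import torch

n_users, n_votes, k = 50, 80, 16
R = (torch.rand(n_users, n_votes) > 0.95).float()      # observed participation matrix
user_sim = torch.rand(n_users, n_users)                # stand-in for TEWE + social similarity
vote_sim = torch.rand(n_votes, n_votes)                # stand-in for TEWE similarity

U = torch.randn(n_users, k, requires_grad=True)        # user latent factors
V = torch.randn(n_votes, k, requires_grad=True)        # voting latent factors
opt = torch.optim.Adam([U, V], lr=0.01)
alpha, beta, lam = 0.1, 0.1, 0.01                      # illustrative weights

for step in range(200):
    opt.zero_grad()
    recon = (R - U @ V.t()).pow(2).mean()
    # Preserve similarity: pull each factor toward its neighbours' weighted average.
    u_reg = (U - (user_sim / user_sim.sum(1, keepdim=True)) @ U).pow(2).mean()
    v_reg = (V - (vote_sim / vote_sim.sum(1, keepdim=True)) @ V).pow(2).mean()
    loss = recon + alpha * u_reg + beta * v_reg + lam * (U.pow(2).mean() + V.pow(2).mean())
    loss.backward()
    opt.step()

scores = U @ V.t()        # recommend the votings with the highest predicted scores
```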

Fake News Detection on Social Media: A Data Mining Perspective

Title Fake News Detection on Social Media: A Data Mining Perspective
Authors Kai Shu, Amy Sliva, Suhang Wang, Jiliang Tang, Huan Liu
Abstract Social media for news consumption is a double-edged sword. On the one hand, its low cost, easy access, and rapid dissemination of information lead people to seek out and consume news from social media. On the other hand, it enables the wide spread of “fake news”, i.e., low-quality news with intentionally false information. The extensive spread of fake news has the potential for extremely negative impacts on individuals and society. Therefore, fake news detection on social media has recently become an emerging research topic that is attracting tremendous attention. Fake news detection on social media presents unique characteristics and challenges that make existing detection algorithms from traditional news media ineffective or not applicable. First, fake news is intentionally written to mislead readers to believe false information, which makes it difficult and nontrivial to detect based on news content; therefore, we need to include auxiliary information, such as user social engagements on social media, to help make a determination. Second, exploiting this auxiliary information is challenging in and of itself as users’ social engagements with fake news produce data that is big, incomplete, unstructured, and noisy. Because the issue of fake news detection on social media is both challenging and relevant, we conducted this survey to further facilitate research on the problem. In this survey, we present a comprehensive review of detecting fake news on social media, including characterizations of fake news based on psychological and social theories, existing algorithms from a data mining perspective, evaluation metrics and representative datasets. We also discuss related research areas, open problems, and future research directions for fake news detection on social media.
Tasks Fake News Detection
Published 2017-08-07
URL http://arxiv.org/abs/1708.01967v3
PDF http://arxiv.org/pdf/1708.01967v3.pdf
PWC https://paperswithcode.com/paper/fake-news-detection-on-social-media-a-data
Repo https://github.com/KaiDMML/FakeNewsNet
Framework none

soc2seq: Social Embedding meets Conversation Model

Title soc2seq: Social Embedding meets Conversation Model
Authors Parminder Bhatia, Marsal Gavalda, Arash Einolghozati
Abstract While liking or upvoting a post on a mobile app is easy to do, replying with a written note is much more difficult, due to both the cognitive load of coming up with a meaningful response and the mechanics of entering the text. Here we present a novel textual reply generation model that goes beyond current auto-reply and predictive text entry models by taking into account the content preferences of the user, the idiosyncrasies of their conversational style, and even the structure of their social graph. Specifically, we have developed two types of models for personalized user interactions: a content-based conversation model, which makes use of location together with user information, and a social-graph-based conversation model, which combines content-based conversation models with social graphs.
Tasks
Published 2017-02-17
URL http://arxiv.org/abs/1702.05512v3
PDF http://arxiv.org/pdf/1702.05512v3.pdf
PWC https://paperswithcode.com/paper/soc2seq-social-embedding-meets-conversation
Repo https://github.com/pbhatia243/Neural_Conversation_Models
Framework tf

Are GANs Created Equal? A Large-Scale Study

Title Are GANs Created Equal? A Large-Scale Study
Authors Mario Lucic, Karol Kurach, Marcin Michalski, Sylvain Gelly, Olivier Bousquet
Abstract Generative adversarial networks (GAN) are a powerful subclass of generative models. Despite very rich research activity leading to numerous interesting GAN algorithms, it is still very hard to assess which algorithm(s) perform better than others. We conduct a neutral, multi-faceted large-scale empirical study on state-of-the-art models and evaluation measures. We find that most models can reach similar scores with enough hyperparameter optimization and random restarts. This suggests that improvements can arise from a higher computational budget and tuning more than fundamental algorithmic changes. To overcome some limitations of the current metrics, we also propose several data sets on which precision and recall can be computed. Our experimental results suggest that future GAN research should be based on more systematic and objective evaluation procedures. Finally, we did not find evidence that any of the tested algorithms consistently outperforms the non-saturating GAN introduced in \cite{goodfellow2014generative}.
Tasks Hyperparameter Optimization
Published 2017-11-28
URL http://arxiv.org/abs/1711.10337v4
PDF http://arxiv.org/pdf/1711.10337v4.pdf
PWC https://paperswithcode.com/paper/are-gans-created-equal-a-large-scale-study
Repo https://github.com/mseitzer/pytorch-fid
Framework pytorch