Paper Group AWR 185
A selectional auto-encoder approach for document image binarization
Title | A selectional auto-encoder approach for document image binarization |
Authors | Jorge Calvo-Zaragoza, Antonio-Javier Gallego |
Abstract | Binarization plays a key role in the automatic information retrieval from document images. This process is usually performed in the first stages of document analysis systems, and serves as a basis for subsequent steps. Hence it has to be robust in order to allow the full analysis workflow to be successful. Several methods for document image binarization have been proposed so far, most of which are based on hand-crafted image processing strategies. Recently, Convolutional Neural Networks have shown impressive performance in many disparate tasks related to computer vision. In this paper we discuss the use of convolutional auto-encoders devoted to learning an end-to-end map from an input image to its selectional output, in which activations indicate the likelihood of pixels being either foreground or background. Once trained, documents can therefore be binarized by passing them through the model and applying a threshold. This approach has proven to outperform existing binarization strategies in a number of document domains. |
Tasks | Information Retrieval |
Published | 2017-06-30 |
URL | http://arxiv.org/abs/1706.10241v3 |
PDF | http://arxiv.org/pdf/1706.10241v3.pdf |
PWC | https://paperswithcode.com/paper/a-selectional-auto-encoder-approach-for |
Repo | https://github.com/ajgallego/document-image-binarization |
Framework | none |
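The selectional idea lends itself to a compact illustration: a convolutional auto-encoder outputs a per-pixel foreground likelihood, and a fixed threshold turns that map into a binary image. The sketch below uses a toy PyTorch encoder-decoder (`TinySelectionalAE`, a placeholder, not the architecture from the paper or the linked repo) and a threshold of 0.5.

```python
# Minimal sketch of the selectional idea: a convolutional auto-encoder produces a
# per-pixel foreground likelihood map, which is thresholded to binarize the page.
import torch
import torch.nn as nn

class TinySelectionalAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def binarize(model, gray_image, threshold=0.5):
    """gray_image: float tensor in [0, 1], shape (H, W)."""
    with torch.no_grad():
        likelihood = model(gray_image[None, None])[0, 0]  # per-pixel foreground probability
    return (likelihood > threshold).to(torch.uint8)       # 1 = foreground, 0 = background

model = TinySelectionalAE()  # would be trained on (degraded page, ground-truth mask) pairs
binary = binarize(model, torch.rand(256, 256))
```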
Accelerating Cross-Validation in Multinomial Logistic Regression with $\ell_1$-Regularization
Title | Accelerating Cross-Validation in Multinomial Logistic Regression with $\ell_1$-Regularization |
Authors | Tomoyuki Obuchi, Yoshiyuki Kabashima |
Abstract | We develop an approximate formula for evaluating a cross-validation estimator of predictive likelihood for multinomial logistic regression regularized by an $\ell_1$-norm. This allows us to avoid repeated optimizations required for literally conducting cross-validation; hence, the computational time can be significantly reduced. The formula is derived through a perturbative approach employing the largeness of the data size and the model dimensionality. An extension to the elastic net regularization is also addressed. The usefulness of the approximate formula is demonstrated on simulated data and the ISOLET dataset from the UCI machine learning repository. |
Tasks | |
Published | 2017-11-15 |
URL | http://arxiv.org/abs/1711.05420v2 |
PDF | http://arxiv.org/pdf/1711.05420v2.pdf |
PWC | https://paperswithcode.com/paper/accelerating-cross-validation-in-multinomial |
Repo | https://github.com/T-Obuchi/AcceleratedCVonMLR_matlab |
Framework | none |
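For orientation, this is the computation the paper's approximate formula sidesteps: literal k-fold cross-validation of an $\ell_1$-regularized multinomial logistic regression, which refits the model once per fold. The scikit-learn baseline below is only a stand-in for that brute-force procedure (synthetic data, arbitrary `C`); it is not the authors' implementation.

```python
# Brute-force baseline: k-fold cross-validated predictive log-likelihood of an
# l1-regularized multinomial logistic regression, requiring one refit per fold.
# The paper's perturbative formula estimates this quantity from a single fit.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, n_features=50, n_informative=10,
                           n_classes=4, random_state=0)

clf = LogisticRegression(penalty="l1", solver="saga", C=0.5, max_iter=5000)

# Mean held-out log-likelihood per sample over 10 folds.
scores = cross_val_score(clf, X, y, cv=10, scoring="neg_log_loss")
print("CV predictive log-likelihood:", scores.mean())
```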
SphereFace: Deep Hypersphere Embedding for Face Recognition
Title | SphereFace: Deep Hypersphere Embedding for Face Recognition |
Authors | Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj, Le Song |
Abstract | This paper addresses the deep face recognition (FR) problem under the open-set protocol, where ideal face features are expected to have a smaller maximal intra-class distance than minimal inter-class distance under a suitably chosen metric space. However, few existing algorithms can effectively achieve this criterion. To this end, we propose the angular softmax (A-Softmax) loss that enables convolutional neural networks (CNNs) to learn angularly discriminative features. Geometrically, the A-Softmax loss can be viewed as imposing discriminative constraints on a hypersphere manifold, which intrinsically matches the prior that faces also lie on a manifold. Moreover, the size of the angular margin can be quantitatively adjusted by a parameter $m$. We further derive a specific $m$ to approximate the ideal feature criterion. Extensive analysis and experiments on Labeled Faces in the Wild (LFW), YouTube Faces (YTF) and the MegaFace Challenge show the superiority of the A-Softmax loss in FR tasks. The code has also been made publicly available. |
Tasks | Face Identification, Face Recognition, Face Verification |
Published | 2017-04-26 |
URL | http://arxiv.org/abs/1704.08063v4 |
PDF | http://arxiv.org/pdf/1704.08063v4.pdf |
PWC | https://paperswithcode.com/paper/sphereface-deep-hypersphere-embedding-for |
Repo | https://github.com/wy1iu/sphereface |
Framework | tf |
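A simplified sketch of the A-Softmax ingredient may help: class weights are l2-normalized so that logits reduce to $\|x\|\cos\theta$, and the angle to the ground-truth class is multiplied by the margin $m$ before the cosine is taken. The paper's full loss additionally uses a piecewise monotonic $\psi(\theta)$ and annealing, both omitted here; shapes and the margin value are illustrative.

```python
# Simplified A-Softmax-style logits: normalized class weights, angular margin m
# applied only to the ground-truth class (valid while m * theta <= pi).
import torch
import torch.nn.functional as F

def a_softmax_logits(features, weight, labels, m=4):
    """features: (B, D), weight: (C, D), labels: (B,) class indices."""
    w = F.normalize(weight, dim=1)                        # unit-norm class weights
    x_norm = features.norm(dim=1, keepdim=True)           # ||x||, shape (B, 1)
    cos_theta = F.normalize(features, dim=1) @ w.t()      # (B, C) cosines of angles
    theta = torch.acos(cos_theta.clamp(-1 + 1e-7, 1 - 1e-7))
    # enlarge only the angle to the ground-truth class
    target_theta = (theta.gather(1, labels[:, None]) * m).clamp(max=torch.pi)
    logits = x_norm * cos_theta
    return logits.scatter(1, labels[:, None], x_norm * torch.cos(target_theta))

features = torch.randn(8, 128)
weight = torch.randn(10, 128, requires_grad=True)
labels = torch.randint(0, 10, (8,))
loss = F.cross_entropy(a_softmax_logits(features, weight, labels), labels)
loss.backward()
```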
CNN-based Cascaded Multi-task Learning of High-level Prior and Density Estimation for Crowd Counting
Title | CNN-based Cascaded Multi-task Learning of High-level Prior and Density Estimation for Crowd Counting |
Authors | Vishwanath A. Sindagi, Vishal M. Patel |
Abstract | Estimating crowd count in densely crowded scenes is an extremely challenging task due to non-uniform scale variations. In this paper, we propose a novel end-to-end cascaded network of CNNs to jointly learn crowd count classification and density map estimation. Classifying crowd count into various groups is tantamount to coarsely estimating the total count in the image thereby incorporating a high-level prior into the density estimation network. This enables the layers in the network to learn globally relevant discriminative features which aid in estimating highly refined density maps with lower count error. The joint training is performed in an end-to-end fashion. Extensive experiments on highly challenging publicly available datasets show that the proposed method achieves lower count error and better quality density maps as compared to the recent state-of-the-art methods. |
Tasks | Crowd Counting, Density Estimation, Multi-Task Learning |
Published | 2017-07-30 |
URL | http://arxiv.org/abs/1707.09605v2 |
PDF | http://arxiv.org/pdf/1707.09605v2.pdf |
PWC | https://paperswithcode.com/paper/cnn-based-cascaded-multi-task-learning-of |
Repo | https://github.com/surajdakua/Crowd-Counting-Using-Pytorch |
Framework | pytorch |
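The cascade can be sketched as a shared convolutional trunk feeding two heads: a classifier that bins the total count into groups (the high-level prior) and a density-map regressor, trained jointly. Layer sizes, the number of count groups and the loss weighting below are placeholders, not the configuration from the paper.

```python
# Sketch of the cascaded multi-task idea: shared trunk -> (count-group classifier,
# density-map head), optimized with a joint classification + regression loss.
import torch
import torch.nn as nn

class CascadedCounter(nn.Module):
    def __init__(self, num_count_groups=10):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 16, 5, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, padding=2), nn.ReLU(),
        )
        self.count_classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_count_groups)
        )
        self.density_head = nn.Conv2d(32, 1, 1)

    def forward(self, x):
        h = self.trunk(x)
        return self.count_classifier(h), self.density_head(h)

model = CascadedCounter()
images = torch.randn(2, 3, 128, 128)
count_group = torch.randint(0, 10, (2,))
density_gt = torch.rand(2, 1, 128, 128)

group_logits, density = model(images)
loss = nn.functional.cross_entropy(group_logits, count_group) \
     + nn.functional.mse_loss(density, density_gt)
predicted_count = density.sum(dim=(1, 2, 3))  # total count = integral of density map
```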
Text Coherence Analysis Based on Deep Neural Network
Title | Text Coherence Analysis Based on Deep Neural Network |
Authors | Baiyun Cui, Yingming Li, Yaqing Zhang, Zhongfei Zhang |
Abstract | In this paper, we propose a novel deep coherence model (DCM) using a convolutional neural network architecture to capture text coherence. The text coherence problem is investigated from a new perspective of learning sentence distributional representations and modeling text coherence simultaneously. In particular, the model captures the interactions between sentences by computing the similarities of their distributional representations. Further, it can be easily trained in an end-to-end fashion. The proposed model is evaluated on a standard Sentence Ordering task. The experimental results demonstrate its effectiveness and promise for coherence assessment, showing a significant improvement over the state of the art. |
Tasks | Sentence Ordering |
Published | 2017-10-21 |
URL | http://arxiv.org/abs/1710.07770v1 |
PDF | http://arxiv.org/pdf/1710.07770v1.pdf |
PWC | https://paperswithcode.com/paper/text-coherence-analysis-based-on-deep-neural |
Repo | https://github.com/geekSiddharth/DeepCoherence |
Framework | none |
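A toy version of the similarity-based scoring may clarify the setup: encode each sentence into a distributional vector and score a document by the average similarity of adjacent sentence pairs. The paper learns the sentence encoder (a CNN) and the scoring end-to-end; the fixed bag-of-embeddings encoder and tiny vocabulary below are purely illustrative.

```python
# Toy coherence-by-similarity sketch: sentence vectors + cosine similarity of
# adjacent sentences. Stands in for the learned CNN encoder of the paper.
import torch
import torch.nn.functional as F

vocab = {"the": 0, "cat": 1, "sat": 2, "it": 3, "slept": 4}
embeddings = torch.randn(len(vocab), 16)  # stand-in for learned word vectors

def encode(sentence):
    ids = torch.tensor([vocab[w] for w in sentence.split() if w in vocab])
    return embeddings[ids].mean(dim=0)

def coherence_score(sentences):
    vecs = [encode(s) for s in sentences]
    sims = [F.cosine_similarity(a, b, dim=0) for a, b in zip(vecs, vecs[1:])]
    return torch.stack(sims).mean()  # higher = adjacent sentences more related

print(coherence_score(["the cat sat", "it slept"]).item())
```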
Unpaired Photo-to-Caricature Translation on Faces in the Wild
Title | Unpaired Photo-to-Caricature Translation on Faces in the Wild |
Authors | Ziqiang Zheng, Wang Chao, Zhibin Yu, Nan Wang, Haiyong Zheng, Bing Zheng |
Abstract | Recently, much progress has been made in image-to-image translation owing to the success of conditional Generative Adversarial Networks (cGANs), and unpaired methods based on a cycle-consistency loss, such as DualGAN, CycleGAN and DiscoGAN, have become popular. However, translation tasks that require high-level visual information conversion, such as photo-to-caricature translation demanding satire, exaggeration, lifelikeness and artistry, remain very challenging. We present an approach for learning to translate faces in the wild from the source photo domain to the target caricature domain with different styles, which can also be used for other high-level image-to-image translation tasks. In order to capture global structure together with local statistics during translation, we design a dual-pathway model with one coarse discriminator and one fine discriminator. For the generator, we add a perceptual loss to the adversarial and cycle-consistency losses to achieve representation learning for the two different domains. The style can also be learned from an auxiliary noise input. Experiments on photo-to-caricature translation of faces in the wild show a considerable performance gain of our proposed method over state-of-the-art translation methods, as well as its potential for real applications. |
Tasks | Caricature, Image-to-Image Translation, Photo-To-Caricature Translation, Representation Learning |
Published | 2017-11-29 |
URL | http://arxiv.org/abs/1711.10735v2 |
PDF | http://arxiv.org/pdf/1711.10735v2.pdf |
PWC | https://paperswithcode.com/paper/unpaired-photo-to-caricature-translation-on |
Repo | https://github.com/zhengziqiang/P2C |
Framework | tf |
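The generator objective described above combines several terms; a hedged sketch of how they might be assembled is shown below, with adversarial feedback from both a coarse and a fine discriminator, a cycle-consistency term and a perceptual term. The least-squares adversarial form and the loss weights are assumptions, not the paper's exact settings.

```python
# Composite generator loss sketch: two adversarial terms (coarse + fine critic),
# cycle consistency, and a perceptual distance between feature maps.
import torch
import torch.nn.functional as F

def generator_loss(d_coarse_fake, d_fine_fake, real, cycled,
                   feat_real, feat_fake, lam_cyc=10.0, lam_perc=1.0):
    adv = F.mse_loss(d_coarse_fake, torch.ones_like(d_coarse_fake)) \
        + F.mse_loss(d_fine_fake, torch.ones_like(d_fine_fake))      # fool both critics
    cyc = F.l1_loss(cycled, real)                                     # cycle consistency
    perc = F.l1_loss(feat_fake, feat_real)                            # perceptual distance
    return adv + lam_cyc * cyc + lam_perc * perc

# dummy tensors just to exercise the function
d_c, d_f = torch.randn(2, 1, 30, 30), torch.randn(2, 1, 62, 62)
photo, recon = torch.rand(2, 3, 256, 256), torch.rand(2, 3, 256, 256)
f_r, f_f = torch.randn(2, 512), torch.randn(2, 512)
loss = generator_loss(d_c, d_f, photo, recon, f_r, f_f)
```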
Parallel WaveNet: Fast High-Fidelity Speech Synthesis
Title | Parallel WaveNet: Fast High-Fidelity Speech Synthesis |
Authors | Aaron van den Oord, Yazhe Li, Igor Babuschkin, Karen Simonyan, Oriol Vinyals, Koray Kavukcuoglu, George van den Driessche, Edward Lockhart, Luis C. Cobo, Florian Stimberg, Norman Casagrande, Dominik Grewe, Seb Noury, Sander Dieleman, Erich Elsen, Nal Kalchbrenner, Heiga Zen, Alex Graves, Helen King, Tom Walters, Dan Belov, Demis Hassabis |
Abstract | The recently-developed WaveNet architecture is the current state of the art in realistic speech synthesis, consistently rated as more natural sounding for many different languages than any previous system. However, because WaveNet relies on sequential generation of one audio sample at a time, it is poorly suited to today’s massively parallel computers, and therefore hard to deploy in a real-time production setting. This paper introduces Probability Density Distillation, a new method for training a parallel feed-forward network from a trained WaveNet with no significant difference in quality. The resulting system is capable of generating high-fidelity speech samples more than 20 times faster than real time, and is deployed online by Google Assistant, including serving multiple English and Japanese voices. |
Tasks | Speech Synthesis |
Published | 2017-11-28 |
URL | http://arxiv.org/abs/1711.10433v1 |
PDF | http://arxiv.org/pdf/1711.10433v1.pdf |
PWC | https://paperswithcode.com/paper/parallel-wavenet-fast-high-fidelity-speech |
Repo | https://github.com/lokhiufung/music_generation |
Framework | tf |
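Probability Density Distillation can be illustrated on a toy problem: draw samples from a fast "student" distribution and minimize KL(student ‖ teacher), using the slow "teacher" only to score those samples. In the paper the student is an inverse-autoregressive flow and the teacher a trained WaveNet; one-dimensional Gaussians stand in here.

```python
# Toy Probability Density Distillation: fit a student density to a frozen teacher
# by minimizing a Monte-Carlo estimate of KL(student || teacher) on student samples.
import torch

teacher = torch.distributions.Normal(loc=2.0, scale=0.5)          # frozen, slow to sample
student_loc = torch.zeros(1, requires_grad=True)
student_logscale = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([student_loc, student_logscale], lr=0.05)

for step in range(500):
    student = torch.distributions.Normal(student_loc, student_logscale.exp())
    x = student.rsample((256,))                                    # fast parallel sampling
    # KL(student || teacher) ~= E_x[log q(x) - log p(x)] with x ~ student
    loss = (student.log_prob(x) - teacher.log_prob(x)).mean()
    opt.zero_grad(); loss.backward(); opt.step()

print(student_loc.item(), student_logscale.exp().item())           # -> approx (2.0, 0.5)
```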
Scene Graph Generation by Iterative Message Passing
Title | Scene Graph Generation by Iterative Message Passing |
Authors | Danfei Xu, Yuke Zhu, Christopher B. Choy, Li Fei-Fei |
Abstract | Understanding a visual scene goes beyond recognizing individual objects in isolation. Relationships between objects also constitute rich semantic information about the scene. In this work, we explicitly model the objects and their relationships using scene graphs, a visually-grounded graphical structure of an image. We propose a novel end-to-end model that generates such a structured scene representation from an input image. The model solves the scene graph inference problem using standard RNNs and learns to iteratively improve its predictions via message passing. Our joint inference model can take advantage of contextual cues to make better predictions on objects and their relationships. The experiments show that our model significantly outperforms previous methods for generating scene graphs on the Visual Genome dataset and for inferring support relations on the NYU Depth v2 dataset. |
Tasks | Graph Generation, Scene Graph Generation |
Published | 2017-01-10 |
URL | http://arxiv.org/abs/1701.02426v2 |
PDF | http://arxiv.org/pdf/1701.02426v2.pdf |
PWC | https://paperswithcode.com/paper/scene-graph-generation-by-iterative-message |
Repo | https://github.com/shikorab/SceneGraph |
Framework | tf |
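A minimal sketch of the iterative message-passing loop: per-object (node) states and per-pair (edge) states exchange messages for a few refinement iterations before class prediction. GRU cells and single linear message functions stand in for the paper's primal-dual update scheme, and the class counts are only indicative of Visual Genome.

```python
# Node (object) and edge (relationship) states exchange messages for a few
# iterations, then are classified. A deliberately simplified stand-in scheme.
import torch
import torch.nn as nn

num_obj, hidden = 5, 64
node_h = torch.randn(num_obj, hidden)              # per-object state (e.g. detector features)
edges = [(i, j) for i in range(num_obj) for j in range(num_obj) if i != j]
edge_h = torch.randn(len(edges), hidden)           # per-object-pair state

node_gru = nn.GRUCell(hidden, hidden)
edge_gru = nn.GRUCell(hidden, hidden)
to_edge = nn.Linear(2 * hidden, hidden)            # endpoint states -> edge message
to_node = nn.Linear(hidden, hidden)                # edge state -> node message

subj = torch.tensor([i for i, _ in edges])
obj = torch.tensor([j for _, j in edges])

for _ in range(3):                                 # a few message-passing iterations
    edge_msg = to_edge(torch.cat([node_h[subj], node_h[obj]], dim=1))
    edge_h = edge_gru(edge_msg, edge_h)
    # nodes aggregate messages from their outgoing edges (a simplification)
    node_msg = torch.zeros_like(node_h).index_add(0, subj, to_node(edge_h))
    node_h = node_gru(node_msg, node_h)

object_logits = nn.Linear(hidden, 151)(node_h)     # e.g. Visual Genome object classes
predicate_logits = nn.Linear(hidden, 51)(edge_h)   # e.g. predicate classes
```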
Generative Adversarial Source Separation
Title | Generative Adversarial Source Separation |
Authors | Cem Subakan, Paris Smaragdis |
Abstract | Generative source separation methods such as non-negative matrix factorization (NMF) or auto-encoders rely on the assumption of an output probability density. Generative Adversarial Networks (GANs) can learn data distributions without needing a parametric assumption on the output density. We show on a speech source separation experiment that a multi-layer perceptron trained with a Wasserstein-GAN formulation outperforms NMF, auto-encoders trained with maximum likelihood, and variational auto-encoders in terms of source-to-distortion ratio. |
Tasks | |
Published | 2017-10-30 |
URL | http://arxiv.org/abs/1710.10779v1 |
PDF | http://arxiv.org/pdf/1710.10779v1.pdf |
PWC | https://paperswithcode.com/paper/generative-adversarial-source-separation |
Repo | https://github.com/ycemsubakan/sourceseparation_misc |
Framework | pytorch |
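The core ingredient can be sketched as a WGAN training step for an MLP that generates magnitude-spectrogram frames: a critic is trained to separate real from generated frames under weight clipping, and the generator is updated to raise the critic's score. Dimensions, optimizers and the absence of any conditioning are simplifications, not the paper's setup.

```python
# One WGAN step (weight clipping) for an MLP generator of spectrogram frames.
import torch
import torch.nn as nn

spec_dim, noise_dim = 513, 64
G = nn.Sequential(nn.Linear(noise_dim, 256), nn.ReLU(),
                  nn.Linear(256, spec_dim), nn.Softplus())   # non-negative magnitudes
D = nn.Sequential(nn.Linear(spec_dim, 256), nn.ReLU(), nn.Linear(256, 1))
opt_g = torch.optim.RMSprop(G.parameters(), lr=5e-5)
opt_d = torch.optim.RMSprop(D.parameters(), lr=5e-5)

real_frames = torch.rand(32, spec_dim)                        # magnitude-spectrogram frames

# critic step: maximize D(real) - D(fake), keep weights in a compact set
fake = G(torch.randn(32, noise_dim)).detach()
d_loss = -(D(real_frames).mean() - D(fake).mean())
opt_d.zero_grad(); d_loss.backward(); opt_d.step()
for p in D.parameters():
    p.data.clamp_(-0.01, 0.01)

# generator step: maximize D(G(z))
g_loss = -D(G(torch.randn(32, noise_dim))).mean()
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```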
Nonlinear Information Bottleneck
Title | Nonlinear Information Bottleneck |
Authors | Artemy Kolchinsky, Brendan D. Tracey, David H. Wolpert |
Abstract | Information bottleneck (IB) is a technique for extracting information in one random variable $X$ that is relevant for predicting another random variable $Y$. IB works by encoding $X$ in a compressed “bottleneck” random variable $M$ from which $Y$ can be accurately decoded. However, finding the optimal bottleneck variable involves a difficult optimization problem, which until recently has been considered for only two limited cases: discrete $X$ and $Y$ with small state spaces, and continuous $X$ and $Y$ with a Gaussian joint distribution (in which case optimal encoding and decoding maps are linear). We propose a method for performing IB on arbitrarily-distributed discrete and/or continuous $X$ and $Y$, while allowing for nonlinear encoding and decoding maps. Our approach relies on a novel non-parametric upper bound for mutual information. We describe how to implement our method using neural networks. We then show that it achieves better performance than the recently-proposed “variational IB” method on several real-world datasets. |
Tasks | |
Published | 2017-05-06 |
URL | https://arxiv.org/abs/1705.02436v9 |
PDF | https://arxiv.org/pdf/1705.02436v9.pdf |
PWC | https://paperswithcode.com/paper/nonlinear-information-bottleneck |
Repo | https://github.com/burklight/convex-IB-Lagrangian-PyTorch |
Framework | pytorch |
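A hedged sketch of the objective: a stochastic bottleneck $M = f(X) + \varepsilon$, a decoder for $Y$, and a pairwise-distance (kernel-based) upper bound on $I(X;M)$ added to the cross-entropy with weight $\beta$. The bound below is one common non-parametric form and all constants and layer sizes are illustrative; consult the paper for the exact estimator.

```python
# Nonlinear-IB-style objective sketch: cross-entropy + beta * upper bound on I(X;M),
# where M = f(X) + Gaussian noise and the bound uses pairwise encoder distances.
import torch
import torch.nn as nn
import torch.nn.functional as F

enc = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 8))
dec = nn.Linear(8, 3)
sigma, beta = 1.0, 1e-2

def mi_upper_bound(h, sigma):
    # bound on I(X;M) for M = h + N(0, sigma^2 I), via pairwise squared distances
    d2 = torch.cdist(h, h).pow(2)                     # ||h_i - h_j||^2
    return -torch.log(torch.exp(-d2 / (2 * sigma**2)).mean(dim=1)).mean()

x, y = torch.randn(128, 20), torch.randint(0, 3, (128,))
h = enc(x)
m = h + sigma * torch.randn_like(h)                   # stochastic bottleneck sample
loss = F.cross_entropy(dec(m), y) + beta * mi_upper_bound(h, sigma)
loss.backward()
```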
A Regularized Framework for Sparse and Structured Neural Attention
Title | A Regularized Framework for Sparse and Structured Neural Attention |
Authors | Vlad Niculae, Mathieu Blondel |
Abstract | Modern neural networks are often augmented with an attention mechanism, which tells the network where to focus within the input. We propose in this paper a new framework for sparse and structured attention, building upon a smoothed max operator. We show that the gradient of this operator defines a mapping from real values to probabilities, suitable as an attention mechanism. Our framework includes softmax and a slight generalization of the recently-proposed sparsemax as special cases. However, we also show how our framework can incorporate modern structured penalties, resulting in more interpretable attention mechanisms that focus on entire segments or groups of an input. We derive efficient algorithms to compute the forward and backward passes of our attention mechanisms, enabling their use in a neural network trained with backpropagation. To showcase their potential as a drop-in replacement for existing ones, we evaluate our attention mechanisms on three large-scale tasks: textual entailment, machine translation, and sentence summarization. Our attention mechanisms improve interpretability without sacrificing performance; notably, on textual entailment and summarization, we outperform the standard attention mechanisms based on softmax and sparsemax. |
Tasks | Machine Translation, Natural Language Inference, Text Summarization |
Published | 2017-05-22 |
URL | http://arxiv.org/abs/1705.07704v3 |
PDF | http://arxiv.org/pdf/1705.07704v3.pdf |
PWC | https://paperswithcode.com/paper/a-regularized-framework-for-sparse-and |
Repo | https://github.com/weiwang2330/sparse-structured-attention |
Framework | pytorch |
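One special case the framework recovers is sparsemax: the Euclidean projection of the attention scores onto the probability simplex, which assigns exactly zero weight to low-scoring positions. The standalone implementation below follows the standard thresholding algorithm and is not taken from the linked repository.

```python
# Sparsemax: project scores onto the probability simplex; low scores get weight 0.
import torch

def sparsemax(z):
    """z: 1-D score tensor; returns a sparse probability vector."""
    z_sorted, _ = torch.sort(z, descending=True)
    cssv = torch.cumsum(z_sorted, dim=0) - 1.0
    k = torch.arange(1, z.numel() + 1, dtype=z.dtype)
    support = k * z_sorted > cssv                     # positions kept in the support
    k_z = support.sum()                               # support size
    tau = cssv[k_z - 1] / k_z                         # threshold
    return torch.clamp(z - tau, min=0.0)

scores = torch.tensor([2.0, 1.9, 0.1, -1.0])
print(torch.softmax(scores, dim=0))   # dense: every position gets some weight
print(sparsemax(scores))              # sparse: low-scoring positions get exactly zero
```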
Joint Topic-Semantic-aware Social Recommendation for Online Voting
Title | Joint Topic-Semantic-aware Social Recommendation for Online Voting |
Authors | Hongwei Wang, Jia Wang, Miao Zhao, Jiannong Cao, Minyi Guo |
Abstract | Online voting is an emerging feature in social networks, in which users can express their attitudes toward various issues and show their unique interests. Online voting imposes new challenges on recommendation, because the propagation of votings heavily depends on the structure of social networks as well as the content of votings. In this paper, we investigate how to utilize these two factors in a comprehensive manner when doing voting recommendation. First, because existing text mining methods such as topic models and semantic models cannot handle the typically short and ambiguous content of votings well, we propose a novel Topic-Enhanced Word Embedding (TEWE) method to learn word and document representations by jointly considering their topics and semantics. Then we propose our Joint Topic-Semantic-aware social Matrix Factorization (JTS-MF) model for voting recommendation. The JTS-MF model calculates similarity among users and votings by combining their TEWE representations and the structural information of social networks, and preserves this topic-semantic-social similarity during matrix factorization. To evaluate the performance of the TEWE representation and the JTS-MF model, we conduct extensive experiments on a real online voting dataset. The results prove the efficacy of our approach against several state-of-the-art baselines. |
Tasks | |
Published | 2017-12-03 |
URL | http://arxiv.org/abs/1712.00731v1 |
PDF | http://arxiv.org/pdf/1712.00731v1.pdf |
PWC | https://paperswithcode.com/paper/joint-topic-semantic-aware-social |
Repo | https://github.com/hwwang55/JTS-MF |
Framework | none |
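The matrix-factorization backbone with a similarity regularizer can be sketched as follows: user-voting interactions are factorized while users with high topic-semantic-social similarity are pulled toward nearby latent vectors. The random similarity matrix, binary interaction matrix and loss weights are placeholders; in the paper the similarity comes from TEWE representations and the social graph.

```python
# MF with a similarity regularizer: reconstruct interactions while keeping
# similar users close in latent space.
import torch

n_users, n_votings, dim = 50, 80, 16
R = (torch.rand(n_users, n_votings) < 0.05).float()        # 0/1 participation matrix
S = torch.rand(n_users, n_users)                           # stand-in user-user similarity

U = torch.randn(n_users, dim, requires_grad=True)
V = torch.randn(n_votings, dim, requires_grad=True)
opt = torch.optim.Adam([U, V], lr=0.01)
lam = 0.1

for _ in range(200):
    rec_loss = ((U @ V.t() - R) ** 2).mean()               # reconstruction term
    social_loss = (S * torch.cdist(U, U).pow(2)).mean()    # pull similar users together
    loss = rec_loss + lam * social_loss
    opt.zero_grad(); loss.backward(); opt.step()
```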
Fake News Detection on Social Media: A Data Mining Perspective
Title | Fake News Detection on Social Media: A Data Mining Perspective |
Authors | Kai Shu, Amy Sliva, Suhang Wang, Jiliang Tang, Huan Liu |
Abstract | Social media for news consumption is a double-edged sword. On the one hand, its low cost, easy access, and rapid dissemination of information lead people to seek out and consume news from social media. On the other hand, it enables the wide spread of “fake news”, i.e., low-quality news with intentionally false information. The extensive spread of fake news has the potential for extremely negative impacts on individuals and society. Therefore, fake news detection on social media has recently become an emerging research area that is attracting tremendous attention. Fake news detection on social media presents unique characteristics and challenges that make existing detection algorithms from traditional news media ineffective or not applicable. First, fake news is intentionally written to mislead readers into believing false information, which makes it difficult and nontrivial to detect based on news content alone; therefore, we need to include auxiliary information, such as user social engagements on social media, to help make a determination. Second, exploiting this auxiliary information is challenging in and of itself as users’ social engagements with fake news produce data that is big, incomplete, unstructured, and noisy. Because the issue of fake news detection on social media is both challenging and relevant, we conducted this survey to further facilitate research on the problem. In this survey, we present a comprehensive review of detecting fake news on social media, including fake news characterizations based on psychology and social theories, existing algorithms from a data mining perspective, evaluation metrics and representative datasets. We also discuss related research areas, open problems, and future research directions for fake news detection on social media. |
Tasks | Fake News Detection |
Published | 2017-08-07 |
URL | http://arxiv.org/abs/1708.01967v3 |
PDF | http://arxiv.org/pdf/1708.01967v3.pdf |
PWC | https://paperswithcode.com/paper/fake-news-detection-on-social-media-a-data |
Repo | https://github.com/KaiDMML/FakeNewsNet |
Framework | none |
soc2seq: Social Embedding meets Conversation Model
Title | soc2seq: Social Embedding meets Conversation Model |
Authors | Parminder Bhatia, Marsal Gavalda, Arash Einolghozati |
Abstract | While liking or upvoting a post on a mobile app is easy to do, replying with a written note is much more difficult, due to both the cognitive load of coming up with a meaningful response and the mechanics of entering the text. Here we present a novel textual reply generation model that goes beyond current auto-reply and predictive text entry models by taking into account the content preferences of the user, the idiosyncrasies of their conversational style, and even the structure of their social graph. Specifically, we have developed two types of models for personalized user interactions: a content-based conversation model, which makes use of location together with user information, and a social-graph-based conversation model, which combines content-based conversation models with social graphs. |
Tasks | |
Published | 2017-02-17 |
URL | http://arxiv.org/abs/1702.05512v3 |
PDF | http://arxiv.org/pdf/1702.05512v3.pdf |
PWC | https://paperswithcode.com/paper/soc2seq-social-embedding-meets-conversation |
Repo | https://github.com/pbhatia243/Neural_Conversation_Models |
Framework | tf |
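The personalization mechanism can be sketched as conditioning the reply decoder on a per-user embedding, concatenated to every input token; in the paper that embedding can encode content preferences, location or a social-graph position. The single GRU layer and all dimensions below are illustrative, not the authors' architecture.

```python
# Personalized decoder sketch: each input token is concatenated with a user embedding
# before the recurrent reply decoder predicts the next word.
import torch
import torch.nn as nn

vocab_size, word_dim, user_dim, hidden = 1000, 64, 32, 128
word_emb = nn.Embedding(vocab_size, word_dim)
user_emb = nn.Embedding(500, user_dim)            # one vector per user / social node
decoder = nn.GRU(word_dim + user_dim, hidden, batch_first=True)
out_proj = nn.Linear(hidden, vocab_size)

tokens = torch.randint(0, vocab_size, (4, 12))    # batch of partial replies
users = torch.randint(0, 500, (4,))
u = user_emb(users)[:, None, :].expand(-1, tokens.size(1), -1)
inputs = torch.cat([word_emb(tokens), u], dim=-1)
hidden_states, _ = decoder(inputs)
next_word_logits = out_proj(hidden_states)        # teacher-forced next-token prediction
```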
Are GANs Created Equal? A Large-Scale Study
Title | Are GANs Created Equal? A Large-Scale Study |
Authors | Mario Lucic, Karol Kurach, Marcin Michalski, Sylvain Gelly, Olivier Bousquet |
Abstract | Generative adversarial networks (GAN) are a powerful subclass of generative models. Despite a very rich research activity leading to numerous interesting GAN algorithms, it is still very hard to assess which algorithm(s) perform better than others. We conduct a neutral, multi-faceted, large-scale empirical study of state-of-the-art models and evaluation measures. We find that most models can reach similar scores with enough hyperparameter optimization and random restarts. This suggests that improvements can come from a higher computational budget and more extensive tuning rather than from fundamental algorithmic changes. To overcome some limitations of the current metrics, we also propose several data sets on which precision and recall can be computed. Our experimental results suggest that future GAN research should be based on more systematic and objective evaluation procedures. Finally, we did not find evidence that any of the tested algorithms consistently outperforms the non-saturating GAN introduced by Goodfellow et al. (2014). |
Tasks | Hyperparameter Optimization |
Published | 2017-11-28 |
URL | http://arxiv.org/abs/1711.10337v4 |
PDF | http://arxiv.org/pdf/1711.10337v4.pdf |
PWC | https://paperswithcode.com/paper/are-gans-created-equal-a-large-scale-study |
Repo | https://github.com/mseitzer/pytorch-fid |
Framework | pytorch |
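The study's comparisons lean on the Fréchet Inception Distance, and the linked repository is an FID implementation. As a reference, FID is the Fréchet distance between Gaussians fitted to Inception features of real and generated samples; the sketch below computes that distance from feature matrices, with random features standing in for Inception activations.

```python
# Frechet distance between Gaussians fitted to two feature sets (the FID formula).
import numpy as np
from scipy import linalg

def frechet_distance(feats_real, feats_fake):
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    cov1 = np.cov(feats_real, rowvar=False)
    cov2 = np.cov(feats_fake, rowvar=False)
    covmean = linalg.sqrtm(cov1 @ cov2)
    if np.iscomplexobj(covmean):                  # drop numerical imaginary residue
        covmean = covmean.real
    diff = mu1 - mu2
    return diff @ diff + np.trace(cov1 + cov2 - 2.0 * covmean)

real = np.random.randn(2048, 64)                  # stand-ins for Inception features
fake = np.random.randn(2048, 64) + 0.5
print("FID:", frechet_distance(real, fake))
```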