July 28, 2019

Paper Group ANR 416

MomentsNet: a simple learning-free method for binary image recognition

Title MomentsNet: a simple learning-free method for binary image recognition
Authors Jiasong Wu, Shijie Qiu, Youyong Kong, Yang Chen, Lotfi Senhadji, Huazhong Shu
Abstract In this paper, we propose a new simple and learning-free deep learning network named MomentsNet, whose convolution layer, nonlinear processing layer and pooling layer are constructed by Moments kernels, binary hashing and block-wise histogram, respectively. Twelve typical moments (including geometrical moment, Zernike moment, Tchebichef moment, etc.) are used to construct the MomentsNet whose recognition performance for binary image is studied. The results reveal that MomentsNet has better recognition performance than its corresponding moments in almost all cases and ZernikeNet achieves the best recognition performance among MomentsNet constructed by twelve moments. ZernikeNet also shows better recognition performance on binary image database than that of PCANet, which is a learning-based deep learning network.
Tasks
Published 2017-02-22
URL http://arxiv.org/abs/1702.06767v1
PDF http://arxiv.org/pdf/1702.06767v1.pdf
PWC https://paperswithcode.com/paper/momentsnet-a-simple-learning-free-method-for
Repo
Framework
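
The abstract names binary hashing and block-wise histograms as the nonlinear and pooling stages. A minimal sketch of those two stages follows, assuming PCANet-style hashing (one bit per filter response, thresholded at zero) and non-overlapping blocks. The function names, the threshold, and the block layout are illustrative assumptions, not details taken from the paper.

```python
def binary_hash(maps):
    """Combine several filter-response maps into one integer code map.

    Each map contributes one bit per pixel: 1 if the response is
    positive, else 0 (thresholding at zero is an assumption).
    """
    height, width = len(maps[0]), len(maps[0][0])
    return [
        [sum((1 if m[r][c] > 0 else 0) << i for i, m in enumerate(maps))
         for c in range(width)]
        for r in range(height)
    ]


def block_histograms(codes, block, n_bins):
    """Concatenate histograms of codes over non-overlapping square blocks."""
    feats = []
    for r0 in range(0, len(codes) - block + 1, block):
        for c0 in range(0, len(codes[0]) - block + 1, block):
            hist = [0] * n_bins
            for r in range(r0, r0 + block):
                for c in range(c0, c0 + block):
                    hist[codes[r][c]] += 1
            feats.extend(hist)
    return feats
```

With L response maps the codes fall in the range 0 to 2^L - 1, so `n_bins` would be set to `2 ** len(maps)`.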

Understanding the Feedforward Artificial Neural Network Model From the Perspective of Network Flow

Title Understanding the Feedforward Artificial Neural Network Model From the Perspective of Network Flow
Authors Dawei Dai, Weimin Tan, Hong Zhan
Abstract In recent years, deep learning based on artificial neural networks (ANNs) has achieved great success in pattern recognition. However, there is no clear understanding of such neural computational models. In this paper, we try to unravel the “black-box” structure of the ANN model from the perspective of network flow. Specifically, we consider the feedforward ANN as a network flow model, which consists of many directional class-pathways. Each class-pathway encodes one class. The class-pathway of a class is obtained by connecting the activated neural nodes in each layer from input to output, where the activation value of a neural node (node-value) is defined by the weights of each layer in a trained ANN-classifier. From the perspective of the class-pathway, training an ANN-classifier can be regarded as the formation process of the class-pathways of different classes. By analyzing the distances between each pair of class-pathways in a trained ANN-classifier, we try to answer the question of why the classifier performs as it does. Finally, from the neural encoding view, we define the importance of each neural node through the class-pathways, which is helpful for optimizing the structure of a classifier. Experiments on two types of ANN model, multi-layer MLP and CNN, verify that the network flow based on class-pathways is a reasonable explanation for ANN models.
Tasks
Published 2017-04-26
URL http://arxiv.org/abs/1704.08068v1
PDF http://arxiv.org/pdf/1704.08068v1.pdf
PWC https://paperswithcode.com/paper/understanding-the-feedforward-artificial
Repo
Framework

The loss surface of deep and wide neural networks

Title The loss surface of deep and wide neural networks
Authors Quynh Nguyen, Matthias Hein
Abstract While the optimization problem behind deep neural networks is highly non-convex, it is frequently observed in practice that training deep networks seems possible without getting stuck in suboptimal points. It has been argued that this is the case as all local minima are close to being globally optimal. We show that this is (almost) true, in fact almost all local minima are globally optimal, for a fully connected network with squared loss and analytic activation function given that the number of hidden units of one layer of the network is larger than the number of training points and the network structure from this layer on is pyramidal.
Tasks
Published 2017-04-26
URL http://arxiv.org/abs/1704.08045v2
PDF http://arxiv.org/pdf/1704.08045v2.pdf
PWC https://paperswithcode.com/paper/the-loss-surface-of-deep-and-wide-neural
Repo
Framework

Super-Resolution via Deep Learning

Title Super-Resolution via Deep Learning
Authors Khizar Hayat
Abstract The recent phenomenal interest in convolutional neural networks (CNNs) must have made it inevitable for the super-resolution (SR) community to explore its potential. The response has been immense and in the last three years, since the advent of the pioneering work, there appeared too many works not to warrant a comprehensive survey. This paper surveys the SR literature in the context of deep learning. We focus on the three important aspects of multimedia - namely image, video and multi-dimensions, especially depth maps. In each case, first relevant benchmarks are introduced in the form of datasets and state of the art SR methods, excluding deep learning. Next is a detailed analysis of the individual works, each including a short description of the method and a critique of the results with special reference to the benchmarking done. This is followed by minimum overall benchmarking in the form of comparison on some common dataset, while relying on the results reported in various works.
Tasks Super-Resolution
Published 2017-06-28
URL http://arxiv.org/abs/1706.09077v1
PDF http://arxiv.org/pdf/1706.09077v1.pdf
PWC https://paperswithcode.com/paper/super-resolution-via-deep-learning
Repo
Framework

Making a long story short: A Multi-Importance fast-forwarding egocentric videos with the emphasis on relevant objects

Title Making a long story short: A Multi-Importance fast-forwarding egocentric videos with the emphasis on relevant objects
Authors Michel Melo Silva, Washington Luis Souza Ramos, Felipe Cadar Chamone, João Pedro Klock Ferreira, Mario Fernando Montenegro Campos, Erickson Rangel Nascimento
Abstract The emergence of low-cost high-quality personal wearable cameras combined with the increasing storage capacity of video-sharing websites has evoked a growing interest in first-person videos, since most videos are composed of long-running unedited streams which are usually tedious and unpleasant to watch. State-of-the-art semantic fast-forward methods currently face the challenge of providing an adequate balance between smoothness in visual flow and the emphasis on the relevant parts. In this work, we present the Multi-Importance Fast-Forward (MIFF), a fully automatic methodology to fast-forward egocentric videos facing these challenges. The dilemma of defining what is the semantic information of a video is addressed by a learning process based on the preferences of the user. Results show that the proposed method keeps over 3 times more semantic content than the state-of-the-art fast-forward. Finally, we discuss the need for a particular video stabilization technique for fast-forward egocentric videos.
Tasks
Published 2017-11-09
URL http://arxiv.org/abs/1711.03473v3
PDF http://arxiv.org/pdf/1711.03473v3.pdf
PWC https://paperswithcode.com/paper/making-a-long-story-short-a-multi-importance
Repo
Framework

Non-convex Optimization for Machine Learning

Title Non-convex Optimization for Machine Learning
Authors Prateek Jain, Purushottam Kar
Abstract A vast majority of machine learning algorithms train their models and perform inference by solving optimization problems. In order to capture the learning and prediction problems accurately, structural constraints such as sparsity or low rank are frequently imposed or else the objective itself is designed to be a non-convex function. This is especially true of algorithms that operate in high-dimensional spaces or that train non-linear models such as tensor models and deep networks. The freedom to express the learning problem as a non-convex optimization problem gives immense modeling power to the algorithm designer, but often such problems are NP-hard to solve. A popular workaround to this has been to relax non-convex problems to convex ones and use traditional methods to solve the (convex) relaxed optimization problems. However, this approach may be lossy and nevertheless presents significant challenges for large scale optimization. On the other hand, direct approaches to non-convex optimization have met with resounding success in several domains and remain the methods of choice for the practitioner, as they frequently outperform relaxation-based techniques; popular heuristics include projected gradient descent and alternating minimization. However, these are often poorly understood in terms of their convergence and other properties. This monograph presents a selection of recent advances that bridge a long-standing gap in our understanding of these heuristics. The monograph will lead the reader through several widely used non-convex optimization techniques, as well as applications thereof. The goal of this monograph is both to introduce the rich literature in this area and to equip the reader with the tools and techniques needed to analyze these simple procedures for non-convex problems.
Tasks
Published 2017-12-21
URL http://arxiv.org/abs/1712.07897v1
PDF http://arxiv.org/pdf/1712.07897v1.pdf
PWC https://paperswithcode.com/paper/non-convex-optimization-for-machine-learning
Repo
Framework
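
The abstract mentions projected gradient descent under structural constraints such as sparsity. The sketch below illustrates that idea on a least-squares objective, where the projection keeps the k largest-magnitude coordinates. The objective, step size, iteration count, and function names are assumptions chosen for illustration and are not taken from the monograph.

```python
def project_sparse(x, k):
    """Project x onto the set of vectors with at most k nonzero entries."""
    keep = set(sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]


def projected_gradient_descent(A, b, k, step=0.1, iters=200):
    """Approximately minimize 0.5 * ||A x - b||^2 subject to x being k-sparse.

    Each iteration takes a gradient step and then projects back onto
    the (non-convex) sparsity constraint.
    """
    n = len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        residual = [sum(a * xi for a, xi in zip(row, x)) - bi
                    for row, bi in zip(A, b)]
        grad = [sum(A[r][j] * residual[r] for r in range(len(A)))
                for j in range(n)]
        x = project_sparse([xi - step * g for xi, g in zip(x, grad)], k)
    return x
```

The projection step is the only place where the non-convex constraint enters. Swapping `project_sparse` for a different projection would give a different constrained method under the same loop.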

An analysis of incorporating an external language model into a sequence-to-sequence model

Title An analysis of incorporating an external language model into a sequence-to-sequence model
Authors Anjuli Kannan, Yonghui Wu, Patrick Nguyen, Tara N. Sainath, Zhifeng Chen, Rohit Prabhavalkar
Abstract Attention-based sequence-to-sequence models for automatic speech recognition jointly train an acoustic model, language model, and alignment mechanism. Thus, the language model component is only trained on transcribed audio-text pairs. This leads to the use of shallow fusion with an external language model at inference time. Shallow fusion refers to log-linear interpolation with a separately trained language model at each step of the beam search. In this work, we investigate the behavior of shallow fusion across a range of conditions: different types of language models, different decoding units, and different tasks. On Google Voice Search, we demonstrate that the use of shallow fusion with a neural LM with wordpieces yields a 9.1% relative word error rate reduction (WERR) over our competitive attention-based sequence-to-sequence model, obviating the need for second-pass rescoring.
Tasks Language Modelling, Speech Recognition
Published 2017-12-06
URL http://arxiv.org/abs/1712.01996v1
PDF http://arxiv.org/pdf/1712.01996v1.pdf
PWC https://paperswithcode.com/paper/an-analysis-of-incorporating-an-external
Repo
Framework
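
The abstract defines shallow fusion as log-linear interpolation with a separately trained language model at each step of the beam search. A minimal sketch of one such step is shown below. The two scoring callables, the interpolation weight `lm_weight`, and the beam size are hypothetical placeholders, and the sketch assumes both models score the same token inventory.

```python
import math


def shallow_fusion_step(beam, s2s_log_probs, lm_log_probs,
                        lm_weight=0.3, beam_size=2):
    """Extend every hypothesis by one token and keep the best `beam_size`.

    beam: list of (tokens, score) pairs, where tokens is a tuple.
    s2s_log_probs, lm_log_probs: hypothetical callables that map a token
    prefix to a dict {token: log-probability} for the next token.
    """
    candidates = []
    for tokens, score in beam:
        s2s = s2s_log_probs(tokens)
        lm = lm_log_probs(tokens)
        for tok, log_p in s2s.items():
            # Log-linear interpolation of the two model scores.
            fused = log_p + lm_weight * lm[tok]
            candidates.append((tokens + (tok,), score + fused))
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:beam_size]
```

Setting `lm_weight` to zero recovers plain beam search over the sequence-to-sequence scores, which makes the effect of the external model easy to compare.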

N-GrAM: New Groningen Author-profiling Model

Title N-GrAM: New Groningen Author-profiling Model
Authors Angelo Basile, Gareth Dwyer, Maria Medvedeva, Josine Rawee, Hessel Haagsma, Malvina Nissim
Abstract We describe our participation in the PAN 2017 shared task on Author Profiling, identifying authors’ gender and language variety for English, Spanish, Arabic and Portuguese. We describe both the final, submitted system, and a series of negative results. Our aim was to create a single model for both gender and language, and for all language varieties. Our best-performing system (on cross-validated results) is a linear support vector machine (SVM) with word unigrams and character 3- to 5-grams as features. A set of additional features, including POS tags, additional datasets, geographic entities, and Twitter handles, hurt, rather than improve, performance. Results from cross-validation indicated high performance overall and results on the test set confirmed them, at 0.86 averaged accuracy, with performance on sub-tasks ranging from 0.68 to 0.98.
Tasks
Published 2017-07-12
URL http://arxiv.org/abs/1707.03764v1
PDF http://arxiv.org/pdf/1707.03764v1.pdf
PWC https://paperswithcode.com/paper/n-gram-new-groningen-author-profiling-model
Repo
Framework
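
The best-performing system in the abstract uses word unigrams and character 3- to 5-grams as features for a linear SVM. The sketch below covers only the feature extraction. Whitespace tokenization, raw counts, and the feature-name prefixes are assumptions made for illustration, since the abstract does not specify preprocessing or weighting.

```python
from collections import Counter


def ngram_features(text, char_min=3, char_max=5):
    """Count word unigrams and character n-grams in a single text."""
    feats = Counter()
    # Word unigrams (whitespace tokenization is an assumption).
    for word in text.split():
        feats["w:" + word] += 1
    # Character n-grams over the raw string, spaces included.
    for n in range(char_min, char_max + 1):
        for i in range(len(text) - n + 1):
            feats["c%d:%s" % (n, text[i:i + n])] += 1
    return feats
```

The resulting sparse count dictionaries could then be vectorized and passed to any linear classifier.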

Post-edit Analysis of Collective Biography Generation

Title Post-edit Analysis of Collective Biography Generation
Authors Bo Han, Will Radford, Anaïs Cadilhac, Art Harol, Andrew Chisholm, Ben Hachey
Abstract Text generation is increasingly common but often requires manual post-editing where high precision is critical to end users. However, manual editing is expensive so we want to ensure this effort is focused on high-value tasks. And we want to maintain stylistic consistency, a particular challenge in crowd settings. We present a case study, analysing human post-editing in the context of a template-based biography generation system. An edit flow visualisation combined with manual characterisation of edits helps identify and prioritise work for improving end-to-end efficiency and accuracy.
Tasks Text Generation
Published 2017-02-20
URL http://arxiv.org/abs/1702.05821v1
PDF http://arxiv.org/pdf/1702.05821v1.pdf
PWC https://paperswithcode.com/paper/post-edit-analysis-of-collective-biography
Repo
Framework

On the Power Spectral Density Applied to the Analysis of Old Canvases

Title On the Power Spectral Density Applied to the Analysis of Old Canvases
Authors Francisco J. Simois, Juan J. Murillo-Fuentes
Abstract A routine task for art historians is painting diagnostics, such as dating or attribution. Signal processing of the X-ray image of a canvas provides useful information about its fabric. However, previous methods may fail when very old and deteriorated artworks or simply canvases of small size are studied. We present a new framework to analyze and further characterize the paintings from their radiographs. First, we start from a general analysis of lattices and provide new unifying results about the theoretical spectra of weaves. Then, we use these results to infer the main structure of the fabric, like the type of weave and the thread densities. We propose a practical estimation of these theoretical results from paintings with the averaged power spectral density (PSD), which provides a more robust tool. Furthermore, we found that the PSD provides a fingerprint that characterizes the whole canvas. We search and discuss some distinctive features we may find in that fingerprint. We apply these results to several masterpieces of the 17th and 18th centuries from the Museo Nacional del Prado to show that this approach yields accurate results in thread counting and is very useful for paintings comparison, even in situations where previous methods fail.
Tasks
Published 2017-05-29
URL http://arxiv.org/abs/1705.10060v1
PDF http://arxiv.org/pdf/1705.10060v1.pdf
PWC https://paperswithcode.com/paper/on-the-power-spectral-density-applied-to-the
Repo
Framework
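
The abstract relies on an averaged power spectral density as a more robust estimate than a single spectrum. A one-dimensional sketch of that averaging idea follows: the signal is split into non-overlapping segments and their periodograms are averaged. The paper works on two-dimensional radiographs, so the 1-D setting, the lack of windowing, and the function names are simplifying assumptions.

```python
import cmath
import math


def periodogram(segment):
    """Squared magnitude of the DFT of one segment, divided by its length."""
    n = len(segment)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(segment))) ** 2 / n
        for k in range(n)
    ]


def averaged_psd(signal, seg_len):
    """Average the periodograms of non-overlapping segments of length seg_len.

    Assumes the signal contains at least one full segment.
    """
    segments = [signal[i:i + seg_len]
                for i in range(0, len(signal) - seg_len + 1, seg_len)]
    spectra = [periodogram(s) for s in segments]
    return [sum(col) / len(spectra) for col in zip(*spectra)]
```

For a periodic pattern such as a regular weave, the averaged spectrum would show peaks at the bins matching the repetition frequency, which is the quantity a thread count is read from.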

Encoder-Decoder Shift-Reduce Syntactic Parsing

Title Encoder-Decoder Shift-Reduce Syntactic Parsing
Authors Jiangming Liu, Yue Zhang
Abstract Starting from NMT, encoder-decoder neural networks have been used for many NLP problems. Graph-based models and transition-based models borrowing the encoder components achieve state-of-the-art performance on dependency parsing and constituent parsing, respectively. However, there has not been work empirically studying the encoder-decoder neural networks for transition-based parsing. We apply a simple encoder-decoder to this end, achieving comparable results to the parser of Dyer et al. (2015) on standard dependency parsing, and outperforming the parser of Vinyals et al. (2015) on constituent parsing.
Tasks Dependency Parsing
Published 2017-06-24
URL http://arxiv.org/abs/1706.07905v1
PDF http://arxiv.org/pdf/1706.07905v1.pdf
PWC https://paperswithcode.com/paper/encoder-decoder-shift-reduce-syntactic
Repo
Framework

Narrative Variations in a Virtual Storyteller

Title Narrative Variations in a Virtual Storyteller
Authors Stephanie M. Lukin, Marilyn A. Walker
Abstract Research on storytelling over the last 100 years has distinguished at least two levels of narrative representation (1) story, or fabula; and (2) discourse, or sujhet. We use this distinction to create Fabula Tales, a computational framework for a virtual storyteller that can tell the same story in different ways through the implementation of general narratological variations, such as varying direct vs. indirect speech, character voice (style), point of view, and focalization. A strength of our computational framework is that it is based on very general methods for re-using existing story content, either from fables or from personal narratives collected from blogs. We first explain how a simple annotation tool allows naive annotators to easily create a deep representation of fabula called a story intention graph, and show how we use this representation to generate story tellings automatically. Then we present results of two studies testing our narratological parameters, and showing that different tellings affect the reader’s perception of the story and characters.
Tasks
Published 2017-08-29
URL http://arxiv.org/abs/1708.08585v1
PDF http://arxiv.org/pdf/1708.08585v1.pdf
PWC https://paperswithcode.com/paper/narrative-variations-in-a-virtual-storyteller
Repo
Framework

Deep-ESN: A Multiple Projection-encoding Hierarchical Reservoir Computing Framework

Title Deep-ESN: A Multiple Projection-encoding Hierarchical Reservoir Computing Framework
Authors Qianli Ma, Lifeng Shen, Garrison W. Cottrell
Abstract As an efficient recurrent neural network (RNN) model, reservoir computing (RC) models, such as Echo State Networks, have attracted widespread attention in the last decade. However, while they have had great success with time series data [1], [2], many time series have a multiscale structure, which a single-hidden-layer RC model may have difficulty capturing. In this paper, we propose a novel hierarchical reservoir computing framework we call Deep Echo State Networks (Deep-ESNs). The most distinctive feature of a Deep-ESN is its ability to deal with time series through hierarchical projections. Specifically, when an input time series is projected into the high-dimensional echo-state space of a reservoir, a subsequent encoding layer (e.g., a PCA, autoencoder, or a random projection) can project the echo-state representations into a lower-dimensional space. These low-dimensional representations can then be processed by another ESN. By using projection layers and encoding layers alternately in the hierarchical framework, a Deep-ESN can not only attenuate the effects of the collinearity problem in ESNs, but also fully take advantage of the temporal kernel property of ESNs to explore multiscale dynamics of time series. To fuse the multiscale representations obtained by each reservoir, we add connections from each encoding layer to the last output layer. Theoretical analyses prove that stability of a Deep-ESN is guaranteed by the echo state property (ESP), and the time complexity is equivalent to a conventional ESN. Experimental results on some artificial and real world time series demonstrate that Deep-ESNs can capture multiscale dynamics, and outperform both standard ESNs and previous hierarchical ESN-based models.
Tasks Time Series
Published 2017-11-13
URL http://arxiv.org/abs/1711.05255v1
PDF http://arxiv.org/pdf/1711.05255v1.pdf
PWC https://paperswithcode.com/paper/deep-esn-a-multiple-projection-encoding
Repo
Framework

UV-GAN: Adversarial Facial UV Map Completion for Pose-invariant Face Recognition

Title UV-GAN: Adversarial Facial UV Map Completion for Pose-invariant Face Recognition
Authors Jiankang Deng, Shiyang Cheng, Niannan Xue, Yuxiang Zhou, Stefanos Zafeiriou
Abstract Recently proposed robust 3D face alignment methods establish either dense or sparse correspondence between a 3D face model and a 2D facial image. The use of these methods presents new challenges as well as opportunities for facial texture analysis. In particular, by sampling the image using the fitted model, a facial UV map can be created. Unfortunately, due to self-occlusion, such a UV map is always incomplete. In this paper, we propose a framework for training a Deep Convolutional Neural Network (DCNN) to complete the facial UV map extracted from in-the-wild images. To this end, we first gather complete UV maps by fitting a 3D Morphable Model (3DMM) to various multiview image and video datasets, as well as leveraging a new 3D dataset with over 3,000 identities. Second, we devise a meticulously designed architecture that combines local and global adversarial DCNNs to learn an identity-preserving facial UV completion model. We demonstrate that by attaching the completed UV map to the fitted mesh and generating instances of arbitrary poses, we can increase pose variations for training deep face recognition/verification models, and minimise pose discrepancy during testing, which leads to better performance. Experiments on both controlled and in-the-wild UV datasets prove the effectiveness of our adversarial UV completion model. We achieve state-of-the-art verification accuracy, 94.05%, under the CFP frontal-profile protocol only by combining pose augmentation during training and pose discrepancy reduction during testing. We will release the first in-the-wild UV dataset (which we refer to as WildUV), which comprises complete facial UV maps from 1,892 identities, for research purposes.
Tasks Face Alignment, Face Recognition, Robust Face Recognition, Texture Classification
Published 2017-12-13
URL http://arxiv.org/abs/1712.04695v1
PDF http://arxiv.org/pdf/1712.04695v1.pdf
PWC https://paperswithcode.com/paper/uv-gan-adversarial-facial-uv-map-completion
Repo
Framework

Modal Regression based Atomic Representation for Robust Face Recognition

Title Modal Regression based Atomic Representation for Robust Face Recognition
Authors Yulong Wang, Yuan Yan Tang, Luoqing Li, Hong Chen
Abstract Representation based classification (RC) methods such as sparse RC (SRC) have shown great potential in face recognition in recent years. Most previous RC methods are based on the conventional regression models, such as lasso regression, ridge regression or group lasso regression. These regression models essentially impose a predefined assumption on the distribution of the noise variable in the query sample, such as the Gaussian or Laplacian distribution. However, the complicated noises in practice may violate the assumptions and impede the performance of these RC methods. In this paper, we propose a modal regression based atomic representation and classification (MRARC) framework to alleviate such limitation. Unlike previous RC methods, the MRARC framework does not require the noise variable to follow any specific predefined distributions. This gives rise to the capability of MRARC in handling various complex noises in reality. Using MRARC as a general platform, we also develop four novel RC methods for unimodal and multimodal face recognition, respectively. In addition, we devise a general optimization algorithm for the unified MRARC framework based on the alternating direction method of multipliers (ADMM) and half-quadratic theory. The experiments on real-world data validate the efficacy of MRARC for robust face recognition.
Tasks Face Recognition, Robust Face Recognition
Published 2017-11-05
URL http://arxiv.org/abs/1711.05861v1
PDF http://arxiv.org/pdf/1711.05861v1.pdf
PWC https://paperswithcode.com/paper/modal-regression-based-atomic-representation
Repo
Framework