July 27, 2019

2808 words 14 mins read

Paper Group ANR 610

Mathematical foundations of matrix syntax. A GPU-Outperforming FPGA Accelerator Architecture for Binary Convolutional Neural Networks. Stochastic Nonconvex Optimization with Large Minibatches. Efficient Two-Dimensional Sparse Coding Using Tensor-Linear Combination. Avoiding Synchronization in First-Order Methods for Sparse Convex Optimization. De-i …

Mathematical foundations of matrix syntax

Title Mathematical foundations of matrix syntax
Authors Roman Orus, Roger Martin, Juan Uriagereka
Abstract Matrix syntax is a formal model of syntactic relations in language. The purpose of this paper is to explain its mathematical foundations, for an audience with some formal background. We make an axiomatic presentation, motivating each axiom on linguistic and practical grounds. The resulting mathematical structure resembles some aspects of quantum mechanics. Matrix syntax allows us to describe a number of language phenomena that are otherwise very difficult to explain, such as linguistic chains, and is arguably a more economical theory of language than most of the theories proposed in the context of the minimalist program in linguistics. In particular, sentences are naturally modelled as vectors in a Hilbert space with a tensor product structure, built from 2x2 matrices belonging to some specific group.
Tasks
Published 2017-10-01
URL http://arxiv.org/abs/1710.00372v2
PDF http://arxiv.org/pdf/1710.00372v2.pdf
PWC https://paperswithcode.com/paper/mathematical-foundations-of-matrix-syntax
Repo
Framework
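
The construction described above, sentences as vectors in a tensor-product Hilbert space built from 2x2 matrices, can be illustrated with a minimal numpy sketch. The matrices here are generic placeholders, not the specific group the paper axiomatizes:

```python
import numpy as np

# Illustrative 2x2 matrices standing in for the paper's group elements.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)  # a Pauli-like matrix

def vectorize(m):
    """Flatten a 2x2 matrix into a vector in C^4, an element of the
    Hilbert space from which sentence vectors are built."""
    return m.reshape(-1)

# A "sentence" as a tensor (Kronecker) product of word vectors:
# the composite lives in C^4 (x) C^4 = C^16.
word1, word2 = vectorize(I2), vectorize(X)
sentence = np.kron(word1, word2)
print(sentence.shape)  # (16,)
```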

A GPU-Outperforming FPGA Accelerator Architecture for Binary Convolutional Neural Networks

Title A GPU-Outperforming FPGA Accelerator Architecture for Binary Convolutional Neural Networks
Authors Yixing Li, Zichuan Liu, Kai Xu, Hao Yu, Fengbo Ren
Abstract FPGA-based hardware accelerators for convolutional neural networks (CNNs) have attracted great attention due to their higher energy efficiency than GPUs. However, it is challenging for FPGA-based solutions to achieve a higher throughput than GPU counterparts. In this paper, we demonstrate that FPGA acceleration can be a superior solution in terms of both throughput and energy efficiency when a CNN is trained with binary constraints on weights and activations. Specifically, we propose an optimized FPGA accelerator architecture tailored for bitwise convolution and normalization that features massive spatial parallelism with deep pipeline stages. A key advantage of the FPGA accelerator is that its performance is insensitive to data batch size, while the performance of GPU acceleration varies greatly with the batch size of the data. Experimental results show that the proposed accelerator architecture for binary CNNs running on a Virtex-7 FPGA is 8.3x faster and 75x more energy-efficient than a Titan X GPU for processing online individual requests in small batch sizes. For processing static data in large batch sizes, the proposed solution is on a par with a Titan X GPU in terms of throughput while delivering 9.5x higher energy efficiency.
Tasks
Published 2017-02-20
URL http://arxiv.org/abs/1702.06392v2
PDF http://arxiv.org/pdf/1702.06392v2.pdf
PWC https://paperswithcode.com/paper/a-gpu-outperforming-fpga-accelerator
Repo
Framework
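
The bitwise convolution that makes binary CNNs hardware-friendly reduces every dot product to an XNOR followed by a popcount. A minimal sketch of that identity (the arithmetic the accelerator exploits, not the accelerator architecture itself):

```python
import numpy as np

def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two {-1,+1}^n vectors packed as n-bit integers
    (bit 1 encodes +1, bit 0 encodes -1): dot = n - 2 * popcount(a XOR b),
    since XOR counts the disagreeing positions."""
    return n - 2 * bin(a_bits ^ b_bits).count("1")

# Sanity check against the ordinary dot product.
rng = np.random.default_rng(0)
n = 16
a = rng.choice([-1, 1], n)
b = rng.choice([-1, 1], n)
pack = lambda v: int("".join("1" if x > 0 else "0" for x in v), 2)
assert binary_dot(pack(a), pack(b), n) == int(a @ b)
```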

Stochastic Nonconvex Optimization with Large Minibatches

Title Stochastic Nonconvex Optimization with Large Minibatches
Authors Weiran Wang, Nathan Srebro
Abstract We study stochastic optimization of nonconvex loss functions, which are typical objectives for training neural networks. We propose stochastic approximation algorithms which optimize a series of regularized, nonlinearized losses on large minibatches of samples, using only first-order gradient information. Our algorithms provably converge to an approximate critical point of the expected objective with faster rates than minibatch stochastic gradient descent, and facilitate better parallelization by allowing larger minibatches.
Tasks Stochastic Optimization
Published 2017-09-25
URL http://arxiv.org/abs/1709.08728v4
PDF http://arxiv.org/pdf/1709.08728v4.pdf
PWC https://paperswithcode.com/paper/stochastic-nonconvex-optimization-with-large
Repo
Framework
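
One way to read "optimizing a series of regularized losses on large minibatches" is a proximal-point-style outer loop: each round approximately minimizes the minibatch loss plus a quadratic proximity term. A hedged sketch under that reading; the step sizes, inner solver, and the `grad`/`draw` helpers are illustrative, not the paper's exact algorithm:

```python
import numpy as np

def minibatch_prox(grad_fn, draw_batch, w0, rounds=100,
                   lam=1.0, inner_steps=10, lr=0.1):
    """Each outer round approximately minimizes
        F_t(w) = loss_on_batch(w) + (lam/2) * ||w - w_t||^2
    with a few first-order steps on one large minibatch."""
    w = w0.copy()
    for _ in range(rounds):
        batch = draw_batch()            # one large minibatch
        anchor = w.copy()               # proximal center
        for _ in range(inner_steps):
            w = w - lr * (grad_fn(w, batch) + lam * (w - anchor))
    return w

# Toy usage: least squares on synthetic data.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((500, 20)), rng.standard_normal(500)

def grad(w, batch):
    Xb, yb = batch
    return Xb.T @ (Xb @ w - yb) / len(yb)

def draw():
    i = rng.integers(0, 500, size=256)
    return A[i], b[i]

w = minibatch_prox(grad, draw, np.zeros(20))
```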

Efficient Two-Dimensional Sparse Coding Using Tensor-Linear Combination

Title Efficient Two-Dimensional Sparse Coding Using Tensor-Linear Combination
Authors Fei Jiang, Xiao-Yang Liu, Hongtao Lu, Ruimin Shen
Abstract Sparse coding (SC) is an automatic feature extraction and selection technique widely used in unsupervised learning. However, conventional SC vectorizes the input images, which breaks apart the local proximity of pixels and destroys the elementary object structures of images. In this paper, we propose a novel two-dimensional sparse coding (2DSC) scheme that represents the input images as tensor-linear combinations under a novel algebraic framework. Because it is built on the circular convolution operator, 2DSC treats shifted versions of an atom as the same atom and therefore learns much more concise dictionaries than conventional SC. We apply 2DSC to natural images and demonstrate that it returns meaningful dictionaries for large patches. Moreover, for multi-spectral image denoising, the proposed 2DSC reduces computational costs while delivering performance competitive with state-of-the-art algorithms.
Tasks Denoising
Published 2017-03-28
URL http://arxiv.org/abs/1703.09690v1
PDF http://arxiv.org/pdf/1703.09690v1.pdf
PWC https://paperswithcode.com/paper/efficient-two-dimensional-sparse-coding-using
Repo
Framework
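
The tensor-linear combination underlying 2DSC corresponds, in the usual tensor-algebra framework, to a t-product: circular convolution along the third mode, computed as facewise matrix products in the Fourier domain. A minimal sketch of that operator (the dictionary learning itself is not shown):

```python
import numpy as np

def t_product(A, B):
    """t-product of tensors A (m x k x n3) and B (k x p x n3):
    circular convolution along the third mode, done as facewise
    matrix products in the Fourier domain."""
    Af = np.fft.fft(A, axis=2)
    Bf = np.fft.fft(B, axis=2)
    Cf = np.einsum('ikn,kjn->ijn', Af, Bf)
    return np.real(np.fft.ifft(Cf, axis=2))

# A patch tensor as a tensor-linear combination of dictionary atoms.
D = np.random.randn(8, 5, 8)   # dictionary of 5 atoms
C = np.random.randn(5, 1, 8)   # coefficient tubes
patch = t_product(D, C)        # 8 x 1 x 8
```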

Avoiding Synchronization in First-Order Methods for Sparse Convex Optimization

Title Avoiding Synchronization in First-Order Methods for Sparse Convex Optimization
Authors Aditya Devarakonda, Kimon Fountoulakis, James Demmel, Michael W. Mahoney
Abstract Parallel computing has played an important role in speeding up convex optimization methods for big data analytics and large-scale machine learning (ML). However, the scalability of these optimization methods is inhibited by the cost of communicating and synchronizing processors in a parallel setting. Iterative ML methods are particularly sensitive to communication cost since they often require communication every iteration. In this work, we extend well-known techniques from Communication-Avoiding Krylov subspace methods to first-order, block coordinate descent methods for Support Vector Machines and Proximal Least-Squares problems. Our Synchronization-Avoiding (SA) variants reduce the latency cost by a tunable factor of $s$ at the expense of a factor of $s$ increase in flops and bandwidth costs. We show that the SA-variants are numerically stable and can attain large speedups of up to $5.1\times$ on a Cray XC30 supercomputer.
Tasks
Published 2017-12-17
URL http://arxiv.org/abs/1712.06047v1
PDF http://arxiv.org/pdf/1712.06047v1.pdf
PWC https://paperswithcode.com/paper/avoiding-synchronization-in-first-order
Repo
Framework
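
The synchronization-avoiding idea can be sketched at a high level: perform one reduction that serves the next $s$ block updates instead of one reduction per update, trading latency for extra local flops. The sketch below is schematic; `allreduce` is a single-process stand-in for a real collective, and the update rule is generic block coordinate descent for least squares, not the paper's exact SA variants:

```python
import numpy as np

def allreduce(x):
    """Placeholder for a communication collective (e.g. an MPI allreduce);
    a no-op in this single-process sketch."""
    return x

def sa_block_cd(X, y, w, blocks, s=4, lr=0.1, sweeps=10):
    """Schematic synchronization-avoiding block coordinate descent for
    least squares: one reduction serves the next s block updates.
    (The paper recovers exact iterates via extra Gram-matrix flops;
    this sketch simply maintains the residual locally.)"""
    for _ in range(sweeps):
        for i in range(0, len(blocks), s):
            # Single communication for the next s updates.
            r = allreduce(X @ w - y)
            for bk in blocks[i:i + s]:
                g = X[:, bk].T @ r           # local work, no new reduction
                w[bk] -= lr * g
                r = r - lr * (X[:, bk] @ g)  # locally maintained residual
    return w
```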

De-identification of medical records using conditional random fields and long short-term memory networks

Title De-identification of medical records using conditional random fields and long short-term memory networks
Authors Zhipeng Jiang, Chao Zhao, Bin He, Yi Guan, Jingchi Jiang
Abstract The CEGS N-GRID 2016 Shared Task 1 in Clinical Natural Language Processing focuses on the de-identification of psychiatric evaluation records. This paper describes two participating systems of our team, based on conditional random fields (CRFs) and long short-term memory networks (LSTMs). A pre-processing module was introduced for sentence detection and tokenization before de-identification. For CRFs, manually extracted rich features were utilized to train the model. For LSTMs, a character-level bi-directional LSTM network was applied to represent tokens and classify tags for each token, following which a decoding layer was stacked to decode the most probable protected health information (PHI) terms. The LSTM-based system attained an i2b2 strict micro-F_1 measure of 89.86%, which was higher than that of the CRF-based system.
Tasks Tokenization
Published 2017-09-20
URL http://arxiv.org/abs/1709.06901v2
PDF http://arxiv.org/pdf/1709.06901v2.pdf
PWC https://paperswithcode.com/paper/de-identification-of-medical-records-using
Repo
Framework
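
A minimal PyTorch sketch of a character-level bidirectional LSTM tagger in the spirit of the system above. Dimensions are illustrative, it tags per character rather than per token, and the simple linear layer stands in for the paper's dedicated decoding layer:

```python
import torch
import torch.nn as nn

class CharBiLSTMTagger(nn.Module):
    """Character-level Bi-LSTM emitting a PHI tag score per position."""
    def __init__(self, n_chars, n_tags, char_dim=32, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(n_chars, char_dim)
        self.bilstm = nn.LSTM(char_dim, hidden, batch_first=True,
                              bidirectional=True)
        self.decode = nn.Linear(2 * hidden, n_tags)  # simple decoder

    def forward(self, char_ids):            # (batch, seq_len)
        h, _ = self.bilstm(self.embed(char_ids))
        return self.decode(h)                # (batch, seq_len, n_tags)

model = CharBiLSTMTagger(n_chars=100, n_tags=9)
scores = model(torch.randint(0, 100, (2, 50)))  # toy batch
```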

Theory of the superposition principle for randomized connectionist representations in neural networks

Title Theory of the superposition principle for randomized connectionist representations in neural networks
Authors E. Paxon Frady, Denis Kleyko, Friedrich T. Sommer
Abstract To understand cognitive reasoning in the brain, it has been proposed that symbols and compositions of symbols are represented by activity patterns (vectors) in a large population of neurons. Formal models implementing this idea [Plate 2003], [Kanerva 2009], [Gayler 2003], [Eliasmith 2012] include a reversible superposition operation for representing with a single vector an entire set of symbols or an ordered sequence of symbols. If the representation space is high-dimensional, large sets of symbols can be superposed and individually retrieved. However, crosstalk noise limits the accuracy of retrieval and information capacity. To understand information processing in the brain and to design artificial neural systems for cognitive reasoning, a theory of this superposition operation is essential. Here, such a theory is presented. The superposition operations in different existing models are mapped to linear neural networks with unitary recurrent matrices, in which retrieval accuracy can be analyzed by a single equation. We show that networks representing information in superposition can achieve a channel capacity of about half a bit per neuron, a significant fraction of the total available entropy. Going beyond existing models, superposition operations with recency effects are proposed that avoid catastrophic forgetting when representing the history of infinite data streams. These novel models correspond to recurrent networks with non-unitary matrices or with nonlinear neurons, and can be analyzed and optimized with an extension of our theory.
Tasks
Published 2017-07-05
URL http://arxiv.org/abs/1707.01429v1
PDF http://arxiv.org/pdf/1707.01429v1.pdf
PWC https://paperswithcode.com/paper/theory-of-the-superposition-principle-for
Repo
Framework
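
The retrieval-under-crosstalk setting the theory analyzes is easy to reproduce empirically: superpose K random bipolar vectors and check how inner products separate stored items from distractors. A quick numpy sketch with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 1000, 50                     # dimension, number of stored items
codebook = rng.choice([-1, 1], (K + 10, N))
s = codebook[:K].sum(axis=0)        # superposition of the first K items

# Inner product with the trace: signal N for stored items,
# zero-mean crosstalk noise of std ~ sqrt(K * N) for distractors.
scores = codebook @ s
print(scores[:K].mean())            # ~ N = 1000
print(scores[K:].std())             # ~ sqrt(K * N) ~ 224
```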

Ultraslow diffusion in language: Dynamics of appearance of already popular adjectives on Japanese blogs

Title Ultraslow diffusion in language: Dynamics of appearance of already popular adjectives on Japanese blogs
Authors Hayafumi Watanabe
Abstract What dynamics govern a time series representing the appearance of words in social media data? In this paper, we investigate the elementary dynamics that remain once word-dependent special effects, such as breaking news, rising (or falling) concern, or seasonality, are set aside. To elucidate this problem, we investigated approximately three billion Japanese blog articles over a period of six years and analysed corresponding solvable mathematical models. From the analysis, we found that word appearance can be explained by a random diffusion model based on a power-law forgetting process, a type of long-memory point process related to ARFIMA(0, 0.5, 0). In particular, we confirmed that ultraslow diffusion (where the mean squared displacement grows logarithmically), which the model predicts approximately, reproduces the actual data. In addition, we show that the model reproduces other statistical properties of the time series: (i) fluctuation scaling, (ii) the spectral density, and (iii) the shapes of the probability density functions.
Tasks Time Series
Published 2017-07-21
URL http://arxiv.org/abs/1707.07066v3
PDF http://arxiv.org/pdf/1707.07066v3.pdf
PWC https://paperswithcode.com/paper/ultraslow-diffusion-in-language-dynamics-of
Repo
Framework
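
The paper's headline prediction, logarithmically growing mean squared displacement for an ARFIMA(0, 0.5, 0)-type process, can be checked numerically: with fractional-integration weights psi_k decaying like k^(-1/2), the variance at time t is the partial sum of psi_k^2, which grows like log t. A small simulation sketch:

```python
import numpy as np

d, T = 0.5, 10000
# Fractional-integration MA weights: psi_0 = 1, psi_k = psi_{k-1}*(k-1+d)/k.
psi = np.ones(T)
for k in range(1, T):
    psi[k] = psi[k - 1] * (k - 1 + d) / k

# Var[x_t] = sum_{k<=t} psi_k^2 ~ log t for d = 1/2 (ultraslow growth).
var = np.cumsum(psi ** 2)
t = np.arange(1, T + 1)
print(np.corrcoef(np.log(t[100:]), var[100:])[0, 1])  # ~ 1.0
```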

Particle Filtering for PLCA model with Application to Music Transcription

Title Particle Filtering for PLCA model with Application to Music Transcription
Authors D. Cazau, G. Revillon, W. Yuancheng, O. Adam
Abstract Automatic Music Transcription (AMT) consists of automatically estimating the notes in an audio recording through three attributes: onset time, duration, and pitch. Probabilistic Latent Component Analysis (PLCA) has become very popular for this task. PLCA is a spectrogram factorization method that models a magnitude spectrogram as a linear combination of spectral vectors from a dictionary. Such methods use the Expectation-Maximization (EM) algorithm to estimate the parameters of the acoustic model. This algorithm has well-known inherent shortcomings (local convergence, initialization dependency) that limit EM-based systems in their application to AMT, particularly with regard to the mathematical form and number of priors. To overcome these limits, we propose in this paper a different estimation framework based on Particle Filtering (PF), which samples the posterior distribution over larger parameter ranges. This framework proves to be more robust in parameter estimation and more flexible and unifying in the integration of prior knowledge into the system. Note-level transcription accuracies of 61.8% and 59.5% were achieved on evaluation sound datasets of two different instrument repertoires, the classical piano (from the MAPS dataset) and the marovany zither, and direct comparisons to previous PLCA-based approaches are provided. Steps for further development are also outlined.
Tasks
Published 2017-03-28
URL http://arxiv.org/abs/1703.09772v1
PDF http://arxiv.org/pdf/1703.09772v1.pdf
PWC https://paperswithcode.com/paper/particle-filtering-for-plca-model-with
Repo
Framework
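
For reference, the EM updates that the particle filter replaces: PLCA factors a normalized magnitude spectrogram as P(f, t) = sum_z P(z) P(f|z) P(t|z). A minimal numpy sketch of one standard EM iteration (not the paper's PF estimator):

```python
import numpy as np

def plca_em_step(V, Pz, Pf_z, Pt_z):
    """One EM update for PLCA: V (F x T) nonnegative spectrogram,
    Pz (Z,), Pf_z (F x Z), Pt_z (T x Z)."""
    # E-step: posterior P(z | f, t), shape (F, T, Z).
    joint = Pz[None, None, :] * Pf_z[:, None, :] * Pt_z[None, :, :]
    post = joint / joint.sum(axis=2, keepdims=True)
    # M-step: reweight by observed energy and renormalize.
    w = V[:, :, None] * post
    Pf_z = w.sum(axis=1); Pf_z /= Pf_z.sum(axis=0, keepdims=True)
    Pt_z = w.sum(axis=0); Pt_z /= Pt_z.sum(axis=0, keepdims=True)
    Pz = w.sum(axis=(0, 1)); Pz /= Pz.sum()
    return Pz, Pf_z, Pt_z
```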

Improved Abusive Comment Moderation with User Embeddings

Title Improved Abusive Comment Moderation with User Embeddings
Authors John Pavlopoulos, Prodromos Malakasiotis, Juli Bakagianni, Ion Androutsopoulos
Abstract Experimenting with a dataset of approximately 1.6M user comments from a Greek sports news portal, we explore how a state-of-the-art RNN-based moderation method can be improved by adding user embeddings, user type embeddings, user biases, or user type biases. We observe improvements in all cases, with user embeddings leading to the biggest performance gains.
Tasks
Published 2017-08-11
URL http://arxiv.org/abs/1708.03699v1
PDF http://arxiv.org/pdf/1708.03699v1.pdf
PWC https://paperswithcode.com/paper/improved-abusive-comment-moderation-with-user
Repo
Framework
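
The modification evaluated above is architecturally small: learn an embedding per user and let the classifier condition on it alongside the RNN's comment representation. A hedged PyTorch sketch; the plain GRU encoder and all dimensions are illustrative stand-ins for the paper's moderation model:

```python
import torch
import torch.nn as nn

class ModerationWithUsers(nn.Module):
    def __init__(self, vocab, n_users, word_dim=100, user_dim=32, hidden=128):
        super().__init__()
        self.words = nn.Embedding(vocab, word_dim)
        self.users = nn.Embedding(n_users, user_dim)   # learned per user
        self.rnn = nn.GRU(word_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden + user_dim, 1)     # accept/reject score

    def forward(self, tokens, user_ids):
        _, h = self.rnn(self.words(tokens))            # h: (1, batch, hidden)
        feats = torch.cat([h[-1], self.users(user_ids)], dim=1)
        return self.out(feats).squeeze(1)
```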

Imagination improves Multimodal Translation

Title Imagination improves Multimodal Translation
Authors Desmond Elliott, Ákos Kádár
Abstract We decompose multimodal translation into two sub-tasks: learning to translate and learning visually grounded representations. In a multitask learning framework, translations are learned in an attention-based encoder-decoder, and grounded representations are learned through image representation prediction. Our approach improves translation performance compared to the state of the art on the Multi30K dataset. Furthermore, it is equally effective if we train the image prediction task on the external MS COCO dataset, and we find improvements if we train the translation model on the external News Commentary parallel text.
Tasks
Published 2017-05-11
URL http://arxiv.org/abs/1705.04350v2
PDF http://arxiv.org/pdf/1705.04350v2.pdf
PWC https://paperswithcode.com/paper/imagination-improves-multimodal-translation
Repo
Framework
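
A hedged PyTorch sketch of the multitask decomposition: a shared source encoder feeds both a translation decoder and a head that predicts the image feature vector. Attention and training details from the paper are omitted, and all dimensions are illustrative:

```python
import torch
import torch.nn as nn

class ImaginetSketch(nn.Module):
    """Multitask sketch: a shared encoder for (i) translation and
    (ii) visually grounded representation learning."""
    def __init__(self, src_vocab, tgt_vocab, dim=256, img_dim=2048):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.gen = nn.Linear(dim, tgt_vocab)       # translation logits
        self.imagine = nn.Linear(dim, img_dim)     # image representation

    def forward(self, src, tgt_in):
        enc, h = self.encoder(self.src_emb(src))
        dec, _ = self.decoder(self.tgt_emb(tgt_in), h)
        return self.gen(dec), self.imagine(h[-1])  # two task outputs
```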

Robust Stereo Feature Descriptor for Visual Odometry

Title Robust Stereo Feature Descriptor for Visual Odometry
Authors Ehsan Shojaedini, Reza Safabakhsh
Abstract In this paper, we propose a simple way to utilize stereo camera data to improve feature descriptors. Computer vision algorithms that use a stereo camera already require some computation of 3D information; we leverage this pre-calculated information to improve feature descriptor algorithms. The 3D feature information is used to estimate the scale of each feature, making each descriptor more robust to scale change without significant extra computation. In addition, we use the stereo image pair to construct the descriptor vector. The Scale-Invariant Feature Transform (SIFT) and Fast Retina Keypoint (FREAK) descriptors are used to evaluate the proposed method. In a feature tracking test, the scale normalization technique improves the standard SIFT by 8.75% and the standard FREAK by 28.65%. Using the proposed stereo feature descriptor, a visual odometry algorithm is designed and tested on the KITTI dataset. The stereo FREAK descriptor raises the number of inlier matches by 19% and consequently improves the accuracy of visual odometry by 23%.
Tasks Visual Odometry
Published 2017-08-26
URL http://arxiv.org/abs/1708.07933v2
PDF http://arxiv.org/pdf/1708.07933v2.pdf
PWC https://paperswithcode.com/paper/robust-stereo-feature-descriptor-for-visual
Repo
Framework
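
The scale estimate rests on basic stereo geometry: depth Z = fB/d, and apparent size scales as 1/Z, so disparity itself gives a per-keypoint scale. A minimal sketch; the focal length, baseline, and reference depth are illustrative values, not the paper's settings:

```python
def keypoint_scale(disparity_px, f_px=700.0, baseline_m=0.54,
                   ref_depth_m=10.0):
    """Pick a descriptor patch scale from stereo disparity.
    depth = f * B / d; a feature appears larger when closer, so the
    patch scale is ref_depth / depth = d * ref_depth / (f * B)."""
    depth = f_px * baseline_m / disparity_px
    return ref_depth_m / depth

# KITTI-like setup: nearer points get larger patch scales.
print(keypoint_scale(40.0))   # ~1.06: point ~9.45 m away
print(keypoint_scale(10.0))   # ~0.26: point ~37.8 m away
```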

A Convergent Algorithm for Bi-orthogonal Nonnegative Matrix Tri-Factorization

Title A Convergent Algorithm for Bi-orthogonal Nonnegative Matrix Tri-Factorization
Authors Andri Mirzal
Abstract A convergent algorithm for nonnegative matrix factorization with orthogonality constraints imposed on both factors is proposed in this paper. This factorization concept was first introduced by Ding et al. with the intent of further improving the clustering capability of NMF. However, as the original algorithm was developed based on multiplicative update rules, its convergence cannot be guaranteed. In this paper, we utilize the technique presented in our previous work to develop the algorithm and prove that it converges to a stationary point inside the solution space.
Tasks
Published 2017-10-29
URL http://arxiv.org/abs/1710.11478v2
PDF http://arxiv.org/pdf/1710.11478v2.pdf
PWC https://paperswithcode.com/paper/a-convergent-algorithm-for-bi-orthogonal
Repo
Framework
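
For context, the multiplicative updates of the original Ding et al. scheme for X ≈ FSG^T with orthogonality on F and G, written from memory and offered as a reference sketch only; the paper's contribution is precisely a modified algorithm whose convergence can be proven:

```python
import numpy as np

def onmtf_step(X, F, S, G, eps=1e-9):
    """One multiplicative update for bi-orthogonal NMTF, X ~ F S G^T
    (Ding et al.-style rules; convergence is NOT guaranteed, which is
    the gap the paper addresses)."""
    F *= np.sqrt((X @ G @ S.T) / (F @ F.T @ X @ G @ S.T + eps))
    G *= np.sqrt((X.T @ F @ S) / (G @ G.T @ X.T @ F @ S + eps))
    S *= np.sqrt((F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps))
    return F, S, G
```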

Matrix-normal models for fMRI analysis

Title Matrix-normal models for fMRI analysis
Authors Michael Shvartsman, Narayanan Sundaram, Mikio C. Aoi, Adam Charles, Theodore C. Wilke, Jonathan D. Cohen
Abstract Multivariate analysis of fMRI data has benefited substantially from advances in machine learning. Most recently, a range of probabilistic latent variable models applied to fMRI data have been successful in a variety of tasks, including identifying similarity patterns in neural data (Representational Similarity Analysis and its empirical Bayes variant, RSA and BRSA; Intersubject Functional Connectivity, ISFC), combining multi-subject datasets (Shared Response Mapping; SRM), and mapping between brain and behavior (Joint Modeling). Although these methods share some underpinnings, they have been developed as distinct methods, with distinct algorithms and software tools. We show how the matrix-variate normal (MN) formalism can unify some of these methods into a single framework. In doing so, we gain the ability to reuse noise modeling assumptions, algorithms, and code across models. Our primary theoretical contribution shows how some of these methods can be written as instantiations of the same model, allowing us to generalize them to flexibly model structured noise covariances. Our formalism permits novel model variants and improved estimation strategies: in contrast to SRM, the number of parameters for MN-SRM does not scale with the number of voxels or subjects; in contrast to BRSA, the number of parameters for MN-RSA scales additively rather than multiplicatively in the number of voxels. We empirically demonstrate advantages of two new methods derived in the formalism: for MN-RSA, we show up to 10x improvement in runtime, up to 6x improvement in RMSE, and more conservative behavior under the null. For MN-SRM, our method yields a modest improvement in out-of-sample reconstruction while relaxing an orthonormality constraint of SRM. We also provide a software prototyping tool for MN models that can flexibly reuse noise covariance assumptions and algorithms across models.
Tasks Latent Variable Models
Published 2017-11-08
URL http://arxiv.org/abs/1711.03058v2
PDF http://arxiv.org/pdf/1711.03058v2.pdf
PWC https://paperswithcode.com/paper/matrix-normal-models-for-fmri-analysis
Repo
Framework
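
The matrix-variate normal at the heart of this framework is cheap to work with because its covariance factorizes as a Kronecker product: X ~ MN(M, U, V) is equivalent to vec(X) ~ N(vec(M), V ⊗ U). A minimal sampling sketch:

```python
import numpy as np

def sample_matrix_normal(M, U, V, rng):
    """Draw X ~ MN(M, U, V): row covariance U (n x n), column
    covariance V (p x p), via X = M + A Z B^T with U = A A^T, V = B B^T."""
    A = np.linalg.cholesky(U)
    B = np.linalg.cholesky(V)
    Z = rng.standard_normal(M.shape)
    return M + A @ Z @ B.T

rng = np.random.default_rng(0)
n, p = 4, 3
U = np.eye(n) + 0.5                    # n x n row covariance
V = np.diag(np.arange(1.0, p + 1))     # p x p column covariance
X = sample_matrix_normal(np.zeros((n, p)), U, V, rng)
```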

Variational Bi-LSTMs

Title Variational Bi-LSTMs
Authors Samira Shabanian, Devansh Arpit, Adam Trischler, Yoshua Bengio
Abstract Recurrent neural networks like long short-term memory (LSTM) are important architectures for sequential prediction tasks. LSTMs (and RNNs in general) model sequences along the forward time direction. Bidirectional LSTMs (Bi-LSTMs), on the other hand, model sequences along both forward and backward directions and are generally known to perform better at such tasks because they capture a richer representation of the data. In the training of Bi-LSTMs, the forward and backward paths are learned independently. We propose a variant of the Bi-LSTM architecture, which we call Variational Bi-LSTM, that creates a channel between the two paths (used during training but optionally omitted at inference), thus optimizing the two paths jointly. We arrive at this joint objective for our model by minimizing a variational lower bound of the joint likelihood of the data sequence. Our model acts as a regularizer and encourages the two networks to inform each other in making their respective predictions using distinct information. We perform ablation studies to better understand the different components of our model and evaluate the method on various benchmarks, showing state-of-the-art performance.
Tasks
Published 2017-11-15
URL http://arxiv.org/abs/1711.05717v1
PDF http://arxiv.org/pdf/1711.05717v1.pdf
PWC https://paperswithcode.com/paper/variational-bi-lstms
Repo
Framework