July 27, 2019

Paper Group ANR 610

Mathematical foundations of matrix syntax. A GPU-Outperforming FPGA Accelerator Architecture for Binary Convolutional Neural Networks. Stochastic Nonconvex Optimization with Large Minibatches. Efficient Two-Dimensional Sparse Coding Using Tensor-Linear Combination. Avoiding Synchronization in First-Order Methods for Sparse Convex Optimization. De-i …

Mathematical foundations of matrix syntax


Title	Mathematical foundations of matrix syntax
Authors	Roman Orus, Roger Martin, Juan Uriagereka
Abstract	Matrix syntax is a formal model of syntactic relations in language. The purpose of this paper is to explain its mathematical foundations, for an audience with some formal background. We make an axiomatic presentation, motivating each axiom on linguistic and practical grounds. The resulting mathematical structure resembles some aspects of quantum mechanics. Matrix syntax allows us to describe a number of language phenomena that are otherwise very difficult to explain, such as linguistic chains, and is arguably a more economical theory of language than most of the theories proposed in the context of the minimalist program in linguistics. In particular, sentences are naturally modelled as vectors in a Hilbert space with a tensor product structure, built from 2x2 matrices belonging to some specific group.
Tasks
Published	2017-10-01
URL	http://arxiv.org/abs/1710.00372v2
PDF	http://arxiv.org/pdf/1710.00372v2.pdf
PWC	https://paperswithcode.com/paper/mathematical-foundations-of-matrix-syntax
Repo
Framework

A GPU-Outperforming FPGA Accelerator Architecture for Binary Convolutional Neural Networks


Title	A GPU-Outperforming FPGA Accelerator Architecture for Binary Convolutional Neural Networks
Authors	Yixing Li, Zichuan Liu, Kai Xu, Hao Yu, Fengbo Ren
Abstract	FPGA-based hardware accelerators for convolutional neural networks (CNNs) have attracted great attention because they are more energy-efficient than GPUs. However, it is challenging for FPGA-based solutions to achieve a higher throughput than GPU counterparts. In this paper, we demonstrate that FPGA acceleration can be a superior solution in terms of both throughput and energy efficiency when a CNN is trained with binary constraints on weights and activations. Specifically, we propose an optimized FPGA accelerator architecture tailored for bitwise convolution and normalization that features massive spatial parallelism with deep pipeline stages. A key advantage of the FPGA accelerator is that its performance is insensitive to data batch size, while the performance of GPU acceleration varies largely depending on the batch size of the data. Experimental results show that the proposed accelerator architecture for binary CNNs running on a Virtex-7 FPGA is 8.3x faster and 75x more energy-efficient than a Titan X GPU for processing online individual requests in small batch sizes. For processing static data in large batch sizes, the proposed solution is on a par with a Titan X GPU in terms of throughput while delivering 9.5x higher energy efficiency.
Tasks
Published	2017-02-20
URL	http://arxiv.org/abs/1702.06392v2
PDF	http://arxiv.org/pdf/1702.06392v2.pdf
PWC	https://paperswithcode.com/paper/a-gpu-outperforming-fpga-accelerator
Repo
Framework
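
The bitwise convolution mentioned in the abstract above rests on a well-known identity for binary networks: when weights and activations are constrained to +1/-1, a dot product reduces to an XNOR followed by a population count. The Python sketch below illustrates that identity only; it is not the paper's FPGA architecture, and the helper names `pack_bits` and `binary_dot` are hypothetical.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int; bit i is set when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n {-1, +1} vectors given their packed bits."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # XNOR: 1 wherever the two signs agree
    matches = bin(agree).count("1")     # popcount of agreeing positions
    # Each agreement contributes +1 and each disagreement -1.
    return 2 * matches - n


a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
result = binary_dot(pack_bits(a), pack_bits(b), len(a))
reference = sum(x * y for x, y in zip(a, b))
```

Here `result` and `reference` agree, which is what lets hardware replace multiply-accumulate units with XNOR gates and popcount trees.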

Stochastic Nonconvex Optimization with Large Minibatches


Title	Stochastic Nonconvex Optimization with Large Minibatches
Authors	Weiran Wang, Nathan Srebro
Abstract	We study stochastic optimization of nonconvex loss functions, which are typical objectives for training neural networks. We propose stochastic approximation algorithms which optimize a series of regularized, nonlinearized losses on large minibatches of samples, using only first-order gradient information. Our algorithms provably converge to an approximate critical point of the expected objective with faster rates than minibatch stochastic gradient descent, and facilitate better parallelization by allowing larger minibatches.
Tasks	Stochastic Optimization
Published	2017-09-25
URL	http://arxiv.org/abs/1709.08728v4
PDF	http://arxiv.org/pdf/1709.08728v4.pdf
PWC	https://paperswithcode.com/paper/stochastic-nonconvex-optimization-with-large
Repo
Framework

Efficient Two-Dimensional Sparse Coding Using Tensor-Linear Combination


Title	Efficient Two-Dimensional Sparse Coding Using Tensor-Linear Combination
Authors	Fei Jiang, Xiao-Yang Liu, Hongtao Lu, Ruimin Shen
Abstract	Sparse coding (SC) is an automatic feature extraction and selection technique that is widely used in unsupervised learning. However, conventional SC vectorizes the input images, which breaks apart the local proximity of pixels and destroys the elementary object structures of images. In this paper, we propose a novel two-dimensional sparse coding (2DSC) scheme that represents the input images as tensor-linear combinations under a novel algebraic framework. Because 2DSC uses the circular convolution operator, shifted versions of an atom, which conventional SC learns as separate atoms, are treated as the same atom, so 2DSC learns much more concise dictionaries. We apply 2DSC to natural images and demonstrate that 2DSC returns meaningful dictionaries for large patches. Moreover, for multi-spectral image denoising, the proposed 2DSC reduces computational costs with competitive performance in comparison with the state-of-the-art algorithms.
Tasks	Denoising
Published	2017-03-28
URL	http://arxiv.org/abs/1703.09690v1
PDF	http://arxiv.org/pdf/1703.09690v1.pdf
PWC	https://paperswithcode.com/paper/efficient-two-dimensional-sparse-coding-using
Repo
Framework
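
The abstract above attributes the concise dictionaries to circular convolution treating shifted atoms as one atom. A minimal one-dimensional sketch of that property follows; it assumes plain 1D circular convolution rather than the paper's tensor-linear framework, and the function names are hypothetical.

```python
def circular_convolve(x, y):
    """Circular convolution of two equal-length sequences."""
    n = len(x)
    return [sum(x[j] * y[(i - j) % n] for j in range(n)) for i in range(n)]


def circular_shift(x, k):
    """Shift x to the right by k positions with wrap-around."""
    n = len(x)
    return [x[(i - k) % n] for i in range(n)]


atom = [1, 2, 0, 0]
coef = [0, 1, 0, 0]  # a single unit coefficient at position 1

# Convolving with a shifted unit impulse reproduces the shifted atom,
# so one atom plus its coefficient position covers every shifted copy.
shifted_via_coef = circular_convolve(atom, coef)
shifted_directly = circular_shift(atom, 1)
```

Both lists are the atom shifted by one position, so a shift of the atom can be absorbed into the coefficient instead of being stored as a separate dictionary entry.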

Avoiding Synchronization in First-Order Methods for Sparse Convex Optimization


Title	Avoiding Synchronization in First-Order Methods for Sparse Convex Optimization
Authors	Aditya Devarakonda, Kimon Fountoulakis, James Demmel, Michael W. Mahoney
Abstract	Parallel computing has played an important role in speeding up convex optimization methods for big data analytics and large-scale machine learning (ML). However, the scalability of these optimization methods is inhibited by the cost of communicating and synchronizing processors in a parallel setting. Iterative ML methods are particularly sensitive to communication cost since they often require communication every iteration. In this work, we extend well-known techniques from Communication-Avoiding Krylov subspace methods to first-order, block coordinate descent methods for Support Vector Machines and Proximal Least-Squares problems. Our Synchronization-Avoiding (SA) variants reduce the latency cost by a tunable factor of $s$ at the expense of a factor of $s$ increase in flops and bandwidth costs. We show that the SA-variants are numerically stable and can attain large speedups of up to $5.1\times$ on a Cray XC30 supercomputer.
Tasks
Published	2017-12-17
URL	http://arxiv.org/abs/1712.06047v1
PDF	http://arxiv.org/pdf/1712.06047v1.pdf
PWC	https://paperswithcode.com/paper/avoiding-synchronization-in-first-order
Repo
Framework

De-identification of medical records using conditional random fields and long short-term memory networks


Title	De-identification of medical records using conditional random fields and long short-term memory networks
Authors	Zhipeng Jiang, Chao Zhao, Bin He, Yi Guan, Jingchi Jiang
Abstract	The CEGS N-GRID 2016 Shared Task 1 in Clinical Natural Language Processing focuses on the de-identification of psychiatric evaluation records. This paper describes two participating systems of our team, based on conditional random fields (CRFs) and long short-term memory networks (LSTMs). A pre-processing module was introduced for sentence detection and tokenization before de-identification. For CRFs, manually extracted rich features were utilized to train the model. For LSTMs, a character-level bi-directional LSTM network was applied to represent tokens and classify tags for each token, following which a decoding layer was stacked to decode the most probable protected health information (PHI) terms. The LSTM-based system attained an i2b2 strict micro-$F_1$ measure of 89.86%, which was higher than that of the CRF-based system.
Tasks	Tokenization
Published	2017-09-20
URL	http://arxiv.org/abs/1709.06901v2
PDF	http://arxiv.org/pdf/1709.06901v2.pdf
PWC	https://paperswithcode.com/paper/de-identification-of-medical-records-using
Repo
Framework

Theory of the superposition principle for randomized connectionist representations in neural networks


Title	Theory of the superposition principle for randomized connectionist representations in neural networks
Authors	E. Paxon Frady, Denis Kleyko, Friedrich T. Sommer
Abstract	To understand cognitive reasoning in the brain, it has been proposed that symbols and compositions of symbols are represented by activity patterns (vectors) in a large population of neurons. Formal models implementing this idea [Plate 2003], [Kanerva 2009], [Gayler 2003], [Eliasmith 2012] include a reversible superposition operation for representing with a single vector an entire set of symbols or an ordered sequence of symbols. If the representation space is high-dimensional, large sets of symbols can be superposed and individually retrieved. However, crosstalk noise limits the accuracy of retrieval and information capacity. To understand information processing in the brain and to design artificial neural systems for cognitive reasoning, a theory of this superposition operation is essential. Here, such a theory is presented. The superposition operations in different existing models are mapped to linear neural networks with unitary recurrent matrices, in which retrieval accuracy can be analyzed by a single equation. We show that networks representing information in superposition can achieve a channel capacity of about half a bit per neuron, a significant fraction of the total available entropy. Going beyond existing models, superposition operations with recency effects are proposed that avoid catastrophic forgetting when representing the history of infinite data streams. These novel models correspond to recurrent networks with non-unitary matrices or with nonlinear neurons, and can be analyzed and optimized with an extension of our theory.
Tasks
Published	2017-07-05
URL	http://arxiv.org/abs/1707.01429v1
PDF	http://arxiv.org/pdf/1707.01429v1.pdf
PWC	https://paperswithcode.com/paper/theory-of-the-superposition-principle-for
Repo
Framework
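
The superposition of an ordered sequence described in the abstract above can be illustrated with a small linear recurrence whose recurrent matrix is a permutation (a cyclic shift, which is unitary). This is a hedged sketch, not the authors' model or code: the random bipolar codebook, the dimension, and the helper names are all assumptions made for illustration.

```python
import random


def random_bipolar(dim, rng):
    """Random +1/-1 vector representing one symbol."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def rotate(vec, k):
    """Cyclic shift by k positions; a permutation, hence a unitary map."""
    k %= len(vec)
    return vec[-k:] + vec[:-k] if k else list(vec)


def encode_sequence(symbols, codebook):
    """Superpose a sequence: state <- rotate(state) + codebook[symbol]."""
    dim = len(next(iter(codebook.values())))
    state = [0] * dim
    for s in symbols:
        state = rotate(state, 1)
        state = [a + b for a, b in zip(state, codebook[s])]
    return state


def decode_at(state, steps_back, codebook):
    """Undo the rotations, then pick the codebook entry with the largest overlap."""
    unrolled = rotate(state, -steps_back)
    scores = {s: sum(a * b for a, b in zip(unrolled, v))
              for s, v in codebook.items()}
    return max(scores, key=scores.get)


rng = random.Random(0)
codebook = {s: random_bipolar(1000, rng) for s in "abcde"}
state = encode_sequence("badce", codebook)
latest = decode_at(state, 0, codebook)
oldest = decode_at(state, 4, codebook)
```

The overlap with the correct symbol is on the order of the dimension, while the other superposed items contribute only crosstalk noise, which is the quantity that limits retrieval accuracy as more items are stored.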

Ultraslow diffusion in language: Dynamics of appearance of already popular adjectives on Japanese blogs


Title	Ultraslow diffusion in language: Dynamics of appearance of already popular adjectives on Japanese blogs
Authors	Hayafumi Watanabe
Abstract	What dynamics govern a time series representing the appearance of words in social media data? In this paper, we investigate the elementary dynamics that remain once word-dependent special effects, such as breaking news, increasing (or decreasing) concerns, or seasonality, are separated out. To elucidate this problem, we investigated approximately three billion Japanese blog articles over a period of six years, and analysed some corresponding solvable mathematical models. From the analysis, we found that word appearance can be explained by the random diffusion model based on the power-law forgetting process, which is a type of long memory point process related to ARFIMA(0,0.5,0). In particular, we confirmed that ultraslow diffusion (where the mean squared displacement grows logarithmically), which the model predicts in an approximate manner, reproduces the actual data. In addition, we also show that the model can reproduce other statistical properties of a time series: (i) the fluctuation scaling, (ii) spectrum density, and (iii) shapes of the probability density functions.
Tasks	Time Series
Published	2017-07-21
URL	http://arxiv.org/abs/1707.07066v3
PDF	http://arxiv.org/pdf/1707.07066v3.pdf
PWC	https://paperswithcode.com/paper/ultraslow-diffusion-in-language-dynamics-of
Repo
Framework

Particle Filtering for PLCA model with Application to Music Transcription


Title	Particle Filtering for PLCA model with Application to Music Transcription
Authors	D. Cazau, G. Revillon, W. Yuancheng, O. Adam
Abstract	Automatic Music Transcription (AMT) consists of automatically estimating the notes in an audio recording, through three attributes: onset time, duration and pitch. Probabilistic Latent Component Analysis (PLCA) has become very popular for this task. PLCA is a spectrogram factorization method, able to model a magnitude spectrogram as a linear combination of spectral vectors from a dictionary. Such methods use the Expectation-Maximization (EM) algorithm to estimate the parameters of the acoustic model. This algorithm has well-known inherent shortcomings (local convergence, initialization dependency), which limit the applicability of EM-based systems to AMT, particularly with regard to the mathematical form and number of priors. To overcome such limits, we propose in this paper to employ a different estimation framework based on Particle Filtering (PF), which consists of sampling the posterior distribution over larger parameter ranges. This framework proves to be more robust in parameter estimation, and more flexible and unifying in the integration of prior knowledge in the system. Note-level transcription accuracies of 61.8% and 59.5% were achieved on evaluation sound datasets of two different instrument repertoires, including the classical piano (from MAPS dataset) and the marovany zither, and direct comparisons to previous PLCA-based approaches are provided. Steps for further development are also outlined.
Tasks
Published	2017-03-28
URL	http://arxiv.org/abs/1703.09772v1
PDF	http://arxiv.org/pdf/1703.09772v1.pdf
PWC	https://paperswithcode.com/paper/particle-filtering-for-plca-model-with
Repo
Framework

Improved Abusive Comment Moderation with User Embeddings


Title	Improved Abusive Comment Moderation with User Embeddings
Authors	John Pavlopoulos, Prodromos Malakasiotis, Juli Bakagianni, Ion Androutsopoulos
Abstract	Experimenting with a dataset of approximately 1.6M user comments from a Greek news sports portal, we explore how a state-of-the-art RNN-based moderation method can be improved by adding user embeddings, user type embeddings, user biases, or user type biases. We observe improvements in all cases, with user embeddings leading to the biggest performance gains.
Tasks
Published	2017-08-11
URL	http://arxiv.org/abs/1708.03699v1
PDF	http://arxiv.org/pdf/1708.03699v1.pdf
PWC	https://paperswithcode.com/paper/improved-abusive-comment-moderation-with-user
Repo
Framework

Imagination improves Multimodal Translation


Title	Imagination improves Multimodal Translation
Authors	Desmond Elliott, Ákos Kádár
Abstract	We decompose multimodal translation into two sub-tasks: learning to translate and learning visually grounded representations. In a multitask learning framework, translations are learned in an attention-based encoder-decoder, and grounded representations are learned through image representation prediction. Our approach improves translation performance compared to the state of the art on the Multi30K dataset. Furthermore, it is equally effective if we train the image prediction task on the external MS COCO dataset, and we find improvements if we train the translation model on the external News Commentary parallel text.
Tasks
Published	2017-05-11
URL	http://arxiv.org/abs/1705.04350v2
PDF	http://arxiv.org/pdf/1705.04350v2.pdf
PWC	https://paperswithcode.com/paper/imagination-improves-multimodal-translation
Repo
Framework

Robust Stereo Feature Descriptor for Visual Odometry


Title	Robust Stereo Feature Descriptor for Visual Odometry
Authors	Ehsan Shojaedini, Reza Safabakhsh
Abstract	In this paper, we propose a simple way to utilize stereo camera data to improve feature descriptors. Computer vision algorithms that use a stereo camera require some calculations of 3D information. We leverage this pre-calculated information to improve feature descriptor algorithms. We use the 3D feature information to estimate the scale of each feature. This way, each feature descriptor will be more robust to scale change without significant computations. In addition, we use stereo images to construct the descriptor vector. The Scale-Invariant Feature Transform (SIFT) and Fast Retina Keypoint (FREAK) descriptors are used to evaluate the proposed method. In the feature tracking test, the scale normalization technique improves the standard SIFT by 8.75% and the standard FREAK by 28.65%. Using the proposed stereo feature descriptor, a visual odometry algorithm is designed and tested on the KITTI dataset. The stereo FREAK descriptor raises the number of inlier matches by 19% and consequently improves the accuracy of visual odometry by 23%.
Tasks	Visual Odometry
Published	2017-08-26
URL	http://arxiv.org/abs/1708.07933v2
PDF	http://arxiv.org/pdf/1708.07933v2.pdf
PWC	https://paperswithcode.com/paper/robust-stereo-feature-descriptor-for-visual
Repo
Framework
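
The abstract above says that 3D information from the stereo pair is used to estimate the scale of each feature, but it does not give the formula. The sketch below shows one plausible reading using the standard pinhole stereo relations (depth = focal length x baseline / disparity, and image size inversely proportional to depth). The function names, the fixed physical patch size, and the numbers are assumptions for illustration, not the authors' method.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in metres from the standard rectified-stereo relation Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def patch_size_px(focal_px, depth_m, physical_size_m):
    """Image-plane size in pixels of a patch with a fixed physical size at the given depth."""
    return focal_px * physical_size_m / depth_m


# Hypothetical camera: 700 px focal length, 0.5 m baseline.
near = depth_from_disparity(700.0, 0.5, 70.0)
far = depth_from_disparity(700.0, 0.5, 35.0)

# A 0.2 m patch is sampled with a larger window when the feature is closer,
# which normalizes the descriptor against scale change.
near_patch = patch_size_px(700.0, near, 0.2)
far_patch = patch_size_px(700.0, far, 0.2)
```

Because the depth is already computed by the stereo pipeline, this kind of scale normalization adds only a division per feature, in line with the abstract's claim that it needs no significant extra computation.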

A Convergent Algorithm for Bi-orthogonal Nonnegative Matrix Tri-Factorization


Title	A Convergent Algorithm for Bi-orthogonal Nonnegative Matrix Tri-Factorization
Authors	Andri Mirzal
Abstract	A convergent algorithm for nonnegative matrix factorization with orthogonality constraints imposed on both factors is proposed in this paper. This factorization concept was first introduced by Ding et al. with the intent of further improving the clustering capability of NMF. However, as the original algorithm was developed based on multiplicative update rules, the convergence of the algorithm cannot be guaranteed. In this paper, we utilize the technique presented in our previous work to develop the algorithm and prove that it converges to a stationary point inside the solution space.
Tasks
Published	2017-10-29
URL	http://arxiv.org/abs/1710.11478v2
PDF	http://arxiv.org/pdf/1710.11478v2.pdf
PWC	https://paperswithcode.com/paper/a-convergent-algorithm-for-bi-orthogonal
Repo
Framework

Matrix-normal models for fMRI analysis


Title	Matrix-normal models for fMRI analysis
Authors	Michael Shvartsman, Narayanan Sundaram, Mikio C. Aoi, Adam Charles, Theodore C. Wilke, Jonathan D. Cohen
Abstract	Multivariate analysis of fMRI data has benefited substantially from advances in machine learning. Most recently, a range of probabilistic latent variable models applied to fMRI data have been successful in a variety of tasks, including identifying similarity patterns in neural data (Representational Similarity Analysis and its empirical Bayes variant, RSA and BRSA; Intersubject Functional Connectivity, ISFC), combining multi-subject datasets (Shared Response Mapping; SRM), and mapping between brain and behavior (Joint Modeling). Although these methods share some underpinnings, they have been developed as distinct methods, with distinct algorithms and software tools. We show how the matrix-variate normal (MN) formalism can unify some of these methods into a single framework. In doing so, we gain the ability to reuse noise modeling assumptions, algorithms, and code across models. Our primary theoretical contribution shows how some of these methods can be written as instantiations of the same model, allowing us to generalize them to flexibly model structured noise covariances. Our formalism permits novel model variants and improved estimation strategies: in contrast to SRM, the number of parameters for MN-SRM does not scale with the number of voxels or subjects; in contrast to BRSA, the number of parameters for MN-RSA scales additively rather than multiplicatively in the number of voxels. We empirically demonstrate advantages of two new methods derived in the formalism: for MN-RSA, we show up to 10x improvement in runtime, up to 6x improvement in RMSE, and more conservative behavior under the null. For MN-SRM, our method grants a modest improvement to out-of-sample reconstruction while relaxing an orthonormality constraint of SRM. We also provide a software prototyping tool for MN models that can flexibly reuse noise covariance assumptions and algorithms across models.
Tasks	Latent Variable Models
Published	2017-11-08
URL	http://arxiv.org/abs/1711.03058v2
PDF	http://arxiv.org/pdf/1711.03058v2.pdf
PWC	https://paperswithcode.com/paper/matrix-normal-models-for-fmri-analysis
Repo
Framework

Variational Bi-LSTMs


Title	Variational Bi-LSTMs
Authors	Samira Shabanian, Devansh Arpit, Adam Trischler, Yoshua Bengio
Abstract	Recurrent neural networks like long short-term memory (LSTM) are important architectures for sequential prediction tasks. LSTMs (and RNNs in general) model sequences along the forward time direction. Bidirectional LSTMs (Bi-LSTMs) on the other hand model sequences along both forward and backward directions and are generally known to perform better at such tasks because they capture a richer representation of the data. In the training of Bi-LSTMs, the forward and backward paths are learned independently. We propose a variant of the Bi-LSTM architecture, which we call Variational Bi-LSTM, that creates a channel between the two paths (during training, but which may be omitted during inference); thus optimizing the two paths jointly. We arrive at this joint objective for our model by minimizing a variational lower bound of the joint likelihood of the data sequence. Our model acts as a regularizer and encourages the two networks to inform each other in making their respective predictions using distinct information. We perform ablation studies to better understand the different components of our model and evaluate the method on various benchmarks, showing state-of-the-art performance.
Tasks
Published	2017-11-15
URL	http://arxiv.org/abs/1711.05717v1
PDF	http://arxiv.org/pdf/1711.05717v1.pdf
PWC	https://paperswithcode.com/paper/variational-bi-lstms
Repo
Framework