Paper Group ANR 610
Mathematical foundations of matrix syntax
Title | Mathematical foundations of matrix syntax |
Authors | Roman Orus, Roger Martin, Juan Uriagereka |
Abstract | Matrix syntax is a formal model of syntactic relations in language. The purpose of this paper is to explain its mathematical foundations, for an audience with some formal background. We make an axiomatic presentation, motivating each axiom on linguistic and practical grounds. The resulting mathematical structure resembles some aspects of quantum mechanics. Matrix syntax allows us to describe a number of language phenomena that are otherwise very difficult to explain, such as linguistic chains, and is arguably a more economical theory of language than most of the theories proposed in the context of the minimalist program in linguistics. In particular, sentences are naturally modelled as vectors in a Hilbert space with a tensor product structure, built from 2x2 matrices belonging to some specific group. |
Tasks | |
Published | 2017-10-01 |
URL | http://arxiv.org/abs/1710.00372v2 |
http://arxiv.org/pdf/1710.00372v2.pdf | |
PWC | https://paperswithcode.com/paper/mathematical-foundations-of-matrix-syntax |
Repo | |
Framework | |
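As a rough illustration of the tensor product structure mentioned in the abstract above, the following sketch builds a 4x4 matrix from two 2x2 matrices using the Kronecker product. The matrices `Z` and `X` (the Pauli Z and X matrices) are used purely as familiar 2x2 examples. They are an assumption for illustration and are not claimed to be the specific group used in the paper.

```python
def kron(a, b):
    """Kronecker (tensor) product of two matrices given as lists of rows."""
    return [
        [a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
        for i in range(len(a))
        for k in range(len(b))
    ]


# Illustrative 2x2 matrices only (assumption, not taken from the paper).
Z = [[1, 0], [0, -1]]
X = [[0, 1], [1, 0]]

# Two 2x2 factors combine into one 4x4 operator on the product space.
ZX = kron(Z, X)
for row in ZX:
    print(row)
```

Each additional 2x2 factor doubles the dimension, so a product of n factors acts on a space of dimension 2^n.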
A GPU-Outperforming FPGA Accelerator Architecture for Binary Convolutional Neural Networks
Title | A GPU-Outperforming FPGA Accelerator Architecture for Binary Convolutional Neural Networks |
Authors | Yixing Li, Zichuan Liu, Kai Xu, Hao Yu, Fengbo Ren |
Abstract | FPGA-based hardware accelerators for convolutional neural networks (CNNs) have attracted considerable attention due to their higher energy efficiency than GPUs. However, it is challenging for FPGA-based solutions to achieve a higher throughput than GPU counterparts. In this paper, we demonstrate that FPGA acceleration can be a superior solution in terms of both throughput and energy efficiency when a CNN is trained with binary constraints on weights and activations. Specifically, we propose an optimized FPGA accelerator architecture tailored for bitwise convolution and normalization that features massive spatial parallelism with deep pipeline stages. A key advantage of the FPGA accelerator is that its performance is insensitive to data batch size, while the performance of GPU acceleration varies largely depending on the batch size of the data. Experimental results show that the proposed accelerator architecture for binary CNNs running on a Virtex-7 FPGA is 8.3x faster and 75x more energy-efficient than a Titan X GPU for processing online individual requests in small batch sizes. For processing static data in large batch sizes, the proposed solution is on a par with a Titan X GPU in terms of throughput while delivering 9.5x higher energy efficiency. |
Tasks | |
Published | 2017-02-20 |
URL | http://arxiv.org/abs/1702.06392v2 |
http://arxiv.org/pdf/1702.06392v2.pdf | |
PWC | https://paperswithcode.com/paper/a-gpu-outperforming-fpga-accelerator |
Repo | |
Framework | |
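The bitwise convolution mentioned in the abstract above is commonly built on an XNOR-and-popcount dot product: when weights and activations are restricted to -1 and +1, each multiply becomes a sign comparison. The sketch below shows that general identity in plain Python. The function names `pack_bits` and `binary_dot` are hypothetical, and this is not a description of the paper's FPGA design.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int; bit i is set when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_word, b_word, n):
    """Dot product of two length-n {-1, +1} vectors given in packed form."""
    mask = (1 << n) - 1
    agree = ~(a_word ^ b_word) & mask  # XNOR: bit is 1 where the signs match
    matches = bin(agree).count("1")    # popcount
    # Each match contributes +1 and each mismatch -1.
    return 2 * matches - n


a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
result = binary_dot(pack_bits(a), pack_bits(b), len(a))
reference = sum(x * y for x, y in zip(a, b))
print(result, reference)
```

Because the whole dot product reduces to one XOR, one inversion, and a bit count per machine word, it maps naturally onto wide parallel logic.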
Stochastic Nonconvex Optimization with Large Minibatches
Title | Stochastic Nonconvex Optimization with Large Minibatches |
Authors | Weiran Wang, Nathan Srebro |
Abstract | We study stochastic optimization of nonconvex loss functions, which are typical objectives for training neural networks. We propose stochastic approximation algorithms which optimize a series of regularized, nonlinearized losses on large minibatches of samples, using only first-order gradient information. Our algorithms provably converge to an approximate critical point of the expected objective with faster rates than minibatch stochastic gradient descent, and facilitate better parallelization by allowing larger minibatches. |
Tasks | Stochastic Optimization |
Published | 2017-09-25 |
URL | http://arxiv.org/abs/1709.08728v4 |
http://arxiv.org/pdf/1709.08728v4.pdf | |
PWC | https://paperswithcode.com/paper/stochastic-nonconvex-optimization-with-large |
Repo | |
Framework | |
Efficient Two-Dimensional Sparse Coding Using Tensor-Linear Combination
Title | Efficient Two-Dimensional Sparse Coding Using Tensor-Linear Combination |
Authors | Fei Jiang, Xiao-Yang Liu, Hongtao Lu, Ruimin Shen |
Abstract | Sparse coding (SC) is an automatic feature extraction and selection technique that is widely used in unsupervised learning. However, conventional SC vectorizes the input images, which breaks apart the local proximity of pixels and destroys the elementary object structures of images. In this paper, we propose a novel two-dimensional sparse coding (2DSC) scheme that represents the input images as tensor-linear combinations under a novel algebraic framework. 2DSC learns much more concise dictionaries because it uses the circular convolution operator, under which the shifted versions of atoms learned by conventional SC are treated as the same atom. We apply 2DSC to natural images and demonstrate that 2DSC returns meaningful dictionaries for large patches. Moreover, for multi-spectral image denoising, the proposed 2DSC reduces computational costs with competitive performance in comparison with the state-of-the-art algorithms. |
Tasks | Denoising |
Published | 2017-03-28 |
URL | http://arxiv.org/abs/1703.09690v1 |
http://arxiv.org/pdf/1703.09690v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-two-dimensional-sparse-coding-using |
Repo | |
Framework | |
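The abstract above attributes the more concise dictionaries to circular convolution, which absorbs shifted copies of an atom into its coefficients. A minimal one-dimensional sketch of that property follows. The paper works with tensors, so this 1D version and the helper names `circular_convolve` and `shift` are simplifying assumptions.

```python
def circular_convolve(x, h):
    """Circular convolution of two equal-length sequences."""
    n = len(x)
    return [sum(x[j] * h[(i - j) % n] for j in range(n)) for i in range(n)]


def shift(v, k):
    """Circularly shift v to the right by k positions."""
    k %= len(v)
    return v[-k:] + v[:-k]


atom = [1, 2, 0, 0]

# A coefficient vector with a single 1 at position 2 selects the atom shifted by 2,
# so no separate dictionary entry is needed for the shifted copy.
coefficients = [0, 0, 1, 0]
reconstructed = circular_convolve(atom, coefficients)
print(reconstructed, shift(atom, 2))
```

In a conventional vectorized dictionary, `atom` and `shift(atom, 2)` would be stored as two unrelated columns.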
Avoiding Synchronization in First-Order Methods for Sparse Convex Optimization
Title | Avoiding Synchronization in First-Order Methods for Sparse Convex Optimization |
Authors | Aditya Devarakonda, Kimon Fountoulakis, James Demmel, Michael W. Mahoney |
Abstract | Parallel computing has played an important role in speeding up convex optimization methods for big data analytics and large-scale machine learning (ML). However, the scalability of these optimization methods is inhibited by the cost of communicating and synchronizing processors in a parallel setting. Iterative ML methods are particularly sensitive to communication cost since they often require communication every iteration. In this work, we extend well-known techniques from Communication-Avoiding Krylov subspace methods to first-order, block coordinate descent methods for Support Vector Machines and Proximal Least-Squares problems. Our Synchronization-Avoiding (SA) variants reduce the latency cost by a tunable factor of $s$ at the expense of a factor of $s$ increase in flops and bandwidth costs. We show that the SA-variants are numerically stable and can attain large speedups of up to $5.1\times$ on a Cray XC30 supercomputer. |
Tasks | |
Published | 2017-12-17 |
URL | http://arxiv.org/abs/1712.06047v1 |
http://arxiv.org/pdf/1712.06047v1.pdf | |
PWC | https://paperswithcode.com/paper/avoiding-synchronization-in-first-order |
Repo | |
Framework | |
De-identification of medical records using conditional random fields and long short-term memory networks
Title | De-identification of medical records using conditional random fields and long short-term memory networks |
Authors | Zhipeng Jiang, Chao Zhao, Bin He, Yi Guan, Jingchi Jiang |
Abstract | The CEGS N-GRID 2016 Shared Task 1 in Clinical Natural Language Processing focuses on the de-identification of psychiatric evaluation records. This paper describes two participating systems of our team, based on conditional random fields (CRFs) and long short-term memory networks (LSTMs). A pre-processing module was introduced for sentence detection and tokenization before de-identification. For CRFs, manually extracted rich features were utilized to train the model. For LSTMs, a character-level bi-directional LSTM network was applied to represent tokens and classify tags for each token, following which a decoding layer was stacked to decode the most probable protected health information (PHI) terms. The LSTM-based system attained an i2b2 strict micro-F1 measure of 89.86%, which was higher than that of the CRF-based system. |
Tasks | Tokenization |
Published | 2017-09-20 |
URL | http://arxiv.org/abs/1709.06901v2 |
http://arxiv.org/pdf/1709.06901v2.pdf | |
PWC | https://paperswithcode.com/paper/de-identification-of-medical-records-using |
Repo | |
Framework | |
Theory of the superposition principle for randomized connectionist representations in neural networks
Title | Theory of the superposition principle for randomized connectionist representations in neural networks |
Authors | E. Paxon Frady, Denis Kleyko, Friedrich T. Sommer |
Abstract | To understand cognitive reasoning in the brain, it has been proposed that symbols and compositions of symbols are represented by activity patterns (vectors) in a large population of neurons. Formal models implementing this idea [Plate 2003], [Kanerva 2009], [Gayler 2003], [Eliasmith 2012] include a reversible superposition operation for representing with a single vector an entire set of symbols or an ordered sequence of symbols. If the representation space is high-dimensional, large sets of symbols can be superposed and individually retrieved. However, crosstalk noise limits the accuracy of retrieval and information capacity. To understand information processing in the brain and to design artificial neural systems for cognitive reasoning, a theory of this superposition operation is essential. Here, such a theory is presented. The superposition operations in different existing models are mapped to linear neural networks with unitary recurrent matrices, in which retrieval accuracy can be analyzed by a single equation. We show that networks representing information in superposition can achieve a channel capacity of about half a bit per neuron, a significant fraction of the total available entropy. Going beyond existing models, superposition operations with recency effects are proposed that avoid catastrophic forgetting when representing the history of infinite data streams. These novel models correspond to recurrent networks with non-unitary matrices or with nonlinear neurons, and can be analyzed and optimized with an extension of our theory. |
Tasks | |
Published | 2017-07-05 |
URL | http://arxiv.org/abs/1707.01429v1 |
http://arxiv.org/pdf/1707.01429v1.pdf | |
PWC | https://paperswithcode.com/paper/theory-of-the-superposition-principle-for |
Repo | |
Framework | |
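The superposition and retrieval idea in the abstract above can be sketched with random bipolar vectors: a set of symbols is stored as the elementwise sum of their vectors, and membership is tested with a normalized dot product. The dimension, the threshold of 0.5, and the helper names are assumptions for illustration. The paper's models (including sequence encoding with recurrent matrices) are richer than this.

```python
import random


def random_bipolar(dim, rng):
    """Random vector with independent entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def superpose(vectors):
    """Elementwise sum of equal-length vectors."""
    return [sum(column) for column in zip(*vectors)]


def similarity(a, b):
    """Dot product normalized by the dimension."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
dim = 2000
codebook = {name: random_bipolar(dim, rng) for name in "abcdefgh"}

stored = ["a", "c", "f"]
memory = superpose([codebook[name] for name in stored])

# Stored symbols score close to 1, others close to 0; the spread around
# those values is the crosstalk noise, which shrinks as dim grows.
retrieved = sorted(name for name, vec in codebook.items() if similarity(memory, vec) > 0.5)
print(retrieved)
```

Lowering `dim` or storing many more symbols makes the two score clusters overlap, which is the capacity limit the abstract refers to.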
Ultraslow diffusion in language: Dynamics of appearance of already popular adjectives on Japanese blogs
Title | Ultraslow diffusion in language: Dynamics of appearance of already popular adjectives on Japanese blogs |
Authors | Hayafumi Watanabe |
Abstract | What dynamics govern a time series representing the appearance of words in social media data? In this paper, we investigate an elementary dynamics, from which word-dependent special effects are segregated, such as breaking news, increasing (or decreasing) concerns, or seasonality. To elucidate this problem, we investigated approximately three billion Japanese blog articles over a period of six years, and analysed some corresponding solvable mathematical models. From the analysis, we found that a word appearance can be explained by the random diffusion model based on the power-law forgetting process, which is a type of long memory point process related to ARFIMA(0,0.5,0). In particular, we confirmed that ultraslow diffusion (where the mean squared displacement grows logarithmically), which the model predicts in an approximate manner, reproduces the actual data. In addition, we also show that the model can reproduce other statistical properties of a time series: (i) the fluctuation scaling, (ii) spectrum density, and (iii) shapes of the probability density functions. |
Tasks | Time Series |
Published | 2017-07-21 |
URL | http://arxiv.org/abs/1707.07066v3 |
http://arxiv.org/pdf/1707.07066v3.pdf | |
PWC | https://paperswithcode.com/paper/ultraslow-diffusion-in-language-dynamics-of |
Repo | |
Framework | |
Particle Filtering for PLCA model with Application to Music Transcription
Title | Particle Filtering for PLCA model with Application to Music Transcription |
Authors | D. Cazau, G. Revillon, W. Yuancheng, O. Adam |
Abstract | Automatic Music Transcription (AMT) consists of automatically estimating the notes in an audio recording, through three attributes: onset time, duration and pitch. Probabilistic Latent Component Analysis (PLCA) has become very popular for this task. PLCA is a spectrogram factorization method, able to model a magnitude spectrogram as a linear combination of spectral vectors from a dictionary. Such methods use the Expectation-Maximization (EM) algorithm to estimate the parameters of the acoustic model. This algorithm has well-known inherent drawbacks (local convergence, initialization dependency), which limit EM-based systems in their applications to AMT, particularly with regard to the mathematical form and number of priors. To overcome such limits, we propose in this paper to employ a different estimation framework based on Particle Filtering (PF), which consists of sampling the posterior distribution over larger parameter ranges. This framework proves to be more robust in parameter estimation, and more flexible and unifying in the integration of prior knowledge in the system. Note-level transcription accuracies of 61.8% and 59.5% were achieved on evaluation sound datasets of two different instrument repertoires, including the classical piano (from MAPS dataset) and the marovany zither, and direct comparisons to previous PLCA-based approaches are provided. Steps for further development are also outlined. |
Tasks | |
Published | 2017-03-28 |
URL | http://arxiv.org/abs/1703.09772v1 |
http://arxiv.org/pdf/1703.09772v1.pdf | |
PWC | https://paperswithcode.com/paper/particle-filtering-for-plca-model-with |
Repo | |
Framework | |
Improved Abusive Comment Moderation with User Embeddings
Title | Improved Abusive Comment Moderation with User Embeddings |
Authors | John Pavlopoulos, Prodromos Malakasiotis, Juli Bakagianni, Ion Androutsopoulos |
Abstract | Experimenting with a dataset of approximately 1.6M user comments from a Greek news sports portal, we explore how a state of the art RNN-based moderation method can be improved by adding user embeddings, user type embeddings, user biases, or user type biases. We observe improvements in all cases, with user embeddings leading to the biggest performance gains. |
Tasks | |
Published | 2017-08-11 |
URL | http://arxiv.org/abs/1708.03699v1 |
http://arxiv.org/pdf/1708.03699v1.pdf | |
PWC | https://paperswithcode.com/paper/improved-abusive-comment-moderation-with-user |
Repo | |
Framework | |
Imagination improves Multimodal Translation
Title | Imagination improves Multimodal Translation |
Authors | Desmond Elliott, Ákos Kádár |
Abstract | We decompose multimodal translation into two sub-tasks: learning to translate and learning visually grounded representations. In a multitask learning framework, translations are learned in an attention-based encoder-decoder, and grounded representations are learned through image representation prediction. Our approach improves translation performance compared to the state of the art on the Multi30K dataset. Furthermore, it is equally effective if we train the image prediction task on the external MS COCO dataset, and we find improvements if we train the translation model on the external News Commentary parallel text. |
Tasks | |
Published | 2017-05-11 |
URL | http://arxiv.org/abs/1705.04350v2 |
http://arxiv.org/pdf/1705.04350v2.pdf | |
PWC | https://paperswithcode.com/paper/imagination-improves-multimodal-translation |
Repo | |
Framework | |
Robust Stereo Feature Descriptor for Visual Odometry
Title | Robust Stereo Feature Descriptor for Visual Odometry |
Authors | Ehsan Shojaedini, Reza Safabakhsh |
Abstract | In this paper, we propose a simple way to utilize stereo camera data to improve feature descriptors. Computer vision algorithms that use a stereo camera require some calculations of 3D information. We leverage this pre-calculated information to improve feature descriptor algorithms. We use the 3D feature information to estimate the scale of each feature. This way, each feature descriptor will be more robust to scale change without significant computations. In addition, we use stereo images to construct the descriptor vector. The Scale-Invariant Feature Transform (SIFT) and Fast Retina Keypoint (FREAK) descriptors are used to evaluate the proposed method. In the feature tracking test, the scale normalization technique improves standard SIFT by 8.75% and standard FREAK by 28.65%. Using the proposed stereo feature descriptor, a visual odometry algorithm is designed and tested on the KITTI dataset. The stereo FREAK descriptor raises the number of inlier matches by 19% and consequently improves the accuracy of visual odometry by 23%. |
Tasks | Visual Odometry |
Published | 2017-08-26 |
URL | http://arxiv.org/abs/1708.07933v2 |
http://arxiv.org/pdf/1708.07933v2.pdf | |
PWC | https://paperswithcode.com/paper/robust-stereo-feature-descriptor-for-visual |
Repo | |
Framework | |
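The abstract above does not spell out how scale is derived from 3D information. One plausible reading, sketched here under the assumption of a rectified pinhole stereo pair, uses the standard relation depth = focal length x baseline / disparity and then sizes the descriptor patch so that it covers a fixed physical extent. The function names and all numeric values are hypothetical and are not taken from the paper or from KITTI.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in metres of a point seen by a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def patch_radius_px(focal_px, physical_radius_m, depth_m):
    """Image-space radius of a patch that spans a fixed physical radius."""
    return focal_px * physical_radius_m / depth_m


# Arbitrary example values (assumptions for illustration only).
focal_px = 700.0
baseline_m = 0.5

near = depth_from_disparity(focal_px, baseline_m, 70.0)
far = depth_from_disparity(focal_px, baseline_m, 35.0)

# The same physical patch appears half as large at twice the depth.
print(patch_radius_px(focal_px, 0.2, near), patch_radius_px(focal_px, 0.2, far))
```

Sampling the descriptor over a depth-scaled patch keeps its support tied to the same physical surface as the camera moves closer or farther, without building an image scale pyramid.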
A Convergent Algorithm for Bi-orthogonal Nonnegative Matrix Tri-Factorization
Title | A Convergent Algorithm for Bi-orthogonal Nonnegative Matrix Tri-Factorization |
Authors | Andri Mirzal |
Abstract | A convergent algorithm for nonnegative matrix factorization with orthogonality constraints imposed on both factors is proposed in this paper. This factorization concept was first introduced by Ding et al. with the intent of further improving the clustering capability of NMF. However, as the original algorithm was developed based on multiplicative update rules, the convergence of the algorithm cannot be guaranteed. In this paper, we utilize the technique presented in our previous work to develop the algorithm and prove that it converges to a stationary point inside the solution space. |
Tasks | |
Published | 2017-10-29 |
URL | http://arxiv.org/abs/1710.11478v2 |
http://arxiv.org/pdf/1710.11478v2.pdf | |
PWC | https://paperswithcode.com/paper/a-convergent-algorithm-for-bi-orthogonal |
Repo | |
Framework | |
Matrix-normal models for fMRI analysis
Title | Matrix-normal models for fMRI analysis |
Authors | Michael Shvartsman, Narayanan Sundaram, Mikio C. Aoi, Adam Charles, Theodore C. Wilke, Jonathan D. Cohen |
Abstract | Multivariate analysis of fMRI data has benefited substantially from advances in machine learning. Most recently, a range of probabilistic latent variable models applied to fMRI data have been successful in a variety of tasks, including identifying similarity patterns in neural data (Representational Similarity Analysis and its empirical Bayes variant, RSA and BRSA; Intersubject Functional Connectivity, ISFC), combining multi-subject datasets (Shared Response Mapping; SRM), and mapping between brain and behavior (Joint Modeling). Although these methods share some underpinnings, they have been developed as distinct methods, with distinct algorithms and software tools. We show how the matrix-variate normal (MN) formalism can unify some of these methods into a single framework. In doing so, we gain the ability to reuse noise modeling assumptions, algorithms, and code across models. Our primary theoretical contribution shows how some of these methods can be written as instantiations of the same model, allowing us to generalize them to flexibly model structured noise covariances. Our formalism permits novel model variants and improved estimation strategies: in contrast to SRM, the number of parameters for MN-SRM does not scale with the number of voxels or subjects; in contrast to BRSA, the number of parameters for MN-RSA scales additively rather than multiplicatively in the number of voxels. We empirically demonstrate advantages of two new methods derived in the formalism: for MN-RSA, we show up to 10x improvement in runtime, up to 6x improvement in RMSE, and more conservative behavior under the null. For MN-SRM, our method grants a modest improvement to out-of-sample reconstruction while relaxing an orthonormality constraint of SRM. We also provide a software prototyping tool for MN models that can flexibly reuse noise covariance assumptions and algorithms across models. |
Tasks | Latent Variable Models |
Published | 2017-11-08 |
URL | http://arxiv.org/abs/1711.03058v2 |
http://arxiv.org/pdf/1711.03058v2.pdf | |
PWC | https://paperswithcode.com/paper/matrix-normal-models-for-fmri-analysis |
Repo | |
Framework | |
Variational Bi-LSTMs
Title | Variational Bi-LSTMs |
Authors | Samira Shabanian, Devansh Arpit, Adam Trischler, Yoshua Bengio |
Abstract | Recurrent neural networks like long short-term memory (LSTM) are important architectures for sequential prediction tasks. LSTMs (and RNNs in general) model sequences along the forward time direction. Bidirectional LSTMs (Bi-LSTMs), on the other hand, model sequences along both forward and backward directions and are generally known to perform better at such tasks because they capture a richer representation of the data. In the training of Bi-LSTMs, the forward and backward paths are learned independently. We propose a variant of the Bi-LSTM architecture, which we call Variational Bi-LSTM, that creates a channel between the two paths during training (the channel may be omitted during inference), thus optimizing the two paths jointly. We arrive at this joint objective for our model by minimizing a variational lower bound of the joint likelihood of the data sequence. Our model acts as a regularizer and encourages the two networks to inform each other in making their respective predictions using distinct information. We perform ablation studies to better understand the different components of our model and evaluate the method on various benchmarks, showing state-of-the-art performance. |
Tasks | |
Published | 2017-11-15 |
URL | http://arxiv.org/abs/1711.05717v1 |
http://arxiv.org/pdf/1711.05717v1.pdf | |
PWC | https://paperswithcode.com/paper/variational-bi-lstms |
Repo | |
Framework | |